Context Length

Free Tier16,382 tokens
Paid TiersUp to 32k

Speed

~2100
tokens/sec

Input / Output

Input Formats JSON, plain text
Output FormatsJSON, plain text, structured

Model Notes

Model ID: qwen-3-32b
Currently, Cerebras only supports the default reasoning mode. However, if you don't want the model to use reasoning for certain queries, you can add /no_think to your prompt.

For example: Tell me about cats /no_think

Rate Limits

TierRequests/minInput Tokens/minOutput Tokens/minDaily Tokens
Free3060k-1M
1300300k30k70M
2600600k60k150M
310001M100K325M
412001.2M120k470M
514501.45M145k680M

Endpoints

Chat Completions
Completions

Features

Streaming
Structured Outputs
Tool Calling
Multi-Turn Tool Calling
Tool Calling w/ Structured Outputs
Tool Calling w/ Reasoning
Top P
Temperature
Max Completion Tokens
Logit Probabilities

Need Higher Limits?

Reach out for custom pricing with our Enterprise tier for higher rate limits and dedicated support.

Contact Sales