Context Length
Free Tier64k tokens
Paid Tiers128k tokens
Speed
~2600
tokens/sec
Input / Output
Input Formats JSON, plain text
Output FormatsJSON, plain text, structured
Pricing
Input
$0.40 / M tokens
Output
$0.80 / M tokens
Exploration pricing shown above is per million tokens. For volume discounts and enterprise features, see our pricing page.
Model Notes
Model ID:
qwen-3-32b
Currently, Cerebras only supports the default reasoning mode. However, if you don't want the model to use reasoning for certain queries, or if you experience reduced accuracy when using this model with long contexts (e.g., 131k tokens), try appending
For example:
/no_think
to your prompt to disable the model's default reasoning behavior.For example:
Tell me about cats /no_think
When using thinking mode, we recommend setting
temperature=0.6
and top_p=0.95
, and avoid greedy decoding completely as it causes performance issues and repetitions.Rate Limits
Tier | Requests/min | Input Tokens/min | Output Tokens/min | Daily Tokens |
---|---|---|---|---|
Free | 30 | 60k | 8k/request | 1M |
1 | 300 | 300k | 30k | 70M |
2 | 600 | 600k | 60k | 150M |
3 | 1000 | 1M | 100K | 325M |
4 | 1200 | 1.2M | 120k | 470M |
5 | 1450 | 1.45M | 145k | 680M |
Endpoints
Chat Completions
Completions
Features
Reasoning
Streaming
Structured Outputs
Tool Calling