Skip to main content
Model ID: qwen-3-32b

Model Stats

SPEED
~2600
tokens/sec
INPUT / OUTPUT
/
CONTEXT
Free Tier
65k tokens
Paid Tiers
131k tokens
MAX OUTPUT
Free Tier
8k tokens
Paid Tiers
8k tokens

Pricing

Input
$0.40 / M tokens
Output
$0.80 / M tokens
Developer pricing shown above is per million tokens. For volume discounts and enterprise features, see our pricing page.

Model Notes

Currently, Cerebras only supports the default reasoning mode. However, if you don't want the model to use reasoning for certain queries, or if you experience reduced accuracy when using this model with long contexts (e.g., 131k tokens), try appending /no_think to your prompt to disable the model's default reasoning behavior.

For example: Tell me about cats /no_think
When using thinking mode, we recommend setting temperature=0.6 and top_p=0.95, and avoid greedy decoding completely as it causes performance issues and repetitions.

Rate Limits

TierRequests/minInput Tokens/minDaily Tokens
Free3060k1M
Developer1K1MN/A

Endpoints

Chat Completions
Completions

Capabilities

Reasoning
Streaming
Structured Outputs
Tool Calling

Need Higher Limits?

Reach out for custom pricing with our Enterprise tier for higher rate limits and dedicated support.
I