Context Length

Free Tier: 64k tokens
Paid Tiers: Up to 128k tokens

Speed

~2,100 tokens/sec

Input / Output

Input Formats: JSON, plain text
Output Formats: JSON, plain text, structured

Model Notes

Model ID: qwen-3-32b
Available through OpenRouter.
Cerebras currently supports only the model's default reasoning mode. If you don't want the model to use reasoning for a particular query, or if you see reduced accuracy on long contexts (e.g., 131k tokens), append /no_think to your prompt to disable the default reasoning behavior.

For example: Tell me about cats /no_think
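
As a rough sketch of how the /no_think suffix might be applied in practice, the request below uses an OpenAI-compatible Python client. The base URL, API-key environment variable, and client setup are assumptions; only the model ID and the /no_think suffix come from this page.

```python
# Minimal sketch: appending /no_think to the prompt to disable the model's
# default reasoning behavior. Base URL and API-key env var are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",    # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],   # assumed env var name
)

response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[
        # The /no_think suffix disables reasoning for this query only.
        {"role": "user", "content": "Tell me about cats /no_think"},
    ],
)
print(response.choices[0].message.content)
```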
When using thinking mode, we recommend setting temperature=0.6 and top_p=0.95, and avoiding greedy decoding entirely, as it can cause performance issues and repetition.
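
A similar sketch with reasoning left enabled and the recommended sampling settings applied; the client setup is the same assumption as above.

```python
# Minimal sketch: default reasoning (thinking) mode with the recommended
# sampling settings. Base URL and API-key env var are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",    # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],   # assumed env var name
)

response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[{"role": "user", "content": "Walk through a proof that sqrt(2) is irrational."}],
    temperature=0.6,  # recommended for thinking mode
    top_p=0.95,       # recommended for thinking mode; avoids greedy decoding
)
print(response.choices[0].message.content)
```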

Rate Limits

Tier    Requests/min    Input Tokens/min    Output Tokens/min    Daily Tokens
Free    30              60k                 8k/request           1M
1       300             300k                30k                  70M
2       600             600k                60k                  150M
3       1000            1M                  100k                 325M
4       1200            1.2M                120k                 470M
5       1450            1.45M               145k                 680M
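
Once a per-minute or daily limit is exhausted, requests are rejected until the window resets. A minimal sketch of one common client-side handling pattern, exponential backoff on rate-limit errors, is shown below; the error type and retry policy are assumptions, not part of this page.

```python
# Minimal sketch: retry with exponential backoff when a request is rejected
# for exceeding a rate limit. RateLimitError is the OpenAI SDK's exception;
# adjust to whatever your client library actually raises.
import time
from openai import OpenAI, RateLimitError

def chat_with_backoff(client: OpenAI, messages: list, max_retries: int = 5):
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="qwen-3-32b",
                messages=messages,
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise             # give up after the final attempt
            time.sleep(delay)     # wait before retrying
            delay *= 2            # exponential backoff
```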

Endpoints

Chat Completions
Completions
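
Chat Completions takes a structured message list, while Completions takes a raw prompt string. A minimal sketch of the same request sent to each endpoint style, assuming the same OpenAI-compatible client configuration as in the examples above:

```python
# Minimal sketch: Chat Completions vs. Completions through an assumed
# OpenAI-compatible client. Base URL and API-key env var are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",    # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],   # assumed env var name
)

# Chat Completions: structured message list.
chat = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[{"role": "user", "content": "Summarize the benefits of fast inference."}],
)
print(chat.choices[0].message.content)

# Completions: raw prompt string.
legacy = client.completions.create(
    model="qwen-3-32b",
    prompt="Summarize the benefits of fast inference.",
    max_tokens=256,
)
print(legacy.choices[0].text)
```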

Features

Streaming
Structured Outputs
Tool Calling (see the sketch after this list)
Multi-Turn Tool Calling
Tool Calling w/ Structured Outputs
Tool Calling w/ Reasoning
Top P
Temperature
Max Completion Tokens
Logit Probabilities
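
As a rough illustration of the Tool Calling feature, the sketch below defines a hypothetical get_weather tool and checks whether the model chose to call it. The tool schema, base URL, and client setup are assumptions; only the model ID comes from this page.

```python
# Minimal sketch: single-turn tool calling with a hypothetical get_weather tool.
# Tool schema, base URL, and API-key env var are all assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",    # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],   # assumed env var name
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[{"role": "user", "content": "What's the weather in Toronto?"}],
    tools=tools,
)

# When the model decides to call a tool, the call appears on the message
# object instead of plain text content.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```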

Need Higher Limits?

Reach out for custom Enterprise tier pricing to get higher rate limits and dedicated support.

Contact Sales