Models
Qwen 3 235B
This is a mixture-of-experts model featuring hybrid thinking modes that allow you to toggle between quick responses and step-by-step reasoning. It’s optimized for coding, mathematics, and agentic workflows with support for 119 languages.
Context Length
Free Tier41k tokens
Paid TiersUp to 131k
Speed
~1500
tokens/sec
Input / Output
Input Formats JSON, plain text
Output FormatsJSON, plain text, structured
Model Notes
Model ID:
qwen-3-235b-a22b
Currently, Cerebras only supports the default reasoning mode. However, if you don't want the model to use reasoning for certain queries, you can add
For example:
/no_think
to your prompt.For example:
Tell me about cats /no_think
When using thinking mode, we recommend setting
temperature=0.6
and top_p=0.95
, and avoid greedy decoding completely as it causes performance issues and repetitions.Rate Limits
Tier | Requests/min | Input Tokens/min | Output Tokens/min | Daily Tokens |
---|---|---|---|---|
Free | 30 | 64k | 8k/request | 1M |
Endpoints
Chat Completions
Completions
Features
Streaming
Structured Outputs
Tool Calling
Multi-Turn Tool Calling
Tool Calling w/ Structured Outputs
Tool Calling w/ Reasoning
Top P
Temperature
Max Completion Tokens
Logit Probabilities
Need Higher Limits?
Reach out for custom pricing with our Enterprise tier for higher rate limits and dedicated support.
Contact Sales