Context Length

Free Tier41k tokens
Paid TiersUp to 131k

Speed

~1500
tokens/sec

Input / Output

Input Formats JSON, plain text
Output FormatsJSON, plain text, structured

Model Notes

Model ID: qwen-3-235b-a22b
Currently, Cerebras only supports the default reasoning mode. However, if you don't want the model to use reasoning for certain queries, you can add /no_think to your prompt.

For example: Tell me about cats /no_think
When using thinking mode, we recommend setting temperature=0.6 and top_p=0.95, and avoid greedy decoding completely as it causes performance issues and repetitions.

Rate Limits

TierRequests/minInput Tokens/minOutput Tokens/minDaily Tokens
Free3064k8k/request1M

Endpoints

Chat Completions
Completions

Features

Streaming
Structured Outputs
Tool Calling
Multi-Turn Tool Calling
Tool Calling w/ Structured Outputs
Tool Calling w/ Reasoning
Top P
Temperature
Max Completion Tokens
Logit Probabilities

Need Higher Limits?

Reach out for custom pricing with our Enterprise tier for higher rate limits and dedicated support.

Contact Sales