Context Length

Free Tier: 64k tokens
Paid Tiers: Up to 128k tokens

Speed

~2,100 tokens/sec

Input / Output

Input Formats: JSON, plain text
Output Formats: JSON, plain text, structured

Model Notes

Model ID: qwen-3-32b
Available through OpenRouter.
Cerebras currently supports only the model's default reasoning mode. If you don't want the model to use reasoning for a particular query, or if you see reduced accuracy on long contexts (e.g., 131k tokens), append /no_think to your prompt to disable the default reasoning behavior.

For example: Tell me about cats /no_think
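
As a rough sketch of how the /no_think suffix might be applied in practice, the request below uses an OpenAI-compatible Python client. The base URL, API-key environment variable, and client setup are assumptions; only the model ID and the /no_think suffix come from this page.

```python
# Minimal sketch: appending /no_think to the prompt to disable the model's
# default reasoning behavior. Base URL and API-key env var are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",    # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],   # assumed env var name
)

response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[
        # The /no_think suffix disables reasoning for this query only.
        {"role": "user", "content": "Tell me about cats /no_think"},
    ],
)
print(response.choices[0].message.content)
```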
When using thinking mode, we recommend setting temperature=0.6 and top_p=0.95, and avoiding greedy decoding entirely, as it can cause performance issues and repetition.
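
A similar sketch with reasoning left enabled and the recommended sampling settings applied; the client setup is the same assumption as above.

```python
# Minimal sketch: default reasoning (thinking) mode with the recommended
# sampling settings. Base URL and API-key env var are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",    # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],   # assumed env var name
)

response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[{"role": "user", "content": "Walk through a proof that sqrt(2) is irrational."}],
    temperature=0.6,  # recommended for thinking mode
    top_p=0.95,       # recommended for thinking mode; avoids greedy decoding
)
print(response.choices[0].message.content)
```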

Rate Limits

Tier    Requests/min    Input Tokens/min    Output Tokens/min    Daily Tokens
Free    30              60k                 8k/request           1M
1       300             300k                30k                  70M
2       600             600k                60k                  150M
3       1000            1M                  100k                 325M
4       1200            1.2M                120k                 470M
5       1450            1.45M               145k                 680M
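
Once a per-minute or daily limit is exhausted, requests are rejected until the window resets. A minimal sketch of one common client-side handling pattern, exponential backoff on rate-limit errors, is shown below; the error type and retry policy are assumptions, not part of this page.

```python
# Minimal sketch: retry with exponential backoff when a request is rejected
# for exceeding a rate limit. RateLimitError is the OpenAI SDK's exception;
# adjust to whatever your client library actually raises.
import time
from openai import OpenAI, RateLimitError

def chat_with_backoff(client: OpenAI, messages: list, max_retries: int = 5):
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="qwen-3-32b",
                messages=messages,
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise             # give up after the final attempt
            time.sleep(delay)     # wait before retrying
            delay *= 2            # exponential backoff
```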

Endpoints

Chat Completions
Completions
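
Chat Completions takes a structured message list, while Completions takes a raw prompt string. A minimal sketch of the same request sent to each endpoint style, assuming the same OpenAI-compatible client configuration as in the examples above:

```python
# Minimal sketch: Chat Completions vs. Completions through an assumed
# OpenAI-compatible client. Base URL and API-key env var are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",    # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],   # assumed env var name
)

# Chat Completions: structured message list.
chat = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[{"role": "user", "content": "Summarize the benefits of fast inference."}],
)
print(chat.choices[0].message.content)

# Completions: raw prompt string.
legacy = client.completions.create(
    model="qwen-3-32b",
    prompt="Summarize the benefits of fast inference.",
    max_tokens=256,
)
print(legacy.choices[0].text)
```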

Features

Streaming
Structured Outputs
Tool Calling (see the sketch after this list)
Multi-Turn Tool Calling
Tool Calling w/ Structured Outputs
Tool Calling w/ Reasoning
Top P
Temperature
Max Completion Tokens
Logit Probabilities
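
As a rough illustration of the Tool Calling feature, the sketch below defines a hypothetical get_weather tool and checks whether the model chose to call it. The tool schema, base URL, and client setup are assumptions; only the model ID comes from this page.

```python
# Minimal sketch: single-turn tool calling with a hypothetical get_weather tool.
# Tool schema, base URL, and API-key env var are all assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",    # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],   # assumed env var name
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[{"role": "user", "content": "What's the weather in Toronto?"}],
    tools=tools,
)

# When the model decides to call a tool, the call appears on the message
# object instead of plain text content.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```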

Need Higher Limits?

Reach out for custom Enterprise tier pricing to get higher rate limits and dedicated support.

Contact Sales