Qwen 3 235B Thinking

This model will be deprecated on November 14, 2025.

Model ID: qwen-3-235b-a22b-thinking-2507

Model card

Model Stats

SPEED

~1700

tokens/sec

INPUT / OUTPUT

CONTEXT

Free Tier

65k tokens

Paid Tiers

131k tokens

MAX OUTPUT

Free Tier

32k tokens

Paid Tiers

40k tokens

Pricing

Input

$0.60 / M tokens

Output

$2.90 / M tokens

Exploration pricing shown above is per million tokens. For volume discounts and enterprise features, see our pricing page.

Model Notes

This model supports only thinking mode. The default chat template automatically includes <think> tags, and it's normal to see output that contains only a closing </think> tag without an explicit opening <think> tag.

This model tends to produce longer, more verbose responses. To prevent truncation, we recommend setting max_completion_tokens to 64,000 when using this model.

In multi-turn conversations, the historical model output should contain only the final output portion and exclude thinking content. While the Jinja2 chat template handles this automatically, developers using other frameworks must manually ensure this best practice is implemented to maintain clean conversation history.

Rate Limits

Tier	Requests/min	Input Tokens/min	Daily Tokens
Free	30	60k	1M
Developer	1K	1M	N/A

Endpoints

→

Chat Completions

/v1/chat/completions

→

Completions

/v1/completions

Capabilities

✓Reasoning

✓Streaming

✓Structured Outputs

✓Tool Calling

Need Higher Limits?

Reach out for custom pricing with our Enterprise tier for higher rate limits and dedicated support.

Contact Sales

⌘I

Get Started

Capabilities

Compatibility

Resources

Support

Model Stats

Pricing

Model Notes

Rate Limits

Endpoints

Capabilities

Need Higher Limits?