Skip to main content
Model ID: qwen-3-235b-a22b-thinking-2507

Model Stats

SPEED
~1700
tokens/sec
INPUT / OUTPUT
/
CONTEXT
Free Tier
65k tokens
Paid Tiers
131k tokens
MAX OUTPUT
Free Tier
32k tokens
Paid Tiers
40k tokens

Pricing

Input
$0.60 / M tokens
Output
$2.90 / M tokens
Developer pricing shown above is per million tokens. For volume discounts and enterprise features, see our pricing page.

Model Notes

This model supports only thinking mode. The default chat template automatically includes <think> tags, and it's normal to see output that contains only a closing </think> tag without an explicit opening <think> tag.
This model tends to produce longer, more verbose responses. To prevent truncation, we recommend setting max_completion_tokens to 64,000 when using this model.
In multi-turn conversations, the historical model output should contain only the final output portion and exclude thinking content. While the Jinja2 chat template handles this automatically, developers using other frameworks must manually ensure this best practice is implemented to maintain clean conversation history.

Rate Limits

TierRequests/minInput Tokens/minDaily Tokens
Free3060k1M
Developer1K1MN/A

Endpoints

Chat Completions
Completions

Features

Reasoning
Streaming
Structured Outputs
Tool Calling

Need Higher Limits?

Reach out for custom pricing with our Enterprise tier for higher rate limits and dedicated support.
I