Context Length

Free Tier65k tokens
Paid Tiers128k tokens

Speed

~1700
tokens/sec

Input / Output

Input Formats JSON, plain text
Output FormatsJSON, plain text, structured

Model Notes

Model ID: qwen-3-235b-a22b-thinking-2507
This model supports only thinking mode. The default chat template automatically includes <think> tags, and it's normal to see output that contains only a closing </think> tag without an explicit opening <think> tag.
This model tends to produce longer, more verbose responses. To prevent truncation, we recommend setting max_completion_tokens to 64,000 when using this model.
In multi-turn conversations, the historical model output should contain only the final output portion and exclude thinking content. While the Jinja2 chat template handles this automatically, developers using other frameworks must manually ensure this best practice is implemented to maintain clean conversation history.

Pricing

Input
$0.60 / M tokens
Output
$1.20 / M tokens
Exploration pricing shown above is per-token. For volume discounts and enterprise features, see our pricing page.

Rate Limits

TierRequests/minInput Tokens/minOutput Tokens/minDaily Tokens
Free3060k8k/request1M

Endpoints

Chat Completions
Completions

Features

Streaming
Structured Outputs
Tool Calling

Need Higher Limits?

Reach out for custom pricing with our Enterprise tier for higher rate limits and dedicated support.Contact Sales