This is a hybrid reasoning model that can operate with or without thinking tokens. It’s ideal for complex reasoning tasks, multi-step workflows, and applications requiring both speed and intelligence.
qwen-3-32b
/no_think
to your prompt to disable the model's default reasoning behavior.Tell me about cats /no_think
temperature=0.6
and top_p=0.95
, and avoid greedy decoding completely as it causes performance issues and repetitions.Tier | Requests/min | Input Tokens/min | Output Tokens/min | Daily Tokens |
---|---|---|---|---|
Free | 30 | 60k | 8k/request | 1M |
1 | 300 | 300k | 30k | 70M |
2 | 600 | 600k | 60k | 150M |
3 | 1000 | 1M | 100K | 325M |
4 | 1200 | 1.2M | 120k | 470M |
5 | 1450 | 1.45M | 145k | 680M |
Chat Completions
Completions