Reasoning capabilities are currently available for the OpenAI GPT OSS (gpt-oss-120b), Qwen3 (qwen3-32b), and Z.ai GLM (zai-glm-4.6, zai-glm-4.7) models. Each model family has slight variations in the parameters used to control reasoning.

Reasoning Format
Control how reasoning text appears in responses using the reasoning_format parameter.
Available Formats
| Format | Description |
|---|---|
| parsed | Reasoning returned in a separate reasoning field; logprobs separated into reasoning_logprobs |
| raw | Reasoning prepended to content; GLM and Qwen use <think>...</think> tokens, GPT-OSS concatenates without separators |
| hidden | Reasoning text and logprobs dropped completely (tokens still counted) |
| none | Uses the model's default behavior |
Default Behavior by Model
When reasoning_format is set to none or omitted, each model uses its default format:
| Model | Default Reasoning Format |
|---|---|
| Qwen3 | raw (hidden for JSON object/schema) |
| GLM | parsed |
| GPT-OSS | parsed |
parsed Format
Reasoning text is returned in a separate reasoning field without start/end tokens. When logprobs are enabled, reasoning logprobs are returned in a separate reasoning_logprobs field. When streaming, reasoning text is delivered in the reasoning field of the delta.
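As a minimal sketch, assuming the Cerebras Python SDK's chat.completions.create accepts reasoning_format as a top-level keyword argument (the model and prompt here are illustrative):

```python
# Sketch: assemble request parameters for a parsed-format completion.
# The reasoning_format keyword and model name follow this documentation;
# the helper itself is illustrative, not part of the SDK.
def build_parsed_request(prompt: str) -> dict:
    """Keyword arguments for chat.completions.create with parsed reasoning."""
    return dict(
        model="qwen3-32b",
        messages=[{"role": "user", "content": prompt}],
        reasoning_format="parsed",  # reasoning arrives in a separate field
    )

# With a client and network access, the call would look like:
#   from cerebras.cloud.sdk import Cerebras
#   client = Cerebras()
#   resp = client.chat.completions.create(**build_parsed_request("Why is the sky blue?"))
#   resp.choices[0].message.reasoning  # reasoning text, no start/end tokens
#   resp.choices[0].message.content    # final answer only

params = build_parsed_request("Why is the sky blue?")
print(params["reasoning_format"])  # -> parsed
```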
raw Format
Reasoning text is included in the content field, prepended to the response. For GLM and Qwen models, reasoning is wrapped in <think>...</think> tokens. All logprobs are returned together in the standard logprobs field.
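For raw output from GLM or Qwen, the reasoning can be separated from the answer client-side. A minimal sketch (the helper is illustrative, not part of the SDK):

```python
import re

# Sketch: split a raw-format GLM/Qwen completion into reasoning and answer,
# relying on the <think>...</think> wrapper described above.
# (GPT-OSS raw output has no separator, so it cannot be split this way.)
_THINK = re.compile(r"^<think>(.*?)</think>", re.DOTALL)

def split_raw_content(content: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is '' if no think block is present."""
    m = _THINK.match(content)
    if m is None:
        return "", content
    return m.group(1).strip(), content[m.end():].lstrip()

reasoning, answer = split_raw_content("<think>2+2 is 4</think>The answer is 4.")
print(reasoning)  # -> 2+2 is 4
print(answer)     # -> The answer is 4.
```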
Since GPT-OSS does not use thinking tokens, reasoning and content are concatenated without separators when using the raw format.

The raw format is not compatible with the json_object or json_schema response formats. Models that default to raw automatically use hidden instead when structured output is requested.

hidden Format
Reasoning text and reasoning logprobs are dropped completely from the response. The reasoning tokens are still generated and counted toward total completion tokens.
Model-Specific Parameters
Each model family has its own parameter for controlling reasoning behavior. There are key differences between the OpenAI client and the Cerebras SDK when using non-standard OpenAI parameters. These examples use the Cerebras SDK. For more information, see Passing Non-Standard Parameters.
GPT-OSS: reasoning_effort
Use reasoning_effort to control how much reasoning the model performs:
"low"- Minimal reasoning, faster responses"medium"- Moderate reasoning (default)"high"- Extensive reasoning, more thorough analysis
GLM: disable_reasoning
Use disable_reasoning to toggle reasoning on or off. Set to true to disable reasoning, or false (default) to enable it.
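A corresponding sketch for GLM, assuming disable_reasoning is passed as a top-level keyword argument (the helper is illustrative):

```python
# Sketch: toggle reasoning for a GLM model via the disable_reasoning flag.
def build_glm_request(prompt: str, reasoning: bool = True) -> dict:
    return dict(
        model="zai-glm-4.6",
        messages=[{"role": "user", "content": prompt}],
        disable_reasoning=not reasoning,  # True turns reasoning off
    )

print(build_glm_request("Summarize this.", reasoning=False)["disable_reasoning"])  # -> True
```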
Reasoning Context Retention
Reasoning tokens are not automatically retained across requests. To maintain awareness of prior reasoning in multi-turn conversations, include the reasoning text in the content field of the assistant message.
Use the same format the model outputs: for GLM and Qwen, include reasoning in <think>...</think> tags; for GPT-OSS, prepend reasoning text directly before the answer.
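The per-model conventions above can be sketched as a small helper that builds the assistant message for the next turn (the helper is illustrative, not part of the SDK):

```python
# Sketch: embed prior reasoning in an assistant message, using each
# model family's own output format as described above.
def assistant_turn(model: str, reasoning: str, answer: str) -> dict:
    if model.startswith("gpt-oss"):
        # GPT-OSS: reasoning prepended directly, no separator tokens
        content = reasoning + answer
    else:
        # GLM / Qwen: reasoning wrapped in <think>...</think>
        content = f"<think>{reasoning}</think>{answer}"
    return {"role": "assistant", "content": content}

msg = assistant_turn("qwen3-32b", "2+2 is 4. ", "The answer is 4.")
print(msg["content"])  # -> <think>2+2 is 4. </think>The answer is 4.
```

Append this message to the conversation history before sending the next user turn.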