Reasoning capabilities are currently available for the OpenAI GPT OSS (`gpt-oss-120b`) and Z.ai GLM 4.6 (`zai-glm-4.6`) models. Each model uses a different parameter to control reasoning.

## Reasoning with OpenAI GPT OSS

Use the `reasoning_effort` parameter to control the amount of reasoning the model performs.
### 1. Initial Setup

Begin by importing the Cerebras SDK and setting up the client.
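A minimal setup sketch. It assumes the Cerebras Cloud SDK is installed (`pip install cerebras-cloud-sdk`) and that a `CEREBRAS_API_KEY` environment variable is set; the import is deferred into a helper only so the snippet loads even where the SDK is absent — in a real script you would import `Cerebras` at the top.

```python
import os


def make_client():
    """Build a Cerebras client from the CEREBRAS_API_KEY environment variable."""
    # Deferred import; in practice, place this at the top of your script.
    from cerebras.cloud.sdk import Cerebras

    return Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])


# client = make_client()
```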
### 2. Using Reasoning with reasoning_effort

Set the `reasoning_effort` parameter within the `chat.completions.create` method to control reasoning capabilities.

#### Reasoning Effort Levels

This applies only to `gpt-oss-120b`. The `reasoning_effort` parameter accepts the following values:

- `"low"` - Minimal reasoning, faster responses
- `"medium"` - Moderate reasoning (default)
- `"high"` - Extensive reasoning, more thorough analysis
## Reasoning with Z.ai GLM 4.6

Use the `disable_reasoning` parameter to toggle reasoning on or off.
There are key differences between the OpenAI client and the Cerebras SDK when using non-standard OpenAI parameters. This example uses the Cerebras SDK. For more info, see Passing Non-Standard Parameters.
### 1. Initial Setup

Begin by importing the Cerebras SDK and setting up the client.
### 2. Using Reasoning with disable_reasoning

Set the `disable_reasoning` parameter within the `chat.completions.create` method to control reasoning. Set it to `true` to disable reasoning, or to `false` (or omit it) to enable reasoning.

## Accessing Reasoning Tokens

When reasoning is enabled, the model's internal thought process is included in the response. The structure differs depending on whether you're using streaming or non-streaming responses.

### Non-Streaming Responses
In non-streaming responses, the reasoning content is included in a `reasoning` field within the message:
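A sketch under the setup above, with the network call itself commented out; the helper, `getattr` guard, and prompt are illustrative:

```python
def split_reasoning(response):
    """Return (reasoning, answer) from a non-streaming chat completion.

    The reasoning text lives in the message's `reasoning` field; the
    `getattr` guard covers responses where the field is absent.
    """
    message = response.choices[0].message
    return getattr(message, "reasoning", None), message.content


# With the client from the setup step (reasoning is on by default):
# response = client.chat.completions.create(
#     model="zai-glm-4.6",
#     messages=[{"role": "user", "content": "What is 17 * 24?"}],
#     # disable_reasoning=False,  # omit or set False to keep reasoning on
# )
# reasoning, answer = split_reasoning(response)
```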
### Streaming Responses

When using streaming with reasoning models, reasoning tokens are delivered in the `reasoning` field of the delta for models that support it:
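A sketch of accumulating both kinds of tokens from a stream; the helper and its guards are illustrative:

```python
def collect_stream(stream):
    """Accumulate reasoning and answer text from a streaming response.

    Reasoning deltas arrive in `delta.reasoning` and answer deltas in
    `delta.content`; either may be absent on any given chunk.
    """
    reasoning_parts, content_parts = [], []
    for chunk in stream:
        if not chunk.choices:  # e.g. a trailing chunk with no choices
            continue
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning", None):
            reasoning_parts.append(delta.reasoning)
        if getattr(delta, "content", None):
            content_parts.append(delta.content)
    return "".join(reasoning_parts), "".join(content_parts)


# With the client from the setup step:
# stream = client.chat.completions.create(
#     model="zai-glm-4.6",
#     messages=[{"role": "user", "content": "What is 17 * 24?"}],
#     stream=True,
# )
# reasoning, answer = collect_stream(stream)
```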
## Reasoning Context Retention

Reasoning tokens are not automatically retained across requests. If you want the model to maintain awareness of its prior reasoning, you'll need to pass the reasoning tokens back into the conversation manually. To do this, include the reasoning text in the `content` field of an assistant message in your next request, alongside the assistant's answer.
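For example, a sketch of folding the previous turn's reasoning back into the next request (the message contents and the separator between reasoning and answer are illustrative):

```python
# Output from the previous turn (illustrative values).
prior_reasoning = "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408."
prior_answer = "The answer is 408."

messages = [
    {"role": "user", "content": "What is 17 * 24?"},
    {
        "role": "assistant",
        # Reasoning text placed in `content` alongside the answer so the
        # model can see its earlier thought process.
        "content": prior_reasoning + "\n\n" + prior_answer,
    },
    {"role": "user", "content": "Now divide that result by 6."},
]

# next_response = client.chat.completions.create(
#     model="zai-glm-4.6",
#     messages=messages,
# )
```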