> **Note:** Reasoning flags are currently only available for the OpenAI GPT OSS model.
Reasoning is enabled through the `reasoning_effort` parameter of the `chat.completions.create` method. This parameter controls the amount of reasoning the model performs.
## 1. Initial Setup

Begin by importing the Cerebras SDK and setting up the client.
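A minimal setup sketch, assuming the API key is stored in the `CEREBRAS_API_KEY` environment variable:

```python
import os
from cerebras.cloud.sdk import Cerebras  # pip install cerebras-cloud-sdk

# The client also reads CEREBRAS_API_KEY from the environment
# when api_key is omitted.
client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))
```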
## 2. Using Reasoning

Set the `reasoning_effort` parameter within the `chat.completions.create` method to enable reasoning capabilities.

## Reasoning Effort Levels
The `reasoning_effort` parameter accepts the following values:
- `"low"` - Minimal reasoning, faster responses
- `"medium"` - Moderate reasoning (default)
- `"high"` - Extensive reasoning, more thorough analysis
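A minimal request sketch with reasoning enabled. The `gpt-oss-120b` model name and the prompt are illustrative assumptions; check the Cerebras model list for the exact identifier. The payload is built first so the call itself can be skipped when no API key is present:

```python
import os

# Request payload sketch; "gpt-oss-120b" is an assumed model identifier.
params = {
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Which is larger, 9.9 or 9.11?"}],
    "reasoning_effort": "medium",  # "low" | "medium" (default) | "high"
}

# The call itself requires the cerebras-cloud-sdk package and an API key.
if os.environ.get("CEREBRAS_API_KEY"):
    from cerebras.cloud.sdk import Cerebras

    client = Cerebras()  # reads CEREBRAS_API_KEY from the environment
    response = client.chat.completions.create(**params)
    print(response.choices[0].message.content)
```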
## Accessing Reasoning Tokens
When reasoning is enabled, the model's internal thought process is included in the response. The structure differs depending on whether you're using streaming or non-streaming responses.

### Non-Streaming Responses

In non-streaming responses, the reasoning content is included in a `reasoning` field within the message:
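A runnable sketch of the access pattern, using a stand-in object in place of a real API response (the field names follow the description above; the strings are illustrative):

```python
from types import SimpleNamespace

# Stand-in for the object returned by client.chat.completions.create(...),
# so the access pattern runs without an API call.
response = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(
    content="9.9 is larger than 9.11.",
    reasoning="Compare the fractional parts: 0.9 is greater than 0.11.",
))])

message = response.choices[0].message
answer = message.content       # the final answer text
reasoning = message.reasoning  # the model's internal reasoning text
```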
### Streaming Responses
When using streaming with reasoning models, reasoning tokens are delivered in the `reasoning` field of the delta for models that support it:
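A sketch of collecting reasoning and answer tokens from a stream. Stand-in chunk objects are used here so the pattern runs without an API call; a real stream would come from `client.chat.completions.create(..., stream=True)`:

```python
from types import SimpleNamespace

def split_stream(stream):
    """Collect reasoning and answer text from streamed chunks."""
    reasoning, answer = [], []
    for chunk in stream:
        delta = chunk.choices[0].delta
        # Reasoning tokens arrive in delta.reasoning for models that
        # support it; answer tokens arrive in delta.content as usual.
        if getattr(delta, "reasoning", None):
            reasoning.append(delta.reasoning)
        if getattr(delta, "content", None):
            answer.append(delta.content)
    return "".join(reasoning), "".join(answer)

# Stand-in chunks mirroring the delta shape described above.
def make_chunk(reasoning=None, content=None):
    return SimpleNamespace(choices=[SimpleNamespace(
        delta=SimpleNamespace(reasoning=reasoning, content=content))])

stream = [
    make_chunk(reasoning="Compare the "),
    make_chunk(reasoning="fractional parts."),
    make_chunk(content="9.9 is larger."),
]
thoughts, answer = split_stream(stream)
```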
## Reasoning Context Retention
Reasoning tokens are not automatically retained across requests. If you want the model to maintain awareness of its prior reasoning, you'll need to pass the reasoning tokens back into the conversation manually. To do this, include the reasoning text in the `content` field of an `assistant` message in your next request, alongside the assistant's answer.
For example:
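A sketch of a follow-up request, with illustrative strings standing in for the reasoning and answer read from a previous response:

```python
# Values that would come from the previous response, e.g.
# response.choices[0].message.reasoning / .content (illustrative strings here).
prior_reasoning = "Compare the fractional parts: 0.9 is greater than 0.11."
prior_answer = "9.9 is larger than 9.11."

messages = [
    {"role": "user", "content": "Which is larger, 9.9 or 9.11?"},
    {
        "role": "assistant",
        # Include the reasoning text alongside the answer so the model
        # retains awareness of its earlier thought process.
        "content": f"Reasoning: {prior_reasoning}\n\nAnswer: {prior_answer}",
    },
    {"role": "user", "content": "Walk me through that comparison again."},
]
```

This `messages` list would then be passed to the next `chat.completions.create` call in place of a fresh conversation.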