Reasoning

Reasoning capabilities are currently available for the OpenAI GPT OSS (gpt-oss-120b) and Z.ai GLM 4.6 (zai-glm-4.6) models. Each model uses different parameters to control reasoning.

Reasoning with OpenAI GPT OSS

Use the reasoning_effort parameter to control the amount of reasoning the model performs.

Initial Setup

Begin by importing the Cerebras SDK and setting up the client.

import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    # This is the default and can be omitted
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)

Using Reasoning with reasoning_effort

Set the reasoning_effort parameter within the chat.completions.create method to control reasoning capabilities.

completion_create_response = client.chat.completions.create(
  messages=[
      {
          "role": "system",
          "content": "You are a helpful assistant."
      },
      {
          "role": "user",
          "content": "Say hello to the world."
      },
      {
          "role": "assistant",
          "content": "Hello, world! 🌍"
      }
  ],
  model="gpt-oss-120b",
  stream=False,
  max_completion_tokens=65536,
  temperature=1,
  top_p=1,
  reasoning_effort="medium"
)

print(completion_create_response)

Reasoning Effort Levels

This applies only to gpt-oss-120b.

The reasoning_effort parameter accepts the following values:

"low" - Minimal reasoning, faster responses
"medium" - Moderate reasoning (default)
"high" - Extensive reasoning, more thorough analysis

Reasoning with Z.ai GLM 4.6

Use the disable_reasoning parameter to toggle reasoning on or off.

There are key differences between the OpenAI client and the Cerebras SDK when using non-standard OpenAI parameters. This example uses the Cerebras SDK. For more info, see Passing Non-Standard Parameters.

Initial Setup

Begin by importing the Cerebras SDK and setting up the client.

import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    # This is the default and can be omitted
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)

Using Reasoning with disable_reasoning

Set the disable_reasoning parameter within the chat.completions.create method to control reasoning. Set to true to disable reasoning, or false (or omit) to enable it.

completion_create_response = client.chat.completions.create(
  messages=[
      {
          "role": "user",
          "content": "Explain how photosynthesis works."
      }
  ],
  model="zai-glm-4.6",
  stream=False,
  max_completion_tokens=65536,
  disable_reasoning=False  # Set to True to disable reasoning
)

print(completion_create_response)

Accessing Reasoning Tokens

When reasoning is enabled, the model’s internal thought process is included in the response format. The structure differs depending on whether you’re using streaming or non-streaming responses.

Non-Streaming Responses

In non-streaming responses, the reasoning content is included in a reasoning field within the message:

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Hello, World!",
        "reasoning": "The user is asking for a simple greeting to the world."
      }
    }
  ]
}

Streaming Responses

When using streaming with reasoning models, reasoning tokens are delivered in the reasoning field of the delta for models that support it:

{
  "choices": [
    {
      "delta": {
        "reasoning": " should"
      },
      "index": 0
    }
  ]
}

Reasoning Context Retention

Reasoning tokens are not automatically retained across requests. If you want the model to maintain awareness of its prior reasoning, you’ll need to pass the reasoning tokens back into the conversation manually. To do this, include the reasoning text in the content field of an assistant message in your next request, alongside the assistant’s answer. For example:

Non-Streaming
Streaming

completion_create_response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Explain the difference between supervised and unsupervised learning."
        },
        {
            "role": "assistant",
            "content": "Supervised learning uses labeled data…<answer>Supervised learning is…</answer>"
        },
        {
            "role": "user",
            "content": "Can you give an example?"
        }
    ],
    model="gpt-oss-120b",
    stream=False,
    max_completion_tokens=65536,
    temperature=1,
    top_p=1,
    reasoning_effort="medium"
)

print(completion_create_response)

Get Started

Capabilities

Resources

Support

Reasoning with OpenAI GPT OSS

Reasoning Effort Levels

Reasoning with Z.ai GLM 4.6

Accessing Reasoning Tokens

Non-Streaming Responses

Streaming Responses

Reasoning Context Retention

Get Started

Capabilities

Resources

Support

​Reasoning with OpenAI GPT OSS

​Reasoning Effort Levels

​Reasoning with Z.ai GLM 4.6

​Accessing Reasoning Tokens

​Non-Streaming Responses

​Streaming Responses

​Reasoning Context Retention

Reasoning with OpenAI GPT OSS

Reasoning Effort Levels

Reasoning with Z.ai GLM 4.6

Accessing Reasoning Tokens

Non-Streaming Responses

Streaming Responses

Reasoning Context Retention