Reasoning flags are currently only available for the OpenAI GPT OSS model.
To control reasoning, set the reasoning_effort parameter in the chat.completions.create method. It determines how much internal reasoning the model performs before answering.
Step 1: Initial Setup

Begin by importing the Cerebras SDK and setting up the client.
import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    # This is the default and can be omitted
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)
Step 2: Using Reasoning

Set the reasoning_effort parameter within the chat.completions.create method to enable reasoning capabilities.
completion_create_response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Say hello to the world."
        },
        {
            "role": "assistant",
            "content": "Hello, world! 🌍"
        }
    ],
    model="gpt-oss-120b",
    stream=False,
    max_completion_tokens=65536,
    temperature=1,
    top_p=1,
    reasoning_effort="medium"
)

print(completion_create_response)

Reasoning Effort Levels

The reasoning_effort parameter accepts the following values, compared in the sketch after this list:
  • "low" - Minimal reasoning, faster responses
  • "medium" - Moderate reasoning (default)
  • "high" - Extensive reasoning, more thorough analysis

Accessing Reasoning Tokens

When reasoning is enabled, the model’s internal thought process is included in the response. The structure differs depending on whether you’re using streaming or non-streaming responses.

Non-Streaming Responses

In non-streaming responses, the reasoning content is included in a reasoning field within the message:
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Hello, World!",
        "reasoning": "The user is asking for a simple greeting to the world."
      }
    }
  ]
}
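
With the Python SDK, you can read this field directly from the parsed response object. A minimal sketch, assuming the SDK surfaces reasoning as an attribute on the message, mirroring the JSON above:

message = completion_create_response.choices[0].message
print("Answer:", message.content)
# The reasoning field may be absent when reasoning is unavailable,
# so fall back gracefully rather than assuming it exists.
print("Reasoning:", getattr(message, "reasoning", None))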

Streaming Responses

When streaming with reasoning enabled, reasoning tokens are delivered incrementally in the reasoning field of each chunk’s delta:
{
  "choices": [
    {
      "delta": {
        "reasoning": " should"
      },
      "index": 0
    }
  ]
}
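
In practice you accumulate these deltas as they arrive. Below is a sketch of a streaming loop that collects reasoning tokens separately from answer tokens; it assumes each chunk’s delta may carry a reasoning field, a content field, or neither, as in the chunk above:

stream = client.chat.completions.create(
    messages=[{"role": "user", "content": "Say hello to the world."}],
    model="gpt-oss-120b",
    stream=True,
    reasoning_effort="medium",
)

reasoning_parts, answer_parts = [], []
for chunk in stream:
    delta = chunk.choices[0].delta
    # Reasoning tokens arrive in the delta's reasoning field...
    if getattr(delta, "reasoning", None):
        reasoning_parts.append(delta.reasoning)
    # ...while the final answer arrives in content as usual.
    if getattr(delta, "content", None):
        answer_parts.append(delta.content)

print("Reasoning:", "".join(reasoning_parts))
print("Answer:", "".join(answer_parts))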

Reasoning Context Retention

Reasoning tokens are not automatically retained across requests. If you want the model to maintain awareness of its prior reasoning, you’ll need to pass the reasoning tokens back into the conversation manually. To do this, include the reasoning text in the content field of an assistant message in your next request, alongside the assistant’s answer. For example:
completion_create_response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Explain the difference between supervised and unsupervised learning."
        },
        {
            "role": "assistant",
            "content": "<think>Supervised learning uses labeled data…</think><answer>Supervised learning is…</answer>"
        },
        {
            "role": "user",
            "content": "Can you give an example?"
        }
    ],
    model="gpt-oss-120b",
    stream=False,
    max_completion_tokens=65536,
    temperature=1,
    top_p=1,
    reasoning_effort="medium"
)

print(completion_create_response)
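
If you are assembling that assistant message from an earlier response rather than writing it by hand, one approach is to wrap the returned reasoning and content fields yourself. This is a sketch, not an official helper: it reuses the <think>/<answer> convention from the example above and assumes the reasoning attribute from the non-streaming section.

previous = completion_create_response.choices[0].message
# Fold the prior reasoning and answer into a single assistant turn
# using the <think>/<answer> convention shown above.
assistant_turn = {
    "role": "assistant",
    "content": (
        f"<think>{getattr(previous, 'reasoning', '')}</think>"
        f"<answer>{previous.content}</answer>"
    ),
}
# Append assistant_turn (plus the user's follow-up question) to the
# messages list for the next client.chat.completions.create call.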