import os

from cerebras.cloud.sdk import Cerebras

client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))

chat_completion = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Hello!"},
    ],
)

print(chat_completion)
{
  "id": "chatcmpl-30b3c3d8-ca41-48e7-9ef0-27e322604a13",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! 👋 How can I help you today?",
        "reasoning": "The user just says \"Hello!\" with no further context.\n\nWe need to respond politely, maybe ask how can assist.\n\nWe can also be friendly.\n\nNo constraints. We'll respond as chat.",
        "tool_calls": null
      },
      "logprobs": null,
      "reasoning_logprobs": null
    }
  ],
  "created": 1769729480,
  "model": "gpt-oss-120b",
  "object": "chat.completion",
  "system_fingerprint": "fp_e7ab83753cbd28777b40",
  "time_info": {
    "completion_time": 0.04040406,
    "prompt_time": 0.00383762,
    "queue_time": 0.004628115,
    "total_time": 0.05013155937194824,
    "created": 1769729480.0787008
  },
  "usage": {
    "completion_tokens": 59,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0,
      "reasoning_tokens": 0
    },
    "prompt_tokens": 69,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "total_tokens": 128
  },
  "service_tier": null
}
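As a sketch of how the fields in the sample response might be read, the snippet below works on a plain dict trimmed from the JSON above rather than on the SDK's response object, so it runs without an API key. The helper name `summarize_completion` and the tokens-per-second figure (`completion_tokens` divided by `completion_time`) are assumptions made for illustration, not part of the API.

```python
from typing import Any, Dict

# Trimmed copy of the sample response above, as a plain dict.
sample_response: Dict[str, Any] = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! 👋 How can I help you today?",
            },
        }
    ],
    "model": "gpt-oss-120b",
    "time_info": {"completion_time": 0.04040406, "total_time": 0.05013155937194824},
    "usage": {"completion_tokens": 59, "prompt_tokens": 69, "total_tokens": 128},
}


def summarize_completion(response: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: pull commonly used fields out of a response dict."""
    choice = response["choices"][0]
    usage = response["usage"]
    completion_time = response["time_info"]["completion_time"]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage["total_tokens"],
        # Derived metric (an assumption of this sketch, not an API field).
        "completion_tokens_per_second": usage["completion_tokens"] / completion_time,
    }


summary = summarize_completion(sample_response)
print(summary["content"])
print(summary["finish_reason"], summary["total_tokens"])
```

The object returned by the SDK may expose these values as attributes rather than dict keys, so the access pattern may need adjusting.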
flex or auto service tiers. Requests are preemptively rejected if the rolling average queue time exceeds this threshold.
50 - 20000 (milliseconds)
Default: System default if not specified
See Service Tiers for more information.
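The limits above could be checked on the client before a request is sent. This is a minimal sketch under several assumptions: the parameter's name is not shown in this excerpt, so `validate_queue_threshold` and its arguments are hypothetical; the 50 - 20000 range is treated as inclusive; and the threshold is taken to apply only to the flex and auto tiers, as the truncated sentence suggests.

```python
# Limits taken from the text above; inclusive bounds are an assumption.
ALLOWED_TIERS = {"flex", "auto"}
MIN_THRESHOLD_MS = 50
MAX_THRESHOLD_MS = 20000


def validate_queue_threshold(threshold_ms: int, service_tier: str) -> int:
    """Hypothetical client-side check mirroring the documented limits."""
    if service_tier not in ALLOWED_TIERS:
        raise ValueError(
            f"queue threshold applies to {sorted(ALLOWED_TIERS)}, got {service_tier!r}"
        )
    if not MIN_THRESHOLD_MS <= threshold_ms <= MAX_THRESHOLD_MS:
        raise ValueError(
            f"threshold must be {MIN_THRESHOLD_MS}-{MAX_THRESHOLD_MS} ms, got {threshold_ms}"
        )
    return threshold_ms


print(validate_queue_threshold(500, "flex"))
```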
object
assistant.
function is supported.
object
developer messages replace the previous system messages, but system is still accepted.
developer role is currently only available for the gpt-oss-120b model.
developer.
object
system.
object
tool.
object
string
array
user.
llama3.1-8b
qwen-3-235b-a22b-instruct-2507 (preview)
gpt-oss-120b
zai-glm-4.7 (preview)
false - Thinking from all previous turns is preserved in the conversation history. Recommended for agentic workflows where reasoning from past tool-calling turns may be relevant for future tool calls.
true (default) - Thinking from earlier turns is excluded. Recommended for general chat conversations where reasoning from past turns is less relevant for performance.
null, the API defaults to clear_thinking: true.
false
true
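The clear_thinking guidance above can be sketched as a small request builder. The helper names `effective_clear_thinking` and `build_request` are hypothetical, and the snippet only assembles a dict; whether a given model or SDK version accepts clear_thinking is not stated in this excerpt and should be checked.

```python
from typing import Any, Dict, List, Optional


def effective_clear_thinking(value: Optional[bool]) -> bool:
    """Mirror the documented fallback: null/None behaves like clear_thinking: true."""
    return True if value is None else value


def build_request(
    model: str, messages: List[Dict[str, str]], agentic: bool
) -> Dict[str, Any]:
    """Hypothetical helper: keep past thinking for agentic runs, drop it for chat."""
    return {
        "model": model,
        "messages": messages,
        # false preserves thinking from earlier turns; true excludes it.
        "clear_thinking": not agentic,
    }


request = build_request(
    "gpt-oss-120b", [{"role": "user", "content": "Hello!"}], agentic=True
)
print(request["clear_thinking"])
```

A dict like this could be unpacked into the create call shown at the top of the page, assuming the installed SDK accepts the parameter.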
string
array
content.
"low" – Minimal reasoning, faster responses
"medium" – Moderate reasoning (default)
"high" – Extensive reasoning, more thorough analysis
"none" – Disables reasoning entirely
{ "type": "json_schema", "json_schema": { "name": "schema_name", "strict": true, "schema": {...} } } enforces schema compliance by ensuring that the model output conforms to your specified JSON schema. See Structured Outputs for more information.
Setting { "type": "json_object" } enables the legacy JSON mode, ensuring that the model output is valid JSON. However, using json_schema is recommended for models that support it.
object
text.
object
true, enforces strict adherence to the schema. The model will only return fields defined in the schema and with the correct types. When false, behaves similarly to JSON mode but uses the schema as a guide. Defaults to false.
json_schema.
object
json_schema is recommended for models that support it. To use json_object, remember to also include a system or user message to specify the desired format.
json_object.
json_object is not compatible with streaming - stream must be set to false.
seed and parameters should return the same result. Determinism is not guaranteed.
priority - Highest priority processing (Only available for dedicated endpoints, not shared endpoints.)
default - Standard priority processing
auto - Automatically uses the highest available service tier
flex - Lowest priority processing
default
See Service Tiers for more information.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
none is the default when no tools are present. auto is the default if tools are present.
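The response_format shapes and tool_choice defaults described above can be sketched as plain-Python helpers. All three function names are hypothetical and nothing here calls the API; the snippet only builds and checks the values a request would carry.

```python
from typing import Any, Dict, List, Optional


def json_schema_format(name: str, schema: Dict[str, Any], strict: bool = True) -> Dict[str, Any]:
    """Build a response_format value in the json_schema shape quoted above."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }


def check_response_format(response_format: Dict[str, Any], stream: bool) -> None:
    """Reject json_object combined with streaming, per the note above."""
    if response_format.get("type") == "json_object" and stream:
        raise ValueError("json_object requires stream to be false")


def effective_tool_choice(tools: Optional[List[Dict[str, Any]]], tool_choice: Any = None) -> Any:
    """Return an explicit tool_choice, else the documented default."""
    if tool_choice is not None:
        return tool_choice
    # none when no tools are present, auto when tools are present.
    return "auto" if tools else "none"


fmt = json_schema_format(
    "greeting",
    {"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
)
check_response_format(fmt, stream=True)  # json_schema is not restricted by this check
print(fmt["json_schema"]["name"], effective_tool_choice(None))
```

Note that `json_schema_format` defaults strict to True for convenience, whereas the API's own default is false.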
function is supported.
logprobs must be set to true if this parameter is used.
n is greater than 1.
stop, length, content_filter, tool_calls.
chat.completion.
null if not specified.
service_tier is set to auto in the request.
Possible values: priority, default, flex
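The four finish_reason values listed above lend themselves to a simple dispatch. This is a sketch only: the function name `next_action` and the action labels are hypothetical, and the meaning attached to each value follows the common chat-completions convention, which this excerpt does not spell out.

```python
KNOWN_FINISH_REASONS = {"stop", "length", "content_filter", "tool_calls"}


def next_action(finish_reason: str) -> str:
    """Hypothetical mapping from finish_reason to what a caller might do next."""
    if finish_reason not in KNOWN_FINISH_REASONS:
        raise ValueError(f"unexpected finish_reason: {finish_reason!r}")
    if finish_reason == "tool_calls":
        return "run_tools"  # assumed: the model asked for one or more tools
    if finish_reason == "length":
        return "handle_truncation"  # assumed: output hit a token limit
    if finish_reason == "content_filter":
        return "report_filtered"  # assumed: output was withheld by a filter
    return "use_content"  # "stop": the message is complete


print(next_action("stop"))
```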