import os

from cerebras.cloud.sdk import Cerebras

client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))

chat_completion = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Hello!"},
    ],
)

print(chat_completion)
{
  "id": "chatcmpl-292e278f-514e-4186-9010-91ce6a14168b",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Hello! How can I assist you today?",
        "reasoning": "The user is asking for a simple greeting to the world. This is a straightforward request that doesn't require complex analysis. I should provide a friendly, direct response.",
        "role": "assistant"
      }
    }
  ],
  "created": 1723733419,
  "model": "gpt-oss-120b",
  "system_fingerprint": "fp_70185065a4",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 10,
    "total_tokens": 22,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "time_info": {
    "queue_time": 0.000073161,
    "prompt_time": 0.0010744798888888889,
    "completion_time": 0.005658071111111111,
    "total_time": 0.022224903106689453,
    "created": 1723733419
  }
}
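Reading the fields of a response like this one can be sketched as follows. The snippet works on a plain dictionary holding a trimmed copy of the sample response rather than on a live SDK object, and the helper name `summarize` is hypothetical. It assumes only the field names shown in the sample, and it assumes that `completion_time` is measured in seconds.

```python
import json

# Trimmed copy of the sample response, kept as plain JSON text.
SAMPLE = """
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Hello! How can I assist you today?",
        "role": "assistant"
      }
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 10, "total_tokens": 22},
  "time_info": {"completion_time": 0.005658071111111111}
}
"""


def summarize(response: dict) -> dict:
    """Collect the reply text, token counts, and a rough generation speed."""
    choice = response["choices"][0]
    usage = response["usage"]
    completion_time = response["time_info"]["completion_time"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage["total_tokens"],
        # Completion tokens divided by completion time (assumed to be seconds).
        "tokens_per_second": usage["completion_tokens"] / completion_time,
    }


summary = summarize(json.loads(SAMPLE))
print(summary["text"])
```

Working from a dictionary keeps the sketch independent of the SDK's response classes, so the same function can be pointed at logged JSON responses.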
flex or auto service tiers. Requests are preemptively rejected if the rolling average queue time exceeds this threshold.
50 - 20000 (milliseconds)
Default: System default if not specified
See Service Tiers for more information.

false - Thinking from all previous turns is preserved in the conversation history. Recommended for agentic workflows where reasoning from past tool-calling turns may be relevant for future tool calls.
true (default) - Thinking from earlier turns is excluded. Recommended for general chat conversations where reasoning from past turns is less relevant for performance.
null, the API defaults to clear_thinking: true.
false

messages parameter as a string. Support for other object types will be added in future releases.

llama3.1-8b
llama-3.3-70b
qwen-3-32b
qwen-3-235b-a22b-instruct-2507 (preview)
gpt-oss-120b
zai-glm-4.7 (preview)

true
string
array
content.

"low" - Minimal reasoning, faster responses
"medium" - Moderate reasoning (default)
"high" - Extensive reasoning, more thorough analysis

{ "type": "json_schema", "json_schema": { "name": "schema_name", "strict": true, "schema": {...} } } enforces schema compliance by ensuring that the model output conforms to your specified JSON schema. See Structured Outputs for more information.

Setting { "type": "json_object" } enables the legacy JSON mode, ensuring that the model output is valid JSON. However, using json_schema is recommended for models that support it.
object
text.
object
true, enforces strict adherence to the schema. The model will only return fields defined in the schema and with the correct types. When false, behaves similarly to JSON mode but uses the schema as a guide. Defaults to false.
json_schema.
object
json_schema is recommended for models that support it. To use json_object, remember to also include a system or user message to specify the desired format.
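The two response format shapes described here can be built as plain dictionaries. The following is a minimal sketch, not the SDK's own helper: the function name `json_schema_format` and the `movie_schema` example are hypothetical and exist only for illustration.

```python
def json_schema_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Build a response format value in the json_schema shape described above."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }


# Hypothetical schema used only for illustration.
movie_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["title", "year"],
    "additionalProperties": False,
}

# Structured output with strict schema enforcement.
response_format = json_schema_format("movie", movie_schema)

# Legacy JSON mode: only guarantees valid JSON, not a particular shape.
legacy_format = {"type": "json_object"}

print(response_format)
```

With the legacy shape, the desired structure still has to be described in a system or user message, since the request carries no schema.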
json_object.
json_object is not compatible with streaming - stream must be set to false.

seed and parameters should return the same result. Determinism is not guaranteed.

priority - Highest priority processing (Only available for dedicated endpoints, not shared endpoints.)
default - Standard priority processing
auto - Automatically uses the highest available service tier
flex - Lowest priority processing
default
See Service Tiers for more information.

logprobs must be set to true if this parameter is used.

none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
none is the default when no tools are present. auto is the default if tools are present.

true
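The tool selection rules above (none, auto, required, or a specific function, with the default depending on whether tools are present) can be sketched client-side as follows. This is an illustration of the documented defaults, not code from the SDK: `force_tool`, `effective_tool_choice`, and the sample `tools` list are hypothetical names.

```python
from typing import Optional, Union

ToolChoice = Union[str, dict]


def force_tool(name: str) -> dict:
    """Build the object form that forces a call to one named function."""
    return {"type": "function", "function": {"name": name}}


def effective_tool_choice(
    tools: Optional[list], tool_choice: Optional[ToolChoice] = None
) -> ToolChoice:
    """Return the explicit choice, or the documented default when none is given."""
    if tool_choice is not None:
        return tool_choice
    # "auto" is the default if tools are present, "none" otherwise.
    return "auto" if tools else "none"


# Hypothetical tool list used only for illustration.
tools = [{"type": "function", "function": {"name": "my_function"}}]

print(effective_tool_choice(None))
print(effective_tool_choice(tools))
print(effective_tool_choice(tools, force_tool("my_function")))
```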
function is supported.
n is greater than 1.
stop, length, content_filter, tool_calls.
chat.completion.
service_tier is set to auto in the request.
Possible values: priority, default, flex
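A caller will usually branch on the finish reason of each choice. The sketch below maps the four listed values to a suggested follow-up. The descriptions follow the usual chat completions convention and are an assumption, since the text above lists only the values; `next_step` is a hypothetical helper name.

```python
# Assumed interpretation of each finish reason (not stated in the text above).
ACTIONS = {
    "stop": "use the message content as the final answer",
    "length": "output was cut off; retry with a higher token limit",
    "content_filter": "content was withheld by a filter",
    "tool_calls": "run the requested tools and send their results back",
}


def next_step(finish_reason: str) -> str:
    """Return a suggested follow-up for a finish reason, rejecting unknown values."""
    if finish_reason not in ACTIONS:
        raise ValueError(f"unexpected finish_reason: {finish_reason!r}")
    return ACTIONS[finish_reason]


print(next_step("stop"))
```

Raising on unknown values, rather than silently treating them as a normal stop, makes it obvious when a new finish reason appears.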