import os

from cerebras.cloud.sdk import Cerebras

# Read the API key from the environment instead of hard-coding it.
client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))

chat_completion = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Hello!"},
    ],
)

print(chat_completion)
{
  "id": "chatcmpl-292e278f-514e-4186-9010-91ce6a14168b",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Hello! How can I assist you today?",
        "reasoning": "The user is asking for a simple greeting to the world. This is a straightforward request that doesn't require complex analysis. I should provide a friendly, direct response.",
        "role": "assistant"
      }
    }
  ],
  "created": 1723733419,
  "model": "gpt-oss-120b",
  "system_fingerprint": "fp_70185065a4",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 10,
    "total_tokens": 22,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "time_info": {
    "queue_time": 0.000073161,
    "prompt_time": 0.0010744798888888889,
    "completion_time": 0.005658071111111111,
    "total_time": 0.022224903106689453,
    "created": 1723733419
  }
}
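As a minimal sketch of how the fields in this response might be read, the snippet below parses a trimmed copy of the sample JSON with the standard library and pulls out the reply text, the finish reason, and the token counts. It assumes the response is available as JSON text; how the SDK object is converted to JSON is not covered here. The throughput figure also assumes that the `time_info` values are expressed in seconds, which the sample does not state.

```python
import json

# Trimmed copy of the sample response shown above (only the fields used here).
raw = """
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Hello! How can I assist you today?",
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 10,
    "total_tokens": 22
  },
  "time_info": {
    "completion_time": 0.005658071111111111,
    "total_time": 0.022224903106689453
  }
}
"""

response = json.loads(raw)

# With a single completion there is one entry in "choices".
first_choice = response["choices"][0]
reply = first_choice["message"]["content"]
finish_reason = first_choice["finish_reason"]

usage = response["usage"]
timing = response["time_info"]

# Assumption: time_info values are in seconds.
tokens_per_second = usage["completion_tokens"] / timing["completion_time"]

print(reply)
print(f"finish_reason={finish_reason}, total_tokens={usage['total_tokens']}")
print(f"about {tokens_per_second:.0f} completion tokens per second")
```

Checking `finish_reason` before using the reply is a reasonable habit, since a value other than `stop` (for example `length`) suggests the text may be cut short.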
False
qwen-3-32b = 40k | llama-3.3-70b = 64k.
messages parameter as a string. Support for other object types will be added in future releases.
llama3.1-8b, llama-3.3-70b, qwen-3-32b, qwen-3-235b-a22b-instruct-2507 (preview), gpt-oss-120b, zai-glm-4.6 (preview)
true
string
array
content.
"low" - Minimal reasoning, faster responses
"medium" - Moderate reasoning (default)
"high" - Extensive reasoning, more thorough analysis
{ "type": "json_schema", "json_schema": { "name": "schema_name", "strict": true, "schema": {...} } } enforces schema compliance by ensuring that the model output conforms to your specified JSON schema. See Structured Outputs for more information.
Setting { "type": "json_object" } enables the legacy JSON mode, ensuring that the model output is valid JSON. However, using json_schema is recommended for models that support it.
object
text.
object
true, enforces strict adherence to the schema. The model will only return fields defined in the schema and with the correct types. When false, behaves similarly to JSON mode but uses the schema as a guide. Defaults to false.
json_schema.
object
json_schema is recommended for models that support it. To use json_object, remember to also include a system or user message to specify the desired format.
json_object.
json_object is not compatible with streaming - stream must be set to false.
seed and parameters should return the same result. Determinism is not guaranteed.
logprobs must be set to true if this parameter is used.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
none is the default when no tools are present. auto is the default if tools are present.
function is supported.
n is greater than 1.
stop, length, content_filter, tool_calls.
chat.completion.
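The object shapes quoted above for structured output and for forcing a tool call can be assembled as plain dictionaries. The sketch below only builds and checks request arguments; it does not call the API. The helper names (`json_schema_format`, `forced_tool_choice`, `check_json_object_request`), the `greeting` schema, and the prompt are hypothetical and exist only for illustration. The keyword names `response_format` and `tool_choice` are assumed here, since the parameter names themselves do not survive in the text above.

```python
import json


def json_schema_format(name, schema, strict=True):
    """Build the {"type": "json_schema", ...} object described above."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }


def forced_tool_choice(function_name):
    """Build the object that forces the model to call one specific function."""
    return {"type": "function", "function": {"name": function_name}}


def check_json_object_request(kwargs):
    """Reject json_object combined with streaming, which the text says is unsupported."""
    fmt = kwargs.get("response_format") or {}
    if fmt.get("type") == "json_object" and kwargs.get("stream"):
        raise ValueError("json_object is not compatible with streaming; set stream to false")
    return kwargs


# Hypothetical schema used only for this example.
greeting_schema = {
    "type": "object",
    "properties": {"greeting": {"type": "string"}},
    "required": ["greeting"],
}

request_kwargs = check_json_object_request(
    {
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": "Greet the user in JSON."}],
        "response_format": json_schema_format("greeting", greeting_schema),
    }
)

print(json.dumps(request_kwargs, indent=2))
print(json.dumps(forced_tool_choice("my_function")))
```

Under these assumptions, the resulting dictionary would be passed as `client.chat.completions.create(**request_kwargs)`. A forced tool choice is only meaningful when the named function is also declared among the request's tools.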