> ## Documentation Index
> Fetch the complete documentation index at: https://inference-docs.cerebras.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Chat Completions

Generate conversational responses using a structured message format with roles (system, user, assistant, developer, tool). Best for chatbots, assistants, and multi-turn conversations.

Parameter support can differ depending on the model used to generate the response, particularly for newer reasoning models. For details about parameters in reasoning models, refer to the [Reasoning Guide](/capabilities/reasoning).

## Request

### Headers

<ParamField header="Content-Type" type="string">
  The media type of the request body.

  **Supported values:** `application/json`, `application/vnd.msgpack`

  **Default:** `application/json`

  See [Payload Optimization](/capabilities/payload-optimization) for details.
</ParamField>

<ParamField header="Content-Encoding" type="string">
  The compression encoding applied to the request body.

  **Supported values:** `gzip`

  When set, the request body must be gzip-compressed. Can be combined with any supported `Content-Type`.

  See [Payload Optimization](/capabilities/payload-optimization) for details.
</ParamField>
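
As a minimal sketch of how the `Content-Type` and `Content-Encoding` headers fit together, the snippet below serializes a request body as JSON and gzip-compresses it with the Python standard library. The helper name `build_compressed_request` is hypothetical, and actually sending the bytes with an HTTP client is left out.

```python
import gzip
import json


def build_compressed_request(payload: dict) -> tuple[dict, bytes]:
    """Return (headers, body) for a gzip-compressed JSON request."""
    raw = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",  # tells the server the body is gzip-compressed
    }
    return headers, gzip.compress(raw)


headers, body = build_compressed_request(
    {"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "Hello!"}]}
)
```

Compression tends to pay off mainly for large prompts; for a payload this small the compressed body may be no smaller than the original.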

<ParamField header="queue_threshold" type="string">
  Controls the queue time threshold for requests using the `flex` or `auto` service tiers. Requests are preemptively rejected if the rolling average queue time exceeds this threshold.

  <Callout icon="lock" color="#b2b1b1ff" iconType="regular">
    This feature is in **Private Preview**. For access or more information, [contact us](https://www.cerebras.ai/contact) or reach out to your account representative.
  </Callout>

  **Valid range:** `50` - `20000` (milliseconds)

  **Default:** System default if not specified

  See [Service Tiers](/capabilities/service-tiers) for more information.
</ParamField>

### Body

<ParamField path="messages" type="object[]" required="true">
  A list of messages comprising the conversation so far.

  <Expandable title="possible types">
    <br />

    ##### Assistant message `object`

    Messages sent by the model in response to user messages.

    <Expandable title="properties">
      <ResponseField name="content" type="string | array | null">
        The contents of the assistant message.

        <Expandable title="possible types">
          <br />

          ##### Text content `string`

          The contents of the assistant message.

          ***

          ##### Array of content parts `array`

          An array of content parts with a defined type. For assistant messages, only type `text` is supported.

          <Expandable title="possible types">
            <ResponseField name="text" type="string" required>
              The text content.
            </ResponseField>

            <ResponseField name="type" type="string" required>
              The type of content part. Always `text`.
            </ResponseField>
          </Expandable>

          <br />
        </Expandable>
      </ResponseField>

      <ResponseField name="role" type="string" required>
        The role of the message's author, in this case `assistant`.
      </ResponseField>

      <ResponseField name="name" type="string">
        An optional name for the participant. Provides the model information to differentiate between participants of the same role.
      </ResponseField>

      <ResponseField name="reasoning" type="string | null">
        The reasoning content from the model's response. Include this when passing back assistant messages that contained reasoning.
      </ResponseField>

      <ResponseField name="tool_calls" type="array">
        The tool calls generated by the model, such as function calls.

        <Expandable title="properties">
          <ResponseField name="id" type="string" required>
            The ID of the tool call.
          </ResponseField>

          <ResponseField name="type" type="string" required>
            The type of the tool. Currently, only `function` is supported.
          </ResponseField>

          <ResponseField name="function" type="object" required>
            The function that the model called.

            <Expandable title="properties">
              <ResponseField name="name" type="string" required>
                The name of the function to call.
              </ResponseField>

              <ResponseField name="arguments" type="string" required>
                The arguments to call the function with, as generated by the model in JSON format.
              </ResponseField>
            </Expandable>
          </ResponseField>
        </Expandable>
      </ResponseField>
    </Expandable>

    <br />

    ##### Developer message `object`

    Developer-provided instructions that the model should follow, regardless of messages sent by the user. For gpt-oss-120b, `developer` messages replace the previous `system` messages, but `system` is still accepted.

    <Note>
      The `developer` role is currently only available for the [gpt-oss-120b](/models/openai-oss) model.
    </Note>

    <Expandable title="properties">
      <ResponseField name="content" type="string | array" required>
        The contents of the developer message.

        <Expandable title="possible types">
          <br />

          ##### Text content `string`

          The contents of the developer message.

          ***

          ##### Array of content parts `array`

          An array of content parts with a defined type. For developer messages, only type `text` is supported.

          <Expandable title="properties">
            <ResponseField name="text" type="string" required>
              The text content.
            </ResponseField>

            <ResponseField name="type" type="string" required>
              The type of content part. Always `text`.
            </ResponseField>
          </Expandable>

          <br />
        </Expandable>
      </ResponseField>

      <ResponseField name="role" type="string" required>
        The role of the message's author, in this case `developer`.
      </ResponseField>

      <ResponseField name="name" type="string">
        An optional name for the participant. Provides the model information to differentiate between participants of the same role.
      </ResponseField>
    </Expandable>

    <br />

    ##### System message `object`

    Developer-provided instructions that the model should follow, regardless of messages sent by the user.

    <Expandable title="properties">
      <ResponseField name="content" type="string | array" required>
        The contents of the system message.

        <Expandable title="possible types">
          <br />

          ##### Text content `string`

          The contents of the system message.

          ***

          ##### Array of content parts `array`

          An array of content parts with a defined type. For system messages, only type `text` is supported.

          <Expandable title="possible types">
            <ResponseField name="text" type="string" required>
              The text content.
            </ResponseField>

            <ResponseField name="type" type="string" required>
              The type of content part. Always `text`.
            </ResponseField>
          </Expandable>

          <br />
        </Expandable>
      </ResponseField>

      <ResponseField name="role" type="string" required>
        The role of the message's author, in this case `system`.
      </ResponseField>

      <ResponseField name="name" type="string">
        An optional name for the participant. Provides the model information to differentiate between participants of the same role.
      </ResponseField>
    </Expandable>

    <br />

    ##### Tool message `object`

    Tool message containing the result of a tool call.

    <Expandable title="properties">
      <ResponseField name="content" type="string | array" required>
        The contents of the tool message.

        <Expandable title="possible types">
          <br />

          ##### Text content `string`

          The contents of the tool message.

          ***

          ##### Array of content parts `array`

          An array of content parts with a defined type. For tool messages, only type `text` is supported.

          <Expandable title="possible types">
            <ResponseField name="text" type="string" required>
              The text content.
            </ResponseField>

            <ResponseField name="type" type="string" required>
              The type of content part. Always `text`.
            </ResponseField>
          </Expandable>

          <br />
        </Expandable>
      </ResponseField>

      <ResponseField name="role" type="string" required>
        The role of the message's author, in this case `tool`.
      </ResponseField>

      <ResponseField name="tool_call_id" type="string" required>
        The ID of the tool call that this message is responding to.
      </ResponseField>
    </Expandable>

    <br />

    ##### User message `object`

    Messages sent by an end user, containing prompts or additional context information.

    <Expandable title="properties">
      <ResponseField name="content" type="string | array" required>
        The contents of the user message.

        <Expandable title="possible types">
          <br />

          ##### Text content `string`

          The text contents of the message.

          ***

          ##### Array of content parts `array`

          An array of content parts with a defined type. Supported options differ based on the model being used to generate the response. Can contain text inputs.

          <Expandable title="possible types">
            <ResponseField name="text" type="string" required>
              The text content.
            </ResponseField>

            <ResponseField name="type" type="string" required>
              The type of content part. Always `text`.
            </ResponseField>
          </Expandable>

          <br />
        </Expandable>
      </ResponseField>

      <ResponseField name="role" type="string" required>
        The role of the message's author, in this case `user`.
      </ResponseField>

      <ResponseField name="name" type="string">
        An optional name for the participant. Provides the model information to differentiate between participants of the same role.
      </ResponseField>
    </Expandable>

    <br />
  </Expandable>
</ParamField>
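
To illustrate how the message types above combine, here is a sketch of a `messages` list for one tool-calling round trip. The tool name `get_weather`, the ID `call_abc123`, and the tool result are hypothetical values chosen for the example.

```python
import json

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather in Paris?"},
    {
        # Assistant turn that requested a tool call instead of answering.
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_abc123",  # hypothetical tool call ID
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical function name
                    "arguments": json.dumps({"city": "Paris"}),  # JSON string
                },
            }
        ],
    },
    {
        # Tool result, linked back to the request through tool_call_id.
        "role": "tool",
        "tool_call_id": "call_abc123",
        "content": json.dumps({"temperature_c": 18}),  # made-up result
    },
]
```

Note that `arguments` is a JSON-encoded string rather than a nested object, and that the `tool` message reuses the `id` of the call it answers.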

<ParamField path="model" type="string" required="true">
  The ID of the model to use. Available options:

  * `llama3.1-8b`
  * `qwen-3-235b-a22b-instruct-2507` (preview)
  * `gpt-oss-120b`
  * `zai-glm-4.7` (preview)
</ParamField>

<ParamField path="clear_thinking" type="boolean | null">
  Controls whether thinking content from previous conversation turns is included in the prompt context.

  **Note:** Thinking content from the current (latest unfinished) turn is always included regardless of this setting.

  * `false` - Thinking from all previous turns is preserved in the conversation history. Recommended for agentic workflows where reasoning from past tool-calling turns may be relevant for future tool calls.
  * `true` (default) - Thinking from earlier turns is excluded. Recommended for general chat conversations where reasoning from past turns is less relevant for performance.

  When this parameter is not specified or set to `null`, the API defaults to `clear_thinking: true`.

  <Note>
    This parameter is supported only on the [zai-glm-4.7](/models/zai-glm-47) model. For additional information, see [Preserved thinking](https://docs.z.ai/guides/capabilities/thinking-mode#preserved-thinking) in the Z.ai documentation.
  </Note>
</ParamField>

<ParamField path="frequency_penalty" type="number | null">
  A number between -2.0 and 2.0. Positive values reduce the likelihood of the model repeating tokens by applying a penalty proportional to how frequently each token has already appeared in the generated output.

  Minimum: `-2`, Maximum: `2`

  Default: `0`
</ParamField>

<ParamField path="logit_bias" type="map | null">
  Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.

  Default: `null`
</ParamField>
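
The effect of `logit_bias` can be sketched with a toy example: the bias is added to each token's logit, and the result is normalized with a softmax. The token IDs and logit values below are made up, and `apply_logit_bias` is a hypothetical helper, not part of any SDK.

```python
import math


def apply_logit_bias(
    logits: dict[int, float], logit_bias: dict[int, float]
) -> dict[int, float]:
    """Add each bias to its token's logit, then softmax into probabilities."""
    biased = {tok: logit + logit_bias.get(tok, 0.0) for tok, logit in logits.items()}
    peak = max(biased.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in biased.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Toy vocabulary of three token IDs with equal logits (made-up values).
logits = {101: 1.0, 102: 1.0, 103: 1.0}
probs = apply_logit_bias(logits, {102: -100.0})
```

With a bias of `-100`, token `102` ends up with a probability that is effectively zero, while the remaining probability mass is split between the other two tokens.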

<ParamField path="logprobs" type="bool">
  Whether to return log probabilities of the output tokens.

  Default: `false`
</ParamField>

<ParamField path="max_completion_tokens" type="integer | null">
  The maximum number of tokens that can be generated in the completion, including reasoning tokens. The total length of input tokens and generated tokens is limited by the model's context length.
</ParamField>

<ParamField path="parallel_tool_calls" type="boolean | null">
  Whether to enable parallel function calling during tool use. When enabled (default), the model can request multiple tool calls simultaneously in a single response. When disabled, the model will only request one tool call at a time.

  Default: `true`
</ParamField>

<ParamField body="prediction" type="object | null">
  Configuration for a [Predicted Output](/capabilities/predicted-outputs), which can greatly speed up response times when large parts of the model response are known in advance. This is most common when you are regenerating a file with mostly minor changes to the content.

  <Expandable title="possible types">
    <br />

    ##### Static Content

    Static predicted output content, such as the content of a text or code file that is being regenerated.

    <Expandable title="properties">
      <ResponseField name="content" type="string | array" required>
        The content that should be matched when generating a model response. If continuous token sequences from the generated tokens match this content, the entire model response can be returned faster.

        <Expandable title="possible types">
          <br />

          ##### Text content `string`

          The content used for a given Predicted Output. Typically the text of a file you are regenerating with only minor changes. <br />

          <br />

          ##### Array of content parts `array`

          An array of content parts with a defined type. Supported options may differ based on the [model](/models/) that is used to generate the response. May contain text inputs.

          <Expandable title="possible types">
            <ResponseField name="text" type="string" required>
              The text content.
            </ResponseField>

            <ResponseField name="type" type="string" required>
              The type of content part. Always `text`.
            </ResponseField>
          </Expandable>

          <br />
        </Expandable>
      </ResponseField>

      <ResponseField name="type" type="string" required>
        The type of the predicted content you wish to provide. This type is currently always <code>content</code>.
      </ResponseField>
    </Expandable>

    <br />
  </Expandable>

  Visit our page on [Predicted Outputs](/capabilities/predicted-outputs) for more information and examples.
</ParamField>
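
A request body using `prediction` might look like the sketch below, where the existing file text is passed both in the prompt and as the predicted content. The helper name `build_prediction_request` and the sample file are hypothetical, and the model shown is only illustrative; check the Predicted Outputs page for which models support this parameter.

```python
def build_prediction_request(model: str, instruction: str, existing_text: str) -> dict:
    """Build a request body that supplies existing_text as a static prediction."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": f"{instruction}\n\n{existing_text}"},
        ],
        # Static content: the text most of the response is expected to match.
        "prediction": {"type": "content", "content": existing_text},
    }


original_code = "def greet(name):\n    return 'Hello, ' + name\n"
request_body = build_prediction_request(
    "gpt-oss-120b",  # illustrative model choice
    "Rename the function to welcome and return the full file.",
    original_code,
)
```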

<ParamField path="presence_penalty" type="number | null">
  A number between -2.0 and 2.0. Positive values reduce the likelihood of the model repeating tokens that have already appeared in the output, encouraging the model to introduce new topics.

  Minimum: `-2`, Maximum: `2`

  Default: `0`
</ParamField>
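
The difference between `frequency_penalty` and `presence_penalty` can be sketched as a logit adjustment. The formulation below is the one commonly used by OpenAI-compatible APIs and is an assumption here; this page does not document the exact server-side implementation. The helper name and the toy logits are hypothetical.

```python
from collections import Counter


def penalize_logits(
    logits: dict[int, float],
    generated_tokens: list[int],
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> dict[int, float]:
    """Lower the logits of tokens that already appear in the generated output."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for tok, logit in logits.items():
        count = counts.get(tok, 0)
        seen = 1.0 if count > 0 else 0.0
        # Frequency penalty scales with the count; presence penalty is flat.
        adjusted[tok] = logit - count * frequency_penalty - seen * presence_penalty
    return adjusted


adjusted = penalize_logits(
    {1: 2.0, 2: 2.0, 3: 2.0},
    generated_tokens=[1, 1, 1, 2],
    frequency_penalty=0.5,
    presence_penalty=0.25,
)
```

Under this assumption, token `1` (seen three times) is penalized more heavily than token `2` (seen once), and token `3` (never seen) is left unchanged.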

<ParamField path="prompt_cache_key" type="string | null">
  An opaque identifier that groups related requests so they reuse the same [prompt cache](/capabilities/prompt-caching). Requests sharing the same `prompt_cache_key` are routed together, which increases cache hits and reduces time to first token.

  Set it to a stable identifier like a conversation ID, user ID, or session ID.

  Maximum length: `1024` characters

  Default: `null`

  <Note>
    `prompt_cache_key` must be enabled on your account before you can use it. [Contact us](https://www.cerebras.ai/contact) or reach out to your account representative to request access.
  </Note>
</ParamField>

<ParamField path="reasoning_effort" type="string | null">
  Controls the amount of reasoning the model performs. Supported values vary by model:

  **[gpt-oss-120b](/models/openai-oss)**

  * `"low"` – Minimal reasoning, faster responses
  * `"medium"` – Moderate reasoning (default)
  * `"high"` – Extensive reasoning, more thorough analysis

  **[zai-glm-4.7](/models/zai-glm-47)** (reasoning enabled by default)

  * `"none"` – Disables reasoning entirely

  <Note>
    This parameter is only available for [gpt-oss-120b](/models/openai-oss) and [zai-glm-4.7](/models/zai-glm-47) models.
  </Note>
</ParamField>

<ParamField path="response_format" type="object | null">
  An object that controls the format of the model response.

  Setting to `{ "type": "json_schema", "json_schema": { "name": "schema_name", "strict": true, "schema": {...} } }` enforces schema compliance by ensuring that the model output conforms to your specified JSON schema. See [Structured Outputs](../capabilities/structured-outputs) for more information.

  Setting `{ "type": "json_object" }` enables the legacy JSON mode, ensuring that the model output is valid JSON. However, using `json_schema` is recommended for models that support it.

  <Expandable title="properties">
    <br />

    ##### Text `object`

    Default response format. Generates plain text responses.

    <Expandable title="properties">
      <ResponseField name="type" type="string" required>
        The type of response format being defined. Always `text`.
      </ResponseField>
    </Expandable>

    <br />

    ##### JSON schema `object`

    Generates structured JSON output that conforms to the specified schema. Use this format when you need the model to return structured JSON.

    <Expandable title="properties">
      <ParamField path="json_schema" type="object" required>
        Structured Outputs configuration options.
      </ParamField>

      <Expandable title="properties">
        <ParamField path="name" type="string" required>
          A name for your schema.
        </ParamField>

        <ParamField path="description" type="string" optional>
          A description of the response format's purpose, used by the model to determine how to generate its response in that format.
        </ParamField>

        <ParamField path="schema" type="object">
          A valid [JSON Schema](https://json-schema.org/) object that defines the structure, types, and requirements for the response. Supports standard JSON Schema features including types (string, number, boolean, integer, object, array, null), `enum`, `anyOf`, nested structures, required fields, and `additionalProperties` (must be set to `false`).
        </ParamField>

        <ParamField path="strict" type="boolean">
          When set to `true`, enforces strict adherence to the schema. The model will only return fields defined in the schema and with the correct types. When `false`, behaves similarly to JSON mode but uses the schema as a guide. Defaults to `false`.
        </ParamField>
      </Expandable>

      <ParamField path="type" type="string" required>
        The type of response format being defined. Always `json_schema`.
      </ParamField>
    </Expandable>

    <br />

    ##### JSON object `object`

    A legacy method for generating JSON responses. Using `json_schema` is recommended for models that support it. To use `json_object`, you must also include a system or user message that specifies the desired format.

    <Expandable title="properties">
      <ResponseField name="type" type="string" required>
        The type of response format being defined. Always `json_object`.
      </ResponseField>
    </Expandable>

    <Note>
      When using `json_object`, you must explicitly instruct the model to generate JSON through a system or user message. `json_object` is not compatible with streaming; `stream` must be set to `false`.
    </Note>

    <br />
  </Expandable>
</ParamField>
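
As a sketch of the `json_schema` format described above, the request body below asks for a small structured object. The schema name `movie_review`, its fields, and the sample model output are hypothetical.

```python
import json

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "movie_review",  # hypothetical schema name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "rating": {"type": "integer"},
            },
            "required": ["title", "rating"],
            "additionalProperties": False,  # must be set to false
        },
    },
}

request_body = {
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Review the movie Inception."}],
    "response_format": response_format,
}

# The message content is returned as a string, so parse it before use.
sample_content = '{"title": "Inception", "rating": 9}'  # made-up model output
review = json.loads(sample_content)
```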

<ParamField path="seed" type="integer | null">
  If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed.
</ParamField>

<ParamField path="service_tier" type="string | null">
  Controls request prioritization.

  <Callout icon="lock" color="#b2b1b1ff" iconType="regular">
    This feature is in **Private Preview**. For access or more information, [contact us](https://www.cerebras.ai/contact) or reach out to your account representative.
  </Callout>

  Available options:

  * `priority` - Highest priority processing (Only available for dedicated endpoints, not shared endpoints.)
  * `default` - Standard priority processing
  * `auto` - Automatically uses the highest available service tier
  * `flex` - Lowest priority processing

  Default: `default`

  See [Service Tiers](/capabilities/service-tiers) for more information.
</ParamField>
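
The `service_tier` body parameter and the `queue_threshold` header are typically set together for `flex` requests. The sketch below validates the documented range before building both; `build_flex_request` is a hypothetical helper, and both features require Private Preview access.

```python
def build_flex_request(body: dict, queue_threshold_ms: int) -> tuple[dict, dict]:
    """Return (headers, body) for a flex-tier request with a queue threshold."""
    if not 50 <= queue_threshold_ms <= 20000:
        raise ValueError("queue_threshold must be between 50 and 20000 milliseconds")
    headers = {
        "Content-Type": "application/json",
        "queue_threshold": str(queue_threshold_ms),  # header values are strings
    }
    # Copy the body so the caller's dict is not mutated.
    return headers, {**body, "service_tier": "flex"}


headers, body = build_flex_request(
    {"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "Hello!"}]},
    queue_threshold_ms=500,
)
```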

<ParamField path="stop" type="string | null">
  Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
</ParamField>

<ParamField path="stream" type="boolean | null">
  If set to `true`, partial message deltas are sent as they are generated instead of a single complete response.
</ParamField>

<ParamField path="temperature" type="number | null">
  The sampling temperature to use, between 0 and 2. Higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic. We generally recommend altering this or `top_p`, but not both.

  Minimum: `0`, Maximum: `2`
</ParamField>
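
The effect of `temperature` can be illustrated with the standard temperature-scaled softmax: logits are divided by the temperature before normalization. This is a generic sketch with made-up logits, and treating a temperature of `0` as greedy selection is an assumption about the limiting case, not a statement about how the server handles it.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature == 0:
        # Limiting case: all probability mass goes to the highest logit.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up values
focused = softmax_with_temperature(logits, 0.2)
spread_out = softmax_with_temperature(logits, 2.0)
greedy = softmax_with_temperature(logits, 0)
```

At `0.2` almost all of the probability mass sits on the top token, while at `2.0` it is spread much more evenly across the three tokens.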

<ParamField path="tool_choice" type="string | object">
  Controls which (if any) tool is called by the model. `none` means the model will not call any tool and instead generates a message. `auto` means the model can pick between generating a message or calling one or more tools. `required` means the model must call one or more tools. Specifying a particular tool via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that tool.

  `none` is the default when no tools are present. `auto` is the default if tools are present.
</ParamField>

<ParamField path="tools" type="object | null">
  A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.

  Specifying tools consumes prompt tokens in the context. If too many are given, the model may perform poorly or you may hit context length limitations.

  <Expandable title="properties">
    <ParamField path="tools.function.description" type="string">
      A description of what the function does, used by the model to choose when and how to call the function.
    </ParamField>

    <ParamField path="tools.function.name" type="string">
      The name of the function to be called.

      Supported characters: `a-z`, `A-Z`, `0-9`, `_`, `-`

      Maximum length: `64`

      Names that use other characters or exceed this length might work with some models, but compatibility isn't guaranteed.
    </ParamField>

    <ParamField path="tools.function.parameters" type="object">
      The parameters the function accepts, described as a JSON Schema object. Omitting parameters defines a function with an empty parameter list.
    </ParamField>

    <ParamField path="tools.type" type="string">
      The type of the tool. Currently, only `function` is supported.
    </ParamField>
  </Expandable>
</ParamField>
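
A tool definition and a forced `tool_choice` could be assembled as in the sketch below. The helper `make_function_tool` and the `get_weather` function are hypothetical; the name check simply enforces the supported characters and maximum length listed above, which is stricter than what some models may accept.

```python
import re

# Supported characters and maximum length for function names.
TOOL_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def make_function_tool(name: str, description: str, parameters: dict) -> dict:
    """Build one entry for the tools list, rejecting unsupported names."""
    if not TOOL_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"unsupported tool name: {name!r}")
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,  # JSON Schema object
        },
    }


weather_tool = make_function_tool(
    "get_weather",  # hypothetical function
    "Look up the current weather for a city.",
    {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
)

request_body = {
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool],
    # Force a call to this specific tool, one call at a time.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
    "parallel_tool_calls": False,
}
```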

<ParamField path="top_logprobs" type="integer | null">
  An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.
  `logprobs` must be set to `true` if this parameter is used.

  Minimum: `0`, Maximum: `20`
</ParamField>

<ParamField path="top_p" type="number | null">
  An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens that make up the top `top_p` probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or `temperature`, but not both.

  Minimum: `0`, Maximum: `1`
</ParamField>
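
Nucleus sampling can be sketched as a filter that keeps the smallest set of highest-probability tokens whose combined mass reaches `top_p`; sampling then happens only among the kept tokens. The helper name and the probabilities below are made up for illustration.

```python
def nucleus_filter(probs: list[float], top_p: float) -> list[int]:
    """Return indices of the smallest set of tokens whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: list[int] = []
    mass = 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break  # enough probability mass collected
    return kept


probs = [0.05, 0.5, 0.15, 0.3]  # made-up token probabilities
kept = nucleus_filter(probs, 0.75)
only_top = nucleus_filter(probs, 0.1)
```

Here a `top_p` of `0.75` keeps the two most likely tokens (indices `1` and `3`), and a `top_p` of `0.1` keeps only the single most likely token.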

<ParamField path="user" type="string | null">
  A unique identifier representing your end-user, which can help to monitor and detect abuse.
</ParamField>

## Response

<ResponseField name="id" type="string">
  A unique identifier for the chat completion.
</ResponseField>

<ResponseField name="choices" type="object[]">
  A list of chat completion choices. Can be more than one if `n` is greater than 1.

  <Expandable title="choice properties">
    <ResponseField name="finish_reason" type="string">
      The reason the model stopped generating tokens. Possible values: `stop`, `length`, `content_filter`, `tool_calls`.
    </ResponseField>

    <ResponseField name="index" type="integer">
      The index of the choice in the list of choices.
    </ResponseField>

    <ResponseField name="logprobs" type="object | null">
      Log probability information for the output tokens.
    </ResponseField>

    <ResponseField name="reasoning_logprobs" type="object | null">
      Log probability information for the reasoning tokens, if provided by the model.
    </ResponseField>

    <ResponseField name="message" type="object">
      A chat completion message generated by the model.

      <Expandable title="message properties">
        <ResponseField name="content" type="string">
          The contents of the message.
        </ResponseField>

        <ResponseField name="role" type="string">
          The role of the author of this message.
        </ResponseField>

        <ResponseField name="reasoning" type="string">
          The model's reasoning content when using reasoning models.
        </ResponseField>

        <ResponseField name="tool_calls" type="array | null">
          A list of tool calls the model requested in its response.

          <Expandable title="possible types">
            <br />

            ##### Tool call `object`

            <ResponseField name="id" type="string">
              The ID of the tool call.
            </ResponseField>

            <ResponseField name="type" type="string">
              The type of the tool call. Currently, only `function` is supported.
            </ResponseField>

            <ResponseField name="function" type="object">
              The function call details.

              <Expandable title="properties">
                <ResponseField name="name" type="string">
                  The name of the function to call.
                </ResponseField>

                <ResponseField name="arguments" type="string">
                  The JSON-encoded arguments supplied by the model.
                </ResponseField>
              </Expandable>
            </ResponseField>
          </Expandable>
        </ResponseField>
      </Expandable>
    </ResponseField>
  </Expandable>
</ResponseField>

<ResponseField name="created" type="integer">
  The Unix timestamp (in seconds) of when the chat completion was created.
</ResponseField>

<ResponseField name="model" type="string">
  The model used for the chat completion.
</ResponseField>

<ResponseField name="object" type="string">
  The object type, which is always `chat.completion`.
</ResponseField>

<ResponseField name="system_fingerprint" type="string">
  A fingerprint for the model or backend used to generate the response.
</ResponseField>

<ResponseField name="service_tier" type="string">
  The service tier used for the request, or `null` if not specified.
</ResponseField>

<ResponseField name="service_tier_used" type="string">
  The service tier used for processing the request. Only present when `service_tier` is set to `auto` in the request.

  Possible values: `priority`, `default`, `flex`
</ResponseField>

<ResponseField name="usage" type="object">
  Usage statistics for the completion request.

  <Expandable title="usage properties">
    <ResponseField name="prompt_tokens" type="integer">
      Number of tokens in the prompt.
    </ResponseField>

    <ResponseField name="completion_tokens" type="integer">
      Number of tokens in the generated completion.
    </ResponseField>

    <ResponseField name="total_tokens" type="integer">
      Total number of tokens used in the request (prompt + completion).
    </ResponseField>

    <ResponseField name="prompt_tokens_details" type="object">
      Detailed breakdown of prompt token usage.

      <Expandable title="properties">
        <ResponseField name="cached_tokens" type="integer">
          Number of prompt tokens that were served from the cache and reused from a previous request. See [Prompt Caching](/capabilities/prompt-caching) for more information.
        </ResponseField>
      </Expandable>
    </ResponseField>

    <ResponseField name="completion_tokens_details" type="object">
      Detailed breakdown of completion token usage, including Predicted Outputs and reasoning tokens.

      <Expandable title="properties">
        <ResponseField name="accepted_prediction_tokens" type="integer">
          When using Predicted Outputs, the number of tokens in the prediction that appeared in the completion.
        </ResponseField>

        <ResponseField name="rejected_prediction_tokens" type="integer">
          When using Predicted Outputs, the number of tokens in the prediction that did not appear in the completion. Like reasoning tokens, these tokens are still counted in the total completion tokens for the purposes of billing, output, and context window limits.
        </ResponseField>

        <ResponseField name="reasoning_tokens" type="integer">
          Tokens spent on model reasoning, if applicable.
        </ResponseField>
      </Expandable>
    </ResponseField>
  </Expandable>
</ResponseField>

<ResponseField name="time_info" type="object">
  Performance timing information for the request.

  <Expandable title="time_info properties">
    <ResponseField name="queue_time" type="number">
      Time spent in queue waiting for processing (in seconds).
    </ResponseField>

    <ResponseField name="prompt_time" type="number">
      Time spent processing the prompt/input tokens (in seconds).
    </ResponseField>

    <ResponseField name="completion_time" type="number">
      Time spent generating the completion/output tokens (in seconds).
    </ResponseField>

    <ResponseField name="total_time" type="number">
      Total time for the entire request from submission to completion (in seconds).
    </ResponseField>

    <ResponseField name="created" type="number">
      Unix timestamp (in seconds) of when the `time_info` was recorded.
    </ResponseField>
  </Expandable>
</ResponseField>
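
The `usage` and `time_info` fields can be combined to derive simple performance figures. The sketch below uses a hypothetical helper, `summarize_performance`, and a trimmed response dict whose values are copied from the example response on this page.

```python
def summarize_performance(response: dict) -> dict:
    """Derive throughput and ratio figures from usage and time_info."""
    usage = response["usage"]
    timing = response["time_info"]
    cached = usage["prompt_tokens_details"]["cached_tokens"]
    return {
        # Output tokens generated per second of completion time.
        "output_tokens_per_second": usage["completion_tokens"] / timing["completion_time"],
        # Fraction of the total request time spent waiting in the queue.
        "queue_share": timing["queue_time"] / timing["total_time"],
        # Fraction of prompt tokens served from the prompt cache.
        "cached_prompt_share": cached / usage["prompt_tokens"],
    }


sample = {
    "usage": {
        "prompt_tokens": 69,
        "completion_tokens": 59,
        "prompt_tokens_details": {"cached_tokens": 0},
    },
    "time_info": {
        "queue_time": 0.004628115,
        "prompt_time": 0.00383762,
        "completion_time": 0.04040406,
        "total_time": 0.05013155937194824,
    },
}
stats = summarize_performance(sample)
```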

<RequestExample>
  ```python Python theme={null}
  import os

  from cerebras.cloud.sdk import Cerebras

  # Read the API key from the environment rather than hard-coding it.
  client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))

  chat_completion = client.chat.completions.create(
      model="gpt-oss-120b",
      messages=[
          {"role": "user", "content": "Hello!"},
      ],
  )
  print(chat_completion)
  ```

  ```javascript Node.js theme={null}
  import Cerebras from '@cerebras/cerebras_cloud_sdk';

  const client = new Cerebras({
    apiKey: process.env['CEREBRAS_API_KEY'],
  });

  async function main() {
    const completionCreateResponse = await client.chat.completions.create({
      messages: [{ role: 'user', content: 'Hello!' }],
      model: 'gpt-oss-120b',
    });

    console.log(completionCreateResponse);
  }
  main();
  ```

  ```cli cURL theme={null}
  curl --location 'https://api.cerebras.ai/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
  --data '{
    "model": "gpt-oss-120b",
    "stream": false,
    "messages": [{"content": "Hello!", "role": "user"}],
    "temperature": 0,
    "max_completion_tokens": -1,
    "seed": 0,
    "top_p": 1
  }'
  ```
</RequestExample>

<ResponseExample>
  ```json Response theme={null}
  {
    "id": "chatcmpl-30b3c3d8-ca41-48e7-9ef0-27e322604a13",
    "choices": [
      {
        "finish_reason": "stop",
        "index": 0,
        "message": {
          "role": "assistant",
          "content": "Hello! 👋 How can I help you today?",
          "reasoning": "The user just says \"Hello!\" with no further context.\n\nWe need to respond politely, maybe ask how can assist.\n\nWe can also be friendly.\n\nNo constraints. We'll respond as chat.",
          "tool_calls": null
        },
        "logprobs": null,
        "reasoning_logprobs": null
      }
    ],
    "created": 1769729480,
    "model": "gpt-oss-120b",
    "object": "chat.completion",
    "system_fingerprint": "fp_e7ab83753cbd28777b40",
    "time_info": {
      "completion_time": 0.04040406,
      "prompt_time": 0.00383762,
      "queue_time": 0.004628115,
      "total_time": 0.05013155937194824,
      "created": 1769729480.0787008
    },
    "usage": {
      "completion_tokens": 59,
      "completion_tokens_details": {
        "accepted_prediction_tokens": 0,
        "rejected_prediction_tokens": 0,
        "reasoning_tokens": 0
      },
      "prompt_tokens": 69,
      "prompt_tokens_details": {
        "cached_tokens": 0
      },
      "total_tokens": 128
    },
    "service_tier": null
  }
  ```
</ResponseExample>
