# Completions

Generate text continuations from a single prompt string. Best for simple text generation, autocomplete, and single-turn tasks.

## Request

### Headers

<ParamField header="Content-Type" type="string">
  The media type of the request body.

  **Supported values:** `application/json`, `application/vnd.msgpack`

  **Default:** `application/json`

  See [Payload Optimization](/capabilities/payload-optimization) for details.
</ParamField>

<ParamField header="Content-Encoding" type="string">
  The compression encoding applied to the request body.

  **Supported values:** `gzip`

  When set, the request body must be gzip-compressed. Can be combined with any supported `Content-Type`.

  See [Payload Optimization](/capabilities/payload-optimization) for details.
</ParamField>
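As a rough illustration of the `Content-Encoding` header, the sketch below serializes a request body as JSON and gzip-compresses it using only the Python standard library. The helper name `build_gzip_request` is hypothetical and not part of any SDK; the snippet only prepares the headers and bytes and does not send anything.

```python
import gzip
import json


def build_gzip_request(payload: dict) -> tuple[dict, bytes]:
    """Serialize `payload` as JSON, gzip it, and return matching headers.

    Hypothetical helper for illustration only.
    """
    raw = json.dumps(payload).encode("utf-8")
    body = gzip.compress(raw)
    headers = {
        "Content-Type": "application/json",  # format of the uncompressed body
        "Content-Encoding": "gzip",          # compression applied on top
    }
    return headers, body


headers, body = build_gzip_request(
    {"model": "gpt-oss-120b", "prompt": "It was a dark and stormy night"}
)
print(headers["Content-Encoding"], len(body), "bytes")
```

The returned `headers` (together with your `Authorization` header) and `body` can then be passed to whichever HTTP client you use. Compression tends to pay off mainly for large prompts; for a short body like this one the gzip overhead can outweigh the savings.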

### Body

<ParamField path="prompt" type="string | array">
  The prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays.

  Default: `""`
</ParamField>

<ParamField path="model" type="string" required="true">
  The model to use for completion. See [Supported Models](/models/overview) for a list of available models.
</ParamField>

<ParamField path="stream" type="boolean | null">
  If set to `true`, the completion is streamed incrementally. Tokens are sent as data-only server-sent events as they become available, and the stream is terminated by a `data: [DONE]` message.

  Default: `false`
</ParamField>
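The streaming format described for `stream` can be consumed with a small parser like the sketch below. It assumes each event's `data:` payload is a JSON object whose `choices[0].text` holds the newly generated text, mirroring the non-streaming response; that chunk shape is an assumption, and the `sample` lines are made up for illustration rather than captured from the API.

```python
import json
from typing import Iterable, Iterator


def iter_stream_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON payloads from data-only SSE lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(data)


# Hand-written sample events (assumed chunk shape, not real API output).
sample = [
    'data: {"choices": [{"index": 0, "text": " and"}]}',
    "",
    'data: {"choices": [{"index": 0, "text": " there"}]}',
    "",
    "data: [DONE]",
]

text = "".join(event["choices"][0]["text"] for event in iter_stream_events(sample))
print(repr(text))
```

In practice the official SDKs handle this parsing for you; the sketch is only meant to make the wire format concrete.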

<ParamField path="return_raw_tokens" type="boolean | null">
  Return raw tokens instead of text.

  Default: `false`
</ParamField>

<ParamField path="max_tokens" type="integer | null">
  The maximum number of tokens that can be generated in the completion. The total length of input tokens and generated tokens is limited by the model's context length.

  Default: `null`
</ParamField>

<ParamField path="min_tokens" type="integer | null">
  The minimum number of tokens to generate for a completion. If not specified or set to `0`, the model generates as many tokens as it deems necessary. Setting it to `-1` sets the minimum to the maximum sequence length.

  Default: `null`
</ParamField>

<ParamField path="grammar_root" type="string | null">
  The grammar root used for structured output generation.

  Supported values: `root`, `fcall`, `nofcall`, `insidevalue`, `value`, `object`, `array`, `string`, `number`, `funcarray`, `func`, `ws`.

  Default: `null`
</ParamField>

<ParamField path="seed" type="integer | null">
  If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed.

  Default: `null`
</ParamField>

<ParamField path="stop" type="string | array | null">
  Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

  Default: `null`
</ParamField>

<ParamField path="temperature" type="float | null">
  What sampling temperature to use, between 0 and 1.5. Higher values (e.g., 0.8) will make the output more random, while lower values (e.g., 0.2) will make it more focused and deterministic. We generally recommend altering this or `top_p` but not both.

  Minimum: `0`, Maximum: `1.5`

  Default: `1.0`
</ParamField>

<ParamField path="top_p" type="float | null">
  An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens that make up the top `top_p` probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or `temperature` but not both.

  Minimum: `0`, Maximum: `1`

  Default: `1.0`
</ParamField>
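To make the `top_p` description concrete, here is a textbook-style sketch of nucleus filtering over a toy distribution. It is not the service's implementation; the function name `nucleus_filter` and the probabilities are invented for illustration.

```python
def nucleus_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most likely tokens until their cumulative probability
    reaches `top_p`, then renormalize the survivors to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (made-up numbers).
probs = {" a": 0.5, " no": 0.3, " an": 0.15, " this": 0.05}

print(nucleus_filter(probs, 0.1))  # only the single most likely token remains
print(nucleus_filter(probs, 0.9))  # the three most likely tokens remain
```

A low `top_p` narrows sampling to a handful of high-probability tokens, while `top_p = 1` leaves the distribution untouched.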

<ParamField path="echo" type="boolean">
  Echo back the prompt in addition to the completion. Incompatible with `return_raw_tokens` set to `true`.

  Default: `false`
</ParamField>

<ParamField path="user" type="string | null">
  A unique identifier representing your end-user, which can help Cerebras to monitor and detect abuse.

  Default: `null`
</ParamField>

<ParamField path="prompt_cache_key" type="string | null">
  An opaque identifier that groups related requests so they reuse the same [prompt cache](/capabilities/prompt-caching). Requests sharing the same `prompt_cache_key` are routed together, which increases cache hits and reduces time to first token.

  Set it to a stable identifier like a conversation ID, user ID, or session ID.

  Maximum length: `1024` characters

  Default: `null`

  <Note>
    `prompt_cache_key` must be enabled on your account before you can use it. [Contact us](https://www.cerebras.ai/contact) or reach out to your account representative to request access.
  </Note>
</ParamField>
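One possible way to derive a stable `prompt_cache_key` that respects the 1024-character limit is sketched below. The helper `make_prompt_cache_key` and the identifier formats are hypothetical; any stable string within the length limit should work equally well.

```python
import hashlib

MAX_KEY_LENGTH = 1024  # documented maximum length of prompt_cache_key


def make_prompt_cache_key(*parts: str) -> str:
    """Join stable identifiers into one key; hash it if it would be too long.

    Hypothetical helper for illustration only.
    """
    key = ":".join(parts)
    if len(key) > MAX_KEY_LENGTH:
        # A SHA-256 hex digest is deterministic and always 64 characters.
        key = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return key


print(make_prompt_cache_key("user-42", "conversation-7"))
```

Because the key is derived deterministically from the same identifiers, every request in a conversation carries the same value.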

<ParamField path="logprobs" type="integer | null">
  Return log probabilities of the output tokens.

  For example, if `logprobs` is 5, the API returns the 5 most likely tokens at each position. The API always returns the logprob of the sampled token, so there may be up to `logprobs+1` elements in the response.

  Minimum: `0`, Maximum: `20`

  Default: `null`

  <Note>
    Setting `logprobs` to `0` is different from leaving it as `null`. When set to `null`, log probabilities are disabled entirely. When set to `0`, log probabilities are enabled, but `top_logprobs` is not returned.
  </Note>
</ParamField>

## Completion Response

<ParamField path="choices" type="object[]" required="true">
  The list of completion choices the model generated for the input prompt.

  <Expandable title="properties">
    <ParamField path="finish_reason" type="string | null">
      The reason the model stopped generating tokens. This will be `stop` if the model hit a natural stop point or a provided stop sequence, `length` if the maximum number of tokens specified in the request was reached, or `content_filter` if content was omitted due to a flag from our content filters.
    </ParamField>

    <ParamField path="index" type="integer" />

    <ParamField path="logprobs" type="object | null">
      Log probability details for the generated tokens. `null` when `logprobs` is not set in the request.

      <Expandable title="properties">
        <ParamField path="text_offset" type="array">
          The character offset at which each token starts, counted from the beginning of the prompt.
        </ParamField>

        <ParamField path="token_logprobs" type="array">
          The log probability of each generated token.
        </ParamField>

        <ParamField path="tokens" type="array">
          The generated tokens, as strings.
        </ParamField>

        <ParamField path="top_logprobs" type="array">
          For each token position, the most likely tokens mapped to their log probabilities. In rare cases, fewer entries than requested by `logprobs` may be returned.
        </ParamField>
      </Expandable>
    </ParamField>

    <ParamField path="text" type="string">
      The generated completion text.
    </ParamField>

    <ParamField path="tokens" type="array | null">
      The raw tokens of the completion, returned when `return_raw_tokens` is set to `true`.
    </ParamField>
  </Expandable>
</ParamField>
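The `logprobs` arrays are parallel: the entries at index `i` in `tokens`, `text_offset`, and `token_logprobs` describe the same token. The sketch below lines them up and converts each log probability to a plain probability with `math.exp`. The `choice` dictionary is a trimmed copy of the example response on this page, and `token_probabilities` is a hypothetical helper name.

```python
import math

# Trimmed copy of choices[0] from the example response on this page.
choice = {
    "text": " and there was no moon,",
    "logprobs": {
        "text_offset": [30, 34, 40, 44, 47, 52],
        "token_logprobs": [
            -3.3912997245788574,
            -4.322617053985596,
            -0.3532562553882599,
            -1.7001153230667114,
            -2.187103271484375,
            -2.4667720794677734,
        ],
        "tokens": [" and", " there", " was", " no", " moon", ","],
    },
}


def token_probabilities(choice: dict) -> list[tuple[str, int, float]]:
    """Return (token, text_offset, probability) for each generated token."""
    lp = choice.get("logprobs")
    if lp is None:
        return []  # logprobs were not requested
    return [
        (token, offset, math.exp(logprob))
        for token, offset, logprob in zip(
            lp["tokens"], lp["text_offset"], lp["token_logprobs"]
        )
    ]


rows = token_probabilities(choice)
for token, offset, prob in rows:
    print(f"{offset:>3} {token!r:<10} {prob:.3f}")

# Summing the per-token log probabilities gives the log probability
# of the whole sampled sequence.
sequence_logprob = sum(choice["logprobs"]["token_logprobs"])
print(f"sequence logprob: {sequence_logprob:.3f}")
```

Concatenating `tokens` reproduces `text`, which is a quick sanity check when post-processing responses.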

<ParamField path="created" type="integer | null" required="true">
  The Unix timestamp (in seconds) of when the completion was created.
</ParamField>

<ParamField path="id" type="string">
  A unique identifier for the completion.
</ParamField>

<ParamField path="model" type="string">
  The model used for completion.
</ParamField>

<ParamField path="object" type="string" required="true">
  The object type, which is always `text_completion`.
</ParamField>

<ParamField path="system_fingerprint" type="string">
  This fingerprint represents the backend configuration that the model runs with.

  Can be used in conjunction with the `seed` request parameter to understand when backend changes have been made that might impact determinism.
</ParamField>

<ResponseField name="usage" type="object">
  Usage statistics for the completion request.

  <Expandable title="properties">
    <ResponseField name="prompt_tokens" type="integer">
      Number of tokens in the prompt.
    </ResponseField>

    <ResponseField name="completion_tokens" type="integer">
      Number of tokens in the generated completion.
    </ResponseField>

    <ResponseField name="total_tokens" type="integer">
      Total number of tokens used in the request (prompt + completion).
    </ResponseField>

    <ResponseField name="prompt_tokens_details" type="object">
      Detailed breakdown of prompt token usage.

      <Expandable title="properties">
        <ResponseField name="cached_tokens" type="integer">
          Number of prompt tokens that were served from the cache and reused from a previous request. See [Prompt Caching](/capabilities/prompt-caching) for more information.
        </ResponseField>
      </Expandable>
    </ResponseField>
  </Expandable>
</ResponseField>

<ResponseField name="time_info" type="object">
  Performance timing information for the request.

  <Expandable title="properties">
    <ResponseField name="queue_time" type="number">
      Time spent in queue waiting for processing (in seconds).
    </ResponseField>

    <ResponseField name="prompt_time" type="number">
      Time spent processing the prompt/input tokens (in seconds).
    </ResponseField>

    <ResponseField name="completion_time" type="number">
      Time spent generating the completion/output tokens (in seconds).
    </ResponseField>

    <ResponseField name="total_time" type="number">
      Total time for the entire request from submission to completion (in seconds).
    </ResponseField>

    <ResponseField name="created" type="number">
      Unix timestamp (in seconds) of when the `time_info` was recorded.
    </ResponseField>
  </Expandable>
</ResponseField>
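The `usage` and `time_info` objects can be combined into simple performance metrics. The sketch below computes output tokens per second and the fraction of prompt tokens served from cache; `summarize_performance` is a hypothetical helper, and the `response` values are copied from the example response on this page.

```python
def summarize_performance(response: dict) -> dict:
    """Derive throughput and cache metrics from usage and time_info."""
    usage = response["usage"]
    timing = response["time_info"]
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    prompt_tokens = usage["prompt_tokens"]
    completion_time = timing["completion_time"]
    return {
        # Generated tokens divided by the time spent generating them.
        "output_tokens_per_second": (
            usage["completion_tokens"] / completion_time if completion_time else None
        ),
        # Share of the prompt that was reused from the prompt cache.
        "cached_prompt_fraction": cached / prompt_tokens if prompt_tokens else 0.0,
    }


# Values copied from the example response on this page.
response = {
    "usage": {
        "completion_tokens": 6,
        "prompt_tokens": 8,
        "prompt_tokens_details": {"cached_tokens": 0},
        "total_tokens": 14,
    },
    "time_info": {
        "completion_time": 0.002412253,
        "prompt_time": 0.000347391,
        "queue_time": 0.000245702,
        "total_time": 0.004331350326538086,
        "created": 1769297019.2632425,
    },
}

summary = summarize_performance(response)
print(summary)
```

Throughput measured this way covers only the generation phase; queueing and prompt processing are reported separately in `queue_time` and `prompt_time`.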

<RequestExample>
  ```python Python theme={null}
  import os
  from cerebras.cloud.sdk import Cerebras

  client = Cerebras(
      api_key=os.environ.get("CEREBRAS_API_KEY"),  # This is the default and can be omitted
  )

  completion = client.completions.create(
      prompt="It was a dark and stormy night",
      max_tokens=100,
      model="gpt-oss-120b",
      logprobs=5,
  )

  print(completion)
  ```

  ```javascript Node.js theme={null}
  import Cerebras from '@cerebras/cerebras_cloud_sdk';

  const client = new Cerebras({
    apiKey: process.env['CEREBRAS_API_KEY'], // This is the default and can be omitted
  });

  async function main() {
    const completion = await client.completions.create({
      prompt: "It was a dark and stormy night",
      model: 'gpt-oss-120b',
      logprobs: 5,
    });

    console.log(completion?.choices[0]?.text);
  }

  main();
  ```

  ```cli cURL theme={null}
  curl -X POST https://api.cerebras.ai/v1/completions \
     -H "Authorization: Bearer $CEREBRAS_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
           "prompt": "It was a dark and stormy night",
           "max_tokens": 100,
           "model": "gpt-oss-120b",
           "logprobs": 5
         }'
  ```
</RequestExample>

<ResponseExample>
  ```json Response theme={null}
  {
      "id": "chatcmpl-b1743f2d-8c20-4ad5-a77c-426521ae1b1c",
      "choices": [
          {
              "index": 0,
              "finish_reason": "length",
              "logprobs": {
                  "text_offset": [30, 34, 40, 44, 47, 52],
                  "token_logprobs": [
                      -3.3912997245788574,
                      -4.322617053985596,
                      -0.3532562553882599,
                      -1.7001153230667114,
                      -2.187103271484375,
                      -2.4667720794677734
                  ],
                  "tokens": [" and", " there", " was", " no", " moon", ","],
                  "top_logprobs": [
                      {
                          ".": -1.1334871053695679,
                          ",": -1.3366121053695679,
                          " when": -2.3366122245788574,
                          " in": -2.9381747245788574,
                          "...": -3.3287997245788574,
                          " and": -3.3912997245788574
                      },
                      {
                          " I": -0.9554294943809509,
                          " the": -1.7679295539855957,
                          " a": -2.2366795539855957,
                          " all": -3.1898045539855957,
                          " we": -3.4241795539855957,
                          " there": -4.322617053985596
                      },
                      {
                          " was": -0.3532562553882599,
                          " were": -1.6813812255859375,
                          " I": -4.0251312255859375,
                          " wasn": -4.1345062255859375,
                          " had": -4.1970062255859375
                      },
                      {
                          " a": -0.3719903528690338,
                          " no": -1.7001153230667114,
                          " an": -3.614177942276001,
                          " this": -4.129802703857422,
                          " nothing": -4.184490203857422
                      },
                      {
                          " electricity": -1.3433531522750854,
                          " one": -1.6949156522750854,
                          " moon": -2.187103271484375,
                          " way": -3.038665771484375,
                          " sign": -3.054290771484375
                      },
                      {
                          ".": -0.8495846390724182,
                          " to": -2.1386470794677734,
                          " in": -2.2870845794677734,
                          ",": -2.4667720794677734,
                          ".\n": -3.2714595794677734
                      }
                  ]
              },
              "text": " and there was no moon,",
              "tokens": null
          }
      ],
      "created": 1769297019,
      "model": "gpt-oss-120b",
      "object": "text_completion",
      "system_fingerprint": "fp_feb5e1faa8274e54bef0",
      "time_info": {
          "completion_time": 0.002412253,
          "prompt_time": 0.000347391,
          "queue_time": 0.000245702,
          "total_time": 0.004331350326538086,
          "created": 1769297019.2632425
      },
      "usage": {
          "completion_tokens": 6,
          "prompt_tokens": 8,
          "prompt_tokens_details": {
              "cached_tokens": 0
          },
          "total_tokens": 14
      }
  }
  ```
</ResponseExample>
