# Completions

Generate text continuations from a single prompt string. Best for simple text generation, autocomplete, and single-turn tasks.

## Request

### Headers

#### `Content-Type`

The media type of the request body.

**Supported values:** `application/json`, `application/vnd.msgpack`

**Default:** `application/json`

See [Payload Optimization](/capabilities/payload-optimization) for details.

#### `Content-Encoding`

The compression encoding applied to the request body.

**Supported values:** `gzip`

When set, the request body must be gzip-compressed. It can be combined with any supported `Content-Type`. See [Payload Optimization](/capabilities/payload-optimization) for details.

### Body

#### `prompt`

The prompt(s) to generate completions for, encoded as a string, an array of strings, an array of tokens, or an array of token arrays.

**Default:** `""`

#### `model`

The model to use for completion. See [Supported Models](/models/overview) for a list of available models.

#### `stream`

If set, partial completion deltas are sent as data-only server-sent events as they become available, and the stream is terminated by a `data: [DONE]` message.

**Default:** `false`

#### `return_raw_tokens`

Return raw tokens instead of text.

**Default:** `false`

#### `max_tokens`

The maximum number of tokens that can be generated in the completion. The total length of input tokens and generated tokens is limited by the model's context length.

**Default:** `null`

#### `min_tokens`

The minimum number of tokens to generate for a completion. If not specified or set to 0, the model generates as many tokens as it deems necessary. Setting it to -1 sets the minimum to the maximum sequence length.

**Default:** `null`

#### `grammar_root`

The grammar root used for structured output generation.

**Supported values:** `root`, `fcall`, `nofcall`, `insidevalue`, `value`, `object`, `array`, `string`, `number`, `funcarray`, `func`, `ws`.
**Default:** `null`

#### `seed`

If specified, our system makes a best effort to sample deterministically, so that repeated requests with the same `seed` and parameters should return the same result. Determinism is not guaranteed.

**Default:** `null`

#### `stop`

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

**Default:** `null`

#### `temperature`

The sampling temperature to use, between 0 and 1.5. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic. We generally recommend altering this or `top_p`, but not both.

**Minimum:** `0`, **Maximum:** `1.5`

**Default:** `1.0`

#### `top_p`

An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens that make up the top `top_p` probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or `temperature`, but not both.

**Minimum:** `0`, **Maximum:** `1`

**Default:** `1.0`

#### `echo`

Echo back the prompt in addition to the completion. Cannot be combined with `return_raw_tokens` set to `true`.

**Default:** `false`

#### `user`

A unique identifier representing your end user, which can help Cerebras monitor and detect abuse.

**Default:** `null`

#### `prompt_cache_key`

An opaque identifier that groups related requests so they reuse the same [prompt cache](/capabilities/prompt-caching). Requests sharing the same `prompt_cache_key` are routed together, which increases cache hits and reduces time to first token. Set it to a stable identifier such as a conversation ID, user ID, or session ID.

**Maximum length:** `1024` characters

**Default:** `null`

> **Note:** `prompt_cache_key` must be enabled on your account before you can use it. [Contact us](https://www.cerebras.ai/contact) or reach out to your account representative to request access.

#### `logprobs`

Return log probabilities of the output tokens. For example, if `logprobs` is 5, the API returns a list of the 5 most likely tokens.
The API always returns the logprob of the sampled token, so there may be up to `logprobs+1` elements in the response.

**Minimum:** `0`, **Maximum:** `20`

**Default:** `null`

> **Note:** Setting `logprobs` to 0 is different from `null`. When set to `null`, log probabilities are disabled entirely. When set to 0, log probabilities are enabled but `top_logprobs` is not returned.

## Completion Response

### `choices`

The list of completion choices the model generated for the input prompt.

### `choices[].finish_reason`

The reason the model stopped generating tokens: `stop` if the model hit a natural stop point or a provided stop sequence, `length` if the maximum number of tokens specified in the request was reached, or `content_filter` if content was omitted due to a flag from our content filters.

### `choices[].logprobs.text_offset`

The character offset of each token, counted from the start of the prompt.

### `choices[].logprobs.token_logprobs`

The log probability of each token.

### `choices[].logprobs.tokens`

The generated tokens, in order.

### `choices[].logprobs.top_logprobs`

The most likely tokens and their log probabilities at each token position. In rare cases, fewer entries than requested may be returned.

### `choices[].text`

The generated completion text.

### `choices[].tokens`

The raw tokens of the completion, returned when `return_raw_tokens` is set to `true`.

### `created`

The Unix timestamp (in seconds) of when the completion was created.

### `id`

A unique identifier for the completion.

### `model`

The model used for completion.

### `object`

The object type, which is always `text_completion`.

### `system_fingerprint`

This fingerprint represents the backend configuration that the model runs with. Use it together with the `seed` request parameter to detect backend changes that might affect determinism.

### `usage`

Usage statistics for the completion request.

### `usage.prompt_tokens`

Number of tokens in the prompt.

### `usage.completion_tokens`

Number of tokens in the generated completion.

### `usage.total_tokens`

Total number of tokens used in the request (prompt + completion).

### `usage.prompt_tokens_details`

Detailed breakdown of prompt token usage.

### `usage.prompt_tokens_details.cached_tokens`

Number of prompt tokens that were served from the cache and reused from a previous request. See [Prompt Caching](/capabilities/prompt-caching) for more information.

### `time_info`

Performance timing information for the request.
### `time_info.queue_time`

Time spent in queue waiting for processing (in seconds).

### `time_info.prompt_time`

Time spent processing the prompt/input tokens (in seconds).

### `time_info.completion_time`

Time spent generating the completion/output tokens (in seconds).

### `time_info.total_time`

Total time for the entire request from submission to completion (in seconds).

### `time_info.created`

Unix timestamp (in seconds) of when the `time_info` was recorded.

## Example

```python
import os

from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),  # This is the default and can be omitted
)

completion = client.completions.create(
    prompt="It was a dark and stormy night",
    max_tokens=100,
    model="gpt-oss-120b",
    logprobs=5,
)

print(completion)
```

```javascript
import Cerebras from '@cerebras/cerebras_cloud_sdk';

const client = new Cerebras({
  apiKey: process.env['CEREBRAS_API_KEY'], // This is the default and can be omitted
});

async function main() {
  const completion = await client.completions.create({
    prompt: "It was a dark and stormy night",
    max_tokens: 100,
    model: 'gpt-oss-120b',
    logprobs: 5,
  });

  console.log(completion?.choices[0]?.text);
}

main();
```

```bash
curl -X POST https://api.cerebras.ai/v1/completions \
  -H "Authorization: Bearer $CEREBRAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "It was a dark and stormy night",
    "max_tokens": 100,
    "model": "gpt-oss-120b",
    "logprobs": 5
  }'
```

Example response:

```json
{
  "id": "chatcmpl-b1743f2d-8c20-4ad5-a77c-426521ae1b1c",
  "choices": [
    {
      "index": 0,
      "finish_reason": "length",
      "logprobs": {
        "text_offset": [30, 34, 40, 44, 47, 52],
        "token_logprobs": [
          -3.3912997245788574,
          -4.322617053985596,
          -0.3532562553882599,
          -1.7001153230667114,
          -2.187103271484375,
          -2.4667720794677734
        ],
        "tokens": [" and", " there", " was", " no", " moon", ","],
        "top_logprobs": [
          {
            ".": -1.1334871053695679,
            ",": -1.3366121053695679,
            " when": -2.3366122245788574,
            " in": -2.9381747245788574,
            "...": -3.3287997245788574,
            " and": -3.3912997245788574
          },
          {
            " I": -0.9554294943809509,
            " the": -1.7679295539855957,
            " a": -2.2366795539855957,
            " all": -3.1898045539855957,
            " we": -3.4241795539855957,
            " there": -4.322617053985596
          },
          {
            " was": -0.3532562553882599,
            " were": -1.6813812255859375,
            " I": -4.0251312255859375,
            " wasn": -4.1345062255859375,
            " had": -4.1970062255859375
          },
          {
            " a": -0.3719903528690338,
            " no": -1.7001153230667114,
            " an": -3.614177942276001,
            " this": -4.129802703857422,
            " nothing": -4.184490203857422
          },
          {
            " electricity": -1.3433531522750854,
            " one": -1.6949156522750854,
            " moon": -2.187103271484375,
            " way": -3.038665771484375,
            " sign": -3.054290771484375
          },
          {
            ".": -0.8495846390724182,
            " to": -2.1386470794677734,
            " in": -2.2870845794677734,
            ",": -2.4667720794677734,
            ".\n": -3.2714595794677734
          }
        ]
      },
      "text": " and there was no moon,",
      "tokens": null
    }
  ],
  "created": 1769297019,
  "model": "gpt-oss-120b",
  "object": "text_completion",
  "system_fingerprint": "fp_feb5e1faa8274e54bef0",
  "time_info": {
    "completion_time": 0.002412253,
    "prompt_time": 0.000347391,
    "queue_time": 0.000245702,
    "total_time": 0.004331350326538086,
    "created": 1769297019.2632425
  },
  "usage": {
    "completion_tokens": 6,
    "prompt_tokens": 8,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "total_tokens": 14
  }
}
```
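With the `Content-Encoding: gzip` request header, the client is responsible for compressing the JSON body itself. The sketch below builds such a request using only Python's standard library. It assumes a JSON body and the `https://api.cerebras.ai/v1/completions` endpoint from the cURL example, and `build_gzip_request` is a hypothetical helper name. The request is constructed but not sent.

```python
import gzip
import json
import os
import urllib.request

# Endpoint taken from the cURL example on this page.
API_URL = "https://api.cerebras.ai/v1/completions"


def build_gzip_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (without sending) a POST whose JSON body is gzip-compressed."""
    body = gzip.compress(json.dumps(payload).encode("utf-8"))
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # media type of the uncompressed body
        "Content-Encoding": "gzip",  # tells the server the body is gzip-compressed
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


req = build_gzip_request(
    {"model": "gpt-oss-120b", "prompt": "It was a dark and stormy night", "max_tokens": 100},
    api_key=os.environ.get("CEREBRAS_API_KEY", "YOUR_API_KEY"),
)
# Sending it would be: urllib.request.urlopen(req)
```

Note that `urllib.request` normalizes header names, so the header is read back as `req.get_header("Content-encoding")`.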
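When `stream` is `true`, tokens arrive as data-only server-sent events ending in `data: [DONE]`. If you read the HTTP stream yourself instead of through an SDK, a minimal parser might look like the sketch below. The chunk shape (new text in `choices[0].text`) is an assumption modeled on the non-streaming response, and the sample event lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` event, stopping at `data: [DONE]`."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators between events and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the text deltas, assuming each chunk carries choices[0].text."""
    return "".join(chunk["choices"][0].get("text", "") for chunk in iter_sse_chunks(lines))


# Hypothetical event lines, trimmed to the fields this sketch reads.
sample = [
    'data: {"choices": [{"index": 0, "text": " and"}]}',
    "",
    'data: {"choices": [{"index": 0, "text": " there"}]}',
    "",
    "data: [DONE]",
]
print(repr(collect_text(sample)))  # ' and there'
```

In practice, `lines` could come from iterating over a streaming HTTP response and decoding each line as UTF-8.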
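Several body parameters have documented limits: `temperature` from 0 to 1.5, `top_p` from 0 to 1, `logprobs` from 0 to 20, at most 4 `stop` sequences, a `prompt_cache_key` of at most 1024 characters, and no `echo` together with `return_raw_tokens`. A hypothetical client-side check for those limits could look like this. It is not part of any Cerebras SDK, and the server remains the authority on what it accepts.

```python
from typing import Any


def validate_completion_params(params: dict[str, Any]) -> list[str]:
    """Return human-readable problems with a completions request body (empty if none).

    Hypothetical helper: it only mirrors the limits listed in this reference.
    """
    problems: list[str] = []

    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 1.5:
        problems.append("temperature must be between 0 and 1.5")

    top_p = params.get("top_p")
    if top_p is not None and not 0 <= top_p <= 1:
        problems.append("top_p must be between 0 and 1")

    logprobs = params.get("logprobs")
    if logprobs is not None and not 0 <= logprobs <= 20:
        problems.append("logprobs must be between 0 and 20")

    stop = params.get("stop")
    # Assumption: stop may also be a single string, which counts as one sequence.
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")

    key = params.get("prompt_cache_key")
    if key is not None and len(key) > 1024:
        problems.append("prompt_cache_key is limited to 1024 characters")

    if params.get("echo") and params.get("return_raw_tokens"):
        problems.append("echo cannot be combined with return_raw_tokens")

    return problems


ok = validate_completion_params({"model": "gpt-oss-120b", "prompt": "Hi", "temperature": 0.2})
bad = validate_completion_params(
    {"temperature": 2.0, "stop": ["a", "b", "c", "d", "e"], "echo": True, "return_raw_tokens": True}
)
print(ok)  # []
print(len(bad))  # 3
```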
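To build intuition for the `temperature` and `top_p` parameters, the sketch below applies each to a toy next-token distribution. It is a generic textbook illustration of temperature scaling and nucleus sampling, not Cerebras's server-side sampler. Treating a temperature of 0 as greedy selection is an assumption of this sketch, and the distribution is made up.

```python
import math


def apply_temperature(logprobs: dict[str, float], temperature: float) -> dict[str, float]:
    """Divide log-probabilities by the temperature and renormalize with a softmax."""
    if temperature == 0:
        # Assumption for this sketch: temperature 0 means always pick the top token.
        best = max(logprobs, key=logprobs.get)
        return {tok: 1.0 if tok == best else 0.0 for tok in logprobs}
    scaled = {tok: lp / temperature for tok, lp in logprobs.items()}
    peak = max(scaled.values())  # subtract the max before exp for numerical stability
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def nucleus_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p,
    then renormalize the survivors (probs is assumed to sum to 1)."""
    kept: dict[str, float] = {}
    mass = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}


# Made-up next-token distribution.
dist = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
logprobs = {tok: math.log(p) for tok, p in dist.items()}

print(nucleus_filter(dist, 0.1))  # {'a': 1.0}
top_two = nucleus_filter(dist, 0.75)  # keeps 'a' and 'b' (about 0.625 and 0.375)
cooler = apply_temperature(logprobs, 0.5)  # 'a' rises from 0.5 to about 0.68
hotter = apply_temperature(logprobs, 1.5)  # probability mass spreads toward 'c' and 'd'
```

Both knobs reshape the same distribution from different directions, which is one way to read the recommendation to adjust `temperature` or `top_p` but not both.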
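The `logprobs` object in the example response can be checked directly. There, the first `text_offset` is 30, which is exactly the length of the prompt, so the offsets index into the prompt followed by the completion text. The sketch copies those values, verifies the offsets, and converts log probabilities into probabilities and a perplexity. Perplexity is a standard derived metric, not a field the API returns.

```python
import math

# Values copied from choices[0] of the example response.
prompt = "It was a dark and stormy night"
text = " and there was no moon,"
tokens = [" and", " there", " was", " no", " moon", ","]
text_offset = [30, 34, 40, 44, 47, 52]
token_logprobs = [
    -3.3912997245788574,
    -4.322617053985596,
    -0.3532562553882599,
    -1.7001153230667114,
    -2.187103271484375,
    -2.4667720794677734,
]

# In this example the offsets are measured from the start of the prompt, so slice prompt + text.
full = prompt + text
spans = [full[start:start + len(tok)] for tok, start in zip(tokens, text_offset)]
print(spans == tokens)  # True

# Probability of each sampled token; " was" comes out at about 0.70.
probs = [math.exp(lp) for lp in token_logprobs]

# Perplexity: exp of the negative mean log probability over the completion.
perplexity = math.exp(-sum(token_logprobs) / len(token_logprobs))
print(round(perplexity, 2))
```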
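The `usage` and `time_info` objects support simple derived metrics. The sketch below computes output throughput during the generation phase and the share of prompt tokens served from the prompt cache, using the values in the example response. The helper `summarize_usage` and its metric names are hypothetical; the API does not return them.

```python
def summarize_usage(usage: dict, time_info: dict) -> dict:
    """Derive consistency, throughput, and cache-hit metrics from a completion response."""
    prompt_tokens = usage["prompt_tokens"]
    completion_tokens = usage["completion_tokens"]
    # prompt_tokens_details may be absent or null, so fall back to zero cached tokens
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens") or 0
    completion_time = time_info["completion_time"]
    return {
        # total_tokens is documented as prompt + completion
        "total_tokens_consistent": usage["total_tokens"] == prompt_tokens + completion_tokens,
        # Output tokens per second during the generation phase only
        "output_tokens_per_second": completion_tokens / completion_time if completion_time else 0.0,
        # Fraction of the prompt reused from the prompt cache
        "cache_hit_ratio": cached / prompt_tokens if prompt_tokens else 0.0,
    }


# Values copied from the example response above.
usage = {
    "completion_tokens": 6,
    "prompt_tokens": 8,
    "prompt_tokens_details": {"cached_tokens": 0},
    "total_tokens": 14,
}
time_info = {
    "completion_time": 0.002412253,
    "prompt_time": 0.000347391,
    "queue_time": 0.000245702,
    "total_time": 0.004331350326538086,
}

report = summarize_usage(usage, time_info)
print(report["total_tokens_consistent"])  # True
print(report["cache_hit_ratio"])  # 0.0
```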