Completion Request

prompt
The prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays.
Default: ""

model
Available options: llama3.1-8b, llama-3.3-70b, qwen-3-32b, qwen-3-235b-a22b-instruct-2507 (preview), qwen-3-235b-a22b-thinking-2507 (preview), qwen-3-coder-480b (preview)
stream
If set, partial message deltas will be sent. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
Default: false
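A hedged sketch of consuming such a stream follows. Only the data: prefix and the data: [DONE] terminator come from the description above; the endpoint URL, auth header, and per-chunk JSON shape are assumptions patterned on OpenAI-compatible APIs.

```python
import json
import os

import requests  # third-party HTTP client

# Assumption: the completions endpoint follows the OpenAI-compatible
# path layout and bearer-token auth. Verify against your account docs.
URL = "https://api.cerebras.ai/v1/completions"
headers = {"Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}"}

payload = {
    "model": "llama3.1-8b",
    "prompt": "Say hello.",
    "max_tokens": 32,
    "stream": True,  # request data-only server-sent events
}

with requests.post(URL, json=payload, headers=headers, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank SSE separators
        data = line[len("data: "):]
        if data == "[DONE]":  # stream terminator per the spec above
            break
        chunk = json.loads(data)
        # Assumption: each streamed chunk mirrors the completion shape,
        # carrying partial text under choices[0]["text"].
        print(chunk["choices"][0]["text"], end="", flush=True)
```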
return_raw_tokens
Return raw tokens instead of text.
Default: false

max_tokens
The maximum number of tokens that can be generated in the completion. The total length of input tokens and generated tokens is limited by the model's context length.
Default: null

min_tokens
The minimum number of tokens to generate for a completion. If not specified or set to 0, the model will generate as many tokens as it deems necessary. Setting it to -1 sets it to the maximum sequence length.
Default: null
grammar_root
The grammar root used for structured output generation. Supported values: root, fcall, nofcall, insidevalue, value, object, array, string, number, funcarray, func, ws.
Default: null
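As a hedged illustration of grammar_root, the payload sketch below constrains generation to a JSON object. Only the parameter name and its supported values come from this reference; what each root rule matches is an assumption patterned on JSON grammar terminology.

```python
# Hedged sketch: request body asking for structured output.
# grammar_root and its value list are from the reference above; the
# exact grammar each root selects is otherwise an assumption.
payload = {
    "model": "llama3.1-8b",
    "prompt": "Describe a user as JSON with name and age fields.",
    "max_tokens": 100,
    # "object" is one of the supported roots: root, fcall, nofcall,
    # insidevalue, value, object, array, string, number, funcarray,
    # func, ws. It should constrain the output to a JSON object.
    "grammar_root": "object",
}
```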
seed
If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed.
Default: null

stop
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
Default: null
temperature
What sampling temperature to use, between 0 and 1.5. Higher values (e.g., 0.8) will make the output more random, while lower values (e.g., 0.2) will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
Default: 1.0

top_p
An alternative to sampling with temperature, called nucleus sampling, where the model considers the tokens with top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
Default: 1.0

echo
Echo back the prompt in addition to the completion. Incompatible with return_raw_tokens=True.
Default: false

user
A unique identifier representing your end-user, which can help Cerebras to monitor and detect abuse.
Default: null
logprobs
Return log probabilities of the output tokens. For example, if logprobs is 5, the API will return a list of the 5 most likely tokens. The API will always return the logprob of the sampled token, so there may be up to logprobs+1 elements in the response. The max value is 20. Setting logprobs to 0 is different from null: when set to null, log probabilities are disabled entirely; when set to 0, log probabilities are enabled but top_logprobs are not returned.
Default: null
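Putting the request parameters together, here is a minimal non-streaming sketch over plain HTTP. The endpoint path and bearer-token header are assumptions based on the OpenAI-compatible conventions this reference follows; the payload fields are the parameters documented above.

```python
import os

import requests

# Assumption: OpenAI-compatible endpoint path and bearer-token auth.
URL = "https://api.cerebras.ai/v1/completions"
headers = {"Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}"}

payload = {
    "model": "llama3.1-8b",
    "prompt": "Write one sentence about the weather.",
    "max_tokens": 64,    # cap on generated tokens
    "temperature": 0.2,  # low value for focused, near-deterministic output
    "stop": ["\n\n"],    # up to 4 stop sequences
    "seed": 42,          # best-effort determinism across repeated requests
    "logprobs": 5,       # return the 5 most likely tokens at each step
}

resp = requests.post(URL, json=payload, headers=headers, timeout=60)
resp.raise_for_status()
completion = resp.json()
# Assumption: each choice carries its text under the "text" key, as in
# OpenAI-style completion responses.
print(completion["choices"][0]["text"])
```

Note that the sketch tunes temperature while leaving top_p at its default, following the recommendation above to alter one of the two but not both.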
Completion Response

choices
The list of completion choices the model generated for the input prompt.
created
The Unix timestamp (in seconds) of when the completion was created.

id
A unique identifier for the completion.

model
The model used for completion.

object
The object type, which is always "text_completion".

system_fingerprint
This fingerprint represents the backend configuration that the model runs with. Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.

usage
Usage statistics for the completion request.
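To make the response fields concrete, the sketch below parses an illustrative payload shaped like the object described above. Every value is made up for illustration, and the inner layout of choices and usage is an assumption patterned on OpenAI-style completion responses; only the top-level field names come from this reference.

```python
import json

# Illustrative payload only: all values are invented, and the nested
# structure of choices/usage is assumed (OpenAI-style). The top-level
# field names match the reference above.
raw = """
{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1720000000,
  "model": "llama3.1-8b",
  "system_fingerprint": "fp_example",
  "choices": [
    {"index": 0, "text": "It is sunny today.", "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 7, "completion_tokens": 6, "total_tokens": 13}
}
"""

completion = json.loads(raw)
assert completion["object"] == "text_completion"  # always this value
print(completion["id"], completion["created"])    # identifier, Unix seconds
print(completion["choices"][0]["text"])           # first generated choice
print(completion["usage"])                        # token accounting
# Log system_fingerprint alongside your request seed to spot backend
# changes that could affect determinism between identical requests.
print(completion["system_fingerprint"])
```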

