Completions - Cerebras Inference

import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),  # This is the default and can be omitted
)

completion = client.completions.create(
    prompt="It was a dark and stormy night",
    max_tokens=100,
    model="llama3.1-8b",
)

print(completion)

{
    "id": "chatcmpl-b8718798-d389-4421-9242-13b07e84983b",
    "choices": [
        {
            "finish_reason": "length",
            "index": 0,
            "text": " when I stumbled upon a small, quirky shop tucked away in a quiet alley. The sign above the door read \"Curios and Wonders,\" and the windows were filled with a dazzling array of strange and exotic items. I pushed open the door and stepped inside, my eyes adjusting to the dim light within.\n\nThe shop was a treasure trove of oddities, with shelves upon shelves of peculiar objects that seemed to defy explanation. There were vintage taxidermy animals, antique medical equipment, and"
        }
    ],
    "created": 1731597024,
    "model": "llama3.1-8b",
    "system_fingerprint": "fp_e8eacef18a",
    "object": "text_completion",
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 100,
        "total_tokens": 110
    },
    "time_info": {
        "queue_time": 4.673e-05,
        "prompt_time": 0.0004940576161616161,
        "completion_time": 0.045957338383838385,
        "total_time": 0.058876991271972656,
        "created": 1731597024
    }
}

import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),  # This is the default and can be omitted
)

completion = client.completions.create(
    prompt="It was a dark and stormy night",
    max_tokens=100,
    model="llama3.1-8b",
)

print(completion)

{
    "id": "chatcmpl-b8718798-d389-4421-9242-13b07e84983b",
    "choices": [
        {
            "finish_reason": "length",
            "index": 0,
            "text": " when I stumbled upon a small, quirky shop tucked away in a quiet alley. The sign above the door read \"Curios and Wonders,\" and the windows were filled with a dazzling array of strange and exotic items. I pushed open the door and stepped inside, my eyes adjusting to the dim light within.\n\nThe shop was a treasure trove of oddities, with shelves upon shelves of peculiar objects that seemed to defy explanation. There were vintage taxidermy animals, antique medical equipment, and"
        }
    ],
    "created": 1731597024,
    "model": "llama3.1-8b",
    "system_fingerprint": "fp_e8eacef18a",
    "object": "text_completion",
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 100,
        "total_tokens": 110
    },
    "time_info": {
        "queue_time": 4.673e-05,
        "prompt_time": 0.0004940576161616161,
        "completion_time": 0.045957338383838385,
        "total_time": 0.058876991271972656,
        "created": 1731597024
    }
}

Completion Request

prompt

string | array

The prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Default: ""

model

string

required

Available options:

llama-4-scout-17b-16e-instruct
llama3.1-8b
llama-3.3-70b
llama-4-maverick-17b-128e
qwen-3-32b
qwen-3-235b-a22b
deepseek-r1-distill-llama-70b (private preview)

stream

boolean | null

If set, partial message deltas will be sent. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Default: false

return_raw_tokens

boolean | null

Return raw tokens instead of text. Default: false

max_tokens

integer | null

The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length. Default: null

min_tokens

integer | null

The minimum number of tokens to generate for a completion. If not specified or set to 0, the model will generate as many tokens as it deems necessary. Setting to -1 sets to max sequence length. Default: null

grammar_root

string | null

The grammar root used for structured output generation. Supported values: root, fcall, nofcall, insidevalue, value, object, array, string, number, funcarray, func, ws. Default: null

seed

integer | null

If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. Default: null

stop

string | array | null

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. Default: null

temperature

float | null

What sampling temperature to use, between 0 and 1.5. Higher values (e.g., 0.8) will make the output more random, while lower values (e.g., 0.2) will make it more focused and deterministic. We generally recommend altering this or top_p but not both. Default: 1.0

top_p

float | null

An alternative to sampling with temperature, called nucleus sampling, where the model considers the tokens with top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. Default: 1.0

echo

boolean

Echo back the prompt in addition to the completion. Incompatible with return_raw_tokens=True. Default: false

user

string | null

A unique identifier representing your end-user, which can help Cerebras to monitor and detect abuse. Default: null

Completion Response

choices

object[]

required

The list of completion choices the model generated for the input prompt.

Show properties

finish_reason

string | null

The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, or content_filter if content was omitted due to a flag from our content filters.

index

integer

logprobs

object | null

text

string

created

integer | null

required

The Unix timestamp (in seconds) of when the completion was created.

string

A unique identifier for the completion.

model

string

The model used for completion.

object

string

required

The object type, which is always “text_completion”

system_fingerprint

string

This fingerprint represents the backend configuration that the model runs with.Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.

usage

object

Usage statistics for the completion request.

import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),  # This is the default and can be omitted
)

completion = client.completions.create(
    prompt="It was a dark and stormy night",
    max_tokens=100,
    model="llama3.1-8b",
)

print(completion)

{
    "id": "chatcmpl-b8718798-d389-4421-9242-13b07e84983b",
    "choices": [
        {
            "finish_reason": "length",
            "index": 0,
            "text": " when I stumbled upon a small, quirky shop tucked away in a quiet alley. The sign above the door read \"Curios and Wonders,\" and the windows were filled with a dazzling array of strange and exotic items. I pushed open the door and stepped inside, my eyes adjusting to the dim light within.\n\nThe shop was a treasure trove of oddities, with shelves upon shelves of peculiar objects that seemed to defy explanation. There were vintage taxidermy animals, antique medical equipment, and"
        }
    ],
    "created": 1731597024,
    "model": "llama3.1-8b",
    "system_fingerprint": "fp_e8eacef18a",
    "object": "text_completion",
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 100,
        "total_tokens": 110
    },
    "time_info": {
        "queue_time": 4.673e-05,
        "prompt_time": 0.0004940576161616161,
        "completion_time": 0.045957338383838385,
        "total_time": 0.058876991271972656,
        "created": 1731597024
    }
}

Chat Completions

Models

Endpoints

​Completion Request

​Completion Response

Completion Request

Completion Response