
Documentation Index

Fetch the complete documentation index at: https://inference-docs.cerebras.ai/llms.txt

Use this file to discover all available pages before exploring further.

Playground

The Playground lets you experiment with Cerebras models directly in the browser. Use it to evaluate models, iterate on prompts, test tool/function calling, and tune parameters, then export a working request to code when you're ready.
Your playground and API requests are never used to train models.

Message Roles

| Role | Purpose |
| --- | --- |
| System | Sets the behavior and context for the model. Use this to define a persona, provide background information, or constrain responses. |
| User | Represents input from the human turn. This is the prompt the model responds to. |
| Assistant | Represents a prior model response. Insert assistant messages to simulate a multi-turn conversation or prime the model toward a particular style or format. |
Use the Add button to append a new User or Assistant message to the conversation without running inference; this lets you build multi-turn conversations manually. Click Run to send the full conversation to the model. After each response, the Playground displays token usage, inference time, speed (tokens per second), and round-trip time in the upper-right corner of the response.
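The three roles above map directly onto the `messages` array of a chat request. Here is a minimal sketch of a manually built multi-turn conversation; the message contents are illustrative, not from the API reference.

```python
# Sketch: how System, User, and Assistant roles form a conversation.
messages = [
    # System: defines the persona and constraints for the whole conversation.
    {"role": "system", "content": "You are a concise technical assistant."},
    # User: the human turn the model responds to.
    {"role": "user", "content": "What is nucleus sampling?"},
    # Assistant: a prior model response, inserted to prime style and format.
    {"role": "assistant", "content": "Nucleus sampling limits generation to the top-p probability mass."},
    # A follow-up user turn makes this a multi-turn conversation.
    {"role": "user", "content": "Give a one-line definition."},
]

roles = [m["role"] for m in messages]
```

Clicking Run sends the whole list, in order, as the model's context.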

Configuration

Select a model from the dropdown at the top right. See the Models overview to learn more about the available options.
| Parameter | What it controls |
| --- | --- |
| Temperature | Randomness of the output. Lower values produce more focused, predictable responses; higher values produce more varied responses. |
| Max Completion Tokens | Maximum number of tokens the model will generate in a single response. |
| Top P | Nucleus sampling threshold. Limits the model to sampling from the top portion of the probability distribution. |
| Format | Output format: text (default), json_object, or json_schema. |
| Functions | Define tool/function schemas the model can call. See Tool Calling for the full reference. |
| Reasoning Effort | Controls how much effort the model spends reasoning before responding. Only available for select models. |
| Seed | Set an integer to produce deterministic outputs across requests with the same input. |
| Stream | Stream tokens as they're generated rather than returning the full response at once. |
| Stop Sequence | A string that causes the model to stop generating when produced. Useful for structured outputs. |
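As a sketch of how these settings translate to an API request, the snippet below builds an OpenAI-compatible chat-completion body. The parameter names mirror the Playground settings; the model name and values are illustrative assumptions, not defaults.

```python
import json

# Sketch: Playground settings expressed as a chat-completion request body.
# Model name and values are illustrative.
request_body = {
    "model": "llama-3.3-70b",            # chosen from the model dropdown
    "messages": [{"role": "user", "content": "List three primes as JSON."}],
    "temperature": 0.2,                  # low => focused, predictable output
    "max_completion_tokens": 256,        # cap on generated tokens
    "top_p": 0.9,                        # nucleus sampling threshold
    "response_format": {"type": "json_object"},  # Format: json_object
    "seed": 42,                          # deterministic across identical requests
    "stream": False,                     # return the full response at once
    "stop": ["\n\n"],                    # stop sequence
}

payload = json.dumps(request_body)
```

Streaming and reasoning effort, where supported, are set the same way as top-level request fields.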

View Code

Once you have a prompt and parameters you’re happy with, click View Code to get a ready-to-run code snippet for your application.
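For a sense of what an exported request looks like, here is a hedged sketch using only the Python standard library. It assumes the OpenAI-compatible endpoint `https://api.cerebras.ai/v1/chat/completions` and an API key in the `CEREBRAS_API_KEY` environment variable; the model name is illustrative. The exported snippet from View Code may differ.

```python
import json
import os
import urllib.request

# Assumed endpoint for the OpenAI-compatible chat completions API.
API_URL = "https://api.cerebras.ai/v1/chat/completions"

def build_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion HTTP request."""
    body = json.dumps({
        "model": "llama-3.3-70b",  # illustrative model name
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.7,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_request(os.environ.get("CEREBRAS_API_KEY", "demo-key"))

# Only send when a real key is configured; otherwise just inspect the request.
if os.environ.get("CEREBRAS_API_KEY"):
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

In practice you would paste the snippet View Code generates, which already carries over your exact model, messages, and parameter settings.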