Get started with the world’s fastest inference. Already familiar with LLM APIs? Skip straight to the API reference or try the playground.
1

Set up your API key

Visit the Cloud Console and sign up or log in. Navigate to API Keys in the left nav bar to create a key. See API Keys for details.
Set your API key as an environment variable so you don’t have to pass it with every request:
export CEREBRAS_API_KEY="your-api-key-here"
Confirm the variable is set:
echo $CEREBRAS_API_KEY
export (macOS and Linux) and $env: (PowerShell) set the variable for the current shell only. setx on Windows persists the variable, but you must open a new terminal window for it to take effect. To persist the variable on macOS or Linux, add the export line to your ~/.zshrc, ~/.bashrc, or equivalent shell profile.
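The same check can be made from Python before any request is sent, which catches the case where the script runs in a different shell than the one where the key was exported. This is a minimal sketch; the helper name `require_api_key` is hypothetical and not part of the Cerebras SDK.

```python
import os


def require_api_key(name: str = "CEREBRAS_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; it is not part of the Cerebras SDK.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set in this shell. Export it, or open a new "
            "terminal if you persisted it with setx or a shell profile."
        )
    return key
```

Calling `require_api_key()` at the top of a script turns a missing key into an immediate, readable error instead of a failed request later on.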
2

Install the SDK

Install the Cerebras SDK for your language of choice. You can also call the API directly with cURL (see Step 3).
pip install --upgrade cerebras_cloud_sdk
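To confirm the install landed in the Python environment you intend to use, you can query the package metadata with the standard library. This is a sketch under one assumption: that the distribution is registered under the same name used in the pip command, `cerebras_cloud_sdk`. The function name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name: str = "cerebras_cloud_sdk"):
    """Return the installed version string, or None if the package is absent.

    dist_name is assumed to match the name passed to pip install.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

If `installed_version()` returns `None`, pip most likely installed the package into a different interpreter or virtual environment than the one running your code.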
3

Make your first API request

Run the following code to send a chat completion request:
import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Why is fast inference important?",
        }
    ],
    model="gpt-oss-120b",
)

print(chat_completion.choices[0].message.content)
You should see a response like:
Fast inference is important because it enables real-time interactions,
reduces latency in production applications, and allows for more complex
reasoning workflows within acceptable response times...
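Step 2 mentions calling the API directly instead of through the SDK. The sketch below shows roughly what that looks like using only the Python standard library. Several details are assumptions not stated in this guide: the endpoint URL, the bearer-token header, and a JSON response whose shape mirrors the SDK's `choices[0].message.content`. The function names are hypothetical. Check the API reference before relying on any of it.

```python
import json
import urllib.request

# Assumed endpoint; confirm it against the API reference.
API_URL = "https://api.cerebras.ai/v1/chat/completions"


def build_chat_request(
    api_key: str, prompt: str, model: str = "gpt-oss-120b"
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumed auth scheme: bearer token in the Authorization header.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text.

    Assumes the JSON response mirrors the SDK object used in this step.
    """
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

Separating the request builder from the network call lets you inspect the payload and headers without spending a request.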

Common Errors

  • 401 Unauthorized — CEREBRAS_API_KEY isn’t set in the shell running your code. Re-run the echo command above in the same terminal to confirm. On Windows after setx, open a new terminal.
  • 404 model not found — The model ID is misspelled, deprecated, or not available on your account. See the full list of public models on the Models page.
  • 429 Too Many Requests — You’ve hit a rate limit. Free accounts have lower per-minute limits than paid accounts. See Rate limits for current quotas and how to request an increase.
For all status codes, see the error reference.
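One way to act on these errors in code is to map each status code to its hint and to retry only rate-limited calls with exponential backoff. The sketch below is deliberately SDK-agnostic: `hint_for`, `backoff_delays`, and `call_with_retry` are hypothetical helpers, and the caller supplies the `is_rate_limited` predicate because this guide does not show which exception the SDK raises for a 429.

```python
import random
import time

# Hints paraphrased from the list above; other status codes are not covered.
HINTS = {
    401: "Check that CEREBRAS_API_KEY is set in the shell running your code.",
    404: "Check the model ID for typos, deprecation, or account availability.",
    429: "Rate limit hit; wait and retry, or review your plan's quotas.",
}


def hint_for(status_code: int) -> str:
    """Return a troubleshooting hint for a status code."""
    return HINTS.get(status_code, "See the error reference for this status code.")


def backoff_delays(max_retries: int = 4, base: float = 1.0, cap: float = 30.0):
    """Yield exponentially growing delays (seconds) with full jitter."""
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(send, is_rate_limited, max_retries: int = 4, sleep=time.sleep):
    """Call send(); retry only when is_rate_limited(exc) returns True."""
    delays = backoff_delays(max_retries)
    while True:
        try:
            return send()
        except Exception as exc:
            delay = next(delays, None)
            # Give up when retries are exhausted or the error is not a 429.
            if delay is None or not is_rate_limited(exc):
                raise
            sleep(delay)
```

Errors such as 401 and 404 are not retried, since repeating the same request cannot fix a missing key or a wrong model ID.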

Next Steps