Make your first API call and see what inference at thousands of tokens per second feels like. Already familiar with LLM APIs? Skip straight to the API reference or try the playground.

Prerequisites

To complete this guide, you will need:
  • A Cerebras account (sign up free)
  • A Cerebras Inference API key
  • Python 3.10+ or TypeScript 4.5+
1. Set up your API key

Visit the Cloud Console and navigate to API Keys in the left navigation bar to create a key.

Set your API key as an environment variable so you don’t have to pass it with every request:
export CEREBRAS_API_KEY="your-api-key-here"
Confirm the variable is set:
echo $CEREBRAS_API_KEY
2. Install the SDK

Install the Cerebras SDK for your language of choice; the command below installs the Python package. You can also call the API directly with cURL (see Step 3).
pip install --upgrade cerebras_cloud_sdk
3. Make your first API request

Run the following code to send a chat completion request:
import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Why is fast inference important?",
        }
    ],
    model="llama3.1-8b",
)

print(chat_completion.choices[0].message.content)
You should see a response like:
Fast inference is important because it enables real-time interactions,
reduces latency in production applications, and allows for more complex
reasoning workflows within acceptable response times...
If you get a 401 Unauthorized error, double-check that your CEREBRAS_API_KEY environment variable is set correctly.
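Step 2 mentions that the API can be called directly, without the SDK. The sketch below shows roughly what such a request could look like using only the Python standard library, with the 401 case handled explicitly. Treat the details as assumptions: the endpoint URL is not stated in this guide, the response is assumed to be JSON with the same shape as the SDK object used above, and `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import os
import urllib.error
import urllib.request

# Assumption: this endpoint does not appear in the guide above.
API_URL = "https://api.cerebras.ai/v1/chat/completions"


def build_chat_request(
    api_key: str, prompt: str, model: str = "llama3.1-8b"
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text."""
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            raise RuntimeError(
                "401 Unauthorized: check that CEREBRAS_API_KEY is set correctly"
            ) from err
        raise
    # Assumed to mirror chat_completion.choices[0].message.content in the SDK.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    key = os.environ.get("CEREBRAS_API_KEY")
    if not key:
        print("CEREBRAS_API_KEY is not set; skipping the request")
    else:
        print(send(build_chat_request(key, "Why is fast inference important?")))
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before anything goes over the network. For real applications the SDK is likely the better choice, since it handles these details for you.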

Next Steps