Build at the Speed of Cerebras
Experience real-time AI responses across coding, reasoning, voice, and agentic workloads with the world’s fastest AI inference.
import os

from cerebras.cloud.sdk import Cerebras

# Read the API key from the environment rather than hard-coding it.
client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)

# Send a single-turn chat request.
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Why is fast inference important?",
        }
    ],
    model="llama3.1-8b",
)

# Print the full response object.
print(chat_completion)
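
Printing the whole response object is useful for a first look, but most applications only need the reply text. The following is a minimal sketch, assuming the response follows the OpenAI-style shape in which the text lives at `choices[0].message.content`. The helper name `extract_reply` and the stand-in object `fake_completion` are hypothetical and exist only so the sketch runs without an API key or network access.

```python
from types import SimpleNamespace


def extract_reply(chat_completion):
    """Return the text of the first choice, or None if there are no choices.

    Assumes an OpenAI-style response shape: choices[0].message.content.
    """
    choices = getattr(chat_completion, "choices", None) or []
    if not choices:
        return None
    return choices[0].message.content


# Hypothetical stand-in with the assumed shape; replace it with the real
# object returned by client.chat.completions.create(...).
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="Low latency keeps apps responsive.")
        )
    ]
)

print(extract_reply(fake_completion))
```

With a real client, passing the `chat_completion` object from the snippet above to `extract_reply` should return only the assistant's message, provided the assumed shape holds.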

Explore Models

View our available models, including performance specifications, rate limits, and pricing details.

Dedicated Endpoints

Private, high-performance inference endpoints with reserved capacity and guaranteed throughput for production workloads.

Start building