Build at the Speed of Cerebras

Experience real-time AI responses across coding, reasoning, voice, and agentic workloads with the world’s fastest AI inference.

Get an API key
Quickstart →

import os

from cerebras.cloud.sdk import Cerebras

# Read the API key from the environment rather than hard-coding it.
client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)

# Send a single-turn chat request.
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Why is fast inference important?",
        }
    ],
    model="gpt-oss-120b",
)

# Print the full response object.
print(chat_completion)
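The snippet above prints the entire response object. In most applications only the assistant's text is needed. The following is a minimal sketch, assuming the response follows the OpenAI-style shape (`choices[0].message.content`); the helper name `reply_text` and the stand-in object are hypothetical and exist only so the example runs without an API key.

```python
from types import SimpleNamespace


def reply_text(completion):
    """Return the assistant's text from an OpenAI-style chat completion object."""
    # Assumption: the response exposes choices[0].message.content.
    choices = getattr(completion, "choices", None) or []
    if not choices:
        return ""
    return choices[0].message.content or ""


# Stand-in response so the sketch runs without network access.
fake = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="Low latency keeps users engaged.")
        )
    ]
)
print(reply_text(fake))
```

With a real client, the same helper would be called as `reply_text(chat_completion)`.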

Explore Models

View our available models, including performance specifications, rate limits, and pricing details.

Dedicated Endpoints

Private, high-performance inference endpoints with reserved capacity and guaranteed throughput for production workloads.

Start building

Designing for Cerebras — Architectural patterns for building on wafer-scale inference.
OpenAI Compatibility — Migrate your existing code with minimal changes.
Integrations — Plug into popular AI frameworks and tools.
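To illustrate what OpenAI compatibility can look like at the HTTP level, here is a hedged sketch that builds (but does not send) an OpenAI-style chat completions request using only the Python standard library. The base URL `https://api.cerebras.ai/v1` is an assumption and should be confirmed in the OpenAI Compatibility guide; `build_chat_request` is a hypothetical helper name.

```python
import json
import os
import urllib.request

# Assumption: OpenAI-compatible base URL; verify against the official guide.
BASE_URL = "https://api.cerebras.ai/v1"


def build_chat_request(prompt, model="gpt-oss-120b", api_key=None):
    """Build (but do not send) an OpenAI-style chat completions request."""
    api_key = api_key or os.environ.get("CEREBRAS_API_KEY", "")
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("Why is fast inference important?", api_key="demo-key")
print(request.full_url)
# To send it with a real key: urllib.request.urlopen(request)
```

Code that already uses the OpenAI Python SDK would typically need only its `base_url` and `api_key` arguments changed, under the same assumption about the endpoint.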