Cerebras Inference on OpenRouter

OpenRouter provides a unified API that gives you access to multiple AI providers, including Cerebras, through a single interface. You can use familiar tools and SDKs to reach Cerebras’s fast inference without restructuring your existing code. For the complete list of models served by Cerebras Inference on OpenRouter, visit the OpenRouter site.

Prerequisites

Before you begin, ensure you have:

OpenRouter API Key - Create a free account and get your API key at OpenRouter
Python 3.8 or higher (for Python examples) or Node.js 16+ (for JavaScript examples)

Quick Start

Install Dependencies

The Python examples on this page use the requests library:

pip install requests

Set Your API Key

Create a .env file in your project directory:

OPENROUTER_API_KEY=your_openrouter_key_here

Or set it as an environment variable:

export OPENROUTER_API_KEY="your_openrouter_key_here"
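Note that `os.getenv` reads only the process environment, so a key stored in a .env file has to be loaded before the examples below can see it. The following is a minimal sketch of such a loader using only the standard library. The helper name `load_env_file` is hypothetical, and it handles only simple `KEY=value` lines; a dedicated package such as python-dotenv is a common alternative.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=value pairs from a .env-style file into os.environ.

    Hypothetical helper: skips blank lines and '#' comments, and never
    overrides a variable that is already set in the environment.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, as in KEY="value"
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


# Load .env from the current directory (a no-op if the file is absent)
load_env_file()
api_key = os.getenv("OPENROUTER_API_KEY")
```

Because the loader uses `setdefault`, a value exported in the shell takes precedence over the one in the file.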

Make Your First API Call

The following Python request queries Llama 3.3 70B on Cerebras:

import os
import requests

# Set your OpenRouter API key
api_key = os.getenv("OPENROUTER_API_KEY")

url = "https://openrouter.ai/api/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the request payload
data = {
    # OpenRouter model IDs use the author/slug form
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
}

# Send the POST request
response = requests.post(url, headers=headers, json=data)

# Print the response
print(response.json())
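Printing `response.json()` dumps the whole response body. To pull out just the assistant's text, a small helper along these lines may be useful. This is a sketch, not part of any SDK: the name `extract_reply` is hypothetical, the `choices[0].message.content` path matches what the structured-output example on this page reads, and the shape of the `error` object is an assumption.

```python
def extract_reply(result):
    """Return the assistant's text from a parsed chat completion response.

    Raises RuntimeError when the body carries an "error" object or has no
    choices, instead of failing later with a KeyError.
    """
    if "error" in result:
        error = result["error"]
        # Assumed shape: {"error": {"message": "..."}}; fall back to the raw value
        message = error.get("message", error) if isinstance(error, dict) else error
        raise RuntimeError(f"Request failed: {message}")
    choices = result.get("choices") or []
    if not choices:
        raise RuntimeError(f"No choices in response: {result}")
    return choices[0]["message"]["content"]


# Illustrative response body (hand-written, not real model output)
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Paris."}}
    ]
}
print(extract_reply(sample))
```

With the request above, the call would be `extract_reply(response.json())`.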

Advanced Examples

Your first API call is complete. Next, you can make the model more capable at handling tasks and more precise in how it formats its responses by using structured outputs and tool calling. The example below uses structured outputs to constrain the response to a JSON schema.

import os
import requests
import json

api_key = os.getenv("OPENROUTER_API_KEY")
url = "https://openrouter.ai/api/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the JSON schema for the expected output
movie_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "director": {"type": "string"},
        "year": {"type": "integer"}
    },
    "required": ["title", "director", "year"],
    "additionalProperties": False
}

data = {
    # OpenRouter model IDs use the author/slug form
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant that generates movie recommendations."},
        {"role": "user", "content": "Suggest a sci-fi movie from the 1990s."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "movie_schema",
            "strict": True,
            "schema": movie_schema
        }
    }
}

# Send the request, then parse and display the result
response = requests.post(url, headers=headers, json=data)
result = response.json()
if 'choices' in result:
    movie_data = json.loads(result['choices'][0]['message']['content'])
    print(json.dumps(movie_data, indent=2))
else:
    print(f"Error: {result}")
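This section also mentions tool calling. The sketch below shows the client side of that flow under the assumption that the endpoint accepts the OpenAI-compatible `tools` field and returns `tool_calls` on the assistant message. The `get_weather` function, `TOOL_REGISTRY`, and `run_tool_calls` are hypothetical names introduced for illustration, and the sample assistant message is hand-written rather than real model output.

```python
import json


# Hypothetical local tool; replace with a real implementation
def get_weather(city):
    return {"city": city, "forecast": "sunny", "temperature_c": 21}


TOOL_REGISTRY = {"get_weather": get_weather}

# Tool definition to send with the request: data["tools"] = tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]


def run_tool_calls(assistant_message):
    """Run each requested tool and return the matching "tool" messages."""
    tool_messages = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        handler = TOOL_REGISTRY[function["name"]]
        # Arguments arrive as a JSON-encoded string
        arguments = json.loads(function["arguments"])
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(handler(**arguments)),
        })
    return tool_messages


# Illustrative assistant message requesting one tool call
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
    }],
}

# Messages for the follow-up request that lets the model use the tool result
follow_up_messages = messages + [sample_message] + run_tool_calls(sample_message)
print(json.dumps(follow_up_messages[-1], indent=2))
```

In a real exchange, the first request carries `tools` alongside `messages`, and the second request sends `follow_up_messages` so the model can compose its final answer from the tool output.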

Available Models

OpenRouter provides access to Cerebras models, including:

Model	Parameters	Best For
llama-3.3-70b	70B	Complex reasoning, long-form content, and tasks requiring deep understanding
qwen-3-32b	32B	Balanced performance for general-purpose applications
llama3.1-8b	8B	Fastest option for simple tasks and high-throughput scenarios
gpt-oss-120b	120B	Large model for highly demanding tasks
zai-glm-4.7	357B	Largest model listed, with strong reasoning capabilities

Visit the OpenRouter Cerebras provider page for the complete list of available models.

FAQ

What context length can I run?

This varies by model. See the Cerebras provider page on OpenRouter for the maximum context length of each model.

What additional latency can I expect when using Cerebras through OpenRouter?

Expect a small amount of added latency. OpenRouter acts as a proxy, so your request goes to OpenRouter first and is then forwarded to Cerebras Inference.
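To see what latency you observe in practice, a small timing wrapper can help. This is a minimal sketch; `timed` is a hypothetical helper, and it measures the full round trip (network, proxy, and generation time), not the proxy overhead on its own.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Usage with the request from the Quick Start (not executed here):
# response, seconds = timed(requests.post, url, headers=headers, json=data)
# print(f"Round trip took {seconds:.3f} s")
```

Averaging over several requests gives a steadier number than a single call.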

Why do I see “Wrong API Format” when running the OpenRouter test code?

The official OpenRouter inference example uses a multimodal input format, which Cerebras does not currently support. To avoid this error, use the code in the Make Your First API Call section above, which sends each message as plain text.
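If existing code already builds multimodal-style messages, one option is to convert them to plain-text messages before sending. The sketch below assumes the OpenAI-style format in which `content` is a list of typed parts such as `{"type": "text", ...}` and `{"type": "image_url", ...}`. The helper name `to_text_only` is hypothetical, and it simply drops non-text parts, which is only appropriate when the images are not needed.

```python
def to_text_only(messages):
    """Return a copy of messages whose content is always a plain string."""
    converted = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            # Keep only the text parts and join them into one string
            text_parts = [
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            ]
            content = "\n".join(text_parts)
        converted.append({**message, "content": content})
    return converted


multimodal_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }
]
print(to_text_only(multimodal_messages))
```

Messages whose content is already a string pass through unchanged.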
