Learn how to use Cerebras Inference on OpenRouter.
This guide provides a step-by-step walkthrough for using the OpenRouter API to run inference on Cerebras hardware. For a complete list of Cerebras Inference-powered models available on OpenRouter, visit the OpenRouter site.
We currently support the Chat Completions endpoint via the OpenRouter platform, and you can get started with just a few lines of code. Follow the steps below.
1
Get an OpenRouter API Key
First, you will need to create an OpenRouter API key. You’ll use this key to authenticate with OpenRouter and access the Cerebras provider.
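If you’d rather not hard-code the key in your scripts, you can keep it in an environment variable. A minimal sketch, assuming you export it as OPENROUTER_API_KEY (the variable name is this guide’s convention, not an OpenRouter requirement):

import os

# Assumes you've already run: export OPENROUTER_API_KEY="your_openrouter_key_here"
api_key = os.environ["OPENROUTER_API_KEY"]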
2
Make Your First Request
Here’s an example using the Chat Completions endpoint to query Llama 3.3-70B on Cerebras.
Be sure to replace your_openrouter_key_here with your actual API key.
import requests

# Set your OpenRouter API key
api_key = "your_openrouter_key_here"

url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the request payload
data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
}

# Send the POST request
response = requests.post(url, headers=headers, json=data)

# Print the response
print(response.json())
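Since OpenRouter’s API is OpenAI-compatible, you can also make the same request with the official openai Python SDK by pointing its base_url at OpenRouter. A minimal sketch, assuming the v1 openai package is installed; the provider filter rides along via extra_body:

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your_openrouter_key_here",
)

completion = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    extra_body={"provider": {"only": ["Cerebras"]}},  # pin the request to Cerebras
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(completion.choices[0].message.content)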
3
Try Structured Outputs + Make a Tool Call
You did it! Your first API call is complete. Now let’s make the model more precise in how it formats its responses, using structured outputs, and better at handling tasks, using tool calling. The examples below show how to use both, starting with structured outputs.
import requests
import json

api_key = "your_openrouter_key_here"

url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the JSON schema for the expected output
movie_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "director": {"type": "string"},
        "year": {"type": "integer"}
    },
    "required": ["title", "director", "year"],
    "additionalProperties": False
}

data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant that generates movie recommendations."},
        {"role": "user", "content": "Suggest a sci-fi movie from the 1990s."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "movie_schema",
            "strict": True,
            "schema": movie_schema
        }
    }
}

# Send the request, then parse and display the result
response = requests.post(url, headers=headers, json=data)
result = response.json()
movie_data = json.loads(result['choices'][0]['message']['content'])
print(json.dumps(movie_data, indent=2))
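Even in strict mode, it can be worth validating the parsed result on the client side. A short sketch using the third-party jsonschema package (an assumption of this example; install it with pip install jsonschema):

from jsonschema import validate, ValidationError

try:
    # Re-check the model's output against the same schema sent in the request
    validate(instance=movie_data, schema=movie_schema)
    print("Output matches the schema.")
except ValidationError as e:
    print(f"Schema violation: {e.message}")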
Next, here’s an example of tool calling using a simple calculator tool.

import requests
import json

api_key = "your_openrouter_key_here"

url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the calculator tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Performs mathematical calculations.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "A mathematical expression to evaluate, e.g., 'sqrt(16)'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

# Define the message history
messages = [
    {"role": "system", "content": "You are a helpful assistant capable of performing mathematical calculations using the calculator tool."},
    {"role": "user", "content": "Is the square root of 16 equal to 4?"},
]

# Define the request payload
data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": messages,
    "tools": tools,
    "tool_choice": "auto"
}

# Send the POST request
response = requests.post(url, headers=headers, json=data)

# Parse the response and extract any tool calls
result = response.json()
assistant_message = result['choices'][0]['message']
tool_calls = assistant_message.get('tool_calls', [])

if tool_calls:
    # The assistant message that requested the tool call must be appended to the
    # history before any "tool" messages, or the follow-up request will be rejected.
    messages.append(assistant_message)

    for tool_call in tool_calls:
        function_name = tool_call['function']['name']
        arguments = json.loads(tool_call['function']['arguments'])
        expression = arguments.get('expression')

        # Simulate executing the calculator function
        try:
            # WARNING: Using eval can be dangerous. In production, use a safe math parser.
            calculation_result = eval(expression, {"__builtins__": None}, {"sqrt": lambda x: x ** 0.5})
            tool_response = f"The result of {expression} is {calculation_result}."
        except Exception as e:
            tool_response = f"Error evaluating expression: {e}"

        # Append the tool's response to the message history
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call['id'],
            "content": tool_response
        })

    # Send the updated message history back to the model
    data = {
        "model": "meta-llama/llama-3.3-70b-instruct",
        "provider": {
            "only": ["Cerebras"]
        },
        "messages": messages
    }
    response = requests.post(url, headers=headers, json=data)
    final_result = response.json()
    assistant_reply = final_result['choices'][0]['message']['content']
    print(assistant_reply)
else:
    print("No tool calls were made by the model.")
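As the warning in the example notes, eval is risky even with restricted globals. One possible replacement, shown as a sketch rather than a production parser, walks the expression’s AST and permits only basic arithmetic plus a sqrt helper:

import ast
import operator

# Allowed operators for the evaluator (assumption: only basic arithmetic is needed).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate a small arithmetic expression without exposing eval."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "sqrt" and len(node.args) == 1):
            return _eval(node.args[0]) ** 0.5
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_eval("sqrt(16)"))  # 4.0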
Cerebras Cloud is primarily intended for free-tier users and for high-throughput startups that need a dedicated plan to handle their inference. OpenRouter acts as one of our “pay-as-you-go” providers.
Certain models, such as DeepSeek r1-distilled-70b, can only be accessed on Cerebras Cloud with a paid plan.
Although the OpenRouter page may mention support for up to 10 million tokens, Cerebras currently supports a maximum context length of 32k tokens.
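If you want to guard against that limit programmatically, one option is to budget tokens before sending a request. A rough sketch that extends the data payload from Step 2; the four-characters-per-token figure is a crude heuristic, not a real tokenizer:

MAX_CONTEXT_TOKENS = 32_000  # Cerebras' current context window on OpenRouter

def rough_token_count(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token for English text.
    return len(text) // 4

prompt = "What is the capital of France?"
max_completion_tokens = 1024  # room reserved for the model's reply

if rough_token_count(prompt) + max_completion_tokens > MAX_CONTEXT_TOKENS:
    raise ValueError("Prompt is too long for the 32k context window")

data["max_tokens"] = max_completion_tokens  # standard Chat Completions parameter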
You may notice a small amount of added latency, since Cerebras Inference is only available through a proxy that OpenRouter queries after receiving your initial API request.
The official OpenRouter inference example uses a multimodal input call, which Cerebras does not currently support. To avoid this error, use the code provided in Step 2 of the tutorial above.