OpenRouter provides a unified API that gives you access to multiple AI providers, including Cerebras, through a single interface. This means you can use familiar tools and SDKs to access Cerebras’s ultra-fast inference without changing your existing code structure. For a complete list of Cerebras Inference powered models available on OpenRouter, visit the OpenRouter site.

Prerequisites

Before you begin, ensure you have:
  • OpenRouter API Key - Create a free account and get your API key at OpenRouter
  • Python 3.8 or higher (for Python examples) or Node.js 16+ (for JavaScript examples)

Quick Start

Step 1: Install Dependencies

Install the requests library for the Python examples:
pip install requests
Step 2: Set Your API Key

Create a .env file in your project directory:
OPENROUTER_API_KEY=your_openrouter_key_here
Or set it as an environment variable:
export OPENROUTER_API_KEY="your_openrouter_key_here"
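If you use a .env file, the key must be loaded into the environment before os.getenv can see it. The python-dotenv package is the usual way to do this; as a dependency-free sketch, a minimal loader might look like the following (the load_env_file name is illustrative, not part of any SDK):

```python
import os

def load_env_file(path=".env"):
    """Minimal .env loader: reads KEY=value lines into os.environ.

    Skips blank lines and comments; does not overwrite variables
    that are already set.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

# Load the file only if it exists, then read the key as usual
if os.path.exists(".env"):
    load_env_file()
api_key = os.getenv("OPENROUTER_API_KEY")
```

With python-dotenv installed, `from dotenv import load_dotenv; load_dotenv()` achieves the same result.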
Step 3: Make Your First API Call

The example below queries Llama 3.3 70B, pinned to the Cerebras provider:
import os
import requests

# Set your OpenRouter API key
api_key = os.getenv("OPENROUTER_API_KEY")

url = "https://openrouter.ai/api/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the request payload
data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
}

# Send the POST request
response = requests.post(url, headers=headers, json=data)

# Print the response
print(response.json())
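Once this works, you may want to factor the request pattern into small helpers. The sketch below is illustrative (build_payload and extract_reply are not part of any SDK): it assembles the same payload as above, using OpenRouter's vendor-prefixed model ID, and pulls the assistant's text out of the response dict.

```python
def build_payload(prompt, model="meta-llama/llama-3.3-70b-instruct"):
    """Assemble a chat-completions payload pinned to the Cerebras provider."""
    return {
        "model": model,
        "provider": {"only": ["Cerebras"]},
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

def extract_reply(response_json):
    """Pull the assistant's text out of a chat-completions response."""
    return response_json["choices"][0]["message"]["content"]
```

Pass build_payload(...) as the json= argument to requests.post, then call extract_reply on response.json().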
Step 4: Advanced Examples

Your first API call is complete! Next, you can make the model's responses more reliable with structured outputs and tool calling. The example below uses a JSON schema to constrain the model's output format.
import os
import requests
import json

api_key = os.getenv("OPENROUTER_API_KEY")
url = "https://openrouter.ai/api/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the JSON schema for the expected output
movie_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "director": {"type": "string"},
        "year": {"type": "integer"}
    },
    "required": ["title", "director", "year"],
    "additionalProperties": False
}

data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant that generates movie recommendations."},
        {"role": "user", "content": "Suggest a sci-fi movie from the 1990s."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "movie_schema",
            "strict": True,
            "schema": movie_schema
        }
    }
}

# Send the request, then parse and display the structured result
response = requests.post(url, headers=headers, json=data)
result = response.json()
if 'choices' in result:
    movie_data = json.loads(result['choices'][0]['message']['content'])
    print(json.dumps(movie_data, indent=2))
else:
    print(f"Error: {result}")
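Tool calling follows the same request shape: you add a tools array to the payload, and the model may respond with tool_calls entries instead of plain text. A minimal sketch, where the get_weather function and its return values are hypothetical stand-ins for a real integration:

```python
import json

def get_weather(city):
    # Hypothetical local function; in practice this would call a weather API.
    return {"city": city, "temp_f": 68}

# Tool schema to include in the request payload, e.g. data["tools"] = tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_call):
    """Route one tool_calls entry from the response to the matching function."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "get_weather":
        return get_weather(**args)
    raise ValueError(f"unknown tool: {name}")
```

When the response contains tool_calls, run dispatch on each entry, append each result to the conversation as a "tool" role message, and send a follow-up request so the model can compose its final answer.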

FAQ

What is the maximum context length?
This varies by model. See our provider page to view the max context length for each model.

Does routing through OpenRouter add latency?
A marginal amount of latency may be added, since Cerebras Inference is reached through OpenRouter's proxy, which forwards your request after receiving it.

Why does the official OpenRouter example fail with Cerebras?
The official OpenRouter inference example uses a multimodal input call, which Cerebras does not currently support. To avoid this error, use the text-only code in the Quick Start above.