Learn how to use Cerebras Inference on OpenRouter.
This guide provides a step-by-step walkthrough for using the OpenRouter API to run inference on Cerebras hardware. For a complete list of Cerebras Inference-powered models available on OpenRouter, visit the OpenRouter site. We currently support the Chat Completion endpoint via the OpenRouter platform, and you can get started with just a few lines of code by following the steps below.
1
Get an OpenRouter API Key
First, you will need to create an OpenRouter API key. You’ll use this key to authenticate with OpenRouter and access the Cerebras provider.
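If you prefer not to hard-code the key, you can also export it as an environment variable and read it at runtime. The variable name OPENROUTER_API_KEY used below is simply a convention for this guide, not something OpenRouter requires:

import os

# Assumes you have run `export OPENROUTER_API_KEY=your_openrouter_key_here`
# in your shell; the variable name is a convention, not an OpenRouter requirement.
api_key = os.environ.get("OPENROUTER_API_KEY")
if not api_key:
    raise RuntimeError("OPENROUTER_API_KEY is not set")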
2
Make Your First Chat Completion Request
Here’s an example using the Chat Completions endpoint to query Llama 3.3 70B on Cerebras. Be sure to replace your_openrouter_key_here with your actual API key.
import os
import requests

# Set your OpenRouter API key
api_key = "your_openrouter_key_here"

url = "https://openrouter.ai/api/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the request payload
data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
}

# Send the POST request
response = requests.post(url, headers=headers, json=data)

# Print the response
print(response.json())
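If you already use the OpenAI Python SDK, the same request can be sent by pointing the client at OpenRouter's base URL. The sketch below assumes the openai package (v1 or later) is installed; the provider filter is an OpenRouter-specific field, so it is passed through extra_body:

from openai import OpenAI

# Point the standard OpenAI client at OpenRouter's API.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your_openrouter_key_here",
)

completion = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    # OpenRouter-specific routing field, forwarded as extra JSON in the request body.
    extra_body={"provider": {"only": ["Cerebras"]}},
)

print(completion.choices[0].message.content)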
3
Try Structured Outputs + Make a Tool Call
You did it — your first API call is complete! Now, let’s explore how to make your model smarter at handling tasks and more precise in how it formats its responses via structured outputs and tool calling. See the examples below for how to use both.
Structured Outputs
import os
import requests
import json

api_key = "your_openrouter_key_here"

url = "https://openrouter.ai/api/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the JSON schema for the expected output
movie_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "director": {"type": "string"},
        "year": {"type": "integer"}
    },
    "required": ["title", "director", "year"],
    "additionalProperties": False
}

data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant that generates movie recommendations."},
        {"role": "user", "content": "Suggest a sci-fi movie from the 1990s."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "movie_schema",
            "strict": True,
            "schema": movie_schema
        }
    }
}

# Parse and display the result
response = requests.post(url, headers=headers, json=data)
result = response.json()
movie_data = json.loads(result['choices'][0]['message']['content'])
print(json.dumps(movie_data, indent=2))
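Tool Calls
Tool calling uses the same request shape, with a tools list describing the functions the model may call. The sketch below follows the standard OpenAI-compatible tool format; the get_weather tool and its parameters are illustrative placeholders, not a real API:

import requests

api_key = "your_openrouter_key_here"
url = "https://openrouter.ai/api/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Describe a function the model is allowed to call (illustrative placeholder).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Paris"}
            },
            "required": ["city"]
        }
    }
}]

data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {"only": ["Cerebras"]},
    "messages": [
        {"role": "user", "content": "What is the weather in Paris right now?"}
    ],
    "tools": tools
}

response = requests.post(url, headers=headers, json=data)
message = response.json()["choices"][0]["message"]

# If the model chose to call the tool, the arguments arrive as a JSON string.
for tool_call in message.get("tool_calls") or []:
    print(tool_call["function"]["name"], tool_call["function"]["arguments"])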
What is the maximum context length for models served by Cerebras on OpenRouter?
This varies by model. See our provider page to view the max context length for each model.
What additional latency can I expect when using Cerebras through OpenRouter?
You may see a marginal amount of additional latency, because Cerebras Inference is only available via a proxy that is queried after your initial API request.
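If you want to gauge that overhead for your own workload, you can simply time the request from the client side; this is a rough sketch reusing the Step 2 request, not an official benchmark:

import time
import requests

api_key = "your_openrouter_key_here"
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {"only": ["Cerebras"]},
    "messages": [{"role": "user", "content": "Say hello."}]
}

# Measure the full client-side round trip, including the proxy hop.
start = time.perf_counter()
requests.post(url, headers=headers, json=data)
print(f"Round-trip time: {time.perf_counter() - start:.2f}s")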
Why do I see “Wrong API Format” when running the OpenRouter test code?
The official OpenRouter inference example uses a multimodal input call, which is not currently supported by Cerebras. To avoid this error, use the code provided in Step 2 of the tutorial above.