Learn how to use Cerebras Inference on OpenRouter.
This guide provides a step-by-step walkthrough for using the OpenRouter API to run inference on Cerebras hardware. For a complete list of Cerebras Inference-powered models available on OpenRouter, visit the OpenRouter site.
We currently support the Chat Completions endpoint via the OpenRouter platform, and you can get started with just a few lines of code. Follow the steps below.
1
Get an OpenRouter API Key
First, you will need to create an OpenRouter API key. You’ll use this key to authenticate with OpenRouter and access the Cerebras provider.
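If you’d rather not hard-code the key in your scripts, you can keep it in an environment variable. A minimal sketch, assuming you export it as OPENROUTER_API_KEY (the variable name is this guide’s convention, not an OpenRouter requirement):

import os

# Assumes you've already run: export OPENROUTER_API_KEY="your_openrouter_key_here"
api_key = os.environ["OPENROUTER_API_KEY"]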
2
Make Your First Request
Here’s an example using the Chat Completions endpoint to query Llama 3.3-70B on Cerebras.
Be sure to replace your_openrouter_key_here with your actual API key.
import requests

# Set your OpenRouter API key
api_key = "your_openrouter_key_here"

url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the request payload
data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
}

# Send the POST request
response = requests.post(url, headers=headers, json=data)

# Print the response
print(response.json())
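Since OpenRouter’s API is OpenAI-compatible, you can also make the same request with the official openai Python SDK by pointing its base_url at OpenRouter. A minimal sketch, assuming the v1 openai package is installed; the provider filter rides along via extra_body:

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your_openrouter_key_here",
)

completion = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    extra_body={"provider": {"only": ["Cerebras"]}},  # pin the request to Cerebras
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(completion.choices[0].message.content)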
3
Try Structured Outputs + Make a Tool Call
You did it! Your first API call is complete. Now let’s make the model more precise in how it formats its responses, using structured outputs, and better at handling tasks, using tool calling. The examples below show how to use both, starting with structured outputs.
import requests
import json

api_key = "your_openrouter_key_here"

url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the JSON schema for the expected output
movie_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "director": {"type": "string"},
        "year": {"type": "integer"}
    },
    "required": ["title", "director", "year"],
    "additionalProperties": False
}

data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant that generates movie recommendations."},
        {"role": "user", "content": "Suggest a sci-fi movie from the 1990s."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "movie_schema",
            "strict": True,
            "schema": movie_schema
        }
    }
}

# Send the request, then parse and display the result
response = requests.post(url, headers=headers, json=data)
result = response.json()
movie_data = json.loads(result['choices'][0]['message']['content'])
print(json.dumps(movie_data, indent=2))
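Even in strict mode, it can be worth validating the parsed result on the client side. A short sketch using the third-party jsonschema package (an assumption of this example; install it with pip install jsonschema):

from jsonschema import validate, ValidationError

try:
    # Re-check the model's output against the same schema sent in the request
    validate(instance=movie_data, schema=movie_schema)
    print("Output matches the schema.")
except ValidationError as e:
    print(f"Schema violation: {e.message}")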
Next, here’s an example of tool calling using a simple calculator tool.

import requests
import json

api_key = "your_openrouter_key_here"

url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Define the calculator tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Performs mathematical calculations.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "A mathematical expression to evaluate, e.g., 'sqrt(16)'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

# Define the message history
messages = [
    {"role": "system", "content": "You are a helpful assistant capable of performing mathematical calculations using the calculator tool."},
    {"role": "user", "content": "Is the square root of 16 equal to 4?"},
]

# Define the request payload
data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": messages,
    "tools": tools,
    "tool_choice": "auto"
}

# Send the POST request
response = requests.post(url, headers=headers, json=data)

# Parse the response and extract any tool calls
result = response.json()
assistant_message = result['choices'][0]['message']
tool_calls = assistant_message.get('tool_calls', [])

if tool_calls:
    # The assistant message that requested the tool call must be appended to the
    # history before any "tool" messages, or the follow-up request will be rejected.
    messages.append(assistant_message)

    for tool_call in tool_calls:
        function_name = tool_call['function']['name']
        arguments = json.loads(tool_call['function']['arguments'])
        expression = arguments.get('expression')

        # Simulate executing the calculator function
        try:
            # WARNING: Using eval can be dangerous. In production, use a safe math parser.
            calculation_result = eval(expression, {"__builtins__": None}, {"sqrt": lambda x: x ** 0.5})
            tool_response = f"The result of {expression} is {calculation_result}."
        except Exception as e:
            tool_response = f"Error evaluating expression: {e}"

        # Append the tool's response to the message history
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call['id'],
            "content": tool_response
        })

    # Send the updated message history back to the model
    data = {
        "model": "meta-llama/llama-3.3-70b-instruct",
        "provider": {
            "only": ["Cerebras"]
        },
        "messages": messages
    }
    response = requests.post(url, headers=headers, json=data)
    final_result = response.json()
    assistant_reply = final_result['choices'][0]['message']['content']
    print(assistant_reply)
else:
    print("No tool calls were made by the model.")
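As the warning in the example notes, eval is risky even with restricted globals. One possible replacement, shown as a sketch rather than a production parser, walks the expression’s AST and permits only basic arithmetic plus a sqrt helper:

import ast
import operator

# Allowed operators for the evaluator (assumption: only basic arithmetic is needed).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate a small arithmetic expression without exposing eval."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "sqrt" and len(node.args) == 1):
            return _eval(node.args[0]) ** 0.5
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_eval("sqrt(16)"))  # 4.0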
Cerebras Cloud is primarily intended for free-tier users and for high-throughput startups that need a dedicated plan to handle their inference. OpenRouter acts as one of our “pay-as-you-go” providers.
Certain models, such as DeepSeek r1-distilled-70b, can only be accessed on Cerebras Cloud with a paid plan.
Although the OpenRouter page may mention support for up to 10 million tokens, Cerebras currently supports a maximum context length of 32k tokens.
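If you want to guard against that limit programmatically, one option is to budget tokens before sending a request. A rough sketch that extends the data payload from Step 2; the four-characters-per-token figure is a crude heuristic, not a real tokenizer:

MAX_CONTEXT_TOKENS = 32_000  # Cerebras' current context window on OpenRouter

def rough_token_count(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token for English text.
    return len(text) // 4

prompt = "What is the capital of France?"
max_completion_tokens = 1024  # room reserved for the model's reply

if rough_token_count(prompt) + max_completion_tokens > MAX_CONTEXT_TOKENS:
    raise ValueError("Prompt is too long for the 32k context window")

data["max_tokens"] = max_completion_tokens  # standard Chat Completions parameter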
You may notice a small amount of added latency, since Cerebras Inference is only available through a proxy that OpenRouter queries after receiving your initial API request.
The official OpenRouter inference example uses a multimodal input call, which Cerebras does not currently support. To avoid this error, use the code provided in Step 2 of the tutorial above.