Learn how to use Cerebras Inference on OpenRouter with Python, LangChain, and JavaScript.
OpenRouter provides a unified API that gives you access to multiple AI providers, including Cerebras, through a single interface. This means you can use familiar tools and SDKs to access Cerebras’s ultra-fast inference without changing your existing code structure.
For a complete list of Cerebras Inference powered models available on OpenRouter, visit the OpenRouter site.
Choose your preferred method to query Llama 3.3-70B on Cerebras:
```python
import os

import requests

# Set your OpenRouter API key
api_key = os.getenv("OPENROUTER_API_KEY")

url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}

# Define the request payload
data = {
    # OpenRouter model IDs include the author prefix
    "model": "meta-llama/llama-3.3-70b-instruct",
    # Route the request to Cerebras only
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
}

# Send the POST request
response = requests.post(url, headers=headers, json=data)

# Print the response
print(response.json())
```
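Printing `response.json()` shows the entire response object. A minimal sketch of pulling out only the assistant's text, assuming the OpenAI-compatible shape `choices[0].message.content` and that failed requests return an `error` object instead of `choices`. `extract_reply` is a hypothetical helper name, not part of any SDK, and the demo inputs are hand-written stand-ins rather than real API output.

```python
import json


def extract_reply(result: dict) -> str:
    """Return the assistant's text from a chat completion response.

    Assumes the OpenAI-compatible shape result["choices"][0]["message"]["content"];
    a payload without choices (for example one with an "error" key) raises RuntimeError.
    """
    choices = result.get("choices")
    if not choices:
        # No completion returned; surface whatever the API sent back
        raise RuntimeError(f"Request failed: {json.dumps(result.get('error', result))}")
    return choices[0]["message"]["content"]


if __name__ == "__main__":
    # Illustrative response shapes only, not real API output
    ok = {"choices": [{"message": {"role": "assistant", "content": "Paris."}}]}
    print(extract_reply(ok))

    failed = {"error": {"code": 400, "message": "Example error"}}
    try:
        extract_reply(failed)
    except RuntimeError as exc:
        print(exc)
```

With the first example, you would call `print(extract_reply(response.json()))` instead of printing the raw JSON.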
Advanced Examples
You did it: your first API call is complete! Now let’s look at two features that make the model more useful in real applications: structured outputs, which constrain responses to a JSON schema you define, and tool calling, which lets the model request calls to functions you provide. The example below shows structured outputs.
```python
import json
import os

import requests

api_key = os.getenv("OPENROUTER_API_KEY")
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}

# Define the JSON schema for the expected output
movie_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "director": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["title", "director", "year"],
    "additionalProperties": False,
}

data = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "provider": {
        "only": ["Cerebras"]
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant that generates movie recommendations."},
        {"role": "user", "content": "Suggest a sci-fi movie from the 1990s."},
    ],
    # Ask for output that conforms to movie_schema
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "movie_schema",
            "strict": True,
            "schema": movie_schema,
        },
    },
}

# Send the request, then parse and display the result
response = requests.post(url, headers=headers, json=data)
result = response.json()

if "choices" in result:
    # The reply content is a JSON string that follows movie_schema
    movie_data = json.loads(result["choices"][0]["message"]["content"])
    print(json.dumps(movie_data, indent=2))
else:
    print(f"Error: {result}")
```
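This section’s introduction also mentions tool calling, which the structured-output example does not cover. A minimal sketch, assuming OpenRouter accepts the OpenAI-style `tools` array and returns requested calls as `tool_calls` on the assistant message. `get_weather`, `build_payload`, and `run_tool_calls` are hypothetical names, and the assistant message in the demo is hand-written rather than real Cerebras output, so the logic runs offline. To use it for real, POST `build_payload(...)` with the same URL and headers as above, then send the extended history back in a second request.

```python
import json

MODEL = "meta-llama/llama-3.3-70b-instruct"

# Tool description in the OpenAI-style format (assumed to be accepted by OpenRouter)
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> str:
    # Hypothetical stand-in for a real weather lookup
    return f"Sunny and 22 degrees C in {city}"


LOCAL_FUNCTIONS = {"get_weather": get_weather}


def build_payload(messages: list) -> dict:
    """Chat request that offers TOOLS to the model and routes to Cerebras only."""
    return {
        "model": MODEL,
        "provider": {"only": ["Cerebras"]},
        "messages": messages,
        "tools": TOOLS,
    }


def run_tool_calls(messages: list, assistant_message: dict) -> list:
    """Execute each requested tool and return a new, extended message history."""
    history = messages + [assistant_message]
    for call in assistant_message.get("tool_calls", []):
        fn = LOCAL_FUNCTIONS[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string
        args = json.loads(call["function"]["arguments"])
        history.append(
            {"role": "tool", "tool_call_id": call["id"], "content": fn(**args)}
        )
    return history


if __name__ == "__main__":
    messages = [{"role": "user", "content": "What's the weather in Paris?"}]
    # Hand-written example of an assistant message that requests a tool call
    assistant_message = {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
            }
        ],
    }
    history = run_tool_calls(messages, assistant_message)
    # In a real exchange, send build_payload(history) so the model can answer
    print(json.dumps(history[-1], indent=2))
```

Keeping the tool registry in a plain dict means the dispatch step never evaluates model-supplied names directly; an unknown tool name fails with a `KeyError` rather than running arbitrary code.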
The maximum context length varies by model. See our provider page to view the maximum context length for each model.
What additional latency can I expect when using Cerebras through OpenRouter?
You may see a small amount of additional latency, because your request first goes to OpenRouter, which then forwards it to Cerebras Inference through a proxy.
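To check how much overhead this adds for your own workload, you can time requests end to end. A minimal sketch using only the standard library: `median_latency` is a hypothetical helper that calls any zero-argument function several times and reports the median wall-clock time. You could pass it, for example, `lambda: requests.post(url, headers=headers, json=data)` from the first example; the demo below times a local `time.sleep` call instead of a network request.

```python
import statistics
import time
from typing import Callable


def median_latency(send: Callable[[], object], runs: int = 5) -> float:
    """Call `send` `runs` times and return the median wall-clock time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        send()  # e.g. one chat completion request
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Stand-in workload; replace with a function that sends a real request
    latency = median_latency(lambda: time.sleep(0.01), runs=3)
    print(f"median latency: {latency * 1000:.1f} ms")
```

The median is used rather than the mean so that a single slow request (for example, a cold connection) does not dominate the result.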
Why do I see “Wrong API Format” when running the OpenRouter test code?
The official OpenRouter inference example uses a multimodal input call, which is not currently supported by Cerebras. To avoid this error, use the code provided in Step 2 of the tutorial above.
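The first example sends each message’s `content` as a plain string. Multimodal requests in the OpenAI-compatible format instead send `content` as a list of parts, such as `{"type": "text", ...}` and `{"type": "image_url", ...}`, which is the kind of input the answer above says Cerebras does not support. A minimal pre-flight check, assuming that format: `to_text_only` is a hypothetical helper that joins text parts into a single string and raises on any non-text part, so the problem surfaces locally instead of as an API error.

```python
def to_text_only(messages: list) -> list:
    """Return copies of messages with string content, rejecting non-text parts.

    Assumes OpenAI-style content: either a string or a list of parts like
    {"type": "text", "text": "..."} or {"type": "image_url", ...}.
    """
    cleaned = []
    for message in messages:
        content = message["content"]
        if isinstance(content, list):
            non_text = [part.get("type") for part in content if part.get("type") != "text"]
            if non_text:
                raise ValueError(f"Unsupported content parts for a text-only request: {non_text}")
            # Collapse the text parts into a single string
            content = "\n".join(part["text"] for part in content)
        cleaned.append({**message, "content": content})
    return cleaned


if __name__ == "__main__":
    multipart = [
        {"role": "user", "content": [{"type": "text", "text": "What is the capital of France?"}]}
    ]
    print(to_text_only(multipart))
```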