Cloudflare AI Gateway acts as a proxy between your application and Cerebras Inference, providing powerful features like request logging, caching, rate limiting, and analytics. This integration allows you to monitor and optimize your AI workloads while maintaining the ultra-low latency of Cerebras hardware.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Generate a free API key in the Cerebras dashboard
  • Cloudflare Account - Visit Cloudflare and create an account or log in
  • AI Gateway Created - Set up an AI Gateway in your Cloudflare dashboard
  • Python 3.11 or higher (for Python examples)

Configure Cloudflare AI Gateway

Step 1: Create an AI Gateway

First, you’ll need to create an AI Gateway in your Cloudflare dashboard to enable request routing and monitoring.
  1. Log in to the Cloudflare dashboard
  2. Navigate to AI > AI Gateway
  3. Click Create Gateway
  4. Give your gateway a name (e.g., “cbrs” or “cerebras-gateway”) - this will be your Gateway ID
  5. Click Create to complete the setup
Important: After creating your gateway, note these two values:
  • Gateway ID: This is the name you just chose for your gateway (e.g., “cbrs”)
  • Account ID: This is visible in your browser’s URL bar (e.g., xxxxxx)
Your gateway URL will look like: https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras
For example: https://gateway.ai.cloudflare.com/v1/xxxxxx/cbrs/cerebras
Configure Cerebras Provider: In your AI Gateway settings, you may need to add Cerebras as a custom provider. If you see an error about configuring the gateway, go to your gateway settings and add Cerebras with the base URL: https://api.cerebras.ai/v1

Step 2: Install required dependencies

Install the necessary Python packages:
pip install requests python-dotenv

Step 3: Configure environment variables

Create a .env file in your project directory to securely store your credentials:
CEREBRAS_API_KEY=your-cerebras-api-key-here
CLOUDFLARE_ACCOUNT_ID=your-account-id-here  # From your browser URL
CLOUDFLARE_GATEWAY_ID=your-gateway-name  # The name you gave your gateway (e.g., "cbrs")
Replace the placeholder values with your actual credentials:
  • CEREBRAS_API_KEY: Your Cerebras API key from the dashboard
  • CLOUDFLARE_ACCOUNT_ID: The account ID from your Cloudflare dashboard URL (e.g., xxxxxx)
  • CLOUDFLARE_GATEWAY_ID: The gateway name you chose when creating the gateway
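To catch configuration mistakes early, you can run a small sanity check before wiring up the client. This is an optional sketch, not part of the integration itself:

import os
from dotenv import load_dotenv

load_dotenv()

# Fail fast if any credential is missing from the environment
for var in ("CEREBRAS_API_KEY", "CLOUDFLARE_ACCOUNT_ID", "CLOUDFLARE_GATEWAY_ID"):
    if not os.getenv(var):
        raise SystemExit(f"Missing required environment variable: {var}")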

Step 4: Initialize the client with AI Gateway

Set up your client to route requests through Cloudflare AI Gateway. The key is to use Cloudflare’s gateway URL as your base URL, which automatically enables logging, caching, and analytics.
import os
import requests
from dotenv import load_dotenv

load_dotenv()

# Set up the gateway URL
account_id = os.getenv("CLOUDFLARE_ACCOUNT_ID")
gateway_id = os.getenv("CLOUDFLARE_GATEWAY_ID")
api_key = os.getenv("CEREBRAS_API_KEY")

url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras/chat/completions"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

Step 5: Make your first request

Now you can make requests exactly as you would with the standard Cerebras API. All requests will automatically flow through AI Gateway, enabling logging and analytics.
import os
import requests
from dotenv import load_dotenv

load_dotenv()

account_id = os.getenv("CLOUDFLARE_ACCOUNT_ID")
gateway_id = os.getenv("CLOUDFLARE_GATEWAY_ID")
api_key = os.getenv("CEREBRAS_API_KEY")

url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras/chat/completions"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

data = {
    "model": "llama3.1-8b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain how AI Gateway improves observability."}
    ],
    "max_tokens": 500,
    "temperature": 0.7
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # surface auth or rate-limit errors instead of a KeyError below
print(response.json()["choices"][0]["message"]["content"])

Step 6: Enable streaming responses

Cloudflare AI Gateway fully supports streaming responses from Cerebras, allowing you to display results in real-time as they’re generated.
import os
import json
import requests
from dotenv import load_dotenv

load_dotenv()

account_id = os.getenv("CLOUDFLARE_ACCOUNT_ID")
gateway_id = os.getenv("CLOUDFLARE_GATEWAY_ID")
api_key = os.getenv("CEREBRAS_API_KEY")

url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras/chat/completions"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

data = {
    "model": "llama3.1-8b",
    "messages": [
        {"role": "user", "content": "Write a short story about AI."}
    ],
    "stream": True,
    "max_tokens": 1000
}

# Stream responses for real-time output
response = requests.post(url, headers=headers, json=data, stream=True)
response.raise_for_status()

for line in response.iter_lines():
    if not line:
        continue
    decoded = line.decode("utf-8")
    if not decoded.startswith("data: "):
        continue  # ignore anything that isn't an SSE data line
    payload = decoded[len("data: "):]
    if payload == "[DONE]":  # the stream ends with a [DONE] sentinel
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    if delta.get("content"):
        print(delta["content"], end="", flush=True)

Key Features

Analytics and Logging

All requests routed through AI Gateway are automatically logged in your Cloudflare dashboard. You can view:
  • Request volume and patterns over time
  • Token usage and cost tracking
  • Response times and latency metrics
  • Error rates and types for debugging
  • Model usage distribution
Access your analytics by navigating to AI > AI Gateway in your Cloudflare dashboard and selecting your gateway.

Caching

Enable caching to reduce costs and improve response times for repeated queries. Cached responses are served instantly without hitting the Cerebras API:
  1. Go to your AI Gateway in the Cloudflare dashboard
  2. Navigate to Settings > Caching
  3. Enable caching and configure TTL (time-to-live)
  4. Cached responses will be served instantly for identical requests
Learn more about caching strategies in the Cloudflare documentation.
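Cloudflare also exposes per-request cache controls through gateway request headers. The sketch below assumes the cf-aig-cache-ttl and cf-aig-skip-cache header names from Cloudflare's AI Gateway documentation (verify them against the current docs) and reuses the url, headers, and data variables from the earlier steps:

# Cache this particular response for one hour (header name assumed; see Cloudflare docs)
cached = requests.post(url, json=data, headers={**headers, "cf-aig-cache-ttl": "3600"})

# Bypass the cache entirely for a single request
uncached = requests.post(url, json=data, headers={**headers, "cf-aig-skip-cache": "true"})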

Rate Limiting

Protect your application and control costs with flexible rate limiting:
  1. In your AI Gateway settings, go to Rate Limiting
  2. Set limits per user, IP address, or API key
  3. Configure time windows (per minute, hour, or day)
  4. Requests exceeding limits will receive a 429 status code
Explore rate limiting configuration for advanced options.
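On the client side, it is worth handling those 429s gracefully. Below is a minimal retry sketch; post_with_backoff is an illustrative helper (not part of any SDK) that reuses the url, headers, and data variables from the earlier steps:

import time
import requests

def post_with_backoff(url, headers, data, max_retries=5):
    """Retry gateway requests that hit a rate limit (HTTP 429)."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Honor Retry-After if the gateway sends it; otherwise back off exponentially
        time.sleep(float(response.headers.get("Retry-After", 2 ** attempt)))
    raise RuntimeError("Rate limit retries exhausted")

result = post_with_backoff(url, headers, data)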

FAQ

Why am I getting an authentication error?
This usually means your Cerebras API key is invalid or missing. Double-check that:
  1. Your CEREBRAS_API_KEY environment variable is set correctly
  2. The API key is active and hasn’t been revoked
  3. You’re using the correct Authorization header format: Bearer YOUR_API_KEY
You can verify your API key by making a direct request to Cerebras (without AI Gateway) to isolate the issue.
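For example, a quick direct check against the Cerebras endpoint:

import os
import requests

# Call Cerebras directly (no gateway) to isolate authentication problems
response = requests.post(
    "https://api.cerebras.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.getenv('CEREBRAS_API_KEY')}"},
    json={"model": "llama3.1-8b", "messages": [{"role": "user", "content": "ping"}]},
)
print(response.status_code)  # 200 means the key works; 401 points to the key itself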

How do I view request logs and analytics?
To view request logs and analytics:
  1. Log in to your Cloudflare dashboard
  2. Navigate to AI > AI Gateway
  3. Select your gateway
  4. Click on the Analytics tab to see request metrics and logs
Logs include request/response bodies, timestamps, token usage, latency metrics, and error details. You can filter by date range, model, and status code.

How much latency does AI Gateway add?
Cloudflare AI Gateway adds minimal latency (typically 10-50ms) as requests are routed through Cloudflare's global network. However, this is often offset by:
  • Caching: Repeated queries are served instantly from cache with near-zero latency
  • Edge network: Cloudflare’s global edge network may provide faster routing than direct connections
  • Optimization insights: The observability features help you identify and fix performance bottlenecks
For latency-critical applications, you can measure the impact by comparing direct Cerebras requests with gateway-routed requests using the same prompts.
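One rough way to do this is to time identical requests against both endpoints (disable caching first so cached hits don't skew the numbers). time_request below is an illustrative helper that reuses the url, headers, and data variables from the earlier steps:

import time
import requests

def time_request(endpoint, headers, data, runs=5):
    """Average wall-clock latency over several identical requests."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        requests.post(endpoint, headers=headers, json=data)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

direct_url = "https://api.cerebras.ai/v1/chat/completions"
print(f"gateway: {time_request(url, headers, data):.3f}s")
print(f"direct:  {time_request(direct_url, headers, data):.3f}s")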

Does AI Gateway support streaming responses?
Yes! AI Gateway fully supports streaming responses from Cerebras. Simply set stream=True (Python) or stream: true (JavaScript) in your request, and chunks will be streamed through the gateway in real time. All streaming requests are still logged and counted in your analytics, giving you complete visibility into your streaming workloads.

What happens if Cloudflare AI Gateway goes down?
Cloudflare AI Gateway is built on Cloudflare's highly reliable global network with 99.99%+ uptime. In the rare event of an outage:
  1. You can temporarily switch to direct Cerebras API calls by changing your base_url to https://api.cerebras.ai/v1
  2. Cloudflare provides real-time status updates at cloudflarestatus.com
  3. Your application code doesn’t need to change - just update the base URL configuration
Consider implementing automatic fallback logic in production applications to switch between gateway and direct endpoints based on availability.
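A minimal fallback sketch is shown below. chat_with_fallback is an illustrative helper, and note that requests served by the direct endpoint will not appear in your gateway analytics:

import requests

DIRECT_URL = "https://api.cerebras.ai/v1/chat/completions"

def chat_with_fallback(gateway_url, headers, data):
    """Try the gateway first; fall back to the direct Cerebras endpoint on failure."""
    for endpoint in (gateway_url, DIRECT_URL):
        try:
            response = requests.post(endpoint, headers=headers, json=data, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            continue  # this endpoint failed; try the next one
    raise RuntimeError("Both the gateway and the direct endpoint failed")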

Which Cerebras models are available through AI Gateway?
All current Cerebras models are available through AI Gateway:
  • llama-3.3-70b - Best for complex reasoning, long-form content, and tasks requiring deep understanding
  • qwen-3-32b - Balanced performance for general-purpose applications
  • llama3.1-8b - Fastest option for simple tasks and high-throughput scenarios
  • gpt-oss-120b - Large open-weight model for demanding tasks
  • zai-glm-4.6 - Advanced 357B parameter model with strong reasoning capabilities
You can use any of these models by specifying the model name in your request. When using the OpenAI-compatible endpoint, prefix the model name with cerebras/ (e.g., cerebras/llama-3.3-70b).
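For reference, a request through the OpenAI-compatible endpoint might look like the sketch below. The /compat/ path segment is assumed from Cloudflare's OpenAI-compatibility documentation; verify the exact URL for your account:

import os
import requests

account_id = os.getenv("CLOUDFLARE_ACCOUNT_ID")
gateway_id = os.getenv("CLOUDFLARE_GATEWAY_ID")

# OpenAI-compatible endpoint path (assumed; check Cloudflare's current docs)
compat_url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/compat/chat/completions"

response = requests.post(
    compat_url,
    headers={"Authorization": f"Bearer {os.getenv('CEREBRAS_API_KEY')}"},
    json={
        "model": "cerebras/llama-3.3-70b",  # provider-prefixed model name
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(response.json()["choices"][0]["message"]["content"])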