Cloudflare AI Gateway acts as a proxy between your application and Cerebras Inference, providing powerful features like request logging, caching, rate limiting, and analytics. This integration allows you to monitor and optimize your AI workloads while maintaining the ultra-low latency of Cerebras hardware.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Generate a free API key in the Cerebras dashboard
  • Cloudflare Account - Visit Cloudflare and create an account or log in
  • AI Gateway Created - Set up an AI Gateway in your Cloudflare dashboard
  • Python 3.11 or higher (for Python examples)

Configure Cloudflare AI Gateway

Step 1: Create an AI Gateway

First, you’ll need to create an AI Gateway in your Cloudflare dashboard to enable request routing and monitoring.
  1. Log in to the Cloudflare dashboard
  2. Navigate to AI > AI Gateway
  3. Click Create Gateway
  4. Give your gateway a name (e.g., “cbrs” or “cerebras-gateway”) - this will be your Gateway ID
  5. Click Create to complete the setup
Important: After creating your gateway, note these two values:
  • Gateway ID: This is the name you just chose for your gateway (e.g., “cbrs”)
  • Account ID: This is visible in your browser’s URL bar (e.g., xxxxxx)
Your gateway URL will look like: https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras
For example: https://gateway.ai.cloudflare.com/v1/xxxxxx/cbrs/cerebras
Configure Cerebras Provider: In your AI Gateway settings, you may need to add Cerebras as a custom provider. If you see an error about configuring the gateway, go to your gateway settings and add Cerebras with the base URL: https://api.cerebras.ai/v1

Step 2: Install required dependencies

Install the necessary Python packages:
pip install requests python-dotenv

Step 3: Configure environment variables

Create a .env file in your project directory to securely store your credentials:
CEREBRAS_API_KEY=your-cerebras-api-key-here
CLOUDFLARE_ACCOUNT_ID=your-account-id-here  # From your browser URL
CLOUDFLARE_GATEWAY_ID=your-gateway-name  # The name you gave your gateway (e.g., "cbrs")
Replace the placeholder values with your actual credentials:
  • CEREBRAS_API_KEY: Your Cerebras API key from the dashboard
  • CLOUDFLARE_ACCOUNT_ID: The account ID from your Cloudflare dashboard URL (e.g., xxxxxx)
  • CLOUDFLARE_GATEWAY_ID: The gateway name you chose when creating the gateway
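To catch configuration mistakes early, you can run a small sanity check before wiring up the client. This is an optional sketch, not part of the integration itself:

import os
from dotenv import load_dotenv

load_dotenv()

# Fail fast if any credential is missing from the environment
for var in ("CEREBRAS_API_KEY", "CLOUDFLARE_ACCOUNT_ID", "CLOUDFLARE_GATEWAY_ID"):
    if not os.getenv(var):
        raise SystemExit(f"Missing required environment variable: {var}")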

Step 4: Initialize the client with AI Gateway

Set up your client to route requests through Cloudflare AI Gateway. The key is to use Cloudflare’s gateway URL as your base URL, which automatically enables logging, caching, and analytics.
import os
import requests
from dotenv import load_dotenv

load_dotenv()

# Set up the gateway URL
account_id = os.getenv("CLOUDFLARE_ACCOUNT_ID")
gateway_id = os.getenv("CLOUDFLARE_GATEWAY_ID")
api_key = os.getenv("CEREBRAS_API_KEY")

url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras/chat/completions"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

Step 5: Make your first request

Now you can make requests exactly as you would with the standard Cerebras API. All requests will automatically flow through AI Gateway, enabling logging and analytics.
import os
import requests
from dotenv import load_dotenv

load_dotenv()

account_id = os.getenv("CLOUDFLARE_ACCOUNT_ID")
gateway_id = os.getenv("CLOUDFLARE_GATEWAY_ID")
api_key = os.getenv("CEREBRAS_API_KEY")

url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras/chat/completions"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

data = {
    "model": "llama3.1-8b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain how AI Gateway improves observability."}
    ],
    "max_tokens": 500,
    "temperature": 0.7
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # surface auth or rate-limit errors instead of a KeyError below
print(response.json()["choices"][0]["message"]["content"])

Step 6: Enable streaming responses

Cloudflare AI Gateway fully supports streaming responses from Cerebras, allowing you to display results in real-time as they’re generated.
import os
import json
import requests
from dotenv import load_dotenv

load_dotenv()

account_id = os.getenv("CLOUDFLARE_ACCOUNT_ID")
gateway_id = os.getenv("CLOUDFLARE_GATEWAY_ID")
api_key = os.getenv("CEREBRAS_API_KEY")

url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras/chat/completions"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

data = {
    "model": "llama3.1-8b",
    "messages": [
        {"role": "user", "content": "Write a short story about AI."}
    ],
    "stream": True,
    "max_tokens": 1000
}

# Stream responses for real-time output
response = requests.post(url, headers=headers, json=data, stream=True)
response.raise_for_status()

for line in response.iter_lines():
    if not line:
        continue
    decoded = line.decode("utf-8")
    if not decoded.startswith("data: "):
        continue  # ignore anything that isn't an SSE data line
    payload = decoded[len("data: "):]
    if payload == "[DONE]":  # the stream ends with a [DONE] sentinel
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    if delta.get("content"):
        print(delta["content"], end="", flush=True)

Key Features

Analytics and Logging

All requests routed through AI Gateway are automatically logged in your Cloudflare dashboard. You can view:
  • Request volume and patterns over time
  • Token usage and cost tracking
  • Response times and latency metrics
  • Error rates and types for debugging
  • Model usage distribution
Access your analytics by navigating to AI > AI Gateway in your Cloudflare dashboard and selecting your gateway.

Caching

Enable caching to reduce costs and improve response times for repeated queries. Cached responses are served instantly without hitting the Cerebras API:
  1. Go to your AI Gateway in the Cloudflare dashboard
  2. Navigate to Settings > Caching
  3. Enable caching and configure TTL (time-to-live)
  4. Cached responses will be served instantly for identical requests
Learn more about caching strategies in the Cloudflare documentation.
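Cloudflare also exposes per-request cache controls through gateway request headers. The sketch below assumes the cf-aig-cache-ttl and cf-aig-skip-cache header names from Cloudflare's AI Gateway documentation (verify them against the current docs) and reuses the url, headers, and data variables from the earlier steps:

# Cache this particular response for one hour (header name assumed; see Cloudflare docs)
cached = requests.post(url, json=data, headers={**headers, "cf-aig-cache-ttl": "3600"})

# Bypass the cache entirely for a single request
uncached = requests.post(url, json=data, headers={**headers, "cf-aig-skip-cache": "true"})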

Rate Limiting

Protect your application and control costs with flexible rate limiting:
  1. In your AI Gateway settings, go to Rate Limiting
  2. Set limits per user, IP address, or API key
  3. Configure time windows (per minute, hour, or day)
  4. Requests exceeding limits will receive a 429 status code
Explore rate limiting configuration for advanced options.
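On the client side, it is worth handling those 429s gracefully. Below is a minimal retry sketch; post_with_backoff is an illustrative helper (not part of any SDK) that reuses the url, headers, and data variables from the earlier steps:

import time
import requests

def post_with_backoff(url, headers, data, max_retries=5):
    """Retry gateway requests that hit a rate limit (HTTP 429)."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Honor Retry-After if the gateway sends it; otherwise back off exponentially
        time.sleep(float(response.headers.get("Retry-After", 2 ** attempt)))
    raise RuntimeError("Rate limit retries exhausted")

result = post_with_backoff(url, headers, data)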

FAQ

Why am I getting an authentication error?
This usually means your Cerebras API key is invalid or missing. Double-check that:
  1. Your CEREBRAS_API_KEY environment variable is set correctly
  2. The API key is active and hasn’t been revoked
  3. You’re using the correct Authorization header format: Bearer YOUR_API_KEY
You can verify your API key by making a direct request to Cerebras (without AI Gateway) to isolate the issue.
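For example, a quick direct check against the Cerebras endpoint:

import os
import requests

# Call Cerebras directly (no gateway) to isolate authentication problems
response = requests.post(
    "https://api.cerebras.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.getenv('CEREBRAS_API_KEY')}"},
    json={"model": "llama3.1-8b", "messages": [{"role": "user", "content": "ping"}]},
)
print(response.status_code)  # 200 means the key works; 401 points to the key itself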

How do I view request logs and analytics?
To view request logs and analytics:
  1. Log in to your Cloudflare dashboard
  2. Navigate to AI > AI Gateway
  3. Select your gateway
  4. Click on the Analytics tab to see request metrics and logs
Logs include request/response bodies, timestamps, token usage, latency metrics, and error details. You can filter by date range, model, and status code.

How much latency does AI Gateway add?
Cloudflare AI Gateway adds minimal latency (typically 10-50ms) as requests are routed through Cloudflare's global network. However, this is often offset by:
  • Caching: Repeated queries are served instantly from cache with near-zero latency
  • Edge network: Cloudflare’s global edge network may provide faster routing than direct connections
  • Optimization insights: The observability features help you identify and fix performance bottlenecks
For latency-critical applications, you can measure the impact by comparing direct Cerebras requests with gateway-routed requests using the same prompts.
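One rough way to do this is to time identical requests against both endpoints (disable caching first so cached hits don't skew the numbers). time_request below is an illustrative helper that reuses the url, headers, and data variables from the earlier steps:

import time
import requests

def time_request(endpoint, headers, data, runs=5):
    """Average wall-clock latency over several identical requests."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        requests.post(endpoint, headers=headers, json=data)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

direct_url = "https://api.cerebras.ai/v1/chat/completions"
print(f"gateway: {time_request(url, headers, data):.3f}s")
print(f"direct:  {time_request(direct_url, headers, data):.3f}s")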

Does AI Gateway support streaming responses?
Yes! AI Gateway fully supports streaming responses from Cerebras. Simply set stream=True (Python) or stream: true (JavaScript) in your request, and chunks will be streamed through the gateway in real time. All streaming requests are still logged and counted in your analytics, giving you complete visibility into your streaming workloads.

What happens if Cloudflare AI Gateway goes down?
Cloudflare AI Gateway is built on Cloudflare's highly reliable global network with 99.99%+ uptime. In the rare event of an outage:
  1. You can temporarily switch to direct Cerebras API calls by changing your base_url to https://api.cerebras.ai/v1
  2. Cloudflare provides real-time status updates at cloudflarestatus.com
  3. Your application code doesn’t need to change - just update the base URL configuration
Consider implementing automatic fallback logic in production applications to switch between gateway and direct endpoints based on availability.
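A minimal fallback sketch is shown below. chat_with_fallback is an illustrative helper, and note that requests served by the direct endpoint will not appear in your gateway analytics:

import requests

DIRECT_URL = "https://api.cerebras.ai/v1/chat/completions"

def chat_with_fallback(gateway_url, headers, data):
    """Try the gateway first; fall back to the direct Cerebras endpoint on failure."""
    for endpoint in (gateway_url, DIRECT_URL):
        try:
            response = requests.post(endpoint, headers=headers, json=data, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            continue  # this endpoint failed; try the next one
    raise RuntimeError("Both the gateway and the direct endpoint failed")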

Which Cerebras models are available through AI Gateway?
All current Cerebras models are available through AI Gateway:
  • llama-3.3-70b - Best for complex reasoning, long-form content, and tasks requiring deep understanding
  • qwen-3-32b - Balanced performance for general-purpose applications
  • llama3.1-8b - Fastest option for simple tasks and high-throughput scenarios
  • gpt-oss-120b - Large open-weight model for demanding tasks
  • zai-glm-4.6 - Advanced 357B parameter model with strong reasoning capabilities
You can use any of these models by specifying the model name in your request. When using the OpenAI-compatible endpoint, prefix the model name with cerebras/ (e.g., cerebras/llama-3.3-70b).
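For reference, a request through the OpenAI-compatible endpoint might look like the sketch below. The /compat/ path segment is assumed from Cloudflare's OpenAI-compatibility documentation; verify the exact URL for your account:

import os
import requests

account_id = os.getenv("CLOUDFLARE_ACCOUNT_ID")
gateway_id = os.getenv("CLOUDFLARE_GATEWAY_ID")

# OpenAI-compatible endpoint path (assumed; check Cloudflare's current docs)
compat_url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/compat/chat/completions"

response = requests.post(
    compat_url,
    headers={"Authorization": f"Bearer {os.getenv('CEREBRAS_API_KEY')}"},
    json={
        "model": "cerebras/llama-3.3-70b",  # provider-prefixed model name
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(response.json()["choices"][0]["message"]["content"])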