Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here
- Cloudflare Account - Visit Cloudflare and create an account or log in
- AI Gateway Created - Set up an AI Gateway in your Cloudflare dashboard
- Python 3.11 or higher (for Python examples)
Configure Cloudflare AI Gateway
1. Create an AI Gateway
First, you’ll need to create an AI Gateway in your Cloudflare dashboard to enable request routing and monitoring.
- Log in to the Cloudflare dashboard
- Navigate to AI > AI Gateway
- Click Create Gateway
- Give your gateway a name (e.g., “cbrs” or “cerebras-gateway”) - this will be your Gateway ID
- Click Create to complete the setup
Important: After creating your gateway, note these two values:
- Gateway ID: This is the name you just chose for your gateway (e.g., “cbrs”)
- Account ID: This is visible in your browser’s URL bar (e.g., xxxxxx)
Your gateway endpoint URL follows this format: https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras
For example: https://gateway.ai.cloudflare.com/v1/xxxxxx/cbrs/cerebras
2. Install required dependencies
Install the necessary Python packages:
3. Configure environment variables
Create a .env file in your project directory to securely store your credentials. Replace the placeholder values with your actual credentials:
- CEREBRAS_API_KEY: Your Cerebras API key from the dashboard
- CLOUDFLARE_ACCOUNT_ID: The account ID from your Cloudflare dashboard URL (e.g., xxxxxx)
- CLOUDFLARE_GATEWAY_ID: The gateway name you chose when creating the gateway
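An example .env file might look like this (all values are placeholders):

```
CEREBRAS_API_KEY=your-api-key-here
CLOUDFLARE_ACCOUNT_ID=xxxxxx
CLOUDFLARE_GATEWAY_ID=cbrs
```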
4. Initialize the client with AI Gateway
Set up your client to route requests through Cloudflare AI Gateway. The key is to use Cloudflare’s gateway URL as your base URL, which automatically enables logging, caching, and analytics.
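A minimal sketch of this setup. The ID values shown are placeholders, and the commented client initialization assumes the `cerebras_cloud_sdk` package; any OpenAI-compatible client accepts the same `base_url` parameter:

```python
import os


def gateway_base_url(account_id: str, gateway_id: str) -> str:
    """Build the Cloudflare AI Gateway endpoint for Cerebras."""
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras"


# Placeholder defaults; load your real values from .env (e.g., with python-dotenv)
base_url = gateway_base_url(
    os.environ.get("CLOUDFLARE_ACCOUNT_ID", "xxxxxx"),
    os.environ.get("CLOUDFLARE_GATEWAY_ID", "cbrs"),
)

# With the Cerebras SDK, pass the gateway URL as base_url:
# from cerebras.cloud.sdk import Cerebras
# client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"], base_url=base_url)
print(base_url)
```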
5. Make your first request
Now you can make requests exactly as you would with the standard Cerebras API. All requests will automatically flow through AI Gateway, enabling logging and analytics.
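As a sketch of what such a request looks like on the wire, here is a stdlib-only version using `urllib` against a placeholder gateway URL (in practice you would use the SDK client configured above; the request only fires when an API key is set):

```python
import json
import os
import urllib.request

BASE_URL = "https://gateway.ai.cloudflare.com/v1/xxxxxx/cbrs/cerebras"  # your gateway URL

# Standard chat-completions payload; the gateway forwards it to Cerebras unchanged
payload = {
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello, world!"}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('CEREBRAS_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)

# Only send the request when a key is actually configured
if os.environ.get("CEREBRAS_API_KEY"):
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```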
6. Enable streaming responses
Cloudflare AI Gateway fully supports streaming responses from Cerebras, allowing you to display results in real-time as they’re generated.
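With the SDK you simply pass `stream=True` and iterate over the chunks. Under the hood, streamed responses arrive as server-sent-event lines; a minimal parser sketch, assuming the OpenAI-compatible chunk shape (the helper name is illustrative):

```python
import json


def parse_sse_line(line: str):
    """Extract the text delta from one 'data: ...' SSE line, or None."""
    if not line.startswith("data: "):
        return None
    body = line[len("data: "):].strip()
    if body == "[DONE]":  # sentinel marking the end of the stream
        return None
    chunk = json.loads(body)
    return chunk["choices"][0].get("delta", {}).get("content")


# Example line in the shape streamed by OpenAI-compatible APIs:
sample = 'data: {"choices": [{"delta": {"content": "Hello"}}]}'
print(parse_sse_line(sample))  # → Hello
```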
Key Features
Request Logging and Analytics
All requests routed through AI Gateway are automatically logged in your Cloudflare dashboard. You can view:
- Request volume and patterns over time
- Token usage and cost tracking
- Response times and latency metrics
- Error rates and types for debugging
- Model usage distribution
Caching
Enable caching to reduce costs and improve response times for repeated queries. Cached responses are served instantly without hitting the Cerebras API:
- Go to your AI Gateway in the Cloudflare dashboard
- Navigate to Settings > Caching
- Enable caching and configure TTL (time-to-live)
- Cached responses will be served instantly for identical requests
Rate Limiting
Protect your application and control costs with flexible rate limiting:
- In your AI Gateway settings, go to Rate Limiting
- Set limits per user, IP address, or API key
- Configure time windows (per minute, hour, or day)
- Requests exceeding limits will receive a 429 status code
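Clients should treat that 429 as a signal to back off and retry. A minimal client-side sketch (function names are illustrative, not part of any SDK):

```python
import time
import urllib.error


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def call_with_retry(send, max_attempts: int = 5):
    """Retry `send()` when the gateway returns 429 Too Many Requests."""
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise  # non-rate-limit error, or out of attempts
            time.sleep(backoff_delay(attempt))
```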
Next Steps
- Explore the Cloudflare AI Gateway documentation for advanced features
- Learn about caching strategies to optimize performance and reduce costs
- Set up custom analytics dashboards to monitor your AI workloads
- Try different Cerebras models to find the best fit for your use case
- Implement rate limiting to control costs and protect your application
- Review the Cerebras API reference for all available parameters
- Want to use the latest model? Check out the GLM4.6 migration guide
FAQ
Why am I getting a 401 Unauthorized error?
A 401 usually means your CEREBRAS_API_KEY is missing or invalid, or that the account ID or gateway ID in your base URL doesn’t match your Cloudflare dashboard. Double-check the values in your .env file and verify your gateway URL follows the format shown above.
How do I view my request logs?
To view request logs and analytics:
- Log in to your Cloudflare dashboard
- Navigate to AI > AI Gateway
- Select your gateway
- Click on the Analytics tab to see request metrics and logs
Does AI Gateway add latency to my requests?
Cloudflare AI Gateway adds minimal latency (typically 10-50ms) as requests are routed through Cloudflare’s global network. However, this is often offset by:
- Caching: Repeated queries are served instantly from cache with near-zero latency
- Edge network: Cloudflare’s global edge network may provide faster routing than direct connections
- Optimization insights: The observability features help you identify and fix performance bottlenecks
Can I use AI Gateway with streaming responses?
Yes! AI Gateway fully supports streaming responses from Cerebras. Simply set stream=True (Python) or stream: true (JavaScript) in your request, and chunks will be streamed through the gateway in real-time. All streaming requests are still logged and counted in your analytics, giving you complete visibility into your streaming workloads.
What happens if AI Gateway is down?
Cloudflare AI Gateway is built on Cloudflare’s highly reliable global network with 99.99%+ uptime. In the rare event of an outage:
- You can temporarily switch to direct Cerebras API calls by changing your base_url to https://api.cerebras.ai/v1
- Cloudflare provides real-time status updates at cloudflarestatus.com
- Your application code doesn’t need to change - just update the base URL configuration
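That fallback can be sketched as a single configuration switch (the gateway URL is a placeholder; the environment-variable name is illustrative):

```python
import os

GATEWAY_URL = "https://gateway.ai.cloudflare.com/v1/xxxxxx/cbrs/cerebras"  # your gateway
DIRECT_URL = "https://api.cerebras.ai/v1"


def pick_base_url(use_gateway: bool) -> str:
    """Fall back to the direct Cerebras endpoint when the gateway is unavailable."""
    return GATEWAY_URL if use_gateway else DIRECT_URL


# e.g., flip with an environment variable during an outage
base_url = pick_base_url(os.environ.get("USE_GATEWAY", "1") == "1")
```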
Which Cerebras models are available through AI Gateway?
All current Cerebras models are available through AI Gateway:
- llama-3.3-70b - Best for complex reasoning, long-form content, and tasks requiring deep understanding
- qwen-3-32b - Balanced performance for general-purpose applications
- llama3.1-8b - Fastest option for simple tasks and high-throughput scenarios
- gpt-oss-120b - Largest model for the most demanding tasks
- zai-glm-4.6 - Advanced 357B parameter model with strong reasoning capabilities
Some configurations require prefixing model names with cerebras/ (e.g., cerebras/llama-3.3-70b).
