Prerequisites
- Cerebras API Key - Get a free API key at cloud.cerebras.ai
- Kong Konnect Account - Sign up for free at Kong Konnect
Quick Start
This guide uses Kong Konnect (cloud-managed) with a local data plane for testing. All commands are copy-paste ready.
1. Get your Cerebras API Key
- Go to https://cloud.cerebras.ai
- Sign in or create an account
- Navigate to API Keys in the dashboard
- Click Create API Key
- Copy the key (starts with csk-)
2. Generate Kong Konnect Personal Access Token
- Sign in to Kong Konnect
- Click your profile icon (top-right corner)
- Select Personal Access Tokens
- Click Generate Token
- Give it a name like cerebras-integration
- Click Generate
- Copy the token immediately (starts with kpat_)
3. Set up environment variables
Export your API keys:
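A minimal sketch; the variable names CEREBRAS_API_KEY and KONNECT_TOKEN are assumptions carried through the rest of this guide:

```bash
# Paste the values you copied in the previous steps (placeholders shown)
export CEREBRAS_API_KEY="csk-..."   # Cerebras API key
export KONNECT_TOKEN="kpat_..."     # Konnect personal access token
```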
4. Install decK
Install decK (Kong’s configuration tool). The command differs by platform (macOS, Linux, Windows); a sketch follows.
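A sketch assuming Homebrew on macOS; Linux and Windows users can fetch a release binary directly:

```bash
# macOS (Homebrew)
brew tap kong/deck
brew install deck

# Linux / Windows: download the release archive for your platform from
# https://github.com/Kong/deck/releases and place the deck binary on your PATH

# Verify the install
deck version
```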
5. Deploy Kong Gateway
Run Kong’s quickstart script (shown after this list) to deploy a local data plane connected to Konnect. This script:
- Creates a control plane in Konnect
- Deploys a local Kong Gateway data plane using Docker
- Configures everything to work together
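A sketch of the quickstart invocation, assuming Kong’s script at get.konghq.com accepts the Konnect token via -k (check the Konnect quickstart docs if the flags have changed):

```bash
# Downloads and runs Kong's quickstart: creates a Konnect control plane
# and starts a local data plane in Docker (listening on port 8000)
curl -Ls https://get.konghq.com/quickstart | bash -s -- -k "$KONNECT_TOKEN"
```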
6. Create Gateway Service and Route
Create a service and route for Cerebras:
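A sketch using decK; the entity names (cerebras-service, cerebras-route) and the control plane placeholder are illustrative, and the --konnect-* flags assume a recent decK version:

```bash
# Declare a service pointing at Cerebras and a route matching /chat
cat <<'EOF' > kong.yaml
_format_version: "3.0"
services:
  - name: cerebras-service
    url: https://api.cerebras.ai
    routes:
      - name: cerebras-route
        paths:
          - /chat
EOF

# Sync the declarative config to your Konnect control plane
deck gateway sync kong.yaml \
  --konnect-token "$KONNECT_TOKEN" \
  --konnect-control-plane-name <your-control-plane>
```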
7. Configure AI Proxy Plugin
Add the AI Proxy plugin to route traffic to Cerebras:
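A sketch appending the plugin to the decK file from the previous step. The fields follow Kong’s AI Proxy plugin schema (route_type, auth, model); treating Cerebras as an OpenAI-compatible provider via upstream_url is an assumption here, and the model name is illustrative:

```bash
# Attach ai-proxy to the Cerebras route; the unquoted heredoc lets the
# shell substitute $CEREBRAS_API_KEY into the file
cat <<EOF >> kong.yaml
plugins:
  - name: ai-proxy
    route: cerebras-route
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer $CEREBRAS_API_KEY
      model:
        provider: openai   # Cerebras exposes an OpenAI-compatible API
        name: llama-3.3-70b
        options:
          upstream_url: https://api.cerebras.ai/v1/chat/completions
EOF

deck gateway sync kong.yaml \
  --konnect-token "$KONNECT_TOKEN" \
  --konnect-control-plane-name <your-control-plane>
```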
8. Test your integration
Send a test request to the /chat route; you should receive a response from Cerebras routed through Kong!
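For example (the model is set by the plugin, so the body only needs messages; the prompt text is illustrative):

```bash
curl -s http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Why is fast inference important?"}
    ]
  }'
```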
What’s Happening?
- Kong Gateway runs locally in Docker (port 8000)
- Kong Konnect manages the control plane in the cloud
- AI Proxy plugin intercepts requests to /chat and routes them to Cerebras
- Your Cerebras API key is securely injected by the plugin
- Responses flow back through Kong to your client
Advanced Configuration
Using Different Cerebras Models
Kong’s AI Proxy plugin supports all Cerebras models. To use a different model, simply update the model name in the plugin configuration:
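A sketch continuing the decK file from above; the replacement model name is illustrative, so check Cerebras’s model list for current names:

```bash
# Swap the model name in kong.yaml, then re-sync.
# GNU sed shown; on macOS use: sed -i '' ...
sed -i 's/name: llama-3.3-70b/name: qwen-3-32b/' kong.yaml

deck gateway sync kong.yaml \
  --konnect-token "$KONNECT_TOKEN" \
  --konnect-control-plane-name <your-control-plane>
```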
Benefits of Using Kong with Cerebras
- Centralized Management: Manage all AI API traffic through a single gateway
- Security: Add authentication, rate limiting, and IP whitelisting
- Observability: Monitor request patterns, latency, and errors
- Load Balancing: Distribute traffic across multiple Cerebras endpoints
- Caching: Reduce costs and improve response times with intelligent caching
- Transformation: Modify requests and responses without changing your application code
Troubleshooting
Why am I getting 401 Unauthorized errors?
A 401 from Cerebras usually means the API key is missing or invalid. Check that CEREBRAS_API_KEY is set and starts with csk-, and that the AI Proxy plugin’s auth configuration injects it as a Bearer token in the Authorization header.
How do I debug plugin configuration issues?
Enable debug logging in Kong to troubleshoot plugin issues (see the sketch after this list). Common issues to check:
- Plugin is enabled and properly configured
- Environment variables are correctly set
- Service and route configurations match
- Upstream connectivity to Cerebras API
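A sketch assuming the quickstart’s Docker-based data plane; the container name is an assumption, so check docker ps first:

```bash
# Find the Kong data plane container
docker ps --filter "name=kong"

# Tail its logs while you replay a failing request
docker logs -f <kong-container> 2>&1 | grep -i "ai-proxy"

# For more verbosity, restart the container with KONG_LOG_LEVEL=debug
# set in its environment
```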
Can I use streaming responses with Kong?
Yes! Kong’s AI Proxy plugin fully supports streaming responses from Cerebras. Simply include "stream": true in your request, as in the example below. The streaming response will be passed through Kong in real time, maintaining the low-latency benefits of Cerebras inference.
How do I monitor Cerebras usage through Kong?
Kong provides several ways to monitor your Cerebras API usage:
- Kong Vitals: Built-in analytics for request metrics
- Prometheus Metrics: Export metrics for monitoring systems
- Custom Logging: Configure detailed request/response logging
- Datadog Integration: Send metrics and logs to Datadog
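As one sketch, enabling the bundled Prometheus plugin globally via decK (the deck gateway apply subcommand and its flags assume a recent decK version; adjust for your setup):

```bash
# Declare only the plugin; apply adds it without touching other entities
cat <<'EOF' > prometheus-plugin.yaml
_format_version: "3.0"
plugins:
  - name: prometheus
EOF

deck gateway apply prometheus-plugin.yaml \
  --konnect-token "$KONNECT_TOKEN" \
  --konnect-control-plane-name <your-control-plane>
```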
Access metrics at http://localhost:8001/metrics (the local Admin API port).
Next Steps
- Explore the Kong AI Proxy documentation for advanced configuration options
- Configure advanced security plugins for production deployments
- Implement monitoring and alerting for your AI workloads
- Try different Cerebras models to optimize for your specific use case
- Set up high availability configurations for production workloads
- Migrate to GLM4.6: Ready to upgrade? Follow our migration guide to start using our latest model

