Kong API Gateway is a cloud-native gateway that provides powerful traffic management, security, and observability features. When combined with Cerebras’s ultra-fast inference, Kong enables you to build scalable, secure AI applications with advanced routing, authentication, and monitoring capabilities.

Prerequisites

  • A Cerebras account and API key
  • A Kong Konnect account
  • Docker installed and running (the quickstart script deploys Kong Gateway in a container)
  • curl, plus Homebrew (or another way to install decK)

Quick Start

This guide uses Kong Konnect (cloud-managed) with a local data plane for testing. All commands are copy-paste ready.

Step 1: Get your Cerebras API Key

  1. Go to https://cloud.cerebras.ai
  2. Sign in or create an account
  3. Navigate to API Keys in the dashboard
  4. Click Create API Key
  5. Copy the key (starts with csk-)
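
If you want to confirm the key works before involving Kong, you can call Cerebras’s OpenAI-compatible models endpoint directly. A quick optional check, assuming the CEREBRAS_API_KEY variable you’ll export in step 3:
# Optional: list available models to verify the key is valid
curl -s https://api.cerebras.ai/v1/models \
  --header "Authorization: Bearer $CEREBRAS_API_KEY"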

Step 2: Generate a Kong Konnect Personal Access Token

  1. Sign in to Kong Konnect
  2. Click your profile icon (top-right corner)
  3. Select Personal Access Tokens
  4. Click Generate Token
  5. Give it a name like cerebras-integration
  6. Click Generate
  7. Copy the token immediately (starts with kpat_)

Step 3: Set up environment variables

Export your Konnect token and Cerebras API key:
export KONNECT_TOKEN=kpat_your-token-here
export CEREBRAS_API_KEY=csk-your-key-here
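
Optionally, make the shell fail fast if either variable is missing. This is a small bash sanity check, nothing Kong-specific:
# Abort with a message if either variable is empty or unset
: "${KONNECT_TOKEN:?KONNECT_TOKEN is not set}"
: "${CEREBRAS_API_KEY:?CEREBRAS_API_KEY is not set}"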

Step 4: Install decK

Install decK (Kong’s configuration tool):
brew install kong/deck/deck

# Verify installation
deck version
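
As a quick connectivity check, decK can ping Konnect with your token. This sketch assumes a recent decK release, where gateway commands live under deck gateway:
# Confirm decK can reach your Konnect organization
deck gateway ping --konnect-token "$KONNECT_TOKEN"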

Step 5: Deploy Kong Gateway

Run Kong’s quickstart script, passing your Konnect token, to deploy a local data plane connected to Konnect:
curl -Ls https://get.konghq.com/quickstart | bash -s -- -k "$KONNECT_TOKEN"
This script:
  • Creates a control plane in Konnect
  • Deploys a local Kong Gateway data plane using Docker
  • Configures everything to work together
Wait for the script to complete. You should see confirmation that Kong Gateway is running.
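
To verify the data plane is listening, hit the proxy port before any routes exist. A 404 from Kong (with a "no Route matched" message), rather than a connection error, confirms the gateway itself is up:
# Expect HTTP 404 with a "no Route matched" body
curl -i http://localhost:8000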

Step 6: Create Gateway Service and Route

Create a service and route for Cerebras:
echo '
_format_version: "3.0"
services:
  - name: cerebras-service
    url: https://api.cerebras.ai
routes:
  - name: cerebras-route
    paths:
      - /chat
    methods:
      - POST
    service:
      name: cerebras-service
' | deck gateway apply -
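
To confirm the service and route were created, dump the control plane’s current configuration back out with decK (writing to stdout with -o -):
# Print the current gateway configuration to stdout
deck gateway dump -o -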

Step 7: Configure AI Proxy Plugin

Add the AI Proxy plugin to route traffic to Cerebras:
echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer '${CEREBRAS_API_KEY}'
      model:
        provider: cerebras
        name: llama-3.3-70b
        options:
          max_tokens: 1024
          temperature: 0.7
' | deck gateway apply -

Step 8: Test your integration

Send a test request:
curl -X POST http://localhost:8000/chat \
  --header "Content-Type: application/json" \
  --data '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "Hello! What can you help me with?"}
    ]
  }'
You should receive a response from Cerebras routed through Kong!
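
The AI Proxy plugin returns OpenAI-style chat completions, so you can pull out just the assistant’s reply with jq. A small sketch, assuming jq is installed:
# Extract only the assistant's message from the JSON response
curl -s -X POST http://localhost:8000/chat \
  --header "Content-Type: application/json" \
  --data '{"messages":[{"role":"user","content":"Hello!"}]}' \
  | jq -r '.choices[0].message.content'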

What’s Happening?

  1. Kong Gateway runs locally in Docker (port 8000)
  2. Kong Konnect manages the control plane in the cloud
  3. AI Proxy plugin intercepts requests to /chat and routes them to Cerebras
  4. Your Cerebras API key is securely injected by the plugin
  5. Responses flow back through Kong to your client

Using Different Models

To use a different Cerebras model, update the plugin configuration:
echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer '${CEREBRAS_API_KEY}'
      model:
        provider: cerebras
        name: llama3.1-8b  # or qwen-3-32b, gpt-oss-120b
        options:
          max_tokens: 512
          temperature: 0.5
' | deck gateway apply -

Advanced Configuration

Using Different Cerebras Models

Kong’s AI Proxy plugin supports all Cerebras models. Simply update the model configuration:
# For faster responses with Llama 3.1 8B
model:
  provider: cerebras
  name: llama3.1-8b
  options:
    max_tokens: 512
    temperature: 0.5

# For balanced performance with Qwen 3 32B
model:
  provider: cerebras
  name: qwen-3-32b
  options:
    max_tokens: 1024
    temperature: 0.7

# For maximum capability with GPT-OSS 120B
model:
  provider: cerebras
  name: gpt-oss-120b
  options:
    max_tokens: 2048
    temperature: 0.8

Benefits of Using Kong with Cerebras

  • Centralized Management: Manage all AI API traffic through a single gateway
  • Security: Add authentication, rate limiting, and IP whitelisting (see the rate-limiting sketch below)
  • Observability: Monitor request patterns, latency, and errors
  • Load Balancing: Distribute traffic across multiple Cerebras endpoints
  • Caching: Reduce costs and improve response times with intelligent caching
  • Transformation: Modify requests and responses without changing your application code
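
For example, rate limiting takes only one more plugin. A minimal sketch using Kong’s bundled rate-limiting plugin, scoped to the route created earlier; the limit of 10 requests per minute is purely illustrative:
echo '
_format_version: "3.0"
plugins:
  - name: rate-limiting
    route: cerebras-route
    config:
      minute: 10     # illustrative: 10 requests per minute per client
      policy: local  # counters kept in memory on this node
' | deck gateway apply -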

Troubleshooting

Why am I getting authentication errors from Cerebras?

This usually indicates an issue with your Cerebras API key configuration:
  1. Verify your CEREBRAS_API_KEY environment variable is set correctly
  2. Check that the API key is valid and active in your Cerebras dashboard
  3. Ensure the header value format is correct: Bearer YOUR_API_KEY
  4. Test direct Cerebras API access to isolate the issue:
curl -X POST https://api.cerebras.ai/v1/chat/completions \
  --header "Authorization: Bearer $CEREBRAS_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{"model":"llama3.1-8b","messages":[{"role":"user","content":"test"}]}'

How do I debug plugin issues?

Enable debug logging in Kong to troubleshoot plugin issues:
# Enable debug logging at runtime (Kong Gateway 3.1+)
curl -X PUT http://localhost:8001/debug/node/log-level/debug

# Check plugin configuration
curl -X GET http://localhost:8001/plugins

# View specific plugin details
curl -X GET http://localhost:8001/plugins/{plugin-id}
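
To find the {plugin-id} for the last command, filter the plugin list with jq (assuming jq is available):
# Show each configured plugin's name and id
curl -s http://localhost:8001/plugins | jq '.data[] | {name, id}'
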
Common issues to check:
  • Plugin is enabled and properly configured
  • Environment variables are correctly set
  • Service and route configurations match
  • Upstream connectivity to Cerebras API

Does Kong support streaming responses?

Yes! Kong’s AI Proxy plugin fully supports streaming responses from Cerebras. Simply include "stream": true in your request:
# -N disables curl's output buffering so chunks print as they arrive
curl -N -X POST http://localhost:8000/chat \
  --header "Content-Type: application/json" \
  --data '{
    "messages": [
      {"role": "user", "content": "Write a short story"}
    ],
    "stream": true,
    "max_tokens": 500
  }'
The streaming response will be passed through Kong in real-time, maintaining the low latency benefits of Cerebras inference.

How can I monitor my Cerebras API usage?

Kong provides several ways to monitor your Cerebras API usage:
  1. Kong Vitals: Built-in analytics for request metrics
  2. Prometheus Metrics: Export metrics for monitoring systems
  3. Custom Logging: Configure detailed request/response logging
  4. Datadog Integration: Send metrics and logs to Datadog
Example Prometheus metrics setup:
curl -X POST http://localhost:8001/plugins \
  --data "name=prometheus" \
  --data "config.per_consumer=true"
Access metrics at: http://localhost:8001/metrics
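
The endpoint serves plain-text Prometheus exposition format; a quick grep surfaces the Kong-specific series:
# List Kong metrics (request counts, latency histograms, bandwidth, ...)
curl -s http://localhost:8001/metrics | grep "^kong_"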
