Kong API Gateway is a cloud-native gateway that provides powerful traffic management, security, and observability features. When combined with Cerebras’s ultra-fast inference, Kong enables you to build scalable, secure AI applications with advanced routing, authentication, and monitoring capabilities.

Prerequisites

  • A Cerebras account and API key
  • A Kong Konnect account
  • Docker installed and running (the quickstart script deploys Kong Gateway in a container)
  • curl, plus Homebrew (or another way to install decK)

Quick Start

This guide uses Kong Konnect (cloud-managed) with a local data plane for testing. All commands are copy-paste ready.

Step 1: Get your Cerebras API Key

  1. Go to https://cloud.cerebras.ai
  2. Sign in or create an account
  3. Navigate to API Keys in the dashboard
  4. Click Create API Key
  5. Copy the key (starts with csk-)
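
If you want to confirm the key works before involving Kong, you can call Cerebras’s OpenAI-compatible models endpoint directly. A quick optional check, assuming the CEREBRAS_API_KEY variable you’ll export in step 3:
# Optional: list available models to verify the key is valid
curl -s https://api.cerebras.ai/v1/models \
  --header "Authorization: Bearer $CEREBRAS_API_KEY"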

Step 2: Generate a Kong Konnect Personal Access Token

  1. Sign in to Kong Konnect
  2. Click your profile icon (top-right corner)
  3. Select Personal Access Tokens
  4. Click Generate Token
  5. Give it a name like cerebras-integration
  6. Click Generate
  7. Copy the token immediately (starts with kpat_)

Step 3: Set up environment variables

Export your Konnect token and Cerebras API key:
export KONNECT_TOKEN=kpat_your-token-here
export CEREBRAS_API_KEY=csk-your-key-here
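
Optionally, make the shell fail fast if either variable is missing. This is a small bash sanity check, nothing Kong-specific:
# Abort with a message if either variable is empty or unset
: "${KONNECT_TOKEN:?KONNECT_TOKEN is not set}"
: "${CEREBRAS_API_KEY:?CEREBRAS_API_KEY is not set}"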

Step 4: Install decK

Install decK (Kong’s configuration tool):
brew install kong/deck/deck

# Verify installation
deck version
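
As a quick connectivity check, decK can ping Konnect with your token. This sketch assumes a recent decK release, where gateway commands live under deck gateway:
# Confirm decK can reach your Konnect organization
deck gateway ping --konnect-token "$KONNECT_TOKEN"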

Step 5: Deploy Kong Gateway

Run Kong’s quickstart script, passing your Konnect token, to deploy a local data plane connected to Konnect:
curl -Ls https://get.konghq.com/quickstart | bash -s -- -k "$KONNECT_TOKEN"
This script:
  • Creates a control plane in Konnect
  • Deploys a local Kong Gateway data plane using Docker
  • Configures everything to work together
Wait for the script to complete. You should see confirmation that Kong Gateway is running.
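
To verify the data plane is listening, hit the proxy port before any routes exist. A 404 from Kong (with a "no Route matched" message), rather than a connection error, confirms the gateway itself is up:
# Expect HTTP 404 with a "no Route matched" body
curl -i http://localhost:8000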

Step 6: Create Gateway Service and Route

Create a service and route for Cerebras:
echo '
_format_version: "3.0"
services:
  - name: cerebras-service
    url: https://api.cerebras.ai
routes:
  - name: cerebras-route
    paths:
      - /chat
    methods:
      - POST
    service:
      name: cerebras-service
' | deck gateway apply -
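
To confirm the service and route were created, dump the control plane’s current configuration back out with decK (writing to stdout with -o -):
# Print the current gateway configuration to stdout
deck gateway dump -o -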

Step 7: Configure AI Proxy Plugin

Add the AI Proxy plugin to route traffic to Cerebras:
echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer '${CEREBRAS_API_KEY}'
      model:
        provider: cerebras
        name: llama-3.3-70b
        options:
          max_tokens: 1024
          temperature: 0.7
' | deck gateway apply -

Step 8: Test your integration

Send a test request:
curl -X POST http://localhost:8000/chat \
  --header "Content-Type: application/json" \
  --data '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant"},
      {"role": "user", "content": "Hello! What can you help me with?"}
    ]
  }'
You should receive a response from Cerebras routed through Kong!
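
The AI Proxy plugin returns OpenAI-style chat completions, so you can pull out just the assistant’s reply with jq. A small sketch, assuming jq is installed:
# Extract only the assistant's message from the JSON response
curl -s -X POST http://localhost:8000/chat \
  --header "Content-Type: application/json" \
  --data '{"messages":[{"role":"user","content":"Hello!"}]}' \
  | jq -r '.choices[0].message.content'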

What’s Happening?

  1. Kong Gateway runs locally in Docker (port 8000)
  2. Kong Konnect manages the control plane in the cloud
  3. AI Proxy plugin intercepts requests to /chat and routes them to Cerebras
  4. Your Cerebras API key is securely injected by the plugin
  5. Responses flow back through Kong to your client

Using Different Models

To use a different Cerebras model, update the plugin configuration:
echo '
_format_version: "3.0"
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer '${CEREBRAS_API_KEY}'
      model:
        provider: cerebras
        name: llama3.1-8b  # or qwen-3-32b, gpt-oss-120b
        options:
          max_tokens: 512
          temperature: 0.5
' | deck gateway apply -

Advanced Configuration

Using Different Cerebras Models

Kong’s AI Proxy plugin supports all Cerebras models. Simply update the model configuration:
# For faster responses with Llama 3.1 8B
model:
  provider: cerebras
  name: llama3.1-8b
  options:
    max_tokens: 512
    temperature: 0.5

# For balanced performance with Qwen 3 32B
model:
  provider: cerebras
  name: qwen-3-32b
  options:
    max_tokens: 1024
    temperature: 0.7

# For maximum capability with GPT-OSS 120B
model:
  provider: cerebras
  name: gpt-oss-120b
  options:
    max_tokens: 2048
    temperature: 0.8

Benefits of Using Kong with Cerebras

  • Centralized Management: Manage all AI API traffic through a single gateway
  • Security: Add authentication, rate limiting, and IP whitelisting (see the rate-limiting sketch below)
  • Observability: Monitor request patterns, latency, and errors
  • Load Balancing: Distribute traffic across multiple Cerebras endpoints
  • Caching: Reduce costs and improve response times with intelligent caching
  • Transformation: Modify requests and responses without changing your application code
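
For example, rate limiting takes only one more plugin. A minimal sketch using Kong’s bundled rate-limiting plugin, scoped to the route created earlier; the limit of 10 requests per minute is purely illustrative:
echo '
_format_version: "3.0"
plugins:
  - name: rate-limiting
    route: cerebras-route
    config:
      minute: 10     # illustrative: 10 requests per minute per client
      policy: local  # counters kept in memory on this node
' | deck gateway apply -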

Troubleshooting

Why am I getting authentication errors from Cerebras?

This usually indicates an issue with your Cerebras API key configuration:
  1. Verify your CEREBRAS_API_KEY environment variable is set correctly
  2. Check that the API key is valid and active in your Cerebras dashboard
  3. Ensure the header value format is correct: Bearer YOUR_API_KEY
  4. Test direct Cerebras API access to isolate the issue:
curl -X POST https://api.cerebras.ai/v1/chat/completions \
  --header "Authorization: Bearer $CEREBRAS_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{"model":"llama3.1-8b","messages":[{"role":"user","content":"test"}]}'

How do I debug plugin issues?

Enable debug logging in Kong to troubleshoot plugin issues:
# Enable debug logging at runtime (Kong Gateway 3.1+)
curl -X PUT http://localhost:8001/debug/node/log-level/debug

# Check plugin configuration
curl -X GET http://localhost:8001/plugins

# View specific plugin details
curl -X GET http://localhost:8001/plugins/{plugin-id}
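
To find the {plugin-id} for the last command, filter the plugin list with jq (assuming jq is available):
# Show each configured plugin's name and id
curl -s http://localhost:8001/plugins | jq '.data[] | {name, id}'
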
Common issues to check:
  • Plugin is enabled and properly configured
  • Environment variables are correctly set
  • Service and route configurations match
  • Upstream connectivity to Cerebras API

Does Kong support streaming responses?

Yes! Kong’s AI Proxy plugin fully supports streaming responses from Cerebras. Simply include "stream": true in your request:
# -N disables curl's output buffering so chunks print as they arrive
curl -N -X POST http://localhost:8000/chat \
  --header "Content-Type: application/json" \
  --data '{
    "messages": [
      {"role": "user", "content": "Write a short story"}
    ],
    "stream": true,
    "max_tokens": 500
  }'
The streaming response will be passed through Kong in real-time, maintaining the low latency benefits of Cerebras inference.

How can I monitor my Cerebras API usage?

Kong provides several ways to monitor your Cerebras API usage:
  1. Kong Vitals: Built-in analytics for request metrics
  2. Prometheus Metrics: Export metrics for monitoring systems
  3. Custom Logging: Configure detailed request/response logging
  4. Datadog Integration: Send metrics and logs to Datadog
Example Prometheus metrics setup:
curl -X POST http://localhost:8001/plugins \
  --data "name=prometheus" \
  --data "config.per_consumer=true"
Access metrics at: http://localhost:8001/metrics
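
The endpoint serves plain-text Prometheus exposition format; a quick grep surfaces the Kong-specific series:
# List Kong metrics (request counts, latency histograms, bandwidth, ...)
curl -s http://localhost:8001/metrics | grep "^kong_"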
