What is Portkey?

Portkey is an AI gateway that provides a unified interface to manage, monitor, and optimize your LLM applications. It offers features like request routing, caching, load balancing, fallbacks, and comprehensive observability—all while maintaining compatibility with the OpenAI SDK. With Portkey, you can:
  • Monitor and trace all your Cerebras API calls in real-time
  • Cache responses to reduce costs and improve latency
  • Implement fallbacks to other providers for reliability
  • Load balance across multiple API keys or models
  • Set budgets and rate limits to control spending
  • Analyze usage patterns with detailed analytics
Learn more at Portkey.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key from cloud.cerebras.ai.
  • Portkey Account - Visit Portkey and create a free account.
  • Portkey API Key - After signing up, generate your Portkey API key from the dashboard under Settings > API Keys.
  • Python 3.11 or higher

Configure Portkey

Step 1: Install the Portkey SDK

Install the Portkey SDK for your preferred language. The SDK provides a drop-in replacement for the OpenAI client with additional gateway features.
pip install portkey-ai openai python-dotenv

Step 2: Configure environment variables

Create a .env file in your project directory to store your Portkey API key:
PORTKEY_API_KEY=your-portkey-api-key-here
With Portkey’s new integration, you no longer need to create virtual keys. Simply use the @cerebras/ prefix with your model name, and Portkey will route requests directly to Cerebras.

Step 3: Initialize the Portkey client

Set up the Portkey client with your API key. The client is compatible with the OpenAI SDK interface, making it easy to integrate into existing code.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

# Load PORTKEY_API_KEY from the .env file created in step 2
load_dotenv()

# Initialize Portkey client
portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)
Portkey automatically routes requests to Cerebras when you use the @cerebras/ model prefix. No additional configuration needed!

Step 4: Make your first request

Now you can make requests to Cerebras through Portkey’s gateway. Use the @cerebras/ prefix with your model name. All requests are automatically logged, monitored, and can leverage Portkey’s advanced features.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)

# Make a chat completion request using @cerebras/ prefix
response = portkey.chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=500
)

print(response.choices[0].message.content)
After running this code, you’ll see the request appear in your Portkey dashboard with detailed logs, latency metrics, and token usage.

Use Portkey with the OpenAI SDK

If you prefer to use the standard OpenAI SDK, you can route requests through Portkey by configuring the base URL and headers. This approach gives you Portkey’s observability features while maintaining your existing OpenAI SDK code.
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize OpenAI client with Portkey gateway
client = OpenAI(
    api_key="portkey",  # Dummy key when using Portkey
    base_url="https://api.portkey.ai/v1",
    default_headers={
        "x-portkey-api-key": os.getenv("PORTKEY_API_KEY"),
        "X-Cerebras-3rd-Party-Integration": "Portkey"
    }
)

# Make a request using @cerebras/ prefix
response = client.chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "What are the benefits of AI gateways?"}
    ]
)

print(response.choices[0].message.content)

Advanced Features

Response Caching

Portkey can cache responses to reduce costs and improve latency for repeated queries. When you enable caching, identical requests return cached responses instantly without calling the Cerebras API.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)

# Enable simple caching for this request
response = portkey.with_options(
    cache="simple",
    cache_force_refresh=False
).chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

print(response.choices[0].message.content)
Portkey supports two cache modes:
  • Simple cache: Caches based on the exact request parameters
  • Semantic cache: Uses embeddings to match semantically similar requests
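
For reference, a semantic-cache request might look like the following. This is a minimal sketch that mirrors the with_options pattern shown above; the "semantic" mode string is an assumption based on Portkey's two cache modes, so verify the exact option against Portkey's caching documentation.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)

# "semantic" is assumed by analogy with the "simple" example above --
# check Portkey's caching docs for the exact configuration.
response = portkey.with_options(
    cache="semantic"
).chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[{"role": "user", "content": "Which city is the capital of France?"}]
)

# A semantically similar earlier prompt (e.g. "What is the capital of France?")
# may satisfy this request from cache even though the text differs.
print(response.choices[0].message.content)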

Fallback Configuration

Set up automatic fallbacks to other providers or models if Cerebras is unavailable. This ensures your application remains resilient even during outages or rate limiting. To configure fallbacks:
  1. Log in to your Portkey dashboard
  2. Navigate to Configs and create a new configuration
  3. Add multiple providers (e.g., Cerebras as primary, OpenAI as fallback)
  4. Set the strategy mode to “fallback”
  5. Save the configuration and use its ID in your code (see the sketch below)
Learn more about fallback configurations in the Portkey documentation.
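
As a sketch of step 5, here is how a saved config ID can be attached to the client via the SDK's config parameter. The "pc-fallback-xxxx" ID below is a hypothetical placeholder for the ID your dashboard generates.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

# Replace the placeholder with the config ID from your Portkey dashboard
portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY"),
    config="pc-fallback-xxxx"
)

# The gateway applies the fallback strategy transparently: if the primary
# Cerebras target fails, the request is retried against the fallback target.
response = portkey.chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)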

Load Balancing

Distribute requests across multiple API keys or providers to increase throughput and avoid rate limits. Portkey supports multiple load balancing strategies including round-robin, weighted, and priority-based distribution. To configure load balancing:
  1. Log in to your Portkey dashboard
  2. Navigate to Configs and create a new configuration
  3. Add multiple targets with the same or different providers
  4. Set the strategy mode to “loadbalance”
  5. Configure weights for each target (optional)
  6. Save the configuration and use its ID in your code (or pass an inline config, as in the sketch below)
Learn more about load balancing strategies in the Portkey documentation.
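
To make this concrete, here is a hedged sketch of an inline load-balancing config passed directly to the client instead of a saved config ID. The field names (strategy, targets, weight) follow Portkey's documented config schema, but the provider and credential fields are assumptions; check the current schema in Portkey's docs before relying on this.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

# Inline config sketch: both Cerebras API keys are placeholders read from
# the environment, and the 70/30 weights are arbitrary examples.
lb_config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"provider": "cerebras", "api_key": os.getenv("CEREBRAS_API_KEY_1"), "weight": 0.7},
        {"provider": "cerebras", "api_key": os.getenv("CEREBRAS_API_KEY_2"), "weight": 0.3},
    ],
}

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY"),
    config=lb_config
)

# Roughly 70% of requests should go to the first key and 30% to the second.
response = portkey.chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)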

Request Tracing with Metadata

Add custom metadata to track requests across your application. This helps you analyze usage patterns, debug issues, and attribute costs to specific users or features.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)

response = portkey.with_options(
    metadata={
        "user_id": "user_123",
        "session_id": "session_456",
        "environment": "production",
        "feature": "chat_assistant"
    }
).chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}]
)
You can filter and analyze requests by metadata in the Portkey dashboard, making it easy to track usage by user, feature, or environment.

Budget and Rate Limits

Set spending limits and rate limits to control costs and prevent unexpected charges. Configure these in your Portkey dashboard under Settings > Budgets.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)

# Budget limits are configured in the Portkey dashboard
# Requests will be rejected if limits are exceeded
response = portkey.chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}]
)

Supported Cerebras Models

All current Cerebras models are available through Portkey:
| Model | Parameters | Best For |
|---|---|---|
| @cerebras/llama-3.3-70b | 70B | Complex reasoning, long-form content |
| @cerebras/qwen-3-32b | 32B | Multilingual tasks, balanced performance |
| @cerebras/gpt-oss-120b | 120B | Most capable open-source model |
| @cerebras/llama3.1-8b | 8B | Fast responses, simple tasks |
Learn more about each model in the Cerebras models documentation.

Monitoring and Analytics

Portkey’s dashboard provides comprehensive insights into your Cerebras API usage. Access your dashboard at app.portkey.ai.

Request Logs

View all requests with full details including:
  • Complete prompts and responses
  • Token usage and costs
  • Latency and performance metrics
  • Custom metadata
  • Error messages and stack traces

Performance Metrics

Track key performance indicators:
  • Latency: P50, P95, and P99 response times
  • Throughput: Requests per second and tokens per second
  • Success Rate: Percentage of successful requests
  • Cache Hit Rate: Percentage of requests served from cache

Cost Analytics

Monitor spending across:
  • Different models and providers
  • Time periods (hourly, daily, monthly)
  • Users, features, or environments (via metadata)
  • API keys and virtual keys

Custom Dashboards

Create custom views filtered by:
  • Metadata fields (user_id, environment, etc.)
  • Model or provider
  • Time range
  • Success or error status
  • Cache hits or misses

Troubleshooting

If you see an error about an invalid model:
  • Ensure you’re using the @cerebras/ prefix with your model name (e.g., @cerebras/gpt-oss-120b)
  • Check that the model name is spelled correctly
  • Verify the model is available in the Cerebras models documentation
  • Confirm your Portkey API key is valid and active
If requests aren’t showing up in your Portkey dashboard:
  • Confirm your PORTKEY_API_KEY is correct and active
  • Check that you’re using the Portkey client or routing through the Portkey gateway
  • Verify your network allows outbound connections to api.portkey.ai
  • Check the browser console for any CORS or network errors
  • Wait a few seconds—there may be a slight delay in log processing
If you’re hitting rate limits:
  • Check your Cerebras account’s rate limits in the Cerebras dashboard
  • Implement request queuing or exponential backoff retry logic (a minimal backoff sketch follows this list)
  • Use Portkey’s load balancing feature to distribute requests across multiple API keys
  • Consider upgrading your Cerebras plan for higher rate limits
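
As a starting point for the backoff suggestion above, here is a minimal, generic retry sketch. It is not Portkey-specific; for brevity it retries on any exception, whereas production code should catch only rate-limit or availability errors (e.g. HTTP 429/503).
import os
import time
import random
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)

def chat_with_backoff(messages, max_retries=5):
    # Exponential backoff with jitter: waits ~1s, 2s, 4s, 8s between attempts
    for attempt in range(max_retries):
        try:
            return portkey.chat.completions.create(
                model="@cerebras/gpt-oss-120b",
                messages=messages,
            )
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())

response = chat_with_backoff([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)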
If caching isn’t reducing your costs:
  • Ensure you’re using identical request parameters for cache hits (model, messages, temperature, etc.)
  • Check that cache_force_refresh is set to False
  • Verify caching is enabled in your Portkey organization settings
  • Review cache analytics in the dashboard to see hit rates
  • Consider using semantic cache for similar but not identical queries
If fallbacks aren’t working as expected:
  • Verify your fallback configuration references valid providers and credentials
  • Check that the primary provider is actually failing (not just slow)
  • Review the fallback strategy mode (fallback vs loadbalance)
  • Check logs in the dashboard to see which provider handled each request
If you’re experiencing slower response times:
  • Check the Portkey status page for any ongoing issues
  • Review latency metrics in the dashboard to identify patterns
  • Consider enabling caching for frequently asked questions
  • Verify your network connection and geographic location relative to Portkey’s servers
  • Try using a different Cerebras model (smaller models are faster)
For additional support, visit Portkey’s documentation, contact their support team, or join their Discord community.

Next Steps

Now that you have Portkey set up with Cerebras, explore the advanced features covered above. For more examples and use cases, check out Portkey’s cookbook.