What is Portkey?

Portkey is an AI gateway that provides a unified interface to manage, monitor, and optimize your LLM applications. It offers features like request routing, caching, load balancing, fallbacks, and comprehensive observability—all while maintaining compatibility with the OpenAI SDK. With Portkey, you can:
  • Monitor and trace all your Cerebras API calls in real-time
  • Cache responses to reduce costs and improve latency
  • Implement fallbacks to other providers for reliability
  • Load balance across multiple API keys or models
  • Set budgets and rate limits to control spending
  • Analyze usage patterns with detailed analytics
Learn more at Portkey.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key from cloud.cerebras.ai.
  • Portkey Account - Visit Portkey and create a free account.
  • Portkey API Key - After signing up, generate your Portkey API key from the dashboard under Settings > API Keys.
  • Python 3.11 or higher

Configure Portkey

Step 1: Install the Portkey SDK

Install the Portkey SDK for your preferred language. The SDK provides a drop-in replacement for the OpenAI client with additional gateway features.
pip install portkey-ai openai python-dotenv

Step 2: Configure environment variables

Create a .env file in your project directory to store your Portkey API key:
PORTKEY_API_KEY=your-portkey-api-key-here
With Portkey’s new integration, you no longer need to create virtual keys. Simply use the @cerebras/ prefix with your model name, and Portkey will route requests directly to Cerebras.

Step 3: Initialize the Portkey client

Set up the Portkey client with your API key. The client is compatible with the OpenAI SDK interface, making it easy to integrate into existing code.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

# Load PORTKEY_API_KEY from the .env file created in step 2
load_dotenv()

# Initialize Portkey client
portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)
Portkey automatically routes requests to Cerebras when you use the @cerebras/ model prefix. No additional configuration needed!

Step 4: Make your first request

Now you can make requests to Cerebras through Portkey’s gateway. Use the @cerebras/ prefix with your model name. All requests are automatically logged, monitored, and can leverage Portkey’s advanced features.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)

# Make a chat completion request using @cerebras/ prefix
response = portkey.chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=500
)

print(response.choices[0].message.content)
After running this code, you’ll see the request appear in your Portkey dashboard with detailed logs, latency metrics, and token usage.

Use Portkey with the OpenAI SDK

If you prefer to use the standard OpenAI SDK, you can route requests through Portkey by configuring the base URL and headers. This approach gives you Portkey’s observability features while maintaining your existing OpenAI SDK code.
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize OpenAI client with Portkey gateway
client = OpenAI(
    api_key="portkey",  # Dummy key when using Portkey
    base_url="https://api.portkey.ai/v1",
    default_headers={
        "x-portkey-api-key": os.getenv("PORTKEY_API_KEY"),
        "X-Cerebras-3rd-Party-Integration": "Portkey"
    }
)

# Make a request using @cerebras/ prefix
response = client.chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "What are the benefits of AI gateways?"}
    ]
)

print(response.choices[0].message.content)

Advanced Features

Response Caching

Portkey can cache responses to reduce costs and improve latency for repeated queries. When you enable caching, identical requests return cached responses instantly without calling the Cerebras API.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)

# Enable simple caching for this request
response = portkey.with_options(
    cache="simple",
    cache_force_refresh=False
).chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

print(response.choices[0].message.content)
Portkey supports two cache modes:
  • Simple cache: Caches based on the exact request parameters
  • Semantic cache: Uses embeddings to match semantically similar requests
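
For reference, a semantic-cache request might look like the following. This is a minimal sketch that mirrors the with_options pattern shown above; the "semantic" mode string is an assumption based on Portkey's two cache modes, so verify the exact option against Portkey's caching documentation.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)

# "semantic" is assumed by analogy with the "simple" example above --
# check Portkey's caching docs for the exact configuration.
response = portkey.with_options(
    cache="semantic"
).chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[{"role": "user", "content": "Which city is the capital of France?"}]
)

# A semantically similar earlier prompt (e.g. "What is the capital of France?")
# may satisfy this request from cache even though the text differs.
print(response.choices[0].message.content)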

Fallback Configuration

Set up automatic fallbacks to other providers or models if Cerebras is unavailable. This ensures your application remains resilient even during outages or rate limiting. To configure fallbacks:
  1. Log in to your Portkey dashboard
  2. Navigate to Configs and create a new configuration
  3. Add multiple providers (e.g., Cerebras as primary, OpenAI as fallback)
  4. Set the strategy mode to “fallback”
  5. Save the configuration and use its ID in your code (see the sketch below)
Learn more about fallback configurations in the Portkey documentation.
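
As a sketch of step 5, here is how a saved config ID can be attached to the client via the SDK's config parameter. The "pc-fallback-xxxx" ID below is a hypothetical placeholder for the ID your dashboard generates.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

# Replace the placeholder with the config ID from your Portkey dashboard
portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY"),
    config="pc-fallback-xxxx"
)

# The gateway applies the fallback strategy transparently: if the primary
# Cerebras target fails, the request is retried against the fallback target.
response = portkey.chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)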

Load Balancing

Distribute requests across multiple API keys or providers to increase throughput and avoid rate limits. Portkey supports multiple load balancing strategies including round-robin, weighted, and priority-based distribution. To configure load balancing:
  1. Log in to your Portkey dashboard
  2. Navigate to Configs and create a new configuration
  3. Add multiple targets with the same or different providers
  4. Set the strategy mode to “loadbalance”
  5. Configure weights for each target (optional)
  6. Save the configuration and use its ID in your code (or pass an inline config, as in the sketch below)
Learn more about load balancing strategies in the Portkey documentation.
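
To make this concrete, here is a hedged sketch of an inline load-balancing config passed directly to the client instead of a saved config ID. The field names (strategy, targets, weight) follow Portkey's documented config schema, but the provider and credential fields are assumptions; check the current schema in Portkey's docs before relying on this.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

# Inline config sketch: both Cerebras API keys are placeholders read from
# the environment, and the 70/30 weights are arbitrary examples.
lb_config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"provider": "cerebras", "api_key": os.getenv("CEREBRAS_API_KEY_1"), "weight": 0.7},
        {"provider": "cerebras", "api_key": os.getenv("CEREBRAS_API_KEY_2"), "weight": 0.3},
    ],
}

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY"),
    config=lb_config
)

# Roughly 70% of requests should go to the first key and 30% to the second.
response = portkey.chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)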

Request Tracing with Metadata

Add custom metadata to track requests across your application. This helps you analyze usage patterns, debug issues, and attribute costs to specific users or features.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)

response = portkey.with_options(
    metadata={
        "user_id": "user_123",
        "session_id": "session_456",
        "environment": "production",
        "feature": "chat_assistant"
    }
).chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}]
)
You can filter and analyze requests by metadata in the Portkey dashboard, making it easy to track usage by user, feature, or environment.

Budget and Rate Limits

Set spending limits and rate limits to control costs and prevent unexpected charges. Configure these in your Portkey dashboard under Settings > Budgets.
import os
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)

# Budget limits are configured in the Portkey dashboard
# Requests will be rejected if limits are exceeded
response = portkey.chat.completions.create(
    model="@cerebras/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}]
)

Supported Cerebras Models

All current Cerebras models are available through Portkey:
| Model | Parameters | Best For |
|---|---|---|
| @cerebras/llama-3.3-70b | 70B | Complex reasoning, long-form content |
| @cerebras/qwen-3-32b | 32B | Multilingual tasks, balanced performance |
| @cerebras/gpt-oss-120b | 120B | Most capable open-source model |
| @cerebras/llama3.1-8b | 8B | Fast responses, simple tasks |
Learn more about each model in the Cerebras models documentation.

Monitoring and Analytics

Portkey’s dashboard provides comprehensive insights into your Cerebras API usage. Access your dashboard at app.portkey.ai.

Request Logs

View all requests with full details including:
  • Complete prompts and responses
  • Token usage and costs
  • Latency and performance metrics
  • Custom metadata
  • Error messages and stack traces

Performance Metrics

Track key performance indicators:
  • Latency: P50, P95, and P99 response times
  • Throughput: Requests per second and tokens per second
  • Success Rate: Percentage of successful requests
  • Cache Hit Rate: Percentage of requests served from cache

Cost Analytics

Monitor spending across:
  • Different models and providers
  • Time periods (hourly, daily, monthly)
  • Users, features, or environments (via metadata)
  • API keys and virtual keys

Custom Dashboards

Create custom views filtered by:
  • Metadata fields (user_id, environment, etc.)
  • Model or provider
  • Time range
  • Success or error status
  • Cache hits or misses

Troubleshooting

If you see an error about an invalid model:
  • Ensure you’re using the @cerebras/ prefix with your model name (e.g., @cerebras/gpt-oss-120b)
  • Check that the model name is spelled correctly
  • Verify the model is available in the Cerebras models documentation
  • Confirm your Portkey API key is valid and active
If requests aren’t showing up in your Portkey dashboard:
  • Confirm your PORTKEY_API_KEY is correct and active
  • Check that you’re using the Portkey client or routing through the Portkey gateway
  • Verify your network allows outbound connections to api.portkey.ai
  • Check the browser console for any CORS or network errors
  • Wait a few seconds—there may be a slight delay in log processing
If you’re hitting rate limits:
  • Check your Cerebras account’s rate limits in the Cerebras dashboard
  • Implement request queuing or exponential backoff retry logic (a minimal backoff sketch follows this list)
  • Use Portkey’s load balancing feature to distribute requests across multiple API keys
  • Consider upgrading your Cerebras plan for higher rate limits
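
As a starting point for the backoff suggestion above, here is a minimal, generic retry sketch. It is not Portkey-specific; for brevity it retries on any exception, whereas production code should catch only rate-limit or availability errors (e.g. HTTP 429/503).
import os
import time
import random
from portkey_ai import Portkey
from dotenv import load_dotenv

load_dotenv()

portkey = Portkey(
    api_key=os.getenv("PORTKEY_API_KEY")
)

def chat_with_backoff(messages, max_retries=5):
    # Exponential backoff with jitter: waits ~1s, 2s, 4s, 8s between attempts
    for attempt in range(max_retries):
        try:
            return portkey.chat.completions.create(
                model="@cerebras/gpt-oss-120b",
                messages=messages,
            )
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())

response = chat_with_backoff([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)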
If caching isn’t reducing your costs:
  • Ensure you’re using identical request parameters for cache hits (model, messages, temperature, etc.)
  • Check that cache_force_refresh is set to False
  • Verify caching is enabled in your Portkey organization settings
  • Review cache analytics in the dashboard to see hit rates
  • Consider using semantic cache for similar but not identical queries
If fallbacks aren’t working as expected:
  • Verify your fallback configuration references valid providers and credentials
  • Check that the primary provider is actually failing (not just slow)
  • Review the fallback strategy mode (fallback vs loadbalance)
  • Check logs in the dashboard to see which provider handled each request
If you’re experiencing slower response times:
  • Check the Portkey status page for any ongoing issues
  • Review latency metrics in the dashboard to identify patterns
  • Consider enabling caching for frequently asked questions
  • Verify your network connection and geographic location relative to Portkey’s servers
  • Try using a different Cerebras model (smaller models are faster)
For additional support, visit Portkey’s documentation, contact their support team, or join their Discord community.

Next Steps

Now that you have Portkey set up with Cerebras, explore the advanced features covered above. For more examples and use cases, check out Portkey’s cookbook.