What is Portkey?
Portkey is an AI gateway that provides a unified interface to manage, monitor, and optimize your LLM applications. It offers features like request routing, caching, load balancing, fallbacks, and comprehensive observability, all while maintaining compatibility with the OpenAI SDK. With Portkey, you can:

- Monitor and trace all your Cerebras API calls in real time
- Cache responses to reduce costs and improve latency
- Implement fallbacks to other providers for reliability
- Load balance across multiple API keys or models
- Set budgets and rate limits to control spending
- Analyze usage patterns with detailed analytics
Prerequisites
Before you begin, ensure you have:

- Cerebras API Key - Get a free API key here.
- Portkey Account - Visit Portkey and create a free account.
- Portkey API Key - After signing up, generate your Portkey API key from the dashboard under Settings > API Keys.
- Python 3.11 or higher
Configure Portkey
Install the Portkey SDK
Install the Portkey SDK for your preferred language. The SDK provides a drop-in replacement for the OpenAI client with additional gateway features.
Configure environment variables
Create a `.env` file in your project directory to store your Portkey API key. With Portkey’s new integration, you no longer need to create virtual keys. Simply use the `@cerebras/` prefix with your model name, and Portkey will route requests directly to Cerebras.

Initialize the Portkey client
Set up the Portkey client with your API key. The client is compatible with the OpenAI SDK interface, making it easy to integrate into existing code.
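The setup above can be sketched as follows. This assumes the `portkey-ai` package (`pip install portkey-ai`) and a `PORTKEY_API_KEY` environment variable; treat it as a sketch rather than the definitive client API.

```python
# Sketch: building the Portkey client from environment configuration.
# Assumes the `portkey-ai` package: pip install portkey-ai
import os

# Models are addressed with the @cerebras/ prefix; no virtual key is needed.
CEREBRAS_MODEL = "@cerebras/llama-3.3-70b"

def portkey_client_kwargs() -> dict:
    """Collect client settings; keep the key in .env or the environment."""
    return {"api_key": os.environ.get("PORTKEY_API_KEY", "")}

if __name__ == "__main__":
    from portkey_ai import Portkey  # OpenAI-compatible client interface
    client = Portkey(**portkey_client_kwargs())
```

Because the client mirrors the OpenAI SDK interface, existing OpenAI-based code usually needs only this construction step changed.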
Portkey automatically routes requests to Cerebras when you use the `@cerebras/` model prefix. No additional configuration needed!

Make your first request
Now you can make requests to Cerebras through Portkey’s gateway. Use the `@cerebras/` prefix with your model name. All requests are automatically logged, monitored, and can leverage Portkey’s advanced features. After running this code, you’ll see the request appear in your Portkey dashboard with detailed logs, latency metrics, and token usage.
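A first request might look like the sketch below; the model name and messages are illustrative, and the call assumes the `portkey-ai` package and a valid `PORTKEY_API_KEY`.

```python
# Sketch of a first Cerebras request through Portkey's gateway.
# Assumes: pip install portkey-ai, plus PORTKEY_API_KEY in the environment.
import os

MODEL = "@cerebras/llama-3.3-70b"
MESSAGES = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why does inference speed matter for agents?"},
]

if __name__ == "__main__":
    from portkey_ai import Portkey
    client = Portkey(api_key=os.environ["PORTKEY_API_KEY"])
    # The @cerebras/ prefix routes the request to Cerebras; the call itself
    # mirrors the OpenAI chat-completions interface.
    response = client.chat.completions.create(model=MODEL, messages=MESSAGES)
    print(response.choices[0].message.content)
```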
Use Portkey with the OpenAI SDK

If you prefer to use the standard OpenAI SDK, you can route requests through Portkey by configuring the base URL and headers. This approach gives you Portkey’s observability features while maintaining your existing OpenAI SDK code.
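A minimal sketch of this routing follows. The gateway URL and the `x-portkey-api-key` header name reflect Portkey’s documented gateway, but verify them against Portkey’s current docs before relying on them.

```python
# Sketch: pointing the standard OpenAI SDK at Portkey's gateway.
import os

PORTKEY_GATEWAY_URL = "https://api.portkey.ai/v1"

def portkey_headers(portkey_api_key: str) -> dict:
    """Authentication happens via Portkey's own header, not the OpenAI key."""
    return {"x-portkey-api-key": portkey_api_key}

if __name__ == "__main__":
    from openai import OpenAI  # pip install openai
    client = OpenAI(
        api_key="placeholder",  # unused; Portkey authenticates via its header
        base_url=PORTKEY_GATEWAY_URL,
        default_headers=portkey_headers(os.environ["PORTKEY_API_KEY"]),
    )
    resp = client.chat.completions.create(
        model="@cerebras/llama3.1-8b",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)
```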
Advanced Features

Response Caching
Portkey can cache responses to reduce costs and improve latency for repeated queries. When you enable caching, identical requests return cached responses instantly without calling the Cerebras API. Two cache modes are available:

- Simple cache: Caches based on the exact request parameters
- Semantic cache: Uses embeddings to match semantically similar requests
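The two modes can be expressed as Portkey config fragments. The field names below follow Portkey’s config schema as commonly documented (`cache.mode`, `cache.max_age`); treat them as assumptions to verify in your dashboard.

```python
# Sketch of the two cache modes as Portkey config fragments.

# Simple cache: a hit requires identical request parameters.
simple_cache_config = {"cache": {"mode": "simple"}}

# Semantic cache: embedding-based matching for similar requests;
# max_age bounds how long (in seconds) an entry may be served.
semantic_cache_config = {"cache": {"mode": "semantic", "max_age": 3600}}
```

Either fragment would typically be attached to the client (or a saved config) so every request through it is cache-aware.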
Fallback Configuration
Set up automatic fallbacks to other providers or models if Cerebras is unavailable. This ensures your application remains resilient even during outages or rate limiting. To configure fallbacks:

- Log in to your Portkey dashboard
- Navigate to Configs and create a new configuration
- Add multiple providers (e.g., Cerebras as primary, OpenAI as fallback)
- Set the strategy mode to “fallback”
- Save the configuration and use its ID in your code
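The steps above produce a config roughly like the sketch below (shown as a Python dict; the provider slugs and key fields are illustrative, and in practice you would save this in the dashboard and reference its ID).

```python
# Sketch of a fallback config: targets are tried in order, so OpenAI is
# only used if the Cerebras request fails. Field names are assumptions
# based on Portkey's config schema.
fallback_config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"provider": "cerebras", "api_key": "YOUR_CEREBRAS_API_KEY"},  # primary
        {"provider": "openai", "api_key": "YOUR_OPENAI_API_KEY"},      # fallback
    ],
}
```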
Load Balancing
Distribute requests across multiple API keys or providers to increase throughput and avoid rate limits. Portkey supports multiple load balancing strategies including round-robin, weighted, and priority-based distribution. To configure load balancing:

- Log in to your Portkey dashboard
- Navigate to Configs and create a new configuration
- Add multiple targets with the same or different providers
- Set the strategy mode to “loadbalance”
- Configure weights for each target (optional)
- Save the configuration and use its ID in your code
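A weighted load-balancing config built from these steps might look like the following sketch: two Cerebras keys with a 70/30 split. Field names again follow Portkey’s config schema as an assumption; verify against your dashboard.

```python
# Sketch of a weighted load-balancing config across two Cerebras API keys.
# Weights are relative shares of traffic; here they sum to 1.0.
loadbalance_config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"provider": "cerebras", "api_key": "KEY_A", "weight": 0.7},
        {"provider": "cerebras", "api_key": "KEY_B", "weight": 0.3},
    ],
}
```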
Request Tracing with Metadata
Add custom metadata to track requests across your application. This helps you analyze usage patterns, debug issues, and attribute costs to specific users or features.
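One way to attach metadata is via Portkey’s metadata header, sketched below. The header name follows Portkey’s gateway documentation, and the keys (`user_id`, `environment`, `feature`) are illustrative, not required names.

```python
# Sketch: serializing request metadata for Portkey's metadata header.
import json

def metadata_header(metadata: dict) -> dict:
    """Portkey reads request metadata from the x-portkey-metadata header as JSON."""
    return {"x-portkey-metadata": json.dumps(metadata)}

# Illustrative keys; any flat key/value pairs you want to filter on later.
headers = metadata_header(
    {"user_id": "user_123", "environment": "production", "feature": "chat"}
)
```

These headers can then be merged into whatever client you use to reach the gateway, and the values become filterable fields in the dashboard.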
Budget and Rate Limits

Set spending limits and rate limits to control costs and prevent unexpected charges. Configure these in your Portkey dashboard under Settings > Budgets.
Supported Cerebras Models
All current Cerebras models are available through Portkey:

| Model | Parameters | Best For |
|---|---|---|
| `@cerebras/llama-3.3-70b` | 70B | Complex reasoning, long-form content |
| `@cerebras/qwen-3-32b` | 32B | Multilingual tasks, balanced performance |
| `@cerebras/gpt-oss-120b` | 120B | Most capable open-source model |
| `@cerebras/llama3.1-8b` | 8B | Fast responses, simple tasks |
Monitoring and Analytics
Portkey’s dashboard provides comprehensive insights into your Cerebras API usage. Access your dashboard at app.portkey.ai.

Request Logs
View all requests with full details including:

- Complete prompts and responses
- Token usage and costs
- Latency and performance metrics
- Custom metadata
- Error messages and stack traces
Performance Metrics
Track key performance indicators:

- Latency: P50, P95, and P99 response times
- Throughput: Requests per second and tokens per second
- Success Rate: Percentage of successful requests
- Cache Hit Rate: Percentage of requests served from cache
Cost Analytics
Monitor spending across:

- Different models and providers
- Time periods (hourly, daily, monthly)
- Users, features, or environments (via metadata)
- API keys and virtual keys
Custom Dashboards
Create custom views filtered by:

- Metadata fields (user_id, environment, etc.)
- Model or provider
- Time range
- Success or error status
- Cache hits or misses
Troubleshooting
Model Not Found or Invalid
If you see an error about an invalid model:
- Ensure you’re using the `@cerebras/` prefix with your model name (e.g., `@cerebras/gpt-oss-120b`)
- Check that the model name is spelled correctly
- Verify the model is available in the Cerebras models documentation
- Confirm your Portkey API key is valid and active
Requests Not Appearing in Dashboard
If requests aren’t showing up in your Portkey dashboard:
- Confirm your `PORTKEY_API_KEY` is correct and active
- Check that you’re using the Portkey client or routing through the Portkey gateway
- Verify your network allows outbound connections to `api.portkey.ai`
- Check the browser console for any CORS or network errors
- Wait a few seconds—there may be a slight delay in log processing
Rate Limit Errors
If you’re hitting rate limits:
- Check your Cerebras account’s rate limits in the Cerebras dashboard
- Implement request queuing or exponential backoff retry logic
- Use Portkey’s load balancing feature to distribute requests across multiple API keys
- Consider upgrading your Cerebras plan for higher rate limits
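The exponential-backoff suggestion above can be sketched as a small generic helper (plain Python, not a Portkey API): delays double on each attempt, are capped, and get a little random jitter.

```python
# Generic exponential-backoff retry helper for rate-limited calls.
import random
import time

def backoff_delays(max_retries: int = 5, base: float = 0.5, cap: float = 30.0):
    """Delay before each retry: base * 2**attempt, capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]

def call_with_retry(fn, max_retries: int = 5, base: float = 0.5):
    """Call fn(), retrying on any exception with exponential backoff."""
    delays = backoff_delays(max_retries, base)
    for attempt, delay in enumerate(delays):
        try:
            return fn()
        except Exception:
            if attempt == len(delays) - 1:
                raise  # out of retries; surface the last error
            time.sleep(delay + random.uniform(0, delay * 0.1))  # add jitter
```

In production you would typically catch only rate-limit errors (HTTP 429) rather than every exception.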
Cache Not Working
If caching isn’t reducing your costs:
- Ensure you’re using identical request parameters for cache hits (model, messages, temperature, etc.)
- Check that `cache_force_refresh` is set to `False`
- Verify caching is enabled in your Portkey organization settings
- Review cache analytics in the dashboard to see hit rates
- Consider using semantic cache for similar but not identical queries
Fallback Not Triggering
If fallbacks aren’t working as expected:
- Verify your fallback configuration includes valid virtual keys
- Check that the primary provider is actually failing (not just slow)
- Review the fallback strategy mode (`fallback` vs `loadbalance`)
- Check logs in the dashboard to see which provider handled each request
High Latency
If you’re experiencing slower response times:
- Check the Portkey status page for any ongoing issues
- Review latency metrics in the dashboard to identify patterns
- Consider enabling caching for frequently asked questions
- Verify your network connection and geographic location relative to Portkey’s servers
- Try using a different Cerebras model (smaller models are faster)
For additional support, visit Portkey’s documentation, contact their support team, or join their Discord community.
Next Steps
Now that you have Portkey set up with Cerebras, explore these advanced features:

- Prompt Templates - Create reusable prompt templates with variables
- Guardrails - Filter inputs and outputs for safety and compliance
- Continuous Logging - Export logs to your data warehouse
- A/B Testing - Compare different models or prompts
- Cerebras Models - Explore all available Cerebras models and their capabilities

