What is LiteLLM?
LiteLLM is a lightweight Python package that provides a unified interface for calling 100+ LLM providers (OpenAI, Azure, Anthropic, Cohere, Cerebras, Replicate, PaLM, and more) using the OpenAI format. With LiteLLM, you can easily switch between different LLM providers, including Cerebras, without changing your code structure.
Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here.
- Python 3.7 or higher - LiteLLM requires a modern Python environment.
- pip package manager - For installing the LiteLLM library.
Configure LiteLLM
1. Install LiteLLM
Install the LiteLLM package using pip. This will install LiteLLM and all its dependencies, including the OpenAI SDK, which LiteLLM uses under the hood.
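The standard install from PyPI (add version pins as needed for your environment):

```bash
pip install litellm
```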
2. Set up your environment variables
Create a `.env` file in your project directory to securely store your API key:
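A sketch of the file contents, using the `CEREBRAS_API_KEY` variable name assumed throughout this guide (the value is a placeholder):

```bash
CEREBRAS_API_KEY=your-api-key-here
```

Alternatively, you can export the environment variable in your terminal:

```bash
export CEREBRAS_API_KEY="your-api-key-here"
```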
3. Make your first request with LiteLLM
LiteLLM provides a simple `completion()` function that works across all providers. The `cerebras/` prefix tells LiteLLM to route the request to Cerebras, and the integration header ensures proper tracking and support. Here’s how to call Cerebras models:
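A minimal sketch, assuming `CEREBRAS_API_KEY` is set in your environment; the model name and prompt are illustrative:

```python
from litellm import completion

# Assumes CEREBRAS_API_KEY is set; LiteLLM reads it for cerebras/ models.
response = completion(
    model="cerebras/llama-3.3-70b",  # the cerebras/ prefix routes to Cerebras
    messages=[
        {"role": "user", "content": "Why is fast inference important?"}
    ],
)

print(response.choices[0].message.content)
```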
4. Use streaming responses
LiteLLM supports streaming responses, which is useful for real-time applications where you want to display tokens as they’re generated. Streaming is particularly powerful with Cerebras’s fast inference speeds, allowing you to deliver near-instantaneous responses to your users. For example:
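A sketch of token-by-token streaming; the model and prompt are placeholders:

```python
from litellm import completion

response = completion(
    model="cerebras/llama-3.3-70b",
    messages=[{"role": "user", "content": "Write a haiku about speed."}],
    stream=True,  # yields chunks as tokens are generated
)

# Each chunk follows the OpenAI streaming format.
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```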
5. Try different Cerebras models
Cerebras offers several high-performance models optimized for different use cases; choose the one that best fits your latency, cost, and capability requirements. Here’s how to use them with LiteLLM:
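A sketch that calls two models in turn; the second model ID is an assumption, so confirm names against the available models list:

```python
from litellm import completion

# Model IDs are illustrative; check Cerebras's model list for current names.
models = ["cerebras/llama-3.3-70b", "cerebras/llama3.1-8b"]

for model in models:
    response = completion(
        model=model,
        messages=[{"role": "user", "content": "Summarize LiteLLM in one sentence."}],
    )
    print(f"{model}: {response.choices[0].message.content}")
```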
Advanced Features
Using LiteLLM’s Router for Load Balancing
LiteLLM’s Router allows you to load balance across multiple models or providers, including Cerebras. This is useful for distributing traffic and implementing fallback strategies:
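A minimal sketch of a Router with two Cerebras deployments behind one alias; the alias and the second model ID are assumptions:

```python
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "cerebras-llama",  # alias the application calls (assumed)
            "litellm_params": {"model": "cerebras/llama-3.3-70b"},
        },
        {
            "model_name": "cerebras-llama",  # same alias -> traffic is balanced
            "litellm_params": {"model": "cerebras/llama3.1-8b"},
        },
    ],
)

response = router.completion(
    model="cerebras-llama",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```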
Fallback and Retry Logic
LiteLLM supports automatic fallbacks between providers, which is useful for building resilient applications:
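A sketch using the `num_retries` and `fallbacks` arguments to `completion()`; the fallback model is illustrative:

```python
from litellm import completion

response = completion(
    model="cerebras/llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello!"}],
    num_retries=2,                       # retry transient failures first
    fallbacks=["cerebras/llama3.1-8b"],  # then try fallback models in order
)
```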
Cost Tracking and Budgets
LiteLLM includes built-in cost tracking to help you monitor your API usage:
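A sketch using LiteLLM’s `completion_cost()` helper; pricing-map coverage can vary by model:

```python
from litellm import completion, completion_cost

response = completion(
    model="cerebras/llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Computes the dollar cost from LiteLLM's model pricing map.
cost = completion_cost(completion_response=response)
print(f"Request cost: ${cost:.6f}")
```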
Why Use LiteLLM with Cerebras?
- Unified Interface: LiteLLM provides a consistent API across 100+ providers, making it easy to experiment with different models or migrate between providers without rewriting code.
- Production-Ready Features: Built-in support for retries, fallbacks, load balancing, and cost tracking.
- Observability: Integrate with logging and monitoring tools to track your LLM usage and performance.
- Speed Meets Flexibility: Combine Cerebras’s industry-leading inference speed with LiteLLM’s flexible routing and management capabilities.
FAQ
Can I use LiteLLM Proxy with Cerebras?
Yes! LiteLLM Proxy allows you to create a centralized gateway for all your LLM requests. You can configure Cerebras as one of your providers in the proxy configuration file, as sketched below. Learn more in the LiteLLM Proxy documentation.
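A minimal `config.yaml` sketch; the alias is an assumption, and the `os.environ/` syntax tells the proxy to read the key from the environment:

```yaml
model_list:
  - model_name: cerebras-llama   # alias exposed by the proxy (assumed)
    litellm_params:
      model: cerebras/llama-3.3-70b
      api_key: os.environ/CEREBRAS_API_KEY
```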
How do I handle rate limits with LiteLLM?
LiteLLM provides built-in retry logic with exponential backoff, and you can also use the Router to distribute load across multiple API keys or models. You can configure the retry behavior:
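A sketch of per-request retry configuration:

```python
from litellm import completion

response = completion(
    model="cerebras/llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello!"}],
    num_retries=3,  # retries with exponential backoff on transient errors
)
```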
Can I use LiteLLM with async/await in Python?
Yes! LiteLLM supports async operations using `acompletion()`, which is particularly useful for building high-performance applications that need to handle multiple concurrent requests:
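A sketch that issues several concurrent requests with `asyncio.gather()`; the model and prompts are illustrative:

```python
import asyncio
from litellm import acompletion

async def main():
    # Fire three requests concurrently rather than awaiting each in turn.
    tasks = [
        acompletion(
            model="cerebras/llama-3.3-70b",
            messages=[{"role": "user", "content": f"Share fact #{i} about inference."}],
        )
        for i in range(3)
    ]
    responses = await asyncio.gather(*tasks)
    for response in responses:
        print(response.choices[0].message.content)

asyncio.run(main())
```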
Next Steps
- Explore the LiteLLM documentation for advanced features like caching, budgets, and custom callbacks
- Learn about Cerebras’s available models and their capabilities
- Set up LiteLLM Proxy for centralized LLM management
Troubleshooting
Authentication Errors
If you encounter authentication errors, verify that:
- Your `CEREBRAS_API_KEY` environment variable is set correctly
- The API key is valid and hasn’t expired
- You’re using the correct `api_base` URL: https://api.cerebras.ai/v1
Model Not Found Errors
Ensure you’re using the correct model name format:
- Use `cerebras/llama-3.3-70b` (with the `cerebras/` prefix)
- Check the available models to confirm the model name
- Note that model names are case-sensitive
Rate Limiting
If you hit rate limits:
- Implement exponential backoff using LiteLLM’s built-in retry logic with `num_retries`
- Consider using the Router for load balancing across multiple API keys

