Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here
- Python 3.9 or higher - required by LangChain
- Basic familiarity with LangChain - Visit LangChain documentation to learn more
Configure LangChain with Cerebras
1. Install required dependencies
Install the LangChain Cerebras integration package. This package provides native LangChain integration for Cerebras models, including chat models and embeddings.
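A typical install looks like the following (adding langchain and python-dotenv as well, since later steps in this guide assume them):

```bash
pip install langchain-cerebras

# Optional helpers used later in this guide
pip install langchain python-dotenv
```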
Dependency resolution: If you encounter dependency conflicts during installation, try running the install command twice. The first run may update core dependencies, and the second run will resolve any remaining conflicts. This is a known behavior with some package managers when updating to newer versions of langchain-core.
2. Configure environment variables
Create a .env file in your project directory to securely store your API key. This keeps your credentials separate from your code. Alternatively, you can set the environment variable directly in your shell; both options are shown below.
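A sketch of both options (replace your-api-key-here with your actual key):

```
# .env
CEREBRAS_API_KEY=your-api-key-here
```

```bash
export CEREBRAS_API_KEY="your-api-key-here"
```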
3. Initialize the Cerebras chat model
Import and initialize the Cerebras chat model. The ChatCerebras class provides a LangChain-compatible interface that automatically handles the connection to Cerebras Cloud and includes proper tracking headers.
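A minimal sketch, assuming your key is in the .env file from the previous step (python-dotenv loads it into the environment; the model name is one of those listed later in this guide):

```python
from dotenv import load_dotenv
from langchain_cerebras import ChatCerebras

load_dotenv()  # reads CEREBRAS_API_KEY from .env into the environment

llm = ChatCerebras(
    model="llama-3.3-70b",  # any model from the list under Troubleshooting
    temperature=0,
)
```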
4. Make your first request
Now you can use the model just like any other LangChain chat model. This example demonstrates basic message handling with system and user messages.
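For example, reusing the llm instance from the previous step (the prompt text is just illustrative):

```python
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a concise technical assistant."),
    HumanMessage(content="In one sentence, what is LangChain?"),
]

response = llm.invoke(messages)
print(response.content)
```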
5. Use with LangChain chains
LangChain’s real power comes from chaining operations together. This example uses LCEL (LangChain Expression Language) to create a composable translation chain.
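A sketch of a translation chain built with the pipe operator, where prompt, model, and output parser compose into a single runnable:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "Translate the user's text into {language}."),
    ("human", "{text}"),
])

# LCEL: each component's output feeds the next
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"language": "French", "text": "Fast inference is fun."}))
```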
6. Enable streaming responses
Cerebras models support streaming, which is perfect for real-time applications. Streaming allows you to display responses as they’re generated, providing a better user experience.
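A minimal sketch: stream() yields message chunks as they arrive, so you can print tokens incrementally:

```python
for chunk in llm.stream("Write a two-line poem about speed."):
    print(chunk.content, end="", flush=True)
print()
```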
Advanced Usage
Using Different Models
Cerebras supports multiple high-performance models. Choose the right model based on your use case, as in the sketch below.
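Swapping models is just a constructor argument (model IDs taken from the list in the Troubleshooting section):

```python
from langchain_cerebras import ChatCerebras

# High throughput for simple tasks
fast_llm = ChatCerebras(model="llama3.1-8b")

# Balanced general-purpose model
balanced_llm = ChatCerebras(model="qwen-3-32b")

# Deep reasoning and long-form content
reasoning_llm = ChatCerebras(model="llama-3.3-70b")
```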
Building a RAG Application
Here’s a complete example of building a Retrieval-Augmented Generation (RAG) application with Cerebras and LangChain:
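A self-contained sketch, assuming an in-memory vector store and HuggingFace sentence-transformer embeddings (the embeddings choice is an assumption; swap in whichever provider you use, while Cerebras serves the chat model):

```python
from langchain_cerebras import ChatCerebras
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings  # assumption: any embeddings class works

# Index a few documents in an in-memory vector store
texts = [
    "Cerebras Cloud serves open-weight models at very high token throughput.",
    "LangChain composes prompts, models, and parsers into chains with LCEL.",
]
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = InMemoryVectorStore.from_texts(texts, embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

# Build the RAG chain: retrieve context, fill the prompt, generate an answer
prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)
llm = ChatCerebras(model="llama-3.3-70b")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What does Cerebras Cloud offer?"))
```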
Async Operations
For high-throughput applications, use async operations to handle multiple requests concurrently:
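A sketch using ainvoke with asyncio.gather; the semaphore caps concurrency, which also helps you stay within rate limits:

```python
import asyncio

from langchain_cerebras import ChatCerebras

llm = ChatCerebras(model="llama3.1-8b")
semaphore = asyncio.Semaphore(5)  # cap concurrent requests

async def ask(question: str) -> str:
    async with semaphore:
        response = await llm.ainvoke(question)
        return response.content

async def main():
    questions = [f"Name one fact about the number {n}." for n in range(10)]
    answers = await asyncio.gather(*(ask(q) for q in questions))
    for answer in answers:
        print(answer)

asyncio.run(main())
```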
Using with LangChain Agents
Cerebras models work seamlessly with LangChain agents for building autonomous AI systems:
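A sketch of a tool-calling agent (this assumes the chosen model supports tool calling; the multiply tool is a stand-in for your own tools):

```python
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_cerebras import ChatCerebras
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

llm = ChatCerebras(model="llama-3.3-70b")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use tools when they help."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # agent's intermediate steps go here
])

agent = create_tool_calling_agent(llm, [multiply], prompt)
executor = AgentExecutor(agent=agent, tools=[multiply], verbose=True)

result = executor.invoke({"input": "What is 17 times 23?"})
print(result["output"])
```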
Using OpenAI Client Directly
If you prefer to use the OpenAI client directly instead of the LangChain integration, you can configure it to work with Cerebras:
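A sketch pointing the OpenAI SDK at the Cerebras endpoint (see the Cerebras API Reference for the base URL):

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",
)

response = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[{"role": "user", "content": "Hello from the OpenAI client!"}],
)
print(response.choices[0].message.content)
```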
Troubleshooting
Why am I getting authentication errors?
Make sure your CEREBRAS_API_KEY environment variable is set correctly. You can verify it’s loaded with the check below. If it returns None, your environment variable isn’t set; try setting it directly in your code for testing.
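For example:

```python
import os

# None means the variable isn't set in this process
print(os.environ.get("CEREBRAS_API_KEY"))

# For local testing only; avoid hard-coding keys in committed code
os.environ["CEREBRAS_API_KEY"] = "your-api-key-here"
```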
How do I handle rate limits?
Cerebras Cloud has generous rate limits, but if you’re making many concurrent requests, consider the options below (a retry sketch follows the list):
- Using async operations with controlled concurrency
- Implementing retry logic with exponential backoff
- Batching requests when possible
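For retries, LangChain runnables ship a built-in helper; a sketch using with_retry for exponential backoff (the attempt count is illustrative):

```python
# with_retry wraps any runnable with exponential backoff and jitter
robust_llm = llm.with_retry(
    wait_exponential_jitter=True,
    stop_after_attempt=5,
)

response = robust_llm.invoke("Hello!")
print(response.content)
```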
What's the difference between ChatCerebras and using the OpenAI client directly?
ChatCerebras is a native LangChain integration that:
- Provides a consistent interface with other LangChain chat models
- Automatically handles message formatting and parsing
- Supports all LangChain features like callbacks, streaming, and async
- Includes proper integration tracking headers
- Works seamlessly with LangChain chains and agents
For most use cases, we recommend ChatCerebras. If you need direct API access, use the OpenAI client with the Cerebras base URL.
Can I use Cerebras with LangSmith for tracing?
Yes! LangSmith provides powerful debugging and monitoring capabilities for LangChain applications. To enable LangSmith tracing, set the environment variables shown below, then visit LangSmith to view your traces and debug your applications.
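A sketch, assuming a recent LangSmith setup (older setups use LANGCHAIN_TRACING_V2 and LANGCHAIN_API_KEY instead):

```bash
export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="your-langsmith-api-key"
```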
Which Cerebras model should I use?
Choose based on your use case:
- llama-3.3-70b: Best for complex reasoning, long-form content, and tasks requiring deep understanding
- qwen-3-32b: Balanced performance for general-purpose applications
- llama3.1-8b: Fastest option for simple tasks and high-throughput scenarios
- gpt-oss-120b: Largest model for the most demanding tasks
Next Steps
- Explore LangChain Documentation - Visit the official LangChain docs to learn about chains, agents, and more
- Try Different Cerebras Models - Experiment with our available models to find the best fit for your use case
- Build Complex Chains - Combine multiple LangChain components to create sophisticated AI workflows
- Explore LangSmith - Use LangSmith for debugging and monitoring your LangChain applications
- Join the Community - Connect with other developers in the LangChain Discord
- Read the API Reference - Check out our Chat Completions API documentation for detailed API information
Additional Resources
- Cerebras API Reference - Detailed API documentation
- LangChain Cerebras Provider - Official LangChain integration docs
- Cerebras Models - Learn about available models and their capabilities
- LangChain Cookbook - Example notebooks and recipes
- LangChain Expression Language (LCEL) - Learn about building chains with LCEL

