What is AG2?
AG2 (formerly AutoGen) is an open-source framework for building and orchestrating multi-agent AI workflows. It enables developers to create sophisticated AI systems where multiple agents collaborate, reason, and solve complex problems together. Learn more in the AG2 documentation. By integrating Cerebras Inference with AG2, you combine AG2's powerful agent orchestration with Cerebras's ultra-fast inference speeds, making it ideal for real-time multi-agent applications.

Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here.
- Python 3.10 or higher - AG2 requires Python 3.10+. Check your version with python --version.
Configure AG2
1. Install AG2 with Cerebras support

AG2 provides a dedicated extra for Cerebras integration. Install it using pip. This installs AG2 along with the dependencies needed to communicate with Cerebras's API. If you're upgrading from an existing installation, use pip install -U ag2[cerebras].
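Based on the upgrade command mentioned above, a fresh install is presumably:

```shell
# Quote the extra so shells with globbing (e.g. zsh) don't mangle the brackets
pip install "ag2[cerebras]"
```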
2. Set up your API key

Add your Cerebras API key to your environment variables; the exact command differs slightly on Windows. Alternatively, create a .env file in your project directory.
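The shell commands this step refers to were not captured in this extract; a typical setup (the variable name CEREBRAS_API_KEY matches the troubleshooting section below) looks like:

```shell
# macOS/Linux: add this line to ~/.bashrc or ~/.zshrc to persist it
export CEREBRAS_API_KEY="your-api-key-here"

# Windows (Command Prompt) equivalent, persists for new sessions:
#   setx CEREBRAS_API_KEY "your-api-key-here"

# Alternatively, keep the key in a .env file in the project directory:
echo 'CEREBRAS_API_KEY=your-api-key-here' > .env
```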
3. Create your configuration file

AG2 uses a configuration list to define which models and APIs to use. Create a file named OAI_CONFIG_LIST.json in your project directory.
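The file's contents were not captured in this extract; a minimal sketch, assuming the llama-3.3-70b model from the table below, might look like:

```json
[
  {
    "model": "llama-3.3-70b",
    "api_key": "your-cerebras-api-key",
    "api_type": "cerebras"
  }
]
```

Replace the api_key value with your actual key, or see "Use Environment Variables" below to avoid hard-coding it.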
The api_type: "cerebras" parameter in OAI_CONFIG_LIST.json tells AG2 to use the Cerebras client, which handles API communication and tracks token usage for cost monitoring.

Build Your First Multi-Agent System
Let's create a simple two-agent system where a user proxy agent and an assistant agent collaborate to solve a coding problem.

1. Import required modules

Start by importing AG2's agent classes and configuration utilities.
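A sketch of the imports, assuming the classic autogen import path that the ag2 package exposes:

```python
# Agent classes plus a helper that loads OAI_CONFIG_LIST.json.
# These names are assumptions based on the classic AG2/AutoGen API.
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json
```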
2. Configure your LLM

Set up the configuration to use Cerebras models with your API key. You can add multiple models to the config_list for automatic fallback if one is unavailable.
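One way to build the configuration in code (a sketch: the model name comes from the table below, and CEREBRAS_API_KEY matches the environment variable set earlier):

```python
import os

# Hypothetical configuration; swap in any model from the table below.
config_list = [
    {
        "model": "llama-3.3-70b",
        "api_key": os.environ.get("CEREBRAS_API_KEY"),
        "api_type": "cerebras",
    }
]

# AG2 agents accept this dict as their llm_config.
llm_config = {"config_list": config_list}
```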
3. Create your agents

Create two agents: an assistant that writes code and a user proxy that executes it. The AssistantAgent generates responses using Cerebras models, while the UserProxyAgent can execute code automatically.
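A sketch of the two agents, assuming the classic AG2 API (the agent names and the work_dir value are placeholders, and llm_config is the object from the previous step):

```python
from autogen import AssistantAgent, UserProxyAgent

# The assistant generates replies (including code) via the
# Cerebras-backed llm_config built in the previous step.
assistant = AssistantAgent(name="assistant", llm_config=llm_config)

# The user proxy runs the assistant's code locally with no human input,
# and stops once the assistant signals TERMINATE.
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False},
    is_termination_msg=lambda m: "TERMINATE" in (m.get("content") or ""),
)
```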
4. Start the conversation

Initiate a conversation between the agents with a coding task. The assistant will write a function that returns Paris attractions, execute it, and display the results.
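The call that starts the chat might look like this (the message wording is illustrative):

```python
# Kick off the conversation; the user proxy relays the task and then
# executes whatever code the assistant writes back.
chat_result = user_proxy.initiate_chat(
    assistant,
    message="Write a Python function that returns three popular "
            "Paris attractions, call it, and print the result.",
)
```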
5. Complete example

Here's the full working example. It asks the assistant to write code that computes 32 × 32; the user proxy executes the code and displays the result (1024). The conversation ends when the assistant says "TERMINATE".
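Since the original listing was not captured in this extract, here is a reconstruction under the same assumptions as the previous steps (API names and parameters are best-effort, not verbatim from the source):

```python
import os
from autogen import AssistantAgent, UserProxyAgent

# Cerebras-backed model configuration (model name from the table below).
config_list = [
    {
        "model": "llama-3.3-70b",
        "api_key": os.environ["CEREBRAS_API_KEY"],
        "api_type": "cerebras",
    }
]

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
    system_message="Solve the task with Python code. Reply TERMINATE when done.",
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False},
    is_termination_msg=lambda m: "TERMINATE" in (m.get("content") or ""),
)

# The assistant writes code for 32 * 32; the proxy runs it and prints 1024.
user_proxy.initiate_chat(
    assistant,
    message="What is 32 * 32? Compute it with Python code.",
)
```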
Available Models
Cerebras offers several high-performance models optimized for different use cases:

| Model | Parameters | Best For |
|---|---|---|
| llama-3.3-70b | 70B | Best for complex reasoning, long-form content, and tasks requiring deep understanding |
| qwen-3-32b | 32B | Balanced performance for general-purpose applications |
| llama3.1-8b | 8B | Fastest option for simple tasks and high-throughput scenarios |
| gpt-oss-120b | 120B | Strong option for demanding tasks |
| zai-glm-4.6 | 357B | Largest model, with advanced reasoning capabilities |
If you're unsure which model to choose, llama-3.3-70b is a solid starting point.
Advanced Configuration
Configure Model Parameters
Cerebras supports several parameters to fine-tune model behavior. Add these to your configuration:
- max_tokens: Maximum number of tokens to generate (integer ≥ 0)
- seed: Random seed for reproducible outputs (integer)
- stream: Enable streaming responses (true/false)
- temperature: Controls randomness, 0 to 1.5 (lower = more focused)
- top_p: Nucleus sampling threshold, 0 to 1 (alternative to temperature)
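Applied to a single OAI_CONFIG_LIST.json entry, the parameters might look like this (values are illustrative; top_p is left out because temperature is set):

```json
{
  "model": "llama-3.3-70b",
  "api_key": "your-cerebras-api-key",
  "api_type": "cerebras",
  "max_tokens": 1024,
  "seed": 42,
  "stream": false,
  "temperature": 0.5
}
```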
Set either temperature or top_p, but not both, as they control similar aspects of generation.

Use Environment Variables

Reference environment variables in your configuration file for better security.

Configure Multiple Model Fallback

Set up multiple models for automatic fallback if one is unavailable.
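A sketch combining both ideas in Python: the key is read from the environment rather than stored in a file, and several models (names from the table above) are listed in priority order so AG2 can fall back if one is unavailable:

```python
import os

# Key comes from the environment, never hard-coded in a config file.
api_key = os.environ.get("CEREBRAS_API_KEY")

# Entries are tried in order; later ones serve as fallbacks.
config_list = [
    {"model": "llama-3.3-70b", "api_key": api_key, "api_type": "cerebras"},
    {"model": "qwen-3-32b", "api_key": api_key, "api_type": "cerebras"},
    {"model": "llama3.1-8b", "api_key": api_key, "api_type": "cerebras"},
]
```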
Track Token Usage and Costs

The Cerebras client automatically tracks token usage. After running agent conversations, you can access statistics through AG2's built-in cost tracking: call gather_usage_summary([assistant, user_proxy]) to get detailed token usage and cost information for your agents.
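A usage sketch, assuming the assistant and user_proxy agents from the example above:

```python
from autogen import gather_usage_summary

# Aggregate token counts and estimated cost across both agents
# after a conversation has run.
usage = gather_usage_summary([assistant, user_proxy])
print(usage)
```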
Frequently Asked Questions
How do I fix 'API Key Not Found' errors?
If you see an error about missing API keys:
- Verify your environment variable is set: echo $CEREBRAS_API_KEY
- Check that your .env file is in the correct directory
- Restart your terminal after setting environment variables
- Try hardcoding the key temporarily to isolate the issue
What if my model isn't responding?
If a model isn’t responding:
- Verify the model name matches exactly (case-sensitive)
- Check Cerebras model availability
- Try a different model from your configuration list
- Ensure your API key has access to the requested model
Why is code execution failing?
If the user proxy agent can’t execute code:
- Check that the work_dir exists or can be created
- Verify required Python packages are installed (e.g., matplotlib)
- Review the code_execution_config settings
- Consider enabling Docker: "use_docker": True
How can I improve slow response times?
If agents are responding slowly:
- Try a smaller, faster model like llama3.1-8b
- Reduce max_tokens in your configuration
- Enable streaming with "stream": true
- Check your network connection to Cerebras's API
Can I use AG2 with other Cerebras models?
Yes! AG2 supports all Cerebras models. Simply update the model field in your configuration to use gpt-oss-120b, qwen-3-32b, or any other available model.

Next Steps
Now that you have AG2 configured with Cerebras, explore these advanced capabilities:
- Build Complex Workflows - Create multi-agent systems with specialized roles (researcher, coder, reviewer)
- Add Human-in-the-Loop - Set human_input_mode="ALWAYS" to review agent actions before execution
- Explore Group Chat - Use AG2's GroupChat feature to orchestrate conversations between multiple agents
- Try Different Models - Experiment with different Cerebras models to find the best performance/cost balance
- Want to migrate to GLM4.6? GLM4.6 migration guide

