What is Pydantic AI?
Pydantic AI is a Python agent framework designed to make it easy to build production-grade applications with Generative AI. It leverages Pydantic's validation and serialization capabilities to create type-safe, reliable AI agents. When combined with Cerebras Inference, you get both the safety of type validation and the speed of the world's fastest inference.

Key features include:
- Type-safe structured outputs using Pydantic models
- Function calling (tools) to extend agent capabilities
- Streaming support for real-time responses
- Async-first design for high-performance applications
- Built-in validation to ensure reliable outputs
Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here
- Python 3.9 or higher - Pydantic AI requires Python 3.9+
- Basic familiarity with Pydantic - Understanding of Pydantic models is helpful but not required
Configure Pydantic AI with Cerebras
1. Install Pydantic AI

Install Pydantic AI using pip. This installs Pydantic AI along with its core dependencies. For development with additional tools, you can also install optional dependencies.
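A sketch of the install commands described above. The base package name is `pydantic-ai`; the slim variant with the OpenAI-compatible extra is one common optional install, though available extras may vary by release:

```shell
# Install Pydantic AI with its core dependencies
pip install pydantic-ai

# Optional: slimmer install with only the OpenAI-compatible extra
# (check the Pydantic AI docs for the extras your version supports)
pip install "pydantic-ai-slim[openai]"
```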
2. Set up your API key

Create a .env file in your project directory to store your Cerebras API key securely. Alternatively, you can set it as an environment variable in your shell.
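Both options sketched below; loading the .env file at runtime assumes a helper such as python-dotenv, which is not part of Pydantic AI itself:

```shell
# Option 1: store the key in a .env file
# (load it in Python with e.g. python-dotenv)
echo 'CEREBRAS_API_KEY=your-api-key-here' > .env

# Option 2: export it directly in your shell
export CEREBRAS_API_KEY="your-api-key-here"
```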
3. Create your first agent
Pydantic AI makes it easy to create agents with Cerebras models. Here's a simple example that creates an agent using Llama 3.3 70B. The OpenAIModel class is used because Cerebras provides an OpenAI-compatible API. This creates a basic agent that uses Cerebras for inference, and it will respond with validated, type-safe outputs.
4. Add structured outputs with Pydantic models
One of Pydantic AI's most powerful features is the ability to get structured, validated outputs. Define a Pydantic model to specify the exact structure you want, and the agent will automatically extract the data and validate it against your model, ensuring type safety and data integrity.
5. Use async for better performance

For production applications, use async/await to handle multiple requests efficiently. This is especially powerful with Cerebras's fast inference: async operations let you process many requests concurrently and scale your application while taking advantage of Cerebras's ultra-low latency.
6. Add tools for function calling

Pydantic AI supports tools (function calling) to extend your agent's capabilities beyond text generation. Tools allow your agent to perform actions, fetch data, or interact with external systems. The agent automatically determines when to call tools based on the user's query, making your agents more capable and interactive.
7. Stream responses in real-time

For real-time applications like chatbots or interactive UIs, you can stream responses from Cerebras as they're generated, giving users immediate feedback. Streaming is particularly effective with Cerebras's high-speed inference, delivering tokens almost instantaneously.
Available Cerebras Models
You can use any of the following Cerebras models with Pydantic AI. Each model offers different trade-offs between capability, speed, and cost:
- llama-3.3-70b - Most capable model, excellent for complex reasoning, structured outputs, and multi-step tasks
- qwen-3-32b - Strong multilingual support, great for international applications
- llama3.1-8b - Fast and efficient for simpler tasks, ideal for high-throughput scenarios
- gpt-oss-120b - Large open-source model with broad knowledge
To switch models, change the model name when creating the OpenAIModel instance.
Advanced Features
Dependency Injection
Pydantic AI supports dependency injection to pass context and state to your agents and tools.
Result Validators

Add custom validation logic to ensure outputs meet your requirements.
Debugging with Pydantic Logfire

Pydantic Logfire provides powerful debugging and monitoring capabilities for your AI agents. It automatically tracks all agent runs, tool calls, and model interactions.
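A configuration sketch. It assumes you have created a Logfire project and authenticated locally; the instrument_pydantic_ai helper exists on recent logfire SDK releases, but check the Logfire docs for the exact call on your version:

```python
import logfire

# Assumes you have run `logfire auth` and created a project;
# configure() picks up those credentials.
logfire.configure()

# On recent SDK releases this one call traces every agent run,
# tool call, and model request made by Pydantic AI.
logfire.instrument_pydantic_ai()
```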
Next Steps

- Explore the Pydantic AI documentation - Learn about advanced features like dependency injection, result validators, and more
- Try different Cerebras models - Experiment with different models to find the best balance of speed and capability for your use case
- Build multi-agent systems - Use Pydantic AI’s multi-agent patterns to create complex workflows
- Add monitoring with Logfire - Integrate Pydantic Logfire for debugging and observability
- Check out examples - Browse the Pydantic AI examples for inspiration
- Read the API reference - Dive deep into the API documentation
FAQ
Why use OpenAIModel for Cerebras?
Cerebras provides an OpenAI-compatible API, so you use the OpenAIModel class from Pydantic AI. This allows you to leverage the full OpenAI ecosystem while benefiting from Cerebras's superior speed and performance. Simply point the base_url to Cerebras's endpoint.
Can I use streaming with structured outputs?
Currently, streaming is primarily designed for text outputs. For structured outputs with Pydantic models, use the standard run() or run_sync() methods. The model needs to generate the complete response before it can be validated against your schema.
How do I handle rate limits?
Pydantic AI automatically handles retries for transient errors. For rate limiting, you can implement custom retry logic using the retries parameter when running agents, or use a library like tenacity to add more sophisticated retry strategies around your agent calls.
Which model should I choose for structured outputs?
For complex structured outputs with nested objects or strict validation requirements, we recommend using llama-3.3-70b. It provides the best accuracy for following schemas and generating valid JSON. For simpler extractions, qwen-3-32b or llama3.1-8b can be more cost-effective.
Can I use Pydantic AI with other Cerebras features?
Yes! Pydantic AI works seamlessly with all Cerebras features. You can use streaming, function calling, and all supported parameters. Simply pass additional parameters through the model configuration or when calling run().

Troubleshooting
Import Error: “No module named ‘pydantic_ai’”
Make sure you've installed Pydantic AI with pip install pydantic-ai.

Authentication Error
If you see authentication errors, verify that:
- Your CEREBRAS_API_KEY environment variable is set correctly
- Your API key is valid and active (check your Cerebras dashboard)
- You're using the correct base URL: https://api.cerebras.ai/v1
- The API key is being loaded properly with os.getenv('CEREBRAS_API_KEY')
Model Not Found Error
Ensure you're using one of the supported Cerebras models:
- llama-3.3-70b
- qwen-3-32b
- llama3.1-8b
- gpt-oss-120b
Structured Output Validation Errors
If your structured outputs aren't validating correctly:
- Add field descriptions - Use Pydantic's Field(description="...") to provide clear guidance to the model
- Simplify your schema - Start with simpler models and gradually add complexity
- Add examples in your prompt - Include sample outputs in your system prompt
- Use a more capable model - Try llama-3.3-70b for better structured output quality
- Add result validators - Use @agent.result_validator to provide feedback and retry logic
Tool Calling Issues
If tools aren't being called correctly:
- Add clear docstrings - Tools need descriptive docstrings that explain what they do
- Use type hints - Properly annotate all parameters with types
- Test tools independently - Verify your tool functions work correctly outside the agent
- Check the system prompt - Make sure your prompt doesn’t discourage tool use
Performance Issues
If you're experiencing slower than expected performance:
- Use async operations - Switch from run_sync() to run() with async/await
- Enable streaming - Use run_stream() for better perceived performance
- Choose the right model - Use llama3.1-8b for simple tasks that don't require the largest model
- Batch requests - Use asyncio.gather() to process multiple requests concurrently
For more detailed troubleshooting and community support, visit the Pydantic AI GitHub discussions or check the troubleshooting guide.

