
What is Pydantic AI?

Pydantic AI is a Python agent framework designed to make it easy to build production-grade applications with Generative AI. It leverages Pydantic’s validation and serialization capabilities to create type-safe, reliable AI agents. When combined with Cerebras Inference, you get both the safety of type validation and the speed of the world’s fastest inference. Key features include:
  • Type-safe structured outputs using Pydantic models
  • Function calling (tools) to extend agent capabilities
  • Streaming support for real-time responses
  • Async-first design for high-performance applications
  • Built-in validation to ensure reliable outputs
Learn more on the Pydantic AI website.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key here
  • Python 3.9 or higher - Pydantic AI requires Python 3.9+
  • Basic familiarity with Pydantic - Understanding of Pydantic models is helpful but not required
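If you're new to Pydantic, the core idea is that models declare fields with type annotations and validate (and coerce) data on construction. A minimal, hypothetical sketch (the City model and its fields are illustrative, not part of Pydantic AI):

```python
from pydantic import BaseModel, ValidationError

# A minimal Pydantic model: fields are declared with type annotations
class City(BaseModel):
    name: str
    population: int

# Valid data is parsed and coerced (the string '2161000' becomes an int)
city = City(name='Paris', population='2161000')
print(city.population)

# Invalid data raises a ValidationError instead of failing silently
try:
    City(name='Paris', population='a lot')
except ValidationError:
    print('validation failed')
```

This validate-on-construction behavior is exactly what Pydantic AI applies to model outputs.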

Configure Pydantic AI with Cerebras

1. Install Pydantic AI

Install Pydantic AI using pip. This will install Pydantic AI along with its core dependencies:
pip install pydantic-ai openai 
For development with additional tools, you can install optional dependencies:
pip install 'pydantic-ai[logfire]'  # For debugging and monitoring

2. Set up your API key

Create a .env file in your project directory to store your Cerebras API key securely:
CEREBRAS_API_KEY=your-cerebras-api-key-here
Alternatively, you can set it as an environment variable in your shell:
export CEREBRAS_API_KEY=your-cerebras-api-key-here
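Either way, it's worth failing fast when the key is missing rather than hitting an opaque authentication error later. A small stdlib-only sketch (the require_api_key helper is illustrative, not part of Pydantic AI; if you keep the key in a .env file, you'd also need a loader such as python-dotenv, which is not installed by default):

```python
import os

def require_api_key(name: str = 'CEREBRAS_API_KEY') -> str:
    """Return the API key from the environment, failing fast if it is missing."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(
            f'{name} is not set; export it or load your .env file before running.'
        )
    return key
```

Call it once at startup, before constructing any agents.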

3. Create your first agent

Pydantic AI makes it easy to create agents with Cerebras models. Here’s a simple example that creates an agent using Llama 3.3 70B. The OpenAIChatModel class is used because Cerebras provides an OpenAI-compatible API:
import asyncio

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel

# Create a Cerebras model instance (reads CEREBRAS_API_KEY from the environment)
model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

# Create an agent with the Cerebras model
agent = Agent(
    model=model,
    system_prompt='You are a helpful assistant that provides clear, concise answers.'
)

async def main():
    # Run the agent asynchronously
    result = await agent.run('What is the capital of France?')
    print(result.output)

asyncio.run(main())
This creates a basic agent that uses Cerebras for inference. The agent will respond with validated, type-safe outputs.

4. Add structured outputs with Pydantic models

One of Pydantic AI’s most powerful features is the ability to get structured, validated outputs. Define a Pydantic model to specify the exact structure you want, and the agent will automatically extract and validate the data:
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel

# Define your output structure with validation
class CityInfo(BaseModel):
    city: str = Field(description="The name of the city")
    country: str = Field(description="The country where the city is located")
    population: int = Field(description="Approximate population")
    famous_for: list[str] = Field(description="List of things the city is famous for")

# Create the Cerebras model
model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

# Create an agent with structured output
agent = Agent(
    model=model,
    output_type=CityInfo,
    system_prompt='Extract city information from the user query. Provide accurate, factual data.'
)

# Get structured, validated output
result = agent.run_sync('Tell me about Paris')
print(result.output)
The agent will automatically validate the output against your Pydantic model, ensuring type safety and data integrity.

5. Use async for better performance

For production applications, use async/await to handle multiple requests efficiently. This is especially powerful with Cerebras’s fast inference speeds, allowing you to process many requests concurrently:
import asyncio

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel

# Create the Cerebras model
model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

agent = Agent(
    model=model,
    system_prompt='You are a helpful assistant.'
)

async def main():
    # Run several requests concurrently
    results = await asyncio.gather(
        agent.run('What is 2+2?'),
        agent.run('What is the capital of Spain?'),
        agent.run('Who wrote Romeo and Juliet?')
    )

    for i, result in enumerate(results, 1):
        print(f"Answer {i}: {result.output}")

asyncio.run(main())
Async operations allow your application to scale efficiently while taking advantage of Cerebras’s ultra-low latency.

6. Add tools for function calling

Pydantic AI supports tools (function calling) to extend your agent’s capabilities beyond text generation. Tools allow your agent to perform actions, fetch data, or interact with external systems:
from datetime import datetime
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.openai import OpenAIChatModel

model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras',
)

agent = Agent(
    model=model,
    system_prompt='You are a helpful assistant with access to tools.'
)

@agent.tool
def get_current_time(ctx: RunContext[None]) -> str:
    """Get the current time in a readable format."""
    return datetime.now().strftime('%I:%M %p')

@agent.tool
def calculate(ctx: RunContext[None], expression: str) -> str:
    """Safely evaluate a mathematical expression.

    Args:
        expression: A mathematical expression like '15 * 23'
    """
    try:
        # In production, use a safe eval library like numexpr
        result = eval(expression, {'__builtins__': {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {str(e)}"

# The agent can now use these tools automatically
result = agent.run_sync('What time is it and what is 15 * 23?')
print(result.output)
The agent will automatically determine when to call tools based on the user’s query, making your agents more capable and interactive.
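As the comment in the calculate tool notes, bare eval is risky even with emptied builtins. If you'd rather not add a dependency like numexpr, a stdlib-only alternative is to walk the expression's AST and allow only arithmetic nodes. A hedged sketch (safe_eval is illustrative, not part of Pydantic AI):

```python
import ast
import operator

# Operators we allow in expressions; anything else is rejected
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate a basic arithmetic expression without calling eval()."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f'Unsupported expression: {expression!r}')
    return _eval(ast.parse(expression, mode='eval'))

print(safe_eval('15 * 23'))  # 345
```

You could swap this in for the eval call inside the calculate tool without changing the tool's signature.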

7. Stream responses in real-time

For real-time applications like chatbots or interactive UIs, you can stream responses from Cerebras as they’re generated. This provides a better user experience with immediate feedback:
import asyncio

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel

model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

agent = Agent(
    model=model,
    system_prompt='You are a creative writing assistant.'
)

async def main():
    # Stream the response as it is generated
    async with agent.run_stream('Write a short poem about AI') as response:
        async for chunk in response.stream_text(delta=True):
            print(chunk, end='', flush=True)
        print()  # New line at the end

asyncio.run(main())
Streaming is particularly effective with Cerebras’s high-speed inference, delivering tokens to users almost instantaneously.

Available Cerebras Models

You can use any of the following Cerebras models with Pydantic AI. Each model offers different trade-offs between capability, speed, and cost:
  • llama-3.3-70b - Most capable model, excellent for complex reasoning, structured outputs, and multi-step tasks
  • qwen-3-32b - Strong multilingual support, great for international applications
  • llama3.1-8b - Fast and efficient for simpler tasks, ideal for high-throughput scenarios
  • gpt-oss-120b - Large open-source model with broad knowledge
To switch models, simply change the model name when creating your OpenAIChatModel instance:
model = OpenAIChatModel(
    'qwen-3-32b',  # Change model here
    provider='cerebras'
)

Advanced Features

Dependency Injection

Pydantic AI supports dependency injection to pass context and state to your agents and tools:
from dataclasses import dataclass
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.openai import OpenAIChatModel

@dataclass
class UserContext:
    user_id: str
    preferences: dict

model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

agent = Agent(
    model=model,
    deps_type=UserContext
)

@agent.tool
def get_user_preference(ctx: RunContext[UserContext], key: str) -> str:
    """Get a user preference by key."""
    return ctx.deps.preferences.get(key, 'Not set')

# Run with dependencies
user_ctx = UserContext(
    user_id='user123',
    preferences={'theme': 'dark', 'language': 'en'}
)
result = agent.run_sync('What is my theme preference?', deps=user_ctx)
print(result.output)

Result Validators

Add custom validation logic to ensure outputs meet your requirements:
from pydantic import BaseModel, Field
from pydantic_ai import Agent, ModelRetry, RunContext
from pydantic_ai.models.openai import OpenAIChatModel

class Summary(BaseModel):
    title: str
    summary: str = Field(description="A concise summary")
    word_count: int

model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

agent = Agent(
    model=model,
    output_type=Summary
)

@agent.output_validator
def validate_summary_length(ctx: RunContext[None], output: Summary) -> Summary:
    """Ensure the summary is not too long."""
    if output.word_count > 100:
        raise ModelRetry('Summary is too long. Please make it more concise.')
    return output

result = agent.run_sync('Summarize the history of artificial intelligence')
print(result.output)

Debugging with Pydantic Logfire

Pydantic Logfire provides powerful debugging and monitoring capabilities for your AI agents. It automatically tracks all agent runs, tool calls, and model interactions:
import logfire
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel

# Configure Logfire and instrument Pydantic AI
logfire.configure()
logfire.instrument_pydantic_ai()

model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

agent = Agent(model=model)

# All agent runs are automatically logged to Logfire
result = agent.run_sync('What is machine learning?')
print(result.output)
Logfire provides a web UI where you can view traces, debug issues, and monitor performance. Learn more in the Pydantic Logfire documentation.

Next Steps

  • Explore the Pydantic AI documentation - Learn about advanced features like dependency injection, result validators, and more
  • Try different Cerebras models - Experiment with different models to find the best balance of speed and capability for your use case
  • Build multi-agent systems - Use Pydantic AI’s multi-agent patterns to create complex workflows
  • Add monitoring with Logfire - Integrate Pydantic Logfire for debugging and observability
  • Check out examples - Browse the Pydantic AI examples for inspiration
  • Read the API reference - Dive deep into the API documentation

FAQ

Why do I use an OpenAI model class with Cerebras?

Cerebras provides an OpenAI-compatible API, so you use the OpenAIChatModel class from Pydantic AI. This allows you to leverage the mature OpenAI support in Pydantic AI while benefiting from Cerebras’s superior speed and performance; the Cerebras provider points the client at Cerebras’s endpoint for you.

Can I stream structured outputs?

Currently, streaming is primarily designed for text outputs. For structured outputs with Pydantic models, use the standard run() or run_sync() methods: the model needs to generate the complete response before it can be validated against your schema.

How do I handle rate limits?

Pydantic AI automatically handles retries for transient errors. You can tune this with the retries parameter when constructing an Agent, or use a library like tenacity to add more sophisticated retry strategies around your agent calls.
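If you prefer not to add tenacity as a dependency, a hand-rolled backoff wrapper is only a few lines. A sketch (run_with_retries is illustrative, not a Pydantic AI API):

```python
import asyncio
import random

async def run_with_retries(coro_factory, max_attempts: int = 3, base_delay: float = 1.0):
    """Retry an async call with exponential backoff and a little jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await coro_factory()
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2x, 4x... plus jitter
            await asyncio.sleep(base_delay * 2 ** (attempt - 1) + random.random() * 0.1)
```

You could then wrap calls as `await run_with_retries(lambda: agent.run('What is 2+2?'))`.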

Which model is best for structured outputs?

For complex structured outputs with nested objects or strict validation requirements, we recommend llama-3.3-70b. It provides the best accuracy for following schemas and generating valid JSON. For simpler extractions, qwen-3-32b or llama3.1-8b can be more cost-effective.

Does Pydantic AI support all Cerebras features?

Yes! Pydantic AI works seamlessly with Cerebras. You can use streaming, function calling, and all supported parameters; simply pass additional parameters through the model configuration or when calling run().

Troubleshooting

Import Error: “No module named ‘pydantic_ai’”

Make sure you’ve installed Pydantic AI:
pip install pydantic-ai
If you’re using a virtual environment, ensure it’s activated before installing.

Authentication Error

If you see authentication errors, verify that:
  1. Your CEREBRAS_API_KEY environment variable is set correctly
  2. Your API key is valid and active (check your Cerebras dashboard)
  3. You’re using the correct base URL: https://api.cerebras.ai/v1
  4. The API key is being loaded properly with os.getenv('CEREBRAS_API_KEY')

Model Not Found Error

Ensure you’re using one of the supported Cerebras models:
  • llama-3.3-70b
  • qwen-3-32b
  • llama3.1-8b
  • gpt-oss-120b
Model names are case-sensitive and must match exactly as shown above.

Structured Output Validation Errors

If your structured outputs aren’t validating correctly:
  1. Add field descriptions - Use Pydantic’s Field(description="...") to provide clear guidance to the model
  2. Simplify your schema - Start with simpler models and gradually add complexity
  3. Add examples in your prompt - Include sample outputs in your system prompt
  4. Use a more capable model - Try llama-3.3-70b for better structured output quality
  5. Add output validators - Use @agent.output_validator to provide feedback and retry logic

Tool Calling Issues

If tools aren’t being called correctly:
  1. Add clear docstrings - Tools need descriptive docstrings that explain what they do
  2. Use type hints - Properly annotate all parameters with types
  3. Test tools independently - Verify your tool functions work correctly outside the agent
  4. Check the system prompt - Make sure your prompt doesn’t discourage tool use
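Putting these guidelines together, here is what a well-documented, independently testable tool function might look like (convert_temperature is an illustrative example, not part of Pydantic AI; once it works on its own, you could register it with @agent.tool_plain, the decorator for tools that don't need a RunContext):

```python
def convert_temperature(value: float, unit: str) -> str:
    """Convert a temperature between Celsius and Fahrenheit.

    Args:
        value: The temperature to convert.
        unit: The unit of `value`, either 'C' or 'F'.
    """
    if unit == 'C':
        return f'{value * 9 / 5 + 32:.1f} F'
    if unit == 'F':
        return f'{(value - 32) * 5 / 9:.1f} C'
    raise ValueError("unit must be 'C' or 'F'")

# Tip 3: verify the tool works before handing it to the agent
print(convert_temperature(100, 'C'))  # 212.0 F
```

The docstring and type hints are what the model sees when deciding whether and how to call the tool, so keeping them accurate matters as much as the implementation.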

Performance Issues

If you’re experiencing slower than expected performance:
  1. Use async operations - Switch from run_sync() to run() with async/await
  2. Enable streaming - Use run_stream() for better perceived performance
  3. Choose the right model - Use llama3.1-8b for simple tasks that don’t require the largest model
  4. Batch requests - Use asyncio.gather() to process multiple requests concurrently
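Combining tips 1 and 4 without overwhelming a rate limit, you can cap how many requests are in flight at once with a semaphore. A sketch (run_bounded is illustrative, not a Pydantic AI API):

```python
import asyncio

async def run_bounded(tasks, limit: int = 8):
    """Run async callables concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def _run(task):
        async with semaphore:
            return await task()

    # Results come back in the same order as the input tasks
    return await asyncio.gather(*(_run(t) for t in tasks))
```

For example, `await run_bounded([lambda q=q: agent.run(q) for q in questions], limit=8)`.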
For more detailed troubleshooting and community support, visit the Pydantic AI GitHub discussions or check the troubleshooting guide.