What is Pydantic AI?

Pydantic AI is a Python agent framework designed to make it easy to build production-grade applications with Generative AI. It leverages Pydantic’s validation and serialization capabilities to create type-safe, reliable AI agents. When combined with Cerebras Inference, you get both the safety of type validation and the speed of the world’s fastest inference. Key features include:
  • Type-safe structured outputs using Pydantic models
  • Function calling (tools) to extend agent capabilities
  • Streaming support for real-time responses
  • Async-first design for high-performance applications
  • Built-in validation to ensure reliable outputs
Learn more on the Pydantic AI website.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key from your Cerebras dashboard
  • Python 3.9 or higher - Pydantic AI requires Python 3.9+
  • Basic familiarity with Pydantic - Understanding of Pydantic models is helpful but not required

Configure Pydantic AI with Cerebras

Step 1: Install Pydantic AI

Install Pydantic AI using pip. This will install Pydantic AI along with its core dependencies:
pip install pydantic-ai openai 
For development with additional tools, you can install optional dependencies:
pip install 'pydantic-ai[logfire]'  # For debugging and monitoring

Step 2: Set up your API key

Create a .env file in your project directory to store your Cerebras API key securely:
CEREBRAS_API_KEY=your-cerebras-api-key-here
Alternatively, you can set it as an environment variable in your shell:
export CEREBRAS_API_KEY=your-cerebras-api-key-here
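
If you go the .env route, note that the file is not read automatically. One common approach (shown here as a sketch, assuming the optional python-dotenv package is installed with pip install python-dotenv) is to load it at startup:
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into the process environment
assert os.getenv('CEREBRAS_API_KEY'), 'CEREBRAS_API_KEY is not set'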

Step 3: Create your first agent

Pydantic AI makes it easy to create agents with Cerebras models. Here’s a simple example that creates an agent using Llama 3.3 70B. The OpenAIChatModel class is used because Cerebras provides an OpenAI-compatible API:
import asyncio
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel

# Create a Cerebras model instance; the 'cerebras' provider reads the
# CEREBRAS_API_KEY environment variable
model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

# Create an agent with the Cerebras model
agent = Agent(
    model=model,
    system_prompt='You are a helpful assistant that provides clear, concise answers.'
)

async def main():
    # Run the agent asynchronously
    result = await agent.run('What is the capital of France?')
    print(result.output)

asyncio.run(main())
This creates a basic agent that uses Cerebras for inference. The run returns a result object, and the response text is available via result.output.
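
If you are not already inside an event loop (for example, in a simple script), you can use the synchronous run_sync method instead of asyncio:
# Synchronous alternative to asyncio.run(main())
result = agent.run_sync('What is the capital of France?')
print(result.output)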

Step 4: Add structured outputs with Pydantic models

One of Pydantic AI’s most powerful features is the ability to get structured, validated outputs. Define a Pydantic model to specify the exact structure you want, and the agent will automatically extract and validate the data:
import asyncio
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel

# Define your output structure with validation
class CityInfo(BaseModel):
    city: str = Field(description="The name of the city")
    country: str = Field(description="The country where the city is located")
    population: int = Field(description="Approximate population")
    famous_for: list[str] = Field(description="List of things the city is famous for")

# Create the Cerebras model
model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

# Create an agent with structured output
agent = Agent(
    model=model,
    output_type=CityInfo,  # the structured output schema for this agent
    system_prompt='Extract city information from the user query. Provide accurate, factual data.'
)

# Get structured, validated output
async def get_city_info():
    result = await agent.run('Tell me about Paris')
    print(result.output)

# Run the async function
asyncio.run(get_city_info())
The agent will automatically validate the output against your Pydantic model, ensuring type safety and data integrity.
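
Because result.output is an instance of your CityInfo model, you can work with it as regular typed Python data, for example:
# Fields on the validated model are ordinary typed attributes
info = result.output
print(f"{info.city}, {info.country} (pop. ~{info.population:,})")
print("Famous for:", ", ".join(info.famous_for))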

Step 5: Use async for better performance

For production applications, use async/await to handle multiple requests efficiently. This is especially powerful with Cerebras’s fast inference speeds, allowing you to process many requests concurrently:
import asyncio
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel

# Create the Cerebras model
model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

agent = Agent(
    model=model,
    system_prompt='You are a helpful assistant.'
)

async def run_concurrent_queries():
    results = await asyncio.gather(
        agent.run('What is 2+2?'),
        agent.run('What is the capital of Spain?'),
        agent.run('Who wrote Romeo and Juliet?')
    )

    for i, result in enumerate(results, 1):
        print(f"Answer {i}: {result.output}")

# Run the async function
asyncio.run(run_concurrent_queries())
Async operations allow your application to scale efficiently while taking advantage of Cerebras’s ultra-low latency.
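
If you fan out a large number of requests, you may also want to cap how many run at once so you stay within your rate limits. A minimal sketch, continuing from the agent above (the limit of 10 is illustrative):
# Bound the number of in-flight requests with a semaphore
semaphore = asyncio.Semaphore(10)

async def run_limited(prompt: str):
    async with semaphore:
        return await agent.run(prompt)

async def run_many(prompts: list[str]) -> list[str]:
    results = await asyncio.gather(*(run_limited(p) for p in prompts))
    return [r.output for r in results]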

Step 6: Add tools for function calling

Pydantic AI supports tools (function calling) to extend your agent’s capabilities beyond text generation. Tools allow your agent to perform actions, fetch data, or interact with external systems:
import asyncio
from datetime import datetime
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.openai import OpenAIChatModel

model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras',
)

agent = Agent(
    model=model,
    system_prompt='You are a helpful assistant with access to tools.'
)

@agent.tool
def get_current_time(ctx: RunContext[None]) -> str:
    """Get the current time in a readable format."""
    return datetime.now().strftime('%I:%M %p')

@agent.tool
def calculate(ctx: RunContext[None], expression: str) -> str:
    """Safely evaluate a mathematical expression.
    
    Args:
        expression: A mathematical expression like '15 * 23'
    """
    try:
        # In production, use a safe eval library like numexpr
        result = eval(expression, {'__builtins__': {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {str(e)}"

# The agent can now use these tools automatically
async def test_tools():
    result = await agent.run('What time is it and what is 15 * 23?')
    print(result.output)

# Run the async function
asyncio.run(test_tools())
The agent will automatically determine when to call tools based on the user’s query, making your agents more capable and interactive.
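
Tools that do not need access to the run context can also be registered with the tool_plain decorator, which drops the ctx parameter; a small sketch, continuing from the agent above:
# A context-free tool: no RunContext argument is required with tool_plain
@agent.tool_plain
def word_count(text: str) -> int:
    """Count the number of words in a piece of text."""
    return len(text.split())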

Step 7: Stream responses in real-time

For real-time applications like chatbots or interactive UIs, you can stream responses from Cerebras as they’re generated. This provides a better user experience with immediate feedback:
import asyncio
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel

model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

agent = Agent(
    model=model,
    system_prompt='You are a creative writing assistant.'
)

# Stream the response token by token
async def stream_response():
    async with agent.run_stream('Write a short poem about AI') as response:
        async for chunk in response.stream_text(delta=True):  # delta=True yields only the new text
            print(chunk, end='', flush=True)
        print()  # New line at the end

# Run the async function
asyncio.run(stream_response())
Streaming is particularly effective with Cerebras’s high-speed inference, delivering tokens to users almost instantaneously.
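
As a rough illustration (not a rigorous benchmark), you can time how quickly the first chunk arrives, reusing the streaming agent above:
import time

async def time_first_chunk():
    start = time.monotonic()
    first_chunk_at = None
    async with agent.run_stream('Write a short poem about AI') as response:
        async for chunk in response.stream_text(delta=True):
            if first_chunk_at is None:
                first_chunk_at = time.monotonic() - start
            print(chunk, end='', flush=True)
    print(f"\nFirst chunk after {first_chunk_at:.2f}s")

asyncio.run(time_first_chunk())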

Available Cerebras Models

You can use any of the following Cerebras models with Pydantic AI. Each model offers different trade-offs between capability, speed, and cost:
  • llama-3.3-70b - Most capable model, excellent for complex reasoning, structured outputs, and multi-step tasks
  • qwen-3-32b - Strong multilingual support, great for international applications
  • llama3.1-8b - Fast and efficient for simpler tasks, ideal for high-throughput scenarios
  • gpt-oss-120b - Large open-source model with broad knowledge
  • zai-glm-4.6 - Advanced 357B parameter model with strong reasoning capabilities
To switch models, simply change the model name when creating your OpenAIChatModel instance:
from pydantic_ai.models.openai import OpenAIChatModel

model = OpenAIChatModel(
    'qwen-3-32b',  # Change model here
    provider='cerebras'
)

Advanced Features

Dependency Injection

Pydantic AI supports dependency injection to pass context and state to your agents and tools:
import asyncio
from dataclasses import dataclass
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.openai import OpenAIChatModel

@dataclass
class UserContext:
    user_id: str
    preferences: dict

model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

agent = Agent(
    model=model,
    deps_type=UserContext
)

@agent.tool
def get_user_preference(ctx: RunContext[UserContext], key: str) -> str:
    """Get a user preference by key."""
    return ctx.deps.preferences.get(key, 'Not set')

# Run with dependencies
async def test_with_deps():
    user_ctx = UserContext(
        user_id='user123',
        preferences={'theme': 'dark', 'language': 'en'}
    )
    result = await agent.run('What is my theme preference?', deps=user_ctx)
    print(result.output)

# Run the async function
asyncio.run(test_with_deps())

Result Validators

Add custom validation logic to ensure outputs meet your requirements:
import asyncio
from pydantic import BaseModel, Field
from pydantic_ai import Agent, ModelRetry
from pydantic_ai.models.openai import OpenAIChatModel

class Summary(BaseModel):
    title: str
    summary: str = Field(description="A concise summary")
    word_count: int

model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

agent = Agent(
    model=model,
    output_type=Summary  # validated output schema
)

@agent.output_validator
def validate_summary_length(ctx, result: Summary) -> Summary:
    """Ensure summary is not too long."""
    if result.word_count > 500:
        raise ModelRetry('Summary is too long. Please make it more concise.')
    return result

async def test_validator():
    result = await agent.run('Summarize the history of artificial intelligence in 2-3 sentences')
    print(result.output)

# Run the async function
asyncio.run(test_validator())

Debugging with Pydantic Logfire

Pydantic Logfire provides powerful debugging and monitoring capabilities for your AI agents. It automatically tracks all agent runs, tool calls, and model interactions:
import asyncio
import logfire
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel

# Configure Logfire and instrument Pydantic AI so agent runs, tool calls,
# and model requests are traced
logfire.configure()
logfire.instrument_pydantic_ai()

model = OpenAIChatModel(
    'llama-3.3-70b',
    provider='cerebras'
)

agent = Agent(model=model)

# All agent runs are automatically logged to Logfire
async def test_logfire():
    result = await agent.run('What is machine learning?')
    print(result.output)

# Run the async function
asyncio.run(test_logfire())
Logfire provides a web UI where you can view traces, debug issues, and monitor performance. Learn more in the Pydantic Logfire documentation.

Next Steps

  • Explore the Pydantic AI documentation - Learn about advanced features like dependency injection, result validators, and more
  • Try different Cerebras models - Experiment with different models to find the best balance of speed and capability for your use case
  • Build multi-agent systems - Use Pydantic AI’s multi-agent patterns to create complex workflows
  • Add monitoring with Logfire - Integrate Pydantic Logfire for debugging and observability
  • Check out examples - Browse the Pydantic AI examples for inspiration
  • Read the API reference - Dive deep into the API documentation
  • Migrate to GLM 4.6 - Ready to upgrade? Follow our migration guide to start using our latest model

FAQ

Why does this guide use the OpenAIChatModel class with Cerebras?

Cerebras provides an OpenAI-compatible API, so you use the OpenAIChatModel class from Pydantic AI. This allows you to leverage the full OpenAI ecosystem while benefiting from Cerebras's superior speed and performance. Simply set provider='cerebras', or point the base_url at Cerebras's endpoint.
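
If you prefer to configure the endpoint explicitly rather than using provider='cerebras', a minimal sketch looks like this:
import os

from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider

# Point the OpenAI-compatible provider at the Cerebras endpoint explicitly
model = OpenAIChatModel(
    'llama-3.3-70b',
    provider=OpenAIProvider(
        base_url='https://api.cerebras.ai/v1',
        api_key=os.getenv('CEREBRAS_API_KEY'),
    ),
)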

Can I stream structured outputs?

Currently, streaming is primarily designed for text outputs. For structured outputs with Pydantic models, use the standard run() or run_sync() methods. The model needs to generate the complete response before it can be validated against your schema.

How do I handle rate limits and transient errors?

Pydantic AI automatically handles retries for transient errors. For rate limiting, you can implement custom retry logic by configuring the retries parameter on your Agent, or use a library like tenacity to add more sophisticated retry strategies around your agent calls.
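
For example, a small sketch wrapping an agent call with tenacity (an optional third-party package; the backoff values are illustrative, and agent is assumed to be configured as in the examples above):
from tenacity import retry, stop_after_attempt, wait_exponential

# Retry transient failures with exponential backoff
@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, max=30))
async def ask_with_retries(prompt: str) -> str:
    result = await agent.run(prompt)
    return result.output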

Which model should I use for structured outputs?

For complex structured outputs with nested objects or strict validation requirements, we recommend using llama-3.3-70b. It provides the best accuracy for following schemas and generating valid JSON. For simpler extractions, qwen-3-32b or llama3.1-8b can be more cost-effective.

Can I use all Cerebras features through Pydantic AI?

Yes! Pydantic AI works seamlessly with all Cerebras features. You can use streaming, function calling, and all supported parameters. Simply pass additional parameters through the model configuration or when calling run().

Troubleshooting

Import Error: “No module named ‘pydantic_ai’”

Make sure you’ve installed Pydantic AI:
pip install pydantic-ai
If you’re using a virtual environment, ensure it’s activated before installing.

Authentication Error

If you see authentication errors, verify that:
  1. Your CEREBRAS_API_KEY environment variable is set correctly
  2. Your API key is valid and active (check your Cerebras dashboard)
  3. You’re using the correct base URL: https://api.cerebras.ai/v1
  4. The API key is being loaded properly with os.getenv('CEREBRAS_API_KEY')

Model Not Found Error

Ensure you’re using one of the supported Cerebras models:
  • llama-3.3-70b
  • qwen-3-32b
  • llama3.1-8b
  • gpt-oss-120b
  • zai-glm-4.6
Model names are case-sensitive and must match exactly as shown above.

Structured Output Validation Errors

If your structured outputs aren’t validating correctly:
  1. Add field descriptions - Use Pydantic’s Field(description="...") to provide clear guidance to the model
  2. Simplify your schema - Start with simpler models and gradually add complexity
  3. Add examples in your prompt - Include sample outputs in your system prompt
  4. Use a more capable model - Try llama-3.3-70b for better structured output quality
  5. Add output validators - Use @agent.output_validator to provide feedback and retry logic

Tool Calling Issues

If tools aren’t being called correctly:
  1. Add clear docstrings - Tools need descriptive docstrings that explain what they do
  2. Use type hints - Properly annotate all parameters with types
  3. Test tools independently - Verify your tool functions work correctly outside the agent
  4. Check the system prompt - Make sure your prompt doesn’t discourage tool use

Performance Issues

If you’re experiencing slower than expected performance:
  1. Use async operations - Switch from run_sync() to run() with async/await
  2. Enable streaming - Use run_stream() for better perceived performance
  3. Choose the right model - Use llama3.1-8b for simple tasks that don’t require the largest model
  4. Batch requests - Use asyncio.gather() to process multiple requests concurrently
For more detailed troubleshooting and community support, visit the Pydantic AI GitHub discussions or check the troubleshooting guide.