Prerequisites
Before you begin, ensure you have:

- Cerebras API Key - Get a free API key here.
- Cartesia API Key - Visit Cartesia and create an account. Navigate to your profile settings to generate an API key.
- Python 3.10 or higher - Required for running the integration code.
Configure Cartesia Integration
1. Install required dependencies
Install the necessary Python packages for both Cerebras Inference and Cartesia. The openai package provides the client for Cerebras Inference (OpenAI-compatible), cartesia is the official Cartesia SDK for voice synthesis, and pyaudio enables real-time audio playback.

macOS users: If you encounter errors installing pyaudio, first install PortAudio with: brew install portaudio
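All three packages named above can be installed in one command (pin versions as needed for your project):

```shell
pip install openai cartesia pyaudio
```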
2. Configure environment variables
Create a .env file in your project directory to securely store your API keys. Alternatively, you can set these as environment variables in your shell.
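A minimal .env might look like this (placeholder values shown; the variable names are assumptions — use whatever names your code reads):

```
CEREBRAS_API_KEY=your-cerebras-api-key
CARTESIA_API_KEY=your-cartesia-api-key
```

The shell equivalent is export CEREBRAS_API_KEY=... and export CARTESIA_API_KEY=... before running your script.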
3. Initialize the Cerebras client
Set up the Cerebras client using the OpenAI-compatible interface. The integration header helps us track and optimize this integration.
4. Create a basic text-to-speech pipeline
Now let’s create a complete pipeline that generates text with Cerebras and converts it to speech with Cartesia. This example combines Cerebras’s fast inference with Cartesia’s ultra-low-latency voice synthesis.
5. Build a conversational voice agent
For a more advanced use case, here’s how to build a multi-turn conversational agent. The agent maintains conversation context across turns and produces natural spoken responses using Cerebras’s fast inference and Cartesia’s voice synthesis.
6. Stream responses for lower latency
For even faster response times, you can stream the Cerebras output and generate speech in real time. Starting audio playback as soon as the first chunks are available minimizes latency, which gives the most responsive experience for interactive voice applications.
Available Models
Cerebras offers several models optimized for voice AI applications:

| Model | Parameters | Best For |
|---|---|---|
| llama-3.3-70b | 70B | Best for complex reasoning, long-form content, and tasks requiring deep understanding |
| qwen-3-32b | 32B | Balanced performance for general-purpose applications |
| llama3.1-8b | 8B | Fastest option for simple tasks and high-throughput scenarios |
| gpt-oss-120b | 120B | High-capacity open-weight model for demanding tasks |
| zai-glm-4.7 | 357B | Largest model, with strong reasoning capabilities |
Use the model parameter in your Cerebras API calls to switch between models.
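For example, a small (hypothetical) helper that maps a latency/quality preference onto the table above:

```python
# Model names from the table above; the priority labels are illustrative.
MODEL_BY_PRIORITY = {
    "quality": "llama-3.3-70b",   # complex reasoning, long-form content
    "balanced": "qwen-3-32b",     # general-purpose
    "speed": "llama3.1-8b",       # fastest, high-throughput
}

def pick_model(priority: str) -> str:
    """Return a Cerebras model name for the given priority, defaulting to quality."""
    return MODEL_BY_PRIORITY.get(priority, "llama-3.3-70b")
```

The returned name is what you pass as model= in the chat completion call.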
Next Steps
- Explore Voice Options - Browse Cartesia’s library of natural-sounding voices
- Advanced Examples - See production voice agent implementations
- API Documentation - Learn more about Cartesia’s API capabilities
- Cerebras Models - Explore available Cerebras models for your use case
Additional Resources
- Cartesia Python SDK - Official Python client library
- Line SDK Documentation - Build production voice agents
- Cerebras Tool Use - Add function calling to your voice agents
- Migrate to GLM4.7 - Ready to upgrade? Follow our migration guide to start using our latest model

