ElevenLabs is a leading voice AI platform that provides realistic text-to-speech, voice cloning, and dubbing capabilities. By combining Cerebras Inference’s lightning-fast LLM responses with ElevenLabs’ natural-sounding voice synthesis, you can build responsive voice agents and conversational AI applications. This guide will walk you through integrating Cerebras models with ElevenLabs to create a complete voice AI pipeline.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Sign up for a free API key on the Cerebras Cloud platform.
  • ElevenLabs API Key - Visit ElevenLabs and create an account. Navigate to your profile settings to generate an API key.
  • Python 3.10 or higher - Required for running the integration code.

Configure ElevenLabs Integration

Step 1: Install required dependencies

Install the necessary Python packages for both Cerebras Inference and ElevenLabs:
pip install openai elevenlabs
The openai package provides the client for Cerebras Inference (OpenAI-compatible), and elevenlabs is the official ElevenLabs SDK for voice synthesis.
Audio playback requirement: To play audio files, you may need to install FFmpeg (a quick availability check follows this list):
  • macOS: brew install ffmpeg
  • Windows: Download from ffmpeg.org or use choco install ffmpeg
  • Linux: sudo apt install ffmpeg (Ubuntu/Debian) or sudo yum install ffmpeg (CentOS/RHEL)
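
If you are not sure whether FFmpeg is already on your PATH, you can check from Python before running the examples. A minimal sketch using only the standard library:
import shutil

# shutil.which returns the executable's path, or None if it is not installed
if shutil.which("ffmpeg") is None:
    print("FFmpeg not found - install it using one of the commands above")
else:
    print("FFmpeg is available")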

Step 2: Configure environment variables

Create a .env file in your project directory to securely store your API keys:
CEREBRAS_API_KEY=your-cerebras-api-key-here
ELEVENLABS_API_KEY=your-elevenlabs-api-key-here
Alternatively, you can set these as environment variables in your shell:
export CEREBRAS_API_KEY="your-cerebras-api-key-here"
export ELEVENLABS_API_KEY="your-elevenlabs-api-key-here"
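
If you went with the .env file, load it before reading the keys. This sketch assumes the python-dotenv package (pip install python-dotenv), which is not installed by the command in Step 1:
import os
from dotenv import load_dotenv

# Reads .env from the current directory and populates os.environ
load_dotenv()

assert os.getenv("CEREBRAS_API_KEY"), "CEREBRAS_API_KEY is not set"
assert os.getenv("ELEVENLABS_API_KEY"), "ELEVENLABS_API_KEY is not set"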

Step 3: Initialize the Cerebras client

Set up the Cerebras client using the OpenAI-compatible interface. The X-Cerebras-3rd-Party-Integration header, passed via extra_headers in the examples below, helps us track and optimize this integration:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1"
)
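
To confirm the key and endpoint work before wiring in ElevenLabs, you can make a quick test request. A minimal sketch using the client defined above:
# One short completion to verify connectivity and credentials
test = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Say hello."}],
    max_completion_tokens=10,
)
print(test.choices[0].message.content)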

Step 4: Create a basic text-to-speech pipeline

Now let’s create a complete pipeline that generates text with Cerebras and converts it to speech with ElevenLabs. This example combines Cerebras’ fast inference with ElevenLabs’ natural voice synthesis:
import os
from openai import OpenAI
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

# Initialize Cerebras client
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1"
)

# Initialize ElevenLabs client
elevenlabs_client = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY")
)

def generate_and_speak(prompt, voice_id="21m00Tcm4TlvDq8ikWAM"):
    """
    Generate text response using Cerebras and convert to speech with ElevenLabs.
    
    Args:
        prompt: User input text
        voice_id: ElevenLabs voice ID (default is Rachel)
    """
    # Generate text response with Cerebras
    response = cerebras_client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant. Keep responses concise and natural for voice output."},
            {"role": "user", "content": prompt}
        ],
        max_completion_tokens=150,
        temperature=0.7,
        extra_headers={
            "X-Cerebras-3rd-Party-Integration": "elevenlabs"
        }
    )
    
    text_response = response.choices[0].message.content
    print(f"Generated text: {text_response}")
    
    # Convert to speech with ElevenLabs
    audio = elevenlabs_client.text_to_speech.convert(
        text=text_response,
        voice_id=voice_id,
        model_id="eleven_multilingual_v2",
        output_format="mp3_44100_128"
    )
    
    # Buffer the audio bytes: convert() returns a generator that can
    # only be consumed once, so collect it before playing and saving
    audio_bytes = b"".join(audio)
    
    # Play the audio
    play(audio_bytes)
    
    # And save it to a file
    with open("output.mp3", "wb") as f:
        f.write(audio_bytes)
    
    return text_response

# Example usage
if __name__ == "__main__":
    generate_and_speak("Tell me an interesting fact about space exploration.")

Step 5: Build a conversational voice agent

For a more advanced use case, here’s how to build a multi-turn conversational agent that maintains context across multiple interactions:
import os
from openai import OpenAI
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

# Initialize Cerebras client
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1"
)

# Initialize ElevenLabs client
elevenlabs_client = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY")
)

class VoiceAgent:
    def __init__(self, system_prompt, voice_id="21m00Tcm4TlvDq8ikWAM"):
        self.conversation_history = [
            {"role": "system", "content": system_prompt}
        ]
        self.voice_id = voice_id
    
    def chat(self, user_input):
        """Process user input and generate voice response."""
        # Add user message to history
        self.conversation_history.append({
            "role": "user",
            "content": user_input
        })
        
        # Generate response with Cerebras
        response = cerebras_client.chat.completions.create(
            model="llama-3.3-70b",
            messages=self.conversation_history,
            max_completion_tokens=200,
            temperature=0.8,
            extra_headers={
                "X-Cerebras-3rd-Party-Integration": "elevenlabs"
            }
        )
        
        assistant_message = response.choices[0].message.content
        
        # Add assistant response to history
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })
        
        print(f"Assistant: {assistant_message}")
        
        # Convert to speech with ElevenLabs
        audio = elevenlabs_client.text_to_speech.convert(
            text=assistant_message,
            voice_id=self.voice_id,
            model_id="eleven_multilingual_v2",  # Use current supported free tier model
            output_format="mp3_44100_128"
        )
        
        # Play audio
        play(audio)
        
        return assistant_message

# Example: Customer service agent
agent = VoiceAgent(
    system_prompt="You are a friendly customer service representative. Be helpful, concise, and professional.",
    voice_id="21m00Tcm4TlvDq8ikWAM"  # Rachel voice
)

# Simulate conversation
agent.chat("Hi, I need help with my order.")
agent.chat("My order number is 12345.")
agent.chat("When will it arrive?")
This voice agent maintains conversation context, so follow-up questions like “When will it arrive?” resolve against earlier messages, and every reply is spoken aloud via ElevenLabs.
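
Note that conversation_history grows with every turn, which increases prompt size, latency, and token usage. One simple mitigation is to keep the system prompt plus only the most recent turns; a minimal sketch (the max_turns cutoff is an arbitrary illustration, not an SDK feature):
def trim_history(history, max_turns=10):
    """Keep the system prompt plus the last max_turns messages."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    return system + rest[-max_turns:]

# Inside VoiceAgent.chat, before calling the API:
# self.conversation_history = trim_history(self.conversation_history)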

Step 6: Stream responses for lower latency

For faster perceived response times, you can stream the Cerebras output so text appears token by token. The example below streams the text and synthesizes audio once the full response has arrived; the sketch after the code goes a step further and synthesizes sentence by sentence:
import os
from openai import OpenAI
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1"
)

elevenlabs_client = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY")
)

def streaming_voice_response(prompt):
    """Generate and speak response with streaming for minimal latency."""
    # Stream text from Cerebras
    response_stream = cerebras_client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        stream=True,
        max_completion_tokens=200,
        extra_headers={
            "X-Cerebras-3rd-Party-Integration": "elevenlabs"
        }
    )
    
    full_response = ""
    for chunk in response_stream:
        if chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_response += content
            print(content, end="", flush=True)
    print()  # Newline after text stream
    
    # Generate audio from ElevenLabs
    audio = elevenlabs_client.text_to_speech.convert(
        text=full_response,
        voice_id="21m00Tcm4TlvDq8ikWAM",
        model_id="eleven_multilingual_v2",
        output_format="mp3_44100_128"
    )
    
    # Play audio
    play(audio)
    
    return full_response

# Example usage
streaming_voice_response("Explain quantum computing in simple terms.")
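
The example above waits for the complete response before calling ElevenLabs. To start audio sooner, you can flush each completed sentence to text-to-speech while the rest of the text is still streaming. A rough sketch reusing the clients defined above; the regex-based sentence splitting is a naive heuristic of this guide, not part of either SDK:
import re

def streaming_sentence_tts(prompt, voice_id="21m00Tcm4TlvDq8ikWAM"):
    """Stream text from Cerebras and speak each completed sentence."""
    response_stream = cerebras_client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        max_completion_tokens=200,
        extra_headers={"X-Cerebras-3rd-Party-Integration": "elevenlabs"},
    )

    def speak(text):
        # Synthesize one chunk of text and play it back
        audio = elevenlabs_client.text_to_speech.convert(
            text=text,
            voice_id=voice_id,
            model_id="eleven_multilingual_v2",
            output_format="mp3_44100_128",
        )
        play(b"".join(audio))

    buffer = ""
    for chunk in response_stream:
        delta = chunk.choices[0].delta.content
        if not delta:
            continue
        buffer += delta
        # Naive boundary: split after ., ! or ? followed by whitespace
        parts = re.split(r"(?<=[.!?])\s+", buffer)
        for sentence in parts[:-1]:  # All but the last part are complete
            speak(sentence)
        buffer = parts[-1]

    if buffer.strip():  # Speak any trailing fragment
        speak(buffer)

Synthesizing sentence by sentence trades some prosody for responsiveness; overlapping synthesis with playback (for example, via a background queue) would reduce gaps between sentences further.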

Voice Selection

ElevenLabs offers a variety of pre-made voices. Here are some popular options:
  • Rachel (21m00Tcm4TlvDq8ikWAM) - Calm, professional female voice
  • Adam (pNInz6obpgDQGcFmaJgB) - Deep, authoritative male voice
  • Bella (EXAVITQu4vr4xnSDxMaL) - Soft, friendly female voice
  • Antoni (ErXwobaYiN019PkySvjV) - Well-rounded male voice
You can also create custom voices or clone voices using the ElevenLabs platform. Visit the ElevenLabs Voice Library to explore more options.
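
To see programmatically which voices your account can use, the SDK exposes a voices endpoint. A quick sketch; the exact method name (voices.get_all here) varies between SDK versions, with newer releases using a search-style method:
# List the voices available to your ElevenLabs account
voices = elevenlabs_client.voices.get_all()
for v in voices.voices:
    print(v.name, v.voice_id)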

Use Cases

The Cerebras + ElevenLabs integration is perfect for:
  • Voice Assistants - Build responsive AI assistants with natural conversation flow
  • Content Creation - Generate and narrate articles, stories, or educational content
  • Customer Service - Create automated voice support systems with human-like responses
  • Accessibility Tools - Convert text content to speech for visually impaired users
  • Interactive Experiences - Build voice-enabled games, tours, or educational apps
  • Podcast Generation - Automatically create podcast episodes from text content

FAQ

If you’re having trouble playing audio:
  1. Ensure you have audio output devices properly configured
  2. Try saving the audio to a file instead of playing directly:
# audio = elevenlabs_client.text_to_speech.convert(...) as in the examples above

# Save the audio to a file instead of playing it
with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
  3. Install additional audio libraries if needed: pip install sounddevice soundfile
If you encounter rate limiting:
  • Cerebras: Check your rate limits and consider upgrading your plan
  • ElevenLabs: Monitor your character quota in the ElevenLabs dashboard. Free tier has monthly limits.
To reduce latency:
  1. Use streaming for both text generation and audio synthesis (see Step 6)
  2. Keep responses concise by setting lower max_completion_tokens values
  3. Use faster Cerebras models like llama3.1-8b for simpler tasks
  4. Consider caching common responses, as sketched below
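
For that last point, a minimal in-memory cache that reuses synthesized audio for repeated responses (the dictionary cache is illustrative, not an SDK feature):
_audio_cache = {}  # Maps response text to synthesized audio bytes

def tts_cached(text, voice_id="21m00Tcm4TlvDq8ikWAM"):
    """Synthesize text with ElevenLabs, reusing cached audio for repeats."""
    if text not in _audio_cache:
        audio = elevenlabs_client.text_to_speech.convert(
            text=text,
            voice_id=voice_id,
            model_id="eleven_multilingual_v2",
            output_format="mp3_44100_128",
        )
        _audio_cache[text] = b"".join(audio)
    return _audio_cache[text]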
