Seb Duerr
January 23, 2026
This cookbook demonstrates how to build a realtime voice translation agent that:
  • Captures audio from your browser microphone
  • Transcribes speech using OpenAI Whisper
  • Translates text using a Cerebras LLM at ~80-150ms latency
  • Speaks the translation back using OpenAI TTS

What You’ll Learn

  1. LiveKit Agents Framework - Building voice AI agents with WebRTC
  2. Cerebras Integration - Using Cerebras LLMs via OpenAI-compatible API
  3. Voice Pipeline - Connecting STT → LLM → TTS for realtime conversation
  4. Jupyter Integration - Running voice agents inline with microphone widgets

Setup

Install Dependencies

%pip install -q "livekit-agents[openai,silero]>=1.3.0" python-dotenv

Load API Keys

Get API keys from Cerebras, OpenAI, and LiveKit Cloud, then add them to a .env file in your project root. For detailed LiveKit setup, see our LiveKit Integration Guide.
CEREBRAS_API_KEY=your-key-here
OPENAI_API_KEY=your-key-here
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-key-here
LIVEKIT_API_SECRET=your-key-here
import os
from dotenv import load_dotenv

load_dotenv()

required = ["OPENAI_API_KEY", "CEREBRAS_API_KEY", "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"]
missing = [k for k in required if not os.getenv(k)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}. Add them to your .env file.")

print("✅ API keys loaded")

Part 1: Translation Prompt

The translation prompt instructs the LLM to act as a real-time translator. It’s designed to be concise and focused on accurate translation without commentary.
TARGET_LANGUAGE = "Spanish"  # Change this to your desired target language

def get_translation_prompt(target_language: str) -> str:
    """Generate system prompt for translation."""
    return f"""You are a real-time translator. Your task is to translate spoken text to {target_language}.

Rules:
1. Translate the input text accurately to {target_language}
2. Preserve the tone and intent of the original message
3. Keep translations natural and conversational
4. If the input is already in {target_language}, repeat it with minor improvements if needed
5. Do NOT add explanations or commentary - just translate
6. Respond ONLY with the translated text"""

Why This Prompt Works

  • Single responsibility: The LLM only translates, no explanations
  • Tone preservation: Maintains the speaker’s intent
  • Edge case handling: Handles same-language input gracefully
  • Minimal latency: Short responses = faster TTS
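
You can sanity-check the prompt with a plain text completion before wiring up the voice pipeline. This is a minimal sketch that calls the same Cerebras OpenAI-compatible endpoint used later in this cookbook (the example sentence is arbitrary; the openai package is installed as part of the livekit-agents extras above).
import os
from openai import OpenAI

# Point the standard OpenAI client at the Cerebras OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)

completion = client.chat.completions.create(
    model="llama-3.1-8b",  # same model the voice agent uses below
    messages=[
        {"role": "system", "content": get_translation_prompt(TARGET_LANGUAGE)},
        {"role": "user", "content": "Good morning, where is the nearest train station?"},
    ],
)

print(completion.choices[0].message.content)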

Part 2: Configure the Voice Agent

LiveKit Agents provides a high-level API for building voice AI applications. We configure:
  • VAD (Voice Activity Detection): Silero VAD detects when the user is speaking
  • STT (Speech-to-Text): OpenAI Whisper transcribes audio
  • LLM: Cerebras for ultra-fast translation
  • TTS (Text-to-Speech): OpenAI TTS speaks the translation
import logging
import os
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    AgentSession,
)
from livekit.agents.voice import Agent as VoiceAgent
from livekit.plugins import openai, silero

# Suppress verbose logging
logging.getLogger("livekit").setLevel(logging.WARNING)
logging.getLogger("livekit.agents").setLevel(logging.WARNING)

Part 3: Create the Agent Entrypoint

The entrypoint function is called when a user joins the LiveKit room. It sets up the voice pipeline and starts the translation session.
async def entrypoint(ctx: JobContext):
    """LiveKit agent entrypoint."""
    # Connect to the room
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    
    print(f"🎤 Connected to room: {ctx.room.name}")
    
    # Create Cerebras LLM client (OpenAI-compatible)
    cerebras_llm = openai.LLM(
        model="llama-3.1-8b",
        base_url="https://api.cerebras.ai/v1",
        api_key=os.environ["CEREBRAS_API_KEY"],
        extra_headers={"X-Cerebras-3rd-Party-Integration": "realtime-translation"}
    )
    
    # Create voice agent with translation instructions
    agent = VoiceAgent(
        instructions=get_translation_prompt(TARGET_LANGUAGE),
    )
    
    # Create session with STT/LLM/TTS/VAD components
    session = AgentSession(
        vad=silero.VAD.load(),
        stt=openai.STT(model="whisper-1"),
        llm=cerebras_llm,
        tts=openai.TTS(model="tts-1", voice="alloy"),
    )
    
    # Start session with agent and room
    await session.start(agent=agent, room=ctx.room)
    
    # Greet the user
    await session.say(f"Translation agent ready. I will translate to {TARGET_LANGUAGE}. Please speak.")


print(f"✅ Agent defined. Target language: {TARGET_LANGUAGE}")

Key Components Explained

Component      | Purpose                                      | Provider
silero.VAD     | Detects when the user starts/stops speaking  | Silero
openai.STT     | Transcribes speech to text                   | OpenAI Whisper
cerebras_llm   | Translates text at 450+ tokens/sec           | Cerebras
openai.TTS     | Converts the translation to speech           | OpenAI
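
Every stage is swappable. As a rough sketch of how you might trade latency for quality, the session construction inside entrypoint could use a higher-fidelity TTS voice and a larger Cerebras model (the alternative model and voice names here are assumptions; check which ones your accounts expose before using them):
# Hypothetical alternative configuration - model/voice names may differ on your account
session = AgentSession(
    vad=silero.VAD.load(),
    stt=openai.STT(model="whisper-1"),
    llm=openai.LLM(
        model="llama-3.3-70b",  # larger Cerebras model: better quality, more latency
        base_url="https://api.cerebras.ai/v1",
        api_key=os.environ["CEREBRAS_API_KEY"],
    ),
    tts=openai.TTS(model="tts-1-hd", voice="nova"),  # higher-fidelity speech output
)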

Part 4: Run the Agent in Jupyter

LiveKit provides a Jupyter integration that displays an inline microphone widget. This allows you to test the agent directly in your notebook.
from livekit.agents import jupyter

# Run the agent inside the notebook
jupyter.run_app(
    WorkerOptions(entrypoint_fnc=entrypoint)
)

How It Works

  1. Widget appears: An embedded audio widget displays below the cell
  2. Microphone access: Browser requests microphone permission
  3. Speak: Your voice is captured and sent to the agent
  4. Translation: Cerebras translates in ~80-150ms
  5. Response: You hear the translation spoken back
Important: Run this notebook in a browser (not VS Code or other IDEs) for proper microphone access via the LiveKit widget.

Part 5: Supported Languages

The translation agent supports any language that Llama-3.1-8B can translate. Common options:
Language    | Code | Example Greeting
English     | en   | "Hello, how are you?"
Spanish     | es   | "¡Hola! ¿Cómo estás?"
German      | de   | "Hallo, wie geht es dir?"
French      | fr   | "Bonjour, comment allez-vous?"
Italian     | it   | "Ciao, come stai?"
Portuguese  | pt   | "Olá, como você está?"
Japanese    | ja   | "こんにちは、お元気ですか?"
Chinese     | zh   | "你好,你好吗?"
To change the target language, modify TARGET_LANGUAGE and re-run the agent cells.

Performance

Stage           | Latency
STT (Whisper)   | 200-500ms
LLM (Cerebras)  | 80-150ms
TTS (OpenAI)    | 100-300ms
Total           | 500-1200ms
Cerebras provides ~450 tokens/sec inference speed, enabling natural conversational translation with minimal perceived delay.
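
The LLM number is easy to verify on your own connection. Below is a rough timing sketch against the same Cerebras endpoint using the plain openai client; results depend on network distance and load, so treat it as a spot check rather than a benchmark.
import os
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)

start = time.perf_counter()
completion = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "system", "content": get_translation_prompt(TARGET_LANGUAGE)},
        {"role": "user", "content": "Could you tell me how to get to the museum?"},
    ],
)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Translation: {completion.choices[0].message.content}")
print(f"LLM round trip: {elapsed_ms:.0f}ms")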

Summary

What We Built

A realtime voice translation agent with:
  • LiveKit Agents for WebRTC voice handling
  • Cerebras Llama-3.1-8B for ultra-fast translation
  • OpenAI Whisper for accurate speech recognition
  • OpenAI TTS for natural speech synthesis
  • Jupyter integration for easy testing

Key Patterns

  1. OpenAI-Compatible API: Cerebras works with any OpenAI-compatible client
  2. Voice Pipeline: VAD → STT → LLM → TTS for seamless conversation
  3. Minimal Prompts: Short, focused prompts reduce latency
  4. Browser Integration: Jupyter widgets enable microphone access

Next Steps

  • Add language detection for automatic source language identification
  • Implement conversation history for context-aware translation
  • Add support for multiple simultaneous languages
  • Deploy as a standalone web application (see the worker sketch below)
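
For the standalone path, LiveKit Agents includes a CLI runner that reuses the same entrypoint outside Jupyter. A minimal sketch, assuming the code above is saved to a file such as translator.py and started with python translator.py dev:
from livekit.agents import cli, WorkerOptions

# Run the same entrypoint as a standalone worker instead of the Jupyter widget.
# Use "dev" for local development and "start" for production deployments.
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))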


Acknowledgements

Thank you to the Cerebras team—Ryan, Ryann, Zhenwei, and Neeraj—for their support and feedback during the development of this cookbook.