Seb Duerr
January 23, 2026
This cookbook demonstrates how to build a realtime voice translation agent that:
  • Captures audio from your browser microphone
  • Transcribes speech using OpenAI Whisper
  • Translates text using a Cerebras LLM at ~80-150ms latency
  • Speaks the translation back using OpenAI TTS

What You’ll Learn

  1. LiveKit Agents Framework - Building voice AI agents with WebRTC
  2. Cerebras Integration - Using Cerebras LLMs via OpenAI-compatible API
  3. Voice Pipeline - Connecting STT → LLM → TTS for realtime conversation
  4. Jupyter Integration - Running voice agents inline with microphone widgets

Setup

Install Dependencies

%pip install -q "livekit-agents[openai,silero]>=1.3.0" python-dotenv

Load API Keys

Get API keys from Cerebras, OpenAI, and LiveKit Cloud, then add them to a .env file in your project root. For detailed LiveKit setup, see our LiveKit Integration Guide.
CEREBRAS_API_KEY=your-key-here
OPENAI_API_KEY=your-key-here
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-key-here
LIVEKIT_API_SECRET=your-key-here
import os
from dotenv import load_dotenv

load_dotenv()

required = ["OPENAI_API_KEY", "CEREBRAS_API_KEY", "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"]
missing = [k for k in required if not os.getenv(k)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}. Add them to your .env file.")

print("✅ API keys loaded")

Part 1: Translation Prompt

The translation prompt instructs the LLM to act as a real-time translator. It’s designed to be concise and focused on accurate translation without commentary.
TARGET_LANGUAGE = "Spanish"  # Change this to your desired target language

def get_translation_prompt(target_language: str) -> str:
    """Generate system prompt for translation."""
    return f"""You are a real-time translator. Your task is to translate spoken text to {target_language}.

Rules:
1. Translate the input text accurately to {target_language}
2. Preserve the tone and intent of the original message
3. Keep translations natural and conversational
4. If the input is already in {target_language}, repeat it with minor improvements if needed
5. Do NOT add explanations or commentary - just translate
6. Respond ONLY with the translated text"""

Why This Prompt Works

  • Single responsibility: The LLM only translates, no explanations
  • Tone preservation: Maintains the speaker’s intent
  • Edge case handling: Handles same-language input gracefully
  • Minimal latency: Short responses = faster TTS
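
You can sanity-check the prompt with a plain text completion before wiring up the voice pipeline. This is a minimal sketch that calls the same Cerebras OpenAI-compatible endpoint used later in this cookbook (the example sentence is arbitrary; the openai package is installed as part of the livekit-agents extras above).
import os
from openai import OpenAI

# Point the standard OpenAI client at the Cerebras OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)

completion = client.chat.completions.create(
    model="llama-3.1-8b",  # same model the voice agent uses below
    messages=[
        {"role": "system", "content": get_translation_prompt(TARGET_LANGUAGE)},
        {"role": "user", "content": "Good morning, where is the nearest train station?"},
    ],
)

print(completion.choices[0].message.content)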

Part 2: Configure the Voice Agent

LiveKit Agents provides a high-level API for building voice AI applications. We configure:
  • VAD (Voice Activity Detection): Silero VAD detects when the user is speaking
  • STT (Speech-to-Text): OpenAI Whisper transcribes audio
  • LLM: Cerebras for ultra-fast translation
  • TTS (Text-to-Speech): OpenAI TTS speaks the translation
import logging
import os
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    AgentSession,
)
from livekit.agents.voice import Agent as VoiceAgent
from livekit.plugins import openai, silero

# Suppress verbose logging
logging.getLogger("livekit").setLevel(logging.WARNING)
logging.getLogger("livekit.agents").setLevel(logging.WARNING)

Part 3: Create the Agent Entrypoint

The entrypoint function is called when a user joins the LiveKit room. It sets up the voice pipeline and starts the translation session.
async def entrypoint(ctx: JobContext):
    """LiveKit agent entrypoint."""
    # Connect to the room
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    
    print(f"🎤 Connected to room: {ctx.room.name}")
    
    # Create Cerebras LLM client (OpenAI-compatible)
    cerebras_llm = openai.LLM(
        model="llama-3.1-8b",
        base_url="https://api.cerebras.ai/v1",
        api_key=os.environ["CEREBRAS_API_KEY"],
        extra_headers={"X-Cerebras-3rd-Party-Integration": "realtime-translation"}
    )
    
    # Create voice agent with translation instructions
    agent = VoiceAgent(
        instructions=get_translation_prompt(TARGET_LANGUAGE),
    )
    
    # Create session with STT/LLM/TTS/VAD components
    session = AgentSession(
        vad=silero.VAD.load(),
        stt=openai.STT(model="whisper-1"),
        llm=cerebras_llm,
        tts=openai.TTS(model="tts-1", voice="alloy"),
    )
    
    # Start session with agent and room
    await session.start(agent=agent, room=ctx.room)
    
    # Greet the user
    await session.say(f"Translation agent ready. I will translate to {TARGET_LANGUAGE}. Please speak.")


print(f"✅ Agent defined. Target language: {TARGET_LANGUAGE}")

Key Components Explained

Component      | Purpose                                      | Provider
silero.VAD     | Detects when the user starts/stops speaking  | Silero
openai.STT     | Transcribes speech to text                   | OpenAI Whisper
cerebras_llm   | Translates text at 450+ tokens/sec           | Cerebras
openai.TTS     | Converts the translation to speech           | OpenAI
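
Every stage is swappable. As a rough sketch of how you might trade latency for quality, the session construction inside entrypoint could use a higher-fidelity TTS voice and a larger Cerebras model (the alternative model and voice names here are assumptions; check which ones your accounts expose before using them):
# Hypothetical alternative configuration - model/voice names may differ on your account
session = AgentSession(
    vad=silero.VAD.load(),
    stt=openai.STT(model="whisper-1"),
    llm=openai.LLM(
        model="llama-3.3-70b",  # larger Cerebras model: better quality, more latency
        base_url="https://api.cerebras.ai/v1",
        api_key=os.environ["CEREBRAS_API_KEY"],
    ),
    tts=openai.TTS(model="tts-1-hd", voice="nova"),  # higher-fidelity speech output
)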

Part 4: Run the Agent in Jupyter

LiveKit provides a Jupyter integration that displays an inline microphone widget. This allows you to test the agent directly in your notebook.
from livekit.agents import jupyter

# Run the agent inside the notebook
jupyter.run_app(
    WorkerOptions(entrypoint_fnc=entrypoint)
)

How It Works

  1. Widget appears: An embedded audio widget displays below the cell
  2. Microphone access: Browser requests microphone permission
  3. Speak: Your voice is captured and sent to the agent
  4. Translation: Cerebras translates in ~80-150ms
  5. Response: You hear the translation spoken back
Important: Run this notebook in a browser (not VS Code or other IDEs) for proper microphone access via the LiveKit widget.

Part 5: Supported Languages

The translation agent supports any language that Llama-3.1-8B can translate. Common options:
Language    | Code | Example Greeting
English     | en   | "Hello, how are you?"
Spanish     | es   | "¡Hola! ¿Cómo estás?"
German      | de   | "Hallo, wie geht es dir?"
French      | fr   | "Bonjour, comment allez-vous?"
Italian     | it   | "Ciao, come stai?"
Portuguese  | pt   | "Olá, como você está?"
Japanese    | ja   | "こんにちは、お元気ですか?"
Chinese     | zh   | "你好,你好吗?"
To change the target language, modify TARGET_LANGUAGE and re-run the agent cells.

Performance

Stage           | Latency
STT (Whisper)   | 200-500ms
LLM (Cerebras)  | 80-150ms
TTS (OpenAI)    | 100-300ms
Total           | 500-1200ms
Cerebras provides ~450 tokens/sec inference speed, enabling natural conversational translation with minimal perceived delay.
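
The LLM number is easy to verify on your own connection. Below is a rough timing sketch against the same Cerebras endpoint using the plain openai client; results depend on network distance and load, so treat it as a spot check rather than a benchmark.
import os
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)

start = time.perf_counter()
completion = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "system", "content": get_translation_prompt(TARGET_LANGUAGE)},
        {"role": "user", "content": "Could you tell me how to get to the museum?"},
    ],
)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Translation: {completion.choices[0].message.content}")
print(f"LLM round trip: {elapsed_ms:.0f}ms")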

Summary

What We Built

A realtime voice translation agent with:
  • LiveKit Agents for WebRTC voice handling
  • Cerebras Llama-3.1-8B for ultra-fast translation
  • OpenAI Whisper for accurate speech recognition
  • OpenAI TTS for natural speech synthesis
  • Jupyter integration for easy testing

Key Patterns

  1. OpenAI-Compatible API: Cerebras works with any OpenAI-compatible client
  2. Voice Pipeline: VAD → STT → LLM → TTS for seamless conversation
  3. Minimal Prompts: Short, focused prompts reduce latency
  4. Browser Integration: Jupyter widgets enable microphone access

Next Steps

  • Add language detection for automatic source language identification
  • Implement conversation history for context-aware translation
  • Add support for multiple simultaneous languages
  • Deploy as a standalone web application (see the worker sketch below)
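
For the standalone path, LiveKit Agents includes a CLI runner that reuses the same entrypoint outside Jupyter. A minimal sketch, assuming the code above is saved to a file such as translator.py and started with python translator.py dev:
from livekit.agents import cli, WorkerOptions

# Run the same entrypoint as a standalone worker instead of the Jupyter widget.
# Use "dev" for local development and "start" for production deployments.
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))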


Acknowledgements

Thank you to the Cerebras team—Ryan, Ryann, Zhenwei, and Neeraj—for their support and feedback during the development of this cookbook.