Seb Duerr
January 23, 2026
In this tutorial, you'll build a realtime voice translation agent that:
- Captures audio from your browser microphone
- Transcribes speech using OpenAI Whisper
- Translates text using Cerebras LLM at ~80-150ms latency
- Speaks the translation back using OpenAI TTS
What You’ll Learn
- LiveKit Agents Framework - Building voice AI agents with WebRTC
- Cerebras Integration - Using Cerebras LLMs via OpenAI-compatible API
- Voice Pipeline - Connecting STT → LLM → TTS for realtime conversation
- Jupyter Integration - Running voice agents inline with microphone widgets
Setup
Install Dependencies
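The notebook needs the LiveKit Agents framework with the OpenAI and Silero plugins, plus the OpenAI SDK. A minimal install cell might look like this (the extras syntax assumes the current livekit-agents plugin layout):

```python
# Install the voice agent stack; run once per environment.
%pip install "livekit-agents[openai,silero]" openai python-dotenv
```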
Load API Keys
Get API keys to get started:
- Cerebras: https://cloud.cerebras.ai (free tier available)
- OpenAI: https://platform.openai.com (for Whisper STT and TTS)
- LiveKit: https://cloud.livekit.io (free tier available)
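With the keys in hand, load them into the environment. Here's a minimal sketch using python-dotenv; the LiveKit variable names follow its standard LIVEKIT_URL / LIVEKIT_API_KEY / LIVEKIT_API_SECRET convention, and CEREBRAS_API_KEY is the name the later sketches assume:

```python
import os
from dotenv import load_dotenv

# Reads a .env file containing the five keys below.
load_dotenv()

for key in ("CEREBRAS_API_KEY", "OPENAI_API_KEY",
            "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"):
    assert os.environ.get(key), f"Missing {key} - add it to your .env file"
```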
Part 1: Translation Prompt
The translation prompt instructs the LLM to act as a real-time translator. It’s designed to be concise and focused on accurate translation without commentary; a minimal sketch appears after the list below.
Why This Prompt Works
- Single responsibility: The LLM only translates, no explanations
- Tone preservation: Maintains the speaker’s intent
- Edge case handling: Handles same-language input gracefully
- Minimal latency: Short responses = faster TTS
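Here is one way such a prompt could look. TARGET_LANGUAGE and TRANSLATION_PROMPT are illustrative names rather than the notebook's exact code:

```python
TARGET_LANGUAGE = "Spanish"  # see Part 5 for other options

# Short, single-purpose prompt: translate only, never explain.
TRANSLATION_PROMPT = f"""You are a real-time translator.
Translate everything the user says into {TARGET_LANGUAGE}.
Preserve the speaker's tone and intent.
Respond with the translation only - no explanations or commentary.
If the input is already in {TARGET_LANGUAGE}, repeat it unchanged."""
```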
Part 2: Configure the Voice Agent
LiveKit Agents provides a high-level API for building voice AI applications. We configure four components (a configuration sketch follows the list):
- VAD (Voice Activity Detection): Silero VAD detects when the user is speaking
- STT (Speech-to-Text): OpenAI Whisper transcribes audio
- LLM: Cerebras for ultra-fast translation
- TTS (Text-to-Speech): OpenAI TTS speaks the translation
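A sketch of this configuration, assuming the livekit.plugins packages installed above; the endpoint and model name (https://api.cerebras.ai/v1, llama3.1-8b) follow Cerebras' OpenAI-compatible API:

```python
import os
from livekit.plugins import openai, silero

# Cerebras exposes an OpenAI-compatible endpoint, so the OpenAI
# LLM plugin works by pointing base_url at Cerebras.
cerebras_llm = openai.LLM(
    model="llama3.1-8b",
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)

vad = silero.VAD.load()  # detects when the user starts/stops speaking
stt = openai.STT()       # Whisper transcription
tts = openai.TTS()       # speaks the translation back
```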
Part 3: Create the Agent Entrypoint
The entrypoint function is called when a user joins the LiveKit room. It sets up the voice pipeline and starts the translation session; a sketch of the function follows the table below.
Key Components Explained
| Component | Purpose | Provider |
|---|---|---|
| silero.VAD | Detects when user starts/stops speaking | Silero |
| openai.STT | Transcribes speech to text | OpenAI Whisper |
| cerebras_llm | Translates text at 450+ tokens/sec | Cerebras |
| openai.TTS | Converts translation to speech | OpenAI |
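A sketch of the entrypoint, assuming the AgentSession API from livekit-agents 1.x and reusing the components and TRANSLATION_PROMPT from the earlier sketches:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession

async def entrypoint(ctx: agents.JobContext):
    # Join the room the user connected to.
    await ctx.connect()

    # Wire the pipeline: VAD -> STT -> LLM -> TTS.
    session = AgentSession(vad=vad, stt=stt, llm=cerebras_llm, tts=tts)

    # The translation prompt is the agent's only instruction.
    await session.start(
        room=ctx.room,
        agent=Agent(instructions=TRANSLATION_PROMPT),
    )
```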
Part 4: Run the Agent in Jupyter
LiveKit provides a Jupyter integration that displays an inline microphone widget. This allows you to test the agent directly in your notebook; a launch sketch follows the list below.
How It Works
- Widget appears: An embedded audio widget displays below the cell
- Microphone access: Browser requests microphone permission
- Speak: Your voice is captured and sent to the agent
- Translation: Cerebras translates in ~80-150ms
- Response: You hear the translation spoken back
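A sketch of the launch cell, assuming the jupyter helper shipped with recent livekit-agents releases (the exact helper may differ across versions):

```python
from livekit.agents import WorkerOptions, jupyter

# Starts a worker for the entrypoint above and renders the
# inline microphone widget below the cell.
jupyter.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```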
Important: Run this notebook in a browser (not VS Code or other IDEs) for proper microphone access via the LiveKit widget.
Part 5: Supported Languages
The translation agent supports any language that Llama-3.1-8B can translate. Common options:
| Language | Code | Example Greeting |
|---|---|---|
| English | en | "Hello, how are you?" |
| Spanish | es | "¡Hola! ¿Cómo estás?" |
| German | de | "Hallo, wie geht es dir?" |
| French | fr | "Bonjour, comment allez-vous?" |
| Italian | it | "Ciao, come stai?" |
| Portuguese | pt | "Olá, como você está?" |
| Japanese | ja | "こんにちは、お元気ですか?" |
| Chinese | zh | "你好，你好吗？" |
To change the target language, update TARGET_LANGUAGE and re-run the agent cells.
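For example, to translate into German instead (using the illustrative TARGET_LANGUAGE variable from Part 1):

```python
TARGET_LANGUAGE = "German"  # then re-run the prompt and agent cells
```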
Performance
| Stage | Latency |
|---|---|
| STT (Whisper) | 200-500ms |
| LLM (Cerebras) | 80-150ms |
| TTS (OpenAI) | 100-300ms |
| Total | 500-1200ms |
Summary
What We Built
A realtime voice translation agent with:
- LiveKit Agents for WebRTC voice handling
- Cerebras Llama-3.1-8B for ultra-fast translation
- OpenAI Whisper for accurate speech recognition
- OpenAI TTS for natural speech synthesis
- Jupyter integration for easy testing
Key Patterns
- OpenAI-Compatible API: Cerebras works with any OpenAI-compatible client (see the sketch after this list)
- Voice Pipeline: VAD → STT → LLM → TTS for seamless conversation
- Minimal Prompts: Short, focused prompts reduce latency
- Browser Integration: Jupyter widgets enable microphone access
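The first pattern is worth a concrete illustration: the stock openai Python client works against Cerebras simply by swapping the base URL (model name llama3.1-8b assumed, as above):

```python
import os
from openai import OpenAI

# The standard OpenAI client, pointed at Cerebras' endpoint.
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)

resp = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[{"role": "user", "content": "Translate to Spanish: Good morning!"}],
)
print(resp.choices[0].message.content)
```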
Next Steps
- Add language detection for automatic source language identification
- Implement conversation history for context-aware translation
- Add support for multiple simultaneous languages
- Deploy as a standalone web application

