; }; export const AuthorBlock = ({name, title, date, githubUrl}) => { return

{name} {title && <>
{title}} {date && <>
{date}}

; }; This cookbook demonstrates how to build a realtime voice translation agent that: * Captures audio from your browser microphone * Transcribes speech using OpenAI Whisper * Translates text using Cerebras LLM at \~80-150ms latency * Speaks the translation back using OpenAI TTS ## What You'll Learn 1. **LiveKit Agents Framework** - Building voice AI agents with WebRTC 2. **Cerebras Integration** - Using Cerebras LLMs via OpenAI-compatible API 3. **Voice Pipeline** - Connecting STT → LLM → TTS for realtime conversation 4. **Jupyter Integration** - Running voice agents inline with microphone widgets ## Setup ### Install Dependencies ```python theme={null} %pip install -q "livekit-agents[openai,silero]>=1.3.0" python-dotenv ``` ### Load API Keys Get API keys to get started: * **Cerebras**: [https://cloud.cerebras.ai](https://cloud.cerebras.ai?utm_source=3pi_realtime-voice-translation\&utm_campaign=docs) (free tier available) * **OpenAI**: [https://platform.openai.com](https://platform.openai.com) (for Whisper STT and TTS) * **LiveKit**: [https://cloud.livekit.io](https://cloud.livekit.io) (free tier available) For detailed LiveKit setup, see our [LiveKit Integration Guide](/integrations/livekit). ```bash theme={null} CEREBRAS_API_KEY=your-key-here OPENAI_API_KEY=your-key-here LIVEKIT_URL=wss://your-project.livekit.cloud LIVEKIT_API_KEY=your-key-here LIVEKIT_API_SECRET=your-key-here ``` ```python theme={null} import os from dotenv import load_dotenv load_dotenv() required = ["OPENAI_API_KEY", "CEREBRAS_API_KEY", "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"] missing = [k for k in required if not os.getenv(k)] if missing: raise RuntimeError(f"Missing API keys: {', '.join(missing)}. Add them to .env file.") print("✅ API keys loaded") ``` ## Part 1: Translation Prompt The translation prompt instructs the LLM to act as a real-time translator. It's designed to be concise and focused on accurate translation without commentary. ```python theme={null} TARGET_LANGUAGE = "Spanish" # Change this to your desired target language def get_translation_prompt(target_language: str) -> str: """Generate system prompt for translation.""" return f"""You are a real-time translator. Your task is to translate spoken text to {target_language}. Rules: 1. Translate the input text accurately to {target_language} 2. Preserve the tone and intent of the original message 3. Keep translations natural and conversational 4. If the input is already in {target_language}, repeat it with minor improvements if needed 5. Do NOT add explanations or commentary - just translate 6. Respond ONLY with the translated text""" ``` ### Why This Prompt Works * **Single responsibility**: The LLM only translates, no explanations * **Tone preservation**: Maintains the speaker's intent * **Edge case handling**: Handles same-language input gracefully * **Minimal latency**: Short responses = faster TTS ## Part 2: Configure the Voice Agent LiveKit Agents provides a high-level API for building voice AI applications. We configure: * **VAD (Voice Activity Detection)**: Silero VAD detects when the user is speaking * **STT (Speech-to-Text)**: OpenAI Whisper transcribes audio * **LLM**: Cerebras for ultra-fast translation * **TTS (Text-to-Speech)**: OpenAI TTS speaks the translation ```python theme={null} import logging import os from livekit.agents import ( AutoSubscribe, JobContext, WorkerOptions, AgentSession, ) from livekit.agents.voice import Agent as VoiceAgent from livekit.plugins import openai, silero # Suppress verbose logging logging.getLogger("livekit").setLevel(logging.WARNING) logging.getLogger("livekit.agents").setLevel(logging.WARNING) ``` ## Part 3: Create the Agent Entrypoint The entrypoint function is called when a user joins the LiveKit room. It sets up the voice pipeline and starts the translation session. ```python theme={null} async def entrypoint(ctx: JobContext): """LiveKit agent entrypoint.""" # Connect to the room await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY) print(f"🎤 Connected to room: {ctx.room.name}") # Create Cerebras LLM client (OpenAI-compatible) cerebras_llm = openai.LLM( model="llama-3.1-8b", base_url="https://api.cerebras.ai/v1", api_key=os.environ["CEREBRAS_API_KEY"], extra_headers={"X-Cerebras-3rd-Party-Integration": "realtime-translation"} ) # Create voice agent with translation instructions agent = VoiceAgent( instructions=get_translation_prompt(TARGET_LANGUAGE), ) # Create session with STT/LLM/TTS/VAD components session = AgentSession( vad=silero.VAD.load(), stt=openai.STT(model="whisper-1"), llm=cerebras_llm, tts=openai.TTS(model="tts-1", voice="alloy"), ) # Start session with agent and room await session.start(agent=agent, room=ctx.room) # Greet the user await session.say(f"Translation agent ready. I will translate to {TARGET_LANGUAGE}. Please speak.") print(f"✅ Agent defined. Target language: {TARGET_LANGUAGE}") ``` ### Key Components Explained | Component | Purpose | Provider | | -------------- | --------------------------------------- | -------------- | | `silero.VAD` | Detects when user starts/stops speaking | Silero | | `openai.STT` | Transcribes speech to text | OpenAI Whisper | | `cerebras_llm` | Translates text at 450+ tokens/sec | Cerebras | | `openai.TTS` | Converts translation to speech | OpenAI | ## Part 4: Run the Agent in Jupyter LiveKit provides a Jupyter integration that displays an inline microphone widget. This allows you to test the agent directly in your notebook. ```python theme={null} from livekit.agents import jupyter # Run the agent inside the notebook jupyter.run_app( WorkerOptions(entrypoint_fnc=entrypoint) ) ``` ### How It Works 1. **Widget appears**: An embedded audio widget displays below the cell 2. **Microphone access**: Browser requests microphone permission 3. **Speak**: Your voice is captured and sent to the agent 4. **Translation**: Cerebras translates in \~80-150ms 5. **Response**: You hear the translation spoken back **Important**: Run this notebook in a browser (not VS Code or other IDEs) for proper microphone access via the LiveKit widget. ## Part 5: Supported Languages The translation agent supports any language that Llama-3.1-8B can translate. Common options: | Language | Code | Example Greeting | | ---------- | ---- | ------------------------------ | | English | en | "Hello, how are you?" | | Spanish | es | "¡Hola! ¿Cómo estás?" | | German | de | "Hallo, wie geht es dir?" | | French | fr | "Bonjour, comment allez-vous?" | | Italian | it | "Ciao, come stai?" | | Portuguese | pt | "Olá, como você está?" | | Japanese | ja | "こんにちは、お元気ですか？" | | Chinese | zh | "你好，你好吗？" | To change the target language, modify `TARGET_LANGUAGE` and re-run the agent cells. ## Performance | Stage | Latency | | -------------- | -------------- | | STT (Whisper) | 200-500ms | | LLM (Cerebras) | 80-150ms | | TTS (OpenAI) | 100-300ms | | **Total** | **500-1200ms** | Cerebras provides \~450 tokens/sec inference speed, enabling natural conversational translation with minimal perceived delay. ## Summary ### What We Built A **realtime voice translation agent** with: * **LiveKit Agents** for WebRTC voice handling * **Cerebras Llama-3.1-8B** for ultra-fast translation * **OpenAI Whisper** for accurate speech recognition * **OpenAI TTS** for natural speech synthesis * **Jupyter integration** for easy testing ### Key Patterns 1. **OpenAI-Compatible API**: Cerebras works with any OpenAI-compatible client 2. **Voice Pipeline**: VAD → STT → LLM → TTS for seamless conversation 3. **Minimal Prompts**: Short, focused prompts reduce latency 4. **Browser Integration**: Jupyter widgets enable microphone access ### Next Steps * Add language detection for automatic source language identification * Implement conversation history for context-aware translation * Add support for multiple simultaneous languages * Deploy as a standalone web application ### Resources * [Cerebras Inference Docs](https://inference-docs.cerebras.ai) * [LiveKit Agents Docs](https://docs.livekit.io/agents/) * [LiveKit Integration Guide](/integrations/livekit) ### Acknowledgements Thank you to the Cerebras team—Ryan, Ryann, Zhenwei, and Neeraj—for their support and feedback during the development of this cookbook.