> ## Documentation Index
> Fetch the complete documentation index at: https://inference-docs.cerebras.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Realtime Voice Translation Agent

> Translate spoken conversations to any language with sub-second latency by building a realtime voice translation agent powered by Cerebras and LiveKit.

export const CookbookLayout = () => {
  return <div className="full-width-layout" />;
};

export const AuthorBlock = ({name, title, date, githubUrl}) => {
  return <div style={{
    display: 'flex',
    alignItems: 'center',
    justifyContent: 'space-between',
    marginBottom: '1rem',
    marginTop: '1rem',
    gap: '1rem'
  }}>
      <div>
        <strong>{name}</strong>
        {title && <><br /><span style={{
    fontSize: '0.9em',
    color: '#666'
  }}>{title}</span></>}
        {date && <><br /><span style={{
    fontSize: '0.85em',
    color: '#888'
  }}>{date}</span></>}
      </div>

      {githubUrl && <a href={githubUrl} target="_blank" rel="noopener noreferrer" className="github-button" style={{
    display: 'flex',
    alignItems: 'center',
    gap: '0.5rem',
    padding: '0.5rem 0.75rem',
    textDecoration: 'none',
    borderRadius: '6px',
    fontSize: '0.875rem',
    fontWeight: '500',
    transition: 'all 0.2s'
  }}>
          <svg width="16" height="16" viewBox="0 0 16 16" fill="currentColor" style={{
    flexShrink: 0
  }}>
            <path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0016 8c0-4.42-3.58-8-8-8z" />
          </svg>
          Open in Github
        </a>}
    </div>;
};

<CookbookLayout />

<AuthorBlock name="Seb Duerr" date="January 23, 2026" githubUrl="https://github.com/Cerebras/Cerebras-Inference-Cookbook/blob/main/agents/realtime-voice-translation.ipynb" />

This cookbook demonstrates how to build a realtime voice translation agent that:

* Captures audio from your browser microphone
* Transcribes speech using OpenAI Whisper
* Translates text using Cerebras LLM at \~80-150ms latency
* Speaks the translation back using OpenAI TTS

## What You'll Learn

1. **LiveKit Agents Framework** - Building voice AI agents with WebRTC
2. **Cerebras Integration** - Using Cerebras LLMs via OpenAI-compatible API
3. **Voice Pipeline** - Connecting STT → LLM → TTS for realtime conversation
4. **Jupyter Integration** - Running voice agents inline with microphone widgets

## Setup

### Install Dependencies

```python theme={null}
%pip install -q "livekit-agents[openai,silero]>=1.3.0" python-dotenv
```

### Load API Keys

Get API keys to get started:

* **Cerebras**: [https://cloud.cerebras.ai](https://cloud.cerebras.ai?utm_source=3pi_realtime-voice-translation\&utm_campaign=docs) (free tier available)
* **OpenAI**: [https://platform.openai.com](https://platform.openai.com) (for Whisper STT and TTS)
* **LiveKit**: [https://cloud.livekit.io](https://cloud.livekit.io) (free tier available)

For detailed LiveKit setup, see our [LiveKit Integration Guide](/integrations/livekit).

```bash theme={null}
CEREBRAS_API_KEY=your-key-here
OPENAI_API_KEY=your-key-here
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-key-here
LIVEKIT_API_SECRET=your-key-here
```

```python theme={null}
import os
from dotenv import load_dotenv

load_dotenv()

required = ["OPENAI_API_KEY", "CEREBRAS_API_KEY", "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"]
missing = [k for k in required if not os.getenv(k)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}. Add them to .env file.")

print("✅ API keys loaded")
```

## Part 1: Translation Prompt

The translation prompt instructs the LLM to act as a real-time translator. It's designed to be concise and focused on accurate translation without commentary.

```python theme={null}
TARGET_LANGUAGE = "Spanish"  # Change this to your desired target language

def get_translation_prompt(target_language: str) -> str:
    """Generate system prompt for translation."""
    return f"""You are a real-time translator. Your task is to translate spoken text to {target_language}.

Rules:
1. Translate the input text accurately to {target_language}
2. Preserve the tone and intent of the original message
3. Keep translations natural and conversational
4. If the input is already in {target_language}, repeat it with minor improvements if needed
5. Do NOT add explanations or commentary - just translate
6. Respond ONLY with the translated text"""
```

### Why This Prompt Works

* **Single responsibility**: The LLM only translates, no explanations
* **Tone preservation**: Maintains the speaker's intent
* **Edge case handling**: Handles same-language input gracefully
* **Minimal latency**: Short responses = faster TTS

## Part 2: Configure the Voice Agent

LiveKit Agents provides a high-level API for building voice AI applications. We configure:

* **VAD (Voice Activity Detection)**: Silero VAD detects when the user is speaking
* **STT (Speech-to-Text)**: OpenAI Whisper transcribes audio
* **LLM**: Cerebras for ultra-fast translation
* **TTS (Text-to-Speech)**: OpenAI TTS speaks the translation

```python theme={null}
import logging
import os
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    AgentSession,
)
from livekit.agents.voice import Agent as VoiceAgent
from livekit.plugins import openai, silero

# Suppress verbose logging
logging.getLogger("livekit").setLevel(logging.WARNING)
logging.getLogger("livekit.agents").setLevel(logging.WARNING)
```

## Part 3: Create the Agent Entrypoint

The entrypoint function is called when a user joins the LiveKit room. It sets up the voice pipeline and starts the translation session.

```python theme={null}
async def entrypoint(ctx: JobContext):
    """LiveKit agent entrypoint."""
    # Connect to the room
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    
    print(f"🎤 Connected to room: {ctx.room.name}")
    
    # Create Cerebras LLM client (OpenAI-compatible)
    cerebras_llm = openai.LLM(
        model="llama-3.1-8b",
        base_url="https://api.cerebras.ai/v1",
        api_key=os.environ["CEREBRAS_API_KEY"],
        extra_headers={"X-Cerebras-3rd-Party-Integration": "realtime-translation"}
    )
    
    # Create voice agent with translation instructions
    agent = VoiceAgent(
        instructions=get_translation_prompt(TARGET_LANGUAGE),
    )
    
    # Create session with STT/LLM/TTS/VAD components
    session = AgentSession(
        vad=silero.VAD.load(),
        stt=openai.STT(model="whisper-1"),
        llm=cerebras_llm,
        tts=openai.TTS(model="tts-1", voice="alloy"),
    )
    
    # Start session with agent and room
    await session.start(agent=agent, room=ctx.room)
    
    # Greet the user
    await session.say(f"Translation agent ready. I will translate to {TARGET_LANGUAGE}. Please speak.")


print(f"✅ Agent defined. Target language: {TARGET_LANGUAGE}")
```

### Key Components Explained

| Component      | Purpose                                 | Provider       |
| -------------- | --------------------------------------- | -------------- |
| `silero.VAD`   | Detects when user starts/stops speaking | Silero         |
| `openai.STT`   | Transcribes speech to text              | OpenAI Whisper |
| `cerebras_llm` | Translates text at 450+ tokens/sec      | Cerebras       |
| `openai.TTS`   | Converts translation to speech          | OpenAI         |

## Part 4: Run the Agent in Jupyter

LiveKit provides a Jupyter integration that displays an inline microphone widget. This allows you to test the agent directly in your notebook.

```python theme={null}
from livekit.agents import jupyter

# Run the agent inside the notebook
jupyter.run_app(
    WorkerOptions(entrypoint_fnc=entrypoint)
)
```

### How It Works

1. **Widget appears**: An embedded audio widget displays below the cell
2. **Microphone access**: Browser requests microphone permission
3. **Speak**: Your voice is captured and sent to the agent
4. **Translation**: Cerebras translates in \~80-150ms
5. **Response**: You hear the translation spoken back

<Note>
  **Important**: Run this notebook in a browser (not VS Code or other IDEs) for proper microphone access via the LiveKit widget.
</Note>

## Part 5: Supported Languages

The translation agent supports any language that Llama-3.1-8B can translate. Common options:

| Language   | Code | Example Greeting               |
| ---------- | ---- | ------------------------------ |
| English    | en   | "Hello, how are you?"          |
| Spanish    | es   | "¡Hola! ¿Cómo estás?"          |
| German     | de   | "Hallo, wie geht es dir?"      |
| French     | fr   | "Bonjour, comment allez-vous?" |
| Italian    | it   | "Ciao, come stai?"             |
| Portuguese | pt   | "Olá, como você está?"         |
| Japanese   | ja   | "こんにちは、お元気ですか？"                |
| Chinese    | zh   | "你好，你好吗？"                      |

To change the target language, modify `TARGET_LANGUAGE` and re-run the agent cells.

## Performance

| Stage          | Latency        |
| -------------- | -------------- |
| STT (Whisper)  | 200-500ms      |
| LLM (Cerebras) | 80-150ms       |
| TTS (OpenAI)   | 100-300ms      |
| **Total**      | **500-1200ms** |

Cerebras provides \~450 tokens/sec inference speed, enabling natural conversational translation with minimal perceived delay.

## Summary

### What We Built

A **realtime voice translation agent** with:

* **LiveKit Agents** for WebRTC voice handling
* **Cerebras Llama-3.1-8B** for ultra-fast translation
* **OpenAI Whisper** for accurate speech recognition
* **OpenAI TTS** for natural speech synthesis
* **Jupyter integration** for easy testing

### Key Patterns

1. **OpenAI-Compatible API**: Cerebras works with any OpenAI-compatible client
2. **Voice Pipeline**: VAD → STT → LLM → TTS for seamless conversation
3. **Minimal Prompts**: Short, focused prompts reduce latency
4. **Browser Integration**: Jupyter widgets enable microphone access

### Next Steps

* Add language detection for automatic source language identification
* Implement conversation history for context-aware translation
* Add support for multiple simultaneous languages
* Deploy as a standalone web application

### Resources

* [Cerebras Inference Docs](https://inference-docs.cerebras.ai)
* [LiveKit Agents Docs](https://docs.livekit.io/agents/)
* [LiveKit Integration Guide](/integrations/livekit)

### Acknowledgements

Thank you to the Cerebras team—Ryan, Ryann, Zhenwei, and Neeraj—for their support and feedback during the development of this cookbook.
