Cartesia is a voice AI platform that provides ultra-realistic, real-time Text-to-Speech (TTS) models with industry-leading latency. By combining Cerebras Inference’s lightning-fast LLM responses with Cartesia’s natural-sounding voice synthesis, you can build highly responsive voice agents and conversational AI applications. This guide will walk you through integrating Cerebras models with Cartesia Line to create a complete voice AI pipeline.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Sign up and get a free API key from the Cerebras Inference platform.
  • Cartesia API Key - Visit Cartesia and create an account. Navigate to your profile or project settings to generate an API key.
  • Python 3.10 or higher - Required for running the integration code.
  • Cartesia Line environment - Install and configure the Cartesia CLI and Line SDK following the official Line getting-started guide.

Installation and Setup

1

Install required dependencies

Install the Cartesia Line SDK. It includes LiteLLM (used internally for LLM routing) as a dependency:
pip install cartesia-line python-dotenv
  • cartesia-line is the official Cartesia Line SDK for building production voice agents. It bundles LiteLLM for calling LLM providers like Cerebras.
  • python-dotenv is optional but convenient for loading environment variables from a .env file.
2

Configure environment variables

Create a .env file in your project directory to securely store your API keys:
CEREBRAS_API_KEY=your-cerebras-api-key-here
CARTESIA_API_KEY=your-cartesia-api-key-here
Alternatively, you can set these as environment variables in your shell:
export CEREBRAS_API_KEY="your-cerebras-api-key-here"
export CARTESIA_API_KEY="your-cartesia-api-key-here"
Cartesia Line uses your CARTESIA_API_KEY for audio orchestration, and LiteLLM (bundled with Line) uses CEREBRAS_API_KEY to call Cerebras models.
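Because a missing key surfaces only later as a confusing auth error, it can help to check both variables up front. The helper below is a minimal sketch (the `missing_keys` function is illustrative, not part of any SDK):

```python
import os


def missing_keys(env, required=("CEREBRAS_API_KEY", "CARTESIA_API_KEY")):
    """Return the names of required API keys that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


# Check the real environment; warn rather than raise so imports stay safe.
absent = missing_keys(os.environ)
if absent:
    print("Warning: missing environment variables:", ", ".join(absent))
```

Run this once at startup and fail fast if anything is reported missing.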
3

Verify Cerebras connectivity via LiteLLM

Before building a voice agent, verify that Cerebras Inference is reachable through LiteLLM (which Cartesia Line uses internally to call LLMs). This self-contained example makes a real API call:
import os

from dotenv import load_dotenv
import litellm

load_dotenv()

# LiteLLM routes to Cerebras when the model name starts with "cerebras/"
response = litellm.completion(
    model="cerebras/llama3.1-8b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    api_base="https://api.cerebras.ai/v1",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is Cartesia Line used for? Answer in one sentence."}
    ],
    max_tokens=100,
    temperature=0.7,
    extra_headers={"X-Cerebras-3rd-Party-Integration": "cartesia-line"},
)

print(f"Model: {response.model}")
print(f"Response: {response.choices[0].message.content}")
print(f"Tokens used: {response.usage.total_tokens}")
If this prints a response, your Cerebras API key and LiteLLM routing are working correctly.
4

Build a Cartesia Line voice agent with Cerebras

With Line, you do not manually manage WebSockets, audio streams, or pyaudio. Instead, you implement your agent’s reasoning in code and let Line handle audio, speech-to-text, and text-to-speech using Cartesia’s Sonic and Ink models. Below is a complete main.py that configures a Line LlmAgent to use Cerebras as the LLM provider:
import os

from dotenv import load_dotenv
from line.llm_agent import LlmAgent, LlmConfig, end_call
from line.voice_agent_app import VoiceAgentApp
from line.events import CallStarted, UserTurnEnded, UserTextSent, CallEnded

load_dotenv()

async def get_agent(env, call_request):
    """
    Create a Cartesia Line LlmAgent backed by a Cerebras model.
    """
    llm_config = LlmConfig.from_call_request(
        call_request,
        fallback_system_prompt=(
            "You are a helpful, friendly voice assistant. "
            "Keep responses concise and natural for voice output."
        ),
        fallback_introduction=(
            "Hi there! How can I help you today?"
        ),
        temperature=0.7,
        max_tokens=256,
        extra={
            "api_base": "https://api.cerebras.ai/v1",
            "custom_llm_provider": "cerebras",
            "extra_headers": {
                "X-Cerebras-3rd-Party-Integration": "cartesia-line"
            },
        },
    )

    return LlmAgent(
        model="cerebras/llama3.1-8b",
        api_key=os.getenv("CEREBRAS_API_KEY"),
        tools=[end_call],
        config=llm_config,
        run_filter=[CallStarted, UserTurnEnded, UserTextSent, CallEnded],
    )

app = VoiceAgentApp(get_agent=get_agent)

print(f"VoiceAgentApp created: {type(app).__name__}")
print("Agent configured for Cerebras Llama 3.1 8B")
print("Start the server with: PORT=8000 uv run python main.py")
This configuration:
  • Uses Cerebras Llama 3.1 8B (cerebras/llama3.1-8b) as the reasoning engine — the fastest option for low-latency voice interactions.
  • Passes Cerebras-specific options via LlmConfig.extra, including the API base URL, provider name, and an integration tracking header.
  • Lets Cartesia Line handle telephony/WebRTC audio, speech recognition (Ink), text-to-speech (Sonic), streaming, barge-in, and turn-taking.
No direct use of cartesia.tts.websocket() or pyaudio is required — Line encapsulates all audio orchestration.
5

Talk to your agent

Start the voice agent server:
PORT=8000 uv run python main.py
Use the Cartesia CLI to connect to your local Line voice agent:
cartesia chat 8000
This opens a bi-directional audio session where:
  • Audio is streamed through Cartesia’s Sonic/Ink stack.
  • User speech is transcribed and sent as text to your Cerebras-backed LlmAgent.
  • The Cerebras model responds via LiteLLM, and Line converts the reply back to high-quality speech.
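Conceptually, each turn flows through a transcribe → reason → synthesize pipeline. The sketch below simulates that loop with plain functions; it is purely illustrative (Line performs these stages internally with Ink, your LlmAgent, and Sonic), and none of these function names exist in the SDK:

```python
def transcribe(audio: bytes) -> str:
    """Stand-in for Ink speech-to-text; here the 'audio' is just UTF-8 text."""
    return audio.decode("utf-8")


def reason(user_text: str) -> str:
    """Stand-in for the Cerebras-backed LlmAgent reply."""
    return f"You said: {user_text}"


def synthesize(reply_text: str) -> bytes:
    """Stand-in for Sonic text-to-speech; returns the reply as 'audio' bytes."""
    return reply_text.encode("utf-8")


def handle_turn(audio_in: bytes) -> bytes:
    """One user turn: speech in, speech out."""
    return synthesize(reason(transcribe(audio_in)))
```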

Complete Example: Voice Agent with Custom Voice and Pre-Call Configuration

This self-contained example shows how to configure a pre_call_handler to programmatically select Sonic voices, languages, and TTS/STT settings:
import os

from dotenv import load_dotenv
from line.llm_agent import LlmAgent, LlmConfig, end_call
from line.voice_agent_app import CallRequest, PreCallResult, VoiceAgentApp
from line.events import CallStarted, UserTurnEnded, UserTextSent, CallEnded

load_dotenv()

async def get_agent(env, call_request):
    """Create a Cartesia Line LlmAgent backed by Cerebras."""
    llm_config = LlmConfig.from_call_request(
        call_request,
        fallback_system_prompt=(
            "You are a friendly customer service representative. "
            "Be helpful, concise, and professional."
        ),
        fallback_introduction="Hello! How can I assist you today?",
        temperature=0.8,
        max_tokens=256,
        extra={
            "api_base": "https://api.cerebras.ai/v1",
            "custom_llm_provider": "cerebras",
            "extra_headers": {
                "X-Cerebras-3rd-Party-Integration": "cartesia-line"
            },
        },
    )

    return LlmAgent(
        model="cerebras/llama3.1-8b",
        api_key=os.getenv("CEREBRAS_API_KEY"),
        tools=[end_call],
        config=llm_config,
        run_filter=[CallStarted, UserTurnEnded, UserTextSent, CallEnded],
    )

async def pre_call_handler(call_request: CallRequest) -> PreCallResult:
    """Configure voice and TTS/STT settings before each call."""
    return PreCallResult(
        metadata={"tier": "premium"},
        config={
            "tts": {
                "voice": "a0e99841-438c-4a64-b679-ae501e7d6091",  # Cartesia voice ID
                "model": "sonic-3",
                "language": "en",
            },
        },
    )

app = VoiceAgentApp(get_agent=get_agent, pre_call_handler=pre_call_handler)

print(f"VoiceAgentApp created with pre_call_handler: {type(app).__name__}")
print("Configured with Sonic 3 voice and Cerebras Llama 3.1 8B")
You can also attach tools and multi-step workflows to your LlmAgent (e.g., database lookup, web search, CRM APIs) and let Line orchestrate tool calls and multi-agent handoffs.
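As a sketch of what such a tool might look like, a plain async function can wrap a backend lookup and be passed alongside end_call. The `lookup_order` function and its data are hypothetical; only the `tools=[...]` pattern comes from the examples above:

```python
import asyncio

# Hypothetical order database standing in for a real CRM or database call.
ORDERS = {"A-1001": "shipped", "A-1002": "processing"}


async def lookup_order(order_id: str) -> str:
    """Look up the shipping status of an order by its ID."""
    await asyncio.sleep(0)  # placeholder for real async I/O
    status = ORDERS.get(order_id)
    if status is None:
        return f"No order found for {order_id}."
    return f"Order {order_id} is {status}."

# In get_agent(), you would then register it on the agent, e.g.:
#   LlmAgent(..., tools=[end_call, lookup_order], ...)
```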

Complete Example: Tiered Model Selection

This example shows how to select different Cerebras models per call based on metadata — useful for offering premium vs. standard tiers:
import os

from dotenv import load_dotenv
import litellm
from line.llm_agent import LlmAgent, LlmConfig, end_call
from line.voice_agent_app import VoiceAgentApp
from line.events import CallStarted, UserTurnEnded, UserTextSent, CallEnded

load_dotenv()

# Available Cerebras models for voice agents (fastest to most capable)
CEREBRAS_MODELS = {
    "fast": "cerebras/llama3.1-8b",
    "powerful": "cerebras/gpt-oss-120b",
}

async def get_agent(env, call_request):
    """Select a Cerebras model based on call metadata for tiered experiences."""
    metadata = getattr(call_request, "metadata", {}) or {}
    tier = metadata.get("tier", "fast")
    model = CEREBRAS_MODELS.get(tier, CEREBRAS_MODELS["fast"])

    llm_config = LlmConfig.from_call_request(
        call_request,
        fallback_system_prompt="You are a helpful voice assistant.",
        fallback_introduction="Hi there! How can I help?",
        temperature=0.7,
        max_tokens=256,
        extra={
            "api_base": "https://api.cerebras.ai/v1",
            "custom_llm_provider": "cerebras",
            "extra_headers": {
                "X-Cerebras-3rd-Party-Integration": "cartesia-line"
            },
        },
    )

    return LlmAgent(
        model=model,
        api_key=os.getenv("CEREBRAS_API_KEY"),
        tools=[end_call],
        config=llm_config,
        run_filter=[CallStarted, UserTurnEnded, UserTextSent, CallEnded],
    )

app = VoiceAgentApp(get_agent=get_agent)

# Verify the LiteLLM → Cerebras path works
response = litellm.completion(
    model="cerebras/llama3.1-8b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    api_base="https://api.cerebras.ai/v1",
    messages=[{"role": "user", "content": "Say hello in exactly 3 words."}],
    max_tokens=20,
    extra_headers={"X-Cerebras-3rd-Party-Integration": "cartesia-line"},
)

print(f"VoiceAgentApp created: {type(app).__name__}")
print(f"Available tiers: {list(CEREBRAS_MODELS.keys())}")
print(f"Cerebras test response: {response.choices[0].message.content}")
When calling through LiteLLM from Line, always use the cerebras/ prefix (e.g., cerebras/llama3.1-8b).
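A small guard can normalize model names before they reach LiteLLM. This helper is illustrative only (not part of LiteLLM or Line):

```python
def with_cerebras_prefix(model: str) -> str:
    """Ensure a model name carries the 'cerebras/' provider prefix LiteLLM expects."""
    if model.startswith("cerebras/"):
        return model
    return f"cerebras/{model}"
```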

Available Models

Cerebras offers several models optimized for voice AI applications that work seamlessly through Cartesia Line:
Model        | Parameters | Best For
llama3.1-8b  | 8B         | Fastest option; ideal for low-latency voice interactions
gpt-oss-120b | 120B       | Complex reasoning and demanding tasks
zai-glm-4.7  | 357B       | Advanced model with strong reasoning capabilities

Next Steps

Explore Cartesia Line voice options

Browse Cartesia’s voice and agent configuration options.

Advanced Line examples

See production-grade examples and patterns for voice agents.

Cerebras + LiteLLM

Cartesia Line uses LiteLLM internally. Learn more about Cerebras + LiteLLM routing, retries, and fallbacks.

Cerebras models and tooling

  • Cerebras Models – Explore available models and choose the best fit for latency, cost, and capability.
  • Cerebras Tool Use – Add function calling and tool use on top of Cerebras models, then expose those tools through your Line voice agents.
  • Migrate to GLM4.7 – Ready to upgrade? Follow the Cerebras migration guide to start using the latest zai-glm-4.7 model in your Line agents.

Additional Resources