What is Hume AI?

Hume AI is an empathic AI platform for building emotionally intelligent voice agents. Its Empathic Voice Interface (EVI) combines speech recognition, emotion detection, and natural language understanding to create voice experiences that understand and respond to human emotions in real time. By integrating Cerebras’s ultra-fast inference with Hume AI’s emotional intelligence capabilities, you can build voice agents that are both lightning-fast and emotionally aware. Learn more at Hume AI.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key here
  • Hume AI Account - Visit Hume AI and create an account to get your API credentials
  • Python 3.11 or higher - Required for running the integration examples
  • Basic understanding of async Python - Hume AI’s SDK uses asynchronous operations

Configure Hume AI with Cerebras

Step 1: Install required dependencies

Install the Hume AI SDK and OpenAI client for Cerebras integration:
pip install "hume[microphone]" openai python-dotenv
The hume[microphone] package provides the official SDK with audio playback support. The [microphone] extra includes dependencies for recording and playing audio.
Step 2: Configure environment variables

Create a .env file in your project directory with your API credentials:
CEREBRAS_API_KEY=your-cerebras-api-key-here
HUME_API_KEY=your-hume-api-key-here
HUME_SECRET_KEY=your-hume-secret-key-here
You can find your Hume AI credentials in your Hume AI dashboard under API Keys.
Step 3: Initialize the Cerebras client

Set up the Cerebras client to handle language model inference. This client will process text-based interactions while Hume AI handles voice and emotion detection:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize Cerebras client with integration tracking
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Hume AI"
    }
)
The X-Cerebras-3rd-Party-Integration header helps Cerebras track integration usage and provide better support.
Step 4: Create a basic empathic response generator

Build a function that generates emotionally aware responses using Cerebras. This example shows how to incorporate emotional context into your prompts:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize Cerebras client
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Hume AI"
    }
)

def generate_empathic_response(user_message, emotions):
    """Generate an emotionally aware response using Cerebras."""
    
    # Build context-aware prompt with emotional information
    emotion_context = ", ".join([f"{e['name']}: {e['score']:.2f}" for e in emotions[:3]])
    
    system_prompt = f"""You are an empathic AI assistant. 
    The user's current emotional state shows: {emotion_context}.
    Respond with appropriate empathy and understanding."""
    
    # Generate response using Cerebras
    response = cerebras_client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        temperature=0.7,
        max_tokens=150
    )
    
    return response.choices[0].message.content

# Example usage
emotions = [
    {"name": "joy", "score": 0.75},
    {"name": "excitement", "score": 0.62},
    {"name": "curiosity", "score": 0.48}
]

response = generate_empathic_response(
    "I just got accepted to my dream university!",
    emotions
)
print(response)
This function takes user input and detected emotions, then generates contextually appropriate responses that acknowledge the user’s emotional state.
Step 5: Process emotional context in conversations

Use detected emotions to generate contextually appropriate responses. This example shows how to integrate emotional intelligence into your Cerebras-powered applications:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize Cerebras client
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Hume AI"
    }
)

def process_with_emotion_context(user_message, emotions):
    """Generate responses that acknowledge emotional context."""
    
    # Build emotion-aware system prompt
    emotion_context = ", ".join([f"{e['name']}: {e['score']:.2f}" for e in emotions[:3]])
    
    system_prompt = f"""You are an empathic AI assistant. 
    The user's current emotional state shows: {emotion_context}.
    Respond with appropriate empathy and understanding."""
    
    # Generate response using Cerebras
    response = cerebras_client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        temperature=0.7,
        max_tokens=150
    )
    
    return response.choices[0].message.content

# Example: Process a message with detected emotions
emotions = [
    {"name": "frustration", "score": 0.68},
    {"name": "confusion", "score": 0.52},
    {"name": "determination", "score": 0.45}
]

response = process_with_emotion_context(
    "I've been trying to fix this bug for hours and nothing works.",
    emotions
)
print(response)
For real-time voice interactions with Hume AI’s EVI (Empathic Voice Interface), refer to the Hume AI EVI documentation. Voice features require WebSocket connections and are best suited for interactive applications.
Step 6: Implement streaming responses

Use streaming to reduce latency when generating empathic responses. This provides immediate feedback to users:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize Cerebras client
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Hume AI"
    }
)

def stream_empathic_response(user_message, emotions):
    """Stream responses with emotional context for lower latency."""
    
    emotion_context = ", ".join([f"{e['name']}: {e['score']:.2f}" for e in emotions[:3]])
    
    system_prompt = f"""You are an empathic AI assistant. 
    The user's current emotional state shows: {emotion_context}.
    Respond with appropriate empathy and understanding."""
    
    # Stream response from Cerebras
    stream = cerebras_client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        temperature=0.7,
        max_tokens=150,
        stream=True
    )
    
    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_response += content
            print(content, end="", flush=True)
    
    print()  # New line after streaming
    return full_response

# Example usage
emotions = [
    {"name": "anxiety", "score": 0.72},
    {"name": "hope", "score": 0.58}
]

response = stream_empathic_response(
    "I'm nervous about my presentation tomorrow.",
    emotions
)
Streaming responses significantly improves the user experience by providing immediate feedback while the full response is being generated.

Advanced Features

Multi-Turn Conversations with Emotional Tracking

Track emotional state across multiple conversation turns to build more contextually aware interactions:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize Cerebras client
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Hume AI"
    }
)

class EmotionalConversation:
    """Track emotional context across multiple turns."""
    
    def __init__(self):
        self.conversation_history = []
        self.emotion_history = []
    
    def add_turn(self, user_message, emotions, ai_response):
        """Record a conversation turn with emotional context."""
        self.conversation_history.append({
            "user": user_message,
            "assistant": ai_response
        })
        self.emotion_history.append(emotions)
    
    def generate_response(self, user_message, current_emotions):
        """Generate response considering conversation and emotional history."""
        
        # Analyze emotional trend
        if len(self.emotion_history) > 0:
            prev_emotions = self.emotion_history[-1]
            emotion_trend = self._analyze_emotion_change(prev_emotions, current_emotions)
        else:
            emotion_trend = "This is the start of our conversation."
        
        # Build context-aware prompt
        emotion_context = ", ".join([f"{e['name']}: {e['score']:.2f}" for e in current_emotions[:3]])
        
        system_prompt = f"""You are an empathic AI assistant. 
        Current emotional state: {emotion_context}
        Emotional trend: {emotion_trend}
        Respond with appropriate empathy and understanding."""
        
        # Build messages with history
        messages = [{"role": "system", "content": system_prompt}]
        for turn in self.conversation_history[-3:]:  # Last 3 turns
            messages.append({"role": "user", "content": turn["user"]})
            messages.append({"role": "assistant", "content": turn["assistant"]})
        messages.append({"role": "user", "content": user_message})
        
        # Generate response
        response = cerebras_client.chat.completions.create(
            model="llama-3.3-70b",
            messages=messages,
            temperature=0.7,
            max_tokens=150
        )
        
        ai_response = response.choices[0].message.content
        self.add_turn(user_message, current_emotions, ai_response)
        
        return ai_response
    
    def _analyze_emotion_change(self, prev_emotions, current_emotions):
        """Analyze how emotions have changed."""
        prev_dict = {e['name']: e['score'] for e in prev_emotions}
        curr_dict = {e['name']: e['score'] for e in current_emotions}
        
        # Find biggest change
        changes = []
        for emotion in curr_dict:
            if emotion in prev_dict:
                change = curr_dict[emotion] - prev_dict[emotion]
                if abs(change) > 0.2:
                    direction = "increased" if change > 0 else "decreased"
                    changes.append(f"{emotion} has {direction}")
        
        return ", ".join(changes) if changes else "Emotions are stable"

# Example usage
conversation = EmotionalConversation()

# Turn 1
emotions_1 = [
    {"name": "confusion", "score": 0.65},
    {"name": "curiosity", "score": 0.58}
]
response_1 = conversation.generate_response(
    "I'm not sure how to approach this problem.",
    emotions_1
)
print(f"Turn 1: {response_1}\n")

# Turn 2
emotions_2 = [
    {"name": "understanding", "score": 0.72},
    {"name": "confidence", "score": 0.61}
]
response_2 = conversation.generate_response(
    "Oh, I think I'm starting to get it now!",
    emotions_2
)
print(f"Turn 2: {response_2}")

Best Practices

Optimize Response Times

For real-time voice interactions, response speed is critical. Here are strategies to minimize latency:
  1. Choose the right model - Use llama3.1-8b for ultra-low latency or llama-3.3-70b for better quality with good speed
  2. Implement streaming - Stream responses to provide immediate feedback
  3. Cache common responses - Store frequently requested information to avoid redundant API calls (see the caching sketch after the example below)
  4. Optimize token limits - Use lower max_tokens values for faster generation
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize Cerebras client
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Hume AI"
    }
)

# Ultra-low latency configuration
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Quick question: what's 2+2?"}
]

stream = cerebras_client.chat.completions.create(
    model="llama3.1-8b",
    messages=messages,
    max_tokens=100,  # Shorter responses = faster generation
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
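
Caching (strategy 3 above) can be as simple as an in-memory lookup keyed on the user message and emotional context. The sketch below is illustrative and not part of the Cerebras or Hume SDKs; the key scheme and the unbounded dictionary are assumptions you should replace with a proper TTL cache or external store in production:
import hashlib

# Hypothetical in-memory cache; swap for a TTL cache or Redis in production
_response_cache = {}

def cached_generate(user_message, emotions):
    """Return a cached response when the same message and emotions repeat."""
    emotion_context = ", ".join([f"{e['name']}: {e['score']:.2f}" for e in emotions[:3]])
    cache_key = hashlib.sha256(f"{user_message}|{emotion_context}".encode()).hexdigest()

    if cache_key in _response_cache:
        return _response_cache[cache_key]  # Cache hit: skip the API call entirely

    # Cache miss: generate with Cerebras (reusing cerebras_client from the earlier steps)
    response = cerebras_client.chat.completions.create(
        model="llama3.1-8b",
        messages=[
            {"role": "system", "content": f"You are an empathic assistant. User emotions: {emotion_context}."},
            {"role": "user", "content": user_message}
        ],
        max_tokens=100
    )
    text = response.choices[0].message.content
    _response_cache[cache_key] = text
    return text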

Add Voice Output with Hume TTS

Convert Cerebras-generated text responses to spoken audio using Hume’s Text-to-Speech API. This creates a complete voice experience:
import os
import asyncio
import base64
from openai import OpenAI
from hume import AsyncHumeClient
from hume.tts import PostedUtterance, PostedUtteranceVoiceWithName
from hume.empathic_voice.chat.audio.audio_utilities import play_audio
from dotenv import load_dotenv

load_dotenv()

# Initialize clients
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Hume AI"
    }
)

hume_client = AsyncHumeClient(api_key=os.getenv("HUME_API_KEY"))

async def speak_response(text: str, voice_name: str = "Ava Song"):
    """Convert text to speech and play audio."""
    utterance = PostedUtterance(
        text=text,
        voice=PostedUtteranceVoiceWithName(name=voice_name, provider='HUME_AI')
    )
    
    result = await hume_client.tts.synthesize_json(
        utterances=[utterance]
    )
    
    audio_data = base64.b64decode(result.generations[0].audio)
    await play_audio(audio_data)

async def generate_and_speak(user_message: str, emotions: list):
    """Generate empathic response with Cerebras and speak it with Hume TTS."""
    
    emotion_context = ", ".join([f"{e['name']}: {e['score']:.2f}" for e in emotions[:3]])
    
    # Generate text response with Cerebras
    response = cerebras_client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": f"You are an empathic assistant. User emotions: {emotion_context}. Keep responses concise (2-3 sentences)."},
            {"role": "user", "content": user_message}
        ],
        max_tokens=150
    )
    
    text_response = response.choices[0].message.content
    print(f"Response: {text_response}")
    
    # Speak the response
    await speak_response(text_response)

# Run the example
emotions = [{"name": "curiosity", "score": 0.75}, {"name": "excitement", "score": 0.60}]
asyncio.run(generate_and_speak("Tell me something interesting about space!", emotions))
The hume[microphone] package is required for audio playback. Install with: pip install "hume[microphone]"

Handle Emotional Context

Leverage Hume AI’s emotion detection to create more empathic and contextually appropriate responses:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize Cerebras client
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Hume AI"
    }
)

def build_emotion_aware_prompt(user_message, emotions):
    """Create prompts that incorporate emotional context."""
    
    # Get top three detected emotions
    top_emotions = sorted(emotions, key=lambda x: x['score'], reverse=True)[:3]
    emotion_str = ", ".join([f"{e['name']} ({e['score']:.0%})" for e in top_emotions])
    
    # Adjust tone based on dominant emotion
    if top_emotions[0]['name'] in ['sadness', 'distress', 'anxiety']:
        tone = "compassionate and supportive"
    elif top_emotions[0]['name'] in ['joy', 'excitement', 'amusement']:
        tone = "enthusiastic and celebratory"
    else:
        tone = "warm and understanding"
    
    system_prompt = f"""You are an empathic AI assistant. The user is currently 
    expressing these emotions: {emotion_str}. Respond in a {tone} manner, 
    acknowledging their emotional state appropriately."""
    
    return system_prompt
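
The prompt builder can then be used with the same chat completion pattern as the earlier steps. The emotion scores below are illustrative values, not real Hume AI output:
emotions = [
    {"name": "sadness", "score": 0.70},
    {"name": "hope", "score": 0.42},
    {"name": "fatigue", "score": 0.35}
]

user_message = "I didn't get the job I interviewed for."
system_prompt = build_emotion_aware_prompt(user_message, emotions)

response = cerebras_client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message}
    ],
    temperature=0.7,
    max_tokens=150
)
print(response.choices[0].message.content)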

Implement Error Handling

Build robust error handling for production voice agents:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize Cerebras client
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Hume AI"
    }
)

def robust_generate_response(user_message, emotions, max_retries=3):
    """Generate response with automatic retry and fallback logic."""
    
    # Build emotion-aware prompt
    emotion_context = ", ".join([f"{e['name']}: {e['score']:.2f}" for e in emotions[:3]])
    system_prompt = f"""You are an empathic AI assistant. 
    The user's current emotional state shows: {emotion_context}.
    Respond with appropriate empathy and understanding."""
    
    for attempt in range(max_retries):
        try:
            response = cerebras_client.chat.completions.create(
                model="llama-3.3-70b",
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_message}
                ],
                timeout=10.0
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt == max_retries - 1:
                # Final fallback to simpler model
                print("Falling back to llama3.1-8b")
                response = cerebras_client.chat.completions.create(
                    model="llama3.1-8b",
                    messages=[
                        {"role": "user", "content": user_message}
                    ]
                )
                return response.choices[0].message.content

# Example usage
emotions = [
    {"name": "frustration", "score": 0.65},
    {"name": "determination", "score": 0.58}
]

response = robust_generate_response(
    "I need help understanding this concept.",
    emotions
)
print(response)
Hume AI’s EVI API uses WebSocket connections for real-time voice interactions. Make sure your network environment supports WebSocket connections and consider implementing reconnection logic for production applications.

Common Questions

Which Cerebras model should I use for real-time voice agents?

For real-time voice interactions, we recommend:
  • llama3.1-8b - Best for ultra-low latency when speed is critical
  • llama-3.3-70b - Best balance of quality and speed for most applications
  • qwen-3-32b - Good alternative with strong multilingual support
Start with llama-3.3-70b and switch to llama3.1-8b if you need faster responses.

How do I handle rate limits?

Implement these strategies to manage rate limits:
  1. Use exponential backoff with retry logic (see error handling example above)
  2. Cache common responses to reduce API calls
  3. Implement request queuing for high-traffic scenarios
  4. Monitor your usage and upgrade your plan if needed
Both Cerebras and Hume AI offer higher rate limits on paid plans.

Can I build multilingual voice agents?

Yes! Cerebras models like qwen-3-32b and llama-3.3-70b support multiple languages. Hume AI also provides multilingual emotion detection and text-to-speech capabilities. Check the Hume AI documentation for supported languages and features.
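
As a minimal sketch, a non-English request works the same way as the earlier examples; the Spanish message and emotion values here are illustrative, and cerebras_client is the client configured in the setup steps:
emotions = [{"name": "joy", "score": 0.70}]
emotion_context = ", ".join([f"{e['name']}: {e['score']:.2f}" for e in emotions])

response = cerebras_client.chat.completions.create(
    model="qwen-3-32b",
    messages=[
        {"role": "system", "content": f"You are an empathic assistant. Reply in the user's language. User emotions: {emotion_context}."},
        {"role": "user", "content": "¡Acabo de terminar mi primer maratón!"}
    ],
    max_tokens=150
)
print(response.choices[0].message.content)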

How can I improve emotion detection accuracy?

To get the best emotion detection results:
  1. Use a quality microphone with minimal background noise
  2. Ensure clear audio input with good signal-to-noise ratio
  3. Allow sufficient speech samples (at least 2-3 seconds) for accurate analysis
  4. Test with different voice settings to find optimal configuration
  5. Review Hume AI’s emotion model documentation for best practices

Should I use EVI or TTS for voice interactions?

For voice interactions, use Hume AI’s EVI (Empathic Voice Interface) or TTS (Text-to-Speech) APIs:
  • EVI: Full conversational voice agent with real-time emotion detection and synthesis. Requires WebSocket connections. See the EVI Python Quickstart.
  • TTS: Converts Cerebras-generated text to emotionally expressive speech. See the TTS documentation.
Both can be combined with Cerebras for ultra-fast, emotionally intelligent voice applications.

Troubleshooting

WebSocket Connection Issues

If you experience connection problems with Hume AI’s voice interface:
  • Ensure your firewall allows WebSocket connections on ports 80 and 443
  • Check that your API credentials are correct and active in the Hume AI dashboard
  • Verify your network supports WSS (WebSocket Secure) protocol
  • Implement exponential backoff for reconnection attempts
  • Check Hume AI’s status page for service issues

Audio Quality Problems

For poor audio quality or latency:
  • Use Hume AI’s Octave 2 voice model ("version": "2") for improved quality
  • Reduce the max_tokens parameter in Cerebras requests to speed up generation
  • Consider using llama3.1-8b for faster responses in real-time scenarios
  • Check your internet connection bandwidth (minimum 1 Mbps recommended)
  • Test with different voice options to find the best quality for your use case

Emotion Detection Accuracy

If emotion detection seems inaccurate:
  • Ensure clear audio input with minimal background noise
  • Use a quality microphone for better voice capture
  • Allow sufficient speech samples for accurate emotion analysis (2-3 seconds minimum)
  • Review Hume AI’s emotion model documentation for supported languages and contexts
  • Test with different speakers to understand model behavior

API Rate Limits

If you hit rate limits:
  1. Implement request queuing and retry logic with exponential backoff (see the sketch after this list)
  • Cache responses for common queries
  • Monitor your usage in the Cerebras and Hume AI dashboards
  • Consider upgrading your plan for higher limits
  • Use streaming responses to reduce the number of API calls
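
A minimal exponential backoff wrapper for the retry logic in the first item is sketched below; the delay values and jitter are assumptions to tune for your traffic:
import time
import random

def call_with_backoff(make_request, max_retries=5, base_delay=1.0):
    """Retry a request with exponentially increasing delays plus jitter."""
    for attempt in range(max_retries):
        try:
            return make_request()
        except Exception as e:
            if attempt == max_retries - 1:
                raise  # Give up after the final attempt
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            print(f"Request failed ({e}); retrying in {delay:.1f}s")
            time.sleep(delay)

# Example: wrap a Cerebras chat completion from the earlier examples
response = call_with_backoff(lambda: cerebras_client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    max_tokens=50
))
print(response.choices[0].message.content)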
When building production voice agents, always implement proper error handling and graceful degradation. Voice interactions should continue even if one service experiences issues. Consider implementing fallback responses and offline capabilities.

Next Steps

Now that you’ve set up the integration, explore these resources to build more advanced applications:

Resources