Sarah Chieng
August 11, 2025
For this workshop, you'll need a Cerebras API key. If you would like to add tracing and evaluation (not required to get started), you'll also need a LangSmith API key. If you have any questions, please reach out on the Cerebras Discord.

Step 1: Environment Setup

First, let’s install all the necessary libraries, import everything we need, and configure our API credentials.
pip install langchain langgraph langchain-openai cerebras-cloud-sdk langchain_cerebras

import logging
import sys
from typing import Dict, List, TypedDict
import time
import os, getpass
from IPython.display import Image, display

from langchain_cerebras import ChatCerebras
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langgraph.graph import StateGraph, END

# Configuration Constants
DEFAULT_NUM_INTERVIEWS = 10
DEFAULT_NUM_QUESTIONS = 5
os.environ["CEREBRAS_API_KEY"]="your-cerebras-api-key"
os.environ["LANGSMITH_TRACING"] = "your-langsmit-api-key"
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_PROJECT"] = "langchain-cerebras" #or your preferred project name

Step 2: Set Up Our LLM

These functions send prompts to llama3.3-70b running on Cerebras and return clean, direct responses. They will serve as our core communication layer throughout the research process - from generating interview questions to creating participant personas to analyzing simulated responses.
from langchain_cerebras import ChatCerebras

prompt = f"""Generate exactly 3 interview questions about: model context protocol

Requirements:
- Each question must be open-ended (not yes/no)
- Keep questions conversational and clear
- One question per line
- No numbering, bullets, or extra formatting

Topic: model context protocol"""

# Create the model and send a quick test request
llm = ChatCerebras(model="llama3.3-70b", temperature=0.7, max_tokens=800)
response = llm.invoke([{"role": "user", "content": f"You are a helpful assistant. Provide a direct, clear response without showing your thinking process.\n\n{prompt}"}])
response.pretty_print()
# General model instructions
system_prompt = """You are a helpful assistant. Provide a direct, clear response without showing your thinking process. Respond directly without using <think> tags or showing internal reasoning."""

def ask_ai(prompt: str) -> str:
    """Send prompt to Cerebras AI and return response"""

    response = llm.invoke([{"role":"system", "content": system_prompt},{"role": "user", "content": prompt}])
    return response.content

print("✅ Setup complete")

Step 3: Define State

Today, we’ll be using LangGraph to orchestrate our multi-agent research workflow. LangGraph uses state to coordinate between different nodes, acting as shared memory where each specialized agent can store and access information throughout the process. We start by defining the data classes we’ll use and a TypedDict that specifies exactly what data our workflow needs to track - from the initial research question all the way through to the final synthesized insights.
from typing import List
from pydantic import BaseModel, Field, ValidationError

class Persona(BaseModel):
    name: str = Field(..., description="Full name of the persona")
    age: int = Field(..., description="Age in years")
    job: str = Field(..., description="Job title or role")
    traits: List[str] = Field(..., description="3-4 personality traits")
    communication_style: str = Field(..., description="How this person communicates")
    background: str = Field(..., description="One background detail shaping their perspective")

class PersonasList(BaseModel):
    personas: List[Persona] = Field(..., description="List of generated personas")

class InterviewState(TypedDict):
    # Configuration inputs
    research_question: str
    target_demographic: str
    num_interviews: int
    num_questions: int

    # Generated data
    interview_questions: List[str]
    personas: List[Persona]

    # Current interview tracking
    current_persona_index: int
    current_question_index: int
    current_interview_history: List[Dict]

    # Results storage
    all_interviews: List[Dict]
    synthesis: str

print("✅ State management ready")

Step 4: Define Core Node Functions

Next, we’ll build the core nodes that handle each part of our research process. Each node is a specialized agent that performs one specific task and updates the shared state for other nodes to use. In this step, we’ll create four main nodes:
  1. Configuration node: gets research question from the user
  2. Persona generation node: creates synthetic users
  3. Interview node: conducts our interviews
  4. Synthesis node: analyzes and presents results
Configuration Node - Entry point that gathers research parameters and generates questions
from pydantic import BaseModel, Field

class Questions(BaseModel):
    questions: List[str] = Field(..., description="List of interview questions")

# Generate interview questions using AI
question_gen_prompt = """Generate exactly {DEFAULT_NUM_QUESTIONS} interview questions about: {research_question}. Use the provided structured output to format the questions."""

def configuration_node(state: InterviewState) -> Dict:
    """Get user inputs and generate interview questions"""

    print(f"\n🔧 Configuring research: {state['research_question']}")
    print(f"📊 Planning {DEFAULT_NUM_INTERVIEWS} interviews with {DEFAULT_NUM_QUESTIONS} questions each")

    structured_llm = llm.with_structured_output(Questions)
    questions = structured_llm.invoke(question_gen_prompt.format(DEFAULT_NUM_QUESTIONS=DEFAULT_NUM_QUESTIONS, research_question=state['research_question']))
    questions = questions.questions
    print(f"✅ Generated {len(questions)} questions")

    return {
        "num_questions": DEFAULT_NUM_QUESTIONS,
        "num_interviews": DEFAULT_NUM_INTERVIEWS,
        "interview_questions": questions
    }
Persona Generation Node - Creates diverse user profiles matching the target demographic
persona_prompt = (
    "Generate exactly {num_personas} unique personas for an interview. "
    "Each should belong to the target demographic: {demographic}. "
    "Respond only in JSON using this format: {{ personas: [ ... ] }}"
)

def persona_generation_node(state: InterviewState) -> Dict:
    """Generate and validate personas matching the target demographic"""

    num_personas = state['num_interviews']
    demographic = state['target_demographic']
    max_retries = 5
    raw_output = None  # defined up front so the except block can print it safely

    print(f"\n👥 Creating {state['num_interviews']} personas...")

    print(persona_prompt.format(num_personas=num_personas, demographic=demographic))

    structured_llm = llm.with_structured_output(PersonasList)

    for attempt in range(max_retries):
        try:
            raw_output = structured_llm.invoke([{"role": "user", "content": persona_prompt.format(num_personas=num_personas, demographic=demographic)}])
            if raw_output is None:
                raise ValueError("LLM returned None")

            validated = PersonasList.model_validate(raw_output)

            if len(validated.personas) != num_personas:
                raise ValueError(f"Expected {num_personas} personas, got {len(validated.personas)}")

            personas = validated.personas
            for i, p in enumerate(personas):
                print(f"Persona {i+1}: {p}")

            return {
                "personas": personas,
                "current_persona_index": 0,
                "current_question_index": 0,
                "all_interviews": []
            }

        except (ValidationError, ValueError, TypeError) as e:
            print(f"❌ Attempt {attempt+1} failed: {e}")
            print(raw_output)
            if attempt == max_retries - 1:
                raise RuntimeError(f"❗️Failed after {max_retries} attempts")

Interview Node - Conducts the actual Q&A with each persona, one question at a time
# Generate response as this persona with detailed character context
interview_prompt = """You are {persona_name}, a {persona_age}-year-old {persona_job} who is {persona_traits}.
Answer the following question in 2-3 sentences:

Question: {question}

Answer as {persona_name} in your own authentic voice. Be brief but creative and unique, and make each answer conversational.
BE REALISTIC – do not be overly optimistic. Mimic real human behavior based on your persona, and give honest answers."""

def interview_node(state: InterviewState) -> Dict:
    """Conduct interview with current persona"""
    persona = state['personas'][state['current_persona_index']]
    question = state['interview_questions'][state['current_question_index']]

    print(f"\n💬 Interview {state['current_persona_index'] + 1}/{len(state['personas'])} - {persona.name}")
    print(f"Q{state['current_question_index'] + 1}: {question}")

    # Generate response as this persona with detailed character context
    prompt = interview_prompt.format(persona_name=persona.name, persona_age=persona.age, persona_job=persona.job, persona_traits=persona.traits, question=question)
    answer = ask_ai(prompt)
    print(f"A: {answer}")

    # Update state with interview history
    history = state.get('current_interview_history', []) + [{
        "question": question,
        "answer": answer
    }]

    # Check if this interview is complete
    if state['current_question_index'] + 1 >= len(state['interview_questions']):
        # Interview complete - save it and move to next persona
        return {
            "all_interviews": state['all_interviews'] + [{
                'persona': persona,
                'responses': history
            }],
            "current_interview_history": [],
            "current_question_index": 0,
            "current_persona_index": state['current_persona_index'] + 1
        }

    # Continue with next question for same persona
    return {
        "current_interview_history": history,
        "current_question_index": state['current_question_index'] + 1
    }

Synthesis Node - Analyzes all completed interviews and generates actionable insights
synthesis_prompt_template = """Analyze these {num_interviews} user interviews about "{research_question}" among {target_demographic} and concise yet comprehensive analysis:

1. KEY THEMES: What patterns and common themes emerged across all interviews? Look for similarities in responses, shared concerns, and recurring topics.

2. DIVERSE PERSPECTIVES: What different viewpoints or unique insights did different personas provide? Highlight contrasting opinions or approaches.

3. PAIN POINTS & OPPORTUNITIES: What challenges, frustrations, or unmet needs were identified? What opportunities for improvement emerged?

4. ACTIONABLE RECOMMENDATIONS: Based on these insights, what specific actions should be taken? Provide concrete, implementable suggestions.

Keep the analysis thorough but well-organized and actionable.

Interview Data:
{interview_summary}
"""

def synthesis_node(state: InterviewState) -> Dict:
    """Synthesize insights from all interviews"""
    print("\n🧠 Analyzing all interviews...")

    # Compile all responses in a structured format
    interview_summary = f"Research Question: {state['research_question']}\n"
    interview_summary += f"Target Demographic: {state['target_demographic']}\n"
    interview_summary += f"Number of Interviews: {len(state['all_interviews'])}\n\n"

    for i, interview in enumerate(state['all_interviews'], 1):
        p = interview['persona']
        interview_summary += f"Interview {i} - {p.name} ({p.age}, {p.job}):\n"
        interview_summary += f"Persona Traits: {p.traits}\n"
        for j, qa in enumerate(interview['responses'], 1):
            interview_summary += f"Q{j}: {qa['question']}\n"
            interview_summary += f"A{j}: {qa['answer']}\n"
        interview_summary += "\n"

    prompt = synthesis_prompt_template.format(
        num_interviews=len(state['all_interviews']),
        research_question=state['research_question'],
        target_demographic=state['target_demographic'],
        interview_summary=interview_summary
    )

    try:
        synthesis = ask_ai(prompt)
    except Exception as e:
        synthesis = f"Error during synthesis: {e}\n\nRaw interview data available for manual analysis."

    # Display results with better formatting
    print("\n" + "="*60)
    print("🎯 COMPREHENSIVE RESEARCH INSIGHTS")
    print("="*60)
    print(f"Research Topic: {state['research_question']}")
    print(f"Demographic: {state['target_demographic']}")
    print(f"Interviews Conducted: {len(state['all_interviews'])}")
    print("-"*60)
    print(synthesis)
    print("="*60)

    return {"synthesis": synthesis}

print("✅ Core nodes ready")

Step 5: Interview Router

This router function determines the next step of our workflow. It decides whether to continue interviewing the current persona, move to the next persona, or end the process and synthesize results. The router checks our current progress and directs the workflow accordingly - this is what makes LangGraph powerful for complex multi-step processes.
def interview_router(state: InterviewState) -> str:
    """Route between continuing interviews or ending"""
    if state['current_persona_index'] >= len(state['personas']):
        return "synthesize"
    else:
        return "interview"

print("✅ Router ready")

Step 6: Build LangGraph Workflow

Now we’ll connect all our nodes into a complete workflow using LangGraph. This creates a multi-agent system where each node specializes in one task, and the router intelligently manages the flow between them. The workflow follows this path: Configuration → Persona Generation → Interview Loop → Synthesis
def build_interview_workflow():
    """Build the complete interview workflow graph"""
    workflow = StateGraph(InterviewState)

    # Add all our specialized nodes
    workflow.add_node("config", configuration_node)
    workflow.add_node("personas", persona_generation_node)
    workflow.add_node("interview", interview_node)
    workflow.add_node("synthesize", synthesis_node)

    # Define the workflow connections
    workflow.set_entry_point("config")
    workflow.add_edge("config", "personas")
    workflow.add_edge("personas", "interview")

    # Conditional routing based on interview progress
    workflow.add_conditional_edges(
        "interview",
        interview_router,
        {
            "interview": "interview",    # Continue interviewing
            "synthesize": "synthesize"   # All done, analyze results
        }
    )
    workflow.add_edge("synthesize", END)

    return workflow.compile()

print("✅ Workflow builder ready")

Step 7: Run the Complete System

This is the main function that executes our entire LangGraph workflow. It initializes the state, runs the multi-agent system, and delivers comprehensive user research insights. The workflow automatically handles the complex orchestration between configuration, persona generation, interviews, and synthesis.
def run_research_system():
    """Execute the complete LangGraph research workflow"""

    research_question = input("\nWhat research question would you like to explore? ")
    target_demographic = input("What kinds of users would you like to interview? ")

    workflow = build_interview_workflow()

    display(Image(workflow.get_graph(xray=True).draw_mermaid_png()))

    start_time = time.time()

    # Initialize state. This is needed before saving our values later
    initial_state = {
        "research_question": research_question,
        "target_demographic": target_demographic,
        "num_interviews": DEFAULT_NUM_INTERVIEWS,
        "num_questions": DEFAULT_NUM_QUESTIONS,
        "interview_questions": [],
        "personas": [],
        "current_persona_index": 0,
        "current_question_index": 0,
        "current_interview_history": [],
        "all_interviews": [],
        "synthesis": ""
    }

    try:
        final_state = workflow.invoke(initial_state, {"recursion_limit": 100})
        total_time = time.time() - start_time
        print(f"\n✅ Workflow complete! {len(final_state['all_interviews'])} interviews in {total_time:.1f}s")
        return final_state
    except Exception as e:
        print(f"❌ Error during workflow execution: {e}")
        return None

print("✅ Complete LangGraph system ready")
result = run_research_system()

Tracing and Evaluation

LangSmith is a platform for tracing, monitoring, and evaluating your LLM applications. It gives you visibility into the flow of data to and from the models and nodes of your graph, which is very handy during development. The instructions in Getting Started with LangSmith will help you get started.
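Because the LANGSMITH_* environment variables were set in Step 1, LangChain and LangGraph runs are traced automatically; no code changes are needed. As an optional sketch (the run name and tag are arbitrary examples), you can label an invocation through the standard RunnableConfig keys so it's easy to find in the LangSmith UI, reusing the workflow and initial_state from Step 7:

final_state = workflow.invoke(
    initial_state,
    {"recursion_limit": 100, "run_name": "user-research-demo", "tags": ["workshop"]}
)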

Optional: Follow-Up Question

If you’d like to add a little more complexity, we can redefine our interview node so that each persona is asked one follow-up question based on their previous answers.
followup_question_prompt = """
Generate ONE natural follow‑up question for {persona_name} based on their last answer:
"{previous_answer}"
Keep it conversational and dig a bit deeper.
"""

followup_answer_prompt = """
You are {persona_name}, a {persona_age}-year-old {persona_job} who is {persona_traits}.

Answer the follow‑up question below in 2‑4 sentences, staying authentic and specific.

Follow‑up question: {followup_question}

Answer as {persona_name}:
"""

# ── main node ────────────────────────────────────────────────────────────────
def interview_node(state: InterviewState) -> Dict:
    """Conduct interview with current persona (adds a single follow‑up)."""

    persona  = state['personas'][state['current_persona_index']]
    question = state['interview_questions'][state['current_question_index']]

    print(f"\n💬 Interview {state['current_persona_index'] + 1}/{len(state['personas'])} - {persona.name}")
    print(f"Q{state['current_question_index'] + 1}: {question}")

    # main answer
    prompt  = interview_prompt.format(
        persona_name   = persona.name,
        persona_age    = persona.age,
        persona_job    = persona.job,
        persona_traits = persona.traits,
        question       = question
    )
    answer  = ask_ai(prompt)
    print(f"A: {answer}")

    # update history
    history = state.get('current_interview_history', []) + [{
        "question"   : question,
        "answer"     : answer,
        "is_followup": False
    }]

    # ---------- if that was the last main question ----------
    if state['current_question_index'] + 1 >= len(state['interview_questions']):

        # ----- add ONE follow‑up (only if not done already) -----
        if not any(entry.get("is_followup") for entry in history):
            followup_q = ask_ai(
                followup_question_prompt.format(
                    persona_name    = persona.name,
                    previous_answer = answer
                )
            )
            print(f"🔄 Follow‑up: {followup_q}")

            followup_ans = ask_ai(
                followup_answer_prompt.format(
                    persona_name      = persona.name,
                    persona_age       = persona.age,
                    persona_job       = persona.job,
                    persona_traits    = persona.traits,
                    followup_question = followup_q
                )
            )
            print(f"A: {followup_ans}")

            history.append({
                "question"   : followup_q,
                "answer"     : followup_ans,
                "is_followup": True
            })

        # save interview & advance to next persona
        return {
            "all_interviews"         : state['all_interviews'] + [{
                'persona'  : persona,
                'responses': history
            }],
            "current_interview_history": [],
            "current_question_index"   : 0,
            "current_persona_index"    : state['current_persona_index'] + 1
        }

    # ---------- otherwise keep going through main questions ----------
    return {
        "current_interview_history": history,
        "current_question_index"  : state['current_question_index'] + 1
    }

This doesn’t require rebuilding the graph or router, since the redefined node keeps identical naming conventions. All that’s left is to run it:
result = run_research_system()