LangGraph is a powerful framework for building stateful, multi-agent applications with LLMs. By combining LangGraph’s orchestration capabilities with Cerebras’s ultra-fast inference, you can create sophisticated AI agents that respond in real time. LangGraph provides low-level infrastructure for building long-running, stateful workflows and agents. Learn more on the LangGraph website.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key from the Cerebras Cloud dashboard
  • Python 3.11.5 or higher - LangGraph requires Python 3.11.5+
  • Basic familiarity with LangChain - LangGraph builds on LangChain concepts

What You’ll Build

In this guide, you’ll learn how to:
  • Set up LangGraph with Cerebras Inference
  • Create a simple agent with tool calling
  • Build a stateful conversation flow
  • Implement streaming responses
  • Create multi-agent collaboration workflows

Configure LangGraph with Cerebras

Step 1: Create a virtual environment

First, create and activate a Python virtual environment to keep your dependencies isolated:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Step 2: Install required dependencies

Install LangGraph, LangChain, and the langchain-openai package (which builds on the OpenAI SDK). Because Cerebras exposes an OpenAI-compatible API, the same client works seamlessly with Cerebras:
pip install langgraph langchain langchain-openai openai python-dotenv

Step 3: Configure environment variables

Create a .env file in your project directory to store your API key securely:
CEREBRAS_API_KEY=your-cerebras-api-key-here
Never commit your .env file to version control. Add it to your .gitignore file.
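For example, from your project root:
echo ".env" >> .gitignore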

Step 4: Initialize the Cerebras client

Set up the LangChain ChatOpenAI client to connect to Cerebras. This client handles all communication with Cerebras’s ultra-fast inference API:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

# Load environment variables
load_dotenv()

# Initialize Cerebras LLM through LangChain
llm = ChatOpenAI(
    model="llama-3.3-70b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "langgraph"
    }
)
This client will be used by LangGraph to make inference calls to Cerebras’s ultra-fast models.
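As a quick sanity check, you can call the client directly before wiring it into a graph:

# One-off call to verify connectivity and credentials
response = llm.invoke("Say hello in one sentence.")
print(response.content)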

Step 5: Create a simple LangGraph agent

Let’s create a basic agent that can respond to user queries. This example demonstrates how to integrate Cerebras with LangGraph’s state management:
import os
from dotenv import load_dotenv
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Load environment variables
load_dotenv()

# Define the state structure
class AgentState(TypedDict):
    messages: list

# Initialize Cerebras LLM through LangChain
llm = ChatOpenAI(
    model="llama-3.3-70b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "langgraph"
    }
)

# Define the agent node
def call_model(state: AgentState):
    """Call the Cerebras model with the current conversation history."""
    messages = state["messages"]
    response = llm.invoke(messages)
    return {"messages": messages + [response]}

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.set_entry_point("agent")
workflow.add_edge("agent", END)

# Compile the graph
app = workflow.compile()

# Run the agent
result = app.invoke({
    "messages": [HumanMessage(content="What is the capital of France?")]
})

print(result["messages"][-1].content)
This creates a simple conversational agent. State flows through the graph within a single invocation; to persist conversation history across separate invocations, add a checkpointer (see Next Steps).
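As an aside, LangGraph also ships an add_messages reducer that appends returned messages to state automatically, so nodes only return the new message instead of rebuilding the whole list. A minimal sketch, assuming a recent langgraph release:

from typing import Annotated
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # add_messages merges each node's returned messages into the list
    messages: Annotated[list, add_messages]

def call_model(state: AgentState):
    # With the reducer, return only the new message
    return {"messages": [llm.invoke(state["messages"])]}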

Building a Web Search Agent

One of LangGraph’s most powerful features is orchestrating tool calls: the model decides when to invoke external functions, and the graph routes execution accordingly. Here’s how to build an agent that can search the web using Cerebras:
import os
from dotenv import load_dotenv
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, ToolMessage
from langchain_core.tools import tool

load_dotenv()

# Define a simple search tool
@tool
def web_search(query: str) -> str:
    """Search the web for information."""
    # In production, integrate with a real search API
    return f"Search results for: {query}"

# Define state
class AgentState(TypedDict):
    messages: list

# Initialize LLM with tools
llm = ChatOpenAI(
    model="llama-3.3-70b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "langgraph"
    }
).bind_tools([web_search])

# Define agent logic
def call_model(state: AgentState):
    """Call the model and handle tool calls."""
    messages = state["messages"]
    response = llm.invoke(messages)
    return {"messages": messages + [response]}

def should_continue(state: AgentState):
    """Determine if we should continue to tools or end."""
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return "end"

def call_tools(state: AgentState):
    """Execute tool calls."""
    messages = state["messages"]
    last_message = messages[-1]
    
    tool_messages = []
    for tool_call in last_message.tool_calls:
        if tool_call["name"] == "web_search":
            result = web_search.invoke(tool_call["args"])
            tool_messages.append(
                ToolMessage(content=result, tool_call_id=tool_call["id"])
            )
    
    return {"messages": messages + tool_messages}

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", call_tools)
workflow.set_entry_point("agent")
workflow.add_conditional_edges(
    "agent",
    should_continue,
    {"tools": "tools", "end": END}
)
workflow.add_edge("tools", "agent")

# Compile and run
app = workflow.compile()

result = app.invoke({
    "messages": [HumanMessage(content="Search for the latest AI news")]
})

for message in result["messages"]:
    print(f"{message.__class__.__name__}: {message.content}")
Cerebras’s fast inference speeds are particularly beneficial for agentic workflows, where multiple LLM calls may be needed to complete a task.
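For less hand-written plumbing, recent langgraph releases also ship prebuilt helpers that can replace the should_continue and call_tools functions above. A sketch, assuming the messages key uses the add_messages reducer shown earlier:

from typing import Annotated
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

class ToolAgentState(TypedDict):
    # ToolNode and tools_condition expect a reducer-managed messages key
    messages: Annotated[list, add_messages]

def call_model(state: ToolAgentState):
    return {"messages": [llm.invoke(state["messages"])]}

workflow = StateGraph(ToolAgentState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", ToolNode([web_search]))  # executes pending tool calls
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", tools_condition)  # routes to "tools" or END
workflow.add_edge("tools", "agent")
app = workflow.compile()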

Streaming Responses

LangGraph supports streaming, which is perfect for real-time applications. Here’s how to stream responses from Cerebras:
import os
from dotenv import load_dotenv
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from typing import TypedDict

load_dotenv()

class AgentState(TypedDict):
    messages: list

llm = ChatOpenAI(
    model="llama-3.3-70b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    streaming=True,
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "langgraph"
    }
)

def call_model(state: AgentState):
    messages = state["messages"]
    response = llm.invoke(messages)
    return {"messages": messages + [response]}

workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.set_entry_point("agent")
workflow.add_edge("agent", END)

app = workflow.compile()

# Stream graph updates (the default stream mode yields one chunk
# per completed node, not individual tokens)
for chunk in app.stream({
    "messages": [HumanMessage(content="Tell me a short story")]
}):
    print(chunk)
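To stream individual tokens instead, recent LangGraph releases support stream_mode="messages", which yields (token_chunk, metadata) tuples as the model generates. A minimal sketch, assuming a recent langgraph version:

# Token-level streaming: print each chunk as it arrives
for token, metadata in app.stream(
    {"messages": [HumanMessage(content="Tell me a short story")]},
    stream_mode="messages",
):
    print(token.content, end="", flush=True)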

Advanced: Multi-Agent Collaboration

LangGraph excels at orchestrating multiple agents. Here’s an example of two agents collaborating: a researcher and a writer.
import os
from dotenv import load_dotenv
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

load_dotenv()

class MultiAgentState(TypedDict):
    messages: list
    next_agent: str

# Initialize two agents backed by different Cerebras models
researcher = ChatOpenAI(
    model="llama-3.3-70b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "langgraph"
    }
)

writer = ChatOpenAI(
    model="qwen-3-32b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "langgraph"
    }
)

def research_node(state: MultiAgentState):
    """Research agent gathers information."""
    messages = state["messages"]
    system_msg = SystemMessage(content="You are a research assistant. Gather key facts.")
    response = researcher.invoke([system_msg] + messages)
    return {
        "messages": messages + [response],
        "next_agent": "writer"
    }

def writer_node(state: MultiAgentState):
    """Writer agent creates content from research."""
    messages = state["messages"]
    system_msg = SystemMessage(content="You are a writer. Create engaging content from the research.")
    response = writer.invoke([system_msg] + messages)
    return {
        "messages": messages + [response],
        "next_agent": "end"
    }

def route_agent(state: MultiAgentState):
    """Route to the next agent."""
    return state["next_agent"]

# Build the graph
workflow = StateGraph(MultiAgentState)
workflow.add_node("researcher", research_node)
workflow.add_node("writer", writer_node)
workflow.set_entry_point("researcher")
workflow.add_conditional_edges(
    "researcher",
    route_agent,
    {"writer": "writer", "end": END}
)
workflow.add_conditional_edges(
    "writer",
    route_agent,
    {"end": END}
)

app = workflow.compile()

result = app.invoke({
    "messages": [HumanMessage(content="Write a brief article about quantum computing")],
    "next_agent": "researcher"
})

print("\n=== Final Output ===")
print(result["messages"][-1].content)

Next Steps

  • Explore LangGraph Documentation - Visit the official LangGraph docs for advanced patterns
  • Try Different Models - Experiment with different Cerebras models like llama-3.3-70b, qwen-3-32b, llama3.1-8b, gpt-oss-120b, or zai-glm-4.6
  • Add Persistence - Use LangGraph’s checkpointing to save agent state between runs (see the sketch after this list)
  • Build RAG Agents - Combine LangGraph with vector databases for retrieval-augmented generation
  • Migrate to GLM 4.6 - Want to move to the best model? Try zai-glm-4.6
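A minimal persistence sketch using LangGraph’s in-memory checkpointer (assuming a recent langgraph release and the add_messages state shown earlier; use a database-backed checkpointer in production):

from langgraph.checkpoint.memory import MemorySaver

# Compile with a checkpointer so conversation state survives across invocations
app = workflow.compile(checkpointer=MemorySaver())

# The thread_id ties separate invocations to one persistent conversation
config = {"configurable": {"thread_id": "conversation-1"}}
app.invoke({"messages": [HumanMessage(content="Hi, I'm Alice")]}, config)
result = app.invoke({"messages": [HumanMessage(content="What's my name?")]}, config)
print(result["messages"][-1].content)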

FAQ

Why am I getting a model not found error?
Make sure you’re using one of the available Cerebras models with the correct format:
  • llama-3.3-70b
  • qwen-3-32b
  • llama3.1-8b
  • gpt-oss-120b
  • zai-glm-4.6
The model name should match exactly as shown above (without any prefix).
How should I handle rate limits?
Cerebras has generous rate limits, but if you’re building high-throughput applications, consider:
  • Implementing exponential backoff (see the sketch after this list)
  • Using LangGraph’s built-in retry mechanisms
  • Batching requests when possible
  • Monitoring your usage through the Cerebras Cloud dashboard
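A minimal backoff sketch using LangChain’s Runnable retry helper (with_retry is part of the langchain-core Runnable interface; parameter names assume a recent release):

# Wrap the LLM with exponential backoff and jitter
resilient_llm = llm.with_retry(
    wait_exponential_jitter=True,
    stop_after_attempt=5,
)
response = resilient_llm.invoke("Hello")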
Can I stream responses from Cerebras?
Yes! Set streaming=True when initializing the ChatOpenAI client. LangGraph will automatically handle streaming responses. Cerebras’s fast inference makes streaming particularly smooth and responsive.
How do I debug my LangGraph workflow?
LangGraph provides excellent debugging tools; a short setup sketch follows this list:
  • Use app.get_graph().print_ascii() to visualize your workflow
  • Enable verbose logging with langchain.debug = True
  • Print intermediate state values to understand the flow
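For example (in recent LangChain releases, set_debug lives in langchain.globals):

from langchain.globals import set_debug

# Log every prompt and response from the LLM
set_debug(True)

# Print an ASCII diagram of the compiled workflow
app.get_graph().print_ascii()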
Why pair Cerebras with LangGraph?
Cerebras provides ultra-fast inference speeds, which is particularly beneficial for:
  • Agentic workflows that require multiple LLM calls
  • Real-time applications that need low latency
  • Interactive agents that benefit from quick response times
  • Multi-agent systems where speed compounds across multiple calls
