LangGraph is a powerful framework for building stateful, multi-agent applications with LLMs. By combining LangGraph’s orchestration capabilities with Cerebras’s ultra-fast inference, you can create sophisticated AI agents that respond in real-time.
LangGraph provides low-level infrastructure for building long-running, stateful workflows and agents. Learn more on the LangGraph website.
Prerequisites
Before you begin, ensure you have:
Cerebras API Key - Get a free API key here
Python 3.11.5 or higher - LangGraph requires Python 3.11.5+
Basic familiarity with LangChain - LangGraph builds on LangChain concepts
What You’ll Build
In this guide, you’ll learn how to:
Set up LangGraph with Cerebras Inference
Create a simple agent with tool calling
Build a stateful conversation flow
Implement streaming responses
Create multi-agent collaboration workflows
Create a virtual environment
First, create and activate a Python virtual environment to keep your dependencies isolated:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
Install required dependencies
Install LangGraph, LangChain, and the OpenAI SDK. Cerebras exposes an OpenAI-compatible API, so the OpenAI client works with it seamlessly:
pip install langgraph langchain langchain-openai openai python-dotenv
Configure environment variables
Create a .env file in your project directory to store your API key securely:
CEREBRAS_API_KEY=your-cerebras-api-key-here
Never commit your .env file to version control. Add it to your .gitignore file.
Initialize the Cerebras client
Set up the LangChain ChatOpenAI client to connect to Cerebras. This client handles all communication with Cerebras's ultra-fast inference API:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

# Load environment variables
load_dotenv()

# Initialize Cerebras LLM through LangChain
llm = ChatOpenAI(
    model="llama-3.3-70b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "langgraph"
    }
)
This client will be used by LangGraph to make inference calls to Cerebras’s ultra-fast models.
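Before wiring the client into a graph, you can sanity-check the connection with a direct call. A minimal check, reusing the llm defined above:
from langchain_core.messages import HumanMessage

# Quick connectivity test: one round trip to the Cerebras API
reply = llm.invoke([HumanMessage(content="Say hello in one sentence.")])
print(reply.content)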
Create a simple LangGraph agent
Let’s create a basic agent that can respond to user queries. This example demonstrates how to integrate Cerebras with LangGraph’s state management:
import os
from dotenv import load_dotenv
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Load environment variables
load_dotenv()

# Define the state structure
class AgentState(TypedDict):
    messages: list

# Initialize Cerebras LLM through LangChain
llm = ChatOpenAI(
    model="llama-3.3-70b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "langgraph"
    }
)

# Define the agent node
def call_model(state: AgentState):
    """Call the Cerebras model with the current conversation history."""
    messages = state["messages"]
    response = llm.invoke(messages)
    return {"messages": messages + [response]}

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.set_entry_point("agent")
workflow.add_edge("agent", END)

# Compile the graph
app = workflow.compile()

# Run the agent
result = app.invoke({
    "messages": [HumanMessage(content="What is the capital of France?")]
})
print(result["messages"][-1].content)
This creates a simple conversational agent that threads the message history through the graph’s state. Note that the state lives only for the duration of a single invoke unless you add a checkpointer (see Next Steps).
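Because the graph returns the full message list, you can keep a conversation going by feeding the accumulated history back into the next call. A minimal sketch, reusing app and result from above (the follow-up question is just an illustration):
# Continue the conversation by passing the prior messages back in
followup = app.invoke({
    "messages": result["messages"] + [HumanMessage(content="And what is its population?")]
})
print(followup["messages"][-1].content)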
Building a Web Search Agent
One of LangGraph’s most powerful features is tool calling. Here’s how to build an agent that can search the web using Cerebras:
import os
from dotenv import load_dotenv
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, ToolMessage
from langchain_core.tools import tool

load_dotenv()

# Define a simple search tool
@tool
def web_search(query: str) -> str:
    """Search the web for information."""
    # In production, integrate with a real search API
    return f"Search results for: {query}"

# Define state
class AgentState(TypedDict):
    messages: list

# Initialize LLM with tools
llm = ChatOpenAI(
    model="llama-3.3-70b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "langgraph"
    }
).bind_tools([web_search])

# Define agent logic
def call_model(state: AgentState):
    """Call the model and handle tool calls."""
    messages = state["messages"]
    response = llm.invoke(messages)
    return {"messages": messages + [response]}

def should_continue(state: AgentState):
    """Determine if we should continue to tools or end."""
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return "end"

def call_tools(state: AgentState):
    """Execute tool calls."""
    messages = state["messages"]
    last_message = messages[-1]
    tool_messages = []
    for tool_call in last_message.tool_calls:
        if tool_call["name"] == "web_search":
            result = web_search.invoke(tool_call["args"])
            tool_messages.append(
                ToolMessage(content=result, tool_call_id=tool_call["id"])
            )
    return {"messages": messages + tool_messages}

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", call_tools)
workflow.set_entry_point("agent")
workflow.add_conditional_edges(
    "agent",
    should_continue,
    {"tools": "tools", "end": END}
)
workflow.add_edge("tools", "agent")

# Compile and run
app = workflow.compile()
result = app.invoke({
    "messages": [HumanMessage(content="Search for the latest AI news")]
})
for message in result["messages"]:
    print(f"{message.__class__.__name__}: {message.content}")
Cerebras’s fast inference speeds are particularly beneficial for agentic workflows, where multiple LLM calls may be needed to complete a task.
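To turn the stubbed web_search tool into a real one, swap its body for a call to a search API. One option is the Tavily integration from langchain-community; this is a sketch, assuming you have installed langchain-community and set a TAVILY_API_KEY environment variable (any search API you prefer would work the same way):
from langchain_core.tools import tool
from langchain_community.tools.tavily_search import TavilySearchResults

# Backing search client (assumes TAVILY_API_KEY is set in the environment)
tavily = TavilySearchResults(max_results=3)

@tool
def web_search(query: str) -> str:
    """Search the web for information."""
    # Delegate to the real search API instead of returning a stub string
    results = tavily.invoke({"query": query})
    return str(results)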
Streaming Responses
LangGraph supports streaming, which is perfect for real-time applications. Here’s how to stream responses from Cerebras:
import os
from dotenv import load_dotenv
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from typing import TypedDict

load_dotenv()

class AgentState(TypedDict):
    messages: list

llm = ChatOpenAI(
    model="llama-3.3-70b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    streaming=True,
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "langgraph"
    }
)

def call_model(state: AgentState):
    messages = state["messages"]
    response = llm.invoke(messages)
    return {"messages": messages + [response]}

workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.set_entry_point("agent")
workflow.add_edge("agent", END)
app = workflow.compile()

# Stream the response
for chunk in app.stream({
    "messages": [HumanMessage(content="Tell me a short story")]
}):
    print(chunk)
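The loop above yields one chunk per graph step (node-level updates). For token-by-token output, recent LangGraph versions accept a stream_mode argument; a minimal sketch, reusing app and HumanMessage from the block above and assuming stream_mode="messages" is available in your installed version:
# Stream LLM tokens as they are generated, rather than whole node updates
for token, metadata in app.stream(
    {"messages": [HumanMessage(content="Tell me a short story")]},
    stream_mode="messages",
):
    print(token.content, end="", flush=True)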
Advanced: Multi-Agent Collaboration
LangGraph excels at orchestrating multiple agents. Here’s an example of two agents collaborating - a researcher and a writer:
import os
from dotenv import load_dotenv
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

load_dotenv()

class MultiAgentState(TypedDict):
    messages: list
    next_agent: str

# Initialize two different agents with different models
researcher = ChatOpenAI(
    model="llama-3.3-70b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "langgraph"
    }
)
writer = ChatOpenAI(
    model="qwen-3-32b",
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "langgraph"
    }
)

def research_node(state: MultiAgentState):
    """Research agent gathers information."""
    messages = state["messages"]
    system_msg = SystemMessage(content="You are a research assistant. Gather key facts.")
    response = researcher.invoke([system_msg] + messages)
    return {
        "messages": messages + [response],
        "next_agent": "writer"
    }

def writer_node(state: MultiAgentState):
    """Writer agent creates content from research."""
    messages = state["messages"]
    system_msg = SystemMessage(content="You are a writer. Create engaging content from the research.")
    response = writer.invoke([system_msg] + messages)
    return {
        "messages": messages + [response],
        "next_agent": "end"
    }

def route_agent(state: MultiAgentState):
    """Route to the next agent."""
    return state["next_agent"]

# Build the graph
workflow = StateGraph(MultiAgentState)
workflow.add_node("researcher", research_node)
workflow.add_node("writer", writer_node)
workflow.set_entry_point("researcher")
workflow.add_conditional_edges(
    "researcher",
    route_agent,
    {"writer": "writer", "end": END}
)
workflow.add_conditional_edges(
    "writer",
    route_agent,
    {"end": END}
)

app = workflow.compile()
result = app.invoke({
    "messages": [HumanMessage(content="Write a brief article about quantum computing")],
    "next_agent": "researcher"
})
print("\n=== Final Output ===")
print(result["messages"][-1].content)
Next Steps
Explore LangGraph Documentation - Visit the official LangGraph docs for advanced patterns
Try Different Models - Experiment with different Cerebras models like llama-3.3-70b, qwen-3-32b, llama3.1-8b, gpt-oss-120b, or zai-glm-4.6
Add Persistence - Use LangGraph’s checkpointing to save agent state between runs (see the sketch after this list)
Build RAG Agents - Combine LangGraph with vector databases for retrieval-augmented generation
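As referenced under Add Persistence above, here is a minimal checkpointing sketch using LangGraph’s in-memory saver, reusing the workflow and HumanMessage from the first example. Note that to accumulate messages across separate turns you would typically also declare the messages field with LangGraph’s add_messages reducer; a database-backed checkpointer is more appropriate for production:
from langgraph.checkpoint.memory import MemorySaver

# Compile the graph with a checkpointer so state is saved between invocations
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

# State is stored per thread_id, so later calls can resume the same thread
config = {"configurable": {"thread_id": "demo-thread"}}
result = app.invoke(
    {"messages": [HumanMessage(content="What is the capital of France?")]},
    config=config,
)

# Inspect the persisted state for this thread
print(app.get_state(config).values["messages"][-1].content)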
Want to migrate to the latest model? Try zai-glm-4.6.
FAQ
Why am I getting 'model not found' errors?
Make sure you’re using one of the available Cerebras models with the correct format:
llama-3.3-70b
qwen-3-32b
llama3.1-8b
gpt-oss-120b
zai-glm-4.6
The model name should match exactly as shown above (without any prefix).
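To double-check which model IDs your key can access, you can query the models endpoint directly with the OpenAI SDK. A small sketch; it assumes Cerebras exposes the standard OpenAI-compatible /v1/models route:
import os
from openai import OpenAI

# Point the OpenAI client at the Cerebras API
client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
)

# List the model IDs available to your account
for model in client.models.list().data:
    print(model.id)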
How do I handle rate limits?
Cerebras has generous rate limits, but if you’re building high-throughput applications, consider:
Implementing exponential backoff (see the sketch after this list)
Using LangGraph’s built-in retry mechanisms
Batching requests when possible
Monitoring your usage through the Cerebras Cloud dashboard
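For the exponential backoff mentioned above, here is a minimal hand-rolled sketch; in practice you might prefer LangChain’s built-in retry support on the runnable instead:
import random
import time

def invoke_with_backoff(llm, messages, max_retries=5):
    """Retry an LLM call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return llm.invoke(messages)
        except Exception:  # ideally catch only rate-limit errors here
            if attempt == max_retries - 1:
                raise
            # Wait 1s, 2s, 4s, ... plus a little jitter before retrying
            time.sleep(2 ** attempt + random.random())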
Can I use LangGraph with streaming?
Yes! Set streaming=True when initializing the ChatOpenAI client. LangGraph will automatically handle streaming responses. Cerebras’s fast inference makes streaming particularly smooth and responsive.
How do I debug my LangGraph workflows?
LangGraph provides excellent debugging tools:
Use app.get_graph().print_ascii() to visualize your workflow (see the sketch after this list)
Enable verbose logging with langchain.debug = True
Print intermediate state values to understand the flow
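For example, to visualize the compiled graph and watch state evolve step by step, reusing the app and HumanMessage from the earlier examples (the ASCII rendering requires the grandalf package to be installed):
# Print an ASCII rendering of the graph topology
app.get_graph().print_ascii()

# Stream the graph to inspect each node's state update as it happens
for step in app.stream({"messages": [HumanMessage(content="Hello")]}):
    print(step)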
What's the advantage of using Cerebras with LangGraph?
Cerebras provides ultra-fast inference speeds, which is particularly beneficial for:
Agentic workflows that require multiple LLM calls
Real-time applications that need low latency
Interactive agents that benefit from quick response times
Multi-agent systems where speed compounds across multiple calls
Additional Resources