This cookbook shows how to build a grounded research agent that can:
  • Search the web for current information with Exa
  • Hand the results to a Cerebras model through tool calling
  • Return answers with inline citations and a clean source list
Exa search returns clean page content (highlights) with every result, so a single search tool is enough to ground your AI agent.

Prerequisites

Before you begin, ensure you have:
  • A Cerebras API key
  • An Exa API key
  • Python 3.10+ or Node.js 18+
Install the dependencies:
pip install "exa-py>=2.0" openai python-dotenv
The Node.js examples use ES modules and top-level await. Save them with a .mjs extension (or set "type": "module" in your package.json) and run them with node file.mjs.
Then store your API keys in a .env file:
CEREBRAS_API_KEY=your-cerebras-api-key
EXA_API_KEY=your-exa-api-key
Get your API keys from Cerebras and Exa.
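A missing key otherwise surfaces later as a KeyError when the clients are created. The sketch below is one way to fail early with a clearer message. The helper name missing_keys is hypothetical and not part of either SDK; in a real script you would call load_dotenv() first so the .env values are present in os.environ.

```python
import os

# The two variable names used in the .env file above.
REQUIRED_KEYS = ("CEREBRAS_API_KEY", "EXA_API_KEY")

def missing_keys(env=os.environ):
    """Return the names of required keys that are unset or blank (hypothetical helper)."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]

# Passing a plain dict keeps the example independent of your real environment.
print(missing_keys({"CEREBRAS_API_KEY": "abc", "EXA_API_KEY": ""}))
```

Calling missing_keys() with no arguments checks the real environment; an empty list means both keys are set.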

Step 1: Initialize the Clients

We use Exa for search and the OpenAI client against Cerebras’ OpenAI-compatible API for agent reasoning and tool use.
import json
import os
import re
from dotenv import load_dotenv
from exa_py import Exa
from openai import OpenAI

load_dotenv()

exa = Exa(api_key=os.environ["EXA_API_KEY"])
exa.headers["x-exa-integration"] = "cerebras-integration"

cerebras = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",
    default_headers={"X-Cerebras-3rd-Party-Integration": "exa"},
)

Step 2: Define the Exa Search Tool

The agent gets one tool: exa_search. It returns clean highlights for each result, with each source tagged [n] so the model can cite it. A finalize helper cleans up the model’s output and appends a numbered source list, so every answer ends with reliable citations.
sources = []
index_by_url = {}

def register(title, url):
    if url not in index_by_url:
        sources.append((title or url, url))
        index_by_url[url] = len(sources)
    return index_by_url[url]

def exa_search(query, type="auto", num_results=10, max_age_hours=None, **_):
    contents = {"highlights": True}
    if max_age_hours is not None:
        contents["max_age_hours"] = max_age_hours
    results = exa.search(query, type=type, num_results=num_results, contents=contents)
    return "\n\n".join(
        f"[{register(r.title, r.url)}] {r.title or r.url}\nURL: {r.url}\n{' '.join(r.highlights or [])}"
        for r in results.results
    )

# Remove stray citation markers (e.g. 【†L1-L9】) the model sometimes adds:
# first whole 【...】 spans, then any leftover 【, 】, or † characters.
GARBAGE = re.compile(r"【[^】]*】|[【】†]")

def finalize(answer):
    answer = GARBAGE.sub("", answer)
    answer = re.sub(r"\[\[(\d+)\]\]", r"[\1]", answer).strip()
    if not sources:
        return answer
    lines = "\n".join(f"[{i}] {title} - {url}" for i, (title, url) in enumerate(sources, 1))
    return f"{answer}\n\nSources:\n{lines}"
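Because sources and index_by_url are module-level, calling the agent several times in one process keeps adding to the same source list. The sketch below shows the same numbering logic wrapped in a per-run object, which also makes the deduplication easy to try offline. SourceRegistry and source_list are hypothetical names introduced for illustration, and the example.com URLs are placeholders.

```python
class SourceRegistry:
    """Hypothetical per-run replacement for the module-level sources/index_by_url."""

    def __init__(self):
        self.sources = []        # (title, url) pairs in first-seen order
        self.index_by_url = {}   # url -> 1-based citation number

    def register(self, title, url):
        # Reuse the existing number when the same URL shows up again.
        if url not in self.index_by_url:
            self.sources.append((title or url, url))
            self.index_by_url[url] = len(self.sources)
        return self.index_by_url[url]

    def source_list(self):
        # Same "[n] title - url" layout that finalize appends to the answer.
        return "\n".join(
            f"[{i}] {title} - {url}" for i, (title, url) in enumerate(self.sources, 1)
        )

registry = SourceRegistry()
first = registry.register("Example A", "https://example.com/a")
second = registry.register(None, "https://example.com/b")
repeat = registry.register("Example A (again)", "https://example.com/a")
print(first, second, repeat)  # 1 2 1
print(registry.source_list())
```

Creating a fresh registry at the start of each question is one way to keep the citation numbers of separate answers from bleeding into each other.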

Step 3: Register the Tool for the Model

The schema exposes the three search types so the model can choose faster or deeper search per query. Only query is required; everything else is optional.
tools = [
    {
        "type": "function",
        "function": {
            "name": "exa_search",
            "description": "Search the web with Exa and get clean, ready-to-use results. Best for current information, news, facts, people, and companies. Returns numbered sources [n] with title, URL, and highlights.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query."},
                    "type": {
                        "type": "string",
                        "enum": ["auto", "fast", "deep"],
                        "description": "Search strategy. 'auto' (default and recommended) balances quality and speed; 'fast' is the lowest-latency option; 'deep' is most thorough.",
                    },
                    "num_results": {
                        "type": "integer",
                        "description": "Number of results to return (1-100, default 10).",
                    },
                    "max_age_hours": {
                        "type": "integer",
                        "description": "Only accept cached pages newer than this many hours; older pages are refreshed before returning. Omit for no freshness limit, 0 to always fetch fresh content, or -1 to use cached content only.",
                    },
                },
                "required": ["query"],
            },
        },
    }
]

available_tools = {"exa_search": exa_search}
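The schema describes the allowed values, but the model can still send arguments outside them. A minimal sketch of a cleanup step that could run before exa_search is shown below. normalize_args is a hypothetical helper, and falling back to "auto" for an unrecognized type is an assumption of this sketch rather than documented Exa behavior.

```python
ALLOWED_TYPES = ("auto", "fast", "deep")  # mirrors the enum in the tool schema

def normalize_args(args):
    """Return a cleaned copy of model-supplied exa_search arguments (hypothetical helper)."""
    query = str(args.get("query", "")).strip()
    if not query:
        raise ValueError("query must be a non-empty string")
    cleaned = {"query": query}
    # Fall back to the default strategy when the model invents a value.
    search_type = args.get("type", "auto")
    cleaned["type"] = search_type if search_type in ALLOWED_TYPES else "auto"
    # Clamp to the 1-100 range advertised in the schema.
    cleaned["num_results"] = max(1, min(100, int(args.get("num_results", 10))))
    # Only forward max_age_hours when the model actually set it.
    if args.get("max_age_hours") is not None:
        cleaned["max_age_hours"] = int(args["max_age_hours"])
    return cleaned

print(normalize_args({"query": "  ai agents ", "type": "turbo", "num_results": 500}))
```

Any ValueError raised here would be caught by the agent loop and passed back to the model as a tool error, so the model gets a chance to correct its input.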

Step 4: Run the Agent Loop

The core pattern is:
  1. Ask the model what it needs
  2. Let it call the search tool
  3. Feed tool results back into the conversation
  4. Stop when the model returns a final answer
The loop has a step limit, calls tools safely, and passes any tool error back to the model as a tool message so it can fix its input instead of crashing.
def run_research_agent(question):
    messages = [
        {
            "role": "system",
            "content": (
                "You are a research analyst. Use exa_search to find current sources, then answer "
                "the question. Cite sources inline as [n], matching the labels returned by "
                "exa_search (for example [1] or [2])."
            ),
        },
        {"role": "user", "content": question},
    ]

    for _ in range(6):
        response = cerebras.chat.completions.create(
            model="gpt-oss-120b",
            messages=messages,
            tools=tools,
            tool_choice="auto",
            max_completion_tokens=2000,
        )
        message = response.choices[0].message
        messages.append(message)

        if not message.tool_calls:
            return finalize(message.content or "")

        for tool_call in message.tool_calls:
            tool_fn = available_tools.get(tool_call.function.name)
            try:
                args = json.loads(tool_call.function.arguments)
                result = tool_fn(**args) if tool_fn else f"Unknown tool: {tool_call.function.name}"
            except Exception as e:
                result = f"Tool error ({type(e).__name__}): {e}. Adjust your arguments and try again."
            messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

    return "Could not produce a final answer within the step limit."
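The error handling in the loop above can be exercised without any API calls. The sketch below pulls the per-call logic into a standalone function and runs it against a stand-in tool. call_tool and fake_search are hypothetical names used only for this illustration.

```python
import json

def call_tool(name, raw_arguments, registry):
    """Run one tool call and always return a string suitable for a tool message."""
    tool_fn = registry.get(name)
    if tool_fn is None:
        return f"Unknown tool: {name}"
    try:
        args = json.loads(raw_arguments)
        return str(tool_fn(**args))
    except Exception as e:
        # Malformed JSON and bad keyword arguments both end up here.
        return f"Tool error ({type(e).__name__}): {e}. Adjust your arguments and try again."

def fake_search(query, num_results=10):
    # Stand-in for exa_search so the example runs without network access.
    return f"{num_results} results for {query!r}"

registry = {"fake_search": fake_search}
print(call_tool("fake_search", '{"query": "cerebras"}', registry))
print(call_tool("fake_search", '{"query": ', registry))   # truncated JSON
print(call_tool("fake_search", '{"q": "typo"}', registry))  # wrong argument name
print(call_tool("missing", "{}", registry))
```

In every case the function returns text rather than raising, which is what lets the model read the error and retry with corrected arguments.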

Step 5: Try It on a Real Question

Now you can ask for a grounded answer. The agent searches the web, then writes a cited answer.
question = "How are AI agents being used in production today? Cite specific examples."
answer = run_research_agent(question)
print(answer)
Inline [n] markers map to the numbered Sources list at the end of the answer. Non-consecutive citations like [1], [2], and [4] are expected when the model cites only some of the results.
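Gaps in the numbering are fine, but a citation that points past the end of the source list is not. A small check along these lines can flag such cases before you show the answer to a user. Both function names are hypothetical, and the sketch assumes the answer uses the "Sources:" layout produced by finalize.

```python
import re

def cited_numbers(answer):
    """Return the sorted set of [n] markers found before the Sources list."""
    body = answer.split("\n\nSources:\n", 1)[0]
    return sorted({int(n) for n in re.findall(r"\[(\d+)\]", body)})

def unknown_citations(answer, source_count):
    """Return cited numbers that have no matching entry in the source list."""
    return [n for n in cited_numbers(answer) if not 1 <= n <= source_count]

sample = (
    "Agents triage tickets [1] and review code [2][4]."
    "\n\nSources:\n[1] A - https://example.com/a"
)
print(cited_numbers(sample))         # [1, 2, 4]
print(unknown_citations(sample, 3))  # [4]
```

With the module-level list from Step 2, the call would be unknown_citations(answer, len(sources)).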

Complete Example

The full agent in a single file. Copy it into agent.py (or agent.mjs) and run it.
import json
import os
import re
from dotenv import load_dotenv
from exa_py import Exa
from openai import OpenAI

load_dotenv()

exa = Exa(api_key=os.environ["EXA_API_KEY"])
exa.headers["x-exa-integration"] = "cerebras-integration"

cerebras = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",
    default_headers={"X-Cerebras-3rd-Party-Integration": "exa"},
)


sources = []
index_by_url = {}

def register(title, url):
    if url not in index_by_url:
        sources.append((title or url, url))
        index_by_url[url] = len(sources)
    return index_by_url[url]

def exa_search(query, type="auto", num_results=10, max_age_hours=None, **_):
    contents = {"highlights": True}
    if max_age_hours is not None:
        contents["max_age_hours"] = max_age_hours
    results = exa.search(query, type=type, num_results=num_results, contents=contents)
    return "\n\n".join(
        f"[{register(r.title, r.url)}] {r.title or r.url}\nURL: {r.url}\n{' '.join(r.highlights or [])}"
        for r in results.results
    )

# Remove stray citation markers (e.g. 【†L1-L9】) the model sometimes adds:
# first whole 【...】 spans, then any leftover 【, 】, or † characters.
GARBAGE = re.compile(r"【[^】]*】|[【】†]")

def finalize(answer):
    answer = GARBAGE.sub("", answer)
    answer = re.sub(r"\[\[(\d+)\]\]", r"[\1]", answer).strip()
    if not sources:
        return answer
    lines = "\n".join(f"[{i}] {title} - {url}" for i, (title, url) in enumerate(sources, 1))
    return f"{answer}\n\nSources:\n{lines}"


tools = [
    {
        "type": "function",
        "function": {
            "name": "exa_search",
            "description": "Search the web with Exa and get clean, ready-to-use results. Best for current information, news, facts, people, and companies. Returns numbered sources [n] with title, URL, and highlights.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query."},
                    "type": {
                        "type": "string",
                        "enum": ["auto", "fast", "deep"],
                        "description": "Search strategy. 'auto' (default and recommended) balances quality and speed; 'fast' is the lowest-latency option; 'deep' is most thorough.",
                    },
                    "num_results": {
                        "type": "integer",
                        "description": "Number of results to return (1-100, default 10).",
                    },
                    "max_age_hours": {
                        "type": "integer",
                        "description": "Only accept cached pages newer than this many hours; older pages are refreshed before returning. Omit for no freshness limit, 0 to always fetch fresh content, or -1 to use cached content only.",
                    },
                },
                "required": ["query"],
            },
        },
    }
]

available_tools = {"exa_search": exa_search}


def run_research_agent(question):
    messages = [
        {
            "role": "system",
            "content": (
                "You are a research analyst. Use exa_search to find current sources, then answer "
                "the question. Cite sources inline as [n], matching the labels returned by "
                "exa_search (for example [1] or [2])."
            ),
        },
        {"role": "user", "content": question},
    ]

    for _ in range(6):
        response = cerebras.chat.completions.create(
            model="gpt-oss-120b",
            messages=messages,
            tools=tools,
            tool_choice="auto",
            max_completion_tokens=2000,
        )
        message = response.choices[0].message
        messages.append(message)

        if not message.tool_calls:
            return finalize(message.content or "")

        for tool_call in message.tool_calls:
            tool_fn = available_tools.get(tool_call.function.name)
            try:
                args = json.loads(tool_call.function.arguments)
                result = tool_fn(**args) if tool_fn else f"Unknown tool: {tool_call.function.name}"
            except Exception as e:
                result = f"Tool error ({type(e).__name__}): {e}. Adjust your arguments and try again."
            messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

    return "Could not produce a final answer within the step limit."


question = "How are AI agents being used in production today? Cite specific examples."
answer = run_research_agent(question)
print(answer)

Summary

What We Built

A grounded research agent with:
  • Exa search for current source discovery, with page content (highlights) returned inline
  • Cerebras tool calling to plan searches and write cited answers
  • Reliable inline citations backed by a numbered source list

Next Steps

  • Use fast for low-latency chat assistants and deep for broader research tasks
  • Lower max_age_hours for newsy queries that need fresher content
  • Try other Exa API configurations in the Exa API Dashboard
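The first two suggestions above can be captured as named presets. The sketch below is one possible shape. The profile names and the specific numbers (5 results, 20 results, 24 hours) are illustrative assumptions, not recommendations from Exa or Cerebras.

```python
# Hypothetical presets; tune the values for your own workload.
SEARCH_PROFILES = {
    "chat": {"type": "fast", "num_results": 5},
    "research": {"type": "deep", "num_results": 20},
    "news": {"type": "auto", "max_age_hours": 24},
}

def search_kwargs(profile, **overrides):
    """Return keyword arguments for exa_search, with optional per-call overrides."""
    base = dict(SEARCH_PROFILES.get(profile, {"type": "auto"}))
    base.update(overrides)
    return base

print(search_kwargs("news"))
print(search_kwargs("chat", num_results=3))
```

With the exa_search function from Step 2, a call would look like exa_search("latest chip announcements", **search_kwargs("news")).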


Acknowledgements

Thank you to Ishan Goswami from Exa for his collaboration and feedback during the development of this cookbook.