Exa is one of the fastest and most accurate web search APIs, built for AI applications. By combining Exa’s search API with Cerebras Inference, you can ground responses in current web content while keeping agent latency low.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key
  • Exa API Key
  • Python or Node.js

Configure Exa with Cerebras

1. Install required dependencies

Install the Exa SDK and the OpenAI client library. The OpenAI client is used to connect to Cerebras’ OpenAI-compatible API.
pip install "exa-py>=2.0" openai python-dotenv
The Node.js examples use ES modules and top-level await. Save them with a .mjs extension (or set "type": "module" in your package.json) and run them with node file.mjs.
2. Configure environment variables

Create a .env file in your project directory to securely store your API keys:
CEREBRAS_API_KEY=your-cerebras-api-key
EXA_API_KEY=your-exa-api-key
Get your keys here: Cerebras and Exa.
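
Before running the examples, it can help to confirm that both keys are actually visible to your process. The following is a minimal sketch, not part of either SDK: the helper name missing_keys is hypothetical, and it only inspects the environment, so call load_dotenv() first if your keys live in a .env file.

```python
import os

# The two variable names used throughout this guide.
REQUIRED_KEYS = ("CEREBRAS_API_KEY", "EXA_API_KEY")

def missing_keys(env=None):
    """Return the required key names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]

missing = missing_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Both API keys are set.")
```

Checking up front gives a clearer message than the KeyError that os.environ["EXA_API_KEY"] raises when a key is absent.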
3. Perform your first grounded web search

This example uses Exa search to gather current web results, then asks a Cerebras model to combine them into a short answer.
import os
from dotenv import load_dotenv
from exa_py import Exa
from openai import OpenAI

load_dotenv()

exa = Exa(api_key=os.environ["EXA_API_KEY"])
exa.headers["x-exa-integration"] = "cerebras-integration"

cerebras = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",
    default_headers={"X-Cerebras-3rd-Party-Integration": "exa"},
)

results = exa.search(
    "latest developments in AI agents",
    type="auto",
    num_results=10,
    contents={"highlights": True},
)

context = "\n\n".join(
    f"Source: {result.url}\n{' '.join(result.highlights or [])}"
    for result in results.results
)

response = cerebras.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {
            "role": "system",
            "content": "You are a research assistant. Using only the search results provided, write a short summary. Do not include citation markers or footnotes.",
        },
        {
            "role": "user",
            "content": f"Search results:\n\n{context}\n\nSummarize the latest developments in AI agents.",
        },
    ],
)

print(response.choices[0].message.content)
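
The context-building step above can be pulled into a small function and tried offline. This is a sketch under stated assumptions: build_context and its max_chars budget (8000 here, an arbitrary value) are hypothetical additions, and the SimpleNamespace objects only stand in for Exa results with url and highlights attributes.

```python
from types import SimpleNamespace

def build_context(results, max_chars=8000):
    """Join each result's URL and highlights, stopping before max_chars is exceeded."""
    blocks = []
    used = 0
    for result in results:
        block = f"Source: {result.url}\n{' '.join(result.highlights or [])}"
        cost = len(block) + (2 if blocks else 0)  # 2 accounts for the "\n\n" separator
        if used + cost > max_chars:
            break
        blocks.append(block)
        used += cost
    return "\n\n".join(blocks)

# Stand-in results so the sketch runs without an API key.
fake_results = [
    SimpleNamespace(url="https://example.com/a", highlights=["First point.", "Second point."]),
    SimpleNamespace(url="https://example.com/b", highlights=None),
]
print(build_context(fake_results))
```

A character budget is one simple way to keep the prompt bounded when num_results is large; a token-based limit would be more precise but needs a tokenizer.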

Search types and freshness controls

Exa supports a few search modes with different speed and coverage tradeoffs:
Search type   Best for
auto          (Default) Best balance of quality and speed. Recommended. 1s latency.
fast          Lowest-latency search. 450ms latency.
deep          For the most thorough search. 4s-18s latency.
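
If your agent works against a latency budget, the choice of search type can be made in code. The sketch below is illustrative only: pick_search_type is a hypothetical helper, and its thresholds simply reuse the latency figures listed above, which are typical values rather than guarantees.

```python
def pick_search_type(latency_budget_ms):
    """Pick the most thorough search type whose listed latency fits the budget."""
    if latency_budget_ms >= 18_000:  # upper end of the 4s-18s range listed for deep
        return "deep"
    if latency_budget_ms >= 1_000:   # listed latency for auto
        return "auto"
    return "fast"                    # lowest-latency option (450ms listed)

for budget in (500, 2_000, 20_000):
    print(budget, pick_search_type(budget))
```

The returned string can be passed as the type argument to exa.search.
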
Freshness is controlled with max_age_hours in Python or maxAgeHours in Node, inside contents. It is optional: leave it out for no freshness limit, or set 0 to always fetch fresh content, 24 to accept cached pages up to one day old, or -1 to use cached content only.
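
As a small sketch of the Python side, the freshness setting can be added to the contents dictionary only when it is set. The helper name build_contents is hypothetical; the dictionary keys are the ones used in the examples in this guide.

```python
def build_contents(max_age_hours=None):
    """Build the contents argument for exa.search, adding freshness only when set."""
    contents = {"highlights": True}
    # Compare against None, not truthiness: 0 is a valid value (always fetch fresh).
    if max_age_hours is not None:
        contents["max_age_hours"] = max_age_hours
    return contents

print(build_contents())    # no freshness limit
print(build_contents(0))   # always fetch fresh content
print(build_contents(24))  # accept cached pages up to one day old
print(build_contents(-1))  # cached content only
```

The result can be passed as contents=build_contents(24) in a call to exa.search.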

Get page contents

Every Exa search result already includes page content (highlights), so you usually don’t need a separate call. We recommend using search for most grounding workflows. Reach for the Contents API when you already have a URL and want to get its highlights directly.
import os
from dotenv import load_dotenv
from exa_py import Exa

load_dotenv()

exa = Exa(api_key=os.environ["EXA_API_KEY"])
exa.headers["x-exa-integration"] = "cerebras-integration"

contents = exa.get_contents(
    ["https://openai.com/index/hello-gpt-4o/"],
    highlights=True,
)

for result in contents.results:
    print(result.url)
    print(" ".join(result.highlights or []))

Use Exa as a tool for grounded answers

Tool calling works well when you want a Cerebras model to decide when to search the web. This example exposes Exa search as a tool.
import os
import json
import re
from dotenv import load_dotenv
from exa_py import Exa
from openai import OpenAI

load_dotenv()

exa = Exa(api_key=os.environ["EXA_API_KEY"])
exa.headers["x-exa-integration"] = "cerebras-integration"

cerebras = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",
    default_headers={"X-Cerebras-3rd-Party-Integration": "exa"},
)

sources = []
index_by_url = {}

def register(title, url):
    if url not in index_by_url:
        sources.append((title or url, url))
        index_by_url[url] = len(sources)
    return index_by_url[url]

def exa_search(query, type="auto", num_results=10, max_age_hours=None, **_):
    contents = {"highlights": True}
    if max_age_hours is not None:
        contents["max_age_hours"] = max_age_hours
    results = exa.search(query, type=type, num_results=num_results, contents=contents)
    return "\n\n".join(
        f"[{register(r.title, r.url)}] {r.title or r.url}\nURL: {r.url}\n{' '.join(r.highlights or [])}"
        for r in results.results
    )

available_tools = {"exa_search": exa_search}

tools = [
    {
        "type": "function",
        "function": {
            "name": "exa_search",
            "description": "Search the web with Exa and get clean, ready-to-use results. Best for current information, news, facts, people, and companies. Returns numbered sources [n] with title, URL, and highlights.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query."},
                    "type": {
                        "type": "string",
                        "enum": ["auto", "fast", "deep"],
                        "description": "Search strategy. 'auto' (default) balances quality and speed; 'fast' is the lowest-latency option; 'deep' is most thorough.",
                    },
                    "num_results": {
                        "type": "integer",
                        "description": "Number of results to return (1-100, default 10).",
                    },
                    "max_age_hours": {
                        "type": "integer",
                        "description": "Only accept cached pages newer than this many hours; older pages are refreshed before returning. Omit for no freshness limit, 0 to always fetch fresh content, or -1 to use cached content only.",
                    },
                },
                "required": ["query"],
            },
        },
    }
]

# Remove stray citation markers (e.g. 【†L1-L9】) the model sometimes adds.
# Matches a whole 【...】 marker, plus any unpaired 【, 】, or † left over.
GARBAGE = re.compile(r"【[^】]*】|[【】†]")

def finalize(answer):
    answer = GARBAGE.sub("", answer)
    answer = re.sub(r"\[\[(\d+)\]\]", r"[\1]", answer).strip()
    if not sources:
        return answer
    lines = "\n".join(f"[{i}] {title} - {url}" for i, (title, url) in enumerate(sources, 1))
    return f"{answer}\n\nSources:\n{lines}"

def answer_with_search(question):
    messages = [
        {
            "role": "system",
            "content": (
                "You are a research assistant. Use exa_search to find current information, then "
                "answer the question. Cite sources inline as [n], matching the labels returned "
                "by exa_search (for example [1] or [2])."
            ),
        },
        {"role": "user", "content": question},
    ]

    for _ in range(6):
        response = cerebras.chat.completions.create(
            model="gpt-oss-120b",
            messages=messages,
            tools=tools,
            tool_choice="auto",
            max_completion_tokens=2000,
        )
        message = response.choices[0].message
        messages.append(message)

        if not message.tool_calls:
            return finalize(message.content or "")

        for tool_call in message.tool_calls:
            tool_fn = available_tools.get(tool_call.function.name)
            try:
                args = json.loads(tool_call.function.arguments)
                result = tool_fn(**args) if tool_fn else f"Unknown tool: {tool_call.function.name}"
            except Exception as e:
                result = f"Tool error ({type(e).__name__}): {e}. Adjust your arguments and try again."
            messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

    return "Could not produce a final answer within the step limit."

print(answer_with_search("What are the latest AI model releases, and what makes them notable?"))
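
The dispatch step inside the loop can be exercised without any network access. The sketch below mirrors that logic under stated assumptions: dispatch and stub_search are hypothetical names, and stub_search is only a stand-in for exa_search so the example runs offline.

```python
import json

def dispatch(tool_name, arguments_json, tools):
    """Run one tool call and always return a string the model can read."""
    tool_fn = tools.get(tool_name)
    if tool_fn is None:
        return f"Unknown tool: {tool_name}"
    try:
        return tool_fn(**json.loads(arguments_json))
    except Exception as e:
        # Malformed JSON and bad arguments are reported back instead of raised.
        return f"Tool error ({type(e).__name__}): {e}. Adjust your arguments and try again."

def stub_search(query, num_results=10, **_):
    # Stand-in for exa_search; returns a fixed-format string instead of calling Exa.
    return f"[1] stub result for {query!r} (top {num_results})"

stub_tools = {"exa_search": stub_search}
print(dispatch("exa_search", '{"query": "ai agents", "num_results": 3}', stub_tools))
print(dispatch("exa_search", '{"num_results": 3}', stub_tools))  # missing query
print(dispatch("web_lookup", "{}", stub_tools))                  # unregistered tool name
```

Returning errors as tool messages, rather than raising, lets the model correct its arguments on the next step of the loop.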

Next steps