Parallel provides a suite of web research APIs built specifically for AI agents. By combining Parallel’s high-accuracy Search, Extract, and Monitor APIs with Cerebras’ ultra-fast inference, you can build agents that search the web, extract structured content, and monitor for real-time updates—all with sub-second response times.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Create a free API key in the Cerebras Cloud console.
  • Parallel API Key - Visit Parallel and create an account to get your API key.
  • Python 3.10 or later, or Node.js 20 or later for the TypeScript example
Parallel’s APIs are designed to deliver token-efficient, LLM-ready content. Combined with Cerebras’ fast inference (available models: llama-3.3-70b, qwen-3-32b, qwen-3-235b-a22b-instruct-2507, gpt-oss-120b, zai-glm-4.6, llama3.1-8b), your agents can perform complex web research tasks with minimal latency.

Configure Parallel with Cerebras

1. Install required dependencies

Install the Parallel SDK and OpenAI client library. The OpenAI client is used to connect to Cerebras’ OpenAI-compatible API.
pip install parallel-web openai requests
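If you plan to follow the Node.js example later in this guide, the equivalent install is:
npm install parallel-web openai dotenv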
2. Configure environment variables

Create a .env file in your project directory to securely store your API keys:
CEREBRAS_API_KEY=your-cerebras-api-key-here
PARALLEL_API_KEY=your-parallel-api-key-here
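The snippets below read keys via os.environ, so either export these variables in your shell or load the .env file at startup. A minimal sketch using the python-dotenv package (install it separately with pip install python-dotenv):
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into os.environ
assert "CEREBRAS_API_KEY" in os.environ and "PARALLEL_API_KEY" in os.environ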
3. Perform your first web search

The Parallel Search API returns high-accuracy, compressed excerpts optimized for LLM context windows. Here’s a simple example that searches the web and uses Cerebras to synthesize the results:
import os
from parallel import Parallel
from openai import OpenAI

# Initialize clients
parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])
cerebras = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",
    default_headers={"X-Cerebras-3rd-Party-Integration": "parallel"}
)

# Search the web with Parallel
search_result = parallel.beta.search(
    objective="What are the latest developments in quantum computing?",
    max_results=5,
    excerpts={"max_chars_per_result": 2000},
)

# Format search results for the LLM
context = "\n\n".join([
    f"Source: {r.url}\n{' '.join(r.excerpts or [])}"
    for r in search_result.results
])

# Use Cerebras to synthesize the results
response = cerebras.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a research assistant. Synthesize the provided search results into a comprehensive answer."},
        {"role": "user", "content": f"Based on these search results:\n\n{context}\n\nWhat are the latest developments in quantum computing?"},
    ],
    max_completion_tokens=1000,
)

print(response.choices[0].message.content)
The Search API’s objective parameter accepts natural language descriptions of your research goal, making it intuitive for agents to use programmatically.
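Because Cerebras exposes an OpenAI-compatible API, the same synthesis call can stream tokens as they are generated, which pairs well with its low time-to-first-token. A minimal variant of the call above, reusing the cerebras client and context from the previous example:
# Stream the synthesized answer token by token
stream = cerebras.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a research assistant. Synthesize the provided search results into a comprehensive answer."},
        {"role": "user", "content": f"Based on these search results:\n\n{context}\n\nWhat are the latest developments in quantum computing?"},
    ],
    max_completion_tokens=1000,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)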

Core APIs

Parallel offers three main APIs that work together for comprehensive web research:
API     | Purpose                                            | Best For
Search  | High-accuracy web search with compressed excerpts  | Finding relevant information across the web
Extract | Convert web pages and PDFs to LLM-ready markdown   | Deep content extraction from specific URLs
Monitor | Watch the web for state changes                    | Real-time alerts and continuous intelligence

Search API

The Search API is engineered for AI agents, delivering the most relevant, token-efficient web data at the lowest cost.
import os
from parallel import Parallel

parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])

result = parallel.beta.search(
    objective="When was the United Nations established?",
    search_queries=["Founding year UN", "Year of founding United Nations"],
    max_results=10,
    excerpts={"max_chars_per_result": 10000},
)

for r in result.results:
    print(f"URL: {r.url}")
    excerpt_text = ' '.join(r.excerpts or [])
    print(f"Excerpt: {excerpt_text[:200]}...")
    print()

Search with Cerebras Synthesis

Combine Parallel’s search with Cerebras’ fast inference to create a complete research workflow:
import os
from parallel import Parallel
from openai import OpenAI

parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])
cerebras = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",
    default_headers={"X-Cerebras-3rd-Party-Integration": "parallel"}
)

def research(question: str) -> str:
    """Perform web research and synthesize results."""
    # Step 1: Search the web
    search_result = parallel.beta.search(
        objective=question,
        max_results=8,
        excerpts={"max_chars_per_result": 3000},
    )

    # Step 2: Format context
    context = "\n\n---\n\n".join([
        f"Source: {r.url}\nContent: {' '.join(r.excerpts or [])}"
        for r in search_result.results
    ])

    # Step 3: Synthesize with Cerebras
    response = cerebras.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {
                "role": "system",
                "content": "You are a research assistant. Provide accurate, well-sourced answers based on the search results provided. Cite sources when possible."
            },
            {
                "role": "user",
                "content": f"Search Results:\n{context}\n\nQuestion: {question}"
            },
        ],
        max_completion_tokens=1500,
    )

    return response.choices[0].message.content

# Example usage
answer = research("What are the environmental impacts of lithium mining?")
print(answer)

Extract API

The Extract API converts web pages and PDFs to LLM-ready markdown. It supports two modes:
  • Compressed excerpts: Dense, objective-focused extractions
  • Full content extraction: Complete page content in markdown format

Extract Compressed Excerpts

import os
from parallel import Parallel

parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])

extract = parallel.beta.extract(
    urls=["https://www.un.org/en/about-us/history-of-the-un"],
    objective="When was the United Nations established?",
    excerpts=True,
    full_content=False,
)

for result in extract.results:
    print(f"URL: {result.url}")
    print(f"Excerpt: {' '.join(result.excerpts or [])}")

Extract Full Content

import os
from parallel import Parallel

parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])

extract = parallel.beta.extract(
    urls=["https://docs.python.org/3/tutorial/index.html"],
    excerpts=False,
    full_content=True,
)

for result in extract.results:
    print(f"URL: {result.url}")
    print(f"Content length: {len(result.full_content or '')} characters")
    print(f"Content preview: {(result.full_content or '')[:500]}...")

Search + Extract Workflow

Combine Search and Extract for comprehensive research:
import os
from parallel import Parallel
from openai import OpenAI

parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])
cerebras = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",
    default_headers={"X-Cerebras-3rd-Party-Integration": "parallel"}
)

def deep_research(topic: str) -> str:
    """Search for relevant pages, then extract full content from top results."""
    # Step 1: Search to find relevant URLs
    search_result = parallel.beta.search(
        objective=topic,
        max_results=3,
    )

    # Step 2: Extract compressed excerpts from top results
    urls = [r.url for r in search_result.results[:3]]
    extract_result = parallel.beta.extract(
        urls=urls,
        objective=topic,
        excerpts=True,
        full_content=False,
    )

    # Step 3: Synthesize with Cerebras
    context = "\n\n---\n\n".join([
        f"Source: {r.url}\nContent: {' '.join(r.excerpts or [])}"
        for r in extract_result.results
    ])

    response = cerebras.chat.completions.create(
        model="qwen-3-32b",
        messages=[
            {"role": "system", "content": "You are a research analyst. Provide a detailed analysis based on the extracted content."},
            {"role": "user", "content": f"Extracted Content:\n{context}\n\nProvide a comprehensive analysis of: {topic}"},
        ],
        max_completion_tokens=2000,
    )

    return response.choices[0].message.content

# Example usage
analysis = deep_research("Recent advances in battery technology for electric vehicles")
print(analysis)
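Note that deep_research requests excerpts=True and full_content=False, which keeps the synthesized context compact and token-efficient; switch to full_content=True when the analysis genuinely needs entire pages, at the cost of a much larger prompt.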

Monitor API

The Monitor API flips the traditional pull model to push—create queries that trigger notifications when new information is published to the web.

Create a Monitor

import os
import requests

url = "https://api.parallel.ai/v1alpha/monitors"

payload = {
    "query": "New product announcements from OpenAI",
    "cadence": "daily"
}

headers = {
    "x-api-key": os.environ["PARALLEL_API_KEY"],
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # fail fast on HTTP errors instead of parsing an error body
monitor = response.json()

print(f"Monitor created: {monitor['monitor_id']}")

Monitor Use Cases

  • Proactive sub-agent: Create agents that are invoked when web changes are detected
  • Workflow trigger: Trigger workflows when new information surfaces (e.g., add leads to CRM)
  • Continuous intelligence feed: Maintain always-up-to-date data feeds for investment research

Building a Search Agent in Node.js

For production applications, you can build the same search agent in Node.js. The example below calls Cerebras’ OpenAI-compatible endpoint directly with the OpenAI client; the pattern ports straightforwardly to frameworks such as the Vercel AI SDK:
import 'dotenv/config';
import Parallel from 'parallel-web';
import OpenAI from 'openai';

const parallel = new Parallel({ apiKey: process.env.PARALLEL_API_KEY });
const cerebras = new OpenAI({
  apiKey: process.env.CEREBRAS_API_KEY,
  baseURL: "https://api.cerebras.ai/v1",
  defaultHeaders: { "X-Cerebras-3rd-Party-Integration": "parallel" }
});

// Search with Parallel
const searchResult = await parallel.beta.search({
  objective: "Latest quantum computing breakthroughs 2024",
  max_results: 3,
  excerpts: { max_chars_per_result: 1000 },
});

// Format context for LLM
const context = searchResult.results.map(r => 
  `Source: ${r.url}\n${(r.excerpts || []).join(' ')}`
).join('\n\n');

// Synthesize with Cerebras
const response = await cerebras.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "system", content: "Summarize the search results in 2-3 sentences." },
    { role: "user", content: context }
  ],
  max_tokens: 200,
});

console.log(response.choices[0].message.content);
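Assuming the dependencies from step 1 are installed, save the file as, for example, agent.mjs (the .mjs extension enables the top-level await used above) and run:
node agent.mjs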

Choosing the Right Cerebras Model

Different research tasks benefit from different models:
Model                          | Best For                                               | Speed
llama-3.3-70b                  | Complex multi-step research, nuanced synthesis         | Fast
qwen-3-235b-a22b-instruct-2507 | Multilingual research, advanced reasoning              | Fast
gpt-oss-120b                   | Document analysis, coding research, agentic workflows  | Fast
zai-glm-4.6                    | Tool-heavy agents, coding documentation research       | Fast
qwen-3-32b                     | Balanced research tasks, structured extraction         | Very Fast
llama3.1-8b                    | Simple lookups, high-volume research pipelines         | Ultra Fast
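If your agent handles mixed workloads, one practical pattern is to route each request to a model tier up front. A minimal sketch; the task categories and the mapping below are illustrative, not part of the Parallel or Cerebras APIs:
# Illustrative routing table: task category -> Cerebras model
MODEL_BY_TASK = {
    "synthesis": "llama-3.3-70b",   # complex multi-step research, nuanced synthesis
    "extraction": "qwen-3-32b",     # structured extraction, balanced tasks
    "lookup": "llama3.1-8b",        # simple, high-volume queries
    "coding": "gpt-oss-120b",       # technical and coding research
}

def pick_model(task_category: str) -> str:
    """Return a model for the task category, defaulting to the balanced tier."""
    return MODEL_BY_TASK.get(task_category, "qwen-3-32b")

print(pick_model("lookup"))  # llama3.1-8b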

Troubleshooting

If search results aren’t relevant, try these approaches:
  1. Be more specific in your objective - Instead of “AI news”, try “Recent announcements about large language model capabilities from major AI labs”
  2. Use multiple search queries - Provide explicit search_queries to cover different angles
  3. Increase max_results - Get more results and let the LLM filter for relevance
  4. Use the “pro” processor - For fresher, higher-quality results at higher cost, as sketched below
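For example, a sharper search might combine a specific objective, explicit search_queries, and the “pro” processor. A minimal sketch, reusing the parallel client from earlier (whether processor is exposed as a parameter may depend on your SDK version):
result = parallel.beta.search(
    objective="Recent announcements about large language model capabilities from major AI labs",
    search_queries=[
        "OpenAI model announcement",
        "Anthropic model release",
        "Google DeepMind model launch",
    ],
    processor="pro",  # fresher, higher-quality results at higher cost
    max_results=15,
)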
If extraction fails or returns incomplete content, keep in mind that some pages require special handling:
  1. JavaScript-rendered content - Parallel handles most JS-rendered sites, but some may require additional wait time
  2. PDFs - Parallel supports multi-page PDF extraction, including images
  3. Paywalled content - Some content may not be accessible; check the extraction status in the response
When deciding which model to use, choose based on your research complexity:
  • Complex synthesis (multiple sources, nuanced analysis): llama-3.3-70b or qwen-3-235b-a22b-instruct-2507
  • Structured extraction (tables, lists, specific data): qwen-3-32b
  • High-volume pipelines (many simple queries): llama3.1-8b
  • Coding/technical research: gpt-oss-120b or zai-glm-4.6
