Parallel provides a suite of web research APIs built specifically for AI agents. By combining Parallel’s high-accuracy Search, Extract, and Monitor APIs with Cerebras’ ultra-fast inference, you can build agents that search the web, extract structured content, and monitor for real-time updates—all with sub-second response times.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Create a free API key in the Cerebras Cloud console.
  • Parallel API Key - Visit Parallel and create an account to get your API key.
  • Python 3.10 or later, or Node.js 20 or later for the TypeScript example
Parallel’s APIs are designed to deliver token-efficient, LLM-ready content. Combined with Cerebras’ fast inference (available models: llama-3.3-70b, qwen-3-32b, qwen-3-235b-a22b-instruct-2507, gpt-oss-120b, zai-glm-4.6, llama3.1-8b), your agents can perform complex web research tasks with minimal latency.

Configure Parallel with Cerebras

1. Install required dependencies

Install the Parallel SDK and OpenAI client library. The OpenAI client is used to connect to Cerebras’ OpenAI-compatible API.
pip install parallel-web openai requests
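If you plan to follow the Node.js example later in this guide, the equivalent install is:
npm install parallel-web openai dotenv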
2. Configure environment variables

Create a .env file in your project directory to securely store your API keys:
CEREBRAS_API_KEY=your-cerebras-api-key-here
PARALLEL_API_KEY=your-parallel-api-key-here
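The snippets below read keys via os.environ, so either export these variables in your shell or load the .env file at startup. A minimal sketch using the python-dotenv package (install it separately with pip install python-dotenv):
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into os.environ
assert "CEREBRAS_API_KEY" in os.environ and "PARALLEL_API_KEY" in os.environ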
3. Perform your first web search

The Parallel Search API returns high-accuracy, compressed excerpts optimized for LLM context windows. Here’s a simple example that searches the web and uses Cerebras to synthesize the results:
import os
from parallel import Parallel
from openai import OpenAI

# Initialize clients
parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])
cerebras = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",
    default_headers={"X-Cerebras-3rd-Party-Integration": "parallel"}
)

# Search the web with Parallel
search_result = parallel.beta.search(
    objective="What are the latest developments in quantum computing?",
    max_results=5,
    excerpts={"max_chars_per_result": 2000},
)

# Format search results for the LLM
context = "\n\n".join([
    f"Source: {r.url}\n{' '.join(r.excerpts or [])}"
    for r in search_result.results
])

# Use Cerebras to synthesize the results
response = cerebras.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a research assistant. Synthesize the provided search results into a comprehensive answer."},
        {"role": "user", "content": f"Based on these search results:\n\n{context}\n\nWhat are the latest developments in quantum computing?"},
    ],
    max_completion_tokens=1000,
)

print(response.choices[0].message.content)
The Search API’s objective parameter accepts natural language descriptions of your research goal, making it intuitive for agents to use programmatically.
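Because Cerebras exposes an OpenAI-compatible API, the same synthesis call can stream tokens as they are generated, which pairs well with its low time-to-first-token. A minimal variant of the call above, reusing the cerebras client and context from the previous example:
# Stream the synthesized answer token by token
stream = cerebras.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a research assistant. Synthesize the provided search results into a comprehensive answer."},
        {"role": "user", "content": f"Based on these search results:\n\n{context}\n\nWhat are the latest developments in quantum computing?"},
    ],
    max_completion_tokens=1000,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)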

Core APIs

Parallel offers three main APIs that work together for comprehensive web research:
API     | Purpose                                            | Best For
Search  | High-accuracy web search with compressed excerpts  | Finding relevant information across the web
Extract | Convert web pages and PDFs to LLM-ready markdown   | Deep content extraction from specific URLs
Monitor | Watch the web for state changes                    | Real-time alerts and continuous intelligence

Search API

The Search API is engineered for AI agents, delivering the most relevant, token-efficient web data at the lowest cost.
import os
from parallel import Parallel

parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])

result = parallel.beta.search(
    objective="When was the United Nations established?",
    search_queries=["Founding year UN", "Year of founding United Nations"],
    max_results=10,
    excerpts={"max_chars_per_result": 10000},
)

for r in result.results:
    print(f"URL: {r.url}")
    excerpt_text = ' '.join(r.excerpts or [])
    print(f"Excerpt: {excerpt_text[:200]}...")
    print()

Search with Cerebras Synthesis

Combine Parallel’s search with Cerebras’ fast inference to create a complete research workflow:
import os
from parallel import Parallel
from openai import OpenAI

parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])
cerebras = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",
    default_headers={"X-Cerebras-3rd-Party-Integration": "parallel"}
)

def research(question: str) -> str:
    """Perform web research and synthesize results."""
    # Step 1: Search the web
    search_result = parallel.beta.search(
        objective=question,
        max_results=8,
        excerpts={"max_chars_per_result": 3000},
    )

    # Step 2: Format context
    context = "\n\n---\n\n".join([
        f"Source: {r.url}\nContent: {' '.join(r.excerpts or [])}"
        for r in search_result.results
    ])

    # Step 3: Synthesize with Cerebras
    response = cerebras.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {
                "role": "system",
                "content": "You are a research assistant. Provide accurate, well-sourced answers based on the search results provided. Cite sources when possible."
            },
            {
                "role": "user",
                "content": f"Search Results:\n{context}\n\nQuestion: {question}"
            },
        ],
        max_completion_tokens=1500,
    )

    return response.choices[0].message.content

# Example usage
answer = research("What are the environmental impacts of lithium mining?")
print(answer)

Extract API

The Extract API converts web pages and PDFs to LLM-ready markdown. It supports two modes:
  • Compressed excerpts: Dense, objective-focused extractions
  • Full content extraction: Complete page content in markdown format

Extract Compressed Excerpts

import os
from parallel import Parallel

parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])

extract = parallel.beta.extract(
    urls=["https://www.un.org/en/about-us/history-of-the-un"],
    objective="When was the United Nations established?",
    excerpts=True,
    full_content=False,
)

for result in extract.results:
    print(f"URL: {result.url}")
    print(f"Excerpt: {' '.join(result.excerpts or [])}")

Extract Full Content

import os
from parallel import Parallel

parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])

extract = parallel.beta.extract(
    urls=["https://docs.python.org/3/tutorial/index.html"],
    excerpts=False,
    full_content=True,
)

for result in extract.results:
    print(f"URL: {result.url}")
    print(f"Content length: {len(result.full_content or '')} characters")
    print(f"Content preview: {(result.full_content or '')[:500]}...")

Search + Extract Workflow

Combine Search and Extract for comprehensive research:
import os
from parallel import Parallel
from openai import OpenAI

parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])
cerebras = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],
    base_url="https://api.cerebras.ai/v1",
    default_headers={"X-Cerebras-3rd-Party-Integration": "parallel"}
)

def deep_research(topic: str) -> str:
    """Search for relevant pages, then extract full content from top results."""
    # Step 1: Search to find relevant URLs
    search_result = parallel.beta.search(
        objective=topic,
        max_results=3,
    )

    # Step 2: Extract compressed excerpts from top results
    urls = [r.url for r in search_result.results[:3]]
    extract_result = parallel.beta.extract(
        urls=urls,
        objective=topic,
        excerpts=True,
        full_content=False,
    )

    # Step 3: Synthesize with Cerebras
    context = "\n\n---\n\n".join([
        f"Source: {r.url}\nContent: {' '.join(r.excerpts or [])}"
        for r in extract_result.results
    ])

    response = cerebras.chat.completions.create(
        model="qwen-3-32b",
        messages=[
            {"role": "system", "content": "You are a research analyst. Provide a detailed analysis based on the extracted content."},
            {"role": "user", "content": f"Extracted Content:\n{context}\n\nProvide a comprehensive analysis of: {topic}"},
        ],
        max_completion_tokens=2000,
    )

    return response.choices[0].message.content

# Example usage
analysis = deep_research("Recent advances in battery technology for electric vehicles")
print(analysis)
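Note that deep_research requests excerpts=True and full_content=False, which keeps the synthesized context compact and token-efficient; switch to full_content=True when the analysis genuinely needs entire pages, at the cost of a much larger prompt.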

Monitor API

The Monitor API flips the traditional pull model to push—create queries that trigger notifications when new information is published to the web.

Create a Monitor

import os
import requests

url = "https://api.parallel.ai/v1alpha/monitors"

payload = {
    "query": "New product announcements from OpenAI",
    "cadence": "daily"
}

headers = {
    "x-api-key": os.environ["PARALLEL_API_KEY"],
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # fail fast on HTTP errors instead of parsing an error body
monitor = response.json()

print(f"Monitor created: {monitor['monitor_id']}")

Monitor Use Cases

  • Proactive sub-agent: Create agents that are invoked when web changes are detected
  • Workflow trigger: Trigger workflows when new information surfaces (e.g., add leads to CRM)
  • Continuous intelligence feed: Maintain always-up-to-date data feeds for investment research

Building a Search Agent in Node.js

For production applications, you can build the same search agent in Node.js. The example below calls Cerebras’ OpenAI-compatible endpoint directly with the OpenAI client; the pattern ports straightforwardly to frameworks such as the Vercel AI SDK:
import 'dotenv/config';
import Parallel from 'parallel-web';
import OpenAI from 'openai';

const parallel = new Parallel({ apiKey: process.env.PARALLEL_API_KEY });
const cerebras = new OpenAI({
  apiKey: process.env.CEREBRAS_API_KEY,
  baseURL: "https://api.cerebras.ai/v1",
  defaultHeaders: { "X-Cerebras-3rd-Party-Integration": "parallel" }
});

// Search with Parallel
const searchResult = await parallel.beta.search({
  objective: "Latest quantum computing breakthroughs 2024",
  max_results: 3,
  excerpts: { max_chars_per_result: 1000 },
});

// Format context for LLM
const context = searchResult.results.map(r => 
  `Source: ${r.url}\n${(r.excerpts || []).join(' ')}`
).join('\n\n');

// Synthesize with Cerebras
const response = await cerebras.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [
    { role: "system", content: "Summarize the search results in 2-3 sentences." },
    { role: "user", content: context }
  ],
  max_tokens: 200,
});

console.log(response.choices[0].message.content);
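Assuming the dependencies from step 1 are installed, save the file as, for example, agent.mjs (the .mjs extension enables the top-level await used above) and run:
node agent.mjs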

Choosing the Right Cerebras Model

Different research tasks benefit from different models:
Model                          | Best For                                               | Speed
llama-3.3-70b                  | Complex multi-step research, nuanced synthesis         | Fast
qwen-3-235b-a22b-instruct-2507 | Multilingual research, advanced reasoning              | Fast
gpt-oss-120b                   | Document analysis, coding research, agentic workflows  | Fast
zai-glm-4.6                    | Tool-heavy agents, coding documentation research       | Fast
qwen-3-32b                     | Balanced research tasks, structured extraction         | Very Fast
llama3.1-8b                    | Simple lookups, high-volume research pipelines         | Ultra Fast
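If your agent handles mixed workloads, one practical pattern is to route each request to a model tier up front. A minimal sketch; the task categories and the mapping below are illustrative, not part of the Parallel or Cerebras APIs:
# Illustrative routing table: task category -> Cerebras model
MODEL_BY_TASK = {
    "synthesis": "llama-3.3-70b",   # complex multi-step research, nuanced synthesis
    "extraction": "qwen-3-32b",     # structured extraction, balanced tasks
    "lookup": "llama3.1-8b",        # simple, high-volume queries
    "coding": "gpt-oss-120b",       # technical and coding research
}

def pick_model(task_category: str) -> str:
    """Return a model for the task category, defaulting to the balanced tier."""
    return MODEL_BY_TASK.get(task_category, "qwen-3-32b")

print(pick_model("lookup"))  # llama3.1-8b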

Troubleshooting

If search results aren’t relevant, try these approaches:
  1. Be more specific in your objective - Instead of “AI news”, try “Recent announcements about large language model capabilities from major AI labs”
  2. Use multiple search queries - Provide explicit search_queries to cover different angles
  3. Increase max_results - Get more results and let the LLM filter for relevance
  4. Use the “pro” processor - For fresher, higher-quality results at higher cost, as sketched below
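For example, a sharper search might combine a specific objective, explicit search_queries, and the “pro” processor. A minimal sketch, reusing the parallel client from earlier (whether processor is exposed as a parameter may depend on your SDK version):
result = parallel.beta.search(
    objective="Recent announcements about large language model capabilities from major AI labs",
    search_queries=[
        "OpenAI model announcement",
        "Anthropic model release",
        "Google DeepMind model launch",
    ],
    processor="pro",  # fresher, higher-quality results at higher cost
    max_results=15,
)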
If extraction fails or returns incomplete content, keep in mind that some pages require special handling:
  1. JavaScript-rendered content - Parallel handles most JS-rendered sites, but some may require additional wait time
  2. PDFs - Parallel supports multi-page PDF extraction, including images
  3. Paywalled content - Some content may not be accessible; check the extraction status in the response
When deciding which model to use, choose based on your research complexity:
  • Complex synthesis (multiple sources, nuanced analysis): llama-3.3-70b or qwen-3-235b-a22b-instruct-2507
  • Structured extraction (tables, lists, specific data): qwen-3-32b
  • High-volume pipelines (many simple queries): llama3.1-8b
  • Coding/technical research: gpt-oss-120b or zai-glm-4.6
