Prerequisites
Before you begin, ensure you have:

- Cerebras API Key - Get a free API key here.
- Parallel API Key - Visit Parallel and create an account to get your API key.
- Python 3.10 or later, or Node.js 20 or later
Parallel’s APIs are designed to deliver token-efficient, LLM-ready content. Combined with Cerebras’ fast inference (available models: llama-3.3-70b, qwen-3-32b, qwen-3-235b-a22b-instruct-2507, gpt-oss-120b, zai-glm-4.6, llama3.1-8b), your agents can perform complex web research tasks with minimal latency.

Configure Parallel with Cerebras
1. Install required dependencies
Install the Parallel SDK and OpenAI client library. The OpenAI client is used to connect to Cerebras’ OpenAI-compatible API.
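For Python, this is a single pip command. The package name parallel-web is an assumption based on the SDK's import name; check PyPI if it differs:

```shell
# Install the Parallel SDK (assumed PyPI name: parallel-web), the OpenAI client,
# and python-dotenv for loading the .env file created in the next step
pip install parallel-web openai python-dotenv
```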
2. Configure environment variables
Create a .env file in your project directory to securely store your API keys:
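For example, with placeholder values (substitute your real keys):

```
CEREBRAS_API_KEY=your-cerebras-api-key
PARALLEL_API_KEY=your-parallel-api-key
```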
3. Perform your first web search
The Parallel Search API returns high-accuracy, compressed excerpts optimized for LLM context windows. Here’s a simple example that searches the web and uses Cerebras to synthesize the results:
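A minimal sketch of that flow, assuming the parallel-web SDK exposes the Search endpoint as client.beta.search with url/excerpts fields on each result (verify the exact interface against the SDK reference):

```python
import os

def format_sources(results):
    """Pure helper: flatten search results into a compact context block."""
    sections = []
    for r in results:
        sections.append("Source: " + r["url"] + "\n" + " ".join(r["excerpts"]))
    return "\n\n".join(sections)

def research(objective: str) -> str:
    # Deferred imports so the helper above is usable without the SDKs installed
    from parallel import Parallel  # pip install parallel-web (assumed name)
    from openai import OpenAI      # pip install openai

    # 'beta.search' and its parameters are assumptions; check the SDK reference
    search = Parallel(api_key=os.environ["PARALLEL_API_KEY"]).beta.search(
        objective=objective, max_results=5
    )
    context = format_sources(
        [{"url": r.url, "excerpts": r.excerpts} for r in search.results]
    )

    # Cerebras exposes an OpenAI-compatible endpoint
    cerebras = OpenAI(
        base_url="https://api.cerebras.ai/v1",
        api_key=os.environ["CEREBRAS_API_KEY"],
    )
    reply = cerebras.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": "Answer using only the provided sources."},
            {"role": "user", "content": objective + "\n\nSources:\n" + context},
        ],
    )
    return reply.choices[0].message.content

if __name__ == "__main__":
    print(research("What are the latest developments in fast LLM inference?"))
```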
The Search API’s objective parameter accepts natural language descriptions of your research goal, making it intuitive for agents to use programmatically.

Core APIs
Parallel offers three main APIs that work together for comprehensive web research:

| API | Purpose | Best For |
|---|---|---|
| Search | High-accuracy web search with compressed excerpts | Finding relevant information across the web |
| Extract | Convert web pages and PDFs to LLM-ready markdown | Deep content extraction from specific URLs |
| Monitor | Watch the web for state changes | Real-time alerts and continuous intelligence |
Search API
The Search API is engineered for AI agents, delivering the most relevant, token-efficient web data at the lowest cost.

Basic Search
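A basic call might look like the sketch below; the client.beta.search interface and result fields are assumptions to verify against the SDK reference:

```python
import os
from urllib.parse import urlparse

def basic_search(objective: str, max_results: int = 5):
    from parallel import Parallel  # deferred: pip install parallel-web (assumed name)
    client = Parallel(api_key=os.environ["PARALLEL_API_KEY"])
    resp = client.beta.search(
        objective=objective,
        processor="base",        # "base" for speed/cost; "pro" for freshness/quality
        max_results=max_results,
    )
    return [{"url": r.url, "title": r.title, "excerpts": r.excerpts} for r in resp.results]

def source_domains(results):
    """Pure helper: the set of distinct domains a result list draws from."""
    return {urlparse(r["url"]).netloc for r in results}
```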
Search with Cerebras Synthesis
Combine Parallel’s search with Cerebras’ fast inference to create a complete research workflow:

Extract API
The Extract API converts web pages and PDFs to LLM-ready markdown. It supports two modes:

- Compressed excerpts: Dense, objective-focused extractions
- Full content extraction: Complete page content in markdown format
Extract Compressed Excerpts
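A hedged sketch of an excerpt request over raw HTTP; the endpoint path and request field names here are assumptions, not the documented API, so check the Extract API reference before use:

```python
import json
import os
import urllib.request

EXTRACT_URL = "https://api.parallel.ai/v1beta/extract"  # assumed endpoint path

def build_extract_body(urls, objective=None):
    """Pure helper: request body for objective-focused compressed excerpts."""
    body = {"urls": urls, "excerpts": True}  # field names are assumptions
    if objective:
        body["objective"] = objective
    return body

def extract_excerpts(urls, objective=None):
    req = urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(build_extract_body(urls, objective)).encode(),
        headers={
            "x-api-key": os.environ["PARALLEL_API_KEY"],
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```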
Extract Full Content
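Full-content mode would differ only in the request body; again, the field names below are illustrative assumptions to check against the Extract API reference:

```python
def build_full_content_body(urls):
    # 'full_content' / 'excerpts' are assumed field names for complete-page markdown
    return {"urls": urls, "excerpts": False, "full_content": True}
```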
Search + Extract Workflow
Combine Search and Extract for comprehensive research:

Monitor API
The Monitor API flips the traditional pull model to push: create queries that trigger notifications when new information is published to the web.

Create a Monitor
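A hypothetical sketch of monitor creation over raw HTTP; the endpoint path and payload fields here are illustrative assumptions, not the documented API:

```python
import json
import os
import urllib.request

MONITOR_URL = "https://api.parallel.ai/v1beta/monitors"  # assumed endpoint path

def build_monitor_payload(query: str, webhook_url: str):
    """Pure helper: payload for a monitor that pushes matches to a webhook."""
    return {"query": query, "webhook": {"url": webhook_url}}  # assumed fields

def create_monitor(query: str, webhook_url: str):
    req = urllib.request.Request(
        MONITOR_URL,
        data=json.dumps(build_monitor_payload(query, webhook_url)).encode(),
        headers={
            "x-api-key": os.environ["PARALLEL_API_KEY"],
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```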
Monitor Use Cases
- Proactive sub-agent: Create agents that are invoked when web changes are detected
- Workflow trigger: Trigger workflows when new information surfaces (e.g., add leads to CRM)
- Continuous intelligence feed: Maintain always-up-to-date data feeds for investment research
Building a Search Agent with Vercel AI SDK
For production applications, you can build a full-stack search agent using the Vercel AI SDK with Cerebras.
Choosing the Right Cerebras Model
Different research tasks benefit from different models:

| Model | Best For | Speed |
|---|---|---|
| llama-3.3-70b | Complex multi-step research, nuanced synthesis | Fast |
| qwen-3-235b-a22b-instruct-2507 | Multilingual research, advanced reasoning | Fast |
| gpt-oss-120b | Document analysis, coding research, agentic workflows | Fast |
| zai-glm-4.6 | Tool-heavy agents, coding documentation research | Fast |
| qwen-3-32b | Balanced research tasks, structured extraction | Very Fast |
| llama3.1-8b | Simple lookups, high-volume research pipelines | Ultra Fast |
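The table above can be distilled into a small lookup for programmatic model selection. The task labels are illustrative; the model IDs are the ones listed in this guide:

```python
# Task labels are illustrative; model IDs come from the table above.
MODEL_FOR_TASK = {
    "complex_synthesis": "llama-3.3-70b",
    "multilingual": "qwen-3-235b-a22b-instruct-2507",
    "document_analysis": "gpt-oss-120b",
    "tool_heavy_agents": "zai-glm-4.6",
    "structured_extraction": "qwen-3-32b",
    "high_volume": "llama3.1-8b",
}

def pick_model(task: str) -> str:
    """Fall back to a strong general model for unknown task types."""
    return MODEL_FOR_TASK.get(task, "llama-3.3-70b")
```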
Next Steps
- Explore Parallel’s full documentation for advanced features
- Try different Cerebras models to optimize for your research use case
- Check out the Parallel + Cerebras Search Agent Cookbook for a complete implementation example
- Build with the Vercel AI SDK for production-ready streaming agents
Troubleshooting
Search results aren't relevant enough
Try these approaches:
- Be more specific in your objective - Instead of “AI news”, try “Recent announcements about large language model capabilities from major AI labs”
- Use multiple search queries - Provide explicit search_queries to cover different angles
- Increase max_results - Get more results and let the LLM filter for relevance
- Use the “pro” processor - For fresher, higher-quality results (at higher cost)
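One pattern is to retry with broader settings when the first pass comes back thin. The parameter names follow the list above and should be checked against the Search API reference:

```python
def widen_search_params(params: dict) -> dict:
    """Return a copy of search kwargs with broader retrieval settings for a retry."""
    retry = dict(params)
    retry["processor"] = "pro"  # fresher, higher-quality results (at higher cost)
    retry["max_results"] = max(2 * params.get("max_results", 5), 10)
    return retry

# Usage sketch (assumes a parallel-web client as in the earlier examples):
# results = client.beta.search(**widen_search_params({"objective": obj, "max_results": 5}))
```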
Extraction is missing content
Some pages require special handling:
- JavaScript-rendered content - Parallel handles most JS-rendered sites, but some may require additional wait time
- PDFs - Parallel supports multi-page PDF extraction, including images
- Paywalled content - Some content may not be accessible; check the extraction status in the response
Which Cerebras model should I use for research tasks?
Choose based on your research complexity:
- Complex synthesis (multiple sources, nuanced analysis): llama-3.3-70b or qwen-3-235b-a22b-instruct-2507
- Structured extraction (tables, lists, specific data): qwen-3-32b
- High-volume pipelines (many simple queries): llama3.1-8b
- Coding/technical research: gpt-oss-120b or zai-glm-4.6

