Build an AI agent pipeline that searches, summarizes, and synthesizes information from multiple sources to generate comprehensive reports.
The `cerebras_client.py` module implements a singleton pattern: call `get_client()` or `get_async_client()` as needed, and the client is created lazily on first use and reused for all subsequent API calls.
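A minimal sketch of that pattern, assuming the official `cerebras-cloud-sdk` package and a `CEREBRAS_API_KEY` environment variable:

```python
# cerebras_client.py - lazy-singleton sketch, not the tutorial's exact file.
import os

from cerebras.cloud.sdk import AsyncCerebras, Cerebras

_client = None
_async_client = None

def get_client() -> Cerebras:
    """Create the synchronous client on first use, then reuse it."""
    global _client
    if _client is None:
        _client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])
    return _client

def get_async_client() -> AsyncCerebras:
    """Create the asynchronous client on first use, then reuse it."""
    global _async_client
    if _async_client is None:
        _async_client = AsyncCerebras(api_key=os.environ["CEREBRAS_API_KEY"])
    return _async_client
```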
A central configuration file (`config.yaml`) allows easy customization of models, API limits, word counts, and other parameters without modifying code. This makes the system highly configurable and maintainable.
The `config_loader.py` module provides access to these settings.
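A minimal sketch of such a loader (the nested key names like `models` are illustrative, not the tutorial's actual schema):

```python
# config_loader.py - illustrative sketch; real key names may differ.
from functools import lru_cache

import yaml

@lru_cache(maxsize=1)
def load_config() -> dict:
    """Parse config.yaml once and cache the result for later callers."""
    with open("config.yaml", encoding="utf-8") as f:
        return yaml.safe_load(f)

def get_setting(*keys, default=None):
    """Walk nested keys, e.g. get_setting("models", "writer")."""
    value = load_config()
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value
```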
The first phase clarifies the user's topic through two functions:

- `ask_follow_up_questions(topic)` - Takes the user's provided topic and generates three structured follow-up questions.
- `capture_user_answers(questions)` - Presents these generated questions one-by-one, interactively capturing the user's answers.

The topic together with the captured questions and answers is then saved as `research_brief.json` (see the sketch after this list).
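A condensed sketch of this flow; the prompt wording and the field names in the brief are assumptions for illustration:

```python
# Step 1 sketch - prompt wording and brief field names are assumptions.
import json

from cerebras_client import get_client

def ask_follow_up_questions(topic: str) -> list[str]:
    """Ask the model for three clarifying questions, one per line."""
    resp = get_client().chat.completions.create(
        model="llama-3.3-70b",  # placeholder for the configured model
        messages=[{"role": "user", "content":
                   f"Ask exactly three short clarifying questions about "
                   f"'{topic}', one per line, with no numbering."}],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [q.strip() for q in lines if q.strip()]

def capture_user_answers(questions: list[str]) -> list[str]:
    """Present each question in turn and record the user's reply."""
    return [input(f"Q{i}: {q}\n> ").strip()
            for i, q in enumerate(questions, start=1)]

if __name__ == "__main__":
    topic = input("What topic should the report cover? ").strip()
    questions = ask_follow_up_questions(topic)
    answers = capture_user_answers(questions)
    brief = {"topic": topic,
             "questions_and_answers": [{"question": q, "answer": a}
                                       for q, a in zip(questions, answers)]}
    with open("research_brief.json", "w", encoding="utf-8") as f:
        json.dump(brief, f, indent=2)
```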
The research phase transforms the `research_brief.json` from Step 1 into a collection of high-quality, detailed summaries through a combination of web searches and advanced AI summarization.
This script operates around two main functions:
- `run_research_tasks(research_brief)` - Orchestrates a “General + Specific” query strategy. It first formulates one broad search query based on your initial topic, followed by three specific queries derived from the clarifying questions and answers in the brief. Using the Exa API, it retrieves the top three articles for each query, resulting in 12 selected source articles.
- `summarize_single_article(article_text, research_brief)` - Conducts a sophisticated four-step “Iterative Refinement” summarization loop: Reflect, Elaborate, Critique, and Refine. A sketch of both functions follows this list.
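A condensed sketch, assuming the `exa-py` SDK with an `EXA_API_KEY` variable; the prompts, field names, and model names (including the second model used for critique) are placeholders:

```python
# Research-phase sketch - prompts, models, and field names are assumptions.
import os

from exa_py import Exa

from cerebras_client import get_client

exa = Exa(api_key=os.environ["EXA_API_KEY"])

def run_research_tasks(research_brief: dict) -> list[dict]:
    """One general query plus three specific ones; top three articles each."""
    queries = [research_brief["topic"]]
    queries += [f"{research_brief['topic']}: {qa['answer']}"
                for qa in research_brief["questions_and_answers"]]
    articles = []
    for query in queries:  # 4 queries x 3 results = 12 articles
        response = exa.search_and_contents(query, num_results=3, text=True)
        articles += [{"title": r.title, "url": r.url, "text": r.text}
                     for r in response.results]
    return articles

def _ask(prompt: str, model: str = "llama-3.3-70b") -> str:
    resp = get_client().chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def summarize_single_article(article_text: str, research_brief: dict) -> str:
    """Iterative refinement: Reflect -> Elaborate -> Critique -> Refine."""
    topic = research_brief["topic"]
    # 1. Reflect: draft an initial, topic-focused summary.
    draft = _ask(f"Summarize this article as it relates to '{topic}':\n{article_text}")
    # 2. Elaborate: pull concrete details from the source into the draft.
    draft = _ask("Expand the summary with specific facts from the article.\n"
                 f"Summary:\n{draft}\n\nArticle:\n{article_text}")
    # 3. Critique: a second model flags gaps, errors, and vague claims.
    critique = _ask(f"Critique this summary for accuracy and missing detail:\n{draft}",
                    model="qwen-3-32b")  # placeholder second model
    # 4. Refine: rewrite the draft to resolve the critique.
    return _ask("Rewrite the summary to address the critique.\n"
                f"Summary:\n{draft}\n\nCritique:\n{critique}")
```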
This phase writes its results to three files:

- `search_queries.json`
- `raw_exa_results.json`
- `summarized_articles.json` (most important - sets the stage for the subsequent outlining and drafting phases)

The outlining phase is driven by the `create_report_outline(summaries)` function within `3_outliner.py`.
The script operates as follows:
- It loads the article summaries from `summarized_articles.json`.
- It produces `report_outline.json`, a structured file that contains the blueprint for the report, including a title, introduction, body sections, conclusion, and bullet points. Each bullet point is paired with a list of source indices, ensuring every claim is traceable back to its original source material. This file acts as the primary input for the next agent, the Writer.
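A sketch of the outliner; the prompt and the outline schema are inferred from the description above rather than copied from the tutorial code:

```python
# Outliner sketch - the JSON schema below is an inferred assumption.
import json

from cerebras_client import get_client

PROMPT = ("You are an outliner. From the numbered summaries, return only valid "
          "JSON with: title, introduction, sections (each with a heading and "
          "bullet_points, where every bullet has 'text' and 'source_indices'), "
          "and conclusion.")

def create_report_outline(summaries: list[dict]) -> dict:
    numbered = "\n\n".join(f"[{i}] {s['summary']}"
                           for i, s in enumerate(summaries, start=1))
    resp = get_client().chat.completions.create(
        model="llama-3.3-70b",  # placeholder model
        messages=[{"role": "system", "content": PROMPT},
                  {"role": "user", "content": numbered}],
    )
    # A production version would validate the JSON before trusting it.
    return json.loads(resp.choices[0].message.content)

if __name__ == "__main__":
    with open("summarized_articles.json", encoding="utf-8") as f:
        outline = create_report_outline(json.load(f))
    with open("report_outline.json", "w", encoding="utf-8") as f:
        json.dump(outline, f, indent=2)
```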
The writing phase is driven by the `write_report_from_outline(outline)` function.
The script operates by loading the `report_outline.json` file and using its contents to construct a detailed prompt. The prompt directs the model to draft each section of the outline and to cite claims with inline placeholders: `[Source 1]` for a single source and `[Source 1, 3, 5]` for multiple sources. The generated text is saved to `draft_report.md`. This file serves as the near-final draft, containing all the generated text and correctly formatted citation placeholders, ready for the final processing step.
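A sketch of the writer; the prompt wording here is illustrative:

```python
# Writer sketch - the prompt text is an assumption, not the original.
import json

from cerebras_client import get_client

def write_report_from_outline(outline: dict) -> str:
    prompt = ("Write a complete report in Markdown from this JSON outline, "
              "following the section order exactly. After each claim, cite the "
              "bullet's source indices as [Source 1] for one source or "
              "[Source 1, 3, 5] for several.\n\n" + json.dumps(outline, indent=2))
    resp = get_client().chat.completions.create(
        model="llama-3.3-70b",  # placeholder model
        messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

if __name__ == "__main__":
    with open("report_outline.json", encoding="utf-8") as f:
        draft = write_report_from_outline(json.load(f))
    with open("draft_report.md", "w", encoding="utf-8") as f:
        f.write(draft)
```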
The final script, `citation_manager.py`, polishes the report by formatting citations and appending a complete reference list. This entire process is handled by the `create_final_report()` function.
The script operates as follows:
- It loads `draft_report.md`, which contains the text with `[Source X]` placeholders, and `summarized_articles.json`, which holds the metadata for each source.
- It renumbers the sources in the order they first appear in the draft, and the `[Source X]` placeholders are replaced with these new, ordered numbers.
- It appends the formatted reference list and saves the result as `final_report.md`, the completed output of the pipeline.
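A regex-based sketch of this renumbering; the metadata field names (`title`, `url`) are assumptions:

```python
# Citation-manager sketch - source metadata field names are assumptions.
import json
import re

def create_final_report() -> None:
    with open("draft_report.md", encoding="utf-8") as f:
        draft = f.read()
    with open("summarized_articles.json", encoding="utf-8") as f:
        sources = json.load(f)

    order: dict[int, int] = {}  # original source index -> new citation number

    def renumber(match: re.Match) -> str:
        new = [str(order.setdefault(int(tok), len(order) + 1))
               for tok in match.group(1).replace(" ", "").split(",")]
        return f"[{', '.join(new)}]"

    # Rewrite [Source 1] and [Source 1, 3, 5] in order of first appearance.
    body = re.sub(r"\[Source ([\d,\s]+)\]", renumber, draft)

    references = "\n\n## References\n\n" + "\n".join(
        f"{new}. {sources[old - 1]['title']} - {sources[old - 1]['url']}"
        for old, new in sorted(order.items(), key=lambda item: item[1]))

    with open("final_report.md", "w", encoding="utf-8") as f:
        f.write(body + references)

if __name__ == "__main__":
    create_final_report()
```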
This tutorial demonstrated a five-phase pipeline that automates the creation of a comprehensive, cited news report from a single user topic. By breaking the process into distinct, manageable steps—from user interaction to final citation management—the system ensures a high-quality and coherent output.
A core takeaway is the power of iterative refinement in the research phase. The “Reflect, Elaborate, Critique, Refine” summarization loop significantly enhances the accuracy and depth of the generated summaries. This technique, which involves using a second model to provide critical feedback, improves the quality of the final content without requiring expensive model fine-tuning.
Such advanced, multi-step agentic workflows are only practical with access to high-speed inference. The research phase alone requires nearly 50 sequential LLM calls to process all the articles. Traditional inference speeds would introduce significant latency, making this iterative approach impractical. Fast, low-latency inference is the enabling technology that allows for the development of more sophisticated and reliable AI agents.
While this pipeline focused on generating a news report, the “generate, critique, refine” pattern is a versatile technique applicable to numerous AI agentic workflows: