Njdeh Satourian
July 15, 2025
How Iteration Improves Summarization & Why Fast Inference Matters
Having established iterative summarization as the cornerstone of our report generation pipeline, let’s examine why this approach dramatically enhances quality and why fast inference is critical to making this feasible. The iterative summarization method—Reflect → Elaborate → Critique → Refine—is fundamentally designed to address common pitfalls in automated summarization, such as factual inaccuracies, omission of critical details, and surface-level analysis. Each iteration improves upon the previous one, guided by structured feedback and critique, ensuring that summaries are thorough, accurate, and nuanced.
Research studies underscore the effectiveness of this approach. For instance, the Self-Refine method (Madaan et al., 2023) demonstrated that iterative self-feedback significantly boosts accuracy by as much as 20%. Similarly, SelfCheckGPT (Manakul et al., 2023) confirmed that incorporating a critique phase using a second model or alternative decoding strategies substantially reduces errors, omissions, and hallucinations common in single-pass summarization.
However, such iterative processes are computationally intensive, requiring multiple sequential LLM calls for every article. In our pipeline, summarizing just 12 articles involves nearly 50 sequential model invocations. Traditional inference providers with slower response times would render such an iterative loop impractical due to prohibitively high latency. Our pipeline leverages fast inference—using models like Llama 3.3 70B and Qwen 3 32B, served at speeds exceeding 2100 tokens per second—to execute each summarization step in mere seconds. This rapid response makes the iterative approach not only feasible but highly practical, enabling near-instant feedback loops that significantly enhance final report quality. In this cookbook recipe, we’ll walk you through how to build this multi-agent system step by step.
Architecture Overview
The system’s architecture is modeled after an assembly line for knowledge work, comprising five distinct, specialized agents that each perform a single task before passing their work to the next:
- Interactor: Refines the user’s initial topic by asking clarifying questions and capturing the answers to create a detailed research_brief.json.
- Researcher: Takes the research brief, generates targeted queries, searches the web, and uses an advanced iterative loop (Reflect, Elaborate, Critique, Refine) to produce high-quality article summaries.
- Outliner: Synthesizes all the research summaries into a structured, logical blueprint for the final report, embedding citation placeholders at each step.
- Writer: Composes the full, human-readable narrative in Markdown, transforming the outline’s bullet points into flowing paragraphs and preserving the citation placeholders.
- Citation Manager: Performs the final post-processing step, converting the placeholders into numbered citations and building a professional reference list at the end of the document.
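Before diving into each agent, here is a hypothetical top-level orchestration showing how the five stages chain together. The module names and the save_research_brief helper are illustrative assumptions; the real main script is not reproduced in this recipe (see Prerequisites below).

```python
# Hypothetical top-level orchestration of the five agents. The module names and
# the save_research_brief helper are illustrative; the actual main script is
# omitted from this write-up.
from interactor import ask_follow_up_questions, capture_user_answers, save_research_brief
from researcher import run_research_tasks
from outliner import create_report_outline
from writer import write_report_from_outline
from citation_manager import create_final_report


def main() -> None:
    topic = input("What topic should the report cover? ")

    # 1. Interactor: clarify intent and persist the research brief.
    questions = ask_follow_up_questions(topic)
    answers = capture_user_answers(questions)
    research_brief = save_research_brief(topic, questions, answers)

    # 2. Researcher: search the web, then iteratively summarize each article.
    summaries = run_research_tasks(research_brief)

    # 3. Outliner: turn the summaries into a structured report blueprint.
    outline = create_report_outline(summaries)

    # 4. Writer: expand the outline into a Markdown draft with citation placeholders.
    write_report_from_outline(outline)

    # 5. Citation Manager: renumber citations and append the reference list.
    create_final_report()


if __name__ == "__main__":
    main()
```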
Prerequisites
To keep this recipe concise, we have not included every snippet of code found in the codebase for this agent; you may notice import statements and the main orchestration logic missing. You can find the entirety of the code in its directory here.
Shared Client Architecture
To optimize performance and reduce API initialization overhead, the system uses a shared client architecture. Instead of creating a new Cerebras client instance for each function call, we create it once and reuse it throughout the entire pipeline. The cerebras_client.py module implements a singleton pattern, which provides:
- Performance: Eliminates repeated client initialization overhead
- Resource efficiency: One connection/session instead of many
- Flexibility: Both synchronous and asynchronous clients available
- Consistency: All modules use the same client instance
Other modules simply call get_client() or get_async_client() as needed, with the client being created lazily on first use and reused for all subsequent API calls.
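A minimal sketch of what cerebras_client.py might look like, assuming the Cerebras Cloud SDK’s Cerebras and AsyncCerebras classes and a CEREBRAS_API_KEY environment variable; the actual module in the repo may differ.

```python
# cerebras_client.py — lazy singleton wrappers around the Cerebras SDK clients.
# Sketch only: assumes the cerebras-cloud-sdk package and a CEREBRAS_API_KEY
# environment variable.
import os
from typing import Optional

from cerebras.cloud.sdk import AsyncCerebras, Cerebras

_client: Optional[Cerebras] = None
_async_client: Optional[AsyncCerebras] = None


def get_client() -> Cerebras:
    """Return the shared synchronous client, creating it on first use."""
    global _client
    if _client is None:
        _client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])
    return _client


def get_async_client() -> AsyncCerebras:
    """Return the shared asynchronous client, creating it on first use."""
    global _async_client
    if _async_client is None:
        _async_client = AsyncCerebras(api_key=os.environ["CEREBRAS_API_KEY"])
    return _async_client
```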
Configuration System
The system uses a centralized configuration file (config.yaml) that allows easy customization of models, API limits, word counts, and other parameters without modifying code. This makes the system highly configurable and maintainable.
The config_loader.py module provides access to these settings.
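Since the loader itself is not reproduced here, the following is a minimal sketch of how config_loader.py could expose those settings; the key names (models, api_limits, word_counts) are assumptions about the config layout, not the repo’s actual schema.

```python
# config_loader.py — read config.yaml once and expose a simple accessor.
# Sketch only: the nested key names are assumptions, not the repo's schema.
from functools import lru_cache
from pathlib import Path
from typing import Any

import yaml

CONFIG_PATH = Path(__file__).parent / "config.yaml"


@lru_cache(maxsize=1)
def load_config() -> dict[str, Any]:
    """Parse config.yaml a single time and cache the result."""
    with CONFIG_PATH.open("r", encoding="utf-8") as f:
        return yaml.safe_load(f)


def get_setting(*keys: str, default: Any = None) -> Any:
    """Walk nested keys, e.g. get_setting("models", "summarizer")."""
    value: Any = load_config()
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value
```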
Step 1: Interaction
The first step in our pipeline focuses on clarifying and capturing the user’s precise intent, ensuring the accuracy and relevance of the final output. The script comprises two core functions:
- ask_follow_up_questions(topic) - Takes the user’s provided topic and generates three structured follow-up questions.
- capture_user_answers(questions) - Presents these generated questions one-by-one, interactively capturing the user’s answers.
The topic, generated questions, and captured answers are then saved together as research_brief.json, which serves as the input for the research phase.
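A condensed sketch of these two functions using the shared Cerebras client. The prompt wording, model ID, and research_brief fields are assumptions, and save_research_brief is a hypothetical helper rather than a function named in the repo.

```python
# Interactor sketch — generate clarifying questions, then capture answers.
# The prompt wording, model ID, and research_brief fields are assumptions.
import json

from cerebras_client import get_client


def ask_follow_up_questions(topic: str) -> list[str]:
    """Ask the LLM for three clarifying questions about the user's topic."""
    response = get_client().chat.completions.create(
        model="llama-3.3-70b",  # assumed model ID
        messages=[{
            "role": "user",
            "content": (
                f"The user wants a news report on: {topic}\n"
                "Return a JSON array of exactly three clarifying questions."
            ),
        }],
    )
    # Assumes the model returns bare JSON; the real code may be more defensive.
    return json.loads(response.choices[0].message.content)


def capture_user_answers(questions: list[str]) -> list[str]:
    """Present each question on the console and record the user's answer."""
    return [input(f"{question}\n> ") for question in questions]


def save_research_brief(topic: str, questions: list[str], answers: list[str]) -> dict:
    """Hypothetical helper that persists the brief driving the rest of the pipeline."""
    brief = {"topic": topic, "questions": questions, "answers": answers}
    with open("research_brief.json", "w", encoding="utf-8") as f:
        json.dump(brief, f, indent=2)
    return brief
```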
Step 2: Research and Advanced Summarization
Next, we build out the core research engine of the pipeline. It transforms the detailed research_brief.json from Step 1 into a collection of high-quality, detailed summaries through a combination of web searches and advanced AI summarization.
This script operates around two main functions:
- run_research_tasks(research_brief) - Orchestrates a “General + Specific” query strategy. It first formulates one broad search query based on your initial topic, followed by three specific queries derived from the clarifying questions and answers in the brief. Using the Exa API, it retrieves the top three articles for each query, resulting in 12 selected source articles (a search sketch appears at the end of this step).
- summarize_single_article(article_text, research_brief) - Conducts a sophisticated four-step “Iterative Refinement” summarization loop (sketched just below):
  - Reflect: Identifies key points to structure the summary.
  - Elaborate: Generates an initial detailed summary draft.
  - Critique: Employs a different model (Qwen-3-32B) to review and critique the summary.
  - Refine: Produces a final, refined summary incorporating the critique.
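Here is a condensed sketch of that loop using the shared Cerebras client. The prompts are paraphrased and the model IDs are assumptions based on the models named earlier in this post, so treat it as illustrative rather than the repo’s exact code.

```python
# summarize_single_article sketch — the Reflect → Elaborate → Critique → Refine loop.
# Illustrative only: prompts are paraphrased and model IDs are assumptions.
from cerebras_client import get_client

SUMMARY_MODEL = "llama-3.3-70b"   # drafts and refines the summary (assumed ID)
CRITIQUE_MODEL = "qwen-3-32b"     # provides the independent critique (assumed ID)


def _chat(model: str, prompt: str) -> str:
    """Single chat-completion call against the shared Cerebras client."""
    response = get_client().chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def summarize_single_article(article_text: str, research_brief: dict) -> str:
    topic = research_brief["topic"]

    # 1. Reflect: identify the key points the summary must cover.
    key_points = _chat(
        SUMMARY_MODEL,
        f"List the key points in this article relevant to '{topic}':\n\n{article_text}",
    )

    # 2. Elaborate: produce an initial detailed draft from those key points.
    draft = _chat(
        SUMMARY_MODEL,
        f"Using these key points:\n{key_points}\n\nWrite a detailed summary of:\n\n{article_text}",
    )

    # 3. Critique: a different model reviews the draft for errors and omissions.
    critique = _chat(
        CRITIQUE_MODEL,
        f"Critique this summary for inaccuracies, omissions, and vagueness.\n\n"
        f"Article:\n{article_text}\n\nSummary:\n{draft}",
    )

    # 4. Refine: incorporate the critique into the final summary.
    return _chat(
        SUMMARY_MODEL,
        f"Revise the summary to address this critique.\n\nSummary:\n{draft}\n\nCritique:\n{critique}",
    )
```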
This step saves three output files:
- search_queries.json
- raw_exa_results.json
- summarized_articles.json (most important - sets the stage for subsequent outlining and drafting phases)
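For the search side, run_research_tasks might gather its source articles along the following lines. This assumes the exa_py client, an EXA_API_KEY environment variable, and the research_brief fields from the Interactor sketch; the full function would then call summarize_single_article on each article and save summarized_articles.json.

```python
# run_research_tasks sketch (search portion only) — "General + Specific" queries via Exa.
# Assumptions: the exa_py client, an EXA_API_KEY environment variable, and the
# research_brief fields from the Interactor sketch.
import os

from exa_py import Exa

exa = Exa(api_key=os.environ["EXA_API_KEY"])


def gather_source_articles(research_brief: dict) -> list[dict]:
    # One broad query from the topic, plus one specific query per Q&A pair.
    topic = research_brief["topic"]
    queries = [topic]
    for question, answer in zip(research_brief["questions"], research_brief["answers"]):
        queries.append(f"{topic} {question} {answer}")

    articles = []
    for query in queries:  # 4 queries x 3 results = 12 articles
        response = exa.search_and_contents(query, num_results=3, text=True)
        for result in response.results:
            articles.append({"title": result.title, "url": result.url, "text": result.text})

    # run_research_tasks would then call summarize_single_article() on each
    # article and save the results to summarized_articles.json (omitted here).
    return articles
```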
Step 3: Creating a Structured Outline
Phase 3 transforms the collection of detailed summaries into a single, coherent outline for the final report. This process is managed by the create_report_outline(summaries) function within 3_outliner.py.
The script operates as follows:
- It begins by loading the article summaries from summarized_articles.json.
- The summaries are combined into a single context and sent to the LLM, which is prompted to synthesize the information and organize it into a logical narrative flow.
- To ensure a predictable and usable structure, the model’s output is constrained by a detailed JSON schema.
The output is saved as report_outline.json, a structured file that contains the blueprint for the report, including a title, introduction, body sections, conclusion, and bullet points. Each bullet point is paired with a list of source indices, ensuring every claim is traceable back to its original source material. This file acts as the primary input for the next agent, the Writer.
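A sketch of how create_report_outline could enforce that structure. It assumes the Cerebras client accepts an OpenAI-style response_format with a JSON schema (if not, the schema can simply be embedded in the prompt), and the abbreviated schema and summary fields shown here are assumptions rather than the repo’s exact definitions.

```python
# create_report_outline sketch — constrain the outline to a JSON schema.
# Assumptions: an OpenAI-style response_format parameter, an abbreviated schema,
# and title/summary fields in the summaries; the repo's definitions may differ.
import json

from cerebras_client import get_client

OUTLINE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "introduction": {"type": "string"},
        "sections": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "heading": {"type": "string"},
                    "bullet_points": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "point": {"type": "string"},
                                "source_indices": {"type": "array", "items": {"type": "integer"}},
                            },
                        },
                    },
                },
            },
        },
        "conclusion": {"type": "string"},
    },
}


def create_report_outline(summaries: list[dict]) -> dict:
    context = "\n\n".join(
        f"[Source {i + 1}] {s['title']}\n{s['summary']}" for i, s in enumerate(summaries)
    )
    response = get_client().chat.completions.create(
        model="llama-3.3-70b",  # assumed model ID
        messages=[{
            "role": "user",
            "content": "Synthesize these summaries into a logical report outline:\n\n" + context,
        }],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "report_outline", "schema": OUTLINE_SCHEMA},
        },
    )
    outline = json.loads(response.choices[0].message.content)
    with open("report_outline.json", "w", encoding="utf-8") as f:
        json.dump(outline, f, indent=2)
    return outline
```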
Step 4: Drafting the Report
Phase 4 transforms the structured outline from the previous step into a complete, narrative-driven report. This task is managed by the write_report_from_outline(outline) function.
The script operates by loading the report_outline.json file and using its contents to construct a detailed prompt. The prompt directs the model to:
- Act as an expert journalist, weaving the outline’s bullet points into a flowing, paragraph-based article.
- Use Markdown for all formatting.
- Preserve the source citations using a strict format: [Source 1] for a single source and [Source 1, 3, 5] for multiple sources.
The output is saved as draft_report.md. This file serves as the near-final draft, containing all the generated text and correctly formatted citation placeholders, ready for the final processing step.
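A minimal sketch of the Writer, again assuming the shared Cerebras client and the outline fields from the previous sketch; the prompt wording is paraphrased from the instructions above.

```python
# write_report_from_outline sketch — turn the outline into a Markdown draft.
# Illustrative only: the prompt is paraphrased and the model ID is assumed.
import json

from cerebras_client import get_client


def write_report_from_outline(outline: dict) -> str:
    prompt = (
        "You are an expert journalist. Expand the following outline into a flowing, "
        "paragraph-based article. Use Markdown for all formatting, and preserve the "
        "source citations exactly as given, e.g. [Source 1] or [Source 1, 3, 5].\n\n"
        + json.dumps(outline, indent=2)
    )
    response = get_client().chat.completions.create(
        model="llama-3.3-70b",  # assumed model ID
        messages=[{"role": "user", "content": prompt}],
    )
    draft = response.choices[0].message.content
    with open("draft_report.md", "w", encoding="utf-8") as f:
        f.write(draft)
    return draft
```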
Step 5: Citation Management and Finalization
The final step, executed by citation_manager.py, polishes the report by formatting citations and appending a complete reference list. This entire process is handled by the create_final_report() function.
The script operates as follows:
- It loads the two required inputs: draft_report.md, which contains the text with [Source X] placeholders, and summarized_articles.json, which holds the metadata for each source.
- It parses the draft to find every unique source that was cited, regardless of where it appears.
- It re-numbers the sources to ensure they appear sequentially (1, 2, 3…) in the final document. The original [Source X] placeholders are replaced with these new, ordered numbers.
- A “References” section is generated in Markdown, listing each unique source with its title and URL (see the sketch below).
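A sketch of that post-processing, assuming the [Source X] placeholder format described above, 1-based source indices, and title/url fields in summarized_articles.json; the exact citation and reference styling in the repo may differ.

```python
# create_final_report sketch — renumber [Source X] citations and append references.
# Assumptions: the placeholder format described above, 1-based source indices,
# and title/url keys in summarized_articles.json.
import json
import re

PLACEHOLDER = re.compile(r"\[Source ([\d,\s]+)\]")


def create_final_report() -> None:
    with open("draft_report.md", encoding="utf-8") as f:
        draft = f.read()
    with open("summarized_articles.json", encoding="utf-8") as f:
        sources = json.load(f)

    # Assign new sequential numbers in order of first appearance in the draft.
    new_numbers: dict[int, int] = {}
    for match in PLACEHOLDER.finditer(draft):
        for old in (int(n) for n in match.group(1).split(",")):
            new_numbers.setdefault(old, len(new_numbers) + 1)

    # Replace each placeholder with its renumbered form, e.g. [1, 3].
    def renumber(m: re.Match) -> str:
        olds = (int(n) for n in m.group(1).split(","))
        return "[" + ", ".join(str(new_numbers[o]) for o in olds) + "]"

    final_text = PLACEHOLDER.sub(renumber, draft)

    # Build the References section from the cited sources only.
    reference_lines = ["## References", ""]
    for old, new in sorted(new_numbers.items(), key=lambda kv: kv[1]):
        src = sources[old - 1]  # assumes 1-based [Source X] indices
        reference_lines.append(f"{new}. {src['title']} - {src['url']}")

    with open("final_report.md", "w", encoding="utf-8") as f:
        f.write(final_text.rstrip() + "\n\n" + "\n".join(reference_lines) + "\n")
```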
The fully processed text and the new reference list are then saved together as final_report.md, the completed output of the pipeline.
Conclusion
This tutorial demonstrated a five-phase pipeline that automates the creation of a comprehensive, cited news report from a single user topic. By breaking the process into distinct, manageable steps—from user interaction to final citation management—the system ensures a high-quality and coherent output.
A core takeaway is the power of iterative refinement in the research phase. The “Reflect, Elaborate, Critique, Refine” summarization loop significantly enhances the accuracy and depth of the generated summaries. This technique, which involves using a second model to provide critical feedback, improves the quality of the final content without requiring expensive model fine-tuning.
Such advanced, multi-step agentic workflows are only practical with access to high-speed inference. The research phase alone requires nearly 50 sequential LLM calls to process all the articles. Traditional inference speeds would introduce significant latency, making this iterative approach impractical. Fast, low-latency inference is the enabling technology that allows for the development of more sophisticated and reliable AI agents.
While this pipeline focused on generating a news report, the “generate, critique, refine” pattern is a versatile technique applicable to numerous AI agentic workflows:
- Code Generation: An agent could write a function, a second agent could critique it for bugs and style, and a third could implement the suggested fixes.
- Strategic Planning: An agent could draft a business plan, a critique agent could identify potential risks or logical gaps, and a refine agent could create a more robust final strategy.
- Creative Writing: An agent could write a chapter of a story, a critique agent could check for plot holes or inconsistent character voices, and a refine agent could rewrite the section to address the feedback.