What is Reducto?

Reducto is a document parsing platform that extracts structured data from PDFs, images, Word documents, spreadsheets, and presentations. It converts complex documents into clean markdown with bounding boxes, tables, figures, and metadata—making it easy to feed document content into LLMs for analysis, summarization, and question-answering. By combining Reducto’s parsing capabilities with Cerebras’s ultra-fast inference, you can build powerful document processing pipelines that analyze thousands of documents in seconds. Learn more at Reducto.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key here
  • Reducto Account - Visit Reducto and create an account to get your API key
  • Python 3.7 or higher
  • Documents to parse - PDFs, images, Word docs, or other supported formats

Configure Reducto with Cerebras

1. Install required dependencies

Install the Reducto SDK and OpenAI client for Cerebras:
pip install reductoai openai python-dotenv requests
2. Configure environment variables

Create a .env file in your project directory with your API keys. This keeps your credentials secure and separate from your code.
CEREBRAS_API_KEY=your-cerebras-api-key-here
REDUCTO_API_KEY=your-reducto-api-key-here
You can find your Reducto API key in your Reducto dashboard under Settings.
3. Parse a document with Reducto

Use Reducto to extract structured content from your document. Reducto converts complex documents into clean markdown, preserving tables, figures, and document structure.
import os
import requests
from pathlib import Path
from reducto import Reducto
from dotenv import load_dotenv

load_dotenv()

# Initialize Reducto client
client = Reducto(api_key=os.getenv("REDUCTO_API_KEY"))

# Download sample PDF
pdf_url = "https://www.visitissaquahwa.com/wp-content/uploads/2023/03/Issaquah-Trails-Map-202108041607087155.pdf"
response = requests.get(pdf_url)

with open("/tmp/temp_doc.pdf", "wb") as f:
    f.write(response.content)

# Upload and parse the document
upload = client.upload(file=Path("/tmp/temp_doc.pdf"))
result = client.parse.run(input=upload.file_id)

# Get the parsed content from chunks
parsed_content = "\n".join([chunk.content for chunk in result.result.chunks])
print(f"Parsed {len(parsed_content)} characters of content")
The parsed_content variable now contains clean markdown with all text, tables, and figures extracted from your document.
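For very long documents, parsed_content may exceed a model's context window. One approach (a minimal sketch in plain Python, independent of the Reducto API) is to split the markdown on blank lines under a character budget before sending pieces to the model:

```python
def split_markdown(text, max_chars=8000):
    """Split parsed markdown into pieces under max_chars,
    breaking on blank lines so paragraphs and tables stay intact."""
    pieces, current = [], ""
    for block in text.split("\n\n"):
        candidate = current + "\n\n" + block if current else block
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                pieces.append(current)
            current = block  # an oversized single block becomes its own piece
    if current:
        pieces.append(current)
    return pieces
```

Each piece can then be analyzed separately and the results combined.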
4. Analyze parsed content with Cerebras

Now that you have structured content from Reducto, use Cerebras to analyze it. Cerebras’s fast inference means you can process hundreds of documents per minute.
import os
import requests
from pathlib import Path
from reducto import Reducto
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize clients
reducto_client = Reducto(api_key=os.getenv("REDUCTO_API_KEY"))
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Reducto"
    }
)

# Download and parse document
pdf_url = "https://www.visitissaquahwa.com/wp-content/uploads/2023/03/Issaquah-Trails-Map-202108041607087155.pdf"
response = requests.get(pdf_url)
with open("/tmp/temp_doc.pdf", "wb") as f:
    f.write(response.content)

upload = reducto_client.upload(file=Path("/tmp/temp_doc.pdf"))
result = reducto_client.parse.run(input=upload.file_id)
parsed_content = "\n".join([chunk.content for chunk in result.result.chunks])

# Analyze the parsed document with Cerebras
response = cerebras_client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that analyzes documents and provides clear summaries."
        },
        {
            "role": "user",
            "content": f"Please summarize this document:\n\n{parsed_content}"
        }
    ],
    max_tokens=1000
)

summary = response.choices[0].message.content
print("Document Summary:")
print(summary)
5. Extract structured information

You can also use Cerebras to extract specific information from parsed documents. This example extracts key financial metrics using JSON mode for structured output.
import os
import json
import requests
from pathlib import Path
from reducto import Reducto
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize clients
reducto_client = Reducto(api_key=os.getenv("REDUCTO_API_KEY"))
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Reducto"
    }
)

# Download and parse document
pdf_url = "https://www.visitissaquahwa.com/wp-content/uploads/2023/03/Issaquah-Trails-Map-202108041607087155.pdf"
response = requests.get(pdf_url)
with open("/tmp/temp_doc.pdf", "wb") as f:
    f.write(response.content)

upload = reducto_client.upload(file=Path("/tmp/temp_doc.pdf"))
result = reducto_client.parse.run(input=upload.file_id)
parsed_content = "\n".join([chunk.content for chunk in result.result.chunks])

# Extract structured data from the document
response = cerebras_client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {
            "role": "system",
            "content": "You are a financial analyst. Extract key metrics from documents and return them as JSON."
        },
        {
            "role": "user",
            "content": f"""Extract the following information from this document:
            - Revenue
            - Net Income
            - Total Assets
            - Key Risks
            
            Document content:
            {parsed_content}
            
            Return the data as JSON."""
        }
    ],
    response_format={"type": "json_object"},
    max_tokens=1000
)

extracted_data = json.loads(response.choices[0].message.content)
print("Extracted Financial Data:")
print(json.dumps(extracted_data, indent=2))
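JSON mode guarantees syntactically valid JSON, but not that every requested field appears or that key casing is consistent. A small defensive step helps before downstream use; the field names here are hypothetical, chosen to match the prompt above:

```python
EXPECTED_FIELDS = ["revenue", "net_income", "total_assets", "key_risks"]

def normalize_extraction(data):
    """Normalize keys (lowercase, underscores) and fill missing
    fields with None so downstream code can rely on the shape."""
    cleaned = {str(k).strip().lower().replace(" ", "_"): v for k, v in data.items()}
    return {field: cleaned.get(field) for field in EXPECTED_FIELDS}
```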

Complete Example: Document Q&A Pipeline

Here’s a complete example that combines Reducto’s parsing with Cerebras’s inference to create a document question-answering system:
import os
import requests
from pathlib import Path
from reducto import Reducto
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize clients
reducto_client = Reducto(api_key=os.getenv("REDUCTO_API_KEY"))
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Reducto"
    }
)

# Download and parse document
pdf_url = "https://www.visitissaquahwa.com/wp-content/uploads/2023/03/Issaquah-Trails-Map-202108041607087155.pdf"
response = requests.get(pdf_url)
with open("/tmp/temp_doc.pdf", "wb") as f:
    f.write(response.content)

upload = reducto_client.upload(file=Path("/tmp/temp_doc.pdf"))
result = reducto_client.parse.run(input=upload.file_id)
parsed_content = "\n".join([chunk.content for chunk in result.result.chunks])

# Answer question using Cerebras
question = "What trails are shown on this map?"
response = cerebras_client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that answers questions based on document content."
        },
        {
            "role": "user",
            "content": f"Question: {question}\n\nDocument content:\n{parsed_content[:2000]}"
        }
    ],
    max_tokens=500
)

print("Answer:")
print(response.choices[0].message.content)

Advanced Features

Process Multiple Documents

Process many documents by combining Reducto's parsing with Cerebras's fast inference. The helper below parses and summarizes a single document; to handle large collections, call it for each file, and use a per-call temporary path instead of the shared /tmp file if you run workers in parallel:
import os
import requests
from pathlib import Path
from reducto import Reducto
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

reducto_client = Reducto(api_key=os.getenv("REDUCTO_API_KEY"))
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Reducto"
    }
)

def process_document(pdf_url):
    """Parse and summarize a document."""
    # Download PDF
    response = requests.get(pdf_url)
    with open("/tmp/temp_doc.pdf", "wb") as f:
        f.write(response.content)
    
    # Parse with Reducto
    upload = reducto_client.upload(file=Path("/tmp/temp_doc.pdf"))
    result = reducto_client.parse.run(input=upload.file_id)
    parsed_content = "\n".join([chunk.content for chunk in result.result.chunks])
    
    # Summarize with Cerebras
    response = cerebras_client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": "Summarize this document in two to three sentences."},
            {"role": "user", "content": parsed_content}
        ],
        max_tokens=200
    )
    
    return response.choices[0].message.content

# Example: Process a document
pdf_url = "https://www.visitissaquahwa.com/wp-content/uploads/2023/03/Issaquah-Trails-Map-202108041607087155.pdf"
summary = process_document(pdf_url)
print("Document Summary:")
print(summary)
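To actually fan this out over many files, a thread pool works well, since both the Reducto and Cerebras calls are I/O-bound. A sketch that maps any per-document worker (such as process_document above, modified to write each download to its own temporary file) over a list of URLs:

```python
from concurrent.futures import ThreadPoolExecutor

def summarize_all(pdf_urls, worker, max_workers=8):
    """Run a parse-and-summarize worker over many URLs concurrently,
    returning summaries in the same order as the input list."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, pdf_urls))
```

For example, summarize_all(urls, worker=process_document) summarizes every document in urls concurrently.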

Use Reducto Studio Pipelines

Reducto Studio lets you configure parsing pipelines visually and deploy them for API access. The example below uses the default parse configuration; once you've created a pipeline in Reducto Studio, you can invoke it in place of the default parse call (see Reducto's documentation for pipeline-specific parameters):
import os
import requests
from pathlib import Path
from reducto import Reducto
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

reducto_client = Reducto(api_key=os.getenv("REDUCTO_API_KEY"))
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Reducto"
    }
)

# Download and parse document
pdf_url = "https://www.visitissaquahwa.com/wp-content/uploads/2023/03/Issaquah-Trails-Map-202108041607087155.pdf"
response = requests.get(pdf_url)
with open("/tmp/temp_doc.pdf", "wb") as f:
    f.write(response.content)

upload = reducto_client.upload(file=Path("/tmp/temp_doc.pdf"))
result = reducto_client.parse.run(input=upload.file_id)
parsed_content = "\n".join([chunk.content for chunk in result.result.chunks])

# Analyze with Cerebras
response = cerebras_client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"Summarize the key information from this document:\n\n{parsed_content}"}
    ]
)

print(response.choices[0].message.content)

Async Processing with Webhooks

For large document batches, use Reducto's webhook support for async processing: instead of blocking on parse results, Reducto notifies your endpoint when a job completes. The example below shows the equivalent synchronous flow; see Reducto's documentation for webhook configuration:
import os
import requests
from pathlib import Path
from reducto import Reducto
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

reducto_client = Reducto(api_key=os.getenv("REDUCTO_API_KEY"))
cerebras_client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        "X-Cerebras-3rd-Party-Integration": "Reducto"
    }
)

# Download and upload document for processing
pdf_url = "https://www.visitissaquahwa.com/wp-content/uploads/2023/03/Issaquah-Trails-Map-202108041607087155.pdf"
response = requests.get(pdf_url)
with open("/tmp/temp_doc.pdf", "wb") as f:
    f.write(response.content)

upload = reducto_client.upload(file=Path("/tmp/temp_doc.pdf"))
print(f"Document uploaded: {upload.file_id}")

# Parse the document
result = reducto_client.parse.run(input=upload.file_id)
parsed_content = "\n".join([chunk.content for chunk in result.result.chunks])

# Analyze with Cerebras
response = cerebras_client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "Summarize this document briefly."},
        {"role": "user", "content": parsed_content[:2000]}
    ],
    max_tokens=200
)
print(response.choices[0].message.content)
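When a webhook is configured, Reducto posts the job result to your endpoint instead of your code polling for it. The handler below is framework-agnostic and assumes a payload shaped like the synchronous parse result (a result.chunks list of objects with a content field); this shape is an assumption, so check Reducto's webhook documentation for the exact schema:

```python
def handle_reducto_webhook(payload):
    """Join the chunks from an assumed webhook payload into one
    markdown string, ready to pass to Cerebras for analysis."""
    chunks = payload.get("result", {}).get("chunks", [])
    return "\n".join(chunk.get("content", "") for chunk in chunks)
```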

Troubleshooting

  • Check file format: Reducto supports 30+ formats, including PDF, DOCX, XLSX, PPTX, and images. Ensure your file isn’t corrupted.
  • File size limits: Large files may need to be split or compressed; see Reducto’s documentation for current size limits.
  • API key issues: Verify your Reducto API key is correct and that your account has sufficient credits in your dashboard.

  • Use batch processing: Process multiple documents in parallel using ThreadPoolExecutor to maximize throughput.
  • Optimize prompts: Shorter, more focused prompts reduce token usage and latency. Be specific about what information you need.
  • Choose the right model: Use llama3.1-8b for simple tasks like classification and llama-3.3-70b for complex analysis and extraction.

  • Split large documents: If parsed content exceeds the context window, split the document into sections and process them separately.
  • Increase max_tokens: Raise the max_tokens parameter for longer responses, but be mindful of cost.
  • Use summarization: Summarize sections before detailed analysis to reduce token usage.
  • Check context windows: See our models documentation for each model’s context window size.

  • Reducto limits: Check your Reducto plan limits and upgrade if needed.
  • Cerebras limits: See our rate limits documentation for current limits and how to request increases.
  • Implement retry logic: Add exponential backoff so production applications handle temporary rate limits gracefully.

  • Improve prompts: Be specific about the format and structure you want, and include examples in your prompts.
  • Use JSON mode: Enable response_format={"type": "json_object"} for structured data extraction.
  • Configure Reducto parsing: Adjust Reducto’s parse configuration to better preserve document structure. See Parse Configurations.
  • Try different models: llama-3.3-70b and qwen-3-32b offer different strengths for extraction tasks.
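The retry advice above can be sketched as a small wrapper with exponential backoff and jitter (plain Python; which exception types to catch depends on the client that raised them, so the blanket Exception here is for illustration only):

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

For example, with_retries(lambda: cerebras_client.chat.completions.create(...)) wraps a single completion call.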

Next Steps

For production deployments, consider using Reducto’s webhook support for async processing of large document batches. This pairs well with Cerebras’s fast inference for real-time analysis. See Async Processing & Webhooks for details.