This feature is in Private Preview. For access or more information, contact us or reach out to your account representative.
The Batch API lets you process groups of requests asynchronously, making it perfect for workloads where you don’t need immediate results:
  • Evaluation pipelines: Test model performance across thousands of test cases
  • Data labeling: Classify or annotate large datasets for training
  • Content generation: Create product descriptions, summaries, or translations in bulk
  • Research and analysis: Process scientific data or run experiments at scale
You’ll get 50% off regular pricing and guaranteed completion within 24 hours.

How It Works

The basic workflow has four steps:
  1. Prepare a JSONL file containing all your requests
  2. Upload the file using the Files API
  3. Submit a batch job referencing your uploaded file
  4. Download results once processing completes
Behind the scenes, Cerebras processes your requests during periods of lower demand, which enables the significant cost savings.

Create a Batch Request

1. Prepare your input file

Start by creating a .jsonl file where each line represents one API request. Every request needs a unique custom_id so you can match inputs to outputs later. The only endpoint currently available is /v1/chat/completions. Here's what two requests look like:
{"custom_id": "eval-001", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.3-70b", "messages": [{"role": "user", "content": "Summarize the water cycle"}], "max_completion_tokens": 500}}
{"custom_id": "eval-002", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.3-70b", "messages": [{"role": "user", "content": "Explain photosynthesis"}], "max_completion_tokens": 500}}
Each line contains the same parameters you'd use in a regular chat completion request, just wrapped in the batch format.
Important constraints:
  • Maximum 200 MB file size
  • Minimum 10 requests
  • Up to 50,000 requests per file
  • Each line limited to 1 MB
  • UTF-8 encoding with LF line endings
  • All requests must use the same model
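If you're generating requests programmatically, a minimal sketch like the following writes a file in the format above (the prompts list is illustrative; a real input file needs at least 10 requests):
import json

# Illustrative prompts; replace with your own data (minimum 10 requests per file).
prompts = ["Summarize the water cycle", "Explain photosynthesis"]

with open("my_batch_requests.jsonl", "w", encoding="utf-8", newline="\n") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"eval-{i + 1:03d}",  # must be unique within the file
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "llama-3.3-70b",  # all requests must use the same model
                "messages": [{"role": "user", "content": prompt}],
                "max_completion_tokens": 500,
            },
        }
        f.write(json.dumps(request) + "\n")  # UTF-8 with LF line endings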
2. Upload your file

Use the Files API to upload your prepared input file:
from cerebras.cloud.sdk import Cerebras

client = Cerebras(api_key="your-api-key")

input_file = client.files.create(
    file=open("my_batch_requests.jsonl", "rb"),
    purpose="batch"
)

print(f"Uploaded file: {input_file.id}")
3. Start the batch job

Once your file is uploaded, create the batch job:
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"description": "Evaluation run - December 2025"}
)

print(f"Batch started: {batch.id}")
print(f"Status: {batch.status}")
The batch starts in queued status, then moves to in_progress, finalizing, and finally completed.
4. Track progress

Check on your batch job anytime to see how it’s progressing:
batch = client.batches.retrieve("batch_abc123")

print(f"Status: {batch.status}")
print(f"Completed: {batch.request_counts.completed}/{batch.request_counts.total}")
The response includes detailed request count metrics showing how many requests have completed and failed.
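If you want a script to wait until the batch finishes, you can poll with the same retrieve call. A simple sketch; the 60-second interval is an arbitrary choice:
import time

# Poll until the batch reaches a terminal state.
terminal_states = {"completed", "failed", "expired", "cancelled"}

while True:
    batch = client.batches.retrieve("batch_abc123")
    print(f"Status: {batch.status} "
          f"({batch.request_counts.completed}/{batch.request_counts.total})")
    if batch.status in terminal_states:
        break
    time.sleep(60)  # batches can take hours, so poll sparingly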
5. Get your results

When the status shows completed, download your results:
# Retrieve the result file
results = client.batches.retrieve_results("batch_abc123")

# Save locally
with open("batch_results.jsonl", "wb") as f:
    f.write(results)
Your results file contains one line per request. Successful requests include the full completion response, while failed requests include error details:
{"custom_id":"eval-001","status":"succeeded","response":{"id":"cmpl_1","object":"chat.completion","created":1699999999,"model":"llama-3.3-70b","choices":[{"index":0,"message":{"role":"assistant","content":"The water cycle consists of..."},"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":85,"total_tokens":100}}}
{"custom_id":"eval-002","status":"succeeded","response":{"id":"cmpl_2","object":"chat.completion","created":1700000000,"model":"llama-3.3-70b","choices":[{"index":0,"message":{"role":"assistant","content":"Photosynthesis is the process..."},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":92,"total_tokens":104}}}
Results aren’t necessarily in the same order as your input. Always use custom_id to match requests to responses.
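One way to do that matching is to index the results file by custom_id. A minimal sketch using the output format shown above:
import json

# Build a lookup table keyed by custom_id.
results_by_id = {}
with open("batch_results.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        results_by_id[record["custom_id"]] = record

# Pull the completion text for one request.
record = results_by_id["eval-001"]
if record["status"] == "succeeded":
    print(record["response"]["choices"][0]["message"]["content"])
else:
    print(f"Request {record['custom_id']} failed: {record.get('error')}")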

Batch States

Your batch progresses through these states:
State         What's happening
queued        Waiting for processing capacity
in_progress   Actively processing requests
finalizing    Preparing output file
completed     All done; results ready to download
failed        System error prevented completion
expired       Exceeded 24-hour window
cancelling    Cancellation in progress
cancelled     You stopped the batch
Most batches complete well before the 24-hour limit, often within a few hours depending on size and load.

Expired Batches

If your batch doesn’t finish within 24 hours, unprocessed requests are marked as expired. You’ll still get results for any completed requests, and expired ones appear in your output like this:
{"custom_id":"eval-999","status":"expired","error":{"code":"timeout","message":"Batch expired before this request completed."}}
You’re only charged for requests that completed.
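One recovery pattern is to collect the expired custom_id values from the results file and build a retry file from your original input. A sketch, assuming the original input file is still on disk (and remembering the 10-request minimum per file):
import json

# Collect the custom_ids of requests that expired.
expired_ids = set()
with open("batch_results.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        if record["status"] == "expired":
            expired_ids.add(record["custom_id"])

# Copy only those requests from the original input into a retry file.
with open("retry_requests.jsonl", "w", encoding="utf-8", newline="\n") as out, \
     open("my_batch_requests.jsonl", "r", encoding="utf-8") as original:
    for line in original:
        if json.loads(line)["custom_id"] in expired_ids:
            out.write(line)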

Cancel a Batch

To stop a running batch, use the cancel endpoint:
cancelled = client.batches.cancel("batch_abc123")
print(f"Status: {cancelled.status}")
Any completed requests remain available in your results. Unfinished requests will be deleted.

Limits and Quotas

  • 50,000 requests maximum per batch
  • 200 MB maximum file size
  • 1 MB maximum per request line
  • 10 concurrent active batches
  • Configurable rate limits separate from real-time API
  • Results retained for 7 days after completion
  • Automatic deletion after expiration (download promptly!)

Organize Batches

Use metadata to keep track of different batch runs. Metadata is a set of key-value pairs you can attach to a batch job for your own organizational purposes, such as tracking which environment, dataset, or version a batch belongs to.
batch = client.batches.create(
    input_file_id="file_xyz",
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "environment": "production",
        "dataset": "customer_feedback_q4",
        "version": "v2.1"
    }
)
The metadata you provide is stored with the batch object and returned in all API responses, making it easy to filter, search, and organize your batch jobs. Common uses include:
  • Environment tracking: Tag batches with production, staging, or development
  • Dataset identification: Link batches to specific datasets or experiments
  • Version control: Track which version of your prompts or models you’re testing
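If the SDK also lets you list batch jobs (a batches.list() method is assumed here, not something this page documents), filtering on that metadata client-side is straightforward:
# Hypothetical: assumes the SDK exposes a batches.list() method.
production_batches = [
    b for b in client.batches.list()
    if b.metadata and b.metadata.get("environment") == "production"
]

for b in production_batches:
    print(b.id, b.status, b.metadata.get("dataset"))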

Handling Large Datasets

Need to process more than 50,000 items? Split them across multiple batches:
import json

def split_into_batches(items, batch_size=50000):
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def create_batch_file(requests, filename):
    # Write one chunk of request dicts to a JSONL file.
    with open(filename, "w", encoding="utf-8", newline="\n") as f:
        for request in requests:
            f.write(json.dumps(request) + "\n")
    return filename

# all_items: your full list of request dicts, built as in step 1
for i, chunk in enumerate(split_into_batches(all_items)):
    create_batch_file(chunk, f"batch_{i}.jsonl")
    # Upload with client.files.create(...), then submit with client.batches.create(...)
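Keep the limit of 10 concurrent active batches in mind: if your data splits into more than ten chunks, submit them in waves and wait for earlier batches to finish before starting new ones.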