This feature is in Private Preview. For access or more information, contact us or reach out to your account representative.
The Batch API lets you process groups of requests asynchronously, making it perfect for workloads where you don’t need immediate results:
  • Evaluation pipelines: Test model performance across thousands of test cases
  • Data labeling: Classify or annotate large datasets for training
  • Content generation: Create product descriptions, summaries, or translations in bulk
  • Research and analysis: Process scientific data or run experiments at scale
You’ll get 50% off regular pricing and guaranteed completion within 24 hours.

How It Works

The basic workflow has four steps:
  1. Prepare a JSONL file containing all your requests
  2. Upload the file using the Files API
  3. Submit a batch job referencing your uploaded file
  4. Download results once processing completes
Behind the scenes, Cerebras processes your requests during periods of lower demand, which enables the significant cost savings.

Create a Batch Request

1. Prepare your input file

Start by creating a .jsonl file where each line represents one API request. Every request needs a unique custom_id so you can match inputs to outputs later. The only endpoint currently available is /v1/chat/completions. Here's what two requests look like:
{"custom_id": "eval-001", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.3-70b", "messages": [{"role": "user", "content": "Summarize the water cycle"}], "max_completion_tokens": 500}}
{"custom_id": "eval-002", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.3-70b", "messages": [{"role": "user", "content": "Explain photosynthesis"}], "max_completion_tokens": 500}}
Each line contains the same parameters you'd use in a regular chat completion request, just wrapped in the batch format.
Important constraints:
  • Maximum 200 MB file size
  • Minimum 10 requests
  • Up to 50,000 requests per file
  • Each line limited to 1 MB
  • UTF-8 encoding with LF line endings
  • All requests must use the same model
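If you're generating requests programmatically, a minimal sketch like the following writes a file in the format above (the prompts list is illustrative; a real input file needs at least 10 requests):
import json

# Illustrative prompts; replace with your own data (minimum 10 requests per file).
prompts = ["Summarize the water cycle", "Explain photosynthesis"]

with open("my_batch_requests.jsonl", "w", encoding="utf-8", newline="\n") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"eval-{i + 1:03d}",  # must be unique within the file
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "llama-3.3-70b",  # all requests must use the same model
                "messages": [{"role": "user", "content": prompt}],
                "max_completion_tokens": 500,
            },
        }
        f.write(json.dumps(request) + "\n")  # UTF-8 with LF line endings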
2. Upload your file

Use the Files API to upload your prepared input file:
from cerebras.cloud.sdk import Cerebras

client = Cerebras(api_key="your-api-key")

input_file = client.files.create(
    file=open("my_batch_requests.jsonl", "rb"),
    purpose="batch"
)

print(f"Uploaded file: {input_file.id}")
3. Start the batch job

Once your file is uploaded, create the batch job:
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"description": "Evaluation run - December 2025"}
)

print(f"Batch started: {batch.id}")
print(f"Status: {batch.status}")
The batch starts in queued status, then moves to in_progress, finalizing, and finally completed.
4. Track progress

Check on your batch job anytime to see how it’s progressing:
batch = client.batches.retrieve("batch_abc123")

print(f"Status: {batch.status}")
print(f"Completed: {batch.request_counts.completed}/{batch.request_counts.total}")
The response includes detailed request count metrics showing how many requests have completed and failed.
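If you want a script to wait until the batch finishes, you can poll with the same retrieve call. A simple sketch; the 60-second interval is an arbitrary choice:
import time

# Poll until the batch reaches a terminal state.
terminal_states = {"completed", "failed", "expired", "cancelled"}

while True:
    batch = client.batches.retrieve("batch_abc123")
    print(f"Status: {batch.status} "
          f"({batch.request_counts.completed}/{batch.request_counts.total})")
    if batch.status in terminal_states:
        break
    time.sleep(60)  # batches can take hours, so poll sparingly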
5. Get your results

When the status shows completed, download your results:
# Retrieve the result file
results = client.batches.retrieve_results("batch_abc123")

# Save locally
with open("batch_results.jsonl", "wb") as f:
    f.write(results)
Your results file contains one line per request. Successful requests include the full completion response, while failed requests include error details:
{"custom_id":"eval-001","status":"succeeded","response":{"id":"cmpl_1","object":"chat.completion","created":1699999999,"model":"llama-3.3-70b","choices":[{"index":0,"message":{"role":"assistant","content":"The water cycle consists of..."},"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":85,"total_tokens":100}}}
{"custom_id":"eval-002","status":"succeeded","response":{"id":"cmpl_2","object":"chat.completion","created":1700000000,"model":"llama-3.3-70b","choices":[{"index":0,"message":{"role":"assistant","content":"Photosynthesis is the process..."},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":92,"total_tokens":104}}}
Results aren’t necessarily in the same order as your input. Always use custom_id to match requests to responses.
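One way to do that matching is to index the results file by custom_id. A minimal sketch using the output format shown above:
import json

# Build a lookup table keyed by custom_id.
results_by_id = {}
with open("batch_results.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        results_by_id[record["custom_id"]] = record

# Pull the completion text for one request.
record = results_by_id["eval-001"]
if record["status"] == "succeeded":
    print(record["response"]["choices"][0]["message"]["content"])
else:
    print(f"Request {record['custom_id']} failed: {record.get('error')}")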

Batch States

Your batch progresses through these states:
State         What's happening
queued        Waiting for processing capacity
in_progress   Actively processing requests
finalizing    Preparing output file
completed     All done; results ready to download
failed        System error prevented completion
expired       Exceeded 24-hour window
cancelling    Cancellation in progress
cancelled     You stopped the batch
Most batches complete well before the 24-hour limit, often within a few hours depending on size and load.

Expired Batches

If your batch doesn’t finish within 24 hours, unprocessed requests are marked as expired. You’ll still get results for any completed requests, and expired ones appear in your output like this:
{"custom_id":"eval-999","status":"expired","error":{"code":"timeout","message":"Batch expired before this request completed."}}
You’re only charged for requests that completed.
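One recovery pattern is to collect the expired custom_id values from the results file and build a retry file from your original input. A sketch, assuming the original input file is still on disk (and remembering the 10-request minimum per file):
import json

# Collect the custom_ids of requests that expired.
expired_ids = set()
with open("batch_results.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        if record["status"] == "expired":
            expired_ids.add(record["custom_id"])

# Copy only those requests from the original input into a retry file.
with open("retry_requests.jsonl", "w", encoding="utf-8", newline="\n") as out, \
     open("my_batch_requests.jsonl", "r", encoding="utf-8") as original:
    for line in original:
        if json.loads(line)["custom_id"] in expired_ids:
            out.write(line)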

Cancel a Batch

To stop a running batch, use the cancel endpoint:
cancelled = client.batches.cancel("batch_abc123")
print(f"Status: {cancelled.status}")
Any completed requests remain available in your results. Unfinished requests will be deleted.

Limits and Quotas

  • 50,000 requests maximum per batch
  • 200 MB maximum file size
  • 1 MB maximum per request line
  • 10 concurrent active batches
  • Configurable rate limits separate from real-time API
  • Results retained for 7 days after completion
  • Automatic deletion after expiration (download promptly!)

Organize Batches

Use metadata to keep track of different batch runs. Metadata is a set of key-value pairs you can attach to a batch job for your own organizational purposes, such as tracking which environment, dataset, or version a batch belongs to.
batch = client.batches.create(
    input_file_id="file_xyz",
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "environment": "production",
        "dataset": "customer_feedback_q4",
        "version": "v2.1"
    }
)
The metadata you provide is stored with the batch object and returned in all API responses, making it easy to filter, search, and organize your batch jobs. Common uses include:
  • Environment tracking: Tag batches with production, staging, or development
  • Dataset identification: Link batches to specific datasets or experiments
  • Version control: Track which version of your prompts or models you’re testing
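If the SDK also lets you list batch jobs (a batches.list() method is assumed here, not something this page documents), filtering on that metadata client-side is straightforward:
# Hypothetical: assumes the SDK exposes a batches.list() method.
production_batches = [
    b for b in client.batches.list()
    if b.metadata and b.metadata.get("environment") == "production"
]

for b in production_batches:
    print(b.id, b.status, b.metadata.get("dataset"))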

Handling Large Datasets

Need to process more than 50,000 items? Split them across multiple batches:
import json

def split_into_batches(items, batch_size=50000):
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def create_batch_file(requests, filename):
    # Write one chunk of request dicts to a JSONL file.
    with open(filename, "w", encoding="utf-8", newline="\n") as f:
        for request in requests:
            f.write(json.dumps(request) + "\n")
    return filename

# all_items: your full list of request dicts, built as in step 1
for i, chunk in enumerate(split_into_batches(all_items)):
    create_batch_file(chunk, f"batch_{i}.jsonl")
    # Upload with client.files.create(...), then submit with client.batches.create(...)
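Keep the limit of 10 concurrent active batches in mind: if your data splits into more than ten chunks, submit them in waves and wait for earlier batches to finish before starting new ones.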