This feature is in Private Preview. For access or more information, contact us or reach out to your account representative.
- Evaluation pipelines: Test model performance across thousands of test cases
- Data labeling: Classify or annotate large datasets for training
- Content generation: Create product descriptions, summaries, or translations in bulk
- Research and analysis: Process scientific data or run experiments at scale
How It Works
The basic workflow has four steps:

- Prepare a JSONL file containing all your requests
- Upload the file using the Files API
- Submit a batch job referencing your uploaded file
- Download results once processing completes
Create a Batch Request
Step 1: Prepare your input file
Start by creating a .jsonl file where each line represents one API request. Every request needs a unique custom_id so you can match inputs to outputs later. The available endpoint is currently /v1/chat/completions. Each line contains the same parameters you'd use in a regular chat completion request, just wrapped in the batch format.

Important constraints:

- Maximum 200 MB file size
- Minimum 10 requests
- Up to 50,000 requests per file
- Each line limited to 1 MB
- UTF-8 encoding with LF line endings
- All requests must use the same model
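Putting those constraints together, here is a minimal sketch of preparing an input file. The model name and prompts are placeholders, the batch wrapper fields assume an OpenAI-compatible request shape, and ten requests are generated to satisfy the minimum:

```python
import json

# Build ten requests (the minimum per file). The model name, prompts,
# and exact wrapper fields are illustrative placeholders assuming an
# OpenAI-compatible /v1/chat/completions request shape.
requests_to_run = [
    {
        "custom_id": f"request-{i}",      # unique ID to match outputs to inputs later
        "method": "POST",
        "url": "/v1/chat/completions",    # the only endpoint currently available
        "body": {
            "model": "example-model",     # all requests must use the same model
            "messages": [
                {"role": "user", "content": f"Summarize document {i}."}
            ],
        },
    }
    for i in range(10)
]

# One JSON object per line, UTF-8 encoded with LF line endings.
with open("batch_input.jsonl", "w", encoding="utf-8", newline="\n") as f:
    for req in requests_to_run:
        f.write(json.dumps(req) + "\n")
```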
Step 2: Upload your file
Use the Files API to upload your prepared input file:
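A hedged sketch of the upload using only the standard library, assuming an OpenAI-compatible `POST /v1/files` endpoint that takes a `purpose=batch` form field; the base URL and key are placeholders:

```python
import urllib.request

BASE_URL = "https://api.example.com"   # placeholder: your provider's base URL
API_KEY = "YOUR_API_KEY"               # placeholder

def build_upload_request(path: str) -> urllib.request.Request:
    """Build a multipart/form-data POST uploading a JSONL file with purpose=batch."""
    boundary = "----batch-upload-boundary"
    with open(path, "rb") as f:
        file_bytes = f.read()
    body = (
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="purpose"\r\n\r\n'
            "batch\r\n"
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{path}"\r\n'
            "Content-Type: application/jsonl\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + f"\r\n--{boundary}--\r\n".encode("utf-8")
    )
    return urllib.request.Request(
        f"{BASE_URL}/v1/files",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )

# To actually send (needs network access and a real key):
# with urllib.request.urlopen(build_upload_request("batch_input.jsonl")) as resp:
#     uploaded = resp.read()  # response should include the file id used in the next step
```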
Step 3: Start the batch job
Once your file is uploaded, create the batch job. The batch starts in queued status, then moves to in_progress, finalizing, and finally completed.
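The job-creation call can be sketched like this. The `/v1/batches` path and the `input_file_id` and `completion_window` field names are assumptions based on OpenAI-compatible conventions, not details confirmed by this preview:

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"   # placeholder
API_KEY = "YOUR_API_KEY"               # placeholder

def build_create_batch_request(input_file_id: str) -> urllib.request.Request:
    """Build a POST that starts a batch job referencing the uploaded file."""
    payload = {
        "input_file_id": input_file_id,       # id returned by the Files API upload
        "endpoint": "/v1/chat/completions",   # the only endpoint currently available
        "completion_window": "24h",           # assumed field, matching the 24-hour window
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To send: urllib.request.urlopen(build_create_batch_request(file_id))
```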
Step 4: Track progress
Check on your batch job anytime to see how it's progressing. The response includes detailed request count metrics showing how many requests have completed and failed.
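Polling might look like the sketch below; the `request_counts` field name is an assumption, so adjust it to the actual response shape:

```python
import urllib.request

BASE_URL = "https://api.example.com"   # placeholder
API_KEY = "YOUR_API_KEY"               # placeholder

def build_get_batch_request(batch_id: str) -> urllib.request.Request:
    """Build a GET for the batch job's current state."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/batches/{batch_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )

def summarize_progress(batch: dict) -> str:
    """Render a one-line progress summary from a batch status response."""
    counts = batch.get("request_counts", {})   # assumed field name
    return (
        f"{batch['status']}: {counts.get('completed', 0)}/{counts.get('total', 0)} "
        f"completed, {counts.get('failed', 0)} failed"
    )
```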
Step 5: Get your results
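Once the job reports completed, parsing the downloaded results file can be sketched as follows; the per-line `response`/`error` shape is an assumption based on common batch-output conventions:

```python
import json

def split_results(results_jsonl: str):
    """Split a results file into successes and failures, keyed by custom_id."""
    successes, failures = {}, {}
    for line in results_jsonl.splitlines():
        row = json.loads(line)
        # Assumed row shape: {"custom_id": ..., "response": {...} or null, "error": {...} or null}
        if row.get("error"):
            failures[row["custom_id"]] = row["error"]
        else:
            successes[row["custom_id"]] = row["response"]
    return successes, failures
```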
When the status shows completed, download your results. Your results file contains one line per request: successful requests include the full completion response, while failed requests include error details.

Batch States
Your batch progresses through these states:

| State | What's happening |
|---|---|
| queued | Waiting for processing capacity |
| in_progress | Actively processing requests |
| finalizing | Preparing output file |
| completed | All done - results ready to download |
| failed | System error prevented completion |
| expired | Exceeded 24-hour window |
| cancelled | You stopped the batch |
| cancelling | Cancellation in progress |
Expired Batches
If your batch doesn't finish within 24 hours, unprocessed requests are marked as expired. You'll still get results for any completed requests, and expired requests appear in your output file with an error instead of a response.

Cancel a Batch
Use the cancel endpoint to stop a running batch.

Limits and Quotas
Batch size limits
- 50,000 requests maximum per batch
- 200 MB maximum file size
- 1 MB maximum per request line
Account limits
- 10 concurrent active batches
- Configurable rate limits separate from real-time API
Storage
- Results retained for 7 days after completion
- Automatic deletion after expiration (download promptly!)
Organize Batches
Use metadata to keep track of different batch runs. Metadata is a set of key-value pairs that you can attach to a batch job for your own organizational purposes, for example to track which environment, dataset, or version a batch belongs to.

- Environment tracking: Tag batches with production, staging, or development
- Dataset identification: Link batches to specific datasets or experiments
- Version control: Track which version of your prompts or models you're testing
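For illustration, metadata could be attached at batch-creation time like this; the payload fields other than `metadata` follow the assumed OpenAI-compatible shape from earlier, and the key names are entirely your own:

```python
# Hypothetical batch-creation payload with organizational metadata attached.
payload = {
    "input_file_id": "file-abc123",       # id returned by the Files API
    "endpoint": "/v1/chat/completions",
    "metadata": {                          # free-form key-value pairs, chosen by you
        "environment": "staging",
        "dataset": "eval-set-v3",
        "prompt_version": "2025-01",
    },
}
```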

