How are rate limits measured?
We measure rate limits in requests sent and tokens used within a specified timeframe:

- Requests per minute/hour/day (RPM, RPH, RPD)
- Tokens per minute/hour/day (TPM, TPH, TPD)
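Because a request counts against both a request limit and a token limit, a client has to check the two together. The following is a minimal client-side sketch of that check for the per-minute limits. It assumes a sliding 60-second window; the server's actual windowing scheme is not described here, and the class `MinuteBudget` is a hypothetical name.

```python
from collections import deque


class MinuteBudget:
    """Hypothetical client-side tracker for one RPM limit and one TPM limit.

    Assumes a sliding 60-second window; the server's real windowing
    scheme is not specified in this document.
    """

    def __init__(self, rpm: int, tpm: int, window: float = 60.0) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self._events: deque[tuple[float, int]] = deque()  # (timestamp, tokens)

    def _evict(self, now: float) -> None:
        # Forget requests that are older than the window.
        while self._events and now - self._events[0][0] >= self.window:
            self._events.popleft()

    def can_send(self, tokens: int, now: float) -> bool:
        # A request fits only if BOTH the request count and the token total stay within limits.
        self._evict(now)
        used_tokens = sum(t for _, t in self._events)
        return len(self._events) < self.rpm and used_tokens + tokens <= self.tpm

    def record(self, tokens: int, now: float) -> None:
        # Call after sending, with the number of tokens the request consumed.
        self._evict(now)
        self._events.append((now, tokens))


# Usage with the gpt-oss-120b per-minute limits from the table below.
budget = MinuteBudget(rpm=30, tpm=60_000)
budget.record(50_000, now=0.0)
print(budget.can_send(20_000, now=10.0))  # False: 70K tokens would exceed 60K TPM
print(budget.can_send(20_000, now=61.0))  # True: the first request has left the window
```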
Rate Limits
The limits below are general; the limits that apply to your organization may differ. For precise, up-to-date rate limit information, check the Limits section of your account. Limits differ by tier (Free and Developer).

Model | TPM | TPH | TPD | RPM | RPH | RPD |
---|---|---|---|---|---|---|
gpt-oss-120b | 60K | 1M | 1M | 30 | 90 | 14.4K |
llama3.1-8b | 60K | 1M | 1M | 30 | 90 | 14.4K |
llama-3.3-70b | 60K | 1M | 1M | 30 | 90 | 14.4K |
qwen-3-32b | 60K | 1M | 1M | 30 | 90 | 14.4K |
qwen-3-235b-a22b-instruct-2507 | 60K | 1M | 1M | 30 | 90 | 14.4K |
qwen-3-235b-a22b-thinking-2507 | 60K | 1M | 1M | 30 | 90 | 14.4K |
qwen-3-coder-480b | 150K | 1M | 1M | 10 | 100 | 100 |
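Since both per-minute limits apply at once, the request rate you can actually sustain is set by whichever runs out first: with an average of T tokens per request, that is min(RPM, floor(TPM / T)). The sketch below copies the RPM and TPM columns from the table; the average token count is a workload figure you supply, and the function name is hypothetical. It ignores the hourly and daily columns, which can become the binding limit over longer runs.

```python
# Per-minute limits copied from the table above: model -> (RPM, TPM).
LIMITS_PER_MINUTE = {
    "gpt-oss-120b": (30, 60_000),
    "llama3.1-8b": (30, 60_000),
    "llama-3.3-70b": (30, 60_000),
    "qwen-3-32b": (30, 60_000),
    "qwen-3-235b-a22b-instruct-2507": (30, 60_000),
    "qwen-3-235b-a22b-thinking-2507": (30, 60_000),
    "qwen-3-coder-480b": (10, 150_000),
}


def max_requests_per_minute(model: str, avg_tokens_per_request: int) -> tuple[int, str]:
    """Return the sustainable requests/minute and which per-minute limit binds first."""
    rpm, tpm = LIMITS_PER_MINUTE[model]
    token_bound = tpm // avg_tokens_per_request  # requests that fit in the token budget
    if token_bound < rpm:
        return token_bound, "TPM"
    return rpm, "RPM"


print(max_requests_per_minute("gpt-oss-120b", 3_000))       # (20, 'TPM')
print(max_requests_per_minute("qwen-3-coder-480b", 3_000))  # (10, 'RPM')
```

For short requests the request limit binds; for long prompts or outputs the token limit usually binds first.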
Rate Limit Headers
To help you monitor your usage in real time, we inject several custom headers into every API response. These headers show your current usage and when your limits will reset. You’ll find the following headers in the response:

Header | Description |
---|---|
x-ratelimit-limit-requests-day | Maximum number of requests allowed per day. |
x-ratelimit-limit-tokens-minute | Maximum number of tokens allowed per minute. |
x-ratelimit-remaining-requests-day | Number of requests remaining for the current day. |
x-ratelimit-remaining-tokens-minute | Number of tokens remaining for the current minute. |
x-ratelimit-reset-requests-day | Time (in seconds) until your daily request limit resets. |
x-ratelimit-reset-tokens-minute | Time (in seconds) until your per-minute token limit resets. |
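To act on these headers in code, you can read them into a small structure. This minimal sketch assumes the values arrive as plain numeric strings (integers for limits and remaining counts, possibly fractional seconds for resets). `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names, and the sample values are illustrative, not real API output.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RateLimitStatus:
    requests_day_limit: int
    requests_day_remaining: int
    requests_day_reset_s: float
    tokens_minute_limit: int
    tokens_minute_remaining: int
    tokens_minute_reset_s: float


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize them before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    return RateLimitStatus(
        requests_day_limit=int(h["x-ratelimit-limit-requests-day"]),
        requests_day_remaining=int(h["x-ratelimit-remaining-requests-day"]),
        requests_day_reset_s=float(h["x-ratelimit-reset-requests-day"]),
        tokens_minute_limit=int(h["x-ratelimit-limit-tokens-minute"]),
        tokens_minute_remaining=int(h["x-ratelimit-remaining-tokens-minute"]),
        tokens_minute_reset_s=float(h["x-ratelimit-reset-tokens-minute"]),
    )


# Illustrative header values only.
sample = {
    "X-RateLimit-Limit-Requests-Day": "14400",
    "x-ratelimit-remaining-requests-day": "14350",
    "x-ratelimit-reset-requests-day": "33011.5",
    "x-ratelimit-limit-tokens-minute": "60000",
    "x-ratelimit-remaining-tokens-minute": "58000",
    "x-ratelimit-reset-tokens-minute": "11.2",
}
status = parse_rate_limit_headers(sample)
print(status.tokens_minute_remaining)  # 58000
```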
Example
You can view these headers by adding the `--verbose` flag to a cURL request.
Notes
- The `reset` headers are measured in seconds.
- If you exceed your rate limits, you will receive a `429 Too Many Requests` error.
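One way to handle a 429 is to wait and retry. The sketch below encodes an assumed policy, not documented server behavior. If the daily request budget is exhausted, the delay is the daily reset time, which normally exceeds `max_wait`, so the loop returns the 429 instead of sleeping for hours. Otherwise it waits for the longer of the per-minute token reset and a capped exponential backoff; the backoff also covers limits that have no documented header, such as requests per minute. `send` is a hypothetical zero-argument callable that wraps your HTTP client and returns `(status, headers, body)`.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (HTTP status, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    h = {k.lower(): v for k, v in headers.items()}
    backoff = min(cap, base * 2 ** attempt)
    # Daily request budget spent: nothing succeeds until the daily reset.
    if (float(h.get("x-ratelimit-remaining-requests-day", "1")) <= 0
            and "x-ratelimit-reset-requests-day" in h):
        return float(h["x-ratelimit-reset-requests-day"])
    # Otherwise wait for the per-minute token window, but never less than the
    # backoff, which also covers limits without a documented header (e.g. RPM).
    if "x-ratelimit-reset-tokens-minute" in h:
        return max(float(h["x-ratelimit-reset-tokens-minute"]), backoff)
    return backoff


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      max_wait: float = 120.0,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-429 status or retrying stops making sense."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response[0] != 429:
            return response
        delay = retry_delay(response[1], attempt)
        if delay > max_wait or attempt == max_attempts - 1:
            break  # e.g. daily budget spent: return the 429 instead of sleeping for hours
        sleep(delay)
    return response
```

Passing `sleep` as a parameter keeps the policy testable without real waiting.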