How are rate limits measured?
We measure rate limits in requests sent and tokens used within a specified timeframe:
- Requests per minute/hour/day (RPM, RPH, RPD)
- Tokens per minute/hour/day (TPM, TPH, TPD)
Token Rate Limiting
When you send a request, we estimate the total tokens it will consume by:
- Estimating the input tokens in your prompt
- Adding either the `max_completion_tokens` parameter or the maximum sequence length (MSL) minus the input tokens
Set `max_completion_tokens` appropriately for your use case to avoid overestimating token usage and triggering unnecessary rate limits.
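The estimate described above can be sketched as follows. This is a minimal illustration, not the service's implementation: it assumes `max_completion_tokens` is used when you provide it and the MSL-based budget is used otherwise, and it takes the input-token count as an already-known integer. The function name and the 8,192-token MSL in the usage line are hypothetical.

```python
from typing import Optional


def estimate_request_tokens(
    input_tokens: int,
    max_completion_tokens: Optional[int],
    max_sequence_length: int,
) -> int:
    """Hypothetical helper: tokens a request is counted against up front.

    Assumption: max_completion_tokens is used when provided; otherwise the
    output budget falls back to the MSL minus the input tokens.
    """
    if max_completion_tokens is not None:
        output_budget = max_completion_tokens
    else:
        output_budget = max_sequence_length - input_tokens
    return input_tokens + output_budget


# With a 500-token prompt and a hypothetical 8,192-token MSL:
print(estimate_request_tokens(500, 200, 8192))   # explicit completion budget
print(estimate_request_tokens(500, None, 8192))  # falls back to the MSL
```

Under these assumptions, the first call counts 700 tokens and the second counts the full 8,192, which is why an explicit `max_completion_tokens` keeps the estimate close to what the request actually uses.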
Quota Replenishment
Your quota is calculated as:
Limits by Tier
The following provides an overview of general limits; specific cases may vary. For precise, up-to-date rate limit information for your organization, check the Limits section of your account.
Limits are set per tier:
- Free
- Developer
| Model | TPM | TPH | TPD | RPM | RPH | RPD |
|---|---|---|---|---|---|---|
| gpt-oss-120b | 60K | 1M | 1M | 30 | 900 | 14.4K |
| llama3.1-8b | 60K | 1M | 1M | 30 | 900 | 14.4K |
| llama-3.3-70b | 60K | 1M | 1M | 30 | 900 | 14.4K |
| qwen-3-32b | 60K | 1M | 1M | 30 | 900 | 14.4K |
| qwen-3-235b-a22b-instruct-2507 | 60K | 1M | 1M | 30 | 900 | 14.4K |
| zai-glm-4.6 | 150K | 1M | 1M | 10 | 100 | 100 |
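As a worked example of how the up-front token estimate interacts with the table, the sketch below computes an upper bound on requests per minute for a model: whichever of the TPM or RPM limit is reached first. It assumes "60K" and "150K" mean 60,000 and 150,000 tokens, that the per-minute token limit is charged with the estimate described under Token Rate Limiting, and that the hourly and daily limits are not the binding constraint. The helper name and the 8,192-token request size below are hypothetical.

```python
# Per-minute limits copied from the table above
# (assumption: "60K" = 60,000 tokens, "150K" = 150,000 tokens).
LIMITS_PER_MINUTE = {
    "gpt-oss-120b": {"tpm": 60_000, "rpm": 30},
    "llama3.1-8b": {"tpm": 60_000, "rpm": 30},
    "llama-3.3-70b": {"tpm": 60_000, "rpm": 30},
    "qwen-3-32b": {"tpm": 60_000, "rpm": 30},
    "qwen-3-235b-a22b-instruct-2507": {"tpm": 60_000, "rpm": 30},
    "zai-glm-4.6": {"tpm": 150_000, "rpm": 10},
}


def max_requests_per_minute(model: str, estimated_tokens_per_request: int) -> int:
    """Hypothetical helper: requests per minute allowed before TPM or RPM binds."""
    limits = LIMITS_PER_MINUTE[model]
    # Number of whole requests that fit in the per-minute token budget.
    by_tokens = limits["tpm"] // estimated_tokens_per_request
    return min(limits["rpm"], by_tokens)


print(max_requests_per_minute("gpt-oss-120b", 8192))  # token limit binds
print(max_requests_per_minute("gpt-oss-120b", 700))   # request limit binds
```

Under these assumptions, requests estimated at 8,192 tokens allow only 7 per minute on gpt-oss-120b, while requests estimated at 700 tokens hit the 30 RPM cap first.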
Rate Limit Headers
To help you monitor your usage in real time, we inject several custom headers into every API response. These headers show your current usage and when your limits will reset. You'll find the following headers in the response:
| Header | Description |
|---|---|
| `x-ratelimit-limit-requests-day` | Maximum number of requests allowed per day. |
| `x-ratelimit-limit-tokens-minute` | Maximum number of tokens allowed per minute. |
| `x-ratelimit-remaining-requests-day` | Number of requests remaining for the current day. |
| `x-ratelimit-remaining-tokens-minute` | Number of tokens remaining for the current minute. |
| `x-ratelimit-reset-requests-day` | Time (in seconds) until your daily request limit resets. |
| `x-ratelimit-reset-tokens-minute` | Time (in seconds) until your per-minute token limit resets. |
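The headers above can be collected into a single object on the client side. The sketch below is one possible way to do that, not an official SDK: `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names, counts are assumed to be integers, and reset times are parsed as possibly fractional seconds. Missing headers become `None`.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


@dataclass
class RateLimitStatus:
    limit_requests_day: Optional[int]
    limit_tokens_minute: Optional[int]
    remaining_requests_day: Optional[int]
    remaining_tokens_minute: Optional[int]
    reset_requests_day_s: Optional[float]
    reset_tokens_minute_s: Optional[float]


def _get(h: Mapping[str, str], name: str, conv: Callable[[str], T]) -> Optional[T]:
    # Convert the header value if present; otherwise report it as missing.
    value = h.get(name)
    return conv(value) if value is not None else None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}
    return RateLimitStatus(
        limit_requests_day=_get(h, "x-ratelimit-limit-requests-day", int),
        limit_tokens_minute=_get(h, "x-ratelimit-limit-tokens-minute", int),
        remaining_requests_day=_get(h, "x-ratelimit-remaining-requests-day", int),
        remaining_tokens_minute=_get(h, "x-ratelimit-remaining-tokens-minute", int),
        reset_requests_day_s=_get(h, "x-ratelimit-reset-requests-day", float),
        reset_tokens_minute_s=_get(h, "x-ratelimit-reset-tokens-minute", float),
    )
```

Any mapping of header names to string values works as input; for example, the `headers` attribute of a `requests` response is a case-insensitive mapping whose items can be passed in directly.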
Example
You can view these headers by adding the `--verbose` flag to a cURL request.
Notes
If you exceed your rate limits, you will receive a 429 Too Many Requests error.
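A client can react to a 429 by waiting before it retries. The sketch below is one possible approach, not documented client behavior: if the headers report an exhausted per-minute token or daily request counter, it waits for the matching reset time; otherwise it falls back to exponential backoff. The `send` callable, the `send_with_retry` and `retry_delay` helpers, and the backoff values are hypothetical; `send` is assumed to return a `(status, headers, body)` tuple.

```python
import time
from typing import Callable, Mapping, Tuple

# (status_code, headers, body) as returned by a hypothetical transport function.
Response = Tuple[int, Mapping[str, str], str]


def retry_delay(headers: Mapping[str, str], attempt: int, base: float = 1.0) -> float:
    """Seconds to wait after a 429 response."""
    h = {k.lower(): v for k, v in headers.items()}
    waits = []
    # Only honor a reset header when its matching "remaining" counter is exhausted.
    for remaining_key, reset_key in (
        ("x-ratelimit-remaining-tokens-minute", "x-ratelimit-reset-tokens-minute"),
        ("x-ratelimit-remaining-requests-day", "x-ratelimit-reset-requests-day"),
    ):
        if remaining_key in h and reset_key in h and float(h[remaining_key]) <= 0:
            waits.append(float(h[reset_key]))
    if waits:
        # If several limits are exhausted, wait until all of them have reset.
        return max(waits)
    # Fallback when the headers don't explain the 429: 1s, 2s, 4s, ...
    return base * (2 ** attempt)


def send_with_retry(
    send: Callable[[], Response],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    attempt = 0
    while True:
        status, headers, body = send()
        if status != 429 or attempt >= max_retries:
            # Success, a non-rate-limit error, or out of retries: hand it back.
            return status, headers, body
        sleep(retry_delay(headers, attempt))
        attempt += 1
```

A daily reset can be hours away, so in practice you may prefer to cap the wait and surface the error instead of sleeping until the next day.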

