Rate limits ensure fair usage and system stability by regulating how often users and applications can access our API within a specified timeframe. They protect the service from abuse or misuse and keep access fair and responsive for everyone.

How are rate limits measured?

We measure rate limits in requests sent and tokens used within a specified timeframe:
  • Requests per minute/hour/day (RPM, RPH, RPD)
  • Tokens per minute/hour/day (TPM, TPH, TPD)
Rate limiting is triggered by whichever limit you reach first. For example, suppose you have a rate limit of 50 RPM and 200K TPM. If you submit 50 requests in one minute with just 100 tokens each, you'll hit your request limit even though your total token usage (5,000) is far below the 200K token threshold. Rate limits apply at the organization level, not the user level, and vary based on the model.
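The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the "whichever comes first" rule; the function name `limits_hit` and its parameters are hypothetical and not part of the API.

```python
def limits_hit(requests, tokens_per_request, rpm_limit, tpm_limit):
    """Return which per-minute limits a one-minute workload reaches.

    Hypothetical helper for illustration only; the service enforces
    the real limits on the server side.
    """
    hit = []
    if requests >= rpm_limit:
        hit.append("RPM")
    if requests * tokens_per_request >= tpm_limit:
        hit.append("TPM")
    return hit


# The example from the text: 50 requests of 100 tokens each
# against limits of 50 RPM and 200K TPM.
print(limits_hit(50, 100, rpm_limit=50, tpm_limit=200_000))  # ['RPM']
```

Here the request limit is reached at 5,000 tokens, long before the 200K token limit would matter.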

Rate Limits

This section provides an overview of general limits, though specific cases may vary. For precise, up-to-date rate limit information applicable to your organization, check the Limits section within your account.
  • Free
  • Developer
Model                            TPM    TPH   TPD   RPM   RPH   RPD
gpt-oss-120b                     60K    1M    1M    30    900   14.4K
llama3.1-8b                      60K    1M    1M    30    900   14.4K
llama-3.3-70b                    60K    1M    1M    30    900   14.4K
qwen-3-32b                       60K    1M    1M    30    900   14.4K
qwen-3-235b-a22b-instruct-2507   60K    1M    1M    30    900   14.4K
qwen-3-235b-a22b-thinking-2507   60K    1M    1M    30    900   14.4K
qwen-3-coder-480b                150K   1M    1M    10    100   100

Rate Limit Headers

To help you monitor your usage in real time, we inject several custom headers into every API response. These headers provide insight into your current usage and when your limits will reset. You’ll find the following headers in the response:
Header                                Description
x-ratelimit-limit-requests-day        Maximum number of requests allowed per day.
x-ratelimit-limit-tokens-minute       Maximum number of tokens allowed per minute.
x-ratelimit-remaining-requests-day    Number of requests remaining for the current day.
x-ratelimit-remaining-tokens-minute   Number of tokens remaining for the current minute.
x-ratelimit-reset-requests-day        Time (in seconds) until your daily request limit resets.
x-ratelimit-reset-tokens-minute       Time (in seconds) until your per-minute token limit resets.
These values update with each API call, giving you immediate visibility into your current usage.
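One way a client might use these headers is to decide how long to pause before sending its next request. The sketch below is a minimal illustration, not an official client: the function `seconds_to_wait` and its `tokens_needed` parameter are hypothetical, and it assumes the headers have already been collected into a dictionary with lowercase names.

```python
def seconds_to_wait(headers, tokens_needed=1):
    """Return how many seconds to pause before the next request.

    `headers` is assumed to be a dict of lowercase header names to
    string values. `tokens_needed` is a caller-supplied estimate of
    the tokens the next request will use (a hypothetical parameter).
    """
    remaining_requests = float(headers["x-ratelimit-remaining-requests-day"])
    remaining_tokens = float(headers["x-ratelimit-remaining-tokens-minute"])

    wait = 0.0
    if remaining_requests < 1:
        # Daily request budget is used up: wait for the daily reset.
        wait = max(wait, float(headers["x-ratelimit-reset-requests-day"]))
    if remaining_tokens < tokens_needed:
        # Not enough tokens left this minute: wait for the minute reset.
        wait = max(wait, float(headers["x-ratelimit-reset-tokens-minute"]))
    return wait
```

A caller could pass the result to `time.sleep()` before retrying. Taking the maximum of the two reset times is a conservative choice, since both budgets must be available again before a request can succeed.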

Example

You can view these headers by adding the --verbose flag to a cURL request:
curl --location 'https://api.cerebras.ai/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
--data '{
  "model": "llama3.1-8b",
  "stream": false,
  "messages": [{"content": "Hello!", "role": "user"}],
  "temperature": 0,
  "max_completion_tokens": -1,
  "seed": 0,
  "top_p": 1
}' \
--verbose
In the response, look for headers like these:
x-ratelimit-limit-requests-day: 1000000000
x-ratelimit-limit-tokens-minute: 1000000000
x-ratelimit-remaining-requests-day: 999997455
x-ratelimit-remaining-tokens-minute: 999998298
x-ratelimit-reset-requests-day: 33011.382867097855
x-ratelimit-reset-tokens-minute: 11.382867097854614
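To work with these values programmatically, the header lines can be parsed into numbers. The following is a minimal sketch that assumes the headers are available as plain text, for example copied from the verbose cURL output (where response header lines are prefixed with `<`). The function name `parse_ratelimit_headers` is hypothetical.

```python
def parse_ratelimit_headers(raw):
    """Parse x-ratelimit-* header lines into a dict of floats.

    Accepts plain "name: value" lines and tolerates the "< " prefix
    that curl --verbose puts before response headers. Lines that are
    not rate limit headers are ignored.
    """
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("<"):
            line = line[1:].strip()
        name, sep, value = line.partition(":")
        name = name.strip().lower()
        if sep and name.startswith("x-ratelimit-"):
            headers[name] = float(value.strip())
    return headers


# The sample values shown above.
sample = """\
x-ratelimit-limit-requests-day: 1000000000
x-ratelimit-limit-tokens-minute: 1000000000
x-ratelimit-remaining-requests-day: 999997455
x-ratelimit-remaining-tokens-minute: 999998298
x-ratelimit-reset-requests-day: 33011.382867097855
x-ratelimit-reset-tokens-minute: 11.382867097854614
"""

parsed = parse_ratelimit_headers(sample)
requests_used = (parsed["x-ratelimit-limit-requests-day"]
                 - parsed["x-ratelimit-remaining-requests-day"])
print(int(requests_used))  # 2545
```

Subtracting the remaining value from the limit gives the usage so far in the current window, here 2,545 requests for the day.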

Notes

If you have questions about your usage or need higher rate limits, contact us via our website, or reach out to your account representative.