We enforce rate limits to ensure fair usage and system stability. These limits apply per API key and reset periodically based on the type of limit (daily or per-minute).

To help you monitor your usage in real time, we inject several custom headers into every API response. These headers provide insight into your current usage and when your limits will reset.

Rate Limit Headers

You’ll find the following headers in the response:

Header                                Description
x-ratelimit-limit-requests-day        Maximum number of requests allowed per day.
x-ratelimit-limit-tokens-minute       Maximum number of tokens allowed per minute.
x-ratelimit-remaining-requests-day    Number of requests remaining for the current day.
x-ratelimit-remaining-tokens-minute   Number of tokens remaining for the current minute.
x-ratelimit-reset-requests-day        Time (in seconds) until your daily request limit resets.
x-ratelimit-reset-tokens-minute       Time (in seconds) until your per-minute token limit resets.

These values update with each API call, giving you immediate visibility into your current usage.
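Because the headers arrive with every response, a client can collect them into one structure after each call. The following is a minimal Python sketch of that idea, not an official client. The function name parse_rate_limit_headers and the nested-dictionary layout are illustrative assumptions, as is the choice to parse every value as a float. Only the header names come from the table above.

```python
from typing import Dict, Mapping


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Dict[str, Dict[str, float]]:
    """Group the x-ratelimit-* headers by scope (hypothetical helper).

    Returns a dict such as:
    {"requests-day": {"limit": ..., "remaining": ..., "reset": ...}, ...}
    Headers that are missing from the response are simply skipped.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    usage: Dict[str, Dict[str, float]] = {}
    for scope in ("requests-day", "tokens-minute"):
        entry: Dict[str, float] = {}
        for field in ("limit", "remaining", "reset"):
            raw = lowered.get(f"x-ratelimit-{field}-{scope}")
            if raw is not None:
                # Assumption: every value is numeric (reset is in seconds).
                entry[field] = float(raw)
        if entry:
            usage[scope] = entry
    return usage
```

Any mapping of header names to string values can be passed in, for example the headers attribute of a response object from an HTTP library.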

Example

You can view these headers by adding the --verbose flag to a cURL request. Here’s an example:

curl --location 'https://api.cerebras.ai/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
--data '{
  "model": "llama3.1-8b",
  "stream": false,
  "messages": [{"content": "Hello!", "role": "user"}],
  "temperature": 0,
  "max_completion_tokens": -1,
  "seed": 0,
  "top_p": 1
}' \
--verbose

In the response, look for headers like these:

x-ratelimit-limit-requests-day: 1000000000
x-ratelimit-limit-tokens-minute: 1000000000
x-ratelimit-remaining-requests-day: 999997455
x-ratelimit-remaining-tokens-minute: 999998298
x-ratelimit-reset-requests-day: 33011.382867097855
x-ratelimit-reset-tokens-minute: 11.382867097854614
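One way to act on these values is to pause when a budget is exhausted and resume once the matching reset interval has passed. The Python sketch below illustrates that approach under stated assumptions. The helper name seconds_until_allowed and the tokens_needed parameter are hypothetical, and treating a missing header as "no limit known" is a design choice of this example rather than documented API behavior.

```python
from typing import Mapping


def seconds_until_allowed(headers: Mapping[str, str], tokens_needed: int = 1) -> float:
    """Return how long to pause before the next request, or 0.0 if no pause is needed.

    Hypothetical helper: compares the remaining budgets with what the next
    request needs and falls back to the matching reset header when short.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    wait = 0.0

    # Daily request budget: at least one request must remain.
    if float(lowered.get("x-ratelimit-remaining-requests-day", "inf")) < 1:
        wait = max(wait, float(lowered.get("x-ratelimit-reset-requests-day", 0)))

    # Per-minute token budget: enough tokens must remain for the next call.
    if float(lowered.get("x-ratelimit-remaining-tokens-minute", "inf")) < tokens_needed:
        wait = max(wait, float(lowered.get("x-ratelimit-reset-tokens-minute", 0)))

    return wait
```

With the sample headers shown above, both budgets are far from exhausted, so the function returns 0.0. If x-ratelimit-remaining-tokens-minute were 0, it would return the value of x-ratelimit-reset-tokens-minute, which the caller could pass to time.sleep before retrying.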

Notes

If you have questions about your usage or need higher rate limits, contact us through our website or reach out to your account representative.
