This feature is in Private Preview. For access or more information, contact us or reach out to your account representative.
Service Tiers
Service tiers determine the processing priority of your requests. You can specify a tier using the service_tier parameter in your API requests.
| Tier | Description |
|---|---|
| priority ¹ | Highest priority - requests are processed first. Use for time-critical, user-facing requests that require immediate processing. Only available for dedicated endpoints, not shared endpoints. |
| default | Standard priority processing. Use for standard production workloads with normal latency requirements. |
| auto | Automatically uses the highest available service tier. Use when you want to maximize requests served while allowing flexibility in processing priority. |
| flex | Lowest priority - requests are processed towards the end. Use for overflow requests that cannot fit in higher service tier rate limits or for experiments. |
If no service_tier is specified, requests default to the default tier.
¹ The priority tier requires a dedicated endpoint. If interested, contact your account representative for more information.
Usage
Add the service_tier parameter to your chat completions request to specify the priority level.
When service_tier is set to auto, the response will include a service_tier_used field that indicates the effective service tier used for processing.
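As a rough illustration of the usage described above, the sketch below builds a chat completions request that carries the service_tier parameter and reads service_tier_used back from a parsed response. It uses only the Python standard library. The endpoint URL, the model name, the helper names build_chat_request and effective_tier, and the assumption that service_tier_used is a top-level field of the response body are illustrative and not confirmed by this page.

```python
import json
import urllib.request

# Assumed endpoint; check the API reference for the URL that applies to you.
API_URL = "https://api.cerebras.ai/v1/chat/completions"

# Tier names taken from the table above.
VALID_TIERS = {"priority", "default", "auto", "flex"}


def build_chat_request(api_key, messages, service_tier="default", model="llama3.1-8b"):
    """Build (but do not send) a chat completions request with a service tier.

    The model name is a placeholder; substitute one available to your account.
    """
    if service_tier not in VALID_TIERS:
        raise ValueError(f"unknown service tier: {service_tier!r}")
    payload = {
        "model": model,
        "messages": messages,
        "service_tier": service_tier,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def effective_tier(response_body):
    """Return the tier actually used, if the response reports one.

    Assumes service_tier_used sits at the top level of the JSON body.
    """
    return response_body.get("service_tier_used")
```

To send the request, pass the returned object to urllib.request.urlopen and parse the body with json.load before calling effective_tier on it.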
Queue Threshold Control
Only applies to requests using the flex or auto service tiers.
The queue_threshold header allows you to set a maximum acceptable queue time for flex tier requests. If the expected queue time exceeds your threshold, the request is preemptively rejected rather than waiting in the queue.
Valid range: 50-20000 milliseconds
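The range check can be done on the client before a request is sent. The sketch below is a minimal example of that idea: queue_threshold_headers is a hypothetical helper, it assumes both ends of the 50-20000 range are inclusive, and it assumes the header value is sent as a plain integer string of milliseconds.

```python
# Bounds from the valid range stated above, assumed inclusive.
QUEUE_THRESHOLD_MIN_MS = 50
QUEUE_THRESHOLD_MAX_MS = 20000


def queue_threshold_headers(threshold_ms):
    """Return a headers dict carrying queue_threshold, after validating it."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(threshold_ms, bool) or not isinstance(threshold_ms, int):
        raise TypeError("queue_threshold must be an integer number of milliseconds")
    if not QUEUE_THRESHOLD_MIN_MS <= threshold_ms <= QUEUE_THRESHOLD_MAX_MS:
        raise ValueError(
            f"queue_threshold must be between {QUEUE_THRESHOLD_MIN_MS} "
            f"and {QUEUE_THRESHOLD_MAX_MS} ms, got {threshold_ms}"
        )
    # Assumed encoding: the value is sent as a decimal string.
    return {"queue_threshold": str(threshold_ms)}
```

Merge the returned dict into the headers of a request that uses the flex or auto tier. How a rejected request is reported (status code, error body) is not described here, so the sketch leaves that handling to the caller.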
FAQ
How do rate limits apply across service tiers?
The priority and default tiers have the same rate limits. flex rate limits are tracked independently and are several times higher than the default rate limits.
Are priority, flex, or auto logged differently in usage tracking?
Yes. Log in to cloud.cerebras.ai and click Analytics. Graphs in the analytics tab display usage across different service tiers, allowing you to monitor consumption by priority level.
Are priority, flex, or auto billed differently than default?
No, during the preview launch all service tiers are billed equally.
Will my request ever be processed on a lower service tier if I do not set service_tier to auto?
No, only requests set to auto can be processed on a lower service tier.
Can I set queue time threshold on other service tiers?
The queue time threshold only applies once a request is being processed on the flex service tier. You can set it on requests using auto or flex, but it will only be evaluated if the request is processed on the flex tier.
