Retrieve metrics
Retrieve operational metrics for your organization’s inference endpoints in Prometheus format.
GET
See the Metrics guide for more info.
Path Parameters
The unique identifier for your organization (e.g., org_abc123).
Response
Returns metrics in Prometheus text-based exposition format.
Available Metrics
The following metrics are available on an opt-in basis. Contact your Cerebras account representative to enable specific metrics for your organization.
Endpoint Health
Status of the inference endpoint. Values:
-1 = Error calculating status
0 = Down
1 = Up
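The response body is plain Prometheus text, so a small parser is enough for ad-hoc checks such as reading the endpoint status values above. The following is a minimal sketch that handles simple, well-formed sample lines (it skips `# HELP` / `# TYPE` comments and ignores timestamps). The metric name `example_endpoint_status` and its `endpoint` label used in the usage note are hypothetical placeholders, because this page does not list the exported metric names. For production use, the `prometheus_client` package's `text_string_to_metric_families` parser is a more complete option.

```python
import math
import re
from typing import Dict, List, NamedTuple

# A sample line looks like: name{label="value",...} value [timestamp]
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)(?:\s+-?\d+)?\s*$"
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


class Sample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float


def _unescape(s: str) -> str:
    # Label values escape backslash, double quote, and newline (\n).
    return re.sub(r"\\(.)", lambda m: "\n" if m.group(1) == "n" else m.group(1), s)


def parse_exposition(text: str) -> List[Sample]:
    """Parse simple Prometheus text-format lines into samples (sketch, not the full spec)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines, HELP and TYPE comments
            continue
        m = _SAMPLE.match(line)
        if m is None:
            raise ValueError(f"unparseable line: {line!r}")
        labels = {k: _unescape(v) for k, v in _LABEL.findall(m.group("labels") or "")}
        # float() accepts "NaN", "+Inf" and "-Inf" as used by Prometheus.
        samples.append(Sample(m.group("name"), labels, float(m.group("value"))))
    return samples


# Status values as documented above.
HEALTH_STATES = {-1: "error calculating status", 0: "down", 1: "up"}


def describe_health(value: float) -> str:
    """Map an endpoint status value to a readable state; anything else is 'unknown'."""
    if math.isnan(value) or value not in HEALTH_STATES:
        return "unknown"
    return HEALTH_STATES[int(value)]
```

For example, parsing the line `example_endpoint_status{endpoint="demo"} 1` yields one sample whose value `describe_health` reports as `"up"`. Treating unexpected values as `"unknown"` rather than raising keeps an alerting script running if new status codes appear.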
Request Metrics
Total requests (all HTTP status codes) in the last complete minute
Total successful requests (HTTP 200) in the last complete minute
Total failed requests, broken down by HTTP status code, in the last complete minute
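All three request metrics cover the same one-minute window, so they can be combined into rates. A minimal sketch, assuming you have already extracted the per-minute values; the function names and the dict keyed by HTTP status code are illustrative, not part of the API:

```python
from typing import Dict, Mapping, Optional


def success_ratio(total: float, successful: float) -> Optional[float]:
    """Share of requests in the minute that returned HTTP 200; None when there was no traffic."""
    if total <= 0:
        return None
    return successful / total


def failure_shares(total: float, failed_by_code: Mapping[str, float]) -> Dict[str, float]:
    """Share of all requests in the minute that failed with each HTTP status code."""
    if total <= 0:
        return {}
    return {code: count / total for code, count in failed_by_code.items()}
```

With made-up counts of 200 total requests, 190 successful, and failures `{"429": 8, "500": 2}`, `success_ratio` returns `0.95` and `failure_shares` returns `{"429": 0.04, "500": 0.01}`. Returning `None` for an idle minute keeps "no traffic" distinct from "0% success".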
Token Metrics
Total input tokens for successful requests in the last complete minute
Total output tokens for successful requests in the last complete minute
Total input tokens read from cache for successful requests in the last complete minute
Ratio of input tokens read from cache to total input tokens for successful requests in the last complete minute
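The cache ratio is the cached-input-token count divided by the total-input-token count, both of which are listed above. When you aggregate over a window longer than one minute, summing the two counters and dividing once is safer than averaging the per-minute ratios, because a mean of ratios gives a quiet minute the same weight as a busy one. A minimal sketch, assuming you have already read the per-minute token counts (function names are illustrative):

```python
from typing import Iterable, Optional, Tuple


def cache_ratio(cached_input_tokens: float, total_input_tokens: float) -> Optional[float]:
    """Cached input tokens divided by total input tokens; None if there were no input tokens."""
    if total_input_tokens <= 0:
        return None
    return cached_input_tokens / total_input_tokens


def cache_ratio_over_window(minutes: Iterable[Tuple[float, float]]) -> Optional[float]:
    """Ratio over several (cached, total) minutes: sum both counters first, then divide once."""
    cached = total = 0.0
    for cached_tokens, input_tokens in minutes:
        cached += cached_tokens
        total += input_tokens
    return cache_ratio(cached, total)
```

For example, with minutes `[(900, 1000), (0, 9000)]` the window ratio is `0.09`, while averaging the per-minute ratios (0.9 and 0.0) would report 0.45, overstating cache use because the busy minute had no cache hits.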
Latency Metrics
Queue time in seconds for successful requests (avg/p50/p90/p95/p99), i.e., the time a request spends waiting for resources at runtime
End-to-end API latency in seconds for successful requests (avg/p50/p90/p95/p99). Measures overall latency from when the API gateway receives a request to when it returns the response, inclusive of latency_generation_seconds.
Time to first token in seconds for successful requests (avg/p50/p90/p95/p99)
Time per output token (avg/p50/p90/p95/p99), excluding time to first token, averaged across successful requests
Time to generate all output tokens (i.e., the time from the last prompt token to the last output token) in seconds for successful requests (avg/p50/p90/p95/p99)
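Each latency metric reports an average plus the 50th, 90th, 95th, and 99th percentiles. This page does not say how the percentiles are computed, so the sketch below uses the nearest-rank method as an assumption, to show what these statistics mean when computed from your own per-request timings. It also computes a per-request time per output token using a common convention (time from the first to the last output token, divided by the number of tokens after the first); that formula and the helper names are likewise assumptions, not Cerebras' documented definition.

```python
from statistics import fmean
from typing import Dict, Sequence


def time_per_output_token(first_token_s: float, last_token_s: float, output_tokens: int) -> float:
    """Assumed convention: (last token time - first token time) / (output tokens - 1)."""
    if output_tokens < 2:
        raise ValueError("time per output token needs at least two output tokens")
    return (last_token_s - first_token_s) / (output_tokens - 1)


def nearest_rank(sorted_values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile, one of several common percentile definitions."""
    n = len(sorted_values)
    rank = max(1, (p * n + 99) // 100)  # ceil(p * n / 100) in exact integer arithmetic
    return sorted_values[rank - 1]


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Return avg/p50/p90/p95/p99 for a list of per-request latencies in seconds."""
    if not samples:
        raise ValueError("no samples")
    values = sorted(samples)
    summary = {"avg": fmean(values)}
    for p in (50, 90, 95, 99):
        summary[f"p{p}"] = nearest_rank(values, p)
    return summary
```

For example, `summarize([i / 100 for i in range(1, 101)])` gives a p50 of 0.5 and a p99 of 0.99 seconds. Tail percentiles such as p99 expose slow outliers that the average can hide.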

