GET https://cloud.cerebras.ai/api/v1/metrics/organizations/{organization_id}
curl -H "Authorization: Bearer $CEREBRAS_API_KEY" \
  https://cloud.cerebras.ai/api/v1/metrics/organizations/org_abc123
# HELP inference_endpoint_status Status of inference endpoint (-1=error calculating status, 0=down, 1=up)
# TYPE inference_endpoint_status gauge
inference_endpoint_status{endpoint="model",organization_id="org_abc123"} 1.0

# HELP requests_count_total Total request count (all HTTP codes) in the last complete minute
# TYPE requests_count_total gauge
requests_count_total{endpoint="model",organization_id="org_abc123"} 1.0

# HELP requests_success_total Total successful requests (HTTP 200) in the last complete minute
# TYPE requests_success_total gauge
requests_success_total{endpoint="model",organization_id="org_abc123"} 1.0

# HELP requests_failure_total Total failed requests by HTTP code in the last complete minute
# TYPE requests_failure_total gauge
requests_failure_total{endpoint="model",organization_id="org_abc123"} 0.0

# HELP input_tokens_total Total input tokens for successful requests in the last complete minute
# TYPE input_tokens_total gauge
input_tokens_total{endpoint="model",organization_id="org_abc123"} 123456.0

# HELP output_tokens_total Total output tokens for successful requests in the last complete minute
# TYPE output_tokens_total gauge
output_tokens_total{endpoint="model",organization_id="org_abc123"} 12345.0

# HELP queue_time_seconds Queue time percentiles in seconds
# TYPE queue_time_seconds gauge
queue_time_seconds{endpoint="model",organization_id="org_abc123",percentile="avg"} 0.55
queue_time_seconds{endpoint="model",organization_id="org_abc123",percentile="p50"} 0.55
queue_time_seconds{endpoint="model",organization_id="org_abc123",percentile="p90"} 0.55
queue_time_seconds{endpoint="model",organization_id="org_abc123",percentile="p95"} 0.55
queue_time_seconds{endpoint="model",organization_id="org_abc123",percentile="p99"} 0.55

# HELP e2e_latency_seconds End-to-end API latency percentiles in seconds
# TYPE e2e_latency_seconds gauge
e2e_latency_seconds{endpoint="model",organization_id="org_abc123",statistic="avg"} 0.9
e2e_latency_seconds{endpoint="model",organization_id="org_abc123",statistic="p50"} 0.9
e2e_latency_seconds{endpoint="model",organization_id="org_abc123",statistic="p90"} 0.9
e2e_latency_seconds{endpoint="model",organization_id="org_abc123",statistic="p95"} 0.9
e2e_latency_seconds{endpoint="model",organization_id="org_abc123",statistic="p99"} 0.9

# HELP ttft_seconds Time To First Token percentiles in seconds
# TYPE ttft_seconds gauge
ttft_seconds{endpoint="model",organization_id="org_abc123",statistic="avg"} 0.9
ttft_seconds{endpoint="model",organization_id="org_abc123",statistic="p50"} 0.9
ttft_seconds{endpoint="model",organization_id="org_abc123",statistic="p90"} 0.9
ttft_seconds{endpoint="model",organization_id="org_abc123",statistic="p95"} 0.9
ttft_seconds{endpoint="model",organization_id="org_abc123",statistic="p99"} 0.9

# HELP cache_reads_total Total input tokens read from cache for successful requests in the last complete minute
# TYPE cache_reads_total gauge
cache_reads_total{endpoint="model",organization_id="org_abc123"} 1234.0

# HELP cache_rate Ratio of input tokens read from cache, to total input tokens, for successful requests in the last complete minute
# TYPE cache_rate gauge
cache_rate{endpoint="model",organization_id="org_abc123"} 0.01

# HELP tpot Time Per Output Token (TPOT) percentiles
# TYPE tpot gauge
tpot{endpoint="model",organization_id="org_abc123",statistic="avg"} 0.0001
tpot{endpoint="model",organization_id="org_abc123",statistic="p50"} 0.0001
tpot{endpoint="model",organization_id="org_abc123",statistic="p90"} 0.0001
tpot{endpoint="model",organization_id="org_abc123",statistic="p95"} 0.0001
tpot{endpoint="model",organization_id="org_abc123",statistic="p99"} 0.0001

# HELP latency_generation_seconds Completion time percentiles in seconds
# TYPE latency_generation_seconds gauge
latency_generation_seconds{endpoint="model",organization_id="org_abc123",statistic="avg"} 1.1
latency_generation_seconds{endpoint="model",organization_id="org_abc123",statistic="p50"} 1.1
latency_generation_seconds{endpoint="model",organization_id="org_abc123",statistic="p90"} 1.1
latency_generation_seconds{endpoint="model",organization_id="org_abc123",statistic="p95"} 1.1
latency_generation_seconds{endpoint="model",organization_id="org_abc123",statistic="p99"} 1.1
This feature is in Private Preview. For access or more information, contact us or reach out to your account representative.
See the Metrics guide for more info.

Path Parameters

organization_id
string
required
The unique identifier for your organization (e.g., org_abc123)

Response

Returns metrics in Prometheus text-based exposition format.
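The exposition format is line-oriented and easy to consume without a full Prometheus deployment. The sketch below (not official client code) fetches the endpoint with the standard library and parses samples into a `{metric_name: [(labels, value), ...]}` mapping; `org_abc123` is a placeholder organization ID and the regexes cover only the subset of the format shown in the response above.

```python
# Sketch: fetch and parse the metrics endpoint using only the stdlib.
# Assumes CEREBRAS_API_KEY is set; org_abc123 is a placeholder org ID.
import os
import re
import urllib.request

# One sample line: metric_name{label="v",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_metrics(text: str) -> dict:
    """Parse Prometheus exposition text into {name: [(labels, value), ...]}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank/HELP/TYPE lines
            continue
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        metrics.setdefault(m.group("name"), []).append(
            (labels, float(m.group("value"))))
    return metrics

def fetch_org_metrics(org_id: str) -> dict:
    """GET the per-organization metrics and return parsed samples."""
    req = urllib.request.Request(
        f"https://cloud.cerebras.ai/api/v1/metrics/organizations/{org_id}",
        headers={"Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_metrics(resp.read().decode())
```

For production scraping, pointing a Prometheus server or the `prometheus_client` parser at the endpoint would be more robust than this hand-rolled regex.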

Available Metrics

The following metrics are available on an opt-in basis. Contact your Cerebras account representative to enable specific metrics for your organization.

Endpoint Health

inference_endpoint_status
gauge
Status of inference endpoint. Values:
  • -1 = Error calculating status
  • 0 = Down
  • 1 = Up

Request Metrics

requests_count_total
gauge
Total request count (all HTTP codes) in the last complete minute
requests_success_total
gauge
Total successful requests (HTTP 200) in the last complete minute
requests_failure_total
gauge
Total failed requests by HTTP code in the last complete minute
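Because the three request counters are reported per complete minute, a per-minute success rate can be derived directly from them. A minimal sketch (the helper name is ours; the guard for zero traffic is a design choice, not API behavior):

```python
def success_rate(requests_success_total: float,
                 requests_count_total: float) -> float:
    """Fraction of last-minute requests that returned HTTP 200."""
    if requests_count_total == 0:
        # No traffic in the window: report 1.0 rather than divide by zero.
        return 1.0
    return requests_success_total / requests_count_total
```

With the sample values above (1 request, 1 success, 0 failures), this yields 1.0.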

Token Metrics

input_tokens_total
gauge
Total input tokens for successful requests in the last complete minute
output_tokens_total
gauge
Total output tokens for successful requests in the last complete minute
cache_reads_total
gauge
Total input tokens read from cache for successful requests in the last complete minute
cache_rate
gauge
Ratio of input tokens read from cache, to total input tokens, for successful requests in the last complete minute
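The reported `cache_rate` should agree with the ratio of the two token counters. A quick consistency check using the sample values from the response above (`cache_reads_total` = 1234, `input_tokens_total` = 123456):

```python
def cache_rate_from_counters(cache_reads_total: float,
                             input_tokens_total: float) -> float:
    """Ratio of cached input tokens to total input tokens (0 when no input)."""
    if input_tokens_total == 0:
        return 0.0
    return cache_reads_total / input_tokens_total

# Sample values from the response body above:
rate = cache_rate_from_counters(1234.0, 123456.0)  # ~0.00999, shown as 0.01
```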

Latency Metrics

queue_time_seconds
gauge
Queue time percentiles in seconds for successful requests (avg/p50/p90/p95/p99), i.e. the time a request spends waiting for resources at runtime
e2e_latency_seconds
gauge
End-to-end API latency percentiles in seconds for successful requests (avg/p50/p90/p95/p99). Measures overall latency from when a request is received at the API gateway to when the response is returned by the gateway, inclusive of latency_generation_seconds.
ttft_seconds
gauge
Time To First Token percentiles in seconds for successful requests (avg/p50/p90/p95/p99)
tpot
gauge
Time Per Output Token (TPOT) percentiles (avg/p50/p90/p95/p99), excluding time to first token, averaged across successful requests
latency_generation_seconds
gauge
Time to generate all output tokens (i.e. time from the last prompt token to the last output token); percentiles in seconds for successful requests (avg/p50/p90/p95/p99)
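These latency metrics relate to each other in useful back-of-the-envelope ways: since `tpot` excludes the first token, the implied per-request decode throughput is roughly `1 / tpot`, and streaming `n` output tokens takes roughly `ttft + (n - 1) * tpot`. The sketch below illustrates this with the sample values from the response above; these are approximations we assume for illustration, not identities guaranteed by the API.

```python
# Sample values from the response body above (seconds):
TPOT_AVG = 0.0001  # time per output token, after the first
TTFT_AVG = 0.9     # time to first token

def implied_throughput(tpot: float) -> float:
    """Approximate output tokens per second implied by TPOT."""
    return 1.0 / tpot

def est_stream_time(n_tokens: int, ttft: float = TTFT_AVG,
                    tpot: float = TPOT_AVG) -> float:
    """Rough wall time to stream n output tokens for one request."""
    return ttft + (n_tokens - 1) * tpot
```

At the sample TPOT of 0.0001 s, the implied decode rate is about 10,000 tokens/second per request.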

Error Codes

For information about possible error responses, see the Error Codes documentation.