Analytics

The Analytics page has three tabs: Usage, Cached-Usage, and Cost.
All dates and timestamps are displayed in UTC.
On the Usage tab, track request volume and token consumption over a selected date range. Toggle Show quotas to overlay your rate limit thresholds on the chart and see how much headroom remains. Use Download Report to export the data as a CSV.

Tips

Monitor quota headroom: Enable Show quotas on the Usage tab to see how close you are to your rate limits. If you’re consistently near the ceiling, consider distributing traffic across projects or requesting a limit increase.

Optimize caching: If cache hits are low on the Cached-Usage tab, review whether your prompts have a stable, shared prefix. Effective caching reduces Time to First Token (TTFT) for long-context workloads.

Track costs by model: Filter the Cost tab by model to compare spend. This helps when deciding whether a smaller, faster model is sufficient for a given use case.

Debug usage spikes: Narrow the date range to isolate when a spike started, then cross-reference with Logs to identify the source.
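
As a rough way to act on the caching tip above, the sketch below measures how much of a batch of prompts is covered by a prefix common to all of them. It is only an illustration: `shared_prefix_ratio` and the sample prompts are hypothetical, and the comparison is character-based, whereas the service's cache most likely operates on tokens, so treat the number as a heuristic rather than a prediction of cache hits.

```python
import os


def shared_prefix_ratio(prompts):
    """Fraction of the shortest prompt covered by the prefix shared by all prompts.

    Hypothetical helper for illustration; character-based, not token-based.
    """
    if not prompts:
        return 0.0
    # os.path.commonprefix compares strings character by character.
    prefix = os.path.commonprefix(prompts)
    shortest = min(len(p) for p in prompts)
    return len(prefix) / shortest if shortest else 0.0


# Made-up example prompts (assumption: a fixed system preamble per request).
SYSTEM = "You are a support assistant for Acme. Answer concisely.\n\n"

# Stable layout: the fixed instructions come first, the variable part last.
stable = [
    SYSTEM + "Question: How do I reset my password?",
    SYSTEM + "Question: Where is my invoice?",
]

# Unstable layout: a per-request value precedes the fixed instructions.
unstable = [
    "Request 1842: " + SYSTEM + "Question: How do I reset my password?",
    "Request 1843: " + SYSTEM + "Question: Where is my invoice?",
]

if __name__ == "__main__":
    print(f"stable layout:   {shared_prefix_ratio(stable):.2f}")
    print(f"unstable layout: {shared_prefix_ratio(unstable):.2f}")
```

A low ratio suggests that something variable (a timestamp, request number, or user name) sits near the start of the prompt; moving it after the fixed instructions lengthens the shared prefix.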

Logs

The Logs page has two tabs: Request Logs and Audit Logs.
All dates and timestamps are displayed in UTC.
Inspect individual API calls by filtering on model, API key, date range, or HTTP status code. The status code chart at the top gives you a quick visual of error rates over time. Use Download Report to export the current view as a CSV.
When contacting support about a failed request, include the Request ID from the log entry.
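
Once a report has been downloaded, the same error-rate check can be done offline. The following is a minimal sketch, not a description of the actual export: the column names `request_id` and `status_code` and the sample rows are assumptions, so adjust them to match the header row of your CSV.

```python
import csv
import io
from collections import Counter


def summarize_status_codes(csv_text, status_column="status_code"):
    """Count rows per HTTP status code and return (counts, error_rate).

    The column name is an assumption; check the header of your export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(int(row[status_column]) for row in reader)
    total = sum(counts.values())
    # Treat any 4xx or 5xx response as an error.
    errors = sum(n for code, n in counts.items() if code >= 400)
    return counts, (errors / total if total else 0.0)


def failed_request_ids(csv_text, status_column="status_code",
                       id_column="request_id"):
    """Collect the IDs of failed requests, e.g. to quote in a support ticket."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[id_column] for row in reader
            if int(row[status_column]) >= 400]


# Made-up sample data standing in for a downloaded report.
SAMPLE = """request_id,model,status_code
req-001,model-a,200
req-002,model-a,429
req-003,model-b,200
req-004,model-b,500
"""

if __name__ == "__main__":
    counts, error_rate = summarize_status_codes(SAMPLE)
    print(dict(counts), f"error rate: {error_rate:.0%}")
    print("failed:", failed_request_ids(SAMPLE))
```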

Limits

The Limits page displays a table of models available to your organization or project, along with their associated rate limits. The table shows each model’s name, context length, limit type (requests or tokens), and quota by minute and day. Hourly limits are shown in the org-level view only. If you need higher limits, contact us or reach out to your account representative.
The values shown on the Limits page are specific to your plan and project. Your limits may differ from examples in the documentation.

Per-Org vs. Per-Project Limits

The limits shown depend on your current console context:

| Console context | Limits shown |
| --- | --- |
| All Projects view | Org-level limits |
| Specific project selected | Project-level limits for that project |

Cerebras uses a two-level quota model: each request is checked against both the project-level limit and the org-level ceiling, so it must fit within both to be accepted. See Projects for more on how the two levels interact.
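
To make the two-level check concrete, here is a small sketch of the idea: a request is admitted only when both the project quota and the org quota have room. This is an illustration of the concept, not Cerebras's implementation; the `Quota` class and `try_consume` function are hypothetical, and the per-minute and per-day windows are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """A simple counter with a ceiling (hypothetical; no time window)."""
    limit: int
    used: int = 0

    def has_room(self, amount: int = 1) -> bool:
        return self.used + amount <= self.limit


def try_consume(project: Quota, org: Quota, amount: int = 1) -> bool:
    """Admit the request only if BOTH levels have room, then charge both."""
    if project.has_room(amount) and org.has_room(amount):
        project.used += amount
        org.used += amount
        return True
    return False


if __name__ == "__main__":
    org = Quota(limit=5)          # shared ceiling for the whole organization
    project_a = Quota(limit=3)    # each project also has its own limit
    project_b = Quota(limit=3)

    # Project A is stopped by its own limit after three requests.
    print([try_consume(project_a, org) for _ in range(4)])
    # Project B is stopped by the org ceiling before reaching its own limit.
    print([try_consume(project_b, org) for _ in range(3)])
```

In the second loop, project B is rejected on its third request even though its own limit is not exhausted, because the two projects together have reached the org-level ceiling.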