A dedicated endpoint is a private, provisioned instance of the Cerebras Inference service reserved exclusively for your organization. Your traffic runs on reserved capacity, so other users do not affect your latency or throughput. Dedicated endpoints are intended for production workloads that require predictable performance, such as real-time applications, customer-facing products, and high-volume pipelines that need guaranteed capacity. See Supported Models below.

Key Benefits

  • Your endpoint runs on reserved capacity that is not shared with other customers, so other workloads do not affect your performance.
  • Performance is reserved and predictable, even under load.
  • Deploy your custom fine-tuned models alongside standard model variants.
  • Tailor your endpoint to the performance and scale requirements of your workload through bespoke draft models (used for speculative decoding), model configurations, and quantization strategies.

All capabilities available on shared endpoints are available on dedicated endpoints. In addition, dedicated customers get access to advanced features including fine-tuning, weight management, and enhanced service tier controls.
To get started with a dedicated endpoint, contact us.

Supported Models

Dedicated endpoints support a broad range of model families, including multiple versions, parameter sizes, and weight variations (e.g., -instruct and -thinking) as well as your own custom weights. We can also work with you to tune your endpoint configuration to meet your specific performance goals. For models that are natively multimodal, we currently support text-only inference, with multimodal support coming soon.

Features

Dedicated endpoints include all shared endpoint capabilities, plus:
  • Fine-tuning — Deploy custom model weights on your dedicated endpoint.
  • Management API — Programmatically manage models, capacity, and endpoints.
  • Batch API — Run large-scale asynchronous workloads against your reserved capacity.
  • Service tiers — Configure request prioritization to match your SLA requirements.
  • Metrics — Monitor your endpoint with Prometheus-compatible metrics for requests, tokens, latency, and health.
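Because the metrics are Prometheus-compatible, a scrape should be readable in the standard Prometheus text exposition format. The following is a minimal, standard-library-only sketch of turning such a scrape into latency figures. It assumes a latency histogram; the metric name request_latency_seconds and the sample values are hypothetical placeholders, not the names your endpoint actually exports, and the sketch does not show how to reach the metrics endpoint.

```python
import re
from typing import Dict, List, Tuple

# One sample line: name{label="value",...} value [optional timestamp].
# Simplification: label values containing "}" are not supported.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)(?:\s+-?\d+)?$')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse Prometheus text exposition format into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = _SAMPLE_RE.match(line)
        if m is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, raw_value = m.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))  # escapes are left as-is
        samples.append((name, labels, float(raw_value)))  # float() accepts "+Inf" and "NaN"
    return samples


def histogram_mean(samples: List[Sample], base: str) -> float:
    """Mean observation: <base>_sum / <base>_count, summed over all label sets."""
    total = sum(v for n, _, v in samples if n == f"{base}_sum")
    count = sum(v for n, _, v in samples if n == f"{base}_count")
    return total / count if count else float("nan")


def histogram_quantile(q: float, samples: List[Sample], base: str) -> float:
    """Estimate quantile q from cumulative buckets by linear interpolation,
    in the spirit of PromQL's histogram_quantile()."""
    agg: Dict[float, float] = {}
    for n, labels, v in samples:
        if n == f"{base}_bucket":
            le = float(labels["le"])
            agg[le] = agg.get(le, 0.0) + v  # aggregate buckets across label sets
    bounds = sorted(agg.items())
    if not bounds or bounds[-1][1] == 0:
        return float("nan")
    rank = q * bounds[-1][1]  # the +Inf bucket holds the total count
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for bound, count in bounds:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # fall back to the highest finite bound
            if count == prev_count:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical scrape; real metric names and values will differ.
SAMPLE = """\
# HELP request_latency_seconds End-to-end request latency (hypothetical name).
# TYPE request_latency_seconds histogram
request_latency_seconds_bucket{le="0.1"} 50
request_latency_seconds_bucket{le="0.5"} 90
request_latency_seconds_bucket{le="1"} 100
request_latency_seconds_bucket{le="+Inf"} 100
request_latency_seconds_sum 21.5
request_latency_seconds_count 100
"""

if __name__ == "__main__":
    samples = parse_exposition(SAMPLE)
    print(f"mean latency: {histogram_mean(samples, 'request_latency_seconds'):.3f}s")  # 0.215s
    print(f"p95 latency:  {histogram_quantile(0.95, samples, 'request_latency_seconds'):.3f}s")  # 0.750s
```

In production you would more likely point a Prometheus server at the endpoint and use PromQL's own histogram_quantile(); the hand-rolled version above only illustrates what those figures are computed from.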

Get Started

Dedicated endpoints are available to enterprise customers. Contact us to discuss your requirements.