Dedicated capacity
Your endpoint runs on reserved capacity that is not shared with other customers, so other customers' workloads do not affect your performance.
Consistent latency and throughput
Performance is reserved and predictable, even under load.
Bring your own weights
Deploy your custom fine-tuned models alongside standard model variants.
Performance customization
Tailor your endpoint to match the performance and scale requirements of your workload through bespoke draft models, model configurations, and quantization strategies.
Exclusive access to advanced features
All capabilities available on shared endpoints are available on dedicated endpoints. In addition, dedicated customers get access to advanced features including fine-tuning, weight management, and enhanced service tier controls.
Supported Models
Dedicated endpoints support a broad range of model families, including multiple versions, parameter sizes, and weight variations (e.g., -instruct and -thinking), as well as your own custom weights. We can also work with you to tune your endpoint configuration to meet your specific performance goals.
For models that are natively multimodal, we currently support text-only inference, with multimodal support coming soon.

Alibaba Qwen — Qwen3, Qwen3-Coder
- Qwen3-235B-A22B
- Qwen3-32B
- Qwen3-30B-A3B
- Small & Tiny Variants
- Qwen3-Coder
OpenAI (OSS) — GPT-OSS

MiniMax — MiniMax M2.X
Meta — Llama 3, Llama 4

Mistral — Mistral Small, Mistral Large 3, Devstral 2, Mixtral

Z.AI — GLM 4.X

Moonshot AI — Kimi K2.X

DeepSeek — DeepSeek V3.X

ByteDance — OSS Seed

ServiceNow — Apriel
Coming soon: multimodal
- Qwen3-VL
- GLM 4.6V
- Kimi K2.5 Vision
- Pixtral Large
Features
Dedicated endpoints include all shared endpoint capabilities, plus:
- Fine-tuning — Deploy custom model weights on your dedicated endpoint.
- Management API — Programmatically manage models, capacity, and endpoints.
- Batch API — Run large-scale asynchronous workloads against your reserved capacity.
- Service tiers — Configure request prioritization to match your SLA requirements.
- Metrics — Monitor your endpoint with Prometheus-compatible metrics for requests, tokens, latency, and health.
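Because the metrics are Prometheus-compatible, they can be read with any tool that understands the Prometheus text exposition format. The following is a minimal sketch of parsing such output and deriving an error rate, using only the Python standard library. The sample payload, the metric names (`requests_total`, `tokens_generated_total`), and the `status` label are hypothetical placeholders chosen for illustration; the actual names exposed by your endpoint may differ.

```python
import re
from typing import Dict, Tuple

# Hypothetical sample in Prometheus text exposition format.
# Real metric names and labels on your endpoint may differ.
SAMPLE = """\
# HELP requests_total Total requests served.
# TYPE requests_total counter
requests_total{status="200"} 1024
requests_total{status="500"} 3
# HELP tokens_generated_total Total output tokens.
# TYPE tokens_generated_total counter
tokens_generated_total 987654
"""

# metric_name{optional labels} value [optional timestamp]
LINE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# label_name="label value" pairs inside the braces
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

SampleKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def parse_metrics(text: str) -> Dict[SampleKey, float]:
    """Map (metric name, sorted label pairs) to the sample value."""
    samples: Dict[SampleKey, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # Ignore lines this simple parser does not understand.
        name, label_str, value = match.groups()
        labels = tuple(sorted(LABEL_RE.findall(label_str or "")))
        samples[(name, labels)] = float(value)
    return samples


def error_rate(samples: Dict[SampleKey, float],
               metric: str = "requests_total") -> float:
    """Fraction of requests whose (assumed) status label starts with '5'."""
    total = sum(v for (n, _), v in samples.items() if n == metric)
    errors = sum(
        v for (n, labels), v in samples.items()
        if n == metric and dict(labels).get("status", "").startswith("5")
    )
    return errors / total if total else 0.0


samples = parse_metrics(SAMPLE)
rate = error_rate(samples)
```

In practice you would more likely point a Prometheus server at the endpoint's metrics URL, or use the parser that ships with the `prometheus_client` package, rather than maintain a hand-written parser. The sketch is only meant to show the shape of the data.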

