This feature is in Private Preview. For access or more information, contact us or reach out to your account representative.
Key Concepts
- Model Architecture: The underlying structure of a model (e.g., `gpt-oss-120b`, `llama3.1-8b`). Your dedicated endpoint is provisioned for a specific architecture, and any custom weights you upload must be compatible with that architecture.
- Model Version: Each time you upload custom weights, a new version is created. Versions are identified by an auto-incrementing integer (e.g., `1`, `2`, `3`) and can also have user-defined aliases like `production` or `v1-stable`.
- Endpoint: Your dedicated endpoint is identified by a unique ID (e.g., `my-org-gpt-oss-120b`). This ID is used as the `model` field when making inference requests.
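To make the last point concrete, here is a minimal sketch of an inference request that passes the endpoint ID as the `model` field. It assumes the OpenAI-compatible chat completions route (`/v1/chat/completions`) on `api.cerebras.ai`; check the inference docs for the exact base URL for your dedicated endpoint.

```python
import json
import urllib.request

API_KEY = "YOUR_INFERENCE_API_KEY"   # standard inference key, not the Management API key
ENDPOINT_ID = "my-org-gpt-oss-120b"  # your dedicated endpoint's unique ID

# The endpoint ID goes in the "model" field, exactly where a model
# architecture name would normally go.
payload = {
    "model": ENDPOINT_ID,
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    "https://api.cerebras.ai/v1/chat/completions",  # assumed base URL
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # uncomment to actually send the request
print(payload["model"])
```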
Typical Workflow
- Upload model weights — Push your fine-tuned weights from S3 to Cerebras. The upload is asynchronous; you’ll receive a version ID to track progress.
- Check upload status — Poll the version status until the sync completes.
- Deploy to endpoint — Once the upload is complete, deploy the version to your dedicated endpoint.
- Run inference — Make requests to your endpoint, using the endpoint ID as the `model` field.
- Iterate — Upload new versions as you fine-tune, assign aliases like `production` to track releases, and deploy updates with zero downtime.
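The upload → poll → deploy steps above can be sketched as request builders. Everything here is illustrative: the routes, field names, and status values are hypothetical placeholders, not the documented Management API; consult the API reference or your Cerebras representative for the real ones. Each helper returns a description of the HTTP call you would send (e.g., with `requests.post(url, headers=..., json=body)`).

```python
BASE = "https://api.cerebras.ai"     # illustrative base URL
ENDPOINT_ID = "my-org-gpt-oss-120b"  # your dedicated endpoint's unique ID


def auth_headers(mgmt_api_key: str) -> dict:
    # The Management API key is separate from the standard inference key.
    return {"Authorization": f"Bearer {mgmt_api_key}",
            "Content-Type": "application/json"}


def upload_request(s3_uri: str) -> dict:
    # Step 1: upload weights. The upload is asynchronous; the response
    # would include a version ID to track progress. (Hypothetical route.)
    return {
        "method": "POST",
        "url": f"{BASE}/v1/endpoints/{ENDPOINT_ID}/versions",
        "body": {"source_uri": s3_uri},
    }


def status_request(version_id: int) -> dict:
    # Step 2: poll this until the S3 sync completes. (Hypothetical route.)
    return {
        "method": "GET",
        "url": f"{BASE}/v1/endpoints/{ENDPOINT_ID}/versions/{version_id}",
    }


def deploy_request(version_id: int) -> dict:
    # Step 3: deploy the synced version to the endpoint. (Hypothetical route.)
    return {
        "method": "POST",
        "url": f"{BASE}/v1/endpoints/{ENDPOINT_ID}/deploy",
        "body": {"version": version_id},
    }


req = upload_request("s3://my-bucket/checkpoints/run-42/")
print(req["url"])
```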
Authentication
The Management API uses a separate API key from the standard inference API. You can find your Management API key under Management API keys on the API keys page in the Cerebras Cloud console.

S3 Bucket Setup
Before uploading model weights, you need an S3 bucket configured with cross-account access to Cerebras. Your Cerebras representative will provide specific instructions, but the bucket policy must grant cross-account access. In the policy, replace `<your-bucket-name>` with your S3 bucket name. The IAM role ARN (`<cerebras-provided-iam-role-arn>`) will be provided by Cerebras and enables secure cross-account access to your model weights.
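A minimal sketch of such a bucket policy, assuming read-only access to the weights is sufficient (the authoritative version comes from your Cerebras representative):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CerebrasWeightsReadAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<cerebras-provided-iam-role-arn>"
      },
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<your-bucket-name>",
        "arn:aws:s3:::<your-bucket-name>/*"
      ]
    }
  ]
}
```

Note that `s3:ListBucket` applies to the bucket ARN itself while `s3:GetObject` applies to objects under it, so both `Resource` entries are needed.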
