This feature is in Private Preview. For access or more information, contact us or reach out to your account representative.
Key Concepts
- Model Architecture: The underlying structure of a model (e.g., `gpt-oss-120b`, `llama3.1-8b`). Your dedicated endpoint is provisioned for a specific architecture, and any custom weights you upload must be compatible with that architecture.
- Model Version: Each time you upload custom weights, a new version is created. Versions are identified by an auto-incrementing integer (e.g., `1`, `2`, `3`) and can also have user-defined aliases like `production` or `v1-stable`.
- Endpoint: Your dedicated endpoint is identified by a unique ID (e.g., `my-org-gpt-oss-120b`). This ID is used as the `model` field when making inference requests.
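To make the last point concrete, here is a minimal sketch of an inference request that passes the endpoint ID as the `model` field. It assumes the OpenAI-compatible chat completions route (`/v1/chat/completions`) on `api.cerebras.ai`; check the inference docs for the exact base URL for your dedicated endpoint.

```python
import json
import urllib.request

API_KEY = "YOUR_INFERENCE_API_KEY"   # standard inference key, not the Management API key
ENDPOINT_ID = "my-org-gpt-oss-120b"  # your dedicated endpoint's unique ID

# The endpoint ID goes in the "model" field, exactly where a model
# architecture name would normally go.
payload = {
    "model": ENDPOINT_ID,
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    "https://api.cerebras.ai/v1/chat/completions",  # assumed base URL
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # uncomment to actually send the request
print(payload["model"])
```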
Typical Workflow
- Upload model weights — Push your fine-tuned weights from S3 to Cerebras. The upload is asynchronous; you’ll receive a version ID to track progress.
- Check upload status — Poll the version status until the sync completes.
- Deploy to endpoint — Once the upload is complete, deploy the version to your dedicated endpoint.
- Run inference — Make requests to your endpoint, using the endpoint ID as the `model` field.
- Iterate — Upload new versions as you fine-tune, assign aliases like `production` to track releases, and deploy updates with zero downtime.
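The upload → poll → deploy steps above can be sketched as request builders. Everything here is illustrative: the routes, field names, and status values are hypothetical placeholders, not the documented Management API; consult the API reference or your Cerebras representative for the real ones. Each helper returns a description of the HTTP call you would send (e.g., with `requests.post(url, headers=..., json=body)`).

```python
BASE = "https://api.cerebras.ai"     # illustrative base URL
ENDPOINT_ID = "my-org-gpt-oss-120b"  # your dedicated endpoint's unique ID


def auth_headers(mgmt_api_key: str) -> dict:
    # The Management API key is separate from the standard inference key.
    return {"Authorization": f"Bearer {mgmt_api_key}",
            "Content-Type": "application/json"}


def upload_request(s3_uri: str) -> dict:
    # Step 1: upload weights. The upload is asynchronous; the response
    # would include a version ID to track progress. (Hypothetical route.)
    return {
        "method": "POST",
        "url": f"{BASE}/v1/endpoints/{ENDPOINT_ID}/versions",
        "body": {"source_uri": s3_uri},
    }


def status_request(version_id: int) -> dict:
    # Step 2: poll this until the S3 sync completes. (Hypothetical route.)
    return {
        "method": "GET",
        "url": f"{BASE}/v1/endpoints/{ENDPOINT_ID}/versions/{version_id}",
    }


def deploy_request(version_id: int) -> dict:
    # Step 3: deploy the synced version to the endpoint. (Hypothetical route.)
    return {
        "method": "POST",
        "url": f"{BASE}/v1/endpoints/{ENDPOINT_ID}/deploy",
        "body": {"version": version_id},
    }


req = upload_request("s3://my-bucket/checkpoints/run-42/")
print(req["url"])
```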
Authentication
The Management API uses a separate API key from the standard inference API. You can find your Management API key under Management API keys on the API keys page in the Cerebras Cloud console.

S3 Bucket Setup
Before uploading model weights, you need an S3 bucket configured with cross-account access to Cerebras. Your Cerebras representative will provide specific instructions, but the bucket policy must grant cross-account access. In the policy, replace `<your-bucket-name>` with your S3 bucket name. The IAM role ARN (`<cerebras-provided-iam-role-arn>`) will be provided by Cerebras and enables secure cross-account access to your model weights.
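A minimal sketch of such a bucket policy, assuming read-only access to the weights is sufficient (the authoritative version comes from your Cerebras representative):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CerebrasWeightsReadAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<cerebras-provided-iam-role-arn>"
      },
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<your-bucket-name>",
        "arn:aws:s3:::<your-bucket-name>/*"
      ]
    }
  ]
}
```

Note that `s3:ListBucket` applies to the bucket ARN itself while `s3:GetObject` applies to objects under it, so both `Resource` entries are needed.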
