> ## Documentation Index
> Fetch the complete documentation index at: https://inference-docs.cerebras.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Management API

> Programmatically upload, version, and deploy custom model weights on your dedicated endpoint.

<Callout icon="lock" color="#b2b1b1ff" iconType="regular">
  This feature is in [Private Preview](/support/preview-releases). For access or more information, [contact us](https://www.cerebras.ai/contact) or reach out to your account representative.
</Callout>

Upload fine-tuned models from your S3 bucket, track multiple versions, and deploy them to your [dedicated endpoint](/dedicated/overview).

For detailed endpoint documentation, see the [Management API Reference](/api-reference/customer_management_api/list-model-architectures).

## Key Concepts

* **Model Architecture**: The underlying structure of a model (e.g., `gpt-oss-120b`, `zai-glm-4.7`). Your dedicated endpoint is provisioned for a specific architecture, and any custom weights you upload must be compatible with that architecture.
* **Model Version**: Each time you upload custom weights, a new version is created. Versions are identified by an auto-incrementing integer (e.g., `1`, `2`, `3`) and can also have user-defined aliases like `production` or `v1-stable`.
* **Endpoint**: Your dedicated endpoint is identified by a unique ID (e.g., `my-org-gpt-oss-120b`). This ID is used as the `model` field when making inference requests.

## Typical Workflow

1. **Upload model weights** — Push your fine-tuned weights from S3 to Cerebras. The upload is asynchronous; you'll receive a version ID to track progress.

2. **Check upload status** — Poll the version status until the sync completes.

3. **Deploy to endpoint** — Once the upload is complete, deploy the version to your dedicated endpoint.

4. **Run inference** — Make requests to your endpoint using the same endpoint ID as the `model` field.

5. **Iterate** — Upload new versions as you fine-tune, assign aliases like `production` to track releases, and deploy updates with zero downtime.

## Authentication

The Management API uses a separate API key from the standard inference API. You can find your Management API key under **Management API keys** on the API keys page in the [Cerebras Cloud console](https://cloud.cerebras.ai).

## S3 Bucket Setup

Before uploading model weights, you need an S3 bucket configured with cross-account access to Cerebras. Your Cerebras representative will provide specific instructions, but the bucket policy must grant cross-account access:

```json theme={null}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<cerebras-provided-iam-role-arn>"
      },
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<your-bucket-name>",
        "arn:aws:s3:::<your-bucket-name>/*"
      ]
    }
  ]
}
```

Replace `<your-bucket-name>` with your S3 bucket name. The IAM role ARN (`<cerebras-provided-iam-role-arn>`) will be provided by Cerebras and enables secure cross-account access to your model weights.
