Deploy model to endpoint

curl --request POST \
  --url https://api.cerebras.ai/management/v1/endpoints/{endpoint_id}:deployModel \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "orgs/my-org/models/gpt-oss-120b/versions/1"
}
'

{
  "name": "550e8400-e29b-41d4-a716-446655440000",
  "done": false,
  "response": {
    "deployment_id": "550e8400-e29b-41d4-a716-446655440000",
    "endpoint_id": "my-org-gpt-oss-120b",
    "org_name": "my-org",
    "model_arch_id": "gpt-oss-120b",
    "version_id": 1,
    "created": 1736700000,
    "updated": 1736700000,
    "rollout_status": "not_started"
  }
}

POST /management/v1/endpoints/{endpoint_id}:deployModel

This feature is in Private Preview. For access or more information, contact us or reach out to your account representative.

Deploys a model version to a dedicated endpoint that is running a model with the same underlying architecture. The request queues a deployment operation and returns a deployment ID that you can use to track its status.
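As a rough illustration, the curl example for this operation could be expressed in Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_deploy_request` and `deploy_model` are hypothetical, the base URL is copied from the curl example, and error handling is omitted.

```python
import json
import urllib.request

# Base URL copied from the curl example for this operation.
BASE_URL = "https://api.cerebras.ai/management/v1"


def build_deploy_request(endpoint_id: str, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for :deployModel.

    Hypothetical helper for illustration only.
    """
    body = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/endpoints/{endpoint_id}:deployModel",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def deploy_model(endpoint_id: str, model: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON operation object."""
    request = build_deploy_request(endpoint_id, model, api_key)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the first function free of network access, which makes it easy to inspect the URL, headers, and body before anything is queued.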

Authorizations

Authorization

string

header

required

Management API key generated from the Management API keys section on the API keys page at https://cloud.cerebras.ai. Use the format: Bearer <MANAGEMENT_API_KEY>

Path Parameters

endpoint_id

string

required

Unique identifier for the endpoint. Use it as the value of the model field when making an inference request.

Example: my-org-gpt-oss-120b

curl --location 'https://api.cerebras.ai/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
--data '{
  "model": "my-org-gpt-oss-120b",
  "messages": [{"content": "Hello!", "role": "user"}]
}'

Body

application/json

model

string

required

Model version name in the format orgs/<org_name>/models/<model_arch_id>/versions/<version_id>, where <version_id> is either an integer version ID or a model version alias.
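The name format can be assembled and checked on the client before sending the request. The sketch below is illustrative only: `format_model_name` and `parse_model_name` are hypothetical helpers, and the pattern assumes that no component contains a `/` character, which the reference does not state.

```python
import re

# orgs/<org_name>/models/<model_arch_id>/versions/<version_id>
# Assumption: none of the components contains a "/" character.
_MODEL_NAME = re.compile(r"^orgs/([^/]+)/models/([^/]+)/versions/([^/]+)$")


def format_model_name(org_name: str, model_arch_id: str, version) -> str:
    """Assemble a model version name from its parts (hypothetical helper)."""
    return f"orgs/{org_name}/models/{model_arch_id}/versions/{version}"


def parse_model_name(name: str):
    """Split a model version name into (org_name, model_arch_id, version).

    The version is returned as an int when it is an integer version ID,
    and as a str when it is an alias.
    """
    match = _MODEL_NAME.match(name)
    if match is None:
        raise ValueError(f"not a model version name: {name!r}")
    org_name, model_arch_id, version = match.groups()
    if re.fullmatch(r"[0-9]+", version):
        return org_name, model_arch_id, int(version)
    return org_name, model_arch_id, version
```

For example, `format_model_name("my-org", "gpt-oss-120b", 1)` yields the value used in the request body of the curl example.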

Response

200 - application/json

Successful Response

name

string

required

Deployment UUID. Use it later to query the status of this deployment.

done

boolean

required

Whether the operation is complete.

response

EndpointDeploymentOperation · object

Deployment metadata captured at submission time.

response.deployment_id

string

required

UUID for the deployment record.

response.endpoint_id

string

required

Unique identifier for the endpoint. Use it as the value of the model field when making an inference request.

Example: my-org-gpt-oss-120b

response.org_name

string

required

Owning organization for the deployment.

response.model_arch_id

string

required

Name of the model architecture (e.g. llama3.1-8b, gpt-oss-120b).

response.version_id

integer

required

Version identifier for the deployed model.

response.created

integer

required

Unix timestamp (in seconds) when the deployment record was created.

response.updated

integer

required

Unix timestamp (in seconds) when the deployment record was last updated.
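Because created and updated are Unix timestamps in seconds, they can be converted to timezone-aware datetimes for display or comparison. A small sketch using the value from the sample response; the helper name `to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def to_utc(timestamp_seconds: int) -> datetime:
    """Convert a Unix timestamp in seconds to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_seconds, tz=timezone.utc)


# Value taken from the sample response for this operation.
created = to_utc(1736700000)
print(created.isoformat())  # 2025-01-12T16:40:00+00:00
```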

response.version_alias

string | null

Original version reference from the deploy request (the alias or integer version ID, as specified by the user).

response.rollout_status

enum<string> | null

Latest rollout status reported by the deployer, if any.

Available options: not_started, in_progress, rolling_back, rolled_back, done, error, cancelled

response.max_unavailable_replicas

integer | null

Maximum number of replicas that can be unavailable during rollout.

response.rollout_replicas_updated

integer | null

Number of replicas that have been updated in the rollout.

response.rollout_total_replicas

integer | null

Total number of replicas in the deployment.

response.rollout_error_message

string | null

Error message from the rollout, if any.
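The response fields above can be condensed into a short status summary on the client side. The following is a sketch under stated assumptions: `rollout_summary` is a hypothetical helper, and treating rolled_back, done, error, and cancelled as terminal statuses is an assumption based on their names, not something this reference confirms.

```python
# Assumption: these rollout_status values mean no further progress is expected.
TERMINAL_STATUSES = {"rolled_back", "done", "error", "cancelled"}


def rollout_summary(operation: dict) -> dict:
    """Summarize a deployment operation object (hypothetical helper).

    Nullable fields may be absent or None, so every lookup tolerates both.
    """
    response = operation.get("response") or {}
    status = response.get("rollout_status")
    updated = response.get("rollout_replicas_updated")
    total = response.get("rollout_total_replicas")

    # Fraction of replicas updated, or None when the counts are unavailable.
    progress = None
    if updated is not None and total:
        progress = updated / total

    return {
        "deployment_id": response.get("deployment_id"),
        "operation_done": bool(operation.get("done")),
        "status": status,
        "terminal": status in TERMINAL_STATUSES,
        "progress": progress,
        "error": response.get("rollout_error_message"),
    }
```

Applied to the sample response for this operation, the summary reports a status of not_started with no progress figure, because the replica counts are not present at submission time.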
