POST /management/v1/endpoints/{endpoint_id}:deployModel
curl --request POST \
  --url https://api.cerebras.ai/management/v1/endpoints/{endpoint_id}:deployModel \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "orgs/my-org/models/gpt-oss-120b/versions/1"
}
'
{
  "name": "550e8400-e29b-41d4-a716-446655440000",
  "done": false,
  "response": {
    "deployment_id": "550e8400-e29b-41d4-a716-446655440000",
    "endpoint_id": "my-org-gpt-oss-120b",
    "org_name": "my-org",
    "model_arch_id": "gpt-oss-120b",
    "version_id": 1,
    "created": 1736700000,
    "updated": 1736700000,
    "rollout_status": "not_started"
  }
}

This feature is in Private Preview. For access or more information, contact us or reach out to your account representative.
Deploy a model version to a dedicated endpoint running a model with the same underlying architecture. The endpoint queues the deployment operation and returns a deployment ID for tracking status.

Authorizations

Authorization · string · header · required

Management API key, generated in the Management API keys section of the API keys page at https://cloud.cerebras.ai. Pass it in the format: Bearer <MANAGEMENT_API_KEY>
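
Combined with the request shown at the top of this page, the authorization header, URL, and JSON body can be assembled programmatically. The following is a minimal Python sketch using only the standard library. The helper name build_deploy_request is hypothetical, and the function only constructs the request; it does not send it.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the curl example on this page.
BASE_URL = "https://api.cerebras.ai/management/v1"


def build_deploy_request(endpoint_id: str, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a deployModel request. Hypothetical helper."""
    # Percent-encode the path parameter so unexpected characters cannot alter the URL.
    path_id = urllib.parse.quote(endpoint_id, safe="")
    return urllib.request.Request(
        url=f"{BASE_URL}/endpoints/{path_id}:deployModel",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the result to urllib.request.urlopen and decode the JSON body. Reading the key from an environment variable (for example one named MANAGEMENT_API_KEY, which is an assumed name) keeps it out of source code.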

Path Parameters

endpoint_id · string · required

Unique identifier for the endpoint. The same value is used as the model field when making an inference request.

Example: my-org-gpt-oss-120b

curl --location 'https://api.cerebras.ai/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
  --data '{
    "model": "my-org-gpt-oss-120b",
    "messages": [{"content": "Hello!", "role": "user"}]
  }'

Body

application/json
model · string · required

Model version name in the format orgs/<org_name>/models/<model_arch_id>/versions/<version_id>, where <version_id> is either an integer version ID or a model version alias.
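
The name format above is easy to get wrong when assembled by hand. The following Python sketch shows one way to build and split such names. Both function names are hypothetical, the alias "stable" in the usage note is an invented placeholder, and the check covers only the segment layout described above, not any server-side naming rules.

```python
from typing import Union


def model_version_name(org_name: str, model_arch_id: str, version_id: Union[int, str]) -> str:
    """Format a model version name; version_id may be an integer ID or an alias."""
    return f"orgs/{org_name}/models/{model_arch_id}/versions/{version_id}"


def parse_model_version_name(name: str) -> dict:
    """Split a model version name into its parts, checking only the segment layout."""
    parts = name.split("/")
    layout_ok = (
        len(parts) == 6
        and parts[0] == "orgs"
        and parts[2] == "models"
        and parts[4] == "versions"
        and all(parts)  # no empty segments
    )
    if not layout_ok:
        raise ValueError(f"not a model version name: {name!r}")
    version = parts[5]
    # An all-digit version is treated as an integer ID; anything else as an alias.
    is_integer_id = version.isascii() and version.isdigit()
    return {
        "org_name": parts[1],
        "model_arch_id": parts[3],
        "version_id": int(version) if is_integer_id else version,
    }
```

For example, model_version_name("my-org", "gpt-oss-120b", 1) yields the value used in the request example on this page, and passing an alias such as "stable" in place of 1 leaves the alias in the final segment unchanged.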

Response

200 - application/json

Successful Response

name · string · required

Deployment UUID. Use it later to query the status of this deployment.

done · boolean · required

Whether the operation is complete.

response · EndpointDeploymentOperation · object

Deployment metadata captured at submission time.
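
A client will typically decode this response and keep the deployment UUID for later status checks. The sketch below parses a body shaped like the example response on this page. The class and function names are hypothetical, and response is treated as optional because the schema above does not mark it as required.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeploymentOperation:
    name: str                 # deployment UUID
    done: bool                # whether the operation is complete
    response: Optional[dict]  # deployment metadata captured at submission time


def parse_operation(body: str) -> DeploymentOperation:
    """Decode a deployModel response body into a DeploymentOperation."""
    data = json.loads(body)
    return DeploymentOperation(
        name=data["name"],
        done=data["done"],
        response=data.get("response"),
    )


def describe(op: DeploymentOperation) -> str:
    """Return a one-line, human-readable summary of the operation."""
    status = (op.response or {}).get("rollout_status", "unknown")
    state = "complete" if op.done else "in progress"
    return f"deployment {op.name}: {state}, rollout_status={status}"
```

Applied to the example response above, describe reports the deployment as in progress with rollout_status=not_started.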