Skip to main content
POST
/
management
/
v1
/
endpoints
/
{endpoint_id}
:deployModel
curl --request POST \ --url https://api.cerebras.ai/management/v1/endpoints/{endpoint_id}:deployModel \ --header 'Authorization: Bearer <token>' \ --header 'Content-Type: application/json' \ --data ' { "model": "orgs/my-org/models/llama-3.3-70b/versions/1" } '
{
  "name": "550e8400-e29b-41d4-a716-446655440000",
  "done": false,
  "response": {
    "deployment_id": "550e8400-e29b-41d4-a716-446655440000",
    "endpoint_id": "my-org-llama-3.3-70b",
    "org_name": "my-org",
    "model_arch_id": "llama-3.3-70b",
    "version_id": 1,
    "created": 1736700000,
    "updated": 1736700000,
    "rollout_status": "not_started"
  }
}
This feature is in Private Preview. For access or more information, contact us or reach out to your account representative.
Deploy a model version to a dedicated endpoint running a model with the same underlying architecture. The endpoint queues the deployment operation and returns a deployment ID for tracking status.

Authorizations

Authorization
string
header
required

Management API key generated from the Management API keys section on the API keys page at https://cloud.cerebras.ai. Use the format: Bearer <MANAGEMENT_API_KEY>

Path Parameters

endpoint_id
string
required

Unique identifier for the endpoint. It is used as the model field when making an inference request.

Example: my-org-llama-3.3-70b

curl --location 'https://api.cerebras.ai/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
--data '{
  "model": "my-org-llama-3.3-70b",
  "messages": [{"content": "Hello!", "role": "user"}]
}'

Body

application/json
model
string
required

Model version name in the format of orgs/<org_name>/models/<model_arch_id>/versions/<version_id>, where <version_id> can be an integer version ID or a model version alias.

Response

200 - application/json

Successful Response

name
string
required

Deployment UUID. This can be used to later query the status of this deployment.

done
boolean
required

Whether the operation is complete.

response
EndpointDeploymentOperation · object

Deployment metadata captured at submission time.