{
  "name": "550e8400-e29b-41d4-a716-446655440000",
  "done": false,
  "response": {
    "deployment_id": "550e8400-e29b-41d4-a716-446655440000",
    "endpoint_id": "my-org-llama-3.3-70b",
    "org_name": "my-org",
    "model_arch_id": "llama-3.3-70b",
    "version_id": 1,
    "created": 1736700000,
    "updated": 1736700000,
    "rollout_status": "not_started"
  }
}
Management API key generated from the Management API keys section on the API keys page at https://cloud.cerebras.ai. Use the format: Bearer <MANAGEMENT_API_KEY>
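As a minimal sketch of the header format described above, the following Python helper builds the Authorization header from a management key. The helper name and the MANAGEMENT_API_KEY environment variable are assumptions made for illustration; they are not part of the API.

```python
import os


def management_auth_headers(api_key: str) -> dict[str, str]:
    """Return headers carrying the key in the `Bearer <MANAGEMENT_API_KEY>` format."""
    key = api_key.strip()
    if not key:
        raise ValueError("management API key is empty")
    return {"Authorization": f"Bearer {key}"}


# MANAGEMENT_API_KEY is an assumed environment variable name; the fallback
# value is a placeholder so the sketch runs without a real key.
headers = management_auth_headers(os.environ.get("MANAGEMENT_API_KEY", "example-key"))
```

Stripping whitespace before formatting guards against keys pasted with a trailing newline, which would otherwise produce a malformed header value.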
Unique identifier for the endpoint. It is used as the model field when making an inference request.
Example: my-org-llama-3.3-70b
curl --location 'https://api.cerebras.ai/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
--data '{
"model": "my-org-llama-3.3-70b",
"messages": [{"content": "Hello!", "role": "user"}]
}'
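The same inference request can be sketched in Python using only the standard library. This mirrors the curl example above: the endpoint ID is passed as the model field. The function name is hypothetical, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request


def build_chat_request(endpoint_id: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for a deployed endpoint."""
    body = {
        "model": endpoint_id,  # the endpoint ID doubles as the model name
        "messages": [{"content": prompt, "role": "user"}],
    }
    return urllib.request.Request(
        "https://api.cerebras.ai/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# The fallback key is a placeholder so the sketch runs without credentials.
req = build_chat_request(
    "my-org-llama-3.3-70b",
    "Hello!",
    os.environ.get("CEREBRAS_API_KEY", "example-key"),
)
# To send it: urllib.request.urlopen(req), then json.load() the response.
```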
Model version name in the format orgs/<org_name>/models/<model_arch_id>/versions/<version_id>, where <version_id> is either an integer version ID or a model version alias.
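The version name format above can be built and parsed with a small helper. This is a sketch under the assumption that no path segment contains a slash; the function names and the "stable" alias are hypothetical and used only for illustration.

```python
import re

# Assumption: org name, model architecture ID, and version ID never contain "/".
_VERSION_NAME = re.compile(
    r"^orgs/(?P<org_name>[^/]+)"
    r"/models/(?P<model_arch_id>[^/]+)"
    r"/versions/(?P<version_id>[^/]+)$"
)


def format_version_name(org_name: str, model_arch_id: str, version_id) -> str:
    """Join the three parts into the documented version name format."""
    return f"orgs/{org_name}/models/{model_arch_id}/versions/{version_id}"


def parse_version_name(name: str) -> dict:
    """Split a version name into its parts; numeric version IDs become ints."""
    match = _VERSION_NAME.match(name)
    if match is None:
        raise ValueError(f"not a model version name: {name!r}")
    parts = match.groupdict()
    # The version segment is either an integer version ID or an alias string.
    if parts["version_id"].isdigit():
        parts["version_id"] = int(parts["version_id"])
    return parts


by_id = format_version_name("my-org", "llama-3.3-70b", 1)
by_alias = format_version_name("my-org", "llama-3.3-70b", "stable")  # hypothetical alias
```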
Successful Response
Deployment UUID. This can be used to later query the status of this deployment.
Whether the operation is complete.
Deployment metadata captured at submission time.
UUID for the deployment record.
Unique identifier for the endpoint. It is used as the model field when making an inference request.
Example: my-org-llama-3.3-70b
curl --location 'https://api.cerebras.ai/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
--data '{
"model": "my-org-llama-3.3-70b",
"messages": [{"content": "Hello!", "role": "user"}]
}'
Owning organization for the deployment.
Name of the model architecture (e.g. llama3.1-8b, llama-3.3-70b).
Version identifier for the deployed model.
Unix timestamp (in seconds) when the deployment record was created.
Unix timestamp (in seconds) when the deployment record was last updated.
Original version reference from the deploy request (the alias or integer version ID, as specified by the user).
Latest rollout status reported by the deployer, if any.
Possible values: not_started, in_progress, rolling_back, rolled_back, done, error, cancelled
Maximum number of replicas that can be unavailable during rollout.
Number of replicas that have been updated in the rollout.
Total number of replicas in the deployment.
Error message from the rollout, if any.
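The response fields described above can be summarized with a short Python helper. This is a sketch, not part of the API: the function name is hypothetical, and treating rolled_back, done, error, and cancelled as settled states is an assumption based on the status names alone. It uses the sample response from the top of this page.

```python
from datetime import datetime, timezone

# Assumption: these statuses mean the rollout will not change further.
ASSUMED_SETTLED = {"rolled_back", "done", "error", "cancelled"}


def summarize_operation(op: dict) -> dict:
    """Extract the deployment UUID, completion flag, status, and creation time."""
    meta = op.get("response") or {}
    status = meta.get("rollout_status")  # may be absent if nothing was reported
    created = meta.get("created")        # Unix timestamp in seconds
    return {
        "deployment_uuid": op["name"],
        "operation_done": op["done"],
        "rollout_status": status,
        "rollout_settled": status in ASSUMED_SETTLED,
        "created_utc": (
            datetime.fromtimestamp(created, tz=timezone.utc).isoformat()
            if created is not None
            else None
        ),
    }


sample = {
    "name": "550e8400-e29b-41d4-a716-446655440000",
    "done": False,
    "response": {
        "deployment_id": "550e8400-e29b-41d4-a716-446655440000",
        "endpoint_id": "my-org-llama-3.3-70b",
        "created": 1736700000,
        "updated": 1736700000,
        "rollout_status": "not_started",
    },
}
summary = summarize_operation(sample)
# summary["created_utc"] is "2025-01-12T16:40:00+00:00"
```

Passing an explicit UTC timezone keeps the converted timestamp independent of the machine's local time zone.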