curl --request GET \
--url https://api.cerebras.ai/management/v1/endpoints/{endpoint_id} \
--header 'Authorization: Bearer <token>'
{
"name": "my-org-llama-3.3-70b",
"model_arch_id": "llama-3.3-70b",
"deployed_models": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"model": "orgs/my-org/models/llama-3.3-70b/versions/1",
"version_alias": "production-jan-12-2026-finetuned",
"created": 1736700000,
"state": "complete"
}
],
"managing_org_name": "my-org",
"created": 1736600000,
"updated": 1736700000
}
Management API key generated from the Management API keys section on the API keys page at https://cloud.cerebras.ai. Use the format: Bearer <MANAGEMENT_API_KEY>
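For callers not using curl, the same GET request can be sketched with Python's standard library. This is a minimal illustration, not an official client. The function names are hypothetical. The sketch also assumes the Management API key is stored in an environment variable named MANAGEMENT_API_KEY, which is a name chosen only for this example. The base URL and the Bearer header format come from the request shown above.

```python
import json
import os
import urllib.parse
import urllib.request

# Base URL taken from the curl example on this page.
BASE_URL = "https://api.cerebras.ai/management/v1/endpoints"


def build_get_endpoint_request(endpoint_id: str, management_api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a single endpoint."""
    # Percent-encode the ID so it is safe to place in the URL path.
    url = f"{BASE_URL}/{urllib.parse.quote(endpoint_id, safe='')}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {management_api_key}"},
    )


def get_endpoint(endpoint_id: str) -> dict:
    """Send the request and decode the JSON response body.

    Assumes the key lives in a MANAGEMENT_API_KEY environment variable
    (a hypothetical name chosen for this sketch).
    """
    req = build_get_endpoint_request(endpoint_id, os.environ["MANAGEMENT_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Request construction is split from sending so that the URL and headers can be inspected without a network call. A call such as get_endpoint("my-org-llama-3.3-70b") returns the decoded JSON as a dict.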
Unique identifier for the endpoint. It is also the value to pass in the model field of an inference request.
Example: my-org-llama-3.3-70b
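As a minimal Python sketch of this usage, the endpoint name is placed in the model field of a chat completion request. The URL, the headers, and the CEREBRAS_API_KEY variable mirror the curl inference example on this page. The helper names are hypothetical, and the sketch is not an official SDK.

```python
import json
import os
import urllib.request

# Inference URL taken from the curl example on this page.
CHAT_URL = "https://api.cerebras.ai/v1/chat/completions"


def build_chat_request(endpoint_name: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a chat completion request that targets a dedicated endpoint."""
    payload = {
        # The endpoint name is used as the model identifier.
        "model": endpoint_name,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def chat(endpoint_name: str, prompt: str) -> dict:
    # CEREBRAS_API_KEY is the inference key from the curl example,
    # not the Management API key used by the management endpoints.
    req = build_chat_request(endpoint_name, prompt, os.environ["CEREBRAS_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```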
curl --location 'https://api.cerebras.ai/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${CEREBRAS_API_KEY}" \
--data '{
"model": "my-org-llama-3.3-70b",
"messages": [{"content": "Hello!", "role": "user"}]
}'
Successful Response
Endpoint name (e.g. my-org-llama-3.3-70b).
Name of the model architecture (e.g. llama3.1-8b, llama-3.3-70b).
List of deployed models.
Deployment ID for this instance of the deployed model: the UUID generated at deployment time.
Model Version ID, in the format orgs/<org_name>/models/<model_arch_id>/versions/<version_id>.
Unix timestamp (in seconds) when the deployment was created.
Rollout status reported for the deployment. One of: not_started, in_progress, rolling_back, rolled_back, done, error, cancelled.
Original version reference from the deploy request (an alias or an integer version ID, as specified by the user).
Maximum number of replicas that can be unavailable during rollout.
Number of replicas that have been updated in the rollout.
Total number of replicas in the deployment.
Error message from the rollout, if any.
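The Model Version ID format documented for deployed models can be split into its parts with a small parser. This is a sketch only. It assumes that each component (org name, model architecture ID, version ID) is a single path segment containing no "/". The regex, the type name, and the function name are all hypothetical.

```python
import re
from typing import NamedTuple

# Documented format: orgs/<org_name>/models/<model_arch_id>/versions/<version_id>
# Assumption: no component contains a "/".
_MODEL_VERSION_RE = re.compile(
    r"^orgs/(?P<org>[^/]+)/models/(?P<arch>[^/]+)/versions/(?P<version>[^/]+)$"
)


class ModelVersionId(NamedTuple):
    org_name: str
    model_arch_id: str
    version_id: str


def parse_model_version_id(value: str) -> ModelVersionId:
    """Split a Model Version ID into its components, or raise ValueError."""
    match = _MODEL_VERSION_RE.match(value)
    if match is None:
        raise ValueError(f"not a Model Version ID: {value!r}")
    return ModelVersionId(match["org"], match["arch"], match["version"])
```

Applied to the model value from the example response, "orgs/my-org/models/llama-3.3-70b/versions/1", the parser yields org my-org, architecture llama-3.3-70b, and version 1. The version ID is kept as a string because the format does not say it is always numeric.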
Unix timestamp (in seconds) when the endpoint was created.
Unix timestamp (in seconds) when the endpoint was last updated.
Organization that has management access to this endpoint.
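To work with the response fields above, the JSON body can be parsed into typed objects, with the Unix-second timestamps converted to UTC datetimes. This is a minimal sketch, not a schema definition. It reads only the keys shown in the example response and ignores the rollout replica fields, whose JSON key names this page does not show. It treats version_alias as optional, which is an assumption. The class and function names are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class DeployedModel:
    id: str
    model: str
    version_alias: Optional[str]
    created: datetime
    state: str


@dataclass
class Endpoint:
    name: str
    model_arch_id: str
    managing_org_name: str
    created: datetime
    updated: datetime
    deployed_models: List[DeployedModel]


def _ts(seconds: int) -> datetime:
    # Timestamps are Unix seconds; interpret them as UTC.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def parse_endpoint(body: str) -> Endpoint:
    """Parse the JSON body of a GET endpoint response (example keys only)."""
    data = json.loads(body)
    deployments = [
        DeployedModel(
            id=d["id"],
            model=d["model"],
            version_alias=d.get("version_alias"),  # assumed optional
            created=_ts(d["created"]),
            state=d["state"],
        )
        for d in data.get("deployed_models", [])
    ]
    return Endpoint(
        name=data["name"],
        model_arch_id=data["model_arch_id"],
        managing_org_name=data["managing_org_name"],
        created=_ts(data["created"]),
        updated=_ts(data["updated"]),
        deployed_models=deployments,
    )
```

Converting timestamps with an explicit UTC timezone avoids silently depending on the local clock's timezone when comparing created and updated values.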