# Deploy model to endpoint

<Callout icon="lock" color="#b2b1b1ff" iconType="regular">
  This feature is in [Private Preview](/support/preview-releases). For access or more information, [contact us](https://www.cerebras.ai/contact) or reach out to your account representative.
</Callout>

Deploys a model version to a dedicated endpoint that is running a model with the same underlying architecture. The request queues a deployment operation and returns a deployment ID that you can use to track its status.
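
A minimal Python sketch of this call, using only the standard library. The path, request body, and bearer-token format are taken from the OpenAPI spec on this page. The helper names (`model_version_name`, `build_deploy_request`, `deploy_model`) are hypothetical and exist only for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Union

API_BASE = "https://api.cerebras.ai"  # server URL from the spec


def model_version_name(org_name: str, model_arch_id: str, version: Union[int, str]) -> str:
    """Format orgs/<org_name>/models/<model_arch_id>/versions/<version_id>.

    `version` may be an integer version ID or a model version alias.
    """
    return f"orgs/{org_name}/models/{model_arch_id}/versions/{version}"


def build_deploy_request(endpoint_id: str, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the deployModel operation."""
    # Percent-encode the endpoint ID so it is safe to place in the URL path.
    safe_id = urllib.parse.quote(endpoint_id, safe="")
    url = f"{API_BASE}/management/v1/endpoints/{safe_id}:deployModel"
    body = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Management API key, sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )


def deploy_model(endpoint_id: str, model: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON response (needs network access)."""
    request = build_deploy_request(endpoint_id, model, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a valid Management API key in `key`, calling `deploy_model("my-org-gpt-oss-120b", model_version_name("my-org", "gpt-oss-120b", "production"), key)` would send the same body as the `deploy_by_alias` example in the spec.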


## OpenAPI

````yaml post /management/v1/endpoints/{endpoint_id}:deployModel
openapi: 3.1.0
info:
  title: Endpoint Management Orchestrator API
  version: 1.0.0
servers:
  - url: https://api.cerebras.ai
security:
  - BearerAuth: []
paths:
  /management/v1/endpoints/{endpoint_id}:deployModel:
    post:
      summary: Deploy model to endpoint
      operationId: >-
        deploy_model_to_endpoint_route_management_v1_endpoints__endpoint_id__deployModel_post
      parameters:
        - name: endpoint_id
          in: path
          required: true
          schema:
            type: string
            title: Endpoint Id
          description: >-
            Unique identifier for the endpoint. It is used as the
            [`model`](https://inference-docs.cerebras.ai/api-reference/chat-completions#param-model)
            field when making an inference request.


            Example: `my-org-gpt-oss-120b`


            ```bash

            curl --location 'https://api.cerebras.ai/v1/chat/completions' \

            --header 'Content-Type: application/json' \

            --header "Authorization: Bearer ${CEREBRAS_API_KEY}" \

            --data '{
              "model": "my-org-gpt-oss-120b",
              "messages": [{"content": "Hello!", "role": "user"}]
            }'

            ```
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/EndpointDeployModelRequest'
            examples:
              deploy_model:
                summary: Deploy a model version
                value:
                  model: orgs/my-org/models/gpt-oss-120b/versions/1
              deploy_by_alias:
                summary: Deploy using version alias
                value:
                  model: orgs/my-org/models/gpt-oss-120b/versions/production
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/EndpointDeployResponse'
              example:
                name: 550e8400-e29b-41d4-a716-446655440000
                done: false
                response:
                  deployment_id: 550e8400-e29b-41d4-a716-446655440000
                  endpoint_id: my-org-gpt-oss-120b
                  org_name: my-org
                  model_arch_id: gpt-oss-120b
                  version_id: 1
                  created: 1736700000
                  updated: 1736700000
                  rollout_status: not_started
components:
  schemas:
    EndpointDeployModelRequest:
      properties:
        model:
          type: string
          title: Model
          description: >-
            Model version name in the format
            `orgs/<org_name>/models/<model_arch_id>/versions/<version_id>`,
            where `<version_id>` is either an integer version ID or a model
            version alias.
      additionalProperties: false
      type: object
      required:
        - model
      title: EndpointDeployModelRequest
    EndpointDeployResponse:
      properties:
        name:
          type: string
          title: Name
          description: >-
            Deployment UUID. Use it to query the status of this deployment
            later.
        done:
          type: boolean
          title: Done
          description: Whether the operation is complete.
        response:
          anyOf:
            - $ref: '#/components/schemas/EndpointDeploymentOperation'
            - type: 'null'
          description: Deployment metadata captured at submission time.
      additionalProperties: false
      type: object
      required:
        - name
        - done
      title: EndpointDeployResponse
    EndpointDeploymentOperation:
      properties:
        deployment_id:
          type: string
          title: Deployment Id
          description: UUID for the deployment record.
        endpoint_id:
          type: string
          title: Endpoint Id
          description: >-
            Unique identifier for the endpoint. It is used as the
            [`model`](https://inference-docs.cerebras.ai/api-reference/chat-completions#param-model)
            field when making an inference request.


            Example: `my-org-gpt-oss-120b`


            ```bash

            curl --location 'https://api.cerebras.ai/v1/chat/completions' \

            --header 'Content-Type: application/json' \

            --header "Authorization: Bearer ${CEREBRAS_API_KEY}" \

            --data '{
              "model": "my-org-gpt-oss-120b",
              "messages": [{"content": "Hello!", "role": "user"}]
            }'

            ```
        org_name:
          type: string
          title: Org Name
          description: Owning organization for the deployment.
        model_arch_id:
          type: string
          title: Model Arch Id
          description: Name of the model architecture (e.g. `llama3.1-8b`, `gpt-oss-120b`).
        version_id:
          type: integer
          title: Version Id
          description: Version identifier for the deployed model.
        version_alias:
          anyOf:
            - type: string
            - type: 'null'
          title: Version Alias
          description: >-
            Original version reference from the deploy request (an alias or an
            integer version ID, as specified by the user).
        created:
          type: integer
          title: Created
          description: Unix timestamp (in seconds) when the deployment record was created.
        updated:
          type: integer
          title: Updated
          description: >-
            Unix timestamp (in seconds) when the deployment record was last
            updated.
        rollout_status:
          anyOf:
            - $ref: '#/components/schemas/DeploymentRolloutStatus'
            - type: 'null'
          description: Latest rollout status reported by the deployer, if any.
        max_unavailable_replicas:
          anyOf:
            - type: integer
            - type: 'null'
          title: Max Unavailable Replicas
          description: Maximum number of replicas that can be unavailable during rollout.
        rollout_replicas_updated:
          anyOf:
            - type: integer
            - type: 'null'
          title: Rollout Replicas Updated
          description: Number of replicas that have been updated in the rollout.
        rollout_total_replicas:
          anyOf:
            - type: integer
            - type: 'null'
          title: Rollout Total Replicas
          description: Total number of replicas in the deployment.
        rollout_error_message:
          anyOf:
            - type: string
            - type: 'null'
          title: Rollout Error Message
          description: Error message from the rollout, if any.
      additionalProperties: false
      type: object
      required:
        - deployment_id
        - endpoint_id
        - org_name
        - model_arch_id
        - version_id
        - created
        - updated
      title: EndpointDeploymentOperation
    DeploymentRolloutStatus:
      type: string
      enum:
        - not_started
        - in_progress
        - rolling_back
        - rolled_back
        - done
        - error
        - cancelled
      title: DeploymentRolloutStatus
      description: Rollout state of a deployment.
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      description: >-
        Management API key generated from the `Management API keys` section on
        the `API keys` page at https://cloud.cerebras.ai. Use the format
        `Bearer <MANAGEMENT_API_KEY>`.

````
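
The response fields can be read as in the following Python sketch, which parses the example response from the spec and summarizes rollout progress. Treating `done`, `error`, `cancelled`, and `rolled_back` as terminal states is an assumption based on the enum names; the spec does not say which states are final. The `summarize` helper is hypothetical.

```python
import json

# Assumption: these DeploymentRolloutStatus values will not change further.
ASSUMED_TERMINAL_STATES = {"done", "error", "cancelled", "rolled_back"}

# The example 200 response from the spec above.
EXAMPLE_RESPONSE = json.loads("""
{
  "name": "550e8400-e29b-41d4-a716-446655440000",
  "done": false,
  "response": {
    "deployment_id": "550e8400-e29b-41d4-a716-446655440000",
    "endpoint_id": "my-org-gpt-oss-120b",
    "org_name": "my-org",
    "model_arch_id": "gpt-oss-120b",
    "version_id": 1,
    "created": 1736700000,
    "updated": 1736700000,
    "rollout_status": "not_started"
  }
}
""")


def summarize(operation: dict) -> str:
    """Return a one-line summary of an EndpointDeployResponse payload."""
    # `response` is optional and nullable, so fall back to an empty dict.
    details = operation.get("response") or {}
    status = details.get("rollout_status")
    updated = details.get("rollout_replicas_updated")
    total = details.get("rollout_total_replicas")

    # Replica counts are nullable; show progress only when both are present.
    progress = ""
    if updated is not None and total is not None:
        progress = f" ({updated}/{total} replicas)"

    finished = operation["done"] or status in ASSUMED_TERMINAL_STATES
    state = "finished" if finished else "pending"
    return f"deployment {operation['name']}: {status or 'unknown'}{progress}, {state}"


print(summarize(EXAMPLE_RESPONSE))
```

Because `name` and `done` are the only required top-level fields, the sketch reads everything under `response` defensively instead of assuming it is populated.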