# Error Codes

The Cerebras Inference API uses standard HTTP response status codes to indicate whether a request succeeded or failed. When a request fails, the SDK raises an exception that inherits from `cerebras.cloud.sdk.APIError`. This page lists the error types, shows how to handle them, and explains how to configure retries and timeouts.

## Error Types

All exceptions raised by the SDK inherit from `cerebras.cloud.sdk.APIError`. They fall into two main categories:

1. `cerebras.cloud.sdk.APIConnectionError`: Raised when the library is unable to connect to the API.
2. `cerebras.cloud.sdk.APIStatusError`: Raised when the API returns a non-success status code (4xx or 5xx).

## HTTP Status Codes

| Status Code | Error Type                                                                                                                                                              |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 400         | BadRequestError                                                                                                                                                         |
| 401         | AuthenticationError                                                                                                                                                     |
| 402         | PaymentRequired                                                                                                                                                         |
| 403         | PermissionDeniedError                                                                                                                                                   |
| 404         | NotFoundError                                                                                                                                                           |
| 422         | UnprocessableEntityError                                                                                                                                                |
| 429         | <Tooltip tip="Learn how rate limits are applied and measured." cta="Read our Rate Limits guide." href="/support/rate-limits">RateLimitError/Too many requests</Tooltip> |
| 500         | InternalServerError                                                                                                                                                     |
| 503         | ServiceUnavailable                                                                                                                                                      |
| N/A         | APIConnectionError                                                                                                                                                      |
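
For logging or metrics, it can help to turn a raw status code into the error name from the table above. The sketch below is a plain-Python lookup and is not part of the SDK: `DOCUMENTED_ERRORS` and `error_name_for` are hypothetical names, `None` stands for "no HTTP response was received", and any 4xx or 5xx code missing from the table falls back to the general `APIStatusError` category.

```python
from typing import Optional

# Names copied from the status code table above. This dict is illustrative,
# not an object exported by the SDK.
DOCUMENTED_ERRORS = {
    400: "BadRequestError",
    401: "AuthenticationError",
    402: "PaymentRequired",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    422: "UnprocessableEntityError",
    429: "RateLimitError",
    500: "InternalServerError",
    503: "ServiceUnavailable",
}


def error_name_for(status_code: Optional[int]) -> str:
    """Return the documented error name for a failed request."""
    if status_code is None:
        # No response at all: the request never reached the API.
        return "APIConnectionError"
    if status_code < 400:
        raise ValueError(f"{status_code} is not an error status code")
    # Unlisted 4xx/5xx codes are still non-success status codes.
    return DOCUMENTED_ERRORS.get(status_code, "APIStatusError")
```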

## Handling Errors

The examples below request a model that does not exist and handle the resulting error:

<CodeGroup>
  ```python Python theme={null}
  import cerebras.cloud.sdk
  from cerebras.cloud.sdk import Cerebras

  client = Cerebras()

  try:
      client.chat.completions.create(
          messages=[
              {
                  "role": "user",
                  "content": "This should cause an error!",
              }
          ],
          model="some-model-that-doesnt-exist",
      )
  except cerebras.cloud.sdk.APIConnectionError as e:
      print("The server could not be reached")
      print(e.__cause__)  # an underlying Exception, likely raised within httpx.
  except cerebras.cloud.sdk.RateLimitError as e:
      print("A 429 status code was received; we should back off a bit.")
  except cerebras.cloud.sdk.APIStatusError as e:
      print("Another non-200-range status code was received")
      print(e.status_code)
      print(e.response)
  ```

  ```javascript Node.js theme={null}
  import Cerebras from '@cerebras/cerebras_cloud_sdk';

  const client = new Cerebras({
    apiKey: process.env['CEREBRAS_API_KEY'], // This is the default and can be omitted
  });
  async function main() {
    const completion = await client.chat.completions
      .create({
        messages: [{ role: 'user', content: 'This should cause an error!' }],
        model: 'some-model-that-doesnt-exist', // Deliberately invalid model name, used only to trigger an error
      })
      .catch(async (err) => {
        if (err instanceof Cerebras.APIError) {
          console.log(err.status); // 400
          console.log(err.name); // BadRequestError
          console.log(err.headers); // {server: 'nginx', ...}
          console.log(err); // Full exception
        } else {
          throw err;
        }
      });
  }

  main();
  ```
</CodeGroup>
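
The order of the `except` clauses in the Python example matters: Python runs the first clause that matches, so a subclass has to be listed before its parent. The sketch below demonstrates this with local stand-in classes, so it runs without the SDK or network access. It assumes, as the example above implies, that `RateLimitError` is a subclass of `APIStatusError`; the `classify` helper is hypothetical.

```python
# Local stand-ins that mirror the SDK's exception names (not the real classes).
class APIError(Exception):
    pass


class APIConnectionError(APIError):
    pass


class APIStatusError(APIError):
    def __init__(self, status_code: int):
        super().__init__(f"status {status_code}")
        self.status_code = status_code


class RateLimitError(APIStatusError):
    def __init__(self):
        super().__init__(429)


def classify(exc: APIError) -> str:
    """Re-raise exc and report which except clause handled it."""
    try:
        raise exc
    except APIConnectionError:
        return "unreachable"
    except RateLimitError:
        # Must come before APIStatusError, or this branch is never reached.
        return "rate_limited"
    except APIStatusError as e:
        return f"status_{e.status_code}"
```

If the `RateLimitError` clause were moved below `APIStatusError`, a 429 would be reported as `status_429` instead of `rate_limited`.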

## Retries

By default, the SDK automatically retries the following errors up to 2 times, with a short exponential backoff between attempts:

* Connection errors
* 408 Request Timeout
* <Tooltip tip="Learn how rate limits are applied and measured." cta="Read our Rate Limits guide." href="/support/rate-limits">429 Rate Limit</Tooltip>
* 5xx server errors (status code 500 and above)
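
The list above can be written as a small predicate, which is useful when reasoning about which failures reach your error handler immediately and which only surface after the retries are exhausted. This is a sketch based solely on that list, not the SDK's internal logic; `is_retried_by_default` is a hypothetical helper, and `None` stands for a connection error with no HTTP status.

```python
from typing import Optional


def is_retried_by_default(status_code: Optional[int]) -> bool:
    """Return True if the documented default retry policy covers this failure."""
    if status_code is None:
        # Connection errors have no status code and are retried.
        return True
    # 408 Request Timeout, 429 Rate Limit, and any status of 500 or above.
    return status_code in (408, 429) or status_code >= 500
```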

You can change the retry count, or disable retries entirely, with the `max_retries` option (`maxRetries` in Node.js):

<CodeGroup>
  ```python Python theme={null}
  from cerebras.cloud.sdk import Cerebras

  # Configure the default for all requests:
  client = Cerebras(
      max_retries=0,  # Disable retries (default is 2)
  )

  # Or, configure per-request:
  client.with_options(max_retries=5).chat.completions.create(
      messages=[
          {
              "role": "user",
              "content": "Why is fast inference important?",
          }
      ],
      model="gpt-oss-120b",
  )
  ```

  ```javascript Node.js theme={null}
  import Cerebras from '@cerebras/cerebras_cloud_sdk';

  // Configure the default for all requests:
  const client = new Cerebras({
    maxRetries: 0, // default is 2
  });

  // Or, configure per-request:
  await client.chat.completions.create({ messages: [{ role: 'user', content: 'Why is fast inference important?' }], model: 'gpt-oss-120b' }, {
    maxRetries: 5,
  });
  ```
</CodeGroup>
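
To give a feel for how exponential backoff interacts with the retry count, the sketch below computes the wait before each retry. The starting delay, growth factor, and cap are assumptions chosen for illustration; the SDK's actual values are not documented on this page and may include random jitter. `backoff_delays` is a hypothetical helper.

```python
from typing import List


def backoff_delays(
    max_retries: int,
    initial: float = 0.5,  # assumed first delay, in seconds
    factor: float = 2.0,   # assumed growth factor per retry
    cap: float = 8.0,      # assumed upper bound on a single delay
) -> List[float]:
    """Return the wait (in seconds) before each retry, without jitter."""
    delays = []
    delay = initial
    for _ in range(max_retries):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


print(backoff_delays(2))  # [0.5, 1.0]
print(backoff_delays(5))  # [0.5, 1.0, 2.0, 4.0, 8.0]
```

Under these assumed values, raising `max_retries` from 2 to 5 adds 14 seconds of waiting on top of the time spent on the extra attempts themselves.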

## Timeouts

Requests time out after 1 minute by default. You can configure this with a `timeout` option:

<CodeGroup>
  ```python Python theme={null}
  from cerebras.cloud.sdk import Cerebras
  import httpx

  # Configure the default for all requests:
  client = Cerebras(
      timeout=20.0,  # 20 seconds (default is 1 minute)
  )

  # More granular control:
  client = Cerebras(
      timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
  )

  # Override per-request:
  client.with_options(timeout=5.0).chat.completions.create(
      messages=[
          {
              "role": "user",
              "content": "Why is fast inference important?",
          }
      ],
      model="gpt-oss-120b",
  )
  ```

  ```javascript Node.js theme={null}
  import Cerebras from '@cerebras/cerebras_cloud_sdk';

  // Configure the default for all requests:
  const client = new Cerebras({
    timeout: 20 * 1000, // 20 seconds (default is 1 minute)
  });

  // Override per-request:
  await client.chat.completions.create({ messages: [{ role: 'user', content: 'Why is fast inference important?' }], model: 'gpt-oss-120b' }, {
    timeout: 5 * 1000,
  });
  ```
</CodeGroup>

When a request times out, the SDK raises an `APITimeoutError`. Timed-out requests are retried twice by default, so a call can run for longer than a single timeout before it finally fails.
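
Because timeouts and retries combine, it is worth estimating how long a call can block before the error reaches your code. The sketch below is a rough estimate only: it assumes every attempt is abandoned after about `timeout` seconds and ignores per-phase settings such as separate connect or read timeouts. `worst_case_seconds` is a hypothetical helper, and `backoff_total` is whatever total wait you assume between attempts.

```python
def worst_case_seconds(
    timeout: float = 60.0,       # per-attempt timeout (default is 1 minute)
    max_retries: int = 2,        # retries after the first attempt (default is 2)
    backoff_total: float = 0.0,  # assumed total wait between attempts
) -> float:
    """Estimate how long a call may block if every attempt times out."""
    attempts = 1 + max_retries  # the original request plus each retry
    return attempts * timeout + backoff_total


print(worst_case_seconds())           # 180.0
print(worst_case_seconds(20.0, 0))    # 20.0
print(worst_case_seconds(5.0, 5))     # 30.0
```

With the defaults, a request that keeps timing out can occupy the caller for roughly three minutes, so latency-sensitive code may want a shorter `timeout`, fewer retries, or both.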
