# Error Codes

The Cerebras Inference API uses standard HTTP response status codes to indicate whether an API request succeeded or failed. When an error occurs, the SDK throws an exception that inherits from `cerebras.cloud.sdk.APIError`. This page describes the error types, shows how to handle them, and explains how to configure retries and timeouts.

## Error Types

All API errors raised by the SDK inherit from `cerebras.cloud.sdk.APIError`. The two main categories are:

1. `cerebras.cloud.sdk.APIConnectionError`: The SDK raises this error when the library cannot connect to the API.
2. `cerebras.cloud.sdk.APIStatusError`: The SDK raises this error when the API returns a non-success status code (4xx or 5xx).

## HTTP Status Codes

| Status Code | Error Type                                                                                                                                                              |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 400         | BadRequestError                                                                                                                                                         |
| 401         | AuthenticationError                                                                                                                                                     |
| 402         | PaymentRequired                                                                                                                                                         |
| 403         | PermissionDeniedError                                                                                                                                                   |
| 404         | NotFoundError                                                                                                                                                           |
| 413         | ContentTooLarge                                                                                                                                                         |
| 422         | UnprocessableEntityError                                                                                                                                                |
| 429         | <Tooltip tip="Learn how rate limits are applied and measured." cta="Read our Rate Limits guide." href="/support/rate-limits">RateLimitError/Too many requests</Tooltip> |
| 500         | InternalServerError                                                                                                                                                     |
| 503         | ServiceUnavailable                                                                                                                                                      |
| N/A         | APIConnectionError                                                                                                                                                      |
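
If you call the REST API directly with an HTTP client instead of the SDK, you may still want to log failures under the names in the table above. The following sketch uses a hypothetical helper, `describe_status`, which is not part of the SDK. It looks up the table's labels and falls back to a generic 4xx or 5xx category for codes the table does not list.

```python
# Labels copied from the status code table above.
ERROR_NAMES = {
    400: "BadRequestError",
    401: "AuthenticationError",
    402: "PaymentRequired",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    413: "ContentTooLarge",
    422: "UnprocessableEntityError",
    429: "RateLimitError",
    500: "InternalServerError",
    503: "ServiceUnavailable",
}


def describe_status(status_code: int) -> str:
    """Return a short label for an HTTP status code (hypothetical helper)."""
    if 200 <= status_code < 300:
        return "success"
    name = ERROR_NAMES.get(status_code)
    if name is not None:
        return name
    # Codes missing from the table still fall into a broad category.
    if 400 <= status_code < 500:
        return "client error (4xx)"
    if 500 <= status_code < 600:
        return "server error (5xx)"
    return "unexpected status"
```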

## Image Input Errors

Requests that include image inputs can fail with the following errors:

| Status                    | Condition                                                    | Error Detail                                                                   |
| ------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------------------------ |
| **413 Content Too Large** | Total image payload exceeds the configured request limit     | `"Total request size exceeds maximum"`                                         |
|                           | Number of image inputs exceeds the configured request limit  | `"Number of image inputs exceeds maximum"`                                     |
|                           | Decompressed RGB bytes exceed 350 MB for an individual image | `"Image decompression exceeds maximum memory limit"`                           |
| **400 Bad Request**       | `image_url.url` is not a valid `data:` URI                   | `"Invalid image_url: expected base64 data URI"`                                |
|                           | `image_url.url` is an HTTPS URL                              | `"HTTPS image URLs are not supported. Use a base64-encoded data URI instead."` |
|                           | Base64 payload is corrupt or image cannot be decoded         | `"Image data could not be decoded"`                                            |
|                           | `image_url` content part on a non-`user` role                | `"image_url content parts are only supported on user messages"`                |
|                           | Image sent to a model without multimodal support             | `"Model {model} does not support image inputs"`                                |
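
The 400 errors above mean that images must be inlined as base64 `data:` URIs on `user` messages. The following sketch builds such a message from a local file. It assumes an OpenAI-style content-part shape (a `text` part plus an `image_url` part whose `image_url.url` holds the URI), which matches the `image_url.url` field named in the table; check the vision guide for the exact schema and supported image formats. The helper names are hypothetical, not SDK functions.

```python
import base64
import mimetypes
from pathlib import Path


def bytes_to_data_uri(data: bytes, mime_type: str) -> str:
    """Inline raw image bytes as a base64 data: URI (HTTPS URLs are rejected)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_file_to_data_uri(path: str) -> str:
    """Read a local image file and return it as a base64 data: URI."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    return bytes_to_data_uri(Path(path).read_bytes(), mime_type)


def user_message_with_image(text: str, data_uri: str) -> dict:
    """Build a message with one text part and one image part.

    The role is always "user", because image_url content parts on
    other roles are rejected with a 400 error.
    """
    if not data_uri.startswith("data:"):
        raise ValueError("image_url.url must be a base64 data: URI, not an HTTPS URL")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},  # assumed OpenAI-style text part
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }
```

You would pass the result as `messages=[user_message_with_image("What is in this image?", image_file_to_data_uri("photo.png"))]` to a model with multimodal support. These client-side checks only catch the 400 cases; the 413 size and count limits are enforced by the server.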

## Handling Errors

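The following examples show how to handle different types of errors. In Python, order the `except` clauses from most to least specific: `RateLimitError` is a subclass of `APIStatusError`, so it must be caught first or its handler never runs.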

<CodeGroup>
  ```python Python theme={null}
  import cerebras.cloud.sdk
  from cerebras.cloud.sdk import Cerebras

  client = Cerebras()

  try:
      client.chat.completions.create(
          messages=[
              {
                  "role": "user",
                  "content": "This should cause an error!",
              }
          ],
          model="some-model-that-doesnt-exist",
      )
  except cerebras.cloud.sdk.APIConnectionError as e:
      print("The server could not be reached")
      print(e.__cause__)  # an underlying Exception, likely raised within httpx.
  except cerebras.cloud.sdk.RateLimitError as e:
      print("A 429 status code was received; we should back off a bit.")
  except cerebras.cloud.sdk.APIStatusError as e:
      print("Another non-200-range status code was received")
      print(e.status_code)
      print(e.response)
  ```

  ```javascript Node.js theme={null}
  import Cerebras from '@cerebras/cerebras_cloud_sdk';

  const client = new Cerebras({
    apiKey: process.env['CEREBRAS_API_KEY'], // This is the default and can be omitted
  });
  async function main() {
    const completion = await client.chat.completions
      .create({
        messages: [{ role: 'user', content: 'This should cause an error!' }],
        model: 'some-model-that-doesnt-exist', // Intentionally invalid model name
      })
      .catch(async (err) => {
        if (err instanceof Cerebras.APIError) {
          console.log(err.status); // HTTP status code, e.g. 400
          console.log(err.name); // Error class name, e.g. BadRequestError
          console.log(err.headers); // Response headers, e.g. {server: 'nginx', ...}
          console.log(err); // Full exception
        } else {
          throw err;
        }
      });
  }

  main();
  ```
</CodeGroup>

## Retries

By default, the SDK automatically retries the following errors up to two times, with a short exponential backoff between attempts:

* Connection errors
* 408 Request Timeout
* <Tooltip tip="Learn how rate limits are applied and measured." cta="Read our Rate Limits guide." href="/support/rate-limits">429 Rate Limit</Tooltip>
* \>= 500 Internal errors

You can configure or disable retries with the `max_retries` option (`maxRetries` in Node.js):

<CodeGroup>
  ```python Python theme={null}
  from cerebras.cloud.sdk import Cerebras

  # Configure the default for all requests:
  client = Cerebras(
      max_retries=0,  # Disable retries (default is 2)
  )

  # Or, configure per-request:
  client.with_options(max_retries=5).chat.completions.create(
      messages=[
          {
              "role": "user",
              "content": "Why is fast inference important?",
          }
      ],
      model="gpt-oss-120b",
  )
  ```

  ```javascript Node.js theme={null}
  import Cerebras from '@cerebras/cerebras_cloud_sdk';

  // Configure the default for all requests:
  const client = new Cerebras({
    maxRetries: 0, // default is 2
  });

  // Or, configure per-request:
  await client.chat.completions.create({ messages: [{ role: 'user', content: 'Why is fast inference important?' }], model: 'gpt-oss-120b' }, {
    maxRetries: 5,
  });
  ```
</CodeGroup>
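
The built-in retries cover most cases. If you set `max_retries=0` to take control yourself (for example, to add logging or a longer backoff), a retry loop could look like the following sketch. `call_with_backoff` and `is_retryable_status` are hypothetical helpers, not part of the SDK, and the delay values are illustrative rather than the SDK's actual backoff schedule. The loop uses capped exponential backoff with full jitter.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def is_retryable_status(status_code: Optional[int]) -> bool:
    """Mirror the retry list above; None stands for a connection error."""
    if status_code is None:
        return True
    return status_code in (408, 429) or status_code >= 500


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_retries: int = 2,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable failures with capped exponential backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            if attempt >= max_retries or not is_retryable(exc):
                raise  # non-retryable error, or retry budget spent
            # The ceiling doubles on each attempt, up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time in [0, ceiling] so that
            # many clients failing together do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
            attempt += 1
```

With the Python SDK, `is_retryable` could return `is_retryable_status(None)` for `cerebras.cloud.sdk.APIConnectionError` and `is_retryable_status(exc.status_code)` for `cerebras.cloud.sdk.APIStatusError`. Keep the client's `max_retries=0` in that case so the two retry layers do not multiply.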

## Timeouts

Requests time out after 1 minute by default. You can change this with the `timeout` option, which takes seconds in Python and milliseconds in Node.js:

<CodeGroup>
  ```python Python theme={null}
  from cerebras.cloud.sdk import Cerebras
  import httpx

  # Configure the default for all requests:
  client = Cerebras(
      timeout=20.0,  # 20 seconds (default is 1 minute)
  )

  # More granular control (seconds): 60 s default, with 5 s read, 10 s write, and 2 s connect limits
  client = Cerebras(
      timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
  )

  # Override per-request:
  client.with_options(timeout=5.0).chat.completions.create(
      messages=[
          {
              "role": "user",
              "content": "Why is fast inference important?",
          }
      ],
      model="gpt-oss-120b",
  )
  ```

  ```javascript Node.js theme={null}
  import Cerebras from '@cerebras/cerebras_cloud_sdk';

  // Configure the default for all requests:
  const client = new Cerebras({
    timeout: 20 * 1000, // 20 seconds (default is 1 minute)
  });

  // Override per-request:
  await client.chat.completions.create({ messages: [{ role: 'user', content: 'Why is fast inference important?' }], model: 'gpt-oss-120b' }, {
    timeout: 5 * 1000,
  });
  ```
</CodeGroup>

When a request times out, the SDK throws an `APITimeoutError`. By default, the SDK retries timed-out requests twice before throwing, so the error can take noticeably longer than the configured `timeout` to surface.
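
Because timed-out requests are retried, the wait before an `APITimeoutError` reaches your code can be a multiple of the configured timeout. The sketch below gives a rough estimate of that wait. It assumes each attempt is cut off after `timeout` seconds and that you supply the backoff delays yourself, since the SDK's exact backoff schedule is not documented here. `worst_case_wait` is a hypothetical helper.

```python
def worst_case_wait(timeout_s: float, max_retries: int, backoff_s: list[float]) -> float:
    """Rough upper bound on seconds spent before a timeout error surfaces.

    Assumes every attempt (1 initial + max_retries retries) runs to the full
    timeout and that backoff_s lists the sleep before each retry.
    """
    if len(backoff_s) != max_retries:
        raise ValueError("expected one backoff delay per retry")
    return (max_retries + 1) * timeout_s + sum(backoff_s)


# Default settings: 60 s timeout and 2 retries; the backoff delays are placeholders.
print(worst_case_wait(60.0, 2, [0.5, 1.0]))  # prints 181.5
```

To bound total latency more tightly, lower `timeout`, lower `max_retries`, or both.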