Error Codes

The Cerebras Inference API uses standard HTTP response status codes to indicate whether an API request succeeded or failed. When a request fails, the SDK raises an exception that inherits from cerebras.cloud.sdk.APIError. This page describes the error types, explains how to handle them, and gives examples of effective error management.

Error Types

All errors raised by the SDK inherit from cerebras.cloud.sdk.APIError. The two main categories are:

cerebras.cloud.sdk.APIConnectionError: Raised when the library is unable to connect to the API.
cerebras.cloud.sdk.APIStatusError: Raised when the API returns a non-success status code (4xx or 5xx).
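Because every SDK error derives from one base class, a single except clause on the base class catches both categories, while more specific clauses can separate them. The sketch below uses stand-in classes that only mirror the hierarchy described above; they are not imported from cerebras.cloud.sdk (the real constructors take different arguments), and the classify helper is hypothetical. This lets the example run without the SDK installed.

```python
# Stand-in classes mirroring the documented hierarchy; NOT the real SDK classes.
class APIError(Exception):
    """Base class for every error the SDK raises."""


class APIConnectionError(APIError):
    """The library could not reach the API."""


class APIStatusError(APIError):
    """The API answered with a 4xx or 5xx status code."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


def classify(exc: Exception) -> str:
    """Hypothetical helper: report which branch of the hierarchy catches exc."""
    try:
        raise exc
    except APIConnectionError:
        return "connection"
    except APIStatusError as e:
        return f"status {e.status_code}"
    except APIError:
        # Any other APIError subclass still lands here.
        return "other API error"


print(classify(APIConnectionError("unreachable")))    # connection
print(classify(APIStatusError("bad request", 400)))   # status 400
print(isinstance(APIStatusError("x", 500), APIError))  # True
```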

Error Codes and Corresponding Exceptions

Status Code	Error Type
400	BadRequestError
401	AuthenticationError
402	PaymentRequired
403	PermissionDeniedError
404	NotFoundError
422	UnprocessableEntityError
429	RateLimitError
>=500	InternalServerError
N/A	APIConnectionError
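The table above is a straightforward lookup from status code to exception type. As a sketch, the hypothetical helper below (not part of the SDK) encodes exactly the rows listed, treating a missing status (no HTTP response at all) as a connection error and returning None for codes the table does not cover.

```python
from typing import Optional

# Rows copied from the table above; this dict is illustrative, not SDK code.
STATUS_TO_EXCEPTION = {
    400: "BadRequestError",
    401: "AuthenticationError",
    402: "PaymentRequired",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    422: "UnprocessableEntityError",
    429: "RateLimitError",
}


def exception_name_for(status: Optional[int]) -> Optional[str]:
    """Hypothetical helper: name the exception the table lists for a status."""
    if status is None:
        # No HTTP response was received: the connection itself failed.
        return "APIConnectionError"
    if status >= 500:
        return "InternalServerError"
    # Codes the table does not list map to None.
    return STATUS_TO_EXCEPTION.get(status)


print(exception_name_for(404))   # NotFoundError
print(exception_name_for(503))   # InternalServerError
print(exception_name_for(None))  # APIConnectionError
```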

Handling Errors

Here’s an example of how to handle different types of errors:

import cerebras.cloud.sdk
from cerebras.cloud.sdk import Cerebras

client = Cerebras()

try:
    client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "This should cause an error!",
            }
        ],
        model="some-model-that-doesnt-exist",
    )
except cerebras.cloud.sdk.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)  # an underlying Exception, likely raised within httpx.
# RateLimitError is a subclass of APIStatusError, so it must be caught first.
except cerebras.cloud.sdk.RateLimitError as e:
    print("A 429 status code was received; we should back off a bit.")
except cerebras.cloud.sdk.APIStatusError as e:
    print("Another non-200-range status code was received")
    print(e.status_code)
    print(e.response)
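The 429 branch above only prints a message. A minimal sketch of actually backing off is shown below, assuming you wrap the request in your own helper; call_with_backoff, the RateLimited stand-in exception, and the delay values are all hypothetical. Injecting the sleep function keeps the demo fast and lets it record the waits instead of pausing.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last exception once every attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Demo: a stand-in exception and a function that fails twice, then succeeds.
class RateLimited(Exception):
    pass


calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("429")
    return "ok"


waits = []
print(call_with_backoff(flaky, (RateLimited,), sleep=waits.append))  # ok
print(waits)  # [0.5, 1.0]
```

With the real SDK, you would pass (cerebras.cloud.sdk.RateLimitError,) as retry_on and wrap the request in a lambda, for example call_with_backoff(lambda: client.chat.completions.create(...), (cerebras.cloud.sdk.RateLimitError,)). This backoff is applied on top of whatever retries the client already performs.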

Retries

By default, certain errors are retried automatically up to 2 times, with a short exponential backoff between attempts. These include:

Connection errors
408 Request Timeout
409 Conflict
>= 500 Internal errors
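The retry rule above can be sketched as a predicate plus a doubling delay schedule. This is an illustration only: the helper names are hypothetical, the predicate encodes just the cases listed above (with None standing in for a connection error), and the base delay, cap, and absence of jitter are assumptions, since the SDK's exact backoff timing is not documented here.

```python
from typing import List, Optional

DEFAULT_MAX_RETRIES = 2  # the documented default


def is_retried_by_default(status: Optional[int]) -> bool:
    """Hypothetical: True for the cases listed above; None = connection error."""
    if status is None:
        return True
    return status in (408, 409) or status >= 500


def backoff_schedule(
    max_retries: int = DEFAULT_MAX_RETRIES, base: float = 0.5, cap: float = 8.0
) -> List[float]:
    """Delay before each retry, doubling each time (base and cap are assumed)."""
    return [min(base * 2 ** i, cap) for i in range(max_retries)]


print([s for s in (400, 404, 408, 409, 500, 503) if is_retried_by_default(s)])
# [408, 409, 500, 503]
print(backoff_schedule())  # [0.5, 1.0]
```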

You can configure or disable retries with the max_retries option:

from cerebras.cloud.sdk import Cerebras

# Configure the default for all requests:
client = Cerebras(
    max_retries=0,  # Disable retries (default is 2)
)

# Or, configure per-request:
client.with_options(max_retries=5).chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Why is fast inference important?",
        }
    ],
    model="llama3.1-8b",
)
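The two styles above differ in scope: the constructor argument sets a default for every request, while with_options applies an override to a single call. The sketch below models that with a hypothetical, frozen settings object whose with_options returns a modified copy; it assumes the real client's with_options likewise leaves the original client untouched. It also makes the arithmetic explicit: max_retries counts retries, so the total number of attempts is max_retries + 1.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClientSettings:
    """Hypothetical stand-in for a client's retry and timeout configuration."""

    max_retries: int = 2
    timeout: float = 60.0

    def with_options(self, **overrides) -> "ClientSettings":
        # Return a modified copy; the original settings object is unchanged.
        return replace(self, **overrides)

    @property
    def max_attempts(self) -> int:
        # One initial attempt plus up to max_retries retries.
        return self.max_retries + 1


default = ClientSettings()
no_retries = ClientSettings(max_retries=0)
per_request = default.with_options(max_retries=5)

print(default.max_attempts)      # 3
print(no_retries.max_attempts)   # 1
print(per_request.max_attempts)  # 6
print(default.max_retries)       # 2 (unchanged by with_options)
```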

Timeouts

Requests time out after 1 minute by default. You can change this with the timeout option:

from cerebras.cloud.sdk import Cerebras
import httpx

# Configure the default for all requests:
client = Cerebras(
    timeout=20.0,  # 20 seconds (default is 1 minute)
)

# More granular control:
client = Cerebras(
    timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0),
)

# Override per-request:
client.with_options(timeout=5.0).chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Why is fast inference important?",
        }
    ],
    model="llama3.1-8b",
)

When a request times out, the SDK raises APITimeoutError. Note that requests that time out are retried twice by default.
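Because timed-out requests are retried, a single call can take noticeably longer than the configured timeout. The hypothetical helper below gives a rough estimate, assuming every attempt runs for the full timeout and the backoff delays are added on top. This is a simplification: httpx applies its timeout to individual connect, read, and write operations rather than to the request as a whole, so the real figure can differ.

```python
from typing import Sequence


def worst_case_seconds(
    timeout: float, max_retries: int, delays: Sequence[float] = ()
) -> float:
    """Rough wall-clock estimate if every attempt times out.

    Each of the (max_retries + 1) attempts is assumed to run for the full
    timeout; backoff delays between attempts (one per retry) are added.
    """
    if len(delays) not in (0, max_retries):
        raise ValueError("expected one delay per retry, or none")
    return (max_retries + 1) * timeout + sum(delays)


print(worst_case_seconds(60.0, 2))             # 180.0 (defaults: 1 minute, 2 retries)
print(worst_case_seconds(20.0, 0))             # 20.0
print(worst_case_seconds(5.0, 2, [0.5, 1.0]))  # 16.5
```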
