What is AI Suite?

AI Suite is a Python library that provides a unified interface for interacting with multiple large language model (LLM) providers. With AI Suite, you can easily switch between different providers and models using the same codebase, making it simple to compare performance, cost, and accuracy across providers. By integrating Cerebras with AI Suite, you can leverage Cerebras’s ultra-fast inference speeds while maintaining the flexibility to use other providers when needed. Learn more at the AI Suite GitHub repository.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key here.
  • Python 3.11 or higher installed on your system.

Configure AI Suite

Step 1: Install AI Suite

Install the AI Suite library together with the Cerebras SDK using pip. AI Suite is a lightweight package that provides the unified interface, and the Cerebras SDK is the backend it uses to reach Cerebras:
pip install aisuite cerebras-cloud-sdk
If you want to compare Cerebras with other providers (as shown in the examples below), you’ll also need to install their SDKs:
pip install openai anthropic
Step 2: Set up your API key

Configure your Cerebras API key as an environment variable. AI Suite will automatically detect and use this key when making requests to Cerebras:
export CEREBRAS_API_KEY="your-cerebras-api-key-here"
For a more permanent solution, add this to your .env file:
CEREBRAS_API_KEY=your-cerebras-api-key-here
If you’re using other providers for comparison, set their API keys as well:
export OPENAI_API_KEY="your-openai-api-key-here"
export ANTHROPIC_API_KEY="your-anthropic-api-key-here"
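If you go the .env route, you can load the file at startup with the python-dotenv package (pip install python-dotenv), or with a minimal hand-rolled loader. The load_env_file helper below is an illustrative sketch, not part of AI Suite:

```python
import os

def load_env_file(path=".env"):
    """Minimal .env loader: set KEY=value pairs into os.environ.

    Skips blank lines and comments, and never overrides a key that
    is already set in the environment.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

load_env_file()  # call this before creating the AI Suite client
```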
Step 3: Initialize the AI Suite client

Create an AI Suite client instance. This single client can be used to access any supported LLM provider, including Cerebras:
import aisuite as ai

client = ai.Client()
The client automatically configures itself based on your environment variables, so no additional setup is needed.
Step 4: Make your first request

To use Cerebras models through AI Suite, prefix the model name with cerebras: followed by the model identifier. Here’s a simple example that generates a response:
import aisuite as ai

client = ai.Client()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What are the benefits of fast inference?"}
]

response = client.chat.completions.create(
    model="cerebras:llama-3.3-70b",
    messages=messages,
    temperature=0.7
)

print(response.choices[0].message.content)
This code sends a chat completion request to Cerebras’s Llama 3.3 70B model and prints the response.

Compare Multiple Models

One of AI Suite’s key advantages is the ability to easily compare responses from different models. Here’s how to query multiple Cerebras models with the same prompt:
import aisuite as ai

client = ai.Client()

# Define different Cerebras models to compare
models = [
    "cerebras:llama-3.3-70b",
    "cerebras:qwen-3-32b",
    "cerebras:llama3.1-8b"
]

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to calculate fibonacci numbers."}
]

for model in models:
    print(f"\n--- Response from {model} ---")
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7
    )
    print(response.choices[0].message.content)
This approach allows you to:
  • Compare response quality across different Cerebras models
  • Benchmark inference speeds across model sizes
  • Test different models for specific use cases
  • Easily switch between models without changing your code structure
You can also compare Cerebras with other providers like OpenAI or Anthropic by adding their models to the list (e.g., "openai:gpt-4o", "anthropic:claude-opus-4-5"). Just make sure to install their SDKs and set the appropriate API keys as shown in the setup steps above.
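To turn the comparison loop above into a rough latency benchmark, you can time each call with time.perf_counter. The timed_completion helper below is our own sketch, not an AI Suite API:

```python
import time

def timed_completion(client, model, messages, **kwargs):
    """Run one chat completion and return (latency_seconds, text)."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model, messages=messages, **kwargs
    )
    latency = time.perf_counter() - start
    return latency, response.choices[0].message.content

# Usage (assumes client, models, and messages from the example above):
# for model in models:
#     latency, text = timed_completion(client, model, messages)
#     print(f"{model}: {latency:.2f}s, {len(text)} chars")
```

Wall-clock timing like this includes network latency, so run each model a few times and compare medians rather than single calls.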

Available Cerebras Models

You can use any of Cerebras’s production models through AI Suite by adding the cerebras: prefix:
  • cerebras:llama-3.3-70b - Best for complex reasoning, long-form content, and tasks requiring deep understanding
  • cerebras:qwen-3-32b - Balanced performance for general-purpose applications
  • cerebras:llama3.1-8b - Fastest option for simple tasks and high-throughput scenarios
  • cerebras:gpt-oss-120b - Large open-weight model for demanding tasks
  • cerebras:zai-glm-4.6 - Advanced 355B-parameter model with strong reasoning capabilities
For detailed information about each model’s capabilities and pricing, visit the Cerebras models page.

Advanced Usage

Adjusting Parameters

Customize model behavior with parameters like temperature, max_tokens, and top_p to fine-tune responses for your specific use case:
import aisuite as ai

client = ai.Client()

response = client.chat.completions.create(
    model="cerebras:llama-3.3-70b",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    temperature=0.3,  # Lower temperature for more focused responses
    max_tokens=500,   # Limit response length
    top_p=0.9        # Nucleus sampling parameter
)

print(response.choices[0].message.content)

Multi-Turn Conversations

Maintain context across multiple exchanges by building up your messages array:
import aisuite as ai

client = ai.Client()

messages = [
    {"role": "system", "content": "You are a helpful math tutor."},
    {"role": "user", "content": "What is the Pythagorean theorem?"}
]

# First response
response = client.chat.completions.create(
    model="cerebras:llama-3.3-70b",
    messages=messages
)

print(response.choices[0].message.content)

# Add assistant's response to conversation history
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content
})

# Continue the conversation
messages.append({
    "role": "user",
    "content": "Can you give me an example?"
})

response = client.chat.completions.create(
    model="cerebras:llama-3.3-70b",
    messages=messages
)

print(response.choices[0].message.content)
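If you find yourself repeating this append-and-resend pattern, a small wrapper can manage the history for you. The Chat class below is an illustrative sketch, not part of AI Suite:

```python
class Chat:
    """Keeps conversation history and appends each exchange automatically."""

    def __init__(self, client, model, system=None):
        self.client = client
        self.model = model
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def send(self, text):
        self.messages.append({"role": "user", "content": text})
        response = self.client.chat.completions.create(
            model=self.model, messages=self.messages
        )
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply

# chat = Chat(client, "cerebras:llama-3.3-70b", system="You are a helpful math tutor.")
# print(chat.send("What is the Pythagorean theorem?"))
# print(chat.send("Can you give me an example?"))
```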

Frequently Asked Questions

How does AI Suite handle API keys for multiple providers?

AI Suite automatically detects API keys from environment variables based on the provider prefix. For Cerebras, it looks for CEREBRAS_API_KEY. You can set multiple provider keys (like OPENAI_API_KEY, ANTHROPIC_API_KEY) and AI Suite will use the appropriate key based on the model prefix in your request.
Does AI Suite support asynchronous requests?

Currently, AI Suite focuses on synchronous API calls. For async operations, you may need to wrap calls in your own async functions or use the provider’s native SDK directly. Check the AI Suite GitHub repository for updates on async support.
How should I handle errors across providers?

Different providers may have different error formats. Wrap your API calls in try-except blocks and handle provider-specific errors. AI Suite attempts to normalize responses, but error handling may vary by provider.
try:
    response = client.chat.completions.create(
        model="cerebras:llama-3.3-70b",
        messages=messages
    )
except Exception as e:
    print(f"Error: {e}")
    # Fallback to another provider or handle error
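Building on that try-except pattern, you can fall back through a list of models until one succeeds. The complete_with_fallback helper below is our own sketch, not an AI Suite API:

```python
def complete_with_fallback(client, models, messages, **kwargs):
    """Try each model in order; return the first successful response.

    Re-raises the last error if every model fails.
    """
    last_error = None
    for model in models:
        try:
            return client.chat.completions.create(
                model=model, messages=messages, **kwargs
            )
        except Exception as e:
            last_error = e
    raise last_error

# response = complete_with_fallback(
#     client,
#     ["cerebras:llama-3.3-70b", "openai:gpt-4o"],
#     messages,
# )
```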
Does AI Suite support advanced features like function calling?

AI Suite provides a unified interface for basic chat completions. Advanced features like function calling depend on the underlying provider’s capabilities. Check the specific provider’s documentation for feature availability and implementation details.
How do I balance cost and performance when choosing a model?

Use AI Suite to benchmark different models for your specific use case. Cerebras offers competitive pricing with ultra-fast inference speeds. Start with smaller models like cerebras:llama3.1-8b for simple tasks and reserve larger models for complex reasoning. Visit the Cerebras pricing page for detailed cost information.

Troubleshooting

If you see an error about missing API keys:
  • Verify your CEREBRAS_API_KEY environment variable is set correctly
  • Ensure you’re running your script in the same terminal session where you exported the variable
  • Try using a .env file with a library like python-dotenv for persistent configuration
  • Restart your Python interpreter or IDE after setting environment variables
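A quick way to confirm the key is actually visible to your Python process is to check os.environ directly. The check_api_keys helper below is a diagnostic sketch, not AI Suite functionality; it reports which keys are set without printing their values:

```python
import os

def check_api_keys(names=("CEREBRAS_API_KEY",)):
    """Print set/MISSING status for each variable; return the missing ones."""
    missing = [n for n in names if not os.environ.get(n)]
    for name in names:
        print(f"{name}: {'MISSING' if name in missing else 'set'}")
    return missing

# missing = check_api_keys(("CEREBRAS_API_KEY", "OPENAI_API_KEY"))
# if missing:
#     print("Export the missing keys before creating the client.")
```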
If you receive a model not found error:
  • Verify you’re using the correct model name format: cerebras:model-name
  • Check that the model name matches one of the available Cerebras models
  • Ensure there are no typos in the model identifier (note the hyphen in llama-3.3-70b)
  • Confirm you’re using a current production model, not a deprecated version
If you experience connection issues:
  • Verify your internet connection is stable
  • Check that your API key is valid and has not expired in your dashboard
  • Ensure you’re not hitting rate limits (check your usage in the dashboard)
  • Try a simple test request to isolate the issue
If responses seem slower than expected:
  • Cerebras typically provides the fastest inference speeds in the industry
  • Compare with other providers using the multi-model example above to benchmark
  • Check your network latency and consider your geographic location relative to Cerebras’s servers
  • Verify you’re not using an unnecessarily large model for simple tasks
  • Ensure you’re not rate-limited or experiencing API throttling
If you encounter import errors:
  • Verify AI Suite is installed: pip show aisuite
  • Ensure you’re using the correct import statement: import aisuite as ai
  • Check your Python version is 3.11 or higher: python --version
  • Try reinstalling the package: pip install --upgrade aisuite