Try it yourself
In this tutorial we will build DevDuck, our multi-agent system that combines Cerebras' fast inference with Docker Compose for isolated AI development environments.
Initialization
First, make sure that you have Docker installed, which you can download from the official website.
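You can confirm that Docker and the Compose plugin are available before continuing:

```bash
# Verify Docker and the Compose plugin are installed
docker --version
docker compose version
```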
Clone the repository
Next, open a terminal and run the following commands:
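The exact repository URL depends on where DevDuck is hosted; as a sketch with a placeholder URL, the commands look like:

```bash
# Placeholder URL -- replace with the actual DevDuck repository
git clone https://github.com/<org>/devduck.git
cd devduck
```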
Set up your environment
Once you’ve cloned the repository, create a .env file with your Cerebras API key, which you can obtain from the Cerebras platform. DevDuck uses the ultra-fast gpt-oss-120b model through Cerebras for all agents.
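A minimal .env might look like the following; the exact variable name is an assumption, so check the repository's example environment file:

```bash
# Hypothetical variable name -- confirm against the repo's .env example
CEREBRAS_API_KEY=your_api_key_here
```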
Running DevDuck
All that’s left to do is run the program. To build and start DevDuck, run the command shown below. The compose setup spins up our agents and Docker’s MCP gateway, which manages the MCP tools for working with the Node.js sandbox containers.
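Assuming the repository ships a standard Compose file, this is typically:

```bash
# Build the images and start all DevDuck services
docker compose up --build
```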
Using DevDuck
The application will be available at http://0.0.0.0:8000. DevDuck has a coordinator agent that manages two specialized Cerebras-powered sub-agents for different programming tasks. To get started, simply ask for a task and the program will automatically route it to the correct agent: the DevDuck coordinator hands off to the appropriate Cerebras agent, which then executes your request in seconds.
Example Usage
🔢 Prime Number Checker
🌊 Fibonacci Sequence Generator
📊 Array Statistics Calculator
💡 Quick Tips for Sandbox Execution
- Use the word “EXECUTE” explicitly
- Keep functions simple and self-contained
- Provide specific test data
- Avoid external dependencies
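For example, a sandbox request that follows these tips might read as follows (the wording is only an illustration, not a required format):

```text
Write a JavaScript function isPrime(n) that returns true when n is prime.
Test it with 7, 12, and 29, then EXECUTE it in the sandbox.
```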
Available Models
DevDuck uses Cerebras models for ultra-fast code generation:

| Model | Parameters | Best For |
|---|---|---|
| llama-3.3-70b | 70B | Complex reasoning, long-form content, and tasks requiring deep understanding |
| qwen-3-32b | 32B | Balanced performance for general-purpose applications |
| llama3.1-8b | 8B | Fastest option for simple tasks and high-throughput scenarios |
| gpt-oss-120b | 120B | The most demanding tasks (default in DevDuck) |
| zai-glm-4.7 | 357B | Advanced model with strong reasoning capabilities |
Set the CEREBRAS_CHAT_MODEL environment variable to switch between models.
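For instance, to run DevDuck on the smaller llama3.1-8b model, you could add the following to your .env (placing it in .env is an assumption; otherwise export it in your shell before starting Compose):

```bash
# Switch all agents to the faster 8B model
CEREBRAS_CHAT_MODEL=llama3.1-8b
```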
