Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key here
  • Weights & Biases Account - Visit Weights & Biases and create an account or log in
  • Python 3.7 or higher

What is Weave?

Weave is W&B’s lightweight toolkit for tracking and evaluating LLM applications. It automatically captures traces of your LLM calls, including inputs, outputs, token usage, and latency, which makes it easy to debug issues, monitor performance, and iterate on your prompts and models.

Key features when using Weave with Cerebras:
  • Automatic tracing of all Cerebras API calls
  • Version control for your prompts and code
  • Performance monitoring with detailed metrics
  • Evaluation framework for testing model outputs
  • Beautiful UI for exploring traces and debugging

Configure Weave

1. Install required dependencies

Install the Weave SDK and Cerebras Cloud SDK to get started:
pip install weave cerebras-cloud-sdk
2. Configure environment variables

Create a .env file in your project directory with your API keys. You can find your W&B API key in your W&B settings.
CEREBRAS_API_KEY=your-cerebras-api-key-here
WANDB_API_KEY=your-wandb-api-key-here
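Note that the SDKs do not read the .env file on their own; the usual choice is python-dotenv's load_dotenv(). As a rough sketch of what that call does (load_env_file below is a hypothetical helper, not part of either SDK):

```python
import os
from pathlib import Path

# Minimal sketch of .env loading; python-dotenv's load_dotenv() is the standard tool.
def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=value lines from a .env file and export them to the environment."""
    values = {}
    env_path = Path(path)
    if env_path.exists():
        for line in env_path.read_text().splitlines():
            line = line.strip()
            # Skip blanks and comments; split on the first '=' only
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    # Don't overwrite variables that are already set in the environment
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

With python-dotenv installed, `from dotenv import load_dotenv; load_dotenv()` at the top of your script achieves the same effect.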
3. Initialize Weave and create your client

Weave needs to be initialized at the start of your script. This creates a project in W&B where all your traces will be logged.
import os
import weave
from cerebras.cloud.sdk import Cerebras

# Initialize Weave with your project name
weave.init("cerebras-quickstart")

# Create the Cerebras client
client = Cerebras(
    api_key=os.environ["CEREBRAS_API_KEY"]
)
The weave.init() call automatically starts tracking all LLM calls made through the Cerebras SDK. You don’t need to add any additional decorators or wrappers for basic tracing.
4. Make your first traced request

Now you can use the Cerebras SDK as usual. Weave will automatically capture all the details of your API calls, including the model used, messages sent, tokens consumed, and response time.
import os
import weave
from cerebras.cloud.sdk import Cerebras

# Initialize Weave (requires WANDB_API_KEY environment variable)
if os.getenv("WANDB_API_KEY"):
    weave.init("cerebras-quickstart")
else:
    print("WANDB_API_KEY not set - skipping Weave initialization")

# Create client with integration header
client = Cerebras(
    api_key=os.environ["CEREBRAS_API_KEY"]
)

# Make a request - Weave automatically traces this if initialized
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the fastest land animal?"}
    ],
    extra_headers={
        "X-Cerebras-3rd-Party-Integration": "weave"
    }
)

print(response.choices[0].message.content)
After running this code, visit your Weave dashboard to see the trace, including token usage, latency, and the full conversation.
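The same token counts Weave records are also available directly on the response object; the Cerebras SDK exposes OpenAI-style usage fields. As a sketch (summarize_usage is a hypothetical helper, not part of the SDK):

```python
# Hypothetical helper: summarize the usage fields Weave also records.
# Assumes the OpenAI-style `usage` object returned by the Cerebras SDK.
def summarize_usage(response) -> str:
    u = response.usage
    return (
        f"prompt={u.prompt_tokens} "
        f"completion={u.completion_tokens} "
        f"total={u.total_tokens}"
    )

# With the response from the snippet above:
# print(summarize_usage(response))
```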

Advanced Usage

Wrapping Functions with @weave.op

For more granular tracking, you can wrap your functions with the @weave.op decorator. This creates versioned operations that track inputs, outputs, and the code itself. This is especially useful when you want to track custom logic around your LLM calls.
import os
import weave
from cerebras.cloud.sdk import Cerebras

# Initialize Weave (requires WANDB_API_KEY environment variable)
if os.getenv("WANDB_API_KEY"):
    weave.init("cerebras-operations")
else:
    print("WANDB_API_KEY not set - skipping Weave initialization")

# Create client with integration header
client = Cerebras(
    api_key=os.environ["CEREBRAS_API_KEY"]
)

# Weave will track the inputs, outputs, and code of this function
@weave.op
def get_animal_speed(animal: str, model: str = "llama3.1-8b") -> str:
    """Get information about how fast an animal can run."""
    
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a zoology expert."},
            {"role": "user", "content": f"How fast can a {animal} run? Provide the speed in mph."}
        ],
        extra_headers={
            "X-Cerebras-3rd-Party-Integration": "weave"
        }
    )
    return response.choices[0].message.content

# Each call is tracked with its inputs and outputs
result1 = get_animal_speed("cheetah")
print(result1)
The @weave.op decorator provides:
  • Automatic versioning - Code changes create new versions
  • Input/output tracking - All parameters and returns are logged
  • Call hierarchy - See how operations call each other
  • Performance metrics - Track execution time for each operation

Creating Weave Models

Weave Models are a powerful way to encapsulate your LLM logic with configurable parameters. They make it easy to experiment with different model configurations and track which settings produce the best results.
import os
import weave
from cerebras.cloud.sdk import Cerebras

# Initialize Weave (requires WANDB_API_KEY environment variable)
if os.getenv("WANDB_API_KEY"):
    weave.init("cerebras-models")
else:
    print("WANDB_API_KEY not set - skipping Weave initialization")

# Create client with integration header
client = Cerebras(
    api_key=os.environ["CEREBRAS_API_KEY"]
)

class AnimalSpeedModel(weave.Model):
    """A model for predicting animal speeds using Cerebras."""
    
    model: str
    temperature: float
    system_prompt: str = "You are a zoology expert specializing in animal locomotion."

    @weave.op
    def predict(self, animal: str) -> str:
        """Predict the top speed of an animal."""
        
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": f"What's the top speed of a {animal}? Provide a specific number in mph."}
            ],
            temperature=self.temperature,
            extra_headers={
                "X-Cerebras-3rd-Party-Integration": "weave"
            }
        )
        return response.choices[0].message.content

# Create an instance with specific configuration
speed_model = AnimalSpeedModel(
    model="llama-3.3-70b",
    temperature=0.3
)

# Make predictions
result = speed_model.predict(animal="cheetah")
print(result)

# Try with different configuration
speed_model_creative = AnimalSpeedModel(
    model="qwen-3-32b",
    temperature=0.8
)
result2 = speed_model_creative.predict(animal="cheetah")
print(result2)
Weave Models provide:
  • Configuration tracking - All model parameters are versioned
  • Easy experimentation - Compare different configurations side-by-side
  • Reproducibility - Exact model settings are saved with each prediction
  • Evaluation ready - Models can be easily evaluated with Weave’s evaluation framework
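As a sketch of that last point: in Weave's evaluation framework, a scorer is a plain function whose keyword arguments match your dataset columns plus the model output. The scorer name and dataset rows below are illustrative assumptions, not part of this guide:

```python
# Illustrative scorer for Weave's evaluation framework.
# Weave passes dataset columns matching the scorer's parameters, plus the model output.
def contains_expected_speed(expected_mph: int, output: str) -> dict:
    """Check whether the model's answer mentions the expected speed."""
    return {"correct": str(expected_mph) in output}

# Plugged into an Evaluation, this would look roughly like:
# import asyncio, weave
# evaluation = weave.Evaluation(
#     dataset=[{"animal": "cheetah", "expected_mph": 70}],
#     scorers=[contains_expected_speed],
# )
# asyncio.run(evaluation.evaluate(speed_model))
```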
FAQ

If you don’t see traces in your Weave dashboard:
  1. Verify that weave.init() is called before any Cerebras API calls
  2. Check that your W&B API key is correctly set in your environment
  3. Ensure you’re logged into the correct W&B account in your browser
  4. Try running wandb login in your terminal to re-authenticate
  5. Wait a few seconds after your script completes for traces to sync
How much overhead does Weave add?

Weave is designed to be lightweight: tracing happens asynchronously, so it does not significantly impact your API call latency. Most users see less than 10ms of additional overhead per traced call.
Does Weave work with streaming responses?

Yes. Weave automatically handles streaming responses from the Cerebras SDK; the complete streamed response is captured in the trace once the stream finishes.
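As a sketch of the streaming pattern (collect_stream is a hypothetical helper; passing stream=True to chat.completions.create yields an iterator of chunks):

```python
# Sketch: accumulating text from a streamed chat completion.
# Each chunk carries a delta with a piece of the response text.
def collect_stream(stream) -> str:
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content may be None
            parts.append(delta)
    return "".join(parts)

# With a real client this would be used as:
# stream = client.chat.completions.create(
#     model="llama-3.3-70b",
#     messages=[{"role": "user", "content": "What's the fastest land animal?"}],
#     stream=True,
# )
# print(collect_stream(stream))
```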
How do I disable tracing?

You can disable tracing by simply not calling weave.init() at the start of your script. Alternatively, use an environment variable to enable Weave conditionally:
import os
import weave

if os.getenv("ENABLE_WEAVE") == "true":
    weave.init("my-project")
For additional support, visit the Weave GitHub repository or reach out in the W&B Community forums.