
The Cerebras Inference SDK supports tool use, which lets a model invoke developer-defined functions to carry out specific tasks on its behalf. This guide walks through a detailed example of tool use with the Cerebras Inference SDK.

For a more detailed conceptual guide to tool use and function calling, please visit our AI Agent Bootcamp section on the topic.

1. Initial Setup

To begin, we need to import the necessary libraries and set up our Cerebras client.

If you haven’t set up your Cerebras API key yet, please visit our QuickStart guide for detailed instructions on how to obtain and configure your API key.

import os
import json
import re
from cerebras.cloud.sdk import Cerebras

# Initialize Cerebras client
client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)
2. Setting Up the Tool

Our first step is to define the tool that our AI will use. In this example, we’re creating a simple calculator function that can perform basic arithmetic operations.

def calculate(expression):
    # Strip any characters that are not digits, arithmetic operators,
    # parentheses, or decimal points before evaluating.
    expression = re.sub(r'[^0-9+\-*/().]', '', expression)

    try:
        # eval() is used here for brevity; consider a safer expression
        # parser in production code.
        result = eval(expression)
        return str(result)
    except (SyntaxError, ZeroDivisionError, NameError, TypeError, OverflowError):
        return "Error: Invalid expression"
3. Defining the Tool Schema

Next, we define the tool schema. This schema acts as a blueprint for the AI, describing the tool’s functionality, when to use it, and what parameters it expects. It helps the AI understand how to interact with our custom tool effectively.

Please ensure that "strict": True is set inside the function object in the tool schema.

tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "strict": True,
            "description": "A calculator tool that can perform basic arithmetic operations. Use this when you need to compute mathematical expressions or solve numerical problems.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]
4. Making the API Call

With our tool and its schema defined, we can now set up the conversation for our AI. We will prompt the LLM using natural language to conduct a simple calculation, and make the API call.

This call sends our messages and tool schema to the LLM, allowing it to generate a response that may include tool use.

You must set parallel_tool_calls=False when using tool calling with llama-4-scout-17b-16e-instruct. The model doesn’t currently support parallel tool calling; support is planned for a future release.

messages = [
    {"role": "system", "content": "You are a helpful assistant with access to a calculator. Use the calculator tool to compute mathematical expressions when needed."},
    {"role": "user", "content": "What's the result of 15 multiplied by 7?"},
]

response = client.chat.completions.create(
    model="llama-4-scout-17b-16e-instruct",
    messages=messages,
    tools=tools,
    parallel_tool_calls=False,
)
5. Handling Tool Calls

Now that we’ve made the API call, we need to process the response and handle any tool calls the LLM may have made. The LLM decides, based on the prompt, whether to rely on a tool to respond to the user, so we need to check for tool calls and handle them appropriately.

In the code below, we first check whether the model’s response contains any tool calls. If one is present, we execute it. The function call is logged to show that the model requested a tool call, and the tool’s result is logged to make clear that it is not the model’s final output but the fulfillment of its request. The result is then passed back to the model so it can generate its final response.

choice = response.choices[0].message

if choice.tool_calls:
    function_call = choice.tool_calls[0].function
    if function_call.name == "calculate":
        # Logging that the model is executing a function named "calculate".
        print(f"Model executing function '{function_call.name}' with arguments {function_call.arguments}")

        # Parse the arguments from JSON format and perform the requested calculation.
        arguments = json.loads(function_call.arguments)
        result = calculate(arguments["expression"])

        # Note: This is the result of executing the model's request (the tool call), not the model's own output.
        print(f"Calculation result sent to model: {result}")

        # Send the result back to the model to fulfill the request.
        messages.append({
            "role": "tool",
            "content": json.dumps(result),
            "tool_call_id": choice.tool_calls[0].id
        })

        # Request the final response from the model, now that it has the calculation result.
        final_response = client.chat.completions.create(
            model="llama-4-scout-17b-16e-instruct",
            messages=messages,
        )

        # Handle and display the model's final response.
        if final_response:
            print("Final model output:", final_response.choices[0].message.content)
        else:
            print("No final response received")
else:
    # Handle cases where the model's response does not include expected tool calls.
    print("Unexpected response from the model")

Tool calling is currently enabled via prompt engineering, but strict adherence to expected outputs is not yet guaranteed. The LLM autonomously determines whether to call a tool. An update is in progress to improve reliability in future versions.

In this case, the LLM determined that a tool call was appropriate to answer the user’s question of what 15 multiplied by 7 is. See the output below.

Model executing function 'calculate' with arguments {"expression": "15 * 7"}
Calculation result sent to model: 105
Final model output: 15 * 7 = 105
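
Because strict adherence isn’t guaranteed, it can be worth guarding the argument parsing in Step 5 against malformed tool calls. Here is a minimal defensive sketch; the fallback behavior is our own choice, not SDK behavior:

# Defensive variant of the argument parsing from Step 5.
# Assumes `function_call` from the tool-call handling code above.
try:
    arguments = json.loads(function_call.arguments)
    expression = arguments["expression"]
except (json.JSONDecodeError, KeyError, TypeError) as exc:
    expression = None
    print(f"Could not parse tool arguments: {exc}")

if expression is not None:
    print(f"Calculation result sent to model: {calculate(expression)}")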

Multi-turn Tool Use

Most real-world workflows require more than one tool invocation. Multi-turn tool calling lets a model call a tool, incorporate its output, and then, within the same conversation, decide whether it needs to call the tool (or another tool) again to finish the task.

It works as follows:

  1. After every tool call you append the tool response to messages, then ask the model to continue.
  2. The model itself decides when enough information has been gathered to produce a final answer.
  3. Continue calling client.chat.completions.create() until you get a message without tool_calls.

Note: Multi-turn tool use is currently supported only with the qwen-3-32b model. Full multi-turn support for additional models is planned for a future SDK release. See Current Limitations of Multi-turn Tool Use for more information.

The example below demonstrates multi-turn tool use as an extension of the calculator example above. Before continuing, make sure you’ve completed Steps 1–3 from the calculator setup section.

messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant with a calculator tool. "
            "Use it whenever math is required."
        ),
    },
    {"role": "user", "content": "First, multiply 15 by 7. Then take that result, add 20, and divide the total by 2. What's the final number?"},
]

# Register every callable tool once
available_functions = {
    "calculate": calculate,
}

while True:
    resp = client.chat.completions.create(
        model="qwen-3-32b",
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message

    # If the assistant didn’t ask for a tool, we’re done
    if not msg.tool_calls:
        print("Assistant:", msg.content)
        break

    # Save the assistant turn exactly as returned
    messages.append(msg.model_dump())    

    # Run the requested tool
    call  = msg.tool_calls[0]
    fname = call.function.name

    if fname not in available_functions:
        raise ValueError(f"Unknown tool requested: {fname!r}")

    args_dict = json.loads(call.function.arguments)  # assumes JSON object
    output = available_functions[fname](**args_dict)

    # Feed the tool result back
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": json.dumps(output),
    })
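
One design note: the while True loop relies on the model eventually producing a turn without tool_calls. If you want a safety net, you can cap the number of round trips. Below is a minimal sketch under that assumption; run_with_tools is a hypothetical helper, and the cap of 10 is an arbitrary choice, not an SDK setting.

def run_with_tools(messages, max_turns=10):
    """Drive the tool-call loop, stopping after max_turns round trips."""
    for _ in range(max_turns):
        resp = client.chat.completions.create(
            model="qwen-3-32b",
            messages=messages,
            tools=tools,
        )
        msg = resp.choices[0].message

        # A turn without tool calls is the model's final answer.
        if not msg.tool_calls:
            return msg.content

        # Otherwise run the requested tool and feed the result back,
        # exactly as in the loop above.
        messages.append(msg.model_dump())
        call = msg.tool_calls[0]
        args = json.loads(call.function.arguments)
        output = available_functions[call.function.name](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(output),
        })

    raise RuntimeError(f"No final answer after {max_turns} tool-call rounds")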

Current Limitations of Multi-turn Tool Use

Multi-turn tool use is currently supported only with the qwen-3-32b model. Other models will error if you include a non-empty tool_calls array on an assistant turn.

For those models, make sure your assistant response explicitly clears its tool_calls like this:

{
  "role": "assistant",
  "content": "Here's the current temperature in Paris: 18°C",
  "tool_calls": []
}

Then append your "role": "tool" message containing the function output:

{
  "role": "tool",
  "tool_call_id": "abc123",
  "content": "Paris temperature is 18°C"
}
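
In Python, and assuming the msg, call, and output variables from the multi-turn loop above, that pattern looks roughly like this:

# Sketch: clear tool_calls on the assistant turn before appending the tool
# result. Assumes `msg`, `call`, and `output` from the multi-turn loop above.
assistant_turn = msg.model_dump()
assistant_turn["tool_calls"] = []  # models other than qwen-3-32b reject a non-empty list here
messages.append(assistant_turn)

messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps(output),
})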

Conclusion

Tool use is an important feature that extends the capabilities of LLMs by allowing them to access pre-defined tools. Here are some more resources to continue learning about tool use with the Cerebras Inference SDK.