How It Works
- Define the tool: Provide a name, description, and input parameters for each tool you want the model to access.
- Send the request: The prompt is sent along with available tool definitions in your API call.
- Model decides: The model analyzes the prompt and its available tools to decide if a tool can help answer the question. If it decides to use a tool, it responds with a structured output indicating which tool to call and what arguments to use.
- Execute the tool: The client application receives the model’s tool call request, executes the specified tool (such as calling an external API), and retrieves the result.
- Generate final response: The result from the tool is sent back to the model, which can then use this new information to generate a final, accurate response to the user.
Basic Tool Calling
1
Initial Setup
To begin, we need to import the necessary libraries and set up our Cerebras client.
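A minimal sketch of that setup, assuming the Cerebras Python SDK is installed and CEREBRAS_API_KEY is set in your environment:

```python
import json  # used later to parse tool call arguments
import os

from cerebras.cloud.sdk import Cerebras

# Initialize the client; the API key is read from the environment.
client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))
```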
2
Setting Up the Tool
Our first step is to define the tool that our AI will use. In this example, we’re creating a simple calculator function that can perform basic arithmetic operations.
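One possible sketch, where the function name and its reliance on Python expression evaluation are illustrative choices rather than requirements:

```python
def calculate(expression):
    """Evaluate a basic arithmetic expression and return the result as a string."""
    try:
        # Restrict evaluation to arithmetic by removing access to builtins.
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception:
        return "Error: invalid expression"
```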
3
Defining the Tool Schema
Next, we define the tool schema. This schema acts as a blueprint for the AI, describing the tool’s functionality, when to use it, and what parameters it expects. It helps the AI understand how to interact with our custom tool effectively.
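A schema for the calculator above might look like the following; the field descriptions are illustrative:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "strict": True,
            "description": "A calculator that can perform basic arithmetic. Use this whenever math is required.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The arithmetic expression to evaluate, e.g. '15 * 7'",
                    }
                },
                "required": ["expression"],
                "additionalProperties": False,
            },
        },
    }
]
```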
With strict: true enabled, tool call arguments are guaranteed to match your schema exactly through constrained decoding.
4
Making the API Call
With our tool and its schema defined, we can now set up the conversation for our AI. We will prompt the LLM in natural language to perform a simple calculation, then make the API call. This call sends our messages and tool schema to the LLM, allowing it to generate a response that may include tool use.
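A sketch of that request, building on the client and tools list defined above (the model name is an assumption; use any tool-capable model):

```python
messages = [
    {"role": "user", "content": "What is 15 multiplied by 7?"}
]

response = client.chat.completions.create(
    model="llama-3.3-70b",  # assumed model name
    messages=messages,
    tools=tools,
)
```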
5
Handling Tool Calls
Now that we’ve made the API call, we need to process the response and handle any tool calls the LLM might have made. The LLM decides, based on the prompt, whether to rely on a tool to respond to the user, so we need to check for tool calls and handle them appropriately.
In the code below, we first check whether the model’s response contains any tool calls. If one is present, we execute it and ensure the function is fulfilled correctly. The function call is logged to show that the model is requesting a tool call, and the result of the tool call is logged to clarify that it is not the model’s final output but rather the result of fulfilling its request. The result is then passed back to the model so it can continue generating a final response.
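A sketch of that handling logic, assuming the OpenAI-compatible response shape (message.tool_calls entries with a function name and JSON-encoded arguments, answered with role "tool" messages):

```python
choice = response.choices[0].message

if choice.tool_calls:
    tool_call = choice.tool_calls[0]
    if tool_call.function.name == "calculate":
        arguments = json.loads(tool_call.function.arguments)
        print(f"Model requested tool call: calculate({arguments['expression']})")

        # Execute the tool locally; this is our code running, not the model.
        result = calculate(arguments["expression"])
        print(f"Tool result (not the model's final answer): {result}")

        # Append the assistant's tool call and our result, then ask the model to continue.
        messages.append(choice)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result,
        })

        final = client.chat.completions.create(
            model="llama-3.3-70b",  # assumed model name
            messages=messages,
        )
        print(final.choices[0].message.content)
```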
Strict Mode for Tool Calling
Strict mode ensures that the model generates tool call arguments that exactly match your defined schema. This is essential for building reliable agentic workflows where invalid parameters could break your application.
Why Strict Mode Matters for Tools
Without strict mode, tool calls might include:
- Wrong parameter types (e.g., "2" instead of 2)
- Missing required parameters
- Unexpected extra parameters
- Malformed argument JSON
Enabling Strict Mode
Set strict to true inside the function object of your tool definition:
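For illustration, a sketch with a hypothetical get_weather tool, showing where strict sits in the definition:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "strict": True,  # arguments will exactly match the schema below
            "description": "Get the current temperature for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
                "additionalProperties": False,  # required in strict mode
            },
        },
    }
]
```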
Schema Requirements
When using strict mode, you must set additionalProperties: false. This is required for every object in your schema.
For information about schema limitations that apply when using strict mode, see Limitations in Strict Mode.
Strict Mode with Parallel Tool Calling
Strict mode works with parallel tool calling. When multiple tools are called simultaneously, each tool call’s arguments will conform to its respective schema:
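A sketch combining the two features, reusing the strict get_weather tool defined above (the model name is an assumption):

```python
response = client.chat.completions.create(
    model="llama-3.3-70b",  # assumed model name
    messages=[{"role": "user", "content": "What's the weather in Paris and in Rome?"}],
    tools=tools,  # the strict get_weather tool defined above
    parallel_tool_calls=True,
)

# Each parallel call's arguments are guaranteed to parse and match its tool's schema.
for tool_call in response.choices[0].message.tool_calls or []:
    args = json.loads(tool_call.function.arguments)
    print(tool_call.function.name, args)
```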
Multi-turn Tool Calling
Most real-world workflows require more than one tool invocation. Multi-turn tool calling lets a model call a tool, incorporate its output, and then, within the same conversation, decide whether it needs to call the tool (or another tool) again to finish the task. It works as follows:
- After every tool call, you append the tool response to messages, then ask the model to continue.
- The model itself decides when enough information has been gathered to produce a final answer.
- Continue calling client.chat.completions.create() until you get a message without tool_calls.
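A minimal version of that loop, reusing the calculator tool and client from the earlier steps (the model name is an assumption):

```python
messages = [{"role": "user", "content": "What is 15 multiplied by 7, minus 12?"}]

while True:
    response = client.chat.completions.create(
        model="llama-3.3-70b",  # assumed model name
        messages=messages,
        tools=tools,
    )
    message = response.choices[0].message

    # No tool calls means the model has produced its final answer.
    if not message.tool_calls:
        print(message.content)
        break

    # Otherwise, fulfill each tool call and let the model continue.
    messages.append(message)
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        # In real code, dispatch on tool_call.function.name.
        result = calculate(args["expression"])
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result,
        })
```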
Parallel Tool Calling
Parallel tool calling allows a model to request multiple tool calls in a single response. For example, if a user asks “Is Toronto warmer than Montreal?”, the model needs to check the weather in both cities. Rather than making two separate requests, parallel tool calling enables the model to request both operations at once, reducing latency and improving efficiency. Parallel tool calling is most beneficial when:
- A single query requires multiple independent data points (e.g., comparing weather in different cities)
- Multiple tools need to be invoked that don’t have dependencies on each other
- You want to reduce the number of API calls and overall response time
Enable Parallel Tool Calling
You can explicitly control this behavior using the parallel_tool_calls parameter:
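For example (the model name is an assumption):

```python
response = client.chat.completions.create(
    model="llama-3.3-70b",  # assumed model name
    messages=messages,
    tools=tools,
    parallel_tool_calls=True,  # set to False to force at most one tool call per turn
)
```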
Example: Weather Comparison
Let’s walk through a complete example that demonstrates parallel tool calling by comparing weather in two cities.
1
Define the Weather Tool
First, we’ll create a simple weather function and define the tool in our schema:
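A sketch of the tool and its schema; the hard-coded temperatures are placeholder data for the example:

```python
def get_weather(city):
    """Return a stubbed current temperature for a city, in Celsius."""
    fake_data = {"Toronto": 22, "Montreal": 18}  # placeholder values
    temp = fake_data.get(city, 20)
    return json.dumps({"city": city, "temperature_c": temp})

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "strict": True,
            "description": "Get the current temperature for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. 'Toronto'"},
                },
                "required": ["city"],
                "additionalProperties": False,
            },
        },
    }
]
```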
2
Make the API Call with Parallel Tool Calling Enabled
Now we’ll send a query that requires checking weather in two different cities:
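Building on the setup above (the model name is an assumption):

```python
messages = [{"role": "user", "content": "Is Toronto warmer than Montreal?"}]

response = client.chat.completions.create(
    model="llama-3.3-70b",  # assumed model name
    messages=messages,
    tools=tools,
    parallel_tool_calls=True,
)
```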
3
Handle Multiple Tool Calls
When parallel tool calling is enabled, the model’s response may contain multiple tool calls in the tool_calls array. We need to iterate through all of them:
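A sketch of that loop, appending one role "tool" message per call before asking the model for its final comparison:

```python
message = response.choices[0].message

if message.tool_calls:
    # Record the assistant turn that requested the tools.
    messages.append(message)

    # Execute every requested call and append each result.
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = get_weather(args["city"])
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result,
        })

    # Send the results back so the model can compare the two cities.
    final = client.chat.completions.create(
        model="llama-3.3-70b",  # assumed model name
        messages=messages,
    )
    print(final.choices[0].message.content)
```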
