Introduction

Since their inception, large language models (LLMs) have impressed us with their ability to write eloquent prose, generate code, and answer seemingly any question we pose. The technology, along with the applications built on it, offers glimpses of the use cases that could be unlocked in the near future.

One of the most promising areas receiving significant research and engineering focus is AI agents. Currently, most interactions with LLMs involve users either prompting a chatbot or using LLM-powered features within applications to autocomplete or summarize text. AI agents, or ‘agentic workflows,’ introduce a design pattern for LLM application development that enables developers to create more complex applications. By incorporating components such as planning, memory, and tool access, AI agents can interact with various environments, reason through their actions, and enhance their own outputs.

An important aspect of agentic workflows, as hinted at earlier, is that they often require multiple steps to complete a task, and each step may involve one or several calls to an LLM. This makes low-latency inference crucial when building AI agents: a slow inference solution can make the application so sluggish that it becomes unusable. See our blog post on ReadAgent for an example of the benefits of low-latency inference.

At Cerebras, we’ve designed an inference solution that excels in these workflows with its low-latency performance, and we’re excited to see the types of AI agents it will enable. To introduce more developers to the possibilities of agentic workflows and the Cerebras SDK, we’re launching a series on building AI agents, which will cover the various components of the agentic workflow.

In this first post, we will introduce the key components of an AI agent. In subsequent articles, we will explore each component in greater detail and build applications that apply the methods discussed.

AI Agent Control Flow

To understand how AI agents work, let’s first examine their unique approach to control flow. In programming terms, control flow refers to the sequence in which individual statements, instructions, or function calls are executed. Unlike traditional software or standard LLM applications, AI agents delegate this decision-making to the AI itself, enabling it to determine the sequence and nature of actions based on natural language inputs and instructions. This capability is essential for handling complex, open-ended tasks.
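
To make this concrete, here is a minimal sketch of AI-driven control flow in Python. The call_llm helper is a hypothetical stand-in for whatever inference API you use (its replies are canned here so the sketch runs offline); the point is that the model’s reply, rather than hard-coded branching, determines which function executes next.

```python
# A minimal sketch of AI-driven control flow. call_llm is a placeholder
# for a real inference API; its replies are canned so the sketch runs
# without network access.

_canned = iter(["search", "summarize", "finish"])

def call_llm(prompt: str) -> str:
    # Stand-in for a real inference call.
    return next(_canned)

ACTIONS = {
    "search": lambda: "ran a web search",
    "summarize": lambda: "summarized the findings",
}

def run_agent(task: str, max_steps: int = 5) -> None:
    history: list[str] = []
    for _ in range(max_steps):
        # The LLM chooses the next step based on the task and progress so far.
        action = call_llm(
            f"Task: {task}\nSteps so far: {history}\n"
            f"Choose the next action from {list(ACTIONS) + ['finish']}."
        ).strip().lower()
        if action not in ACTIONS:  # 'finish' or an unrecognized reply ends the loop
            break
        history.append(ACTIONS[action]())
    print(history)

run_agent("Find recent news about AI agents and summarize it")
```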

It’s important to note that AI-driven control flow doesn’t have to be an all-or-nothing approach. In practical applications, it’s common and often beneficial to have a mix of AI-driven and traditional software-driven control. Some modules or components of a system can rely on AI for decision-making and task execution, while others can use conventional programming logic. This hybrid approach allows developers to leverage the strengths of both paradigms – the flexibility and natural language understanding of AI agents, and the predictability and efficiency of traditional software. We recently explored an example of this paradigm in a blog post outlining how to build an AI agent for writing marketing copy.

Underlying this control flow are four key components that form the foundation of agentic workflows. Let’s explore each of these elements to gain a more comprehensive understanding of how AI agents function.

The Components of Agentic Workflows 

The field of AI engineering is still in its infancy and continually evolving, and so are the components that form an AI agent. That said, the most common components found in agentic workflows are tool use, planning, reflection, and memory.

Tool Use

Our standard interactions with LLMs typically involve prompting the model for a response generated either from the model’s training data or external data provided in the prompt. Tool use extends an LLM’s capabilities by enabling an AI agent to interact with its environment using tools essential for completing assigned tasks. For example, these tools can allow the AI agent to retrieve data from external sources and write to them.

In the context of AI agents, tools are simply functions that execute specific pieces of code. The tools available to the model are defined in a tool schema, which includes the tool’s name, description, available parameters, and a description of what each parameter does. When an AI agent is prompted to complete a task, it refers to the tool schema to select the appropriate tool for that task.
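
The exact schema format varies by provider, but a common shape describes each tool with a name, a description, and a JSON-Schema-style parameters object. Below is a sketch of such a schema for a hypothetical search_web tool; the field names mirror the function-calling format used by several popular chat-completion APIs.

```python
# A hypothetical tool schema. The shape is typical of JSON-Schema-based
# function-calling formats, but check your provider's documentation for
# the exact fields it expects.
search_tool_schema = {
    "name": "search_web",
    "description": "Search the web and return the top results for a query.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query to run.",
            },
            "max_results": {
                "type": "integer",
                "description": "How many results to return.",
            },
        },
        "required": ["query"],
    },
}
```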

Imagine a workflow where an AI agent needs to convert between two currencies. Without access to external tools, such as an API that provides up-to-date currency data, the model wouldn’t know the current conversion rate. However, with the use of tools, developers can define a function that utilizes an available API to retrieve the necessary currency values and perform the conversion.
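
Here is a sketch of what that currency tool might look like. The hard-coded rates are illustrative stand-ins for a live exchange-rate API, and the convert_currency name and schema are our own invention for this example.

```python
# A sketch of a currency-conversion tool. The hard-coded rates stand in
# for a live exchange-rate API, which a real agent would call to get
# up-to-date values.

RATES_TO_USD = {"USD": 1.0, "EUR": 1.09, "GBP": 1.27}  # illustrative only

def convert_currency(amount: float, from_currency: str, to_currency: str) -> float:
    """Convert an amount between two currencies via USD."""
    usd = amount * RATES_TO_USD[from_currency]
    return usd / RATES_TO_USD[to_currency]

convert_tool_schema = {
    "name": "convert_currency",
    "description": "Convert an amount from one currency to another.",
    "parameters": {
        "type": "object",
        "properties": {
            "amount": {"type": "number", "description": "Amount to convert."},
            "from_currency": {"type": "string", "description": "ISO code, e.g. EUR."},
            "to_currency": {"type": "string", "description": "ISO code, e.g. GBP."},
        },
        "required": ["amount", "from_currency", "to_currency"],
    },
}

print(round(convert_currency(100, "EUR", "GBP"), 2))
```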

As mentioned earlier, tool use can also be applied to writing data to external sources. For example, an AI agent could assist with screening emails from potential clients. Available tools could fetch emails from an inbox, screen them to identify leads, extract relevant information, and write it to a database such as a CRM.
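
A stubbed sketch of that workflow is shown below. Each function stands in for a real tool (an inbox API, an LLM classifier, a CRM API); in a full agent, the model would select among these tools via their schemas, whereas plain Python drives the loop here for brevity.

```python
# A stubbed sketch of the email-screening workflow. Every function here is
# a placeholder for a real tool implementation.

def fetch_emails() -> list[str]:
    # Stand-in for an inbox API call.
    return ["Hi, could we get a quote for 500 units?", "Please unsubscribe me."]

def is_lead(email: str) -> bool:
    # Stand-in for an LLM call that classifies the email.
    return "quote" in email.lower()

def write_to_crm(record: dict) -> None:
    # Stand-in for a CRM API call.
    print(f"Saved lead: {record}")

for email in fetch_emails():
    if is_lead(email):
        write_to_crm({"source": "inbox", "text": email})
```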

Planning

When handling complex workflows, AI agents must execute several intermediate steps to complete their assigned tasks. To do this, they need to plan or ‘reason’ about the logical sequence of actions required. This process typically involves prompting the agent to think through the task step-by-step and outline a plan of action that it can refer to throughout. During this planning phase, the AI agent must consider factors such as the user’s request, the available tools, and the information needed to complete each subsequent step.
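
One simple way to implement this planning step, again using a canned call_llm placeholder in place of a real inference API, is to ask the model for a numbered plan and parse it into a list the agent can refer to as it executes:

```python
# A sketch of the planning step. call_llm returns a canned plan so the
# example runs offline; a real agent would call an inference endpoint here.

def call_llm(prompt: str) -> str:
    return (
        "1. Look up the current EUR-to-GBP exchange rate\n"
        "2. Convert the amount using the rate\n"
        "3. Report the result to the user"
    )

def make_plan(task: str, tool_names: list[str]) -> list[str]:
    prompt = (
        f"Task: {task}\nAvailable tools: {tool_names}\n"
        "Think step by step and reply with a numbered plan, one step per line."
    )
    # Parse the model's reply into a list of steps the agent can follow.
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

for step in make_plan("Convert 100 EUR to GBP", ["convert_currency"]):
    print(step)
```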

AI agents may also refine an initial plan based on new information. This can be done with a human-in-the-loop, where the user reviews and approves the plan before the AI agent executes it, or by having another AI system evaluate and improve the plan—a technique known as reflection, which we’ll explore shortly.
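
The human-in-the-loop variant can be as simple as showing the plan and gating execution on approval, as in this sketch:

```python
# A sketch of human-in-the-loop plan review: the user sees the proposed
# plan and must approve it before the agent executes anything.

def review_plan(plan: list[str]) -> bool:
    print("Proposed plan:")
    for step in plan:
        print(f"  {step}")
    return input("Approve this plan? [y/n] ").strip().lower().startswith("y")

if review_plan(["1. Look up the exchange rate", "2. Convert the amount"]):
    print("Executing plan...")
else:
    print("Plan rejected; asking the model to revise.")
```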

The effectiveness of planning for AI agents is an active and still evolving area of research. A recent paper titled ‘HuggingGPT’, authored by researchers at Microsoft and Zhejiang University, demonstrated the use of an LLM as a planning tool. In this approach, the AI agent is given access to several AI models available on Hugging Face. The LLM handles task planning by analyzing user requests, breaking them down into subtasks, and then selecting the appropriate models from Hugging Face for each subtask. This global planning approach enables the system to address a wide range of AI tasks across language, vision, audio, and video domains.

Reflection

A simple yet effective strategy for developing AI agents is reflection. LLMs have been shown to produce better results when given feedback on how to improve their outputs. For example, a code sample generated by an LLM might use an inefficient design pattern that could slow down execution compared to a more optimized solution. A human expert, such as a software engineer, could critique the LLM’s output and suggest improvements to address the issue. While including a human-in-the-loop is necessary for many workflows, it can lead to slower iteration times. Another effective approach is to have the LLM critique its own outputs. An AI agent can start by generating an initial output for a given prompt, then the same model can be prompted to reflect on this output and suggest improvements. Through this iterative process, the AI agent continuously enhances its own output.
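
A minimal version of this self-reflection loop, assuming the same kind of call_llm placeholder as before, might look like the following: generate a draft, ask the model to critique it, then feed the critique back in for a rewrite.

```python
# A minimal self-reflection loop. call_llm is a placeholder for a real
# inference API; the same model drafts, critiques, and revises.

def call_llm(prompt: str) -> str:
    # Stand-in for a real inference call.
    return "placeholder model response"

def refine(task: str, rounds: int = 2) -> str:
    draft = call_llm(f"Complete this task:\n{task}")
    for _ in range(rounds):
        feedback = call_llm(
            f"Task: {task}\nDraft:\n{draft}\n"
            "Critique the draft and list concrete improvements."
        )
        draft = call_llm(
            f"Task: {task}\nDraft:\n{draft}\nFeedback:\n{feedback}\n"
            "Rewrite the draft, applying every point of feedback."
        )
    return draft

print(refine("Write a function that merges two sorted lists"))
```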

One paper that explores this concept is ‘SELF-REFINE,’ a collaboration between researchers at Carnegie Mellon University, the Allen Institute for AI, the University of Washington, NVIDIA, UC San Diego, and Google Research. The study tested a method that improved initial outputs from LLMs through iterative feedback and refinement. Researchers prompted an LLM across various tasks, including coding, dialogue response, and mathematical reasoning. After generating an initial output, the same model provided feedback on its response, which was then incorporated into the next prompt to improve the subsequent output. On average, the model’s outputs improved by approximately 20% across the tasks tested (see Section 3 of the paper for details).

Memory

The final component of agentic workflows is memory, which enables agents to retain and utilize information over time. In the context of agentic workflows, we differentiate between short-term and long-term memory. Short-term memory pertains to the model’s context, which is useful for storing information from previous user interactions and data immediately relevant to the task at hand. Long-term memory, on the other hand, involves larger volumes of data typically stored in an external database and retrieved by the AI agent as needed. By leveraging both types of memory, AI agents can provide more relevant and personalized interactions. Additionally, they can retain information related to previous steps in a larger task, enabling them to handle more complex use cases.
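
The sketch below illustrates the two memory types. Short-term memory is just the message history carried in the model’s context; long-term memory is stubbed with a keyword lookup, where a real agent would query something like a vector database. All names here are illustrative.

```python
# A sketch of short- and long-term memory. The long-term store and its
# keyword retrieval are placeholders; a real agent would typically use
# embeddings and a vector database instead.

short_term_memory: list[dict] = []  # recent messages kept in the prompt

long_term_store = {
    "preferences": "User prefers concise answers with code samples.",
    "project": "User is building a currency-conversion agent.",
}

def retrieve(query: str) -> list[str]:
    # Placeholder retrieval: substring matching instead of vector search.
    return [text for key, text in long_term_store.items() if key in query.lower()]

def build_prompt(user_message: str) -> str:
    short_term_memory.append({"role": "user", "content": user_message})
    recalled = "\n".join(retrieve(user_message))
    history = "\n".join(m["content"] for m in short_term_memory[-5:])
    return f"Relevant memories:\n{recalled}\n\nConversation so far:\n{history}"

print(build_prompt("Given my preferences, how should the agent answer?"))
```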

Conclusion

In this first post, we explored the components that make up agentic workflows and highlighted some key differences in control flow between traditional software and agentic workflows. AI agents are an exciting area within the LLM space because they enable developers to maximize a model’s potential without modifying the model itself. This makes them a crucial topic for developers interested in building with LLMs. In the following posts, we’ll delve deeper and create applications that incorporate tools, reflection, planning, memory, and even more advanced topics such as multi-agent workflows.