# LLM Agent Architecture

## Overview

{[[wang2024LLMAgentSurvey]]} provide a comprehensive survey of the use of LLMs as agents.

> ## 2.1 Agent Architecture Design
>
> Recent advancements in LLMs have demonstrated their great potential to accomplish a wide range of tasks in the form of question-answering (QA). However, building autonomous agents is far from QA, since they need to fulfill specific roles and autonomously perceive and learn from the environment to evolve themselves like humans. To bridge the gap between traditional LLMs and autonomous agents, a crucial aspect is to design rational agent architectures to assist LLMs in maximizing their capabilities. Along this direction, previous work has developed a number of modules to enhance LLMs. In this section, we propose a unified framework to summarize these modules. Specifically, the overall structure of our framework is illustrated in Figure 2, which is composed of a profiling module, a memory module, a planning module, and an action module. The purpose of the profiling module is to identify the role of the agent. The memory and planning modules place the agent into a dynamic environment, enabling it to recall past behaviors and plan future actions. The action module is responsible for translating the agent’s decisions into specific outputs. Within these modules, the profiling module impacts the memory and planning modules, and collectively, these three modules influence the action module. In the following, we detail these modules.

| ![wang2024fig2](<./_images/wang2024LLMAgentSurvey_fig2.png>) |
|:--:|
| *Figure 2 from {[[wang2024LLMAgentSurvey]]}.* |

### Profiling Module

Autonomous agents typically perform tasks by assuming specific roles, such as coders, teachers, and domain experts. The profiling module aims to indicate the profiles of the agent roles, which are usually written into the prompt to influence the behavior of the LLM. The choice of information to profile the agent is largely determined by the specific application scenario. Existing literature commonly employs the following three strategies for creating specific profiles.

#### **Handcrafting Method**

In this method, agent profiles are manually specified. For instance, if one would like to design agents with different personalities, one can use "you are an outgoing person" or "you are an introverted person" to profile the agent. The handcrafting method has been leveraged in much previous work to specify agent profiles. In general, the handcrafting method is very flexible, since one can assign any profile information to the agents. However, it can also be labor-intensive, particularly when dealing with a large number of agents.

#### **LLM-generation Method**

In this method, agent profiles are automatically generated by LLMs. Typically, this begins by specifying the profile generation rules, elucidating the composition and attributes of the agent profiles within the target population. Then, one can optionally specify several seed agent profiles to serve as few-shot examples. Finally, LLMs are leveraged to generate all the agent profiles. However, this method may lack precise control over the generated profiles, which can result in inconsistencies or deviations from the intended characteristics.

#### **Dataset Alignment Method**

In this method, the agent profiles are obtained from real-world datasets. Typically, one can first organize information about real humans in the datasets into natural language prompts, and then leverage these to profile the agents.
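As a concrete illustration of how a profile ends up steering the LLM, here is a minimal Python sketch of the handcrafting and LLM-generation strategies. The `AgentProfile` fields, the `call_llm` stub, and the prompt wording are illustrative assumptions, not an interface prescribed by the survey.

```python
# A minimal sketch (not from the survey) of the handcrafting and
# LLM-generation profiling strategies. Field names and the `call_llm`
# stub are illustrative assumptions, not a prescribed interface.

from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Stand-in for any chat-completion client."""
    raise NotImplementedError("plug in a real LLM client here")

@dataclass
class AgentProfile:
    role: str          # e.g. "coder", "teacher", "domain expert"
    personality: str   # e.g. "outgoing", "introverted"

    def to_system_prompt(self) -> str:
        # Handcrafting method: the profile text is written by hand and
        # injected into the prompt to steer the LLM's behavior.
        return f"You are an {self.personality} {self.role}."

def generate_profiles(rules: str, seed_profiles: list[str], n: int) -> list[str]:
    # LLM-generation method: generation rules plus a few seed profiles
    # (few-shot examples) are handed to the LLM, which produces the rest.
    prompt = (
        f"Profile generation rules: {rules}\n"
        "Seed profiles:\n"
        + "\n".join(f"- {p}" for p in seed_profiles)
        + f"\nWrite {n} further profiles, one per line."
    )
    return call_llm(prompt).splitlines()

# Usage: a handcrafted profile becomes the system prompt of the agent.
prompt = AgentProfile(role="coder", personality="introverted").to_system_prompt()
```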
> ***Remark*** While most of the previous work leverages the above profile generation strategies independently, we argue that combining them may yield additional benefits. The profile module serves as the foundation for agent design, exerting significant influence on the agent’s memorization, planning, and action procedures.

### Memory Module

The memory module plays a very important role in the agent architecture design. It stores information perceived from the environment and leverages the recorded memories to facilitate future actions. The memory module can help the agent to accumulate experiences, self-evolve, and behave in a more consistent, reasonable, and effective manner. This section provides a comprehensive overview of the memory module, focusing on its structures, formats, and operations.

#### **Memory Structures**

LLM-based autonomous agents often draw inspiration from cognitive science research on human memory processes. Human memory follows a general progression from sensory memory that registers perceptual inputs, to short-term memory that maintains information transiently, to long-term memory that consolidates information over extended periods. When designing agent memory structures, researchers take inspiration from these aspects of human memory. In particular, short-term memory is analogous to the input information within the context window constrained by the transformer architecture. Long-term memory resembles the external vector storage that agents can rapidly query and retrieve from as needed. In the following, we introduce two commonly used memory structures based on short-term and long-term memories.

- *Unified Memory*. This structure only simulates the human short-term memory, which is usually realized by in-context learning, and the memory information is directly written into the prompts. In practice, implementing short-term memory is straightforward and can enhance an agent’s ability to perceive recent or contextually sensitive behaviors and observations. However, the limited context window of LLMs restricts incorporating comprehensive memories into prompts, which can impair agent performance. This challenge necessitates LLMs with larger context windows and the ability to handle extended contexts. Consequently, numerous researchers turn to hybrid memory systems to mitigate this issue.
- *Hybrid Memory*. This structure explicitly models the human short-term and long-term memories. The short-term memory temporarily buffers recent perceptions, while long-term memory consolidates important information over time. Users can create multiple conversations with different agents on the same canvas, facilitating the sharing of memory objects through a simple drag-and-drop interface. In practice, integrating both short-term and long-term memories can enhance an agent’s ability for long-range reasoning and accumulation of valuable experiences, which are crucial for accomplishing tasks in complex environments.

> ***Remark*** Careful readers may find that there may also exist another type of memory structure, namely one based only on long-term memory. However, we find that such a type of memory is rarely documented in the literature. Our speculation is that agents are always situated in continuous and dynamic environments, with consecutive actions displaying a high correlation. Therefore, the capture of short-term memory is very important and usually cannot be disregarded.
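The hybrid structure can be pictured as a bounded short-term buffer (what fits in the context window) backed by an unbounded long-term store that is queried on demand. The sketch below is a minimal illustration under those assumptions; the class name, the buffer size, and the keyword-overlap retrieval standing in for vector search are all placeholders.

```python
# Sketch of a hybrid memory structure: a small short-term buffer that is
# written directly into the prompt, plus a long-term store that is queried
# on demand. Names and the keyword-overlap retrieval are illustrative only.

from collections import deque

class HybridMemory:
    def __init__(self, short_term_size: int = 8):
        self.short_term = deque(maxlen=short_term_size)  # recent observations
        self.long_term: list[str] = []                   # consolidated records

    def write(self, observation: str) -> None:
        # New perceptions enter short-term memory; once the buffer is full,
        # the oldest entry is consolidated into long-term memory.
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])
        self.short_term.append(observation)

    def read(self, query: str, k: int = 3) -> list[str]:
        # Long-term retrieval: naive keyword overlap stands in for the
        # embedding similarity search a real system would use.
        scored = sorted(
            self.long_term,
            key=lambda m: len(set(query.lower().split()) & set(m.lower().split())),
            reverse=True,
        )
        return list(self.short_term) + scored[:k]

memory = HybridMemory()
memory.write("The user asked for a summary of chapter 2.")
context = memory.read("chapter 2 summary")  # goes into the next prompt
```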
#### **Memory Formats**

In addition to the memory structure, another perspective from which to analyze the memory module is the format of the memory storage medium, for example, natural language memory or embedding memory. Different memory formats possess distinct strengths and are suitable for different applications. In the following, we introduce several representative memory formats.

- *Natural Languages*. In this format, memory information such as agent behaviors and observations is directly described using raw natural language. This format possesses several strengths. Firstly, the memory information can be expressed in a flexible and understandable manner. Moreover, it retains rich semantic information that can provide comprehensive signals to guide agent behaviors.
- *Embeddings*. In this format, memory information is encoded into embedding vectors, which enhances both retrieval and reading efficiency.
- *Databases*. In this format, memory information is stored in databases, allowing the agent to manipulate memories efficiently and comprehensively.
- *Structured Lists*. In this format, memory information is organized into lists, and the semantics of memory can be conveyed in an efficient and concise manner.

> ***Remark*** Here we only show several representative memory formats, but it is important to note that there are many others not covered here. Moreover, it should be emphasized that these formats are not mutually exclusive; many models incorporate multiple formats to concurrently harness their respective benefits.
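Since the formats are not mutually exclusive, a single memory entry is often kept in several of them at once. The toy record below illustrates this; the field names and the hash-based `embed` stub (standing in for a real embedding model) are assumptions for illustration only.

```python
# One memory entry held in several formats at once: raw natural language for
# interpretability, an embedding for similarity search, and structured fields
# for concise filtering. The `embed` stub stands in for a real embedding model.

from dataclasses import dataclass, field

def embed(text: str, dim: int = 8) -> list[float]:
    # Stand-in for a real embedding model: deterministic pseudo-embedding.
    return [((hash(word) % 1000) / 1000.0) for word in text.lower().split()][:dim]

@dataclass
class MemoryRecord:
    text: str                                            # natural-language format
    embedding: list[float] = field(init=False)            # embedding format
    tags: dict[str, str] = field(default_factory=dict)    # structured / database-style fields

    def __post_init__(self):
        self.embedding = embed(self.text)

record = MemoryRecord(
    text="Observed: the oven in the kitchen is on.",
    tags={"location": "kitchen", "type": "observation"},
)
```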
Above, we mainly discussed the internal designs of the memory module. In the following, we turn our focus to memory operations, which are used to interact with external environments.

#### **Memory Operations**

The memory module plays a critical role in allowing the agent to acquire, accumulate, and utilize significant knowledge by interacting with the environment. The interaction between the agent and the environment is accomplished through three crucial memory operations: memory reading, memory writing, and memory reflection. In the following, we introduce these operations in more detail.

- *Memory Reading*. The objective of memory reading is to extract meaningful information from memory to enhance the agent’s actions. The key to memory reading lies in how to extract valuable information from historical actions. Memories that are more recent, relevant, and important are more likely to be extracted (see the scoring sketch after this list).
- *Memory Writing*. The purpose of memory writing is to store information about the perceived environment in memory. Storing valuable information in memory provides a foundation for retrieving informative memories in the future, enabling the agent to act more efficiently and rationally. During the memory writing process, there are two potential problems that should be carefully addressed. On one hand, it is crucial to address how to store information that is similar to existing memories (i.e., memory duplication). On the other hand, it is important to consider how to remove information when the memory reaches its storage limit (i.e., memory overflow). In the following, we discuss these problems in more detail. (1) Memory Duplication. To incorporate similar information, various methods have been developed for integrating new and previous records.
- *Memory Reflection*. Memory reflection emulates humans’ ability to witness and evaluate their own cognitive, emotional, and behavioral processes. When adapted to agents, the objective is to provide agents with the capability to independently summarize and infer more abstract, complex, and high-level information. More specifically, the agent has the capability to summarize its past experiences stored in memory into broader and more abstract insights. To begin with, the agent generates three key questions based on its recent memories. Then, these questions are used to query the memory to obtain relevant information. Building upon the acquired information, the agent generates five insights, which reflect the agent’s high-level ideas.
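The preference for recent, relevant, and important memories during reading is often realized as a weighted score over candidate records. The sketch below assumes equal weights, an exponential recency decay, and cosine similarity over toy embeddings; none of these choices are prescribed by the survey.

```python
# Scoring-based memory reading: each candidate memory gets a weighted sum of
# recency, relevance, and importance, and the top-k are returned. The weights,
# decay constant, and cosine similarity over toy embeddings are assumptions.

import math
import time

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def read_memory(memories, query_emb, k=3, w_rec=1.0, w_rel=1.0, w_imp=1.0):
    """memories: list of dicts with 'embedding', 'timestamp', 'importance' (0-1)."""
    now = time.time()
    def score(m):
        recency = math.exp(-(now - m["timestamp"]) / 3600.0)  # decays over an hour
        relevance = cosine(m["embedding"], query_emb)
        return w_rec * recency + w_rel * relevance + w_imp * m["importance"]
    return sorted(memories, key=score, reverse=True)[:k]

# Usage with two toy records: the recent, relevant memory wins.
memories = [
    {"embedding": [1.0, 0.0], "timestamp": time.time() - 30, "importance": 0.2},
    {"embedding": [0.0, 1.0], "timestamp": time.time() - 7200, "importance": 0.9},
]
top = read_memory(memories, query_emb=[1.0, 0.0], k=1)
```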
A significant distinction between traditional LLMs and agents is that the latter must possess the capability to learn and complete tasks in dynamic environments. If we consider the memory module as responsible for managing the agent’s past behaviors, it becomes essential to have another significant module that can assist the agent in planning its future actions. In the following, we present an overview of how researchers design the planning module.

### Planning Module

When faced with a complex task, humans tend to deconstruct it into simpler subtasks and solve them individually. The planning module aims to empower agents with this human capability, which is expected to make the agent behave more reasonably, powerfully, and reliably.

#### **Planning without Feedback**

In this method, the agent does not receive feedback that can influence its future behaviors after taking actions. In the following, we present several representative strategies.

- *Single-path Reasoning*. In this strategy, the final task is decomposed into several intermediate steps. These steps are connected in a cascading manner, with each step leading to only one subsequent step. LLMs follow these steps to achieve the final goal.
- *Multi-path Reasoning*. In this strategy, the reasoning steps for generating the final plans are organized into a tree-like structure. Each intermediate step may have multiple subsequent steps. This approach is analogous to human thinking, as individuals may have multiple choices at each reasoning step.
- *External Planner*. Despite the demonstrated power of LLMs in zero-shot planning, effectively generating plans for domain-specific problems remains highly challenging. To address this challenge, researchers turn to external planners. These tools are well developed and employ efficient search algorithms to rapidly identify correct, or even optimal, plans.

#### **Planning with Feedback**

In many real-world scenarios, agents need to perform long-horizon planning to solve complex tasks. When facing these tasks, the above planning modules without feedback can be less effective for the following reasons: firstly, generating a flawless plan directly from the beginning is extremely difficult, as it needs to consider various complex preconditions. As a result, simply following the initial plan often leads to failure. Moreover, the execution of the plan may be hindered by unpredictable transition dynamics, rendering the initial plan non-executable. Simultaneously, when examining how humans tackle complex tasks, we find that individuals may iteratively make and revise their plans based on external feedback. To simulate this human capability, researchers have designed many planning modules in which the agent can receive feedback after taking actions. The feedback can be obtained from environments, humans, and models, which are detailed in the following.

- *Environmental Feedback*. This feedback is obtained from the objective world or a virtual environment. For instance, it could be the game’s task-completion signals or the observations made after the agent takes an action.
- *Human Feedback*. In addition to obtaining feedback from the environment, directly interacting with humans is also a very intuitive strategy for enhancing the agent’s planning capability. Human feedback is a subjective signal. It can effectively make the agent align with human values and preferences, and also helps to alleviate the hallucination problem.
- *Model Feedback*. Apart from the aforementioned environmental and human feedback, which are external signals, researchers have also investigated the utilization of internal feedback from the agents themselves. This type of feedback is usually generated based on pre-trained models.

> ***Remark*** In conclusion, the implementation of the planning module without feedback is relatively straightforward. However, it is primarily suitable for simple tasks that only require a small number of reasoning steps. Conversely, the strategy of planning with feedback requires more careful design to handle the feedback. Nevertheless, it is considerably more powerful and capable of effectively addressing complex tasks that involve long-range reasoning.
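Planning with feedback amounts to a closed plan–act–observe–revise loop. The following sketch shows one way such a loop might look; `call_llm`, the `Environment` interface, the prompt wording, and the stopping condition are hypothetical stand-ins rather than a reference implementation.

```python
# Closed-loop planning sketch: draft a plan, execute one step, feed the
# feedback back into the planner, and revise. `call_llm` and `Environment`
# are hypothetical stand-ins for a real model and task environment.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real LLM client here")

class Environment:
    def step(self, action: str) -> tuple[str, bool]:
        """Return (feedback, done). Placeholder for a real environment."""
        raise NotImplementedError

def plan_with_feedback(task: str, env: Environment, max_steps: int = 10) -> None:
    plan = call_llm(f"Decompose the task into numbered steps:\n{task}")
    for _ in range(max_steps):
        action = call_llm(f"Task: {task}\nCurrent plan:\n{plan}\nNext action:")
        feedback, done = env.step(action)
        if done:
            break
        # Revision: the feedback (environmental, human, or model-generated)
        # is folded back into the prompt so the plan can be repaired.
        plan = call_llm(
            f"Task: {task}\nPlan:\n{plan}\nLast action: {action}\n"
            f"Feedback: {feedback}\nRevise the remaining steps:"
        )
```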
### Action Module

The action module is responsible for translating the agent’s decisions into specific outcomes. This module is located at the most downstream position and directly interacts with the environment. It is influenced by the profile, memory, and planning modules. This section introduces the action module from four perspectives: (1) Action goal: what are the intended outcomes of the actions? (2) Action production: how are the actions generated? (3) Action space: what are the available actions? (4) Action impact: what are the consequences of the actions? Among these perspectives, the first two focus on the aspects preceding the action ("before-action" aspects), the third focuses on the action itself ("in-action" aspect), and the fourth emphasizes the impact of the actions ("after-action" aspect).

#### **Action Goal**

The agent can perform actions with various objectives. Here, we present several representative examples: (1) Task Completion. In this scenario, the agent’s actions are aimed at accomplishing specific tasks, such as crafting an iron pickaxe in Minecraft or completing a function in software development. These actions usually have well-defined objectives, and each action contributes to the completion of the final task. Actions aimed at this type of goal are very common in the existing literature. (2) Communication. In this case, the actions are taken to communicate with other agents or real humans to share information or collaborate. (3) Environment Exploration. In this example, the agent aims to explore unfamiliar environments to expand its perception and strike a balance between exploring and exploiting.

#### **Action Production**

Different from ordinary LLMs, where the model input and output are directly associated, the agent may take actions via different strategies and sources. In the following, we introduce two commonly used action production strategies. (1) Action via Memory Recollection. In this strategy, the action is generated by extracting information from the agent’s memory according to the current task. The task and the extracted memories are used as prompts to trigger the agent’s actions. (2) Action via Plan Following. In this strategy, the agent takes actions following its pre-generated plans.

#### **Action Space**

The action space refers to the set of possible actions that can be performed by the agent. In general, we can roughly divide these actions into two classes: (1) external tools and (2) the internal knowledge of the LLMs. In the following, we introduce these actions in more detail.

- **External Tools**. While LLMs have been demonstrated to be effective at accomplishing a large number of tasks, they may not work well for domains that require comprehensive expert knowledge. In addition, LLMs may also encounter hallucination problems, which are hard for them to resolve on their own. To alleviate these problems, agents are empowered with the capability to call external tools for executing actions. In the following, we present several representative tools which have been exploited in the literature.
  - **APIs**. Leveraging external APIs to complement and expand the action space is a popular paradigm in recent years.
  - **Databases & Knowledge Bases**. Integrating external databases or knowledge bases enables agents to obtain specific domain information for generating more realistic actions.
  - **External Models**. Previous studies often utilize external models to expand the range of possible actions. In comparison to APIs, external models typically handle more complex tasks. Each external model may correspond to multiple APIs.
- **Internal Knowledge**. In addition to utilizing external tools, many agents rely solely on the internal knowledge of LLMs to guide their actions. We now present several crucial capabilities of LLMs that can support the agent in behaving reasonably and effectively. (1) Planning Capability. Previous work has demonstrated that LLMs can be used as decent planners to decompose complex tasks into simpler ones. (2) Conversation Capability. LLMs can usually generate high-quality conversations. This capability enables agents to behave more like humans. In previous work, many agents take actions based on the strong conversation capability of LLMs. (3) Common Sense Understanding Capability. Another important capability of LLMs is that they can comprehend human common sense well. Based on this capability, many agents can simulate human daily life and make human-like decisions.

#### **Action Impact**

Action impact refers to the consequences of an agent’s actions. While the range of possible impacts is vast, we highlight a few key examples for clarity: (1) Changing Environments. Agents can directly alter environment states through actions, such as moving their positions, collecting items, constructing buildings, etc. (2) Altering Internal States. Actions taken by the agent can also change the agent itself, including updating memories, forming new plans, acquiring novel knowledge, and more. (3) Triggering New Actions. In task completion processes, one action often leads to subsequent actions.
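To make the split in the Action Space subsection above between external tools and internal knowledge more concrete, the sketch below registers a few callables as tools and falls back to the LLM itself when the decision names no tool. The tool names, the routing rule, and the `call_llm` stub are assumptions for illustration, not a scheme taken from the survey.

```python
# Sketch of an action space: named external tools (APIs, database lookups,
# external models) registered as callables, with the LLM's internal knowledge
# as the fallback action. Tool names and the routing step are assumptions.

from typing import Callable

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real LLM client here")

class ActionSpace:
    def __init__(self):
        self.tools: dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def act(self, decision: str, argument: str) -> str:
        # Use an external tool if the decision names one;
        # otherwise fall back to the LLM's internal knowledge.
        if decision in self.tools:
            return self.tools[decision](argument)
        return call_llm(argument)

space = ActionSpace()
space.register("search_api", lambda q: f"(results for {q!r} from a web API)")
space.register("kb_lookup", lambda q: f"(facts about {q!r} from a knowledge base)")
result = space.act("search_api", "weather in Paris")
```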