
LLMs Working in Harmony: A Survey on the Technological Aspects of Building Effective LLM-Based Multi Agent Systems

RM Aratchige
Department of Computer Science, Faculty of Computing,
General Sir John Kotelawala Defence University, Ratmalana, Sri Lanka
[email protected]

Dr. WMKS Ilmini
Department of Computer Science, Faculty of Computing,
General Sir John Kotelawala Defence University, Ratmalana, Sri Lanka
[email protected]

arXiv:2504.01963v1 [cs.MA] 13 Mar 2025

Abstract—This survey investigates foundational technologies essential for developing effective Large Language Model (LLM)-based multi-agent systems. Aiming to answer how best to optimize these systems for collaborative, dynamic environments, we focus on four critical areas: Architecture, Memory, Planning, and Technologies/Frameworks. By analyzing recent advancements and their limitations—such as scalability, real-time response challenges, and agent coordination constraints—we provide a detailed view of the technological landscape. Frameworks like the Mixture of Agents architecture and the ReAct planning model exemplify current innovations, showcasing improvements in role assignment and decision-making. This review synthesizes key strengths and persistent challenges, offering practical recommendations to enhance system scalability, agent collaboration, and adaptability. Our findings provide a roadmap for future research, supporting the creation of robust, efficient multi-agent systems that advance both individual agent performance and collective system resilience.

Index Terms—Multi-Agent Systems, Large Language Models, Artificial Intelligence, Technology Survey

I. INTRODUCTION

“Individually, we are one drop. Together, we are an ocean.” - Ryunosuke Satoro

The advent of Large Language Models (LLMs) has transformed artificial intelligence, with the introduction of the Transformer architecture in the landmark paper “Attention is All You Need” [1] marking a key turning point. The Transformer replaced traditional sequence models like recurrent neural networks with an attention-based mechanism, boosting machine translation performance and reducing training time. Since then, LLMs have evolved further, particularly with models like GPT, which demonstrate unprecedented performance in natural language processing tasks [2]. LLMs now handle tasks ranging from text generation to summarization, enabling software that can understand and reason with natural language. This versatility drives the current research: exploring how LLMs can build complex multi-agent systems that collaborate and specialize in tasks, especially in environments demanding more than a single model’s capabilities.

Despite their successes, LLM-based applications have limitations. Hallucination remains a major issue, as models can produce inaccurate information without external validation, as noted by Adewumi et al. [3]. This limits reliability where precision is essential. LLMs also struggle with complex or abstract concepts, a challenge discussed by Cherkassky et al. [4], who showed that even advanced models like GPT-4 often fall short in imitating human reasoning. Multi-agent systems can mitigate these issues, allowing distinct agents to collaborate on complex tasks, as explored by Han et al. [5]. Such collaboration improves decision-making, especially for tasks requiring deeper reasoning or specialized skills.

Research on LLM-based multi-agent systems exists [6], but there remains a gap in identifying the best technological approaches for building these systems. This survey aims to address this gap by answering two research questions: What state-of-the-art technologies and approaches are available for LLM multi-agent systems? And, which of these are most effective in practice? This survey seeks to identify optimal technologies and methodologies, helping researchers and practitioners navigate the evolving landscape of tools for building advanced multi-agent systems.

II. LITERATURE REVIEW

The literature on large language models (LLMs) within multi-agent systems is still emerging, with significant research gaps, particularly in understanding and improving multi-agent paradigms. This review critically examines the current research in multi-agent LLM systems, focusing on architecture, planning, memory, and frameworks. These technological aspects shape the potential and limitations of LLM multi-agent systems, and further investigation can expand their applications and efficiency.

This literature review is structured as given below:
• Architecture: Exploring various frameworks like Conquer-and-Merge Discussion (CMD), Chain-of-Agents (CoA), Agent Forest, and Mixture-of-Agents (MoA).
• Planning: Discussing frameworks such as AdaPlanner, ChatCoT, KnowAgent, RAP, Tree of Thoughts (ToT), and ReAct.
• Memory: Covering the role of memory in LLM systems, with insights on Vector Databases, Retrieval Augmented Generation, ChatDB, MemoryBank, RET-LLM, and Self-Controlled Memory.
• Technologies / Frameworks: Examining the technologies and frameworks that facilitate collaboration and task execution, including AutoGen, CAMEL, CrewAI, MetaGPT, and LangGraph.

A summary of the findings of each section is provided in Tables I-IV.

A. Architecture

Research on multi-agent architectures in the LLM space is relatively limited. A significant gap exists in developing frameworks that effectively orchestrate multiple agents to collaborate and solve complex reasoning tasks. While some research has started addressing these challenges, there remains a need for more comprehensive solutions. Below, we analyze some of the key papers exploring architectural designs in multi-agent LLM systems.

1) Conquer-and-Merge Discussion (CMD): In their paper, Wang et al. [7] present the Conquer-and-Merge Discussion (CMD) framework. This architecture leverages multiple LLM-powered agents that engage in open discussions to address reasoning tasks. Inspired by Minsky’s Society of Mind (1988), the CMD framework simulates human-like debates, where each agent contributes different perspectives to improve overall reasoning capabilities.

The CMD framework is structured to allow a group of agents to discuss a question, with each agent generating a viewpoint and explanation in several rounds of interaction. The discussion is guided by a shared history of responses, with agents building on each other’s inputs. This design outperforms single-agent methods like the Chain-of-Thought (CoT) approach, as demonstrated in their experiments. However, several limitations persist: the framework simplifies LLM sessions as agents, missing the opportunity to integrate more sophisticated reasoning techniques such as the Tree-of-Thought method or external knowledge bases; CMD has only been tested on reasoning tasks, leaving its applicability to broader domains such as strategic planning or real-time decision-making unexplored; and their experiments were limited to a few LLMs (Bard, Gemini Pro, and ChatGPT-3.5), so further analysis using other models is needed to assess generalizability.

2) Chain-of-Agents (CoA): Zhang et al. [8], in their paper ”Chain of Agents: Large Language Models Collaborating on Long-Context Tasks,” propose the Chain-of-Agents (CoA) framework. This architecture is designed for handling long-context tasks that surpass the token limits of individual LLMs. It consists of worker agents that sequentially process portions of the input and pass their results to a manager agent, which aggregates the final output.

The CoA architecture’s key innovation is the interleaved read-process method, allowing agents to process chunks of input before receiving the full context. This approach reduces the complexity of handling long-context tasks and enhances interpretability by splitting the work across multiple agents. However, CoA has some limitations: the communication between agents could be improved by leveraging in-context learning or fine-tuning LLMs to optimize their interaction; and the architecture needs further refinement to reduce computational costs and latency, particularly in tasks requiring multiple rounds of communication between agents.

3) Agent Forest: The Agent Forest method, introduced by Li et al. [9] in their paper ”More Agents is All You Need,” focuses on scaling LLM performance by simply increasing the number of agents. The method employs a sampling-and-voting approach: multiple agents generate responses, and the final answer is determined through majority voting.

This technique demonstrates that increasing the number of agents improves performance, particularly on complex tasks. However, it also reveals limitations: the performance gains from Agent Forest depend on the inherent difficulty of the task, with diminishing returns for overly complex or overly simple tasks; and the approach increases computational costs due to the need for multiple LLM queries, requiring optimization of the sampling phase to enhance cost efficiency.

4) Mixture-of-Agents: The Mixture-of-Agents (MoA) architecture proposed by Wang et al. [10] offers a layered design where agents collaborate in both proposer and aggregator roles. Proposers generate diverse responses, while aggregators synthesize the responses into high-quality outputs. This collaborative model improves LLM performance across multiple benchmarks, including AlpacaEval 2.0 and MT-Bench.

MoA excels by leveraging the strengths of different LLMs, enabling specialized roles for various agents. Its main limitations include the fact that not all LLMs are equally effective in both proposer and aggregator roles, with some models, like WizardLM, excelling as proposers but struggling as aggregators; and although MoA shows impressive results, expanding the number of agents and aggregators introduces complexity in managing collaboration, which may require more sophisticated orchestration techniques in the future.

B. Planning

In LLM-based multi-agent systems, planning involves the reasoning and action strategies that agents employ to achieve their goals in dynamic environments. These systems must balance reasoning over short and long horizons while responding to environmental feedback. As autonomous decision-makers, LLM agents generate sequences of actions based on initial goals but often require adaptive planning to address the complexity of real-world problems. Effective planning frameworks for LLM multi-agent systems incorporate feedback mechanisms, refining or recalibrating actions based on evolving environmental conditions to avoid issues like hallucination or over-simplification of plans.
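The sampling-and-voting procedure described above for Agent Forest can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; `sample_agent` is a hypothetical stand-in for an independent LLM query issued with nonzero temperature.

```python
from collections import Counter

def sample_agent(task: str, seed: int) -> str:
    """Hypothetical stand-in for one LLM query at temperature > 0."""
    # Simulated noisy answers, for illustration only.
    simulated = ["42", "42", "41", "42", "40", "42", "39"]
    return simulated[seed % len(simulated)]

def agent_forest(task: str, num_agents: int) -> str:
    # Sampling phase: each agent answers independently.
    answers = [sample_agent(task, seed) for seed in range(num_agents)]
    # Voting phase: the most frequent answer is the final output.
    return Counter(answers).most_common(1)[0][0]

print(agent_forest("What is 6 * 7?", 7))  # prints "42"
```

Because the sampling phase issues one query per agent, cost grows linearly with the ensemble size, which is why the authors stress optimizing this phase.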
TABLE I: Comparison of Architectures in LLM Multi-Agent Systems

Conquer-and-Merge Discussion (CMD)
Key Features: Multiple LLM-powered agents engage in open discussions; simulates human-like debates to improve reasoning.
Strengths: Outperforms single-agent methods; allows agents to build on each other’s inputs.
Limitations: Lacks integration of sophisticated reasoning techniques; limited testing on broader domains; analysis required with other models for generalizability.

Chain-of-Agents (CoA)
Key Features: Sequential processing of input portions by worker agents; manager agent aggregates results; interleaved read-process method.
Strengths: Reduces complexity of long-context tasks; enhances interpretability by splitting work across agents.
Limitations: Communication improvement needed; further refinement required to reduce computational costs and latency.

Agent Forest
Key Features: Employs a sampling-and-voting approach with multiple agents; performance determined by majority voting.
Strengths: Performance improves with the number of agents; effective for complex tasks.
Limitations: Gains depend on task difficulty; increased computational costs due to multiple LLM queries.

Mixture-of-Agents (MoA)
Key Features: Layered design with proposers and aggregators; collaborates to generate and synthesize high-quality outputs.
Strengths: Leverages strengths of different LLMs; shows impressive results across benchmarks.
Limitations: Role effectiveness varies among LLMs; complexity in managing collaboration increases with more agents.
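As a concrete illustration of the layered design summarized in Table I, the Mixture-of-Agents flow can be sketched as follows. The `propose` and `aggregate` functions are hypothetical placeholders for calls to different underlying LLMs, not part of any real API.

```python
def propose(model_name: str, prompt: str) -> str:
    # Hypothetical proposer call: each model drafts its own answer.
    return f"[{model_name}] draft answer to: {prompt}"

def aggregate(model_name: str, prompt: str, drafts: list[str]) -> str:
    # Hypothetical aggregator call: synthesize drafts into one output.
    joined = " | ".join(drafts)
    return f"[{model_name}] synthesis of {len(drafts)} drafts: {joined}"

def mixture_of_agents(prompt: str, proposer_models: list[str],
                      aggregator_model: str, num_layers: int = 2) -> str:
    drafts = [propose(m, prompt) for m in proposer_models]
    # Intermediate layers re-propose with the previous drafts as context.
    for _ in range(num_layers - 1):
        context = prompt + "\nPrevious drafts:\n" + "\n".join(drafts)
        drafts = [propose(m, context) for m in proposer_models]
    # Final layer: a single aggregator produces the high-quality output.
    return aggregate(aggregator_model, prompt, drafts)

print(mixture_of_agents("Summarize MoA.", ["model-a", "model-b"], "model-c"))
```

The design choice MoA makes explicit is that proposing and aggregating are distinct skills, so the two roles can be assigned to different models.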

1) AdaPlanner: In the AdaPlanner framework, Sun et al. [11] introduce a novel approach that allows LLM agents to refine and adapt plans in response to real-time environmental feedback. This marks a departure from traditional static planning systems, which typically follow a fixed sequence of actions. AdaPlanner’s closed-loop system enables dynamic adjustments, providing critical flexibility for handling complex, long-horizon tasks where static plans often fail due to unexpected changes.

AdaPlanner incorporates two major refinement strategies: in-plan refinement, where agents modify specific parts of an existing plan to address immediate feedback, and out-of-plan refinement, where they create new actions to tackle unforeseen scenarios. To tackle the issue of LLM hallucinations, the authors implement a code-style prompting mechanism, reducing ambiguity in the generated plans and fostering consistency across various tasks. Additionally, AdaPlanner includes a skill discovery feature, allowing agents to reuse successful plans from past tasks as few-shot examples for future problem-solving, effectively improving their adaptability and efficiency.

Despite its advancements, AdaPlanner is not without limitations. Its reliance on few-shot expert demonstrations for more complex tasks remains a constraint, suggesting the need for future research on reducing or eliminating this dependency. Additionally, while its performance in environments like ALFWorld and MiniWoB++ is promising, further testing across diverse domains would better establish the robustness and generalizability of this adaptive planning approach.

2) ChatCoT: Chen et al. [12] introduce ChatCoT, a framework designed to improve large language models (LLMs) in handling complex, multi-step reasoning tasks. Recognizing the limitations of traditional static reasoning models for tasks that require specific knowledge and complex logical steps, ChatCoT employs a tool-augmented chain-of-thought (CoT) reasoning approach tailored to chat-based interactions, such as those in ChatGPT. This framework allows LLMs to alternate between tool manipulation and reasoning actions within a dynamic, multi-turn conversation, enhancing adaptability in complex scenarios.

ChatCoT leverages the strengths of chat-based LLMs by initiating discussions with foundational knowledge about the tools, tasks, and reasoning structure involved. The resulting process is iterative, using step-by-step reasoning that integrates tool manipulation seamlessly with CoT reasoning. Evaluations on datasets such as MATH and HotpotQA show that ChatCoT achieves a 7.9% relative improvement in performance over existing methods, demonstrating its potential in advancing LLM reasoning capabilities for intricate tasks.

However, ChatCoT has several limitations. The framework has yet to be tested with GPT-4 due to access restrictions, which could affect the generalizability of its findings. Moreover, its design is optimized for chat-based LLMs, which may limit its compatibility with other architectures. Additionally, the high computational requirements, particularly in terms of GPU resources, pose challenges for widespread implementation. Future research will aim to extend ChatCoT’s applicability across a broader range of tasks and expand its toolset, potentially enhancing its utility in diverse, complex reasoning scenarios.

3) KnowAgent: In the KnowAgent framework, Zhu et al. [13] present a novel approach to enhance large language models (LLMs) in performing complex reasoning tasks. LLMs, while powerful, often struggle with generating coherent action sequences and interacting effectively with environments due to a lack of inherent action knowledge. KnowAgent addresses this gap by integrating an action knowledge base and employing a self-learning strategy, providing LLMs with structured action knowledge that guides the planning process and mitigates issues like planning hallucinations.

KnowAgent utilizes this action knowledge to refine planning paths, ensuring LLMs generate more reasonable and executable action trajectories. This structured approach translates complex action data into an understandable format for LLMs, significantly enhancing planning accuracy. Experimental results on datasets like HotpotQA and ALFWorld show that KnowAgent not only matches but often exceeds current state-of-the-art performance while effectively reducing planning hallucinations, demonstrating its potential for improving LLM task execution.
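KnowAgent’s core idea of constraining plan generation with an action knowledge base can be illustrated with a simple filter over proposed actions. The rule table and the `propose_actions` function below are hypothetical stand-ins for illustration, not the paper’s actual knowledge-base format.

```python
from typing import Optional

# Hypothetical action knowledge base: for each action, the set of
# actions that may legally follow it, ruling out inexecutable paths.
ACTION_RULES = {
    "start": {"search", "lookup"},
    "search": {"lookup", "answer"},
    "lookup": {"search", "answer"},
    "answer": set(),  # terminal action
}

def propose_actions(state: str) -> list[str]:
    # Stand-in for the LLM proposing candidate next actions;
    # "teleport" simulates a hallucinated, inexecutable action.
    return ["teleport", "lookup", "answer"]

def next_action(current: str) -> Optional[str]:
    """Keep only the first proposal permitted by the knowledge base."""
    allowed = ACTION_RULES.get(current, set())
    for action in propose_actions(current):
        if action in allowed:
            return action
    return None  # no executable action was proposed

print(next_action("search"))  # "teleport" is filtered out; prints "lookup"
```

Filtering proposals against explicit action rules is one way to cut planning hallucinations, at the cost of the manual rule authoring the authors identify as a limitation.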
Despite its advancements, KnowAgent has limitations. The framework has primarily been evaluated on commonsense question-answering and household tasks, with future potential in domains like medical reasoning, arithmetic, and web browsing. Additionally, KnowAgent currently supports only single-agent applications; exploring multi-agent systems could further enhance its utility through collaborative task execution. Lastly, the manual design of action knowledge bases presents a labor-intensive challenge, suggesting the need for automated solutions to improve adaptability and broaden the framework’s application across diverse environments.

4) RAP: Retrieval Augmented Processing: RAP [14] introduces a groundbreaking framework that enhances the planning capabilities of large language models (LLMs) by integrating retrieval-augmented techniques with contextual memory. As LLMs are increasingly employed as agents for complex decision-making tasks in fields such as robotics, gaming, and API integration, the challenge of incorporating past experiences into current decision-making processes remains significant. To address this, RAP dynamically leverages relevant past experiences tailored to the current context, thereby improving the agents’ ability to plan effectively.

What sets RAP apart is its versatility, as it is designed to function seamlessly in both text-only and multimodal environments. This adaptability allows it to tackle a broad spectrum of tasks. Empirical evaluations demonstrate RAP’s effectiveness, achieving state-of-the-art performance in textual scenarios and significantly enhancing the capabilities of multimodal LLM agents in embodied tasks. These findings underscore RAP’s potential to advance the functionality and applicability of LLM agents in real-world applications that demand sophisticated decision-making.

The RAP framework enables agents to store and retrieve past experiences, guiding subsequent actions based on contextual information extracted from various modalities, including text and images. The results from evaluations across multiple benchmarks reveal that RAP outperforms baseline methods, allowing language agents to flexibly utilize historical experiences in alignment with current situations. This capability mirrors a fundamental human ability, thereby enhancing decision-making capabilities and paving the way for more effective and intelligent LLM-based agents in complex, real-world scenarios.

5) Tree of Thoughts (ToT): The ”Tree of Thoughts” (ToT) framework [15] introduces a novel approach to enhance the problem-solving capabilities of language models (LMs), addressing their limitations in tasks that require exploration, strategic foresight, and the importance of initial decisions. Traditional LMs typically operate within a token-level, left-to-right decision-making paradigm during inference, which can hinder performance in more complex scenarios. ToT expands upon the popular ”Chain of Thought” prompting technique, allowing LMs to explore coherent units of text, referred to as ”thoughts,” that act as intermediate steps in problem-solving.

ToT enables LMs to engage in deliberate decision-making by evaluating multiple reasoning paths and assessing choices, facilitating the ability to look ahead or backtrack as necessary for making more informed global decisions. Experiments demonstrate that ToT significantly improves LMs’ performance on three novel tasks that necessitate intricate planning and search: Game of 24, Creative Writing, and Mini Crosswords. For example, in the Game of 24, while GPT-4 with chain-of-thought prompting achieved only a 4% success rate, the ToT framework elevated this figure to 74%.

In the limitations and future directions section, the authors note that deliberate search methods like ToT may not be essential for many tasks where GPT-4 already performs well. The initial exploration is limited to three relatively simple tasks, indicating a need for further research into more complex decision-making applications, such as coding, data analysis, and robotics. Additionally, the resource-intensive nature of search methods compared to sampling methods may pose challenges, though the modular flexibility of ToT allows users to tailor performance-cost tradeoffs. Ongoing open-source initiatives could further reduce associated costs. The potential for fine-tuning LMs using ToT-style high-level counterfactual decision-making is also highlighted as an avenue for improving LMs’ problem-solving abilities.

6) ReAct: The ”ReAct” framework [16] represents a significant advancement in the integration of reasoning and acting within large language models (LLMs). While LLMs have showcased remarkable capabilities in language understanding and interactive decision-making, the traditional separation of reasoning (e.g., chain-of-thought prompting) and acting (e.g., action plan generation) has limited their effectiveness in complex tasks. ReAct addresses this limitation by facilitating an interleaved generation of reasoning traces and task-specific actions, enhancing synergy between the two processes. This approach allows reasoning traces to aid the model in inducing, tracking, and updating action plans while also managing exceptions, whereas actions enable interaction with external sources, such as knowledge bases or environments.

The application of ReAct across various language and decision-making tasks demonstrates its superiority over state-of-the-art baselines, providing enhanced human interpretability and trustworthiness. In experiments involving question answering (HotpotQA) and fact verification (Fever), ReAct effectively mitigates common issues of hallucination and error propagation found in chain-of-thought reasoning by leveraging a simple Wikipedia API. This interaction leads to more human-like task-solving trajectories that are more interpretable than those produced by methods lacking reasoning traces.

ReAct introduces a straightforward yet powerful method for synergizing reasoning and acting within LLMs, yielding superior performance and interpretable decision traces across diverse tasks, including multi-hop question answering, fact-checking, and interactive decision-making. Given the positive results achieved, this approach is highly recommended for enhancing LLMs’ action planning capabilities. Although the simplicity of ReAct presents advantages, it also reveals the need for more demonstrations to effectively learn complex tasks with large action spaces. Initial experiments indicate that fine-tuning on specific tasks, such as HotpotQA, may further improve performance, particularly through the incorporation of high-quality human annotations. Exploring multi-task training and integrating ReAct with complementary paradigms like reinforcement learning could lead to the development of more robust agents, further unlocking the potential of LLMs for diverse applications.

C. Memory

Memory plays a critical role in enhancing the capabilities of agents to retain and retrieve information relevant to their tasks. Effective memory systems enable agents to recall past interactions and experiences, facilitating informed decision-making and improving task performance in dynamic environments. The integration of advanced memory mechanisms, such as vector databases and retrieval-augmented generation, allows agents to store vast amounts of information while maintaining quick access to pertinent data. Additionally, these memory systems can implement self-controlled mechanisms to ensure that agents prioritize the most relevant memories, enabling them to adapt to new challenges while avoiding the pitfalls of irrelevant or outdated information. By enhancing memory functionality, LLM agents can better navigate complex tasks and interactions, leading to more robust and intelligent behavior.

1) Vector Databases: Jing et al. [17] provide a comprehensive survey of the intersection between large language models (LLMs) and vector databases (VecDBs), a rapidly evolving area of research aimed at addressing critical limitations in LLM-based systems. Despite the impressive capabilities of LLMs, they struggle with issues such as hallucinations, memory constraints, outdated knowledge, and the high costs associated with commercial deployment. VecDBs offer a promising solution to these challenges by efficiently storing and retrieving the high-dimensional vector representations that are fundamental to LLM operations.

The integration of LLMs and VecDBs enhances the ability of LLM systems to manage and retrieve vast amounts of information, reducing reliance on static memory and enabling more dynamic, context-aware interactions. By leveraging VecDBs, LLMs can access external knowledge bases, mitigating hallucinations and outdated information while improving response accuracy. Additionally, this synergy helps overcome memory limitations by enabling the offloading of knowledge, allowing for scalable, long-term storage solutions. This framework has paved the way for the development of memory systems specifically designed for LLMs, advancing their ability to manage complex tasks over extended interactions.

The paper highlights both the opportunities and challenges in combining LLMs and VecDBs, categorizing existing research into distinct prototypes and interdisciplinary approaches. It also addresses the engineering challenges related to optimizing this integration, such as designing efficient data retrieval mechanisms and ensuring compatibility with LLM architectures. Looking ahead, the authors call for further research into expanding the utility of VecDBs in diverse LLM applications, driving advancements in data handling, knowledge extraction, and the development of robust memory solutions for LLM-based systems.

2) Retrieval Augmented Generation: Lewis et al. [18] introduce the Retrieval-Augmented Generation (RAG) framework, a novel approach aimed at enhancing the performance of large pre-trained language models on knowledge-intensive tasks. Traditional LLMs, while capable of storing vast amounts of factual knowledge within their parameters, often struggle with accessing and manipulating this knowledge in a precise and scalable way. Furthermore, updating their knowledge or providing provenance for decisions remains a significant challenge. RAG addresses these limitations by combining parametric memory, based on a pre-trained sequence-to-sequence (seq2seq) model, with non-parametric memory, represented as a dense vector index of external sources like Wikipedia, accessed via a neural retriever.

By integrating these two memory systems, RAG allows models to retrieve relevant information dynamically from external sources during language generation, rather than solely relying on the knowledge embedded in their parameters. This dual-memory design significantly improves the specificity, diversity, and factual accuracy of generated responses, particularly in tasks such as open-domain question answering (QA) and knowledge-intensive text generation. The authors present two variants of RAG: one that retrieves the same passage for the entire sequence and another that retrieves different passages for each token, further refining the generation process. Through extensive experiments, RAG outperforms state-of-the-art parametric seq2seq models and retrieve-and-extract architectures, setting new benchmarks in multiple QA tasks.

RAG’s impact goes beyond immediate performance gains; it laid the groundwork for future developments in LLM memory systems. The ability to hot-swap the retrieval index without retraining the model provides a scalable solution for updating LLMs with new knowledge, addressing a critical issue in long-term LLM use. This hybrid memory structure has inspired subsequent memory-augmented solutions, paving the way for more sophisticated LLMs capable of seamlessly integrating parametric and non-parametric knowledge sources to enhance reasoning, generation, and decision-making across a wide range of tasks.

3) ChatDB: In their paper, Hu et al. [19] introduce a novel framework that enhances large language models (LLMs) by integrating symbolic memory, represented by SQL databases. The motivation stems from the limitations of neural memory mechanisms, which are prone to error accumulation and struggle with complex reasoning tasks. By incorporating symbolic memory, ChatDB enables more precise and reliable memory manipulation, allowing LLMs to perform multi-hop reasoning through interaction with an external database. This is inspired by modern computer architectures rather than biological models, providing a more robust solution for advanced reasoning tasks.

The ChatDB framework operates in three main stages.
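The retrieve-then-generate pattern at the heart of RAG, and of VecDB-backed memory generally, can be sketched as follows. The toy corpus and the bag-of-words cosine similarity are illustrative stand-ins for a dense neural encoder querying a vector index; they are not how the actual systems are built.

```python
import math
from collections import Counter

# Toy non-parametric memory: passages the model can consult.
CORPUS = [
    "The Transformer architecture was introduced in 2017.",
    "RAG combines parametric and non-parametric memory.",
    "Sri Lanka is an island nation in South Asia.",
]

def embed(text: str) -> Counter:
    # Stand-in for a dense encoder: bag-of-words term counts.
    return Counter(text.lower().replace(".", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank all passages by similarity to the query embedding.
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def generate(query: str) -> str:
    # Stand-in for the seq2seq generator conditioned on retrieved text.
    return "Answer based on: " + " ".join(retrieve(query))

print(generate("What does RAG combine?"))
```

Swapping `CORPUS` for a fresh index is the “hot-swap” property discussed above: the retriever and generator stay fixed while the knowledge changes.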
TABLE II: Comparison of Planning Frameworks in LLM Multi-Agent Systems

Framework | Key Features | Strengths | Limitations
AdaPlanner | Dynamic adjustments to plans based on real-time feedback; In-plan and out-of-plan refinement strategies; Code-style prompting for reducing ambiguity. | Flexibility in complex, long-horizon tasks; Improved adaptability and efficiency through skill discovery. | Relies on few-shot expert demonstrations; Performance needs further testing across diverse domains.
ChatCoT | Tool-augmented chain-of-thought reasoning for chat-based interactions; Iterative, multi-turn conversations. | Achieves 7.9% relative improvement in performance on complex tasks; Enhances adaptability in reasoning scenarios. | Limited testing with GPT-4; High computational requirements; Optimized for chat-based models.
KnowAgent | Integration of an action knowledge base with self-learning strategies; Refines planning paths for coherent action sequences. | Matches or exceeds state-of-the-art performance; Reduces planning hallucinations effectively. | Primarily evaluated on specific tasks; Currently supports only single-agent applications; Manual design of action knowledge bases is labor-intensive.
RAP | Retrieval-augmented techniques integrated with contextual memory; Functions in both text-only and multimodal environments. | Achieves state-of-the-art performance in various benchmarks; Enhances decision-making by utilizing past experiences. | Adapting for multimodal tasks requires careful implementation; Complexity in managing retrieval processes.
ReAct | Interleaves reasoning and acting; Facilitates decision traces and task-specific actions; Enhances interpretability and trustworthiness. | Superior performance across diverse tasks; Effective in mitigating hallucination and error propagation; Highly recommended for action planning capabilities. | Simplicity may limit complexity handling; Further demonstrations needed for large action spaces.
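ReAct's interleaving of reasoning and acting reduces, in code, to a loop that alternates model "thoughts" with tool calls whose observations feed the next step. The trace format follows the Thought/Action/Observation pattern from the ReAct paper; the scripted model and lookup tool are invented stand-ins for illustration:

```python
# Minimal ReAct-style loop: a scripted "model" emits thoughts and actions;
# each observation is appended to the trace before the next step.
KNOWLEDGE = {"capital of France": "Paris"}  # stand-in external knowledge base

def lookup(query):
    return KNOWLEDGE.get(query, "no result")

def scripted_model(trace):
    # Stand-in for an LLM: decide the next step from the trace so far.
    if not any(line.startswith("Observation:") for line in trace):
        return "Action: lookup[capital of France]"
    return "Answer: Paris"

def react(question, max_steps=5):
    trace = [f"Question: {question}",
             "Thought: I should consult the knowledge base."]
    for _ in range(max_steps):
        step = scripted_model(trace)
        if step.startswith("Answer:"):
            return step[len("Answer:"):].strip()
        query = step[len("Action: lookup["):-1]  # parse lookup[...]
        trace.append(step)
        trace.append(f"Observation: {lookup(query)}")
    return "no answer"

print(react("What is the capital of France?"))  # Paris
```

Grounding each answer in an observation returned by the tool, rather than in free-form generation, is what mitigates the hallucination and error propagation noted in the table.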
First, in the input processing stage, the system generates SQL instructions to interact with the database if memory is required; otherwise, the LLM responds directly. In the chain-of-memory stage, a series of SQL operations such as select, update, insert, and delete are executed, with each step influencing the next based on the results of previous operations. Finally, in the response summary stage, ChatDB generates a coherent final output based on the results obtained from manipulating the symbolic memory, ensuring accurate and logical responses.

The experimental results show that ChatDB outperforms models like ChatGPT, especially in tasks requiring complex reasoning, by eliminating error propagation through precise memory operations. By using SQL as a symbolic memory language, ChatDB introduces a reliable method for LLMs to handle intermediate results, enhancing both the accuracy and capability of the model in various management and reasoning scenarios. This symbolic memory approach sets the stage for further advancements in memory-augmented LLMs, offering a more scalable and efficient framework for handling knowledge-intensive tasks.

4) MemoryBank: Zhong et al. [20] introduce MemoryBank, an innovative memory mechanism designed to address a significant limitation in large language models (LLMs): the lack of long-term memory. While LLMs have made remarkable strides in performing various tasks, their inability to maintain and recall information from past interactions has hindered their performance in applications requiring sustained context, such as personal assistance or therapy. MemoryBank enables LLMs to store, retrieve, and update memories dynamically, allowing models to evolve in understanding users' personalities over time. By integrating a memory updater based on the Ebbinghaus Forgetting Curve theory, the system can selectively forget or reinforce memories depending on their relevance and the time elapsed, mirroring human memory retention patterns.

MemoryBank operates around three key components: a memory storage system for data retention, a memory retriever to summon context-specific memories, and a memory updater inspired by psychological principles. This updater ensures that the system adapts over time, retaining essential information while allowing less significant memories to fade. This anthropomorphic memory mechanism enhances user interactions, providing more personalized responses and a deeper understanding of user behavior. The framework is versatile, functioning across both closed-source models, such as ChatGPT, and open-source models, including ChatGLM. MemoryBank demonstrates its capabilities through the chatbot SiliconFriend, designed for long-term companionship, which uses MemoryBank to recall past conversations and adjust its responses based on user preferences and emotional state.

MemoryBank significantly improves the ability of LLMs to handle long-term interactions by offering a scalable solution for memory retention and recall. SiliconFriend, equipped with this mechanism, demonstrates the potential for AI systems to deliver more empathetic and personalized experiences. MemoryBank's flexible structure and memory updating mechanism allow LLMs to provide relevant and accurate information across extended dialogues, setting the stage for further advancements in memory-augmented LLMs. This framework not only enhances LLM performance in personal companion systems but also lays the groundwork for future developments in AI-human interaction, where long-term memory plays a critical role in delivering meaningful and sustained engagements.

5) RET-LLM: Modarressi et al. [21] present RET-LLM, a groundbreaking framework designed to enhance large language models (LLMs) by integrating a general read-write memory unit. Despite the remarkable advancements LLMs have made in natural language processing (NLP), their lack of a dedicated memory system restricts their ability to store and retrieve knowledge explicitly for diverse tasks. RET-LLM addresses this gap by allowing LLMs to extract, store, and recall information as needed, improving their task performance. Drawing inspiration from Davidsonian semantics theory, the memory unit captures knowledge in the form of triplets, facilitating a scalable and interpretable memory structure that can be easily updated and aggregated.

The RET-LLM architecture comprises three key components: a Controller, a Fine-tuned LLM, and a Memory unit. The Controller regulates the information flow among the user, the LLM, and the Memory unit, ensuring efficient communication. The Fine-tuned LLM processes incoming text and determines when to invoke memory. To facilitate memory interaction, the framework implements a text-based API schema, allowing the LLM to generate standardized memory API calls. Knowledge is stored in a triplet format, structured as ⟨first argument, relation, second argument⟩, reflecting the theoretical principles of Davidsonian semantics. This organization enables effective management of relational knowledge, allowing the model to perform better in various NLP tasks, particularly in question answering.

RET-LLM significantly enhances LLM capabilities by enabling the explicit storage and retrieval of information, thereby addressing one of the critical limitations of traditional LLMs. The framework's triplet-based memory structure allows for nuanced relationships to be stored and accessed, showcasing superior performance in question answering tasks, especially those requiring temporal reasoning. Preliminary qualitative evaluations indicate that RET-LLM outperforms baseline approaches, demonstrating its potential in effectively managing time-dependent information. Although still under development, future iterations of RET-LLM will focus on comprehensive empirical evaluations using real datasets and refining the fine-tuning process to broaden its applicability across various types of informative relations. The ongoing research highlights the transformative potential of incorporating a robust memory unit into LLMs, paving the way for more intelligent and adaptable AI systems.

6) Self-Controlled Memory: Wang et al. [22] introduce the Self-Controlled Memory (SCM) framework, a novel approach aimed at enhancing large language models (LLMs) by addressing their limitations in processing lengthy inputs. Traditional LLMs often struggle to retain critical historical information, which hinders their performance in tasks requiring long-term memory. The SCM framework consists of three main components: an LLM-based agent that serves as the backbone of the system, a memory stream that stores agent memories, and a memory controller that updates these memories and determines when and how to utilize them. Importantly, SCM operates in a plug-and-play manner, enabling seamless integration with any instruction-following LLMs without the need for extensive modifications or fine-tuning.

To validate the effectiveness of the SCM framework, the authors annotated a dataset designed for evaluating its capabilities in handling ultra-long texts across three tasks: long-term dialogues, book summarization, and meeting summarization. Experimental results reveal that the SCM framework significantly improves retrieval recall and generates more informative responses compared to competitive baselines in long-term dialogue scenarios. These findings demonstrate SCM's potential to enhance the performance of LLMs, allowing them to better manage extensive conversations and detailed summarizations, ultimately addressing a key challenge in the field of natural language processing.

Despite its advantages, the SCM framework has some limitations, particularly regarding the evaluation of its performance in infinite dialogue settings, which were tested only up to 200 dialogue turns and a maximum token count of 34,000. This constraint arises from the challenges associated with qualitatively and quantitatively evaluating very long texts. Additionally, the effectiveness of the SCM framework relies on powerful instruction-following LLMs like text-davinci-003 and gpt-3.5-turbo-0301. However, the authors anticipate that future advancements in smaller, more powerful LLMs could mitigate this limitation. Overall, the SCM framework offers a promising direction for extending the input length of LLMs and improving their ability to capture and recall useful information from historical data.

D. Technologies / Frameworks

The development of LLM-based multi-agent systems relies heavily on a variety of technologies and frameworks that facilitate efficient agent collaboration and task execution. These frameworks provide essential tools for building, deploying, and managing multi-agent environments, enabling seamless communication and coordination among agents. Technologies such as AutoGen and MetaGPT empower agents to generate dynamic responses and solutions based on real-time data and interactions. Additionally, frameworks like CAMEL and CrewAI offer integrated environments that streamline the design and orchestration of multi-agent systems, allowing for enhanced scalability and flexibility. By leveraging these technologies, researchers and practitioners can create sophisticated LLM-based systems that adapt to changing circumstances and optimize their performance across a range of applications, from robotics to intelligent assistance.

1) AutoGen: Wu et al. [23] present AutoGen, an open-source framework designed to facilitate the development of large language model (LLM) applications through multi-agent conversations. This innovative framework allows multiple agents to interact, collaborate, and accomplish tasks by leveraging customizable and conversable agents that can operate in diverse modes. AutoGen supports a combination of LLMs, human inputs, and various tools, enabling developers to flexibly define agent interaction behaviors. By employing both natural language and computer code for programming conversation patterns, AutoGen serves as a generic framework that caters to a wide range of applications, including mathematics, coding, question answering, operations research, and online decision-making.

To streamline the creation of complex LLM applications, AutoGen is built upon the principles of conversable agents
TABLE III: Comparison of Memory Frameworks in LLM Multi-Agent Systems

Framework | Key Features | Strengths | Limitations
Vector Databases (VecDB) | Efficient storage and retrieval of high-dimensional vector representations; Enhances LLMs' ability to access external knowledge bases. | Reduces reliance on static memory; Mitigates hallucinations and outdated information; Enables dynamic, context-aware interactions. | Integration challenges; Requires optimization for compatibility with LLM architectures.
Retrieval-Augmented Generation (RAG) | Combines parametric and non-parametric memory; Retrieves relevant information during language generation. | Improves specificity, diversity, and factual accuracy; Scalable solution for updating LLMs with new knowledge. | Complexity in retrieval index management; Requires extensive training data for optimal performance.
ChatDB | Integrates symbolic memory via SQL databases; Supports multi-hop reasoning through external database interaction. | Enables precise memory manipulation; Reduces error propagation; Enhances performance in complex reasoning tasks. | Limited to tasks that can be mapped to SQL operations; Requires careful SQL instruction generation.
MemoryBank | Offers long-term memory retention and dynamic updating; Uses the Ebbinghaus Forgetting Curve for selective memory management. | Enhances personalized responses; Adapts over time to user interactions; Versatile for various LLM architectures. | Performance may vary based on user interaction context; Requires careful memory updating to avoid overload.
RET-LLM | Integrates a read-write memory unit; Stores knowledge as triplets for scalable, interpretable memory management. | Addresses LLM limitations in storing and retrieving knowledge; Superior performance in question answering tasks. | Still under development; Requires comprehensive evaluations and real dataset testing.
Self-Controlled Memory (SCM) | Consists of an agent, memory stream, and memory controller; Enhances handling of lengthy inputs. | Significantly improves retrieval recall; Generates informative responses; Easy integration with existing LLMs. | Limited evaluation in infinite dialogue settings; Dependent on powerful instruction-following LLMs for effectiveness.
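MemoryBank's selective forgetting is based on the Ebbinghaus Forgetting Curve, commonly modelled as exponential decay R = exp(-t/S), where the memory strength S grows when a memory is recalled. The sketch below shows a retention-based updater under that assumption; the exact decay form, threshold, and reinforcement rule are illustrative, not MemoryBank's published implementation:

```python
import math

def retention(elapsed_hours, strength):
    """Ebbinghaus-style retention: R = exp(-t / S)."""
    return math.exp(-elapsed_hours / strength)

def update_memories(memories, now_hours, forget_below=0.2):
    """Keep only memories whose retention is still above the threshold."""
    return [m for m in memories
            if retention(now_hours - m["stored_at"], m["strength"]) >= forget_below]

def reinforce(memory):
    """Recalling a memory strengthens it, slowing its future decay."""
    memory["strength"] *= 2.0

memories = [
    {"text": "user likes jazz", "stored_at": 0.0, "strength": 24.0},
    {"text": "small talk about weather", "stored_at": 0.0, "strength": 2.0},
]
kept = update_memories(memories, now_hours=12.0)
print([m["text"] for m in kept])  # ['user likes jazz']
```

Twelve hours on, the strong memory retains exp(-0.5) ≈ 0.61 and survives, while the weak one decays to exp(-6) ≈ 0.002 and is forgotten, which is the behavior the MemoryBank row above describes.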
and conversation programming. A conversable agent can send and receive messages to engage with other agents while maintaining its internal context. This modular approach allows for a variety of capabilities, powered by LLMs, tools, or human input. Conversation programming encompasses two key concepts: computation, which pertains to the actions agents undertake in a multi-agent conversation, and control flow, which dictates the sequence or conditions under which these actions occur. This conversation-centric paradigm simplifies the reasoning behind complex workflows, allowing agents to pass messages dynamically and adaptively as they collaborate.

The authors emphasize that AutoGen enhances multi-agent cooperation through its unified conversation interface and auto-reply mechanisms, effectively harnessing the strengths of chat-optimized LLMs. The framework enables developers to create and experiment with multi-agent systems that can be reused, customized, and extended, all while significantly reducing development effort. Experimental results indicate that AutoGen outperforms state-of-the-art approaches, streamlining the development process and enabling flexible, dynamic interactions among agents. While still in the early stages of development, AutoGen lays the groundwork for future research into the integration of existing agent implementations, optimal agent topologies, and the balance between automation and human control in multi-agent workflows, addressing potential safety challenges as the complexity of applications increases.

2) CAMEL: Li et al. [24] present CAMEL, a novel framework aimed at enhancing the autonomous cooperation of communicative agents in chat-based language models. As these models continue to evolve, their effectiveness often hinges on human input to guide conversations, which can be a daunting and time-consuming task. The authors propose a role-playing approach that employs inception prompting to enable agents to work collaboratively toward task completion while staying aligned with human intentions. This framework not only facilitates the generation of conversational data but also serves as a valuable resource for exploring the behaviors and capabilities of a society of agents, particularly in multi-agent settings focused on instruction-following cooperation.

The paper highlights the significance of autonomous cooperation among communicative agents and delineates the challenges that accompany it, such as conversation deviation, role flipping, and defining termination conditions. The role-playing framework offers a scalable solution to these challenges, allowing agents to engage in effective collaboration with minimal human intervention. The authors conducted comprehensive evaluations to assess the framework's effectiveness, demonstrating that it leads to better outcomes in task completion. Additionally, their open-sourced library includes implementations of various agents, data generation pipelines, and analytical tools, thus fostering research on communicative agents and advancing the understanding of cooperative behaviors in multi-agent systems.

By providing insights into the complexities of agent interactions and the dynamics of cooperative AI systems, this work significantly contributes to the growing field of large language models. The framework not only emphasizes the potential for autonomous agent collaboration but also sets the stage for future research endeavors aimed at improving the scalability and efficacy of communicative agents in diverse applications. With CAMEL, Li et al. pave the way for more sophisticated interactions among agents, enhancing the capabilities of language models and their applications in real-world scenarios.

3) CrewAI: In the paper by Berti et al. [25], the CrewAI framework is introduced as a crucial component for implementing the AI-Based Agents Workflow (AgWf) paradigm, aimed at enhancing process mining (PM) tasks through the integration of Large Language Models (LLMs). CrewAI serves as a Python framework that facilitates the design and execution of AgWf, enabling developers to harness the capabilities of LLMs in a structured manner. The framework is built upon several key concepts: AI-based agents, AI-based tasks, and tools. AI-based agents combine LLMs with tailored system prompts, effectively aligning the model's behavior with specific roles. This role prompting is essential for ensuring that the agents perform their designated tasks accurately.

Within the CrewAI framework, AI-based tasks are defined through textual instructions linked to these AI-based agents, allowing for a clear delineation of responsibilities. Furthermore, tools are implemented as Python classes or functions, which can be integrated into tasks based on their documentation strings, including input parameters and output types. The framework supports both traditional sequential execution of tasks and more complex concurrent execution through hierarchical processes, although further development is needed in this area. By decomposing complex PM tasks into simpler, manageable workflows, CrewAI aims to enhance the reasoning capabilities of LLMs, thus addressing the limitations that arise when these models are faced with intricate scenarios.

The CrewAI framework exemplifies a modern approach to leveraging AI for process mining by combining the strengths of LLMs with deterministic tools to produce high-quality outputs. The paper details various AI-based tasks that can be employed within CrewAI for PM applications, including prompt optimizers, ensembles, routers, evaluations, and output improvers. Through practical examples such as root cause analysis and bias detection in process mining event logs, Berti et al. demonstrate the potential of CrewAI to revolutionize how process mining tasks are approached in the era of AI-based agents. The framework not only provides a pathway for implementing effective workflows but also encourages further research into automating workflow definitions and enhancing agent evaluation frameworks.

4) MetaGPT: The paper by Hong et al. [26] introduces MetaGPT, an innovative meta-programming framework designed to enhance collaboration among multi-agent systems built on large language models (LLMs). Existing LLM-based multi-agent systems excel at simple dialogue tasks but struggle with complex scenarios due to logic inconsistencies and cascading hallucinations that arise from naively chaining LLMs together. MetaGPT addresses these challenges by incorporating efficient human workflows through the encoding of Standardized Operating Procedures (SOPs) into structured prompt sequences. This approach allows agents with human-like domain expertise to verify intermediate results, thus reducing errors and improving overall performance.

A key feature of MetaGPT is its assembly line paradigm, which efficiently assigns diverse roles to various agents, breaking down complex tasks into manageable subtasks that promote effective collaboration. The framework emphasizes role specialization and structured communication, enhancing the agents' ability to interact and share information. By implementing a communication protocol that includes structured interfaces and a publish-subscribe mechanism, agents can access relevant information from other roles and the environment, thereby streamlining the workflow and facilitating a more coherent solution generation process.

MetaGPT represents a significant advancement in the development of LLM-based multi-agent systems, combining flexibility and convenience with robust functionality. The integration of human-like SOPs within the framework minimizes unproductive collaboration, while the novel executable feedback mechanism allows for real-time debugging and code execution during runtime, leading to notable improvements in code generation quality. MetaGPT's impressive performance on benchmarks like HumanEval and MBPP underscores its potential as a valuable tool for future research and application in multi-agent collaborations, paving the way for more effective and coherent solutions in complex problem-solving scenarios.

5) LangGraph: The LangGraph framework [27] emerges as a powerful tool for developing advanced Retrieval-Augmented Generation (RAG) systems, particularly for knowledge-based question-answering (QA) applications. Unlike traditional RAG models that often suffer from accuracy degradation due to their reliance on static pre-loaded knowledge, LangGraph leverages graph technology to enhance the information retrieval process. By enabling efficient searches and evaluations of the reliability of retrieved data, LangGraph significantly improves the contextual understanding and accuracy of generated responses. This innovative approach not only mitigates the limitations of existing RAG models but also facilitates the integration of real-time data, allowing for a more dynamic and accurate information synthesis process.

LangGraph stands out among other frameworks by providing a stateful, multi-actor application environment specifically designed for LLMs. Its capability to create agent workflows as cyclic graph structures allows developers to define intricate flows and control the state of the application, which is essential for building reliable agents. The LangGraph Conversational Retrieval Agent further enhances this by incorporating language processing, AI model integration, and graph-based data management, making it an ideal option for crafting sophisticated language-based AI applications. Its architecture encourages collaborative interactions among agents, ensuring that complex tasks are handled with precision and reliability.

Overall, the implementation of the LangGraph framework within the context of advanced RAG systems offers a compelling advantage over previously mentioned frameworks. Its focus on creating cyclic workflows not only allows for a more robust and efficient handling of multi-agent tasks but also significantly elevates the quality of responses through improved data processing and reliability assessment. The framework's ability to enhance real-time data accessibility and support diverse question types positions it as an invaluable resource for developing high-quality generative AI services, particularly in customer support and information retrieval applications. As demonstrated in Jeong's study, LangGraph provides a crucial foundation for advancing the capabilities of RAG-based systems, making it a preferred choice for researchers and developers in the field.
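The cyclic-workflow idea at the heart of LangGraph, where nodes transform a shared state and a conditional edge can route back for another retrieval pass, can be imitated in plain Python. The sketch below is a toy stand-in for the concept, not the actual LangGraph API:

```python
# Toy cyclic workflow: nodes transform a shared state dict, and a router
# decides whether to loop back (re-retrieve) or finish.
# This imitates the idea behind LangGraph's cyclic graphs, not its real API.

def retrieve(state):
    state["docs"] = state.get("docs", 0) + 1  # pretend we fetched one more doc
    return state

def grade(state):
    # Reliability check: demand at least 3 documents before answering.
    state["reliable"] = state["docs"] >= 3
    return state

def answer(state):
    state["answer"] = f"answer based on {state['docs']} docs"
    return state

def run(state):
    node = "retrieve"
    while node != "END":
        if node == "retrieve":
            state, node = retrieve(state), "grade"
        elif node == "grade":
            state = grade(state)
            node = "answer" if state["reliable"] else "retrieve"  # the cycle
        elif node == "answer":
            state, node = answer(state), "END"
    return state

print(run({})["answer"])  # answer based on 3 docs
```

The loop from the grading node back to retrieval is exactly what a linear RAG pipeline cannot express; in LangGraph this routing would be declared as a conditional edge on the graph rather than hand-written control flow.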
III. METHODOLOGY
This review aimed to systematically evaluate and synthesize
existing research on large language model (LLM) multi-
agent systems, specifically addressing the aspects that directly
support their application and scalability, while ensuring that
a methodology consistent with the standard practice of
the scientific method was followed.
To focus the scope effectively, research questions were
first defined to limit the exploration to technologies explicitly
designed for LLM multi-agent systems rather than those per-
taining to LLMs and multi-agent systems independently. This
choice allowed for a thorough examination of the unique inter-
sections between LLMs and multi-agent interactions, avoiding
the dilution of findings across broader, less targeted studies.
To cover the breadth of critical topics in the field, four
primary aspects were identified: Architecture, Memory, Plan-
ning, and Technologies/Frameworks. Each aspect addresses an
essential component in the design and operation of LLM-based
multi-agent systems, reflecting the distinctive requirements and
challenges of these systems.
The literature search was conducted across multiple well-
regarded academic sources, including Google Scholar, IEEE
Xplore, and arXiv, using targeted keywords associated with
each of the four topics. Among these, arXiv proved to be the
most valuable repository, providing a high concentration of
relevant papers that detailed recent developments and experi-
mental applications of LLM-based multi-agent systems. Each
paper was evaluated for its relevance to the identified topics,
with priority given to publications from credible authors and reputable conferences or journals. This selection process ensured that the reviewed literature included the most influential and innovative work within the field.

For each shortlisted paper, detailed content analysis was performed, with particular attention to descriptions of methods, architectures, and experimental setups. The merits and limitations of each approach were recorded, allowing for a comprehensive understanding of current capabilities, typical challenges, and areas with potential for improvement. This analysis informed recommendations for researchers and engineers in the field, offering guidance on optimal practices and common pitfalls in developing LLM-based multi-agent systems.

The findings from this review are synthesized to highlight prominent trends and identify future research opportunities, such as advancements in scalability and robustness. The focus on these emerging needs aims to guide ongoing research efforts in building systems that can effectively manage complex multi-agent interactions. An overview of this methodology is depicted in the following diagram.

Fig. 1: Overview of Survey Methodology

IV. DISCUSSION

A. Key Findings

Upon analyzing various architectural approaches for LLM-based multi-agent systems, the "Mixture of Agents" (MoA) architecture proposed by Wang et al. emerges as a highly effective design for achieving sophisticated collaboration among agents. MoA's layered model differentiates agents into proposer and aggregator roles, with proposers generating diverse responses and aggregators synthesizing them into cohesive, high-quality outputs. This architecture enhances LLM performance across benchmarks like AlpacaEval 2.0 and MT-Bench, underscoring its versatility. By allowing specialized roles for different agents, MoA maximizes the strengths of various LLMs. However, MoA does have limitations, as not all LLMs function equally well in both roles; for instance, WizardLM excels as a proposer but struggles in the aggregator capacity. Furthermore, scaling MoA with additional agents introduces complexity, suggesting that future implementations may benefit from advanced orchestration techniques to manage the increasing number of interactions effectively.

Regarding memory in LLM multi-agent systems, the analysis found that various memory approaches could be equally applicable, depending on the specific use case. Short-term memory models, for example, excel in scenarios where agents need rapid access to recent information but do not necessarily require extensive historical context. Conversely, long-term memory models are valuable for applications that demand
TABLE IV: Comparison of Technologies and Frameworks in LLM Multi-Agent Systems

Framework | Key Features | Strengths | Limitations
AutoGen | Open-source framework for multi-agent conversations; Supports diverse modes of agent interaction; Combines LLMs, human inputs, and tools. | Enhances multi-agent cooperation; Flexible definition of agent behaviors; Outperforms state-of-the-art approaches. | Still in early development; Integration of existing agent implementations requires further research.
CAMEL | Role-playing approach with inception prompting; Facilitates task completion among communicative agents; Generates conversational data. | Autonomous cooperation reduces human input; Scalable solution to conversation challenges; Includes open-sourced agent implementations. | Challenges with conversation deviation and role flipping; Requires evaluation in diverse contexts.
CrewAI | Framework for AI-Based Agents Workflow (AgWf); Integrates LLMs with AI-based tasks and tools; Supports sequential and concurrent task execution. | Enhances reasoning capabilities of LLMs; Breaks down complex tasks into manageable workflows; Encourages high-quality outputs. | Further development needed for concurrent execution; Manual design of agents can be complex.
MetaGPT | Meta-programming framework that integrates Standardized Operating Procedures (SOPs); Assigns roles to agents and enhances structured communication. | Reduces errors through human-like verification; Improves collaboration and solution generation; Notable performance improvements in benchmarks. | Complexity in implementation; Reliance on structured prompts may limit flexibility.
LangGraph | Enhances Retrieval-Augmented Generation (RAG) systems with graph technology; Allows cyclic workflows for agent applications. | Improves accuracy and contextual understanding; Enables real-time data integration; Supports collaborative interactions among agents. | Complexity in managing cyclic workflows; Reliance on graph structures may pose a learning curve.
more in-depth information retention over extended interactions. The choice of memory architecture should align with the intended function of the system, as this can have a significant impact on performance and scalability, especially in cases that require high responsiveness or nuanced historical recall.

For planning, the ReAct framework stands out as a preferred approach for integrating reasoning and action planning within LLM-based multi-agent systems. By enabling an interleaved generation of reasoning traces and task-specific actions, ReAct effectively addresses traditional limitations in complex task handling. This framework synergizes reasoning and action, allowing the model to update and manage plans while interacting with external sources like knowledge bases. ReAct has demonstrated its strengths across diverse tasks, including multi-hop question answering and fact verification, by mitigating common issues such as hallucination and error propagation. While ReAct provides an elegant solution for task planning, it could benefit from enhancements in action space handling, with possibilities for multi-task training or integration with reinforcement learning to expand its robustness and applicability.

In terms of technologies and frameworks for developing LLM-based multi-agent systems, this review found that the choice of framework often depends on the requirements of the application rather than any inherent superiority of one framework over another. Factors such as ease of integration, support for specific programming languages, scalability, and compatibility with external data sources play a significant role in determining which framework is optimal. As the field continues to evolve, the adaptability of frameworks to new advances and compatibility with emerging tools for LLMs will be crucial for their sustained utility.

B. Future Directions

In summary, this review has identified the Mixture of Agents architecture and the ReAct planning framework as highly effective strategies for designing and managing LLM-based multi-agent systems. Memory and technology choices remain largely application-specific, underscoring the importance of aligning system components with the desired outcomes. Together, these findings present a roadmap for future developments in LLM multi-agent systems, where ongoing research and refined frameworks will likely contribute to increasingly robust, versatile applications in this domain.

V. CONCLUSION

The development of LLM-based multi-agent systems marks a significant step forward in enabling complex, collaborative AI applications. This review has identified core frameworks and methodologies that enhance system effectiveness, such as the Mixture of Agents (MoA) for structured agent collaboration and the ReAct framework for integrating reasoning and action. Memory architectures within multi-agent systems remain diverse, with application-specific requirements determining the most suitable approach for short-term or long-term data retention. Technology choices hinge on application demands, favoring adaptable frameworks capable of integrating external data and supporting scalability. Although these systems exhibit great potential, ongoing challenges, including computational costs, communication optimization, and role specialization, must be addressed for broader applicability. Future research and practical advancements are essential to evolving these frameworks, promoting resilient and flexible LLM-based multi-agent systems poised to support sophisticated AI-driven workflows in varied domains.
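As a concrete companion to the discussion of ReAct above, the interleaved thought/action/observation loop can be sketched in a few lines. This is a toy illustration only: the `policy` function is a hard-coded stand-in for an LLM, and the tool table and knowledge-base entries are invented for the example, not part of the original ReAct implementation.

```python
# Toy sketch of a ReAct-style loop: the agent alternates a reasoning
# trace ("Thought") with a tool call ("Action"), feeding each tool's
# result ("Observation") back into the context for the next step.

def lookup(term: str) -> str:
    # Stub knowledge base standing in for an external information source.
    kb = {"Colorado orogeny": "mountain building in Colorado, "
                              "elevation range 1,800 to 3,000 m"}
    return kb.get(term, "no result")

def policy(context: list) -> tuple:
    # Hard-coded stand-in for the LLM: emit a thought and the next action.
    if not context:
        return ("I should look up the Colorado orogeny.",
                ("search", "Colorado orogeny"))
    return ("The observation answers the question.",
            ("finish", "1,800 to 3,000 m"))

actions = {"search": lookup}  # hypothetical tool table

def react(question: str, max_steps: int = 5) -> str:
    context = []
    for _ in range(max_steps):
        thought, (name, arg) = policy(context)
        if name == "finish":
            return arg  # terminate with the final answer
        observation = actions[name](arg)
        context.append((thought, name, arg, observation))
    return "no answer"

print(react("What is the elevation range of the Colorado orogeny?"))
```

Because the observation from each tool call re-enters the context before the next thought is generated, the plan can be revised mid-task, which is the property the discussion credits with reducing hallucination and error propagation.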
