Large Language Model based Multi-Agents: A Survey of Progress and Challenges

Taicheng Guo1, Xiuying Chen2, Yaqi Wang3∗, Ruidi Chang4∗, Shichao Pei5, Nitesh V. Chawla1, Olaf Wiest1, Xiangliang Zhang1†
1 University of Notre Dame
2 King Abdullah University of Science and Technology
3 Southern University of Science and Technology
4 Unaffiliated
5 University of Massachusetts Boston
{tguo2, nchawla, owiest, xzhang33}@nd.edu, [email protected], [email protected], [email protected], [email protected]
arXiv:2402.01680v1 [cs.CL]
∗ This work was done when Yaqi and Ruidi were visiting students at the University of Notre Dame.
† Corresponding author.
Abstract

Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to the impressive planning and reasoning abilities of LLMs, they have been used as autonomous agents to do many tasks automatically. Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in complex problem-solving and world simulation. To provide the community with an overview of this dynamic field, we present this survey to offer an in-depth discussion on the essential aspects of multi-agent systems based on LLMs, as well as the challenges. Our goal is for readers to gain substantial insights on the following questions: What domains and environments do LLM-based multi-agents simulate? How are these agents profiled and how do they communicate? What mechanisms contribute to the growth of agents' capacities? For those interested in delving into this field of study, we also summarize the commonly used datasets and benchmarks for convenient access. To keep researchers updated on the latest studies, we maintain an open-source GitHub repository dedicated to outlining the research on LLM-based multi-agent systems.

1 Introduction

Large Language Models (LLMs) have recently shown remarkable potential in reaching a level of reasoning and planning capabilities comparable to humans. This ability exactly aligns with the expectations of humans for autonomous agents that can perceive the surroundings, make decisions, and take actions in response [Xi et al., 2023; Wooldridge and Jennings, 1995; Russell and Norvig, 2009; Guo et al., 2023; Liang et al., 2023]. Hence, LLM-based agents have been studied and rapidly developed to understand and generate human-like instructions, facilitating sophisticated interactions and decision-making in a wide range of contexts [Yao et al., 2023; Shinn et al., 2023; Li et al., 2023d]. Timely survey papers systematically summarize the progress of LLM-based agents, as seen in works such as [Xi et al., 2023; Wang et al., 2023b].

Based on the inspiring capabilities of the single LLM-based agent, LLM-based Multi-Agents have been proposed to leverage the collective intelligence and the specialized profiles and skills of multiple agents. Compared to systems using a single LLM-powered agent, multi-agent systems offer advanced capabilities by 1) specializing LLMs into various distinct agents, each with different capabilities, and 2) enabling interactions among these diverse agents to simulate complex real-world environments effectively. In this context, multiple autonomous agents collaboratively engage in planning, discussions, and decision-making, mirroring the cooperative nature of human group work in problem-solving tasks. This approach capitalizes on the communicative capabilities of LLMs, leveraging their ability to generate text for communication and respond to textual inputs. Furthermore, it exploits LLMs' extensive knowledge across various domains and their latent potential to specialize in specific tasks. Recent research has demonstrated promising results in utilizing LLM-based multi-agents for solving various tasks, such as software development [Hong et al., 2023; Qian et al., 2023], multi-robot systems [Mandi et al., 2023; Zhang et al., 2023c], society simulation [Park et al., 2023; Park et al., 2022], policy simulation [Xiao et al., 2023; Hua et al., 2023], and game simulation [Xu et al., 2023c; Wang et al., 2023c]. Due to the interdisciplinary nature of this field, it has attracted a diverse range of researchers, expanding beyond AI experts to include those from social science, psychology, and policy research. The volume of research papers is rapidly increasing, as shown in Fig. 1 (inspired by the design in [Gao et al., 2023b]), thus broadening the impact of LLM-based Multi-Agent research.
Figure 1: The rising trend in the research field of LLM-based Multi-Agents. For Problem Solving and World Simulation, we categorize
current work into several categories and count the number of papers of different types at 3-month intervals. The number at each leaf node
denotes the count of papers within that category.
Nonetheless, earlier efforts were undertaken independently, resulting in an absence of a systematic review to summarize them, establish a comprehensive blueprint of this field, and examine future research challenges. This underscores the significance of our work and serves as the motivation behind presenting this survey paper, dedicated to the research on LLM-based multi-agent systems.

We expect that our survey can make significant contributions to both the research and development of LLMs and to a wider range of interdisciplinary studies employing LLMs. Readers will gain a comprehensive overview of LLM-based Multi-Agent (LLM-MA) systems, grasp the fundamental concepts involved in establishing multi-agent systems based on LLMs, and catch the latest research trends and applications in this dynamic field. We recognize that this field is in its early stages and is rapidly evolving with fresh methodologies and applications. To provide a sustainable resource complementing our survey paper, we maintain an open-source GitHub repository1. We hope that our survey will inspire further exploration and innovation in this field, as well as applications across a wide array of research disciplines.

To assist individuals from various backgrounds in understanding LLM-MA techniques and to complement existing surveys by tackling unresolved questions, we have organized our survey paper in the following manner. After laying out the background knowledge in Section 2, we address a pivotal question: How are LLM-MA systems aligned with the collaborative task-solving environment? To answer this, we present a comprehensive schema for positioning, differentiating, and connecting various aspects of LLM-MA systems in Section 3. We delve into this question by discussing: 1) the agents-environment interface, which details how agents interact with the task environment; 2) agent profiling, which explains how an agent is characterized by an LLM to behave in specific ways; 3) agent communication, which examines how agents exchange messages and collaborate; and 4) agent capability acquisition, which explores how agents develop their abilities to effectively solve problems. An additional perspective for reviewing studies about LLM-MA is their application. In Section 4, we categorize current applications into two primary streams: multi-agents for problem-solving and multi-agents for world simulation. To guide individuals in identifying appropriate tools and resources, we present open-source implementation frameworks for studying LLM-MA, as well as the usable datasets and benchmarks, in Section 5. Based on the previous summary, we open the discussion for future research challenges and opportunities in Section 6. The conclusions are summarized in Section 7.

1 https://github.com/taichengguo/LLM_MultiAgents_Survey_Papers

2 Background

2.1 Single-Agent Systems Powered by LLMs
We introduce the background by first outlining the capabilities of a single-agent system based on LLMs, following the discussion presented in [Weng, 2023].
Decision-making Thought: This term denotes the capability of LLM-based agents, guided by prompts, to break down complex tasks into smaller subgoals [Khot et al., 2023], think through each part methodically (sometimes exploring multiple paths) [Yao et al., 2023], and learn from past experiences [Shinn et al., 2023] to perform better decision-making on complex tasks. This capability enhances the autonomy of a single LLM-based agent and bolsters its effectiveness in problem-solving.

Tool-use: LLM-based agents' tool-use capability allows them to leverage external tools and resources to accomplish tasks, enhancing their functional capabilities and enabling them to operate more effectively in diverse and dynamic environments [Li et al., 2023d; Ruan et al., 2023; Gao et al., 2023b].

Memory: This ability refers to the capability of LLM-based agents to conduct in-context learning [Dong et al., 2023a] as short-term memory, or to use an external vector database [Lewis et al., 2021] as long-term memory, in order to preserve and retrieve information over prolonged periods [Wang et al., 2023b]. This ability enables a single LLM-based agent to maintain contextual coherence and enhance learning from interactions.
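To make the short-term/long-term memory distinction concrete, the sketch below pairs a bounded in-context message buffer with a simple embedding-based long-term store. It is a minimal illustration under our own assumptions, not the design of any specific system surveyed here; the embed function is a stand-in for whatever embedding model an implementation would use.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model (e.g., a sentence encoder)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(128)

class AgentMemory:
    def __init__(self, short_term_limit: int = 10):
        self.short_term = []          # recent messages kept verbatim in the prompt
        self.short_term_limit = short_term_limit
        self.long_term = []           # (embedding, text) pairs in a vector store

    def remember(self, text: str) -> None:
        self.short_term.append(text)
        if len(self.short_term) > self.short_term_limit:
            # Evict the oldest message into long-term storage.
            evicted = self.short_term.pop(0)
            self.long_term.append((embed(evicted), evicted))

    def recall(self, query: str, k: int = 3) -> list:
        """Retrieve the k most similar long-term memories for the current query."""
        if not self.long_term:
            return []
        q = embed(query)
        scored = sorted(
            self.long_term,
            key=lambda item: float(np.dot(item[0], q)),
            reverse=True,
        )
        return [text for _, text in scored[:k]]

    def build_context(self, query: str) -> str:
        """Compose the prompt context from retrieved and recent memories."""
        retrieved = self.recall(query)
        return "\n".join(retrieved + self.short_term + [query])
```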
2.2 Single-Agent VS. Multi-Agent Systems
Single-Agent systems empowered by LLMs have shown inspiring cognitive abilities [Sumers et al., 2023]. The construction of such systems concentrates on formulating their internal mechanisms and interactions with the external environment. Conversely, LLM-MA systems emphasize diverse agent profiles, inter-agent interactions, and collective decision-making processes. From this perspective, more dynamic and complex tasks can be tackled by the collaboration of multiple autonomous agents, each of which is equipped with unique strategies and behaviors and engaged in communication with one another.

3 Dissecting LLM-MA Systems: Interface, Profiling, Communication, and Capabilities

In this section, we delve into the intricacies of LLM-MA systems, where multiple autonomous agents engage in collaborative activities akin to human group dynamics in problem-solving scenarios. A critical inquiry we address is how these LLM-MA systems are aligned to their operational environments and the collective objectives they are designed to achieve. To shed light on this, we present the general architecture of these systems in Fig. 2. Our analysis dissects the operational framework of these systems, focusing on four key aspects: the agents-environment interface, agent profiling, agent communication, and agent capability acquisition.

3.1 Agents-Environment Interface
The operational environment defines the specific context or setting in which an LLM-MA system is deployed and interacts. For example, these environments can be software development [Hong et al., 2023], gaming [Mao et al., 2023], and various other domains such as financial markets [Li et al., 2023g] or even social behavior modeling [Park et al., 2023]. The LLM-based agents perceive and act within the environment, which in turn influences their behavior and decision-making. For example, in the Werewolf Game simulation, the sandbox environment sets the game's framework, including transitions from day to night, discussion periods, voting mechanics, and reward rules. Agents, such as werewolves and the Seer, perform specific actions like killing or checking roles. Following these actions, agents receive feedback from the environment, informing them of the game's current state. This information guides the agents in adjusting their strategies over time, responding to the evolving gameplay and interactions with other agents. The Agents-Environment Interface refers to the way in which agents interact with and perceive the environment. It is through this interface that agents understand their surroundings, make decisions, and learn from the outcomes of their actions. We categorize the current interfaces in LLM-MA systems into three types, Sandbox, Physical, and None, as detailed in Table 1. The Sandbox refers to a simulated or virtual environment built by humans, where agents can interact more freely and experiment with various actions and strategies. This kind of interface is widely used in software development (a code interpreter as the simulated environment) [Hong et al., 2023], gaming (using game rules as the simulated environment) [Mao et al., 2023], etc. The Physical interface is a real-world environment where agents interact with physical entities and obey real-world physics and constraints. In physical space, agents normally need to take actions that have direct physical outcomes. For example, in tasks such as sweeping the floor, making sandwiches, packing groceries, and arranging cabinets, robotic agents are required to perform actions iteratively, observe the physical environment, and continuously refine their actions [Mandi et al., 2023]. Lastly, None refers to scenarios where there is no specific external environment and agents do not interact with any environment. For example, many applications [Du et al., 2023; Xiong et al., 2023; Chan et al., 2023] utilize multiple agents to debate a question to reach a consensus. These applications primarily focus on communication among agents and do not depend on an external environment.

3.2 Agents Profiling
In LLM-MA systems, agents are defined by their traits, actions, and skills, which are tailored to meet specific goals. Across various systems, agents assume distinct roles, each with comprehensive descriptions encompassing characteristics, capabilities, behaviors, and constraints. For instance, in gaming environments, agents might be profiled as players with varying roles and skills, each contributing differently to the game's objectives. In software development, agents could take on the roles of product managers and engineers, each with responsibilities and expertise that guide the development process. Similarly, in a debating platform, agents might be designated as proponents, opponents, or judges, each with unique functions and strategies to fulfill their roles effectively. These profiles are crucial for defining the agents' interactions and effectiveness within their respective environments. Table 1 lists the agent profiles in recent LLM-MA works. Regarding the agent profiling methods, we categorize them into three types: Pre-defined, Model-Generated, and Data-Derived.
Figure 2: The Architecture of LLM-MA Systems.
Table 1: Summary of the LLM-MA studies. We categorize current work according to their motivation, research domains and goals, and detail
each work from different aspects regarding Agents-Environment Interface, Agents Profiling, Agents Communication and Agents Capability
Acquisition. “-” denotes that a particular element is not specifically mentioned in this work.
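The observe-act-feedback cycle of the Sandbox interface described in Section 3.1 can be sketched as a minimal loop. The snippet below is illustrative only and is not taken from any of the surveyed systems; query_llm stands in for a call to whatever LLM backend is used.

```python
from dataclasses import dataclass, field

def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM API; returns the agent's chosen action."""
    raise NotImplementedError

@dataclass
class SandboxEnv:
    """A toy sandbox: it owns the rules, applies actions, and returns observations."""
    phase: str = "day"
    log: list = field(default_factory=list)

    def observe(self, agent_name: str) -> str:
        return f"Phase: {self.phase}. Public events so far: {self.log[-5:]}"

    def step(self, agent_name: str, action: str) -> str:
        # The environment checks the action against its rules and updates its state.
        self.log.append(f"{agent_name}: {action}")
        self.phase = "night" if self.phase == "day" else "day"
        return f"Action accepted. It is now {self.phase}."

@dataclass
class Agent:
    name: str
    role: str  # e.g., "werewolf" or "seer" in a Werewolf-style game

    def act(self, observation: str) -> str:
        prompt = (f"You are {self.name}, playing the role of {self.role}.\n"
                  f"Observation: {observation}\nDecide your next action.")
        return query_llm(prompt)

def run_episode(env: SandboxEnv, agents: list, rounds: int = 3) -> None:
    """Each round, every agent observes, acts, and receives environment feedback."""
    for _ in range(rounds):
        for agent in agents:
            obs = env.observe(agent.name)
            action = agent.act(obs)
            feedback = env.step(agent.name, action)
            # Feedback can be folded back into the agent's memory/prompt here.
```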
4.1.1 Software Development
Given that software development is a complex endeavor requiring the collaboration of various roles like product managers, programmers, and testers, LLM-MA systems are typically set to emulate these distinct roles and collaborate to address this intricate challenge. Following the waterfall or Standardized Operating Procedures (SOPs) workflow of software development, the communication structure among agents is usually layered. Agents generally interact with the code interpreter, other agents, or humans to iteratively refine the generated code. [Li et al., 2023b] first proposes a simple role-play agent framework, which utilizes the interplay of two roles to realize autonomous programming based on a one-sentence user instruction. It provides insights into the "cognitive" processes of communicative agents. [Dong et al., 2023b] makes LLMs work as distinct "experts" for sub-tasks in software development, autonomously collaborating to generate code. Moreover, [Qian et al., 2023] presents an end-to-end framework for software development, utilizing multiple agents without incorporating advanced human teamwork experience. [Hong et al., 2023] first incorporates human workflow insights for more controlled and validated performance, encoding SOPs into prompts to enhance structured coordination. [Huang et al., 2023a] delves deeper into multi-agent-based programming by solving the problem of balancing code snippet generation with effective test case generation, execution, and optimization.
4.1.2 Embodied Agents
Most embodied-agent applications inherently utilize multiple robots working together to perform complex real-world planning and manipulation tasks, such as warehouse management with heterogeneous robot capabilities. Hence, LLM-MA can be used to model robots with different capabilities that cooperate with each other to solve real-world physical tasks. [Dasgupta et al., 2023] first explores the potential of using an LLM as an action planner for embodied agents. [Mandi et al., 2023] introduces RoCo, a novel approach for multi-robot collaboration that uses LLMs for high-level communication and low-level path planning. Each robotic arm is equipped with an LLM, cooperating with inverse kinematics and collision checking. Experimental results demonstrate the adaptability and success of RoCo in collaborative tasks. [Zhang et al., 2023c] presents CoELA, a Cooperative Embodied Language Agent, managing discussions and task planning in an LLM-MA setting. This challenging setting features decentralized control, complex partial observation, costly communication, and multi-objective long-horizon tasks. [Chen et al., 2023d] investigates communication challenges in scenarios involving a large number of robots, as assigning each robot an LLM would be costly and impractical due to the long context. The study compares four communication frameworks, centralized, decentralized, and two hybrid models, to evaluate their effectiveness in coordinating complex multi-agent tasks. [Yu et al., 2023] proposes Co-NavGPT for multi-robot cooperative visual target navigation, integrating an LLM as a global planner to assign frontier goals to each robot. [Chen et al., 2023b] proposes an LLM-based consensus-seeking framework, which can be applied as a cooperative planner to a multi-robot aggregation task.
4.1.3 Science Experiments
Just as multiple agents play different specialists and cooperate to solve the Software Development and Embodied Agents problems, multiple agents can also be used to form a science team that conducts science experiments. One important difference from the previous applications lies in the crucial role of human oversight, due to the high expense of science experiments and the hallucination of LLM agents. Human experts are at the center of these agents to process the information from the agents and give feedback to them. [Zheng et al., 2023] utilizes multiple LLM-based agents, each focusing on specific tasks for the science experiments, including strategy planning, literature search, coding, robotic operations, and labware design. All these agents interact with humans to work collaboratively to optimize the synthesis process of complex materials.

4.1.4 Science Debate
LLM-MA can be set up for science debating scenarios, where agents debate with each other to enhance the collective reasoning capabilities in tasks such as Massive Multitask Language Understanding (MMLU) [Hendrycks et al., 2020], math problems [Cobbe et al., 2021], and StrategyQA [Geva et al., 2021]. The main idea is that each agent initially offers its own analysis of a problem, which is then followed by a joint debating process. Through multiple rounds of debate, the agents converge on a single, consensus answer. [Du et al., 2023] leverages the multi-agent debate process on a set of six different reasoning and factual accuracy tasks and demonstrates that LLM-MA debating can improve factuality. [Xiong et al., 2023] focuses on commonsense reasoning tasks and formulates a three-stage debate to align with real-world scenarios, including fair debate, mismatched debate, and roundtable debate. The paper also analyzes the inter-consistency between different LLMs and claims that debating can improve inter-consistency. [Tang et al., 2023] also utilizes multiple LLM-based agents as distinct domain experts to hold a collaborative discussion on a medical report to reach a consensus for medical diagnosis.
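The debate pattern described above (independent first answers, followed by rounds in which each agent revises its answer after reading the others') can be sketched as follows. This is a schematic outline in the spirit of [Du et al., 2023], not their released code; query_llm is a placeholder for the underlying model call.

```python
def query_llm(prompt: str) -> str:
    """Placeholder for the underlying LLM call."""
    raise NotImplementedError

def multi_agent_debate(question: str, n_agents: int = 3, n_rounds: int = 2) -> list:
    # Round 0: each agent answers independently.
    answers = [
        query_llm(f"Answer the question step by step.\nQuestion: {question}")
        for _ in range(n_agents)
    ]
    # Debate rounds: each agent revises its answer after reading the others'.
    for _ in range(n_rounds):
        new_answers = []
        for i in range(n_agents):
            others = "\n\n".join(
                f"Agent {j}: {ans}" for j, ans in enumerate(answers) if j != i
            )
            prompt = (
                f"Question: {question}\n"
                f"Other agents answered:\n{others}\n"
                f"Your previous answer: {answers[i]}\n"
                "Considering the other answers, give your updated final answer."
            )
            new_answers.append(query_llm(prompt))
        answers = new_answers
    return answers  # a final consensus can be taken, e.g., by majority vote
```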
4.2 LLM-MA for World Simulation
Another mainstream application scenario of LLM-MA is world simulation. Research in this area is rapidly growing and spans a diverse range of fields, including social sciences, gaming, psychology, economics, policy-making, etc. The key reason for employing LLM-MA in world simulations lies in their exceptional role-playing abilities, which are crucial for realistically depicting various roles and viewpoints in a simulated world. The environment of a world simulation project is usually crafted to reflect the specific scenario being simulated, with agents designed in various profiles to match this context. Unlike the problem-solving systems that focus on agent cooperation, world simulation systems involve diverse methods of agent management and communication, reflecting the complexity and variety of real-world interactions. Next, we explore simulations conducted in diverse fields.

4.2.1 Societal Simulation
In societal simulation, LLM-MA models are used to simulate social behaviors, aiming to explore potential social dynamics and propagation, test social science theories, and populate virtual spaces and communities with realistic social phenomena [Park et al., 2023]. Leveraging LLMs' capabilities, agents with unique profiles engage in extensive communication, generating rich behavioral data for in-depth social science analysis.
The scale of societal simulation has expanded over time, beginning with smaller, more intimate settings and progressing to larger, more intricate ones. Initial work by [Park et al., 2023] introduces generative agents within an interactive sandbox environment reminiscent of The Sims, allowing end users to engage with a modest community of 25 agents through natural language. At the same time, [Park et al., 2022] develops Social Simulacra, which constructs a simulated community of 1,000 personas. This system takes a designer's vision for a community—its goals, rules, and member personas—and simulates it, generating behaviors like posting, replying, and even anti-social actions. Building on this, [Gao et al., 2023a] takes the concept further by constructing vast networks comprising 8,563 and 17,945 agents, respectively, designed to simulate social networks focused on the topics of Gender Discrimination and Nuclear Energy. This evolution showcases the increasing complexity and size of simulated environments in recent research. Recent studies such as [Chen et al., 2023b; Kaiya et al., 2023; Li et al., 2023a; Li et al., 2023f; Ziems et al., 2023] highlight the evolving complexity in multi-agent systems, LLM impacts on social networks, and their integration into social science research.
4.2.2 Gaming
LLM-MA is well-suited for creating simulated gaming environments, allowing agents to assume various roles within games. This technology enables the development of controlled, scalable, and dynamic settings that closely mimic human interactions, making it ideal for testing a range of game theory hypotheses [Mao et al., 2023; Xu et al., 2023b]. Most games simulated by LLM-MA rely heavily on natural language communication, offering a sandbox environment within different game settings for exploring or testing game theory hypotheses including reasoning, cooperation, persuasion, deception, leadership, etc.
[Akata et al., 2023] leverages behavioral game theory to examine LLMs' behavior in interactive social settings, particularly their performance in games like the iterated Prisoner's Dilemma and Battle of the Sexes. Furthermore, [Xu et al., 2023b] proposes a framework using the ChatArena library [Wu et al., 2023b] for engaging LLMs in communication games like Werewolf, using retrieval and reflection on past communications for improvement, as well as the Chain-of-Thought mechanism [Wei et al., 2022]. [Light et al., 2023b] explores the potential of LLM agents in playing Resistance Avalon, introducing AVALONBENCH, a comprehensive game environment and benchmark for further developing advanced LLMs and multi-agent frameworks. [Wang et al., 2023c] also focuses on the capabilities of LLM agents in dealing with misinformation in the Avalon game, proposing the Recursive Contemplation (ReCon) framework to enhance LLMs' ability to discern and counteract deceptive information. [Xu et al., 2023c] introduces a framework combining LLMs with reinforcement learning (RL) to develop strategic language agents for the Werewolf game, presenting a new approach to using an RL policy when the action and state sets are not pre-defined but expressed in natural language. [Mukobi et al., 2023] designs "Welfare Diplomacy", a general-sum variant of the zero-sum board game Diplomacy, where players must balance military conquest and domestic welfare. It also offers an open-source benchmark, aiming to help improve the cooperation ability of multi-agent AI systems. On top of that, [Li et al., 2023c] studies a multi-agent cooperative text game testing the agents' Theory of Mind (ToM), the ability to reason about the concealed mental states of others, which is fundamental to human social interactions, collaborations, and communications. [Fan et al., 2023] comprehensively assesses the capability of LLMs as rational players and identifies weaknesses of LLM-based agents: even in an explicit game process, agents may still overlook or modify refined beliefs when taking actions.
4.2.3 Psychology
In psychological simulation studies, as in societal simulation, multiple agents are utilized to simulate humans with various traits and thought processes. However, unlike societal simulations, one approach in psychology involves directly applying psychological experiments to these agents. This method focuses on observing and analyzing their varied behaviors through statistical methods. Here, each agent operates independently, without interacting with others, essentially representing different individuals. Another approach aligns more closely with societal simulations, where multiple agents interact and communicate with each other. In this scenario, psychological theories are applied to understand and analyze the emergent behavioral patterns. This method facilitates the study of interpersonal dynamics and group behaviors, providing insights into how individual psychological traits influence collective actions. [Ma et al., 2023] explores the psychological implications and outcomes of employing LLM-based conversational agents for mental well-being support. It emphasizes the need for carefully evaluating the use of LLM-based agents in mental health applications from a psychological perspective. [Kovač et al., 2023] introduces a tool named the SocialAI school for creating interactive environments simulating social interactions. It draws from developmental psychology to understand how agents can acquire, demonstrate, and evolve social skills such as joint attention, communication, and cultural learning. [Zhang et al., 2023d] explores how LLM agents, with distinct traits and thinking patterns, emulate human-like social behaviors such as conformity and majority rule. This integration of psychology into the understanding of agent collaboration offers a novel lens for examining and enhancing the mechanisms behind LLM-based multi-agent systems. [Aher et al., 2023] introduces Turing Experiments to evaluate the extent to which large language models can simulate different aspects of human behavior. The Turing Experiments replicate classical experiments and phenomena in psychology, economics, and sociology using a question-answering format to mimic experimental conditions. They also design a prompt that simulates the responses of multiple different individuals by varying the name. By simulating various kinds of individuals via LLMs, they show that larger models replicate human behavior more faithfully, but they also reveal a hyper-accuracy distortion, especially in knowledge-based tasks.
4.2.4 Economy
LLM-MA is used to simulate economic and financial trading environments mainly because it can serve as an implicit computational model of humans. In these simulations, agents are provided with endowments and information, and set with pre-defined preferences, allowing for an exploration of their actions in economic and financial contexts. This is similar to the way economists model 'homo economicus', the characterization of man in some economic theories as a rational person who pursues wealth for his own self-interest [Horton, 2023]. Several studies demonstrate the diverse applications of LLM-MA in simulating economic scenarios, encompassing macroeconomic activities, information marketplaces, financial trading, and virtual town simulations. Agents interact in cooperative, debate-based, or decentralized environments. [Li et al., 2023e] employs LLMs for macroeconomic simulation, featuring prompt-engineering-driven agents that emulate human-like decision-making, thereby enhancing the realism of economic simulations compared to rule-based or other AI agents. [Anonymous, 2023] explores the buyer's inspection paradox in an information marketplace, revealing improved decision-making and answer quality when agents temporarily access information before purchase. [Li et al., 2023g] presents an LLM-MA framework for financial trading, emphasizing a layered memory system, debate mechanisms, and individualized trading characters, thereby fortifying decision-making robustness. [Zhao et al., 2023] utilizes LLM-based agents to simulate a virtual town with restaurant and customer agents, yielding insights aligned with sociological and economic theories. These studies collectively illuminate the broad spectrum of applications and advancements in employing LLMs for diverse economic simulation scenarios.
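A minimal sketch of the "implicit computational model of humans" idea: an agent is given an endowment and stated preferences in its prompt and is asked to make an economic decision. The prompt fields are hypothetical and query_llm is a placeholder; this is not the setup of any specific study cited above.

```python
def query_llm(prompt: str) -> str:
    """Placeholder for the underlying LLM call."""
    raise NotImplementedError

def simulated_consumer(endowment: float, preference: str, price: float) -> str:
    """Ask an LLM-profiled agent how much of a good it buys at a given price."""
    prompt = (
        f"You are a consumer with a budget of ${endowment:.2f}.\n"
        f"Your preference: {preference}.\n"
        f"The good costs ${price:.2f} per unit.\n"
        "How many units do you buy? Reply with a single integer and a short reason."
    )
    return query_llm(prompt)

# Sweeping the price lets the simulation trace out an implied demand curve, e.g.:
# responses = [simulated_consumer(100.0, "you value coffee highly", p) for p in (2, 4, 8)]
```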
Motivation | Domain | Dataset / Benchmark | Used by | Data Link
Problem Solving | Software Development | HumanEval | [Hong et al., 2023] | Link
Problem Solving | Software Development | MBPP | [Hong et al., 2023] | Link
Problem Solving | Software Development | SoftwareDev | [Hong et al., 2023] | Link
Problem Solving | Embodied AI | RoCoBench | [Mandi et al., 2023] | Link
Problem Solving | Embodied AI | Communicative Watch-And-Help (C-WAH) | [Zhang et al., 2023c] | Link
Problem Solving | Embodied AI | ThreeDWorld Multi-Agent Transport (TDW-MAT) | [Zhang et al., 2023c] | Link
Problem Solving | Embodied AI | HM3D v0.2 | [Yu et al., 2023] | Link
Problem Solving | Science Debate | MMLU | [Tang et al., 2023] | Link
Problem Solving | Science Debate | MedQA | [Tang et al., 2023] | Link
Problem Solving | Science Debate | PubMedQA | [Tang et al., 2023] | Link
Problem Solving | Science Debate | GSM8K | [Du et al., 2023] | Link
Problem Solving | Science Debate | StrategyQA | [Xiong et al., 2023] | Link
Problem Solving | Science Debate | Chess Move Validity | [Du et al., 2023] | Link
World Simulation | Society | SOTOPIA | [Zhou et al., 2023b] | /
World Simulation | Society | Gender Discrimination | [Gao et al., 2023a] | /
World Simulation | Society | Nuclear Energy | [Gao et al., 2023a] | /
World Simulation | Gaming | Werewolf | [Xu et al., 2023b] | /
World Simulation | Gaming | Avalon | [Light et al., 2023b] | /
World Simulation | Gaming | Welfare Diplomacy | [Mukobi et al., 2023] | /
World Simulation | Gaming | Layout in the Overcooked-AI environment | [Agashe et al., 2023] | /
World Simulation | Gaming | Chameleon | [Xu et al., 2023a] | Link
World Simulation | Gaming | Undercover | [Xu et al., 2023a] | Link
World Simulation | Psychology | Ultimatum Game TE | [Aher et al., 2023] | Link
World Simulation | Psychology | Garden Path TE | [Aher et al., 2023] | Link
World Simulation | Psychology | Wisdom of Crowds TE | [Aher et al., 2023] | Link
World Simulation | Recommender System | MovieLens-1M | [Zhang et al., 2023a] | Link
World Simulation | Recommender System | Amazon review dataset | [Zhang et al., 2023e] | /
World Simulation | Policy Making | Board Connectivity Evaluation | [Hua et al., 2023] | Link

Table 2: Datasets and Benchmarks commonly used in LLM-MA studies. "/" denotes the unavailability of a data link.
4.2.5 Recommender Systems
The use of LLM-MA in recommender systems is similar to that in psychology, since studies in both fields involve the consideration of extrinsic and intrinsic human factors such as cognitive processes and personality [Lex and Schedl, 2022]. One way to use LLM-MA in recommender systems is to directly introduce items to multiple LLM-based agents with diverse traits and collect statistics on the preferences of the different agents. Another way is to treat both users and items as agents and the user-item communication as interactions, simulating the preference propagation. To bridge the gap between offline metrics and real-world performance in recommendation systems, Agent4Rec [Zhang et al., 2023a] introduces a simulation platform based on LLM-MA. 1,000 generative agents are initialized with the MovieLens-1M dataset to simulate complex user interactions in a recommendation environment. Agent4Rec shows that LLM-MA can effectively mimic real user preferences and behaviors, provide insights into phenomena like the filter bubble effect, and help uncover causal relationships in recommendation tasks. In the Agent4Rec work, agents are used to simulate users and do not communicate with each other. Different from Agent4Rec, [Zhang et al., 2023e] treats both users and items as agents, optimizing them collectively to reflect and adjust to real-world interaction disparities. This work emphasizes simulating user-item interactions and propagating preferences among agents, capturing the essence of collaborative filtering.
4.2.6 Policy Making
Similar to simulations in gaming and economic scenarios, policy making requires strong decision-making capabilities in the face of realistic, dynamic, and complex problems. LLM-MA can be used to study policy making by simulating a virtual government or simulating the impact of various policies on different communities. These simulations provide valuable insights into how policies are formulated and their potential effects, aiding policymakers in understanding and anticipating the consequences of their decisions [Farmer and Axtell, 2022]. The research outlined in [Xiao et al., 2023] is centered on simulating a township water pollution crisis. It simulates a town located on an island, including a demographic structure of different agents as well as a township head and advisor. Within the water pollution crisis simulation, this work provides an in-depth analysis of how a virtual government entity might respond to such a public administration challenge and how information transfers through the social network during this crisis. [Hua et al., 2023] introduces WarAgent to simulate key historical conflicts and provides insights for conflict resolution and understanding, with potential applications in preventing future international conflicts.
4.2.7 Disease Propagation Simulation
The societal simulation capabilities of LLM-MA can also be leveraged to simulate disease propagation. The most recent study in [Williams et al., 2023] delves into the use of LLM-MA in simulating disease spread. The research showcases, through various simulations, how these LLM-based agents can accurately emulate human responses to disease outbreaks, including behaviors like self-quarantine and isolation during heightened case numbers. The collective behavior of these agents mirrors the complex patterns of multiple waves typically seen in pandemics, eventually stabilizing into an endemic state. Impressively, their actions contribute to the attenuation of the epidemic curve. [Ghaffarzadegan et al., 2023] also discusses epidemic propagation simulation and decomposes the simulation into two parts: the Mechanistic Model, which represents the information or propagation of the virus, and the Decision-Making Model, which represents the agents' decision-making process when facing the virus.
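The mechanistic-plus-decision-making split described in [Ghaffarzadegan et al., 2023] can be sketched as a loop in which disease transmission follows simple SIR-style rules while each agent's protective behavior is delegated to an LLM call. The code below is our own schematic illustration, not the authors' implementation; query_llm is a placeholder.

```python
import random

def query_llm(prompt: str) -> str:
    """Placeholder for the underlying LLM call; expected to return 'stay home' or 'go out'."""
    raise NotImplementedError

def simulate_outbreak(n_agents: int = 100, days: int = 30, p_transmit: float = 0.05):
    # Mechanistic state per agent: S (susceptible), I (infected), R (recovered).
    state = ["I" if i < 3 else "S" for i in range(n_agents)]
    for day in range(days):
        case_count = state.count("I")
        # Decision-making model: each agent decides whether to self-isolate.
        going_out = []
        for i in range(n_agents):
            decision = query_llm(
                f"Day {day}: {case_count} of {n_agents} people are sick. "
                f"You feel {'ill' if state[i] == 'I' else 'healthy'}. "
                "Do you 'stay home' or 'go out'? Answer with one of the two."
            )
            if decision.strip().lower() == "go out":
                going_out.append(i)
        # Mechanistic model: transmission only among agents who went out today.
        infectious_out = [i for i in going_out if state[i] == "I"]
        for i in going_out:
            if state[i] == "S" and any(random.random() < p_transmit for _ in infectious_out):
                state[i] = "I"
        # Simple recovery rule.
        state = ["R" if (s == "I" and random.random() < 0.1) else s for s in state]
    return state
```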
5 Implementation Tools and Resources

5.1 Multi-Agents Framework
We provide a brief introduction to three open-source multi-agent frameworks: MetaGPT [Hong et al., 2023], CAMEL [Li et al., 2023b], and AutoGen [Wu et al., 2023a]. They are all frameworks that utilize language models for complex task-solving with a focus on multi-agent collaboration, but they differ in their approaches and applications.
MetaGPT is designed to embed human workflow processes into the operation of language model agents, thereby reducing the hallucination problem that often arises in complex tasks. It does this by encoding Standard Operating Procedures into the system and using an assembly-line approach to assign specific roles to different agents.
CAMEL, the Communicative Agent framework, is oriented towards facilitating autonomous cooperation among agents. It uses a novel technique called inception prompting to guide conversational agents towards fulfilling tasks that are consistent with human objectives. This framework also serves as a tool for generating and studying conversational data, helping researchers understand how communicative agents behave and interact.
AutoGen is a versatile framework that allows for the creation of applications using language models. It is distinctive for its high level of customization, enabling developers to program agents using both natural language and code to define how these agents interact. This versatility enables its use in diverse fields, from technical areas such as coding and mathematics to consumer-focused sectors like entertainment.
More recently, [Chen et al., 2023c; Chen et al., 2023a] introduce frameworks for dynamic multi-agent collaboration, while [Zhou et al., 2023a; Li et al., 2023h; Xie et al., 2023] present platforms and libraries for building autonomous agents, emphasizing their adaptability in task-solving and social simulations.
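To give a feel for how such frameworks are used, the following is a minimal two-agent example following the pyautogen 0.2-era quickstart pattern; exact class names and arguments may differ across AutoGen versions, so treat it as a sketch and consult the framework's documentation.

```python
# A minimal two-agent conversation in the style of the pyautogen 0.2 quickstart.
# Assumes an OpenAI-compatible API key; adjust llm_config to your own setup.
import autogen

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

# The assistant agent plans and writes code.
assistant = autogen.AssistantAgent(name="assistant", llm_config=llm_config)

# The user proxy executes the proposed code locally and reports results back.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# The two agents converse until the task is finished or a termination condition is met.
user_proxy.initiate_chat(
    assistant,
    message="Plot the first 10 Fibonacci numbers and save the figure to fib.png.",
)
```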
5.2 Datasets and Benchmarks
We summarize commonly used datasets and benchmarks for LLM-MA studies in Table 2. We observe that different research applications use different datasets and benchmarks. In the problem-solving scenarios, most datasets and benchmarks are used to evaluate the planning and reasoning capabilities of multiple agents in cooperation or debate. In world simulation scenarios, datasets and benchmarks are used to evaluate the alignment between the simulated world and the real world, or to analyze the behaviors of different agents. However, in certain research applications, such as science teams for experiments and economic modeling, there is still a need for comprehensive benchmarks. The development of such benchmarks would greatly enhance the ability to gauge the success and applicability of LLM-MA in these complex and dynamic fields.

6 Challenges and Opportunities

Studies of LLM-MA frameworks and applications are advancing rapidly, giving rise to numerous challenges and opportunities. We identify several critical challenges and potential areas for future study.

6.1 Advancing into Multi-Modal Environments
Most previous work on LLM-MA has focused on text-based environments, excelling in processing and generating text. However, there is a notable lack of work in multi-modal settings, where agents would interact with and interpret data from multiple sensory inputs and generate multiple outputs such as images, audio, video, and physical actions. Integrating LLMs into multi-modal environments presents additional challenges, such as processing diverse data types and enabling agents to understand each other and respond to more than just textual information.

6.2 Addressing Hallucination
The hallucination problem is a significant challenge in LLMs and single-LLM-based agent systems. It refers to the phenomenon where the model generates text that is factually incorrect [Huang et al., 2023b]. However, this problem takes on an added layer of complexity in a multi-agent setting. In such scenarios, one agent's hallucination can have a cascading effect. This is due to the interconnected nature of multi-agent systems, where misinformation from one agent can be accepted and further propagated by others in the network. Therefore, detecting and mitigating hallucinations in LLM-MA is not just a crucial task but also presents a unique set of challenges. It involves not only correcting inaccuracies at the level of individual agents but also managing the flow of information between agents to prevent the spread of these inaccuracies throughout the system.
6.3 Acquiring Collective Intelligence
In traditional multi-agent systems, agents often use reinforcement learning to learn from offline training datasets. However, LLM-MA systems mainly learn from instant feedback, such as interactions with the environment or humans, as we discussed in Section 3. This learning style requires a reliable interactive environment, and it can be tricky to design such an interactive environment for many tasks, limiting the scalability of LLM-MA systems. Moreover, the prevailing approaches in current research involve employing Memory and Self-Evolution techniques to adjust agents based on feedback. While effective for individual agents, these methods do not fully capitalize on the potential collective intelligence of the agent network. They adjust agents in isolation, overlooking the synergistic effects that can emerge from coordinated multi-agent interactions. Hence, jointly adjusting multiple agents and achieving optimal collective intelligence is still a critical challenge for LLM-MA.

6.4 Scaling Up LLM-MA Systems
LLM-MA systems are composed of a number of individual LLM-based agents, posing a significant challenge of scalability regarding the number of agents. From the computational complexity perspective, each LLM-based agent, typically built on large language models like GPT-4, demands substantial computational power and memory. Scaling up the number of these agents in an LLM-MA system significantly increases resource requirements. In scenarios with limited computational resources, it would be challenging to develop these LLM-MA systems.
Additionally, as the number of agents in an LLM-MA system increases, additional complexities and research opportunities emerge, particularly in areas like efficient agent coordination, communication, and understanding the scaling laws of multi-agents. For instance, with more LLM-based agents, the intricacy of ensuring effective coordination and communication rises significantly. As highlighted in [Dibia, 2023], designing advanced Agents Orchestration methodologies is increasingly important. These methodologies aim to optimize agent workflows, task assignments tailored to different agents, and communication patterns across agents, such as communication constraints between agents. Effective Agents Orchestration facilitates harmonious operation among agents, minimizing conflicts and redundancies. Additionally, exploring and defining the scaling laws that govern the behavior and efficiency of multi-agent systems as they grow larger remains an important area of research. These aspects highlight the need for innovative solutions to optimize LLM-MA systems, making them both effective and resource-efficient.

6.5 Evaluation and Benchmarks
We have summarized the datasets and benchmarks currently available for LLM-MA in Table 2. This is a starting point and far from comprehensive. We identify two significant challenges in evaluating LLM-MA systems and benchmarking their performance against each other. Firstly, as discussed in [Xu et al., 2023a], much of the existing research focuses on evaluating individual agents' understanding and reasoning within narrowly defined scenarios. This focus tends to overlook the broader and more complex emergent behaviors that are integral to multi-agent systems. Secondly, there is a notable shortfall in the development of comprehensive benchmarks across several research domains, such as science teams for experiment operations, economic analysis, and disease propagation simulation. This gap presents an obstacle to accurately assessing and benchmarking the full capabilities of LLM-MA systems in these varied and crucial fields.
6.6 Applications and Beyond
The potential of LLM-MA systems extends far beyond their current applications, holding great promise for advanced computational problem-solving in fields such as finance, education, healthcare, environmental science, urban planning, and so on. As we have discussed, LLM-MA systems possess the capability to tackle complex problems and simulate various aspects of the real world. While the current role-playing capabilities of LLMs may have limitations, ongoing advancements in LLM technology suggest a bright future, with more sophisticated methodologies, applications, datasets, and benchmarks anticipated for diverse research fields. Furthermore, there are opportunities to explore LLM-MA systems from various theoretical perspectives, such as Cognitive Science [Sumers et al., 2023], Symbolic Artificial Intelligence, Cybernetics, Complex Systems, and Collective Intelligence. Such a multi-faceted approach could contribute to a more comprehensive understanding and innovative applications in this rapidly evolving field.

7 Conclusion

LLM-based Multi-Agents have shown inspiring collective intelligence and have rapidly garnered increasing interest among researchers. In this survey, we first systematically review the development of LLM-MA systems by positioning, differentiating, and connecting them from various aspects, regarding the agents-environment interface, the characterization of agents by LLMs, the strategies for managing agent communication, and the paradigms for capability acquisition. We also summarize LLM-MA applications for problem-solving and world simulation. By also highlighting the commonly used datasets and benchmarks and discussing challenges and future opportunities, we hope that this survey can serve as a useful resource for researchers across various research fields, inspiring future research to explore the potential of LLM-based Multi-Agents.

References

[Agashe et al., 2023] Saaket Agashe, Yue Fan, and Xin Eric Wang. Evaluating multi-agent coordination abilities in large language models, 2023.

[Aher et al., 2023] Gati Aher, Rosa I. Arriaga, and Adam Tauman Kalai. Using large language models to simulate multiple humans and replicate human subject studies, 2023.

[Akata et al., 2023] Elif Akata, Lion Schulz, Julian Coda-Forno, Seong Joon Oh, Matthias Bethge, and Eric Schulz. Playing repeated games with large language models. arXiv preprint arXiv:2305.16867, 2023.
[Anonymous, 2023] Anonymous. Rethinking the buyer's inspection paradox in information markets with language agents. In Submitted to The Twelfth International Conference on Learning Representations, 2023. Under review.

[Chan et al., 2023] Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. Chateval: Towards better llm-based evaluators through multi-agent debate, 2023.

[Chen et al., 2023a] Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje F Karlsson, Jie Fu, and Yemin Shi. Autoagents: A framework for automatic agent generation. arXiv preprint arXiv:2309.17288, 2023.

[Chen et al., 2023b] Huaben Chen, Wenkang Ji, Lufeng Xu, and Shiyu Zhao. Multi-agent consensus seeking via large language models. arXiv preprint arXiv:2310.20151, 2023.

[Chen et al., 2023c] Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chen Qian, Chi-Min Chan, Yujia Qin, Yaxi Lu, Ruobing Xie, et al. Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents. arXiv preprint arXiv:2308.10848, 2023.

[Chen et al., 2023d] Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas Roy, and Chuchu Fan. Scalable multi-robot collaboration with large language models: Centralized or decentralized systems? arXiv preprint arXiv:2309.15943, 2023.

[Cobbe et al., 2021] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.

[Dasgupta et al., 2023] Ishita Dasgupta, Christine Kaeser-Chen, Kenneth Marino, Arun Ahuja, Sheila Babayan, Felix Hill, and Rob Fergus. Collaborating with language models for embodied reasoning. arXiv preprint arXiv:2302.00763, 2023.

[Dibia, 2023] Victor Dibia. Multi-agent llm applications — a review of current research, tools, and challenges. https://newsletter.victordibia.com/p/multi-agent-llm-applications-a-review, 2023.

[Dong et al., 2023a] Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, and Zhifang Sui. A survey on in-context learning, 2023.

[Dong et al., 2023b] Yihong Dong, Xue Jiang, Zhi Jin, and Ge Li. Self-collaboration code generation via chatgpt, 2023.

[Du et al., 2023] Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate, 2023.

[Fan et al., 2023] Caoyun Fan, Jindou Chen, Yaohui Jin, and Hao He. Can large language models serve as rational players in game theory? A systematic analysis. arXiv preprint arXiv:2312.05488, 2023.

[Farmer and Axtell, 2022] J. Doyne Farmer and Robert L. Axtell. Agent-based modeling in economics and finance: Past, present, and future. INET Oxford Working Papers 2022-10, Institute for New Economic Thinking at the Oxford Martin School, University of Oxford, June 2022.

[Gao et al., 2023a] Chen Gao, Xiaochong Lan, Zhihong Lu, Jinzhu Mao, Jinghua Piao, Huandong Wang, Depeng Jin, and Yong Li. S3: Social-network simulation system with large language model-empowered agents. arXiv preprint arXiv:2307.14984, 2023.

[Gao et al., 2023b] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2023.

[Geva et al., 2021] Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, and Jonathan Berant. Did aristotle use a laptop? A question answering benchmark with implicit reasoning strategies, 2021.

[Ghaffarzadegan et al., 2023] Navid Ghaffarzadegan, Aritra Majumdar, Ross Williams, and Niyousha Hosseinichimeh. Generative agent-based modeling: Unveiling social system dynamics through coupling mechanistic models with generative artificial intelligence. arXiv preprint arXiv:2309.11456, 2023.

[Guo et al., 2023] Taicheng Guo, Kehan Guo, Zhengwen Liang, Zhichun Guo, Nitesh V Chawla, Olaf Wiest, Xiangliang Zhang, et al. What indeed can gpt models do in chemistry? A comprehensive benchmark on eight tasks. arXiv preprint arXiv:2305.18365, 2023.

[Hendrycks et al., 2020] Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300, 2020.

[Hong et al., 2023] Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, et al. Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 2023.

[Horton, 2023] John J Horton. Large language models as simulated economic agents: What can we learn from homo silicus? Technical report, National Bureau of Economic Research, 2023.

[Hua et al., 2023] Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, and Yongfeng Zhang. War and peace (waragent): Large language model-based multi-agent simulation of world wars, 2023.

[Huang et al., 2023a] Dong Huang, Qingwen Bu, Jie M. Zhang, Michael Luck, and Heming Cui. Agentcoder: Multi-agent-based code generation with iterative testing and optimisation, 2023.

[Huang et al., 2023b] Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv preprint arXiv:2311.05232, 2023.

[Kaiya et al., 2023] Zhao Kaiya, Michelangelo Naim, Jovana Kondic, Manuel Cortes, Jiaxin Ge, Shuying Luo, Guangyu Robert Yang, and Andrew Ahn. Lyfe agents: Generative agents for low-cost real-time social interactions. arXiv preprint arXiv:2310.02172, 2023.

[Khot et al., 2023] Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, and Ashish Sabharwal. Decomposed prompting: A modular approach for solving complex tasks, 2023.

[Kovač et al., 2023] Grgur Kovač, Rémy Portelas, Peter Ford Dominey, and Pierre-Yves Oudeyer. The socialai school: Insights from developmental psychology towards artificial socio-cultural agents. arXiv preprint arXiv:2307.07871, 2023.

[Lewis et al., 2021] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive nlp tasks, 2021.

[Lex and Schedl, 2022] Elisabeth Lex and Markus Schedl. Psychology-informed recommender systems: A human-centric perspective on recommender systems. In Proceedings of the 2022 Conference on Human Information Interaction and Retrieval, CHIIR '22, pages 367–368, New York, NY, USA, 2022. Association for Computing Machinery.

[Li et al., 2023a] Chao Li, Xing Su, Chao Fan, Haoying Han, Cong Xue, and Chunmo Zheng. Quantifying the impact of large language models on collective opinion dynamics. arXiv preprint arXiv:2308.03313, 2023.

[Li et al., 2023b] Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for "mind" exploration of large scale language model society. arXiv preprint arXiv:2303.17760, 2023.

[Li et al., 2023c] Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, and Katia Sycara. Theory of mind for multi-agent collaboration via large language models, 2023.

[Li et al., 2023d] Minghao Li, Yingxiu Zhao, Bowen Yu, Feifan Song, Hangyu Li, Haiyang Yu, Zhoujun Li, Fei Huang, and Yongbin Li. Api-bank: A comprehensive benchmark for tool-augmented llms, 2023.

[Li et al., 2023e] Nian Li, Chen Gao, Yong Li, and Qingmin Liao. Large language model-empowered agents for simulating macroeconomic activities, 2023.

[Li et al., 2023f] Siyu Li, Jin Yang, and Kui Zhao. Are you in a masquerade? Exploring the behavior and impact of large language model driven social bots in online social networks. arXiv preprint arXiv:2307.10337, 2023.

[Li et al., 2023g] Yang Li, Yangyang Yu, Haohang Li, Zhi Chen, and Khaldoun Khashanah. Tradinggpt: Multi-agent system with layered memory and distinct characters for enhanced financial trading performance, 2023.

[Li et al., 2023h] Yuan Li, Yixuan Zhang, and Lichao Sun. Metaagents: Simulating interactions of human behaviors for llm-based task-oriented coordination via collaborative generative agents. arXiv preprint arXiv:2310.06500, 2023.

[Liang et al., 2023] Zhenwen Liang, Wenhao Yu, Tanmay Rajpurohit, Peter Clark, Xiangliang Zhang, and Ashwin Kaylan. Let gpt be a math tutor: Teaching math word problem solvers with customized exercise generation. arXiv preprint arXiv:2305.14386, 2023.

[Light et al., 2023a] Jonathan Light, Min Cai, Sheng Shen, and Ziniu Hu. Avalonbench: Evaluating llms playing the game of avalon, 2023.

[Light et al., 2023b] Jonathan Light, Min Cai, Sheng Shen, and Ziniu Hu. From text to tactic: Evaluating llms playing the game of avalon. arXiv preprint arXiv:2310.05036, 2023.

[Liu et al., 2023] Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, and Diyi Yang. Dynamic llm-agent network: An llm-agent collaboration framework with agent team optimization. arXiv preprint arXiv:2310.02170, 2023.

[Ma et al., 2023] Zilin Ma, Yiyang Mei, and Zhaoyuan Su. Understanding the benefits and challenges of using large language model-based conversational agents for mental well-being support. arXiv preprint arXiv:2307.15810, 2023.

[Mandi et al., 2023] Zhao Mandi, Shreeya Jain, and Shuran Song. Roco: Dialectic multi-robot collaboration with large language models. arXiv preprint arXiv:2307.04738, 2023.

[Mao et al., 2023] Shaoguang Mao, Yuzhe Cai, Yan Xia, Wenshan Wu, Xun Wang, Fengyi Wang, Tao Ge, and Furu Wei. Alympics: Language agents meet game theory. arXiv preprint arXiv:2311.03220, 2023.

[Moura, 2023] João Moura. Crewai. https://github.com/joaomdmoura/crewAI, 2023.

[Mukobi et al., 2023] Gabriel Mukobi, Hannah Erlebach, Niklas Lauffer, Lewis Hammond, Alan Chan, and Jesse Clifton. Welfare diplomacy: Benchmarking language model cooperation. arXiv preprint arXiv:2310.08901, 2023.

[Nascimento et al., 2023] Nathalia Nascimento, Paulo Alencar, and Donald Cowan. Self-adaptive large language model (llm)-based multiagent systems. In 2023 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), pages 104–109. IEEE, 2023.

[Park et al., 2022] Joon Sung Park, Lindsay Popowski, Carrie Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Social simulacra: Creating populated prototypes for social computing systems. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, pages 1–18, 2022.

[Park et al., 2023] Joon Sung Park, Joseph C O'Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442, 2023.

[Qian et al., 2023] Chen Qian, Xin Cong, Wei Liu, Cheng Yang, Weize Chen, Yusheng Su, Yufan Dang, Jiahao Li, Juyuan Xu, Dahai Li, Zhiyuan Liu, and Maosong Sun. Communicative agents for software development, 2023.

[Ruan et al., 2023] Jingqing Ruan, Yihong Chen, Bin Zhang, Zhiwei Xu, Tianpeng Bao, Guoqing Du, Shiwei Shi, Hangyu Mao, Ziyue Li, Xingyu Zeng, and Rui Zhao. Tptu: Large language model-based ai agents for task planning and tool usage, 2023.

[Russell and Norvig, 2009] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Press, USA, 3rd edition, 2009.

[Shinn et al., 2023] Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning, 2023.

[Sumers et al., 2023] Theodore R Sumers, Shunyu Yao, Karthik Narasimhan, and Thomas L Griffiths. Cognitive architectures for language agents. arXiv preprint arXiv:2309.02427, 2023.

[Tang et al., 2023] Xiangru Tang, Anni Zou, Zhuosheng Zhang, Yilun Zhao, Xingyao Zhang, Arman Cohan, and Mark Gerstein. Medagents: Large language models as collaborators for zero-shot medical reasoning, 2023.

[Weng, 2023] Lilian Weng. Llm powered autonomous agents. https://lilianweng.github.io/posts/2023-06-23-agent/, 2023.

[Williams et al., 2023] Ross Williams, Niyousha Hosseinichimeh, Aritra Majumdar, and Navid Ghaffarzadegan. Epidemic modeling with generative agents. arXiv preprint arXiv:2307.04986, 2023.

[Wooldridge and Jennings, 1995] Michael Wooldridge and Nicholas R. Jennings. Intelligent agents: theory and practice. The Knowledge Engineering Review, 10:115–152, 1995.

[Wu et al., 2023a] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. Autogen: Enabling next-gen llm applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155, 2023.

[Wu et al., 2023b] Yuxiang Wu, Zhengyao Jiang, Akbir Khan, Yao Fu, Laura Ruis, Edward Grefenstette, and Tim Rocktäschel. Chatarena: Multi-agent language game environments for large language models. GitHub repository, 2023.

[Xi et al., 2023] Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, and Tao Gui. The rise and potential of large language model based agents: A survey, 2023.

[Xiao et al., 2023] Bushi Xiao, Ziyuan Yin, and Zixuan Shan. Simulating public administration crisis: A novel generative agent-based simulation system to lower technology barriers in social science research. arXiv preprint arXiv:2311.06957, 2023.

[Wang et al., 2021] Zijie J. Wang, Dongjin Choi, Shenyu Xu, and Diyi Yang. Putting humans in the natural lan-
guage processing loop: A survey, 2021. [Xie et al., 2023] Tianbao Xie, Fan Zhou, Zhoujun Cheng,
[Wang et al., 2023a] Kuan Wang, Yadong Lu, Michael San- Peng Shi, Luoxuan Weng, Yitao Liu, Toh Jing Hua, Jun-
tacroce, Yeyun Gong, Chao Zhang, and Yelong Shen. ning Zhao, Qian Liu, Che Liu, et al. Openagents: An open
Adapting llm agents through communication, 2023. platform for language agents in the wild. arXiv preprint
arXiv:2310.10634, 2023.
[Wang et al., 2023b] Lei Wang, Chen Ma, Xueyang Feng,
Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Ji- [Xiong et al., 2023] Kai Xiong, Xiao Ding, Yixin Cao, Ting
akai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Liu, and Bing Qin. Examining inter-consistency of large
Wei, and Ji-Rong Wen. A survey on large language model language models collaboration: An in-depth analysis via
based autonomous agents, 2023. debate, 2023.
[Wang et al., 2023c] Shenzhi Wang, Chang Liu, Zilong [Xu et al., 2023a] Lin Xu, Zhiyuan Hu, Daquan Zhou,
Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Hongyu Ren, Zhen Dong, Kurt Keutzer, See Kiong Ng,
Chaofei Wang, Shiji Song, and Gao Huang. Avalon’s game and Jiashi Feng. Magic: Investigation of large language
of thoughts: Battle against deception through recursive model powered multi-agent in cognition, adaptability, ra-
contemplation. arXiv preprint arXiv:2310.01320, 2023. tionality and collaboration, 2023.
[Wei et al., 2022] Jason Wei, Xuezhi Wang, Dale Schuur- [Xu et al., 2023b] Yuzhuang Xu, Shuo Wang, Peng Li,
mans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Fuwen Luo, Xiaolong Wang, Weidong Liu, and Yang
Denny Zhou, et al. Chain-of-thought prompting elicits Liu. Exploring large language models for communication
reasoning in large language models. Advances in Neural games: An empirical study on werewolf. arXiv preprint
Information Processing Systems, 35:24824–24837, 2022. arXiv:2309.04658, 2023.
[Xu et al., 2023c] Zelai Xu, Chao Yu, Fei Fang, Yu Wang, large language models transform computational social sci-
and Yi Wu. Language agents with reinforcement learning ence? Computational Linguistics, pages 1–53, 2023.
for strategic play in the werewolf game. arXiv preprint
arXiv:2310.18940, 2023.
[Yao et al., 2023] Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models, 2023.
[Yu et al., 2023] Bangguo Yu, Hamidreza Kasaei, and Ming Cao. Co-navgpt: Multi-robot cooperative visual semantic navigation using large language models, 2023.
[Zhang et al., 2023a] An Zhang, Leheng Sheng, Yuxin Chen, Hao Li, Yang Deng, Xiang Wang, and Tat-Seng Chua. On generative agents in recommendation, 2023.
[Zhang et al., 2023b] Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, et al. Proagent: Building proactive cooperative ai with large language models. arXiv preprint arXiv:2308.11339, 2023.
[Zhang et al., 2023c] Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B Tenenbaum, Tianmin Shu, and Chuang Gan. Building cooperative embodied agents modularly with large language models. arXiv preprint arXiv:2307.02485, 2023.
[Zhang et al., 2023d] Jintian Zhang, Xin Xu, and Shumin Deng. Exploring collaboration mechanisms for llm agents: A social psychology view, 2023.
[Zhang et al., 2023e] Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, and Ji-Rong Wen. Agentcf: Collaborative learning with autonomous language agents for recommender systems, 2023.
[Zhao et al., 2023] Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, and Xing Xie. Competeai: Understanding the competition behaviors in large language model-based agents, 2023.
[Zheng et al., 2023] Zhiling Zheng, Oufan Zhang, Ha L. Nguyen, Nakul Rampal, Ali H. Alawadhi, Zichao Rong, Teresa Head-Gordon, Christian Borgs, Jennifer T. Chayes, and Omar M. Yaghi. Chatgpt research group for optimizing the crystallinity of mofs and cofs. ACS Central Science, 9(11):2161–2170, 2023.
[Zhou et al., 2023a] Wangchunshu Zhou, Yuchen Eleanor Jiang, Long Li, Jialong Wu, Tiannan Wang, Shi Qiu, Jintian Zhang, Jing Chen, Ruipu Wu, Shuai Wang, et al. Agents: An open-source framework for autonomous language agents. arXiv preprint arXiv:2309.07870, 2023.
[Zhou et al., 2023b] Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, and Maarten Sap. Sotopia: Interactive evaluation for social intelligence in language agents, 2023.
[Ziems et al., 2023] Caleb Ziems, Omar Shaikh, Zhehao Zhang, William Held, Jiaao Chen, and Diyi Yang. Can large language models transform computational social science? Computational Linguistics, pages 1–53, 2023.