FINCON: LLM Multi-Agent Financial System
Yangyang Yu1,⋆, Zhiyuan Yao1,⋆, Haohang Li1,⋆, Zhiyang Deng1,⋆, Yuechen Jiang1,⋆, Yupeng Cao1,⋆,
Zhi Chen1,⋆, Jordan W. Suchow1, Zhenyu Cui1, Rong Liu1, Zhaozhuo Xu1, Denghui Zhang1,
Koduvayur Subbalakshmi1, Guojun Xiong2, Yueru He3, Jimin Huang3, Dong Li3, Qianqian Xie3,†
1 Stevens Institute of Technology   2 Harvard University   3 The Fin AI
⋆ These authors contributed equally   † Corresponding author: [Link]@[Link]
arXiv:2407.06567v3 [[Link]] 7 Nov 2024
Abstract
Large language models (LLMs) have shown potential in complex financial tasks,
but sequential financial decision-making remains challenging due to the volatile
environment and the need for intelligent risk management. While LLM-based
agent systems have achieved impressive returns, optimizing multi-source infor-
mation synthesis and decision-making through timely experience refinement is
underexplored. We introduce F IN C ON, an LLM-based multi-agent framework
with CONceptual verbal reinforcement for diverse FINancial tasks. Inspired by
real-world investment firm structures, F IN C ON employs a manager-analyst hier-
archy, enabling synchronized cross-functional agent collaboration towards unified
goals via natural language interactions. Its dual-level risk-control component
enhances decision-making by monitoring daily market risk and updating systematic
investment beliefs through self-critique. These conceptualized beliefs provide
verbal reinforcement for future decisions, selectively propagated to relevant agents,
improving performance while reducing unnecessary peer-to-peer communication
costs. F IN C ON generalizes well across tasks, including single stock trading and
portfolio management. 1
1 Introduction
The intricacies and fluctuations inherent in financial markets pose significant challenges for making
high-quality, sequential investment decisions. In tasks such as single stock trading and portfolio
management, each intelligent decision is driven by multiple market interactions and the integration
of diverse information streams, characterized by varying levels of timeliness and modalities [1, 2].
The primary objective of these tasks is to maximize profit while managing present market risks in an
open-ended environment.
In practice, trading firms often depend on synthesized teamwork, structured hierarchically with
functional roles such as data analysts, risk analysts, and portfolio managers communicating across
levels [3, 4]. These roles are responsible for the careful integration of diverse resources. However,
the cognitive limitations of human team members can hinder their capacity to rapidly process market
signals and achieve optimal investment outcomes [5].
To enhance investment returns and address the limitations of human decision-making, various
studies have explored methods such as deep reinforcement learning (DRL) to develop agent systems
that simulate market environments and automate investment strategies [6, 7, 8]. Concurrently,
1 We will release the code and demo in the following repo: [Link]
2 Preliminaries
Here, we outline the mathematical notations for the two major financial decision-making tasks
that will be explicitly discussed in our work. We also formally present the generalized modeling
formulation using a Partially Observable Markov Decision Process (POMDP) [38] for financial
decision-making tasks.
    max_w ⟨w, μ⟩ − ⟨w, Σw⟩   s.t.   w_n ∈ [0, 1] if “buy”,   w_n ∈ [−1, 0] if “sell”,   w_n = 0 if “hold”,   ∀ n ∈ {1, · · · , N}    (1)
where w = (w_1, · · · , w_N) ∈ R^N is the portfolio weights vector, and μ and Σ are the shrinkage estimators
of the N-dimensional sample expected return and the N × N sample covariance matrix of the chosen stocks'
daily return sequences, respectively [35]. Note that portfolio weights are rebalanced on a daily basis. In our
implementation, we first calculate the portfolio weights by solving the optimization problem above; the target
positions are then determined by linearly scaling these weights.
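To make the rebalancing step concrete, the following sketch solves a simplified version of Formula (1): instead of a full convex solver, it takes the closed-form unconstrained optimum w* = Σ⁻¹μ/2 and projects it onto the per-stock sign constraints. The function name, the projection shortcut, and all numbers are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def target_weights(mu, Sigma, directions):
    """Simplified stand-in for the mean-variance step in Formula (1):
    maximize <w, mu> - <w, Sigma w> subject to per-stock sign constraints.
    `directions` holds +1 ("buy"), -1 ("sell"), or 0 ("hold") per stock.
    Assumption: solve the unconstrained problem in closed form
    (w* = Sigma^{-1} mu / 2) and project onto the constraints, rather than
    calling a convex solver."""
    w = np.linalg.solve(2.0 * Sigma, mu)                    # unconstrained optimum
    w = np.where(directions > 0, np.clip(w, 0.0, 1.0), w)   # "buy":  w_n in [0, 1]
    w = np.where(directions < 0, np.clip(w, -1.0, 0.0), w)  # "sell": w_n in [-1, 0]
    w = np.where(directions == 0, 0.0, w)                   # "hold": w_n = 0
    return w

mu = np.array([0.02, -0.01, 0.015])    # shrinkage-estimated expected returns
Sigma = np.diag([0.04, 0.09, 0.05])    # shrinkage-estimated covariance
w = target_weights(mu, Sigma, directions=np.array([1, -1, 0]))
```

Target positions would then be obtained by linearly scaling these weights, as described above.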
Formally, we model the quantitative trading task as an infinite-horizon POMDP [40, 41] with time index
T = {0, 1, 2, · · · } and discount factor α ∈ (0, 1]. The components of this model are as follows: (1) a
state space X × Y, where X is the observable component and Y is the unobservable component of the
financial market; (2) the action space of the analyst agent group, A = ∏_{i=1}^{I} A_i, where A_i is the
collection of processed market information in textual format produced by agent i (I analyst agents in
total); the manager agent's action space is A, modeled as {“buy”, “sell”, “hold”} for the single stock
trading task and as ({“buy”, “sell”, “hold”} × [−1, 1])^⊗N for the portfolio management task over
N stocks; (3) the reward function R(o, b, a) : X × Y × A → R, which outputs the daily profit &
loss (PnL); (4) the observation process {O_t}_{t∈T} ⊆ X, an I-dimensional process whose i-th entry
{O_t^i}_{t∈T} represents one type of uni-modal information flow processed solely by analyst agent i;
(5) the reflection process {B_t}_{t∈T} ⊆ Y, representing the manager agent's self-reflection, which is
updated from B_t to B_{t+1} on a daily basis [42]; and (6) the processed information flow
Ô_t = (Ô_t^1, · · · , Ô_t^I) ∈ A, ∀ t ∈ T, which represents the information-processing outputs of the
analyst agent group.
Then, our multi-agent system must learn the policies of all agents: the analyst agents' policies
π_{θ_i}^i : X → A_i, i ∈ {1, · · · , I} (the ways to process information, i.e., Ô_t^i ∼ π_{θ_i}^i(·|O_t^i)),
and the manager agent's policy π_θ^a : A × Y → A (the way to make trading decisions, i.e.,
A_t ∼ π_θ^a(·|Ô_t, B_t)), such that the system maximizes cumulative trading reward while controlling
risk [43]. All policies Π_θ = ({π_{θ_i}^i}_{i=1}^I, π_θ^a) are parameterized by textual prompts
θ = ({θ_i}_{i=1}^I, θ_a). By updating the prompts via the risk-control component M_r, the whole system
optimizes the policies Π_θ in a verbal reinforcement manner. Denoting the daily profit & loss (PnL) by
R_t^{Π_θ} = R(O_t, B_t, A_t), the optimization objective for the whole system can be written as:

    max_θ E[ Σ_{t∈T} α^t R_t^{Π_θ} ]    (2)

This is a risk-sensitive optimization problem that leverages textual gradient descent, fundamentally
differing from DRL algorithms designed for POMDPs. Further details on the textual gradient descent
approach are provided in Appendix A.2.
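The roles in this POMDP can be illustrated with stand-in functions. In FINCON both policies are LLM calls parameterized by textual prompts θ, so the rule-based logic below is purely a hypothetical placeholder:

```python
# Minimal sketch of the POMDP roles above; names and logic are illustrative
# stand-ins, not FINCON's actual implementation.

def analyst_policy(observation: str, prompt: str) -> str:
    # Stand-in for pi^i_{theta_i}: in FINCON this is an LLM call whose
    # behavior is parameterized by the textual prompt theta_i.
    return f"[{prompt}] insight: {observation}"

def manager_policy(insights: list[str], beliefs: str) -> str:
    # Stand-in for pi^a_theta over the action space {"buy", "sell", "hold"};
    # the real manager conditions an LLM on processed info O_hat_t and beliefs B_t.
    if any("positive" in s for s in insights):
        return "buy"
    if any("negative" in s for s in insights):
        return "sell"
    return "hold"

insights = [analyst_policy("positive earnings surprise", "news-analyst")]
action = manager_policy(insights, beliefs="momentum favors longs")  # -> "buy"
```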
3 Architecture of F IN C ON
In this section, we present the architecture of F IN C ON using a two-level hierarchy. First, we describe
the hierarchical framework for coordinating the agents’ synchronous work and communication. Then,
we elaborate on the functionalities of each module that constitutes each agent in F IN C ON. Finally,
we explain how F IN C ON optimizes the objective in Equation (2) through a verbal reinforcement approach.
The agent system of F IN C ON consists of two main components: the Manager-Analyst Agent Group
component and the Risk-Control component.
3.1 Synthesized Multi-agent Hierarchical Structure Design
The primary goal is to enhance information presentation and comprehension while minimizing
unnecessary communication costs. The working mechanism of each agent is illustrated in Figure 2.
Analyst Agents. In F IN C ON, analyst agents distill concise investment insights from large volumes of
multi-source market data, each focused on a specific trading target. To ensure high-quality reasoning
by reducing task load and sharpening focus, each agent processes information from a single source in
a uni-modal fashion, providing pre-specified outputs based on prompts. This setup mimics an efficient
human team, where each analyst specializes in a specific function, filtering out market noise and
extracting key insights. These agents assist the manager agent by consolidating denoised investment
information from multiple perspectives. We implement seven distinct types of analyst agents using
LLMs, each producing unique investment insights, as shown in the upper section of Figure 2. Based
on input modalities, three textual data processing agents extract insights and sentiments from daily
news and financial reports. An audio agent uses the Whisper API to interpret investment signals from
earnings call recordings. Additionally, a data analysis agent and a stock selection agent compute
critical financial metrics, such as momentum and CVaR, using tabular time series data. The stock
selection agent also oversees portfolio selection by applying the classic risk diversification method in
quantitative finance [1].
Manager Agent. In F IN C ON, the manager agent acts as the sole decision-maker, responsible for
generating trading actions for sequential financial tasks. For portfolio management, it calculates
portfolio weights using convex optimization techniques constrained by directional trading decisions
(see optimization problem as presented in Formula (1)). Four key mechanisms support each decision:
1) Consolidating distilled insights from multiple analyst agents. 2) Receiving timely risk alerts and
conceptual investment updates from the risk control component. 3) Refining its investment beliefs
about the influence of different information sources on trading decisions for specific targets. 4)
Conducting self-reflection by reviewing reasoning outcomes from previous trading actions.
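One plausible way to combine these four inputs is to assemble them into a single decision prompt; the template below is an assumption for illustration, not FINCON's published prompt:

```python
def build_decision_prompt(insights, risk_alert, beliefs, reflections):
    """Hypothetical assembly of the manager agent's four decision inputs
    (analyst insights, risk alerts, investment beliefs, self-reflection)
    into one LLM query. The section headers and layout are assumptions."""
    sections = [
        "## Analyst insights\n" + "\n".join(f"- {s}" for s in insights),
        "## Risk alert\n" + risk_alert,
        "## Current investment beliefs\n" + beliefs,
        "## Self-reflection on prior trades\n" + reflections,
        "Decide: buy, sell, or hold.",
    ]
    return "\n\n".join(sections)

prompt = build_decision_prompt(
    insights=["News sentiment positive", "Momentum weakly negative"],
    risk_alert="Daily CVaR deteriorated vs. yesterday",
    beliefs="News flow dominates price moves for this ticker",
    reflections="Last sell was premature; held through reversal",
)
```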
3.2 Modular Design of F IN C ON Agents
These beliefs may concern the respective analyst agents, key financial indicators (such as historical momentum), or other crucial viewpoints.
These belief updates are first received by the manager agent and then selectively propagated to
relevant agents, minimizing over-communication. Unlike the text-based gradient descent proposed
by Tang et al. [28], which uses prompt editing distance as a learning rate, we derive investment belief
updates by measuring the overlap percentage of trading actions between two consecutive training
trajectories at each belief update, as presented in Table 1.
improving the performance of a synthesized agent system, where each worker has a clearly defined
and specialized role. The above describes the workflow of F IN C ON during the training stage, while
the workflow during the testing stage is detailed in the Appendix A.3.
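The learning-rate proxy can be sketched in a few lines; the exact normalization used in the paper may differ:

```python
def action_overlap(prev_actions, curr_actions):
    """Fraction of trading days on which two consecutive training
    trajectories chose the same action, used as the learning rate tau
    in the belief update (sketch; assumes equal-length action sequences)."""
    assert len(prev_actions) == len(curr_actions)
    same = sum(a == b for a, b in zip(prev_actions, curr_actions))
    return same / len(prev_actions)

tau = action_overlap(["buy", "hold", "sell", "buy"],
                     ["buy", "sell", "sell", "buy"])  # 3 of 4 days match -> 0.75
```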
Algorithm 1 Training Stage of F IN C ON: Conceptual Verbal Reinforcement using Textual Gradient Descent

Initialize the manager–analyst component {M_pr^i}_{i=1}^I & M_a, and the risk-control component M_r.
Initialize the trading start date s, the stock pool of the portfolio, and portfolio weights w_0 = 0.
Initialize prompts θ and policy Π_θ.
while episode k < Max do
    for 0 ≤ t ≤ T do
        Run policy Π_θ (collecting daily PnL r_t, portfolio weights w_t, and daily CVaR value ρ_t).
        if ρ_t < ρ_{t−1} or r_t < 0 then
            Trigger M_a self-reflection and generate self-reflection text B_t.
        end if
        Get the investment trajectory H_k and calculate the objective function value (Equation (2)).
    end for
    Compare the objective function values of episodes k − 1 and k, and decide which episode performed better;
    Pass sustained profitable and losing trades from the two episodes H_{k−1} and H_k into the risk-control component M_r;
    Guide M_r to summarize conceptualized investment insights {c_{k−1}^1, · · · , c_{k−1}^n} and {c_k^1, · · · , c_k^m};
    Compare the two sets of conceptualized insights and give the reasoning for the higher performance (providing the textual optimization direction, i.e., the meta prompt);
    Calculate the overlap percentage between the trading decision sequences of the two episodes (providing the learning rate τ);
    Update the prompts by textual gradient descent: θ ← M_r(θ, τ, meta prompt).
end while
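Algorithm 1's outer loop can be sketched as follows; `run_episode` and `risk_control` are hypothetical stand-ins for executing policy Π_θ and for the component M_r, and the string-append update is a toy version of the textual gradient step:

```python
def training_stage(run_episode, risk_control, theta, max_episodes=4):
    """Skeleton of Algorithm 1 (illustrative, not FINCON's code).
    `run_episode(theta)` runs one trading pass under prompts `theta` and
    returns (action sequence, objective value); `risk_control` compares two
    episodes' objectives and returns a textual optimization direction
    (the meta prompt)."""
    prev_actions, prev_obj = run_episode(theta)
    for _ in range(max_episodes - 1):
        actions, obj = run_episode(theta)
        # learning rate tau: overlap of consecutive trading-decision sequences
        tau = sum(a == b for a, b in zip(actions, prev_actions)) / len(actions)
        meta_prompt = risk_control(prev_obj, obj)
        # toy textual gradient step: append the update instead of editing prompts
        theta = theta + f"\n[update tau={tau:.2f}] {meta_prompt}"
        prev_actions, prev_obj = actions, obj
    return theta

# toy two-episode run with canned trajectories
episode_log = iter([(["buy", "hold"], 1.0), (["buy", "sell"], 1.3)])
theta_out = training_stage(lambda th: next(episode_log),
                           lambda prev, curr: "favor news sentiment over momentum",
                           theta="base prompt", max_episodes=2)
```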
Here, we explain the modular design of F IN C ON agents. Inspired by the recent works of Park et al.
[20] and Sumers et al. [44] on developing the cognitive structure of language agents for human-like
behavior, agents in F IN C ON integrate four modules to support their necessary functionalities, along
with a shared general configuration, as detailed in Appendix A.4:
General Configuration and Profiling Module. This module defines task types (e.g., stock trading,
portfolio management) and specifies trading targets, including sector and performance details. The
profiling module outlines each agent’s roles and responsibilities. The concatenated textual content
from these parts is used to query investment-related events from the agents’ memory databases.
Perception Module. This module defines how each agent interacts with the market, specifying the
information it perceives, receives, and communicates, with interactions tailored to each agent's role.
In detail, it converts raw market data, feedback from other agents, and information retrieved from
the memory module into formats compatible with large language models, enabling them to process
these inputs effectively.
Memory Module. The memory module comprises three key components:
working memory, procedural memory, and episodic memory. Much like how humans process events
in their working memory [46], F IN C ON agents leverage their working memory to perform a range of
tasks, including observation, distillation, and refinement of available memory events, all tailored to
the specific roles of the agents. Procedural memory and episodic memory are critical for recording
historical actions, outcomes, and reflections during sequential decision-making. Procedural memory
is generated after each decision step within an episode, storing data as memory events. For trading
inquiries, top events are retrieved from procedural memory and ranked based on recency, relevance,
and importance, following a simplified version of the method proposed by Yu et al. [32], with further
details provided in Appendix A.13. Each functional analyst agent has a distinct procedural-memory
decay rate, reflecting the timeliness of its financial data source; this is crucial for aligning data of
different types that influence the same time point and for supporting informed decision-making. The
manager agent enhances the procedural memory of analyst agents by providing feedback through
an access counter. Both analyst and manager agents maintain procedural memory, but they keep
different records, as illustrated in Appendix A.4. Episodic memory, exclusive to the manager agent,
stores actions, PnL series from previous episodes, and updated conceptual investment beliefs from
the risk control component.
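The ranking of procedural-memory events by recency, relevance, and importance might look like the following sketch; the field names, additive scoring, and exponential decay form are assumptions (the paper follows a simplified version of Yu et al. [32]):

```python
import math

def memory_score(event, query_day, decay_rate):
    """Rank a procedural-memory event by recency, relevance, and importance.
    Sketch: recency decays exponentially at a per-data-source rate (faster
    decay for more time-sensitive sources); the additive combination and
    field names are illustrative assumptions."""
    recency = math.exp(-decay_rate * (query_day - event["day"]))
    return recency + event["relevance"] + event["importance"]

events = [
    {"day": 1, "relevance": 0.9, "importance": 0.5},  # old but highly relevant
    {"day": 9, "relevance": 0.3, "importance": 0.4},  # fresh but less relevant
]
ranked = sorted(events,
                key=lambda e: memory_score(e, query_day=10, decay_rate=0.5),
                reverse=True)
```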
4 Experiments
Our experiment answers the key research questions (RQs): RQ1: Does F IN C ON demonstrate robust-
ness across multiple financial decision-making tasks, especially single-asset trading and portfolio
management? RQ2: Is the within-episode risk control mechanism in F IN C ON effective in maintaining superior decision-making performance? RQ3: Is the over-episode risk control mechanism in F IN C ON effective in updating the manager agent's beliefs in a timely manner to enhance trading performance?
4.2 Main Results
F IN C ON achieves one of the lowest MDD values across most trading assets, demonstrating effective risk
management while still delivering the highest investment returns. For detailed performance comparisons
across all models and metrics, refer to Table 1.
Overall, even with extended training periods, DRL-based models tend to underperform, with the A2C
algorithm lagging significantly behind other agents in general. Notably, the training periods for Nio
Inc. (NIO) and Coinbase Global Inc. (COIN) require clarification. NIO, which completed its IPO
in September 2018, has a slightly shorter training period than other tickers, yet the DRL algorithms
for NIO still achieved convergence. In contrast, Coinbase Global Inc. (COIN), which completed its
IPO in April 2021, presented a more significant challenge due to the limited available trading data,
causing DRL algorithms to struggle with convergence. This limitation underscores a major drawback
for DRL agents when trading recently listed IPOs. Consequently, our analysis of COIN focuses on
comparisons between F IN C ON, LLM-based agents, and the buy-and-hold (B & H) strategy. In this
context, F IN C ON demonstrates a clear advantage, achieving a cumulative return of over 57% and a
Sharpe ratio of 0.825. Furthermore, LLM-based agents, which can leverage diverse data types and
require minimal training, effectively mitigate the challenges faced by DRL algorithms.
                          TSLA                       AMZN                       NIO                        MSFT
Categories  Models        CR%↑     SR↑     MDD%↓     CR%↑     SR↑     MDD%↓     CR%↑     SR↑     MDD%↓     CR%↑     SR↑     MDD%↓
Market      B&H           6.425    0.145   58.150    2.030    0.072   34.241    -77.210  -1.449  63.975    27.856   1.230   15.010
Our Model   F IN C ON     82.871   1.972   29.727    24.848   0.904   25.889    17.461   0.335   40.647    31.625   1.538   15.010
LLM-based   GA            16.535   0.391   54.131    -5.631   -0.199  37.213    -3.176   -1.574  3.155     -31.821  -1.414  39.808
            F IN GPT      1.549    0.044   42.400    -29.811  -1.810  29.671    -4.959   -0.121  37.344    21.535   1.315   16.503
            F IN M EM     34.624   1.552   15.674    -18.011  -0.773  36.825    -48.437  -1.180  64.144    -22.036  -1.247  29.435
            F INAGENT     11.960   0.271   55.734    -24.588  -1.493  33.074    0.933    0.051   19.181    -27.534  -1.247  39.544
DRL-based   A2C           -35.644  -0.805  61.502    -12.560  -0.444  37.106    -91.910  -1.728  68.911    21.397   0.962   21.458
            PPO           1.409    0.032   49.740    3.863    0.138   28.085    -72.119  -1.352  62.093    -4.761   -0.214  30.950
            DQN           -1.296   -0.029  58.150    11.171   0.398   31.174    -35.419  -0.662  56.905    27.021   1.216   21.458
Table 2: Comparison of key performance metrics during the testing period for the single-asset trading tasks
involving eight stocks, between F IN C ON and other algorithmic agents. The highest and second-highest
CRs and SRs have been tested and found statistically significant using the Wilcoxon signed-rank test. The
highest CRs and SRs are highlighted in red; the second highest are marked in blue.
In alignment with market trends, F IN C ON consistently exhibits superior decision-making quality
compared to other LLM-based agents, regardless of market conditions—whether bullish (e.g., GOOG,
MSFT), bearish (e.g., NIO), or mixed (e.g., TSLA). We attribute this performance to its high-quality
distillation of information through a synthesized multi-agent collaboration mechanism, combined with
its dual-level risk control design, positioning F IN C ON as a leader in the space. By contrast, F IN GPT
primarily relies on sentiment analysis of financial information, failing to fully exploit the potential of
LLMs to integrate nuanced textual insights with numerical financial indicators. Similarly, GA and
F IN M EM use single-agent frameworks without sophisticated information distillation processes or a
diverse toolset, placing heavy cognitive demand on the agent to process multi-source information,
especially when dealing with large and varied data modalities. Moreover, their static or minimal
investment belief systems result in weak filtering of market noise. As illustrated in Figure 7 (a) &
(b) of Appendix A.7.2, this limitation leads these models to consistently hold lower positions and
hesitate between ‘buy’ or ‘sell’ decisions, ultimately resulting in suboptimal performance.
F IN C ON overcomes these challenges through its innovative multi-agent synthesis, enabling it to
deliver superior outcomes. Although F INAGENT performs well when integrating images and tabular
data, it struggles to remain competitive when incorporating audio data, such as ECC recordings,
which are critical in real-world trading. Additionally, F INAGENT relies on similarity-based memory
retrieval, which can lead to decisions based on outdated information, often resulting in errors. In
contrast, F IN C ON's memory structure accounts for the varying timeliness of multi-source financial
data, significantly enhancing decision quality and overall performance.

Table 3: Comparison of key performance metrics across all portfolio management strategies for Portfolios 1 & 2.
F IN C ON leads on all performance metrics.

Figure 3: Portfolio values of Portfolios 1 & 2 over time for all strategies. The computation of portfolio value
follows Equation 7 in Appendix A.10.

4.3 Ablation Studies
In response to RQ2 and RQ3, we conduct a comprehensive evaluation of our unique risk control
component through two ablation studies. Both studies maintain consistency with the training and
testing periods used in the main experiments. The first study examines the effectiveness of the within-
episode risk control mechanism, which leverages Conditional Value at Risk (CVaR) to manage risk in
real-time, as detailed in Table 4. Comparisons on primary metrics show that CVaR-based within-episode
risk control is effective in both bullish and bearish market environments in the single-asset trading case.
Moreover, in portfolio trading with mixed price trends, our within-episode risk control mechanism
performs robustly by monitoring the entire portfolio's value fluctuations.
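For reference, CVaR at level α is the expected return over the worst α-fraction of days; a minimal empirical estimator (the paper's exact estimator and alert threshold are not specified in this section) could be:

```python
import numpy as np

def cvar(returns, alpha=0.05):
    """Empirical Conditional Value at Risk at level alpha: the mean of the
    worst alpha-fraction of daily returns (sketch; a monitoring rule might
    alert when this value deteriorates day over day)."""
    r = np.sort(np.asarray(returns))          # worst (most negative) days first
    k = max(1, int(np.ceil(alpha * len(r))))  # number of tail observations
    return r[:k].mean()

daily_returns = np.array([0.01, -0.03, 0.002, -0.015, 0.004, -0.05,
                          0.02, 0.003, -0.01, 0.007])
risk = cvar(daily_returns, alpha=0.2)  # mean of the worst 20% of days
```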
The second study focuses on the over-episode risk control mechanism, demonstrating its critical
role in updating the trading manager agent’s beliefs to provide a more comprehensive understanding
of current trading conditions, as articulated in Table 5. The markedly improved CRs and SRs in
both decision-making scenarios underscore the effectiveness of using CVRF to update investment
beliefs episodically, guiding the agent towards more profitable investment strategies. Additionally,
F IN C ON demonstrates significant learning gains, achieving these results after only four training
episodes—substantially fewer than what is typically required by traditional RL algorithmic trading
agents. More visualizations and analysis are provided in the Appendix A.7.
Table 4: Key metrics for F IN C ON with vs. without CVaR for within-episode risk control. F IN C ON with
CVaR delivers leading performance in both single-asset trading and portfolio management tasks.

Table 5: Key metrics for F IN C ON with vs. without belief updates for over-episode risk control. F IN C ON
with CVRF delivers leading performance in both single-asset trading and portfolio management tasks.
5 Conclusion
In this paper, we present F IN C ON, a novel LLM-based multi-agent framework for financial decision-
making tasks, including single stock trading and portfolio management. Central to F IN C ON is the
Synthesized Manager-Analyst hierarchical communication structure and a dual-level risk control
component. This communication method channels financial data from multiple sources to specialized
analyst agents, who distill it into key investment insights. The manager agent then synthesizes these
insights for decision-making. Our experimental evaluations demonstrate the efficacy of our risk
control mechanism in mitigating investment risks and enhancing trading performance. Additionally,
the streamlined communication structure reduces overhead. The dual-level risk control component
introduces a novel approach to defining agent personas, enabling dynamic updates of risk and
market beliefs within agent communication. A valuable future research direction would be to scale
F IN C ON’s framework to manage large-sized portfolios comprising tens of assets, while maintaining
the impressive decision-making quality demonstrated with smaller portfolios. Given the LLM’s
input length constraint, a critical challenge lies in striking an optimal balance between information
conciseness through agent distillation and potential performance deterioration when extending the
current context window. Addressing this will be essential for ensuring quality-assured outcomes.
References
[1] Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.
[2] Xuan-Hong Dang, Syed Yousaf Shah, and Petros Zerfos. "The Squawk Bot": Joint learning
of time series and text data modalities for automated financial information filtering. arXiv
preprint arXiv:1912.10858, 2019.
[3] John L Maginn, Donald L Tuttle, Dennis W McLeavey, and Jerald E Pinto. Managing
investment portfolios: a dynamic process, volume 3. John Wiley & Sons, 2007.
[4] Roy Radner. The organization of decentralized information processing. Econometrica: Journal
of the Econometric Society, pages 1109–1146, 1993.
[5] George A Miller. The magical number seven, plus or minus two: Some limits on our capacity
for processing information. Psychological review, 63(2):81, 1956.
[6] Rundong Wang, Hongxin Wei, Bo An, Zhouyan Feng, and Jun Yao. Commission fee is not
enough: A hierarchical reinforced framework for portfolio management. In Proceedings of the
AAAI Conference on Artificial Intelligence, volume 35, pages 626–633, 2021.
[7] Weiguang Han, Boyi Zhang, Qianqian Xie, Min Peng, Yanzhao Lai, and Jimin Huang. Select
and trade: Towards unified pair trading with hierarchical reinforcement learning. In Proceed-
ings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages
4123–4134, 2023.
[8] Molei Qin, Shuo Sun, Wentao Zhang, Haochong Xia, Xinrun Wang, and Bo An. Earnhft:
Efficient hierarchical reinforcement learning for high frequency trading. In Proceedings of the
AAAI Conference on Artificial Intelligence, volume 38, pages 14669–14676, 2024.
[9] Jie Huang and Kevin Chen-Chuan Chang. Towards reasoning in large language models: A
survey. arXiv preprint arXiv:2212.10403, 2022.
[10] Mingyu Jin, Qinkai Yu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, Mengnan
Du, et al. The impact of reasoning step length on large language models. arXiv preprint
arXiv:2401.04925, 2024.
[11] Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, and Denny Zhou. Large language models
as tool makers. arXiv preprint arXiv:2305.17126, 2023.
[12] Jingqing Ruan, Yihong Chen, Bin Zhang, Zhiwei Xu, Tianpeng Bao, Hangyu Mao, Ziyue
Li, Xingyu Zeng, Rui Zhao, et al. Tptu: Task planning and tool usage of large language
model-based ai agents. In NeurIPS 2023 Foundation Models for Decision Making Workshop,
2023.
[13] Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M Sadler, Wei-Lun Chao, and Yu Su.
Llm-planner: Few-shot grounded planning for embodied agents with large language models. In
Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2998–3009,
2023.
[14] Bhargavi Paranjape, Scott Lundberg, Sameer Singh, Hannaneh Hajishirzi, Luke Zettlemoyer,
and Marco Tulio Ribeiro. Art: Automatic multi-step reasoning and tool-use for large language
models. arXiv preprint arXiv:2303.09014, 2023.
[15] Qianqian Xie, Weiguang Han, Xiao Zhang, Yanzhao Lai, Min Peng, Alejandro Lopez-Lira,
and Jimin Huang. Pixiu: A comprehensive benchmark, instruction dataset and large language
model for finance. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine,
editors, Advances in Neural Information Processing Systems, volume 36, pages 33469–33484.
Curran Associates, Inc., 2023.
[16] Xiao Zhang, Ruoyu Xiang, Chenhan Yuan, Duanyu Feng, Weiguang Han, Alejandro Lopez-
Lira, Xiao-Yang Liu, Meikang Qiu, Sophia Ananiadou, Min Peng, Jimin Huang, and Qianqian
Xie. Dólares or dollars? unraveling the bilingual prowess of financial llms between spanish
and english. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining, KDD ’24, page 6236–6246, New York, NY, USA, 2024. Association for
Computing Machinery.
[17] Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi
Xiao, Dong Li, Yongfu Dai, Duanyu Feng, et al. The finben: An holistic financial benchmark
for large language models. arXiv preprint arXiv:2402.12659, 2024.
[18] Gang Hu, Ke Qin, Chenhan Yuan, Min Peng, Alejandro Lopez-Lira, Benyou Wang, Sophia
Ananiadou, Wanlong Yu, Jimin Huang, and Qianqian Xie. No language is an island: Unifying
chinese and english in financial large language models, instruction data, and benchmarks.
arXiv preprint arXiv:2403.06249, 2024.
[19] Yuzhe Yang, Yifei Zhang, Yan Hu, Yilin Guo, Ruoli Gan, Yueru He, Mingcong Lei, Xiao
Zhang, Haining Wang, Qianqian Xie, et al. Ucfe: A user-centric financial expertise benchmark
for large language models. arXiv preprint arXiv:2410.14059, 2024.
[20] Joon Sung Park, Joseph C O’Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and
Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. arXiv
preprint arXiv:2304.03442, 2023.
[21] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li,
Li Jiang, Xiaoyun Zhang, and Chi Wang. Autogen: Enabling next-gen llm applications via
multi-agent conversation framework. arXiv preprint arXiv:2308.08155, 2023.
[22] Shuofei Qiao, Ningyu Zhang, Runnan Fang, Yujie Luo, Wangchunshu Zhou, Yuchen Eleanor
Jiang, Chengfei Lv, and Huajun Chen. Autoact: Automatic agent learning from scratch via
self-planning. arXiv preprint arXiv:2401.05268, 2024.
[23] Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili
Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, et al. Metagpt: Meta programming for
multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 2023.
[24] Xudong Guo, Kaixuan Huang, Jiale Liu, Wenhui Fan, Natalia Vélez, Qingyun Wu, Huazheng
Wang, Thomas L Griffiths, and Mengdi Wang. Embodied llm agents learn to cooperate in
organized teams. arXiv preprint arXiv:2403.12482, 2024.
[25] Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, and
Yang Liu. Agent hospital: A simulacrum of hospital with evolvable medical agents. arXiv
preprint arXiv:2405.02957, 2024.
[26] Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu,
Yaxi Lu, Yi-Hsin Hung, Chen Qian, et al. Agentverse: Facilitating multi-agent collaboration
and exploring emergent behaviors. In The Twelfth International Conference on Learning
Representations, 2023.
[27] Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee, Chenguang Zhu, and Michael Zeng. Automatic
prompt optimization with "gradient descent" and beam search. arXiv preprint
arXiv:2305.03495, 2023.
[28] Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Siyuan Lu, Yaliang Li, and Ji-Rong Wen.
Unleashing the potential of large language models as prompt optimizers: An analogical
analysis with gradient-based model optimizers. arXiv preprint arXiv:2402.17564, 2024.
[29] Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh
Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit, et al. Retroformer: Retrospective large
language agents with policy gradient optimization. arXiv preprint arXiv:2308.02151, 2023.
[30] Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Re-
flexion: Language agents with verbal reinforcement learning. Advances in Neural Information
Processing Systems, 36, 2024.
[31] Hongyang Yang, Xiao-Yang Liu, and Christina Dan Wang. Fingpt: Open-source financial
large language models. arXiv preprint arXiv:2306.06031, 2023.
[32] Yangyang Yu, Haohang Li, Zhi Chen, Yuechen Jiang, Yang Li, Denghui Zhang, Rong Liu,
Jordan W Suchow, and Khaldoun Khashanah. Finmem: A performance-enhanced llm trading
agent with layered memory and character design. arXiv preprint arXiv:2311.13743, 2023.
[33] Wentao Zhang, Lingxuan Zhao, Haochong Xia, Shuo Sun, Jiaze Sun, Molei Qin, Xinyi Li,
Yuqing Zhao, Yilei Zhao, Xinyu Cai, et al. Finagent: A multimodal foundation agent for finan-
cial trading: Tool-augmented, diversified, and generalist. arXiv preprint arXiv:2402.18485,
2024.
[34] Freddy Delbaen and Sara Biagini. Coherent risk measures. Springer, 2000.
[35] Frank J Fabozzi, Sergio M Focardi, and Petter N Kolm. Quantitative equity investing: Tech-
niques and strategies. John Wiley & Sons, 2010.
[36] Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang, Lingyao Li, Zhengting Wang,
Wenyue Hua, Dong Shu, Suiyuan Zhu, Xiaobo Jin, et al. When ai meets finance (stockagent):
Large language model-based stock trading in simulated real-world environments. arXiv
preprint arXiv:2407.18957, 2024.
[37] Keith Kuester, Stefan Mittnik, and Marc S Paolella. Value-at-risk prediction: A comparison of
alternative strategies. Journal of Financial Econometrics, 4(1):53–89, 2006.
[38] Matthijs TJ Spaan. Partially observable markov decision processes. In Reinforcement learning:
State-of-the-art, pages 387–414. Springer, 2012.
[39] Frank J Fabozzi, Harry M Markowitz, and Francis Gupta. Portfolio selection. Handbook of
finance, 2, 2008.
[40] Yang Liu, Qi Liu, Hongke Zhao, Zhen Pan, and Chuanren Liu. Adaptive quantitative trading:
An imitative deep reinforcement learning approach. In Proceedings of the AAAI conference on
artificial intelligence, volume 34, pages 2128–2135, 2020.
[41] Taylan Kabbani and Ekrem Duman. Deep reinforcement learning approach for trading
automation in the stock market. IEEE Access, 10:93564–93574, 2022.
[42] Thomas L Griffiths, Jian-Qiao Zhu, Erin Grant, and R Thomas McCoy. Bayes in the age of
intelligent machines. arXiv preprint arXiv:2311.10206, 2023.
[43] Ashwin Rao and Tikhon Jelvis. Foundations of reinforcement learning with applications in
finance. Chapman and Hall/CRC, 2022.
[44] Theodore Sumers, Shunyu Yao, Karthik Narasimhan, and Thomas L. Griffiths. Cognitive
architectures for language agents, 2023.
[45] Jintian Zhang, Xin Xu, and Shumin Deng. Exploring collaboration mechanisms for llm agents:
A social psychology view. arXiv preprint arXiv:2310.02124, 2023.
[46] Anthony D Wagner. Working memory contributions to human learning and remembering.
Neuron, 22(1):19–22, 1999.
[47] Harry M Markowitz and G Peter Todd. Mean-variance analysis in portfolio choice and capital
markets, volume 66. John Wiley & Sons, 2000.
[48] Xiao-Yang Liu, Hongyang Yang, Qian Chen, Runjia Zhang, Liuqing Yang, Bowen Xiao, and
Christina Dan Wang. Finrl: A deep reinforcement learning library for automated stock trading
in quantitative finance. arXiv preprint arXiv:2011.09607, 2020.
[49] Noah Shinn, Beck Labash, and Ashwin Gopinath. Reflexion: an autonomous agent with
dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366, 2023.
[50] Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Liu, and Gao Huang. Expel:
Llm agents are experiential learners. In Proceedings of the AAAI Conference on Artificial
Intelligence, volume 38, pages 19632–19642, 2024.
[51] Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond,
Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, et al. Competition-level code
generation with alphacode. Science, 378(6624):1092–1097, 2022.
[52] Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, and Weizhu
Chen. Codet: Code generation with generated tests. arXiv preprint arXiv:2207.10397, 2022.
[53] Chen Qian, Xin Cong, Cheng Yang, Weize Chen, Yusheng Su, Juyuan Xu, Zhiyuan Liu,
and Maosong Sun. Communicative agents for software development. arXiv preprint
arXiv:2307.07924, 2023.
[54] Zelai Xu, Chao Yu, Fei Fang, Yu Wang, and Yi Wu. Language agents with reinforcement
learning for strategic play in the werewolf game. arXiv preprint arXiv:2310.18940, 2023.
[55] Weiyu Ma, Qirui Mi, Xue Yan, Yuqiao Wu, Runji Lin, Haifeng Zhang, and Jun Wang. Large
language models play starcraft ii: Benchmarks and a chain of summarization approach. arXiv
preprint arXiv:2312.11865, 2023.
[56] J de Curtò, I de Zarzà, Gemma Roig, Juan Carlos Cano, Pietro Manzoni, and Carlos T Calafate.
Llm-informed multi-armed bandit strategies for non-stationary environments. Electronics,
12(13):2814, 2023.
[57] Huaqin Zhao, Zhengliang Liu, Zihao Wu, Yiwei Li, Tianze Yang, Peng Shu, Shaochen Xu,
Haixing Dai, Lin Zhao, Gengchen Mai, et al. Revolutionizing finance with llms: An overview
of applications and insights. arXiv preprint arXiv:2401.11641, 2024.
[58] Zhiwei Liu, Xin Zhang, Kailai Yang, Qianqian Xie, Jimin Huang, and Sophia Ananiadou.
Fmdllama: Financial misinformation detection based on large language models. arXiv preprint
arXiv:2409.16452, 2024.
[59] Yupeng Cao, Zhi Chen, Qingyun Pei, Fabrizio Dimino, Lorenzo Ausiello, Prashant Kumar,
KP Subbalakshmi, and Papa Momar Ndiaye. Risklabs: Predicting financial risk using large
language model based on multi-sources data. arXiv preprint arXiv:2404.07452, 2024.
[60] Qianqian Xie, Dong Li, Mengxi Xiao, Zihao Jiang, Ruoyu Xiang, Xiao Zhang, Zhengyu Chen,
Yueru He, Weiguang Han, Yuzhe Yang, et al. Open-finllms: Open multimodal large language
models for financial applications. arXiv preprint arXiv:2408.11878, 2024.
[61] Lorenzo Canese, Gian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino,
Marco Re, and Sergio Spanò. Multi-agent reinforcement learning: A review of challenges and
applications. Applied Sciences, 11(11):4948, 2021.
[62] Kaiqing Zhang, Zhuoran Yang, and Tamer Başar. Multi-agent reinforcement learning: A
selective overview of theories and algorithms. Handbook of reinforcement learning and control,
pages 321–384, 2021.
[63] Jakob Foerster, Ioannis Alexandros Assael, Nando De Freitas, and Shimon Whiteson. Learning
to communicate with deep multi-agent reinforcement learning. Advances in neural information
processing systems, 29, 2016.
[64] Toru Lin, Jacob Huh, Christopher Stauffer, Ser Nam Lim, and Phillip Isola. Learning to ground
multi-agent communication with autoencoders. Advances in Neural Information Processing
Systems, 34:15230–15242, 2021.
[65] Woojun Kim, Jongeui Park, and Youngchul Sung. Communication in multi-agent reinforce-
ment learning: Intention sharing. In International Conference on Learning Representations,
2020.
[66] Changxi Zhu, Mehdi Dastani, and Shihan Wang. A survey of multi-agent reinforcement
learning with communication. arXiv preprint arXiv:2203.08975, 2022.
[67] Zhiyuan Yao, Zheng Li, Matthew Thomas, and Ionut Florescu. Reinforcement learning in
agent-based market simulation: Unveiling realistic stylized facts and behavior. arXiv preprint
arXiv:2403.19781, 2024.
[68] HaoHang Li and Steve Y Yang. Impact of false information from spoofing strategies: An
abm model of market dynamics. In 2022 IEEE Symposium on Computational Intelligence for
Financial Engineering and Economics (CIFEr), pages 1–10. IEEE, 2022.
[69] Kuan Wang, Yadong Lu, Michael Santacroce, Yeyun Gong, Chao Zhang, and Yelong Shen.
Adapting llm agents through communication. arXiv preprint arXiv:2310.01444, 2023.
[70] Zhao Mandi, Shreeya Jain, and Shuran Song. Roco: Dialectic multi-robot collaboration with
large language models. arXiv preprint arXiv:2307.04738, 2023.
[71] Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B Tenenbaum,
Tianmin Shu, and Chuang Gan. Building cooperative embodied agents modularly with large
language models. arXiv preprint arXiv:2307.02485, 2023.
[72] Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. Improving
factuality and reasoning in language models through multiagent debate. arXiv preprint
arXiv:2305.14325, 2023.
[73] Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu,
and Zhiyuan Liu. Chateval: Towards better llm-based evaluators through multi-agent debate.
arXiv preprint arXiv:2308.07201, 2023.
[74] Frank Xing. Designing heterogeneous llm agents for financial sentiment analysis. arXiv
preprint arXiv:2401.05799, 2024.
[75] Irene de Zarzà i Cubero, Joaquim de Curtò i Díaz, Gemma Roig, and Carlos T Calafate.
Optimized financial planning: Integrating individual and cooperative budgeting models with
llm recommendations. AI, 5(1):91–114, 2024.
[76] Xiangpeng Wan, Haicheng Deng, Kai Zou, and Shiqi Xu. Enhancing the efficiency and
accuracy of underlying asset reviews in structured finance: The application of multi-agent
framework. arXiv preprint arXiv:2405.04294, 2024.
[77] Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang, Lingyao Li, Zhengting Wang,
Wenyue Hua, Dong Shu, Suiyuan Zhu, Xiaobo Jin, et al. When ai meets finance (stockagent):
Large language model-based stock trading in simulated real-world environments. arXiv
preprint arXiv:2407.18957, 2024.
[78] Patrick Bolton and Mathias Dewatripont. The firm as a communication network. The Quarterly
Journal of Economics, 109(4):809–839, 1994.
[79] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and
Yuan Cao. React: Synergizing reasoning and acting in language models. arXiv preprint
arXiv:2210.03629, 2022.
[80] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le,
Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models.
Advances in neural information processing systems, 35:24824–24837, 2022.
[81] Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik
Narasimhan. Tree of thoughts: Deliberate problem solving with large language models.
Advances in Neural Information Processing Systems, 36, 2024.
[82] Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, and
Zhiting Hu. Reasoning with language model is planning with world model. arXiv preprint
arXiv:2305.14992, 2023.
[83] Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin
Li, Lewei Lu, Xiaogang Wang, et al. Ghost in the minecraft: Generally capable agents for
open-world enviroments via large language models with text-based knowledge and memory.
arXiv preprint arXiv:2305.17144, 2023.
[84] Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen,
Jiakai Tang, Xu Chen, Yankai Lin, et al. A survey on large language model based autonomous
agents. Frontiers of Computer Science, 18(6):1–26, 2024.
[85] Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. Reinforcement learning: A
survey. Journal of artificial intelligence research, 4:237–285, 1996.
[86] Guojun Xiong, Shufan Wang, Daniel Jiang, and Jian Li. Personalized federated reinforce-
ment learning with shared representations. In Deployable RL: From Research to Practice@
Reinforcement Learning Conference 2024, 2024.
[87] Guojun Xiong, Ujwal Dinesha, Debajoy Mukherjee, Jian Li, and Srinivas Shakkottai. Dopl:
Direct online preference learning for restless bandits with preference feedback. arXiv preprint
arXiv:2410.05527, 2024.
[88] Zhiyuan Yao, Ionut Florescu, and Chihoon Lee. Control in stochastic environment with
delays: A model-based reinforcement learning approach. In Proceedings of the International
Conference on Automated Planning and Scheduling, volume 34, pages 663–670, 2024.
[89] Bin Zhang, Hangyu Mao, Jingqing Ruan, Ying Wen, Yang Li, Shao Zhang, Zhiwei Xu, Dapeng
Li, Ziyue Li, Rui Zhao, et al. Controlling large language model-based agents for large-scale
decision-making: An actor-critic approach. arXiv preprint arXiv:2311.13884, 2023.
[90] John Hull. Risk Management and Financial Institutions. John Wiley & Sons, 2007.
[91] William F. Sharpe. The sharpe ratio. The Journal of Portfolio Management, 21(1):49–58,
1994.
[92] Andrew Ang and Joseph Chen. Downside risk. Journal of Portfolio Management, 29(4):103–
112, 2003.
[93] Xiao-Yang Liu, Guoxuan Wang, and Daochen Zha. Fingpt: Democratizing internet-scale data
for financial large language models. arXiv preprint arXiv:2307.10485, 2023.
[94] Yu Qin and Yi Yang. What you say and how you say it matters: Predicting stock volatility
using verbal and vocal cues. In Proceedings of the 57th Annual Meeting of the Association for
Computational Linguistics, pages 390–401, 2019.
[95] Linyi Yang, Tin Lok James Ng, Barry Smyth, and Ruihai Dong. Html: Hierarchical
transformer-based multi-task learning for volatility prediction. In Proceedings of The Web
Conference 2020, pages 441–451, 2020.
[96] Yupeng Cao, Zhi Chen, Qingyun Pei, Prashant Kumar, KP Subbalakshmi, and Papa Momar
Ndiaye. Ecc analyzer: Extract trading signal from earnings conference calls using large
language model for stock performance prediction. arXiv preprint arXiv:2404.18470, 2024.
[97] John C Hull. Options, Futures, and Other Derivatives. Pearson Education, 2017.
[98] Xiao-Yang Liu, Hongyang Yang, Jiechao Gao, and Christina Dan Wang. FinRL: Deep rein-
forcement learning framework to automate trading in quantitative finance. ACM International
Conference on AI in Finance (ICAIF), 2021.
[99] Xiao-Yang Liu, Ziyi Xia, Jingyang Rui, Jiechao Gao, Hongyang Yang, Ming Zhu, Christina
Wang, Zhaoran Wang, and Jian Guo. Finrl-meta: Market environments and benchmarks for
data-driven financial reinforcement learning. Advances in Neural Information Processing
Systems, 35:1835–1849, 2022.
[100] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap,
Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforce-
ment learning. In International conference on machine learning, pages 1928–1937. PMLR,
2016.
[101] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal
policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[102] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan
Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint
arXiv:1312.5602, 2013.
[103] Yuliya Plyakha, Raman Uppal, and Grigory Vilkov. Why does an equal-weighted portfolio
outperform value-and price-weighted portfolios? Available at SSRN 2724535, 2012.
[104] Jaap MJ Murre and Joeri Dros. Replication and analysis of ebbinghaus’ forgetting curve. PloS
one, 10(7):e0120644, 2015.
[105] Guardrails AI. [Link] Open-source library for interacting with
large language models.
A Appendix
A.1 Related Work
LLM Agents for Financial Decision Making. Considerable effort has gone into developing
general-purpose LLM agents for sequential decision-making [49, 50]. Such tasks often involve
episodic interactions with an environment and verbal reflections for action refinement, as in coding
competitions [51, 52], software development [53, 23], and game playing [54, 55]. Researchers have
also begun to explore how LLM agents can perform better on harder decision-making tasks from
finance [56, 57, 58, 59, 60], where more volatile environments and numerous unpredictable elements
can obscure an agent's ability to reflect accurately on the reasons for poor decision outcomes.
FinMem [32] enhances single stock trading performance by embedding memory modules in an LLM
agent for reflection and refinement, and FinAgent [33] improves trading profits by using external
quantitative tools to counter the volatile environment.
Multi-Agent System and Communication Structures. In traditional multi-agent systems [61, 62],
the agents' communication protocol is pre-determined, such as sharing data or state observations [63, 64,
65, 66, 67, 68]. The emergence of large language models brings flexibility through human-understandable
communication [69, 20, 23, 70], so some work elevates the decision-making ability of LLM-based
multi-agent systems by letting agents engage in discussions [71, 21] or debates [72, 73]. A
similar peer-communication strategy has also been used in multi-agent systems for financial tasks
[74, 75, 76]. However, such approaches are not optimal for unified-goal financial tasks that prioritize
profits [77], because they suffer from potentially ambiguous optimization objectives and cannot
control unnecessary communication costs [78].
Prompt Optimization and Verbal Reinforcement. To enhance the reasoning or decision-making
of LLM agents, many prompt optimization techniques have been proposed, such as ReAct [79], Chain
of Thought (CoT) [80], Tree of Thoughts (ToT) [81], and ART [14], which enable LLM agents to
automatically generate intermediate reasoning steps as an iterative program. In addition, to make
LLM agents decide like humans and generate more understandable reasoning text, some researchers
recommend incorporating cognitive structures [82, 83, 44, 84]. Inspired by this previous work and by
DRL algorithms [85, 86, 87, 67, 88], verbal reinforcement [29, 30, 89, 24] was developed for LLM
agents so that they can update actions based on iterative self-reflection while integrating an additional
LLM as a prompt optimizer [27, 28].
In an LLM-based prompt optimizer, a meta-prompt [27, 28] is used to refine the task prompt for
better performance. For example, for a mathematical reasoning task, the task prompt might be "Let’s
solve the problem," while the meta-prompt could be "Improve the prompt to help a model better
perform mathematical reasoning."
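The iterative refinement loop described above can be sketched as follows. Here `evaluate` and `propose` are hypothetical stand-ins (not part of the original system): the former scores a task prompt on the downstream task, and the latter queries an LLM with the meta-prompt and returns a candidate prompt.

```python
def optimize_prompt(task_prompt, evaluate, propose, iterations=3):
    """Greedy meta-prompt optimization loop (illustrative sketch).

    evaluate(prompt) -> float   scores a task prompt on the downstream task
    propose(meta_prompt) -> str queries an LLM and returns a candidate prompt
    Both callables are assumptions standing in for real LLM/evaluation calls.
    """
    best_prompt, best_score = task_prompt, evaluate(task_prompt)
    for _ in range(iterations):
        meta_prompt = (
            f'The current task prompt is: "{best_prompt}" '
            f"(score: {best_score:.2f}). "
            "Improve the prompt to help a model better perform the task."
        )
        candidate = propose(meta_prompt)
        score = evaluate(candidate)
        if score > best_score:  # keep only candidates that improve the score
            best_prompt, best_score = candidate, score
    return best_prompt, best_score
```

The greedy acceptance rule (only keeping candidates that improve the score) is one simple way to keep the optimization stable; the actual acceptance criterion used in practice may differ.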
Although prompt optimization lacks explicit gradients to control the update direction, we can simulate
a "textual gradient" by using LLMs' reflection capabilities. By generating feedback from past successes
and failures in trading decisions, LLMs can produce "semantic" gradient signals that guide the
optimization process.
Adjusting the optimization process’s direction is crucial, similar to tuning the learning rate in
traditional parameter optimization. An inappropriate learning rate can cause the process to oscillate
or converge too slowly. Similarly, without proper control, the LLM-based optimizer might overshoot
or oscillate during prompt optimization.
To mimic learning rate effects, we measure the overlapping percentage between trading decision
sequences from consecutive iterations. We then directly edit the previous task prompt to enhance
performance. The meta-prompt instructs the LLM to modify the current prompt based on feedback,
ensuring a stable and incremental improvement process. This method allows for effective exploitation
of existing prompts, leading to gradual performance enhancement.
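As a concrete illustration, the overlap measure between two consecutive decision sequences might be computed as below; the exact implementation details are an assumption, not taken from the paper.

```python
def action_overlap(prev_actions, curr_actions):
    """Fraction of trading days on which two consecutive optimization
    iterations produced the same action (e.g. BUY / SELL / HOLD).
    A high overlap suggests a small, stable update; a low overlap
    suggests the optimizer may be overshooting or oscillating."""
    if len(prev_actions) != len(curr_actions):
        raise ValueError("action sequences must cover the same trading days")
    matches = sum(a == b for a, b in zip(prev_actions, curr_actions))
    return matches / len(prev_actions)
```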
17
A.3 FINCON Testing Stage Workflow
During the testing stage, FINCON utilizes the investment beliefs learned during the training stage,
and the over-episode risk control mechanism no longer operates. However, the within-episode
risk control mechanism still functions, allowing the manager agent to adjust trading actions in
real time based on short-term trading performance and market fluctuations. This ensures that even
during testing, FINCON can promptly respond to market risks and potentially prevent losses while
leveraging the knowledge gained during training.
A.4 Figure of Modular Design of Agents in FINCON
General Configuration
1. Investment task introduction 2. Trading target background
3. Trading sectors 4. Historical financial performance overview
Profiling Module
Manager Agent:
1. Role assignment: You are an experienced trading manager in the investment firm ...
2. Role description: Your responsibilities are to consolidate investment insights from analysts
and make trading actions on {asset symbols} ...
Analyst Agents:
1. Role assignment: You are the investment analysts for news/ market data/ Form 10-K (Q)/
ECC audio recording ...
2. Role duty description: Your responsibilities are to distill investment insights and other
indicators like financial sentiment for {asset symbols} ...
Perception Module
Manager Agent:
1. Perceive: Investment insights from analyst agents; daily risk alert and episode-level
investment belief updates from the risk control component.
2. Send: Feedback to analyst agents about their contribution to significant investment
earnings & losses.
Analyst Agents:
1. Perceive: Market information from certain information sources.
2. Send: Relevant market insights to manager agent.
3. Receive: Feedback from the manager agent.
Memory Module
Manager Agent:
1. Working: - Consolidation - Refinement
2. Procedural: - Trading action records - Reflection records
3. Episodic: - Trajectory history
Analyst Agents:
1. Working: - Observation - Retrieval - Distillation
2. Procedural: - Distilled Investment-related insights - Financial sentiment
- Investment report recommended actions
Action Module
Manager Agent:
1. Conduct: Trading actions.
2. Reflect: Trading reasons and analyst agents’ contribution assessment.
Figure 4: The detailed modular design of the manager and analyst agents. The general configuration
and profiling modules generate text-based queries to retrieve investment-related information from the
agents’ memory databases. The perceptual and memory modules interact with LLMs via prompts to
extract key investment insights. The action module of the manager agent consolidates these insights
to facilitate informed trading decisions.
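The module layout in Figure 4 can be sketched in code roughly as follows. All class and field names are illustrative assumptions, and the LLM calls are replaced by placeholder logic; this is a structural sketch, not FINCON's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AnalystAgent:
    """One analyst per information source (news, market data, 10-K/10-Q, ECC audio)."""
    source: str
    procedural_memory: list = field(default_factory=list)  # distilled insights, sentiment

    def distill(self, raw_observation: str) -> str:
        # Placeholder for the perception module's LLM-based distillation step.
        insight = f"[{self.source}] {raw_observation}"
        self.procedural_memory.append(insight)
        return insight

@dataclass
class ManagerAgent:
    """Consolidates analyst insights and conducts trading actions."""
    analysts: list
    action_log: list = field(default_factory=list)

    def decide(self, observations: dict) -> str:
        insights = [a.distill(observations[a.source]) for a in self.analysts]
        action = "HOLD"  # placeholder for the LLM-driven consolidation and decision
        self.action_log.append({"insights": insights, "action": action})
        return action
```

A call such as `ManagerAgent([AnalystAgent("news")]).decide({"news": "..."})` then returns a trading action while populating the analyst's procedural memory, mirroring the perceive/send/consolidate flow described in the figure.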
A.5 Experimental Setup
A.6 Single Stock Trading Result Graphs
Figure 5: CRs over time for single-asset trading tasks. FINCON outperformed other comparative strategies,
achieving the highest CRs across all six stocks by the end of the testing period, regardless of market conditions.
A.7 Detailed Ablation Study
To answer RQ2, we conduct the first ablation study. We assess the efficacy of FINCON's within-
episode risk control mechanism by monitoring system risk changes through CVaR. To demonstrate
the robustness of FINCON, we compare its performance with versus without the CVaR
implementation across two task types: single-asset trading and portfolio management. Furthermore,
in single-asset trading tasks, we consider assets under both generally bullish and generally bearish
market conditions in the testing phase for comprehensiveness.
Our results demonstrate that implementing CVaR in FINCON is highly effective across all finan-
cial metrics for both task types, as shown in Table 4 and Figure 6. For single-asset trading tasks,
FINCON without within-episode risk control yields negative CRs and significantly higher MDDs,
underperforming the Buy-and-Hold strategy (CR of GOOG: 22.42%; CR of NIO:
−77.210%), highlighting the severe consequences of ignoring environmental risks. In portfolio
management, the CR increases dramatically from 14.699% to 113.836% with within-episode risk
control, demonstrating its effectiveness in risk supervision even amid non-uniform market trends.
(a) Single Stock: General Bullish (b) Single Stock: General Bearish (c) Multi-Assets
Figure 6: CRs of FINCON with vs. without implementing CVaR for within-episode risk control show that
the CVaR mechanism significantly improves FINCON's performance. This is evident from two metrics: (a)
cumulative returns over time for single stocks in both bullish and bearish market conditions, and (b) portfolio
value over time for a multi-asset portfolio. In both cases, FINCON with CVaR demonstrates substantially higher
gains.
Specifically, the success of utilizing CVaR for within-episode risk control is evident in both bullish
and bearish market environments, as shown in the single-asset trading case. In bullish markets,
CVaR sharply captures immediate market shocks and promptly informs FINCON to exercise caution,
even amidst general optimism. Conversely, in bearish markets, CVaR consistently alerts FINCON to
significant price drops, ensuring awareness of market risks. Moreover, in portfolio trading with mixed
price trends, our within-episode risk control mechanism performs robustly by monitoring the entire
portfolio's value fluctuations, enabling the trading manager agent to promptly adjust potentially
aggressive operations for each asset.
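A minimal sketch of the CVaR signal underlying this mechanism is given below. The tail level `alpha` and the use of daily simple returns are illustrative assumptions, not FINCON's exact configuration.

```python
import numpy as np

def cvar(daily_returns, alpha=0.05):
    """Conditional Value at Risk (expected shortfall): the mean of the
    worst alpha-fraction of daily returns. A more negative value means
    a fatter loss tail; a sharp drop in CVaR can serve as a within-episode
    risk alert to the manager agent."""
    r = np.sort(np.asarray(daily_returns, dtype=float))  # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(r))))             # number of tail observations
    return float(r[:k].mean())
```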
(a) Single Stock: General Bullish (b) Single Stock: General Bearish (c) Multi-Assets
Figure 7: CRs of FINCON with vs. without belief updates for over-episode risk control. (a) The CRs over
time for single stocks. The performance of FINCON with belief updates consistently leads in both bullish and
bearish market conditions. (b) The portfolio values over time for the multi-asset portfolio. FINCON's performance
with belief updates also achieved substantially higher gains.
Figure 8: The first and last LLM-generated investment belief updates produced by CVRF for GOOG.
A.8 Raw Data Sources
We assessed the performance of FINCON using multi-modal financial data from January 3, 2022, to
June 10, 2022, sourced from reputable databases and APIs including Yahoo Finance (via yfinance),
Alpaca News API, and Capital IQ, as detailed in Table 6. These data, initially stored in the Raw
Financial Data Warehouse as available observations of the financial market environment, are routed
into the corresponding Analysts' Procedural Memory Databases of FINCON based on timeliness
through the working memory's summarization operation.
Data Sources
News data associated with ticker: News data is sourced from REFINITIV REAL-TIME NEWS, which mainly
contains news from Reuters.
Form 10-Q, Part 1 Item 2 (Management’s Discussion and Analysis of Financial Condition and Results of
Operations): Quarterly reports (Form 10-Q) are required by the U.S. Securities and Exchange Commission
(SEC).
Form 10-K, Section 7 (Management’s Discussion and Analysis of Financial Condition and Results of
Operations): Annual reports (Form 10-K) are required by the U.S. Securities and Exchange Commission
(SEC), sourced from EDGAR, and downloaded via SEC API.
Historical stock price: Daily open price, high price, close price, adjusted close price, and volume data from
Yahoo Finance.
Zacks Equity Research:
Zacks Rank: The Zacks Rank is a short-term rating system that is most effective over the one- to three-month
holding horizon. The underlying driver for the quantitatively determined Zacks Rank is the same as the
Zacks Recommendation and reflects trends in earnings estimate revisions.
Zacks Analyst: Reason to Sell, Reason to Buy, and potential risks.
Earnings Conference Calls (ECC): An ECC is a type of unstructured financial data (audio) that is crucial for
understanding market dynamics and investor sentiment. The company's executive board delivers an ECC covering
recent financial outcomes, future projections, and strategic directions. Recent studies have underscored the
importance of not only the textual content of these calls but also their audio features. Analyses have revealed
that audio elements, such as tone, pace, and inflections, offer significant predictive value regarding
company performance and stock movements [94, 95, 96].
Table 6: Raw data and memory warehouses of FINCON
Figure 9: The distribution of news from REFINITIV REAL-TIME NEWS for the 42 stocks in the
experiments
Figure 10: The distribution of Form 10-K and 10-Q filings from the Securities and Exchange Commission (SEC)
for the 42 stocks in the experiments
Figure 11: The distribution of analyst reports from Zacks Equity Research for the 42 stocks in the
experiments
A.10 Formulas of Classic Financial Metrics for Risk Estimator and Decision-making Task
Performance Evaluation
$$\text{Cumulative Return} = \sum_{t=1}^{n} r_t = \sum_{t=1}^{n} \ln\frac{p_{t+1}}{p_t} \cdot \text{action}_t, \tag{5}$$
where $r_t$ represents the PnL for day $t+1$, $p_t$ is the closing price on day $t$, $p_{t+1}$ is the closing price
on day $t+1$, and $\text{action}_t$ denotes the trading decision made by the model for that day.
Portfolio Value: Portfolio value represents the total worth of all the investments held in a portfolio at
a given point in time. It is a metric used only in the portfolio management task.
$$\text{Cumulative Simple Return}_t = \prod_{k=1}^{t} \left(1 + \text{Daily Simple Return}_k\right) - 1 \tag{6}$$
$$\text{Sharpe Ratio} = \frac{R_p - R_f}{\sigma_p} \tag{8}$$
Max Drawdown of PnL [92]: Max Drawdown is a metric for assessing risk. It represents the most
significant decrease in a portfolio's value, from its highest point ($P_{\text{peak}}$) to its lowest point
($P_{\text{trough}}$) before a new peak emerges, as detailed in Equation 9. Indicative of investment strategy
robustness, a smaller Max Drawdown suggests reduced risk.
$$\text{Max Drawdown} = \max\left(\frac{P_{\text{peak}} - P_{\text{trough}}}{P_{\text{peak}}}\right) \tag{9}$$
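Under the definitions above, the three metrics can be computed as in the sketch below. Annualization factors for the Sharpe Ratio are omitted, and the long/hold/short action encoding (+1/0/−1) is an assumption for illustration.

```python
import numpy as np

def cumulative_return(prices, actions):
    """Eq. (5): daily log returns ln(p_{t+1}/p_t) weighted by each day's
    action (e.g. +1 long, 0 hold, -1 short), summed over the period."""
    log_returns = np.diff(np.log(np.asarray(prices, dtype=float)))
    return float(np.sum(log_returns * np.asarray(actions, dtype=float)))

def sharpe_ratio(returns, risk_free=0.0):
    """Eq. (8): mean excess return (R_p - R_f) over its standard deviation."""
    excess = np.asarray(returns, dtype=float) - risk_free
    return float(excess.mean() / excess.std(ddof=1))

def max_drawdown(portfolio_values):
    """Eq. (9): largest peak-to-trough decline relative to the running peak."""
    v = np.asarray(portfolio_values, dtype=float)
    running_peaks = np.maximum.accumulate(v)
    return float(np.max((running_peaks - v) / running_peaks))
```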
A.12 Portfolio Management
Markowitz Portfolio Selection [1]: Introduced by Harry Markowitz in 1952, this is a framework for
constructing portfolios that optimize expected return for a given level of risk or minimize risk for
a given level of expected return. The method uses expected returns, variances, and covariances of
asset returns to determine the optimal asset allocation, thereby balancing risk and return through
diversification.
FinRL-A2C [48]: An RL algorithm proposed by Liu et al. to address single stock trading and portfolio
optimization problems. The RL model makes trading decisions (i.e., portfolio weights) based on
observations of previous market conditions and the brokerage information of the RL agent. The
implementation of this algorithm 2 is publicly available and is used as a baseline in our study.
Equal-Weighted ETF [103]: A portfolio giving equal allocation to all stocks; similar to a buy-and-
hold strategy in single-stock trading, it provides a benchmark for market trends.
2
[Link]
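The equal-weighted benchmark can be sketched as a buy-and-hold allocation fixed on the first day. This is an illustrative implementation that ignores rebalancing, dividends, and transaction costs.

```python
import numpy as np

def equal_weight_values(price_matrix, initial_capital=1.0):
    """Portfolio value over time when capital is split evenly across all
    assets on day 0 and the resulting shares are held unchanged.
    price_matrix has shape (days, assets)."""
    prices = np.asarray(price_matrix, dtype=float)
    shares = (initial_capital / prices.shape[1]) / prices[0]  # equal dollar split on day 0
    return prices @ shares  # daily portfolio values
```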
A.13 Ranking Metrics for Procedural Memory in FINCON
Upon receiving an investment inquiry, each agent in FINCON retrieves the top-K pivotal memory
events from its procedural memory, where K is a hyperparameter. These events are selected based on
their information retrieval score. For any given memory event $E$, its information retrieval score $\gamma^E$ is
defined by
$$\gamma^E = S^E_{\text{Relevancy}} + S^E_{\text{Importance}} \tag{10}$$
which is adapted from Park et al. [20] but with modified relevancy and importance computations;
each score is scaled to $[0, 1]$ before summing. Upon the arrival of a trade inquiry $P$, when processing
memory event $E$ via LLM prompts, the agent computes the relevancy score $S^E_{\text{Relevancy}}$, which
measures the cosine similarity between the embedding vectors of the memory event's textual content,
$m^E$, and the LLM prompt query, $m^P$:
$$S^E_{\text{Relevancy}} = \frac{m^E \cdot m^P}{\|m^E\|_2 \times \|m^P\|_2} \tag{11}$$
Note that the LLM prompt query takes the trading inquiry and trader characteristics as input. The
importance score $S^E_{\text{Importance}}$, on the other hand, is inversely correlated with the time gap between
the inquiry and the event's memory timestamp, $\delta t = t^P - t^E$, mirroring Ebbinghaus's forgetting
curve [104]. More precisely, denoting the initial score value of memory event $E$ by $v^E$ and the
degrading ratio by $\theta \in (0, 1)$, the importance score is computed via
$$S^E_{\text{Importance}} = v^E \times \theta^{\delta t} \tag{12}$$
The ratio $\theta$ measures the diminishing importance of an event over time, inspired by the design
of [20]; in our design, however, the factors of recency and importance are handled by a single equation.
Different agents in FINCON admit different choices of $\{v^E, \theta\}$ for memory event $E$.
Additionally, an access counter function facilitates memory event augmentation, so that critical events
impacting trading decisions are reinforced by FINCON, while trivial events gradually fade. This is
achieved by using the LLM validation tool Guardrails AI [105] to track critical memory IDs. A
memory ID deemed critical to investment gains receives a +5 boost to its importance score $S^E_{\text{Importance}}$.
This access counter implementation enables FINCON to capture and prioritize crucial events based on type
and retrieval frequency.
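Equations (10) to (12), together with the +5 critical-event bonus, can be sketched as follows. The omission of the [0, 1] scaling step and the specific hyperparameter values are simplifications for illustration; per-agent choices of $v^E$ and $\theta$ would be plugged in here.

```python
import math

def relevancy(query_vec, event_vec):
    """Eq. (11): cosine similarity between query and event embeddings."""
    dot = sum(q * e for q, e in zip(query_vec, event_vec))
    norms = (math.sqrt(sum(q * q for q in query_vec))
             * math.sqrt(sum(e * e for e in event_vec)))
    return dot / norms

def importance(v0, theta, delta_t, critical_bonus=0.0):
    """Eq. (12): initial score v0 decayed by theta**delta_t (Ebbinghaus-style),
    plus the +5-style bonus for access-counter-flagged critical events."""
    return v0 * theta ** delta_t + critical_bonus

def retrieval_score(query_vec, event_vec, v0, theta, delta_t, critical=False):
    """Eq. (10): relevancy + importance. (The paper scales each term to
    [0, 1] before summing; that normalization is omitted here for brevity.)"""
    bonus = 5.0 if critical else 0.0
    return relevancy(query_vec, event_vec) + importance(v0, theta, delta_t, bonus)
```

Ranking memory events by `retrieval_score` and keeping the top K (here K = 5) then reproduces the retrieval step described above.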
The training period was chosen to account for the seasonal nature of corporate financial reporting and
the duration of data retention in FINCON's memory module. The selected training duration ensures
the inclusion of at least one publication cycle of either Form 10-Q, ECC, or Form 10-K. This strategy
ensures that the learned conceptualized investment guidance considers a more comprehensive scope
of factors. Additionally, the training duration allowed FINCON sufficient time to establish inferential
links between financial news, market indicators, and stock market trends, thereby accumulating
substantial experience. Furthermore, we set the number of top memory events retrieved for each agent
at 5. We ran FINCON and report the performance outcomes from the setting that achieved the
highest cumulative return during the testing phase.
To maintain consistency in the comparison, the training and testing phases for the other three LLM-
based agents were aligned with those of FinMem. Parameters of the other LLM-based agents not
encompassed by FinMem's configuration were kept in accordance with their original settings as
specified in their respective source code.
FINCON's performance was benchmarked against that of the most effective comparative model, using Cumulative Return and Sharpe Ratio as the primary evaluation metrics. The statistical significance of FINCON's outperformance was ascertained through the non-parametric Wilcoxon signed-rank test, which is particularly apt for non-Gaussian-distributed data.
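The significance test above can be reproduced with SciPy's paired Wilcoxon signed-rank test. The daily return series below are synthetic placeholders, not the paper's actual results:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Hypothetical paired daily returns over one testing phase:
# FINCON vs. the strongest comparative model.
fincon_returns = rng.normal(0.0015, 0.01, 120)
baseline_returns = rng.normal(0.0005, 0.01, 120)

# The test is applied to the paired differences; being rank-based,
# it requires no Gaussian assumption on the return distribution.
stat, p_value = wilcoxon(fincon_returns - baseline_returns)
print(f"W = {stat:.1f}, p = {p_value:.4f}")
```

A small p-value would indicate that the per-day return differences are systematically positive rather than noise.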
A.15 FINCON performance on extreme market conditions
To further illustrate the robustness of FINCON's performance, we assess its effectiveness in two distinct scenarios: (1) a single-asset trading task using TSLA and (2) a portfolio management task involving a combination of TSLA, MSFT, and PFE. Our evaluation focuses on key financial metrics, including Cumulative Returns (CRs), Sharpe Ratios (SRs), and Maximum Drawdown (MDD). The training period spanned from January 17, 2022, to March 31, 2022, while the testing phase covered April 1, 2022, to October 15, 2022. This specific timeframe was chosen due to the elevated levels of the CBOE Volatility Index (VIX), which averaged above 20, signaling greater market volatility during these months.
As demonstrated in Table 7 and Figure 12, FINCON is the sole agent system to achieve positive Cumulative Returns (CRs) and Sharpe Ratios (SRs) in the single-stock trading task. For the portfolio management task, results for all four baselines are detailed in Table 8 and Figure 13. In these comparisons, FINCON consistently attained the highest scores on the primary performance metrics.
The within-episode risk control mechanism in FINCON detects market risk within a single episode, allowing the manager agent to adjust trading actions and mitigate potential losses in real time. It leverages the Conditional Value at Risk (CVaR) to monitor market fluctuations. This mechanism operates during both the training and testing phases, enhancing decision-making performance by accounting for short-term trading dynamics.
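A minimal CVaR computation for such a within-episode check might look as follows. The PnL series, tail level, and alert threshold are illustrative assumptions, not values from the paper:

```python
import numpy as np

def cvar(returns: np.ndarray, alpha: float = 0.05) -> float:
    """Conditional Value at Risk: expected loss over the worst alpha tail.

    Returned as a positive number; larger values indicate heavier tail risk.
    """
    sorted_r = np.sort(returns)  # ascending, so worst returns come first
    cutoff = max(1, int(np.ceil(alpha * len(sorted_r))))
    return float(-sorted_r[:cutoff].mean())

# Hypothetical daily PnL within one episode. If tail risk exceeds a
# threshold, the manager agent would be prompted to de-risk, e.g. by
# reducing position size.
daily_pnl = np.array([0.01, -0.02, 0.005, -0.05, 0.015, -0.01, 0.02, -0.03])
risk = cvar(daily_pnl, alpha=0.25)
breach = risk > 0.03  # alert threshold is an illustrative assumption
```

Here the worst 25% of days (the two largest losses) average −4%, so the sketch would flag the episode for de-risking.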
FINCON's hierarchical communication structure channels financial data through specialized analyst agents, each focusing on a distinct data type. These agents distill the data into key investment insights, which the manager agent synthesizes for decision-making. This structure minimizes communication costs and enhances comprehension, enabling more informed trading actions through a detailed synthesis of multi-source insights.
Experimental results show that the over-episode risk control mechanism significantly enhances FINCON's trading performance by updating the manager agent's beliefs with insights drawn from performance differences between episodes. The improved Cumulative Returns (CRs) and Sharpe Ratios (SRs) in both single-asset and portfolio management scenarios demonstrate that the mechanism effectively guides agents toward more profitable strategies by continuously refining investment beliefs.
FINCON's performance is assessed using Cumulative Return (CR%), Sharpe Ratio (SR), and Max Drawdown (MDD%). CR and SR are prioritized because they provide comprehensive insight into overall performance and risk-adjusted returns, which are crucial for informed investment decisions. MDD is a secondary metric focused on evaluating the potential for significant losses.
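These three metrics can be computed from a daily return series as sketched below; the annualization factor of 252 trading days and a zero risk-free rate are conventional assumptions:

```python
import numpy as np

def cumulative_return(daily_returns) -> float:
    """CR: total compounded growth over the period."""
    return float(np.prod(1 + np.asarray(daily_returns)) - 1)

def sharpe_ratio(daily_returns, risk_free: float = 0.0, periods: int = 252) -> float:
    """SR: annualized mean excess return per unit of return volatility."""
    r = np.asarray(daily_returns) - risk_free / periods
    return float(r.mean() / r.std(ddof=1) * np.sqrt(periods))

def max_drawdown(daily_returns) -> float:
    """MDD: largest peak-to-trough decline of the compounded wealth curve."""
    wealth = np.cumprod(1 + np.asarray(daily_returns))
    peak = np.maximum.accumulate(wealth)
    return float(((peak - wealth) / peak).max())

daily = [0.01, -0.02, 0.03]  # toy return series for illustration
cr, sr, mdd = cumulative_return(daily), sharpe_ratio(daily), max_drawdown(daily)
```

CR and SR summarize profitability and risk-adjusted profitability; MDD isolates worst-case loss exposure, which is why it serves as the secondary metric.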
Episodic memory plays a pivotal role in FINCON's strategic decision-making by storing actions, PnL series from previous episodes, and updated conceptual investment beliefs from the risk control component. It is exclusive to the manager agent and serves as a historical reference, enabling the agent to refine its decision-making strategies over time by learning from past experiences and updating its investment beliefs accordingly.
The manager agent in FINCON enhances the procedural memory of analyst agents by providing feedback through an access counter. This helps align the multi-type data influencing specific time points and supports informed decision-making. Each analyst agent has a distinct procedural memory decay rate reflecting the timeliness of its financial data source, which the manager uses to optimize memory retrieval based on recency, relevance, and importance.
CVaR's effectiveness in within-episode risk control is evident in the improved trading metrics in both bullish and bearish markets. Using CVaR, FINCON achieved higher CR% and SR while reducing MDD% compared to scenarios without CVaR, indicating superior risk-adjusted performance in both single-asset and portfolio management tasks.
In single-stock trading, FINCON is compared with DRL agents (A2C, PPO, DQN), LLM-based agents, and the Buy-and-Hold strategy. For portfolio management, it is compared with Markowitz MV, FinRL-A2C, and an Equal-Weighted ETF strategy. FINCON's advanced risk control mechanisms and hierarchical multi-agent system deliver robust performance across both trading tasks, achieving higher cumulative returns and Sharpe ratios than these comparative methods.
FINCON's analyst agents are differentiated by their specialization in processing uni-modal data from single sources and their focus on specific trading targets, reducing task load and enhancing information extraction. The system employs seven distinct types of analyst agents, including textual, audio, and data-analysis agents, each generating unique investment insights. This specialized structure mimics an efficient human team by filtering out market noise and extracting key insights, which are then consolidated to aid the manager agent's decision-making.
FINCON implements a hierarchical multi-agent system in which analyst agents extract investment insights from multi-source market data, each focusing on specific trading targets. Each agent processes data from a single source, specializing in a specific function akin to a member of a human team. These agents assist the manager agent by consolidating denoised investment information from various perspectives. The manager agent, acting as the decision-maker, synthesizes these insights using convex optimization techniques and risk control mechanisms to produce trading actions. This structure enhances decision-making by maximizing the informativeness of presented insights while minimizing communication costs.
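As one illustration of such a convex-optimization step, the sketch below computes classical mean-variance portfolio weights in closed form. This is a stand-in under stated assumptions (unconstrained Markowitz weights, hypothetical expected returns and covariance for TSLA, MSFT, PFE), not the paper's actual optimization routine:

```python
import numpy as np

def mean_variance_weights(mu, Sigma, risk_aversion: float = 1.0):
    """Unconstrained mean-variance weights, w proportional to Sigma^-1 mu,
    normalized to sum to 1. A minimal stand-in for the convex optimization
    a manager agent could use to turn per-asset insights into weights."""
    raw = np.linalg.solve(risk_aversion * np.asarray(Sigma), np.asarray(mu))
    return raw / raw.sum()

# Hypothetical annualized expected returns and covariance matrix
# for TSLA, MSFT, PFE (illustrative numbers only).
mu = np.array([0.10, 0.07, 0.04])
Sigma = np.array([[0.20, 0.02, 0.01],
                  [0.02, 0.08, 0.01],
                  [0.01, 0.01, 0.05]])
w = mean_variance_weights(mu, Sigma)
```

In practice, the manager agent's per-asset insights would supply the expected-return view, with the risk control component constraining the resulting allocation.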