Published as a workshop paper at "Tackling Climate Change with Machine Learning", ICLR 2025

DEEP REINFORCEMENT LEARNING FOR POWER GRID
MULTI-STAGE CASCADING FAILURE MITIGATION
Bo Meng, Chenghao Xu & Yongli Zhu∗
School of System Science and Engineering
Sun Yat-sen University
Guangzhou, China
{mengb,xuchh29}@[Link], yzhu16@[Link]

arXiv:2505.09012v1 [[Link]] 13 May 2025

ABSTRACT

Cascading failures in power grids can lead to grid collapse, causing severe disruptions to social operations and economic activities. In certain cases, multi-stage cascading failures can occur. However, existing cascading-failure-mitigation strategies are usually single-stage-based, overlooking the complexity of the multi-stage scenario. This paper treats the multi-stage cascading failure problem as a reinforcement learning task and develops a simulation environment. The reinforcement learning agent is then trained via the deterministic policy gradient algorithm to achieve continuous actions. Finally, the effectiveness of the proposed approach is validated on the IEEE 14-bus and IEEE 118-bus systems.

1 INTRODUCTION

The modern large power grid consists of thousands of generators, substations, and transmission lines, all intricately interconnected and interdependent, working together to maintain the stable transmission of electricity. However, during the operation of the power system, various events may occur, among which cascading failures are particularly complex and highly damaging Chen et al. (2019); Jyoti & Hayat (2023); Uwamahoro & Eftekharnejad (2023). Cascading failures in power systems are typically triggered by the failure of a single component, e.g., a transmission line. These faults can rapidly propagate through the tightly interconnected network, potentially causing severe disturbances across the entire power grid and even leading to a complete system collapse Li & Tse (2024); Zhang et al. (2023); Li et al. (2024). Such events pose a significant threat to the security of power grids and can result in severe social and economic consequences.
Cascading failures can lead to devastating outcomes Guo et al. (2017); Salehpour & Al-Anbagi
(2024). For example, on June 19, 2024, at approximately 15:17, Ecuador experienced a nationwide
blackout, resulting in a collapse of the nation’s power grid, affecting around 18 million people, with
the power outage lasting for approximately 3 hours. The direct cause of this incident was the failure
of the Milagro-Zhoray transmission line, which triggered a series of cascading failures, ultimately
resulting in a widespread outage. This severe outage underscores the importance of developing fast
cascading failure mitigation strategies for complex power grids.
In the power system area, cascading failure mitigation refers to a series of control actions to prevent the chain reaction after the first fault (e.g., one-line tripping), thereby avoiding system-wide blackouts. In recent years, numerous studies have emerged in this field. For example, Guo et al. (2024) proposed a method combining transient stability analysis with interaction graphs to identify critical lines and mitigate cascading failures by reducing the fault probability of components on these critical lines. Li et al. (2023) applied network flow theory to study the process of power flow redistribution and proposed a cascading failure mitigation strategy based on adaptive power balance recovery and selective edge protection. Inspired by the propagation patterns of faults, Bhaila & Wu (2024) employed graph neural networks (GNNs) to model and analyze cascading failures in power grids using an end-to-end approach. Liu et al. (2024), on the other hand, utilized an improved percolation theory to analyze the survivability of nodes in power grids and proposed an effective mitigation strategy.

∗Corresponding author.


Figure 1: An example of a multi-stage cascading failure.

In this paper, a deep reinforcement learning (DRL) approach is developed for mitigating multi-stage cascading failures (MSCF) in power systems, with the following contributions: (1) a simulation environment for multi-stage cascading failure study is constructed; (2) the Deep Deterministic Policy Gradient (DDPG) algorithm is adopted to address the MSCF issue; (3) the proposed model is validated on the IEEE 14-bus and 118-bus systems, demonstrating its effectiveness.

2 METHODOLOGY

2.1 MULTI-STAGE CASCADING FAILURE (MSCF) PROBLEM

Traditionally, single-stage cascading failure problems have been well studied Qi et al. (2017). However, in certain situations, multiple stages may occur Zhu (2021). For example, Fig. 1 depicts a multi-stage cascading failure example: an earthquake causes the loss of power line 4-5, triggering the first stage of cascading failures (lines 2-4 and 4-9 are subsequently tripped due to the over-limit line power flow after the loss of line 4-5). Suppose the (remaining) power grid does not collapse and enters a steady state. Then, after a short period, an aftershock may break another line, triggering another stage of cascading failures.
One approach to handling the MSCF problem is to decompose it into multiple single-stage sub-problems and solve them one by one. However, this might overlook the interdependence between stages. On the other hand, if we map the concept of "each stage" to the concept of "each step" in the RL context, then the MSCF problem can be investigated holistically under various mature frameworks of reinforcement learning, which is the motivation of this paper.
In this paper, the DDPG algorithm and the Actor-Critic framework are utilized Lillicrap et al. (2019),
Mnih et al. (2016). The output of DDPG can be deterministic and real-number valued; hence, it
performs well in solving problems with continuous actions Wang & Vittal (2023).

2.2 ENVIRONMENT IMPLEMENTATION

In our work, a simulation environment is developed for MSCF mitigation using Python and Matpower, a well-known MATLAB toolbox for AC power flow (ACPF) computation. Cross-tool interaction and data communication between Python and MATLAB are achieved via a Python-MATLAB handler. Several key designs of this environment are described below.

2.2.1 DEFINITIONS OF STEP AND EPISODE


Step: a step corresponds to a stage in which the power grid is attacked (e.g., by a natural disaster), causing the grid to evolve into a new state (i.e., how many buses (nodes) and lines (edges) are still available, how many islands are formed, how large the power flow is on each remaining line, etc.).
Episode: an episode is one specific sequence of steps during which the power grid is consecutively attacked. At the end of each episode, the final status is either "Win" or "Lose" (cf. definitions in later sections).
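The step/episode loop above can be sketched as a toy environment. This is a simplified illustration, not the authors' implementation: the class and method names are hypothetical, and the attack, cascade simulation, and win/lose criteria are heavily simplified stand-ins for the ACPF-based logic described in this section.

```python
import random

class ToyMSCFEnv:
    """Toy sketch of the step/episode structure: each step, an 'attack'
    removes one random line; the episode ends after stage_max steps
    ('Win') or when too few lines survive ('Lose')."""

    def __init__(self, n_lines=20, stage_max=3, min_lines=10, seed=0):
        self.n_lines = n_lines
        self.stage_max = stage_max
        self.min_lines = min_lines
        self.rng = random.Random(seed)

    def reset(self):
        self.lines = set(range(self.n_lines))
        self.stage = 0
        return len(self.lines)              # toy state: surviving line count

    def step(self, action):
        # A real step would first apply `action` (generation coefficients),
        # then simulate the attack and the resulting cascade via ACPF.
        self.lines.discard(self.rng.choice(sorted(self.lines)))
        self.stage += 1
        lose = len(self.lines) < self.min_lines
        done = lose or self.stage >= self.stage_max
        reward = -1.0 if lose else 1.0
        return len(self.lines), reward, done, {"win": done and not lose}
```

In the actual environment, the state, action, and reward follow the designs of Sections 2.2.2 to 2.2.5 rather than these toy placeholders.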

2.2.2 STATE DESIGN


For an n-bus power grid, our state is defined as follows:
state = [line_status, P1, Q1, V1, θ1, ..., Pn, Qn, Vn, θn]


Figure 2: Island Availability assessment.

where line_status is the vector of percentage values obtained by dividing each line's actual power flow by its maximum limit, and Pi, Qi, Vi, θi (i = 1, ..., n) denote the active power injection, reactive power injection, voltage magnitude, and voltage angle of the i-th bus, respectively.
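Assembling this state vector can be sketched as follows; the function and argument names are illustrative assumptions, not the authors' code.

```python
def build_state(flows, limits, P, Q, V, theta):
    """Build the state vector [line_status, P1, Q1, V1, th1, ..., Pn, Qn, Vn, thn].
    flows/limits are per-line actual power flows and maximum limits;
    P, Q, V, theta are per-bus injections, voltage magnitudes, and angles."""
    # line_status: each line's actual flow as a fraction of its limit
    line_status = [f / lim for f, lim in zip(flows, limits)]
    # bus quantities interleaved as P1, Q1, V1, th1, ..., Pn, Qn, Vn, thn
    bus_part = [x for bus in zip(P, Q, V, theta) for x in bus]
    return line_status + bus_part
```

For an n-bus grid with L lines, the resulting vector has L + 4n entries.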

2.2.3 ACTION DESIGN


Cascading failures might be mitigated by adjusting the generators' power outputs. Thus, the generation coefficients [a1, ..., am] of all m generators are taken as the action. The power
output of the i-th generator is the product of ai and its power capacity (i.e., the maximum power).
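This mapping from action to generator set-points can be sketched with a hypothetical helper; clipping the coefficients to [0, 1] is an added assumption here, used to keep outputs within physical limits.

```python
def apply_action(coeffs, capacities):
    """Map the action [a1, ..., am] to generator outputs:
    output_i = a_i * capacity_i, with a_i clipped to [0, 1] (assumption)."""
    return [max(0.0, min(1.0, a)) * cap for a, cap in zip(coeffs, capacities)]
```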

2.2.4 ISLAND DETECTION AND AVAILABILITY ASSESSMENT


The grid can become disconnected when lines are lost (due to an incident or line overload). Therefore, the first step is to assess the connectivity of the grid. To that end, we employ the union-find algorithm (cf. Appendix A.1) to locate all the remaining islands.
The "availability" of an island means whether it is still alive at the end of a specific cascading failure stage; if not, it will be discarded in later stages. The availability assessment is carried out after island detection. The criteria for island availability are described in Fig. 2. Max_Gen_Total and Gen_Total are, respectively, the total power capacity and the total actual power output of all the remaining generators in a specific island, and Load_Total is the total load demand in that island.
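One plausible reading of these criteria can be sketched as below. The exact rules live in Fig. 2, so the specific condition used here (power flow converges and total generation capacity covers the load) is an assumption, not the authors' exact logic.

```python
def island_available(max_gen_total, load_total, pf_converged):
    """Assumed availability criterion: the island survives the stage if
    its power flow converged and its total generation capacity
    (Max_Gen_Total) can cover its total load demand (Load_Total)."""
    return pf_converged and max_gen_total >= load_total
```

Islands flagged unavailable would then be discarded from the simulation in later stages, as described above.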

2.2.5 REWARD DESIGN


• Total cost of generation: −c1 · cost. Here, c1 is a hyperparameter and cost is the total generation cost ($) of all islands whose availability is true.
• Loss-of-load penalty: −BaseReward1 · Ploss/Ptotal. Ploss is the total load on unavailable islands at the current stage, while Ptotal represents the original total load of the initial power grid.
• Convergence reward: BaseReward2. This reward is given when half or more of the currently remaining islands have converged.
• Win reward: BaseReward3 · (Pavailable/Ptotal)^c2. This reward is given when the win conditions are met; Pavailable is the total load of the available islands.
Here, c1, c2, BaseReward1, BaseReward2, and BaseReward3 are constants specific to a given power grid. A basic idea in picking these constants is to keep the four parts above on the same order of magnitude. Finally, the overall workflow for the RL-based MSCF study is shown in Fig. 3(a).
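The four terms can be combined into a per-stage reward roughly as follows. This is a sketch using the IEEE 14-bus constants from Table 2 as defaults; summing the terms is an assumption about how they combine.

```python
def stage_reward(cost, p_loss, p_total, converged_ratio, win, p_available,
                 c1=0.03, c2=1.7, base1=2000.0, base2=1000.0, base3=2000.0):
    """Assumed combination of the four reward terms described above."""
    r = -c1 * cost                              # total cost of generation
    r -= base1 * p_loss / p_total               # loss-of-load penalty
    if converged_ratio >= 0.5:                  # convergence reward
        r += base2
    if win:                                     # win reward
        r += base3 * (p_available / p_total) ** c2
    return r
```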

3 EXPERIMENTS AND RESULTS


The proposed approach is tested on the IEEE 14-bus and modified IEEE 118-bus systems. The IEEE
14-bus system has 5 generators and 20 lines, with its topology shown in Fig. 3(b). For other details
about the experiment settings and hyperparameters, please refer to Appendix A.5.
For each power grid, a DRL model is trained for 300 episodes. After training, the model interacts with the environment for an additional 1000 episodes, during which the total reward of each episode is recorded and the final win rate is computed.



Figure 3: (a) The overall workflow of grid simulation for MSCF study; (b) The IEEE 14-bus system.

Table 1: Win rate comparison.

Method       IEEE 14-bus System   IEEE 118-bus System
DDPG         95.5%                97.8%
Baseline 1   52.0%                51.7%
Baseline 2   93.3%                8.40%
Baseline 3   85.6%                97.0%

The model is compared with three baseline strategies, as shown in Table 1 and Fig. 4. In Baseline 1, each generator outputs a random power. In Baseline 2, all generators output their maximum power. In Baseline 3, all generators operate at half of their maximum power output. It can be observed that the DRL agent achieves good performance, with the highest win rate, large average rewards, and more stable behavior.

Figure 4: The moving-average reward comparison.


4 CONCLUSION

This paper implements and validates a DRL-based solution for multi-stage cascading failure mitigation. One limitation of the current solution is that the differences between states are relatively small, causing the majority of the model's actions to be similar. In future work, we will explore other state designs to improve action variability.

REFERENCES
Karuna Bhaila and Xintao Wu. Cascading failure prediction in power grid using node and edge
attributed graph neural networks. In 2024 IEEE Green Technologies Conference (GreenTech), pp.
155–156, 2024. doi: 10.1109/GreenTech58819.2024.10520535.
Changsheng Chen, Wenyun Ju, Kai Sun, and Shiying Ma. Mitigation of cascading outages using
a dynamic interaction graph-based optimal power flow model. IEEE Access, 7:168637–168648,
2019. doi: 10.1109/ACCESS.2019.2953774.
Hengdao Guo, Ciyan Zheng, Herbert Ho-Ching Iu, and Tyrone Lucius Fernando. A critical review of cascading failure analysis and modeling of power system. Renewable & Sustainable Energy Reviews, 80:9–22, 2017. URL [Link] 114562742.
Zhenping Guo, Xiaowen Su, Kai Sun, and Srdjan Simunovic. Analysis and mitigation of cascading
outages using an interaction graph addressing transient stability. In 2024 IEEE Power & Energy
Society General Meeting (PESGM), pp. 1–5, 2024. doi: 10.1109/PESGM51994.2024.10689239.
Jamir Shariar Jyoti and Majeed M. Hayat. Topological attributes of cascading failures in power
grids. In 2023 IEEE Power & Energy Society General Meeting (PESGM), pp. 1–5, 2023. doi:
10.1109/PESGM52003.2023.10252476.
Biwei Li, Dong Liu, Junyuan Fang, Xi Zhang, and Chi K. Tse. Strengthening critical power network
branches for cascading failure mitigation. In 2024 IEEE International Symposium on Circuits and
Systems (ISCAS), pp. 1–5, 2024. doi: 10.1109/ISCAS58744.2024.10558306.
Meixuan Jade Li and Chi K. Tse. Quantification of cascading failure propagation in power systems.
IEEE Transactions on Circuits and Systems I: Regular Papers, 71(8):3717–3725, 2024. doi:
10.1109/TCSI.2024.3383450.
Meixuan Jade Li, Chi Kong Tse, Dong Liu, and Xi Zhang. Cascading failure propagation and
mitigation strategies in power systems. IEEE Systems Journal, 17(2):3282–3293, 2023. doi:
10.1109/JSYST.2023.3248044.
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa,
David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning, 2019.
URL [Link]
Xinyu Liu, Yan Li, and Tianqi Xu. Cascading failure model of cyber-physical power systems considering overloaded edges. In 2024 IEEE 2nd International Conference on Power Science and Technology (ICPST), pp. 982–987, 2024. doi: 10.1109/ICPST61417.2024.10601896.
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim
Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement
learning. CoRR, abs/1602.01783, 2016. URL [Link]
Junjian Qi, Wenyun Ju, and Kai Sun. Estimating the propagation of interdependent cascading outages with multi-type branching processes. IEEE Transactions on Power Systems, 32(2):1212–1223, 2017. doi: 10.1109/TPWRS.2016.2577633.
Ali Salehpour and Irfan Al-Anbagi. Resp: A real-time early stage prediction mechanism for cas-
cading failures in smart grid systems. IEEE Systems Journal, 18(3):1593–1604, 2024. doi:
10.1109/JSYST.2024.3420950.


Nathalie Uwamahoro and Sara Eftekharnejad. A comparative study of data-driven power grid cascading failure prediction methods. In 2023 North American Power Symposium (NAPS), pp. 1–6, 2023. doi: 10.1109/NAPS58826.2023.10318537.

Yuling Wang and Vijay Vittal. Real-time excitation control-based voltage regulation using ddpg
considering system dynamic performance. IEEE Open Access Journal of Power and Energy, 10:
643–653, 2023. doi: 10.1109/OAJPE.2023.3331884.

Xinzhe Zhang, Wenping Qin, Xiang Jing, Jiaxin Liu, Xiaoqing Han, and Peng Wang. Power
system resilience assessment considering the occurrence of cascading failures. In 2023 Inter-
national Conference on Power System Technology (PowerCon), pp. 1–5, 2023. doi: 10.1109/
PowerCon58120.2023.10331511.

Yongli Zhu. Power grid cascading failure mitigation by reinforcement learning. In ICML 2021
Workshop on Tackling Climate Change with Machine Learning, 2021. URL [Link]
[Link]/papers/icml2021/30.

A APPENDIX

A.1 THE UNION-FIND ALGORITHM FOR POWER GRID ISLAND DETECTION

The union-find algorithm is a data structure used to handle dynamic connectivity problems. Its basic idea is to determine whether elements belong to the same set by following parent pointers to a common root, merging sets when necessary. Based on the results of island detection, the original grid may need to be divided into multiple islands, which provides the basis for later evaluation of the system status.

Algorithm 1: Island Detection

Input: A power grid G with bus set N and line set E
Output: Islands I
1: Initialize an array p such that p[n] ← n for all n
2: for (u, v) in E do
3:     Union(u, v, p) to merge their sets
4: end
5: for n in N do
6:     Find(n) to determine the root
7: end
8: Group all buses by their root into disjoint sets I
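Algorithm 1 can be rendered compactly in Python as below; path compression inside find is an added optimization not stated in the pseudocode.

```python
def detect_islands(buses, lines):
    """Group buses into islands (connected components) via union-find."""
    parent = {n: n for n in buses}          # step 1: p[n] <- n

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path compression (halving)
            n = parent[n]
        return n

    for u, v in lines:                      # steps 2-4: union over all lines
        parent[find(u)] = find(v)

    islands = {}                            # steps 5-8: group buses by root
    for n in buses:
        islands.setdefault(find(n), set()).add(n)
    return list(islands.values())
```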

A.2 THE TOPOLOGY OF THE IEEE 118-BUS SYSTEM

The topology of the IEEE 118-bus system is shown in Fig. 5.

A.3 REWARD COMPARISON

The reward comparison is shown in Fig. 6.

A.4 CROSS-TOOL INTERACTION

The process of cross-tool interaction is shown in Fig. 7.

A.5 EXPERIMENT SETTINGS AND HYPERPARAMETERS

The experiments are carried out on a computer with an Intel Core i5-12400F CPU, 32 GB RAM, and a GeForce RTX 4060 Ti GPU. The development environment uses Python 3.11, PyTorch 2.3.1, and MATPOWER 8.0.


Figure 5: The topology of the IEEE 118-bus system.

Figure 6: The reward comparison.

Figure 7: The process of cross-tool interaction.


The IEEE 118-bus system contains 54 generators and 179 lines. Its topology is shown in Fig. 5. The environment parameters for both the 14-bus and 118-bus systems are summarized in Table 2. stage_max represents the maximum number of stages in the MSCF problem, and line_limit refers to the maximum allowed power flow on the lines.

Table 2: Environment parameters.

Parameter      IEEE 14-bus System   IEEE 118-bus System
stage_max      3                    3
line_limit     200                  450
c1             0.03                 0.005
c2             1.7                  1.7
BaseReward1    2000                 2000
BaseReward2    1000                 1000
BaseReward3    2000                 2000

Table 3: Model parameters.

Parameter             Value
learning rate         1 × 10⁻⁴
batch size            128
discount factor (γ)   0.99
update rate (τ)       0.001

The model is trained using the DDPG algorithm with the parameters shown in Table 3. Depending on the complexity of a given power grid, the number of hidden-layer neurons can be adjusted and tuned for the best performance.
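As one concrete illustration of the Table 3 settings, DDPG's soft target update θ' ← τθ + (1 − τ)θ' with τ = 0.001 can be sketched on plain parameter lists; in PyTorch the same loop runs over model.parameters().

```python
def soft_update(target_params, source_params, tau=0.001):
    """Soft target update used by DDPG: theta' <- tau*theta + (1-tau)*theta'.
    Shown on flat lists of scalars for clarity (an illustrative sketch)."""
    return [tau * s + (1.0 - tau) * t
            for s, t in zip(source_params, target_params)]
```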
