An Incremental High Impedance Fault Detection - 2024 - International Journal of
ARTICLE INFO

Keywords:
High impedance fault
Incremental learning
Data replay
Distribution network

ABSTRACT

In the non-stationary environments of distribution networks, where operating conditions continually evolve, maintaining reliable high impedance fault (HIF) detection is a significant challenge because environmental variations frequently change the data distribution. In this paper, we propose a novel HIF detection method based on incremental learning to handle non-stationary data streams with changing distributions. The proposed method uses the stationary wavelet transform (SWT) to extract fault characteristics in different frequency bands from zero-sequence current data. A backpropagation neural network (BPNN) then establishes a complex mapping from signal features to operating conditions to achieve online HIF detection. Additionally, the signal features are analyzed using density-based spatial clustering of applications with noise (DBSCAN) to monitor the distribution of the data. After multiple distribution changes are encountered, an incremental learning process based on data replay is initiated to evolve the BPNN model so that it adapts to the changing data distribution. Notably, the data replay mechanism ensures that the model retains previously acquired knowledge while learning from newly encountered data distributions. The proposed method was implemented in a prototype of a purpose-designed edge intelligent terminal and validated using data from a 10 kV testing system. The experimental results indicate that the proposed method can identify and learn new distribution information within a non-stationary data stream. This enables the classifier model to maintain high detection accuracy on the current cycle data, effectively enhancing the reliability of HIF detection.
* Corresponding author.
E-mail address: [email protected] (J.-H. Gao).
https://doi.org/10.1016/j.ijepes.2023.109705
Received 19 June 2023; Received in revised form 29 September 2023; Accepted 4 December 2023
Available online 21 December 2023
0142-0615/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
M.-F. Guo et al. International Journal of Electrical Power and Energy Systems 156 (2024) 109705

1. Introduction

High impedance faults (HIFs) are common events in distribution networks [1], typically caused by contact between line conductors and utility poles, the ground, or tree branches [2]. The transition resistance of the grounding medium in an HIF often reaches several hundred or even thousands of ohms, resulting in weak fault signals that can fail to reach the pickup threshold of traditional protection devices [3,4]. Moreover, HIFs are frequently accompanied by intermittent arc discharges, random movements of grounding conductors [5], and nonlinear distortions, among other unstable characteristics. These factors significantly complicate HIF detection [6]. Failure to promptly detect and clear an HIF increases the likelihood of fires and personal injury accidents because of the frequent accompanying arc discharges [7,8]. Consequently, there is an urgent demand for robust and accurate technical solutions to detect HIFs [9].

With advances in computing and information technology, data-driven artificial intelligence (AI) has been widely developed for fault detection and shows excellent application prospects [10]. Several studies have applied AI techniques to HIF detection. Lala et al. [11] employed variational mode decomposition (VMD) to obtain IMF1 of the faulted-phase current, extracted the rising trend of IMF1 using singular value decomposition, and finally performed classification with support vector machines (SVM). Guo et al. [12] introduced a variational autoencoder to extract features from zero-sequence currents and used the obtained features to train a decision tree for fault detection. Gomes et al. [13] employed sparse coding to extract current and voltage features and trained a random forest for accurate HIF detection. Given sufficient training data, these methods can extract fault characteristics across various analysis domains and thereby establish a nonlinear mapping from signal features to operating conditions. Consequently, they can rapidly identify interfering factors and yield precise detection outcomes.

Existing AI-based algorithms typically use static intelligent classifiers, whose internal parameters are pre-constructed and non-adjustable. Such classifier models require complete data during the training phase. In practical scenarios, however, data arrive as a data stream, making it infeasible to obtain comprehensive training samples within a short timeframe. Additionally, environmental changes frequently alter the data distribution, resulting in non-stationary field data. When confronted with non-stationary data streams, static classifier models often suffer performance degradation. Therefore, an HIF detection method with continuous learning capability is needed, one that can both learn new information from non-stationary data streams [14] and adapt to changes in data distribution, thereby enhancing the reliability of HIF detection.

In this paper, an effective method based on incremental learning is proposed to tackle the performance degradation of AI-based HIF detection algorithms in non-stationary environments. Compared with conventional approaches, the proposed method can adjust the model dynamically through online incremental learning and can identify and acquire non-stationary information from the data stream, which makes it suited to complex distribution network environments. The method uses SWT to decompose zero-sequence current samples and extracts effective features through standard deviation calculation. A BPNN is used to classify these features. To enable the adaptive evolution of the classifier model, an evolutionary framework is used that incorporates data evaluation, model updating, and exemplar dataset updating. This framework enables the classifier model to adapt to changes in field scenarios, ensuring high detection accuracy for the current cycle data.

The fault detection method proposed in this paper monitors scenario changes within the field data stream and dynamically adapts to new scenarios by reorganizing the classifier model through incremental learning. This incremental detection method offers the following contributions:

1) The method exhibits a strong data screening capability. Its data evaluation mechanism can perceive scenario changes in the data stream and accurately select newly distributed data that differ from the information represented in the exemplar dataset.
2) The proposed method employs small-sample incremental training, enabling fast learning. It uses a lightweight AI classifier characterized by a simple structure and low computational requirements. The online incremental learning process uses a limited number of samples and starts from a pre-trained classifier model, which accelerates convergence.
3) The proposed method demonstrates strong resilience against catastrophic forgetting, which refers to the phenomenon where the model tends to forget the knowledge acquired from past data while …

2. Proposed method

Dynamic data distributions necessitate the continuous online evolution of model parameters [15–17]. In such cases, conventional machine learning models may fail to meet performance requirements because their internal parameters are predetermined and non-adjustable. The detection method proposed in this paper can adaptively adjust its internal parameters according to changing scenario characteristics, allowing the machine learning model to adapt to emerging fault patterns. Fig. 1 depicts the general flowchart of the detection method, which comprises three key elements:

1) a feature extraction method applied to zero-sequence current signals;
2) an AI-based classifier model;
3) an intelligent evolutionary method for acquiring knowledge of newly distributed data in the data stream.

The present study introduces an evolutionary approach for sustainable learning of the HIF detection model, as depicted in Fig. 3. The approach consists of four key components:

1) Baseline model: a pre-existing lightweight machine learning model is adopted for the HIF detection system. In this paper, the BPNN is selected as the underlying model because it is easy to deploy and implement.
2) Data evaluation: the data stream is assessed to identify data whose distribution has shifted, and a dataset for learning is constructed from them.
3) Model update: the model learns new distribution knowledge from the incoming data while revisiting the old distribution by replaying data from the exemplar dataset. This prevents the loss of previously acquired knowledge.
4) Exemplar dataset update: old and new data are merged to construct an updated exemplar dataset based on the distribution.

The specific workflow for data evaluation, model update, and exemplar dataset update is illustrated in Fig. 2.

2.1. Baseline model

The core of the AI-based HIF detection method is an intelligent classifier model. This paper proposes a sustainable-learning HIF detection approach intended for complex industrial environments and therefore employs a lightweight classifier model, the BPNN. SWT is used to extract fault characteristics in different frequency bands from the zero-sequence current data, after which the BPNN establishes a nonlinear mapping from signal features to operating conditions. The input layer of the BPNN comprises six neurons that pass the outputs of the feature extraction method to the hidden layers. Extensive experimentation showed that three hidden layers with 8, 18, and 6 neurons (8 × 18 × 6) yield the best detection performance on real-world HIF data. The
output layer consists of two neurons, representing HIF and non-HIF, respectively.

The BPNN model is first pre-trained on small-sample data to establish the initial classifier model with a structure of 6 × 8 × 18 × 6 × 2. The training data then undergo dimensionality reduction and clustering. Based on the clustering results, a subset of the data is selected to construct an exemplar dataset that accurately represents the information contained in the original training samples.

Training and testing were conducted using the PyTorch framework, with the hyperparameters detailed in Table 1. Selecting suitable hyperparameters is crucial for model performance; the values in Table 1 were determined by trial and error as those that gave the best performance of the proposed method on the dataset.

Table 1
Hyperparameter settings.
Learning rate | Training epoch | Optimizer | Batch size | Eps
0.001 | 100 | Adam | 25 | 5.24
Denoising threshold | Non-HIF quantity threshold | HIF quantity threshold | Confidence level | MinPts
0.1 | 5 | 5 | 80 % | 5
Model update: Training epoch | Learning rate | Optimizer | Batch size
10 | 0.0001 | Adam | 10

In practical distribution networks, data recorded by acquisition devices often contain noise. Noise interference can cause significant waveform variations and thereby reduce the accuracy of HIF detection. In the wavelet domain, noise typically manifests as small-amplitude, high-frequency components. Therefore, following the wavelet denoising principle, this study decomposes the signal with the wavelet transform and applies soft thresholding to suppress these high-frequency noise coefficients. The denoised wavelet coefficients are then used to reveal fault characteristics in the zero-sequence current data across different analysis domains. Soft thresholding is chosen because it balances noise reduction against the preservation of signal details, and the threshold is set to suppress noise while preserving as much of the useful signal as possible. To assess the algorithm's robustness to noise interference, various levels of noise were added to the waveform data; the test results are presented in Table 2.

Table 2
Noise immunity performance test results.
SNR (dB) | 40 | 20 | 15 | 10 | 5 | 1
Accuracy (%) | 98.35 | 99.21 | 98.50 | 98.23 | 98.63 | 97.84

As Table 2 shows, with wavelet denoising in the feature extraction stage, the detection accuracy stays between 97.84 % and 99.21 % for SNRs from 40 dB down to 1 dB. This stability enables the proposed algorithm to extract fault characteristics accurately and to perform robustly in the presence of noise interference.

Fig. 3. Schematic diagram of the HIF detection method based on small-sample incremental learning.

Table 3
Detection accuracy and computation time based on SWT feature extraction.
 | Haar | Db2 | Db4 | Coif1 | Sym2 | Sym4
Accuracy on simulated datasets (%) | 97.84 | 98.43 | 98.24 | 98.08 | 96.49 | 97.78
Processing time (ms) | 1.23 | 1.31 | 1.48 | 1.40 | 1.31 | 1.49
Accuracy on real-world datasets (%) | 85.37 | 86.79 | 83.03 | 83.86 | 82.85 | 82.86
Processing time (ms) | 1.22 | 1.31 | 1.48 | 1.40 | 1.32 | 1.49

Table 4
Detection accuracy and computation time based on Mallat feature extraction.
 | Haar | Db2 | Db4 | Coif1 | Sym2 | Sym4
Accuracy on simulated datasets (%) | 97.81 | 97.25 | 97.57 | 96.90 | 95.12 | 94.20
Processing time (ms) | 0.89 | 0.90 | 0.89 | 0.91 | 0.90 | 0.93
Accuracy on real-world datasets (%) | 81.35 | 80.76 | 82.27 | 77.59 | 80.35 | 80.76
Processing time (ms) | 0.89 | 0.89 | 0.90 | 0.91 | 0.91 | 0.93

2.2. Data evaluation

Given the complex and dynamic nature of power system scenarios in the field, the data exhibit a diverse, multi-level distribution. Attempting to learn all the data in the data stream would lead to substantial computational redundancy. Hence, a data evaluation mechanism is needed that assesses the data stream, identifies data exhibiting distribution changes, and constructs a dataset for learning. This mechanism maintains the detection system's performance while minimizing computational cost and improving overall system efficiency.

The data evaluation mechanism proposed in this paper comprises three components: feature extraction, data classification, and comprehensive evaluation.

1) feature extraction

In cases of HIFs and asymmetrical disturbances in distribution networks, transient fluctuations arise in the line's zero-sequence current. By analyzing these time-varying, nonlinear transient signals, classifier models can detect HIFs more effectively. The wavelet transform is a powerful tool for signal processing and analysis because it provides time and frequency information about a signal simultaneously. Wavelet analysis is based on scaling and translating the mother wavelet: scaling dilates or compresses the mother wavelet, and dilation yields the low-frequency components of the signal while compression yields the high-frequency components.
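The decomposition and soft-threshold denoising described in this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the orthonormal Haar filters instead of the Db2 base selected in the experiments, periodic boundary extension, one fixed soft threshold (0.1, the denoising threshold listed in Table 1) applied to the detail coefficients of every level, and the population standard deviation for the six features. The helpers `swt`, `soft_threshold`, and `swt_std_features` are hypothetical names.

```python
import math
import statistics

# Orthonormal Haar analysis filters. Assumption: the paper selects Db2 for the
# SWT; Haar is used here only because its two taps keep the sketch short.
H = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # low-pass h(m)
G = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # high-pass g(m)


def swt(signal, levels):
    """Undecimated (a trous) decomposition in the spirit of Eq. (8):
    c[j+1][k] = sum_m h(m) * c[j][k + 2**j * m], with periodic extension.
    Returns (approximation at the last level, [D1, D2, ..., D_levels])."""
    n = len(signal)
    c = [float(x) for x in signal]
    details = []
    for j in range(levels):
        step = 2 ** j  # taps spaced 2**j apart, i.e. up-sampled filters
        # Detail and approximation are both computed from the previous level c.
        d = [sum(G[m] * c[(k + step * m) % n] for m in range(len(G))) for k in range(n)]
        c = [sum(H[m] * c[(k + step * m) % n] for m in range(len(H))) for k in range(n)]
        details.append(d)
    return c, details


def soft_threshold(coeffs, thr):
    """Soft thresholding: shrink each coefficient toward zero by thr."""
    return [math.copysign(max(abs(x) - thr, 0.0), x) for x in coeffs]


def swt_std_features(signal, levels=5, thr=0.1):
    """Hypothetical helper: soft-threshold the detail coefficients, then return
    [std(D1), ..., std(D5), std(A5)] as the six-element feature vector."""
    approx, details = swt(signal, levels)
    details = [soft_threshold(d, thr) for d in details]
    return [statistics.pstdev(d) for d in details] + [statistics.pstdev(approx)]


if __name__ == "__main__":
    # An alternating signal puts all of its energy in the finest band (D1).
    print(swt_std_features([1.0, -1.0] * 16))
```

For the alternating test signal only the D1 feature is non-zero, which is a quick sanity check of the band assignment. Moving to a longer base such as Db2 would change only `H` and `G`; the à trous indexing works for any filter length.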
Fig. 4. Five-level decomposition with SWT.

For the detection algorithm embedded in the on-site microcomputer, both the accuracy of feature extraction and the computational efficiency of the algorithm are pivotal factors, and this study takes both into consideration. … proposed framework for HIF detection. The outcomes of these detection efforts, along with the corresponding computation times, are detailed in Tables 3 through 5. The test microcomputer is a Raspberry Pi 4B with a 1.5 GHz BCM2711 CPU and 8 GB of RAM. Because of the Raspberry Pi's limited memory and computational resources, longer wavelet bases may exceed its processing capabilities; consequently, this study tests several commonly used compact wavelet bases.

Based on Tables 3 to 5, irrespective of whether the data are simulated or real-world, the most effective signal processing method is SWT with the Db2 wavelet base (98.43 % accuracy on simulated data and 86.79 % on real-world data). In terms of computational efficiency, the Mallat transform with the Haar wavelet base is the fastest (0.89 ms). Conversely, the wavelet packet transform exhibits both lower accuracy and lower computational efficiency than SWT and Mallat. With the Db2 base, the feature extraction accuracy of SWT exceeds that of Mallat by 1.18 percentage points on simulated data (98.43 % vs. 97.25 %) and by 6.03 percentage points on real-world data (86.79 % vs. 80.76 %). It is worth noting that the pre… especially on edge microcomputers, addressing this concern can be accomplished through algorithmic optimization and hardware acceleration without compromising the accuracy of feature extraction. Consequently, provided that the computational efficiency requirements are met, this paper adopts SWT as the preferred tool for signal processing and analysis.

As reported in [18], longer wavelet bases introduce significant time delays when detecting HIF transients, whereas compact wavelet bases do not. When applied to signal processing, compact wavelet bases also offer a more dependable feature for HIF detection, because longer bases introduce more distortion components at the boundaries of the periodized signal at higher decomposition levels. Consequently, compact wavelet coefficients perform better in fast transient detection than their long-wavelet counterparts. However, it is worth noting that the Haar wavelet base presents …

… where ψ(x) is called the wavelet function. The scaling function ϕ(x) and the wavelet function ψ(x) can be expressed as convolutions with the low-pass and high-pass filters, as follows:

\frac{1}{2}\,\phi\left(\frac{x}{2}\right) = \sum_{n} h(n)\,\phi(x - n) \qquad (4)

The approximation coefficients at level j + 1, c_{j+1,k}, can be computed directly from those of the previous level, c_{j,k}, as follows:

c_{j+1,k} = \sum_{n} h(n - 2k)\,c_{j,n} \qquad (6)

Similarly, the detail coefficients at level j + 1 can be computed as follows:

d_{j+1,k} = \sum_{n} g(n - 2k)\,c_{j,n} \qquad (7)

Eqs. (4) and (7) are used for multi-resolution DWT analysis, in which the signal is down-sampled, and its length halved, at each step. In SWT, by contrast, no down-sampling is performed; instead, the filters are up-sampled before convolution (their taps are spaced 2^j samples apart when level j + 1 is computed), so the coefficients at every level keep the original signal length. Thus, in the SWT algorithm, the approximation and detail coefficients are obtained as given below:

c_{j+1,k} = \sum_{m} h(m)\,c_{j,k+2^{j}m} \qquad (8)

Here, h(⋅) and g(⋅) are the coefficients of the low-pass and high-pass filters, respectively.

To minimize redundancy in the original waveform or its transformed representations, feature extraction is required. The standard deviation, as an indicator of dispersion, reflects how strongly the signal varies within each frequency band. Therefore, the standard deviation is computed for the detail coefficients D1 to D5 and the approximation coefficients A5 of the five-level SWT decomposition (Fig. 4), achieving time-localized feature extraction and dimensionality reduction for the zero-sequence current.

2) data classification

After the above feature extraction, the data are reduced from the original high-dimensional time series to six wavelet standard deviation features. These features include components
A5 and D5, which represent low-frequency characteristics; components D4 and D3, which characterize mid-frequency features; and components D2 and D1, which represent high-frequency attributes.

The conclusions drawn from Shapley additive explanation (SHAP) attribution analysis in [21] regarding how a BPNN detects HIFs show that HIF waveforms contain abundant odd harmonics and high-frequency components. Consequently, the values of features D4, D3, D2, and D1 are, overall, positively correlated with their Shapley values, whereas the values of D5 and A5 are, overall, negatively correlated with their Shapley values. Based on these findings, this paper defines the data distribution characterized by the six wavelet standard deviation features as follows:

C = \left[\, M_1 \cdot D_1,\; M_2 \cdot D_2,\; M_3 \cdot D_3,\; M_4 \cdot D_4,\; M_5 \cdot D_5,\; M_6 \cdot A_5 \,\right] \qquad (10)

M_i = \frac{6 \times \mathrm{SHAP}_{D_i}}{\sum_{j=1}^{6} \mathrm{SHAP}_{D_j}} \qquad (11)

where SHAP_{D_i} is determined from experimental results reported in the literature (the sixth term, weighted by M_6, corresponds to A5).

To investigate the outcomes of feature extraction in diverse electrical scenarios, this study uses the 10 kV distribution network model built in PSCAD/EMTDC to acquire zero-sequence current data under various conditions. Scenario One consists of capacitor switching (CS) events and HIF events simulated with the Mayr arc model, while Scenario Two consists of low impedance fault (LIF) events and HIF events simulated with the Emanuel arc model. Fig. 5 shows the feature extraction and distribution-weighted visualization results for both scenarios. As Fig. 5 shows, certain data points in Scenario One exhibit a notable distribution shift relative to the data distribution in Scenario Two.

Fig. 5. Visualization of the features of the real-world dataset: (a) HIF low-frequency features, (b) HIF mid-frequency features, (c) HIF high-frequency features, (d) non-HIF low-frequency features, (e) non-HIF mid-frequency features, (f) non-HIF high-frequency features.

Based on the distribution characteristics of the zero-sequence current dataset after SWT feature extraction and SHAP distribution weighting, this study employs density-based spatial clustering of applications with noise (DBSCAN) to assess the distribution of new data in the data stream. The DBSCAN-based data distribution evaluation proceeds as follows:

i) Based on extensive experimental results, set the neighborhood parameters (eps, minPts), and input the exemplar dataset together with the feature-weighted vectors C of the new data samples in the data stream.
ii) Arbitrarily select a data object point p from the dataset.
iii) If p is a core point according to eps and minPts, identify and gather all data object points that are density-reachable from p, thereby forming a cluster.
iv) If p is not a core point (e.g., a border point), select the next data object point.
v) Repeat steps (ii) to (iv) until all points have been processed, and output the cluster labels of the data points.

After SWT feature extraction and SHAP distribution weighting, the high-dimensional raw zero-sequence current data are projected onto a six-dimensional space that characterizes the information in each frequency band. Different distributions form clusters of varying shapes in this space. Using density reachability, DBSCAN identifies data points in dense regions and groups them into clusters while detecting noise or boundary points in sparse regions, without constraints on cluster shape. New data samples are clustered together with the exemplar dataset using DBSCAN, and attention is focused on the cluster labels assigned to the new samples. When a new sample is labeled as noise or a boundary point, its density cannot reach that of the data in the exemplar dataset; in other words, it cannot form a cluster with the exemplar data in the six-dimensional space obtained after SWT feature extraction and SHAP distribution weighting. Its distribution therefore differs significantly from that of the old data, and the sample requires manual scrutiny and relearning. Conversely, when a new sample is assigned to a specific cluster, its density matches that of the exemplar data; its distribution is then considered close to that of the old data, and no further learning is needed.

3) comprehensive evaluation

The feature extraction and classification of the zero-sequence currents described above yield a data distribution evaluation result that indicates the dissimilarity between the data in the data stream and those in the exemplar dataset. In addition to this distribution evaluation result, this study uses the probability output of the HIF detection model's output layer to determine the model's confidence in classifying new data. For certain data points, the model's outputs fall near the boundary between fault and non-fault, so confidence in this subset of data is low and detections are more error-prone. To address this issue, the proposed method selects these data points by applying a suitable …
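The SHAP weighting of Eqs. (10) and (11) and the DBSCAN-based distribution check in steps (i) to (v) can be sketched in plain Python as below. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the SHAP inputs are assumed to be non-negative importance scores, Euclidean distance is assumed, and a new sample is flagged when it ends up as a noise or border (non-core) point, mirroring the wording above. Algorithm 1 uses separate radii for HIF and non-HIF data (eps1, eps2); the sketch uses a single eps for brevity.

```python
import math


def shap_weights(shap_values):
    """Eq. (11): M_i = 6 * SHAP_i / sum_j SHAP_j, so the six weights average 1.
    Assumption: inputs are non-negative importance scores (e.g. mean |SHAP|)."""
    total = sum(shap_values)
    return [len(shap_values) * s / total for s in shap_values]


def weight_features(features, weights):
    """Eq. (10): element-wise product of [D1, ..., D5, A5] and [M1, ..., M6]."""
    return [f * w for f, w in zip(features, weights)]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, core): labels[i] is a cluster id >= 0 or
    -1 for noise; core[i] tells whether point i has >= min_pts neighbours
    (itself included) within distance eps."""
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbours]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None or not core[i]:
            continue
        cluster += 1                    # start a new cluster from core point i
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:                    # gather everything density-reachable from i
            j = queue.pop()
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if core[j]:                 # only core points expand the cluster
                queue.extend(neighbours[j])
    return [-1 if lab is None else lab for lab in labels], core


def flag_shifted_samples(exemplar, new, eps, min_pts):
    """Hypothetical helper: cluster exemplar and new (weighted) samples together
    and return the indices of new samples that are noise or border points."""
    labels, core = dbscan(exemplar + new, eps, min_pts)
    offset = len(exemplar)
    return [i for i in range(len(new))
            if labels[offset + i] == -1 or not core[offset + i]]
```

With the settings in Table 1, this would be called with eps=5.24 and min_pts=5 on the weighted vectors C. Precomputing all neighbourhoods costs O(n²) distance evaluations, which is acceptable for the small exemplar sets this method targets; a spatial index would be the usual choice for larger sets.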
… data volume. Finally, datasets X1, X2, X3, and X4 are combined to construct the updated exemplar dataset, denoted as Xnew.

Algorithm 1.
Input:
Learning dataset Xlearn = {x1, x2, …, xm}; exemplar dataset Xold = {xm+1, xm+2, …, xm+n}
Minimum number of points required to form a cluster, minPts
HIF scanning radius eps1; non-HIF scanning radius eps2
Output:
New exemplar dataset Xnew
Steps:
(continued on next column)

3. Simulation and real-world data

3.1. Simulation model and real-world 10 kV distribution system

A 10 kV distribution network model was created in PSCAD/EMTDC, as depicted in Fig. 7. The system consists of six lines, a combination of overhead lines and cable lines. The lines are implemented using the Bergeron model, and their parameters are presented in Table 6. Each feeder is equipped with a zero-sequence current transformer at its starting point. The system operates in an overcompensated state with an overcompensation degree of 6 %.

Table 6
Line parameters.
Line type | Sequence component | Resistance (Ω/km) | Capacitance (μF/km) | Inductance (mH/km)
Cable lines | Zero | 2.7000 | 0.2800 | 1.0190
Cable lines | Positive | 0.2700 | 0.3390 | 0.2550
Overhead lines | Zero | 0.2300 | 0.0080 | 5.4780
Overhead lines | Positive | 0.1700 | 0.0097 | 1.2100

Fig. 8. Experimental scenarios: (a) branch grounding, (b) masonry grounding, (c) cable arc grounding, (d) grassland grounding.

A variety of fault conditions were simulated in a real-world 10 kV distribution system, considering different high impedance grounding media. The system consists of four feeders with simulated lengths of 9.7 km, 25 km, 17.35 km, and 4.9 km. The total capacitive current of the system is 41.4 A, with individual capacitive currents of 20 A, 3.4 A, 8 A, and 10 A for the four lines, respectively. The experiments focused on the 20 A line; after arc-suppression compensation, the residual current ranged from approximately 3 A to 4 A. The line has a total capacitance to
ground of 10.8 μF and is connected to parallel resistors of 300 kΩ and 6 kΩ. Fig. 8 illustrates the experimental scenarios involving arc grounding through tree branches, masonry, cables, grass, and sand.

3.2. Acquisition of simulation and real-world data

In the resonant earthed distribution network model depicted in Fig. 7, various HIF events and interference events occurring at different fault locations (FLs) and on different lines are implemented, as detailed in Table 7. HIF is simulated using a variable resistance Rarc, modeled by the Mayr, Cassie, Emanuel, and cybernetic models, in conjunction with a constant resistance Rc. These models simulate different grounding medium conditions by adjusting their parameters, as specified in Table 8. Specifically, the simulation covers CS using shunt 3-phase capacitors, inrush current (IC) simulated with a no-load single-phase transformer, LIF represented by a low impedance model (5 Ω–100 Ω), and load switching (LS) modeled using 3-phase asymmetrical loads. Additionally, the experiments incorporate asynchronous closing of the CS to simulate non-fault transient conditions, in line with practical engineering scenarios. In this study, 3-phase asynchronous closing means that phase A is connected to the system first, followed by phases B and C with the same delay. The initial fault angle is set to 0°, 30°, 60°, 90°, and 120°. To broaden the distribution of samples, this research also considers variations in the topology of the distribution system, namely the addition of line l7 and the removal of line l3.

Table 7
Parameters of simulated events.
Event type | Delay time (ms) | Simulated model | Number
HIF | - | Mayr, Cassie, Emanuel, and control theory models | 2560
CS | 0, 1, 2, 3 | 3-phase capacitor | 520
IC | - | No-load transformer | 520
LIF | - | Resistance | 520
LS | - | 3-phase asymmetric load | 520
Total | | | 4640
Topology: Add, Init, Delete. FL: l11, l12, l13, l21, l22, l31, l32, l33, l41, l42, l51, l52, l61, l62, l63.

Table 8
Parameters of simulated events.
Rarc model | Parameters
Mayr | Ploss (kW): 4.05–6.77 / 34.24–35.67; τm (μs): 0.30 / 0.27–0.32
Cassie | E0 (kV): 3.66–4.05 / 0.22–0.26; τc (μs): 1.05–1.35 / 230–260
Emanuel | Rn (Ω): 250–350, 450–550; Vn (kV): 3.91–4.63, 4.91–5.63; Rp (Ω): 250–350, 450–550; Vp (kV): 3.91–4.63, 4.05–4.77
Control theory | Lk (cm): 10–100; Ik (kA): 4; β: 2.85 × 10^−5; Vk (V/cm): 15
Rc (Ω): 800–3000

A total of 134 sets of HIF full-scale experiment data were collected from a real-world 10 kV distribution system. These experiments simulated various grounding scenarios, including pure resistance, tree branches, masonry, grass gravel, and cable arcing. In addition, 87 sets of HIF field data and 249 sets of non-HIF field data were collected from the field distribution system. HIF characteristics typically persist for approximately 8 to 10 cycles [12]. In this paper, the input sample length was set to 3 cycles, and each data group was divided into two non-overlapping samples representing the transient and steady-state periods. Fig. 9 displays the zero-sequence current waveforms and V-I arc characteristics of HIFs in both simulated and real-world scenarios. For HIFs based on the various simulation models and those occurring in full-scale experiments involving branches, grassland, and masonry, the V-I arc characteristics typically exhibit well-defined hysteresis loops, as illustrated in Fig. 9(a)–(d) and (f)–(h). These characteristics feature steep slopes near the origin and …

Fig. 9. Zero-sequence current waveforms and V-I arc characteristics of HIF in various scenarios: (a) Mayr, (b) Cassie, (c) Emanuel, (d) cybernetic, (e) resistance, (f) branch, (g) grassland, (h) masonry, (i) cable arc, (j) field HIF.
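The sample construction in Section 3.2 (3-cycle inputs, two non-overlapping samples per recording) reduces to simple index arithmetic. The sketch below is hypothetical: the sampling rate `fs`, the 50 Hz fundamental, the known inception index `onset`, and taking the steady-state window immediately after the transient window are all assumptions, since the text does not state how the windows are positioned.

```python
def split_record(record, fs, f0=50.0, cycles=3, onset=0):
    """Hypothetical helper: cut two non-overlapping `cycles`-cycle windows from
    one recording. Assumptions: the fault (or disturbance) inception index
    `onset` is known, the first window is the transient period, the window
    right after it is the steady-state period, and f0 = 50 Hz."""
    n = round(cycles * fs / f0)          # samples per window
    if onset < 0 or onset + 2 * n > len(record):
        raise ValueError("record too short for two non-overlapping windows")
    transient = record[onset:onset + n]
    steady = record[onset + n:onset + 2 * n]
    return transient, steady
```

For example, at fs = 10 kHz each 3-cycle window holds 3 × 10 000 / 50 = 600 samples, so a recording must contain at least 1200 samples after the onset.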
Table 9
Comparison of Time Consumption of Different Methods.
| Compared method | Feature extraction time | Detection time |
|---|---|---|
| (Lala H & Karmakar S, 2020) | 234.8 ms | 4.703 μs |
| (S. Kar & S. R. Samantaray, 2016) | 1.773 ms | 1.672 μs |
| (Gautam & Brahma, 2013) | 3.281 ms | 6.423 μs |
| (Gao et al., 2022) | 158.8 ms | 62.5 ms |

4. Experimental validation

To simulate the non-stationary data stream faced by the HIF detection system in engineering applications, the data samples in Table 10 are randomly partitioned into 8 groups. Each group's samples are further randomly partitioned into training, testing, and validation sets in a 7:2:1 ratio. The training set of the first group serves as the initial training set for the primitive classifier model, while the remaining seven training sets form the simulated data stream dataset. The initial training set covers only part of the overall sample distribution and is deliberately insufficient to cover all distribution information. This simulates field applications in which the initial training samples may be incomplete. The simulated data stream dataset mimics the data stream encountered by the model at different time intervals. The test set of each group comprises 20 % of that group's samples and is used to assess the model's performance on the current cycle of data. The validation set of each group helps the model select the best hyperparameter configuration for the current data volume.

Table 10
Simulation scenario composition.
| | Initial Scenario | Incremental Scenario One | Incremental Scenario Two |
|---|---|---|---|
| Composition (samples) | Mayr (640), Cassie (640), LS (520), IC (520) | Emanuel (640), LIF (520) | Cybernetic (640), CS (520) |
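The partition protocol described at the start of Section 4 has four parts:

- split the samples into 8 random groups;
- split each group 7:2:1 into training, test, and validation sets;
- use the first group's training set as the initial training set;
- treat the other seven training sets as the data stream.

It can be sketched as follows. The function name `make_data_stream`, the fixed seed, and the round-robin dealing of shuffled samples into groups are illustrative assumptions, not details given in the paper.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_data_stream(
    samples: Sequence[T],
    n_groups: int = 8,
    ratios: Tuple[float, float, float] = (0.7, 0.2, 0.1),  # train : test : val
    seed: int = 0,
) -> List[Dict[str, List[T]]]:
    """Randomly split samples into groups, then split each group 7:2:1."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Deal the shuffled samples round-robin into near-equal groups.
    groups = [shuffled[i::n_groups] for i in range(n_groups)]
    splits = []
    for g in groups:
        n_train = round(ratios[0] * len(g))
        n_test = round(ratios[1] * len(g))
        splits.append({
            "train": g[:n_train],
            "test": g[n_train:n_train + n_test],
            "val": g[n_train + n_test:],  # remainder (about 10 %)
        })
    return splits


splits = make_data_stream(list(range(800)))
initial_train = splits[0]["train"]                 # trains the primitive classifier
stream_batches = [s["train"] for s in splits[1:]]  # seven incremental batches
print(len(initial_train), len(stream_batches))     # 70 7
```

Because each group is already shuffled, taking contiguous slices inside a group is equivalent to a random 7:2:1 split.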
The process of the simulated data stream experiment is as follows. The model undergoes initial training on the initial training dataset, followed by incremental learning in batches using the simulated data
stream dataset. The comprehensive accuracy of the model on the test datasets of both the current and previous batches is recorded.

In this paper, we have used several methods from the literature [11,24,25,26] as benchmarks for a comprehensive comparison between the proposed approach and existing techniques. The first two methods use Machine Learning (ML) models for HIF detection, based on a Support Vector Machine (SVM) and a Decision Tree (DT), respectively. The latter two employ threshold-based approaches that emphasize signal analysis in the time, frequency, and time–frequency domains. Additionally, we have applied a direct incremental learning mode to the existing ML algorithms, in which the ML model is updated directly with new data. The results of the simulated data stream experiments are presented in Fig. 11.

As shown in Fig. 11, the detection accuracy of the proposed method remains consistently high across the different batches of incremental data. It significantly outperforms the other four methods in sustaining performance. This illustrates the ability of the proposed method to adapt to new distribution data through the incremental learning framework. The existing methods, by contrast, gradually degrade in detection performance as new distribution data arrive, especially those relying on ML models. Note that we fine-tuned the thresholds in [25] and [26] to achieve their best performance. However, threshold-based approaches depend on the normal system operating state and tend to produce errors when confronted with unfamiliar data distributions. This is a common limitation of algorithms lacking continuous learning capabilities. The method presented in this paper therefore aims to help ML models achieve lifelong learning.

The proposed method exhibits strong detection performance in the first three data batches. This indicates that the simple structure employed in this paper mitigates classifier overfitting when data are limited. However, a slight decline in detection accuracy is observed from the 3rd to the 5th incremental data batches. The dip is followed by a recovery and further improvement after the network structure is adjusted during learning of the 5th batch. This shift is attributed to the proposed structural adaptation technique for adjusting the model structure. As the distribution learned by the model widens and the dataset expands, the initial simple model becomes inadequate for capturing the increasing complexity of the data distribution. Expanding the model structure to accommodate this broader distribution range effectively circumvents the decline in detection performance caused by model underfitting.

Nevertheless, ML algorithms that are, in principle, capable of incremental learning yield subpar results when updated in the direct incremental learning mode. This is primarily attributed to "catastrophic forgetting," a challenge frequently encountered in incremental learning. The proposed method effectively mitigates this problem through data replay, thereby keeping the ML model's detection performance stable throughout the learning process.

To investigate the role of data replay in the above results, this paper divides the data samples in Table 5 into three scenarios according to the HIF models and non-HIF events used. The composition of these scenarios is presented in Table 10, with the numbers in parentheses indicating sample quantities. Each of the three scenario datasets is randomly partitioned into training and test sets at a ratio of 7:3. Evidently, the data distributions differ significantly across scenarios. The aim is to magnify these distribution differences in order to observe how the proposed method learns new data and reviews old data.

The flow of the scenario transformation experiments is as follows. The initial scenario training set is used to train the primitive classifier model under the proposed framework and under the direct learning mode, respectively. The training samples of incremental scenario one and incremental scenario two are then learned in turn. After each update, the model's performance on all three scenario test sets is recorded. The results of the scenario transformation experiments are shown in Fig. 12.

As evident from Fig. 12(b), the ML algorithm using the direct incremental learning mode is capable of learning new distribution data. However, it tends to forget the characteristics of the old distribution data while learning the new distribution data, so its ability to recognize data from the old distribution declines. In contrast, the proposed method constructs an example set containing previously learned information about the old distribution and retrains the model retrospectively through data replay. This ensures that the model's recognition of the old scenario is maintained while it learns the new scenario.

After the new distribution data have been learned, this portion of the data is converted into learned old distribution data through the example set updating technique. The proposed method retains its capability to
Fig. 12. Adaptation of models using different methods to adjust to simulation scenario changes.
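The replay mechanism described for Fig. 12(b) trains on each new batch together with a stored example set, then folds the new batch into that set. It can be sketched as below. This is a hypothetical illustration, not the authors' implementation, and it rests on the following assumptions:

- The classifier is abstracted as a `fit_fn` callback; in the paper's setting it would update the BPNN.
- The example-set budget `memory_size` is fixed.
- The example-set update uses reservoir sampling (Vitter's Algorithm R) as a stand-in, because this section does not give the paper's own selection rule.
- The helper `comprehensive_accuracy` pools accuracy over the current and previous test sets, which is an assumption about how they are aggregated.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], Hashable]  # (feature vector, label)


class ReplayLearner:
    """Incremental learning with data replay from a bounded example set (sketch)."""

    def __init__(self, fit_fn: Callable[[List[Sample]], None],
                 memory_size: int = 200, seed: int = 0) -> None:
        self.fit_fn = fit_fn              # hypothetical hook that updates the classifier
        self.memory_size = memory_size    # example-set budget (assumed fixed)
        self.rng = random.Random(seed)
        self.examples: List[Sample] = []  # stored old-distribution samples
        self.n_seen = 0                   # total samples offered to the memory so far

    def learn_batch(self, new_batch: Sequence[Sample]) -> None:
        # Replay: train on the new batch together with the stored examples,
        # so old distributions keep contributing to the update.
        self.fit_fn(list(new_batch) + self.examples)
        # Example-set update via reservoir sampling: the memory stays a uniform
        # random subset of every sample seen so far.
        for sample in new_batch:
            self.n_seen += 1
            if len(self.examples) < self.memory_size:
                self.examples.append(sample)
            else:
                j = self.rng.randrange(self.n_seen)  # uniform in [0, n_seen)
                if j < self.memory_size:
                    self.examples[j] = sample


def comprehensive_accuracy(predict: Callable[[Sequence[float]], Hashable],
                           test_sets: Sequence[Sequence[Sample]]) -> float:
    """Pooled accuracy over the current and all previous test sets."""
    correct = total = 0
    for test_set in test_sets:
        for x, y in test_set:
            correct += int(predict(x) == y)
            total += 1
    return correct / total if total else 0.0


# Usage with a stand-in fit_fn that only records training-set sizes.
sizes: List[int] = []
learner = ReplayLearner(fit_fn=lambda data: sizes.append(len(data)), memory_size=50)
for b in range(3):
    learner.learn_batch([((float(b), float(i)), i % 2) for i in range(100)])
print(sizes)  # [100, 150, 150]
```

In the direct incremental learning mode, `fit_fn` would receive only `new_batch`. The replayed examples are what keep earlier distributions present in every update.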
Fig. 14. Adaptation of models using different methods to adjust to real-world scenario changes.
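The scenario transformation experiments record the model's accuracy on every scenario test set learned so far, after each update. One common way to condense such a record, borrowed from the continual-learning literature rather than taken from this paper, uses two measures. The first is final average accuracy. The second is per-scenario forgetting, defined as the best earlier accuracy minus the final accuracy. The sketch below computes both. The function name and the toy accuracy values in the usage line are hypothetical and are not read from Fig. 12 or Fig. 14.

```python
from typing import List, Sequence, Tuple


def summarize_stages(acc: Sequence[Sequence[float]]) -> Tuple[float, List[float], float]:
    """acc[i][j] = accuracy on scenario j's test set after learning stage i (j <= i).

    Returns (final average accuracy, per-scenario forgetting, mean forgetting).
    """
    n = len(acc)
    for i, row in enumerate(acc):
        if len(row) != i + 1:
            raise ValueError(f"row {i} must hold {i + 1} accuracies")
    final = acc[-1]
    avg_acc = sum(final) / n
    # Forgetting of scenario j: best accuracy before the last stage minus final accuracy.
    forgetting = [max(acc[i][j] for i in range(j, n - 1)) - final[j] for j in range(n - 1)]
    mean_forgetting = sum(forgetting) / len(forgetting) if forgetting else 0.0
    return avg_acc, forgetting, mean_forgetting


# Toy values: initial scenario, then incremental scenarios one and two.
toy = [[1.0], [0.5, 1.0], [0.5, 0.5, 1.0]]
print(summarize_stages(toy))  # (0.6666666666666666, [0.5, 0.5], 0.5)
```

Under this summary, the behaviour described for the direct incremental learning mode in Fig. 12(b) would show up as large forgetting values for the older scenarios.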
Furthermore, it demonstrates a strong resistance to forgetting old data.

Declaration of Competing Interest
[24] Kar S, Samantaray SR. High impedance fault detection in microgrid using maximal overlapping discrete wavelet transform and decision tree. In: International Conference on Electrical Power & Energy Systems; 2016.
[25] Gautam S, Brahma SM. Detection of high impedance fault in power distribution systems using mathematical morphology. IEEE Trans Power Syst 2013;28(2):1226–34.
[26] Wang X, Wei X, Gao J, et al. High-impedance fault detection method based on stochastic resonance for a distribution network with strong background noise. IEEE Trans Power Delivery 2022;37(2):1004–16.