Electrical Power and Energy Systems 156 (2024) 109705

Contents lists available at ScienceDirect

International Journal of Electrical Power and Energy Systems

journal homepage: www.elsevier.com/locate/ijepes

An incremental high impedance fault detection method under non-stationary environments in distribution networks

Mou-Fa Guo a,d, Meitao Yao a,d, Jian-Hong Gao a,b,c,d,*, Wen-Li Liu a,d, Shuyue Lin b

a College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China
b School of Engineering, University of Hull, Hull HU67RX, UK
c Department of Electrical Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
d Engineering Research Center of Smart Distribution Grid Equipment, Fujian Province University, Fuzhou 350108, China

ARTICLE INFO

Keywords: High impedance fault; Incremental learning; Data replay; Distribution network

ABSTRACT

In the non-stationary environments of distribution networks, where operating conditions continually evolve, maintaining reliable high impedance fault (HIF) detection is a significant challenge due to the frequent changes in data distribution caused by environmental variations. In this paper, we propose a novel HIF detection method based on incremental learning to handle non-stationary data streams with changing distributions. The proposed method utilizes stationary wavelet transform (SWT) to extract fault characteristics in different frequency domains from zero-sequence current data. Subsequently, a complex mapping from signal features to operational conditions is established using a backpropagation neural network (BPNN) to achieve online detection of HIF. Additionally, signal features are analyzed using density-based spatial clustering of applications with noise (DBSCAN) to monitor the distribution of data. After encountering multiple distribution changes, an incremental learning process based on data replay is initiated to evolve the BPNN model for adapting to the changing data distribution. It is worth noting that the data replay mechanism ensures that the model retains previously acquired knowledge while learning from newly encountered data distributions. The proposed method was implemented in a prototype of a designed edge intelligent terminal and validated using data from a 10 kV testing system. The experimental results indicate that the proposed method is capable of identifying and learning new distribution data information within non-stationary data streams. This enables the classifier model to maintain a high level of detection accuracy for the current cycle data, effectively enhancing the reliability of HIF detection.

1. Introduction

High impedance faults (HIFs) are common events in distribution networks [1], typically caused by contact between line conductors and utility poles, the ground, or tree branches [2]. The transition resistance of the grounding medium in an HIF often reaches several hundred or even thousands of ohms, resulting in fault signals too weak to reach the activation threshold of traditional protection devices [3,4]. Moreover, HIF occurrences are frequently accompanied by intermittent arc discharges, random movements of grounding conductors [5], and nonlinear distortions, among other unstable characteristics. These factors significantly complicate the detection of HIFs [6]. Failure to promptly detect and eliminate an HIF can lead to an increased likelihood of fire outbreaks and personal injury accidents due to the frequent occurrence of accompanying arc discharges [7,8]. Consequently, there is an urgent demand for robust and accurate technical solutions to detect HIFs [9].

With the advances in computing sciences and information technology, emerging data-driven artificial intelligence (AI) has been well developed for fault detection, showing excellent application prospects [10]. Several studies have utilized AI techniques to address the issue of HIF detection. Lala et al. [11] employed variational mode decomposition (VMD) to acquire IMF1 of the fault phase current. Subsequently, the fault rising trend of IMF1 was extracted using singular value decomposition. Finally, classification was conducted utilizing support vector machines (SVM). Guo et al. [12] introduced a variational autoencoder to extract features from zero-sequence currents and used the obtained features to train a decision tree for fault detection. Gomes et al. [13] employed sparse coding to extract current and voltage features and trained a random forest for accurate detection of

* Corresponding author.
E-mail address: [email protected] (J.-H. Gao).

https://doi.org/10.1016/j.ijepes.2023.109705
Received 19 June 2023; Received in revised form 29 September 2023; Accepted 4 December 2023
Available online 21 December 2023
0142-0615/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
M.-F. Guo et al. International Journal of Electrical Power and Energy Systems 156 (2024) 109705

HIF. Based on sufficient data learning, these methods are capable of extracting fault characteristics across various analysis domains, thereby establishing a nonlinear mapping from signal features to operational conditions. Consequently, they enable rapid identification of interfering factors and yield precise detection outcomes.

Existing AI-based algorithms typically utilize static intelligent classifiers, where the internal parameters are pre-constructed and non-adjustable. Such classifier models demand complete data during the training phase. However, in practical scenarios, data arrives in the form of a data stream, making it unfeasible to obtain comprehensive training samples within a short timeframe. Additionally, due to environmental changes, the data distribution frequently undergoes alterations, resulting in non-stationary field data. When confronted with a non-stationary data stream, static classifier models often experience performance degradation. Therefore, it is necessary to investigate an HIF detection method with continuous learning capability, which can both learn new information from non-stationary data streams [14] and adapt to changes in data distribution, thereby enhancing the reliability of HIF detection.

In this paper, an effective method based on incremental learning is proposed to tackle the performance degradation of AI-based HIF detection algorithms in non-stationary environments. In comparison to conventional approaches, the proposed method can adjust the model dynamically through online incremental learning and can identify and acquire non-stationary information from the data stream, making it suited for complex distribution network environments. The method utilizes SWT to decompose zero-sequence current samples and extracts effective features through standard deviation calculation. A BPNN is used to classify these features. To facilitate the adaptive evolution of the classifier model, an evolutionary framework is utilized. This framework incorporates data evaluation, model updating, and exemplar dataset updating. It enables the classifier model to adapt to changes in field scenarios, ensuring high detection accuracy for the current cycle data.

The fault detection method proposed in this paper effectively monitors scenario changes within the field data stream and dynamically adapts to new scenarios by reorganizing the classifier model through incremental learning. This incremental detection method offers several contributions:

1) The method exhibits a strong data screening capability. Its data evaluation mechanism can perceive scenario changes in the data stream and accurately select new distributed data that differs from the representation information of the exemplar dataset.
2) The proposed method employs small-sample incremental training, enabling fast learning. It employs a lightweight AI classifier characterized by its simple structure and low computational requirements. The online incremental learning process utilizes a limited number of samples and is built upon a pre-trained classifier model, thereby leading to accelerated convergence.
3) The proposed method demonstrates strong resilience against catastrophic forgetting, which refers to the phenomenon where the model tends to forget the knowledge acquired from past data while learning new data. To address this challenge, the method constructs an exemplar dataset to store the information of previous scenario data and introduces a data replay mechanism. This ensures that the model can learn new distribution knowledge without forgetting the knowledge obtained from previous distributions.

The structure of this paper is as follows. Section 2 provides an in-depth introduction to the incremental HIF detection algorithm proposed in this study. In Section 3, we describe the experimental test setup and present the results obtained from the application of the proposed method. Finally, Section 4 summarizes the contributions of this paper and offers recommendations for further research.

Fig. 1. Flowchart of the proposed HIF detection method.

2. Proposed method

Dynamic data distributions necessitate the continuous online evolution of model parameters [15–17]. In such cases, conventional machine learning models may fail to meet performance requirements due to their predetermined and non-adjustable internal parameters. The proposed detection method in this paper is characterized by its ability to adaptively adjust internal parameters based on the changing scenario characteristics, thus facilitating the machine learning model’s adaptation to emerging fault patterns. Fig. 1 depicts the general flowchart of the detection method, which comprises three key elements:

1) Feature extraction method applied to zero-sequence current signals;
2) AI-based classifier models;
3) Intelligent evolutionary method for acquiring newly distributed knowledge in the data stream.

The present study introduces an evolutionary approach for sustainable learning of the HIF detection model, as depicted in Fig. 3. The approach consists of four key components:

1) Baseline model: This component refers to the adoption of a pre-existing lightweight machine learning model for the HIF detection system. In this paper, the BPNN is selected as the underlying model due to its ease of system deployment and implementation.
2) Data evaluation: This component involves assessing the data stream to identify new distribution data exhibiting shifts and constructing a dataset for learning.
3) Model update: The model learns new distribution knowledge from the incoming data while revisiting the old distribution by replaying the data from the exemplar dataset. This prevents the loss of previously acquired knowledge.
4) Exemplar dataset update: This component merges the old and new data to construct an updated exemplar dataset based on the distribution.

Among these, the specific workflow for data evaluation, model update, and exemplar dataset update is illustrated in Fig. 2.

2.1. Baseline model

The core of the AI-based HIF detection method is an intelligent classifier model. This paper proposes a sustainable learning HIF detection approach intended for complex industrial environments, thus employing one of the lightweight classifier models, BPNN. The fault characteristics in different frequency domains from zero-sequence current data are extracted using SWT in this paper, followed by the establishment of a nonlinear mapping from signal features to operational conditions using BPNN. The input layer of the BPNN comprises six neurons that connect the output results of the feature extraction method to the hidden layer. Through extensive experimentation, it has been determined that configuring the hidden layer of the BPNN as 8 × 18 × 6 yields superior detection performance on real-world HIF data. The


output layer consists of two neurons, representing HIF and non-HIF, respectively.

Fig. 2. Flowchart of the proposed HIF detection method.

The BPNN model is initially pre-trained using small-sample data to establish the initial classifier model with a structure of 6 × 8 × 18 × 6 × 2. The training data undergoes dimensionality reduction and clustering. Utilizing the results of the clustering analysis, a subset of data is selected to construct an exemplar dataset that accurately represents the information contained in the original training samples.

Training and testing were conducted using the PyTorch framework, with the specific hyperparameters detailed in Table 1. The selection of suitable hyperparameters is crucial for optimizing model performance. In our proposed method, the hyperparameters that deliver the best performance on the dataset were determined using a trial-and-error approach, and the values presented in Table 1 were chosen through this process.

In practical distribution networks, data recorded by acquisition devices often exhibit noise. Noise interference can lead to significant waveform variations, thereby indirectly affecting the accuracy of HIF detection. Within the wavelet domain, noise typically manifests as small-amplitude high-frequency components. Therefore, in line with the wavelet denoising principle, this study employs soft thresholding after decomposing the signal using wavelet transformation to suppress these high-frequency noise wavelet coefficients. The denoised wavelet coefficients are then utilized to reveal fault characteristics in zero-sequence current data across different analysis domains. Soft thresholding is chosen because it balances noise reduction against the preservation of signal details, and the threshold is set to minimize noise while maximizing the preservation of useful signals. To assess the algorithm’s robustness in the presence of noise interference, various levels of noise are introduced into the waveform data, and the test results are presented in Table 2.

As evident from Table 2, feature extraction employing the wavelet denoising algorithm enhances the detection stability of the approach. This enhancement enables the proposed algorithm to accurately extract fault characteristics and exhibit robust performance even in the presence of noise interference.

2.2. Data evaluation

Given the complex and dynamic nature of power scenarios in the field, the data presents a diverse and multi-level distribution. Attempting to learn all the data within the data stream would lead to substantial computational redundancy. Hence, it becomes crucial to employ a data evaluation mechanism that assesses the data stream, identifies data exhibiting distribution changes, and constructs a dataset for learning. This mechanism not only guarantees the detection system’s performance but also minimizes computational costs and enhances overall system efficiency.

The data evaluation mechanism proposed in this paper comprises three components: feature extraction, data classification, and comprehensive evaluation.

1) feature extraction

In cases of HIFs and asymmetrical disturbances in distribution


networks, transient fluctuations arise in the line’s zero-sequence current. By analyzing the time-varying, nonlinear transient signals, classifier models can enhance their capacity to detect HIF more effectively.

The wavelet transform stands out as a potent tool in signal processing and analysis. Essentially, it concurrently provides temporal and frequency information for a given signal. Utilizing wavelets for signal analysis is rooted in the principle of scaling and translating the signal through the mother wavelet. The mother wavelet undergoes expansion and compression via scaling operations to yield low and high-frequency signals, respectively.

Fig. 3. Schematic diagram of the HIF detection method based on small-sample incremental learning.

Table 1
Hyperparameters of the proposed method.

Learning rate | Training epoch | Optimizer | Batch size | Eps
0.001 | 100 | Adam | 25 | 5.24

Denoising threshold | Non-HIF quantity threshold | HIF quantity threshold | Confidence level | MinPts
0.1 | 5 | 5 | 80 % | 5

Model update | Training epoch | Learning rate | Optimizer | Batch size
 | 10 | 0.0001 | Adam | 10

Table 2
Noise immunity performance test results.

SNR (dB) | 40 | 20 | 15 | 10 | 5 | 1
Accuracy (%) | 98.35 | 99.21 | 98.50 | 98.23 | 98.63 | 97.84

Table 3
Detection accuracy and computation time based on SWT feature extraction.

Wavelet base | Haar | Db2 | Db4 | Coif1 | Sym2 | Sym4
Accuracy of simulated datasets (%) | 97.84 | 98.43 | 98.24 | 98.08 | 96.49 | 97.78
Processing time (ms) | 1.23 | 1.31 | 1.48 | 1.40 | 1.31 | 1.49
Accuracy of real-world datasets (%) | 85.37 | 86.79 | 83.03 | 83.86 | 82.85 | 82.86
Processing time (ms) | 1.22 | 1.31 | 1.48 | 1.40 | 1.32 | 1.49

Table 4
Detection accuracy and computation time based on Mallat feature extraction.

Wavelet base | Haar | Db2 | Db4 | Coif1 | Sym2 | Sym4
Accuracy of simulated datasets (%) | 97.81 | 97.25 | 97.57 | 96.90 | 95.12 | 94.20
Processing time (ms) | 0.89 | 0.90 | 0.89 | 0.91 | 0.90 | 0.93
Accuracy of real-world datasets (%) | 81.35 | 80.76 | 82.27 | 77.59 | 80.35 | 80.76
Processing time (ms) | 0.89 | 0.89 | 0.90 | 0.91 | 0.91 | 0.93
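As a minimal sketch of the soft-thresholding step described in Section 2.1, the following Python function shrinks wavelet coefficients toward zero. The function name `soft_threshold` and the sample coefficients are illustrative assumptions. The threshold value 0.1 is the denoising threshold listed in Table 1, but how the paper scales it against the coefficient amplitudes is not stated, so it is applied here directly.

```python
from typing import List


def soft_threshold(coeffs: List[float], threshold: float) -> List[float]:
    """Shrink each coefficient toward zero by `threshold`.

    Coefficients whose magnitude is at most `threshold` are set to zero;
    larger ones keep their sign and lose `threshold` in magnitude.
    """
    shrunk = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0.0:
            shrunk.append(0.0)  # treated as noise
        else:
            shrunk.append(magnitude if c > 0 else -magnitude)
    return shrunk


# Hypothetical detail coefficients; 0.1 is the denoising threshold from Table 1.
denoised = soft_threshold([0.05, -0.3, 0.5], 0.1)
```

Unlike hard thresholding, which keeps surviving coefficients unchanged, this rule is continuous at the threshold, which is one common reason for preferring it when signal details should be preserved.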


Regarding the detection algorithm embedded in the on-site microcomputer, the accuracy of feature extraction and the computational efficiency of the algorithm are both pivotal factors that demand attention. This study takes both factors into comprehensive consideration. Specifically, it employs the Stationary Wavelet Transform, Mallat transform, and Wavelet Packet Transform to analyze the zero-sequence currents. Subsequently, the analysis results are applied within the proposed framework for HIF detection. The outcomes of these detection efforts, along with the corresponding computation times, are detailed in Tables 3 through 5. The test microcomputer utilized is a Raspberry Pi 4B with a 1.5 GHz BCM2711 processor and 8 GB of RAM. Due to the constraints posed by the Raspberry Pi’s memory and computational resources, employing longer wavelet bases may surpass the processing capabilities of the Raspberry Pi. Consequently, this study opts for several commonly used compact wavelet bases during testing.

Table 5
Detection accuracy and computation time based on Wavelet Packet Transform feature extraction.

Wavelet base | Haar | Db2 | Db4 | Coif1 | Sym2 | Sym4
Accuracy of simulated datasets (%) | 90.15 | 91.37 | 92.96 | 91.91 | 89.19 | 88.74
Processing time (ms) | 1.77 | 1.78 | 1.75 | 1.79 | 1.78 | 1.81
Accuracy of real-world datasets (%) | 81.43 | 78.34 | 75.08 | 68.89 | 73.32 | 73.49
Processing time (ms) | 1.77 | 1.78 | 1.75 | 1.79 | 1.78 | 1.80

Based on the findings presented in Tables 3 to 5, it becomes evident that, irrespective of whether the data is simulated or real-world, the most accurate signal processing method is the SWT employing the Db2 wavelet base. For computational efficiency, the Mallat transform using the Haar wavelet base demonstrates the highest performance. Conversely, the Wavelet Packet Transform exhibits both lower accuracy and computational efficiency compared to SWT and Mallat. With the Db2 base, the accuracy of feature extraction achieved by SWT surpasses Mallat by 1.18 percentage points on simulated data (98.43 % versus 97.25 %) and by 6.03 percentage points on real-world data (86.79 % versus 80.76 %). It is worth noting that the precision of feature extraction directly impacts the algorithm’s overall performance and utility, as it furnishes a dependable foundation for decision-making. While computational efficiency holds significance, especially on edge microcomputers, addressing this concern can be accomplished through algorithmic optimization and hardware acceleration. This can be achieved while ensuring the accuracy of feature extraction. Consequently, under the condition that computational efficiency requirements are met, this paper employs SWT as the preferred tool for signal processing and analysis.

As referenced in [18], longer wavelet bases introduce significant time delays when detecting HIF transients, whereas compact wavelet bases do not incur such delays. Compact wavelet bases, when applied to signal processing, offer a more dependable feature for HIF detection due to the increased distortion components at higher periodicized signal boundaries. Consequently, compact wavelet coefficients exhibit superior performance in fast transient detection when compared to their long-wavelet counterparts. However, it is worth noting that the Haar wavelet base presents undesirable low-frequency oscillations, which complicate transient detection. Conversely, other compact wavelets derived from db2 do not exhibit such problematic low-frequency oscillations [19]. This aligns with the previously discussed test results; thus, in this paper, the db2 wavelet base is employed as the mother wavelet for the SWT.

Fig. 4. Five level decomposition with SWT.

SWT is a multi-scale analysis method that decomposes signals into sub-signals at different scales, suitable for analyzing time-varying nonlinear transient signals [20]. In this paper, a high sampling rate of 4000 Hz is used to capture high-frequency features in the zero-sequence current waveform. The zero-sequence current undergoes 5-level decomposition using SWT, as schematically depicted in Fig. 4. Based on the coefficients $c_{j,k}$, we can formulate the general transformation equation for SWT as follows.

$$c_{j,k} = \langle f(x), \phi_{j,k}(x) \rangle \tag{1}$$

$$\phi_{j,k}(x) = 2^{-j} \phi\left(2^{-j} x - k\right) \tag{2}$$

The discrete detail coefficients at the resolution $2^{j}$ are given by

$$d_{j,k} = \left\langle f(x), 2^{-j} \psi\left(2^{-j} x - k\right) \right\rangle \tag{3}$$

where $\psi(x)$ is called the wavelet function. The scaling function $\phi(x)$ and the wavelet function $\psi(x)$ can be expressed as convolutions with low-pass and high-pass filters, respectively, as follows.

$$\frac{1}{2} \phi\left(\frac{x}{2}\right) = \sum_{n} h(n) \phi(x - n) \tag{4}$$

$$\frac{1}{2} \psi\left(\frac{x}{2}\right) = \sum_{n} g(n) \phi(x - n) \tag{5}$$

The approximate coefficients at level $j+1$, $c_{j+1,k}$, can be directly computed from the previous step, i.e., $c_{j,k}$, as follows.

$$c_{j+1,k} = \sum_{n} h(n - 2k) c_{j,n} \tag{6}$$

Similarly, the detail coefficients at level $j+1$ can also be computed as follows.

$$d_{j+1,k} = \sum_{n} g(n - 2k) c_{j,n} \tag{7}$$

Eqs. (6) and (7) are used for multi-resolution DWT analysis, in which the signal is down-sampled in each step and the length of the signal is halved in each step. However, in SWT, instead of down-sampling, up-sampling of the signal is carried out before filter convolution. Thus, in the case of the SWT algorithm, the approximate and detail coefficients are obtained as given below.

$$c_{j+1,k} = \sum_{m} h(m) c_{j,k+2^{j} m} \tag{8}$$

$$d_{j+1,k} = \sum_{m} g(m) c_{j,k+2^{j} m} \tag{9}$$

Here, $h(\cdot)$ and $g(\cdot)$ are the coefficients of low-pass and high-pass filters, respectively.

To minimize redundancy in the original waveform or its transformed formats, feature extraction is required. The standard deviation, serving as an indicator of signal distribution dispersion, offers insights into the degree of variation in signal frequency distribution. Therefore, the standard deviation is computed for the detail coefficients of levels one to five (D1 to D5) and for the level-five approximation coefficients (A5) to achieve time-localized feature extraction and data dimensionality reduction for the zero-sequence current.

2) data classification

After the above feature extraction, the data undergoes dimensionality reduction from the original high-dimensional time series to six wavelet standard deviation features. These features include components


A5 and D5, representing low-frequency characteristics, components D4 and D3, characterizing mid-frequency features, and components D2 and D1, signifying high-frequency attributes.

The conclusions drawn from Shapley additive explanation (SHAP) attribution theory in [21] concerning the principles of BPNN detection of HIF reveal that HIF waveforms contain abundant odd harmonics and high-frequency components. Consequently, there exists an overall positive correlation between the values of features D4, D3, D2, and D1 and their Shapley values. Conversely, the values of D5 and A5 exhibit an overall negative correlation with their Shapley values. Based on these findings, this paper defines the data distribution characterized by the six wavelet standard deviation features as follows:

$$C = [M_1 \cdot D_1,\ M_2 \cdot D_2,\ M_3 \cdot D_3,\ M_4 \cdot D_4,\ M_5 \cdot D_5,\ M_6 \cdot A_5] \tag{10}$$

$$M_i = 6 \times \mathrm{SHAP}_{D_i} \Big/ \sum_{k=1}^{6} \mathrm{SHAP}_{D_k} \tag{11}$$

Among these, $\mathrm{SHAP}_{D_i}$ is determined based on experimental results from the literature.

To investigate the outcomes of feature extraction in diverse electrical scenarios, this study utilizes the 10 kV distribution network model constructed within the PSCAD/EMTDC software to acquire zero-sequence current data under various conditions. Scenario One is composed of capacitor switching (CS) events and HIF events simulated by the Mayr arc model, while Scenario Two comprises Low Impedance Fault (LIF) events and HIF events simulated by the Emanuel arc model. Fig. 5 illustrates the results of feature extraction and distribution-weighted visualization of the data for both scenarios. From the visual representation in Fig. 5, it becomes evident that certain data points in Scenario One exhibit a notable distribution shift compared to the data distribution observed in Scenario Two.

Fig. 5. Visualization for the features of the real-world dataset: (a) HIF low-frequency features (b) HIF mid-frequency features (c) HIF high-frequency features (d) non-HIF low-frequency features (e) non-HIF mid-frequency features (f) non-HIF high-frequency features.

Based on the data distribution characteristics of the zero-sequence current dataset following SWT feature extraction and SHAP distribution weighting, this study employs Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to assess the distribution of new data within the data stream. The specific process for data distribution evaluation based on DBSCAN is outlined as follows:

i) Based on extensive experimental results, set the neighborhood parameters (eps, minPts), and input the exemplar dataset data and the feature-weighted vector C of the new data samples in the data stream.
ii) Arbitrarily select a data object point p from the data set.
iii) If the chosen data object point p qualifies as a core point according to the parameters eps and minPts, proceed to identify and gather all data object points that are density-reachable from p, thereby forming a cluster.
iv) If the selected data object point p is an edge point, select the next data object point.
v) Repeat steps (ii) and (iii) until all points have been processed, and output the data point cluster labels.

The high-dimensional raw zero-sequence current data is projected onto a 6-dimensional space that characterizes information from each frequency band after undergoing SWT feature extraction and SHAP distribution weighting. Various distributions in this space form clusters of varying shapes. DBSCAN, utilizing density reachability, identifies data points within dense regions and groups them into clusters. Simultaneously, it detects noise or boundary points in sparse regions, free from the constraints of cluster shapes.

When DBSCAN is used to cluster new data samples together with the exemplar dataset, the focus lies on the cluster labels of the new data samples. When a cluster label is assigned as noise or a boundary point, it signifies that the new data samples’ density cannot reach that of the data in the exemplar dataset. In other words, these new data samples cannot form clusters with the data in the exemplar dataset within the six-dimensional space following SWT feature extraction and SHAP distribution weighting. This implies a significant difference in their distributions from the old data, necessitating manual scrutiny and relearning of such data.

Conversely, when a cluster label corresponds to a specific cluster, it indicates that the density of the new data samples matches that of the data in the exemplar dataset. In this case, their distribution is considered less different from the old data, and there is no need for further learning.

3) comprehensive evaluation

After performing the aforementioned feature extraction and classification of the zero-sequence currents, the data distribution evaluation result is obtained, which indicates the dissimilarity between the data in the data stream and the data in the exemplar dataset. In this study, in addition to relying on the distribution evaluation results, we also utilize the probability output from the HIF detection model’s output layer to determine the model’s confidence level in classifying new data. The model’s assessments for certain data points often fall near the boundary between faults and non-faults, resulting in low confidence in this subset of data and making detections more error-prone. To address this issue, the proposed method selects these data points by applying a suitable


replayed data to prevent the potential influence of dataset imbalance introduced by retrospective training. The pending-to-learn dataset comprises newly distributed data encountered by the model in the data stream and data with low confidence levels. The utilization of this data segment enables the model to acquire new knowledge, while the exemplar dataset represents the data distribution of previously trained samples, allowing the model to reinforce the old knowledge it has acquired and avoid “catastrophic forgetting” during its evolution.
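The replay idea in this paragraph can be sketched as follows, assuming samples are stored as (feature vector, label) pairs. The names `build_replay_set`, `replay_loss`, `predict` and `loss_fn` are hypothetical, and the toy model below is only a stand-in for the BPNN. The sketch shows one plausible reading: the training objective is accumulated over the pending-to-learn data together with the replayed exemplars, so old distributions keep contributing to the update.

```python
from typing import Callable, List, Sequence, Tuple

# (feature vector, label); the label encoding 1 = HIF, 0 = non-HIF is an assumption.
Sample = Tuple[Sequence[float], int]


def build_replay_set(pending: List[Sample], exemplar: List[Sample]) -> List[Sample]:
    """Union of newly selected data and replayed exemplar data."""
    return list(pending) + list(exemplar)


def replay_loss(
    predict: Callable[[Sequence[float]], float],
    loss_fn: Callable[[float, int], float],
    pending: List[Sample],
    exemplar: List[Sample],
) -> float:
    """Sum the per-sample loss over both the new and the replayed samples."""
    return sum(loss_fn(predict(x), y) for x, y in build_replay_set(pending, exemplar))


# Toy stand-ins: a "model" that sums its inputs and a squared-error loss.
toy_predict = lambda x: sum(x)
toy_loss = lambda p, y: (p - y) ** 2
total = replay_loss(
    toy_predict,
    toy_loss,
    pending=[([1.0, 0.0], 1)],
    exemplar=[([0.0, 0.0], 0), ([0.5, 0.5], 0)],
)
```

In a real training loop the summed loss would be minimized with an optimizer over mini-batches; the sketch leaves that out to keep the role of the exemplar data visible.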
During the learning of the nth distribution, the model can only access the current training dataset, $D_n$. However, updating the model solely with $D_n$ can lead to catastrophic forgetting. To address this issue, a method for preserving an additional exemplar dataset, denoted as $\mathcal{E} = \{(x_j, y_j)\}$, which contains a certain number of representative samples for each encountered distribution, has been proposed. Here, $x_j$ represents the data of exemplar dataset samples, and $y_j$ represents the labels of exemplar dataset samples. By retaining these samples, the model is better equipped to resist catastrophic forgetting during the process of learning new distributions. This approach mimics the way humans actively review previously acquired knowledge when acquiring new information, a concept referred to as data replay [22,23]. The aforementioned review process can be described as follows:

$$L = \sum_{(x_i, y_i) \in (D_n \cup \mathcal{E})} \ell(f(x_i), y_i) \tag{12}$$

where $L$ represents the overall training loss, $\ell(\cdot)$ represents the loss calculation function, and $f(x_i)$ represents the output of the model for sample $x_i$. Equation (12) indicates that when the model learns the new task $D_n$, it needs to consider the loss on the previous exemplar dataset of old distributions simultaneously, and optimize the model $f(\cdot)$ so that it possesses the detection capability for both the old and new distributions.

To prevent overfitting of the BPNN model, this paper employs a structurally adaptive approach to align the model’s complexity with the volume of training data. In scenarios with limited data, a simpler neural network structure is selected to reduce model complexity and mitigate the risk of overfitting. As the volume of data increases, the simplicity of the neural network hampers its learning capabilities, making it ineffective at acquiring complex feature representations, thereby constraining the model’s performance. Consequently, when the training data volume reaches a certain level, this paper expands the model’s structure by adding hidden layer neurons. To ensure that this structural expansion does not lead to performance degradation in the original data

Fig. 6. Schematic diagram of Exemplar dataset update.

confidence threshold. When the confidence level exceeds the predefined threshold, it indicates that the model’s detection of the current data is relatively reliable, eliminating the need for further learning. Conversely, if the confidence level falls below the set threshold, the model’s detection of the current data is considered insufficiently reliable, requiring manual examination and subsequent re-learning.

In this paper, we recognize that new data with shifted data distribution can contribute to the discovery of new detection patterns for the detection model. Consequently, the proposed data evaluation system emphasizes the determination of data distribution as the primary criterion, with the evaluation of the model’s confidence in the detection results serving as a supplementary criterion. The data evaluation mechanism categorizes the evaluation results of new data into three groups:

i) High confidence in the existing distribution: When the new data exhibits minor deviations from the distribution of the old data, and the model displays high confidence in the detection results of the new data, the information provided by the new data is redundant, and the model does not need to learn from this data.
ii) Low confidence in the existing distribution: When the new data
distribution task, the weights and biases associated with the added
shows slight deviations from the distribution of the old data, but
neurons are initially set to 0.01 % of those from the preceding neuron.
the model demonstrates low confidence in the detection results, it
indicates an insufficient understanding of this data by the model,
necessitating further learning. 2.4. Exemplar dataset update
iii) New distribution: The distribution of the new data has shifted,
indicating the presence of new feature information. Regardless of The exemplar dataset represents the knowledge that the model has
the confidence level in the model’s detection results, the system already acquired, necessitating updates to the exemplar dataset
stores this data and conducts manual verification, considering it following model evolution. As the model evolves, the amount of ac
as one of the samples for model learning in the next stage. quired data grows. However, due to storage constraints on edge devices,
the exemplar dataset cannot expand indefinitely. Consequently, an
under-sampling update approach is implemented for the exemplar
2.3. Model update dataset after model evolution to maintain a stable storage overhead.
As shown in Fig. 6, the proposed approach for updating the exemplar
As illustrated in the Model Update process box in Fig. 2, when the dataset aims to select and retain samples that effectively represent dis
stored data volume of pending-to-learn HIF and Non-HIF samples rea tribution boundaries. Algorithm 1 outlines the specific process of sample
ches the predefined thresholds for the number of HIFs and Non-HIFs, selection to update the exemplar dataset. Firstly, the old and new data
respectively, samples equal to these threshold data volumes are used in the model’s learning phase are extracted and clustered. The
selected to constitute the pending-to-learn dataset, while the remaining discrete points from these clusters form datasets X1 and X2. Next, the
data proceeds to the next incremental cycle. Building upon the inheri center of each cluster is determined, and the Euclidean distance between
tance of all the parameter weights of the original model, a data replay each dimension-reduced data point and its corresponding cluster center
mechanism is introduced to update the classifier model using samples is calculated. The desired number of datasets X3 is selected by sorting
from the exemplar dataset and the pending-to-learn dataset. The settings these distances in descending order. Subsequently, a random subset of
of the two quantity thresholds are synchronized with the amount of data is chosen to form a supplementary dataset X4, based on the selected

7
M.-F. Guo et al. International Journal of Electrical Power and Energy Systems 156 (2024) 109705

data volume. Finally, datasets X1, X2, X3, and X4 are combined to construct the updated exemplar dataset, denoted as Xnew.

Algorithm 1.
Input
    Learn dataset Xlearn = {x1, x2, …, xm}, Exemplar dataset Xold = {xm+1, xm+2, …, xm+n}
    Minimum number of points to be included in the category minPts
    HIF scanning radius eps1, Non-HIF scanning radius eps2
Output
    New exemplar dataset Xnew
Steps
    1. Stationary wavelet transform φ(•): R^c → R^d
        Y1, Y2 ← φ(Xlearn + Xold)
    2. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
        L1 ← DBSCAN(Y1), L2 ← DBSCAN(Y2)
    3. Select all discrete points X1, X2
        X1 ← L1 = −1, X2 ← L2 = −1
    4. Select category boundary points X3
        unique_labels = unique(L1)
        for i in unique_labels
            Ci = (1 / len(Li)) ∑ Y^i    // cluster center
        end for
        for j = 1, 2, …, m + n do
            dj = |Y_j^i − Ci|    // distance
        end for
        Arrange dj in descending order and select X3 according to minPts
    5. Select supplemental points X4
        if len(X1 + X2 + X3) >= len(Xold)
            X4 = Ø
        else
            Randomly select len(Xold) − len(X1 + X2 + X3) points from Xold + Xold − X1 − X2 − X3 to form X4
    6. X1, X2, X3, and X4 constitute Xnew

3. Simulation and real-world data

3.1. Simulation model and real-world 10 kV distribution system

A distribution network model at 10 kV was created using the PSCAD/EMTDC software, as depicted in Fig. 7. The system consists of six lines, which are a combination of overhead lines and cable lines. The lines in the system are implemented using the Bergeron model, and their parameters are presented in Table 6. Each feeder line is equipped with zero-sequence current transformers placed at the starting point. The system operates in an overcompensated state with an overcompensation degree of 6 %.

Fig. 7. Simulation model of 10 kV distribution system.

Table 6
Line parameters.
Line type        Sequence component   Resistance (Ω/km)   Capacitance (μF/km)   Inductance (mH/km)
Cable lines      Zero                 2.7000              0.2800                1.0190
Cable lines      Positive             0.2700              0.3390                0.2550
Overhead lines   Zero                 0.2300              0.0080                5.4780
Overhead lines   Positive             0.1700              0.0097                1.2100

Fig. 8. Experimental scenarios: (a) branch grounding (b) masonry grounding (c) cable arc grounding (d) grassland grounding.

A variety of fault conditions were simulated in a real-world 10 kV distribution system, considering different high impedance grounding media. The system consists of four feeders with simulated lengths of 9.7 km, 25 km, 17.35 km, and 4.9 km. The total capacitive current of the system is 41.4 A, with individual capacitive currents of 20 A, 3.4 A, 8 A, and 10 A for the four lines, respectively. The experiments focused on the 20 A line, and after arc extinguishing compensation, the residual current ranged from approximately 3 A to 4 A. The line has a total capacitance to
ground of 10.8 μF and is connected to parallel resistors of 300 kΩ and 6 kΩ. Fig. 8 illustrates the experimental scenarios involving arc grounding through tree branches, masonry, cables, grass, and sand.

3.2. Acquisition of simulation and real-world data

In the resonant earthed system distribution network model depicted in Fig. 7, various HIF and HIF interference events occurring at different fault locations (FLs) and on different lines are implemented, as detailed in Table 7. HIF is simulated using a combination of variable resistance Rarc modeled by the Mayr model, Cassie model, Emanuel model, and cybernetic model, in conjunction with a constant resistance Rc. These models are utilized to simulate different grounding medium conditions by adjusting their parameters, as specified in Table 8. Specifically, the simulation encompasses CS employing shunt 3-phase capacitors, inrush current (IC) simulated with a no-load single-phase transformer, LIF represented by a Low Impedance Model (5 Ω to 100 Ω), and load switching (LS) modeled using 3-phase asymmetrical loads. Additionally, the experiments incorporate asynchronous closing of the CS to simulate non-fault transient conditions, aligning with practical engineering scenarios. In this study, 3-phase asynchronous closing entails phase A being connected to the system first, followed by phases B and C with the same delay. The initial fault angle is set to 0°, 30°, 60°, 90°, and 120°. To broaden the distribution of samples, this research considers variations in the topology of the distribution system, which involve the addition of line l7 and the removal of line l3.

Table 7
Parameters of simulated events.
Event type   Delay time (ms)   Simulated model                                    Number
HIF          -                 Mayr, Cassie, Emanuel, and control theory models   2560
CS           0, 1, 2, 3        3-phase capacitor                                  520
IC           -                 No-load transformer                                520
LIF          -                 Resistance                                         520
LS           -                 3-phase asymmetric load                            520
Total                                                                             4640
Topology (all event types): Add, Init, Delete
FL (all event types): l11, l12, l13, l21, l22, l31, l32, l33, l41, l42, l51, l52, l61, l62, l63

Table 8
Parameters of simulated events.
Rarc model       Parameter    Value
Mayr             Ploss (kW)   4.05–6.77 / 34.24–35.67
                 τm (μs)      0.30 / 0.27–0.32
Cassie           E0 (kV)      3.66–4.05 / 0.22–0.26
                 τc (μs)      1.05–1.35 / 230–260
Emanuel          Rn (Ω)       250–350 / 450–550
                 Vn (kV)      3.91–4.63 / 4.91–5.63
                 Rp (Ω)       250–350 / 450–550
                 Vp (kV)      3.91–4.63 / 4.05–4.77
Control theory   Lk (cm)      10–100
                 Ik (kA)      4
                 β            2.85 × 10^-5
                 Vk (V/cm)    15
Rc (Ω), all models: 800–3000

Fig. 9. Zero-sequence current waveforms and V-I arc characteristics of HIF in various scenarios: (a) Mayr (b) Cassie (c) Emanuel (d) cybernetic (e) resistance (f) branch (g) grassland (h) masonry (i) cable arc (j) field HIF.

A total of 134 sets of HIF full-scale experiment data were collected from a real-world 10 kV distribution system. These experiments simulated various grounding scenarios, including pure resistance, tree branches, masonry, grass gravel, and cable arcing. In addition, 87 sets of HIF field data and 249 sets of non-HIF field data were collected from the field distribution system. Typically, HIF characteristics persist for approximately 8 to 10 cycles [12]. In this paper, the input sample length was set to 3 cycles, and each data group was divided into two non-overlapping samples representing the transient and steady-state periods. Fig. 9 displays the zero-sequence current waveforms and V-I arc characteristics of high impedance faults (HIFs) in both simulated and real-world scenarios. For HIFs based on various simulation models and those occurring in full-scale experiments within scenarios involving branches, grassland, and masonry, the V-I arc characteristics typically exhibit well-defined hysteresis loops, as illustrated in Fig. 9(a)–(d), and (f)–(h). These characteristics feature steep slopes near the origin and considerably smaller slopes near the extremes. However, in the cases of pure resistance, cable arcing, and field HIFs, the V-I arc characteristics do not demonstrate the aforementioned behavior but instead display entirely distinct trajectories.
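
As a rough illustration of the sample construction described in this subsection (3-cycle inputs, two non-overlapping samples per record), the following Python sketch cuts one zero-sequence current record into a transient window and a steady-state window. The function name `split_record`, the `samples_per_cycle` argument, and the assumption that the transient window starts exactly at the fault inception index are illustrative choices that the paper does not state.

```python
def split_record(signal, samples_per_cycle, onset_index, cycles_per_sample=3):
    """Cut one recorded waveform into two non-overlapping samples.

    Assumption (not from the paper): the transient sample starts at the
    fault inception index and the steady-state sample follows it directly.
    """
    length = cycles_per_sample * samples_per_cycle
    transient = signal[onset_index:onset_index + length]
    steady = signal[onset_index + length:onset_index + 2 * length]
    if len(transient) < length or len(steady) < length:
        raise ValueError("record too short for two non-overlapping samples")
    return transient, steady


if __name__ == "__main__":
    # Hypothetical record: 10 cycles at 100 points per cycle, fault at index 200.
    record = list(range(1000))
    transient, steady = split_record(record, samples_per_cycle=100, onset_index=200)
    print(len(transient), len(steady))
```

With two 3-cycle windows, one record consumes 6 cycles, which stays inside the 8 to 10 cycles over which HIF characteristics are reported to persist.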

4. Experimental validation

In this paper, to validate the feasibility of the proposed method in engineering applications, we developed a prototype of an edge intelligent terminal deployed in a 10 kV distribution network field. This prototype can be installed in both substations and distribution feeders. The structure of the prototype, as shown in Fig. 10(a), comprises a data acquisition module and a data processing module. The data acquisition module is based on the STM32 microcontroller and is responsible for analog-to-digital conversion and data transmission. The data processing module, based on the Raspberry Pi 4B, is responsible for real-time HIF detection and incremental learning.

Based on the prototype of the edge intelligent terminal described earlier, this paper establishes a simulated system for real-time detection and learning. The components of the system are depicted in Fig. 10(b). In this system, PC1 machine and relay tester form a data generator responsible for generating virtual scenarios of real-time data signals. PC2 machine is used as a real-time detection and learning panel to display the corresponding detection results and incremental learning status. Raspberry Pi is a common edge microcomputer based on ARM architecture. In this study, the Raspberry Pi utilizes a 1.5 GHz BCM2711 (CPU) processor, 8.00 GB of RAM, and the Linux operating system. It is noteworthy that the Raspberry Pi 4B employs the Broadcom BCM2711 SoC, which integrates four ARM Cortex-A72 processor cores. Consequently, the Raspberry Pi can concurrently execute multiple threads, with each core dedicated to processing an individual thread. To assess the algorithm's performance, we computed the approximate time required for the proposed method and several methods from the literature [11,24,25,26] to complete feature extraction, detection, and learning tasks on the developed prototype, as presented in Table 9 and Fig. 10(c).

Fig. 10. A simulating scenario of real-time detection and learning: (a) prototype of an edge intelligent terminal (b) system composition (c) process and time used.

From Table 9 and Fig. 10(c), it is evident that the proposed method in this paper exhibits the shortest combined feature extraction and detection time. However, the proposed incremental framework consumes more time. The proposed method addressed this issue by employing multi-threaded programming techniques, enabling the execution of HIF detection tasks on core one and incremental learning tasks on core two, thereby achieving a time frame suitable for engineering applications.
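
The division of work between detection and incremental learning can be pictured with a small producer-consumer sketch. It is only a structural illustration under stated assumptions: the names `run_pipeline`, `detect`, `needs_learning`, and `learn` are hypothetical callbacks, the batch size is arbitrary, and core pinning is not shown. In CPython, threads also share the interpreter lock, so an implementation that really dedicates one core per task would more likely use separate processes.

```python
import queue
import threading


def run_pipeline(samples, detect, needs_learning, learn, batch_size=4):
    """Detect every sample in the calling thread; learn in a background thread.

    detect(sample) -> (label, confidence); needs_learning(confidence) -> bool;
    learn(batch) -> any summary value. All three are hypothetical callbacks.
    """
    pending = queue.Queue()
    results = []
    learned = []

    def learner():
        batch = []
        while True:
            item = pending.get()
            if item is None:              # sentinel: flush the remainder and stop
                if batch:
                    learned.append(learn(batch))
                return
            batch.append(item)
            if len(batch) == batch_size:  # enough pending-to-learn samples
                learned.append(learn(batch))
                batch = []

    worker = threading.Thread(target=learner)
    worker.start()
    for sample in samples:
        label, confidence = detect(sample)   # detection never waits for learning
        results.append(label)
        if needs_learning(confidence):
            pending.put(sample)
    pending.put(None)
    worker.join()
    return results, learned


if __name__ == "__main__":
    labels, batches = run_pipeline(
        samples=[2, 0.5, -3, 0.2, -0.1, 5, 0.3, 0.9, 0.7],
        detect=lambda s: (s > 0, 0.9 if abs(s) > 1 else 0.4),
        needs_learning=lambda c: c < 0.5,
        learn=len,
    )
    print(labels, batches)
```

The queue decouples the two tasks: a slow model update delays only the next update, not the detection of incoming samples.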
Table 9
Comparison of Time Consumption of Different Methods.
Literature                          Feature extraction time   Detection time
(Lala H & Karmakar S, 2020)         234.8 ms                  4.703 μs
(S. Kar & S. R. Samantaray, 2016)   1.773 ms                  1.672 μs
(Gautam & Brahma, 2013)             3.281 ms                  6.423 μs
(Gao et al., 2022)                  158.8 ms                  62.5 ms

4.1. Adaptability analysis based on simulation data

Table 10
Simulation scenario composition.
                        Initial Scenario   Incremental Scenario One   Incremental Scenario Two
Composition (samples)   Mayr(640)          Emanuel(640)               Cybernetic(640)
                        Cassie(640)
                        LS(520)            LIF(520)                   CS(520)
                        IC(520)

To simulate the non-stationary data stream faced by the HIF detection system in engineering applications, the data samples in Table 10 are randomly partitioned into 8 groups, with each group's data samples further randomly partitioned into training, testing, and validation sets in a 7:2:1 ratio. The training set of the first dataset serves as the initial training set for training the primitive classifier model, while the remaining seven training sets are utilized as the simulated data stream dataset. In the initial training set, there is a partial distribution of the total samples, which, however, is insufficient to cover all distribution information. This is done to simulate scenarios where initial training samples may be incomplete in field applications. The simulated data stream dataset is employed to mimic the data stream encountered by the model at different time intervals. The test set for each dataset comprises 20 % of the data samples in that dataset and is employed to assess the model's performance on the current cycle of data. Each validation dataset from every group is used to assist the model in selecting the best hyperparameter configuration for the current data volume.
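
The partitioning described above can be sketched in a few lines of Python. This is a minimal sketch assuming a uniform random partition; the function name `make_stream`, the fixed seed, and the integer rounding rule are illustrative, and the paper does not state how the first group is restricted to a partial distribution.

```python
import random


def make_stream(samples, n_groups=8, ratios=(7, 2, 1), seed=0):
    """Shuffle samples, cut them into n_groups, and split each group
    into train/test/val by the integer ratios (7:2:1 by default)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    total = sum(ratios)
    splits = []
    for i in range(n_groups):
        group = shuffled[i::n_groups]               # every n_groups-th sample
        n_train = len(group) * ratios[0] // total   # integer arithmetic avoids float rounding
        n_test = len(group) * ratios[1] // total
        splits.append({
            "train": group[:n_train],
            "test": group[n_train:n_train + n_test],
            "val": group[n_train + n_test:],
        })
    # First group trains the initial model; the rest form the data stream.
    return splits[0], splits[1:]


if __name__ == "__main__":
    initial, stream = make_stream(range(4640))
    print(len(initial["train"]), len(initial["test"]), len(initial["val"]), len(stream))
```

For the 4640 simulated samples, each of the 8 groups holds 580 samples, split into 406 training, 116 testing, and 58 validation samples.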
The process of the simulated data stream experiment is as follows. The model undergoes initial training in the initial training dataset, followed by incremental learning in batches using the simulated data stream dataset. The comprehensive accuracy of the model in both the current and previous batched test datasets is recorded.

In this paper, we have employed several methods from literature [11,24,25,26] as benchmarks to conduct a comprehensive comparison between the proposed approach and existing techniques. The first two methods utilize Machine Learning (ML) models for HIF detection based on Support Vector Machine (SVM) and Decision Tree (DT), respectively, while the latter two employ threshold-based approaches, emphasizing signal analysis across time, frequency, and time–frequency domains. Additionally, in this paper, we have also applied a direct incremental learning mode to existing ML algorithms, which involves updating the ML model directly with new data. The results of the simulated data stream experiments are presented in Fig. 11.

Fig. 11. Model performance in simulation data stream in different methods.

In Fig. 11, the detection accuracy of the proposed method remains consistently high for different batches of incremental data, significantly outperforming the other four methods in terms of sustaining performance. This illustrates the ability of the proposed method to adapt to new distribution data through the incremental learning framework, while existing methods gradually degrade in detection performance as new distribution data arrives, especially those relying on ML models. It is important to highlight that in this paper, we have fine-tuned the thresholds in [25] and [26] to achieve optimal performance. However, threshold-based approaches depend on the normal system operating state and tend to produce errors when confronted with unfamiliar data distributions. This is a common limitation of algorithms lacking continuous learning capabilities, and therefore, the method presented in this paper aims to assist ML models in achieving lifelong learning.

The proposed method exhibits strong detection performance in the initial three data batches, validating that the straightforward structure employed in this paper mitigates classifier model overfitting issues when dealing with limited data. However, a slight decline in detection accuracy is observed in the 3rd to 5th incremental data batches. Interestingly, this performance dip is followed by a recovery and improvement in performance after adjusting the network structure during the 5th batch of learning. This performance shift is attributed to the structural adaptation technique proposed for model structure adjustments. As the model's learning distribution widens and the dataset expands, the initial simplistic model proves inadequate for capturing the increasing complexity in data distribution. By expanding the model structure to accommodate this broader distribution range, the problem of decreased detection performance due to model underfitting is effectively circumvented.

Nevertheless, ML algorithms that are theoretically capable of incremental learning yield subpar results when employing the direct incremental learning mode. This is primarily attributed to the issue of "catastrophic forgetting," a challenge frequently encountered in the course of incremental learning. The proposed method effectively mitigates this problem through the utilization of data replay, thereby ensuring the stability of the ML model's detection performance throughout the learning process.

To investigate the mechanism of data replay technology in the above conclusions, this paper divides the data samples in Table 5 into three scenarios based on the different HIF models and Non-HIF events used. The composition of these scenarios is presented in Table 10, with numbers in parentheses indicating sample quantities. Each of the three scenario datasets is randomly partitioned into training and test sets at a ratio of 7:3. Evidently, there are significant variations in data distribution across different scenarios. The aim of this article is to magnify these distribution differences to observe the proposed method's learning process with new data and the review process with old data.

The flow of the scenario transformation experiments is as follows. The primitive classifier model is trained with the initial scenario training set, once in the proposed framework and once in the direct learning mode. The training samples of incremental scenario one and incremental scenario two are then learned in turn, and the performance of the model on the three scenario test sets is recorded after each update. The results of the scenario transformation experiments are shown in Fig. 12.

As evident from Fig. 12(b), the ML algorithm utilizing the direct incremental learning mode possesses the capability to learn new distribution data. However, it often forgets the characteristics of the old distribution data during the process of learning the new distribution data, resulting in a decline in the model's ability to recognize the data from the old distribution. In contrast, the proposed method constructs an exemplar dataset that includes previously learned information about the old distribution and employs retrospective training of the model through data replay. This approach ensures the model's effective recognition of the old scenario is maintained while learning the new scenario.

After learning the new distribution data, this portion of the data is transformed into the learned old distribution data through the exemplar dataset updating technique. The proposed method retains its capability to
detect the data from the first incremental scenario during the second incremental scenario learning (as shown by the latter part of the green dashed line in Fig. 12(a)). This demonstrates that the exemplar dataset updating technique can integrate the acquired distributional features into the new exemplar dataset, thereby enabling the preservation of newly acquired knowledge during the continuous learning process.

Fig. 12. Adaptation of models using different methods to adjust to simulation scenario changes.

In the proposed method, samples from the data stream undergo data evaluation using data evaluation techniques. Only the data that deviates from the distribution observed in the exemplar dataset is subjected to further learning. In the scenario transformation experiments, the direct incremental learning mode learns all the data samples, whereas the proposed method selectively learns 340 data samples, which represent 20 % of the total samples. As shown in Fig. 12(a), the model effectively captures the distribution characteristics of the specific scenario by learning the selected data, thus enabling accurate recognition of data within that scenario's distribution. This validates the capability of the proposed data evaluation technique to monitor distribution changes and identify new distribution data.

Table 11
Real-world scenario composition.
                        Initial Scenario   Incremental Scenario One   Incremental Scenario Two
Composition (samples)   Branch(50)         Grassland(68)              Practical HIF(178)
                        Masonry(28)
                        Resistance(106)    Non-HIF2(146)              Non-HIF3(120)
                        Non-HIF1(200)

Fig. 13. Model performance in real-world data stream in different methods.

Fig. 14. Adaptation of models using different methods to adjust to real-world scenario changes.

Combining the results of both experiments, it can be observed that within the proposed incremental learning framework, the BPNN is capable of learning new data through incremental learning even in scenarios where the initial training samples are incomplete.
Furthermore, it demonstrates a strong resistance to forgetting old data.

4.2. Adaptability analysis based on real-world data

In this section, 940 sets of existing real-world data have been chosen to assess the proposed method. Due to the scarcity of field data, this section randomly allocates the measured data samples into four groups for the simulated data stream test. Each group's data samples are then randomly divided into training, testing, and validation sets in a 7:2:1 ratio. In the scenario transformation test, these data are categorized into three different scenarios based on grounding medium and interference, as outlined in Table 11. These scenario-based data are further divided into real training and real testing data in a 7:3 ratio. The testing procedures for both sets of data are consistent with the previous section, and the results of these tests are presented in Fig. 13 and Fig. 14.

The inherent time-varying and stochastic nature of real-world data significantly challenges the model's ability to recognize such data, resulting in a final recognition accuracy of only 83.58 % for the real-world data. However, when looking at the entire incremental learning process, it becomes evident that the stability-plasticity dilemma faced by the model is substantially mitigated within the proposed incremental learning framework. The model adeptly balances the interplay between new and old data, ensuring that recognition accuracy for field data remains stable under the proposed incremental learning framework.

In summary, when confronted with non-stationary data streams and scenarios characterized by significant variations in operating conditions, the model continues to demonstrate its incremental learning capabilities. It can acquire common knowledge from both old and new data, effectively mitigating forgetfulness regarding old data and assimilating new data from the waveform data streams. This highlights the promising engineering applications of the proposed method.

5. Conclusion

Traditional artificial intelligence approaches can achieve high accuracy in HIF detection when provided with complete training samples. However, in real-world scenarios, dynamic distribution networks often face continuously incoming data with new distributions, which are challenging to obtain as complete training samples during the early stages of training. In order to facilitate lifelong learning for the HIF detection model and consistently acquire the capability to detect newly distributed data from non-stationary data streams, this paper presents an incremental learning-based framework for HIF detection in distribution grids. This framework employs a data evaluation mechanism, an incremental learning algorithm, and an exemplar dataset updating mechanism to handle real-time data streams. The effectiveness of this technique has been verified through both simulation and real-world data.

The results can be summarized as follows:

i) In non-stationary data streams with time-varying distributions, the proposed method, when compared to other traditional artificial intelligence approaches, exhibits the capability to learn new distributional data information, thereby enhancing its detection capabilities.
ii) The data evaluation mechanism allows for monitoring changes in data distribution within the data stream, facilitating the selection of data samples that require learning.
iii) The incremental learning algorithm based on data replay can effectively counteract the forgetting of previously acquired knowledge through retrospective training. Furthermore, the proposed exemplar dataset updating mechanism can incorporate learned distributional features into the new exemplar dataset while significantly reducing data storage costs.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

References

[1] Yuan J, Jiao Z. Faulty feeder detection for single phase-to-ground faults in distribution networks based on waveform encoding and waveform segmentation. IEEE Trans Smart Grid 2023:1.
[2] Zhang BL, Guo MF, Zheng ZY, Guo CH. A novel method for simultaneous power compensation and ground fault elimination in distribution networks, CSEE J Power Energy Syst, to be published. https://doi.org/10.17775/CSEEJPES.2022.03830.
[3] Deshmukh B, Lal DK, Biswal S. A reconstruction based adaptive fault detection scheme for distribution system containing AC microgrid. Int J Electrical Power Energy Syst 2023;147.
[4] Wang X, Liu W, Liang Z, et al. Faulty feeder detection based on the integrated inner product under high impedance fault for small resistance to ground systems. Int J Electr Power Energy Syst 2022;140:108078.
[5] Zhang BL, Guo MF, Zheng ZY, Hong Q. Fault current limitation with energy recovery based on power electronics in hybrid AC–DC active distribution networks. IEEE Trans Power Electron 2023;38(10):12593–606.
[6] Lopes GN, Menezes TS, Santos GG, et al. High Impedance Fault detection based on harmonic energy variation via S-transform. Int J Electr Power Energy Syst 2022;136.
[7] Yuan J, Wu T, Hu Y, et al. Faulty feeder detection based on image recognition of voltage-current waveforms in non-effectively grounded distribution networks. Int J Electr Power Energy Syst 2022;143:108434.
[8] Gao J, Wang X, Wang X, et al. A high-impedance fault detection method for distribution systems based on empirical wavelet transform and differential faulty energy. IEEE Trans Smart Grid 2022;13(2):900–12.
[9] Gao J-H, Guo MF, Lin S, et al. Application of semantic segmentation in High-Impedance fault diagnosis combined signal envelope and Hilbert marginal spectrum for resonant distribution networks. Expert Syst Appl 2023;231:120631.
[10] Guo MF, Liu WL, Gao JH, et al. A data-enhanced high impedance fault detection method under imbalanced sample scenarios in distribution networks. IEEE Trans Ind Appl 2023:1–14.
[11] Lala H, Karmakar S. Detection and experimental validation of high impedance arc fault in distribution system using empirical mode decomposition. IEEE Syst J 2020;14(3):3494–505.
[12] Xiao Q-M, Guo M-F, Chen D-Y. High-Impedance fault detection method based on one-dimensional variational prototyping-encoder for distribution networks. IEEE Syst J 2022;16(1):966–76.
[13] Gomes DPS, Ozansoy C, Ulhaq A. Vegetation high-impedance faults' high-frequency signatures via sparse coding. IEEE Trans Instrum Meas 2020;69(7):5233–42.
[14] Van De Ven GM, Tuytelaars T, Tolias AS. Three types of incremental learning. Nat Mach Intell 2022;4(12):1185–97.
[15] Leite D, Costa P, Gomide F. Evolving granular neural network for fuzzy time series forecasting. In: The 2012 International Joint Conference on Neural Networks (IJCNN), 2012. p. 1-8.
[16] Angelov P, Filev DP, Kasabov N. Evolving intelligent systems: methodology and applications. John Wiley & Sons; 2010.
[17] Leite D, Costa P, Gomide F. Evolving granular neural network for semi-supervised data stream classification. In: The 2010 International Joint Conference on Neural Networks (IJCNN), 2010. p. 1-8.
[18] Costa FB, Souza BA, Brito NSD, et al. Real-Time detection of transients induced by high-impedance faults based on the boundary wavelet transform. IEEE Trans Ind Appl 2015;51(6):5312–23.
[19] Costa FB. Fault-Induced transient detection based on real-time analysis of the wavelet coefficient energy. IEEE Trans Power Delivery 2014;29(1):140–53.
[20] Yusuff AA, Jimoh AA, Munda JL. Fault location in transmission lines based on stationary wavelet transform, determinant function feature and support vector regression. Electr Pow Syst Res 2014;110:73–83.
[21] Gu S, Qiao J, Shi W, et al. Multi-task transient stability assessment of power system based on graph neural network with interpretable attribution analysis. Energy Rep 2023;9:930–42.
[22] Xu T, Zou P, Xu T, et al. Study on weight function of meshless method based on B-spline wavelet function. In: 3rd International Joint Conference on Computational Sciences and Optimization, CSO 2010: Theoretical Development and Engineering Practice, May 28, 2010 - May 31, 2010; 2010. p. 36-40.
[23] Robins A. Catastrophic forgetting in neural networks: the role of rehearsal mechanisms. In: Proceedings 1993 The First New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems, 1993. p. 65-68.
[24] Kar S, Samantaray SR. High impedance fault detection in microgrid using maximal overlapping discrete wavelet transform and decision tree. In: International Conference on Electrical Power & Energy Systems; 2016.
[25] Gautam S, Brahma SM. Detection of high impedance fault in power distribution systems using mathematical morphology. IEEE Trans Power Syst 2013;28(2):1226–34.
[26] Wang X, Wei X, Gao J, et al. High-Impedance fault detection method based on stochastic resonance for a distribution network with strong background noise. IEEE Trans Power Delivery 2022;37(2):1004–16.