energies

Article

Prediction Method for Power Transformer Running State Based on LSTM_DBN Network

Jun Lin 1,*, Lei Su 2, Yingjie Yan 1, Gehao Sheng 1, Da Xie 1 and Xiuchen Jiang 1

1 Department of Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; [email protected] (Y.Y.); [email protected] (G.S.); [email protected] (D.X.); [email protected] (X.J.)
2 Electric Power Research Institute of State Grid Shanghai Municipal Electric Power Company, Shanghai 200120, China; [email protected]
* Correspondence: [email protected]; Tel.: +86-188-1827-2579

Received: 3 July 2018; Accepted: 12 July 2018; Published: 19 July 2018

Abstract: It is of great significance to accurately track the running state of power transformers and to detect potential transformer faults in time. This paper presents a prediction method for the transformer running state based on an LSTM_DBN network. Firstly, based on the trend of gas concentration in transformer oil, a long short-term memory (LSTM) model is established to predict the future characteristic gas concentrations. Then, the accuracy and influencing factors of the LSTM model are analyzed with examples. A deep belief network (DBN) model is used to establish the transformer running state classification model using the information in the transformer fault case library; its accuracy of state classification is higher than that of the support vector machine (SVM) and the back-propagation neural network (BPNN). Finally, combined with actual transformer data collected from the State Grid Corporation of China, the LSTM_DBN model is used to predict the transformer state. The results show that the method has higher prediction accuracy and can analyze potential faults.

Keywords: dissolved gas analysis; long short-term memory; deep belief network; running state prediction

1. Introduction
As one of the most important pieces of equipment in the power system, power transformers directly influence the stability and safety of the entire power grid. If a transformer fails in operation, it will cause power outages and also damage the transformer itself and the power system, which may result in greater losses [1]. It is therefore necessary to perform real-time condition monitoring and diagnosis of the transformer to predict its future running states, so that potential failures are discovered in time and the potential failure types are analyzed. Sending early warning signals to maintainers and taking corresponding measures in a timely manner can reduce the possibility of an accident.
At present, there is much research on transformer fault diagnosis, but there are relatively few
studies on the prediction of future running states of transformers and fault prediction. During the
operation of the transformer, its internal insulating oil and solid insulating material will be dissolved
in the insulating oil due to aging or external electric field and humidity. The content of various
components of the gas in the oil and the proportional relationship between the different components
are closely related to the running state of the transformer. Before the occurrence of electrical or thermal
faults, the concentration of various gases has a gradual and regular change with time. Therefore,
the dissolved gas analysis (DGA) is an important method to find the transformer defects and latent
faults. It is highly feasible and accurate to predict transformer running states and make future fault
classifications based on the trend of each historical gas concentration and the ratio between gas
concentrations [2–4]. Current methods include oil gas ratio analysis [5–7], SVM [8,9] and artificial

Energies 2018, 11, 1880; doi:10.3390/en11071880 www.mdpi.com/journal/energies



neural network (ANN) [10,11]. Li et al. [12] propose an incipient fault diagnosis method based on the combined use of a multi-classification self-adaptive evolutionary extreme learning machine (SaE-ELM) and a simple arctangent transform (AT); the AT alters the structure of the experimental data to enhance the fitting and generalization capabilities of SaE-ELM.
Sherif S.M. Ghoneim [13] utilizes thermodynamic theory to evaluate fault severity based on dissolved gas analysis, and also proposes a fuzzy logic approach to enhance the fault diagnosis ability. Zhao et al. [14] propose a transformer fault combination prediction model based on SVM.
The prediction results of multiple single prediction methods such as exponential model, gray model, etc.
are taken as the input of SVM for the second prediction to form a variable weight combination forecast.
Compared with single prediction, the accuracy of fault prediction is improved. Zhou et al. [15] use cloud theory to predict the expected value of gas changes in oil in the short term to obtain a series of prediction results with a stable tendency. Current methods still have the following two problems. (1) Most existing research is aimed at fault diagnosis at the current moment, and lacks analysis of future running states and fault warning. (2) In the state assessment and fault classification of transformers, gas concentration ratio coding is mainly used as the input of the model, but there are problems such as incomplete coding and too-absolute boundaries [16].
In recent years, with the continuous development of deep learning technologies, some deep
learning models have been applied to the analysis of time series data. The deep learning model is a
kind of deep neural network with multiple non-linear mapping levels. It can abstract the input signal
layer by layer and extract features to discover potential laws at a deeper level. In many deep learning
models, the recurrent neural network (RNN) can fully consider the correlation of time series and can
predict future data based on historical data. It is more adaptable to predict and analyze time series
data. The LSTM is used as an improved model of RNN to make up for the disappearance of gradients,
gradient explosions, and lack of long-term memory in the training process of the RNN model. It can
make full use of historical data. At present, LSTM has achieved extensive research and application in
such fields as speech recognition [17], video classification [18], and flow prediction [19,20]. In this paper,
the superiority of the LSTM model in processing time series is exploited, and the future gas concentration is predicted based on the historical trend of the gas concentration in the transformer oil.
The DBN is formed by stacking multiple restricted Boltzmann machines (RBMs), and the network is pre-trained using a contrastive divergence (CD) algorithm. Error back-propagation is then used to adjust the parameters of the whole network. The DBN effectively overcomes the weaknesses of traditional neural networks, which are sensitive to initial parameters and handle high-dimensional data slowly. Currently, DBN networks have been widely used in fault diagnosis [21], pattern
recognition [22], and image processing [23]. In this paper, the ratio of the future gas concentration
obtained from the LSTM prediction model is used as the DBN network input to classify the future
operating status of the transformer.
This paper presents a prediction method of transformer running state based on LSTM_DBN
model. Firstly, the ability of LSTM model to deal with time series is used to analyze the changing
trend of dissolved gas concentration data in transformer oil to obtain the future gas concentration and
calculate the gas concentration ratios. Using the powerful feature learning ability of the DBN network, a deep network with multiple hidden layers is constructed, with the gas concentration ratios as the input of the model and the transformer running state type as the output. The entire LSTM_DBN model
makes full use of the historical data of the transformer oil chromatogram and realizes the analysis
of the state of the transformer in the future and the analysis of the early fault warning. Through the
analysis of specific examples, we can see that the model proposed in this paper has good prediction
accuracy and can analyze potential faults.
2. Prediction of Dissolved Gases Concentration in Transformer Oil Based on LSTM Model

2.1. Prediction of Dissolved Gases Concentration

Transformer oil chromatographic analysis technology has become one of the important methods for monitoring the early latent faults of oil-immersed power transformers and analyzing fault nature and locations after failure. Condition-based maintenance of oil-immersed transformers is fully based on oil chromatographic data. The transformer oil chromatographic analysis test can quickly and effectively find potential faults and defects without interruption of power. It has high recognition of overheating faults, discharge faults, and dielectric breakdown failures.

Most transformers use oil-paper composite insulation. When the transformer is under normal operation, the insulating oil and solid insulating material will gradually deteriorate and a small amount of gas will be decomposed, mainly including H2, CH4, C2H2, C2H4, C2H6, CO, and CO2. When an internal fault of the transformer occurs, the generation of these gases will be accelerated. As the failure develops, the decomposed gas forms bubbles, which flow and diffuse in the oil. The composition and content of the gas are closely related to the type of fault and the severity of the fault. Therefore, during the operation of the transformer, chromatographic analysis of the oil is performed at regular intervals, so as to detect potential internal equipment failures as early as possible, which can avoid equipment failure or greater losses. However, due to the complex operation of the transformer oil chromatography test and the long sampling interval, it is of great significance to predict the future development trend based on the historical trend of the gas concentration in the transformer oil.
2.2. Principles of Prediction

The LSTM network is an improved model based on the RNN. While retaining the recursive nature of RNNs, it solves the problems of gradient disappearance and gradient explosion in the RNN training process [24–27].

y
y(t−1) y(t) y(t+1)
(t+1)
Why Why(t−1) Why(t) Why
Whh h (t−1)
h (t)
h(t+1)
h
Whh(t−2) Whh(t−1) Whh(t) Whh(t+1)
Wxh Wxh(t−1) Wxh(t) Wxh(t+1)
x x(t)
x(t−1) x(t+1)

(a)
(b)
Figure 1. (a) Basic recurrent neural network (RNN) network; (b) RNN expansion diagram.
Figure 1. (a) Basic recurrent neural network (RNN) network; (b) RNN expansion diagram.
A basic RNN network is shown in Figure 1a. It consists of an input layer, a hidden layer, and an output layer. The RNN timing diagram is shown in Figure 1b. x = [x(1), x(2), x(3), ..., x(n−1), x(n)] is the input vector and y = [y(1), y(2), ..., y(n)] is the output vector. h is the state of the hidden layer. Wxh is the weight matrix from the input layer to the hidden layer, Why is the weight matrix from the hidden layer to the output layer, and Whh is the weight matrix applied to the hidden layer state when it is fed back as input at the next moment. The hidden layer state h(t−1) is used as part of the input at time t. So when the input at t is x(t), the value of the hidden layer is h(t) and the output value is y(t):

h(t) = f(Wxh·x(t) + Whh·h(t−1)) (1)

y(t) = g(Why·h(t)) (2)
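Equations (1) and (2) can be sanity-checked with a few lines of NumPy. The sketch below assumes a tanh hidden activation and an identity output activation; these are illustrative choices, not fixed by the text.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, f=np.tanh, g=lambda z: z):
    """Forward pass of the basic RNN of Equations (1)-(2).

    xs is a sequence of input vectors; the weight matrices are shared
    across all time steps. Activation choices are illustrative defaults.
    """
    h = np.zeros(W_hh.shape[0])     # initial hidden state
    ys = []
    for x in xs:
        h = f(W_xh @ x + W_hh @ h)  # Eq. (1): new hidden state
        ys.append(g(W_hy @ h))      # Eq. (2): output from hidden state
    return ys
```

Because each h depends on the previous h, unrolling this loop reproduces the nested form of Equation (3) below.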

where f is the hidden layer activation function and g is the output layer activation function. Substituting (1) into (2), we can get:

y(t) = g(Why·f(Wxh·x(t) + Whh·f(Wxh·x(t−1) + Whh·f(Wxh·x(t−2) + Whh·f(Wxh·x(t−3) + ...))))) (3)

From (3), it can be seen that the output value y(t) of the RNN network is affected not only by the input x(t) at the current moment, but also by the previous input values x(t−1), x(t−2), x(t−3), ....

The RNN network has a memory function and can effectively deal with non-linear time series. However, when the RNN processes a time sequence with a long delay, the problems of gradient disappearance and gradient explosion will occur during the back-propagation through time (BPTT) training process. As an improved model, LSTM adds a gating unit which allows the model to store and transmit information for a longer period of time through the selective passage of information.

The gating unit of LSTM is shown in Figure 2. It consists of an input gate, a forget gate and an output gate. The workflow of the LSTM gate unit is as follows:

(1) Input the sequence value x(t) at time t and the hidden layer state h(t−1) at time t − 1. The discarded information is determined by the activation function. The output of the forget gate at this time is:

f(t) = σ(Wf·h(t−1) + Wf·x(t) + bf) (4)

where f(t) is the result of the forget gate, Wf is the weight matrix of the forget gate, and bf is the offset of the forget gate. σ is the activation function, usually a tanh or sigmoid function.
(2) Enter the gate unit state c(t−1) at time t − 1 and determine the information to update. Update the gate unit state c(t) at time t:

i(t) = σ(Wi·h(t−1) + Wi·x(t) + bi) (5)

c̃(t) = tanh(Wc·h(t−1) + Wc·x(t) + bc) (6)

c(t) = i(t) ∘ c̃(t) + f(t) ∘ c(t−1) (7)

where i(t) is the input gate state result, c̃(t) is the cell state input at t, Wi is the input gate weight matrix, Wc is the input cell state weight matrix, bi is the input gate bias, and bc is the input cell state bias. ∘ means element-wise multiplication.
(3) The output of the LSTM is determined by the output gate and the unit state:

o(t) = σ(Wo·h(t−1) + Wo·x(t) + bo) (8)

h(t) = o(t) ∘ tanh(c(t)) (9)

where o(t) is the output gate state result, Wo is the output gate weight matrix and bo is the output gate offset.

Figure 2. Structure of long short-term memory (LSTM) gate unit.
3. Analysis of Transformer Running State Based on Deep Belief Network

3.1. Transformer Running Status Analysis

For the running state classification of the transformer, it is firstly divided into healthy state (H) and potential failure (P). According to the IEC 60599 standard, the types of potential transformer faults can be classified into partial discharge (PD), low-energy discharge (LD), high-energy discharge (HD), thermal fault of low temperature (LT), thermal fault of medium temperature (MT), and thermal fault of high temperature (HT) [21]. Thus, the predicted running state of the transformer is divided into 7 (6 + 1) types.

Due to the normal aging of the transformer, the decomposed gas in the transformer oil is in an unstable state and will accumulate over time and change dynamically. Even though different transformers are in healthy operation, because of their different operating times, the concentration of dissolved gases in the oil varies greatly among different transformers. Therefore, it is necessary to use the ratios between the gas concentrations instead of the simple gas concentrations as the reference vector for the prediction of the final running state.

The currently used ratios include the IEC ratios, Rogers ratios, Dornenburg ratios and Duval ratios. This paper combines these four methods with other codeless ratio methods. The gas concentration ratios used are shown in Table 1.

Table 1. Gas concentration ratios.

IEC ratios: CH4/H2, C2H2/C2H4, C2H4/C2H6
Rogers ratios: CH4/H2, C2H2/C2H4, C2H4/C2H6, C2H6/CH4
Dornenburg ratios: CH4/H2, C2H2/C2H4, C2H2/CH4, C2H6/C2H2
Duval ratios: CH4/C, C2H2/C, C2H4/C, where C = CH4 + C2H2 + C2H4
Gas concentration ratios: CH4/H2, C2H2/C2H4, C2H4/C2H6, C2H6/CH4, C2H2/CH4, C2H6/C2H2, CH4/C1, C2H2/C1, C2H4/C1, H2/C2, CH4/C2, C2H2/C2, C2H4/C2, C2H6/C2

where C1 = CH4 + C2H2 + C2H4 and C2 = H2 + CH4 + C2H2 + C2H4 + C2H6
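As a sketch, the codeless ratio set in the last row of Table 1 can be assembled as a feature vector like this; the small `eps` guard against division by zero is an implementation detail added here, not part of the paper.

```python
def ratio_features(h2, ch4, c2h2, c2h4, c2h6, eps=1e-9):
    """Build the 14 codeless gas-concentration ratios from Table 1.

    Inputs are gas concentrations (e.g. in uL/L); eps guards against
    division by zero and is an assumption of this sketch.
    """
    c1 = ch4 + c2h2 + c2h4               # C1 = CH4 + C2H2 + C2H4
    c2 = h2 + ch4 + c2h2 + c2h4 + c2h6   # C2 = H2 + CH4 + C2H2 + C2H4 + C2H6

    def r(a, b):
        return a / (b + eps)

    return [
        r(ch4, h2), r(c2h2, c2h4), r(c2h4, c2h6), r(c2h6, ch4),
        r(c2h2, ch4), r(c2h6, c2h2),
        r(ch4, c1), r(c2h2, c1), r(c2h4, c1),
        r(h2, c2), r(ch4, c2), r(c2h2, c2), r(c2h4, c2), r(c2h6, c2),
    ]
```

Because every entry is a ratio, the vector is insensitive to the absolute gas levels that differ between transformers of different ages, which is exactly the motivation given above.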

3.2. Deep Belief Network

RBM, as a component of DBN, includes a visible layer v and a hidden layer h. The structure of RBM is shown in Figure 3. The visible layer consists of visible units vi and is used to input the training data. The hidden layer is composed of hidden units hj and is used for feature detection. w represents the weights between the two layers. For the visible and hidden layers of RBM, the interlayer neurons are fully connected and the inner-layer neurons are not connected [28–31].

Figure 3. Structure of the restricted Boltzmann machine (RBM).
For a specific set of (v, h), the energy function of the RBM is defined as:

E(v, h|θ) = −Σ(i=1..nv) ai·vi − Σ(j=1..nh) bj·hj − Σ(i=1..nv) Σ(j=1..nh) vi·ωij·hj (10)

where θ = (ωij, ai, bj) is the parameter set of the RBM. ωij is the connection weight between the visible layer node vi and the hidden layer node hj. ai and bj are the offsets of vi and hj respectively. According to this energy function, the joint probability density of (v, h) is:

p(v, h|θ) = e^(−E(v,h|θ))/Z(θ) (11)

Z(θ) = Σv Σh e^(−E(v,h|θ)) (12)

The probabilities that the jth hidden unit in the hidden layer and the ith visible unit in the visible layer are activated are:

p(hj = 1|v, θ) = σ(bj + Σ(i=1..nv) vi·ωij) (13)

p(vi = 1|h, θ) = σ(ai + Σ(j=1..nh) hj·ωij) (14)

where σ(·) is the activation function. Usually we can choose the sigmoid function, the tanh function or the ReLU function. The expressions are:

sigmoid(x) = 1/(1 + e^(−x)) (15)

tanh(x) = (e^x − e^(−x))/(e^x + e^(−x)) (16)

ReLU(x) = max(0, x) (17)
Since the ReLU function can improve the convergence speed of the model and has non-saturating characteristics, this paper uses the ReLU function as the activation function.

When given a set of training samples S, with ns the number of training samples, maximizing the log-likelihood function achieves the purpose of training the RBM:

ln Lθ,S = Σ(i=1..ns) ln P(v(i)) (18)

The DBN network is essentially a deep neural network composed of multiple RBM networks and a classified output layer. Its structure is shown in Figure 4.

Figure 4. Structure of the deep belief network (DBN).

The DBN training process includes two stages: pre-training and fine-tuning. In the pre-training phase, a contrastive divergence (CD) algorithm is used to train each layer of RBM layer by layer; the output of the hidden layer of one RBM is used as the input of the next RBM. In the fine-tuning phase, the gradient descent method is used to propagate the error between the actual output and the label from the top layer back down to the bottom layer, optimizing the parameters of the entire DBN model.
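The per-layer pre-training step can be sketched as one CD-1 update. Note that the sketch uses the sigmoid for the activation probabilities of Equations (13)-(14), which is the usual choice when sampling binary RBM units, even though the paper adopts ReLU for the network activations; the learning rate and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, a, b, lr=0.1):
    """One contrastive-divergence (CD-1) step for a single RBM.

    Uses the activation probabilities of Eqs. (13)-(14) and the standard
    CD-1 gradient approximation (data statistics minus reconstruction
    statistics). Shapes: v0 (nv,), W (nv, nh), a (nv,), b (nh,).
    """
    # Up pass, Eq. (13): p(h_j = 1 | v).
    ph0 = sigmoid(b + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden units
    # Down pass, Eq. (14): p(v_i = 1 | h), the reconstruction.
    pv1 = sigmoid(a + h0 @ W.T)
    # Second up pass on the reconstruction.
    ph1 = sigmoid(b + pv1 @ W)
    # CD-1 parameter updates.
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    a += lr * (v0 - pv1)
    b += lr * (ph0 - ph1)
    return W, a, b
```

Stacking such RBMs and feeding each trained hidden layer's output into the next one, then back-propagating label error through the whole stack, gives the two-stage DBN training described above.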

4. Transformer phase,
State the gradient
Prediction descent method is used to propagate the error between the actual
Process
output and the labeled numerical label from top to bottom and back to the bottom to achieve
With the continuous
optimization of thedevelopment of power
entire DBN model equipment on-line monitoring technology, the monitoring
parameters.
data are also increasing rapidly. Utilizing the existing historical state information, such as the type and
development law of theState
4. Transformer characteristic
Prediction gas in the insulating oil, and analyzing the change of the running
Process
state is of great significance
With the continuous development of power equipment on-line monitoring technology, the monitoring data are also increasing rapidly. Utilizing the existing historical state information, such as the type and development law of the characteristic gas in the insulating oil, and analyzing the change of the running state is of great significance to the state assessment and prediction.
The flowchart of the transformer running state prediction method based on the LSTM_DBN model is shown in Figure 5. The specific steps are as follows:
(1) Collect the transformer oil chromatographic data and select the characteristic parameters H2, CH4, C2H2, C2H4 and C2H6 as input for the model;
(2) Train the LSTM model. According to the transformer oil chromatography historical data, each characteristic gas concentration is taken as the input, and the corresponding gas concentration is used as the output to train the LSTM model to obtain future gas concentration values;
(3) Train the DBN model. According to the samples of the transformer fault case library, the gas concentration ratios are taken as the input of the DBN network, and 7 kinds of transformer running states are used as the output to train the DBN model;
(4) Use the trained LSTM_DBN network to test the test set samples. Input the five characteristic gas concentration values to the LSTM model and predict future gas changes. Then calculate the gas concentration ratios and use the ratio results as input to the DBN network to obtain the future running states of the transformer;
(5) If there is fault information in the prediction result, an early warning signal needs to be issued in time and the fault type can be predicted.
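The five steps above can be sketched end-to-end. The fragment below is a minimal illustration of the data flow only: a naive trend extrapolation stands in for the trained LSTM, and the ratio set (C2H2/C2H4, CH4/H2, C2H4/C2H6, the classic IEC three-ratio pairs) is only an assumption about the DBN input; function names, window size, and sample values are ours, not from the paper.

```python
# Sketch of the LSTM_DBN data flow (steps 1-5 above). The trained LSTM is
# replaced by a naive trend extrapolation so the pipeline is runnable;
# all names and sample values here are illustrative assumptions.
GASES = ["H2", "CH4", "C2H2", "C2H4", "C2H6"]

def predict_next(series, window=3):
    """Stand-in for step (2): extrapolate the mean recent slope one step ahead."""
    recent = [float(v) for v in series[-window:]]
    if len(recent) < 2:
        return recent[-1]
    slope = (recent[-1] - recent[0]) / (len(recent) - 1)
    return recent[-1] + slope

def gas_ratios(conc):
    """Step (4): ratios fed to the running-state classifier (IEC-style pairs)."""
    h2, ch4, c2h2, c2h4, c2h6 = (max(conc[g], 1e-9) for g in GASES)
    return [c2h2 / c2h4, ch4 / h2, c2h4 / c2h6]

history = {"H2": [10.0, 11.0, 12.0], "CH4": [14.1, 14.3, 14.5],
           "C2H2": [1.0, 1.1, 1.2], "C2H4": [2.0, 2.2, 2.4],
           "C2H6": [3.0, 3.0, 3.0]}
future = {g: predict_next(history[g]) for g in GASES}   # step (2) output
ratios = gas_ratios(future)                             # input vector for the DBN
```

In the full method, `predict_next` would be the trained LSTM and `ratios` would be passed to the trained DBN classifier of step (3).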

[Flowchart: transformer oil chromatography historical data (H2, CH4, C2H2, C2H4, C2H6) → LSTM network (forward calculation; error back propagation along time) → future gas concentration prediction; transformer fault case library → gas concentration ratios → DBN network (RBM pre-training, fine-tuning, softmax classifier) → running state classification; the trained LSTM_DBN network output is then analyzed.]
Figure 5. Flowchart of transformer running state prediction.
Energies 2018, 11, 1880 8 of 14

5. Case Analysis

5.1. Gas Concentration Prediction

This paper takes the oil chromatographic monitoring data collected by a 220 kV transformer oil chromatography online monitoring device as an example. The sampling interval is 1 day. For the methane gas concentration sequence, 800 monitoring data are selected as training samples and 100 monitoring data are used as test samples. The prediction results are shown in Figure 6.
In order to evaluate the accuracy and validity of the prediction model proposed in this paper, the following evaluation criteria are used for analysis:

avg_err = (1/N) ∑_{i=1}^{N} |x̃_i − x_i| / x_i × 100% (19)

max_err = max_i |x̃_i − x_i| / x_i (20)

where N is the number of samples, x_i is the real value and x̃_i is the predicted value.
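Equations (19) and (20) translate directly into code; a small sketch with our own variable names:

```python
def avg_err(actual, predicted):
    """Equation (19): average relative error, as a percentage."""
    return 100.0 * sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)

def max_err(actual, predicted):
    """Equation (20): worst-case relative error over the test set."""
    return max(abs(p - a) / a for a, p in zip(actual, predicted))
```

For instance, actual values [10, 20] predicted as [11, 19] give avg_err = 7.5% and max_err = 0.1.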
Figure 6. Methane gas concentration prediction results.

Figure 7. Relative percent error.



As shown in Figure 6, the prediction model proposed in this paper has better fitting ability and follows the changing trend of methane gas concentration well. The relative percentage error between the true and predicted values is shown in Figure 7, where the average relative percentage error is 0.26% and the maximum relative percentage error is 1.21%.
The LSTM model is used to predict the other gas concentrations. The predicted results are shown in Table 2, which shows that the average error of the LSTM method is lower than that of the general regression neural network (GRNN), DBN, and SVM. Therefore, it can be seen that the use of LSTM to predict transformer gas concentrations has high stability and reliability.

Table 2. Gas Concentration Prediction Results.

Type of Gas            Average Error (%)
              LSTM     GRNN     DBN      SVM
H2            1.89     5.01     2.48     6.77
CH4           0.26     3.93     1.78     4.01
C2H2          2.45     4.67     1.93     6.32
C2H4          1.45     2.98     2.05     5.94
C2H6          2.1      4.24     1.64     8.46

5.2. Running State Classification

The transformer oil chromatographic gas concentration ratios are used as the input to the DBN network and the seven transformer running states are the output. The case database used in this paper contains a total of 3870 datasets, including 838 normal cases and 3032 failure cases (521 LT cases, 376 MT cases, 587 HT cases, 519 PD cases, 489 LD cases and 540 HD cases). 90% of the sample data are randomly selected from the database to train the DBN network, leaving 10% of the sample data as the test sample to test the accuracy of the classification.
The classification results of the DBN, SVM, and BPNN models on a test set are shown in Figure 8. This paper evaluates the classification results of transformer running states by drawing confusion matrices. Light green squares on the diagonal indicate the number of samples whose predicted category matches the actual category, and the blue squares indicate the number of falsely identified samples. The last row of gray squares is the precision (the number of correctly predicted samples/number of predicted samples). The last column of orange squares is the recall (the number of correctly predicted samples/actual number of samples). The last purple square is the accuracy (all correctly predicted samples/all samples).

(a) DBN results (rows: actual category; columns: predicted category)
           H      LT     MT     HT     PD     LD     HD     recall
H          80     1      0      1      0      1      0      96.4%
LT         2      45     0      2      0      0      3      86.5%
MT         0      1      34     0      1      1      0      91.9%
HT         0      2      3      51     0      0      2      87.9%
PD         0      2      0      0      48     0      1      94.1%
LD         1      0      2      2      1      42     0      87.5%
HD         1      0      0      0      3      1      49     90.7%
precision  95.2%  88.2%  87.2%  91.1%  90.6%  93.3%  89.1%  91.1%

(b) SVM results
           H      LT     MT     HT     PD     LD     HD     recall
H          75     0      1      3      1      1      2      90.4%
LT         2      40     3      2      0      2      3      76.92%
MT         0      2      30     1      2      1      1      81.1%
HT         1      3      2      45     1      3      3      77.6%
PD         2      3      0      1      42     2      1      82.4%
LD         2      1      3      1      2      39     0      81.2%
HD         3      2      0      5      1      2      41     75.93%
precision  88.2%  78.4%  76.9%  77.6%  85.7%  78.0%  80.4%  81.5%

(c) BPNN results
           H      LT     MT     HT     PD     LD     HD     recall
H          71     2      1      1      5      1      2      85.5%
LT         3      37     4      1      1      2      4      71.2%
MT         1      3      25     4      0      2      2      67.6%
HT         1      4      5      40     4      3      1      68.9%
PD         1      2      1      4      41     2      0      80.4%
LD         0      2      2      1      2      38     3      79.2%
HD         3      4      3      5      2      2      35     64.81%
precision  88.8%  68.5%  61.0%  71.4%  74.5%  76.0%  74.5%  74.9%

Figure 8. Comparison of classification results: (a) DBN results; (b) SVM results; (c) BPNN results.

From Figure 8, it can be seen that, compared with the SVM model and the BPNN model, the DBN model has the highest classification accuracy, exceeding them by 9.6% and 16.2%, respectively. The precision and recall rate of the DBN model are both high, exceeding 85%. The comparison shows that the DBN model has a good effect on the classification of transformer running states. Since a single experiment may be accidental, this paper repeats 10 sets of tests on the DBN model, the SVM model, and the BPNN model to obtain the average accuracy of each. The average accuracy of the three models is 89.4%, 80.1%, and 71.9%, respectively. Therefore, it can be seen that the DBN model has strong classification stability while maintaining a high accuracy.
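The precision, recall, and accuracy conventions defined for the confusion matrices can be checked mechanically. The sketch below (ours, not from the paper) recomputes the headline figures from the DBN confusion matrix of Figure 8a, with rows as actual classes and columns as predicted classes:

```python
import numpy as np

# DBN confusion matrix transcribed from Figure 8a; class order H, LT, MT, HT, PD, LD, HD.
# Rows are actual classes, columns are predicted classes.
cm = np.array([
    [80,  1,  0,  1,  0,  1,  0],   # H
    [ 2, 45,  0,  2,  0,  0,  3],   # LT
    [ 0,  1, 34,  0,  1,  1,  0],   # MT
    [ 0,  2,  3, 51,  0,  0,  2],   # HT
    [ 0,  2,  0,  0, 48,  0,  1],   # PD
    [ 1,  0,  2,  2,  1, 42,  0],   # LD
    [ 1,  0,  0,  0,  3,  1, 49],   # HD
])

recall = np.diag(cm) / cm.sum(axis=1)      # correctly predicted / actual, per class
precision = np.diag(cm) / cm.sum(axis=0)   # correctly predicted / predicted, per class
accuracy = np.trace(cm) / cm.sum()         # all correctly predicted / all samples
```

Rounding `100 * accuracy` to one decimal reproduces the 91.1% reported in Figure 8a, and the per-class values match the recall column and precision row of the matrix.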
5.3. Running State Prediction

The oil chromatogram data from January to October 2015 of a main transformer in a substation are selected for analysis. The sampling interval for data points is 12 h. The original data are shown in Figure 9.
Figure 9. Original oil chromatogram data.

First, using the IEC three-ratio method integrated in the original system for analysis, there is no abnormal warning before September. In September, the measured ratio code is 021, which is consistent with a thermal fault of medium temperature, so an abnormal warning should be issued at this time. Secondly, using the integrated threshold method in the original system, H2 content in excess of 150 µL/L is detected in October and an early warning signal is required.
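The integrated threshold method described above reduces to a per-sample comparison; a minimal sketch (the 150 µL/L limit is from the text, while the function name and sample values are ours):

```python
H2_LIMIT = 150.0  # µL/L, the hydrogen limit used by the integrated threshold method

def h2_warnings(h2_series, limit=H2_LIMIT):
    """Return indices of samples whose H2 content exceeds the limit."""
    return [i for i, v in enumerate(h2_series) if v > limit]

# Any sample exceeding the limit triggers the early warning signal.
alarms = h2_warnings([120.0, 149.5, 151.2, 160.0])   # -> [2, 3]
```

Note that such a check only fires once the limit is crossed, which is exactly why the trend-based LSTM_DBN prediction can warn earlier.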
Using the LSTM_DBN model proposed in this paper, the transformer running state is predicted and evaluated. Starting from the 5th month, the LSTM model is used to predict the transformer gas concentration values in the next month; then the gas concentration ratios are calculated and input into the DBN network to get the transformer's future running state. The transformer's running state from May to October is shown in Table 3.

Table 3. Transformer running state prediction results.

Month      H    LT   MT   HT   PD   LD   HD   Fault Case Rate
May        57   1    3    0    1    0    0    8.1%
June       53   0    5    1    1    0    0    11.7%
July       49   0    10   2    0    0    1    20.9%
August     30   2    28   1    0    1    0    51.6%
September  21   4    34   1    0    0    0    65%
October    16   3    37   4    0    2    0    74.2%
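The fault case rate column of Table 3 is consistent with reading it as the share of analysed cases classified as any state other than normal (H). A hypothetical recomputation for three of the months, with the counts transcribed from Table 3:

```python
# Monthly state counts from Table 3 (H = normal; the rest are fault states).
months = {
    "May":     {"H": 57, "LT": 1, "MT": 3,  "HT": 0, "PD": 1, "LD": 0, "HD": 0},
    "August":  {"H": 30, "LT": 2, "MT": 28, "HT": 1, "PD": 0, "LD": 1, "HD": 0},
    "October": {"H": 16, "LT": 3, "MT": 37, "HT": 4, "PD": 0, "LD": 2, "HD": 0},
}

def fault_case_rate(counts):
    """Share of cases whose predicted state is not normal, as a percentage."""
    total = sum(counts.values())
    return 100.0 * (total - counts["H"]) / total

rates = {m: round(fault_case_rate(c), 1) for m, c in months.items()}
# rates -> {'May': 8.1, 'August': 51.6, 'October': 74.2}, matching Table 3
```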

As can be seen from Table 3, the percentage of fault cases obtained through analysis using the LSTM_DBN model is gradually increasing: the percentage in August has exceeded 50%, and the highest percentage of fault cases, 74.2%, occurs in October. It can be seen that there is a potential operational failure. Table 3 also shows that, among all the fault type analysis results, the number of fault cases with MT is the largest, so there is a potential thermal fault of medium temperature, and early warning signals need to be sent. The oil chromatography monitoring device can be interfered with by the external environment, causing errors in data acquisition. When the fault cases account for more than 50%, this should immediately attract the attention of the staff. For this case, an equipment early warning should be issued in August: “Closely monitor the development trend of the chromatographic data and check the transformer status in time”.
The operation and maintenance personnel’s detection records show that the oil temperature rises abnormally from June onwards and the value of the core grounding current increases gradually. The value of H2 in the oil chromatographic device exceeds 150 µL/L from October to December. During the outage maintenance in 2016, traces of burning are found at the end of the winding and the B phase winding is distorted. The prediction results of the transformer running state through the LSTM_DBN model are consistent with this actual situation. This example shows that the transformer running state prediction method based on the LSTM_DBN model can detect the abnormal upward trend of oil chromatographic data in time and provide early warning of the abnormal state of the transformer.

6. Conclusions
(1) The LSTM model has an excellent ability to process time series and alleviates problems such as vanishing gradients, exploding gradients, and the lack of long-term memory in the training process, so it can fully utilize historical data. The DBN model can extract the characteristic information hidden in fault case data layer by layer and has high classification ability.
(2) The transformer running state prediction method based on the LSTM_DBN model presented in this paper has high accuracy and can send warning information about potential transformer faults in time. Compared with the standard threshold method and the state prediction methods in the research literature, the method in this paper can make full use of the historical and current state data.
(3) In the next step, we will focus on improving the LSTM model and the DBN model, as well as on parameter optimization, to further improve the transformer state prediction accuracy. Because few substations are equipped with complete online monitoring equipment and rich state data, the method proposed in this paper needs further verification.

Author Contributions: Jun Lin designed the algorithm, tested the example and wrote the manuscript. Lei Su, Gehao Sheng, Yingjie Yan, Da Xie and Xiuchen Jiang helped design the algorithm and debug the code.
Acknowledgments: This work was supported in part by the National Natural Science Foundation of China
(51477100) and the State Grid Science and Technology Program of China.
Conflicts of Interest: The authors declare no conflict of interest.

Nomenclature
Variables
x the input vector
y the output vector
h the state of the hidden layer
W xh the weight matrix of the input layer to the hidden layer of RNN network
W hy the weight matrix of the hidden layer to the output layer of RNN network
W hh the weight matrix of the hidden layer state as the input at the next moment of RNN network
f (t) the result of the forget state
Wf the weight matrix of forget state
bf the offset of forget state
i(t) the input gate state result
c( t )
e the cell state input at time t
Wi the input gate weight matrix
Wc the input cell state weight matrix
bi the input gate bias
bc the input cell state bias
o(t) the output gate state result
Wo the output gate weight matrix
bo the output gate offset
v a visible layer
w the weights between visible layers and hidden layers
θ the parameter of RBM
ωij the connection weight between the visible layer node vi and the hidden layer node hj
ai the offset of vi
bj the offset of hj
Symbol
σ the activation function
◦ element-wise multiplication (Hadamard product)

References
1. Taha, I.B.M.; Mansour, D.A.; Ghoneim, S.S.M. Conditional probability-based interpretation of dissolved gas
analysis for transformer incipient faults. IET Gener. Transm. Distrib. 2017, 11, 943–951. [CrossRef]
2. Singh, S.; Bandyopadhyay, M.N. Dissolved gas analysis technique for incipient fault diagnosis in power
transformers: A bibliographic survey. IEEE Electr. Insul. Mag. 2010, 26, 41–46. [CrossRef]
3. Cruz, V.G.M.; Costa, A.L.H.; Paredes, M.L.L. Development and evaluation of a new DGA diagnostic method
based on thermodynamics fundamentals. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 888–894. [CrossRef]
4. Yan, Y.; Sheng, G.; Liu, Y.; Du, X.; Wang, H.; Jiang, X.C. Anomalous State Detection of Power Transformer
Based on Algorithm Sliding Windows and Clustering. High Volt. Eng. 2016, 42, 4020–4025.
5. Gouda, O.S.; El-Hoshy, S.H.; El-Tamaly, H.H. Proposed heptagon graph for DGA interpretation of oil
transformers. IET Gener. Transm. Distrib. 2018, 12, 490–498. [CrossRef]
6. Malik, H.; Mishra, S. Application of gene expression programming (GEP) in power transformers fault
diagnosis using DGA. IEEE Trans. Ind. Appl. 2016, 52, 4556–4565. [CrossRef]
7. Khan, S.A.; Equbal, M.D.; Islam, T. A comprehensive comparative study of DGA based transformer fault
diagnosis using fuzzy logic and ANFIS models. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 590–596. [CrossRef]
8. Li, J.; Zhang, Q.; Wang, K. Optimal dissolved gas ratios selected by genetic algorithm for power transformer fault
diagnosis based on support vector machine. IEEE Trans. Dielectr. Electr. Insul. 2016, 23, 1198–1206. [CrossRef]
9. Zheng, R.; Zhao, J.; Zhao, T.; Li, M. Power Transformer Fault Diagnosis Based on Genetic Support Vector
Machine and Gray Artificial Immune Algorithm. Proc. CSEE 2011, 31, 56–64.
10. Tripathy, M.; Maheshwari, R.P.; Verma, H.K. Power transformer differential protection based on optimal
probabilistic neural network. IEEE Trans. Power Del. 2010, 25, 102–112. [CrossRef]
11. Ghoneim, S.S.M.; Taha, I.B.M.; Elkalashy, N.I. Integrated ANN-based proactive fault diagnostic scheme
for power transformers using dissolved gas analysis. IEEE Trans. Dielectr. Electr. Insul. 2016, 23, 586–595.
[CrossRef]
12. Li, S.; Wu, G.; Gao, B.; Hao, C.; Xin, D.; Yin, X. Interpretation of DGA for transformer fault diagnosis with
complementary SaE-ELM and arctangent transform. IEEE Trans. Dielectr. Electr. Insul. 2016, 23, 586–595.
[CrossRef]
13. Ghoneim, S.S.M. Intelligent Prediction of Transformer Faults and Severities Based on Dissolved Gas Analysis
Integrated with Thermodynamics Theory. IET Sci. Meas. Technol. 2018, 12, 388–394. [CrossRef]
14. Zhao, W.; Zhu, Y.; Zhang, X. Combinational Forecast for Transformer Faults Based on Support Vector
Machine. Proc. CSEE 2008, 28, 14–19.
15. Zhou, Q.; Sun, C.; Liao, R.J. Multiple Fault Diagnosis and Short-term Forecast of Transformer Based on
Cloud Theory. High Volt. Eng. 2014, 40, 1453–1460.
16. Liu, Z.; Song, B.; Li, E. Study of “code absence” in the IEC three-ratio method of dissolved gas analysis.
IEEE Electr. Insul. Mag. 2015, 31, 6–12. [CrossRef]
17. Song, E.; Soong, F.K.; Kang, H.G. Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based
Speech Synthesis Systems. IEEE Trans. Speech Audio Process. 2017, 25, 2152–2161. [CrossRef]
18. Gao, L.; Guo, Z.; Zhang, H. Video captioning with attention-based lstm and semantic consistency.
IEEE Trans. Multimed. 2017, 19, 2045–2055. [CrossRef]
19. Zhao, J.; Qu, H.; Zhao, J. Towards traffic matrix prediction with LSTM recurrent neural networks. Electron. Lett.
2018, 54, 566–568. [CrossRef]
20. Lin, J.; Sheng, G.; Yan, Y.; Dai, J.; Jiang, X. Prediction of Dissolved Gases Concentration in Transformer Oil
Based on KPCA_IFOA_GRNN Model. Energies 2018, 11, 225. [CrossRef]
21. Dai, J.J.; Song, H.; Sheng, G.H. Dissolved gas analysis of insulating oil for power transformer fault diagnosis
with deep belief network. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 2828–2835. [CrossRef]
22. Ma, M.; Sun, C.; Chen, X. Discriminative Deep Belief Networks with Ant Colony Optimization for Health
Status Assessment of Machine. IEEE Trans. Instrum. Meas. 2017, 66, 3115–3125. [CrossRef]
23. Beevi, K.S.; Nair, M.S.; Bindu, G.R. A Multi-Classifier System for Automatic Mitosis Detection in Breast
Histopathology Images Using Deep Belief Networks. IEEE J. Transl. Eng. Health Med. 2017, 5, 1–11. [CrossRef]
[PubMed]
24. Karim, F.; Majumdar, S.; Darabi, H. LSTM fully convolutional networks for time series classification.
IEEE Access 2017, 6, 1662–1669. [CrossRef]

25. Zhang, Q.; Wang, H.; Dong, J. Prediction of Sea Surface Temperature Using Long Short-Term Memory.
IEEE Geosci. Remote Sen. Lett. 2017, 14, 1745–1749. [CrossRef]
26. Zhang, S.; Wang, Y.; Liu, M. Data-Based Line Trip Fault Prediction in Power Systems Using LSTM Networks
and SVM. IEEE Access 2017, 6, 7675–7686. [CrossRef]
27. Song, H.; Dai, J.; Luo, L.; Sheng, G.; Jiang, X. Power Transformer Operating State Prediction Method Based
on an LSTM Network. Energies 2018, 11, 914. [CrossRef]
28. Zhong, P.; Gong, Z.; Li, S. Learning to diversify deep belief networks for hyperspectral image classification.
IEEE Geosci. Remote Sen. Lett. 2017, 55, 3516–3530. [CrossRef]
29. Lu, N.; Li, T.; Ren, X. A deep learning scheme for motor imagery classification based on restricted boltzmann
machines. IEEE Trans. Neural Syst. Rehabil. Eng. 2017, 25, 566–576. [CrossRef] [PubMed]
30. Chen, Y.; Zhao, X.; Jia, X. Spectral–spatial classification of hyperspectral data based on deep belief network.
IEEE J. Sel. Top. Appl. Earth Observ. 2015, 8, 2381–2392. [CrossRef]
31. Taji, B.; Chan, A.D.C.; Shirmohammadi, S. False Alarm Reduction in Atrial Fibrillation Detection Using Deep
Belief Networks. IEEE Trans. Instrum. Meas. 2018, 67, 1124–1131. [CrossRef]

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
