Energies 11 03018
Article
Wind Turbine Multi-Fault Detection and
Classification Based on SCADA Data
Yolanda Vidal * , Francesc Pozo and Christian Tutivén
Control, Modeling, Identification and Applications (CoDAlab), Department of Mathematics, Escola d’Enginyeria
de Barcelona Est (EEBE), Universitat Politècnica de Catalunya (UPC), Campus Diagonal-Besòs (CDB), Eduard
Maristany, 16, 08019 Barcelona, Spain; [email protected] (F.P.); [email protected] (C.T.)
* Correspondence: [email protected]; Tel.: +34-934-137-309
Abstract: Due to the increasing installation of wind turbines in remote locations, both onshore and
offshore, advanced fault detection and classification strategies have become crucial to accomplish
the required levels of reliability and availability. In this work, without using specific tailored devices
for condition monitoring but only increasing the sampling frequency in the already available (in
all commercial wind turbines) sensors of the Supervisory Control and Data Acquisition (SCADA)
system, a data-driven multi-fault detection and classification strategy is developed. An advanced
wind turbine benchmark is used. The wind turbine we consider is subject to different types
of faults on actuators and sensors. The main challenges of wind turbine fault detection lie
in the non-linearity of the system, unknown disturbances, and significant measurement noise at each sensor.
First, the SCADA measurements are pre-processed by group scaling and feature transformation
(from the original high-dimensional feature space to a new space with reduced dimensionality)
based on multiway principal component analysis through sample-wise unfolding. Then, 10-fold
cross-validation support vector machines-based classification is applied. In this work, support vector
machines were used as a first choice for fault detection as they have proven their robustness for some
particular faults, but at the same time have never accomplished the detection and classification of
all the proposed faults considered in this work. To this end, the choice of the features as well as
the selection of data are of primary importance. Simulation results showed that all studied faults
were detected and classified with an overall accuracy of 98.2%. Finally, it is noteworthy that the
prediction speed allows this strategy to be deployed for online (real-time) condition monitoring in
wind turbines.
Keywords: wind turbine; fault detection; fault classification; fault diagnosis; principal component
analysis; support vector machines; FAST (Fatigue, Aerodynamics, Structures and Turbulence) code
1. Introduction
Wind energy offers many advantages, as it is an inexhaustible clean fuel source. This explains why
it is one of the fastest-growing renewable sources against greenhouse effects. Currently, research efforts
are aimed at minimizing the overall cost of this energy. The tendency to use larger wind turbines (WTs)
in harsh operating environments (e.g., offshore) implies that one of the main cost drivers is directly
related to operation and maintenance actions. Thus, fault diagnosis (FD) is crucial for wind power to
be cost-competitive, and even more so for offshore wind farms where bad weather conditions (e.g.,
storms, high tides, etc.) can prevent any repair actions for several weeks.
A variety of surveys on FD considering different WT components have recently been published.
For example, in [1] a wide variety of WT fault locations are considered—rotor, gearbox, bearing, main
shaft, hydraulic system, tower, generator, and sensors—as well as the different signal processing
methods that are most frequently used in the literature to deal with these types of faults. Reference [2]
mainly aims to survey the most recent condition and performance monitoring approaches of WTs
with the primary focus on blade, gearbox, generator, braking system, and rotor. However, the more
recent trend in this type of literature review is to focus on a specific WT sub-assembly: the bearings
and planetary gearbox [3,4], the generator and power converter [5,6], the blades [7,8], etc. Most of
these methods, which focus on a specific part of the WT, require the choice of the most appropriate
sensors, their advisable position in the sub-assembly, and the most convenient strategy to extract as
much information as possible from the obtained data. These are highly localized strategies, and each
one relies on the installation of (costly) extra sensors. However, it should be possible to retrofit a
multi-fault condition monitoring package onto existing WTs without requiring additional sensors and
wiring on the machines. In fact, there is a large amount of operational (Supervisory Control and Data
Acquisition—SCADA) data available (already collected at the WT controller), which can be used to
diagnose the turbine condition. This section addresses the state-of-the-art in the FD of WT faults using
SCADA data.
In recent years, there have been efforts to develop FD strategies by analyzing only SCADA
data. The use of machine learning techniques has been crucial in this area. For example, in [9], fault
prediction and diagnosis for the WT generator is accomplished using real-world SCADA data from two
wind power plants located in China based on principal component analysis (PCA) and unsupervised
clustering methods. In [10], a FD strategy for WT gearboxes is proposed based on artificial neural
networks (ANNs) and tested on real-world SCADA data sets of a wind farm in Southern Italy. In [11],
a strategy to diagnose WT faults from SCADA data using support vector machines (SVMs) is proposed.
Generally, the classification methods that deserve special mention are SVM and ANN, because of their
ability to handle non-linear and noisy data. On one hand, the use of ANNs has drawbacks related to
their training time and their dependence on careful fine-tuning of their parameters. In particular,
in [12] the correct number of parameters and their corresponding values must be carefully selected to
create a normal behavior model based on an ANN. On the other hand, the SVM is simpler and has
successfully proven its suitability in this type of problem. Thus, the SVM is the selected classifier in
this paper.
Considerable research has been done on FD methods based on SVM classifiers that analyze
only SCADA data. For example, different faults are studied in [13], but faults in the pitch actuators
unfortunately could not be detected, and furthermore, the sampling period (0.01 s) is infeasible.
Note that SCADA data is typically recorded at 10-minute intervals to reduce transmitted data
bandwidth and storage. In [14], an SVM could isolate some faults, except for those with highly varying dynamics
(including a pitch actuator fault), where the use of an observer, which is model-based, was found
necessary and, again, the sampling period was 0.01 s. Later references based on SVM are mainly
specifically tailored for a particular type of fault. For example, in [15] an SVM-based method is
proposed to classify the misalignment type of fault; generator faults are diagnosed in [9]; only actuator
faults are considered in [16]; and generator and power feeder cables faults are diagnosed in [11]. In this
paper, we widen the number and type of the studied faults with a unique strategy to cope with them
all: three different pitch actuator faults (i.e., high air content in oil, pump wear, hydraulic leakage),
a generator speed sensor fault (gain factor of 1.2), three different pitch sensor faults (stuck at 5 deg,
stuck at 10 deg, and with a gain factor of 1.2), and a torque actuator offset fault.
As has been noted previously, one of the major drawbacks to using SCADA data is the 10-minute
sampling period. This low-frequency resolution negatively affects the diagnosis capabilities, and
may hide short-lived events. On the other hand, high-resolution (but feasible) SCADA data should
allow the dynamic turbine behavior to be identified with higher fidelity and thus improve detection
efficiency. Following [17,18], this work proposes a research framework that takes SCADA data
sampled from the sensors at an additional high, but feasible, rate (sampling period of 1 s). That is, the only requirement is to
increase the sampling rate of the SCADA data from the already available sensors. Following this idea,
in this work, we propose a strategy to detect and classify (through SVM) multiple WT faults using only
Energies 2018, 11, 3018 3 of 18
conventional SCADA data with an additional, but feasible (sampling period of 1 s), high-frequency
sampling from the sensors and without the added cost of retrofitting additional sensors to the turbine.
This paper is organized as follows. In Section 2, the WT benchmark model is introduced and
the proposed FD strategy is described. The obtained results are presented and discussed in Section 3.
Section 4 states the conclusions and future work.
The most important features of the WT are detailed in Table 2. In this paper, we deal with the full
load region of operation, in which the main objective of the proposed controller is for the electric
power to closely follow the rated power.
A set of fault scenarios is defined in the WT model. These scenarios are primarily introduced in
sensors and actuators. More precisely, the types of faults are gain factors, offsets, changes in the system
dynamics, and stuck values, as shown in Table 3. These faults are inspired by research in both proprietary
and public domain sources [23]. As an extra reference, the interested reader can find a comprehensive
description of these faults and their importance in [24].
how the results of a statistical analysis will generalize to an independent data set, was also considered,
and therefore the impact of a particular noisy subset of data was minimized.
• Fault 8 (F8) is required to fulfill Td < 3Ts . This is the most restrictive detection time, as this is the
most severe fault. It is related to the torque actuator, and it is noteworthy that the torque rate limit
for the NREL 5-MW WT is 15,000 Nm/s [26].
• Fault 1 (F1) is required to fulfill Td < 8Ts. This fault has highly varying dynamics and is related to
the pitch actuator (i.e., high air content in oil). In this case, the blade-pitch rate limit for the NREL
5-MW WT is 8 deg/s, as this is speculated to be the blade pitch rate limit of conventional 5-MW
machines based on General Electric (GE) Wind’s long blade test program [26].
• Faults 4 to 7 (F4, F5, F6, F7) are required to fulfill Td < 10Ts . These faults are related to the
generator speed sensor and the pitch sensors.
• Finally, Faults 2 and 3 (F2, F3) are only required to satisfy Td < 100Ts, as these are faults
with very slow dynamics. These faults are related to the pitch actuator (i.e., pump wear and
hydraulic leakage).
Using the three most restrictive requirements, it is proposed to organize the available data from
the simulations in three different manners:
(a) In samples of only J = 3 time steps (this will lead to a detection time of approximately 3Ts ).
(b) In samples of J = 8 time steps (in this case, detection time is close to 8Ts ).
(c) In samples of J = 10 time steps (for a detection time of around 10Ts).
The goal of the remainder of this section is to show how the data were reshaped in samples of
J time steps. As said before, the data came from 260 simulations of 400 s duration each (with a time
step of 1 s) and nine sensors available. These data were initially stored, for each sensor, in a matrix
as follows:
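$$
\begin{pmatrix}
x_{1,1}^{(k)} & x_{1,2}^{(k)} & \cdots & x_{1,\lfloor 400/J \rfloor J}^{(k)} \\
x_{2,1}^{(k)} & x_{2,2}^{(k)} & \cdots & x_{2,\lfloor 400/J \rfloor J}^{(k)} \\
\vdots & \vdots & \ddots & \vdots \\
x_{i,1}^{(k)} & x_{i,2}^{(k)} & \cdots & x_{i,\lfloor 400/J \rfloor J}^{(k)} \\
\vdots & \vdots & \ddots & \vdots \\
x_{260,1}^{(k)} & x_{260,2}^{(k)} & \cdots & x_{260,\lfloor 400/J \rfloor J}^{(k)}
\end{pmatrix}
\in \mathcal{M}_{260 \times \lfloor 400/J \rfloor J}(\mathbb{R}), \tag{1}
$$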
where the super-index (k) is related to the different sensors k = 1, 2, . . . , 9. That is, there is one of these
matrices for each sensor. The matrix has as many rows as simulations (260). The number of columns is
taken as $\lfloor 400/J \rfloor J$, where $\lfloor \cdot \rfloor$ is the floor function, to ensure that the matrix can later be reshaped into a
matrix of J columns. When J = 8 or J = 10, this results in using the whole 400 s of each simulation
($\lfloor 400/J \rfloor J = 400$ in these two cases, as 8 and 10 are divisors of 400). However, when J = 3,
$\lfloor 400/J \rfloor J = 399$; thus, in this case only 399 s of each simulation are used.
As said before, when a WT has to be diagnosed, it is desirable that a diagnosis can be obtained
with a few seconds of measured data. Thus, instead of working with the matrices in Equation (1)
(where each sample would correspond to 399 or 400 s of data), data were reshaped in a matrix with
only J columns (as stated before, in this work J = 3, 8, or 10) as follows:
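$$
\begin{pmatrix}
x_{1,1}^{(k)} & x_{1,2}^{(k)} & \cdots & x_{1,J}^{(k)} \\
x_{1,J+1}^{(k)} & x_{1,J+2}^{(k)} & \cdots & x_{1,2J}^{(k)} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1,400-J+1}^{(k)} & x_{1,400-J+2}^{(k)} & \cdots & x_{1,\lfloor 400/J \rfloor J}^{(k)} \\
x_{2,1}^{(k)} & x_{2,2}^{(k)} & \cdots & x_{2,J}^{(k)} \\
x_{2,J+1}^{(k)} & x_{2,J+2}^{(k)} & \cdots & x_{2,2J}^{(k)} \\
\vdots & \vdots & \ddots & \vdots \\
x_{2,400-J+1}^{(k)} & x_{2,400-J+2}^{(k)} & \cdots & x_{2,\lfloor 400/J \rfloor J}^{(k)} \\
\vdots & \vdots & \ddots & \vdots \\
x_{260,1}^{(k)} & x_{260,2}^{(k)} & \cdots & x_{260,J}^{(k)} \\
x_{260,J+1}^{(k)} & x_{260,J+2}^{(k)} & \cdots & x_{260,2J}^{(k)} \\
\vdots & \vdots & \ddots & \vdots \\
x_{260,400-J+1}^{(k)} & x_{260,400-J+2}^{(k)} & \cdots & x_{260,\lfloor 400/J \rfloor J}^{(k)}
\end{pmatrix}
\in \mathcal{M}_{\left(260 \lfloor 400/J \rfloor\right) \times J}(\mathbb{R}), \tag{2}
$$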
where J defines the number of seconds of each sample, and recall that the super-index (k) is related to
the different sensors k = 1, 2, . . . , 9. The total number of samples is given by $I = 260 \cdot \lfloor 400/J \rfloor$.
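The reshaping from Equation (1) to Equation (2) can be illustrated with a minimal sketch. It assumes the data of one sensor are held as a plain list of rows (one row per simulation); the function names `reshape_into_samples` and `number_of_samples` and the toy values are hypothetical and are not taken from the original implementation.

```python
def reshape_into_samples(sensor_matrix, J):
    """Split every simulation (row) into consecutive, non-overlapping samples of J time steps.

    Trailing time steps that do not fill a complete sample are discarded,
    which mirrors the use of floor(400 / J) * J columns in Equation (1).
    """
    samples = []
    for simulation in sensor_matrix:
        n_samples = len(simulation) // J  # floor(400 / J) for 400-step rows
        for s in range(n_samples):
            samples.append(simulation[s * J:(s + 1) * J])
    return samples


def number_of_samples(n_simulations, n_steps, J):
    """Total number of samples I = n_simulations * floor(n_steps / J)."""
    return n_simulations * (n_steps // J)


# Toy stand-in for one sensor: 260 simulations of 400 time steps (values are arbitrary).
toy = [[float(400 * r + t) for t in range(400)] for r in range(260)]

for J in (3, 8, 10):
    reshaped = reshape_into_samples(toy, J)
    print(J, len(reshaped), number_of_samples(260, 400, J))
```

With 260 simulations of 400 s, this gives I = 34,580 samples for J = 3, I = 13,000 for J = 8, and I = 10,400 for J = 10.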
Figure 1 illustrates how the available data from the 260 long run simulations (see Equation (1))
were reorganized in a third-order tensor (multidimensional array with three indices) with short time
samples of J time steps (see Equation (2)). The first J data-points determine the first sample (represented
by the light blue color box in Figure 1). Immediately after, the next J data-points determine the second
sample (red color box), etc. After the last J data-points of the first simulation (light green), the first J
data-points of the second simulation (orange box) define the next sample, and so on. In general, let us
consider that we have different sensors k = 1, 2, . . . , K stored at j = 1, 2, . . . , J time instants. Similar
data are generated for a number of samples i = 1, 2, . . . , I. This results in the third-order tensor X
( I × J × K ) as illustrated in Figure 1, where the height (I) gives the number of samples; the width (J)
gives the number of time instants; and the length (K) gives the number of sensors.
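[Figure 1 diagram: left, the long-run simulations (1, 2, . . . , 260) with axes time (1, J, 2J, . . .) and sensors; right, the tensor with axes samples (i = 1, . . . , I), time (j = 1, . . . , J), and sensors (k).]
Figure 1. Reshape data from long-run simulations (left) into a third-order tensor X ( I × J × K ) with
short time samples of J seconds (right).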
The crux of the matter for fault detection by SVM is the definition of the features to be used for
classification [13]. In this work, statistical analysis by multiway PCA is used for pretreatment of the
raw data. This is equivalent to implementing basic PCA on a large two-dimensional matrix assembled
by unfolding the third-order tensor X , see Figure 1. There are three possible ways of unfolding this
tensor, as suggested by [28]. In general, sample-wise unfolding facilitates the analysis of the variability
among samples by summarizing the information related to the measured variables (sensors) and their
variations over time. Thus, in this work, the sample-wise unfolding is used (see Figure 2), where
$$
X\,(I \times J \times K) \longrightarrow X\,(I \times JK). \tag{3}
$$
That is, the I × J planes are concatenated into a large two-dimensional matrix X. In summary,
multiway PCA of the third-order tensor X in Figure 1 is implemented considering PCA of the
sample-wise unfolded matrix X in Equation (3).
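As an illustration of sample-wise unfolding, the following minimal sketch concatenates the I × J planes of a small tensor, one sensor after another. The nested-list layout `tensor[i][j][k]` and the function name `unfold_sample_wise` are assumptions made for this example only.

```python
def unfold_sample_wise(tensor):
    """Unfold tensor[i][j][k] (samples x time x sensors) into an I x (J*K) matrix.

    Each row is one sample. Columns are grouped sensor by sensor:
    the J time instants of sensor 1, then the J time instants of sensor 2, etc.
    """
    unfolded = []
    for sample in tensor:
        J = len(sample)
        K = len(sample[0])
        row = [sample[j][k] for k in range(K) for j in range(J)]
        unfolded.append(row)
    return unfolded


# Toy tensor with I = 2 samples, J = 3 time instants, and K = 2 sensors.
tensor = [
    [[1, 10], [2, 20], [3, 30]],
    [[4, 40], [5, 50], [6, 60]],
]
X = unfold_sample_wise(tensor)
print(X)  # [[1, 2, 3, 10, 20, 30], [4, 5, 6, 40, 50, 60]]
```

Each row of the result holds all the information of one sample, which is what allows PCA to summarize the variability among samples.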
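[Figure 2 diagram: the tensor with axes samples (i), sensors (k), and time (j) is unfolded into the matrix X, whose columns 1, . . . , J, . . . , 2J, . . . , JK are grouped as Sensor (1), Sensor (2), . . . , Sensor (k), . . . , Sensor (K) and whose rows are the samples (i).]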
$$
\mu_j = \frac{1}{I} \sum_{i=1}^{I} x_{ij}, \quad j = 1, \ldots, JK, \tag{4}
$$
$$
\sigma_j = \sqrt{\frac{1}{I} \sum_{i=1}^{I} (x_{ij} - \mu_j)^2}, \quad j = 1, \ldots, JK, \tag{5}
$$
where $\mu_j$ and $\sigma_j$ are the mean and the standard deviation, respectively, of all the measures at column j.
Accordingly, the elements of matrix X are normalized to create a new matrix X̃ as
$$
\tilde{x}_{ij} := \frac{x_{ij} - \mu_j}{\sigma_j}, \quad i = 1, \ldots, I, \quad j = 1, \ldots, JK. \tag{6}
$$
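The scaling in Equations (4)-(6) can be sketched in a few lines. This is a minimal illustration, assuming the unfolded matrix is a list of rows and that no column is constant (so that every standard deviation is non-zero); the function name `column_scale` is hypothetical.

```python
import math


def column_scale(X):
    """Return the scaled matrix together with the column means and standard deviations.

    The standard deviation uses the 1/I normalization of Equation (5).
    Assumption: no column is constant, so no sigma_j is zero.
    """
    n_rows = len(X)
    n_cols = len(X[0])
    mu = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
    sigma = [
        math.sqrt(sum((row[j] - mu[j]) ** 2 for row in X) / n_rows)
        for j in range(n_cols)
    ]
    X_tilde = [[(row[j] - mu[j]) / sigma[j] for j in range(n_cols)] for row in X]
    return X_tilde, mu, sigma


# Toy matrix with two samples and two columns.
X_toy = [[1.0, 10.0], [3.0, 30.0]]
X_tilde, mu, sigma = column_scale(X_toy)
print(mu, sigma, X_tilde)
```

The returned `mu` and `sigma` would be stored so that new data to be diagnosed can be scaled with the same values.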
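Since the input data are given in a mean-centered matrix X̃, the empirical covariance matrix, S,
can be computed as
$$
S = \frac{1}{I-1} \tilde{X}^T \tilde{X} \in \mathcal{M}_{(JK) \times (JK)}(\mathbb{R}). \tag{7}
$$
$$
S = P D P^T, \tag{8}
$$
$$
\frac{\lambda_1 + \cdots + \lambda_d}{\lambda_1 + \cdots + \lambda_{JK}}. \tag{9}
$$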
In the first case, when J = 3, from a total of J × K = 3 × 9 = 27 components, 99.98% of the variance
is retained by the first d = 16 components. When J = 8, from a total of J × K = 8 × 9 = 72
components, the first d = 42 components are needed to keep 99.98% of the variance. Finally, when
J = 10, from a total of J × K = 10 × 9 = 90 components, the demanded variance is retained by
the first d = 52 components. Thus, the matrix $P_d \in \mathcal{M}_{(JK) \times d}(\mathbb{R})$, with only the first d columns of P, is
used. Finally, the score matrix $Y \in \mathcal{M}_{I \times d}(\mathbb{R})$ (transformed coordinates of the X̃ data in the new
basis given by the first d principal components), whose columns will be used as features by the SVM
strategy, is computed as
$$
Y = \tilde{X} P_d. \tag{10}
$$
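The choice of d from the ratio in Equation (9) and the projection in Equation (10) can be sketched as follows. This is a minimal illustration that assumes the eigenvalues and the matrix $P_d$ have already been computed by some eigendecomposition routine; the function names, the toy eigenvalues, and the toy matrices are hypothetical.

```python
def number_of_components(eigenvalues, target=0.9998):
    """Smallest d whose leading eigenvalues reach the target share of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for d, lam in enumerate(ordered, start=1):
        cumulative += lam
        if cumulative / total >= target:
            return d
    return len(ordered)


def project(X_tilde, P_d):
    """Score matrix Y = X_tilde * P_d, written as an explicit matrix product."""
    n_features = len(P_d)
    d = len(P_d[0])
    return [
        [sum(row[j] * P_d[j][c] for j in range(n_features)) for c in range(d)]
        for row in X_tilde
    ]


# Toy eigenvalues (not the values of the paper).
toy_eigenvalues = [5.0, 3.0, 1.5, 0.5]
print(number_of_components(toy_eigenvalues, target=0.75))  # 2
print(number_of_components(toy_eigenvalues, target=0.9))   # 3

# Toy projection onto a single direction.
print(project([[1.0, 2.0], [3.0, 4.0]], [[1.0], [0.0]]))  # [[1.0], [3.0]]
```

The default threshold of 0.9998 corresponds to the 99.98% of variance mentioned in the text.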
[Figure 3 diagram: (+) and (−) samples in the (x1, x2) plane, separated by the optimal hyperplane with maximum margin.]
Figure 3. Linear support vector machine (SVM) in a two-dimensional example.
$$
h(x) = \omega^T x + b, \tag{11}
$$
where b is known as the bias term and ω is the weight vector. The optimal hyperplane can be
characterized in an infinite number of different ways by scaling of b and ω. As a matter of agreement,
among all the possible descriptions of the hyperplane, the so-called canonical hyperplane is chosen
that satisfies
$$
\omega^T x^{+}_{sv} + b = 1, \tag{12}
$$
$$
\omega^T x^{-}_{sv} + b = -1, \tag{13}
$$
where $x^{+}_{sv}$ and $x^{-}_{sv}$ symbolize the (+) and (−) training samples closest to the hyperplane, that is, the
so-called support vectors (see Figure 3). The distance between a point x and the hyperplane h is given by
$$
d(x, h) = \frac{|\omega^T x + b|}{\|\omega\|}. \tag{14}
$$
In particular, for the canonical hyperplane, when x is a support vector, the numerator $|\omega^T x + b|$ is
equal to one and the distance to the support vector is
$$
d(x^{\pm}_{sv}, h) = \frac{1}{\|\omega\|}. \tag{15}
$$
The width of the margin is twice this distance (i.e., $\frac{2}{\|\omega\|}$). Thus, maximizing the
margin is equivalent to minimizing the expression $\frac{\|\omega\|}{2}$, which is equivalent to the following
minimization problem:
$$
\min_{\omega, b} \frac{1}{2} \|\omega\|^2 \quad \text{subject to} \quad
\begin{cases}
h(x_i) \geq 1, & \forall\, y_i = 1 \text{ samples}; \\
h(x_i) \leq -1, & \forall\, y_i = -1 \text{ samples}.
\end{cases} \tag{16}
$$
The two previous restrictions can be rewritten in one single equation by taking the product h(x)y,
$$
\min_{\omega, b} \frac{1}{2} \|\omega\|^2 \quad \text{subject to} \quad h(x_i) y_i \geq 1, \quad i = 1, \ldots, N. \tag{17}
$$
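The quantities in Equations (11), (14), (15), and (17) can be checked numerically on a toy problem. The sketch below is illustrative only: the weight vector, bias, samples, and function names are hypothetical, and the hyperplane is chosen by hand rather than obtained by solving the optimization problem.

```python
import math


def h(omega, b, x):
    """Decision function h(x) = omega^T x + b of Equation (11)."""
    return sum(w * xi for w, xi in zip(omega, x)) + b


def distance_to_hyperplane(omega, b, x):
    """Distance of Equation (14)."""
    return abs(h(omega, b, x)) / math.sqrt(sum(w * w for w in omega))


def margin_width(omega):
    """Margin width 2 / ||omega|| for a canonical hyperplane."""
    return 2.0 / math.sqrt(sum(w * w for w in omega))


def satisfies_constraints(omega, b, samples, labels):
    """Check the constraints h(x_i) * y_i >= 1 of Equation (17)."""
    return all(h(omega, b, x) * y >= 1 for x, y in zip(samples, labels))


# Hand-picked canonical hyperplane x1 = 0 in two dimensions.
omega, b = [1.0, 0.0], 0.0
samples = [(1.0, 0.0), (-1.0, 2.0), (3.0, 1.0)]
labels = [1, -1, 1]

print(satisfies_constraints(omega, b, samples, labels))  # True
print(distance_to_hyperplane(omega, b, samples[0]))      # 1.0 (a support vector)
print(margin_width(omega))                               # 2.0
```

In this toy case, the first two samples lie exactly on the margin, so they play the role of the support vectors.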
This problem, to find the extrema of a function with constraints, can be solved using Lagrange
multipliers, thus leading to
$$
\min_{\omega, b} L(\omega, b) = \frac{1}{2} \|\omega\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i (\omega^T x_i + b) - 1 \right], \tag{18}
$$
where $\alpha_i$ are the Lagrange multipliers. Taking partial derivative with respect to ω equal to zero,
$$
\frac{\partial L(\omega, b)}{\partial \omega} = \omega - \sum_{i=1}^{N} \alpha_i y_i x_i = 0 \;\Leftrightarrow\; \omega = \sum_{i=1}^{N} \alpha_i y_i x_i. \tag{19}
$$
This equation states that the decision vector, ω, is a linear combination of the data samples.
Taking partial derivative with respect to b equal to zero,
$$
\frac{\partial L(\omega, b)}{\partial b} = - \sum_{i=1}^{N} \alpha_i y_i = 0. \tag{20}
$$
Finally, substitution of Equations (19) and (20) into Equation (18) leads to
$$
\min_{\alpha_i} \; \frac{1}{2} \left( \sum_{i=1}^{N} \alpha_i y_i x_i \right)^{T} \left( \sum_{j=1}^{N} \alpha_j y_j x_j \right) - \sum_{i=1}^{N} \alpha_i y_i \left( \sum_{j=1}^{N} \alpha_j y_j x_j \right)^{T} x_i - \underbrace{b \sum_{i=1}^{N} \alpha_i y_i}_{=0} + \sum_{i=1}^{N} \alpha_i, \tag{21}
$$
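As a worked step, using only Equations (19)-(21), the terms of Equation (21) can be collected. The first two terms are both multiples of the same double sum, because

$$
\sum_{i=1}^{N} \alpha_i y_i \left( \sum_{j=1}^{N} \alpha_j y_j x_j \right)^{T} x_i = \left( \sum_{j=1}^{N} \alpha_j y_j x_j \right)^{T} \left( \sum_{i=1}^{N} \alpha_i y_i x_i \right) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, x_i^T x_j ,
$$

so that, with the term in b vanishing by Equation (20), the expression in Equation (21) reduces to

$$
\sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, x_i^T x_j .
$$

In this form the samples enter only through the dot products $x_i^T x_j$.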
If the data do not admit a separating hyperplane, SVM can use a soft margin, meaning a
hyperplane that separates many, although not all, data points. Consequently, the previous problem is
generalized by means of slack variables, $\varepsilon_i$, and a penalty parameter, C. The general formulation for
the linear kernel is in this case:
$$
\min_{\omega, b, \varepsilon_i} \frac{1}{2} \|\omega\|^2 + C \sum_{i=1}^{N} \varepsilon_i \quad \text{subject to} \quad
\begin{cases}
h(x_i) y_i \geq 1 - \varepsilon_i, & i = 1, \ldots, N; \\
\varepsilon_i \geq 0, & i = 1, \ldots, N.
\end{cases} \tag{23}
$$
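To make the role of the slack variables and of C concrete, the following minimal sketch evaluates the objective of Equation (23) for a fixed hyperplane. For given ω and b, the smallest slack that satisfies both constraints is $\varepsilon_i = \max(0, 1 - y_i h(x_i))$, which is what the code uses. The function name and the toy data are hypothetical, and the sketch does not solve the optimization problem itself.

```python
def soft_margin_objective(omega, b, samples, labels, C):
    """Objective of Equation (23) for fixed omega and b, using the smallest feasible slacks."""
    slacks = []
    for x, y in zip(samples, labels):
        h_x = sum(w * xi for w, xi in zip(omega, x)) + b
        # Smallest epsilon_i >= 0 such that h(x_i) * y_i >= 1 - epsilon_i.
        slacks.append(max(0.0, 1.0 - y * h_x))
    objective = 0.5 * sum(w * w for w in omega) + C * sum(slacks)
    return objective, slacks


# Toy data: the second sample lies inside the margin.
omega, b = [1.0, 0.0], 0.0
samples = [(2.0, 0.0), (0.5, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1]

objective, slacks = soft_margin_objective(omega, b, samples, labels, C=50.0)
print(slacks)     # [0.0, 0.5, 0.0]
print(objective)  # 25.5
```

A larger C makes each margin violation more expensive, which pushes the solution toward a stricter separation.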
The final set of restrictions shows why the penalty parameter C is frequently called a box constraint,
as it keeps the admissible values of the Lagrange multipliers in a bounded region. In this work, the box
constraint value was tuned to optimize the performance of the SVM, as shown in Section 3.
From Equations (22) and (24), it is obvious that optimization depends only on dot products of
pairs of samples. Additionally, the decision rule depends only on the dot product. Furthermore,
the optimization problem is solved in a convex space (in contrast to neural networks), thus it never
obtains a local extremum but the global one. When the space is not linearly separable (the classification
problem does not have a simple hyperplane as a useful separating criterion even using a soft margin),
a transformation to another space can be used, φ(·). In fact, the transformation itself is not needed, but
just the dot product, the so-called kernel function:
$$
K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j). \tag{25}
$$
The kernel function permits the computation of the inner product between the mapped vectors
without expressly calculating the mapping. This is advantageous, as it implies that if data are
transformed into a higher-dimensional space (which helps to improve classification) there is no need
to compute the exact transformation of the data, but only the inner product of the data in that
higher-dimensional space (which is computationally cheaper). This is known as the “kernel trick” [31].
Different kernels can be used, namely polynomial, hyperbolic tangent, or Gaussian radial basis
function. On one hand, the feature space mapping of the Gaussian kernel has infinite dimensionality.
On the other hand, the Gaussian kernel has a ready interpretation as a similarity measure, as its value
decreases with distance and ranges between zero and one. For these reasons, in this work the Gaussian
kernel is used, namely,
$$
K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}, \tag{26}
$$
where γ is a free parameter, hereafter denoted as kernel scale, related to the Gaussian kernel width.
In this work, the kernel scale is computed as the inverse of the square root of the number of features.
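A minimal sketch of the Gaussian kernel of Equation (26) is given below. The function name `gaussian_kernel` and the toy vectors are hypothetical; the value of γ follows the rule stated above, using 16 features purely as an example.

```python
import math


def gaussian_kernel(x_i, x_j, gamma):
    """Gaussian kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) of Equation (26)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


# Kernel scale as stated in the text: inverse of the square root of the number of features.
n_features = 16
gamma = 1.0 / math.sqrt(n_features)  # 0.25

x = [0.0] * n_features
y = [1.0] + [0.0] * (n_features - 1)  # differs from x in one coordinate

print(gaussian_kernel(x, x, gamma))  # 1.0 for identical samples
print(gaussian_kernel(x, y, gamma))  # exp(-0.25), smaller for more distant samples
```

The value equals one for identical samples and decays toward zero as the samples move apart, which is the similarity interpretation mentioned above.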
Note that in this work, the same features and the same kernel scale value for the Gaussian kernel
are used to detect all faults. In other words, a unique trained SVM is able to classify among all the
studied classes (i.e., eight faulty classes and one healthy class). That is not the case in the previous
literature related to WT fault detection (e.g., [13,32]) where the features and the variance were adjusted
case-by-case to detect each different fault, thus leading to a much more complex strategy that needed
as many different SVM classifiers as faults to detect. Regarding computational effort, there is a
clear advantage related to the feature computation, as only one set of features is needed in our
proposed approach.
As was mentioned earlier, SVM classification is essentially a binary (i.e., two-class) classification
technique, which has to be modified to deal with the multi-fault classification. Two of the most
common methods to enable this adaptation include the one-vs.-one and one-vs.-all approaches. The
one-vs.-all technique represents the earliest and most common SVM multiclass approach [33]. It
divides an N-class dataset into N two-class cases and chooses the class that
classifies the test point with the greatest margin. The one-vs.-one strategy comprises constructing a machine
for each pair of classes, thus resulting in N ( N − 1)/2 machines. When this approach is applied to a
test point, each classification gives one vote to the winning class, and the point is labeled with the
class having the most votes. The one-vs.-one strategy is more computationally demanding because the
results of more SVM pairs need to be computed. In this work, the one-vs.-all approach is used.
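The two decision rules can be sketched as follows. The code is a minimal illustration that assumes the binary classifiers have already been trained and evaluated: `scores` holds one signed decision value per class and `pairwise_winners` holds the winning class of each pairwise machine. All names and toy values are hypothetical.

```python
def one_vs_all_predict(scores):
    """Return the class whose 'class vs. rest' classifier gives the largest decision value."""
    return max(range(len(scores)), key=lambda c: scores[c])


def one_vs_one_predict(pairwise_winners, n_classes):
    """Return the class with the most votes among the pairwise classifiers."""
    votes = [0] * n_classes
    for winner in pairwise_winners:
        votes[winner] += 1
    return max(range(n_classes), key=lambda c: votes[c])


def number_of_machines(n_classes):
    """Binary machines needed by one-vs.-all and by one-vs.-one, respectively."""
    return n_classes, n_classes * (n_classes - 1) // 2


# Nine classes, as in this work (one healthy class and eight faulty classes).
print(number_of_machines(9))  # (9, 36)

# Toy decision values for three classes: class 2 has the greatest margin.
print(one_vs_all_predict([-1.2, 0.3, 0.9]))  # 2

# Toy pairwise results for three classes: (0 vs 1) -> 1, (0 vs 2) -> 2, (1 vs 2) -> 1.
print(one_vs_one_predict([1, 2, 1], n_classes=3))  # 1
```

For the nine classes considered here, one-vs.-all needs 9 binary machines, whereas one-vs.-one would need 36.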
[Figure 4 diagram: J time steps × #sensors of data coming from a WT to be diagnosed, followed by the blocks SCALING, PCA (P), and SVM CLASSIFIER; the training data feed these blocks.]
Figure 4. Data coming from a wind turbine (WT) to be diagnosed are first scaled, then projected
into the vectorial space spanned by the first principal components, and finally the projection enters
the classifier.
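The three steps of Figure 4 can be chained in a short sketch. It assumes that the scaling values, the projection matrix, and a trained classifier are already available from the training stage; the function `diagnose`, the stand-in classifier `toy_classifier`, and all numerical values are hypothetical.

```python
def diagnose(sample, mu, sigma, P_d, classify):
    """Scale one unfolded sample, project it onto the principal components, and classify it."""
    # Step 1: scaling with the means and standard deviations of the training data.
    scaled = [(x - m) / s for x, m, s in zip(sample, mu, sigma)]
    # Step 2: projection onto the first d principal components (row vector times P_d).
    d = len(P_d[0])
    projected = [sum(scaled[j] * P_d[j][c] for j in range(len(scaled))) for c in range(d)]
    # Step 3: the projection enters the classifier.
    return classify(projected)


def toy_classifier(features):
    """Stand-in for a trained classifier: class 1 if the first score is positive, else class 0."""
    return 1 if features[0] > 0 else 0


# Toy training-stage quantities (two columns, one principal component).
mu, sigma = [2.0, 20.0], [1.0, 10.0]
P_d = [[1.0], [0.0]]

print(diagnose([3.0, 30.0], mu, sigma, P_d, toy_classifier))  # 1
print(diagnose([1.0, 10.0], mu, sigma, P_d, toy_classifier))  # 0
```

Only one scaling and one projection are needed per sample, because a single set of features is shared by all classes.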
The box constraint value is tuned to optimize the SVM performance. Making this value large
increases the weight of misclassification, see Equation (23), which leads to a stricter separation.
However, increasing its value leads to longer training times. The value C = 50 was used in this work
because, as shown in Figure 5, with smaller values the overall accuracy was degraded and with larger
values similar results were obtained (with longer training times).
Table 4 summarizes the results obtained from the proposed strategy. It presents not only the
overall accuracy, but also the training time and prediction speed, as both parameters are critical in
real applications. Notice that in all cases, the prediction speed allows this strategy to be deployed
for online (real-time) condition monitoring in WTs. Besides, a comprehensive decomposition of the
error between the true classes and the predicted classes is shown by means of the so-called confusion
matrices, see Figures 6–8 (an empty blank square means 0%). In these matrices, each row represents
the instances in a true class while each column represents the instances in a predicted class (by the
classifier). In particular, the first row (and first column) is labeled as 0 and corresponds to the healthy
case. The next labels (for rows and columns) correspond to each fault (from Fault 1 to Fault 8). From
the confusion matrices and Table 4, the following issues can be highlighted.
When detection time was approximately 3 s (J = 3), the overall accuracy was 95.5%. In this case,
the healthy class had a true positive rate (TPR, the percentage of correctly classified instances) higher
than 99% and a false negative rate (FNR, the percentage of incorrectly classified instances) smaller than
1%. Fault 1 (the most difficult to classify in previous references and related to the pitch actuator fault
with high dynamics) had a TPR of 77% and an FNR of 23%. This FNR percentage was mainly obtained
from 17% missing faults and 6% confusion with Fault 2, which is also a fault located in the pitch
actuator. Fault 6, related to a stuck value (10 deg) of the pitch sensor measurement, was misclassified
as healthy 5% of the time, 3% of the time it was confused with the same type of fault but with only a
5 deg stuck value (Fault 5), and 2% of the time it was misclassified as Fault 2 (pitch actuator fault).
The other faults had a TPR higher than 92%. Note that Fault 8, the most severe one and related to the
torque actuator, had a 100% TPR with this most restrictive detection time.
When detection time was approximately 8 s (J = 8), the overall accuracy was 98%. As in the
previous case, the healthy class had a TPR higher than 99%. Fault 1 increased its TPR to 79% (where
16% were missed faults and 5% confusion with Fault 2), and all the other classes increased their TPR
to values higher than 98%. Note that Fault 4, related to the generator speed sensor, reached a 100%
TPR. The generator speed measurement from the sensor was used as input in the torque and pitch
controllers, and thus being able to correctly diagnose this type of fault is extremely important. As in
the previous case, Fault 8 kept a 100% TPR.
Finally, when J = 10 the overall accuracy was 98.2%. In this case, Fault 1 improved to
a TPR of 80%, and all misclassifications were 1% or lower, except for Fault 1, which was
misclassified as healthy 15% of the time and as Fault 2 5% of the time (recall that this is
also a pitch actuator fault). Observe that the healthy class and Faults 4 and 8 obtained a remarkable 100% TPR.
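The per-class rates and the overall accuracy discussed above can be computed from a confusion matrix as in the following minimal sketch. The matrix holds counts, with rows as true classes and columns as predicted classes; the function names and the three-class toy matrix are hypothetical and do not reproduce the results of this work.

```python
def per_class_rates(confusion):
    """Return (TPR, FNR) in percent for each true class (row) of a confusion matrix of counts."""
    rates = []
    for true_class, row in enumerate(confusion):
        total = sum(row)
        tpr = 100.0 * row[true_class] / total
        rates.append((tpr, 100.0 - tpr))
    return rates


def overall_accuracy(confusion):
    """Percentage of all instances that lie on the diagonal of the confusion matrix."""
    correct = sum(confusion[c][c] for c in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total


# Toy confusion matrix with three classes and 100 instances per true class.
toy_confusion = [
    [99, 1, 0],
    [15, 80, 5],
    [0, 0, 100],
]

print(per_class_rates(toy_confusion))   # [(99.0, 1.0), (80.0, 20.0), (100.0, 0.0)]
print(overall_accuracy(toy_confusion))  # 93.0
```

For each true class, the TPR and the FNR add up to 100%, as in the two rightmost columns of the confusion matrices.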
[Figure 5 plot: overall accuracy (vertical axis, 88 to 96) versus box constraint C (horizontal axis, 0 to 90).]
[Figures 6–8: confusion matrices with true classes in rows and predicted classes (0 to 8) in columns, together with the true positive rate and false negative rate of each class.]
Access to real SCADA datasets is often proprietary, and therefore they are not accessible by
the scientific community. To overcome this difficulty, in this work simulated data were obtained by
one of the most widely accepted WT simulators in the scientific community (FAST). The drawback
of using simulated data is that there is no possibility to evaluate the proposed method on a full
test set representing the true distribution of real-world data where class imbalance is a challenging
problem [40]. However, there are several references (e.g., [11]) where this problem is solved in the
training stage using under/oversampling of the training data.
4. Conclusions
Because of its standard low sampling rate, there is a lack of knowledge on the potential of SCADA
data for condition monitoring. In this work, a promising strategy to detect and classify multiple
WT faults was presented using only conventional SCADA data with an additional, but feasible,
high-frequency sampling from the sensors (1 sample/s). That is, the FD strategy does not involve the
supplementary installation of costly purpose-built data sensing equipment for wind power plants.
Note that in this work, in contrast to the previous literature, the same features and the same
variance for the Gaussian kernel were used to detect all the faults detailed in the benchmark. This
leads to a single trained classifier capable of coping with all the studied faults by computing only
one set of features from the data to diagnose. Consequently, the strategy that we propose outperformed
other approaches.
As future work, other faults will be included involving misalignment, ice accumulation, and tower
damage. Finally, we will study the contribution of an effective predictive maintenance strategy based
on this same principle in order to further optimize operation and maintenance in WTs.
References
1. Hossain, M.L.; Abu-Siada, A.; Muyeen, S.M. Methods for Advanced Wind Turbine Condition Monitoring
and Early Diagnosis: A Literature Review. Energies 2018, 11, 1309. [CrossRef]
2. Ahadi, A. Wind turbine fault diagnosis techniques and related algorithms. Int. J. Renew. Energy Res. (IJRER)
2016, 6, 80–89.
3. De Azevedo, H.D.M.; Araújo, A.M.; Bouchonneau, N. A review of wind turbine bearing condition
monitoring: State of the art and challenges. Renew. Sustain. Energy Rev. 2016, 56, 368–379. [CrossRef]
4. Kandukuri, S.T.; Klausen, A.; Karimi, H.R.; Robbersmyr, K.G. A review of diagnostics and prognostics of
low-speed machinery towards wind turbine farm-level health management. Renew. Sustain. Energy Rev.
2016, 53, 697–708. [CrossRef]
5. Huang, S.; Wu, X.; Liu, X.; Gao, J.; He, Y. Overview of condition monitoring and operation control of electric
power conversion systems in direct-drive wind turbines under faults. Front. Mech. Eng. 2017, 12, 281–302.
6. Yang, Z.; Chai, Y. A survey of fault diagnosis for onshore grid-connected converter in wind energy
conversion systems. Renew. Sustain. Energy Rev. 2016, 66, 345–359. [CrossRef]
7. Ochieng, F.X.; Hancock, C.M.; Roberts, G.W.; Le Kernec, J. A review of ground-based radar as a noncontact
sensor for structural health monitoring of in-field wind turbines blades. Wind Energy 2018. [CrossRef]
8. Shohag, M.A.S.; Hammel, E.C.; Olawale, D.O.; Okoli, O.I. Damage mitigation techniques in wind turbine
blades: A review. Wind Eng. 2017, 41, 185–210. [CrossRef]
9. Zhao, Y.; Li, D.; Dong, A.; Kang, D.; Lv, Q.; Shang, L. Fault Prediction and Diagnosis of Wind Turbine
Generators Using SCADA Data. Energies 2017, 10, 1210. [CrossRef]
10. Astolfi, D.; Castellani, F.; Scappaticci, L.; Terzi, L. Diagnosis of wind turbine misalignment through SCADA
data. Diagnostyka 2017, 18, 17–24.
11. Leahy, K.; Hu, R.L.; Konstantakopoulos, I.C.; Spanos, C.J.; Agogino, A.M.; O’Sullivan, D.T.J. Diagnosing and
predicting wind turbine faults from SCADA data using support vector machines. Int. J. Progn. Health Manag.
2018, 9, 1–11.
12. Mazidi, P.; Du, M.; Tjernberg, L.B.; Bobi, M.A.S. A performance and maintenance evaluation framework for
wind turbines. In Proceedings of the 2016 International Conference on Probabilistic Methods Applied to
Power Systems (PMAPS), Beijing, China, 16–20 October 2016; pp. 1–8. [CrossRef]
13. Laouti, N.; Sheibat, N.; Othman, S. Support vector machines for fault detection in wind turbines.
In Proceedings of the IFAC World Congress, Milano, Italy, 28 August–2 September 2011; Volume 2,
pp. 7067–7707.
14. Laouti, N.; Othman, S.; Alamir, M.; Sheibat-Othman, N. Combination of model-based observer and support
vector machines for fault detection of wind turbines. Int. J. Autom. Comput. 2014, 11, 274–287. [CrossRef]
15. Xiao, Y.; Hong, Y.; Chen, X.; Chen, W. The application of dual-tree complex wavelet transform (DTCWT)
energy entropy in misalignment fault diagnosis of doubly-fed wind turbine (DFWT). Entropy 2017, 19, 587.
[CrossRef]
16. Abdelkrim, S.; Djamel, M.M.; Samia, A.; Hayet, M.; Mawloud, T. The MAED and SVM for fault diagnosis of
wind turbine system. Int. J. Renew. Energy Res. (IJRER) 2017, 7, 758–769.
17. Wang, K.S.; Sharma, V.S.; Zhang, Z.Y. SCADA data based condition monitoring of wind turbines. Adv. Manuf.
2014, 2, 61–69. [CrossRef]
18. Gonzalez, E.; Stephen, B.; Infield, D.; Melero, J. On the use of high-frequency SCADA data for improved
wind turbine performance monitoring. J. Phys. Conf. Ser. 2017, 926, 012009. [CrossRef]
19. Odgaard, P.F.; Stoustrup, J.; Kinnaert, M. Fault tolerant control of wind turbines—A benchmark model.
IFAC Proc. Vol. 2009, 42, 155–160. [CrossRef]
20. KK Wind Solutions. Available online: http://www.kkwindsolutions.com/ (accessed on 10 September 2018).
21. The MathWorks, Inc. Available online: http://www.mathworks.com/ (accessed on 10 September 2018).
22. Odgaard, P.F.; Stoustrup, J.; Kinnaert, M. Fault-tolerant control of wind turbines: A benchmark model.
IEEE Trans. Control Syst. Technol. 2013, 21, 1168–1182. [CrossRef]
23. Odgaard, P.; Johnson, K. Wind Turbine Fault Diagnosis and Fault Tolerant Control—An Enhanced Benchmark
Challenge. In Proceedings of the American Control Conference, Washington, DC, USA, 17–19 June 2013;
pp. 1–6.
24. Ruiz, M.; Mujica, L.E.; Alférez, S.; Acho, L.; Tutivén, C.; Vidal, Y.; Rodellar, J.; Pozo, F. Wind turbine fault
detection and classification by means of image texture analysis. Mech. Syst. Signal Process. 2018, 107, 149–167.
[CrossRef]
25. Lackner, M.A.; Rotea, M.A. Passive structural control of offshore wind turbines. Wind Energy 2011,
14, 373–388. [CrossRef]
26. Jonkman, J.; Butterfield, S.; Musial, W.; Scott, G. Definition of a 5-MW Reference Wind Turbine for Offshore
System Development; Technical Report No. NREL/TP-500-38060; National Renewable Energy Laboratory:
Golden, CO, USA, 2009.
27. May, A.; McMillan, D.; Thöns, S. Economic analysis of condition monitoring systems for offshore wind
turbine sub-systems. IET Renew. Power Gener. 2015, 9, 900–907. [CrossRef]
28. Hong, X.; Xu, Y.; Zhao, G. LBP-TOP: A Tensor Unfolding Revisit. In Proceedings of the Asian Conference
on Computer Vision, Taipei, Taiwan, 20–24 November 2016; pp. 513–527.
29. Vapnik, V. The Nature of Statistical Learning Theory; Springer: Berlin, Germany, 1995.
30. Yang, C.H.; Chin, L.C.; Hsieh, S.C. Morse code recognition using support vector machines. In Proceedings
of the IEEE EMBS Asian-Pacific Conference on Biomedical Engineering, Kyoto, Japan, 20–22 October 2003;
pp. 220–222.
31. Theodoridis, S.; Koutroumbas, K. Pattern Recognition; Elsevier: Amsterdam, The Netherlands, 2009.
32. Santos, P.; Villa, L.F.; Reñones, A.; Bustillo, A.; Maudes, J. An SVM-based solution for fault detection in wind
turbines. Sensors 2015, 15, 5627–5648. [CrossRef] [PubMed]
33. Melgani, F.; Bruzzone, L. Classification of hyperspectral remote sensing images with support vector machines.
IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790. [CrossRef]
34. McLachlan, G. Discriminant Analysis and Statistical Pattern Recognition; John Wiley & Sons: Hoboken, NJ,
USA, 2004; Volume 544.
35. Devroye, L.; Wagner, T. Distribution-free performance bounds with the resubstitution error estimate
(Corresp.). IEEE Trans. Inf. Theory 1979, 25, 208–210. [CrossRef]
36. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; CRC Press: Boca Raton, FL, USA, 1994.
37. Stone, M. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B Methodol.
1974, 36, 111–147.
38. Westerhuis, J.A.; Kourti, T.; MacGregor, J.F. Comparing alternative approaches for multivariate statistical
analysis of batch process data. J. Chemom. 1999, 13, 397–413. [CrossRef]
39. Pozo, F.; Vidal, Y.; Salgado, Ó. Wind Turbine Condition Monitoring Strategy through Multiway PCA and
Multivariate Inference. Energies 2018, 11, 749. [CrossRef]
40. Leahy, K.; Gallagher, C.; O’Donovan, P.; Bruton, K.; O’Sullivan, D. A Robust Prescriptive Framework and
Performance Metric for Diagnosing and Predicting Wind Turbine Faults Based on SCADA and Alarms Data
with Case Study. Energies 2018, 11, 1738. [CrossRef]
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).