Extraction of Features Using Sparse-Based Techniques in Rotating Machinery Vibration Signals


ABSTRACT
This report briefly describes several signal processing techniques, related to sparsity and compressive sensing, for estimating attributes oriented to fault detection in rotating machinery. Several features are extracted from vibration signals acquired from rotating machines using these signal processing techniques: independent component analysis, sparse principal component analysis and sparse dictionary learning. The report also includes a set of higher-order statistical features used for characterizing the vibration signals.
Keywords: Vibration Signal Analysis, Sparse Signal Analysis, Independent Component Analysis, Sparse Dictionaries

1. INTRODUCTION
This report gives a short description of the implemented functions used for extracting features from the vibration signals, together with the results obtained on a gearbox vibration dataset.

2. STATISTICAL FEATURES
Several statistical features have been proposed for the classification of damage severity in rotating machinery.1 The standard technique for machinery assessment consists of estimating the mean and the standard deviation of the time series. However, considering only the first and second moments can lead to errors in the statistical inferences, because the probability distribution of the time series is often not normal. In these cases, higher-order statistical features can be useful for characterizing the time series.

2.1 Skewness
Skewness of a probability distribution is a measure of its asymmetry;2 the higher the absolute value of the skewness, the more asymmetric the distribution. Symmetric distributions have a skewness of zero. An example of skewness is given in Figure 1.

Figure 1. Skewness of a Probability Distribution Function

The skewness of a sample is defined as:

    Skewness = \frac{E[(x - \mu)^3]}{\sigma^3}        (1)

where \mu is the mean of x, \sigma is the standard deviation and E(\cdot) is the expected value of the time series x.

Further author information: (Send correspondence to Diana Jadam, Email: [Link]@[Link])

2.2 Kurtosis
The fourth moment of the time series distribution function is the basis of the kurtosis statistic. Kurtosis reflects the peakedness of the distribution, with a value of 3 for the normal distribution. Figure 2 illustrates the kurtosis of a distribution.

Figure 2. Kurtosis of a Probability Distribution Function

The distribution on the right is more peaked at the center, which might lead us to believe that it has a lower standard deviation. It also has fatter tails, which might lead us to believe that it has a higher standard deviation. If the effect of the peakedness exactly offsets that of the fat tails, the two distributions have the same standard deviation. The different shapes of the two distributions illustrate kurtosis: the distribution on the right has a greater kurtosis than the distribution on the left. The kurtosis of a distribution is defined as:

    Kurtosis = \frac{E[(x - \mu)^4]}{\sigma^4}        (2)

where \mu is the mean of x, \sigma is the standard deviation and E(\cdot) is the expected value of the time series x.
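As an illustration (not part of the original report), a minimal Python sketch of equations 1 and 2 using NumPy and SciPy is given below; scipy.stats.kurtosis uses Fisher's convention (normal = 0) by default, so fisher=False is passed to match the "normal = 3" convention used here.

```python
import numpy as np
from scipy import stats

def skewness_kurtosis(x):
    """Skewness (eq. 1) and kurtosis (eq. 2) of a signal segment x."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    skew = np.mean((x - mu) ** 3) / sigma ** 3
    kurt = np.mean((x - mu) ** 4) / sigma ** 4   # equals 3 for a Gaussian signal
    return skew, kurt

# Cross-check with SciPy; fisher=False keeps the "normal = 3" convention of eq. 2
x = np.random.default_rng(0).standard_normal(4096)
print(skewness_kurtosis(x))
print(stats.skew(x), stats.kurtosis(x, fisher=False))
```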

2.3 Entropy
Entropy is a quantitative measure of the disorder in a system.1 Entropy is a concept used in thermodynamics, statistical mechanics and information theory. Several signal features are related to entropy.
2.3.1 Shannon Entropy
The Shannon entropy is also known as information entropy. The Shannon entropy for a signal with samples denoted as x_i is estimated as:

    ShannonEntropy = -\sum_i x_i^2 \log(x_i^2)        (3)

2.3.2 Log Energy Entropy
In the context of probability measures, the free entropy is the logarithmic energy.3 This feature can be estimated for a signal as:

    LogEnergyEntropy = \sum_i \log(x_i^2)        (4)

with the convention that \log(0) = 0.


2.3.3 Threshold Entropy
The threshold entropy is the number of time instants at which the signal magnitude is greater than a given threshold p:

    ThresholdEntropy = \#\{i : |x_i| > p\}        (5)
2.3.4 Sure Entropy
The SURE entropy is defined as:

    SureEntropy = n - \#\{i : |x_i| \le p\} + \sum_i \min(x_i^2, p^2)        (6)

where n is the length of the signal and p is a threshold.


2.3.5 Norm Entropy
The norm entropy is estimated as:

    NormEntropy = \sum_i |x_i|^p        (7)

with 1 \le p.
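The five entropy features above can be sketched in a few lines of NumPy; the following is a minimal illustration (not from the original report), assuming the \log(0) = 0 convention and user-chosen values for the threshold p and the norm exponent:

```python
import numpy as np

def entropy_features(x, p=0.5, norm_p=1.5):
    """Entropy features of a signal x (eqs. 3-7); p and norm_p are user choices."""
    x = np.asarray(x, dtype=float)
    x2 = x ** 2
    nz = x2 > 0                      # convention log(0) = 0: zero samples are skipped
    shannon = -np.sum(x2[nz] * np.log(x2[nz]))               # eq. 3
    log_energy = np.sum(np.log(x2[nz]))                      # eq. 4
    threshold = np.count_nonzero(np.abs(x) > p)              # eq. 5
    sure = (x.size - np.count_nonzero(np.abs(x) <= p)
            + np.sum(np.minimum(x2, p ** 2)))                # eq. 6
    norm = np.sum(np.abs(x) ** norm_p)                       # eq. 7
    return shannon, log_energy, threshold, sure, norm
```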

2.4 Application of the statistical features for the detection of rotating machinery damage based on the vibration signals
The statistical features were estimated for the gearbox database, which includes the normal case as well as nine failure cases. In this test only the L1 load was considered. Results are presented in Figures 3 to 7. In general, the features selected for plotting allow one to separate the considered classes, except for P7 and P10, where the points of the compared classes are close to each other. Although all described features were calculated for both accelerometer signals, only three features are shown in this experiment. A careful comparison and tuning of the parameters of the statistical features should still be performed.

Figure 3. Results for the Load L1 using statistical features: (a) comparison between P1 and P2 (b) comparison between P1 and P3

Results for a second combination of three features are shown in Figures 8 to 12. In this case the features considered are the kurtosis of signal 1, the Shannon entropy of signal 1 and the skewness of signal 2. The results are better than in the previous cases; however, classes P1 and P7 remain close to each other.

3. INDEPENDENT COMPONENT ANALYSIS

Let us consider n sources, denoted S = [s_1, s_2, s_3, ..., s_n]^T, that are combined to produce m mixed signals, denoted X = [x_1, x_2, x_3, ..., x_m]^T, which can be described as:

    x_i = \sum_{j=1}^{n} a_{ij} s_j + n_i        (8)

where the a_{ij} are the mixing coefficients and n_i is the noise of the i-th mixed signal. This expression can be rewritten in matrix form as:

    X = AS + N        (9)

Figure 4. Results for the Load L1 using statistical features: (a) comparison between P1 and P4 (b) comparison between P1 and P5

Figure 5. Results for the Load L1 using statistical features: (a) comparison between P1 and P6 (b) comparison between P1 and P7

Figure 6. Results for the Load L1 using statistical features: (a) comparison between P1 and P8 (b) comparison between P1 and P9

where A is the mixing matrix and N is the noise matrix. The statistical model in equation 9 is known as Independent Component Analysis (ICA). This model describes how the observed data are generated by mixing the source signals s_i. The independent components cannot be directly measured, and the mixing matrix is also unknown; in consequence, A and S must be estimated from the observation matrix X. After estimating A, its inverse can be computed as W = A^{-1} and the independent components can be estimated as:

    S = WX        (10)

The ICA estimation is based on the assumption that the source components are independent.4 Additionally, the independent components must have non-Gaussian distributions. ICA is widely used for blind source separation.5

Figure 7. Results for the Load L1 using statistical features: comparison between P1 and P10

Figure 8. Results for the Load L1 using statistical features: (a) comparison between P1 and P2 (b) comparison between P1 and P3

3.1 fastICA Algorithm
The fastICA algorithm is an efficient technique for estimating the independent components. The method is based on a fundamental result of information theory stating that a Gaussian variable has the largest entropy among all random variables of equal variance. This leads to a functional approximation of negentropy, stated as follows:

    J(y) \approx \sum_{i=1}^{p} k_i [E\{G_i(y)\} - E\{G_i(v)\}]^2        (11)

where the k_i are positive constants, v is a Gaussian variable of zero mean and unit variance, y is a random variable of zero mean and unit variance, and the G_i are some nonquadratic functions. fastICA is a fixed-point iteration scheme for finding a maximum of the non-Gaussianity of W^T X using the functional in equation 11 and updating W as follows:

    W^+ = E\{X g(W^T X)\} - E\{g'(W^T X)\} W        (12)

where g is a suitable nonlinearity such as g(u) = u \exp(-u^2/2). The input data to this algorithm must be preprocessed, including centering (subtracting the mean) and whitening (linearly transforming each input vector so that its components are uncorrelated and have unit variance). Detailed information about this algorithm can be found in Hyvärinen and Oja.4

Figure 9. Results for the Load L1 using statistical features: (a) comparison between P1 and P4 (b) comparison between P1 and P5

Figure 10. Results for the Load L1 using statistical features: (a) comparison between P1 and P6 (b) comparison between P1 and P7

Figure 11. Results for the Load L1 using statistical features: (a) comparison between P1 and P8 (b) comparison between P1 and P9

Figure 12. Results for the Load L1 using statistical features: comparison between P1 and P10
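For reference, a minimal sketch (not part of the original experiments) of the fastICA estimation using the FastICA class of scikit-learn on synthetic mixtures is given below, assuming scikit-learn >= 1.1 for the whiten='unit-variance' option; fun='exp' corresponds to the nonlinearity g(u) = u exp(-u^2/2) mentioned above, and centering and whitening are performed internally:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic example: two sources mixed by a random matrix (model of eq. 9)
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)
S_true = np.c_[np.sin(2 * np.pi * 25 * t), np.sign(np.sin(2 * np.pi * 60 * t))]
A_mix = rng.standard_normal((2, 2))
X_obs = S_true @ A_mix.T + 0.01 * rng.standard_normal((2000, 2))

# fun='exp' uses g(u) = u*exp(-u^2/2) as in eq. 12; data is centered and whitened
ica = FastICA(n_components=2, fun='exp', whiten='unit-variance', random_state=0)
S_est = ica.fit_transform(X_obs)      # estimated independent components (eq. 10)
A_est = ica.mixing_                   # estimated mixing matrix A
W_est = ica.components_               # estimated unmixing/separating matrix W
```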

3.2 Application of the Independent Component Analysis to the Vibration Signal
The test performed considered the load L1, and the data matrix was assembled from the signals recorded for each of the repetitions of a particular combination of voltage, velocity, load and machine condition. For instance, for the case P1 with load L1 and voltage V1, there are 5 repetitions in which two accelerometer signals are recorded; the data matrix in this case therefore includes the 10 signals recorded over the 5 repetitions. This data matrix is then decomposed into 10 independent signals, and the mixing and separating matrices are also estimated. The features used for recognition are the square norms of these matrices, together with the area of the average periodogram of the 10 estimated components. Results of the analysis considering the load L1 are presented in Figures 13 to 17. In general, the results allow one to obtain separate clusters for the normal condition with respect to the abnormal cases. However, in the case P8 shown in Figure 16 the two clusters are almost overlapped, so it would be difficult to detect abnormalities in this case.
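The exact feature computation is not fully specified in the text; the following hypothetical sketch illustrates one reasonable reading, using the squared Frobenius norms of the estimated mixing and separating matrices and the area under the average periodogram of the components:

```python
import numpy as np
from scipy.signal import periodogram

# Hypothetical sketch of the three ICA-based features described above, assuming
# S_est (components as columns), A_est (mixing) and W_est (separating) come from
# the FastICA decomposition of the 10-signal data matrix.
def ica_features(S_est, A_est, W_est, fs=1.0):
    f_mix = np.sum(A_est ** 2)            # square norm of the mixing matrix
    f_sep = np.sum(W_est ** 2)            # square norm of the separating matrix
    # average the periodograms of the estimated components, then take the area
    freqs, pxx = periodogram(S_est, fs=fs, axis=0)
    f_area = np.trapz(pxx.mean(axis=1), freqs)
    return f_mix, f_sep, f_area
```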

4. SPARSE PRINCIPAL COMPONENT ANALYSIS

Principal Component Analysis (PCA) is a signal processing technique usually used for data reduction.6 PCA searches for linear combinations of the original variables such that the new derived variables capture most of the variance of the original dataset. PCA is computed using the Singular Value Decomposition (SVD).6 The technique can be stated as follows: the input data is denoted as the matrix X of size n \times p, where n and p are the number of observations and the number of variables, respectively. The columns of this matrix are assumed to be centered, so that their means are zero. Using the SVD, the matrix can be decomposed as:

    X = U D V^T        (13)

where Z = UD are the principal components (PCs) and the columns of V are the loadings of the principal components. Usually the first q (q \ll \min(n, p)) PCs are selected to represent the data, thus obtaining a high degree of data reduction. In this representation the principal components capture the maximum variability along the columns of X and, additionally, they are uncorrelated. The loadings are the eigenvectors from the SVD of X, and the features or PCs can also be estimated as:

    Z = X [v_1, v_2, v_3, ..., v_q],    1 \le q \le p        (14)

Figure 13. Results for the Load L1 using Independent Component Analysis: (a) comparison between P1 and P2 (b) comparison between P1 and P3

Figure 14. Results for the Load L1 using Independent Component Analysis: (a) comparison between P1 and P4 (b) comparison between P1 and P5

An estimation of the original data can be obtained from the PCs and the loadings as:

    \hat{X} = Z [v_1, v_2, v_3, ..., v_q]^T        (15)
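A minimal NumPy sketch of equations 13 to 15 (added here for illustration), assuming the columns of X are already centered:

```python
import numpy as np

def pca_svd(X, q):
    """PCA via the SVD (eqs. 13-15); X is n x p with centered columns."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U * d                 # principal components Z = U D (eq. 13)
    V_q = Vt[:q].T            # first q loading vectors [v_1, ..., v_q]
    Z_q = X @ V_q             # equivalently, the first q PCs (eq. 14)
    X_hat = Z_q @ V_q.T       # reconstruction from q PCs (eq. 15)
    return Z_q, V_q, X_hat
```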

A drawback of the PCA representation is that, even when a small number of PCs is sufficient to represent the data, each PC is a linear combination of all p variables and the loadings are typically all non-zero. It would be convenient to achieve not only dimensionality reduction but also a reduction of the number of explicit variables used in the representation. An empirical method is to set the loading values to zero when they are lower than a predefined threshold; this method can, however, lead to unreliable representations of the data.7 A representation called Sparse Principal Component Analysis (SPCA) was proposed by Zou et al.,8 where PCA is written as a regression-type optimization problem with an elastic net penalty (a quadratic ridge penalty plus the l1 Lasso penalty), leading to sparse loadings. In this algorithm the loadings are estimated iteratively. The calculations are performed for a fixed q, 1 \le q \le p, selected by the user. The algorithm is based on the fact that the eigenvectors or loadings play two roles in PCA, as shown in equations 14 and 15. Let B denote the matrix V in equation 14 and A the matrix V in equation 15. The algorithm estimates the loadings in an alternating way, considering two stages: 1) given A (initialized in the first iteration using the SVD), calculate B; and 2) given B (estimated previously), calculate A. The estimation of B is performed with a sparse regression algorithm known as the LARS-EN (elastic net) algorithm. Both stages are repeated until a convergence criterion is met.
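As an illustrative sketch only: scikit-learn provides a SparsePCA class that solves a related l1-penalized formulation (not exactly the elastic-net algorithm of Zou et al.), which is enough to contrast dense and sparse loadings:

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

# X: data matrix (n observations x p variables) with centered columns
X = np.random.default_rng(0).standard_normal((50, 128))
X -= X.mean(axis=0)

pca = PCA(n_components=6).fit(X)
spca = SparsePCA(n_components=6, alpha=1.0, random_state=0).fit(X)

dense_loadings = pca.components_      # typically all entries non-zero
sparse_loadings = spca.components_    # many entries exactly zero
print(np.mean(sparse_loadings == 0))  # fraction of zeroed loading coefficients
```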

Figure 15. Results for the Load L1 using Independent Component Analysis: (a) comparison between P1 and P6 (b) comparison between P1 and P7

Figure 16. Results for the Load L1 using Independent Component Analysis: (a) comparison between P1 and P8 (b) comparison between P1 and P9

4.1 Application of Sparse Principal Component Analysis for feature extraction of rotating machinery vibration signals
The first experiment consisted in constructing the input data matrix X from 50 fragments of the signal, each with a length of 128 samples. This matrix is decomposed into six PCs using conventional PCA and Sparse PCA (SPCA). Figure 18 shows the third principal component (PC3) out of six, estimated using PCA. Figure 19 shows the same component estimated from the same data, in this case using the Sparse PCA algorithm. The sparse PC3 retains only a small number of non-zero coefficients. The figures make evident that the peaks retained in the sparse PC are also present in the dense solution obtained with PCA; however, the sparse PC could not be obtained by simple thresholding.

Applications of SPCA to the detection of damage in rotating machinery have been previously reported by Bartkowiak and Zimroz.9 The application of this technique involves two stages. The first consists in selecting a set of signals for analysis and estimating the periodogram of each signal. The second stage consists in performing the SPCA decomposition to obtain the sparse principal components, which enable the calculation of the features used for recognition. In this case the vibration signal is divided into segments of 6250 samples; for each segment the periodogram is estimated and 256 samples of the periodogram are retained to construct the input matrix X of size 80 \times 256. This matrix is then decomposed using SPCA, and three features are calculated: the adjusted variance of the first sparse PC, the square norm of the first sparse PC, and the square norm of the variance of the standard PCA. Results of the analysis are shown in Figures 20 to 24. There is a good separation between classes P1 and P9, P1 and P10, P1 and P5, and P1 and P3. For the remaining classes, even when most of the points are separated, there is an important degree of overlap between the classes considered.

Figure 17. Results for the Load L1 using Independent Component Analysis: comparison between P1 and P10

Figure 18. The third Principal Component out of six estimated using ordinary PCA
A second experiment was performed considering the load L1, but in this case all repetitions, velocities and voltages were used to construct the input matrix X, together with both accelerometer signals, following a procedure similar to the one used for the independent component analysis. Twenty principal components were estimated and the following three features were calculated: the norm of the variance for the SPCA, the norm of the variance for the PCA, and the norm of the matrix spanning the first three principal components. Using this procedure only 6 points are obtained for each class under the L1 load. In this experiment the Sparse PCA is applied directly to the samples of the input signals, without calculating periodograms. Results are shown in Figures 25 to 29. The selected features allow a good separation between classes, except for the classes P7, P8 and P10, where there is no clear separation. This technique shows in general promising results; however, there is still considerable work to be done in improving the results and optimizing the tuning of parameters.

Figure 19. The third Principal Component out of six estimated using Sparse PCA

Figure 20. Results for the Load L1 using Sparse Principal Component Analysis: (a) comparison between P1 and P2 (b) comparison between P1 and P3

5. SPARSE DICTIONARIES FOR VIBRATION SIGNAL ANALYSIS

Signals involve important amounts of data in which the relevant information is difficult to handle. Processing is faster and simpler in a sparse representation, where a small set of coefficients reveals the information we are looking for. Such representations can be constructed by decomposing signals over elementary waveforms chosen from a family called a dictionary. Such a dictionary could consist of complex sinusoids as in the DFT case, orthogonal wavelets, cosines or any other transformation. These methods lead to efficient processing techniques. A dictionary of minimum size that can yield a sparse representation forms an orthogonal basis. This type of dictionary is usually designed to concentrate the signal energy over a set of few vectors. Such dictionaries enable efficient signal compression and denoising techniques. There are several applications of sparse representations, including compression, regularization of inverse problems and feature extraction.

Figure 21. Results for the Load L1 using Sparse Principal Component Analysis: (a) comparison between P1 and P4 (b) comparison between P1 and P5

Figure 22. Results for the Load L1 using Sparse Principal Component Analysis: (a) comparison between P1 and P6 (b) comparison between P1 and P7

Figure 23. Results for the Load L1 using Sparse Principal Component Analysis: (a) comparison between P1 and P8 (b) comparison between P1 and P9

Figure 24. Results for the Load L1 using Sparse Principal Component Analysis: comparison between P1 and P10

Figure 25. Results for the Load L1 using Sparse Principal Component Analysis on the original signals: (a) comparison between P1 and P2 (b) comparison between P1 and P3

5.1 Sparse representation using dictionaries
A sparse representation of a signal is attained by combining only a few elementary components, known as atoms, that are extracted from a given redundant matrix known as a dictionary. An illustration of a sparse representation is shown in Figure 30. The observed data b can be represented as a linear combination of only a few columns of the matrix A, which is the dictionary and whose columns are the atoms. The linear combination of atoms is weighted by the non-zero entries of the sparse vector x. A noise-free model corresponding to a sparse representation can be written as:

    b = Ax        (16)

where the dictionary A \in R^{n \times K} contains K prototype signal atoms as its columns, \{a_j\}_{j=1}^{K}, and the signal b \in R^{n} is a combination of several of these column vectors, weighted by the sparse coefficients of the vector x \in R^{K}. When n < K and A is a full-rank matrix, the problem is underdetermined and it is necessary to impose constraints on the solution. An appealing representation is obtained by constraining the solution to have only a small number of non-zero coefficients. This sparse representation is the solution of:

    (P_{0,\epsilon}):  \min_x \|x\|_0  subject to  \|b - Ax\|_2 \le \epsilon        (17)

where \|\cdot\|_0 is the l0 norm, counting the non-zero entries of a vector.

Figure 26. Results for the Load L1 using Sparse Principal Component Analysis on the original signals: (a) comparison between P1 and P4 (b) comparison between P1 and P5

Figure 27. Results for the Load L1 using Sparse Principal Component Analysis on the original signals: (a) comparison between P1 and P6 (b) comparison between P1 and P7

Figure 28. Results for the Load L1 using Sparse Principal Component Analysis on the original signals: (a) comparison between P1 and P8 (b) comparison between P1 and P9

Figure 29. Results for the Load L1 using Sparse Principal Component Analysis on the original signals: comparison between P1 and P10
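The report does not name a specific solver for (P0); a standard greedy approximation is Orthogonal Matching Pursuit, sketched here (as an illustration only) with scikit-learn on a synthetic overcomplete dictionary:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

# Synthetic overcomplete dictionary (n < K) and an s-sparse signal (eq. 16)
rng = np.random.default_rng(0)
n, K, s = 64, 256, 5
A = rng.standard_normal((n, K))
A /= np.linalg.norm(A, axis=0)            # normalize the atoms
x_true = np.zeros(K)
x_true[rng.choice(K, s, replace=False)] = rng.standard_normal(s)
b = A @ x_true                            # observed signal b = Ax

# Greedy approximation of (P0) with at most s non-zero coefficients
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=s, fit_intercept=False).fit(A, b)
x_est = omp.coef_                         # recovered sparse coefficient vector
print(np.count_nonzero(x_est))            # at most s non-zero entries
```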


The overcomplete dictionary A that enables a sparse representation can either be chosen as a prespecified set of functions, or be learned, i.e. adapted to fit a set of selected signal examples.10 Choosing a prespecified dictionary, such as a Fourier basis, a wavelet basis, a discrete cosine basis or even an SVD basis, is easier. Combinations of dictionaries are also possible, for instance Fourier plus wavelet bases. The alternative approach is dictionary learning, where the dictionary is learned from training data, aiming at capturing specific features of the signals.11
5.1.1 Training of dictionaries
Given a set of signal examples Y = \{y_i\}_{i=1}^{N}, the goal is to estimate the dictionary A that enables the construction of the signal examples through sparse combinations of atoms. This problem can be formulated as follows:

    \min_{A,X} \|Y - AX\|_F^2  subject to  \|x_i\|_0 \le s  for all i        (18)

where the x_i are the sparse representations of the training vectors. Dictionary learning methods proceed in two alternating stages. First, the dictionary is fixed and a sparse decomposition x_i is found for each signal example. Second, the dictionary is updated assuming known and fixed coefficients x_i. Many dictionary learning algorithms are available, such as the well-known K-SVD algorithm,10, 12 which follows the two stages described.
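K-SVD itself is not available in scikit-learn; as a hedged substitute, MiniBatchDictionaryLearning (assuming a recent scikit-learn, >= 1.1 for the max_iter argument) implements the same alternating scheme with an l1 sparsity penalty in place of the l0 constraint of eq. 18. Note that scikit-learn arranges training examples and atoms as rows, transposing the column convention used above:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Illustrative substitute for K-SVD: alternating sparse coding / dictionary
# update with an l1 penalty. Training segments are arranged as rows here.
Y = np.random.default_rng(0).standard_normal((1000, 128))

dl = MiniBatchDictionaryLearning(n_components=256, alpha=1.0,
                                 max_iter=10, random_state=0)
X_sparse = dl.fit_transform(Y)    # sparse codes x_i for each training example
A_learned = dl.components_        # learned dictionary atoms (as rows)
```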

5.2 Application of Dictionary learning for feature extraction of rotating machinery vibration signals
The first test performed consisted in training a dictionary. The training matrix Y is constructed by subdividing the vibration signal into 1024 segments of 480 samples and arranging each segment as a column of the matrix Y of size 1024 \times 480. The training procedure is started with an initial dictionary constructed using cosine basis functions. The number of iterations was set to 10 to obtain the trained dictionary A of size 1024 \times 480. The vibration signal considered was R1F1L1P10; a fragment of 10000 samples of this signal is shown in Figure 31. The evolution of the error is shown in Figure 32. Using equation 16 it is possible to obtain an estimation of the original training data, denoted \hat{b}. The same fragment of 10000 samples of the estimated signal \hat{b} is shown in Figure 33. The reconstructed signal is visually close to the original one. The frequency content of the original and reconstructed signals can be assessed using the periodogram: the power spectrum of the original signal is shown in Figure 34, and that of the signal reconstructed with the trained dictionary in Figure 35. Even when there are subtle differences between the power spectra of the two signals, they are quite similar.

Figure 30. A sparse representation using an overcomplete dictionary

Figure 31. A fragment of 10000 samples of Signal R1F1L1P10

Figure 32. Evolution of the error during the training procedure

Figure 33. The fragment of 10000 samples of the reconstructed or estimated signal using the trained dictionary

The second test consisted in training three dictionaries using all the files for the case P1 with the load L1. The first dictionary is trained using only accelerometer signal 1, the second dictionary using accelerometer signal 2, and the third dictionary using a combination of signal 1 and signal 2; in this last case the matrix Y is constructed by taking one column from signal 1 and the next column from signal 2. After training the dictionaries, the second stage is the calculation of features for failure detection. This calculation is based on equation 16, where the dictionary A is known, as well as a signal that can be arranged in a matrix b. The sparse coefficients of the representation x are obtained as:

    x = A^{-1} b        (19)

where, since A is not square in general, A^{-1} is understood as the Moore-Penrose pseudo-inverse of the dictionary.
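A minimal sketch of this feature (added for illustration), assuming a trained dictionary A with atoms as columns and a matrix B holding test segments as columns; for a non-square A the inverse is taken as the Moore-Penrose pseudo-inverse:

```python
import numpy as np

def coefficient_norm_feature(A_learned, B):
    """Square norm of the coefficient matrix x = A^{-1} b (eq. 19)."""
    X = np.linalg.pinv(A_learned) @ B    # pseudo-inverse for non-square A
    return np.sum(X ** 2)                # square norm used as the feature
```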

Figure 34. Power spectrum of the original signal

Figure 35. Power spectrum of the reconstructed signal

The hypothesis is that when the signal to be tested comes from the training set P1, the sparse representation is optimal and the matrix x is highly sparse and of minimum norm. On the other hand, when the signal to be tested comes from another class (P2 to P10), the matrix x obtained does not correspond to an optimal sparse representation; in consequence, its square norm is larger than in the normal case. This test is performed by taking each record of the database and constructing three matrices b1, b2 and b3. The first matrix is constructed with segments extracted from signal 1, the second with segments extracted from signal 2, and the third with fragments from both signals. This allows one to estimate three features as the norms of the sparse coefficients x for each b. Results from this second test are shown in Figures 36 to 40. The results in general are good, as both classes are separated, except for P6 and P7 in Figure 38 where, even when the classes are separated, some points are misclassified.

Figure 36. Results for the Load L1 using Sparse Dictionaries trained with signals from P1: (a) comparison between P1 and P2 (b) comparison between P1 and P3

Figure 37. Results for the Load L1 using Sparse Dictionaries trained with signals from P1: (a) comparison between P1 and P4 (b) comparison between P1 and P5

A third test was performed using the same dictionaries trained for the second test with the files of the normal case P1 and load L1. The difference in this case is the feature calculation: all files of each class with load L1 are taken to construct the matrices that are tested against the dictionaries. The matrices are constructed using a procedure similar to the one used in the previous test: matrix b1 is constructed with segments of signal 1 from all files with L1, matrix b2 with segments extracted from signal 2 from files with L1, and b3 with segments extracted from both signals. The coefficient matrices x1, x2 and x3 are estimated and their square norms are the features used for classification. In this case a set of three features is obtained for each class considering the load L1. Results are shown in Figure 41. The abnormal classes are clearly separated from the normal class P1 (green diamond). An important observation is that some classes are clustered together: the first cluster (closest to P1) is formed by P7, P8 and P10, and the second cluster is formed by P2, P4 and P9. The other classes are well separated: the next is P6, then P5, and finally P3. These results are promising because the studied techniques could be useful not only for failure detection but also for classifying the type of failure.

Figure 38. Results for the Load L1 using Sparse Dictionaries trained with signals from P1: (a) comparison between P1 and P6 (b) comparison between P1 and P7

Figure 39. Results for the Load L1 using Sparse Dictionaries trained with signals from P1: (a) comparison between P1 and P8 (b) comparison between P1 and P9

Figure 40. Results for the Load L1 using Sparse Dictionaries trained with signals from P1: comparison between P1 and P10

Figure 41. Results for the Load L1 using Sparse Dictionaries trained with signals from P1: all files in each class with load L1 are tested against the dictionary to obtain the features used.

ACKNOWLEDGMENTS
This work was supported in part by the Research Department of the University of Cuenca (DIUC) and the Research Department of the Universidad Politecnica Salesiana de Cuenca, Ecuador.

REFERENCES
[1] Sharma, A., Amarnath, M., and Kankar, P., "Feature extraction and fault severity classification in ball bearings," Journal of Vibration and Control 22(1), 176-192 (2016).
[2] Newell, K. and Hancock, P., "Forgotten moments: A note on skewness and kurtosis as influential factors in inferences extrapolated from response distributions," Journal of Motor Behavior 16(3), 320-335 (1984).
[3] Petz, D. and Hiai, F., "Logarithmic energy as an entropy functional," Contemporary Mathematics 217, 205-221 (1998).
[4] Hyvärinen, A. and Oja, E., "Independent component analysis: algorithms and applications," Neural Networks 13(4), 411-430 (2000).
[5] Zhou, W. and Chelidze, D., "Blind source separation based vibration mode identification," Mechanical Systems and Signal Processing 21(8), 3072-3087 (2007).
[6] Clifford, G., "Singular value decomposition & independent component analysis for blind source separation," HST582J/6.555J/16.456J, Biomedical Signal and Image Processing (2005).
[7] Cadima, J. and Jolliffe, I. T., "Loading and correlations in the interpretation of principal components," Journal of Applied Statistics 22(2), 203-214 (1995).
[8] Zou, H., Hastie, T., and Tibshirani, R., "Sparse principal component analysis," Journal of Computational and Graphical Statistics 15(2), 265-286 (2006).
[9] Bartkowiak, A. and Zimroz, R., "Sparse PCA for gearbox diagnostics," in [Computer Science and Information Systems (FedCSIS), 2011 Federated Conference on], 25-31, IEEE (2011).
[10] Aharon, M., Elad, M., and Bruckstein, A., "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing 54(11), 4311-4322 (2006).
[11] Elad, M. and Aharon, M., "Image denoising via learned dictionaries and sparse representation," in [2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)], 1, 895-900, IEEE (2006).
[12] Rubinstein, R., Zibulevsky, M., and Elad, M., "Double sparsity: Learning sparse dictionaries for sparse signal approximation," IEEE Transactions on Signal Processing 58(3), 1553-1564 (2010).
