
J Braz. Soc. Mech. Sci. Eng.
DOI 10.1007/s40430-016-0540-8

TECHNICAL PAPER

Novel ensemble techniques for classification of rolling element bearing faults

Aditya Sharma · M. Amarnath · Pavan Kumar Kankar

Received: 30 January 2016 / Accepted: 7 April 2016
© The Brazilian Society of Mechanical Sciences and Engineering 2016

Abstract  Rolling element bearings are critical mechanical components in industrial applications. These rotary elements work continuously under different operating and environmental conditions, which leads to the generation of various defects over the operating surfaces of the bearing components. In this study, four machine learning techniques are utilized for fault severity classification: two ensemble techniques, i.e., rotation forest and random subspace, and two well-established techniques, i.e., support vector machine and artificial neural network. Time domain, frequency domain and wavelet-based features are extracted from the raw vibration signals of rolling element bearings for the investigations. Four feature ranking techniques are employed to rank the extracted features, and a comparative study is carried out among the machine learning techniques with and without ranking of the features. The present study not only investigates various machine learning techniques but also examines the performance of various feature ranking techniques for fault severity classification of rolling element bearings. Results show that the ensemble techniques have superior classification efficiency and require much less computational time for the analysis in comparison to the support vector machine and artificial neural network.

Keywords  Ensemble techniques · Feature ranking · Fault severity classification · Rolling element bearings

Technical Editor: Kátia Lucchesi Cavalca Dedin.

* Aditya Sharma
[email protected]

1 Machine Dynamics and Vibrations Lab, Mechanical Engineering Discipline, PDPM Indian Institute of Information Technology, Design and Manufacturing Jabalpur, Jabalpur 482005, India

1 Introduction

Rolling element bearings are essential components of almost all rotary machines, and the operation of rotating machines is closely linked to their bearings. Studies have reported that more than 40 % of the failures of rotary machines happen due to bearing failure [1]. Prolonged operation of a bearing results in the generation of various localized and distributed defects. Over a period of time, under numerous fatigue loads, these defects may cause catastrophic failure of the bearing and the associated system.

A variety of data can be collected for the health monitoring of bearings, including vibration, acoustic emission, lubricant properties, current, voltage, etc. Among them, vibration-based data and techniques are preferred due to their sensitivity [2]. Most of the work in the literature focuses on fault diagnosis of rolling element bearings [3–6]; however, few attempts have been made towards fault severity classification [7–16]. Fault severity classification of rolling element bearings is one of the most challenging and critical diagnostic activities.

Various authors have classified localized defects in different components of rolling element bearings with high classification accuracy. Sugumaran and Ramachandran [7] used various statistical features, viz. standard deviation, standard error, range, etc., together with histogram features for the analysis. The authors utilized a support vector machine (SVM) for the prediction of faults in the inner race and outer race and summarized that, besides statistical features, histogram features are also good indicators for fault classification. Guo et al. [8] utilized genetic programming (GP) for the extraction of temporal features from the raw vibration signals of bearings and extracted shock pulse, crest factor and kurtosis values for the analysis. The authors employed artificial neural networks (ANNs) and SVM for the


investigations and proposed that the GP-based feature extraction methodology enhances the fault classification performance of ANN and SVM.

Wu et al. [9] and Vakharia et al. [10] developed a fault classification framework based on multiscale permutation entropy (MPE) and SVM. The authors performed a comparative study among various measures of signal complexity, viz. permutation entropy, multiscale entropy and MPE, and concluded that MPE provides better categorization between various complex vibration signals. Wang et al. [11] classified the faults in rolling element bearings using a hyper-sphere-structured multiclass support vector machine (HSSMC-SVM). The authors summarized that the empirical mode decomposition (EMD) approach is an effective scheme for fault classification and requires less time. Abbasion et al. [12] investigated wavelet denoising and SVM for the classification of faults in bearings. The authors concluded that the proposed methodology provides higher accuracy and can be applied to other critical components for fault classification. Zhang et al. [13] proposed a fault classification methodology based on kernel principal component analysis (KPCA) and a particle swarm optimization support vector machine (PSO-SVM). The authors extracted various time domain and frequency domain features from the collected raw vibration signals and summarized that the proposed framework can provide satisfactory results with a small number of features.

Faults of various severities in the same bearing component generate the same characteristic frequencies. This makes the fault severity classification process more difficult and demanding. Moreover, fallacious defect severity classification misleads the maintenance strategies. Zarei et al. [14] proposed a two-step approach for the fault classification of rolling element bearings: in the first step, the authors removed the noise from the signal; in the second step, temporal features are extracted for the classification of faults. Results showed that the proposed methodology provides better accuracy and improves reliability with low-quality signals. Liu et al. [15] classified the faults in all components of the bearing using a wavelet support vector machine (WSVM) and the PSO algorithm. The authors utilized EMD for the preprocessing of vibration signals and extracted various statistical features. The study concluded that the proposed methodology performs satisfactorily even with a small number of features due to the better generalization capability of WSVM over conventional SVM. Wen et al. [16] utilized and compared the classification performance of the k-nearest neighbor classifier (KNNC), back-propagation neural networks (BPNNs) and SVM. The authors considered various defect sizes, viz. 0.3, 0.6 and 1.0 mm, for the investigations. The study employed multiscale general fractal dimensions (MGFDs) of the bearing vibration signals and concluded that the proposed approach can effectively classify the defect severities in various operating conditions.

To classify faults, the selection of appropriate features is very crucial because it affects not only the classification performance of the algorithm but also the computational effort. The literature also reveals the use of various feature ranking techniques for selecting the best possible features. Feature ranking techniques improve the computational efficiency and rank the features based on their information content about the states of the system; the feature carrying the most significant information is ranked at the top. A variety of feature ranking techniques have been used and reported in the literature. Zhao et al. [17] conducted an extensive study and proposed various feature ranking techniques such as Chi-square, information gain (IG) and ReliefF, and also compared the performance of these ranking techniques. Samanta et al. [18] utilized a genetic algorithm (GA) for the selection of appropriate features for bearing fault diagnosis. The authors employed ANN for the analysis and summarized that significant improvements can be obtained using the proposed feature selection technique.

Kappaganthu and Nataraj [19] proposed the concept of mutual information for the ranking of features. The authors conducted experimental investigations and found that the proposed approach is helpful in finding the most appropriate features, which significantly improves fault diagnosis efficiency. Sugumaran et al. [20] used SVM and a proximal support vector machine (PSVM) for the fault diagnosis of rolling element bearings using statistical measures. The authors utilized a decision tree technique for the selection of optimal features. The study summarizes that SVM and PSVM show superior performance in the presence of optimally selected features. Malhi and Gao [21] proposed a principal component analysis-based framework for the selection of appropriate features.

The literature highlights that SVM and ANN have been extensively used for the classification of bearing faults over the past decades. Various improvements to the aforementioned machine learning techniques have also been proposed from time to time to increase their precision, applicability and intelligence. In addition, a variety of hybrid techniques have been suggested for fault severity classification; these hybrid techniques are complicated and require more computational time for the analysis. This motivated the authors to use and evaluate the performance of some other classifiers. In this study, two ensemble techniques, rotation forest (RF) and random subspace (RS), are explored for fault severity classification of rolling element bearings. To improve the classification accuracy, four feature ranking techniques are utilized. Vibration signals of healthy and defective bearings have been extracted for the analysis [22]. A feature vector has been constituted from time domain, frequency domain and wavelet-based


features. The authors have compared the results of the rotation forest and random subspace techniques with techniques extensively used in the literature, i.e., SVM and ANN.

2 Machine learning techniques

In this study, four machine learning techniques are utilized; these are summarized as follows.

2.1 Rotation forest

A variety of ensemble classifiers such as boosting, bagging and random forest have been developed to improve the performance of weak machine learners like trees. Ensemble classifiers generally provide more accurate classification in less time and are less susceptible to noise. Rotation forest (RF) is a relatively new ensemble technique that utilizes multiple classifiers instead of a single classifier. RF has been used in various applications such as biomedicine [23], geosciences [24] and pattern recognition [25] due to its higher computational efficiency and fewer non-coincident errors.

This ensemble classification technique uses principal component analysis (PCA) as a feature extraction technique. In this framework, the training dataset is randomly divided into various subsets and feature extraction is applied to each subset. The training of an individual classifier involves the following steps, which follow the developers of the classifier [26].

Let p = [p1, … , pn]^T be a data point represented by n features and let P be the data set of training objects in the form of an N × n matrix. Let Q be a vector with class labels for the data, Q = [q1, … , qn]^T, where qi takes a value from the set of class labels [h1, … , hc]. Denote the classifiers in the ensemble by A1, … , AL and the feature set by f. Most ensemble classifiers require L to be chosen in advance, and all classifiers can be trained in parallel, as in other ensemble methods. The following steps are carried out to construct the training set for classifier Ai.

Step 1  Split the feature set f randomly into R subsets to maximize diversity, where R is a factor of n, so that each feature subset contains M = n/R features.

Step 2  Designate the jth subset of features for the training set of classifier Ai as fij. For each subset, randomly select a nonempty subset of classes and draw a bootstrap sample of objects of size three-fourths of the input dataset. PCA is applied only to the M features in fij and the selected subset of P. Preserve the coefficients of the principal components as t_ij^(1), … , t_ij^(Mj); each of these Mj coefficients has size M × 1. PCA is performed on a subset of the dataset P rather than on the entire set, even when the same feature subset is selected for different classifiers.

Step 3  In the last step, organize the obtained coefficient vectors in a sparse rotation matrix Tj. Here, the jth set of coefficients of the principal components is sj = [t_ij^(1), t_ij^(2), … , t_ij^(Mj)], i = 1, 2, … , L; j = 1, 2, … , R, and [0] indicates a zero matrix of the same dimension as vj:

     ⎡ v1   [0]  ⋯  [0] ⎤
Tj = ⎢ [0]  v2   ⋯  [0] ⎥        (1)
     ⎢  ⋮    ⋮   ⋱   ⋮  ⎥
     ⎣ [0]  [0]  ⋯  vR  ⎦

2.2 Random subspace

Similar to rotation forest, random subspace (RS) is an ensemble classifier, proposed by Ho [27]. The RS classifier creates a decision tree-based ensemble which improves the generalization accuracy while preserving the highest accuracy on the training data. It builds a partial C4.5 decision tree during each iteration and turns the best leaf into a rule [28]. The various steps of the RS method are summarized as follows.

Consider a training sample set P = (P1, P2, … , Pn), each training object Pi (i = 1, … , n) being an x-dimensional vector Pi = (pi1, pi2, … , pix). In the RS method, one randomly selects f < x features from the x-dimensional data set P. This provides an f-dimensional random subspace of the original x-dimensional feature space and generates the modified training set P^m = (P1^m, P2^m, … , Pn^m) of f-dimensional training objects Pi^m = (p_i1^m, p_i2^m, … , p_if^m) (i = 1, … , n), where the f components p_ij^m (j = 1, … , f) are randomly selected from the x components p_ij (j = 1, … , x) of the training vector Pi. Finally, the classifier is constructed in the random subspace P^m and the results are combined by simple majority voting. The organization of the RS method is as follows.

Step 1  Repeat for m = 1, 2, … , M:

(a) Select an f-dimensional random subspace P^m from the original x-dimensional feature space P.
(b) Build a classifier C^m(p) in P^m.

Step 2  Merge the classifiers C^m(p), m = 1, 2, … , M, by simple majority voting into a final decision:

A(p) = arg max_{y ∈ {−1,1}} Σ_m δ_{sgn(C^m(p)), y}        (2)

where δ_{i,j} is the Kronecker symbol and y ∈ {−1, 1} is the class label.

RS has various advantages over single classifiers. When the number of training examples is small relative to the dimensionality of the data, the RS technique alleviates the small sample problem by constructing the classifier


in the random subspace. In this process, the subspace dimensionality is smaller than that of the original feature space, while the number of training objects remains the same. The classification accuracy is superior even in the presence of many redundant features.

2.3 Support vector machine

Support vector machine (SVM) is a supervised machine learning technique based on the structural risk minimization principle derived in statistical learning theory. SVM is widely used for classification and regression problems due to its high generalization performance, robustness, ability to model non-linear relationships and potential to handle very large and small sample cases [29].

Basically, SVM deals with the two-class problem and can be formulated as the following optimization problem:

Minimize  (1/2)‖W‖² + C Σ_{i=1}^{n} ξi        (3)

Subject to  ri(W^T si + q) ≥ 1 − ξi,  ξi ≥ 0,  i = 1, 2, … , n        (4)

where C is the penalty parameter, ξi is a slack variable, n is the number of samples, q is the bias term and (ri, si) is the data set.

2.4 Artificial neural network

Artificial neural network (ANN) is a group of specially interconnected artificial nodes, called neurons. These neurons use a computational model for information processing: they collect a weighted set of inputs and respond correspondingly. The information is sent from the input neuron(s) to the output neurons through hidden neurons. ANN is an adaptive system that changes its structure according to the information that flows through the network. It is widely used to solve problems that are difficult or impossible to solve by standard computational and statistical methods, such as pattern recognition and fault classification and detection, due to its ability to extract useful information from complex data.

An artificial neuron consists of synapses, a summing function and an activation function. Mathematically, a neuron can be represented as:

K = Z( Σ_{i=1}^{j} wi xi + q )        (5)

where j is the number of elements and wi is the interconnection weight of input vector xi.

In the present study, the aforementioned machine learning techniques are assessed with their default parameters: RF used the J48 classifier and RS utilized the REPTree classifier, while SVM used the sequential minimal optimization training algorithm with a polynomial kernel and ANN employed the back-propagation algorithm with a sigmoid activation function.

3 Feature ranking techniques

Various features are extracted from the raw vibration signals of the test bearings to correlate them with bearing health. However, not all features are equally important and some of them may be redundant. It should be noted here that features carrying less information for a specific application may not be poor indicators in all applications. Thus, feature ranking is a very important task in condition monitoring. The objective of feature ranking techniques is to rank the features based on information and physical spacing. In this study, four feature ranking techniques, Chi-square, information gain (IG), gain ratio (GR) and ReliefF, are employed to rank the extracted features. These techniques are summarized as follows [30].

3.1 Chi-square

Chi-square is a commonly used feature ranking technique based on the χ²-statistic. It evaluates the importance of a feature independently with respect to the class by calculating the Chi-squared statistic; a higher Chi-squared value of a feature indicates a more relevant feature with respect to the class. Mathematically, the Chi-squared value of a feature is calculated as:

χ² = Σ_{i=1}^{P} Σ_{j=1}^{Q} (Rij − Ti·Qj/N)² / (Ti·Qj/N)        (6)

where P and Q are the number of intervals and of classes, respectively, N is the number of instances, Ti and Qj indicate the number of instances in the ith interval and the jth class, respectively, and Rij represents the number of instances in the ith interval belonging to the jth class.

3.2 Information gain

Information gain is an entropy-based feature ranking technique. It is calculated by estimating the overall entropy of a feature. It calculates the usefulness of a feature by


evaluating the performance of the feature randomly in its absence or presence, and is expressed as:

IG = unsplit information − split information        (7)

3.3 Gain ratio

Gain ratio is a modification of the information gain technique. In information gain, features with a large number of values tend to be ranked higher; in gain ratio, features are ranked by maximizing the feature's information gain while minimizing the number of its values. A higher gain ratio value of a feature indicates its higher ranking in a feature set. Mathematically, it is expressed as:

GR = IG / split information        (8)

3.4 ReliefF

ReliefF is a supervised feature ranking technique. Usually, it is employed in data preprocessing as a feature subset selection method. The basic idea behind ReliefF is to evaluate the worth of an attribute by repeatedly considering an instance and taking the value of the given attribute for the nearest instance of the same and of a different class. It is defined for the two-class problem but can also be utilized for multiclass problems. It can be represented as:

RF(Zi) = (1/2) Σ_{i=1}^{N} [ p(Zt,i − Zdc(xi)) − p(Zt,i − Zsc(xi)) ]        (9)

where Zdc(xi) and Zsc(xi) indicate the value of the ith feature of the nearest point to xi with a different and the same class label, respectively.

4 Experimental setup and procedure

The classification performances of the various machine learning techniques are examined over vibration data obtained from healthy and defective bearings. The vibration data for the various health states of the bearing analysed in this study are collected from the Case Western Reserve University Bearing Data Centre website [22] with the kind permission of Prof. Loparo. Figure 1 shows a schematic diagram of the experimental test setup, which consists of a 2 HP three-phase induction motor, an encoder and a dynamometer. Due to the broader set of configurations of the drive-end side test bearing signals, the drive-end data are selected for the investigations. The physical parameters of the drive-end side bearings are listed in Table 1. The drive-end side of the test bearing is loaded by the dynamometer. The vibration signals are acquired using an accelerometer with a magnetic base, mounted at the top of the bearing housing.

Fig. 1  Schematic of the experimental setup

Table 1  Drive-end side bearing specifications

Parameter                Physical value
Bearing specification    6205-2RS JEM
Inner race diameter      25 mm
Outer race diameter      52 mm
Width                    15 mm
Ball diameter            7.94 mm
Pitch circle diameter    39.04 mm
Contact angle            0°

A variety of localized faults are generated in the bearing components using electro-discharge machining (EDM). Schematic representations of the localized faults in the various bearing components are depicted in Fig. 2.

The setup was operated at various speeds. The healthy bearing data are considered as baseline data in the analysis and the sampling frequency is kept at 48 kHz per channel. The specific localized faults in bearing components considered in this study for the investigations are:

(i) bearing having a ball defect of 0.1778 mm (BF0),
(ii) bearing having a ball defect of 0.3556 mm (BF1),
(iii) bearing having a ball defect of 0.5334 mm (BF2),
(iv) bearing having a ball defect of 0.7112 mm (BF3),
(v) bearing having an inner race defect of 0.1778 mm (IF0),
(vi) bearing having an inner race defect of 0.3556 mm (IF1),
(vii) bearing having an inner race defect of 0.5334 mm (IF2),
(viii) bearing having an inner race defect of 0.7112 mm (IF3),
(ix) bearing having an outer race defect of 0.1778 mm (OF0),
(x) bearing having an outer race defect of 0.3556 mm (OF1),
(xi) bearing having an outer race defect of 0.5334 mm (OF2).

In addition to the eleven above-mentioned bearing conditions, the healthy bearing condition (HTY) is also considered for the analysis.

5 Feature extraction

Effective condition monitoring and fault classification of rolling element bearings can be accomplished by extracting


Fig. 2  Representations of defects in bearing components. a Ball defect, b inner race defect and c outer race defect

Table 2  Extracted features

Domain               Features
Time                 Skewness, kurtosis, standard deviation (STD), crest factor, normalized fifth central moment (NFCM), normalized sixth central moment (NSCM)
Frequency (via FFT)  Peak value, RMS frequency value, root variance frequency, spectral centroid, spectral roll-off
Wavelet              Skewness (WSK), kurtosis (WKU), standard deviation (WSTD), crest factor (WCF)

meaningful features. For stable conditions, time domain features are good indicators for monitoring bearing health. Samanta et al. [18, 31] used the higher moments of the time domain for fault diagnosis of rolling element bearings and summarized that higher moments provide good separability between healthy and defective bearings and good classification accuracy. They also concluded that, although the higher moments are good indicators for fault diagnosis, moments higher than the sixth order do not have any significant effect on the diagnosis. On the other hand, frequency domain features have the advantage of locating the fault on the various components of the bearing. Combining the features of various domains, however, increases the investigation accuracy. In the present study, for a more informative analysis, various wavelet-based features are extracted from the raw vibration signals besides the time domain and frequency domain features. These features are listed in Table 2.

For the extraction of features from the wavelet domain, the complex Morlet wavelet is selected. In previously published research, real-valued wavelets have been extensively used for bearing fault diagnosis. A rotor bearing system is a highly non-linear dynamical system and is highly sensitive to different parameters such as radial internal clearance, type of defect (distributed or localized), defect severity and the number of rolling elements carrying load. Therefore, a real-valued wavelet will not necessarily give better features for bearing fault diagnosis in all conditions, whereas a complex-valued (analytic) wavelet is necessary to separate the phase and amplitude information of the vibration signals, which are non-stationary and non-linear in nature. The authors have therefore used the complex Morlet wavelet for feature extraction. During operation, the impact between a defect and the components of the bearing generates an impulse of energy at regular intervals. Features corresponding to the frequency of these impulses carry information about the defect in the respective bearing component. To obtain relevant information from the wavelet coefficients, the scale having maximum energy is considered as the candidate for feature extraction. The energy of the wavelet coefficients at each resolution is given by:

E(n) = Σ_{j=1}^{N} |Cn,j|²        (10)

where N is the number of wavelet coefficients, n is the level of resolution and Cn,j is the jth wavelet coefficient of the nth scale.

Fig. 3  Plots between maximum wavelet energy and scale number for healthy bearing with four operating conditions
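As an illustration of the scale-selection step of Eq. (10) and the wavelet features of Table 2, the procedure can be sketched in Python. This is a minimal NumPy sketch, not the authors' implementation: the Morlet centre frequency w0 = 5 and the ±4·scale kernel support are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def complex_morlet(t, w0=5.0):
    """Complex Morlet mother wavelet (illustrative, unnormalized)."""
    return np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)

def scale_energy(signal, scale):
    """E(n) of Eq. (10): sum of squared magnitudes of the wavelet
    coefficients C_{n,j} at one scale."""
    t = np.arange(-4 * scale, 4 * scale + 1) / scale
    kernel = complex_morlet(t) / np.sqrt(scale)
    coeffs = np.convolve(signal, kernel, mode="same")
    return np.sum(np.abs(coeffs) ** 2)

def max_energy_scale_features(signal, scales):
    """Pick the scale with maximum energy and compute the wavelet
    features of Table 2 (WSK, WKU, WSTD, WCF) on |C_{n,j}|."""
    energies = np.array([scale_energy(signal, s) for s in scales])
    best = scales[int(np.argmax(energies))]
    t = np.arange(-4 * best, 4 * best + 1) / best
    coeffs = np.abs(np.convolve(signal, complex_morlet(t) / np.sqrt(best),
                                mode="same"))
    std = coeffs.std()
    centred = coeffs - coeffs.mean()
    return {
        "scale": best,
        "WSK": np.mean(centred ** 3) / std ** 3,              # skewness
        "WKU": np.mean(centred ** 4) / std ** 4,              # kurtosis
        "WSTD": std,                                          # standard deviation
        "WCF": coeffs.max() / np.sqrt(np.mean(coeffs ** 2)),  # crest factor
    }
```

For a raw vibration segment x, max_energy_scale_features(x, range(1, 65)) would return the four wavelet features of Table 2 evaluated at the dominant scale; the time and frequency domain features of Table 2 are computed analogously on the raw signal and its FFT.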


Fig. 4  Plots between maximum wavelet energy and scale number for bearing having 0.1778 mm defect in size in inner race, outer race and ball with four operating conditions. a 1797 rpm, b 1772 rpm, c 1750 rpm and d 1730 rpm
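The relative wavelet energy plotted against scale number in Figs. 3, 4, 5, 6 and 7 is the per-scale energy E(n) of Eq. (10) normalized by the total energy of Eq. (11). A minimal sketch, with illustrative function names:

```python
import numpy as np

def relative_wavelet_energy(scale_energies):
    """Normalize the per-scale energies E(n) by E_total of Eq. (11);
    the returned distribution sums to one over the scales."""
    e = np.asarray(scale_energies, dtype=float)
    return e / e.sum()

def dominant_scale(scales, scale_energies):
    """Scale number carrying the maximum wavelet energy, i.e. the
    candidate scale for feature extraction."""
    return scales[int(np.argmax(scale_energies))]
```

For example, relative_wavelet_energy([2.0, 6.0, 2.0]) gives [0.2, 0.6, 0.2], and the dominant scale is the one holding the 0.6 share of the total energy.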

The total energy can be obtained as follows:

E_total = Σ_n E(n) = Σ_n Σ_j |Cn,j|²        (11)

For the various bearing conditions, the plots between relative wavelet energy and scale number are shown in Figs. 3, 4, 5, 6 and 7.

6 Results and discussions

In the present study, multiple fault severities in all bearing components are considered: four severity levels, i.e., defect sizes, in the ball and inner race, and three severity levels in the outer race. Various temporal, spectral and wavelet-based features corresponding to each level and component are extracted. This forms a feature space having 48 instances, which is fed as input to the various classifiers for the analysis. In this study, the classification performances of all machine learning techniques are examined with tenfold cross validation to generalize the results. The investigations are carried out on a personal computer with a third-generation Core i5 processor, 2 GB memory and a 64-bit operating system. In this process, the classification is first performed without ranked features; it is then carried out with the output of the various feature ranking techniques. Both cases are discussed in the following sections.

6.1 Classification without ranking of features

In this section, the extracted features listed in Table 2 are used. These features are fed to the machine learning techniques to classify the fault severities. A sample training/testing feature vector used for the analysis is shown in Table 3. The detailed numeric predictions of each machine learning technique are shown in Table 4. Out of 48 instances, the ensemble techniques, i.e., RF and RS, correctly


Fig. 5  Plots between maximum wavelet energy and scale number for bearing having 0.3556 mm defect in size in inner race, outer race and ball with four operating conditions. a 1797 rpm, b 1772 rpm, c 1750 rpm and d 1730 rpm

classify 44 and 43 instances, respectively, whereas SVM and ANN each correctly classify only 40 instances. This indicates the higher classification capability of RF and RS over SVM and ANN. Moreover, the ensemble techniques take much less time, the errors are relatively very small, and the value of the Kappa statistic for RF and RS is close to 1, which indicates better classification of the defect severities.

6.2 Classification with ranked features

Four feature ranking techniques, i.e., Chi-square, IG, GR and ReliefF, are employed to rank the extracted features. Table 5 shows the ranking of features corresponding to the various feature ranking techniques. Since no agreement is observed among the feature ranking techniques, the performance of all of them is evaluated. The ranked features are then fed as input to the machine learning techniques for fault severity classification.

The detailed numeric predictions of all machine learning techniques corresponding to the Chi-square feature ranking technique are summarized in Table 6. It can be noticed from Table 6 that the classification performance of all machine learning techniques is enhanced by the Chi-square feature ranking technique. Out of the four machine learning techniques, three, i.e., RF, SVM and ANN, correctly classify all 48 instances. The classification performance of RS also improves with ranked features, although it could not classify all instances correctly. However, as compared to the unranked features, the errors are relatively small and the value of the Kappa statistic also improves.

The results of fault severity classification using the GR feature ranking technique are shown in Table 7. The performance of three machine learning techniques, i.e., RF, SVM and ANN, is similar to that with the Chi-square feature ranking technique, i.e., 100 %. In this case, the performance of RS improves significantly and it correctly classifies all 48


Fig. 6  Plots between maximum wavelet energy and scale number for bearing having 0.5334 mm defect in size in inner race, outer race and ball with four operating conditions. a 1797 rpm, b 1772 rpm, c 1750 rpm and d 1730 rpm

instances, showing perfect classification of the defect severities into their actual classes. The performance of all machine learning techniques using the gain ratio feature ranking technique thus indicates classification of the defect severities without any error, with the ideal value of the Kappa statistic.

Tables 8 and 9 summarize the results of fault severity classification using the IG and ReliefF feature ranking techniques, respectively. The ensemble techniques correctly classify all 48 instances, while SVM and ANN each correctly classify only 47 instances. Although the classification performances of SVM and ANN are improved as compared to the unranked features, these techniques could not classify all instances correctly. Moreover, as compared to the unranked features, the errors are relatively very small and the value of the Kappa statistic also improves. It can also be noticed from Tables 8 and 9 that the performance of the IG feature ranking technique is similar to that of the ReliefF feature ranking technique.

6.3 Quantitative evaluation

To quantify the performance, a quantitative evaluation of all machine learning techniques is performed. In this study, five performance measures are calculated: precision, accuracy, sensitivity, F-measure and the receiver operating characteristics (ROC) value [32]. A brief description of these performance measures is given hereafter.

(a) Precision

Precision is a measure which indicates the consistency of the observations for various iterations over the same dataset:

Precision = TP / (TP + FP)        (12)

where TP (true positive) and FP (false positive) indicate the correctly and incorrectly positive classified instances, respectively.


Fig. 7 Plots between maximum wavelet energy and scale number for a bearing with a 0.7112 mm defect in the inner race and ball at four operating conditions. a 1797 rpm, b 1772 rpm, c 1750 rpm and d 1730 rpm

(b) Accuracy

Accuracy indicates the closeness of the measurement to the actual value over various iterations on the same dataset.

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (13)

where TN (true negative) and FN (false negative) represent the correctly classified negative instances and the positive instances incorrectly classified as negative, respectively.

(c) Sensitivity

Sensitivity corresponds to the true-positive rate and measures the correctly identified positive instances.

Sensitivity = TP / (TP + FN)    (14)

(d) F-measure

F-measure is the harmonic mean of precision and sensitivity and measures the test's accuracy.

F-measure = (2 × precision × sensitivity) / (precision + sensitivity)    (15)

(e) ROC value

The ROC value is a widely used measure for the quantitative assessment of a classifier. It is used to differentiate between the true-positive rate and the false-positive rate; an ROC value close to 1 represents perfect classification of the data.

Figure 8 represents the performance of all four machine learning techniques without ranking of the features. The ideal value of each performance measure is 1. It can be noticed that the ensemble techniques outperform SVM and ANN. The maximum value of precision is found as 0.914 for RF. The ensemble techniques also outperform in terms of accuracy and sensitivity. The maximum values of
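Equations (12) to (15) can be evaluated directly from the confusion-matrix counts. The sketch below uses illustrative counts for a single class, not values taken from the paper's tables.

```python
# Per-class metrics from confusion counts, following Eqs. (12)-(15).
# The counts below are illustrative only, not from the paper's tables.

def classification_metrics(tp, fp, tn, fn):
    """Return precision, accuracy, sensitivity and F-measure."""
    precision = tp / (tp + fp)                                             # Eq. (12)
    accuracy = (tp + tn) / (tp + fp + fn + tn)                             # Eq. (13)
    sensitivity = tp / (tp + fn)                                           # Eq. (14)
    f_measure = (2 * precision * sensitivity) / (precision + sensitivity)  # Eq. (15)
    return precision, accuracy, sensitivity, f_measure

# Example: one class with 11 true positives and a single false negative
p, a, s, f = classification_metrics(tp=11, fp=0, tn=36, fn=1)
print(round(p, 4), round(a, 4), round(s, 4), round(f, 4))  # → 1.0 0.9792 0.9167 0.9565
```

For a multi-class problem such as the present one, these per-class values are typically averaged over the classes to obtain a single figure per technique.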

Table 3  Sample training/testing vector

Columns: Skewness, Kurtosis, STD, Crest factor, NFCM, NSCM, Peak value, RMS frequency value, Root variance frequency, Spectral centroid, Spectral roll off, WSK, WKU, WSTD, WCF, Speed (RPM), Condition

−0.035 2.764 0.072 4.219 0.0093 0.008 0.058 0.0003 0.0002 0.201 0.177 0.881 3.061 0.021 3.539 1797 HTY
−0.173 2.930 0.065 5.212 0.0366 0.0104 0.030 0.0001 0.0001 0.207 0.193 1.826 7.379 0.016 5.548 1772 HTY
−0.008 2.984 0.138 4.359 0.003 0.0109 0.025 0.0007 0.0007 0.476 0.570 2.028 9.112 0.026 7.167 1797 BF1
0.0074 2.963 0.139 4.743 0.0018 0.0108 0.032 0.0007 0.0007 0.479 0.571 1.942 8.556 0.026 7.623 1772 BF1
0.0156 8.837 0.140 9.510 0.0015 0.175 0.050 0.0008 0.0007 0.477 0.554 1.378 6.665 0.027 5.853 1772 BF2
0.1432 9.752 0.143 12.819 0.424 0.313 0.044 0.0008 0.0007 0.447 0.549 3.905 31.616 0.034 11.895 1750 BF2
0.0559 3.871 2.077 5.262 0.0217 0.0203 0.165 0.0119 0.0108 0.494 0.539 2.545 12.498 9.497 8.321 1797 BF4
0.0478 3.910 2.029 5.751 0.0189 0.02141 0.227 0.0116 0.0105 0.497 0.535 2.623 13.387 9.345 8.855 1772 BF4
0.164 5.395 0.291 5.965 0.0973 0.0423 0.090 0.0016 0.0015 0.483 0.597 2.147 7.356 0.11 5.299 1797 IF1
0.1304 5.542 0.292 5.397 0.0737 0.043 0.081 0.0016 0.0015 0.486 0.606 2.915 12.639 0.122 6.605 1772 IF1
−0.058 21.957 0.194 10.164 0.0587 0.764 0.068 0.0011 0.0010 0.496 0.613 6.109 44.449 0.161 11.382 1797 IF2
0.0029 22.084 0.165 12.265 0.0186 0.881 0.027 0.0009 0.0008 0.506 0.611 6.662 60.282 0.09 18.1 1772 IF2
0.3024 7.445 0.525 7.210 0.031 0.015 0.097 0.003 0.0028 0.471 0.539 4.172 25.975 0.947 11.041 1797 IF3
0.2563 7.666 0.441 8.342 0.027 0.012 0.098 0.0025 0.0024 0.472 0.537 4.199 26.505 0.69 13.418 1772 IF3
0.0569 7.649 0.669 5.422 0.0297 0.0689 0.157 0.0038 0.0037 0.545 0.592 2.592 9.79 1.037 6.152 1797 OF1
0.0334 7.594 0.591 5.257 0.0138 0.0693 0.167 0.0033 0.0032 0.551 0.589 2.518 9.306 0.8 6.469 1772 OF1
0.0005 3.056 0.099 5.468 0.0026 0.0125 0.038 0.0005 0.0005 0.509 0.630 3.229 32.149 0.011 14.872 1797 OF2
0.0088 2.940 0.093 4.292 0.002 0.0105 0.042 0.0005 0.0005 0.515 0.631 1.434 5.797 0.011 5.779 1772 OF2
0.1320 23.163 0.569 11.672 0.265 0.888 0.092 0.0032 0.0030 0.512 0.592 6.894 66.197 1.43 18.354 1750 OF3
0.130 23.542 0.559 11.901 0.261 0.9162 0.063 0.0032 0.0029 0.511 0.589 7.186 71.364 1.48 17.927 1730 OF3
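Several of the time-domain features in Table 3 follow standard definitions. A minimal sketch is given below; these are the common textbook forms and are assumed here, not necessarily the paper's exact normalizations.

```python
# Sketch of a few Table 3 time-domain features from a raw vibration
# signal, using common definitions (assumed, not the paper's exact ones).
import math

def time_features(x):
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)   # STD
    rms = math.sqrt(sum(v * v for v in x) / n)             # root mean square
    skew = sum((v - mean) ** 3 for v in x) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in x) / (n * std ** 4)
    crest = max(abs(v) for v in x) / rms                   # peak over RMS
    return {'STD': std, 'Skewness': skew, 'Kurtosis': kurt, 'Crest factor': crest}

# A pure sine over full periods has kurtosis 1.5 and crest factor sqrt(2)
sig = [math.sin(2 * math.pi * k / 64) for k in range(64)]
f = time_features(sig)
print(round(f['Kurtosis'], 3), round(f['Crest factor'], 3))  # → 1.5 1.414
```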


Table 4  Detailed numeric predictions without ranking of features


AI technique | Total number of instances | Correctly classified instances | Incorrectly classified instances | Error (%) | Kappa statistic | Time (s) | Root mean squared error

RF 48 44 4 8.3333 0.9091 0.04 0.1353


RS 48 43 5 10.4167 0.8864 0.05 0.1288
SVM 48 40 8 16.6667 0.8182 0.37 0.1558
ANN 48 40 8 16.6667 0.8182 0.29 0.1558
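The Kappa statistic reported in Tables 4 and 6 to 9 is Cohen's kappa, which discounts the agreement expected by chance. A minimal sketch follows; the 3-class confusion counts are toys (the paper's experiment has more classes, so the value differs from the Table 4 entries).

```python
# Cohen's kappa from a square confusion matrix (rows = actual classes).
# Toy 3-class counts for illustration, not the paper's results.

def cohens_kappa(cm):
    k = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / total            # p_o
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(row[j] for row in cm) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2  # p_e
    return (observed - expected) / (1 - expected)

# 44 of 48 instances on the diagonal, spread over three classes
cm = [[15, 1, 0],
      [1, 14, 1],
      [0, 1, 15]]
print(round(cohens_kappa(cm), 3))  # → 0.875
```

A kappa of 1 corresponds to the perfect classification reported for RF with ranked features, while 0 would indicate agreement no better than chance.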

Table 5  Ranking of features corresponding to various feature ranking techniques


Rank | Chi-square | Information gain | Gain ratio | ReliefF
1 | Root variance frequency | Root variance frequency | Root variance frequency | Kurtosis
2 | RMS frequency value | RMS frequency value | STD | Root variance frequency
3 | STD | STD | RMS frequency value | STD
4 | Spectral roll off | Spectral roll off | Spectral roll off | RMS frequency value
5 | NSCM | Kurtosis | Kurtosis | NSCM
6 | WSTD | WSTD | Spectral centroid | Crest factor
7 | Spectral centroid | Spectral centroid | NFCM | Spectral centroid
8 | Kurtosis | NSCM | Peak value | Peak value
9 | WKU | WKU | Crest factor | WSTD
10 | Skewness | WSK | WSTD | Spectral roll off
11 | WSK | Skewness | NSCM | Skewness
12 | Peak value | Peak value | Skewness | WCF
13 | NFCM | WCF | WKU | WSK
14 | WCF | Crest factor | WSK | NFCM
15 | Crest factor | NFCM | WCF | WKU
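The gain-ratio column of Table 5 is a filter-type criterion: information gain normalized by the split information of the feature. A minimal sketch on toy discretized data (not the paper's bearing features; class labels reuse the condition codes only as placeholders) is:

```python
# Information-gain / gain-ratio feature ranking on discretized features.
# Toy data for illustration; not the paper's extracted bearing features.
from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def info_gain(feature, labels):
    """IG = H(class) - H(class | feature)."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(feature, labels):
    """GR = IG / split information; penalizes many-valued features."""
    split_info = entropy(feature)
    return info_gain(feature, labels) / split_info if split_info else 0.0

# f1 tracks the class exactly, f2 is uninformative
labels = ['HTY', 'HTY', 'IF1', 'IF1', 'OF1', 'OF1']
features = {
    'f1': ['a', 'a', 'b', 'b', 'c', 'c'],
    'f2': ['x', 'y', 'x', 'y', 'x', 'y'],
}
ranking = sorted(features, key=lambda k: gain_ratio(features[k], labels), reverse=True)
print(ranking)  # → ['f1', 'f2']
```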

Table 6  Detailed numeric predictions with Chi-square feature ranking technique


Machine learning technique | Total number of instances | Correctly classified instances | Incorrectly classified instances | Error (%) | Kappa statistic | Time (s) | Root mean squared error

RF 48 48 0 0 1 0.03 0
RS 48 45 3 6.25 0.9318 0.04 0.1165
SVM 48 48 0 0 1 0.29 0
ANN 48 48 0 0 1 0.27 0

Table 7  Detailed numeric predictions with GR feature ranking technique


Machine learning technique | Total number of instances | Correctly classified instances | Incorrectly classified instances | Error (%) | Kappa statistic | Time (s) | Root mean squared error

RF 48 48 0 0 1 0.02 0
RS 48 48 0 0 1 0.03 0
SVM 48 48 0 0 1 0.31 0
ANN 48 48 0 0 1 0.22 0


Table 8  Detailed numeric predictions with IG feature ranking technique


Machine learning technique | Total number of instances | Correctly classified instances | Incorrectly classified instances | Error (%) | Kappa statistic | Time (s) | Root mean squared error

RF 48 48 0 0 1 0.02 0
RS 48 48 0 0 1 0.03 0
SVM 48 47 1 2.0833 0.9773 0.24 0.0973
ANN 48 47 1 2.0833 0.9773 0.25 0.0973

Table 9  Detailed numeric predictions with ReliefF feature ranking technique


Machine learning technique | Total number of instances | Correctly classified instances | Incorrectly classified instances | Error (%) | Kappa statistic | Time (s) | Root mean squared error

RF 48 48 0 0 1 0.03 0
RS 48 48 0 0 1 0.05 0
SVM 48 47 1 2.0833 0.9773 0.27 0.0973
ANN 48 47 1 2.0833 0.9773 0.24 0.0973

Fig. 8 Performance measures of machine learning techniques without ranking of features
Fig. 9 Performance measures of machine learning techniques with Chi-square feature ranking technique

accuracy and sensitivity are noticed as 0.917 for RF, while SVM and ANN show the least performance. The maximum value of F-measure is noticed as 0.913 for the RF technique, which indicates the higher test accuracy of RF over the other techniques. The ensemble techniques also show superior ROC performance over the other techniques, with values observed as 0.995 and 0.987 for RS and RF, respectively.

Figure 9 shows the performance measures of all machine learning techniques with the Chi-square feature ranking technique. It can be noticed that the values of the performance measures improve significantly. Three of the four techniques, i.e., RF, SVM and ANN, attain the ideal value for each of the five performance measures. Although the values of the performance measures for RS are less than the ideal value, the actual values are very close to the ideal values and show efficient performance.

Figure 10 indicates the performance of all machine learning techniques with the GR feature ranking technique. It shows that all machine learning techniques have achieved the ideal value of each performance measure. The performance measures obtained with the IG and ReliefF feature ranking techniques are shown in Figs. 11 and 12, respectively. In these cases, the values of the performance measures for SVM and ANN are less than the ideal values, but the actual values are close to the ideal values and show satisfactory performance.

Fig. 10 Performance measures of machine learning techniques with GR feature ranking technique
Fig. 11 Performance measures of machine learning techniques with IG feature ranking technique
Fig. 12 Performance measures of machine learning techniques with ReliefF feature ranking technique

To show the effectiveness of the ensemble techniques and the feature ranking techniques, a comparison between the present work and some published literature is conducted and listed in Table 10. This comparison is carried out on the basis of the technique used, the number of features utilized, the classified states, the considered defect severity levels and the minimum time taken by the machine learning technique for maximum classification efficiency. The comparison also includes some of the previous investigations which were carried out over the same bearing dataset as in this study.

7 Conclusions

This study proposed a simple and efficient approach for multi-fault severity classification in rolling element bearings. Four machine learning techniques are assessed in the investigations. Various temporal, spectral and wavelet-based features are extracted for the different bearing conditions, and four feature ranking techniques are utilized to rank the extracted features. The performances of the machine learning techniques are evaluated with and without the ranked features. The following conclusions can be drawn from this study.

• The ensemble techniques show superior classification efficiency without feature ranking. The maximum error is reported as 10.4167 % for the RS ensemble technique, whereas for each of SVM and ANN it is 16.6667 %.
• The performance of all machine learning techniques is also assessed using the outputs of the various feature ranking techniques. The results show good agreement between the actual and predicted classes. The minimum error is reported as 2.0833 % using the SVM and ANN techniques with the IG and ReliefF feature ranking techniques, while the maximum error is reported as 6.25 % using the RS technique with the Chi-square feature ranking technique.
• Among the machine learning techniques, RF shows the best performance due to its lower sensitivity to noise and higher classification performance. To demonstrate the effectiveness of the ensemble machine learning techniques, various performance measures are quantified; RF outperforms in each condition and RS also shows good performance.
• The ensemble techniques require very little computational time for the analysis. Of the two ensemble techniques, RF takes the least time for classification: the maximum time taken by RF is 0.04 s without and 0.02 s with feature ranking techniques.
• The results indicate that ranking of features is essential for the analysis. All machine learning techniques show higher performance in the presence of feature ranking techniques, and the best performance of all techniques is noticed with the features ranked using the gain ratio feature ranking technique.
• The RF ensemble technique along with the feature ranking techniques classifies all the defect severities correctly, which would be beneficial in real practice for classifying defect severities accurately.
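For illustration, the random subspace scheme behind the RS ensemble [27] trains each base learner on a randomly chosen subset of the features and majority-votes their predictions. The sketch below uses toy data and a 1-nearest-neighbour base learner, both assumptions made only for this example, not the paper's configuration.

```python
# Minimal random-subspace ensemble (after Ho [27]): each base learner
# sees a random feature subset; predictions are majority-voted.
# Toy data and the 1-NN base learner are illustrative assumptions.
import random
from collections import Counter

def nn_predict(train_X, train_y, x):
    # 1-nearest neighbour on the selected feature subset
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def random_subspace_predict(X, y, query, n_learners=15, subspace=2, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    votes = []
    for _ in range(n_learners):
        idx = rng.sample(range(n_features), subspace)     # random feature subset
        Xs = [[row[j] for j in idx] for row in X]
        votes.append(nn_predict(Xs, y, [query[j] for j in idx]))
    return Counter(votes).most_common(1)[0][0]            # majority vote

# Two well-separated classes in a 4-dimensional toy feature space
X = [[0, 0, 0, 0], [0.1, 0, 0.1, 0], [1, 1, 1, 1], [0.9, 1, 1, 0.9]]
y = ['healthy', 'healthy', 'faulty', 'faulty']
print(random_subspace_predict(X, y, [0.95, 0.9, 1.0, 1.0]))  # → faulty
```

Training each learner on a feature subset decorrelates the base classifiers, which is what gives the ensemble its robustness to noisy or redundant features.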


Table 10 Comparison between the current work and some published literature

References | Technique | No. of features | Classified states | Defect severity levels | Maximum classification efficiency | Minimum time taken for maximum classification efficiency
Sugumaran, Ramachandran [7] | SVM and PSVM | 10 statistical features and 24 histogram features | 3 | Single | 100 % with both types of features | –
Guo et al. [8] | SVM and ANN | 4 | 6 | Single | 100 % | –
Abbasion et al. [12] | SVM with wavelet denoising | 4 | 7 | Single | 100 % | –
Zarei et al. [14] | ANN with varying the no. of neurons | 4 | 4 | Multi | 100 % with various no. of neurons | –
Liu et al. [15] | SVM with RBF kernel, Mexican hat and Morlet wavelet | 27 | 3 | Multi | 100 % in various cases | –
Wen et al. [16] | K-nearest neighbor classifier (KNNC), back-propagation neural networks (BPNNs) and SVM | 3 | 3 | Multi | 99.60 % with SVM | –
Saxena and Saad [33] | ANN with/without genetic algorithms (GAs) | 242 | 4 | Single | 100 % with GAs | –
Present work | RF, RS, SVM and ANN | 15 | 10 | Multi | 100 % with RF, SVM and ANN using feature ranking techniques | 0.02 s with RF using feature ranking techniques


Acknowledgments The authors are thankful to Prof. KA Loparo and Case Western Reserve University for providing open access to the bearing dataset.

References

1. Singh GK, Al Kazzaz SAS (2003) Induction machine drive condition monitoring and diagnostic research—a survey. Electr Power Syst Res 64:145–158
2. Sharma A, Amarnath M, Kankar PK (2016) Feature extraction and fault severity classification in ball bearings. J Vib Control 22:176–192
3. Kankar PK, Sharma SC, Harsha SP (2011) Rolling element bearing fault diagnosis using wavelet transform. Neurocomputing 74:1638–1645
4. Samanta B, Al-Balushi KR, Al-Araimi SA (2003) Artificial neural networks and support vector machines with genetic algorithm for bearing fault detection. Eng Appl Artif Intel 16:657–665
5. Singh S, Kumar N (2014) Combined rotor fault diagnosis in rotating machinery using empirical mode decomposition. J Mech Sci Technol 28:4869–4876
6. Li B, Chow M-Y, Tipsuwan Y, Hung JC (2000) Neural-network-based motor rolling bearing fault diagnosis. IEEE Trans Ind Electron 47:1060–1069
7. Sugumaran V, Ramachandran KI (2011) Effect of number of features on classification of roller bearing faults using SVM and PSVM. Expert Syst Appl 38:4088–4096
8. Guo H, Jack LB, Nandi AK (2005) Feature generation using genetic programming with application to fault classification. IEEE Trans Syst Man Cybern B 35:89–99
9. Wu S-D, Wu P-H, Wu C-W, Ding J-J, Wang C-C (2012) Bearing fault diagnosis based on multiscale permutation entropy and support vector machine. Entropy 14:1343–1356
10. Vakharia V, Gupta VK, Kankar PK (2015) A multiscale permutation entropy based approach to select wavelet for fault diagnosis of ball bearings. J Vib Control 21:3123–3131
11. Wang Y, Kang S, Jiang Y, Yang G, Song L, Mikulovich VI (2012) Classification of fault location and the degree of performance degradation of a rolling bearing based on an improved hypersphere-structured multi-class support vector machine. Mech Syst Signal Process 29:404–414
12. Abbasion S, Rafsanjani A, Farshidianfar A, Irani N (2007) Rolling element bearings multi-fault classification based on the wavelet denoising and support vector machine. Mech Syst Signal Process 21:2933–2945
13. Zhang Y, Zuo H, Bai F (2013) Classification of fault location and performance degradation of a roller bearing. Measurement 46:1178–1189
14. Zarei J, Tajeddini MA, Karimi HR (2014) Vibration analysis for bearing fault detection and classification using an intelligent filter. Mechatronics 24:151–157
15. Liu Z, Cao H, Chen X, He Z, Shen Z (2013) Multi-fault classification based on wavelet SVM with PSO algorithm to analyze vibration signals from rolling element bearings. Neurocomputing 99:399–410
16. Wen W, Fan Z, Karg D, Cheng W (2015) Rolling element bearing fault diagnosis based on multiscale general fractal features. Shock Vib 2015:1–9
17. Zhao Z, Morstatter F, Sharma S, Alelyani S, Anand A, Liu H (2010) Advancing feature selection research—ASU feature selection repository. Technical Report, Arizona State University, 1–28 (accessed in Dec. 2015)
18. Samanta B, Al-Balushi KR, Al-Araimi SA (2006) Artificial neural networks and genetic algorithm for bearing fault detection. Soft Comput 10:264–271
19. Kappaganthu K, Nataraj C (2011) Feature selection for fault detection in rolling element bearings using mutual information. J Vib Acoust 133:061001–061012
20. Sugumaran V, Muralidharan V, Ramachandran KI (2007) Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing. Mech Syst Signal Process 21:930–942
21. Malhi A, Gao RX (2004) PCA-based feature selection scheme for machine defect classification. IEEE Trans Instrum Meas 53:1517–1525
22. Bearing vibration data set, Case Western Reserve University bearing data centre website. Available at: http://csegroups.case.edu/bearingdatacenter/pages/welcome-case-western-reserve-university-bearing-data-center-website. Accessed in Dec 2015
23. Karabulut EM, Ibrikçi T (2012) Effective diagnosis of coronary artery disease using the rotation forest ensemble method. J Med Syst 36:3011–3018
24. Kavzoglu T, Colkesen I (2013) An assessment of the effectiveness of a rotation forest ensemble for land-use and land-cover mapping. Int J Remote Sens 34:4224–4241
25. Kuncheva LI, Rodríguez JJ (2007) An experimental study on rotation forest ensembles. In: Proceedings of the 7th international conference on multiple classifier systems (MCS'07), Springer, Prague, Czech Republic, pp 459–468
26. Rodríguez JJ, Kuncheva LI (2006) Rotation forest: a new classifier ensemble method. IEEE Trans Pattern Anal 28:1619–1630
27. Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal 20:832–844
28. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, San Francisco
29. Widodo A, Yang B-S (2007) Support vector machine in machine condition monitoring and fault diagnosis. Mech Syst Signal Process 21:2560–2574
30. Altidor W, Khoshgoftaar TM, Hulse JV (2011) Robustness of filter-based feature ranking: a case study. In: Proceedings of the twenty-fourth international Florida artificial intelligence research society conference, pp 453–458
31. Samanta B, Al-Balushi KR, Al-Araimi SA (2004) Bearing fault detection using artificial neural networks and genetic algorithm. EURASIP J Appl Signal Process 3:366–377
32. Saxena A, Celaya J, Balaban E, Goebel K, Saha B, Saha S, Schwabacher M (2008) Metrics for evaluating performance of prognostics techniques. In: Proceedings of the international conference on prognostics and health management (PHM08), pp 1–17
33. Saxena A, Saad A (2007) Evolving an artificial neural network classifier for condition monitoring of rotating mechanical systems. Appl Soft Comput 7:441–454
