Neurocomputing 140 (2014) 41–52

A robust least squares support vector machine for regression and classification with noise

Xiaowei Yang a,*, Liangjun Tan a, Lifang He b

a Department of Mathematics, School of Sciences, South China University of Technology, Guangzhou 510641, PR China
b School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, PR China

Article history: Received 18 September 2013; received in revised form 12 March 2014; accepted 15 March 2014; available online 13 April 2014. Communicated by X. Gao.

Keywords: Least squares support vector machines; Weighted least squares support vector machines; Robust least squares support vector machine; Regression; Classification; Noise

Abstract: Least squares support vector machines (LS-SVMs) are sensitive to outliers or noise in the training dataset. Weighted least squares support vector machines (WLS-SVMs) can partly overcome this shortcoming by assigning different weights to different training samples. However, it is a difficult task for WLS-SVMs to set the weights of the training samples, which greatly influences the robustness of WLS-SVMs. In order to avoid setting weights, in this paper, a novel robust LS-SVM (RLS-SVM) is presented based on the truncated least squares loss function for regression and classification with noise. Based on its equivalent model, we theoretically analyze the reason why the robustness of RLS-SVM is higher than that of LS-SVMs and WLS-SVMs. In order to solve the proposed RLS-SVM, we propose an iterative algorithm based on the concave–convex procedure (CCCP) and the Newton algorithm. The statistical tests of the experimental results conducted on fourteen benchmark regression datasets and ten benchmark classification datasets show that compared with LS-SVMs, WLS-SVMs and iteratively reweighted LS-SVM (IRLS-SVM), the proposed RLS-SVM significantly reduces the effect of the noise in the training dataset and provides superior robustness.

© 2014 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +86 20 87110446. E-mail address: [email protected] (X. Yang).
http://dx.doi.org/10.1016/j.neucom.2014.03.037    0925-2312/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Support vector machines (SVMs) are very important methodologies for classification [1–4] and regression [5–7] in the fields of pattern recognition and machine learning. They have been widely applied to many real-world pattern recognition problems, such as text classification [8,9], image classification [10,11], feature extraction [12–14], web mining [15] and function estimation [16,17]. Based on equality constraints instead of inequality ones, two least squares support vector machines (LS-SVMs) have been proposed for classification [18,19] and regression [20,21], respectively. Recently, a matrix pattern based LS-SVM has also been presented [22]. The solutions of LS-SVMs are obtained by solving a set of linear equations instead of solving a quadratic programming (QP) problem as in SVM. Several effective numerical algorithms have been suggested, such as the conjugate gradient based iterative algorithm [19,21,23,24], the reduced set of linear equations based algorithm [25], the sequential minimal optimization algorithm (SMO) [26], and the Sherman–Morrison–Woodbury (SMW) identity based algorithm [27].

At present, LS-SVMs have been widely applied to text classification [28], image processing [29–31], time series forecasting [21,32,33], and control [34–36]. Unfortunately, in real-world applications there are two main drawbacks in LS-SVMs. The first is that their solutions are non-sparse [37,38], and the second is that their training processes are sensitive to noise in the training dataset due to over-fitting [39,40]. In order to deal with the first problem, some pruning algorithms have been proposed [41–44]. In order to deal with the second problem, two weighted LS-SVMs (WLS-SVMs) have been presented for regression [45] and classification [46], respectively. A key issue for WLS-SVMs is how to assign suitable weights to the training samples. In previous studies, the weights were assigned to the training samples by a two-stage method [45,47] and a multi-stage method [48]. Theoretical analyses and the related experiments show that WLS-SVMs are robust to some noise.

In the field of machine learning, a robust loss function is usually one of the key issues in designing a robust algorithm. At present, various margin-based loss functions, such as squared loss, logistic loss, hinge loss, exponential loss, 0-1 loss, and brownboost loss, have been used to search for the optimal classification and regression functions. From their function curves [49], we know that squared loss, logistic loss, hinge loss, and exponential loss are upper bounds on the 0-1 loss. When a training sample has a large negative margin, squared loss, hinge loss and exponential loss are larger than brownboost loss. Therefore, the brownboost loss is usually more robust than the other loss functions. Recently, motivated by the link between
the pinball loss and quantile regression, Huang et al. introduced the pinball loss to classification problems and proposed the pinball loss SVM (pin-SVM) [50]. The theoretical analysis and the experimental results show that, compared to the hinge loss SVM, the pin-SVM is less sensitive to feature noise around the decision boundary and more stable under re-sampling.

In order to avoid setting the weights of the training samples, which greatly influence the robustness of WLS-SVMs, in this study, inspired by the ideas in [51], we propose a novel robust LS-SVM (RLS-SVM) based on the truncated least squares loss function for regression and classification with noise. Based on the definition of the influence function [52], we show that the proposed loss function is insensitive to noise. Considering that the proposed loss function is neither differentiable nor convex, inspired by [53], we firstly give a smoothing procedure to make the proposed loss function smooth. Secondly, using the concave–convex procedure (CCCP) [54], we transform solving a concave–convex optimization problem into iteratively solving a series of convex optimization problems. Finally, we apply the Newton algorithm [53] to solve these convex optimization problems. In order to test the robustness of RLS-SVM, we conduct a set of experiments on four synthetic regression datasets, fourteen benchmark regression datasets, two synthetic classification datasets and ten benchmark classification datasets. In the analysis of the experimental results, the Wilcoxon signed-ranks test and the Friedman test [55] are used to check the significance of RLS-SVM.

This paper is organized as follows. In Section 2, we briefly review LS-SVMs and WLS-SVMs. In Section 3, we propose RLS-SVM. In Section 4, we theoretically analyze the reason why the robustness of RLS-SVM is higher than that of LS-SVMs and WLS-SVMs. An algorithm for RLS-SVM is given based on the CCCP and the Newton algorithm in Section 5. The experimental results and analyses are presented in Section 6. Finally, conclusions are given in Section 7.

2. Least squares support vector machines and weighted least squares support vector machines

2.1. Least squares support vector machine for regression

Considering a training set of l pairs of samples {x_i, y_i}_{i=1}^{l} for the regression problem, where x_i ∈ R^n are the input data and y_i ∈ R are the corresponding prediction values, LS-SVM for the regression problem is a QP problem based on equality constraints and can be described as follows [20,21]:

\min_{w,b,\xi} J(w,b,\xi) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} \xi_i^2,   (1)

s.t.

y_i - [w^T \varphi(x_i) + b] = \xi_i, \quad i = 1, \dots, l,   (2)

where w is the normal of the hyperplane, ξ_i is the error of the ith training sample, φ(x_i) is a nonlinear function that maps x_i to a high-dimensional feature space, C is a regularization parameter balancing the tradeoff between the margin and the error, and b is a bias.

The Lagrangian function of the optimization problem (1) and (2) is

L(w,b,\xi,\alpha) = J(w,b,\xi) - \sum_{i=1}^{l} \alpha_i \{ w^T \varphi(x_i) + b + \xi_i - y_i \},   (3)

where α_i are the Lagrangian multipliers.

The optimality conditions can be written as the following system of linear equations:

\begin{pmatrix} 0 & e^T \\ e & \Omega + C^{-1} I \end{pmatrix} \begin{pmatrix} b \\ \alpha \end{pmatrix} = \begin{pmatrix} 0 \\ Y \end{pmatrix},   (4)

where I ∈ R^{l×l} is an identity matrix,

Y = (y_1, y_2, \dots, y_l)^T,   (5)

\alpha = (\alpha_1, \alpha_2, \dots, \alpha_l)^T,   (6)

e = (1, 1, \dots, 1)^T,   (7)

\Omega = (\Omega_{ij}) = (k(x_i, x_j)),   (8)

k(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle.   (9)

2.2. Weighted least squares support vector machine for regression

WLS-SVM for the regression problem is described as follows [45]:

\min_{w,b,\xi} J(w,b,\xi) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} s_i \xi_i^2,   (10)

s.t.

y_i - (w^T \varphi(x_i) + b) = \xi_i, \quad i = 1, \dots, l,   (11)

where s = (s_1, s_2, \dots, s_l) is a vector of weights associated with the training samples. If s_j = 0, one can delete the corresponding training sample from the model. The optimal dual variables can be given by the solution of the following system of linear equations:

\begin{pmatrix} 0 & e^T \\ e & \Omega + C^{-1} \operatorname{diag}(s_1^{-1}, s_2^{-1}, \dots, s_l^{-1}) \end{pmatrix} \begin{pmatrix} b \\ \alpha \end{pmatrix} = \begin{pmatrix} 0 \\ Y \end{pmatrix}.   (12)

2.3. Least squares support vector machine for binary classification

Considering a training set of l pairs of samples {x_i, y_i}_{i=1}^{l} for binary classification, where x_i ∈ R^n are the input data and y_i ∈ {−1, +1} are the corresponding class labels, LS-SVM for the classification problem is also a QP problem based on equality constraints and the quadratic loss function, and can be described as follows [18,19]:

\min_{w,b,\xi} J(w,b,\xi) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} \xi_i^2,   (13)

s.t.

y_i (w^T \varphi(x_i) + b) = 1 - \xi_i, \quad i = 1, \dots, l.   (14)

2.4. Weighted least squares support vector machine for binary classification

WLS-SVM for binary classification is described as follows [46]:

\min_{w,b,\xi} J(w,b,\xi) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} s_i \xi_i^2,   (15)

s.t.

y_i (w^T \varphi(x_i) + b) = 1 - \xi_i, \quad i = 1, \dots, l.   (16)

Multiplying both sides of (16) by y_i yields

y_i - (w^T \varphi(x_i) + b) = y_i \xi_i, \quad i = 1, \dots, l.   (17)
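To make systems (4) and (12) concrete, the following minimal sketch assembles and solves the LS-SVM and WLS-SVM duals with a Gaussian kernel (the kernel later used in the experiments of Section 6); the toy data, the kernel width sigma and the value of C are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    # Omega_ij = k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)), cf. (8)-(9)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_regression_fit(X, y, C=32.0, sigma=1.0, weights=None):
    """Solve the dual linear system (4); with per-sample weights it becomes (12)."""
    l = X.shape[0]
    Omega = gaussian_kernel(X, sigma)
    if weights is None:
        D = np.eye(l) / C                                  # Omega + C^{-1} I
    else:
        D = np.diag(1.0 / (C * np.asarray(weights)))        # Omega + C^{-1} diag(1/s_i)
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = 1.0                                          # e^T
    A[1:, 0] = 1.0                                          # e
    A[1:, 1:] = Omega + D
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                                  # b, alpha

def lssvm_predict(X_train, alpha, b, X_test, sigma=1.0):
    # f(x) = sum_i alpha_i k(x_i, x) + b
    sq_tr = np.sum(X_train ** 2, axis=1)
    sq_te = np.sum(X_test ** 2, axis=1)
    d2 = sq_te[:, None] + sq_tr[None, :] - 2.0 * X_test @ X_train.T
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    return K @ alpha + b

# Toy usage with made-up data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
b, alpha = lssvm_regression_fit(X, y)
```

Setting all weights to one reduces the weighted call to the plain LS-SVM system, mirroring the observation below that LS-SVM is a special case of WLS-SVM.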


Let

\eta_i = y_i \xi_i, \quad i = 1, \dots, l.   (18)

Then the optimization problem (15) and (16) can be rewritten as

\min_{w,b,\eta} J(w,b,\eta) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} s_i \eta_i^2,   (19)

s.t.

y_i - (w^T \varphi(x_i) + b) = \eta_i, \quad i = 1, \dots, l.   (20)

From the optimization problems (1) and (2) and (10) and (11), we know that if all of the weights s_i are equal to 1, then WLS-SVM for regression becomes LS-SVM for regression. Therefore, LS-SVM for regression is a special case of WLS-SVM for regression. Obviously, this conclusion is also true for classification. Comparing the optimization problem (19) and (20) with the optimization problem (10) and (11), we find that the two WLS-SVMs for classification and regression can be unified. Therefore, we will only discuss the optimization problem (10) and (11) in the following sections.

3. The novel robust least squares support vector machine

The optimization problem (10) and (11) is equivalent to the following unconstrained optimization problem:

\min_{w,b,s} J(w,b,s) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} s_i (y_i - (w^T \varphi(x_i) + b))^2.   (21)

In order to avoid setting the weights of the training samples and build a more robust learning machine, inspired by the ideas in [51], we consider the following two optimization problems:

\min_{w,b} \min_{0 \le s \le 1} \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} L_s(w,b,s_i,x_i,y_i),   (22)

\min_{w,b} \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} \mathrm{robust}_2(w,b,x_i,y_i),   (23)

where

L_s(w,b,s,x,y) = s (y - (w^T \varphi(x) + b))^2 + p(1 - s),   (24)

\mathrm{robust}_2(w,b,x,y) = \min(p, (y - (w^T \varphi(x) + b))^2),   (25)

p \ge 0.   (26)

From the definition of L_s(w,b,s,x,y), we know that if (y - (w^T \varphi(x) + b))^2 - p \ge 0, then \min_s L_s(w,b,s,x,y) = p; if (y - (w^T \varphi(x) + b))^2 - p < 0, then \min_s L_s(w,b,s,x,y) = (y - (w^T \varphi(x) + b))^2. Therefore, the following relationship holds:

\min_s L_s(w,b,s,x,y) = \mathrm{robust}_2(w,b,x,y).   (27)

From (27), we have

L_s(w,b,s,x,y) \ge \mathrm{robust}_2(w,b,x,y).   (28)

Using (27) and (28), we can prove that the following theorem holds.

Theorem. The optimization problem (22) is equivalent to the optimization problem (23), i.e.

\min_{w,b} \min_{0 \le s \le 1} \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} L_s(w,b,s_i,x_i,y_i) = \min_{w,b} \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} \mathrm{robust}_2(w,b,x_i,y_i).   (29)

Proof. Define

f_{rob}(w,b) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} \mathrm{robust}_2(w,b,x_i,y_i),   (30)

f_L(w,b,s) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} L_s(w,b,s_i,x_i,y_i),   (31)

(w_r, b_r) = \arg\min_{w,b} f_{rob}(w,b),   (32)

(w_L, b_L, s_L) = \arg\min_{w,b} \min_{0 \le s \le 1} f_L(w,b,s),   (33)

s_r = \arg\min_{0 \le s \le 1} f_L(w_r, b_r, s).   (34)

From (28), one obtains

\min_{w,b} \min_{0 \le s \le 1} f_L(w,b,s) = \min_{0 \le s \le 1} f_L(w_L, b_L, s) \ge f_{rob}(w_L, b_L) \ge \min_{w,b} f_{rob}(w,b).   (35)

From (27), we have

\min_{w,b} f_{rob}(w,b) = f_{rob}(w_r, b_r) = \min_{0 \le s \le 1} f_L(w_r, b_r, s) \ge \min_{w,b} \min_{0 \le s \le 1} f_L(w,b,s).   (36)

Comparing (35) and (36) yields

\min_{w,b} \min_{0 \le s \le 1} \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} L_s(w,b,s_i,x_i,y_i) = \min_{w,b} \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} \mathrm{robust}_2(w,b,x_i,y_i).   (37)

This shows that the theorem holds. □

From the following inequality

f_{rob}(w_r, b_r) = f_L(w_r, b_r, s_r) \ge f_L(w_L, b_L, s_L) = f_{rob}(w_L, b_L) \ge f_{rob}(w_r, b_r),   (38)

we know that the optimal solutions (w_r, b_r) and (w_L, b_L) of the two optimization problems with respect to (w, b) are interchangeable.

Let the residual of a training sample be r = y - (w^T \varphi(x) + b); then the loss function robust_2(p, r) in the optimization problem (23) can be rewritten as

\mathrm{robust}_2(p, r) = \min(p, r^2) = \begin{cases} r^2, & |r| \le \sqrt{p} \\ p, & |r| > \sqrt{p} \end{cases}.

According to the definition of the influence function [52], we can obtain its influence function

\frac{d\,\mathrm{robust}_2(p, r)}{dr} = \begin{cases} 2r, & |r| < \sqrt{p} \\ 0, & |r| > \sqrt{p} \end{cases}.

Therefore, the proposed loss function is insensitive to noise and outliers in the training samples, which usually tend to cause large residuals. Considering that the loss function robust_2(w, b, x, y) in (23) is a truncated least squares loss function and the losses of the noise and outliers are bounded, we call the optimization model (23) RLS-SVM. An important role of the truncated parameter p in RLS-SVM is to control the errors of the noise and outliers and reduce their effects on the robustness of RLS-SVM. It is very obvious that when p is large enough, the solution of RLS-SVM is the same as that of LS-SVM. Based on this observation, we set 0 ≤ p ≤ 1 and 0 ≤ p ≤ 3 for regression and classification, respectively.

4. Relationship between the solutions of weighted least squares support vector machine and the optimization problem (22)

In this section, we discuss the relationship between the solutions of WLS-SVM and the optimization problem (22), and explain why the robustness of RLS-SVM is higher than that of LS-SVM and WLS-SVM.
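As a small numerical illustration of the truncated least squares loss (25) and the influence function derived in Section 3, the sketch below evaluates both on a few residuals; the residual values and the choice p = 1 are assumptions for illustration only.

```python
import numpy as np

def robust2(r, p):
    # Truncated least squares loss (25): min(p, r^2); losses of outliers are bounded by p.
    return np.minimum(p, r ** 2)

def robust2_influence(r, p):
    # Influence function: 2r inside |r| < sqrt(p), 0 outside (large residuals get no influence).
    return np.where(np.abs(r) < np.sqrt(p), 2.0 * r, 0.0)

p = 1.0                                       # assumed truncation parameter
residuals = np.array([0.1, 0.5, 1.0, 5.0, 50.0])
print(robust2(residuals, p))                  # [0.01 0.25 1.   1.   1.  ] -> bounded loss
print(robust2_influence(residuals, p))        # [0.2  1.   0.   0.   0.  ] -> outliers ignored
```

In contrast to the plain squared loss, whose influence 2r grows without bound, both quantities here are bounded (by p and 2√p respectively), which is the source of the robustness argued above.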
For a fixed weight vector s, the optimization problem (22) is equivalent to the following QP problem:

\min_{w,b,\xi} \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} s_i \xi_i^2 + \frac{pC}{2} \sum_{i=1}^{l} (1 - s_i),   (39)

s.t.

y_i - (w^T \varphi(x_i) + b) = \xi_i, \quad i = 1, 2, \dots, l.   (40)

Dropping the constant term \frac{pC}{2} \sum_{i=1}^{l} (1 - s_i) of the objective function in (39) yields the following optimization problem:

\min_{w,b,\xi} \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} s_i \xi_i^2,   (41)

s.t.

y_i - (w^T \varphi(x_i) + b) = \xi_i, \quad i = 1, 2, \dots, l.   (42)

This is the standard WLS-SVM. Let (w_{LS-SVM}, b_{LS-SVM}) and (w_{WLS-SVM}, b_{WLS-SVM}) be the optimal solutions of LS-SVM and WLS-SVM, respectively; then (w_{LS-SVM}, b_{LS-SVM}, s_{LS-SVM}) and (w_{WLS-SVM}, b_{WLS-SVM}, s_{WLS-SVM}) are two feasible, but not necessarily optimal, solutions of the optimization problem (22), where s_{LS-SVM} = (1, 1, \dots, 1), s_{WLS-SVM} = (s_1, s_2, \dots, s_l), and 0 < s_i ≤ 1 (i = 1, 2, \dots, l). Therefore, the robustness of the optimization problem (22) is usually higher than that of LS-SVM and WLS-SVM. Based on the equivalence of the optimization problems (22) and (23), we know that the robustness of the optimization problem (23) is also usually higher than that of LS-SVM and WLS-SVM. In order to use some detailed experimental results to support our theoretical analyses, we give an algorithm for solving the optimization problem (23) in the following section.

5. Solution to RLS-SVM

The loss function robust_2(w, b, x, y) in the optimization problem (23) is neither differentiable nor convex, so most convex optimization methods cannot be employed to solve it. In order to overcome this difficulty, inspired by [53], we firstly give a smoothing procedure to make the loss function robust_2(w, b, x, y) smooth. Secondly, using the CCCP [54], we transform solving a concave–convex optimization problem into iteratively solving a series of convex optimization problems. Finally, we apply the Newton algorithm [53] to solve these convex optimization problems.

5.1. The smoothing procedure of the loss function

Let z = w^T \varphi(x) + b; then

\mathrm{robust}_2(w, b, x, y) = \min\{p, (y - z)^2\} = (y - z)^2 + h(z),   (43)

where

h(z) = \begin{cases} 0, & y - \sqrt{p} \le z \le y + \sqrt{p} \\ p - (y - z)^2, & \text{otherwise} \end{cases}.   (44)

It is very obvious that h(z) is a non-smooth function. In order to solve the optimization problem (23) via classical convex optimization algorithms, we use the following smoothing function h^*(z) to replace h(z):

h^*(z) = \begin{cases} p - (y - z)^2, & z < y - \sqrt{p} - h \ \text{or} \ z > y + \sqrt{p} + h \\ -\dfrac{(h + 2\sqrt{p})(y + h - \sqrt{p} - z)^2}{4h}, & |z - (y - \sqrt{p})| \le h \\ 0, & y - \sqrt{p} + h < z < y + \sqrt{p} - h \\ -\dfrac{(h + 2\sqrt{p})(y - h + \sqrt{p} - z)^2}{4h}, & |z - (y + \sqrt{p})| \le h \end{cases},   (45)

where h is the smoothing parameter, typically taking its values between 0.001 and 0.5. The function h^*(z) is continuous and smooth at the former kinks. When h → 0, the function h(z) is recovered immediately.

Based on (43)–(45), the optimization problem (23) can be rewritten as

\min_{w,b} J_{Rob}(w, b) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} (y_i - z_i)^2 + \frac{C}{2} \sum_{i=1}^{l} h^*(z_i) = J^{LS}_{vex}(w, b) + J^{LS}_{cav}(w, b),   (46)

where

J^{LS}_{vex}(w, b) = \frac{1}{2} w^T w + \frac{C}{2} \sum_{i=1}^{l} (y_i - z_i)^2,   (47)

J^{LS}_{cav}(w, b) = \frac{C}{2} \sum_{i=1}^{l} h^*(z_i).   (48)

5.2. CCCP for the optimization problem (46)

It is difficult to solve (46) by classical convex optimization algorithms because the second term J^{LS}_{cav}(w, b) in the objective function J_{Rob}(w, b) is non-convex. Fortunately, utilizing the CCCP, we can transform this non-convex optimization problem into a series of convex optimization problems. According to the basic principle of the CCCP, the optimal solution (w, b) of the optimization problem (46) can be achieved by iteratively solving the following optimization problem until it converges:

(w^{t+1}, b^{t+1}) = \arg\min_{w,b} \left( J^{LS}_{vex}(w, b) + \frac{\partial J^{LS}_{cav}(w^t, b^t)}{\partial w} \cdot w + \frac{\partial J^{LS}_{cav}(w^t, b^t)}{\partial b} \cdot b \right),   (49)

where

\frac{\partial J^{LS}_{cav}(w^t, b^t)}{\partial w} = \sum_{i=1}^{l} \frac{\partial J^{LS}_{cav}(w^t, b^t)}{\partial z_i} \, \varphi(x_i),   (50)

\frac{\partial J^{LS}_{cav}(w^t, b^t)}{\partial b} = \sum_{i=1}^{l} \frac{\partial J^{LS}_{cav}(w^t, b^t)}{\partial z_i}.   (51)

Let

\lambda_i^t = \frac{\partial J^{LS}_{cav}(w^t)}{\partial z_i} = \begin{cases} C(y_i - z_i), & z_i < y_i - \sqrt{p} - h \ \text{or} \ z_i > y_i + \sqrt{p} + h \\ \dfrac{C(h + 2\sqrt{p})(y_i + h - \sqrt{p} - z_i)}{4h}, & |z_i - (y_i - \sqrt{p})| \le h \\ 0, & y_i - \sqrt{p} + h < z_i < y_i + \sqrt{p} - h \\ \dfrac{C(h + 2\sqrt{p})(y_i - h + \sqrt{p} - z_i)}{4h}, & |z_i - (y_i + \sqrt{p})| \le h \end{cases}.   (52)

Then (49) can be rewritten as

(w^{t+1}, b^{t+1}) = \arg\min_{w,b} \left( J^{LS}_{vex}(w, b) + \Big( \sum_{i=1}^{l} \lambda_i^t \varphi(x_i) \Big) \cdot w + b \sum_{i=1}^{l} \lambda_i^t \right).   (53)

5.3. Newton algorithm for solving the optimization problem (53)

Using the representer theorem in the reproducing kernel Hilbert space, w can be written as follows:

w = \sum_{i=1}^{l} \alpha_i \varphi(x_i).   (54)

Substituting (54) into (53) yields

(\alpha^{t+1}, b^{t+1}) = \arg\min_{\alpha, b} \left( \frac{1}{2} \alpha^T K \alpha + \frac{C}{2} \alpha^T \sum_{i=1}^{l} K_i K_i^T \alpha + \sum_{i=1}^{l} (\lambda_i^t - C y_i) K_i^T \alpha + \frac{C}{2} \sum_{i=1}^{l} y_i^2 + b \sum_{i=1}^{l} (\lambda_i^t - C y_i) + C b \sum_{i=1}^{l} K_i^T \alpha + \frac{Cl}{2} b^2 \right).   (55)
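The following sketch implements the smoothed correction h*(z) of (45) and the slopes λ_i^t of (52); the assignment of the two quadratic transition pieces to the kinks at z = y − √p and z = y + √p follows the continuity requirement stated in Section 5.1, and this reconstruction should be checked against the published formulas before reuse.

```python
import numpy as np

def h_star(z, y, p, h):
    """Smoothed correction h*(z) of (45): robust_2 = (y - z)^2 + h(z), with the kinks at
    z = y -/+ sqrt(p) replaced by quadratic transition pieces of half-width h."""
    sp = np.sqrt(p)
    out = np.where((z < y - sp - h) | (z > y + sp + h), p - (y - z) ** 2, 0.0)
    left = np.abs(z - (y - sp)) <= h     # transition around the left kink
    out = np.where(left, -(h + 2 * sp) * (y + h - sp - z) ** 2 / (4 * h), out)
    right = np.abs(z - (y + sp)) <= h    # transition around the right kink
    out = np.where(right, -(h + 2 * sp) * (y - h + sp - z) ** 2 / (4 * h), out)
    return out

def cccp_slope(z, y, p, h, C):
    """lambda_i^t of (52): (C/2) * d h*(z)/dz, the linearization used by the CCCP step (49)."""
    sp = np.sqrt(p)
    lam = np.where((z < y - sp - h) | (z > y + sp + h), C * (y - z), 0.0)
    left = np.abs(z - (y - sp)) <= h
    lam = np.where(left, C * (h + 2 * sp) * (y + h - sp - z) / (4 * h), lam)
    right = np.abs(z - (y + sp)) <= h
    lam = np.where(right, C * (h + 2 * sp) * (y - h + sp - z) / (4 * h), lam)
    return lam
```

For points whose residual exceeds the truncation threshold, the slope equals C(y − z) and exactly cancels the gradient of the convex quadratic term, which is how the linearized subproblem (53) discounts noisy samples.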
The Hessian matrix and the gradient of the optimization problem (55) are

H = \begin{pmatrix} C e^T e & C e^T K \\ C K e & K + C K K \end{pmatrix}   (56)

and

\nabla = \begin{pmatrix} C e^T e & C e^T K \\ C K e & K + C K K \end{pmatrix} \begin{pmatrix} b \\ \alpha \end{pmatrix} + \begin{pmatrix} e^T (\lambda - C Y) \\ K (\lambda - C Y) \end{pmatrix},   (57)

respectively. For any nonzero vector (b \ \alpha^T) \in R^{l+1},

(b \ \alpha^T) \begin{pmatrix} C e^T e & C e^T K \\ C K e & K + C K K \end{pmatrix} \begin{pmatrix} b \\ \alpha \end{pmatrix} = \alpha^T K \alpha + C (b e + K \alpha)^T (b e + K \alpha) > 0.   (58)

Therefore, the optimization problem (55) is a strictly convex quadratic program. We can solve it using the Newton algorithm. From

\begin{pmatrix} C e^T e & C e^T K \\ C K e & K + C K K \end{pmatrix} \begin{pmatrix} b \\ \alpha \end{pmatrix} + \begin{pmatrix} e^T (\lambda - C Y) \\ K (\lambda - C Y) \end{pmatrix} = 0,

we can obtain the iterative formula for updating (b^{t+1}, \alpha^{t+1}):

\begin{pmatrix} b^{t+1} \\ \alpha^{t+1} \end{pmatrix} = - \begin{pmatrix} C e^T e & C e^T K \\ C K e & K + C K K \end{pmatrix}^{-1} \begin{pmatrix} e^T (\lambda - C Y) \\ K (\lambda - C Y) \end{pmatrix} = \begin{pmatrix} 0 & e^T \\ C e & I + C K \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ C Y - \lambda \end{pmatrix},   (59)

where \lambda = (\lambda_1^t, \lambda_2^t, \dots, \lambda_l^t)^T.

Based on the analyses above, we can give the detailed algorithm as follows:

Step 1: Given the tolerance parameter ε, the SVM hyperparameters (σ, C), and the iterative variable t = 0.
Step 2: Obtain the initial values (b^0, α^0) by solving the standard LS-SVM.
Step 3: Calculate (b^{t+1}, α^{t+1}) according to the formulas (52) and (59).
Step 4: Check whether ||(b^{t+1}, α^{t+1}) − (b^t, α^t)|| < ε holds. If yes, terminate; otherwise, set t = t + 1 and go to Step 3.

From [54], we know that the CCCP is globally or locally convergent. In Step 3, we obtain a globally optimal solution. Therefore, the proposed algorithm is globally or locally convergent. Let T_r and T_i be the total numbers of iterations of the CCCP and of iteratively reweighted LS-SVM (IRLS-SVM) [48], respectively; then the computational complexities of LS-SVM, WLS-SVM, IRLS-SVM, and RLS-SVM are O((l+1)^3), O(2(l+1)^3), O(T_i(l+1)^3), and O(T_r(l+1)^3), respectively. From these complexity analyses, we know that the training times of IRLS-SVM and RLS-SVM are usually longer than those of LS-SVM and WLS-SVM. When T_i > T_r, the training time of IRLS-SVM is longer than that of RLS-SVM; otherwise, the training time of IRLS-SVM is shorter than that of RLS-SVM.

6. Numerical experiments and discussions

In this section, we conduct experiments on four synthetic regression datasets, fourteen benchmark regression datasets, two synthetic classification datasets, and ten benchmark classification datasets to test the robustness of RLS-SVM. In order to achieve this goal, we compare the robustness of RLS-SVM with that of LS-SVM, WLS-SVM, and IRLS-SVM in the regression experiments. In the classification experiments, we compare the robustness of RLS-SVM with that of LS-SVM and WLS-SVM. All the programs are written in C++ and compiled using the Microsoft Visual C++ 6.0 compiler. All computations are conducted on a computer with a 2.8 GHz Intel(R) Pentium(R) 4 processor and a maximum of 1.96 GB of memory, running Microsoft Windows XP.

In order to evaluate the robustness of WLS-SVMs, inspired by the ideas in [45] and [56], we give the following seven weight-setting formulas:

s_i^{Suyken} = \begin{cases} 1, & |\xi_i / \hat{s}| \le c_1 \\ \dfrac{c_2 - |\xi_i / \hat{s}|}{c_2 - c_1}, & c_1 \le |\xi_i / \hat{s}| \le c_2 \\ 10^{-4}, & \text{otherwise} \end{cases},   (60)

s_i^{hyp\text{-}lin} = 1 - \frac{|\xi_i|}{\max(|\xi_i|) + \Delta},   (61)

s_i^{hyp\text{-}exp} = \frac{2}{1 + \exp(\beta |\xi_i|)},   (62)

s_i^{cen\text{-}lin} = 1 - \frac{d_i^{cen}}{\max(d_i^{cen}) + \Delta},   (63)

s_i^{cen\text{-}exp} = \frac{2}{1 + \exp(\beta d_i^{cen})},   (64)

s_i^{sph\text{-}lin} = 1 - \frac{d_i^{sph}}{\max(d_i^{sph}) + \Delta},   (65)

s_i^{sph\text{-}exp} = \frac{2}{1 + \exp(\beta d_i^{sph})},   (66)

where c_1 = 2.5, c_2 = 3.0, ξ_i are the sample errors obtained by LS-SVMs, \hat{s} = 1.483 × mad(ξ_i), β = 0.3, Δ = 0.001, d_i^{cen} is the Euclidean distance between the training sample x_i and its own class center, and d_i^{sph} is the Euclidean distance between the training sample x_i and the center of the minimum enclosing ball covering the two classes. In the regression experiments, we use s_i^{Suyken}, s_i^{hyp-lin} and s_i^{hyp-exp} to set the weight of the training sample x_i, respectively. In the classification experiments, we use s_i^{Suyken}, s_i^{hyp-lin}, s_i^{hyp-exp}, s_i^{cen-lin}, s_i^{cen-exp}, s_i^{sph-lin} and s_i^{sph-exp} to set the weight of the training sample x_i, respectively.

For IRLS-SVM, we use the Myriad weight function, in which the parameter δ ∈ {0.5, 1, 1.5, 2, 2.5, …, 20}. In RLS-SVM, the tolerance parameter ε = 0.001 and the smoothing parameter h ∈ {0.025, 0.05, 0.075, ⋯, 0.5}.

In all of the experiments, the Gaussian kernel function is adopted, with hyperparameters σ ∈ {2^{-4}, 2^{-3}, 2^{-2}, ⋯, 2^{5}} and C ∈ {2^{0}, 2^{1}, 2^{2}, ⋯, 2^{9}}. In order to obtain unbiased statistical results, we use the ten-fold cross-validation strategy to search for the optimal parameters and weight-setting strategy.

6.1. Regression experiments

In the regression experiments, we firstly generate four synthetic datasets to show the robustness of RLS-SVM. The training datasets are generated by the sine function with four different kinds of noise: Gaussian noise, multiplicative noise, heterogeneous variant Gaussian noise, and transformed χ² noise. The testing dataset is generated by the sine function. These datasets are illustrated in Fig. 1.
Fig. 1. Four synthetic regression datasets: (a) synthetic training dataset one with Gaussian noise (the mean is zero and the variance is 0.1); (b) synthetic training dataset two with multiplicative noise (y = (1 + v) × y, where v is random noise whose mean is zero and variance is 0.05); (c) synthetic training dataset three with heterogeneous variant Gaussian noise (the mean of the noise is zero and the variance of the noise changes as the input coordinate changes); (d) synthetic training dataset four with transformed χ² noise, whose degree of freedom is 199; and (e) synthetic testing dataset. The numbers of training samples and testing samples are 200 and 100, respectively.

Secondly, we conduct experiments on fourteen benchmark regression datasets to test the robustness of RLS-SVM, where the datasets Chwirut, Nelson, Gauss3 and Enso are downloaded from http://www.itl.nist.gov/div898/strd/nls/nls_main.shtml, the datasets Boston Housing, Heart Disease and Servo are downloaded from http://archive.ics.uci.edu/ml/datasets.html, the datasets Auto MPG and Bodyfat are downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/, the datasets Pollution Scale and Crabs are downloaded from http://stat.cmu.edu/datasets/, and the
datasets Compass and Bolts are downloaded from http://www.sci.usq.edu.au/staff/dunn/Datasets/index.html. The detailed information about these datasets is listed in Table 1. Each attribute of the samples, including the output, is normalized into [−1, 1], and the truncated parameter p ∈ {0.001, 0.01, 0.1}.

Table 1
Detailed information of fourteen benchmark regression datasets.

Datasets           Number of examples    Number of attributes
Chwirut            214                   1
Nelson             128                   2
Boston Housing     506                   13
Pollution Scale    60                    14
Gauss3             250                   1
Heart Disease      400                   4
Crabs              200                   6
Compass            108                   2
Bolts              40                    7
Motocycle          133                   1
Servo              167                   4
Auto MPG           392                   7
Bodyfat            252                   14
Enso               168                   1

In regression analysis, the mean absolute error (MAE) is usually used to evaluate the generalization error of an algorithm. However, statisticians have noticed that MAE is sensitive to noise and outliers in the testing set [57,58]. To solve this problem, statisticians suggested using the R² statistic to evaluate regression models in noisy circumstances. In this section, we use the R² statistic to evaluate the robustness of LS-SVM, WLS-SVM, IRLS-SVM and RLS-SVM, which is defined as follows [59]:

R^2 = 1 - \left( \frac{\mathrm{med}(|y_i - f(x_i)|)}{\mathrm{mad}(y_i)} \right)^2,   (67)

where med(|y_i − f(x_i)|) is the median of the absolute errors between the objective values and the forecast values of the testing samples, and

\mathrm{mad}(y_i) = \mathrm{med}(|y_i - \mathrm{med}(y_j)|)   (68)

is the median of the absolute errors between the objective values and the median of the objective values of the testing samples. Generally speaking, for a reasonable model, 0 ≤ R² ≤ 1. From Eq. (67), we know that the smaller med(|y_i − f(x_i)|) is, the larger R² is. It is very obvious that R² = 1 corresponds to a perfect fit and R² < 0 corresponds to a bad fit.

The optimal statistics R² obtained by LS-SVM, WLS-SVM, IRLS-SVM and RLS-SVM on the four synthetic regression datasets are reported in Table 2. The optimal statistics R² obtained by LS-SVM, WLS-SVM, IRLS-SVM and RLS-SVM on the fourteen benchmark regression datasets, the optimal parameters, the optimal weight-setting strategy, the corresponding training time and the testing time are reported in Table 3. The best statistics R² are presented in bold type.

Table 2
Comparison of the results of LS-SVM, WLS-SVM, IRLS-SVM and RLS-SVM on four synthetic regression datasets.

Datasets                   LS-SVM    WLS-SVM    IRLS-SVM    RLS-SVM
Synthetic dataset one      0.9920    0.9968     0.9986      0.9995
Synthetic dataset two      0.9986    0.9989     0.9999      0.9998
Synthetic dataset three    0.9959    0.9977     0.9991      0.9994
Synthetic dataset four     0.9905    0.9929     0.9907      0.9934

From Table 2, we can observe that RLS-SVM is the most robust to Gaussian noise, heterogeneous variant Gaussian noise, and transformed χ² noise. For multiplicative noise, the robustness of RLS-SVM is comparable with that of IRLS-SVM.

In the field of machine learning, the Friedman test is usually used to compare the performances of multiple learning machines [55]. It ranks the learning machines for each dataset separately: the learning machine with the best performance gets the rank of 1, and the second best gets the rank of 2. In case of equality (as for Gauss3), average ranks are assigned. Based on this ranking criterion, we first rank LS-SVM, WLS-SVM, IRLS-SVM and RLS-SVM for each dataset, and the results are reported in Table 3. Then we use the Friedman test to conduct a statistical comparison of these four learning machines and demonstrate the robustness of RLS-SVM, where the significance level α is set to 0.05.

From Table 3, we can obtain that the average ranks of LS-SVM, WLS-SVM, IRLS-SVM and RLS-SVM are R1 = 3.9286, R2 = 2.4643, R3 = 2.3214 and R4 = 1.2857, respectively. In the Friedman test, the statistic for comparing the i-th and j-th learning machines is z = (R_i − R_j)/\sqrt{k(k+1)/(6N)}, where k is the number of compared learning machines and N is the number of experimental datasets. After obtaining z, we find the corresponding probability prob from the table of the normal distribution and calculate p = 2 × (1 − prob). In order to run Hochberg's step-up procedure [60], we sort the hypotheses in descending order according to their significance. The sorted results and the corresponding p are reported in Table 4. From Table 4, we know that for the last hypothesis, 0.034 < 0.05. Therefore, we should reject this null hypothesis. It indicates that RLS-SVM is significantly more robust than LS-SVM, WLS-SVM and IRLS-SVM for regression with noise. However, from Table 3, we know that the training time of RLS-SVM is longer than that of LS-SVM and WLS-SVM, while the training time of IRLS-SVM is longer than that of RLS-SVM. As for the testing time, in most cases there are no significant differences among the four learning machines.

6.2. Classification experiments

In the field of machine learning, the noise in classification problems is usually divided into two categories [61]: contradictory examples, which appear more than once and are labeled with different classes, and misclassification examples, which are labeled with wrong classes. In the classification experiments, we firstly give two synthetic datasets with class noise, which are illustrated in Fig. 2, to show that RLS-SVM is insensitive to class noise. Secondly, we conduct experiments on ten benchmark datasets to test the robustness of RLS-SVM, where the dataset Ripley is from [62]; the datasets Banana, Cleveland Heart, Glass, Heartstatlog, Liver Disorder, Monk, PIMA, Transfusion and Vehicle are downloaded from http://archive.ics.uci.edu/ml/datasets.html. The detailed information about the benchmark datasets is listed in Table 5. Each attribute of the samples is normalized into [−1, 1], and the truncated parameter p ∈ {0.2, 0.5, 0.8, 1.0, 1.5, 2.0, 3.0}. In order to compare the robustness of RLS-SVM with that of LS-SVM and WLS-SVM, the experimental results on the two synthetic datasets are reported in Table 6. The testing accuracies of the three learning machines on the ten benchmark classification datasets, the optimal parameters, the optimal weight-setting strategy, the training time and the testing time are reported in Table 7. The best testing accuracies are presented in bold type.

From Table 6, we can observe that RLS-SVM is the most robust to class noise among the three learning machines.

Next, we analyze the experimental results obtained by the three learning machines on the ten benchmark datasets. In order to demonstrate the robustness of RLS-SVM on these datasets, similarly to the regression analysis above, we first rank LS-SVM, WLS-SVM and RLS-SVM for each classification dataset, and the results are listed in Table 7. Then we use the Friedman test to conduct a statistical comparison of these three learning machines, where the significance level α is set to 0.05.
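To illustrate the Friedman-type comparison used in this section, the sketch below computes average ranks over datasets and the pairwise statistic z = (R_i − R_j)/√(k(k+1)/(6N)) with p = 2(1 − prob); the score matrix is made-up example data, not the results of Tables 3 or 7.

```python
import numpy as np
from math import erf, sqrt

def average_ranks(scores):
    """scores: (N datasets x k machines), larger is better; ties receive average ranks."""
    N, k = scores.shape
    ranks = np.zeros_like(scores, dtype=float)
    for row in range(N):
        order = (-scores[row]).argsort()
        r = np.empty(k)
        r[order] = np.arange(1, k + 1)
        for v in np.unique(scores[row]):      # average the ranks of tied scores
            tied = scores[row] == v
            r[tied] = r[tied].mean()
        ranks[row] = r
    return ranks.mean(axis=0)

def pairwise_z(Ri, Rj, k, N):
    # z = (R_i - R_j) / sqrt(k(k+1)/(6N)); p = 2 * (1 - Phi(|z|)) via the normal CDF
    z = (Ri - Rj) / sqrt(k * (k + 1) / (6.0 * N))
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
    return z, p

# Hypothetical R^2 scores of 4 machines on 5 datasets (illustrative only)
scores = np.array([[0.90, 0.92, 0.93, 0.95],
                   [0.60, 0.66, 0.67, 0.70],
                   [0.85, 0.88, 0.87, 0.89],
                   [0.40, 0.55, 0.60, 0.75],
                   [0.99, 0.99, 0.99, 0.99]])
R = average_ranks(scores)
z, p = pairwise_z(R[0], R[3], k=4, N=5)
```

The resulting p values are then compared against the step-up or step-down thresholds (Hochberg's procedure for the regression comparison, Holm's procedure for the classification comparison) as described in the text.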

Table 3
Comparison of the results of LS-SVM, WLS-SVM, IRLS-SVM and RLS-SVM on fourteen benchmark regression datasets.

Datasets Algorithms p h δ s C The optimal weight R2 Ranks Training time (s) Testing time (s)

Chwirut LS-SVM – – – 1.0000 32.0000 – 0.9769 4 0.0858 0.0000


WLS-SVM – – – 1.0000 32.0000 sSuyken 0.9793 3 0.1609 0.0000
i
IRLS-SVM – – 2.0 1.0000 32.0000 0.9800 2 1.8548 0.0000
RLS-SVM 0.01 0.075 – 1.0000 64.0000 – 0.9815 1 0.3811 0.0000

Nelson LS-SVM – – – 1.0000 32.0000 – 0.6517 4 0.0204 0.0000


WLS-SVM – – – 1.0000 64.0000 shyp  lin 0.6713 3 0.0234 0.0000
i
IRLS-SVM – – 2.5 1.0000 32.0000 0.6719 2 0.1908 0.0000
RLS-SVM 0.01 0.050 – 1.0000 128.0000 – 0.7128 1 0.1125 0.0000

Boston Housing LS-SVM – – – 1.0000 64.0000 – 0.8708 4 1.4753 0.0016


WLS-SVM – – – 1.0000 256.0000 sSuyken 0.8818 2 2.6921 0.0016
i
IRLS-SVM – – 1.0 1.0000 64.0000 0.8735 3 147.0519 0.0062
RLS-SVM 0.01 0.025 – 1.0000 32.0000 – 0.8839 1 5.9359 0.0078

Pollution Scale LS-SVM – – – 4.0000 16.0000 – 0.6413 4 0.0047 0.0000


WLS-SVM – – – 4.0000 16.0000 shyp  lin 0.6938 1 0.0221 0.0016
i
IRLS-SVM – – 2.5 4.0000 16.0000 0.6845 3 0.0358 0.0000
RLS-SVM 0.01 0.05 – 4.0000 16.0000 – 0.6928 2 0.0235 0.0000

Gauss3 LS-SVM – – – 0.1250 32.0000 – 0.9972 4 0.1266 0.0000


WLS-SVM – – – 0.1250 128.0000 sSuyken 0.9973 2.5 0.2594 0.0000
i
IRLS-SVM – – 4.0 0.1250 32.0000 0.9973 2.5 0.9078 0.0000
RLS-SVM 0.01 0.075 – 0.1250 64.0000 – 0.9974 1 0.5702 0.0000

Heart Disease LS-SVM – – – 1.0000 4.0000 – -0.4059 4 0.7656 0.0016


WLS-SVM – – – 0.5000 8.0000 sSuyken 0.5298 3 2.6112 0.0031
i
IRLS-SVM – – 0.5 1.0000 4.0000 0.7240 1 18.2749 0.0015
RLS-SVM 0.001 0.025 – 1.0000 512.0000 – 0.5988 2 3.6968 0.0000

Crabs LS-SVM – – – 1.0000 128.0000 – 0.9876 3 0.0812 0.0000


WLS-SVM – – – 1.0000 128.0000 shyp  exp 0.9876 3 0.1656 0.0016
i
IRLS-SVM – – 15.0 1.0000 128.0000 0.9876 3 0.3608 0.0000
RLS-SVM 0.01 0.100 – 1.0000 512.0000 – 0.9877 1 0.3689 0.0000

Compass LS-SVM – – – 1.0000 2.0000 – 0.3941 4 0.0144 0.0000


WLS-SVM – – – 32.0000 256.0000 sSuyken 0.6086 3 0.0250 0.0000
i
IRLS-SVM – – 0.5 1.0000 2.0000 0.6130 2 0.2347 0.0000
RLS-SVM 0.01 0.125 – 4.0000 64.0000 – 0.8760 1 0.0685 0.0000

Bolts LS-SVM – – – 8.0000 512.0000 – 0.5020 4 0.0032 0.0000


WLS-SVM – – – 8.0000 512.0000 shyp  lin 0.6010 3 0.0032 0.0000
i
IRLS-SVM – – 4.0 8.0000 512.0000 0.8768 1 0.0359 0.0000
RLS-SVM 0.01 0.075 – 8.0000 512.0000 – 0.7293 2 0.0094 0.0000

Motocycle LS-SVM – – – 0.1250 512.0000 – 0.4818 4 0.0219 0.0000


WLS-SVM – – – 0.1250 128.0000 shyp  lin 0.5176 3 0.0235 0.0000
i
IRLS-SVM – – 0.5 0.1250 512.0000 0.6854 2 0.8609 0.0000
RLS-SVM 0.01 0.050 – 0.1250 4.0000 – 0.6911 1 0.1156 0.0000

Servo LS-SVM – – – 1.0000 32.0000 – 0.7466 4 0.0468 0.0000


WLS-SVM – – – 1.0000 256.0000 sSuyken 0.8763 2 0.0781 0.0000
i
IRLS-SVM – – 2.0 1.0000 32.0000 0.8561 3 1.8952 0.0000
RLS-SVM 0.001 0.025 – 1.0000 256.0000 – 0.8936 1 0.3249 0.0015

Auto MPG LS-SVM – – – 1.0000 32.0000 – 0.9333 4 0.8671 0.0032


WLS-SVM – – – 2.0000 256.0000 sSuyken 0.9413 2 1.4564 0.0032
i
IRLS-SVM – – 10.0 1.0000 32.0000 0.9349 3 5.7719 0.0031
RLS-SVM 0.001 0.025 – 1.0000 256.0000 – 0.9416 1 2.7312 0.0016

Bodyfat LS-SVM – – – 8.0000 256.0000 – 0.9979 4 0.1997 0.0000


WLS-SVM – – – 8.0000 512.0000 sSuyken 0.9997 1 0.3279 0.0000
i
IRLS-SVM – – 10.0 8.0000 256.0000 0.9993 3 1.3656 0.0015
RLS-SVM 0.001 0.025 – 8.0000 512.0000 – 0.9994 2 0.8172 0.0000

Enso LS-SVM – – – 0.0625 8.0000 – 0.6891 4 0.0439 0.0000


WLS-SVM – – – 0.0625 8.0000 shyp  exp 0.6905 3 0.0876 0.0000
i
IRLS-SVM – – 4.0 0.0625 8.0000 0.6967 2 0.3437 0.0000
RLS-SVM 0.1 0.300 – 0.0625 16.0000 – 0.7059 1 0.2015 0.0000

From Table 7, we can obtain that the average ranks of LS-SVM, WLS-SVM and RLS-SVM are R1 = 2.75, R2 = 1.95 and R3 = 1.30, respectively. In order to run Holm's step-down procedure [63], we sort the hypotheses in descending order according to their significance. The sorted results and the corresponding p are reported in Table 8. From Table 8, we know that for the first hypothesis, 0.001 < 0.025. Therefore, RLS-SVM is significantly more robust than LS-SVM. However, for the last hypothesis, 0.146 > 0.05. It indicates that the
Friedman test cannot detect the significance between RLS-SVM and WLS-SVM. Considering that the Wilcoxon signed-ranks test is used to compare the performances of two learning machines in the field of machine learning [55], in the following we use it to detect the significance between RLS-SVM and WLS-SVM.

The Wilcoxon signed-ranks test ranks the differences in the performances of two learning machines for each dataset. The differences are ranked according to their absolute values: the smallest absolute value gets the rank of 1, the second smallest gets the rank of 2. In case of equality, average ranks are assigned. The statistic of the Wilcoxon signed-ranks test is [55]:

z(a, b) = \frac{T(a, b) - N(N+1)/4}{\sqrt{\frac{1}{24} N (N+1)(2N+1)}},   (69)

where T(a, b) = min{R^+(a, b), R^-(a, b)}, N is the number of experimental datasets, R^+(a, b) is the sum of ranks for the experimental datasets on which learning machine b outperforms learning machine a, and R^-(a, b) is the sum of ranks for the opposite, which are defined as follows:

R^+(a, b) = \sum_{d_i > 0} \mathrm{rank}(d_i) + \frac{1}{2} \sum_{d_i = 0} \mathrm{rank}(d_i),   (70)

Table 4
The sorted results of significance of RLS-SVM vs. LS-SVM, WLS-SVM and IRLS-SVM for regression.

i    Learning machines    z = (R_i − R_4)/\sqrt{k(k+1)/(6N)}    p        α/k_i
1    LS-SVM               (3.9286 − 1.2857)/0.488 = 5.416       0.000    0.017
2    WLS-SVM              (2.4643 − 1.2857)/0.488 = 2.415       0.016    0.025
3    IRLS-SVM             (2.3214 − 1.2857)/0.488 = 2.122       0.034    0.05

Table 5
Detailed information of ten benchmark datasets.

Datasets           Number of examples    Number of classes    Number of attributes
Banana             400                   2                    2
Cleveland heart    303                   5                    13
Glass              211                   6                    9
Heartstatlog       270                   2                    13
Liver disorder     345                   2                    6
Monk               122                   2                    6
PIMA               768                   2                    8
Ripley             250                   2                    2
Transfusion        748                   2                    4
Vehicle            846                   4                    18

Fig. 2. Two synthetic classification datasets: (a) training dataset one with misclassification examples, where the number of training examples is 172; (b) training dataset two with contradictory examples and misclassification examples, where the number of training examples is 174; and (c) testing dataset, where the number of testing examples is 205.
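The Wilcoxon signed-ranks comparison of (69)–(71) can be sketched as follows; the handling of zero differences (split equally between R+ and R−) follows (70) and (71), and the accuracy vectors passed in are assumed placeholders.

```python
import numpy as np
from math import sqrt

def wilcoxon_signed_ranks(scores_a, scores_b):
    """z statistic of (69) for comparing machine a and machine b over N datasets.
    d_i = score_b - score_a; ranks of |d_i| start at 1, and ties get average ranks."""
    d = np.asarray(scores_b, dtype=float) - np.asarray(scores_a, dtype=float)
    N = d.size
    abs_d = np.abs(d)
    order = abs_d.argsort()
    ranks = np.empty(N)
    ranks[order] = np.arange(1, N + 1)
    for v in np.unique(abs_d):                  # average ranks for tied |d_i|
        tied = abs_d == v
        ranks[tied] = ranks[tied].mean()
    r_plus = ranks[d > 0].sum() + 0.5 * ranks[d == 0].sum()    # (70)
    r_minus = ranks[d < 0].sum() + 0.5 * ranks[d == 0].sum()   # (71)
    T = min(r_plus, r_minus)
    z = (T - N * (N + 1) / 4.0) / sqrt(N * (N + 1) * (2 * N + 1) / 24.0)   # (69)
    return z, r_plus, r_minus
```

For N = 10 datasets, |z| > 1.96 rejects the null hypothesis at the 0.05 level, which is the criterion applied to z(WLS-SVM, RLS-SVM) = −2.29 in Section 6.2.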

Table 6
Comparison of the results of LS-SVM, WLS-SVM and RLS-SVM on two synthetic classification datasets.

Datasets Algorithms p h s C The optimal weight Testing accuracy

Synthetic dataset one LS-SVM – – 0.0625 1.0000 – 0.9561


WLS-SVM – – 0.0625 128.0000 sSuyken 0.9561
i
RLS-SVM 0.2 0.025 32 256.0000 – 1.0000

Synthetic dataset two LS-SVM – – 0.2500 128.000 – 0.9512


WLS-SVM – – 0.5000 512.0000 shyp  exp 0.9805
i
RLS-SVM 0.2 0.025 32.0000 256.0000 – 1.0000

Table 7
Comparison of the results of LS-SVM, WLS-SVM and RLS-SVM on ten benchmark classification datasets.

Datasets Algorithms p h s C The optimal weight Testing accuracy Ranks Training time (s) Testing time (s)

Banana LS-SVM – – 0.1250 1.0000 – 0.8850 3 0.7717 0.0000


WLS-SVM – – 0.1250 2.0000 shyp  exp 0.8875 1.5 0.9094 0.0016
i
RLS-SVM 0.2 0.050 0.1250 8.0000 – 0.8875 1.5 1.7048 0.0000

Cleveland Heart LS-SVM – – 16.000 16.000 – 0.5433 3 0.6420 0.0047


WLS-SVM – – 16.0000 8.0000 sSuyken 0.5567 2 1.0487 0.0062
i
RLS-SVM 0.2 0.025 8.0000 256.0000 – 0.5633 1 1.6405 0.0077

Glass LS-SVM – – 1.0000 8.0000 – 0.6952 2.5 0.2153 0.0062


WLS-SVM – – 2.0000 64.0000 shyp  exp 0.6952 2.5 0.3216 0.0000
i
RLS-SVM 0.2 0.450 2.0000 512.0000 – 0.7381 1 0.5845 0.0000

Heartstalog LS-SVM – – 32.0000 64.0000 – 0.8407 2 0.2780 0.0015


 exp
WLS-SVM – – 32.0000 256.0000 scen
i
0.8407 2 0.3609 0.0030
RLS-SVM 2.0 0.150 2.0000 1.0000 – 0.8407 2 0.6797 0.0000
Liver disorder LS-SVM – – 4.0000 128.0000 – 0.7735 3 0.5280 0.0015
WLS-SVM – – 4.0000 128.0000 scen
i
 lin 0.7824 1.5 0.7437 0.0016
RLS-SVM 3.0 0.025 4.0000 128.0000 – 0.7824 1.5 1.1063 0.0000

Monk LS-SVM – – 4.0000 512.0000 – 0.9167 2 0.0314 0.0000


WLS-SVM – – 2.0000 64.0000 ssph  exp 0.9167 2 0.0470 0.0000
i
RLS-SVM 3.0 0.025 4.0000 512.0000 – 0.9167 2 0.0683 0.0000

PIMA LS-SVM – – 8.0000 256.0000 – 0.7750 3 5.5578 0.0046


WLS-SVM – – 2.0000 4.0000 shyp  lin 0.7789 2 12.2968 0.0109
i
RLS-SVM 2.0 0.350 4.0000 64.0000 – 0.7816 1 11.1577 0.0046

Ripley LS-SVM – – 0.5000 8.0000 – 0.8800 3 0.1717 0.0000


WLS-SVM – – 0.1250 16.0000 sSuyken 0.8960 2 0.3733 0.0000
i
RLS-SVM 0.8 0.125 0.2500 2.0000 – 0.9000 1 0.4031 0.0000

Transfusion LS-SVM – – 1.0000 64.0000 – 0.8243 3 5.0513 0.0047


WLS-SVM – – 0.2500 1.0000 sSuyken 0.8284 2 11.5060 0.0109
i
RLS-SVM 0.8 0.100 0.1250 2.0000 – 0.8311 1 10.2362 0.0047

Vehicle LS-SVM – – 2.0000 512.0000 – 0.8571 3 7.3796 0.0331


WLS-SVM – – 2.0000 512.0000 shyp  exp 0.8583 2 14.1562 0.0329
i
RLS-SVM 2.0 0.225 2.0000 128.0000 – 0.8726 1 16.0751 0.0328

Table 8
The sorted results of significance of RLS-SVM vs. LS-SVM and WLS-SVM for classification.

i    Learning machines    z = (R_i − R_3)/\sqrt{k(k+1)/(6N)}    p        α/k_i
1    LS-SVM               (2.75 − 1.30)/0.447 = 3.244           0.001    0.025
2    WLS-SVM              (1.95 − 1.30)/0.447 = 1.454           0.146    0.05

Table 9
The difference between the optimal testing accuracies of WLS-SVM and RLS-SVM and their rank values on ten benchmark classification datasets.

Datasets           WLS-SVM    RLS-SVM    d_i       rank(d_i)
Banana             0.8875     0.8875     0.0000    2.5
Cleveland heart    0.5567     0.5633     0.0066    8
Glass              0.6952     0.7381     0.0429    10
Heartstalog        0.8407     0.8407     0.0000    2.5
Liver disorder     0.7824     0.7824     0.0000    2.5
Monk               0.9167     0.9167     0.0000    2.5
PIMA               0.7789     0.7816     0.0027    5.5
Ripley             0.8960     0.9000     0.0040    7
Transfusion        0.8284     0.8311     0.0027    5.5
Vehicle            0.8583     0.8726     0.0143    9

R^-(a, b) = \sum_{d_i < 0} \mathrm{rank}(d_i) + \frac{1}{2} \sum_{d_i = 0} \mathrm{rank}(d_i),   (71)

where d_i is the difference between the performance scores of the two learning machines on the ith experimental dataset and rank(d_i) is the
rank value of |d_i|. d_i and rank(d_i) on the ten classification datasets are reported in Table 9.

From Table 9, based on formulas (69)–(71), we can obtain z(WLS-SVM, RLS-SVM) = −2.29 < −1.96. It shows that for the significance level of 0.05, RLS-SVM is significantly more robust than WLS-SVM for classification with noise. However, from Table 7, we can find that the training time of RLS-SVM is longer than that of LS-SVM and WLS-SVM for classification with noise. In most cases, the testing time of RLS-SVM is the shortest.

7. Conclusion and future work

The contributions of this paper are as follows. Firstly, a novel RLS-SVM model is presented based on the truncated least squares loss for regression and classification with noise. Secondly, the relationship between the solutions of WLS-SVM and the equivalent model of RLS-SVM is discussed. Thirdly, an algorithm for solving RLS-SVM is presented based on the CCCP and the Newton algorithm. The experiments have been conducted on four synthetic regression datasets, fourteen benchmark regression datasets, two synthetic classification datasets, and ten benchmark classification datasets to test the robustness of RLS-SVM. The results show that RLS-SVM is significantly more robust than LS-SVM, WLS-SVM and IRLS-SVM for regression with noise. As for classification with noise, RLS-SVM is also the most robust among the compared three learning machines. The main shortcoming of RLS-SVM is that its training time is usually longer than that of LS-SVM and WLS-SVM.

In future work, we will investigate techniques of data sampling and data compressing so that RLS-SVM can be applied to large-scale regression and classification with noise. Another interesting topic would be to design some pruning algorithms for the proposed RLS-SVM to obtain sparse solutions. Further study on this topic will also include many applications of RLS-SVM in real-world regression and classification with noise.

Acknowledgments

The work presented in this paper is supported by the National Science Foundation of China (61273295), the Major Project of the National Social Science Foundation of China (11&ZD156), and the Open Project of Key Laboratory of Symbolic Computation and Knowledge Engineering of the Chinese Ministry of Education (93K-17-2009-K04).

References

[1] S.S. Keerthi, S.K. Shevade, C. Bhattacharyya, K.R.K. Murthy, Improvements to Platt's SMO algorithm for SVM classifier design, Neural Comput. 13 (3) (2001) 637–649.
[2] I.W. Tsang, J.T.Y. Kwok, P.M. Cheung, Core vector machines: fast SVM training on very large data sets, J. Mach. Learn. Res. 6 (2005) 363–392.
[3] P.H. Chen, R.E. Fan, C.J. Lin, A study on SMO-type decomposition methods for support vector machines, IEEE Trans. Neural Netw. 17 (4) (2006) 893–908.
[4] F. Chang, C.Y. Guo, X.R. Lin, C.J. Lu, Tree decomposition for large-scale SVM problems, J. Mach. Learn. Res. 11 (2010) 2935–2972.
[5] S.K. Shevade, S.S. Keerthi, C. Bhattacharyya, K.R.K. Murthy, Improvements to SMO algorithm for SVM regression, IEEE Trans. Neural Netw. 11 (5) (2000) 1188–1193.
[6] G.W. Flake, S. Lawrence, Efficient SVM regression training with SMO, Mach. Learn. 46 (1–3) (2002) 271–290.
[7] A.J. Smola, B. Schölkopf, A tutorial on support vector regression, Stat. Comput. 14 (3) (2004) 199–222.
[8] D. Isa, L.H. Lee, V.P. Kallimani, R. RajKumar, Text document preprocessing with the Bayes formula for classification using the support vector machine, IEEE Trans. Knowl. Data Eng. 20 (9) (2008) 1264–1272.
[9] O. Amayri, N. Bouguila, A study of spam filtering using support vector machines, Artif. Intell. Rev. 34 (1) (2010) 73–108.
[10] H. Sahbi, J.Y. Audibert, R. Keriven, Context-dependent kernels for object classification, IEEE Trans. Pattern Anal. Mach. Intell. 33 (4) (2011) 699–708.
[11] M.M. Rahman, S.K. Antani, G.R. Thoma, A learning-based similarity fusion and filtering approach for biomedical image retrieval using SVM classification and relevance feedback, IEEE Trans. Inf. Technol. Biomed. 15 (4) (2011) 640–646.
[12] Y.Q. Li, C.T. Guan, Joint feature re-extraction and classification using an iterative semi-supervised support vector machine algorithm, Mach. Learn. 71 (1) (2008) 33–53.
[13] K.Q. Shen, C.J. Ong, X.P. Li, Feature selection via sensitivity analysis of SVM probabilistic outputs, Mach. Learn. 70 (1) (2008) 1–20.
[14] C. Dhanjal, S.R. Gunn, J. Shawe-Taylor, Efficient sparse kernel feature extraction based on partial least squares, IEEE Trans. Pattern Anal. Mach. Intell. 31 (8) (2009) 1347–1361.
[15] D. Bollegala, Y. Matsuo, M. Ishizuka, A web search engine-based approach to measure semantic similarity between words, IEEE Trans. Knowl. Data Eng. 23 (7) (2011) 977–990.
[16] L.J. Cao, F.E.H. Tay, Support vector machine with adaptive parameters in financial time series forecasting, IEEE Trans. Neural Netw. 14 (6) (2003) 1506–1518.
[17] M. Narwaria, W.W. Lin, Objective image quality estimation based on support vector regression, IEEE Trans. Neural Netw. 21 (3) (2010) 515–519.
[18] J.A.K. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Process. Lett. 9 (3) (1999) 293–300.
[19] T. Van Gestel, J.A.K. Suykens, B. Baesens, S. Viaene, J. Vanthienen, G. Dedene, B. De Moor, J. Vandewalle, Benchmarking least squares support vector machine classifiers, Mach. Learn. 54 (2004) 5–32.
[20] C. Saunders, A. Gammerman, V. Vovk, Ridge regression learning algorithm in dual variables, in: Proceedings of the 15th International Conference on Machine Learning (ICML-98), Morgan Kaufmann, 1998, pp. 515–521.
[21] T. Van Gestel, J.A.K. Suykens, D.E. Baestaens, A. Lambrechts, G. Lanckriet, B. Vandaele, B. De Moor, J. Vandewalle, Financial time series prediction using least squares support vector machines within the evidence framework, IEEE Trans. Neural Netw. 12 (4) (2001) 809–821.
[22] Z. Wang, S.C. Chen, New least squares support vector machines based on matrix patterns, Neural Process. Lett. 26 (2007) 41–56.
[23] J.A.K. Suykens, L. Lukas, P. Van Dooren, B. De Moor, J. Vandewalle, Least squares support vector machine classifiers: a large scale algorithm, in: Proceedings of the European Conference on Circuit Theory and Design (ECCTD'99), Stresa, Italy, 1999, pp. 839–842.
[24] B. Hamers, J.A.K. Suykens, B. De Moor, A comparison of iterative methods for least squares support vector machine classifiers, ESAT-SISTA, K.U. Leuven, Leuven, Belgium, Internal Report 01-110, 2001.
[25] W. Chu, C. Ong, S.S. Keerthi, An improved conjugate gradient scheme to the solution of least squares SVM, IEEE Trans. Neural Netw. 16 (2) (2005) 498–501.
[26] S.S. Keerthi, S.K. Shevade, SMO algorithm for least squares SVM formulations, Neural Comput. 15 (2003) 487–507.
[27] K. Chua, Efficient computations for large least square support vector machine classifiers, Pattern Recogn. Lett. 24 (2003) 75–80.
[28] V. Mitra, C.J. Wang, S. Banerjee, Text classification: a least square support vector machine approach, Appl. Soft Comput. 7 (3) (2007) 908–914.
[29] S. Zheng, W.Z. Shi, J. Liu, G.X. Zhu, J.W. Tian, Multisource image fusion method using support value transform, IEEE Trans. Image Process. 16 (7) (2007) 1831–1839.
[30] J. Luts, A. Heerschap, J.A.K. Suykens, S. Van Huffel, A combined MRI and MRSI based multiclass system for brain tumour recognition using LS-SVMs with class probabilities and feature selection, Artif. Intell. Med. 40 (2) (2007) 87–102.
[31] M.M. Adankon, M. Cheriet, Model selection for the LS-SVM. Application to handwriting recognition, Pattern Recogn. 42 (12) (2009) 3264–3270.
[32] Z.Y. Xin, M. Gu, Complicated financial data time series forecasting analysis based on least square support vector machine, J. Tsinghua Univ. (Sci. Technol.) 48 (7) (2008) 1147–1149.
[33] R. Ormandi, Variance minimization least squares support vector machines for time series analysis, in: Proceedings of the 8th IEEE International Conference on Data Mining, Pisa, Italy, 2008, pp. 965–970.
[34] J.A.K. Suykens, J. Vandewalle, B. De Moor, Optimal control by least squares support vector machines, Neural Netw. 14 (1) (2001) 23–35.
[35] H.M. Khalil, M. El-Bardini, Implementation of speed controller for rotary hydraulic motor based on LS-SVM, Expert Syst. Appl. 38 (11) (2011) 14249–14256.
[36] J. Pahasa, I. Ngamroo, A heuristic training-based least squares support vector machines for power system stabilization by SMES, Expert Syst. Appl. 38 (11) (2011) 13987–13993.
[37] J.A.K. Suykens, L. Lukas, J. Vandewalle, Sparse approximation using least squares support vector machines, in: Proceedings of the IEEE International Symposium on Circuits and Systems, Geneva, Switzerland, 2000, pp. 757–760.
[38] J.L. Liu, J.P. Li, W.X. Xu, Y. Shi, A weighted Lq adaptive least squares support vector machine classifiers – robust and sparse approximation, Expert Syst. Appl. 38 (3) (2011) 2253–2259.
[39] W. Wen, Z.F. Hao, X.W. Yang, A heuristic weight-setting and iteratively updating algorithm for weighted least squares support vector regression, Neurocomputing 71 (16–18) (2008) 3096–3103.
[40] T.W. Quan, X.M. Liu, Q. Liu, Weighted least squares support vector machine local region method for nonlinear time series prediction, Appl. Soft Comput. 10 (2) (2010) 562–566.
[41] B. de Kruif, T. de Vries, Pruning error minimization in least squares support vector machines, IEEE Trans. Neural Netw. 14 (3) (2003) 696–702.
[42] L. Hoegaerts, J.A.K. Suykens, J. Vandewalle, B. De Moor, A comparison of pruning algorithms for sparse least squares support vector machines, in: Proceedings of ICONIP 2004, Lecture Notes in Computer Science, vol. 3316, Springer-Verlag, Berlin Heidelberg, 2004, pp. 1247–1253.
[43] X.Y. Zeng, X.W. Chen, SMO based pruning methods for sparse least squares support vector machines, IEEE Trans. Neural Netw. 16 (6) (2005) 1541–1546.
[44] X.W. Yang, J. Lu, G.Q. Zhang, Adaptive pruning algorithm for least squares support vector machine classifier, Soft Comput. 14 (7) (2010) 667–680.
[45] J.A.K. Suykens, J. De Brabanter, L. Lukas, J. Vandewalle, Weighted least squares support vector machines: robustness and sparse approximation, Neurocomputing 48 (1–4) (2002) 85–105.
[46] G.C. Cawley, Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs, in: Proceedings of the International Joint Conference on Neural Networks, IEEE Press, Vancouver, 2006, pp. 1661–1668.
[47] C.A.M. Lima, A.L.V. Coelho, F.J. Von Zuben, Pattern classification with mixtures of weighted least-squares support vector machine experts, Neural Comput. Appl. 18 (2009) 843–860.
[48] K. De Brabanter, K. Pelckmans, J. De Brabanter, M. Debruyne, J.A.K. Suykens, M. Hubert, B. De Moor, Robustness of kernel based regression: a comparison of iterative weighting schemes, in: Proceedings of the 19th International Conference on Artificial Neural Networks (ICANN), Limassol, Cyprus, 2009, pp. 100–110.
[49] W. Pan, Q.H. Hu, P.J. Ma, X.H. Su, Robust feature selection based on regularized brownboost loss, Knowl. Based Syst. 54 (2013) 180–198.
[50] X.L. Huang, L. Shi, J.A.K. Suykens, Support vector machine classifier with pinball loss, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, IEEE Computer Society Digital Library, http://doi.ieeecomputersociety.org/10.1109/TPAMI.2013.178.
[51] L.L. Xu, K. Crammer, D. Schuurmans, Robust support vector machine training via convex outlier ablation, in: AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence, Boston, Massachusetts, AAAI Press, 2006, pp. 536–542.
[52] K. Liano, Robust error measure for supervised neural network learning with outliers, IEEE Trans. Neural Netw. 7 (1) (1996) 246–250.
[53] O. Chapelle, Training a support vector machine in the primal, Neural Comput. 19 (5) (2007) 1155–1178.
[54] A.L. Yuille, A. Rangarajan, The concave-convex procedure, Neural Comput. 15 (4) (2003) 915–936.
[55] J. Demsar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1–30.
[56] R. Batuwita, V. Palade, FSVM-CIL: fuzzy support vector machines for class imbalance learning, IEEE Trans. Fuzzy Syst. 18 (3) (2010) 558–571.
[57] T.O. Kvalseth, Cautionary note about R2, Am. Stat. 39 (4) (1985) 279–285.
[58] P.J. Rousseeuw, A. Leroy, Robust Regression and Outlier Detection, John Wiley & Sons, New York, 1987.
[59] W. Wen, Z.F. Hao, X.W. Yang, Robust least squares support vector machine based on recursive outlier elimination, Soft Comput. 14 (11) (2010) 1241–1251.
[60] Y. Hochberg, A sharper Bonferroni procedure for multiple tests of significance, Biometrika 75 (4) (1988) 800–802.
[61] X.D. Wu, Knowledge Acquisition from Databases, Ablex Publishing Corporation, Norwood, New Jersey, 1995.
[62] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 1996.
[63] S. Holm, A simple sequentially rejective multiple test procedure, Scand. J. Stat. 6 (1979) 65–70.

Xiaowei Yang received the B.S. degree in theoretical and applied mechanics, the M.Sc. degree in computational mechanics, and the Ph.D. degree in solid mechanics from Jilin University, Changchun, China, in 1991, 1996, and 2000, respectively. He is currently a full-time professor in the Department of Mathematics, South China University of Technology. His current research interests include designs and analyses of algorithms for large-scale pattern recognition, imbalanced learning, semi-supervised learning, support vector machines, tensor learning, and evolutionary computation. He has published more than 90 journal and refereed international conference articles, including the areas of structural reanalysis, interval analysis, soft computing, support vector machines, and tensor learning.

Liangjun Tan received the B.S. degree in information management and information systems and the M.Sc. degree in computational mathematics from South China University of Technology, Guangzhou, China, in 2010 and 2013, respectively. His current research interests include robust learning, support vector machines, and tensor learning.

Lifang He received the B.S. degree in information and computing science from Northwest Normal University in 2009. She is currently a Ph.D. candidate in the Department of Computer Science and Engineering, South China University of Technology. Her current research interests include manifold learning, machine learning, tensor learning, and pattern recognition.
