
where σ is the bandwidth parameter and $\eta_{ij}$ denotes the distance between y and $x_{ij}$. Clearly, the larger $\eta_{ij}$ is, the smaller the weight coefficient is. In (5), when p is assigned as 1 or 2, the corresponding weighted classifiers are WSRC [18] and WCRC [19], respectively.
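As a concrete illustration of how such locality weights can be obtained, the following Python sketch computes one weight per training sample from its distance to the query, assuming for illustration a kernel of the form exp(−η/σ); the exact weighting function is the one defined in (5)–(6), and the function and variable names below are not from the original.

```python
import numpy as np

def locality_weights(y, X, sigma=1.0):
    """Illustrative locality weights: one weight per training sample.

    y     : (d,) query vector
    X     : (d, n) dictionary whose columns are training samples
    sigma : bandwidth parameter (sigma in the text)

    Assumes a kernel of the form exp(-eta / sigma), where eta is the
    Euclidean distance between y and each training sample, so larger
    distances give smaller weights, as described above.
    """
    eta = np.linalg.norm(X - y[:, None], axis=0)  # distances eta between y and each column
    return np.exp(-eta / sigma)                   # larger eta -> smaller weight
```

In WSRC and WCRC these weights then scale the per-coefficient penalties in the ℓ1- or ℓ2-regularizer, so samples far from the query are discouraged from contributing to the representation.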
Similar to WSRC and WCRC, LGSR [20] combines the reconstruction error with a data-locality-constrained group sparsity regularizer. However, the locality-constrained regularization term disrupts the group structure of the sparse solution. WGSC [21] not only imposes locality constraints on group sparsity to exclude training samples that are far away from the test sample, but also considers the similarity between the test sample and each class. Accordingly, the weight coefficients of WGSC are formulated as η = r d, where d is computed in the same way as in (6), and r is used to assess the relative importance of all the reference classes. Inspired by LRC [9], r is computed as
$$r_i = \|y - X_i \theta_i^{*}\|_2^2, \quad \theta_i^{*} = \arg\min_{\theta_i} \|y - X_i \theta_i\|_2^2. \tag{7}$$
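As a minimal sketch of (7), the query can be regressed onto each class sub-dictionary by ordinary least squares, and $r_i$ taken as the squared ℓ2-norm of the residual; the data layout below (one array per class) is an assumption made only for illustration.

```python
import numpy as np

def class_residuals(y, class_dicts):
    """r_i = ||y - X_i theta_i*||_2^2, with theta_i* the least-squares
    solution of min ||y - X_i theta_i||_2^2, following (7).

    y           : (d,) query vector
    class_dicts : list of (d, n_i) arrays, one sub-dictionary X_i per class
    """
    r = []
    for X_i in class_dicts:
        theta_star, *_ = np.linalg.lstsq(X_i, y, rcond=None)  # theta_i*
        r.append(np.sum((y - X_i @ theta_star) ** 2))         # squared residual
    return np.array(r)
```

These class-wise residuals are then combined with the locality term d to form the WGSC weights η = r d described above.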
C. Robust Regression Type Classifier
Despite the diversity of norm-constrained regularizers, the aforementioned classifiers all employ the $\ell_2$-norm to characterize the reconstruction residual. Since the $\ell_2$-norm is sensitive to large outliers, RSRC [5] introduces the $\ell_1$-norm into SRC to make the residual estimation more robust. Similarly, RCRC [23] characterizes the representation fidelity of CRC with the $\ell_1$-norm for robustness to corruptions/occlusions. That is to say, the value of q in (4) is set to 1. From the perspective of maximum likelihood estimation, the $\ell_2$-norm and $\ell_1$-norm fidelities rest on the hypothesis that the error residuals are independent and identically distributed according to a Gaussian or a Laplacian distribution, respectively. RRC [28] models sparse coding as a robust regression problem and adopts an adaptive distribution to characterize the noise. Specifically, RRC uses the following criterion
$$\min_{\theta} \;\|s \odot (y - X\theta)\|_2^2 + \lambda \|\theta\|_p^p, \tag{8}$$
where p can be 1 or 2 (termed RRCL1 or RRCL2, respectively). Rewrite X as $X = [z_1; z_2; \ldots; z_m]$, where $z_i \in \mathbb{R}^n$ is the ith row of X, and let $e = y - X\theta = [e_1; e_2; \ldots; e_m]$, where $e_i = y_i - z_i\theta$, $i = 1, 2, \ldots, m$. The feature weight s in (8) is set to be the following logistic function
$$s_i = \frac{\exp(-\mu e_i^2 + \mu\delta)}{1 + \exp(-\mu e_i^2 + \mu\delta)}, \tag{9}$$
where μ and δ are positive scalars. The parameter μ controls how quickly the weight decreases from 1 to 0, and δ determines the location of the demarcation point. With the help of s, RRC assigns larger weights to inliers and smaller weights to outliers; that is, it is better able to distinguish inliers from outliers.
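To make the roles of (8) and (9) concrete, the sketch below computes the logistic feature weights from the current residual and performs one reweighted ridge update for the case p = 2; this is a simplified illustration of how the weights enter the problem, not the actual RRC solver, and the parameter values are placeholders.

```python
import numpy as np

def logistic_weights(e, mu, delta):
    """Feature weights of (9): large residuals get weights close to 0,
    small residuals get weights close to 1."""
    z = -mu * e ** 2 + mu * delta
    return np.exp(z) / (1.0 + np.exp(z))

def reweighted_l2_step(y, X, theta, mu, delta, lam=1e-2):
    """One illustrative pass for (8) with p = 2: recompute s from the
    current residual, then minimize ||s * (y - X theta)||_2^2 + lam*||theta||_2^2,
    which is a weighted ridge problem with per-feature weights s_i^2."""
    e = y - X @ theta                       # current per-feature residuals
    s = logistic_weights(e, mu, delta)      # adaptive feature weights from (9)
    W = np.diag(s ** 2)                     # weights act on the squared residuals
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ W @ y)
```

Iterating such updates alternately refines the weights and the coding vector, which is the general mechanism that (8) and (9) describe.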
III. ITERATIVE RE-CONSTRAINED GROUP SPARSE CLASSIFICATION
Fig. 1. The residual distributions of an occluded face image by competing methods. (a) Clean image. (b) Occluded query image. (c) and (d) Distributions of coding residuals in linear and log domains.

In addition to encompassing most popular regression-type classification algorithms, the criterion (3) can also serve as a general framework from which new classification algorithms can be derived. The works reviewed in Section II exploit only a limited set of these advantages in their models. For example, SRC merely emphasizes that strong sparsity brings strong discriminative power. RRC assigns different weights to different features according to their coding residuals, but without considering locality. Following the intuition that group sparsity, distance locality, and weighted features jointly encourage a better representation-type classifier, we incorporate all of these advantages into the objective function. Correspondingly, our preliminary objective is
$$\min_{\theta} \;\|s \odot (y - X\theta)\|_2^2 + \lambda \sum_{i=1}^{c} \eta_i \|\theta_i\|_2^p, \tag{10}$$
which consists of two parts: the first term is the reconstruction error with the feature weights s integrated, and the second term is the weighted $\ell_{2,p}$ mixed-norm regularizer on the coefficient θ, where p > 0.
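To make the structure of (10) explicit, the sketch below simply evaluates the objective for a given coding vector; the group index layout and parameter values are illustrative assumptions, and solving (10) is addressed in the remainder of this section.

```python
import numpy as np

def irgsc_objective(y, X, theta, s, eta, groups, lam=1e-2, p=1.0):
    """Value of (10): ||s * (y - X theta)||_2^2
       + lam * sum_i eta_i * ||theta_i||_2^p.

    s      : per-feature weights, one per entry of y
    eta    : per-class weights eta_i, one per group
    groups : list of index arrays giving the entries of theta in each class
    p      : exponent of the l_{2,p} mixed norm, p > 0
    """
    fidelity = np.sum((s * (y - X @ theta)) ** 2)             # weighted reconstruction error
    regularizer = sum(eta_i * np.linalg.norm(theta[g]) ** p   # weighted group penalty
                      for eta_i, g in zip(eta, groups))
    return fidelity + lam * regularizer
```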
A. Adaptive Feature Weights Learning
As indicated by [25], different choices of s lead to different classifiers. When s is constantly set to 1, the fidelity term corresponds to the $\ell_2$-norm fidelity as in CRC; when $s_i = 1/|e_i|$, it turns into the $\ell_1$-norm fidelity as in SRC; and when s is set to the logistic function of (9), the model becomes RRC. However, all of these weighting functions have fundamental limits. The $\ell_2$-norm fidelity treats all features equally, regardless of whether they are outliers or not. The $\ell_1$-norm fidelity assigns an infinite weight to a feature whose residual approaches zero, which makes the coding unstable. The logistic weight function has two tunable parameters that need to be set manually, which is time-consuming. Moreover, the essential relationship between the logistic function and the true weights has not been theoretically revealed.
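The limitations above can be seen numerically by comparing the three weighting schemes on a few residual values; the residuals and logistic parameters below are arbitrary and chosen only to expose the behaviour.

```python
import numpy as np

e = np.array([1e-6, 0.1, 1.0, 10.0])   # residuals from nearly zero to large (arbitrary)
mu, delta = 8.0, 1.0                   # illustrative logistic parameters

w_l2 = np.ones_like(e)                 # l2 fidelity: every feature weighted equally
w_l1 = 1.0 / np.abs(e)                 # 1/|e_i| weight: explodes as the residual -> 0
z = -mu * e ** 2 + mu * delta
w_log = np.exp(z) / (1.0 + np.exp(z))  # logistic weight of (9): bounded in (0, 1)

print(w_l1[0])   # ~1e6, the instability noted in the text
print(w_log)     # decreases smoothly from ~1 toward 0 as residuals grow
```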
In this paper, we expect s to be more flexible and adaptive to the query y, so that the classifier is more robust to mixed types of noise. We first use an example to illustrate the different residual distributions produced by different models. Fig. 1(a) shows a clean face sample from the AR database, while Fig. 1(b) presents a real disguised query face image y with a scarf. The distributions of e obtained with the Gaussian (SRC), Laplacian (RSRC), and logistic (RRC) models are plotted in Fig. 1(c), and Fig. 1(d) further shows the distributions in the log domain for a better view of the tails. It has been reported repeatedly that the empirical distribution of e has a strong peak at zero but with a long tail, which is mostly caused by