L. Liu, L. Zhao, Y. Long, G. Kuang, P. Fieguth
Image and Vision Computing 30 (2012) 86–99

Article history: received 19 April 2011; received in revised form 20 October 2011; accepted 4 January 2012.

Keywords: Texture classification; Local binary pattern (LBP); Bag-of-words (BoW); Rotation invariance

Abstract

This paper presents a novel approach for texture classification, generalizing the well-known local binary pattern (LBP) approach. In the proposed approach, two different and complementary types of features (pixel intensities and differences) are extracted from local patches. The intensity-based features consider the intensity of the central pixel (CI) and those of its neighbors (NI), while for the difference-based features two components are computed: the radial difference (RD) and the angular difference (AD). Inspired by the LBP approach, two intensity-based descriptors, CI-LBP and NI-LBP, and two difference-based descriptors, RD-LBP and AD-LBP, are developed. All four descriptors are in the same form as conventional LBP codes, so they can be readily combined to form joint histograms to represent textured images. The proposed approach is computationally very simple: it is totally training-free, there is no need to learn a texton dictionary, and no tuning of parameters is required. We have conducted extensive experiments on three challenging texture databases (Outex, CUReT and KTHTIPS2b). The Outex results show significant improvements over the classical LBP approach, which clearly demonstrates the great power of the joint distributions of the proposed descriptors for gray-scale and rotation invariant texture classification. The proposed method produces the best classification results on KTHTIPS2b, and results comparable to the state of the art on CUReT.
1. Introduction

Texture classification is a fundamental issue in computer vision and image processing, playing a significant role in a wide range of applications that include medical image analysis, remote sensing, object recognition, document analysis, environment modeling, content-based image retrieval, etc. [1]. For four decades texture analysis has been an area of intense research; however, analyzing real-world textures has proven to be surprisingly difficult, in many cases because of natural texture inhomogeneity under varying illumination, scale changes, and variability in surface shape.

Recently, the orderless Bag-of-Words (BoW) approach [5, 2, 3, 8], representing texture images statistically as histograms over a discrete texton dictionary, has proven extremely popular and successful in texture classification tasks. Robust and discriminative local texture descriptors and global statistical histogram characterization have supplied complementary components toward the BoW feature extraction of texture images. The former attempts to extract a collection of powerful and distinctive appearance descriptors from local patches, while the latter exploits the fact that texture images contain self-repeating patterns: the local feature vectors are vector-quantized (typically by k-means) to form a texton dictionary, and texture images are then represented statistically as compact histograms over the learned texton dictionary.

In this simple and efficient BoW framework, it is generally agreed that the local descriptors play a much more important role, and they have therefore received considerable attention [2–6, 8, 9]. The approaches can be grouped into sparse and dense types, with the sparse approach using appearance descriptors at a sparse set of detected interest points. Notable sparse descriptors include SPIN, SIFT and RIFT [8, 10]. In contrast, dense approaches use appearance descriptors pixel by pixel [2–5, 9]. The sparse approach largely relies on the sparse output of local interest region detectors, which might miss important texture primitives and fail to provide enough regions for a robust statistical characterization of the texture.

Among the most popular dense descriptors are the various filter banks, such as Gabor filters [11], the filter bank of Schmid [5], the filter bank of Leung and Malik [5], the MR8 filter bank [2], the filter bank of Crosier and Griffin [9], and many others [12]. The design of a filter bank is nontrivial and likely to be application dependent. Although enormous efforts have been invested along this direction, the supremacy of filter bank-based descriptors for texture analysis has been challenged by several authors [3, 4, 7], who have demonstrated that directly using the intensities or differences in a small local patch can produce classification performance superior or comparable to filter banks with large spatial support. In [7], the authors propose sparse modeling of local texture patches; however, the sparse texton learning and sparse coding process is computationally expensive. Two particularly important works along these lines are the VZ-Joint classifier [3] and the LBP method [4].
The simple, elegant and efficient local texture descriptor LBP may be preferable to the VZ-Joint classifier, since LBP uses a pre-defined texton dictionary and does not need a nearest-neighbor search to obtain the texton labels, a time-consuming step.

Due to its impressive computational efficiency and good texture discriminative property, the dense LBP descriptor [4] has gained considerable attention since its publication [13], and has already been used in many other applications, including visual inspection, image retrieval, dynamic texture recognition, remote sensing, biomedical image analysis, face image analysis, motion analysis, environment modeling, and outdoor scene analysis [14–16, 18, 19, 34].¹ Despite the great success of LBP in computer vision and pattern recognition, the conventional LBP operator comes with disadvantages and limitations:

1. The LBP operator produces long histograms which are sensitive to image rotation.
2. LBP has small spatial support; in its basic form, the LBP operator cannot properly detect large-scale textural structures.
3. LBP loses local textural information, since only the signs of the differences of neighboring pixels are utilized.
4. LBP is very sensitive to noise: the slightest fluctuation above or below the value of the central pixel is treated as equivalent to a major contrast between the central pixel and its surroundings.

On the basis of the above issues, researchers have proposed a variety of LBP variants. In terms of locality, the authors in [20] propose to extract global features from Gabor filter responses as a complementary descriptor. In order to recover the loss of information created by computing the LBP value, local image contrast was introduced by Ojala et al. [4] as a complementary measure, with better performance reported therein. Moreover, Guo et al. [21] propose to include the information contained in the magnitudes of the local differences as complementary to the signs used by LBP, and claim better performance. Regarding LBP robustness, especially to noise, the influential work by Ojala et al. [4] extends basic LBP to a multiresolution context, and rotation invariant patterns are introduced and successfully used to reduce the dimension of the LBP histogram and to enhance robustness and speed. Ahonen et al. introduce soft histograms [28], and Tan and Triggs [29] introduce local ternary patterns (LTP), using ternary numbers instead of binary. Noting that uniform LBPs do not necessarily occupy the major pattern proportions, Liao et al. [20] propose dominant LBP (DLBP), which considers the most frequently occurring patterns in a texture image. Very recently, Heikkilä et al. [22] exploit circular symmetric LBP (CS-LBP) for local interest region description, and Chen et al. present the WLD descriptor, which includes orientation information for robustness [23].

The LBP approach is based on the assumption that the local differences between the central pixel and its neighbors are independent of the central pixel itself. In practice, however, exact independence is not warranted: the superiority of both the VZ-Joint and VZ-MRF classifiers over LBP clearly demonstrates the benefits of explicitly including the information contained in the central pixel [3].

The fundamental questions raised here are whether explicitly modeling the joint distribution of the central pixel and its neighbors is an advantage, and how to effectively include the missing between-scale information so that better texture classification can be achieved. Motivated by the work of Varma and Zisserman [3] and the LBP approach of Ojala et al. [4], in this paper we propose a simple, yet very powerful and novel local texture descriptor to generalize the conventional LBP approach. In the proposed approach, two different but complementary types of features in a local patch, the pixel intensities and the pixel differences, are utilized via a common concept, the LBP coding strategy. The pixel intensities are divided into two components: the intensity of the central pixel and the intensities of its neighboring pixels. For the pixel differences, we study radial and angular differences.

All four descriptors (two intensity based, two difference based) are in the same form as the conventional LBP codes, thus they can be readily combined to form a joint histogram. The fusing of these descriptors will be shown to lead to significantly improved classification results on the experimental protocols designed for verifying the performance of the LBP approach in [4]. The key to our proposed approach is that it combines the advantages of VZ-Joint/VZ-MRF, whose strong performance comes from modeling a joint distribution, with those of LBP in computational efficiency.

The paper is organized as follows: we start with a brief review of the classical LBP approach in Section 2, followed by details of the derivation of the proposed descriptors and the classification scheme. In Section 3, we verify the proposed approach with extensive experiments on popular texture datasets and comparisons with various state-of-the-art texture classification techniques. Section 4 provides concluding remarks and possible extensions of the proposed method. A short, preliminary version of this work appeared in [26].

¹ A bibliography of LBP-related research can be found at http://www.cse.oulu.fi/MVG/LBP_Bibliography/.

2. Proposed descriptors

This section begins by reviewing conventional LBP, followed by the new descriptors designed to address the limitations of LBP. Finally, the multiresolution analysis and classification scheme of this work are presented.

2.1. A brief review of LBP

The LBP method, first proposed by Ojala et al. [25, 4], encodes the pixel-wise information in textured images. Images are probed locally by sampling grayscale values at a central point x_{0,0} and at p points x_{r,0}, …, x_{r,p-1} spaced equidistantly around a circle of radius r (the choice of which acts as a surrogate for controlling the scale of description), as shown in Fig. 1. In LBP, a "local pattern" operator describes the relationships between a pixel and its neighborhood pixels: all neighbors with values higher than or equal to the value of the central pixel are given the value 1, and all those lower the value 0. The binary values associated with the neighbors are then read sequentially, clockwise, to form a binary number which may be used to characterize the local texture. Formally,

    LBP_{p,r} = \sum_{n=0}^{p-1} s(x_{r,n} - x_{0,0}) \, 2^n, \quad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}    (1)

Fig. 1. A central pixel x_{0,0} and its p circularly and evenly spaced neighbors {x_{r,i}}_{i=0}^{p-1} on a circle of radius r.
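To make the coding concrete, the following Python/NumPy sketch computes the LBP code of Eq. (1) for a single interior pixel. It is our illustration rather than the authors' implementation: the bilinear interpolation of off-grid circle points and the function names (circular_neighbors, lbp_code) are our own assumptions.

    import numpy as np

    def circular_neighbors(img, row, col, p, r):
        # Sample p gray values evenly spaced on a circle of radius r around
        # (row, col); off-grid points are estimated by bilinear interpolation.
        # Assumes (row, col) lies far enough from the image border.
        angles = 2.0 * np.pi * np.arange(p) / p
        ys = row - r * np.sin(angles)          # image rows grow downward
        xs = col + r * np.cos(angles)
        y0 = np.floor(ys).astype(int)
        x0 = np.floor(xs).astype(int)
        dy, dx = ys - y0, xs - x0
        return ((1 - dy) * (1 - dx) * img[y0, x0]
                + (1 - dy) * dx * img[y0, x0 + 1]
                + dy * (1 - dx) * img[y0 + 1, x0]
                + dy * dx * img[y0 + 1, x0 + 1])

    def lbp_code(img, row, col, p=8, r=1):
        # Eq. (1): threshold the p neighbors at the central gray value and
        # read the resulting bit string as a binary number.
        x = circular_neighbors(img, row, col, p, r)
        bits = (x - img[row, col] >= 0).astype(np.uint64)
        return int(np.sum(bits << np.arange(p, dtype=np.uint64)))

Histogramming lbp_code over all pixels of an image yields the LBP texture model discussed in the remainder of this section.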
In conventional LBP the central pixel is discarded (despite the implicit use of the intensity of the central pixel as the threshold to achieve local gray-scale invariance), and only the joint distribution of the neighborhood around each pixel is considered. However, in their recent extensive texture study, Zhang et al. [8] suggested that it is vital to use a combination of several detectors and descriptors. Motivated by the work of Lazebnik et al. [10] and Zhang et al. [8], in this paper we seek to propose a method which possesses the strengths of combining complementary local features, together with the computational efficiency and smaller support regions of LBP.

2.2. Intensity-based descriptors

The brightness level at a point in an image is highly dependent on the brightness levels of its neighboring points unless the image is simply random noise [24]. In MRF modeling [24], the probability of the central pixel depends only on its neighborhood:

    p(I(x_c) \mid I(x), \forall x \ne x_c) = p(I(x_c) \mid I(x), \forall x \in \mathcal{N}(x_c))    (5)

where x_c is a site in the 2D integer lattice on which the image I has been defined and \mathcal{N}(x_c) is the neighborhood of that site. The central pixel thus also carries discriminant information; however, its distribution is conditioned on its neighbors alone.

Inspired by such MRF models, and related to the ideas explored by Varma and Zisserman [3], we propose to use only local neighborhood distributions in our NI-LBP descriptor. We explicitly model the joint distribution of the central pixel and its neighbors, in order to test how significant this conditional probability distribution is for classification.

Next, inspired by the coding strategy of LBP, we define the following NI-LBP descriptor (see also Fig. 2):

    NI\text{-}LBP_{p,r} = \sum_{n=0}^{p-1} s(x_{r,n} - \mu) \, 2^n, \quad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}    (6)

where \mu = \frac{1}{p} \sum_{n=0}^{p-1} x_{r,n}. Similar to LBP^{riu2}_{p,r}, the rotation invariant version of NI-LBP, denoted NI-LBP^{riu2}_{p,r}, can also be defined to achieve rotation invariant classification.

Regarding the selection of the threshold μ: although it was motivated by intuition and experimental studies, it is also selected in order to preserve LBP characteristics and to increase robustness. Hafiane et al. [17] proposed the Median Binary Pattern (MBP), which derives a localized binary pattern by thresholding the pixels against their median value over a 3 × 3 neighborhood. In MBP the central pixel is also included in this filtering process, resulting in 2^9 binary patterns.
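A minimal sketch of Eq. (6) follows, together with the standard rotation-invariant uniform ("riu2") mapping of Ojala et al. [4] referred to above. The neighbor samples are assumed to come from a circular sampler such as the one sketched after Eq. (1); the function names are ours.

    import numpy as np

    def ni_lbp_code(neighbors):
        # Eq. (6): threshold the p circular neighbors at their own mean mu,
        # rather than at the central pixel as in conventional LBP.
        p = len(neighbors)
        bits = (neighbors - neighbors.mean() >= 0).astype(np.uint64)
        return int(np.sum(bits << np.arange(p, dtype=np.uint64)))

    def riu2(code, p):
        # Rotation-invariant uniform mapping [4]: a code with at most two
        # circular 0/1 transitions maps to its number of 1 bits; all other
        # ("non-uniform") codes share the single label p + 1.
        bits = [(code >> i) & 1 for i in range(p)]
        transitions = sum(bits[i] != bits[(i + 1) % p] for i in range(p))
        return sum(bits) if transitions <= 2 else p + 1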
Table 1
Comparison of the detailed mean classification accuracy (%) for NI-LBP, LBP and MBP on test suite Outex_TC_00000, obtained as the average over the 100 test groups. The patch size is 3 × 3, the 1NN classifier is used, and the distance measure is χ². Each image is normalized to have zero mean and unit standard deviation. "NI-LBP (512)" encodes the binary value of the center pixel, similar to "MBP (512)"; "MBP (256)" excludes the binary value of the center pixel. Since LBP uses the value of the center pixel as the threshold, it is unnecessary to include the center pixel in that case. The numbers in brackets denote the number of histogram bins.

    Class       NI-LBP (512)  NI-LBP (256)  LBP     MBP (512)  MBP (256)
    canvas001   100.0         100.0         100.0   100.0      100.0
    canvas002   100.0         100.0         100.0   100.0      100.0
    canvas003   100.0         100.0         100.0   100.0      100.0
    canvas005   100.0         100.0         100.0   100.0      100.0
    canvas006   100.0         100.0         100.0   100.0      100.0
    canvas009   100.0         100.0         100.0   100.0      100.0
    canvas011   100.0         100.0         100.0   100.0      100.0
    canvas021   100.0         100.0         100.0   100.0      100.0
    canvas022   100.0         100.0         100.0   100.0      100.0
    canvas023   99.6          99.4          99.6    99.8       99.8
    canvas025   100.0         100.0         100.0   100.0      100.0
    canvas026   100.0         100.0         100.0   100.0      100.0
    canvas031   100.0         100.0         100.0   100.0      100.0
    canvas032   100.0         100.0         100.0   100.0      100.0
    canvas033   95.5          96.0          92.0    94.4       92.5
    canvas035   100.0         100.0         100.0   100.0      100.0
    canvas038   95.5          98.2          100.0   99.7       99.6
    canvas039   99.8          99.5          100.0   99.6       99.8
    tile005     100.0         100.0         100.0   100.0      100.0
    tile006     100.0         99.6          100.0   99.8       99.7
    carpet002   100.0         100.0         100.0   100.0      100.0
    carpet004   100.0         100.0         100.0   100.0      100.0
    carpet005   100.0         99.6          100.0   99.4       96.5
    carpet009   99.9          99.9          99.3    95.2       94.4
    Mean        99.76         99.68         99.62   99.50      99.26

Fig. 4. Three different original texture patterns (a, b, c) and their corresponding LBPs (a1, b1, c1), NI-LBPs (a2, b2, c2), MBPs (a3, b3, c3), and VAR values. All three LBP patterns (a1, b1, c1) are the same: patterns (a) and (b) would be considered the same pattern type by LBP, though the corresponding textural surfaces might be quite different from each other. By incorporating LBP with local variance information, patterns (a) and (b) could be distinguished, while patterns (b) and (c) would still be considered the same pattern type because of their identical variance; yet they differ in configuration, due not to rotation but to underlying textural properties. MBP can distinguish (a) and (b), but cannot distinguish (b) and (c). In contrast, all three NI-LBPs are different, so all three patterns can be distinguished by our proposed NI-LBP.

NI-LBP, LBP and MBP differ in the selection of the thresholding value, so their capabilities for encoding image configuration and pixelwise relationships might differ as well. For illustration, Fig. 4 gives three example local texture patterns. The patterns shown in Fig. 4(a) and (b) would be classified by LBP into the same class, yet the textural surfaces they represent are quite different from each other, which means they probably belong to different classes. The other three descriptors, NI-LBP, MBP and VAR, can all tell the difference between (a) and (b). This is why Ojala et al. use the combination of LBP and VAR. However, the joint histogram of LBP and VAR cannot fully solve the problem.
The classification might be misled without considering the relationships within the local neighborhood. Taking the patterns in Fig. 4(b) and (c) for example: they would be considered the same pattern type according to LBP and VAR, yet they are actually two patterns with different textural properties. Moreover, MBP also fails to distinguish patterns (b) and (c). Clearly, our proposed NI-LBP can distinguish all three different patterns, as shown in Fig. 4(a2), (b2) and (c2); the proposed NI-LBP approach is therefore more discriminative and effective.

In order to make further comparisons, we conducted texture classification on test suite Outex_TC_00000, which was used in [17]. The results are listed in Table 1. For test suite Outex_TC_00000, there are […]

Fig. 5. Comparison of the robustness to additive Gaussian noise of different signal-to-noise ratios (SNR) for the proposed NI-LBP and the conventional LBP on the Outex textures: (a) NI-LBP^{riu2}_{8,1} vs. LBP^{riu2}_{8,1}; (b) NI-LBP_{8,1} vs. LBP_{8,1}. We have used all the original texture images present in the Outex_TC_00010 training set (20 samples of illuminant "inca" and angle 0° in each of the 24 texture classes, totaling 480 images). Training is done with all 480 noise-free images and testing is done with the same images with added Gaussian noise at different SNRs. The nearest neighbor classifier is used for classification.

The proposed NI-LBP descriptor has the following advantages:

1. Thresholding at μ is equivalent to making the local neighborhood vector zero-mean, and is therefore resistant to local lighting effects and specifically invariant to gray-scale changes.
2. Compared with LBP, weak edges are preserved by NI-LBP, as illustrated in Fig. 3. We can clearly observe that LBP does not match the visual patterns, producing output unrelated to the peak in (a) or the edge in (b); in contrast, the proposed NI-LBP outputs more consistent patterns, owing to the better thresholding at μ.
3. Better noise robustness, as shown in Fig. 5.

Recall that the local contrast measure proposed by Ojala et al. [4] is defined as follows:

    VAR_{p,r} = \frac{1}{p} \sum_{n=0}^{p-1} (x_{r,n} - \mu)^2, \quad \text{where } \mu = \frac{1}{p} \sum_{n=0}^{p-1} x_{r,n}.    (7)

We can see that NI-LBP_{p,r} and VAR_{p,r} capture similar types of texture information, with slight differences:

1. VAR_{p,r} achieves rotation invariance by summing up the whole variation in the circular neighborhood, whereas NI-LBP_{p,r} is rotation sensitive by default;
2. NI-LBP_{p,r} is independent of gray scale, whereas VAR_{p,r} is not;
3. Finally, VAR_{p,r} is continuous-valued and needs to be quantized.

The latter quantization step has the associated limitations of additional training to determine threshold values, and the difficulty of setting the number of bins: too few bins will fail to provide enough discriminative information, while too many bins would make the feature size too large.
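For contrast with NI-LBP, here is a short sketch (ours) of the VAR measure of Eq. (7) and of the bin quantization it requires; the cut points must be learned from training data, which is exactly the limitation discussed above.

    import numpy as np

    def var_measure(neighbors):
        # Eq. (7): variance of the p circular neighbors about their mean.
        mu = neighbors.mean()
        return np.mean((neighbors - mu) ** 2)

    def quantize_var(values, cut_points):
        # VAR is continuous-valued, so it must be quantized into bins before
        # histogramming; cut_points are thresholds learned from training data [4].
        return np.digitize(values, cut_points)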
Fig. 6. Comparing the proportions (%) of "uniform" patterns among all patterns for each texture class in Outex for three methods: LBP, RD-LBP and AD-LBP, with P = 16, R = 2.
Although there are some rules to guide the selection [4], it is hard to obtain an optimal number of bins in terms of accuracy and feature size. On the basis of the above discussion, we expect the proposed NI-LBP_{p,r} to be a better choice than VAR_{p,r}.

To make it consistent with the binary coding strategy, the 1D distribution of the central pixels' intensity is represented by two bins, i.e.,

    CI\text{-}LBP = s(x_{0,0} - \mu_I), \quad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}    (8)

where \mu_I is the mean of the whole image.

2.3. Difference-based descriptors

As a parallel development to the intensity descriptors just developed, we also propose pixel differences in the radial and angular directions on a circular grid, different from the traditional pixel differences computed in the horizontal and vertical directions. More specifically, we propose two different descriptors, the Radial Difference Local Binary Pattern and the Angular Difference Local Binary Pattern (denoted RD-LBP and AD-LBP respectively, as illustrated in Fig. 2). We define the RD-LBP descriptor as follows:

    RD\text{-}LBP_{p,r,\delta} = \sum_{n=0}^{p-1} s(\Delta^{Rad}_{\delta,n}) \, 2^n, \quad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}    (9)

where \Delta^{Rad}_{\delta,n} = x_{r,n} - x_{r-\delta,n} is the radial difference computed with a given integer radial displacement δ, and x_{r,n} and x_{r-\delta,n} are the gray values of a pair of pixels, δ radial steps apart, in the same radial direction.

Similarly, the AD-LBP descriptor is defined as

    AD\text{-}LBP_{p,r,\delta,\varepsilon} = \sum_{n=0}^{p-1} s(\Delta^{Ang}_{\delta,n}) \, 2^n, \quad s(x) = \begin{cases} 1, & x \ge \varepsilon \\ 0, & x < \varepsilon \end{cases}    (10)

where \Delta^{Ang}_{\delta,n} = x_{r,n} - x_{r,\mathrm{mod}(n+\delta,p)} is the angular difference computed with a given angular displacement δ(2π/p), δ being an integer such that 1 ≤ δ ≤ p/2; x_{r,n} and x_{r,\mathrm{mod}(n+\delta,p)} are the gray values of a pair of pixels δ steps apart on the circle of radius r; mod(x, y) is the modulus of x with respect to y; and ε is a threshold value […]
Table 2
Summary of the texture datasets used in our experiments.

Experiment #1 columns: texture dataset; texture classes; samples per class; sample size; test suite; training or testing; angles used; illuminants; samples in total.
Experiment #2 columns: texture dataset; dataset notation; image rotation; controlled illumination; scale variation; texture classes; sample size; samples per class; samples in total.

Fig. 8. 128 × 128 samples of the textures from Brodatz used in Experiment #1.

Fig. 9. 128 × 128 samples of the 24 textures from Outex used in Experiment #1.
Table 4
Classification accuracies (%) on Contrib_TC_00001, where training is done at just one rotation angle; each of the ten data columns corresponds to one training angle, and the last column gives the average accuracy over the 10 angles. The results for LBP, VAR, and LBP/VAR are quoted directly from the original paper by Ojala et al. [4].

    Method     (p, r)                 Bins   Accuracy for each of the ten training angles                Average
    LBP        (16, 2)                18     96.2  99.0  98.6  98.9  98.5  99.1  97.6  98.6  98.7  97.5  98.3
    VAR        (16, 2)                128    89.9  84.5  86.2  90.5  87.3  85.6  91.0  89.8  90.8  88.5  88.4
    LBP/VAR    (8,1)+(16,2)+(24,3)    864    100   99.7  99.5  99.8  99.6  99.7  99.8  99.6  99.8  99.9  99.7
    NI         (8, 1)                 10     65.4  85.5  81.3  76.6  77.0  78.4  68.8  81.4  75.8  76.5  76.7
    NI         (16, 2)                18     87.6  95.2  92.3  93.6  89.4  96.0  88.9  91.3  93.4  90.1  91.8
    NI         (24, 3)                26     96.2  93.4  97.6  96.6  98.3  96.7  97.1  96.7  92.6  98.2  96.4
    RD         (8, 1)                 10     68.8  86.4  84.4  76.0  84.9  84.4  70.2  84.1  76.1  84.7  80.0
    RD         (16, 2)                18     89.2  92.9  96.7  97.8  96.1  92.6  88.4  94.7  96.7  97.3  94.3
    RD         (24, 3)                26     87.6  90.6  98.2  90.8  96.5  93.8  89.5  98.6  89.5  94.2  92.9
    RD/CI      (8, 1)                 20     87.1  84.7  94.3  88.6  95.9  95.1  85.8  94.8  90.3  95.0  92.2
    RD/CI      (16, 2)                36     92.7  94.6  96.8  97.3  98.4  95.6  91.8  99.4  96.7  98.6  96.2
    RD/CI      (24, 3)                52     96.9  95.8  95.6  92.8  96.5  94.3  96.9  99.1  95.3  95.9  95.9
    NI/CI      (8, 1)                 20     74.8  90.4  86.4  80.3  82.5  85.2  74.4  86.2  80.6  82.2  82.2
    NI/CI      (16, 2)                36     95.6  99.2  98.8  98.0  98.2  99.4  93.8  98.3  96.9  97.4  97.6
    NI/CI      (24, 3)                52     99.1  98.7  99.4  99.4  100   100   99.7  97.5  97.3  99.1  99.1
    NI/RD      (8, 1)                 100    70.2  88.9  87.0  80.0  85.2  85.5  71.9  87.1  81.6  84.9  82.2
    NI/RD      (16, 2)                324    100   100   100   100   100   100   100   100   100   100   100
    NI/RD      (24, 3)                676    98.2  100   100   100   100   100   99.6  99.9  99.9  100   99.8
    NI/RD/CI   (8, 1)                 200    78.1  94.5  92.2  91.1  93.0  92.0  76.2  92.4  91.8  92.6  89.4
    NI/RD/CI   (16, 2)                648    100   100   100   100   100   100   100   100   100   100   100
    NI/RD/CI   (24, 3)                1352   98.8  100   100   100   100   100   99.8  100   99.8  100   99.8
Table 5
Classification accuracies (%) for the three Outex test suites, where training was done at angle 0° and testing at the remaining 9 angles. The mean accuracy is the average over the three test suites. The results for LBP, VAR, and LBP/VAR are quoted directly from the original paper by Ojala et al. [4]. The bold numbers indicate the highest classification score achieved on each dataset.
3. Experiments

We evaluate the proposed approach on four available texture databases, namely the Brodatz [31], Outex [32], CUReT [3], and KTHTIPS2b [27, 30] databases. The presentation of the experimental results is divided into two groups with corresponding objectives.

Experiment #1, presented in Section 3.2, investigates the proposed approach for gray-scale and rotation invariant texture classification, comparing our proposed descriptors with the classical LBP and VAR descriptors proposed by Ojala et al. [4] and with other LBP-based approaches [21, 33, 20]. This setup utilizes the same texture test suites and experimental setup as those used by Ojala et al. [4] (except that additional training angles were tested for Outex_TC_00010 and Outex_TC_00012).

Experiment #2, presented in Section 3.3, examines the classification performance of the proposed approach on two more realistic and challenging texture classification tasks:

1. Material classification dealing with exemplar identification, where instances are imaged from single images obtained under unknown viewpoint and illumination, using the popular CUReT database [3].
2. Material categorization, where each material consists of instances imaged from multiple different physical samples under different viewpoints, illuminations and imaging distances, using the material database KTHTIPS2b [27, 30].

In both cases, comparisons are made with state-of-the-art methods that have reported results on these datasets.

3.1. Methods tested

LBP and VAR [4]: three descriptors, the joint LBP^{riu2}_{p,r}/VAR_{p,r}, LBP^{riu2}_{p,r}, and VAR_{p,r}, are used in the comparison. We follow the experimental setup in [4] for these three descriptors; see [4] for details. VAR needs pretraining.

DLBP and NGF [20]: DLBP is an LBP variant extracting the dominant LBP patterns in a texture image for classification. It is suggested in [20] that DLBP in combination with another, complementary Gabor-based descriptor, NGF, which captures global texture information, can yield improved and robust classification results. This method also needs pretraining.

CLBP [21]: a local texture patch is represented by its center pixel and the signs and magnitudes of the differences of the neighborhood against the center pixel. CLBP is training free.

The following three state-of-the-art approaches all need a time-consuming universal texton dictionary learning stage:

VZ-MR8 [2, 3]: eight filter responses derived from the responses of 38 filters with large spatial support. A complicated anisotropic Gaussian filtering method is used to calculate the MR8 responses, a texton dictionary is learned from the MR8 feature space, and a histogram model is learned for an image by labeling each of the image pixels with the texton that lies closest to it in filter response space.

VZ-Joint [3]: identical to VZ-MR8 except for the local descriptor used; instead of a dense filter bank descriptor, the raw pixel intensities of an N × N square neighborhood around each point are taken as features.

VZ-MRF [3]: a texture image is represented using a two-dimensional histogram: one dimension for the quantized bins of the patch center pixel, the other for the textons learned from the patch with the center pixel excluded. The number of bins for the center pixel in [3] is as large as 200, and the size of the texton dictionary is 61 × 40 = 2440, resulting in an extremely high dimensionality of 2440 × 200 = 488,000.

Implementation details: to make the comparisons as meaningful as possible, we keep our experimental settings as in [4]. The descriptor abbreviations are summarized in Table 3. In all experiments, each texture sample is normalized to have zero mean and unit standard deviation. Results for the CUReT database are reported over 100 random partitions of training and testing sets. The 1NN classifier is used for classification.
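The classification stage described here (nearest-neighbor matching of normalized histograms) can be sketched as below; the χ² distance matches the measure named in the caption of Table 1, and all identifiers are our own.

    import numpy as np

    def chi_square(h1, h2, eps=1e-10):
        # Chi-square distance between two normalized histograms.
        return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

    def classify_1nn(test_hist, model_hists, model_labels):
        # 1NN: assign the label of the closest model histogram.
        dists = [chi_square(test_hist, m) for m in model_hists]
        return model_labels[int(np.argmin(dists))]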
Table 6
The number of misclassified samples for each texture class and rotation angle for NI-LBP^{riu2}_{16,2}/RD-LBP^{riu2}_{16,2}/CI-LBP on test suites Outex_TC_00010, Outex_TC_00012 "tl84" and Outex_TC_00012 "horizon". Each cell gives three counts in the order Outex_TC_00010 / "tl84" / "horizon"; a dot (•) denotes zero, and a dash marks a value not recovered. Only texture classes with misclassified samples are shown; all other texture classes are classified entirely correctly. This table can be compared with Table 5 of Ojala et al. [4], where, however, only results for Outex_TC_00010 are shown.

    Texture     0°      5°      10°     15°     30°     45°     60°      75°      90°      Total      All
    canvas001   •/•/•   •/•/•   •/•/•   •/•/•   •/1/•   •/•/•   •/1/•    •/1/•    •/•/•    •/3/•      3
    canvas033   •/•/•   •/2/•   1/3/2   1/3/3   •/1/2   1/3/2   •/5/4    3/6/5    3/8/4    9/31/22    62
    canvas038   •/•/1   •/•/1   •/•/2   •/•/1   •/•/2   •/•/2   •/•/5    •/2/4    1/4/8    1/6/26     33
    tile005     •/4/1   •/5/•   •/3/1   •/5/•   •/2/•   •/2/•   •/1/•    •/•/•    •/1/•    •/23/2     25
    tile006     •/•/3   •/1/3   3/2/2   •/4/2   2/8/2   4/5/3   4/4/2    4/3/5    4/6/–    19/22/38   79
    carpet002   •/•/•   •/•/•   •/•/•   •/•/•   •/1/•   •/1/•   •/•/•    •/1/•    •/1/•    •/4/•      4
    total       0/4/5   0/8/4   4/9/7   3/8/8   2/6/12  3/9/9   3/10/13  5/12/12  9/18/18  29/89/88   206
Fig. 10. Some example texture samples from tile005 (top row) and tile006 (bottom row). We can see that they look fairly similar.

3.2. Experiment #1

3.2.1. Image data and experimental setup

Contrib_TC_00001: this test suite consists of 16 texture classes from the Brodatz database [31] (a few shown in Fig. 8), and was designed for rotation invariant texture classification.² There are eight samples of size 180 × 180 in each class, of which the first sample is utilized for training and the other seven for testing. Given ten rotation angles, the classifier is trained with samples artificially rotated to just one angle and tested against samples rotated to the other nine angles. In each experiment, the classifier was trained with 16 images and tested with 1008 (16 × 7 × 9) samples, 63 in each of the 16 texture classes. Following [4], each training sample is split into 121 disjoint 16 × 16 subsamples, whose histograms are then merged into one model histogram. We point out that the seven testing images in each texture class are physically different from the one designated training image.

Outex_TC_00010: 24 Outex texture classes (shown in Fig. 9), with each class having 20 samples. It was created by Ojala et al. [4], again for rotation invariant texture classification. All textures in this test suite have the same illuminant, "inca". The training and testing scheme is the same as that for Contrib_TC_00001 but with nine different rotation angles. All of the 480 (24 × 20) samples rotated by one angle are adopted as the training data, and the testing data consists of all 480 samples rotated by the other 8 angles. Hence, there are 480 models for training and 3840 (480 × 8) samples for validation.

Outex_TC_00012: created by Ojala et al. [4] for rotation and illumination invariant texture classification. The texture classes are the same as in Outex_TC_00010. The classifier was trained with the same training samples as for Outex_TC_00010, but tested with all samples captured at all 9 rotation angles under the different illuminants "tl84" or "horizon". Due to the varying illuminants, some texture samples have a large tactile dimension which induces significant local gray-scale distortions; Outex_TC_00012 is therefore more challenging than Outex_TC_00010.

² http://www.ee.oulu.fi/mvg/page/image_data
Table 7
Classification accuracies (%) of the descriptor NI/RD/CI on Outex_TC_00010 and Outex_TC_00012, where training is done at just one rotation angle; the last column gives the average accuracy over the 9 training angles.

    Test suite (illuminant)      (p, r)                  0°    5°    10°   15°   30°   45°   60°   75°   90°    Average
    Outex_TC_00012 ("tl84")      (8, 1)                  90.9  91.6  92.1  93.0  91.3  90.8  88.9  89.0  84.3   90.2
                                 (16, 2)                 98.0  98.3  99.1  98.6  98.4  98.6  98.6  97.7  96.8   98.3
                                 (24, 3)                 97.3  98.3  98.5  98.7  97.2  96.4  93.4  94.2  94.1   96.5
                                 (8, 1)+(16, 2)          97.4  98.0  98.4  98.5  98.3  98.3  97.8  97.1  95.6   97.7
                                 (8, 1)+(24, 3)          97.7  93.3  98.7  98.7  98.5  97.9  96.4  96.6  96.4   97.7
                                 (16, 2)+(24, 3)         98.3  99.0  99.3  99.2  98.9  98.9  98.3  98.1  98.1   98.7
                                 (8, 1)+(16, 2)+(24, 3)  98.5  98.9  99.1  99.1  99.0  98.9  98.4  98.2  98.1   98.7
    Outex_TC_00012 ("horizon")   (8, 1)                  92.7  92.8  93.3  93.6  92.7  91.6  90.3  91.1  86.6   91.6
                                 (16, 2)                 98.0  98.0  98.3  98.4  97.7  97.9  98.2  98.3  98.1   98.1
                                 (24, 3)                 96.2  97.0  97.0  97.3  95.5  95.1  92.7  93.7  94.1   95.4
                                 (8, 1)+(16, 2)          98.2  97.8  98.3  97.9  97.1  97.8  98.2  97.8  97.0   97.8
                                 (8, 1)+(24, 3)          97.8  97.5  97.7  97.7  96.2  96.1  95.1  95.2  95.1   96.3
                                 (16, 2)+(24, 3)         97.8  98.3  98.2  98.3  97.3  97.5  96.9  97.0  97.7   97.7
                                 (8, 1)+(16, 2)+(24, 3)  97.8  98.4  98.4  98.2  97.4  97.7  97.5  97.1  97.6   97.8
    Outex_TC_00010 ("inca")      (8, 1)                  96.5  96.3  97.4  97.6  96.2  95.3  92.7  94.9  91.8   95.4
                                 (16, 2)                 99.3  99.4  99.5  99.7  99.6  99.6  99.5  99.0  99.0   99.4
                                 (24, 3)                 99.2  99.5  99.4  99.5  99.5  99.5  99.2  99.3  99.1   99.4
                                 (8, 1)+(16, 2)          99.4  99.4  99.6  99.6  99.5  99.4  99.4  99.0  98.6   99.3
                                 (8, 1)+(24, 3)          99.3  99.5  99.5  99.5  99.6  99.6  99.7  99.4  99.2   99.5
                                 (16, 2)+(24, 3)         99.6  99.7  99.8  99.7  99.7  99.9  99.8  99.7  99.5   99.7
                                 (8, 1)+(16, 2)+(24, 3)  99.7  99.7  99.7  99.6  99.6  99.8  99.9  99.7  99.4   99.7

The bold numbers indicate the highest classification score achieved on each dataset.
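As we read the joint NI/RD/CI descriptor evaluated in Table 7, the riu2-mapped NI and RD codes (values 0 … p+1) and the binary CI bit of each pixel index a three-dimensional histogram of (p+2) × (p+2) × 2 bins, matching the bin counts listed in Table 4; histograms from several (p, r) resolutions are then combined, sketched here as concatenation. The names and the normalization are our assumptions.

    import numpy as np

    def joint_ni_rd_ci_histogram(ni_codes, rd_codes, ci_bits, p):
        # Accumulate per-pixel riu2 NI and RD codes (0..p+1) and CI bits (0/1)
        # into a joint histogram of (p+2)*(p+2)*2 bins, then normalize it.
        n_labels = p + 2
        hist = np.zeros((n_labels, n_labels, 2))
        for ni, rd, ci in zip(ni_codes, rd_codes, ci_bits):
            hist[ni, rd, ci] += 1
        hist = hist.ravel()
        return hist / hist.sum()

    def multiresolution_model(hists):
        # Combine joint histograms from several (p, r) resolutions,
        # e.g. (8,1) + (16,2) + (24,3), into a single feature vector
        # (one common way of fusing resolutions; assumed, not stated here).
        return np.concatenate(hists)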
Fig. 11. Comparing the best classification scores of our approach with various state-of-the-art methods on all three test suites. All results are as originally reported, except those of VZ-MR8 and VZ-Joint, which were obtained by us using exactly the same experimental setup as Varma and Zisserman [2, 3]. For VZ-MR8 and VZ-Joint, 40 textons per class are used to build the universal texton dictionary.
3.2.2. Experimental results on Contrib_TC_00001

Ojala et al. [4] reported a near-perfect classification accuracy of 99.7% for the joint descriptor LBP/VAR when using two spatial resolutions, (8,1)+(24,3), or three spatial resolutions, (8,1)+(16,2)+(24,3). Table 4 presents the results for our proposed descriptors, compared with these state-of-the-art methods [4].

The individual descriptors NI-LBP and RD-LBP perform similarly, with NI-LBP doing slightly better. NI-LBP^{riu2}_{16,2} and NI-LBP^{riu2}_{24,3} significantly outperformed their simpler counterpart NI-LBP^{riu2}_{8,1}; this is also the case with RD-LBP. Interestingly, the performance of NI-LBP increases with the neighborhood size, while for RD-LBP the best performance is achieved by RD-LBP^{riu2}_{16,2}. On average, among the individual descriptors, LBP performs the best and VAR the worst.

The center pixel also provides useful discriminative information: it is apparent in Table 4 that combining the center pixel CI-LBP with NI-LBP or RD-LBP generally improves classification performance. Neglecting the center pixel clearly results in a loss of information, consistent with how [3] and [21] demonstrated the benefits of explicitly including the information of the center pixel in the classifier.
Fig. 12. One sample of each of the 61 texture classes from CUReT.
Fig. 13. The variations within each category of the new KTHTIPS2b database. Each row shows one example image from each of four samples of a category.

Table 8
Comparing classification accuracy (%) on CUReT: Ntr is the number of training samples per class. All results are obtained by us except for VZ-Joint, which is quoted from the recent comparative study of Zhang et al. [8]. For VZ-MR8, we learn 10 textons per class.

    Ntr        46     23     12     6      2      46     23     12     6      2      46     23     12     6      2
    LBP/VAR    93.76  88.71  81.80  71.08  50.43  94.00  89.76  81.53  71.09  52.77  91.90  85.34  77.12  66.04  48.64
    NI/RD/CI   95.15  92.00  86.19  77.97  57.96  95.63  92.70  87.12  79.57  60.89  92.59  87.85  80.92  70.33  52.21
As shown in Table 10 and Fig. 14, we also compare our method with state-of-the-art methods on the material categorization task of the KTHTIPS2b textures, with all results for the other methods quoted directly from [27]. For this database, our proposed NI-LBP/RD-LBP/CI-LBP descriptor outperforms all compared state-of-the-art methods by a significant margin. We should bear in mind that the classification results of all of the methods are obtained with a 1NN classifier, since we mainly focus our attention on the effectiveness of the descriptors rather than on the capabilities of the classifier. Using a more advanced classifier (an SVM, or k > 1) might improve performance significantly.

4. Conclusions and future work

This paper has proposed a novel local texture descriptor, generalizing the well-known LBP approach. Four LBP-like descriptors, the two local intensity-based CI-LBP and NI-LBP and the two local difference-based RD-LBP and AD-LBP, were presented to extract complementary texture information from local spatial patterns. We showed that combining complementary descriptors plays an important role in texture discrimination. In addition, we found that the information contained in the radial differences is more discriminative than that contained in the angular differences. The advantages of the proposed approach include its computational simplicity, the absence of training in the feature extraction stage, and a data-independent universal texton dictionary. Extensive experimental results show that the joint distribution of CI-LBP, NI-LBP and RD-LBP significantly outperforms the conventional LBP approach and its various invariants on the Outex test suites. Furthermore, results on the material database KTHTIPS2b demonstrate the best performance of the proposed approach in comparison with several state-of-the-art methods under a nearest neighbor classifier.

In the future, we plan to explore how to reduce the feature dimension of the multiresolution CI-LBP/NI-LBP/RD-LBP. We also believe that an in-depth investigation of the AD-LBP descriptor would be valuable for local region description, looking at the parallels between AD-LBP and the CS-LBP of [22] developed for image matching.

References

[1] M. Tuceryan, A.K. Jain, Texture analysis, in: C.H. Chen, L.F. Pau, P.S.P. Wang (Eds.), Handbook of Pattern Recognition and Computer Vision, World Scientific, Singapore, 1993, pp. 235–276.
[2] M. Varma, A. Zisserman, A statistical approach to texture classification from single images, Int. J. Comput. Vis. 62 (1–2) (2005) 61–81.
[3] M. Varma, A. Zisserman, A statistical approach to material classification using image patches, IEEE Trans. Pattern Anal. Mach. Intell. 31 (11) (2009) 2032–2047.
[4] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. 24 (7) (2002) 971–987.
[5] T. Leung, J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons, Int. J. Comput. Vis. 43 (1) (2001) 29–44.
[6] L. Zhang, L. Zhang, Z. Guo, D. Zhang, Monogenic-LBP: a new approach for rotation invariant texture classification, IEEE International Conference on Image Processing (ICIP), 2010, pp. 2677–2680.
[7] J. Xie, L. Zhang, J. You, D. Zhang, Texture classification via patch-based sparse texton learning, IEEE International Conference on Image Processing (ICIP), 2010, pp. 2737–2740.
[8] J. Zhang, M. Marszalek, S. Lazebnik, C. Schmid, Local features and kernels for classification of texture and object categories: a comprehensive study, Int. J. Comput. Vis. 73 (2) (2007) 213–238.
[9] M. Crosier, L.D. Griffin, Using basic image features for texture classification, Int. J. Comput. Vis. 88 (3) (2010) 447–460.
[10] S. Lazebnik, C. Schmid, J. Ponce, A sparse texture representation using local affine regions, IEEE Trans. Pattern Anal. Mach. Intell. 27 (8) (2005) 1265–1278.
[11] B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Trans. Pattern Anal. Mach. Intell. 18 (8) (1996) 837–842.
[12] T. Randen, J. Husøy, Filtering for texture classification: a comparative study, IEEE Trans. Pattern Anal. Mach. Intell. 21 (4) (1999) 291–310.
[13] T. Ojala, M. Pietikäinen, D. Harwood, A comparative study of texture measures with classification based on feature distributions, Pattern Recognit. 29 (1) (1996) 51–59.
[14] Y. Rodriguez, S. Marcel, Face authentication using adapted local binary pattern histograms, European Conference on Computer Vision (ECCV), Graz, Austria, 2006, pp. 321–332.
[15] T. Ahonen, A. Hadid, M. Pietikäinen, Face description with local binary patterns: application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 28 (12) (2006) 2037–2041.
[16] G. Zhao, M. Pietikäinen, Dynamic texture recognition using local binary patterns with an application to facial expressions, IEEE Trans. Pattern Anal. Mach. Intell. 29 (6) (2007) 915–928.
[17] A. Hafiane, G. Seetharaman, B. Zavidovique, Median binary pattern for textures classification, Proceedings of ICIAR, 2007, pp. 387–398.
[18] M. Heikkilä, M. Pietikäinen, J. Heikkilä, A texture-based method for detecting moving objects, British Machine Vision Conference (BMVC), London, vol. 1, 2004, pp. 187–196.
[19] M. Pietikäinen, T. Nurmela, T. Mäenpää, M. Turtinen, View-based recognition of real-world textures, Pattern Recognit. 37 (2) (2004) 313–323.
[20] S. Liao, M.W.K. Law, A.C.S. Chung, Dominant local binary patterns for texture classification, IEEE Trans. Image Process. 18 (5) (2009) 1107–1118.
[21] Z. Guo, L. Zhang, D. Zhang, A completed modeling of local binary pattern operator for texture classification, IEEE Trans. Image Process. 19 (6) (2010) 1657–1663.
[22] M. Heikkilä, M. Pietikäinen, C. Schmid, Description of interest regions with local binary patterns, Pattern Recognit. 42 (3) (2009) 425–436.
[23] J. Chen, S. Shan, C. He, G. Zhao, M. Pietikäinen, X. Chen, W. Gao, WLD: a robust local image descriptor, IEEE Trans. Pattern Anal. Mach. Intell. 32 (9) (2010) 1705–1720.
[24] G.R. Cross, A.K. Jain, Markov random field texture models, IEEE Trans. Pattern Anal. Mach. Intell. 5 (1) (1983) 25–39.
[25] T. Ojala, K. Valkealahti, E. Oja, M. Pietikäinen, Texture discrimination with multidimensional distributions of signed gray-level differences, Pattern Recognit. 34 (3) (2001) 727–739.
[26] L. Liu, P. Fieguth, G. Kuang, Generalized local binary patterns for texture classification, British Machine Vision Conference (BMVC), 2011.
[27] B. Caputo, E. Hayman, P. Mallikarjuna, Class-specific material categorization, International Conference on Computer Vision (ICCV), Beijing, 2005, pp. 1597–1604.
[28] T. Ahonen, M. Pietikäinen, Soft histograms for local binary patterns, Finnish Signal Processing Symposium, Oulu, Finland, 2007.
[29] X. Tan, B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions, IEEE Trans. Image Process. 19 (6) (2010) 1635–1650.
[30] B. Caputo, E. Hayman, M. Fritz, J.-O. Eklundh, Classifying materials in the real world, Image Vis. Comput. 28 (1) (2010) 150–163.
[31] P. Brodatz, Textures: A Photographic Album for Artists and Designers, Dover, New York, 1966.
[32] T. Ojala, T. Mäenpää, M. Pietikäinen, J. Viertola, J. Kyllönen, S. Huovinen, Outex — new framework for empirical evaluation of texture analysis algorithms, International Conference on Pattern Recognition, 2002, pp. 701–706.
[33] Z. Guo, L. Zhang, D. Zhang, Rotation invariant texture classification using LBP variance (LBPV) with global matching, Pattern Recognit. 43 (3) (2010) 706–719.
[34] L. Nanni, A. Lumini, S. Brahnam, Local binary patterns variants as texture descriptors for medical image analysis, Artif. Intell. Med. 49 (2) (2010) 117–125.
[35] K.J. Dana, B. van Ginneken, S.K. Nayar, J.J. Koenderink, Reflectance and texture of real-world surfaces, ACM Trans. Graph. 18 (1) (1999) 1–34.
[36] P. Mallikarjuna, M. Fritz, A.T. Targhi, E. Hayman, B. Caputo, J.-O. Eklundh, The KTH-TIPS and KTH-TIPS2 Databases, http://www.nada.kth.se/cvap/databases/kth-tips/, 2006.