
J. Vis. Commun. Image R. 23 (2012) 359–380

Contents lists available at SciVerse ScienceDirect

journal homepage: www.elsevier.com/locate/jvci

Comparative study of global color and texture descriptors for web image retrieval

Otávio A.B. Penatti a,*, Eduardo Valle a,b, Ricardo da S. Torres a

a RECOD Lab – Institute of Computing (IC), University of Campinas (Unicamp), Brazil
b Department of Computer Engineering and Industrial Automation (DCA), School of Electrical and Computer Engineering (FEEC), University of Campinas (Unicamp), Brazil

Article history:
Received 10 March 2011
Accepted 10 November 2011
Available online 28 November 2011

Keywords: Comparative study; Color descriptors; Texture descriptors; Web; Content-based image retrieval; Efficiency and effectiveness; Asymptotic complexity; Correlation analysis

Abstract: This paper presents a comparative study of color and texture descriptors considering the Web as the environment of use. We take into account the diversity and large-scale aspects of the Web by considering a large number of descriptors (24 color and 28 texture descriptors, including both traditional and recently proposed ones). The evaluation is made on two levels: a theoretical analysis in terms of algorithm complexities and an experimental comparison considering efficiency and effectiveness aspects. The experimental comparison contrasts the performances of the descriptors in small-scale datasets and in a large heterogeneous database containing more than 230 thousand images. Although there is a significant correlation between descriptor performances in the two settings, there are notable deviations, which must be taken into account when selecting descriptors for large-scale tasks. An analysis of the correlation is provided for the best descriptors, which hints at the best opportunities for their use in combination.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

This paper presents a comparative study of global color and texture descriptors considering the Web as the environment of use.

In recent years, the amount of digital images has grown rapidly. Among the main reasons for that, one may mention digital cameras and high-speed Internet connections. Those elements have created a simple way to generate and publish visual content worldwide. That means that a huge amount of visual information becomes available every day to a growing number of users. Much of that visual information is available on the Web, which has become the largest and most heterogeneous image database so far.

In that scenario, there is a crucial demand for image retrieval systems [1,2], which could be satisfied by content-based image retrieval (CBIR) systems. In CBIR systems, the image descriptor is a very important element. It is responsible for assessing the similarities among images. Descriptors can be classified depending on the image property analyzed: color or texture descriptors, for example, analyze color or texture properties, respectively.

In CBIR systems, the searching process works as follows. The user queries the system, usually by providing a query image. Its properties are encoded into feature vectors and then compared against the feature vectors of the database images, which had been previously extracted. The comparisons are made by computing distance values, and those values are used to rank the database images according to their similarities to the query image. The most similar images are finally shown to the user. The image descriptor is involved in this process both in the extraction of image properties and in the distance computations. The critical importance of image descriptors for CBIR systems is therefore clear.

It is known that many image descriptors are application dependent, that is, their performances vary from one application to another. Therefore, conducting comparative evaluations of image descriptors considering different environments of use is very important.

The literature presents several comparative studies for color, texture, and shape descriptors. A recent study [3] compares a large number of image descriptors in five different image collections for tasks of classification and image retrieval. Other studies are specific to certain properties: shape descriptors [4–8], texture descriptors [9–11], or color descriptors [12–14]. Surveys and comparisons markedly related to ours can be found in Sections 2.4 and 2.6.

In comparative studies of descriptors, the Web is rarely considered as the environment of use. In general, the number of descriptors considered is small and the application analyzed is specific. Besides that, asymptotic theoretical analysis and other efficiency considerations are generally not discussed in detail.

Our study has many novel aspects. First, it considers a large number of descriptors: 24 color and 28 texture descriptors, including both traditional and recently proposed ones. Our evaluation is

* Corresponding author.
E-mail addresses: [email protected] (O.A.B. Penatti), [email protected] (E. Valle), [email protected] (Ricardo da S. Torres).

1047-3203/$ - see front matter © 2011 Elsevier Inc. All rights reserved.
doi:10.1016/j.jvcir.2011.11.002
made on two levels: a theoretical analysis of algorithm complexities and an experimental comparison. The experimental comparison is made over specific and heterogeneous collections. Descriptors are analyzed in a Web environment with a database containing more than 230 thousand images with very heterogeneous content. The experimental analysis considers efficiency and effectiveness aspects. The effectiveness evaluation in the Web environment takes into account how much the descriptors agree with the human perception of semantic similarity, by asking a pool of users to annotate the relevance of the answers for each query.

Another important aspect of our study is related to scalability and diversity. How does a descriptor perform as the size of the collection increases considerably? And how does the heterogeneity of the collection affect the descriptors' effectiveness? Our experiments in the Web environment address both issues.

The large-scale, heterogeneous nature of the Web can benefit from the use of descriptors in combination. Although the combination of descriptors is a complex topic, with many competing techniques, and thus outside the scope of this work, we perform a general analysis of the complementarity of the best descriptors, which should be taken into account when selecting them for combined use.

We concentrate our evaluation on global image descriptors, since local descriptors have a radically different cost-benefit compromise, especially in the context of information retrieval involving high-level semantic contexts. Local image detectors and descriptors have been extensively surveyed in [15–17].

We have decided not to include shape descriptors in our study, because almost all of them are segmentation dependent. As is known, segmentation is still a hard and extremely application-dependent task. Therefore, shape descriptors are not yet mature for a heterogeneous environment like the Web. Readers interested in shape descriptors should refer to one of the comparative studies in specific environments [4–8]. Another, less common, kind of descriptor, called spatial relationship descriptor, tries to encode the spatial relationships between objects [18]. However, those descriptors also depend on the segmentation of images.

The paper is organized as follows. Section 2 presents the image descriptor definition used throughout this work and taxonomies for color and texture descriptors. Section 3 presents the results of the theoretical analysis of the descriptors, discussing the evaluation criteria used and the theoretical comparative tables. Section 4 presents the experimental evaluation, showing the experimental measures adopted, the implementation details of each descriptor, and the results achieved for the specific collections. Section 5 presents the evaluation for the Web environment. Section 6 concludes the paper.

2. Descriptors

Both the effectiveness and the efficiency of content-based image retrieval systems are very dependent on the image descriptors being used. The image descriptor is responsible for characterizing the image properties and for computing their similarities. In other words, the image descriptor makes it possible to rank images according to their visual properties.

2.1. Definition

The image descriptor can be conceptually understood as responsible for quantifying how similar two images are. Formally, an image descriptor D can be defined as a pair (D, dD) [19], where D is a feature-extraction algorithm and dD is a function suitable to compare the feature vectors generated:

- D encodes image visual properties into feature vectors, as shown in Fig. 1. A feature vector contains information related to the image visual properties, like color, texture, shape, and spatial relationship of objects.
- dD compares two feature vectors. As shown in Fig. 1, given two feature vectors, the function computes a distance or similarity value between these vectors. The distance or similarity between the vectors is considered as the distance or similarity between the images from which the vectors were extracted.

It is worth noting that, in some papers in the literature, what we call here the feature vector is considered the descriptor, with the distance function being accounted for elsewhere. We adopt the definition of [19], in which the descriptor also includes the distance function, since feature vectors need a specific distance function to establish the geometry of the description space: the vectors alone, without the metric, are meaningless. That is better understood if we observe that the same type of feature vector may have radically different performances when used with different distance functions.

2.2. Distance functions

Distance functions are very important for image descriptors. Their choice has a huge impact on descriptor performance. The most common distance functions are L1, also named Manhattan or city-block distance, L2, also known as Euclidean distance, and L∞. Those common functions (or variations of them) are largely used. There are also more complex functions, like the Earth Mover's Distance (EMD) [20].

2.3. Color descriptors

One of the most important visual properties identified by human vision is color, making it one of the most used properties in CBIR systems.

2.3.1. Taxonomy for color descriptors

The literature presents three main approaches for color analysis, as shown in Fig. 2.

The global approach considers the image color information globally. As no partitioning or pre-processing stage is applied to the image during feature extraction, descriptors from this approach usually have simple and fast algorithms for extracting feature vectors. However, as no information about the spatial distribution of colors is encoded, those descriptors can have little discriminating power. Many descriptors from the global approach generate histograms as feature vectors, like, for example, the global color histogram [21] and the cumulative global color histogram [22].

The fixed-size regions approach divides an image into cells of fixed size and extracts color information from each cell separately. The descriptors from this approach encode more spatial information, at the cost of usually generating larger feature vectors. Examples of descriptors from the fixed-size regions approach are the local color histogram [21] and the cell/color histogram [23].

The segmentation-based approach divides an image into regions that may differ in size and quantity from one image to another. This division is usually made by a segmentation or clustering algorithm, which introduces extra complexity into the feature extraction process. Another kind of segmentation is the classification of pixels before feature extraction. Descriptors from that approach usually present better effectiveness, although they are often more complex. Examples of descriptors from the segmentation-based approach are color-based clustering [24] and dominant colors [25,26].

Authors often give their methods neither a name nor an acronym. To refer to those methods less awkwardly throughout the text, we have taken the liberty of giving them a short descriptive
Fig. 1. Image descriptor components.

Fig. 2. Taxonomy for color descriptors.


Fig. 3. Taxonomy for texture descriptors [27].
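The descriptor definition of Section 2.1 (a feature-extraction algorithm D paired with a distance function dD) can be made concrete with a small sketch. The pairing below, a 64-bin global color histogram in the spirit of GCH [21] with the L1 distance, is a minimal illustration only; the uniform quantization into 4 levels per RGB channel is our assumption, not a prescribed implementation.

```python
import numpy as np

def extract_gch(image, n_colors=64):
    """Feature extraction D: a global color histogram.
    `image` is an HxWx3 uint8 array; colors are uniformly quantized
    to 4 levels per channel (4^3 = 64 bins). One pass over the pixels,
    i.e., O(n) extraction."""
    q = image // 64                                   # 256 levels -> 4 per channel
    idx = q[..., 0] * 16 + q[..., 1] * 4 + q[..., 2]  # bin index in [0, 64)
    hist = np.bincount(idx.ravel(), minlength=n_colors).astype(float)
    return hist / hist.sum()                          # normalize by pixel count

def l1_distance(fv_a, fv_b):
    """Distance function dD: Manhattan (L1) distance, linear in vector size."""
    return float(np.abs(fv_a - fv_b).sum())
```

Swapping l1_distance for another function (L2, L∞, EMD) yields a different descriptor even though the feature vectors are identical, which is why the pair, and not the vector alone, defines the description space.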

name and acronym. For methods already named in the original publication, we have, of course, used their standard designation.

2.3.2. Previous comparisons on color descriptors

Besides the studies mentioned in the introduction, studies focused on the MPEG-7 descriptors are especially related to ours, due to the large number of descriptors involved in the standard. A comparative study of the color descriptors from the MPEG-7 standard [13] shows that CSD [26] has the best performance, while CLD [26] is pointed out as the most sensitive to noise.

Another comparative study of the MPEG-7 color descriptors [14] points out that CSD [26] has the best effectiveness, being better than SCD [26], CLD [26], and DCD [26], in this order. According to the study, DCD, although being the most computationally complex descriptor, yields the worst results, since it focuses on parts of images and not on images as a whole.

The use of color in local features is a subject in itself, and outside the scope of this work. In any case, the use of local features often implies computational costs which limit their use in Web-like environments. The interested reader is, nevertheless, encouraged to refer to the comprehensive study of van de Sande et al. [17].

2.4. Texture descriptors

Texture is an important property for the characterization and recognition of images. This fact is attested by the great amount of research involving texture analysis of images [10,27–29].

Texture is easily recognized in images, as can be shown by elements like sand, leaves, clouds, bricks, etc. However, it is difficult to provide a formal definition for it. The literature gives a variety of definitions, as shown in [27]: "its structure is simply attributed to the repetitive patterns in which elements or primitives are arranged according to a placement rule"; "an image texture is described by the number and types of its primitives and the spatial organization or layout of its primitives". According to [30], texture can be described by spatial, frequency, or perceptual properties. In a general way, texture can be understood as a set of intensity variations that follow certain repetitive patterns.

Differently from color, texture is difficult to analyze from the value of a single pixel, since it arises mainly from the variation in a neighborhood of pixels. That makes it possible to name some attributes for textures. According to [31], texture has attributes of roughness, contrast, directionality, regularity, coarseness, and line-likeness, the first three being the most important.

Although most texture descriptors work on gray-scale images, only a few of them specify how color images should be converted in order to optimize the descriptor performance.

2.4.1. Taxonomy for texture descriptors

There are several approaches for texture extraction [27,30]. The taxonomy presented in Fig. 3 is the one found in [27].

One of the most traditional ways to analyze the spatial distribution of the gray levels of an image is by statistical analysis, as, for example, by computing the probability of co-occurrence of gray values at different distances and orientations. The statistics can be computed over the values of single pixels (first-order statistics) or over the values of pairs of pixels (second-order statistics) [27]. Methods that characterize textures by means of histograms also carry statistical information about texture. One of the most popular statistical methods is the co-occurrence matrix [32].

Geometrical methods analyze textures by "texture elements" or primitives. This analysis is made considering the geometrical properties of the primitives, like size, shape, area, and length. Once the primitives have been identified in an image, placement rules are extracted from them, like grids or statistics from the relative vectors that join the primitives' centroids [33]. That kind of analysis becomes difficult for natural textures, because the primitives and the placement rules can be irregular. For example, describing a wall of bricks by the primitive brick and a grid as placement rule can be simple. However, describing clouds in the sky is much more difficult, since the element cloud can have variable size and shape and its positioning is more complex.

Model-based methods rely on the construction of image models that can be used to describe and synthesize textures. The parameters of the model capture the essential perceived qualities of
texture [27]. For example, texture elements can be modeled as a dark or a bright spot, a horizontal or vertical transition, corners, lines, etc. Descriptors from that approach work well for regular textures. The local binary pattern [34] descriptor is an example of a model-based descriptor.

Signal processing methods characterize textures by applying filters over the image. Both spatial-domain and frequency-domain filters can be used. Descriptors based on wavelets and Gabor filters follow that approach, like the homogeneous texture descriptor [28,35], for instance.

We have addressed the issue of methods originally without a name or acronym in the same way as we did for the color descriptors (Section 2.3).

2.4.2. Previous comparisons on texture descriptors

The literature has some comparative studies on texture descriptors [9–11,36]. One study compares a Fourier-based descriptor with a Gabor-based descriptor [36], trying to identify the descriptor that is the most robust to noise. The results show that the Fourier-based descriptor has better performance on images with no noise and that the Gabor-based descriptor is better for noisy images.

Another study compares the texture descriptors of the MPEG-7 standard [10]. Considering feature extraction cost, the study shows that TBD [26,35] is the most expensive and that EHD [26] is the cheapest one. Besides that, the study points out that HTD [35,28] captures global information, while TBD captures global and local information and EHD captures only local information. The study also shows that TBD is not indicated for CBIR tasks, being more useful for image browsing and for defect detection. Additionally, HTD is suggested for use in CBIR and in texture segmentation tasks, while EHD is recommended for CBIR tasks. The study also shows that HTD and TBD are less sensitive to noise, while EHD is not recommended in environments with noise. The experiments performed in the study [10] show that TBD has low effectiveness and that HTD is not good for images with rotation. That last problem of HTD is noticed in the literature, given the number of versions proposing HTD with rotation invariance, like the descriptors HTDR [37], HTDI [38], and Han Gabor [39] present in our study.

3. Theoretical comparison

The comparative study performed in this paper comprises two analyses. This section presents the first one: a theoretical evaluation conducted for 24 color descriptors and 28 texture descriptors. The theoretical evaluation considers the asymptotic complexities of the descriptors' algorithms. The next section presents the second analysis: an experimental evaluation of the most promising color and texture descriptors. In the following section, color and texture descriptors are tested in a Web-like environment.

3.1. Evaluation criteria

Based on the main elements related to the search process in a content-based image retrieval system (as explained in Section 1), the following criteria are used to compare the descriptors: feature extraction complexity, distance function complexity, storage requirements, effectiveness, and validation environment.

3.1.1. Feature extraction complexity

The feature extraction algorithm of a descriptor is used whenever the features of an image need to be extracted. The extraction is required basically at two moments in a Web CBIR system. The first one is when the images are being collected from the Web to be included in the local database, which is usually an offline process. The second moment is when a query is performed, which is an online process. The online part directly affects the response time of the system. Therefore, it is important for the feature extraction algorithm to be fast.

Our complexity analysis is reported in asymptotic big-O notation, which ignores additive and multiplicative constants. As such, it should be taken with a grain of salt, since two O(n) algorithms, though theoretically similar in behavior, may vary considerably in practice, because of the hidden constants.

It will be seen that almost all extractors are linear or, at worst (when they work in the frequency domain), log-linear in the number of image pixels. There are notable exceptions, though; for example, methods based on scale-spaces must take into account the number of scales analyzed. The rare methods which perform complex adaptive clustering are more expensive.

3.1.2. Distance function complexity

The distance function of a descriptor is very time consuming when a query is being processed in a CBIR system. Distance functions need to be fast, because during the search process the query image will be compared to a large number of candidate images.

The distance function is also important for indexing issues. The use of indexing structures is important for systems with large databases and, therefore, critical in a Web environment. Without indexing structures, the response time would be unacceptable. Indexing, however, imposes restrictions on distance functions, because simple, axis-monotone norm-based functions are much friendlier to indexing structures than elaborate and unpredictable decision procedures. The interaction between descriptors and indexing structures is, however, complex, depending on the statistical characteristics of the feature vector, the image dataset, and the properties of the index employed. Prospective studies on that subject may be found in [40,41].

Again, in our theoretical analysis, the complexity is given asymptotically. It will be seen that almost all methods are based on simple distances (Manhattan, Euclidean, etc.), which are linear in the feature vector size.

3.1.3. Storage requirements

The image descriptor stores the encoded image properties in feature vectors. Each image managed by a CBIR system is associated with one or more feature vectors. As a result, the storage space required for feature vectors is proportional to the amount of images in the database. In a Web scenario, in addition to the very large database size, there is the issue of image heterogeneity, making it almost indispensable to employ several descriptors at once. As a consequence, the storage space required is also proportional to the array of descriptors employed.

Seldom is the feature vector size of a method fixed; more usually it depends on a set of parameters which can be adapted from application to application. For color descriptors, for example, many descriptors allow customizing how many colors are to be chosen in a quantization step. Methods based on frequency-domain transforms (wavelets, Fourier, DCT) often have a choice of selecting how many transform coefficients are considered. Many methods can be seen as multidimensional histograms and, as such, grow as fast as the product of the number of bins in each dimension. Our analysis considers those possible variations in vector sizes.

Dimensionality reduction techniques may be employed in order to alleviate the storage requirements of the larger descriptors. However, often (but not always) they also impose a loss in effectiveness. There are many dimensionality reduction techniques, including linear projection techniques like principal component analysis (PCA) and linear discriminant analysis (LDA), and metric embedding techniques (both linear and non-linear). The interaction between dimensionality reduction and descriptor performance is, however, far from trivial and outside the scope of
Table 1
Color descriptors: reference, short description, and complexity analysis. Names originally given by the authors are marked with *. Notation: n = number of pixels in an image, vs = vector size, NC = number of colors in the quantized space, ND = number of dominant colors, NDist = number of distances, NB = number of blocks or cells, NR = number of regions.

Global approach:
- JAC (Williams and Yoon [48]): joint autocorrelogram of color, texture, gradient, and rank. Extraction O(n); distance O(vs); vector size NC × NGrad × NRank × NText × NDist values (NGrad, NRank, and NText = number of quantization levels for gradient magnitude, rank, and texturedness, respectively).
- CW-LUV (Nallaperumal et al. [46]): wavelet coefficients of the color histogram in CIE Luv space. Extraction O(n); distance O(vs); vector size 127 bits.
- CW-HSV (Utenpattanant et al. [47]): wavelet coefficients of the color histogram in HSV space. Extraction O(n); distance O(vs); vector size 63 bits.
- Chaira Fuzzy (Chaira and Ray [49]): color histogram with a fuzzy distance based on a gamma membership function. Extraction O(n); distance O(vs); vector size 3 × NC values.
- WC (Moghaddam et al. [44]): color correlogram based on directional wavelet coefficients. Extraction O(n log n); distance O(vs); vector size NWav × NDist × NMat × NS values (NWav = wavelet coefficient quantization, NMat = number of matrices, NS = number of scales).
- CM* (Paschos et al. [50]): spectral chrominance of an image by moments of chromaticity, based on the chromaticity histogram. Extraction O(n) + O(NC² × NMo); distance O(vs); vector size 2 × NMo values (NMo = number of moments).
- SCD* (Manjunath et al. [26], Section B): Haar transform of the color histogram. Extraction O(n); distance O(vs); vector size NHaar bins (NHaar = number of Haar coefficients).
- CSD* (Manjunath et al. [26], Section C): color histogram in the perceptual HMMD space, computed with a scale-sensitive structuring window. Extraction O(n); distance O(vs); vector size 32–184 values of 8 bits each.
- ACC (Huang et al. [51]): color autocorrelogram. Extraction O(n); distance O(vs); vector size NC × NDist values.
- CGCH (Stricker and Orengo [22]): cumulative color histogram. Extraction O(n); distance O(vs); vector size NC values.
- GCH (Swain and Ballard [21]): global color histogram. Extraction O(n); distance O(vs); vector size NC values.

Global and fixed-size regions approach:
- Color Bitmap (Lu and Chang [52]): global and local information obtained by comparing the average color of blocks to the whole-image average. Extraction O(n); distance O(vs); vector size 3 × NB bits + 6 values.

Fixed-size regions approach:
- CDE* (Sun et al. [53]): spatial information of colors, using entropy. Extraction O(n); distance O(vs); vector size 2 × NC values.
- CCH* (Stehling et al. [23]): for each color in the image, a histogram representing its distribution among the cells. Extraction O(n); distance O(vs); vector size 0.45 × NC × NB values (average case) or NC × NB values (worst case).
- SBS* (Li [45]): color histograms for each block in an image, combined by weighting according to their position. Extraction O(n); distance O(k × m × NC), where k = NB in the query image and m = NB in the database image; vector size NB or NC × NB values (a value or a histogram for each block).
- CLD* (Manjunath et al. [26], Section E): DCT applied to the average colors of an image divided by a grid. Extraction O(n log n); distance O(vs); vector size 3 × NDCT values (NDCT = number of DCT coefficients).
- LCH (Swain and Ballard [21]): concatenation of color histograms obtained from an image divided by a grid. Extraction O(n); distance O(vs); vector size NC × NB values.

Segmentation-based approach:
- DCDI (Yang et al. [43]): DCD (see below) with a new color clustering algorithm, the Linear Block Algorithm (LBA), whose complexity is unknown but lower than GLA's. Extraction O(n) + LBA; distance O(NDq × NDi), where NDq = ND in the query image and NDi = ND in the database image; vector size 4 × ND values.
- DCSD* (Wong et al. [42]): combination of DCD (see below) and CSD (see above). Extraction O(n) + GLA (see note); distance O(vs); vector size 3 × ND values.
- Lee SC (Lee et al. [54]): color spatial information with a color adjacency histogram and a color vector angle histogram. Extraction O(n); distance O(NC); vector size 2 × NC + NBord values (NBord = number of border pixels).
- BIC* (Stehling et al. [55]): two color histograms in RGB space, one for border pixels and one for interior pixels. Extraction O(n); distance O(vs); vector size 2 × NC values.
- DCD* (Manjunath et al. [26], Section D): dominant colors (most representative colors), as well as spatial coherence and color variances. Extraction O(n) + GLA (see note); distance O(NDq × NDi), where NDq = ND in the query image and NDi = ND in the database image; vector size 4 × ND values.
- CBC* (Stehling et al. [24]): characterization of each region of an image, the regions being determined by color clustering. Extraction O(n log n); distance O(NRq × NRi × log(NRq × NRi)), where NRq = NR in the query image and NRi = NR in the database image; vector size 6 × NR values.
- CCV* (Pass et al. [56]): two color histograms in RGB space, one for pixels in large regions and one for pixels in small regions. Extraction O(n); distance O(vs); vector size 2 × NC values.

Note: DCD and DCSD employ the Generalized Lloyd Algorithm (GLA), commonly used in the k-means clustering algorithm, whose complexity is currently unknown.
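Several Table 1 entries (BIC, CCV, CDE) share the pattern of splitting one color histogram into two by a per-pixel classification, which keeps extraction at O(n) and the vector at 2 × NC values. Below is a rough sketch of a BIC-style border/interior split; the 4-neighbor test and the treatment of image-edge pixels as interior are our simplifying assumptions, not the exact published algorithm.

```python
import numpy as np

def extract_bic_like(labels, n_colors=64):
    """BIC-style split histogram. `labels` is an HxW array of quantized
    color indices in [0, n_colors). A pixel is 'border' if any 4-neighbor
    has a different quantized color, else 'interior'. One pass over the
    pixels -> O(n) extraction; the vector has 2 * NC values (border
    histogram followed by interior histogram), matching Table 1."""
    same = np.ones(labels.shape, dtype=bool)          # missing neighbors count as same
    same[1:, :] &= labels[1:, :] == labels[:-1, :]    # neighbor above
    same[:-1, :] &= labels[:-1, :] == labels[1:, :]   # neighbor below
    same[:, 1:] &= labels[:, 1:] == labels[:, :-1]    # neighbor to the left
    same[:, :-1] &= labels[:, :-1] == labels[:, 1:]   # neighbor to the right
    border = np.bincount(labels[~same], minlength=n_colors)
    interior = np.bincount(labels[same], minlength=n_colors)
    return np.concatenate([border, interior]).astype(float)
```

The same skeleton gives CCV by replacing the neighborhood test with a connected-component size test, and at no point does the classification change the asymptotic extraction cost.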

this work. Here we are interested in evaluating the "raw" behavior of the descriptors, with minimum external influences.

3.1.4. Effectiveness

Effectiveness stands for the capacity of a descriptor to retrieve relevant images. An effective descriptor puts together images which are related in the users' viewpoint, so that during retrieval relevant images are ranked first. The success of an image retrieval system is, of course, directly dependent on the quality of its results, and thus on the effectiveness of the descriptors. A user might tolerate occasional delays, but not systematically wrong answers.

3.1.5. Measuring effectiveness

Effectiveness poses an issue: it depends upon the agreement about which answers are considered relevant for the queries. Reaching that agreement may be very difficult, since the concept of relevance depends on each user's mental model of the query.

In practice, experimenters have two choices. The first is to use a categorized image database, where the database categorization is considered the ground truth; this creates the possibility of an automatic effectiveness evaluation. The second way to evaluate descriptors is conducted over databases with no fixed ground truth. In those cases, the system performs a pool of queries and selected users judge how well the system fared according to their own subjective criteria. That procedure introduces a source of variability, but has the advantage of evaluating the system according to the opinion of potential users.

Once the matter of annotation is settled, there remains the choice of which metric to use to summarize effectiveness over the set of queries (and occasionally users) employed. The most traditional choices are the measures of Precision and Recall, often in combination or as a function of each other (in a Precision × Recall graph, for example). Precision measures the fraction of relevant images in proportion to the answer set, while Recall measures the fraction of relevant images retrieved in the answer set in proportion to all relevant images existing in the database. A perfect system would have a Precision of 1 (all images in the answer set are relevant) and a Recall also of 1 (all relevant images in the database are retrieved in the answer set). In practice there is a
O.A.B. Penatti et al. / J. Vis. Commun. Image R. 23 (2012) 359–380 365

Table 2
Color descriptors: validation environment.

Descriptor | Collection size | Collection (a) | Evaluation measures (b) | Compared with (b)
JAC [48] | 24000 | Berkeley Digital Library | Scope × Recall (c) | GCH, Joint Histogram, Color correlogram
CW-LUV [46] | 100 | N/A | ANMRR, AVR, MRR, NMRR | HSV and CIELuv color spaces
CW-HSV [47] | 2997 | Heterogeneous from the Web | ANMRR | Itself
Chaira Fuzzy [49] | (1) 100, (2) 120 | (1) Country flags, (2) Textures | Precision, Recall, Precision × Recall | Three different distance functions: fuzzy divergence, fuzzy similarity measure, GTI model
WC [44] | 1000 | COREL | Precision, Rank | Simplicity, WBIIS, Color correlogram, Border correlogram
CM [50] | 2266 | VisTex (smooth, noise, etc.) | Precision, Recall | Histograms intersection
SCD [26] | 5000 | CCD MPEG-7 | ANMRR | Itself
CSD [26] | 5000 | CCD MPEG-7 | ANMRR | Itself
ACC [51] | 14554 | Heterogeneous | r-measure, avg-r-measure, p1-measure, avg-p1-measure; Recall × Scope | GCH, CCV, CCV/C
CGCH [22] | 3000 | Heterogeneous | Rank | GCH
GCH [21] | N/A | Objects in close | Average match percentile | Histogram intersection, Incremental intersection
Color Bitmap [52] | (1) 800, (2) 470, (3) 10000 | (1) Animations, (2–3) Full color | Retrieval accuracy | GCH, Color moments, Chang and Liu method
CDE [53] | 8000 | Heterogeneous from the Web | ANMRR, Precision, Recall | SCH, Geostat
CCH [23] | 20000 | COREL | Precision × Recall, habs, hrel | GCH, CCV, LCH
SBS [45] | N/A | N/A | N/A | GCH
CLD [26] | 5000 | CCD MPEG-7 | ANMRR | Itself
LCH [21] | N/A | Objects in close | Average match percentile | Histogram intersection, Incremental intersection
DCDI [43] | 2100 | COREL | ARR, ANMRR | DCD
DCSD [42] | 5466 | CCD MPEG-7 | ANMRR, NMRR | DCD, CSD, SCD, CLD
Lee SC [54] | 5000 | Heterogeneous | ANMRR, Precision, Recall | GCH, Hybrid Graph Representation, Color correlogram
BIC [55] | 20000 | COREL | Precision × Recall, h × Recall, PR, P30, R30, P100, R100, 3P-Precision, 11P-Precision | GCH, CCV, CBC, LCH
DCD [26] | 5000 | CCD MPEG-7 | ANMRR | Itself
CBC [24] | 20000 | COREL | Precision × Recall, NavgR | CCV, CMM, GCH, LCH, CSH
CCV [56] | (1) 11667, (2) 1440, (3) 1005, (4) 93, (5) 349 | (1) Chabot, (2) QBIC, (3) COREL, (4) PhotoCD, (5) Video scenes | Rank | GCH

a Heterogeneous and Heterogeneous from the Web mean that a non-standard heterogeneous collection was employed; in the latter case the images were collected from the Web.
b Evaluation measures and descriptor names that appear in italic are not used in this study, but their explanations can be found in the papers where they are mentioned.
c We make a distinction between metrics which appear isolated and metrics which appear versus another. The latter appear with a '×' between them.
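The Precision, Recall, and mAP measures that dominate the table above can be sketched as follows. The function names, and the representation of a query as a ranked list plus a set of relevant images, are illustrative assumptions rather than the paper's actual evaluation code.

```python
def precision_recall_at(ranking, relevant, k):
    """P@k and R@k for one query (e.g. P10/R10 when k = 10)."""
    retrieved = ranking[:k]
    hits = sum(1 for img in retrieved if img in relevant)
    return hits / k, hits / len(relevant)

def average_precision(ranking, relevant):
    """Average Precision for one query; its mean over queries is mAP."""
    hits, total = 0, 0.0
    for rank, img in enumerate(ranking, start=1):
        if img in relevant:
            hits += 1
            total += hits / rank  # precision at each relevant hit
    return total / len(relevant)
```

For example, the ranking ['a', 'b', 'c', 'd'] with relevant set {'a', 'c'} gives P@2 = R@2 = 0.5 and an Average Precision of (1/1 + 2/3)/2 = 5/6, which already reflects that the second relevant image is ranked third rather than second.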

compromise between the two measures: Recall is related to the concept of true positive rate, from the theory of the receiver operating characteristic (ROC), while Precision corresponds to the positive predictive value. Because of the compromise between Precision and Recall, sometimes a combination of the two is employed as a single metric. This is the case of the mAP (mean average precision), the F1-score, etc.
The problem of Precision and Recall (and their combined measures) is that they tend to ignore the ranking of the images. For example, a Precision of 0.5 indicates that half the images in the answer set are correct, without saying whether it is the first half or the second half. To account for that, those measures can be taken at specific points, considering answer sets of (usually small) growing sizes. Comparing the Precision at answer sets of 10, 20, and 30 images (called P10, P20, and P30, respectively), one can have a better idea of how the correct images are ranked. Usually, one is interested only in the top 10 or 20 answers, corresponding to the first page of answers, since it is observed in real-world search engines that users seldom take the effort of checking the second page, instead preferring to reformulate the query. Corresponding measures for the Recall (R10, R20, etc.) also exist.
Measures that incorporate the concept of ranking ab initio have also been proposed, notably for the evaluation of the MPEG-7 descriptors (the Rank itself, AVR, MRR, NMRR, ANMRR, etc. [26]). As will be seen, there is less consensus in the literature on those measures, and a greater tendency of authors to make ad hoc adjustments each time they are used, instead of keeping them consistent.

3.1.6. Validation environment
The validation environment comprises, in addition to the chosen metrics of efficiency and effectiveness, the image databases in which the descriptor was tested, and, most importantly, the other descriptors to which it was compared. Those aspects reveal, for example, whether the descriptor was adequately assessed.
It will be seen that there is little standardization in the validation environment, especially for color descriptors. Several different evaluation measures and image databases are employed, making it difficult to perform a meta-analysis. For texture descriptors, validation is commonly based on classification experimental designs,

Table 3
Texture descriptors: reference, short description, and complexity analysis.

Class | Reference | Acronym (a) | Short description | Extraction complexity (b) | Distance complexity (b) | Vector size | Notes
Statistical | Kiranyaz et al. [59] | 2DWAH* | Local directional 2D histograms on the edge field of a scale-space representation, metaphor of a "walking ant" | † | O(vs) | 2 × LS² values | † Several complex steps, at least O(nlogn); the description is insufficient to deduce the complexity. LS = line of sight of the ant
Statistical | Huang and Liu [60] | QCCH* | Quantized histogram of the compound rate of change of gray values for each pixel in four directions | O(n) | O(vs) | NQ values |
Statistical | Hadjidemetriou et al. [61] | MRH | Set of gray-scale histograms built on a scale-space representation | O(n × k) | O(vs) | O(NQ × L) values | k = width of separable Gaussian filter, L = number of scales = log2(√n) − 1
Statistical | Çarkacioglu and Yarman-Vural [62,63] | SASI* | Statistics of clique autocorrelation coefficients calculated over structuring windows | O(S × L × n) | O(vs) | 2 × NK × Σ(⌈Wi/4⌉ + 1) values | S × L = window sizes in pixels, Wi = width of ith window
Statistical | Zhou et al. [64] | SOH* | Histograms of scale and orientation built using the frequency domain | O(n) or O(n²) † | O(t × h × vs) | NS × NK values | † for NS small or NS large, respectively; t = number of translations, h = number of rotations
Statistical | Manjunath et al. [26] – (Section IV.C) | EHD* | Edge histograms in different directions by dividing the image into blocks and using edge detectors | O(n) | O(vs) | 3 × NB × NBQ bits | NBQ = borders quantization
Statistical | Tao and Dickinson [65] | LAS* | Local activity spectrum from histogram of image gradient field | O(n) | O(vs) | NQ values |
Statistical | Kovalev and Volmer [66] | CCOM | Co-occurrence matrix considering color information | O(n) | O(vs) | 2 × NQ² × NDist values † | † worst case
Statistical | Unser [67] | Unser | Sum and difference histograms based on the sum and difference of random variables of same variance | O(n) † | N/A | 4 × NDir × NQ values | † O(NQ) for extracting histograms information; NDir = number of directions
Statistical | Haralick et al. [32] | GCOM | Gray level co-occurrence matrix | O(n) † | N/A | O(NQ² × NDist × NDir) or 14 × NDir values ‡ | † O(NQ³) for extracting matrices information; ‡ matrix space or matrix information space, respectively; NDir = number of directions
Model-based and statistical | Takala et al. [68] | Takala | Local binary patterns (see LBP below) computed on overlapping blocks of the image | O(n) | O(vsq × vsi) | O(NB × SLBP) | vsq = vs of query image, vsi = vs of database image, SLBP = size of LBP employed
Model-based and statistical | Ojala et al. [34] | LBP* | Gray-scale and rotational invariant local binary patterns | O(n) | O(vs) | 2 + P values | P = number of neighbors
Signal-processing and statistical | Janney and Yu [69] | IFLT* | Rotation invariant features obtained from intensity vector derived from texture patch, normalized and Haar wavelet filtered | O(n) | O(vs) | NS × NQ values |
Signal-processing, model-based and statistical | Zegarra et al. [70] | SBPH* | Global texture characterization by using Steerable Pyramid decomposition and local arrangements of texture determined by local binary patterns | O(nlogn) | O(vs) | 256 × NS × NK or 59 × NS × NK values † | † NU-SBP or U-SBP, respectively
Signal-processing | Han and Ma [39] | Han Gabor | Rotation and scale invariant Gabor representation by summations on the conventional Gabor filter responses | O(nlogn) | O(vs) | 2 × NS or 2 × NK values † | † scale or orientation invariance, respectively
Signal-processing | Zhang et al. [38] | HTDI | Rotation invariant descriptor based on HTD-MPEG7 by shifting circularly the feature vector based on the dominant orientation | O(nlogn) | † | 2 + (2 × NS × NK) values | † used SVM with RBF kernel
Signal-processing | Zegarra et al. [71] | M-SPyd* | Rotation and scale invariant representation based on Steerable Pyramid decomposition | O(nlogn) | O(vs) | 2 × NS × NK values |
Signal-processing | Huang et al. [72] | CSG* | Composite gradient vectors obtained from sub-images through a wavelet decomposition | O(n) | O(vs) | 1440/NK values |
Signal-processing | Lee and Chen [57] | TBDI | Efficient method for the original TBD-MPEG7 by using Fourier and Hough transforms | O(nlogn) | O(vs) | 13 bits |
Signal-processing | Wang et al. [73] | BSpline | Overcomplete B-spline wavelet based statistical features and fractal signatures | O(n) | O(vs) | 20 × NWD values |
Signal-processing | Kokare et al. [74] | Kokare CW | Cosine-modulated wavelet transform | O(n) | O(vs) | 8 × NWD values |
Signal-processing | Sim et al. [75] | Sim Zernike | Zernike moments computed from power spectrum of texture image | O(nlogn) | O(vs) | 2 + NMo values | NMo = number of moments
Signal-processing | Yang and Liu [76] | MERF* | Random field built upon multi-resolution filters with the maximum entropy method | O(nlogn) | O(vs) | 2 × (NS × NK × SG) values | SG = size of Gabor filter bank
Signal-processing | Sim et al. [77] | Sim DCT | Texture description based on discrete cosine transform | O(nlogn) | O(vs) | 52 values |
Signal-processing | Ro and Kang [37] | HTDR | Feature vector shifts for rotation invariance to the original HTD-MPEG7 | O(nlogn) | O(vs) | 2 × NS × NK values |
Signal-processing | Manjunath et al. [26,35] | TBD* | Texture's regularity, directionality, and coarseness by analyzing responses from a set of scale and orientation selective filters | O(nlogn) | O(vs) | 12 bits |
Signal-processing | Manjunath et al. [35,28] | HTD* | Gabor filters in different scales and orientations | O(nlogn) | O(vs) | 2 × NS × NK values |
Signal-processing | Rubner and Tomasi [78] | Rubner EMD | Clustering with kd-trees over a space based on a set of Gabor filter responses; distance measured by EMD | O(nlogn) + † | O(vsq × vsi) | NClu × (NK × NS + 1) values | † clustering algorithm of unspecified complexity; vsq = vs of query image, vsi = vs of database image, NClu = number of clusters

a Names originally given by the authors are marked with *.
b n = number of pixels in image, vs = vector size, NQ = number of colors/values/gray levels in quantized space, NDist = number of distances, NB = number of blocks or cells, NS = number of scales, NK = number of orientations, NWD = number of wavelet decompositions.
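As an illustration of the O(n) extraction entries above, a minimal sketch of the basic local binary pattern follows. Note the hedge: the descriptor evaluated in this study is Ojala et al.'s rotation-invariant formulation with 2 + P bins, so this plain 256-bin, 8-neighbor version is only a didactic simplification.

```python
def lbp_code(img, y, x):
    """Basic 8-neighbor LBP code (radius 1) for the pixel at (y, x)."""
    center = img[y][x]
    # Clockwise 8-neighborhood at radius 1.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all non-edge pixels; O(n).

    `img` is a list of rows of gray-level integers.
    """
    h, w = len(img), len(img[0])
    hist = [0] * 256
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist
```

On a flat patch every neighbor compares as greater-or-equal, so every interior pixel yields code 255, which is a convenient sanity check.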

instead of retrieval ones. In general, little interest in measuring efficiency will be seen.

3.2. Analysis for color descriptors

Table 1 presents the asymptotic complexities of the feature extraction algorithms and distance functions, the feature vector sizes, and the taxonomy class of each descriptor. Descriptors are sorted, inside each taxonomy class, in reverse chronological order.
The great majority of color descriptors extract the feature vector in linear time on the number of pixels. However, some descriptors, like DCSD [42], DCDI [43], DCD [26], WC [44], and CBC [24], have more complex extraction algorithms. It is observed that many of the more complex descriptors are based on segmentation techniques.
Considering the distance function complexities, the great majority of descriptors are linear in the feature vector size. Notwithstanding, descriptors like DCDI [43], DCD [26], SBS [45], and CBC [24] are more complex for distance computations.
For the storage requirements criterion, we highlight the descriptors CW-LUV [46] and CW-HSV [47] because of the compact feature vectors they generate.
Table 2 shows the validation environment of each descriptor. Many of the color descriptors are validated using simple image

Table 4
Texture descriptors: validation environment.

Descriptor | Collection size | Collection (a) | Evaluation measures (b) | Compared with (b)
2DWAH [59] | 1760 | Brodatz | NMRR, ANMRR | EHD, EPNH, ARP; Gabor, GLCM, ORDC
QCCH [60] | 800 | Heterogeneous from the Web and categorized | Precision, P20, P50, P80, Extraction time (seconds) | SCH, Gabor
MRH [61] | (1) 108, (2) 91, (3) 8046 | (1) Synthetic, (2) Brodatz, (3) CureT | Classification accuracy | Fourier Spectrum, Color features, Wavelet package, Co-occurrence matrix, Markov random fields
SASI [62,63] | (1) 1792, (2) 976, (3) 480, (4) 2672, (5) 5920 | (1) Brodatz, (2) CUReT, (3) PhoTex, (4) VisTex, (5) All together | Top-15 average retrieval rate | Gabor
SOH [64] | 600 | Brodatz | Average retrieval rate | Wold, MRSAR
EHD [26] | 11000 | MPEG-7 | ANMRR | Itself
LAS [65] | (1) 1920, (2) 100 | Brodatz and satellite images; more than one texture per image | Classification accuracy, showed query results | Gradient indexing (4 different gradient operators: Sobel, Prewitt, Roberts and Laplacian), Gaussian Markov Random Fields (GMRF), Local Linear Transform
CCOM [66] | 20000 | Heterogeneous | Classification accuracy × query size | N/A
Unser [67] | (1) 3072, (2) 768, and (3) 192 | Brodatz | Classification accuracy | GCOM
GCOM [32] | (1) 243, (2) 170, (3) 624 | (1) Photomicrographs of sandstones, (2) Aerial photos, (3) Satellite images of California coastline | Classification accuracy | N/A
Takala LBP [68] | (1) 1350, (2) 900 | (1) Corel, (2) German stamps | Precision, Recall | Color Correlogram, EHD
LBP [34] | (1) 7744, (2) 3840 and 4320 | (1) Brodatz, (2) OUTex | Classification accuracy | Itself, Rotation invariant wavelet
IFLT [69] | (1) 672, (2) 3840 | (1) Brodatz; (2) OUTEX_TC_00010 | Classification precision | Itself, LBP
SBPH [70] | 640 | VisTex | Top-N average retrieval rate, with N = [16…64] | Original Steerable Pyramid Decomposition, GGD & KLD
Han Gabor [39] | (1) 1792, (2) 52 | (1) Brodatz, (2) MPEG-7 | Precision, Recall | Gabor
HTDI [38] | (1) 91, (2) 1000 | (1) Brodatz, (2) UIUCTex | Precision | HTD
M-SPyd [71] | (1) 208, (2) 208, (3) 208 | (1) Brodatz, (2) Brodatz rotated, (3) PN, Brodatz scaled | Showed 4 query results | Original Steerable Pyramid Decomposition, Gabor Wavelets
CSG [72] | 2400 | Brodatz | nT = precision or recall | Gradient vectors, Wavelet Energy Signature
TBDI [57] | (1) 896, (2) 1896 | (1) Brodatz, (2) Corel | Classification rate, Extraction time (seconds), Showed query results | HTD
BSpline [73] | 1792 | Brodatz | Top-15 average retrieval rate | Gabor, DWT
Kokare CW [74] | (1) 1728, (2) 112, (3) 16 | (1) Brodatz, (2) USC, (3) 1 artificial texture | Top-16 average retrieval rate | Daubechies wavelets, Gabor wavelets
Sim Zernike [75] | (1) 1856, (2) 832, (3) 880, (4) 115, (5) 2400, (6) 8000, (7) 4900 | (1) Brodatz + USC, (2) ICU, (3) Rotated (Brodatz + ICU), (4) Scaled (Brodatz), (5) Corel, (6) LANSAT, (7) Rotated/Scaled (Brodatz + Corel) | Recall, Vector size (bytes), Extraction time (seconds), Distance complexity (number of sums and minus operations), Entropy | Gabor, Radon, Wavelet
MERF [76] | 1744 | Brodatz | Top-15 average retrieval rate | Gabor, MRSAR
Sim DCT [77] | more than 3000 | Brodatz, ICU | Recall, Vector size (bits), Extraction time (seconds), Distance complexity (number of sums and minus operations) | Gabor, Radon, Wavelet
HTDR [37] | 880 | MPEG-7 (Brodatz = 30 + ICU = 25) | Top-15 average retrieval rate, Number of distance computations, Distance time (seconds) | HTD original distance function, complete function
TBD [26,35] | 448 | Brodatz | 5 user subjective evaluations | N/A
HTD [35,28] | 1856 | Brodatz | Top-15 average retrieval rate, Vector size (number of vector elements), Extraction time (seconds), Search and sort times (seconds) | PWT, TWT, MRSAR
Rubner EMD [78] | (1) 1792 + others, (2) 500 | (1) Brodatz, (2) Corel | Showed query results | N/A

a Heterogeneous and Heterogeneous from the Web mean that a non-standard heterogeneous collection was employed; in the latter case the images were collected from the Web.
b Evaluation measures and descriptor names that appear in italic are not used in this study, but their explanations can be found in the papers where they are mentioned.
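ANMRR, the rank-based measure that recurs throughout Tables 2 and 4, can be sketched from the usual MPEG-7 definition (AVR, MRR, NMRR averaged over queries). The function names are illustrative, and the cutoff K(q) — commonly min(4·NG(q), 2·GTM) — is left to the caller as an assumption.

```python
def nmrr(ranked, ground_truth, k_limit):
    """NMRR for one query, following the usual MPEG-7 definition.

    `ranked` is the retrieval order, `ground_truth` the relevant set,
    and `k_limit` the cutoff K(q). Ground-truth items ranked beyond
    K(q) (or missing) receive the penalty rank 1.25 * K(q).
    """
    ng = len(ground_truth)
    positions = {img: r for r, img in enumerate(ranked, start=1)}
    ranks = []
    for img in ground_truth:
        r = positions.get(img, float('inf'))
        ranks.append(r if r <= k_limit else 1.25 * k_limit)
    avr = sum(ranks) / ng                       # average rank
    mrr = avr - 0.5 - ng / 2                    # modified retrieval rank
    return mrr / (1.25 * k_limit - 0.5 - ng / 2)  # normalized to [0, 1]

def anmrr(queries):
    """Mean NMRR over (ranked, ground_truth, k_limit) triples."""
    return sum(nmrr(*q) for q in queries) / len(queries)
```

Perfect retrieval yields NMRR = 0 and total failure yields NMRR = 1, so lower ANMRR is better — which is why "Itself" comparisons in the tables report it directly.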

databases, and the databases are usually small, rarely having more than 10,000 images. One of the image databases commonly used is Corel, in whole or in parts, but it is often used in non-standard subsets. Table 2 also shows that some descriptors are not compared with other descriptors. The global color histogram (GCH) comes closest to a standard method against which most other methods tend to compare, but this is not an absolute rule – some methods, for example, compare only with versions of themselves.
The most common evaluation measures are Precision, Recall, and ANMRR. There is seemingly little interest in measuring efficiency: retrieval and extraction times are rarely offered, and no formal comparison of the complexities is ever considered.

3.3. Analysis for texture descriptors

Table 3 presents the results of the theoretical analysis of texture descriptors in terms of algorithm complexity for feature extraction and distance computation. Table 3 also shows the feature vector sizes and presents the taxonomy class of each texture descriptor. Descriptors are sorted in reverse chronological order inside each taxonomy class.
From Table 3, it is noticed that several descriptors present feature extraction complexity equal to O(nlogn). Most of these more complex descriptors are based on image filtering algorithms, using Gabor filters in many cases.
Considering the complexities of distance functions, the use of linear distance functions is predominant, which shows that the most laborious step related to texture description is the feature extraction phase.
From the analysis of feature vector sizes, we highlight the compact feature vectors generated by the descriptors TBDI [57], TBD [26,35], EHD [26], and LBP [34].
Table 4 shows the validation environments of each texture descriptor. Considering the databases used for validation, there is a predominance of the Brodatz database [58], which indicates a certain standardization in the validation of texture descriptors. However, some works use only part of the Brodatz database instead of the complete collection. It is also noteworthy that, because of the nature of the Brodatz textures, many articles use classification experimental designs instead of retrieval ones. Few descriptors are validated on heterogeneous databases, showing the lack of validation of texture descriptors for general image retrieval tasks.
Considering the descriptors used as baselines for comparisons, there is a clear predominance of descriptors based on Gabor filters. There are also descriptors that are compared with no other descriptor or only with variations of themselves.
The evaluation measures most used are related to average retrieval rate. There are some works concerned with extraction and retrieval times, but they are still rare.

4. Experimental comparison in small datasets

Given the large amount of descriptors selected for theoretical evaluation, we had to choose the most promising descriptors for the experimental comparison. In order to contrast with the large-scale Web-like environment, experiments were performed aiming to evaluate the descriptors' performance in a controlled small-scale scenario.
All those experiments were performed with the Eva tool,1 which creates a standardized environment for the evaluation of image descriptors. Details of the Eva tool can be found in [79].

4.1. Selected descriptors

In this section we present the descriptors selected for the experimental evaluation, justifying each choice. The implementation details are shown in Sections 4.3 and 4.4 for color and texture descriptors, respectively.

4.1.1. Color descriptors
Global color histogram descriptor (GCH) [21], cumulative global color histogram (CGCH) [22], local color histogram descriptor (LCH) [21], and color coherence vectors (CCV) [56] are popular descriptors from the literature, usually used as baselines for comparisons. LCH is a traditional descriptor based on fixed-size regions and CCV is segmentation-based. GCH, CGCH, and LCH are all very simple descriptors.
Descriptors based on correlograms are promising because they encode spatial information, such as the traditional color autocorrelogram descriptor (ACC) [51] and the more elaborate joint autocorrelogram descriptor (JAC) [48]. Spatial information is usually lost when using simple color histograms.
The border/interior pixel classification descriptor (BIC) [55] and the color-based clustering descriptor (CBC) [24] are descriptors based on segmentation. CBC has a complex extraction algorithm while BIC has a very simple one. BIC has also presented promising results in heterogeneous collections [55].
The Color Bitmap descriptor [52] was selected because it analyzes image color properties both globally and by using fixed-size regions. The color structure descriptor (CSD) [26] was selected because, according to previous comparisons among the MPEG-7 color descriptors [13,14], it achieves the best performance.
The CW-HSV [47] and CW-LUV [46] descriptors are very simple and generate very compact feature vectors. The chromaticity moments descriptor (CM) [50] is also very simple and generates compact feature vectors.

4.1.2. Texture descriptors
The local binary pattern descriptor (LBP) [34] is very popular in the literature and has simple algorithms, also presenting good effectiveness and invariance to rotation and gray-scale variation. The homogeneous texture descriptor (HTD) [35,28] is a traditional descriptor from the MPEG-7 standard.
The Steerable Pyramid Decomposition descriptor with scale and rotation invariance (M-SPyd) [71] is used in our experimental comparison due to its good results [71] and its invariance to scale and rotation. The statistical analysis of structural information descriptor (SASI) [62] was chosen because it presented better results than descriptors based on Gabor filters, as reported in [62].
The color co-occurrence matrix (CCOM) [66] was chosen due to the popularity of the co-occurrence matrix for analyzing textures. It also aggregates color information to the original gray-level co-occurrence matrix (GCOM) [32]. The Unser descriptor [67] was chosen due to its ability to generate more compact feature vectors, its lower complexity, and its effectiveness similar to the GCOM descriptor.
The quantized compound change histogram (QCCH) [60] was chosen due to its simple extraction algorithm, its fast distance function, and its compact feature vector. The main reason for choosing the local activity spectrum descriptor (LAS) [65] was its simplicity for both feature vector extraction and distance computations.

4.2. Evaluation measures

4.2.1. Efficiency in features extraction and distance computations
The times required for feature extraction and distance computations of each descriptor were measured in seconds, milliseconds, or microseconds. Time measures were taken by the Eva tool [79], which measures the average and standard deviation value for each

1 http://www.ic.unicamp.br/penatti/eva/ (as of September 15, 2011).

Table 5
Color descriptors implementation details.

Descriptor | Color space | Quantization | Feature vector size (number of values) | Distance function
ACC [51] | RGB | 64 color, 4 distances (1, 3, 5, and 7) | 256 | L1
BIC [55] | RGB | 64 | 128 | dLog
CBC [24] | CIELab | N/A | Variable (6 values per region) | L2 and Integrated Region Matching (IRM)
CCV [56] | RGB | 64 | 128 | L1
CGCH [22] | RGB | 64 | 64 | L1
CM [50] | CIE XYZ | N/A (color), 6 moments | 12 | L1
Color Bitmap [52] | RGB | N/A (color), 100 blocks | 300 bits + 6 values | L2 and Hamming
CSD [26] | HMMD | 184 (color), 8 × 8 neighborhood | 184 | L1
CW-HSV [47] | HSV | 64 | 63 bits | Hamming
CW-LUV [46] | CIELuv | 128 | 127 bits | Hamming
GCH [21] | RGB | 64 | 64 | L1
JAC [48] | HSV | 64 (color), 5 (gradient magnitude, rank, and texturedness), 4 distances (1, 3, 5, and 7), neighborhood 5 × 5 (rank and texturedness) | 32000 | L1
LCH [21] | RGB | 64 color, grid 4 × 4 | 1024 | L1

feature vector extraction and for each distance computation. To minimize fluctuations in the time measurements, Eva performs each feature vector extraction and each distance computation three times, keeping the average. From the absolute time values computed, an analysis considering the relative performance of the descriptors was conducted. Therefore, besides showing the absolute times of each descriptor, we highlight how much faster or slower each descriptor was in relation to a reference descriptor.
To take into account fluctuations in measured execution times, a very basic analysis was performed. A difference was considered considerable if it was bigger than double the maximum standard deviation of the two averages being compared: |μi − μj| > 2 × max(σi, σj), where μ and σ are respectively the average and standard deviation of the time measures.
Time measures were taken on a dual 2.0 GHz Intel Xeon Quad-Core computer with 4 GB of RAM, 3 hard disks in RAID-0 by hardware, and the Linux operating system. The implementations did not take advantage of the computer's parallelism facilities.

4.2.2. Efficiency in feature vectors storage
The measures for the feature vector sizes are multiples of bytes or bits. The feature vector size was computed based on the number of values stored in the feature vectors. We considered that each value was a float or integer of 4 bytes. The number of values in each feature vector was based on the quantization levels suggested in the original versions of the descriptors and used in our implementations.

4.2.3. Effectiveness
The effectiveness measures computed in our experiments were: Precision × Recall, mAP, P1, P5, P10, R1, R5, and R10 — Precision × Recall and mAP for the small-scale experiments, and P1, P5, P10, R1, R5, and R10 for the experiments in the Web scenario. The detailed definition of metrics and criteria is explained in Section 3.1.
Considering the Web-like scenario, it is important to emphasize that, as the dataset grows, only a small fraction is actually shown to the users in query results. Therefore, it is critical that the relevant images are ranked in the topmost positions. Considering a Precision × Recall curve, this desired behavior occurs when a descriptor presents high Precision values mainly for small Recall values.

4.3. Results for color descriptors

The 13 color descriptors presented in Section 4.1.1 were implemented according to the details shown in Table 5. For GCH and LCH we use L1 as the distance function. The color descriptors were tested on the ETH database (cropped-close). ETH is a free image database available on-line2 that contains

Table 6
Color descriptors comparison: absolute and relative times for (a) each feature vector extraction (in milliseconds) and (b) each distance computation (in microseconds), and (c) absolute and relative size of a single feature vector. Average values are shown with their respective standard deviation. Relative measures are against the GCH descriptor. Tables are sorted by the average column.

(a) Extraction times (milliseconds): Descriptor | Average | Rel.
CGCH | 0.3 ± 0.9 | 0.81
GCH | 0.4 ± 1.1 | 1.00
LCH | 0.6 ± 0.9 | 1.45
ColorBitmap | 1.1 ± 1.3 | 2.69
BIC | 2.3 ± 3.0 | 5.58
CW-HSV | 2.4 ± 0.9 | 6.05
CM | 2.6 ± 1.3 | 6.31
CCV | 3.3 ± 0.9 | 8.18
CSD | 15.0 ± 4.3 | 37.13
ACC | 16.6 ± 8.8 | 40.89
CW-LUV | 18.7 ± 1.4 | 46.21
JAC | 31.4 ± 4.1 | 77.58
CBC | 51.0 ± 8.2 | 126.01

(b) Distance times (microseconds): Descriptor | Average | Rel.
CM | 28.6 ± 7.5 | 0.65
CW-HSV | 29.9 ± 1.4 | 0.68
CW-LUV | 32.0 ± 1.4 | 0.73
ColorBitmap | 36.4 ± 2.9 | 0.83
CGCH | 43.2 ± 2.0 | 0.98
GCH | 43.8 ± 1.7 | 1.00
BIC | 51.4 ± 1.6 | 1.17
CCV | 52.1 ± 1.8 | 1.19
CSD | 60.7 ± 2.0 | 1.38
ACC | 71.8 ± 7.1 | 1.64
LCH | 179.4 ± 17.7 | 4.09
JAC | 788.8 ± 140.0 | 17.99
CBC | 1154.7 ± 1470.9 | 26.34

(c) Feature vector sizes (bytes): Descriptor | Size | Rel.
CW-HSV | 7.875 | 0.03
CW-LUV | 15.875 | 0.06
CM | 48 | 0.19
ColorBitmap | 61.5 | 0.24
GCH | 256 | 1.00
CGCH | 256 | 1.00
BIC | 512 | 2.00
CCV | 512 | 2.00
CSD | 736 | 2.88
CBC | 864 | 3.38
ACC | 1024 | 4.00
LCH | 4096 | 16.00
JAC | 128000 | 500.00

2 http://www.d2.mpi-inf.mpg.de/Datasets/ETH80 (as of September 15, 2011).

Fig. 4. Precision × Recall curves for color descriptors in the ETH database. The zoomed region represents the high-precision/low-recall zone, which is most important for a Web environment. (The graph is best viewed in color.)
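The GCH baseline that anchors these comparisons — a uniform 64-bin RGB histogram matched with the L1 distance, per Table 5 — can be sketched as below. The pixel representation as nested lists of (r, g, b) tuples is an illustrative assumption.

```python
def gch(image, bins_per_channel=4):
    """64-bin global color histogram (4 x 4 x 4 uniform RGB), L1-normalized."""
    hist = [0.0] * bins_per_channel ** 3
    n = 0
    for row in image:
        for r, g, b in row:
            idx = ((r * bins_per_channel // 256) * bins_per_channel
                   + g * bins_per_channel // 256) * bins_per_channel \
                  + b * bins_per_channel // 256
            hist[idx] += 1
            n += 1
    return [v / n for v in hist]

def l1_distance(h1, h2):
    """L1 (city-block) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

For normalized histograms the L1 distance ranges from 0 (identical color distributions) to 2 (disjoint ones), e.g. an all-white versus an all-black image.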

Table 7
Texture descriptors implementation details.

Descriptor | Parameters | Feature vector size (number of values) | Distance function
CCOM [66] | RGB quantized in 216 bins, d = 1 | Variable | L1 similar
HTD [35,28] | 4 scales, 6 orientations | 48 | L1
LAS [65] | 4 bins for each measure (non-uniformly quantized) | 256 | L1
LBP [34] | R = 1, P = 8 | 10 | L1
M-SPyd [71] | 2 scales, 4 orientations | 16 | L1
QCCH [60] | T = 40 | 40 | L1
SASI [62,63] | Windows: 3 × 3, 5 × 5, 7 × 7; 4 directions (0°, 45°, 90°, and 135°); therefore, K = 8 | 64 | Dot distance
Unser [67] | Gray levels = 256 bins, 4 directions (0°, 45°, 90°, and 135°), d = 1.5; measures: mean, contrast, correlation, energy, entropy, homogeneity, maximum probability, and standard deviation | 32 | L1

3,280 full-color images categorized in 8 classes (apples, pears, tomatoes, cows, dogs, horses, cups, and cars), each class containing 410 images. Each image is composed of one object in the center over a homogeneous background. The objects appear in different positions and points of view.

In the following sections, the results of the experimental evaluation with the color descriptors are presented. The relative values are computed using the global color histogram (GCH) as the reference descriptor, because GCH is one of the most popular descriptors in the literature and is often used as a baseline for comparisons.

4.3.1. Efficiency

In features extraction. Table 6(a) shows the average time for extracting one feature vector, as well as the standard deviation, for each descriptor, sorted by average time. Table 6(a) also shows the relative times of each descriptor in relation to the GCH descriptor. The values were obtained from a total of 9,840 feature extractions for each descriptor (3 times for each of the 3,280 images).

Only CCV, CSD, CW-LUV, JAC, and CBC are significantly slower than GCH for feature extraction. None of the tested color descriptors is significantly faster than GCH.

As observed, CBC is the most time-consuming descriptor for feature extraction. In the theoretical evaluation, CBC is the most complex descriptor for feature extraction, with complexity O(n log n), while all the other 12 descriptors are O(n). Although all other descriptors are O(n), there were still large differences in their actual time measurements.

In distance computations. Table 6(b) shows the average time for one distance computation and the standard deviation for each descriptor, sorted by average time. The times relative to GCH are also shown. The values were obtained from 32,275,200 distance computations for each descriptor (3,280 × 3,280 × 3).

The descriptors BIC, CCV, CSD, ACC, LCH, and JAC are considerably slower than the GCH descriptor. On the other hand, CM, CW-HSV, CW-LUV, and Color Bitmap are considerably faster than GCH for distance computations.

According to the theoretical analysis, CBC was the only descriptor with a distance function more complex than O(vs). That higher complexity is observed in the times shown in Table 6(b), where CBC was the slowest descriptor. The differences in the times among the other descriptors, which are all O(vs), were basically due to the feature vector sizes.
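The timing protocol described above (three extractions per image, then mean and standard deviation, plus times relative to a reference descriptor) can be sketched as follows. This is an illustrative sketch, not the authors' code; `extract` stands for any descriptor's feature-extraction function and is an assumption here.

```python
import statistics
import time

def benchmark_extraction(extract, images, repeats=3):
    """Time `extract` on every image `repeats` times (3 runs per image,
    as in the paper) and report mean and stdev in milliseconds."""
    times_ms = []
    for img in images:
        for _ in range(repeats):
            t0 = time.perf_counter()
            extract(img)
            times_ms.append((time.perf_counter() - t0) * 1000.0)
    return statistics.mean(times_ms), statistics.stdev(times_ms)

def relative_time(avg_ms, reference_avg_ms):
    """Time of a descriptor relative to the reference (GCH in the paper)."""
    return avg_ms / reference_avg_ms
```

The same harness applies unchanged to distance computations by timing a distance function over pairs of stored feature vectors.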

In feature vectors storage. Considering the number of values in the feature vectors generated by each descriptor, we computed the feature vector sizes in bytes. Table 6(c) shows the absolute and relative feature vector sizes of each color descriptor. As the CBC descriptor generates variable-size feature vectors, the size shown in Table 6(c) is the average size for the ETH database.

The descriptors ACC and JAC, which are based on the idea of the correlogram, generate large feature vectors. JAC, in particular, generates very large feature vectors because it computes the joint autocorrelogram for more than one image property. The LCH descriptor also generates large feature vectors because it computes a color histogram for each image region.

To illustrate the impact of the differences in feature vector size in a Web scenario, assume a database with 10⁹ images. The CW-HSV descriptor would use nearly 7 GB of space for storing feature vectors, while Color Bitmap would require 57 GB, GCH 238 GB, CSD 685 GB, and JAC 116 TB, for example.

4.3.2. Effectiveness

Fig. 4 shows the average Precision × Recall curves for all the evaluated color descriptors, considering all ETH images as queries.

For a Recall value of 10%, the best Precision values are very similar for the BIC, CSD, JAC, ACC, and Color Bitmap descriptors. They are the only descriptors with Precision values over 70%. They remain the best descriptors until 50% of Recall. However, after 30% of Recall the Precision values are similar for all descriptors (except for Color Bitmap, which is far above, and for CM, which is far below), which makes it difficult to point out which descriptors are the best.

Nevertheless, if we analyze the curves considering a Web environment, it is better if the descriptor has high Precision values when Recall values are small. The reason is that only a small portion of the relevant images on the Web will be retrieved and shown to the user, which means small Recall. Consequently, we can consider BIC, CSD, JAC, ACC, and Color Bitmap the most effective.

In the following sections, we present the evaluation results of the texture descriptors. The relative measures consider the LBP descriptor as reference, due to its current popularity and its simplicity.

Table 8
Texture descriptors comparison: absolute and relative times for (a) each feature vector extraction (in milliseconds), (b) each distance computation (in microseconds), and (c) absolute and relative size of a single feature vector. Average values are shown with their respective standard deviation. Relative measures are against the LBP descriptor. Tables are sorted by the average column.

(a) Extraction times (milliseconds)
Descriptor  Average          Rel.
LBP         1.7 ± 0.4        1.00
CCOM        3.1 ± 0.6        1.84
Unser       3.3 ± 0.5        2.02
LAS         5.1 ± 1.5        3.09
QCCH        21.8 ± 1.0       13.15
M-SPyd      218.2 ± 0.7      131.54
SASI        430.3 ± 0.9      259.39
HTD         4892.6 ± 49.6    2949.22

(b) Distance times (microseconds)
Descriptor  Average       Rel.
Unser       31.3 ± 1.3    0.90
M-SPyd      31.9 ± 1.4    0.92
HTD         31.9 ± 1.5    0.92
QCCH        32.2 ± 1.6    0.93
LAS         32.8 ± 1.3    0.95
SASI        33.9 ± 1.0    0.98
LBP         34.6 ± 1.3    1.00
CCOM        42.2 ± 1.9    1.22

(c) Feature vector sizes (bytes)
Descriptor  Size   Rel.
LBP         40     1.00
M-SPyd      64     1.60
Unser       128    3.20
QCCH        160    4.00
HTD         192    4.80
CCOM        248    6.20
SASI        256    6.40
LAS         1024   25.60
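The storage figures quoted for a 10⁹-image database follow directly from the per-vector sizes in Table 6(c). A quick check (binary units are assumed on our part, since they are consistent with the quoted numbers):

```python
def storage_gib(vector_size_bytes, n_images=10**9):
    """Total space, in binary gigabytes (GiB), to store one feature
    vector per image for a database of n_images images."""
    return vector_size_bytes * n_images / 2**30

# GCH stores 256 bytes per vector (Table 6(c)): ~238 GB for 10^9 images.
print(round(storage_gib(256)))            # 238
# JAC stores 128,000 bytes per vector: ~116 TB.
print(round(storage_gib(128000) / 1024))  # 116
```

The same arithmetic reproduces the CSD (685 GB) and Color Bitmap (57 GB) figures from their vector sizes.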

4.4. Results for texture descriptors

The 8 texture descriptors presented in Section 4.1.2 were implemented according to the details shown in Table 7.

Given our application scenario, we had to adjust some descriptor implementations. Our implementation of LBP uses the value 1 for the parameter R (radius) and 8 for P (neighbors). Also, we use the L1 distance instead of the one originally proposed for LBP, which is non-symmetric. The original M-SPyd descriptor has invariances to scale and rotation independently. In the implemented version, however, both invariances are computed at the same time; in other words, the feature vector generated by our implementation of M-SPyd is invariant to scale and rotation simultaneously. HTD, M-SPyd, and SASI had database-dependent normalization steps, which were forgone, as they are incompatible with the highly dynamic Web-like scenario intended. Our implementation of the LAS descriptor quantizes components non-uniformly, to avoid generating histograms with an overwhelming majority of values in the first bin. When a descriptor proposed no distance function (which was the case for Unser), we used L1.

The texture descriptors were tested on the Brodatz database. Brodatz [58] is one of the most popular image databases for the evaluation of texture descriptors. As observed in Section 3.3, the Brodatz collection was the most widely used in the evaluation of the texture descriptors present in this work. Brodatz contains 111 different textures. As done in the great majority of papers, each texture is divided into fixed-size cells. Our experiments divided each texture into 16 blocks, totaling 1,776 images. The resulting database is composed of 111 categories having 16 images each.

4.4.1. Efficiency

In features extraction. Table 8(a) shows the average times for extracting one feature vector and the standard deviation values for each descriptor, ordered by average time. The values were computed from 5,328 feature extractions (3 times for each of the 1,776 images). Relative times are also shown.

All descriptors are significantly slower than LBP for feature extraction. As observed, the descriptors M-SPyd, SASI, and HTD are considerably slower than the others. As presented in the theoretical comparison in Section 3.3, those three descriptors have higher complexity than the other five descriptors compared here. That higher complexity resulted in their larger feature extraction times. HTD is very time-consuming when extracting feature vectors, being more than 10 times slower even than SASI.

The five descriptors with complexity O(n) presented little variation among themselves.

In distance computations. The average time and standard deviation for one distance computation by each descriptor are shown in Table 8(b), ordered by average time. The times relative to LBP are also shown. The values were obtained from 9,462,528 distance computations (1,776 × 1,776 × 3).

The difference in times relative to LBP is considerable only for the Unser and CCOM descriptors: Unser is faster than LBP and CCOM is slower.

Although there are some differences in the times measured for each descriptor, they are small. The theoretical analysis showed that those 8 descriptors have distance functions with linear complexity. That fact is observed in the times measured, as they are similar among the descriptors. The small differences are related to the feature vector sizes.
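The LBP variant described above (R = 1, P = 8, uniform patterns, symmetric L1 distance) can be sketched as below. This is an illustrative reimplementation consistent with the 10-value feature vector of Table 7 (9 uniform bins plus 1 non-uniform bin), not the code used in the experiments.

```python
def lbp_histogram(img):
    """Uniform LBP with R = 1, P = 8: 9 bins for the uniform patterns
    (0..8 one-bits) plus 1 bin for all non-uniform patterns, matching
    the 10-value vector of Table 7. `img` is a 2-D list of gray levels."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # circular neighbor order
    hist = [0] * 10
    h, w = len(img), len(img[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            bits = [1 if img[y + dy][x + dx] >= c else 0
                    for dy, dx in offsets]
            # A pattern is "uniform" if it has at most 2 circular 0/1 transitions.
            transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
            hist[sum(bits) if transitions <= 2 else 9] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else hist

def l1_distance(a, b):
    """Symmetric L1 distance, used here in place of LBP's original measure."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

A flat patch yields only the all-ones pattern (bin 8), which makes the sketch easy to sanity-check.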

In feature vectors storage. The feature vector sizes of each descriptor are presented in Table 8(c). The sizes were computed based on the parameters given in Table 7, as explained in Section 4.2.2. The CCOM descriptor generates variable-size feature vectors, thus the size shown in Table 8(c) is the average size for the Brodatz database.

The biggest difference in feature vector sizes is for the LAS descriptor. Although it generates feature vectors more than 25 times larger than LBP's, their size is equal to that of the ACC color descriptor, which is a feasible size in many situations.

The differences in feature vector sizes become evident when we consider a Web environment. Considering a database with 10⁹ images, the LBP descriptor would require nearly 37 GB for storing feature vectors, while HTD would use 179 GB, SASI 238 GB, and LAS 954 GB, for example.

4.4.2. Effectiveness

For the effectiveness evaluation, all images from the Brodatz collection were used as query images. Fig. 5 shows the average values among all queries performed.

The LAS and SASI descriptors are clearly superior in effectiveness to the other descriptors. Considering small Recall values, the three best descriptors are LAS, SASI, and CCOM.

Fig. 5. Precision × Recall for texture descriptors in Brodatz database. The zoomed region represents the high-precision/low-recall zone, which is most important for a Web environment. (The graph is best viewed in color.)

5. Evaluation in a Web-like environment

All the 13 color and 8 texture descriptors were evaluated in a Web-like environment. That environment is mainly characterized by a huge image database with highly heterogeneous content and with no ''fixed'' or ''final'' ground-truth giving an authoritative classification of the images.

The database used was collected by researchers from the Federal University of Amazonas (UFAM), Brazil, with the objective of creating a collection with representative data from the Web. The data gathering started recursively from the Yahoo directory and generated a database with 234,143 images (excluding icons and banners) and 1.2 million HTML documents. The database contains all kinds of images, but no semantic labeling. From now on, we refer to this database as WebSample.

As we did for the previous experiments, efficiency and effectiveness measures were computed for each descriptor. The reference descriptor is the global color histogram (GCH).

5.1. Efficiency

For the efficiency measures, a subset of 500 images was randomly selected from the original WebSample database.

In features extraction. Table 9(a) presents the average time for extracting one feature vector by each descriptor, as well as the standard deviation and the relative times. The values were computed from 1,500 feature extractions for each descriptor (3 times for each of the 500 images).

WebSample has, in general, larger images and more image-size variability than the dedicated databases used earlier. That is reflected in the larger averages and standard deviations reported in the extraction times.

No descriptor was considerably faster or slower than the others, although they present large differences in the average times. For example, the texture descriptors M-SPyd, SASI, and HTD are very slow to process one image. HTD especially takes almost 20 s, on average, to extract one feature vector. The CBC descriptor is also slow, taking almost 1 s to process one image, on average. Considering a Web search engine, M-SPyd, SASI, and HTD face considerable challenges in order to be employed.

In distance computations. Table 9(b) shows the average times and standard deviation values for one distance computation, ordered by average time. The values were obtained from 750,000 distance computations (500 × 500 × 3) for each descriptor.

The times observed were similar to those of the small-scale experiments in the previous sections, because most of the descriptors generate fixed-size feature vectors. The CBC descriptor, however, generates variable-size feature vectors, and its distance times were more than 10 times larger in WebSample than in the ETH database. To compare one pair of feature vectors, CBC takes, on average, more than 10 s, making its use in large-scale environments very challenging. The JAC descriptor is also challenging for a Web environment, as it takes more than 2 s, on average, to compare each pair of feature vectors.

There are considerable differences in distance times between the descriptors in relation to GCH. All the descriptors above GCH in Table 9(b) are faster and all below are slower, except for CGCH and CBC.

In feature vectors storage. The feature vector sizes shown in Table 9(c) were computed with the same parameters used for the small-scale experiments. The sizes are the same for all descriptors, except for CBC and CCOM. These two descriptors, as discussed, generate variable-size feature vectors (whose average sizes for WebSample are shown in Table 9(c)), which reveal a large increase in comparison to the sizes in the ETH and Brodatz databases, respectively. CBC vectors increased, on average, more than 7 times, and CCOM vectors more than 4 times. CBC vectors are larger because its clustering algorithm groups similar colors and the WebSample database has many more complex images than the ETH database. CCOM vectors are larger because color is present in WebSample images, while it was absent in the Brodatz textures.

Table 9(c) shows that there is little correlation between vector size and the type of descriptor (color or texture).

The JAC descriptor generates by far the largest feature vectors. That could have an important impact on the storage requirements of a CBIR system.

Table 9
Descriptors comparison in the Web environment: absolute and relative times for (a) each feature vector extraction (in milliseconds), (b) each distance computation (in microseconds), and (c) absolute and relative size of a single feature vector. Average values are shown with their respective standard deviation. Relative measures are against the GCH descriptor. Tables are sorted by the average column. Texture descriptors appear in italic.

(a) Extraction times (milliseconds)
Descriptor   Average              Rel.
CGCH         5.7 ± 29.7           0.8
GCH          7.4 ± 32.8           1.0
ColorBitmap  8.7 ± 33.4           1.2
LCH          11.1 ± 37.7          1.5
LBP          19.8 ± 53.7          2.7
BIC          20.8 ± 51.6          2.8
CCOM         23.4 ± 53.6          3.1
Unser        39.9 ± 78.8          5.4
CW-HSV       40.3 ± 80.4          5.4
CM           44.6 ± 85.8          6.0
LAS          52.7 ± 96.2          7.1
CCV          62.3 ± 108.2         8.4
CSD          86.1 ± 72.6          11.6
ACC          87.3 ± 147.1         11.7
QCCH         239.1 ± 359.3        32.2
CW-LUV       285.9 ± 435.1        38.4
JAC          487.8 ± 737.4        65.6
CBC          817.5 ± 1340.1       109.9
M-SPyd       2289.0 ± 3867.9      307.8
SASI         5101.3 ± 7222.1      686.1
HTD          19811.2 ± 25819.6    2664.4

(b) Distance times (microseconds)
Descriptor   Average              Rel.
CM           34.0 ± 1.2           0.67
CW-LUV       34.7 ± 1.3           0.69
CW-HSV       35.5 ± 1.2           0.70
Unser        36.8 ± 1.2           0.73
QCCH         37.3 ± 1.3           0.74
M-SPyd       37.6 ± 1.2           0.74
SASI         38.0 ± 1.2           0.75
HTD          38.0 ± 1.2           0.75
LAS          38.9 ± 1.3           0.77
CCOM         41.7 ± 2.8           0.82
LBP          41.8 ± 1.2           0.83
ColorBitmap  42.8 ± 1.4           0.85
GCH          50.5 ± 1.3           1.00
CGCH         52.1 ± 1.6           1.03
BIC          61.8 ± 1.3           1.22
CCV          64.0 ± 1.7           1.27
CSD          74.1 ± 1.8           1.47
ACC          87.1 ± 1.4           1.72
LCH          230.1 ± 1.9          4.55
JAC          2023.1 ± 625.0       40.03
CBC          11856.6 ± 10281.1    234.61

(c) Feature vector sizes (bytes)
Descriptor   Size      Rel.
CW-HSV       7.88      0.03
CW-LUV       15.88     0.06
LBP          40        0.16
CM           48        0.19
ColorBitmap  61.5      0.24
M-SPyd       64        0.25
Unser        128       0.50
QCCH         160       0.63
HTD          192       0.75
CGCH         256       1.00
GCH          256       1.00
SASI         256       1.00
BIC          512       2.00
CCV          512       2.00
CSD          736       2.88
CCOM         999.61    3.90
ACC          1024      4.00
LAS          1024      4.00
LCH          4096      16.00
CBC          6456.47   25.22
JAC          128000    500.00

5.2. Effectiveness

The effectiveness evaluation in the WebSample collection was based on a set of queries with an associated pool of relevants for each query [80]. This set has 30 images, and each query has a pool of relevant images that were selected by real users. The query images are shown in Table 10. The main reason for this kind of evaluation is that the WebSample database has no standard categorization. Moreover, involving users in the evaluation process can measure descriptor effectiveness in a real scenario, taking into account the semantic variability of final users' judgment.

To create the pools of relevants, the user-oriented evaluation interface provided by the Eva tool [79] was used. A set of human evaluators composed of 18 people (graduate and undergraduate students) analyzed the images retrieved by a set of descriptors [80] and marked them as relevant or non-relevant for each query. It is important to say that there was no information about which descriptor retrieved each image, and no extra information was given about each query image. Therefore, the users had no influence when interpreting the query image. Table 10 shows the sizes of each pool of relevants.

Using the pool of relevant images for each query, the effectiveness evaluation could be conducted automatically, by analyzing the images retrieved by each descriptor and verifying if they are present in the pool of the corresponding query. The pool sizes shown in Table 10 include the query image. However, for the computation of the effectiveness measures, the query was removed from the pool.

Table 11 presents the average P1, P5, P10, R1, R5, and R10 values among all query images for each descriptor, while Fig. 6 shows the box plot for each descriptor considering the P10 values for all 30 queries. In general, the measures tell that the effectiveness of the descriptors is poor. On average, the best descriptors are able to retrieve around 2 relevant images among the 10 first results. However, it is important to note that the descriptors evaluated here characterize the image information globally, without having any notion of which image region is more important for the user.
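The automatic evaluation just described (check each retrieved image against the query's pool of relevants, with the query itself removed from both the ranking and the pool) reduces to precision and recall at a cutoff k. A minimal sketch, using hypothetical image identifiers:

```python
def precision_recall_at_k(ranked_ids, query_id, pool, k):
    """P@k and R@k against a pool of relevant ids; the query image is
    ignored both in the ranking and in the pool, as done in the paper."""
    relevant = set(pool) - {query_id}
    retrieved = [i for i in ranked_ids if i != query_id][:k]
    hits = sum(1 for i in retrieved if i in relevant)
    return hits / k, hits / len(relevant)

# Example: a pool of 5 relevants (plus the query); the top 10 holds 2 of them.
p10, r10 = precision_recall_at_k(
    ["q", "a", "x", "b", "y", "z", "w", "v", "u", "t", "s"],
    query_id="q", pool=["q", "a", "b", "c", "d", "e"], k=10)
print(p10, r10)  # 0.2 0.4
```

Averaging these values over the 30 queries gives the per-descriptor P10/R10 figures reported in Table 11.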

Table 10
Query images, their identification numbers, and the sizes of their respective pool of relevants.

Table 11
Average P1, P5, P10, R1, R5, and R10 values over all the query images for the WebSample database, sorted by P10. Texture descriptors appear in italic. The bold values show the sorting criterion.

Descriptor    P1     P5     P10    R1     R5     R10
JAC           0.667  0.433  0.287  0.036  0.118  0.146
BIC           0.567  0.367  0.287  0.029  0.085  0.141
ACC           0.500  0.367  0.267  0.025  0.092  0.138
LCH           0.467  0.320  0.200  0.028  0.092  0.109
CCOM          0.367  0.227  0.167  0.017  0.055  0.077
CCV           0.333  0.180  0.113  0.013  0.042  0.053
CSD           0.233  0.153  0.103  0.012  0.041  0.054
HTD           0.233  0.147  0.093  0.011  0.031  0.038
GCH           0.367  0.113  0.090  0.020  0.031  0.049
SASI          0.233  0.120  0.080  0.011  0.034  0.040
CW-HSV        0.167  0.107  0.070  0.010  0.023  0.030
LAS           0.200  0.073  0.053  0.010  0.021  0.028
CBC           0.100  0.073  0.043  0.007  0.024  0.030
LBP           0.133  0.067  0.040  0.005  0.015  0.016
Color Bitmap  0.067  0.053  0.033  0.003  0.012  0.014
CW-LUV        0.133  0.060  0.033  0.009  0.017  0.019
CGCH          0.100  0.033  0.023  0.005  0.007  0.010
QCCH          0.167  0.040  0.023  0.008  0.010  0.010
M-SPyd        0.067  0.027  0.020  0.004  0.006  0.009
Unser         0.033  0.013  0.013  0.002  0.002  0.005
CM            0.033  0.007  0.003  0.002  0.002  0.002

The precision and recall values indicate that the best overall effectiveness is achieved by the descriptors JAC, BIC, and ACC, which reached more than 26% of P10 and more than 13% of R10 in the average case. The three are all color descriptors. The best texture descriptor is CCOM, with an average P10 over 16%. The local color histogram (LCH) also has an average P10 of 20%.

Fig. 6 shows that many descriptors present very poor effectiveness, having P10 values close to zero for most of the queries. This was the case for the descriptors CGCH, CM, Color Bitmap, CW-LUV, LBP, M-SPyd, QCCH, and Unser. We can observe that there is a large variation in the behavior of most descriptors, which have queries with both very low and very high precisions.

ACC, BIC, and JAC, which present the best average precisions, also have similar medians in Fig. 6, except that BIC has more queries with higher P10 values. For example, JAC reached a P10 of 100% for one query, while BIC, LCH, and CCOM reached a P10 of 90% in one case.

Fig. 6. Box plot with each descriptor's P10 values on all 30 queries. Values above or below 1.5 times the interquartile range were considered outliers and indicated as small circles. Color descriptors appear in light gray while texture ones appear in dark gray. Descriptors are sorted alphabetically.

To illustrate the effectiveness of the descriptors in some queries, we have selected four queries, corresponding to the two easiest queries and two cases that illustrate well the problem of the semantic gap. The worst descriptors in terms of effectiveness (CGCH, CM, Color Bitmap, CW-LUV, LBP, M-SPyd, QCCH, and Unser) were eliminated from the following analysis, since their precision was zero or close to zero for almost all queries considered.

Table 12 shows the precision and recall values for the two easiest queries. Despite that, only 6 descriptors had a top-10 precision above 50%, showing the challenge of the task. Query 11 is very stereotypical, and query 29 has a large pool of relevant answers, which are both factors that help to ease the task. However, we have observed that those are neither necessary nor sufficient conditions to explain query difficulty.

Table 12
Average P10 and R10 values for queries 11 and 29, sorted by P10. Texture descriptors appear in italic.

Table 13 and Fig. 7 show the results for two queries that illustrate well the semantic gap phenomenon.³

Table 13
Average P10 and R10 values for queries 5 and 15, sorted by P10. Texture descriptors appear in italic.

In general, color descriptors performed better than texture descriptors, but there was a large variability among queries. As we hinted above, the most common explanations (size of the pool of relevants, simplicity of the image) were not sufficient to explain that variability. We have observed that the specific choice of the query image may have an important effect on the user interpretation of its contents, which poses a challenge for query-by-example systems based on a single query, and reinforces the importance of query refinement/user feedback.

³ We have provided an interface for the visualization of all queries' results: <http://www.recod.ic.unicamp.br/eva/yahoo230k> (as of September 15, 2011).

5.3. Comparing effectiveness in small and large-scale experiments

In general, the effect of heterogeneity on the descriptors' effectiveness is remarkable. To evaluate that effect, we have computed the correlation between the effectiveness measures in the small-scale experiments and the results in the WebSample database. Fig. 8 shows the correlation between the ranks of the descriptors in each scenario and plots the linear regression model for them. The rank is based on the mAP for the small-scale experiments and the P10 for the large-scale experiment. For both color and texture descriptors, the correlation of relative effectiveness in the two experiments is significant, but with notable departures.

Color Bitmap's relative effectiveness is so much better in the small-scale experiment than in the large-scale one that we had to consider it an outlier in the regression computation (with it, the correlation coefficient falls to R = 0.5199). Other notable departures are LCH, GCH, and CW-HSV, which perform relatively better in the large-scale experiment.

For texture descriptors, the correlation observed is smaller, reflecting the more specific nature both of many texture representations and of the special-purpose Brodatz dataset. CCOM was one of the few texture descriptors designed with a heterogeneous database in mind. Accordingly, it has the best relative performance in the large-scale experiment, though its performance on the Brodatz dataset is unremarkable.

5.4. Descriptors combination

Considering the challenges of retrieving relevant images in a Web scenario, the combination of more than one descriptor is often recommended. The exact way descriptors should be combined (e.g., early fusion of the feature vectors, late fusion of the distances, adaptive methods, etc.) in order to achieve maximum performance is an open research question and outside the scope of this paper.

However, we follow the example of [3] and provide an indicative metric, considering the degree of agreement among the best descriptors studied over the queries, which is useful to determine when combination can be useful. Descriptors with a high degree of agreement tend to reinforce each other, while descriptors with a low agreement can be used to complement each other, and both effects must be taken into consideration when designing combination algorithms.

To avoid disrupting the analysis, we have chosen only the best-performing descriptors and computed their correlations over the queries on the WebSample dataset. That pairwise analysis is shown in Table 14.

A more delicate analysis takes into account all the best descriptors at once and studies the cross-correlations using a principal component analysis. The two main components (which account for 82% of the variability) are plotted in Fig. 9 and show that two groups of descriptors tend to agree: JAC and LCH, on one hand, and ACC, BIC, and CCOM, on the other hand. Note also that the overall agreement between those best descriptors is significant.

The plot in Fig. 9 also reveals how individual queries performed on those descriptors. Horizontally, queries to the left of the plot tended to be easier than queries to the right. Vertically, queries in the center tended to perform equally well (or badly) for all descriptors, while queries at the top or bottom showed the largest differences.

5.5. Conclusion

The experiments in the Web environment have shown interesting aspects of the descriptors' performances.

Fig. 7. Results for queries 5 and 15, illustrating the semantic gap phenomenon. The query image is the first in the top-left position of the grid. In (a), there are many images containing people, but only one of them with people graduating. In (b), there are some images very similar in color properties, but not necessarily in semantics. Images retrieved by SASI and BIC, respectively.
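The pairwise agreement reported in Table 14 (Section 5.4) amounts to correlating two descriptors' per-query P10 vectors. The paper does not name the statistic; the sketch below assumes Pearson's coefficient, and the function and data names are ours.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def agreement_table(p10_by_descriptor):
    """Pairwise agreement in the lower-triangular layout of Table 14.
    `p10_by_descriptor` maps a descriptor name to its per-query P10 list."""
    names = list(p10_by_descriptor)
    return {(a, b): pearson(p10_by_descriptor[a], p10_by_descriptor[b])
            for i, a in enumerate(names) for b in names[:i]}
```

High coefficients (e.g., ACC vs. BIC) flag descriptors that reinforce each other; low ones (e.g., LCH vs. CCOM) flag complementary candidates for combination.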

(Fig. 8: two scatter plots of rank(mAP) versus rank(P10), (a) for the color descriptors, with regression coefficient R = 0.7951, and (b) for the texture descriptors, with R = 0.6429.)
Fig. 8. Correlation of effectiveness results of (a) color and (b) texture descriptors between small-scale experiments and the experiments in the Web scenario. The correlation
is based on the rank of the descriptors in each experiment. The correlation coefficient and the regression for the data are also shown. For the color descriptors regression, Color
Bitmap was considered an outlier. Note that the performance in small and large-scale experiments has significant correlation, but there are important deviations for both
color and texture descriptors.

Table 14
Correlation coefficients between each pair of the five best descriptors considering the P10 values over the queries in the WebSample database. Texture descriptors appear in italic.

        ACC    BIC    JAC    LCH    CCOM
ACC     –      –      –      –      –
BIC     0.801  –      –      –      –
JAC     0.578  0.670  –      –      –
LCH     0.661  0.478  0.610  –      –
CCOM    0.659  0.712  0.622  0.431  –

Considering efficiency, an increase in extraction times was noticed, mainly due to the larger dimensions of WebSample images. Some descriptors, like M-SPyd, SASI, HTD, and CBC, face challenges in a Web search engine because of their large extraction times.

Considering the times to compute distances, changing the scenario had very little impact for most of the descriptors, because most of them generate fixed-size feature vectors. For the descriptors whose vector sizes depend on the complexity of the input image, some suffered a larger increase in distance times than others. Descriptors like CBC and JAC show times that are challenging for a Web scenario.

The effectiveness results have shown a real impact when changing the scenario from specific to broad. Overall, descriptor performance falls considerably, making many of them not adequate for the Web environment. Such a decrease may come from both the size and the heterogeneity of the database, which aggravate the semantic gap in images, since their interpretation by users in an open setting is more subtle than in a specific context.

We have noticed the influence of the image background on the descriptors' effectiveness. Heterogeneous backgrounds, with colors and textures that mix with the image's main object, affected the values generated in the feature vectors. While human vision easily discerns the background and the main object, the extraction algorithms usually do not distinguish what is in evidence in an image. In query images with a homogeneous background, the feature vectors generated were more representative of the image's main object.

Considering all the experimental results, we could point out five good descriptors: JAC, BIC, ACC, LCH, and CCOM. However, considering the Web scenario, JAC would face the challenge of the efficiency problems presented. LCH and ACC are slower than BIC for distance computations, while ACC and CCOM generate larger feature vectors than BIC. LCH generates larger vectors than BIC, CCOM, and ACC. BIC presented better effectiveness. Therefore, BIC is the most recommended for use in a Web environment, as it presented better measures in almost all aspects.

Among the reasons for BIC's good performance is its extraction algorithm. The classification of the pixels as interior or border is similar to a simple segmentation between homogeneous regions and textured regions. Consequently, some texture information is also encoded into the feature vector. In other cases, this simple segmentation can characterize the image objects separately from the image background, having a histogram for the background, usually composed of interior pixels, and a histogram for the objects, usually composed of border pixels.

A simple analysis considering the combination potential was performed with the best descriptors, revealing collective correlations that should be taken into account for combination

Fig. 9. Plot of the two main components of a Principal Component Analysis of the P10 values over the queries in the WebSample database. The first component accounts for 71% of the variability and the second for 11%; together they explain 82% of the variability and indicate the two groups of descriptors that tend to perform similarly: JAC and LCH versus ACC, BIC, and CCOM. The small numbers identify the individual queries.

A theoretical comparison is conducted over 24 color and 28 texture descriptors, including traditional and recently proposed descriptors. The theoretical evaluation shows that most of the descriptors have O(n) complexity for feature extraction and also linear complexity for distance computations. The most promising descriptors in the theoretical analysis were implemented and evaluated experimentally. In the experiments, it was possible to measure the efficiency in feature extraction, in distance computations, and in storing feature vectors, as well as the effectiveness of the descriptors. Relating the theoretical and experimental evaluations, we have noticed that descriptors with high theoretical complexity presented poor efficiency performance in the experiments.

Experiments were conducted over specific and heterogeneous collections. After evaluating color and texture descriptors separately in specific environments, the descriptors were all evaluated in a Web scenario, using a collection with more than 230 thousand images with very heterogeneous content. Efficiency and effectiveness measures were computed for each descriptor. The effectiveness evaluation considered a pool of queries with their relevant sets created by real users. The correlation between results in the specific and Web scenarios is not negligible, but shows important deviations that should be taken into account when selecting descriptors.

Color descriptors have presented precision values over 70% in the ETH database, while in the WebSample collection the average precision was under 30%. Texture descriptors have shown precision values above 80% in the Brodatz collection; however, in the Web scenario the average precision was under 20%. In general, color descriptors suffer less effectiveness degradation than texture descriptors.

The experimental results in the Web scenario show that BIC [55] and ACC [51] are the best choices for a heterogeneous environment. BIC presents good efficiency results while keeping one of the highest effectiveness levels. ACC is slower than BIC for feature extraction and distance computations and generates larger feature vectors. The color co-occurrence matrix (CCOM) [66] would be the next choice. CCOM is faster than ACC for feature extraction and faster than BIC for distance computations; however, it generates larger feature vectors than BIC, and its effectiveness is not as good as that of the other two.

Overall, the semantic gap continues to be a big challenge for image descriptors, especially in the context of information retrieval (in opposition to image classification with large learning sets). Users' assignment of answer relevance based on a single image is subtle because, obviously, a single image may represent many dif-
techniques. ferent concepts. The necessity to represent a main object capturing
the concept of interest over variable complex backgrounds is also a
challenge.
6. Discussion and conclusions We expect that relevance feedback techniques, query expansion
and query refinement, computer vision-based approaches, and
The existence of a huge amount of visual information available methodologies able to effectively combine different descriptors
on the Web nowadays demands the creation of efficient and effec- will continue to be important in order to improve retrieval
tive retrieval systems. Content-based image retrieval (CBIR) sys- effectiveness.
tems are suitable for this task. The main aspect of CBIR systems
is the ability to search and index images by their visual properties,
like color, texture, shape and spatial relationship of objects. In a Acknowledgments
CBIR system, the image descriptor has a fundamental role, as it is
responsible for measuring images visual similarities. We would like to thank Fapesp (Grant Nos. 2006/59525-1,
This paper presents a comparative study of global color and tex- 2007/52015-0, 2009/10554-8, 2009/18438-7, and 2009/05951-8),
ture descriptors considering the Web as environment of use. Web CNPq, Capes, and Microsoft Research for infrastructure and finan-
presents an extremely large and heterogeneous scenario. This pa- cial support. We thank the colleagues who have helped with the
per also serves as reference for developers and researchers looking descriptors implementations. We also thank the researchers from
for image descriptors suitable for heterogeneous systems. Federal University of Amazonas (UFAM), Federal University of
The comparative study considers theoretical and experimental Minas Gerais (UFMG) and University of Campinas (Unicamp)
aspects, using efficiency and effectiveness criteria. The study also who have given us the pool of queries to be used in our
shows taxonomies for color and texture description. The theoreti- experiments.
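As an aside to the BIC discussion above, the border/interior pixel classification at the heart of the descriptor is easy to sketch. The following is a minimal illustration, not the authors' implementation: the 4 x 4 x 4 RGB quantization (64 colors) and the 4-neighbourhood test follow the BIC description in [55], while the function name and the edge-padding behaviour at image borders are assumptions made here for brevity.

```python
import numpy as np

def bic_histograms(img, bins_per_channel=4):
    """Sketch of BIC-style feature extraction.

    Quantizes colors, classifies each pixel as border or interior,
    and builds one histogram per class. `img` is an HxWx3 uint8 array.
    """
    # Uniformly quantize each RGB channel, then combine into a single color index.
    q = (img.astype(np.uint16) * bins_per_channel) // 256
    colors = (q[..., 0] * bins_per_channel ** 2
              + q[..., 1] * bins_per_channel
              + q[..., 2])

    # A pixel is "interior" if its 4-neighbours all share its quantized color,
    # and "border" otherwise. Image-edge pixels are compared against copies of
    # themselves via edge padding (an assumption of this sketch).
    padded = np.pad(colors, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    interior = ((padded[:-2, 1:-1] == center) & (padded[2:, 1:-1] == center) &
                (padded[1:-1, :-2] == center) & (padded[1:-1, 2:] == center))

    n_colors = bins_per_channel ** 3
    h_interior = np.bincount(colors[interior], minlength=n_colors)
    h_border = np.bincount(colors[~interior], minlength=n_colors)
    return h_interior, h_border
```

On a solid-color image every pixel is interior, while an image split into two flat regions yields border counts only along the region boundary, which is how the descriptor picks up some texture and object-contour information. The full BIC method additionally compares the concatenated histograms with a log-scale (dLog) distance, which is omitted here.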
O.A.B. Penatti et al. / J. Vis. Commun. Image R. 23 (2012) 359–380 379

References

[1] M.L. Kherfi, D. Ziou, A. Bernardi, Image retrieval from the world wide web: issues, techniques, and systems, ACM Computing Surveys 36 (1) (2004) 35–67.
[2] R. Datta, D. Joshi, J. Li, J.Z. Wang, Image retrieval: ideas, influences, and trends of the new age, ACM Computing Surveys 40 (2008) 1–60.
[3] T. Deselaers, D. Keysers, H. Ney, Features for image retrieval: an experimental comparison, Information Retrieval 11 (2) (2008) 77–107.
[4] S. Loncaric, A survey of shape analysis techniques, Pattern Recognition 31 (8) (1998) 983–1190.
[5] D. Zhang, G. Lu, Review of shape representation and description, Pattern Recognition 37 (1) (2004) 1–19.
[6] M. Safar, C. Shahabi, X. Sun, Image retrieval by shape: a comparative study, in: IEEE International Conference on Multimedia and Expo, 2000, pp. 141–144.
[7] D. Zhang, G. Lu, A comparative study of Fourier descriptors for shape representation and retrieval, in: Asian Conference on Computer Vision, 2002, pp. 646–651.
[8] D. Zhang, G. Lu, Evaluation of MPEG-7 shape descriptors against other shape descriptors, Multimedia Systems 9 (1) (2003) 15–30.
[9] T. Ojala, M. Pietikäinen, D. Harwood, A comparative study of texture measures with classification based on featured distributions, Pattern Recognition 29 (1) (1996) 51–59.
[10] F. Xu, Y. Zhang, Evaluation and comparison of texture descriptors proposed in MPEG-7, Journal of Visual Communication and Image Representation 17 (4) (2006) 701–716.
[11] P. Howarth, S.M. Rüger, Evaluation of texture features for content-based image retrieval, in: International Conference on Image and Video Retrieval, 2004, pp. 326–334.
[12] O.A.B. Penatti, R. da S. Torres, Color descriptors for web image retrieval: a comparative study, in: Brazilian Symposium on Computer Graphics and Image Processing, 2008, pp. 163–170.
[13] J. Annesley, J. Orwell, J.-P. Renno, Evaluation of MPEG-7 color descriptors for visual surveillance retrieval, in: Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005, pp. 105–112.
[14] T. Ojala, M. Aittola, E. Matinmikko, Empirical evaluation of MPEG-7 XM color descriptors in content-based retrieval of semantic image categories, in: International Conference on Pattern Recognition, vol. 2, 2002, pp. 1021–1024.
[15] T. Tuytelaars, K. Mikolajczyk, Local invariant feature detectors: a survey, Foundations and Trends in Computer Graphics and Vision 3 (2008) 177–280.
[16] K. Mikolajczyk, C. Schmid, A performance evaluation of local descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005) 1615–1630.
[17] K.E.A. van de Sande, T. Gevers, C.G.M. Snoek, Evaluating color descriptors for object and scene recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (2010) 1582–1596.
[18] Y. Wang, F. Makedon, R-histogram: quantitative representation of spatial relations for similarity-based image retrieval, in: ACM International Conference on Multimedia, 2003, pp. 323–326.
[19] R. da S. Torres, A.X. Falcão, Content-based image retrieval: theory and applications, Revista de Informática Teórica e Aplicada 13 (2) (2006) 161–185.
[20] Y. Rubner, C. Tomasi, L.J. Guibas, The earth mover's distance as a metric for image retrieval, International Journal of Computer Vision 40 (2000) 99–121.
[21] M.J. Swain, D.H. Ballard, Color indexing, International Journal of Computer Vision 7 (1) (1991) 11–32.
[22] M.A. Stricker, M. Orengo, Similarity of color images, in: SPIE Storage and Retrieval for Image and Video Databases III, vol. 2420, 1995, pp. 381–392.
[23] R. de O. Stehling, M.A. Nascimento, A.X. Falcão, Cell histograms versus color histograms for image representation and retrieval, Knowledge and Information Systems 5 (3) (2003) 315–336.
[24] R. de O. Stehling, M.A. Nascimento, A.X. Falcão, An adaptive and efficient clustering-based approach for content-based image retrieval in image databases, in: International Database Engineering and Applications Symposium, 2001, pp. 356–365.
[25] Y. Deng, B.S. Manjunath, C. Kenney, M.S. Moore, H. Shin, An efficient color representation for image retrieval, IEEE Transactions on Image Processing 10 (1) (2001) 140–147.
[26] B.S. Manjunath, J.-R. Ohm, V.V. Vasudevan, A. Yamada, Color and texture descriptors, IEEE Transactions on Circuits and Systems for Video Technology 11 (6) (2001) 703–715.
[27] M. Tuceryan, A.K. Jain, Texture Analysis (1993) 235–276.
[28] B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (8) (1996) 837–842.
[29] J.A.M. Zegarra, N.J. Leite, R. da S. Torres, Wavelet-based feature extraction for fingerprint image retrieval, Journal of Computational and Applied Mathematics 227 (2009) 294–307.
[30] A. del Bimbo, Visual Information Retrieval, Morgan Kaufmann Publishers, 1999.
[31] H. Tamura, S. Mori, T. Yamawaki, Textural features corresponding to visual perception, IEEE Transactions on Systems, Man and Cybernetics 8 (6) (1978) 460–473.
[32] R.M. Haralick, K. Shanmugam, I. Dinstein, Textural features for image classification, IEEE Transactions on Systems, Man and Cybernetics 3 (6) (1973) 610–621.
[33] R. Nevatia, Machine Perception, Prentice-Hall, 1982.
[34] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (7) (2002) 971–987.
[35] P. Wu, B.S. Manjunath, S. Newsam, H.D. Shin, A texture descriptor for browsing and similarity retrieval, Signal Processing: Image Communication 16 (1–2) (2000) 33–43.
[36] U.A. Ahmad, K. Kidiyo, R. Joseph, Texture features based on Fourier transform and Gabor filters: an empirical comparison, in: International Conference on Machine Vision, 2007, pp. 67–72.
[37] Y.M. Ro, H.K. Kang, Hierarchical rotational invariant similarity measurement for MPEG-7 homogeneous texture descriptor, Electronics Letters 36 (15) (2000) 1268–1270.
[38] L. Zhang, J. Ma, X. Xu, B. Yuan, Rotation invariant image classification based on MPEG-7 homogeneous texture descriptor, in: International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, vol. 3, 2007, pp. 798–803.
[39] J. Han, K. Ma, Rotation-invariant and scale-invariant Gabor features for texture image retrieval, Image and Vision Computing 25 (9) (2007) 1474–1481.
[40] S. Barton, V. Gouet-Brunet, M. Rukoz, C. Charbuillet, G. Peeters, Estimating the indexability of multimedia descriptors for similarity searching, in: Adaptivity, Personalization and Fusion of Heterogeneous Information, 2010, pp. 84–87.
[41] H. Lejsek, F.H. Ásmundsson, B.T. Jónsson, L. Amsaleg, Scalability of local image descriptors: a comparative study, in: International Conference on Multimedia, 2006, pp. 589–598.
[42] K. Wong, L. Po, K. Cheung, A compact and efficient color descriptor for image retrieval, in: IEEE International Conference on Multimedia and Expo, 2007, pp. 611–614.
[43] N. Yang, W. Chang, C. Kuo, T. Li, A fast MPEG-7 dominant color extraction with new similarity measure for image retrieval, Journal of Visual Communication and Image Representation 19 (2) (2008) 92–105.
[44] H.A. Moghaddam, T.T. Khajoie, A.H. Rouhi, M.S. Tarzjan, Wavelet correlogram: a new approach for image indexing and retrieval, Pattern Recognition 38 (12) (2005) 2506–2518.
[45] X. Li, Image retrieval based on perceptive weighted color blocks, Pattern Recognition Letters 24 (12) (2003) 1935–1941.
[46] K. Nallaperumal, M.S. Banu, C.C. Christiyana, Content based image indexing and retrieval using color descriptor in wavelet domain, in: International Conference on Computational Intelligence and Multimedia Applications, vol. 3, 2007, pp. 185–189.
[47] A. Utenpattanant, O. Chitsobhuk, A. Khawne, Color descriptor for image retrieval in wavelet domain, in: International Conference on Advanced Communication Technology, vol. 1, 2006, pp. 818–821.
[48] A. Williams, P. Yoon, Content-based image retrieval using joint correlograms, Multimedia Tools and Applications 34 (2) (2007) 239–248.
[49] T. Chaira, A.K. Ray, Fuzzy measures for color image retrieval, Fuzzy Sets and Systems 150 (3) (2005) 545–560.
[50] G. Paschos, I. Radev, N. Prabakar, Image content-based retrieval using chromaticity moments, IEEE Transactions on Knowledge and Data Engineering 15 (5) (2003) 1069–1072.
[51] J. Huang, S.R. Kumar, M. Mitra, W. Zhu, R. Zabih, Image indexing using color correlograms, in: International Conference on Computer Vision and Pattern Recognition, 1997, pp. 762–768.
[52] T. Lu, C. Chang, Color image retrieval technique based on color features and image bitmap, Information Processing and Management 43 (2) (2007) 461–472.
[53] J. Sun, X. Zhang, J. Cui, L. Zhou, Image retrieval based on color distribution entropy, Pattern Recognition Letters 27 (10) (2006) 1122–1126.
[54] H.Y. Lee, H.K. Lee, Y.H. Ha, Spatial color descriptor for image retrieval and video segmentation, IEEE Transactions on Multimedia 5 (3) (2003) 358–367.
[55] R. de O. Stehling, M.A. Nascimento, A.X. Falcão, A compact and efficient image retrieval approach based on border/interior pixel classification, in: International Conference on Information and Knowledge Management, 2002, pp. 102–109.
[56] G. Pass, R. Zabih, J. Miller, Comparing images using color coherence vectors, in: ACM International Conference on Multimedia, 1996, pp. 65–73.
[57] K. Lee, L. Chen, An efficient computation method for the texture browsing descriptor of MPEG-7, Image and Vision Computing 23 (5) (2005) 479–489.
[58] P. Brodatz, Textures: A Photographic Album for Artists and Designers, Dover, 1966.
[59] S. Kiranyaz, M. Ferreira, M. Gabbouj, A generic shape/texture descriptor over multiscale edge field: 2-d walking ant histogram, IEEE Transactions on Image Processing 17 (3) (2008) 377–391.
[60] C. Huang, Q. Liu, An orientation independent texture descriptor for image retrieval, in: International Conference on Communications, Circuits and Systems, 2007, pp. 772–776.
[61] E. Hadjidemetriou, M.D. Grossberg, S.K. Nayar, Multiresolution histograms and their use for recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (7) (2004) 831–847.
[62] A. Çarkacioglu, F. Yarman-Vural, SASI: a generic texture descriptor for image retrieval, Pattern Recognition 36 (11) (2003) 2615–2633.
[63] A. Çarkacioglu, F. Yarman-Vural, SASI: a new texture descriptor for content based image retrieval, in: International Conference on Image Processing, vol. 2, 2001, pp. 137–140.
[64] J. Zhou, L. Xin, D. Zhang, Scale-orientation histogram for texture image retrieval, Pattern Recognition 36 (4) (2003) 1061–1063.
[65] B. Tao, B.W. Dickinson, Texture recognition and image retrieval using gradient indexing, Journal of Visual Communication and Image Representation 11 (3) (2000) 327–342.
[66] V. Kovalev, S. Volmer, Color co-occurrence descriptors for querying-by-example, in: International Conference on MultiMedia Modeling, 1998, pp. 32–38.
[67] M. Unser, Sum and difference histograms for texture classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (1) (1986) 118–125.
[68] V. Takala, T. Ahonen, M. Pietikäinen, Block-based methods for image retrieval using local binary patterns, in: Scandinavian Conference on Image Analysis, 2005, pp. 882–891.
[69] P. Janney, Z. Yu, Invariant features of local textures – a rotation invariant local texture descriptor, in: IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–7.
[70] J.A.M. Zegarra, J. Beeck, N.J. Leite, R. da S. Torres, A.X. Falcão, Combining global with local texture information for image retrieval applications, in: International Symposium on Multimedia, 2008, pp. 148–153.
[71] J.A.M. Zegarra, N.J. Leite, R. da S. Torres, Rotation-invariant and scale-invariant steerable pyramid decomposition for texture image retrieval, in: Brazilian Symposium on Computer Graphics and Image Processing, 2007, pp. 121–128.
[72] P.W. Huang, S.K. Dai, P.L. Lin, Texture image retrieval and image segmentation using composite sub-band gradient vectors, Journal of Visual Communication and Image Representation 17 (5) (2006) 947–957.
[73] Q. Wang, D.D. Feng, Z. Chi, B-spline over-complete wavelet based fractal signature analysis for texture image retrieval, in: International Symposium on Intelligent Multimedia, Video and Speech Processing, 2004, pp. 462–466.
[74] M. Kokare, B.N. Chatterji, P.K. Biswas, Cosine-modulated wavelet based texture features for content-based image retrieval, Pattern Recognition Letters 25 (4) (2004) 391–398.
[75] D. Sim, H. Kim, R. Park, Invariant texture retrieval using modified Zernike moments, Image and Vision Computing 22 (4) (2004) 331–342.
[76] X. Yang, J. Liu, Maximum entropy random fields for texture analysis, Pattern Recognition Letters 23 (1–3) (2002) 93–101.
[77] D. Sim, H. Kim, R. Park, Fast texture description and retrieval of DCT-based compressed images, Electronics Letters 37 (1) (2001) 18–19.
[78] Y. Rubner, C. Tomasi, Texture-based image retrieval without segmentation, in: IEEE International Conference on Computer Vision, vol. 2, 1999, p. 1018.
[79] O.A.B. Penatti, R. da S. Torres, Eva – an evaluation tool for comparing descriptors in content-based image retrieval tasks, in: ACM International Conference on Multimedia Information Retrieval, 2010, pp. 413–416.
[80] P.A.S. Kimura, J.M.B. Cavalcanti, P.C. Saraiva, R. da S. Torres, M.A. Gonçalves, Evaluating retrieval effectiveness of descriptors for searching in large image databases, Journal of Information and Data Management 2 (3) (2011) 305–320.
