Lecture Notes in Artificial Intelligence 6119
Edited by R. Goebel, J. Siekmann, and W. Wahlster
Advances in Knowledge Discovery and Data Mining
Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany
Volume Editors
Mohammed J. Zaki
Rensselaer Polytechnic Institute
Troy, NY, USA
E-mail: [email protected]
Jeffrey Xu Yu
The Chinese University of Hong Kong
Hong Kong, China
E-mail: [email protected]
B. Ravindran
IIT Madras, Chennai, India
E-mail: [email protected]
Vikram Pudi
IIIT, Hyderabad, India
E-mail: [email protected]
ISSN 0302-9743
ISBN-10 3-642-13671-0 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-13671-9 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2010
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper 06/3180
Preface
The 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining was
held in Hyderabad, India during June 21–24, 2010; this was the first time the
conference was held in India.
PAKDD is a major international conference in the areas of data mining (DM) and knowledge discovery in databases (KDD). It provides an international forum for researchers and industry practitioners to share their new ideas, original research results and practical development experiences from all KDD-related areas, including data mining, data warehousing, machine learning, databases, statistics, knowledge acquisition and automatic scientific discovery, data visualization, causal induction and knowledge-based systems.
PAKDD 2010 received 412 research papers from 34 countries: Australia, Austria, Belgium, Canada, China, Cuba, Egypt, Finland, France, Germany, Greece, Hong Kong, India, Iran, Italy, Japan, S. Korea, Malaysia, Mexico, The Netherlands, New Caledonia, New Zealand, San Marino, Singapore, Slovenia, Spain, Switzerland, Taiwan, Thailand, Tunisia, Turkey, UK, USA, and Vietnam. This clearly reflects the truly international stature of the PAKDD conference.
After an initial screening by the Program Committee Chairs, in which papers that did not conform to the submission guidelines or that were deemed not worthy of further review were identified, 60 papers were rejected with a brief explanation for the decision. The remaining 352 papers were rigorously reviewed by at least three reviewers each. The initial results were discussed among the reviewers and finally judged by the Program Committee Chairs. In some cases of conflict, additional reviews were sought. As a result of the deliberation process, only 42 papers (10.2%) were accepted as long presentations (25 minutes), and an additional 55 papers (13.3%) were accepted as short presentations (15 minutes). The total acceptance rate was thus about 23.5% across both categories.
The PAKDD 2010 conference program also included seven workshops: the Workshop on Data Mining for Healthcare Management (DMHM 2010), the Pacific Asia Workshop on Intelligence and Security Informatics (PAISI 2010), the Workshop on Feature Selection in Data Mining (FSDM 2010), the Workshop on Emerging Research Trends in Vehicle Health Management (VHM 2010), the Workshop on Behavior Informatics (BI 2010), the Workshop on Data Mining and Knowledge Discovery for e-Governance (DMEG 2010), and the Workshop on Knowledge Discovery for Rural Systems (KDRS 2010).
The conference would not have been successful without the support of the Program Committee members (164), external reviewers (195), Conference Organizing Committee members, invited speakers, tutorial presenters, workshop organizers, authors, and the conference attendees. We highly appreciate the conscientious reviews provided by the Program Committee members.
Honorary Chair
Rajeev Sangal IIIT Hyderabad, India
General Chairs
Jaideep Srivastava University of Minnesota, USA
Masaru Kitsuregawa University of Tokyo, Japan
P. Krishna Reddy IIIT Hyderabad, India
Workshop Chair
Pabitra Mitra IIT Kharagpur, India
Tutorial Chairs
Kamal Karlapalem IIIT Hyderabad, India
Publicity Chairs
Arnab Bhattacharya IIT Kanpur, India
Publication Chair
Vikram Pudi IIIT Hyderabad, India
Program Committee
External Reviewers
Abdul Nizar
Abhinav Mishra
Alessandra Raffaeta
Aminul Islam
Andrea Tagarelli
Anitha Varghese
Ankit Agrawal
Anuj Mahajan
Anupam Bhattacharjee
Atul Saroop
Blaz Novak
Brian Ruttenberg
Bum-Soo Kim
Carlo Mastroianni
Carlos Ferreira
Carmela Comito
Cha Lun Li
Chandra Sekhar Chellu
Organized by
Sponsoring Institutions
AFOSR, USA
ONRG, USA
Table of Contents – Part II
1 Introduction
In various applications, such as gene expression analysis and image retrieval, one is often confronted with high-dimensional data [1]. Dimension reduction, which maps data points in a high-dimensional space into a low-dimensional space, is thus viewed as one of the most crucial preprocessing steps of data analysis. Dimension reduction methods can be divided into three categories: supervised ones [2], unsupervised ones [3], and semi-supervised ones [4]. The input data in these three categories are labeled data, unlabeled data, and a mixture of both, respectively. In a typical real-world application, only a small number of labeled data points are available, due to the high cost of obtaining them [4]. Hence semi-supervised dimension reduction may be considered the best fit for this practical setting. Instead of labeled data points, some semi-supervised methods assume pairwise constraints, since it is easier for experts to specify them than to assign class labels to data points. More specifically, pairwise constraints consist of must-link constraints and cannot-link constraints. The pair of data points in
M.J. Zaki et al. (Eds.): PAKDD 2010, Part II, LNAI 6119, pp. 1–13, 2010. © Springer-Verlag Berlin Heidelberg 2010
a must-link constraint shares the same class label, while the pair of data points in a
cannot-link constraint is given different class labels.
From another viewpoint, dimension reduction methods can be divided into nonlinear and linear ones. The former allow a nonlinear transformation in the mapping, while the latter restrict themselves to linear transformations. We consider a complex distribution in which the points are spread over multiple subclasses; in other words, the data points of one class form several separated clusters. A nonlinear method has a higher degree of freedom and hence can handle data with such a complex distribution effectively, while a linear method tends to be incompetent in such a case.
In this paper, we restrict our attention to linear semi-supervised dimension reduction for data of multiple subclasses with pairwise constraints. Previous relevant methods [5,6,7,8] implicitly assume that a class consists of a single cluster. If the points are of multiple subclasses, handling the pairwise constraints so as to project the points into multiple subclasses in the transformed space is challenging for linear dimension reduction. For a deeper analysis, we classify must-link constraints into two categories. If the two points in a must-link constraint reside in a single subclass, we call it an intra-subclass must-link constraint. On the contrary, if the two points in a must-link constraint come from different subclasses, we call it an inter-subclass must-link constraint. We attribute the improper behavior of current linear methods to the fact that inter-subclass must-link constraints most probably confuse their discriminant criteria. The problem arising from inter-subclass must-link constraints is also encountered in constraint transformation. For instance, the method in [9] transforms multiple must-link constraints, which are connected via points in two different classes, into a cannot-link constraint between the centroids of the points of the two classes. This method fails to give a comprehensible meaning if the points belong to different subclasses, because the centroids may fall into the region of another class.
To overcome the above problems, we propose SODRPaC, which consists of two steps. In the first step, must-link constraints that satisfy several conditions are transformed into cannot-link constraints, and the remaining must-link constraints are deleted. The idea behind this step is to reduce the harm caused by inter-subclass must-link constraints while exploiting the must-link constraint information as much as possible, by respecting the cluster assumption [10]: nearby points on the same manifold structure in the original space are likely to belong to the same class. In the second step, we obtain a projection mapping by inventing a new discriminant criterion for dimension reduction, which is suitable for data of multiple subclasses, and by employing manifold regularization [11], which is helpful for discovering the local structure of the data.
The problem setting is defined as follows. We are given a set of $N$ points $X = \{x_1, x_2, \ldots, x_N\}$, where $x_i$ represents a point in a $d$-dimensional space, a set of must-link constraints $M = \{m_1, m_2, \ldots, m_{N_{ML}}\}$, and a set of cannot-link constraints $C = \{c_1, c_2, \ldots, c_{N_{CL}}\}$. Here $m_i$ consists of a pair of points belonging to the same class, while $c_i$ consists of a pair of points belonging to different classes. The output is a $d \times l$ projection matrix $W$ that maps each point into an $l$-dimensional space.
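As a concrete illustration, the following minimal Python sketch shows one possible in-memory representation of this setting; all names, shapes, and values here are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Illustrative representation of the problem setting (not the authors' code).
rng = np.random.default_rng(0)
N, d, l = 100, 10, 2
X = rng.normal(size=(N, d))        # N points in d-dimensional space

# Constraints are unordered index pairs into X.
must_link = [(0, 5), (2, 7)]       # pairs known to share a class label
cannot_link = [(0, 3), (4, 9)]     # pairs known to differ in class label

# The sought output: a d x l projection matrix W mapping x to y = W^T x.
W = rng.normal(size=(d, l))        # stand-in for the learned projection
Y = X @ W                          # N x l matrix of projected points
```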
[Figure 1: two scatter plots, (a) and (b), of two-dimensional Gaussian data; the legend compares SODRPaC, NPSSDR, SSDR, CLPP, CMM, and the proposed Discriminant Criterion.]
Fig. 1. Motivating examples. The data points are of Gaussian distribution. In (a), the blue and
red points are distributed in different clusters. In (b), the red points reside in different subclasses.
Must-link constraints and cannot-link constraints are denoted by black solid and dashed lines,
respectively.
Fig. 1 presents the motivating examples, where $d = 2$ and $l = 1$. The task for dimension reduction here is thus to project the two-dimensional data onto a line, where the points from different classes can be differentiated. A horizontal line is supposed to be the best projection, while a vertical one is the worst. To better illustrate the motivation of our method, previous relevant methods are first reviewed. In the aspect of pairwise constraints, SSDR [5] and CMM [6] maximize the average distance between the points in cannot-link constraints while simultaneously minimizing the average distance between the points in must-link constraints. We can see that minimizing the average distance between the points in must-link constraints is reasonable in the case shown in Fig. 1a, where all the must-link constraints are intra-subclass must-link constraints. However, it interferes with maximizing the average distance between the points in cannot-link constraints in the case shown in Fig. 1b, where all the must-link constraints are inter-subclass must-link constraints. CLPP [7] builds an affinity matrix, each entry of which indicates the similarity between two points. To utilize the constraint information, the affinity matrix is revised by setting the similarity degree between non-neighboring points involved in pairwise constraints. For example, given a must-link constraint, the similarity degree between the two points is revised to 1, indicating that the two points are close (similar) to each other, regardless of whether the two points are actually distant (dissimilar). If the must-link constraint is an inter-subclass one, the two points are not geometrically near each other, and this
arbitrary updating may damage the geometrical structure of the data points. This problem is also confronted by NPSSDR [8]. The above analysis explains why CMM, SSDR, CLPP and NPSSDR are capable of obtaining excellent performance in the case shown in Fig. 1a, while they fail to reach the same fine performance in the multiple-subclass case shown in Fig. 1b.
In light of these observations, we argue that inter-subclass must-link constraints are probably harmful to the discriminant criteria of existing methods. For this reason, we attempt to design a new discriminant criterion that behaves appropriately in the case of multiple subclasses. The new discriminant criterion, marked as 'Discriminant Criterion', obtains almost the same performance as the others in Fig. 1a and can even outperform the previous methods in Fig. 1b. Moreover, manifold regularization is helpful for discovering the local structure of data, which is considered one of the most principal characteristics of data of multiple subclasses [12]. We therefore make the new discriminant criterion and the manifold regularization work together in a collaborative way. Fig. 1b also demonstrates that our method SODRPaC, which combines the new discriminant criterion and the manifold regularization, obtains the best performance.
connected. Under the cluster assumption, it is natural to consider two nearby points as another form of must-link constraint, so that we have more opportunities to transform must-link constraints into cannot-link constraints. In this paper, we employ shared nearest neighbors (SNN) [13] to formalize the notion of 'nearby' points. The set of shared nearest neighbors is denoted by $N^S = \{N^S_{x_1}, N^S_{x_2}, \ldots, N^S_{x_N}\}$, where $N^S_{x_i} = \{\{x_i, x_j\} \mid x_i \in N(x_j),\ x_j \in N(x_i)\}$ and $N(x_i)$ denotes the set of $k$ nearest neighbors of $x_i$. Let $|N^S|$ be the number of pairs of shared nearest neighbors, where $|\cdot|$ denotes the cardinality of a set. The SNN value between $x_i$ and $x_j$ is defined as the number of points shared by their neighborhoods, $SNN(i, j) = |N(x_i) \cap N(x_j)|$. The larger the SNN value between two points, the closer the two points are. We design an $N \times N$ matrix $L$ to specify the reliability of cannot-link constraints, which can also be regarded as the trust placed in them. Supposing that all the previously specified constraints are correct, the reliabilities of the previously given cannot-link constraints and of the cannot-link constraints generated from must-link constraints are set to 1. The reliabilities of the cannot-link constraints generated from shared nearest neighbors are set to the similarities between the shared nearest neighbors, because a transformation that employs shared nearest neighbors is considered less trustworthy than one that uses must-link constraints. We believe it is natural to take the similarity between the shared nearest neighbors as a measure of this trust. For example, given a pair of shared nearest neighbors $\{x_i, x_j\}$, we represent the reliability of a cannot-link constraint generated from it by a Gaussian kernel, a simple kernel that has been widely applied in many research fields. The reliability is formulated as $\theta(x_i, x_j) = \exp(-\|x_i - x_j\|^2/\gamma)$, where $\|\cdot\|$ denotes the Euclidean norm and $\gamma$ is the kernel parameter. Note that, for convenient access to the matrix $L$, given a cannot-link constraint $c = \{x_i, x_j\}$, we use $L(c)$ to denote the entries $L_{ij}$ and $L_{ji}$; thus $L$ is a symmetric matrix.
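The SNN construction and the Gaussian reliability can be realized in a few lines. The following Python sketch (function and variable names are our own, illustrative choices) computes the mutual k-NN pairs, their SNN values, and the kernel reliability $\theta$:

```python
import numpy as np

def knn_sets(X, k):
    """Index sets of the k nearest neighbors of each row of X
    (Euclidean distance, excluding the point itself)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    return [set(np.argsort(row)[:k]) for row in d2]

def shared_nearest_neighbors(X, k):
    """Pairs {x_i, x_j} lying in each other's k-NN sets, mapped to the
    SNN value |N(x_i) & N(x_j)|."""
    nbrs = knn_sets(X, k)
    pairs = {}
    for i in range(len(X)):
        for j in nbrs[i]:
            if j > i and i in nbrs[j]:
                pairs[(i, j)] = len(nbrs[i] & nbrs[j])
    return pairs

def theta(xi, xj, gamma):
    """Gaussian-kernel reliability of a cannot-link constraint generated
    from the shared-nearest-neighbor pair {x_i, x_j}."""
    return float(np.exp(-np.sum((xi - xj) ** 2) / gamma))
```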
[Figure 2: four panels over the points a, b, c, d, e, f, illustrating the four transformation cases.]
Fig. 2. Four simple cases for the transformation. The previously specified must-link constraints
and cannot-link constraints are denoted by the black solid and dashed lines, respectively. The
shared nearest neighbor is presented as the blue dotted line. The red dash-dotted line specifies the
new cannot-link constraint.
Fig. 2 shows four fundamental scenarios of the transformation. The sets {a, b, e, f} and {c, d} represent different classes of data points. We explain these four scenarios in a metaphorical way, where a must-link constraint is taken as a friend relationship and a cannot-link constraint as an enemy relationship. From the viewpoint of point 'a', a friend relationship, say {a, e}, is given, as shown in Fig. 2a; this is called the basic form. If 'd' is my enemy, then instead of keeping my friend 'e', we consider 'e' to be the enemy of my enemy 'd'. Fig. 2b shows an extension of the basic form with an enemy's friend rule: if my enemy 'd' has a friend 'c', then 'c' is the enemy of my friend 'e' and of me. In these two cases, the reliabilities of the new enemy relationships are set to 1. Fig. 2c presents another extension of the basic form, called the proximity form: if I have no enemy but my neighbor 'b' has an enemy 'd', then 'd' is the enemy of my friend 'e' and of me. Fig. 2d shows an extension of the proximity form with the enemy's friend rule. Note that, in the latter two cases, the reliabilities of the new enemy relationships are set to the similarity between my neighbor 'b' and me. Pseudocode summarizing these four cases is given in Algorithm 1; a code sketch of the same rules follows below.
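Since Algorithm 1 is not reproduced in this excerpt, here is a hedged Python sketch of how the four cases might be implemented under the description above; the helper structure and names are our own assumptions, and each must-link pair is processed from one endpoint's viewpoint only for brevity.

```python
import numpy as np

def transform_constraints(must, cannot, snn_pairs, X, gamma):
    """Sketch of the four transformation cases of Fig. 2 (not the authors'
    Algorithm 1). Returns a dict mapping new cannot-link pairs, stored as
    frozensets, to their reliabilities for the matrix L."""
    friends, enemies = {}, {}
    for i, j in must:
        friends.setdefault(i, set()).add(j)
        friends.setdefault(j, set()).add(i)
    for i, j in cannot:
        enemies.setdefault(i, set()).add(j)
        enemies.setdefault(j, set()).add(i)

    new_cl = {}
    for a, e in must:
        if enemies.get(a):
            for d in enemies[a]:
                # basic form (Fig. 2a): my friend e becomes my enemy d's enemy
                new_cl[frozenset((e, d))] = 1.0
                # enemy's friend rule (Fig. 2b): d's friend c is our enemy
                for c in friends.get(d, ()):
                    new_cl[frozenset((c, a))] = 1.0
                    new_cl[frozenset((c, e))] = 1.0
        else:
            # proximity form (Fig. 2c/2d): borrow an enemy d from a shared
            # nearest neighbor b, with reliability theta(a, b)
            for i, j in snn_pairs:
                if a not in (i, j):
                    continue
                b = j if i == a else i
                rel = float(np.exp(-np.sum((X[a] - X[b]) ** 2) / gamma))
                for d in enemies.get(b, ()):
                    new_cl[frozenset((d, a))] = rel
                    new_cl[frozenset((d, e))] = rel
                    for c in friends.get(d, ()):   # enemy's friend (Fig. 2d)
                        new_cl[frozenset((c, a))] = rel
                        new_cl[frozenset((c, e))] = rel
    return new_cl
```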
nearest neighbors closer in the transformed space. Furthermore, the pair of points in a shared-nearest-neighbor pair probably resides in the same subclass, so this relaxation does not suffer from the harm of inter-subclass must-link constraints. Therefore, the discriminant criterion, which maximizes the average distance between the points in cannot-link constraints and minimizes the average distance between shared nearest neighbors, is expected to be suitable for data of multiple subclasses.
Suppose that $x_i$ and $x_j$ are projected to the images $y_i^k$ and $y_j^k$ along the direction $w_k$. The new discriminant criterion is defined as follows:

$$\partial(w_k) = \sum_{i,j:\{x_i,x_j\}\in C} \frac{\|y_i^k - y_j^k\|^2}{2|C|}\, L_{ij} \;-\; \sum_{i,j:\{x_i,x_j\}\in N^S} \frac{\|y_i^k - y_j^k\|^2}{2|N^S|}\, H_{ij} \qquad (1)$$
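Before turning to the matrix form below, here is a minimal NumPy sketch (hypothetical names) that evaluates Eq. 1 directly from the constraint pairs; it assumes L and H are dense N x N arrays of reliabilities and SNN similarities:

```python
import numpy as np

def criterion_value(wk, X, cl_pairs, L, snn_pairs, H):
    """Direct per-pair evaluation of Eq. 1 for a single direction w_k.
    cl_pairs and snn_pairs are iterables of index pairs into X."""
    y = X @ wk                                     # y_i^k = w_k^T x_i
    sep = sum(L[i, j] * (y[i] - y[j]) ** 2 for i, j in cl_pairs)
    comp = sum(H[i, j] * (y[i] - y[j]) ** 2 for i, j in snn_pairs)
    return sep / (2 * len(cl_pairs)) - comp / (2 * len(snn_pairs))
```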
Inspired by the local scatter [14], the latter part of the right-hand side of Eq. 1 can be regarded as the compactness of the shared nearest neighbors, since two points are more likely to be close if the SNN value between them is large. The difference from the local scatter lies in the fact that a weight matrix $H$, which encodes the similarity between shared nearest neighbors, is employed. Since SNN has the robust property that the side effect caused by noisy points is reduced to some degree, the compactness of shared nearest neighbors is more reliable than the local scatter. The compactness of shared nearest neighbors can also be rewritten as follows:
$$\sum_{i,j:\{x_i,x_j\}\in N^S} \frac{\|y_i^k - y_j^k\|^2}{2|N^S|}\, H_{ij} = \frac{1}{2|N^S|}\sum_i\sum_j H_{ij}\,(w_k^T x_i - w_k^T x_j)^2$$
$$= w_k^T \left[\frac{1}{2|N^S|}\sum_i\sum_j H_{ij}\,(x_i - x_j)(x_i - x_j)^T\right] w_k = w_k^T S_1 w_k \qquad (3)$$

where $S_1 = \frac{1}{2|N^S|}\sum_i\sum_j H_{ij}(x_i - x_j)(x_i - x_j)^T$. $S_1$ can then be computed as follows:

$$S_1 = \frac{1}{2|N^S|}\left(\sum_i\sum_j H_{ij}\, x_i x_i^T + \sum_i\sum_j H_{ij}\, x_j x_j^T - 2\sum_i\sum_j H_{ij}\, x_i x_j^T\right)$$
$$= \frac{1}{|N^S|}\left(\sum_i D_{ii}\, x_i x_i^T - \sum_i\sum_j H_{ij}\, x_i x_j^T\right) = \frac{1}{|N^S|}\left(XDX^T - XHX^T\right) \qquad (4)$$
where $D$ is a diagonal matrix whose entries are the column sums of $H$, $D_{ii} = \sum_j H_{ij}$. Similarly, the first part of the right-hand side of Eq. 1 can be reformulated as:

$$\sum_{i,j:\{x_i,x_j\}\in C} \frac{\|y_i^k - y_j^k\|^2}{2|C|}\, L_{ij} = \frac{1}{2|C|}\sum_i\sum_j L_{ij}\,(w_k^T x_i - w_k^T x_j)^2 = w_k^T S_2 w_k \qquad (5)$$

where $S_2 = \frac{1}{|C|}\left(XGX^T - XLX^T\right)$ and $G$ is a diagonal matrix whose entries are the column sums of $L$, $G_{ii} = \sum_j L_{ij}$. Then $\partial(w_k)$ can be written briefly as $\partial(w_k) = w_k^T (S_2 - S_1)\, w_k$.
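The closed forms of Eqs. 4 and 5 translate directly into a few NumPy lines. The sketch below (names are ours; X is taken row-wise and transposed to match the derivation, where points are columns) computes both scatter matrices:

```python
import numpy as np

def scatter_matrices(X_rows, H, L, n_snn, n_cl):
    """S1 and S2 in the closed forms of Eqs. 4 and 5. X_rows is N x d;
    points are transposed into columns to match the derivation."""
    X = X_rows.T                                  # d x N, points as columns
    D = np.diag(H.sum(axis=0))                    # D_ii = column sums of H
    G = np.diag(L.sum(axis=0))                    # G_ii = column sums of L
    S1 = (X @ D @ X.T - X @ H @ X.T) / n_snn
    S2 = (X @ G @ X.T - X @ L @ X.T) / n_cl
    return S1, S2
```

With these matrices, the per-pair evaluation of Eq. 1 and the quadratic form $w_k^T(S_2 - S_1)w_k$ should agree, which gives a convenient correctness check for an implementation.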
4 Evaluation by Experiments
4.1 Experimental Setup
We use public data sets to evaluate the performance of SODRPaC. Table 1 summarizes the characteristics of the data sets. All the data come from the UCI repository [15] except for GCM [16], which is of very high dimensionality. For the 'monks-1', 'monks-2', and 'monks-3' data, we combined the training and test sets into a single set. For the 'letter' data, we chose the letters 'A', 'B', 'C', and 'D' from the training and test sets by randomly picking 100 samples of each letter, and then assembled them into a single set.
Table 1. Characteristics of the data sets

Data set       Dimension   Instance   Class
monks-1        6           556        2
monks-2        6           601        2
monks-3        6           554        2
letter(ABCD)   16          800        4
heart          13          270        4
GCM            16063       198        14
As shown in Eq. 10, $\lambda$ is the parameter that controls the balance between $P - Q$ and $M$. In this experimental setting, the parameter $\lambda$ is searched from $2^\alpha$, where $\alpha \in \{\alpha \mid -5 \le \alpha \le 10,\ \alpha \in \mathbb{Z}\}$. A weighted 5-nearest-neighbor graph is employed to construct the manifold regularizer. In addition, the kernel parameter $\gamma$ follows the suggestion in [17]: it is searched from the grid $\{\frac{\delta^2}{16}, \frac{\delta^2}{8}, \frac{\delta^2}{4}, \frac{\delta^2}{2}, \delta^2, 2\delta^2, 4\delta^2, 8\delta^2, 16\delta^2\}$, where $\delta$ is the mean norm of the data. The parameter $\lambda$ and the manifold regularizer are
then optimized by means of 5-fold cross-validation. For the other competing methods, we follow the parameter settings recommended by their authors, which are considered optimal. Unless otherwise specified, the number of must-link constraints is always set equal to that of cannot-link constraints, as an equal balance between must-link and cannot-link constraints is favorable for the existing methods. In addition, the value of $k$ for searching shared nearest neighbors is set to 3. The reason for this setting is to guarantee that the pairs of points in shared nearest neighbors reside in the same subclass, and to give the constraint transformation more opportunities to be performed. In our experiments, must-link constraints and cannot-link constraints are selected according to the ground truth of the data labels.
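For concreteness, the two search grids described above can be built as follows; this is an illustrative reconstruction, with placeholder data standing in for an actual data set:

```python
import numpy as np

# Illustrative reconstruction of the search grids described above.
rng = np.random.default_rng(0)
X = rng.normal(size=(270, 13))                    # placeholder, heart-sized data

lambdas = [2.0 ** a for a in range(-5, 11)]       # lambda = 2^alpha, -5 <= alpha <= 10

delta = np.linalg.norm(X, axis=1).mean()          # mean norm of the data
gammas = [delta ** 2 * s for s in
          (1/16, 1/8, 1/4, 1/2, 1, 2, 4, 8, 16)]  # gamma grid suggested in [17]
```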
[Figure 3: panels plotting Accuracy against the Number of Constraints (20 to 140) for SODRPaC, NPSSDR, SSDR, CLPP, CMM, and PCA.]
Fig. 3. The performance with different numbers of constraints (d: reduced dimensionality)
The reason is probably that the ability to discover the local structure of the data points does not help CLPP outperform PCA. However, our SODRPaC, which also utilizes manifold regularization for its property of discovering the local structure, obtains the best performance. We can thus judge that the new discriminant criterion boosts the performance. Fig. 3d also shows that the performance of SSDR decreases to some extent as the number of constraints increases. The possible reason is that increasing the number of available constraints raises the chance that inter-subclass must-link constraints exist, which deteriorates the optimization of the dimension reduction. It should also be pointed out that SODRPaC does not significantly outperform the other methods. A possible reason is that the Euclidean distance, which is employed to formulate the similarity between points in the original space, is likely to be meaningless in a high-dimensional space.
We then examine the relative impact of must-link constraints and cannot-link constraints on the performance of SODRPaC. In this experiment, given 150 available constraints, the ratio of must-link constraints to cannot-link constraints is varied from 0 to 1. Fig. 4 shows that SODRPaC behaves much more smoothly than the others as the ratio changes. This indicates that SODRPaC is more robust than the other semi-supervised methods to the imbalance between must-link and cannot-link constraints. As shown in Fig. 4b and Fig. 4f, SODRPaC presents an obvious degradation of
[Figure 4: panels plotting Accuracy against the must-link ratio; the visible panel titles are (d) heart (d=10), (e) letter (abcd) (d=10), and (f) GCM (d=150).]
Fig. 4. The performance with the change of the must-link ratio (d: reduced dimensionality)
performance when all the constraints are must-link ones. The most probable reason is that the transformation from must-link constraints into cannot-link constraints cannot be performed when the necessary cannot-link constraints are lacking. This behavior is consistent with the conclusion demonstrated in [9] that cannot-link constraints are more important than must-link constraints in guiding dimension reduction.
As indicated in the previous sections, the parameter $\lambda$, which controls the balance between $P - Q$ and $M$, and the factor $\gamma$, which is related to computing the similarity between two points, influence the performance of SODRPaC. An analysis of these two parameters is necessary to provide a guideline on how to choose their values. PCA is employed as the baseline because the existing methods do not have these two parameters simultaneously. Because of the different scales of $\lambda$ and $\gamma$, the $\lambda$-axis and $\gamma$-axis are plotted as $\lambda/(1+\lambda)$ and $\gamma/(1+\gamma)$, respectively, so the axis values lie in the interval $(0, 1)$. We empirically uncover two interesting patterns for most of the data sets and reduced dimensionalities. There are two regions where SODRPaC is more likely to obtain its best performance. The first region is where $\lambda/(1+\lambda)$ is small, as shown in Fig. 5a, Fig. 5b, Fig. 5c, Fig. 5d, Fig. 5e and Fig. 5g. In this situation, varying $\gamma/(1+\gamma)$ does not cause a dramatic change in the performance of SODRPaC. The second region is where both $\lambda/(1+\lambda)$ and $\gamma/(1+\gamma)$ are large, as shown in Fig. 5b, Fig. 5e, Fig. 5f, and Fig. 5h.
[Figure 5: accuracy of SODRPaC against the PCA baseline over the grid of λ/(1+λ) and γ/(1+γ); panels: (a) monks-1 (d=3), (b) monks-1 (d=4), (c) monks-2 (d=3), (d) monks-2 (d=4), (e) monks-3 (d=3), (f) monks-3 (d=4), (g) letter(abcd) (d=3), (h) letter(abcd) (d=15).]
References
1. Parsons, L., Haque, E., Liu, H.: Subspace Clustering for High Dimensional Data: A Review. SIGKDD Explorations 6(1), 90–105 (2004)
2. Fukunaga, K.: Introduction to Statistical Pattern Recognition. Academic Press, San Diego (1990)
3. Jolliffe, I.: Principal Component Analysis. Springer, New York (1986)
4. Zhu, X.: Semi-supervised Learning Literature Survey. Technical Report Computer Sciences 1530, University of Wisconsin-Madison (2007)
5. Zhang, D., Zhou, Z.H., Chen, S.: Semi-supervised Dimensionality Reduction. In: Proceedings of the 7th SIAM International Conference on Data Mining (2007)
6. Wang, F., Chen, S., Li, T., Zhang, C.: Semi-supervised Metric Learning by Maximizing Constraint Margin. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management, pp. 1457–1458 (2008)
7. Cevikalp, H., Verbeek, J., Jurie, F., Klaser, A.: Semi-supervised Dimensionality Reduction Using Pairwise Equivalence Constraints. In: International Conference on Computer Vision Theory and Applications (VISAPP), pp. 489–496 (2008)
8. Wei, J., Peng, H.: Neighborhood Preserving Based Semi-supervised Dimensionality Reduction. Electronics Letters 44, 1190–1191 (2008)
9. Tang, W., Xiong, H., Zhong, S., Wu, J.: Enhancing Semi-supervised Clustering: A Feature Projection Perspective. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 707–716 (2007)
10. Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with Local and Global Consistency. In: Advances in Neural Information Processing Systems, pp. 321–328 (2004)
11. Belkin, M., Niyogi, P.: Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples. Journal of Machine Learning Research 7, 2399–2434 (2006)
12. Sugiyama, M.: Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis. Journal of Machine Learning Research 8, 1027–1061 (2007)
13. Ertöz, L., Steinbach, M., Kumar, V.: A New Shared Nearest Neighbor Clustering Algorithm and its Applications. In: Proceedings of the Workshop on Clustering High Dimensional Data and its Applications, 2nd SIAM International Conference on Data Mining (2002)
14. Yang, J., Zhang, D., Yang, J.Y., Niu, B.: Globally Maximizing, Locally Minimizing: Unsupervised Discriminant Projection with Applications to Face and Palm Biometrics. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(4), 650–664 (2007)
15. Blake, C., Keogh, E., Merz, C.J.: UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html
16. Ramaswamy, S., Tamayo, P., Rifkin, R., et al.: Multiclass Cancer Diagnosis Using Tumor Gene Expression Signatures. Proceedings of the National Academy of Sciences 98(26), 15149–15154 (2001)
17. He, X., Yan, S., Hu, Y., Niyogi, P.: Face Recognition Using Laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 316–327 (2005)