
Reduce Data Anomalies Using Manifold Learning

Hima Nikafshan Rad
College of Computer Science
Tabari Institute of Higher Education
Babol, Iran
[email protected]

Homayun Motameni
Department of Computer Engineering
Islamic Azad University, Sari Branch
Sari, Iran
motameni@iausari.ac.ir

Abstract—Manifold learning has recently emerged as a powerful method for dimensionality reduction. Most studies and theoretical results in this field focus on preserving distances; empirical results, however, remain sparse. In this paper we select the important features of the data, assign a rank to high-value data and a penalty to low-value or similar data, and then feed them into the manifold learning algorithm LSML. Next, a general method for dealing with both normal and anomalous data is discussed. Anomalies that occur on low-value data are removed by the dimensionality reduction, while anomalies that occur on high-value data are retrieved. The proposed error function is divided by the distance between the normal data point and the anomalous data point, and a data penalty is added; this helps remove the anomalies. The method provides a way to map a number of points in a high-dimensional space into a low-dimensional space with only small distortion of the distances between the points.

Keywords—Manifold Learning; Anomaly Detection; Ranking; Feature

I. INTRODUCTION

A number of techniques have been developed for dealing with high-dimensional data sets that lie on or near a smooth, low-dimensional nonlinear manifold. Such data sets arise when the modes of variability of the data are far fewer than the dimension of the input space. Unsupervised manifold learning refers to the problem of recovering the structure of a manifold from a set of unordered sample points. Manifold learning is often equated with dimensionality reduction, where the goal is to find an embedding or 'unrolling' of the manifold into a lower-dimensional space such that certain relationships between points are preserved. Locally Smooth Manifold Learning (LSML) attempts to learn a warping function $W$ with $d$ degrees of freedom that can take any point on the manifold and generate its neighbors. LSML recovers a first-order approximation of $W$, and by making smoothness assumptions on $W$ it can generalize to unseen points [1].

This intuition is formalized using the concept of a manifold: the data set lies along a low-dimensional manifold embedded in a high-dimensional space, where the low-dimensional space reflects the underlying parameters and the high-dimensional space is the feature space. Trying to detect this manifold structure in a data set is referred to as manifold learning.

Definition 1. A homeomorphism is a continuous function whose inverse is also a continuous function.

Definition 2. A d-dimensional manifold $\mathcal{M}$ is a set that is locally homeomorphic with $\mathbb{R}^d$. That is, for each $x \in \mathcal{M}$, there is an open neighborhood around $x$, $N_x$, and a homeomorphism $f: N_x \to \mathbb{R}^d$. These neighborhoods are referred to as coordinate patches, and the map is referred to as a coordinate chart. The image of the coordinate charts is referred to as the parameter space.

Definition 3. A smooth (or differentiable) manifold is a manifold such that each coordinate chart is differentiable with a differentiable inverse (i.e., each coordinate chart is a diffeomorphism) [2].

Problem: Given points $x_1, \ldots, x_n \in \mathbb{R}^D$ that lie on a d-dimensional manifold $\mathcal{M}$ that can be described by a single coordinate chart $f: \mathcal{M} \to \mathbb{R}^d$, find $y_1, \ldots, y_n \in \mathbb{R}^d$, where $y_i \stackrel{\text{def}}{=} f(x_i)$.

For example, suppose we have a collection of frames taken from a video of a person rotating his or her head. The dimensionality of the data set is equal to the number of pixels in a frame, which is generally very large. However, the images are in fact controlled by only a couple of degrees of freedom. Manifold learning methods have been successfully applied to a number of similar video and image data sets [Ple03].
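
To make the setup concrete, here is a minimal NumPy sketch (ours, not taken from the paper) of the situation the Problem statement describes: points $x_1, \ldots, x_n \in \mathbb{R}^D$ generated from a single hidden coordinate $y \in \mathbb{R}^d$ with $d = 1$, together with the neighbor sets $N_i$ that LSML relies on later. The chart, the sizes n, D, k, and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, k = 200, 50, 8                               # illustrative sizes, not the paper's data
y = rng.uniform(0.0, 2 * np.pi, size=n)            # hidden coordinates in R^d, with d = 1

# A hypothetical smooth chart from R^d to R^D: embed a curve in the first two coordinates.
X = np.zeros((n, D))
X[:, 0] = np.cos(y)
X[:, 1] = np.sin(y)
X += 0.01 * rng.standard_normal((n, D))            # small off-manifold noise

# Neighbour sets N_i via brute-force kNN (eps-NN variants work the same way).
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
np.fill_diagonal(dists, np.inf)
neighbors = np.argsort(dists, axis=1)[:, :k]       # N_i = indices of the k nearest points
print(neighbors.shape)                             # (200, 8)
```

Recovering the hidden coordinate y from X alone, with the distances between neighboring points approximately preserved, is exactly the embedding problem stated above.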



Abbreviation    Definition
D               Dimension of the input space
d               Dimension of the manifold
ℳ               Continuous bijective mapping
W               Warping function
n               Number of data points
x^i             Data points
N_i             Set of neighbors
ε^{ij}          Free parameters
l               Labeled instances
u               Unlabeled instances
λ               The penalty
W_{ij}          High weight in the manifold
K               Gram matrix
||X||_2^2       Squared L2 norm of X
||A||_F^2       Squared Frobenius norm of A

II. MANIFOLD LEARNING APPROACH

A. LSML

Here we give details of the general version of LSML (Dollár et al., 2006). We define a new error function and motivate the use of a new regularization term. The minimization of the error is made more efficient, although some details are omitted for space. For an introduction to LSML see Figure 1 [3].

Fig. 1. Problem Formulation

B. Motivation and Error Function

Let D be the dimension of the input space, and assume the data lies on a smooth d-dimensional manifold ($d \ll D$). For simplicity, assume that the manifold is diffeomorphic to a subset of $\mathbb{R}^d$, meaning that it can be endowed with a global coordinate system (this requirement can easily be relaxed) and that there exists a continuous bijective mapping $\mathcal{M}$ that converts coordinates $y \in \mathbb{R}^d$ to points $x \in \mathbb{R}^D$ on the manifold. The goal is to learn a warping function $W$ that can take a point on the manifold and return any neighboring point on the manifold, capturing all the modes of variation of the data. Define $W(x, \epsilon) \equiv \mathcal{M}(y + \epsilon)$, where $y = \mathcal{M}^{-1}(x)$ and $\epsilon \in \mathbb{R}^d$. Taking the first-order approximation of the above gives $W(x, \epsilon) \approx x + H(x)\,\epsilon$, where each column $H_{\cdot k}(x)$ of the matrix $H(x)$ is the partial derivative of $\mathcal{M}$ with respect to $y_k$: $H_{\cdot k}(x) = \partial \mathcal{M}(y)/\partial y_k$. This approximation is valid for $\epsilon$ small enough. The goal of LSML is to learn the function $H_\theta : \mathbb{R}^D \to \mathbb{R}^{D \times d}$ parameterized by a variable $\theta$.

Only data points $x^i$ sampled from one or several manifolds are given. For each $x^i$, the set $N_i$ of neighbors is then computed (e.g., using variants of nearest neighbor such as kNN or $\epsilon$-NN), with the constraint that two points can be neighbors only if they come from the same manifold. The original formulation of LSML was based on the observation that if $x^j$ is a neighbor of $x^i$, then there exists an unknown $\epsilon^{ij}$ such that $W(x^i, \epsilon^{ij}) = x^j$, or, to a good approximation:

$$H_\theta(x^i)\,\epsilon^{ij} \approx \Delta^i_{\cdot j} \qquad (1)$$

where $\Delta^i_{\cdot j} = x^j - x^i$. An interpretation of the above is that $\Delta^i_{\cdot j}$ is the un-centered estimate of a directional derivative at $x^i$. However, $\Delta^i_{\cdot j}$ could also serve as the centered estimate of the directional derivative at $\bar{x}^{ij} \equiv \frac{x^i + x^j}{2}$:

$$H_\theta(\bar{x}^{ij})\,\epsilon^{ij} \approx \Delta^i_{\cdot j} \qquad (2)$$

Although the change is subtle, in practice the use of (2) provides a significant benefit, since the centered approximation of the derivative has no second-order error. Roughly speaking, (1) is valid if the manifold locally has a good linear approximation, while (2) is valid where a quadratic approximation holds [1], [4].
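
As an illustration of how the centered error (2) can be evaluated, the following sketch (ours, not the authors' implementation) solves the small least-squares problem for each $\epsilon^{ij}$ given a fixed $H_\theta$, evaluated at the midpoint $\bar{x}^{ij}$, and accumulates the squared residuals. The constant H_theta used in the check at the end is a hypothetical placeholder.

```python
import numpy as np

def lsml_centered_error(X, neighbors, H_theta):
    """X: (n, D) points; neighbors[i]: indices in N_i; H_theta: maps a point in R^D to a (D, d) matrix."""
    total = 0.0
    for i, N_i in enumerate(neighbors):
        for j in N_i:
            delta = X[j] - X[i]                              # Delta^i_j = x^j - x^i
            H = H_theta(0.5 * (X[i] + X[j]))                 # centered evaluation point of Eq. (2)
            eps, *_ = np.linalg.lstsq(H, delta, rcond=None)  # best eps^{ij} for this pair
            resid = H @ eps - delta
            total += float(resid @ resid)
    return total

# Tiny synthetic check with a hypothetical constant H_theta (the same D x d matrix everywhere).
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5))
neighbors = [np.argsort(np.linalg.norm(X - X[i], axis=1))[1:4] for i in range(len(X))]
H0 = rng.standard_normal((5, 2))
print(lsml_centered_error(X, neighbors, lambda x: H0))
```

In LSML proper, $H_\theta$ is a learned, smoothly varying function of position; keeping it fixed here simply isolates the per-pair least-squares step behind equation (2).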
C. Regularized Algorithms for Ranking and Manifold Learning

The research to date has tended to focus on multi-task learning in the machine learning community. The review in [5] finds that research in the transfer learning domain has focused on two directions:

a) How to best train the task-specific models with the additional information, if a problem is appropriate for transferring information between tasks.

b) Identifying situations where transferring such information is appropriate.

1) Semi-Supervised Learning Intuition

The intuition for this approach is that learning from related data is similar to semi-supervised learning. In both cases, we want to incorporate related data into our problem, but we are not sure about the labels of the instances from the related data, or how much they should contribute to the loss function compared to the labeled data for the task.

Regularized manifolds and other clustering approaches (Nigam et al., 2000; Belkin and Niyogi, 2003) try to estimate the labels based on a weighted combination of the labels of the closest labeled instances. The closest instances are chosen based on the transformed space representing the clusters or manifold. In the problem of learning from related tasks, we do not need to estimate the labels of the similar (or close in the distance-metric transformation) instances, because they are already labeled. As we have seen in the case of combining preference data, it sometimes helps to incorporate the related information from the instances of other tasks into the training of the task-specific models. The resulting objective is

$$\min \; \frac{1}{l} \sum_{k=1}^{n} \sum_{i=1}^{l+u} \sum_{j \in N_i} \left\| H_\theta(\tilde{x}^{ik})\,\epsilon^{ijk} - (\tilde{x}^{jk} - \tilde{x}^{ik}) \right\|_2^2 \;+\; \lambda \|f\|_F^2$$

where the unlabeled instances are scaled by $l/u$:

$$\tilde{H}(x^i) = H(x^i), \quad \tilde{x}^i = x^i, \quad \text{for } i \le l$$

$$\tilde{H}(x^i) = \left(\tfrac{l}{u}\right) H(x^i), \quad \tilde{x}^i = \left(\tfrac{l}{u}\right) x^i, \quad \text{for } l < i \le l+u$$

The minimizer has the form

$$f^*(x) = \sum_{i=1}^{l+u} \alpha_i^* K_\theta(x, x^i)$$

with Gram matrix $K \in \mathbb{R}^{(l+u)\times(l+u)}$, $K_{ij} = K_\theta(x^i, x^j)$, and

$$\tilde{H}(x^i) = \left[ H_1, \ldots, H_l, \left(\tfrac{l}{u}\right) H_{l+1}, \ldots, \left(\tfrac{l}{u}\right) H_{l+u} \right]$$

$$\alpha^* = \left( K + \lambda (l+u) I \right)^{-1} \tilde{H}$$

2) Penalized Laplacian

Laplacian RLSC algorithms (Belkin & Niyogi, 2004) have been used successfully in other semi-supervised learning settings. The loss function of the Laplacian RLSC penalizes the weighted deviation of the estimated function $f(x)$ for instances $i, j$ that fall close to each other in the geodesic space of a manifold (high weight $W_{i,j}$ in the manifold space). The manifold is approximated on both the labeled and the unlabeled data. The additional loss term accounting for the deviation is:

$$\sum_{i,j=1}^{l} V\!\left( H(x^i)\,\epsilon^{ij} - (x^j - x^i) \right)^2 W_{ij}$$

so that

$$f^* = \arg\min_f \; \frac{1}{l} \sum_{i=1}^{l} V(x^i, M_i, f) \;+\; \lambda_A \frac{1}{(u+l)^2} \sum_{i=1}^{l} \left( H(x^i)\,\epsilon^{ij} - (x^j - x^i) \right)^2 W_{ij}$$

$$\;= \arg\min_f \; \frac{1}{l} \sum_{i=1}^{l} V(x^i, M_i, f) \;+\; \lambda_A \|f\|_A^2 \;+\; \frac{\lambda_B}{(u+l)^2}\, f^{T} L f$$

where $L = D - W$ is the graph Laplacian.
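
The Laplacian term above reduces to evaluating the quadratic form $f^T L f$ with $L = D - W$. A short sketch follows, assuming a heat-kernel weight matrix W (a common choice; the paper does not specify one) and a fixed penalty weight lam_B:

```python
import numpy as np

def laplacian_penalty(X, f_vals, sigma=1.0, lam_B=0.1):
    """X: (l+u, D) points; f_vals: (l+u,) values of the estimated function f at those points."""
    n = len(X)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))           # heat-kernel weights (an assumed choice of W)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian L = D - W
    return lam_B / n ** 2 * float(f_vals @ L @ f_vals)   # (lambda_B / (u+l)^2) * f^T L f

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 4))                   # 30 = l + u points, purely illustrative
f_vals = X[:, 0] ** 2                              # any candidate f evaluated at the points
print(laplacian_penalty(X, f_vals))
```

Functions that vary strongly between points with large W_ij incur a large penalty, which is exactly the smoothness behaviour the Laplacian RLSC loss is meant to enforce.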
D. Feature Selection Method

1) Overview

Suppose the aim is to select $t$, $(1 \le t \le m)$, features from the full feature set $\{v_1, v_2, \ldots, v_m\}$. In our method we first define an importance score for each feature $v_i$, and define the similarity between any two features $v_i$ and $v_j$. Then we use an efficient algorithm to maximize the total importance scores and minimize the total similarity scores of a set of features [6], [7].

2) Importance of feature

MAP (mean average precision) is a measure of the precision of ranking results. It is assumed that there are two types of documents: positive and negative (relevant and irrelevant). Precision at $n$ measures the accuracy of the top $n$ results for a query:

$$p(n) = \frac{\text{number of positive instances within top } n}{n}$$

$$AP = \sum_{n=1}^{N} \frac{p(n) \cdot pos(n)}{\text{number of positive instances}}$$

where $n$ denotes position, $N$ denotes the number of documents retrieved, and $pos(n)$ denotes a binary function indicating whether the document at position $n$ is positive [8].

3) Similarity between features

In this work, we measure the similarity between any two features on the basis of their ranking results. That is, we regard each feature as a ranking model, and the similarity between two features is represented by the similarity between the ranking results that they produce. Many methods have been proposed to measure the distance between two ranking results (ranking lists), such as Spearman's footrule F, rank correlation R, and Kendall's $\tau$. In principle all of them can be used here, and in this paper we choose Kendall's $\tau$ as an example. The Kendall's $\tau$ value of query $q$ for any two features $v_i$ and $v_j$ can be calculated as follows:

$$\tau_q(v_i, v_j) = \frac{\#\{(d_s, d_t) \in D_q \mid d_s \prec_{v_i} d_t \;\&\; d_s \prec_{v_j} d_t\}}{\#\{(d_s, d_t) \in D_q\}}$$

where $D_q$ denotes the set of instance pairs $(d_s, d_t)$ in response to query $q$, $\#\{\cdot\}$ represents the number of elements in a set, and $d_s \prec_{v_i} d_t$ implies that instance $d_t$ is ranked ahead of instance $d_s$ by feature $v_i$. For a set of queries, the Kendall's $\tau$ values of all the queries are averaged, and the result $\tau(v_i, v_j)$ is used as the final similarity score between features $v_i$ and $v_j$. It is easy to see that $\tau(v_i, v_j) = \tau(v_j, v_i)$ holds [9], [10].

4) Optimization formulation

As mentioned above, we want to select those features with the largest total importance scores and the smallest total similarity scores. Mathematically, this can be formulated as follows:

$$\max \sum_i w_i x_i, \qquad \min \sum_{i \ne j} e_{i,j}\, x_i x_j$$

$$\text{s.t.} \quad x_i \in \{0, 1\}, \; i = 1, \ldots, m, \qquad \sum_i x_i = t$$

Here $t$ denotes the number of selected features, $x_i = 1$ (or 0) indicates that feature $v_i$ is selected (or not), $w_i$ denotes the importance score of feature $v_i$, and $e_{i,j}$ denotes the similarity between feature $v_i$ and feature $v_j$. In this paper we let $e_{i,j} = \tau(v_i, v_j)$, and obviously $e_{i,j} = e_{j,i}$.

5) RankNet

RankNet employs a neural network as the ranking function and relative entropy as the loss function. Let $P_{ij}$ be the estimated posterior probability $P(d_i \succ d_j)$ and $\bar{P}_{ij}$ be the "true" posterior probability, and let $o_{q,i,j} = f(\phi(q, d_i)) - f(\phi(q, d_j))$. The loss for an instance pair in RankNet is defined as

$$L_{q,i,j} = L(o_{q,i,j}) = -\bar{P}_{ij}\, o_{q,i,j} + \log\!\left(1 + e^{\,o_{q,i,j}}\right)$$

RankNet then uses gradient descent to minimize the total loss with respect to the training data. Since gradient descent may lead to a local optimum, RankNet makes use of a validation set to select the best model. The effectiveness of RankNet, particularly on large-scale datasets, has been verified [11].
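
Tying subsections 2) to 4) together, the following sketch (our illustration, not the paper's solver) scores each feature by average precision, measures pairwise similarity with Kendall's $\tau$ via scipy.stats.kendalltau, and picks $t$ features with a simple greedy importance-minus-similarity rule; the greedy rule, the penalty weight, and the use of $|\tau|$ as the similarity are assumptions.

```python
import numpy as np
from scipy.stats import kendalltau

def average_precision(scores, labels):
    """AP of the ranking induced by `scores`; labels are 1 (positive) / 0 (negative)."""
    order = np.argsort(-scores)
    rel = labels[order]
    hits = np.cumsum(rel)
    prec_at_n = hits / (np.arange(len(rel)) + 1)          # p(n) from subsection 2)
    return float((prec_at_n * rel).sum() / max(rel.sum(), 1))

def select_features(F, labels, t, penalty=0.5):
    """F: (num_docs, m) matrix of feature values; returns indices of t selected features."""
    m = F.shape[1]
    w = np.array([average_precision(F[:, i], labels) for i in range(m)])      # importance w_i
    e = np.array([[abs(kendalltau(F[:, i], F[:, j])[0]) for j in range(m)]    # similarity e_ij
                  for i in range(m)])                                         # (|tau| as a proxy)
    selected = []
    for _ in range(t):
        best = max((i for i in range(m) if i not in selected),
                   key=lambda i: w[i] - penalty * sum(e[i][j] for j in selected))
        selected.append(best)
    return selected

rng = np.random.default_rng(3)
F = rng.standard_normal((100, 10))                    # 100 documents, 10 candidate features
labels = (F[:, 0] + 0.3 * rng.standard_normal(100) > 0).astype(int)
print(select_features(F, labels, t=3))
```

For small $m$ the 0-1 program could also be solved exactly, but the greedy pass is enough to show how the importance scores $w_i$ and the similarity scores $e_{i,j}$ trade off against each other.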
III. EXPERIMENT

We begin with a discussion of the intuition behind various aspects of LSML. We then show experiments affirming the validity of the method, followed by a number of applications. In the figures that follow we make use of the error function to denote point correspondences, for example when we show the original point set and its embedding.

DATA GENERATION MODEL

Number of variables: 1000
Number of relevant variables: 25
Number of training samples: 100
Number of test samples: 200

TABLE I. EXPERIMENT RESULTS

             LSML      Proposed LSML
General      1.5032    0.9840
Ranking      0.0478    0.0143

With the above method, using the feature rankings and the penalty applied to the data, the proposed error function is reduced.

Fig. 2. Error Function

Fig. 3. Error Function

IV. CONCLUSION

In this paper a novel examination of nonlinear manifold learning is presented, and the problem of learning manifold representations in a manner that allows for the manipulation of novel data is expressed. The proposed error function is divided by the distance between the normal data point and the anomalous data point, and a data penalty is added, so that anomalous data are removed.

In ongoing work, the implementation of LSML is being scaled to handle large datasets and tuned for use in the image domain.

ACKNOWLEDGMENT

We thank Dr. Piotr Dollár (Microsoft Research, Redmond, WA) for providing technical support.

REFERENCES

[1] P. Dollár, V. Rabaud, and S. Belongie, "Learning to Traverse Image Manifolds," Technical Report, 2006.
[2] L. Cayton, "Algorithms for Manifold Learning," 2005.
[3] P. Dollár, V. Rabaud, and S. Belongie, "Non-Isometric Manifold Learning: Analysis and an Algorithm," University of California, San Diego, 2007.
[4] N. Patwari, A. O. Hero III, and A. Pacholski, "Manifold Learning Visualization of Network Traffic Data," ACM, University of Michigan, 2005.
[5] G. Zacharia, "Regularized Algorithms for Ranking and Manifold Learning for Related Tasks," Ph.D. dissertation, Massachusetts Institute of Technology, 2009.
[6] X. Geng, T.-Y. Liu, T. Qin, and H. Li, "Feature Selection for Ranking," Microsoft Research Asia, 2007.
[7] U. Shalit, D. Weinshall, and G. Chechik, "Online Learning in the Manifold of Low-Rank Matrices," The Hebrew University of Jerusalem, 2010.
[8] H. Valizadegan, R. Jin, R. Zhang, and J. Mao, "Learning to Rank by Optimizing NDCG Measure," Michigan State University.
[9] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, "Proximal Methods for Hierarchical Sparse Coding," 2011.
[10] J. Ye and H. Liu, "Dimensionality Reduction for Data Mining: Techniques, Applications and Trends," Arizona State University, 2007.
[11] M. Bernstein, V. de Silva, and J. B. Tenenbaum, "Graph Approximations to Geodesics on Embedded Manifolds," 2000.
