
International Workshop on Data-Mining and Statistical Science (DMSS2008)

Feature Extraction with Geometric Algebra for Semi-Supervised Learning of Time-Series Spatial Vector
Minh Tuan Pham, School of Engineering, Nagoya University (minhtuan@cmplx.cse.nagoya-u.ac.jp)
Kanta Tachibana, School of Engineering, Nagoya University (kanta@fcs.coe.nagoya-u.ac.jp)
Tomohiro Yoshikawa, School of Engineering, Nagoya University (yoshikawa@cse.nagoya-u.ac.jp)
Takeshi Furuhashi, School of Engineering, Nagoya University (furuhashi@cse.nagoya-u.ac.jp)

keywords: Geometric Algebra, Feature Extraction, Hidden Markov Model, Semi-supervised Learning

Summary

In machine learning of patterns, most conventional methods of feature extraction pay little attention to the geometric properties of data, even in cases where the data have spatial features. In this study we introduce geometric algebras to systematically extract invariant geometric features from spatial data given in a vector space. A geometric algebra is a multi-dimensional generalization of complex numbers and of quaternions, and is able to accurately describe oriented spatial objects and the relations between them. We further propose a kernel that measures the similarity between two series of spatial vectors based on hidden Markov models. As an application, we demonstrate our new method with the semi-supervised learning of online hand-written digits. The results show that the feature extraction with geometric algebra improved the recognition rate in one-to-one semi-supervised learning problems of online hand-written digits.

1. Introduction

Nowadays, machine learning methods are of central importance for the discovery of information in the enormous amounts of data available in various practical fields. An appropriate method to extract features from patterns is needed for the discovery of information with learning machines. So far, however, most conventional methods of feature extraction ignore the geometric properties of data, even when the data have spatial features. For example, a time-series of n-dimensional spatial vectors may contain meaningful geometric relationships between two consecutive vectors. A standard conventional method for this kind of data, the hidden Markov model (HMM) [1], uses the n coordinates as the output of the HMM. However, such an output space, which is the same as the spatial vector space, is not necessarily rich enough for the emission probability distributions to be represented by simple parametric models, e.g. Gaussians. If meaningful geometric features are also used as outputs of the HMM, then the emission probability distributions may be represented better. Some conventional methods may extract geometric features, but whether such features arise and are adopted depends on the experience of the model builder.

In this study, we use geometric algebra (GA) [2, 3, 4] to undertake various kinds of feature extraction systematically and algebraically, and to improve precision in classification problems. There are already many successful examples of its use, e.g. in color image processing and in multi-dimensional time-series signal processing with low-dimensional GAs [5, 6, 7, 8, 9, 10, 11]. In addition, GA-valued neural network methods for learning input-output relationships [12] are well studied. In our proposed method, geometric features extracted with GA can also be used for kernel machines.

Another interesting technique for time-series spatial vectors is semi-supervised learning [13]. It becomes difficult for humans to evaluate and label all available instances when massive numbers of time-series patterns are measured automatically. Semi-supervised classification learns from many unlabeled instances and a relatively small number of labeled instances, based on the similarity among all instances, and can then successfully label unknown patterns. We propose a new kernel to measure the similarity between two series of spatial vectors. Because the new kernel is designed with HMMs, it can measure the similarity between two series whose series-lengths differ.

In this paper, we compare the performance of semi-supervised classification with feature extraction by GA to that without feature extraction by GA. As an application of semi-supervised learning of time-series geometric data, we use an online hand-written digit dataset. For this two-dimensional application, a human model builder may be able to find and use the features which GA extracts systematically and algebraically. However, one of the most advantageous points of GA is that the feature extraction generalizes easily to an n-dimensional vector space with n ≥ 3. In this paper, we evaluate the effect of the features extracted with GA in the classification of this illustrative two-dimensional dataset.

2. Method

2·1 Feature extraction with GA

GA is also called Clifford algebra. An orthonormal basis {e_1, e_2, ..., e_n} can be chosen for a real vector space R^n. The GA of R^n, denoted by G_n, is constructed by an associative and bilinear product of vectors, the geometric product, which is defined by

e_i e_j = \begin{cases} 1 & (i = j), \\ -e_j e_i & (i \neq j). \end{cases}   (1)

GAs are also defined with negative squares e_i^2 = -1 for some or all basis vectors. Such GAs have many applications in computer graphics, robotics, virtual reality, etc. [4], but for our purposes definition (1) is sufficient.

Now we consider two vectors a_l = \sum_i a_{li} e_i, l = 1, 2, in R^n. Their geometric product is

a_1 a_2 = \sum_{i=1}^{n} a_{1i} a_{2i} + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (a_{1i} a_{2j} - a_{1j} a_{2i}) e_{ij},   (2)

where the abbreviation e_{ij} = e_i e_j is adopted in the following. The first term is the commutative inner product a_1 · a_2 = a_2 · a_1. The second term is the anti-commutative outer product a_1 ∧ a_2 = -a_2 ∧ a_1, which is called a 2-blade.

Now we propose a simple but systematic and algebraic derivation of feature extraction from a series of spatial vectors ξ = {x_l ∈ R^n, l = 1, ..., m}. We propose to transform the vectors by

ξ → ξ^* = \{ x_l^* \in \mathbb{R}^{n+1+{}_nC_2},\ l = 2, \ldots, m \},   (3)

where x_l^* is a feature vector whose first n components are x_l, whose next component is the inner product between consecutive vectors, x_{l-1}^{-1} · x_l, and whose following {}_nC_2 components are the coefficients of the outer product between consecutive vectors, x_{l-1}^{-1} ∧ x_l.

Below we show the feature extraction for the hand-written digit data of the UCI Repository [14]. Each digit instance is given by m points ξ = {x_1, ..., x_m}, with the series length differing from instance to instance. A two-dimensional point is given by x_l = x_l e_1 + y_l e_2, with x_0 = y_0 = 0. Using GA, the feature extraction can be undertaken systematically.

The first n components of a feature vector x_l^* are the coordinates x_l = [x_l, y_l] ∈ R^2. The next component,

x_{l-1}^{-1} \cdot x_l = \frac{x_{l-1} x_l + y_{l-1} y_l}{x_{l-1}^2 + y_{l-1}^2},

is the component of x_l parallel to the direction of x_{l-1}, divided by |x_{l-1}|. The last components are the coefficients of

x_{l-1}^{-1} \wedge x_l = \frac{x_{l-1} y_l - x_l y_{l-1}}{x_{l-1}^2 + y_{l-1}^2} e_{12}.

Each coefficient is the projected area of the oriented parallelogram spanned by x_l and x_{l-1} onto the corresponding basis plane e_{ij}, divided by |x_{l-1}|. In the case n = 2, it is the component of x_l orthogonal to the direction of x_{l-1}, divided by |x_{l-1}|. Figure 1 and Figure 2 show examples of the feature vectors for the hand-written digits '6' and '8'.

[Fig. 1: Examples of features for hand-written digit '6'. The right panel shows the inner product (dotted line) and outer product (solid line) between consecutive vectors in digit '6'.]

[Fig. 2: Examples of features for hand-written digit '8'. The right panel shows the inner product (dotted line) and outer product (solid line) between consecutive vectors in digit '8'.]
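To make the transform (3) concrete for the n = 2 digit data, the following is a minimal NumPy sketch of the feature extraction described above. The function name ga_features, the zero-division guard, and the toy trajectory are our own illustrative choices, not part of the original paper.

```python
import numpy as np

def ga_features(points):
    """Map a 2-D point series x_1..x_m to feature vectors x*_2..x*_m.

    Each x*_l = [x_l, y_l, inner, outer], where, as in the paper,
      inner = (x_{l-1} x_l + y_{l-1} y_l) / |x_{l-1}|^2   (parallel component)
      outer = (x_{l-1} y_l - x_l y_{l-1}) / |x_{l-1}|^2   (e_12 coefficient)
    """
    pts = np.asarray(points, dtype=float)            # shape (m, 2)
    prev, cur = pts[:-1], pts[1:]                    # x_{l-1} and x_l
    sq = np.einsum('ij,ij->i', prev, prev)           # |x_{l-1}|^2
    sq = np.where(sq == 0.0, 1e-12, sq)              # guard (our choice): avoid division by zero
    inner = np.einsum('ij,ij->i', prev, cur) / sq    # x_{l-1}^{-1} . x_l
    outer = (prev[:, 0] * cur[:, 1] - cur[:, 0] * prev[:, 1]) / sq  # x_{l-1}^{-1} ^ x_l
    return np.column_stack([cur, inner, outer])      # shape (m-1, n+1+nC2) = (m-1, 4)

# Example: a toy pen trajectory, shifted so the first point is the origin (Sec. 3.1).
raw = np.array([[120, 300], [140, 280], [180, 270], [230, 290]], dtype=float)
xs = raw - raw[0]             # x_l = r_l - r_0, so x_0 = 0
print(ga_features(xs[1:]))    # drop the all-zero first point before inverting
```

For general n, the outer-product part would contribute {}_nC_2 coefficients, one per basis plane e_{ij}; here n = 2 gives the single e_{12} coefficient.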


2·2 Kernel and Semi-Supervised Learning

Semi-supervised learning infers labels for the unlabeled data U = {l+1, ..., l+u = s}, or for other unknown unlabeled data, when the labels y_i ∈ {0, 1} are known only for L = {1, ..., l}, a part of the instance set. In this paper, we solve the problem of labeling U. The goal is to find a binary function γ: U → {0, 1} such that similar points receive the same label.

First, we define a similarity function between two instances i, j ∈ {1, ..., s} as

w_{ij} = \frac{\sum_k p(\xi_i \mid \theta_k)\, p(\xi_j \mid \theta_k)}{\sqrt{\sum_k p(\xi_i \mid \theta_k)^2 \sum_k p(\xi_j \mid \theta_k)^2}},   (4)

where p(ξ | θ_k) is the probability density of ξ under the k-th hidden Markov model, whose parameters θ_k were learned by the forward-backward algorithm from the labeled data of the k-th class.
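Since (4) is the cosine similarity between the likelihood vectors (p(ξ_i | θ_1), ..., p(ξ_i | θ_K)), it can be computed stably from log-likelihoods, which is what HMM implementations typically return. Below is a minimal sketch, assuming a precomputed log-likelihood matrix; the per-row maximum shift is our own numerical device and leaves w_ij unchanged, because cosine similarity is invariant to rescaling each row.

```python
import numpy as np

def similarity_matrix(loglik):
    """Build W = [w_ij] of eq. (4) from log p(xi_i | theta_k).

    loglik: array of shape (s, K); row i holds the K model
    log-likelihoods of instance i. w_ij is the cosine similarity
    of rows i and j of exp(loglik), computed with a per-row shift
    to avoid numerical underflow.
    """
    L = np.asarray(loglik, dtype=float)
    P = np.exp(L - L.max(axis=1, keepdims=True))  # rescale each row; w_ij unchanged
    norms = np.linalg.norm(P, axis=1)
    return (P @ P.T) / np.outer(norms, norms)     # eq. (4) for all pairs at once
```

In each two-class problem of this paper, K = 2 and θ_1, θ_2 are the HMMs trained on the labeled instances of the two classes.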










Then, we construct an s × s symmetric weight matrix W = [w_{ij}]. The weight matrix can be separated as

W = \begin{pmatrix} W_{LL} & W_{LU} \\ W_{UL} & W_{UU} \end{pmatrix}   (5)

at the l-th row and the l-th column. For this partition, Zhu et al. [15] proposed to first compute a real-valued function g: U → [0, 1] which minimizes the energy

E(g) = \sum_{i,j} w_{ij} \left( g(i) - g(j) \right)^2.   (6)

Restricting g(i) = g_L(i) ≡ y_i for the labeled data, g for the unlabeled data is calculated as

g_U = (D_{UU} - W_{UU})^{-1} W_{UL}\, g_L,   (7)

where D_{UU} = diag(d_i), with entries d_i = \sum_j w_{ij}, is the diagonal block for the unlabeled data. Then γ(i) is decided using the class mass normalization proposed by Zhu et al. [15].
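A minimal sketch of (5)-(7), together with a class mass normalization decision in the sense of Zhu et al. [15], assuming W is ordered with the l labeled instances first; the function and variable names are ours.

```python
import numpy as np

def propagate_labels(W, y_labeled):
    """Harmonic-function label propagation, eqs. (5)-(7) [15].

    W:         (s, s) symmetric similarity matrix, labeled instances first.
    y_labeled: (l,) array of 0/1 labels for the first l instances.
    Returns hard 0/1 labels for the s - l unlabeled instances.
    """
    l = len(y_labeled)
    W_UU = W[l:, l:]                   # blocks of eq. (5)
    W_UL = W[l:, :l]
    d_U = W[l:, :].sum(axis=1)         # d_i = sum_j w_ij over all s columns
    g_L = np.asarray(y_labeled, dtype=float)
    g_U = np.linalg.solve(np.diag(d_U) - W_UU, W_UL @ g_L)   # eq. (7)

    # Class mass normalization: rescale so the predicted class masses
    # match the labeled class proportions q and 1 - q, then compare.
    q = g_L.mean()
    score1 = q * g_U / g_U.sum()
    score0 = (1.0 - q) * (1.0 - g_U) / (1.0 - g_U).sum()
    return (score1 > score0).astype(int)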
3. Experimental Results and Discussion

3·1 Hand-written Digits

We used the Pen-Based Recognition of Hand-written Digits dataset (Pendigits dataset in the following) of the UCI Repository [14] as an example application for classification, because digits have two-dimensional spatial features. The Pendigits dataset consists of 3498 samples written by 14 people. During sample collection, the pen-point coordinates r_l were measured at 100 msec intervals on a tablet with a resolution of 500 × 500 pixels. In this study, we carry out the feature extraction with GA after computing x_l = r_l − r_0, i.e. setting the origin at the first point r_0.

3·2 Semi-Supervised Learning Result

We selected 10 instances for each digit at random as labeled data and used the other instances as unlabeled data. We did not evaluate 10-class classification but rather the 45 two-class classifications, one for each pair of digit classes (the tables below list all 45 pairs). In each two-class classification, we trained one HMM per class from the labeled data of that class, as in the sketch below.
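The paper does not name an HMM implementation. As one possible realization, the sketch below uses the third-party hmmlearn package to train one Gaussian-emission HMM per class on the GA feature sequences and to collect the log-likelihood matrix consumed by similarity_matrix above; the number of hidden states and the iteration count are arbitrary choices of ours.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_class_hmm(feature_seqs, n_states=4):
    """Fit one Gaussian-emission HMM on the labeled sequences of one class.

    feature_seqs: list of (m_i, d) arrays of GA feature vectors.
    hmmlearn expects the sequences concatenated, with their lengths.
    """
    X = np.vstack(feature_seqs)
    lengths = [len(s) for s in feature_seqs]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)      # Baum-Welch (forward-backward) training
    return model

def loglik_matrix(models, all_seqs):
    """Rows: instances; columns: models; entries log p(xi_i | theta_k)."""
    return np.array([[m.score(seq) for m in models] for seq in all_seqs])
```

The output of loglik_matrix feeds directly into similarity_matrix from the earlier sketch, which in turn feeds propagate_labels.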
 %     0     1     2     3     4     5     6     7     8
 1   94.1    –     –     –     –     –     –     –     –
 2   96.6  72.7    –     –     –     –     –     –     –
 3   99.3  81.6  84.1    –     –     –     –     –     –
 4   86.1  89.5  89.7  99.6    –     –     –     –     –
 5   93.5  84.5  92.0  90.8  91.9    –     –     –     –
 6   83.1  91.0  91.3  99.2  82.9  88.2    –     –     –
 7   96.6  81.5  81.9  89.6  98.0  96.5  99.7    –     –
 8   84.7  91.9  97.2  96.5  95.6  84.5  73.3  97.4    –
 9   90.7  90.3  87.1  89.3  92.9  88.6  89.7  92.4  90.5

Table 1  Correct labeling rate (%) using only coordinate values; the row and column indices are the two digit classes of each pairwise problem.

 %     0     1     2     3     4     5     6     7     8
 1   94.6    –     –     –     –     –     –     –     –
 2   98.7  89.5    –     –     –     –     –     –     –
 3   99.1  90.7  93.5    –     –     –     –     –     –
 4   88.8  87.0  94.8  99.7    –     –     –     –     –
 5   91.3  92.3  97.2  97.2  93.5    –     –     –     –
 6   97.3  92.4  99.0 100.0  94.9  97.1    –     –     –
 7   98.7  88.4  90.4  94.6  97.3  97.3  99.6    –     –
 8   90.1  95.0  98.5  99.7  96.5  83.6  96.9  99.9    –
 9   94.6  92.6  96.2  96.5  94.1  92.0  99.4  96.0  98.6

Table 2  Correct labeling rate (%) using the GA features.

Table 1 shows the correct labeling rate when only the coordinates are used as outputs of the HMM. Table 2 shows the correct labeling rate when the features extracted by GA are also used as outputs of the HMM. A paired t-test shows that the improvement in the correct labeling rate is significant (P-value: 1.78 × 10^{-7}).

4. Conclusion

In this study, we proposed a simple but systematic and algebraic feature extraction method based on GA. We also proposed a new kernel to measure the similarity between two series of spatial vectors. We designed this kernel with HMMs so that it can measure the similarity between two series whose series-lengths differ. We compared the performance of semi-supervised classification with feature extraction by GA to that without feature extraction by GA. As an application of semi-supervised learning of time-series geometric data, we used an online hand-written digit dataset. The results show that the feature extraction by GA improved the classification performance.

Acknowledgments

This work was partly supported by the Grant-in-Aid for the 21st Century COE Program "Frontiers of Computational Science", Nagoya University, and by the Grant-in-Aid for Young Scientists (B) #19700218.


♦ References ♦

[1] L. R. Rabiner: A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, Vol. 77, No. 2, pp. 257–286, 1989.
[2] C. Doran and A. Lasenby: Geometric Algebra for Physicists, Cambridge University Press, 2003.
[3] D. Hestenes: New Foundations for Classical Mechanics, Dordrecht, 1986.
[4] L. Dorst, D. Fontijne, and S. Mann: Geometric Algebra for Computer Science: An Object-Oriented Approach to Geometry, Morgan Kaufmann Series in Computer Graphics, 2007.
[5] I. Sekita, T. Kurita, and N. Otsu: Complex autoregressive model for shape recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, No. 4, 1992.
[6] A. Hirose: Complex-Valued Neural Networks: Theories and Applications, Series on Innovative Intelligence, Vol. 5, 2006.
[7] N. Matsui, T. Isokawa, H. Kusamichi, F. Peper, and H. Nishimura: Quaternion neural network with geometrical operators, Journal of Intelligent and Fuzzy Systems, Vol. 15, No. 3–4, pp. 149–164, 2004.
[8] S. Buchholz and N. Le Bihan: Optimal separation of polarized signals by quaternionic neural networks, 14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4–8, 2006.
[9] T. Nitta: An extension of the back-propagation algorithm to complex numbers, Neural Networks, Vol. 10, No. 8, pp. 1391–1415, November 1997.
[10] D. Hildenbrand and E. Hitzer: Analysis of point clouds using conformal geometric algebra, 3rd International Conference on Computer Graphics Theory and Applications, Funchal, Madeira, Portugal, 2008.
[11] E. Hitzer: Quaternion Fourier transform on quaternion fields and generalizations, Advances in Applied Clifford Algebras, Vol. 17, No. 3, pp. 497–517, 2007.
[12] G. Sommer: Geometric Computing with Clifford Algebras, Springer, 2001.
[13] B. Kulis, S. Basu, I. Dhillon, and R. Mooney: Semi-supervised graph clustering: a kernel approach, Proceedings of the 22nd International Conference on Machine Learning, pp. 457–464, 2005.
[14] A. Asuncion and D. J. Newman: UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Science, 2007.
[15] X. Zhu, J. Lafferty, and Z. Ghahramani: Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions, ICML 2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, 2003.


