Translations of
MATHEMATICAL
MONOGRAPHS
Volume 191
Methods of
Information Geometry
Shun-ichi Amari
Hiroshi Nagaoka
Translated by
Daishi Harada
OXFORD UNIVERSITY PRESS

Editorial Board
Shoshichi Kobayashi (Chair)
Masamichi Takesaki
JOHO KIKA NO HOHO
(Methods of Information Geometry)
by Shun-ichi Amari and Hiroshi Nagaoka
Copyright © 1998 by Shun-ichi Amari and Hiroshi Nagaoka
Originally published in Japanese by Iwanami Shoten, Publishers, Tokyo, 1993
The authors, Oxford University Press, and the American Mathematical Society
gratefully acknowledge the financial support provided by the Daido Life Foundation for the
editing of this work.
Translated from the Japanese by Daishi Harada
2000 Mathematics Subject Classification. Primary 00A69, 53-02, 53B05, 5:
62F05, 62F12, 93C05, 81Q70, 94A15.
Library of Congress Cataloging-in-Publication Data
Amari, Shun’ichi
[Joho kika no hoho. English]
Methods of information geometry / Shunichi Amari, Hiroshi Nagaoka ; [translated from the
Japanese by Daishi Harada]
p. cm. -- (Translations of mathematical monographs, ISSN 0065-9282 ; v. 191)
Includes bibliographical references and index.
ISBN 0-8218-0531-2 (alk. paper)
1. Mathematical statistics. 2. Geometry, Differential. I. Nagaoka, Hiroshi, 1955- II. Title.
III. Series.
QA276.A56313 2000
519.5-dc21 00-050362
Copying and reprinting. Individual readers of this publication, and nonprofit libraries
acting for them, are permitted to make fair use of the material, such as to copy a chapter for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for such
permission should be addressed to the Assistant to the Publisher, American Mathematical Society,
P.O. Box 6248, Providence, Rhode Island 02940-6248. Requests can also be made by e-mail to
reprint-permission@ams.org.
© 2000 by the American Mathematical Society. All rights reserved.
The American Mathematical Society retains all rights
except those granted to the United States Government.
Printed in the United States of America
The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at URL: http://www.ams.org/
10 9 8 7 6 5 4 3 2 1    05 04 03 02 01 00

Contents
Preface
Preface to the English edition

1 Elementary differential geometry
1.1 Differentiable manifolds
1.2 Tangent vectors and tangent spaces
1.3 Vector fields and tensor fields
1.4 Submanifolds
1.5 Riemannian metric
1.6 Affine connections and covariant derivatives
1.7 Flatness
1.8 Autoparallel submanifolds
1.9 Projection of connections and embedding curvature
1.10 Riemannian connection

2 The geometric structure of statistical models
2.1 Statistical models
2.2 The Fisher metric
2.3 The α-connection
2.4 Chentsov's theorem and some historical remarks
2.5 The geometry of P(X)
2.6 α-affine manifolds and α-families

3 Dual connections
3.1 Duality of connections
3.2 Divergences: general contrast functions
3.3 Dually flat spaces
3.4 Canonical divergence
3.5 ...
3.6 ...
3.7 Mutually dual foliations
3.8 A further look at the triangular relation

4 Statistical inference and differential geometry
4.1 Estimation based on independent observations
4.2 Exponential families and observed points
4.3 Curved exponential families
4.4 Consistency and first-order efficiency
4.5 Higher-order asymptotic theory of estimation
4.6 Asymptotics of Fisher information
4.7 Higher-order asymptotic theory of tests
4.8 The theory of estimating functions and fiber bundles
4.8.1 The fiber bundle of local exponential families
4.8.2 Hilbert bundles and estimating functions

5 The geometry of time series and linear systems
5.1 The space of systems and time series
5.2 The Fisher metric and the α-connection on the system space
5.3 The geometry of finite-dimensional models
5.4 Stable systems and stable feedback

6 Multiterminal information theory and statistical inference
6.1 Statistical inference for multiterminal information
6.2 0-rate testing
6.3 0-rate estimation
6.4 Inference for general multiterminal information

7 Information geometry for quantum systems
7.1 The quantum state space
7.2 The geometric structure induced from a quantum divergence
7.3 The geometric structure induced from a generalized covariance
7.4 Applications to quantum estimation theory

8 Miscellaneous topics
8.1 The geometry of convex analysis, linear programming and gradient flows
8.2 Neuro-manifolds and nonlinear systems
8.3 Lie groups and transformation models in information geometry
8.4 Mathematical problems posed by information geometry

Guide to the Bibliography
Bibliography
Index

Preface
Information geometry provides the mathematical sciences with a new framework
for analysis. This framework is relevant to a wide variety of domains, and it
has already been usefully applied to several of these, providing them with a
new perspective from which to view the structure of the systems which they
investigate. Nevertheless, the development of the field of information geometry
can only be said to have just begun.
Information geometry began as an investigation of the natural differential
geometric structure possessed by families of probability distributions. As a
rather simple example, consider the set $S$ of normal distributions with mean $\mu$
and variance $\sigma^2$:
\[ S = \{ p(x; \mu, \sigma) \}, \qquad p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}. \]
By specifying $(\mu, \sigma)$ we determine a particular normal distribution, and hence $S$
may be viewed as a 2-dimensional space (manifold) which has $(\mu, \sigma)$ as a coordinate
system. However, this is not a Euclidean space, but rather a Riemannian
space with a metric which naturally follows from the underlying properties of
probability distributions. In particular, when $S$ is a family of normal distributions,
it is a space of constant negative curvature. The underlying characteristics
of probability distributions lead not only to this Riemannian structure; an in-
vestigation of the structure of probability distributions leads to a new concept
within differential geometry: that of mutually dual affine connections. In addi-
tion, the structure of dual affine connections naturally arises in the framework
of affine differential geometry, and has begun to attract the attention of math-
ematicians researching differential geometry.
Probability distributions are the fundamental element over which fields such
as statistics, stochastic processes, and information theory are developed. Hence
not only is the natural dualistic differential geometric structure of the space
of probability distributions beautiful, but it must also play a fundamental role
in these information sciences. In fact, considering statistical estimation from a
differential geometric viewpoint has provided statistics with a new analytic tool
which has allowed several previously open problems to be solved; information
geometry has already established itself within the field of statistics. In the fields
of information theory, stochastic processes, and systems, information geometry
is currently being applied to allow the investigation of hitherto unexplored
possibilities.
The utility of information geometry, however, is not limited to these fields. It
has, for example, been productively applied to areas such as statistical physics
and the mathematical theory underlying neural networks. Further, dualistic
differential geometric structure is a general concept not inherently tied to prob-
ability distributions. For example, the interior method for linear programming
may be analyzed from this point of view, and this suggests its relation to com-
pletely integrable dynamical systems. Finally, the investigation of the informa-
tion geometry of quantum systems may lead to even further developments.
This book presents for the first time the entirety of the emerging field of
information geometry. To do this requires an understanding of at least the
fundamental concepts in differential geometry. Hence the first three chapters
contain an introduction to differential geometry and the recently developed the-
ory of dual connections. An attempt has been made to develop the fundamental
framework of differential geometry as concisely and intuitively as possible. It
is hoped that this book may serve generally as an introduction to differential
geometry. Although differential geometry is said to be a difficult field to un-
derstand, this is true only of those texts written by mathematicians for other
mathematicians, and it is not the case that the principal ideas in differential
geometry are hard. Nevertheless, this book introduces only the amount of dif-
ferential geometry necessary for the remaining chapters, and endeavors to do so
in a manner which, while consistent with the conventional definitions in mathe-
matical texts, allows the intuition underlying the concepts to be comprehended
most immediately.
On the other hand, a comprehensive treatment of statistics, system theory,
and information theory, among others, from the point of view of information
geometry is for each distinct, relying on properties unique to that particular
theory. It was beyond the scope of this book to include a thorough description
of these fields, and inevitably, many of the relevant topics from these areas are
rather hastily introduced in the latter half of the book. It is hoped that within
these sections the reader will simply gather the flavor of the research being
done, and for a more complete analysis refer to the corresponding papers. To
complement this approach, many topics which are still incomplete and perhaps
consist only of vague ideas have been included.
Nothing would make us happier than if this book could serve as an invitation
for other researchers to join in the development of information geometry.

Preface to the English Edition
Information geometry provides a new method applicable to various areas including
information sciences and physical sciences. It has emerged from investigating
the geometrical structures of the manifold of probability distributions, and
has been applied successfully to statistical inference problems. However, it has been
shown that information geometry opens a new paradigm useful for the elucidation
of information systems, intelligent systems, control systems, physical systems,
mathematical systems, and so on.
There has been remarkable progress recently in information geometry.
For example, in the field of neurocomputing, a set of neural networks forms a
neuro-manifold. Information geometry has become one of the fundamental methods
for analyzing neurocomputing and related areas. Its usefulness has also
been recognized in multiterminal information theory and portfolio, in nonlinear
systems and nonlinear prediction, in mathematical programming, in statistical
inference and information theory of quantum mechanical systems, and so on.
Its mathematical foundations have also shown remarkable progress.
In spite of these developments, there were no expository textbooks covering
the methods and applications of information geometry except for statistical
ones. Although we published a booklet to show the wide scope of information
geometry in 1993, it was unfortunately written in Japanese. It is our great
pleasure to see its English translation. Mr. Daishi Harada has done an
excellent job of translation.
In addition to correction of many misprints and errors found in the Japanese
edition, we have made revision and rearrangement throughout the manuscript
to make it as readable as possible. Also we have added several new topics, and
even new sections and a new chapter such as §2.5, §3.2, §3.5, §3.8 and Chapter 7.
The bibliography and the guide to it have largely been extended as well. These
works were done by the authors after receiving the original translation, and it
is the authors, not the translator, who should be responsible for the English
writing of these parts.
This is a small booklet, however. We have presented a concise but compre-
hensive introduction to the mathematical foundation of information geometry
in the first three chapters, while the other chapters are devoted to an overview
of wide areas of applications. Even though we could not give detailed and
comprehensive explanations for many topics, we hope that readers will sense its
flavor and promise from the description. It would be our pleasure if the book were to
play a key role in the further development of information geometry.
Year 2000
Shun-ichi Amari
Hiroshi Nagaoka

Chapter 1
Elementary differential geometry
Differential geometry is a mature field of mathematics and has many introduc-
tory texts; still, it is not an easy field to master. However, in this book we shall
require only the fundamental ideas and methodologies of differential geometry.
The main theme of modern differential geometry has been to characterize the
global properties of manifolds, and much theory has been developed towards
this end. At this time, the field of information geometry (mostly) requires only
the theory of the locally characterizable properties of manifolds.
For information geometry the most important aspects of differential geome-
try are those which allow us to take problems from a variety of fields: statistics,
information theory, and control theory; visualize them geometrically; and from
this develop novel tools with which to extend and advance these fields. In this
chapter we present an introduction to differential geometry from this point of
view.
1.1 Differentiable manifolds
A differentiable manifold is a mathematical concept denoting a
generalization/abstraction of geometric objects such as smooth curves and surfaces in
an $n$-dimensional space. Intuitively, a manifold $S$ is a “set with a coordinate
system.” Since $S$ is a set, it has elements. It does not matter what these
elements are (these elements are also called the points of $S$). For example, in this
book, we shall introduce manifolds whose points are probability distributions
and also those whose points are linear systems. $S$ must also have a coordinate
system. By this we mean a one-to-one mapping from $S$ (or its subset) to $\mathbb{R}^n$,
which allows us to specify each point in $S$ using a vector of $n$ real numbers
(this vector is called the coordinates of the corresponding point). We call the
natural number $n$ the dimension of $S$, and write $n = \dim S$.
Figure 1.1: A coordinate system for $S$.

We call a coordinate system that has $S$ as its domain a global coordinate
system. In our analysis below, we shall consider only the case where there exists
a global coordinate system. However, in general there are many manifolds
which do not have global coordinate systems. Examples of such manifolds
include the surface of a sphere and the torus (the surface of a donut). These
manifolds have only local coordinate systems. This may be viewed informally
in the following way. Consider an open subset $U$ of $S$, and suppose that $U$ has
a coordinate system. This provides a local coordinate system for those points
contained in $U$. For a point not contained in $U$, consider another open subset $V$
containing that point which also has a coordinate system. Repeat this process
until the original set $S$ is covered, so that each point in $S$ is contained in an
open subset which has a coordinate system. Then this collection of open subsets
of $S$ and their corresponding coordinate systems would allow us to express any
point in $S$ using coordinates. However, as mentioned above, in this chapter we
shall consider only the case when there exists a global coordinate system. This
will suffice to prepare us for the later chapters. Indeed, since in this chapter
we principally develop the local theory of manifolds, this assumption does not
typically affect the generality of the analysis.
Let $S$ be a manifold and $\varphi : S \to \mathbb{R}^n$ be a coordinate system for $S$. Then $\varphi$
maps each point $p$ in $S$ to $n$ real numbers: $\varphi(p) = [\xi^1(p), \ldots, \xi^n(p)] =
[\xi^1, \ldots, \xi^n]$. These are the coordinates of the point $p$. Each $\xi^i$ may be viewed as
a function $p \mapsto \xi^i(p)$ which maps a point $p$ to its $i$th coordinate; we call these $n$
functions $\xi^i : S \to \mathbb{R}$ ($i = 1, \ldots, n$) the coordinate functions. We shall write
the coordinate system in ways such as $\varphi = [\xi^1, \ldots, \xi^n] = [\xi^i]$ (Figure 1.1).

Footnote: We shall use $\xi^i$, $\rho^i$ to denote both (the variable representing) the $i$th coordinate of a point
and a coordinate function. This is similar to writing “the function $y = y(x)$.”

Let $\psi = [\rho^i]$ be another coordinate system for $S$. Then the same point $p \in S$
has both the coordinates $[\xi^i(p)] = [\xi^i] \in \mathbb{R}^n$ with respect to the coordinate
system $\varphi$, and the coordinates $[\rho^i(p)] = [\rho^i] \in \mathbb{R}^n$ with respect to the coordinate
system $\psi$. The coordinates $[\rho^i]$ may be obtained from $[\xi^i]$ in the following
way. First apply the inverse mapping $\varphi^{-1}$ to $[\xi^i]$; this gives us a point $p$ in $S$.
Then apply $\psi$ to this point; the result is $[\rho^i]$. In other words, we apply the
transformation on $\mathbb{R}^n$ given by
\[ \psi \circ \varphi^{-1} : [\xi^1, \ldots, \xi^n] \mapsto [\rho^1, \ldots, \rho^n]. \tag{1.1} \]
This is called the coordinate transformation from $\varphi = [\xi^i]$ to $\psi = [\rho^i]$ (Figure
1.2).

Figure 1.2: Coordinate transformation.
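As a concrete illustration of Equation (1.1), the following Python sketch takes S to be the family of normal distributions mentioned in the Preface and equips it with two coordinate systems. The class `Normal`, all function names, and the particular second coordinate system (the pair [mean/variance, -1/(2 variance)]) are assumptions chosen for this example; they are not taken from the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Normal:
    """A point p of S: one normal distribution (illustrative representation)."""
    mean: float
    std: float  # assumed to be > 0


def phi(p):
    """Coordinate system phi = [xi^1, xi^2] = [mean, std]."""
    return (p.mean, p.std)


def phi_inv(xi):
    """Inverse mapping phi^{-1}: coordinates -> point of S."""
    return Normal(mean=xi[0], std=xi[1])


def psi(p):
    """A second coordinate system psi = [rho^1, rho^2] (assumed choice)."""
    var = p.std ** 2
    return (p.mean / var, -1.0 / (2.0 * var))


def psi_inv(rho):
    """Inverse mapping psi^{-1}; requires rho^2 < 0."""
    var = -1.0 / (2.0 * rho[1])
    return Normal(mean=rho[0] * var, std=math.sqrt(var))


def xi_to_rho(xi):
    """Coordinate transformation psi o phi^{-1}, as in Equation (1.1)."""
    return psi(phi_inv(xi))


def rho_to_xi(rho):
    """Inverse coordinate transformation phi o psi^{-1}."""
    return phi(psi_inv(rho))


xi = (1.0, 2.0)               # coordinates of a point p with respect to phi
rho = xi_to_rho(xi)           # coordinates of the same p with respect to psi
round_trip = rho_to_xi(rho)   # should recover xi up to rounding
print(rho, round_trip)
```

The point itself (a `Normal` object) never changes; only its description as a pair of numbers does, which is the sense in which properties of S should not depend on the coordinates.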
To consider $S$ as a manifold means that one is interested in investigating
those properties of $S$ which are invariant under coordinate transformations. In
particular, differential geometry analyzes the geometry of objects using differential
operators with respect to a variety of functions on $S$, and it would be
problematic if these operators depended fundamentally on the choice of
coordinates. Hence it is necessary to restrict the coordinate systems to those which
allow smooth transformations between each other.
In order to properly formalize the concepts described above, let us now
formally define manifolds for which there exists a global coordinate system.
Let $S$ be a set. If there exists a set of coordinate systems $\mathcal{A}$ for $S$ which
satisfies the conditions (i) and (ii) below, we call $S$ (more properly, $(S, \mathcal{A})$) an
$n$-dimensional $C^\infty$ differentiable manifold, or more simply, a manifold.
(i) Each element $\varphi$ of $\mathcal{A}$ is a one-to-one mapping from $S$ to some open subset
of $\mathbb{R}^n$.
(ii) For all $\varphi \in \mathcal{A}$, given any one-to-one mapping $\psi$ from $S$ to $\mathbb{R}^n$, the following
holds:
\[ \psi \in \mathcal{A} \iff \psi \circ \varphi^{-1} \text{ is a } C^\infty \text{ diffeomorphism.} \]
Here, by a $C^\infty$ diffeomorphism we mean that $\psi \circ \varphi^{-1}$ and its inverse $\varphi \circ \psi^{-1}$
are both $C^\infty$ (infinitely many times differentiable). From these conditions, and
given the coordinate transformation described in Equation (1.1), it follows that
we may take the partial derivative of the function $\rho^i = \rho^i(\xi^1, \ldots, \xi^n)$ with
respect to its variable arguments as many times as needed, and that the same
holds for $\xi^i = \xi^i(\rho^1, \ldots, \rho^n)$. In this book, the condition $C^\infty$ is used a number
of times, but in fact it is usually not necessary; it would suffice for the relevant
functions to be differentiable some appropriate number of times. Intuitively,
then, we may consider $C^\infty$ to simply mean “sufficiently smooth”.
Let $S$ be a manifold and $\varphi$ be a coordinate system for $S$. Let $U$ be a subset
of $S$. If the image $\varphi(U)$ is an open subset of $\mathbb{R}^n$, then we say that $U$ is an open
subset of $S$. From condition (ii) above, we see that this property is invariant over
the choice of coordinate system $\varphi$. This allows us to consider $S$ as a topological
space. For any non-empty open subset $U$ of $S$, we may restrict $\varphi$, the coordinate
system of $S$, to obtain $\varphi|_U$ (the mapping $U \to \mathbb{R}^n$ obtained by restricting the
domain of $\varphi$ to $U$), which may be taken as a coordinate system for $U$. Hence
we see that $U$ is a manifold whose dimension is the same as that of $S$.
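Let $f : S \to \mathbb{R}$ be a function on a manifold $S$. Then if we select a coordinate
system $\varphi = [\xi^i]$ for $S$, this function may be rewritten as a function of the
coordinates; i.e., letting $[\xi^i]$ denote the coordinates of the point $p$, we have
$f(p) = \bar{f}(\xi^1, \ldots, \xi^n)$, where $\bar{f} = f \circ \varphi^{-1}$. Note that $\bar{f}$ is a real-valued function
whose domain is $\varphi(S)$, an open subset of $\mathbb{R}^n$. Now suppose that $\bar{f}(\xi^1, \ldots, \xi^n)$
is partially differentiable at each point in $\varphi(S)$. Then the partial derivative
$\frac{\partial}{\partial \xi^i} \bar{f}(\xi^1, \ldots, \xi^n)$ is also a function on $\varphi(S)$. By transforming the domain back
to $S$, we may define the partial derivatives of $f$ to be
$\frac{\partial f}{\partial \xi^i} \stackrel{\mathrm{def}}{=} \frac{\partial \bar{f}}{\partial \xi^i} \circ \varphi : S \to \mathbb{R}$.
We write $\left(\frac{\partial f}{\partial \xi^i}\right)_p$ to denote the value of this function at point $p$ (the partial
derivative at point $p$).
When $\bar{f} = f \circ \varphi^{-1}$ is $C^\infty$, in other words when $\bar{f}(\xi^1, \ldots, \xi^n)$ can be partially
differentiated with respect to its variables an unbounded number of times, we
call $f$ a $C^\infty$ function on $S$. This definition does not depend on the choice of
coordinate system $\varphi$. The partial derivatives $\frac{\partial f}{\partial \xi^i}$ of a $C^\infty$ function $f$ are also
$C^\infty$ functions. We may similarly define the higher-order partial derivatives,
such as $\frac{\partial^2 f}{\partial \xi^i \partial \xi^j}$. These will also be $C^\infty$. As with the case of $C^\infty$ functions on
$\mathbb{R}^n$, $\frac{\partial^2 f}{\partial \xi^i \partial \xi^j} = \frac{\partial^2 f}{\partial \xi^j \partial \xi^i}$ holds.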
Let us denote the class of $C^\infty$ functions on $S$ by $\mathcal{F}(S)$, or simply $\mathcal{F}$. For
all $f$ and $g$ in $\mathcal{F}$ and a real number $c$, we define the sum $f + g$ as $(f + g)(p) =
f(p) + g(p)$, the scaling $cf$ as $(cf)(p) = c f(p)$, and the product $f \cdot g$ as $(f \cdot g)(p) =
f(p) \cdot g(p)$; these functions are also members of $\mathcal{F}$.
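Let $[\xi^i]$ and $[\rho^j]$ be two coordinate systems. Since the coordinate functions
$\xi^i$ and $\rho^j$ are clearly $C^\infty$, the partial derivatives $\frac{\partial \xi^i}{\partial \rho^j}$ and $\frac{\partial \rho^j}{\partial \xi^i}$ are well defined,
and they satisfy
\[ \sum_{j=1}^{n} \frac{\partial \xi^i}{\partial \rho^j} \frac{\partial \rho^j}{\partial \xi^k} = \sum_{j=1}^{n} \frac{\partial \rho^i}{\partial \xi^j} \frac{\partial \xi^j}{\partial \rho^k} = \delta^i_k, \tag{1.2} \]
where $\delta^i_k$ is 1 if $k = i$, and 0 otherwise (the Kronecker delta). In addition, for
any $C^\infty$ function $f$, we have
\[ \frac{\partial f}{\partial \rho^j} = \sum_{i=1}^{n} \frac{\partial \xi^i}{\partial \rho^j} \frac{\partial f}{\partial \xi^i} \quad \text{and} \quad \frac{\partial f}{\partial \xi^i} = \sum_{j=1}^{n} \frac{\partial \rho^j}{\partial \xi^i} \frac{\partial f}{\partial \rho^j}. \tag{1.3} \]
Note: In this book there often appear equations which contain
indices such as $i, j, \ldots$, and are to be summed over those indices that
are both super- and subscripted. For these equations we shall
abbreviate by omitting the summation sign $\sum$ corresponding to these
indices. For example, Equations (1.2) and (1.3) above would be
written as
\[ \frac{\partial \xi^i}{\partial \rho^j} \frac{\partial \rho^j}{\partial \xi^k} = \frac{\partial \rho^i}{\partial \xi^j} \frac{\partial \xi^j}{\partial \rho^k} = \delta^i_k, \]
\[ \frac{\partial f}{\partial \rho^j} = \frac{\partial \xi^i}{\partial \rho^j} \frac{\partial f}{\partial \xi^i}, \qquad \frac{\partial f}{\partial \xi^i} = \frac{\partial \rho^j}{\partial \xi^i} \frac{\partial f}{\partial \rho^j}. \]
We shall also abbreviate $\sum_i \sum_j A^k_{ij} B^{ij}_h$ as $A^k_{ij} B^{ij}_h$. Hence
(unless there is ambiguity), whenever there appears such an equation
we shall assume that there is an implicit $\sum$ (i.e., there is a
summation over the relevant indices). Note therefore that $A_i X^i = A_j X^j$,
for instance, is always true. This notation is known as Einstein's
convention.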
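Equations (1.2) and (1.3) can be checked numerically for a specific pair of coordinate systems. The sketch below is only an illustration: the transformation between [μ, σ] and [μ/σ², -1/(2σ²)], the test function `f_bar`, and the finite-difference helper `jacobian` are all assumptions made for this example, and the derivatives are approximated rather than exact.

```python
def xi_to_rho(xi):
    """Assumed transformation [mu, sigma] -> [mu/sigma^2, -1/(2 sigma^2)]."""
    mu, sigma = xi
    var = sigma * sigma
    return [mu / var, -1.0 / (2.0 * var)]


def rho_to_xi(rho):
    """Inverse transformation; requires rho[1] < 0."""
    var = -1.0 / (2.0 * rho[1])
    return [rho[0] * var, var ** 0.5]


def jacobian(func, x, h=1e-6):
    """Return J with J[a][b] = d func^a / d x^b (central differences)."""
    n = len(x)
    columns = []
    for b in range(n):
        plus, minus = list(x), list(x)
        plus[b] += h
        minus[b] -= h
        f_plus, f_minus = func(plus), func(minus)
        columns.append([(u - v) / (2.0 * h) for u, v in zip(f_plus, f_minus)])
    return [[columns[b][a] for b in range(n)] for a in range(len(columns[0]))]


def contract(A, B):
    """Explicit sum over the repeated index j: C[i][k] = A[i][j] * B[j][k]."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]


xi = [1.0, 2.0]
rho = xi_to_rho(xi)
d_rho_d_xi = jacobian(xi_to_rho, xi)    # entry [j][k] = d rho^j / d xi^k
d_xi_d_rho = jacobian(rho_to_xi, rho)   # entry [i][j] = d xi^i / d rho^j

# Equation (1.2): the contraction should be close to the Kronecker delta.
delta = contract(d_xi_d_rho, d_rho_d_xi)


def f_bar(x):
    """A function f written in the xi coordinates (assumed example)."""
    return [x[0] ** 2 + x[1]]


def f_tilde(r):
    """The same f written in the rho coordinates."""
    return f_bar(rho_to_xi(r))


# Equation (1.3): df/d rho^j = (d xi^i / d rho^j) (df / d xi^i), summed over i.
df_d_xi = jacobian(f_bar, xi)[0]
df_d_rho_direct = jacobian(f_tilde, rho)[0]
df_d_rho_chain = [sum(d_xi_d_rho[i][j] * df_d_xi[i] for i in range(2))
                  for j in range(2)]
print(delta, df_d_rho_direct, df_d_rho_chain)
```

The function `contract` writes out the summation that Einstein's convention leaves implicit: the repeated index j appears once in each factor and is summed over.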
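Let $S$ and $Q$ be manifolds with coordinate systems $\varphi : S \to \mathbb{R}^n$ and $\psi : Q \to
\mathbb{R}^m$. A mapping $\lambda : S \to Q$ is said to be $C^\infty$ or smooth if $\psi \circ \lambda \circ \varphi^{-1}$ is a $C^\infty$
mapping from an open subset of $\mathbb{R}^n$ to $\mathbb{R}^m$. A necessary and sufficient condition
for $\lambda$ to be $C^\infty$ is that $f \circ \lambda \in \mathcal{F}(S)$ for all $f \in \mathcal{F}(Q)$. If a $C^\infty$ mapping $\lambda$ is a
bijection (i.e., one-to-one and $\lambda(S) = Q$) and the inverse $\lambda^{-1}$ is also $C^\infty$, then
$\lambda$ is called a $C^\infty$ diffeomorphism from $S$ onto $Q$.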
1.2 Tangent vectors and tangent spaces
The tangent space $T_p$ at a point $p \in S$ of a manifold $S$ is intuitively the vector
space obtained by “locally linearizing” $S$ around $p$. Let $[\xi^i]$ be some coordinate
system for $S$, and let $e_i$ denote the “tangent vector” which goes through point
$p$ and is parallel to the $i$th coordinate curve (coordinate axis). By the $i$th
coordinate curve we mean the curve which is obtained by fixing the values of all $\xi^j$
for $j \neq i$ and varying only the value of $\xi^i$. The $n$-dimensional space spanned by
the $n$ tangent vectors $e_1, \ldots, e_n$ is the tangent space $T_p$ at point $p$ (Figure 1.3).
Let $p'$ be a point “very close” to $p$, and let $[\xi^i]$ and $[\xi^i + d\xi^i]$ (where $d\xi^i$ is an
infinitesimal) be the coordinates of $p$ and $p'$, respectively. Then the segment
joining these two points may be described by $\overrightarrow{pp'} = d\xi^i e_i$, an infinitesimal vector
in $T_p$.
Figure 1.3: Tangent space.

Let us make the above concepts more precise. To do so, we must first
formally define what we mean by curves and the tangent vector of curves on a
manifold. Consider a one-to-one function $\gamma : I \to S$ from some interval $I$ ($\subset \mathbb{R}$)
to $S$. By defining $\gamma^i(t) \stackrel{\mathrm{def}}{=} \xi^i(\gamma(t))$ we may express the point $\gamma(t)$ ($t \in I$) using
coordinates as $\bar{\gamma}(t) = [\gamma^1(t), \ldots, \gamma^n(t)]$. If $\bar{\gamma}(t)$ is $C^\infty$ for $t \in I$, we call $\gamma$ a $C^\infty$
curve on $S$. This definition is independent of the choice of coordinate system.
Now, given a curve $\gamma$ and a point $\gamma(a) = p$, let us consider what is meant by
the “derivative” of $\gamma$ at $p$, or alternatively the “tangent vector” $\left(\frac{d\gamma}{dt}\right)_p = \dot{\gamma}(a)$.
When $S$ is simply an open subset of $\mathbb{R}^n$, or can be embedded smoothly into $\mathbb{R}^\ell$
($\ell \geq n$), the range of $\gamma$ is contained within a single linear space, and hence it
suffices to consider the standard derivative
\[ \dot{\gamma}(a) = \lim_{h \to 0} \frac{\gamma(a + h) - \gamma(a)}{h}. \tag{1.4} \]
In general, however, the equation above is not meaningful. On the other hand, if
we take a $C^\infty$ function $f \in \mathcal{F}$ on $S$ and consider the value of $f(\gamma(t))$ on the curve,
since this is a real-valued function, we may define the derivative $\frac{d}{dt} f(\gamma(t))$ in the
usual way. Using coordinates, we have $f(\gamma(t)) = \bar{f}(\bar{\gamma}(t)) = \bar{f}(\gamma^1(t), \ldots, \gamma^n(t))$,
and the derivatives may be rewritten as
\[ \frac{d}{dt} f(\gamma(t)) = \left(\frac{\partial \bar{f}}{\partial \xi^i}\right)_{\bar{\gamma}(t)} \frac{d\gamma^i(t)}{dt} = \left(\frac{\partial f}{\partial \xi^i}\right)_{\gamma(t)} \frac{d\gamma^i(t)}{dt}. \tag{1.5} \]
We call this the directional derivative of $f$ along the curve $\gamma$. Let us consider
this directional derivative as an expression of the tangent vector of $\gamma$. In other
words, we take the operator $\mathcal{F} \to \mathbb{R}$ which maps $f \in \mathcal{F}$ to $\frac{d}{dt} f(\gamma(t))|_{t=a}$, and
simply define the tangent vector $\left(\frac{d\gamma}{dt}\right)_p = \dot{\gamma}(a)$ to be this operator. Then we
may rewrite Equation (1.5) as
\[ \dot{\gamma}(a) = \dot{\gamma}^i(a) \left(\frac{\partial}{\partial \xi^i}\right)_p, \tag{1.6} \]
where $\dot{\gamma}^i(a) = \frac{d}{dt} \gamma^i(t)|_{t=a}$. Here $\left(\frac{\partial}{\partial \xi^i}\right)_p$ is an operator which maps $f \mapsto \left(\frac{\partial f}{\partial \xi^i}\right)_p$. It
is possible to show that when the tangent vectors can be defined using
Equation (1.4), there is a natural one-to-one correspondence between Equations (1.4)
and (1.6). Hence the definition of tangent vectors as operators may be viewed
as a generalization of Equation (1.4).
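The directional derivative of Equation (1.5) can be illustrated numerically. In the sketch below the curve `gamma_bar`, the function `f_bar`, and the finite-difference helpers are assumptions chosen for this example, with everything expressed in a single coordinate system; it compares the derivative of f along the curve with the sum of partial derivatives weighted by the components of the curve's velocity.

```python
def gamma_bar(t):
    """Coordinate expression [gamma^1(t), gamma^2(t)] of a curve (assumed)."""
    return [t, 1.0 + t * t]


def f_bar(xi):
    """A smooth function on S written in coordinates (assumed example)."""
    return xi[0] ** 2 + xi[0] * xi[1]


def derivative(func, t, h=1e-6):
    """Central-difference derivative of a scalar function of one variable."""
    return (func(t + h) - func(t - h)) / (2.0 * h)


def partial(func, xi, i, h=1e-6):
    """Central-difference partial derivative of func with respect to xi^i."""
    plus, minus = list(xi), list(xi)
    plus[i] += h
    minus[i] -= h
    return (func(plus) - func(minus)) / (2.0 * h)


a = 0.5
p = gamma_bar(a)  # coordinates of the point p = gamma(a)

# Left-hand side of Equation (1.5): d/dt f(gamma(t)) at t = a.
lhs = derivative(lambda t: f_bar(gamma_bar(t)), a)

# Right-hand side: (df/dxi^i) at gamma(a) times d gamma^i/dt at a, summed over i.
velocity = [derivative(lambda t, i=i: gamma_bar(t)[i], a) for i in range(2)]
rhs = sum(partial(f_bar, p, i) * velocity[i] for i in range(2))
print(lhs, rhs)  # both approximate 2.75 for this choice of f and gamma
```

Regarded as a rule that sends each function f to the number `rhs`, the list `velocity` plays the role of the components of the tangent vector in Equation (1.6).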
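Since a partial derivative is simply a directional derivative along a coordinate
axis, the operator $\left(\frac{\partial}{\partial \xi^i}\right)_p$ is the tangent vector at point $p$ of the $i$th coordinate
curve. The $e_i$ mentioned previously corresponds to this $\left(\frac{\partial}{\partial \xi^i}\right)_p$. From
Equation (1.3), we see that
\[ \left(\frac{\partial}{\partial \rho^j}\right)_p = \left(\frac{\partial \xi^i}{\partial \rho^j}\right)_p \left(\frac{\partial}{\partial \xi^i}\right)_p, \qquad \left(\frac{\partial}{\partial \xi^i}\right)_p = \left(\frac{\partial \rho^j}{\partial \xi^i}\right)_p \left(\frac{\partial}{\partial \rho^j}\right)_p. \tag{1.7} \]
Consider all curves which pass through the point $p$. We denote the set
of all tangent vectors corresponding to these curves by $T_p$, or $T_p(S)$. From
Equation (1.6), we see that
\[ T_p(S) = \left\{ c^i \left(\frac{\partial}{\partial \xi^i}\right)_p \,\middle|\, [c^1, \ldots, c^n] \in \mathbb{R}^n \right\}. \tag{1.8} \]
This forms a linear space, and since the operators $\left\{ \left(\frac{\partial}{\partial \xi^i}\right)_p ; i = 1, \ldots, n \right\}$ are
clearly linearly independent, the dimension of this space is $n$ ($= \dim S$). We
call $T_p(S)$ and its elements the tangent space and tangent vectors of $S$ at
the point $p$, respectively. In addition, we call $\left(\frac{\partial}{\partial \xi^i}\right)_p$ the natural basis of the
coordinate system $[\xi^i]$.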
Let $D \in T_p$ be some tangent vector. Then for all $f, g \in \mathcal{F}$ and all $a, b \in \mathbb{R}$,
$D$ satisfies the following:
[Linearity] \[ D(af + bg) = aD(f) + bD(g). \tag{1.9} \]
[Leibniz's rule] \[ D(f \cdot g) = f(p) D(g) + g(p) D(f). \tag{1.10} \]
Conversely, it can be shown that any operator $D : \mathcal{F} \to \mathbb{R}$ satisfying these
properties is an element of $T_p$. Hence, it is possible to define tangent vectors in
terms of these properties.
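To make the operator view concrete, here is a small Python sketch that builds a tangent vector as an operator on functions and checks Equations (1.9) and (1.10) numerically. The helper names, the coefficients, and the test functions `f` and `g` are assumptions for illustration, and partial derivatives are approximated by central differences.

```python
def partial(func, xi, i, h=1e-6):
    """Central-difference partial derivative of func with respect to xi^i."""
    plus, minus = list(xi), list(xi)
    plus[i] += h
    minus[i] -= h
    return (func(plus) - func(minus)) / (2.0 * h)


def tangent_vector(c, xi):
    """Return the operator D = c^i (d/dxi^i)_p at the point with coordinates xi."""
    def D(func):
        return sum(c[i] * partial(func, xi, i) for i in range(len(xi)))
    return D


def f(xi):
    return xi[0] ** 2 + xi[1]


def g(xi):
    return xi[0] * xi[1]


p = [1.0, 2.0]                      # coordinates of the point p
D = tangent_vector([3.0, -1.0], p)  # an element of T_p with c = [3, -1]
a, b = 2.0, -0.5

# Linearity, Equation (1.9).
linear_lhs = D(lambda x: a * f(x) + b * g(x))
linear_rhs = a * D(f) + b * D(g)

# Leibniz's rule, Equation (1.10).
leibniz_lhs = D(lambda x: f(x) * g(x))
leibniz_rhs = f(p) * D(g) + g(p) * D(f)
print(linear_lhs, linear_rhs, leibniz_lhs, leibniz_rhs)
```

Note that `D` never needs to know how `f` and `g` were built; it only evaluates them near p, which is why the two properties characterize tangent vectors without reference to curves.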
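Let $\lambda : S \to Q$ be a smooth mapping from a manifold $S$ to another manifold
$Q$. Given a tangent vector $D \in T_p(S)$ of $S$, the mapping $D' : \mathcal{F}(Q) \to \mathbb{R}$
defined by $D'(f) = D(f \circ \lambda)$ satisfies Equations (1.9) and (1.10) with $p$ replaced
with $\lambda(p)$, and hence $D'$ belongs to $T_{\lambda(p)}(Q)$. Representing this correspondence
as $D' = (d\lambda)_p(D)$, we may define a linear mapping $(d\lambda)_p : T_p(S) \to T_{\lambda(p)}(Q)$,
which is called the differential of $\lambda$ at $p$. When $S$ and $Q$ are provided with
coordinate systems $[\xi^i]$ and $[\rho^j]$ respectively, we have
\[ (d\lambda)_p \left( \left(\frac{\partial}{\partial \xi^i}\right)_p \right) = \left(\frac{\partial (\rho^j \circ \lambda)}{\partial \xi^i}\right)_p \left(\frac{\partial}{\partial \rho^j}\right)_{\lambda(p)}. \tag{1.11} \]
Moreover, for any curve $\gamma(t)$ on $S$ passing through the point $p$ it follows that
\[ (d\lambda)_p \left( \left(\frac{d\gamma}{dt}\right)_p \right) = \left(\frac{d(\lambda \circ \gamma)}{dt}\right)_{\lambda(p)}. \tag{1.12} \]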
1.3 Vector fields and tensor fields
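Let $X : p \mapsto X_p$ be a mapping which maps each point $p$ in the manifold $S$
to a tangent vector $X_p \in T_p(S)$. We call such a mapping a vector field. For
example, if $[\xi^i]$ is a coordinate system, then we may define $n$ vector fields through
the mappings $\frac{\partial}{\partial \xi^i} : p \mapsto \left(\frac{\partial}{\partial \xi^i}\right)_p$ ($i = 1, \ldots, n$). These are the vector fields formed
by the natural basis. Below, we shall write $\partial_i$ to mean $\frac{\partial}{\partial \xi^i}$. In general, given
a vector field $X$, for each point $p$ there exist $n$ real numbers $\{X^1_p, \ldots, X^n_p\}$
which uniquely determine $X_p = X^i_p (\partial_i)_p$. Hence we may define the functions
$X^i : p \mapsto X^i_p$ on $S$. We call the $n$ functions $\{X^1, \ldots, X^n\}$ the components
of $X$ with respect to $[\xi^i]$. This allows us to write $X = X^i \partial_i$. If, in addition,
we let $[\rho^j]$ be another coordinate system and $X = \tilde{X}^j \tilde{\partial}_j$ ($\tilde{\partial}_j \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial \rho^j}$) be the
component expression of $X$ with respect to $[\rho^j]$, then the following hold:
\[ \tilde{X}^j = X^i \frac{\partial \rho^j}{\partial \xi^i} \quad \text{and} \quad X^i = \tilde{X}^j \frac{\partial \xi^i}{\partial \rho^j}. \tag{1.13} \]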
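The transformation rule (1.13) can be checked on a small example. The following Python sketch is an illustration under stated assumptions: the two coordinate systems ([μ, σ] and [μ/σ², -1/(2σ²)]), the components `X`, and the test function `f_bar` are all hypothetical choices, and derivatives are approximated by central differences. It converts the components of a tangent vector from one coordinate system to the other and confirms that the vector, viewed as an operator on functions, gives the same value either way.

```python
def xi_to_rho(xi):
    """Assumed transformation [mu, sigma] -> [mu/sigma^2, -1/(2 sigma^2)]."""
    var = xi[1] * xi[1]
    return [xi[0] / var, -1.0 / (2.0 * var)]


def rho_to_xi(rho):
    """Inverse transformation; requires rho[1] < 0."""
    var = -1.0 / (2.0 * rho[1])
    return [rho[0] * var, var ** 0.5]


def partial(func, x, i, h=1e-6):
    """Central-difference partial derivative of a scalar func with respect to x^i."""
    plus, minus = list(x), list(x)
    plus[i] += h
    minus[i] -= h
    return (func(plus) - func(minus)) / (2.0 * h)


xi = [1.0, 2.0]
rho = xi_to_rho(xi)
X = [3.0, -1.0]  # components X^i with respect to [xi^i] at this point


def d_rho_d_xi(j, i):
    return partial(lambda x: xi_to_rho(x)[j], xi, i)


def d_xi_d_rho(i, j):
    return partial(lambda r: rho_to_xi(r)[i], rho, j)


# Equation (1.13): components with respect to [rho^j], and back again.
X_tilde = [sum(X[i] * d_rho_d_xi(j, i) for i in range(2)) for j in range(2)]
X_back = [sum(X_tilde[j] * d_xi_d_rho(i, j) for j in range(2)) for i in range(2)]


def f_bar(x):
    """A function written in the xi coordinates (assumed example)."""
    return x[0] ** 2 + x[1]


def f_tilde(r):
    """The same function written in the rho coordinates."""
    return f_bar(rho_to_xi(r))


# The value X(f) should not depend on the coordinate system used to compute it.
Xf_in_xi = sum(X[i] * partial(f_bar, xi, i) for i in range(2))
Xf_in_rho = sum(X_tilde[j] * partial(f_tilde, rho, j) for j in range(2))
print(X_tilde, X_back, Xf_in_xi, Xf_in_rho)
```

The components change from `X` to `X_tilde`, but the two evaluations of X(f) agree up to discretization error, which is what makes the component rule consistent with the operator definition.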
If the components of a vector field are $C^\infty$ with respect to some coordinate
system, then the components are $C^\infty$ with respect to any other. We call such
a vector field a $C^\infty$ vector field. Since we consider only $C^\infty$ vector fields in this
book, we shall refer to them as simply vector fields. We shall denote this family
of vector fields by $\mathcal{T}(S)$, or simply $\mathcal{T}$. Clearly $\partial_i \in \mathcal{T}$ ($i = 1, \ldots, n$).
Now for any $X, Y \in \mathcal{T}$ and any $c \in \mathbb{R}$, the mappings $X + Y : p \mapsto X_p + Y_p$
and $cX : p \mapsto cX_p$ are also members of $\mathcal{T}$. Hence $\mathcal{T}$ is a linear space. In
addition, for any $f \in \mathcal{F}$, the mapping $fX : p \mapsto f(p) X_p$ is a member of $\mathcal{T}$.
We call $F : V_1 \times V_2 \times \cdots \times V_r \to W$, where $V_1, \ldots, V_r, W$ are linear spaces,
a multilinear mapping if the following property holds. Let $F(v_i)$ denote a
mapping of one variable equal to $F(v_1, \ldots, v_r)$ where some $v_i$ has been
distinguished as the variable, and the other $v_j$ ($j \neq i$) are held constant to some value
($\in V_j$). Then $F : v_i \mapsto F(v_i)$ is a linear mapping from $V_i$ to $W$.
Now for each point $p \in S$, let $[T_p]^0_r$ denote the family of multilinear mappings
of the form $\underbrace{T_p \times \cdots \times T_p}_{r \text{ direct products}} \to \mathbb{R}$, and let $[T_p]^1_r$ denote the family of the form
$\underbrace{T_p \times \cdots \times T_p}_{r \text{ direct products}} \to T_p$. We call a mapping $A : p \mapsto A_p$ which maps each point
$p$ in $S$ to some element $A_p$ of $[T_p]^q_r$ ($q = 0, 1$) a tensor field of type $(q, r)$
on $S$. The types $(0, r)$ and $(1, r)$ are also respectively called tensor fields
of covariant degree $r$ and tensor fields of contravariant degree 1 and
covariant degree $r$. Vector fields may be considered to be tensor fields of type
$(1, 0)$. Although it is possible to define tensor fields of type $(q, r)$ for $q = 2, 3, \ldots$,
they will not be used in this book. In addition, we shall occasionally refer to
tensor fields as simply tensors.
Let $A$ be a tensor field of type $(q, r)$ and $X_1, \ldots, X_r$ be $r$ vector fields. Then
we may consider a mapping with domain $S$ of the following form:
\[ A(X_1, \ldots, X_r) : p \mapsto A_p((X_1)_p, \ldots, (X_r)_p). \tag{1.14} \]
When $q = 0$, $A_p((X_1)_p, \ldots, (X_r)_p) \in \mathbb{R}$ and hence this mapping is a real-valued
function on $S$. When $q = 1$, $A_p((X_1)_p, \ldots, (X_r)_p) \in T_p$, and hence this defines
a vector field on $S$. Given $A$, if for all $C^\infty$ vector fields $X_1, \ldots, X_r \in \mathcal{T}$ the
mapping $A(X_1, \ldots, X_r)$ is $C^\infty$ (i.e., when $q = 0$ the mapping is in $\mathcal{F}$, and when
$q = 1$ it is in $\mathcal{T}$), we call $A$ a $C^\infty$ tensor field. Below, we consider only $C^\infty$
tensor fields, and shall simply call them tensor fields.
Consider the tensor field $A$ of type $(q, r)$ to be a mapping $(X_1, \ldots, X_r) \mapsto
A(X_1, \ldots, X_r)$. Then when $q = 0$ we have $A : \underbrace{\mathcal{T} \times \cdots \times \mathcal{T}}_{r \text{ direct products}} \to \mathcal{F}$, and when
$q = 1$ we have $A : \underbrace{\mathcal{T} \times \cdots \times \mathcal{T}}_{r \text{ direct products}} \to \mathcal{T}$. This, in addition to forming a multilinear
mapping, has the following property: for all $f_1, \ldots, f_r \in \mathcal{F}$,
\[ A(f_1 X_1, \ldots, f_r X_r) = f_1 \cdots f_r \, A(X_1, \ldots, X_r). \]
We call this the $\mathcal{F}$-multilinearity of $A$. Conversely, if the mapping $A : \mathcal{T} \times
\cdots \times \mathcal{T} \to \mathcal{F}$, or alternatively $A : \mathcal{T} \times \cdots \times \mathcal{T} \to \mathcal{T}$, is $\mathcal{F}$-multilinear, then this
determines a tensor field $p \mapsto A_p$ satisfying Equation (1.14).
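The value of a tensor field $A$ of type $(0, r)$ on the $r$ basis vector fields
$\partial_{i_1}, \ldots, \partial_{i_r}$ defines a function. Let us denote this by
$$A(\partial_{i_1}, \ldots, \partial_{i_r}) = A_{i_1 \cdots i_r}.$$
We call the $n^r$ functions $\{A_{i_1 \cdots i_r}\}$ obtained by changing the values of $i_1, \ldots, i_r$
the components of $A$ with respect to the coordinate system $[\xi^i]$. Let $X_1, \ldots, X_r$
be $r$ vector fields; these may be expressed component-wise as $X_j = X_j^i \partial_i$. Then
from $\mathcal{F}$-multilinearity, we have
$$A(X_1, \ldots, X_r) = A_{i_1 \cdots i_r} X_1^{i_1} \cdots X_r^{i_r}.$$
In the case of a tensor field $A$ of type $(1, r)$, $A(\partial_{i_1}, \ldots, \partial_{i_r})$ is a vector field, and
its component expression is given by
$$A(\partial_{i_1}, \ldots, \partial_{i_r}) = A^k_{i_1 \cdots i_r} \partial_k.$$
The $n^{r+1}$ functions $\{A^k_{i_1 \cdots i_r}\}$ thus defined are called the components of $A$ with
respect to $[\xi^i]$. As in the previous case, letting $X_j = X_j^i \partial_i$, the following holds:
$$A(X_1, \ldots, X_r) = \left(A^k_{i_1 \cdots i_r} X_1^{i_1} \cdots X_r^{i_r}\right) \partial_k.$$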
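Let $[\rho^r]$ be another coordinate system. Using $\tilde{\ }$ to denote components with
respect to $[\rho^r]$, we have
$$\tilde{A}_{j_1 \cdots j_r} = A_{i_1 \cdots i_r} \left(\frac{\partial \xi^{i_1}}{\partial \rho^{j_1}}\right) \cdots \left(\frac{\partial \xi^{i_r}}{\partial \rho^{j_r}}\right) \tag{1.15}$$
$$\tilde{A}^l_{j_1 \cdots j_r} = A^k_{i_1 \cdots i_r} \left(\frac{\partial \xi^{i_1}}{\partial \rho^{j_1}}\right) \cdots \left(\frac{\partial \xi^{i_r}}{\partial \rho^{j_r}}\right) \left(\frac{\partial \rho^l}{\partial \xi^k}\right)$$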
1.4 Submanifolds
Let $S$ and $M$ be manifolds, where $M$ is a subset of $S$. Let $[\xi^1, \ldots, \xi^n]$
and $[u^1, \ldots, u^m] = [u^a]$ be coordinate systems for $S$ and $M$, respectively, where
$n = \dim S$ and $m = \dim M$. Below, we shall use the indices $i, j, k, \ldots$ over
$\{1, \ldots, n\}$ for $S$ and $a, b, c, \ldots$ over $\{1, \ldots, m\}$ for $M$.
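We call $M$ a submanifold of $S$ if the following conditions (i), (ii), and (iii)
hold.
(i) The restriction $\xi^i|_M$ of each $\xi^i$ ($: S \to \mathbb{R}$) to $M$ is a $C^\infty$ function on $M$.
(ii) Let $B^i_a \stackrel{\mathrm{def}}{=} \left(\frac{\partial \xi^i}{\partial u^a}\right)_p$ (more precisely, $\left(\frac{\partial \xi^i|_M}{\partial u^a}\right)_p$) and $B_a \stackrel{\mathrm{def}}{=} [B^1_a, \ldots, B^n_a] \in \mathbb{R}^n$.
Then for each point $p$ in $M$, $\{B_1, \ldots, B_m\}$ are linearly independent
(hence $m \le n$).
(iii) For any open subset $W$ of $M$, there exists $U$, an open subset of $S$, such
that $W = M \cap U$.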
These conditions are independent of the choice of coordinate systems $[\xi^i]$ and
$[u^a]$. Indeed, conditions (i) and (ii) mean that the embedding $\iota : M \to S$
defined by $\iota(p) = p$, $\forall p \in M$, is a $C^\infty$ mapping and that its differential $(d\iota)_p$ is
nondegenerate at each point $p$.
An open subset of $S$, as we noted in §1.1, forms an $n$-dimensional manifold;
in addition, it is also a submanifold of $S$. We may construct an example of a
submanifold of dimension $m$ ($< n$) in the following way. Let $[\xi^i]$ be a coordinate
system of $S$ and $\{c^{m+1}, \ldots, c^n\}$ be $n - m$ real numbers. Now define
$$M \stackrel{\mathrm{def}}{=} \{\, p \in S \mid \xi^i(p) = c^i,\ m+1 \le i \le n \,\}$$
0 (1.21)
Note that $\langle\ ,\ \rangle_p \in [T_p(S)]^0_2$ since from Equations (1.19) and (1.20) we see that
$\langle\ ,\ \rangle_p$ is a bilinear form. Hence the mapping from points in $S$ to their inner
product on $T_p(S)$, say $g : p \mapsto \langle\ ,\ \rangle_p$, is a tensor field of covariant degree 2. We
call this a ($C^\infty$) Riemannian metric on $S$. Such a metric, $g$, is not naturally
determined by the structure of $S$ as a manifold; it is possible to consider an
infinite number of Riemannian metrics on $S$. Given a Riemannian metric $g$ on
$S$, we call $S$ (more precisely $(S, g)$) a Riemannian manifold.
Let $[\xi^i]$ be a coordinate system for $S$, and let $\partial_i \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial \xi^i}$. Then the components
$\{g_{ij};\ i, j = 1, \ldots, n\}$ ($n = \dim S$) of a Riemannian metric $g$ with respect
to $[\xi^i]$ are determined by $g_{ij} = \langle \partial_i, \partial_j \rangle$. This is a $C^\infty$ function which maps
each point $p$ in $S$ to $g_{ij}(p) = \langle (\partial_i)_p, (\partial_j)_p \rangle_p$. If we rewrite the tangent vectors
$D, D' \in T_p$ in terms of their coordinates as $D = D^i (\partial_i)_p$ and $D' = D'^i (\partial_i)_p$,
their inner product may then be written as:
$$\langle D, D' \rangle_p = g_{ij}(p)\, D^i D'^j.$$
Also, the length $\|D\|$ of the tangent vector $D$ is given by
$$\|D\|^2 = \langle D, D \rangle_p = g_{ij}(p)\, D^i D^j.$$
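If we let $G(p) = [g_{ij}(p)]$ be an $n \times n$ matrix whose $(i, j)$th element is $g_{ij}(p)$,
we see from Equations (1.20) and (1.21) that this is a positive definite symmetric
matrix. Conversely, suppose we are given a coordinate system $[\xi^i]$ for
an $n$-dimensional manifold $S$, and $n^2$ $C^\infty$ functions $\{g_{ij}\}$ ($\subset \mathcal{F}(S)$). Then if
$G(p) = [g_{ij}(p)]$ is a positive definite symmetric matrix for every point $p \in S$,
the corresponding Riemannian metric on $S$ which has $g_{ij}$ as its components
with respect to $[\xi^i]$ is uniquely determined. The relationship between these
components and the components $\tilde{g}_{kl} = \langle \tilde\partial_k, \tilde\partial_l \rangle$ ($\tilde\partial_k \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial \rho^k}$) with respect to
a different coordinate system $[\rho^k]$ is given by the following transformations of
covariant tensor fields of order 2 (refer to Equation (1.15)):
$$\tilde{g}_{kl} = g_{ij} \left(\frac{\partial \xi^i}{\partial \rho^k}\right) \left(\frac{\partial \xi^j}{\partial \rho^l}\right) \quad \text{and} \quad g_{ij} = \tilde{g}_{kl} \left(\frac{\partial \rho^k}{\partial \xi^i}\right) \left(\frac{\partial \rho^l}{\partial \xi^j}\right). \tag{1.22}$$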
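As a small numerical illustration of the transformation rule in Equation (1.22), the following sketch takes the Euclidean plane as an assumed example, with Cartesian coordinates $(x, y)$ in the role of $[\xi^i]$ and polar coordinates $(r, \theta)$ in the role of $[\rho^k]$. The function names are hypothetical and chosen only for this illustration.

```python
import math

def jacobian_cartesian_wrt_polar(r, theta):
    """J[i][k] = d(xi^i)/d(rho^k) for xi = (x, y), rho = (r, theta),
    where x = r cos(theta) and y = r sin(theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def transform_metric(g, J):
    """Apply g~_{kl} = g_{ij} J[i][k] J[j][l] (first half of Eq. (1.22))."""
    n = len(g)
    return [[sum(g[i][j] * J[i][k] * J[j][l]
                 for i in range(n) for j in range(n))
             for l in range(n)]
            for k in range(n)]

# Euclidean metric in Cartesian coordinates (assumed example metric).
g_cartesian = [[1.0, 0.0], [0.0, 1.0]]

# Components with respect to polar coordinates at the point r = 2, theta = 0.7.
g_polar = transform_metric(g_cartesian, jacobian_cartesian_wrt_polar(2.0, 0.7))
```

Under these assumptions the polar components come out as the diagonal matrix with entries $1$ and $r^2$, which is still positive definite and symmetric, as the text requires of $G(p)$ in any coordinate system.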
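Let $g^{ij}(p)$ be the $(i, j)$th component of the inverse $G(p)^{-1}$ of $G(p) = [g_{ij}(p)]$
(this inverse is also positive definite symmetric). Now define the function $g^{ij} : p \mapsto g^{ij}(p)$ on $S$. Then
$$g_{ij}\, g^{jk} = \delta_i^k = \begin{cases} 1 & (k = i) \\ 0 & (k \ne i) \end{cases}, \tag{1.23}$$
and the relationship between this inverse and $\tilde{G}(p)^{-1} = [\tilde{g}^{kl}(p)]$, which is the
inverse of $\tilde{G}(p) = [\tilde{g}_{kl}(p)]$, is given by the following.
$$\tilde{g}^{kl} = g^{ij} \left(\frac{\partial \rho^k}{\partial \xi^i}\right) \left(\frac{\partial \rho^l}{\partial \xi^j}\right) \quad \text{and} \quad g^{ij} = \tilde{g}^{kl} \left(\frac{\partial \xi^i}{\partial \rho^k}\right) \left(\frac{\partial \xi^j}{\partial \rho^l}\right). \tag{1.24}$$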
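Let $\gamma : [a, b] \to S$ be a curve in the Riemannian manifold $S$. We define its
length $\|\gamma\|$ to be
$$\|\gamma\| \stackrel{\mathrm{def}}{=} \int_a^b \left\| \frac{d\gamma}{dt} \right\| dt = \int_a^b \sqrt{g_{ij}\, \dot\gamma^i \dot\gamma^j}\; dt,$$
where $\dot\gamma^i$ is the derivative of $\gamma^i \stackrel{\mathrm{def}}{=} \xi^i \circ \gamma$ (see Equation (1.6)).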
Let $M$ be a submanifold of a Riemannian manifold $S$. As noted in §1.4, for
each point $p \in M$, we may view $T_p(M)$ as a linear subspace of $T_p(S)$, and hence
an inner product $g(p) = \langle\ ,\ \rangle_p$ on $T_p(S)$ naturally defines an inner product on
$T_p(M)$. Then, letting $g|_M(p)$ denote this inner product, $g|_M : p \mapsto g|_M(p)$ is a
Riemannian metric on $M$. Given a coordinate system $[u^a]$ on $M$, we see from
Equation (1.18) that the components $\{g_{ab}\}$ of $g|_M$ satisfy
$$g_{ab} = g_{ij} \left(\frac{\partial \xi^i}{\partial u^a}\right) \left(\frac{\partial \xi^j}{\partial u^b}\right). \tag{1.25}$$
1.6 Affine connections and covariant derivatives
Let $S$ be an $n$-dimensional manifold. If $S$ is an open subset of $\mathbb{R}^n$, then by
defining the tangent vector of a curve $\gamma$ according to Equation (1.4), the tangent
space $T_p = T_p(S)$ at each point $p \in S$ may be considered equivalent to $\mathbb{R}^n$. This
means that for $p$ and $q$ not equal, there is still a natural correspondence between
$T_p$ and $T_q$. For a general manifold $S$, however, $T_p$ and $T_q$ are entirely different
spaces when $p \ne q$. Hence, to consider relationships between $T_p$ and $T_q$, we
must somehow augment the structure of $S$ as a manifold. Affine connections
are such a structural augmentation.
Intuitively, defining an affine connection on a manifold $S$ means that for
each point $p$ in $S$ and its "neighbor" $p'$, we define a linear one-to-one mapping
between $T_p$ and $T_{p'}$. Here we call $p'$ a neighbor of $p$ if, given a coordinate system
$[\xi^i]$ of $S$, the difference between the coordinates of $p$ and $p'$, $d\xi^i \stackrel{\mathrm{def}}{=} \xi^i(p') - \xi^i(p)$,
when construed as a first-order infinitesimal, is sufficiently small that we may
ignore the second-order infinitesimals $(d\xi^i)(d\xi^j)$. Below we shall introduce the
notion of affine connections in an intuitive manner using infinitesimals. (It is
possible to formalize this discussion by using fiber bundles.)
As shown in Figure 1.4, in order to establish a linear mapping $\Pi_{p,p'}$ between
$T_p$ and $T_{p'}$ we must specify, for each $j \in \{1, \ldots, n\}$, how to express $\Pi_{p,p'}((\partial_j)_p)$
in terms of a linear combination of $\{(\partial_1)_{p'}, \ldots, (\partial_n)_{p'}\}$. Let us
assume that the difference between $\Pi_{p,p'}((\partial_j)_p)$ and $(\partial_j)_{p'}$ is an infinitesimal,
and that it may be expressed as a linear combination of $\{d\xi^1, \ldots, d\xi^n\}$. Then
we have
$$\Pi_{p,p'}((\partial_j)_p) = (\partial_j)_{p'} - d\xi^i\, (\Gamma^k_{ij})_p\, (\partial_k)_{p'}, \tag{1.27}$$
where $\{(\Gamma^k_{ij})_p;\ i, j, k = 1, \ldots, n\}$ are $n^3$ real numbers which depend on the point
$p$.
If for each pair of neighboring points $p$ and $p'$ in $S$, there is defined a linear
mapping $\Pi_{p,p'} : T_p \to T_{p'}$ of the form described in Equation (1.27), and if the
$n^3$ functions $\Gamma^k_{ij} : p \mapsto (\Gamma^k_{ij})_p$ are all $C^\infty$, then we say that we have introduced
an affine connection on $S$. In addition, we call $\{\Gamma^k_{ij}\}$ the connection
coefficients of the affine connection with respect to the coordinate system $[\xi^i]$. Note
that the only constraint on the connection coefficients is that they be $C^\infty$, and
that therefore affine connections have this degree of freedom. Below, we often
refer to affine connections as simply connections.
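Let $[\rho^r] = [\rho^1, \ldots, \rho^n]$ be a coordinate system distinct from $[\xi^i]$, and let
$\tilde\partial_r \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial \rho^r} = \frac{\partial \xi^i}{\partial \rho^r} \partial_i$. From Equation (1.27) and the linearity of $\Pi_{p,p'}$ we have
$$\Pi_{p,p'}((\tilde\partial_s)_p) = \left(\frac{\partial \xi^j}{\partial \rho^s}\right)_p \left\{ (\partial_j)_{p'} - d\xi^i\, (\Gamma^k_{ij})_p\, (\partial_k)_{p'} \right\}.$$
By substituting into the right hand side of this equation
$$\left(\frac{\partial \xi^j}{\partial \rho^s}\right)_p = \left(\frac{\partial \xi^j}{\partial \rho^s}\right)_{p'} - \left(\frac{\partial^2 \xi^j}{\partial \rho^r \partial \rho^s}\right)_{p'} d\rho^r \quad \text{and} \quad d\xi^i = \left(\frac{\partial \xi^i}{\partial \rho^r}\right)_p d\rho^r \quad \left(d\rho^r \stackrel{\mathrm{def}}{=} \rho^r(p') - \rho^r(p)\right),$$
together with $(\partial_k)_{p'} = \left(\frac{\partial \rho^t}{\partial \xi^k}\right)_{p'} (\tilde\partial_t)_{p'}$,
and ignoring second order infinitesimals, we obtain
$$\Pi_{p,p'}((\tilde\partial_s)_p) = (\tilde\partial_s)_{p'} - d\rho^r\, (\tilde\Gamma^t_{rs})_p\, (\tilde\partial_t)_{p'}, \tag{1.28}$$
where $(\tilde\Gamma^t_{rs})_p$ is the value of the function
$$\tilde\Gamma^t_{rs} = \left\{ \Gamma^k_{ij} \frac{\partial \xi^i}{\partial \rho^r} \frac{\partial \xi^j}{\partial \rho^s} + \frac{\partial^2 \xi^k}{\partial \rho^r \partial \rho^s} \right\} \frac{\partial \rho^t}{\partial \xi^k} \tag{1.29}$$
at the point $p$. Note that Equations (1.27) and (1.28) are of the same form.
Furthermore, if the functions $\Gamma^k_{ij}$ are $C^\infty$ for all $(i, j, k)$ then so are the functions
$\tilde\Gamma^t_{rs}$ for all $(r, s, t)$. In other words, the notion of affine connections is independent
of the choice of coordinate system. Their connection coefficients, however, are
related according to Equation (1.29).
Figure 1.4: Affine connection (an infinitesimal translation)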
An affine connection determines, for neighboring points $p$ and $p'$, a
correspondence between $T_p$ and $T_{p'}$. By connecting a sequence of such
correspondences, we may find, for non-neighboring points $p$ and $q$, a correspondence
between $T_p$ and $T_q$. This correspondence depends, however, on the curve by
which one connects $p$ and $q$. Let us define the notion of "translating tangent
vectors along a curve" in the following way.
Let $\gamma : [a, b] \to S$, where $\gamma(a) = p$ and $\gamma(b) = q$, be a curve which connects
points $p$ and $q$ in $S$. We call a mapping from each point $\gamma(t)$ to a tangent
vector $X(t) \in T_{\gamma(t)}$, say $X : t \mapsto X(t)$, a vector field along $\gamma$. Given such
a vector field $X$, if, for all $t \in [a, b]$ and the corresponding infinitesimal $dt$, the
corresponding tangent vectors are linearly related as specified by the connection,
i.e., if
$$X(t + dt) = \Pi_{\gamma(t), \gamma(t+dt)}(X(t)), \tag{1.30}$$
then we say that $X$ is parallel along $\gamma$ (Figure 1.5).
Figure 1.5: Translation of a tangent vector along a curve
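Let us rewrite the equation above with respect to the coordinate system $[\xi^i]$.
Letting $\partial_i = \frac{\partial}{\partial \xi^i}$, we have $X(t) = X^i(t) (\partial_i)_{\gamma(t)}$. From Equation (1.27) we have
that
$$\Pi_{\gamma(t), \gamma(t+dt)}(X(t)) = \left\{ X^k(t) - dt\, \dot\gamma^i(t) X^j(t) (\Gamma^k_{ij})_{\gamma(t)} \right\} (\partial_k)_{\gamma(t+dt)}, \tag{1.31}$$
where $\gamma^i \stackrel{\mathrm{def}}{=} \xi^i \circ \gamma$, and $\dot\gamma^i(t)$ is its derivative with respect to $t$. Now since in
addition $X(t + dt) = X^k(t + dt) (\partial_k)_{\gamma(t+dt)}$, substituting this into Equation (1.30)
we obtain
$$\dot X^k(t) + \dot\gamma^i(t) X^j(t) (\Gamma^k_{ij})_{\gamma(t)} = 0, \tag{1.32}$$
where $\dot X^k(t) \stackrel{\mathrm{def}}{=} \frac{d}{dt} X^k(t) = \frac{X^k(t + dt) - X^k(t)}{dt}$. Equation (1.32) is an ordinary linear
differential equation on $X^1(t), \ldots, X^n(t)$, and hence given an initial (boundary)
condition there exists a unique solution. From this, given $D \in T_{\gamma(a)} = T_p$, we
see that there exists a unique parallel vector field along $\gamma$ such that $X(a) = D$.
Then letting $\Pi_\gamma(D)$ denote the vector $X(b) \in T_{\gamma(b)} = T_q$ determined by $D$, we
see that $\Pi_\gamma$ is a linear isomorphism from $T_p$ to $T_q$. We call $\Pi_\gamma$ the parallel
translation along $\gamma$.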
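The parallel translation $\Pi_\gamma$ can be approximated by integrating Equation (1.32) numerically. The sketch below is only an illustration under stated assumptions: it uses the Euclidean plane in polar coordinates $(r, \theta)$ with the standard connection coefficients of the flat connection in those coordinates, a circle of radius 2 as the curve, and a classical fourth-order Runge-Kutta integrator. All function names are hypothetical.

```python
import math

def christoffel_polar(r, theta):
    """G[k][i][j] = Gamma^k_{ij} of the flat plane in polar coordinates
    (index 0 = r, index 1 = theta). Assumed example connection."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r        # Gamma^r_{theta theta}
    G[1][0][1] = 1.0 / r   # Gamma^theta_{r theta}
    G[1][1][0] = 1.0 / r   # Gamma^theta_{theta r}
    return G

def parallel_translate(curve, velocity, gamma, X0, a, b, steps=2000):
    """Integrate dX^k/dt = -gammadot^i X^j Gamma^k_{ij} (Eq. (1.32)) with RK4."""
    n = len(X0)

    def rhs(t, X):
        G = gamma(*curve(t))
        v = velocity(t)
        return [-sum(v[i] * X[j] * G[k][i][j]
                     for i in range(n) for j in range(n))
                for k in range(n)]

    h = (b - a) / steps
    X, t = list(X0), a
    for _ in range(steps):
        k1 = rhs(t, X)
        k2 = rhs(t + h / 2, [x + h / 2 * d for x, d in zip(X, k1)])
        k3 = rhs(t + h / 2, [x + h / 2 * d for x, d in zip(X, k2)])
        k4 = rhs(t + h, [x + h * d for x, d in zip(X, k3)])
        X = [x + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
             for x, d1, d2, d3, d4 in zip(X, k1, k2, k3, k4)]
        t += h
    return X

# Circle r = 2, theta = t; start with the radial basis vector at theta = 0.
circle = lambda t: (2.0, t)
circle_velocity = lambda t: (0.0, 1.0)
X_quarter = parallel_translate(circle, circle_velocity, christoffel_polar,
                               [1.0, 0.0], 0.0, math.pi / 2)
X_full = parallel_translate(circle, circle_velocity, christoffel_polar,
                            [1.0, 0.0], 0.0, 2 * math.pi)
```

In this example the exact solution is $X^r(t) = \cos t$, $X^\theta(t) = -\sin(t)/2$, so the translated vector returns to its initial value after a full turn. This reflects the flatness of the chosen connection and would not be expected for a general connection.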
Let $\gamma : [a, b] \to S$ be a curve and $X$ be a vector field along $\gamma$. In general,
$X(t)$ and $X(t + h)$ lie in different tangent spaces and hence it is not possible
to consider the derivative $\frac{dX(t)}{dt} = \lim_{h \to 0} \frac{X(t+h) - X(t)}{h}$. However, if an affine
connection is given on $S$, then the parallel translation of $X(t + h) \in T_{\gamma(t+h)}$ to
the space $T_{\gamma(t)}$ along $\gamma$ gives us $X_t(t + h) = \Pi_{\gamma(t+h), \gamma(t)}(X(t + h))$, and using
this we may consider within $T_{\gamma(t)}$ the quantity $\lim_{h \to 0} \frac{X_t(t+h) - X(t)}{h}$. We call
this the covariant derivative of $X(t)$, and denote it by $\frac{\delta X(t)}{dt}$. In other words,
instead of $dX(t) = X(t + dt) - X(t)$, we use
$$\delta X(t) = \Pi_{\gamma(t+dt), \gamma(t)}(X(t + dt)) - X(t) \tag{1.33}$$
(see Figure 1.6).
Figure 1.6: The covariant derivative along a curve.
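Rewriting $X(t)$ as $X^i(t) (\partial_i)_{\gamma(t)}$, we have
$$\Pi_{\gamma(t+dt), \gamma(t)}(X(t + dt)) = \left\{ X^k(t + dt) + dt\, \dot\gamma^i(t) X^j(t) (\Gamma^k_{ij})_{\gamma(t)} \right\} (\partial_k)_{\gamma(t)} \tag{1.34}$$
and substituting this into Equation (1.33), we obtain
$$\frac{\delta X(t)}{dt} = \left\{ \dot X^k(t) + \dot\gamma^i(t) X^j(t) (\Gamma^k_{ij})_{\gamma(t)} \right\} (\partial_k)_{\gamma(t)}. \tag{1.35}$$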
This also forms a vector field along $\gamma$. In addition, we see that the parallel
translation condition in Equation (1.32) may now be written simply as $\frac{\delta X(t)}{dt} = 0$.
In this way, using an affine connection it is possible to define the infinitesimal
$\delta X$ and the derivative $\frac{\delta X}{dt}$ of a vector field $X(t)$ along a curve. Extending this to
"the directional derivative of a vector field $X = X^i \partial_i \in \mathcal{T}$ on $S$ along a tangent
vector $D = D^i (\partial_i)_p \in T_p$" is straightforward as follows: consider a curve whose
tangent vector at the point $p$ is $D$, and by taking the covariant derivative of $X$
along this curve we obtain
$$\nabla_D X = D^i \left\{ (\partial_i X^k)_p + X^j_p (\Gamma^k_{ij})_p \right\} (\partial_k)_p \in T_p(S). \tag{1.36}$$
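In fact, letting $X_\gamma : t \mapsto X_{\gamma(t)}$ for an arbitrary curve $\gamma$, we have from
Equations (1.35) and (1.36) that
$$\frac{\delta X_\gamma(t)}{dt} = \nabla_{\dot\gamma(t)} X. \tag{1.37}$$
We may also define for each $X, Y \in \mathcal{T}(S)$ the vector field $\nabla_X Y \in \mathcal{T}(S)$ by
$(\nabla_X Y)_p = \nabla_{X_p} Y \in T_p(S)$. We call this the covariant derivative of $Y$ with
respect to $X$. Given $X = X^i \partial_i$ and $Y = Y^i \partial_i$, we may write
$$\nabla_X Y = X^i \left\{ \partial_i Y^k + Y^j \Gamma^k_{ij} \right\} \partial_k. \tag{1.38}$$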
In particular, when $X = \partial_i$ and $Y = \partial_j$, we obtain the component expression
of the covariant derivative
$$\nabla_{\partial_i} \partial_j = \Gamma^k_{ij} \partial_k. \tag{1.39}$$
This may be construed as the vector field which expresses the change in the
basis vector $\partial_j$ as it is moved in the direction of $\partial_i$.
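The operator $\nabla : \mathcal{T} \times \mathcal{T} \to \mathcal{T}$ which maps $(X, Y)$ to $\nabla_X Y$ satisfies the
following properties: for arbitrary $X, Y, Z \in \mathcal{T}$ and $f \in \mathcal{F}$ (the set of $C^\infty$
functions on $S$),
(i) $\nabla_{X+Y} Z = \nabla_X Z + \nabla_Y Z$.
(ii) $\nabla_X (Y + Z) = \nabla_X Y + \nabla_X Z$.
(iii) $\nabla_{fX} Y = f \nabla_X Y$.
(iv) $\nabla_X (fY) = f \nabla_X Y + (Xf) Y$.
Here, $Xf$ denotes the function $p \mapsto X_p f$ ($\in \mathcal{F}$). Note that $\nabla_X Y$ is $\mathcal{F}$-linear
with respect to $X$, but not with respect to $Y$, and hence $\nabla$ is not a tensor field.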
In fact, it is possible to consider the conditions (i)-(iv) as the defining
properties of affine connections. In other words, we may define an affine
connection on $S$ to be a mapping $\nabla : \mathcal{T}(S) \times \mathcal{T}(S) \to \mathcal{T}(S)$ which satisfies
conditions (i)-(iv). In addition, we may define the connection coefficients $\{\Gamma^k_{ij}\}$ of
$\nabla$ with respect to some coordinate system $[\xi^i]$ to be the $n^3$ functions
determined by Equation (1.39). Then it is possible to prove Equations (1.38) and
(1.29) from conditions (i)-(iv). It is also possible to reverse the derivation in
Equations (1.32)-(1.37) to arrive at the definitions of $\frac{\delta X(t)}{dt}$ and $\Pi_\gamma$ from that
of $\nabla$. This method would make the use of both infinitesimals and fiber bundles
unnecessary. In this book, we shall often refer to the "connection $\nabla$".
Finally, we note that the totality of affine connections on a manifold forms
an affine space. In other words, for any affine connections $\nabla$ and $\nabla'$ and for
any real number $a \in \mathbb{R}$, the affine combination $a\nabla + (1 - a)\nabla'$ defines another
affine connection. Note also that the difference of two affine connections is a
tensor field of type $(1, 2)$.
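Both closing remarks can be checked directly from conditions (i)-(iv). The following short derivation is a sketch of that check, using only property (iv) for $\nabla$ and $\nabla'$; the symbols $\nabla''$ and $D$ are introduced here purely for illustration and do not appear elsewhere in the text.

```latex
% Affine combination: write \nabla'' = a\nabla + (1-a)\nabla'.
% Properties (i)-(iii) are immediate from linearity; for (iv):
\begin{align*}
\nabla''_X (fY)
  &= a\,\{ f\nabla_X Y + (Xf)Y \} + (1-a)\,\{ f\nabla'_X Y + (Xf)Y \} \\
  &= f\,\{ a\nabla_X Y + (1-a)\nabla'_X Y \} + \{ a + (1-a) \}(Xf)Y \\
  &= f\,\nabla''_X Y + (Xf)Y .
\end{align*}
% Difference: write D(X,Y) = \nabla_X Y - \nabla'_X Y.
% The inhomogeneous terms (Xf)Y cancel, so D is F-linear in Y as well:
\begin{align*}
D(X, fY) &= \{ f\nabla_X Y + (Xf)Y \} - \{ f\nabla'_X Y + (Xf)Y \} = f\,D(X,Y), \\
D(fX, Y) &= f\nabla_X Y - f\nabla'_X Y = f\,D(X,Y).
\end{align*}
```

The coefficients $a$ and $1 - a$ summing to 1 is exactly what keeps the term $(Xf)Y$ intact in the first computation, which is why only affine (not arbitrary linear) combinations of connections are again connections.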
1.7 Flatness
Let $X \in \mathcal{T}(S)$ be a vector field on $S$. If for any curve $\gamma$ on $S$, $X_\gamma : t \mapsto X_{\gamma(t)}$ is
parallel along $\gamma$ (with respect to the connection $\nabla$), we say that $X$ is parallel on
$S$ (with respect to $\nabla$). In this case, for any curve $\gamma$ which connects points $p$ and
$q$, $X_q = \Pi_\gamma(X_p)$ holds. A necessary and sufficient condition for an $X = X^i \partial_i$
to be parallel is that $\nabla_Y X = 0$ for all $Y \in \mathcal{T}(S)$, or equivalently that
$$\partial_i X^k + X^j \Gamma^k_{ij} = 0. \tag{1.40}$$
Note that nonzero parallel vector fields do not exist in general.
Let $[\xi^i]$ be a coordinate system of $S$, and suppose that with respect to this
coordinate system the $n$ basis vector fields $\partial_i = \frac{\partial}{\partial \xi^i}$ ($i = 1, \ldots, n$) are all parallel
on $S$. Then we call $[\xi^i]$ an affine coordinate system for $\nabla$. This condition
is equivalent both to $\nabla_{\partial_i} \partial_j = 0$ and also to the condition that the connection
coefficients $\{\Gamma^k_{ij}\}$ of $\nabla$ with respect to $[\xi^i]$ are all identically 0.
Given some connection, a corresponding affine coordinate system does not
in general exist. If an affine coordinate system exists for connection $\nabla$, we say
that $\nabla$ is flat, or alternatively that $S$ is flat with respect to $\nabla$. Let $[\xi^i]$ be an
affine coordinate system. Then with respect to a different coordinate system
$[\rho^r]$, we see from Equation (1.29) that the connection coefficients $\{\tilde\Gamma^t_{rs}\}$ may be
written as $\tilde\Gamma^t_{rs} = \frac{\partial^2 \xi^k}{\partial \rho^r \partial \rho^s} \frac{\partial \rho^t}{\partial \xi^k}$. Hence a necessary and sufficient condition for $[\rho^r]$
to be another affine coordinate system is that $\frac{\partial^2 \xi^k}{\partial \rho^r \partial \rho^s} = 0$. This is equivalent to
the condition that there exist an $n \times n$ matrix $A$ and an $n$-dimensional vector
$B$ such that
$$\xi(p) = A \rho(p) + B \quad (\forall p \in S) \tag{1.41}$$
($\xi(p) = [\xi^i(p)]$ and $\rho(p) = [\rho^r(p)]$). We call a transformation of the form
described in Equation (1.41) an affine transformation (when $B = 0$, this
is simply a linear transformation). In addition, we see that this transformation
is regular, i.e., one-to-one, and that $A$ is a regular matrix. The collection of
such regular affine transformations forms a group, and affine coordinate systems
have this degree of freedom.
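Let $\nabla$ be a connection on $S$. Then for vector fields $X, Y, Z \in \mathcal{T}$, if we define
$$R(X, Y) Z \stackrel{\mathrm{def}}{=} \nabla_X (\nabla_Y Z) - \nabla_Y (\nabla_X Z) - \nabla_{[X,Y]} Z \quad \text{and} \tag{1.42}$$
$$T(X, Y) \stackrel{\mathrm{def}}{=} \nabla_X Y - \nabla_Y X - [X, Y], \tag{1.43}$$
then these are also vector fields ($\in \mathcal{T}$). Here, letting $X = X^i \partial_i$ and $Y = Y^i \partial_i$,
we have defined $[X, Y]$ to be the vector field
$$[X, Y] = (X^j \partial_j Y^i - Y^j \partial_j X^i)\, \partial_i$$
(this does not depend on the choice of coordinate system). The mappings
$R : \mathcal{T} \times \mathcal{T} \times \mathcal{T} \to \mathcal{T}$ and $T : \mathcal{T} \times \mathcal{T} \to \mathcal{T}$ as defined above are both $\mathcal{F}$-multilinear.
Hence $R$ and $T$ are respectively tensor fields of types $(1, 3)$ and $(1, 2)$. We call
$R$ the Riemann-Christoffel curvature tensor (field) of $\nabla$, or more simply
the curvature tensor (field), and $T$ the torsion tensor (field) of $\nabla$. The
component expressions of $R$ and $T$ with respect to coordinate system $[\xi^i]$ are
$$R(\partial_i, \partial_j) \partial_k = R^l_{ijk} \partial_l \quad \text{and} \quad T(\partial_i, \partial_j) = T^k_{ij} \partial_k \tag{1.44}$$
($\partial_i = \frac{\partial}{\partial \xi^i}$), and these may be computed in the following way:
$$R^l_{ijk} = \partial_i \Gamma^l_{jk} - \partial_j \Gamma^l_{ik} + \Gamma^l_{ih} \Gamma^h_{jk} - \Gamma^l_{jh} \Gamma^h_{ik} \quad \text{and} \tag{1.45}$$
$$T^k_{ij} = \Gamma^k_{ij} - \Gamma^k_{ji}. \tag{1.46}$$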
If $[\xi^i]$ is an affine coordinate system for $\nabla$, then clearly $R^l_{ijk} = 0$ and $T^k_{ij} = 0$. In
fact, in this case the components of $R$ and $T$, since they are tensors, are always
all 0 with respect to any coordinate system. In other words, if $\nabla$ is flat, then
$R = 0$ and $T = 0$. Conversely, if $R = 0$ and $T = 0$, it is known that $\nabla$ is locally
flat in the following sense: for each point $p \in S$, there exists a neighborhood $U$
of $p$ such that $\nabla$ is flat on $U$. A proof will be found in standard textbooks of
differential geometry.
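The claim that the components of $R$ and $T$ vanish in every coordinate system when $\nabla$ is flat can be checked numerically in a non-affine coordinate system. The sketch below evaluates Equations (1.45) and (1.46) for an assumed example: the flat connection of the Euclidean plane written in polar coordinates, where the $\Gamma^k_{ij}$ are not zero. Partial derivatives are approximated by central finite differences, so the curvature components are only zero up to a small numerical error. The function names are hypothetical.

```python
def christoffel_polar(r, theta):
    """G[k][i][j] = Gamma^k_{ij} of the flat plane in polar coordinates
    (index 0 = r, index 1 = theta). Assumed example connection."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r        # Gamma^r_{theta theta}
    G[1][0][1] = 1.0 / r   # Gamma^theta_{r theta}
    G[1][1][0] = 1.0 / r   # Gamma^theta_{theta r}
    return G

def curvature_and_torsion(gamma, point, eps=1e-5):
    """Return (R, T) with R[l][i][j][k] = R^l_{ijk} from Eq. (1.45)
    and T[k][i][j] = T^k_{ij} from Eq. (1.46)."""
    n = len(point)
    G = gamma(*point)

    def d_gamma(i):
        # Central difference approximating d/d(xi^i) of every Gamma^l_{jk}.
        plus, minus = list(point), list(point)
        plus[i] += eps
        minus[i] -= eps
        Gp, Gm = gamma(*plus), gamma(*minus)
        return [[[(Gp[l][j][k] - Gm[l][j][k]) / (2 * eps)
                  for k in range(n)] for j in range(n)] for l in range(n)]

    D = [d_gamma(i) for i in range(n)]  # D[i][l][j][k] = d_i Gamma^l_{jk}
    R = [[[[D[i][l][j][k] - D[j][l][i][k]
            + sum(G[l][i][h] * G[h][j][k] - G[l][j][h] * G[h][i][k]
                  for h in range(n))
            for k in range(n)] for j in range(n)] for i in range(n)]
         for l in range(n)]
    T = [[[G[k][i][j] - G[k][j][i] for j in range(n)] for i in range(n)]
         for k in range(n)]
    return R, T

R, T = curvature_and_torsion(christoffel_polar, (2.0, 0.7))
max_R = max(abs(R[l][i][j][k]) for l in range(2) for i in range(2)
            for j in range(2) for k in range(2))
max_T = max(abs(T[k][i][j]) for k in range(2) for i in range(2)
            for j in range(2))
```

For instance, $R^r_{r\theta\theta} = \partial_r \Gamma^r_{\theta\theta} - \Gamma^r_{\theta\theta} \Gamma^\theta_{r\theta} = -1 + 1 = 0$ by hand, which the numerical values should reproduce to within the finite-difference error.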
In general, when $T = 0$ (i.e., $\Gamma^k_{ij} = \Gamma^k_{ji}$) holds, $\nabla$ is called a symmetric
connection or torsion-free connection. The connections having appeared
so far in information geometry are mostly symmetric connections. However,
the incorporation of torsion into the framework of information geometry, which
would relate it to such fields as quantum mechanics (noncommutative probability
theory) and systems theory, is an interesting topic for the future. We will
make an attempt in this direction in §7.3.
If a connection is flat, then parallel translation does not depend on the
curve used to connect the two points. In particular, the $n$ basis vector fields
$\partial_i$ ($i = 1, \ldots, n$) of an affine coordinate system $[\xi^i]$ are parallel vector
fields, and hence $\Pi_\gamma((\partial_i)_p) = (\partial_i)_q$ regardless of the curve $\gamma$ used to connect the
points $p$ and $q$. In addition, if the components $X^i$ of a vector field $X = X^i \partial_i$
are all constant on $S$, then $X$ is parallel, and $\Pi_\gamma(X_p) = X_q$.
In general, if parallel translation does not depend on curve choice, or in
other words if there are $n$ linearly independent parallel vector fields on $S$, then
$R = 0$, and in addition, when $S$ is simply connected (i.e., when arbitrary closed
loops may be continuously contracted to a single point) it is known that the
converse also holds.² There exist, however, connections for which $R = 0$ and
$T \ne 0$. When this is the case, although parallel translation does not depend
on the curve selected, there does not exist an affine coordinate system. Such
spaces, called spaces of distant parallelism, were introduced by Einstein within
the context of unified field theory, and also serve a major role within the theory
of non-Riemannian plasticity. Another example will be shown in §7.3.
From Equations (1.45) and (1.46) we see that in general $R^l_{ijk} = -R^l_{jik}$ and
$T^k_{ij} = -T^k_{ji}$. Hence, in the particular case when $S$ is 1-dimensional, $R = 0$ and
$T = 0$ necessarily hold, and therefore $S$ is flat.
1.8 Autoparallel submanifolds
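Let $S$ be an $n$-dimensional manifold and $M$ be an $m$-dimensional submanifold
of $S$. Let $[\xi^i]$ and $[u^a]$ be coordinate systems for $S$ and $M$, respectively, and let
$\partial_i = \frac{\partial}{\partial \xi^i}$ and $\partial_a = \frac{\partial}{\partial u^a}$. Suppose also that $\nabla$ is an affine connection on $S$ and
that $\{\Gamma^k_{ij}\}$ are the connection coefficients of $\nabla$ with respect to $[\xi^i]$. Now letting
$X = X^a \partial_a$ and $Y = Y^a \partial_a \in \mathcal{T}(M)$ be vector fields on $M$, we may consider
$\nabla_{X_p} Y$, the "directional derivative of $Y$ along $X_p$", as we did in Equation (1.36).
However, even though in general $\nabla_{X_p} Y$ is a tangent vector of $S$ ($\in T_p(S)$), it
is not necessarily a tangent vector of $M$ ($\in T_p(M)$). If we let $\nabla_X Y$ denote the
mapping from points $p$ in $M$ to $\nabla_{X_p} Y \in T_p(S)$, then using identities such as
$\partial_a = (\partial_a \xi^i) \partial_i$ we have
$$\nabla_X Y = X^a (\partial_a Y^b) \partial_b + X^a Y^b \left\{ (\partial_a \xi^i)(\partial_b \xi^j) \Gamma^k_{ij} + \partial_a \partial_b \xi^k \right\} \partial_k. \tag{1.47}$$
In particular, letting $X = \partial_a$ and $Y = \partial_b$, we obtain
$$\nabla_{\partial_a} \partial_b = \left\{ (\partial_a \xi^i)(\partial_b \xi^j) \Gamma^k_{ij} + \partial_a \partial_b \xi^k \right\} \partial_k. \tag{1.48}$$
Note also that Equation (1.47) may be written as
$$\nabla_X Y = X^a (\partial_a Y^b) \partial_b + X^a Y^b \nabla_{\partial_a} \partial_b. \tag{1.49}$$
² There are those who define "flat" to denote this case.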
As we mentioned above, for $X, Y \in \mathcal{T}(M)$, $(\nabla_X Y)_p = \nabla_{X_p} Y$ is an element
of $T_p(S)$, but not necessarily one of $T_p(M)$, i.e., in general, $\nabla_X Y \notin \mathcal{T}(M)$. If,
however,
$$\nabla_X Y \in \mathcal{T}(M) \quad \text{for } \forall X, Y \in \mathcal{T}(M), \tag{1.50}$$
then $\nabla$ determines a covariant derivative on $M$. In fact, when this is the case
the conditions (i)-(iv) from §1.6 hold for all $X, Y, Z \in \mathcal{T}(M)$ and all $f \in \mathcal{F}(M)$,
and $\nabla$ is an affine connection on $M$. If we use this connection to define a
parallel translation $\Pi^M_\gamma : T_{\gamma(a)}(M) \to T_{\gamma(b)}(M)$ on $M$ along the curves
$\gamma : [a, b] \to M$, then this translation coincides exactly with the parallel translation
$\Pi_\gamma : T_{\gamma(a)}(S) \to T_{\gamma(b)}(S)$ on $S$ restricted to the tangent spaces of $M$, using the
original connection on $S$. In other words
$$\Pi^M_\gamma = \Pi_\gamma \big|_{T_{\gamma(a)}(M)}. \tag{1.51}$$
If a submanifold $M$ of $S$ satisfies Equation (1.50), we say that $M$ is
autoparallel with respect to $\nabla$. In particular, open subsets of $S$ are autoparallel.
From Equation (1.49) we see that a necessary and sufficient condition for $M$
to be autoparallel is that $\nabla_{\partial_a} \partial_b \in \mathcal{T}(M)$ holds for all $a, b$. This, in turn, is
equivalent to there existing $m^3$ functions $\{\Gamma^c_{ab}\}$ ($\subset \mathcal{F}(M)$) which satisfy
$$\nabla_{\partial_a} \partial_b = \Gamma^c_{ab} \partial_c. \tag{1.52}$$
These $\{\Gamma^c_{ab}\}$ form the connection coefficients of $\nabla$ with respect to $[u^a]$. Using
Equation (1.48) we may rewrite Equation (1.52) in the following way:
$$\Gamma^c_{ab}\, \partial_c \xi^k = (\partial_a \xi^i)(\partial_b \xi^j) \Gamma^k_{ij} + \partial_a \partial_b \xi^k. \tag{1.53}$$
We can also see that $M$ is autoparallel in $S$ if and only if $M$ is closed with respect
to the parallel translation on $S$ in the following sense: for every curve $\gamma : [a, b] \to M$
in $M$ and for every tangent vector $D$ of $M$ at $\gamma(a)$, the result $\Pi_\gamma(D)$ of the
parallel translation $\Pi_\gamma : T_{\gamma(a)}(S) \to T_{\gamma(b)}(S)$ belongs to the tangent space of $M$
at $\gamma(b)$.
1-dimensional autoparallel submanifolds are called autoparallel curves or
geodesics. For a curve $\gamma : t \mapsto \gamma(t)$, the condition in Equation (1.52) may be
rewritten using Equation (1.37) as
$$\frac{\delta \dot\gamma(t)}{dt} = \Gamma(t)\, \dot\gamma(t), \tag{1.54}$$
where $\Gamma : t \mapsto \Gamma(t)$ is a $C^\infty$ function. As we noted at the end of §1.7, connections
on 1-dimensional manifolds are necessarily flat, and hence by substituting into
Equation (1.54) a suitable one-to-one transformation (change of variable) of $t$,
we may obtain $\Gamma(t) = 0$. We call such a $t$ an affine parameter of $\gamma$. In this
case Equation (1.54) reduces to
$$\frac{\delta \dot\gamma(t)}{dt} = 0 \tag{1.55}$$
and implies that $\dot\gamma$ is parallel along $\gamma$. It is possible to define geodesics using
Equation (1.55). Rewriting Equation (1.55) using the coordinate system $[\xi^i]$
and the corresponding representation $\gamma^i = \xi^i \circ \gamma$, we obtain
$$\ddot\gamma^k(t) + \dot\gamma^i(t)\, \dot\gamma^j(t)\, (\Gamma^k_{ij})_{\gamma(t)} = 0. \tag{1.56}$$
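The geodesic equation (1.56) is a second-order ordinary differential equation and can be integrated numerically once the connection coefficients are given. The following sketch is an illustration under assumptions not made in the text: it again uses the flat connection of the Euclidean plane in polar coordinates $(r, \theta)$, and a classical fourth-order Runge-Kutta scheme. Since polar coordinates are not affine, the geodesic is not linear in $(r, \theta)$, but it should trace a straight line once mapped back to Cartesian coordinates. All names are hypothetical.

```python
import math

def christoffel_polar(r, theta):
    """G[k][i][j] = Gamma^k_{ij} of the flat plane in polar coordinates
    (index 0 = r, index 1 = theta). Assumed example connection."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r        # Gamma^r_{theta theta}
    G[1][0][1] = 1.0 / r   # Gamma^theta_{r theta}
    G[1][1][0] = 1.0 / r   # Gamma^theta_{theta r}
    return G

def geodesic(gamma, x0, v0, t_end, steps=1000):
    """Integrate d2x^k/dt2 = -xdot^i xdot^j Gamma^k_{ij} (Eq. (1.56)) with RK4.
    Returns the final position and velocity in the given coordinates."""
    n = len(x0)

    def rhs(state):
        x, v = state[:n], state[n:]
        G = gamma(*x)
        acc = [-sum(v[i] * v[j] * G[k][i][j]
                    for i in range(n) for j in range(n))
               for k in range(n)]
        return v + acc  # list concatenation: (dx/dt, dv/dt)

    h = t_end / steps
    s = list(x0) + list(v0)
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs([a + h / 2 * b for a, b in zip(s, k1)])
        k3 = rhs([a + h / 2 * b for a, b in zip(s, k2)])
        k4 = rhs([a + h * b for a, b in zip(s, k3)])
        s = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4)]
    return s[:n], s[n:]

# Start at (r, theta) = (1, 0) with velocity (rdot, thetadot) = (0, 1),
# i.e. at Cartesian (1, 0) moving in the +y direction with unit speed.
(r_end, theta_end), _ = geodesic(christoffel_polar, [1.0, 0.0], [0.0, 1.0], 1.0)
x_end = r_end * math.cos(theta_end)
y_end = r_end * math.sin(theta_end)
```

With these initial data the Cartesian image of the curve is the line $x = 1$, $y = t$, so at $t = 1$ the endpoint should be close to $(1, 1)$, that is $r = \sqrt{2}$ and $\theta = \pi/4$.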
Let $M$ be an autoparallel submanifold of $S$. If the torsion tensor of $S$ is
0, then the torsion tensor of $M$ is also 0. This is clear from Equations (1.46)
and (1.53). The same holds for the curvature tensor. The latter fact may be
derived using Equations (1.45) and (1.53), but it is in fact immediate from the
analysis of parallel translation as follows: from Equation (1.51) we see that if
the choice of curve does not affect parallel translation in $S$, then it similarly does
not in $M$. Note that, in the case when parallel translation does not depend on
curve choice, a necessary and sufficient condition for a submanifold $M$ to be
autoparallel in $S$ is that there exist $m$ ($= \dim M$) linearly independent vector
fields on $M$ which are parallel with respect to the connection on $S$.
Consider the case when $S$ is flat with respect to $\nabla$. Then by the argument
above autoparallel submanifolds of $S$ are also flat. Hence without loss
of generality we may assume that $[\xi^i]$ and $[u^a]$ are affine coordinate systems
in Equation (1.53), the condition for a submanifold $M$ of $S$ to be autoparallel.
Equation (1.53) then reduces to $\partial_a \partial_b \xi^i = 0$. This condition is equivalent to
there existing an $n \times m$ matrix $A$ and an $n$-dimensional vector $B$ which satisfy
$$\xi(p) = A u(p) + B \quad (\forall p \in M) \tag{1.57}$$
($\xi(p) = [\xi^i(p)]$ and $u(p) = [u^a(p)]$). In general, a subspace of $\mathbb{R}^n$ which may
be expressed as $\{Au + B \mid u \in \mathbb{R}^m\}$ is called an affine subspace of $\mathbb{R}^n$ (when
$B = 0$ we have a linear subspace). We summarize the discussion above in the
following theorem.
Theorem 1.1 If $S$ is flat, then a necessary and sufficient condition for a
submanifold $M$ to be autoparallel is that $M$ is expressed as an affine subspace (or
an open subset of an affine subspace) of $S$ with respect to an affine coordinate
system. In particular, geodesics may be expressed using linear equations (as a
line or a segment) with respect to affine coordinate systems. In addition, if $M$
is autoparallel, then it is also flat.
1.9 Projection of connections and embedding curvature
If $M$ is a submanifold of $S$ which is not autoparallel with respect to $\nabla$ on $S$, then
there is no natural connection on $M$ which may be derived from $\nabla$. However, if
there is for each point $p$ a mapping $\pi_p$ from $T_p(S)$ to $T_p(M)$, then we may use
this to define a connection on $M$. Assume that $\pi_p : T_p(S) \to T_p(M)$ is a linear
mapping and that $\pi_p(D) = D$ for every $D \in T_p(M)$, and that the relation
$p \mapsto \pi_p$ is $C^\infty$. Now suppose, for each $X, Y \in \mathcal{T}(M)$, we define $\nabla^{(\pi)}_X Y \in \mathcal{T}(M)$
in the following way:
$$\left(\nabla^{(\pi)}_X Y\right)_p = \pi_p\left((\nabla_X Y)_p\right) \quad (\forall p \in M). \tag{1.58}$$
Then $\nabla^{(\pi)}$ is a connection on $M$. In particular, if a Riemannian metric $g = \langle\ ,\ \rangle$
is given on $S$, we may take as $\pi_p$ the orthogonal projection with respect to $g$.
This is defined to be that which satisfies, for all $D \in T_p(S)$ and all $D' \in T_p(M)$,
$$\langle \pi_p(D), D' \rangle_p = \langle D, D' \rangle_p. \tag{1.59}$$
We call such $\nabla^{(\pi)}$ the projection of $\nabla$ onto $M$ with respect to $g$.
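If $S$ has a coordinate system $[\xi^i]$, then the connection coefficients $\{\Gamma^k_{ij}\}$ of $\nabla$
are determined by Equation (1.39). If $S$ also has a Riemannian metric $g$, then
we may define $n^3$ additional functions $\{\Gamma_{ij,k}\}$ in the following way:
$$\Gamma_{ij,k} \stackrel{\mathrm{def}}{=} \langle \nabla_{\partial_i} \partial_j, \partial_k \rangle = \Gamma^h_{ij}\, g_{hk}. \tag{1.60}$$
The quantities $\{\Gamma_{ij,k}\}$, like $\{\Gamma^k_{ij}\}$, may be considered as a different component
expression of the same $\nabla$. With respect to a different coordinate system $[\rho^r]$ for
$S$, these may be written as follows ($\tilde\partial_r \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial \rho^r}$):
$$\tilde\Gamma_{rs,t} \left(= \langle \nabla_{\tilde\partial_r} \tilde\partial_s, \tilde\partial_t \rangle\right) = \left\{ \frac{\partial \xi^i}{\partial \rho^r} \frac{\partial \xi^j}{\partial \rho^s} \Gamma_{ij,k} + \frac{\partial^2 \xi^j}{\partial \rho^r \partial \rho^s} g_{jk} \right\} \frac{\partial \xi^k}{\partial \rho^t}. \tag{1.61}$$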
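Similarly, for the projection $\nabla^{(\pi)}$ of $\nabla$ onto $M$, we may define, given a
coordinate system $[u^a]$ for $M$, $\Gamma^{(\pi)}_{ab,c} \stackrel{\mathrm{def}}{=} \langle \nabla^{(\pi)}_{\partial_a} \partial_b, \partial_c \rangle$ ($\partial_a \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial u^a}$). Using
Equations (1.58), (1.59) and (1.48) we may rewrite this as
$$\Gamma^{(\pi)}_{ab,c} = \langle \nabla_{\partial_a} \partial_b, \partial_c \rangle = \left\{ (\partial_a \xi^i)(\partial_b \xi^j) \Gamma_{ij,k} + (\partial_a \partial_b \xi^j)\, g_{jk} \right\} (\partial_c \xi^k). \tag{1.62}$$
The connection coefficients of $\nabla^{(\pi)}$ are then given by $\Gamma^{(\pi)c}_{ab} = \Gamma^{(\pi)}_{ab,d}\, g^{dc}$. From
this, we see that if $\nabla$ is symmetric, then so is $\nabla^{(\pi)}$.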
Now let
$$H(X, Y) \stackrel{\mathrm{def}}{=} \nabla_X Y - \nabla^{(\pi)}_X Y \tag{1.63}$$
for $X, Y \in \mathcal{T}(M)$. Then $(H(X, Y))_p = (\nabla_X Y)_p - \pi_p((\nabla_X Y)_p)$ is the orthogonal
projection of $(\nabla_X Y)_p$ onto $[T_p(M)]^\perp$, the orthocomplement of $T_p(M)$. Given
this, note that the autoparallel condition for $M$ in Equation (1.50) is equivalent
to stating that $H(X, Y) = 0$ holds for all $X, Y \in \mathcal{T}(M)$, and that this, in turn,
is equivalent to simply stating that $H = 0$. Intuitively, $H$ may be considered
as measuring the degree to which $M$ is "not autoparallel" or "curved" in $S$. In
addition, since $H(X, Y)$ is $\mathcal{F}(M)$-linear with respect to both $X$ and $Y$ (i.e., is
$\mathcal{F}(M)$-bilinear), $H$ may be considered as "a kind of" tensor field, even though
$H(X, Y)$ is not a vector field on $M$ in general. We call such an $H$ an embedding
curvature of the submanifold $M$ ($\subset S$) with respect to $\nabla$.
Since $M$ has $\nabla^{(\pi)}$ as a connection, we may use this to compute its
Riemann-Christoffel curvature $R^{(\pi)}$. This $R^{(\pi)}$ expresses the "inherent curvature" of
$M$ itself, while the embedding curvature $H$ expresses the curvature of the
arrangement of $M$ within $S$. As we noted in §1.8, if $R$, the Riemann-Christoffel
curvature of $S$, is 0, and if, in addition, $H = 0$ (i.e., $M$ is autoparallel), then
$R^{(\pi)} = 0$ also. However, $R^{(\pi)} = 0$ does not entail $H = 0$. For example,
consider a cylinder surface $M$ embedded within a 3-dimensional Euclidean space.
The 2-dimensional geometry on the surface of this cylinder is Euclidean, and
$R^{(\pi)} = 0$. However, within the 3-dimensional space $M$ is curved, and hence $H$ is
not 0. It is important to distinguish these two notions of curvature.
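The cylinder example can be made concrete with a short computation. The sketch below assumes a unit cylinder parametrized by $\xi(u, v) = (\cos u, \sin u, v)$ inside Euclidean 3-space with Cartesian coordinates, so that $\Gamma^k_{ij} = 0$, $g_{ij} = \delta_{ij}$, and by Equation (1.48) $\nabla_{\partial_a} \partial_b$ has components $\partial_a \partial_b \xi^k$. The parametrization and all function names are assumptions made for this illustration only.

```python
import math

def dot(a, b):
    """Euclidean inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cylinder_frames(u):
    """Tangent frame and second derivatives of xi(u, v) = (cos u, sin u, v)."""
    frame = {"u": (-math.sin(u), math.cos(u), 0.0),   # d_u xi
             "v": (0.0, 0.0, 1.0)}                     # d_v xi
    # d_a d_b xi; with Gamma = 0 in Cartesian coordinates these are the
    # components of nabla_{d_a} d_b according to Eq. (1.48).
    second = {("u", "u"): (-math.cos(u), -math.sin(u), 0.0),
              ("u", "v"): (0.0, 0.0, 0.0),
              ("v", "u"): (0.0, 0.0, 0.0),
              ("v", "v"): (0.0, 0.0, 0.0)}
    return frame, second

def projected_connection_and_H(u):
    """Return (Gamma_low, H): Gamma_low[(a, b)][c] = <nabla_{d_a} d_b, d_c>
    as in Eq. (1.62), and H[(a, b)] = nabla_{d_a} d_b minus its projection,
    as in Eq. (1.63)."""
    frame, second = cylinder_frames(u)
    Gamma_low, H = {}, {}
    for (a, b), w in second.items():
        coeffs = {c: dot(w, frame[c]) for c in frame}
        Gamma_low[(a, b)] = coeffs
        # The frame (d_u xi, d_v xi) is orthonormal for this parametrization,
        # so the orthogonal projection is pi(w) = sum_c <w, d_c> d_c.
        proj = tuple(sum(coeffs[c] * frame[c][k] for c in frame)
                     for k in range(3))
        H[(a, b)] = tuple(w[k] - proj[k] for k in range(3))
    return Gamma_low, H

Gamma_low, H = projected_connection_and_H(0.3)
```

Under these assumptions every $\Gamma^{(\pi)}_{ab,c}$ vanishes, so $[u, v]$ is an affine coordinate system for $\nabla^{(\pi)}$ and $R^{(\pi)} = 0$, while $H(\partial_u, \partial_u) = (-\cos u, -\sin u, 0)$ is a nonzero vector normal to the cylinder.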
For each point p in S, let {(z)p;1