
Introduction to Riemannian Geometry and Geometric Statistics: from basic theory to implementation with Geomstats

Nicolas Guigui, Nina Miolane, Xavier Pennec

To cite this version:

Nicolas Guigui, Nina Miolane, Xavier Pennec. Introduction to Riemannian Geometry and Geometric Statistics: from basic theory to implementation with Geomstats. Foundations and Trends in Machine Learning, 2023, 16 (3), pp. 329-493. DOI: 10.1561/2200000098. hal-03766900v2.

HAL Id: hal-03766900
https://inria.hal.science/hal-03766900v2
Submitted on 2 Feb 2023

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Distributed under a Creative Commons Attribution 4.0 International License


Introduction to Riemannian Geometry and Geometric Statistics: From
Basic Theory to Implementation with Geomstats
Nicolas Guigui, Université Côte d’Azur, Inria, [email protected]
Nina Miolane, University of California Santa Barbara, [email protected]
Xavier Pennec, Université Côte d’Azur, Inria, [email protected]

February 2, 2023

Abstract
As data is a predominant resource in applications, Riemannian geometry is a
natural framework to model and unify complex nonlinear sources of data. However,
the development of computational tools from the basic theory of Riemannian geometry
is laborious. The work presented here forms one of the main contributions to the
open-source project geomstats, that consists of a Python package providing efficient
implementations of the concepts of Riemannian geometry and geometric statistics, both
for mathematicians and for applied scientists for whom most of the difficulties are
hidden under high-level functions. The goal of this monograph is two-fold. First, we
aim at giving a self-contained exposition of the basic concepts of Riemannian geometry,
providing illustrations and examples at each step and adopting a computational point
of view. The second goal is to demonstrate how these concepts are implemented in
Geomstats, explaining the choices that were made and the conventions chosen. The
general concepts are exposed and specific examples are detailed along the text. The
culmination of this implementation is to be able to perform statistics and machine
learning on manifolds, with as few lines of code as with the widespread machine learning
tool scikit-learn. We exemplify this with an introduction to geometric statistics.

1 Introduction
Since the formal axiomatization of geometry in Euclid’s famous Elements (dated around 300 BC),
geometry was considered to be the study of properties of figures in the plane or in space. The abstract
notion of space as a mathematical object emerged in 1827 with C. F. Gauss’ Theorema
Egregium, proving that curvature is an intrinsic quantity of a surface, i.e. one that can be
computed without reference to a “larger” embedding space. This notion was made precise by

the cornerstone work of Riemann (1868)1 built around the intuitive idea that a mathematical
space results from varying a number of independent quantities, later identified as coordinates
and formalized in the definition of a manifold by Whitney (1936). Riemannian Geometry
(RG) is the study of such differentiable manifolds equipped with an inner product at each
point that smoothly varies between points. This allows us to generalize the notions of angles,
length and volumes, which can be integrated to global quantities highly coupled with the
topology of the space.
Fruitful developments of these ideas allowed unifying the previously known examples of non-Euclidean
geometries, which violate Euclid’s parallel postulate (given a point and a straight line, one
and only one parallel straight line can be drawn through the point). These ideas echoed
with the developments of Lagrangian and Hamiltonian mechanics, and were instrumental
in formalizing the modern theories of Physics, and especially Einstein’s general relativity.
They also made profound impact on many areas of mathematics such as group theory,
representation theory, analysis, and algebraic and differential topology.
At the intersection of Physics and geometry, groups represent symmetries and transfor-
mations between states, and from the modern point of view of Klein’s Erlangen program,
the study of geometry boils down to studying the action of groups on a space, and their
invariants. The work of Elie Cartan enabled significant progress in this direction.
Riemannian geometry has thus become a vast subject that is not usually taught before
graduate education in mathematics or physics, and that requires familiarity with many
concepts from differential geometry. Hence, although some books on the topic cover most
of the pre-requisites and fundamental results of Riemannian geometry, the entry cost for
applied mathematicians, computer scientists and engineers is high.
Nowadays, as data is a predominant resource in applications, Riemannian geometry is a
natural framework to model and unify complex nonlinear sources of data. However, the
development of computational tools from the basic theory of Riemannian geometry is labo-
rious due to often high dimensional and non-exhaustive coordinate systems, nonlinear and
intractable differential equations, etc. This monograph aims at providing the computational
tools to perform statistics and machine learning on Riemannian manifolds to the wider
data science community. The work presented here forms one of the main contributions to
the open-source project geomstats, which consists of a Python package providing efficient
implementations of the concepts of Riemannian geometry and geometric statistics, both
for mathematicians and for applied scientists for whom most of the difficulties are hidden
under high-level functions.
Other Python packages do exist and mainly fall into one of the two following categories.
On the one hand, there are the packages that focus on a single application, for instance
on optimization: Pymanopt (Townsend et al., 2016), Geoopt (Becigneul and Ganea, 2019;

1 See Riemann (1873), the English translation by W.K. Clifford.

Kochurov et al., 2020), TensorFlow RiemOpt (Smirnov, 2021), and McTorch (Meghwanshi
et al., 2018), or on deep learning, such as PyG (Fey and Lenssen, 2019), where the geometry
is often restricted to graph and mesh spaces. On the other hand, there are
packages dedicated to a single manifold: PyRiemann on SPD matrices (Barachant, 2015),
PyQuaternions on 3D rotations (Wynn, 2014), and PyGeometry on spheres, tori, 3D rotations
and translations (Censi, 2012). Some other packages, like TheanoGeometry (Kühnel and
Sommer, 2017) are not actively maintained anymore. There is therefore a need for a unified
open-source implementation of differential geometry and associated learning algorithms for
manifold-valued data.
The goal of this monograph is two-fold. First, we aim at giving a self-contained exposition
of the basic concepts of Riemannian geometry, providing illustrations and examples at each
step and adopting a computational point of view. We cover the basics of differentiable
manifolds (Section 2), Riemannian manifolds (Section 3) and Lie groups (Section 4). Then
we delve into more complex structures defined by invariance properties, in particular quotient
metrics, and metrics on homogeneous and symmetric spaces (Section 5). Most proofs are
omitted for brevity, but references to the proof of each statement are given. The interested
reader may refer to the textbooks Lafontaine et al. (2004), Gallier and Quaintance (2020),
Boumal (2023), and Lee (2003) for more details. Some mathematical definitions from the
prerequisites can be found in the lexicon in Appendix A. The second goal is to demonstrate
how these concepts are implemented in geomstats, explaining the choices that were made
and the conventions chosen. The general concepts are exposed in Section 2.2, and detailed
along the text and examples. The culmination of this implementation is to be able to
perform statistics and machine learning on manifolds, with as few lines of code as with the
widespread machine learning tool scikit-learn. We exemplify this in Section 6 with a
brief introduction to geometric statistics.

Updates This monograph was written for geomstats version 2.5.0 released in April 2022.
There are two categories of code snippets in this monograph: those from the core of the
package, that explain how it is implemented, and those for examples and figures. For
updates on the former, we invite the interested reader to check the current master branch
of the main geomstats repository2 . In order to ensure compatibility of the latter with future
releases of the package, we maintain a companion repository on Github3 .

2 https://github.com/geomstats/geomstats
3 https://github.com/geomstats/ftmal-paper

2 Differentiable manifolds
2.1 Differentiable manifolds and tangent spaces
The differentiable manifold is the structure underlying this entire monograph, yet its
definition remains difficult for newcomers to the field. We start with the definition of an
embedded manifold and generalize to the abstract case afterwards. The intuition behind the
notion of smooth manifold is that around every point, it resembles the d-dimensional vector
space R^d for some integer d, and the properties of R^d allow us to define the notions of smooth
functions, tangent vectors, etc. on the manifold.

2.1.1 Embedded manifolds


Let N be a strictly positive integer. The fundamental examples of an embedded manifold
are that of an open set∗ of R^N, and a vector subspace R^d × {0}^{N−d} ⊂ R^N for d ≤ N, written
R^d × 0 for short. These are ‘deformed’ via local diffeomorphisms∗ to obtain an embedded
manifold, which can be thought of as a smooth surface in the ambient space. This was in
fact one of the first motivations of the mathematical developments underlying the notion of
manifold.

Definition 2.1.1. Let d, N be integers with 1 ≤ d ≤ N. A d-dimensional smooth
embedded manifold in R^N is a non-empty subset M of R^N such that for every point
p ∈ M, there are two open subsets U, V ⊆ R^N with p ∈ U and 0 ∈ V, and a smooth
diffeomorphism∗ ϕ : U → V such that ϕ(U ∩ M) = V ∩ (R^d × {0}^{N−d}).

In this definition, ϕ may be called a local chart. Thankfully, there are equivalent
definitions that give greater insight into what makes M a differentiable manifold. We first
need to define the notions of immersions and submersions.

Definition 2.1.2. Let n ≤ p be two strictly positive integers, and U ⊆ R^p, V ⊆ R^n two open
sets.

• A differentiable∗ map f : U → V is called a submersion at x ∈ U if dfx is surjective∗ .


We say that f is a submersion if it is a submersion at every x ∈ U .

• A differentiable map f : V → U is called an immersion at x ∈ V if dfx is injective∗ .


We say that f is an immersion if it is an immersion at every x ∈ V .

The fundamental example of an immersion is the injection defined by (x_1, ..., x_n) ↦ (x_1, ..., x_n, 0, ..., 0) ∈ R^p, while the projection (x_1, ..., x_p) ∈ R^p ↦ (x_1, ..., x_n) ∈ R^n is
that of a submersion. In fact, one can show that up to a local change of variables (i.e.
composition with a diffeomorphism), these maps are respectively the only immersions and

Defined in Appendix A.
submersions. This results from the local inversion theorem, see Balzin (2021, Theorem 4.7)
for a proof. We now have the following characterisation theorem (Gallier and Quaintance,
2020, Theorem 3.6):

Theorem 2.1.1. A nonempty subset M ⊆ RN is a d-dimensional manifold if and only if


any of the following conditions hold:

(1) (Local parametrization) For every p ∈ M , there are two open subsets V ⊆ Rd and
U ⊆ RN with p ∈ U and 0 ∈ V , and a smooth function f : V → RN such that
f (0) = p, f is a homeomorphism∗ between V and U ∩ M , and f is an immersion at 0.

(2) (Local implicit function) For every p ∈ M, there exists an open set U ⊆ R^N and a
smooth map f : U → R^{N−d} that is a submersion at p, such that U ∩ M = f^{-1}({0})2 .

(3) (Local graph) For every x ∈ M , there exist an open neighborhood U ⊆ RN of


x, a neighborhood V ⊆ Rd of 0 and a smooth map f : V → RN −d such that
U ∩ M = graph(f )3 .

The characterization (2) encodes the notion of constraint: a manifold can be understood
as the set of points that satisfy a constraint defined by an implicit equation, given by the
function f. Figure 2.1 (Left) gives an example where f : x ∈ R^2 ↦ ‖x‖ ∈ R. This is one of
the reasons manifolds are ubiquitous in applications; we will give many examples of this
case. The other characterizations can also be understood as follows. The first (1) implies
that at every point of the manifold, a coordinate system defined on R^d exists to parametrize
the manifold around that point. This is depicted in Figure 2.1 (Right). Finally, (3) corresponds
to the most common way to think of surfaces in R^3, as sets of points (x, y, f(x, y)).
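To make the interplay between characterizations (2) and (3) concrete, here is a minimal numpy sketch (standalone, not geomstats code): points built as a local graph (x, y, f(x, y)) with f(x, y) = sqrt(1 − x² − y²) near the north pole of S² also satisfy the implicit constraint ‖p‖² − 1 = 0.

```python
import numpy as np

# Local graph characterization (3) for S^2 near the north pole:
# points of the form (x, y, f(x, y)) with f(x, y) = sqrt(1 - x^2 - y^2).
rng = np.random.default_rng(0)
v = rng.uniform(-0.5, 0.5, size=(100, 2))          # points of V in R^2
graph_pts = np.column_stack([v, np.sqrt(1.0 - (v ** 2).sum(axis=1))])

# The same points satisfy the implicit constraint ||p||^2 - 1 = 0 of (2).
constraint = (graph_pts ** 2).sum(axis=1) - 1.0
max_violation = np.abs(constraint).max()
```

The constraint is satisfied up to floating-point rounding, illustrating that the two local descriptions carve out the same set.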

Example 2.1: Hypersphere


The simplest manifold we will study is the hypersphere, or d-dimensional sphere.
It is the set of unit-norm vectors of R^{d+1}:

S^d = {x ∈ R^{d+1} | ‖x‖_2^2 = 1}.

Let f : x ∈ R^{d+1} ↦ ‖x‖^2 − 1 ∈ R. For x ≠ 0, df_x(y) = 2x^T y is surjective, so
Assertion (2) of Theorem 2.1.1 applies (with U = R^{d+1}) and S^d is a d-dimensional
embedded manifold in R^{d+1}. In dimension d = 1, this corresponds to the circle, and
for d = 2 this is the usual sphere. Both cases are commonly used to represent angles and
directions in space, and as such appear in the field of directional statistics (Mardia
and Jupp, 2009).

2 Recall that the pre-image of a set A by f : E → F is defined by f^{-1}(A) = {x ∈ E | f(x) ∈ A}.
3 The graph of f is the set graph(f) = {(x, f(x)) | x ∈ V}.

Figure 2.1: Left: Representation of a manifold, definition with local ϕ or by a local immersion f . Right:
Representation of a manifold defined by a submersion, in this case the distance to the origin.

Example 2.2: Hyperbolic space


The fundamental counterpart to the hypersphere is the two-sheeted hyperboloid,
defined by

H^d = {x ∈ R^{d+1} | −x_0^2 + Σ_{i=1}^d x_i^2 = −1}.

It is one of the models of hyperbolic geometry, which is increasingly used to model
hierarchical data, e.g. (Nickel and Kiela, 2017).
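To illustrate the defining equation, one can construct points of the upper sheet explicitly: given any y ∈ R^d, the point x = (sqrt(1 + ‖y‖²), y) satisfies the constraint. A small numpy check (a sketch, independent of geomstats):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
y = rng.standard_normal(d)
x0 = np.sqrt(1.0 + np.dot(y, y))       # upper sheet: x_0 >= 1
x = np.concatenate([[x0], y])          # a point of H^d in R^{d+1}

# Minkowski-type constraint: -x_0^2 + sum_i x_i^2 = -1
constraint = -x[0] ** 2 + np.dot(x[1:], x[1:])
```

This also shows that the upper sheet is globally a graph over R^d, in the sense of characterization (3).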

Example 2.3: Special Orthogonal group


Matrix groups play an essential role in the theory of RG, especially the special
orthogonal group SO(n), i.e. the set of orthogonal matrices with unit determinant:

SO(n) = {R ∈ M_n(R) | R^T R = I_n, det(R) = 1}.

Consider the map f : GL^+(n) → S(n), A ↦ A^T A − I_n, where GL^+(n) ⊂ M_n(R) ≅ R^{n^2}
is the open set of invertible square matrices with positive determinant, and S(n)
is the set of symmetric matrices of size n, a vector subspace of M_n(R) of dimension
n(n+1)/2. It is straightforward to show that the differential of f at some R is

df_R(H) = R^T H + H^T R,

and we can see that it is surjective for all R ∈ SO(n), as for any S ∈ S(n), df_R(RS/2) = S.
As SO(n) = f^{-1}(0), we conclude that SO(n) is indeed an embedded manifold of
dimension n(n−1)/2.

One can represent the motion of a rigid body, in the frame of its barycenter, as
a curve with values in SO(3); hence this group is widely used in e.g. robotics (Barczyk
et al., 2015).
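The surjectivity argument above can be checked numerically. The snippet below (a minimal numpy sketch, not geomstats source) draws a random rotation R via a QR decomposition, a random symmetric S, and verifies that H = RS/2 is mapped back to S by df_R(H) = RᵀH + HᵀR.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
R, _ = np.linalg.qr(A)                 # orthogonal factor of a random matrix
if np.linalg.det(R) < 0:
    R[:, 0] *= -1                      # flip a column to land in SO(n)

S = rng.standard_normal((n, n))
S = (S + S.T) / 2                      # a random symmetric matrix

H = R @ S / 2                          # the claimed preimage of S
dfR_H = R.T @ H + H.T @ R              # df_R(H)
orthogonality_err = np.abs(R.T @ R - np.eye(n)).max()
preimage_err = np.abs(dfR_H - S).max()
```

Indeed Rᵀ(RS/2) + (RS/2)ᵀR = S/2 + S/2 = S, since RᵀR = I and S is symmetric.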

Example 2.4: Stiefel manifold


A generalization of both the hypersphere and the special orthogonal group is the Stiefel
manifold, defined as the set of orthonormal k-frames of R^n. If we represent each
vector u_i of a k-frame (u_1, ..., u_k) as the ith column of a matrix U (in the canonical
basis of R^n), then the Stiefel manifold can be seen as a subset of M_{n,k}(R):

St(k, n) = {U ∈ M_{n,k}(R) | U^T U = I_k}.

As in the previous example, we can consider the map f : U ↦ U^T U − I_k on an open
subset of M_{n,k}(R) and show that it is a submersion such that St(k, n) = f^{-1}(0), to
conclude that St(k, n) is an embedded manifold of dimension nk − k(k+1)/2.
The Stiefel manifold naturally arises as the optimization domain in many problems
related to matrix decompositions, in linear algebra, statistics, machine learning,
computer vision, etc.; see Absil et al. (2010, and references therein).
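As with the sphere and SO(n), points of St(k, n) are easy to sample numerically: the Q factor of a reduced QR decomposition of a random n × k matrix has orthonormal columns. A hedged numpy sketch (illustrative, not the geomstats implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3
A = rng.standard_normal((n, k))
U, _ = np.linalg.qr(A)                 # reduced QR: U is n x k with U^T U = I_k

frame_err = np.abs(U.T @ U - np.eye(k)).max()
# dimension of St(k, n): nk - k(k+1)/2, here 5*3 - 3*4/2 = 9
stiefel_dim = n * k - k * (k + 1) // 2
```

The dimension count matches subtracting the k(k+1)/2 independent constraints of the symmetric equation UᵀU = I_k from the nk ambient coordinates.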

2.1.2 Manifolds
For generality, we now define a manifold in a more abstract way, i.e. as a topological
space ∗ that is not a priori embedded in some RN . The idea is still that a manifold is a
space that can be covered by open sets that each look like (i.e. are diffeomorphic to an open
set of) the usual space Rd . Of course one can verify that embedded manifolds are indeed
manifolds with this more general definition, and in fact, Whitney (1936) proved that any
manifold can be smoothly embedded in a larger space, showing that the two concepts are
equivalent. We first motivate the need for a more general definition of manifold by the
example of the Kendall size-and-shape space.

Example 2.5: Kendall size-and-shape space


The underlying idea is that a shape is what is left after removing the effects of
translation and rotation. We first define the set of k landmarks of Rm as the space
of m × k matrices Mm,k (R). For x ∈ Mm,k (R), let xi denote the columns of x, i.e.
points of Rm and let x̄ be their barycenter. We remove the effects of translation by


Defined in Appendix A.

considering the matrix with columns x_i − x̄ instead of x. Let M*_{m,k}(R) denote the set of
such centered matrices.


Now, in order to remove the effects of rotations, we would like to identify the
landmark configurations that only differ by a rotation of all the landmarks. This
defines an equivalence relation ∼:

∀x, y ∈ Mm,k (R), x ∼ y ⇐⇒ ∃R ∈ SO(m), y = Rx.

A shape thus corresponds to an equivalence class [x] of landmark configurations, and


we can define the size-and-shape space as the quotient of the landmark space by the
equivalence relation ∼ (or equivalently by the action of SO(m)):

SΣ^k_m = {[x] | x ∈ M*_{m,k}(R)}.

A quotient space of a manifold by another manifold may not even be a Hausdorff


space, but we will give sufficient conditions in Section 4.4 to ensure that quotients
resulting from a group action are indeed smooth manifolds.
In this case, SΣkm inherits a differentiable structure from the landmark space,
that turns it into a smooth manifold, although we cannot see it explicitly as a
subset of RN for some N . Implementing tools to work with data on such spaces is a
challenging task that we tackle in geomstats. This is the subject of Section 5. Kendall
size-and-shape spaces are a ubiquitous framework for the statistical analysis of data
arising from medical imaging, computer vision, biology, chemistry and many more
domains (Dryden and Mardia, 2016).
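The translation-removal step is simple to implement. The following numpy sketch (illustrative only, not the geomstats implementation) centers a landmark configuration and checks that rotating it keeps it centered, so the SO(m)-action restricts to the set of centered matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 2, 4                              # k landmarks in R^m
x = rng.standard_normal((m, k))

x_centered = x - x.mean(axis=1, keepdims=True)   # remove translation

theta = 0.7                              # rotate all landmarks by the same R
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
y = R @ x_centered                       # an equivalent configuration: y ~ x

center_err = np.abs(x_centered.mean(axis=1)).max()
rotated_center_err = np.abs(y.mean(axis=1)).max()
```

Since rotation is linear, the barycenter of Rx is R applied to the barycenter of x, which is zero after centering; the equivalence classes [x] are therefore well defined on centered configurations.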

Definition 2.1.3 (Atlas). Let M be a topological space∗ and k ≥ 1 an integer. A C^k-atlas
A with values in R^d is a collection of pairs (U, ϕ), called charts, where ϕ : U → V is a
homeomorphism between the open sets U ⊂ M and V ⊂ R^d, such that M ⊂ ∪_{(U,ϕ)∈A} U and
for any (U, ϕ) and (U′, ϕ′) in A, the transition map

ϕ′ ◦ ϕ^{-1} : ϕ(U ∩ U′) → ϕ′(U ∩ U′)

is a C^k-diffeomorphism∗.
Two atlases are C k -compatible if their union is still a C k -atlas. Compatibility defines
an equivalence relation, and we will think of the equivalence class of an atlas whenever
referring to one. There is a unique maximal atlas (for the inclusion) that contains a given
atlas.
Note that the transition maps are defined between open sets of R^d (see Figure 2.2); the
usual notions of differentiability are thus available, and will allow us to define such notions on

Defined in Appendix A.

manifolds. We will always consider the case k = ∞ and say that C ∞ maps and atlases are
smooth.

Figure 2.2: Transition maps.

Definition 2.1.4 (Differentiable manifold). We call differentiable manifold of class C k and


dimension d any topological space M that is Hausdorff∗ and second-countable∗ together
with a maximal C k -atlas A with values in Rd .

We sometimes refer to the atlas of M as its differentiable structure, and to this definition
of manifold as the intrinsic definition, as the charts are defined on M rather than on an extrinsic
embedding space. A chart (U, ϕ) defines a set of local coordinates on U, written (x_1, ..., x_d)
for short and defined by x_i = pr_i(ϕ(x)), where pr_i is the projection on the ith coordinate of
R^d. Let us now use the intrinsic definition of a manifold to exemplify further how manifolds
can be obtained from others.

Example 2.6: Product manifold


Let M, N be two manifolds with respective atlases (U_i, ϕ_i)_{i∈I} and (V_j, ψ_j)_{j∈J}. Define
for (i, j) ∈ I × J

φ_ij : U_i × V_j → ϕ_i(U_i) × ψ_j(V_j), (x, y) ↦ (ϕ_i(x), ψ_j(y)).

Then it is easy to check that (U_i × V_j, φ_ij)_{(i,j)∈I×J} is an atlas for the product space
M × N. This atlas does not depend on the choice of the original atlases of M, N (within the right
equivalence class), and allows us to endow M × N with the structure of a manifold. We call
it the product manifold of M and N. Its dimension is dim(M × N) = dim M + dim N.

To give insights into the importance of the general notion of manifold, let us now
consider two counter-examples.

Example 2.7: Cusp and Node
For more details on these two examples see Gallier and Quaintance (2020, chapter
7). First, we consider the classic example of a space that is not a manifold: the nodal
cubic, shown in Figure 2.3. It is the set of points

M_1 = {(x, y) ∈ R^2 | y^2 = x^2 − x^3},

considered with the subset topology. The self-intersection at the origin does not
preserve the topology of R, so no homeomorphism can exist between M_1 and R around
the origin. Thus, M_1 is not a manifold.

Figure 2.3: Nodal cubic M1 Figure 2.4: Cuspidal curve M2

Secondly, we consider the cuspidal curve displayed on Figure 2.4 and defined as the
set
M_2 = {(x, y) ∈ R^2 | y^2 = x^3}.

We can define the maps ϕ : (x, y) ∈ M_2 ↦ y^{1/3} ∈ R and ψ : (x, y) ∈ M_2 ↦ y ∈ R,
each of which defines a smooth atlas on M_2 and endows it with the differentiable structure of a
manifold. However, the two atlases (each constituted of a single chart) are not compatible,
so they define different manifolds.

For the next notions that will be introduced, we will use the convenient setting of
embedded manifolds, but all these notions can be generalized to the abstract case by using
charts to recover functions defined between vector spaces.

2.1.3 Tangent spaces and differentiable maps


We first define smooth curves on manifolds, using a local parametrization of M and the
notion of smooth function from R to Rd . See Figure 2.5 for a representation.
The following definition does not depend on the choice of local parametrization. See
Figure 2.6 for a representation.

Definition 2.1.5 (Smooth curve). Let M be a d-dimensional manifold in RN . A smooth

Figure 2.5: Definition of a smooth curve on a manifold.

curve γ in M is any function γ : I → M where I is an open interval in R, such that for


any t ∈ I, p = γ(t), there is a local chart ϕ : U → V of M at p and  > 0 such that
ϕ ◦ γ : (t − , t + ) → Rd is smooth.
This definition is extended to smooth curves defined on a closed interval I = [a, b] of R
by requiring that γ be the restriction of some smooth curve defined on an open interval
that contains [a, b]. As γ : I → M ⊂ R^N is a differentiable curve in R^N, a tangent
vector along γ at any time t ∈ I is naturally defined. This generalizes to tangent spaces
of the manifold.

Figure 2.6: Left: Definition of a tangent vector as the derivative of a curve. Right: The collection of all
tangent vectors forms a vector space.

Definition 2.1.6 (Tangent vector). Let d ≤ N ∈ N, M be an embedded manifold in RN


of dimension d and p ∈ M . A vector v ∈ RN is tangent to M at p if there exists an open
interval I centered around 0, and a curve γ : I → M such that
γ(0) = p and γ̇(0) = v.
We write Tp M for the set of tangent vectors at p.
Recall that R^d × {0} is a fundamental example of an embedded manifold; it is clear
that the tangent space at any p ∈ R^d × {0} is the whole R^d × {0}. From this case we

deduce the characterizations of tangent spaces equivalent to that of manifolds obtained in
Theorem 2.1.1 (Paulin, 2007, Proposition 3.1).

Theorem 2.1.2. Let M be a manifold in R^N of dimension d.

(1) If U, V ⊂ R^N are two open neighborhoods of p and 0 respectively in R^N and
f : U → V is a diffeomorphism such that f(p) = 0 and f(U ∩ M) = V ∩ (R^d × {0}),
then

T_p M = (df_p)^{-1}(R^d × {0}).

(2) (Local parametrization) If U ⊆ R^N is an open neighborhood of p ∈ M, V ⊆ R^d is
an open neighborhood of 0 and f : V → R^N is a smooth function such that
f(0) = p, f is a homeomorphism between V and U ∩ M, and f is an immersion at 0,
then2

T_p M = Im df_0.

(3) (Local implicit function) If U ⊆ R^N is an open neighborhood of p ∈ M and
f : U → R^{N−d} is a smooth map that is a submersion at p, such that U ∩ M = f^{-1}({0}),
then3

T_p M = ker df_p.

(4) (Local graph) If U ⊆ R^N is an open neighborhood of p ∈ M, V ⊆ R^d a neighborhood
of 0 and f : V → R^{N−d} a smooth map such that U ∩ M = graph(f) and p = (0, f(0)),
then

T_p M = Im{v ↦ (v, df_0(v))}.

From (1) we can see that Tp M is a linear subspace of RN of dimension d. Tangent


spaces thus provide local linearizations of the manifold, a property that will be useful as a
first way to handle data on manifolds. The previous theorem allows computing the tangent
spaces of the common manifolds seen in the previous section.

Example 2.8: Tangent space of the hypersphere


Recall that the hypersphere is the embedded manifold defined by S^d = f^{-1}(0),
where f : x ↦ ‖x‖^2 − 1. This corresponds to (3) of Theorem 2.1.2, therefore for any
x ∈ S^d,

T_x S^d = {v ∈ R^{d+1} | ⟨x, v⟩ = 0}.

2 Recall that the range of f : E → F is defined by Im f = {f(x), x ∈ E} ⊂ F.
3 Similarly, the kernel of f : E → F is defined by ker f = {x ∈ E | f(x) = 0} ⊂ E.

Example 2.9: Tangent space of the hyperbolic space

Similarly, as the hyperbolic space is defined as H^d = f^{-1}(0) where f : x ↦
−x_0^2 + Σ_{i=1}^d x_i^2 + 1, we obtain for any x ∈ H^d

T_x H^d = {v ∈ R^{d+1} | −x_0 v_0 + Σ_{i=1}^d x_i v_i = 0}.

Example 2.10: Tangent space of Stiefel manifold


Similarly, for any U ∈ St(k, n),

T_U St(k, n) = {H ∈ M_{n,k}(R) | U^T H + H^T U = 0}.

Example 2.11: Tangent space of SO(n)


Recall SO(n) = f^{-1}(0) with f : A ↦ A^T A − I_n, and for any R ∈ SO(n), H ∈ M_n(R)
we have df_R(H) = R^T H + H^T R. Therefore, for any R ∈ SO(n),

T_R SO(n) = {H ∈ M_n(R) | R^T H + H^T R = 0}.

Note the special case R = I_n: then T_{I_n} SO(n) = Skew(n), the set of skew-symmetric
matrices of size n. The tangent space at the identity of a Lie group plays a particularly
important role, as will be exposed in Section 4.
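These descriptions give concrete recipes for producing tangent vectors. The numpy sketch below (illustrative, not geomstats code) checks two of them: on the sphere, v − ⟨x, v⟩x is orthogonal to x; on SO(n), any H = RW with W skew-symmetric satisfies RᵀH + HᵀR = 0.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sphere: project an ambient vector onto T_x S^3.
x = rng.standard_normal(4)
x /= np.linalg.norm(x)
v = rng.standard_normal(4)
v_tan = v - np.dot(x, v) * x            # orthogonal projection onto T_x S^3
sphere_tangency = abs(np.dot(x, v_tan))

# SO(n): tangent vectors at R have the form H = R W with W skew-symmetric.
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] *= -1                       # ensure det(R) = +1
W = rng.standard_normal((3, 3))
W = (W - W.T) / 2                       # skew-symmetric part
H = R @ W
so_tangency = np.abs(R.T @ H + H.T @ R).max()
```

For the second check, Rᵀ(RW) + (RW)ᵀR = W + Wᵀ = 0 by orthogonality of R and skew-symmetry of W.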

We can now define the notion of smooth maps between manifolds and their differential.
Definition 2.1.7 (Smooth map). Let M, Q be two manifolds of dimensions d1 , d2 ∈ N embed-
ded in RN . A function f : M → Q is smooth if for every p ∈ M , there are parametrizations
ϕ : V1 → U1 of M at p and ψ : V2 → U2 of Q at f (p) such that f (U1 ) ⊆ U2 and
ψ −1 ◦ f ◦ ϕ : V1 → V2 is smooth.
Note that in the above definition V1 , V2 are open sets respectively of Rd1 , Rd2 so the
notion of smooth map from V1 to V2 is already well known (see lexicon in Appendix A).
Definition 2.1.8 (Differential). Let M, Q be two manifolds of dimensions d1 , d2 ∈ N embed-
ded in RN and f : M → Q a smooth map. For any p ∈ M and any v ∈ Tp M , let γ be a
smooth curve through p such that γ̇(0) = v and define
dfp (v) = (f ◦ γ)0 (0).
This definition does not depend on the choice of curve γ and the map dfp : Tp M → Tf (p) Q
is called the differential or tangent map of f at p. It is a linear map between tangent spaces.
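Definition 2.1.8 is directly computable: pick a curve γ on the sphere with γ(0) = p and γ̇(0) = v, and differentiate f ∘ γ at 0 by finite differences. A numpy sketch under stated assumptions (f is the height function x ↦ x_2 on S², and p, v are orthonormal so γ is a great circle):

```python
import numpy as np

p = np.array([1.0, 0.0, 0.0])          # base point on S^2
v = np.array([0.0, 0.0, 1.0])          # unit tangent vector, <p, v> = 0

def gamma(t):
    # a great circle on S^2 with gamma(0) = p and gamma'(0) = v
    return np.cos(t) * p + np.sin(t) * v

def f(x):
    return x[2]                        # the "height" function on S^2

h = 1e-6
df_p_v = (f(gamma(h)) - f(gamma(-h))) / (2 * h)   # (f o gamma)'(0)
analytic = v[2]                        # here df_p(v) = v_2
```

Here (f ∘ γ)(t) = sin(t), whose derivative at 0 is 1 = v_2, matching the finite-difference value; any other curve with the same initial position and velocity would give the same result.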

It generalizes the differential of a differentiable function defined from R^{N_1} to
some R^{N_2} to functions defined on the manifold M instead of the embedding space. It
coincides with the original differential (Definition A.0.11 in the lexicon) when M = R^d,
hence the use of the same notation df_p. The set of real-valued smooth maps on M will
be particularly useful. For short, we denote it C^∞(M) := C^∞(M, R). C^∞(M) is clearly an
infinite-dimensional R-vector space, and with point-wise multiplication, an algebra.
Next, it is convenient to consider the set of all the tangent spaces at all points,

TM = ⊔_{x∈M} {x} × T_x M = {(x, v) | x ∈ M, v ∈ T_x M},

and its natural projection

π : TM → M, (x, v) ↦ x.

This space is called the tangent bundle of M, and one can show that if M is a manifold
of class C^{k+1} and dimension d, then TM is itself a manifold in R^N × R^N, of class C^k and
dimension 2d. The tangent bundle is the domain of definition of the differential of smooth
functions defined on manifolds:

f : M → Q, df : TM → TQ.

It is also the space where vector fields are valued: a vector field X is a smooth assignment of a
tangent vector to each point of a manifold, i.e. X : M → T M such that ∀p ∈ M, π◦X(p) = p.
X(p) will be written Xp for convenience. Let Γ(T M ) denote the set of all vector fields
(VF). It is clear that Γ(T M ) equipped with point-wise sum and multiplication by a scalar
forms a vector space. Multiplication by a smooth function is also defined pointwise: for any
f ∈ C ∞ (M ) and X ∈ Γ(T M ), f X is the vector field such that

∀p ∈ M, (f X)p = f (p)Xp .

This turns the set of vector fields into a C^∞(M)-module. For a smooth map f : M → R
and a vector field X, we write X(f) for the function defined at every p by

X(f)(p) = df_p(X_p).

This leads to the following remark.

Remark 2.1.1. We defined vector fields as sections of the tangent bundle, i.e., maps σ :
M → T M such that π ◦σ = Id. Alternatively, vector fields can be defined as derivations over
the algebra C ∞ (M ) of smooth real valued functions. A derivation X : C ∞ (M ) → C ∞ (M )
is a linear map that satisfies the Leibniz rule:

∀f, g ∈ C ∞ (M ), X(f g) = f X(g) + X(f )g. (2.1)

One can check that a vector field as defined above indeed defines a derivation. However,
applying the “composition” of two vector fields to a function f is not a derivation because
of second-order derivatives of f . This leads to the definition of the Lie bracket of vector
fields.
Definition 2.1.9 (Lie bracket over Γ(T M )). The Lie bracket of vector fields is defined as
the map

[·, ·] : Γ(T M ) × Γ(T M ) → Γ(T M ),    (X, Y ) ↦ ( f ↦ X(Y (f )) − Y (X(f )) ).    (2.2)
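In coordinates on Rd , the bracket reduces to [X, Y ]p = dYp (Xp ) − dXp (Yp ). The following NumPy sketch (illustrative only, with hypothetical helper names; not geomstats code) checks this formula on classical pairs of vector fields of R2 :

```python
import numpy as np

def jacobian(V, p, eps=1e-6):
    """Finite-difference Jacobian of a vector field V at the point p."""
    d = len(p)
    J = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        J[:, j] = (V(p + e) - V(p - e)) / (2 * eps)
    return J

def lie_bracket(X, Y, p):
    """Coordinate formula for the bracket: [X, Y]_p = dY_p(X_p) - dX_p(Y_p)."""
    return jacobian(Y, p) @ X(p) - jacobian(X, p) @ Y(p)

rotation = lambda p: np.array([-p[1], p[0]])   # infinitesimal rotation
scaling = lambda p: np.array([p[0], p[1]])     # radial (Euler) field
translation = lambda p: np.array([1.0, 0.0])   # constant field d/dx

p = np.array([1.2, -0.7])
print(lie_bracket(rotation, scaling, p))     # ~ [0, 0]: these two fields commute
print(lie_bracket(translation, scaling, p))  # ~ [1, 0]: bracket equals d/dx
```

The rotation and radial fields commute, while bracketing a constant field with the radial field reproduces the constant field.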
A useful tool to handle vector fields locally is to use a basis of Tp M for p in some open
set U .
Definition 2.1.10 (Frame). Let M be a d-dimensional manifold. For any open set U ⊆ M ,
a family of vector fields (X1 , . . . , Xd ) over U is called a frame over U if for every p in U ,
X1 (p), . . . , Xd (p) is a basis of Tp M .


Any chart (U, ϕ) defines a local frame that corresponds to its local coordinates: define the ith curve γi : t ∈ R ↦ ϕ−1 (0, . . . , 0, t, 0, . . . , 0) ∈ U and Xi (γi (t)) = γ̇i (t). The vector field Xi defined on U is often written Xi = ∂/∂xi , or simply ∂i , and corresponds to dϕ−1 (ei ), where (e1 , . . . , ed ) is the canonical basis of Rd . Then the family (X1 , . . . , Xd ) is a local frame over U .
Remark 2.1.2. If a family (X1 , . . . , Xd ) is a frame over the whole manifold (i.e. U = M ), we say that it is a global frame. Whether global frames exist depends on the topology of the manifold; when one exists, the tangent bundle is called trivial, i.e. isomorphic to the direct product M × Rd . This is not the case for, e.g., the sphere of dimension 2, by the hairy ball theorem4 .
Vector fields can be considered as infinitesimal generators of local maps, as we shall
see in the following. These maps, called flows, usually supply strong information on global
properties of the manifold. In this monograph, we will mainly focus on geodesic flows
(Section 3.1), and flows of left-invariant vector fields on Lie groups (Section 4.2).
Definition 2.1.11 (Integral curve). Let X ∈ Γ(T M ) and p0 ∈ M . An integral curve for X
with initial condition p0 is a curve γ : I → M such that
∀t ∈ I, γ̇(t) = Xγ(t) and γ(0) = p0 ,
where I = (a, b) ⊆ R is an open interval containing 0.
An integral curve is thus a curve whose speed γ̇(t) coincides with X at every point along
the curve (see Figure 2.7). A collection of such integral curves is called a flow (Figure 2.8):
4. https://en.wikipedia.org/wiki/Hairy_ball_theorem

Figure 2.7: Example of an integral curve. The curve γ is obtained by integrating the vector field X (blue),
meaning that Xγ(t) is tangent to γ at all times.
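As an illustration of Definition 2.1.11, consider the vector field X(x, y) = (−y, x) on R2 , whose integral curves are circles centered at the origin. A basic explicit Euler scheme (plain NumPy, not part of geomstats) approximates the integral curve starting at (1, 0):

```python
import numpy as np

# Vector field X(x, y) = (-y, x); its integral curves are circles around 0.
X = lambda q: np.array([-q[1], q[0]])

def integral_curve_endpoint(X, p0, t_end=np.pi / 2, dt=1e-4):
    """Approximate gamma(t_end) where gamma'(t) = X(gamma(t)), gamma(0) = p0."""
    p = np.array(p0, dtype=float)
    for _ in range(int(t_end / dt)):
        p = p + dt * X(p)  # explicit Euler step
    return p

end = integral_curve_endpoint(X, [1.0, 0.0])
print(end)  # close to (cos(pi/2), sin(pi/2)) = (0, 1)
```

The exact integral curve is γ(t) = (cos t, sin t), and the Euler approximation recovers its endpoint up to the step size.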

Definition 2.1.12 (Local flow). Let X ∈ Γ(T M ) and p0 ∈ M . A local flow of X at p0 is a map φ : I × U → M , where I is an open interval containing 0 and U is an open set in M containing p0 , such that for every p ∈ U , the curve t ↦ φ(t, p) is an integral curve of X starting from p.

Figure 2.8: Example of a flow. The point φ(t, p) is reached at time t by the integral curve of the vector
field X.

Thanks to the theory of ordinary differential equations (ODE), one can prove that for
any vector field, there is a local flow defined around any point, and if two such flows are
defined on overlapping domains, they coincide on the intersection. For t ∈ I, we write
φt : x ↦ φ(t, x). It is clear that φ0 is the identity map, and for t ≠ 0, φt is a map
defined locally on M . See Lafontaine et al. (2004, Proposition 1.55-56-58) for proofs.

Proposition 2.1.1. Let X be a smooth vector field on M , p0 ∈ M and φ : I × U → M the


local flow of X at p0 . For any s, t ∈ I and x ∈ U ,

• if φs (x) ∈ U and t + s ∈ I, then φt ◦ φs (x) = φt+s (x);

• φt is a local diffeomorphism;

• φt preserves X, in the sense that ∀t ∈ I, ∀x ∈ U , d(φt )x (Xx ) = Xφt (x) .

Definition 2.1.13 (Complete). We say that X is complete if the domain of definition of its
flow (t, x) 7→ φt (x) is the entire R × M .

In that case φt is a diffeomorphism of M , and (φt )t∈R is a one-parameter subgroup of


Diff(M ).
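For a complete vector field, the group property φt ◦ φs = φt+s can be checked directly. For the rotation field X(x, y) = (−y, x) on R2 , the flow is rotation by angle t (a closed form assumed here for this toy illustration; plain NumPy, not geomstats):

```python
import numpy as np

def flow(t, p):
    """Flow of the complete field X(x, y) = (-y, x): rotation by angle t."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s], [s, c]]) @ p

p = np.array([2.0, -1.0])
s, t = 0.3, 1.1
# One-parameter group property: composing flows adds the time parameters.
print(np.allclose(flow(t, flow(s, p)), flow(t + s, p)))  # True
print(np.allclose(flow(0.0, p), p))                      # True: phi_0 = Id
```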
We now have the ingredients to introduce geomstats.

2.2 Implementation in geomstats
Now that the fundamental notion of manifold has been exposed, we can delve more into
the architecture of the geomstats package, and summarize the choices that we made in its
development. Firstly, the package is organized into different modules that distinguish between the geometric and the statistical operations. There is thus a geometry module, which gathers all the implementations of manifolds, connections, and Riemannian metrics, and a learning module, where estimation algorithms are implemented in a generic fashion and take the geometric structure and the data as inputs. The goal is that all the learning algorithms can be run seamlessly on different manifolds and with different metrics. Basic sampling schemes are available, and a more extensive sampling module is currently being developed in the same spirit. A visualization module allows plotting data on the common manifolds
in dimension two or three. In this section, we focus on the geometry module, and more
specifically on the objects that represent manifolds. The Riemannian metric objects will
be described in Section 3.1 along with the mathematical definitions. The statistical and
learning tools will be described in Section 6.
The package is object-oriented in the sense that all the tools are implemented as classes
that contain all the methods related to a tool. Object-oriented programming (OOP) is a programming paradigm that consists in grouping properties and functions related to a common concept into an object, called a class in Python.5
The geometry module of geomstats is organized by classes that each represent a geometric
structure. To guarantee the consistency of all the classes, we implement an abstract parent
Manifold class, and the actual implementations of the usual manifolds are subclasses of
this parent class. The aim of abstract base classes is to provide the minimal skeleton of
attributes and methods expected in its subclasses. The methods of the abstract class are
thus declared but contain no implementation, and they are overridden by the subclasses.

The manifold classes A subclass of Manifold gathers the methods to work with data
lying on the considered manifold. Note that these classes do not explicitly provide a
representation of the manifold (as e.g. a triangulated surface in 2d), but the tools to handle
points and tangent vectors. As we work with embedded manifolds in most cases, points and
tangent vectors are themselves represented by multi-dimensional arrays. These are either
NumPy arrays, or TensorFlow or PyTorch tensors, according to the backend that is being used.
Mathematically, the first attribute of a manifold is of course its dimension, called dim
for brevity throughout the package. Then we use an attribute to inform on the expected
type of the point: whether vectors (for e.g. the hypersphere and hyperbolic space) or

5. For more on OOP, we refer the reader to online tutorials such as the one written by the RealPython team.

matrices (e.g. SPD matrices, the special orthogonal group) should be used. This is called
default_point_type.
Furthermore, a Manifold in geomstats should always implement a method that evaluates
whether a given element belongs to that manifold, and whether a given vector is a tangent
vector at a given point. These are the belongs and is_tangent methods. For practical
reasons we also add a random_point method, to generate random points that belong to the
manifold (regardless of the distribution). This is useful in particular to test the methods
and the learning algorithms. We obtain the following base class:

import abc

import geomstats.backend as gs


class Manifold(abc.ABC):
    """Class for manifolds."""

    def __init__(
            self, dim, metric=None, default_point_type='vector', **kwargs):
        super().__init__(**kwargs)
        self.dim = dim
        self.default_point_type = default_point_type
        self.metric = metric

    @abc.abstractmethod
    def belongs(self, point, atol=gs.atol):
        """Evaluate if a point belongs to the manifold."""

    @abc.abstractmethod
    def is_tangent(self, vector, base_point, atol=gs.atol):
        """Check whether the vector is tangent at base_point."""

    @abc.abstractmethod
    def random_point(self, n_samples=1, bound=1.):
        """Sample random points on the manifold."""

Note that the Manifold class contains a metric attribute. This will be detailed in
Section 3.1.

Implementation trick 2.1. The methods decorated (@ symbol) with abc.abstractmethod are declared as abstract methods of the class. A class that contains abstract methods cannot be instantiated. This constrains the developer to implement these functions explicitly when writing subclasses of Manifold.
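As a toy illustration of this mechanism (standard library only; the Line subclass below is hypothetical, not part of geomstats), instantiating the abstract class fails, while a subclass overriding the abstract method works:

```python
import abc

class Manifold(abc.ABC):
    """Minimal abstract base class, mimicking the pattern above."""

    @abc.abstractmethod
    def belongs(self, point):
        """Evaluate if a point belongs to the manifold."""

try:
    Manifold()
except TypeError as err:
    print(err)  # abstract classes cannot be instantiated

class Line(Manifold):
    """Toy subclass: the real line, where every float belongs."""

    def belongs(self, point):
        return isinstance(point, float)

print(Line().belongs(0.5))  # True
```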

Two elementary classes of manifolds Throughout the current section, we have met
two elementary ways of defining a manifold, that correspond respectively to (1) and (2) of
Theorem 2.1.1 (page 5):

1. As the pre-image of a submersion f : RN → RN −d . We refer to such a space as a level set. In this case, once f is specified, it is straightforward to implement the belongs and is_tangent methods by evaluating f and its differential. It also makes sense to add projection and to_tangent methods from the embedding space to, respectively, the manifold and the tangent space at a point.

2. As open sets of a d-dimensional vector space, called the ambient space. In this case, all the tangent spaces are identified with the ambient space. For consistency, we add a projection method that maps any d-dimensional vector to the manifold, at a tolerance threshold ε > 0 away from the boundary of the set (if there is one). This method is not uniquely defined and can be understood as a regularization for inputs very close to the boundary of the open set, which is helpful in learning algorithms. The method is_tangent then just checks whether the input belongs to the ambient space, and we add a method to_tangent that calls the projection of the ambient space to project to a tangent space, assuming the ambient space to be itself embedded in another space, or to be a vector space, whose projection is the identity.

We thus implement two more abstract classes, for open sets and level sets respectively:

class OpenSet(Manifold, abc.ABC):
    """Class for manifolds that are open sets of a vector space."""

    def __init__(self, dim, ambient_space, **kwargs):
        kwargs.setdefault("shape", ambient_space.shape)
        super().__init__(dim=dim, **kwargs)
        self.ambient_space = ambient_space

    def is_tangent(self, vector, base_point, atol=gs.atol):
        """Check whether the vector is tangent at base_point."""
        return self.ambient_space.belongs(vector, atol)

    def to_tangent(self, vector, base_point):
        """Project a vector to a tangent space of the manifold."""
        return self.ambient_space.projection(vector)

    def random_point(self, n_samples=1, bound=1.):
        """Sample random points on the manifold."""
        sample = self.ambient_space.random_point(n_samples, bound)
        return self.projection(sample)

    @abc.abstractmethod
    def projection(self, point):
        """Project a point in ambient manifold on manifold."""

class LevelSet(Manifold, abc.ABC):
    """Class for manifolds embedded in a vector space by a submersion."""

    def __init__(
            self, dim, embedding_space, submersion, value,
            tangent_submersion, **kwargs):
        super().__init__(
            dim=dim, default_point_type=embedding_space.default_point_type,
            **kwargs)
        self.embedding_space = embedding_space
        self.embedding_metric = embedding_space.metric
        self.submersion = submersion
        self.value = value
        self.tangent_submersion = tangent_submersion

    def belongs(self, point, atol=gs.atol):
        """Evaluate if a point belongs to the manifold."""
        belongs = self.embedding_space.belongs(point, atol)
        if not gs.any(belongs):
            return belongs
        value = self.value
        constraint = gs.isclose(self.submersion(point), value, atol=atol)
        if value.ndim == 2:
            constraint = gs.all(constraint, axis=(-2, -1))
        elif value.ndim == 1:
            constraint = gs.all(constraint, axis=-1)
        return gs.logical_and(belongs, constraint)

    def is_tangent(self, vector, base_point, atol=gs.atol):
        """Check whether the vector is tangent at base_point."""
        belongs = self.embedding_space.belongs(vector, atol)
        tangent_sub_applied = self.tangent_submersion(vector, base_point)
        constraint = gs.isclose(tangent_sub_applied, 0., atol=atol)
        value = self.value
        if value.ndim == 2:
            constraint = gs.all(constraint, axis=(-2, -1))
        elif value.ndim == 1:
            constraint = gs.all(constraint, axis=-1)
        return gs.logical_and(belongs, constraint)

    @abc.abstractmethod
    def projection(self, point):
        """Project a point in embedding space on the manifold."""

Implementation trick 2.2. All the methods of geomstats are vectorized6 , in the sense that
they can take as argument either one input, or a collection of inputs corresponding to
multiple samples. It is very useful to use the einsum method for that purpose, with ellipses
(‘...’) that represent an optional additional dimension.
For example, the syntax gs.einsum('...,...i->...i', coef, point) performs scalar
multiplication between a list of scalars (coef) and a list of points, but it also works for a
single scalar and a single point.
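A minimal NumPy illustration of this trick (np.einsum has the same ellipsis semantics as the gs.einsum used throughout the package):

```python
import numpy as np

def scalar_mul(coef, point):
    """Scalar multiplication, vectorized: '...' is an optional batch axis."""
    return np.einsum('...,...i->...i', coef, point)

# The same line handles a single (scalar, point) pair and batches of them.
single = scalar_mul(2.0, np.array([1.0, 2.0, 3.0]))
batch = scalar_mul(np.array([1.0, 2.0]), np.array([[1.0, 0.0], [0.0, 1.0]]))
print(single)  # [2. 4. 6.]
print(batch)   # [[1. 0.] [0. 2.]]
```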

To make sure that the attributes that represent the ambient or embedding space do
implement the methods that are called in OpenSet and LevelSet, we also implemented an
abstract VectorSpace class. Actual manifolds are then implemented as subclasses of the
corresponding abstract manifold and must implement all the abstract methods. We give
examples of each class below, using the Hypersphere and Stiefel manifold as level sets, and
the Poincaré ball as open set.

Example 2.12: Implementation of the hypersphere


The hypersphere is implemented as embedded in Rd+1 , so it is a subclass of
LevelSet.

6. Vectorization may also be referred to as array programming, e.g. on Wikipedia.

class Hypersphere(LevelSet):
    """Class for the n-dimensional hypersphere."""

    def __init__(self, dim):
        super().__init__(
            dim=dim, embedding_space=Euclidean(dim + 1),
            metric=HypersphereMetric(dim),
            submersion=lambda x: gs.sum(x ** 2, axis=-1), value=1.,
            tangent_submersion=lambda v, x: 2 * gs.sum(x * v, axis=-1))

    def projection(self, point):
        """Project a point on the hypersphere."""
        norm = gs.linalg.norm(point, axis=-1)
        if gs.any(norm < gs.atol):
            logging.warning('0 cannot be projected to the hypersphere')
        return gs.einsum('...,...i->...i', 1. / norm, point)

    def to_tangent(self, vector, base_point):
        """Project a vector to the tangent space."""
        sq_norm = gs.sum(base_point ** 2, axis=-1)
        inner_prod = self.embedding_metric.inner_product(base_point, vector)
        coef = inner_prod / sq_norm
        return vector - gs.einsum('...,...j->...j', coef, base_point)
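To see what these two methods guarantee, here is a small NumPy sketch (mirroring the logic above, not calling geomstats) checking that projection lands on the unit sphere and that to_tangent returns a vector orthogonal to the base point:

```python
import numpy as np

def sphere_projection(point):
    """Normalize to unit norm (same logic as Hypersphere.projection)."""
    point = np.asarray(point, dtype=float)
    return point / np.linalg.norm(point, axis=-1, keepdims=True)

def sphere_to_tangent(vector, base_point):
    """Remove the component of `vector` along `base_point`."""
    coef = np.sum(vector * base_point, axis=-1, keepdims=True)
    coef = coef / np.sum(base_point ** 2, axis=-1, keepdims=True)
    return vector - coef * base_point

point = sphere_projection([3.0, 4.0, 0.0])
tangent = sphere_to_tangent(np.array([1.0, 1.0, 1.0]), point)
print(np.linalg.norm(point))   # 1.0: the projection lies on the sphere
print(np.dot(tangent, point))  # ~0: the result is tangent at the base point
```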

Example 2.13: Implementation of the Stiefel manifold


The Stiefel manifold is implemented as embedded in the space of n × p matrices,
so it is a subclass of LevelSet. The derivation of the projection map can be found
in Absil and Malick (2012).

class Stiefel(LevelSet):
    """Class for Stiefel manifolds St(n, p)."""

    def __init__(self, n, p, **kwargs):
        if p > n:
            raise ValueError("p needs to be smaller than n.")

        dim = int(p * n - (p * (p + 1) / 2))
        matrices = Matrices(n, p)
        canonical_metric = StiefelCanonicalMetric(n, p)
        kwargs.setdefault("metric", canonical_metric)
        super(Stiefel, self).__init__(
            dim=dim,
            embedding_space=matrices,
            submersion=lambda x: matrices.mul(matrices.transpose(x), x),
            value=gs.eye(p),
            tangent_submersion=lambda v, x: 2
            * matrices.to_symmetric(matrices.mul(matrices.transpose(x), v)),
            **kwargs
        )
        self.n = n
        self.p = p

    def to_tangent(self, vector, base_point):
        """Project a vector to a tangent space of the manifold."""
        aux = Matrices.mul(Matrices.transpose(base_point), vector)
        sym_aux = Matrices.to_symmetric(aux)
        return vector - Matrices.mul(base_point, sym_aux)

    def projection(self, point):
        """Project a close enough matrix to the Stiefel manifold."""
        mat_u, _, mat_v = gs.linalg.svd(point)
        return Matrices.mul(mat_u[..., :, : self.p], mat_v)
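A quick NumPy check of this SVD-based projection (a standalone re-implementation for illustration, not the geomstats method itself): the projection of an arbitrary matrix should have orthonormal columns, i.e. belong to St(n, p).

```python
import numpy as np

def stiefel_projection(point, p):
    """Project onto St(n, p) via the SVD, as in the method above."""
    u, _, vh = np.linalg.svd(point)  # full SVD: u is (n, n), vh is (p, p)
    return u[..., :, :p] @ vh

rng = np.random.default_rng(0)
mat = rng.normal(size=(5, 3))
q = stiefel_projection(mat, p=3)
print(np.allclose(q.T @ q, np.eye(3)))  # True: columns are orthonormal
```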

Example 2.14: Implementation of the Poincaré ball


The Poincaré ball is one of the models of hyperbolic geometry, and is defined as the open unit ball of Rd . It is thus a subclass of OpenSet.

class PoincareBall(OpenSet):
    """Class for the n-dimensional hyperbolic space."""

    def __init__(self, dim, scale=1):
        super().__init__(
            dim=dim, ambient_space=Euclidean(dim),
            metric=PoincareBallMetric(dim))

    def belongs(self, point, atol=gs.atol):
        """Test if a point belongs to the unit ball."""
        return gs.sum(point**2, axis=-1) < (1 - atol)

As stated above, the projection method is not uniquely defined, and we simply map any d-dimensional vector to the manifold at a tolerance threshold ε > 0 away from the boundary of the set.

def projection(self, point):
    """Project a point on the unit ball."""
    if point.shape[-1] != self.dim:
        raise NameError("Wrong dimension, expected ", self.dim)

    l2_norm = gs.linalg.norm(point, axis=-1)
    if gs.any(l2_norm >= 1 - gs.atol):
        projected_point = gs.einsum(
            '...j,...->...j', point * (1 - gs.atol), 1. / l2_norm)
        projected_point = -gs.maximum(-projected_point, -point)
        return projected_point
    return point
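The behavior of this projection can be sketched in plain NumPy (an illustrative re-implementation for a single point, not the geomstats code): points on or outside the closed unit ball are pulled back just inside it, while interior points are left untouched.

```python
import numpy as np

ATOL = 1e-12  # tolerance away from the boundary of the open ball

def ball_projection(point):
    """Pull points of norm >= 1 back just inside the open unit ball."""
    point = np.asarray(point, dtype=float)
    norm = np.linalg.norm(point)
    if norm >= 1 - ATOL:
        return point * (1 - ATOL) / norm
    return point

outside = ball_projection([3.0, 4.0])
inside = ball_projection([0.1, 0.2])
print(np.linalg.norm(outside) < 1)  # True: now strictly inside the ball
print(inside)                       # unchanged: [0.1 0.2]
```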

Finally, we illustrate this with a diagram representing all the base classes for manifolds and all the manifolds in Figure 2.9. The abstract base classes are shown with black bounding boxes.


Figure 2.9: Architecture of the manifolds of geomstats. The abstract base classes are in black bounded
boxes. Inheritance is shown by blue arrows. Hyperbolic is an exception as it is a common interface to the
different representations of hyperbolic geometry.

Inheritances occur in two cases:


• With an abstract base class as parent class;
• When both parent and child class represent the same manifold, as is the case in
the CorrelationBundle of Example 5.8. A common interface Hyperbolic for the three
representations of hyperbolic geometry follows this logic as any of the three represen-
tations can be chosen, but the instantiated object is either Hyperboloid, PoincareBall
or PoincareHalfSpace as chosen.

Manifolds can then be composed to define other manifolds by products (Example 2.6)
or quotients (see Section 5.1.4). For products, we create the class ProductManifold that
takes existing manifolds to construct a new one, and computations on each manifold can be
done in parallel:
class ProductManifold(Manifold):
    r"""Class for a product of manifolds M_1 \times ... \times M_n."""

    def __init__(self, manifolds, n_jobs=1):
        self.dims = [manifold.dim for manifold in manifolds]
        super().__init__(dim=sum(self.dims))

3 Riemannian manifolds
3.1 Riemannian metrics
We now introduce a new structure on a differentiable manifold: the Riemannian metric, which allows one to define the length of a curve, a distance function, a volume form, etc. Note that this additional structure may not be canonical, raising the thorny question of choosing the right metric for a given application.
Recall that T M is the tangent bundle of a smooth manifold M (defined page 14).
Definition 3.1.1 (Riemannian metric). Let M be a smooth d-dimensional manifold. A
Riemannian metric on M (or T M ) is a family (h·, ·ip )p∈M of inner products∗ on each
tangent space Tp M , such that h·, ·ip depends smoothly on p. More formally, for any chart (U, ϕ) and frame (X1 , . . . , Xd ) on U , the maps

p ↦ hXi (p), Xj (p)ip ,    1 ≤ i, j ≤ d,

are smooth. A pair (M, h·, ·i) is called a Riemannian manifold.

A metric is often written g = (gp )p∈M , where gp is the symmetric, positive definite (SPD) matrix representing the inner product in a chart, that is

gij (p) = h ∂i |p , ∂j |p ip .

Alternatively, we sometimes define a metric with the notation g = f (dx1 , . . . , dxd ), where f is the quadratic form associated with g, and the dxi represent vector coordinates (as linear forms). For example, the usual Euclidean metric is g = dx1² + · · · + dxd². The following theorem ensures that a metric is indeed a general structure. A proof can be found e.g. in Lafontaine et al. (2004, Theorem 2.2).

∗ Defined in Appendix A.

Theorem 3.1.1 (Existence). Any smooth manifold admits a Riemannian metric.

A Riemannian metric also defines a norm on T M , as usually defined by an inner product on each tangent space:

∀x ∈ M, ∀v ∈ Tx M ,    ‖v‖x = √(gx (v, v)) = √(hv, vix ).

Implementation in geomstats In order to guarantee flexibility, we have decided to


keep manifolds and metrics separated in different objects. Indeed, although there may exist
a canonical Riemannian metric on a given manifold, the choice of metric is not always
natural, and researchers have struggled to find criteria to choose the right metric for the
application at hand. The aim of geomstats is to allow researchers to compare different
metrics on their problem and, in the future, to learn or optimize the metric (Louis et al., 2019; Hauberg, 2019).
We create an abstract RiemannianMetric class in geomstats to gather the basic attributes
and methods expected of a metric. The most general way of defining a metric is to provide
the metric_matrix method, that is x 7→ (gij (x))1≤i,j≤d . By default, we use the identity
matrix for all points, resulting in the Euclidean metric.
class RiemannianMetric(Connection):
    """Class for Riemannian and pseudo-Riemannian metrics."""

    def __init__(self, dim, signature=None):
        super().__init__(dim=dim)
        if signature is None:
            signature = (dim, 0)
        self.signature = signature

    def metric_matrix(self, base_point=None):
        """Inner product matrix at base point."""
        return gs.eye(self.dim)

    def inner_product(self, tangent_vec_a, tangent_vec_b, base_point=None):
        """Inner product between two tangent vectors at a base point."""
        inner_prod_mat = self.metric_matrix(base_point)
        inner_prod = gs.einsum(
            '...j,...jk,...k->...', tangent_vec_a, inner_prod_mat,
            tangent_vec_b)
        return inner_prod

    def squared_norm(self, vector, base_point=None):
        """Compute the square of the norm of a vector."""
        return self.inner_product(vector, vector, base_point)

    def norm(self, vector, base_point=None):
        """Compute norm of a vector."""
        sq_norm = self.squared_norm(vector, base_point)
        return gs.sqrt(sq_norm)
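The generic inner_product above is just v⊤ G(x) w evaluated with einsum. A NumPy sketch (not the geomstats class) makes this explicit for the identity and a diagonal metric matrix:

```python
import numpy as np

def inner_product(metric_matrix, vec_a, vec_b):
    """v_a^T G v_b, with '...' allowing an optional batch axis."""
    return np.einsum('...j,...jk,...k->...', vec_a, metric_matrix, vec_b)

v, w = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(inner_product(np.eye(2), v, w))            # 1.0: the Euclidean case
print(inner_product(np.diag([2.0, 1.0]), v, w))  # 4.0: a non-identity metric
```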

Remark 3.1.1.

1. The above class inherits from Connection. We indeed chose a class for an affine
connection as parent class to a Riemannian metric because it is a more general
structure, as explained in Section 3.2.1.

2. The attribute signature refers to the signature of the inner product, in case it is only
a non-degenerate bilinear form, not necessarily positive. In this case the metric is
called a pseudo-Riemannian metric.

We also add a metric property to the Manifold class, meaning that it is an attribute of the class that can be set externally; the setter checks that the given argument is indeed an instance of a RiemannianMetric object. Of course, all manifolds studied in this
monograph come with a default metric, but users can choose to use different metrics or
implement new metrics with only a few minimal operations. When closed-form solutions are
available, the generic methods are overridden. With a metric attribute, a Manifold actually becomes a Riemannian manifold, but we did not think it relevant to implement another layer of abstract classes for Riemannian manifolds (i.e. a RiemannianManifold class), as all necessary
operations are either in the Manifold object if they don’t depend on the metric, or in the
RiemannianMetric object if they do.

@property
def metric(self):
    """Riemannian Metric associated to the Manifold."""
    return self._metric

@metric.setter
def metric(self, metric):
    if metric is not None:
        if not isinstance(metric, RiemannianMetric):
            raise ValueError(
                'The argument must be a RiemannianMetric object')
        if metric.dim != self.dim:
            metric.dim = self.dim
    self._metric = metric

Example 3.1: Euclidean metric


Let M = Rd be the standard vector space of dimension d, which is trivially a smooth manifold, and consider its standard inner product, defined for all x, y ∈ Rd by

hx, yi2 = x1 y1 + · · · + xd yd = x⊤ y.

As Tx Rd = Rd , this defines a metric on Rd , which is referred to as the Euclidean metric.

Example 3.2: Product metric


Let (M, g) and (M ′ , g ′ ) be two Riemannian manifolds, and recall from Example 2.6 (page 9) that the Cartesian product M × M ′ is a manifold. It is also a Riemannian manifold. Indeed, define the product metric g ⊕ g ′ as the map defined at any (x, x′ ) ∈ M × M ′ and for all (v, v ′ ), (w, w′ ) ∈ Tx M × Tx′ M ′ by

(g ⊕ g ′ )(x,x′ ) ((v, v ′ ), (w, w′ )) = gx (v, w) + g ′x′ (v ′ , w′ ).

In geomstats, it is possible to define such product metrics from existing objects of


the class RiemannianMetric, and operations for each metric can be run in parallel if
necessary.

class ProductRiemannianMetric(RiemannianMetric):
    """Class for product of Riemannian metrics."""

    def __init__(self, metrics, default_point_type='vector', n_jobs=1):
        self.n_metrics = len(metrics)
        dims = [metric.dim for metric in metrics]
        signatures = [metric.signature for metric in metrics]
        sig_pos = sum(sig[0] for sig in signatures)
        sig_neg = sum(sig[1] for sig in signatures)
        super().__init__(
            dim=sum(dims), signature=(sig_pos, sig_neg),
            default_point_type=default_point_type)

Let (N, g) be a Riemannian manifold, M a smooth manifold, and f : M → N a map. Define the pull-back metric f ∗ g on M by

(f ∗ g)x : Tx M × Tx M → R,    (v, w) ↦ gf (x) (dfx v, dfx w).    (3.1)

Now suppose that f is an immersion. If (f ∗ g)x is non-degenerate and of constant signature for all x ∈ M , then (M, f ∗ g) is a Riemannian manifold.
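As a concrete check of Equation (3.1) when g is Euclidean, the matrix of (f ∗ g)x is J⊤J, with J the Jacobian of f at x. The following NumPy sketch (finite-difference Jacobian, hypothetical helper name; not geomstats code) pulls back the Euclidean metric of R2 by the polar-coordinate map f (r, θ) = (r cos θ, r sin θ) and recovers the familiar dr² + r² dθ²:

```python
import numpy as np

def pullback_metric_matrix(f, x, eps=1e-6):
    """Matrix of (f*g)_x for g Euclidean: J^T J, with J the Jacobian of f."""
    x = np.asarray(x, dtype=float)
    d = len(x)
    J = np.zeros((len(f(x)), d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)  # central differences
    return J.T @ J

polar = lambda x: np.array([x[0] * np.cos(x[1]), x[0] * np.sin(x[1])])
r, theta = 2.0, 0.7
G = pullback_metric_matrix(polar, [r, theta])
print(G)  # close to diag(1, r^2): the Euclidean metric in polar coordinates
```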

Definition 3.1.2 (Isometry). Let (M, g) and (M 0 , g 0 ) be two Riemannian manifolds, and
f : M → M 0 . Then f is called an isometry if it is a bijection and f ∗ g 0 = g.

When M = M ′ and g = g ′ , we write Isom(M ) for the set of isometries of M ; Myers and Steenrod (1939) showed that it is a Lie group that acts smoothly on M .
Now, consider the case where M ⊆ N = Rd is a submanifold of N , and f = i is the inclusion map. Then g is called the embedding metric and i∗ g is its restriction to M . This case appears in many examples in geomstats.

Example 3.3: Metric on the hypersphere


The hypersphere S d ⊂ Rd+1 endowed with the restriction of the Euclidean metric
to S d is a Riemannian manifold. We call this metric the standard spherical metric,
and implement it by calling the embedding metric.

class HypersphereMetric(RiemannianMetric):
    """Class for the metric on the hypersphere."""

    def __init__(self, dim):
        super().__init__(dim=dim, signature=(dim, 0))
        self.embedding_metric = EuclideanMetric(dim + 1)

    def metric_matrix(self, base_point=None):
        """Inner-product matrix at a base point."""
        return gs.eye(self.dim + 1)

    def inner_product(self, tangent_vec_a, tangent_vec_b, base_point=None):
        """Inner-product of two tangent vectors at a base point."""
        return self.embedding_metric.inner_product(
            tangent_vec_a, tangent_vec_b, base_point)

Let us now define the Lorentz bilinear form of Rd+1 . It is the canonical bilinear form with signature (1, d), i.e. for any x, y ∈ Rd+1 ,

hx, yiL = −x0 y0 + x1 y1 + · · · + xd yd ,    (3.2)

and we write ‖ · ‖L for the associated quadratic form. This is the underlying embedding metric in the following example.

Example 3.4: Hyperbolic metric


Using the Lorentz form, recall that H d ⊂ Rd+1 is the set of points such that ‖x‖L = −1. Consider now the open subset H+d of H d :

H+d = { x ∈ Rd+1 | x0 > 0, ‖x‖L = −1 }.

Now consider any x ∈ H+d and two tangent vectors v, w ∈ Tx M . By Example 2.9, this means that hv, xiL = hw, xiL = 0. As h·, ·iL is negative definite on Rx = {λx | λ ∈ R}, and its signature is (1, d), it is positive definite on the orthogonal complement of Rx for the Lorentz form, i.e. on Tx M . We can thus conclude that (H+d , h·, ·iL ) is a Riemannian manifold.
In the implementation, we use the abstract class and simply override the
inner_product function.

class HyperbolicMetric(RiemannianMetric):
    """Class for the hyperbolic metric."""

    def __init__(self, dim):
        super().__init__(dim=dim, signature=(1, dim))

    def metric_matrix(self, base_point=None):
        """Inner product matrix at base point."""
        diagonal = gs.array([-1.] + [1.] * self.dim)
        return from_vector_to_diagonal_matrix(diagonal)

    def inner_product(self, tangent_vec_a, tangent_vec_b, base_point=None):
        """Inner product between two tangent vectors at a base point."""
        diagonal = gs.array([-1.] + [1.] * self.dim)
        return gs.sum(diagonal * tangent_vec_a * tangent_vec_b, axis=-1)

Example 3.5: Frobenius metric


The analog of the Euclidean metric on matrix spaces is the Frobenius inner product, defined by

∀A, B ∈ Mm,n (R),    hA, BiF = tr(A⊤ B) = Σi,j Aij Bij ,

where tr is the trace operator. We use the right-hand-side expression in geomstats to avoid computing a matrix product (O(mn²) operations against O(mn)).
Endowed with this metric (or its restriction), Mn (R), GL(n) (Example 4.1 page 45),
SO(n) (Example 2.3 page 6) and SE(n) (Example 4.7 page 51) are Riemannian
manifolds.
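The equality tr(A⊤B) = Σij Aij Bij , and hence the cheaper elementwise implementation, is easy to check numerically (plain NumPy, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(3, 4))

via_trace = np.trace(A.T @ B)  # O(mn^2): forms an n x n matrix first
via_sum = np.sum(A * B)        # O(mn): elementwise, as used in geomstats
print(np.isclose(via_trace, via_sum))  # True
```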

3.2 Affine connections and the Levi-Civita connection


A connection is an additional structure that can be defined independently of a Riemannian
metric. It provides a way to compare tangent spaces from one point to another, by defining
the notion of parallelism. For a detailed and historical account of the different approaches
to defining connections, we refer to Marle (2005).

3.2.1 Connections
Definition 3.2.1 (Connection). Let M be a smooth manifold. A connection on M is an
R-bilinear map ∇ : Γ(T M ) × Γ(T M ) → Γ(T M ) that verifies for all X, Y ∈ Γ(T M ), ∀f ∈
C ∞ (M ):

1. (Linearity of 1st argument) ∇f X Y = f ∇X Y ,

2. (Leibniz rule in 2nd argument) ∇X (f Y ) = X(f )Y + f ∇X Y .

The vector field ∇X Y is called the covariant derivative of Y w.r.t. X.

In fact, (∇X Y )p only depends on the value of X at p, and not on its values in a neighborhood of p. In contrast, it does depend on the values of Y around p.
In local coordinates (x1 , . . . , xd ), the Christoffel symbols are used to specify the connection. Recall that (∂i )i is a local frame, so we can decompose ∇∂i ∂j in this basis, and define (Γkij )ijk such that

∇∂i ∂j = Γkij ∂k ,

where we used the Einstein summation convention, meaning that a sum occurs along the indices that appear both in subscript and superscript, here k.
Two vector fields X, Y ∈ Γ(T M ) can be decomposed locally in coordinates, writing X = X i ∂i and Y = Y i ∂i , where the X i , Y i are coordinate functions defined locally. Then, using the properties of a connection,

∇X Y = X i ∇∂i (Y j ∂j ) = X i (∂Y j /∂xi ) ∂j + X i Y j Γkij ∂k .    (3.3)

As this formula shows, a connection provides a correction term when compared to the directional derivative of Y with respect to X in a chart.

3.2.2 Parallel transport and geodesics


We now focus on vector fields that are defined along a curve γ : [a, b] → M , i.e. smooth maps X : [a, b] → T M such that at any t ∈ [a, b], X(t) ∈ Tγ(t) M . Note that such a vector field need not be defined on the whole manifold, but can be locally extended to an open
set around every point. Thankfully, one can show that there exists a covariant derivative
that coincides with (∇γ̇(t) X)γ(t) at all t ∈ [a, b] and for any X defined on a neighborhood
of γ(t) (Lafontaine et al., 2004, Theorem 2.68). We will skip these technicalities and admit
that ∇γ̇ X is well-defined for any vector field X along γ.
We now define a central notion in geometric statistics: the parallel transport.

Definition 3.2.2 (Parallel vector field). Let M be a smooth manifold and ∇ a connection
on M. For any curve γ : [a, b] → M in M , a vector field X along γ is parallel if

∇γ̇(t) X(t) = 0. (3.4)

In a local chart, using the Christoffel symbols and in particular Equation (3.3), Equation (3.4) can be written as, for all t ∈ [a, b],

Ẋ k (t) + Γkij X i (t)γ̇ j (t) = 0. (3.5)

From the properties of ODEs, one can prove the following existence and uniqueness property (Lafontaine et al., 2004, Proposition 2.72).
Proposition 3.2.1. Let M be a smooth manifold and let ∇ be a connection on M . For
every C 1 curve γ : [a, b] → M in M , for every t ∈ [a, b] and every v ∈ Tγ(t) M , there is a
unique parallel vector field X along γ such that X(t) = v.
For such a parallel vector field X and s ∈ [a, b], we call X(s) the parallel transport of v along γ from t to s, and write X(s) = Πsγ,t v. Another consequence of the properties of ODEs is that for any s, t ∈ [a, b], Πsγ,t is a linear isomorphism between the tangent spaces Tγ(t) M and Tγ(s) M .

Figure 3.1: Representation of a parallel vector field X (green) along a curve γ (blue). The orientation of
X with respect to the speed γ̇ of γ stays the same.

Intuitively, the parallel transport equation constrains the transported vector to keep a
certain angle w.r.t the speed γ̇ of the curve while moving along it, see Figure 3.1 for an
illustration. Conversely, the connection ∇X Y could be retrieved from the parallel transport
Πsγ,t Y when the curve is shrunk to a point in the direction of X, i.e. s → t and γ̇(t) = X.
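To make this concrete, the transport equation can be integrated numerically. The sketch below (plain NumPy, not the geomstats API; the helper name transport_along_great_circle is ours) uses the classical fact that on the sphere embedded in R³ the covariant derivative is the ambient derivative projected onto the tangent space, so a parallel field X along γ satisfies Ẋ(t) = −⟨X(t), γ̇(t)⟩γ(t).

```python
import numpy as np

def transport_along_great_circle(x, v, w, t_end=1.0, n_steps=2000):
    """Integrate the transport ODE X' = -<X, gamma'> gamma along the
    great circle gamma(t) = cos(t|v|) x + sin(t|v|) v/|v|."""
    speed = np.linalg.norm(v)
    u = v / speed
    dt = t_end / n_steps
    X = w.astype(float).copy()
    for i in range(n_steps):
        t = i * dt
        gamma = np.cos(t * speed) * x + np.sin(t * speed) * u
        gamma_dot = speed * (-np.sin(t * speed) * x + np.cos(t * speed) * u)
        # one Euler step of the parallel transport equation
        X = X + dt * (-(X @ gamma_dot) * gamma)
    return X

# Transport w along the equator starting from e1.
x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])   # unit-speed great circle
w = np.array([0.0, 0.6, 0.8])   # tangent vector at x
Xt = transport_along_great_circle(x, v, w)
```

With w = 0.6 γ̇(0) + 0.8 e₃, the transported vector should keep its component along γ̇ and its vertical component, up to the integration error, illustrating that the angle with the velocity is preserved.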
We now focus on a particular set of curves on M for which the velocity is parallel
transported along the curve.
Definition 3.2.3. Let M be a smooth manifold endowed with a connection ∇. A curve
γ : [a, b] → M is said to be autoparallel and is called a geodesic of (M, ∇) if it satisfies for
all t ∈ [a, b]
∇γ̇(t) γ̇(t) = 0. (3.6)
Note that ∇γ̇ γ̇ can be interpreted as the covariant acceleration of γ, so that Equation (3.6)
constrains geodesics to be zero-acceleration curves. The flow of the geodesic equation is
called the geodesic flow; it is a fundamental example of a dynamical system and generalizes
straight lines from Euclidean spaces.
The geodesic equation can be written in local coordinates like the parallel transport
equation. A geodesic curve γ satisfies for all times t ∈ [a, b]
γ̈ k (t) + Γkij γ̇ i (t)γ̇ j (t) = 0. (3.7)

From the properties of second-order differential equations, for any (x, v) ∈ T M , there exists
a maximal interval Ix,v ⊆ R such that γx,v : Ix,v → M is the unique geodesic that verifies
γx,v (0) = x and γ̇x,v (0) = v. Moreover, by homogeneity of Equation (3.7), for any s > 0,
Ix,sv = (1/s) Ix,v and γx,sv (t) = γx,v (st). We deduce that the set of vectors v ∈ Tx M such that
1 ∈ Ix,v is non-empty, open, and contains 0. This leads to the following definition.

Definition 3.2.4 (Exponential map). Let ∇ be a connection on a smooth manifold M . The


map (x, v) 7→ γx,v (1) defined on the open set {(x, v) ∈ T M, 1 ∈ Ix,v } and with values in M
is called the exponential map of ∇. We say that M is geodesically complete if the exponential
map is defined on the whole of T M .

For any x ∈ M , we write Expx : v ∈ Tx M 7→ γx,v (1) for the exponential map at x.

Remark 3.2.1. Note that although we introduced many notions in this paragraph, there
were no examples. Indeed a connection or its Christoffel symbols are rarely explicit, except
when the connection is compatible with a metric. We explain this notion of compatibility
in the next paragraph, and will give several examples of geodesics and parallel transports.

3.2.3 The Levi-Civita connection


First, given two vector fields Y, Z, recall that ⟨Y, Z⟩ is a smooth function on M , so X(⟨Y, Z⟩)
must be understood as X(f ) = df (X) for f : p 7→ ⟨Yp , Zp ⟩p .
The following is considered the fundamental theorem of Riemannian geometry. It ensures
that there exists a unique connection that is “compatible” with the metric. See Lafontaine
et al. (2004, Theorem 2.51) for a proof.

Theorem 3.2.1. Let (M, g) be a Riemannian manifold. There is a unique connection on M


that verifies for all X, Y, Z ∈ Γ(T M )

1. (Torsion-free) ∇X Y − ∇Y X = [X, Y ] (3.8)

2. (Compatibility) X(⟨Y, Z⟩) = ⟨∇X Y, Z⟩ + ⟨Y, ∇X Z⟩ (3.9)

This connection is called the Levi-Civita connection and is determined by the Koszul
formula:

2⟨∇X Y, Z⟩ = X(⟨Y, Z⟩) + Y (⟨X, Z⟩) − Z(⟨X, Y ⟩)
            − ⟨Y, [X, Z]⟩ − ⟨X, [Y, Z]⟩ − ⟨Z, [Y, X]⟩. (3.10)

The notion of compatibility is thus detailed in Equations (3.8) and (3.9). The former,
called the zero-torsion condition, is quite general and ensures uniqueness of the Levi-Civita
connection. The latter can be understood as a Leibniz rule where ⟨Y, Z⟩ is seen as a
product and the derivative is ∇X . More precisely, this condition means that the metric is

parallel with respect to the connection. Indeed, although we introduced a connection as a
map on vector fields, it can be extended to tensors of any order. For a 0-order tensor, i.e. a
smooth function f , we have ∇X f = X(f ), and for a (2, 0)-tensor such as g, we have (by
generalising the Leibniz rule) for any X, Y, Z ∈ Γ(T M )

(∇X g)(Y, Z) = ∇X (g(Y, Z)) − g(∇X Y, Z) − g(Y, ∇X Z)
             = X(⟨Y, Z⟩) − ⟨∇X Y, Z⟩ − ⟨Y, ∇X Z⟩. (3.11)

Therefore the combination of Equations (3.9) and (3.11) results in ∇g = 0.


In the general case, the Levi-Civita connection is characterized locally by its Christoffel
symbols. Let (x1 , . . . , xn ) be a local coordinate chart; by definition

∇∂i ∂j = Γkij ∂k

The torsion-free condition (3.8) and Schwarz's theorem ([∂i , ∂j ] = 0) thus imply the symmetry
of the symbols: Γkij = Γkji for all 1 ≤ i, j, k ≤ n. Furthermore, together with (3.9), Koszul
formula (3.10) translates into

2g(∇∂i ∂j , ∂k ) = ∂i gjk + ∂j gki − ∂k gij .

Writing (g ij )ij = ((gij )ij )−1 for the inverse of the metric matrix, we obtain

Γkij = (1/2) g lk (∂i gjl + ∂j gli − ∂l gij )   (3.12)
Thus the Christoffel symbols can be computed from the metric g. This formula is rarely
used by mathematicians for computations “by hand”, but we shall use it in geomstats to
implement pullback metrics.
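As an illustration of formula (3.12), here is a self-contained sketch (plain NumPy with central finite differences, rather than the automatic differentiation used in geomstats; the function names are ours) that recovers the classical Christoffel symbols of the hyperbolic half-plane metric g = diag(1/y², 1/y²), for which Γxxy = −1/y, Γyxx = 1/y and Γyyy = −1/y.

```python
import numpy as np

def metric_half_plane(p):
    """Matrix of the hyperbolic metric g = (dx^2 + dy^2) / y^2."""
    return np.eye(2) / p[1] ** 2

def christoffels(metric, point, eps=1e-6):
    """Compute Gamma^k_ij = 1/2 g^{lk} (d_i g_jl + d_j g_li - d_l g_ij)
    at `point` using central finite differences of the metric matrix."""
    dim = len(point)
    dg = np.zeros((dim, dim, dim))   # dg[l, i, j] = d_l g_ij
    for l in range(dim):
        h = np.zeros(dim)
        h[l] = eps
        dg[l] = (metric(point + h) - metric(point - h)) / (2 * eps)
    g_inv = np.linalg.inv(metric(point))
    gamma = np.zeros((dim, dim, dim))  # gamma[k, i, j] = Gamma^k_ij
    for k in range(dim):
        for i in range(dim):
            for j in range(dim):
                gamma[k, i, j] = 0.5 * sum(
                    g_inv[l, k] * (dg[i, j, l] + dg[j, l, i] - dg[l, i, j])
                    for l in range(dim))
    return gamma

gamma = christoffels(metric_half_plane, np.array([0.0, 2.0]))
```

At the point (0, 2), the nonzero symbols should be close to ±1/y = ±0.5.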
The following characterization allows to compute the Levi-Civita connection in the case
of embedded manifolds equipped with the embedding metric, as is the case in many of our
examples. See Lafontaine et al. (2004, Proposition 2.56) for a proof.

Proposition 3.2.2. Let M be a Riemannian manifold and N a submanifold of M endowed
with the induced metric (i.e., the restriction of the embedding metric), and let ∇M and ∇N
be the Levi-Civita connections on M and N respectively. Then for any vector fields
X, Y ∈ Γ(T N ), we have

∇N X Y = (∇M X Y )∥ ,   (3.13)

where (v)∥ denotes the orthogonal projection of any v ∈ Tp M onto Tp N , for any p ∈ N .

This proposition is particularly useful for embedded manifolds of RN endowed with
the ambient Euclidean metric, as it is straightforward to see (by uniqueness) that the
Levi-Civita connection of RN coincides with the directional derivative of functions from RN to RN .

Therefore, one only needs to compute the projection to the tangent spaces of any point of
an embedded manifold to compute its connection.
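As a minimal standalone check of this projection viewpoint (NumPy, with a hypothetical helper name), the great circle of the sphere has ambient acceleration γ̈ = −‖v‖²γ, and projecting it onto the tangent space at γ(t) gives zero covariant acceleration:

```python
import numpy as np

def to_tangent_sphere(vec, point):
    """Orthogonal projection of an ambient vector onto the tangent
    space of the unit sphere at `point`."""
    return vec - (vec @ point) * point

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 2.0, 1.0])          # tangent vector at x
speed = np.linalg.norm(v)
t = 0.3
gamma = np.cos(t * speed) * x + np.sin(t * speed) * v / speed
acc = -speed ** 2 * gamma              # ambient second derivative of the great circle
cov_acc = to_tangent_sphere(acc, gamma)
```

The projected acceleration vanishes, confirming that great circles are geodesics of the induced metric.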
Finally, thanks to the compatibility of the metric with the connection, the parallel
transport preserves the metric, as stated in the following proposition (Lafontaine et al.,
2004, Proposition 2.74).

Proposition 3.2.3. Let γ : I → M be a smooth curve, and s, t ∈ I. Then the parallel
transport map Πtγ,s : Tγ(s) M → Tγ(t) M along γ for the Levi-Civita connection is an
isometry, i.e.

∀v, w ∈ Tγ(s) M, ⟨Πtγ,s v, Πtγ,s w⟩γ(t) = ⟨v, w⟩γ(s) .

This justifies the use of parallel transport in statistical procedures to move data from
one reference point to another while preserving distances.
Moreover, the compatibility with the metric also ensures that isometries preserve the
Levi-Civita connection and its geodesics. The following can be deduced from the Koszul
formula (3.10).

Proposition 3.2.4. Let (M, g) be a Riemannian manifold and ∇ its Levi-Civita connection.
Let f ∈ Isom(M ) be an isometry of (M, g), then for any X, Y ∈ Γ(T M ) and x ∈ M , we
have

df ∇X Y = ∇df X df Y (3.14)
Expf (x) ◦ dfx = f ◦ Expx . (3.15)

We now come to the definition of distance and its link with geodesics.

3.3 Distance and Geodesics


3.3.1 Injectivity and parametrization
Let (M, g) be a Riemannian manifold. As the Levi-Civita connection is uniquely defined, we call
geodesic of (M, g) any geodesic of its Levi-Civita connection. Similarly, we define:

Definition 3.3.1 (Exp and Log maps). We call Riemannian exponential the exponential map
of the Levi-Civita connection, and define its injectivity radius injM (x) at x as the greatest
ε > 0 such that Expx is a diffeomorphism on the open ball of radius ε of Tx M .
The following properties of the exponential map can be proved (Lafontaine et al., 2004,
Proposition 2.88).

Proposition 3.3.1. Let (M, g) be a Riemannian manifold and x ∈ M .

• The differential of Expx at 0 is the identity.

• (x, v) 7→ (x, Expx (v)) is a smooth diffeomorphism from an open neighborhood of
the null section (i.e. “M × {0}”) in T M to an open neighborhood of the diagonal of
M × M.

The open ball of radius injM (x) on which the Exp map is a diffeomorphism can
be maximally extended into a larger star-shaped open set called the injectivity domain
Inj(x) ⊆ Tx M . The Riemannian logarithm is defined as the inverse of the Exp map on this
injectivity domain. The injectivity radius of M , injM , is then the smallest of all injectivity
radii at x for x ∈ M . It may not be finite, as in e.g. the hyperbolic space.

By Equation (3.9), the squared velocity E(t) = (1/2)‖γ̇(t)‖2 of any geodesic γ is constant
in time:

d/dt ⟨γ̇(t), γ̇(t)⟩ = 2⟨∇γ̇(t) γ̇(t), γ̇(t)⟩ = 0
We say that a curve γ is parametrized with constant speed if the function t 7→ kγ̇(t)kγ(t) is
constant. The above equation thus shows that geodesics are parametrized with constant
speed.
Furthermore, if N ⊂ M is an embedded manifold, (3.13) implies that the geodesics of
N are the curves in M whose acceleration is normal to N .

Example 3.6: Geodesics of the hypersphere


For any x ∈ S d ⊂ Rd+1 and v ∈ Tx S d , define the curve on S d that parametrizes
the great circle

γ : t 7→ cos(t‖v‖)x + sin(t‖v‖) v/‖v‖.   (3.16)

Then, it is clear that the acceleration γ̈(t) = −‖v‖2 γ(t) is normal to S d . By uniqueness
this curve is the geodesic from x with initial velocity v.
The Exp map at x, v 7→ cos(‖v‖)x + sin(‖v‖) v/‖v‖, is thus well defined and is a
diffeomorphism from the open ball of radius π to S d \ {−x}, i.e. onto the entire sphere
except the antipodal point of x. Its inverse is defined for any y ∈ S d \ {x, −x} by

Logx (y) = arccos(⟨y, x⟩) (y − ⟨y, x⟩x) / ‖y − ⟨y, x⟩x‖.   (3.17)
We implement the Riemannian Exp and Log maps in the HypersphereMetric.

Implementation trick 3.1. To improve numerical stability around 0, we use a Taylor


approximation of the sinc and cosine functions for inputs smaller than 10−6 . As both
functions are even, the squared norm of v is computed and taking the square-root is
not necessary.
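The following is a standalone sketch of both maps and of the stabilization trick (plain NumPy; the actual HypersphereMetric is vectorized and backend-agnostic, so the code differs): for small squared norms, the Taylor expansions cos(a) ≈ 1 − a²/2 and sin(a)/a ≈ 1 − a²/6 are used, so no division by a vanishing norm is needed.

```python
import numpy as np

EPS = 1e-6

def sphere_exp(v, x):
    """Exp map of the unit sphere at x, stabilized near v = 0."""
    sq_norm = v @ v
    if sq_norm < EPS ** 2:
        # Taylor expansions of cos and sinc in the squared norm
        coef_cos = 1.0 - sq_norm / 2.0
        coef_sinc = 1.0 - sq_norm / 6.0
    else:
        norm = np.sqrt(sq_norm)
        coef_cos = np.cos(norm)
        coef_sinc = np.sin(norm) / norm
    return coef_cos * x + coef_sinc * v

def sphere_log(y, x):
    """Log map of the unit sphere at x, for y not antipodal to x."""
    cos_angle = np.clip(x @ y, -1.0, 1.0)
    w = y - cos_angle * x                 # unnormalized tangent direction
    norm_w = np.linalg.norm(w)
    if norm_w < EPS:
        return w                          # y is numerically equal to x
    return np.arccos(cos_angle) * w / norm_w

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.5, 0.0])
y = sphere_exp(v, x)
```

The roundtrip Log(Exp(v)) recovers v, and the Taylor branch keeps the output on the sphere even for vanishingly small inputs.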

For a C 1 -curve γ : [a, b] → M , we now define its length L and energy E (also called
action integral in physics) by:
L(γ) = ∫_a^b ‖γ̇(t)‖ dt,    E(γ) = ∫_a^b ‖γ̇(t)‖2 dt

Note that the length does not depend on the parametrization of the curve, while the energy
does. Indeed, moving along a path from a to b does not require the same energy if the
speed is increased, but the length remains the same. This finally leads to the definition of
distance.

3.3.2 Distance and completeness


Definition 3.3.2 (Riemannian distance). Let (M, g) be a Riemannian manifold, and x, y ∈ M .
The Riemannian distance between x and y is the lower bound of the lengths of all piecewise
smooth curves joining x to y:

d(x, y) = inf{L(γ) | γ : [0, 1] → M piecewise C 1 , γ(0) = x, γ(1) = y}.

We say that γ is minimizing if L(γ) = d(x, y).

If M is connected∗ , this distance function is indeed a distance, and the induced topology
coincides with that of M (Paulin, 2014, Proposition 3.13). We say that a curve γ : [a, b] → M
is parametrized by (or resp. proportional to) arc-length if L(γ) = b − a (resp. ∃λ > 0, L(γ) =
λ(b − a)). We now see that the geodesics of (M, g) are the minimizing curves, and further
minimize the total energy (Paulin, 2014, Proposition 3.14).

Theorem 3.3.1. Let (M, g) be a Riemannian manifold, and γ : [a, b] → M a C 1 curve. The
following assertions are equivalent

• γ is a geodesic;

• γ is parametrized with constant velocity and locally minimizing;

• γ is locally energy minimizing.

In particular, as for any (x, v) ∈ T M , the curve γ : t ∈ Inj(x) 7→ Expx (tv) is a geodesic,
it is locally length-minimizing, so that for any y ∈ Expx (Inj(x)), d(x, y) = L(γ) = ‖v‖. By
definition, v = Logx (y), so that

d(x, y) = ‖Logx (y)‖.

We add this as a method to the RiemannianMetric class of geomstats.


Defined in Appendix A.

def dist(self, point_a, point_b):
    """Riemannian distance between two points."""
    log = self.log(point_b, base_point=point_a)
    return self.norm(log, base_point=point_a)

Furthermore, the last point of Theorem 3.3.1 is a variational principle, i.e. the minimization
of a function on the space of paths.
Recall from Definition 3.2.4 that a Riemannian manifold is called geodesically complete
if the Exp map is defined on the whole of T M , meaning that geodesics are defined on all of R.
On the other hand, the more usual notion of completeness of a metric space is defined as
follows:
Definition 3.3.3 (Complete metric space). A metric space is called complete if every Cauchy
sequence converges in that space.

Example 3.7: Completeness of the half-plane


Let the half-plane P = {(x, y) ∈ R2 | y > 0} be equipped with the canonical
Euclidean metric of R2 . Then P is obviously not geodesically complete, as geodesics
are straight lines exiting P .
Now define the metric g = (dx2 + dy 2 )/y 2 . (P, g) is now a geodesically complete
Riemannian manifold.
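The completeness of (P, g) can be observed numerically. The sketch below (standalone NumPy, with the half-plane's classical Christoffel symbols Γxxy = −1/y, Γyxx = 1/y, Γyyy = −1/y hard-coded; these values are assumed, not derived here) integrates the geodesic equation (3.7): the geodesic approaches the boundary y = 0 without ever reaching it, and its hyperbolic speed ‖γ̇‖/y stays constant.

```python
import numpy as np

def geodesic_rhs(state):
    """Right-hand side of the geodesic equation of the hyperbolic half-plane."""
    (x, y), (vx, vy) = state
    # gamma''^k = -Gamma^k_ij gamma'^i gamma'^j with the half-plane symbols
    ax = 2.0 * vx * vy / y
    ay = (vy ** 2 - vx ** 2) / y
    return np.array([[vx, vy], [ax, ay]])

# Start at (0, 1) with unit hyperbolic speed, heading right.
state = np.array([[0.0, 1.0], [1.0, 0.0]])
dt, n_steps = 1e-4, 20000
ys, speeds = [], []
for _ in range(n_steps):
    state = state + dt * geodesic_rhs(state)   # Euler step
    (x, y), (vx, vy) = state
    ys.append(y)
    speeds.append(np.hypot(vx, vy) / y)        # hyperbolic speed
```

The trajectory traces (approximately) the unit semicircle x² + y² = 1, the classical geodesic of the hyperbolic half-plane, and stays in P for all integration times.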

The following theorem gives a sufficient condition for Riemannian manifolds to share
the same nice properties as complete metric spaces.
Theorem 3.3.2 (Hopf-Rinow). Let (M, g) be a Riemannian manifold. If M is geodesically
complete, then any two points of M can be joined by a minimizing geodesic.
Corollary 3.3.3. Let (M, g) be a connected Riemannian manifold, and d the Riemannian
distance. The following are equivalent
• (M, g) is geodesically complete;

• every closed and bounded subset of (M, d) is compact;

• the metric space (M, d) is complete (as a metric space).


Remark 3.3.1 (Infinite dimension). The Hopf-Rinow theorem is not true in infinite dimension.
Counter-examples have been constructed and exhibit manifolds that are complete as metric
spaces but for which points exist that cannot be joined by a minimizing geodesic, or even
by any geodesic at all. For more details, see Atkin (1975).
In the previous example of the hypersphere, geodesics are great circles and are known
in closed-form. To finish this subsection, we detail our implementation of the Riemannian
distance when no closed-form solution is available for the geodesics.

3.3.3 Numerical approximations in geomstats
In the most general case, recall that geodesics are defined by an ODE expressed either in a
chart or in the ambient space (e.g. Equation (3.7)). Discrete integration methods can thus
be used to approximate the exponential map, and optimization algorithms to compute the
logarithm. Inspired by Kühnel et al. (2019), we implemented those methods in geomstats.
Consider the geodesic equation as a coupled system of first-order ODEs:

v(t) = γ̇(t),   v̇(t) = f (v(t), γ(t), t)   (3.18)

where f is a smooth function. A method of the class Connection, called geodesic_equation,
computes the right-hand side of (3.18), where the state variable is (γ(t), γ̇(t)):
def geodesic_equation(self, state, time):
    """Return the right-hand side of the geodesic equation."""
    position, velocity = state
    gamma = self.christoffels(position)
    equation = -gs.einsum('...kij,...i,...j->...k', gamma, velocity, velocity)
    return gs.stack([velocity, equation])

Given initial conditions, a first-order forward Euler scheme or a higher-order Runge-Kutta
method can be used to integrate this system.
STEP_FUNCTIONS = {'euler': 'euler_step', 'rk2': 'rk2_step'}

def euler_step(force, state, time, dt):
    """Compute one step of the Euler approximation."""
    derivatives = force(state, time)
    new_state = state + derivatives * dt
    return new_state

def rk2_step(force, state, time, dt):
    """Compute one step of the rk2 approximation."""
    k1 = force(state, time)
    k2 = force(state + dt / 2 * k1, time + dt / 2)
    new_state = state + dt * k2
    return new_state

def integrate(function, initial_state, end_time=1.0, n_steps=10, step='euler'):
    """Compute the flow of a vector field."""
    dt = end_time / n_steps
    states = [initial_state]
    step_function = globals()[STEP_FUNCTIONS[step]]

    current_state = initial_state
    for i in range(n_steps):
        current_state = step_function(
            state=current_state, force=function, time=i * dt, dt=dt)
        states.append(current_state)
    return states

We can thus add the following method to the class Connection of geomstats to compute
the exponential map

def exp(self, tangent_vec, base_point, n_steps=10, step='euler', **kwargs):
    """Exponential map associated to the affine connection."""
    initial_state = gs.stack([base_point, tangent_vec])
    flow = integrate(
        self.geodesic_equation, initial_state, n_steps=n_steps, step=step)
    exp, velocity = flow[-1]
    return exp
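As a sanity check of this integration approach (a standalone NumPy transcription mirroring the rk2 scheme above, not the geomstats code itself), one can integrate the geodesic equation of the sphere written in the embedding, γ̈ = −‖γ̇‖²γ, and compare the result at t = 1 with the closed-form great circle of Example 3.6.

```python
import numpy as np

def force(state, time):
    """Embedded geodesic equation of the sphere: acceleration -|v|^2 position."""
    position, velocity = state
    return np.stack([velocity, -(velocity @ velocity) * position])

def rk2_step(force, state, time, dt):
    """One step of the second-order Runge-Kutta (midpoint) scheme."""
    k1 = force(state, time)
    k2 = force(state + dt / 2 * k1, time + dt / 2)
    return state + dt * k2

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.2, 0.0])
state = np.stack([x, v])
n_steps = 1000
dt = 1.0 / n_steps
for i in range(n_steps):
    state = rk2_step(force, state, i * dt, dt)
end_point = state[0]
```

The endpoint should agree with the closed form cos(1.2)x + sin(1.2)v/1.2 up to the O(dt²) integration error.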

On the other hand, the Log map corresponds to a boundary value problem (BVP), and an
optimization procedure is required. We choose the geodesic shooting method, which solves the
following problem (P), corresponding to an energy minimization. Working in a convex
neighborhood, it admits a unique minimizer v ∗ :

min_{v ∈ Tx M} d2 (Expx (v), y)   (P)

As all norms are equivalent in a coordinate chart or in the embedding space, we use the
appropriate Euclidean metric (written ‖·‖2 ) for practical purposes and minimize instead

min_{v ∈ Tx M} ‖ Expx (v) − y‖22   (P′)

The problem is then solved by gradient descent until a convergence tolerance ε is reached.
We use the minimize function from scipy.optimize, and automatic differentiation to
compute the gradient of the exponential map.

def log(self, point, base_point, n_steps=N_STEPS, step='euler',
        max_iter=25, verbose=False, tol=gs.atol):
    """Compute the logarithm map associated to the affine connection."""
    max_shape = point.shape if point.ndim > base_point.ndim \
        else base_point.shape

    def objective(velocity):
        """Define the objective function."""
        velocity = gs.array(velocity, dtype=base_point.dtype)
        velocity = gs.reshape(velocity, max_shape)
        delta = self.exp(velocity, base_point, n_steps, step) - point
        return gs.sum(delta ** 2)

    objective_with_grad = gs.autograd.value_and_grad(objective)
    tangent_vec = gs.flatten(gs.random.rand(*max_shape))
    res = minimize(
        objective_with_grad, tangent_vec, method='L-BFGS-B', jac=True,
        options={'disp': verbose, 'maxiter': max_iter}, tol=tol)

    tangent_vec = gs.array(res.x, dtype=base_point.dtype)
    tangent_vec = gs.reshape(tangent_vec, max_shape)
    return tangent_vec

3.4 Curvature
3.4.1 Definition and Properties
In a Euclidean space, a constant field is parallel along any curve. In a Riemannian manifold
in general, parallel transport depends on the curve followed, and there may not exist
fields that are parallel along all curves, not even locally. One can investigate the effect
of parallel transport along small closed curves. Consider a point x ∈ M and the closed
curves whose tangent velocities at that point span a subspace of dimension two. They
introduce a deviation of parallel transport from the identity map of the tangent space at
that point, and this deviation can be shown to depend only on a basis of this plane. It
stems from the failure of covariant derivatives to commute, i.e. the difference between
evaluating ∇X after ∇Y and vice-versa. This difference is not sufficient however to define a
tensor. The following lemma gives a sufficient condition for a map to define a tensor on a
manifold (Lafontaine et al., 2004, Proposition 1.114).

Lemma 3.4.1 (Tensoriality). Let p ∈ N and A : Γ(T M )p → Γ(T M ) be a C ∞ (M )-
multilinear map, i.e. ∀f1 , . . . , fp ∈ C ∞ (M ), ∀X1 , . . . , Xp ∈ Γ(T M ), A(f1 X1 , . . . , fp Xp ) =
f1 . . . fp A(X1 , . . . , Xp ). Then for any x ∈ M , the value of A(X1 , . . . , Xp ) at x only depends
on the value of the Xi ’s at x.

Simple computations using the Leibniz rule from the definition of a connection show
that the following defines a C ∞ -multilinear map (see Lee (2018, Proposition 7.3) for the
computations).

Definition 3.4.1 (Curvature tensor). Let (M, ∇) be a manifold equipped with a connection.
The curvature tensor of ∇ is defined as the map from Γ(T M )3 to Γ(T M ) by

R(X, Y )Z = ∇X ∇Y Z − ∇Y ∇X Z − ∇[X,Y ] Z (3.19)

The curvature tensor of a Riemannian manifold (M, g) is the curvature of its Levi-Civita
connection.

In the above definition, the arguments X, Y, Z are vector fields, but the tensoriality
Lemma 3.4.1 allows to write at any x ∈ M , Rx as a map defined on tangent vectors
u, v, w ∈ Tx M .

Example 3.8: Curvature of the Euclidean space


In a Euclidean space Rd with the canonical inner-product, the connection coincides
with the directional derivative, which commutes. Therefore R = 0.

The following properties help cope with the complexity of this tensor. Proofs can be
found in Lafontaine et al. (2004, Proposition 3.5), Gallier and Quaintance (2020, Proposition
16.1, 16.3).

Proposition 3.4.1. Let (M, g) be a Riemannian manifold and R its curvature tensor. The
following properties hold

1. (Skew-symmetry) The map (X, Y ) ∈ Γ(T M )2 7→ R(X, Y ) is skew-symmetric.

2. (Bianchi’s identity) For any X, Y, Z ∈ Γ(T M ), we have


R(X, Y )Z + R(Y, Z)X + R(Z, X)Y = 0.

3. (Bianchi’s second identity) For any X, Y, Z ∈ Γ(T M ), we have


∇X R(Y, Z) + ∇Y R(Z, X) + ∇Z R(X, Y ) = 0.

4. For any isometry f ∈ Isom(M ) and X, Y, Z ∈ Γ(T M ),


R(df X, df Y )(df Z) = df R(X, Y )Z


5. For all X, Y ∈ Γ(T M ), the field of linear maps R(X, Y ) is skew-symmetric with
respect to the metric g, i.e. ∀W, Z ∈ Γ(T M )
g(R(X, Y )Z, W ) = −g(Z, R(X, Y )W )

6. For all X, Y, Z, W ∈ Γ(T M ), we have


g(R(X, Y )Z, W ) = g(R(Z, W )X, Y ) = g(R(W, Z)Y, X)

Note that the compatibility of the metric with the connection (Equations (3.8) and (3.9)) is
required for the last two assertions.

3.4.2 Sectional curvature


The previous proposition allows to define the sectional curvature for any x ∈ M and
u, v ∈ Tx M such that u, v are not collinear:

κx (u, v) = ⟨Rx (u, v)v, u⟩ / (‖u‖2 ‖v‖2 − ⟨u, v⟩2 )   (3.20)
From the above properties, the value of κx in fact only depends on the plane spanned by
(u, v), i.e. for any non-vanishing linear combination αu + βv, κ(αu + βv, v) = κ(u, v). In
fact, this scalar quantity is enough to characterize the curvature of M . Indeed, one can
prove from algebraic computations only (see e.g. Kobayashi and Nomizu (1996a, Chapter
V, Proposition 1.2)) the following
Lemma 3.4.2. Let A, B be two quadrilinear mappings on a vector space V that both
verify, for all u, v, w, z ∈ V ,

A(u, v, w, z) = −A(v, u, w, z) = −A(u, v, z, w),
A(u, v, w, z) + A(u, w, z, v) + A(u, z, v, w) = 0,

and coincide for any two variables u, v ∈ V in the sense A(u, v, u, v) = B(u, v, u, v). Then
A = B.

This applies to the map (u, v, w, z) 7→ g(R(w, z)v, u) and one can thus show the following
theorem (Lafontaine et al., 2004, Theorem 3.8).

Theorem 3.4.3. The sectional curvature determines the curvature tensor.

In particular, if for every x ∈ M , κ has a constant value κx on every plane of Tx M ,
then we can compute the curvature up to this constant. Indeed from the definition (3.20)
we have for any u, v ∈ Tx M

⟨R(u, v)v, u⟩ = κx (‖u‖2 ‖v‖2 − ⟨u, v⟩2 )

Consider the maps A : (u, v, w, z) 7→ ⟨R(u, v)w, z⟩ and B defined by

B(u, v, w, z) = ⟨u, z⟩⟨v, w⟩ − ⟨u, w⟩⟨v, z⟩.

It is clear that the hypotheses of Lemma 3.4.2 are verified for A and κx B, so we conclude

⟨R(u, v)w, z⟩ = κx (⟨u, z⟩⟨v, w⟩ − ⟨u, w⟩⟨v, z⟩) . (3.21)

For example, Equation (3.21) is always valid in dimension d = 2, as there is only one plane
in each tangent space.
We now state the surprising result of F. Schur that gives a sufficient condition for
formula (3.21) to hold. A proof can be found in Kobayashi and Nomizu (1996a, Theorem
2.2).

Theorem 3.4.4 (Schur). Let (M, g) be a connected Riemannian manifold of dimension d. If


d ≥ 3 and if the sectional curvature κ does not depend on the plane but only on the point
x ∈ M , then κ is constant (i.e. does not even depend on x).

Example 3.9: Sectional curvature of S d and H+ d

Recall that the rotation group SO(d + 1) acts on the hypersphere S d , and suppose
that d ≥ 2. The stabilizer of this action on any x ∈ S d is isomorphic to SO(d), whose
action on Tx S d is transitive on 2-planes, i.e. any 2-plane can be mapped by a rotation
to any other 2-plane.
By assertion 4. of Proposition 3.4.1, the sectional curvature is preserved from one
plane to another by rotations; it is therefore constant on the whole of Tx S d , and we write this
value κx . If d ≥ 3, we can use Theorem 3.4.4 to conclude it is constant on S d . Of
course, it is also constant for d = 2, as we can use an isometry to map the curvature
tensor from one point to any other.
The same argument, using this time the group O(1, d) of isometries of the Lorentz
product, that acts transitively on the hyperboloid H+ d , shows that this space also has

constant curvature.

Figure 3.2: Drawing of the distortion of geodesics compared to Euclidean tangent vectors. Arrows represent
tangent vectors, thin black lines represent geodesics and we use the notation xv = Expx (v). Curvature
modifies the geodesic distance d compared with the distance between rv and rw in Tx M (adapted from
Paulin, 2014, Section 3.6.3).

Definition 3.4.2 (Constant curvature). Let (M, g) be a Riemannian manifold. (M, g) is said
to have constant (resp. negative, resp. positive) curvature if it has constant (resp. negative,
resp. positive) sectional curvature.
In fact, the (complete simply connected) constant curvature spaces are all isometric to
one of the Euclidean space (flat), the hypersphere (positive curvature) or the hyperbolic
space (negative curvature) (see e.g. Lafontaine et al., 2004, Theorem 3.82). These three
examples thus describe the entire class of constant curvature spaces (up to covering and
isometry).
To finish this section, we may now explain how curvature modifies the correspondence
between distances in tangent spaces and Riemannian distances on a manifold. Indeed,
let (M, g) be a Riemannian manifold and consider two orthogonal tangent vectors v, w
at some x ∈ M . In the vector space Tx M , the distance (induced by the metric at x)
between rv and rw for some r > 0 is of course r√2. Now map these vectors to the manifold
using the exponential map, then the Riemannian distance between Expx (rv) and Expx (rw)
may increase or decrease compared to the distance in the tangent space. Curvature is the
fundamental tool that allows to quantify these variations, and the sign of the sectional
curvature tells whether geodesics accumulate or grow apart (Figure 3.2).
Theorem 3.4.5. Let (M, g) be a Riemannian manifold, R its curvature tensor, κ its sectional
curvature, x ∈ M and γv , γw two geodesics starting from x with initial velocities v, w ∈ Tx M .
Then for r → 0,

d(γv (r), γw (r))2 = r2 ‖v − w‖2 − (r4 /3) ⟨R(v, w)w, v⟩ + O(r5 ). (3.22)

Moreover, if v, w are orthonormal,

d(γv (r), γw (r)) = √2 r (1 − (κ(v, w)/12) r2 + O(r4 )). (3.23)

We now understand the meaning of the sign of the sectional curvature: if κ > 0, geodesics
get closer to one another, while if κ < 0, they grow apart. Using this theorem and formulas
for geodesics on the hypersphere in Example 3.6, we can compute a Taylor expansion of
d(γv , γw ) and identify the coefficients, to find that κ = 1. Similarly, for the hyperbolic space
κ = −1.
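This identification can also be done numerically (a standalone sketch relying only on the closed-form Exp and distance of the unit sphere): solving (3.23) for the curvature gives the estimator κ ≈ 12(1 − d/(√2 r))/r², which should be close to 1 for small r.

```python
import numpy as np

def sphere_exp(v, x):
    """Closed-form Exp map of the unit sphere (Example 3.6)."""
    norm = np.linalg.norm(v)
    return np.cos(norm) * x + np.sin(norm) * v / norm

def sphere_dist(a, b):
    """Geodesic distance on the unit sphere."""
    return np.arccos(np.clip(a @ b, -1.0, 1.0))

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])   # orthonormal tangent vectors at x
w = np.array([0.0, 0.0, 1.0])
r = 1e-2
d = sphere_dist(sphere_exp(r * v, x), sphere_exp(r * w, x))
kappa_estimate = 12.0 * (1.0 - d / (np.sqrt(2.0) * r)) / r ** 2
```

The estimate converges to the constant sectional curvature κ = 1 as r → 0, with an O(r²) bias from the neglected higher-order terms.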
Finally, we state two important theorems that give a hint about how the sign of the
curvature determines the geometry. These theorems also show how curvature and topology
may be intertwined. See Lafontaine et al. (2004, Theorem 3.87) for a proof.

Theorem 3.4.6 (Cartan-Hadamard). Let (M, g) be a complete connected Riemannian man-


ifold with non-positive sectional curvature. Then the exponential map is a Riemannian
covering, i.e. a covering that is a local isometry. This means that if M is simply connected,
then M is diffeomorphic to Rd , and any two points are joined by a unique minimizing
geodesic.

Remark 3.4.1 (Infinite dimension). This theorem carries over to infinite dimensional Rie-
mannian Hilbert manifolds, i.e. whose atlases are valued in a Hilbert space (McAlpin,
1965).

On the contrary (Berger, 2003, Theorem 63):

Theorem 3.4.7 (Particular case of Bonnet-Myers theorem). Let (M, g) be a complete con-
nected Riemannian manifold with sectional curvature bounded below: κ > 1/r2 for some r > 0.
Then the diameter of M (i.e. the largest distance between points in M ) is bounded above
by πr and M is compact.

This ends our exposition of Riemannian metrics. In the next section, we focus on
a particular class of manifolds that play a fundamental role in geometry and in many
applications.

4 Lie groups
We first define the notions of Lie groups, subgroups and algebras, then the exponential map
of Lie groups, and finally introduce smooth actions of Lie groups and homogeneous spaces.
The metrics that can be defined on Lie groups, and their implementation in geomstats are
covered in Section 4.3. We restrict to finite-dimensional Lie groups and only give a few
remarks about infinite dimension.

4.1 Lie groups, Lie algebras and Lie subgroups
Definition 4.1.1 (Lie group). A Lie group is a group∗ (G, ·) such that G is also a finite
dimensional smooth manifold, and the group and differential structures are compatible, in
the sense that the group law · and the inverse map g 7→ g −1 are smooth.

We generally omit · in the notation and simply write the group composition as a
multiplication. Let e denote the neutral element, or identity of G.

Remark 4.1.1 (Infinite dimension). Infinite dimensional groups with a smooth structure
appear naturally, for example as the set of diffeomorphisms of a smooth manifolds. However,
although some properties of finite dimensional Lie groups generalize to infinite dimension,
many don’t. See Milnor (1984) for a good treatment of the topic. Unless otherwise specified,
we restrict our exposition to finite dimension.

For any g ∈ G, we define the left and right translation maps Lg and Rg by

Lg : h ∈ G 7→ gh ∈ G Rg : h ∈ G 7→ hg −1 .

By the definition of a Lie group, Lg and Rg are diffeomorphisms of G and their differentials
are linear isomorphisms of tangent spaces. This means that all the tangent spaces of G can be
identified with Te G: there is a canonical way of mapping vectors from Tg G to Te G, namely
dLg−1 for any g ∈ G.
This fact, which could be rephrased as “the tangent bundle of a Lie group is trivial”,
means that T G is diffeomorphic to the product G × Te G. It is of fundamental importance
to characterize the geometry of a Lie group, and is very handy for the implementation.
Furthermore, the maps g 7→ Lg and g 7→ Rg are group homomorphisms∗ between (G, ·) and
(Diff(G), ◦), the group of diffeomorphisms of G with the composition of maps as group law.

Definition 4.1.2 (Invariant vector field). A vector field X ∈ Γ(T G) is left-invariant if

∀g, h ∈ G, Xgh = dLg Xh .

Let L(G) denote the set of all left-invariant vector fields.

Example 4.1: General Linear group


The real general linear group GL(n) plays a fundamental role as all the groups
implemented in geomstats and encountered in the applications are subgroups of
GL(n). It is defined as the set of invertible matrices of size n

GL(n) = {A ∈ Mn (R) | det(A) ≠ 0},


Defined in Appendix A.

and as an open set of the vector space Mn (R) ≃ Rn² it is a smooth manifold.
The group law is the matrix multiplication, that can be written as a polynomial
of the matrices coefficients and is thus smooth. Thanks to Cramer’s formula A−1 =
det(A)−1 Co(A)T , where Co denotes the matrix formed by all cofactors, the inversion
map is also smooth. Therefore, GL(n) is a Lie group.

Definition 4.1.3 (Lie algebra). A real Lie algebra is a real vector space g equipped with a
bilinear map [·, ·] that verifies

• (Skew-symmetry) ∀x, y ∈ g, [x, y] = −[y, x],

• (Jacobi identity) ∀x, y, z ∈ g, [x, [y, z]] + [y, [z, x]] + [z, [x, y]] = 0.

Example 4.2: Vector fields


The space of vector fields Γ(T M ) of a manifold M equipped with the Lie bracket
of vector fields defined in Equation (2.2) page 15 is an infinite-dimensional Lie algebra.

Example 4.3: Matrix algebra

The algebra of square matrices Mn(R) equipped with the commutator

∀A, B ∈ Mn(R),  [A, B] = AB − BA

is a Lie algebra.
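Both Lie algebra axioms can be checked numerically for the commutator with plain NumPy (this snippet is an illustration, not part of geomstats):

```python
import numpy as np

def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return a @ b - b @ a

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 4, 4))

# Skew-symmetry: [x, y] = -[y, x]
assert np.allclose(bracket(x, y), -bracket(y, x))

# Jacobi identity: [x, [y, z]] + [y, [z, x]] + [z, [x, y]] = 0
jacobi = (bracket(x, bracket(y, z))
          + bracket(y, bracket(z, x))
          + bracket(z, bracket(x, y)))
assert np.allclose(jacobi, np.zeros((4, 4)))
```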

Implementation in geomstats As the Lie groups we are working with are all matrix groups, we implement an abstract class MatrixLieGroup to gather the properties of a Lie group: its identity, composition law, translation map, and exponential and logarithm maps. Moreover, it is useful to have the Lie algebra implemented as a separate class and set as an attribute of the Lie group. This class is a subclass of VectorSpace and implements a belongs and a projection method. This allows us, for example, to write the methods is_tangent and to_tangent as follows:
def tangent_translation(self, point, left_or_right='left', inverse=False):
    """Return the differential map of the right or left translation map."""
    point_ = self.inverse(point) if inverse else point
    if left_or_right == 'left':
        return lambda tan_vec: self.compose(point_, tan_vec)
    return lambda tan_vec: self.compose(tan_vec, point_)

def is_tangent(self, vector, base_point=None, atol=gs.atol):
    """Check whether a vector is tangent at base point."""
    if base_point is None:
        base_point = self.identity

    if gs.allclose(base_point, self.identity):
        tangent_vec_at_id = vector
    else:
        tangent_vec_at_id = self.tangent_translation(
            base_point, inverse=True)(vector)
    return self.lie_algebra.belongs(tangent_vec_at_id, atol)

def to_tangent(self, vector, base_point=None):
    """Project a vector onto the tangent space at a base point."""
    if base_point is None:
        return self.lie_algebra.projection(vector)
    tangent_vec_at_id = self.tangent_translation(
        base_point, inverse=True)(vector)
    projected = self.lie_algebra.projection(tangent_vec_at_id)
    return self.tangent_translation(base_point)(projected)
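The logic of is_tangent can be illustrated in plain NumPy for the orthogonal group, whose Lie algebra consists of the skew-symmetric matrices (a sketch of the idea, not geomstats code):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
S = (A - A.T) / 2                       # skew-symmetric: an element of the Lie algebra
rot, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # an orthogonal base point

# A vector v is tangent at rot iff its left-translation back to the
# identity, rot^{-1} v, lies in the Lie algebra (here: is skew).
v = rot @ S                             # left-translate S to the tangent space at rot
back = np.linalg.inv(rot) @ v           # tangent_translation(rot, inverse=True)(v)
assert np.allclose(back, -back.T)       # belongs to skew(3): v is tangent at rot
```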

Example 4.4: Implementation of GL(n)

The general linear group (Example 4.1) is the archetype of a matrix Lie group, and is thus created as a subclass of MatrixLieGroup. As it is defined as an open set of the matrix space, it also inherits from OpenSet.

class GeneralLinear(MatrixLieGroup, OpenSet):
    """Class for the general linear group GL(n)."""

    def __init__(self, n, **kwargs):
        if 'dim' not in kwargs.keys():
            kwargs['dim'] = n ** 2
        super().__init__(
            ambient_space=Matrices(n, n), n=n, **kwargs)

    def belongs(self, point, atol=gs.atol):
        """Check if a matrix is invertible and of size n."""
        has_right_size = self.ambient_space.belongs(point)
        if gs.all(has_right_size):
            det = gs.linalg.det(point)
            return gs.abs(det) > atol
        return has_right_size

Again, this is an open set, so the projection performs a regularization of non-invertible matrices by adding a small multiple of the identity In:

def projection(self, point):
    """Project a matrix to the general linear group."""
    belongs = self.belongs(point)
    regularization = gs.einsum(
        '...,ij->...ij', gs.where(~belongs, gs.atol, 0.), self.identity)
    projected = point + regularization
    return projected

The set of left-invariant vector fields is fundamental when studying Lie groups: it is closed under the Lie bracket and boils down to the tangent space at the identity, as stated by the following theorem (Lee, 2003, Proposition 8.33 and Theorem 8.37).
Theorem 4.1.1 (Left-invariant vector fields). Let G be a Lie group of finite dimension d.

(1) The space of left-invariant vector fields L(G) of a Lie group G is a sub-algebra of Γ(T G). This means that the bracket of two left-invariant vector fields is also left-invariant.

(2) As a vector space, L(G) is isomorphic to Te G, the tangent space at the identity of G. This implies that L(G) is finite-dimensional.

Indeed, the map X ∈ L(G) ↦ Xe ∈ Te G is linear and has inverse x ↦ x̃, where we define x̃ to be the left-invariant vector field

x̃ : g ∈ G ↦ dLg x ∈ Tg G.

We can thus define the bracket on Te G by [x, y] ≜ [x̃, ỹ]e, which turns Te G into a Lie algebra that is isomorphic (as a Lie algebra) to L(G). We call Te G the Lie algebra of G and denote it g = Te G.
Thankfully, a linear representation of G allows one to compute the Lie bracket without handling vector fields. Indeed, define the conjugation map of G, for any g ∈ G, by Cg : h ↦ ghg^{-1}, and its differential at the identity, called the adjoint representation of G:

ADg : g → g.

Then the differential of g ↦ ADg, written ad, is called the adjoint representation of g, and one can show that it coincides with the Lie bracket of g:

adx(y) = [x, y].

Example 4.5: General linear Lie algebra

Recall from Example 4.1 that GL(n) is a Lie group, whose differentiable structure comes from its embedding in Mn(R). Thus, by (1) of Theorem 2.1.2 (page 12), its tangent space at the identity is the entire matrix space, and as the Lie algebra of GL(n) it is written gl(n). We now compute explicitly AD and ad, and verify that the bracket of gl(n) coincides with the commutator defined in Example 4.3.

The conjugation map is the restriction to GL(n) of the linear map h ↦ ghg^{-1} of Mn(R), for any g ∈ GL(n). Its differential is thus, for any X ∈ Mn(R),

ADg(X) = gXg^{-1}.

Now consider a curve c : (−ε, ε) → GL(n) such that c(0) = In and c′(0) = X ∈ Mn(R). For any Y ∈ Mn(R) we have

adX Y = d(g ↦ ADg(Y))In (X) = d/dt|_{t=0} (c(t) Y c(t)^{-1}) = XY − YX.
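This computation can be verified numerically: differentiating AD along a first-order curve through the identity recovers the commutator (a NumPy sketch; the step size eps is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
X = 0.1 * rng.standard_normal((3, 3))
Y = rng.standard_normal((3, 3))

def AD(g, y):
    """Adjoint representation of GL(n): AD_g(y) = g y g^{-1}."""
    return g @ y @ np.linalg.inv(g)

eps = 1e-6
c = np.eye(3) + eps * X           # curve with c(0) = I and c'(0) = X, to first order
ad_numeric = (AD(c, Y) - Y) / eps  # finite-difference approximation of d/dt AD_{c(t)}(Y)
ad_exact = X @ Y - Y @ X           # the commutator [X, Y]

assert np.allclose(ad_numeric, ad_exact, atol=1e-4)
```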

To summarize,
• The Lie algebra of a Lie group is its tangent space at identity Te G or equivalently,
the algebra of left-invariant vector fields.

• The Lie bracket on Te G is defined via the adjoint representation and coincides with
that of left-invariant vector fields.

• In the case of linear Lie algebras (i.e. subalgebras of Mn (R)), the Lie bracket coincides
with the matrix commutator.
Remark 4.1.2. Note that right-invariant vector fields can be defined analogously to left-invariant vector fields, and their set also forms a Lie algebra. However, the Lie bracket of right-invariant vector fields coincides with the opposite of the adjoint representation. For practical and historical reasons, right-invariant fields are used in infinite dimension while left-invariant fields are used in finite dimension.
As all the next examples will be subgroups of the general linear group, we state a result
by von Neumann and Cartan known as the closed-subgroup theorem that gives a necessary
and sufficient condition for a subgroup of a Lie group G to be an embedded Lie subgroup, i.e.
a Lie group with differential structure agreeing with that of G, and such that the inclusion
map is smooth (Cartan, 1930, Section III, paragraph 26). See Lee (2003, Theorem 20.12)
for a proof.
Theorem 4.1.2 (Cartan-von Neumann). Let G be a Lie group. Any closed subgroup H ⊂ G of G is a Lie subgroup of G. Conversely, any Lie subgroup of G is closed.
One can also show that the Lie algebra of a Lie subgroup H of G is a Lie subalgebra of the Lie algebra g of G. Thus, they share the same bracket. Therefore, all Lie algebras considered in geomstats use the matrix commutator as bracket.

Example 4.6: Implementation of SO(n)

Recall that the special orthogonal group is the set of orthogonal matrices with positive determinant. It is clear that it is stable by multiplication, hence it is a subgroup of GL(n).

Furthermore, SO(n) = f^{-1}(In) ∩ det^{-1}(1), where f : A ↦ A^T A and det are continuous maps. SO(n) is thus closed in Mn(R), and is hence a Lie subgroup of GL(n). Moreover, from Example 2.11 (page 13), its Lie algebra is

so(n) ≜ TIn SO(n) = {A ∈ Mn(R) | A^T + A = 0} = Skew(n).

In geomstats, SO(n) is implemented as a subclass of the MatrixLieGroup class and, as it is embedded in GL+(n) (the subgroup of GL(n) formed by matrices with positive determinant), it also inherits from the LevelSet class. This allows it to inherit the composition method, but the inverse method is overridden by the transposition.

class SpecialOrthogonalMatrices(MatrixLieGroup, LevelSet):
    """Class for special orthogonal group."""

    def __init__(self, n):
        matrices = Matrices(n, n)
        gln = GeneralLinear(n, positive_det=True)
        super().__init__(
            dim=int((n * (n - 1)) / 2), n=n, value=gs.eye(n),
            lie_algebra=SkewSymmetricMatrices(n=n), embedding_space=gln,
            submersion=lambda x: matrices.mul(matrices.transpose(x), x),
            tangent_submersion=lambda v, x: 2 * matrices.to_symmetric(
                matrices.mul(matrices.transpose(x), v)))
        self.bi_invariant_metric = BiInvariantMetric(group=self)
        self.metric = self.bi_invariant_metric

    @classmethod
    def inverse(cls, point):
        """Return the transpose matrix of point."""
        return cls.transpose(point)

    def projection(self, point):
        """Project a matrix on SO(n) by minimizing the Frobenius norm."""
        aux_mat = self.submersion(point)
        inv_sqrt_mat = SymmetricMatrices.powerm(aux_mat, - 1 / 2)
        rotation_mat = Matrices.mul(point, inv_sqrt_mat)
        det = gs.linalg.det(rotation_mat)
        return utils.flip_determinant(rotation_mat, det)

And the Lie algebra is implemented as follows.

class SkewSymmetricMatrices(MatrixLieAlgebra):
    """Class for skew-symmetric matrices."""

    def __init__(self, n):
        dim = int(n * (n - 1) / 2)
        super().__init__(dim, n)
        self.ambient_space = Matrices(n, n)

    def belongs(self, mat, atol=gs.atol):
        """Evaluate if mat is a skew-symmetric matrix."""
        has_right_shape = self.ambient_space.belongs(mat)
        if has_right_shape:
            return Matrices.equal(mat, - Matrices.transpose(mat), atol=atol)
        return False

    @classmethod
    def projection(cls, mat):
        """Compute the skew-symmetric component of a matrix."""
        return 1 / 2 * (mat - Matrices.transpose(mat))
Example 4.7: Implementation of SE(n)

We now introduce a group that is ubiquitous in applications. This group is defined as the set of direct isometries - or rigid-body transformations - of Rn, i.e. the transformations of the affine space Rn that preserve its canonical distance and orientation. Such a transformation ρ can be decomposed into a rotation part and a translation part: ρ(x) = Rx + u, where R ∈ SO(n) and x, u ∈ Rn. Define

SE(n) = {(R, u) | R ∈ SO(n), u ∈ Rn}.

Now, the composition of two isometries ρ = (R, u), ρ′ = (R′, u′) remains an isometry:

ρ ◦ ρ′(x) = RR′x + Ru′ + u,

and can be written ρ ◦ ρ′ = (RR′, Ru′ + u). This suggests the representation of SE(n) in homogeneous coordinates

ρ = [ R  u ]
    [ 0  1 ]  ∈ GL(n + 1).   (4.1)

The composition of isometries then corresponds to matrix multiplication, and SE(n) is now a Lie subgroup of GL(n + 1). Its Lie algebra is

se(n) = { [ S  v ]
          [ 0  0 ]  |  S ∈ Skew(n), v ∈ Rn }.

Thus its dimension is n(n + 1)/2. In geomstats, SE(n) inherits from MatrixLieGroup and is embedded in GL+(n + 1) with an overridden inverse function. We use a utility function that builds a block matrix according to Equation (4.1). The submersion used is the map

f : [ R    u ]
    [ v^T  c ]  ∈ GL+(n + 1) ↦ (R^T R, v, c) ∈ Mn(R) × Rn × R,

so that SE(n) = f^{-1}{(In, 0, 1)}. It is not printed in the code below in the interest of space.
class SpecialEuclideanMatrices(MatrixLieGroup, LevelSet):
    """Class for special Euclidean group."""

    def __init__(self, n):
        super().__init__(
            n=n + 1, dim=int((n * (n + 1)) / 2),
            embedding_space=GeneralLinear(n + 1, positive_det=True),
            submersion=submersion, value=gs.eye(n + 1),
            tangent_submersion=tangent_submersion,
            lie_algebra=SpecialEuclideanMatrixLieAlgebra(n=n))
        self.rotations = SpecialOrthogonal(n=n)
        self.translations = Euclidean(dim=n)
        self.n = n

        self.left_canonical_metric = \
            SpecialEuclideanMatrixCannonicalLeftMetric(group=self)
        self.metric = self.left_canonical_metric

    @property
    def identity(self):
        """Return the identity matrix."""
        return gs.eye(self.n + 1, self.n + 1)

    def inverse(self, point):
        """Return the inverse of a point."""
        n = self.n
        transposed_rot = self.transpose(point[..., :n, :n])
        translation = point[..., :n, -1]
        translation = gs.einsum(
            '...ij,...j', transposed_rot, translation)
        return homogeneous_representation(
            transposed_rot, -translation, point.shape)

Remark 4.1.3. Note that as a manifold, SE(n) defined in the above example corresponds to the product manifold SO(n) × Rn (Example 2.6 page 9), but as a Lie group, it is the semi-direct product SO(n) ⋉ Rn, because the rotation part acts on the translation part in the composition rule.
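The homogeneous representation (4.1) and the semi-direct composition rule can be checked with a small NumPy sketch (the helper homogeneous below is a hypothetical stand-in, not the geomstats utility function):

```python
import numpy as np

def homogeneous(rot, trans):
    """Embed (R, u) in GL(n + 1) as the block matrix [[R, u], [0, 1]]."""
    n = rot.shape[0]
    mat = np.eye(n + 1)
    mat[:n, :n] = rot
    mat[:n, -1] = trans
    return mat

def rot2d(theta):
    """Planar rotation by angle theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])

R1, u1 = rot2d(0.3), np.array([1.0, 2.0])
R2, u2 = rot2d(-0.7), np.array([-0.5, 0.3])

# Matrix multiplication implements the semi-direct product (RR', Ru' + u):
composed = homogeneous(R1, u1) @ homogeneous(R2, u2)
expected = homogeneous(R1 @ R2, R1 @ u2 + u1)
assert np.allclose(composed, expected)
```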

4.2 The exponential map

We now study the flow of left-invariant vector fields of a Lie group; for more details and proofs, we refer the reader to Gallier and Quaintance (2020, Chapter 18). This flow allows us to canonically map the Lie algebra to its Lie group. First, the left-invariance translates into a commutation property of the flow with the left translation map:

Proposition 4.2.1. Let G be a Lie group and X ∈ L(G). Then X is complete, and if (φt)t is its flow, then for any t ∈ R and g, g′ ∈ G we have

φt(gg′) = gφt(g′),  i.e.  φt ◦ Lg = Lg ◦ φt.

This allows the following:

Definition 4.2.1 (Exponential map). We call exponential map, and write exp : g → G, the map defined by x ↦ φ1(e), where (φt)t is the flow of the left-invariant vector field x̃.
Remark 4.2.1. This is the second definition of exponential map we encounter, where the
first was the Riemannian exponential defined by the geodesic flow of a metric. We will refer
to the one canonically defined on a Lie group as the group exponential and use a lowercase
exp.

Example 4.8: Exponential map of GL(n)

The fundamental case is of course again that of the general linear group. Recall that its Lie algebra gl(n) = Mn(R) is the set of square matrices, and the group law is the (linear) matrix multiplication. Therefore, the left-invariant vector field X̃ associated to X ∈ gl(n) is defined by

g ↦ gX.

Let γ be the integral curve from the identity In. This means that γ is the solution to the ODE defined on R with initial condition γ(0) = In:

γ′(t) = γ(t)X.

It is well known that the unique solution is the matrix exponential, defined by the series

e^{tX} = Σ_{k=0}^∞ (t^k / k!) X^k.

Thus exp(X) = e^X.
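The defining ODE and the series can be checked numerically; here expm is a naive truncated power series, sufficient for small matrices but not a production implementation:

```python
import numpy as np

def expm(x, n_terms=30):
    """Matrix exponential by truncated power series."""
    result, term = np.eye(x.shape[0]), np.eye(x.shape[0])
    for k in range(1, n_terms):
        term = term @ x / k
        result = result + term
    return result

X = np.array([[0.0, -1.0], [1.0, 0.0]])   # a skew matrix, element of gl(2)

# gamma(t) = exp(tX) solves gamma'(t) = gamma(t) X, checked by finite differences:
t, dt = 0.5, 1e-6
lhs = (expm((t + dt) * X) - expm(t * X)) / dt
rhs = expm(t * X) @ X
assert np.allclose(lhs, rhs, atol=1e-4)

# one-parameter subgroup property: exp((s + t)X) = exp(sX) exp(tX)
s = 0.2
assert np.allclose(expm((s + t) * X), expm(s * X) @ expm(t * X))
```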

We now state some of the fundamental properties of the exponential map (see e.g.
Gallier and Quaintance, 2020, Proposition 18.6-18.7-18.13).
Proposition 4.2.2. For any Lie group G, the exponential map exp : g → G is smooth and
a local diffeomorphism at 0.
The inverse of the group exponential, defined locally, is called the logarithm map and is valued in g = Te G. It thus allows one to map data defined on G to the Lie algebra g, which is a vector space! This fact is at the basis of many algorithms that handle Lie group data.
Remark 4.2.2 (Infinite dimension). Proposition 4.2.2 is not true in infinite dimension. For
example, if G = Diff(M ), the exp map is not even surjective in any neighborhood of the
identity (Schmid, 2004).
Proposition 4.2.3. Let G, H be Lie groups, and f : G → H a Lie group homomorphism. Then

f ◦ exp = exp ◦ dfe.   (4.2)

In particular, if G = H and f = Cg, we have

exp(t ADg(u)) = g exp(tu) g^{-1} = Cg(exp(tu)).   (4.3)

See Lee (2003, Proposition 20.8) for a proof. The commutation property (4.2) can be depicted by saying that the following diagram commutes, meaning both paths leading from the top left space to the bottom right one are equivalent:

           exp
      g --------> G
      |           |
  dfe |           | f
      v    exp    v
      h --------> H
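Property (4.3) can be checked numerically on GL(3) (expm is again a naive truncated series, for illustration only):

```python
import numpy as np

def expm(x, n_terms=40):
    """Matrix exponential by truncated power series."""
    result, term = np.eye(x.shape[0]), np.eye(x.shape[0])
    for k in range(1, n_terms):
        term = term @ x / k
        result = result + term
    return result

rng = np.random.default_rng(2)
g = expm(0.3 * rng.standard_normal((3, 3)))   # an invertible matrix in GL(3)
u = 0.2 * rng.standard_normal((3, 3))         # an element of gl(3)
g_inv = np.linalg.inv(g)

# exp(AD_g(u)) = g exp(u) g^{-1} = C_g(exp(u))
assert np.allclose(expm(g @ u @ g_inv), g @ expm(u) @ g_inv, atol=1e-6)
```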
Using f = Lg, or equivalently (thanks to Equation (4.3)) f = Rg, we define the exponential map at any point g ∈ G so that it fulfills the above properties: for any v ∈ Tg G,

expg = Lg ◦ exp ◦ (dLg)^{-1} = Rg ◦ exp ◦ (dRg)^{-1},

where the differentials are taken at e, so that the group exp is now understood as expe. In geomstats we use the implementation of the matrix exponential and logarithm from the backends and include it in the MatrixLieGroup class.
Moreover, we can recover the whole connected component of the identity from the Lie
algebra:
Theorem 4.2.1. If G is a Lie group and G0 is the connected component of e, then G0 is
generated by exp(g).
In particular (see Gallier and Quaintance (2020, Theorems 1.6 and 1.12)), we have the following:

Proposition 4.2.4 (SO(n) and SE(n)).

• The exponential map exp : so(n) → SO(n) is surjective,

• The exponential map exp : se(n) → SE(n) is surjective.

Note that by applying the commutation property (4.2) to the inclusion map, the
exponential maps of SE(n) and SO(n) coincide with the restriction of the exponential map
of GL(n), i.e. the matrix exponential.
Finally, recall that flows of complete vector fields are one-parameter subgroups of Diff(M). Define now a one-parameter subgroup of G as a Lie group homomorphism t ∈ R ↦ γt ∈ G, written (γt)t∈R. Then by a change of variable in the flow equation, it is clear that t ↦ exp(tX) is a one-parameter subgroup for any X ∈ g. In fact, all one-parameter subgroups are of this form, and the maps

θ : (γt)t∈R ↦ (d/dt)|_{t=0} γt ;   θ′ : X ↦ (exp(tX))_{t∈R}

are inverse to each other between the set of one-parameter subgroups and g.
Before closing this section, we illustrate it with a representation of the one-parameter
subgroups of SE(2).

Example 4.9: Curve in SE(2)
Recall that SE(2) is the group of rotation-translation transformations, so that
a smooth curve of SE(2) can represent the motion of a rigid-body in a plane. An
element of SE(2) can be represented by the application of the rotation part to the
canonical orthonormal frame of the 2d-plane, while the translation part is canonically
mapped to a point in the 2d-plane. We use this representation in Figure 4.1 to show
one-parameter subgroups. See Appendix B.3 for the code and different initial vectors.

Figure 4.1: One-parameter subgroup of SE(2) for three different initial conditions (that share the
same rotation part). The translation part corresponds to a circle, while the rotation occurs with
constant angular velocity.

4.3 Invariant metrics on Lie groups


4.3.1 Rationale
Different geometric structures are compatible with the group structure. For instance, the group exponential defined in Section 4.2 can be defined as the exponential map of a connection on G, implying that geodesics are one-parameter subgroups. This connection is known as the canonical Cartan connection, and is of practical interest as in linear groups its geodesics can be computed in closed form using the matrix exponential.
The canonical Cartan connection may however not be the Levi-Civita connection of any metric; for more details we refer the reader to Pennec et al. (2020, Chapter 5). In this section we focus on the study of invariant metrics on Lie groups, i.e. metrics for which the left (or right) translation map is an isometry. This case is fundamental in geometric mechanics (Kolev, 2004) and has been studied in depth since the foundational papers of Arnold (1966) and Milnor (1976). Using left-invariant vector fields, one can compute the Levi-Civita connection explicitly, allowing one to rewrite the geodesic equation. This fundamental idea, known as Euler-Poincaré reduction, is that the geodesic equation can be expressed entirely in the Lie algebra thanks to the symmetry of left-invariance (Marsden and Ratiu, 2009), alleviating the burden of coordinate charts.

We derive here a similar reduction of the parallel transport equation, resulting in a stable and efficient implementation of parallel transport in geomstats. Finally, knowing the connection, one can also compute the curvature of the space algebraically. We exemplify these results with an anisotropic metric on the special Euclidean groups SE(2) and SE(3). Part of the material and results of this section were presented at the GSI 2021 conference in Guigui and Pennec (2021).

4.3.2 Definitions and computations of the connection

Let G be a Lie group. Recall that the Lie algebra of left-invariant vector fields L(G) and the tangent space at the identity g can be identified, and in this section we will write x̃ for the left-invariant field generated by x ∈ g: ∀g ∈ G, x̃g = dLg x.

Definition 4.3.1 (Invariant metric). Let G be a Lie group. A Riemannian metric ⟨·, ·⟩ on G is called left-invariant if the differential map of the left translation is an isometry between tangent spaces, that is,

∀g, h ∈ G, ∀u, v ∈ Tg G,  ⟨u, v⟩g = ⟨dLh u, dLh v⟩hg.

It is thus uniquely determined by an inner product on the tangent space at the identity Te G = g of G. Similarly, a right-invariant metric is such that the differential of the right translation is an isometry. A metric is called bi-invariant if it is both left- and right-invariant.

Moreover, let (e1, ..., en) be an orthonormal basis of g, and ẽi the associated left-invariant vector fields. As dLg is an isomorphism, (ẽ1,g, ..., ẽn,g) form a basis of Tg G for any g ∈ G, so any X ∈ Γ(T G) can be written Xg = f^i(g) ẽi,g (with Einstein summation convention), where each g ↦ f^i(g), i = 1, ..., n, is a smooth real-valued function on G. Any vector field on G can thus be expressed as a linear combination of the ẽi with functional coefficients.

Example 4.10: Invariant metric on SO(n)

Consider the special orthogonal group endowed with the restriction of the Frobenius metric of Example 3.5. Let P, Q ∈ SO(n) and A, B ∈ TP SO(n) = P Skew(n). Then it is clear that

⟨dLQ A, dLQ B⟩ = tr((QA)^T QB) = tr(A^T B) = ⟨A, B⟩,

and similarly that ⟨dRQ A, dRQ B⟩ = ⟨A, B⟩. Therefore, the Frobenius metric is bi-invariant on SO(n).

Example 4.11: Invariant metric on SE(n)

Now, consider the special Euclidean group endowed with the restriction of the Frobenius metric. As in the previous example, it is easy to show that it is left-invariant. However, it is not right-invariant. Indeed, let g = (P, t) ∈ SE(n) and x = (S, u), y = (T, v) ∈ se(n) = Skew(n) ⊕ Rn. Then

⟨dRg x, dRg y⟩ = ⟨(SP^T, u − SP^T t), (TP^T, v − TP^T t)⟩
              = tr(S^T T) + (u − SP^T t)^T (v − TP^T t)
              = ⟨x, y⟩ + t^T P S^T T P^T t − u^T T P^T t − t^T P S^T v,

so that one can find t ≠ 0 and S, T ∈ Skew(n) such that ⟨dRg x, dRg y⟩ ≠ ⟨x, y⟩. Therefore the metric is not bi-invariant. In fact, one can show that there does not exist any bi-invariant metric on SE(n) for n ≠ 1, and there exists a bi-invariant pseudo-metric for n = 3 (Miolane and Pennec, 2015).

Definition 4.3.2 (Dual adjoint). Define the metric dual adjoint map ad∗ on g as the unique map that verifies

∀a, b, c ∈ g,  ⟨ad∗_a(b), c⟩ = ⟨b, ad_a(c)⟩ = ⟨[a, c], b⟩.

As the bracket can be computed explicitly in the Lie algebra, so can ad∗, thanks to the orthonormal basis of g. Now let ∇ be the Levi-Civita connection associated to the metric. It is also left-invariant and can be characterized by a bilinear form on g that verifies, ∀x, y ∈ g (Pennec and Arsigny, 2013; Gallier and Quaintance, 2020):

α(x, y) ≜ (∇x̃ ỹ)e = 1/2 ([x, y] − ad∗_x(y) − ad∗_y(x)).   (4.4)

Indeed, by the left-invariance, for two left-invariant vector fields X = x̃, Y = ỹ ∈ L(G), the map g ↦ ⟨X, Y⟩g is constant, so for any vector field Z = z̃ we have Z(⟨X, Y⟩) = 0. The Koszul formula thus becomes

2⟨∇X Y, Z⟩ = ⟨[X, Y], Z⟩ − ⟨[Y, Z], X⟩ − ⟨[X, Z], Y⟩   (4.5)
2⟨∇X Y, Z⟩e = ⟨[x, y], z⟩e − ⟨ad_y(z), x⟩e − ⟨ad_x(z), y⟩e
2⟨α(x, y), z⟩e = ⟨[x, y], z⟩e − ⟨ad∗_y(x), z⟩e − ⟨ad∗_x(y), z⟩e.
Definition 4.3.3 (Structure constants). Let G be a Lie group and ⟨·, ·⟩ a left-invariant metric on G. Let (e1, ..., ed) be an orthonormal basis of g for this metric. Define the structure constants as

C^k_ij = ⟨[ei, ej], ek⟩.

Example 4.12: Structure constants on SO(3)

An orthonormal basis of so(3) endowed with the Frobenius metric is (1/√2)(a1, a2, a3) with

a1 = [ 0  0  0 ]    a2 = [  0  0  1 ]    a3 = [ 0  -1  0 ]
     [ 0  0 -1 ]         [  0  0  0 ]         [ 1   0  0 ]
     [ 0  1  0 ]         [ -1  0  0 ]         [ 0   0  0 ].

And as the Lie bracket is the matrix commutator, we can compute the structure constants to obtain

C^k_ij = 1/√2 if ijk is a direct cycle of {1, 2, 3},

and 0 otherwise.
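These constants can be recovered numerically (indices are 0-based here, so C[0, 1, 2] stands for C^3_12):

```python
import numpy as np

a1 = np.array([[0., 0, 0], [0, 0, -1], [0, 1, 0]])
a2 = np.array([[0., 0, 1], [0, 0, 0], [-1, 0, 0]])
a3 = np.array([[0., -1, 0], [1, 0, 0], [0, 0, 0]])
basis = [a / np.sqrt(2) for a in (a1, a2, a3)]   # orthonormal for <A, B> = tr(A^T B)

def inner(a, b):
    """Frobenius inner product."""
    return np.trace(a.T @ b)

# C[i, j, k] = <[e_i, e_j], e_k>, with the commutator as bracket
C = np.array([[[inner(basis[i] @ basis[j] - basis[j] @ basis[i], basis[k])
                for k in range(3)] for j in range(3)] for i in range(3)])

assert np.isclose(C[0, 1, 2], 1 / np.sqrt(2))    # direct cycles give 1/sqrt(2)
assert np.isclose(C[1, 2, 0], 1 / np.sqrt(2))
assert np.isclose(C[1, 0, 2], -1 / np.sqrt(2))   # skew-symmetry in (i, j)
assert np.isclose(C[0, 0, 1], 0.)
```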

Example 4.13: Structure constants on SE(3)

We define a left-invariant metric on the special Euclidean group by defining an inner product in its Lie algebra. Let the metric matrix at the identity be diagonal: g = diag(1, 1, 1, β, 1, 1) for some β > 0, an anisotropy parameter. For β = 1, this metric coincides with the Frobenius metric. An orthonormal basis of the Lie algebra se(3) for this metric is

e1 = [ a1  0 ]    e2 = [ a2  0 ]    e3 = [ a3  0 ]
     [ 0   0 ]         [ 0   0 ]         [ 0   0 ]

e4 = (1/√β) [ 0  ε1 ]    e5 = [ 0  ε2 ]    e6 = [ 0  ε3 ],
            [ 0  0  ]         [ 0  0  ]         [ 0  0  ]

where (a1, a2, a3) is the basis of so(3) defined above, and (ε1, ε2, ε3) is the canonical basis of R3. As the Lie bracket is the usual matrix commutator, it is straightforward to compute

C^k_ij = 1/√2 if ijk is a direct cycle of {1, 2, 3};   (4.6)

C^6_15 = −C^5_16 = −√β C^6_24 = (1/√β) C^4_26 = √β C^5_34 = −(1/√β) C^4_35 = 1/√2,   (4.7)

and all others that cannot be deduced by skew-symmetry of the bracket are equal to 0.

The structure constants allow one to compute ad∗ and formula (4.4) in practice:

α(ei, ej) = ∇_{ei} ej = 1/2 Σ_k (C^k_ij − C^i_jk + C^j_ki) ek.
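For the bi-invariant metric on SO(3) of Example 4.12, the structure constants are totally antisymmetric and the connection coefficients reduce to ∇_{e_i} e_j = ½[e_i, e_j], as a short NumPy check shows (again with 0-based indices):

```python
import numpy as np

# structure constants of so(3) for the orthonormal basis of Example 4.12
c = 1 / np.sqrt(2)
C = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    C[i, j, k], C[j, i, k] = c, -c

def alpha(i, j):
    """Components of alpha(e_i, e_j) = 1/2 sum_k (C^k_ij - C^i_jk + C^j_ki) e_k."""
    return 0.5 * np.array([C[i, j, k] - C[j, k, i] + C[k, i, j]
                           for k in range(3)])

# With a bi-invariant metric, alpha(x, y) = 1/2 [x, y]:
for i in range(3):
    for j in range(3):
        assert np.allclose(alpha(i, j), 0.5 * C[i, j])
```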

Note however that Equation (4.4) gives the connection for left-invariant vector fields only. We will now generalize to any vector field defined along a smooth curve on G, using the left-invariant basis (ẽ1, ..., ẽn).

Let γ : [0, 1] → G be a smooth curve, and Y a vector field defined along γ. Write Y = h^j ẽj and γ̇ = f^i ẽi. Let us also define the left-angular velocities ω = dL^{-1}_γ γ̇ = (f^i ◦ γ) ei ∈ g and ζ = dL^{-1}_γ Y = (h^j ◦ γ) ej ∈ g. Then the covariant derivative of Y along γ is

∇_{γ̇(t)} Y = (f^i ◦ γ)(t) ∇_{ẽi} (h^j ẽj)
           = (f^i ◦ γ)(t) ẽi(h^j) ẽj + (f^i ◦ γ)(t)(h^j ◦ γ)(t) (∇_{ẽi} ẽj)_{γ(t)},

dL^{-1}_{γ(t)} ∇_{γ̇(t)} Y = (f^i ◦ γ)(t) ẽi(h^j) ej + (f^i ◦ γ)(t)(h^j ◦ γ)(t) dL^{-1}_{γ(t)} (∇_{ẽi} ẽj)_{γ(t)}
                          = (f^i ◦ γ)(t) ẽi(h^j) ej + (f^i ◦ γ)(t)(h^j ◦ γ)(t) ∇_{ei} ej,

where the Leibniz formula and the invariance of the connection are used in (∇_{ẽi} ẽj)_{γ(t)} = dL_{γ(t)} ∇_{ei} ej. Therefore, for k = 1..n,

⟨dL^{-1}_{γ(t)} ∇_{γ̇(t)} Y, ek⟩ = (f^i ◦ γ)(t) ẽi(h^j) ⟨ej, ek⟩ + (f^i ◦ γ)(t)(h^j ◦ γ)(t) ⟨∇_{ei} ej, ek⟩,

but on the one hand

ζ(t) = (h^j ◦ γ)(t) ej
ζ̇(t) = (h^j ◦ γ)′(t) ej = dh^j_{γ(t)}(γ̇(t)) ej
     = dh^j_{γ(t)}((f^i ◦ γ)(t) ẽ_{i,γ(t)}) ej
     = (f^i ◦ γ)(t) ẽi(h^j)_{γ(t)} ej,

and on the other hand, using Equation (4.4):

2(f^i ◦ γ)(h^j ◦ γ) ⟨∇_{ei} ej, ek⟩
  = (f^i ◦ γ)(h^j ◦ γ) (⟨[ei, ej], ek⟩ − ⟨ad∗_{ei} ej, ek⟩ − ⟨ad∗_{ej} ei, ek⟩)
  = (f^i ◦ γ)(h^j ◦ γ) (⟨[ei, ej], ek⟩ − ⟨[ej, ek], ei⟩ − ⟨[ei, ek], ej⟩)
  = ⟨[(f^i ◦ γ) ei, (h^j ◦ γ) ej], ek⟩
    − ⟨[(h^j ◦ γ) ej, ek], (f^i ◦ γ) ei⟩
    − ⟨[(f^i ◦ γ) ei, ek], (h^j ◦ γ) ej⟩
  = ⟨[ω, ζ] − ad∗_ω ζ − ad∗_ζ ω, ek⟩ = ⟨2α(ω, ζ), ek⟩.

Thus, we obtain an algebraic expression for the covariant derivative of any vector field Y along a smooth curve γ. It is the main ingredient of this section.
Lemma 4.3.1. Let G be a Lie group and ∇ be the Levi-Civita connection of a left-invariant metric on G. Let γ be a smooth curve on G, and Y a vector field defined along γ. Consider the left-angular velocities ω = dL^{-1}_γ γ̇ and ζ = dL^{-1}_γ Y. Then

dL^{-1}_{γ(t)} ∇_{γ̇(t)} Y(t) = ζ̇(t) + α(ω(t), ζ(t)).   (4.8)

A similar expression can be found in Arnold (1966) and Gay-Balmaz et al. (2012). As all the variables of the right-hand side are defined in g, they can be computed with matrix operations and an orthonormal basis.

We now focus on two particular cases of (4.8) to derive the equations of geodesics and of parallel transport along a curve.

4.3.3 Geodesic equation

The first particular case is for Y(t) = γ̇(t). It is then straightforward to deduce from Equation (4.8) the Euler-Poincaré equation for a geodesic curve (e.g. Kolev, 2004; Cendra et al., 1998). Indeed, in this case, recall that ω = dL^{-1}_γ γ̇ is the left-angular velocity, ζ = ω and α(ω, ω) = −ad∗_ω(ω). Hence γ is a geodesic if and only if dL^{-1}_{γ(t)} ∇_{γ̇(t)} γ̇(t) = 0, i.e. setting the left-hand side of Equation (4.8) to 0. We obtain

γ̇(t) = dL_{γ(t)} ω(t),
ω̇(t) = ad∗_{ω(t)} ω(t).   (4.9)
Remark 4.3.1.

• A similar treatment of a right-invariant metric is straightforward. Indeed, in this case the Lie bracket of right-invariant vector fields is the opposite of the adjoint representation of the Lie algebra, therefore the expressions are all the same with −α instead of α.

• One can show that the metric is bi-invariant if and only if the adjoint map is skew-symmetric (see Pennec and Arsigny (2013) or Gallier and Quaintance (2020, Prop. 20.7)). In this case ad∗_ω(ω) = 0 and Equation (4.9) coincides with the equation of one-parameter subgroups on G.
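In the bi-invariant case the reduced system can be integrated in a few lines: ω is constant and Equation (4.9) reduces to γ̇ = γω, whose solution from the identity is the one-parameter subgroup exp(tω). A NumPy sketch with a hand-rolled RK4 step (not the geomstats integrator):

```python
import numpy as np

def expm(x, n_terms=40):
    """Matrix exponential by truncated power series (reference solution)."""
    result, term = np.eye(x.shape[0]), np.eye(x.shape[0])
    for k in range(1, n_terms):
        term = term @ x / k
        result = result + term
    return result

omega = np.array([[0., -1, 0], [1, 0, 0], [0, 0, 0]])   # constant left-angular velocity
gamma = np.eye(3)
n_steps = 200
dt = 1.0 / n_steps
for _ in range(n_steps):        # RK4 on gamma' = gamma @ omega
    k1 = gamma @ omega
    k2 = (gamma + dt / 2 * k1) @ omega
    k3 = (gamma + dt / 2 * k2) @ omega
    k4 = (gamma + dt * k3) @ omega
    gamma = gamma + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# the geodesic from the identity at t = 1 is exp(omega)
assert np.allclose(gamma, expm(omega), atol=1e-8)
```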

In the class InvariantMetric of geomstats, the geodesic_equation method is modified to implement Equation (4.9), and the integrators of Section 3.3 (page 38) are used to compute the exponential map.

Example 4.14: Geodesics on SE(2)

We use the same visualization as in Example 4.9 (page 55) to plot the geodesics of a left-invariant metric on SE(2). We first compare the left- and right-invariant metrics with the same inner product at the identity, and compare their geodesics to one-parameter subgroups. These curves are shown on Figure 4.2.

Figure 4.2: Visualization of geodesics and one-parameter subgroups on SE(2): the group exponential, the left-invariant metric geodesic, and the right-invariant metric geodesic computed by integration.

Next, we define the metric as in Example 4.13, by the bilinear form defined on se(2) by the matrix g = diag(1, β, 1) for some β > 0, an anisotropy parameter, and extended by left-invariance.

For β = 1, the metric coincides with the Frobenius metric, and there is no interaction between the rotation and translation parts. This metric is thus isomorphic to the direct product metric of the bi-invariant metric on SO(2) and the usual inner product of R2. The geodesics are thus straight lines on the translation part, and rotations with constant speed (as one-parameter subgroups) on the rotation part. For β ≠ 1, the rotation part remains the same, but there is a direction towards which less movement occurs (Figure 4.3).

Figure 4.3: Visualization of geodesics on SE(2) with the same initial conditions and different values of β (four panels: β = 1.0, 2.0, 3.0, 5.0).

4.3.4 Reduced parallel transport equation

The second case is for a vector field Y that is parallel along the curve γ, that is, ∀t, ∇_{γ̇(t)} Y(t) = 0. Similarly to the geodesic equation, we deduce from Equation (4.8) the parallel transport equation expressed in the Lie algebra. To the best of our knowledge, this formulation of the parallel transport on Lie groups with an invariant metric is original.

Theorem 4.3.2. Let γ be a smooth curve on G. The vector field Y is parallel along γ if and only if it is solution to

ω(t) = dL^{-1}_{γ(t)} γ̇(t),
Y(t) = dL_{γ(t)} ζ(t),   (4.10)
ζ̇(t) = −α(ω(t), ζ(t)).

Note that in order to parallel transport along a geodesic curve, Equations (4.9) and (4.10) are solved jointly. We add the corresponding method to the InvariantMetric class of geomstats.
def geodesic_equation(self, state, _time):
"""Compute the right-hand side of the geodesic equation."""
sign = 1. if self.left_or_right == 'left' else -1.
basis = self.normal_basis(self.lie_algebra.basis)

point, vector = state


velocity = self.group.tangent_translation_map(
point, left_or_right=self.left_or_right)(vector)
coefficients = gs.array([self.structure_constant(
vector, basis_vector, vector) for basis_vector in basis])
acceleration = gs.einsum('i...,ijk->...jk', coefficients, basis)
return gs.stack([velocity, sign * acceleration])

def parallel_transport(
        self, tangent_vec_a, tangent_vec_b, base_point, n_steps=10,
        step='rk4', **kwargs):
    """Compute the parallel transport of a tangent vector along a geodesic."""
    group = self.group
    translation_map = group.tangent_translation_map(
        base_point, left_or_right=self.left_or_right, inverse=True)
    left_angular_vel_a = group.to_tangent(translation_map(tangent_vec_a))
    left_angular_vel_b = group.to_tangent(translation_map(tangent_vec_b))

    def acceleration(state, time):
        point, omega, zeta = state
        gam_dot, omega_dot = self.geodesic_equation(state[:2], time)
        zeta_dot = - self.connection_at_identity(omega, zeta)
        return gs.stack([gam_dot, omega_dot, zeta_dot])

    if (base_point.ndim == 2 or base_point.shape[0] == 1) and \
            (tangent_vec_a.ndim == 3 or tangent_vec_b.ndim == 3):
        n_sample = tangent_vec_a.shape[0] if tangent_vec_a.ndim == 3 else \
            tangent_vec_b.shape[0]
        base_point = gs.stack([base_point] * n_sample)

    initial_state = gs.stack(
        [base_point, left_angular_vel_b, left_angular_vel_a])
    flow = integrate(
        acceleration, initial_state, n_steps=n_steps, step=step, **kwargs)
    gamma, gamma_dot, zeta_t = flow[-1]
    transported = group.tangent_translation_map(
        gamma, left_or_right=self.left_or_right, inverse=False)(zeta_t)
    return transported

Example 4.15: Parallel transport in SE(3)
We exemplify the use of the reduced parallel transport equation on SE(3) endowed
with an anisotropic left-invariant metric, as in Examples 4.13 and 4.14. The metric
is defined by the matrix g = diag(1, 1, 1, β, 1, 1) for some β > 0 at the identity and
extended by left-invariance.
We randomly generate a point x ∈ SE(3) and two tangent vectors v, w ∈
Tx SE(3), and transport the vector v along the geodesic t ↦ Expx (tw).
The results are plotted in a loglog plot (Figure 4.4), with the error shown with
respect to the number of steps used to integrate Equations (4.9) and (4.10), with an
RK2 or RK4 scheme. As expected, the speed of convergence depends on the order of
the scheme used: quadratic speed is reached for RK2 and speed of order 4 for RK4.
[Plot: "absolute error with respect to the number of steps", in log-log axes,
with one curve per combination of integration scheme (RK2, RK4) and
β ∈ {1.0, 3.0, 5.0}.]
Figure 4.4: Norm of the absolute error represented with respect to the number of steps.

When β = 1, it coincides with the product metric between the bi-invariant metric of
SO(3) and of R3 , so the geodesics and parallel transport are known in closed form.
For β > 1, the curvature grows away from 0 (this is proved in Appendix B), and so
does its covariant derivative, so that parallel transport becomes more complex to
compute. We use n = 1100 steps in the discretization of [0, 1] for the integration scheme
to compute a reference value that is then used to measure the error of the method
for 10 ≤ n ≤ 1000. We use different values of β but keep the same initial vectors

regardless of the value of β.
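The convergence rates observed in Figure 4.4 are a property of the Runge-Kutta schemes themselves and can be reproduced on any ODE. The following self-contained numpy sketch (independent of geomstats, using a scalar test equation rather than Equations (4.9) and (4.10)) estimates the order of an RK4 integrator from the slopes of the log-log error curve:

```python
import numpy as np

def rk4_step(f, y, t, dt):
    """One classical Runge-Kutta 4 step for y' = f(y, t)."""
    k1 = f(y, t)
    k2 = f(y + dt / 2 * k1, t + dt / 2)
    k3 = f(y + dt / 2 * k2, t + dt / 2)
    k4 = f(y + dt * k3, t + dt)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def rk4_integrate(f, y0, n_steps):
    """Integrate y' = f(y, t) on [0, 1] with n_steps RK4 steps."""
    y, dt = y0, 1.0 / n_steps
    for i in range(n_steps):
        y = rk4_step(f, y, i * dt, dt)
    return y

# Scalar test equation y' = y, whose exact solution at t = 1 is e.
errors = [abs(rk4_integrate(lambda y, t: y, 1.0, n) - np.e)
          for n in (10, 20, 40, 80)]
# Halving the step size should divide the error by 2^4: the slopes of
# the log-log error curve estimate the order of the scheme.
orders = np.diff(np.log(errors)) / np.log(0.5)
```

The same experiment with an RK2 step would exhibit slopes close to 2, matching the quadratic rate reported above.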

4.3.5 Curvature
By definition of the left-invariance of the metric, for any g ∈ G, the left-translation map Lg
is an isometry. It is thus sufficient to compute the curvature tensor at the identity, that is,
in the Lie algebra g, and to map it to any other point. Using an orthonormal basis and
formula (4.4), this reduces to simple algebra. The covariant derivative of the curvature
tensor can also be computed using Leibniz rule, for any u, v, w, z ∈ g:

(∇z R)(u, v)w = ∇z R(u, v)w − R(∇z u, v)w − R(u, ∇z v)w − R(u, v)∇z w (4.11)


Of course this is also valid for right-invariant metrics. This is implemented in geomstats in
the class InvariantMetric as follows.

def curvature_at_identity(
        self, tangent_vec_a, tangent_vec_b, tangent_vec_c):
    """Compute the curvature at identity."""
    bracket = Matrices.bracket(tangent_vec_a, tangent_vec_b)
    bracket_term = self.connection_at_identity(bracket, tangent_vec_c)
    left_term = self.connection_at_identity(
        tangent_vec_a, self.connection_at_identity(tangent_vec_b, tangent_vec_c))
    right_term = self.connection_at_identity(
        tangent_vec_b, self.connection_at_identity(tangent_vec_a, tangent_vec_c))
    return left_term - right_term - bracket_term

def sectional_curvature_at_identity(self, tangent_vec_a, tangent_vec_b):
    """Compute the sectional curvature at identity."""
    curvature = self.curvature_at_identity(
        tangent_vec_a, tangent_vec_b, tangent_vec_b)
    num = self.inner_product(curvature, tangent_vec_a)
    denom = (
        self.squared_norm(tangent_vec_a)
        * self.squared_norm(tangent_vec_b)
        - self.inner_product(tangent_vec_a, tangent_vec_b) ** 2)
    condition = gs.isclose(denom, 0.)
    denom = gs.where(condition, 1., denom)
    return gs.where(~condition, num / denom, 0.)

def curvature_derivative_at_identity(
        self, tangent_vec_a, tangent_vec_b, tangent_vec_c, tangent_vec_d):
    """Compute the covariant derivative of the curvature at identity."""
    first_term = self.connection_at_identity(
        tangent_vec_a,
        self.curvature_at_identity(
            tangent_vec_b, tangent_vec_c, tangent_vec_d))
    second_term = self.curvature_at_identity(
        self.connection_at_identity(tangent_vec_a, tangent_vec_b),
        tangent_vec_c,
        tangent_vec_d)
    third_term = self.curvature_at_identity(
        tangent_vec_b,
        self.connection_at_identity(tangent_vec_a, tangent_vec_c),
        tangent_vec_d)
    fourth_term = self.curvature_at_identity(
        tangent_vec_b,
        tangent_vec_c,
        self.connection_at_identity(tangent_vec_a, tangent_vec_d))
    return first_term - second_term - third_term - fourth_term

If the metric is bi-invariant, it is well known that these formulas greatly simplify (Lafontaine
et al., 2004, Proposition 3.17) or (Gallier and Quaintance, 2020, Proposition 20.19).

Proposition 4.3.1. If G is a Lie group equipped with a bi-invariant metric, and if X, Y, Z, T ∈
L(G), then

    ⟨R(X, Y)Z, T⟩ = (1/4)⟨[X, Y], [Z, T]⟩.    (4.12)

In particular, G has non-negative sectional curvature.
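This proposition can be sanity-checked numerically. The following numpy sketch (independent of geomstats) mirrors the formulas implemented above on the Lie algebra so(3) with the inner product ⟨a, b⟩ = tr(a⊤b), which is bi-invariant; for a bi-invariant metric, the Levi-Civita connection at the identity reduces to half the bracket, so the sectional curvature should equal ¼‖[x, y]‖² divided by the Gram determinant:

```python
import numpy as np

def bracket(a, b):
    """Matrix Lie bracket [a, b] = ab - ba."""
    return a @ b - b @ a

def connection(a, b):
    # For a bi-invariant metric, the connection at the identity is
    # half the bracket.
    return 0.5 * bracket(a, b)

def curvature(a, b, c):
    # Same combination of terms as curvature_at_identity above.
    left_term = connection(a, connection(b, c))
    right_term = connection(b, connection(a, c))
    return left_term - right_term - connection(bracket(a, b), c)

def inner(a, b):
    """Frobenius inner product, which is bi-invariant on so(3)."""
    return np.trace(a.T @ b)

rng = np.random.default_rng(0)
mats = rng.standard_normal((2, 3, 3))
x, y = (m - m.T for m in mats)  # two random elements of so(3)

num = inner(curvature(x, y, y), x)
denom = inner(x, x) * inner(y, y) - inner(x, y) ** 2
kappa = num / denom
# The sectional curvature should be |[x, y]|^2 / (4 denom) >= 0.
expected = 0.25 * inner(bracket(x, y), bracket(x, y)) / denom
```

Note that the overall sign of the curvature tensor depends on the sign convention; the sectional curvature, however, is convention-independent and non-negative as stated.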

We now turn to the case of submersions defined by the action of a Lie group on a
manifold, and the properties of the induced quotient metric.

4.4 Group action and homogeneous spaces


To conclude this section, we define the notion of group action, that will be fundamental in
Section 5. It allows to model groups as generators of transformations of any manifold. In
Example 2.5 (page 7), the action of SO(m) on Mm,k (R) is in fact the main factor to study
the geometry of the quotient space SΣkm .

Definition 4.4.1 (Group action). Given a set M and a group G, a left-action of G on M is
a function . : G × M → M , such that:

• ∀g, h ∈ G, ∀x ∈ M, g . (h . x) = (gh) . x,

• ∀x ∈ M, e . x = x.

If furthermore M is a smooth manifold and G is a Lie group, we say that this action is
smooth if the map . is smooth from the product manifold G × M to M . In this case, for
any g ∈ G, x 7→ g . x is a diffeomorphism of M , whose inverse is x 7→ g −1 . x. We call this
map left translation by g and write it Lg by analogy with the group law.

Remark 4.4.1. Note that the map g 7→ Lg is a group homomorphism between G and
Diff(M ). In fact, if this mapping is injective, we say that the action is faithful and we can
see G as an immersed subgroup of Diff(M ).

We say that an action is

• free if for all g ∈ G and x ∈ M , if g . x = x, then g = e;

• faithful if for all g ∈ G, if ∀x ∈ M, g . x = x then g = e;

• transitive if for all x, y ∈ M , there exists g ∈ G such that y = g . x;

• proper if for all compact sets K ⊂ M , the set {g ∈ G | K ∩ gK ≠ ∅} is compact. This
  is always the case if G is compact.

Further, we define for any x ∈ M the orbit of x as [x] = G . x = {g . x | g ∈ G}. As in
Example 2.5 (page 7), the orbits are the equivalence classes of the following relation

    x ∼ y ⟺ ∃g ∈ G, y = g . x ⟺ y ∈ G . x.

Note that a free action means that Gx = {e} for all x ∈ M , where Gx is the stabilizer
defined below. In contrast, if an action is faithful, then the map x ↦ g . x is different
from the identity of M for all g ≠ e. These
notions play an essential role in determining the geometry of M . For example, we shall see
in Kendall shape spaces that the action of SO(n) is free if we remove certain points, that
are otherwise considered as singularities.
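The axioms of Definition 4.4.1 and the properties above are easy to check numerically on a toy action. A minimal numpy sketch with SO(2) acting on R² (this action is faithful but not free, since every rotation fixes the origin, and its orbits are circles):

```python
import numpy as np

def rotation(theta):
    """Element of SO(2): rotation of angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def act(g, x):
    """Left action of SO(2) on R^2 by matrix-vector multiplication."""
    return g @ x

x = np.array([3.0, 4.0])
g, h = rotation(0.7), rotation(-1.2)

# Axiom 1: g . (h . x) = (gh) . x.
compat = np.allclose(act(g, act(h, x)), act(g @ h, x))
# Axiom 2: e . x = x.
unit = np.allclose(act(np.eye(2), x), x)
# The orbit of x lies in the circle of radius |x|: the action preserves
# norms, hence is not transitive; it is not free since every rotation
# fixes the origin.
norm_preserved = np.isclose(np.linalg.norm(act(g, x)), np.linalg.norm(x))
origin_fixed = np.allclose(act(g, np.zeros(2)), np.zeros(2))
```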

Definition 4.4.2 (Quotient space). The set of orbits is denoted M/G and is called the quotient
of M by G. This set is obtained by identifying all the points in an orbit. Define the canonical
projection

    π : M −→ M/G, x ↦ [x].    (4.13)

We have the following sufficient conditions on the action to ensure that a quotient space
is indeed a smooth manifold (Lee, 2003, Theorem 7.10).

Theorem 4.4.1 (Quotient manifold). Let M be a smooth manifold, G a Lie group with a
smooth left-action on M that is free and proper. Then there exists a unique differentiable
structure on M/G such that M/G is a smooth manifold and the canonical projection (4.13)
is a submersion.

Note that M/G is equipped with the quotient topology, and the differential structure
on M/G can be derived by taking charts of M that are adapted to the group action in
the sense that the open sets intersect with only one orbit. See Lee (2003) for more details.
Free and proper actions occur in many manifolds considered until here, e.g. the Stiefel
manifold (Example 2.4 page 7).
We now focus on a particular case of the action of a subgroup of G on G. At any x ∈ M ,
we can define the isotropy subgroup, or stabilizer of x as

Gx = {g ∈ G | g . x = x}.

Then Gx is a Lie subgroup of G. We consider the action of Gx on G, or more generally of a
Lie subgroup H of G, obviously defined by the group law. We have (Gallier and Quaintance,
2020, Corollary 22.10):

Theorem 4.4.2. The action of a Lie subgroup H of a Lie group G on G is free and proper.

Theorem 4.4.1 thus applies to the right action (g, h) ∈ G × H 7→ gh ∈ G and G/H is a
smooth manifold such that π : G → G/H is a submersion. The orbits of this right action
are the left cosets {gH | g ∈ G}. Note that G acts on G/H by g1 . g2 H = g1 g2 H, and it is
clear that this action is transitive. In fact, every transitive left action yields a quotient space
of orbits of a right action.

Definition 4.4.3 (Homogeneous space). We say that M is homogeneous if there exists a
smooth transitive action of a Lie group G on M.

Remark 4.4.2. Note that there might be several such Lie groups with a transitive action
on a given homogeneous space M .

All homogeneous spaces correspond (up to a diffeomorphism) to quotient spaces of the
form G/H, as stated in the following (Gallier and Quaintance, 2020, Theorem 22.13).

Theorem 4.4.3. Let G be a connected Lie group acting smoothly and transitively on a
smooth manifold M , so that M is homogeneous. Then for any x ∈ M , writing H = Gx , the
map

    G/H −→ M, gH ↦ g . x

is a diffeomorphism, so that M ≅ G/H.

Note that the choice of reference point x above does not matter, as for two different
choices x and y, the action is transitive so there exists g ∈ G such that y = g . x, and the
stabilizers Gx and Gy = Gg.x = gGx g −1 are conjugate, hence isomorphic.
We exemplify all these notions with the sphere, the Stiefel manifold and the manifold of
symmetric positive definite (SPD) matrices, although other manifolds in geomstats such as
hyperbolic spaces and the Grassmannian are also homogeneous.

Example 4.16: Hypersphere as a homogeneous space


The hypersphere embedded in Rd+1 can be seen as a homogeneous space by
considering the action of the special orthogonal group SO(d + 1) of the embedding
space. For R ∈ SO(d + 1), x ∈ S d :

R . x = Rx ∈ S d (4.14)

Indeed, the sphere is stable by the action of rotation matrices, and this action is
transitive. Consider a pole x0 = (1, 0, . . . , 0) ∈ S d . Its stabilizer is the set of rotations
whose axis is x0 , i.e.
    H = { diag(1, R) | R ∈ SO(d) } ≅ SO(d).

Then by Theorem 4.4.3, S d = SO(d + 1)/SO(d). Intuitively this corresponds to
identifying all the rotations that share the same axis, where rotation axes are described
by a unit vector, i.e., a point on the sphere.
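The transitivity claimed above can be verified numerically: a rotation mapping the pole x0 to any unit vector x is obtained by completing x into an orthonormal basis. A numpy sketch (the QR-based completion is one possible construction, not the one used in geomstats):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # ambient dimension d + 1, so the sphere is S^3
x = rng.standard_normal(dim)
x /= np.linalg.norm(x)  # a random point on the sphere

# Complete x into an orthonormal basis with a QR decomposition: the
# first column of q is x up to sign, so rescale by the sign of r[0, 0].
q, r = np.linalg.qr(
    np.column_stack([x, rng.standard_normal((dim, dim - 1))]))
q = q * np.sign(r[0, 0])
if np.linalg.det(q) < 0:
    q[:, -1] *= -1  # flip another column to land in SO(dim)

pole = np.zeros(dim)
pole[0] = 1.0
mapped = q @ pole  # the rotation q maps the pole to x: transitivity
```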

Example 4.17: Stiefel manifold as a homogeneous space


Recall that the Stiefel manifold St(k, n) is the set of orthonormal k-frames of Rn .
The special orthogonal group SO(n) thus acts on St(k, n) by

R . (u1 , . . . , uk ) = (Ru1 , . . . , Ruk )

It is clear that this action is transitive. Furthermore, as a frame can be represented
by an n × k matrix with orthonormal columns, the above action corresponds to matrix
multiplication and is thus smooth.
For any U ∈ St(k, n), the stabilizer of U is
    H = { diag(Ik , R) | R ∈ SO(n − k) } ≅ SO(n − k).

Then by Theorem 4.4.3, St(k, n) = SO(n)/SO(n − k). Intuitively, this corresponds to
the fact that a k-frame can be completed into an orthonormal basis of Rn , i.e. a matrix
in SO(n), and matrices in H map one such completion to all the other completions.
Thus quotienting SO(n) by SO(n − k) amounts to identifying the orthonormal bases
that agree on the first k vectors.
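A quick numerical check that this action is well defined, i.e. that rotating an orthonormal frame yields an orthonormal frame (numpy sketch, independent of geomstats):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
# A random orthonormal k-frame U in St(k, n).
u, _ = np.linalg.qr(rng.standard_normal((n, k)))
# A random rotation R in SO(n).
rot, _ = np.linalg.qr(rng.standard_normal((n, n)))
if np.linalg.det(rot) < 0:
    rot[:, 0] *= -1  # fix the determinant to +1

rotated = rot @ u
# The action preserves orthonormality: R . U is again in St(k, n).
still_orthonormal = np.allclose(rotated.T @ rotated, np.eye(k))
```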

Example 4.18: Grassmannian as a homogeneous space


Define the Grassmann manifold Gr(n, k) as the set of subspaces of Rn of dimension
k. On any k-dimensional subspace U is defined a unique orthogonal projector, i.e. a
linear map p defined in Rn for which p ◦ p = p and Im(p) = U and ker(p) = U ⊥ . Any
such projector is represented by a symmetric matrix P of size n, rank k and such

69
that P 2 = P . We thus adopt the representation of Gr(n, k) as the set

    Gr(n, k) = { P ∈ Sym(n) | P 2 = P and rank(P ) = k }.

Intuitively, any k-dimensional subspace can be represented by an equivalence class of
orthonormal bases, so that the Grassmannian is a quotient space of the orthogonal
group O(n). To consider a connected group, let G = SO(n), and let the action of
G on Gr(k, n) correspond to a change of basis: Q . P = QP Q> . It is clear that any
rank-k projector can be represented by a matrix of the form
    Pk = diag(Ik , 0).

This means exactly that the action is transitive, and Gr(k, n) is the orbit of Pk . Let
H be its stabilizer, i.e. the set
    H = { diag(Q, R) | Q ∈ SO(k), R ∈ SO(n − k) } ≅ SO(k) × SO(n − k).

Gr(k, n) = SO(n)/(SO(k) × SO(n − k)) is therefore a homogeneous space with
canonical projection π : Q ∈ SO(n) ↦ QPk Q⊤ and dimension k(n − k). Of course
this manifold can also be described as a quotient of St(k, n) by O(k), where the action
is by right multiplication and the projection is identified with U ∈ St(k, n) ↦ U U ⊤ ∈
Gr(k, n). This quotient can be more practical than that of SO(n) when n ≫ k. The
Grassmann manifold is widely used in applications both in numerical problems such
as low-rank matrix decomposition or optimization, and in higher-level applications in
machine learning, computer vision and image processing. We refer to Bendokat et al.
(2020) for a very complete exposition of this manifold.
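The projector representation and the canonical projection π : Q ↦ QPk Q⊤ can be checked numerically (numpy sketch, independent of geomstats):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 2
q, _ = np.linalg.qr(rng.standard_normal((n, n)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1  # take Q in SO(n)

p_k = np.zeros((n, n))
p_k[:k, :k] = np.eye(k)  # the reference projector P_k = diag(I_k, 0)

proj = q @ p_k @ q.T  # pi(Q): a point of Gr(k, n)
# A point of the Grassmannian is a symmetric, idempotent, rank-k matrix.
is_symmetric = np.allclose(proj, proj.T)
is_idempotent = np.allclose(proj @ proj, proj)
rank = np.linalg.matrix_rank(proj)
```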

Example 4.19: SPD matrices as a homogeneous space


Recall that a symmetric matrix Σ is positive definite if it is invertible and ∀x ∈
Rn , x⊤ Σx ≥ 0. The set SPD(n) of such matrices is an open set of the vector space
Sym(n) of symmetric matrices, hence it is an embedded manifold and the canonical
immersion id : SP D(n) → Sym(n) defines a global chart. Furthermore, for any
Σ ∈ SP D(n) the tangent space TΣ SP D(n) is identified with Sym(n).
Now, define the action of GL(n) on SP D(n) by:

. : (A, Σ) ∈ GL(n) × SP D(n) 7−→ AΣA> .

This action is sometimes called action by congruence. It is smooth and transitive.
Indeed, let Σ = P DP ⊤ , Σ0 = Q∆Q⊤ ∈ SPD(n) be two SPD matrices with their
eigenvalue decompositions given by the spectral theorem; then A = Q∆1/2 D−1/2 P ⊤
is such that A . Σ = Σ0 . The manifold of SPD matrices is thus a homogeneous space.
Finally, the isotropy group of the identity matrix In is the orthogonal group
O(n) ⊂ GL(n):

GIn = {A ∈ GL(n) | A . In = In }
= {A ∈ GL(n) | AA> = In } = O(n).

It is indeed a closed subgroup of GL(n), so its right action on GL(n) is free and
proper (Theorem 4.4.2), and by Theorem 4.4.3, SPD(n) is diffeomorphic to the orbit
space GL(n)/O(n). The canonical projection

    π : GL(n) −→ SPD(n), A ↦ AA⊤

is a submersion.
In geomstats, we implement a class for symmetric matrices and a class for the
manifold of SPD matrices that inherits from OpenSet.

class SPDMatrices(OpenSet):
    """Class for the manifold of symmetric positive definite matrices."""

    def __init__(self, n):
        super().__init__(
            dim=int(n * (n + 1) / 2),
            ambient_space=SymmetricMatrices(n))
        self.n = n

    def belongs(self, mat, atol=gs.atol):
        """Check if a matrix is symmetric with positive eigenvalues."""
        is_symmetric = self.ambient_space.belongs(mat, atol)
        eigvalues = gs.linalg.eigh(mat, eigvals_only=True)
        is_positive = gs.all(eigvalues > 0, axis=-1)
        belongs = gs.logical_and(is_symmetric, is_positive)
        return belongs

As the covariance matrix of a multivariate random variable is a positive semi-definite
matrix, this manifold is ubiquitous in applications such as signal processing,
neuroscience, etc. (see Pennec et al., 2020, Chapter 3 and references therein), although
some issues arise when the matrices are degenerate.
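The transitivity argument and the surjectivity of π : A ↦ AA⊤ can be verified numerically; a numpy sketch (independent of geomstats; a Cholesky factor serves as one possible preimage under π):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

def random_spd(rng, n):
    """A random SPD matrix (illustrative construction)."""
    m = rng.standard_normal((n, n))
    return m @ m.T + np.eye(n)

sigma, sigma_prime = random_spd(rng, n), random_spd(rng, n)
# Spectral decompositions Sigma = P D P^T and Sigma' = Q Delta Q^T.
d, p = np.linalg.eigh(sigma)
delta, q = np.linalg.eigh(sigma_prime)
# A = Q Delta^(1/2) D^(-1/2) P^T maps Sigma to Sigma' by congruence.
a = q @ np.diag(np.sqrt(delta)) @ np.diag(1.0 / np.sqrt(d)) @ p.T
mapped = a @ sigma @ a.T  # equals Sigma': the action is transitive
# Surjectivity of pi : A -> A A^T: a Cholesky factor is one preimage.
chol = np.linalg.cholesky(sigma)
```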

5 Metrics defined by invariance properties


As seen in the previous section, group actions play a key role in determining the geometric
properties of a manifold. Designing operations that are either invariant or equivariant
to a group action has been a fruitful source of improvement to deal with data lying on

manifolds (e.g. Pennec et al., 2006), and more recently to generalize convolutional neural
networks (Cohen et al., 2019, and references therein).
In this section, we focus on the implementation of Riemannian metrics that verify
invariance properties on quotient manifolds, defining quotient metrics.
In Section 5.1, we propose a generic framework to implement quotient
metrics, which are defined by a free and proper group action together with a metric that is invariant
to this group action. This framework is exemplified in the case of Kendall shape spaces,
and results in an efficient implementation that allows to compute parallel transport and
curvature. This is, up to our knowledge, the first open-source Python implementation of
such spaces. It was presented at the GSI 2021 conference in Guigui et al. (2021). Likewise,
we implement the quotient metric on correlation matrices studied by Thanwerdas and
Pennec (2021).
This section thus gathers two contributions to the formulation and implementation of
the parallel transport in the cases of invariant metrics and quotient metrics. Our original
implementations allow to apply these results to numerous spaces. For completeness of
our exposition, we briefly present the particular cases of the invariant metrics on homogeneous
and symmetric spaces in Sections 5.2 and 5.3, and discuss the impact of these properties on
the implementation.

5.1 Submersions and quotient metrics


A particularly interesting setting is when there exists a submersion π : E → M between two
Riemannian manifolds E and M , such that π is compatible with the two metrics. In that
case, most of the geometry of M can be deduced from the geometry of E and the knowledge
of π. Such submersions arise for homogeneous spaces, and for quotients by a Lie group,
hence we shall revisit some examples from the previous sections with this viewpoint. We
first introduce the general notions, then focus on quotient metrics. Metrics on homogeneous
spaces will be treated in Section 5.2.
Although the mathematical notions exposed in this section are well known since the
works of O’Neill (1966) and Le and Kendall (1993), this work is original by the common
implementation of these structures in geomstats. This implementation allowed computing
parallel transport on Kendall shape spaces (Section 5.1.3), outperforming the state-of-the-art
implementations, and are available on other quotient spaces with limited additional software
development. The generic implementation is presented in Section 5.1.4, and constitutes one
of the main contributions of this section.
To exemplify this section, we extensively use the work done in collaboration with Elodie
Maignant on Kendall shape spaces, and presented at the GSI 2021 conference in Guigui
et al. (2021). We also use the Bures-Wasserstein metric, with results from Bhatia et al.

(2019) and following Thanwerdas and Pennec (2022).

5.1.1 Riemannian submersions


Throughout this section, let (E, g) and (M, h) be two Riemannian manifolds, and π : E → M
be a submersion. Recall that this means that at every p ∈ E, dπp is surjective. We adopt
the vocabulary of principal fiber bundles, referring to E as the total space, and to M as the
base manifold. In particular, for every x ∈ M , π −1 (x) ⊂ E is a submanifold in E usually
called fiber above x, and the tangent space at any p ∈ π −1 (x) is Tp π −1 (x) = ker dπp .

Definition 5.1.1 (Section). A section σ of π : E → M is a smooth map σ : U → E defined
on an open set U ⊆ M of M , such that ∀x ∈ U, π(σ(x)) = x. If U = M , we say that σ
is a global section.

A section allows to choose a point in a fiber above any x in a small open set. In the sequel,
we assume that π always admits local sections. This is true for Riemannian submersions in
particular by the inverse function theorem.

5.1.2 Quotient metric


Definition 5.1.2 (Horizontal - Vertical subspaces). Let x ∈ M and p ∈ π −1 (x). The vertical
subspace of Tp E is defined as Vp = ker dπp and is the tangent space to the fiber through p.
The horizontal subspace is its orthogonal complement in Tp E: Hp = (Vp )⊥ .

Any tangent vector u ∈ Tp E can thus be decomposed into a vertical and a horizontal
component, and we write ver and hor respectively the orthogonal projections on the vertical
and horizontal subspace. Note that this decomposition depends on the metric on E by the
definition of the orthogonal complement for the horizontal subspace. Furthermore, we say
that u is horizontal if u ∈ Hp , and a curve c : I → E is called horizontal if at every time
t ∈ I, the velocity of c is horizontal: c0 (t) ∈ Hc(t) .

Example 5.1: Bures-Wasserstein metric


Recall from Example 4.19 (page 70) that the map A ∈ GL(n) 7→ AA> ∈ SP D(n)
is a smooth submersion. Its differential at any A is for any H ∈ Mn (R):

dπA H = AH > + HA> ,

and its kernel is ker dπA = {H | HA⊤ ∈ Skew(n)} = Skew(n)A−⊤ . This defines the
vertical subspace VA .
Now, consider GL(n) endowed with the restriction of the Frobenius metric. The

orthogonal complement to VA is

HA = {H ∈ Mn (R) | ∀B ∈ Skew(n), ⟨H, BA−⊤ ⟩ = 0}
   = {H ∈ Mn (R) | ∀B ∈ Skew(n), tr(BA−⊤ H ⊤ ) = 0}
   = {H ∈ Mn (R) | HA−1 ∈ Sym(n)} = Sym(n)A.

Note that in Section 5.3, we will consider the same manifolds and submersion, but
instead endow GL(n) with a left-invariant metric. This will result in another geometry
on the manifold of SPD matrices.
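This decomposition can be verified numerically: for S symmetric and K skew, SA is horizontal, KA−⊤ is vertical, and the two are Frobenius-orthogonal since ⟨SA, KA−⊤⟩ = tr(SK) = 0. A numpy sketch (independent of geomstats):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
m0 = rng.standard_normal((n, n))
a = np.linalg.cholesky(m0 @ m0.T + np.eye(n))  # an invertible matrix

m = rng.standard_normal((n, n))
s = 0.5 * (m + m.T)  # symmetric: S A is a horizontal vector at A
k = 0.5 * (m - m.T)  # skew: K A^(-T) is a vertical vector at A
horizontal = s @ a
vertical = k @ np.linalg.inv(a).T

# Vertical vectors lie in the kernel of d pi_A : H -> A H^T + H A^T.
dpi_of_vertical = a @ vertical.T + vertical @ a.T
# Orthogonality: <S A, K A^(-T)>_F = tr(S K) = 0 for S symmetric, K skew.
frobenius_inner = np.trace(horizontal.T @ vertical)
```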

Recall that as π is a submersion, dπp is a linear isomorphism between Hp and Im(dπp ) =
Tx M . If this identification is additionally isometric, then geodesics and curvature on M
can be deduced from those of E.

Definition 5.1.3 (Riemannian submersion). Let π : E → M be a smooth submersion between
two Riemannian manifolds. π is called a Riemannian submersion if for every p ∈ E, the map
dπp is an isometry between the horizontal subspace Hp of Tp E and Tx M where x = π(p).

If π is also surjective, this allows to identify tangent vectors of M to horizontal vectors
of E.

Definition 5.1.4 (Horizontal lift). Let π : E → M be a Riemannian submersion that is
surjective onto M . Let X be a vector field on M ; then the unique horizontal lift X̄ in E is
defined such that for every x ∈ M and every p ∈ π −1 (x),

    dπp X̄p = Xx .

If we have a local section σ, we speak about the corresponding horizontal lift such that
∀(x, v) ∈ T M, v̄ ∈ Hσ(x) ⊂ Tσ(x) E.

Example 5.2: Horizontal lift of SPD matrices


Following Example 5.1, it is known from linear algebra that the submersion
π : A ∈ GL(n) 7→ AA> ∈ SP D(n) is surjective. To compute the horizontal lift of a
tangent vector X at Σ = AA> , we seek a symmetric matrix S such that dπA (SA) = X.
This is true if and only if AA> S + SAA> = X, i.e. if S ∈ Sym(n) solves the Sylvester
equation
ΣS + SΣ = X. (5.1)
More precisely, using the eigenvalue decomposition of Σ = P DP ⊤ , S solves Equation (5.1) iff

    ΣS + SΣ = X
    ⟺ D(P ⊤ SP ) + (P ⊤ SP )D = P ⊤ XP
    ⟺ (P ⊤ SP )ij = (P ⊤ XP )ij / (di + dj ).    (5.2)

Define the map SΣ : Sym(n) → Sym(n) that solves the Sylvester equation using Equation (5.2). It uniquely defines the horizontal lift X̄ of X at A by X̄ = SAA⊤ (X)A.
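The map SΣ translates directly into code through the eigendecomposition of Σ. A numpy sketch of a solver following Equation (5.2) (solve_sylvester_spd is an illustrative stand-alone function, not the geomstats implementation):

```python
import numpy as np

def solve_sylvester_spd(sigma, x):
    """Solve Sigma S + S Sigma = X for S, Sigma SPD, as in Eq. (5.2)."""
    d, p = np.linalg.eigh(sigma)
    rotated = p.T @ x @ p
    return p @ (rotated / (d[:, None] + d[None, :])) @ p.T

rng = np.random.default_rng(5)
n = 4
m0 = rng.standard_normal((n, n))
a = np.linalg.cholesky(m0 @ m0.T + np.eye(n))
sigma = a @ a.T
m = rng.standard_normal((n, n))
x = m + m.T  # a tangent vector at sigma, i.e. a symmetric matrix

s = solve_sylvester_spd(sigma, x)
lift = s @ a  # horizontal lift of x at a
# d pi_A(lift) = A lift^T + lift A^T should recover x.
recovered = a @ lift.T + lift @ a.T
```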

Proposition 5.1.1. Similarly, if c : [0, 1] → M is a piecewise smooth curve in M , and
p ∈ π −1 (c(0)), then there exists a unique curve c̄ : [0, 1] → E such that π ◦ c̄ = c and c̄0 ∈ H.
The curve c̄ is called the horizontal lift of c.

Geodesics This leads to the main theorem of this section, that is usually attributed to
O’Neill, and relates geodesics of M with horizontal geodesics of E (Gallier and Quaintance,
2020, Proposition 17.8).
Theorem 5.1.1. Let π : E → M be a Riemannian submersion between two Riemannian
manifolds E and M .
(1) If c is a geodesic in E such that c0 (0) is horizontal, then c is horizontal, and its
    projection γ = π ◦ c is a geodesic in M of the same length as c.

(2) For every p ∈ E, if γ is a geodesic in M such that γ(0) = π(p), then there exists  > 0
such that there exists a unique horizontal lift c of the restriction of γ to [−, ] and c
is a geodesic of E through p.

(3) If furthermore π is surjective, for any vector fields X, Y ∈ Γ(T M ), ⟨X̄, Ȳ ⟩ = ⟨X, Y ⟩ ◦ π
    (where ¯· is the horizontal lift).
The first property of this theorem can be written by the commutative diagram below,
where p ∈ E and x = π(p):

    Hp ──Expp──> E
    │dπp         │π
    ▼            ▼
    Tx M ──Expx──> M

Furthermore, as the restriction of dπ on horizontal spaces is an
isomorphism, choosing a section σ of E and the corresponding horizontal lift ¯· allows to
compute the exponential map of M from the one of E:

    ∀x ∈ M, ∀v ∈ Tx M, Expx (v) = π ◦ Expσ(x) (v̄).    (5.3)

Remark 5.1.1.

• One can also show that the connection ∇X̄ Ȳ on E verifies (see e.g. Lafontaine et al.
  (2004, Proposition 3.55))

      ∇X̄ Ȳ = ∇X Y + (1/2) ver[X̄, Ȳ ],

  where the first term on the right-hand side denotes the horizontal lift of ∇X Y .
Consequently, one cannot hope to obtain a similar commutation rule between the
parallel transport in E and the one in M . Indeed, if Ȳ (t) is the horizontal lift of a
parallel vector field Y (t) along a curve γ (whose horizontal lift is γ̄), then ∇γ̄ 0 Ȳ has a
non-zero vertical component given above, which vanishes in the case of a geodesic.
Computing parallel transport is thus not as straightforward as computing geodesics
in this case.

• A Riemannian submersion shortens distances, i.e. for p, q ∈ E, and writing dE and
  dM the Riemannian distances on respectively E and M , we have

      dM (π(p), π(q)) ≤ dE (p, q).




Example 5.3: Bures-Wasserstein geodesics


There is a unique metric that turns π : A ∈ GL(n) 7→ AA> ∈ SP D(n) into a
Riemannian submersion between GL(n) endowed with the Frobenius metric and
SPD(n). This metric is called the Bures-Wasserstein (BW) metric. Let us apply
Theorem 5.1.1 to compute the geodesics of this metric. Recall that (GL(n), Frobenius)
is flat, so the geodesics are “straight” lines: ExpA (tX) = A + tX. Let Σ = AA> ∈
SP D(n) and X ∈ TΣ SP D(n), then ∀t > 0

ExpΣ (tX) = π ◦ ExpA (tX̄) = π(A + tSΣ (X)A)
          = (A + tSΣ (X)A)(A + tSΣ (X)A)⊤
          = AA⊤ + t(SΣ (X)Σ + ΣSΣ (X)) + t2 SΣ (X)ΣSΣ (X)
          = AA⊤ + tX + t2 SΣ (X)ΣSΣ (X)

Note that a geodesic hits the boundary of the manifold when −1/t is an eigenvalue
of SΣ (X). (SPD(n), BW) is therefore not complete. All the properties of this metric
are derived using the equations of Riemannian submersions in Thanwerdas and
Pennec (2022) and Thanwerdas (2022).
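This closed form can be cross-checked against the projected straight lines of GL(n). A numpy sketch (independent of geomstats; solve_sylvester_spd below is an illustrative solver of Equation (5.2), and t is kept small so that the geodesic stays inside the manifold):

```python
import numpy as np

def solve_sylvester_spd(sigma, x):
    """Solve Sigma S + S Sigma = X for S, Sigma SPD, as in Eq. (5.2)."""
    d, p = np.linalg.eigh(sigma)
    return p @ ((p.T @ x @ p) / (d[:, None] + d[None, :])) @ p.T

rng = np.random.default_rng(6)
n, t = 4, 0.1
m0 = rng.standard_normal((n, n))
sigma = m0 @ m0.T + np.eye(n)
a = np.linalg.cholesky(sigma)  # a point in the fiber above sigma
m = rng.standard_normal((n, n))
x = m + m.T  # a tangent vector at sigma

s = solve_sylvester_spd(sigma, x)
# Horizontal geodesics of (GL(n), Frobenius) are straight lines;
# project the line A + t S A with pi(B) = B B^T.
b = a + t * s @ a
projected = b @ b.T
# Closed form obtained in the computation above.
closed_form = sigma + t * x + t ** 2 * s @ sigma @ s
eigenvalues = np.linalg.eigvalsh(closed_form)
```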

The relation between the connection of the total space and that of the base manifold
also translates into a relation between the curvatures of each space.

Curvature O’Neill (1966) showed that the curvatures of the total space and of the
base space were related, and that those of the base space could be computed using two
fundamental tensors defined by the horizontal and vertical projections of the connection.
We state here the main result, that shows that sectional curvature can only increase after a
Riemannian submersion.
Theorem 5.1.2 (O’Neill). Let π : E → M be a Riemannian submersion, and X, Y be
orthonormal vector fields on M , with horizontal lifts X̄, Ȳ . Then
    κ(X, Y ) = κ(X̄, Ȳ ) + (3/4) ‖ver[X̄, Ȳ ]‖2 .    (5.4)

Example 5.4: Curvature of the Bures-Wasserstein metric


As GL(n) with the Frobenius metric is flat, O’Neill’s theorem implies that the
space of SPD matrices endowed with the Bures-Wasserstein metric has non-negative
curvature.

We now apply these results to the special case of submersions defined as the canonical
projection to a quotient space.

Metric Recall from Theorem 4.4.1 (page 67) that if G is a Lie group and E a smooth
manifold such that G acts on E and the action is smooth, free and proper, then the canonical
projection π : x ∈ E 7→ [x] ∈ M = E/G is a submersion.
In this case, the action of G allows to move in the fibers. More precisely, as the fibers are
defined as orbits G . p = {g . p | g ∈ G} for p ∈ E, the fibers are stable by the action of G, i.e.
∀x ∈ M, ∀g ∈ G, ∀p ∈ π −1 (x), g . p ∈ π −1 (x), which can also be written π(g . p) = π(p) = x.
Moreover, this action is transitive on fibers, i.e. for any q ∈ E such that π(q) = x, there
exists g ∈ G such that q = g . p.
Definition 5.1.5 (Invariant metric). A Riemannian metric on E is G-invariant if for any
g ∈ G, Lg : p ∈ E 7→ g . p ∈ E is an isometry.
Now suppose that E is equipped with a G-invariant metric. Then one can define a metric
on the quotient manifold M = E/G such that π is a Riemannian submersion (Lafontaine
et al., 2004, Proposition 2.28).
Proposition 5.1.2 (Quotient metric). Let E be a smooth manifold, G a Lie group acting
smoothly, properly and freely on E, and ⟨·, ·⟩ be a G-invariant metric on E. Let π : E → M
be the canonical projection. Then there exists a unique Riemannian metric on M such that
π is a Riemannian submersion. Let x ∈ M and p ∈ π −1 (x); for any u, v ∈ Tx M , it is given
by

    ⟨u, v⟩x := ⟨ū, v̄⟩p .    (5.5)

We refer to this metric on M as the quotient metric.

Thanks to the invariance of the metric on E, the quotient metric on M is well defined
and does not depend on the choice of p in the fiber above x. The quotient metric is the
unique metric on M such that π is a Riemannian submersion. Note that we do not use
different notations for the two metrics for simplicity, but the subscripts indicate in which
space it is defined: the character p is preferred for a point in the total space E and x in
the quotient manifold M . Let dE and dM denote the Riemannian distance of respectively
(E, h·, ·i) and (M, h·, ·i).

Proposition 5.1.3. Let p, q ∈ E. We have the following relation:

    dM (π(p), π(q)) = inf { dE (p, q 0 ) : q 0 ∈ π −1 (π(q)) } = inf { dE (p, g . q) : g ∈ G }.    (5.6)

This leads to the following definition.

Definition 5.1.6 (Align). Let p, q ∈ E; we say that q is aligned, or well-positioned, with p if
dE (p, q) = dM (π(p), π(q)). If p and q are close enough, then there exists a unique q 0 aligned
with p, and the geodesic between p and q 0 is horizontal.

We define the align map ω on a sufficiently small subset D ⊆ E 2 , such that for any
(x, y) ∈ D, ω(x, y) ∈ E is aligned with x. In general, the optimization problem corresponding
to Equation (5.6) must be solved in G to compute the align map. This procedure is often
referred to as alignment, Procrustes analysis or registration in the literature. We implement
a general optimization procedure using the minimize method of scipy with a gradient-based
solver. If the group and its Lie algebra are explicitly given, we optimize in the
Lie algebra and use the group exponential map to map back to the group, then compute the
action of the resulting group element. The parameter vector represents the coefficients in a
given basis of the Lie algebra, and matrix_representation maps these coefficients to the
corresponding matrix by linear combination.

def wrap(param):
    """Wrap a parameter vector to a group element and act on point."""
    algebra_elt = gs.array(param)
    algebra_elt = gs.cast(algebra_elt, dtype=base_point.dtype)
    algebra_elt = group.lie_algebra.matrix_representation(algebra_elt)
    group_elt = group.exp(algebra_elt)
    return group_action(point, group_elt)

If only the group action is given, we optimize without precautions on the group structure:

def wrap(param):
    """Wrap parameter vector to group element and act on point."""
    group_elt = gs.array(param)
    group_elt = gs.cast(group_elt, dtype=base_point.dtype)
    return group_action(point, group_elt)

In both cases, automatic differentiation is used to compute the gradient of the objective
function.

def align(self, point, base_point,
          max_iter=25, verbose=False, tol=gs.atol):
    """Align point to base_point by minimization."""
    ...  # definitions of wrap for different cases
    objective_with_grad = gs.autograd.value_and_grad(
        lambda param: self.ambient_metric.squared_dist(
            wrap(param), base_point))

    init_param = gs.flatten(gs.random.rand(*max_shape))
    res = minimize(
        objective_with_grad, init_param, method='L-BFGS-B', jac=True,
        options={'disp': verbose, 'maxiter': max_iter}, tol=tol)

    return wrap(res.x)

This map is particularly useful because it allows to compute the Log map of the quotient
space from the one of the total space. Indeed, choosing a section σ and the corresponding
horizontal lift ¯·, we can deduce from Theorem 5.1.1 the following relation:

    ∀x, y ∈ M, Logx (y) = dπσ(x) ( Logσ(x) (ω(σ(x), σ(y))) ).    (5.7)

Example 5.5: Bures-Wasserstein distance


Continuing Example 5.2, where the submersion A ↦ AAᵀ from GL(n) to SPD(n)
was introduced, and explaining why the metric of Example 5.3 is well defined, we define the
quotient metric of GL(n) with the Frobenius metric by the orthogonal group. Indeed,
consider the right-action of O(n) on GL(n) by matrix multiplication. As shown in
Example 4.19 (page 70), the canonical projection of GL(n)/O(n) coincides with
π : A ↦ AAᵀ.
The metric hereby defined on the manifold of SPD matrices is called the Bures-
Wasserstein metric (Bhatia et al., 2019). It arises naturally as the L2-Wasserstein
distance on multivariate centred normal random variables. It also corresponds to
the Procrustes reflection size-and-shape metric (Dryden et al., 2009) taken on the
square-root SPD matrices. As GL(n) is Euclidean under the Frobenius metric, the
geodesics and curvature are known and can be projected to the quotient space. The
minimization problem (5.6) can be solved in closed form to obtain (Thanwerdas and
Pennec, 2022; Thanwerdas, 2022):

d(Σ, Σ′)² = tr(Σ) + tr(Σ′) − 2 tr((ΣΣ′)^{1/2}).
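This closed form is straightforward to evaluate numerically; a minimal sketch with scipy (the function name is ours):

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein_dist(sigma, sigma_prime):
    """Bures-Wasserstein distance between two SPD matrices."""
    cross_term = np.trace(sqrtm(sigma @ sigma_prime))
    d2 = np.trace(sigma) + np.trace(sigma_prime) - 2 * np.real(cross_term)
    # Guard against small negative values due to round-off.
    return np.sqrt(max(d2, 0.))
```

Note that the product ΣΣ′ need not be symmetric, but it is similar to a symmetric positive matrix, so its square root has a real trace, and tr((ΣΣ′)^{1/2}) = tr((Σ′Σ)^{1/2}) makes the formula symmetric in its arguments.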

In the next subsection, we focus on Kendall shape spaces, as most of the geometry
of these spaces can be computed using the results of this section. Our implementation in
geomstats is, to the best of our knowledge, the first open-source Python implementation of
Kendall shape spaces, and allows efficient computation of parallel transport.

5.1.3 Application to Kendall shape spaces
The following subsection is inspired by Nava-Yazdani et al. (2020) but constitutes an
original contribution by putting all the code open source in our package geomstats, and
by improving the implementation of parallel transport. It is a collaboration with Elodie
Maignant and was presented at GSI 2021 in Guigui et al. (2021).
We revisit Example 2.5 (page 7) with the additional idea that the size of a set of
landmarks may be filtered out to define a shape. The study of these spaces, including their
mathematical structure, the properties of statistical distributions and estimation methods
for shape data and their applications to many scientific fields goes back to the late 1970’s
with the works of Mardia, Bookstein and Kendall among others. For historical notes on
this research area and an introduction to the applications, see Dryden and Mardia (2016,
Preface and Section 1.4). Recall that we consider the set of centred matrices of size m × k
as the space of configurations (or landmarks). We assume that m ≥ 2, and refer the reader
to Le and Kendall (1993) for more details.
To further remove the effects of scaling, we restrict to non-zero x (i.e. at least two
landmarks are different), and divide x by its Frobenius norm (written ‖·‖). This defines
the pre-shape space

S_m^k = { x ∈ M(m, k) | Σ_{i=1}^k x_i = 0, ‖x‖ = 1 },

which is identified with the hypersphere of dimension m(k − 1) − 1. The pre-shape space is
therefore a differential manifold whose tangent space at any x ∈ S_m^k is given by

T_x S_m^k = { w ∈ M(m, k) | Σ_{i=1}^k w_i = 0, tr(wᵀx) = 0 }.

Moreover, the rotation group SO(m) acts on S_m^k by matrix multiplication, and this
action corresponds to applying a rotation to each landmark individually. As SO(m) is
compact, this action is proper. However, this action is not free everywhere if m ≥ 3. This
makes the orbit space

Σ_m^k = { [x] | x ∈ S_m^k } = S_m^k / SO(m)

a “differential manifold with singularities where the action is not free”, and these points
correspond to matrices of rank m − 2 or less (i.e. some landmarks are aligned).
By Theorem 4.4.1 (page 67), the canonical projection π : x ↦ [x] is a submersion. For
any x ∈ S_m^k and A ∈ Skew(m), as [exp(tA)x] = [x] for all t ∈ R, the curve t ↦ exp(tA)x is
a curve in the fiber through x, so the vertical space at x is

V_x = { Ax | A ∈ Skew(m) } = Skew(m)x.

The pullback of the Frobenius metric on the pre-shape space allows us to define the horizontal
spaces:

H_x = { w ∈ T_x S_m^k | tr(Axwᵀ) = 0 ∀A ∈ Skew(m) }
    = { w ∈ T_x S_m^k | xwᵀ ∈ Sym(m) },

where Sym(m) is the space of symmetric matrices of size m. Lemma 1 from Nava-Yazdani
et al. (2020) allows us to compute the vertical component of any tangent vector.

Lemma 5.1.3. For any x ∈ S_m^k and w ∈ T_x S_m^k, the vertical component of w can be
computed as ver_x(w) = Ax where A solves the Sylvester equation:

A xxᵀ + xxᵀ A = wxᵀ − xwᵀ.   (5.8)

If rank(x) ≥ m − 1, the skew-symmetric solution A of Equation (5.8) is unique.

In practice, the Sylvester equation can be solved by an eigenvalue decomposition of xxᵀ
(see Example 5.2). This defines ver_x, the orthogonal projection on V_x. As T_x S_m^k = V_x ⊕ H_x,
any tangent vector w at x ∈ S_m^k may be decomposed into a horizontal and a vertical
component, by solving Equation (5.8) to compute ver_x(w), and then hor_x(w) = w − ver_x(w).
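Concretely, this decomposition can be sketched with numpy and scipy's Sylvester solver (the helper names are ours; points are stored as m × k matrices):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def to_preshape(x):
    """Center the landmarks (columns) and normalize to unit Frobenius norm."""
    x = x - x.mean(axis=1, keepdims=True)
    return x / np.linalg.norm(x)

def vertical_component(w, x):
    """Vertical part ver_x(w) = A x, with A solving Equation (5.8)."""
    xxt = x @ x.T
    skew = solve_sylvester(xxt, xxt, w @ x.T - x @ w.T)
    return skew @ x

def horizontal_component(w, x):
    """Horizontal part hor_x(w) = w - ver_x(w)."""
    return w - vertical_component(w, x)
```

A quick check: the horizontal part of any tangent vector w satisfies the characterization x hor_x(w)ᵀ ∈ Sym(m) given above.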
As the Frobenius metric is invariant to the action of SO(m), we can define the quotient
metric on Σ_m^k, and this makes π a Riemannian submersion. Furthermore, the Riemannian
distances d on S_m^k and d_Σ on Σ_m^k are related by

d_Σ(π(x), π(y)) = inf_{R ∈ SO(m)} d(x, Ry).

The optimal rotation R between any x, y is unique on a subset U of S_m^k × S_m^k, which allows
us to define the align map ω : U → S_m^k that maps (x, y) to Ry. In this case, d_Σ(π(x), π(y)) =
d(x, ω(x, y)) and xω(x, y)ᵀ ∈ Sym(m). It is useful to notice that ω(x, y) can be directly
computed by a pseudo-singular value decomposition of xyᵀ (Kendall and Le, 2009).
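A minimal numpy sketch of this alignment (the function names are ours; points are stored as m × k matrices of unit Frobenius norm):

```python
import numpy as np

def kendall_align(x, y):
    """Optimal rotation of y onto x: omega(x, y) = R y with R in SO(m)."""
    u, _, vt = np.linalg.svd(x @ y.T)
    # Pseudo-SVD: flip a sign so that det(R) = +1.
    if np.linalg.det(u @ vt) < 0:
        u[:, -1] *= -1
    return (u @ vt) @ y

def shape_distance(x, y):
    """d_Sigma([x], [y]) = d(x, omega(x, y)), the spherical distance."""
    aligned = kendall_align(x, y)
    return np.arccos(np.clip(np.sum(x * aligned), -1., 1.))
```

After alignment, x ω(x, y)ᵀ is symmetric and the quotient distance reduces to the spherical distance between x and ω(x, y).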

Remark 5.1.2. The alignment problem is similar to the canonical correlation analysis
problem (CCA) between two data sets:

max_{U,V ∈ O(m)} ⟨Ux, V y⟩.

Finally, in the case of the Kendall shape spaces, the quotient space cannot be seen
explicitly as a submanifold of some R^N. Moreover, the projection π and its derivative
dπ cannot be computed. However, the align map amounts to identifying the shape space
with a local horizontal section of the pre-shape space, and thanks to the characteristics of
Riemannian submersions mentioned in the previous subsections, all the computations can
be done in the pre-shape space.

Let Exp, Log, and d denote the operations of the pre-shape space S_m^k, which are given in
Equations (3.16), (3.17) (page 35). We obtain from Theorem 5.1.1, for any x, y ∈ S_m^k and
v ∈ T_x S_m^k:

Exp_{Σ,[x]}(dπ_x v) = π(Exp_x(hor_x(v))),
Log_{Σ,[x]}([y]) = dπ_x Log_x(ω(x, y)),
d_Σ([x], [y]) = d(x, ω(x, y)).
To end this section, we state Proposition 2 of Kim et al. (2021), which echoes
Remark 5.1.1 for the computation of parallel transport.

Proposition 5.1.4. Let γ be a horizontal C¹-curve in S_m^k and v be a horizontal tangent
vector at γ(0). Assume that rank(γ(s)) ≥ m − 1 except for finitely many s. Then the vector
field s ↦ v(s) along γ is horizontal and the projection of v(s) to T_{[γ(s)]}Σ_m^k is the parallel
transport of dπ_{γ(0)} v along [γ(s)] if and only if s ↦ v(s) is the solution of

v̇(s) = − tr(γ̇(s)v(s)ᵀ)γ(s) + A(s)γ(s),   v(0) = v,   (5.9)

where for every s, A(s) ∈ Skew(m) is the unique solution to

A(s)γ(s)γ(s)ᵀ + γ(s)γ(s)ᵀA(s) = γ̇(s)v(s)ᵀ − v(s)γ̇(s)ᵀ.   (5.10)
Equation (5.9) means that the covariant derivative of s ↦ v(s) along γ must be a
vertical vector at all times, defined by the matrix A(s) ∈ Skew(m). These equations can be
used to compute parallel transport in the shape space. To compute the parallel transport of
dπ_{γ(0)} w along [γ], Kim et al. propose the following method: one first chooses a discretization
time-step δ = 1/n, then repeats for every s = i/n, i = 0 . . . n:

1. Compute γ(s) and γ̇(s),

2. Solve the Sylvester Equation (5.10) to compute A(s) and the r.h.s. of (5.9),

3. Take a discrete Euler step to obtain ṽ(s + δ),

4. Project ṽ(s + δ) to T_{γ(s+δ)} S_m^k to obtain v̂(s + δ),

5. Project to the horizontal subspace: v(s + δ) ← hor(v̂(s + δ)),

6. s ← s + δ.
We notice that this method can be accelerated by a higher-order integration scheme, such
as Runge-Kutta (RK), by directly integrating the system v̇ = f (v, s), where f is a smooth
map given by Equations (5.9) and (5.10). In this case, steps 4 and 5 are not necessary. The
precision and complexity of this method are then bound to those of the integration scheme
used, as shown in Figure 5.1 for randomly generated orthogonal tangent vectors of unit
norm.
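As an illustration, steps 1–6 above can be sketched with numpy for a horizontal geodesic of the pre-shape sphere. This is a minimal Euler version under our own helper names, not the geomstats implementation:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def vertical_component(w, p):
    """Solve Equation (5.8) at p and return the vertical part of w."""
    ppt = p @ p.T
    skew = solve_sylvester(ppt, ppt, w @ p.T - p @ w.T)
    return skew @ p

def transport_shape(x, u, v, n_steps=100):
    """Euler scheme for Equations (5.9)-(5.10) along the geodesic
    gamma(s) = cos(s a) x + sin(s a) u / a, a = |u|, v horizontal at x."""
    a = np.linalg.norm(u)
    dt = 1. / n_steps
    for i in range(n_steps):
        s = i * dt
        gamma = np.cos(s * a) * x + np.sin(s * a) * u / a
        gamma_dot = -a * np.sin(s * a) * x + np.cos(s * a) * u
        # r.h.s. of (5.9): vertical term A(s) gamma(s) minus the normal term.
        ppt = gamma @ gamma.T
        skew = solve_sylvester(ppt, ppt, gamma_dot @ v.T - v @ gamma_dot.T)
        v = v + dt * (-np.sum(gamma_dot * v) * gamma + skew @ gamma)
        # Steps 4-5: project back to the tangent space, then horizontally.
        gamma_next = np.cos((s + dt) * a) * x + np.sin((s + dt) * a) * u / a
        v = v - np.sum(v * gamma_next) * gamma_next
        v = v - vertical_component(v, gamma_next)
    return v
```

Parallel transport is an isometry, so the Frobenius norm of the transported vector should stay approximately constant; the error decreases at the rate of the integration scheme, as in Figure 5.1.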

Figure 5.1: Convergence speed of the integration of Equation (5.9) for two orthonormal initial vectors v
and γ̇(0), with respect to the number of steps. The curve γ is a horizontal geodesic here. (The two panels,
for k = 4, m = 3 and k = 6, m = 3, plot the absolute error against the number of steps n for the Euler,
RK2 and RK4 integration schemes.)

5.1.4 Implementation
We now present an abstract class to construct a quotient metric. There are two scenarios.
In the first one, a submersion π : E → M is given between the total space and the base
manifold, together with a metric on E whose restriction to horizontal spaces is preserved
by π. This is the case of the Bures-Wasserstein metric (Examples 5.2, 5.3 and 5.5).

In the second scenario, we are only given

1. a total space E,

2. a group G acting freely, properly and smoothly on E,

3. a metric on E invariant to the action of G.

In this case, the base manifold cannot be represented explicitly, that is, as an embedded
submanifold of R^N. Thus, we cannot implement it as an OpenSet or LevelSet class on its
own. However, we can use the properties of the canonical projection, that is a Riemannian
submersion, to construct a metric on the quotient space. This metric is in fact defined
on horizontal spaces of the total space, and the base manifold is locally identified with a
section of the total space. This is the case of Kendall shape spaces (Section 5.1.3).
To model both cases, we use the structure of fiber bundle. The first scenario is a fiber
bundle by definition. For the second scenario, we notice that π : E → M and (G, ·) form a
principal fiber bundle. Conversely, if we endow a principal fiber bundle with a G-invariant
metric, we can define a quotient metric on the base manifold M. We thus choose to implement an abstract
FiberBundle class, as the main ingredient of a QuotientMetric class.
As the goal is to compute the Riemannian Exp, Log and distance maps according
to Equations (5.3), (5.7) and (5.6), we need to specify a section and the corresponding
horizontal lift, and the align map. Then the quotient metric is implemented as follows

class QuotientMetric(RiemannianMetric):
    """Quotient metric."""

    def __init__(self, fiber_bundle: FiberBundle, dim: int = None):
        if dim is None:
            if fiber_bundle.base is not None:
                dim = fiber_bundle.base.dim
            elif fiber_bundle.group is not None:
                dim = fiber_bundle.dim - fiber_bundle.group.dim
            else:
                raise ValueError('Either the base manifold, '
                                 'its dimension, or the group acting on the '
                                 'total space must be provided.')
        super().__init__(
            dim=dim, default_point_type=fiber_bundle.default_point_type)
        self.fiber_bundle = fiber_bundle
        self.group = fiber_bundle.group
        self.ambient_metric = fiber_bundle.ambient_metric

    def inner_product(
            self, tangent_vec_a, tangent_vec_b, base_point=None,
            point_above=None):
        """Compute the inner-product of two tangent vectors at a base point."""
        if point_above is None:
            if base_point is not None:
                point_above = self.fiber_bundle.lift(base_point)
            else:
                raise ValueError(
                    'Either a point (of the total space) or a base point (of '
                    'the quotient manifold) must be given.')
        hor_a = self.fiber_bundle.horizontal_lift(tangent_vec_a, point_above)
        hor_b = self.fiber_bundle.horizontal_lift(tangent_vec_b, point_above)
        return self.ambient_metric.inner_product(hor_a, hor_b, point_above)

    def exp(self, tangent_vec, base_point, **kwargs):
        """Compute the Riemannian exponential of a tangent vector."""
        lift = self.fiber_bundle.lift(base_point)
        horizontal_vec = self.fiber_bundle.horizontal_lift(tangent_vec, lift)
        return self.fiber_bundle.riemannian_submersion(
            self.ambient_metric.exp(horizontal_vec, lift))

    def log(self, point, base_point, **kwargs):
        """Compute the Riemannian logarithm of a point."""
        point_fiber = self.fiber_bundle.lift(point)
        bp_fiber = self.fiber_bundle.lift(base_point)
        aligned = self.fiber_bundle.align(point_fiber, bp_fiber, **kwargs)
        return self.fiber_bundle.tangent_riemannian_submersion(
            self.ambient_metric.log(aligned, bp_fiber), bp_fiber)

    def squared_dist(self, point_a, point_b, **kwargs):
        """Squared geodesic distance between two points."""
        lift_a = self.fiber_bundle.lift(point_a)
        lift_b = self.fiber_bundle.lift(point_b)
        aligned = self.fiber_bundle.align(lift_a, lift_b, **kwargs)
        return self.ambient_metric.squared_dist(aligned, lift_b)

Example 5.6: Implementation of the BW metric


In the setting of Examples 5.2, 5.3 and 5.5, which corresponds to the first scenario
as E = GL(n) and M = SPD(n), we create a new class that corresponds to GL(n)
with the fiber bundle structure. Note that all computations can actually be carried
out in closed form (see Bhatia et al., 2019); this specific example allows us to
test our general implementation. In this case, we use the minimization procedure to
compute the align map.

class BuresWassersteinBundle(GeneralLinear, FiberBundle):
    def __init__(self, n):
        super().__init__(
            n=n, base=SPDMatrices(n), group=SpecialOrthogonal(n),
            ambient_metric=MatricesMetric(n, n))

    @staticmethod
    def riemannian_submersion(point):
        return Matrices.mul(point, Matrices.transpose(point))

    @staticmethod
    def lift(point):
        return gs.linalg.cholesky(point)

    def tangent_riemannian_submersion(self, tangent_vec, base_point):
        product = Matrices.mul(
            base_point, Matrices.transpose(tangent_vec))
        return 2 * Matrices.to_symmetric(product)

    def horizontal_lift(self, tangent_vec, point_above=None, base_point=None):
        if base_point is None:
            if point_above is not None:
                base_point = self.riemannian_submersion(point_above)
            else:
                raise ValueError(
                    'Either a point in the fiber or a base point in the base '
                    'manifold must be given.')
        sylvester = gs.linalg.solve_sylvester(
            base_point, base_point, tangent_vec)
        return Matrices.mul(sylvester, point_above)

In the second scenario, we only compute in E so both submersion and lift maps are the
identity of E. This can be set by default and overridden as in Example 5.6. Furthermore, in
that case, tangent vectors to M are identified with horizontal vectors of E, so we need to
compute the horizontal decomposition of a tangent vector to E. Thus, either the horizontal
or vertical projection needs to be implemented, and the other one can be deduced.
In the first scenario, these are given by using dπ and the horizontal lift, as vertical
vectors are in the kernel of dπ by definition.
This explains the following FiberBundle class (the align method is not duplicated from
above in the interest of space):

class FiberBundle(Manifold, abc.ABC):
    """Class for (principal) fiber bundles."""

    def __init__(
            self, dim: int, base: Manifold = None,
            group: LieGroup = None, ambient_metric: RiemannianMetric = None,
            group_action=None, **kwargs):
        super().__init__(dim=dim, **kwargs)
        self.base = base
        self.group = group
        self.ambient_metric = ambient_metric

        if group_action is None and group is not None:
            group_action = group.compose
        self.group_action = group_action

    @staticmethod
    def riemannian_submersion(point):
        """Project a point to base manifold."""
        return point

    @staticmethod
    def lift(point):
        """Lift a point to total space."""
        return point

    def align(self, point, base_point,
              max_iter=25, verbose=False, tol=gs.atol):
        """Align point to base_point by optimization in the Lie algebra."""
        pass

    def tangent_riemannian_submersion(self, tangent_vec, base_point):
        """Project a tangent vector to base manifold."""
        return self.horizontal_projection(tangent_vec, base_point)

    def horizontal_lift(self, tangent_vec, point_above=None, base_point=None):
        """Lift a tangent vector to a horizontal vector in the total space."""
        if point_above is None:
            if base_point is not None:
                point_above = self.lift(base_point)
            else:
                raise ValueError(
                    'Either a point (of the total space) or a base point (of '
                    'the base manifold) must be given.')
        return self.horizontal_projection(tangent_vec, point_above)

    def horizontal_projection(self, tangent_vec, base_point):
        """Project to horizontal subspace."""
        try:
            return tangent_vec - self.vertical_projection(
                tangent_vec, base_point)
        except (RecursionError, NotImplementedError):
            return self.horizontal_lift(
                self.tangent_riemannian_submersion(
                    tangent_vec, base_point), base_point)

    def vertical_projection(self, tangent_vec, base_point, **kwargs):
        """Project to vertical subspace."""
        try:
            return tangent_vec - self.horizontal_projection(
                tangent_vec, base_point)
        except RecursionError:
            raise NotImplementedError

Example 5.7: Kendall shape metric


With the above construction of the second scenario, the Kendall shape metric is simply
a subclass of the QuotientMetric class. We also add the parallel transport method
discussed above and used in Figure 5.1.

class KendallShapeMetric(QuotientMetric):
    """Quotient metric on the shape space."""

    def __init__(self, k_landmarks, m_ambient):
        bundle = PreShapeSpace(k_landmarks, m_ambient)
        super().__init__(
            fiber_bundle=bundle,
            dim=bundle.dim - int(m_ambient * (m_ambient - 1) / 2))

    def parallel_transport(
            self, tangent_vec_a, tangent_vec_b, base_point, n_steps=100,
            step='rk4'):
        """Compute the parallel transport of a tangent vec along a geodesic."""
        horizontal_a = self.fiber_bundle.horizontal_projection(
            tangent_vec_a, base_point)
        horizontal_b = self.fiber_bundle.horizontal_projection(
            tangent_vec_b, base_point)

        def force(state, time):
            gamma_t = self.ambient_metric.exp(time * horizontal_b, base_point)
            speed = self.ambient_metric.parallel_transport(
                horizontal_b, time * horizontal_b, base_point)
            coef = self.inner_product(speed, state, gamma_t)
            normal = gs.einsum('...,...ij->...ij', coef, gamma_t)

            align = gs.matmul(Matrices.transpose(speed), state)
            right = align - Matrices.transpose(align)
            left = gs.matmul(Matrices.transpose(gamma_t), gamma_t)
            skew_ = gs.linalg.solve_sylvester(left, left, right)
            vertical_ = - gs.matmul(gamma_t, skew_)
            return vertical_ - normal

        flow = integrate(force, horizontal_a, n_steps=n_steps, step=step)
        return flow[-1]

Example 5.8: Full rank correlation matrices


As a last example, we implement the metric on the set of correlation matrices
described in Thanwerdas and Pennec (2021) using our abstract FiberBundle and
QuotientMetric classes.
The set of full-rank correlation matrices Corr(n) is a submanifold of SPD(n)
formed by matrices with unit diagonal. Define the action of positive diagonal matrices
on SPD(n) by congruence:

. : (D, Σ) ∈ Diag⁺(n) × SPD(n) ↦ DΣD ∈ SPD(n).

This action is smooth, free, and proper (David and Gu, 2019). The quotient manifold
SPD(n)/Diag⁺(n) is thus well defined, and can be identified with Corr(n) by the
map [Σ] ↦ D_Σ^{−1/2} . Σ, where D_Σ is the diagonal matrix with coefficients Σ_ii.
Let π : SPD(n) → Corr(n) be the canonical projection composed with this map,
i.e. Σ ↦ D_Σ^{−1/2} . Σ, and consider the affine-invariant metric defined on SPD(n) by

g_Σ(V, W) = tr(Σ⁻¹VΣ⁻¹W).

This metric will be detailed in Example 5.14 in Section 5.3. It is clear that it is
invariant under the action of diagonal matrices.
Therefore we can define the quotient metric such that π is a Riemannian submersion.
The horizontal and vertical spaces are given in Thanwerdas and Pennec (2021,
Theorem 1). The submersion and its differential, the vertical projection and the horizontal
lift are computed in closed form, so we are in the first scenario, and the quotient
metric is simply

class FullRankCorrelationAffineQuotientMetric(QuotientMetric):
    """Class for the quotient of the affine-invariant metric."""

    def __init__(self, n):
        super().__init__(
            fiber_bundle=CorrelationMatricesBundle(n=n))

where the FiberBundle class is used to define CorrelationMatricesBundle as follows:

class CorrelationMatricesBundle(SPDMatrices, FiberBundle):
    """Fiber bundle for the quotient metric on correlation matrices."""

    def __init__(self, n):
        super().__init__(
            n=n, base=CorrelationMatrices(n),
            ambient_metric=SPDMetricAffine(n), group_dim=n,
            group_action=CorrelationMatrices.diag_action)

    @staticmethod
    def riemannian_submersion(point):
        """Compute the correlation matrix associated to an SPD matrix."""
        diagonal = Matrices.diagonal(point) ** (-.5)
        aux = gs.einsum('...i,...j->...ij', diagonal, diagonal)
        return point * aux

    def tangent_riemannian_submersion(self, tangent_vec, base_point):
        """Compute the differential of the Riemannian submersion."""
        diagonal_bp = Matrices.diagonal(base_point)
        diagonal_tv = Matrices.diagonal(tangent_vec)

        diagonal = diagonal_tv / diagonal_bp
        aux = base_point * (diagonal[..., None, :] + diagonal[..., :, None])
        mat = tangent_vec - .5 * aux
        return CorrelationMatrices.diag_action(diagonal_bp ** (-.5), mat)

    def vertical_projection(self, tangent_vec, base_point, **kwargs):
        """Compute the vertical projection wrt the affine-invariant metric."""
        n = self.n
        inverse_base_point = GeneralLinear.inverse(base_point)
        operator = gs.eye(n) + base_point * inverse_base_point
        inverse_operator = GeneralLinear.inverse(operator)
        vector = gs.einsum(
            '...ij,...ji->...i', inverse_base_point, tangent_vec)
        diagonal = gs.einsum('...ij,...j->...i', inverse_operator, vector)
        return base_point * (diagonal[..., None, :] + diagonal[..., :, None])

    def horizontal_lift(self, tangent_vec, base_point=None, fiber_point=None):
        """Compute the horizontal lift wrt the affine-invariant metric."""
        if fiber_point is None and base_point is not None:
            return self.horizontal_projection(tangent_vec, base_point)
        diagonal_point = Matrices.diagonal(fiber_point) ** 0.5
        lift = CorrelationMatrices.diag_action(diagonal_point, tangent_vec)
        return self.horizontal_projection(lift, base_point=fiber_point)

5.2 Homogeneous spaces


We now come back to the particular case of homogeneous spaces, which are quotient spaces
where the total space is a Lie group E = G under the action of a subgroup H ⊂ G,
and M = G/H. We focus on Riemannian metrics that are invariant to the group action,
that is, for which the action of any g ∈ G on M is an isometry. This case is fundamental
as it is the simplest way in which all the geometry of M is determined by that of G. If
additionally the metric on G is bi-invariant, we say the metric on G/H is normal, and all
the computations can be carried out in closed form.

5.2.1 Characterization
Recall that a homogeneous space M ≃ G/H is a manifold with a smooth transitive action
of G on M, and corresponds to the orbits of the right-action of a closed subgroup H of
G. As the map π : G → G/H is a submersion (see Theorem 4.4.1 page 67), we shall see
that invariant metrics on G/H are in fact particular cases of the quotient metrics of the
previous section. Indeed, a particular case of Theorem 5.1.2 (page 77) gives the following
proposition.

Proposition 5.2.1. Let G be a Lie group and H ⊆ G a closed subgroup of G. Write
o = eH = H ∈ G/H, and g, h the Lie algebras of G and H respectively. Let g be a left-invariant
Riemannian metric on G that is also right-invariant by H. Then there exists a unique
Riemannian metric on G/H that is invariant to the action of G and such that dπ_e is an
isometry between h^⊥ ⊂ g and T_o G/H. In fact, π is a Riemannian submersion.
Indeed, Theorem 5.1.2 applies with E = G and G = H, as the action of H on G is
smooth, free and proper (by Theorem 4.4.2), and the metric on G is invariant to this action.
This defines a metric on G/H, and we can show that it is invariant to G. Indeed, for any
x, y ∈ G, write M_{yx⁻¹} : G/H → G/H the map that sends some [p] to [yx⁻¹p], and L_{yx⁻¹} : G → G
the left translation of G by yx⁻¹. Then by definition of the action of G on M, we have the
following commutation:

π ∘ L_{yx⁻¹} = M_{yx⁻¹} ∘ π.

Differentiating the above expression at x, and as dπ_x, dπ_y and d(L_{yx⁻¹})_x are isometries, we
conclude that d(M_{yx⁻¹})_{[x]} is also an isometry.

5.2.2 Existence
The above proposition thus states that we need a left-invariant metric on G that is also
right-invariant by H to define an invariant metric on G/H. There are necessary and sufficient
conditions on G and H for these to exist (see e.g. Gallier and Quaintance, 2020, Proposition
22.22).

Proposition 5.2.2. If G acts faithfully on G/H and if g admits a decomposition g = h ⊕ m
with Ad_H(m) ⊂ m, then G-invariant metrics on G/H are in one-to-one correspondence
with Ad_H-invariant scalar products on m. These exist if and only if the closure of the
group Ad_H(m) is compact. Conversely, if G/H admits a G-invariant metric, then G admits
a left-invariant metric which is right-invariant under H, and the restriction of this metric
to H is bi-invariant. The decomposition of g is given by m = h^⊥ in this case.

Note that if H is connected, the condition Ad_H(m) ⊂ m is equivalent to [h, m] ⊂ m. If H
is compact, then Ad_H(m) is compact, so a G-invariant metric exists. The above proposition
suggests the following definitions due to Nomizu (1954).
Definition 5.2.1 (Reductive homogeneous space). Let G be a Lie group and H a closed
subgroup of G. Write g, h their Lie algebras. We say that the homogeneous space G/H is
reductive if there exists some subspace m ⊂ g such that

g = h ⊕ m   and   Ad_h(m) ⊆ m   ∀h ∈ H.

In this case, m is isomorphic to T_o G/H via dπ_e. In fact, h is the vertical subspace at
e, and m is the horizontal space. The notion of reductive homogeneous space is important
because it is a sufficient condition for M = G/H to admit a G-invariant connection, whose
geodesics are projections (by π) of the one-parameter subgroups of G. This is not sufficient
however to obtain such a property from the Levi-Civita connection of some metric, and an
additional condition will be given in the next subsection. In accordance with the decomposition
g = h ⊕ m, any element of the Lie algebra can be decomposed as the sum of an element of
m and an element of h. For any x ∈ g we write this decomposition as x = x_m + x_h where
x_m ∈ m and x_h ∈ h.

Example 5.9: Invariant metric on the Stiefel manifold


Recall from Example 4.17 (page 69) that with G = SO(n) and

H = { [ I_k  0 ; 0  R ] | R ∈ SO(n − k) } ≃ SO(n − k),

we obtain the Stiefel manifold S(k, n) = G/H, the set of orthonormal k-frames
represented by n × k orthogonal matrices. The equivalence class of some Q =
(U, U⊥) ∈ SO(n) is [Q] = QH = (U, U⊥R) where R is any matrix in SO(n − k). The
canonical projection is therefore given by π : Q = (U, U⊥) ↦ U, and the origin o = eH
of G/H is π(I_n), the first k columns of the identity. Thus we can write π as the projection
on the first k columns. Moreover, g is the set of skew-symmetric matrices, so that we
have g = h ⊕ m with

h = { [ 0  0 ; 0  S ] | S ∈ so(n − k) },

m = { [ T  −Aᵀ ; A  0 ] | T ∈ so(k), A ∈ M_{n−k,k}(R) }.

It is straightforward to check that [h, m] ⊆ m, and as H is connected, S(k, n) is a
reductive homogeneous space.
As H is compact, by Proposition 5.2.2 there exists a G-invariant metric on S(k, n).
Indeed, as the pullback of the Frobenius metric on G is bi-invariant on G, hence
on H, Proposition 5.2.1 applies and the metric on S(k, n) is defined such that
π : Q = (U, U⊥) ↦ U is a Riemannian submersion. For X = [ T  −Aᵀ ; A  0 ] ∈ m,
dπ_o X = [ T ; A ] and we obtain

⟨[ T ; A ], [ S ; B ]⟩ := (1/2) tr( [ T  −Aᵀ ; A  0 ]ᵀ [ S  −Bᵀ ; B  0 ] )
                        = (1/2) tr(TᵀS) + tr(AᵀB).   (5.11)

A more convenient way of representing a tangent vector at U is by X = US +
(I − UUᵀ)A. Then one can show that the SO(n)-invariant metric defined in Equa-
tion (5.11) can be written

⟨X₁, X₂⟩_U = tr(X₁ᵀ(I_n − (1/2) UUᵀ)X₂).

For more on this metric, see Gallier and Quaintance (2020, Section 22.5).
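As a sanity check, the equality of the two expressions of the metric in Example 5.9 can be verified numerically; the second form is written here as tr(X₁ᵀ(I_n − UUᵀ/2)X₂), and tangent vectors are represented as X_i = U T_i + U⊥ A_i (notation and variable names ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2

# Orthogonal completion Q = (U, U_perp) obtained by a QR decomposition.
q, _ = np.linalg.qr(rng.standard_normal((n, n)))
u, u_perp = q[:, :k], q[:, k:]

def random_tangent_blocks():
    t = rng.standard_normal((k, k))
    t = t - t.T  # T in so(k)
    a = rng.standard_normal((n - k, k))
    return t, a

t1, a1 = random_tangent_blocks()
t2, a2 = random_tangent_blocks()
# Tangent vectors at U in the ambient representation.
x1 = u @ t1 + u_perp @ a1
x2 = u @ t2 + u_perp @ a2

# Metric of Equation (5.11) in the (T, A) representation...
lhs = 0.5 * np.trace(t1.T @ t2) + np.trace(a1.T @ a2)
# ...agrees with the expression at U.
rhs = np.trace(x1.T @ (np.eye(n) - 0.5 * (u @ u.T)) @ x2)
```

Both expressions agree to machine precision.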

5.2.3 Properties
Recall that for a reductive Lie algebra, we have the decomposition [X, Y] = [X, Y]_m + [X, Y]_h.
By using the formulas of the connection of an invariant metric on a Lie group (see Section 4.3)
and of a Riemannian submersion (Section 5.1), one can show the following (see e.g. Gallier
and Quaintance, 2020, Propositions 22.25 and 22.27).

Theorem 5.2.1. Let G be a Lie group and H a closed subgroup of G. If there exists an
Ad_H-invariant inner product on m, then the Levi-Civita connection of the induced metric
on G/H is given by ∇_X Y = −(1/2)[X, Y]_m and the geodesics are projections of one-parameter
subgroups if and only if

⟨x, [z, y]_m⟩ + ⟨[z, x]_m, y⟩ = 0   ∀x, y, z ∈ m.   (5.12)

In this case, we say that G/H is naturally reductive.

This property allows us to derive closed-form expressions for the geodesics in many cases,
as one-parameter subgroups are given by the matrix exponential in classical Lie groups. A
similar behavior holds for curvature and parallel transport. Many formulas implemented in
geomstats can be retrieved this way.

Example 5.10: Stiefel exponential map
Following Example 5.9 and applying Theorem 5.2.1, we can verify that the metric
satisfies Equation (5.12). Therefore, the exponential map is given by

Exp_U( [ S ; A ] ) = (U, U⊥) exp( [ S  −Aᵀ ; A  0 ] ) [ I_k ; 0 ].

This expression can be simplified with a QR decomposition of A, and its inverse (the
logarithm) can be computed recursively (see Zimmermann, 2017, for more details).
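Under this block representation, the exponential can be sketched directly with scipy's matrix exponential (the function name is ours; U⊥ is any orthonormal completion of U):

```python
import numpy as np
from scipy.linalg import expm

def stiefel_exp(u, u_perp, s, a):
    """Exp_U for the tangent vector represented by blocks (S, A)."""
    n, k = u.shape
    block = np.zeros((n, n))
    block[:k, :k] = s        # S in so(k)
    block[:k, k:] = -a.T
    block[k:, :k] = a        # A in M(n-k, k)
    q = np.hstack([u, u_perp])
    # (U, U_perp) exp([S, -A^T; A, 0]) [I_k; 0]
    return (q @ expm(block))[:, :k]
```

The result has orthonormal columns, since the exponential of a skew-symmetric matrix is a rotation and taking the first k columns of an orthogonal matrix yields an orthonormal k-frame.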

5.3 Symmetric spaces


To conclude this section, we briefly present symmetric spaces, as they can be defined from
a homogeneous space G/H with an additional tool defined on G. This results in one of
the simplest geometries, where the geodesics, parallel transport and curvature can be
computed in closed form. We start with a more intrinsic definition and will connect with this
description afterwards. Similarly to that of a homogeneous space, the structure of a symmetric space
does not necessarily require a Riemannian metric but only an affine connection. We first
focus on the Riemannian case for simplicity, and give a few remarks on the more general
case, referring to it as the affine symmetric case. This notion was introduced by Cartan
(1926), who achieved a full classification of symmetric spaces. The most complete reference
is Helgason (1979), and a good exposition of the non-Riemannian case is given in Kobayashi
and Nomizu (1996b, Chapter XI) and Postnikov (2001, Chapters 4–10).
We first define the geodesic symmetries, the maps defined locally that revert geodesics.

Definition 5.3.1 (Geodesic symmetry). Let (M, g) be a Riemannian manifold. Let x ∈ M
and U ⊂ M be an open neighborhood of x such that the exponential map is injective on U.
The geodesic symmetry at x is the map defined by

s_x : U → M,   y ↦ Exp_x(− Log_x(y)).

It is clear that for any x ∈ M, x is an isolated fixed point of s_x, and that (ds_x)_x = − id,
where id is the identity transformation of T_x M.
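On the unit sphere, where Exp and Log have closed forms, the geodesic symmetry is easy to write down and is an involution away from the cut locus (a minimal numpy sketch; function names ours):

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere at x, for tangent v."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return x.copy()
    return np.cos(norm) * x + np.sin(norm) * v / norm

def sphere_log(x, y):
    """Logarithm map on the unit sphere (x and y not antipodal)."""
    cos_t = np.clip(x @ y, -1., 1.)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(x)
    w = y - cos_t * x
    return theta * w / np.linalg.norm(w)

def geodesic_symmetry(x, y):
    """s_x(y) = Exp_x(-Log_x(y))."""
    return sphere_exp(x, -sphere_log(x, y))
```

Applying s_x twice returns the original point, and x itself is a fixed point, matching the definition above.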

Definition 5.3.2 (Locally symmetric space). Let (M, g) be a Riemannian manifold. M is
called locally symmetric if for any x ∈ M, the geodesic symmetry s_x is an isometry.

Remark 5.3.1. This definition is valid in an affine space where the notion of isometry is
replaced by that of affine map, i.e., a map that preserves the connection.

The property that defines locally symmetric spaces has direct consequences on its
curvature tensor (Kobayashi and Nomizu, 1996b, Chapter XI, Theorem 6.2).

Theorem 5.3.1. A Riemannian manifold M is locally symmetric if and only if ∇R = 0.

We shall say that the curvature of M is covariantly constant, and this will simplify the
parallel transport on locally symmetric spaces.

Definition 5.3.3 (Symmetric Space). (M, g) is called (globally) symmetric if the geodesic
symmetries are defined on the whole manifold M and are isometries.

The two notions are equivalent up to topological constraints, as stated in the following
theorem (Kobayashi and Nomizu, 1996b, Chapter XI, Theorems 6.3–6.4).

Theorem 5.3.2. A geodesically complete, simply connected, locally symmetric space is
globally symmetric. Conversely, every globally symmetric space is geodesically complete.

We now come to the interesting theorem that relates symmetric spaces to homogeneous
spaces (Kobayashi and Nomizu, 1996b, Chapter XI, Theorem 6.5).

Theorem 5.3.3. The group Isom(M ) of isometries of M is a Lie group that acts transitively
on M ; M is thus a homogeneous space. Let G be the largest connected group of isometries
of M , and consider a reference point o ∈ M . Let H be its stabilizer under the action of G.
Then H is compact and M ' G/H.

Define G as in the above theorem. We can define an additional structure on G that is the
essence of what differentiates symmetric spaces from homogeneous spaces.

Remark 5.3.2.

• The composition of two geodesic symmetries belongs to G, and is called a transvection.

• Let γ be a geodesic through o; then the transvections sγ(t) ◦ so form a one-parameter
subgroup of G.

Theorem 5.3.4. Let M = G/H be a symmetric space, define on G the map σ : g 7→ so ◦g◦so .

• σ is involutive: σ ◦ σ = Id,

• σ is a group homomorphism, i.e. σ(g ◦ h) = σ(g) ◦ σ(h), and thus an automorphism,

• The set Gσ of fixed points of σ is a closed subgroup, and with Gσ0 its identity
component, we have Gσ0 ⊂ H ⊂ Gσ (implying dim(H) = dim(Gσ )).

Define t = {v ∈ g, dσe (v) = v} and m = {v ∈ g, dσe (v) = −v}. Then t coincides with the Lie
algebra h of H and G/H is naturally reductive with g = t ⊕ m, i.e. [t, t] ⊂ t and [t, m] ⊂ m,
and we have the additional property:
[m, m] ⊂ t.
As a consequence (Theorem 5.2.1), the geodesics of a symmetric space are thus projections
of one-parameter subgroups. In fact, symmetric spaces are the only homogeneous spaces
with an involutive automorphism as described in Theorem 5.3.4.
Definition 5.3.4 (Symmetric pair). A symmetric pair is a triplet (G, H, σ) where G is a
connected Lie group, H a closed subgroup of G and σ is an involutive automorphism of G,
such that its set of fixed points Gσ satisfies Gσ0 ⊂ H ⊂ Gσ .
Remark 5.3.3.
• If there exists a ∈ G s.t. a2 = e, then σ : g 7→ a ◦ g ◦ a−1 is an involutive automorphism,
and its set of fixed points Gσ is a closed (normal) subgroup of G, so that (G, Gσ , σ)
forms a symmetric pair.

• The inversion g 7→ g −1 is an automorphism if and only if G is commutative. This is
very restrictive, and in general the involution is not the inversion map.

• In fact one can show (see e.g. Cheeger and Ebin, 1975, Proposition 3.37) that a simply
connected group G possesses an involutive automorphism σ if and only if its Lie
algebra admits a decomposition g = h ⊕ m with [h, h] ⊂ h, [h, m] ⊂ m and [m, m] ⊂ h.
We now see how to recover a symmetric space from a symmetric pair. Recall that
π : g ∈ G → gH ∈ G/H is the canonical projection of the quotient G/H, and that G acts
on G/H by g1 . (g2 H) = (g1 g2 )H. Let o = eH = H, and so : gH 7→ σ(g)H, i.e. so ◦ π = π ◦ σ.
Theorem 5.3.5. Let (G, H, σ) be a symmetric pair such that H and Gσ0 are compact. Define
h = {v ∈ g, dσe (v) = v} and m = {v ∈ g, dσe (v) = −v}. Then M = G/H is naturally
reductive with the decomposition g = h ⊕ m. Furthermore, with the family of symmetries
defined at any x = gH ∈ M by
sx = g ◦ so ◦ g −1 ,
M is a globally symmetric space.
Remark 5.3.4. The assumption that H and Gσ0 are compact is sufficient to ensure the
existence of a G-invariant metric on G/H such that it is naturally reductive. It is not
necessary, however, to show that G/H is affine symmetric, but then the connection may not
be the Levi-Civita connection of any metric. If a G-invariant metric does exist on an affine
symmetric space M = G/H, its Levi-Civita connection coincides with the connection of
the affine symmetric structure (Kobayashi and Nomizu, 1996b, Chapter XI, Theorem 3.3).

The last theorem, along with Remark 5.3.3, now makes it easier to exemplify the notion
of symmetric space.

Example 5.11: Hypersphere as a symmetric space


Recall from Example 4.16 (page 68) that the hypersphere S d can be seen as the
quotient of G = SO(d + 1) by H ' SO(d) defined by
H = { diag(1, R) | R ∈ SO(d) } .

Define J = diag(−1, Id ). Obviously J 2 = Id+1 , so that the map defined by σ : P ∈
G 7→ JP J ∈ G is an involutive automorphism, and it is straightforward to check
that Gσ = H. Then (G, H, σ) is a symmetric pair and by Theorem 5.3.5, S d is a
symmetric space. The Lie algebra decomposes as so(d + 1) = h ⊕ m with

h = { diag(0, S) | S ∈ so(d) } ,
m = { ( 0 −u> ; u 0 ) | u ∈ Rd } ,

where ( a b ; c d ) denotes the block matrix with rows ( a b ) and ( c d ).

From the expressions of the Exp and Log maps, we can compute the symmetry at
x ∈ S d : for any y ∈ S d ,

sx (y) = 2 hx, yi x − y.
Similarly, the upper hyperboloid and the Euclidean space Rd are also symmetric.
Thus, all constant-curvature spaces are symmetric.
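The closed form above can be cross-checked against the composition Expx ◦ (− Logx ) using the standard sphere formulas. The following is a minimal numpy sketch (independent of geomstats):

```python
import numpy as np

def sphere_log(x, y):
    # Log map on the unit sphere: initial velocity at x of the geodesic to y.
    cos_t = np.clip(x @ y, -1.0, 1.0)
    theta = np.arccos(cos_t)
    if np.isclose(theta, 0.0):
        return np.zeros_like(x)
    return theta / np.sin(theta) * (y - cos_t * x)

def sphere_exp(x, v):
    # Exp map on the unit sphere: follow the geodesic from x with velocity v.
    norm = np.linalg.norm(v)
    if np.isclose(norm, 0.0):
        return x
    return np.cos(norm) * x + np.sin(norm) / norm * v

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 3))
x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)

# Geodesic symmetry s_x(y) = Exp_x(-Log_x(y)) matches 2<x, y>x - y.
sym = sphere_exp(x, -sphere_log(x, y))
assert np.allclose(sym, 2 * (x @ y) * x - y)
```

The same check can be written with the Hypersphere metric of geomstats.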

Example 5.12: The Grassmannian as a symmetric space


The Grassmann manifold Gr(k, n) (Example 4.18 page 69) is the set of k-
dimensional subspaces of Rn , and is identified with O(n)/(O(k) × O(n − k)). Define
J = diag(Ik , −In−k ). Obviously J 2 = In , so that the map defined by σ : P ∈ G 7→
JP J ∈ G is an involutive automorphism with fixed points

Gσ = { diag(Q, R) | Q ∈ O(k), R ∈ O(n − k), det(Q) det(R) = 1 } ,

i.e. Gσ = S(O(k) × O(n − k)), with Gσ0 = SO(k) × SO(n − k). Note that if we use
H = Gσ0 , we restrict to oriented k-subspaces, and consider in this case the oriented
Grassmann manifold. In both cases, we have

h = { diag(S, T ) | S ∈ so(k), T ∈ so(n − k) } ,
m = { ( 0 −A> ; A 0 ) | A ∈ Mn−k,k (R) } .

Then (G, H, σ) is a symmetric pair and by Theorem 5.3.5, Gr(k, n) is a symmetric
space. Given any P = QPk Q> ∈ Gr(k, n), the symmetry at P is sP : P̃ 7→
(QJQ> )P̃ (QJQ> ). The symmetric space structure allows us to deduce many properties
of Gr(k, n), namely that it is geodesically complete, hence a complete metric space
(by the Hopf-Rinow theorem, page 37), and that its exponential map is surjective at
all points.
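The symmetry formula can be checked numerically by representing points of Gr(k, n) as projection matrices; the following numpy sketch (our own construction, not geomstats code) verifies that sP fixes P , maps projectors to projectors, and is involutive:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2
q, _ = np.linalg.qr(rng.normal(size=(n, n)))

j = np.diag(np.concatenate([np.ones(k), -np.ones(n - k)]))
p = q[:, :k] @ q[:, :k].T   # projector onto the k-subspace spanned by q[:, :k]
s = q @ j @ q.T             # the reflection Q J Q^T defining s_P

# s_P fixes P itself ...
assert np.allclose(s @ p @ s, p)

# ... and maps any other projector (point of Gr(k, n)) to a projector.
q2, _ = np.linalg.qr(rng.normal(size=(n, n)))
p2 = q2[:, :k] @ q2[:, :k].T
image = s @ p2 @ s
assert np.allclose(image @ image, image)
assert np.allclose(image, image.T)

# The symmetry is involutive.
assert np.allclose(s @ image @ s, p2)
```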

Example 5.13: Stiefel manifold as a symmetric space


We now give an example of a homogeneous space that is not symmetric: the Stiefel
manifold St(k, n). Indeed, recall that we have St(k, n) = G/H with G = SO(n) and
H = SO(n − k), and the reductive decomposition so(n) = h ⊕ m with

h = { diag(0, T ) | T ∈ so(n − k) } ,
m = { ( S −A> ; A 0 ) | S ∈ so(k), A ∈ Mn−k,k (R) } .

We can check that [m, m] 6⊂ h, so that St(k, n) is not symmetric with this decomposi-
tion.

Example 5.14: The Affine-Invariant metric


Recall that SP D(n) is a homogeneous space with (restricting to a connected
component) G = GL+ (n) and H = SO(n), and canonical projection π : A 7→ AA> .
Define σ : A ∈ G 7→ A−> . It is clear that σ is an involutive
automorphism with Gσ = H. Then (G, H, σ) is a symmetric pair and by Theorem 5.3.5,
SP D(n) is a symmetric space. From σ and π we deduce the symmetry, for Σ, Λ ∈
SP D(n): sΣ (Λ) = ΣΛ−1 Σ.
The affine-invariant (AI) metric is defined as the quotient metric on G/H of the
left-invariant metric on G that coincides with the Frobenius metric at I. Its expression

is thus, at any Σ ∈ SP D(n) and for all V, W ∈ Sym(n),

gΣ (V, W ) = tr(Σ−1 V Σ−1 W ).

From the projection of one-parameter subgroups we deduce, for all Σ, Σ1 , Σ2 ∈
SP D(n) and all W ∈ Sym(n),

ExpΣ (W ) = Σ^{1/2} exp(Σ^{−1/2} W Σ^{−1/2} ) Σ^{1/2} ,
LogΣ1 (Σ2 ) = Σ1^{1/2} log(Σ1^{−1/2} Σ2 Σ1^{−1/2} ) Σ1^{1/2} ,

where, when not indexed, exp and log refer to the matrix operators. Finally, let

Pt = Σ^{1/2} exp( (t/2) Σ^{−1/2} W Σ^{−1/2} ) Σ^{−1/2} .

The parallel transport of V ∈ Sym(n) from Σ, along the geodesic with initial velocity
W ∈ Sym(n), at time t is (Yair et al., 2019)

Π^t_{0,W} V = Pt V Pt> .
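These closed forms are easy to implement and test. The following is a minimal numpy sketch (independent of geomstats; the symmetric matrix functions are computed through eigendecompositions) that checks that Log inverts Exp and that the parallel transport above is an isometry between tangent spaces:

```python
import numpy as np

def sym_apply(mat, fn):
    # Apply a scalar function to a symmetric matrix through its eigenvalues.
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(fn(vals)) @ vecs.T

def ai_exp(sigma, w):
    # Exp_Sigma(W) = Sigma^{1/2} exp(Sigma^{-1/2} W Sigma^{-1/2}) Sigma^{1/2}.
    s = sym_apply(sigma, np.sqrt)
    s_inv = np.linalg.inv(s)
    return s @ sym_apply(s_inv @ w @ s_inv, np.exp) @ s

def ai_log(sigma_1, sigma_2):
    s = sym_apply(sigma_1, np.sqrt)
    s_inv = np.linalg.inv(s)
    return s @ sym_apply(s_inv @ sigma_2 @ s_inv, np.log) @ s

def ai_inner(sigma, v, w):
    # Affine-invariant metric g_Sigma(V, W) = tr(Sigma^-1 V Sigma^-1 W).
    si = np.linalg.inv(sigma)
    return np.trace(si @ v @ si @ w)

def ai_transport(sigma, w, v, t=1.0):
    # Parallel transport of v along the geodesic with initial velocity w.
    s = sym_apply(sigma, np.sqrt)
    s_inv = np.linalg.inv(s)
    p = s @ sym_apply(t / 2 * (s_inv @ w @ s_inv), np.exp) @ s_inv
    return p @ v @ p.T

rng = np.random.default_rng(1)
a = rng.normal(size=(3, 3))
sigma = a @ a.T + 3 * np.eye(3)
w = rng.normal(size=(3, 3))
w = (w + w.T) / 2
v = rng.normal(size=(3, 3))
v = (v + v.T) / 2

# Log inverts Exp, and the transport is isometric between tangent spaces.
assert np.allclose(ai_log(sigma, ai_exp(sigma, w)), w)
end_point = ai_exp(sigma, w)
v_t = ai_transport(sigma, w, v)
assert np.isclose(ai_inner(sigma, v, v), ai_inner(end_point, v_t, v_t))
```

The same operations are exposed in geomstats by the SPD matrices classes.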

We now focus on the case of Lie groups themselves, which can be seen as symmetric
spaces. However, one must be careful about which structure, metric or affine, is used.
Let G be a connected Lie group. Consider the product group G̃ = G × G with the involution
σ : (g, h) 7→ (h, g). The subgroup of fixed points is the diagonal of G̃, H = {(g, g), g ∈ G},
and its Lie algebra is h = {(x, x), x ∈ g} ' g. We thus see from Proposition 5.2.2 that for
G ' (G × G)/G to be Riemannian symmetric, hence reductive homogeneous, it must admit
a bi-invariant metric. There are three possible reductive decompositions m ⊕ h of the Lie
algebra g̃ = g × g of G̃:
m = {(x, 0), x ∈ g}
m = {(0, x), x ∈ g}
m = {(x, −x), x ∈ g}
With either of these, G is reductive homogeneous, and each decomposition leads to a different
connection, called respectively the left, right or mean connection. These are known as the Cartan-
Schouten connections (Lorenzi and Pennec, 2013). One can show that with the last connection,
G is an affine symmetric space with the symmetries ∀g ∈ G, sg : h 7→ gh−1 g (Pennec et al.,
2020, Theorem 5.8). If G admits a bi-invariant metric, this connection is the Levi-Civita
connection of the bi-invariant metric and G is a Riemannian symmetric space. An example
of this case is the group of rotation matrices SO(n). On the contrary, SE(n) does not
admit any bi-invariant metric, so it does not admit a Riemannian symmetric structure that
coincides with its Cartan-Schouten connection.

The properties of symmetric spaces allow the geodesics and parallel transport to be computed
in closed form, by projection of the one-parameter subgroups of the Lie group G.
We rarely use this structure explicitly in geomstats, as all the results were derived and
implemented case by case. The structure is however useful to compute the curvature. Indeed,
using the identification of m with To (M ) induced by the restriction of dπ to m, we have for
all X, Y, Z ∈ m

R(X, Y )Z = −[[X, Y ], Z].
Moreover, under topological conditions (M simply connected and irreducible), O'Neill's
formula (5.4) (page 77) simplifies, and the sectional curvature is, for orthogonal u, v ∈ m,

κ(u, v) = (1/2) h[[u, v], v], ui − (1/2) h[[v, u], u], vi.
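On the hypersphere of Example 5.11, the bracket formula can be checked numerically against the constant-curvature tensor R(u, v)w = hv, wiu − hu, wiv. The sketch below is plain numpy, using our own identification of m with Rd and the convention κ(u, v) = hR(u, v)v, ui with hX, Y i = tr(X >Y )/2 on m:

```python
import numpy as np

def skew_m(u):
    # Element of m inside so(d + 1), identified with u in R^d (hypersphere case).
    d = u.shape[0]
    x = np.zeros((d + 1, d + 1))
    x[1:, 0] = u
    x[0, 1:] = -u
    return x

def bracket(a, b):
    return a @ b - b @ a

rng = np.random.default_rng(0)
u, v, w = rng.normal(size=(3, 4))

# Curvature through the symmetric-space formula R(X, Y)Z = -[[X, Y], Z].
r = -bracket(bracket(skew_m(u), skew_m(v)), skew_m(w))

# On the sphere this matches the constant-curvature tensor
# R(u, v)w = <v, w> u - <u, w> v, under the identification of m with R^d.
expected = skew_m((v @ w) * u - (u @ w) * v)
assert np.allclose(r, expected)

# Sectional curvature kappa(u, v) = <R(u, v)v, u> = 1 for orthonormal u, v,
# with the inner product <X, Y> = tr(X^T Y) / 2 on m.
e1, e2 = np.eye(4)[0], np.eye(4)[1]
r_vv = -bracket(bracket(skew_m(e1), skew_m(e2)), skew_m(e2))
kappa = np.trace(r_vv.T @ skew_m(e1)) / 2
assert np.isclose(kappa, 1.0)
```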
To conclude this section, symmetric spaces offer a useful framework for statistics on
manifolds beyond the convenience of the closed-form solutions for the geodesics and parallel
transport. Firstly, the normal distribution can be defined on all symmetric spaces, by
generalizing the property that its maximum likelihood estimator coincides with the least-
squares problem defining the Fréchet mean (Said et al., 2018). The normalizing factor (or
partition function) does not depend on the mean and can be computed in closed form
thanks to a decomposition of the Lie algebra.
Moreover, many computations such as interpolation or sampling can be performed in the
subspace m of the Lie algebra and projected back to the symmetric space, as in Gawlik
and Leok (2018), Munthe-Kaas et al. (2014), and Barp et al. (2019). Finally, there is a large
theory of harmonic analysis on symmetric spaces (see Terras, 1988), and limit theorems for
stochastic processes such as Brownian motion.
In this section, we focused on the common implementation of Riemannian metrics that
arise from group actions. The first case is that of invariant metrics on Lie groups themselves,
for which the invariance allows all the geometric operations to be reformulated as algebraic
operations in the Lie algebra. In particular, we applied this reasoning to derive a parallel
transport equation that leads to a stable implementation in geomstats.
The second case is that of quotient metrics, which arise on the orbits of group actions.
We identified the key ingredients of a common implementation in geomstats, and focused on
the Kendall shape spaces and the Bures-Wasserstein metric to exemplify these. This formulation
allowed us to easily implement parallel transport on Kendall shape spaces, and is to the best
of our knowledge the only open-source Python implementation of these spaces. The space
of correlation matrices, seen as the quotient of the SPD manifold by the action of diagonal
matrices, is also implemented in geomstats, and other spaces such as the space of positive
semi-definite matrices will be implemented in the near future.
Two particular instances of quotient spaces were then introduced: reductive homogeneous
spaces and symmetric spaces. They are in fact direct generalizations of Lie groups, and
many operations from quotient spaces simplify on them. These concepts are however not
used in the present implementation, as their properties actually allow more efficient
closed-form solutions to be derived.

6 Statistics and machine learning with Geomstats


In this section, we demonstrate the use of geomstats to perform statistics on
manifold-valued data. The strength of the package is that learning algorithms are defined
as external estimators that take the geometric object as input. This is possible thanks to
the standardized interface of the classes, and ensures that all learning tools are available
for all the manifolds and metrics. The resulting flexibility allows one to compare the impact
of different metrics on the learning results. We give a brief introduction to geometric
statistics; the interested reader is referred to Pennec et al. (2020, Chapter 2) for the
theoretical exposition. An introductory paper gathering more examples using geomstats was
presented at the SciPy Conference 2020 in Austin, Texas (Miolane et al., 2020). More
generally, statistical methods for objects living in stratified and infinite-dimensional spaces
that may not be manifolds are developed in the emerging domain of Object-Oriented Data
Analysis (OODA) (Marron and Dryden, 2021).

6.1 Probability distributions and sampling


Given a probability measure, one can define random variables valued in a manifold M as
follows.

Definition 6.1.1. Let (Ω, F, Pr) be a probability space, where Ω is a sample space, F a
σ-algebra∗ and Pr a probability measure∗ . Then a random variable in the Riemannian
manifold M is an F-measurable function X : Ω → M .

Recall that a probability measure assigns a size to each (measurable) subset (e.g. a length,
area or volume in Euclidean spaces) such that the size of the entire space is Pr(Ω) = 1. A
Riemannian metric offers a convenient framework to define probability distributions on
manifolds, as it defines a volume measure that, in turn, allows the probability measure to
be defined. This measure is used as the reference measure of probability densities.
Consider a Euclidean space Rn , and an orthonormal basis (e1 , . . . , en ) of Rn . Let
(a1 , . . . , am ) be a set of vectors, and A the matrix whose columns are formed by the
coordinates of the ai . Then the volume spanned by (a1 , . . . , am ) is given by √(det G),
where G = A> A, i.e. Gij = hai , aj i; when m = n, this equals | det(A)| (see Figure 6.1).

∗ Defined in Appendix A.

Figure 6.1: Volume spanned by three vectors in R3 .

Now, in a Riemannian manifold (M, g), for any x ∈ M , one may choose a basis
(a1 , . . . , ad ) of Tx M . The metric g is then represented by the SPD matrix
Gij = g(ai , aj ), so that the volume spanned by these vectors is √(det G).
Definition 6.1.2 (Riemannian volume form). An oriented Riemannian manifold (M, g) has
a natural volume form defined by d Vol(x) = √(det g(x)) dx.

The volume form provides the way to define integrals on M , i.e. it defines a measure, so
that some probability distributions can be expressed by their densities with respect to that
measure.
Definition 6.1.3. The random variable X has density f if, for every Borel set A ∈ B(M ),
Pr(X ∈ A) = ∫A f (y) d Vol(y), and ∫M f (y) d Vol(y) = 1.

Example 6.1: Uniform distribution


Let M be a compact Riemannian manifold, for example the hypersphere, the
special orthogonal group or the Grassmann manifold. Then the uniform distribution
on M has density

f (x) = 1/ Vol(M ).

Example 6.2: Gaussian distribution


Let M be a Riemannian symmetric space. A Gaussian distribution with mean and
concentration (µ, Γ) is defined by the density

f (x) = α(Γ, µ) exp(− Logµ (x)> Γ Logµ (x)).

It is the entropy-maximizing distribution (Pennec, 2006), and in the isotropic case,
the maximum likelihood estimator of µ coincides with the Fréchet mean (Pennec et al.,
2020, Section 2.5.1). The normalizing constant is explicitly computed in negative

curvature spaces (Said et al., 2018).

Sampling A common task in statistics is to draw samples from a probability distribution.


This might be motivated by inference tasks where the posterior distribution has constrained
parameters, by tests of goodness of fit for exponential families, or by the need to generate
synthetic data to test learning algorithms (Diaconis et al., 2013; Barp et al., 2019, and references
therein). However, on a manifold this does not reduce to sampling from usual distributions
even when a parametrization of the manifold is available, as the curvature of the space may
deform or stretch the densities. This is illustrated in the following examples.

Example 6.3: Uniform distribution on the sphere


Consider the unit sphere S 2 ⊂ R3 , with the spherical coordinates from the north
pole e0 = (0, 0, 1): x = sin(φ) cos(θ), y = sin(φ) sin(θ), z = cos(φ). The volume
element (here the area) is d Vol(θ, φ) = sin(φ)dθdφ. A naive attempt at sampling
from the uniform distribution on the sphere would be to sample θ uniformly in [0, 2π)
and φ uniformly in [0, π).
However, near φ = 0 and φ = π, sin(φ) vanishes, so the area element is much smaller
than that of the flat (θ, φ) rectangle. We thus expect the naive scheme to over-sample
these regions, with points accumulating around the poles of the sphere.
In fact, if Y is a Gaussian vector in Rd+1 with mean 0 and covariance the identity,
then X = Y /kY k is uniformly distributed on S d . The two sampling strategies are
tested with geomstats and shown on Figure 6.2.

space = Hypersphere(2)
n_samples = 5000
uniform_param = gs.random.rand(
    n_samples, 2) * gs.pi * gs.array([1., 2.])[None, :]
naive_samples = space.spherical_to_extrinsic(uniform_param)
uniform_samples = space.random_uniform(n_samples)
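A backend-free way to convince oneself that the Gaussian-normalization scheme is indeed uniform is Archimedes' theorem: under the uniform law on S 2 , each coordinate is uniformly distributed on [−1, 1]. A small numpy check, independent of geomstats:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=(100_000, 3))
x = y / np.linalg.norm(y, axis=1, keepdims=True)

# Archimedes: for the uniform law on S^2, the z-coordinate is uniform on [-1, 1].
z = x[:, 2]
assert abs(z.mean()) < 0.02
assert abs(np.mean(z < 0.5) - 0.75) < 0.01
```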

Figure 6.2: Comparison of the naive sampling from a uniform distribution on the spherical
coordinates (left) and a more appropriate scheme that respects the isotropy of the sphere (right). In
the naive case, points accumulate near the poles and are sparser around the equator.

Example 6.4: Uniform distribution on SO(n)


Let SO(n) be the group of unit-determinant orthogonal matrices of size n × n.
Let Y be a matrix of size n × n with independent standard normal coefficients.
The QR decomposition of Y writes Y = XR where X ∈ SO(n) and R is upper
triangular. Then X is uniformly distributed on SO(n) w.r.t. the Haar measure,
which coincides with the Riemannian measure (Eaton, 1983).
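A minimal numpy version of this recipe follows. One practical subtlety, glossed over in the statement, is that the raw QR factors depend on the implementation's sign convention; the standard fix normalizes the signs of the diagonal of R, and a column flip lands the sample in SO(n):

```python
import numpy as np

def random_rotation(n, rng):
    # Sample from SO(n) w.r.t. the Haar measure via a QR decomposition.
    z = rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    # Normalize the signs so the distribution is exactly Haar (the raw QR
    # output depends on the implementation's sign convention).
    q = q @ np.diag(np.sign(np.diag(r)))
    # Flip one column if needed to land in SO(n) rather than O(n).
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

rng = np.random.default_rng(0)
x = random_rotation(4, rng)
assert np.allclose(x.T @ x, np.eye(4))
assert np.isclose(np.linalg.det(x), 1.0)
```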

Example 6.5: Uniform distribution on Stiefel manifold


Recall that the Stiefel manifold is the set of orthonormal k-frames in Rn . It can be
represented by the set of matrices {U ∈ Rn×k | U > U = Ik }. Let Z be an n × k matrix
with i.i.d. standard normal entries. Then X = Z(Z > Z)−1/2 is uniformly
distributed (Chikuse, 2003, Theorem 2.2.1). Note that X is the orthogonal factor of
the polar decomposition of Z, representing the orientation of Z.

Example 6.6: Uniform distribution on Grassmannians


The Grassmann manifold is the set of k-dimensional subspaces of Rn . It can be
represented by the set of projection matrices of size n × n, i.e. symmetric rank-k
matrices P such that P 2 = P . Let Z be an n × k matrix with i.i.d. standard normal
entries. Then X = Z(Z > Z)−1 Z > is uniformly distributed (Chikuse, 2003, Theorem 2.2.2).

Note that X = Y Y > for Y uniformly distributed on the Stiefel manifold.
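Both recipes (Examples 6.5 and 6.6) fit in a few lines of numpy; the sketch below (not geomstats code) checks the algebraic identities defining the two manifolds:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2
z = rng.normal(size=(n, k))

# Uniform sample on the Stiefel manifold St(k, n): the orientation factor
# Z (Z^T Z)^{-1/2} of the polar decomposition of Z.
vals, vecs = np.linalg.eigh(z.T @ z)
inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
x = z @ inv_sqrt
assert np.allclose(x.T @ x, np.eye(k))

# The induced uniform sample on the Grassmannian Gr(k, n): the projection
# matrix Z (Z^T Z)^{-1} Z^T = X X^T onto the column span of Z.
p = x @ x.T
assert np.allclose(p @ p, p)
assert np.allclose(p, p.T)
assert np.isclose(np.trace(p), k)
```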

These examples are particular cases where simple recipes are available to sample from
the uniform distribution; they are implemented in geomstats. Sampling from non-uniform
distributions is usually intractable, and we resort to simulation methods from the literature.
In particular, in geomstats we focused on rejection sampling, which consists in sampling
from a proposal distribution whose (scaled) density is greater than the target density, and
accepting the samples with probability given by the ratio of the two densities². We do not
detail this procedure in this monograph but refer to Wood (1994) and Hauberg (2018) for
examples of rejection sampling algorithms for non-uniform distributions on the hypersphere.

6.2 Distance-based algorithms


A large class of machine learning algorithms only requires computing a matrix whose
entries are the distances between pairs of samples. This is the case, for example, of
nearest-neighbor methods and hierarchical clustering. These can readily be used by
computing the distance matrix with a geomstats metric. We give below a toy example on
the sphere, with data generated from two von Mises-Fisher distributions (Example 6.7).

Example 6.7: Hierarchical clustering on the sphere


The coaches of the French Olympic skiing team study the solid angles of inclination
of their skiers during a slalom. To first separate left and right turns, they use a
hierarchical clustering algorithm on the sphere of solid angles.

space = Hypersphere(2)
metric = space.metric

To test the method, they first generate a toy dataset with von Mises-Fisher distribu-
tions
n_clusters = 2
n_samples = 50

left = space.random_von_mises_fisher(kappa=10, n_samples=n_samples)
right = -space.random_von_mises_fisher(kappa=10, n_samples=n_samples)
dataset = gs.concatenate((left, right), axis=0)

A geomstats model with the Riemannian distance can then be used to compute the
distances. We use n_clusters=None, distance_threshold=0 in order to compute the
full tree and plot the dendrogram shown on the right panel of Figure 6.3 (as in the

² See the Wikipedia page on rejection sampling for more details.

scikit-learn example).

model = AgglomerativeHierarchicalClustering(
    n_clusters=None, distance=metric.dist, distance_threshold=0)
model.fit(dataset)
plot_dendrogram(model)

The cluster assignments can also be computed by setting n_clusters=2. These are
shown on Figure 6.3 (left).

model = AgglomerativeHierarchicalClustering(n_clusters, distance=metric.dist)
model.fit(dataset)
clustering_labels = model.labels_

Figure 6.3: Toy hierarchical clustering experiment. The data is sampled from two
von Mises-Fisher distributions (left). A dendrogram is computed from the distance matrix (right).

6.3 The Fréchet mean


Let (M, g) be a Riemannian manifold and let x1 , . . . , xn ∈ M be independent identically
distributed (i.i.d.) sample data points. The Euclidean sample mean x̄ = (1/n) ∑_{i=1}^n xi
is not defined unless we consider M ⊂ RN , but in this case x̄ may not lie on M .

Definition, Existence, Uniqueness In fact, the mean is characterized by the property


that it minimises the sum of squared distances to the data points. This property was used
by Fréchet (1948) to generalize the notion of mean to metric spaces, and later by Hermann
Karcher to Riemannian manifolds. For a historical note and the corresponding references,
we refer to Karcher (2014).

Definition 6.3.1 (Fréchet mean). Let (M, g) be a Riemannian manifold with Riemannian

distance function d and let x1 , . . . , xn ∈ M be an i.i.d. data set. The sample Fréchet mean
is defined as the set of minimizers of the sum-of-squared distances, i.e.,
x̄ = arg min_{x∈M} ∑_{i=1}^n d(xi , x)2 .

With some nuance in the requirements on the minimum (local or global), the Fréchet
mean is also referred to as Karcher mean or Riemannian barycenter, or even Wasserstein
barycenter in optimal transport. Note that as it is defined by an optimization, the Fréchet
mean may not exist, or not be unique, depending on the properties of the distance function.
For instance, the completeness of the metric is a sufficient condition to guarantee the
existence of the Fréchet mean of a finite set of points (Pennec et al., 2020, Chapter 2).

Theorem 6.3.1. Let M be a complete metric space. Then the Fréchet mean of any finite
set of points x1 , . . . , xn exists.

On the other hand, uniqueness depends on the convexity of the distance function and
can be related to the sign of the curvature (Karcher, 1977; Kendall, 1990; Afsari, 2011).

Theorem 6.3.2. Let M be a complete Riemannian manifold with sectional curvature


bounded above by some δ, and let inj(M ) be its injectivity radius. If the samples
x1 , . . . , xn ∈ M are contained in a geodesic ball of radius

r = (1/2) min( inj(M ), π/√δ ),

then the Fréchet mean x̄ is unique.

Note that in the above, π/√δ is interpreted as ∞ if δ ≤ 0. This means that in complete
manifolds of non-positive curvature, the Fréchet mean is always uniquely defined, as in the
Euclidean case. On the other hand, this is not the case in spaces of positive curvature. For
example, on the sphere, Theorem 6.3.2 ensures that the mean is unique if all the data points
lie in the same open hemisphere, but the simple data set formed by two antipodal
points ({x, −x} for any x ∈ S d ) exemplifies non-uniqueness of the mean: the set of Fréchet
means is a great circle in this case.
The asymptotic properties of the Fréchet mean were studied in Bhattacharya and
Patrangenaru (2005), where a law of large numbers and a central limit theorem are
established, proving the relevance of this notion of mean.

Characterization and Estimation Suppose that the data points xi , i = 1, . . . , n lie in
a ball B of radius r < inj(M )/2 as in Theorem 6.3.2. Then the squared distance is convex,
is obtained by measuring the length of a minimizing geodesic, and can be written
d2 (x, y) = k Logx (y)k2x for any x, y ∈ B. One can show that the gradient of y 7→ d2 (x, y)

is −2 Logy (x) ∈ Ty M (see the first variation formula in Lafontaine et al., 2004, Theorem
3.31).

Remark 6.3.1. Recall that the gradient of a function f : Rd → R is defined as the adjoint
of its differential, i.e.

∀(x, v) ∈ T M, hgradx (f ), vi := dfx (v).
This definition extends to functions defined on Riemannian manifolds, but the gradient thus
depends on the metric, and is sometimes referred to as the Riemannian gradient. It is also
named natural gradient in information geometry, in the case of manifolds of parameters
of families of distributions with the Fisher-Rao metric. If the manifold is embedded in
some RN and equipped with the pullback metric, it can be obtained by projecting the
usual gradient to the tangent space at the point where the function is being differentiated,
similarly to the Levi-Civita connection.
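For a manifold embedded in RN with the pullback metric, this projection recipe is one line. A toy numpy illustration, with the hypothetical function f (x) = ha, xi restricted to the sphere:

```python
import numpy as np

# Toy function on the sphere: f(x) = <a, x>, for a fixed (hypothetical) vector a.
a = np.array([1.0, 2.0, 0.5])
x = np.array([0.0, 0.0, 1.0])   # point on the unit sphere S^2

# The ambient (Euclidean) gradient of f at x is simply a; the Riemannian
# gradient for the induced round metric is its projection onto the tangent
# space T_x S^2 = {v : <v, x> = 0}.
riem_grad = a - (a @ x) * x

assert np.isclose(riem_grad @ x, 0.0)            # tangent to the sphere at x
assert np.allclose(riem_grad, [1.0, 2.0, 0.0])
```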

By definition, the Fréchet mean must be a critical point of the sum-of-squared distances
function, so that it verifies
∑_{i=1}^n Logx̄ (xi ) = 0. (6.1)
A simple strategy to estimate the sample Fréchet mean is thus to use a fixed-point iterative
algorithm until Equation (6.1) is verified. An update at iteration t is performed along a
geodesic:
xt+1 = Expxt ( γ ∑_{i=1}^n Logxt (xi ) ),
where γ is a step size. Higher-order methods use adaptive step sizes, approximations of
the Hessian, or its link with curvature, to improve the performance of the estimation.
Some of these are implemented in the FrechetMean module of geomstats.
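On the sphere, this fixed-point scheme takes a few lines; the numpy sketch below (using the standard sphere Exp and Log formulas, independent of geomstats' FrechetMean estimator) recovers the midpoint of two points placed symmetrically about the north pole:

```python
import numpy as np

def sphere_log(x, y):
    cos_t = np.clip(x @ y, -1.0, 1.0)
    theta = np.arccos(cos_t)
    if np.isclose(theta, 0.0):
        return np.zeros_like(x)
    return theta / np.sin(theta) * (y - cos_t * x)

def sphere_exp(x, v):
    norm = np.linalg.norm(v)
    if np.isclose(norm, 0.0):
        return x
    return np.cos(norm) * x + np.sin(norm) / norm * v

def frechet_mean(points, step=1.0, max_iter=100, tol=1e-10):
    # Fixed-point iteration: move along the geodesic in the direction of the
    # mean of the logs until Equation (6.1) is satisfied.
    x = points[0]
    for _ in range(max_iter):
        grad = np.mean([sphere_log(x, p) for p in points], axis=0)
        if np.linalg.norm(grad) < tol:
            break
        x = sphere_exp(x, step * grad)
    return x

a = np.array([np.sin(0.5), 0.0, np.cos(0.5)])
b = np.array([-np.sin(0.5), 0.0, np.cos(0.5)])
mean = frechet_mean(np.stack([a, b]))
assert np.allclose(mean, [0.0, 0.0, 1.0])  # midpoint of the two points
```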

Example 6.8: Fréchet mean on the sphere


The following is an illustrative example with simulated data. A scientific expedition
of 15 boats is collecting samples from the Pacific ocean to measure the quantity of
micro-plastics in the water. Every month, they gather all the samples in one boat
that goes back to their home harbour for the analyses. As they have been wandering
around the Pacific for some time, we sample their positions from a spherical normal
distribution around their last meeting point.

space = Hypersphere(2)
n_samples = 15
last_meeting_point = gs.array([0., 1., 0.])
samples = space.random_riemannian_normal(
    mean=last_meeting_point, precision=10, n_samples=n_samples)

The geometers onboard compute the Fréchet mean of their positions to choose the
meeting spot that minimises the sum of squared distances to sail for this gathering
(supposing no wind or currents).

estimator = FrechetMean(space.metric)
estimator.fit(samples)
new_meeting_point = estimator.estimate_

Their position (black) and the next meeting point (red) are displayed in Figure 6.4.
The trajectories they have to follow are drawn in green, and are geodesics.

Figure 6.4: Fréchet mean on the sphere

The Fréchet mean x̄ is key to defining many other statistical and learning tools, e.g. the
sample covariance matrix:

Σ = (1/n) ∑_{i=1}^n Logx̄ (xi ) ⊗ Logx̄ (xi ). (6.2)
One can then parametrize densities by their mean and covariance, as in Chevallier and
Guigui (2020). The proposed model allows one to define a simple density estimation scheme.
One can also generalize the K-means clustering algorithm and the minimum distance to
mean (MDM) classification algorithm with Fréchet means. These are implemented in the
classes RiemannianKMeans and RiemannianMinimumDistanceToMeanClassifier in geomstats.
Moreover, one can use the Fréchet mean to linearize the data by lifting it to the tangent
space at the Fréchet mean, i.e. to consider the transformed data set (x̃1 , . . . , x̃n ) where
x̃i = Logx̄ (xi ).

This is equivalent to using a first-order Taylor expansion of the Riemannian exponential
map around 0: xi = Expx̄ (x̃i ) = x̄ + x̃i + O(kx̃i k2 ). As Tx̄ M is a vector space, all the usual
statistical and machine learning tools can be used off-the-shelf on the transformed data set.
This is implemented, following the Transformer interface of the scikit-learn package, in the
ToTangent class of geomstats, and can be used in scikit-learn pipelines.

6.4 Generalizations of PCA


The other fundamental tools to analyse data in vector spaces are the sample covariance
matrix and Principal Component Analysis (PCA). The aim of PCA is to find the sequence
of subspaces such that the data projected on these subspaces have maximum variance, or
equivalently minimum reconstruction error (i.e. sum of squared distances to the original
points). This equivalence no longer holds in Riemannian manifolds, as the Pythagorean
theorem does not.
We describe principal geodesic analysis (PGA), which aims at minimizing the recon-
struction error when projecting data on a geodesic submanifold (Fletcher et al., 2004).
In the simplest, forward fashion, the mean is first computed, and the first component is
a geodesic through the mean. This is close to a slightly different procedure called Geodesic
PCA (Huckemann et al., 2010). The projection of a data point x on a geodesic γ with initial
velocity v at the mean x̄ is defined by

πx̄,v (x) = arg min_{t∈R} d2 (x, Expx̄ (tv)) .

There is no guarantee that this projection exists, but if the data is not too spread out, one
can hope that there exists a convex neighborhood where the exponential map is injective
that contains x and a portion of the geodesic. The projection is solved by a gradient descent,
as in the case of the Fréchet mean, where the gradient of the exponential is computed by
automatic differentiation.
The next step is to minimize the overall reconstruction error: given a dataset x1 , . . . , xn ,
we look for the initial velocity of a geodesic as

v∗ = arg min_{v∈Tx̄ M} ∑_{i=1}^n d2 (xi , πx̄,v (xi )) . (6.3)

This time, computing the gradient of the objective function seems more complicated, as n
minimization problems must already be solved by gradient descent just to evaluate the
function. However, Ablin et al. (2020) show that the gradient of a function defined by a
minimum can be computed efficiently by automatic differentiation of the gradient descent
approximation. We exemplify this on the sphere. Of course this example is particularly
simple, as both the Exp and Log maps have closed-form solutions whose derivatives are
known, and some work remains to use automatic differentiation when the Log map itself is
hard to compute.

Example 6.9: Principal geodesic analysis on earth


The following is an illustrative example with simulated data. The boats of the
scientific expedition of Example 6.8 are in fact too busy collecting data and cannot
make their way to the Fréchet mean. Thankfully, the geometers onboard are expert
users of geomstats and compute a PGA of their positions, so that one boat can visit
the rest of the fleet by sailing along the principal geodesic. This is illustrated on
Figure 6.5.

Figure 6.5: Example of PGA on the sphere. The target data are the black dots, the geodesics that
realize the projection of each sample point to the principal geodesic are shown in green, the first
geodesic subspace or principal geodesic is in gray, and the mean in red.

A common simplification, valid when the data is not too far from the mean, is to approximate the projection operator by the Log map: Logx̄ (πx̄,v (xi )) ≈ Logx̄ (xi ). With this approximation, PGA reduces to computing the logarithms of all the data points and performing a usual PCA on the obtained tangent vectors. This procedure is more generally called tangent PCA and is implemented in the TangentPCA class of geomstats.
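As a concrete sketch of tangent PCA on the sphere in plain NumPy (not the TangentPCA class itself; we assume the Fréchet mean has already been computed):

```python
import numpy as np

def sphere_log(base, x):
    """Riemannian logarithm on the unit sphere: tangent vector at base pointing to x."""
    cos_angle = np.clip(x @ base, -1.0, 1.0)
    angle = np.arccos(cos_angle)
    residual = x - cos_angle * base
    res_norm = np.linalg.norm(residual)
    if res_norm < 1e-12:
        return np.zeros_like(base)
    return angle * residual / res_norm

def tangent_pca(points, mean):
    """Lift the data to the tangent space at the mean, then run ordinary PCA."""
    tangent_vecs = np.stack([sphere_log(mean, p) for p in points])
    centered = tangent_vecs - tangent_vecs.mean(axis=0)
    # principal directions = right singular vectors of the centered tangent data
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    return components, singular_values ** 2 / len(points)
```

When the data is concentrated, the first component approximates the initial velocity v* of the principal geodesic.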

6.5 Geodesic Regression


Similarly to the mean, linear regression can be generalized to geodesic regression on Riemannian manifolds by solving a least-squares fitting problem. Given target points y1 , . . . , yn ∈ M and data t1 , . . . , tn ∈ R, we seek the geodesic that best approximates the data:
\[
\min_{(p, v) \in TM} \; \sum_{i=1}^{n} d^2\big(\operatorname{Exp}_p(t_i v),\, y_i\big), \tag{6.4}
\]
where d and Exp are the Riemannian distance and exponential maps. When the metric is
Euclidean, this coincides with the usual linear regression problem. However, there is no
closed-form solution in general and the problem must be solved by optimization.
To differentiate the objective function of Equation (6.4), one needs to compose the gradient of the squared distance with that of the exponential map. The gradient of the
squared distance is proportional to the Riemannian logarithm, as noted above. On the other
hand, the gradient of the Exp map is usually computed via Jacobi fields. However, to avoid
implementing those in geomstats we chose to leverage automatic differentiation tools to
compute the extrinsic gradient, and to project it to the right tangent space, as explained in
Remark 6.3.1.
To solve the optimization problem, either a Riemannian gradient descent, or an extrinsic
one with scipy’s solver is used.
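As a self-contained illustration of the extrinsic approach on the sphere (plain NumPy/SciPy, not the geomstats GeodesicRegression class): we optimize over unconstrained (p, v) ∈ R³ × R³, project p to the sphere and v to the tangent space inside the objective, and warm-start from a tangent-space linearization. The helper names are ours:

```python
import numpy as np
from scipy.optimize import minimize

def sphere_exp(base, tangent):
    """Riemannian exponential on the unit sphere."""
    norm = np.linalg.norm(tangent)
    if norm < 1e-12:
        return base
    return np.cos(norm) * base + np.sin(norm) * tangent / norm

def sphere_log(base, x):
    """Riemannian logarithm on the unit sphere."""
    cos_angle = np.clip(x @ base, -1.0, 1.0)
    angle = np.arccos(cos_angle)
    residual = x - cos_angle * base
    res_norm = np.linalg.norm(residual)
    if res_norm < 1e-12:
        return np.zeros_like(base)
    return angle * residual / res_norm

def sphere_dist(x, y):
    return np.arccos(np.clip(x @ y, -1.0, 1.0))

def fit_geodesic(times, targets):
    """Minimize sum_i d(Exp_p(t_i v), y_i)^2 over (p, v) with an extrinsic solver."""
    def loss(params):
        p, v = params[:3], params[3:]
        p = p / np.linalg.norm(p)        # constrain the intercept to the sphere
        v = v - (v @ p) * p              # constrain the velocity to T_p S^2
        return sum(sphere_dist(sphere_exp(p, t * v), y) ** 2
                   for t, y in zip(times, targets))

    # tangent-space linearization gives a good starting point
    p0 = np.mean(targets, axis=0)
    p0 = p0 / np.linalg.norm(p0)
    logs = np.stack([sphere_log(p0, y) for y in targets])
    t_centered = times - np.mean(times)
    v0 = t_centered @ logs / (t_centered @ t_centered)
    res = minimize(loss, np.concatenate([p0, v0]), method='Nelder-Mead',
                   options={'fatol': 1e-12, 'xatol': 1e-10, 'maxiter': 10000})
    p_hat = res.x[:3] / np.linalg.norm(res.x[:3])
    v_hat = res.x[3:] - (res.x[3:] @ p_hat) * p_hat
    return p_hat, v_hat, res.fun
```

The projections inside the objective make the unconstrained solver respect the manifold constraints, at the cost of flat directions that the warm start keeps harmless.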

Example 6.10: Geodesic Regression above earth


The following is an illustrative example with simulated data. The European Space
Agency sends a maintenance mission to its geostationary satellites. As these are
all at the same altitude, they lie on a sphere, and the geometers seek an optimal
trajectory.
The maintenance should be done as close as possible to the satellites' end-of-fuel positions, which are known from previous missions. One vessel will leave earth, reach the first point of the trajectory, called γ, and split into two parts that will each go in opposite directions given by β ∈ Tγ S 2 .
We use data from the previous mission to generate random satellite positions:
space = Hypersphere(2)
n_samples = 50
data = gs.random.rand(n_samples)
data -= gs.mean(data)

previous_gamma = space.random_uniform()
beta = space.to_tangent(5. * gs.random.rand(3), previous_gamma)
target = space.metric.exp(data[:, None] * beta, previous_gamma)

We add some noise, because the satellites did not stay exactly on the previous mission's trajectory:

normal_noise = gs.random.normal(size=(n_samples, 3))


noise = space.to_tangent(normal_noise, target) / gs.pi / 2
target = space.metric.exp(noise, target)

The optimal trajectory is computed by fitting a geodesic regression model with the Riemannian gradient descent:

gr = GeodesicRegression(space, algorithm='riemannian')
gr.fit(data, target, compute_training_score=True)
gamma_new, beta_new = gr.intercept_, gr.coef_

We can measure the mean squared error (MSE) with respect to the previous γ and β, and the coefficient of determination given the noise level. We obtain MSEs around 10⁻³ and R² = 0.96. The results are shown in Figure 6.6.

Figure 6.6: Example of geodesic regression on the sphere. The target data are the blue dots, the
fitted values are in green, the regression geodesic in black, and the intercept (γ) in red.

7 Conclusion
In this monograph, we introduced notions of differential geometry that form the essential
building blocks of geometric statistics. Our developments target an audience with a general
background in mathematics. We focused on embedded manifolds of RN , and highlighted
the different ways of defining a manifold and a Riemannian metric. We detailed the notions
of curvature, geodesic, distance and parallel transport. We also presented the concepts of
Lie groups, group actions and quotient space, with an emphasis on Riemannian metrics
that are invariant to a group action.
These differential geometric notions are key to the architecture of the geomstats library.
They drove our recent contributions to the package, which aimed at making it more faithful to mathematical theory, more robust, and more modular. We exemplified the geometric concepts with the most common manifolds encountered in mathematical textbooks as well
as in applications. Our examples include code snippets using geomstats to demonstrate how to leverage differential geometry in practical use cases.
Lastly, we gave an introduction to geometric statistics tools such as the Fréchet mean,
principal geodesic analysis and geodesic regression. We illustrated these on toy examples
using synthetic datasets on manifolds. These geometric statistical tools are implemented
in the geomstats package with a common high-level interface, following the Scikit-Learn
syntax. Consequently, geometric statistics become available to any data scientist on a wide
variety of practical problems.
We hope that the concepts presented here will drive scientists to use, and contribute to,
geometric statistics with the geomstats library. Future developments of the geomstats pack-
age will integrate additional statistical methodologies published by the geometric statistics
community, such as wrapped Gaussian processes, multivariate and polynomial geodesic
regression. We will also extend the scope of geomstats and include computational methods
for information geometry, a field at the intersection of geometry and statistics closely related
to geometric statistics. Another module on stratified spaces such as graph and tree spaces
is currently being developed, and raises fundamental methodological questions as these
spaces are not smooth manifolds but unions of smooth manifolds equipped with a distance
function. Together, these advances aim to provide mathematically-grounded foundations
for computational geometric statistics.

8 Acknowledgment
This work was funded by the ERC grant Nr. 786854 G-Statistics from the European Research
Council under the European Union’s Horizon 2020 research and innovation program. It
was also supported by the French government through the 3IA Côte d’Azur Investments
ANR-19-P3IA-0002 managed by the National Research Agency. The authors are grateful to
Adele Myers for her corrections and suggestions to improve this monograph.

List of Examples
2.1 Hypersphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Hyperbolic space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Special Orthogonal group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.4 Stiefel manifold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.5 Kendall size-and-shape space . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.6 Product manifold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.7 Cusp and Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.8 Tangent space of the hypersphere . . . . . . . . . . . . . . . . . . . . . . . . 12
2.9 Tangent space of the hyperbolic space . . . . . . . . . . . . . . . . . . . . . 13
2.10 Tangent space of Stiefel manifold . . . . . . . . . . . . . . . . . . . . . . . . 13
2.11 Tangent space of SO(n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.12 Implementation of the hypersphere . . . . . . . . . . . . . . . . . . . . . . . 20
2.13 Implementation of the Stiefel manifold . . . . . . . . . . . . . . . . . . . . . 21
2.14 Implementation of the Poincaré ball . . . . . . . . . . . . . . . . . . . . . . 22
3.1 Euclidean metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Product metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Metric on the hypersphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Hyperbolic metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5 Frobenius metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.6 Geodesics of the hypersphere . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.7 Completeness of the half-plane . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.8 Curvature of the Euclidean space . . . . . . . . . . . . . . . . . . . . . . . . 40
3.9 Sectional curvature of S d and H+ d . . . . . . . . . . . . . . . . . . . . . . . . 42
4.1 General Linear group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.4 Implementation of GL(n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.5 General linear Lie algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.6 Implementation of SO(n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.7 Implementation of SE(n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.8 Exponential map of GL(n) . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.9 Curve in SE(2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.10 Invariant metric on SO(n) . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.11 Invariant metric on SE(n) . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.12 Structure constants on SO(3) . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.13 Structure constants on SE(3) . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.14 Geodesics on SE(2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.15 Parallel transport in SE(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.16 Hypersphere as a homogeneous space . . . . . . . . . . . . . . . . . . . . 68
4.17 Stiefel manifold as a homogeneous space . . . . . . . . . . . . . . . . . . 69
4.18 Grassmannian as a homogeneous space . . . . . . . . . . . . . . . . . . . 69
4.19 SPD matrices as a homogeneous space . . . . . . . . . . . . . . . . . . . . 70
5.1 Bures-Wasserstein metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2 Horizontal lift of SPD matrices . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.3 Bures-Wasserstein geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.4 Curvature of the Bures-Wasserstein metric . . . . . . . . . . . . . . . . . . . 77
5.5 Bures-Wasserstein distance . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.6 Implementation of the BW metric . . . . . . . . . . . . . . . . . . . . . . . 85
5.7 Kendall shape metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.8 Full rank correlation matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.9 Invariant metric on Stiefel manifold . . . . . . . . . . . . . . . . . . . . . . . 92
5.10 Stiefel Exponential map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.11 Hypersphere as a symmetric space . . . . . . . . . . . . . . . . . . . . . . . 97
5.12 The Grassmannian as a symmetric space . . . . . . . . . . . . . . . . . . . . 97
5.13 Stiefel manifold as a symmetric space . . . . . . . . . . . . . . . . . . . . . 98
5.14 The Affine-Invariant metric . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.1 Uniform distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2 Gaussian distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.3 Uniform distribution on the sphere . . . . . . . . . . . . . . . . . . . . . . . 103
6.4 Uniform distribution on SO(n) . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.5 Uniform distribution on Stiefel manifold . . . . . . . . . . . . . . . . . . . . 104
6.6 Uniform distribution on Grassmannians . . . . . . . . . . . . . . . . . . . . 104
6.7 Hierarchical clustering on the sphere . . . . . . . . . . . . . . . . . . . . . . 105
6.8 Fréchet mean on the sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.9 Principal geodesic analysis on earth . . . . . . . . . . . . . . . . . . . . . . 111
6.10 Geodesic Regression above earth . . . . . . . . . . . . . . . . . . . . . . . . 112

List of Figures
2.1 Representation of a manifold. . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Transition maps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Nodal cubic M1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Cuspidal curve M2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5 Definition of a smooth curve on a manifold. . . . . . . . . . . . . . . . . . . 11
2.6 Tangent vector and tangent space . . . . . . . . . . . . . . . . . . . . . . . . 11
2.7 Example of an integral curve . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.8 Example of a flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.9 Architecture of the manifolds of geomstats . . . . . . . . . . . . . . . . . . . 23
3.1 Parallel vector field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Impact of the curvature on the distance function . . . . . . . . . . . . . . . 43
4.1 One-parameter subgroups of SE(2) . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Geodesics and one-parameter subgroups of SE(2) . . . . . . . . . . . . . . . 61
4.3 Visualization of geodesics on SE(2) . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 Error curves for the parallel transport on SE(3) . . . . . . . . . . . . . . . 64
5.1 Error curves for parallel transport on Kendall spaces . . . . . . . . . . . . . 83
6.1 Volume spanned by three vectors in R3 . . . . . . . . . . . . . . . . . . . . . 102
6.2 Samples from the uniform distribution on the sphere . . . . . . . . . . . . . 104
6.3 Hierarchical clustering on the sphere . . . . . . . . . . . . . . . . . . . . . . 106
6.4 Fréchet mean on the sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.5 Principal Geodesic Analysis on the sphere . . . . . . . . . . . . . . . . . . . 111
6.6 Geodesic regression on the sphere . . . . . . . . . . . . . . . . . . . . . . . . 113

A Lexicon
Definition A.0.1 (Topology). Let M be a set and P(M ) the set of subsets of M . Then a topology on M is a set Θ ⊆ P(M ) such that
• ∅ ∈ Θ and M ∈ Θ,
• ∀U, V ∈ Θ, U ∩ V ∈ Θ,
• for any collection C ⊆ Θ, ⋃_{U ∈ C} U ∈ Θ.

The sets in Θ are called open sets, and a set S is said to be closed if and only if M \ S ∈ Θ.
Such a pair (M, Θ) is called a topological space.

Definition A.0.2 (Hausdorff). A topological space (M, Θ) is said to be Hausdorff if, for
any two distinct points p, q ∈ M , there exist open neighborhoods of p and q with empty
intersection.

Definition A.0.3 (Second-countable). A topological space (M, Θ) is second-countable if there exists a countable collection A = {Ui }i∈N of open sets of M such that any open set can be written as a union of elements of A.

Definition A.0.4 (Connected). A topological space (M, Θ) is said to be connected if there do not exist two nonempty open sets A, B ∈ Θ such that A ∩ B = ∅ and M = A ∪ B.
Equivalently, M is connected if and only if the only subsets that are both open and
closed are M itself and the empty set ∅.

Definition A.0.5 (Group). A group is a couple (G, ·) where G is a nonempty set, and · : G × G → G is a map such that
• (associativity) ∀g, h, k ∈ G, (g · h) · k = g · (h · k),
• (identity) ∃e ∈ G, ∀g ∈ G, e · g = g · e = g,
• (inverse) ∀g ∈ G, ∃h ∈ G, g · h = h · g = e. In this case we define the inverse of g as g −1 = h.

Definition A.0.6 (Homomorphism). Let (G, ·) and (H, •) be two groups, and f : G → H be
a map. Then f is a homomorphism if for any x, y ∈ G, f (x · y) = f (x) • f (y).

Definition A.0.7 (Algebra). An algebra over a field K is a vector space (A, +, ·) over K
equipped with a bilinear multiplicative law ⊗ : A × A → A such that
• (distributivity) ∀x, y, z ∈ A, (x + y) ⊗ z = x ⊗ z + y ⊗ z and z ⊗ (x + y) = z ⊗ x + z ⊗ y
• (compatibility with scalars) ∀a, b ∈ K, ∀x, y ∈ A, (ax) ⊗ (by) = (ab)(x ⊗ y).

Definition A.0.8 (Injective-Surjective-Bijective map). Let E, F be two sets and f : E → F a map between E and F . Then we say that
• f is injective if for every x, x′ ∈ E, x ≠ x′ ⟹ f (x) ≠ f (x′ ),
• f is surjective if for every y ∈ F , there exists x ∈ E such that y = f (x),
• f is bijective if it is both injective and surjective.

Definition A.0.9 (Continuous map). Let E, F be two topological spaces. A map f : E → F is continuous, or C⁰, if for every open set U ⊆ F , its preimage f −1 (U ) by f is an open set of E.
Definition A.0.10 (Homeomorphism). Let f : E → F be a map between two topological
spaces. f is called a homeomorphism if it has the following properties:
• f is a bijection,
• f is continuous,
• the inverse f −1 of f is continuous.
Definition A.0.11 (Differential map). Let p, n ∈ N, f : U ⊂ Rn → Rp a map defined on an
open set U , and x0 ∈ U . We say that f is differentiable at x0 if there exists a linear map L
defined in Rn such that
∀h, f (x0 + h) = f (x0 ) + L(h) + o(‖h‖).
In that case, L is unique and is called the differential of f at x0 , and written dfx0 .
Definition A.0.12 (Class C k ). Let p, n ∈ N, f : U ⊂ Rn → Rp a map defined on an open
set U . f is C¹ if it is differentiable on U and the map x ↦ dfx is continuous on U . Similarly, we say that f is Cᵏ, or of class Cᵏ, for k ∈ N ∪ {∞} if f is k-times differentiable with continuous k-th differential.
Definition A.0.13 (C k -diffeomorphism). Let k ∈ (N \ {0}) ∪ {∞} and let f : U → V be a
map between two open sets of Rn . f is called a diffeomorphism of class C k if it has the
following properties:
• f is a bijection,
• f is of class C k ,
• the inverse f −1 of f is C k .
Definition A.0.14 (Inner product). Let E be a real vector space. An inner product is a symmetric positive-definite bilinear map ⟨·, ·⟩ : E × E → R, i.e. ∀x, y ∈ E, ⟨x, y⟩ = ⟨y, x⟩ and ⟨x, x⟩ > 0 ⟺ x ≠ 0.
Definition A.0.15 (σ-algebra). Let M be a set and P(M ) the set of subsets of M . A subset Σ ⊆ P(M ) is called a σ-algebra if it has the three following properties:
• It is closed under complementation: for any set S ∈ Σ, M \ S ∈ Σ;
• M is an element of Σ: M ∈ Σ;
• Σ is closed under countable unions: for any (Si )i∈N with each Si ∈ Σ, ⋃i Si ∈ Σ.

Definition A.0.16 (Borel σ-algebra). Let M be a topological space. The Borel σ-algebra B(M ) on M is the smallest σ-algebra that contains all the open sets of M .
Definition A.0.17 (Probability measure). Let F be a σ-algebra over a set Ω. A probability measure Pr is a function Pr : F → [0, 1] such that:
• Pr is σ-additive: for any sequence (Si )i∈N of pairwise disjoint sets of F, Pr(⋃i Si ) = Σi Pr(Si );
• Pr has unit mass: Pr(Ω) = 1.

B SE(n) with an anisotropic metric
B.1 Geodesics
Below is the code to plot geodesics of SE(n) with the anisotropic metric described in
Example 4.14. The code will be updated on the dedicated Github repository1 .
import matplotlib.pyplot as plt

import geomstats.backend as gs
import geomstats.visualization as visualization
from geomstats.algebra_utils import from_vector_to_diagonal_matrix
from geomstats.geometry.invariant_metric import InvariantMetric
from geomstats.geometry.special_euclidean import SpecialEuclidean

SE2_GROUP = SpecialEuclidean(n=2, point_type='matrix')


N_STEPS = 15

def main():
    """Plot geodesics on SE(2) with different structures."""
    theta = gs.pi / 4
    initial_tangent_vec = gs.array([
        [0., - theta, 1.],
        [theta, 0., 1.],
        [0., 0., 0.]])
    t = gs.linspace(0., 1., N_STEPS + 1)
    tangent_vec = gs.einsum('t,ij->tij', t, initial_tangent_vec)

    fig = plt.figure(figsize=(10, 10))

    maxs_x = []
    mins_y = []
    maxs = []
    for i, beta in enumerate([1., 2., 3., 5.]):
        ax = plt.subplot(2, 2, i + 1)
        metric_mat = from_vector_to_diagonal_matrix(gs.array([1., beta, 1.]))
        metric = InvariantMetric(SE2_GROUP, metric_mat, point_type='matrix')
        points = metric.exp(tangent_vec, base_point=SE2_GROUP.identity)
        ax = visualization.plot(
            points, ax=ax, space='SE2_GROUP', color='black',
            label=r'$\beta={}$'.format(beta))
        mins_y.append(min(points[:, 1, 2]))
        maxs.append(max(points[:, 1, 2]))
        maxs_x.append(max(points[:, 0, 2]))
        plt.legend(loc='best')

    for ax in fig.axes:
        x_lim_inf, _ = plt.xlim()
        x_lims = [x_lim_inf, 1.1 * max(maxs_x)]
        y_lims = [min(mins_y) - .1, max(maxs) + .1]
        ax.set_ylim(y_lims)
        ax.set_xlim(x_lims)
        ax.set_aspect('equal')
    plt.savefig('../figures/geo-se2.eps', bbox_inches='tight', pad_inches=0)
    plt.show()


if __name__ == '__main__':
    main()

¹ https://github.com/geomstats/ftmal-paper

B.2 Curvature
This metric was already used in Example 4.13. From the structure constants and Equation (4.4), we can compute the associated Christoffel symbols at the identity for the frame (ẽ1 , . . . , ẽ6 ). Let τ = √β + 1/√β. We obtain
\[
\Gamma^k_{ij} = \frac{1}{2\sqrt{2}} \quad \text{if $ijk$ is a cycle of $(1,2,3)$,} \tag{B.1}
\]
\[
\Gamma^6_{15} = -\Gamma^5_{16} = -\frac{2}{\tau}\,\Gamma^6_{24} = \frac{2}{\tau}\,\Gamma^4_{26} = \frac{2}{\tau}\,\Gamma^5_{34} = -\frac{2}{\tau}\,\Gamma^4_{35} = \frac{1}{\sqrt{2}}, \tag{B.2}
\]
and all the others are null.

Lemma B.2.1. (SE(3), g) is locally symmetric, i.e. ∇R = 0, if and only if β = 1.

We can now prove Lemma B.2.1. The result is in fact valid in any dimension d ≥ 2, provided that the metric matrix G is diagonal, of size d(d + 1)/2, with ones everywhere except one coefficient of the translation part.

Proof. For β = 1, (SE(d), g) is isometric to (SO(d) × Rd , grot ⊕ gtrans ). As the product of two symmetric spaces is again symmetric, (SE(d), g) is symmetric.
We prove the contrapositive of the necessary condition. Let β ≠ 1. We exhibit i, j, k, l such that (∇ei R)(ej , ek )el ≠ 0:
\[
\begin{aligned}
(\nabla_{e_3} R)(e_3, e_2)e_4 &= \nabla_{e_3}\big(R(e_3, e_2)e_4\big) - R(e_3, \nabla_{e_3} e_2)e_4 - R(e_3, e_2)\nabla_{e_3} e_4 \\
&= \nabla_{e_3}\big(R(e_3, e_2)e_4\big) + \frac{1}{\sqrt{2}}\, R(e_3, e_1)e_4 - \frac{\tau}{2\sqrt{2}}\, R(e_3, e_2)e_5.
\end{aligned}
\]

From the Christoffel symbols above,
\[
\begin{aligned}
R(e_3, e_2)e_4 &= \nabla_{e_3}\nabla_{e_2} e_4 - \nabla_{e_2}\nabla_{e_3} e_4 - \nabla_{[e_3, e_2]} e_4 \\
&= -\frac{\tau}{2\sqrt{2}}\, \nabla_{e_3} e_6 - \frac{\tau}{2\sqrt{2}}\, \nabla_{e_2} e_5 + \frac{1}{\sqrt{2}}\, \nabla_{e_1} e_4 = 0, \\
R(e_3, e_1)e_4 &= \nabla_{e_3}\nabla_{e_1} e_4 - \nabla_{e_1}\nabla_{e_3} e_4 - \nabla_{[e_3, e_1]} e_4 \\
&= -\frac{\tau}{2\sqrt{2}}\, \nabla_{e_1} e_5 - \frac{1}{\sqrt{2}}\, \nabla_{e_2} e_4 = -\frac{\tau}{4}\, e_6 + \frac{\tau}{4}\, e_6 = 0, \\
R(e_3, e_2)e_5 &= \nabla_{e_3}\nabla_{e_2} e_5 - \nabla_{e_2}\nabla_{e_3} e_5 - \nabla_{[e_3, e_2]} e_5 \\
&= \frac{\tau}{2\sqrt{2}}\, \nabla_{e_2} e_4 + \frac{1}{\sqrt{2}}\, \nabla_{e_1} e_5 = -\frac{\tau^2}{8}\, e_6 + \frac{1}{2}\, e_6 = \frac{1}{2}\Big(1 - \frac{\tau^2}{4}\Big) e_6.
\end{aligned}
\]

And therefore
\[
\beta \neq 1 \implies \tau = \sqrt{\beta} + \frac{1}{\sqrt{\beta}} \neq 2 \implies (\nabla_{e_3} R)(e_3, e_2)e_4 = -\frac{\tau}{4\sqrt{2}}\Big(1 - \frac{\tau^2}{4}\Big) e_6 \neq 0,
\]
which proves Lemma B.2.1.
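The final implication can be sanity-checked numerically: by the arithmetic-geometric mean inequality, τ(β) = √β + 1/√β ≥ 2 with equality exactly when β = 1, so the scalar factor −(τ/(4√2))(1 − τ²/4) in front of e6 vanishes only at β = 1. A quick check (not part of the monograph's code):

```python
import numpy as np

def tau(beta):
    """tau(beta) = sqrt(beta) + 1/sqrt(beta) >= 2, with equality iff beta = 1 (AM-GM)."""
    return np.sqrt(beta) + 1.0 / np.sqrt(beta)

betas = np.array([0.25, 0.5, 1.0, 2.0, 5.0])
taus = tau(betas)
# factor in front of e6 at the end of the proof: vanishes iff tau == 2, i.e. iff beta == 1
obstruction = -taus / (4 * np.sqrt(2)) * (1 - taus ** 2 / 4)
```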

B.3 One parameter subgroups


Note that one-parameter subgroups do not depend on the choice of the metric.
import matplotlib.pyplot as plt

import geomstats.backend as gs
import geomstats.visualization as visualization
from geomstats.geometry.special_euclidean import SpecialEuclidean

SE2_GROUP = SpecialEuclidean(n=2, point_type='matrix')


N_STEPS = 30
end_time = 2.7

theta = gs.pi / 3
initial_tangent_vecs = gs.array([
[[0., - theta, 2], [theta, 0., 2], [0., 0., 0.]],
[[0., - theta, 1.2], [theta, 0., 1.2], [0., 0., 0.]],
[[0., - theta, 1.6], [theta, 0., 1.6], [0., 0., 0.]]])
t = gs.linspace(-end_time, end_time, N_STEPS + 1)

fig = plt.figure(figsize=(6, 6))
for tv, col in zip(initial_tangent_vecs, ['black', 'y', 'g']):
    tangent_vec = gs.einsum('t,ij->tij', t, tv)
    group_geo_points = SE2_GROUP.exp(tangent_vec)
    ax = visualization.plot(
        group_geo_points, space='SE2_GROUP', color=col)
ax = visualization.plot(
    gs.eye(3)[None, :, :], space='SE2_GROUP', color='slategray')
ax.set_aspect('equal')
ax.axis("off")
plt.savefig('../figures/exponential_se2.eps')
plt.show()

References
Ablin, P., G. Peyré, and T. Moreau (2020). “Super-Efficiency of Automatic Differentiation for
Functions Defined as a Minimum”. In: Proceedings of the 37th International Conference
on Machine Learning. Ed. by H. Daumé III and A. Singh. Vol. 119. Proceedings of Machine
Learning Research. PMLR. 32–41. url: https://proceedings.mlr.press/v119/ablin20a.
html.
Absil, P.-A., R. Mahony, and R. Sepulchre (2010). “Optimization On Manifolds: Methods and
Applications”. In: Recent Advances in Optimization and Its Applications in Engineering.
Ed. by M. Diehl, F. Glineur, E. Jarlebring, and W. Michiels. Berlin, Heidelberg: Springer.
125–144. doi: 10.1007/978-3-642-12598-0_12.
Absil, P.-A. and J. Malick (2012). “Projection-like Retractions on Matrix Manifolds”. SIAM
Journal on Optimization. 22(1): 135–158. doi: 10.1137/100802529.
Afsari, B. (2011). “Riemannian Lp Center of Mass: Existence, Uniqueness, and Convexity”.
Proceedings of the American Mathematical Society. 139(2): 655–673. doi: 10.1090/S0002-
9939-2010-10541-5.
Arnold, V. (1966). “Sur la géométrie différentielle des groupes de Lie de dimension infinie et
ses applications à l’hydrodynamique des fluides parfaits”. Annales de l’institut Fourier.
16(1): 319–361. doi: 10.5802/aif.233.
Atkin, C. J. (1975). “The Hopf-Rinow Theorem Is False in Infinite Dimensions”. Bulletin
of the London Mathematical Society. 7(3): 261–266. doi: 10.1112/blms/7.3.261.
Balzin, E. (2021). “Lectures for MAA 306 Course “Topics in Differential Geometry””. June:
102. url: http://eduard-balzin.perso.math.cnrs.fr/Differential_geometry_lecture_
notes_070621.pdf.
Barachant, A. (2015). “pyRiemann v0.2.2”. Zenodo. doi: 10.5281/zenodo.593816.
Barczyk, M., S. Bonnabel, J.-E. Deschaud, and F. Goulette (2015). “Invariant EKF Design for
Scan Matching-Aided Localization”. IEEE Transactions on Control Systems Technology.
23(6): 2440–2448. doi: 10.1109/TCST.2015.2413933.
Barp, A., A. Kennedy, and M. Girolami (2019). “Hamiltonian Monte Carlo on Symmetric
and Homogeneous Spaces via Symplectic Reduction”. doi: 10.48550/arXiv.1903.02699.
Becigneul, G. and O. Ganea (2019). “Riemannian Adaptive Optimization Methods”. In: 7th
International Conference on Learning Representations (ICLR). url: https://openreview.
net/pdf?id=r1eiqi09K7.
Bendokat, T., R. Zimmermann, and P.-A. Absil (2020). “A Grassmann Manifold Handbook:
Basic Geometry and Computational Aspects”. Technical Report No. UCL-INMA-2020.07.
arXiv: 2011.13699.
Berger, M. (2003). A Panoramic View of Riemannian Geometry. Berlin, Heidelberg: Springer.
doi: 10.1007/978-3-642-18245-7.

Bhatia, R., T. Jain, and Y. Lim (2019). “On the Bures–Wasserstein Distance between
Positive Definite Matrices”. Expositiones Mathematicae. 37(2): 165–191. doi: 10.1016/j.
exmath.2018.01.002.
Bhattacharya, R. and V. Patrangenaru (2005). “Large Sample Theory of Intrinsic and
Extrinsic Sample Means on Manifolds—II”. The Annals of Statistics. 33(3): 1225–1259.
doi: 10.1214/009053605000000093.
Boumal, N. (2023). An Introduction to Optimization on Smooth Manifolds. Cambridge: Cambridge University Press. url: https://www.cambridge.org/core/books/an-introduction-to-optimization-on-smooth-manifolds/EAF2B35457B7034AC747188DC2FFC058 (accessed on 01/06/2023).
Cartan, E. (1926). “Sur une classe remarquable d’espaces de Riemann”. Bulletin de la
Société Mathématique de France. 54: 214–264. doi: 10.24033/bsmf.1105.
Cartan, E. (1930). La théorie des groupes finis et continus et l’analysis situs. Vol. 42.
Mémorial des sciences mathématiques. Gauthier-Villars et cie. url: https://books.
google.fr/books?id=nv8SAQAAMAAJ.
Cendra, H., D. D. Holm, J. E. Marsden, and T. S. Ratiu (1998). “Lagrangian Reduction,
the Euler-Poincaré Equations, and Semidirect Products”. In: Geometry of Differential
Equations. Vol. 186. American Mathematical Society Translations: Series 2. American
Mathematical Society. 1–25. doi: 10.1090/trans2/186.
Censi, A. (2012). “PyGeometry: Library for Handling Various Differentiable Manifolds.”
url: https://github.com/AndreaCensi/geometry.
“Chapter 3 Homogeneous Spaces” (1975). In: Comparison Theorems in Riemannian Geom-
etry. Ed. by J. Cheeger and D. G. Ebin. Vol. 9. North-Holland Mathematical Library.
Elsevier. 55–79. doi: 10.1016/S0924-6509(08)70208-3.
Chevallier, E. and N. Guigui (2020). “A Bi-Invariant Statistical Model Parametrized by
Mean and Covariance on Rigid Motions”. Entropy. 22(4): 432. doi: 10.3390/e22040432.
Chikuse, Y. (2003). Statistics on Special Manifolds. Lecture Notes in Statistics. New York:
Springer-Verlag. doi: 10.1007/978-0-387-21540-2.
Cohen, T. S., M. Geiger, and M. Weiler (2019). “A General Theory of Equivariant CNNs
on Homogeneous Spaces”. In: Advances in Neural Information Processing Systems.
Vol. 32. Curran Associates, Inc. url: https://proceedings.neurips.cc/paper/2019/hash/
b9cfe8b6042cf759dc4c0cccb27a6737-Abstract.html (accessed on 01/06/2023).
David, P. and W. Gu (2019). “A Riemannian Structure for Correlation Matrices”. Operators
and Matrices. 13(3): 607–627. doi: 10.7153/oam-2019-13-46.
Diaconis, P., S. Holmes, and M. Shahshahani (2013). “Sampling from a Manifold”. Advances
in Modern Statistical Theory and Applications: A Festschrift in honor of Morris L.
Eaton. Jan.: 102–125. doi: 10.1214/12-IMSCOLL1006.

Dryden, I. L., A. Koloydenko, and D. Zhou (2009). “Non-Euclidean Statistics for Covariance
Matrices, with Applications to Diffusion Tensor Imaging”. Annals of Applied Statistics.
3(3): 1102–1123. doi: 10.1214/09-AOAS249.
Dryden, I. L. and K. V. Mardia (2016). Statistical Shape Analysis: With Applications
in R. Wiley Series in Probability and Statistics. John Wiley & Sons. doi: 10.1002/
9781119072492.
Eaton, M. L. (1983). Multivariate Statistics: A Vector Space Approach. Probability and
Statistics Series. Wiley. url: https://books.google.fr/books?id=1CvvAAAAMAAJ.
Fey, M. and J. E. Lenssen (2019). “Fast Graph Representation Learning with PyTorch
Geometric”. Apr. arXiv: 1903.02428.
Fletcher, P., C. Lu, S. Pizer, and S. Joshi (2004). “Principal Geodesic Analysis for the
Study of Nonlinear Statistics of Shape”. IEEE Transactions on Medical Imaging. 23(8):
995–1005. doi: 10.1109/TMI.2004.831793.
Fréchet, M. (1948). “Les éléments aléatoires de nature quelconque dans un espace distancié”.
Annales de l’institut Henri Poincaré. 10(4): 215–310. url: http://www.numdam.org/
item/AIHP_1948__10_4_215_0/ (accessed on 05/20/2021).
Gallier, J. and J. Quaintance (2020). Differential Geometry and Lie Groups: A Computa-
tional Perspective. Geometry and Computing. Springer International Publishing. doi:
10.1007/978-3-030-46040-2.
Gawlik, E. S. and M. Leok (2018). “Interpolation on Symmetric Spaces Via the Generalized
Polar Decomposition”. Foundations of Computational Mathematics. 18(3): 757–788. doi:
10.1007/s10208-017-9353-0.
Gay-Balmaz, F., D. D. Holm, D. M. Meier, T. S. Ratiu, and F.-X. Vialard (2012). “Invariant
Higher-Order Variational Problems II”. Journal of Nonlinear Science. 22(4): 553–597.
doi: 10.1007/s00332-012-9137-2.
Guigui, N., E. Maignant, A. Trouvé, and X. Pennec (2021). “Parallel Transport on Kendall
Shape Spaces”. In: GSI 2021 - 5th Conference on Geometric Science of Information.
Vol. 12829. Lecture Notes in Computer Science. Springer, Cham. 103–110. doi: 10.1007/
978-3-030-80209-7_12.
Guigui, N. and X. Pennec (2021). “A Reduced Parallel Transport Equation on Lie Groups
with a Left-Invariant Metric”. In: GSI 2021 - 5th Conference on Geometric Science of
Information. Vol. 12829. Lecture Notes in Computer Science. Springer, Cham. 119–126.
doi: 10.1007/978-3-030-80209-7_14.
Hauberg, S. (2018). “Directional Statistics with the Spherical Normal Distribution”. In:
2018 21st International Conference on Information Fusion (FUSION). 704–711. doi:
10.23919/ICIF.2018.8455242.
Hauberg, S. (2019). “Only Bayes Should Learn a Manifold (on the Estimation of Differential
Geometric Structure from Data)”. Sept. arXiv: 1806.04994.

Helgason, S. (1979). Differential Geometry, Lie Groups, and Symmetric Spaces. Vol. 80.
Pure and Applied Mathematics. Academic Press. url: https://www.sciencedirect.com/
bookseries/pure-and-applied-mathematics/vol/80/suppl/C.
Huckemann, S., T. Hotz, and A. Munk (2010). “Intrinsic Shape Analysis: Geodesic Pca for
Riemannian Manifolds Modulo Isometric Lie Group Actions”. Statistica Sinica. 20(1):
1–58. url: https://www.jstor.org/stable/24308976 (accessed on 01/09/2023).
Karcher, H. (1977). “Riemannian Center of Mass and Mollifier Smoothing”. Communications
on Pure and Applied Mathematics. 30(5): 509–541. doi: 10.1002/cpa.3160300502.
Karcher, H. (2014). “Riemannian Center of Mass and so Called Karcher Mean”. July. arXiv:
1407.2087.
Kendall, W. S. (1990). “Probability, Convexity, and Harmonic Maps with Small Image
I: Uniqueness and Fine Existence”. Proceedings of the London Mathematical Society.
s3-61(2): 371–406. doi: 10.1112/plms/s3-61.2.371.
Kendall, W. S. and H. Le (2009). “Statistical Shape Theory”. In: New Perspectives in
Stochastic Geometry. Ed. by W. S. Kendall and I. Molchanov. Oxford University Press.
348–373. doi: 10.1093/acprof:oso/9780199232574.003.0010.
Kim, K.-R., I. L. Dryden, H. Le, and K. E. Severn (2021). “Smoothing Splines on Riemannian
Manifolds, with Applications to 3D Shape Space”. Journal of the Royal Statistical Society:
Series B (Statistical Methodology). 83(1): 108–132. doi: 10.1111/rssb.12402.
Kobayashi, S. and K. Nomizu (1996a). Foundations of Differential Geometry, Volume 1.
Wiley Classics Library. Wiley.
Kobayashi, S. and K. Nomizu (1996b). Foundations of Differential Geometry, Volume 2.
Wiley.
Kochurov, M., R. Karimov, and S. Kozlukov (2020). “Geoopt: Riemannian Optimization in
PyTorch”. doi: 10.48550/arXiv.2005.02819. arXiv: 2005.02819 [cs].
Kolev, B. (2004). “Lie Groups and Mechanics: An Introduction”. Journal of Nonlinear
Mathematical Physics. 11(4): 480–498. doi: 10.2991/jnmp.2004.11.4.5.
Kühnel, L. and S. Sommer (2017). “Computational Anatomy in Theano”. Graphs in
Biomedical Image Analysis, Computational Anatomy and Imaging Genetics. Sept.: 164–
176. doi: 10.1007/978-3-319-67675-3_15.
Kühnel, L., S. Sommer, and A. Arnaudon (2019). “Differential Geometry and Stochastic
Dynamics with Deep Learning Numerics”. Applied Mathematics and Computation.
356(Sept.): 411–437. doi: 10.1016/j.amc.2019.03.044.
Lafontaine, J., S. Gallot, and D. Hulin (2004). Riemannian Geometry. Universitext. Springer
Verlag. doi: 10.1007/978-3-642-18855-8.
Le, H. and D. G. Kendall (1993). “The Riemannian Structure of Euclidean Shape Spaces:
A Novel Environment for Statistics”. The Annals of Statistics. 21(3): 1225–1271. url:
https://www.jstor.org/stable/2242196 (accessed on 11/09/2020).
Lee, J. M. (2003). Introduction to Smooth Manifolds. Vol. 218. Graduate Texts in Mathemat-
ics. New York, NY: Springer Science & Business Media. doi: 10.1007/978-1-4419-9982-5.
Lee, J. M. (2018). Introduction to Riemannian Manifolds. Vol. 176. Graduate Texts in
Mathematics. Cham: Springer International Publishing. doi: 10.1007/978-3-319-91755-9.
Lorenzi, M. and X. Pennec (2013). “Geodesics, Parallel Transport & One-Parameter
Subgroups for Diffeomorphic Image Registration”. International Journal of Computer
Vision. 105(2): 111–127. doi: 10.1007/s11263-012-0598-4.
Louis, M., R. Couronné, I. Koval, B. Charlier, and S. Durrleman (2019). “Riemannian
Geometry Learning for Disease Progression Modelling”. In: Information Processing in
Medical Imaging. Springer, Cham. 542–553. doi: 10.1007/978-3-030-20351-1_42.
Mardia, K. V. and P. E. Jupp (2009). Directional Statistics. Wiley Series in Probability and
Statistics. John Wiley & Sons. doi: 10.1002/9780470316979.
Marle, C.-M. (2005). “The Works of Charles Ehresmann on Connections: From Cartan
Connections to Connections on Fibre Bundles”. In: Geometry and Topology of Manifolds.
Vol. 76. Bedlewo, Poland. 65–86. url: https://hal.archives-ouvertes.fr/hal-00940427
(accessed on 05/08/2021).
Marron, J. and I. Dryden (2021). Object Oriented Data Analysis. Chapman & Hall/CRC
Monographs on Statistics & Applied Probability. CRC Press. url: https://books.google.fr/books?id=X2yjzgEACAAJ.
Marsden, J. E. and T. S. Ratiu (2009). “Mechanical Systems: Symmetries and Reduction”.
In: Encyclopedia of Complexity and Systems Science. Ed. by R. A. Meyers. New York,
NY: Springer. 5482–5510. doi: 10.1007/978-0-387-30440-3_326.
McAlpin, J. H. (1965). “Infinite Dimensional Manifolds and Morse Theory”. PhD thesis.
Columbia University.
Meghwanshi, M., P. Jawanpuria, A. Kunchukuttan, H. Kasai, and B. Mishra (2018).
“McTorch, a Manifold Optimization Library for Deep Learning”. doi: 10.48550/arXiv.
1810.01811.
Milnor, J. (1976). “Curvatures of Left Invariant Metrics on Lie Groups”. Advances in
Mathematics. 21(3): 293–329. doi: 10.1016/S0001-8708(76)80002-3.
Milnor, J. (1984). “Remarks on Infinite-Dimensional Lie Groups”. Relativity, groups and
topology. 2. url: http://inis.iaea.org/Search/search.aspx?orig_q=RN:16043191
(accessed on 07/06/2021).
Miolane, N. and X. Pennec (2015). “Computing Bi-Invariant Pseudo-Metrics on Lie Groups
for Consistent Statistics”. Entropy. 17(4). doi: 10.3390/e17041850.
Miolane, N., N. Guigui, H. Zaatiti, C. Shewmake, H. Hajri, D. Brooks, A. Le Brigant, J.
Mathe, B. Hou, Y. Thanwerdas, S. Heyder, O. Peltre, N. Koep, Y. Cabanes, T. Gerald,
P. Chauchat, B. Kainz, C. Donnat, S. Holmes, and X. Pennec (2020). “Introduction
to Geometric Learning in Python with Geomstats”. In: SciPy 2020 - 19th Python in
Science Conference. 48. doi: 10.25080/Majora-342d178e-007.
Munthe-Kaas, H. Z., G. R. W. Quispel, and A. Zanna (2014). “Symmetric Spaces and
Lie Triple Systems in Numerical Analysis of Differential Equations”. BIT Numerical
Mathematics. 54(1): 257–282. doi: 10.1007/s10543-014-0473-5.
Myers, S. B. and N. E. Steenrod (1939). “The Group of Isometries of a Riemannian
Manifold”. Annals of Mathematics. 40(2): 400–416. doi: 10.2307/1968928.
Nava-Yazdani, E., H.-C. Hege, T. J. Sullivan, and C. von Tycowicz (2020). “Geodesic
Analysis in Kendall’s Shape Space with Epidemiological Applications”. Journal of
Mathematical Imaging and Vision. 62(4): 549–559. doi: 10.1007/s10851-020-00945-w.
Nickel, M. and D. Kiela (2017). “Poincaré Embeddings for Learning Hierarchical Represen-
tations”. In: Advances in Neural Information Processing Systems 30. Ed. by I. Guyon,
U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett.
Curran Associates, Inc. 6338–6347. url: http://papers.nips.cc/paper/7213-poincare-embeddings-for-learning-hierarchical-representations.pdf.
Nomizu, K. (1954). “Invariant Affine Connections on Homogeneous Spaces”. American
Journal of Mathematics. 76(1): 33. doi: 10.2307/2372398.
O’Neill, B. (1966). “The Fundamental Equations of a Submersion”. Michigan Mathematical
Journal. 13(4): 459–469. doi: 10.1307/mmj/1028999604.
Paulin, F. (2007). “Géométrie différentielle élémentaire”. url: https://www.imo.universite-paris-saclay.fr/~paulin/notescours/cours_geodiff.pdf.
Paulin, F. (2014). “Géométrie Riemannienne, cours de second année de mastère”. url: https://www.imo.universite-paris-saclay.fr/~paulin/notescours/cours_georiem.pdf.
Pennec, X. (2006). “Intrinsic Statistics on Riemannian Manifolds: Basic Tools for Geometric
Measurements”. Journal of Mathematical Imaging and Vision. 25(1): 127–154. doi:
10.1007/s10851-006-6228-4.
Pennec, X. and V. Arsigny (2013). “Exponential Barycenters of the Canonical Cartan
Connection and Invariant Means on Lie Groups”. In: Matrix Information Geometry.
Ed. by F. Nielsen and R. Bhatia. Berlin, Heidelberg: Springer Berlin Heidelberg. 123–166.
doi: 10.1007/978-3-642-30232-9_7.
Pennec, X., P. Fillard, and N. Ayache (2006). “A Riemannian Framework for Tensor
Computing”. International Journal of Computer Vision. 66(1): 41–66. doi: 10.1007/
s11263-005-3222-z.
Pennec, X., S. Sommer, and T. Fletcher, eds. (2020). Riemannian Geometric Statistics in
Medical Image Analysis. Vol. 3. The Elsevier and MICCAI Society Book Series. Elsevier.
doi: 10.1016/C2017-0-01561-6.
Postnikov, M. M. (2001). Geometry VI: Riemannian Geometry. Encyclopaedia of Mathematical Sciences, Geometry. Berlin Heidelberg: Springer-Verlag. url: https://www.springer.com/us/book/9783540411086 (accessed on 12/21/2018).
Riemann, B. (1868). “Über Die Hypothesen, Welche Der Geometrie Zu Grunde Liegen.
(Mitgetheilt Durch r. Dedekind)”. Abhandlungen der Königlichen Gesellschaft der
Wissenschaften in Göttingen. 13: 133–152. url: http://eudml.org/doc/135760.
Riemann, B. (1873). “On the Hypotheses Which Lie at the Bases of Geometry”. Nature.
8(184): 36–37. doi: 10.1038/008036a0.
Said, S., H. Hajri, L. Bombrun, and B. C. Vemuri (2018). “Gaussian Distributions on Rie-
mannian Symmetric Spaces: Statistical Learning With Structured Covariance Matrices”.
IEEE Transactions on Information Theory. 64(2): 752–772. doi: 10.1109/TIT.2017.
2713829.
Schmid, R. (2004). “Infinite Dimensional Lie Groups with Applications to Mathematical
Physics”. Journal of Geometry and Symmetry in Physics. 1(Jan.): 54–120. doi: 10.7546/
jgsp-1-2004-54-120.
Smirnov, O. (2021). “TensorFlow RiemOpt: A Library for Optimization on Riemannian
Manifolds”. July. arXiv: 2105.13921.
Terras, A. (1988). Harmonic Analysis on Symmetric Spaces and Applications, Vol. II. New
York: Springer-Verlag. doi: 10.1007/978-1-4612-3820-1.
Thanwerdas, Y. (2022). “Riemannian and Stratified Geometries on Covariance and Correlation Matrices”. PhD thesis. Université Côte d’Azur. url: https://hal.archives-ouvertes.fr/tel-03698752.
Thanwerdas, Y. and X. Pennec (2021). “Geodesics and Curvature of the Quotient-Affine
Metrics on Full-Rank Correlation Matrices”. In: GSI 2021 - 5th Conference on Geometric
Science of Information. Proceedings of Geometric Science of Information. Paris, France.
doi: 10.1007/978-3-030-80209-7_11.
Thanwerdas, Y. and X. Pennec (2022). “O(n)-Invariant Riemannian Metrics on SPD
Matrices”. Linear Algebra and its Applications. Dec. doi: 10.1016/j.laa.2022.12.009.
Townsend, J., N. Koep, and S. Weichwald (2016). “Pymanopt: A Python Toolbox for
Optimization on Manifolds Using Automatic Differentiation”. Journal of Machine
Learning Research. 17(137): 1–5. doi: 10.5555/2946645.3007090.
Whitney, H. (1936). “Differentiable Manifolds”. Annals of Mathematics. 37(3): 645–680.
doi: 10.2307/1968482.
Wood, A. T. A. (1994). “Simulation of the von Mises Fisher Distribution”. Communications in Statistics - Simulation and Computation. 23(1): 157–164. doi: 10.1080/03610919408813161.
Wynn, K. (2014). “PyQuaternions: A Fully Featured, Pythonic Library for Representing
and Using Quaternions”. url: https://github.com/KieranWynn/pyquaternion.
Yair, O., M. Ben-Chen, and R. Talmon (2019). “Parallel Transport on the Cone Manifold
of SPD Matrices for Domain Adaptation”. In: IEEE Transactions on Signal Processing.
Vol. 67. 1797–1811. doi: 10.1109/TSP.2019.2894801.
Zimmermann, R. (2017). “A Matrix-Algebraic Algorithm for the Riemannian Logarithm on
the Stiefel Manifold under the Canonical Metric”. SIAM Journal on Matrix Analysis
and Applications. 38(2): 322–342. doi: 10.1137/16M1074485.