
Intrinsic Statistics on Riemannian Manifolds: Basic Tools for Geometric Measurements


Xavier Pennec

To cite this version:


Xavier Pennec. Intrinsic Statistics on Riemannian Manifolds: Basic Tools for Geometric Measurements. Journal of Mathematical Imaging and Vision, 2006, 25 (1), pp.127-154. 10.1007/s10851-006-6228-4. inria-00614994

HAL Id: inria-00614994


https://inria.hal.science/inria-00614994
Submitted on 17 Aug 2011

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
This paper will appear in the Journal of Mathematical Imaging and Vision (Springer).
The original publication will be available at www.springerlink.com.

Intrinsic Statistics on Riemannian Manifolds:
Basic Tools for Geometric Measurements
Xavier Pennec
INRIA - Epidaure / Asclepios project-team
2004 Route des Lucioles, BP 93
F-06902 Sophia Antipolis Cedex, France
Email: [email protected]

Abstract
In medical image analysis and high level computer vision, there is an intensive use of geo-
metric features like orientations, lines, and geometric transformations ranging from simple ones
(orientations, lines, rigid body or affine transformations, etc.) to very complex ones like curves,
surfaces, or general diffeomorphic transformations. The measurement of such geometric prim-
itives is generally noisy in real applications and we need to use statistics either to reduce the
uncertainty (estimation), to compare observations, or to test hypotheses. Unfortunately, even
simple geometric primitives often belong to manifolds that are not vector spaces. In previous
works [1, 2], we investigated invariance requirements to build some statistical tools on transformation groups and homogeneous manifolds that avoid paradoxes. In this paper, we consider
finite dimensional manifolds with a Riemannian metric as the basic structure. Based on this
metric, we develop the notions of mean value and covariance matrix of a random element, nor-
mal law, Mahalanobis distance and χ2 law. We provide a new proof of the characterization of
Riemannian centers of mass and an original gradient descent algorithm to efficiently compute
them. The notion of Normal law we propose is based on the maximization of the entropy know-
ing the mean and covariance of the distribution. The resulting family of pdfs spans the whole
range from uniform (on compact manifolds) to the point mass distribution. Moreover, we were
able to provide tractable approximations (with their limits) for small variances which show that
we can effectively implement and work with these definitions.
1 Introduction
To represent the results of a random experiment, one theoretically considers a probability measure on the space of all events. Although this probabilized space contains all the information about the random experiment, one often has access only to some measurements depending on the outcome of the experiment. The mathematical way to formalize this is to investigate random variables or observables, which are maps from the probabilized space into R. One usually further simplifies by restricting to random variables that have a probability density function (pdf).
However, from a computational point of view, the pdf is still too informative and we have to restrict the measurements to a few numeric characteristics of a random variable. Thus, one usually approximates a unimodal pdf by a central value and a dispersion value around it. The most used central value is the mean value or expectation of the random variable: $\bar{x} = E[x] = \int x \, d\Pr = \int y \, p_x(y) \, dy$. The corresponding dispersion value is the variance $\sigma_x^2 = E[(x - \bar{x})^2]$.
In real problems, we can have several simultaneous measurements of the same random experiment. If we arrange these n random variables $x_i$ into a vector $x = (x_1, \ldots, x_n)$, we obtain a random vector. As the expectation is a linear operator, it is easily generalized to vector or matrix functions in order to define the mean value and the covariance matrix of a random vector: $\Sigma_{xx} = E[(x - \bar{x})(x - \bar{x})^T] = \int (y - \bar{x})(y - \bar{x})^T p_x(y) \, dy$. If one has to assume a probability distribution, the Gaussian distribution is usually well adapted, as it is completely determined by the mean and the covariance. It is moreover the entropy-maximizing distribution knowing only these moments. Then, one can use a statistical distance between distributions such as the Mahalanobis distance and the associated statistical tests.
The problem we investigate in this article is to generalize this framework to measurements in
finite dimensional Riemannian manifolds instead of measurements in a vector space. Examples of
manifolds we routinely use in medical imaging applications are 3D rotations, 3D rigid transforma-
tions, frames (a 3D point and an orthonormal trihedron), semi- or non-oriented frames (where 2
(resp. 3) of the trihedron unit vectors are given up to their sign) [3, 4], oriented or directed points
[5, 6], positive definite symmetric matrices coming from diffusion tensor imaging [7, 8, 9, 10, 11]
or from variability measurements [12]. We have already shown in [13, 2] that this is not an easy
problem and that some paradoxes can arise. In particular, we cannot generalize the expectation
to give a mean value since it would be an integral with value in the manifold: a new definition of
mean is needed, which implies revisiting an important part of the theory of statistics.
Statistical analysis on manifolds is a relatively new domain at the confluence of several mathematical and application domains. Its goal is to statistically study geometric objects living in differential manifolds. It is linked to the theory of statistical manifolds [14, 15], which aims at providing a Riemannian structure to the space of parameters of statistical distributions. However, the targeted geometrical objects are usually different. Directional statistics [16, 17, 18, 19] provide a first approach to statistics on manifolds. As the manifolds considered there are spheres and projective spaces, the tools developed were mostly extrinsic, i.e. relying on the embedding of the manifold in the ambient Euclidean space. More complex objects are obtained when we consider the "shape" of a set of k points, i.e. what remains invariant under the action of a given group of transformations (usually rigid body ones or similarities). The statistics on these shape spaces [20, 21, 22, 23] raised the need for intrinsic tools. However, the link between the tools developed in these works, the metric used and the space structure was not always very clear.
Another mathematical approach was provided by the study of stochastic processes on Lie groups. For instance, [24] derived central limit theorems on different families of groups and semi-groups with specific algebraic properties. Since then, several authors in the area of stochastic differential geometry and stochastic calculus on manifolds proposed results related to mean values [25, 26, 27, 28, 29, 30]. On the applied mathematics and computer science side, people became interested in computing and optimizing in specific manifolds, like rotations and rigid body transformations [4, 31, 32, 33, 34], Stiefel and Grassmann manifolds [35], etc.
Over the last years, several groups attempted to federate some of the above approaches in a
general statistical framework, with different objectives in mind. For instance, [36] and [15] aimed
at characterizing the performances of statistical parametric estimators, like the bias and the mean
square error. [36] considered extrinsic statistics, based on the Euclidean distance of the embedding
space, while [15] considered the intrinsic Riemannian distance, and refined the Cramer-Rao lower
bound using bounds on the sectional curvature of the manifold. In [37, 38], the authors focused on
the asymptotic consistency properties of the extrinsic and intrinsic means and variances for large
sample sizes, and were able to propose a central limit theorem for flat manifolds. Here, in view of
computer vision and medical image analysis applications, our concern is quite different: we aim at
developing computational tools that can consistently deal with geometric features, or that provide
at least good approximations. As we often have few measurements, we are interested in small sample sizes rather than large ones, and we prefer to obtain approximations rather than bounds on the quality of the estimation. Thus, one of our special interests is to develop Taylor expansions with
respect to the variance, in order to evaluate the quality of the computations with respect to the
curvature of the manifold. In all cases, the chosen framework is the one of geodesically complete
Riemannian manifolds, which appears to be powerful enough to support an interesting theory. To
ensure a maximal consistency of the theory, we rely in this paper only on intrinsic properties of
the Riemannian manifold, thus excluding methods based on the embedding of the manifold in an
ambient Euclidean space.
We review in Section 2 some basic notions of differential and Riemannian geometry that will
be needed afterward. This synthesis was inspired by [39, chap. 9], [40, 41, 42], and the reader can refer to these books for more details. In the remainder of the paper, we consider that
our Riemannian manifold is connected and geodesically complete. We first detail in Section 3 the
measure induced by the Riemannian metric on the manifold, which allows us to define probability
density functions, in particular the uniform one. Then, we turn in Section 4 to the expectation of a
random point. We provide a quite comprehensive survey of the definitions that have been proposed.
Among them, we focus on the Karcher and Fréchet means, defined as the set of points minimizing
locally or globally the variance (the expected Riemannian distance). We provide a new proof of
the barycentric characterization theorem and an original Gauss-Newton gradient descent algorithm
to practically compute the mean. Once the mean value is determined, one can easily define the
covariance matrix of a random point (and possibly higher order moments) using the exponential
map at the mean point (Section 5). To generalize the Gaussian distribution, we propose in Section
6 a new family of distributions based on a maximum entropy approach. Under some reasonable
hypotheses, we show that it amounts to taking a truncated Gaussian distribution in the exponential
map at the mean point. We illustrate the properties of this pdf family on the circle, and provide
computationally tractable approximations for concentrated distributions. Last but not least, we
investigate in Section 7 the generalization of the Mahalanobis distance and the χ2 law. A careful
analysis shows that, with our definition of the generalized Gaussian, the χ2 law remains independent
of the variance and of the manifold curvature, up to the order 3. This demonstrates that the whole
framework is computationally sound and particularly consistent.

2 Differential geometry background
2.1 Riemannian metric, distance and geodesics
In the geometric framework, one specifies the structure of a manifold M by a Riemannian metric. This is a continuous collection of dot products $\langle \cdot \,|\, \cdot \rangle_x$ on the tangent space $T_x M$ at each point x of the manifold. A local coordinate system $x = (x^1, \ldots, x^n)$ induces a basis $\partial_x = (\partial_1, \ldots, \partial_n)$ of the tangent spaces ($\partial_i$ is a shorter notation for $\partial/\partial x^i$). Thus, we can express the metric in this basis by a symmetric positive definite matrix $G(x) = [g_{ij}(x)]$, where each element is given by the dot product of the tangent vectors to the coordinate curves: $g_{ij}(x) = \langle \partial_i \,|\, \partial_j \rangle$. This matrix is called the local representation of the Riemannian metric in the chart x, and the dot product of two vectors v and w of $T_x M$ is then $\langle v \,|\, w \rangle_x = v^T G(x)\, w$.
If we consider a curve γ(t) on the manifold, we can compute at each point its instantaneous speed vector $\dot\gamma(t)$ and its norm, the instantaneous speed. To compute the length of the curve, we can proceed as usual by integrating this value along the curve:
$$L_a^b(\gamma) = \int_a^b \|\dot\gamma(t)\|\, dt = \int_a^b \langle \dot\gamma(t) \,|\, \dot\gamma(t) \rangle_{\gamma(t)}^{1/2}\, dt$$

The Riemannian metric is the intrinsic way of measuring length on a manifold. The extrinsic
way is to consider the manifold as embedded in a larger vector space E (think for instance of the
sphere S2 in R3 ) and compute the length of a curve in M as for any curve in E. In this case, the
corresponding Riemannian metric is the restriction of the dot product of E onto the tangent space
at each point of the manifold. By Whitney’s theorem, there always exists such an embedding for a
large enough vector space E (dim(E) ≤ 2dim(M) + 1).
To obtain a distance between two points of a connected Riemannian manifold, we simply have to take the minimum length among the smooth curves joining these points:
$$\mathrm{dist}(x, y) = \min_{\gamma} L(\gamma) \quad \text{with} \quad \gamma(0) = x \;\text{and}\; \gamma(1) = y \tag{1}$$

The curves realizing this minimum for any two points of the manifold are called geodesics¹. Let $[g^{ij}] = [g_{ij}]^{(-1)}$ be the inverse of the metric matrix (in a given coordinate system x) and $\Gamma^i_{jk} = \frac{1}{2}\, g^{im} (\partial_k g_{mj} + \partial_j g_{mk} - \partial_m g_{jk})$ the Christoffel symbols (using the Einstein summation convention that implicitly sums over each index appearing both up and down in the formula). The calculus of variations shows that the geodesics are the curves satisfying the following second order differential system (in the chart $x = (x^1, \ldots, x^n)$):
$$\ddot\gamma^i + \Gamma^i_{jk}\, \dot\gamma^j \dot\gamma^k = 0$$
The manifold is said to be geodesically complete if the definition domain of all geodesics can be extended to R. This means that the manifold has no boundary nor any singular point that we can reach in a finite time (for instance, $\mathbb{R}^n - \{0\}$ with the usual metric is not geodesically complete, but $\mathbb{R}^n$ or $S^n$ are). As an important consequence, the Hopf-Rinow-De Rham theorem states that such a manifold is complete for the induced distance (equation 1), and that there always exists at least one minimizing geodesic between any two points of the manifold (i.e. a geodesic whose length is the distance between the two points). From now on, we will assume that the manifold is geodesically complete.
¹In fact, geodesics are defined as the critical points of the energy functional $E(\gamma) = \frac{1}{2} \int_a^b \|\dot\gamma(t)\|^2\, dt$. It turns out that they also optimize the length functional, but they are moreover parameterized proportionally to arc length.
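To make the geodesic equation concrete, here is a small numerical sketch (not part of the original paper; it assumes NumPy and SciPy and uses the standard spherical parameterization of S², for which the only non-zero Christoffel symbols are Γ^θ_{φφ} = −sin θ cos θ and Γ^φ_{θφ} = Γ^φ_{φθ} = cot θ). Integrating the second order system produces great circles parameterized proportionally to arc length, which can be checked by monitoring the speed.

```python
import numpy as np
from scipy.integrate import solve_ivp

def geodesic_rhs(t, state):
    """Right-hand side of the geodesic system on S^2 in spherical coordinates (theta, phi)."""
    theta, phi, dtheta, dphi = state
    ddtheta = np.sin(theta) * np.cos(theta) * dphi ** 2             # = -Gamma^theta_{phi phi} dphi^2
    ddphi = -2.0 * (np.cos(theta) / np.sin(theta)) * dtheta * dphi  # = -2 Gamma^phi_{theta phi} dtheta dphi
    return [dtheta, dphi, ddtheta, ddphi]

# Shoot a geodesic from the equator (theta = pi/2, phi = 0) with an oblique initial velocity.
sol = solve_ivp(geodesic_rhs, (0.0, 2.0 * np.pi), [np.pi / 2, 0.0, 0.3, 0.95],
                max_step=1e-2, rtol=1e-9)

# The squared speed dtheta^2 + sin(theta)^2 dphi^2 must stay constant along a geodesic.
theta, phi, dtheta, dphi = sol.y
speed2 = dtheta ** 2 + np.sin(theta) ** 2 * dphi ** 2
print("speed variation along the curve:", speed2.max() - speed2.min())
```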

2.2 Exponential map and cut locus
From the theory of second order differential equations, we know that there exists one and only one geodesic $\gamma_{(x, \partial v)}$ going through the point $x \in M$ at t = 0 with tangent vector $\partial v \in T_x M$. This geodesic is theoretically defined in a sufficiently small interval around zero, but since the manifold is geodesically complete, its definition domain can be extended to R. Thus, the point $\gamma_{(x, \partial v)}(t)$ is defined for all vectors $\partial v \in T_x M$ and all parameters t. The exponential map maps each vector $\partial v$ to the point of the manifold reached in a unit time:
$$\exp_x : \; T_x M \longrightarrow M, \qquad \partial v \longmapsto \exp_x(\partial v) = \gamma_{(x, \partial v)}(1)$$
This function realizes a local diffeomorphism from a sufficiently small neighborhood of 0 in $T_x M$ into a neighborhood of the point $x \in M$. We denote by $\log_x = \exp_x^{(-1)}$ the inverse map, or simply $\overrightarrow{xy} = \log_x(y)$. In this chart, the geodesics going through x are represented by the lines going through the origin: $\log_x \gamma_{(x, \overrightarrow{xy})}(t) = t\, \overrightarrow{xy}$. Moreover, the distance with respect to the development point x is preserved:
$$\mathrm{dist}(x, y) = \|\overrightarrow{xy}\| = \left( \langle \overrightarrow{xy} \,|\, \overrightarrow{xy} \rangle_x \right)^{1/2}$$
Thus, the exponential chart at x can be seen as the development of the manifold in the tangent space at a given point along the geodesics. This is also called a normal coordinate system if it is provided with an orthonormal basis. At the origin of such a chart, the metric reduces to the identity matrix and the Christoffel symbols vanish.
Now, it is natural to search for the maximal domain where the exponential map is a diffeomorphism. If we follow a geodesic $\gamma_{(x, \partial v)}(t) = \exp_x(t\, \partial v)$ from t = 0 to infinity, it is either always minimizing all along, or it is minimizing up to a time $t_0 < \infty$ and not any more after (thanks to the geodesic completeness). In this last case, the point $z = \gamma_{(x, \partial v)}(t_0)$ is called a cut point and the corresponding tangent vector $t_0\, \partial v$ a tangential cut point. The set of all cut points of all geodesics starting from x is the cut locus $\mathcal{C}(x) \subset M$, and the set of corresponding tangent vectors is the tangential cut locus $C(x) \subset T_x M$. Thus, we have $\mathcal{C}(x) = \exp_x(C(x))$, and the maximal definition domain for the exponential chart is the domain $D(x)$ containing 0 and delimited by the tangential cut locus.
It is easy to see that this domain is connected and star-shaped with respect to the origin of $T_x M$. Its image by the exponential map covers all the manifold except the cut locus, and the segment $[0, \overrightarrow{xy}]$ is transformed into the unique minimizing geodesic from x to y. Hence, the exponential chart at x is a chart centered at x with a connected and star-shaped definition domain that covers all the manifold except the cut locus $\mathcal{C}(x)$:
$$D(x) \subset \mathbb{R}^n \;\longleftrightarrow\; M - \mathcal{C}(x), \qquad \overrightarrow{xy} = \log_x(y) \;\longleftrightarrow\; y = \exp_x(\overrightarrow{xy})$$
From a computational point of view, it is often interesting to extend this representation to include
the tangential cut locus. However, we have to take care of the multiple representations: points
in the cut locus where several minimizing geodesics meet are represented by several points on the
tangential cut locus as the geodesics are starting with different tangent vectors (e.g. rotation of
π around the axis ±n for 3D rotations, antipodal point on the sphere). This multiplicity problem
cannot be avoided as the set of such points is dense in the cut locus.
The size of this definition domain is quantified by the injectivity radius $i(M, x) = \mathrm{dist}(x, \mathcal{C}(x))$, which is the maximal radius of centered balls in $T_x M$ on which the exponential map is one-to-one. The injectivity radius of the manifold $i(M)$ is the infimum of the injectivity radius over the manifold. It may be zero, in which case the manifold somehow tends towards a singularity (think e.g. of the surface $z = 1/\sqrt{x^2 + y^2}$ as a sub-manifold of $\mathbb{R}^3$).

Example: On the sphere $S^n$ (center 0 and radius 1) with the canonical Riemannian metric (induced by the ambient Euclidean space $\mathbb{R}^{n+1}$), the geodesics are the great circles and the cut locus of a point x is its antipodal point $-x$. The exponential chart is obtained by rolling the sphere onto its tangent space so that the great circles going through x become lines. The maximal definition domain is thus the open ball $D = B_n(\pi)$. On its boundary $\partial D = C = S_{n-1}(\pi)$, all the points represent the antipodal point $-x$.

Figure 1: Exponential chart and cut locus for the sphere S2 and the projective space P2

For the real projective space $P^n$ (obtained by identification of antipodal points of the sphere $S^n$), the geodesics are still the great circles, but the cut locus of the point $\{x, -x\}$ is now the equator of these two points, on which antipodal points are still identified (thus the cut locus is $P^{n-1}$). The definition domain of the exponential chart is the open ball $D = B_n(\frac{\pi}{2})$, and the tangential cut locus is the sphere $\partial D = S_{n-1}(\frac{\pi}{2})$, where antipodal points are identified.
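For the sphere, both the exponential map and its inverse have simple closed forms when points are represented as unit vectors of $\mathbb{R}^{n+1}$: $\exp_x(v) = \cos(\|v\|)\, x + \sin(\|v\|)\, v/\|v\|$ and $\log_x(y) = \theta\, (y - \cos\theta\, x)/\|y - \cos\theta\, x\|$ with $\theta = \arccos(\langle x \,|\, y \rangle)$. The following sketch (an illustration, not the paper's code; the function names are ours and NumPy is assumed) implements these maps and checks them against the distance formula.

```python
import numpy as np

def sphere_exp(x, v, eps=1e-12):
    """Exponential map on S^n: shoot from x along the tangent vector v (v orthogonal to x)."""
    nrm = np.linalg.norm(v)
    if nrm < eps:
        return x.copy()
    return np.cos(nrm) * x + np.sin(nrm) * v / nrm

def sphere_log(x, y, eps=1e-12):
    """Log map on S^n: tangent vector at x pointing to y, defined outside the cut locus y = -x."""
    c = np.clip(np.dot(x, y), -1.0, 1.0)
    theta = np.arccos(c)                     # geodesic distance dist(x, y)
    w = y - c * x                            # component of y in the tangent space at x
    nrm = np.linalg.norm(w)
    if nrm < eps:                            # y == x (or y == -x: cut locus, direction undefined)
        return np.zeros_like(x)
    return theta * w / nrm

# Quick check: log inverts exp and preserves the distance dist(x, y) = arccos(<x|y>).
x = np.array([0.0, 0.0, 1.0])
v = np.array([0.4, -0.2, 0.0])
y = sphere_exp(x, v)
print(np.allclose(sphere_log(x, y), v), np.isclose(np.linalg.norm(v), np.arccos(np.dot(x, y))))
```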

2.3 Taylor expansion of a real function


Let f be a smooth function from M to R (an observable). Its gradient grad f at point x is the linear form on $T_x M$ corresponding to the directional derivatives $\partial_v$:
$$\forall v \in T_x M \qquad \mathrm{grad}\, f(v) = \partial_v f$$
Thanks to the dot product, we can identify the linear form $d\omega$ in the tangent space $T_x M$ with the vector $\omega$ such that $d\omega(v) = \langle \omega \,|\, v \rangle_x$ for all vectors $v \in T_x M$. All these notions can be extended to the whole manifold using vector fields: in a local chart x, we have $\partial_v f = \frac{\partial f}{\partial x}(x)\, v$ and $\langle \omega \,|\, v \rangle_x = \omega^T G(x)\, v$. Thus, the expression of the gradient in a chart is:
$$\mathrm{grad}\, f = G^{(-1)}(x)\, \frac{\partial f}{\partial x}^T = g^{ij}\, \partial_j f$$
This definition corresponds to the classical gradient in $\mathbb{R}^n$, even in the case of a non-orthonormal basis. The second covariant derivative (the Hessian) is a little bit more complex and makes use of the connection ∇. We just give here its expression in a local coordinate system:
$$\mathrm{Hess}\, f = \nabla df = \left( \partial_{ij} f - \Gamma^k_{ij}\, \partial_k f \right) dx^i\, dx^j$$
Let now $f_x$ be the expression of f in a normal coordinate system at x. Its Taylor expansion around the origin is:
$$f_x(v) = f_x(0) + J_{f_x}\, v + \frac{1}{2}\, v^T H_{f_x}\, v + O(\|v\|^3)$$
where $J_{f_x} = [\partial_i f]$ and $H_{f_x} = [\partial_{ij} f]$. Since we are in a normal coordinate system, we have $f_x(v) = f(\exp_x(v))$. Moreover, the metric at the origin reduces to the identity, $J_{f_x} = \mathrm{grad}\, f^T$, and the Christoffel symbols vanish, so that the matrix of second derivatives $H_{f_x}$ corresponds to the Hessian $\mathrm{Hess}\, f$. Thus, the Taylor expansion can be written in any coordinate system:
$$f(\exp_x(v)) = f(x) + \mathrm{grad}\, f(v) + \frac{1}{2}\, \mathrm{Hess}\, f(v, v) + O(\|v\|^3) \tag{2}$$

3 Random points on a Riemannian Manifold


In this paper, we are interested in measurements of elements of a Riemannian manifold that depend
on the outcome of a random experiment. Particular cases are given by random transformations and random features for the particular case of transformation groups and homogeneous manifolds.

Definition 1 (Random point on a Riemannian Manifold)


Let (Ω, B(Ω), Pr) be a probability space, B(Ω) being the Borel σ-algebra of Ω (i.e. the smallest
σ-algebra containing all the open subsets of Ω) and Pr a measure on that σ-algebra such that
Pr(Ω) = 1. A random point in the Riemannian manifold M is a Borel measurable function x = x(ω)
from Ω to M.

As in the real or vectorial case, we can now abstract away from the original space Ω and work directly with the induced probability measure on M.

3.1 Riemannian measure or volume form


In a vector space with basis $\mathcal{A} = (a_1, \ldots, a_n)$, the local representation of the metric is given by $G = A^T A$, where $A = [a_1, \ldots, a_n]$ is the matrix of coordinate change from $\mathcal{A}$ to an orthonormal basis. Similarly, the measure (or the infinitesimal volume element) is given by the volume of the parallelepiped spanned by the basis vectors: $dV = |\det(A)|\, dx = \sqrt{|\det(G)|}\, dx$. Assuming now a Riemannian manifold M, we can see that the Riemannian metric G(x) induces an infinitesimal volume element on each tangent space, and thus a measure on the manifold:
$$d\mathcal{M}(x) = \sqrt{|G(x)|}\, dx$$

One can show that the cut locus has a null measure. This means that we can integrate indifferently in M or in any exponential chart. If f is an integrable function of the manifold and $f_x(\overrightarrow{xy}) = f(\exp_x(\overrightarrow{xy}))$ is its image in the exponential chart at x, we have:
$$\int_M f(x)\, d\mathcal{M} = \int_{D(x)} f_x(\vec{z})\, \sqrt{|G_x(\vec{z})|}\, d\vec{z}$$

3.2 Probability density function


Definition 2 Let $\mathcal{B}(M)$ be the Borel σ-algebra of M. The random point x has a probability density function $p_x$ (a real, positive and integrable function) if:
$$\forall \mathcal{X} \in \mathcal{B}(M), \quad \Pr(x \in \mathcal{X}) = \int_{\mathcal{X}} p_x(y)\, d\mathcal{M}(y) \qquad \text{and} \qquad \Pr(M) = \int_M p_x(y)\, d\mathcal{M}(y) = 1$$

A simple example of a pdf is the uniform pdf on a bounded set $\mathcal{X}$:
$$p_{\mathcal{X}}(y) = \frac{1_{\mathcal{X}}(y)}{\int_{\mathcal{X}} d\mathcal{M}} = \frac{1_{\mathcal{X}}(y)}{\mathrm{Vol}(\mathcal{X})}$$
One must be careful that this pdf is uniform with respect to the measure dM and is not uniform for another measure on the manifold. This problem is the basis of the Bertrand paradox for geometrical probabilities [43, 44, 2] and raises the question of which measure to choose on the manifold. In our case, the measure is induced by the Riemannian metric, but the problem is only lifted: which Riemannian metric do we have to choose? For transformation groups and homogeneous manifolds, we showed in [1] that an invariant metric is a good geometric choice, even if such a metric does not always exist for homogeneous manifolds, or if it leads in general to only a partial consistency between the geometric and statistical operations in non-compact transformation groups [45].

3.3 Expression of the density in a chart


Let x be a random point of pdf $p_x$. If $x = \pi(x)$ is a chart of the manifold defined almost everywhere, we obtain a random vector $x = \pi(x)$ whose pdf $\rho_x$ is defined with respect to the Lebesgue measure dx in $\mathbb{R}^n$ instead of dM in M. Using the expression of the Riemannian measure, the two pdfs are related by
$$\rho_x(y) = p_x(y)\, \sqrt{|G(y)|}$$
We shall note that the density $\rho_x$ depends on the chart used, whereas the pdf $p_x$ is intrinsic to the manifold.

3.4 Expectation of an observable


Let ϕ(x) be a Borelian real-valued function defined on M and x a random point of pdf $p_x$. Then, ϕ(x) is a real random variable and we can compute its expectation:
$$E[\varphi(x)] = E_x[\varphi] = \int_M \varphi(y)\, p_x(y)\, d\mathcal{M}(y)$$
This notion of expectation corresponds to the one we defined for real random variables and vectors. However, we cannot directly extend it to define the mean value of the distribution, since we have no way to generalize this integral in R into an integral with values in the manifold.

4 Expectation or Mean of a Random point


We focus in this section on the notion of central value of a distribution. We will preferably use the denomination mean value or mean point rather than expected point to stress the difference between this notion and the expectation of a real function.

4.1 Fréchet expectation or mean value


Let x be a random vector of $\mathbb{R}^n$. Fréchet observed in [46, 47] that the variance $\sigma_x^2(y) = E[\mathrm{dist}(x, y)^2]$ is minimized for the mean vector $\bar{x} = E[x]$. The major point for the generalization is that the expectation of a real-valued function is well defined for our connected and geodesically complete Riemannian manifold M.

Definition 3 (Variance of a random point)
Let x be a random point of pdf $p_x$. The variance $\sigma_x^2(y)$ is the expectation of the squared distance between the random point and the fixed point y:
$$\sigma_x^2(y) = E[\mathrm{dist}(y, x)^2] = \int_M \mathrm{dist}(y, z)^2\, p_x(z)\, d\mathcal{M}(z) \tag{3}$$

Definition 4 (Fréchet expectation of a random point)
Let x be a random point. If the variance $\sigma_x^2(y)$ is finite for all points $y \in M$ (which is in particular true for a density with a compact support), every point $\bar{x}$ minimizing this variance is called an expected or mean point. Thus, the set of mean points is:
$$E[x] = \arg\min_{y \in M} E[\mathrm{dist}(y, x)^2]$$
If there exists at least one mean point $\bar{x}$, we call variance the minimal value $\sigma_x^2 = \sigma_x^2(\bar{x})$ and standard deviation its square root.

Similarly, one defines the empirical or discrete mean point of a set of measurements $x_1, \ldots, x_n$:
$$E[\{x_i\}] = \arg\min_{y \in M} E[\{\mathrm{dist}(y, x_i)^2\}] = \arg\min_{y \in M} \left( \frac{1}{n} \sum_i \mathrm{dist}(y, x_i)^2 \right)$$
If there exists at least one mean point $\bar{x}$, one calls empirical variance the minimal value $s^2 = \frac{1}{n} \sum_i \mathrm{dist}(\bar{x}, x_i)^2$, and empirical standard deviation or RMS (for Root Mean Square) its square root.
Following the same principle, one can define other types of central values. The mean deviation at order α is
$$\sigma_{x,\alpha}(y) = \left( E[\mathrm{dist}(y, x)^\alpha] \right)^{1/\alpha} = \left( \int_M \mathrm{dist}(y, z)^\alpha\, p_x(z)\, d\mathcal{M}(z) \right)^{1/\alpha}$$
If this function is bounded on M, one calls central point at order α every point $\bar{x}_\alpha$ minimizing it. For instance, the modes are obtained for α = 0. Exactly as in a vector space, they are the points where the density is maximal on the manifold (which is generally not a maximum for the density in the charts). The median point is obtained for α = 1. For α → ∞, we obtain the "barycenter" of the distribution support (which has to be compact).
The definition of these central values can easily be extended to the discrete case, except perhaps for the modes and for α → ∞. We note that the Fréchet expectation is defined for any metric space and not only for Riemannian manifolds.

4.2 Existence and uniqueness: Riemannian center of mass


As our mean point is the result of a minimization, its existence is not ensured (the global minimum could be unreachable) and anyway the result is a set and no longer a single element. This has to be compared with some central values in vector spaces, for instance the modes. However, the Fréchet expectation does not define all the modes, even in vector spaces: one only keeps the modes of maximal intensity.
To get rid of this constraint, Karcher [25] proposed to consider the local minima of the variance $\sigma_x^2(y)$ (equation 3) instead of the global ones. We call this new set of means the Riemannian centers of mass. As global minima are local minima, the Fréchet expected points are a subset of the Riemannian centers of mass. However, the use of local minima allows us to characterize the Riemannian centers of mass using only local derivatives of order two.
Using this extended definition, Karcher [25] and Kendall [48] established conditions on the manifold and the distribution to ensure the existence and uniqueness of the mean. We just recall here the results without the proofs.

Definition 5 (Regular geodesic balls) The ball $B(x, r) = \{y \in M \,/\, \mathrm{dist}(x, y) < r\}$ is said to be geodesic if it does not meet the cut locus of its center. This means that there exists a unique minimizing geodesic from the center to any point of a geodesic ball. The ball is said to be regular if its radius verifies $2r\sqrt{\kappa} < \pi$, where κ is the maximum of the Riemannian curvature in this ball.

For instance, on the sphere $S^2$ with radius one, the curvature is constant and equal to 1. A geodesic ball is regular if r < π/2. Such a ball can almost cover a hemisphere, but not the equator. In a Riemannian manifold with non-positive curvature, a regular geodesic ball can cover the whole manifold (according to the Cartan-Hadamard theorem, such a manifold is diffeomorphic to $\mathbb{R}^n$ if it is simply connected and complete).

Theorem 1 (Existence and uniqueness of the Riemannian center of mass)
Let x be a random point of pdf $p_x$.

• Kendall [48]: If the support of $p_x$ is included in a regular geodesic ball B(y, r), then there exists one and only one Riemannian center of mass $\bar{x}$ on this ball.

• Karcher [25]: If the support of $p_x$ is included in a geodesic ball B(y, r) and if the ball of double radius B(y, 2r) is still geodesic and regular, then the variance $\sigma_x^2(z)$ is a convex function of z and has only one critical point on B(y, r), necessarily the Riemannian center of mass.

These conditions ensure a correct behavior of the mean for sufficiently localized distributions. However, they are quite restrictive as they only address pdfs with a compact support. Kendall's existence and uniqueness theorem was extended by [30] to distributions with non-compact support in manifolds with Ψ-convexity. This notion, already introduced by Kendall in his original proof, is here extended to the whole manifold. Unfortunately, this type of argument can only be used for a restricted class of manifolds, as a non-compact connected and geodesically complete Ψ-convex manifold is diffeomorphic to $\mathbb{R}^m$. Nevertheless, this extension of the theorem applies to the important class of Hadamard manifolds (i.e. simply connected, complete and with non-positive sectional curvature) whose curvature is bounded from below.

4.3 Other possible definitions of the mean points


The Riemannian center of mass is perfectly adapted for our purpose, thanks to its good properties for optimization (see Section 4.6 below). However, there are other works proposing different ways to generalize the notion of mean value or barycenter of a distribution in a manifold. We review them for the sake of completeness and for their mathematical interest, but they do not seem to be practically applicable.
Doss [49] used another property of the expectation as the starting point for the generalization: if x is a real random variable, the only real number $\bar{x}$ verifying
$$\forall y \in \mathbb{R} \qquad |y - \bar{x}| \leq E[\,|y - x|\,]$$
is the mean value E[x]. Thus, in a metric space, the mean according to Doss is defined as the set of points $\bar{x} \in M$ verifying:
$$\forall y \in M \qquad \mathrm{dist}(y, \bar{x}) \leq E[\,\mathrm{dist}(y, x)\,]$$

Herer shows in [50, 51] that this definition includes the classical expectation in a Banach space (with possibly other points) and develops on this basis a conditional expectation.
A similar definition, using convex functions on the manifold instead of metric properties, was proposed by Emery [27] and Arnaudon [52, 28]. A function from M to R is convex if its restriction to every geodesic is convex (considered as a function from R to R). The convex barycenter of a random point x with density $p_x$ is the set $\mathcal{B}(x)$ of points $y \in M$ such that $\alpha(y) \leq E[\alpha(x)]$ holds for every real, bounded and convex function α on a neighborhood of the support of $p_x$.
This definition seems to be of little interest in our case since, for compact manifolds such as the sphere or SO3 (the manifold of 3D rotations), the geodesics are closed and the only convex functions on the manifold are the constant ones. Thus, every random point whose distribution is supported by the whole manifold has the whole manifold as convex barycenter.
However, in the case where the support of the distribution is included in a strongly convex open set² $\mathcal{U}$, Emery [27] showed that the exponential barycenters, defined as the critical points of the variance $\sigma_x^2(y)$, are a subset of the convex barycenter $\mathcal{B}(x)$. Local and global minima being particular critical points, the exponential barycenters include the Riemannian centers of mass, which themselves include the Fréchet means.
Picard [29] realized a good synthesis of most of these notions of mean value and showed that the definition of a "barycenter" (i.e. a mean value) is linked to a connector, which itself determines a connection, and thus possibly a metric. An interesting property brought by this formulation is that the distance between two barycenters (with different definitions) is of the order of $O(\sigma_x)$. Thus, for sufficiently concentrated random points, all these values are close.

4.4 Characterizing a Riemannian center of mass


To characterize a local minimum of a twice differentiable function, we just have to require a null gradient and a positive definite Hessian matrix. The problem with the variance function $\sigma^2(y)$ is that the integration domain (namely $M \setminus \mathcal{C}(y)$) depends on the derivation point y. Thus, we cannot just use the Lebesgue theorem to differentiate under the integral sign, unless there is no cut locus, or the distribution has a sufficiently small compact support, which is the property used by Karcher, Kendall and Emery for the previous existence, uniqueness and equivalence theorems. We were able to generalize in Appendix A a differentiability proof of Maillot [53], originally designed for the uniform distribution on compact manifolds. The theorem we obtain is the following:

Theorem 2 (Gradient of the variance function)
Let P be a probability on the Riemannian manifold M. The variance $\sigma^2(y) = \int_M \mathrm{dist}(y, z)^2\, dP(z)$ is differentiable at any point $y \in M$ where it is finite and where the cut locus $\mathcal{C}(y)$ has a null probability measure:
$$P(\mathcal{C}(y)) = \int_{\mathcal{C}(y)} dP(z) = 0 \qquad \text{and} \qquad \sigma^2(y) = \int_M \mathrm{dist}(y, z)^2\, dP(z) < \infty$$
At such a point, it has the following gradient:
$$(\mathrm{grad}\, \sigma^2)(y) = -2 \int_{M \setminus \mathcal{C}(y)} \overrightarrow{yz}\, dP(z) = -2\, E[\,\overrightarrow{yx}\,]$$
²Here, strongly convex means that for every two points of $\mathcal{U}$ there is a unique minimizing geodesic joining them that depends in a $C^\infty$ way on the two points.

Now, we know that the variance is continuous but may not be differentiable at the points where the cut locus has a non-zero probability measure. At these points, the variance can have an extremum (think for instance of $\|x\|$ in vector spaces). Thus, the extrema of $\sigma^2$ are characterized by $(\mathrm{grad}\, \sigma^2)(y) = 0$ if this gradient is defined, or by $P(\mathcal{C}(y)) > 0$.

Corollary 1 (Characterization of Riemannian centers of mass)
Assume that the random point x has a finite variance everywhere and let $\mathcal{A}$ be the set of points where the cut locus has a non-zero probability measure. A necessary condition for $\bar{x}$ to be a Riemannian center of mass is $E[\,\overrightarrow{\bar{x}x}\,] = 0$ if $\bar{x} \notin \mathcal{A}$, or $\bar{x} \in \mathcal{A}$. For discrete or empirical means, the characterization is the same, but we can write explicitly the set $\mathcal{A} = \cup_i\, \mathcal{C}(x_i)$.

If the manifold does not have a cut locus (for instance Hadamard manifolds), we have no differentiation problem. One could think of going one step further and computing the Hessian matrix. Indeed, we have in the vector case $\mathrm{Hess}\, \sigma_x^2(y) = 2\, \mathrm{Id}$ everywhere, which proves that any extremum of the variance is a minimum. In Riemannian manifolds, one has to be more careful because the Hessian is modified by the curvature of the manifold. One solution is to compute the Hessian matrix using the theory of Jacobi fields, and then take estimates of its eigenvalues based on bounds on the curvature. This is essentially the idea exploited in [25] to show the uniqueness of the mean in small enough geodesic balls, and by [54] to exhibit an example of a manifold without cut locus that is strongly convex (i.e. there is one and only one minimizing geodesic joining any two points), but that supports finite mass measures having non-unique Riemannian centers of mass. Thus, the absence of a cut locus is not enough: one should also have some constraints on the curvature of the manifold. In order to remain simple, we stick in this paper to the existence and uniqueness theorem provided by [30] for simply connected and complete manifolds whose curvature is non-positive (i.e. Hadamard) and bounded from below.

Corollary 2 (Characterization of the Fréchet mean for Hadamard manifolds with a curvature bounded from below)
Assume that the random point x has a finite variance. Then, there is one and only one Riemannian center of mass, characterized by $E[\,\overrightarrow{\bar{x}x}\,] = 0$. For discrete or empirical means, the characterization is similar.

Results similar to Theorem 2 and the above corollaries have been derived independently. [15] defined the mean values in manifolds as the exponential barycenters. To relate them with the Riemannian centers of mass, they determined the gradient of the variance. However, they only investigate the relation between the two notions when the probability is dominated by the Riemannian measure, which explicitly excludes point-mass distributions. In [37, 38], the gradient of the variance is also determined and the existence of the mean is established for simply connected Riemannian manifolds with non-positive curvature.
Basically, the characterization of the Riemannian center of mass is the same as in Euclidean spaces if the curvature of the manifold is non-positive (and bounded from below), in which case there is no cut locus (we assumed that the manifold was complete and simply connected). If the sectional curvature becomes positive, a cut locus may appear, and a non-zero probability on this cut locus induces some discontinuities in the first derivative of the variance. This corresponds to something like a Dirac measure on the second order derivative, which is an additional difficulty for computing the Hessian matrix of the variance on these manifolds.

4.5 Example on the circle


The easiest example of this difficulty is probably a symmetric distribution on the circle. Let $p(\theta) = \cos(\theta)^2/\pi$ be the probability density function of our random point θ on the circle. For a circle with the canonical metric, the exponential chart centered at α is $\overrightarrow{\alpha\theta} = \theta - \alpha$ for $\theta \in\, ]\alpha - \pi;\, \alpha + \pi[$, the distance being obviously $\mathrm{dist}(\alpha, \theta) = |\alpha - \theta|$ within this domain.
Let us first compute the mean points by computing explicitly the variance and its derivatives. The variance is:
$$\sigma^2(\alpha) = \int_{\alpha - \pi}^{\alpha + \pi} \mathrm{dist}(\alpha, \theta)^2\, p(\theta)\, d\theta = \int_{-\pi}^{\pi} \gamma^2\, \frac{\cos(\gamma + \alpha)^2}{\pi}\, d\gamma = \frac{\pi^2}{3} - \frac{1}{2} + \cos(\alpha)^2.$$

Its derivative is rather easy to compute, $\mathrm{grad}\, \sigma^2(\alpha) = -2 \cos(\alpha) \sin(\alpha)$, and the second order derivative is $H(\alpha) = 4 \sin(\alpha)^2 - 2$. Solving for $\mathrm{grad}\, \sigma^2(\alpha) = 0$, we get four critical points:

• α = 0 and α = ±π with H(0) = H(±π) = −2 (local maxima of the variance),

• α = ±π/2 with H(±π/2) = +2 (local minima).

Thus, there are two relative (and here absolute) minima: E[θ] = {−π/2, +π/2}.
Let us now use the general framework developed on Riemannian manifolds. According to Theorem 2, the gradient of the variance is
$$\mathrm{grad}\, \sigma^2(\alpha) = -2\, E[\,\overrightarrow{\alpha\theta}\,] = -2 \int_{\alpha - \pi}^{\alpha + \pi} \overrightarrow{\alpha\theta}\; \frac{\cos(\theta)^2}{\pi}\, d\theta = -2 \int_{\alpha - \pi}^{\alpha + \pi} (\theta - \alpha)\, \frac{\cos(\theta)^2}{\pi}\, d\theta = -2 \cos(\alpha) \sin(\alpha),$$
which is in accordance with our previous computations. Now, differentiating once again under the integral sign, we get:
$$\int_{\alpha - \pi}^{\alpha + \pi} \frac{\partial^2\, \mathrm{dist}(\alpha, \theta)^2}{\partial \alpha^2}\, p(\theta)\, d\theta = -2 \int_{\alpha - \pi}^{\alpha + \pi} \frac{\partial\, \overrightarrow{\alpha\theta}}{\partial \alpha}\, p(\theta)\, d\theta = 2 \int_{\alpha - \pi}^{\alpha + \pi} p(\theta)\, d\theta = 2,$$
which is clearly different from our direct calculation. One way to see the problem is the following:


the vector field $\overrightarrow{\alpha\theta}$ is continuous and differentiable on the circle except at the cut locus of α (i.e. at θ = α ± π) where it has a jump of 2π. Thus, the second order derivative of the squared distance should be $-2\left( -1 + 2\pi\, \delta_{(\alpha \pm \pi)}(\theta) \right)$, where δ is the Dirac distribution, and the integral becomes:
$$H(\alpha) = -2 \int_{\alpha - \pi}^{\alpha + \pi} \left( -1 + 2\pi\, \delta_{(\alpha \pm \pi)}(\theta) \right) p(\theta)\, d\theta = 2 - 4\pi\, p(\alpha \pm \pi) = 2 - 4 \cos(\alpha)^2,$$
which is this time in accordance with the direct calculation.
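A quick numerical check of this example is easy to run (a sketch assuming NumPy; the grid sizes are arbitrary choices): evaluating σ²(α) by quadrature reproduces the closed form π²/3 − 1/2 + cos²(α) and confirms that the two minima sit at α = ±π/2.

```python
import numpy as np

def variance(alpha, n=20001):
    """sigma^2(alpha) = integral over ]alpha-pi, alpha+pi[ of dist(alpha, theta)^2 p(theta) dtheta."""
    gamma, dg = np.linspace(-np.pi, np.pi, n, retstep=True)   # gamma = theta - alpha (exponential chart)
    f = gamma ** 2 * np.cos(gamma + alpha) ** 2 / np.pi       # squared distance times the density
    return np.sum(f[:-1] + f[1:]) * dg / 2.0                  # trapezoidal rule

alphas = np.linspace(-np.pi, np.pi, 721)
sig2 = np.array([variance(a) for a in alphas])
closed_form = np.pi ** 2 / 3 - 0.5 + np.cos(alphas) ** 2
print("max |quadrature - closed form| :", np.abs(sig2 - closed_form).max())
print("variance is minimal at alpha =", alphas[np.argmin(sig2)], "(and, by symmetry, at its opposite)")
```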

4.6 A gradient descent algorithm to obtain the mean


Gradient descent is a usual technique to compute a minimum. Moreover, as we have a canonical way to go from the tangent space to the manifold thanks to the exponential map, this iterative algorithm seems to be perfectly adapted. In this section, we assume that the conditions of Theorem 2 are fulfilled.
Let y be an estimation of the mean of the random point x and $f(y) = \sigma_x^2(y)$ the variance. A practical gradient descent algorithm is to minimize the second order approximation of the cost function at the current point. According to the Taylor expansion of equation (2), the second order approximation of f at y is:
$$f(\exp_y(v)) = f(y) + \mathrm{grad}\, f(v) + \frac{1}{2}\, \mathrm{Hess}\, f(v, v)$$
This is a function of the vector $v \in T_y M$. Assuming that Hess f is positive definite, this function is convex and thus has a minimum characterized by a null gradient. Let $H_f(v)$ denote the linear form verifying $H_f(v)(w) = \mathrm{Hess}\, f(v, w)$ for all w, and $H_f^{(-1)}$ denote the inverse map. The minimum is characterized by
$$\mathrm{grad}_v f_y = 0 = \mathrm{grad}\, f + H_f(v) \quad\Longleftrightarrow\quad v = -H_f^{(-1)}(\mathrm{grad}\, f)$$
We saw in the previous section that $\mathrm{grad}\, f = -2\, E[\,\overrightarrow{yx}\,]$. Neglecting the "cut locus term" in the Hessian matrix gives us the positive definite matrix $\mathrm{Hess}\, f \simeq 2\, \mathrm{Id}$. Thus, the gradient descent algorithm is
$$y_{t+1} = \exp_{y_t}\!\left( E[\,\overrightarrow{y_t x}\,] \right)$$
This gradient descent algorithm can be seen as the discretization of the ordinary differential equation $\dot{y}(t) = E[\,\overrightarrow{y(t) x}\,]$. Other discretization schemes are possible [55], sometimes with convergence theorems [56].
In the case of the discrete or empirical mean, which is much more interesting from a statistical point of view, we have exactly the same algorithm, but with the empirical expectation:
$$y_{t+1} = \exp_{y_t}\!\left( \frac{1}{n} \sum_{i=1}^{n} \overrightarrow{y_t x_i} \right)$$
We note that in the case of a vector space, these two formulas simplify to $y_{t+1} = E[x]$ and $y_{t+1} = \frac{1}{n} \sum_i x_i$, which are the definitions of the mean value and of the barycenter. Moreover, the algorithm converges in a single step.
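The following sketch illustrates this fixed point iteration on the sphere S² (an illustration rather than the paper's implementation: NumPy is assumed, the closed-form exp/log maps of Section 2.2 are reused, and the tolerance, iteration cap and starting point are arbitrary choices).

```python
import numpy as np

def log_s(x, y):
    """Log map on the unit sphere (y outside the cut locus of x)."""
    c = np.clip(np.dot(x, y), -1.0, 1.0)
    w = y - c * x
    n = np.linalg.norm(w)
    return np.zeros_like(x) if n < 1e-12 else np.arccos(c) * w / n

def exp_s(x, v):
    """Exponential map on the unit sphere."""
    n = np.linalg.norm(v)
    return x if n < 1e-12 else np.cos(n) * x + np.sin(n) * v / n

def karcher_mean(points, y0=None, tol=1e-10, max_iter=100):
    """Iterate y <- exp_y( (1/n) sum_i log_y(x_i) ) until the empirical gradient vanishes."""
    y = points[0] if y0 is None else y0
    for _ in range(max_iter):
        v = np.mean([log_s(y, x) for x in points], axis=0)   # = -(1/2) grad of the empirical variance
        y = exp_s(y, v)
        if np.linalg.norm(v) < tol:                          # characterization E[ log_y(x) ] = 0
            break
    return y

# Noisy unit vectors around the north pole (well inside a regular geodesic ball).
rng = np.random.default_rng(0)
pts = rng.normal([0.0, 0.0, 1.0], 0.2, size=(50, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
print(karcher_mean(pts))
```

For well concentrated data such as this example, a handful of iterations is typically sufficient, in contrast with the vector case where a single step is enough.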
An important point for this algorithm is to determine a good starting point. In the case of a set of observations $\{x_i\}$, one can choose at random one of the observations as the starting point. Another solution is to map to each point $x_i$ its mean distance with respect to the other points (or the median distance to be robust) and choose as the starting point the minimizing point. From a computer science point of view, the complexity is $k^2$ (where k is the number of observations), but the method can be randomized efficiently [57, 58].
To verify the uniqueness of the solution, we can repeat the algorithm from several starting points (for instance all the observations $x_i$). If we know the Riemannian curvature of the manifold (for instance if it is constant or if there is an upper bound κ), we can use Theorem 1 (Section 4.2). We just have to verify that the maximum distance between the observations and the mean value we have found is sufficiently small, so that all observations fit into a regular geodesic ball of radius:
$$r = \max_i\, \mathrm{dist}(\bar{x}, x_i) < \frac{\pi}{2\sqrt{\kappa}}$$

5 Covariance matrix
With the mean value, we have a dispersion value: the variance. To go one step further, we observe that the covariance matrix of a random vector x with respect to a point y is the directional dispersion of the "difference" vector $\overrightarrow{yx} = x - y$:
$$\mathrm{Cov}_x(y) = E[\,\overrightarrow{yx}\; \overrightarrow{yx}^T\,] = \int_{\mathbb{R}^n} (\overrightarrow{yx})\, (\overrightarrow{yx})^T\, p_x(x)\, dx$$
This definition is readily extendable to a complete Riemannian manifold using the random vector $\overrightarrow{yx}$ in $T_y M$ and the Riemannian measure. In fact, we are usually interested in the covariance relative to the mean value:

Definition 6 (Covariance)
Let x be a random point and $\bar{x} \in E[x]$ a mean value that we assume to be unique to simplify the notations (otherwise we have to keep a reference to the mean value). We define the covariance $\Sigma_{xx}$ by the expression:
$$\Sigma_{xx} = \mathrm{Cov}_x(\bar{x}) = E[\,\overrightarrow{\bar{x}x}\; \overrightarrow{\bar{x}x}^T\,] = \int_{D(\bar{x})} (\overrightarrow{\bar{x}x})\, (\overrightarrow{\bar{x}x})^T\, p_x(x)\, d\mathcal{M}(x)$$
The empirical covariance is defined in the same way using the discrete version of the expectation operator.

We observe that the covariance depends on the basis used for the exponential chart if we see it as a matrix, but it does not depend on it if we consider it as a bilinear form over the tangent plane. The covariance is related to the variance just as in the vector case:
$$\mathrm{Tr}(\Sigma_{xx}) = E[\,\mathrm{Tr}(\overrightarrow{\bar{x}x}\; \overrightarrow{\bar{x}x}^T)\,] = E[\,\mathrm{dist}(\bar{x}, x)^2\,] = \sigma_x^2$$
This formula is still valid relative to any fixed point: $\mathrm{Tr}(\mathrm{Cov}_x(y)) = \sigma_x^2(y)$.

Figure 2: The covariance is defined in the tangent plane of $S^2$ at the mean point as the classical covariance matrix of the random vector "deviation from the mean": $\Sigma_{xx} = E[\,\overrightarrow{\bar{x}x}\; \overrightarrow{\bar{x}x}^T\,]$.

In fact, as soon as we have found a (or the) mean value and the probability of its cut locus is null, everything appears to be similar to the case of a centered random vector by developing the manifold onto the tangent space at the mean value. Indeed, $\overrightarrow{\bar{x}x}$ is a random vector of pdf $\rho_x(y) = p_x(y)\, \sqrt{|G(y)|}$ with respect to the Lebesgue measure in the connected and star-shaped domain $D(\bar{x}) \subset T_{\bar{x}} M$. We know that its expectation is $E[\,\overrightarrow{\bar{x}x}\,] = 0$ and its covariance matrix is defined as usual. Thus, we could define higher order moments of the distribution by tensors on this tangent space, just as we have done for the covariance.
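In practice, once the mean is known, this amounts to taking the ordinary covariance of the tangent vectors $\log_{\bar{x}}(x_i)$ expressed in an orthonormal basis of $T_{\bar{x}} M$. A minimal sketch for S² (assuming NumPy; the choice of tangent basis is an arbitrary one, and log_s is the same log map as in the mean computation sketch above):

```python
import numpy as np

def log_s(x, y):
    """Log map on S^2 (same as in the mean computation sketch)."""
    c = np.clip(np.dot(x, y), -1.0, 1.0)
    w = y - c * x
    n = np.linalg.norm(w)
    return np.zeros_like(x) if n < 1e-12 else np.arccos(c) * w / n

def tangent_basis(xbar):
    """An orthonormal basis of T_xbar S^2 (any completion of xbar works)."""
    a = np.array([1.0, 0.0, 0.0]) if abs(xbar[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = a - np.dot(a, xbar) * xbar
    e1 /= np.linalg.norm(e1)
    return np.stack([e1, np.cross(xbar, e1)])                 # 2 x 3 matrix of basis vectors

def empirical_covariance(points, xbar):
    """Sigma_xx = (1/n) sum_i log_xbar(x_i) log_xbar(x_i)^T, written in the chosen tangent basis."""
    B = tangent_basis(xbar)
    V = np.array([B @ log_s(xbar, x) for x in points])        # n x 2 tangent coordinates
    return V.T @ V / len(points)                              # 2 x 2; its trace is the variance
```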

6 An information-based generalization of the Normal distribution


In this section, we turn to the generalization of the Gaussian distribution to Riemannian manifolds. Several generalizations have already been proposed so far. In the stochastic calculus community, one usually considers the heat kernel p(x, y, t), which is the transition density of the Brownian motion [24, 59, 60]. This is the smallest positive fundamental solution to the heat equation $\frac{\partial f}{\partial t} - \Delta f = 0$, where Δ is the Laplace-Beltrami operator (i.e. the standard Laplacian with corrections for the Riemannian metric). On compact manifolds, an explicit basis of the heat kernel is given by the spectrum of the manifold-Laplacian (eigenvalues $\lambda_i$ with associated eigenfunctions $f_i$, solutions of $\Delta f = \lambda f$). The practical problem is that the computation of this spectrum is impossible except in very few cases [42].
To obtain tractable formulas, several distributions have been proposed in directional statistics [16, 17, 18, 19, 61], in particular the wrapped Gaussian distributions. The basic idea is to take the image by the exponential map of a Gaussian distribution on the tangent space centered at the mean value (see e.g. [61] for the circular and spherical cases). It is easy to see that the wrapped Gaussian distribution tends towards the mass distribution if the variance goes to zero. In the circular case, one can also show that it tends toward the uniform distribution for a large variance. [15] extended this definition by considering non-centered Gaussian distributions on the tangent spaces of the manifold in order to tackle the asymptotic properties of estimators. In this case, the mean value is generally no longer simply linked to the Gaussian parameters. In view of a computational theory, the main problem is that the pdf of the wrapped distributions can only be expressed if the cut locus has a particularly simple geometrical shape. For instance, considering an anisotropic covariance on the n-dimensional sphere leads to very complex calculations.
Thus, instead of keeping a Gaussian pdf in some tangent space, we propose in this paper a
new variational approach based on global properties, consistent with the previous definitions of
the mean and covariance. The property that we take for granted is the maximization of the
entropy knowing the mean and the covariance matrix. For many aspects, this may not be the
best generalization of the Gaussian. However, we demonstrate that it provides a family ranging
from the point-mass distribution to the uniform measure (for compact manifolds) and that we can
provide computationally tractable approximations for any manifold in case of small variances. In
this section the symbols log and exp denote the standard logarithmic and exponential functions in
R.

6.1 Entropy and uniform law


As we can integrate a real-valued function, the extension of the entropy H[x] of a random point is straightforward:
$$H[x] = E[-\log(p_x(x))] = -\int_M \log(p_x(x))\, p_x(x)\, d\mathcal{M}(x)$$

This definition is coherent with the measure inherited from our Riemannian metric, since the pdf $p_{\mathcal{U}}$ that maximizes the entropy when we only know that the measure is in a compact set $\mathcal{U}$ is the uniform density on this set:
$$p_{\mathcal{U}}(x) = \frac{1_{\mathcal{U}}(x)}{\int_{\mathcal{U}} d\mathcal{M}(y)}$$

6.2 Maximum entropy characterization


Now assume that we only know the mean (that we suppose to be unique) and the covariance of a
random point: we denote it by x ∼ (x̄, Σ). If we need to assume a pdf for that random point, it
seems reasonable to choose the one which is the least informative, while fitting the mean and the
covariance. The pdf is maximizing in this case the conditional entropy

H [ x | x̄ ∈ E [ x ] , Σxx = Σ ]

In standard multivariate statistics, this maximum entropy principle is one characterization of the Gaussian distributions [62, p. 409]. In the Riemannian case, we can express all the constraints directly in the exponential chart at the mean value. Let $\rho(y) = p(\exp_{\bar{x}}(y))$ be the density in the chart with respect to the induced Riemannian measure $d\mathcal{M}_{\bar{x}}(y) = \sqrt{|G_{\bar{x}}(y)|}\, dy$ (we use here the Riemannian measure instead of the Lebesgue one to simplify the equations below). The constraints are:

• the normalization: $E[1_M] = \int_{D(\bar{x})} \rho(y)\, d\mathcal{M}_{\bar{x}}(y) = 1$,

• a null mean value: $E[\,\overrightarrow{\bar{x}x}\,] = \int_{D(\bar{x})} y\, \rho(y)\, d\mathcal{M}_{\bar{x}}(y) = 0$,

• and a fixed covariance Σ: $E[\,\overrightarrow{\bar{x}x}\; \overrightarrow{\bar{x}x}^T\,] = \int_{D(\bar{x})} y\, y^T \rho(y)\, d\mathcal{M}_{\bar{x}}(y) = \Sigma$.

To simplify the optimization, we won't consider any continuity or differentiability constraint on the cut locus $\mathcal{C}(\bar{x})$ (which would mean constraints on the border of the domain). This means that we can do the optimization in the exponential chart at the mean point as if we were in the open domain $D(\bar{x}) \subset \mathbb{R}^n$.
Using the convexity of the function $-x \log(x)$, one can show that the maximum entropy is attained by distributions of density $\rho(y) = k\, \exp\!\left( -\beta^T y - \frac{y^T \Gamma\, y}{2} \right)$, if there exist constants k, β and Γ such that our constraints are fulfilled [62, Theorem 13.2.1, p. 409]. Assuming that the definition domain $D(\bar{x})$ is symmetric with respect to the origin, we find that β = 0 ensures a null mean. Substituting in the constraints gives the following equations.

Theorem 3 (Normal law)
We call Normal law on the manifold M the maximum entropy distribution knowing the mean value and the covariance. Assuming no continuity nor differentiability constraint on the cut locus $\mathcal{C}(\bar{x})$ and a symmetric domain $D(\bar{x})$, the pdf of the Normal law of mean $\bar{x}$ (the mean value) and concentration matrix Γ is given by:
$$N_{(\bar{x}, \Gamma)}(y) = k\, \exp\!\left( -\frac{\overrightarrow{\bar{x}y}^T\, \Gamma\; \overrightarrow{\bar{x}y}}{2} \right)$$
where the normalization constant and the covariance are related to the concentration matrix by:
$$k^{(-1)} = \int_M \exp\!\left( -\frac{\overrightarrow{\bar{x}y}^T\, \Gamma\; \overrightarrow{\bar{x}y}}{2} \right) d\mathcal{M}(y) \qquad \text{and} \qquad \Sigma = k \int_M \overrightarrow{\bar{x}y}\; \overrightarrow{\bar{x}y}^T \exp\!\left( -\frac{\overrightarrow{\bar{x}y}^T\, \Gamma\; \overrightarrow{\bar{x}y}}{2} \right) d\mathcal{M}(y)$$
From the concentration matrix, we can compute the covariance of the random point, at least numerically, but the reverse is more difficult.

6.3 The vectorial case


The integrals can be entirely computed, and we find $k^{(-1)} = \frac{(2\pi)^{n/2}}{\sqrt{|\Gamma|}}$ and $\Gamma = \Sigma^{(-1)}$. The Normal density is thus the usual Gaussian density:
$$N_{(\bar{x}, \Gamma)}(x) = k\, \exp\!\left( -\frac{(x - \bar{x})^T\, \Gamma\, (x - \bar{x})}{2} \right) = \frac{1}{(2\pi)^{n/2} \sqrt{|\Sigma|}}\, \exp\!\left( -\frac{(x - \bar{x})^T\, \Sigma^{(-1)}\, (x - \bar{x})}{2} \right)$$

6.4 Example on a simple manifold: the circle


The exponential chart for the circle of radius 1 with the canonical metric is the angle $\theta \in D = ]-\pi; \pi[$ with respect to the development point, and the measure is simply dθ. For a circle of radius r, the exponential chart becomes x = rθ. The domain is D = ]−a; a[ (with a = πr) and the measure is dx = r dθ. Thus, the normalization factor of the Normal density is:
$$k^{(-1)} = \int_{-a}^{a} \exp\!\left( -\frac{\gamma\, x^2}{2} \right) dx = \sqrt{\frac{2\pi}{\gamma}}\; \mathrm{erf}\!\left( a \sqrt{\frac{\gamma}{2}} \right)$$
where erf is the error function $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-t^2)\, dt$. The density is the truncated Gaussian $N_{(0, \gamma)}(x) = k\, \exp\!\left( -\frac{\gamma\, x^2}{2} \right)$. It is continuous but not differentiable at the cut locus π ≡ −π. The truncation introduces a bias in the relation between the variance and the concentration parameter:
$$\sigma^2 = \int_{-a}^{a} x^2\, k\, \exp\!\left( -\frac{\gamma\, x^2}{2} \right) dx = \frac{1}{\gamma} \left( 1 - 2\, a\, k\, \exp\!\left( -\frac{\gamma\, a^2}{2} \right) \right)$$
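These two relations are straightforward to evaluate numerically; the following sketch (assuming only the Python standard library; the radius and the sample values of γ are arbitrary choices) computes k and σ² from γ, including the uniform limit γ = 0 discussed below.

```python
from math import erf, exp, sqrt, pi

def circle_normal(gamma, r=1.0):
    """Normalization k and variance sigma^2 of the truncated Gaussian N_(0,gamma) on a circle of radius r."""
    a = pi * r                                        # half-length of the exponential chart ]-a; a[
    if gamma <= 0.0:                                  # uniform limit: k = 1/(2a), sigma^2 = a^2 / 3
        return 1.0 / (2.0 * a), a ** 2 / 3.0
    k = 1.0 / (sqrt(2.0 * pi / gamma) * erf(a * sqrt(gamma / 2.0)))
    sigma2 = (1.0 - 2.0 * a * k * exp(-gamma * a ** 2 / 2.0)) / gamma
    return k, sigma2

for gamma in (0.0, 0.1, 1.0, 10.0):
    k, s2 = circle_normal(gamma)
    print(f"gamma={gamma:5.1f}  k={k:.4f}  sigma^2={s2:.4f}")
# On the unit circle, sigma^2 is already close to 1/gamma for gamma >= 1 (the real-line relation).
```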

It is interesting to have a look at limit properties: if the circle radius goes to infinity, the circle becomes the real line and we obtain the usual Gaussian with the relation $\sigma^2 = 1/\gamma$, as expected. Now, let us consider a circle with a fixed radius. As anticipated, the variance goes to zero and the density tends toward a point-mass distribution (see Figure 3) if the concentration γ goes to infinity. On the other hand, the variance cannot become infinite (as in the real case) when the concentration parameter γ goes to zero, because the circle is compact: a Taylor expansion gives $\sigma^2 = a^2/3 + O(\gamma)$. Thus, the maximal variance on the circle is
$$\sigma_0^2 = \lim_{\gamma \to 0} \sigma^2 = \frac{a^2}{3} \qquad \text{with the density} \qquad N_{(0,0)}(x) = \frac{1}{2a}$$
Interestingly, the Normal density of concentration 0 is the uniform density. In fact, this result can be generalized to all compact manifolds.

Figure 3: Variance σ² with respect to the concentration parameter γ on the circle of radius 1 and on the real line. This variance tends toward $\sigma_0^2 = \pi^2/3$ for the uniform distribution on the circle (γ = 0), whereas it tends to infinity for the uniform measure on R. For a strong concentration (γ > 1), the variance on the circle can be accurately approximated by $\sigma^2 \simeq 1/\gamma$, as in the real case.

6.5 Small concentration matrix on compact manifolds: uniform distribution




Let n be the dimension of the manifold. Using bounds on the eigenvalues of Γ, we have:
$$\mathrm{Tr}(\Gamma)\, \|\overrightarrow{\bar{x}y}\|^2 / n \;\leq\; \overrightarrow{\bar{x}y}^T\, \Gamma\; \overrightarrow{\bar{x}y} \;\leq\; \mathrm{Tr}(\Gamma)\, \|\overrightarrow{\bar{x}y}\|^2$$
This means that we can bound the exponential by:
$$1 + \sum_{k=1}^{+\infty} \frac{(-1)^k\, \mathrm{Tr}(\Gamma)^k\, \|\overrightarrow{\bar{x}y}\|^{2k}}{k!\; 2^k} \;\leq\; \exp\!\left( -\overrightarrow{\bar{x}y}^T\, \Gamma\; \overrightarrow{\bar{x}y} / 2 \right) \;\leq\; 1 + \sum_{k=1}^{+\infty} \frac{(-1)^k\, \mathrm{Tr}(\Gamma)^k\, \|\overrightarrow{\bar{x}y}\|^{2k}}{k!\; 2^k\, n^k}$$
As all the moments $\int_M \|\overrightarrow{\bar{x}y}\|^{2k}\, d\mathcal{M}$ are finite since the manifold is compact, one can conclude that
$$k^{(-1)} = \int_M \exp\!\left( -\overrightarrow{\bar{x}y}^T\, \Gamma\; \overrightarrow{\bar{x}y} / 2 \right) d\mathcal{M} = \mathrm{Vol}(M) + O(\mathrm{Tr}(\Gamma))$$
It follows immediately that the limit of the normal distribution is the uniform one, and that the limit covariance is finite.

Theorem 4 Let M be a compact manifold. The limit of the normal distribution for small concentration
matrices is the uniform density N(y) = 1/Vol(M) + O(Tr(Γ)). Moreover, the covariance
matrix tends towards a finite value:

\Sigma = \frac{1}{\mathrm{Vol}(M)} \int_M \overrightarrow{\bar{x}y} \, \overrightarrow{\bar{x}y}^T \, dM + O(\mathrm{Tr}(\Gamma)) \;<\; +\infty

6.6 Approximation for a large concentration matrix


If the pdf is sufficiently concentrated (a high concentration matrix Γ or a small covariance matrix Σ),
then we can use a Taylor expansion of the metric in a normal coordinate system around the mean
value to approximate the previous integrals and obtain Taylor expansions of the normalization
factor and the concentration matrix with respect to the covariance matrix.

The Taylor expansion of the metric is given by [63, p84]. We easily deduce the Taylor expansion
of the measure around the origin (Ric is the Ricci (or scalar) curvature matrix in the considered
normal coordinate system):
 
dM(y) = \sqrt{\det(G(y))} \; dy = \left( 1 - \frac{y^T \, \mathrm{Ric} \; y}{6} + O(\|y\|^3) \right) dy

Substituting this Taylor expansion in the integrals and manipulating the formulas (see appendix
B) leads to the following theorem.

Theorem 5 (Approximate normal density)
In a complete Riemannian manifold, the normal density is N(y) = k exp(− x̄yᵀ Γ x̄y / 2). Let r =
i(M, x̄) be the injectivity radius at the mean point (by convention r = +∞ if there is no cut locus).
Assuming a finite variance for any concentration matrix Γ, we have the following approximations
of the normalization constant and concentration matrix for a covariance matrix Σ of small variance
σ² = Tr(Σ):

k = \frac{1 + O(\sigma^3) + \varepsilon(\sigma/r)}{\sqrt{(2\pi)^n \det(\Sigma)}}
\qquad \text{and} \qquad
\Gamma = \Sigma^{(-1)} - \frac{1}{3} \mathrm{Ric} + O(\sigma) + \varepsilon\!\left(\frac{\sigma}{r}\right)

Here, ε(x) is a function that is a O(x^k) for any positive k, with the convention that ε(σ/+∞) =
ε(0) = 0. More precisely, this is a function such that ∀k ∈ ℝ⁺, lim_{x→0⁺} x^{−k} ε(x) = 0.
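For computational purposes, this approximation is straightforward to implement. The sketch below is a direct transcription of the two formulas of Theorem 5 (higher-order and cut-locus terms are simply dropped); the matrix of Ricci curvatures at the mean point is assumed to be given in the chosen normal coordinate system, and the example values are ours.

import numpy as np

def approx_normal_parameters(Sigma, Ric):
    """First-order approximation of Theorem 5 (O(sigma) and epsilon(sigma/r) terms dropped).

    Sigma : covariance matrix in a normal coordinate system at the mean point.
    Ric   : Ricci curvature matrix at the mean point in the same coordinates.
    Returns (Gamma, k), the approximate concentration matrix and normalization constant.
    """
    n = Sigma.shape[0]
    Gamma = np.linalg.inv(Sigma) - Ric / 3.0
    k = 1.0 / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
    return Gamma, k

# Example on the unit 2-sphere at small variance: Ric = Id in an orthonormal normal frame.
Sigma = 0.01 * np.eye(2)
Gamma, k = approx_normal_parameters(Sigma, np.eye(2))
print(Gamma)   # ~ 100*Id - Id/3
print(k)       # ~ 1/(2*pi*0.01)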

6.7 Discussion
The maximum entropy approach to generalize the normal distribution to Riemannian manifolds is
interesting since we obtain a whole family of densities going from the Dirac measure to the uniform
distribution (or the uniform measure if the manifold is only locally compact). Unfortunately, this
distribution is generally not differentiable at the cut locus, and often not even continuous.
However, even if the relation between the parameters and the moments of the distribution is not as
simple as in the vector case (but can we expect anything simpler in the general case of Riemannian
manifolds?), the approximation for small covariances turns out to be rather simple. Thus, this
approximate distribution can be handled quite easily for computational purposes. It is likely that
similar approximations hold for wrapped Gaussians, but this remains to be established.

7 Mahalanobis distance and χ2 law


The problem we are now tackling is to determine if an observation x̂ could reasonably have come
from a given probability distribution x on M or if it should be considered as an outlier. From a
computational point of view, the pdf of the measurement process is often too rich an information
to be estimated or handled. In practice, one often characterizes this pdf by its moments, and
more particularly by the mean and the covariance. We denote it by x ∼ (x̄, Σxx ). Based on these
characteristics only, we have to decide if the observation x̂ is compatible with this measurement
process.
In the vector case, a well adapted tool is the Mahalanobis distance µ² = (x̂ − x̄)ᵀ Σxx^(-1) (x̂ − x̄),
which measures the distance between the observation x̂ and the mean value x̄ according to the
“metric” Σxx^(-1). To test if the observation x̂ is an outlier, the principle of the Mahalanobis D² test is
to compute the tail probabilities of the resulting distribution (the so-called p-value), i.e. the risk
of error when saying that the observation does not come from the distribution. The distribution
of x is usually assumed to be Gaussian, as this distribution minimizes the added information (i.e.
maximizes the entropy) when we only know the mean and the covariance. In that case, we know that
the Mahalanobis distance should be χ2n distributed if the observation is correct (n is the dimension
of the vector space). If the probability of the current distance is too small (i.e. µ2 is too large), the
observation x̂ can safely be considered as an outlier.
The definition of the Mahalanobis distance can be easily generalized to complete Riemannian
manifolds with our tools. We note that it is well defined for any distribution of the random point
and not only the normal one.
Definition 7 (Mahalanobis distance)
We call Mahalanobis distance between a random point x ∼ (x̄, Σxx) and a (deterministic) point y
on the manifold the value

\mu_x^2(y) = \overrightarrow{\bar{x}y}^T \, \Sigma_{xx}^{(-1)} \, \overrightarrow{\bar{x}y}

7.1 Properties
Since µ²x is a function from M to ℝ, µ²x(y) is a real random variable. The expectation of this
random variable is well defined and turns out to be quite simple:

E[\mu_x^2(y)] = \int_M \mu_x^2(z) \, p_y(z) \, dM(z)
= \int_M \overrightarrow{\bar{x}z}^T \Sigma_{xx}^{(-1)} \overrightarrow{\bar{x}z} \; p_y(z) \, dM(z)
= \mathrm{Tr}\left( \Sigma_{xx}^{(-1)} \int_M \overrightarrow{\bar{x}z} \, \overrightarrow{\bar{x}z}^T p_y(z) \, dM(z) \right)
= \mathrm{Tr}\left( \Sigma_{xx}^{(-1)} \, \mathrm{Cov}_y(\bar{x}) \right)

The expectation of the Mahalanobis distance of a random point with itself is even simpler:

E[\mu_x^2(x)] = \mathrm{Tr}(\Sigma_{xx}^{(-1)} \Sigma_{xx}) = \mathrm{Tr}(\mathrm{Id}_n) = n

Theorem 6 The expected Mahalanobis distance of a random point with itself is independent of the
distribution and depends only on the dimension of the manifold: E[µ²x(x)] = n.
This identity can be used to verify with a posteriori observations that the covariance matrix
has been correctly estimated. It can be compared with the expectation of the “normalized” squared
distance, which is by definition: E[dist(x, x̄)²/σ²x] = 1.
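In the exponential chart at the mean point, the random point is simply a random vector with null mean, so Theorem 6 can be checked by a straightforward Monte Carlo simulation. In the sketch below (our own illustration), arbitrary simulated tangent vectors stand for the coordinates of the samples in the chart; the average Mahalanobis distance of the samples to themselves is close to n whatever the distribution.

import numpy as np

# Sketch: samples are represented by their coordinates log_xbar(x_i) in the
# exponential chart at the mean (zero-mean random vectors); any distribution works.
rng = np.random.default_rng(1)
n, N = 3, 200_000
A = rng.standard_normal((n, n))
v = rng.uniform(-1, 1, size=(N, n)) @ A.T        # zero-mean, non-Gaussian samples
v -= v.mean(axis=0)                              # enforce a null mean in the chart

Sigma = v.T @ v / N                              # empirical covariance
Sigma_inv = np.linalg.inv(Sigma)
mu2 = np.einsum('ij,jk,ik->i', v, Sigma_inv, v)  # Mahalanobis distance of each sample
print(mu2.mean())                                # ~ n = 3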

7.2 A generalized χ2 law


Assuming that the random point x ∼ (x̄, Σxx ) is normal, we can go one step further and compute
the probability that χ2 = µ2x < α2 (see appendix C). This generalization of the χ2 law turns out
to be still independent of the mean value and the covariance matrix of the random point (at least
up to the order O(σ 3 )):
Theorem 7 (Approximate χ² law)
With the same hypotheses as for the approximate normal law, the χ² probability is

\Pr\{\chi^2 \leq \alpha^2\} = (2\pi)^{-\frac{n}{2}} \int_{\|x\| \leq \alpha} \exp\left( - \frac{\|x\|^2}{2} \right) dx + O(\sigma^3) + \varepsilon\!\left(\frac{\sigma}{r}\right)

while the density is

p_{\chi^2}(u) = \frac{1}{2\,\Gamma\!\left(\frac{n}{2}\right)} \left(\frac{u}{2}\right)^{\frac{n}{2}-1} \exp\left(-\frac{u}{2}\right) + O(\sigma^3) + \varepsilon\!\left(\frac{\sigma}{r}\right)

The χ² probability can be computed using the incomplete gamma function Pr{χ² ≤ α²} =
P(n/2, α²/2) (see for instance [64]).
In practice, one often uses this law to test whether an observation x̂ has been drawn from a random
point x that we assume to be Gaussian: if the hypothesis is true, the value µ²x(x̂) will be less than
α² with probability γ = Pr{χ² ≤ α²}. Thus, one chooses a confidence level γ (for instance 95%
or 99%), finds the value α(γ) such that γ = Pr{χ² ≤ α²}, and accepts the hypothesis if
µ²x(x̂) ≤ α².
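Since the law is the usual χ²_n one up to higher-order terms, the test can be implemented with any standard χ² routine. A minimal sketch (our own, using SciPy's chi2 distribution, which evaluates the incomplete gamma function mentioned above; function and variable names are ours):

import numpy as np
from scipy.stats import chi2

def mahalanobis_test(xy_vec, Sigma, n, confidence=0.95):
    """Accept or reject an observation from its Mahalanobis distance.

    xy_vec : vector log_xbar(x_hat) of the observation in the exponential chart at the mean.
    Sigma  : covariance matrix of the random point in the same chart.
    n      : dimension of the manifold.
    """
    mu2 = xy_vec @ np.linalg.solve(Sigma, xy_vec)
    alpha2 = chi2.ppf(confidence, df=n)   # threshold alpha^2 such that Pr{chi2 <= alpha^2} = gamma
    p_value = chi2.sf(mu2, df=n)          # tail probability of the observed distance
    return mu2 <= alpha2, mu2, p_value

accepted, mu2, p = mahalanobis_test(np.array([0.1, -0.2]), 0.02 * np.eye(2), n=2)
print(accepted, mu2, p)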

8 Discussion
On a (geodesically complete) Riemannian manifold, it is easy to define probability density functions
associated to random points, thanks to the availability of a metric. However, as soon as the
expectation is concerned, we may only define the expectation of an observable (a real or vectorial
function of the random point). Thus, the definition of a mean value for a random point is much
more complex than for the vectorial case and it requires a distance-based variational formulation:
the Riemannian center of mass basically minimizes the variance locally. As the mean is now defined
through a minimization procedure, its existence and uniqueness are not ensured any more (except
for distributions with a sufficiently small compact support). In practice, one mean value almost
always exists, and it is unique as soon as the distribution is sufficiently peaked. The properties
of the mean are very similar to those of the modes (that can be defined as central values of order
0) in the vectorial case. We present here a new proof of the barycentric characterization theorem
that is valid for distribution with any kind of support. The main difference with the vector case is
that we have to ensure a null probability measure of the cut locus. To compute the mean value,
we designed an original Gauss-Newton gradient descent algorithm that essentially alternates the
computation of the barycenter in the exponential chart centered at the current estimation, and a
re-centering step of the chart at the newly computed barycenter.
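As an illustration of this scheme, here is a minimal sketch of the barycenter / re-centering iteration on the unit sphere, where the exponential and log maps have a simple closed form; the implementation details (initialization, stopping criterion) are ours and not part of the original algorithm statement.

import numpy as np

def exp_map(p, v):
    """Exponential map of the unit sphere at p (v tangent at p)."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return p
    return np.cos(norm) * p + np.sin(norm) * v / norm

def log_map(p, q):
    """Log map of the unit sphere at p (q not antipodal to p)."""
    w = q - np.dot(p, q) * p                      # projection on the tangent plane at p
    norm = np.linalg.norm(w)
    if norm < 1e-12:
        return np.zeros_like(p)
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)) * w / norm

def frechet_mean(points, tol=1e-10, max_iter=100):
    """Barycenter / re-centering iteration: xbar <- exp_xbar(mean of log_xbar(x_i))."""
    xbar = points[0]
    for _ in range(max_iter):
        tangent_mean = np.mean([log_map(xbar, q) for q in points], axis=0)
        if np.linalg.norm(tangent_mean) < tol:
            break
        xbar = exp_map(xbar, tangent_mean)
    return xbar

# A cluster of unit vectors around the north pole
rng = np.random.default_rng(2)
pts = rng.normal([0, 0, 1], 0.1, size=(50, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
print(frechet_mean(pts))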
To define higher moments of the distribution, we used the exponential chart at the mean point
(which may be seen as the development of the manifold onto its tangent space at this point along
the geodesics): the random point is thus represented as a random vector with null mean in a
star-shaped and symmetric domain. With this representation, there is no longer any difficulty in defining
the covariance matrix and potentially higher order moments. Based on this covariance matrix,
we defined a Mahalanobis distance between a random and a deterministic point that basically
weights the distance between the deterministic point and the mean point using the inverse of the
covariance matrix. Interestingly, the expected Mahalanobis distance of a random point with itself
is independent of the distribution and is equal to the dimension of the manifold, as in the vectorial
case.
As for the mean, we chose a variational approach to generalize the Normal law: we define
it as the maximum entropy distribution knowing the mean and the covariance. Neglecting the
cut-locus constraints, we show that this amounts to considering a truncated Gaussian distribution on
the exponential chart centered at the mean point. However, the relation between the concentration
matrix (the “metric” used in the exponential of the pdf) and the covariance matrix is slightly more
complex than the simple inversion of the vectorial case, as it has to be corrected for the curvature
of the manifold.
Last but not least, using the Mahalanobis distance of a Normally distributed random point, we
can generalize the χ² law: we were able to show that it has the same density as in the vectorial
case up to an order 3 in σ. This opens the way to the generalization of many other statistical tests,

as we may expect similarly simple approximations for sufficiently concentrated distributions.
In this paper, we deliberately focused on the theoretical formulation of the statistical framework
in geodesically complete Riemannian spaces, leaving aside application examples. The interested reader
will find practical applications in computer vision to compute the mean rotation [32, 33] or for
the generalization of matching algorithms to arbitrary geometric features [65]. In medical image
analysis, selected applications cover the validation of the rigid registration accuracy [4, 66, 67],
shape statistics [7] and more recently tensor computing, either for processing and analyzing diffusion
tensor images [10, 8, 9, 11], or to model the brain variability [12]. One can even find applications
in rock mechanics with the analysis of fracture geometry [68].
In the theory presented here, all definitions are derived from the Riemannian metric of the
manifold. A natural question is how to choose this metric. Invariance requirements
provide a partial answer for connected Lie groups and homogeneous manifolds [1, 45]. However,
an invariant metric does not always exist on a homogeneous manifold. Likewise, left- and right-
invariant metrics are generally different in non-compact Lie groups, so that we only have a partial
consistency between the geometric and statistical operations. Another way to choose the metric
could be to estimate it from empirical data.
More generally, we could design other definitions of the mean value using the notion of connector
introduced in [29]. This connector formalizes a relationship between the manifold and its tangent
space at one point, exactly in the way we used the exponential map of the Riemannian metric. Thus,
we could easily generalize the higher order moments and the other statistical operations we defined
by replacing everywhere the exponential map with the connector. One important point is that
these connectors can model extrinsic distances (like the Euclidean distance on unit vectors), and
could lead to very efficient approximations of the mean value for sufficiently peaked distributions.
For instance, the “barycenter / re-centering” algorithm we designed will most probably converge
toward a first order approximation of the Riemannian mean if we use any chart that is consistent with
the exponential chart at the first order (e.g. Euler angles on re-centered 3D rotations). We believe
that this research track may become one of the most productive from the practical point of view.

Acknowledgments
The main part of this work was done at MIT, Artificial Intelligence Lab, in 1997. Although all the ideas
presented in this paper were in place at that time [69], the formula for the gradient of the variance
(appendix A) remained a conjecture. The author would especially like to thank Prof. Maillot for
providing a first proof of the differentiability of the variance for the uniform distribution on compact
manifolds [53]. Its generalization to the non-compact case with arbitrary distributions considerably
delayed the submission of this paper. In the meantime, other (and simpler) proofs were proposed
[15, 38]. The author would also like to thank two anonymous referees for very valuable advice and
for the detection of a number of technical errors.

A Gradient of the variance


This proof is a generalization of a differentiability proof for the uniform distribution on compact
manifolds by Pr Maillot [53]. The main difficulties were to remove the compactness hypothesis,
and to generalize to arbitrary distributions.

Hypotheses
Let P be a probability on the Riemannian manifold M. We assume that the cut locus C(y) of
the derivation point y ∈ M has a null measure with respect to this probability (as it has with the
Riemannian measure) and that the variance is finite at that point:
P(C(y)) = \int_{C(y)} dP(z) = 0
\qquad \text{and} \qquad
\sigma^2(y) = \int_M \mathrm{dist}(y, z)^2 \, dP(z) < \infty

Goal and problem


Let now \vec{g}(y) = \int_{M \setminus C(y)} \overrightarrow{yz} \, dP(z). As ‖\overrightarrow{yz}‖ = dist(y, z) ≤ 1 + dist(y, z)², and using the null probability
of the cut locus, we have:

\|\vec{g}(y)\| \leq \int_{M \setminus C(y)} \|\overrightarrow{yz}\| \, dP(z) \leq \int_{M \setminus C(y)} \left( 1 + \mathrm{dist}(y, z)^2 \right) dP(z) = 1 + \sigma^2(y) < \infty

Thus \vec{g}(y) is well defined everywhere. Let h(y) = dist(y, z)² = ‖\overrightarrow{yz}‖². For a fixed z ∉ C(y) (which is
equivalent to y ∉ C(z)), we have (grad h)(y) = −2 \overrightarrow{yz}. Thus the proposition:

(\mathrm{grad}\ \sigma^2)(y) = -2 \int_{M \setminus C(y)} \overrightarrow{yz} \, dP(z)

corresponds to a derivation under the sum, but the usual conditions of the Lebesgue theorem are
not fulfilled: the zero measure set C(y) varies with y. Thus, we have to come back to the original
definition of the gradient.

Definition of the gradient


Let γ(t) be a curve with γ(0) = y and γ̇(0) = w. By definition, the function σ²: M → ℝ is
differentiable if there exists a vector (grad σ²)(y) ∈ T_y M (the gradient) such that:

\forall w \in T_y M \qquad \langle\, (\mathrm{grad}\ \sigma^2)(y) \mid w \,\rangle = \partial_w \sigma^2(y) = \lim_{t \to 0} \frac{\sigma^2(\gamma(t)) - \sigma^2(y)}{t}

Since tangent vectors are defined as equivalence classes, we can choose the geodesic curve γ(t) =
exp_y(t w). Using v = t w, the above condition can then be rewritten:

\forall v \in T_y M \qquad \lim_{\|v\| \to 0} \frac{\sigma^2(\exp_y(v)) - \sigma^2(y) - \langle\, (\mathrm{grad}\ \sigma^2)(y) \mid v \,\rangle}{\|v\|} = 0

which can be rephrased as: for all η ∈ ℝ⁺, there exists ε sufficiently small such that:

\forall v \in T_y M, \ \|v\| < \varepsilon \qquad \left| \sigma^2(\exp_y(v)) - \sigma^2(y) - \langle\, (\mathrm{grad}\ \sigma^2)(y) \mid v \,\rangle \right| \leq \eta \, \|v\|

General idea
Let Δ(z, v) be the integrated function (for z ∉ C(y)):

\Delta(z, v) = \mathrm{dist}(\exp_y(v), z)^2 - \mathrm{dist}(y, z)^2 + 2 \, \langle\, \overrightarrow{yz} \mid v \,\rangle

and H(v) = \int_{M \setminus C(y)} \Delta(z, v) \, dP(z) be the function to bound:

H(v) = \sigma^2(\exp_y(v)) - \sigma^2(y) - \langle\, (\mathrm{grad}\ \sigma^2)(y) \mid v \,\rangle
= \int_M \mathrm{dist}(\exp_y(v), z)^2 \, dP(z) - \int_M \mathrm{dist}(y, z)^2 \, dP(z) + 2 \int_{M \setminus C(y)} \langle\, \overrightarrow{yz} \mid v \,\rangle \, dP(z)

The idea is to split this integral in two in order to bound ∆ on a small neighborhood W around
the cut locus of y and to use the standard Lebesgue theorem to bound the integral of ∆ on M\W .
Lemma 1 W_ε = \bigcup_{x \in B(y, \varepsilon)} C(x) is a continuous series of included and decreasing open sets all
containing C(y) and converging towards it.

Let us first reformulate the definition of W_ε:

z \in W_\varepsilon \iff \exists x \in B(y, \varepsilon) \ / \ z \in C(x) \iff \exists x \in B(y, \varepsilon) \ / \ x \in C(z) \iff C(z) \cap B(y, \varepsilon) \neq \emptyset

Going to the limit, we have: z ∈ W₀ = lim_{ε→0} W_ε ⇔ C(z) ∩ {y} ≠ ∅ ⇔ z ∈ C(y). Thus, we have
a continuous series of included and decreasing sets all containing C(y) and converging toward it.
Now, let us prove that Wε is an open set.
Let U = {u = (x, v) ∈ M × T_x M ; ‖v‖ = 1} be the unit tangent bundle of M and ρ: U →
ℝ̄⁺ = ℝ⁺ ∪ {+∞} be the cutting abscissa of the geodesic starting at x with the tangent vector v.
Let now Ũ = ρ^(-1)(ℝ⁺) be the subset of the unit tangent bundle where ρ(u) < +∞. The function ρ
being continuous on U, the subspace Ũ = ρ^(-1)([0, +∞[) is open. Let π: U → M be the canonical
projection along the fiber, π((x, v)) = x. This is a continuous and open map. Let us denote by
U_x = π^(-1)({x}) the unit tangent bundle at point x and Ũ_x = π^(-1)({x}) ∩ Ũ its subset that leads to
a cutting point of x.
Consider u = (x, v) ∈ Ũ_x and the geodesic exp_x(t ρ(u) v) for t ∈ [0; 1]: it starts from x
with tangent vector v and arrives at z = exp_x(ρ(u) v) with the same tangent vector by parallel
transportation. Reverting the time course (t → 1 − t), we have a geodesic starting at z with tangent
vector −v and arriving at x with the same tangent vector. By definition of the cutting function, we
have ρ(u) = ρ(u′) with u′ = (z, −v). Thus, the function q(u) = (exp_x(ρ(u) v), −v) is a continuous
bijection from Ũ into itself with q^(-1) = q.
Let z ∈ W_ε. By definition of W_ε, z is in the cut locus of a point x ∈ B(y, ε): there exists a unit
tangent vector v at that point such that z = π(q(x, v)). Conversely, the cut locus of z intersects
B(y, ε): there exists a unit tangent vector v at z such that x = π(q(z, v)) ∈ B(y, ε). Thus, we can
rewrite W_ε = π(U_ε) where U_ε = q^(-1)(π^(-1)(B(y, ε)) ∩ Ũ). The functions π and q being continuous,
U_ε is open. Finally, π being an open map, we conclude that W_ε is open.

Lemma 2 Let y ∈ M and α > 0. Then there exists an open neighborhood W_ε of C(y) such that
(i) For all x ∈ B(y, ε), C(x) ⊂ W_ε,
(ii) P(W_ε) = \int_{W_\varepsilon} dP(z) < α
(iii) \int_{W_\varepsilon} \mathrm{dist}(y, z) \, dP(z) < α

By hypothesis, the cut locus C(y) has a null measure for the measure dP(z). The distance
being a measurable function, its measure is null on the cut locus: \int_{C(y)} \mathrm{dist}(y, z) \, dP(z) = 0. Thus,
the functions \int_{W_\varepsilon} dP(z) and \int_{W_\varepsilon} \mathrm{dist}(y, z) \, dP(z) are converging toward zero as ε goes to zero. By
continuity, we can make both terms smaller than any positive α by choosing ε sufficiently small.

Bounding ∆ on Wε
Let W_ε, ε and α satisfy the conditions of lemma 2, and let x, x′ ∈ B(y, ε). We have dist(z, x) ≤
dist(z, x′) + dist(x′, x). Thus:

\left| \mathrm{dist}(z, x)^2 - \mathrm{dist}(z, x')^2 \right| \leq \mathrm{dist}(x, x') \left( \mathrm{dist}(x, x') + 2 \, \mathrm{dist}(z, x') \right)

Using the symmetry of x and x′ and the inequalities dist(x, x′) ≤ 2ε and dist(z, x′) ≤ dist(z, y) + ε, we
have: |dist(z, x)² − dist(z, x′)²| ≤ 2 dist(x, x′) (2ε + dist(z, y)). Applying this bound to x = exp_y(v)
and x′ = y, we obtain:

\left| \mathrm{dist}(z, \exp_y(v))^2 - \mathrm{dist}(z, y)^2 \right| \leq 2 \left( 2\varepsilon + \mathrm{dist}(z, y) \right) \|v\|

Now, the last term of Δ(z, v) is easily bounded by: |⟨ \overrightarrow{yz} | v ⟩| ≤ dist(y, z) ‖v‖. Thus, we have:
|Δ(z, v)| ≤ 4 (ε + dist(z, y)) ‖v‖ and its integral over W_ε is bounded by:

\left| \int_{W_\varepsilon} \Delta(z, v) \, dP(z) \right| \leq 4 \|v\| \int_{W_\varepsilon} \left( \varepsilon + \mathrm{dist}(z, y) \right) dP(z) < 8 \alpha \|v\|

Bounding ∆ on M\Wε :
Let x = exp_y(v) ∈ B(y, ε). We know from lemma 2 that the cut locus C(x) of such a point belongs
to W_ε. Thus, the integration domain M\W_ε is now independent of y and we can use the usual
Lebesgue theorem to differentiate under the sum:

\mathrm{grad}\left( \int_{M \setminus W_\varepsilon} \mathrm{dist}(y, z)^2 \, dP(z) \right) = -2 \int_{M \setminus W_\varepsilon} \overrightarrow{yz} \, dP(z)

By definition, this means that for ‖v‖ small enough, we have:

\left| \int_{M \setminus W_\varepsilon} \Delta(z, v) \, dP(z) \right| < \alpha \|v\|

Conclusion
Thus, for ‖v‖ small enough, we have \left| \int_M \Delta(z, v) \, dP(z) \right| < 9α‖v‖, which means that the variance has
a derivative at the point y:

(\mathrm{grad}\ \sigma^2)(y) = -2 \int_{M \setminus C(y)} \overrightarrow{yz} \, dP(z)

B Approximation of the generalized Normal density


In this section, we only work with a normal coordinate system at the mean value of the considered
normal law. The generalized Normal density is: N (y) = k exp (−y T Γ y/2), where the normalization
constant, the covariance and concentration are related by:
k^{(-1)} = \int_M \exp\left( - \frac{y^T \Gamma \, y}{2} \right) dM(y)
\qquad \text{and} \qquad
\Sigma = k \int_M y \, y^T \exp\left( - \frac{y^T \Gamma \, y}{2} \right) dM(y)
We assume here that these two integrals converge toward a finite value for all concentration matrices
Γ. In these expressions, the parameter is the concentration matrix Γ. The goal of this appendix is

to invert these relations in order to obtain a Taylor expansion of the concentration matrix and the
normalization coefficient k with respect to the variance. This can be realized thanks to the Taylor
expansion of the Riemannian metric around the origin in a normal coordinate system [63, section
2.8, corollary 2.3, p.84]: det(gij (exp v)) = 1 − Ric (v, v)/3 + O(kvk3 ), where Ric is the expression
of the Ricci tensor of scalar curvatures in the exponential chart. Thus [42, p.144]:

dM(y) = \sqrt{\det(G(y))} \; dy = \left( 1 - \frac{y^T \, \mathrm{Ric} \; y}{6} + R_{dM}(y) \right) dy
\qquad \text{with} \qquad
\lim_{y \to 0} \frac{R_{dM}(y)}{\|y\|^3} = 0

Let us investigate the normalization coefficient first. We have:

k^{(-1)} = \int_D \exp\left( - \frac{y^T \Gamma \, y}{2} \right) \left( 1 - \frac{y^T \mathrm{Ric} \; y}{6} + R_{dM}(y) \right) dy

where D is the definition domain of the exponential chart. Since the concentration matrix Γ is a
symmetric positive definite matrix, we have a unique symmetric positive definite square root Γ1/2 ,
which is obtained by changing the eigenvalues of Γ into their square roots in a diagonalization. Using
the change of variable z = Γ^{1/2} y in the first two terms and the matrix T = Γ^{-1/2} Ric Γ^{-1/2} to simplify
the notations, we have:

k^{(-1)} = \det(\Gamma)^{-1/2} \int_{D'} \exp\left( - \frac{\|z\|^2}{2} \right) \left( 1 - \frac{z^T T z}{6} \right) dz
+ \int_D \exp\left( - \frac{y^T \Gamma \, y}{2} \right) R_{dM}(y) \, dy

where the new definition domain is D′ = Γ^{1/2} D. Likewise, we have for the covariance:

k^{(-1)} \Sigma = \frac{1}{\sqrt{\det(\Gamma)}} \int_{D'} \Gamma^{-\frac{1}{2}} (z z^T) \Gamma^{-\frac{1}{2}} \exp\left( - \frac{\|z\|^2}{2} \right) \left( 1 - \frac{z^T T z}{6} \right) dz
+ \int_D y \, y^T \exp\left( - \frac{y^T \Gamma \, y}{2} \right) R_{dM}(y) \, dy

B.1 Manifolds with an infinite injectivity radius at the mean point


At such a point, the cut locus is empty and the definition domain of the exponential chart is D =
D′ = ℝⁿ. This simplifies the integrals so that we can compute the first two terms explicitly.

The normalization coefficient: The first term is


\int_{\mathbb{R}^n} \exp\left( - \frac{\|z\|^2}{2} \right) dz = \prod_i \int_{\mathbb{R}} \exp\left( - \frac{z_i^2}{2} \right) dz_i = (2\pi)^{n/2}

By linearity, we also easily compute the second term:

\int_{\mathbb{R}^n} z^T T z \, \exp\left( - \frac{\|z\|^2}{2} \right) dz
= \mathrm{Tr}\left( T \int_{\mathbb{R}^n} z \, z^T \exp\left( - \frac{\|z\|^2}{2} \right) dz \right) = (2\pi)^{n/2} \, \mathrm{Tr}(T)

Thus, we end up with:

k^{(-1)} = \frac{(2\pi)^{n/2}}{\sqrt{\det(\Gamma)}} \left( 1 - \frac{\mathrm{Tr}(\Gamma^{(-1)} \mathrm{Ric})}{6} \right)
+ \int_{\mathbb{R}^n} \exp\left( - \frac{y^T \Gamma \, y}{2} \right) R_{dM}(y) \, dy

Unfortunately, the last term is not so easy to simplify, as we only know that the remainder
RdM (y) behaves as a O(kyk3 ) for small values of y. The idea is to split the right integral into
one part around zero, say for kyk < α, where we know how to bound the remainder, and to show

that exp(−‖y‖²) dominates the remainder elsewhere. Let γ_m be the smallest eigenvalue of Γ. As
yᵀΓy ≥ γ_m ‖y‖², we have:

\int_{\mathbb{R}^n} \exp\left( - \frac{y^T \Gamma \, y}{2} \right) |R_{dM}(y)| \, dy
\leq \int_{\mathbb{R}^n} \exp\left( - \gamma_m \frac{\|y\|^2}{2} \right) |R_{dM}(y)| \, dy

For any η > 0, there exists a constant α > 0 such that ‖y‖ < α implies that |R_{dM}(y)| < η ‖y‖³.
Thus:

\int_{\|y\| < \alpha} \exp\left( - \gamma_m \frac{\|y\|^2}{2} \right) |R_{dM}(y)| \, dy
< \eta \int_{\|y\| < \alpha} \exp\left( - \gamma_m \frac{\|y\|^2}{2} \right) \|y\|^3 \, dy
< \eta \int_{\mathbb{R}^n} \exp\left( - \gamma_m \frac{\|y\|^2}{2} \right) \|y\|^3 \, dy
= \frac{2 \, \eta \, \mathrm{Vol}(S_{n-1})}{\gamma_m^2}
Thus, we know that this part of the integral behaves as a O(γ_m^{-2}). For the other part, as exp
is a monotonous function, one has: exp(−γ ‖y‖²) < exp(−‖y‖²)/γ² for ‖y‖² > 2 log(γ)/(γ − 1).
Moreover, as the limit of log(γ)/(γ − 1) is zero at γ = +∞, it can be made smaller than α/2
provided that γ is large enough. Thus, we have that (for γ_m large enough):

\int_{\|y\|^2 > \alpha} \exp\left( - \gamma_m \frac{\|y\|^2}{2} \right) |R_{dM}(y)| \, dy
< \frac{1}{\gamma_m^2} \int_{\|y\|^2 > \alpha} \exp\left( - \frac{\|y\|^2}{2} \right) |R_{dM}(y)| \, dy
< \frac{1}{\gamma_m^2} \int_{\mathbb{R}^n} \exp\left( - \frac{\|y\|^2}{2} \right) |R_{dM}(y)| \, dy

The last integral is a constant which is finite, since we assumed that k^(-1) was finite for all Γ. Thus,
this part of the integral behaves also as a O(γ_m^{-2}) when γ_m goes to infinity. We have obtained that:

\int_{\mathbb{R}^n} \exp\left( - y^T \Gamma \, y / 2 \right) R_{dM}(y) \, dy = O\left( \gamma_m^{-2} \right)    (4)

so that

k^{(-1)} = \frac{(2\pi)^{n/2}}{\sqrt{\det(\Gamma)}} \left( 1 - \frac{\mathrm{Tr}(\Gamma^{(-1)} \mathrm{Ric})}{6} \right) + O\left( \gamma_m^{-2} \right)

As Tr(Γ^(-1) Ric) is a term in γ_m^{-1} and \sqrt{\det(\Gamma)} a term in γ_m^{1/2}, we finally have:

k = \frac{\sqrt{\det(\Gamma)}}{(2\pi)^{n/2}} \left( 1 + \frac{\mathrm{Tr}(\Gamma^{(-1)} \mathrm{Ric})}{6} + O\left( \gamma_m^{-3/2} \right) \right)    (5)

The covariance matrix: The principle is the same as for the normalization coefficient, with
slightly more complex integrals. We basically have to compute

k^{(-1)} \Sigma = \det(\Gamma)^{-1/2} \, \Gamma^{-1/2} \left( I - \frac{J}{6} \right) \Gamma^{-1/2}
+ \int_D y \, y^T \exp\left( - \frac{y^T \Gamma \, y}{2} \right) R_{dM}(y) \, dy

with I = \int_{\mathbb{R}^n} z \, z^T \exp\left( - \|z\|^2/2 \right) dz
and J = \int_{\mathbb{R}^n} \mathrm{Tr}(T z z^T) \, z \, z^T \exp\left( - \|z\|^2/2 \right) dz
These are classical calculations in multivariate statistics: for the first integral, the off-diagonal
elements I_ij (with i ≠ j) vanish because we integrate antisymmetric functions over ℝ:

\int_{\mathbb{R}^n} z_i z_j \exp\left( - \|z\|^2/2 \right) dz = 0 \quad \text{for } i \neq j

The diagonal elements are:

I_{ii} = \int_{\mathbb{R}^n} z_i^2 \exp(-\|z\|^2/2) \, dz
= \left( \int_{\mathbb{R}} z_i^2 \exp(-z_i^2/2) \, dz_i \right) \prod_{j \neq i} \int_{\mathbb{R}} \exp(-z_j^2/2) \, dz_j = (2\pi)^{n/2}

so that we have I = (2π)^{n/2} Id. Since Tr(T z zᵀ) = Σ_{k,l} T_{kl} z_k z_l, the elements of the second integral
are:

J_{ij} = \sum_{k,l} T_{kl} \int_{\mathbb{R}^n} z_k z_l \, z_i z_j \exp\left( - \|z\|^2/2 \right) dz

Let us investigate first the off-diagonal elements (i ≠ j): for k ≠ i and l ≠ j, or l ≠ i and k ≠ j,
we integrate an antisymmetric function and the result is zero. Since the Ricci curvature matrix is
symmetric, the matrix T is also symmetric and the sum is reduced to a single term:

J_{ij} = (T_{ij} + T_{ji}) \int_{\mathbb{R}^n} z_i^2 z_j^2 \exp\left( - \|z\|^2/2 \right) dz = 2 \, T_{ij} \, (2\pi)^{n/2} \quad \text{for } i \neq j

The diagonal terms are: J_{ii} = \sum_{k,l} T_{kl} \int_{\mathbb{R}^n} z_i^2 \, z_k z_l \exp(-\|z\|^2/2) \, dz. For k ≠ l, we integrate
antisymmetric functions: the result is zero. Thus, we are left with:

J_{ii} = \sum_{k \neq i} T_{kk} \int_{\mathbb{R}^n} z_i^2 z_k^2 \exp(-\|z\|^2/2) \, dz + T_{ii} \int_{\mathbb{R}^n} z_i^4 \exp(-\|z\|^2/2) \, dz
= \sum_{k \neq i} T_{kk} \, (2\pi)^{n/2} + 3 \, T_{ii} \, (2\pi)^{n/2} = (2\pi)^{n/2} \left( \mathrm{Tr}(T) + 2 \, T_{ii} \right)

Combining with the off-diagonal terms, we get: J = (2π)^{n/2} (Tr(T) Id + 2 T), so that:

k^{(-1)} \Sigma = \frac{(2\pi)^{n/2}}{\sqrt{\det(\Gamma)}} \, \Gamma^{-1/2} \left( \mathrm{Id} - \frac{\mathrm{Tr}(T)\,\mathrm{Id} + 2\,T}{6} \right) \Gamma^{-1/2}
+ \int_D y \, y^T \exp\left( - \frac{y^T \Gamma \, y}{2} \right) R_{dM}(y) \, dy

Like for the normalization constant, we now have to bound the integral of the remainder. The
principle is the same. Let γ_m be the smallest eigenvalue of Γ. As yᵀΓy ≥ γ_m ‖y‖², we have:

\left\| \int_{\mathbb{R}^n} y \, y^T \exp\left( - \frac{y^T \Gamma \, y}{2} \right) R_{dM}(y) \, dy \right\|
\leq \int_{\mathbb{R}^n} \|y\|^2 \exp\left( - \gamma_m \frac{\|y\|^2}{2} \right) |R_{dM}(y)| \, dy

For any η > 0, we can find a constant α > 0 such that ‖y‖ < α implies that |R_{dM}(y)| < η ‖y‖³, i.e.:

\int_{\|y\| < \alpha} \|y\|^2 \exp\left( - \gamma_m \frac{\|y\|^2}{2} \right) |R_{dM}(y)| \, dy
< \eta \int_{\|y\| < \alpha} \|y\|^2 \exp\left( - \gamma_m \frac{\|y\|^2}{2} \right) \|y\|^3 \, dy
< \eta \int_{\mathbb{R}^n} \exp\left( - \gamma_m \frac{\|y\|^2}{2} \right) \|y\|^5 \, dy
= \frac{8 \, \eta \, \mathrm{Vol}(S_{n-1})}{\gamma_m^3}
Thus, we know that this part of the integral behaves as a O(γ_m^{-3}). For the other part, as exp
is a monotonous function, one has: exp(−γ ‖y‖²) < exp(−‖y‖²)/γ³ for ‖y‖² > 3 log(γ)/(γ − 1).
Moreover, as the limit of log(γ)/(γ − 1) is zero at γ = +∞, it can be made smaller than α/3
provided that γ is large enough. Thus, we have that (for γ_m large enough):

\int_{\|y\|^2 > \alpha} \|y\|^2 \exp\left( - \gamma_m \frac{\|y\|^2}{2} \right) |R_{dM}(y)| \, dy
< \frac{1}{\gamma_m^3} \int_{\|y\|^2 > \alpha} \|y\|^2 \exp\left( - \frac{\|y\|^2}{2} \right) |R_{dM}(y)| \, dy
< \frac{1}{\gamma_m^3} \int_{\mathbb{R}^n} \|y\|^2 \exp\left( - \frac{\|y\|^2}{2} \right) |R_{dM}(y)| \, dy

The last integral is a constant which is finite, since we assumed that Σ was finite for all Γ. Thus,
this part of the integral behaves also as a O(γ_m^{-3}) when γ_m goes to infinity. As 1/\sqrt{\det(\Gamma)} is a term
in γ_m^{-1/2}, we finally have:

\Sigma = k \, \frac{(2\pi)^{n/2}}{\sqrt{\det(\Gamma)}} \left( \Gamma^{(-1)} - \frac{\mathrm{Tr}(\Gamma^{(-1)} \mathrm{Ric})}{6} \, \Gamma^{(-1)} - \frac{\Gamma^{(-1)} \mathrm{Ric} \, \Gamma^{(-1)}}{3} \right) + O\left( \gamma_m^{-5/2} \right)

Simplifying this expression using the previous Taylor expansion of k (Eq. 5), we obtain the
following relation between the covariance and the concentration matrices:

\Sigma = \Gamma^{(-1)} - \frac{1}{3} \, \Gamma^{(-1)} \mathrm{Ric} \, \Gamma^{(-1)} + O\left( \gamma_m^{-5/2} \right)    (6)
3

Inverting the variables of the Taylor expansions This relation can be inverted to obtain the
Taylor expansion of the concentration matrix with respect to the covariance. First, we shall note
that from the above equation, O(γ_m^{-5/2}) = O(σ_max⁵) where σ_max is the square root of the largest
eigenvalue of Σ. However, a global variable such as the variance σ² = Tr(Σ) is more appropriate.
Since σ_max² ≤ Σ_i σ_i² = σ² ≤ n σ_max², we have O(σ_max) = O(σ). Then, one easily verifies that the
Taylor expansion of Γ^(-1) is: Γ^(-1) = Σ + Σ Ric Σ/3 + O(σ⁵), and finally:

\Gamma = \Sigma^{(-1)} - \frac{1}{3} \mathrm{Ric} + O(\sigma)    (7)
To express k with respect to Σ instead of Γ, we have to compute Tr(Γ^(-1) Ric) and \sqrt{\det(\Gamma)}.

\mathrm{Tr}(\Gamma^{(-1)} \mathrm{Ric}) = \mathrm{Tr}\left( \Sigma \mathrm{Ric} + \frac{1}{3} \Sigma \mathrm{Ric} \, \Sigma \mathrm{Ric} + O(\sigma^5) \right) = \mathrm{Tr}(\Sigma \mathrm{Ric}) + O(\sigma^3)

For the determinant, one verifies that if A is a differentiable map from ℝ to the matrix group,
one has det(A)′ = det(A) Tr(A′ A^(-1)), so that det(Id + η B) = 1 + η Tr(B) + O(η²). Since
Γ Σ = Id − (1/3) Ric Σ + O(σ³) and Σ is a term in O(σ²), we have

\det(\Gamma) \det(\Sigma) = \det\left( \mathrm{Id} - \frac{1}{3} \mathrm{Ric} \, \Sigma + O(\sigma^3) \right) = 1 - \frac{1}{3} \mathrm{Tr}(\Sigma \mathrm{Ric}) + O(\sigma^3)

and thus

\sqrt{\det(\Gamma)} = \frac{1}{\sqrt{\det(\Sigma)}} \left( 1 - \frac{1}{6} \mathrm{Tr}(\Sigma \mathrm{Ric}) + O(\sigma^3) \right)
Substituting this expression in equation 5, we obtain:

k = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \left( 1 - \frac{1}{6} \mathrm{Tr}(\Sigma \mathrm{Ric}) + O(\sigma^3) \right) \left( 1 + \frac{1}{6} \mathrm{Tr}(\Sigma \mathrm{Ric}) + O(\sigma^3) \right)

which simplifies into:

k = \frac{1 + O(\sigma^3)}{\sqrt{(2\pi)^n \det(\Sigma)}}    (8)

Summary of the approximate normal density: In a manifold without a cut locus at the
mean point, the normal density in a normal coordinate system at the mean value is:

N(y) = k \, \exp\left( - \frac{y^T \Gamma \, y}{2} \right)

The normalization constant and the concentration matrix are approximated by the following
expressions for a covariance matrix Σ of small variance σ² = Tr(Σ):

k = \frac{1 + O(\sigma^3)}{\sqrt{(2\pi)^n \det(\Sigma)}}
\qquad \text{and} \qquad
\Gamma = \Sigma^{(-1)} - \frac{1}{3} \mathrm{Ric} + O(\sigma)

B.2 Manifolds with a cut locus at the mean point


Now, we have to compute all the integrals over the definition domain D of the exponential chart
at the mean point, which is only a subset of ℝⁿ. For the integrals involving the remainder of the
Taylor expansion of the metric, there is no problem as a bound on D is dominated by the bound
on ℝⁿ we already computed. For the other integrals, the idea is to bound each term from above
and below to show that the same results still hold with just a slight modification of the Taylor
expansion bounding term. Let r be the (finite) injectivity radius at the mean point. The open ball
B(r) is the greatest geodesic ball included in the definition domain: B(r) ⊂ D ⊂ ℝⁿ. The idea is
to bound the integrals of positive terms by integrating over these three sets.
In fact, with the change of variable z = Γ^{1/2} y, the ball is transformed into the ellipsoid
Γ^{1/2} B(r) = {z | zᵀ Γ^(-1) z < r²}. As zᵀ Γ^(-1) z ≤ γ_m^(-1) ‖z‖², we may use the smaller ball B(√γ_m r) for the
integrations in polar coordinates. For the integrals that are separable along axes, it will be simpler
to use the maximal cube C = {z / −√γ_m r/√n < z_i < √γ_m r/√n}. As a summary, we have the
inclusion chain: C ⊂ B(√γ_m r) ⊂ Γ^{1/2} B(r) ⊂ D′ = Γ^{1/2} D ⊂ ℝⁿ.
In the following, we bound the different integrals used in the last section and afterwards summarize
the modifications this implies in the previous results. The first integral is bounded
above by the previous value:

\int_{D'} \exp\left( - \frac{\|z\|^2}{2} \right) dz \leq \int_{\mathbb{R}^n} \exp\left( - \frac{\|z\|^2}{2} \right) dz = (2\pi)^{n/2}
and below by an integration over C:

\int_{D'} \exp\left( - \frac{\|z\|^2}{2} \right) dz
\geq \prod_i \int_{-\sqrt{\gamma_m}\, r/\sqrt{n}}^{\sqrt{\gamma_m}\, r/\sqrt{n}} \exp\left( - \frac{z_i^2}{2} \right) dz_i
\geq (2\pi)^{n/2} \left( 1 - \mathrm{erfc}\!\left( \frac{\sqrt{\gamma_m}\, r}{\sqrt{2n}} \right) \right)^n
Here, erfc(x) = 1 − (2/√π) ∫_0^x exp(−t²) dt is the complement of the error function. An interesting
property is that this function (like the exponential) tends toward zero faster than any fraction 1/x^k
as x goes to infinity. Putting things the other way, like for Taylor expansions, we will denote by
ε(x) a function that is an O(x^k) for every positive k:

\lim_{x \to 0^+} \frac{\varepsilon(x)}{x^k} = \lim_{x \to 0^+} \frac{\mathrm{erfc}(1/x)}{x^k} = \lim_{x \to 0^+} \frac{\exp(-1/x)}{x^k} = 0
We can summarize the value of the first integral by:

\int_{D'} \exp\left( - \frac{\|z\|^2}{2} \right) dz = (2\pi)^{n/2} + \varepsilon\left( \gamma_m^{-1/2} \, r^{-1} \right)

Now, for the second integral, we have

\int_{-\sqrt{\gamma_m}\, r/\sqrt{n}}^{\sqrt{\gamma_m}\, r/\sqrt{n}} z_i^2 \exp\left( - \frac{z_i^2}{2} \right) dz_i
= \sqrt{2\pi} - \sqrt{2\pi}\, \mathrm{erfc}\!\left( \frac{\sqrt{\gamma_m}\, r}{\sqrt{2n}} \right)
- \frac{2 \sqrt{\gamma_m}\, r}{\sqrt{n}} \exp\left( - \frac{\gamma_m r^2}{2n} \right)
= \sqrt{2\pi} + \varepsilon\left( \gamma_m^{-1/2} \, r^{-1} \right)

We thus obtain

\int_{D'} z_i^2 \exp\left( - \frac{\|z\|^2}{2} \right) dz = (2\pi)^{n/2} + \varepsilon\left( \gamma_m^{-1/2} \, r^{-1} \right)
In fact, it is easy to see that every integral we computed over ℝⁿ in the previous paragraph has
the same value over D′ plus a term whose absolute value is of the order of ε(γ_m^{-1/2} r^{-1}) = ε(σ/r).
Thus, we can directly generalize the previous results by replacing O(σ^k) with O(σ^k) + ε(σ/r).

Summary of the approximate normal density: In a manifold with injectivity radius r at the
mean point, the normal density in a normal coordinate system at this mean value is:

N(y) = k \, \exp\left( - \frac{y^T \Gamma \, y}{2} \right)

The normalization constant and the concentration matrix are approximated by the following
expressions for a covariance matrix Σ of small variance σ² = Tr(Σ):

k = \frac{1 + O(\sigma^3) + \varepsilon(\sigma/r)}{\sqrt{(2\pi)^n \det(\Sigma)}}
\qquad \text{and} \qquad
\Gamma = \Sigma^{(-1)} - \frac{1}{3} \mathrm{Ric} + O(\sigma) + \varepsilon\!\left(\frac{\sigma}{r}\right)
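As a numerical illustration of these expansions (our own check, not part of the original derivation), one can verify Eq. 6 on the unit sphere S², where in polar normal coordinates at the mean point the measure is sin(r) dr dφ, the injectivity radius is π and Ric = Id in an orthonormal frame. The sketch below computes the exact covariance of the isotropic normal law Γ = γ Id by quadrature and compares it with Γ^(-1) − Γ^(-1) Ric Γ^(-1)/3 = (1/γ − 1/(3γ²)) Id; the ε(σ/r) corrections are negligible here.

import numpy as np
from scipy.integrate import quad

def covariance_on_sphere(gamma):
    """Exact per-axis covariance of the normal law with Gamma = gamma*Id on the unit 2-sphere."""
    k_inv = 2 * np.pi * quad(lambda r: np.sin(r) * np.exp(-gamma * r**2 / 2), 0, np.pi)[0]
    # By rotational symmetry Sigma = sigma2 * Id with 2*sigma2 = E[r^2]:
    second_moment = 2 * np.pi * quad(lambda r: r**2 * np.sin(r) * np.exp(-gamma * r**2 / 2), 0, np.pi)[0]
    return (second_moment / k_inv) / 2.0

for gamma in [20.0, 50.0, 100.0]:
    exact = covariance_on_sphere(gamma)
    approx = 1 / gamma - 1 / (3 * gamma**2)       # Eq. 6 with Ric = Id
    print(gamma, exact, approx)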

C Approximated generalized χ2 law


Assuming that the random point x ∼ (x̄, Σ) follows a normal law, we want to compute the probability that

\chi^2 = \mu_x^2 = \overrightarrow{\bar{x}x}^T \, \Sigma^{(-1)} \, \overrightarrow{\bar{x}x} \leq \alpha^2

Let B_Σ(α) be the “elliptic ball” of covariance Σ and radius α: B_Σ(α) = {y / yᵀ Σ^(-1) y ≤ α²}.

C.1 Manifold without a cut locus at the mean point


Assuming that there is no cut locus, this probability can be written in a normal coordinate system:

\Pr\{\chi^2 \leq \alpha^2\} = \int_{\chi^2 \leq \alpha^2} N(y) \, dM(y)
= k \int_{B_\Sigma(\alpha)} \exp\left( - \frac{y^T \Gamma \, y}{2} \right) \sqrt{\det(G(y))} \, dy

Since \sqrt{\det(G(y))} = 1 − (1/6) yᵀ Ric y + R_{dM}(y), we have:

\Pr\{\chi^2 \leq \alpha^2\} = k \int_{B_\Sigma(\alpha)} \exp\left( - \frac{y^T \Gamma \, y}{2} \right) \left( 1 - \frac{y^T \mathrm{Ric} \; y}{6} \right) dy
+ k \int_{B_\Sigma(\alpha)} \exp\left( - \frac{y^T \Gamma \, y}{2} \right) R_{dM}(y) \, dy

The last integral, involving the remainder of the metric, is obviously bounded by the integral over
the whole tangent space k \int_{\mathbb{R}^n} \exp(-y^T \Gamma y/2) |R_{dM}(y)| \, dy, which we have shown to be a O(σ³) in
appendix B. Let Σ^{1/2} be the positive symmetric square root of Σ. With the change of variable
y = Σ^{1/2} x, this probability becomes:

\Pr\{\chi^2 \leq \alpha^2\} = k \sqrt{\det(\Sigma)} \int_{\|x\| \leq \alpha} \exp\left( - \frac{1}{2} x^T \Sigma^{\frac{1}{2}} \Gamma \, \Sigma^{\frac{1}{2}} x \right) \left( 1 - \frac{1}{6} x^T \Sigma^{\frac{1}{2}} \mathrm{Ric} \; \Sigma^{\frac{1}{2}} x \right) dx + O(\sigma^3)

Using S = (1/3) Σ^{1/2} Ric Σ^{1/2} and the fact that k \sqrt{\det(\Sigma)} = (2π)^{-n/2} (1 + O(σ³)) (from Eq. 8), we
end up with

\Pr\{\chi^2 \leq \alpha^2\} = \frac{1 + O(\sigma^3)}{\sqrt{(2\pi)^n}} \int_{\|x\| \leq \alpha} \exp\left( - \frac{1}{2} x^T \Sigma^{\frac{1}{2}} \Gamma \, \Sigma^{\frac{1}{2}} x \right) \left( 1 - \frac{1}{2} x^T S \, x \right) dx + O(\sigma^3)

The goal is now to show that the above integrated term is basically exp(−kxk2 /2) plus a remainder
which integrates into a O(σ 3 ). In this process, we will have to compute the integrals:

kxk2
Z  
Ik (α) = exp − kxk2k dx
kxk<α 2

By changing into polar coordinates (r = ‖x‖ is the radius and u is the corresponding unit vector
so that x = r u), we have dx = r^{n-1} dr du and thus:

I_k(\alpha) = \left( \int_{S_{n-1}} du \right) \left( \int_0^\alpha r^{2k+n-1} \exp(-r^2/2) \, dr \right)
= \frac{\pi^{n/2}}{\Gamma(n/2)} \int_0^{\alpha^2} t^{\,k-1+\frac{n}{2}} \exp\left( - \frac{t}{2} \right) dt

with the change of variable t = r². In this formula, Γ(x) = \int_0^{+\infty} t^{x-1} \exp(-t) \, dt is the Gamma function,
which can be recursively computed from Γ(x + 1) = x Γ(x) with Γ(1) = 1 and Γ(1/2) = √π.
The value I0 (α) is in fact the standard cumulated probability function of the χ2 law (up to a
normalization factor). For α = +∞, the remaining integral can be computed using the Gamma
function:

I_k(+\infty) = \frac{(2\pi)^{n/2} \, 2^k \, \Gamma\!\left(k + \frac{n}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)}
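This closed form is easy to verify numerically; the following sketch (our own sanity check, not part of the proof) compares the radial quadrature of the definition with the Gamma-function expression for a few values of k and n.

import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def I_k_infinity(k, n):
    """I_k(+inf) computed from its polar-coordinate definition."""
    vol_sphere = 2 * np.pi ** (n / 2) / gamma(n / 2)          # Vol(S_{n-1})
    radial, _ = quad(lambda r: r ** (2 * k + n - 1) * np.exp(-r ** 2 / 2), 0, np.inf)
    return vol_sphere * radial

for k, n in [(0, 2), (1, 3), (2, 5)]:
    closed_form = (2 * np.pi) ** (n / 2) * 2 ** k * gamma(k + n / 2) / gamma(n / 2)
    print(k, n, I_k_infinity(k, n), closed_form)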

Series expansion of the exponential From Eq. 7, we have Σ^{1/2} Γ Σ^{1/2} = Id − S − R_Γ, where
the remainder R_Γ behaves as a O(σ³). Thus, we have

\exp\left( - \frac{x^T \Sigma^{1/2} \Gamma \, \Sigma^{1/2} x}{2} \right)
= \exp\left( - \frac{\|x\|^2}{2} \right) \left( 1 + \frac{x^T S \, x}{2} + R_{\exp} \right)

where the remainder is given by the convergent series

R_{\exp}(x) = \frac{1}{2} x^T R_\Gamma \, x + \sum_{k=2}^{+\infty} \frac{\left( x^T (S + R_\Gamma) \, x \right)^k}{2^k \, k!} \;.

Substituting in the probability, we find that:

\Pr\{\chi^2 \leq \alpha^2\} = \frac{1 + O(\sigma^3)}{(2\pi)^{n/2}} \int_{\|x\| \leq \alpha} \exp\left( - \frac{\|x\|^2}{2} \right) \left( 1 + R_{\exp}(x) \right) dx + O(\sigma^3)

Integral of the Remainder R_exp S = (1/3) Σ^{1/2} Ric Σ^{1/2} is a term of the order O(σ²) and the
remainder R_Γ behaves as a O(σ³). Thus, we can find positive constants C₁ and C₂ such that, for
σ sufficiently small, we have: xᵀ R_Γ x ≤ C₁ σ³ ‖x‖² and xᵀ (S + R_Γ) x ≤ C₂ σ² ‖x‖². This means
that we can bound the integral of the remainder by:

\int_{\|x\| < \alpha} \exp\left( - \frac{\|x\|^2}{2} \right) R_{\exp}(x) \, dx
\leq C_1 \sigma^3 I_1(+\infty) + \sum_{k=2}^{+\infty} \frac{C_2^k \sigma^{2k}}{2^k \, k!} I_k(+\infty)
\leq C_1' \sigma^3 + \frac{(2\pi)^{n/2}}{\Gamma(\frac{n}{2})} \sum_{k=2}^{+\infty} a_k

where a_k = C₂^k σ^{2k} Γ(k + n/2)/k!. To show that the series converges, let us investigate the ratio of
successive terms:

\frac{a_{k+1}}{a_k} = \frac{C_2 \sigma^2}{k + 1} \, \frac{\Gamma(k + 1 + n/2)}{\Gamma(k + n/2)} = C_2 \sigma^2 \, \frac{k + n/2}{k + 1}

The limit for k → +∞ is obviously C₂ σ², which can be made smaller than 1 for σ sufficiently small.
Thus, by d'Alembert's ratio test, the series converges. Moreover, the lowest order term of the series is
a σ⁴ (for k = 2), which shows that the integral of the remainder is finally dominated by the first
term, a O(σ³).

Conclusion Finally, the desired probability is

\Pr\{\chi^2 \leq \alpha^2\} = (2\pi)^{-\frac{n}{2}} \int_{\|x\| \leq \alpha} \exp\left( - \frac{\|x\|^2}{2} \right) dx + O(\sigma^3)
= \frac{1}{2\,\Gamma(\frac{n}{2})} \int_0^{\alpha^2} \left( \frac{t}{2} \right)^{\frac{n}{2}-1} \exp(-t/2) \, dt + O(\sigma^3)

Thus, the probability density function of a χ2 is:

p_{\chi^2}(u) = \frac{1}{2\,\Gamma(\frac{n}{2})} \left( \frac{u}{2} \right)^{\frac{n}{2}-1} \exp\left( - \frac{u}{2} \right) + O(\sigma^3)

C.2 Manifold with a cut locus at the mean point


If there is a cut locus, we have to replace O(σ^k) by O(σ^k) + ε(σ/r) and we should only integrate
on B_Σ(α) ∩ D. After the change of coordinates, we should integrate on B(α) ∩ D′. Thus, we have:

\Pr\{\chi^2 \leq \alpha^2\} = (2\pi)^{-\frac{n}{2}} \int_{B(\alpha) \cap D'} \exp\left( - \frac{\|x\|^2}{2} \right) dx + O(\sigma^3) + \varepsilon\!\left(\frac{\sigma}{r}\right)
As we have B(√γ_m r) ⊂ D′ ⊂ ℝⁿ, we can enclose the integration domain as: B(min(√γ_m r, α)) ⊂
B(α) ∩ D′ ⊂ B(α). For α ≤ √γ_m r, there is no problem, but for α > √γ_m r, we have:

\Pr\{\chi^2 \leq \alpha^2\} \geq (2\pi)^{-\frac{n}{2}} \int_{B(\sqrt{\gamma_m}\, r)} \exp\left( - \frac{\|x\|^2}{2} \right) dx + O(\sigma^3) + \varepsilon\!\left(\frac{\sigma}{r}\right)

and we have already seen that this integral is 1 + ε(σ/r). As α > √γ_m r, the same integral is itself
of the same order, and we obtain in all cases the same result as with no cut locus, where O(σ³) is
replaced by O(σ³) + ε(σ/r).

References
[1] Xavier Pennec. L’incertitude dans les problèmes de reconnaissance et de recalage – Appli-
cations en imagerie médicale et biologie moléculaire. Thèse de sciences (PhD thesis), Ecole
Polytechnique, Palaiseau (France), December 1996.

[2] X. Pennec and N. Ayache. Uniform distribution, distance and expectation problems for ge-
ometric features processing. Journal of Mathematical Imaging and Vision, 9(1):49–67, July
1998.

[3] X. Pennec, N. Ayache, and J.-P. Thirion. Landmark-based registration using features identified
through differential geometry. In I. Bankman, editor, Handbook of Medical Imaging, chapter 31,
pages 499–513. Academic Press, September 2000.

[4] X. Pennec, C.R.G. Guttmann, and J.-P. Thirion. Feature-based registration of medical images:
Estimation and validation of the pose accuracy. In Proc. of First Int. Conf. on Medical Image
Computing and Computer-Assisted Intervention (MICCAI’98), volume 1496 of LNCS, pages
1107–1114, Cambridge, USA, October 1998. Springer Verlag.

[5] S. Granger, X. Pennec, and A. Roche. Rigid point-surface registration using an EM variant of
ICP for computer guided oral implantology. In W.J. Niessen and M.A. Viergever, editors, 4th
Int. Conf. on Medical Image Computing and Computer-Assisted Intervention (MICCAI’01),
volume 2208 of LNCS, pages 752–761, Utrecht, The Netherlands, October 2001.

[6] S. Granger and X. Pennec. Statistiques exactes et approchées sur les normales aléatoires.
Research report RR-4533, INRIA, 2002.

[7] P.T. Fletcher, S. Joshi, C. Lu, and S Pizer. Gaussian distributions on Lie groups and their
application to statistical shape analysis. In Poc of Information Processing in Medical Imaging
(IPMI’2003), pages 450–462, 2003.

[8] P.T. Fletcher and S.C. Joshi. Principal geodesic analysis on symmetric spaces: Statistics of
diffusion tensors. In Proc. of CVAMIA and MMBIA Workshops, Prague, Czech Republic, May
15, 2004, LNCS 3117, pages 87–98. Springer, 2004.

[9] Ch. Lenglet, M. Rousson, R. Deriche, and O. Faugeras. Statistics on multivariate normal
distributions: A geometric approach and its application to diffusion tensor MRI. Research
Report 5242, INRIA, 2004.

[10] P. Batchelor, M. Moakher, D. Atkinson, F. Calamante, and A. Connelly. A rigorous framework
for diffusion tensor calculus. Mag. Res. in Med., 53:221–225, 2005.

[11] Xavier Pennec, Pierre Fillard, and Nicholas Ayache. A Riemannian framework for tensor
computing. International Journal of Computer Vision, 65(1), October 2005.

[12] Pierre Fillard, Vincent Arsigny, Xavier Pennec, Paul Thompson, and Nicholas Ayache. Ex-
trapolation of sparse tensor fields: Application to the modeling of brain variability. In Gary
Christensen and Milan Sonka, editors, Proc. of Information Processing in Medical Imaging
2005 (IPMI’05), volume 3565 of LNCS, pages 27–38, Glenwood springs, Colorado, USA, July
2005. Springer.

[13] X. Pennec and J.-P. Thirion. A framework for uncertainty and validation of 3D registra-
tion methods based on points and frames. Int. Journal of Computer Vision, 25(3):203–229,
December 1997.

[14] Shun-ichi Amari. Differential-geometric methods in Statistics, volume 28 of Lecture Notes in
Statistics. Springer, 2nd corr. print edition, 1990.

[15] J.M. Oller and J.M. Corcuera. Intrinsic analysis of statistical estimation. Annals of Statistics,
23(5):1562–1581, 1995.

[16] C. Bingham. An antipodally symmetric distribution on the sphere. The Annals of Statistics,
2(6):1201–1225, 1974.

[17] P.E. Jupp and K.V. Mardia. A unified view of the theory of directional statistics, 1975-1988.
Int. Statistical Review, 57(3):261–294, 1989.

[18] J.T. Kent. The art of Statistical Science, chapter 10 : New Directions in Shape Analysis, pages
115–127. John Wiley & Sons, 1992. K.V. Mardia, ed.

[19] K.V. Mardia. Directional statistics and shape analysis. Journal of Applied Statistics, 26:949–957, 1999.

[20] D.G. Kendall. A survey of the statistical theory of shape (with discussion). Statist. Sci.,
4:87–120, 1989.

[21] I.L. Dryden and K.V. Mardia. Theoretical and distributional aspects of shape analysis. In Prob-
ability Measures on Groups, X (Oberwolfach, 1990), pages 95–116, New York, 1991. Plenum.

[22] H. Le and D.G. Kendall. The Riemannian structure of euclidean shape space: a novel envi-
ronment for statistics. Ann. Statist., 21:1225–1271, 1993.

[23] C.G. Small. The Statistical Theory of Shapes. Springer series in statistics. Springer, 1996.

[24] U. Grenander. Probabilities on Algebraic Structures. Whiley, 1963.

[25] H. Karcher. Riemannian center of mass and mollifier smoothing. Comm. Pure Appl. Math,
30:509–541, 1977.

[26] W.S. Kendall. Convexity and the hemisphere. Journ. London Math. Soc., 43(2):567–576, 1991.

[27] M. Emery and G. Mokobodzki. Sur le barycentre d’une probabilité dans une variété. In M. Yor
J. Azema, P.A. Meyer, editor, Séminaire de probabilités XXV, volume 1485 of Lect. Notes in
Math., pages 220–233. Springer-Verlag, 1991.

[28] M. Arnaudon. Barycentres convexes et approximations des martingales continues dans les
variétés. In M. Yor J. Azema, P.A. Meyer, editor, Séminaire de probabilités XXIX, volume
1613 of Lect. Notes in Math., pages 70–85. Springer-Verlag, 1995.

[29] J. Picard. Barycentres et martingales sur une variété. Annales de l’institut Poincaré - Proba-
bilités et Statistiques, 30(4):647–702, 1994.

[30] R.W.R. Darling. Martingales on non-compact manifolds: maximal inequalities and prescribed
limits. Ann. Inst. H. Poincarré Proba. Statistics, 32(4):431–454, 1996.

[31] U. Grenander, M.I. Miller, and A. Srivastava. Hilbert-schmidt lower bounds for estimators on
matrix Lie groups for atr. IEEE Trans. on PAMI, 20(8):790–802, 1998.

[32] X. Pennec. Computing the mean of geometric features - application to the mean rotation.
Research Report RR-3371, INRIA, March 1998.

[33] C. Gramkow. On averaging rotations. Int. Jour. Computer Vision, 42(1-2):7–16, April/May
2001.

[34] M. Moakher. Means and averaging in the group of rotations. SIAM J. of Matrix Anal. Appl.,
24(1):1–16, 2002.

[35] A. Edelman, T. Arias, and S.T. Smith. The geometry of algorithms with orthogonality con-
straints. SIAM Journal of Matrix Analysis and Applications, 20(2):303–353, 1998.

[36] H. Hendricks. A Cramer-Rao type lower bound for estimators with values in a manifold.
Journal of Multivariate Analysis, 38:245–261, 1991.

[37] R. Bhattacharya and V. Patrangenaru. Nonparametric estimation of location and dispersion
on Riemannian manifolds. Journal of Statistical Planning and Inference, 108:23–36, 2002.

[38] R. Bhattacharya and V. Patrangenaru. Large sample theory of intrinsic and extrinsic sample
means on manifolds, I. Annals of Statistics, 31(1):1–29, 2003.

[39] M. Spivak. Differential Geometry, volume 1. Publish or Perish, Inc., 2nd edition, 1979.

[40] W. Klingenberg. Riemannian Geometry. Walter de Gruyter, Berlin, New York, 1982.

[41] M. do Carmo. Riemannian Geometry. Mathematics. Birkhäuser, Boston, Basel, Berlin, 1992.

[42] S. Gallot, D. Hulin, and J. Lafontaine. Riemannian Geometry. Springer Verlag, 2nd edition
edition, 1993.

[43] H. Poincaré. Calcul des probabilités. 2nd edition, Paris, 1912.

[44] M.G. Kendall and P.A.P. Moran. Geometrical probability. Number 10 in Griffin’s statistical
monographs and courses. Charles Griffin & Co. Ltd., 1963.

[45] Xavier Pennec. Probabilities and Statistics on Riemannian Manifolds: A Geometric approach.
Research Report 5093, INRIA, January 2004.

[46] M. Fréchet. L’intégrale abstraite d’une fonction abstraite d’une variable abstraite et son ap-
plication à la moyenne d’un élément aléatoire de nature quelconque. Revue Scientifique, pages
483–512, 1944.

[47] M. Fréchet. Les éléments aléatoires de nature quelconque dans un espace distancié. Ann. Inst.
H. Poincaré, 10:215–310, 1948.

[48] W.S. Kendall. Probability, convexity, and harmonic maps with small image I: uniqueness and
fine existence. Proc. London Math. Soc., 61(2):371–406, 1990.

[49] S. Doss. Sur la moyenne d’un élément aléatoire dans un espace distancié. Bull. Sc. Math.,
73:48–72, 1949.

[50] W. Herer. Espérance mathématique au sens de Doss d’une variable aléatoire à valeur dans un
espace métrique. C. R. Acad. Sc. Paris, Série I, t.302(3):131–134, 1986.

[51] W. Herer. Espérance mathématique d’une variable aléatoire à valeur dans un espace métrique
à courbure négative. C. R. Acad. Sc. Paris, Série I, t.306:681–684, 1988.

[52] M. Arnaudon. Espérances conditionnelles et C-martingales dans les variétés. In M. Yor
J. Azema, P.A. Meyer, editor, Séminaire de probabilités XXVIII, volume 1583 of Lect. Notes
in Math., pages 300–311. Springer-Verlag, 1994.

[53] H. Maillot. Différentielle de la variance et centrage de la plaque de coupure sur une variété
riemannienne compacte. Communication personnelle, 1997.

[54] W.S. Kendall. The propeller: a counterexample to a conjectured criterion for the existence of
certain harmonic functions. Journal of the London Mathematical Society, 46:364–374, 1992.

[55] Ernst Hairer, Ch. Lubich, and Gerhard Wanner. Geometric numerical integration : struc-
ture preserving algorithm for ordinary differential equations, volume 31 of Springer series in
computational mathematics. Springer, 2002.

[56] J.-P. Dedieu, G. Malajovich, and P. Priouret. Newton method on Riemannian manifolds:
Covariant alpha-theory. IMA Journal of Numerical Analysis, 23:395–419, 2003.

[57] P. Huber. Robust Statistics. John Wiley, New York, 1981.

[58] P.J. Rousseeuw and A.M. Leroy. Robust Regression and Outliers Detection. Wiley series in
prob. and math. stat. J. Wiley and Sons, 1987.

[59] M. Emery. Stochastic Calculus in Manifolds. Springer, Berlin, 1989.

[60] Alexander Grigor’yan. Heat kernels on weighted manifolds and applications. Cont. Math,
2005. To appear. http://www.ma.ic.ac.uk/ grigor/pubs.htm.

[61] K.V. Mardia and P.E. Jupp. Directional statistics. Whiley, Chichester, 2000.

[62] A.M. Kagan, Y.V. Linnik, and C.R. Rao. Characterization problems in Mathematical Statistics.
Whiley-Interscience, New-York, 1973.

[63] I. Chavel. Riemannian geometry - A modern introduction, volume 108 of Cambridge tracts in
mathematics. Cambridge university press, 1993.

[64] W.H. Press, B.P. Flannery, S.A Teukolsky, and W.T. Vetterling. Numerical Recipices in C.
Cambridge Univ. Press, 1991.

[65] X. Pennec. Toward a generic framework for recognition based on uncertain geometric features.
Videre: Journal of Computer Vision Research, 1(2):58–87, 1998.

[66] A. Roche, X. Pennec, G. Malandain, and N. Ayache. Rigid registration of 3D ultrasound with
mr images: a new approach combining intensity and gradient information. IEEE Transactions
on Medical Imaging, 20(10):1038–1049, October 2001.

[67] S. Nicolau, X. Pennec, L. Soler, and N. Ayache. Evaluation of a new 3D/2D registration crite-
rion for liver radio-frequencies guided by augmented reality. In N. Ayache and H. Delingette, ed-
itors, International Symposium on Surgery Simulation and Soft Tissue Modeling (IS4TM’03),
volume 2673 of Lecture Notes in Computer Science, pages 270–283, Juan-les-Pins, France,
2003. INRIA Sophia Antipolis, Springer-Verlag.

[68] V. Rasouli. Application of Riemannian multivariate statistics to the analysis of rock fracture
surface roughness. PhD thesis, University of London, 2002.

[69] X. Pennec. Probabilities and statistics on Riemannian manifolds: Basic tools for geometric
measurements. In A.E. Cetin, L. Akarun, A. Ertuzun, M.N. Gurcan, and Y. Yardimci, editors,
Proc. of Nonlinear Signal and Image Processing (NSIP’99), volume 1, pages 194–198, June
20-23, Antalya, Turkey, 1999. IEEE-EURASIP.
