Symmetric Positive-Definite Matrices From Geometry
Summary. In many engineering applications that use tensor analysis, such as tensor imaging, the underlying tensors have the characteristic of being positive definite. It might therefore be more appropriate to use techniques specially adapted to such tensors. We will describe the geometry and calculus on the Riemannian symmetric space of positive-definite tensors. First, we will explain why the geometry, constructed by Élie Cartan, is a natural geometry on that space. Then, we will use this framework to present formulas for means and interpolations specific to positive-definite tensors.
17.1 Introduction
Let M(n) denote the space of n × n real matrices, equipped with the Frobenius norm ‖·‖_F. The associated Euclidean distance is

d_F(A, B) = ‖A − B‖_F .    (17.1)
Let GL(n) be the general linear group of all nonsingular matrices in M(n).
The exponential of a matrix A ∈ M(n) is given as usual by the power series exp A = ∑_{k=0}^∞ A^k/k!, which converges for all A ∈ M(n). The exponential is a differentiable map from M(n) into GL(n).
The vector space of symmetric matrices in M(n) is denoted by S(n). For P ∈ S(n) we say that P is positive semidefinite if x^T P x ≥ 0 for all x ∈ R^n. If P is positive semidefinite and invertible we say that P is symmetric positive definite. The subset of S(n) consisting of all positive-semidefinite matrices is a convex cone; its interior, consisting of all positive-definite matrices, is denoted by P(n). While in most applications n = 3 (or n = 2), the different notions dealt with in this work are introduced for arbitrary n > 0. The graphical illustrations presented here are for the interesting case n = 3.
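Membership in P(n) can be checked numerically without computing a full spectral decomposition: a Cholesky factorization succeeds exactly when the matrix is symmetric positive definite. A minimal sketch in Python (NumPy assumed; the function name is ours):

```python
import numpy as np

def is_spd(P, tol=1e-10):
    """Test membership in P(n): symmetry plus positive definiteness."""
    P = np.asarray(P, dtype=float)
    if not np.allclose(P, P.T, atol=tol):
        return False
    try:
        # Cholesky succeeds exactly for (numerically) positive-definite matrices
        np.linalg.cholesky(P)
        return True
    except np.linalg.LinAlgError:
        return False
```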
The set P(n) is a manifold whose tangent space at any of its points P is the space T_P P(n) = {P} × S(n). The infinitesimal arclength

ds = ‖P^{-1/2} dP P^{-1/2}‖_F    (17.2)

defines a Riemannian metric on P(n). The general linear group GL(n) acts
transitively on the manifold P(n) by congruent transformations defined for
S ∈ GL(n) by [S]P = S^T P S. Using properties of the trace one can easily verify that for any curve P(t) in P(n) the arclength (17.2) is invariant under this action. The geodesic distance associated with this Riemannian metric is

d_R(P, Q) = ‖Log(P^{-1}Q)‖_F = [ ∑_{i=1}^n ln² λ_i(P^{-1}Q) ]^{1/2} ,    (17.3)

where λ_i(P^{-1}Q), 1 ≤ i ≤ n, are the eigenvalues of the matrix P^{-1}Q. Because P^{-1}Q is similar to the symmetric positive-definite matrix P^{-1/2}QP^{-1/2}, the eigenvalues λ_i(P^{-1}Q) are all positive and hence (17.3) is well defined for all P and Q in P(n). The unique geodesic joining P and Q is the curve

P(t) = P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2} , 0 ≤ t ≤ 1 .    (17.4)
It follows from (17.3) that dR (P−1 , Q−1 ) = dR (P, Q). Hence, the inversion
P → P−1 is an involutive isometry on P(n) for this metric, and therefore,
P(n) becomes a Riemannian symmetric space. It is in fact a typical example
of a symmetric space of non-compact type as classified by E. Cartan [13]. It
is also an example of a Riemannian manifold of nonpositive curvature [6].
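For concreteness, the distance (17.3) can be computed from the generalized eigenvalues of the pair (Q, P), which coincide with the eigenvalues of P^{-1}Q. A sketch (Python with NumPy/SciPy; the function name is ours):

```python
import numpy as np
from scipy.linalg import eigh

def riemannian_distance(P, Q):
    """Geodesic distance (17.3): ||Log(P^{-1} Q)||_F, computed via the
    generalized symmetric-definite eigenproblem Q v = lambda P v."""
    lam = eigh(Q, P, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

The invariances noted above, d_R(P, Q) = d_R(Q, P) and d_R(P^{-1}, Q^{-1}) = d_R(P, Q), provide convenient numerical checks.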
The Kullback-Leibler divergence between two multivariate normal densities with the same mean and whose covariance matrices are P and Q gives rise to the Kullback-Leibler divergence for the two SPD matrices P and Q,

KL(P, Q) = ∑_{i=1}^n (λ_i − log λ_i − 1) ,    (17.7)

where λ_i, 1 ≤ i ≤ n, are the eigenvalues of Q^{-1}P.
288 M. Moakher and P.G. Batchelor
From this expression, as x − log x − 1 ≥ 0 for all x > 0 with equality holding
only when x = 1, it becomes clear that KL(·, ·) defines a divergence on the
space of SPD matrices.
We emphasize here the fact that the Kullback-Leibler divergence does not define a distance on the space of positive-definite matrices, as it is neither symmetric with respect to its two arguments nor does it satisfy the triangle inequality. Its symmetrized form KL_s(P, Q) = (1/2)(KL(P, Q) + KL(Q, P)) can be expressed as

KL_s(P, Q) = (1/2) tr(Q^{-1}P + P^{-1}Q − 2I) .    (17.8)
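Both divergences are cheap to evaluate. The sketch below (Python; the function names are ours, and the convention that the λ_i in (17.7) are the eigenvalues of Q^{-1}P is our reading of the text) also checks consistency between (17.7) and (17.8):

```python
import numpy as np

def kl(P, Q):
    """Divergence (17.7): sum of (lam - log lam - 1) over the eigenvalues
    lam of Q^{-1} P (eigenvalue convention assumed here)."""
    lam = np.linalg.eigvals(np.linalg.solve(Q, P)).real
    return float(np.sum(lam - np.log(lam) - 1.0))

def kl_s(P, Q):
    """Symmetrized divergence (17.8): 0.5 tr(Q^{-1} P + P^{-1} Q - 2I)."""
    n = P.shape[0]
    return 0.5 * float(np.trace(np.linalg.solve(Q, P))
                       + np.trace(np.linalg.solve(P, Q)) - 2.0 * n)
```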
Fig. 17.1. ‘Spheres’: Isosurfaces of the distance measure between a SPD tensor in
P(3) with eigenvalues (λ1 , λ2 , λ3 ) and the identity tensor; (a) Euclidean distance,
(b) geodesic distance, and (c) square root of the Kullback-Leibler symmetrized
divergence
17 Symmetric Positive-Definite Matrices 289
An anisotropy index of an SPD tensor P can be defined as the distance from P to the closest isotropic tensor. For the Riemannian distance (17.3) this gives the geodesic anisotropy

A_R(P) = [ ((n−1)/n) ∑_{i=1}^n ln² λ_i − (2/n) ∑_{1≤i<j≤n} ln λ_i ln λ_j ]^{1/2} ,

and for the symmetrized Kullback-Leibler divergence it gives the Kullback-Leibler anisotropy

A_KL(P) = [ 2 ( ∑_{i=1}^n λ_i · ∑_{i=1}^n 1/λ_i )^{1/2} − 2n ]^{1/2} ,

where λ_1, . . . , λ_n are the eigenvalues of P.
Fig. 17.2. Anisotropy indices: Isosurfaces of the anisotropy index of SPD tensors in
P(3) represented in the space (λ1 , λ2 , λ3 ) of eigenvalues; (a) Fractional anisotropy,
(b) geodesic anisotropy, and (c) Kullback-Leibler anisotropy
Note that A_R(·) has the same functional form in terms of the eigenvalues as A_F(·) but on a logarithmic scale. We remark that A_R(·) and A_KL(·) are invariant under matrix inversion while A_F(·) is not.
The range of all of the above anisotropy indices is [0, ∞). In order to compare them with other anisotropy indices used in the literature, whose range is [0, 1), we normalize them in the following way:

FA(P) = A_F(P)/‖P‖_F ,
GA(P) = A_R(P)/(1 + A_R(P)) ,
KLA(P) = A_KL(P)/(1 + A_KL(P)) .
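In eigenvalue form the three normalized indices are straightforward to implement. A sketch (Python; function names are ours, and the A_KL expression is an assumption on our part, so its constant factor should be checked against the original):

```python
import numpy as np

def fa(lam):
    """Fractional anisotropy: FA = A_F / ||P||_F, A_F = ||P - (tr P / n) I||_F."""
    lam = np.asarray(lam, dtype=float)
    return np.linalg.norm(lam - lam.mean()) / np.linalg.norm(lam)

def ga(lam):
    """Geodesic anisotropy GA = A_R / (1 + A_R): same form as A_F, log scale."""
    loglam = np.log(np.asarray(lam, dtype=float))
    a_r = np.linalg.norm(loglam - loglam.mean())
    return a_r / (1.0 + a_r)

def kla(lam):
    """Kullback-Leibler anisotropy KLA = A_KL / (1 + A_KL)."""
    lam = np.asarray(lam, dtype=float)
    n = lam.size
    # clip tiny negative rounding noise before the outer square root
    a_kl = np.sqrt(max(2.0 * np.sqrt(lam.sum() * (1.0 / lam).sum()) - 2.0 * n, 0.0))
    return a_kl / (1.0 + a_kl)
```

All three vanish for isotropic tensors and lie in [0, 1); GA and KLA are unchanged when the eigenvalues are replaced by their reciprocals, reflecting the inversion invariance noted above.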
Fig. 17.3. Anisotropy indices: Contours of the anisotropy index of SPD tensors in
P(3) with eigenvalues (λ1 , λ2 , λ3 ) on an octahedral plane; (a) Fractional anisotropy,
(b) geodesic anisotropy, and (c) Kullback-Leibler anisotropy.
The geodesic anisotropy is very similar to the Kullback-Leibler anisotropy, and both behave quite differently from the fractional anisotropy.
The set of all symmetric tensors in S(3) with a given trace is a plane
in the (λ1 , λ2 , λ3 )-space, called an octahedral plane, which is perpendicular
to the line of isotropic tensors (or in the language of plasticity theory, the
line of hydrostatic pressure). The intersection of this plane with the positive
orthant of SPD tensors is an equilateral triangle. In Fig. 17.3, contours of
the anisotropies on the octahedral plane that passes through the identity
tensor, i.e., the plane of tensors with trace equal to 3, are presented. Once again, the fractional anisotropy does not see the boundary of the set of SPD tensors: for relatively large values of the anisotropy the contour lines cross the limiting
equilateral triangle. The contour lines for the geodesic and Kullback-Leibler
anisotropies, on the other hand, stay inside this triangle and follow it closely
for large values of the anisotropy index.
We recall that the spectral decomposition of a SPD matrix P in P(3) is
P = RDR^T, where D = diag(λ_1, λ_2, λ_3) is the diagonal matrix of eigenvalues of P and R
is an orthogonal matrix whose columns are the eigenvectors of P. The set
of (positive) eigenvalues λ1 , λ2 , λ3 and the orthogonal matrix R provide a
parametrization for the elements of P(3). It has been customary to use this
parametrization to visualize a SPD matrix P as an ellipsoid whose principal
directions are parallel to the eigenvectors of P and axes proportional to the
eigenvalues of P. Thus the methods discussed in this chapter, such as averaging and interpolation, can also be used to perform these operations on ellipsoids.
In Fig. 17.4 we use this representation to visualize the diffusion tensors of a
brain region and we use color for representing the indicated anisotropy index.
17.4 Means
The arithmetic and geometric means, usually used to average a finite set of positive numbers, generalize naturally to a finite set of SPD matrices. This generalization rests on the variational characterization of Definition 17.4.1 below.
Fig. 17.4. Anisotropies: Diffusion ellipsoids of a brain region colored by the FA (a),
the GA (b) and the KLA (c). (See color plates)
Definition 17.4.1 We define a mean relative to a distance (or a divergence) of a finite set of SPD matrices P_1, . . . , P_m to be the SPD matrix P that minimizes

∑_{i=1}^m d(P_i, P)² ,

where d(·, ·) designates the distance (or the square root of the divergence). Using the Euclidean distance (17.1) and the Riemannian distance (17.3) on P(n) in Definition 17.4.1 we obtain the arithmetic and geometric means, respectively.
The arithmetic mean has the explicit expression

A(P_1, . . . , P_m) = (1/m) ∑_{i=1}^m P_i .    (17.10)

The geometric mean G(P_1, . . . , P_m) is the unique SPD solution X of the nonlinear matrix equation

∑_{i=1}^m Log(P_i^{-1} X) = 0 .    (17.11)

For two matrices it has the closed form

G(P_1, P_2) = P_1 (P_1^{-1} P_2)^{1/2} = P_2 (P_2^{-1} P_1)^{1/2} .    (17.12)
We remark that the arithmetic and geometric means are invariant under con-
gruent transformations, and that the geometric mean is invariant under inver-
sion. We refer the reader to [8] for further details on the geometric mean and a
proof of its characterization (17.11). Solution of the nonlinear matrix equation
(17.11) can be obtained numerically by different methods. For instance, one
can use Newton’s method on general Riemannian manifolds which is similar
to the classical Newton’s method on a Euclidean space but with the substi-
tution of straight lines by geodesics and vector addition by the exponential
map, see e.g., [12]. We also point out the fixed point algorithm proposed in
[9] specifically to solve (17.11).
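As an illustration, one simple fixed-point scheme for (17.11) repeatedly moves the current iterate along the average logarithmic direction of the data. This is a sketch of the general idea only, not necessarily the algorithm of [9]:

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def geometric_mean(mats, iters=100, tol=1e-12):
    """Solve sum_i Log(P_i^{-1} X) = 0 (cf. (17.11)) by a Karcher-type
    fixed-point iteration, starting from the arithmetic mean."""
    X = sum(mats) / len(mats)
    for _ in range(iters):
        R = sqrtm(X).real
        Rinv = np.linalg.inv(R)
        # average tangent direction at the current iterate X
        T = sum(logm(Rinv @ P @ Rinv).real for P in mats) / len(mats)
        X = (R @ expm(T) @ R).real
        X = (X + X.T) / 2.0  # remove rounding asymmetry
        if np.linalg.norm(T) < tol:
            break
    return X
```

When the data commute, a single step already lands on the closed-form answer; in general the iteration drives the gradient of the sum of squared geodesic distances to zero.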
In the previous section we showed that arithmetic and geometric means are
defined, and arise naturally, as the unique minimizers of the sum of squared
distances from a given set of SPD matrices. The following Lemma shows that
the arithmetic and harmonic means arise as minimizers of functions defined
by the Kullback-Leibler divergence, and that the geometric mean of those two means arises as the unique minimizer of a function defined by the symmetrized Kullback-Leibler divergence.
Lemma 17.4.3 Let Q_i, i = 1, . . . , m, be m given SPD matrices.

1. The function A(P) := ∑_{i=1}^m KL(Q_i, P) is minimized by A(Q_1, . . . , Q_m), i.e., the arithmetic mean of Q_1, . . . , Q_m.

2. The function B(P) := ∑_{i=1}^m KL(P, Q_i) is minimized by H(Q_1, . . . , Q_m) := ( (1/m) ∑_{i=1}^m Q_i^{-1} )^{-1}, i.e., the harmonic mean of Q_1, . . . , Q_m.

3. The function C(P) := ∑_{i=1}^m KL_s(P, Q_i) is minimized by the geometric mean G(A(Q_1, . . . , Q_m), H(Q_1, . . . , Q_m)) of the arithmetic and harmonic means.

Proof. The gradients with respect to P of the three functions above can be computed explicitly. Equating these gradients to zero and solving for P yield the results.
The following Lemma shows that the mean based on the symmetrized
Kullback-Leibler divergence of two matrices coincides with their geometric
mean.
Lemma 17.4.4 The geometric mean satisfies the identity G(P, Q) = G(A(P, Q), H(P, Q)), where A(P, Q) and H(P, Q) are the arithmetic and harmonic means of P and Q.
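The coincidence of the symmetrized-divergence mean with the geometric mean, G(P, Q) = G(A(P, Q), H(P, Q)), is easy to check numerically using the closed form (17.12). A sketch for one concrete pair (Python; naming ours):

```python
import numpy as np
from scipy.linalg import sqrtm

def gmean2(P, Q):
    """Two-matrix geometric mean via the closed form (17.12)."""
    return (P @ sqrtm(np.linalg.solve(P, Q))).real

P = np.array([[2.0, 1.0], [1.0, 3.0]])
Q = np.array([[1.0, 0.0], [0.0, 4.0]])
A = (P + Q) / 2.0                                               # arithmetic mean
H = np.linalg.inv((np.linalg.inv(P) + np.linalg.inv(Q)) / 2.0)  # harmonic mean
```

In the scalar case this reduces to the familiar identity sqrt((a+b)/2 · 2ab/(a+b)) = sqrt(ab).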
Before we close this section we note that weighted means can also be
defined by analogy with means of positive numbers.
Definition 17.4.5 We define a weighted mean relative to a distance (or a
divergence) of a finite set of SPD matrices P1 , . . . , Pm with (non-negative)
weights w1 , . . . , wm to be the SPD matrix P that minimizes
∑_{i=1}^m w_i d(P_i, P)² ,
where d(·, ·) designates the distance (or the square root of the divergence).
Among the applications of weighted means we can cite their use as a smoothing
filter for denoising measured SPD data, see e.g., Chap. 21 by Welk et al. In
the next section we are going to use weighted means to define interpolation
of scattered SPD data.
17.5 Interpolation
One of the emerging problems in the DT-MRI community is the interpolation of scattered diffusion tensor data; see for example Chap. 18 by Pajevic et al. and Chap. 19 by Weickert and Welk. Given the values of a symmetric positive-definite matrix field at some points of space, what is the natural way to evaluate the tensor field at other points? We present here several methods of multivariate interpolation of SPD tensor data over simplicial domains. These interpolation methods are analogous to multivariate Lagrange
interpolation of scalar or vector data. The main ingredients are the use of
weighted means and barycentric coordinates.
On the line segment [0, 1], the linear interpolation between two SPD matrices P_1 and P_2,

λ → P(λ) = (1 − λ) P_1 + λ P_2 , 0 ≤ λ ≤ 1 ,    (17.13)

is the weighted arithmetic mean of P_1 and P_2 with weights 1 − λ and λ. This property of the linear interpolation on the line segment [0, 1], which also
holds true for simplices of higher dimension, can be generalized by replacing
the weighted arithmetic mean by other weighted means. For example, if we
use the weighted geometric mean we obtain the geodesic interpolation given
explicitly by
λ → G(λ) = P_1 (P_1^{-1} P_2)^λ = P_2 (P_2^{-1} P_1)^{1−λ} , 0 ≤ λ ≤ 1 .    (17.14)
The geodesic interpolation naturally takes into account the positive-definite
character of the involved matrices. Furthermore, the matrix G(λ) is always in
P(n) even if λ falls outside the interval [0, 1] (extrapolation). If the matrices P1
and P2 commute then G(λ) = P_1^{1−λ} P_2^λ. Figure 17.5 shows diffusion ellipsoids
for linear interpolation based on (17.13) and geodesic interpolation based on
(17.14) between two SPD tensors.
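Both interpolation formulas are direct to implement; for (17.14) a fractional matrix power does the work. A sketch (Python; function names ours):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def linear_interp(P1, P2, lam):
    """Linear interpolation (17.13): the weighted arithmetic mean."""
    return (1.0 - lam) * P1 + lam * P2

def geodesic_interp(P1, P2, lam):
    """Geodesic interpolation (17.14): G(lam) = P1 (P1^{-1} P2)^lam."""
    return (P1 @ fractional_matrix_power(np.linalg.solve(P1, P2), lam)).real
```

Unlike the linear form, G(λ) remains in P(n) for λ outside [0, 1], which is what makes geodesic extrapolation safe.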
Given d + 1 SPD matrices P1 , . . . , Pd+1 that are the values of a SPD matrix
field at the d + 1 vertices of a d-dimensional simplex, the linear Lagrange
interpolation at a point with barycentric coordinates (λ1 , . . . , λd+1 ) is given
by
d+1
(λ1 , . . . , λd+1 ) → P(λ1 , . . . , λd+1 ) = λi P i , (17.15)
i=1
where the λi ’s satisfy 0 ≤ λi ≤ 1, for 1 ≤ i ≤ d + 1 and λ1 + · · · + λd+1 = 1.
Once again, P(λ1 , . . . , λd+1 ) can be seen as the weighted arithmetic mean
of the SPD matrices P1 , . . . , Pd+1 . Similar to the univariate case, we can also
define the geodesic interpolation of P1 , . . . , Pd+1 at a point with barycentric
coordinates (λ1 , . . . , λd+1 ) as the weighted geometric mean of P1 , . . . , Pd+1
with weights λ1 , . . . , λd+1 . However, unlike the univariate case, for d > 1 this
weighted geometric mean cannot be given explicitly and one has to numerically
solve the nonlinear matrix equation
∑_{i=1}^{d+1} λ_i Log(P_i^{-1} X) = 0 .    (17.16)
In the case where all P_i's commute this equation yields the solution P_1^{λ_1} · · · P_{d+1}^{λ_{d+1}}.
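Equation (17.16) can be solved by the same kind of fixed-point iteration as the unweighted geometric mean. The sketch below (Python; naming ours, and one plausible scheme rather than a prescribed algorithm) iterates on the weighted average log-direction:

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def weighted_geometric_mean(mats, w, iters=100, tol=1e-12):
    """Find X with sum_i w_i Log(P_i^{-1} X) = 0 (cf. (17.16))."""
    w = np.asarray(w, dtype=float)
    X = sum(wi * P for wi, P in zip(w, mats))   # weighted arithmetic start
    for _ in range(iters):
        R = sqrtm(X).real
        Rinv = np.linalg.inv(R)
        # weighted mean tangent direction at X
        T = sum(wi * logm(Rinv @ P @ Rinv).real for wi, P in zip(w, mats))
        X = (R @ expm(T) @ R).real
        X = (X + X.T) / 2.0
        if np.linalg.norm(T) < tol:
            break
    return X
```

For commuting data it reproduces the closed form P_1^{λ_1} · · · P_{d+1}^{λ_{d+1}}; at a vertex of the simplex (one weight equal to 1) it returns the corresponding P_i.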
References
1. Amari, S. (1985) Differential-Geometrical Methods in Statistics. Springer-
Verlag, Berlin Heidelberg.
2. Basser, P. and Pierpaoli, C. (1996) Microstructural and physiological features of tissues elucidated by quantitative-diffusion-tensor MRI. J. Magn. Reson. B 111/3, pp. 209–219.
3. Batchelor, P. G., Moakher, M., Atkinson, D., Calamante, F., Connelly, A. (2004)
A rigorous framework for diffusion tensor analysis using Riemannian geometry.
Magn. Reson. Med., in press.
4. Helgason, S. (1978) Differential Geometry, Lie Groups, and Symmetric Spaces. Academic Press, New York.
5. Kullback, S. L. (1959) Information Theory and Statistics. Wiley, New York.
6. Lang, S. (1999) Fundamentals of Differential Geometry. Springer-Verlag, New
York.