Geometric Structures in Info Geometry
Jun Zhang
1 Introduction
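Under a coordinate transform $x \mapsto \tilde{x}$, the new coefficients $\tilde{\Gamma}$ are related to the old
ones $\Gamma$ via
$$\tilde{\Gamma}^{l}_{mn}(\tilde{x}) = \sum_{k} \left( \sum_{i,j} \frac{\partial x^i}{\partial \tilde{x}^m} \frac{\partial x^j}{\partial \tilde{x}^n}\, \Gamma^{k}_{ij}(x) + \frac{\partial^2 x^k}{\partial \tilde{x}^m \partial \tilde{x}^n} \right) \frac{\partial \tilde{x}^l}{\partial x^k}; \tag{3}$$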
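The transformation law (3) can be illustrated with a minimal sketch, assuming the old coordinates are Cartesian coordinates on the plane (where all coefficients of the flat Euclidean connection vanish) and the new coordinates are polar. The function name `polar_connection` and the index layout `gamma[l][m][n]` are choices made for this illustration only. Since the old coefficients are zero, only the second-derivative term of (3) contributes.

```python
import math

def polar_connection(r, theta):
    """Coefficients of the flat Euclidean connection in polar coordinates.

    Old coordinates x = (r cos(theta), r sin(theta)) are Cartesian, where all
    Gamma^k_ij vanish, so only the inhomogeneous second-derivative term of the
    transformation law survives. Returns gamma[l][m][n] = Gamma~^l_mn.
    """
    c, s = math.cos(theta), math.sin(theta)
    # d2x[m][n][k] = second derivative of x^k w.r.t. new coordinates (r, theta)
    d2x = [
        [[0.0, 0.0], [-s, c]],
        [[-s, c], [-r * c, -r * s]],
    ]
    # dnew[l][k] = derivative of new coordinate l w.r.t. old coordinate x^k
    dnew = [[c, s], [-s / r, c / r]]
    return [
        [[sum(d2x[m][n][k] * dnew[l][k] for k in range(2)) for n in range(2)]
         for m in range(2)]
        for l in range(2)
    ]

gamma = polar_connection(2.0, 0.7)
print(gamma[0][1][1])  # Gamma^r_{theta theta}, analytically -r
print(gamma[1][0][1])  # Gamma^theta_{r theta}, analytically 1/r
```

The nonzero values show that the coefficients of a connection are coordinate-dependent even when the connection itself is flat.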
A curve whose tangent vectors are parallel along the curve is said to be “auto-
parallel”.
As a primitive on a manifold, affine connections can be characterized in terms
of their (i) torsion and (ii) curvature. The torsion $T$ of a connection $\Gamma$, which
is itself a tensor, is given by the anti-symmetric part of the connection,
$T(\partial_i, \partial_j) = \nabla_{\partial_i}\partial_j - \nabla_{\partial_j}\partial_i = \sum_k T^k_{ij}\,\partial_k$, where $T^k_{ij}$ is its local representation, given as
$$T^k_{ij}(x) = \Gamma^k_{ij}(x) - \Gamma^k_{ji}(x).$$
By definition, $R^l_{kij}$ is anti-symmetric when $i \leftrightarrow j$.

A connection is said to be flat when $R^l_{kij}(x) \equiv 0$ and $T^k_{ij} \equiv 0$. Note that this
is a tensorial condition, so that the flatness of a connection $\nabla$ is a
coordinate-independent property even though the local expression of the connection (in
terms of $\Gamma$) is coordinate-dependent. For any flat connection, there exists a local
coordinate system under which $\Gamma^k_{ij}(x) \equiv 0$ in a neighborhood; this is the affine
coordinate for the given flat connection.
In the above discussions, metric and connections are treated as separate
structures on a manifold. When both are defined on the same manifold, then it
is convenient to express the affine connection $\Gamma$ in its “covariant” form
$$\Gamma_{ij,k} = g(\nabla_{\partial_i}\partial_j, \partial_k) = \sum_l g_{lk}\,\Gamma^l_{ij}. \tag{4}$$
Though $\Gamma^k_{ij}$ is the more primitive quantity that does not involve the metric, $\Gamma_{ij,k}$
represents the projection of $\Gamma$ onto the manifold spanned by the bases $\partial_k$. The
covariant form of the Riemann curvature is (cf. footnote 2)
$$R_{lkij} = \sum_m g_{lm}\,R^m_{kij}.$$
$$\partial_k\, g(\partial_i, \partial_j) = g(\hat{\nabla}_{\partial_k}\partial_i, \partial_j) + g(\partial_i, \hat{\nabla}_{\partial_k}\partial_j). \tag{5}$$
2
This component-wise notation of the Riemann curvature tensor follows standard
differential geometry textbooks, such as Nomizu and Sasaki (1994). On the other hand,
information geometers, such as Amari and Nagaoka (2000), adopt the notation
$R(\partial_i, \partial_j)\partial_k = \sum_l R^l_{ijk}\,\partial_l$, with $R_{ijkl} = \sum_m R^m_{ijk}\, g_{ml}$.
Such a connection, denoted as $\hat{\nabla}$, is called the Levi-Civita connection. Its
component forms, called Christoffel symbols, are determined by the components of
the metric tensor as (“Christoffel symbols of the second kind”)
$$\hat{\Gamma}^k_{ij} = \sum_l \frac{g^{kl}}{2}\left(\frac{\partial g_{il}}{\partial x^j} + \frac{\partial g_{jl}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^l}\right).$$
The Levi-Civita connection $\hat{\Gamma}$ is compatible with the metric $g$, in the sense that
it treats tangent vectors of the shortest curves on a manifold as being parallel
(or equivalently, it treats geodesics as auto-parallel curves).
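As a minimal numerical sketch, the standard formula $\hat{\Gamma}^k_{ij} = \sum_l \tfrac{1}{2} g^{kl}(\partial_j g_{il} + \partial_i g_{jl} - \partial_l g_{ij})$ for the Christoffel symbols of the second kind can be evaluated by central finite differences. The metric chosen here (the Poincaré upper half-plane, $g = y^{-2}\,\mathrm{diag}(1, 1)$), the step size, and all function names are assumptions made for illustration; they do not come from the chapter.

```python
def metric(p):
    """Poincare half-plane metric g = diag(1/y^2, 1/y^2); an assumed example."""
    y = p[1]
    return [[1.0 / y**2, 0.0], [0.0, 1.0 / y**2]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def metric_derivative(p, l, h=1e-5):
    """Central difference of every g_ij along coordinate l."""
    up = list(p); up[l] += h
    dn = list(p); dn[l] -= h
    gu, gd = metric(up), metric(dn)
    return [[(gu[i][j] - gd[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def christoffel(p):
    """Return G[k][i][j] = Gamma^k_ij of the Levi-Civita connection at p."""
    dg = [metric_derivative(p, l) for l in range(2)]  # dg[l][i][j] = d g_ij / d x^l
    ginv = inverse_2x2(metric(p))
    return [
        [[sum(0.5 * ginv[k][l] * (dg[j][i][l] + dg[i][j][l] - dg[l][i][j])
              for l in range(2))
          for j in range(2)] for i in range(2)]
        for k in range(2)
    ]

G = christoffel([0.3, 2.0])
print(G[0][0][1], G[1][0][0], G[1][1][1])  # analytically -1/y, 1/y, -1/y
```

The printed values can be compared with the closed-form coefficients of the half-plane metric at $y = 2$.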
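It turns out that one can define a kind of “compatibility” relation more
general than that expressed by (5), by introducing the notion of “conjugacy” (denoted
by ∗) between two connections. A connection $\nabla^*$ is said to be conjugate (or dual)
to $\nabla$ with respect to $g$ if
$$\partial_k\, g(\partial_i, \partial_j) = g(\nabla_{\partial_k}\partial_i, \partial_j) + g(\partial_i, \nabla^*_{\partial_k}\partial_j). \tag{6}$$
Clearly, $(\nabla^*)^* = \nabla$. Moreover, $\hat{\nabla}$, which satisfies (5), is special in the sense that
it is self-conjugate: $(\hat{\nabla})^* = \hat{\nabla}$.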
Because metric tensor g provides a one-to-one mapping between points in the
tangent space (i.e., vectors) and points in the cotangent space (i.e., co-vectors),
(6) can also be seen as characterizing how co-vector fields are to be
parallel-transported in order to preserve their dual pairing $\langle\cdot,\cdot\rangle$ with vector fields.
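Writing out (6):
$$\frac{\partial g_{ij}}{\partial x^k} = \Gamma_{ki,j} + \Gamma^*_{kj,i}, \tag{7}$$
where, analogous to (2) and (4),
$$\nabla^*_{\partial_i}\partial_j = \sum_l \Gamma^{*l}_{ij}\,\partial_l$$
so that
$$\Gamma^*_{kj,i} = g(\nabla^*_{\partial_j}\partial_k, \partial_i) = \sum_l g_{il}\,\Gamma^{*l}_{kj}.$$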
where $\delta^k_i$ is the Kronecker delta. When two connections are projectively
equivalent, their corresponding auto-parallel curves have identical shape (i.e.,
considered as unparameterized curves); these so-called “pre-geodesics” differ only by a
change of parameterization $\tau$.
Two torsion-free connections $\Gamma$ and $\Gamma'$ are said to be dual-projectively
equivalent if there exists a function $\tau$ such that:
$$\Gamma'_{ij,k} = \Gamma_{ij,k} - g_{ij}\,(\partial_k \tau).$$
When φ = const (or ψ = const), then the corresponding connections are projec-
tively (dual-projectively, resp) equivalent.
or
$$\sum_l \Gamma^l_{il}(x) = \frac{\partial \log \Omega(x)}{\partial x^i}. \tag{10}$$
One immediately sees that the existence of a function $\Omega$ satisfying (10) is
equivalent to the right side of (12) being identically zero.
Making use of (10), it is easy to show that the parallel volume form of a
Levi-Civita connection $\hat{\Gamma}$ is given by
$$\hat{\Omega}(x) = \sqrt{\det[g_{ij}(x)]}.$$
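As a worked check, offered as an illustration rather than as part of the original argument, take the Poincaré upper half-plane with coordinates $(x^1, x^2) = (x, y)$ and $g_{ij} = y^{-2}\delta_{ij}$ (an assumed example). Its nonvanishing Levi-Civita coefficients are the standard ones listed on the first line, and the square root of the determinant of the metric satisfies condition (10) in both coordinate directions:

```latex
\begin{align*}
\hat\Gamma^{x}_{xy} = \hat\Gamma^{x}_{yx} = \hat\Gamma^{y}_{yy} &= -\frac{1}{y},
  \qquad \hat\Gamma^{y}_{xx} = \frac{1}{y},\\
\hat\Omega = \sqrt{\det[g_{ij}]} &= y^{-2},\\
\sum_l \hat\Gamma^{l}_{yl} = \hat\Gamma^{x}_{yx} + \hat\Gamma^{y}_{yy}
  &= -\frac{2}{y} = \frac{\partial \log\hat\Omega}{\partial y},\\
\sum_l \hat\Gamma^{l}_{xl} = \hat\Gamma^{x}_{xx} + \hat\Gamma^{y}_{xy}
  &= 0 = \frac{\partial \log\hat\Omega}{\partial x}.
\end{align*}
```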
Making use of (7), the parallel volume forms Ω, Ω ∗ associated with Γ and Γ ∗
satisfy (apart from a multiplicative constant which must be positive)
$$\frac{\partial T_i}{\partial x^j} = \frac{\partial T_j}{\partial x^i}. \tag{15}$$
This expresses the integrability condition. When Equation (15) is satisfied, there
exists a function $\tau$ such that $T_i = \partial_i \tau$. Furthermore, it can be shown that
$$\tau = -2 \log(\Omega/\hat{\Omega}).$$
$$\Gamma^{(\alpha)k}_{ij} = \frac{1+\alpha}{2}\,\Gamma^{k}_{ij} + \frac{1-\alpha}{2}\,\Gamma^{*k}_{ij}. \tag{16}$$
Obviously, $\Gamma^{(0)} = \hat{\Gamma}$ is the Levi-Civita connection. Using the cubic form, this amounts
to $\nabla^{(\alpha)} g = \alpha C$. The α-parallel volume element is given by:
$$\Omega^{(\alpha)} = e^{-\frac{\alpha}{2}\tau}\,\hat{\Omega}$$
So, $\Gamma$ is flat if and only if $\Gamma^*$ is flat. In this case, the manifold is said to be
“dually flat”. When $\Gamma, \Gamma^*$ are dually flat, $\Gamma^{(\alpha)}$ is called “α-transitively
flat” (Uohashi, 2002). In such a case, $\{M, g, \Gamma^{(\alpha)}, \Gamma^{(-\alpha)}\}$ is called an α-Hessian
structure (Zhang and Matsuzoe, 2009). These connections are all compatible with a metric $g$
that is induced from a strictly convex (potential) function; see the next subsection.
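For an α-Hessian manifold, the Tchebychev form (14) is given by
$$T_i = \frac{\partial \log(\det[g_{kl}])}{\partial x^i}$$
and its derivative (known as the second Koszul form) is
$$\beta_{ij} = \frac{\partial T_i}{\partial x^j} = \frac{\partial^2 \log(\det[g_{kl}])}{\partial x^i \partial x^j}.$$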
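and
$$\Gamma^{rs,t}(u) = \sum_{i,j,k} \frac{\partial x^r}{\partial u_i} \frac{\partial x^s}{\partial u_j} \frac{\partial x^t}{\partial u_k}\, \Gamma_{ij,k}(x) + \frac{\partial^2 x^t}{\partial u_r \partial u_s}. \tag{22}$$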
Similar relations hold between $\Gamma^{*rs}_t(u)$ and $\Gamma^{*k}_{ij}(x)$, and between $\Gamma^{*rs,t}(u)$ and
$\Gamma^*_{ij,k}(x)$.
Analogous to (7), we have the following identity:
$$\frac{\partial^2 x^t}{\partial u_s \partial u_r} = \frac{\partial g^{rt}(u)}{\partial u_s} = \Gamma^{rs,t}(u) + \Gamma^{*ts,r}(u).$$
Therefore, we have
Proposition 2. Under biorthogonal coordinates, a pair of conjugate connections
$\Gamma, \Gamma^*$ satisfy
$$\Gamma^{*ts,r}(u) = -\sum_{i,j,k} g^{ir}(u)\, g^{js}(u)\, g^{kt}(u)\, \Gamma_{ij,k}(x) \tag{23}$$
and
$$\Gamma^{*ts}_{r}(u) = -\sum_{j} g^{js}(u)\, \Gamma^{t}_{jr}(x). \tag{24}$$
Let us now express parallel volume forms Ω(x), Ω(u) under biorthogonal
coordinates x or u. Contracting the indices t with r in (24), and invoking (10),
we obtain
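$$\frac{\partial \log \Omega^*(u)}{\partial u_s} + \sum_j \frac{\partial x^j}{\partial u_s} \frac{\partial \log \Omega(x)}{\partial x^j} = \frac{\partial \log \Omega^*(u)}{\partial u_s} + \frac{\partial \log \Omega(x)}{\partial u_s} = 0.$$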
After integration,
Ω ∗ (u) Ω(x) = const. (25)
From (13) and (25),
Ω(u) Ω ∗ (x) = const. (26)
The relations (25) and (26) indicate that the volume forms of the pair of conju-
gate connections, when expressed in biorthogonal coordinates respectively, are
inversely proportional to each other.
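The $\Gamma^{(\alpha)}$-parallel volume element $\Omega^{(\alpha)}$ can be shown to be given by (in either
$x$ or $u$ coordinates)
$$\Omega^{(\alpha)} = \Omega^{\frac{1+\alpha}{2}}\, (\Omega^*)^{\frac{1-\alpha}{2}}.$$
Clearly,
$$\Omega^{(\alpha)}(x)\,\Omega^{(-\alpha)}(x) = \det[g_{ij}(x)] \;\longleftrightarrow\; \Omega^{(\alpha)}(u)\,\Omega^{(-\alpha)}(u) = \det[g^{ij}(u)].$$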
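$$g_{ij}(x) = \frac{\partial^2 \Phi(x)}{\partial x^i \partial x^j} \;\longleftrightarrow\; g^{ij}(u) = \frac{\partial^2 \tilde{\Phi}(u)}{\partial u_i \partial u_j}.$$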
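A small numerical sketch may help make this correspondence concrete. It assumes, purely for illustration, the potential $\Phi(x) = \log(1 + \sum_i e^{x^i})$, whose conjugate is $\tilde{\Phi}(u) = \sum_i u_i \log u_i + (1 - \sum_i u_i)\log(1 - \sum_i u_i)$ with $u_i = \partial\Phi/\partial x^i$. The two Hessians, written below in closed form, should then be inverse matrices of each other. All function names are hypothetical.

```python
import math

def grad_phi(x):
    """u_i = dPhi/dx^i for the assumed potential Phi(x) = log(1 + sum_i exp(x^i))."""
    z = 1.0 + sum(math.exp(t) for t in x)
    return [math.exp(t) / z for t in x]

def hess_phi(x):
    """g_ij(x) = u_i * delta_ij - u_i * u_j, the Hessian of Phi."""
    u = grad_phi(x)
    n = len(x)
    return [[(u[i] if i == j else 0.0) - u[i] * u[j] for j in range(n)]
            for i in range(n)]

def hess_phi_conj(u):
    """g^ij(u) = delta_ij / u_i + 1 / (1 - sum u), the Hessian of the conjugate."""
    u0 = 1.0 - sum(u)
    n = len(u)
    return [[(1.0 / u[i] if i == j else 0.0) + 1.0 / u0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two square nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

x = [0.5, -1.0, 0.25]
u = grad_phi(x)
product = matmul(hess_phi(x), hess_phi_conj(u))
for row in product:
    print([round(v, 12) for v in row])  # rows of (approximately) the identity
```

The product is evaluated at corresponding points, that is, with $u$ computed from $x$ through the gradient map.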
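It follows from the above Lemma that a necessary and sufficient condition for
a Riemannian manifold to admit biorthogonal coordinates is that its Levi-Civita
connection is given by
$$\hat{\Gamma}_{ij,k}(x) \equiv \frac{1}{2}\left(\frac{\partial g_{ik}}{\partial x^j} + \frac{\partial g_{jk}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k}\right) = \frac{1}{2}\frac{\partial g_{ij}}{\partial x^k}.$$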
In other words, biorthogonal coordinates are affine coordinates for the
dually-flat pair of connections. In fact, we can now define a pair of torsion-free
connections by
$$\gamma_{ij,k}(x) = 0, \qquad \gamma^*_{ij,k}(x) = \frac{\partial g_{ij}}{\partial x^k}$$
and show that they are conjugate with respect to g, that is, they satisfy (6). This
is to say that we select an affine connection γ such that x is its affine coordinate.
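From (22), when $\gamma^*$ is expressed in $u$-coordinates,
$$\gamma^{*rs,t}(u) = \sum_{i,j,k} g^{ir}(u)\, g^{js}(u)\, \frac{\partial x^k}{\partial u_t} \frac{\partial g_{ij}(x)}{\partial x^k} + \frac{\partial g^{ts}(u)}{\partial u_r}$$
$$= \sum_{i,j} g^{ir}(u) \left( -g_{ij}(x)\, \frac{\partial g^{js}(u)}{\partial u_t} \right) + \frac{\partial g^{ts}(u)}{\partial u_r}$$
$$= -\sum_{j} \delta^r_j\, \frac{\partial g^{js}(u)}{\partial u_t} + \frac{\partial g^{ts}(u)}{\partial u_r} = 0.$$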
It is easily verifiable that $\Gamma_{ij,k}$, $\Gamma^*_{ij,k}$ as given above are torsion-free³ and
satisfy the conjugacy condition with respect to the induced metric $g_{ij}$. Hence
$\{M, g, \Gamma, \Gamma^*\}$ as induced is a “statistical manifold” (Lauritzen, 1987).
A natural question is whether/how the statistical structures induced from
different divergence functions are related. The following is known:
$$\omega_x = dx^i \wedge d\xi_i.$$
(Recall that the comma separates the variable in the first slot from that in the
second slot for differentiation.) It is easy to check that, in a neighborhood of
the diagonal $\Delta_M \subset M \times M$, the map $L_D$ is a diffeomorphism, since the Jacobian
matrix of the map
$$\begin{pmatrix} \delta_{ij} & D_{ij} \\ 0 & D_{i,j} \end{pmatrix}$$
is non-degenerate in such a neighborhood of the diagonal $\Delta_M$.

3
Conjugate connections which admit torsion have recently been studied by Calin,
Matsuzoe, and Zhang (2009) and Matsuzoe (2010).
We calculate the pullback of this symplectic form (defined on T ∗ Mx ) to
M × M:
(Here Dij dxi ∧ dxj = 0 since Dij (x, y) = Dji (x, y) always holds.)
Similarly, we consider the canonical symplectic form $\omega_y = dy^i \wedge d\eta_i$ on $T^*M_y$
and define a map $R_D$ from $M \times M \to T^*M_y$, $(x, y) \mapsto (y, \eta)$, given by
or explicitly
$$\frac{\partial^2 D}{\partial x^i \partial y^j} = \frac{\partial^2 D}{\partial x^j \partial y^i}.$$
Note that this condition is always satisfied on $\Delta_M$, by the definition of a
divergence function $D$, which has allowed us to define a Riemannian structure on
$\Delta_M$ (Proposition 6). We now require it to be satisfied on $M \times M$ (or at least on a
neighborhood of $\Delta_M$).
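For instance, and only as an illustrative check under the assumption that $D$ is the Bregman divergence of a smooth, strictly convex $\Phi$ (with $\Phi_i$, $\Phi_{ij}$ denoting its first and second partial derivatives), the mixed second derivatives are symmetric everywhere on $M \times M$, not just on the diagonal:

```latex
\begin{align*}
D(x,y) &= \Phi(x) - \Phi(y) - \sum_i (x^i - y^i)\,\Phi_i(y),\\
\frac{\partial D}{\partial x^i} &= \Phi_i(x) - \Phi_i(y),\\
\frac{\partial^2 D}{\partial x^i \partial y^j} &= -\Phi_{ij}(y) = -\Phi_{ji}(y)
  = \frac{\partial^2 D}{\partial x^j \partial y^i}.
\end{align*}
```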
For divergence functions satisfying (31), we can consider inducing a metric
$G_D$ on $M \times M$; the induced Riemannian (Hermitian) metric $G_D$ is defined by
GD (X, Y ) = ωD (X, JY ).
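$$G_{i'j'} = G_D\!\left(\frac{\partial}{\partial y^i}, \frac{\partial}{\partial y^j}\right) = \omega_D\!\left(\frac{\partial}{\partial y^i}, J\frac{\partial}{\partial y^j}\right) = \omega_D\!\left(\frac{\partial}{\partial y^i}, -\frac{\partial}{\partial x^j}\right) = -D_{j,i},$$
$$G_{ij'} = G_D\!\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial y^j}\right) = \omega_D\!\left(\frac{\partial}{\partial x^i}, J\frac{\partial}{\partial y^j}\right) = \omega_D\!\left(\frac{\partial}{\partial x^i}, -\frac{\partial}{\partial x^j}\right) = 0,$$
$$G_{i'j} = G_D\!\left(\frac{\partial}{\partial y^i}, \frac{\partial}{\partial x^j}\right) = \omega_D\!\left(\frac{\partial}{\partial y^i}, J\frac{\partial}{\partial x^j}\right) = \omega_D\!\left(\frac{\partial}{\partial y^i}, -\frac{\partial}{\partial y^j}\right) = 0.$$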
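If $D$ satisfies
$$D_{ij} + D_{,ij} = \kappa D_{i,j} \tag{32}$$
where $\kappa$ is a constant, then $M \times M$ admits a Kähler potential (and hence is
a Kähler manifold)
$$ds^2 = \frac{\kappa}{2} \frac{\partial^2 \hat{D}}{\partial z^i \partial \bar{z}^j}\, dz^i \otimes d\bar{z}^j.$$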
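$$\Gamma_{ij,k}(x) = 0, \qquad \Gamma^*_{ij,k}(x) = \frac{\partial^3 \Phi(x)}{\partial x^i \partial x^j \partial x^k}$$
are induced from a convex potential function $\Phi$. In the (biorthogonal) $u$-coordinates,
these geometric quantities can be expressed as
$$g^{ij}(u) = \frac{\partial^2 \tilde{\Phi}(u)}{\partial u_i \partial u_j}, \qquad \Gamma^{*ij,k}(u) = 0, \qquad \Gamma^{ij,k}(u) = \frac{\partial^3 \tilde{\Phi}(u)}{\partial u_i \partial u_j \partial u_k},$$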
(Where there is no danger of confusion, the subscript $n$ in $\langle\cdot,\cdot\rangle_n$ is often omitted.)
A basic fact in convex analysis is that the necessary and sufficient condition for
a smooth function $\Phi$ to be strictly convex is
$$B_\Phi(x, y) > 0 \tag{35}$$
for $x \neq y$.
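The positivity condition (35) can be probed numerically. The sketch below assumes the standard form of the Bregman divergence, $B_\Phi(x, y) = \Phi(x) - \Phi(y) - \langle x - y, \partial\Phi(y)\rangle$ (its defining equation is not reproduced in this excerpt), and uses the strictly convex potential $\Phi(x) = \log(1 + \sum_i e^{x^i})$ as an assumed example. The sampling range and seed are arbitrary choices.

```python
import math
import random

def phi(x):
    """Assumed strictly convex potential Phi(x) = log(1 + sum_i exp(x^i))."""
    return math.log(1.0 + sum(math.exp(t) for t in x))

def grad_phi(x):
    """Gradient of the potential above."""
    z = 1.0 + sum(math.exp(t) for t in x)
    return [math.exp(t) / z for t in x]

def bregman(x, y):
    """B_Phi(x, y) = Phi(x) - Phi(y) - <x - y, grad Phi(y)> (assumed standard form)."""
    g = grad_phi(y)
    return phi(x) - phi(y) - sum((a - b) * gi for a, b, gi in zip(x, y, g))

rng = random.Random(0)
samples = [([rng.uniform(-2, 2) for _ in range(3)],
            [rng.uniform(-2, 2) for _ in range(3)]) for _ in range(200)]
smallest = min(bregman(x, y) for x, y in samples)
print(smallest > 0.0)                                 # positivity on sampled pairs
print(bregman([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))      # vanishes when x == y
```

A finite sample cannot prove strict convexity, but a negative value would disprove it.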
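Recall that, when $\Phi$ is convex, its convex conjugate $\tilde{\Phi} : \tilde{V} \subseteq \mathbb{R}^n \to \mathbb{R}$ is
defined through the Legendre transform:
$$\tilde{\Phi}(u) = \langle (\partial\Phi)^{-1}(u), u \rangle - \Phi((\partial\Phi)^{-1}(u)), \tag{36}$$
with $\tilde{\tilde{\Phi}} = \Phi$ and $(\partial\Phi) = (\partial\tilde{\Phi})^{-1}$. The function $\tilde{\Phi}$ is also convex, and through
it (35) precisely expresses the Fenchel inequality
$$\Phi(x) + \tilde{\Phi}(u) - \langle x, u \rangle \geq 0$$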
$$u_i = \frac{\partial \Phi}{\partial x^i} \;\longleftrightarrow\; x^i = \frac{\partial \tilde{\Phi}}{\partial u_i}. \tag{38}$$
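The conjugate correspondence (38) and the Fenchel inequality can be seen in a one-dimensional sketch. It assumes the example $\Phi(x) = e^x$, for which the Legendre transform is $\tilde{\Phi}(u) = u \log u - u$ on $u > 0$; the function names are hypothetical.

```python
import math

def phi(x):            # assumed example potential
    return math.exp(x)

def grad_phi(x):       # u = dPhi/dx
    return math.exp(x)

def phi_conj(u):       # Legendre transform of exp: u log u - u, for u > 0
    return u * math.log(u) - u

def grad_phi_conj(u):  # x = dPhi~/du
    return math.log(u)

def fenchel_gap(x, u):
    """Phi(x) + Phi~(u) - <x, u>, which should be non-negative."""
    return phi(x) + phi_conj(u) - x * u

x = 0.8
u = grad_phi(x)
print(grad_phi_conj(u))          # recovers x up to rounding
print(fenchel_gap(x, u))         # vanishes (up to rounding) at conjugate pairs
print(fenchel_gap(x, 1.5 * u))   # strictly positive otherwise
```

The gap vanishes exactly when $u$ and $x$ are related by the gradient map, which is the sense in which the two gradient maps are inverse to each other.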
With the aid of conjugate variables, we can introduce the “canonical
divergence” $A_\Phi : V \times \tilde{V} \to \mathbb{R}_+$ (and $A_{\tilde{\Phi}} : \tilde{V} \times V \to \mathbb{R}_+$)
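Proposition 9 (Zhang, 2004). The manifold $\{M, g(x), \Gamma^{(\alpha)}(x), \Gamma^{(-\alpha)}(x)\}$⁴
associated with $D^{(\alpha)}_\Phi(x, y)$ is given by
$$g_{ij}(x) = \Phi_{ij} \tag{41}$$
and
$$\Gamma^{(\alpha)}_{ij,k}(x) = \frac{1-\alpha}{2}\,\Phi_{ijk}, \qquad \Gamma^{*(\alpha)}_{ij,k}(x) = \frac{1+\alpha}{2}\,\Phi_{ijk}. \tag{42}$$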
4
The functional argument $x$ (or $u$ below) indicates that the $x$-coordinate system (or
$u$-coordinate system) is being used. Recall from Section 2.5 that under $x$ ($u$, resp.)
local coordinates, $g$ and $\Gamma$, in component form, are expressed by lower (upper, resp.)
indices.
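Here, $\Phi_{ij}$, $\Phi_{ijk}$ denote, respectively, second and third partial derivatives of $\Phi(x)$:
$$\Phi_{ij} = \frac{\partial^2 \Phi(x)}{\partial x^i \partial x^j}, \qquad \Phi_{ijk} = \frac{\partial^3 \Phi(x)}{\partial x^i \partial x^j \partial x^k}.$$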
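The following sketch checks numerically that the α-connections of (42) are conjugate in the sense of (7), that is, $\partial_k g_{ij} = \Gamma^{(\alpha)}_{ki,j} + \Gamma^{(-\alpha)}_{kj,i}$ with $g_{ij} = \Phi_{ij}$. The two-dimensional potential $\Phi(x) = e^{x^1} + e^{x^2} + e^{x^1 + x^2}$ is an assumed example, chosen because its derivatives have simple closed forms; the derivative of the metric is taken by central finite differences, and all function names are hypothetical.

```python
import math

def hessian(x):
    """g_ij = Phi_ij for the assumed potential Phi = e^{x1} + e^{x2} + e^{x1+x2}."""
    m = math.exp(x[0] + x[1])
    return [[(math.exp(x[i]) if i == j else 0.0) + m for j in range(2)]
            for i in range(2)]

def third_derivative(x):
    """Phi_ijk for the same potential (totally symmetric in i, j, k)."""
    m = math.exp(x[0] + x[1])
    return [[[(math.exp(x[i]) if i == j == k else 0.0) + m for k in range(2)]
             for j in range(2)] for i in range(2)]

def alpha_connection(x, alpha):
    """Gamma^(alpha)_{ij,k} = (1 - alpha)/2 * Phi_ijk, as in (42)."""
    T = third_derivative(x)
    return [[[0.5 * (1 - alpha) * T[i][j][k] for k in range(2)]
             for j in range(2)] for i in range(2)]

def metric_derivative(x, k, h=1e-5):
    """Central difference d g_ij / d x^k."""
    up = list(x); up[k] += h
    dn = list(x); dn[k] -= h
    gu, gd = hessian(up), hessian(dn)
    return [[(gu[i][j] - gd[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def conjugacy_residual(x, alpha):
    """Largest violation of d_k g_ij = Gamma^(alpha)_{ki,j} + Gamma^(-alpha)_{kj,i}."""
    G, Gs = alpha_connection(x, alpha), alpha_connection(x, -alpha)
    worst = 0.0
    for k in range(2):
        dg = metric_derivative(x, k)
        for i in range(2):
            for j in range(2):
                worst = max(worst, abs(dg[i][j] - G[k][i][j] - Gs[k][j][i]))
    return worst

print(conjugacy_residual([0.2, -0.4], 0.3))  # expected to be near zero
```

Setting `alpha = 0` in `alpha_connection` gives $\tfrac{1}{2}\Phi_{ijk}$, which is the Levi-Civita connection of a Hessian metric in covariant form.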
Recall that an α-Hessian manifold is equipped with an α-independent metric
and a family of α-transitively flat connections Γ (α) (i.e., Γ (α) satisfying (16) and
Γ (±1) are dually flat). From (42),
$$\Gamma^{*(\alpha)}_{ij,k} = \Gamma^{(-\alpha)}_{ij,k},$$
$$R^{(\alpha)}_{\mu\nu ij}(x) = \frac{1-\alpha^2}{4} \sum_{l,k} \left(\Phi_{il\nu}\,\Phi_{jk\mu} - \Phi_{il\mu}\,\Phi_{jk\nu}\right) \Psi^{lk} = R^{*(\alpha)}_{ij\mu\nu}(x),$$
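$$\sum_{i} \Phi_{ki}\, \frac{d^2 x^i}{ds^2} + \frac{1-\alpha}{2} \sum_{i,j} \Phi_{kij}\, \frac{dx^i}{ds} \frac{dx^j}{ds} = 0 \;\longleftrightarrow\; \frac{d^2}{ds^2}\, \Phi_k\!\left(\frac{1-\alpha}{2}\, x\right) = 0.$$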
So the auto-parallel curves of an α-Hessian manifold all have the form
$$\Phi_k\!\left(\frac{1-\alpha}{2}\, x\right) = a_k\, s^{(\alpha)} + b_k$$
where the scalar $s$ is the arc length and $a_k, b_k$, $k = 1, \ldots, n$, are constant vectors
(determined by a point and the direction along which the auto-parallel curve
flows through). For $\alpha = -1$, the auto-parallel curves are given by
$u_k = \Phi_k(x) = a_k s + b_k$, so the $u_k$ are affine coordinates, as previously noted.
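The $\alpha = -1$ case can be checked with a small numerical integration. The sketch assumes the one-dimensional potential $\Phi(x) = e^x$, for which $\Phi'' = \Phi'''$ and the auto-parallel equation $\Phi''\,\ddot{x} + \tfrac{1-\alpha}{2}\Phi'''\,\dot{x}^2 = 0$ reduces to $\ddot{x} = -\tfrac{1-\alpha}{2}\dot{x}^2$. The integrator (classical fourth-order Runge-Kutta), the initial data, and the function name are choices made for this illustration.

```python
import math

def integrate_autoparallel(x0, v0, alpha, s_end, steps=2000):
    """RK4 for x'' = -(1 - alpha)/2 * x'^2, the 1-D auto-parallel equation
    when Phi(x) = exp(x) (so Phi'' = Phi''' and the ratio cancels)."""
    c = 0.5 * (1.0 - alpha)
    h = s_end / steps
    x, v = x0, v0
    path = [(0.0, x)]
    for n in range(steps):
        # The right-hand side depends on the velocity only.
        k1x, k1v = v, -c * v * v
        k2x, k2v = v + 0.5 * h * k1v, -c * (v + 0.5 * h * k1v) ** 2
        k3x, k3v = v + 0.5 * h * k2v, -c * (v + 0.5 * h * k2v) ** 2
        k4x, k4v = v + h * k3v, -c * (v + h * k3v) ** 2
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        path.append(((n + 1) * h, x))
    return path

path = integrate_autoparallel(x0=0.0, v0=1.0, alpha=-1.0, s_end=2.0)
# For alpha = -1 the dual coordinate u = Phi'(x) = exp(x) should be affine in s:
# here u(0) = 1 and du/ds(0) = exp(x0) * v0 = 1, so u(s) is expected to be 1 + s.
worst = max(abs(math.exp(x) - (1.0 + s)) for s, x in path)
print(worst)
```

The maximal deviation of $u(s)$ from the affine function $1 + s$ measures how closely the integrated curve follows the predicted form.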
Related divergences and geometries. Note that the metric and conjugate
connections in the forms (41) and (42) are induced from (40). Using the
convex conjugate $\tilde{\Phi} : \tilde{V} \to \mathbb{R}$ given by (36), we introduce the following family of
divergence functions $\tilde{D}^{(\alpha)}_{\tilde{\Phi}}(x, y)$ defined by
Straightforward calculation shows that $\tilde{D}^{(\alpha)}_{\tilde{\Phi}}(x, y)$ induces the α-Hessian
structure $\{M, g, \Gamma^{(-\alpha)}, \Gamma^{(\alpha)}\}$, where $\Gamma^{(\mp\alpha)}$ are given by (42); that is, the pair of
α-connections are themselves “conjugate” (in the sense of $\alpha \leftrightarrow -\alpha$) to those
induced by $D^{(\alpha)}_\Phi(x, y)$.
Suppose that, instead of choosing $x = [x^1, \cdots, x^n]$ as the local coordinates for the
manifold $M$, we use its biorthogonal counterpart $u = [u_1, \cdots, u_n]$, related to $x$ via (38),
to index points on $M$. Under this $u$-coordinate system, the divergence function
$D^{(\alpha)}_\Phi$ between the same two points on $M$ becomes
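Explicitly written,
$$\tilde{D}^{(\alpha)}_\Phi(u, v) = \frac{4}{1-\alpha^2} \left( \frac{1-\alpha}{2}\, \Phi((\partial\Phi)^{-1}(u)) + \frac{1+\alpha}{2}\, \Phi((\partial\Phi)^{-1}(v)) - \Phi\!\left( \frac{1-\alpha}{2}\, (\partial\Phi)^{-1}(u) + \frac{1+\alpha}{2}\, (\partial\Phi)^{-1}(v) \right) \right).$$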
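Proposition 11 (Zhang, 2004). The α-Hessian manifold $\{M, g(u), \Gamma^{(\alpha)}(u), \Gamma^{(-\alpha)}(u)\}$
associated with $\tilde{D}^{(\alpha)}_\Phi(u, v)$ is given by
$$g^{ij}(u) = \tilde{\Phi}^{ij}(u), \tag{43}$$
$$\Gamma^{(\alpha)ij,k}(u) = \frac{1+\alpha}{2}\, \tilde{\Phi}^{ijk}, \qquad \Gamma^{*(\alpha)ij,k}(u) = \frac{1-\alpha}{2}\, \tilde{\Phi}^{ijk}. \tag{44}$$
Here, $\tilde{\Phi}^{ij}$, $\tilde{\Phi}^{ijk}$ denote, respectively, second and third partial derivatives of $\tilde{\Phi}(u)$:
$$\tilde{\Phi}^{ij}(u) = \frac{\partial^2 \tilde{\Phi}(u)}{\partial u_i \partial u_j}, \qquad \tilde{\Phi}^{ijk}(u) = \frac{\partial^3 \tilde{\Phi}(u)}{\partial u_i \partial u_j \partial u_k}.$$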
We remark that the same metric (43) and the same α-connections (44) are
induced by $D^{(-\alpha)}_{\tilde{\Phi}}(u, v) \equiv D^{(\alpha)}_{\tilde{\Phi}}(v, u)$; this follows as a simple application of
the Eguchi relation.
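The identity between the divergence with parameter $-\alpha$ and the divergence with parameter $\alpha$ and swapped arguments can be checked numerically. The sketch below assumes the divergence has the same functional form as the explicit expression given earlier, with a generic convex function $f$ in place of $\Phi$, and uses $f(u) = \sum_i (u_i \log u_i - u_i)$ on positive vectors as an assumed example of a conjugate potential. The function names are hypothetical.

```python
import math

def conj_potential(u):
    """Assumed example: sum_i (u_i log u_i - u_i), convex on positive vectors."""
    return sum(t * math.log(t) - t for t in u)

def d_alpha(f, p, q, alpha):
    """4/(1-alpha^2) * [ (1-alpha)/2 f(p) + (1+alpha)/2 f(q)
                         - f((1-alpha)/2 p + (1+alpha)/2 q) ], for |alpha| != 1."""
    a, b = 0.5 * (1.0 - alpha), 0.5 * (1.0 + alpha)
    mix = [a * s + b * t for s, t in zip(p, q)]
    return 4.0 / (1.0 - alpha ** 2) * (a * f(p) + b * f(q) - f(mix))

u, v, alpha = [0.5, 2.0], [1.5, 0.7], 0.4
lhs = d_alpha(conj_potential, u, v, -alpha)
rhs = d_alpha(conj_potential, v, u, alpha)
print(lhs, rhs)  # the two values should agree up to rounding
```

Both values are also positive, as expected from Jensen's inequality for a strictly convex $f$ and $|\alpha| < 1$.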
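An application of (23) gives rise to the following relations:
$$\Gamma^{(\alpha)mn,l}(u) = -\sum_{i,j,k} g^{im}(u)\, g^{jn}(u)\, g^{kl}(u)\, \Gamma^{(-\alpha)}_{ij,k}(x),$$
$$\Gamma^{*(\alpha)mn,l}(u) = -\sum_{i,j,k} g^{im}(u)\, g^{jn}(u)\, g^{kl}(u)\, \Gamma^{(\alpha)}_{ij,k}(x),$$
$$R^{(\alpha)klmn}(u) = \sum_{i,j,\mu,\nu} g^{ik}(u)\, g^{jl}(u)\, g^{\mu m}(u)\, g^{\nu n}(u)\, R^{(\alpha)}_{ij\mu\nu}(x).$$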
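$$\frac{\partial^2 \hat{\Phi}^{(\alpha)}}{\partial z^i \partial \bar{z}^j} = \frac{1+\alpha^2}{8}\, \Phi_{ij}\!\left( \left(\frac{1-\alpha}{4} + \frac{1+\alpha}{4\sqrt{-1}}\right) z + \left(\frac{1-\alpha}{4} - \frac{1+\alpha}{4\sqrt{-1}}\right) \bar{z} \right)$$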
which is symmetric in i, j. Both (31) and (32) are satisfied. The symplectic form,
under the complex coordinates, is given by
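$$\omega^{(\alpha)} = \Phi_{ij}\!\left(\frac{1-\alpha}{2}\, x + \frac{1+\alpha}{2}\, y\right) dx^i \wedge dy^j = \frac{4\sqrt{-1}}{1+\alpha^2}\, \frac{\partial^2 \hat{\Phi}^{(\alpha)}}{\partial z^i \partial \bar{z}^j}\, dz^i \wedge d\bar{z}^j$$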
$$ds^2 = \frac{8}{1+\alpha^2}\, \frac{\partial^2 \hat{\Phi}^{(\alpha)}}{\partial z^i \partial \bar{z}^j}\, dz^i \otimes d\bar{z}^j.$$
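The coefficients of $z$ and $\bar{z}$ in the complex expressions above can be sanity-checked with complex arithmetic. The sketch assumes the complex coordinate is $z = x + \sqrt{-1}\, y$ (a convention not restated in this excerpt) and verifies, for scalar $x$ and $y$, that the combination of $z$ and $\bar{z}$ reproduces the real point $\tfrac{1-\alpha}{2} x + \tfrac{1+\alpha}{2} y$ at which $\Phi_{ij}$ is evaluated. The function names are hypothetical.

```python
def mixed_point(x, y, alpha):
    """(1 - alpha)/2 * x + (1 + alpha)/2 * y, the argument of Phi_ij."""
    return 0.5 * (1 - alpha) * x + 0.5 * (1 + alpha) * y

def mixed_point_complex(z, alpha):
    """The same quantity written in terms of z = x + i*y and its conjugate."""
    a, b = (1 - alpha) / 4, (1 + alpha) / (4 * 1j)
    return (a + b) * z + (a - b) * z.conjugate()

x, y, alpha = 0.3, -1.2, 0.5
z = complex(x, y)
w = mixed_point_complex(z, alpha)
print(mixed_point(x, y, alpha), w)  # real parts should agree; imaginary part ~ 0
```

The agreement follows from $z + \bar{z} = 2x$ and $z - \bar{z} = 2\sqrt{-1}\, y$.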
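Proposition 12 (Zhang and Li, 2013). A smooth, strictly convex function
$\Phi : \mathrm{dom}(\Phi) \subset M \to \mathbb{R}$ induces a family of Kähler structures $(M, \omega^{(\alpha)}, G^{(\alpha)})$
defined on $\mathrm{dom}(\Phi) \times \mathrm{dom}(\Phi) \subset M \times M$ with
1. the symplectic form $\omega^{(\alpha)}$ is given by
$$\omega^{(\alpha)} = \Phi^{(\alpha)}_{ij}\, dx^i \wedge dy^j$$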
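$$ds^{2(\alpha)} = \Phi^{(\alpha)}_{ij}\, dz^i \otimes d\bar{z}^j = \frac{8}{1+\alpha^2}\, \frac{\partial^2 \hat{\Phi}^{(\alpha)}}{\partial z^i \partial \bar{z}^j}\, dz^i \otimes d\bar{z}^j,$$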
$$e_i = \frac{1}{\sqrt{2}} \left( \frac{\partial}{\partial x^i} + \frac{\partial}{\partial y^i} \right).$$
The Riemannian metric on the diagonal, induced from G(α) is
5 Summary
In order to construct divergence functions in a principled way, this chapter
considered the various geometric structures on the underlying manifold M
induced from a divergence function. Among the geometric structures considered
are: statistical structure (a Riemannian metric with a pair of torsion-free dual
connections or, by simple construction, a family of α-connections), equiaffine
structure (those connections that admit parallel volume forms), and Hessian
structure (those connections that are dually flat). They are progressively more
restrictive: while any divergence function will induce a statistical manifold, only
the canonical divergence (i.e., Bregman divergence) will induce a Hessian manifold.
Lying in between these extremes is the equiaffine α-Hessian geometry induced
from, say, the class of DΦ-divergences. The α-Hessian structure has the advantage
that biorthogonal coordinates exist; they are induced from the convex function Φ
and its conjugate, and are convenient for computation. It should be noted that
the above geometric structures, from statistical to Hessian, are all induced on
the tangent bundle T M of the manifold M on which the divergence function is
defined.
On the cotangent bundle T ∗ M side, a divergence function can be viewed as a
generating function for a symplectic structure on M × M that can be constructed
in a “canonical” way. This imposes a “properness” condition on the divergence
function, stating that the mixed second derivatives of D(x, y) with respect to x and y
must commute. For such divergence functions, a Riemannian structure on M × M
can be constructed, which can be seen as an extension of the Riemannian
structure on ∆M ⊂ M × M. If a further condition on D is imposed, then M × M
may be complexified, so that it becomes a Kähler manifold. It was shown that the
DΦ-divergence (Zhang, 2004) satisfies this Kählerian condition, in addition to itself
being proper; the Kähler potential is simply given by the real-valued convex
function Φ. These properties, along with the α-Hessian structure it induces on
the tangent bundle, make DΦ a class of divergence functions that enjoys a
special role with the “nicest” geometric properties, extending the canonical (Bregman)
divergence for dually flat manifolds. This has implications for machine learning,
convex optimization, geometric mechanics, etc.
References