
Geometric Structures in Info Geometry

This document discusses divergence functions and the geometric structures they induce on manifolds. It begins by introducing divergence functions and their role in information geometry. Divergence functions induce (1) a statistical structure (Riemannian metric and pair of affine connections) on the manifold, (2) a symplectic structure on the product manifold if they are "proper", and (3) a Kähler structure on the product manifold if they satisfy an additional condition. The document focuses on the class of DΦ-divergence functions, which satisfy all these requirements and make the product manifold a Kähler manifold.


Divergence Functions and Geometric Structures They Induce on a Manifold

Jun Zhang

Department of Psychology and Department of Mathematics,


University of Michigan, Ann Arbor, MI 48109, USA
[email protected]

Abstract. Divergence functions play a central role in information geometry. Given a manifold M, a divergence function D is a smooth, nonnegative function on the product manifold M × M that achieves its global minimum of zero (with semi-positive definite Hessian) at those points that form its diagonal submanifold ∆M ⊂ M × M. In this Chapter, we review how such divergence functions induce i) a statistical structure (i.e., a Riemannian metric with a pair of conjugate affine connections) on M; ii) a symplectic structure on M × M if they are “proper”; iii) a Kähler structure on M × M if they further satisfy a certain condition. It is then shown that the class of DΦ-divergence functions (Zhang, 2004), as induced by a strictly convex function Φ on M, satisfies all these requirements and hence makes M × M a Kähler manifold (with Kähler potential given by Φ). This provides a larger context for the α-Hessian structure induced by the DΦ-divergence on M, which is shown to be equiaffine, admitting α-parallel volume forms and biorthogonal coordinates generated by Φ and its convex conjugate Φ∗. As the α-Hessian structure is dually flat for α = ±1, the DΦ-divergence provides richer geometric structures (compared to Bregman divergence) to the manifold M on which it is defined.

1 Introduction

Divergence functions (also called “contrast functions” or “yokes”) are non-symmetric measurements of proximity. They play a central role in statistical inference, machine learning, optimization, and many other fields. The most familiar examples include Kullback-Leibler divergence, Bregman divergence (Bregman, 1967), α-divergence (Amari, 1985), f-divergence (Csiszár, 1967), etc. Divergence functions are also a key construct of information geometry. Just as L2-distance is associated with Euclidean geometry, Bregman divergence and Kullback-Leibler divergence are associated with a pair of flat structures (where flatness means free of torsion and free of curvature) that are “dual” to each other; this is called Hessian geometry (Shima and Yagi, 1997; Shima, 2001) and it is the dualistic extension of Euclidean geometry. So just as Riemannian geometry extends Euclidean geometry by allowing non-trivial metric structure, Hessian geometry extends Euclidean geometry by allowing non-trivial affine connections that come in pairs. The pairing of connections is with respect to a Riemannian metric g, which is uniquely specified in the case of Hessian geometry, yet the metric-induced Levi-Civita connection has non-zero curvature in general. The apparent inconvenience is offset by the existence of biorthogonal coordinates in any dually flat structure and a canonical divergence, along with the tools of convex analysis, which are useful in many practical applications.
In a quite general setting, any divergence function induces a Riemannian metric and a pair of torsion-free dual connections on the manifold where it is defined (Eguchi, 1983). This so-called statistical structure is at the core of information geometry. Recently, other geometric structures related to divergence functions have been investigated, including conformal structure (Ohara, Matsuzoe, and Amari, 2010), symplectic structure (Barndorff-Nielsen and Jupp, 1997; Zhang and Li, 2013), and complex structures (Zhang and Li, 2013).
The goal of this Chapter is to review the relationship between divergence
function and various information geometric structures. In Section 2, we provide
background materials of various geometric structures on a manifold. In Section
3, we show how these structures can be induced from a divergence function.
Starting from a general divergence function which always induces a statistical
structure, we define the notion of “properness” for it to be a generating function
of a symplectic structure. Imposing a further condition leads to complexification
of the product manifold where divergence functions are defined. In Section 4,
we show that a quite broad class of divergence functions, DΦ -divergence func-
tions (Zhang, 2004) as induced by a strictly convex function, satisfies all these
requirements and induces a Kähler structure (Riemannian and complex struc-
tures simultaneously) on the tangent bundle. Therefore, just as the full-fledged
α-Hessian geometry extends the dually-flat Hessian manifold (α = ±1), DΦ -
divergence generalizes Bregman divergence in the “nicest” way possible. Section
5 closes with a summary of this approach to information geometric structures
through divergence functions.

2 Background: Structures on Smooth Manifolds


2.1 Differentiable manifold: Metric and connection structures on TM
A differentiable manifold M is a space which locally “looks like” a Euclidean
space Rn . By “looks like”, we mean that for any base (reference) point x ∈ M,
there exists a bijective mapping (“coordinate functions”) between the neighbor-
hood of x (i.e., a patch of the manifold) and a subset V of Rn . By locally, we
mean that various such mappings must be smoothly related to one another (if
they are centered at the same reference point) or consistently glued together
(if they are centered at different reference points) and globally cover the entire
manifold. Below, we assume that a coordinate system is chosen such that each
point is indexed by a vector in V , with the origin as the reference point.
A manifold is specified with certain structures. First, there is an inner-product structure associated with tangent spaces of the manifold. This is given by the metric 2-tensor field g which is, when evaluated at each location x, a symmetric bilinear form g(·, ·) of tangent vectors $X, Y \in T_x(M) \simeq \mathbb{R}^n$ such that g(X, X) is always positive for all non-zero vectors X ∈ V. In local “holonomic”¹ coordinates with bases $\partial_i \equiv \partial/\partial x^i$, $i = 1, \cdots, n$ (i.e., X, Y are expressed as $X = \sum_i X^i \partial_i$, $Y = \sum_i Y^i \partial_i$), the components of g are denoted as

$$g_{ij}(x) = g(\partial_i, \partial_j). \tag{1}$$

The metric tensor allows us to define distance on a manifold as the length of the shortest curve (called “geodesic”) connecting two points, and to measure angles and hence define orthogonality of vectors; projections of vectors to a lower dimensional submanifold become possible after a metric is given. The metric tensor also provides a linear isomorphism of tangent space with cotangent space at any point on the manifold.
Second, there is a structure associated with the notion of “parallelism” of vector fields and curviness of a manifold. This is given by the affine (linear) connection ∇, mapping two vector fields X and Y to a third one denoted by $\nabla_Y X$: $(X, Y) \mapsto \nabla_Y X$. Intuitively, it represents the “intrinsic” difference of a tangent vector X(x) at point x and another tangent vector X(x′) at a nearby point x′, which is connected to x in the direction given by the tangent vector Y(x). Here “intrinsic” means that vector comparison across two neighboring points of the manifold is through a process called “parallel transport,” whereby vector components are adjusted as the vector moves across points on the base manifold. Under the local coordinate system with bases $\partial_i \equiv \partial/\partial x^i$, components of ∇ can be written out in its “contravariant” form denoted $\Gamma^l_{ij}(x)$:

$$\nabla_{\partial_i} \partial_j = \sum_l \Gamma^l_{ij}\, \partial_l. \tag{2}$$

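Under coordinate transform $x \mapsto \tilde{x}$, the new coefficients $\tilde{\Gamma}$ are related to old ones Γ via

$$\tilde{\Gamma}^l_{mn}(\tilde{x}) = \sum_k \left( \sum_{i,j} \frac{\partial x^i}{\partial \tilde{x}^m} \frac{\partial x^j}{\partial \tilde{x}^n}\, \Gamma^k_{ij}(x) + \frac{\partial^2 x^k}{\partial \tilde{x}^m \partial \tilde{x}^n} \right) \frac{\partial \tilde{x}^l}{\partial x^k}. \tag{3}$$

A curve whose tangent vectors are parallel along the curve is said to be “auto-parallel”.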
As a primitive on a manifold, affine connections can be characterized in terms of their (i) torsion and (ii) curvature. The torsion T of a connection Γ, which is a tensor itself, is given by the anti-symmetric part of the connection, $T(\partial_i, \partial_j) = \nabla_{\partial_i} \partial_j - \nabla_{\partial_j} \partial_i = \sum_k T^k_{ij}\, \partial_k$, where $T^k_{ij}$ is its local representation given as

$$T^k_{ij}(x) = \Gamma^k_{ij}(x) - \Gamma^k_{ji}(x).$$

¹ A holonomic coordinate system means that the coordinates have been properly “scaled” in length with respect to each other such that the directional derivatives commute: their Lie bracket $[\partial_i, \partial_j] = \partial_i \partial_j - \partial_j \partial_i = 0$, i.e., the mixed partial derivatives are exchangeable in their order of application.

The curviness/flatness of a connection Γ is described by the Riemann curvature tensor R, defined as

$$R(\partial_i, \partial_j)\partial_k = (\nabla_{\partial_i} \nabla_{\partial_j} - \nabla_{\partial_j} \nabla_{\partial_i})\, \partial_k.$$

Writing $R(\partial_i, \partial_j)\partial_k = \sum_l R^l_{kij}\, \partial_l$ and substituting (2), the components of the Riemann curvature tensor are²

$$R^l_{kij}(x) = \frac{\partial \Gamma^l_{jk}(x)}{\partial x^i} - \frac{\partial \Gamma^l_{ik}(x)}{\partial x^j} + \sum_m \Gamma^l_{im}(x)\, \Gamma^m_{jk}(x) - \sum_m \Gamma^l_{jm}(x)\, \Gamma^m_{ik}(x).$$

By definition, $R^l_{kij}$ is anti-symmetric when i ←→ j.

A connection is said to be flat when $R^l_{kij}(x) \equiv 0$ and $T^k_{ij} \equiv 0$. Note that this is a tensorial condition, so that the flatness of a connection ∇ is a coordinate-independent property even though the local expression of the connection (in terms of Γ) is coordinate-dependent. For any flat connection, there exists a local coordinate system under which $\Gamma^k_{ij}(x) \equiv 0$ in a neighborhood; this is the affine coordinate for the given flat connection.
In the above discussions, metric and connections are treated as separate structures on a manifold. When both are defined on the same manifold, then it is convenient to express the affine connection Γ in its “covariant” form

$$\Gamma_{ij,k} = g(\nabla_{\partial_i} \partial_j, \partial_k) = \sum_l g_{lk}\, \Gamma^l_{ij}. \tag{4}$$

Though $\Gamma^k_{ij}$ is the more primitive quantity that does not involve the metric, $\Gamma_{ij,k}$ represents the projection of Γ onto the manifold spanned by the bases $\partial_k$. The covariant form of Riemann curvature is (cf. footnote 2)

$$R_{lkij} = \sum_m g_{lm}\, R^m_{kij}.$$

When the connection is torsion free, $R_{lkij}$ is anti-symmetric when i ←→ j or when k ←→ l, and symmetric when (i, j) ←→ (l, k). It is related to the Ricci tensor Ric via $\mathrm{Ric}_{kj} = \sum_{i,l} R_{lkij}\, g^{il}$.


2.2 Coupling between metric and connection: Statistical structure


A fundamental theorem of Riemannian geometry states that given a metric, there is a unique connection (among the class of torsion-free connections) that “preserves” the metric, i.e., the following condition is satisfied:

$$\partial_k\, g(\partial_i, \partial_j) = g(\widehat{\nabla}_{\partial_k} \partial_i, \partial_j) + g(\partial_i, \widehat{\nabla}_{\partial_k} \partial_j). \tag{5}$$

Such a connection, denoted as $\widehat{\nabla}$, is called the Levi-Civita connection. Its component forms, called Christoffel symbols, are determined by the components of the metric tensor as (“Christoffel symbols of the second kind”)

$$\widehat{\Gamma}^k_{ij} = \sum_l \frac{g^{kl}}{2} \left( \frac{\partial g_{il}}{\partial x^j} + \frac{\partial g_{jl}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^l} \right)$$

and (“Christoffel symbols of the first kind”)

$$\widehat{\Gamma}_{ij,k} = \frac{1}{2} \left( \frac{\partial g_{ik}}{\partial x^j} + \frac{\partial g_{jk}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k} \right).$$

The Levi-Civita connection $\widehat{\Gamma}$ is compatible with the metric g, in the sense that it treats tangent vectors of the shortest curves on a manifold as being parallel (or equivalently, it treats geodesics as auto-parallel curves).

² This component-wise notation of the Riemann curvature tensor follows standard differential geometry textbooks, such as Nomizu and Sasaki (1994). On the other hand, information geometers, such as Amari and Nagaoka (2000), adopt the notation that $R(\partial_i, \partial_j)\partial_k = \sum_l R^l_{ijk}\, \partial_l$, with $R_{ijkl} = \sum_m R^m_{ijk}\, g_{ml}$.
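The first-kind Christoffel formula can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the Euclidean plane in polar coordinates (r, θ) as a test metric, and the helper names `metric_polar`, `metric_derivative`, and `christoffel_first_kind` are hypothetical.

```python
def metric_polar(x):
    """Assumed test metric: Euclidean plane in polar coordinates x = (r, theta)."""
    r, _theta = x
    return [[1.0, 0.0], [0.0, r * r]]

def metric_derivative(metric, x, k, h=1e-5):
    """Central-difference estimate of d g_ij / d x^k at the point x."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    gp, gm = metric(xp), metric(xm)
    n = len(x)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)] for i in range(n)]

def christoffel_first_kind(metric, x):
    """gamma[i][j][k] = (d_j g_ik + d_i g_jk - d_k g_ij) / 2."""
    n = len(x)
    dg = [metric_derivative(metric, x, k) for k in range(n)]  # dg[k][i][j] = d_k g_ij
    return [[[0.5 * (dg[j][i][k] + dg[i][j][k] - dg[k][i][j])
              for k in range(n)] for j in range(n)] for i in range(n)]

# Evaluate at r = 2; index 0 is r and index 1 is theta.
gamma = christoffel_first_kind(metric_polar, (2.0, 0.3))
print(gamma[1][1][0])  # component (theta theta, r), analytically -r
print(gamma[0][1][1])  # component (r theta, theta), analytically +r
```

For this metric the only non-zero components are $\widehat{\Gamma}_{\theta\theta,r} = -r$ and $\widehat{\Gamma}_{r\theta,\theta} = \widehat{\Gamma}_{\theta r,\theta} = r$, which the finite-difference values should reproduce up to rounding.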
It turns out that one can define a kind of “compatibility” relation more general than expressed by (5), by introducing the notion of “conjugacy” (denoted by ∗) between two connections. A connection ∇∗ is said to be conjugate (or dual) to ∇ with respect to g if

$$\partial_k\, g(\partial_i, \partial_j) = g(\nabla_{\partial_k} \partial_i, \partial_j) + g(\partial_i, \nabla^*_{\partial_k} \partial_j). \tag{6}$$

Clearly, $(\nabla^*)^* = \nabla$. Moreover, $\widehat{\nabla}$, which satisfies (5), is special in the sense that it is self-conjugate: $(\widehat{\nabla})^* = \widehat{\nabla}$.

Because the metric tensor g provides a one-to-one mapping between points in the tangent space (i.e., vectors) and points in the cotangent space (i.e., co-vectors), (6) can also be seen as characterizing how co-vector fields are to be parallel-transported in order to preserve their dual pairing ⟨·, ·⟩ with vector fields.

Writing out (6):

$$\frac{\partial g_{ij}}{\partial x^k} = \Gamma_{ki,j} + \Gamma^*_{kj,i}, \tag{7}$$

where, analogous to (2) and (4),

$$\nabla^*_{\partial_i} \partial_j = \sum_l \Gamma^{*l}_{ij}\, \partial_l$$

so that

$$\Gamma^*_{kj,i} = g(\nabla^*_{\partial_k} \partial_j, \partial_i) = \sum_l g_{il}\, \Gamma^{*l}_{kj}.$$

There is an alternative way of imposing a “compatibility” condition between a metric g and a connection ∇, through investigating how the metric tensor g behaves under ∇. We introduce a 3-tensor field, called “cubic form”, as the covariant derivative of g: C = ∇g, or in component forms

$$C(\partial_i, \partial_j, \partial_k) = (\nabla_{\partial_k} g)(\partial_i, \partial_j) = \partial_k\, g(\partial_i, \partial_j) - g(\nabla_{\partial_k} \partial_i, \partial_j) - g(\partial_i, \nabla_{\partial_k} \partial_j).$$

Writing out the above:

$$C_{ijk} = \frac{\partial g_{ij}}{\partial x^k} - \Gamma_{ki,j} - \Gamma_{kj,i} \quad \left(= \Gamma^*_{kj,i} - \Gamma_{kj,i}\right).$$

From its definition, $C_{ijk} = C_{jik}$, that is, C is symmetric with respect to its first two indices. It can be further shown that:

$$C_{ijk} - C_{ikj} = \sum_l g_{il}\, (T^l_{jk} - T^{*l}_{jk})$$

where T, T∗ are torsions of ∇ and ∇∗, respectively. Therefore, $C_{ijk} = C_{ikj}$, and hence C is totally symmetric in all (pairwise permutation of) indices, when $T^l_{jk} = T^{*l}_{jk}$. So conceptually, requiring $C_{ijk}$ to be totally symmetric imposes a compatibility condition between g and ∇, making them the so-called “Codazzi pair” (see Simon, 2000). The Codazzi pairing generalizes the Levi-Civita coupling whose corresponding cubic form $C_{ijk}$ is easily seen to be identically zero. Lauritzen (1987) defined a “statistical manifold” (M, g, ∇) to be a manifold M equipped with g and ∇ such that i) ∇ is torsion free; ii) ∇g ≡ C is totally symmetric. Equivalently, a manifold is said to have statistical structure when the conjugate connection ∇∗ (with respect to g) of a torsion-free connection ∇ is also torsion-free. In this case, ∇∗g = −C, and the Levi-Civita connection is $\widehat{\nabla} = (\nabla + \nabla^*)/2$.
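The last claim, that the Levi-Civita connection is the average of ∇ and ∇∗ on a statistical manifold, follows in two lines from (7). The derivation below is a sketch; the symbol $\bar\Gamma$ for the averaged connection is introduced here for illustration only.

```latex
% Conjugacy (7), written twice using the symmetry g_{ij} = g_{ji}:
\begin{align*}
\partial_k g_{ij} &= \Gamma_{ki,j} + \Gamma^{*}_{kj,i}, \\
\partial_k g_{ij} &= \Gamma_{kj,i} + \Gamma^{*}_{ki,j}
  \qquad (\text{swap } i \leftrightarrow j).
\end{align*}
% Add the two lines and divide by 2, with
% \bar\Gamma_{ij,k} := (\Gamma_{ij,k} + \Gamma^{*}_{ij,k})/2:
\[
\partial_k g_{ij} = \bar\Gamma_{ki,j} + \bar\Gamma_{kj,i},
\]
% which is the component form of the metric-preserving condition (5).
% Since \Gamma and \Gamma^{*} are both torsion-free, so is \bar\Gamma;
% by uniqueness of the torsion-free metric connection,
% \bar\Gamma = \widehat{\Gamma}.
```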
Two torsion-free connections Γ and Γ′ are said to be projectively equivalent if there exists a function τ such that:

$$\Gamma'^{k}_{ij} = \Gamma^{k}_{ij} + \delta^k_i\, (\partial_j \tau) + \delta^k_j\, (\partial_i \tau),$$

where $\delta^k_i$ is the Kronecker delta. When two connections are projectively equivalent, their corresponding auto-parallel curves have identical shape (i.e., considered as unparameterized curves); these so-called “pre-geodesics” differ only by a change of parameterization τ.

Two torsion-free connections Γ and Γ′ are said to be dual-projectively equivalent if there exists a function τ such that:

$$\Gamma'_{ij,k} = \Gamma_{ij,k} - g_{ij}\, (\partial_k \tau).$$

When two connections are dual-projectively equivalent, then their conjugate connections (with respect to g) have identical pre-geodesics (identical shape).

Recall that when two Riemannian metrics g, g′ are conformally equivalent, i.e., there exists a function τ such that

$$g'_{ij} = e^{2\tau} g_{ij},$$

then their respective Levi-Civita connections $\widehat{\Gamma}'$ and $\widehat{\Gamma}$ are related via

$$\widehat{\Gamma}'_{ij,k} = \widehat{\Gamma}_{ij,k} - (\partial_k \tau)\, g_{ij} + (\partial_j \tau)\, g_{ik} + (\partial_i \tau)\, g_{jk}.$$

(This relation is obtained by directly substituting in the expressions of the corresponding Levi-Civita connections.) This motivates the definition of the more general notion of conformally-projectively equivalent statistical structures (M, g, Γ) and (M, g′, Γ′), through the existence of two functions ψ, φ such that:

$$g'_{ij} = e^{\psi + \phi} g_{ij} \tag{8}$$

$$\Gamma'_{ij,k} = \Gamma_{ij,k} - (\partial_k \psi)\, g_{ij} + (\partial_j \phi)\, g_{ik} + (\partial_i \phi)\, g_{jk}. \tag{9}$$

When φ = const (or ψ = const), then the corresponding connections are projectively (dual-projectively, resp.) equivalent.

2.3 Equiaffine structure and parallel volume form


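For a restrictive set of connections, called “equiaffine” connections, the manifold M may admit, in a unique way, a volume form Ω(x) that is “parallel” under the given connection. Here, a volume form is a skew-symmetric multilinear map from n linearly independent vectors to a non-zero scalar at any point x ∈ M, and “parallel” is in the sense that ∇Ω = 0, or $(\partial_i \Omega)(\partial_1, \cdots, \partial_n) = 0$ where

$$(\partial_i \Omega)(\partial_1, \cdots, \partial_n) \equiv \partial_i \big( \Omega(\partial_1, \cdots, \partial_n) \big) - \sum_{l=1}^{n} \Omega(\cdots, \nabla_{\partial_i} \partial_l, \cdots).$$

Applying (2), the equiaffine condition becomes

$$\begin{aligned}
\partial_i \big( \Omega(\partial_1, \cdots, \partial_n) \big) &= \sum_{l=1}^{n} \Omega\Big( \cdots, \sum_{k=1}^{n} \Gamma^k_{il}\, \partial_k, \cdots \Big) \\
&= \sum_{l=1}^{n} \sum_{k=1}^{n} \Gamma^k_{il}\, \delta^l_k\, \Omega(\partial_1, \cdots, \partial_n) = \Omega(\partial_1, \cdots, \partial_n) \sum_{l=1}^{n} \Gamma^l_{il}
\end{aligned}$$

or

$$\sum_l \Gamma^l_{il}(x) = \frac{\partial \log \Omega(x)}{\partial x^i}. \tag{10}$$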
Whether or not a connection is equiaffine is related to the so-called Ricci tensor Ric, defined as the contraction of the Riemann curvature tensor R:

$$\mathrm{Ric}_{ij}(x) = \sum_k R^k_{ikj}(x). \tag{11}$$

For a torsion-free connection $\Gamma^k_{ij} = \Gamma^k_{ji}$, we can verify that

$$\mathrm{Ric}_{ij} - \mathrm{Ric}_{ji} = \frac{\partial}{\partial x^i} \Big( \sum_l \Gamma^l_{jl}(x) \Big) - \frac{\partial}{\partial x^j} \Big( \sum_l \Gamma^l_{il}(x) \Big) = \sum_k R^k_{kij}. \tag{12}$$

One immediately sees that the existence of a function Ω satisfying (10) is equivalent to the right side of (12) being identically zero.
Making use of (10), it is easy to show that the parallel volume form of a Levi-Civita connection $\widehat{\Gamma}$ is given by

$$\widehat{\Omega}(x) = \sqrt{\det[g_{ij}(x)]}.$$

Making use of (7), the parallel volume forms Ω, Ω∗ associated with Γ and Γ∗ satisfy (apart from a multiplicative constant which must be positive)

$$\Omega(x)\, \Omega^*(x) = (\widehat{\Omega}(x))^2 = \det[g_{ij}(x)]. \tag{13}$$

The equiaffine condition can also be expressed using a quantity related to the cubic form $C_{ijk}$. We may introduce the Tchebychev form (also known as the first Koszul form), expressed in the local coordinates,

$$T_i = \sum_{j,k} C_{ijk}\, g^{jk}. \tag{14}$$

A tedious calculation shows that

$$\frac{\partial T_i}{\partial x^j} - \frac{\partial T_j}{\partial x^i} = \frac{\partial}{\partial x^j} \Big( \sum_l \Gamma^l_{li} \Big) - \frac{\partial}{\partial x^i} \Big( \sum_l \Gamma^l_{lj} \Big),$$

the righthand side of (12). Therefore, an equivalent requirement for equiaffine structure is that the Tchebychev 1-form T is “closed”:

$$\frac{\partial T_i}{\partial x^j} = \frac{\partial T_j}{\partial x^i}. \tag{15}$$

This expresses the integrability condition. When Equation (15) is satisfied, there exists a function τ such that $T_i = \partial_i \tau$. Furthermore, it can be shown that

$$\tau = -2 \log(\Omega / \widehat{\Omega}).$$
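The expression for τ can be recovered directly from (10) and (14). The following derivation is a sketch that assumes a statistical manifold, so that C is totally symmetric and Γ is torsion-free; it fixes τ only up to an additive constant.

```latex
% Total symmetry of C lets the differentiated index be moved to the last slot:
%   C_{ijk} = C_{jki} = \partial_i g_{jk} - \Gamma_{ij,k} - \Gamma_{ik,j}.
\begin{align*}
T_i &= \sum_{j,k} g^{jk} C_{jki}
     = \sum_{j,k} g^{jk}\,\partial_i g_{jk} - 2 \sum_{j,k} g^{jk}\,\Gamma_{ij,k} \\
    &= \partial_i \log\det[g_{jk}] - 2 \sum_l \Gamma^l_{il}
     && \text{(Jacobi's formula and (4))} \\
    &= 2\,\partial_i \log\widehat{\Omega} - 2\,\partial_i \log\Omega
     && \text{(using } \widehat{\Omega} = \sqrt{\det g} \text{ and (10))} \\
    &= \partial_i \bigl( -2 \log(\Omega/\widehat{\Omega}) \bigr).
\end{align*}
```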

Proposition 1 (Matsuzoe, Takeuchi, and Amari, 2006; Zhang, 2007). The necessary and sufficient condition for a torsion-free connection ∇ to be equiaffine is for any of the following to hold:

1. There exists a ∇-parallel volume element Ω: ∇Ω = 0.
2. The Ricci tensor of ∇ is symmetric: $\mathrm{Ric}_{ij} = \mathrm{Ric}_{ji}$.
3. The curvature tensor satisfies $\sum_k R^k_{kij} = 0$.
4. The Tchebychev 1-form T is closed, dT = 0.
5. There exists a function τ, called the Tchebychev potential, such that $T_i = \partial_i \tau$.

It is known that the Ricci tensor of the Levi-Civita connection is always symmetric; this is why the Riemannian volume form $\widehat{\Omega}$ always exists.

2.4 α-structure and α-Hessian structure

On a statistical manifold, one can define a one-parameter family of affine connections $\Gamma^{(\alpha)}$, called “α-connections” (α ∈ R):

$$\Gamma^{(\alpha)k}_{ij} = \frac{1+\alpha}{2}\, \Gamma^k_{ij} + \frac{1-\alpha}{2}\, \Gamma^{*k}_{ij}. \tag{16}$$

Obviously, $\Gamma^{(0)} = \widehat{\Gamma}$ is the Levi-Civita connection. Using the cubic form, this amounts to $\nabla^{(\alpha)} g = \alpha C$. The α-parallel volume element is given by:

$$\Omega^{(\alpha)} = e^{-\frac{\alpha}{2} \tau}\, \widehat{\Omega}$$

where τ is the Tchebychev potential. The Riemannian volume element $\widehat{\Omega}$ is only parallel with respect to the Levi-Civita connection $\widehat{\nabla}$ of g, that is, $\widehat{\nabla} \widehat{\Omega} = 0$, but not other α-connections (α ≠ 0). Rather, $\nabla^{(\alpha)} \Omega^{(\alpha)} = 0$.
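The mixing rule (16) and the relation $\nabla^{(\alpha)} g = \alpha C$ can be illustrated on a one-dimensional example. This is a sketch under stated assumptions, not the author's construction: it takes the Hessian metric g(x) = e^x (from the hypothetical potential Φ(x) = e^x) at x = 0, with Γ = 0 and Γ∗ = ∂g/∂x in covariant components, and applies the weights of (16) to covariant rather than contravariant components (lowering an index with g is linear, so the weights carry over). The function names are hypothetical.

```python
def alpha_connection(gamma, gamma_star, alpha):
    """Componentwise mix of a connection and its conjugate with the weights of (16)."""
    n = len(gamma)
    w, w_star = (1 + alpha) / 2, (1 - alpha) / 2
    return [[[w * gamma[i][j][k] + w_star * gamma_star[i][j][k]
              for k in range(n)] for j in range(n)] for i in range(n)]

def cubic_form_1d(dg, gamma_111):
    """One-dimensional cubic form: C_111 = dg/dx - 2 * Gamma_{11,1}."""
    return dg - 2.0 * gamma_111

# Assumed example at x = 0: g = e^x = 1, dg/dx = 1, Gamma = 0, Gamma* = dg/dx = 1.
dg = 1.0
gamma = [[[0.0]]]
gamma_star = [[[1.0]]]

levi_civita = alpha_connection(gamma, gamma_star, 0.0)[0][0][0]  # (dg/dx) / 2
c = cubic_form_1d(dg, gamma[0][0][0])                            # C of the alpha = 1 connection
for alpha in (-1.0, 0.0, 0.5, 1.0):
    g_a = alpha_connection(gamma, gamma_star, alpha)[0][0][0]
    print(alpha, cubic_form_1d(dg, g_a), alpha * c)  # last two columns agree
```

At α = 0 the mix reproduces the first-kind Christoffel symbol ½ ∂g/∂x, and the cubic form of the α-connection scales linearly in α.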

It can be further shown that the curvatures $R_{lkij}$, $R^*_{lkij}$ for the pair of conjugate connections Γ, Γ∗ satisfy

$$R_{lkij} = R^*_{lkij}.$$

So, Γ is flat if and only if Γ∗ is flat. In this case, the manifold is said to be “dually flat”. When Γ, Γ∗ are dually flat, then $\Gamma^{(\alpha)}$ is called “α-transitively flat” (Uohashi, 2002). In such case, $\{M, g, \Gamma^{(\alpha)}, \Gamma^{(-\alpha)}\}$ is called an α-Hessian structure (Zhang and Matsuzoe, 2009). They are all compatible with a metric g that is induced from a strictly convex (potential) function, see next subsection.

For an α-Hessian manifold, the Tchebychev form (14) is given by

$$T_i = \frac{\partial \log(\det[g_{kl}])}{\partial x^i}$$

and its derivative (known as the second Koszul form) is

$$\beta_{ij} = \frac{\partial T_i}{\partial x^j} = \frac{\partial^2 \log(\det[g_{kl}])}{\partial x^i \partial x^j}.$$

2.5 Biorthogonal coordinates


A key feature of α-Hessian manifolds is biorthogonal coordinates, as we shall discuss now. They are the “best” coordinates one can have when the Riemannian metric is non-trivial.

Consider the coordinate transform $x \mapsto u$,

$$\partial^i \equiv \frac{\partial}{\partial u_i} = \sum_l \frac{\partial x^l}{\partial u_i} \frac{\partial}{\partial x^l} = \sum_l F^{li}\, \partial_l$$

where the Jacobian matrix F is given by

$$F_{ij}(x) = \frac{\partial u_i}{\partial x^j}, \qquad F^{ij}(u) = \frac{\partial x^i}{\partial u_j}, \qquad \sum_l F_{il} F^{lj} = \delta_i^j \tag{17}$$

where $\delta_i^j$ is the Kronecker delta (taking the value of 1 when i = j and 0 otherwise). If the new coordinate system $u = [u_1, \cdots, u_n]$ (with components expressed by subscripts) is such that

$$F_{ij}(x) = g_{ij}(x), \tag{18}$$

then the x-coordinate system and the u-coordinate system are said to be “biorthogonal” to each other since, from the definition of the metric tensor (1),

$$g(\partial_i, \partial^j) = g\Big(\partial_i, \sum_l F^{lj} \partial_l\Big) = \sum_l F^{lj}\, g(\partial_i, \partial_l) = \sum_l F^{lj} g_{il} = \delta_i^j.$$

In such case, denote

$$g^{ij}(u) = g(\partial^i, \partial^j), \tag{19}$$

which equals $F^{ij}$, the Jacobian of the inverse coordinate transform $u \mapsto x$. Also introduce the (contravariant version) of the affine connection Γ under u-coordinates and denote it by an unconventional notation $\Gamma_t^{rs}$ defined by

$$\nabla_{\partial^r} \partial^s = \sum_t \Gamma_t^{rs}\, \partial^t;$$

similarly $\Gamma_t^{*rs}$ is defined via

$$\nabla^*_{\partial^r} \partial^s = \sum_t \Gamma_t^{*rs}\, \partial^t.$$

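The covariant version of the affine connections will be denoted by superscripted Γ and Γ∗:

$$\Gamma^{ij,k}(u) = g(\nabla_{\partial^i} \partial^j, \partial^k), \qquad \Gamma^{*ij,k}(u) = g(\nabla^*_{\partial^i} \partial^j, \partial^k). \tag{20}$$

The affine connections in u-coordinates (expressed in superscript) and in x-coordinates (expressed in subscript) are related via

$$\Gamma_t^{rs}(u) = \sum_k \left( \sum_{i,j} \frac{\partial x^r}{\partial u_i} \frac{\partial x^s}{\partial u_j}\, \Gamma^k_{ij}(x) + \frac{\partial^2 x^k}{\partial u_r \partial u_s} \right) \frac{\partial u_k}{\partial x^t} \tag{21}$$

and

$$\Gamma^{rs,t}(u) = \sum_{i,j,k} \frac{\partial x^r}{\partial u_i} \frac{\partial x^s}{\partial u_j} \frac{\partial x^t}{\partial u_k}\, \Gamma_{ij,k}(x) + \frac{\partial^2 x^t}{\partial u_r \partial u_s}. \tag{22}$$

Similar relations hold between $\Gamma_t^{*rs}(u)$ and $\Gamma^{*k}_{ij}(x)$, and between $\Gamma^{*rs,t}(u)$ and $\Gamma^*_{ij,k}(x)$.

Analogous to (7), we have the following identity:

$$\frac{\partial^2 x^t}{\partial u_s \partial u_r} = \frac{\partial g^{rt}(u)}{\partial u_s} = \Gamma^{rs,t}(u) + \Gamma^{*ts,r}(u).$$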
Therefore, we have

Proposition 2. Under biorthogonal coordinates, a pair of conjugate connections Γ, Γ∗ satisfy

$$\Gamma^{*ts,r}(u) = -\sum_{i,j,k} g^{ir}(u)\, g^{js}(u)\, g^{kt}(u)\, \Gamma_{ij,k}(x) \tag{23}$$

and

$$\Gamma_r^{*ts}(u) = -\sum_j g^{js}(u)\, \Gamma^t_{jr}(x). \tag{24}$$

Let us now express parallel volume forms Ω(x), Ω(u) under biorthogonal coordinates x or u. Contracting the indices t with r in (24), and invoking (10), we obtain

$$\frac{\partial \log \Omega^*(u)}{\partial u_s} + \sum_j \frac{\partial x^j}{\partial u_s} \frac{\partial \log \Omega(x)}{\partial x^j} = \frac{\partial \log \Omega^*(u)}{\partial u_s} + \frac{\partial \log \Omega(x)}{\partial u_s} = 0.$$

After integration,

$$\Omega^*(u)\, \Omega(x) = \text{const}. \tag{25}$$

From (13) and (25),

$$\Omega(u)\, \Omega^*(x) = \text{const}. \tag{26}$$

The relations (25) and (26) indicate that the volume forms of the pair of conjugate connections, when expressed in biorthogonal coordinates respectively, are inversely proportional to each other.

The $\Gamma^{(\alpha)}$-parallel volume element $\Omega^{(\alpha)}$ can be shown to be given by (in either x or u coordinates)

$$\Omega^{(\alpha)} = \Omega^{\frac{1+\alpha}{2}}\, (\Omega^*)^{\frac{1-\alpha}{2}}.$$

Clearly,

$$\Omega^{(\alpha)}(x)\, \Omega^{(-\alpha)}(x) = \det[g_{ij}(x)] \quad \longleftrightarrow \quad \Omega^{(\alpha)}(u)\, \Omega^{(-\alpha)}(u) = \det[g^{ij}(u)].$$
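The product identity is a one-line consequence of the formula for $\Omega^{(\alpha)}$ together with (13); the short check below is written in x-coordinates, and the u-coordinate version is identical with $g^{ij}(u)$ in place of $g_{ij}(x)$.

```latex
\begin{align*}
\Omega^{(\alpha)}\,\Omega^{(-\alpha)}
  &= \Omega^{\frac{1+\alpha}{2}} (\Omega^{*})^{\frac{1-\alpha}{2}}
     \cdot \Omega^{\frac{1-\alpha}{2}} (\Omega^{*})^{\frac{1+\alpha}{2}} \\
  &= \Omega^{\frac{1+\alpha}{2} + \frac{1-\alpha}{2}}\,
     (\Omega^{*})^{\frac{1-\alpha}{2} + \frac{1+\alpha}{2}}
   = \Omega\,\Omega^{*}
   = \det[g_{ij}(x)] \qquad \text{by (13)}.
\end{align*}
```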

2.6 Existence of biorthogonal coordinates


From its definition (18), we can easily show that

Proposition 3. A Riemannian manifold with metric $g_{ij}$ admits biorthogonal coordinates if and only if $\partial g_{ij} / \partial x^k$ is totally symmetric:

$$\frac{\partial g_{ij}(x)}{\partial x^k} = \frac{\partial g_{ik}(x)}{\partial x^j}. \tag{27}$$

That (27) is satisfied for biorthogonal coordinates is evident by virtue of (17) and (18). Conversely, given (27), there must be n functions $u_i(x)$, $i = 1, 2, \cdots, n$ such that

$$\frac{\partial u_i(x)}{\partial x^j} = g_{ij}(x) = g_{ji}(x) = \frac{\partial u_j(x)}{\partial x^i}.$$

The above identity implies that there exists a function Φ such that $u_i = \partial_i \Phi$ and, by positive definiteness of $g_{ij}$, Φ would have to be a strictly convex function! In this case, the x- and u-variables satisfy (37), and the pair of convex functions, Φ and its conjugate $\widetilde{\Phi}$, are related to $g_{ij}$ and $g^{ij}$ by

$$g_{ij}(x) = \frac{\partial^2 \Phi(x)}{\partial x^i \partial x^j} \quad \longleftrightarrow \quad g^{ij}(u) = \frac{\partial^2 \widetilde{\Phi}(u)}{\partial u_i \partial u_j}.$$
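The relation $F_{ij} = \partial u_i / \partial x^j = g_{ij}$ of (17) and (18) can be checked numerically for a concrete potential. The sketch below is illustrative only: the choice Φ(x) = log(1 + Σ exp(x^i)) and the names `grad_phi`, `hessian_phi`, and `jacobian_u` are assumptions made here, not taken from the text.

```python
import math

def grad_phi(x):
    """u_i = dPhi/dx^i for the assumed potential Phi(x) = log(1 + sum_i exp(x^i))."""
    z = 1.0 + sum(math.exp(v) for v in x)
    return [math.exp(v) / z for v in x]

def hessian_phi(x):
    """Analytic Hessian g_ij = u_i * delta_ij - u_i * u_j of the same potential."""
    u = grad_phi(x)
    n = len(x)
    return [[(u[i] if i == j else 0.0) - u[i] * u[j] for j in range(n)]
            for i in range(n)]

def jacobian_u(x, h=1e-6):
    """F_ij = du_i/dx^j of the map x -> u, estimated by central differences."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        up, um = grad_phi(xp), grad_phi(xm)
        cols.append([(up[i] - um[i]) / (2 * h) for i in range(n)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]

x0 = [0.2, -0.5]
F = jacobian_u(x0)
g = hessian_phi(x0)
max_gap = max(abs(F[i][j] - g[i][j]) for i in range(2) for j in range(2))
det_g = g[0][0] * g[1][1] - g[0][1] * g[1][0]
print(max_gap, det_g)  # the gap is tiny and det_g is positive
```

Because the Jacobian of the gradient map is the Hessian of Φ, it is automatically symmetric, and strict convexity of Φ makes it positive definite, which is what allows it to serve as the metric in (18).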

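It follows from the above Proposition that a necessary and sufficient condition for a Riemannian manifold to admit biorthogonal coordinates is that its Levi-Civita connection is given by

$$\widehat{\Gamma}_{ij,k}(x) \equiv \frac{1}{2} \left( \frac{\partial g_{ik}}{\partial x^j} + \frac{\partial g_{jk}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k} \right) = \frac{1}{2} \frac{\partial g_{ij}}{\partial x^k}.$$

From this, the following can be shown: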

Proposition 4. A Riemannian manifold {M, g} admits a pair of biorthogonal coordinates x and u if and only if there exists a pair of conjugate connections γ and γ∗ such that $\gamma_{ij,k}(x) = 0$, $\gamma^{*rs,t}(u) = 0$.

In other words, biorthogonal coordinates are affine coordinates for the dually-flat pair of connections. In fact, we can now define a pair of torsion-free connections by

$$\gamma_{ij,k}(x) = 0, \qquad \gamma^*_{ij,k}(x) = \frac{\partial g_{ij}}{\partial x^k}$$

and show that they are conjugate with respect to g, that is, they satisfy (6). This is to say that we select an affine connection γ such that x is its affine coordinate. From (22), when γ∗ is expressed in u-coordinates,

$$\begin{aligned}
\gamma^{*rs,t}(u) &= \sum_{i,j,k} g^{ir}(u)\, g^{js}(u)\, \frac{\partial x^k}{\partial u_t} \frac{\partial g_{ij}(x)}{\partial x^k} + \frac{\partial g^{ts}(u)}{\partial u_r} \\
&= \sum_{i,j} g^{ir}(u) \left( -g_{ij}(x)\, \frac{\partial g^{js}(u)}{\partial u_t} \right) + \frac{\partial g^{ts}(u)}{\partial u_r} \\
&= -\sum_j \delta^r_j\, \frac{\partial g^{js}(u)}{\partial u_t} + \frac{\partial g^{ts}(u)}{\partial u_r} = 0.
\end{aligned}$$

This implies that u is an affine coordinate system with respect to γ∗. Therefore, biorthogonal coordinates are affine coordinates for a pair of dually-flat connections.

2.7 Symplectic, complex, and Kähler structures


Symplectic structure on a manifold refers to the existence of a closed, non-degenerate 2-tensor, i.e., a skew-symmetric bilinear map ω : W × W → R, with ω(X, Y) = −ω(Y, X) for all $X, Y \in W \subseteq \mathbb{R}^{2n}$. For ω to be well-defined, the vector space W is required to be orientable and even-dimensional. In this case, there exists a base $\{e_1, \cdots, e_n, f_1, \cdots, f_n\}$ of W, dim(W) = 2n, such that $\omega(e_i, e_j) = 0$, $\omega(f_i, f_j) = 0$, $\omega(e_i, f_j) = \delta_{ij}$ for all indices i, j taking values in 1, · · · , n.

Symplectic structure is closely related to inner-product structure (the existence of a positive-definite symmetric bilinear map G : W × W → R) and complex structure (a linear mapping J : W → W such that $J^2 = -\mathrm{Id}$) on an even-dimensional vector space W. The complex structure J on W is said to be compatible with a symplectic structure ω if ω(JX, JY) = ω(X, Y) (symplectomorphism condition) and ω(X, JY) > 0 (taming condition) for any X, Y ∈ W. With ω, J given, G(X, Y) ≡ ω(X, JY) can be shown to be symmetric and positive-definite, and hence an inner-product on W.
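The construction G(X, Y) ≡ ω(X, JY) can be tried out in the standard basis. This is a minimal sketch, assuming vectors are stored as coordinate lists in the basis {e_1, ..., e_n, f_1, ..., f_n} and taking the particular complex structure J e_i = f_i, J f_i = −e_i; the function names `omega`, `J`, and `G` mirror the symbols in the text but the implementation is illustrative.

```python
def omega(X, Y):
    """Standard symplectic form: omega(e_i, f_j) = delta_ij, zero on e-e and f-f pairs."""
    n = len(X) // 2
    return sum(X[i] * Y[n + i] - X[n + i] * Y[i] for i in range(n))

def J(X):
    """Assumed complex structure with J e_i = f_i and J f_i = -e_i, so J(J(X)) = -X."""
    n = len(X) // 2
    return [-v for v in X[n:]] + list(X[:n])

def G(X, Y):
    """Bilinear form built from the pair (omega, J)."""
    return omega(X, J(Y))

X = [1.0, 2.0, -1.0, 0.5]
Y = [0.5, -3.0, 2.0, 1.0]
print(omega(J(X), J(Y)), omega(X, Y))  # equal: symplectomorphism condition
print(G(X, Y), G(Y, X))                # equal: G is symmetric
print(G(X, X))                         # positive for non-zero X
```

With these choices G reduces to the ordinary Euclidean dot product on R^{2n}, which makes symmetry and positive-definiteness easy to see by hand.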
The cotangent bundle T∗M of any manifold M admits a canonical symplectic form written as

$$\omega = \sum_{i=1}^{n} dx^i \wedge dp_i,$$

where $(x^1, \cdots, x^n, p_1, \cdots, p_n)$ are coordinates of T∗M. That ω is closed can be shown by the existence of the tautological (or Liouville) 1-form

$$\alpha = \sum_{i=1}^{n} p_i\, dx^i$$

(which can be checked to be coordinate-independent on T∗M) and then verifying ω = −dα. Hence, ω is also coordinate-independent. Denote $\partial_i = \partial/\partial x^i$, $\tilde{\partial}_i = \partial/\partial p_i$ as the base of the tangent bundle TM, then

$$\omega(\partial_i, \partial_j) = \omega(\tilde{\partial}_i, \tilde{\partial}_j) = 0; \qquad \omega(\partial_i, \tilde{\partial}_j) = -\omega(\tilde{\partial}_j, \partial_i) = \omega_{ij}. \tag{28}$$

That is, when viewed as 2 × 2 blocks of n × n matrices, ω vanishes on the diagonal blocks and has non-zero entries $\omega_{ij}$ and $-\omega_{ij}$ only on the off-diagonal blocks.
The aforementioned linear map J of the tangent space $T_x M \simeq W$ at any point x ∈ M,

$$J : \quad J \partial_i = \tilde{\partial}_i, \qquad J \tilde{\partial}_i = -\partial_i,$$

gives rise to an “almost complex structure” on $T_x M$. For TM to be complex, that is, admitting complex coordinates, an integrability condition needs to be imposed for the J-maps at various base points x of M, and hence at various tangent spaces $T_x M$, to be “compatible” with one another. The condition is that the so-called Nijenhuis tensor N,

$$N(X, Y) = [JX, JY] - J[X, JY] - J[JX, Y] - [X, Y],$$

must vanish for arbitrary tangent vector fields X, Y.
The Riemannian metric tensor G on TM compatible with ω has the form

$$\begin{aligned}
G_{ij'} &\equiv G(\partial_i, \tilde{\partial}_j) = 0; \\
G_{i'j} &\equiv G(\tilde{\partial}_i, \partial_j) = 0; \\
G_{ij} &\equiv G(\partial_i, \partial_j) = g_{ij}; \\
G_{i'j'} &\equiv G(\tilde{\partial}_i, \tilde{\partial}_j) = g_{ij},
\end{aligned}$$

where i′ = n + i, j′ = n + j and i, j take values in 1, · · · , n. When viewed as 2 × 2 blocks of n × n matrices, G vanishes on the off-diagonal blocks and has non-zero entries $g_{ij}$ only on the two diagonal blocks. Such a metric on TM is in the form of the Sasaki metric, which can also result from an appropriate “lift” of the Riemannian metric on M into TM, via an affine connection on TM which induces a splitting of TTM, the tangent bundle of TM as the base manifold. We omit the technical details here, but refer interested readers to Yano and Ishihara (1973) and, in the context of statistical manifolds, to Matsuzoe and Inoguchi (2003).
It is a basic conclusion in symplectic geometry that for any symplectic form, there exists a compatible almost complex structure J. Along with the Riemannian metric, the three structures (G, ω, J) are said to form a compatible triple if any two give rise to the third one. When a manifold has a compatible triple (G, ω, J) in which J is integrable, it is called a Kähler manifold. On a Kähler manifold, using complex coordinates, the metric $\tilde{G}$ associated with the complex line-element

$$ds^2 = \tilde{G}_{ij}\, dz^i\, d\bar{z}^j$$

is given by

$$\tilde{G}_{ij}(z, \bar{z}) = \frac{\partial^2 \Phi}{\partial z^i\, \partial \bar{z}^j}.$$

Here the real-valued function Φ (of complex variables) is called the “Kähler potential”.

It is known that the tangent bundle TM of a manifold M with a flat connection on it admits a complex structure (Dombrowski, 1962). As Shima (2001) pointed out, a Hessian manifold can be seen as the “real” Kähler manifold.

Proposition 5 (Dombrowski, 1962). (M, g, ∇) is a Hessian manifold if and only if (TM, J, G) is a Kähler manifold, where G is the Sasaki lift of g.

3 Divergence Functions and Induced Structures


3.1 Statistical structure induced on T M
A divergence function $D : M \times M \to \mathbb{R}_{\geq 0}$ on a manifold M under a local chart $V \subseteq \mathbb{R}^n$ is defined as a smooth function (differentiable up to third order) which satisfies

(i) D(x, y) ≥ 0 ∀x, y ∈ V, with equality holding if and only if x = y;
(ii) $D_i(x, x) = D_{,j}(x, x) = 0$, ∀i, j ∈ {1, 2, · · · , n};
(iii) $-D_{i,j}(x, x)$ is positive definite.

Here $D_i(x, y) = \partial_{x^i} D(x, y)$ and $D_{,i}(x, y) = \partial_{y^i} D(x, y)$ denote partial derivatives with respect to the i-th component of point x and of point y, respectively, $D_{i,j}(x, y) = \partial_{x^i} \partial_{y^j} D(x, y)$ the second-order mixed derivative, etc.

On a manifold, divergence functions act as “pseudo-distance” functions that are non-negative but need not be symmetric. That a dualistic Riemannian manifold structure (i.e., statistical structure) can be induced from a divergence function was first demonstrated by S. Eguchi.

Proposition 6 (Eguchi, 1983; 1992). A divergence function D induces a Riemannian metric g and a pair of torsion-free conjugate connections Γ, Γ∗ given as

$$\begin{aligned}
g_{ij}(x) &= -D_{i,j}(x, y)\big|_{x=y}; \\
\Gamma_{ij,k}(x) &= -D_{ij,k}(x, y)\big|_{x=y}; \\
\Gamma^*_{ij,k}(x) &= -D_{k,ij}(x, y)\big|_{x=y}.
\end{aligned}$$

It is easily verifiable that $\Gamma_{ij,k}$, $\Gamma^*_{ij,k}$ as given above are torsion-free³ and satisfy the conjugacy condition with respect to the induced metric $g_{ij}$. Hence {M, g, Γ, Γ∗} as induced is a “statistical manifold” (Lauritzen, 1987).
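The metric formula of Proposition 6 can be tested numerically on a one-dimensional example. The sketch below is an illustration under stated assumptions: it uses the Kullback-Leibler divergence between two Bernoulli distributions with the success probability as the coordinate (a choice made here, not in the text), and the names `kl_bernoulli` and `induced_metric` are hypothetical. For this family the induced metric is known in closed form, 1/(p(1 − p)).

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def induced_metric(D, x, h=1e-4):
    """g(x) = -D_{1,1}(x, y) at y = x, with the mixed derivative taken by central differences."""
    mixed = (D(x + h, x + h) - D(x + h, x - h)
             - D(x - h, x + h) + D(x - h, x - h)) / (4 * h * h)
    return -mixed

p = 0.3
g_numeric = induced_metric(kl_bernoulli, p)
g_exact = 1.0 / (p * (1.0 - p))
print(g_numeric, g_exact)  # agree to several digits

# The divergence vanishes on the diagonal and is not symmetric off it.
print(kl_bernoulli(p, p), kl_bernoulli(0.3, 0.6), kl_bernoulli(0.6, 0.3))
```

The same finite-difference idea extends to the connection coefficients $-D_{ij,k}$ and $-D_{k,ij}$, at the cost of third-order difference stencils.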
A natural question is whether/how the statistical structures induced from
different divergence functions are related. The following is known:

Proposition 7 (Matsuzoe, 2009). Let D be a divergence function and ψ, φ be two arbitrary functions. If D′(x, y) = e^{ψ(x)+φ(y)} D(x, y), then D′(x, y) is also a divergence function, and the structure (M, g′, Γ′) induced from D′(x, y) and the structure (M, g, Γ) induced from D(x, y) are conformally-projectively equivalent. In particular, when φ = const, Γ′ and Γ are projectively equivalent; when ψ = const, Γ′ and Γ are dual-projectively equivalent.

3.2 Symplectic structure induced on M × M

A divergence function D is given as a bi-variable function on M (of dimension n). We now view it as a (single-variable) function on M × M (of dimension 2n)
that assumes zero value along the diagonal ∆M ⊂ M × M. In this subsection,
we investigate the condition under which a divergence function can serve as a
“generating function” of a symplectic structure on M × M. A compatible metric
on M × M will also be derived.
First, we fix a particular y or a particular x in M × M; this results in two n-dimensional submanifolds of M × M that will be denoted, respectively, M_x ≃ M (with the point y fixed) and M_y ≃ M (with the point x fixed). Let us write out the canonical symplectic form ω_x on the cotangent bundle T*M_x, given by

\[ \omega_x = dx^i \wedge d\xi_i. \]

Given D, we define a map L_D from M × M to T*M_x, (x, y) ↦ (x, ξ), given by

\[ L_D : (x, y) \mapsto (x, D_i(x, y)\, dx^i). \]

(Recall that the comma separates differentiation with respect to the variable in the first slot from differentiation with respect to the variable in the second slot.) It is easy to check that, in a neighborhood of the diagonal ∆M ⊂ M × M, the map L_D is a diffeomorphism, since the Jacobian matrix of the map,

\[ \begin{pmatrix} \delta_{ij} & D_{ij} \\ 0 & D_{i,j} \end{pmatrix}, \]

is non-degenerate in such a neighborhood of the diagonal ∆M.

³ Conjugate connections that admit torsion have recently been studied by Calin, Matsuzoe, and Zhang (2009) and Matsuzoe (2010).
We calculate the pullback of this symplectic form (defined on T*M_x) to M × M:

\[ L_D^* \omega_x = L_D^*(dx^i \wedge d\xi_i) = dx^i \wedge dD_i(x, y) = dx^i \wedge \bigl(D_{ij}(x, y)\, dx^j + D_{i,j}(x, y)\, dy^j\bigr) = D_{i,j}(x, y)\, dx^i \wedge dy^j. \]

(Here Dij dxi ∧ dxj = 0 since Dij (x, y) = Dji (x, y) always holds.)
Similarly, we consider the canonical symplectic form ω_y = dy^i ∧ dη_i on T*M_y and define a map R_D from M × M to T*M_y, (x, y) ↦ (y, η), given by

\[ R_D : (x, y) \mapsto (y, D_{,i}(x, y)\, dy^i). \]

Using R_D to pull back ω_y to M × M yields an analogous formula:

\[ R_D^* \omega_y = -D_{i,j}(x, y)\, dx^i \wedge dy^j. \]

Therefore, based on the canonical symplectic forms on T*M_x and T*M_y, we obtain, up to sign, the same symplectic form on M × M:

\[ \omega_D(x, y) = -D_{i,j}(x, y)\, dx^i \wedge dy^j. \tag{29} \]

Proposition 8. A divergence function D induces a symplectic form ω_D (29) on M × M which is, up to sign, the pullback of the canonical symplectic forms ω_x and ω_y by the maps L_D and R_D:

\[ L_D^* \omega_x = D_{i,j}(x, y)\, dx^i \wedge dy^j = -R_D^* \omega_y. \tag{30} \]

It was Barndorff-Nielsen and Jupp (1997) who first proposed (29) as an induced symplectic form on M × M, apart from a minus sign; the divergence function D was called a “yoke”. As an example, the Bregman divergence B_Φ (given by (33) below) induces the symplectic form ∑_{i,j} Φ_{ij} dx^i ∧ dy^j, with Φ_{ij} evaluated at y.
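The Bregman example can be checked numerically. The sketch below is illustrative only: the potential Φ(x) = exp(x¹ + x²) + (x¹)² + (x²)² on R², the sample points, and all function names are assumptions chosen for this example. It estimates the coefficients −D_{i,j}(x, y) of the form (29) by central finite differences and compares them with the Hessian Φ_{ij}(y).

```python
import math

def phi(x):
    """Assumed strictly convex potential on R^2 (chosen for illustration)."""
    return math.exp(x[0] + x[1]) + x[0] ** 2 + x[1] ** 2

def grad_phi(x):
    e = math.exp(x[0] + x[1])
    return [e + 2 * x[0], e + 2 * x[1]]

def hess_phi(x):
    e = math.exp(x[0] + x[1])
    return [[e + 2, e], [e, e + 2]]

def bregman(x, y):
    """B_Phi(x, y) = Phi(x) - Phi(y) - <x - y, grad Phi(y)>."""
    g = grad_phi(y)
    return phi(x) - phi(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))

def shift(p, i, h):
    """Copy of point p with its i-th coordinate moved by h."""
    q = list(p)
    q[i] += h
    return q

def mixed(D, x, y, i, j, h=1e-4):
    """Central-difference estimate of D_{i,j}(x, y)."""
    return (D(shift(x, i, h), shift(y, j, h)) - D(shift(x, i, h), shift(y, j, -h))
            - D(shift(x, i, -h), shift(y, j, h)) + D(shift(x, i, -h), shift(y, j, -h))) / (4 * h * h)

def omega_coefficients(D, x, y):
    """Matrix of coefficients -D_{i,j}(x, y) of dx^i ^ dy^j in (29)."""
    n = len(x)
    return [[-mixed(D, x, y, i, j) for j in range(n)] for i in range(n)]

x, y = [0.3, -0.2], [0.1, 0.4]
W = omega_coefficients(bregman, x, y)
H = hess_phi(y)
# Largest entrywise gap between -D_{i,j}(x, y) and Phi_ij(y)
print(max(abs(W[i][j] - H[i][j]) for i in range(2) for j in range(2)))
```

In this example the estimated coefficient matrix is symmetric and does not depend on x, as expected from −D_{i,j} = Φ_{ij}(y).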

3.3 Almost complex structure and Hermitian metric on M × M


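An almost complex structure J on M × M is defined by a vector bundle isomorphism (from T(M × M) to itself) with the property that J² = −Id. Requiring J to be compatible with ω_D, that is,

\[ \omega_D(JX, JY) = \omega_D(X, Y), \quad \forall X, Y \in T_{(x,y)}(M \times M), \]

we may obtain a constraint on the divergence function D. From

\[ \omega_D\!\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial y^j}\right) = \omega_D\!\left(J\frac{\partial}{\partial x^i}, J\frac{\partial}{\partial y^j}\right) = \omega_D\!\left(\frac{\partial}{\partial y^i}, -\frac{\partial}{\partial x^j}\right) = \omega_D\!\left(\frac{\partial}{\partial x^j}, \frac{\partial}{\partial y^i}\right), \]

we require

\[ D_{i,j} = D_{j,i}, \tag{31} \]

or explicitly

\[ \frac{\partial^2 D}{\partial x^i\, \partial y^j} = \frac{\partial^2 D}{\partial x^j\, \partial y^i}. \]

Note that this condition is always satisfied on ∆M, by the definition of a divergence function D, which has allowed us to define a Riemannian structure on ∆M (Proposition 6). We now require it to be satisfied on M × M (or at least on a neighborhood of ∆M).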
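For divergence functions satisfying (31), we can consider inducing a metric G_D on M × M. The induced Riemannian (Hermitian) metric G_D is defined by

\[ G_D(X, Y) = \omega_D(X, JY). \]

It is easy to verify that G_D is invariant under the almost complex structure J. The metric components are given by:

\[ G_{ij} = G_D\!\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right) = \omega_D\!\left(\frac{\partial}{\partial x^i}, J\frac{\partial}{\partial x^j}\right) = \omega_D\!\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial y^j}\right) = -D_{i,j}, \]

\[ G_{i'j'} = G_D\!\left(\frac{\partial}{\partial y^i}, \frac{\partial}{\partial y^j}\right) = \omega_D\!\left(\frac{\partial}{\partial y^i}, J\frac{\partial}{\partial y^j}\right) = \omega_D\!\left(\frac{\partial}{\partial y^i}, -\frac{\partial}{\partial x^j}\right) = -D_{j,i}, \]

\[ G_{ij'} = G_D\!\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial y^j}\right) = \omega_D\!\left(\frac{\partial}{\partial x^i}, J\frac{\partial}{\partial y^j}\right) = \omega_D\!\left(\frac{\partial}{\partial x^i}, -\frac{\partial}{\partial x^j}\right) = 0, \]

\[ G_{i'j} = G_D\!\left(\frac{\partial}{\partial y^i}, \frac{\partial}{\partial x^j}\right) = \omega_D\!\left(\frac{\partial}{\partial y^i}, J\frac{\partial}{\partial x^j}\right) = \omega_D\!\left(\frac{\partial}{\partial y^i}, \frac{\partial}{\partial y^j}\right) = 0. \]

So the desired Riemannian metric on M × M is

\[ G_D = -D_{i,j}\,\bigl(dx^i\, dx^j + dy^i\, dy^j\bigr). \]

Hence, for G_D to be a Riemannian metric, we require −D_{i,j} to be positive-definite.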


We call a divergence function D proper if and only if −Di,j is symmetric and
positive-definite on M × M. Just as any divergence function induces a Rieman-
nian structure on the diagonal manifold ∆M of M × M, any proper divergence
function induces a Riemannian structure on M × M that is compatible with the
symplectic structure ωD on it.
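At a single point, the constructions of this subsection reduce to block-matrix identities in the basis (∂/∂x¹, ..., ∂/∂xⁿ, ∂/∂y¹, ..., ∂/∂yⁿ). The sketch below is a hedged illustration: the matrix `A` stands in for −D_{i,j} at a point and is an arbitrary choice, and the helper names are hypothetical. It builds the matrices of ω_D and J, forms G_D(X, Y) = ω_D(X, JY), and tests the compatibility condition ω_D(JX, JY) = ω_D(X, Y).

```python
def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def omega_matrix(A):
    """Matrix of omega_D = A_ij dx^i ^ dy^j, where A stands for -D_{i,j}."""
    n = len(A)
    top = [[0] * n + list(A[i]) for i in range(n)]
    bottom = [[-A[j][i] for j in range(n)] + [0] * n for i in range(n)]
    return top + bottom

def complex_structure(n):
    """Matrix of J with J(d/dx^i) = d/dy^i and J(d/dy^i) = -d/dx^i."""
    J = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        J[n + i][i] = 1    # column of d/dx^i has its entry in the d/dy^i row
        J[i][n + i] = -1   # column of d/dy^i has its entry in the d/dx^i row
    return J

def induced_metric(A):
    """Matrix of G_D(X, Y) = omega_D(X, J Y)."""
    return matmul(omega_matrix(A), complex_structure(len(A)))

def is_compatible(A):
    """True when omega_D(JX, JY) = omega_D(X, Y) for all X, Y."""
    Om, J = omega_matrix(A), complex_structure(len(A))
    return matmul(matmul(transpose(J), Om), J) == Om

A = [[2, 1], [1, 3]]           # symmetric and positive definite
for row in induced_metric(A):  # block-diagonal, with blocks A and A^T
    print(row)
print(is_compatible(A), is_compatible([[2, 1], [0, 3]]))
```

The compatibility test fails for the non-symmetric matrix, which is the matrix form of condition (31).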
3.4 Complexification and Kähler structure on M × M
We now discuss possible existence of a Kähler structure on the product manifold
M × M. By definition,

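\[
\begin{aligned}
ds^2 &= G_D - \sqrt{-1}\,\omega_D \\
&= -D_{i,j}\,\bigl(dx^i \otimes dx^j + dy^i \otimes dy^j\bigr) + \sqrt{-1}\,D_{i,j}\,\bigl(dx^i \otimes dy^j - dy^i \otimes dx^j\bigr) \\
&= -D_{i,j}\,\bigl(dx^i + \sqrt{-1}\,dy^i\bigr) \otimes \bigl(dx^j - \sqrt{-1}\,dy^j\bigr) = -D_{i,j}\, dz^i \otimes d\bar{z}^j.
\end{aligned}
\]

Now introduce complex coordinates z = x + √−1 y,

\[ D(x, y) = D\!\left(\frac{z + \bar{z}}{2}, \frac{z - \bar{z}}{2\sqrt{-1}}\right) \equiv \hat{D}(z, \bar{z}), \]

so

\[ \frac{\partial^2 D}{\partial z^i\, \partial \bar{z}^j} = \frac{1}{4}\,(D_{ij} + D_{,ij}) = \frac{1}{2}\,\frac{\partial^2 \hat{D}}{\partial z^i\, \partial \bar{z}^j}. \]

If D satisfies

\[ D_{ij} + D_{,ij} = \kappa D_{i,j}, \tag{32} \]

where κ is a constant, then M × M admits a Kähler potential D̂ (and hence is a Kähler manifold):

\[ ds^2 = \frac{\kappa}{2}\,\frac{\partial^2 \hat{D}}{\partial z^i\, \partial \bar{z}^j}\, dz^i \otimes d\bar{z}^j. \]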

3.5 Canonical divergence for Hessian manifold


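On a dually flat (i.e., Hessian) manifold, there is a canonical divergence, as shown below. Recall that the Hessian metric

\[ g_{ij}(x) = \frac{\partial^2 \Phi(x)}{\partial x^i\, \partial x^j} \]

and the dual connections

\[ \Gamma_{ij,k}(x) = 0, \qquad \Gamma^*_{ij,k}(x) = \frac{\partial^3 \Phi(x)}{\partial x^i\, \partial x^j\, \partial x^k} \]

are induced from a convex potential function Φ. In the (biorthogonal) u-coordinates, these geometric quantities can be expressed as

\[ g^{ij}(u) = \frac{\partial^2 \tilde{\Phi}(u)}{\partial u_i\, \partial u_j}, \qquad \Gamma^{*\,ij,k}(u) = 0, \qquad \Gamma^{ij,k}(u) = \frac{\partial^3 \tilde{\Phi}(u)}{\partial u_i\, \partial u_j\, \partial u_k}, \]

where Φ̃ is the convex conjugate of Φ.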


Integrating the Hessian structure reveals the so-called Bregman divergence B_Φ(x, y) (Bregman, 1967) as the generating function:

\[ B_\Phi(x, y) = \Phi(x) - \Phi(y) - \langle x - y, \partial\Phi(y) \rangle, \tag{33} \]

where ∂Φ = [∂_1 Φ, ..., ∂_n Φ] with ∂_i ≡ ∂/∂x^i denotes the gradient valued in the co-vector space R̃^n, and ⟨·, ·⟩_n denotes the canonical pairing of a point/vector x = [x^1, ..., x^n] ∈ R^n and a point/co-vector u = [u_1, ..., u_n] ∈ R̃^n (dual to R^n):

\[ \langle x, u \rangle_n = \sum_{i=1}^{n} x^i u_i. \tag{34} \]

(Where there is no danger of confusion, the subscript n in ⟨·, ·⟩_n is often omitted.) A basic fact in convex analysis is that the necessary and sufficient condition for a smooth function Φ to be strictly convex is

\[ B_\Phi(x, y) > 0 \tag{35} \]

for x ≠ y.
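Recall that, when Φ is convex, its convex conjugate Φ̃ : Ṽ ⊆ R̃^n → R is defined through the Legendre transform:

\[ \tilde{\Phi}(u) = \langle (\partial\Phi)^{-1}(u), u \rangle - \Phi\bigl((\partial\Phi)^{-1}(u)\bigr), \tag{36} \]

with

\[ \tilde{\tilde{\Phi}} = \Phi, \qquad (\partial\Phi) = (\partial\tilde{\Phi})^{-1}. \]

The function Φ̃ is also convex, and through it (35) precisely expresses the Fenchel inequality

\[ \Phi(x) + \tilde{\Phi}(u) - \langle x, u \rangle \ge 0 \]

for any x ∈ V, u ∈ Ṽ, with equality holding if and only if

\[ u = (\partial\Phi)(x) = (\partial\tilde{\Phi})^{-1}(x) \longleftrightarrow x = (\partial\tilde{\Phi})(u) = (\partial\Phi)^{-1}(u), \tag{37} \]

or, in component form,

\[ u_i = \frac{\partial \Phi}{\partial x^i} \longleftrightarrow x^i = \frac{\partial \tilde{\Phi}}{\partial u_i}. \tag{38} \]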
With the aid of conjugate variables, we can introduce the “canonical divergence” A_Φ : V × Ṽ → R_+ (and A_Φ̃ : Ṽ × V → R_+):

\[ A_\Phi(x, v) = \Phi(x) + \tilde{\Phi}(v) - \langle x, v \rangle = A_{\tilde{\Phi}}(v, x). \]

They are related to the Bregman divergence (33) via

\[ B_\Phi\bigl(x, (\partial\Phi)^{-1}(v)\bigr) = A_\Phi(x, v) = B_{\tilde{\Phi}}\bigl(v, (\partial\Phi)(x)\bigr). \]
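The Legendre transform (36), the Fenchel inequality, and the link between the canonical divergence and the Bregman divergence can be illustrated on a concrete potential. The sketch below assumes Φ(x) = ∑_i exp(x^i) on R^n, for which (∂Φ)^{-1}(u) has components log u_i (so u_i > 0 is required); this choice of Φ and all function names are assumptions made for illustration.

```python
import math

def phi(x):
    """Assumed convex potential Phi(x) = sum_i exp(x^i) on R^n."""
    return sum(math.exp(xi) for xi in x)

def grad_phi(x):
    """u = (dPhi)(x), the component form (38)."""
    return [math.exp(xi) for xi in x]

def grad_phi_inv(u):
    """x = (dPhi)^{-1}(u); defined for u_i > 0 with this choice of Phi."""
    return [math.log(ui) for ui in u]

def pairing(x, u):
    """Canonical pairing <x, u> of (34)."""
    return sum(xi * ui for xi, ui in zip(x, u))

def phi_conj(u):
    """Convex conjugate of Phi via the Legendre transform (36)."""
    x = grad_phi_inv(u)
    return pairing(x, u) - phi(x)

def bregman(x, y):
    """Bregman divergence B_Phi(x, y) of (33)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return phi(x) - phi(y) - pairing(diff, grad_phi(y))

def canonical(x, v):
    """Canonical divergence A_Phi(x, v) = Phi(x) + conj(Phi)(v) - <x, v>."""
    return phi(x) + phi_conj(v) - pairing(x, v)

x, v = [0.2, -1.0], [0.5, 2.0]
print(canonical(x, v), bregman(x, grad_phi_inv(v)))  # the two values should agree
print(canonical(x, grad_phi(x)))                     # Fenchel equality case, close to 0
```

For this potential the conjugate has the closed form Φ̃(u) = ∑_i (u_i log u_i − u_i), which `phi_conj` reproduces through (36) rather than hard-coding it.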

4 DΦ -Divergence and Its Induced Structures


In this section, we study a particular parametric family of divergence functions,
called DΦ , induced by a strictly convex function Φ, with α as the parameter.
This family was first introduced by Zhang (2004), who showed that it included
many familiar families (see also Zhang, 2013). The resulting geometric structures
will be studied below.
4.1 DΦ -divergence functions
Recall that, by definition, a strictly convex function Φ : V ⊆ R^n → R, x ↦ Φ(x) satisfies

\[ \frac{1-\alpha}{2}\,\Phi(x) + \frac{1+\alpha}{2}\,\Phi(y) - \Phi\!\left(\frac{1-\alpha}{2}\,x + \frac{1+\alpha}{2}\,y\right) > 0 \tag{39} \]

for all x ≠ y and any |α| < 1 (the inequality sign is reversed when |α| > 1). Assume Φ to be sufficiently smooth (differentiable up to fourth order).
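Zhang (2004) introduced the following family of functions on V × V, indexed by α ∈ R:

\[ D_\Phi^{(\alpha)}(x, y) = \frac{4}{1-\alpha^2}\left[\frac{1-\alpha}{2}\,\Phi(x) + \frac{1+\alpha}{2}\,\Phi(y) - \Phi\!\left(\frac{1-\alpha}{2}\,x + \frac{1+\alpha}{2}\,y\right)\right]. \tag{40} \]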
From its construction, D_Φ^(α)(x, y) is non-negative for |α| < 1 due to Eqn. (39), and for |α| = 1 due to Eqn. (35). For |α| > 1, assuming ((1−α)/2) x + ((1+α)/2) y ∈ V, the non-negativity of D_Φ^(α)(x, y) can also be proven, because the inequality (39) reverses its sign. Furthermore, D_Φ^(±1)(x, y) is defined by taking the limit α → ±1:

\[ D_\Phi^{(1)}(x, y) = D_\Phi^{(-1)}(y, x) = B_\Phi(x, y), \]
\[ D_\Phi^{(-1)}(x, y) = D_\Phi^{(1)}(y, x) = B_\Phi(y, x). \]

Note that D_Φ^(α)(x, y) satisfies the relation (called “referential duality” in Zhang, 2006a)

\[ D_\Phi^{(\alpha)}(x, y) = D_\Phi^{(-\alpha)}(y, x), \]

that is, exchanging the asymmetric status of the two points (in the directed distance) amounts to α ↔ −α.
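The properties just listed (non-negativity, referential duality, and the Bregman limits at α → ±1) can be explored numerically. The following sketch is illustrative only: it assumes the potential Φ(x) = ∑_i exp(x^i), whose domain is all of R^n so that the case |α| > 1 raises no domain issue, and the function names are hypothetical.

```python
import math

def phi(x):
    """Assumed strictly convex potential on R^n (for illustration only)."""
    return sum(math.exp(xi) for xi in x)

def grad_phi(x):
    return [math.exp(xi) for xi in x]

def bregman(x, y):
    """Bregman divergence B_Phi(x, y) of (33)."""
    g = grad_phi(y)
    return phi(x) - phi(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))

def d_alpha(x, y, alpha):
    """D_Phi^(alpha)(x, y) of (40); requires alpha != 1 and alpha != -1."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    mixture = [a * xi + b * yi for xi, yi in zip(x, y)]
    return 4 / (1 - alpha ** 2) * (a * phi(x) + b * phi(y) - phi(mixture))

x, y = [0.2, -1.0], [1.0, 0.3]
for alpha in (-3.0, -0.5, 0.0, 0.7, 2.0):
    forward = d_alpha(x, y, alpha)
    swapped = d_alpha(y, x, -alpha)   # referential duality: should match forward
    print(alpha, forward, abs(forward - swapped))

# Near alpha = +1 and alpha = -1 the values approach the Bregman divergences
print(d_alpha(x, y, 1 - 1e-6), bregman(x, y))
print(d_alpha(x, y, -1 + 1e-6), bregman(y, x))
```

Evaluating (40) very close to α = ±1 suffers from cancellation, so an implementation meant for real use would switch to the Bregman form (33) at the endpoints.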

4.2 Induced α-Hessian structure on T M


We start by reviewing a main result from Zhang (2004) linking the divergence function D_Φ^(α)(x, y) defined in (40) and the α-Hessian structure.

Proposition 9 (Zhang, 2004). The manifold {M, g(x), Γ^(α)(x), Γ^(−α)(x)}⁴ associated with D_Φ^(α)(x, y) is given by

\[ g_{ij}(x) = \Phi_{ij} \tag{41} \]

and

\[ \Gamma^{(\alpha)}_{ij,k}(x) = \frac{1-\alpha}{2}\,\Phi_{ijk}, \qquad \Gamma^{*(\alpha)}_{ij,k}(x) = \frac{1+\alpha}{2}\,\Phi_{ijk}. \tag{42} \]

Here, Φ_{ij} and Φ_{ijk} denote, respectively, the second and third partial derivatives of Φ(x):

\[ \Phi_{ij} = \frac{\partial^2 \Phi(x)}{\partial x^i\, \partial x^j}, \qquad \Phi_{ijk} = \frac{\partial^3 \Phi(x)}{\partial x^i\, \partial x^j\, \partial x^k}. \]

⁴ The functional argument x (or u, below) indicates that the x-coordinate system (or the u-coordinate system) is being used. Recall from Section 2.5 that under x (u, resp.) local coordinates, g and Γ, in component form, are expressed with lower (upper, resp.) indices.
Recall that an α-Hessian manifold is equipped with an α-independent metric and a family of α-transitively flat connections Γ^(α) (i.e., Γ^(α) satisfying (16) and Γ^(±1) are dually flat). From (42),

\[ \Gamma^{*(\alpha)}_{ij,k} = \Gamma^{(-\alpha)}_{ij,k}, \]

with the Levi-Civita connection given as

\[ \hat{\Gamma}_{ij,k}(x) = \frac{1}{2}\,\Phi_{ijk}. \]
Straightforward calculation shows that:
Proposition 10 (Zhang and Matsuzoe, 2009). For the α-Hessian manifold {M, g(x), Γ^(α)(x), Γ^(−α)(x)},
(i) the Riemann curvature tensor of the α-connection is given by

\[ R^{(\alpha)}_{\mu\nu ij}(x) = \frac{1-\alpha^2}{4} \sum_{l,k} \bigl(\Phi_{il\nu}\Phi_{jk\mu} - \Phi_{il\mu}\Phi_{jk\nu}\bigr)\Psi^{lk} = R^{*(\alpha)}_{ij\mu\nu}(x), \]

with Ψ^{ij} being the matrix inverse of Φ_{ij};
(ii) all α-connections are equiaffine, with the α-parallel volume forms (i.e., the volume forms that are parallel under α-connections) given by

\[ \omega^{(\alpha)}(x) = \det[\Phi_{ij}(x)]^{\frac{1-\alpha}{2}}. \]
It is worth pointing out that while the DΦ-divergence induces the α-Hessian structure, it is not unique in doing so, as the same structure can arise from the following divergence function, which is a mixture of Bregman divergences in conjugate forms:

\[ \frac{1-\alpha}{2}\,B_\Phi(x, y) + \frac{1+\alpha}{2}\,B_\Phi(y, x). \]

4.3 The family of α-geodesics


The family of auto-parallel curves on an α-Hessian manifold has an analytic expression. From

\[ \frac{d^2 x^i}{ds^2} + \sum_{j,k} \Gamma^{i(\alpha)}_{jk}\,\frac{dx^j}{ds}\frac{dx^k}{ds} = 0 \]

and substituting (42), we obtain

\[ \sum_i \Phi_{ki}\,\frac{d^2 x^i}{ds^2} + \frac{1-\alpha}{2} \sum_{i,j} \Phi_{kij}\,\frac{dx^i}{ds}\frac{dx^j}{ds} = 0 \longleftrightarrow \frac{d^2}{ds^2}\,\Phi_k\!\left(\frac{1-\alpha}{2}\,x\right) = 0. \]

So the auto-parallel curves of an α-Hessian manifold all have the form

\[ \Phi_k\!\left(\frac{1-\alpha}{2}\,x\right) = a_k s^{(\alpha)} + b_k, \]

where the scalar s is the arc length and a_k, b_k, k = 1, ..., n, are constant vectors (determined by a point and the direction along which the auto-parallel curve flows through it). For α = −1, the auto-parallel curves are given by u_k = Φ_k(x) = a_k s + b_k, so that the u_k are affine coordinates, as previously noted.

Related divergences and geometries. Note that the metric and conjugate connections in the forms (41) and (42) are induced from (40). Using the convex conjugate Φ̃ : Ṽ → R given by (36), we introduce the following family of divergence functions D̃_Φ̃^(α)(x, y), defined by

\[ \tilde{D}^{(\alpha)}_{\tilde{\Phi}}(x, y) \equiv D^{(\alpha)}_{\tilde{\Phi}}\bigl((\partial\Phi)(x), (\partial\Phi)(y)\bigr). \]

Explicitly written, this new family of divergence functions is

\[ \tilde{D}^{(\alpha)}_{\tilde{\Phi}}(x, y) = \frac{4}{1-\alpha^2}\left[\frac{1-\alpha}{2}\,\tilde{\Phi}(\partial\Phi(x)) + \frac{1+\alpha}{2}\,\tilde{\Phi}(\partial\Phi(y)) - \tilde{\Phi}\!\left(\frac{1-\alpha}{2}\,\partial\Phi(x) + \frac{1+\alpha}{2}\,\partial\Phi(y)\right)\right]. \]

Straightforward calculation shows that D̃_Φ̃^(α)(x, y) induces the α-Hessian structure {M, g, Γ^(−α), Γ^(α)}, where Γ^(∓α) are given by (42); that is, the pair of α-connections are themselves “conjugate” (in the sense of α ↔ −α) to those induced by D_Φ^(α)(x, y).
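Suppose that, instead of choosing x = [x^1, ..., x^n] as the local coordinates for the manifold M, we use its biorthogonal counterpart u = [u_1, ..., u_n], related to x via (38), to index points on M. Under this u-coordinate system, the divergence function D_Φ^(α) between the same two points on M becomes

\[ \tilde{D}^{(\alpha)}_{\Phi}(u, v) \equiv D^{(\alpha)}_{\Phi}\bigl((\partial\tilde{\Phi})(u), (\partial\tilde{\Phi})(v)\bigr). \]

Explicitly written,

\[ \tilde{D}^{(\alpha)}_{\Phi}(u, v) = \frac{4}{1-\alpha^2}\left[\frac{1-\alpha}{2}\,\Phi\bigl((\partial\Phi)^{-1}(u)\bigr) + \frac{1+\alpha}{2}\,\Phi\bigl((\partial\Phi)^{-1}(v)\bigr) - \Phi\!\left(\frac{1-\alpha}{2}\,(\partial\Phi)^{-1}(u) + \frac{1+\alpha}{2}\,(\partial\Phi)^{-1}(v)\right)\right]. \]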

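Proposition 11 (Zhang, 2004). The α-Hessian manifold {M, g(u), Γ^(α)(u), Γ^(−α)(u)} associated with D̃_Φ^(α)(u, v) is given by

\[ g^{ij}(u) = \tilde{\Phi}^{ij}(u), \tag{43} \]

\[ \Gamma^{(\alpha)ij,k}(u) = \frac{1+\alpha}{2}\,\tilde{\Phi}^{ijk}, \qquad \Gamma^{*(\alpha)ij,k}(u) = \frac{1-\alpha}{2}\,\tilde{\Phi}^{ijk}. \tag{44} \]

Here, Φ̃^{ij} and Φ̃^{ijk} denote, respectively, the second and third partial derivatives of Φ̃(u):

\[ \tilde{\Phi}^{ij}(u) = \frac{\partial^2 \tilde{\Phi}(u)}{\partial u_i\, \partial u_j}, \qquad \tilde{\Phi}^{ijk}(u) = \frac{\partial^3 \tilde{\Phi}(u)}{\partial u_i\, \partial u_j\, \partial u_k}. \]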
We remark that the same metric (43) and the same α-connections (44) are induced by D_Φ̃^(−α)(u, v) ≡ D_Φ̃^(α)(v, u); this follows as a simple application of the Eguchi relation.
An application of (23) gives rise to the following relations:

\[ \Gamma^{(\alpha)mn,l}(u) = -\sum_{i,j,k} g^{im}(u)\, g^{jn}(u)\, g^{kl}(u)\, \Gamma^{(-\alpha)}_{ij,k}(x), \]

\[ \Gamma^{*(\alpha)mn,l}(u) = -\sum_{i,j,k} g^{im}(u)\, g^{jn}(u)\, g^{kl}(u)\, \Gamma^{(\alpha)}_{ij,k}(x), \]

\[ R^{(\alpha)klmn}(u) = \sum_{i,j,\mu,\nu} g^{ik}(u)\, g^{jl}(u)\, g^{\mu m}(u)\, g^{\nu n}(u)\, R^{(\alpha)}_{ij\mu\nu}(x). \]

The volume form associated with Γ^(α) is

\[ \omega^{(\alpha)}(u) = \det[\tilde{\Phi}^{ij}(u)]^{\frac{1+\alpha}{2}}. \]

When α = ±1, D̃_Φ^(α)(u, v), as well as D̃_Φ̃^(α)(x, y) introduced earlier, take the form of the Bregman divergence (33). In this case, the manifold is dually flat, with Riemann curvature tensor R^(±1)_{ijμν}(x) = R^{(±1)klmn}(u) = 0.
We summarize the relations between the convex-based divergence functions and the geometry they generate in Table 1 below. The duality associated with α ↔ −α is called “referential duality”, whereas the duality associated with x ↔ u is called “representational duality” (Zhang, 2004; Zhang, 2006; Zhang, 2013).
Table 1

Divergence function                    Induced geometry

D_Φ^(α)(x, y)                          {Φ_ij(x), Γ^(α)(x), Γ^(−α)(x)}
D_Φ̃^(α)((∂Φ)(x), (∂Φ)(y))              {Φ_ij(x), Γ^(−α)(x), Γ^(α)(x)}
D_Φ̃^(α)(u, v)                          {Φ̃^ij(u), Γ^(−α)(u), Γ^(α)(u)}
D_Φ^(α)((∂Φ̃)(u), (∂Φ̃)(v))              {Φ̃^ij(u), Γ^(α)(u), Γ^(−α)(u)}

4.4 Induced symplectic and Kähler structures on M × M


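With respect to the DΦ-divergence (40), observe that

\[ \Phi\!\left(\frac{1-\alpha}{2}\,x + \frac{1+\alpha}{2}\,y\right) = \Phi\!\left(\left(\frac{1-\alpha}{4} + \frac{1+\alpha}{4\sqrt{-1}}\right) z + \left(\frac{1-\alpha}{4} - \frac{1+\alpha}{4\sqrt{-1}}\right) \bar{z}\right) \equiv \hat{\Phi}^{(\alpha)}(z, \bar{z}). \tag{45} \]

We then have

\[ \frac{\partial^2 \hat{\Phi}^{(\alpha)}}{\partial z^i\, \partial \bar{z}^j} = \frac{1+\alpha^2}{8}\,\Phi_{ij}\!\left(\left(\frac{1-\alpha}{4} + \frac{1+\alpha}{4\sqrt{-1}}\right) z + \left(\frac{1-\alpha}{4} - \frac{1+\alpha}{4\sqrt{-1}}\right) \bar{z}\right), \]

which is symmetric in i, j. Both (31) and (32) are satisfied. The symplectic form, under the complex coordinates, is given by

\[ \omega^{(\alpha)} = \Phi_{ij}\!\left(\frac{1-\alpha}{2}\,x + \frac{1+\alpha}{2}\,y\right) dx^i \wedge dy^j = \frac{4\sqrt{-1}}{1+\alpha^2}\,\frac{\partial^2 \hat{\Phi}^{(\alpha)}}{\partial z^i\, \partial \bar{z}^j}\, dz^i \wedge d\bar{z}^j, \]

and the line-element is given by

\[ ds^2 = \frac{8}{1+\alpha^2}\,\frac{\partial^2 \hat{\Phi}^{(\alpha)}}{\partial z^i\, \partial \bar{z}^j}\, dz^i \otimes d\bar{z}^j. \]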
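The factor (1 + α²)/8 is the squared modulus of the coefficient of z in (45), and the argument of Φ stays real after the change of coordinates. Both facts can be checked with Python's built-in complex numbers. The sketch below treats a single coordinate (n = 1) and uses hypothetical function names; it is an illustration, not part of the original derivation.

```python
def z_coefficient(alpha):
    """Coefficient of z in (45): (1 - alpha)/4 + (1 + alpha)/(4*sqrt(-1))."""
    return (1 - alpha) / 4 + (1 + alpha) / (4 * 1j)

def mixed_argument(x, y, alpha):
    """Real form of the argument of Phi: (1 - alpha)/2 * x + (1 + alpha)/2 * y."""
    return (1 - alpha) / 2 * x + (1 + alpha) / 2 * y

def complex_argument(x, y, alpha):
    """The same argument written as c*z + conj(c)*conj(z), with z = x + sqrt(-1)*y."""
    c = z_coefficient(alpha)
    z = complex(x, y)
    return c * z + c.conjugate() * z.conjugate()

alpha, x, y = 0.3, 0.7, -1.2
c = z_coefficient(alpha)
print(abs(c) ** 2, (1 + alpha ** 2) / 8)   # the two values should agree
print(complex_argument(x, y, alpha), mixed_argument(x, y, alpha))
```

Since the coefficient of z̄ is the complex conjugate of the coefficient of z, the mixed second derivative of Φ̂^(α) picks up exactly |c|² times Φ_{ij}.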
Proposition 12 (Zhang and Li, 2013). A smooth, strictly convex function Φ : dom(Φ) ⊂ M → R induces a family of Kähler structures (M, ω^(α), G^(α)) defined on dom(Φ) × dom(Φ) ⊂ M × M with
1. the symplectic form ω^(α) given by

\[ \omega^{(\alpha)} = \Phi^{(\alpha)}_{ij}\, dx^i \wedge dy^j, \]

which is compatible with the canonical almost complex structure J,

\[ \omega^{(\alpha)}(JX, JY) = \omega^{(\alpha)}(X, Y), \]

where X, Y are vector fields on dom(Φ) × dom(Φ);
2. the Riemannian metric G^(α), compatible with J and ω^(α) above, given by Φ^(α)_{ij}:

\[ G^{(\alpha)} = \Phi^{(\alpha)}_{ij}\,\bigl(dx^i\, dx^j + dy^i\, dy^j\bigr); \]

3. the Kähler structure

\[ ds^{2(\alpha)} = \Phi^{(\alpha)}_{ij}\, dz^i \otimes d\bar{z}^j = \frac{8}{1+\alpha^2}\,\frac{\partial^2 \hat{\Phi}^{(\alpha)}}{\partial z^i\, \partial \bar{z}^j}\, dz^i \otimes d\bar{z}^j, \]

with the Kähler potential given by

\[ \frac{2}{1+\alpha^2}\,\hat{\Phi}^{(\alpha)}(z, \bar{z}). \]

Here, Φ^(α)_{ij} = Φ_{ij}(((1−α)/2) x + ((1+α)/2) y).
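For the diagonal manifold ∆M = {(x, x) : x ∈ M}, a basis of its tangent space T_(x,x)∆M can be selected as

\[ e_i = \frac{1}{\sqrt{2}}\left(\frac{\partial}{\partial x^i} + \frac{\partial}{\partial y^i}\right). \]

The Riemannian metric on the diagonal, induced from G^(α), is

\[
\begin{aligned}
G^{(\alpha)}(e_i, e_j)\big|_{x=y} &= \langle G^{(\alpha)}, e_i \otimes e_j \rangle \\
&= \left\langle \Phi^{(\alpha)}_{kl}\,\bigl(dx^k \otimes dx^l + dy^k \otimes dy^l\bigr),\ \frac{1}{\sqrt{2}}\left(\frac{\partial}{\partial x^i} + \frac{\partial}{\partial y^i}\right) \otimes \frac{1}{\sqrt{2}}\left(\frac{\partial}{\partial x^j} + \frac{\partial}{\partial y^j}\right) \right\rangle \\
&= \Phi^{(\alpha)}_{ij}(x, x) = \Phi_{ij}(x),
\end{aligned}
\]

where ⟨α, a⟩ denotes a form α operating on a tensor field a. Therefore, restricting to the diagonal ∆M, G^(α) reduces to the Riemannian metric induced by the divergence D_Φ^(α) through the Eguchi method.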
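We next calculate the Levi-Civita connection Γ̃ associated with G^(α). Denote x^{i′} = y^i, and write

\[ \tilde{\Gamma}_{i'jk'} = G^{(\alpha)}\!\left(\nabla_{\frac{\partial}{\partial x^{i'}}} \frac{\partial}{\partial x^j}, \frac{\partial}{\partial x^{k'}}\right) = G^{(\alpha)}\!\left(\nabla_{\frac{\partial}{\partial y^i}} \frac{\partial}{\partial x^j}, \frac{\partial}{\partial y^k}\right), \]

and so on. The Levi-Civita connection on M × M is

\[
\begin{aligned}
\tilde{\Gamma}_{ijk} &= \frac{1}{2}\left(\frac{\partial G^{(\alpha)}_{ik}}{\partial x^j} + \frac{\partial G^{(\alpha)}_{jk}}{\partial x^i} - \frac{\partial G^{(\alpha)}_{ij}}{\partial x^k}\right) = \frac{1-\alpha}{4}\,\Phi^{(\alpha)}_{ijk}, \\
\tilde{\Gamma}_{ijk'} &= \frac{1}{2}\left(\frac{\partial G^{(\alpha)}_{ik'}}{\partial x^j} + \frac{\partial G^{(\alpha)}_{jk'}}{\partial x^i} - \frac{\partial G^{(\alpha)}_{ij}}{\partial x^{k'}}\right) = -\frac{1+\alpha}{4}\,\Phi^{(\alpha)}_{ijk}, \\
\tilde{\Gamma}_{i'jk'} = \tilde{\Gamma}_{ij'k'} &= \frac{1}{2}\left(\frac{\partial G^{(\alpha)}_{ik'}}{\partial x^{j'}} + \frac{\partial G^{(\alpha)}_{j'k'}}{\partial x^i} - \frac{\partial G^{(\alpha)}_{ij'}}{\partial x^{k'}}\right) = \frac{1-\alpha}{4}\,\Phi^{(\alpha)}_{ijk}, \\
\tilde{\Gamma}_{i'jk} = \tilde{\Gamma}_{ij'k} &= \frac{1}{2}\left(\frac{\partial G^{(\alpha)}_{ik}}{\partial x^{j'}} + \frac{\partial G^{(\alpha)}_{j'k}}{\partial x^i} - \frac{\partial G^{(\alpha)}_{ij'}}{\partial x^k}\right) = \frac{1+\alpha}{4}\,\Phi^{(\alpha)}_{ijk}, \\
\tilde{\Gamma}_{i'j'k} &= \frac{1}{2}\left(\frac{\partial G^{(\alpha)}_{i'k}}{\partial x^{j'}} + \frac{\partial G^{(\alpha)}_{j'k}}{\partial x^{i'}} - \frac{\partial G^{(\alpha)}_{i'j'}}{\partial x^k}\right) = -\frac{1-\alpha}{4}\,\Phi^{(\alpha)}_{ijk}, \\
\tilde{\Gamma}_{i'j'k'} &= \frac{1}{2}\left(\frac{\partial G^{(\alpha)}_{i'k'}}{\partial x^{j'}} + \frac{\partial G^{(\alpha)}_{j'k'}}{\partial x^{i'}} - \frac{\partial G^{(\alpha)}_{i'j'}}{\partial x^{k'}}\right) = \frac{1+\alpha}{4}\,\Phi^{(\alpha)}_{ijk}.
\end{aligned}
\]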

5 Summary
In order to construct divergence functions in a principled way, this Chapter
considered the various geometric structures on the underlying manifold M in-
duced from a divergence function. Among the geometric structures considered
are: statistical structure (Riemannian metric with a pair of torsion-free dual
connections, or by simple construction, a family of α-connections), equiaffine
structure (those connections that admit parallel volume forms), and Hessian
structure (those connections that are dually flat) — they are progressively more
restrictive: while any divergence function will induce a statistical manifold, only
canonical divergence (i.e., Bregman divergence) will induce a Hessian manifold.
Lying in-between these extremes is the equiaffine α-Hessian geometry induced
from, say, the class of DΦ-divergence. The α-Hessian structure has the advantage of admitting biorthogonal coordinates, induced from the convex function Φ and its conjugate, which are convenient for computation. It should be noted that
the above geometric structures, from statistical to Hessian, are all induced on
the tangent bundle T M of the manifold M on which the divergence function is
defined.
On the cotangent bundle T ∗ M side, a divergence function can be viewed as a
generating function for a symplectic structure on M×M that can be constructed
in a “canonical” way. This imposes a “properness” condition on the divergence function, stating that the mixed second derivatives of D(x, y) with respect to x and y
must commute. For such divergence functions, a Riemannian structure on M×M
can be constructed, which can be seen as an extension of the Riemannian struc-
ture on ∆M ⊂ M × M. If a further condition on D is imposed, then M × M
may be complexified, so it becomes a Kähler manifold. It was shown that DΦ -
divergence (Zhang, 2004) satisfies this Kählerian condition, in addition to itself
being proper — the Kähler potential is simply given by the real-valued convex
function Φ. These properties, along with the α-Hessian structure it induces on the tangent bundle, make DΦ a class of divergence functions that enjoys a special role with the “nicest” geometric properties, extending the canonical (Bregman) divergence for dually flat manifolds. This has implications for machine learning, convex optimization, geometric mechanics, etc.

References

1. Amari, S. (1985). Differential Geometric Methods in Statistics (Lecture Notes in Statistics, 28), Springer-Verlag, New York. Reprinted in 1990.
2. Amari, S. and Nagaoka, H. (2000). Methods of Information Geometry, AMS Monograph, Oxford University Press.
3. Barndorff-Nielsen, O.E. and Jupp, P.E. (1997). Yokes and symplectic structures. Journal of Statistical Planning and Inference, 63, 133-146.
4. Bregman, L. M. (1967). The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7, 200-217.
5. Calin, O., Matsuzoe, H., and Zhang, J. (2009). Generalizations of conjugate connections. Trends in Differential Geometry, Complex Analysis and Mathematical Physics: Proceedings of the 9th International Workshop on Complex Structures and Vector Fields (pp. 24-34).
6. Csiszár, I. (1967). On topological properties of f-divergences. Studia Scientiarum Mathematicarum Hungarica, 2, 329-339.
7. Dombrowski, P. (1962). On the geometry of the tangent bundle. Journal für die reine und angewandte Mathematik, 210, 73-88.
8. Eguchi, S. (1983). Second order efficiency of minimum contrast estimators in a
curved exponential family. Annals of Statistics, 11, 793-803.
9. Eguchi, S. (1992). Geometry of minimum contrast. Hiroshima Math. J. 22, 631-
647.
10. Lauritzen, S. (1987). Statistical manifolds. In Amari, S., Barndorff-Nielsen, O.,
Kass, R., Lauritzen, S., and Rao, C.R. (Eds) Differential Geometry in Statistical
Inference, IMS Lecture Notes, Vol. 10, Hayward, CA (pp. 163-216).
11. Matsuzoe H. (1998). On realization of conformally-projectively flat statistical man-
ifolds and the divergences. Hokkaido Math. J. 27, 409-421.
12. Matsuzoe, H. and Inoguchi, J. (2003). Statistical structures on tangent bundles.
Applied Sciences, 5, 55-75.
13. Matsuzoe, H., Takeuchi, J., and Amari. S (2006). Equiaffine structures on statistical
manifolds and Bayesian statistics. Differential Geometry and its Applications, 24,
567-578.
14. Matsuzoe, H. (2009). Computational geometry from the viewpoint of affine dif-
ferential geometry. In Nielsen, F. (Ed.) Emerging Trends in Visual Computing.
Springer Berlin Heidelberg, (pp. 103-123).
15. Matsuzoe, H. (2010). Statistical manifolds and affine differential geometry, Advanced Studies in Pure Mathematics, 57, 303-321.
16. Nomizu, K. and Sasaki, T. (1994). Affine Differential Geometry – Geometry of
Affine Immersions, Cambridge University Press.
17. Ohara, A., Matsuzoe, H. and Amari, S. (2012). Conformal geometry of escort
probability and its applications. Modern Physics Letters B, 26, 1250063.
18. Shima, H. (2001). Hessian Geometry, Shokabo, Tokyo (in Japanese).
19. Shima, H. and Yagi, K. (1997). Geometry of Hessian manifolds. Differential Ge-
ometry and its Applications, 7, 277-290.
20. Simon, U. (2000). Affine differential geometry. In Dillen, F. and Verstraelen, L. (Eds) Handbook of Differential Geometry, vol. I, Elsevier Science (pp. 905-961).
21. Uohashi, K. (2002). On α-conformal equivalence of statistical manifolds. Journal
of Geometry, 75: 179-184.
22. Yano, K. and Ishihara, S. (1973). Tangent and Cotangent Bundles: Differential
Geometry. Vol. 16. New York: Dekker.
23. Zhang, J. (2004). Divergence function, duality, and convex analysis. Neural Com-
putation, 16, 159-195.
24. Zhang, J. (2006). Referential duality and representational duality on statistical
manifolds. Proceedings of the Second International Symposium on Information Ge-
ometry and Its Applications, Tokyo (pp 58-67).
25. Zhang, J. (2007). A note on curvature of α-connections on a statistical manifold.
Annals of Institute of Statistical Mathematics, 59, 161-170.
26. Zhang, J. and Matsuzoe, H. (2009). Dualistic differential geometry associated with
a convex function. In Gao D.Y. and Sherali, H.D. (Eds) Advances in Applied Math-
ematics and Global Optimization (Dedicated to Gilbert Strang on the Occasion of
His 70th Birthday), Advances in Mechanics and Mathematics, Vol. III, Chapter
13, Springer (pp 439-466).
27. Zhang, J. (2013). Nonparametric information geometry: From divergence function
to referential-representational biduality on statistical manifolds. Entropy (in press).
28. Zhang, J. and Li, F. (2013). Symplectic and Kähler structures on statistical manifolds induced from divergence functions. In Nielsen, F. and Barbaresco, F. (Eds) Proceedings of the First International Conference on Geometric Science of Information (GSI2013), (pp. 595-603).
