

From
geometric learning machines
to the
geometry of AI
Frank Nielsen
@FrnkNlsn
21st November 2019
Outline of this talk: computational geometry for ML!

• Introduction to some famous geometric learning machines


• Unsupervised learning: Riemannian manifold learning
• Supervised learning:
• Kernel machines and Hilbert geometry (RKHS)
• Deep learning and trajectories on neuromanifolds
• Information geometry and information projections:
• The dualistic geometric structures
• Geometry of interpolation machines:
• Double descent learning curves
Non-linear versus linear dimension reduction
• Many non-linear techniques:
• LLE,
• ISOMAP
• etc.
• In very high dimensions, use linear random projections (to cope with the curse
of dimensionality): Johnson-Lindenstrauss theorem (1984); see the sketch below.

Introduction to HPC with MPI for Data Science, Springer, 2016
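As a concrete illustration of the linear route, here is a minimal sketch (not from the slides) of a Johnson-Lindenstrauss-style random projection in NumPy; the target dimension k depends only on the number of points n and the distortion ε, not on the ambient dimension d.

```python
# Sketch: Gaussian random projection in the spirit of the Johnson-Lindenstrauss
# lemma; pairwise distances are preserved up to a (1 +/- eps) factor with high
# probability when k is on the order of log(n) / eps^2.
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 1000, 10_000, 0.25
k = int(np.ceil(4 * np.log(n) / eps**2))      # classical JL target dimension

X = rng.standard_normal((n, d))               # high-dimensional point cloud
R = rng.standard_normal((d, k)) / np.sqrt(k)  # random linear map, scaled
Y = X @ R                                     # low-dimensional embedding

i, j = 0, 1                                   # spot-check one pair of points
ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
print(f"distance ratio after projection: {ratio:.3f}")
```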


Riemannian manifolds: Extrinsic view vs intrinsic view
Visualized extrinsically as smooth surfaces of the ambient Euclidean
space: isometric embedding theorem (Nash embedding theorem)
Isometric embedding: extrinsic geometry

Manifold learning/reconstruction from data points (Swiss roll): intrinsic geometry

Intrinsic geometry versus extrinsic isometric embedding


Kernel Support Vector Machines (SVMs)
Linear separator Non-linear separator
Support vector

Feature map

Inner product
(Hilbert space)

Kernel machines
1992
1970 (RKHS geometry)
SVM: the dual quadratic program amounts to solving for a
smallest enclosing ball (SEB): computational geometry!

Smallest enclosing ball: the smallest ball with respect to radius or set inclusion
Approximating Smallest Enclosing Balls with Applications to Machine Learning, IJGA 2009
Coresets for Smallest Enclosing Balls and Core VMs
• Definition: a coreset C ⊆ P is a subset such that the smallest enclosing ball of C, scaled by a factor (1+ε), covers P [BC 2002]

The coreset size is independent of the input size n and of the dimension d: it depends only on ε.
(Selected points shown inside boxes.)

Applies to finite point sets in infinite-dimensional spaces too!


-> Kernel machines with Core Vector Machines
A note on kernelizing the smallest enclosing ball for machine learning, 2017
Introduction to HPC with MPI for Data Science, Springer, 2016
Computing a coreset for the smallest enclosing ball
An extremely simple algorithm!
#iterations: O(1/ε²) (independent of n and d)

Running time: O(dn/ε²)

On approximating the Riemannian 1-center. Comput. Geom. 46(1): 93-104 (2013)


Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry. GSI 2015
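A minimal sketch of the simple iterative scheme alluded to above (in the style of Bădoiu-Clarkson), assuming Euclidean points stored in a NumPy array: the center repeatedly walks towards the current farthest point with a 1/(t+1) step, and roughly 1/ε² iterations yield a (1+ε)-approximation of the smallest enclosing ball.

```python
# Sketch of the iterative smallest-enclosing-ball approximation:
# move the center a 1/(t+1) fraction towards the current farthest point.
import numpy as np

def approx_seb(points: np.ndarray, eps: float = 0.05):
    c = points[0].copy()                            # arbitrary initial center
    for t in range(1, int(np.ceil(1.0 / eps**2)) + 1):
        farthest = points[np.argmax(np.linalg.norm(points - c, axis=1))]
        c += (farthest - c) / (t + 1)               # shrinking step size
    radius = np.linalg.norm(points - c, axis=1).max()
    return c, radius

pts = np.random.default_rng(1).standard_normal((500, 3))
center, radius = approx_seb(pts, eps=0.05)
print(center, radius)
```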
Approximating the kernelized minimum enclosing ball
Kernel with feature map
(D may be infinite)
Trick: Encode implicitly the circumcenter of the enclosing ball as a
convex combination of the data points:

Update weights iteratively:


Index of the current farthest point
Applications: Support Vector Data Description (SVDD)
A note on kernelizing the smallest enclosing ball for machine learning, 2017
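The weight update described above can be sketched as follows (a Gaussian kernel is used here purely for illustration): the circumcenter is never materialized, only the convex weights α over the data points are updated, and all distances are evaluated through the Gram matrix.

```python
# Sketch of the kernelized center update: c = sum_i alpha_i phi(x_i) is kept
# implicitly; squared feature-space distances are computed from the Gram matrix.
import numpy as np

def gaussian_gram(X, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma**2))

def kernel_meb_weights(K: np.ndarray, eps: float = 0.05):
    n = K.shape[0]
    alpha = np.zeros(n); alpha[0] = 1.0            # center starts at phi(x_0)
    for t in range(1, int(np.ceil(1.0 / eps**2)) + 1):
        d2 = np.diag(K) - 2 * K @ alpha + alpha @ K @ alpha  # ||phi(x_j) - c||^2
        m = int(np.argmax(d2))                     # index of current farthest point
        alpha *= t / (t + 1)                       # shrink all weights ...
        alpha[m] += 1 / (t + 1)                    # ... and move towards phi(x_m)
    return alpha                                   # convex weights encoding the center

X = np.random.default_rng(2).standard_normal((200, 2))
alpha = kernel_meb_weights(gaussian_gram(X))
print(alpha.sum())                                 # remains 1: a convex combination
```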
The 1-layer perceptron: linear separator machine
Frank Rosenblatt:
Principles of Neurodynamics, 1962

Marvin Minsky and Seymour Papert:


Perceptrons: An Introduction to Computational Geometry, 1969
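For completeness, a minimal sketch (not Rosenblatt's original notation) of the perceptron update rule that learns such a linear separator: misclassified points pull the weight vector towards their correct side.

```python
# Sketch of the perceptron learning rule for labels y in {-1, +1}.
import numpy as np

def perceptron(X, y, epochs=100, lr=1.0):
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append a constant bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:                 # misclassified (or on the boundary)
                w += lr * yi * xi                  # shift the separator towards xi
    return w

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.5], [3.0, 3.0]])
y = np.array([-1, -1, 1, 1])
print(perceptron(X, y))                            # weights and bias of the separator
```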
Stochastic MLPs and neuromanifolds
Neurodynamics = learning trajectory on the manifold

Parameter space

Hidden layers: universal function approximators (post XOR)


Supervised learning: gradient descent + backpropagation
Global geometric objects (a smooth manifold M) versus local descriptions (M in local chart coordinates).
A manifold is locally Euclidean: each chart is a homeomorphism Φ_U : U → R^n, an atlas is a set of charts, and overlapping charts are glued by transition maps Φ_UV = Φ_V ∘ Φ_U^{-1} on Φ_U(U ∩ V).
(Compare with UV mapping in computer graphics.)
Fisher information matrix (FIM)
$$g_{ij}(\xi) = E_\xi\!\left[\frac{\partial}{\partial \xi^i}\log p_\xi \; \frac{\partial}{\partial \xi^j}\log p_\xi\right] = \int \frac{\partial}{\partial \xi^i}\log p_\xi(x)\, \frac{\partial}{\partial \xi^j}\log p_\xi(x)\; p_\xi(x)\,\mathrm{d}x$$

The FIM is positive semi-definite, and positive definite for regular models.
(Sir Ronald Fisher, 1922)
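A small numerical illustration (a sketch, not from the slides) of the definition above: Monte Carlo estimation of the FIM of a univariate normal N(μ, σ²), compared with the known closed form diag(1/σ², 2/σ²).

```python
# Estimate g(xi) = E[ d_i log p * d_j log p ] by averaging outer products of the score.
import numpy as np

mu, sigma = 0.5, 2.0
x = np.random.default_rng(3).normal(mu, sigma, 1_000_000)

score_mu = (x - mu) / sigma**2                        # d/dmu log p(x)
score_sigma = ((x - mu) ** 2 - sigma**2) / sigma**3   # d/dsigma log p(x)
S = np.stack([score_mu, score_sigma])

print(S @ S.T / len(x))                               # ~ [[0.25, 0], [0, 0.5]]
print(np.diag([1 / sigma**2, 2 / sigma**2]))          # exact FIM
```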
Fisher-Rao geometry and geodesic distance
Fisher information metric (FIM)

invariant under reparameterization of θ

Riemannian geodesics locally minimize lengths
Riemannian geometry of normal distributions
equipped with FIM: hyperbolic geometry

Metric tensor: measures vector lengths and angles on the tangent space

Pseudo-sphere (constant negative curvature -1/2)

Pattern recognition in nuclear fusion data by means of geometric methods in probabilistic spaces, 2017
Cramér-Rao lower bound: inverse of the Fisher information
Löwner partial ordering on positive-semi-definite matrices:

CRLB theorem: the accuracy of an unbiased estimator is lower-bounded (in the Löwner ordering) by the inverse Fisher information, a bound that depends on the true parameter. Under regularity conditions:

$$V_\theta[\hat{\theta}(X)] \succeq I^{-1}(\theta)$$

Cramer-Rao lower bound and information geometry. Connected at Infinity II, 2013.
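A quick numerical illustration (a sketch under the stated normal model): for estimating the mean of N(μ, σ²) from n i.i.d. samples, the sample mean attains the Cramér-Rao bound σ²/n.

```python
# Compare the empirical variance of the sample-mean estimator with the CRLB sigma^2 / n.
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, trials = 1.0, 2.0, 50, 20_000
estimates = rng.normal(mu, sigma, (trials, n)).mean(axis=1)   # MLE of mu per trial
print(estimates.var(), sigma**2 / n)                          # close to the lower bound
```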
Calculating Rao's distance is often intractable

• Need to solve the Ordinary Differential Equation (ODE) to find the geodesic
(but trivial in 1D):

• Need to integrate the infinitesimal length elements along the geodesics…

No closed-form Fisher-Rao distance between multivariate normals!


-> geodesic shooting (BVP: boundary value problem, IVP: initial value problem)
Using the Fisher Information Matrix without geodesics?
Ordinary steepest gradient descent method
• Iterative optimization algorithm
• Start from an initial parameter value
• Update iteratively the current parameter using a learning rate α (step
size) and the gradient of the energy function:

• First-order optimization method


• Zig-zag local minimum convergence
• Stopping criterion

Similarly, maximization with hill climbing, steepest ascent


Steepest descent in a Riemannian space:
The natural gradient
• The steepest descent direction of E(θ) in a Riemannian space is given by the natural gradient, yielding the update $\theta_{t+1} = \theta_t - \alpha\, g^{-1}(\theta_t)\, \nabla E(\theta_t)$, where α is the learning rate

Type checking: the natural gradient is the contravariant form of the ordinary gradient

Computing the inverse of the Fisher information matrix is tricky!


Amari, Shun-Ichi. "Natural gradient works efficiently in learning." Neural computation 10.2 (1998): 251-276.
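A hedged sketch of one natural-gradient step, θ ← θ - α g(θ)⁻¹ ∇E(θ): the Fisher matrix is passed in as a callable since its form depends on the model, and a small damping term is added for numerical stability (an assumption of this sketch, not prescribed by the slide).

```python
# Natural-gradient updates on a toy quadratic energy with a fixed (made-up) metric.
import numpy as np

def natural_gradient_step(theta, grad_E, fisher, alpha=0.1, damping=1e-6):
    F = fisher(theta)
    nat_grad = np.linalg.solve(F + damping * np.eye(len(theta)), grad_E(theta))
    return theta - alpha * nat_grad

grad_E = lambda th: 2 * th                                 # E(theta) = ||theta||^2
fisher = lambda th: np.array([[4.0, 0.0], [0.0, 0.25]])    # illustrative constant metric
theta = np.array([1.0, 1.0])
for _ in range(100):
    theta = natural_gradient_step(theta, grad_E, fisher)
print(theta)                                               # heads towards the minimum at 0
```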
Pros and cons of natural gradient
• Pros:
• Invariant (intrinsic) gradient (at infinitesimal scale/ODE)
• Not trapped in plateaus (close to degenerate FIM)
• Achieve Fisher efficiency in online learning

• Cons:
• Too expensive to compute (no closed-form FIM; need matrix inversion;
numerical stability) -> Other Riemannian metrics studied
• Degenerate for irregular models (e.g., hierarchical models, Deep learning)
• Need to adapt the step size
Relative Fisher Information Matrix (RFIM) and
Relative Natural Gradient (RNG) for deep learning

Relative Fisher IM:

Dynamic
geometry!

The RFIMs of single-neuron models, a linear layer, a non-linear layer, a soft-max layer, and two consecutive layers all have simple closed-form expressions
Relative Fisher Information and Natural Gradient for Learning Large Modular Models (ICML'17)
Dualistic structures
of
Information geometry

and
Information projections
(may seem at first counterintuitive)
An elementary introduction to information geometry, arXiv:1808.08271
An essential concept: the affine connection ∇
• Defines how to “parallel transport” a vector from one tangent plane to another tangent plane by infinitesimally parallel-shifting it along a curve (thus it generally depends on the curve)
• ∇-geodesics = autoparallel curves
(Figure: parallel transport of V(p) to V(q) along a curve γ, using ∇_γ̇.)
Élie Joseph Cartan (1869-1951)
https://arxiv.org/abs/1808.08271
Curvature of a connection ∇ (osculating circle, sectional curvatures)

A cylinder is flat: parallel transport is path-independent.
A sphere has constant positive curvature: parallel transport is path-dependent.
Dual exponential/mixture affine connections
Historically, the e-connection (exponential) and the
m-connection (mixture) were built for statistical models
Log-likelihood

e-connection

m-connection
DUAL CONNECTIONS with respect to the Fisher information (Riemannian) metric (the distances may not be needed here)
Dualistic structure of the Gaussian manifold
∇: e-connection, flat
∇*: m-connection, flat
Dually flat space! (both connections flat)

m-geodesic

e-geodesic
In a dually flat space, a dual Pythagoras' theorem holds

Bregman
manifold
induced by a
convex function

Two (affine) coordinate systems coupled by Legendre-Fenchel transformation


Two dually flat connections with respect to the metric tensor
Canonical distance = Bregman divergence induced by convex generator F
Generalizes Euclidean space; very practical for computing!
Geodesic triangles in Bregman manifolds

3 vertices define 6 geodesic edges, from which 8 geodesic triangles can be built, defining 18 interior angles

Geodesic triangle with two right angles.
Geometry induced by dual convex potentials (the geometry is NOT conformal).
https://arxiv.org/abs/1910.03935
Recalling projections in Euclidean geometry….
Orthogonality and uniqueness of projection

Proof using Pythagoras' theorem and distance minimization.

Guaranteed unique projection versus non-unique projection


On uniqueness of projections in dually flat spaces
Projection to a submanifold
with respect to a connection

In dually flat spaces, projections are unique when the submanifold is flat with respect to the dual connection: the e-projection onto an m-flat submanifold is unique, and the m-projection onto an e-flat submanifold is unique.

Maximum likelihood estimator
for an exponential family as
an information m-projection
Exponential Family Manifold (EFM) is e-flat
Observed point

KL = relative entropy

What is an information projection?, Notices AMS 65.3 (2018)


MaxEnt as an information e-projection
• MaxEnt linear constraints define an m-flat submanifold

Pythagoras’ theorem (Fisher orthogonality)

What is an information projection?, Notices AMS 65.3 (2018)


The geometric framework interprets MLE and MaxEnt as KL divergence minimizations, i.e., as information projections (and yields uniqueness proofs):
• MaxEnt (with prior q): e-projection onto an m-flat
• Maximum Likelihood Estimate: m-projection onto an e-flat
Divergences: Statistical (oriented) distances or
smooth parametric distances
• In information theory, the relative entropy is called the Kullback-Leibler divergence (KLD)

• The KLD can be extended to f-divergences: $D_f(p:q) = \int p(x)\, f\!\left(\tfrac{q(x)}{p(x)}\right) \mathrm{d}x$

• Plug f(u) = -log(u) and the f-divergence is the KLD


Classes of distances: Csiszar’s f-divergence
• Function f convex, strictly convex at 1, with f(1)=0

On the chi square and higher-order chi distances for approximating f-divergences, IEEE SPL 2013
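A minimal sketch of a discrete Csiszár f-divergence, checking that the generator f(u) = -log(u) recovers the Kullback-Leibler divergence as stated above (discrete distributions chosen for illustration).

```python
# D_f(p:q) = sum_x p(x) f(q(x)/p(x)) for a convex generator f with f(1) = 0.
import numpy as np

def f_divergence(p, q, f):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * f(q / p)))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])
kl = f_divergence(p, q, lambda u: -np.log(u))
print(kl, float(np.sum(p * np.log(p / q))))        # both equal KL(p:q)
```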
The Fisher information matrix and f-divergences

• Fisher Information Metric (FIM): Fisher Riemannian metric

• Infinitesimally, the Kullback-Leibler divergence (and any f-divergence) is related to the FIM: $D_f(p_\theta : p_{\theta + \mathrm{d}\theta}) \approx \tfrac{f''(1)}{2}\, \mathrm{d}\theta^\top I(\theta)\, \mathrm{d}\theta$


Invariant divergence = f-divergences
• Lump or coarse-bin a separable distance, and ask for
information monotonicity

Theorem: the only monotone separable divergences are f-divergences
(except for the curious case of binary alphabets).
f-divergences are invariant under diffeomorphisms of the sample space
Dual connections from any divergence! $(M, {}^D g, {}^D\nabla, {}^D\nabla^{*})$

Any smooth parametric distance, called a (parameter) divergence D (not necessarily symmetric), induces:

• a metric tensor g:
$${}^D g_{ij}(\xi) = -\left.\frac{\partial}{\partial \xi_1^i} \frac{\partial}{\partial \xi_2^j}\, D(p_{\xi_1} : p_{\xi_2})\right|_{\xi_1=\xi_2=\xi}$$

• a torsion-less affine connection ∇:
$${}^D\Gamma_{ij,k}(\xi) = -\left.\frac{\partial}{\partial \xi_1^i} \frac{\partial}{\partial \xi_1^j} \frac{\partial}{\partial \xi_2^k}\, D(p_{\xi_1} : p_{\xi_2})\right|_{\xi_1=\xi_2=\xi}$$
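A numerical illustration (sketch) of this construction: applying the mixed-derivative formula to the KL divergence between Bernoulli distributions recovers the Fisher information 1/(θ(1-θ)).

```python
# Induced metric g(theta) = -d^2/(d theta1 d theta2) D(theta1 : theta2) at theta1=theta2,
# evaluated here by central finite differences on the Bernoulli KL divergence.
import numpy as np

def kl_bernoulli(a, b):
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

def induced_metric(D, theta, h=1e-4):
    return -(D(theta + h, theta + h) - D(theta + h, theta - h)
             - D(theta - h, theta + h) + D(theta - h, theta - h)) / (4 * h**2)

theta = 0.3
print(induced_metric(kl_bernoulli, theta))         # ~ 4.76
print(1 / (theta * (1 - theta)))                   # Fisher information of Bernoulli(theta)
```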
Dual divergences $D^{*}(p_{\xi_1} : p_{\xi_2}) = D(p_{\xi_2} : p_{\xi_1})$ induce dual connections.
Symmetric divergences yield the same connection: the Levi-Civita connection.
Which geometry is best suited for
clustering normalized histograms?
Bag of words
Fisher-Rao geometry of the
categorical distribution (standard simplex)
• Trinomial (trinoulli)
Fisher information metric:

Square-root embedding onto the positive orthant of the sphere.

(Hotelling-)Fisher-Rao distance (see the sketch below):
$$\rho(p, q) = 2 \arccos\!\Big(\sum_i \sqrt{p_i q_i}\Big)$$

Pattern Learning and Recognition on Statistical Manifolds: An Information-Geometric Review, SIMBAD 2013
Clustering in Hilbert simplex geometry, arXiv:1704.00454
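A minimal sketch of the square-root embedding and the resulting Fisher-Rao distance on the categorical simplex, as recalled above.

```python
# Fisher-Rao (Hotelling) distance between categorical distributions p and q:
# map to the sphere via sqrt and take twice the great-circle arc length.
import numpy as np

def fisher_rao_categorical(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    cos_angle = np.clip(np.sqrt(p * q).sum(), -1.0, 1.0)  # <sqrt(p), sqrt(q)>
    return 2.0 * np.arccos(cos_angle)

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])
print(fisher_rao_categorical(p, q))
```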
Hilbert log cross-ratio metric

Geodesics are straight lines but not unique

Finsler
geometry!

Clustering in Hilbert simplex geometry. CoRR abs/1704.00454 (2017)
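A minimal sketch of Hilbert's log cross-ratio (projective) distance restricted to the open probability simplex, the metric used in the clustering experiments cited above.

```python
# Hilbert simplex distance: log of the ratio between the largest and smallest
# coordinate-wise ratios p_i / q_i; it vanishes iff p == q and is symmetric.
import numpy as np

def hilbert_simplex_distance(p, q):
    ratio = np.asarray(p, float) / np.asarray(q, float)
    return float(np.log(ratio.max() / ratio.min()))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])
print(hilbert_simplex_distance(p, q), hilbert_simplex_distance(q, p))
```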


Hilbert log cross-ratio metric for the standard simplex:
experiments with k-means clustering

Clustering in Hilbert simplex geometry. CoRR abs/1704.00454 (2017)


Sailing on a sea of distances:
• Which distance is suitable?
• Which loss function to minimize?
• Which “metric” to evaluate?
•…
• Are there first principles?
Classes of distances: Bregman divergence
• Bregman divergence between parameters, for a strictly convex and differentiable generator F (geometric design):
$$B_F(\theta_1 : \theta_2) = F(\theta_1) - F(\theta_2) - (\theta_1 - \theta_2)^\top \nabla F(\theta_2)$$

• Unifies squared Euclidean geometry and the geometry of information theory

• The canonical divergence of dually flat spaces


Mining matrix data with Bregman matrix divergences for portfolio selection. Matrix Information Geometry, Springer, Berlin, Heidelberg, 2013, pp. 373-402.
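A minimal sketch of the Bregman divergence above, instantiated with two classical generators: F(x) = ½‖x‖² gives half the squared Euclidean distance, and the negative Shannon entropy (on the simplex) gives the Kullback-Leibler divergence.

```python
# Generic Bregman divergence plus two standard generators.
import numpy as np

def bregman(F, gradF, x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(F(x) - F(y) - (x - y) @ gradF(y))

sq = bregman(lambda v: 0.5 * v @ v, lambda v: v, [1.0, 2.0], [0.0, 0.0])   # 0.5 * ||x - y||^2

p, q = np.array([0.2, 0.5, 0.3]), np.array([0.1, 0.6, 0.3])
kl = bregman(lambda v: np.sum(v * np.log(v)), lambda v: np.log(v) + 1, p, q)
print(sq, kl, float(np.sum(p * np.log(p / q))))    # kl matches KL(p:q) on the simplex
```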
Jensen difference / Jensen divergence (Burbea-Rao) (geometric design)
• Introduced by Burbea and Rao
• Vertical gap induced by Jensen's inequality

Asymptotically, the scaled (skewed) Jensen divergence amounts to a Bregman or reverse Bregman divergence

The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory 57.8 (2011): 5455-5466.
Bregman chord divergence: https://arxiv.org/abs/1810.09113
A family of statistical symmetric divergences based on Jensen's inequality, arXiv:1009.4004
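A minimal sketch of the skewed Jensen (Burbea-Rao) divergence J_F^α(p:q) = αF(p) + (1-α)F(q) - F(αp + (1-α)q), with a numerical check that J_F^α/α tends to a Bregman divergence as α → 0 (the negative Shannon entropy is chosen as the generator for illustration).

```python
# Skewed Jensen divergence and its Bregman limit for a small skew parameter.
import numpy as np

F = lambda v: np.sum(v * np.log(v))                # negative Shannon entropy
gradF = lambda v: np.log(v) + 1

def jensen(p, q, a=0.5):
    return a * F(p) + (1 - a) * F(q) - F(a * p + (1 - a) * q)

def bregman(p, q):
    return F(p) - F(q) - (p - q) @ gradF(q)

p, q = np.array([0.2, 0.5, 0.3]), np.array([0.1, 0.6, 0.3])
for a in (0.5, 0.1, 0.01, 0.001):
    print(a, jensen(p, q, a) / a)                  # approaches the Bregman divergence
print(bregman(p, q))                               # equals KL(p:q) for this generator
```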
Classical wisdom of machine learning:
The bias-variance tradeoff…

We used to think: do not overfit, for better generalization!
(Hold out a test sample for evaluating the generalization error.)
Modern view of machine learning:
The age of Interpolation machines!
• OK, let us aim for zero training error! (Gaussian processes or neural networks)

Neural networks perform well even when overparameterized

• But to get good models, let us apply Occam's razor and choose the smoothest interpolating function

Deep Neural Networks as Gaussian Processes, https://arxiv.org/abs/1711.00165


Modern view of machine learning:
The double descent view/regime of models

Semi-Riemannian geometry of the neuromanifold and learning trajectories

Reconciling modern machine learning practice and the bias-variance trade-off, https://arxiv.org/abs/1812.11118

Lightlike Neuromanifolds, Occam's Razor and Deep Learning, arXiv:1905.11027


Concluding remarks: computational information geometry for ML+AI!

• From the very beginning, computational geometry played


a major role in machine learning!

• Geometry: Design guiding principle promoting insightful intuition


and science of invariance. Meaning of distances.

• Dualistic structure of information geometry + information projection:


Role of Fisher information matrix/metric in ML (Fisher kernel, etc.)
Theory of communication between data/(sub)models and models
(but may seem at first counterintuitive)
Geometric Science of Information (GSI)
Co-organized every 2 years since 2013

180 participants, August, Toulouse, France, 2019

Joint Structures and Common Foundations of


Statistical Physics, Information Geometry and
Inference for Learning
26th July to 31st July 2020
Ecole de Physique des Houches