Persistent Homology and Applied Homotopy Theory
arXiv:2004.00738v1 [math.AT] 1 Apr 2020

Gunnar Carlsson
Mathematics Department
Stanford University
and
Ayasdi, Inc.

April 3, 2020

Contents
1 Introduction
2 The Motivating Problem
3 The Structure Theorem for Persistence Vector Spaces over a Field
4 Complex Constructors
4.1 Introduction
4.2 Čech Construction
4.3 Vietoris-Rips Complex
4.4 Alpha Complex
4.5 Witness Complex
4.6 Mapper
5 Metrics on Barcode Space, and Stability Theorems
5.1 Metrics on Barcode Space
5.2 Stability Theorems
6 Tree-like Metric Spaces
7 Persistence and Feature Generation
7.1 Introduction
7.2 Algebraic Functions
7.3 Persistence Landscapes
7.4 Persistence Images
8 Generalized Persistence
8.1 Introduction
8.2 Zig-zag Persistence
8.3 Multidimensional Persistence
9 Coverage and Evasion Problems
10 Probabilistic Analysis
10.1 Random Complexes
10.2 Robust Estimators
10.3 Random Fields

1 Introduction

Persistent homology is a technique that has been developed over the last 20
years. Initial ideas developed in the early 1990’s [37], but the idea of persis-
tence was introduced by Vanessa Robins in [60], rapidly followed by additional
development ([35], [68]), and has been developing rapidly since that time. The
original motivation for the method was to extend the ideas of algebraic topol-
ogy from the category of spaces X to situations where we only have a sampling
of the space X. Of course, a sample is a discrete space so there is nothing to
be obtained unless one retains some additional information. One assumes the
presence of a metric or a more relaxed “dissimilarity measure”, and uses this in-
formation restricted to the sample in constructing the algebraic invariant. Over
time, persistent homology has been used in other situations, for example where
one has a topological space with additional information, such as a continuous
real valued function, and the sublevel sets of the function determine a filtra-
tion on the space. The output of standard persistent homology (we will discuss
some generalizations) is represented in two ways, via persistence barcodes and
persistence diagrams. Initially persistent homology was used, as homology is
used for topological spaces, to obtain a large scale geometric understanding of
complex data sets, encoded as finite metric spaces. Examples of this kind of
application are [18], [40], [47], [59], and [25]. Another class of applications uses
persistent homology to study data sets where the points themselves are metric
spaces, such as databases of molecule structures or images. This second set of
applications is developing very rapidly, and is exemplified in [15], [16], [66], and
[44]. Another direction in which persistent homology is being applied is in the
study of coverage and evasion problems arising in sensor net technology [27].
Examples of research in this direction are [31], [32], [1], and [39].

There are a number of different active research directions in this area.

• Coordinatization of barcodes: Barcodes in their native form, i.e. as
a set of intervals, do not lend themselves to analysis by machine learning
techniques. It is therefore important to represent them in a form more
amenable to analysis. This can be achieved by appropriate coordinatiza-
tions of the space of barcodes. Several methods for this task are described
in Section 7.
• Generalized persistence: Persistent homology has as its output a di-
agram of complexes, parametrized by the partially ordered set of real
numbers, on which algebraic computations are performed so as to pro-
duce barcodes. There are other parameter categories that are useful in
the study of data sets. We discuss two examples of this notion in Section
8, but we would expect there to be many different types of diagrams that
will shed light on finite metric spaces.
• Stability results: Since noise and error are key elements in the study of
data, it is important to develop methods that quantify the dependence of
the barcode output on small perturbations of data. This requires the impo-
sition of metrics on the set of barcodes, and proving theorems concerning
the distances between barcodes that differ only by small perturbations.
The progress that has been made in this direction is described in Section
5.
• Probabilistic analysis and inference: Because many of the applica-
tions of persistent homology occur in the study of data, it becomes im-
portant to study not only stability results under small perturbations, but
also perturbations that are “probabilistically small”, i.e. which may be
large, but where a large perturbation is a rare event. This means that one
must study the distributions on barcode space that occur from applying
persistent homology to complexes generated by various random models.
This is a rapidly developing area within the subject.
• Coverage and evasion problems: This work centers around attempts
to understand complements in Euclidean space of regions defined by col-
lections of sensors. It has been approached using different methods, and
appears to be a place where techniques such as Spanier-Whitehead duality
and embedding analysis, applied in suitably generalized situations, should
play a role. The general problem of understanding complements of objects
embedded in Euclidean space of course also plays a role in robotics.
• Symplectic geometry: Although persistent homology is mainly used in
situations where one is examining discrete approximations to continuous
objects, it can be applied in any geometric situation where there is a
metric, or where one is considering filtered objects. Such situations occur
in symplectic geometry, and there is recent work applying the technique
in studying, for example, Floer homology spectra. Examples of this kind
of work are [13], [49], [56], [57], and [65].

The goal of this paper is to discuss the different research directions and applica-
tions at a high level, so that the reader can orient him/herself in the techniques.
We remark that there are a number of useful surveys on persistent homology
and on topological data analysis more generally. The papers [19], [36], and [55]
give different perspectives on this subject.

2 The Motivating Problem

Suppose that we are given a set of points X in the plane, and believe that it
is reasonable to assume that the points are sampled (perhaps with error) from
a geometric object. We could ask for information about the homology of the
underlying space from which X is sampled. Consider the finite set of points X
in $\mathbb{R}^2$ displayed in Figure 1 below.

Figure 1: Statistical Circle

When we examine X, we observe that it appears to be sampled from a loop,
and would like to have algebraic tools that capture the “loopy” structure of the
set. Note that we are only given a discrete set of points, so direct application
of homological constructions will only produce the homology of a finite set of
points. However, we could attempt to construct a space from the set X, which in
a sense fills in the gaps between the points. We will need to use some additional
information about the points, and that will in this case be restriction of the
Euclidean metric to X. One distance based construction is the Vietoris-Rips
complex.

Definition 2.1 For any finite metric space $(X, d)$, and every $R \geq 0$, let $V(X, R)$
denote the simplicial complex with vertex set equal to $X$, and such that $\{x_0, \ldots, x_k\}$
spans a $k$-simplex if and only if $d(x_i, x_j) \leq R$ for all $0 \leq i < j \leq k$.
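
As an illustration of Definition 2.1, the following minimal Python sketch enumerates the simplices of the Vietoris-Rips complex by brute force for a small point set in Euclidean space. The function name `vietoris_rips`, the dimension cutoff `max_dim`, and the sample points are assumptions made for this illustration and do not come from the paper; practical software uses far more efficient methods.

```python
from itertools import combinations
import math

def vietoris_rips(points, R, max_dim=2):
    """Simplices of V(X, R), as tuples of point indices, up to dimension max_dim."""
    n = len(points)

    def close(i, j):
        # Edge condition of Definition 2.1: d(x_i, x_j) <= R.
        return math.dist(points[i], points[j]) <= R

    simplices = [(i,) for i in range(n)]          # every point is a vertex
    for k in range(1, max_dim + 1):
        for sigma in combinations(range(n), k + 1):
            # A k-simplex is present exactly when all of its pairs are close.
            if all(close(i, j) for i, j in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices

# Hypothetical sample: the four corners of the unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

On this sample, a threshold of 0.5 yields only the four vertices, a threshold of 1.2 yields four vertices and four edges forming a loop, and a threshold of 1.5 with `max_dim=3` yields the full simplex on the four points.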

Notice that if R is smaller than the smallest interpoint distance, then V(X, R)
will be a discrete complex on the set X. On the other hand, if R is greater
than the diameter of X, then V(X, R) is a full simplex on X. For intermediate
values, one obtains other complexes. In this case, there is a range of values
of R in which V(X, R) has the homotopy type of a circle. One could ask if
there is a principled way to choose a threshold R based on only the distances
between the points. After a great deal of experimentation, one finds that this
is a very difficult, if not unsolvable problem. A question that one can ask is if
there is a more structured object that one can study which incorporates all the
thresholds in a single object, which can be analyzed in a number of different
ways. Statisticians have confronted this problem in an analogous situation.

Hierarchical Clustering: The clustering problem in statistics is to determine
ways to infer the set of connected components of a metric space $X$ from finite
samples $Y$. One of the approaches statisticians developed was to compute
$\pi_0 V(Y, R)$ for a choice of threshold $R$, but they confronted the analogous
problem to the one we discussed above, namely the selection of $R$. They came to the
conclusion that the problem of selecting a threshold in a principled way is not
a well posed problem, but managed to construct a structured output, called a
dendrogram, which allowed them to study the behavior of all thresholds at once.
It is defined as follows. For each threshold $R$, we obtain a set of components
$\pi_0 V(Y, R)$, which yields a partition $\Pi_R$ of $Y$. If we have $R \leq R'$, then $\Pi_R$ is
a refinement of $\Pi_{R'}$. One definition of a dendrogram structure on a finite set
$Y$ is a parametrized family $\{\Pi_R\}_{R \geq 0}$ of partitions of $Y$, with the property that
$\Pi_R$ refines $\Pi_{R'}$ whenever $R \leq R'$, and so that for any partition $\Pi$, the intervals
$\{R \mid \Pi_R = \Pi\}$ are either empty or closed on the left and open on the right. We
assume that there is an $R_\infty$ so that $\Pi_{R_\infty}$ is the partition with one block, namely
$Y$. The reason for this terminology is that this information is equivalent to a
tree $D$ with a reference map to the non-negative real line. The tree $D$ is defined
as follows. The points of $D$ are pairs $(c, R)$, where $c$ is a block in the partition
$\Pi_R$, and $0 \leq R \leq R_\infty$. We clearly have a reference map to $[0, R_\infty]$ given by
$(c, R) \mapsto R$. To define a topology on this set, we construct an auxiliary space
$Z$, defined as

$$Z = \coprod_{c} [0, R_\infty]_c$$

where $c$ is a block in the partition $\Pi_0$, and $[0, R_\infty]_c$ denotes a copy of the
interval $[0, R_\infty]$ labelled by $c$. There is a natural map $\varphi$ from $Z$ to $D$ given by
$(c, t) \mapsto (\rho_t(c), t)$, where $\rho_t$ denotes the projection $Y/\Pi_0 \to Y/\Pi_t$. It is clear
that $\varphi$ is a surjective map, and therefore that $D$ is the quotient of $Z$ by an
equivalence relation. The topology on $D$ is the quotient topology associated to
the topology on $Z$. It is easy to check that this topology makes $D$ into a rooted
tree. A tree with a reference map can be laid out in the plane, as indicated
below, and one can recover directly the clustering at any given value of $R$.

Figure 2: Dendrogram
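
The family of partitions underlying a dendrogram can be computed directly for small samples. The sketch below is a minimal illustration, assuming Euclidean points: `partition_at` computes the blocks of the partition at a threshold R as the connected components of the Vietoris-Rips graph, and `merge_thresholds` lists the thresholds at which the partition becomes coarser. Both function names and the sample data are hypothetical and chosen only for this example.

```python
from itertools import combinations
import math

def partition_at(points, R):
    """Blocks of the partition at threshold R: components of V(Y, R), via union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= R:
            parent[find(i)] = find(j)       # merge the two blocks
    blocks = {}
    for i in range(len(points)):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())

def merge_thresholds(points):
    """Thresholds at which the number of blocks drops (branch heights of the tree)."""
    dists = sorted({math.dist(p, q) for p, q in combinations(points, 2)})
    heights, prev = [], len(points)
    for R in dists:
        k = len(partition_at(points, R))
        if k < prev:
            heights.append(R)
            prev = k
    return heights

# Hypothetical sample: four points on the real line.
line = [(0.0,), (1.0,), (3.0,), (7.0,)]
```

For this sample the partition has three blocks at threshold 1, two blocks at threshold 2, and a single block from threshold 4 onward, so the clustering at any given threshold can be read off from the list of merge heights.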

The dendrogram can be regarded as the “right” version of the invariant $\pi_0$ in
the statistical world of finite metric spaces. The question now becomes if there
are similar invariants that can capture the notions of higher homotopy groups
or homology groups. In order to define them, we need a preliminary definition.

Definition 2.2 A submonoid $A \subseteq \mathbb{R}_+$ is said to be pure if given any $r_0$, $r_1$, and
$r_2$ in $\mathbb{R}_+$, with $r_0$ and $r_2$ in $A$, so that $r_0 + r_1 = r_2$, then $r_1 \in A$. $A$ is a totally
ordered set in its own right by restriction of the total order on $\mathbb{R}_+$.

For example, N (the non-negative integers) and Q+ are both pure. We can now
make a definition that includes the dendrogram as a special case.

Definition 2.3 Let $C$ denote any category. A persistence object in $C$ is a functor
$\mathbb{R}_+ \to C$, where $\mathbb{R}_+$ denotes the ordered set of non-negative real numbers,
regarded as a category, so that there is a unique morphism $r_0 \to r_1$ whenever
$r_0 \leq r_1$. More generally, if $A \subseteq \mathbb{R}_+$ is any pure submonoid, by an $A$-parametrized
persistence object in $C$ we will mean a functor from the ordered set $A$ to $C$. It is
clear that the $A$-parametrized persistence objects in $C$ form a category in their
own right, where the morphisms are the natural transformations of functors.
We will denote this category by $P_A C$, with the special case of $\mathbb{R}_+$ denoted by
$PC$.

It is now easily checked that the functor $\mathbb{R}_+ \to \mathrm{Sets}$ defined by

$$R \mapsto \pi_0(V(X, R))$$

where $X$ is a finite metric space, is a persistence object in Sets, or a persistence
set. This is the case because $\pi_0$ is a set valued functor. Other topological invariants,
such as homology or homotopy groups, take their values in the categories
Grp, Ab, or $R$-mod, and one can construct persistence objects in these categories
by applying these functors to the Vietoris-Rips complexes. The critical
question becomes whether or not the isomorphism classes of these persistence
objects are in any sense understandable, and useful for distinguishing or under-
standing the underlying metric spaces.

3 The Structure Theorem for Persistence Vector Spaces over a Field

For the entirety of this section, $K$ will denote a field, which will be fixed throughout.
We are interested in the isomorphism classification of $A$-parametrized persistence
$K$-vector spaces, for pure submonoids $A$ of $\mathbb{R}_+$. This is very complicated
in general, but is manageable for objects of $P\mathrm{Vect}(K)$ satisfying a finiteness
condition, which is always satisfied for the Vietoris-Rips complexes associated with
finite metric spaces. Let $M$ be any commutative monoid. By an $M$-graded $K$-vector
space, we will mean a $K$-vector space $V$ equipped with a decomposition

$$V \cong \bigoplus_{\mu \in M} V_\mu$$

Given two $M$-graded $K$-vector spaces $V_*$ and $W_*$, by their tensor product we
will mean the tensor product $V \otimes_K W$ equipped with the $M$-grading given by

$$(V \otimes_K W)_\mu = \bigoplus_{\mu_1 + \mu_2 = \mu} V_{\mu_1} \otimes_K W_{\mu_2}$$

and we write $V_* \otimes_K W_*$ for this construction. An $M$-graded $K$-algebra is then an
$M$-graded $K$-vector space $R_*$ together with a homomorphism $R_* \otimes_K R_* \to R_*$, satisfying
the associativity and distributivity conditions. An important example is the
monoid $K$-algebra $K[M]_*$, for which the grading is given by

$$K[M]_\mu = K \cdot \mu$$

It will be convenient to write $t^\mu$ for the element $\mu \in K[M]_*$. We define the notion
of an $M$-graded $R_*$-module $M_*$ in a similar way.

We specialize to the situation $M = A$, where $A$ is a pure submonoid $A \subseteq \mathbb{R}_+$.
We will demonstrate the classification of $A$-parametrized persistence modules by
using an equivalence of categories to the category of $A$-graded $K[A]_*$-modules.

Proposition 3.1 Let $K[A]_*$ denote the monoid algebra of $A$ over $K$, for a pure
submonoid $A \subseteq \mathbb{R}_+$, regarded as a graded $K$-algebra. Let $G(A, K)$ denote the
category of $A$-graded $K[A]_*$-modules. Then there is an equivalence of categories

$$P_A\mathrm{Vect}(K) \cong G(A, K)$$

Proof: Given a functor $\theta : A \to \mathrm{Vect}(K)$, we will denote by $M(\theta)$ the graded
$K$-vector space

$$M(\theta) = \bigoplus_{\alpha \in A} \theta(\alpha)$$

where the elements of grading $\alpha$ are precisely the elements of the summand
$\theta(\alpha)$. We now extend the vector space structure to a graded $K[A]_*$-module
structure by defining the action $t^\alpha \cdot : \theta(\alpha') \to \theta(\alpha + \alpha')$ to be equal to the linear
transformation

$$\theta(\alpha' \to \alpha + \alpha') : \theta(\alpha') \to \theta(\alpha + \alpha')$$

where $\alpha' \to \alpha + \alpha'$ denotes the unique morphism in $A$ from $\alpha'$ to $\alpha + \alpha'$. It is
clear that this defines an $A$-graded $K[A]_*$-module structure on $M(\theta)$, and that
$M$ is a functor. We produce an inverse functor $\eta : G(A, K) \to P_A\mathrm{Vect}(K)$ on
objects by setting

$$\eta(M_*)(\alpha) = M_\alpha$$

and on morphisms by

$$\eta(M_*)(\alpha' \to \alpha + \alpha') = t^\alpha \cdot : M_{\alpha'} \to M_{\alpha + \alpha'}$$

The functors $M$ and $\eta$ are clearly inverse to each other. $\square$

Let $A$ be any pure submonoid of $\mathbb{R}_+$. Then for any $\alpha \in A$, we define $F(\alpha)$ to
be the free $A$-graded $K[A]_*$-module on a single generator in grading $\alpha$. For any
pair $\alpha, \alpha' \in A$, where $\alpha' > \alpha$, we define $F(\alpha, \alpha')$ to be the quotient

$$F(\alpha) / (t^{\alpha' - \alpha} \cdot F(\alpha))$$

The following result describes the isomorphism classification of finitely presented
graded $K[A]_*$-modules.

Proposition 3.2 Any finitely presented object of $G(A, K)$ is isomorphic to a
module of the form

$$\bigoplus_{s=1}^{m} F(\alpha_s) \oplus \bigoplus_{t=1}^{n} F(\alpha_t, \alpha'_t)$$

Moreover, the decomposition is unique up to reordering of summands. The kernel
of any homomorphism between two finitely generated free $A$-graded modules
is itself a finitely generated free $A$-graded module.

Remark 3.1 This result is formally very similar to the structure theorem for
finitely generated modules over a principal ideal domain (PID). Indeed, for the
case of A = N, where K[A] is Noetherian, the result is exactly a structure theo-
rem for finitely generated graded modules over the graded ring K[t]. For other
choices of A, K[A]∗ is not necessarily Noetherian. However, it does turn out
to be coherent, i.e. having the property that the kernel of any homomorphism
between finitely generated free modules is finitely generated.

Proof: A proof is given in [23]. $\square$

Remark: Notice that the proof also gives an algorithm for producing a matrix
in the diagonal form given above.
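
For simplicial complexes with coefficients in the field with two elements, barcodes are commonly computed by a column reduction of the boundary matrix. The sketch below illustrates that standard reduction on a small filtered complex. It is offered as an illustration under stated assumptions and is not necessarily the algorithm contained in the proof in [23]. The function name `barcode`, the input format, and the example filtration are hypothetical, and the input is assumed to be a valid filtration, meaning every face of a simplex appears no later than the simplex itself.

```python
from itertools import combinations

def barcode(filtered_simplices):
    """filtered_simplices: list of (value, simplex), simplex a tuple of vertices.
    Returns sorted (dimension, birth, death) triples; death is inf for infinite bars."""
    # Order by filtration value, then dimension, so faces precede cofaces.
    order = sorted(filtered_simplices, key=lambda vs: (vs[0], len(vs[1]), vs[1]))
    index = {tuple(sorted(s)): i for i, (_, s) in enumerate(order)}
    # Boundary columns over Z/2, stored as sets of row indices.
    columns = []
    for _, s in order:
        s = tuple(sorted(s))
        faces = combinations(s, len(s) - 1) if len(s) > 1 else []
        columns.append({index[f] for f in faces})
    pivot_owner = {}          # lowest nonzero row -> column that owns it
    bars, paired = [], set()
    for j, col in enumerate(columns):
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]   # add an earlier column mod 2
        if col:
            i = max(col)
            pivot_owner[i] = j
            paired.update((i, j))
            birth, death = order[i][0], order[j][0]
            if birth < death:                       # skip zero-length intervals
                bars.append((len(order[i][1]) - 1, birth, death))
    for i, (value, s) in enumerate(order):
        if i not in paired:                         # never killed: infinite bar
            bars.append((len(s) - 1, value, float("inf")))
    return sorted(bars)

# Hypothetical filtration: a triangle whose edges appear before its interior.
triangle = [(0, (0,)), (0, (1,)), (0, (2,)),
            (1, (0, 1)), (1, (1, 2)), (2, (0, 2)), (3, (0, 1, 2))]
```

For this filtration the reduction produces two zero-dimensional bars from 0 to 1, one infinite zero-dimensional bar, and a one-dimensional bar from 2 to 3, which records the loop that appears when the third edge enters and disappears when the triangle is filled in.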

The above theorem is summarized using the following definition.

Definition 3.1 By an $A$-valued barcode, we will mean a finite set of elements

$$(\alpha, \alpha') \in A \times (A \cup \{+\infty\})$$

satisfying the condition $\alpha < \alpha'$. An $A$-valued barcode is said to be finite if the
right hand endpoint $+\infty$ does not occur. If $A = \mathbb{R}_+$, we will simply refer to it
as a barcode, without labeling by the monoid. We have shown that isomorphism
classes of finitely presented $K[A]_*$-modules are in bijective correspondence with
$A$-valued barcodes.

Remark 3.2 Barcodes are often represented visually as collections of intervals.
The image in Figure 3 shows barcodes in dimensions zero and one.

Notice that the zero-dimensional barcode has one infinite interval, while the
one-dimensional barcode is finite. There is an equivalent visual representation
called the persistence diagram in which each interval is encoded as a point (x, y)
in the plane, where x and y are the left and right hand endpoints respectively.
An example is pictured in Figure 4.

An advantage of this representation made apparent in this image is that one can
represent different homology groups in the same diagram. The blue dots are
in this case the zero-dimensional persistence diagram and the orange ones are
the one-dimensional ones. When there are infinite intervals, one often selects
an upper threshold τ for the persistence parameter, and represents the infinite
interval by one with right hand endpoint τ.
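
The passage from a barcode to a persistence diagram is a direct change of representation. The following minimal sketch assumes a barcode stored as a list of (birth, death) pairs with infinite deaths encoded as floating point infinity; the function name `to_diagram` and the parameter name `tau` are chosen for this illustration only.

```python
import math

def to_diagram(barcode, tau):
    """Map each interval to a point (x, y); infinite intervals are capped at tau."""
    return [(birth, tau if math.isinf(death) else death)
            for birth, death in barcode]
```

For example, with threshold 5.0 the intervals (0.0, 1.0) and (0.0, inf) become the points (0.0, 1.0) and (0.0, 5.0).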

We also have the following.

Proposition 3.3 The kernel of a homomorphism between finitely generated free
$K[A]_*$-modules is finitely generated free.

Proof: This follows immediately from the matrix analysis in the proof of Proposition
3.2. $\square$

Corollary 3.1 Given any chain complex of finitely generated free graded K[A]∗ -
modules, the homology modules are finitely presented K[A]∗ -modules.

We also interpret this result in terms of persistence vector spaces. The category
$P_A\mathrm{Vect}(K)$ is clearly an abelian category. For any $\alpha_0 \in A$, we define $V(\alpha_0)$ to
be the persistence vector space $\{V_\alpha\}_{\alpha \in A}$, where $V_\alpha = \{0\}$ for $\alpha < \alpha_0$, $V_\alpha = K$
for $\alpha \geq \alpha_0$, and where for any $\alpha_0 \leq \alpha \leq \alpha'$ the linear transformation $V_\alpha \to V_{\alpha'}$
is the identity on $K$. For any $\alpha < \alpha'$, we also define the persistence vector
space $V(\alpha, \alpha')$ to be the quotient of $V(\alpha)$ by the image of the natural inclusion
$V(\alpha') \hookrightarrow V(\alpha)$. Corollary 3.1 now has the following consequence.

Figure 3: Persistence Barcodes

Figure 4: Persistence Diagram

Corollary 3.2 Let $C_*$ be a chain complex of $A$-persistence vector spaces, so that
for every $s$, $C_s$ is a finite direct sum of persistence vector spaces, each of which
is of the form $V(\alpha)$ for some $\alpha \in A$. Then for each $s$, $H_s(C_*)$ is isomorphic to
a direct sum of finitely many persistence vector spaces, each of which is of the
form $V(\alpha)$ or $V(\alpha, \alpha')$ for $\alpha, \alpha' \in A$, and $\alpha < \alpha'$. In particular, the homology
in each dimension $s$ is uniquely described by an $A$-valued barcode.

4 Complex Constructors

4.1 Introduction

All data that we consider consists of finite sets of points X. The space X
itself is uninteresting topologically, since it is a discrete set of points. This
means that we have to build a space using auxiliary information attached to
the set of points. The auxiliary information we choose is a metric on the set
X, so X is a finite metric space. In the context of data sets, metrics are often
referred to as dissimilarity measures, since small distances between data points
often reflect notions of similarity between data points. Often the metric chosen
is the restriction of a well known and analyzed metric on an ambient space
containing X, such as n-dimensional Euclidean space. Other choices that are
often appropriate are Hamming distance, correlation distances, and normalized
variants (mean centering of coordinate functions, normalizing variance to 1) of
Euclidean distance. One method for constructing spaces based on metrics is
the Vietoris-Rips complex that we have seen above. It is actually a persistence
object in the category of simplicial complexes. All the constructions we will
look at except the Mapper construction are naturally persistence objects in the
category of simplicial complexes. The Mapper construction can also be equipped
with many such structures, but they are not canonical. Because of the presence
of a persistence structure, the Čech, Vietoris-Rips, alpha, and witness complexes
all have persistence barcodes associated to them in all non-negative dimensions.
It is obvious from the constructions that the zero-dimensional barcodes have a
single infinite interval, and that all higher dimensional barcodes are finite.

4.2 Čech Construction

Let $(X, d)$ denote a finite metric space. Given a threshold parameter value $R$, let
$U_R$ denote the covering of $X$ by all balls $B_R(x) = \{x' \mid d(x, x') \leq R\}$. The Čech
complex at scale $R$ is the nerve of the covering $U_R$, and we denote it by $\check{C}(X, R)$.
It is clear that for $R \leq R'$, there is an inclusion $\check{C}(X, R) \hookrightarrow \check{C}(X, R')$, and
that therefore $\{\check{C}(X, R)\}_R$ is a persistence object in the category of simplicial
complexes. From the theoretical point of view, it has the advantage that given
a covering of a topological space (with suitable point set hypotheses) by open
sets, the nerve lemma (see [48], Thm. 15.21) gives a criterion that guarantees
that the nerve of the covering is homotopy equivalent to the original space.

4.3 Vietoris-Rips Complex

The Čech construction from the previous section is computationally expensive,
because it involves computing simplices individually at every level. We can
create another construction that has a strong relationship with the Čech construction.
Recall that a simplicial complex $Z$ is said to be a flag complex if for
any collection $\sigma$ of vertices $\{z_0, \ldots, z_k\}$ for which each pair $(z_i, z_j)$ is an edge,
$\sigma$ is a $k$-simplex of $Z$. From the computational point of view, flag complexes
are attractive because one needs only enumerate all the edges in the complex,
rather than all the higher order simplices. The Vietoris-Rips complex which
was defined in Definition 2.1 is by definition a flag complex for every parameter
value $R$. There is a relationship between the persistent Čech and Vietoris-Rips
complexes.

Proposition 4.1 There are inclusions

$$\check{C}(X, R) \hookrightarrow V(X, 2R)$$

and

$$V(X, R) \hookrightarrow \check{C}(X, R)$$
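
The two inclusions of Proposition 4.1 can be checked numerically on a small example. The sketch below assumes the Čech complex of Section 4.2, that is, the nerve of the covering of the finite set X by closed balls, so a set of vertices spans a simplex when some point of X lies in all of the corresponding balls. The helper names and the sample points are hypothetical.

```python
from itertools import combinations
import math

def rips_simplex(X, sigma, R):
    # All pairwise distances within sigma are at most R.
    return all(math.dist(X[i], X[j]) <= R for i, j in combinations(sigma, 2))

def cech_simplex(X, sigma, R):
    # Some point y of X lies in every closed ball B_R(x_i), i in sigma.
    return any(all(math.dist(X[i], y) <= R for i in sigma) for y in X)

def check_inclusions(X, R, max_dim=3):
    """True if Cech(X, R) is inside V(X, 2R) and V(X, R) is inside Cech(X, R)."""
    for k in range(1, max_dim + 2):
        for sigma in combinations(range(len(X)), k):
            if cech_simplex(X, sigma, R) and not rips_simplex(X, sigma, 2 * R):
                return False
            if rips_simplex(X, sigma, R) and not cech_simplex(X, sigma, R):
                return False
    return True

# Hypothetical sample: six points on the unit circle at irregular angles.
X = [(math.cos(t), math.sin(t)) for t in (0.0, 0.7, 1.9, 3.1, 4.4, 5.5)]
```

The second inclusion holds because any vertex of a Vietoris-Rips simplex at scale R lies within R of all the other vertices, so it serves as the common point of the balls; the first follows from the triangle inequality.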

For specific situations, this bound can be improved. For example, it is shown
in [31] that if $X \subseteq \mathbb{R}^d$ is equipped with the restricted metric, then

$$V(X, R') \subseteq \check{C}(X, R/2) \subseteq V(X, R) \quad \text{if} \quad \frac{R}{R'} \geq \sqrt{\frac{2d}{d+1}}$$

This result allows us to compare homology computed using the Čech and Vietoris-Rips
methods.

4.4 Alpha Complex

There is another kind of complex whose dimension is low and which generally
has a moderate number of simplices. It is called the alpha complex, or the
alpha shapes complex, and was introduced in [34], with a thorough description
in [5]. It applies to data $X$ that is embedded in Euclidean space $\mathbb{R}^n$, and so that
the metric on $X$ is the restriction of the Euclidean metric to $X$. Typically the
number $n$ is relatively small, say $\leq 5$, because the construction becomes quite
expensive in higher dimensions. Also, the complex is generically of dimension
$\leq n$. The notion of generic is the following. Given any set of points $X \subseteq \mathbb{R}^n$,
it is possible for the alpha complex to have dimension higher than $n$, but it is
possible to perturb all the points by an arbitrarily small amount and obtain a
complex that is $n$-dimensional.

For any point $x \in X$, we define the Voronoi cell of $x$, denoted by $V(x)$, by

$$V(x) = \{y \in \mathbb{R}^n \mid d(x, y) \leq d(x', y) \text{ for all } x' \in X\}$$

The collection of all Voronoi cells for a finite subset of Euclidean space is called
its Voronoi diagram. A Voronoi diagram in $\mathbb{R}^2$ might look like this.

Figure 5: Alpha Complex in $\mathbb{R}^2$

For each $x \in X$, we also denote by $B_\epsilon(x)$ the set $\{y \in \mathbb{R}^n \mid d(x, y) \leq \epsilon\}$. By
the $\alpha$-cell of $x \in X$ with scale parameter $\epsilon$, we will mean the set
$A_\epsilon(x) = B_\epsilon(x) \cap V(x)$. The $\alpha$-complex with scale parameter $\epsilon$ of a subset $X \subseteq \mathbb{R}^n$, denoted
by $\alpha_\epsilon(X)$, will be the abstract simplicial complex with vertex set $X$, and where
the set $\{x_0, \ldots, x_k\}$ spans a $k$-simplex if and only if

$$\bigcap_{i=0}^{k} A_\epsilon(x_i) \neq \emptyset$$

It is of course the nerve of the covering of the union of the balls $B_\epsilon(x)$ by the sets $A_\epsilon(x_i)$.

4.5 Witness Complex

The witness complex was introduced in [30]. It can be thought of as an analogue
of the alpha complex for non-Euclidean data. The construction of the Voronoi
cells is not dependent on the fact that the embedding space is Euclidean. It can
be constructed for any embedding of a data set in a larger metric space. Given a
data set X, we therefore select a subset of landmarks L ⊆ X. We can now build
the analogues of the Voronoi cells for each of the landmark points within X, and
construct the nerve of the covering. In this case, both the ambient space and
the landmark set are usually taken to be finite. We will also need to introduce
persistence into this picture. The construction is as follows.

Definition 4.1 Suppose we are given a metric space $(X, d)$, a finite subset $L \subseteq X$, called the
landmark set, and a persistence parameter $R$. For every $x \in X$ we denote by
$m_x$ the distance from $x$ to the set $L$. We define a simplicial complex $W(X, L, R)$
as follows. The vertex set of $W(X, L, R)$ is the set $L$, and $\{l_0, l_1, \ldots, l_k\}$ spans
a $k$-simplex if and only if there is a point $x \in X$ (the witness) so that
$d(x, l_i) \leq m_x + R$ for all $i$. The family of complexes $\{W(X, L, R)\}_R$ forms a persistence
simplicial complex.
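
Definition 4.1 translates into a short brute-force computation for small data sets. The sketch below assumes Euclidean points and represents the landmark set by a list of indices into X; the function name `witness_complex`, the cutoff `max_dim`, and the sample data are assumptions made for this illustration.

```python
from itertools import combinations
import math

def witness_complex(X, landmarks, R, max_dim=2):
    """Simplices of W(X, L, R), as tuples of positions in the landmark list."""
    L = [X[i] for i in landmarks]
    # D[x][i] is the distance from point x to landmark i; m[x] is the distance to L.
    D = [[math.dist(x, l) for l in L] for x in X]
    m = [min(row) for row in D]
    simplices = []
    for k in range(1, max_dim + 2):
        for sigma in combinations(range(len(L)), k):
            # sigma is a simplex when some x witnesses it: d(x, l_i) <= m_x + R.
            if any(all(D[x][i] <= m[x] + R for i in sigma) for x in range(len(X))):
                simplices.append(sigma)
    return simplices

# Hypothetical sample: five points on the real line, with landmarks 0, 2, 4.
pts = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
```

With R = 0, the non-landmark points 1 and 3 witness the edges between neighbouring landmarks and no other edge appears, so the complex is a path on three vertices. With R = 2, the middle landmark witnesses the full triangle.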

There are several variants on this construction. For example, there is the “lazy”
version in which the 1-simplices are identical to the 1-simplices of the witness
construction, but in which we declare that any higher dimensional simplex is
an element of the complex if and only if all its one dimensional faces are. Each
of the lazy complexes is a flag complex. The lazy witness complex bears the
same relationship to the witness complex as the Vietoris-Rips complex bears to
the Čech complex. There is also the weak witness complex, $\{W^{weak}(X, L, R)\}_R$,
which is defined as follows. For each point $x \in X$, we let $\delta_x$ denote the distance
from $x$ to the second closest element of $L$. We then declare that a pair $(\lambda_1, \lambda_2)$ of landmarks
spans an edge of $W^{weak}(X, L, R)$ if and only if there is an $x \in X$ so that

$$\max(d(\lambda_1, x), d(\lambda_2, x)) \leq \delta_x + R$$

A higher dimensional simplex $\{\lambda_0, \ldots, \lambda_n\}$ is contained in $W^{weak}(X, L, R)$ if and
only if all of its edges are contained in it. This is a very useful construction
because the persistence “starts faster” than the standard complex. It is often
the case that one obtains the ultimate result even at $R = 0$, or for very small
values of $R$.

4.6 Mapper

Another construction is based on Morse theoretic ideas. We motivate it by


considering a space level construction. Suppose that we have a continuous map
r : X → B of spaces, and suppose further that B is equipped with an open cov-
ering U = {Uα }α∈A . We obtain the open covering r−1 U = {r−1 Uα }α∈A which
can be refined into a new covering r−1 U ∗ by decomposing each set r−1 Uα into
its connected components. We note that the dimension of the nerve of r−1 U ∗ is
less than or equal to the dimension of the nerve of U, so this construction, like

14
the alpha complex, produces complexes of bounded dimension. The analogue of
this construction for finite metric spaces is obtained by assuming that the finite
metric space is equipped with a map ρ to a reference space B, and replacing
the connected component construction by the output of a clustering algorithm.
A simple algorithm to use is single linkage hierarchical clustering, where one
makes a choice of threshold based on a simple heuristic, such as the one found
in [63]. Once this is done, one obtains a covering of the finite set X by the
collection of all clusters constructed in each of the sets ρ−1 (Uα ), and constructs
the nerve complex. This construction is referred to as Mapper. Usually, B
is chosen to be Rn , for a small positive integer n, and therefore the reference
map is determined by an n-tuple of real valued functions on the metric space
X. The reference maps can be chosen in many ways, giving different views of
the data. Some standard choices are density estimators, measures of centrality,
coordinates in linear algebraic algorithms such as principal component analysis
and multidimensional scaling ([43]), or individual coordinates when a metric
space is obtained as a subspace of RN for some N . The method has been used
extensively in work on life sciences data sets, see for example [33], [53], [54], and
[61].
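The Mapper steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the cited work: the reference space is R, the covering is a hand-chosen list of closed intervals (the text uses open sets), the clustering threshold is fixed by hand rather than by the heuristic of [63], and only the 1-skeleton of the nerve is built. The names `single_linkage` and `mapper_graph` are hypothetical.

```python
def single_linkage(indices, dist, threshold):
    """Cut the single linkage dendrogram at `threshold`: the clusters are the
    connected components of the graph joining points at distance <= threshold."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in indices:
        for b in indices:
            if a < b and dist(a, b) <= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for i in indices:
        groups.setdefault(find(i), set()).add(i)
    return [frozenset(g) for g in groups.values()]


def mapper_graph(points, dist, rho, intervals, threshold):
    """Return (clusters, edges): the vertices and edges of the nerve of the
    covering of `points` by clusters of the preimages rho^{-1}([lo, hi])."""
    clusters = []
    for lo, hi in intervals:
        preimage = [i for i, x in enumerate(points) if lo <= rho(x) <= hi]
        clusters.extend(
            single_linkage(preimage, lambda a, b: dist(points[a], points[b]), threshold)
        )
    # Two clusters span an edge of the nerve when they share a data point.
    edges = {
        (a, b)
        for a in range(len(clusters))
        for b in range(a + 1, len(clusters))
        if clusters[a] & clusters[b]
    }
    return clusters, edges
```

For eight points spread around a circle, with the height function as reference map and three overlapping intervals, this sketch returns four clusters joined by four edges, so the Mapper graph is itself a cycle.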

5 Metrics on Barcode Space, and Stability Theorems

Since persistent homology is used to analyze data sets, and data sets are often
noisy in the sense that one does not want to assign meaning to small changes,
it is important to analyze the stability of persistent homology outputs to small
changes in the underlying data. In order to do this, it is very useful to construct
metrics on the output barcodes so that one can assert the continuity of the
assignment of a barcode to a finite metric space or to the graph of a function.
Informally, one wants to prove that small changes in the data give rise to small
changes in the barcodes. Since small changes will often result in a change in
the number of bars, it will be important to construct a set of all barcodes, on
which we can impose a metric. The following is a natural construction. Let n
be a positive integer, and let Bn denote the set of unordered n-tuples of closed
intervals [x, y], where we permit x = y, and require x, y ≥ 0. It is understood
that B0 consists of a single point, namely the empty set of intervals. The set
Bn can be described as the orbit space of the Σn -action on the set I n which
permutes coordinates, where I denotes the set of closed intervals. To consider
all barcodes, we form the disjoint union
\[ B_+ = \coprod_{n \geq 0} B_n \]
and define the full barcode space B to be the quotient B+ / ∼, where ∼ is the
equivalence relation generated by the relations

{I1 , . . . , Ik−1 , [xi , xi ], Ik+1 , . . . , In } ∼ {I1 , . . . , Ik−1 , Ik+1 , . . . , In }

The idea is that intervals of length zero, which do not represent non-zero vector
spaces, should be ignored. We will need to construct metrics on B, and to prove
continuity results for these metrics.

5.1 Metrics on Barcode Space

The general idea for the construction of metrics on barcode space is to consider
the set of all partial matchings between the intervals in the barcodes, assign a
penalty to each such matching, and finally to minimize this penalty over the
set of all matchings. Partial matchings are a little awkward, so for a pair of
barcodes B1 and B2 one instead considers actual bijections
\[ B_1' = B_1 \cup Z \to B_2 \cup Z = B_2' \]
where Z is the set consisting of a countable number of copies of
the interval of length zero [x, x] for each x ≥ 0. To start, one assigns a penalty
\[ \pi([x_1, x_2], [y_1, y_2]) = \|(y_1 - x_1, y_2 - x_2)\|_\infty \]
for every pair of intervals including those of length zero. Next, given two
barcodes B1 and B2 , let D(B1 , B2 ) denote the set of all bijections
θ : B1′ → B2′ for which π(I, θ(I)) ≠ 0 for only finitely many I ∈ B1′ .
Given a positive number p, we extend the penalty function π
from individual intervals to barcodes by forming
\[ W_p(B_1, B_2) = \inf_{\theta \in D(B_1, B_2)} \Big( \sum_{I \in B_1'} \pi(I, \theta(I))^p \Big)^{1/p} \]

As usual, p = ∞ is interpreted as the L∞ norm.

Definition 5.1 For any p > 0 and also for p = ∞, we refer to Wp (B1 , B2 ) as
the p-Wasserstein distance. For p = ∞, this distance is often referred to as the
bottleneck distance.

It is readily verified that under this definition, Wp defines a metric on B.
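For very small barcodes the infimum above can be evaluated by brute force, which makes the role of the zero-length intervals concrete. The sketch below is only an illustration under stated assumptions: a barcode is a list of (x, y) pairs, each barcode is padded with as many "diagonal slots" as the other has bars, and a bar [x, y] matched to a slot is charged (y − x)/2, the smallest L∞ penalty to any zero-length interval (attained at the midpoint). The function name `wasserstein` is hypothetical, and the search over all permutations is feasible only for a handful of bars; practical software solves an assignment problem instead.

```python
from itertools import permutations
import math


def wasserstein(b1, b2, p=math.inf):
    """Brute-force p-Wasserstein distance between two small barcodes.

    b1, b2: lists of (x, y) pairs with x <= y. p may be math.inf (bottleneck).
    """
    n, m = len(b1), len(b2)
    size = n + m  # n real bars + m diagonal slots on the left, and vice versa

    def cost(i, j):
        left = b1[i] if i < n else None    # None marks a zero-length slot
        right = b2[j] if j < m else None
        if left is None and right is None:
            return 0.0                     # zero-length matched to zero-length
        if left is None:
            return (right[1] - right[0]) / 2
        if right is None:
            return (left[1] - left[0]) / 2
        return max(abs(right[0] - left[0]), abs(right[1] - left[1]))

    best = math.inf
    for perm in permutations(range(size)):
        costs = [cost(i, perm[i]) for i in range(size)]
        if p == math.inf:
            total = max(costs, default=0.0)
        else:
            total = sum(c ** p for c in costs) ** (1.0 / p)
        best = min(best, total)
    return best
```

For instance, a short bar such as (3, 3.2) that is present in one barcode and absent from the other contributes only half its length, 0.1, to the bottleneck distance.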

5.2 Stability Theorems

The metrics defined in the previous section have stability theorems associated
to them, that assert that the assignment of barcodes is a continuous process.
There are two situations of interest.

1. Gromov has defined (see [42]) a metric dGH on the set of all isometry
classes of compact metric spaces, called the Gromov-Hausdorff metric.
For any integer k ≥ 0, one can view the assignment to any finite metric
space of its k-dimensional barcode as a map from the set of isometry classes
of finite metric spaces to B, and one can ask about its continuity properties.
2. Let f : X → R+ denote a continuous function. One can assign to each such
f and each integer i ≥ 0 the persistence vector space {Hi (f −1 ([0, r]))}r≥0 .
Under suitable hypotheses (e.g. where X is a finite simplicial complex and
f is linear on simplices, or where X is a compact manifold and f is
smooth), one can show that the associated barcode is of finite type. It is
then an interesting question to ask about the continuity properties of this
assignment, where one assigns various metrics to the set of functions on
X.

There are theorems in both these cases. The following theorem demonstrates
the continuity of the assignment of a k-dimensional barcode with coefficients in
a field to a finite metric space, when the metric on the set of isometry classes
of finite metric spaces is the Gromov-Hausdorff distance.

Theorem 5.1 [26] For any two finite metric spaces X and Y and integer k ≥ 0,
let B(X) and B(Y ) denote the k-dimensional barcodes for the Vietoris-Rips
complexes of X and Y with coefficients in a field K. Then

W∞ (B(X), B(Y )) ≤ dGH (X, Y )

There is a direct analogue for continuous real valued functions on a topological
space. In order to state it, we need a pair of definitions.

Definition 5.2 Let X be a topological space and f a continuous real valued
function on X. A real number a is said to be a homological critical value of f
if for some k and all sufficiently small ε > 0 the inclusion

Hk (f −1 (−∞, a − ε]) → Hk (f −1 (−∞, a + ε])

is not an isomorphism.

Definition 5.3 Let X be a topological space and f be a continuous real-valued
function. We say f is tame if there are finitely many homological critical values
and all the homology groups Hk (f −1 ((−∞, a])) with coefficients in a field K are
finite dimensional.

Remark 5.1 Tameness holds in a number of familiar situations.

1. Morse functions on compact smooth manifolds.
2. Functions on finite simplicial complexes that are linear on each simplex.
3. Morse functions on compact Whitney-stratified spaces.

The theorem is as follows.

Theorem 5.2 [28] Let X be a triangulable space, and suppose f, g : X → R are
tame continuous functions. Let B(f ) and B(g) denote the barcodes attached to
{Hk (f −1 ((−∞, r]); K)}r and {Hk (g −1 ((−∞, r]); K)}r , respectively. Then

W∞ (B(f ), B(g)) ≤ ||f − g||∞

The situation for the p-Wasserstein distances where p ≠ ∞ is more complex.
We will need constraints on the metric space as well as on the functions. For
the space X, it is required to be a triangulable compact metric space, so X is
homeomorphic to a finite simplicial complex. In addition, though, there is a
requirement on the number of simplices needed to construct a triangulation
in which the diameter of every simplex is at most a threshold r. Specifically, for
a given r > 0, we define N (r) to be the minimal number of simplices in a
triangulation of X for which each simplex has diameter ≤ r. We will assume
that N (r) grows polynomially with r−1 , i.e. that there are constants C and m so
that N (r) ≤ C/r^m . It is easy to observe that this condition holds for a finite simplicial
complex X equipped with the Euclidean metric obtained by restricting along
a piecewise linear embedding X ,→ Rn , as well as for a compact Riemannian
manifold. In [29], it is proved that any metric space satisfying this condition
also satisfies a homological condition. To state the homological condition, given
a barcode B = {[x1 , y1 ], . . . , [xn , yn ]}, we define Pk (B) to be the sum
\[ P_k(B) = \sum_i (y_i - x_i)^k \]

Lemma 5.1 [29] Suppose that X is as above, and that k is a nonnegative real
number. Then there is a constant CX so that Pk (B(f )) ≤ CX for every tame
function f : X → R with Lipschitz constant L(f ) ≤ 1, where B(f ) is defined as
in Theorem 5.2.

Definition 5.4 When the conclusion of Lemma 5.1 holds for a metric space X
and real number k ≥ 1, we say that X implies bounded degree-k persistence.

The theorem is now as follows.

Theorem 5.3 [29] Let X be a triangulable metric space that implies bounded
degree-k persistence for some real number k ≥ 1, and let f, g : X → R be two
tame Lipschitz functions. Let CX be the constant in Lemma 5.1, and let L(f )
and L(g) denote the Lipschitz constants for f and g respectively. Then

\[ W_p(B(f), B(g)) \leq C^{1/p} \cdot \|f - g\|_\infty^{\,1 - k/p} \]

for all p ≥ k, where C = CX max{L(f )^k , L(g)^k }.

6 Tree-like Metric Spaces

Persistent homology gives ways of assessing the shape of a finite metric space.
One situation where this is very useful is in problems in evolution. The notion
that there is a “tree of life” is a very old one which actually predates Darwin.
Different organisms of the same type have attached to them sequences of the
same length in a genetic alphabet A. Therefore any set of organisms produces
a subset of the set of sequences of fixed length in A. One can assign a metric
to the space of all such sequences using Hamming distances or variants thereof.
The notion that there is a tree of the various organisms within a fixed type can
be restated in mathematical terms as stating that the space S(A) corresponding
to the organisms in the family is well modeled by a tree-like metric space, i.e.
a metric space which is isometric to the set of nodes in a tree, possibly with
weighted edges, equipped with the distance function that assigns to each pair
of vertices of the tree the length of the shortest edge path between them. This
approximability could be called the phylogenetic hypothesis for the particular
class of organisms. Testing this hypothesis for particular genetic data sets has
usually been done by attempting to fit trees to a given metric space and attempt-
ing to assess how well the approximation fits. Given the persistent homology
construction, one is tempted to develop criteria attached to the barcodes that
can distinguish between tree-like and non-tree-like metric spaces. The following
theorem, proved in [25], gives such a criterion.

Theorem 6.1 Let X be a finite tree-like metric space. Then the k-dimensional
persistent homology of X vanishes for k > 0.

Remark 6.1 This theorem was proved in the context of a study of data sets
of viral sequences. In that paper it was also shown that representative cycles
for generators of persistent homology in positive degrees gave important clues
to the mechanism of the failure of the phylogenetic hypothesis.
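To make the notion of a tree-like metric space concrete, the following sketch computes the edge-path metric on the nodes of a weighted tree, as in the definition given earlier in this section. It is an illustration only: the function name `tree_metric` and the input format (nodes numbered 0, . . . , n − 1 and a list of (u, v, weight) edges) are assumptions, and the input is assumed to be a tree (connected and acyclic), which the code does not check.

```python
def tree_metric(n, weighted_edges):
    """All-pairs path lengths on a weighted tree with nodes 0..n-1.

    In a tree there is exactly one edge path between two nodes, so a
    depth-first traversal from each node yields the shortest-path metric.
    """
    adjacency = {v: [] for v in range(n)}
    for u, v, w in weighted_edges:
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))

    dist = [[0.0] * n for _ in range(n)]
    for source in range(n):
        stack = [(source, None, 0.0)]  # (node, node we came from, distance)
        while stack:
            node, came_from, d = stack.pop()
            dist[source][node] = d
            for nxt, w in adjacency[node]:
                if nxt != came_from:
                    stack.append((nxt, node, d + w))
    return dist
```

The resulting distance matrix is a finite tree-like metric space in the sense used in Theorem 6.1.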

7 Persistence and Feature Generation

7.1 Introduction

The output of persistent homology is an interesting data type, consisting as
it does of finite collections of intervals. Humans can typically interpret
barcodes directly from this description.
However, there is a whole class of problems where computers are used to
“interpret” the barcodes. For example, suppose that we have a database of
complex molecules. Each molecule is given as a collection of atoms and bonds,
and the bonds may be equipped with lengths. The set of atoms in a molecule
can be endowed with a metric by considering the edge-path distance using the
lengths of the bonds as the lengths of the edges. What we have is now a data
set in which each of the data points is a finite metric space, and therefore pos-
sesses a barcode. If there are many molecules, we cannot hope to interpret
these barcodes by eye, and must therefore allow a computer to deal with them.
Machine learning algorithms are generally not well equipped to deal with data
points described as sets, and it is therefore important to encode them somehow
as vectors, which are the natural input to such algorithms. In this section we
will describe three distinct methods for “vectorizing” barcodes, i.e. for creating
coordinates on the set of barcodes. Specifically, we will define coordinates on
the space B constructed in Section 5.

7.2 Algebraic Functions

This method proceeds from the observation that the sets Bn can be viewed as
subsets of a real algebraic variety. The set I embeds as a subset of the two-
dimensional affine space A2 (R), and is defined by the inequalities x, y ≥ 0 and
y ≥ x for an interval coordinatized by the pair (x, y). Consequently, we have an
embedding
\[ I^n \hookrightarrow \mathbb{A}^2(\mathbb{R})^n \cong \mathbb{A}^{2n}(\mathbb{R}) \]
and it is equivariant with respect to the permutation actions on I^n and A2 (R)^n .
Under the identification A2 (R)^n ≅ A2n (R), with A2n (R) coordinatized using
coordinates (x1 , . . . , xn , y1 , . . . , yn ), the corresponding action simply permutes
the xi ’s and yi ’s among themselves. It is a standard result in algebraic geometry
(see [52]) that for any action of a finite group on an affine algebraic variety (over
R in this case), there is an orbit variety, whose affine coordinate ring is the ring
of invariants of the group action. It is easy to verify that in this case, the orbit
set of the action on the closed real points of A2n (R) is exactly the symmetric
product Spn (R2 ). Since Bn ⊆ Spn (R2 ), elements in the affine coordinate ring
of the orbit variety can be regarded as functions on Bn , so we now have an
algebra of functions An on Bn . This means that we can describe functions on
the sets of barcodes with exactly n intervals. What one really wants is a ring of
functions on all of B. In order to construct such a ring, we observe that B can
be described as a quotient of the direct limit of the system

B0 → B1 → B2 → · · · (7–1)

where the inclusion Bn → Bn+1 is given by

{[x1 , y1 ], [x2 , y2 ], . . . , [xn , yn ]} → {[x1 , y1 ], [x2 , y2 ], . . . , [xn , yn ], [0, 0]}

There is a corresponding direct system of affine schemes

Spec(A0 ) → Spec(A1 ) → Spec(A2 ) → · · · (7–2)

which is compatible with the system (7–1) above. The colimit of the system
(7–2) is an affine scheme, whose affine coordinate ring is the inverse limit of the
system
A0 ←− A1 ←− A2 ←− · · ·
which we will denote A. The ring A can be analyzed, but is a bit too complicated
to be used in applications. To define a smaller subring, we note that Spec(A)
is equipped with an action by the algebraic group Gm , and we can define a
subring $\overline{A}^{fin}$ to consist of all those functions f so that all the translates of f
under the Gm -action span a finite dimensional vector subspace within A. The
ring $\overline{A}^{fin}$ is actually a graded ring, since the Gm -action determines a grading on
it. Within $\overline{A}^{fin}$ we define a subring $A^{fin}$ which consists of all elements of $\overline{A}^{fin}$
that respect the equivalence relation ∼. The main result of [3] is the following.

Theorem 7.1 The rings in question have the following properties.

1. The ring $\overline{A}^{fin}$ has the structure
\[ \overline{A}^{fin} \cong \mathbb{R}[x_{i,j};\ 0 \leq i,\ 0 \leq j,\ \text{and } i + j > 0] \]
2. The subring $A^{fin}$ is identified with
\[ \mathbb{R}[x_{i,j};\ 0 < i,\ 0 \leq j,\ \text{and } i + j > 0] \]
3. The element $x_{i,j}$ is the function given on a barcode {[x1 , y1 ], . . . , [xn , yn ]} by
\[ \sum_{s=1}^{n} (y_s - x_s)^i (y_s + x_s)^j \]
4. The elements of $A^{fin}$ separate points in B.
5. The ring $A^{fin}$ injects into the ring of functions on B.

We remark that these functions are not continuous for the bottleneck distance
on B. In [46], a tropical version of this work is presented, which gives functions
which are continuous for the bottleneck distance. For the p < ∞ situation, the
functions are continuous for the p-Wasserstein distance if one assigns B
the direct limit topology associated to the filtration of B by the images of the
spaces Bn , defined in Section 5.1.
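The generators in part 3 of Theorem 7.1 are straightforward to evaluate, which is what makes them usable as machine learning features. The sketch below is a minimal illustration, assuming a barcode is stored as a list of (x, y) pairs; the names `x_feature` and `feature_vector` and the degree cutoff are hypothetical choices, not part of [3].

```python
def x_feature(barcode, i, j):
    """Evaluate x_{i,j}: the sum over bars [x, y] of (y - x)^i * (y + x)^j."""
    return sum((y - x) ** i * (y + x) ** j for x, y in barcode)


def feature_vector(barcode, max_degree):
    """Values of x_{i,j} for all i >= 1, j >= 0 with i + j <= max_degree.

    Requiring i >= 1 makes every feature vanish on bars of length zero, so
    the vector does not change when such bars are added to the barcode.
    """
    return [
        x_feature(barcode, i, total - i)
        for total in range(1, max_degree + 1)
        for i in range(1, total + 1)
    ]
```

For the barcode {[0, 2], [1, 4]} this gives x_{1,0} = 5, x_{1,1} = 19 and x_{2,0} = 13, and appending a bar such as [7, 7] leaves the feature vector unchanged.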

7.3 Persistence Landscapes

Persistence landscapes were introduced in [12] as another vectorization of barcodes.
The vectorization in this case consists of an embedding of the set B in a
set of sequences of functions on the real line. Let (a, b) denote a pair of elements
of R with a ≤ b. Then we define a function f(a,b) (t) on the real line by setting

f(a,b) (t) = min(t − a, b − t)+

where c+ = max(c, 0). A quick analysis of f(a,b) shows that it is zero for t ≤ a
and t ≥ b, that on the interval [a, (a + b)/2] it is equal to the graph of a line of slope 1
including the point (a, 0), and on the interval [(a + b)/2, b] it is the graph of a line of
slope −1 including the point (b, 0). The shape of the graph is that of a pyramid.

Figure 6: Graph of fa,b

Remark 7.1 Note that for a = b, f(a,b) (t) ≡ 0.

Given a persistence barcode B = {(a1 , b1 ), . . . , (an , bn )}, we define a family of
functions λk (B)(t) parametrized by a positive integer k. For k = 1, we let λ1 (B)(t)
denote the maximum of all the values f(ai ,bi ) (t) over all i. For k > 1, we set
λk (B)(t) equal to the k-th largest value occurring in the set {f(ai ,bi ) (t)}i . The
family of functions {λk (B)(t)}k>0 is defined to be the persistence landscape of
the barcode B. It is clear that

λk (B)(t) ≥ 0

that
λk (B)(t) ≥ λk+1 (B)(t)
and that
λk (B)(t) = 0 for k > n

In [12] it is also proved that each function λk (B) is 1-Lipschitz, i.e. that

|λk (B)(t) − λk (B)(t′)| ≤ |t − t′|

To summarize, the persistence landscape lies in the vector space F of real valued
functions on N × R, and it follows directly from Remark 7.1 that the definition
gives us a function P L : B → F.
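The definition of the landscape translates directly into code. The following sketch is illustrative only, assuming a barcode is a list of (a, b) pairs with a ≤ b; the names `tent` and `landscape` are hypothetical, and the k-th largest value is taken with multiplicity.

```python
def tent(a, b, t):
    """The function f_(a,b)(t) = min(t - a, b - t)_+ ."""
    return max(min(t - a, b - t), 0.0)


def landscape(barcode, k, t):
    """lambda_k(B)(t): the k-th largest tent value at t (k = 1, 2, ...),
    and 0 when k exceeds the number of bars."""
    values = sorted((tent(a, b, t) for a, b in barcode), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0
```

For B = {(0, 4), (1, 3)} and t = 2 the two tents take the values 2 and 1, so λ1 (B)(2) = 2, λ2 (B)(2) = 1 and λ3 (B)(2) = 0, in line with the three properties listed above.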

One extremely useful fact about the persistence landscape is that it is compatible
with the various distances assigned to barcode spaces. We let Fp ⊂ F denote
the space of functions with finite Lp -norm || ||p . It is clear that the function P L
takes values in Fp for all p. Recall the definition of the p-Wasserstein distance
Wp between barcodes from Section 5.1. Bubenik now proves the following in
[12].

Theorem 7.2 The function dp (B, B′) = ||P L(B) − P L(B′)||p is a metric on
B. The two metrics Wp+1 and dp generate the same topology on B. It follows
that the map P L is continuous when B is equipped with the metric Wp+1 and
Fp is equipped with the metric associated with the Lp norm.

Remark 7.2 Bubenik also provides explicit inequalities involving the two metrics
in [12].

Bubenik also proves the following continuity theorem.

Theorem 7.3 The map P L is 1-Lipschitz from B equipped with the bottleneck
distance to the space of persistence landscapes equipped with the sup norm
distance. This is equivalent to the algebraic statement

|λk (B)(t) − λk (B′)(t)| ≤ W∞ (B, B′)

for all k and t.

Remark 7.3 The map P L separates points.

7.4 Persistence Images

There is another approach that proceeds by treating a barcode, recoded as a
persistence diagram, as a collection of point masses and then smoothing the
corresponding measure to produce an image, which is finally discretized by
selecting pixels and assigning each pixel the average value of the function on a
box surrounding that pixel. It is described in [2]. The detailed description is
given in several steps. The input is a persistence diagram (it is more natural
to use the persistence diagram view in this case), a collection of points B =
{(x1 , y1 ), . . . , (xn , yn )} ⊂ R2 . We assume we are given a function φ : R2 × R2 → R
so that (a) φu = φ(u, −) is a probability distribution on R2 for each
u ∈ R2 and (b) the mean of φu is u. A standard choice is that of a spherically
symmetric Gaussian with mean u and a fixed variance σ. We also assume we are
given a continuous and piecewise differentiable nonnegative weighting function
f : R2 → R that is zero along the x-axis.

• Apply the coordinate change (x, y) → (x, y − x) = (ξ, η) to R2 , to obtain
the new set of points B = {(ξ1 , η1 ), . . . , (ξn , ηn )} of the same cardinality
in R2 . The points which correspond to short intervals are now all located
near the ξ-axis. The ξ-axis itself corresponds to intervals of length zero.
• Construct a new function ρB : R2 → R, called the persistence surface
of B, defined by
\[ \rho_B(z) = \sum_{i=1}^{n} f(\xi_i, \eta_i)\, \phi((\xi_i, \eta_i), z) \]
Notice that ρB vanishes on the x-axis.


• To construct a finite dimensional representation, we first assume that the
persistence diagrams we will be dealing with will always lie in a bounded
region in R2 , and divide a box containing that region into a square grid.
Construct the vector with coordinates in one to one correspondence with
the squares of the grid, and assign the entry corresponding to a square to
be the integral over that square of ρB .
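The three steps above can be sketched as follows. This is an illustration under explicit assumptions rather than the procedure of [2] in full generality: φ is taken to be the spherically symmetric Gaussian with standard deviation `sigma`, the weighting function defaults to f (ξ, η) = η (which is zero on the ξ-axis), and the grid is the square [lo, hi] × [lo, hi] cut into n × n cells. Because the Gaussian factors into two one-dimensional densities, its integral over each grid square is a product of differences of the normal distribution function, computed here with `math.erf`. The name `persistence_image` is hypothetical.

```python
import math


def normal_cdf(x, mean, sigma):
    """Distribution function of a one-dimensional Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def persistence_image(diagram, lo, hi, n, sigma=0.1, weight=lambda xi, eta: eta):
    """Flattened n*n vector of integrals of the persistence surface over
    the cells of a square grid on [lo, hi] x [lo, hi] (row-major, eta rows)."""
    step = (hi - lo) / n
    cuts = [lo + s * step for s in range(n + 1)]
    image = [0.0] * (n * n)
    for x, y in diagram:
        xi, eta = x, y - x                  # coordinate change (x, y) -> (x, y - x)
        w = weight(xi, eta)
        # Mass of the Gaussian centred at (xi, eta) in each column and row strip.
        col = [normal_cdf(cuts[c + 1], xi, sigma) - normal_cdf(cuts[c], xi, sigma)
               for c in range(n)]
        row = [normal_cdf(cuts[r + 1], eta, sigma) - normal_cdf(cuts[r], eta, sigma)
               for r in range(n)]
        for r in range(n):
            for c in range(n):
                image[r * n + c] += w * row[r] * col[c]
    return image
```

With these choices a bar of length zero has weight 0 and contributes nothing to the image, while a long bar contributes a bump whose total mass equals its length.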

The following is proved in [2].

Theorem 7.4 The map P I : B → RN which assigns to a persistence diagram
a vector using the above procedure is continuous when the metric on B is the
1-Wasserstein distance.

Remark 7.4 In [2], estimates proving this result are given both in the case
of a general choice of φ and the special case where φ is given by Gaussian
distributions with fixed variance. Of course, the estimates in the latter case are
stronger.

8 Generalized Persistence

8.1 Introduction

Persistent homology operates on functors F from the category R+ to simplicial
complexes, by composing them with the homology functor to obtain a persistence
vector space. It is useful to consider other parameter categories for the
diagrams, which can also help clarify the structure of data sets. There are
at least two such constructions that have been discussed. The first is zig-zag
persistence, introduced in [22], and the second is multidimensional persistence,
discussed for example in [20]. The first is designed to study the relationship
between homology of constructions that are not nested within each other, such
as distinct samples from a given space, and the second provides invariants of
situations where it is natural to study filtrations of spaces involving more than
one parameter, such as filtering by both the scale parameter R and a measure
of density. We describe both extensions of the standard persistent homology
methods.

8.2 Zig-zag Persistence

Consider the triangulation of R whose vertices are the integers. The set of ver-
tices of the barycentric subdivision of this simplicial complex is equipped with
a partial ordering, by recognizing that its elements are in one to one corre-
spondence with the simplices in the original triangulation, and that that set is
equipped with a partial ordering by treating it as a subset of the power set of
R. In concrete terms, we may view it as in one to one correspondence with the
set of all integers and half integers, with every integer n being less than or equal
to the elements n ± 1/2. We’ll refer to this partially ordered set as Z. A partially
ordered set (and its corresponding category) is said to be connected if any two
objects can be connected by a zig-zag sequence of morphisms. Connected par-
tially ordered subsets of Z are always determined by a pair (x, y), where x and
y are both integers or half integers, via the rule that assigns to the pair (x, y)
the collection of objects z for which x ≤ z ≤ y (in the total ordering on R).

Definition 8.1 For any category C, a zig-zag persistence object in C is a functor
from a connected subcategory of Z to C. Suppose further that C is equipped
with an object c0 that is both initial and terminal. Then for any connected
subcategory Z0 ⊆ Z and object c ∈ C, we define the interval object for Z0 and c to
be the functor F = FZ0 ,c : Z → C defined on objects by F (x) = c for all x ∈ Z0
and F (x) = c0 for any x ∉ Z0 , and on morphisms by F (x ≤ y) = idc whenever
x, y ∈ Z0 . The behavior on morphisms into or out of Z0 is uniquely determined
by the fact that c0 is both initial and terminal. When C is the category of vector
spaces over a field K, then it is understood that c will be chosen to be a
one-dimensional vector space over K.

Given a zig-zag persistence object in the category of simplicial complexes, we
may apply a homology functor Hi (−; A) for an abelian group A to obtain a zig-zag
persistence object in the category of abelian groups. The category of abelian
groups has the zero object as an object which is both initial and terminal, and
so the notion of interval objects makes sense. Of course if A = K, where K is a
field, then we obtain a zig-zag persistence vector space. It turns out that they
can be classified up to isomorphism.

Theorem 8.1 (P. Gabriel, [38]) Let Z0 ⊂ Z be any finite connected subcategory,
and let F denote any zig-zag persistence object in the category of finite
dimensional vector spaces over a field K defined on Z0 . Then there is a finite
direct sum decomposition
\[ F \cong \bigoplus_i F_i \]
where each Fi is an interval object for Z′ and K, where Z′ ⊆ Z0 is a connected
subcategory of Z0 . Moreover, the decomposition is unique up to isomorphism and
reordering of the summands.

Corollary 8.1 The classification of zig-zag persistence vector spaces based on a
connected subcategory of Z is given by barcodes whose intervals have integer or
half-integer endpoints. We’ll refer to these barcodes as the zig-zag persistence
barcodes of the zig-zag persistence vector space.

This theory with applications is discussed in [22] and [21]. Here are some par-
ticular situations in which this construction can be used.

1. Samples: Given a finite metric space, one can ask to what extent the
persistent homology is captured on smaller samples from the data set. For
example, suppose that we have taken a very large uniform sample X from
the circle, and equip it with a metric by restricting the intrinsic metric
(say) on the circle to these points. We will then with high probability
find that the one-dimensional persistence barcode for the Vietoris-Rips
construction on X will contain one long bar and many much shorter ones.
Suppose that we do not actually know that the sample is coming from
a circle, but simply observe that we obtain a one-dimensional barcode
with one long bar and many shorter ones. A hypothesis suggested by
this observation is that the data is obtained by sampling from a space
with the homotopy type of the circle, but we may wonder if instead it
has somehow appeared “by accident”. One way to provide confirmation
of our hypothesis would be to observe that we obtain the same result
for various subsamples of our space, and that they are compatible in an
appropriate sense. Zig-zag persistence provides a way for carrying this out.
We suppose that we have chosen samples X1 , . . . , Xn ⊆ X, and create a
persistence object in the category of finite metric spaces and distance
non-increasing maps
\[ X_1 \leftarrow X_1 \cap X_2 \rightarrow X_2 \leftarrow \cdots \rightarrow X_{n-1} \leftarrow X_{n-1} \cap X_n \rightarrow X_n \]
If we apply V(−, r) for a fixed choice of r, guided by the beginning and
endpoints of the observed long bar in the barcode for X, we obtain a
persistence object in the category of simplicial complexes. If we further
apply H1 (−, K) for a field K, we obtain a zig-zag persistence K-vector
space. By the classification Theorem 8.1 above, we obtain a decomposition
of the resulting zig-zag persistence K-vector space. The interpretation
of the informal idea that each sample has a one-dimensional homology
class and that they are consistent is the presence of an interval K-vector
space for a relatively long interval within the set {1, 3/2, 2, . . . , n − 1/2, n}, or
equivalently a relatively long bar in the zig-zag persistence barcode.
2. Functions on spaces: Suppose that we have a topological space X
equipped with a continuous map f : X → R+ . Then we have the ordinary
persistence K-vector spaces {Hi (f −1 ([0, r]); K)}r , which encode information
about the evolution of the homology of the sublevel sets of f as r
increases. However, one might be interested in gaining information about
approximations to level sets instead. They can provide more useful invari-
ants in a number of cases, and are approachable through zig-zag persistent
homology. We construct a zig-zag diagram of topological spaces as follows.

\[ f^{-1}(I_0) \leftarrow f^{-1}(1) \rightarrow f^{-1}(I_1) \leftarrow \cdots \rightarrow f^{-1}(I_{N-2}) \leftarrow f^{-1}(N-1) \rightarrow f^{-1}(I_{N-1}) \]

where Ik = [k, k + 1] for all k. Again we can apply Hi (−, K) to this
diagram, and obtain a zig-zag persistence barcode. It contains information
about how the spaces f −1 [k, k + 1] change as k changes, and how they
assemble together. This situation is studied in [21].
3. Witness complexes: One of the problems with the witness complex is
that we have very little theory about the extent to which it reflects accu-
rately the persistent homology of the underlying metric space. A related
problem is that there is no direct relationship between the construction
for two different landmark sets. Even if L1 ⊆ L2 , there are no maps
directly relating W (X, L1 , R) and W (X, L2 , R).
One approach is to attempt to assess in some manner how consistent
the results of the constructions based on L1 and L2 are. It turns out
that given two landmark sets L1 and L2 , it is possible to construct an
intermediate bivariant construction W (X, {L1 , L2 }, R) for which there is
an evident diagram

\[ W(X, L_1, R) \leftarrow W(X, \{L_1, L_2\}, R) \rightarrow W(X, L_2, R) \]

This is itself a short zig-zag diagram of length three, but if we have landmark
sets $\{L_i\}_{i=0}^{N}$ we can clearly construct a longer diagram that includes
the bivariant constructions W (X, {Li , Li+1 }, R) for i = 0, . . . , N − 1. The
construction is quite simple. Its vertex set is L1 × L2 , and a subset

\[ \{ l_1^1 \times l_2^1,\ l_1^2 \times l_2^2,\ \ldots,\ l_1^k \times l_2^k \} \subseteq L_1 \times L_2 \]

spans a simplex in W (X, {L1 , L2 }, R) if and only if there is a point
x ∈ X so that x is a witness for $\{l_1^1, l_1^2, \ldots, l_1^k\}$ and $\{l_2^1, l_2^2, \ldots, l_2^k\}$ in the
complexes W (X, L1 , R) and W (X, L2 , R), respectively. The projections
to W (X, L1 , R) and W (X, L2 , R) are defined in the evident way.

8.3 Multidimensional Persistence

There are many situations where it can be useful to introduce families of spaces
varying with more than one real parameter. For example, in any kind of topolog-
ical analysis of data sets, it usually is the case that if one considers the persistent
homology of the entire data set, the presence of outliers means that we do not
typically obtain the “right homology”. For example, if we have data sampled
from the unit circle, but a small number of points sprinkled throughout the unit
disc, then persistent homology will end up reflecting the homology of the disc
rather than that of the circle. This is often circumvented by selecting only the
points of sufficient density, as measured by a density estimator, since outliers
will typically have very low density. The question then becomes, though, how
to choose the threshold for density. Also, it turns out that in general, there
will be variation through different topologies as one changes the threshold. The
solution to this problem is to attempt to study all thresholds at once, just as
we do when considering the scale parameter in the Vietoris-Rips construction.
This leads us to the following definition.

Definition 8.2 Let C be a category. Then by an n-dimensional persistence
object in C we mean a functor R^n_+ → C, where R^n_+ denotes the n-fold product
of copies of the category R+ .

Example 8.1 Let X be any metric space, and suppose that X is equipped with
a function f : X → R. It might be a density estimator, but it might also be a
measure of centrality. Then if X[s] denotes the subset {x ∈ X|f (x) ≤ s}, we
obtain a family of spaces V(X[s], r), and by applying homology with coefficients
in a field K, we obtain a 2-dimensional persistence K-vector space parametrized
by the pair (r, s). While density is used as described above to remove outliers
or noise, the case of a centrality measure allows one to capture the presence of
the analogues of ends in a finite metric space.
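In degree zero the 2-dimensional persistence vector space of Example 8.1 can be explored with very little machinery, since the dimension of H0 of V(X[s], r) is the number of connected components of its 1-skeleton. The sketch below tabulates that dimension on a grid of parameter pairs (r, s). It is an illustration under assumptions: points are coordinate tuples in Euclidean space, two points span an edge of V(−, r) when their distance is at most r (conventions for the scale parameter vary), and the names `component_count` and `h0_dimension_grid` are hypothetical.

```python
import math


def component_count(points, r):
    """Number of connected components of the Vietoris-Rips complex V(points, r),
    assuming an edge joins two points at distance <= r."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def h0_dimension_grid(points, f_values, r_values, s_values):
    """dim H_0(V(X[s], r)) for each (r, s), where X[s] = {x : f(x) <= s}."""
    grid = {}
    for s in s_values:
        subset = [p for p, v in zip(points, f_values) if v <= s]
        for r in r_values:
            grid[(r, s)] = component_count(subset, r)
    return grid
```

The table records only dimensions, not the maps between the vector spaces, so it is a coarse summary rather than the full 2-dimensional persistence vector space.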

Example 8.2 Multidimensional persistent homology can also be used to capture
geometric information that is not strictly topological (see [17]). For example,
given a Riemannian manifold, one can compute Gaussian curvature at
each point and filter by that quantity. Considering the entire manifold, one can
obtain a one-dimensional persistence vector space by applying homology over
a field K. For computational purposes, though, we would need to deal with a
sample and use a second parameter, namely the scale parameter in a Vietoris-
Rips complex. This kind of analysis can for example be used to distinguish
between various ellipsoids.

The equivalence of categories in Proposition 3.1 has the following straightforward
analogue.

Proposition 8.1 The category of n-dimensional persistence vector spaces over
K is equivalent to the category of R^n_+ -graded modules over the R^n_+ -graded ring
K[R^n_+ ]. The analogous result with R^n_+ replaced by N^n also holds.

Although useful, this result does not give us a classification of multidimensional
persistence K-vector spaces analogous to the barcode classification that works
in the one-dimensional case. The reason can be understood as analogous to the
commutative algebraic situation, where finitely generated K[x]-modules can be
classified because K[x] is a principal ideal domain, but K[x1 , . . . , xn ]-modules
cannot be classified for n ≥ 2. In a sense it is provable that there is no classifi-
cation strictly analogous to the n = 1 case, because the classification in the case
n ≥ 2 depends on the structure of the field K, as is demonstrated in [20]. This
is not the case when n = 1, because the classification is always by barcodes, and
that is independent of the structure of K.

Although there is no complete classification of multidimensional persistence vector
spaces, there exist interesting invariants. For any k-dimensional persistence
K-vector space $\{V_x\}_{x \in \mathbb{R}^k}$, and pair of points $x, y \in \mathbb{R}^k$, with x ≤ y in the natural
partial ordering on $\mathbb{R}^k$, we can define r(x, y) to be the rank of the linear
transformation $V_x \to V_y$. We extend the definition to all pairs $x, y \in \mathbb{R}^k$ by setting
r(x, y) = 0 when x is not less than or equal to y. The function r is therefore
defined on $\mathbb{R}^k \times \mathbb{R}^k$, and we refer to it as the rank invariant. In the case k = 1,
the rank invariant is complete, in that it differentiates between distinct bar-
codes. There is an analogue for multidimensional persistence K-vector spaces
of the results described in Section 7.2. To state it, we first define an analogue of
barcodes. By a cube in $\mathbb{R}^k$, we mean a set of the form $I_1 \times I_2 \times \cdots \times I_k$, where
each $I_s$ is an interval $[a_s, b_s]$, and write $C(a_1, \ldots, a_k, b_1, \ldots, b_k)$ for this cube.
We write $C_k$ for the set of all k-dimensional cubes. There is a straightforward
analogue to the space B defined in Section 7.1. We first define $B_n = Sp^n(C_k)$,
and define $B(k)_+ = \coprod_n Sp^n(C_k)$. Next, we say a cube $C(a_1, \ldots, a_k, b_1, \ldots, b_k)$ is
negligible if $a_i = b_i$ for some i, and define an equivalence relation ∼ on $B(k)_+$
to be the equivalence relation generated by the relation
$$\{C_1, C_2, \ldots, C_n\} \sim \{C_1, \ldots, C_{n-1}\} \quad \text{for all negligible } C_n$$
We define B(k) to be $B(k)_+/\sim$. Every equivalence class under ∼ has a unique
minimal representative consisting entirely of non-negligible cubes. Let M(k)
denote the set of isomorphism classes of k-dimensional persistence K-vector
spaces. For each cube $C = C(a_1, \ldots, a_k, b_1, \ldots, b_k)$ we let µ(C) denote the
isomorphism class of the k-dimensional persistence K-vector space $\{V_{\vec{x}}\}_{\vec{x} \in \mathbb{R}^k_+}$
for which $V_{\vec{x}} = K$ whenever $\vec{x} \in C$, $V_{\vec{x}} = \{0\}$ whenever $\vec{x} \notin C$, and for which all
induced morphisms $V_{\vec{x}} \to V_{\vec{y}}$ for $\vec{x} \le \vec{y}$ and $\vec{x}, \vec{y} \in C$ are equal to the identity.
There is an obvious map θ : B(k) → M(k) which assigns to each minimal
representative {C1 , . . . , Cn } the direct sum ⊕i µ(Ci ).
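
To make the rank invariant concrete for modules in the image of θ, here is a minimal Python sketch. It uses the observation that for a single cube C the map from V_x to V_y is the identity on K when both points lie in C and is zero otherwise, so the rank for a direct sum counts the cubes containing both points. The representation of a cube as a pair of coordinate tuples and the function names are assumptions made for illustration.

```python
def in_cube(x, cube):
    """True when x lies in the cube (a, b), i.e. a_s <= x_s <= b_s for all s."""
    a, b = cube
    return all(lo <= xi <= hi for lo, xi, hi in zip(a, x, b))


def leq(x, y):
    """Natural partial order on R^k: x <= y coordinatewise."""
    return all(xi <= yi for xi, yi in zip(x, y))


def rank_invariant(cubes, x, y):
    """r(x, y) for the direct sum of the modules mu(C), C in cubes.

    By the convention in the text, r(x, y) = 0 unless x <= y. Otherwise each
    cube containing both x and y contributes rank one, and every other cube
    contributes zero.
    """
    if not leq(x, y):
        return 0
    return sum(1 for c in cubes if in_cube(x, c) and in_cube(y, c))


# Two overlapping squares in R^2 (hypothetical example data).
cubes = [((0, 0), (2, 2)), ((1, 1), (3, 3))]
r_inside_both = rank_invariant(cubes, (1, 1), (2, 2))
r_inside_one = rank_invariant(cubes, (0, 0), (2, 2))
r_not_comparable = rank_invariant(cubes, (2, 2), (1, 1))
```

For general multidimensional persistence vector spaces, which need not decompose into cube summands, the rank has to be computed from the linear maps themselves; this sketch covers only the cube case.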

Theorem 8.2 The constructions above satisfy the following properties.

1. The set B(k) is a subset of the set of real points of an affine scheme
Spec(A).
2. The ring A is complicated, but there is a $\mathbb{G}_m$-action on Spec(A) that allows
us to define a more manageable subring $A_{fin} \subseteq A$.
3. $A_{fin}$ is isomorphic to the polynomial ring $\mathbb{R}[x_{\vec{a},\vec{b}}]$ where $\vec{a}$ and $\vec{b}$ are
k-vectors of integers for which $a_i \ge 1$ and $b_i \ge 0$ for all i.
4. The ring $A_{fin}$ separates points in B(k), and maps injectively to the ring
of all real-valued functions on B(k).
5. There is a natural lift of the ring homomorphism $A_{fin} \to F(B(k), \mathbb{R})$ along
θ to a ring homomorphism $j : A_{fin} \to F(M(k), \mathbb{R})$. Here $F(X, \mathbb{R})$ denotes the
ring of real valued functions on a set X.
6. For any $\alpha \in A_{fin}$, the function $j(\alpha) : M(k) \to \mathbb{R}$ factors through the
rank invariant. For any two elements X, Y ∈ M(k), if for all $\alpha \in A_{fin}$,
j(α)(X) = j(α)(Y), then the rank invariants of X and Y are equal.

This result gives one approach to the study of multidimensional persistence.
There is a great deal of other work on this topic. The paper [24] deals with
computing persistent homology using commutative algebra techniques, via the
equivalence of categories from Proposition 8.1. The authors demonstrate that the
multigrading yields significant simplification. In [62] and [24], an algebraic
framework is constructed for dealing with the fact that there is always noise
in the applications to data analysis. In [14], multidimensional persistence is
studied by examining the family of all one dimensional persistence modules ob-
tained by considering lines with varying angles in the persistence domain. M.
Lesnick in [50] has defined metric properties of the set of isomorphism classes of
multidimensional persistence vector spaces, and proven uniqueness results for
them. Finally, in [51], software is developed for visualization and interrogation
of two-dimensional persistence vector spaces.

One generalization that has not been studied yet is to multidimensional persistence
where some of the persistence directions might be “zig-zag” directions
rather than ordinary persistence directions. Formally, this would mean functors
from the categories of the form $\mathbb{R}^m_+ \times Z^n$. This would be very useful in a number
of situations. For example, in the zig-zag constructions discussed in Section 8
for samples and for witness complexes, we were forced to choose a threshold for
the scale parameter. If we had a way of representing functors from R+ × Z to
vector spaces, we would not be forced to make this selection.

9 Coverage and Evasion Problems

There is an interesting set of technologies used for sensing of various kinds
called sensor nets. A sensor net consists of a collection of sensors distributed
throughout a domain. The sensors are very primitive in the sense that they are
only capable of sensing the presence of an intruder or of another sensor within
a fixed detection radius R. We also assume each sensor is given an identifying
label or number, and that other sensors can sense that identifier when they
are within the radius R of each other. One problem of interest is whether or
not the balls of radius R cover the domain, and it does not have an immediate
solution due to the fact that the positions of the sensors are not available.
V. de Silva and R. Ghrist have developed a very interesting method for addressing
this problem based on persistent homology (see [31] and [32]). The rough idea
is as follows. Suppose that one has a domain D in the plane, with a connected
and compact curve boundary ∂D, that one has a collection of points $\{v_i\}_{i \in I}$
in the region, one for each sensor, and that one knows in some way that ∂D
is covered by the open balls $B_R(v_i)$. Let U denote the family of open subsets
$\{B_R(v_i) \cap D\}_{i \in I}$ of D, and let $U^\partial$ denote the covering $\{B_R(v_i) \cap \partial D\}_{i \in I}$ of ∂D.
Suppose further that one knows that the spaces

$$B_R(v_{i_1}) \cap \cdots \cap B_R(v_{i_s}) \cap D \quad \text{and} \quad B_R(v_{i_1}) \cap \cdots \cap B_R(v_{i_s}) \cap \partial D$$

are all either contractible or empty, for all choices of subsets $\{i_1, \ldots, i_s\} \subseteq I$.
The conditions assure that D is homotopy equivalent to the nerve of U, and that
∂D is homotopy equivalent to the nerve of the covering $U^\partial$, as a consequence of
the nerve theorem. It further assures that the pair $(N.U, N.U^\partial)$ is equivalent to
the pair (D, ∂D). For any field K, the relative group $H_2(D, \partial D; K) \cong K$, since D
is a connected orientable manifold with boundary ∂D, and it follows that the
relative group $H_2(N.U, N.U^\partial; K) \cong K$. On the other hand, suppose that the
sets $B_R(v_i)$ do not cover all of D, and let $D' \subseteq D$ denote the union

$$\bigcup_i B_R(v_i) \cap D$$

The space D′ is a non-compact manifold with boundary, and consequently the
relative group $H_2(D', \partial D; K) \cong 0$. As above, it will follow that

$$H_2(N.U, N.U^\partial; K) \cong 0$$

Consequently, the simplicial complex of the nerve of the covering, which can be
computed using the information available from the sensors, determines whether
or not we have a covering based on its simplicial homology. The conditions on
the coverings given above are of course impossible to verify, but de Silva and
Ghrist are able to formulate a persistent homology condition that is a reasonable
substitute, and which gives a homological criterion in terms of a 2-dimensional
persistent homology group which is sufficient to guarantee coverage.

In order to understand the result in [32], we first observe that the information
from the sensors does not give us access to the Čech complex, since we have no
way of determining the intersection of balls without precise knowledge of the
distances between their centers. However, we do have access to the Vietoris-Rips
complex, since for any pair of points, we can tell whether or not they are within
a distance R, where R is the detection radius. We also have the comparison
results for the Vietoris-Rips complex and the Čech complex given in Proposition
4.1. The overall idea in [31] and [32] is to leverage the relationship between the
complex we have access to (Vietoris-Rips) and the complex from which we can
deduce coverage (Čech). In order to formulate such a result, we assume that we
are attaching a second number to each sensor, namely its covering radius $R_c$.
It is understood that each sensor covers a disc of radius $R_c$ around it, and that
it can detect other sensors at the detection radius R given above. We further
assume that $R_c \ge R/\sqrt{3}$. This allows us to guarantee that if $\{x_0, x_1, x_2\}$ forms a
two simplex in the Vietoris-Rips complex V(X, R), then they span a two simplex
in $\check{C}(X, R_c)$, by the second statement in Proposition 4.1. There are now the
following assumptions made in [32].

1. The sensors lie in a compact connected domain $D \subseteq \mathbb{R}^2$, whose boundary
∂D is connected and piecewise linear with vertices called fence nodes.
2. Each fence node v is within R of its fence node neighbors on ∂D.

The following theorem is proved in [31] and [32].

Theorem 9.1 Let R denote the Vietoris-Rips complex of the set of all sensors,
and let F denote the subcomplex on the fence vertices. If the sensors satisfy
the conditions above, and if there exists [α] ∈ H2 (R, F) so that ∂([α]) ≠ 0,
where ∂ : H2 (R, F) → H1 (F) is the connecting homomorphism, then the balls
of radius Rc around the sensors cover U .

This theorem is in some situations not ideal, due to the strong assumptions
on the boundary. In [32], it is shown that the persistent homology
of the pair (R, F) can be used to obtain coverage results with much weaker
hypotheses.
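
Two ingredients of the discussion leading up to Theorem 9.1 can be sketched in Python: the 2-simplices of the Vietoris-Rips complex are computable from pairwise detection alone, and three points with pairwise distances at most R fit in a disc of radius at most R/√3 (Jung's theorem in the plane), which is why the assumption on the covering radius places each such triple in the Čech complex. The coordinates below are hypothetical and are used only to simulate the detection relation and to check the bound; actual sensors would not have access to them. All names are illustrative.

```python
from itertools import combinations
from math import dist, sqrt


def rips_triangles(n, detects):
    """2-simplices of the Vietoris-Rips complex from pairwise detection only.

    detects(i, j) reports whether sensors i and j sense each other, i.e.
    whether they lie within the detection radius R. No coordinates are used.
    """
    return [t for t in combinations(range(n), 3)
            if all(detects(i, j) for i, j in combinations(t, 2))]


def enclosing_radius(p, q, r):
    """Radius of the smallest disc containing three points in the plane."""
    a, b, c = sorted([dist(q, r), dist(p, r), dist(p, q)])
    if a * a + b * b <= c * c:      # right or obtuse: longest side is a diameter
        return c / 2
    s = (a + b + c) / 2             # acute: circumradius abc / (4 * area)
    area = sqrt(s * (s - a) * (s - b) * (s - c))
    return a * b * c / (4 * area)


R = 1.0  # detection radius (assumed value)
positions = [(0.0, 0.0), (0.9, 0.0), (0.45, 0.7), (3.0, 3.0)]  # hypothetical


def detects(i, j):
    """Simulated detection relation for the hypothetical positions."""
    return dist(positions[i], positions[j]) <= R


triangles = rips_triangles(len(positions), detects)
radii = [enclosing_radius(*(positions[i] for i in t)) for t in triangles]
bound_holds = all(rad <= R / sqrt(3) for rad in radii)
```

This does not implement the homological criterion of Theorem 9.1 itself, which additionally requires computing the relative homology of the pair (R, F) and the connecting homomorphism.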

Another interesting direction is the study of time varying situations, where the
sensors move in time. In this case, there are situations where the balls around
the sensors do not cover the region at any fixed time, but no “evader”
can avoid being sensed at some time. This kind of problem is referred to as an
evasion problem, and has been studied in [1] and [39]. The two approaches are
quite distinct: the approach in [1] uses zig-zag persistence, while the approach
in [39] develops a new kind of cohomology with semigroup coefficients. The
approach in [39] yields “if and only if” results.

10 Probabilistic Analysis

10.1 Random Complexes

A very interesting direction of research is the study of the distributions on the
space of barcodes B (defined in Section 5) obtained by sampling points from
Euclidean space using various models of randomness, i.e. sampling from various
distributions on Rn , or using the theory of random graphs [11]. Since we do not
have a library of well understood distributions on B, one can instead study the
distributions on the real line obtained by pushing forward a distribution on B
along a map from B to R. An interesting such map is
$$B = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \mapsto \lambda(B) = \max\left\{\frac{y_1}{x_1}, \ldots, \frac{y_n}{x_n}\right\}$$
In fact, this map is only defined for barcodes in B for which the endpoints of
the intervals in the barcode all lie in {x ∈ R|x > 0}. When one is computing
homology in dimensions ≥ 1 of a Čech or Vietoris-Rips complex of a set of points
in Euclidean space, the barcode satisfies this property. A very interesting result
in this direction is proved in [10]. They proceed by sampling from [0, 1]d using
a uniform Poisson process of intensity n. This means that the sampling is done
from a uniform distribution on [0, 1]d , but that the number of points sampled
is governed by a Poisson distribution. The description of all these notions is
beyond the scope of the present paper, but we refer the reader to [45]. The
main result of [10] is the following.

Theorem 10.1 We suppose that d and n are as above, that d ≥ 2, that we are
computing the k-dimensional persistence barcode B, and that 1 ≤ k ≤ d − 1. Let
$$\Delta_k(n) = \left(\frac{\log n}{\log \log n}\right)^{1/k}$$
Then there exist constants $A_k$ and $B_k$ so that
$$\lim_{n \to \infty} P\left(A_k \le \frac{\Pi_k(n)}{\Delta_k(n)} \le B_k\right) = 1$$
where P denotes probability, and where Πk (n) denotes the value of λ(B) for a
barcode generated as above.
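
The two quantities compared in Theorem 10.1 are simple to evaluate once a barcode is available. The following Python sketch, with illustrative function names, computes λ(B) for a barcode given as a list of (x, y) pairs and the normalizing quantity ∆k(n). Raising an error for empty barcodes or non-positive endpoints is a choice made here, reflecting the remark that λ is only defined when all endpoints are positive.

```python
from math import log


def max_ratio(barcode):
    """lambda(B) = max(y_i / x_i) over the intervals (x_i, y_i) of a barcode."""
    if not barcode or any(x <= 0 or y <= 0 for x, y in barcode):
        raise ValueError("lambda(B) needs a nonempty barcode with positive endpoints")
    return max(y / x for x, y in barcode)


def delta(k, n):
    """Delta_k(n) = (log n / log log n) ** (1 / k).

    Requires n > e so that log log n is positive.
    """
    return (log(n) / log(log(n))) ** (1.0 / k)


# Hypothetical barcode with intervals (1, 2) and (2, 3).
lam = max_ratio([(1.0, 2.0), (2.0, 3.0)])
```

The theorem concerns the ratio of the value of λ on a randomly generated barcode to ∆k(n) as n grows; the constants Ak and Bk are not given in the text and are not estimated here.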

10.2 Robust Estimators

The stability theorem in Section 5.2 deals with the effect on persistence barcodes
of small perturbations in the metric space, where perturbations are small in the
sense of the Gromov-Hausdorff distance. In reality, though, one expects that in
a perturbation of a metric space, a small number of distances may undergo rela-
tively large perturbations. However, one believes that the number will be small,
and that the points involved will be of small measure in an underlying mea-
sure. In order to deal with this problem, one incorporates a measure-theoretic
component in one’s definitions.

Definition 10.1 By a metric measure space we will mean a complete separable
metric space M equipped with a Borel measure µ.

In [41], a metric $d_{GPr}$ called the Gromov-Prokhorov metric is introduced on the
measure preserving isometry classes of compact metric measure spaces. It is
constructed by combining the Gromov-Hausdorff metric on the isometry classes
of compact metric spaces with the Prokhorov metric $d_{Pr}$ on measures on a fixed
metric space (see [58]) by methods which we will not discuss here.

The paper [8] studies the distributions on the space of persistence barcodes
arising from the persistence barcodes obtained by sampling from a fixed metric
measure space. More precisely, they study distributions on the completion
$\overline{B}$ of the metric space B equipped with the bottleneck distance. Let µ(n, k, X)
denote the distribution on B which arises from sampling a set S of n points on
a metric measure space X, and computing the k-dimensional barcode on S. We
have the following.

Theorem 10.2 The inequality

$$d_{Pr}(\mu(n, k, X), \mu(n, k, X')) \le n \, d_{GPr}(X, X')$$

holds for all compact metric measure spaces X and X′.

This result is then used in [8] to develop robust statistics for distinguishing the
results of sampling from a fixed metric space. Robust statistics are computable
quantities attached to samples from distributions that are relatively insensitive
to small changes in parameter values in the distribution from which the samples
are gathered, and also to the presence of outliers. An elementary example of this
idea is the median, which is relatively insensitive to outliers and is considered a
robust statistic, while the mean is not. For the problem at hand, [8] defines a
precise notion of robustness.
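
The contrast between the median and the mean can be seen on a toy data set. The sketch below uses made-up numbers and Python's standard `statistics` module to measure how far each statistic moves when a single gross outlier is added.

```python
from statistics import mean, median

sample = [1.0, 2.0, 3.0, 4.0, 5.0]
contaminated = sample + [1000.0]   # one gross outlier (hypothetical)

# How far each statistic moves when the outlier is added.
mean_shift = abs(mean(contaminated) - mean(sample))
median_shift = abs(median(contaminated) - median(sample))
```

Here the median moves by half a unit while the mean moves by more than a hundred units, which is the informal sense in which the median is robust and the mean is not.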

Definition 10.2 Let f be a function from the set of isomorphism classes of
finite metric spaces to a metric space $(W, d_W)$. We say that f is robust with
robustness coefficient r > 0 if for any nonempty finite metric space $(X, d_X)$,
there exists a bound δ such that for all isometric embeddings of X in a finite
metric space $(X', d_{X'})$ for which #(X′)/#(X) < 1 + r, it is the case that
$d_W(f(X), f(X')) < \delta$. There is a corresponding uniform notion that states that
the bound δ may be chosen universally, for all X.

We now obtain the following result for finite metric spaces, which are being
regarded as metric measure spaces by assigning to each metric space the uniform
measure.

Theorem 10.3 ([8]) For fixed n and k, µ(n, k, −) is uniformly robust with ro-
bustness coefficient r and estimate bound δ = nr/(1 + r) for any r.

Since the space B is relatively inaccessible, it is useful to use this result to
construct real valued statistics that also satisfy the robustness property. One
way to do this is to consider a fixed reference distribution P on B, and define
∆P (n, k, X) to be the number dP r (µ(n, k, X), P).

Theorem 10.4 ([8]) For fixed n and k, and P, ∆P (n, k, −) is uniformly robust
with robustness coefficient r and estimate bound nr/(1 + r) for any r.

It is also possible to obtain a somewhat simpler result, which does not require
calculation of the full distribution µ(n, k, X). Instead of fixing a reference
distribution P, we choose a reference barcode B ∈ B, and define $\Delta^{med}_B(n, k, X)$ to
be the median of the distribution of $d_B(B, -)$ applied to samples of k-dimensional
barcodes attached to samples of size n taken from the metric measure space X.

Theorem 10.5 For fixed n, k, and B, the function $\Delta^{med}_B(n, k, -)$ from finite
metric spaces (with uniform probability measure) is robust with robustness coef-
ficient greater than ln 2/n.

10.3 Random Fields

Suppose that we have a function f on a manifold. We have seen in Section 5
that we can associate to f the persistence vector spaces $\{H_i(f^{-1}((-\infty, r]), K)\}_r$
for i a non-negative integer and K a field, and further that there are stability
results that show that small changes in the function f lead to small changes, as
measured by the bottleneck distance, in the corresponding sublevel set barcode.
One can also ask, though, what the expected behavior of various of the features
attached to barcode space is for a class of functions chosen at random. An initial
question is what one means by a function chosen at random. The notion of a
random field is defined in [4] as follows.

Definition 10.3 By a real-valued random field on a topological space T we
mean a measurable mapping
$$F : \Omega \to \mathbb{R}^T$$
where $\mathbb{R}^T$ denotes the set of all real-valued functions on T and $(\Omega, \mathcal{F}, P)$ is a
complete probability space. F creates a probability measure on $\mathbb{R}^T$, from which
one can sample. Similarly, one can define an s-dimensional vector-valued
random field as a measurable mapping
$$F : \Omega \to (\mathbb{R}^T)^s$$

The idea here is that a random field assigns to each t ∈ T a distribution on R,
rather than a fixed value. Each of
the restrictions $\mathbb{R}^T \to \mathbb{R}^{\{t\}} \cong \mathbb{R}$ produces a random variable, and therefore the
corresponding distribution, which we denote $F_t$. In fact, for any finite set of
points $t_1, \ldots, t_n \in T$, we obtain a distribution on $\mathbb{R}^n$ which we denote $F_{t_1,\ldots,t_n}$.
There is a particular class of random fields called the Gaussian random fields
that is particularly amenable to analysis.

Definition 10.4 A real valued random field F is a Gaussian random field if
for all n, the n-dimensional distributions $F_{t_1,\ldots,t_n}$ are multivariate Gaussian
distributions on $\mathbb{R}^n$. An s-dimensional random field is Gaussian if all the
distributions $F_{t_1,\ldots,t_n}$ are multivariate Gaussian distributions on $\mathbb{R}^{n \times s}$. Note that
in Gaussian fields, the behavior of the random function is completely determined
by the expectation function $m(t) = E(F_t)$, and the covariance function C.

The first example of this construction comes out of work of Wiener (see [67]
and [6]), using analysis of Brownian motion. Wiener studied the case $T = \mathbb{R}_+$,
and produced a Gaussian random field W, where the expected value of $W_t$ is
always 0 and the covariance is given by C(s, t) = min(s, t). He also showed that
when one samples from the associated measure on $\mathbb{R}^{\mathbb{R}_+}$, one obtains continuous
functions with probability 1, and so one calls W a continuous Gaussian field.
Given any analytic property of functions on manifolds, such as k-th order dif-
ferentiability, smoothness, or the property of being a Morse function, one can
create and study Gaussian random fields whose samples have the given property
with probability 1. Further, there are frequently a priori conditions on the co-
variance function of the random field that can be readily verified, and guarantee
the satisfaction of such properties.
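
The statement that Wiener's field has mean zero and C(s, t) = min(s, t) can be checked numerically at finitely many times. The following Python sketch samples the field at a few times using independent Gaussian increments (the standard construction of Brownian motion, assumed here rather than taken from the text) and estimates the covariance by Monte Carlo. The sample size, seed, and function names are arbitrary illustrative choices.

```python
import random
from math import sqrt


def sample_wiener(times, rng):
    """One sample of (W_t) at increasing positive times, built from
    independent Gaussian increments with variance t_j - t_{j-1}."""
    values, w, prev = [], 0.0, 0.0
    for t in times:
        w += rng.gauss(0.0, sqrt(t - prev))
        values.append(w)
        prev = t
    return values


def empirical_covariance(times, n_samples, seed=0):
    """Monte Carlo estimate of E[W_s W_t] for all pairs of the given times.

    Since the mean of W is zero, this expectation is the covariance C(s, t).
    """
    rng = random.Random(seed)
    k = len(times)
    sums = [[0.0] * k for _ in range(k)]
    for _ in range(n_samples):
        w = sample_wiener(times, rng)
        for i in range(k):
            for j in range(k):
                sums[i][j] += w[i] * w[j]
    return [[v / n_samples for v in row] for row in sums]


times = [0.5, 1.0, 2.0]
cov = empirical_covariance(times, n_samples=20000)
```

Each entry of `cov` should be close to the minimum of the two corresponding times, up to sampling error. The sketch only probes the finite-dimensional distributions; continuity of sample paths is a separate statement that it does not address.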

The paper [9] proves a result concerning the persistent homology of the sublevel
sets of functions sampled from Gaussian random fields. We consider the real
valued function σ on barcodes given by
$$\sigma\{[a_1, b_1], \ldots, [a_n, b_n]\} = \sum_i (b_i - a_i)$$

For any fixed x ∈ R and barcode $\beta = \{[a_1, b_1], \ldots, [a_n, b_n]\}$, we define the
x-truncation of β, β[x], to be the barcode
$$\{[a_1, \min(b_1, x)], \ldots, [a_n, \min(b_n, x)]\}$$
where it is understood that for any i such that $x \le a_i$, the interval $[a_i, b_i]$ is
simply deleted. Finally, we define
$$\chi_{pers}(M, f, x) = \sum_{i \ge 0} (-1)^i \sigma\left(\{H_i(f^{-1}((-\infty, r]), K)\}_r [x]\right)$$

In [9], the following result is proved concerning the distribution of χpers (M, f, x)
for Gaussian random fields on Riemannian manifolds which produce Morse func-
tions with probability one.
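
The three definitions above translate directly into code. The following Python sketch represents a barcode as a list of (a, b) pairs and assumes the sublevel set barcodes in each homological degree have already been computed by some other means; the function names and the example barcodes are illustrative.

```python
def total_length(barcode):
    """sigma: sum of the lengths b_i - a_i of the intervals [a_i, b_i]."""
    return sum(b - a for a, b in barcode)


def truncate(barcode, x):
    """x-truncation: clip right endpoints at x, delete intervals with x <= a_i."""
    return [(a, min(b, x)) for a, b in barcode if a < x]


def persistent_euler_characteristic(barcodes_by_degree, x):
    """Alternating sum over i of sigma applied to the x-truncated degree-i barcode.

    barcodes_by_degree[i] is assumed to be the barcode of the sublevel set
    persistence vector space in homological degree i.
    """
    return sum((-1) ** i * total_length(truncate(bc, x))
               for i, bc in enumerate(barcodes_by_degree))


# Hypothetical barcodes: one bar in degree 0 and one bar in degree 1.
barcodes = [[(0.0, 5.0)], [(1.0, 3.0)]]
chi_at_2 = persistent_euler_characteristic(barcodes, 2.0)
chi_at_half = persistent_euler_characteristic(barcodes, 0.5)
```

Because truncation clips every right endpoint at x, the same code handles bars with infinite right endpoint (represented as `float("inf")`), which contribute length x minus their left endpoint.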

Theorem 10.6 Let M be a closed d-dimensional Riemannian manifold, and let
F be a smooth real valued Gaussian random field so that F is Morse with
probability 1, and so that the mean is identically zero and the variance is identically
equal to 1. Then for any x ∈ R, we have
$$E\{\chi_{pers}(M, f, x)\} = \chi(M)\left(\varphi(x) + x\Phi(x)\right) + \varphi(x) \sum_{j=1}^{d} (2\pi)^{-j/2} L_j(M) H_{j-2}(-x)$$
where

1. $H_n$ denotes the n-th Hermite polynomial.
2. $\varphi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ is the density function for the standard Gaussian
distribution.
3. $\Phi(x) = \int_{-\infty}^{x} \varphi(u)\,du$
4. $L_j(M)$ denotes the j-th Lipschitz-Killing curvature of M (defined for
example in [4], Section 7.6) with respect to a metric constructed from the
covariance metric attached to F.

Remark 10.1 The point of this result is that it gives a theoretical estimate for
the persistent Euler characteristic of sublevel sets in terms of classical invariants
of the manifold. Also, the result in [9] is actually proved in a much more general
context, that of regular stratified spaces and stratified Morse theory, which in
particular permits the study of manifolds with boundary. It also includes the
study of random fields that are of the form G ◦ (F1 , . . . , Fk ), where G is a de-
terministic function from Rk to R, and (F1 , . . . , Fk ) is a vector-valued Gaussian
random field.

References
[1] H. Adams and G. Carlsson, Evasion paths in mobile sensor networks, The
International Journal of Robotics Research, 34,1, 2015, 90-104.
[2] H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman,
S. Chepushtanova, E. Hanson, F. Motta, and L. Ziegelmeier, Persistence
images: a stable vector representation of persistent homology, J. Machine
Learning Research, 18, 2017, 1-35.
[3] A. Adcock, E. Carlsson, and G. Carlsson, The ring of algebraic functions
on persistence barcodes, Homology, Homotopy, and Applications, vol. 18,
2016, 381-402.
[4] R. Adler and J. Taylor, Random Fields and Geometry, Springer, 2009.
[5] N. Akkiraju, H. Edelsbrunner, M. Facello, P. Fu, E. Mucke, and C. Varela,
Alpha shapes: definition and software, In Proc. Internat. Comput. Geom.
Software Workshop 1995.
[6] P. Baldi, Stochastic Calculus, an Introduction Through Theory
and Exercises, Springer Universitext, 2017.
[7] S. Barannikov, The framed Morse complex and its invariants, Adv. Soviet
Math., vol. 21, 1994, 93-115.
[8] A. Blumberg, I. Gal, M. Mandell, and M. Pancia, Robust statistics, hy-
pothesis testing, and confidence intervals for persistent homology on met-
ric measure spaces, Foundations of Computational Mathematics, 14, 2014,
745-789.
[9] O. Bobrowski and M. Borman, Euler integration of Gaussian random fields
and persistent homology, Journal of Topology and Analysis, 4,1,2012, 49-
70.
[10] O. Bobrowski, M. Kahle, and P. Skraba, Maximally persistent cycles in
random geometric complexes, Annals of Applied Probability, 27, 4, 2017,
2032-2060.
[11] B. Bollobás, Random Graphs, second edition, Cambridge University
Press, 2011.
[12] P. Bubenik, Statistical topological data analysis using persistence land-
scapes, The Journal of Machine Learning Research 16 (1), 2015, 77-102
[13] L. Buhovsky, V. Humilière, S. Seyfaddini, The action spectrum and $C^0$
symplectic topology, arXiv:1808.09790, 2018
[14] F. Cagliari, B. Di Fabio, and M. Ferri, One-dimensional reduction of mul-
tidimensional persistent homology, Proc. Amer. Math. Soc. 138 (8), 2010,
3003-3017.

[15] Z. X. Cang, Lin Mu and G. Wei, Representability of algebraic topology for
biomolecules in machine learning based scoring and virtual screening, PLOS
Computational Biology, 14(1),2018, e100592.
[16] Z. X. Cang and G. Wei, TopologyNet: Topology based deep convolutional
and multi-task neural networks for biomolecular property predictions, PLOS
Computational Biology, 13(7), 2017, e1005690.
[17] G. Carlsson, A. Zomorodian, A. Collins, and L. Guibas, Persistence bar-
codes for shapes, International Journal of Shape Modeling, 11 (02), 2005,
149-187.
[18] G. Carlsson, T. Ishkhanov, V. De Silva, and A. Zomorodian, On the local
behavior of spaces of natural images, International Journal of Computer
Vision, vol. 76, 1, 2008, 1-12.
[19] G. Carlsson, Topology and data, Bull. Amer. Math. Soc., 46 (2), 2009, 255-308.
[20] G. Carlsson and A. Zomorodian, The theory of multidimensional persis-
tence, Discrete and Computational Geometry 42 (1), 2009,71-93.
[21] G. Carlsson, V. de Silva, and D. Morozov, Zigzag persistent homology and
real-valued functions, Proceedings of the twenty-fifth symposium on com-
putational geometry, ACM, 2009, 247-256.

[22] G. Carlsson and V. de Silva, Zigzag persistence, Foundations of Computa-
tional Mathematics, 10 (4), 2010, 367-405
[23] G. Carlsson, Topological pattern recognition for point cloud data, Acta Nu-
merica, vol. 23, 2014, 289-368
[24] W. Chacholski, M. Scolamiero, and F. Vaccarino, Combinatorial presenta-
tion of multidimensional persistent homology, J. Pure and Applied Algebra,
221 (5), 2017, 1055-1075.
[25] J. Chan, G. Carlsson, and R. Rabadan, Topology of viral evolution,
Proceedings of the National Academy of Sciences, 2013,
https://doi.org/10.1073/pnas.1313480110

[26] F. Chazal, D. Cohen-Steiner, L. Guibas, F. Mémoli, and S. Oudot, Gromov-
Hausdorff stable signatures for shapes using persistence, Eurographics Sym-
posium on Geometry Processing, vol. 28, 5, 2009.
[27] C. Chong and S. Kumar, Sensor networks: evolution, opportunities, and
challenges, Proceedings of the IEEE, 91,8, 2003, 1247-1256.

[28] D. Cohen-Steiner, H. Edelsbrunner, and J. Harer, Stability of persistence
diagrams, Discrete and Computational Geometry, vol. 37, 1, 2007, 103-120.
[29] D. Cohen-Steiner, H. Edelsbrunner, J. Harer, and Y. Mileyko, Lipschitz
functions have Lp -stable persistence, Foundations of Computational Math-
ematics, vol. 10,2, 2010, 127-139.
[30] V. De Silva and G. Carlsson, Topological estimation using witness com-
plexes, Symposium on Point Based Graphics, ETH, Zürich, Switzerland,
2004.
[31] V. De Silva and R. Ghrist, Coverage in sensor networks via persistent
homology, Alg. and Geom. Topology 7,2007, 339-358.
[32] V. De Silva and R. Ghrist, Homological sensor networks, Notices A.M.S.,
54,1, 2007.
[33] L. Li, W. Cheng, G. Glicksberg, O. Gottesman, R. Tarnier, R. Chen, E.
Bottinger, and J. Dudley, Identification of type 2 diabetes subgroups through
topological analysis of patient similarity, Science Translational Medicine,
7(311), doi: 10.1126/scitranslmed.aaa9364, 2015.

[34] H. Edelsbrunner, D. Kirkpatrick, and R. Seidel, On the shape of a
set of points in the plane, IEEE Transactions on Information Theory,
29,4,1983,551-559.
[35] H. Edelsbrunner, D. Letscher, A. Zomorodian, Topological persistence and
simplification, Discrete and Computational Geometry, 28 (4), 2002, 511-
533.
[36] H. Edelsbrunner and J. Harer, Persistent homology - a survey, Contemp.
Math. 453, American Mathematical Society, 2008.
[37] P. Frosini, A distance for similarity classes of submanifolds of a Euclidean
space, Bull. Australian Math. Soc. 42,3, 1990, 407-415.
[38] P. Gabriel, Unzerlegbare Darstellungen I, Manuscripta Math 6, 1972, 71-
103.
[39] R. Ghrist and S. Krishnan, Positive Alexander duality for pursuit and eva-
sion, SIAM Journal on Applied Algebra and Geometry, 1,1, 2017, 308-327.

[40] C. Giusti, E. Pastalkova, C. Curto, and V. Itskov, Clique topology reveals
intrinsic geometric structure in neural correlations, Proceedings of the Na-
tional Academy of Sciences, https://doi.org/10.1073/pnas.1506407112
[41] A. Greven, A. Pfaffelhuber, and A. Winter, Convergence in distribution
of random metric measure spaces, Probability Theory and Related Fields,
145, 2009, 285-322.
[42] M. Gromov. Metric Structures for Riemannian and Non-
Riemannian Spaces, Birkhäuser, Basel, 2007.

[43] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical
Learning. Data Mining, Inference, and Prediction, Springer Series
in Statistics, Springer, New York 2009.
[44] Y. Hiraoka, T. Nakamura, A. Hirata, E.G. Escolar, K. Matsue, and Y.
Nishiura, Hierarchical structures of amorphous solids characterized by per-
sistent homology, Proceedings of the National Academy of Sciences, 2016,
https://doi.org/10.1073/pnas.1520877113
[45] P. Jones and P. Smith, Stochastic Processes. An Introduction, CRC
Press, 2018.

[46] S. Kalisnik, Tropical coordinates on the space of persistence
barcodes, Foundations of Computational Mathematics, 2018,
https://doi.org/10.1007/s10208-018-9379-y
[47] L. Kanari, P. Dlotko, M. Scolamiero, R. Levi, J. C. Shillcock, K. Hess,
and H. Markram, A topological representation of branching morphologies,
Neuroinformatics, 2017.
[48] D. Kozlov, Combinatorial Algebraic Topology, Algorithms and Com-
putation in Mathematics, 21, Springer, 2008.
[49] F. Le Roux, S. Seyfaddini, and C. Viterbo, Barcodes and area-preserving
homeomorphisms, arXiv:1810.03039, 2018

[50] M. Lesnick, The theory of the interleaving distance on multidimensional
persistence modules, Foundations of Computational Mathematics, 15 (3),
2015, 613-650.
[51] M. Lesnick and M. Wright, Interactive visualization of 2-D persistence mod-
ules, arXiv:1512.00180, 2015.
[52] D. Mumford, J. Fogarty, and F. Kirwan, Geometric Invariant Theory,
Springer Verlag, 2002.
[53] M. Nicolau, A. Levine, and G. Carlsson, Topology based data analysis iden-
tifies a subgroup of breast cancers with a unique mutational profile and
excellent survival, Proceedings of the National Academy of Sciences, Apr
26;108(17):7265-70. doi: 10.1073/pnas.1102826108, 2011.
[54] A. Olin, E. Henckel, Y. Chen, T. Lakshmikanth, C. Pou, J. Mikes, A.
Gustafsson, A. Bernhardsson, C. Zhang, K. Bohlin, and P. Brodin, Stereo-
typic immune system development in newborn children, Cell, 2018 Aug
23;174(5):1277-1292.e14. doi: 10.1016/j.cell.2018.06.045, 2018.
[55] N. Otter, M. Porter, U. Tillmann, P. Grindrod, and H. Harrington, A
roadmap for the computation of persistent homology, EPJ Data Science,
6,17, 2017

[56] L. Polterovich and E. Shelukhin, Autonomous Hamiltonian flows, Hofer’s
geometry and persistence modules, Selecta Mathematica, 22, 2016, 227-296
[57] L. Polterovich, E. Shelukhin, and V. Stojisavljevic̀, Persistence modules
with operators in Morse and Floer theory, Moscow Mathematical Journal
17, no. 4, 2017, 757-786
[58] Y. Prokhorov, Convergence of random processes and limit theorems in prob-
ability theory, Theory Probab. Appl., 1, 1956, 157-214.
[59] M. W. Reimann, M. Nolte, M. Scolamiero, K. Turner, R. Perin, G. Chin-
demi, P. Dlotko, R. Levi, K. Hess, and H. Markram, Cliques of neurons
bound into cavities provide a missing link between structure and function,
Front. Comput. Neurosci., 12 June 2017.
[60] V. Robins, Towards computing homology from finite approximations, Pro-
ceedings of the 14th Summer Conference on General Topology and its Ap-
plications (Brookville, NY, 1999), Topology Proc. 24, 1999, 503-532.
[61] M. Saggar, O. Sporns, J. Gonzalez-Castillo, P. Bandettini, G. Carlsson, G.
Glover, and A. Reiss, Towards a new approach to reveal dynamical organi-
zation of the brain using topological data analysis, Nature Communications,
9, Article number 1399, 2018.
[62] M. Scolamiero, W. Chacholski, A. Lundman, R. Ramanujam, and S.
Öberg, Multidimensional persistence and noise, Foundations of Computa-
tional Mathematics, 17 (6), 2017, 1367-1406.
[63] G. Singh, F. Memoli, and G. Carlsson, Topological methods for the analysis
of high dimensional data sets and 3D object recognition, SPBG 2007, 91-
100.
[64] J. Skryzalin and G. Carlsson, Numeric invariants from multidimensional
persistence, Journal of Applied and Computational Topology, 1, 2017, 89-
119.
[65] M. Usher and J. Zhang, Persistent homology and Floer-Novikov theory, Ge-
ometry and Topology, no. 6, 2016, 3333-3430
[66] K. Xia and G. Wei, Persistent homology analysis of protein structure, flexi-
bility and folding, International Journal for Numerical Methods in Biomed-
ical Engineering, 30, 2014, 814-844.
[67] N. Wiener, Nonlinear Problems in Random Theory, Technology Press
Research Monographs, The Technology Press of the Massachusetts Institute
of Technology and John Wiley & Sons, Inc. New York; Chapman & Hall,
Ltd., London, 1958.
[68] A. Zomorodian and G. Carlsson, Computing persistent homology, Discrete
and Computational Geometry, 33 (2), 2005, 249-274.
