Persistent Homology and Applied Homotopy Theory
arXiv:2004.00738v1 [math.AT] 1 Apr 2020
Gunnar Carlsson
Mathematics Department
Stanford University
and
Ayasdi, Inc.
April 3, 2020
Contents
1 Introduction
2 The Motivating Problem
3 The Structure Theorem for Persistence Vector Spaces over a Field
4 Complex Constructors
4.1 Introduction
4.2 Čech Construction
4.3 Vietoris-Rips Complex
4.4 Alpha Complex
4.5 Witness Complex
4.6 Mapper
5 Metrics on Barcode Space, and Stability Theorems
5.1 Metrics on Barcode Space
5.2 Stability Theorems
6 Tree-like Metric Spaces
7 Persistence and Feature Generation
7.1 Introduction
7.2 Algebraic Functions
7.3 Persistence Landscapes
7.4 Persistence Images
8 Generalized Persistence
8.1 Introduction
8.2 Zig-zag Persistence
8.3 Multidimensional Persistence
9 Coverage and Evasion Problems
10 Probabilistic Analysis
10.1 Random Complexes
10.2 Robust Estimators
10.3 Random Fields
1 Introduction
Persistent homology is a technique that has been developed over the last 20
years. Initial ideas appeared in the early 1990s [37], but the idea of persistence
was introduced by Vanessa Robins in [60]. It was rapidly followed by additional
development ([35], [68]), and the subject has grown quickly since that time. The
original motivation for the method was to extend the ideas of algebraic topol-
ogy from the category of spaces X to situations where we only have a sampling
of the space X. Of course, a sample is a discrete space so there is nothing to
be obtained unless one retains some additional information. One assumes the
presence of a metric or a more relaxed “dissimilarity measure”, and uses this in-
formation restricted to the sample in constructing the algebraic invariant. Over
time, persistent homology has been used in other situations, for example where
one has a topological space with additional information, such as a continuous
real valued function, and the sublevel sets of the function determine a filtra-
tion on the space. The output of standard persistent homology (we will discuss
some generalizations) is represented in two ways, via persistence barcodes and
persistence diagrams. Initially persistent homology was used, as homology is
used for topological spaces, to obtain a large scale geometric understanding of
complex data sets, encoded as finite metric spaces. Examples of this kind of
application are [18], [40], [47], [59], and [25]. Another class of applications uses
persistent homology to study data sets where the points themselves are metric
spaces, such as databases of molecule structures or images. This second set of
applications is developing very rapidly, and is exemplified in [15], [16], [66], and
[44]. Another direction in which persistent homology is being applied is in the
study of coverage and evasion problems arising in sensor net technology [27].
Examples of research in this direction are [31], [32], [1], and [39].
data, it is important to develop methods that quantify the dependence of
the barcode output on small perturbations of data. This requires the impo-
sition of metrics on the set of barcodes, and proving theorems concerning
the distances between barcodes that differ only by small perturbations.
The progress that has been made in this direction is described in Section
5.
• Probabilistic analysis and inference: Because many of the applica-
tions of persistent homology occur in the study of data, it becomes im-
portant to study not only stability results under small perturbations, but
also perturbations that are “probabilistically small”, i.e. which may be
large, but where a large perturbation is a rare event. This means that one
must study the distributions on barcode space that occur from applying
persistent homology to complexes generated by various random models.
This is a rapidly developing area within the subject.
• Coverage and evasion problems: This work centers around attempts
to understand complements in Euclidean space of regions defined by col-
lections of sensors. It has been approached using different methods, and
appears to be a place where techniques such as Spanier-Whitehead duality
and embedding analysis, applied in suitably generalized situations, should
play a role. The general problem of understanding complements of objects
embedded in Euclidean space of course also plays a role in robotics.
• Symplectic geometry: Although persistent homology is mainly used in
situations where one is examining discrete approximations to continuous
objects, it can be applied in any geometric situation where there is a
metric, or where one is considering filtered objects. Such situations occur
in symplectic geometry, and there is recent work applying the technique
in studying, for example, Floer homology spectra. Examples of this kind
of work are [13], [49], [56], [57], and [65].
The goal of this paper is to discuss the different research directions and applications
at a high level, so that readers can orient themselves among the techniques.
We remark that there are a number of useful surveys on persistent homology
and on topological data analysis more generally. The papers [19], [36], and [55]
give different perspectives on this subject.
2 The Motivating Problem
Suppose that we are given a set of points X in the plane, and believe that it
is reasonable to assume that the points are sampled (perhaps with error) from
a geometric object. We could ask for information about the homology of the
underlying space from which X is sampled. Consider the finite set of points X
in $\mathbb{R}^2$ displayed in Figure 1 below.
Figure 1: Statistical Circle
Definition 2.1 For any finite metric space $(X, d)$, and every $R \ge 0$, let $V(X, R)$
denote the simplicial complex with vertex set equal to $X$, and such that $\{x_0, \ldots, x_k\}$
spans a $k$-simplex if and only if $d(x_i, x_j) \le R$ for all $0 \le i < j \le k$.
Notice that if R is smaller than the smallest interpoint distance, then V(X, R)
will be a discrete complex on the set X. On the other hand, if R is greater
than the diameter of X, then V(X, R) is a full simplex on X. For intermediate
values, one obtains other complexes. In this case, there is a range of values
of R in which V(X, R) has the homotopy type of a circle. One could ask if
there is a principled way to choose a threshold R based on only the distances
between the points. After a great deal of experimentation, one finds that this
is a very difficult, if not unsolvable problem. A question that one can ask is if
there is a more structured object that one can study which incorporates all the
thresholds in a single object, which can be analyzed in a number of different
ways. Statisticians have confronted this problem in an analogous situation.
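Definition 2.1 can be made concrete with a short computation. The following is a minimal sketch, assuming Euclidean distance as the metric and a brute-force enumeration that is only practical for small samples; the function name `vietoris_rips` and the sample `circle` are hypothetical names introduced for illustration.

```python
from itertools import combinations
import math

def vietoris_rips(points, R, max_dim=2):
    """Simplices of V(X, R) up to dimension max_dim, as tuples of point indices."""
    n = len(points)
    # Pairwise Euclidean distances (an assumed choice of the metric d).
    dist = [[math.dist(p, q) for q in points] for p in points]
    simplices = [(i,) for i in range(n)]
    for k in range(1, max_dim + 1):
        for sigma in combinations(range(n), k + 1):
            # {x_0, ..., x_k} spans a k-simplex iff all pairwise distances are <= R.
            if all(dist[i][j] <= R for i, j in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices

# Hypothetical sample: eight evenly spaced points on the unit circle.
circle = [(math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8)) for i in range(8)]
for R in (0.5, 0.8, 2.1):
    cx = vietoris_rips(circle, R)
    edges = sum(1 for s in cx if len(s) == 2)
    triangles = sum(1 for s in cx if len(s) == 3)
    print(f"R={R}: {edges} edges, {triangles} triangles")
```

For this sample the three thresholds illustrate the three regimes described above: at R = 0.5 the complex is discrete, at R = 0.8 the eight edges form a cycle with no triangles, and at R = 2.1 every pair and triple of points spans a simplex.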
conclusion that the problem of selecting a threshold in a principled way is not
a well posed problem, but managed to construct a structured output, called a
dendrogram, which allowed them to study the behavior of all thresholds at once.
It is defined as follows. For each threshold $R$, we obtain a set of components
$\pi_0 V(Y, R)$, which yields a partition $\Pi_R$ of $Y$. If we have $R \le R'$, then $\Pi_R$ is
a refinement of $\Pi_{R'}$. One definition of a dendrogram structure on a finite set
$Y$ is a parametrized family $\{\Pi_R\}_{R \ge 0}$ of partitions of $Y$, with the property that
$\Pi_R$ refines $\Pi_{R'}$ whenever $R \le R'$, and so that for any partition $\Pi$, the intervals
$\{R \mid \Pi_R = \Pi\}$ are either empty or closed on the left and open on the right. We
assume that there is an $R_\infty$ so that $\Pi_{R_\infty}$ is the partition with one block, namely
$Y$. The reason for this terminology is that this information is equivalent to a
tree $D$ with a reference map to the non-negative real line. The tree $D$ is defined
as follows. The points of $D$ are pairs $(c, R)$, where $c$ is a block in the partition
$\Pi_R$, and $0 \le R \le R_\infty$. We clearly have a reference map to $[0, R_\infty]$ given by
$(c, R) \mapsto R$. To define a topology on this set, we construct an auxiliary space
$Z$, defined as
\[ Z = \coprod_{c} [0, R_\infty]_c \]
Figure 2: Dendrogram
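The family of partitions underlying a dendrogram can be computed directly from the distances. The sketch below is one possible illustration, assuming Euclidean distance and a simple union-find structure; `partition_at` and the sample `Y` are hypothetical names, and the code returns the blocks of the partition at a single threshold rather than the whole tree.

```python
import math

def partition_at(points, R):
    """Blocks of the partition at threshold R: components of the graph
    that joins two points whenever their distance is <= R."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= R:
                parent[find(i)] = find(j)
    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())

# Hypothetical sample: four points on the real line.
Y = [(0.0,), (1.0,), (3.0,), (7.0,)]
for R in (0.5, 1.0, 2.0, 4.0):
    print(R, partition_at(Y, R))
```

Because edges are added when the distance is at most R, each partition persists on an interval that is closed on the left and open on the right, in line with the definition of a dendrogram structure. In this sample the partition with blocks {0, 1}, {2}, {3} occurs exactly for R in [1, 2).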
$r_2$ in $\mathbb{R}_+$, with $r_0$ and $r_2$ in $A$, so that $r_0 + r_1 = r_2$, then $r_1 \in A$. $A$ is a totally
ordered set in its own right by restriction of the total order on $\mathbb{R}_+$.
For example, $\mathbb{N}$ (the non-negative integers) and $\mathbb{Q}_+$ are both pure. We can now
make a definition that includes the dendrogram as a special case.
R → π0 (V(X, R))
3 The Structure Theorem for Persistence Vector Spaces over a Field
For the entirety of this section, K will denote a field, which will be fixed through-
out. We are interested in the isomorphism classification of A-parametrized per-
sistence K-vector spaces, for pure submonoids A of R+ . This is very complicated
in general, but is manageable for objects of PVect(K) satisfying a finiteness con-
dition, which is always satisfied for the Vietoris-Rips complexes associated with
finite metric spaces. Let $M$ be any commutative monoid. By an $M$-graded $K$-vector
space, we will mean a $K$-vector space $V$ equipped with a decomposition
\[ V \cong \bigoplus_{\mu \in M} V_\mu \]
Given two $M$-graded $K$-vector spaces $V_*$ and $W_*$, by their tensor product we
will mean the tensor product $V \otimes_K W$ equipped with the $M$-grading given by
\[ (V \otimes_K W)_\mu = \bigoplus_{\mu_1 + \mu_2 = \mu} V_{\mu_1} \otimes_K W_{\mu_2} \]
\[ K[M]_\mu = K \cdot \mu \]
Proposition 3.1 Let $K[A]_*$ denote the monoid algebra of $A$ over $K$, for a pure
submonoid $A \subseteq \mathbb{R}_+$, regarded as a graded $K$-algebra. Let $G(A, K)$ denote the
category of $A$-graded $K[A]_*$-modules. Then there is an equivalence of categories
\[ P_A\mathrm{Vect}(K) \cong G(A, K) \]
where the elements of grading $\alpha$ are precisely the elements of the summand
$\theta(\alpha)$. We now extend the vector space structure to a graded $K[A]_*$-module
structure by defining the action $t^{\alpha} \cdot : \theta(\alpha') \to \theta(\alpha + \alpha')$ to be equal to the linear
transformation
\[ \theta(\alpha' \to \alpha + \alpha') : \theta(\alpha') \to \theta(\alpha + \alpha') \]
where $\alpha' \to \alpha + \alpha'$ denotes the unique morphism in $A$ from $\alpha'$ to $\alpha + \alpha'$. It is
clear that this defines an $A$-graded $K[A]_*$-module structure on $M(\theta)$, and that
$M$ is a functor. We produce an inverse functor $\eta : G(A, K) \to P_A\mathrm{Vect}(K)$ on
objects by setting
\[ \eta(M_*)(\alpha) = M_\alpha \]
and on morphisms by
The functors θ and η are clearly inverse to each other.
Remark 3.1 This result is formally very similar to the structure theorem for
finitely generated modules over a principal ideal domain (PID). Indeed, for the
case of A = N, where K[A] is Noetherian, the result is exactly a structure theo-
rem for finitely generated graded modules over the graded ring K[t]. For other
choices of A, K[A]∗ is not necessarily Noetherian. However, it does turn out
to be coherent, i.e. having the property that the kernel of any homomorphism
between finitely generated free modules is finitely generated.
Remark: Notice that the proof also gives an algorithm for producing a matrix
in the diagonal form given above.
\[ (\alpha, \alpha') \in A \times (A \cup \{+\infty\}) \]
Remark 3.2 Barcodes are often represented visually as collections of intervals.
The image in Figure 3 shows barcodes in dimensions zero and one.
Notice that the zero-dimensional barcode has one infinite interval, while the
one-dimensional barcode is finite. There is an equivalent visual representation
called the persistence diagram in which each interval is encoded as a point (x, y)
in the plane, where x and y are the left and right hand endpoints respectively.
An example is pictured in Figure 4.
An advantage of this representation made apparent in this image is that one can
represent different homology groups in the same diagram. The blue dots are
in this case the zero-dimensional persistence diagram and the orange ones are
the one-dimensional ones. When there are infinite intervals, one often selects
an upper threshold $\tau$ for the persistence parameter, and represents the infinite
interval by one with right hand endpoint $\tau$.
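The passage from a barcode to a persistence diagram, with infinite intervals capped at a threshold, can be sketched as follows. This is a minimal illustration in which a barcode is assumed to be a list of (left endpoint, right endpoint) pairs and `math.inf` marks an infinite interval; the names `to_diagram` and `h0` are hypothetical.

```python
import math

def to_diagram(barcode, tau):
    """Encode each interval as a point (x, y); an infinite right endpoint
    is replaced by the chosen upper threshold tau."""
    return [(x, tau if math.isinf(y) else y) for (x, y) in barcode]

# Hypothetical zero-dimensional barcode with one infinite interval.
h0 = [(0.0, math.inf), (0.0, 0.4), (0.0, 0.7)]
print(to_diagram(h0, tau=2.0))
```

The threshold only affects how the infinite interval is drawn; finite intervals are passed through unchanged.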
Proof: This follows immediately from the matrix analysis in the proof of Propo-
sition 3.2.
Corollary 3.1 Given any chain complex of finitely generated free graded K[A]∗ -
modules, the homology modules are finitely presented K[A]∗ -modules.
We also interpret this result in terms of persistence vector spaces. The category
$P_A\mathrm{Vect}(K)$ is clearly an abelian category. For any $\alpha_0 \in A$, we define $V(\alpha_0)$ to
be the persistence vector space $\{V_\alpha\}_{\alpha \in A}$, where $V_\alpha = \{0\}$ for $\alpha < \alpha_0$, $V_\alpha = K$
for $\alpha \ge \alpha_0$, and where for any $\alpha_0 \le \alpha \le \alpha'$ the linear transformation $V_\alpha \to V_{\alpha'}$
is the identity on $K$. For any $\alpha < \alpha'$, we also define the persistence vector
space $V(\alpha, \alpha')$ to be the quotient of $V(\alpha)$ by the image of the natural inclusion
$V(\alpha') \hookrightarrow V(\alpha)$. Corollary 3.1 now has the following consequence.
Figure 4: Persistence Diagram
4 Complex Constructors
4.1 Introduction
All data that we consider consists of finite sets of points X. The space X
itself is uninteresting topologically, since it is a discrete set of points. This
means that we have to build a space using auxiliary information attached to
the set of points. The auxiliary information we choose is a metric on the set
X, so X is a finite metric space. In the context of data sets, metrics are often
referred to as dissimilarity measures, since small distances between data points
often reflect notions of similarity between data points. Often the metric chosen
is the restriction of a well known and analyzed metric on an ambient space
containing X, such as n-dimensional Euclidean space. Other choices that are
often appropriate are Hamming distance, correlation distances, and normalized
variants (mean centering of coordinate functions, normalizing variance to 1) of
Euclidean distance. One method for constructing spaces based on metrics is
the Vietoris-Rips complex that we have seen above. It is actually a persistence
object in the category of simplicial complexes. All the constructions we will
look at except the Mapper construction are naturally persistence objects in the
category of simplicial complexes. The Mapper construction can also be equipped
with many such structures, but they are not canonical. Because of the presence
of a persistence structure, the Čech, Vietoris-Rips, alpha, and witness complexes
all have persistence barcodes associated to them in all non-negative dimensions.
It is obvious from the constructions that the zero-dimensional barcodes have a
single infinite interval, and that all higher dimensional barcodes are finite.
4.2 Čech Construction
Let $(X, d)$ denote a finite metric space. Given a threshold parameter value $R$, let
$U_R$ denote the covering of $X$ by all balls $B_R(x) = \{x' \mid d(x, x') \le R\}$. The Čech
complex at scale $R$ is the nerve of the covering $U_R$, and we denote it by $\check{C}(X; R)$.
It is clear that for $R \le R'$, there is an inclusion $\check{C}(X; R) \hookrightarrow \check{C}(X; R')$, and
that therefore $\{\check{C}(X; R)\}_R$ is a persistence object in the category of simplicial
complexes. From the theoretical point of view, it has the advantage that given
a covering of a topological space (with suitable point set hypotheses) by open
sets, the nerve lemma (see [48], Thm. 15.21) gives a criterion that guarantees
that the nerve of the covering is homotopy equivalent to the original space.
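The nerve construction for the covering by balls can be sketched in a few lines. The code below is an illustration under stated assumptions: the balls are taken inside the finite set X itself, as in the definition of the covering above, distances are Euclidean, and simplices are enumerated by brute force. The name `cech_complex` and the sample `pts` are hypothetical.

```python
from itertools import combinations
import math

def cech_complex(points, R, max_dim=2):
    """Nerve of the covering of X by the balls B_R(x), computed inside X."""
    n = len(points)
    # balls[i] holds the indices of the points of X lying in B_R(x_i).
    balls = [frozenset(j for j in range(n)
                       if math.dist(points[i], points[j]) <= R)
             for i in range(n)]
    simplices = []
    for k in range(max_dim + 1):
        for sigma in combinations(range(n), k + 1):
            # sigma is a simplex of the nerve iff the chosen balls share a point.
            if frozenset.intersection(*(balls[i] for i in sigma)):
                simplices.append(sigma)
    return simplices

# Hypothetical sample: three collinear points at unit spacing.
pts = [(0.0,), (1.0,), (2.0,)]
print(cech_complex(pts, 0.5))
print(cech_complex(pts, 1.0))
```

At R = 1 the middle point lies in all three balls, so the nerve is a full triangle even though the two outer points are at distance 2 from each other. This is one way to see why Čech and Vietoris-Rips complexes at the same parameter value can differ.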
For specific situations, this bound can be improved. For example, it is shown
in [31] that if $X \subseteq \mathbb{R}^d$ is equipped with the restricted metric, then
\[ V(X, R') \subseteq \check{C}(X; R/2) \subseteq V(X, R) \quad \text{if} \quad \frac{R}{R'} \ge \sqrt{\frac{2d}{d+1}} \]
This result allows us to compare homology computed using the Čech and Vietoris-Rips
methods.
4.4 Alpha Complex
There is another kind of complex whose dimension is low and which generally
has a moderate number of simplices. It is called the alpha complex, or the
alpha shapes complex, and was introduced in [34], with a thorough description
in [5]. It applies to data $X$ that is embedded in Euclidean space $\mathbb{R}^n$, and so that
the metric on $X$ is the restriction of the Euclidean metric to $X$. Typically the
number $n$ is relatively small, say $\le 5$, because the construction becomes quite
expensive in higher dimensions. Also, the complex is generically of dimension
$\le n$. The notion of generic is the following. Given any set of points $X \subseteq \mathbb{R}^n$,
it is possible for the alpha complex to have dimension higher than $n$, but it is
possible to perturb all the points by an arbitrarily small amount and obtain a
complex that is $n$-dimensional.
The collection of all Voronoi cells for a finite subset of Euclidean space is called
its Voronoi diagram. A Voronoi diagram in $\mathbb{R}^2$ might look like this.
be constructed for any embedding of a data set in a larger metric space. Given a
data set X, we therefore select a subset of landmarks L ⊆ X. We can now build
the analogues of the Voronoi cells for each of the landmark points within X, and
construct the nerve of the covering. In this case, both the ambient space and
the landmark set are usually taken to be finite. We will also need to introduce
persistence into this picture. The construction is as follows.
Definition 4.1 Suppose we are given a metric space $(X, d)$, a finite subset $L \subseteq X$, called the
landmark set, and a persistence parameter $R$. For every $x \in X$ we denote by
$m_x$ the distance from $x$ to the set $L$. We define a simplicial complex $W(X, L, R)$
as follows. The vertex set of $W(X, L, R)$ is the set $L$, and $\{l_0, l_1, \ldots, l_k\}$ spans
a $k$-simplex if and only if there is a point $x \in X$ (the witness) so that
$d(x, l_i) \le m_x + R$ for all $i$. The family of complexes $\{W(X, L, R)\}_R$ forms a persistence
simplicial complex.
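Definition 4.1 translates into code quite directly. The following sketch assumes Euclidean distance and a brute-force search over candidate simplices; `witness_complex` and the sample data are hypothetical names, and simplices are reported as tuples of positions in the landmark list.

```python
from itertools import combinations
import math

def witness_complex(points, landmarks, R, max_dim=2):
    """Simplices of W(X, L, R); `landmarks` lists the indices of L within X."""
    L = [points[i] for i in landmarks]
    # For each potential witness x, record the landmarks within m_x + R of x.
    near = []
    for x in points:
        d = [math.dist(x, l) for l in L]
        m_x = min(d)  # distance from x to the landmark set
        near.append({a for a, da in enumerate(d) if da <= m_x + R})
    simplices = []
    for k in range(max_dim + 1):
        for sigma in combinations(range(len(L)), k + 1):
            # sigma is a simplex iff a single witness is close to all its vertices.
            if any(set(sigma) <= s for s in near):
                simplices.append(sigma)
    return simplices

# Hypothetical sample: four points on a line, with the two end points as landmarks.
X = [(0.0,), (1.0,), (3.0,), (4.0,)]
print(witness_complex(X, [0, 3], R=0.0))
print(witness_complex(X, [0, 3], R=2.0))
```

In this sample the two landmarks are joined by an edge once R reaches 2, because the point at 1 is then within m_x + R = 3 of both of them.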
There are several variants on this construction. For example, there is the “lazy”
version in which the 1-simplices are identical to the 1-simplices of the witness
construction, but in which we declare that any higher dimensional simplex is
an element of the complex if and only if all its one dimensional faces are. Each
of the lazy complexes is a flag complex. The lazy witness complex bears the
same relationship to the witness complex as the Vietoris-Rips complex bears to
the Čech complex. There is also the weak witness complex, $\{W^{\mathrm{weak}}(X, L, R)\}_R$,
which is defined as follows. For each point $x \in X$, we let $\delta_x$ denote the distance
from $x$ to the second closest element of $L$. We then declare that a pair $(\lambda_1, \lambda_2)$ of landmarks
spans an edge of $W^{\mathrm{weak}}(X, L, R)$ if and only if there is an $x \in X$ so that
4.6 Mapper
the alpha complex, produces complexes of bounded dimension. The analogue of
this construction for finite metric spaces is obtained by assuming that the finite
metric space is equipped with a map ρ to a reference space B, and replacing
the connected component construction by the output of a clustering algorithm.
A simple algorithm to use is single linkage hierarchical clustering, where one
makes a choice of threshold based on a simple heuristic, such as the one found
in [63]. Once this is done, one obtains a covering of the finite set $X$ by the
collection of all clusters constructed in each of the sets $\rho^{-1}(U_\alpha)$, and constructs
the nerve complex. This construction is referred to as Mapper. Usually, $B$
is chosen to be $\mathbb{R}^n$, for a small positive integer $n$, and therefore the reference
map is determined by an $n$-tuple of real valued functions on the metric space
$X$. The reference maps can be chosen in many ways, giving different views of
the data. Some standard choices are density estimators, measures of centrality,
coordinates in linear algebraic algorithms such as principal component analysis
and multidimensional scaling ([43]), or individual coordinates when a metric
space is obtained as a subspace of $\mathbb{R}^N$ for some $N$. The method has been used
extensively in work on life sciences data sets, see for example [33], [53], [54], and
[61].
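The Mapper construction described above can be sketched for a one-dimensional reference space. The code below is a simplified illustration, not a reference implementation: it assumes a cover of the real line by closed intervals, uses single linkage clustering with a fixed threshold `eps` in place of the heuristic of [63], and computes only the vertices and edges of the nerve. All function and variable names are hypothetical.

```python
from itertools import combinations
import math

def single_linkage(indices, points, eps):
    """Clusters of `indices`: components of the graph joining points at distance <= eps."""
    clusters, unseen = [], set(indices)
    while unseen:
        stack = [unseen.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            nbrs = {j for j in unseen if math.dist(points[i], points[j]) <= eps}
            unseen -= nbrs
            comp |= nbrs
            stack.extend(nbrs)
        clusters.append(frozenset(comp))
    return clusters

def mapper(points, rho, intervals, eps):
    """Nodes are the clusters found in each preimage of a cover element;
    an edge joins two clusters that share at least one point."""
    nodes = []
    for lo, hi in intervals:
        preimage = [i for i, p in enumerate(points) if lo <= rho(p) <= hi]
        nodes.extend(single_linkage(preimage, points, eps))
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges

# Hypothetical sample: twelve points on the unit circle, with the
# x-coordinate as reference map and three overlapping intervals as cover.
circle12 = [(math.cos(2 * math.pi * i / 12), math.sin(2 * math.pi * i / 12))
            for i in range(12)]
cover = [(-1.1, -0.4), (-0.6, 0.6), (0.4, 1.1)]
nodes, edges = mapper(circle12, lambda p: p[0], cover, eps=0.6)
print(len(nodes), sorted(edges))
```

For this sample the middle interval has two clusters (the top and bottom arcs), so the output graph has four nodes joined in a cycle, which reflects the shape of the circle.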
5 Metrics on Barcode Space, and Stability Theorems
Since persistent homology is used to analyze data sets, and data sets are often
noisy in the sense that one does not want to assign meaning to small changes,
it is important to analyze the stability of persistent homology outputs to small
changes in the underlying data. In order to do this, it is very useful to construct
metrics on the output barcodes so that one can assert the continuity of the
assignment of a barcode to a finite metric space or to the graph of a function.
Informally, one wants to prove that small changes in the data give rise to small
changes in the barcodes. Since small changes will often result in a change in
the number of bars, it will be important to construct a set of all barcodes, on
which we can impose a metric. The following is a natural construction. Let $n$
be a positive integer, and let $B_n$ denote the set of unordered $n$-tuples of closed
intervals $[x, y]$, where we permit $x = y$, and require $x, y \ge 0$. It is understood
that $B_0$ consists of a single point, namely the empty set of intervals. The set
$B_n$ can be described as the orbit space of the $\Sigma_n$-action on the set $I^n$ which
permutes coordinates, where $I$ denotes the set of closed intervals. To consider
all barcodes, we form
\[ B_+ = \coprod_{n \ge 0} B_n \]
and define the full barcode space B to be the quotient B+ / ∼, where ∼ is the
equivalence relation generated by the relations
The idea is that intervals of length zero, which do not represent non-zero vector
spaces, should be ignored. We will need to construct metrics on B, and to prove
continuity results for these metrics.
5.1 Metrics on Barcode Space
The general idea for the construction of metrics on barcode space is to consider
the set of all partial matchings between the intervals in the barcodes, assign a
penalty to each such matching, and finally to minimize this penalty over the
set of all matchings. Partial matchings are a little awkward, so for a pair of
barcodes $B_1$ and $B_2$ one instead considers actual bijections $B_1' = B_1 \cup Z \to
B_2 \cup Z = B_2'$, where $Z$ is the set consisting of a countable number of copies of
the interval of length zero $[x, x]$ for each $x \ge 0$. To start, one assigns a penalty
$\pi([x_1, x_2], [y_1, y_2]) = \|(y_1 - x_1, y_2 - x_2)\|_\infty$ for every pair of intervals including
those of length zero. Next, given two barcodes $B_1$ and $B_2$, let $D(B_1, B_2)$ denote
the set of all bijections $\theta : B_1' \to B_2'$ for which $\pi(I, \theta(I)) \ne 0$ for only finitely
many $I \in B_1'$. Given a positive number $p$, we extend the penalty function $\pi$
from individual intervals to barcodes by forming
\[ W_p(B_1, B_2) = \inf_{\theta \in D(B_1, B_2)} \Big( \sum_{I \in B_1'} \pi(I, \theta(I))^p \Big)^{1/p} \]
Definition 5.1 For any $p > 0$ and also for $p = \infty$, we refer to $W_p(B_1, B_2)$ as
the $p$-Wasserstein distance. For $p = \infty$, this distance is often referred to as the
bottleneck distance.
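For very small barcodes the infimum over bijections can be computed by exhaustive search, which gives a useful check on one's understanding of the definition. The sketch below makes several assumptions: barcodes are lists of finite (x, y) pairs, each barcode is padded with as many zero-length intervals as the other barcode has bars, an interval matched to a zero-length interval is sent to its midpoint (the cheapest choice for the penalty above), and for p = ∞ the sum is replaced by a maximum. The function names are hypothetical, and the running time is factorial in the total number of bars.

```python
from itertools import permutations
import math

def penalty(I, J):
    """Sup-norm of the difference of the endpoints of I and J."""
    return max(abs(J[0] - I[0]), abs(J[1] - I[1]))

def to_diagonal(I):
    """Penalty for matching I to the nearest zero-length interval [m, m]."""
    return (I[1] - I[0]) / 2

def wasserstein(B1, B2, p=math.inf):
    """Brute-force W_p; p = math.inf gives the bottleneck distance."""
    # None stands for a zero-length interval taken from Z.
    A = list(B1) + [None] * len(B2)
    B = list(B2) + [None] * len(B1)

    def cost(I, J):
        if I is None and J is None:
            return 0.0  # a zero-length interval can be matched to itself
        if I is None:
            return to_diagonal(J)
        if J is None:
            return to_diagonal(I)
        return penalty(I, J)

    best = math.inf
    for perm in permutations(range(len(B))):
        costs = [cost(A[i], B[j]) for i, j in enumerate(perm)]
        if math.isinf(p):
            total = max(costs, default=0.0)
        else:
            total = sum(c ** p for c in costs) ** (1 / p)
        best = min(best, total)
    return best

# Hypothetical barcodes: one long bar versus a slightly shorter bar plus a short one.
print(wasserstein([(0.0, 4.0)], [(0.0, 3.0), (1.0, 1.5)]))
print(wasserstein([(0.0, 4.0)], [(0.0, 3.0), (1.0, 1.5)], p=1))
```

In this example the optimal matching pairs the two long bars (penalty 1) and sends the short bar to a zero-length interval (penalty 0.25), so the bottleneck distance is 1 and the 1-Wasserstein distance is 1.25. In practice one would use an assignment solver rather than enumerating permutations.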
5.2 Stability Theorems
The metrics defined in the previous section have stability theorems associated
to them, that assert that the assignment of barcodes is a continuous process.
There are two situations of interest.
1. Gromov has defined (see [42]) a metric dGH on the set of all isometry
classes of compact metric spaces, called the Gromov-Hausdorff metric.
For any integer $k \ge 0$, one can view the assignment of its $k$-dimensional barcode
to any finite metric space as a map from the set of isometry classes of finite metric
spaces to $B$, and one can ask about its continuity properties.
2. Let $f : X \to \mathbb{R}_+$ denote a continuous function. One can assign to each such
$f$ and each integer $i \ge 0$ the persistence vector space $\{H_i(f^{-1}([0, r]))\}_{r \ge 0}$.
Under suitable conditions (e.g. where $X$ is a finite simplicial complex and
$f$ is linear on simplices, or where $X$ is a compact manifold and $f$ is
smooth), one can show that the associated barcode is of finite type. It is
then an interesting question to ask about the continuity properties of this
assignment, where one assigns various metrics to the set of functions on
$X$.
There are theorems in both these cases. The following theorem demonstrates
the continuity of the assignment of a k-dimensional barcode with coefficients in
a field to a finite metric space, when the metric on the set of isometry classes
of finite metric spaces is the Gromov-Hausdorff distance.
Theorem 5.1 [26] For any two finite metric spaces $X$ and $Y$ and integer $k \ge 0$,
let $B(X)$ and $B(Y)$ denote the $k$-dimensional barcodes for the Vietoris-Rips
complexes of $X$ and $Y$ with coefficients in a field $K$. Then
is not an isomorphism.
2. Functions on finite simplicial complexes that are linear on each simplex.
3. Morse functions on compact Whitney-stratified spaces.
Lemma 5.1 [29] Suppose that X is as above, and that k is a nonnegative real
number. Then there is a constant CX so that Pk (B(f )) ≤ CX for every tame
function f : X → R with Lipschitz constant L(f ) ≤ 1, where B(f ) is defined as
in Theorem 5.2.
Definition 5.4 When the conclusion of Lemma 5.1 holds for a metric space X
and real number k ≥ 1, we say that X implies bounded degree-k persistence.
Theorem 5.3 [29] Let $X$ be a triangulable metric space that implies bounded
degree-$k$ persistence for some real number $k \ge 1$, and let $f, g : X \to \mathbb{R}$ be two
tame Lipschitz functions. Let $C_X$ be the constant in Lemma 5.1, and let $L(f)$
and $L(g)$ denote the Lipschitz constants for $f$ and $g$ respectively. Then
\[ W_p(f, g) \le C^{1/p} \cdot \|f - g\|_\infty^{1 - k/p} \]
6 Tree-like Metric Spaces
Persistent homology gives ways of assessing the shape of a finite metric space.
One situation where this is very useful is in problems in evolution. The notion
that there is a “tree of life” is a very old one which actually predates Darwin.
Different organisms of the same type have attached to them sequences of the
same length in a genetic alphabet A. Therefore any set of organisms produces
a subset of the set of sequences of fixed length in A. One can assign a metric
to the space of all such sequences using Hamming distances or variants thereof.
The notion that there is a tree of the various organisms within a fixed type can
be restated in mathematical terms as stating that the space S(A) corresponding
to the organisms in the family is well modeled by a tree-like metric space, i.e.
a metric space which is isometric to the set of nodes in a tree, possibly with
weighted edges, equipped with the distance function that assigns to each pair
of vertices of the tree the length of the shortest edge path between them. This
approximability could be called the phylogenetic hypothesis for the particular
class of organisms. Testing this hypothesis for particular genetic data sets has
usually been done by attempting to fit trees to a given metric space and attempt-
ing to assess how well the approximation fits. Given the persistent homology
construction, one is tempted to develop criteria attached to the barcodes that
can distinguish between tree-like and non-tree-like metric spaces. The following
theorem, proved in [25] gives such a criterion.
Theorem 6.1 Let X be a finite tree-like metric space. Then the k-dimensional
persistent homology of X vanishes for k > 0.
Remark 6.1 This theorem was proved in the context of a study of data sets
of viral sequences. In that paper it was also shown that representative cycles
for generators of persistent homology in positive degrees gave important clues
to the mechanism of the failure of the phylogenetic hypothesis.
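The definition of a tree-like metric space used in this section can be illustrated computationally. The sketch below builds the shortest edge-path metric on the nodes of a weighted tree, and also checks the classical four-point condition, which is a standard combinatorial test for whether a finite metric embeds in a weighted tree. The four-point test is included only as a point of comparison and is not the barcode criterion of Theorem 6.1; all names are hypothetical.

```python
from itertools import combinations

def tree_metric(n, weighted_edges):
    """Edge-path lengths between all pairs of nodes of a weighted tree on n nodes."""
    adj = {v: [] for v in range(n)}
    for u, v, w in weighted_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [[0.0] * n for _ in range(n)]
    for src in range(n):
        # Paths in a tree are unique, so one traversal gives all distances from src.
        stack, seen = [(src, 0.0)], {src}
        while stack:
            v, d = stack.pop()
            dist[src][v] = d
            for u, w in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append((u, d + w))
    return dist

def four_point_condition(dist, tol=1e-9):
    """True if, for every four points, the two largest of the three pairwise sums agree."""
    for x, y, z, w in combinations(range(len(dist)), 4):
        s = sorted([dist[x][y] + dist[z][w],
                    dist[x][z] + dist[y][w],
                    dist[x][w] + dist[y][z]])
        if s[2] - s[1] > tol:
            return False
    return True

# Hypothetical weighted tree on five nodes, and a 4-cycle with unit edges for contrast.
tree = tree_metric(5, [(0, 1, 1.0), (0, 2, 2.0), (0, 3, 3.0), (1, 4, 1.0)])
square = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
print(four_point_condition(tree), four_point_condition(square))
```

The tree metric passes the test, while the graph metric on a 4-cycle fails it, which matches the intuition that a cycle is an obstruction to being tree-like.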
7 Persistence and Feature Generation
7.1 Introduction
This method proceeds from the observation that the sets $B_n$ can be viewed as
subsets of a real algebraic variety. The set $I$ embeds as a subset of the two-dimensional
affine space $\mathbb{A}^2(\mathbb{R})$, and is defined by the inequalities $x, y \ge 0$ and
$y \ge x$ for an interval coordinatized by the pair $(x, y)$. Consequently, we have an
embedding
\[ I^n \hookrightarrow \mathbb{A}^2(\mathbb{R})^n \cong \mathbb{A}^{2n}(\mathbb{R}) \]
and it is equivariant with respect to the permutation actions on $I^n$ and $\mathbb{A}^2(\mathbb{R})^n$.
Under the identification $\mathbb{A}^2(\mathbb{R})^n \cong \mathbb{A}^{2n}(\mathbb{R})$, with $\mathbb{A}^{2n}(\mathbb{R})$ coordinatized using
coordinates $(x_1, \ldots, x_n, y_1, \ldots, y_n)$, the corresponding action simply permutes
the $x_i$'s and $y_i$'s among themselves. It is a standard result in algebraic geometry
(see [52]) that for any action of a finite group on an affine algebraic variety (over
$\mathbb{R}$ in this case), there is an orbit variety, whose affine coordinate ring is the ring
of invariants of the group action. It is easy to verify that in this case, the orbit
set of the action on the closed real points of $\mathbb{A}^{2n}(\mathbb{R})$ is exactly the symmetric
product $\mathrm{Sp}^n(\mathbb{R}^2)$. Since $B_n \subseteq \mathrm{Sp}^n(\mathbb{R}^2)$, elements in the affine coordinate ring
of the orbit variety can be regarded as functions on $B_n$, so we now have an
algebra of functions $A_n$ on $B_n$. This means that we can describe functions on
the sets of barcodes with exactly $n$ intervals. What one really wants is a ring of
functions on all of $B$. In order to construct such a ring, we observe that $B$ can
be described as a quotient of the direct limit of the system
\[ B_0 \to B_1 \to B_2 \to \cdots \tag{7–1} \]
which is compatible with the system (7–1) above. The colimit of the system
(7–2) is an affine scheme, whose affine coordinate ring is the inverse limit of the
system
\[ A_0 \longleftarrow A_1 \longleftarrow A_2 \longleftarrow \cdots \]
which we will denote $A$. The ring $A$ can be analyzed, but is a bit too complicated
to be used in applications. To define a smaller subring, we note that $\mathrm{Spec}(A)$
is equipped with an action by the algebraic group $\mathbb{G}_m$, and we can define a
subring $\overline{A}^{\mathrm{fin}}$ to consist of all those functions $f$ so that all the translates of $f$
under the $\mathbb{G}_m$-action span a finite dimensional vector subspace within $A$. The
ring $\overline{A}^{\mathrm{fin}}$ is actually a graded ring, since the $\mathbb{G}_m$-action determines a grading on
it. Within $\overline{A}^{\mathrm{fin}}$ we define a subring $A^{\mathrm{fin}}$ which consists of all elements of $\overline{A}^{\mathrm{fin}}$
that respect the equivalence relation $\sim$. The main result of [3] is the following.
1. The ring $\overline{A}^{\mathrm{fin}}$ has the structure
\[ \overline{A}^{\mathrm{fin}} \cong \mathbb{R}[x_{i,j};\ 0 \le i,\ 0 \le j,\ \text{and } i + j > 0] \]
We remark that these functions are not continuous for the bottleneck distance
on B. In [46], a tropical version of this work is presented, which gives functions
which are continuous for the bottleneck distance. For the p < ∞ situation, the
functions are continuous for the p-Wasserstein distance if one assigns B
the direct limit topology associated to the filtration of B by the images of the
spaces Bn , defined in Section 5.1.
where $c_+ = \max(c, 0)$. A quick analysis of $f_{(a,b)}$ shows that it is zero for $t \le a$
and $t \ge b$, that on the interval $[a, \frac{a+b}{2}]$ it is equal to the graph of a line of slope 1
including the point $(a, 0)$, and on the interval $[\frac{a+b}{2}, b]$ it is the graph of a line of
slope $-1$ including the point $(b, 0)$. The shape of the graph is that of a pyramid.
\[ \lambda_k(B)(t) \ge 0 \]
that
\[ \lambda_k(B)(t) \ge \lambda_{k+1}(B)(t) \]
and that
\[ \lambda_k(B)(t) = 0 \quad \text{for } k > n \]
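The functions discussed here are easy to evaluate numerically. The sketch below assumes the standard form of the definitions from [12]: the pyramid-shaped function is taken to be $f_{(a,b)}(t) = \min(t - a, b - t)_+$, which matches the description above, and $\lambda_k(B)(t)$ is taken to be the $k$-th largest value of $f_{(a,b)}(t)$ over the intervals $(a, b)$ of $B$. Barcodes are assumed to consist of finite intervals, and the function names are hypothetical.

```python
def tent(a, b, t):
    """f_(a,b)(t) = min(t - a, b - t)_+ : zero outside (a, b), peak at the midpoint."""
    return max(min(t - a, b - t), 0.0)

def landscape(barcode, k, t):
    """lambda_k(B)(t): the k-th largest tent value at t, for k = 1, 2, ..."""
    values = sorted((tent(a, b, t) for a, b in barcode), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0

# Hypothetical barcode with two nested intervals.
B = [(0.0, 4.0), (1.0, 3.0)]
for k in (1, 2, 3):
    print(k, [landscape(B, k, t) for t in (0.5, 2.0, 3.5)])
```

With n = 2 intervals, the third landscape function is identically zero, and at every sample point the values are nonnegative and decrease with k, as the three properties above require.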
In [12] it is also proved that each function λk (B) is 1-Lipschitz, i.e. that
To summarize, the persistence landscape lies in the vector space F of real valued
functions on N × R, and it follows directly from Remark 7.1 that the definition
gives us a function P L : B → F.
One extremely useful fact about the persistence landscape is that it is compatible
with the various distances assigned to barcode spaces. We let Fp ⊂ F denote
the space of functions with finite Lp -norm || ||p . It is clear that the function P L
takes values in Fp for all p. Recall the definition of the p-Wasserstein distance
Wp between barcodes from Section 5.1. Bubenik now proves the following in
[12].
Theorem 7.2 The function $d_p(B, B') = \|PL(B) - PL(B')\|_p$ is a metric on
$B$. The two metrics $W_{p+1}$ and $d_p$ generate the same topology on $B$. It follows
that the map $PL$ is continuous when $B$ is equipped with the metric $W_{p+1}$ and
$F_p$ is equipped with the metric associated with the $L^p$ norm.
Remark 7.2 Bubenik also provides explicit inequalities involving the two metrics
in [12].
Theorem 7.3 The map PL is 1-Lipschitz from B equipped with the bottleneck
distance to the space of persistence landscapes equipped with the sup norm distance.
This is equivalent to the algebraic statement
$$\|PL(B) - PL(B')\|_\infty \le d_\infty(B, B'),$$
where $d_\infty$ denotes the bottleneck distance.
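The 1-Lipschitz property can be checked numerically on a small example. The sketch below assumes the tent-function form of the landscape from [12] and perturbs each endpoint of a hypothetical barcode by at most ε, which gives a barcode at bottleneck distance at most ε; the sup norm is only approximated on a finite grid, and all names are illustrative.

```python
def tent(a, b, t):
    """Assumed form of f_(a,b)(t) = min(t - a, b - t)_+."""
    return max(min(t - a, b - t), 0.0)


def landscape(barcode, k, t):
    """k-th largest tent value at t, or 0 if the barcode has fewer than k bars."""
    values = sorted((tent(a, b, t) for a, b in barcode), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0


def sup_distance(bars1, bars2, grid):
    """Grid approximation of the sup norm of PL(bars1) - PL(bars2)."""
    n = max(len(bars1), len(bars2))
    return max(
        abs(landscape(bars1, k, t) - landscape(bars2, k, t))
        for k in range(1, n + 1)
        for t in grid
    )


eps = 0.3
B = [(0.0, 4.0), (1.0, 3.0), (2.0, 6.0)]
# Each endpoint moves by at most eps, so the bottleneck distance is at most eps.
B_shift = [(0.3, 3.8), (0.9, 3.3), (2.2, 6.1)]
grid = [i * 0.05 for i in range(-20, 161)]

print(sup_distance(B, B_shift, grid), "<=", eps)
```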
$\mathbb{R}^2 \to \mathbb{R}$ so that (a) $\phi_u = \phi(u, -)$ is a probability distribution on $\mathbb{R}^2$ for each
$u \in \mathbb{R}^2$ and (b) the mean of $\phi_u$ is u. A standard choice is that of a spherically
symmetric Gaussian with mean u and a fixed variance σ. We also assume we are
given a continuous and piecewise differentiable nonnegative weighting function
$f : \mathbb{R}^2 \to \mathbb{R}$ that is zero along the x-axis.
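To make these ingredients concrete, here is a rough sketch of how a persistence image might be assembled from them. It rests on several assumptions that are not stated in the text above: that bars (a, b) are first mapped to points (a, b − a) as in [2], that the weighting function is a capped linear ramp in the second coordinate, that `sigma` is used as the standard deviation of the Gaussian, and that pixel integrals are approximated by the value at the pixel center. All function names are hypothetical.

```python
import math


def gaussian(u, z, sigma):
    """Spherically symmetric Gaussian density on R^2 with mean u, evaluated at z.
    Here sigma is treated as the standard deviation (an assumption)."""
    dx, dy = z[0] - u[0], z[1] - u[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)


def weight(u, y_max):
    """Assumed weighting: linear in the second coordinate, zero on the x-axis, capped at 1."""
    if y_max <= 0:
        return 0.0
    return min(max(u[1] / y_max, 0.0), 1.0)


def persistence_image(barcode, sigma, x_range, y_range, resolution):
    """Approximate pixel integrals of sum_u weight(u) * phi_u by center values."""
    points = [(a, b - a) for a, b in barcode]  # assumed (birth, persistence) coordinates
    y_max = max((p for _, p in points), default=0.0)
    (x0, x1), (y0, y1) = x_range, y_range
    dx, dy = (x1 - x0) / resolution, (y1 - y0) / resolution
    image = []
    for row in range(resolution):
        cy = y0 + (row + 0.5) * dy
        image.append([
            dx * dy * sum(
                weight(u, y_max) * gaussian(u, (x0 + (col + 0.5) * dx, cy), sigma)
                for u in points
            )
            for col in range(resolution)
        ])
    return image


img = persistence_image([(0.0, 2.0), (1.0, 1.5)], 0.3, (0.0, 2.0), (0.0, 2.0), 8)
img_zero = persistence_image([(1.0, 1.0)], 0.3, (0.0, 2.0), (0.0, 2.0), 8)
```

The resulting grid of numbers is a finite-dimensional vector, which is what makes this kind of representation convenient as input to standard machine learning methods.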
Remark 7.4 In [2], estimates proving this result are given both in the case
of a general choice of φ and the special case where φ is given by Gaussian
distributions with fixed variance. Of course, the estimates in the latter case are
stronger.
8 Generalized Persistence
8.1 Introduction
diagrams, which can also help clarify the structure of data sets. There are
at least two such constructions that have been discussed. The first is zig-zag
persistence, introduced in [22], and the second is multidimensional persistence,
discussed for example in [20]. The first is designed to study the relationship
between homology of constructions that are not nested within each other, such
as distinct samples from a given space, and the second provides invariants of
situations where it is natural to study filtrations of spaces involving more than
one parameter, such as filtering by both the scale parameter R and a measure
of density. We describe both extensions of the standard persistent homology
methods.
Consider the triangulation of R whose vertices are the integers. The set of vertices
of the barycentric subdivision of this simplicial complex is equipped with
a partial ordering, by recognizing that its elements are in one to one correspondence
with the simplices in the original triangulation, and that the set of simplices is
equipped with a partial ordering by treating it as a subset of the power set of
R. In concrete terms, we may view it as in one to one correspondence with the
set of all integers and half integers, with every integer n being less than or equal
to the elements n ± 1/2. We’ll refer to this partially ordered set as Z. A partially
ordered set (and its corresponding category) is said to be connected if any two
objects can be connected by a zig-zag sequence of morphisms. Connected partially
ordered subsets of Z are always determined by a pair (x, y), where x and
y are both integers or half integers, via the rule that assigns to the pair (x, y)
the collection of objects z for which x ≤ z ≤ y (in the total ordering on R).
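A small sketch may help fix the combinatorics of Z. It models the elements as exact fractions, implements the partial order in which an integer n lies below n ± 1/2, and lists the connected subset attached to a pair (x, y). The function names `leq`, `interval` and `is_zigzag_connected` are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def leq(x, y):
    """Partial order on Z: x <= x always, and an integer n lies below n - 1/2 and n + 1/2."""
    x, y = Fraction(x), Fraction(y)
    return x == y or (x.denominator == 1 and abs(y - x) == HALF)


def interval(x, y):
    """Objects z of Z with x <= z <= y in the usual total ordering on R."""
    x, y = Fraction(x), Fraction(y)
    out, z = [], x
    while z <= y:
        out.append(z)
        z += HALF
    return out


def is_zigzag_connected(objects):
    """Consecutive objects are comparable, so they are joined by a zig-zag of morphisms."""
    return all(leq(a, b) or leq(b, a) for a, b in zip(objects, objects[1:]))


print(interval(1, 3))  # the subset determined by the pair (1, 3)
```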
field, then we obtain a zig-zag persistence vector space. It turns out that they
can be classified up to isomorphism.
Theorem 8.1 (P. Gabriel, [38]) Let Z0 ⊂ Z be any finite connected subcat-
egory, and let F denote any zig-zag persistence object in the category of finite
dimensional vector spaces over a field K defined on Z0 . Then there is a finite
direct sum decomposition
$$F \cong \bigoplus_i F_i$$
This theory with applications is discussed in [22] and [21]. Here are some par-
ticular situations in which this construction can be used.
1. Samples: Given a finite metric space, one can ask to what extent the
persistent homology is captured on smaller samples from the data set. For
example, suppose that we have taken a very large uniform sample X from
the circle, and equip it with a metric by restricting the intrinsic metric
(say) on the circle to these points. We will then with high probability
find that the one-dimensional persistence barcode for the Vietoris-Rips
construction on X will contain one long bar and many much shorter ones.
Suppose that we do not actually know that the sample is coming from
a circle, but simply observe that we obtain a one-dimensional barcode
with one long bar and many shorter ones. A hypothesis suggested by
this observation is that the data is obtained by sampling from a space
with the homotopy type of the circle, but we may wonder if instead it
has somehow appeared “by accident”. One way to provide confirmation
of our hypothesis would be to observe that we obtain the same result
for various subsamples of our space, and that they are compatible in an
appropriate sense. Zig-zag persistence provides a way for carrying this out.
We suppose that we have chosen samples $X_1, \dots, X_n \subseteq X$, and create a
persistence object in the category of finite metric spaces and distance
non-increasing maps
$$X_1 \leftarrow X_1 \cap X_2 \rightarrow X_2 \leftarrow \cdots \rightarrow X_{n-1} \leftarrow X_{n-1} \cap X_n \rightarrow X_n$$
If we apply V(−, r) for a fixed choice of r, guided by the beginning and
endpoints of the observed long bar in the barcode for X, we obtain a
persistence object in the category of simplicial complexes. If we further
apply H1 (−, K) for a field K, we obtain a zig-zag persistence K-vector
space. By the classification Theorem 8.1 above, we obtain a decomposition
of the resulting zig-zag persistence K-vector space. The interpretation
of the informal idea that each sample has a one-dimensional homology
class and that they are consistent is the presence of an interval K-vector
space for a relatively long interval within the set {1, 3/2, 2, . . . , n − 1/2, n}, or
equivalently a relatively long bar in the zig-zag persistence barcode.
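The shape of the diagram of samples can be sketched in code. The example below only builds the alternating sequence of samples and pairwise intersections and checks that the edge sets of the Vietoris-Rips complexes at a fixed scale r are compatible with the inclusions; it does not compute homology or the zig-zag barcode. It assumes points in the plane with the Euclidean metric and the convention that an edge is present when the distance is at most r. The names `sample_zigzag`, `rips_edges` and `inclusions_hold` are hypothetical.

```python
import math


def sample_zigzag(samples):
    """Return [X1, X1&X2, X2, X2&X3, ..., Xn] as (label, frozenset) pairs."""
    diagram = []
    for i, current in enumerate(samples):
        diagram.append((f"X{i + 1}", frozenset(current)))
        if i + 1 < len(samples):
            shared = frozenset(current) & frozenset(samples[i + 1])
            diagram.append((f"X{i + 1}&X{i + 2}", shared))
    return diagram


def rips_edges(points, r):
    """Edges of the Vietoris-Rips complex at scale r (assumed convention: distance <= r)."""
    pts = sorted(points)
    return {(p, q) for i, p in enumerate(pts) for q in pts[i + 1:] if math.dist(p, q) <= r}


def inclusions_hold(diagram, r):
    """Each intersection must include into both neighbouring samples, edges included."""
    for j in range(1, len(diagram), 2):
        middle = rips_edges(diagram[j][1], r)
        left = rips_edges(diagram[j - 1][1], r)
        right = rips_edges(diagram[j + 1][1], r)
        if not (middle <= left and middle <= right):
            return False
    return True


X1 = [(0, 0), (1, 0), (0, 1)]
X2 = [(1, 0), (0, 1), (1, 1)]
X3 = [(0, 1), (1, 1), (2, 1)]
diagram = sample_zigzag([X1, X2, X3])
print([label for label, _ in diagram])
```

The positions in the returned list correspond to the indices 1, 3/2, 2, and so on, used for the zig-zag barcode.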
2. Functions on spaces: Suppose that we have a topological space X
equipped with a continuous map $f : X \to \mathbb{R}_+$. Then we have the ordinary
persistence K-vector spaces $\{H_i(f^{-1}([0, r]); K)\}_r$, which encode information
about the evolution of the homology of the sublevel sets of f as r
increases. However, one might be interested in gaining information about
approximations to level sets instead. They can provide more useful invari-
ants in a number of cases, and are approachable through zig-zag persistent
homology. We construct a zig-zag diagram of topological spaces as follows.
$$f^{-1}(I_0) \leftarrow f^{-1}(1) \rightarrow f^{-1}(I_1) \leftarrow \cdots \rightarrow f^{-1}(I_{N-2}) \leftarrow f^{-1}(N-1) \rightarrow f^{-1}(I_{N-1})$$
This is itself a short zig-zag diagram of length three, but if we have landmark
sets $\{L_i\}_{i=0}^{N}$ we can clearly construct a longer diagram that includes
the bivariant constructions $W(X, \{L_i, L_{i+1}\}, R)$ for i = 0, . . . , N − 1. The
construction is quite simple. Its vertex set is $L_1 \times L_2$, and a subset
There are many situations where it can be useful to introduce families of spaces
varying with more than one real parameter. For example, in any kind of topolog-
ical analysis of data sets, it usually is the case that if one considers the persistent
homology of the entire data set, the presence of outliers means that we do not
typically obtain the “right homology”. For example, if we have data sampled
from the unit circle, but a small number of points sprinkled throughout the unit
disc, then persistent homology will end up reflecting the homology of the disc
rather than that of the circle. This is often circumvented by selecting only the
points of sufficient density, as measured by a density estimator, since outliers
will typically have very low density. The question then becomes, though, how
to choose the threshold for density. Also, it turns out that in general, there
will be variation through different topologies as one changes the threshold. The
solution to this problem is to attempt to study all thresholds at once, just as
we do when considering the scale parameter in the Vietoris-Rips construction.
This leads us to the following definition.
Example 8.1 Let X be any metric space, and suppose that X is equipped with
a function f : X → R. It might be a density estimator, but it might also be a
measure of centrality. Then if X[s] denotes the subset {x ∈ X | f(x) ≤ s}, we
obtain a family of spaces V(X[s], r), and by applying homology with coefficients
in a field K, we obtain a 2-dimensional persistence K-vector space parametrized
by the pair (r, s). While density is used as described above to remove outliers
or noise, the case of a centrality measure allows one to capture the presence of
the analogues of ends in a finite metric space.
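As an illustration of the two-parameter family V(X[s], r), the following sketch tabulates the dimension of $H_0$ for each pair (r, s) on a small grid. It uses the fact that the dimension of $H_0$ of a Vietoris-Rips complex equals the number of connected components of its 1-skeleton, computed here with a union-find structure. The Euclidean metric, the convention that an edge is present when the distance is at most r, the sample data, and the names `components` and `betti0_table` are all assumptions made for the example.

```python
import math


def components(points, r):
    """Number of connected components of V(points, r), i.e. dim H_0 over any field."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def betti0_table(points, f, r_values, s_values):
    """dim H_0(V(X[s], r); K) for each (r, s), where X[s] = {x : f(x) <= s}."""
    return {
        (r, s): components([x for x in points if f(x) <= s], r)
        for r in r_values
        for s in s_values
    }


# Hypothetical data: three points on a line, filtered by their first coordinate.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
table = betti0_table(points, lambda x: x[0], [0.5, 1.0, 4.0], [0.0, 1.0, 5.0])
for key in sorted(table):
    print(key, table[key])
```

For fixed s the count can only decrease as r grows, while for fixed r it may go up or down as s grows, since new points can either create or merge components.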
each point and filter by that quantity. Considering the entire manifold, one can
obtain a one-dimensional persistence vector space by applying homology over
a field K. For computational purposes, though, we would need to deal with a
sample and use a second parameter, namely the scale parameter in a Vietoris-
Rips complex. This kind of analysis can for example be used to distinguish
between various ellipsoids.
denote the set of isomorphism classes of k-dimensional persistence K-vector
spaces. For each cube $C = C(a_1, \dots, a_k, b_1, \dots, b_k)$ we let µ(C) denote the
isomorphism class of the k-dimensional persistence K-vector space $\{V_{\vec{x}}\}_{\vec{x} \in \mathbb{R}^k_+}$
for which $V_{\vec{x}} = K$ whenever $\vec{x} \in C$, $V_{\vec{x}} = \{0\}$ whenever $\vec{x} \notin C$, and for which all
induced morphisms $V_{\vec{x}} \to V_{\vec{y}}$ for $\vec{x} \le \vec{y}$ and $\vec{x}, \vec{y} \in C$ are equal to the identity.
There is an obvious map θ : B(k) → M(k) which assigns to each minimal
representative $\{C_1, \dots, C_n\}$ the direct sum $\bigoplus_i \mu(C_i)$.
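The modules µ(C) and their direct sums are simple enough to describe in a few lines of code. The sketch below assumes that the cube $C(a_1, \dots, a_k, b_1, \dots, b_k)$ is the product of the closed intervals $[a_i, b_i]$, which is an assumption about a definition not reproduced here. For a direct sum of cube modules, the rank of the map $V_{\vec{x}} \to V_{\vec{y}}$ is then the number of cubes containing both points. The names `in_cube`, `rank` and `dimension` are hypothetical.

```python
def in_cube(x, cube):
    """Membership in the cube, assumed to be the product of closed intervals [a_i, b_i]."""
    a, b = cube
    return all(ai <= xi <= bi for ai, xi, bi in zip(a, x, b))


def rank(cubes, x, y):
    """Rank of V_x -> V_y in the direct sum of the cube modules, for x <= y coordinatewise.

    Each summand contributes 1 exactly when both x and y lie in its cube,
    since the map is then the identity on K, and is zero otherwise.
    """
    if not all(xi <= yi for xi, yi in zip(x, y)):
        raise ValueError("rank is only defined for x <= y coordinatewise")
    return sum(1 for c in cubes if in_cube(x, c) and in_cube(y, c))


def dimension(cubes, x):
    """Dimension of V_x, i.e. the number of cubes containing x."""
    return rank(cubes, x, x)


# Two overlapping squares in the case k = 2.
cubes = [((0, 0), (2, 2)), ((1, 1), (3, 3))]
print(dimension(cubes, (1.5, 1.5)), rank(cubes, (0.5, 0.5), (1.5, 1.5)))
```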
1. The set B(k) is a subset of the set of real points of an affine scheme
Spec(A).
2. The ring A is complicated, but there is a $G_m$-action on Spec(A) that allows
us to define a more manageable subring $A^{fin} \subseteq A$.
3. $A^{fin}$ is isomorphic to the polynomial ring $\mathbb{R}[x_{\vec{a},\vec{b}}]$ where $\vec{a}$ and $\vec{b}$ are
k-vectors of integers for which $a_i \ge 1$ and $b_i \ge 0$ for all i.
4. The ring $A^{fin}$ separates points in B(k), and maps injectively to the ring
of all real-valued functions on B(k).
5. There is a natural lift of the ring homomorphism $A^{fin} \to F(B(k), \mathbb{R})$ along
θ to a ring homomorphism $j : A^{fin} \to F(M(k), \mathbb{R})$. Here $F(X, \mathbb{R})$ denotes the
ring of real valued functions on a set X.
One generalization that has not been studied yet is to multidimensional persistence
where some of the persistence directions might be “zig-zag” directions
rather than ordinary persistence directions. Formally, this would mean functors
from the categories of the form $\mathbb{R}_+^m \times Z^n$. This would be very useful in a number
of situations. For example, in the zig-zag constructions discussed in Section 8
for samples and for witness complexes, we were forced to choose a threshold for
the scale parameter. If we had a way of representing functors from $\mathbb{R}_+ \times Z$ to
vector spaces, we would not be forced to make this selection.
are all either contractible or empty, for all choices of subsets $\{i_1, \dots, i_s\} \subseteq I$.
The conditions assure that D is homotopy equivalent to the nerve of U, and that
∂D is homotopy equivalent to the nerve of the covering $U^\partial$, as a consequence of
the nerve theorem. They further assure that the pair $(N_{\cdot} U, N_{\cdot} U^\partial)$ is equivalent to
the pair (D, ∂D). For any field K, the relative group $H_2(D, \partial D; K) \cong K$, since D
is a connected orientable manifold with boundary ∂D, and it follows that the
relative group $H_2(N_{\cdot} U, N_{\cdot} U^\partial; K) \cong K$. On the other hand, suppose that the
sets $B_R(v_i)$ do not cover all of D, and let D0 ⊆ D denote the union
$$\bigcup_i B_R(v_i) \cap D$$
$$H_2(N_{\cdot} U, N_{\cdot} U^\partial; K) \cong 0$$
Consequently, the simplicial complex of the nerve of the covering, which can be
computed using the information available from the sensors, determines whether
or not we have a covering based on its simplicial homology. The conditions on
the coverings given above are of course impossible to verify, but De Silva and
Ghrist are able to formulate a persistent homology condition that is a reasonable
substitute, and which gives a homological criterion in terms of a 2-dimensional
persistent homology group which is sufficient to guarantee coverage.
In order to understand the result in [32], we first observe that the information
from the sensors does not give us access to the Čech complex, since we have no
way of determining the intersection of balls without precise knowledge of the
distances between their centers. However, we do have access to the Vietoris-Rips
complex, since for any pair of points, we can tell whether or not they are within
a distance R, where R is the detection radius. We also have the comparison
results for the Vietoris-Rips complex and the Čech complex given in Proposition
4.1. The overall idea in [31] and [32] is to leverage the relationship between the
complex we have access to (Vietoris-Rips) and the complex from which we can
deduce coverage (Čech). In order to formulate such a result, we assume that we
are attaching a second number to each sensor, namely its covering radius $R_c$.
It is understood that each sensor covers a disc of radius $R_c$ around it, and that
it can detect other sensors at the detection radius R given above. We further
assume that $R_c \ge R/\sqrt{3}$. This allows us to guarantee that if $\{x_0, x_1, x_2\}$ forms a
two simplex in the Vietoris-Rips complex V(X, R), then they span a two simplex
in Č(X, $R_c$), by the second statement in Proposition 4.1. There are now the
following assumptions made in [32].
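The role of the bound $R_c \ge R/\sqrt{3}$ can be checked numerically in the plane. Three discs of radius $R_c$ have a common point exactly when the smallest disc enclosing their centers has radius at most $R_c$, so it suffices to verify that a triangle whose sides all have length at most R has minimum enclosing radius at most $R/\sqrt{3}$. The sketch below does this for random triangles; the helper name `min_enclosing_radius` and the sampling scheme are hypothetical, and the check is an experiment rather than a proof.

```python
import math
import random


def min_enclosing_radius(p, q, r):
    """Radius of the smallest disc containing three points in the plane."""
    a, b, c = sorted([math.dist(q, r), math.dist(p, r), math.dist(p, q)])
    if c == 0.0:
        return 0.0
    if a * a + b * b <= c * c:
        # Right, obtuse or degenerate triangle: the longest side is a diameter.
        return c / 2
    s = (a + b + c) / 2
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))  # Heron's formula
    return a * b * c / (4 * area)  # circumradius of an acute triangle


rng = random.Random(0)
worst_ratio = 0.0
for _ in range(10000):
    p, q, r = [(rng.random(), rng.random()) for _ in range(3)]
    longest = max(math.dist(p, q), math.dist(p, r), math.dist(q, r))
    worst_ratio = max(worst_ratio, min_enclosing_radius(p, q, r) / longest)

# The ratio never exceeds 1/sqrt(3), the value attained by equilateral triangles.
print(worst_ratio, 1 / math.sqrt(3))
```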
Theorem 9.1 Let R denote the Vietoris-Rips complex of the set of all sensors,
and let F denote the subcomplex on the fence vertices. If the sensors satisfy
the conditions above, and if there exists [α] ∈ H2 (R, F) so that ∂([α]) ≠ 0,
where ∂ : H2 (R, F) → H1 (F) is the connecting homomorphism, then the balls
of radius Rc around the sensors cover U .
This theorem is in some situations not ideal, due to the strong assumptions
on the boundary. In [32], it is shown that the persistent homology
of the pair (R, F) can be used to obtain coverage results with much weaker
hypotheses.
Another interesting direction is the study of time varying situations, where the
sensors move in time. In this case, there are situations where the balls around
the sensors do not cover the region at any fixed time, but where no “evader”
can avoid being sensed at some time. This kind of problem is referred to as an
evasion problem, and has been studied in [1] and [39]. The two approaches are
quite distinct: the approach in [1] uses zig-zag persistence, while the approach
in [39] develops a new kind of cohomology with semigroup coefficients. The
approach in [39] yields “if and only if” results.
10 Probabilistic Analysis
Theorem 10.1 We suppose that d and n are as above, that d ≥ 2, that we are
computing the k-dimensional persistence barcode B, and that 1 ≤ k ≤ d − 1. Let
$$\Delta_k(n) = \left( \frac{\log n}{\log \log n} \right)^{1/k}$$
Then there exist constants $A_k$ and $B_k$ so that
$$\lim_{n \to \infty} P\left( A_k \le \frac{\Pi_k(n)}{\Delta_k(n)} \le B_k \right) = 1$$
where P denotes probability, and where $\Pi_k(n)$ denotes the value of λ(B) for a
barcode generated as above.
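To get a feeling for how slowly the normalizing quantity $\Delta_k(n)$ grows, one can simply evaluate it. The sketch below assumes natural logarithms and n > e, so that log log n is positive; the function name `delta` is hypothetical.

```python
import math


def delta(k, n):
    """Delta_k(n) = (log n / log log n)^(1/k), using natural logarithms (an assumption)."""
    if n <= math.e:
        raise ValueError("n must exceed e so that log log n is positive")
    return (math.log(n) / math.log(math.log(n))) ** (1.0 / k)


for n in (10**3, 10**6, 10**9):
    print(n, [round(delta(k, n), 3) for k in (1, 2, 3)])
```

Even for a billion sample points the value for k = 1 stays in the single digits, which reflects how slowly the quantity $\Pi_k(n)$ in the theorem is allowed to grow.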
10.2 Robust Estimators
The stability theorem in Section 5.2 deals with the effect on persistence barcodes
of small perturbations in the metric space, where perturbations are small in the
sense of the Gromov-Hausdorff distance. In reality, though, one expects that in
a perturbation of a metric space, a small number of distances may undergo rela-
tively large perturbations. However, one believes that the number will be small,
and that the points involved will be of small measure in an underlying mea-
sure. In order to deal with this problem, one incorporates a measure-theoretic
component in one’s definitions.
The paper [8] studies the distributions on the space of persistence barcodes
arising from the persistence barcodes obtained by sampling from a fixed metric
measure space. More precisely, they study distributions on the completion $\overline{B}$ of
the metric space B equipped with the bottleneck distance. Let µ(n, k, X)
denote the distribution on $\overline{B}$ which arises from sampling a set S of n points on
a metric measure space X, and computing the k-dimensional barcode on S. We
have the following.
This result is then used in [8] to develop robust statistics for distinguishing the
results of sampling from a fixed metric space. Robust statistics are computable
quantities attached to samples from distributions that are relatively insensitive
to small changes in parameter values in the distribution from which the samples
are gathered, and also to the presence of outliers. An elementary example of this
idea is the median, which is relatively insensitive to outliers and is considered a
robust statistic, while the mean is not. For the problem at hand, [8] defines a
precise notion of robustness.
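The elementary example of the median versus the mean is easy to reproduce. The data below are hypothetical; the point is only that replacing a single value by a large outlier moves the mean substantially and leaves the median unchanged.

```python
import statistics

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
contaminated = clean[:-1] + [500.0]  # replace one value by an outlier

print(statistics.mean(clean), statistics.median(clean))
print(statistics.mean(contaminated), statistics.median(contaminated))
```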
robustness coefficient r > 0 if for any nonempty finite metric space $(X, d_X)$,
there exists a bound δ such that for all isometric embeddings of X in a finite
metric space $(X', d_{X'})$ for which $\#(X')/\#(X) < 1 + r$, it is the case that
$d(f(X), f(X')) < \delta$. There is a corresponding uniform notion that states that
the bound δ may be chosen universally, for all X.
We now obtain the following result for finite metric spaces, which are being
regarded as metric measure spaces by assigning to each metric space the uniform
measure.
Theorem 10.3 ([8]) For fixed n and k, µ(n, k, −) is uniformly robust with
robustness coefficient r and estimate bound δ = nr/(1 + r) for any r.
Theorem 10.4 ([8]) For fixed n, k, and P, $\Delta_P(n, k, -)$ is uniformly robust
with robustness coefficient r and estimate bound nr/(1 + r) for any r.
It is also possible to obtain a somewhat simpler result, which does not require
calculation of the full distribution µ(n, k, X). Instead of fixing a reference
distribution P, we choose a reference barcode B in the barcode space, and define
$\Delta^{med}_B(n, k, X)$ to be the median of the distribution of $d_B(B, -)$ applied to
samples of k-dimensional barcodes attached to samples of size n taken from the
metric measure space X.
Theorem 10.5 For fixed n, k, and B, the function $\Delta^{med}_B(n, k, -)$ from finite
metric spaces (with uniform probability measure) is robust with robustness
coefficient greater than ln 2/n.
question is what one means by a function chosen at random. The notion of a
random field is defined in [4] as follows.
$$F : \Omega \to (\mathbb{R}^T)^s$$
The idea here is that a random field is not a function, but an assignment
to each t ∈ T of a distribution on R rather than a fixed value. Each of
the restrictions $\mathbb{R}^T \to \mathbb{R}^{\{t\}} \cong \mathbb{R}$ produces a random variable, and therefore the
corresponding distribution, which we denote $F_t$. In fact, for any finite set of
points $t_1, \dots, t_n \in T$, we obtain a distribution on $\mathbb{R}^n$ which we denote $F_{t_1,\dots,t_n}$.
There is a particular class of random fields called the Gaussian random fields
that is particularly amenable to analysis.
The first example of this construction comes out of work of Wiener (see [67]
and [6]), using analysis of Brownian motion. Wiener studied the case $T = \mathbb{R}_+$,
and produced a Gaussian random field W, where the expected value of $W_t$ is
always 0 and the covariance is given by C(s, t) = min(s, t). He also showed that
when one samples from the associated measure on $\mathbb{R}^{\mathbb{R}_+}$, one obtains continuous
functions with probability 1, and so one calls W a continuous Gaussian field.
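The covariance structure C(s, t) = min(s, t) can be observed in a simple simulation. The sketch below samples the pair $(W_s, W_t)$ for s ≤ t from independent Gaussian increments, which is the standard construction of Brownian motion at finitely many times, and estimates $E[W_s W_t]$ by averaging. The function names, the sample size and the seed are arbitrary choices made for the example.

```python
import math
import random


def brownian_pair(s, t, rng):
    """Sample (W_s, W_t) for 0 < s <= t using independent Gaussian increments."""
    w_s = rng.gauss(0.0, math.sqrt(s))
    w_t = w_s + rng.gauss(0.0, math.sqrt(t - s))
    return w_s, w_t


def empirical_covariance(s, t, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[W_s W_t]; the means are 0, so this is the covariance."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        w_s, w_t = brownian_pair(s, t, rng)
        total += w_s * w_t
    return total / n_samples


print(empirical_covariance(0.5, 1.0), "should be close to", min(0.5, 1.0))
print(empirical_covariance(1.0, 3.0), "should be close to", min(1.0, 3.0))
```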
Given any analytic property of functions on manifolds, such as k-th order dif-
ferentiability, smoothness, or the property of being a Morse function, one can
create and study Gaussian random fields whose samples have the given property
with probability 1. Further, there are frequently a priori conditions on the co-
variance function of the random field that can be readily verified, and guarantee
the satisfaction of such properties.
The paper [9] proves a result concerning the persistent homology of the sublevel
sets of functions sampled from Gaussian random fields. We consider the real
valued function σ on barcodes given by
$$\sigma\{[a_1, b_1], \dots, [a_n, b_n]\} = \sum_i (b_i - a_i)$$
For any fixed x ∈ R and barcode $\beta = \{[a_1, b_1], \dots, [a_n, b_n]\}$, we define the
x-truncation of β, β[x], to be the barcode
$$\{[a_1, \min(b_1, x)], \dots, [a_n, \min(b_n, x)]\}$$
where it is understood that for any i such that $x \le a_i$, the interval $[a_i, b_i]$ is
simply deleted. Finally, we define
$$\chi_{pers}(M, f, x) = \sum_{i=0}^{\infty} (-1)^i \sigma(\{H_i(f^{-1}((-\infty, r]); K)\}_r [x])$$
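These three definitions translate directly into code. The sketch below represents a barcode as a list of pairs (a, b), possibly with b infinite, and takes as input one barcode per homological degree; the barcodes themselves are assumed to have been computed elsewhere, and the function names and sample data are hypothetical.

```python
import math


def sigma(barcode):
    """Sum of the bar lengths b_i - a_i."""
    return sum(b - a for a, b in barcode)


def truncate(barcode, x):
    """x-truncation: cap each right endpoint at x and delete bars with x <= a_i."""
    return [(a, min(b, x)) for a, b in barcode if a < x]


def persistent_euler_characteristic(barcodes_by_degree, x):
    """Alternating sum over degrees i of sigma applied to the x-truncated barcode."""
    return sum(
        (-1) ** i * sigma(truncate(bars, x))
        for i, bars in enumerate(barcodes_by_degree)
    )


# Hypothetical barcodes in degrees 0 and 1; the infinite bar becomes finite after truncation.
h0 = [(0.0, math.inf)]
h1 = [(1.0, 2.0)]
print(persistent_euler_characteristic([h0, h1], 3.0))
```

Truncation is what makes the quantity finite in the presence of infinite bars, since every bar is cut off at x before its length is measured.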
In [9], the following result is proved concerning the distribution of χpers (M, f, x)
for Gaussian random fields on Riemannian manifolds which produce Morse func-
tions with probability one.
Remark 10.1 The point of this result is that it gives a theoretical estimate for
the persistent Euler characteristic of sublevel sets in terms of classical invariants
of the manifold. Also, the result in [9] is actually proved in a much more general
context, that of regular stratified spaces and stratified Morse theory, which in
particular permits the study of manifolds with boundary. It also includes the
study of random fields that are of the form $G \circ (F_1, \dots, F_k)$, where G is a
deterministic function from $\mathbb{R}^k$ to $\mathbb{R}$, and $(F_1, \dots, F_k)$ is a vector-valued Gaussian
random field.
References
[1] H. Adams and G. Carlsson, Evasion paths in mobile sensor networks, The
International Journal of Robotics Research, 34,1, 2015, 90-104.
[2] H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman,
S. Chepushtanova, E. Hanson, F. Motta, and L. Ziegelmeier, Persistence
images: a stable vector representation of persistent homology, J. Machine
Learning Research, 18, 2017, 1-35.
[3] A. Adcock, E. Carlsson, and G. Carlsson, The ring of algebraic functions
on persistence barcodes, Homology, Homotopy, and Applications, vol. 18,
2016, 381-402.
[4] R. Adler and J. Taylor, Random Fields and Geometry, Springer, 2009.
[5] N. Akkiraju, H. Edelsbrunner, M. Facello, P. Fu, E. Mucke, and C. Varela,
Alpha shapes: definition and software, In Proc. Internat. Comput. Geom.
Software Workshop 1995.
[6] P. Baldi, Stochastic Calculus, an Introduction Through Theory
and Exercises, Springer Universitext, 2017.
[7] S. Barannikov, The framed Morse complex and its invariants, Adv. Soviet
Math., vol. 21, 1994, 93-115.
[8] A. Blumberg, I. Gal, M. Mandell, and M. Pancia, Robust statistics, hy-
pothesis testing, and confidence intervals for persistent homology on met-
ric measure spaces, Foundations of Computational Mathematics, 14, 2014,
745-789.
[9] O. Bobrowski and M. Borman, Euler integration of Gaussian random fields
and persistent homology, Journal of Topology and Analysis, 4,1,2012, 49-
70.
[10] O. Bobrowski, M. Kahle, and P. Skraba, Maximally persistent cycles in
random geometric complexes, Annals of Applied Probability, 27, 4, 2017,
2032-2060.
[11] B. Bollobás, Random Graphs, second edition, Cambridge University
Press, 2011.
[12] P. Bubenik, Statistical topological data analysis using persistence land-
scapes, The Journal of Machine Learning Research 16 (1), 2015, 77-102
[13] L. Buhovsky, V. Humilière, S. Seyfaddini, The action spectrum and $C^0$ symplectic
topology, arXiv:1808.09790, 2018.
[14] F. Cagliari, B. Di Fabio, and M. Ferri, One-dimensional reduction of
multidimensional persistent homology, Proc. Amer. Math. Soc. 138 (8), 2010,
3003-3017.
[15] Z. X. Cang, Lin Mu and G. Wei, Representability of algebraic topology for
biomolecules in machine learning based scoring and virtual screening, PLOS
Computational Biology, 14(1),2018, e100592.
[16] Z. X. Cang and G. Wei, TopologyNet: Topology based deep convolutional
and multi-task neural networks for biomolecular property predictions, PLOS
Computational Biology, 13(7), 2017, e1005690.
[17] G. Carlsson, A. Zomorodian, A. Collins, and L. Guibas, Persistence bar-
codes for shapes, International Journal of Shape Modeling, 11 (02), 2005,
149-187.
[29] D. Cohen-Steiner, H. Edelsbrunner, J. Harer, and Y. Mileyko, Lipschitz
functions have Lp -stable persistence, Foundations of Computational Math-
ematics, vol. 10,2, 2010, 127-139.
[30] V. De Silva and G. Carlsson, Topological estimation using witness com-
plexes, Symposium on Point Based Graphics, ETH, Zürich, Switzerland,
2004.
[31] V. De Silva and R. Ghrist, Coverage in sensor networks via persistent
homology, Alg. and Geom. Topology 7,2007, 339-358.
[32] V. De Silva and R. Ghrist, Homological sensor networks, Notices A.M.S.,
54,1, 2007.
[33] L. Li, W. Cheng, G. Glicksberg, O. Gottesman, R. Tarnier, R. Chen, E.
Bottinger, and J. Dudley, Identification of type 2 diabetes subgroups through
topological analysis of patient similarity, Science Translational Medicine,
7(311), doi: 10.1126/scitranslmed.aaa9364, 2015.
[43] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical
Learning. Data Mining, Inference, and Prediction, Springer Series
in Statistics, Springer, New York 2009.
[44] Y. Hiraoka, T. Nakamura, A. Hirata, E. G. Escolar, K. Matsue, and Y.
Nishiura, Hierarchical structures of amorphous solids characterized by per-
sistent homology, Proceedings of the National Academy of Sciences, 2016,
https://doi.org/10.1073/pnas.1520877113
[45] P. Jones and P. Smith, Stochastic Processes. An Introduction, CRC
Press, 2018.
[56] L. Polterovich and E. Shelukhin, Autonomous Hamiltonian flows, Hofer’s
geometry and persistence modules, Selecta Mathematica, 22, 2016, 227-296
[57] L. Polterovich, E. Shelukhin, and V. Stojisavljevic̀, Persistence modules
with operators in Morse and Floer theory, Moscow Mathematical Journal
17, no. 4, 2017, 757-786
[58] Y. Prokhorov, Convergence of random processes and limit theorems in prob-
ability theory, Theory Probab. Appl., 1, 1956, 157-214.
[59] M. W. Reimann, M. Nolte, M. Scolamiero, K. Turner, R. Perin, G. Chin-
demi, P. Dlotko, R. Levi, K. Hess, and H. Markram, Cliques of neurons
bound into cavities provide a missing link between structure and function,
Front. Comput. Neurosci., 12 June 2017.
[60] V. Robins, Towards computing homology from finite approximations, Pro-
ceedings of the 14th Summer Conference on General Topology and its Ap-
plications (Brookville, NY, 1999), Topology Proc. 24, 1999, 503-532.
[61] M. Saggar, O. Sporns, J. Gonzalez-Castillo, P. Bandettini, G. Carlsson, G.
Glover, and A. Reiss, Towards a new approach to reveal dynamical organi-
zation of the brain using topological data analysis, Nature Communications,
9, Article number 1399, 2018.
[62] M. Scolamiero, W. Chacholski, A. Lundman, R. Ramanujam, and S.
Öberg, Multidimensional persistence and noise, Foundations of Computational
Mathematics, 17 (6), 2017, 1367-1406.
[63] G. Singh, F. Memoli, and G. Carlsson, Topological methods for the analysis
of high dimensional data sets and 3D object recognition, SPBG 2007, 91-
100.
[64] J. Skryzalin and G. Carlsson, Numeric invariants from multidimensional
persistence, Journal of Applied and Computational Topology, 1, 2017, 89-
119.
[65] M. Usher and J. Zhang, Persistent homology and Floer-Novikov theory,
Geometry and Topology, no. 6, 2016, 3333-3430.
[66] K. Xia and G. Wei, Persistent homology analysis of protein structure, flexi-
bility and folding, International Journal for Numerical Methods in Biomed-
ical Engineering, 30, 2014, 814-844.
[67] N. Wiener, Nonlinear Problems in Random Theory, Technology Press
Research Monographs, The Technology Press of the Massachusetts Institute
of Technology and John Wiley & Sons, Inc. New York; Chapman & Hall,
Ltd., London, 1958.
[68] A. Zomorodian and G. Carlsson, Computing persistent homology, Discrete
and Computational Geometry, 33 (2), 2005, 249-274.