A Survey of Dimension Estimation Methods
Abstract
It is a standard assumption that datasets in high dimension have an internal structure which means
that they in fact lie on, or near, subsets of a lower dimension. In many instances it is important to
understand the real dimension of the data, hence the complexity of the dataset at hand. A great variety
of dimension estimators have been developed to find the intrinsic dimension of the data but there is little
guidance on how to reliably use these estimators.
This survey reviews a wide range of dimension estimation methods, categorising them by the geo-
metric information they exploit: tangential estimators which detect a local affine structure; parametric
estimators which rely on dimension-dependent probability distributions; and estimators which use topo-
logical or metric invariants.
The paper evaluates the performance of these methods, as well as investigating varying responses
to curvature and noise. Key issues addressed include robustness to hyperparameter selection, sample
size requirements, accuracy in high dimensions, precision, and performance on non-linear geometries.
In identifying the best hyperparameters for benchmark datasets, overfitting is frequent, indicating that
many estimators may not generalise well beyond the datasets on which they have been tested.
1 Introduction
Data analysis in high-dimensional space poses significant challenges due to the “curse of dimensionality”.
However, the data are often believed to lie in a latent space with lower intrinsic dimension than that of the
ambient feature space. Techniques that estimate the intrinsic dimension of data inform how we visualise
data in biological settings [88,110,115], build efficient classifiers [38], forecast time series [83], analyse crystal
structures [80] and even assess the generalisation capabilities of neural networks [5,10,29,96]. Moreover they
have been proposed as being able to distinguish human written text from AI written text [105]. Accurately
identifying the intrinsic dimension of the dataset is also key to understanding the accuracy, efficiency and
limitations of dimensionality reduction methods [7, 75, 107].
We study how estimators respond to practical challenges such as choosing appropriate hyperparameters,
and the presence of noise and curvature in data. We do so by assessing their performance on a standard set of
benchmark manifolds described in [18, 44, 89]. The datasets encompass a large range of intrinsic dimensions
(1 to 70), codimensions (0 to 72), and ambient dimensions of the embedding space (3 to 96) as well as
different geometries (flat, constant curvature, variable curvature).
Dimension estimators have been regularly surveyed, reflecting both their importance and the productivity
of the field [15, 16, 18]. As it is now nine years since the last survey, a renewed picture of the field is
appropriate. Key developments include the growth of topological data analysis, which adds new types of
estimators. We pay particular attention to the geometric underpinnings of the different estimators, focussing on the geometric information used in the estimation process and the impact this has, rather than on the local, global, pointwise division. In carrying out this survey, we have also significantly extended the capability of the scikit-dimension package [6] by adding a wide variety of estimators.

Figure 1: Example of a cover and a refinement. There are points on the circle which are contained in three elements of the initial cover. After refining, every point is contained in at most two elements. It would not be possible to refine until each point was contained in only one element of the cover, so that dim = 1.
Algebraic (Hamel) Dimension Perhaps the simplest notion of dimension which can be defined is the
algebraic dimension of a vector space V , which is the cardinality of a basis B of V over the base field F .
Hausdorff Dimension We now specialise further to metric spaces. Let X be a metric space and fix d ≥ 0.
For each δ > 0, consider any countable cover {Ui } of X with diam(Ui ) < δ for all i, where diam(Ui ) denotes
the diameter of Ui . Define
$$H_\delta^d(X) = \inf\left\{ \sum_i \operatorname{diam}(U_i)^d \;:\; X \subset \bigcup_i U_i,\; \operatorname{diam}(U_i) < \delta \right\},$$
and then
$$H^d(X) = \lim_{\delta \to 0} H_\delta^d(X).$$
If d is too large, one finds $H^d(X) = 0$; if d is too small, $H^d(X) = +\infty$. The Hausdorff dimension of X is
equivalently the unique d0 for which
$$H^d(X) = \begin{cases} +\infty, & d < d_0, \\ 0, & d > d_0, \end{cases} \qquad H^{d_0}(X) \in [0, \infty].$$
This definition captures “fractional” scaling behaviour: sets like the Cantor set or Julia sets often have non-integer Hausdorff dimension $\dim_H$ [12].
Box-Counting (Minkowski-Bouligand) Dimension Let X ⊂ Rn . For each ϵ > 0, let N (ϵ) be the
minimum number of n-dimensional boxes (closed cubes) of side length ϵ required to cover X. Since the
covering elements here are restricted to equal-sized boxes, one records how N (ϵ) grows as ϵ → 0. The
box-counting dimension dB of X is defined as [33]:
$$d_B = \lim_{\epsilon \to 0} \frac{\log N(\epsilon)}{\log(1/\epsilon)}.$$
In other words, if N (ϵ) ∼ ϵ−d as ϵ → 0, then dB = d. Note that this is a special (and generally coarser)
case of the Hausdorff dimension, corresponding to restricting all coverings in the Hausdorff construction to
equal-sized boxes.
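In practice the limit is approximated over a finite range of scales. The following minimal numpy sketch (ours, not taken from any of the packages discussed later) counts occupied boxes at a handful of scales and regresses log N(ϵ) against log(1/ϵ); the chosen scales are illustrative and strongly affect the estimate.

```python
import numpy as np

def box_counting_dimension(X, scales):
    """Estimate the box-counting dimension of a point cloud X (n x D array):
    count occupied boxes at each scale and regress log N(eps) on log(1/eps)."""
    X = np.asarray(X, dtype=float)
    X = X - X.min(axis=0)                      # shift into the positive orthant
    counts = []
    for eps in scales:
        boxes = np.floor(X / eps).astype(int)  # box index of each point
        counts.append(len({tuple(b) for b in boxes}))
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(scales)), np.log(counts), 1)
    return slope

# Points on a unit circle in R^3 should give an estimate close to 1.
theta = np.random.default_rng(0).uniform(0, 2 * np.pi, 2000)
circle = np.c_[np.cos(theta), np.sin(theta), np.zeros_like(theta)]
print(box_counting_dimension(circle, scales=[0.2, 0.1, 0.05, 0.025]))
```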
Correlation Dimension The correlation dimension $d_C$ is a way to measure the dimensionality of a space with a probability measure, particularly in applications like dynamical systems and data science. The
correlation integral captures the spatial correlation between points by finding the probability that a pair of
points (x, y) is separated by a distance less than ε. Let µ be a probability measure on M .
$$C(\epsilon) = \int_{M \times M} 1_{\lVert x - y \rVert < \epsilon} \, d\mu^{\otimes 2}, \qquad d_C = \lim_{\epsilon \to 0} \frac{\log C(\epsilon)}{\log \epsilon}.$$
This dimension estimates the scaling behaviour of point-pair correlations and is especially relevant for
analysing the structure of attractors in dynamical systems. It is bounded above by the information di-
mension, another probability-based dimension, which is in turn bounded above by the Hausdorff dimension
of M [42]. Since it is dependent on the measure µ, it takes account of the varying density of attractors.
Equivalence of dimensions for manifolds Consider the specific case of a smooth submanifold M d ⊂
RD . The tangent space at a point p ∈ M d , Tp M , is isomorphic to Rd and so will have algebraic dimension
d. The Lebesgue covering dimension of a smooth manifold of dimension d is d [108]. Endowing the manifold
with the Riemannian metric inherited from RD , we can also calculate its dimension as a metric space. The
Hausdorff dimension of a Riemannian manifold is the same as its topological dimension [74], in this case d.
We also have that M d ⊂ RD has box-counting dimension d [33]. If µ is a probability measure on M which
is absolutely continuous with respect to the volume measure, then the correlation dimension is the growth
rate of tubes around the diagonal submanifold in M × M , which can again be seen to be d (by a result of
Gray and Vanhecke reported in [43]). Hence we have that for a smooth submanifold of RD of dimension d
all notions of dimension described above coincide.
For a given dataset X ⊂ RD , the intrinsic dimension has been described as being the “minimum number
of parameters necessary to describe the data”, and presented as equivalent to the intrinsic dimensionality
of the data generating process itself [37]. However, regardless of what X is, it is always possible to embed
a 1-dimensional curve into RD in such a way that every data point in X lies on the curve (or, indeed, a
disconnected 0-manifold comprised of precisely the points of X).
When a very low-dimensional manifold of this type is used to model the data, the neighbourhood structure
of the data is not preserved. As a result, this representation of the data involves significant information loss.
Since the dataset does not come with a specification of the true latent neighbourhood structure, though, it
is not possible to determine precisely when a representation does or does not involve information loss.
One might attempt to define intrinsic dimensionality of a dataset instead as the smallest d such that
there is a d-dimensional space M which can be mapped into RD in a geometrically controlled way so that
every point of X lies in the image of M . The nature of the geometric control depends on what types of
generating process we might consider and is related to the question of what the neighbourhood structure of
X is. The weaker the geometric control is, the more embeddings are considered, until eventually every point
on X is permitted to lie on the image of a curve of possibly very high curvature.
If M is understood to be an embedded submanifold, then control on the reach of M is appropriate [59].
Indeed, as manifolds of lower reach are considered, the probability of error in estimating the dimension grows.
It is therefore not possible to assign an intrinsic dimension to a dataset. Only the underlying support of the
distribution has an intrinsic dimension and aspects of the geometry of this support and of the distribution
itself influence how challenging it is to estimate this dimension.
In this paper, we address the following core problem: construct a statistical procedure, an estimator
$$\hat{d}_N : \left(\mathbb{R}^D\right)^N \to \mathbb{R},$$
such that if the finite point cloud $X = \{x_1, x_2, \ldots, x_N\} \subset \mathbb{R}^D$ is independently identically distributed according to an unknown distribution supported on a subset M of true intrinsic dimension d then $\hat{d}_N$ satisfies the consistency requirement
$$\hat{d}_N(x_1, \ldots, x_N) \to d \quad \text{as } N \to \infty.$$
Equivalently, $\hat{d}_N$ must converge (in probability, or almost surely under stronger assumptions) to the true
dimension d as the sample size N grows. Beyond mere convergence, one typically seeks quantifiable rates of
convergence and robustness to perturbations—such as additive noise or non-zero curvature on M .
Many estimators which we will consider take a hyperparameter k which defines a local region in the
dataset. Consistency will sometimes be known under the additional assumptions that k → ∞ while k/N → 0.
The first condition requires that every local neighbourhood contains arbitrarily many points, while the second
requires the local neighbourhoods to be arbitrarily small in comparison to the global dataset.
This paper will consider, compare and contrast a variety of estimators, evaluating them with respect to
their computational feasibility, their consistency and convergence on a set of benchmark synthetic datasets
and their sensitivity to hyperparameter selection.
Uniform distributions of points in high-dimensional balls and cubes have been widely studied and demon-
strate a variety of interesting behaviours that create statistical challenges. This is discussed in more detail
in Section 2.6.
All of these phenomena both provide a compelling reason to carry out dimension reduction as well as
creating a challenge in estimating the dimension of higher-dimensional spaces.
1.4 Notation
In this paper, we assume that our finite sample lies on a smooth d-dimensional manifold M d ⊂ RD . Con-
cretely, let $M^d$ be a compact, smooth submanifold of $\mathbb{R}^D$, let $f : M^d \to \mathbb{R}$ be a probability density on $M^d$, and write
$$X = \{ x_1, x_2, \ldots, x_N \} \subset M^d$$
for a finite sample of size N, drawn i.i.d. from the distribution defined by f.
Many dimension estimation procedures are local in nature, so for each point p of interest we need to
describe a neighbourhood of p within X.
For any point p ∈ RD , let rk (p) denote the Euclidean distance from p to its kth nearest neighbour in X.
If p ∈ X, then by convention we do not count p itself as its own nearest neighbour, so that r1 (p) > 0. The
ratio between two nearest-neighbour distances will be denoted by
$$\rho_{i,j}(p) = \frac{r_i(p)}{r_j(p)}.$$
We write
$$\mathrm{knn}(p; k) = \{\, x \in X : \lVert x - p \rVert \le r_k(p) \,\}$$
for the set of k nearest neighbours of p in X. When the value of k is clear from context, we simply write
knn(p). For any ε > 0, define
$$B(p, \varepsilon) = \{\, x \in X : \lVert x - p \rVert < \varepsilon \,\}.$$
Thus B(p, ε) is the subset of X lying inside the open Euclidean ball of radius ε centered at p. Denote by
N (p, ε) the number of points in B(p, ε), not counting p itself.
Both knn(p) and B(p, ε) provide local neighbourhoods of p within X, which we will use for various local
dimension estimates.
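As a concrete illustration of this notation, the sketch below (assuming scikit-learn for the neighbour search; the function name is ours) computes $r_k(p)$, knn(p; k) and B(p, ε) for a sample point $p = x_i$, following the convention that p is not counted as its own neighbour.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_neighbourhoods(X, i, k, eps):
    """Return (r_k(p), knn(p; k), B(p, eps)) for the sample point p = X[i]."""
    p = X[i]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)    # k+1: the query point is returned first
    dists, idx = nn.kneighbors(p.reshape(1, -1))
    dists, idx = dists[0, 1:], idx[0, 1:]              # drop p itself
    r_k = dists[-1]                                    # distance to the kth nearest neighbour
    knn_pts = X[idx]                                   # the set knn(p; k)
    ball_pts = X[np.linalg.norm(X - p, axis=1) < eps]  # B(p, eps); N(p, eps) = len(ball_pts) - 1
    return r_k, knn_pts, ball_pts
```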
2 Families of Estimators
The authors have systematically searched the literature to identify papers on dimension estimation pub-
lished after the most recent survey ( [16], which provides a good contextual overview of methods previous
to 2015/16), indexed by Google Scholar using the search terms “intrinsic dimension” and “dimension esti-
mation”. Another survey of previous techniques from the same time [54] focuses on fractal dimensions.
This search is complicated by the vast number of diverse disciplines which are interested in the problem.
The disparate nature of the work in the field means that new estimators are not always well contextualised
within the historical research. Practitioners often appear unaware of well-established techniques.
While we place a focus on papers published after 2015, we also seek to provide a description of the most
essential established techniques. This paper cannot describe every estimator in detail, and our reference list
may not give an entirely comprehensive description of recent developments. We encourage practitioners to
consider more recent estimators rather than just relying on those that are familiar.
The dimension estimators which have been described in the literature fall into several broad families
depending on which underlying geometric information is used to infer the intrinsic dimension.
2.1 Tangential Estimators
A small neighbourhood in a smooth manifold can always be viewed as the graph of a function in the following
way. Let p ∈ M d ⊂ RD . After translating and rotating, we may assume that p = 0 and Tp M = Rd . Then,
near p, the manifold is the graph of a function $g : \mathbb{R}^d \to \mathbb{R}^{D-d}$ with $dg(0) = 0$; equivalently, it is the image of the map $F : \mathbb{R}^d \to \mathbb{R}^D$, $F(x) = (x, g(x))$, whose differential at 0 is the standard embedding $\mathbb{R}^d \to \mathbb{R}^D$ along the first d co-ordinates.
The points in knn(p; k) or in B(p, ε), for sufficiently small k or ε, are therefore concentrated around an
affine subspace of RD which is the tangent space to the manifold. Many estimators seek to identify the
dimension of the manifold by finding this affine subspace. In this section we describe some of these, which
we refer to as tangential estimators.
Threshold parameters Some methods involve setting a parameter value which can be used to determine which eigenvalues of the local covariance matrix are “large”. As described below in Section 3.4, the output of the estimator can be very
sensitive to the selected value.
Fukunaga and Olsen [37] compare each eigenvalue to the largest, identifying an eigenvalue as “large” if
it exceeds a certain proportion of the largest one. This gives an estimated dimension of max{u : λu > αλ1 }
for some fixed parameter 0 < α < 1.
Fan et al. use two different thresholding methods and recommend using the lower of the two estimates [34]. One is to consider the “gap” $\lambda_u / \lambda_{u+1}$. If this number is large, there is little additional explained variance in adding a dimension, and a parameter β > 1 may be chosen so that the estimated dimension is $\min\{u : \lambda_u / \lambda_{u+1} > \beta\}$.
Another is based on the total proportion of the variance explained, so that the dimension is given by $\min\{u : \sum_{i=1}^{u} \lambda_i > \gamma \cdot \sum_{i=1}^{N} \lambda_i\}$ for some parameter 0 < γ < 1. This method has been validated by Lim et al. [68], who showed that with theoretical bounds on sampling the dimension is correctly estimated with high probability.
Kaiser takes a very simple approach to determining which eigenvalues are dominant, using only those
which are above the mean eigenvalue [52]. However, this can give very low estimates in cases of low codi-
mension. Jolliffe has suggested that this should be softened somewhat, to 70% of the mean [49]. Clearly any
other proportion could be used, so that this is a user-chosen parameter.
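For illustration, a minimal sketch of these threshold rules applied to the eigenvalues of a local covariance matrix is given below. The function names, rule labels and default parameter values are ours and do not reproduce the scikit-dimension implementations.

```python
import numpy as np

def lpca_eigenvalues(neighbourhood):
    """Eigenvalues of the local covariance matrix, in descending order."""
    eig = np.linalg.eigvalsh(np.cov(neighbourhood, rowvar=False))
    return np.sort(eig)[::-1]

def threshold_dimension(lam, rule="FO", alpha=0.05, beta=10.0, gamma=0.95):
    """Apply one of the eigenvalue-threshold rules discussed in the text."""
    lam = np.asarray(lam, dtype=float)
    if rule == "FO":        # Fukunaga-Olsen: eigenvalues above a fraction of the largest
        return int(np.sum(lam > alpha * lam[0]))
    if rule == "gap":       # first index u with a large gap lambda_u / lambda_{u+1}
        ratios = lam[:-1] / lam[1:]
        big = np.where(ratios > beta)[0]
        return int(big[0] + 1) if len(big) else len(lam)
    if rule == "variance":  # smallest u explaining a fraction gamma of the total variance
        return int(np.searchsorted(np.cumsum(lam) / lam.sum(), gamma) + 1)
    if rule == "Kaiser":    # eigenvalues above the mean eigenvalue
        return int(np.sum(lam > lam.mean()))
    raise ValueError(f"unknown rule {rule!r}")
```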
Probabilistic methods Other thresholding methods seek to label the dth eigenvalue as dominant if it is
sufficiently larger than it would be under a null hypothesis that the eigenvalues arise solely from random
variation. The challenge here is in selecting an appropriate distribution of eigenvalues to model the null
hypothesis.
Frontier uses the “broken stick” distribution [36]. This distribution is generated by drawing D−1 uniform
random variables on [0, 1] and arranging the lengths of the resulting D subintervals in descending order. The
assumption behind this method is that, if the data is unstructured, the eigenvalues will be distributed
according to the lengths of these subintervals. However, there does not appear to be any theoretical basis
for choosing this distribution.
A permutation test approach is given by Buja and Eyuboglu [14] in the context of PCA, but it can easily
be adapted for lPCA. If the local data is represented as a k × D matrix, each column can be reshuffled
independently to give an alternative dataset. Each feature has the same variance, but the internal structure
of the dataset has been removed. Carrying out PCA on these unstructured sets yields a null distribution for
the eigenvalues.
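A minimal sketch of this permutation test, adapted to a single local neighbourhood, might look as follows; the number of permutations, the quantile used as the null threshold, and the function name are illustrative choices rather than those of [14].

```python
import numpy as np

def permutation_dimension(neigh, n_perm=200, quantile=0.95, seed=0):
    """Estimate the local dimension by comparing the PCA spectrum of `neigh`
    (a k x D array) with a null distribution obtained by independently
    permuting each feature column."""
    rng = np.random.default_rng(seed)

    def spectrum(A):
        return np.sort(np.linalg.eigvalsh(np.cov(A, rowvar=False)))[::-1]

    observed = spectrum(neigh)
    null = np.empty((n_perm, neigh.shape[1]))
    for t in range(n_perm):
        null[t] = spectrum(np.column_stack([rng.permutation(col) for col in neigh.T]))
    thresholds = np.quantile(null, quantile, axis=0)
    exceeds = observed > thresholds
    # dimension = number of leading eigenvalues exceeding the null threshold
    return int(np.argmax(~exceeds)) if not exceeds.all() else neigh.shape[1]
```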
Minka [78] adopts a Bayesian model selection approach by maximising the Bayes evidence in fitting the
data to normal distributions of different dimensions, extending the maximum likelihood framework of [11].
Other methods Craddock and Flood [25] argue that eigenvalues corresponding to ‘noise’ should decay
in a geometric progression. Plotting log λi against i, these eigenvalues will therefore appear as a trailing
straight line, while dominant eigenvalues are those before the line. This gives a dimension estimate as the
value d such that λd+1 is the first point on the trailing straight line.
PCA based approaches have been augmented with an autoencoder for residual estimation [55]. For large
sample sizes, alternative methods of finding eigenvalues have been proposed [82]. Where missing features
pose a difficulty in using PCA, a method has been proposed in [88].
The conical dimension (CDim) [112] takes a slightly different approach – its value at p is the dimension
of the smallest affine subspace W through p so that the angle at p between each element of knn(p) and W is
less than π/4. This method is proven to be correct under conditions on the reach and the sampling density,
conditions which require an exponentially large sample size in each neighbourhood.
Comparing the number of sample points in balls of two radii gives the approximate relation
$$\frac{N(p, \varepsilon_1)}{N(p, \varepsilon_2)} \approx \left(\frac{\varepsilon_1}{\varepsilon_2}\right)^d$$
for $0 < \varepsilon_1 < \varepsilon_2$, which holds if $B(p, \varepsilon_2)$ is close enough to lying in an affine space and the density f can be assumed to be constant on $B(p, \varepsilon_2)$. This is the basis of the estimator in [35], which was later elaborated on in [9].
Volume-based estimators have also been developed for discrete spaces, such as lattices [73].
Correlation Integral The correlation integral CorrInt [42] is a global approach motivated by this geometry.
It is used in chaos theory to indicate the level of spatial correlation between points in the attractor. Let
$\Delta(\varepsilon) \subset \mathbb{R}^{2D}$ be the neighbourhood of the diagonal given by $\Delta(\varepsilon) = \{(x, y) : \lVert x - y \rVert < \varepsilon\}$. Then the function
$$C(\varepsilon) = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i,j} 1_{\Delta(\varepsilon)}(x_i, x_j),$$
known as the correlation integral, counts the proportion of pairs of points lying within distance ε of each
other. The growth rate of this function for small ε, C(ε) ∼ εd , gives the correlation dimension of the
attractor. This depends on the measure on the attractor rather than on the geometry of its support.
The value can be estimated by assuming that the fixed finite sample size N is sufficiently large that
$$C(\varepsilon) \simeq \frac{1}{N^2} \sum_{i,j} 1_{\Delta(\varepsilon)}(x_i, x_j),$$
and then estimating the growth rate of this quantity, for example from the slope of $\log C(\varepsilon)$ against $\log \varepsilon$ over a suitable range of scales.
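For illustration, this finite-sample recipe can be written in a few lines (a sketch of ours, assuming scipy; the scales must be small relative to diam(X) but large enough that C(ε) > 0):

```python
import numpy as np
from scipy.spatial.distance import pdist

def correlation_dimension(X, eps_values):
    """Slope of log C(eps) against log eps for the empirical correlation integral.
    Using unordered pairs changes C(eps) only by a constant factor, which does
    not affect the slope."""
    d = pdist(X)                                         # all pairwise distances
    C = np.array([np.mean(d < eps) for eps in eps_values])
    slope, _ = np.polyfit(np.log(eps_values), np.log(C), 1)
    return slope
```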
The capacity dimension of a set S is defined as
$$\dim_{\mathrm{cap}}(S) = \lim_{r \to 0} \frac{\log N(r)}{\log(1/r)},$$
where N (r) is the r-covering number (the number of open balls of radius r required to cover set S). It is
worth noting that if both the topological dimension and capacity dimension exist for a given set, they are
equal to each other. For ease of computation, the r-packing number M (r) can be used instead of N (r).
Given a metric space X with distance metric d(·, ·), the set U ⊂ X is said to be r-separated if d(x, y) ≥ r for all distinct x, y ∈ U. The r-packing number M(r) of a set S ⊂ X is the maximum cardinality of an r-separated subset of S, and the capacity dimension can equivalently be computed as
$$\dim_{\mathrm{cap}}(S) = \lim_{r \to 0} \frac{\log M(r)}{\log(1/r)}.$$
As with CorrInt, this limit can only be approximated when using a finite sample. The dimension is estimated by a scale-dependent formula comparing packing numbers at two scales $r_1 < r_2$,
$$\hat{d} = -\frac{\log M(r_2) - \log M(r_1)}{\log r_2 - \log r_1}.$$
Because finding the exact value of M(r) is an NP-hard problem, a greedy estimate of this value is used.
Volumes through graphs The methods of Kleindessner and von Luxburg are designed to provide an
estimate without using distances; instead considering only the knn graph [60]. Since the ratio of the Lebesgue
measure of the ball of radius 1 centred at x ∈ X to the Lebesgue measure of the ball of radius 2 is an injective
function of dimension, a dimension estimate dˆDP (x) can be obtained from the ratio. Similarly, the volume
of the intersection of two balls of radius 1 with different centres can be compared to the unit ball, giving the
ratio
$$S(d) = I_{3/4}\!\left(\frac{d+1}{2}, \frac{1}{2}\right),$$
where I is the regularized incomplete beta function. Inverting this function gives another estimate. The idea for both methods can be seen in Figure 2.
To use these ratios to estimate dimension, construct the knn graph where points are connected by a
directed unweighted edge from i to j if and only if xj ∈ knn(xi ). Using the edge length metric de , the ball
of radius r is defined by Be (i, r) = {j ∈ V |de (i, j) < r} where de (i, j) is the distance from i to j in the knn
graph. They then consider the ratio of |Be (i, 1)| and |Be (i, 2)| and then average these ratios over a subset of
Figure 2: On the left the idea behind the doubling property estimator is represented. The ratio of the
number of points in the two balls grows with dimension. On the right, WODCap counts the number of points
in the intersection of the two balls. WODCap stands somewhat alone as being the only bi-local estimator
that we have found.
the points to recover a dimension estimate. A similar construction can be used for the intersection of two
unit balls with neighbouring centres, giving the WODCap estimator dˆCAP .
It has been shown that the sample version converges in probability to the theoretical construction as
n → ∞ in the special case where M d ⊂ Rd is a compact domain in Euclidean space such that the boundary
satisfies certain regularity assumptions and the density of M is bounded both above and below.
Theorem 2.1. [60] Let X = {x1 , . . . , xn } ⊂ M d ⊂ Rd be an i.i.d. sample from f and let G be the directed,
unweighted kNN-graph on X. Given G as input and a vertex i ∈ {1, . . . , n} chosen uniformly at random,
both dˆDP (xi ) and dˆCAP (xi ) converge to the true dimension d in probability as n → ∞ if k = k(n) satisfies
k ∈ o(n), log(n) ∈ o(k), and there exists k ′ = k ′ (n) with k ′ ∈ o(k) and log(n) ∈ o(k ′ ).
These estimators are perhaps the most sensitive to choice of neighbourhood size, as discussed later in
Section 3.4.2. The first method tends to vastly underestimate the dimension, especially with high dimensional
datasets. WODCap performs very closely to Levina and Bickel’s MLE [67] [60].
We refer the reader to [85, 86, 95] which further develops these approaches.
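To convey the idea, the simplified sketch below computes graph balls of one and two hops in the directed knn graph and converts the ratio of their sizes into a pointwise estimate via $|B(2r)|/|B(r)| \approx 2^d$. The estimator of [60] instead averages the ratios over points before inverting the exact finite-sample relationship, so this is only an approximation of their method.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def doubling_property_estimates(X, k=10):
    """Pointwise dimension estimates from the doubling property of the
    directed, unweighted knn graph (dense n x n hop matrix: illustration only)."""
    G = kneighbors_graph(X, n_neighbors=k, mode="connectivity")
    hops = shortest_path(G, method="D", unweighted=True)   # directed hop distances
    b1 = np.sum(hops <= 1, axis=1)    # points within one hop (includes the point itself)
    b2 = np.sum(hops <= 2, axis=1)    # points within two hops
    return np.log2(b2 / b1)           # pointwise estimates, e.g. to be averaged
```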
Modelling the number of sample points within distance r of a given $p \in X$ as a Poisson process whose rate depends on the dimension d, Levina and Bickel [67] derived a maximum likelihood estimate for d, which depends on how the neighbourhood of p is selected. Considering points in $B(p, \varepsilon)$, the maximum likelihood estimator for d is
$$\hat{d}_\varepsilon(p) = \left( \frac{1}{N(p, \varepsilon)} \sum_{j=1}^{N(p, \varepsilon)} \log \frac{\varepsilon}{r_j(p)} \right)^{-1},$$
where N (p, ε) is the number of points in B(p, ε). Similarly, considering points in knn(p; k), we obtain
$$\hat{d}_k(p) = \left( \frac{1}{k-1} \sum_{j=1}^{k-1} \log \rho_{k,j}(p) \right)^{-1}.$$
Normalising by k − 2 instead will make this estimator asymptotically unbiased (under the conditions $n, k \to \infty$ and $k/n \to 0$) and give variance $\frac{d^2}{k-3}$ [67].
To obtain a global estimate, Levina and Bickel used the arithmetic mean of these local estimates over
all samples. However, as noted by MacKay and Ghahramani [72], making the approximate assumption that
the rates at each point are independent allows one to write down a global likelihood function. The global
maximum likelihood estimator for the dimension is not the arithmetic mean of the local estimates, but the
harmonic mean. This adjustment provides significant improvements at small k. Levina and Bickel argue
that the spatial dependence does not prevent the variance of the global estimator from behaving as N −1 ,
since for fixed k this results in N/k roughly independent groups of points.
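A minimal sketch of the resulting global estimator, with the k − 2 normalisation and harmonic-mean aggregation, is given below (assuming scikit-learn for the neighbour search; boundary effects and duplicate points are ignored).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mle_dimension(X, k=20):
    """Levina-Bickel MLE with the (k-2) normalisation, aggregated over all
    points by the harmonic mean as suggested by MacKay and Ghahramani."""
    dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    r = dists[:, 1:]                                      # drop the zero self-distance
    # inverse local estimates: (1/(k-2)) * sum_j log(r_k / r_j), j = 1..k-1
    inv_local = np.sum(np.log(r[:, -1][:, None] / r[:, :-1]), axis=1) / (k - 2)
    return 1.0 / np.mean(inv_local)                       # harmonic mean of local estimates
```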
A Bayesian approach to this problem is described in [51].
Since the natural sizes of data clusters are typically unknown, it is preferable if the MLE approach can
operate on minimal neighbourhoods, so that the majority of data points within a neighbourhood genuinely
originate from the same local distribution. The TLE [4] method is an extension of the standard MLE estimator
specifically engineered for these tight neighbourhood scenarios.
Restricting neighbourhood size inherently limits the data available for estimation. To accurately estimate
local dimensionality, it is advantageous to utilize all pairwise distances among neighbours rather than relying
solely on distances from a central reference point. Denote the point at which the dimension estimation is
carried out by x ∈ X. The simplest application of this approach would involve computing the MLE for selected
points within the neighbourhood of x and averaging their results. However, simply aggregating estimators
presents two primary challenges: it can violate the locality condition, by adding additional neighbours, or it
can introduce a clipping bias, by restricting the data to the original neighbourhood. TLE seeks to address
these challenges.
In its basic variant, local dimension estimation is performed at a selected point x in a neighbourhood
V ⊂ X centred at q, as well as at reflections of these points x through q. At each of these points, a
dimension estimate is made using only elements from the neighbourhood of the point q, thereby preserving locality. To avoid clipping bias, the distances used are skewed to $d_{q,r}(x, v) = \frac{r\,(v - x)\cdot(v - x)}{2\,(q - x)\cdot(v - x)}$. Finally, these estimates
are aggregated by harmonic mean to yield an estimator of the following form:
$$\widehat{\dim}_{\mathrm{TLE}}(q) = -\left[ \frac{1}{2|V|(|V|-1)} \sum_{\substack{x, v \in V \\ x \neq v}} \left( \ln \frac{d_{q,r}(x, v)}{r} + \ln \frac{d_{x,r}(2q - x, v)}{r} \right) \right]^{-1}.$$
This approach relies on the fundamental assumption that local intrinsic dimensionality exhibits uniform
continuity across the data, given its use of dimension estimates for points within the neighbourhood.
The MiND ML [71,89] estimator, along with its derivatives MiND KL and IDEA, which are discussed later,
also builds on the work of Levina and Bickel while hewing closely to likelihood principles. The authors study
the ratio ρ1,k+1 (p) and calculate a maximum likelihood estimator as the root
$$\hat{d} = \left\{ d : \frac{N}{d} + \sum_{p \in X} \left( \log \rho(p) - (k-1)\, \frac{\rho^d(p) \log \rho(p)}{1 - \rho^d(p)} \right) = 0 \right\},$$
where we abbreviate $\rho(p) = \rho_{1,k+1}(p)$.
In case k = 1, this reduces to the harmonic mean of the Levina–Bickel estimator as recommended by MacKay
and Ghahramani.
Another maximum likelihood based estimator, GRIDE [26], uses only two neighbours. Unlike TwoNN,
which will be described later, these need not be the two nearest neighbours. The intention is that the
neighbours be sufficiently far away to overcome problems caused by noise. For two parameters n1 < n2 , the
ratio ρn2 ,n1 is studied. Note that the TwoNN method uses only ρ2,1 . The distribution of these ratios yields
a concave log-likelihood function which in general can be maximised by numerical optimization. However,
where n2 = n1 + 1 a closed form solution exists, which reduces to the standard MLE estimator for k = 2
in case n1 = 1. In these cases, the authors note that if the data are generated from a Poisson process then
the ratios follow the Pareto distribution. Furthermore, the ratios $\rho_{n_1+1, n_1}(p)$ for each $n_1$ are jointly
independent. This allows for uncertainty quantification and, if data over a range of n1 are considered, a
reduction in the variance of the estimator.
A method for adaptively choosing the neighbourhood size to carry out maximum likelihood estimation
is described in [27] while Amsaleg et al . demonstrate an approach using methods from extreme value theory
in [3].
Fitting distributions Building on the premises of Levina and Bickel’s work, Facco et al . [32] developed
the TwoNN method. This uses only distance information to the two nearest neighbours of each point. By
using only the two smallest distances, the neighbourhoods involved can be assumed to be closer to Euclidean
space and the probability distribution need only be approximately constant on the scale r2 (p). The ratio
ρ2,1 (p) is the local statistic of interest here. Assuming that ρ2,1 is independently identically distributed at
each p, and letting F be the cumulative distribution function (CDF) of the distribution, we have
$$-\frac{\log\!\left(1 - F(\rho_{2,1})\right)}{\log(\rho_{2,1})} = d.$$
The observed ratios provide an empirical CDF which is used to estimate the intrinsic dimension by linear
regression, fitting a line through the origin. The authors recommend truncating this data before fitting the
line, as the higher values of the CDF tend to be noisier.
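A hedged sketch of this fit is given below; the fraction of ratios discarded before the regression is a user choice (10% here), and the implementation is ours rather than that of [32].

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_dimension(X, discard=0.1):
    """TwoNN: regress -log(1 - F(rho)) on log(rho) through the origin,
    after discarding the largest `discard` fraction of the ratios."""
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    rho = np.sort(dists[:, 2] / dists[:, 1])        # r_2(p) / r_1(p) at each point
    n = len(rho)
    F = np.arange(1, n + 1) / n                     # empirical CDF
    keep = int(np.floor(n * (1.0 - discard)))       # drop the noisy upper tail
    x, y = np.log(rho[:keep]), -np.log(1.0 - F[:keep])
    return float(np.sum(x * y) / np.sum(x * x))     # least-squares line through the origin
```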
Rozza et al . claim that, for large d, the value of k needed for the data to closely match the theoretical
distribution grows exponentially with d [89]. The grounds for this claim appear weak, based on an assessment
of the probability of a point lying near the centre of its neighbourhood ball when, in fact, that event is already
conditioned on. It is nevertheless true that for many datasets a large value of n is needed to ensure that a
knn neighbourhood does not meet the boundary of the manifold, so that the methods of this paper are still
relevant.
To address the possibility that the underlying assumptions of maximum likelihood estimators are far from being true, one proposal, MiND KL, is to compare the distances to neighbours not to the theoretical distribution, but to an empirical distribution obtained from the distances to neighbours in a random sample of N points from a unit ball of each dimension. The dimension estimate is obtained by using the Kullback–Leibler divergence to select the dimension which most closely matches the data. This method is less well motivated when M is not geometrically similar to a unit ball.
Other distance-based approaches Another proposal from the same paper, IDEA, uses that $d = \frac{m}{1-m}$, where m is the expected norm of a vector sampled uniformly from the ball. Estimating m by
$$\hat{m} = \frac{1}{Nk} \sum_{i=1}^{N} \sum_{j=1}^{k} \rho_{j,k+1}(x_i),$$
they use
$$\hat{d} = \frac{\hat{m}}{1 - \hat{m}}$$
as a consistent estimator of d. The authors note an underestimation bias, which they attribute to small sample size, but which may also be due to the convexity of the function $\frac{x}{1-x}$ on (0, 1). The bias in the estimator is corrected using a jackknife technique: subsamples are generated where each point is included with probability p and the number of nearest neighbours is reduced to $k\sqrt{p}$, so that increasing p emulates a situation where $k, N \to \infty$ and $k/N \to 0$. The resulting dimension estimators are fitted to a curve
$$\hat{d} = a_0 - \frac{a_1}{\log_2\!\left(\frac{pN}{a_2} + a_3\right)}.$$
The horizontal asymptote a0 is used as the estimate (unless a1 < 0, indicating that dˆ in fact decreases with
p; in this case the original estimate dˆ for the entire dataset is preferred).
The earlier work of Pettis et al. [84] is also relevant here. These authors directly calculated the distribution of $r_k(x)$ and from this obtained that the expected value of the mean distance from x to knn(x) is $G_{k,d}\, k^{1/d}\, C_n$, where $G_{k,d} = \frac{k^{1/d}\,\Gamma(k)}{\Gamma(k + 1/d)}$ and $C_n$, while sample dependent, is independent of k. Taking logarithms yields a regression problem
$$\log \bar{r}_k - \log G_{k,d} = \frac{1}{d} \log k + \log C_n,$$
where the slope is $\frac{1}{d}$. On the left hand side of this equation, while $\bar{r}_k = \frac{1}{k} \sum_{j=1}^{k} r_j(x)$ is fixed, the value of G depends explicitly on d. An iterative regression method is used, where log G is initially taken to be 0,
allowing the regression to be carried out, and the resulting value of d is then used to recalculate log G, until
the scheme converges. This was later refined further by Verveer and Duin [109].
Other estimators in this general category include [46].
Let the vector $\hat{\theta}_i$ be given by the angles subtended at $x_i$ by each pair of points in $\mathrm{knn}(x_i)$. As each component of $\hat{\theta}_i$ follows a von Mises–Fisher distribution with parameters ν and τ, at each point $x_i$ the parameters can be estimated with a maximum likelihood approach. This yields vectors $\hat{\nu} = (\hat{\nu}_i)_{i=1}^{N}$ and $\hat{\tau} = (\hat{\tau}_i)_{i=1}^{N}$ with means $\bar{\nu}$ and $\bar{\tau}$.
The statistics $\hat{d}_{\mathrm{ML}}$, $\bar{\nu}$ and $\bar{\tau}$ are then compared with the statistics from a family of datasets with a known intrinsic dimension (in this case a uniform sample of N points from $S^n$ for $1 \le n \le D$) to provide an estimate; this is the approach taken by DANCo.
Another method based on angles is the Expected Simplex Skewness method, ESS. This method has two
variants: ESSa and ESSb. We describe only ESSa in detail. Consider knn(p) ⊂ X and a target dimension,
m. We may form (m + 1)-simplices with one vertex at the centroid of knn(p) and the others at m + 1 points
drawn from knn(p). A comparison simplex can be formed where at one vertex all edges meet orthogonally,
and the side lengths of those edges are the same as the side lengths incident to the centroid. For each
simplex, a statistic referred to as “simplex skewness” is obtained as the ratio of the volume of the simplex
to this comparison simplex.
For example, in the case of m = 1 the area of a triangle with one vertex at the centroid is compared to
the area of a right triangle with short side lengths equal to the lengths of the edges incident to the centroid.
The simplex skewness for this 2-simplex is simply sin(θ), where θ is the angle at the centroid.
For uniformly distributed points in a d-dimensional ball $B^d$ with volume measure µ, the expected simplex skewness in case m = 1 is
$$s_d^{(1)} = \frac{1}{V_d} \int_{B^d} |\sin(\theta(x))| \, d\mu(x),$$
where Vd is the volume of the unit ball and θ(x) is the angle at the centre of the ball between x and any
fixed co-ordinate axis.
Similarly, for m > 1, we obtain
$$s_d^{(m)} = \frac{1}{V_d^m} \int_{(B^d)^m} \frac{\lVert u \wedge v_1 \wedge \cdots \wedge v_m \rVert}{|v_1| \cdots |v_m|} \, d\mu(v_1)\, d\mu(v_2) \cdots d\mu(v_m) = \frac{\Gamma(\tfrac{d}{2})^{m+1}}{\Gamma(\tfrac{d+1}{2})^m\, \Gamma(\tfrac{d-m}{2})},$$
where u is a unit vector along a chosen coordinate axis.
When simplex edges are short, the precise position of the centroid has a major influence on the skewness.
To mitigate this, the empirical estimate $\hat{s}^{(m)}$ is given by a weighted mean of the skewnesses of each simplex, where the weights are the products of the edge lengths incident to the centroid. Since the $s_d^{(m)}$ increase
monotonically to 1 as d → ∞, by comparing the skewness of simplices in the data to the expected simplex
skewness for balls of varying dimensions d, a dimension estimate can be obtained by linear interpolation [48].
The alternative method ESSb takes the unit vector in each edge direction and estimates the dimension
from the projections of each edge direction onto the others, using a similar process. A simplex-based approach was
also used in [21].
Given the relationship between zeroth-dimension persistent homology and minimum spanning trees, it is
natural to generalise this notion to higher dimension homological features captured by X. We let PDi (X)
denote the finite part of the persistence diagram of the i-th Vietoris-Rips or Čech persistent homology of X.
We let the α-total persistence of the persistence diagram be
$$E_\alpha^i(X) = \sum_{(b,d) \in \mathrm{PD}_i(X)} (d - b)^\alpha \qquad (2)$$
In particular if we consider dimension i = 0, we have Eα (X) := Eα0 (X). Some relationships also hold when
i > 0; see [93, 94] for details.
In [1, 10, 23], it is proposed that the intrinsic dimension of data X be estimated by taking various sub-
samples X′ ⊂ X, and computing the slope m of the curve log(Eα0 (X′ )) as a function of log(|X′ |). The inferred
dimension is then given by
$$\hat{d} = \frac{\alpha}{1 - m}. \qquad (3)$$
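The sketch below implements this slope-inference recipe using the minimum spanning tree directly (for the Vietoris–Rips filtration, the finite H0 persistences are exactly the MST edge lengths). The subsample sizes, and taking a single subsample per size, are illustrative simplifications; it is assumed that X contains at least max(sizes) points.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def e_alpha_mst(X, alpha):
    """E^0_alpha(X): sum of minimum spanning tree edge lengths to the power alpha."""
    mst = minimum_spanning_tree(squareform(pdist(X)))
    return float(np.sum(mst.data ** alpha))

def ph0_dimension(X, alpha=1.0, sizes=(200, 400, 800, 1600), seed=0):
    """Regress log E_alpha on log n over subsamples; return alpha / (1 - slope), eq. (3)."""
    rng = np.random.default_rng(seed)
    logE = [np.log(e_alpha_mst(X[rng.choice(len(X), n, replace=False)], alpha))
            for n in sizes]
    slope, _ = np.polyfit(np.log(sizes), logE, 1)
    return alpha / (1.0 - slope)
```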
There is considerable literature on the study of minimum spanning trees that justifies this estimator. In the
study of minimum spanning trees on point sets in $[0, 1]^d$, it has been a folklore theorem that the growth rate of $E_\alpha$ with the number of points is at most $O(n^{\max(1 - \alpha/d,\, 0)})$ for α ∈ (0, d) [61]. In other words, for higher
dimensional cubes d > α, the α-weight Eα can grow sublinearly with n, while for lower dimensional cubes
d ≤ α, the α-weight Eα does not grow with n at all. This motivated the definition of the minimum spanning
tree dimension of a bounded metric space X in [61] as
$$\dim_{\mathrm{MST}}(X) := \inf\{\alpha > 0 : E_\alpha \text{ is uniformly bounded over the finite subsets of } X\}. \qquad (4)$$
The analogous quantity for degree-i persistent homology is
$$\dim^i_{\mathrm{PH}}(X) := \inf\{\alpha > 0 : E^i_\alpha \text{ is uniformly bounded over the finite subsets of } X\}. \qquad (5)$$
By definition it follows that $\dim_{\mathrm{MST}}(X) = \dim^0_{\mathrm{PH}}(X)$. In Theorem 2 of [61], it was shown that for any bounded metric space X, the minimum spanning tree dimension recovers the upper box counting dimension:
$$\dim_{\mathrm{MST}}(X) = \overline{\dim}_{\mathrm{box}}(X). \qquad (6)$$
Abstractly defined, neither eq. (4) nor eq. (5) is amenable to computation. Adams et al. [1] obtained a lower bound on $\dim_{\mathrm{MST}}(X)$ based on results of Schweinhart [93] by extending the $O(n^{1-\alpha/d})$ bound on the growth of $E_\alpha$ to any bounded metric space X. For $\alpha \in (0, \dim_{\mathrm{MST}}(X))$, and any $\{x_1, \ldots, x_n\} \subset X$, there is some constant $C_{\alpha,d}$ such that
$$\log E_\alpha(\{x_1, \ldots, x_n\}) \le \left(1 - \frac{\alpha}{\dim_{\mathrm{MST}}(X)}\right) \log n + C_{\alpha,d}. \qquad (7)$$
This implies for any sequence (xi ) of distinct elements of X, we can obtain a lower bound on the upper box
dimension and minimum spanning tree dimension by
$$\dim_{\mathrm{MST}}(X) \ge \frac{\gamma}{1 - \beta}, \quad \text{where } \beta = \limsup_{n \to \infty} \frac{\log E_\gamma(\{x_1, \ldots, x_n\})}{\log n}. \qquad (8)$$
The probabilistic properties of the lower bound γ/(1 − β) in eq. (8) have also been widely studied. Given a probability measure µ with bounded support on a metric space X and a parameter γ > 0, [93] defines the persistent homology dimension $\dim^0_{\mathrm{PH}}(\mu; \gamma)$ of the metric measure space (X, µ) via the analogous growth rate of $E_\gamma$ over i.i.d. samples drawn from µ. We recall the classic result for probability measures on Euclidean space by
Beardwood, Halton and Hammersley [8], and Steele [99], which implies dim0PH (µ; γ) = d for µ an absolutely
continuous measure on Rd , and γ ∈ (0, d).
Theorem 2.2. Let µ be a probability measure on Rd with compact support, and f be the density of its
absolutely continuous part. If x1 , . . . , xn are i.i.d. samples from µ, then with probability 1,
$$\lim_{n \to \infty} \frac{E_\alpha(\{x_1, \ldots, x_n\})}{n^{1-\alpha/d}} = c(\alpha, d) \int_{\mathbb{R}^d} f(x)^{1-\alpha/d} \, dx, \qquad (10)$$
for α ∈ (0, d). Here c(α, d) > 0 is a constant that only depends on α, d.
Refinements of Theorem 2.2 have been proved for special cases. If µ is the uniform distribution on the
d-dimensional unit cube, Kesten and Lee [58] proved a central limit theorem for Eα : for α > 0, we have a
convergence in distribution
$$\sqrt{n}\left( \frac{E_\alpha(\{x_1, \ldots, x_n\})}{n^{1-\alpha/d}} - \mu \right) \to N(0, \sigma^2_{\alpha,d}). \qquad (11)$$
Costa and Hero [24] generalise Theorem 2.2 from $\mathbb{R}^D$ to compact Riemannian manifolds of dimension d, with f being a bounded density relative to the volume measure of the manifold. In [23] they also observe that the right hand side of the limit in eq. (14) is in fact related to Rényi entropy.
Recently, [93] proved a generalisation of Theorem 2.2 for d-Ahlfors regular measures on a metric space. A measure µ is d-Ahlfors regular if there are real, positive constants c, δ0 > 0 such that for all δ ∈ (0, δ0) and x ∈ X, the measure of the open ball of radius δ at x is bounded by
$$\frac{1}{c}\,\delta^d \le \mu(B_\delta(x)) \le c\,\delta^d. \qquad (12)$$
This condition is satisfied, for example, for uniform measures on a d-dimensional manifold. Theorem 3
of [93] implies for γ ∈ (0, d), the persistent homology dimension of a d-Ahlfors regular measure recovers the
dimension parameter d:
dim0PH (µ; γ) = d.
Remark 2.3. We note that other aspects of the minimum spanning tree are related to dimension: for samples from a d-Ahlfors regular measure, [62] showed that the maximum distance along an edge of its minimum spanning tree scales as $((\log n)/n)^{1/d}$.
Remark 2.4. For data sampled from a submanifold, [23] proposed using an Isomap-based geodesic distance estimation for the construction of the minimum spanning tree.
In practice, since constructing a minimum spanning tree on n points with full metric data is computa-
tionally onerous, a simplification can be made by taking the k-nearest neighbour graph and computing the
minimum spanning tree of the knn graph instead to speed up the computation.
Let
$$L_1^k(\{x_1, \ldots, x_n\}) = \sum_{i=1}^{n} \sum_{x_j \in \mathrm{knn}(x_i; k)} \lVert x_i - x_j \rVert \qquad (13)$$
be the total edge length of the knn graph. Like the total weight $E_\alpha$ of its minimum spanning tree, the total edge length $L_1^k$ of the knn graph satisfies suitable additive properties such that the Euclidean functional theory as described in [114] applies. Thus, a similar asymptotic result can be derived [114, Theorem 8.3]:
Theorem 2.5. Let µ be a probability measure on Rd with compact support where d ≥ 2, and f be the density
of its absolutely continuous part. If x1 , . . . , xn are i.i.d. samples from µ, then
$$\frac{L_1^k(\{x_1, \ldots, x_n\})}{n^{1-1/d}} \xrightarrow{\;P\;} c(k, d) \int_{\mathbb{R}^d} f(x)^{1-1/d} \, dx \quad \text{as } n \to \infty. \qquad (14)$$
Given this theoretical characterisation, [22] proposes an estimator KNN using a similar strategy to the estimation of $\dim_{\mathrm{MST}}$ and $\dim^0_{\mathrm{PH}}$, as expressed in eq. (3). Given data X, we compute the total length $L_1^k(\mathrm{X}')$ of the k-nearest neighbour graphs of subsamples X′ ⊂ X, and infer the slope m of the curve $\log(L_1^k(\mathrm{X}'))$ as a function of log(|X′|):
$$\hat{d} = \frac{1}{1 - m}. \qquad (15)$$
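The same recipe with the knn-graph total edge length in place of the MST weight gives a sketch of the KNN estimator; as before, the subsampling scheme shown here is a simplification of that used in [22].

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_total_length(X, k=5):
    """L_1^k(X): total edge length of the directed k-nearest-neighbour graph."""
    dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    return float(dists[:, 1:].sum())                  # drop the zero self-distances

def knn_dimension(X, k=5, sizes=(200, 400, 800, 1600), seed=0):
    """Regress log L_1^k on log n over subsamples; return 1 / (1 - slope), eq. (15)."""
    rng = np.random.default_rng(seed)
    logL = [np.log(knn_total_length(X[rng.choice(len(X), n, replace=False)], k))
            for n in sizes]
    slope, _ = np.polyfit(np.log(sizes), logL, 1)
    return 1.0 / (1.0 - slope)
```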
Remark 2.6 (Error propagation). Both $\hat{d}_{\mathrm{knn}}$ and $\hat{d}_{\mathrm{PH}}$ are subject to instability when the intrinsic dimension is high. For high dimensions, the regressed slope m is just under one, $m = 1 - 1/d$; any small error ϵ in the slope can cause a large change in the estimated dimension:
$$m \mapsto m + \epsilon \implies \hat{d} \mapsto \frac{d}{1 - \epsilon d}.$$
The dimension estimate is severely impacted once |ϵ| is comparable to 1/d; for datasets with high intrinsic dimension the error tolerance in the inference of the slope becomes smaller. In particular, if the slope is over-estimated so that ϵd ≳ 1, the dimension estimate can even become very negative. This is reflected in the performance of KNN on M10d Cubic with intrinsic dimension 70 (see Appendix A).
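As a numerical illustration of this sensitivity (our own arithmetic): with d = 70 the true slope is m = 1 − 1/70 ≈ 0.986, so an error of ϵ = 0.005 in the fitted slope already gives $\hat{d} = 70/(1 - 0.35) \approx 108$, while ϵ = 0.02 gives $70/(1 - 1.4) = -175$.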
Remark 2.7 (Robustness to outliers). Given a probability measure on an embedded manifold, we can regard
the absolutely continuous part of the measure (w.r.t. the density of the underlying manifold) as the ‘signal’,
while the singular part can be regarded as a model for outliers. While the minimum spanning tree and
k-nearest neighbour graphs are individually sensitive to outliers in data, Theorems 2.2 and 2.5 suggest that the scaling of the total edge lengths of such objects with increasing sample size should be robust against outliers, as the scaling should only depend on the absolutely continuous part of the measure. Nonetheless, in practice, if the measure of the absolutely continuous part is small (i.e. the outliers dominate the sample), then it may take a large number of samples for the asymptotic scaling behaviour of $E_\alpha$ and $L_1^k$ to emerge.
2.3.4 Analysis
In our experiments, PH0 dimension performs comparably with other estimators, yet is susceptible to over-
estimation in the presence of noise (see discussion in Section 3.5). However, magnitude dimension requires
a lot of points for even spheres of moderate dimension (see Table B1), making it an unreliable estimator.
Estimators
Local: lPCA, MLE, WODCap, ESS
Global: PH0, mag, KNN, GRIDE, TwoNN, DANCo, MiND ML, CorrInt
Table 1: A classification of estimators considered in our experiments into global and local estimators.
An alternative categorisation of estimators divides them into local and global methods, with local esti-
mators providing local dimension estimates which are then combined, while global estimates provide a single
value.
In truth, any reliable estimator for the dimension of a manifold is likely to first obtain local information
of some sort, aggregate this information, and then convert it into a dimension estimate. The estimators
usually known as “local estimators” are, from this point of view, those for which the local information is
already a dimension estimate.
In this section, we review some of the estimators already discussed above which might be described as
“global estimators”.
The method of Fan et al . [34] identifies different eigenvalues for each neighbourhood from lPCA. These
local eigenvalues are then combined before the thresholding method is applied to the combined data. In this
way, despite the method being a variant of “local PCA”, it does not directly produce local dimensions, and
so is strictly speaking a global method. The architecture of the scikit-dimension package does not easily
allow for implementation of this method and so the results as presented do not reflect the original method as
designed by Fan et al . Instead the thresholding method is applied locally, converting it into a local method.
Note that any lPCA method could be adapted to function in the same way, with thresholding being applied
to these “global eigenvalues”.
CorrInt operates in a similar spirit. A simple local estimate of dimension at a point p ∈ X is given by $\frac{\log N(p, r_2) - \log N(p, r_1)}{\log r_2 - \log r_1}$ for two radii $r_1 < r_2$, under the assumption that N(p, r) is proportionate to the volume of the ball of radius r, and that this grows as $r^d$. However, for CorrInt, rather than combining these local
values of d, the volumes are combined over all balls, and a single estimate of d is then calculated.
TwoNN considers the entire distribution of local statistics to obtain a global value. At each point,
the ratios ρ2,1 are considered. A local dimension estimate could be obtained from this ratio, but instead
the empirical distribution of these ratios across all p ∈ X is considered and compared to the theoretical
distribution, which is dependent on d. This approach seems to result in a relatively high standard deviation
compared to other estimators.
The estimators described in Section 2.3 might appear to be global in that a global geometric object,
usually a graph of some kind, is constructed, and then a single metric invariant is extracted from it. How-
ever, even here, the analysis of the estimators to demonstrate their convergence turns out to rely on an
understanding of how they behave locally, which can then be used to infer the global behaviour via some
additivity condition.
Truly global estimators generally assume some global structure in the data. For example, a direct
application of PCA can be used to estimate dimension, with the assumption here being that the data lies in
an affine subspace. The MiND KL estimators are also truly global, making the global assumption that the
distribution of distances behaves similarly to that in a ball.
The WODCap method stands somewhat alone, in that it is essentially a bilocal estimator. Two points
are used as the centres of intersecting balls, and the resulting estimate of dimension is an estimate at this
pair of points.
2.6 Underestimation
Underestimation of dimension has been widely reported, especially for higher-dimensional datasets, and is
observed again in this survey. This is commonly attributed to two possible causes: the “boundary effect” or “edge effect”, with observations of this dating back at least to [84], and a shortage of samples.
However, caution is warranted when considering how well this observation generalises beyond the bench-
marking set. We find that, when using datasets sampled from SO(n), overestimation is common.
2.6.2 Sample size effects
Another claimed source of negative bias is the insufficiency of data. For example, it has been claimed
that the original version of the CorrInt estimator [42] can only provide estimates $\hat{d} < \frac{2 \log N}{\log \operatorname{diam}(X) - \log \varepsilon}$, where
ε ≪ diam(X) is the smallest radius considered [30]. However, this calculation is based on an assumption
that the volume of balls grows as rd for all values of r up to diam(X), which does not generally hold.
Assuming that a given estimator is asymptotically correct, it may be possible to use the given sample to
estimate the value it would take on an arbitrarily large sample. The IDEA estimator [90] attempts this by
using a jackknife subsampling method and fitting the estimates for subsamples to a curve with a horizontal
asymptote.
For small sample sizes, in high dimensions, most points will be linearly separable from the rest of the data.
This means that the underlying geometric hypotheses do not hold at most points, so that it is reasonable
to expect significant difficulties in estimating dimension. Sensitivity to sample size will be discussed in the
analysis of our experimental results in Section 3.3.
3.2 Datasets
We use a now standard collection of datasets for benchmarking purposes [18, 44, 89], with a small number of
additions. These datasets are readily generated in scikit-dimension. The datasets encompass a large range
of dimensions (1 to 70), codimensions (0 to 72) and geometries (flat, constant curvature, variable curvature).
We should note that not all of the datasets are drawn from uniform distributions on their manifolds. Each
underlying manifold is diffeomorphic to either a sphere or a cube. We give a brief description of each in
Table 2.
For some purposes we have also considered the standard matrix embedding of SO(n) in $\mathbb{R}^{n \times n}$, though we have not fully benchmarked this dataset. This produces a homogeneous manifold of dimension $\frac{n(n-1)}{2}$, not lying in any affine subspace, and with a topology different from the other benchmark datasets. We feel that these datasets would be good additions to the benchmark manifolds of [6], as including manifolds with known geometric and topological information will increase knowledge of where specific estimators work well.

Dataset | d | D | Description
M1 Sphere | 10 | 11 | Uniform distribution on a round sphere
M2 Affine 3to5 and M9 Affine | 3, 20 | 5, 20 | Affine subspaces
M3 Nonlinear 4to6 | 4 | 6 | Nonlinear manifold, could be mistaken to be 3d
M4, M6 and M8 Nonlinear | 4, 6, 12 | 8, 36, 72 | Nonlinear manifolds generated from the same function
M5a Helix1d | 1 | 3 | A 1d helix
M5b Helix2d | 2 | 3 | Helicoid
M7 Roll | 2 | 3 | Classic swiss roll
M10a,b,c,d Cubic | 10, 17, 24, 70 | 11, 18, 24, 72 | Hypercubes
M11 Moebius | 2 | 3 | The 10 times twisted Moebius band
M12 Norm | 20 | 20 | Isotropic multivariate Gaussian
M13a Scurve | 2 | 3 | Surface in the shape of an “S”
M13b Spiral | 1 | 13 | Helix curve in 13 dimensions
Mbeta | 10 | 40 | Generated with a smooth nonuniform pdf
Mn1 and Mn2 Nonlinear | 18, 24 | 72, 96 | Nonlinearly embedded manifolds
Mp1, Mp2 and Mp3 Paraboloid | 3, 6, 9 | 12, 21, 30 | Nonlinearly embedded paraboloids
Table 2: The benchmark datasets, with intrinsic dimension d and ambient dimension D.
For our experiments investigating the effects of noise and curvature, we consider a collection of datasets
where we have a good control and understanding of their dimensions and geometry. These include a torus
of revolution in R3 (which has no boundary) as well as families of paraboloids with varying curvature shown
in Figures 14, 12 and 13.
Dependency on a tailored choice of hyperparameters In our experiments, we varied the hyperpa-
rameters of estimators around those used in the original papers, or the defaults in the scikit-dimension
implementation, as reported in Appendix D. We compared the performance of the best possible estimate
an estimator can achieve with optimal hyperparameters within our range, and the estimate achieved by
choosing hyperparameters that ensure good performance on most of the datasets in the benchmark set (this hyperparameter choice is defined precisely in Appendix C). A large discrepancy in results between these two choices indicates that an estimator needs tailored choices of hyperparameters to achieve optimal results. This is an undesirable effect.
For the collection of “slope-inference” based global estimators considered here – KNN, PH0 , TwoNN,
and GRIDE – this discrepancy is often small, especially on low dimensional datasets without complicated
nonlinearities. This may be due to the fact that there are only one or two hyperparameters for these
estimators, far fewer compared to others on our list. There seems to be a locality bias for these estimators.
For KNN, the close to optimal performance (on low dimensional datasets without complicated nonlinearities)
can be reached by choosing the nearest neighbour parameter k to be close to one (see Table D6, and also
Figure 6 for an illustration). For PH0 , we can also guarantee reasonable performance with the choice
α = 0.5, which emphasises the contribution of small distances over large ones across edges of the minimum
spanning tree (Table D4). Choosing the hyperparameters that reduce GRIDE to MLE (with input from the distances to the two nearest neighbours) is often effective (Table D8). We also note that given such
hyperparameters, the performances of GRIDE and TwoNN are often similar in Tables C7 and C10.
We defer discussions about other estimators and their need to tune hyperparameters to specific data in
the captions of the benchmark results tables in Appendix C. For local estimators, the specific issue of tuning
the neighbourhood size and aggregation method is analysed in greater detail in Sections 3.4.1 and 3.4.2.
Remark 3.2. We emphasise that our empirical study is subject to our particular choices of hyperparame-
ter ranges, which cannot encompass all possible hyperparameter combinations due to finite computational
resources. We refer the reader to Appendix D for the range of hyperparameters chosen for each estimator,
which was guided by the literature.
Sample Economy We restrict our assessment here to how an estimator performs on low dimensional
datasets (dimension < 6), as most estimators face challenges with even moderately high dimensional datasets.
One surprising observation is the slow increase in most estimators’ accuracy with the number of samples,
at least in the regime of N ∈ {625, 1250, 2500, 5000} being tested. As summarised in Table 3, and detailed
in Appendix C, this is often true with global estimators such as PH0 , KNN, MiND ML, GRIDE, and TwoNN
where N ∈ {625, 1250} often results in accurate dimension estimation on low dimensional datasets. On the
other hand, while this is also observed on some local estimators such as ESS, TLE, other local estimators
such as lPCA, MLE only recover comparable performance in the N ∈ {2500, 5000} regime, suggesting that
they are more sample hungry. Given fewer samples, knn neighbourhoods have a larger effective radius and
so non-flat geometries in the dataset can further bias the estimation. Figure 3 demonstrates the behaviour
of all estimators on two datasets.
Accuracy on high dimensional datasets Overall, most estimators tend to underestimate, even on
their best hyperparameters. This is often especially acute on high dimensional datasets, and some nonlinear
datasets. One exception is lPCA, which appears to give the correct dimension every time. However, as we
will discuss below, this is mainly due to our ability to tune the hyperparameter to give the correct result;
with another hyperparameter the estimate could be substantially incorrect (see Figure 9).
As we would expect, most estimators are very good on low dimensional data, and struggle when the
dimension increases beyond 6. There are exceptions to this rule, with ESS and DANCo performing very
well on the high dimensional datasets. We note that, as the sample size increases, most estimators improve.
However, there are exceptions, with DANCo, ESS and FisherS not changing substantially on most datasets
as sample size increases.
Variance of dimension estimates Local estimators, such as lPCA, MLE, MiND ML, WODCap, and TLE,
tend to have a low variance. These estimators aggregate many local dimension estimates into a global
dimension estimate. The standard deviation on the mean, median, or harmonic mean of the local estimates
Figure 3: The best estimates from our benchmark tables for 625, 1250, 2500 and 5000 points for each
estimator on two different datasets, Mbeta and M8Nonlinear. As a general rule, increasing sample size
improves accuracy. However, to return a correct estimate would require a lot more than 5000 points from
these datasets for most estimators. Some estimators have a clear bias which does not reduce with increasing
sample sizes. Also seen is a differing level of responses to changing sample size, which appear broadly
consistent across the two datasets. For example, WODCap underestimates, becoming much better as sample
size increases, while on the other hand FisherS and lPCA do not change significantly from 625 points to 5000.
decreases with the number of local estimates. On the other hand, estimators such as PH0 and KNN suffer
from higher error sensitivity in high dimensions which can increase variance (see Remark 2.6).
In Figure 4, we visualise the effect of increasing sample size on the variance of the estimate on M1Sphere
(uniform samples on S 10 ⊂ R11 ). We note that the standard deviation of MiND ML, KNN and DANCo
decreases at a slower rate than the other estimators as the number of samples increases. Indeed, the standard
deviation of DANCo increases, which may be due to an unusual choice of hyperparameters yielding the best
mean estimate.
The variance might be expected to decay with increasing sample size at rate N −1 . However, we observe
that for N = 625 the variance is higher than this scaling would suggest. Under an N −1 scaling, the variances
at N = 625, 1250 and 2500 would be 8, 4 and 2 times greater than the final value at N = 5000.
In fact, taking the median of these ratios over the estimators (discounting lPCA, which has 0 variance, and
DANCo, due to the unusual behaviour at N = 5000), we observe 15.3, 5.1 and 1.8. This suggests that, for
many estimators, small sample sizes create additional variance because they result in larger neighbourhoods
that are less well approximated by flat spaces.
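As an illustration of this check, the following short Python sketch computes the expected variance ratios under the N −1 scaling and the observed ratios for a single estimator; the standard deviations used here are placeholder values, not figures from our tables.

```python
import numpy as np

# Expected variance ratios relative to N = 5000 under Var ~ 1/N scaling.
sample_sizes = np.array([625, 1250, 2500])
expected_ratios = 5000 / sample_sizes          # -> [8., 4., 2.]

# Placeholder standard deviations for one estimator (hypothetical values,
# not taken from the tables in Appendix C).
std_by_n = {625: 0.90, 1250: 0.48, 2500: 0.31, 5000: 0.22}

observed_ratios = np.array([std_by_n[n] ** 2 / std_by_n[5000] ** 2
                            for n in sample_sizes])
print("expected:", expected_ratios)
print("observed:", np.round(observed_ratios, 1))
# Taking the median of the observed ratios across estimators gives the
# 15.3, 5.1 and 1.8 quoted in the text.
```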
Figure 4: Standard deviation of dimension estimates of M1Sphere (uniform samples on S 10 ⊂ R11 ) over 20
runs, as shown in the tables in Appendix C, using the hyperparameters which provide the best estimate.
This plot has a logarithmic scale on both axes. lPCA has standard deviation zero, as does WODCap for
N = 2500, and so these points are not shown.
Nonlinear datasets Most estimators struggle considerably with these datasets. An exception is lPCA,
though, as discussed, this relies on hyperparameter selection to ensure good performance. We conjecture that,
since lPCA infers the dimension of the tangent plane, nonlinear features such as variable sampling density
generate a smaller bias than they do for parametric estimators such as MLE.
It is interesting to compare which estimators work well on which datasets. For example, MLE, PH0 , KNN,
GRIDE, TwoNN, MiND ML, and CorrInt tend to perform well on M3,4,8 Nonlinear and struggle with M12
Norm and Mn1,2 Nonlinear; yet it is the reverse with lPCA, DANCo, and ESS. One particularly difficult
dataset is Mbeta. As a 10-dimensional manifold with ambient dimension 40, the dimension and codimension
are both relatively high. Furthermore, density and curvature are variable.
Some estimators, such as PH0 , KNN, and CorrInt, are meant to capture a notion of dimension of a metric
measure space, which can be distinct from the dimension of the support of the measure. Hence, for datasets
such as samples from the normal distributions (M12 Norm), the estimated dimension of such estimators can
be different to the dimension of the support.
No tailoring of params Sample economy High dim (> 6) Low variance Nonlinear
lPCA ✓
MLE ✓
PH ✓ ✓
KNN ✓ ✓
WODCap ✓
GRIDE ✓ ✓
TwoNN ✓ ✓
DANCo ✓
MiND ML ✓ ✓
CorrInt ✓
ESS ✓ ✓ ✓ ✓
FisherS ✓
TLE ✓ ✓
Table 3: Qualitative assessment of performances of estimators on the benchmark dataset. None of the
estimators consistently perform well on the nonlinear datasets in the benchmark.
[Figure 5 plot: rows SO(4) and S 6; columns mean, harmonic mean and median aggregation; y-axis: estimated dimension; x-axis: k (nearest neighbours); legend: lPCA FO, KNN, MLE, TLE, CDim, WODCap, ESS, MiND MLi, Truth.]
Figure 5: Comparison of dimension estimates of SO(4) and S 6 for estimators with a k-nearest neighbour
parameter. The input data consists of 2500 points uniformly sampled on the manifolds, which have intrinsic
dimension 6. The k parameter is varied from 4 to 50 in steps of 2. For local methods – all on the figure, apart
from MiND ML and KNN – we vary the method of aggregation over local estimates. Note that WODCap does
not aggregate the local dimension estimates, but rather the estimated volume fraction of spherical caps. We
also remark that MiND ML, which is hard to distinguish on the plot, consistently returns an estimate of 6.
The estimators based on distributions of nearest neighbour distances (MLE and TLE; the integer-valued
MiND MLi is excluded) appear to converge to a common value as k grows, which for SO(4) is an
overestimate but for S 6 is an underestimate. For smaller values of k, both MLE and TLE tend to overestimate.
MLE, with harmonic mean aggregation, performs best for small values of k.
One clear potential source of bias for parametric estimators of this type is the failure of the underlying
hypotheses to hold. In general, these are, firstly, that a sample from a ball centred on p ∈ X yields points
whose distances from p are distributed as they would be for a sample with uniform density from a Euclidean
ball and, secondly, that the distances ri (p) for each p ∈ X are independently and identically distributed.
The failure of the first assumption can be caused by the failure of the embedding to be totally geodesic,
by the existence of non-manifold phenomena, including the presence of boundary points, and by variable
sampling density.
The second assumption is not true: for points p, q ∈ X which are close to each other the statistic ri is
positively correlated. Furthermore, since X arises from a binomial point process rather than a Poisson point
process, the existence of a densely sampled region with low values for ri necessarily implies that the remainder
of M is more sparsely sampled, so that for points p, q ∈ X which are far from each other, ri will be negatively
correlated. As argued in [67], it seems likely that the long-range effects are much weaker, and it is shown in [26]
that distance ratios are independent as long as they come from disjoint neighbourhoods. Trunk [104] found
spatial correlations were not significant by comparing empirical distributions to the theoretical distribution
which arises from the assumption of independence and using a Kolmogorov–Smirnov test. However, the
exact experimental procedure is unclear: from context it seems likely that at best d ≤ 4 is considered.
All of these errors will tend to become more evident as neighbourhood size increases, so that it is
reasonable to anticipate that parametric methods will experience a bias that increases with k, as MLE
appears to.
The remaining estimators, lPCA and CDim, which seek an approximating affine subspace, and ESS,
demonstrate a tendency to increase with k. For lPCA and CDim, the estimate relies on how well the
collection of k + 1 points approximates an affine subspace. The underlying hypotheses are less delicate,
with the failure to be totally geodesic and the existence of non-manifold points being the sources of error.
However, boundary points need not be an issue. As noted below, a good estimate requires k to be sufficiently
large. However, the data for SO(4) demonstrate how, in a high codimension setting, lPCA can significantly
overestimate if k is too high.
Throttling The number of points in a neighbourhood can “throttle” the dimension estimator, so that it
is impossible for it to return an estimate above a certain value.
For example, if knn neighbourhoods are used, then it is clear that lPCA can only give k non-zero eigenvalues
and so the dimension estimate will never exceed k. We can describe this as linear throttling: for
an accurate estimate the parameter k must grow linearly in d. This phenomenon is clear in Figure 7 and
Figure 8, where throttling occurs for 4 ≤ k ≤ 6.
However, there are also estimators with exponential throttling, where k must grow exponentially in d.
For example, the estimates of [60] suffer from their discrete nature. Volumes are approximated by numbers
of points in balls built from the knn graph. The bounded valence of the graph is what generates throttling
in this instance. The doubling property estimator, for example, can never exceed $\log_2\left(k + \frac{1}{k+1}\right)$. This is
because |Be (i, 1)| = k + 1 always and, since each knn of xi has at most k + 1 unique nearest neighbours, the
maximal value of |Be (i, 2)| is k(k + 1) + 1.
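To illustrate the severity of this cap, the short Python sketch below (standard library only, our own illustration) evaluates the bound for a few values of k.

```python
import math

def doubling_cap(k: int) -> float:
    """Upper bound log2(k + 1/(k+1)) on the doubling-property estimate
    when knn neighbourhoods with k neighbours are used."""
    return math.log2(k + 1 / (k + 1))

for k in (5, 10, 50, 100, 1000):
    print(k, round(doubling_cap(k), 2))
# 5 -> 2.37, 10 -> 3.33, 100 -> 6.64, 1000 -> 9.97: to certify dimension d,
# the neighbourhood size k must grow roughly like 2**d.
```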
The maximum value which can be returned by WODCap is $S^{-1}\!\left(\frac{2}{k+1}\right)$, where
$$S(\hat{d}) = I_{3/4}\!\left(\frac{\hat{d}+1}{2}, \frac{1}{2}\right).$$
The right term of the product on the right hand side goes to 0 as d → ∞; however, it is completely
dominated by the reciprocal of the integral. If we plug in d = 2, 20 and 200, we get k ≈ 19, 528 and
$2.6 \times 10^{14}$ respectively. From this it appears that, as d grows, k needs to grow exponentially. Since k is
intended to define a small neighbourhood, the number of points of the sample required for the method to
estimate d becomes completely unfeasible.
Probabilistic phenomena can cause throttling as well, in the sense that k must grow at a certain rate
for a correct estimate to be returned with a given probability. An example occurs with CDim, where
Figure 5 is suggestive of throttling. Note that the theoretical guarantees for CDim already require k to grow
exponentially with d. While this sample size is sufficient for the estimator to work, it need not be necessary.
The algorithm to compute the estimate finds the largest possible subset of directions to nearest neighbours
where all pairwise angles are at least π/2. In a high dimensional space, the angle between any two directions
is very likely to be close to π/2, so that for any given pair it is approximately equally likely that the angle
is greater than or less than π/2. For a subset of size $\hat{d}$, the probability that any given additional vector
can be added to it is approximately $2^{-\hat{d}}$. Since there are $k - \hat{d}$ neighbours to check, the probability that the
subset cannot be enlarged is
$$\left(\frac{2^{\hat{d}} - 1}{2^{\hat{d}}}\right)^{k - \hat{d}}.$$
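The following Python sketch (our own illustration of the heuristic above) evaluates this probability and the value of k needed before an enlargement is found with probability at least one half, showing the exponential growth in k.

```python
import math

def p_no_enlargement(d_hat: int, k: int) -> float:
    """Probability, under the heuristic model above, that none of the
    k - d_hat remaining neighbours extends a set of d_hat pairwise-obtuse
    directions."""
    return (1 - 2.0 ** (-d_hat)) ** (k - d_hat)

def k_needed(d_hat: int, success_prob: float = 0.5) -> int:
    """Smallest k for which an enlargement is found with probability at
    least success_prob."""
    return math.ceil(d_hat + math.log(1 - success_prob)
                     / math.log(1 - 2.0 ** (-d_hat)))

for d_hat in (4, 8, 12, 16):
    print(d_hat, k_needed(d_hat))
# Roughly d_hat + 0.7 * 2**d_hat neighbours are needed, i.e. k must grow
# exponentially in the dimension being certified.
```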
[Figure 6 plot: one panel per benchmark manifold; y-axis: estimated dimension; x-axis: k (number of neighbours); legend: median, IQR, Truth.]
Figure 6: Performance of KNN-estimator vs k on benchmark manifolds, for 5000 samples on each manifold.
We show the empirical median dimension estimate, and the interquartile range, over 20 sets of random
samples. Note the different scaling of the y-axis from dataset to dataset. The high variance for M10d Cubic
is due to the sensitivity to errors in high dimensions of the method; see Theorem 2.6.
[Figure 7 plot: lPCA dimension estimates of S 6 with 2500 samples; legend: Truth, maxgap, Kaiser, FO, ratio, broken stick, Fan, p. ratio, Minka; x-axis: k nearest neighbours.]
Figure 7: Dimension estimates from lPCA when neighbourhood sizes are varied from 4 to 100, for S 6 ⊂ R7 .
Colours indicate different threshold methods. Solid curves correspond to estimates using knn neighbour-
hoods, dashed curves correspond to using ϵ-neighbourhoods, where for each nearest neighbour value k, the
radius ϵ is chosen to be the median of the knn distances.
Once $\hat{d}$ is relatively large, it is therefore necessary to have a very large value of k in order to obtain a
new direction which is obtuse to the existing $\hat{d}$ directions with a given probability. In fact, using Markov’s
inequality and considering the matrix of inner product signs as in [100], we can see that the probability of
finding $\hat{d}$ such directions is at most $\binom{k}{\hat{d}} 2^{-\hat{d}(\hat{d}-1)/2}$, and so, by Stirling’s formula, for large d the probability of finding
$\hat{d}$ directions is bounded above by $C\,\frac{2^{k - \hat{d}(\hat{d}-1)/2}}{\sqrt{k}}$. This indicates that CDim suffers at least quadratic throttling.
All the tangential estimators described in this survey are vulnerable to at least linear throttling. This
appears inevitable, since the tangent space is the linear structure of the manifold, the affine space which
locally approximates the data. Unless k ≥ d we cannot hope to accurately recover the entire affine space.
This consideration makes clear that, for very high-dimensional datasets, the use of a tangential estimator
cannot be recommended. The presence of throttling can be detected in lPCA by comparing the estimated
dimension to the hyperparameter k and we recommend that implementations of lPCA warn users when
throttling is occurring.
Noise The use of smaller neighbourhoods is dangerous in the presence of noise, where it will tend to
produce overestimates, while in the presence of curvature larger neighbourhoods are more vulnerable. The
influence of noise is discussed in Section 3.5.
Figure 8: Dimension estimates from lPCA when neighbourhood sizes are varied from 4 to 100, for
SO(4) ⊂ R16 . Colours indicate different threshold methods. Solid curves correspond to estimates using
knn neighbourhoods, dashed curves correspond to using ϵ-neighbourhoods, where for each nearest neighbour
value k, the radius ϵ is chosen to be the median of the knn distances.
Table 4: Performance of lPCA on benchmark datasets, each consisting of 5000 points. We report the
mean and standard deviation of the dimension estimate over 20 samples. The knn neighbourhood is fixed
to be k = 80, and the local dimension estimates are aggregated using the mean. Individual hyperparameters
of the thresholding methods are the scikit-dimension defaults, as stated in Table D1.

The default values of α for the alphaFO and alphaRatio thresholds in scikit-dimension are both 0.05, producing a huge discrepancy between the two
methods. However, both methods are capable of producing the correct answer for the perfectly tuned choice of hyperparameter. It is
crucial that practitioners understand the potential sensitivity in the operation of estimators before applying
them.
Figure 9: We consider 5000 points on M6 Nonlinear and fix the neighbourhood size at 50. Searching a
range of values for the hyperparameter α in both alphaFO and alphaRatio, we observe that the methods
mirror each other. Both can produce the correct dimension, 6, but only for a narrow bandwidth of the
hyperparameters. Outside of these bandwidths the estimates can vary significantly. The comparison shows
that, although α = 0.05 is the "out of the box" default for both methods in scikit-dimension, the
results are very different. Hence, a thorough understanding of what the hyperparameters represent, and of
their potential importance, is essential.
The hyperparameters for the noise experiments were fixed based on values previously optimized for the S 10 dataset under noise-free conditions. The final results, presented
in the tables, represent medians calculated from 20 experimental repetitions.
The complete results are presented in Tables E1–E4. Some of the most notable findings are illustrated
in Figure 10.
The estimators’ behavior in the presence of noise varies significantly, depending on both the type and
magnitude of the applied noise. Furthermore, a comparison of the lPCA and MiND ML estimators reveals
that an estimator’s robustness cannot be judged solely by its performance on a single dataset. While these
estimators perform exceptionally well for a 10-dimensional hypersphere (both with and without noise), a
small amount of ambient Gaussian noise causes a significant overestimation for a 6-dimensional hypersphere.
All tested estimators exhibit susceptibility to ambient Gaussian noise. This is particularly evident when
there is a substantial difference between the intrinsic dimension and the embedding dimension. In such
instances, even a very small noise level (with a standard deviation of 0.01) significantly alters the results for
some estimators (e.g., lPCA and MiND ML in Table E1).
It is crucial to note that the parameters used for these experiments were optimized for the uncorrupted
data. As some methods, such as CorrInt and GRIDE, have parameters which are intended to prevent selecting
neighbours on the noise scale, adjusting these parameters could potentially improve results for noisy data. Of
the tested estimators, CorrInt demonstrates the highest resistance to ambient Gaussian noise. Interestingly,
estimates from FisherS decrease as the standard deviation of the disturbances increases, even though the
disturbances are of a higher dimension than the initial dataset.
Most estimators demonstrated robustness to outliers, with the exception of FisherS, for which the addition
of outliers led to a significant reduction in dimension estimates.
Algorithm 1 Generate data set with outliers
procedure AddOutliers(D, nout)
    Input: Clean dataset D with n observations of dimension d; number of outliers nout
    Output: Dataset with outliers
    out indices ← random choice([1, ..., n], nout)    ▷ Sample indices for outliers
    for each index in out indices do
        point ← D[index]
        for i ← 1 to d do
            m ← random(3, 6)    ▷ Random multiplier between 3 and 6
            point[i] ← point[i] · m
        end for
        D[index] ← point
    end for
    return D
end procedure
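For convenience, a minimal Python (NumPy) sketch of Algorithm 1 is given below; the function name, the use of continuous uniform multipliers, and the example embedding of S 6 into R11 by zero-padding are our own choices and may differ in detail from the implementation used for the experiments.

```python
import numpy as np

def add_outliers(data: np.ndarray, n_out: int, rng=None) -> np.ndarray:
    """Turn n_out randomly chosen observations into outliers by multiplying
    each coordinate by an independent random factor in [3, 6], following
    Algorithm 1."""
    rng = np.random.default_rng() if rng is None else rng
    corrupted = data.copy()
    out_indices = rng.choice(len(corrupted), size=n_out, replace=False)
    multipliers = rng.uniform(3.0, 6.0, size=(n_out, corrupted.shape[1]))
    corrupted[out_indices] *= multipliers
    return corrupted

# Example: corrupt 125 points of a 2500-point sample on S^6, zero-padded
# into R^11 (an illustrative embedding).
rng = np.random.default_rng(0)
sphere = rng.normal(size=(2500, 7))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)   # uniform on S^6
points = np.hstack([sphere, np.zeros((2500, 4))])
noisy = add_outliers(points, n_out=125, rng=rng)
```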
Dataset d n
S6 6 11
S 10 10 11
SO(4) 6 16
Table 5: Datasets used for noise experiments. Intrinsic dimension of dataset and embedding dimensions are
marked respectively by d and n.
Figure 10: Left: We compare the average performance over 20 runs for the best performing hyperparameters
against the “out of the box” hyperparameters for each estimator on the M6 Nonlinear data set with 5000
points. Right: We compare the average performance over 10 runs across two types of corruption, Gaussian
noise with a standard deviation of 0.1 and 125 outliers as described in Algorithm 1, against a baseline of no noise.
A sample of 2500 points on S 6 ⊂ R11 was used. The hyperparameters used were the best hyperparameters
for S 10 ⊂ R11 . Most estimators can handle outliers well but struggle with Gaussian noise. For FisherS this is reversed.
Figure 11: Demonstration of how curvature affects PCA when points are drawn from a non-linear curve in
R2 . By comparison, using lPCA mitigates the effect of curvature. If the neighbourhoods are chosen too large
lPCA will also begin to struggle.
Consider, for example, the performance of PCA and of lPCA on points drawn from a non-linear curve in
R2, as shown in Figure 11.
We investigate the effect of curvature on two pointwise dimension estimators, lPCA and MLE, using a
selection of paraboloids as well as the standard embedding of a torus in R3 as illustrative examples. These
datasets are shown in Figures 12 to 14.
We consider a family of elliptic and hyperbolic paraboloids given by the equations $2x^2 \pm \frac{y^2}{b^2} = z$. One
principal curvature is fixed, while the second varies. We estimate the pointwise dimension at the point
(0, 0, 0). This is repeated 1500 times for each surface (sampled uniformly with the same density, so that
when b = 1 we sample 10000 points). The dimension is estimated using lPCA with a range of nearest
neighbour values (20 to 165 in steps of 5, giving 30 values of k). We count how many values of k give
an estimate of 2 and also the largest value of k that gives a dimension 2 estimate.
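A sketch of this pointwise experiment is given below. It uses a hand-rolled local PCA with a Fukunaga–Olsen style threshold (eigenvalues larger than α times the largest, with α = 0.05) rather than the scikit-dimension implementation, and a simple uniform sampling of the parameter domain rather than the constant-surface-density sampling described above, so it is illustrative only.

```python
import numpy as np

def sample_paraboloid(n, b, rng, half_width=1.0):
    """Sample (x, y) uniformly on a square and lift to z = 2x^2 + y^2/b^2
    (illustrative sampling only; the experiments above sample the surfaces
    with a fixed density)."""
    xy = rng.uniform(-half_width, half_width, size=(n, 2))
    z = 2 * xy[:, 0] ** 2 + xy[:, 1] ** 2 / b ** 2
    return np.column_stack([xy, z])

def lpca_fo_at_point(data, point, k, alpha=0.05):
    """Local PCA dimension at `point`: the number of covariance eigenvalues
    exceeding alpha times the largest eigenvalue."""
    dists = np.linalg.norm(data - point, axis=1)
    neighbours = data[np.argsort(dists)[:k]]
    eigvals = np.linalg.eigvalsh(np.cov(neighbours, rowvar=False))[::-1]
    return int(np.sum(eigvals > alpha * eigvals[0]))

rng = np.random.default_rng(1)
cloud = sample_paraboloid(10_000, b=1.0, rng=rng)
ks = list(range(20, 166, 5))                     # 30 values of k, as above
estimates = [lpca_fo_at_point(cloud, np.zeros(3), k) for k in ks]
print(sum(e == 2 for e in estimates), "of", len(ks), "values of k estimate 2")
```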
In Figure 15 we observe a clear improvement (more k values giving dimension 2, and larger values of k giving
dimension 2) as the Gauss curvature increases from negative to zero. This improvement continues
until the second principal curvature is κ2 ≃ 0.6. This suggests that for lPCA using “FO” and α = 0.05,
negative curvature creates an upward bias. If κ1 is non-zero, then performance is best when κ2 ≃ κ1 /8, rather
than κ2 = 0. It would be informative to see whether this relationship holds for different values of κ1 . We
note that the standard deviations are large, but the trend is clear and visible in both statistics.
On the other hand, for MLE the results are quite different. The largest value of k considered always gives
a dimension estimate of 2, and the number of values of k yielding dimension 2 estimates is on average between
27.5 and 29 out of 30. We therefore study the effect of curvature here by instead averaging the pointwise dimension
estimate at (0, 0, 0) over 150 runs for each k. This is shown in Figure 16.
In Figures 17 and 18 we plot the average pointwise dimension estimate for a fixed k against κ2 . We find
that for MLE, for all k, there is a trend to slightly underestimate the dimension as curvature increases. As k
increases, the estimate also decreases, as does the standard deviation, which does not depend on curvature.
For lPCA, however, the estimates get worse as k increases, and for a fixed k we see
the same trend as in Figure 15, namely that the best estimates are made at a positive curvature. For lPCA
the standard deviations are also much larger than for MLE.
To study the impact of curvature on the torus, we sampled 10000 points uniformly from the torus in
R3 . The degree of overestimation of each estimator is measured by counting the number of points with
pointwise estimated dimension 3 (rounded to the nearest integer). To examine how the estimate varies with
curvature, we plot the cumulative distribution of overestimated points against |ϕ − π|, where ϕ is the angle
of the inner circle from 0 to 2π. In these co-ordinates, |ϕ − π| = 0 corresponds to the inside of the torus
(negative curvature), π/2 to the top and bottom of the torus (zero curvature), and π to the outside of
the torus (positive curvature). In particular, we observe that for lPCA, as k increases and the neighbourhood
captures more curvature, pointwise dimension estimates of 3 appear on the inside of the torus (points
of negative curvature) and decrease in frequency towards the outside of the torus. This is captured in Figure 19.
Figure 12: Points drawn from an elliptic paraboloid. The curvature at the origin is positive.
Figure 13: Points drawn from a hyperbolic paraboloid. The curvature at the origin is negative.
Figure 14: Sample from a torus. Red points have negative Gauss curvature, blue points have positive
curvature.
(a) Number of choices of k giving an estimate of 2 (b) Largest choice of k giving an estimate of 2
Figure 15: Both figures show the same trend. As curvature increases from negative to positive, the number
of choices of k and the largest choice of k giving an estimate of 2 both increase. This trend continues past
Gauss curvature 0. At κ2 ≃ 0.6 this trend reverses. In the case of the torus below, κ2 ≤ 1, making this
reversal hard to detect.
Figure 16: Average pointwise dimension estimate at (0, 0, 0) for MLE, for data from the positively curved
$2x^2 + \frac{y^2}{b^2} = z$ on the left and from the negatively curved $2x^2 - \frac{y^2}{b^2} = z$ on the right. The same trend appears
in both: as k is increased, the pointwise estimates decrease.
Figure 17: For MLE there is a trend to underestimate as either κ2 or the neighbourhood size k is increased.
The standard deviation stays reasonably constant for changes of κ2 , while it decreases with k.
Figure 18: For lPCA we recover the same trend in the estimate with changes in κ2 which we saw in Figure 15
for k ≥ 40. We also see the high standard deviations in the estimate, which occur because we are only considering
one point for each dataset.
(a) lPCA (b) MLE
Figure 19: Using a torus embedded in R3 as an example, we investigate the tendency to overestimate
dimension as the local geometry is varied. lPCA and MLE exhibit divergent responses. The x-axis represents
the position of points on the torus, with 0 representing the innermost latitude (negative curvature), π/2 the
top and bottom of the torus (flat), and π the outermost latitude (positive curvature). Larger neighbourhood
size n results in overestimation by lPCA, but better accuracy for MLE. For lPCA this phenomenon is first
apparent in negative curvature and, as n grows, it spreads to the positively curved region. In the worst
case MLE performs significantly better than lPCA. The figure also emphasises the great importance of
hyperparameter choice.
The variety of approaches reviewed – based on tangential structure, parametric models, and topological
and metric invariants – demonstrates that there are a variety of perspectives from which dimension estimation
can be approached. We find that the best estimators tend to lie in the parametric family. Persistent homology
provides a reasonably successful estimator from a topological perspective.
A tendency towards underestimation for datasets of higher dimensions is confirmed. However, we caution
against generalisation. Experiments on data drawn from SO(n) often demonstrate overestimation. We
believe the cause of underestimations comes principally from concentration of measure near the boundary
and from finite size effects. Empirical corrections attempt to reverse this underestimation tendency but, if
the dataset under study is not similar to the ones used to calculate the correction factor, this can lead to
large errors.
The range of datasets used allowed us to assess the estimators over a wide variety of desirable criteria.
We find that no estimators provide a satisfactory level of performance on non-linear datasets with dimension
above six. However, ESS performs strongly on all other criteria. We strongly recommend that future
researchers include ESS as a comparison method to benchmark performance.
Our additions to the methods of scikit-dimension (GeoMLE, GRIDE, CDim, WODCap, Camastra and
Vinciarelli’s extension of the Grassberger–Procaccia CorrInt algorithm, the packing-number based estimator,
and the magnitude and PH0 estimators) expand the range of methods practitioners can draw on in an easily
accessible place. We have also added new functionality for lPCA and MLE, so that users can choose ϵ-
neighbourhoods in addition to knn neighbourhoods, giving greater freedom to practitioners. Finally, we have
added a probabilistic thresholding method for PCA [78].
Given the increasing use of PH0 and magnitude dimension estimators in the applied topology community,
especially on machine learning problems [5, 69], we give a more detailed investigation and benchmarking of
the performance of these estimators. While PH0 performs comparably to the other estimators investigated here, the
estimation of magnitude dimension suffers from finite size effects, as detailed in Appendix B.
We demonstrate that the choice of hyperparameters is crucial and that it is essential for practitioners to
understand the role they play for each estimator. Our recommendation is that a range of hyperparameters
be used to build confidence in the result. We also recommend that developers of dimension estimators consider
the theoretical limits that hyperparameter choices place on the range of dimension estimates an estimator can
return, as we have identified a throttling phenomenon that results from poor choices of hyperparameters across
several estimators.
Gaussian noise presents an issue for most estimators. However, local estimators can overcome certain
types of outliers through aggregation.
We provide evidence that curvature plays an important role in estimation. We confirm
the known negative effects of curvature on lPCA, but find that, for at least one aggregation method, slightly
positively curved surfaces can be easier to estimate than those with zero Gauss curvature. The effects of
curvature on MLE are much smaller than on lPCA, which may generalise to a statement that parametric
estimators are more robust to curvature than tangential estimators.
Areas that require major progress within this field are estimator performance on non-linear manifolds and
high dimensional manifolds, as well as the development of practical ways to guide hyperparameter choice.
Alternatively, it would be worth investigating adaptive methods, automating the choice of hyperparameters
using features of the input data. The limited and varied results shown here on the role of curvature clearly
justify a systematic approach to determining to what extent curvature has an impact on dimension estimators.
Acknowledgements
JB: This work was supported by the Additional Funding Programme for Mathematical Sciences, delivered
by EPSRC (EP/V521917/1) and the Heilbronn Institute for Mathematical Research. PD and JM: This work
was supported by Dioscuri program initiated by the Max Planck Society, jointly managed with the National
Science Centre (Poland), and mutually funded by the Polish Ministry of Science and Higher Education and
the German Federal Ministry of Education and Research. JH and KMY: This work was supported by a
UKRI Future Leaders Fellowship [grant number MR/W01176X/1; PI J Harvey]. This material is based in
part upon work supported by the National Science Foundation under Grant No. DMS-1928930, while JH
was in residence at the Simons Laufer Mathematical Sciences Institute in Berkeley, California, during Fall
2024.
References
[1] Henry Adams, Elin Farnell, Manuchehr Aminian, Michael Kirby, Joshua Mirth, Rachel Neville, Chris
Peterson, and Clayton Shonkwiler. A Fractal Dimension for Measures via Persistent Homology. In
Nils A Baas, Gunnar E Carlsson, Gereon Quick, Markus Szymik, and Marius Thaule, editors, Topo-
logical Data Analysis, pages 1–31, Cham, 2020. Springer International Publishing.
[2] Luca Albergante, Jonathan Bac, and Andrei Zinovyev. Estimating the effective dimension of large
biological datasets using Fisher separability analysis. In 2019 International Joint Conference on Neural
Networks (IJCNN), pages 1–8, Budapest, 2019. IEEE.
[3] Laurent Amsaleg, Oussama Chelly, Teddy Furon, Stéphane Girard, Michael E. Houle, Ken-ichi
Kawarabayashi, and Michael Nett. Extreme-value-theoretic estimation of local intrinsic dimension-
ality. Data Mining and Knowledge Discovery, 32(6):1768–1805, 11 2018.
[4] Laurent Amsaleg, Oussama Chelly, Michael E Houle, Miloš Radovanović, and Weeris Treeratanajaru.
Intrinsic Dimensionality Estimation within Tight Localities. In Proceedings of the 2019 SIAM Inter-
national Conference on Data Mining (SDM), pages 181–189, Calgary, 2019. SIAM.
[5] Rayna Andreeva, Katharina Limbeck, Bastian Rieck, and Rik Sarkar. Metric Space Magnitude and
Generalisation in Neural Networks. In Timothy Doster, Tegan Emerson, Henry Kvinge, Nina Miolane,
Mathilde Papillon, Bastian Rieck, and Sophia Sanborn, editors, Proceedings of 2nd Annual Workshop
on Topology, Algebra, and Geometry in Machine Learning (TAG-ML), volume 221 of Proceedings of
Machine Learning Research, pages 242–253. PMLR, 8 2023.
[6] Jonathan Bac, Evgeny M. Mirkes, Alexander N. Gorban, Ivan Tyukin, and Andrei Zinovyev. Scikit-
Dimension: A Python Package for Intrinsic Dimension Estimation. Entropy, 23(10):1368, 10 2021.
[7] Mukund Balasubramanian and Eric L. Schwartz. The isomap algorithm and topological stability.
Science, 295(5552), 2002.
[8] Jillian Beardwood, J. H. Halton, and J. M. Hammersley. The shortest path through many points.
Mathematical Proceedings of the Cambridge Philosophical Society, 55(4):299–327, 10 1959.
[9] Zsigmond Benkő, Marceli Stippinger, Roberta Rehus, Attila Bencze, Dániel Fabó, Boglárka Hajnal,
Loránd G. Eröss, András Telcs, and Zoltán Somogyvári. Manifold-adaptive dimension estimation
revisited. PeerJ Computer Science, 8, 2022.
[10] Tolga Birdal, Aaron Lou, Leonidas J Guibas, and Umut Simsekli. Intrinsic Dimension, Persistent Ho-
mology and Generalization in Neural Networks. In M Ranzato, A Beygelzimer, Y Dauphin, P S Liang,
and J Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34,
pages 6776–6789. Curran Associates, Inc., 2021.
[11] Christopher Bishop. Bayesian PCA. Advances in Neural Information Processing Systems, 11, 1998.
[12] Christopher J. Bishop and Yuval Peres. Fractals in Probability and Analysis. Cambridge University
Press, 2016.
[13] Adam Block, Zeyu Jia, Yury Polyanskiy, and Alexander Rakhlin. Intrinsic Dimension Estimation
Using Wasserstein Distance. Journal of Machine Learning Research, 23:1–37, 2022.
[14] Andreas Buja and Nermin Eyuboglu. Remarks on Parallel Analysis. Multivariate Behavioral Research,
27(4):509–540, 10 1992.
[15] Francesco Camastra. Data dimensionality estimation methods: a survey. Pattern Recognition,
36(12):2945–2954, 12 2003.
[16] Francesco Camastra and Antonino Staiano. Intrinsic dimension estimation: Advances and open prob-
lems. Information Sciences, 328:26–41, 1 2016.
[17] Francesco Camastra and Alessandro Vinciarelli. Intrinsic Dimension Estimation of Data: An Approach
Based on Grassberger–Procaccia’s Algorithm. Neural Processing Letters, 14(1):27–34, 8 2001.
[18] P. Campadelli, E. Casiraghi, C. Ceruti, and A. Rozza. Intrinsic Dimension Estimation: Relevant
Techniques and a Benchmark Framework. Mathematical Problems in Engineering, 2015:1–21, 2015.
[19] Luca Candelori, Alexander G. Abanov, Jeffrey Berger, Cameron J. Hogan, Vahagn Kirakosyan, Kharen
Musaelian, Ryan Samson, James E.T. Smith, Dario Villani, Martin T. Wells, and Mengjia Xu. Robust
estimation of the intrinsic dimension of data sets with quantum cognition machine learning. Scientific
Reports, 15(1), 12 2025.
[20] Claudio Ceruti, Simone Bassis, Alessandro Rozza, Gabriele Lombardi, Elena Casiraghi, and Paola
Campadelli. DANCo: An intrinsic dimensionality estimator exploiting angle and norm concentration.
Pattern Recognition, 47(8):2569–2581, 2014.
[21] Siu-Wing Cheng and Man-Kwun Chiu. Dimension Detection via Slivers. In Proceedings of the Twen-
tieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1001–1010. Society for Industrial
and Applied Mathematics, 2009.
[22] Jose A. Costa and Alfred O. Hero. Entropic graphs for manifold learning. In Conference Record of the
Asilomar Conference on Signals, Systems and Computers, volume 1, 2003.
[23] Jose A. Costa and Alfred O. Hero. Geodesic entropic graphs for dimension and entropy estimation in
manifold learning. IEEE Transactions on Signal Processing, 52(8):2210–2221, 8 2004.
[24] Jose A. Costa and Alfred O. Hero. Learning intrinsic dimension and intrinsic entropy of high-
dimensional datasets. In European Signal Processing Conference, volume 06-10-September-2004, 2015.
[25] J. M. Craddock and C. R. Flood. Eigenvectors for representing the 500 mb geopotential surface over
the Northern Hemisphere. Quarterly Journal of the Royal Meteorological Society, 95(405):576–593, 7
1969.
[26] Francesco Denti, Diego Doimo, Alessandro Laio, and Antonietta Mira. The generalized ratios intrinsic
dimension estimator. Scientific Reports, 12(1):20005, 11 2022.
[27] Antonio Di Noia, Iuri Macocco, Aldo Glielmo, Alessandro Laio, and Antonietta Mira. Beyond the
noise: intrinsic dimension estimation with optimal neighbourhood identification. arXiv:2405.15132v2,
5 2024.
[28] Kevin Dunne. Metric Space Spread, Intrinsic Dimension and the Manifold Hypothesis.
arXiv:2308.01382, 8 2023.
[29] Benjamin Dupuis, George Deligiannidis, and Umut Simsekli. Generalization bounds using data-
dependent fractal dimensions. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engel-
hardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference
on Machine Learning, pages 8922–8968, Honolulu, 2023. PMLR.
[30] J.-P. Eckmann and D. Ruelle. Fundamental limitations for estimating dimensions and Lyapunov
exponents in dynamical systems. Physica D: Nonlinear Phenomena, 56(2-3):185–187, 5 1992.
[31] Vittorio Erba, Marco Gherardi, and Pietro Rotondo. Intrinsic dimension estimation for locally under-
sampled data. Scientific Reports, 9(1):17133, 11 2019.
[32] Elena Facco, Maria d’Errico, Alex Rodriguez, and Alessandro Laio. Estimating the intrinsic dimension
of datasets by a minimal neighborhood information. Scientific Reports, 7(1):12140, 9 2017.
[33] Kenneth Falconer. Alternative Definitions of Dimension. In Fractal Geometry, chapter 3, pages 39–58.
John Wiley & Sons, Ltd, 2003.
[34] Mingyu Fan, Nannan Gu, Hong Qiao, and Bo Zhang. Intrinsic dimension estimation of data by
principal component analysis. arXiv, 2 2010.
[35] Amir massoud Farahmand, Csaba Szepesvári, and Jean-Yves Audibert. Manifold-adaptive dimension
estimation. In Proceedings of the 24th international conference on Machine learning, pages 265–272,
New York, NY, USA, 6 2007. ACM.
[36] Serge Frontier. Étude de la décroissance des valeurs propres dans une analyse en composantes prin-
cipales: Comparaison avec le modèle du bâton brisé. Journal of Experimental Marine Biology and
Ecology, 25(1):67–75, 11 1976.
[37] K. Fukunaga and D.R. Olsen. An Algorithm for Finding Intrinsic Dimensionality of Data. IEEE
Transactions on Computers, C-20(2):176–183, 2 1971.
[38] Xin Geng, De Chuan Zhan, and Zhi Hua Zhou. Supervised nonlinear dimensionality reduction for
visualization and classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cy-
bernetics, 35(6):1098–1107, 12 2005.
[39] Benyamin Ghojogh, Mark Crowley, Fakhri Karray, and Ali Ghodsi. Elements of Dimensionality Re-
duction and Manifold Learning. Springer International Publishing, Cham, 2023.
[40] Dejan Govc and Richard Hepworth. Persistent magnitude. Journal of Pure and Applied Algebra,
225(3):106517, 3 2021.
[41] Daniele Granata and Vincenzo Carnevale. Accurate Estimation of the Intrinsic Dimension Using Graph
Distances: Unraveling the Geometric Complexity of Datasets. Scientific Reports, 6, 8 2016.
[42] Peter Grassberger and Itamar Procaccia. Measuring the strangeness of strange attractors. Physica D:
Nonlinear Phenomena, 9(1):189–208, 1983.
[43] Alfred Gray. Comparison theorems for the volumes of tubes as generalizations of the Weyl tube formula.
Topology, 21(2):201–228, 1982.
[44] Matthias Hein and Jean-Yves Audibert. Intrinsic Dimensionality Estimation of Submanifolds in R^d.
In Proceedings of the 22nd International Conference on Machine Learning, pages 289–296, Bonn, 2005.
Association for Computing Machinery.
[45] Christian Horvat and Jean-Pascal Pfister. Intrinsic dimensionality estimation using Normalizing Flows.
In S Koyejo, S Mohamed, A Agarwal, D Belgrave, K Cho, and A Oh, editors, Advances in Neural
Information Processing Systems, volume 35, pages 12225–12236. Curran Associates, Inc., 2022.
[46] Alexander Ivanov, Gleb Nosovskiy, Alexey Chekunov, Denis Fedoseev, Vladislav Kibkalo, Mikhail
Nikulin, Fedor Popelenskiy, Stepan Komkov, Ivan Mazurenko, and Aleksandr Petiushko. Manifold
Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension
Estimation. arXiv:2107.03903, 7 2021.
[47] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space.
In Richard Beals, Anatole Beck, Alexandra Bellow, and Arshag Hajian, editors, Conference on Modern
Analysis and Probability, pages 189–206. American Mathematical Society, Providence, RI, 1984.
[48] Kerstin Johnsson, Charlotte Soneson, and Magnus Fontes. Low bias local intrinsic dimension estimation
from expected simplex skewness. IEEE Transactions on Pattern Analysis and Machine Intelligence,
37(1):196–202, 1 2015.
[49] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 2002.
[50] Iolo Jones. Manifold Diffusion Geometry: Curvature, Tangent Spaces, and Dimension.
arXiv:2411.04100, 11 2024.
[51] Zaher Joukhadar, Hanxun Huang, and Sarah Monazam Erfani. Bayesian Estimation Approaches
for Local Intrinsic Dimensionality. In Edgar Chávez, Benjamin Kimia, Jakub Lokoč, Marco Patella,
and Jan Sedmidubsky, editors, Similarity Search and Applications, volume 15268 of Lecture Notes in
Computer Science, pages 111–125, Cham, 2025. Springer Nature Switzerland.
[52] Henry F Kaiser. The Application of Electronic Computers to Factor Analysis. Educational and
Psychological Measurement, 20(1):141–151, 1960.
[53] Hamidreza Kamkari, Brendan Leigh Ross, Rasa Hosseinzadeh, Jesse C Cresswell, and Gabriel Loaiza-
Ganem. A Geometric View of Data Complexity: Efficient Local Intrinsic Dimension Estimation with
Diffusion Models. Technical report, 2024.
[54] Rasa Karbauskaite and Gintautas Dzemyda. Fractal-Based Methods as a Technique for Estimating
the Intrinsic Dimensionality of High-Dimensional Data: A Survey. Informatica, 27(2):257–281, 2016.
[55] Tommi Kärkkäinen and Jan Hänninen. Additive autoencoder for dimension estimation. Neurocomput-
ing, 551, 9 2023.
[56] Hirokazu Katsumasa, Emily Roff, and Masahiko Yoshinaga. Is magnitude ’generically continuous’ for
finite metric spaces? arXiv:2501.08745, 1 2025.
[57] Balázs Kégl. Intrinsic Dimension Estimation Using Packing Numbers. In S Becker, S Thrun, and
K Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 697–704. MIT
Press, 2002.
[58] Harry Kesten and Sungchul Lee. The central limit theorem for weighted minimal spanning trees on
random points. The Annals of Applied Probability, 6(2):495–527, 1996.
[59] Jisu Kim, Alessandro Rinaldo, and Larry Wasserman. Minimax rates for estimating the dimension of
a manifold. Journal of Computational Geometry, 10(1):42–95, 2019.
[60] Matthäus Kleindessner and Ulrike Luxburg. Dimensionality estimation without distances. In Guy
Lebanon and S. V. N. Vishwanathan, editors, Proceedings of the Eighteenth International Conference
on Artificial Intelligence and Statistics, pages 471–479, 2015.
[61] Gady Kozma, Zvi Lotker, and Gideon Stupp. The minimal spanning tree and the upper box dimension.
Proceedings of the American Mathematical Society, 134(4):1183–1187, 9 2005.
[62] Gady Kozma, Zvi Lotker, and Gideon Stupp. On the connectivity threshold for general uniform metric
spaces. Information Processing Letters, 110(10):356–359, 4 2010.
[63] Anna Krakovská and Martina Chvosteková. Simple correlation dimension estimator and its use to
detect causality. Chaos, Solitons and Fractals, 175, 10 2023.
[66] Tom Leinster and Simon Willerton. On the asymptotic magnitude of subsets of Euclidean space.
Geometriae Dedicata, 164(1):287–310, 6 2013.
[67] Elizaveta Levina and Peter J Bickel. Maximum Likelihood Estimation of Intrinsic Dimension. In
Advances in Neural Information Processing Systems 17, 2004.
[68] Uzu Lim, Harald Oberhauser, and Vidit Nanda. Tangent Space and Dimension Estimation with the
Wasserstein Distance. SIAM Journal on Applied Algebra and Geometry, 8:650–685, 10 2024.
[69] Katharina Limbeck, Rayna Andreeva, Rik Sarkar, and Bastian Rieck. Metric space magnitude for
evaluating the diversity of latent representations. In NIPS ’24: Proceedings of the 38th International
Conference on Neural Information Processing Systems, pages 123911–123953, Vancouver, BC, Canada,
2025. Curran Associates Inc.
[70] Tong Lin and Hongbin Zha. Riemannian manifold learning. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 30(5):796–809, 5 2008.
[71] Gabriele Lombardi, Alessandro Rozza, Claudio Ceruti, Elena Casiraghi, and Paola Campadelli. Mini-
mum Neighbor Distance Estimators of Intrinsic Dimension. In D Gunopulos, T Hofmann, D Malerba,
and M Vazirgiannis, editors, Machine Learning and Knowledge Discovery in Databases. ECML PKDD
2011., volume 6912, pages 374–389. Springer, 2011.
[72] David J.C. MacKay and Zoubin Ghahramani. Comments on ’Maximum Likelihood Estimation of
Intrinsic Dimension’ by E. Levina and P. Bickel (2004), 2005.
[73] Iuri Macocco, Aldo Glielmo, Jacopo Grilli, and Alessandro Laio. Intrinsic Dimension Estimation for
Discrete Metrics. Physical Review Letters, 130(6), 2 2023.
[74] Pertti Mattila. Geometry of sets and measures in Euclidean spaces. Cambridge University Press,
Cambridge, 1995.
[75] Leland McInnes, John Healy, Nathaniel Saul, and Lukas Großberger. UMAP: Uniform Manifold
Approximation and Projection. Journal of Open Source Software, 3(29), 2018.
[76] Mark W. Meckes. Positive definite metric spaces. Positivity, 17(3):733–757, 9 2013.
[77] Mark W. Meckes. Magnitude, Diversity, Capacities, and Dimensions of Metric Spaces. Potential
Analysis, 42(2):549–572, 2 2015.
[78] Thomas P. Minka. Automatic choice of dimensionality for PCA. In Advances in Neural Information
Processing Systems, 2001.
[79] James Raymond Munkres. Topology (2nd Edition). Prentice Hall, Inc, 2000.
[80] Artem R. Oganov and Mario Valle. How to quantify energy landscapes of solids. Journal of Chemical
Physics, 130(10), 2009.
[81] Miguel O’Malley, Sara Kalisnik, and Nina Otter. Alpha magnitude. Journal of Pure and Applied
Algebra, 227(11):107396, 11 2023.
[82] Kadir Özçoban, Murat Manguoğlu, and Emrullah Fatih Yetkin. A Novel Approach for Intrinsic Di-
mension Estimation. arXiv:2503.09485v1 [cs.LG] 12 Mar 2025, 3 2025.
[83] Panagiotis G. Papaioannou, Ronen Talmon, Ioannis G. Kevrekidis, and Constantinos Siettos. Time-
series forecasting using manifold learning, radial basis function interpolation, and geometric harmonics.
Chaos, 32(8), 8 2022.
[84] Karl W. Pettis, Thomas A. Bailey, Anil K. Jain, and Richard C. Dubes. An Intrinsic Dimensionality
Estimator from Near-Neighbor Information. IEEE Transactions on Pattern Analysis and Machine
Intelligence, PAMI-1(1):25–37, 1979.
[85] Haiquan Qiu, Youlong Yang, and Benchong Li. Intrinsic dimension estimation based on local adjacency
information. Information Sciences, 558:21–33, 5 2021.
[86] Haiquan Qiu, Youlong Yang, and Hua Pan. Underestimation modification for intrinsic dimension
estimation. Pattern Recognition, 140, 8 2023.
[87] Haiquan Qiu, Youlong Yang, and Saeid Rezakhah. Intrinsic dimension estimation method based on
correlation dimension and kNN method. Knowledge-Based Systems, 235, 1 2022.
[88] Davide Risso, Fanny Perraudeau, Svetlana Gribkova, Sandrine Dudoit, and Jean Philippe Vert. A gen-
eral and flexible method for signal extraction from single-cell RNA-seq data. Nature Communications,
9(1), 12 2018.
[89] A. Rozza, G. Lombardi, C. Ceruti, E. Casiraghi, and P. Campadelli. Novel high intrinsic dimensionality
estimators. Machine Learning, 89(1-2):37–65, 10 2012.
[90] Alessandro Rozza, Gabriele Lombardi, Marco Rosa, Elena Casiraghi, and Paola Campadelli. IDEA:
Intrinsic Dimension Estimation Algorithm. In Image Analysis and Processing – ICIAP 2011. Springer,
Berlin, Heidelberg., 2011.
[91] John W Sammon. A Nonlinear Mapping for Data Structure Analysis. IEEE Transactions on Com-
puters, C-18(5):401–409, 1969.
[92] Aaron Schumacher. Estimate manifold dimensionality with LID, 12 2020.
[93] Benjamin Schweinhart. Fractal dimension and the persistent homology of random geometric complexes.
Advances in Mathematics, 372:107291, 10 2020.
[94] Benjamin Schweinhart. Persistent Homology and the Upper Box Dimension. Discrete & Computational
Geometry, 65(2):331–364, 2021.
[95] Paulo Serra and Michel Mandjes. Dimension Estimation Using Random Connection Models. Journal
of Machine Learning Research, 18(138):1–35, 2017.
[96] Umut Simsekli, Ozan Sener, George Deligiannidis, and Murat A Erdogdu. Hausdorff dimension, heavy
tails, and generalization in neural networks. Advances in Neural Information Processing Systems,
33:5138–5151, 2020.
[97] Primoz Skraba and Katharine Turner. Wasserstein Stability for Persistence Diagrams.
arXiv:2006.16824, 7 2025.
[98] Jan Pawel Stanczuk, Georgios Batzolis, Teo Deveney, and Carola-Bibiane Schönlieb. Diffusion Models
Encode the Intrinsic Dimension of Data Manifolds. In Ruslan Salakhutdinov, Zico Kolter, Katherine
Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Proceedings of
the 41st International Conference on Machine Learning, pages 46412–46440. PMLR, 2024.
[99] J. Michael Steele. Growth Rates of Euclidean Minimal Spanning Trees with Power Weighted Edges.
The Annals of Probability, 16(4), 10 1988.
[100] Xing Sun and Andrew B Nobel. On the Size and Recovery of Submatrices of Ones in a Random Binary
Matrix. Journal of Machine Learning Research, 9(80):2431–2453, 2008.
[101] Oliver J Sutton, Qinghua Zhou, Alexander N Gorban, and Ivan Y Tyukin. Relative Intrinsic Di-
mensionality Is Intrinsic to Learning. In Lazaros Iliadis, Antonios Papaleonidas, Plamen Angelov,
and Chrisina Jayne, editors, Artificial Neural Networks and Machine Learning – ICANN 2023, volume
14254 of Lecture Notes in Computer Science, pages 516–529, Cham, 2023. Springer Nature Switzerland.
[102] Piotr Tempczyk, Rafal Michaluk, Lukasz Garncarek, Przemyslaw Spurek, Jacek Tabor, and Adam
Golinski. LIDL: Local Intrinsic Dimension Estimation Using Approximate Likelihood. In Kamalika
Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors, Proceed-
ings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine
Learning Research, pages 21205–21231. PMLR, 8 2022.
[103] Joshua B Tenenbaum, Vin de Silva, and John C Langford. A Global Geometric Framework for Non-
linear Dimensionality Reduction. Science, 290:2319–2323, 2000.
[104] G. V. Trunk. Statistical estimation of the intrinsic dimensionality of data collections. Information and
Control, 12(5):508–525, 5 1968.
[105] Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Sergey Nikolenko,
Evgeny Burnaev, Serguei Barannikov, and Irina Piontkovskaya. Intrinsic dimension estimation for
robust detection of ai-generated texts. Advances in Neural Information Processing Systems, 36:39257–
39276, 2023.
[106] Andrey Tychonoff. Ein Fixpunktsatz. Mathematische Annalen, 111:767–776, 1935.
[107] Laurens Van Der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine
Learning Research, 9, 2008.
[108] Jan van Mill. Infinite-Dimensional Topology. North-Holland, 1st edition, 1988.
[109] Peter J. Verveer and Robert P.W. Duin. An Evaluation of Intrinsic Dimensionality Estimators. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 17(1):81–86, 1995.
[110] Gregory P. Way, Michael Zietz, Vincent Rubinetti, Daniel S. Himmelstein, and Casey S. Greene.
Compressing gene expression data using multiple latent space dimensionalities learns complementary
biological representations. Genome Biology, 21(1), 5 2020.
[111] Simon Willerton. Heuristic and computer calculations for the magnitude of metric spaces.
arXiv:0910.5500, 10 2009.
[112] Xin Yang, Sebastien Michea, and Hongyuan Zha. Conical dimension as an intrisic dimension estimator
and its applications. In Chid Apte, David Skillicorn, Bing Liu, and Srinivasan Parthasarathy, editors,
Proceedings of the 2007 SIAM International Conference on Data Mining, pages 169–179, Minneapolis,
2007. SIAM.
[113] Eric Yeats, Cameron Darwin, Frank Liu, and Hai Li. Adversarial Estimation of Topological Dimension
with Harmonic Score Maps. arXiv:2312.06869, 12 2023.
[114] Joseph E. Yukich. Probability Theory of Classical Euclidean Optimization Problems. Springer Berlin,
Heidelberg, 1998.
[115] Wenlan Zang. Abstract Latent-Space Construction for Analyzing Large Genomic Data Sets. PhD
thesis, Yale University, 2021.
A Comparison of PH and KNN
Since PH0 and KNN are derived from the common theory of Euclidean functionals, and are similar in
construction, we highlight a comparison in their performance on the benchmark set of datasets. We first
discuss some theoretical advantages of PH0 . The key difference between the two estimators is that PH0 further
processes the distance information by using the minimum spanning tree (or zeroth dimensional persistent
homology) of the point set. The minimum spanning tree takes the global connectivity into account; in
comparison, KNN considers edge lengths along the knn graph, which is a much coarser organisation of the
connectivity of the point cloud compared to PH0 . One key benefit of PH0 is the stability, with respect to
perturbations of the points, conferred on the α-weight of the minimum spanning tree [97]. In addition, the
minimum spanning tree used by PH0 does not rely on assumptions about a suitable local neighbourhood size,
which is required as a hyperparameter in KNN in the construction of the knn graph.
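As an illustration of the construction, the sketch below computes the α-weight of the minimum spanning tree with SciPy and turns it into a dimension estimate by fitting the subsample scaling E_α(n) ≈ C n^((d−α)/d), in the spirit of [1, 93]; the exact fitting procedure used in our implementation may differ.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_alpha_weight(points: np.ndarray, alpha: float) -> float:
    """Sum of the alpha-th powers of the Euclidean minimum spanning tree
    edge lengths (equivalently, of the PH0 death times)."""
    mst = minimum_spanning_tree(squareform(pdist(points)))
    return float(np.sum(mst.data ** alpha))

def ph0_dimension(points: np.ndarray, alpha: float = 0.5,
                  n_sizes: int = 8, rng=None) -> float:
    """Estimate d by fitting the scaling E_alpha(n) ~ n^((d - alpha)/d)
    over random subsamples of increasing size."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(points)
    sizes = np.unique(np.geomspace(n // 8, n, n_sizes).astype(int))
    weights = [mst_alpha_weight(points[rng.choice(n, s, replace=False)], alpha)
               for s in sizes]
    slope = np.polyfit(np.log(sizes), np.log(weights), 1)[0]
    return alpha / (1.0 - slope)

# 1000 points on S^2 should give an estimate near 2.
rng = np.random.default_rng(0)
pts = rng.normal(size=(1000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
print(ph0_dimension(pts, alpha=0.5, rng=rng))
```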
In our empirical observations on the benchmark dataset, while the mean estimates of both estimators
are comparable, we see that KNN is more susceptible to the randomness of the point sample, and can have
much larger variance, especially for higher dimensional datasets.
To investigate the effects of hyperparameter choice, in Table A1, we show how estimates of spheres for
different dimensions vary for different choices of α. In the main text, Figure 6 shows the performance of
KNN as k is varied on the benchmark dataset.
Table A1: Dimension estimates from PH0 , given samples of 1000 points on spheres of different dimensions,
for different choices of the power parameter α.
Figure A1: Performance of PH0 and KNN estimators on benchmark manifolds with 2500 samples. The
hyperparameters chosen here are fixed across all datasets, with α = 0.5 and k = 1 for PH0 and KNN
respectively. Both choices are the ones that minimise the median (taken across the benchmark manifolds)
absolute and relative error in the mean dimension estimate. The error bars indicate the interquartile range
and median dimension estimate. The red cross indicates the mean dimension estimate, and the dashed
line indicates the true dimension. We note that the KNN estimator consistently has greater variance when
compared to PH0 , and can output extreme outliers in high dimensional cases such as M10d Cubic.
Figure A2: Performance of PH0 and KNN estimators on benchmark manifolds with 2500 samples. The
hyperparameters chosen here differ across the datasets, and are chosen to be the ones that minimise the
difference between the mean dimension estimate and the ground truth. The error bars indicate the interquartile
range and median dimension estimate. The red cross indicates the mean dimension estimate, and the dashed
line indicates the true dimension. We note that the KNN estimator consistently has greater variance when
compared to PH0 , though it is often more accurate given the right hyperparameter choice.
B Finite size issues of magnitude dimension
Focussing on the practicalities, finite size issues can affect the inference of magnitude dimension from finite
samples, since for a finite sample X we have dimMag (X) = 0. Because |tX| → |X| as t → ∞, the slope of the line approaches zero. In
practice, while the range of t over which we fit the curve must be large enough to approximate the limit, it
cannot be so large that the finite size effect occurs. If there are too few points sampled from X, then the
finite size effect takes over before t can be large enough for the asymptotic behaviour to emerge. This means
the number of points may need to be quite large for the dimension to be read off the empirical curve log |tX|.
We demonstrate this in an experiment with uniform random samples from S d ⊂ Rd+1 . For example, for
d = 2, Figure B1 displays log t vs log |tX| for |X| = 625, 1250, 2500, 5000. For small t, the curves are identical
for different numbers of samples, yet as t increases and |tX| grows, the finite size effect takes hold and the
curves plateau at |tX| → |X|. In Table B1, we show the magnitude dimension estimates for d = 2, 4, 8, 16
and varying numbers of samples. Even for modest dimensions and a high number of samples, the dimension
estimates are far below the actual dimension, as the finite size effect prevents the emergence of the asymptotic
growth of the magnitude curve. The magnitude curves for higher dimensions are illustrated in Figures B2 and B3.
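The computation behind these curves is straightforward: the magnitude of the finite metric space tX is the sum of the entries of the inverse of the similarity matrix with entries exp(−t d(xi , xj )). The following Python sketch (our own minimal illustration) computes the empirical magnitude function for a sample on S 2 and its local log–log slopes, from which the plateau below the true dimension can be seen.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def magnitude(points: np.ndarray, t: float) -> float:
    """Magnitude |tX| of the scaled finite metric space tX: the sum of the
    entries of the inverse of the matrix Z_ij = exp(-t * d(x_i, x_j))."""
    Z = np.exp(-t * squareform(pdist(points)))
    return float(np.linalg.solve(Z, np.ones(len(points))).sum())

# Empirical magnitude function for a sample on S^2 and its local log-log
# slopes; the dimension is read off from the (approximately) linear region.
rng = np.random.default_rng(0)
sphere = rng.normal(size=(625, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
ts = np.geomspace(0.1, 100, 40)
mags = np.array([magnitude(sphere, t) for t in ts])
slopes = np.gradient(np.log(mags), np.log(ts))
print(slopes.max())   # stays below 2 at this sample size (cf. Figure B1)
```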
Table B1: Magnitude dimension estimates of the dimension of spheres S m where m = 2, 4, 8, 16, given
N uniform i.i.d. samples. We note that, even for moderately high dimensions, the magnitude dimension
estimator can fail to recover the intrinsic dimension of the unit sphere for N = 5000, where other
estimators succeed.
[Figure B1 plot: left panel, |tX| against t on logarithmic axes with fitted slopes d = 1.91 (N = 625), 1.94 (1250), 1.96 (2500), 1.97 (5000); right panel, second derivative of the curve.]
Figure B1: Magnitude functions of random samples of N = 625, 1250, 2500, 5000 points on S2 ⊂ R3 . We
observe that, as N increases, the finite cap on |tX| arrives at a larger value of t, and a larger part of the
linear region of the curve is preserved. We use the slope of the linear part of the curve as the magnitude
dimension estimate. On the right we plot the magnitude of the second derivative of the curve, approximated
by finite difference. The linear portion of the curve is selected to be the part of the curve whose second
derivative lies below the threshold value indicated in the right hand panel.
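The selection of the linear region described in the caption can be sketched as follows (a literal reading of the rule above rather than the exact implementation used for the figures; the threshold, and the optional cap `t_max` excluding the finite-size plateau, are free parameters of ours):

```python
import numpy as np

def magnitude_dimension_estimate(ts, mags, threshold=0.5, t_max=np.inf):
    """Slope of log|tX| against log t on the low-curvature part of the curve."""
    d2 = np.gradient(np.gradient(mags, ts), ts)        # finite-difference d^2|tX|/dt^2
    mask = (np.abs(d2) < threshold) & (ts <= t_max)    # 'linear' part of the magnitude curve
    slope, _ = np.polyfit(np.log(ts[mask]), np.log(mags[mask]), 1)
    return slope

# e.g. magnitude_dimension_estimate(ts, mags) for the S^2 sample in the previous sketch
```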
[Figure B2: left panel, log–log plot of |tX| against t for N = 625, 1250, 2500, 5000 with fitted slopes d = 2.92, 3.15, 3.33, 3.47 respectively; right panel, the second derivative d^2|tX|/dt^2 against t with the selection threshold.]
Figure B2: Magnitude functions of random samples of N = 625, 1250, 2500, 5000 points on $S^4 \subset \mathbb{R}^5$. We
observe that, as N increases, the finite cap on |tX| arrives at a larger value of t, and a larger part of the
linear region of the curve is preserved. We use the slope of the linear part of the curve as the magnitude
dimension estimate. On the right we plot the magnitude of the second derivative of the curve, approximated
by finite difference. The linear portion of the curve is selected to be the part of the curve whose second
derivative lies below the threshold value indicated in the right hand panel.
[Figure B3: left panel, log–log plot of |tX| against t for N = 625, 1250, 2500, 5000 with fitted slopes d = 3.79, 4.33, 4.82, 5.34 respectively; right panel, the second derivative d^2|tX|/dt^2 against t with the selection threshold.]
Figure B3: Magnitude functions of random samples of N = 625, 1250, 2500, 5000 points on $S^{16} \subset \mathbb{R}^{17}$. We
observe that, as N increases, the finite cap on |tX| arrives at a larger value of t, and a larger part of the
linear region of the curve is preserved. We use the slope of the linear part of the curve as the magnitude
dimension estimate. On the right we plot the magnitude of the second derivative of the curve, approximated
by finite difference. The linear portion of the curve is selected to be the part of the curve whose second
derivative lies below the threshold value indicated in the right hand panel.
C Performance of Estimators on Benchmark Datasets
We assess the estimators with the following experimental procedure. Let $\mathcal{M}$ be the set of benchmark manifolds, $\mathcal{E}$ the set of estimators, and $\mathcal{H}_E$ the set of hyperparameters for some estimator $E \in \mathcal{E}$. For each triplet $(M, E, H)$ representing a manifold, an estimator and a hyperparameter choice, we evaluated the performance of the estimator over 20 randomly sampled point sets from $M$. We record the empirical mean, which we denote by $\hat{d}(E, M, H)$, and the standard deviation of the dimension estimates. For each dataset
type in the list of benchmark datasets, we varied the number of samples from 625, 1250, 2500, and 5000,
examining the performance as the number of points is successively doubled.
We varied the choice of hyperparameters over a range specified in Appendix D. Across the local estimators
we varied the number of nearest neighbours. We aggregate the performance across hyperparameters in the
tables in this appendix, where we show dimension estimates for three different types of hyperparameter choice:
1. The hyperparameter that minimises the difference between the estimated and the intrinsic dimension of that particular dataset, giving $\hat{d}_{\mathrm{best}}$.
2. The performance of the estimator with a fixed choice of hyperparameter; this is either
(a) the hyperparameter that minimises the median absolute error across the benchmark manifolds, $H_{\mathrm{abs}} = \operatorname*{arg\,min}_{H \in \mathcal{H}_E} \operatorname{median}_{M \in \mathcal{M}} \big|\hat{d}(E, M, H) - \dim M\big|$, giving $\hat{d}_{\mathrm{abs}}$; or
(b) the one that minimises the median relative error across the benchmark manifolds, $H_{\mathrm{rel}} = \operatorname*{arg\,min}_{H \in \mathcal{H}_E} \operatorname{median}_{M \in \mathcal{M}} \big|\hat{d}(E, M, H) - \dim M\big| / \dim M$, giving $\hat{d}_{\mathrm{rel}}$.
The hyperparameter choices $H_{\mathrm{abs}}$ and $H_{\mathrm{rel}}$ guarantee reasonable performance of the estimator on most of the datasets in the benchmark. While a choice of hyperparameter might be optimal for a certain dataset, the same hyperparameter may lead to poor performance on another. The difference between $\hat{d}_{\mathrm{best}}$ and $\hat{d}_{\mathrm{abs}}$ or $\hat{d}_{\mathrm{rel}}$ gives an indication of how dependent the estimator is on the hyperparameter choice in order to perform well on a particular dataset.
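The choice of $H_{\mathrm{best}}$, $H_{\mathrm{abs}}$ and $H_{\mathrm{rel}}$ can be summarised in the following short sketch (illustrative only; the dictionary layout and the function name are ours, not the evaluation code behind the tables):

```python
import numpy as np

def select_hyperparameters(d_hat, true_dim):
    """d_hat[M][H]: mean estimate for manifold M under hyperparameter H; true_dim[M]: intrinsic dimension."""
    manifolds = list(d_hat)
    hypers = list(next(iter(d_hat.values())))   # assumes a common hyperparameter grid across datasets
    # Per-dataset best hyperparameter (the 'Best' columns).
    H_best = {M: min(d_hat[M], key=lambda H: abs(d_hat[M][H] - true_dim[M])) for M in manifolds}
    # Single hyperparameters minimising the median absolute / relative error ('med abs' / 'med rel' columns).
    med_abs = {H: np.median([abs(d_hat[M][H] - true_dim[M]) for M in manifolds]) for H in hypers}
    med_rel = {H: np.median([abs(d_hat[M][H] - true_dim[M]) / true_dim[M] for M in manifolds]) for H in hypers}
    return H_best, min(med_abs, key=med_abs.get), min(med_rel, key=med_rel.get)
```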
lPCA
Samples 625 1250 2500 5000
Dataset d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1 Sphere 10 11 10.0 (0.0) 9.8 (0.1) 11.0 (0.0) 10.0 (0.0) 10.8 (0.0) 11.0 (0.0) 10.0 (0.0) 11.0 (0.0) 11.0 (0.0) 10.0 (0.0) 10.0 (0.0) 10.0 (0.0)
M2 Affine 3 5 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0)
M3 Nonlinear 4 6 4.0 (0.0) 4.0 (0.1) 5.0 (0.0) 4.0 (0.0) 5.0 (0.0) 5.0 (0.0) 4.0 (0.0) 5.0 (0.0) 5.0 (0.0) 4.0 (0.0) 5.0 (0.0) 5.0 (0.0)
M4 Nonlinear 4 8 4.0 (0.0) 4.6 (0.2) 7.0 (0.0) 4.0 (0.0) 6.6 (0.0) 7.0 (0.0) 4.0 (0.0) 7.0 (0.0) 7.0 (0.0) 4.0 (0.0) 6.0 (0.0) 6.0 (0.0)
M5a Helix1d 1 3 1.0 (0.0) 1.1 (0.0) 3.0 (0.0) 1.0 (0.0) 1.4 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0)
M5b Helix2d 2 3 2.0 (0.0) 2.9 (0.0) 3.0 (0.0) 2.0 (0.0) 3.0 (0.0) 3.0 (0.0) 2.0 (0.0) 3.0 (0.0) 3.0 (0.0) 2.0 (0.0) 3.0 (0.0) 3.0 (0.0)
M6 Nonlinear 6 36 6.0 (0.0) 6.0 (0.3) 11.0 (0.0) 6.0 (0.0) 11.0 (0.1) 11.0 (0.0) 6.0 (0.0) 12.0 (0.2) 12.0 (0.2) 6.0 (0.0) 11.0 (0.0) 11.0 (0.0)
M7 Roll 2 3 2.0 (0.0) 2.0 (0.0) 2.4 (0.5) 2.0 (0.0) 2.1 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0)
M8 Nonlinear 12 72 12.0 (0.0) 7.5 (0.4) 20.4 (0.5) 12.0 (0.0) 20.1 (0.1) 20.0 (0.0) 12.0 (0.0) 24.0 (0.2) 24.0 (0.2) 12.0 (0.0) 23.0 (0.0) 23.0 (0.0)
M9 Affine 20 20 20.0 (0.0) 10.4 (0.4) 19.0 (0.0) 20.0 (0.0) 19.3 (0.0) 19.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0)
M10a Cubic 10 11 10.0 (0.0) 9.0 (0.1) 11.0 (0.0) 10.0 (0.0) 11.0 (0.0) 11.0 (0.0) 10.0 (0.0) 11.0 (0.0) 11.0 (0.0) 10.0 (0.0) 11.0 (0.0) 11.0 (0.0)
M10b Cubic 17 18 16.5 (0.1) 11.0 (0.3) 18.0 (0.0) 16.4 (0.1) 17.8 (0.0) 18.0 (0.0) 17.6 (0.0) 18.0 (0.0) 18.0 (0.0) 17.6 (0.0) 18.0 (0.0) 18.0 (0.0)
M10c Cubic 24 25 24.0 (0.1) 11.6 (0.4) 23.0 (0.0) 24.0 (0.1) 22.6 (0.0) 23.0 (0.0) 23.9 (0.0) 25.0 (0.0) 25.0 (0.0) 23.7 (0.0) 25.0 (0.0) 25.0 (0.0)
M10d Cubic 70 72 55.2 (0.1) 11.6 (0.6) 37.0 (0.0) 55.2 (0.0) 37.4 (0.0) 37.0 (0.0) 55.2 (0.0) 54.2 (0.4) 54.2 (0.4) 55.2 (0.0) 55.0 (0.0) 55.0 (0.0)
M11 Moebius 2 3 2.0 (0.0) 2.4 (0.0) 3.0 (0.0) 2.0 (0.0) 2.9 (0.0) 3.0 (0.0) 2.0 (0.0) 3.0 (0.0) 3.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0)
M12 Norm 20 20 20.0 (0.0) 6.9 (0.3) 19.0 (0.0) 20.0 (0.0) 19.3 (0.0) 19.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0)
M13a Scurve 2 3 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0)
M13b Spiral 1 13 1.0 (0.0) 2.0 (0.0) 2.0 (0.0) 1.0 (0.0) 2.0 (0.0) 2.0 (0.0) 1.0 (0.0) 2.0 (0.0) 2.0 (0.0) 1.0 (0.0) 2.0 (0.0) 2.0 (0.0)
Mbeta 10 40 10.0 (0.0) 4.5 (0.2) 9.0 (0.0) 10.0 (0.0) 9.0 (0.1) 9.0 (0.0) 10.0 (0.0) 10.0 (0.0) 10.0 (0.0) 10.0 (0.0) 10.0 (0.0) 10.0 (0.0)
Mn1 Nonlinear 18 72 18.0 (0.0) 8.6 (0.5) 18.0 (0.0) 18.0 (0.0) 17.9 (0.0) 18.0 (0.0) 18.0 (0.0) 18.2 (0.4) 18.2 (0.4) 18.0 (0.0) 18.0 (0.0) 18.0 (0.0)
Mn2 Nonlinear 24 96 24.0 (0.0) 8.9 (0.5) 22.9 (0.3) 24.0 (0.0) 22.5 (0.0) 22.5 (0.5) 24.0 (0.0) 24.0 (0.0) 24.0 (0.0) 24.0 (0.0) 24.0 (0.0) 24.0 (0.0)
Mp1 Paraboloid 3 12 3.0 (0.0) 2.9 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0) 3.0 (0.0)
Mp2 Paraboloid 6 21 6.0 (0.0) 5.1 (0.1) 6.0 (0.0) 6.0 (0.0) 5.9 (0.0) 6.0 (0.0) 6.0 (0.0) 6.0 (0.0) 6.0 (0.0) 6.0 (0.0) 6.0 (0.0) 6.0 (0.0)
Mp3 Paraboloid 9 30 9.0 (0.0) 6.8 (0.2) 9.0 (0.0) 9.0 (0.0) 8.4 (0.0) 9.0 (0.0) 9.0 (0.0) 9.0 (0.0) 9.0 (0.0) 9.0 (0.0) 9.0 (0.0) 9.0 (0.0)
Table C1: Number of samples. With few samples, estimates can be sensitive to hyperparameter choices, as in the case of M10b,c,d Cubic and M12 Norm. However, the estimates become more accurate on more datasets as the number of samples increases. High dimensional datasets. With many samples, the only high dimensional dataset that troubles lPCA is M10d Cubic, which it significantly underestimates. Variance. lPCA has the lowest variance among all estimators. Note that the aggregation method over local dimension estimates is often the median in this table (see Table D2), which tends to produce an integer value; this effectively ‘rounded’ aggregation also reduces the variance. Hyperparameter dependency. While this is an issue for some datasets when there are few samples, with many samples the Best, med abs, and med rel estimates are close. Notable exceptions include some non-linear datasets, such as M3, M4, M6, and M8 Nonlinear, and M5b Helix2d. Nonlinear datasets. With many point samples, lPCA performs well on Mn1,2 Nonlinear and the Mp1,2,3 Paraboloids, which many other estimators struggle with. However, some estimators perform much more consistently with regard to hyperparameter sensitivity on the other nonlinear datasets M3, M4, M6, M8 Nonlinear, and M13b Spiral.
PH
Samples 625 1250 2500 5000
Dataset d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1Sphere 10 11 9.2 (0.47) 9.2 (0.47) 9.2 (0.47) 9.2 (0.32) 9.2 (0.32) 9.2 (0.32) 9.4 (0.16) 9.4 (0.16) 9.4 (0.16) 9.5 (0.12) 9.5 (0.12) 9.5 (0.12)
M2Affine3to5 3 5 2.9 (0.1) 2.9 (0.1) 2.9 (0.1) 2.9 (0.07) 2.9 (0.07) 2.9 (0.07) 2.9 (0.05) 2.9 (0.05) 2.9 (0.05) 2.9 (0.03) 2.9 (0.03) 2.9 (0.03)
M3Nonlinear4to6 4 6 3.8 (0.18) 3.8 (0.18) 3.8 (0.18) 3.9 (0.12) 3.9 (0.12) 3.9 (0.12) 3.9 (0.07) 3.9 (0.07) 3.9 (0.07) 3.9 (0.08) 3.9 (0.08) 3.9 (0.08)
M4Nonlinear 4 8 4.0 (0.13) 4.0 (0.13) 4.0 (0.13) 4.0 (0.1) 4.0 (0.1) 4.0 (0.1) 3.9 (0.09) 3.9 (0.09) 3.9 (0.09) 3.9 (0.07) 3.9 (0.07) 3.9 (0.07)
M5aHelix1d 1 3 1.0 (0.02) 1.0 (0.02) 1.0 (0.02) 1.0 (0.01) 1.0 (0.01) 1.0 (0.01) 1.0 (0.01) 1.0 (0.01) 1.0 (0.01) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0)
M5bHelix2d 2 3 2.8 (0.12) 2.8 (0.12) 2.8 (0.12) 2.6 (0.15) 2.6 (0.15) 2.6 (0.15) 2.3 (0.09) 2.3 (0.09) 2.3 (0.09) 2.0 (0.03) 2.0 (0.03) 2.0 (0.03)
M6Nonlinear 6 36 6.6 (0.32) 6.6 (0.32) 6.6 (0.32) 6.3 (0.3) 6.3 (0.3) 6.3 (0.3) 6.1 (0.15) 6.1 (0.15) 6.1 (0.15) 6.0 (0.11) 6.0 (0.11) 6.0 (0.11)
M7Roll 2 3 2.0 (0.06) 2.0 (0.06) 2.0 (0.06) 2.0 (0.05) 2.0 (0.05) 2.0 (0.05) 2.0 (0.03) 2.0 (0.03) 2.0 (0.03) 2.0 (0.01) 2.0 (0.01) 2.0 (0.01)
M8Nonlinear 12 72 14.1 (0.82) 14.1 (0.82) 14.1 (0.82) 14.1 (0.57) 14.1 (0.57) 14.1 (0.57) 13.8 (0.41) 13.8 (0.41) 13.8 (0.41) 13.6 (0.31) 13.6 (0.31) 13.6 (0.31)
M9Affine 20 20 15.6 (0.79) 15.6 (0.79) 15.6 (0.79) 15.3 (0.44) 15.3 (0.44) 15.3 (0.44) 15.3 (0.39) 15.3 (0.39) 15.3 (0.39) 15.7 (0.2) 15.7 (0.2) 15.7 (0.2)
M10aCubic 10 11 9.0 (0.4) 9.0 (0.4) 9.0 (0.4) 9.0 (0.27) 9.0 (0.27) 9.0 (0.27) 9.2 (0.18) 9.2 (0.18) 9.2 (0.18) 9.2 (0.12) 9.2 (0.12) 9.2 (0.12)
M10bCubic 17 18 13.7 (0.66) 13.7 (0.66) 13.7 (0.66) 13.7 (0.47) 13.7 (0.47) 13.7 (0.47) 14.1 (0.45) 14.1 (0.45) 14.1 (0.45) 14.2 (0.28) 14.2 (0.28) 14.2 (0.28)
M10cCubic 24 25 18.0 (0.82) 18.0 (0.82) 18.0 (0.82) 18.2 (0.58) 18.2 (0.58) 18.2 (0.58) 18.5 (0.5) 18.5 (0.5) 18.5 (0.5) 18.7 (0.34) 18.7 (0.34) 18.7 (0.34)
M10dCubic 70 72 40.3 (3.26) 40.3 (3.26) 40.3 (3.26) 39.6 (2.05) 39.6 (2.05) 39.6 (2.05) 40.3 (1.4) 40.3 (1.4) 40.3 (1.4) 41.6 (1.02) 41.6 (1.02) 41.6 (1.02)
M11Moebius 2 3 2.0 (0.06) 2.0 (0.06) 2.0 (0.06) 2.0 (0.05) 2.0 (0.05) 2.0 (0.05) 2.0 (0.03) 2.0 (0.03) 2.0 (0.03) 2.0 (0.02) 2.0 (0.02) 2.0 (0.02)
M12Norm 20 20 16.5 (0.87) 16.5 (0.87) 16.5 (0.87) 16.9 (0.78) 16.9 (0.78) 16.9 (0.78) 17.1 (0.53) 17.1 (0.53) 17.1 (0.53) 17.4 (0.36) 17.4 (0.36) 17.4 (0.36)
M13aScurve 2 3 2.0 (0.06) 2.0 (0.06) 2.0 (0.06) 2.0 (0.04) 2.0 (0.04) 2.0 (0.04) 2.0 (0.03) 2.0 (0.03) 2.0 (0.03) 2.0 (0.02) 2.0 (0.02) 2.0 (0.02)
M13bSpiral 1 13 1.8 (0.19) 1.8 (0.19) 1.8 (0.19) 1.5 (0.17) 1.5 (0.17) 1.5 (0.17) 1.1 (0.05) 1.1 (0.05) 1.1 (0.05) 1.0 (0.01) 1.0 (0.01) 1.0 (0.01)
Mbeta 10 40 6.3 (0.52) 6.3 (0.52) 6.3 (0.52) 6.4 (0.41) 6.4 (0.41) 6.4 (0.41) 6.5 (0.32) 6.5 (0.32) 6.5 (0.32) 6.7 (0.25) 6.7 (0.25) 6.7 (0.25)
Mn1Nonlinear 18 72 14.1 (0.75) 14.1 (0.75) 14.1 (0.75) 14.2 (0.48) 14.2 (0.48) 14.2 (0.48) 14.5 (0.45) 14.5 (0.45) 14.5 (0.45) 14.5 (0.3) 14.5 (0.3) 14.5 (0.3)
Mn2Nonlinear 24 96 18.0 (1.15) 18.0 (1.15) 18.0 (1.15) 18.3 (0.74) 18.3 (0.74) 18.3 (0.74) 18.4 (0.52) 18.4 (0.52) 18.4 (0.52) 18.6 (0.43) 18.6 (0.43) 18.6 (0.43)
Mp1Paraboloid 3 12 2.8 (0.11) 2.8 (0.11) 2.8 (0.11) 2.9 (0.09) 2.9 (0.09) 2.9 (0.09) 2.9 (0.07) 2.9 (0.07) 2.9 (0.07) 2.9 (0.05) 2.9 (0.05) 2.9 (0.05)
Mp2Paraboloid 6 21 4.7 (0.31) 4.7 (0.31) 4.7 (0.31) 5.0 (0.2) 5.0 (0.2) 5.0 (0.2) 5.3 (0.17) 5.3 (0.17) 5.3 (0.17) 5.4 (0.11) 5.4 (0.11) 5.4 (0.11)
Mp3Paraboloid 9 30 5.3 (0.68) 5.3 (0.68) 5.3 (0.68) 6.3 (0.52) 6.3 (0.52) 6.3 (0.52) 6.9 (0.35) 6.9 (0.35) 6.9 (0.35) 7.2 (0.16) 7.2 (0.16) 7.2 (0.16)
Table C2: Number of samples. PH has good performance for low dimensional manifolds even with few points. High dimensional datasets. PH is prone to underestimate for high dimensional datasets, such as M9 Affine and M10b,c,d Cubic, even with many samples in the benchmark. Variance. The variance between different point samples is relatively high, especially for high dimensional datasets with few samples, such as M10d Cubic. See Theorem 2.6 for a discussion of this. Hyperparameter dependency. Across the different point sampling regimes, and all three different ways of reporting dimension estimates, the choice of α = 0.5 out of α ∈ {0.5, 1.0, 1.5, 2.0} consistently produces the best results. Nonlinear datasets. On Mbeta and Mn1,2 Nonlinear, the propensity to underestimate remains even after increasing the number of samples.
KNN
Samples 625 1250 2500 5000
Dataset d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1 Sphere 10 11 9.6 (1.4) 13.2 (13.7) 9.5 (2.0) 9.9 (1.9) 9.9 (1.9) 9.9 (1.9) 9.8 (1.4) 9.8 (1.4) 9.8 (1.4) 9.7 (0.8) 9.7 (0.8) 9.5 (0.7)
M2 Affine 3to5 3 5 2.9 (0.3) 2.9 (0.3) 2.8 (0.2) 2.9 (0.2) 2.9 (0.2) 2.9 (0.2) 2.9 (0.1) 2.9 (0.1) 2.9 (0.1) 2.9 (0.1) 2.9 (0.1) 2.9 (0.1)
M3 Nonlinear 4to6 4 6 3.8 (0.4) 3.8 (0.4) 3.5 (0.2) 3.8 (0.4) 3.8 (0.4) 3.8 (0.4) 3.9 (0.2) 3.9 (0.2) 3.9 (0.2) 3.8 (0.2) 3.7 (0.1) 3.8 (0.2)
M4 Nonlinear 4 8 4.0 (0.2) 3.9 (0.6) 3.7 (0.3) 4.0 (0.1) 3.9 (0.3) 3.9 (0.3) 3.9 (0.3) 3.9 (0.2) 3.9 (0.2) 3.8 (0.2) 3.7 (0.1) 3.8 (0.1)
M5a Helix1d 1 3 1.0 (0.0) 1.0 (0.1) 1.0 (0.1) 1.0 (0.0) 1.0 (0.1) 1.0 (0.1) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0)
M5b Helix2d 2 3 2.2 (0.1) 2.6 (0.2) 2.6 (0.2) 2.3 (0.1) 2.3 (0.2) 2.3 (0.2) 2.1 (0.1) 2.1 (0.1) 2.1 (0.1) 2.0 (0.1) 2.1 (0.0) 2.0 (0.1)
M6 Nonlinear 6 36 6.0 (0.5) 6.2 (1.1) 5.9 (0.7) 6.0 (0.2) 5.6 (0.5) 5.6 (0.5) 6.0 (0.2) 5.8 (0.4) 5.8 (0.4) 5.9 (0.1) 5.3 (0.2) 5.5 (0.2)
M7 Roll 2 3 2.0 (0.1) 2.0 (0.2) 1.9 (0.1) 2.0 (0.1) 2.0 (0.1) 2.0 (0.1) 2.0 (0.1) 2.0 (0.1) 2.0 (0.1) 2.0 (0.0) 2.0 (0.0) 2.0 (0.1)
M8 Nonlinear 12 72 13.7 (2.9) 14.6 (7.1) 14.4 (3.1) 13.7 (1.4) 13.7 (2.7) 13.7 (2.7) 12.3 (1.3) 12.3 (1.3) 12.3 (1.3) 12.3 (0.6) 12.3 (0.6) 12.7 (0.8)
M9 Affine 20 20 20.0 (2.5) 23.8 (21.1) 18.2 (4.8) 20.1 (1.5) 16.3 (4.6) 16.3 (4.6) 18.1 (0.8) 15.5 (2.7) 15.5 (2.7) 16.9 (0.5) 15.2 (1.1) 15.5 (1.6)
M10a Cubic 10 11 9.7 (2.8) 9.7 (2.8) 8.8 (1.5) 9.4 (1.4) 9.4 (1.4) 9.4 (1.4) 9.0 (1.0) 9.0 (1.0) 9.0 (1.0) 8.9 (0.7) 8.5 (0.4) 8.7 (0.6)
M10b Cubic 17 18 17.1 (2.5) 18.2 (9.4) 14.3 (3.0) 16.0 (0.9) 12.8 (1.9) 12.8 (1.9) 15.1 (0.5) 14.4 (2.8) 14.4 (2.8) 14.3 (0.4) 13.3 (0.9) 13.3 (1.2)
M10c Cubic 24 25 24.0 (5.6) 30.7 (29.0) 22.0 (7.6) 24.0 (4.2) 19.3 (6.0) 19.3 (6.0) 23.5 (1.6) 19.8 (6.8) 19.8 (6.8) 21.9 (1.2) 18.7 (1.5) 19.3 (1.5)
M10d Cubic 70 72 96.8 (323.5) 166.3 (667.8) 96.8 (323.5) 65.7 (1209.4) -313.5 (1667.0) -313.5 (1667.0) 68.5 (768.6) -31.8 (471.5) -31.8 (471.5) 68.4 (120.4) 92.0 (36.6) 67.8 (28.7)
M11 Moebius 2 3 2.0 (0.1) 1.9 (0.1) 1.9 (0.1) 2.0 (0.1) 2.0 (0.2) 2.0 (0.2) 2.0 (0.0) 2.0 (0.1) 2.0 (0.1) 2.0 (0.1) 1.9 (0.0) 2.0 (0.1)
M12 Norm 20 20 19.8 (4.3) 17.3 (4.2) 19.8 (4.3) 20.1 (2.5) 19.0 (5.1) 19.0 (5.1) 20.0 (1.1) 19.1 (3.1) 19.1 (3.1) 20.0 (1.0) 18.8 (1.7) 17.8 (1.7)
M13a Scurve 2 3 2.0 (0.2) 2.0 (0.2) 2.0 (0.1) 2.0 (0.1) 1.9 (0.1) 1.9 (0.1) 2.0 (0.1) 2.0 (0.1) 2.0 (0.1) 2.0 (0.1) 2.0 (0.0) 2.0 (0.1)
M13b Spiral 1 13 1.3 (0.1) 1.3 (0.1) 1.9 (0.1) 1.0 (0.1) 1.0 (0.1) 1.0 (0.1) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0)
Mbeta 10 40 6.4 (0.8) 6.4 (1.2) 5.8 (0.7) 6.2 (0.9) 6.2 (0.9) 6.2 (0.9) 6.6 (0.7) 6.6 (0.7) 6.6 (0.7) 6.6 (0.4) 6.1 (0.2) 6.3 (0.4)
Mn1 Nonlinear 18 72 18.0 (3.0) 18.2 (8.2) 13.7 (2.7) 17.0 (1.7) 13.7 (2.0) 13.7 (2.0) 15.6 (0.7) 15.5 (3.6) 15.5 (3.6) 15.0 (0.5) 13.9 (1.3) 14.0 (1.3)
Mn2 Nonlinear 24 96 23.8 (8.0) 20.5 (8.4) 25.3 (11.8) 24.0 (3.1) 19.8 (5.0) 19.8 (5.0) 23.3 (1.3) 18.3 (2.7) 18.3 (2.7) 21.0 (0.7) 18.6 (1.5) 19.5 (2.5)
Mp1 Paraboloid 3 12 2.9 (0.3) 2.9 (0.3) 2.8 (0.2) 2.9 (0.2) 2.9 (0.2) 2.9 (0.2) 2.9 (0.2) 2.9 (0.2) 2.9 (0.2) 3.0 (0.1) 2.9 (0.1) 2.9 (0.1)
Mp2 Paraboloid 6 21 5.2 (0.9) 5.2 (0.9) 4.9 (0.6) 5.1 (0.7) 5.1 (0.7) 5.1 (0.7) 5.3 (0.4) 5.3 (0.4) 5.3 (0.4) 5.4 (0.3) 5.1 (0.2) 5.2 (0.2)
Mp3 Paraboloid 9 30 6.9 (1.4) 6.9 (1.4) 5.6 (0.7) 7.0 (0.7) 7.0 (0.7) 7.0 (0.7) 7.3 (0.7) 7.3 (0.7) 7.3 (0.7) 7.5 (0.6) 7.1 (0.4) 7.2 (0.4)
Table C3: Number of samples. KNN has good performance for low dimensional manifolds even with few points. High dimensional datasets. For high dimensional datasets such as the M10 Cubics, the error in the estimated dimension can be very high, and the estimated dimension can even be negative (see the comment in Theorem 2.6). Variance. For the same reasons outlined in Theorem 2.6, the variance can be high for high dimensional datasets, such as the M10 Cubics. Hyperparameter dependency. The differences between different hyperparameter choices are small for most datasets, apart from high dimensional ones. Nonlinear datasets. KNN tends to underestimate on the nonlinear datasets, such as Mp2,3 Paraboloids and Mn1,2 Nonlinear.
DanCo
Samples 625 1250 2500 5000
Dataset d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1Sphere 10 11 10.7 (0.43) 10.7 (0.43) 10.8 (0.35) 10.7 (0.41) 11.0 (0.21) 10.9 (0.23) 10.9 (0.34) 11.0 (0.0) 11.0 (0.0) 10.5 (1.96) 11.0 (0.0) 11.0 (0.0)
M2Affine3to5 3 5 3.0 (0.1) 2.9 (0.08) 2.8 (0.08) 3.0 (0.07) 2.9 (0.06) 2.9 (0.08) 3.0 (0.04) 2.9 (0.05) 2.9 (0.05) 3.0 (0.04) 2.9 (0.04) 2.9 (0.02)
M3Nonlinear4to6 4 6 4.3 (0.07) 4.8 (0.31) 4.3 (0.07) 4.6 (0.18) 5.0 (0.39) 4.7 (0.26) 3.5 (1.65) 5.0 (0.2) 5.0 (0.2) 4.8 (0.19) 5.2 (0.33) 4.8 (0.19)
M4Nonlinear 4 8 4.5 (2.54) 5.1 (0.25) 4.8 (0.2) 3.6 (2.55) 5.1 (0.27) 4.9 (0.16) 4.0 (2.47) 5.5 (0.29) 5.5 (0.29) 3.8 (1.92) 5.4 (0.23) 5.1 (0.13)
M5aHelix1d 1 3 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0)
M5bHelix2d 2 3 1.9 (0.91) 3.0 (0.0) 3.0 (0.0) 2.1 (0.92) 3.0 (0.0) 3.0 (0.0) 1.9 (0.84) 3.0 (0.0) 3.0 (0.0) 2.1 (1.01) 3.4 (0.04) 3.4 (0.46)
M6Nonlinear 6 36 7.7 (0.33) 7.7 (0.33) 8.1 (0.22) 7.3 (0.3) 7.4 (0.34) 7.4 (0.3) 7.1 (0.05) 7.4 (0.22) 7.4 (0.22) 7.0 (0.02) 7.7 (0.22) 7.1 (0.11)
M7Roll 2 3 2.1 (0.78) 2.3 (0.02) 3.5 (0.14) 2.2 (0.02) 2.3 (0.02) 3.1 (0.03) 2.2 (0.01) 2.3 (0.01) 2.3 (0.01) 2.2 (0.01) 2.2 (0.01) 3.0 (0.24)
M8Nonlinear 12 72 18.0 (0.73) 18.0 (0.73) 18.3 (0.83) 17.4 (4.0) 17.8 (0.56) 17.9 (0.3) 12.4 (0.56) 17.3 (0.53) 17.3 (0.53) 12.3 (0.45) 16.9 (0.21) 17.0 (0.21)
M9Affine 20 20 19.7 (0.46) 19.6 (0.58) 19.3 (0.45) 20.0 (0.22) 19.5 (0.49) 18.5 (3.81) 19.9 (0.3) 19.3 (0.46) 19.3 (0.46) 19.9 (0.3) 18.2 (3.72) 19.1 (0.22)
M10aCubic 10 11 10.1 (0.3) 10.4 (0.48) 10.3 (0.45) 10.1 (0.29) 10.1 (0.29) 10.2 (0.35) 10.0 (0.02) 10.1 (0.29) 10.1 (0.29) 10.0 (0.01) 10.1 (0.29) 10.0 (0.01)
M10bCubic 17 18 17.2 (0.59) 17.4 (0.49) 17.2 (0.59) 17.0 (0.21) 17.2 (0.47) 17.0 (0.21) 17.0 (0.01) 17.2 (0.36) 17.2 (0.36) 17.0 (0.01) 17.0 (0.01) 17.0 (0.01)
M10cCubic 24 25 23.9 (0.62) 24.6 (0.5) 23.9 (0.62) 23.9 (0.5) 24.2 (0.57) 24.1 (0.22) 24.0 (0.0) 24.0 (0.44) 24.0 (0.44) 24.0 (0.0) 24.0 (0.0) 24.0 (0.0)
M10dCubic 70 72 70.6 (0.8) 70.9 (0.36) 68.2 (2.01) 70.7 (0.48) 70.8 (0.6) 70.9 (0.3) 70.8 (0.6) 70.8 (0.6) 70.8 (0.6) 69.6 (1.12) 71.0 (0.0) 71.0 (0.0)
M11Moebius 2 3 2.3 (0.04) 2.7 (0.32) 3.6 (0.43) 1.7 (0.95) 2.3 (0.02) 3.2 (0.04) 2.2 (0.01) 2.3 (0.01) 2.3 (0.01) 2.2 (0.01) 2.3 (0.01) 3.1 (0.01)
M12Norm 20 20 19.9 (0.23) 19.9 (0.35) 19.9 (0.35) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0) 20.0 (0.0)
M13aScurve 2 3 2.2 (0.03) 2.3 (0.02) 3.5 (0.04) 2.2 (0.02) 2.3 (0.02) 3.0 (0.24) 2.2 (0.01) 2.3 (0.01) 2.3 (0.01) 2.2 (0.01) 2.2 (0.01) 3.0 (0.17)
M13bSpiral 1 13 1.2 (1.05) 2.2 (0.04) 2.3 (0.04) 1.2 (0.09) 1.4 (0.17) 2.1 (0.16) 1.0 (0.0) 1.0 (0.04) 1.0 (0.04) 1.0 (0.0) 1.0 (0.0) 1.0 (0.0)
Mbeta 10 40 7.1 (0.17) 6.3 (0.16) 6.3 (0.21) 7.6 (0.28) 6.7 (0.16) 6.4 (0.12) 8.1 (0.1) 7.0 (0.03) 7.0 (0.03) 8.8 (0.28) 7.3 (0.1) 7.0 (0.02)
Mn1Nonlinear 18 72 17.9 (0.66) 17.9 (0.66) 17.7 (0.53) 17.9 (0.38) 17.7 (0.53) 17.7 (0.43) 17.9 (0.38) 17.9 (0.38) 17.9 (0.38) 17.8 (0.35) 17.7 (0.43) 17.8 (0.35)
Mn2Nonlinear 24 96 24.0 (0.73) 24.0 (0.73) 23.7 (0.72) 24.1 (0.53) 23.9 (0.76) 24.1 (0.53) 23.8 (0.47) 23.8 (0.62) 23.8 (0.62) 24.0 (0.44) 24.0 (0.44) 23.8 (0.35)
Mp1Paraboloid 3 12 3.1 (0.21) 3.3 (0.2) 3.1 (0.21) 3.2 (0.12) 3.3 (0.12) 3.2 (0.14) 3.1 (0.08) 3.2 (0.09) 3.2 (0.09) 3.1 (0.04) 3.2 (0.07) 3.2 (0.07)
Mp2Paraboloid 6 21 6.3 (0.4) 5.7 (0.2) 4.8 (0.16) 5.9 (0.13) 5.9 (0.13) 5.6 (0.12) 5.9 (0.12) 6.3 (0.17) 6.3 (0.17) 6.0 (0.03) 6.6 (0.17) 6.2 (0.11)
Mp3Paraboloid 9 30 8.4 (9.19) 6.2 (0.17) 5.6 (0.19) 8.0 (0.11) 7.0 (0.11) 6.6 (0.19) 8.9 (0.21) 7.7 (0.21) 7.7 (0.21) 9.1 (0.16) 8.0 (0.01) 7.7 (0.16)
Table C4: Number of samples. Fairly accurate even with few points, except on some datasets with non-linearities, such as M3 Nonlinear 4to6, M4 Nonlinear, and M13b Spiral. High dimensional datasets. Performs very well on the Cubic datasets, even with few samples, and also on M12 Norm. Variance. Moderate variance, even with many point samples, on some datasets such as M3 Nonlinear 4to6 and M4 Nonlinear. Note that the variance can also depend on the choice of hyperparameters; see, for 5000 samples, M1 Sphere, M6 Nonlinear, and the Mp2,3 Paraboloids. Hyperparameter dependency. The estimate can vary depending on the hyperparameters on some datasets with nonlinearities and curvature, such as Mbeta, M5b Helix2d, M11 Moebius, M13a,b, Mp3 Paraboloid, and M4 and M8 Nonlinear. Interestingly, the hyperparameter dependency can be worse with more samples, for example on M8 Nonlinear. Nonlinear datasets. The performance can depend on the choice of hyperparameters on the nonlinear datasets. The performance is good on some datasets with non-uniform sampling densities such as M12 Norm and Mn1,2 Nonlinear, but can be challenged by datasets with curvature.
MLE
Samples 625 1250 2500 5000
Dataset d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1Sphere 10 11 9.9 (0.2) 9.2 (0.21) 9.9 (0.2) 10.1 (0.11) 9.3 (0.1) 10.1 (0.11) 10.3 (0.07) 9.5 (0.08) 10.3 (0.07) 9.7 (0.03) 9.7 (0.04) 9.7 (0.04)
M2Affine3to5 3 5 2.9 (0.02) 2.9 (0.06) 3.2 (0.04) 3.0 (0.02) 3.0 (0.04) 3.2 (0.03) 3.0 (0.02) 3.0 (0.02) 3.3 (0.02) 3.0 (0.02) 3.0 (0.02) 3.0 (0.02)
M3Nonlinear4to6 4 6 3.9 (0.05) 3.9 (0.08) 4.3 (0.09) 4.0 (0.05) 3.9 (0.05) 4.3 (0.06) 4.0 (0.04) 4.0 (0.04) 4.4 (0.04) 4.0 (0.02) 4.0 (0.02) 4.0 (0.02)
M4Nonlinear 4 8 4.1 (0.07) 4.3 (0.08) 4.7 (0.1) 4.0 (0.04) 4.1 (0.06) 4.5 (0.05) 4.0 (0.03) 4.1 (0.03) 4.4 (0.03) 4.0 (0.02) 4.1 (0.02) 4.1 (0.02)
M5aHelix1d 1 3 1.0 (0.02) 1.0 (0.02) 1.1 (0.01) 1.0 (0.01) 1.0 (0.01) 1.1 (0.01) 1.0 (0.0) 1.0 (0.01) 1.1 (0.01) 1.0 (0.0) 1.0 (0.01) 1.0 (0.01)
M5bHelix2d 2 3 2.6 (0.04) 2.8 (0.06) 3.1 (0.05) 2.6 (0.04) 2.7 (0.04) 3.1 (0.04) 2.4 (0.02) 2.5 (0.03) 2.9 (0.03) 2.2 (0.01) 2.3 (0.02) 2.3 (0.02)
M6Nonlinear 6 36 6.6 (0.15) 7.0 (0.17) 7.7 (0.16) 6.3 (0.1) 6.7 (0.11) 7.3 (0.11) 6.2 (0.05) 6.5 (0.06) 7.1 (0.05) 6.0 (0.04) 6.3 (0.06) 6.3 (0.06)
M7Roll 2 3 2.0 (0.04) 2.0 (0.04) 2.2 (0.03) 2.0 (0.01) 2.0 (0.02) 2.2 (0.02) 2.0 (0.01) 2.0 (0.02) 2.2 (0.02) 2.0 (0.01) 2.0 (0.01) 2.0 (0.01)
M8Nonlinear 12 72 12.0 (0.08) 13.9 (0.24) 15.1 (0.22) 12.5 (0.05) 14.1 (0.19) 15.4 (0.21) 13.0 (0.04) 14.1 (0.13) 15.4 (0.15) 13.3 (0.03) 14.2 (0.07) 14.2 (0.07)
M9Affine 20 20 15.8 (0.24) 14.6 (0.24) 15.8 (0.24) 16.2 (0.19) 15.0 (0.23) 16.2 (0.19) 16.6 (0.15) 15.4 (0.17) 16.6 (0.15) 17.0 (0.12) 15.7 (0.11) 15.7 (0.11)
M10aCubic 10 11 9.5 (0.13) 8.8 (0.14) 9.5 (0.13) 9.7 (0.14) 9.0 (0.15) 9.7 (0.14) 10.0 (0.06) 9.2 (0.07) 10.0 (0.06) 10.1 (0.05) 9.3 (0.06) 9.3 (0.06)
M10bCubic 17 18 14.4 (0.23) 13.4 (0.22) 14.4 (0.23) 14.8 (0.15) 13.6 (0.17) 14.8 (0.15) 15.2 (0.15) 14.0 (0.16) 15.2 (0.15) 15.4 (0.1) 14.3 (0.1) 14.3 (0.1)
M10cCubic 24 25 18.4 (0.28) 17.1 (0.25) 18.4 (0.28) 19.3 (0.24) 17.8 (0.21) 19.3 (0.24) 19.8 (0.17) 18.3 (0.17) 19.8 (0.17) 20.2 (0.1) 18.7 (0.14) 18.7 (0.14)
M10dCubic 70 72 37.8 (0.74) 35.2 (0.56) 37.8 (0.74) 39.8 (0.51) 36.9 (0.47) 39.8 (0.51) 41.6 (0.34) 38.5 (0.4) 41.6 (0.34) 43.2 (0.32) 39.9 (0.24) 39.9 (0.24)
M11Moebius 2 3 2.0 (0.03) 2.1 (0.04) 2.2 (0.03) 2.0 (0.02) 2.0 (0.03) 2.2 (0.02) 2.0 (0.01) 2.1 (0.02) 2.2 (0.02) 2.0 (0.01) 2.1 (0.01) 2.1 (0.01)
M12Norm 20 20 16.5 (0.3) 15.3 (0.32) 16.5 (0.3) 17.4 (0.15) 16.0 (0.19) 17.4 (0.15) 18.0 (0.16) 16.6 (0.13) 18.0 (0.16) 18.6 (0.1) 17.2 (0.13) 17.2 (0.13)
M13aScurve 2 3 2.0 (0.04) 2.0 (0.04) 2.2 (0.03) 2.0 (0.03) 2.0 (0.03) 2.2 (0.02) 2.0 (0.01) 2.0 (0.02) 2.2 (0.01) 2.0 (0.01) 2.1 (0.01) 2.1 (0.01)
M13bSpiral 1 13 1.4 (0.01) 1.8 (0.05) 2.0 (0.04) 1.5 (0.03) 1.6 (0.03) 2.0 (0.02) 1.1 (0.01) 1.1 (0.01) 1.3 (0.01) 1.0 (0.0) 1.0 (0.01) 1.0 (0.01)
Mbeta 10 40 6.7 (0.11) 6.0 (0.13) 6.7 (0.11) 7.0 (0.08) 6.3 (0.1) 7.0 (0.08) 7.1 (0.04) 6.4 (0.05) 7.1 (0.04) 7.3 (0.06) 6.6 (0.05) 6.6 (0.05)
Mn1Nonlinear 18 72 14.7 (0.23) 13.6 (0.26) 14.7 (0.23) 15.1 (0.14) 14.0 (0.17) 15.1 (0.14) 15.6 (0.13) 14.4 (0.13) 15.6 (0.13) 15.8 (0.11) 14.6 (0.1) 14.6 (0.1)
Mn2Nonlinear 24 96 18.2 (0.22) 16.9 (0.17) 18.2 (0.22) 18.9 (0.21) 17.5 (0.16) 18.9 (0.21) 19.5 (0.18) 18.1 (0.19) 19.5 (0.18) 20.0 (0.15) 18.5 (0.15) 18.5 (0.15)
Mp1Paraboloid 3 12 2.9 (0.05) 2.9 (0.05) 3.2 (0.04) 3.0 (0.06) 3.0 (0.06) 3.2 (0.04) 3.0 (0.03) 3.0 (0.03) 3.3 (0.02) 3.0 (0.01) 3.0 (0.02) 3.0 (0.02)
Mp2Paraboloid 6 21 5.2 (0.09) 4.9 (0.08) 5.2 (0.09) 5.5 (0.06) 5.1 (0.06) 5.5 (0.06) 5.7 (0.04) 5.3 (0.04) 5.7 (0.04) 5.9 (0.03) 5.5 (0.03) 5.5 (0.03)
Mp3Paraboloid 9 30 6.4 (0.13) 6.0 (0.14) 6.4 (0.13) 7.0 (0.09) 6.5 (0.09) 7.0 (0.09) 7.5 (0.08) 6.9 (0.06) 7.5 (0.08) 7.8 (0.06) 7.2 (0.06) 7.2 (0.06)
Table C5: Number of samples. Accurate on some low dimensional datasets with few samples; needs more samples on those with challenging geometries such as M5b Helix2d, M6 Nonlinear, and M13b Spiral. High dimensional datasets. Performs poorly, and tends to underestimate on the M9 Affine and M10c,d Cubic datasets. Variance. Generally low variance and good consistency between samples. Hyperparameter dependency. The estimate is mostly insensitive to the choice of neighbourhood size, apart from some nonlinear datasets such as M8 Nonlinear and Mn1,2 Nonlinear, and moderately high dimensional ones such as M9 Affine, M12 Norm, and the M10 Cubic datasets. Nonlinear datasets. There is a tendency to underestimate on nonlinear datasets such as Mn1,2 Nonlinear and Mp3 Paraboloid.
MiND ML
Samples 625 1250 2500 5000
Dataset d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1Sphere 10 11 9.3 (0.52) 9.3 (0.52) 9.2 (0.62) 9.4 (0.5) 9.4 (0.24) 9.4 (0.24) 9.4 (0.5) 9.4 (0.21) 9.4 (0.21) 9.6 (0.48) 9.6 (0.48) 9.5 (0.11)
M2 Affine 3to5 3 5 3.0 (0.0) 3.0 (0.11) 3.0 (0.0) 3.0 (0.0) 3.0 (0.12) 3.0 (0.12) 3.0 (0.0) 3.0 (0.07) 3.0 (0.07) 3.0 (0.0) 3.0 (0.0) 2.9 (0.05)
M3 Nonlinear 4to6 4 6 4.0 (0.0) 3.8 (0.18) 4.0 (0.22) 4.0 (0.0) 3.8 (0.14) 3.8 (0.14) 4.0 (0.0) 3.9 (0.1) 3.9 (0.1) 4.0 (0.0) 4.0 (0.0) 3.9 (0.08)
M4 Nonlinear 4 8 4.0 (0.0) 3.9 (0.22) 4.0 (0.0) 4.0 (0.0) 3.9 (0.13) 3.9 (0.13) 4.0 (0.0) 3.9 (0.1) 3.9 (0.1) 4.0 (0.0) 4.0 (0.0) 3.9 (0.08)
M5a Helix1d 1 3 1.0 (0.0) 1.0 (0.03) 1.0 (0.0) 1.0 (0.0) 1.0 (0.01) 1.0 (0.01) 1.0 (0.0) 1.0 (0.01) 1.0 (0.01) 1.0 (0.0) 1.0 (0.0) 1.0 (0.01)
M5b Helix2d 2 3 2.0 (0.22) 2.3 (0.12) 2.0 (0.22) 2.0 (0.0) 2.1 (0.07) 2.1 (0.07) 2.0 (0.0) 2.0 (0.05) 2.0 (0.05) 2.0 (0.0) 2.0 (0.0) 2.0 (0.02)
M6 Nonlinear 6 36 6.1 (0.44) 6.2 (0.32) 6.1 (0.44) 6.0 (0.0) 6.0 (0.22) 6.0 (0.22) 6.0 (0.0) 5.9 (0.15) 5.9 (0.15) 6.0 (0.0) 6.0 (0.0) 5.9 (0.1)
M7 Roll 2 3 2.0 (0.0) 2.0 (0.05) 2.0 (0.0) 2.0 (0.0) 2.0 (0.06) 2.0 (0.06) 2.0 (0.0) 2.0 (0.03) 2.0 (0.03) 2.0 (0.0) 2.0 (0.0) 2.0 (0.02)
M8 Nonlinear 12 72 13.2 (0.43) 13.4 (0.74) 13.4 (0.8) 13.5 (0.67) 10.0 (0.0) 13.5 (0.59) 13.2 (0.4) 13.3 (0.35) 10.0 (0.0) 13.0 (0.22) 10.0 (0.0) 10.0 (0.0)
M9 Affine 20 20 15.2 (0.69) 15.2 (0.69) 15.2 (0.75) 15.4 (0.51) 10.0 (0.0) 15.4 (0.51) 15.6 (0.37) 15.6 (0.37) 10.0 (0.0) 16.0 (0.22) 10.0 (0.0) 10.0 (0.0)
M10a Cubic 10 11 9.0 (0.35) 9.0 (0.35) 9.0 (0.38) 9.1 (0.34) 9.1 (0.31) 9.1 (0.34) 9.2 (0.14) 9.2 (0.14) 9.2 (0.14) 9.3 (0.12) 9.0 (0.22) 9.3 (0.12)
M10b Cubic 17 18 13.8 (0.75) 13.8 (0.62) 13.8 (0.75) 13.7 (0.56) 10.0 (0.0) 13.7 (0.41) 14.2 (0.35) 14.2 (0.35) 10.0 (0.0) 14.3 (0.29) 10.0 (0.0) 10.0 (0.0)
M10c Cubic 24 25 18.0 (0.77) 18.0 (0.66) 18.0 (0.77) 18.2 (0.58) 10.0 (0.0) 18.2 (0.58) 18.6 (0.49) 18.6 (0.49) 10.0 (0.0) 19.0 (0.22) 10.0 (0.0) 10.0 (0.0)
M10d Cubic 70 72 38.5 (1.9) 20.0 (0.0) 20.0 (0.0) 39.4 (1.07) 10.0 (0.0) 20.0 (0.0) 40.2 (0.84) 30.0 (0.0) 10.0 (0.0) 41.9 (0.7) 10.0 (0.0) 10.0 (0.0)
M11 Moebius 2 3 2.0 (0.0) 2.0 (0.05) 2.0 (0.0) 2.0 (0.0) 2.0 (0.04) 2.0 (0.04) 2.0 (0.0) 2.0 (0.02) 2.0 (0.02) 2.0 (0.0) 2.0 (0.0) 2.0 (0.02)
M12 Norm 20 20 16.2 (0.61) 16.2 (0.61) 16.2 (0.75) 16.8 (0.52) 10.0 (0.0) 16.8 (0.52) 17.2 (0.43) 17.2 (0.31) 10.0 (0.0) 17.5 (0.27) 10.0 (0.0) 10.0 (0.0)
M13a Scurve 2 3 2.0 (0.0) 2.0 (0.05) 2.0 (0.0) 2.0 (0.0) 2.0 (0.05) 2.0 (0.05) 2.0 (0.0) 2.0 (0.03) 2.0 (0.03) 2.0 (0.0) 2.0 (0.0) 2.0 (0.02)
M13b Spiral 1 13 1.0 (0.0) 1.1 (0.05) 1.0 (0.0) 1.0 (0.0) 1.0 (0.02) 1.0 (0.02) 1.0 (0.0) 1.0 (0.01) 1.0 (0.01) 1.0 (0.0) 1.0 (0.0) 1.0 (0.01)
Mbeta 10 40 6.2 (0.27) 6.2 (0.27) 6.2 (0.36) 6.3 (0.21) 6.3 (0.21) 6.3 (0.21) 6.6 (0.5) 6.5 (0.17) 6.5 (0.17) 6.8 (0.36) 6.8 (0.36) 6.6 (0.11)
Mn1 Nonlinear 18 72 14.0 (0.61) 14.0 (0.61) 13.9 (0.7) 14.4 (0.48) 10.0 (0.0) 14.3 (0.42) 14.6 (0.35) 14.6 (0.35) 10.0 (0.0) 14.8 (0.36) 10.0 (0.0) 10.0 (0.0)
Mn2 Nonlinear 24 96 17.8 (0.83) 17.7 (0.83) 17.8 (0.83) 18.3 (0.71) 10.0 (0.0) 18.2 (0.69) 18.3 (0.54) 18.3 (0.54) 10.0 (0.0) 18.8 (0.51) 10.0 (0.0) 10.0 (0.0)
Mp1 Paraboloid 3 12 3.0 (0.0) 2.9 (0.13) 3.0 (0.0) 3.0 (0.0) 3.0 (0.07) 3.0 (0.07) 3.0 (0.0) 3.0 (0.09) 3.0 (0.09) 3.0 (0.0) 3.0 (0.0) 3.0 (0.04)
Mp2 Paraboloid 6 21 5.1 (0.18) 5.1 (0.18) 5.0 (0.22) 5.3 (0.19) 5.3 (0.19) 5.3 (0.19) 5.4 (0.17) 5.4 (0.17) 5.4 (0.17) 5.5 (0.08) 5.4 (0.5) 5.5 (0.08)
Mp3 Paraboloid 9 30 6.6 (0.33) 6.6 (0.33) 6.5 (0.5) 7.0 (0.0) 7.0 (0.21) 7.0 (0.21) 7.2 (0.17) 7.2 (0.17) 7.2 (0.17) 7.4 (0.13) 7.4 (0.48) 7.4 (0.13)
Table C6: Number of samples. On most datasets, especially those of small dimension (< 7), MiND ML is quite accurate even with few samples; for higher dimensional datasets where the estimate is inaccurate, adding more points does not appear to improve the estimation significantly. High dimensional datasets. MiND ML severely underestimates high dimensional datasets, such as M9 Affine and the M10a,b,c,d Cubics. Variance. The variance is low in general, though higher dimensional datasets such as the M10b,c,d Cubics, and non-linear datasets such as Mn1,2 Nonlinear, can induce higher variances. Hyperparameter dependency. On low dimensional datasets, MiND ML has good agreement between the Best, med abs, and med rel estimates, yet on higher dimensional datasets such as the M10b,c,d Cubics, and nonlinear datasets such as M8 Nonlinear, a poor choice of the number of neighbours parameter can lead to a vast under-estimation of the dimension. Nonlinear datasets. MiND ML tends to underestimate on the Mp2,3 Paraboloids, as well as Mn1,2 Nonlinear.
GRIDE
Samples 625 1250 2500 5000
Dataset d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1 Sphere 10 11 9.2 (0.52) 9.2 (0.52) 9.2 (0.52) 9.4 (0.24) 9.4 (0.24) 9.4 (0.24) 9.4 (0.2) 9.4 (0.2) 9.4 (0.2) 9.5 (0.11) 9.5 (0.11) 9.5 (0.11)
M2 Affine 3to5 3 5 2.9 (0.12) 2.9 (0.12) 2.9 (0.12) 2.9 (0.12) 2.9 (0.12) 2.9 (0.12) 2.9 (0.07) 2.9 (0.07) 2.9 (0.07) 2.9 (0.05) 2.9 (0.05) 2.9 (0.05)
M3 Nonlinear 4to6 4 6 3.8 (0.18) 3.8 (0.18) 3.8 (0.18) 3.8 (0.07) 3.8 (0.14) 3.8 (0.14) 3.9 (0.1) 3.9 (0.1) 3.9 (0.1) 3.9 (0.07) 3.9 (0.08) 3.9 (0.08)
M4 Nonlinear 4 8 4.0 (0.11) 3.9 (0.22) 3.9 (0.22) 4.0 (0.05) 3.9 (0.13) 3.9 (0.13) 4.0 (0.04) 3.9 (0.1) 3.9 (0.1) 4.0 (0.01) 3.9 (0.08) 3.9 (0.08)
M5a Helix1d 1 3 1.0 (0.01) 1.0 (0.03) 1.0 (0.03) 1.0 (0.01) 1.0 (0.02) 1.0 (0.02) 1.0 (0.0) 1.0 (0.02) 1.0 (0.02) 1.0 (0.0) 1.0 (0.02) 1.0 (0.02)
M5b Helix2d 2 3 2.3 (0.11) 2.3 (0.11) 2.3 (0.11) 2.1 (0.09) 2.1 (0.09) 2.1 (0.09) 2.0 (0.06) 2.0 (0.06) 2.0 (0.06) 2.0 (0.02) 2.0 (0.04) 2.0 (0.04)
M6 Nonlinear 6 36 6.2 (0.32) 6.2 (0.32) 6.2 (0.32) 6.0 (0.22) 6.0 (0.22) 6.0 (0.22) 6.0 (0.09) 5.9 (0.15) 5.9 (0.15) 6.0 (0.06) 5.9 (0.1) 5.9 (0.1)
M7 Roll 2 3 2.0 (0.02) 2.0 (0.08) 2.0 (0.08) 2.0 (0.09) 2.0 (0.09) 2.0 (0.09) 2.0 (0.05) 2.0 (0.05) 2.0 (0.05) 2.0 (0.04) 2.0 (0.04) 2.0 (0.04)
M8 Nonlinear 12 72 12.1 (0.09) 13.4 (0.74) 13.4 (0.74) 12.1 (0.05) 13.5 (0.59) 13.5 (0.59) 12.5 (0.04) 13.3 (0.35) 13.3 (0.35) 13.0 (0.02) 13.2 (0.22) 13.2 (0.22)
M9 Affine 20 20 15.2 (0.69) 15.2 (0.69) 15.2 (0.69) 15.4 (0.51) 15.4 (0.51) 15.4 (0.51) 15.6 (0.37) 15.6 (0.37) 15.6 (0.37) 15.9 (0.25) 15.9 (0.25) 15.9 (0.25)
M10a Cubic 10 11 9.0 (0.35) 9.0 (0.35) 9.0 (0.35) 9.1 (0.34) 9.1 (0.34) 9.1 (0.34) 9.2 (0.14) 9.2 (0.14) 9.2 (0.14) 9.3 (0.12) 9.3 (0.12) 9.3 (0.12)
M10b Cubic 17 18 13.8 (0.62) 13.8 (0.62) 13.8 (0.62) 13.7 (0.41) 13.7 (0.41) 13.7 (0.41) 14.2 (0.35) 14.2 (0.35) 14.2 (0.35) 14.3 (0.29) 14.3 (0.29) 14.3 (0.29)
M10c Cubic 24 25 17.9 (0.65) 17.9 (0.65) 17.9 (0.65) 18.2 (0.58) 18.2 (0.58) 18.2 (0.58) 18.6 (0.49) 18.6 (0.49) 18.6 (0.49) 18.9 (0.2) 18.9 (0.2) 18.9 (0.2)
M10d Cubic 70 72 38.4 (1.9) 38.4 (1.9) 38.4 (1.9) 39.3 (0.96) 39.3 (0.96) 39.3 (0.96) 40.1 (0.84) 40.1 (0.84) 40.1 (0.84) 41.8 (0.61) 41.8 (0.61) 41.8 (0.61)
M11 Moebius 2 3 2.0 (0.07) 2.0 (0.07) 2.0 (0.07) 2.0 (0.02) 2.0 (0.08) 2.0 (0.08) 2.0 (0.01) 2.0 (0.04) 2.0 (0.04) 2.0 (0.01) 2.0 (0.03) 2.0 (0.03)
M12 Norm 20 20 16.2 (0.6) 16.2 (0.6) 16.2 (0.6) 16.8 (0.52) 16.8 (0.52) 16.8 (0.52) 17.2 (0.31) 17.2 (0.31) 17.2 (0.31) 17.5 (0.27) 17.5 (0.27) 17.5 (0.27)
M13a Scurve 2 3 2.0 (0.06) 2.0 (0.07) 2.0 (0.07) 2.0 (0.08) 2.0 (0.08) 2.0 (0.08) 2.0 (0.05) 2.0 (0.05) 2.0 (0.05) 2.0 (0.04) 2.0 (0.04) 2.0 (0.04)
M13b Spiral 1 13 1.1 (0.05) 1.1 (0.05) 1.1 (0.05) 1.0 (0.02) 1.0 (0.02) 1.0 (0.02) 1.0 (0.01) 1.0 (0.02) 1.0 (0.02) 1.0 (0.0) 1.0 (0.02) 1.0 (0.02)
Mbeta 10 40 6.2 (0.27) 6.2 (0.27) 6.2 (0.27) 6.3 (0.21) 6.3 (0.21) 6.3 (0.21) 6.5 (0.17) 6.5 (0.17) 6.5 (0.17) 6.6 (0.11) 6.6 (0.11) 6.6 (0.11)
Mn1 Nonlinear 18 72 14.0 (0.61) 14.0 (0.61) 14.0 (0.61) 14.3 (0.42) 14.3 (0.42) 14.3 (0.42) 14.6 (0.35) 14.6 (0.35) 14.6 (0.35) 14.7 (0.2) 14.7 (0.2) 14.7 (0.2)
Mn2 Nonlinear 24 96 17.7 (0.82) 17.7 (0.82) 17.7 (0.82) 18.2 (0.69) 18.2 (0.69) 18.2 (0.69) 18.3 (0.54) 18.3 (0.54) 18.3 (0.54) 18.7 (0.35) 18.7 (0.35) 18.7 (0.35)
Mp1 Paraboloid 3 12 2.9 (0.12) 2.9 (0.12) 2.9 (0.12) 3.0 (0.07) 3.0 (0.07) 3.0 (0.07) 2.9 (0.08) 2.9 (0.08) 2.9 (0.08) 3.0 (0.04) 3.0 (0.04) 3.0 (0.04)
Mp2 Paraboloid 6 21 5.1 (0.18) 5.1 (0.18) 5.1 (0.18) 5.3 (0.19) 5.3 (0.19) 5.3 (0.19) 5.4 (0.17) 5.4 (0.17) 5.4 (0.17) 5.5 (0.08) 5.5 (0.08) 5.5 (0.08)
Mp3 Paraboloid 9 30 6.6 (0.33) 6.6 (0.33) 6.6 (0.33) 7.0 (0.21) 7.0 (0.21) 7.0 (0.21) 7.2 (0.17) 7.2 (0.17) 7.2 (0.17) 7.4 (0.13) 7.4 (0.13) 7.4 (0.13)
Table C7: Number of samples. The estimator is accurate on low dimensional datasets; adding more samples does not dramatically improve performance. High dimensional datasets. Struggles with high dimensional datasets, such as M9 Affine and the M10b,c,d Cubics. Variance. The variance is at a moderately high level compared to other estimators. Hyperparameter dependency. In most cases (72/96), the simplest set of arguments (n1 = 1, n2 = 2) resulted in the best estimation. When (n1 = 1, n2 = 2), the method is equivalent to MLE with input from two nearest neighbour distances (recalled below). Non-linear datasets. This method tends to underestimate on Mbeta, Mn1,2 Nonlinear, and the Mp2,3 Paraboloids.
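For reference, the equivalence just mentioned can be recalled as follows (a standard computation in our own notation, not a quotation of the GRIDE formulation): writing $\mu_i = r_{i,2}/r_{i,1}$ for the ratio of the second to the first nearest-neighbour distance of the $i$-th point and modelling the $\mu_i$ as i.i.d. with density $f(\mu) = d\,\mu^{-(d+1)}$ on $[1,\infty)$, the maximum likelihood estimate of $d$ is obtained from

```latex
\ell(d) = \sum_{i=1}^{N} \log\!\bigl(d\,\mu_i^{-(d+1)}\bigr)
        = N\log d - (d+1)\sum_{i=1}^{N}\log\mu_i,
\qquad
\frac{\partial \ell}{\partial d} = \frac{N}{d} - \sum_{i=1}^{N}\log\mu_i = 0
\;\Longrightarrow\;
\hat d = \frac{N}{\sum_{i=1}^{N}\log\mu_i},
```

so that, with only the first two neighbours as input, the estimate depends on the data only through the distance ratios $\mu_i$.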
CorrInt
Samples 625 1250 2500 5000
Dataset d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1Sphere 10 11 8.8 (0.17) 8.8 (0.17) 8.4 (0.38) 9.0 (0.12) 9.0 (0.12) 8.6 (0.16) 9.1 (0.06) 9.1 (0.06) 9.1 (0.06) 9.3 (0.06) 9.3 (0.06) 9.3 (0.06)
M2 Affine 3to5 3 5 2.8 (0.04) 2.8 (0.04) 2.8 (0.12) 2.9 (0.04) 2.9 (0.04) 2.8 (0.04) 2.9 (0.02) 2.9 (0.02) 2.9 (0.02) 2.9 (0.03) 2.9 (0.03) 2.9 (0.03)
M3 Nonlinear 4to6 4 6 3.5 (0.07) 3.5 (0.07) 3.4 (0.13) 3.6 (0.08) 3.6 (0.08) 3.5 (0.08) 3.6 (0.03) 3.6 (0.03) 3.6 (0.03) 3.7 (0.03) 3.7 (0.03) 3.7 (0.03)
M4 Nonlinear 4 8 3.9 (0.11) 3.8 (0.1) 3.9 (0.13) 3.8 (0.06) 3.8 (0.04) 3.8 (0.06) 3.8 (0.04) 3.8 (0.04) 3.8 (0.04) 3.8 (0.02) 3.8 (0.02) 3.8 (0.02)
M5a Helix1d 1 3 1.0 (0.02) 1.0 (0.02) 1.0 (0.04) 1.0 (0.01) 1.0 (0.01) 1.0 (0.02) 1.0 (0.01) 1.0 (0.01) 1.0 (0.01) 1.0 (0.01) 1.0 (0.01) 1.0 (0.01)
M5b Helix2d 2 3 2.3 (0.1) 2.7 (0.09) 2.4 (0.15) 2.3 (0.04) 2.7 (0.06) 2.3 (0.05) 2.4 (0.02) 2.4 (0.02) 2.4 (0.02) 2.1 (0.02) 2.1 (0.02) 2.1 (0.02)
M6 Nonlinear 6 36 6.0 (0.16) 6.0 (0.16) 6.1 (0.26) 6.0 (0.14) 5.8 (0.11) 6.0 (0.14) 6.0 (0.07) 5.8 (0.06) 5.8 (0.06) 5.8 (0.04) 5.7 (0.03) 5.7 (0.03)
M7 Roll 2 3 2.0 (0.04) 2.0 (0.04) 1.9 (0.09) 1.9 (0.03) 1.9 (0.03) 1.9 (0.03) 2.0 (0.05) 2.0 (0.02) 2.0 (0.02) 2.0 (0.03) 2.0 (0.02) 2.0 (0.02)
M8 Nonlinear 12 72 11.4 (0.28) 11.4 (0.28) 11.0 (0.49) 11.7 (0.2) 11.7 (0.2) 11.2 (0.2) 11.8 (0.14) 11.8 (0.14) 11.8 (0.14) 12.0 (0.06) 12.0 (0.06) 12.0 (0.06)
M9 Affine 20 20 13.5 (0.2) 13.5 (0.2) 12.7 (0.57) 13.9 (0.19) 13.9 (0.19) 13.1 (0.22) 14.4 (0.15) 14.4 (0.15) 14.4 (0.15) 14.8 (0.1) 14.8 (0.1) 14.8 (0.1)
M10a Cubic 10 11 8.5 (0.15) 8.5 (0.15) 8.1 (0.35) 8.7 (0.15) 8.7 (0.15) 8.3 (0.13) 8.9 (0.06) 8.9 (0.06) 8.9 (0.06) 9.1 (0.06) 9.1 (0.06) 9.1 (0.06)
M10b Cubic 17 18 12.6 (0.18) 12.6 (0.18) 11.9 (0.52) 12.9 (0.15) 12.9 (0.15) 12.2 (0.18) 13.4 (0.14) 13.4 (0.14) 13.4 (0.14) 13.7 (0.09) 13.7 (0.09) 13.7 (0.09)
M10c Cubic 24 25 15.9 (0.28) 15.9 (0.28) 15.0 (0.72) 16.6 (0.24) 16.6 (0.24) 15.4 (0.25) 17.1 (0.18) 17.1 (0.18) 17.1 (0.18) 17.7 (0.1) 17.7 (0.1) 17.7 (0.1)
M10d Cubic 70 72 31.2 (0.66) 31.2 (0.66) 29.0 (0.94) 33.0 (0.42) 33.0 (0.42) 30.0 (0.42) 34.6 (0.32) 34.6 (0.32) 34.6 (0.32) 36.0 (0.24) 36.0 (0.24) 36.0 (0.24)
M11 Moebius 2 3 2.0 (0.02) 2.0 (0.04) 2.1 (0.07) 2.0 (0.03) 2.0 (0.03) 2.0 (0.03) 2.0 (0.04) 2.0 (0.02) 2.0 (0.02) 2.0 (0.02) 2.0 (0.02) 2.0 (0.02)
M12 Norm 20 20 12.4 (0.25) 12.4 (0.25) 11.7 (0.35) 13.1 (0.15) 13.1 (0.15) 12.0 (0.2) 13.6 (0.17) 13.6 (0.17) 13.6 (0.17) 14.2 (0.11) 14.2 (0.11) 14.2 (0.11)
M13a Scurve 2 3 2.0 (0.04) 2.0 (0.04) 1.9 (0.09) 2.0 (0.03) 2.0 (0.03) 1.9 (0.03) 2.0 (0.02) 2.0 (0.02) 2.0 (0.02) 2.0 (0.01) 2.0 (0.01) 2.0 (0.01)
M13b Spiral 1 13 1.3 (0.04) 1.8 (0.07) 1.3 (0.04) 1.5 (0.03) 1.5 (0.03) 1.6 (0.03) 1.1 (0.01) 1.1 (0.01) 1.1 (0.01) 1.0 (0.01) 1.0 (0.01) 1.0 (0.01)
Mbeta 10 40 3.4 (0.18) 3.4 (0.18) 3.1 (0.19) 3.5 (0.12) 3.5 (0.12) 3.2 (0.13) 3.7 (0.11) 3.7 (0.11) 3.7 (0.11) 3.9 (0.08) 3.9 (0.08) 3.9 (0.08)
Mn1 Nonlinear 18 72 12.3 (0.28) 12.3 (0.28) 11.6 (0.5) 12.8 (0.15) 12.8 (0.15) 11.9 (0.19) 13.2 (0.09) 13.2 (0.09) 13.2 (0.09) 13.6 (0.09) 13.6 (0.09) 13.6 (0.09)
Mn2 Nonlinear 24 96 15.0 (0.23) 15.0 (0.23) 14.4 (0.45) 15.7 (0.19) 15.7 (0.19) 14.6 (0.19) 16.3 (0.14) 16.3 (0.14) 16.3 (0.14) 16.9 (0.14) 16.9 (0.14) 16.9 (0.14)
Mp1 Paraboloid 3 12 2.1 (0.11) 2.1 (0.07) 2.1 (0.11) 2.2 (0.1) 2.2 (0.08) 2.1 (0.06) 2.2 (0.06) 2.2 (0.06) 2.2 (0.06) 2.2 (0.05) 2.2 (0.07) 2.2 (0.07)
Mp2 Paraboloid 6 21 2.8 (0.22) 2.7 (0.21) 2.8 (0.19) 2.7 (0.16) 2.7 (0.16) 2.7 (0.16) 2.8 (0.12) 2.7 (0.09) 2.7 (0.09) 2.7 (0.08) 2.6 (0.08) 2.6 (0.08)
Mp3 Paraboloid 9 30 3.1 (0.3) 3.1 (0.29) 3.1 (0.3) 3.1 (0.19) 3.0 (0.22) 3.0 (0.17) 3.1 (0.14) 3.0 (0.13) 3.0 (0.13) 3.0 (0.12) 2.9 (0.13) 2.9 (0.13)
Table C8: Number of samples. In general, estimates do not improve greatly as the sample size is increased. There are some inaccuracies on low dimensional datasets even with high sample densities, such as Mp1 Paraboloid. High dimensional datasets. CorrInt struggles with the high dimensional M9 Affine and M10b,c,d Cubic datasets, with a strong bias to underestimate. Variance. The variance is generally low and decreases as the sample size increases across all datasets. Hyperparameter dependency. In general, CorrInt is insensitive to the choice of parameters. Nonlinear datasets. Can be challenged by datasets with non-uniform sampling density, such as Mn1,2 Nonlinear and Mbeta, as well as those with some curvature, such as the Mp1,2,3 Paraboloids.
WODCap
Samples 625 1250 2500 5000
Dataset d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1 Sphere 10 11 7.74 (0.08) 7.74 (0.08) 6.93 (0.0) 8.36 (0.07) 6.93 (0.0) 7.63 (0.04) 9.15 (0.0) 9.15 (0.0) 8.06 (0.03) 9.46 (0.03) 9.15 (0.0) 8.4 (0.02)
M2 Affine 3to5 3 5 3.04 (0.06) 3.72 (0.08) 3.42 (0.0) 3.15 (0.04) 3.42 (0.0) 3.45 (0.05) 3.19 (0.04) 3.42 (0.0) 3.55 (0.03) 3.24 (0.02) 3.42 (0.0) 3.61 (0.02)
M3 Nonlinear 4to6 4 6 4.0 (0.23) 4.84 (0.12) 4.3 (0.0) 4.04 (0.07) 4.3 (0.0) 4.31 (0.06) 3.95 (0.04) 4.3 (0.0) 4.48 (0.04) 4.03 (0.04) 4.3 (0.0) 4.61 (0.02)
M4 Nonlinear 4 8 4.3 (0.11) 6.61 (0.19) 5.43 (0.0) 4.22 (0.07) 5.43 (0.0) 5.01 (0.07) 4.17 (0.06) 5.43 (0.0) 4.93 (0.05) 4.19 (0.03) 5.43 (0.0) 4.89 (0.02)
M5a Helix1d 1 3 1.22 (0.07) 2.86 (0.34) 1.3 (0.18) 1.18 (0.05) 1.19 (0.0) 1.25 (0.04) 1.18 (0.03) 1.19 (0.0) 1.25 (0.02) 1.16 (0.04) 1.19 (0.0) 1.26 (0.02)
M5b Helix2d 2 3 3.22 (0.08) 4.16 (0.1) 3.46 (0.19) 3.15 (0.07) 3.55 (0.31) 3.7 (0.06) 3.14 (0.04) 3.42 (0.0) 3.63 (0.04) 2.95 (0.03) 3.46 (0.19) 3.7 (0.04)
M6 Nonlinear 6 36 5.89 (0.15) 8.36 (0.09) 6.93 (0.0) 5.88 (0.08) 7.05 (0.48) 7.23 (0.1) 5.82 (0.05) 6.93 (0.0) 7.31 (0.06) 5.77 (0.04) 6.93 (0.0) 7.22 (0.04)
M7 Roll 2 3 2.21 (0.04) 3.21 (0.17) 2.56 (0.26) 2.28 (0.06) 2.6 (0.22) 2.46 (0.05) 2.31 (0.04) 2.7 (0.0) 2.5 (0.03) 2.32 (0.0) 2.7 (0.0) 2.54 (0.02)
M8 Nonlinear 12 72 9.48 (0.14) 9.48 (0.14) 9.15 (0.0) 10.59 (0.09) 9.15 (0.0) 9.42 (0.11) 11.53 (0.07) 12.72 (1.19) 10.27 (0.06) 11.75 (0.02) 13.12 (0.0) 10.84 (0.03)
M9 Affine 20 20 10.09 (0.13) 10.09 (0.13) 9.15 (0.0) 11.02 (0.07) 9.15 (0.0) 10.12 (0.07) 13.12 (0.0) 13.12 (0.0) 10.84 (0.05) 13.31 (0.03) 13.12 (0.0) 11.37 (0.03)
M10a Cubic 10 11 8.38 (0.12) 8.38 (0.12) 6.93 (0.0) 9.15 (0.0) 9.15 (0.0) 8.07 (0.05) 9.41 (0.05) 9.15 (0.0) 8.47 (0.04) 10.0 (0.03) 9.15 (0.0) 8.71 (0.03)
M10b Cubic 17 18 9.81 (0.09) 9.81 (0.09) 9.15 (0.0) 10.73 (0.07) 9.15 (0.0) 9.81 (0.07) 11.72 (0.05) 10.14 (1.72) 10.45 (0.05) 13.12 (0.0) 13.12 (0.0) 10.89 (0.05)
M10c Cubic 24 25 10.54 (0.07) 10.54 (0.07) 9.15 (0.0) 12.92 (0.86) 12.92 (0.86) 10.65 (0.07) 13.12 (0.0) 13.12 (0.0) 11.34 (0.04) 14.3 (0.03) 13.12 (0.0) 11.8 (0.03)
M10d Cubic 70 72 13.12 (0.0) 11.76 (0.08) 13.12 (0.0) 13.42 (0.13) 13.12 (0.0) 12.03 (0.05) 16.68 (1.24) 13.12 (0.0) 12.52 (0.02) 17.09 (0.0) 13.12 (0.0) 12.75 (0.02)
M11 Moebius 2 3 2.07 (0.07) 3.45 (0.12) 2.7 (0.0) 2.32 (0.0) 2.7 (0.0) 2.64 (0.06) 2.32 (0.0) 2.7 (0.0) 2.62 (0.05) 2.32 (0.0) 2.7 (0.0) 2.6 (0.03)
M12 Norm 20 20 9.53 (0.15) 9.53 (0.15) 9.15 (0.0) 10.79 (0.07) 9.15 (0.0) 9.82 (0.09) 13.12 (0.0) 13.12 (0.0) 10.82 (0.07) 13.12 (0.0) 13.12 (0.0) 11.56 (0.03)
M13a Scurve 2 3 2.09 (0.08) 2.63 (0.06) 2.29 (0.27) 2.24 (0.06) 2.65 (0.18) 2.46 (0.04) 2.31 (0.03) 2.7 (0.0) 2.5 (0.02) 2.32 (0.0) 2.7 (0.0) 2.53 (0.02)
M13b Spiral 1 13 1.3 (0.06) 2.35 (0.05) 2.11 (0.0) 1.8 (0.04) 2.11 (0.0) 2.27 (0.04) 1.47 (0.28) 2.7 (0.0) 2.82 (0.02) 1.19 (0.03) 1.61 (0.0) 2.42 (0.05)
Mbeta 10 40 7.63 (0.19) 7.42 (0.21) 6.63 (0.6) 8.15 (0.12) 6.93 (0.0) 5.79 (0.12) 8.47 (0.09) 6.93 (0.0) 6.08 (0.07) 8.74 (0.06) 6.93 (0.0) 6.45 (0.05)
Mn1 Nonlinear 18 72 9.63 (0.15) 9.63 (0.15) 9.15 (0.0) 10.68 (0.05) 9.15 (0.0) 9.71 (0.05) 11.81 (1.81) 11.81 (1.81) 10.49 (0.05) 13.12 (0.0) 13.12 (0.0) 11.07 (0.03)
Mn2 Nonlinear 24 96 10.15 (0.14) 10.15 (0.14) 9.15 (0.0) 11.26 (0.05) 9.75 (1.42) 10.38 (0.07) 13.12 (0.0) 13.12 (0.0) 11.24 (0.04) 13.76 (0.05) 13.12 (0.0) 11.82 (0.02)
Mp1 Paraboloid 3 12 3.0 (0.08) 3.62 (0.08) 3.42 (0.0) 2.93 (0.05) 3.42 (0.0) 3.39 (0.04) 3.21 (0.04) 3.42 (0.0) 3.54 (0.03) 3.29 (0.02) 3.42 (0.0) 3.63 (0.02)
Mp2 Paraboloid 6 21 4.75 (0.09) 4.28 (0.13) 3.77 (0.43) 5.27 (0.07) 4.3 (0.0) 4.23 (0.05) 5.72 (0.05) 5.43 (0.0) 4.87 (0.05) 5.87 (0.04) 5.43 (0.0) 5.35 (0.03)
Mp3 Paraboloid 9 30 4.7 (0.1) 3.92 (0.09) 3.42 (0.0) 5.47 (0.07) 4.3 (0.0) 4.02 (0.07) 6.06 (0.07) 5.43 (0.0) 4.95 (0.06) 6.92 (0.04) 6.86 (0.33) 5.8 (0.02)
Table C9: Number of samples. Vulnerable even on M1 Sphere to insufficiently many samples, though the performance improves as the sample size increases. Biases can persist even as the number of samples increases. High dimensional datasets. Significant bias to underestimate datasets with dimension beyond 10, even with many samples. Variance. Generally very consistent with low variance, even with few samples. Hyperparameter dependency. Can be quite sensitive to the neighbourhood size parameter, even with many points and on relatively simple, low dimensional datasets such as M7 Roll and M13b Spiral. Nonlinear datasets. On some datasets there is a bias to underestimate, as demonstrated on Mn1,2 Nonlinear and the Mp2,3 Paraboloids.
TwoNN
Samples 625 1250 2500 5000
Dataset d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1Sphere 10 11 9.2 (0.54) 9.2 (0.54) 9.2 (0.54) 9.4 (0.26) 9.4 (0.26) 9.4 (0.26) 9.4 (0.21) 9.4 (0.21) 9.4 (0.21) 9.5 (0.12) 9.5 (0.12) 9.5 (0.12)
M2 Affine 3to5 3 5 2.9 (0.13) 2.9 (0.13) 2.9 (0.13) 2.9 (0.12) 2.9 (0.12) 2.9 (0.12) 2.9 (0.09) 2.9 (0.09) 2.9 (0.09) 2.9 (0.05) 2.9 (0.05) 2.9 (0.05)
M3 Nonlinear4to6 4 6 3.8 (0.28) 3.8 (0.18) 3.8 (0.18) 3.8 (0.2) 3.8 (0.16) 3.8 (0.16) 3.9 (0.13) 3.9 (0.11) 3.9 (0.11) 3.9 (0.1) 3.9 (0.08) 3.9 (0.08)
M4 Nonlinear 4 8 4.0 (0.22) 3.9 (0.24) 3.9 (0.24) 4.0 (0.15) 3.9 (0.14) 3.9 (0.14) 3.9 (0.13) 3.9 (0.1) 3.9 (0.1) 3.9 (0.08) 3.9 (0.08) 3.9 (0.08)
M5a Helix1d 1 3 1.0 (0.06) 1.0 (0.05) 1.0 (0.05) 1.0 (0.03) 1.0 (0.04) 1.0 (0.04) 1.0 (0.03) 1.0 (0.03) 1.0 (0.03) 1.0 (0.02) 1.0 (0.02) 1.0 (0.02)
M5b Helix2d 2 3 2.2 (0.14) 2.2 (0.14) 2.2 (0.14) 2.1 (0.08) 2.1 (0.08) 2.1 (0.08) 2.0 (0.06) 2.0 (0.07) 2.0 (0.07) 2.0 (0.05) 2.0 (0.04) 2.0 (0.04)
M6 Nonlinear 6 36 6.2 (0.31) 6.2 (0.31) 6.2 (0.31) 6.0 (0.24) 6.0 (0.25) 6.0 (0.25) 6.0 (0.16) 5.9 (0.15) 5.9 (0.15) 5.9 (0.15) 5.9 (0.1) 5.9 (0.1)
M7 Roll 2 3 2.0 (0.1) 2.0 (0.1) 2.0 (0.1) 2.0 (0.1) 2.0 (0.1) 2.0 (0.1) 2.0 (0.06) 2.0 (0.06) 2.0 (0.06) 2.0 (0.03) 2.0 (0.04) 2.0 (0.04)
M8 Nonlinear 12 72 13.4 (0.83) 13.4 (0.83) 13.4 (0.83) 13.5 (0.59) 13.5 (0.59) 13.5 (0.59) 13.2 (0.35) 13.2 (0.35) 13.2 (0.35) 13.2 (0.23) 13.2 (0.23) 13.2 (0.23)
M9 Affine 20 20 15.2 (1.03) 15.1 (0.7) 15.1 (0.7) 15.3 (0.54) 15.3 (0.54) 15.3 (0.54) 15.6 (0.41) 15.6 (0.41) 15.6 (0.41) 15.9 (0.27) 15.9 (0.27) 15.9 (0.27)
M10a Cubic 10 11 9.0 (0.42) 9.0 (0.38) 9.0 (0.38) 9.1 (0.35) 9.1 (0.34) 9.1 (0.34) 9.2 (0.16) 9.2 (0.15) 9.2 (0.15) 9.2 (0.14) 9.2 (0.14) 9.2 (0.14)
M10b Cubic 17 18 13.6 (0.6) 13.6 (0.6) 13.6 (0.6) 13.6 (0.44) 13.6 (0.44) 13.6 (0.44) 14.2 (0.36) 14.2 (0.37) 14.2 (0.37) 14.3 (0.29) 14.3 (0.29) 14.3 (0.29)
M10c Cubic 24 25 17.8 (0.72) 17.8 (0.72) 17.8 (0.72) 18.1 (0.62) 18.1 (0.62) 18.1 (0.62) 18.5 (0.5) 18.5 (0.5) 18.5 (0.5) 18.9 (0.21) 18.9 (0.21) 18.9 (0.21)
M10d Cubic 70 72 38.1 (1.88) 38.1 (1.88) 38.1 (1.88) 39.2 (1.07) 39.2 (1.07) 39.2 (1.07) 39.9 (0.81) 39.9 (0.82) 39.9 (0.82) 41.7 (0.6) 41.7 (0.6) 41.7 (0.6)
M11 Moebius 2 3 2.0 (0.08) 2.0 (0.08) 2.0 (0.08) 2.0 (0.08) 2.0 (0.09) 2.0 (0.09) 2.0 (0.05) 2.0 (0.05) 2.0 (0.05) 2.0 (0.04) 2.0 (0.04) 2.0 (0.04)
M12 Norm 20 20 16.1 (0.65) 16.1 (0.65) 16.1 (0.65) 16.7 (0.56) 16.7 (0.56) 16.7 (0.56) 17.2 (0.29) 17.2 (0.29) 17.2 (0.29) 17.4 (0.29) 17.4 (0.29) 17.4 (0.29)
M13a Scurve 2 3 2.0 (0.09) 2.0 (0.09) 2.0 (0.09) 2.0 (0.09) 2.0 (0.1) 2.0 (0.1) 2.0 (0.06) 2.0 (0.06) 2.0 (0.06) 2.0 (0.03) 2.0 (0.04) 2.0 (0.04)
M13b Spiral 1 13 1.0 (0.06) 1.0 (0.06) 1.0 (0.06) 1.0 (0.03) 1.0 (0.04) 1.0 (0.04) 1.0 (0.03) 1.0 (0.03) 1.0 (0.03) 1.0 (0.02) 1.0 (0.02) 1.0 (0.02)
Mbeta 10 40 6.3 (0.44) 6.2 (0.24) 6.2 (0.24) 6.4 (0.27) 6.2 (0.23) 6.2 (0.23) 6.7 (0.24) 6.5 (0.18) 6.5 (0.18) 6.7 (0.16) 6.6 (0.12) 6.6 (0.12)
Mn1 Nonlinear 18 72 13.9 (0.67) 13.9 (0.67) 13.9 (0.67) 14.2 (0.44) 14.2 (0.44) 14.2 (0.44) 14.5 (0.34) 14.5 (0.34) 14.5 (0.34) 14.6 (0.22) 14.6 (0.22) 14.6 (0.22)
Mn2 Nonlinear 24 96 17.6 (0.79) 17.6 (0.79) 17.6 (0.79) 18.2 (0.68) 18.2 (0.68) 18.2 (0.68) 18.3 (0.54) 18.3 (0.54) 18.3 (0.54) 18.7 (0.39) 18.7 (0.39) 18.7 (0.39)
Mp1 Paraboloid 3 12 2.9 (0.21) 2.8 (0.13) 2.8 (0.13) 3.0 (0.09) 3.0 (0.09) 3.0 (0.09) 3.0 (0.11) 2.9 (0.09) 2.9 (0.09) 3.0 (0.08) 3.0 (0.05) 3.0 (0.05)
Mp2 Paraboloid 6 21 5.1 (0.4) 5.1 (0.17) 5.1 (0.17) 5.3 (0.22) 5.3 (0.22) 5.3 (0.22) 5.4 (0.18) 5.4 (0.18) 5.4 (0.18) 5.5 (0.08) 5.5 (0.08) 5.5 (0.08)
Mp3 Paraboloid 9 30 6.6 (0.36) 6.6 (0.36) 6.6 (0.36) 7.0 (0.21) 7.0 (0.21) 7.0 (0.21) 7.2 (0.17) 7.2 (0.17) 7.2 (0.17) 7.4 (0.13) 7.4 (0.13) 7.4 (0.13)
Table C10: Number of samples. Does not need many samples on low dimensional datasets for good accuracy. High dimensional datasets. Has a bias to underestimate on moderately high dimensional datasets, such as M9 Affine; the underestimation bias worsens with higher dimensions, such as on the M10b,c,d Cubic datasets. Variance. Moderate levels of variance that decrease as the number of samples increases. Hyperparameter dependency. There is only a single hyperparameter, the discard fraction, which is varied between 0.1, 0.25, 0.5, and 0.75; a single choice can achieve consistent performance across all datasets that is close to the best possible performance. The discard fraction for both the med abs and med rel columns is 0.1 for all n except 2500, for which it is 0.75. Nonlinear datasets. Struggles with non-uniform sampling, such as on the Mbeta and M12 Norm datasets, and also Mn1,2 Nonlinear. Challenged also by the higher dimensional Paraboloids Mp2,3.
TLE
Samples 625 1250 2500 5000
Dataset d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1Sphere 10 11 9.9 (0.14) 9.2 (0.11) 10.9 (0.2) 10.1 (0.09) 10.1 (0.09) 11.2 (0.12) 10.1 (0.05) 10.3 (0.06) 9.8 (0.05) 9.9 (0.04) 10.5 (0.05) 10.5 (0.05)
M2 Affine 3to5 3 5 3.0 (0.03) 3.0 (0.02) 3.4 (0.04) 3.0 (0.02) 3.2 (0.04) 3.4 (0.03) 3.0 (0.01) 3.2 (0.02) 3.1 (0.01) 3.0 (0.01) 3.3 (0.02) 3.3 (0.02)
M3 Nonlinear 4to6 4 6 4.1 (0.05) 3.9 (0.04) 4.5 (0.09) 4.0 (0.05) 4.2 (0.06) 4.6 (0.06) 4.0 (0.02) 4.3 (0.04) 4.1 (0.03) 4.0 (0.01) 4.3 (0.03) 4.3 (0.03)
M4 Nonlinear 4 8 4.5 (0.07) 4.5 (0.07) 5.1 (0.09) 4.4 (0.03) 4.5 (0.05) 4.9 (0.05) 4.3 (0.02) 4.4 (0.03) 4.3 (0.03) 4.2 (0.01) 4.4 (0.02) 4.4 (0.02)
M5a Helix1d 1 3 1.2 (0.01) 1.2 (0.01) 1.2 (0.01) 1.1 (0.0) 1.2 (0.01) 1.2 (0.01) 1.1 (0.0) 1.1 (0.01) 1.1 (0.0) 1.1 (0.0) 1.1 (0.0) 1.1 (0.0)
M5b Helix2d 2 3 2.7 (0.03) 2.9 (0.03) 3.2 (0.05) 2.8 (0.01) 2.9 (0.05) 3.2 (0.04) 2.7 (0.02) 2.7 (0.02) 2.9 (0.02) 2.5 (0.01) 2.5 (0.01) 2.5 (0.01)
M6 Nonlinear 6 36 6.5 (0.08) 7.0 (0.11) 8.1 (0.16) 6.7 (0.05) 7.1 (0.1) 7.8 (0.11) 6.7 (0.04) 6.9 (0.05) 6.8 (0.06) 6.5 (0.02) 6.8 (0.04) 6.8 (0.04)
M7Roll 2 3 2.1 (0.02) 2.1 (0.02) 2.3 (0.02) 2.1 (0.01) 2.2 (0.02) 2.3 (0.02) 2.0 (0.01) 2.2 (0.02) 2.1 (0.01) 2.0 (0.0) 2.2 (0.01) 2.2 (0.01)
M8 Nonlinear 12 72 12.0 (0.12) 12.6 (0.15) 15.3 (0.19) 12.3 (0.08) 14.2 (0.16) 15.7 (0.2) 12.1 (0.05) 14.3 (0.13) 13.6 (0.1) 12.6 (0.03) 14.4 (0.06) 14.4 (0.06)
M9 Affine 20 20 16.6 (0.25) 13.6 (0.09) 16.6 (0.25) 17.1 (0.19) 15.5 (0.18) 17.1 (0.19) 17.6 (0.15) 16.0 (0.14) 14.9 (0.09) 18.1 (0.12) 16.4 (0.09) 16.4 (0.09)
M10a Cubic 10 11 9.7 (0.12) 8.7 (0.07) 10.4 (0.13) 9.9 (0.16) 9.7 (0.13) 10.6 (0.15) 9.9 (0.06) 9.9 (0.06) 9.3 (0.07) 10.0 (0.05) 10.0 (0.05) 10.0 (0.05)
M10b Cubic 17 18 15.4 (0.23) 12.7 (0.11) 15.4 (0.23) 15.8 (0.15) 14.3 (0.14) 15.8 (0.15) 16.4 (0.14) 14.8 (0.12) 13.8 (0.1) 16.7 (0.11) 15.1 (0.1) 15.1 (0.1)
M10c Cubic 24 25 19.2 (0.28) 15.8 (0.15) 19.2 (0.28) 20.2 (0.24) 18.4 (0.18) 20.2 (0.24) 20.9 (0.17) 18.9 (0.14) 17.5 (0.09) 21.5 (0.1) 19.4 (0.09) 19.4 (0.09)
M10d Cubic 70 72 37.1 (0.74) 29.6 (0.36) 37.1 (0.74) 39.4 (0.5) 35.6 (0.4) 39.4 (0.5) 41.3 (0.33) 37.4 (0.29) 33.9 (0.26) 43.2 (0.33) 38.9 (0.26) 38.9 (0.26)
M11 Moebius 2 3 2.2 (0.02) 2.2 (0.02) 2.4 (0.02) 2.2 (0.01) 2.2 (0.02) 2.3 (0.01) 2.1 (0.01) 2.2 (0.02) 2.1 (0.01) 2.1 (0.01) 2.2 (0.01) 2.2 (0.01)
M12 Norm 20 20 16.4 (0.26) 13.3 (0.1) 16.4 (0.26) 17.4 (0.15) 15.8 (0.14) 17.4 (0.15) 18.1 (0.16) 16.5 (0.09) 15.1 (0.12) 18.9 (0.09) 17.1 (0.08) 17.1 (0.08)
M13a Scurve 2 3 2.0 (0.02) 2.1 (0.02) 2.3 (0.02) 2.0 (0.01) 2.2 (0.02) 2.3 (0.02) 2.0 (0.0) 2.2 (0.01) 2.1 (0.01) 2.0 (0.0) 2.2 (0.01) 2.2 (0.01)
M13b Spiral 1 13 1.6 (0.01) 1.9 (0.02) 2.0 (0.03) 1.8 (0.03) 1.8 (0.03) 2.0 (0.02) 1.2 (0.01) 1.3 (0.01) 1.9 (0.02) 1.1 (0.0) 1.1 (0.0) 1.1 (0.0)
Mbeta 10 40 6.5 (0.09) 5.1 (0.08) 6.5 (0.09) 6.8 (0.07) 6.0 (0.07) 6.8 (0.07) 7.1 (0.05) 6.3 (0.05) 5.9 (0.04) 7.4 (0.05) 6.6 (0.05) 6.6 (0.05)
Mn1 Nonlinear 18 72 15.3 (0.21) 12.5 (0.13) 15.3 (0.21) 15.8 (0.12) 14.4 (0.11) 15.8 (0.12) 16.4 (0.13) 14.9 (0.11) 13.8 (0.05) 16.8 (0.11) 15.2 (0.09) 15.2 (0.09)
Mn2 Nonlinear 24 96 18.6 (0.21) 15.2 (0.15) 18.6 (0.21) 19.5 (0.21) 17.7 (0.17) 19.5 (0.21) 20.2 (0.17) 18.3 (0.14) 16.9 (0.09) 20.9 (0.14) 18.9 (0.12) 18.9 (0.12)
Mp1 Paraboloid 3 12 3.0 (0.02) 2.9 (0.03) 3.3 (0.04) 3.0 (0.02) 3.2 (0.04) 3.4 (0.04) 3.0 (0.02) 3.2 (0.02) 3.1 (0.01) 3.0 (0.01) 3.3 (0.01) 3.3 (0.01)
Mp2 Paraboloid 6 21 5.5 (0.09) 4.2 (0.08) 5.5 (0.09) 5.9 (0.05) 5.3 (0.06) 5.9 (0.05) 6.1 (0.03) 5.5 (0.04) 5.2 (0.03) 6.0 (0.03) 5.7 (0.03) 5.7 (0.03)
Mp3 Paraboloid 9 30 6.7 (0.14) 4.5 (0.13) 6.7 (0.14) 7.4 (0.09) 6.4 (0.08) 7.4 (0.09) 7.9 (0.07) 7.0 (0.08) 6.6 (0.06) 8.3 (0.06) 7.4 (0.05) 7.4 (0.05)
Table C11: Number of samples. On some tricky low dimensional datasets, such as M11 Moebius and M13b Spiral, more points are needed to converge to the correct dimension, but on most low dimensional datasets not many points are required to give an accurate estimate. High dimensional datasets. Like many other estimators, TLE underestimates on high dimensional datasets, such as M10b,c,d Cubic and M9 Affine. Variance. The variance is generally low compared to other estimators. Hyperparameter dependency. For most datasets there is a mild sensitivity to the choice of hyperparameter (neighbourhood size), but the dependency is more acute on nonlinear datasets such as M8 and Mn1,2 Nonlinear, M12 Norm, and Mbeta, and on high dimensional datasets, such as M9 Affine and M10a,b,c,d Cubic. Nonlinear datasets. Generally inaccurate, and underestimates on the non-linear datasets referred to above.
ESS
Samples 625 1250 2500 5000
Dataset d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1Sphere 10 11 10.0 (0.08) 10.2 (0.06) 10.2 (0.06) 10.0 (0.04) 10.1 (0.03) 10.1 (0.03) 10.0 (0.03) 10.1 (0.02) 10.2 (0.01) 10.0 (0.02) 10.1 (0.01) 10.1 (0.01)
M2 Affine 3to5 3 5 3.0 (0.02) 2.8 (0.02) 2.9 (0.01) 3.0 (0.01) 2.9 (0.01) 2.9 (0.01) 3.0 (0.01) 2.9 (0.01) 2.9 (0.01) 3.0 (0.01) 3.0 (0.01) 2.9 (0.0)
M3 Nonlinear4to6 4 6 4.0 (0.05) 3.8 (0.05) 4.0 (0.05) 4.0 (0.04) 3.9 (0.04) 3.9 (0.03) 4.0 (0.02) 3.9 (0.02) 3.9 (0.02) 4.0 (0.02) 4.1 (0.01) 3.9 (0.01)
M4 Nonlinear 4 8 4.3 (0.06) 5.2 (0.08) 5.3 (0.08) 4.1 (0.04) 4.8 (0.03) 4.8 (0.03) 4.0 (0.03) 4.6 (0.02) 4.9 (0.02) 4.0 (0.02) 4.5 (0.01) 4.6 (0.01)
M5a Helix1d 1 3 1.1 (0.0) 1.9 (0.04) 2.0 (0.03) 1.0 (0.0) 1.3 (0.01) 1.2 (0.0) 1.0 (0.0) 1.1 (0.0) 1.2 (0.0) 1.0 (0.0) 1.1 (0.0) 1.1 (0.0)
M5b Helix2d 2 3 2.8 (0.02) 2.8 (0.02) 2.9 (0.01) 2.7 (0.03) 2.9 (0.01) 2.9 (0.01) 2.6 (0.02) 2.9 (0.01) 2.9 (0.01) 2.3 (0.01) 2.8 (0.01) 2.9 (0.01)
M6 Nonlinear 6 36 7.2 (0.16) 8.8 (0.13) 8.8 (0.11) 6.8 (0.08) 8.3 (0.08) 8.4 (0.08) 6.6 (0.05) 7.9 (0.05) 8.4 (0.05) 6.3 (0.03) 7.6 (0.02) 8.0 (0.02)
M7 Roll 2 3 2.0 (0.01) 2.3 (0.02) 2.3 (0.02) 2.0 (0.01) 2.0 (0.01) 2.0 (0.01) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0)
M8 Nonlinear 12 72 14.9 (0.23) 19.9 (0.19) 20.0 (0.21) 14.6 (0.16) 19.2 (0.12) 19.3 (0.12) 14.3 (0.1) 18.6 (0.09) 19.5 (0.08) 14.0 (0.05) 18.2 (0.06) 18.9 (0.04)
M9 Affine 20 20 19.6 (0.06) 19.3 (0.06) 19.3 (0.08) 19.5 (0.05) 19.2 (0.06) 19.2 (0.06) 19.4 (0.04) 19.1 (0.06) 19.4 (0.04) 19.3 (0.02) 19.0 (0.03) 19.3 (0.02)
M10a Cubic 10 11 10.0 (0.06) 10.2 (0.05) 10.3 (0.04) 10.0 (0.07) 10.1 (0.06) 10.2 (0.06) 10.0 (0.04) 10.0 (0.03) 10.2 (0.02) 10.0 (0.02) 10.1 (0.02) 10.1 (0.02)
M10b Cubic 17 18 17.2 (0.09) 17.2 (0.09) 17.3 (0.08) 17.1 (0.04) 17.1 (0.04) 17.1 (0.05) 17.0 (0.04) 17.0 (0.04) 17.3 (0.04) 17.0 (0.03) 17.0 (0.03) 17.2 (0.02)
M10c Cubic 24 25 24.1 (0.13) 24.1 (0.13) 24.1 (0.13) 24.0 (0.08) 24.0 (0.08) 24.0 (0.09) 24.0 (0.06) 23.9 (0.06) 24.3 (0.04) 23.9 (0.03) 23.9 (0.03) 24.2 (0.02)
M10d Cubic 70 72 69.8 (0.37) 67.4 (0.45) 67.5 (0.41) 69.9 (0.18) 67.5 (0.21) 67.5 (0.21) 69.8 (0.1) 67.3 (0.1) 69.8 (0.1) 69.7 (0.06) 67.6 (0.08) 69.7 (0.06)
M11 Moebius 2 3 2.1 (0.02) 2.6 (0.03) 2.7 (0.02) 2.1 (0.01) 2.3 (0.01) 2.3 (0.01) 2.0 (0.01) 2.1 (0.01) 2.3 (0.01) 2.0 (0.0) 2.1 (0.0) 2.1 (0.0)
M12 Norm 20 20 19.8 (0.09) 19.6 (0.1) 19.6 (0.11) 19.8 (0.07) 19.5 (0.08) 19.6 (0.08) 19.8 (0.03) 19.6 (0.04) 19.8 (0.03) 19.8 (0.02) 19.6 (0.03) 19.8 (0.02)
M13a Scurve 2 3 2.0 (0.01) 2.0 (0.01) 2.1 (0.01) 2.0 (0.01) 2.0 (0.01) 2.0 (0.01) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0)
M13b Spiral 1 13 1.6 (0.01) 1.9 (0.01) 1.9 (0.01) 1.9 (0.01) 1.9 (0.01) 2.0 (0.01) 1.0 (0.0) 1.9 (0.0) 2.0 (0.0) 1.0 (0.0) 1.9 (0.0) 2.0 (0.0)
Mbeta 10 40 6.1 (0.11) 5.6 (0.11) 5.9 (0.1) 6.1 (0.06) 5.6 (0.07) 5.8 (0.07) 6.2 (0.04) 5.7 (0.06) 5.8 (0.06) 6.3 (0.03) 6.2 (0.02) 5.9 (0.02)
Mn1 Nonlinear 18 72 18.2 (0.09) 18.2 (0.09) 18.4 (0.09) 18.1 (0.07) 18.1 (0.07) 18.1 (0.07) 18.0 (0.05) 17.9 (0.05) 18.3 (0.04) 18.1 (0.03) 17.9 (0.03) 18.1 (0.03)
Mn2 Nonlinear 24 96 23.6 (0.2) 24.5 (0.18) 24.6 (0.19) 24.4 (0.09) 24.4 (0.09) 24.4 (0.1) 24.2 (0.08) 24.2 (0.08) 24.7 (0.08) 24.0 (0.04) 24.2 (0.04) 24.5 (0.03)
Mp1 Paraboloid 3 12 3.0 (0.03) 2.7 (0.04) 2.9 (0.02) 3.0 (0.02) 2.8 (0.02) 2.9 (0.01) 3.0 (0.01) 2.9 (0.01) 2.9 (0.01) 3.0 (0.01) 2.9 (0.0) 2.9 (0.0)
Mp2 Paraboloid 6 21 5.2 (0.06) 3.9 (0.12) 4.8 (0.06) 5.4 (0.04) 4.5 (0.06) 5.1 (0.04) 5.5 (0.03) 4.9 (0.04) 5.1 (0.04) 5.6 (0.02) 5.4 (0.01) 5.3 (0.01)
Mp3 Paraboloid 9 30 7.0 (0.12) 3.9 (0.13) 5.8 (0.15) 7.4 (0.08) 5.2 (0.1) 6.8 (0.09) 7.7 (0.05) 6.3 (0.07) 6.8 (0.05) 7.9 (0.03) 7.5 (0.02) 7.4 (0.03)
Table C12: Number of samples. For most low dimensional datasets, ESS does not need many points to obtain good accuracy. Notable exceptions are those with interesting
geometry, such as M11 Moebius, M5b Helix2d, and M13b Spiral, though in the case of the latter two, the over-estimation bias persists even as the number of samples increases.
High dimensional datasets. ESS is remarkably accurate on many high dimensional datasets, such as the M10 Cubics, M9 Affine, and M12 Norm, which usually challenge
other estimators. This accuracy is achieved even at low sample sizes. Variance. Low variance across all datasets. Hyperparameter dependency. ESS is mostly insensitive
to the choice of neighbourhood parameter, with some exceptions, such as M13b Spiral, even with many samples. Nonlinear datasets. Unlike other estimators, ESS
struggles with the lower dimensional non-linear datasets such as M4, M8, Mbeta, and the Mp1,2,3 Paraboloids, on some of which other estimators perform well; yet ESS performs
superbly on the high dimensional Mn1,2 Nonlinear datasets, with which other estimators really struggle.
FisherS n = 625 n = 1250 n = 2500 n = 5000
d n Best med abs med rel Best med abs med rel Best med abs med rel Best med abs med rel
M1Sphere 10 11 10.9 (0.16) 10.9 (0.2) 10.9 (0.2) 11.0 (0.06) 11.0 (0.08) 11.0 (0.08) 11.0 (0.02) 11.0 (0.03) 11.0 (0.03) 11.0 (0.02) 11.0 (0.03) 11.0 (0.03)
M2 Affine 3to5 3 5 2.9 (0.01) 2.5 (0.02) 2.5 (0.02) 2.9 (0.01) 2.5 (0.01) 2.5 (0.01) 2.9 (0.01) 2.5 (0.01) 2.5 (0.01) 2.9 (0.0) 2.5 (0.01) 2.5 (0.01)
M3 Nonlinear 4to6 4 6 3.4 (0.12) 2.0 (0.09) 2.0 (0.09) 3.4 (0.07) 1.9 (0.01) 1.9 (0.01) 3.4 (0.02) 1.9 (0.0) 1.9 (0.0) 3.4 (0.01) 1.9 (0.0) 1.9 (0.0)
M4 Nonlinear 4 8 4.1 (0.09) 4.1 (0.09) 4.1 (0.09) 4.1 (0.05) 4.1 (0.05) 4.1 (0.05) 4.1 (0.03) 4.1 (0.03) 4.1 (0.03) 4.1 (0.03) 4.1 (0.03) 4.1 (0.03)
M5a Helix1d 1 3 2.4 (0.18) 2.4 (0.18) 2.4 (0.18) 2.5 (0.01) 2.5 (0.01) 2.5 (0.01) 2.5 (0.01) 2.5 (0.01) 2.5 (0.01) 2.5 (0.01) 2.5 (0.01) 2.5 (0.01)
M5b Helix2d 2 3 2.0 (0.0) 1.8 (0.01) 1.8 (0.01) 2.0 (0.0) 1.8 (0.0) 1.8 (0.0) 2.0 (0.21) 1.8 (0.0) 1.8 (0.0) 2.0 (0.0) 1.8 (0.0) 1.8 (0.0)
M6 Nonlinear 6 36 5.8 (0.11) 5.8 (0.11) 5.8 (0.11) 5.8 (0.09) 5.8 (0.09) 5.8 (0.09) 5.8 (0.05) 5.8 (0.05) 5.8 (0.05) 5.8 (0.04) 5.8 (0.04) 5.8 (0.04)
M7 Roll 2 3 2.5 (0.02) 2.5 (0.02) 2.5 (0.02) 2.5 (0.02) 2.5 (0.02) 2.5 (0.02) 2.5 (0.01) 2.5 (0.01) 2.5 (0.01) 2.5 (0.01) 2.5 (0.01) 2.5 (0.01)
M8 Nonlinear 12 72 11.0 (0.31) 11.0 (0.31) 11.0 (0.31) 10.9 (0.21) 10.9 (0.21) 10.9 (0.21) 10.8 (0.15) 10.8 (0.15) 10.8 (0.15) 10.8 (0.08) 10.8 (0.08) 10.8 (0.08)
M9 Affine 20 20 19.5 (0.57) 11.3 (0.42) 11.3 (0.42) 19.2 (0.39) 11.1 (0.2) 11.1 (0.2) 18.9 (0.28) 11.0 (0.12) 11.0 (0.12) 18.8 (0.15) 11.1 (0.05) 11.1 (0.05)
M10a Cubic 10 11 10.5 (0.1) 7.3 (0.09) 7.3 (0.09) 10.4 (0.09) 7.2 (0.06) 7.2 (0.06) 10.4 (0.07) 7.2 (0.03) 7.2 (0.03) 10.3 (0.04) 7.2 (0.02) 7.2 (0.02)
M10b Cubic 17 18 17.5 (0.49) 10.9 (0.23) 10.9 (0.23) 17.1 (0.37) 10.8 (0.16) 10.8 (0.16) 17.0 (0.3) 10.8 (0.09) 10.8 (0.09) 16.9 (0.15) 10.8 (0.05) 10.8 (0.05)
M10c Cubic 24 25 24.5 (0.85) 16.2 (1.25) 16.2 (1.25) 24.2 (0.45) 14.9 (1.11) 14.9 (1.11) 23.9 (0.41) 14.5 (0.32) 14.5 (0.32) 23.6 (0.32) 14.4 (0.14) 14.4 (0.14)
M10d Cubic 70 72 nan (nan) nan (nan) nan (nan) nan (nan) nan (nan) nan (nan) nan (nan) nan (nan) nan (nan) nan (nan) nan (nan) nan (nan)
M11 Moebius 2 3 2.0 (0.0) 2.1 (0.01) 2.1 (0.01) 2.0 (0.0) 2.1 (0.0) 2.1 (0.0) 2.0 (0.0) 2.1 (0.0) 2.1 (0.0) 2.0 (0.0) 2.1 (0.0) 2.1 (0.0)
M12 Norm 20 20 19.9 (0.33) 9.2 (0.18) 9.2 (0.18) 19.9 (0.34) 9.1 (0.13) 9.1 (0.13) 19.9 (0.11) 9.0 (0.08) 9.0 (0.08) 20.0 (0.18) 8.9 (0.05) 8.9 (0.05)
M13a Scurve 2 3 1.9 (0.02) 1.8 (0.01) 1.8 (0.01) 1.9 (0.01) 1.8 (0.01) 1.8 (0.01) 1.9 (0.01) 1.8 (0.01) 1.8 (0.01) 1.9 (0.01) 1.8 (0.0) 1.8 (0.0)
M13b Spiral 1 13 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0) 2.0 (0.0)
Mbeta 10 40 5.3 (0.1) 3.6 (0.03) 3.6 (0.03) 5.3 (0.05) 3.6 (0.02) 3.6 (0.02) 5.3 (0.05) 3.6 (0.02) 3.6 (0.02) 5.3 (0.04) 3.6 (0.01) 3.6 (0.01)
Mn1 Nonlinear 18 72 18.4 (0.78) 9.9 (0.32) 9.9 (0.32) 18.3 (0.53) 9.9 (0.18) 9.9 (0.18) 17.4 (0.43) 9.9 (0.12) 9.9 (0.12) 18.4 (0.22) 9.9 (0.06) 9.9 (0.06)
Mn2 Nonlinear 24 96 23.9 (0.92) 13.6 (1.13) 13.6 (1.13) 24.9 (1.0) 12.9 (0.36) 12.9 (0.36) 22.9 (0.75) 12.9 (0.22) 12.9 (0.22) 23.3 (0.31) 12.9 (0.09) 12.9 (0.09)
Mp1 Paraboloid 3 12 1.4 (0.01) 1.4 (0.01) 1.4 (0.01) 1.4 (0.01) 1.4 (0.01) 1.4 (0.01) 1.4 (0.01) 1.4 (0.01) 1.4 (0.01) 1.4 (0.0) 1.4 (0.0) 1.4 (0.0)
Mp2 Paraboloid 6 21 1.3 (0.01) 1.3 (0.01) 1.3 (0.01) 1.3 (0.01) 1.3 (0.01) 1.3 (0.01) 1.3 (0.01) 1.3 (0.01) 1.3 (0.01) 1.3 (0.0) 1.3 (0.0) 1.3 (0.0)
Mp3 Paraboloid 9 30 1.4 (0.01) 1.4 (0.01) 1.4 (0.01) 1.3 (0.01) 1.3 (0.01) 1.3 (0.01) 1.3 (0.01) 1.3 (0.01) 1.3 (0.01) 1.3 (0.0) 1.3 (0.0) 1.3 (0.0)
Table C13: Number of samples. The performance of FisherS does not significantly improve when more samples are added. Even with many samples, FisherS can be biased
on some low-dimensional datasets, such as M3 Nonlinear 4to6, M5a Helix1d, and M7 Roll. High dimensional datasets. With the correct hyperparameters, FisherS can
perform well on high dimensional datasets, but it is quite dependent on a good choice. As FisherS relies on inverting a non-linear function in its routine, the estimator
returned an error on M10d Cubic. Variance. Variance is mostly low, but for some nonlinear datasets, such as Mn1,2 Nonlinear with few samples, it can be high.
Hyperparameter dependency. On nonlinear datasets such as M12 Norm, Mbeta, Mn1,2 Nonlinear, and M4 Nonlinear, and on high dimensional datasets such as M9 Affine
and the M10 Cubics, there is a significant dependency on the hyperparameter choice. Nonlinear datasets. The performance on nonlinear datasets depends on the hyperparameter
choices, though FisherS grossly underestimates on the Mp1,2,3 Paraboloids.
CDim n = 625 n = 1250 n = 2500 n = 5000
Best Med Abs Med Rel Best Med Abs Med Rel Best Med Abs Med Rel Best Med Abs Med Rel
M1 Sphere 10 11 5.0 5.0 5.0 5.1 5.1 5.1 5.5 5.5 5.5 6.0 6.0 5.7
M2 Affine 3to5 3 5 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
M3 Nonlinear 4to6 4 6 4.0 4.0 4.0 4.0 3.9 3.9 4.0 4.0 4.0 4.0 4.0 4.0
M4 Nonlinear 4 8 4.0 4.0 4.0 4.0 4.2 4.2 4.0 4.2 4.2 4.0 4.0 4.1
M5a Helix1d 1 3 1.0 2.0 2.0 1.0 1.7 1.4 1.0 1.1 1.1 1.0 1.0 1.0
M5b Helix2d 2 3 2.0 3.0 3.0 2.0 3.0 3.0 2.0 3.0 3.0 2.0 3.0 3.0
M6 Nonlinear 6 36 5.0 4.0 5.0 5.0 4.6 4.4 5.0 4.7 4.7 5.0 5.0 4.8
M7 Roll 2 3 2.0 2.0 3.0 2.0 2.5 2.4 2.0 2.1 2.1 2.0 2.0 2.0
M8 Nonlinear 12 72 3.0 3.0 3.0 3.3 3.3 3.0 3.6 3.6 3.6 4.0 4.0 3.8
M9 Affine 20 20 3.3 3.0 3.0 4.0 3.7 3.3 4.0 4.0 4.0 4.2 4.0 4.2
M10a Cubic 10 11 5.0 4.0 5.0 5.0 4.9 4.8 5.1 5.1 5.1 5.2 5.0 5.2
M10b Cubic 17 18 4.0 3.0 4.0 4.0 4.0 3.7 4.3 4.3 4.3 5.0 5.0 4.6
M10c Cubic 24 25 3.0 3.0 3.0 3.0 3.0 2.7 3.4 3.4 3.4 4.0 4.0 3.6
M10d Cubic 70 72 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.1 1.0 1.1
M11 Moebius 2 3 2.0 3.0 3.0 2.0 2.8 2.7 2.0 2.3 2.3 2.0 2.0 2.1
M12 Norm 20 20 3.1 3.0 3.0 3.3 3.3 2.8 3.5 3.5 3.5 4.0 4.0 3.6
M13a Scurve 2 3 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
M13b Spiral 1 13 1.8 2.0 2.0 1.6 2.0 2.0 1.0 2.0 2.0 1.0 2.0 2.0
Mbeta 10 40 4.0 3.0 4.0 4.0 3.8 3.7 4.0 4.0 4.0 4.1 4.0 4.1
Mn1 Nonlinear 18 72 3.4 3.0 3.0 4.0 3.7 3.3 4.0 3.9 3.9 4.2 4.0 4.2
Mn2 Nonlinear 24 96 3.0 2.0 3.0 3.0 2.9 2.5 3.1 3.1 3.1 3.4 3.0 3.4
Mp1 Paraboloid 3 12 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
Mp2 Paraboloid 6 21 4.0 4.0 4.0 4.1 4.1 3.9 4.4 4.4 4.4 5.0 5.0 4.5
Mp3 Paraboloid 9 30 4.0 3.0 4.0 4.0 3.8 3.6 4.1 4.1 4.1 4.4 4.0 4.4
Table C14: Only one run per dataset was performed due to the computation time, so no variance is reported. Overall, the estimator struggles
with higher dimensions but remains accurate on lower dimensions regardless of codimension. The very slight improvements on higher-dimensional
datasets when more points are available suggest that the number of points needed to accurately estimate dimension in these cases is extremely large.
D Hyperparameter choices
This appendix contains the hyperparameters for the benchmarking experiments in Appendix C.
Across all local estimators and all sampling regimes, we varied between ϵ-neighbourhoods and k-nearest-neighbour
neighbourhoods; the k-nearest-neighbour parameter is varied only over k ∈ {10, 20, 40, 80}. We also
varied the ϵ-neighbourhood across four settings. For consistency across datasets, given a point set, the ϵ
parameter is chosen to be the median k-nearest-neighbour distance of that point set, where k ∈ {10, 20, 40, 80}
as well.
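As an illustration of the ϵ selection just described, the sketch below computes the median k-nearest-neighbour distance of a point set for each k ∈ {10, 20, 40, 80}. This is a minimal sketch rather than the exact benchmarking code; the use of scikit-learn and the placeholder point set are assumptions.

```python
# Minimal sketch: choose epsilon as the median distance to the k-th nearest
# neighbour of the point set, for each k in {10, 20, 40, 80}.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def median_knn_distance(X, k):
    """Median distance from each point of X to its k-th nearest neighbour."""
    # k + 1 neighbours are requested because each point is returned as its own
    # nearest neighbour at distance zero.
    distances, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    return np.median(distances[:, k])


rng = np.random.default_rng(0)
X = rng.standard_normal((2500, 5))  # placeholder point set
epsilons = {k: median_knn_distance(X, k) for k in (10, 20, 40, 80)}
print(epsilons)
```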
lPCA
thresholding version FO(α = 0.05)
Fan(α = 10, β = 0.8, P = 0.95)
Maxgap
Ratio (α = 0.05)
Participation ratio
Kaiser
broken stick
Minka
Table D1: lPCA thresholding methods being varied in our benchmark experiments.
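To make the thresholding terminology concrete, the sketch below applies two of the listed rules, maxgap and the participation ratio, to the eigenvalue spectrum of a PCA on a local neighbourhood. It is an illustrative sketch under the assumption of a given neighbourhood, not the implementation used in the benchmark.

```python
# Illustrative sketch of two lPCA thresholding rules applied to a local neighbourhood.
# The neighbourhood X_local is a placeholder; the benchmarked code may differ.
import numpy as np


def maxgap_dimension(eigvals):
    """Dimension = position of the largest gap in the descending eigenvalue spectrum."""
    eigvals = np.sort(eigvals)[::-1]
    gaps = eigvals[:-1] - eigvals[1:]
    return int(np.argmax(gaps)) + 1


def participation_ratio(eigvals):
    """Continuous dimension proxy: (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    return float(np.sum(eigvals) ** 2 / np.sum(eigvals**2))


rng = np.random.default_rng(1)
# Placeholder neighbourhood: a noisy 3-dimensional affine patch in R^10.
X_local = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 10))
X_local += 0.01 * rng.standard_normal(X_local.shape)
eigvals = np.linalg.eigvalsh(np.cov(X_local, rowvar=False))
print(maxgap_dimension(eigvals), participation_ratio(eigvals))
```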
PH0
alpha 0.5, 1.0, 1.5, 2.0
n range min 0.75
n range max 1.0
range type fraction
subsamples 10
nsteps 10
n neighbours 30
PH0 625 1250 2500 5000
med abs med rel med abs med rel med abs med rel med abs med rel
alpha 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
KNN
k 1,. . . ,80
n range min 0.75
n range max 1.0
range type fraction
subsamples 10
nsteps 10
Table D5: KNN hyperparameter range in the benchmark experiments. Note that, unlike other estimators with a
nearest-neighbour hyperparameter k, the implementation incurs an almost trivial additional cost to compute the
estimates for all values of k up to a specified maximum.
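This is cheap because a single neighbour query at the maximum k returns sorted neighbour distances whose prefixes give the distances for every smaller k. The sketch below illustrates this general point; it is not the benchmarked KNN estimator itself, and the placeholder point set is an assumption.

```python
# One neighbour query at k_max yields the sorted neighbour distances for every
# k <= k_max, so sweeping over k costs essentially one query.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 8))  # placeholder point set

k_max = 80
distances, _ = NearestNeighbors(n_neighbors=k_max + 1).fit(X).kneighbors(X)
distances = distances[:, 1:]  # drop the zero distance of each point to itself

# Neighbour distances for any k in 1..k_max are simply the first k columns.
for k in (1, 10, 40, 80):
    d_k = distances[:, :k]
    print(k, d_k.shape, d_k[:, -1].mean())
```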
GRIDE
n1 1, 2, 4, 8, 16, 32
multiplier 2, 3, 5
d0 1
d1 150
Table D8: Hyperparameters for the estimates in Table C7. Note that these choices make the estimator
identical to MLE with input from two nearest neighbours.
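For reference, the sketch below shows a maximum-likelihood estimator restricted to the two nearest neighbours (the Levina–Bickel local estimate with k = 2), the estimator to which these GRIDE settings reduce; the aggregation over points mirrors the comb options listed for MLE further below. This is a hedged sketch, and the exact benchmarked code may differ.

```python
# Sketch of the MLE using only the two nearest neighbours of each point:
# the Levina-Bickel local estimate with k = 2 is 1 / log(r2 / r1), aggregated
# over points by mean, harmonic mean, or median.
import numpy as np
from scipy.stats import hmean
from sklearn.neighbors import NearestNeighbors


def two_nn_mle(X, comb="hmean"):
    # Column 0 of the distance matrix is each point's zero distance to itself.
    distances, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    r1, r2 = distances[:, 1], distances[:, 2]
    local = 1.0 / np.log(r2 / r1)  # local dimension estimate at each point
    if comb == "mean":
        return float(np.mean(local))
    if comb == "hmean":
        return float(hmean(local))
    return float(np.median(local))


rng = np.random.default_rng(3)
X = rng.standard_normal((5000, 7))  # 7-dimensional Gaussian cloud
print(two_nn_mle(X, comb="hmean"))
```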
CorrInt
k1 2, 4, 6, 8, 10, 12
k2 12, 14, 16, 18, 20
DANCo
k 10, 20, 25, 30, 35, 40, 80
ver DANCo
ESS
ver a, b
d 1
FisherS
conditional number 5, 6, 8, 10, 12, 13
project on sphere 0, 1
limit maxdim 0, 1
MiND ML
k 1, 2, 3, 4, 5, 10, 15, 20
ver MLk, MLi
D 10, 20, 30, 40, 50, 60, 70
MiND ML 625 1250 2500 5000
med abs med rel med abs med rel med abs med rel med abs med rel
k 1 1 1 1 1 1 1 1
ver MLk MLi MLk MLk MLk MLk MLi MLk
D 20 20 10 20 30 10 10 10
MLE
k 10, 20, 40, 80
comb mean, hmean, median
TLE
n neighbours 10, 20, 40, 80
epsilon 1e-06, 2e-06, 3e-06, 5e-06
TwoNN
discard fraction 0.05, 0.06, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6
WODCap
k 10, 20, 40, 80
comb mean, hmean, median
WODCap 625 1250 2500 5000
med abs med rel med abs med rel med abs med rel med abs med rel
comb hmean median median mean median mean median mean
k 20 20 20 20 20 20 20 20
E Noise experiments
Table E1: Mean estimated dimension along with standard deviation for the unit sphere S^6 ⊂ R^11 with
ambient Gaussian noise. The variance of the noise is given at the top of the columns.
Table E2: Mean estimated dimension along with standard deviations for S^6 ⊂ R^11 with uniform background
noise (outliers). The number of outlier points is given at the top of the columns.
Table E3: Mean estimated dimension along with standard deviations for S^10 ⊂ R^11 with ambient Gaussian
noise. The variance of the noise is given at the top of the columns.
Estimator n=0 n=25 n=125 n=250
lPCA 10.00 (0.000) 10.00 (0.000) 10.00 (0.000) 10.00 (0.000)
MLE 9.26 (0.14) 9.33 (0.14) 9.49 (0.13) 9.58 (0.13)
PH0 9.36 (0.22) 9.47 (0.30) 9.74 (0.38) 10.05 (0.45)
KNN 9.91 (1.03) 10.17 (1.31) 9.72 (1.37) 9.18 (1.42)
WODCap 9.15 (0.0) 9.15 (0.0) 9.15 (0.0) 9.15 (0.0)
GRIDE 9.4 (0.200) 9.48 (0.222) 9.53 (0.204) 9.52 (0.245)
TwoNN 9.4 (0.210) 9.49 (0.275) 9.49 (0.263) 9.48 (0.230)
DANCo 10.9 (0.300) 11.00 (0.000) 11.00 (0.000) 11.00 (0.000)
MiND ML 9.4 (0.050) 10.00 (0.458) 10.00 (0.490) 10.00 (0.497)
CorrInt 9.1 (0.060) 9.14 (0.090) 9.11 (0.081) 9.16 (0.136)
ESS 10.0 (0.030) 9.97 (0.051) 9.88 (0.106) 9.66 (0.248)
FisherS 11.0 (0.020) 6.65 (0.052) 4.78 (0.051) 4.07 (0.068)
TLE 10.1 (0.050) 9.72 (0.189) 9.64 (0.201) 9.55 (0.193)
Table E4: Mean estimated dimension along with standard deviations for S^10 ⊂ R^11 with uniform background
noise (outliers). The number of outlier points is given at the top of the columns.
Table E5: Mean estimated dimension along with standard deviation for the SO(4) dataset with ambient
Gaussian noise. The variance of the noise is given at the top of the columns.
Table E6: Mean estimated dimension along with standard deviations for SO(4) with uniform background
noise (outliers). The number of outlier points is given at the top of the columns.
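For reference, the two noise models used in Tables E1–E6 can be reproduced along the following lines. This is a hedged sketch: the sample size and the bounding box used for the uniform background outliers are assumptions made for illustration, not the exact experimental setup.

```python
# Sketch of the two noise models applied to S^10 in R^11:
# (i) ambient Gaussian noise of a given variance added to every sample, and
# (ii) uniform background outliers appended to the clean sample.
# The outlier bounding box and sample size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
n, D = 2500, 11

# Uniform sample on the unit sphere S^10 in R^11.
X = rng.standard_normal((n, D))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# (i) Ambient Gaussian noise with variance sigma2 in every coordinate.
sigma2 = 0.01
X_gauss = X + rng.normal(scale=np.sqrt(sigma2), size=X.shape)

# (ii) Uniform background outliers, e.g. 250 points in the cube [-1, 1]^11.
n_out = 250
outliers = rng.uniform(-1.0, 1.0, size=(n_out, D))
X_outliers = np.vstack([X, outliers])

print(X_gauss.shape, X_outliers.shape)
```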