Dimension Reduction For Clustering
Di Yue¶
Peking University
Abstract
The Johnson-Lindenstrauss transform is a fundamental method for dimension reduction
in Euclidean spaces; it maps any dataset of n points into dimension O(log n) with low
distortion of pairwise distances. This dimension bound is tight in general, but one can bypass
it for specific problems. Indeed, tremendous progress has been made for clustering problems,
especially in the continuous setting where centers can be picked from the ambient space Rd . Most
notably, for k-median and k-means, the dimension bound was improved to O(log k) [Makarychev,
Makarychev and Razenshteyn, STOC 2019].
We explore dimension reduction for clustering in the discrete setting, where centers can
only be picked from the dataset, and present two results that are both parameterized by the
doubling dimension of the dataset, denoted as ddim. The first result shows that dimension
Oε (ddim + log k + log log n) suffices, and is moreover tight, to guarantee that the cost is pre-
served within factor 1 ± ε for every set of centers. Our second result eliminates the log log n
term in the dimension through a relaxation of the guarantee (namely, preserving the cost only
for all approximately-optimal sets of centers), which maintains its usefulness for downstream
applications.
Overall, we achieve strong dimension reduction in the discrete setting, and find that it
differs from the continuous setting not only in the dimension bound, which depends on the
doubling dimension, but also in the guarantees beyond preserving the optimal value, such as
which clusterings are preserved.
∗ Email: [email protected]
† The Harry Weinrebe Professorial Chair of Computer Science. Work partially supported by the Israel Science Foundation grant #1336/23. Email: [email protected]
‡ Email: [email protected]
§ Email: [email protected]
¶ Email: di [email protected]
1 Introduction
Oblivious dimension reduction, in the spirit of the Johnson and Lindenstrauss (JL) Lemma [JL84],
is a fundamental technique for many Euclidean optimization problems over large, high-dimensional
datasets. It has a strong guarantee: there is a random linear map π : Rd → Rt , for a suitable target
dimension t = O(ε−2 log n), such that for every n-point dataset P ⊂ Rd , with high probability, π
preserves all pairwise distances in P within factor 1 ± ε:

∀p, q ∈ P :   ∥π(p) − π(q)∥ ∈ (1 ± ε) · ∥p − q∥,

where throughout ∥ · ∥ is the Euclidean norm. This guarantee is extremely powerful, particularly
for algorithms: to solve a Euclidean problem on input P , one can apply the map π, solve the same
problem on π(P ), which is often more efficient since π(P ) lies in low dimension, and “lift” the
solution back to the original dimension (as discussed further in Section 1.2).
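To make this workflow concrete, here is a minimal Python sketch (our own illustration, not code from the paper; the constant 8 in the choice of t is arbitrary): it draws a Gaussian JL map, projects a toy dataset, and empirically checks the pairwise distortion. A downstream algorithm would then run on the projected points and its solution would be lifted back.

```python
import numpy as np

def gaussian_jl_map(d, t, rng):
    # Random linear map R^d -> R^t with i.i.d. N(0, 1/t) entries,
    # so that E[||G x||^2] = ||x||^2 for every fixed x.
    return rng.normal(0.0, 1.0 / np.sqrt(t), size=(t, d))

rng = np.random.default_rng(0)
n, d, eps = 500, 1000, 0.2
t = int(np.ceil(8 * np.log(n) / eps ** 2))   # target dimension O(eps^-2 log n)

P = rng.normal(size=(n, d))                  # toy dataset
G = gaussian_jl_map(d, t, rng)
GP = P @ G.T                                 # the projected dataset pi(P)

# Empirically check the distortion of a few random pairs.
i, j = rng.integers(0, n, size=(2, 200))
orig = np.linalg.norm(P[i] - P[j], axis=1)
proj = np.linalg.norm(GP[i] - GP[j], axis=1)
mask = orig > 0
print("distortion range:", (proj[mask] / orig[mask]).min(), (proj[mask] / orig[mask]).max())
```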
However, many problems require computational resources that grow exponentially with the
dimension (the curse of dimensionality), and hence even dimension t = O(ε−2 log n) might be too
large. Unfortunately, this dimension bound is tight in general, i.e., for preserving all pairwise
distances [LN17], but interestingly one may bypass it for specific optimization problems, by showing
that the optimal value/solution is preserved even when the dimension is reduced beyond the JL
Lemma, say to dimension t = O(ε−2 ), which is completely independent of n. This raises an
important question:
For which problems does dimension o(ε−2 log n) suffice for oblivious dimension reduction?
Prior work has revealed an affirmative answer for several key problems, as we discuss below.
This paper studies this question for fundamental clustering problems, captured by (k, z)-clustering,
which includes the famous k-means and k-median problems as its special cases. In (k, z)-clustering,
the input is a dataset P ⊂ Rd , and the goal is to find a set of centers C of size |C| ≤ k that
minimizes

cost_z(P, C) := Σ_{p∈P} dist_z(p, C),   where   dist_z(p, C) := min_{c∈C} ∥p − c∥^z.
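For concreteness, a small Python helper (our own illustration, not part of the paper) that evaluates this objective for a given set of centers:

```python
import numpy as np

def cost_z(P, C, z=1):
    """(k, z)-clustering cost: sum over p in P of min_{c in C} ||p - c||^z."""
    D = np.linalg.norm(P[:, None, :] - C[None, :, :], axis=2)  # |P| x |C| distances
    return float((D.min(axis=1) ** z).sum())
```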
We can distinguish two variants, differing in their space of potential centers. In the continuous
variant, C is a subset of R^d (the centers lie in the ambient space), and in the discrete variant,
sometimes also called k-medoids, C is a subset of P (or possibly of a larger set given as input). A key
feature of the discrete version is that π : P → π(P ) is invertible, hence each potential center in
π(P ) corresponds to a unique potential center in P (in contrast, a potential center in the ambient
space Rt has many preimages in Rd ). Thus, in the discrete version, a set of centers computed for
the dataset π(P ) can be mapped back to the higher dimension and serve as centers for the dataset
P . See Section 1.3 for a discussion on practical applications of the discrete variant.
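The sketch below (illustrative Python; the greedy seeding is only a hypothetical stand-in for whatever discrete solver one prefers) shows this lifting: a discrete solution is just a set of indices into the dataset, so centers found for π(P) map back to centers of P by reusing the same indices.

```python
import numpy as np

def farthest_point_indices(X, k, rng):
    # Hypothetical stand-in solver: farthest-point seeding over the rows of X.
    idx = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2), axis=1)
        idx.append(int(np.argmax(d)))
    return idx

rng = np.random.default_rng(1)
P = rng.normal(size=(300, 512))
t = 20
G = rng.normal(0, 1 / np.sqrt(t), size=(t, P.shape[1]))  # Gaussian JL map
GP = P @ G.T

idx = farthest_point_indices(GP, k=5, rng=rng)  # solve on the low-dimensional data
C = P[idx]                                      # lift: the same indices are centers in R^d
```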
The continuous variant is a success story of the “beyond JL” program. A series of papers [BZD10,
CEM+ 15, BBC+ 19, MMR19] has culminated showing that target dimension t = O(ε−2 log kε ),
which is independent of n, suffices to preserve all the solutions within factor 1 ± ε. Curiously,
Charikar and Waingarten [CW25] observed that the discrete variant behaves very differently: certain
instances require t = Ω(log n), even for k = 1 (when using the standard Gaussian-based map π).
Counterintuitively, restricting the centers to be data points makes dimension reduction significantly
harder!
To bypass this limitation, we consider the doubling dimension, which was identified in previous
work as a natural parameter that is very effective in achieving “beyond JL” bounds [IN07, NSIZ21,
JKS24, HJKY25, GJK+ 25]. Formally, the doubling dimension of P , denoted ddim(P ), is the smallest
positive number such that every ball in the finite metric P can be covered by 2ddim(P ) balls of half the
radius. For several problems, including nearest neighbor [IN07], facility location [NSIZ21, HJKY25],
and maximum matching [GJK+ 25], target dimension t = O(ε−2 log 1ε · ddim(P )) suffices. Note
that restricting the doubling dimension does not immediately imply a better dimension reduction
of the JL flavor, as there are datasets P ⊂ Rd with ddim(P ) = O(1) where no linear map can
approximately preserve all pairwise distances (see e.g., [IN07, Remark 4.1]).
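To make the parameter concrete, the following rough estimator (our own illustration; brute force and only for small datasets) greedily covers each ball B(x, r) by balls of half the radius centered at data points and returns log2 of the largest cover size it encounters; it is an estimate, not an exact computation of ddim(P).

```python
import numpy as np
from itertools import product

def doubling_dimension_estimate(P, radii):
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)  # pairwise distances
    worst = 1
    for i, r in product(range(len(P)), radii):
        uncovered = set(np.where(D[i] <= r)[0].tolist())       # points of the ball B(P[i], r)
        count = 0
        while uncovered:
            j = uncovered.pop()                                 # greedily open a half-radius ball at some point
            uncovered -= set(np.where(D[j] <= r / 2)[0].tolist())
            count += 1
        worst = max(worst, count)
    return float(np.log2(worst))
```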
However, for the sake of exposition, we omit z and focus on z = 1 and z = 2, which are discrete k-median
and k-means, respectively. We use the notation Õ(f ) to hide factors that are logarithmic in f , although below
it only hides a log(1/ε) factor.
Theorem 1.1 (Informal version of Theorem 3.1). For suitable t = Õ(ε^{−2}(ddim(P ) + log k +
log log n)), with probability at least 2/3, opt(G(P )) ≤ (1 + ε) opt(P ), and for all C ⊆ P with |C| ≤ k,
cost(G(P ), G(C)) ≥ (1 − ε) cost(P, C).
This theorem has immediate algorithmic applications. First, it implies that the optimal value is
preserved, i.e., opt(G(P )) ∈ (1 ± ε) opt(P ). Second, for every C ⊂ P and β > 1, if the set of centers
G(C) is a β-approximate solution for the instance G(P ), then C is a (1 + O(ε))β-approximate
solution for the instance P . Therefore, the theorem fits into the general paradigm of using oblivious
linear maps — apply the mapping, solve the problem in low dimension, and lift the centers back to
the higher dimension.
It is interesting to compare our result with the continuous variant of (k, z)-clustering. On the
one hand, to preserve the optimal value in the continuous variant, we know from [MMR19] that
target dimension O(ε−2 log kε ) suffices, independently of ddim(P ). On the other hand, Theorem 1.1
further provides a “for all centers” guarantee, which is not attainable in the continuous version (by
any linear map), by simply considering centers in the kernel of the linear map (see Theorem 6.1).
We examine and discuss these guarantees more carefully in Section 1.2.
Matching lower bounds. The results in Theorem 1.1 are nearly tight for Gaussian JL maps,
and likely for all oblivious linear maps. It is known that achieving opt(G(P )) ∈ (1 ± ε) opt(P )
requires target dimension t = Ω(log k), even for a dataset P of doubling dimension O(1) [NSIZ21],
and another known lower bound is that t = Ω(ddim(P )), even for k = O(1) [CW25]. It is easy
to tighten these bounds with respect to the dependence on ε, and we include this in Section C for
completeness. We complete the picture and show in Theorem 6.2 that the multiplicative approximation
of Theorem 1.1 requires dimension t = Ω(ε^{−2} log log n), even for k = 1 and a dataset P of doubling
dimension O(1).
To get some intuition about the discrete variant, we briefly recall the hard instance of [CW25],
taking z = 1 for simplicity. Consider k = 2, and let P be the first n standard basis vectors, thus
ddim(P ) = log n. The pairwise distances all equal √2, hence opt(P ) = √2 · n. The standard basis
vectors form a well-known hard instance for the JL Lemma, hence, when using target dimension
t = o(ε^{−2} log n), with high probability, there exists j_1 ∈ [n/2] such that ∥Ge_{j_1}∥ < 1 − 10ε. Similarly,
let j_2 > n/2 be such an index for the last n/2 standard basis vectors. Let Ge_{j_1}, Ge_{j_2} be the two
centers for G(P ), and assign the first n/2 basis vectors to Ge_{j_2} and the last n/2 vectors to Ge_{j_1}. Now
a simple argument using the independence between the two halves, see Section C, shows that
opt(G(P )) ≤ (1 − ε)√2 · n = (1 − ε) opt(P ) with probability 2/3.
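The effect is easy to observe numerically even for k = 1; the sketch below (our own experiment with arbitrary constants, not an argument from the paper) projects the standard basis vectors with a Gaussian JL map and compares the discrete 1-median cost before and after.

```python
import numpy as np

def discrete_1median_cost(X):
    # min over centers c in X of sum_{p in X} ||p - c||
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return D.sum(axis=0).min()

rng = np.random.default_rng(0)
n, t = 512, 10                                   # t well below eps^-2 log n
P = np.eye(n)                                    # the n standard basis vectors
opt_P = np.sqrt(2) * (n - 1)                     # every non-center point is at distance sqrt(2)

G = rng.normal(0, 1 / np.sqrt(t), size=(t, n))   # Gaussian JL map
GP = P @ G.T

print(discrete_1median_cost(GP) / opt_P)         # typically noticeably below 1
```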
A relaxed guarantee. Our main result avoids the log log n term in Theorem 1.1 by slightly
relaxing the guarantee, while keeping it useful for downstream applications.
Theorem 1.2 (Informal version of Theorem 5.1). For suitable t = Õ(ε−2 (ddim(P ) + log k)), with
probability at least 2/3,
2. for all C ⊆ P, |C| ≤ k, we have cost(G(P ), G(C)) ≥ min{(1 − ε) cost(P, C), 100 opt(P )}.
This theorem implies that the optimal value is preserved, i.e., opt(G(P )) ∈ (1 ± ε) opt(P ).
Let us further examine which solutions are preserved under this guarantee: For all C ⊂ P and
1 < β < 100/(1 + ε), if the set of centers G(C) is a β-approximate solution for the instance G(P ), then C
is a (1 + O(ε))β-approximate solution for the instance P . Recall that for Theorem 1.1, we had a
similar claim, but without the restriction β < 100/(1 + ε). The constant 100 here is arbitrary, and can be
changed to any α > 2, at the cost of increasing the target dimension by an additive O(ε^{−2} log log α)
term.
however a (1 + ε)-approximate MST of G(P ) may have large cost for P [NSIZ21]. Ideally, we want
the cost of every solution to have bounded contraction, as it allows to lift any solution for G(P ) to
a solution for P , and we thus consider several different notions for the set of solutions, as follows.
For simplicity, we present these for z = 1 in the discrete setting, but they extend naturally to all
z ≥ 1 and to the continuous setting.
1. Partitions. A solution is a partition P = (P_1, . . . , P_k) of P . Its cost is defined as
   cost(P) := Σ_{i=1}^k min_{c∈P_i} Σ_{p∈P_i} ∥p − c∥.

2. Centers. A solution is a set of centers C = (c_1, . . . , c_k) ⊆ P . Its cost is defined as
   cost(P, C) := Σ_{p∈P} dist(p, C).
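Both notions are easy to evaluate directly; the sketch below (our own illustration, for z = 1) makes the difference concrete: the partition cost optimizes a discrete 1-median inside each part, while the center cost assigns every point to its nearest given center.

```python
import numpy as np

def center_cost(P, C):
    # cost(P, C) = sum_p min_{c in C} ||p - c||
    D = np.linalg.norm(P[:, None, :] - C[None, :, :], axis=2)
    return D.min(axis=1).sum()

def partition_cost(parts):
    # cost(partition) = sum over parts of  min_{c in part} sum_{p in part} ||p - c||
    total = 0.0
    for S in parts:
        D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
        total += D.sum(axis=0).min()
    return total
```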
These definitions are fairly natural, and were used in prior work on dimension reduction, e.g.,
partition-based solutions were used in [MMR19] for k-means and k-median, and center-based
solutions were used in [JKS24] for k-center. It was observed in [CW25] that not all “for all”
guarantees are the same; in particular, “for all centers” and “for all partitions” are incomparable.
However,“for all centers and partitions” is clearly stronger than both.
Next, we define contraction for solutions, capturing the two notions in Theorems 1.1 and 1.2. The
notion in Theorem 1.1 is simply of multiplicative contraction: A solution S has (1 − ε)-contraction if
cost(G(S)) ≥ (1 − ε) cost(S). The notion in Theorem 1.2 is new, at least in the context of dimension
reduction, and goes as follows.
Definition 1.3 (Relaxed Contraction). A solution S has α-relaxed (1 − ε)-contraction (for α > 1,
ε > 0) if cost(G(S)) ≥ min{α opt(P ), (1 − ε) cost(S)}.
Using these definitions, we can restate Theorem 1.1 as having (1 − ε)-contraction for all centers,
and restate Theorem 1.2 as achieving 100-relaxed (1 − ε)-contraction for all centers. In fact, we can
strengthen Theorem 1.1 to assert (1 − ε)-contraction for all centers and partitions.
Theorem 1.4 (Strengthened Theorem 1.1, informal). For suitable t = Õ(ε−2 (ddim(P ) + log k +
log log n)), with probability at least 2/3, for all partitions P = (P1 , . . . , Pk ) of P and sets of centers
C = (c1 , . . . , ck ) ⊆ P ,
cost(G(P), G(C)) ≥ (1 − ε) cost(P, C).
This strengthening is not attainable for Theorem 1.2, as dimension Ω(ε−2 log log n) is needed
to get a “for all centers and partitions” guarantee, even for relaxed contraction (see Theorem 6.4).
However, we do not know if a “for all partitions” guarantee is possible without the log log n term.
If it is possible, then a curious phenomenon will occur: we get a “for all partitions” and a “for
all centers” guarantees, but not a “for all centers and partitions” guarantee. All our results are
summarized in Table 1.
Candidate centers. We consider also a more general variant of k-clustering, where the candidate
centers are part of the input (given either explicitly or implicitly): Given a dataset P and a candidate-
centers set Q, the goal is to find C ⊆ Q of size |C| ≤ k that minimizes Σ_{p∈P} dist_z(p, C). When
Q = R^d or Q = P , we obtain the continuous and discrete variants, respectively.
Problem | Target dimension | ∀ partitions | ∀ centers | Contraction | Reference
Continuous | O(ε^{−2} log k) | yes | no | multiplicative | [MMR19]
Continuous | Ω(ε^{−2} log k) | no | no | even for value | [NSIZ21]
Continuous | > d − 1 | no | yes | even for relaxed | Thm 6.1
Discrete | O(ε^{−2}(ddim + log k + log log n)) | yes | yes | multiplicative | Thm 3.1
Discrete | O(ε^{−2}(ddim + log k)) | no | yes | relaxed | Thm 5.1
Discrete | ? | yes | no | any | OPEN
Discrete | Ω(ε^{−2} log log n) | yes | yes | even for relaxed | Thm 6.4
Discrete | Ω(ε^{−2} log log n) | no | yes | multiplicative | Thm 6.2
Discrete | Ω(ε^{−2} log k) | no | no | even for value | [NSIZ21]
Discrete | Ω(ε^{−2} ddim) | no | no | even for value | [CW25]
Candidate centers | O(ε^{−2} log s) | yes | yes | multiplicative | Thm 4.1
Candidate centers | O(ε^{−2}(ddim + log k + log log n)) | yes | yes | relaxed | Thm 4.4
Candidate centers | O(ε^{−2}(ddim + log k)) | no | yes | relaxed | Thm 5.1
Candidate centers | Ω(ε^{−2} log s) | no | yes | multiplicative | Thm 6.5

Table 1: Summary of our results for dimension reduction for k-clustering. The notions of "for all"
centers and/or partitions, and of multiplicative/relaxed contraction, are as explained in Section 1.2.
Some lower bounds apply even for preserving the optimal value; for clarity, the table notes that
they hold "even for value". In the setting of candidate centers, the size of the candidate set is
denoted by s. We suppress log(1/ε) terms and the dependence on α for α-relaxed contraction.
interpretability might require the centers to be part of the dataset. For example, in applications
based on machine-learning embeddings of objects such as text [XGF16], an arbitrary vector in the
embedding space might not represent any actual object. A similar issue arises for structured data
such as sparse data or images, e.g., the “average image” is visually random noise [LRU20, TZM+ 20]
or the average of sparse vectors is not necessarily sparse. A discrete center, however, represents an
actual underlying object, and thus preserves the underlying properties of the input points.
Failure of extension theorems in the discrete setting. To prove (1) (and possibly more gen-
eral claims), a natural framework based on extension theorems has been widely used in dimension
reduction for clustering. Specifically, given an arbitrary center v in the target space (e.g., v is the
optimal 1-median center of G(P )), one can define an "inverse image" u in the original space such
that cost(P, u) ≤ (1 + ε) cost(G(P ), v), and this directly implies opt(G(P )) ≥ opt(P )/(1 + ε). The key
step of defining the "inverse image" is precisely what an extension theorem does. This framework was
used in prior works such as [MMR19, JKS24], in the spirit of the classic Kirszbraun extension
theorem [Kir34] or the robust one-point extension theorem [MMR19, Theorem 5.2]. However, such
extension theorems are only known to work in the continuous setting, which requires picking the
inverse image from the entire space R^d and cannot be restricted only to the data points v ∈ P .^1
Our techniques. We start with the k = 1 case (a detailed discussion can be found in Section 1.4.1).
In this case, we first obtain a target dimension bound with an O(log log n) factor, by utilizing the
existence of a small movement-based coreset. A coreset is a small accurate proxy of the dataset, and
a movement-based coreset additionally requires the existence of a "local" mapping such that each
data point can be mapped to a nearby coreset point. The dimension reduction simply preserves the
pairwise distances on the coreset, and (1) is argued via the local mapping. A conceptually similar
coreset-to-dimension-reduction idea has also been employed in [CW25]; one main difference is
that we also utilize the movement/local property of the coreset.
Then, to remove the O(log log n) factor, we consider a weaker guarantee as in Theorem 1.2,
where we prove the (1 + ε) relative error only for near-optimal solutions, and for the other solutions
we have a flat 100 opt(P ) error. This relaxed guarantee is strong enough for (1) (and many other
1
We note that the Kirszbraun theorem may be adapted to work for the discrete case when the target dimension
t = O(log n), but this dimension bound is too large to be useful.
applications), and may be of independent interest for further study. Our analysis crucially
builds on this small-versus-large cost dichotomy, although we also need to handle the middle ground
where the two regimes mix.
Finally, we discuss the generalization to k > 1 in Section 1.4.2, which introduces several nontrivial
technical complications from k = 1.
The O(log log n) bound: from coreset to dimension reduction. To prove (1), we use an
approach inspired by the movement-based coreset construction in Euclidean spaces [HM04]. Roughly
speaking, a movement-based coreset^2 is a subset S ⊆ P , such that there exists a mapping σ : P → S
satisfying Σ_{p∈P} ∥p − σ(p)∥ ≤ O(ε) opt(P ). Our framework is summarized as follows: we first
construct a movement-based coreset S to compress the dataset P . Next, we apply the standard JL
lemma to preserve pairwise distances in the coreset S within (1 ± ε), which requires O(ε^{−2} log |S|)
target dimensions. After this step, the optimal value of S is already preserved, namely, opt(G(S)) ∈
(1 ± ε) opt(S). Finally, it suffices to show that the cost of snapping data points to their nearest
neighbor in S (i.e., Σ_{p∈P} ∥p − S(p)∥ and Σ_{p∈P} ∥Gp − GS(p)∥) is negligible in both the original and
target spaces.
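A minimal sketch of this pipeline (our own illustration; it assumes a coreset S is already given and uses nearest-neighbor snapping as the mapping σ) looks as follows.

```python
import numpy as np

def snap_to_coreset(P, S):
    # sigma: map every data point to its nearest coreset point
    D = np.linalg.norm(P[:, None, :] - S[None, :, :], axis=2)
    return S[D.argmin(axis=1)]

def project_via_coreset(P, S, t, rng):
    """Snap P to the coreset S, then apply a Gaussian JL map whose target
    dimension t may be chosen as a function of |S| rather than |P|."""
    G = rng.normal(0, 1 / np.sqrt(t), size=(t, P.shape[1]))
    sigma_P = snap_to_coreset(P, S)   # total movement should be O(eps)*opt(P) for a movement-based coreset
    return sigma_P @ G.T, G
```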
The construction of the coreset is essentially the same as that in [HM04], except that [HM04]
also assigns weights to the coreset points and here we only need the point set itself. We review the
construction. This construction is based on a sequence of nets, a standard tool for discretizing
metrics. Formally, a ρ-net of a point set P is a subset N ⊆ P , such that 1) the interpoint distances
in N are at least ρ, and 2) every point in P has a point in N within distance ρ. (See the more
detailed definition in Definition 2.3.) Let c^* ∈ P be an optimal discrete 1-median center. We
construct nets on a sequence of balls centered at c^* with geometrically decreasing radii. Denote
r_0 := opt(P ) and r_ℓ := r_0/2^ℓ for ℓ = 1, 2, . . . , log n. Construct the level-ℓ net N_ℓ as an εr_ℓ-net of
the ball B(c^*, r_ℓ), and denote by N := ∪_{ℓ=0}^{log n} N_ℓ the union of all log n levels of nets.
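A greedy construction suffices for the sketch below (our own illustration): greedy_net builds a ρ-net by scanning the points and keeping those far from all previously kept ones, and the level-ℓ nets are built on the shrinking balls around the given center c_star.

```python
import numpy as np

def greedy_net(X, rho):
    """Greedy rho-net: kept points are pairwise > rho apart, and every point
    of X is within distance rho of some kept point."""
    net = []
    for x in X:
        if all(np.linalg.norm(x - y) > rho for y in net):
            net.append(x)
    return np.array(net)

def hierarchical_nets(P, c_star, opt_value, eps):
    # Level-ell net: an (eps * r_ell)-net of P ∩ B(c_star, r_ell), with r_ell = opt / 2^ell.
    nets = []
    for ell in range(int(np.ceil(np.log2(len(P)))) + 1):
        r = opt_value / 2 ** ell
        ball = P[np.linalg.norm(P - c_star, axis=1) <= r]
        if len(ball):
            nets.append(greedy_net(ball, eps * r))
    return nets
```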
By the standard packing property of doubling metrics, each net has size |N_ℓ| ≤ O(ε^{−O(ddim)}),
thus |N | ≤ O(ε^{−O(ddim)} log n), which implies a target dimension t = O(ε^{−2}(ddim log ε^{−1} + log log n)).
On the other hand, let G(c) ∈ G(P ) be an optimal discrete 1-median center of G(P ). Then the
total cost of snapping c and all data points to the nearest neighbor in N (i.e., Σ_{p∈P}(∥p − N (p)∥ +
∥c − N (c)∥)) can be bounded by O(ε)(opt(P ) + cost(P, c)) in the original space. Based on results
in [IN07], we further show that this snapping cost in the target space (i.e., Σ_{p∈P}(∥Gp − GN (p)∥ +
∥Gc − GN (c)∥)) can increase by at most a constant factor.

^2 This definition is tailored to our need and may be slightly different from that in the literature.
Finally, we note that the above analysis can be applied to obtain the “for all centers” guarantee
in Theorem 1.1, or even the stronger “for all centers and partitions” guarantee in Theorem 1.4.
Removing the log log n term via a relaxed guarantee. Let us first recall the cause of the
log log n term. We apply the JL Lemma to N , which is a union of log n nets, each of size ε^{−O(ddim)}. The
log log n thus comes from a union bound over all log n levels. To bypass this union bound, we use
two technical ideas. First, we avoid touching cross-level pairs and only apply the union bound
for each N_ℓ separately. This requires us to always snap p and c to the same level of net when
handling each p ∈ P . Second, for a single level, we analyze its maximum distance distortion, which
is a random variable, and bound its expectation. We remark that some levels will be distorted
significantly, but the average distortion is (1 + O(ε)). Similar ideas have been used in prior works
(e.g., [GJK+ 25]).
Consider the following two extremes. First, suppose c is the closest point to c∗ , say, ∀p ∈
P, ∥c − c∗ ∥ ≤ ∥p − c∗ ∥. For every p ∈ P , we can snap p to its nearest neighbor in net Np .
Observe that c can also be covered by Np . The cost of snapping p and c can both be bounded
by O(ε) · ∥p − c∗ ∥, and we show that on average, the cost of snapping Gp and Gc is bounded by
O(ε) · ∥p − c^*∥ as well, which adds up to O(ε) opt(P ). The other extreme is that c is very far from
c^*, i.e., ∥c − c^*∥ > opt(P )/10. In this case, we can no longer snap c to the same net as p (as in the
previous case). We show that in this case, cost(G(P ), Gc) ≥ 100 opt(P ).
If c does not fall into either of the above two extremes, our analysis is a combination of them.
Indeed, we show the relaxed "for all centers" guarantee

cost(G(P ), Gc) ≥ min{100 opt(P ), (1 − ε) cost(P, c)}   for all c ∈ P.

Note that this is exactly the same as guarantee 2 of Theorem 1.2, and that the two terms in
the min correspond to the aforementioned two extremes, respectively. Specifically, we first specify
a level ℓ and its corresponding radius rℓ . If ∥c − c∗ ∥ > rℓ , then we fall into the second extreme
and show that cost(G(P ), Gc) ≥ 100 opt(P ). Otherwise, ∥c − c∗ ∥ ≤ rℓ , then we handle each p ∈ P
differently, depending on the distance ∥p − c∗ ∥. If ∥p − c∗ ∥ ≥ rℓ , then we use the same argument as
the first extreme — snapping both p and c to Np , bounding the snapping cost, and analyzing the
additive contraction. If ∥p − c∗ ∥ < rℓ , then we snap both p and c to Nℓ . Since ℓ is a fixed level, a
union bound over Nℓ is affordable and we obtain cost(G(P ), Gc) ≥ (1 − ε) cost(P, c) in this case.
where C(p) is the center in C closest to p. Note that (3) is weaker than what we desire in Theorem 1.2,
for the following two reasons. First, the target dimension is worse than the O(ε^{−2}(ddim + log k))
in Theorem 1.2. Second, the left-hand side of (3) can be much larger than cost(G(P ), G(C)), since
the image of C(p) under G (i.e., GC(p)) is not necessarily the nearest neighbor of Gp in G(C).
Nonetheless, the proof of (3) already captures most of our key ideas. At the end of this section, we
briefly discuss how we obtain a sharper target dimension bound as well as a stronger guarantee.
Suppose C ∗ ⊆ P is an optimal solution, which induces a clustering C ∗ = {S1∗ , S2∗ , . . . , Sk∗ }. Our
general proof framework is the same as the k = 1 case — considering the “distance” between C
and C ∗ , if C is “far from” C ∗ , then we show cost(G(P ), G(C)) ≥ 100 opt(P ); otherwise we show
cost(G(P ), G(C)) ≥ (1 − ε) cost(P, C).
However, an immediate issue is how to define that C and C ∗ are far from or close to each other.
For each i ∈ [k], we specify a “threshold level” of cluster Si∗ , denoted by ℓi . We say C is “far from”
C ∗ if there exists i ∈ [k], such that dist(c∗i , C) > 10rℓi . In this case, the cost of connecting B(c∗i , rℓi )
to C is already high. We further prove that cost(G(P ), G(C)) ≥ 100 opt(P ), by careful analysis of
the randomness of G.
Now suppose C is “close to” C ∗ , i.e., ∀i ∈ [k], dist(c∗i , C) ≤ 10rℓi . Our key observation is that
for every p ∈ Si∗ , C(p) should also be close to c∗i , i.e.,
As a natural generalization of the k = 1 case, we lower bound ∥Gp − GC(p)∥ for p ∈ Si∗ differently,
depending on the distances ∥C(p) − c∗i ∥. If ∥C(p) − c∗i ∥ ≥ rℓi , then we snap both p and C(p) to the
(enlarged) net Np . (We can do this since (4) holds.) Otherwise, we snap both p and C(p) to the
(enlarged) net Nℓi . The snapping cost and the distance contraction are bounded similarly to the
k = 1 case. This simply introduces an extra log k factor in the target dimension.
Decoupling ddim from log k. So far, we only obtain an O_ε(ddim log k) bound, instead of
O_ε(ddim + log k). This is due to error accumulation: Recall we handle each (optimal) cluster
S_i^* separately, each of which incurs an O(ε) opt(P ) additive error; hence, we have to rescale ε by
a 1/k factor to compensate for the accumulated error of the k clusters, resulting in an O(ε^{−2} ddim log k)
target dimension (naïvely, that results in Õ(ε^{−2} k^2 ddim) target dimension, but this is avoided by
an easy adaptation).
To decouple these two factors, we need more delicate analysis for the error. For “far” points
p ∈ Si∗ with ∥C(p)−c∗i ∥ ≥ rℓi , the snapping and distortion error is O(ε)∥p−c∗i ∥ in expectation, which
adds up to O(ε) opt(P ) and does not incur any error accumulation. However, the error accumulation
happens for "close" points p with ∥C(p) − c_i^*∥ < r_{ℓ_i}, where the snapping cost within a single cluster
S_i^*, namely Σ_{p∈S_i^*} ∥p − N_ℓ(p)∥, is already O(ε) opt(P ), which accumulates to O(kε) opt(P ).
To reduce the error accumulation, we further divide the close points (i.e., ∥C(p) − c∗i ∥ < rℓi )
into two ranges, namely, the close range ∥C(p) − c∗i ∥ < rℓi /k and the middle range ∥C(p) − c∗i ∥ ∈
[rℓi /k, rℓi ], and handle these two ranges differently. The cost of points in the close range can be
bounded by O(ε/k) opt(P ), which adds up to O(ε) opt(P ). For points in the middle range, we
handle them in a point-by-point manner, at the cost of poly(k) · e^{−Ω(ε^2 t)} per point. Since there are
at most k · O(log k) levels in the middle range, a union bound over all net points at these levels will
be affordable.
Handling nearest neighbor assignment in the target space. Recall that (4) concerns the cost
∥Gp − GC(p)∥, which is the cost in the target space with respect to the nearest neighbor assignment
in the original space. However, what we really need is the nearest neighbor assignment in the target
space. To capture this misalignment between the original and target spaces, we define a mapping f to
be the assignment in the target space, i.e., f (p) is the center in C realizing dist(Gp, G(C)), so that
cost(G(P ), G(C)) = Σ_{p∈P} ∥Gp − Gf (p)∥, and f (p) = C(p) does not hold in general. We attempt
to modify the previous analysis to lower bound each ∥Gp − Gf (p)∥ instead of ∥Gp − GC(p)∥.
To lower bound this distance, we attempt to replace every C(p) with f (p) in our previous proof.
The analysis becomes problematic, as our structural observation (4) no longer holds if we change
C(p) to f (p), and this turns out to be the only place where our analysis does not go through.
To resolve this issue, let us focus on the bad scenario where f (p) is sufficiently far from c∗i , i.e.,
∥f (p) − c∗i ∥ ≫ max{∥p − c∗i ∥, rℓi }. This implies f (p) is also far from p. We further show that
∥Gp − Gf (p)∥ ≫ ∥p − c∗i ∥ by careful analysis of G’s randomness. On the other hand, we have
∥p−C(p)∥ ≤ O(∥p−c∗i ∥) by (4). Therefore, we can directly lower bound ∥Gp−Gf (p)∥ by ∥p−C(p)∥
in this case.
2 Preliminaries
Consider a point set P ⊂ R^d. For every x ∈ R^d, denote by P (x) the point in P closest to
x and dist(x, P ) := ∥x − P (x)∥ (recall that throughout ∥ · ∥ is the Euclidean norm). Denote by
diam(P ) := max{dist(p, q) : p, q ∈ P } the diameter of P . For x ∈ R^d and r > 0, denote by
B(x, r) := {y ∈ R^d : ∥x − y∥ ≤ r} the ball centered at x with radius r. Recall that for k ∈ N and z ≥ 1,
the (k, z)-clustering cost of P w.r.t. a center set C ⊂ R^d, |C| ≤ k, is cost_k^z(P, C) := Σ_{p∈P} dist(p, C)^z.
The optimal discrete (k, z)-clustering cost of P w.r.t. a candidate center set Q ⊂ R^d is denoted
by opt_k^z(P, Q) := min_{C⊆Q, |C|≤k} cost_k^z(P, C), and by opt(P, Q) for short when k, z are clear from the
context. Denote opt(P ) := opt(P, P ) and opt-cont(P ) := opt(P, R^d) for simplicity.
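For small instances, the discrete optimum can be computed by brute force, which is convenient for numerically checking the guarantees discussed in this paper; a sketch (our own, exponential in k):

```python
import numpy as np
from itertools import combinations

def discrete_opt(P, Q, k, z=1):
    """opt_k^z(P, Q): minimum over C ⊆ Q with |C| = k of sum_p dist(p, C)^z
    (taking |C| = k exactly is enough, since extra centers never hurt)."""
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)  # |P| x |Q| distances
    best = np.inf
    for C in combinations(range(len(Q)), k):
        best = min(best, float((D[:, list(C)].min(axis=1) ** z).sum()))
    return best
```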
We use the following generalized triangle inequalities.
Lemma 2.1 (Generalized triangle inequalities [MMR19]). Let (X, dist) be a metric space. Then
for every z ≥ 1, ε ∈ (0, 1) and p, q, r ∈ X,
Our proof uses ρ-nets for doubling sets, whose definition and key properties are described here.
2.2 Dimension reduction
For simplicity, we only consider random linear maps defined by a matrix of iid Gaussians, which
are known to satisfy the JL Lemma [IM98, DG03].
Definition 2.5. A Gaussian JL map is a t × d matrix with i.i.d. entries drawn from N (0, 1t ).
Recall the following concentration bound [IN07, Eq. (7)] (see also [NSIZ21, Eq. (5)]), from
which one can deduce the JL lemma.
Lemma 2.6 ([IN07, Eq. (7)]). Let x ∈ Rd , ε > 0 and a Gaussian JL map G ∈ Rt×d . We have
The following two lemmas regard Gaussian JL maps when applied to doubling sets.
Lemma 2.7 ([IN07, Lemma 4.2]). There exist universal constants A_1, A_2 > 0 such that for every
subset P ⊂ B(⃗0, 1) of the Euclidean unit ball in R^d, t > A_1 · ddim(P ) + 1, D ≥ 10, and a Gaussian
JL map G ∈ R^{t×d},

Pr(∃x ∈ P, ∥Gx∥ > D) ≤ e^{−A_2 t D^2}.
Lemma 2.8 ([HJKY25, Lemma 3.21]). There exist universal constants A_1, A_2, L > 1, such that
for every P ⊂ Rd \ B(⃗0, 1), ε > 0, t > A1 ddim(P ), and a Gaussian JL map G ∈ Rt×d ,
Theorem 3.1. Let ε > 0, z ≥ 1 and d, ddim, k, n ∈ N, and let G ∈ R^{t×d} be a Gaussian JL map with suitable
t = O(z^2 ε^{−2}(ddim log(z/ε) + log k + log log n)). For every n-point set P ⊆ R^d with ddim(P ) ≤ ddim, with
probability at least 2/3,
We use the following lemma to bound the clustering cost of a fixed set of centers and partition
of P . The proof is deferred to Section A.
Lemma 3.2. Let ε > 0, z ≥ 1 and d, k ∈ N and a Gaussian JL map G ∈ Rt×d with suitable
t = O(z 2 ε−2 log ε−1 ). For every set P ⊆ Rd , every set of centers (c1 , . . . , ck ) ⊂ Rd and every
partition P = (S1 , . . . , Sk ) of P , with probability at least 9/10,
Proof of Theorem 3.1. Consider an optimal discrete k-median of P . Denote by C ∗ = {c∗1 , . . . , c∗k } ⊆
P and by S1∗ , . . . , Sk∗ the centers and clusters (respectively) in that solution. Applying Lemma 3.2
to the optimal center set C ∗ and the partition P ∗ = (S1∗ , . . . , Sk∗ ), we have that with probability at
least 9/10,

Pr(∥Gx − Gy∥ > (1 + ε)∥x − y∥) ≤ exp(−ε^2 t/8) ≤ ε^{Ω(ddim(P ))}/(k^2 m^2).

Thus, by a union bound, w.p. at least 9/10,
By another union bound, Equations (5) and (6) hold with probability at least 2/3.
We are now ready to prove the second part of the theorem. Let C = {c_1, . . . , c_k} ⊆ P and let
P = (S_1, . . . , S_k) be a partition of P . For every p ∈ P we denote by u_p the nearest net-point to p in the
level such that P_i \ P_{i+1} contains p, and the radius of that level is denoted r_p. Denote by f (p) the
center in C assigned to p according to the partition P. Recall that C^*(p) is a point in C^* that is
nearest to p. Observe that

Σ_{p∈P} r_p^z ≤ n · (r_0/n^{10})^z + Σ_{j=1}^k Σ_{i=0}^{m−1} Σ_{p∈P_{i,j}\P_{i+1,j}} (2∥p − c_j^*∥)^z = O(2^z) · opt(P ),

and

r_{f(p)}^z ≤ 2^{2z−1}(∥p − f (p)∥^z + ∥p − C^*(p)∥^z).   (7)

Therefore,
cost(G(P), G(C))
 ≡ Σ_{p∈P} ∥Gp − Gf (p)∥^z
 ≥ Σ_{p∈P} [ (1 − zε)∥Gu_p − Gu_{f(p)}∥^z − ε^{−z}∥Gp − Gu_p∥^z − ε^{−z}∥Gf (p) − Gu_{f(p)}∥^z ]            (by Lemma 2.1)
 ≥ Σ_{p∈P} [ (1 − zε)(1 − ε)^z ∥u_p − u_{f(p)}∥^z − ε^{−z}(10ε^3 r_p)^z − ε^{−z}(10ε^3 r_{f(p)})^z ]        (by (5) and (6))
 ≥ Σ_{p∈P} [ (1 − zε)^2 (1 − ε)^z ∥p − f (p)∥^z − O(ε)^z r_p^z − O(ε)^z r_{f(p)}^z ]                        (by Lemma 2.1)
 ≥ Σ_{p∈P} [ (1 − 3zε)∥p − f (p)∥^z − O(ε)^z r_p^z − O(ε)^z 2^{2z−1}(∥p − f (p)∥^z + ∥p − C^*(p)∥^z) ]      (by (7))
Theorem 4.1. Let ε > 0, z ≥ 1 and d, k, s ∈ N and a Gaussian JL map G ∈ Rt×d with suitable
t = O(z 2 ε−2 (log s + z log(z/ε))). For every set P ⊆ Rd and every candidate center set Q ⊆ Rd with
|Q| = s ≥ k, with probability at least 2/3,
The proof of Theorem 4.1 uses the following lemma, whose proof is provided in Section 4.1.
Lemma 4.2. There exists a universal constant A_2 > 1, such that for every P ⊂ R^d, ε > 0, z ≥ 1, k ∈
N, c ∈ R^d, and a Gaussian JL map G ∈ R^{t×d}, with probability 1 − ε^{−O(z)} k^2 e^{−A_2 ε^2 t},

∀P ′ ⊆ P :   Σ_{p∈P ′} ∥Gp − Gc∥^z ≥ (1 − ε)^{3z} Σ_{p∈P ′} ∥p − c∥^z − (ε/k^2) · opt-cont_k^z(P ).
Remark 4.3. There is a similar statement in [MMR19], but w.r.t. the optimal center of P ′ . In
contrast, here the center is fixed.
Proof of Theorem 4.1. The first guarantee is the same as Theorem 3.1, so we omit its proof and
focus on the second guarantee. By Lemma 4.2 and a union bound over Q, we have that with
probability 1 − s · ε^{−O(z)} k^2 e^{−A_2 ε^2 t} ≥ 2/3, all centers c ∈ Q satisfy

∀P ′ ⊆ P :   Σ_{p∈P ′} ∥Gp − Gc∥^z ≥ (1 − ε)^{3z} Σ_{p∈P ′} ∥p − c∥^z − (ε/k^2) · opt-cont_k^z(P ).   (8)
To bypass the O(ε−2 log |Q|) barrier in the target dimension, we consider relaxed contraction,
and prove the following.
Theorem 4.4. Let ε > 0, z ≥ 1, α > 2, and d, ddim, k ∈ N, and let G ∈ R^{t×d} be a Gaussian JL map with suitable
t = O(z^2 ε^{−2}(ddim log(z/ε) + log k + log log α + log log n)). For every n-point set P ⊆ R^d and every
candidate center set Q ⊆ R^d with ddim(P ∪ Q) ≤ ddim, with probability at least 2/3, for every C ⊆ Q with |C| ≤ k,

cost_k^z(G(P ), G(C)) ≥ min{α · opt_k^z(P, Q), (1 − ε) cost_k^z(P, C)}.
Proof. The first guarantee is the same as Theorem 3.1, so we omit its proof and focus on the second
guarantee. Consider an optimal discrete k-median of P . Denote by C^* = {c_1^*, . . . , c_k^*} ⊆ P and by
S_1^*, . . . , S_k^* the centers and clusters (respectively) in that solution. Denote r_0 := opt_k^z(P, Q)^{1/z}. Pick
a suitable m = O(log n) such that 2^m = n^{10}. Let L be the same (sufficiently large) constant
as in Lemma 2.8. For i ∈ [− log(10Lα), m] and j ∈ [k], set r_i = r_0/2^i and P_{ij} = S_j^* ∩ B(c_j^*, r_i), i.e.,
for every cluster, we have a sequence of geometrically decreasing balls. Additionally, let N_i be an
ε^3 r_i-net of ∪_j P_{ij}. As in the proof of Theorem 3.1, we have Σ_{p∈P} r_p^z = O(2^z) opt.
By Lemmas 2.6 to 2.8 and a union bound, the following hold with probability at least 2/3,
Equations (9) and (10) are the same as Equations (5) and (6), and hold with probability at least
9/10. Equations (11) and (12) each hold w.p. 9/10 directly by Lemmas 2.7 and 2.8, respectively.
A union bound yields the desired success probability > 2/3.
We are now ready to prove the theorem. Let C = {c_1, . . . , c_k} ⊆ Q and let P = (S_1, . . . , S_k)
be a partition of P . For p ∈ P , denote by f (p) ∈ C the center to which p is assigned. Consider
the following cases.

Case 1, ∃j ∈ [k] s.t. ∥c_j − C^*(c_j)∥ ≥ 10Lα · r_0 and S_j ⊈ {c_j}. By assumption, there exists a point
p ∈ S_j, p ≠ c_j. Then ∥c_j − C^*(p)∥ ≥ ∥c_j − C^*(c_j)∥ ≥ 10Lα · r_0. By (12), ∥Gc_j − GC^*(p)∥ ≥ 10αr_0.
On the other hand, ∥p − C^*(p)∥ ≤ r_0. By (11), ∥Gp − GC^*(p)∥ ≤ 10r_0. Hence,
Case 2, ∀j ∈ [k], ∥c_j − C^*(c_j)∥ ≤ 10Lα · r_0 or S_j ⊆ {c_j}. Without loss of generality, we can
assume that for all j ∈ [k], S_j ⊈ {c_j}. This is since whenever S_j ⊆ {c_j}, we have cost(S_j, c_j) =
cost(G(S_j), Gc_j) = 0.
Therefore, every center in C is covered by the union of nets ∪_i N_i. For every p ∈ P ∪ Q we
denote by u_p the nearest net-point to p in the level such that P_i \ P_{i+1} contains p, and the radius
of that level is denoted r_p. As in the proof of Theorem 3.1, we are able to establish (7) for every
r_{f(p)}. Then
cost(G(P), G(C))
 ≡ Σ_{p∈P} ∥Gp − Gf (p)∥^z
 ≥ Σ_{p∈P} [ (1 − zε)∥Gu_p − Gu_{f(p)}∥^z − ε^{−z}∥Gp − Gu_p∥^z − ε^{−z}∥Gf (p) − Gu_{f(p)}∥^z ]            (by Lemma 2.1)
 ≥ Σ_{p∈P} [ (1 − zε)(1 − ε)^z ∥u_p − u_{f(p)}∥^z − ε^{−z}(10ε^3 r_p)^z − ε^{−z}(10ε^3 r_{f(p)})^z ]        (by (9) and (10))
 ≥ Σ_{p∈P} [ (1 − zε)^2 (1 − ε)^z ∥p − f (p)∥^z − O(ε)^z r_p^z − O(ε)^z r_{f(p)}^z ]                        (by Lemma 2.1)
 ≥ Σ_{p∈P} [ (1 − 3zε)∥p − f (p)∥^z − O(ε)^z r_p^z − O(ε)^z 2^{2z−1}(∥p − f (p)∥^z + ∥p − C^*(p)∥^z) ]      (by (7))
Our proof of Lemma 4.2 is based on [Dan21], and is by reducing to the central symmetric case.
We say a point set X ⊂ Rd is central symmetric with center c ∈ Rd , if for every point x ∈ X, it
holds 2c − x ∈ X. The following lemma shows that the (continuous) 1-median center of a central
symmetric point set coincides with its center of symmetry.
Lemma 4.5. Let z ≥ 1 and X ⊂ Rd be a central symmetric point set centered at point c ∈ Rd .
Then c is an optimal (continuous) (1, z)-clustering center of X.
Denote opt-contzk (P ) as the optimal continuous (k, z)-clustering value of P . The following lemma
is a restatement of [MMR19, Theorem 3.4].
Lemma 4.6 ([MMR19, Theorem 3.4]). Consider a point set X ⊂ R^d. Let G be a random linear
map and C be a random subset of X (which may depend on G). Then with probability at least
1 − O(ε^{−O(z)} k^2 e^{−Ω(ε^2 t)}),

opt-cont_1^z(G(C)) ≥ (1 − ε)^{3z} opt-cont_1^z(C) − (ε/k^2) opt-cont_k^z(X).
Proof of Lemma 4.2. Let P̃ ⊆ P be a subset that maximizes

(1 − ε)^{3z} Σ_{p∈P̃} ∥p − c∥^z − Σ_{p∈P̃} ∥Gp − Gc∥^z.
Lemma 5.2. Σ_{p∈P} r_p^z ≤ 2^z opt_k^z(P, Q).
For C ⊆ Q and p ∈ P , recall we denote by C(p) the point closest to p in C. We have the
following lemma that upper bounds the distance from C(p) to C ∗ (p) (and also the distance from
C(p) to p).
Lemma 5.3. Let C ⊆ Q. Then for every i ∈ [k] and p ∈ Si∗ , it holds that ∥C(p) − c∗i ∥ ≤
4 max{rp , ∥c∗i − C(c∗i )∥}.
Proof.
Proof of Theorem 5.1. The first guarantee is the same as Theorem 3.1, so we omit its proof and
focus on the second guarantee. For a generic solution C ⊆ Q, |C| = k, denote C = {c1 , c2 , . . . , ck }.
Denote f (p) := G−1 (GC(Gp)), i.e., f (p) is a center in C realizing dist(Gp, G(C)). For j ∈ [k],
denote Sj := {p ∈ P : f (p) = cj } as the cluster induced by cj .
For every i ∈ [k], define the “threshold level” of cluster i as
We also define the i-th “buffer” as Ii := [ℓi − log(2000L2 ), ℓi + log(αk)], where L is the (sufficiently
large) constant in Lemma 2.8.
For 0 ≤ ℓ ≤ m, denote random variable βℓ to be the minimum real, such that ∀u, v ∈ Nℓ , ∥Gu −
Gv∥ ≥ (1 − ε − βℓ ε)∥u − v∥. Denote random variable γℓ to be the minimum real, such that
∀u ∈ Nℓ , v ∈ B(u, ε3 rℓ ), ∥Gu − Gv∥ ≤ γℓ ε3 rℓ . For p ∈ P ∪ Q, write βp := βjp and γp := γjp for
simplicity.
In the following lemma, we define our good events and bound their success probability. The
proof is deferred to Section 5.1.
Lemma 5.4. With probability at least 0.99, the following events happen simultaneously.
(a) Σ_{p∈P} β_p r_p^z ≤ e^{−Ω(ε^2 t)} · opt(P, Q), and Σ_{p∈P} γ_p^z r_p^z ≤ 10^z · O(opt(P, Q)).
(e) ∀i ∈ [k], ∀y ∈ (P ∪ Q) \ B(c∗i , 2000L2 · rℓi ), ∥Gy − Gc∗i ∥ > 2000L · rℓi .
(f ) For p ∈ P , denote by ξ_p the random variable ξ_p := min_{y : ∥y−p∥ > 9L·r_{ℓ_i}} ∥Gy − Gp∥. Then ∀i ∈ [k],

Σ_{p∈P_{ℓ_i}^i} ξ_p^z > α opt(P, Q).
Case 1, one cluster with no cover: max1≤i≤k {∥c∗i − C(c∗i )∥ − 10L · rℓi } > 0. Then there exists
i ∈ [k], such that ∥c∗i − C(c∗i )∥ > 10L · rℓi . Intuitively, this means all points in C are far away from
c∗i . Write
cost(G(P ), G(C)) ≥ cost(G(P_{ℓ_i}^i), G(C)) = Σ_{p∈P_{ℓ_i}^i} ∥Gp − Gf (p)∥^z.   (14)
∥p − f (p)∥ ≥ ∥p − C(p)∥
≥ ∥C(p) − c∗i ∥ − ∥p − c∗i ∥
≥ ∥c∗i − C(c∗i )∥ − ∥p − c∗i ∥
> 10L · rℓi − rp
≥ 9L · rℓi .
Case 2, max1≤i≤k {∥c∗i − C(c∗i )∥ − 10L · rℓi } ≤ 0. Then for every i ∈ [k], ∥c∗i − C(c∗i )∥ ≤ 10L · rℓi ,
which intuitively means every center in C ∗ has a nearby neighbor in C.
Comparing “fake” centers to optimal centers. Let i ∈ [k]. For every p ∈ Si∗ , we consider
the distance of p’s “fake” center f (p) (recall, Gf (p) realizes dist(Gp, G(C))) from p’s optimal center
c∗i . There are three ranges we consider for ∥f (p) − c∗i ∥.
Define R_i := {p ∈ S_i^* : r_{ℓ_i}/(αk) ≤ ∥f (p) − c_i^*∥ ≤ 2000L^2 · r_{ℓ_i}}, and denote R := ∪_{i=1}^k R_i
(called "the middle range"). Moreover, define T_i := {p ∈ S_i^* : ∥f (p) − c_i^*∥ ≤ r_{ℓ_i}/(αk)}, and denote
T := ∪_{i=1}^k T_i (called "the close range").
Case 2.1, the middle range p ∈ R. Let us first lower bound ∥Gp − Gf (p)∥ for p ∈ R. Assume
C ∗ (p) = c∗i and f (p) = cj , where i, j ∈ [k]. Since p ∈ Ri , we can assume rℓ+1 < ∥cj − c∗i ∥ ≤ rℓ for
some level ℓ ∈ Ii . Let ui,j be the net point in Nℓ closest to cj . Then
∥Gp − Gf (p)∥^z ≥ (1 − zε)∥Gp − Gu_{i,j}∥^z − ε^{−z}∥Gc_j − Gu_{i,j}∥^z        (by Lemma 2.1)
              ≥ (1 − zε)∥Gp − Gu_{i,j}∥^z − ε^{−z}(10ε^3 r_ℓ)^z               (by event (b))
              ≥ (1 − zε)∥Gp − Gu_{i,j}∥^z − O(ε)^{2z} ∥c_j − c_i^*∥^z
Case 2.2, the close range p ∈ T . This is somewhat of a special case of Case 2.1. Assume
C^*(p) = c_i^* and f (p) = c_j, where i, j ∈ [k]. Since p ∈ T_i, we have ∥c_j − c_i^*∥ ≤ r_ℓ for ℓ = ℓ_i + log(αk).
Let u_{i,j} be the net point in N_ℓ closest to c_j. We have

∥Gp − Gf (p)∥^z ≥ (1 − zε)∥Gp − Gu_{i,j}∥^z − ε^{−z}∥Gc_j − Gu_{i,j}∥^z
              ≥ (1 − zε)∥Gp − Gu_{i,j}∥^z − O(ε)^{2z} r_ℓ^z.

If p ∉ B(c_i^*, r_{ℓ_i+1}), then

r_ℓ ≤ (1/(2k))∥p − c_i^*∥ ≤ (1/(2k))(∥p − c_j∥ + ∥c_j − c_i^*∥) ≤ (1/(2k))(∥p − c_j∥ + r_ℓ).

Rearranging, we obtain r_ℓ ≤ ∥p − c_j∥. Summing over p ∈ T , we have

Σ_{p∈T} ∥Gp − Gf (p)∥^z
 = Σ_{i=1}^k Σ_{j=1}^k Σ_{p∈T_i∩S_j} ∥Gp − Gc_j∥^z
 ≥ Σ_{i=1}^k Σ_{j=1}^k Σ_{p∈T_i∩S_j} [ (1 − zε)∥Gp − Gu_{i,j}∥^z − O(ε)^{2z}∥p − c_j∥^z − O(ε)^{2z} |P ∩ B(c_i^*, r_{ℓ_i+1})| · r_ℓ^z ]
 ≥ Σ_{i=1}^k Σ_{j=1}^k Σ_{p∈T_i∩S_j} [ (1 − zε)∥Gp − Gu_{i,j}∥^z − O(ε)^{2z}∥p − c_j∥^z ] − O(ε opt)      (by choice of ℓ_i)
Case 2.3, the far range p ∉ R ∪ T . We now consider points p ∈ S_i^* \ (R ∪ T ), i.e., ∥f (p) − c_i^*∥ ≥
2000L^2 r_{ℓ_i}. Suppose f (p) = c_j. By (e), ∥Gc_j − Gc_i^*∥ ≥ 2000L r_{ℓ_i}.

Claim. Every such p satisfies r_p ≥ 10L r_{ℓ_i}.

Proof. Assume by contradiction that r_p < 10L r_{ℓ_i}. By Lemma 5.3, ∥C(p) − c_i^*∥ ≤ 4 max{r_p, ∥c_i^* −
C(c_i^*)∥} ≤ 40L r_{ℓ_i}. Thus by (d), Gp, GC(p) ∈ B(Gc_i^*, 400L r_{ℓ_i}). Therefore,

∥Gc_j − Gc_i^*∥ ≤ ∥Gc_j − Gp∥ + ∥Gp − Gc_i^*∥ ≤ ∥GC(p) − Gp∥ + ∥Gp − Gc_i^*∥ ≤ 800L r_{ℓ_i},

a contradiction.
On a high level, as can be seen from the claim, both f (p) and p are far from c_i^*. We
split into cases depending on which of p or f (p) is farther from c_i^* (up to a constant), as follows.
Case 2.3.1, p ∈ S_i^* \ (R ∪ T ), and ∥f (p) − c_i^*∥ > 10Lr_p. By the triangle inequality,
∥p − f (p)∥ ≥ ∥f (p) − c_i^*∥ − ∥p − c_i^*∥ > 10Lr_p − r_p ≥ 9Lr_p. Hence

Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥>10Lr_p} ∥Gp − Gf (p)∥^z
 ≥ Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥>10Lr_p} η_p^z                                          (since ∥p − f (p)∥ ≥ 9Lr_p)
 ≥ Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥>10Lr_p} (9r_p)^z − e^{−Ω(t)} · Σ_{p∈S_i^*} r_p^z        (by event (g))
 ≥ Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥>10Lr_p} (5r_p)^z − e^{−Ω(t)} · Σ_{p∈S_i^*} r_p^z
 ≥ Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥>10Lr_p} ∥p − C(p)∥^z − e^{−Ω(t)} · Σ_{p∈S_i^*} r_p^z    (by (17))   (18)
Case 2.3.2, p ∈ S_i^* \ (R ∪ T ), and ∥f (p) − c_i^*∥ ≤ 10Lr_p. Denote by u_p and u_{f(p)} the net points
in N_{j_p} that are closest to p and f (p), respectively. Then

∥Gp − Gf (p)∥^z
 ≥ (1 − 2zε)∥Gu_p − Gu_{f(p)}∥^z − ε^{−z}∥Gp − Gu_p∥^z − ε^{−z}∥Gf (p) − Gu_{f(p)}∥^z    (by the triangle inequality)
 ≥ (1 − 2zε)(1 − ε − β_p ε)^z ∥u_p − u_{f(p)}∥^z − 2ε^{−z}(γ_p ε^3 r_p)^z                 (by the definitions of β_p, γ_p)
 ≥ (1 − 3zε − β_p zε)∥u_p − u_{f(p)}∥^z − O(ε)^{2z} γ_p^z r_p^z
 ≥ (1 − 3zε − β_p zε)∥p − f (p)∥^z − O(ε)^{2z} r_p^z − O(ε)^{2z} γ_p^z r_p^z.

Since ∥p − f (p)∥ ≤ ∥p − c_i^*∥ + ∥f (p) − c_i^*∥ ≤ r_p + 10Lr_p ≤ 20Lr_p, we have

∥Gp − Gf (p)∥^z ≥ (1 − 3zε)∥p − f (p)∥^z − β_p zε · (20L)^z r_p^z − O(ε)^{2z} r_p^z − O(ε)^{2z} γ_p^z r_p^z.

Therefore,

Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥≤10Lr_p} ∥Gp − Gf (p)∥^z
 ≥ (1 − 3zε) Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥≤10Lr_p} ∥p − f (p)∥^z − zε(20L)^z Σ_{p∈S_i^*} β_p r_p^z − O(ε)^{2z} Σ_{p∈S_i^*} (1 + γ_p^z) r_p^z.   (19)
 ≥ (1 − 3zε) Σ_{p∈P\(R∪T )} ∥p − C(p)∥^z − zε(20L)^z e^{−Ω(ε^2 t)} · opt(P, Q) − O(ε)^{2z} · opt(P, Q)
 ≥ (1 − 3zε) Σ_{p∈P\(R∪T )} ∥p − C(p)∥^z − O(ε) · opt(P, Q),   (20)
where the second last inequality follows from event (a) and Lemma 5.2. Finally, we combine (15),(16)
and (20) and obtain
(e) ∀i ∈ [k], ∀y ∈ (P ∪ Q) \ B(c_i^*, 2000L^2 · r_{ℓ_i}), ∥Gy − Gc_i^*∥ > 2000L · r_{ℓ_i}.

(f ) For p ∈ P , denote by ξ_p the random variable ξ_p := min_{y : ∥y−p∥ > 9L·r_{ℓ_i}} ∥Gy − Gp∥. Then ∀i ∈ [k],

Σ_{p∈P_{ℓ_i}^i} ξ_p^z > α opt(P, Q).
Proof. It suffices to show that each of the events happens with probability at least 0.999, then a
union bound concludes the proof.
Event (a). We first show that ∀ℓ ∈ [m], E(β_ℓ) = e^{−Ω(ε^2 t)}. Recall we define β_ℓ to be the minimum
real such that ∀u, v ∈ N_ℓ, ∥Gu − Gv∥ ≥ (1 − ε − β_ℓ ε)∥u − v∥. Then for every h ≥ 0, we have

Pr(β_ℓ > h) ≤ Pr(∃u, v ∈ N_ℓ, ∥Gu − Gv∥ < (1 − (h + 1)ε)∥u − v∥)
           ≤ ε^{−O(ddim)} e^{−(h+1)^2 ε^2 t/8}
           ≤ e^{−a·ε^2 t(h+1)^2}.

The last inequality holds since the target dimension t = Ω(ε^{−2} ddim log ε^{−1}), and thus a is a constant.
Therefore,

E(β_ℓ) = ∫_0^{+∞} Pr(β_ℓ > h) dh ≤ ∫_0^{+∞} e^{−aε^2 t(h+1)^2} dh = ∫_1^{+∞} e^{−aε^2 th^2} dh ≤ ∫_1^{+∞} h e^{−aε^2 th^2} dh = (1/(2aε^2 t)) e^{−aε^2 t}.

Hence,

E(Σ_{p∈P} β_p r_p^z) = Σ_{p∈P} r_p^z · E(β_p) = e^{−Ω(ε^2 t)} · Σ_{p∈P} r_p^z = e^{−Ω(ε^2 t)} 2^z · opt(P, Q) ≤ e^{−Ω(ε^2 t)} · opt(P, Q).
Similarly, recall that γ_ℓ is the minimum real such that ∀u ∈ N_ℓ, v ∈ B(u, ε^3 r_ℓ), ∥Gu − Gv∥ ≤ γ_ℓ ε^3 r_ℓ. Then for every h > 10, we have

Pr(γ_ℓ > h) ≤ Pr(∃u ∈ N_ℓ, v ∈ B(u, ε^3 r_ℓ), ∥Gu − Gv∥ > hε^3 r_ℓ)
           ≤ ε^{−O(ddim)} e^{−A_3 h^2 t}            (by Lemma 2.7)
           ≤ e^{−bth^2},

Hence,

E(Σ_{p∈P} γ_p^z r_p^z) = Σ_{p∈P} r_p^z · E(γ_p^z) = O(10^z) · Σ_{p∈P} r_p^z ≤ O(10^z) · opt(P, Q).
Event (b). Fix i ∈ [k], level ℓ ∈ Ii and a net point u ∈ Nℓ . By Lemma 2.7, Pr(∃v ∈
B(u, ε3 rℓ ), ∥Gu − Gv∥ > 10ε3 rℓ ) ≤ e−Ω(t) . Noting that |Ii | = O(log(αk)), a union bound over all
k · (log k + log α) · ε−O(ddim) tuples (i, ℓ, u) completes the proof.
Event (c). Fix a point u ∈ P . By Lemma 4.2, with probability 1 − ε^{−O(z)} k^2 e^{−Ω(ε^2 t)},

∀P ′ ⊆ P :   Σ_{p∈P ′} ∥Gp − Gu∥^z ≥ (1 − ε)^{3z} Σ_{p∈P ′} ∥p − u∥^z − (ε/k^2) opt-cont_k^z(P )
                                  ≥ (1 − ε)^{3z} Σ_{p∈P ′} ∥p − u∥^z − (ε/k^2) opt(P, Q).

A union bound over all k · (log k + log α) · ε^{−O(ddim)} tuples (i, ℓ, u) completes the proof.
Event (d). For every i ∈ [k], by Lemma 2.7, Pr(∃y ∈ B(c∗i , 40L · rℓi ), ∥Gy − Gc∗i ∥ > 400L · rℓi ) ≤
e−Ω(t) . A union bound over i ∈ [k] completes the proof.
Event (e). For every i ∈ [k], by Lemma 2.8, Pr(∃y ∈ (P ∪ Q) \ B(c∗i , 2000L2 · rℓi ), ∥Gy − Gc∗i ∥ ≤
2000L · rℓi ) ≤ e−Ω(t) . A union bound over i ∈ [k] completes the proof.
By Markov’s inequality and a union bound over i ∈ [k], with probability 0.999,
X X X
ξp ≥ (9rℓi )z − O(k · e−A2 t ) · (9rℓi )z ≥ 8|Pℓii | · rℓzi > α opt(P, Q), ∀i ∈ [k].
p∈Pℓi p∈Pℓi p∈Pℓi
i i i
Event (g). By Lemma 2.8, we have Pr(η_p < 9r_p) ≤ e^{−A_2 t}. Hence,
By Markov's inequality and a union bound over i ∈ [k], with probability 0.999,

Σ_{p∈S_i^*} max{0, 9r_p − η_p} ≤ O(k · e^{−A_2 t}) · Σ_{p∈S_i^*} 9^z r_p^z ≤ e^{−Ω(t)} Σ_{p∈S_i^*} r_p^z,   ∀i ∈ [k].
6 Lower bounds
In this section, we provide our lower bounds. For simplicity, we do not try to optimize the dependence
on z. All lower bounds are presented for z = 1.
Theorem 6.1. Let n, d ∈ N, and P = {0_d}^n. Let G ∈ R^{(d−1)×d} be any linear map. Then there
exists c ∈ R^d such that Σ_{p∈P} ∥Gp − Gc∥ = 0 and Σ_{p∈P} ∥p − c∥ = n.
Proof. Take a unit length vector c ∈ ker(G), i.e., Gc = 0 and ∥c∥ = 1. The proof follows immediately.
Theorem 6.2. Let n, d ∈ N and ε ∈ (0, 1/2). There exists P ⊂ R^d of size |P | = n and ddim(P ) =
Θ(1), such that if G is a Gaussian JL map onto dimension t ≤ aε^{−2} log log n for a sufficiently small
constant a > 0, then with probability at least 2/3, there exists c ∈ P such that Σ_{p∈P} ∥Gp − Gc∥ ≤
(1 − ε) Σ_{p∈P} ∥p − c∥.
To prove the theorem, we will use the following lemma, whose proof appears in Section B for
completeness (see [ZZ20] for a stronger bound).
Lemma 6.3. Let t > 2 be an integer, let X_t be a chi-squared random variable with t degrees of
freedom, and let ε ∈ (0, 1/2). We have

Pr(X_t < t/(1 + ε)) ≥ e^{−O(ε^2 t)}/t.
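This lower-tail behavior is easy to verify numerically; the short check below (our own, using scipy, with arbitrarily chosen t, ε, and constant 1 in the exponent) compares the exact CDF value with the e^{−O(ε²t)}/t shape.

```python
import numpy as np
from scipy.stats import chi2

t, eps = 64, 0.2
lhs = chi2.cdf(t / (1 + eps), df=t)    # Pr(X_t < t / (1 + eps))
rhs = np.exp(-eps ** 2 * t) / t        # the e^{-O(eps^2 t)} / t shape
print(lhs, rhs)                        # lhs comfortably exceeds rhs for these parameters
```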
Proof of Theorem 6.2. Denote by e_i the i-th standard basis vector. Pick P = {2^{−i} e_i}_{i=0}^{(1/2) log n} ∪
{0_d}^{n − (1/2) log n}. For each i ∈ [0, (1/2) log n], by Lemma 6.3,

Pr(∥Ge_i∥ ≤ 1 − ε) ≥ e^{−O(ε^2 t)}/t ≥ 10/ log n.

Therefore, the probability that for all i ∈ [0, (1/2) log n], we have ∥Ge_i∥ > 1 − ε, is at most
(1 − 10/ log n)^{(1/2) log n} ≤ 1/10. Suppose this event does not happen, therefore there exists i^* ∈ [0, (1/2) log n]
such that ∥Ge_{i^*}∥ ≤ 1 − ε. Moreover, assume that max_{i∈[0,(1/2) log n]} ∥Ge_i∥ ≤ log log n, which holds with high
probability. Pick c = 2^{−i^*} e_{i^*}. We have

Σ_{p∈P} ∥Gp − Gc∥ ≤ (n − (1/2) log n) · ∥Gc∥ + (1/2) log n · 2 log log n
                  ≤ (1 − ε)(n − (1/2) log n) 2^{−i^*} + log n · log log n
                  ≤ (1 − ε/2)(n − (1/2) log n) 2^{−i^*},

where we used 2^{−i^*} ≥ 2^{−(1/2) log n} = 1/√n, so for sufficiently large n, we have log n · log log n ≤
(ε/(2√n))(n − (1/2) log n). Moreover,

Σ_{p∈P} ∥p − c∥ ≥ (n − (1/2) log n) 2^{−i^*}.

The proof follows by rescaling ε and combining the two inequalities.
Therefore, the probability that for all i ∈ [0, (1/2) log n], we have ∥Ge_i∥ > 1 − ε, is at most
(1 − 10/ log n)^{(1/2) log n} ≤ 1/10. Suppose this event does not happen, therefore there exists i^* ∈ [0, (1/2) log n]
such that ∥Ge_{i^*}∥ ≤ 1 − ε. Moreover, assume that Σ_{i∈[0,(1/2) log n]} ∥Ge_i∥(1 − ε)^i ≤ (1 + ε) Σ_{i∈[0,(1/2) log n]} (1 − ε)^i
= (1 + ε) opt, which holds with high probability by [Dan21, Theorem A.3.1] since t = Ω(ε^{−2})
(alternatively, one can use Lemma 4.6 with k = 1, and get that this bound holds w.h.p. if
t = Ω(ε^{−2} log(1/ε))). Pick c_1 = 0, c_2 = (1 − ε)^{i^*} e_{i^*} and P_2 = {0_d}^{50ε^{−1}·(1−ε)^{−i^*}}, P_1 = P \ P_2. (Note
that 50ε^{−1} · (1 − ε)^{−i^*} ≤ 50ε^{−1} · 2^{(1/2) log n} ≤ n/2, so this assignment is feasible.) We have

Σ_{i∈{1,2}} Σ_{p∈P_i} ∥p − c_i∥ = (1 − ε)^{i^*} · 50ε^{−1} · (1 − ε)^{−i^*} + Σ_{i≠i^*} (1 − ε)^i ≥ 51ε^{−1} − 2;

and

Σ_{i∈{1,2}} Σ_{p∈P_i} ∥Gp − Gc_i∥ ≤ (1 − ε)^{i^*} · 50ε^{−1} · (1 − ε)^{−i^*} · (1 − ε) + Σ_{i≠i^*} ∥Ge_i∥(1 − ε)^i
 ≤ (1 − ε)50ε^{−1} + (1 + ε) Σ_{i∈[0,(1/2) log n]} (1 − ε)^i
 ≤ (1 − ε)50ε^{−1} + (1 + ε)ε^{−1}
 = 51ε^{−1} − 49,

which is smaller than both Σ_{i∈{1,2}} Σ_{p∈P_i} ∥p − c_i∥ and 100 opt = 100ε^{−1}, concluding the proof.
6.4 Discrete, for all centers, with candidate center set
Theorem 6.5. Let n, s, d ∈ N and ε ∈ (0, 1/2). There exist P, Q ⊂ R^d of sizes |P | = n, |Q| = s,
and ddim(P ∪ Q) = O(1), such that if G is a Gaussian JL map onto dimension t ≤ aε^{−2} log s for
a sufficiently small constant a > 0, then with probability at least 2/3, there exists c ∈ Q such that
Σ_{p∈P} ∥Gp − Gc∥ ≤ (1 − ε) Σ_{p∈P} ∥p − c∥.
Proof. Consider P = {0_d}^n and Q = {2^i e_i}_{i=1}^s. For each i ∈ [s], by Lemma 6.3,

Pr(∥Ge_i∥ ≤ 1 − ε) ≥ e^{−O(ε^2 t)}/t ≥ 10/s.

With probability at least 1 − (1 − 10/s)^s ≥ 1 − e^{−10}, there exists i^* ∈ [s] such that ∥Ge_{i^*}∥ ≤ 1 − ε.
Pick c = 2^{i^*} e_{i^*}. Then

cost(G(P ), Gc) = n · 2^{i^*} ∥Ge_{i^*}∥ ≤ n · 2^{i^*} · (1 − ε) = (1 − ε) cost(P, c).
References
[BBC+ 19] Luca Becchetti, Marc Bury, Vincent Cohen-Addad, Fabrizio Grandoni, and Chris
Schwiegelshohn. Oblivious dimension reduction for k-means: beyond subspaces and
the Johnson-Lindenstrauss Lemma. In Proceedings of the 51st Annual ACM SIGACT
Symposium on Theory of Computing, STOC, pages 1039–1050, 2019. doi:10.1145/33
13276.3316318.
[BRS11] Yair Bartal, Ben Recht, and Leonard J Schulman. Dimensionality reduction: beyond the
johnson-lindenstrauss bound. In Proceedings of the twenty-second annual ACM-SIAM
symposium on Discrete Algorithms, pages 868–887. SIAM, 2011.
[BZD10] Christos Boutsidis, Anastasios Zouzias, and Petros Drineas. Random projections for k-
means clustering. In 24th Annual Conference on Neural Information Processing Systems,
NeurIPS, pages 298–306. Curran Associates, Inc., 2010. URL: https://proceedings.ne
urips.cc/paper/2010/hash/73278a4a86960eeb576a8fd4c9ec6997-Abstract.html.
[CEM+ 15] Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina
Persu. Dimensionality reduction for k-means clustering and low rank approximation. In
Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing,
STOC, pages 163–172, 2015. doi:10.1145/2746539.2746569.
[CEMN22] Vincent Cohen-Addad, Hossein Esfandiari, Vahab Mirrokni, and Shyam Narayanan.
Improved approximations for Euclidean k-means and k-median, via nested quasi-
independent sets. In Proceedings of the 54th Annual ACM SIGACT Symposium on
Theory of Computing, pages 1621–1628, 2022.
[CFS21] Vincent Cohen-Addad, Andreas Emil Feldmann, and David Saulpic. Near-linear time
approximation schemes for clustering in doubling metrics. Journal of the ACM, 68(6):1–
34, 2021. doi:10.1145/3477541.
[CGL+ 25] Vincent Cohen-Addad, Fabrizio Grandoni, Euiwoong Lee, Chris Schwiegelshohn, and
Ola Svensson. A (2+ ε)-approximation algorithm for metric k-median. In Proceedings
of the 57th Annual ACM Symposium on Theory of Computing, pages 615–624, 2025.
[CJK23] Xiaoyu Chen, Shaofeng H.-C. Jiang, and Robert Krauthgamer. Streaming Euclidean
Max-Cut: Dimension vs data reduction. In Proceedings of the 55th Annual ACM
Symposium on Theory of Computing, STOC, pages 170–182, 2023. doi:10.1145/3564
246.3585170.
[CK19] Vincent Cohen-Addad and C. S. Karthik. Inapproximability of clustering in Lp metrics.
In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS),
pages 519–539. IEEE, 2019. doi:10.1109/FOCS.2019.00040.
[CKL22] Vincent Cohen-Addad, C. S. Karthik, and Euiwoong Lee. Johnson coverage hypothesis:
Inapproximability of k-means and k-median in ℓp -metrics. In Proceedings of the 2022
Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1493–1530.
SIAM, 2022.
[CNW16] Michael B. Cohen, Jelani Nelson, and David P. Woodruff. Optimal approximate ma-
trix product in terms of stable rank. In 43rd International Colloquium on Automata,
Languages, and Programming (ICALP 2016), volume 55 of LIPIcs, pages 11:1–11:14.
Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPICS.ICALP
.2016.11.
[CW25] Moses Charikar and Erik Waingarten. The Johnson-Lindenstrauss Lemma for clustering
and subspace approximation: From coresets to dimension reduction. In SODA, pages
3172–3209. SIAM, 2025. doi:10.1137/1.9781611978322.102.
[Dan21] Matan Danos. Coresets for clustering by uniform sampling and generalized rank aggre-
gation. Master’s thesis, Weizmann Institute of Science, Rehovot, Israel, 2021. URL:
https://www.wisdom.weizmann.ac.il/~robi/files/MatanDanos-MScThesis-202
1_11.pdf.
[DG03] Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson
and Lindenstrauss. Random Struct. Algorithms, 22(1):60–65, 2003. doi:10.1002/rsa.
10073.
[GJK+ 25] Jie Gao, Rajesh Jayaram, Benedikt Kolbe, Shay Sapir, Chris Schwiegelshohn, Sandeep
Silwal, and Erik Waingarten. Randomized dimensionality reduction for Euclidean maxi-
mization and diversity measures. In Forty-second International Conference on Machine
Learning, 2025. URL: https://openreview.net/forum?id=Rcivp36KzO.
[GK15] Lee-Ad Gottlieb and Robert Krauthgamer. A nonlinear approach to dimension reduction.
Discrete & Computational Geometry, 54(2):291–315, 2015. doi:10.1007/s00454-015
-9707-9.
[GKL03] Anupam Gupta, Robert Krauthgamer, and James R. Lee. Bounded geometries, fractals,
and low-distortion embeddings. In 44th Symposium on Foundations of Computer Science,
FOCS, pages 534–543. IEEE Computer Society, 2003. doi:10.1109/SFCS.2003.1238
226.
[HJKY25] Lingxiao Huang, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Di Yue. Near-optimal
dimension reduction for facility location. In Proceedings of the 57th Annual ACM
Symposium on Theory of Computing, STOC, pages 665–676, 2025. doi:10.1145/3717
823.3718214.
[HM04] Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median cluster-
ing. In STOC, pages 291–300. ACM, 2004. doi:10.1145/1007352.1007400.
[IM98] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing
the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on
the Theory of Computing, STOC, pages 604–613, 1998. doi:10.1145/276698.276876.
[IN07] Piotr Indyk and Assaf Naor. Nearest-neighbor-preserving embeddings. ACM Trans.
Algorithms, 3(3):31, 2007. doi:10.1145/1273340.1273347.
[ISZ21] Zachary Izzo, Sandeep Silwal, and Samson Zhou. Dimensionality reduction for Wasser-
stein barycenter. Advances in neural information processing systems, 34:15582–15594,
2021.
[JKS24] Shaofeng H.-C. Jiang, Robert Krauthgamer, and Shay Sapir. Moderate dimension
reduction for k-center clustering. In 40th International Symposium on Computational
Geometry (SoCG 2024), volume 293 of Leibniz International Proceedings in Informatics
(LIPIcs), pages 64:1–64:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024.
doi:10.4230/LIPIcs.SoCG.2024.64.
[JL84] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into
Hilbert space. Contemporary mathematics, 26:189–206, 1984. doi:10.1090/conm/026
/737400.
[KR09] Leonard Kaufman and Peter J Rousseeuw. Finding groups in data: an introduction to
cluster analysis. John Wiley & Sons, 2009.
[Lam10] Christiane Lammersen. Approximation Techniques for Facility Location and Their
Applications in Metric Embeddings. PhD thesis, Technische Universität Dortmund, 2010.
doi:10.17877/DE290R-8506.
[LN17] Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson-Lindenstrauss
Lemma. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS,
pages 633–638, 2017. doi:10.1109/FOCS.2017.64.
[LRU20] Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive data
sets. Cambridge university press, 2020.
[LSS09] Christiane Lammersen, Anastasios Sidiropoulos, and Christian Sohler. Streaming embed-
dings with slack. In 11th International Symposium on Algorithms and Data Structures,
WADS, volume 5664 of Lecture Notes in Computer Science, pages 483–494. Springer,
2009. doi:10.1007/978-3-642-03367-4\_42.
[Mah11] Michael W. Mahoney. Randomized algorithms for matrices and data. Foundations and
Trends in Machine Learning, 3(2):123–224, 2011. doi:10.1561/2200000035.
[NSIZ21] Shyam Narayanan, Sandeep Silwal, Piotr Indyk, and Or Zamir. Randomized dimen-
sionality reduction for facility location and single-linkage clustering. In Proceedings
of the 38th International Conference on Machine Learning, ICML, volume 139 of
Proceedings of Machine Learning Research, pages 7948–7957. PMLR, 2021. URL:
http://proceedings.mlr.press/v139/narayanan21b.html.
[PJ09] Hae-Sang Park and Chi-Hyuck Jun. A simple and fast algorithm for K-medoids cluster-
ing. Expert Syst. Appl., 36(2):3336–3341, 2009. doi:10.1016/J.ESWA.2008.01.039.
[TZM+ 20] Mo Tiwari, Martin J Zhang, James Mayclin, Sebastian Thrun, Chris Piech, and Ilan
Shomorony. Banditpam: Almost linear time k-medoids clustering via multi-armed
bandits. Advances in Neural Information Processing Systems, 33:10211–10222, 2020.
[Woo14] David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and
Trends in Theoretical Computer Science, 10(1—2):1–157, 2014. doi:10.1561/040000
0060.
[XGF16] Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for cluster-
ing analysis. In Proceedings of The 33rd International Conference on Machine Learning,
pages 478–487. PMLR, 2016. URL: https://proceedings.mlr.press/v48/xieb16.h
tml.
[ZZ20] Anru R. Zhang and Yuchen Zhou. On the non-asymptotic and sharp lower tail bounds
of random variables. Stat, 9(1):e314, 2020. doi:10.1002/sta4.314.
We state the following lemma from [MMR19], which bounds the expected distance distortion of a
fixed pair of points under Gaussian JL maps.
Lemma A.1 ([MMR19, Eq. (5)]). Let ε ∈ (0, 1), z ≥ 1 and G ∈ R^{t×d} be a Gaussian JL map. For
p, q ∈ R^d,

E( max{ 0, ∥Gp − Gq∥^z/∥p − q∥^z − (1 + ε)^z } ) ≤ e^{−Ω(ε^2 t)}.
Proof of Lemma 3.2. For every i ∈ [k] and p ∈ S_i, by Lemma A.1,

E( max{ 0, ∥Gp − Gc_i∥^z/∥p − c_i∥^z − (1 + ε)^z } ) ≤ e^{−Ω(ε^2 t)}.

Therefore,

E( max{ 0, ∥Gp − Gc_i∥^z − (1 + ε)^z ∥p − c_i∥^z } ) ≤ e^{−Ω(ε^2 t)} ∥p − c_i∥^z.
 ≥ (e^{−t/(2(1−ε))}/(2^{t/2} Γ(t/2))) ∫_0^{t/(1−ε)} x^{t/2−1} dx
 = (e^{−t/(2(1−ε))}/(2^{t/2} Γ(t/2))) · (1/(t/2)) · (t/(1 − ε))^{t/2}
 = (t^{t/2}/(2^{t/2} Γ(t/2)(t/2))) · e^{−t/(2(1−ε))} · (1/(1 − ε)^{t/2}).

Now we analyze the first term. From [Prond], we have that

Γ(t/2) = (t/2 − 1)! ≤ (t/2 − 1)^{t/2}/e^{t/2−2},

so

t^{t/2}/(2^{t/2} Γ(t/2)(t/2)) ≥ e^{t/2−2} · t^{t/2}/(2^{t/2}(t/2 − 1)^{t/2}(t/2)) ≥ e^{t/2−2} · t^{t/2}/(2^{t/2}(t/2)^{t/2}(t/2)) ≥ Ω(e^{t/2}/t).
Furthermore,

e^{−t/(2(1−ε))} · (1/(1 − ε)^{t/2}) = exp(−(t/2)(1/(1 − ε) + log(1 − ε))) ≥ exp(−(t/2)(1 + 2ε^2)),

where we used log(1 − ε) ≤ −ε and 1/(1 − ε) ≤ 1 + ε + 2ε^2, which hold for ε ≤ 1/2.
Multiplying the above two bounds and canceling the e^{t/2} term, we have

Pr(X < t/(1 − ε)) ≥ e^{−O(tε^2)}/t,

as desired.
Theorem C.2. Let n ∈ N and ε ∈ (0, 1/2), with n = Ω(ε^{−2}). Fix k = 2. There exists P ⊂ R^d of
size |P | = n, such that if G is a Gaussian JL map onto dimension t ≤ aε^{−2} log n for a sufficiently
small constant a > 0, then with probability at least 2/3, opt(G(P )) < (1 − ε) opt(P ).

Proof. The proof is based on [CW25] and a sketch is given in Section 1.1. Recall that the instance
is the first n standard basis vectors. We now complete the proof sketch into a full proof. Let
j_1 ∈ [n/2] and j_2 ∈ [n/2 + 1, n] be the indices minimizing ∥Ge_j∥ in their regime. By Lemma 6.3,
E(∥Ge_{j_i}∥^2) < 1 − ε and E(√(∥Ge_{j_i}∥^2 + 1)) < (1 − ε)√2 for i = 1, 2. Next, consider i ∈ [n/2]. We have

E(∥Ge_{j_2} − Ge_i∥) = E( E( ∥Ge_{j_2} − Ge_i∥ | Ge_{j_2} ) )               (law of total expectation)
                   ≤ E( √( E( ∥Ge_{j_2} − Ge_i∥^2 | Ge_{j_2} ) ) )          (Jensen's inequality)
                   = E( √( ∥Ge_{j_2}∥^2 + 1 ) )                             (independence)
                   < (1 − ε)√2,

and

var(∥Ge_{j_2} − Ge_i∥) ≤ E(∥Ge_{j_2} − Ge_i∥^2) = E(∥Ge_{j_2}∥^2 + 1) < 2 − ε,

and the same holds for i ∈ [n/2 + 1, n] and j_1. Therefore, by Chebyshev's inequality and a union
bound, with probability 2/3,

opt(G(P )) ≤ Σ_{i∈[n/2]} ∥Ge_i − Ge_{j_2}∥ + Σ_{i∈[n/2+1,n]} ∥Ge_i − Ge_{j_1}∥ < (1 − Ω(ε))√2 · n.