Dimension Reduction For Clustering
Di Yue¶
Peking University
Abstract
The Johnson-Lindenstrauss transform is a fundamental method for dimension reduction
in Euclidean spaces; it maps any dataset of n points into dimension O(log n) with low
distortion of pairwise distances. This dimension bound is tight in general, but one can bypass
it for specific problems. Indeed, tremendous progress has been made for clustering problems,
especially in the continuous setting where centers can be picked from the ambient space Rd . Most
notably, for k-median and k-means, the dimension bound was improved to O(log k) [Makarychev,
Makarychev and Razenshteyn, STOC 2019].
We explore dimension reduction for clustering in the discrete setting, where centers can
only be picked from the dataset, and present two results that are both parameterized by the
doubling dimension of the dataset, denoted as ddim. The first result shows that dimension
Oε (ddim + log k + log log n) suffices, and is moreover tight, to guarantee that the cost is pre-
served within factor 1 ± ε for every set of centers. Our second result eliminates the log log n
term in the dimension through a relaxation of the guarantee (namely, preserving the cost only
for all approximately-optimal sets of centers), which maintains its usefulness for downstream
applications.
Overall, we achieve strong dimension reduction in the discrete setting, and find that it
differs from the continuous setting not only in the dimension bound, which depends on the
doubling dimension, but also in the guarantees beyond preserving the optimal value, such as
which clusterings are preserved.
∗ Email: [email protected]
† The Harry Weinrebe Professorial Chair of Computer Science. Work partially supported by the Israel Science Foundation grant #1336/23. Email: [email protected]
‡ Email: [email protected]
§ Email: [email protected]
¶ Email: di [email protected]
1 Introduction
Oblivious dimension reduction, in the spirit of the Johnson and Lindenstrauss (JL) Lemma [JL84],
is a fundamental technique for many Euclidean optimization problems over large, high-dimensional
datasets. It has a strong guarantee: there is a random linear map π : Rd → Rt , for a suitable target
dimension t = O(ε−2 log n), such that for every n-point dataset P ⊂ Rd , with high probability, π
preserves all pairwise distances in P within factor 1 ± ε:

∀p, q ∈ P :   ∥π(p) − π(q)∥ ∈ (1 ± ε) · ∥p − q∥,

where throughout ∥ · ∥ is the Euclidean norm. This guarantee is extremely powerful, particularly
for algorithms: to solve a Euclidean problem on input P , one can apply the map π, solve the same
problem on π(P ), which is often more efficient since π(P ) lies in low dimension, and “lift” the
solution back to the original dimension (as discussed further in Section 1.2).
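To make this workflow concrete, here is a minimal Python sketch (our own illustration, not code from the paper; the constant 8 in the choice of t is arbitrary): it draws a Gaussian JL map, projects a toy dataset, and empirically checks the pairwise distortion. A downstream algorithm would then run on the projected points and its solution would be lifted back.

```python
import numpy as np

def gaussian_jl_map(d, t, rng):
    # Random linear map R^d -> R^t with i.i.d. N(0, 1/t) entries,
    # so that E[||G x||^2] = ||x||^2 for every fixed x.
    return rng.normal(0.0, 1.0 / np.sqrt(t), size=(t, d))

rng = np.random.default_rng(0)
n, d, eps = 500, 1000, 0.2
t = int(np.ceil(8 * np.log(n) / eps ** 2))   # target dimension O(eps^-2 log n)

P = rng.normal(size=(n, d))                  # toy dataset
G = gaussian_jl_map(d, t, rng)
GP = P @ G.T                                 # the projected dataset pi(P)

# Empirically check the distortion of a few random pairs.
i, j = rng.integers(0, n, size=(2, 200))
orig = np.linalg.norm(P[i] - P[j], axis=1)
proj = np.linalg.norm(GP[i] - GP[j], axis=1)
mask = orig > 0
print("distortion range:", (proj[mask] / orig[mask]).min(), (proj[mask] / orig[mask]).max())
```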
However, many problems require computational resources that grow exponentially with the
dimension (the curse of dimensionality), and hence even dimension t = O(ε−2 log n) might be too
large. Unfortunately, this dimension bound is tight in general, i.e., for preserving all pairwise
distances [LN17], but interestingly one may bypass it for specific optimization problems, by showing
that the optimal value/solution is preserved even when the dimension is reduced beyond the JL
Lemma, say to dimension t = O(ε−2 ), which is completely independent of n. This raises an
important question:
For which problems does dimension o(ε−2 log n) suffice for oblivious dimension reduction?
Prior work has revealed an affirmative answer for several key problems, as we discuss below.
This paper studies this question for fundamental clustering problems, captured by (k, z)-clustering,
which includes the famous k-means and k-median problems as its special cases. In (k, z)-clustering,
the input is a dataset P ⊂ Rd , and the goal is to find a set of centers C of size |C| ≤ k that
minimizes

cost_z(P, C) := Σ_{p∈P} dist_z(p, C),   where   dist_z(p, C) := min_{c∈C} ∥p − c∥^z.
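For concreteness, a small Python helper (our own illustration, not part of the paper) that evaluates this objective for a given set of centers:

```python
import numpy as np

def cost_z(P, C, z=1):
    """(k, z)-clustering cost: sum over p in P of min_{c in C} ||p - c||^z."""
    D = np.linalg.norm(P[:, None, :] - C[None, :, :], axis=2)  # |P| x |C| distances
    return float((D.min(axis=1) ** z).sum())
```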
We can distinguish two variants, differing in their space of potential centers. In the continuous
variant, C is a subset of R^d (the centers lie in the ambient space), and in the discrete variant,
sometimes also called k-medoids, C is a subset of P (or possibly of a larger set given as input). A key
feature of the discrete version is that π : P → π(P ) is invertible, hence each potential center in
π(P ) corresponds to a unique potential center in P (in contrast, a potential center in the ambient
space Rt has many preimages in Rd ). Thus, in the discrete version, a set of centers computed for
the dataset π(P ) can be mapped back to the higher dimension and serve as centers for the dataset
P . See Section 1.3 for a discussion on practical applications of the discrete variant.
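The sketch below (illustrative Python; the greedy seeding is only a hypothetical stand-in for whatever discrete solver one prefers) shows this lifting: a discrete solution is just a set of indices into the dataset, so centers found for π(P) map back to centers of P by reusing the same indices.

```python
import numpy as np

def farthest_point_indices(X, k, rng):
    # Hypothetical stand-in solver: farthest-point seeding over the rows of X.
    idx = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2), axis=1)
        idx.append(int(np.argmax(d)))
    return idx

rng = np.random.default_rng(1)
P = rng.normal(size=(300, 512))
t = 20
G = rng.normal(0, 1 / np.sqrt(t), size=(t, P.shape[1]))  # Gaussian JL map
GP = P @ G.T

idx = farthest_point_indices(GP, k=5, rng=rng)  # solve on the low-dimensional data
C = P[idx]                                      # lift: the same indices are centers in R^d
```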
The continuous variant is a success story of the “beyond JL” program. A series of papers [BZD10,
CEM+ 15, BBC+ 19, MMR19] has culminated showing that target dimension t = O(ε−2 log kε ),
which is independent of n, suffices to preserve all the solutions within factor 1 ± ε. Curiously,
Charikar and Waingarten [CW25] observed that the discrete variant behaves very differently: certain
instances require t = Ω(log n), even for k = 1 (when using the standard Gaussian-based map π).
Counterintuitively, restricting the centers to be data points makes dimension reduction significantly
harder!
To bypass this limitation, we consider the doubling dimension, which was identified in previous
work as a natural parameter that is very effective in achieving “beyond JL” bounds [IN07, NSIZ21,
JKS24, HJKY25, GJK+ 25]. Formally, the doubling dimension of P , denoted ddim(P ), is the smallest
positive number such that every ball in the finite metric P can be covered by 2ddim(P ) balls of half the
radius. For several problems, including nearest neighbor [IN07], facility location [NSIZ21, HJKY25],
and maximum matching [GJK+ 25], target dimension t = O(ε−2 log 1ε · ddim(P )) suffices. Note
that restricting the doubling dimension does not immediately imply a better dimension reduction
of the JL flavor, as there are datasets P ⊂ Rd with ddim(P ) = O(1) where no linear map can
approximately preserve all pairwise distances (see e.g., [IN07, Remark 4.1]).
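To make the parameter concrete, the following rough estimator (our own illustration; brute force and only for small datasets) greedily covers each ball B(x, r) by balls of half the radius centered at data points and returns log2 of the largest cover size it encounters; it is an estimate, not an exact computation of ddim(P).

```python
import numpy as np
from itertools import product

def doubling_dimension_estimate(P, radii):
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)  # pairwise distances
    worst = 1
    for i, r in product(range(len(P)), radii):
        uncovered = set(np.where(D[i] <= r)[0].tolist())       # points of the ball B(P[i], r)
        count = 0
        while uncovered:
            j = uncovered.pop()                                 # greedily open a half-radius ball at some point
            uncovered -= set(np.where(D[j] <= r / 2)[0].tolist())
            count += 1
        worst = max(worst, count)
    return float(np.log2(worst))
```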
However, for the sake of exposition, we omit z and focus on z = 1 and z = 2, which are discrete k-median
and k-means, respectively. We use the notation Õ(f ) to hide factors that are logarithmic in f , although below
it only hides a log(1/ε) factor.
Theorem 1.1 (Informal version of Theorem 3.1). For suitable t = Õ(ε^{−2}(ddim(P ) + log k +
log log n)), with probability at least 2/3, opt(G(P )) ≤ (1 + ε) opt(P ), and for all C ⊆ P with |C| ≤ k,
cost(G(P ), G(C)) ≥ (1 − ε) cost(P, C).
This theorem has immediate algorithmic applications. First, it implies that the optimal value is
preserved, i.e., opt(G(P )) ∈ (1 ± ε) opt(P ). Second, for every C ⊂ P and β > 1, if the set of centers
G(C) is a β-approximate solution for the instance G(P ), then C is a (1 + O(ε))β-approximate
solution for the instance P . Therefore, the theorem fits into the general paradigm of using oblivious
linear maps — apply the mapping, solve the problem in low dimension, and lift the centers back to
the higher dimension.
It is interesting to compare our result with the continuous variant of (k, z)-clustering. On the
one hand, to preserve the optimal value in the continuous variant, we know from [MMR19] that
target dimension O(ε−2 log kε ) suffices, independently of ddim(P ). On the other hand, Theorem 1.1
further provides a “for all centers” guarantee, which is not attainable in the continuous version (by
any linear map), by simply considering centers in the kernel of the linear map (see Theorem 6.1).
We examine and discuss these guarantees more carefully in Section 1.2.
Matching lower bounds. The results in Theorem 1.1 are nearly tight for Gaussian JL maps,
and likely for all oblivious linear maps. It is known that achieving opt(G(P )) ∈ (1 ± ε) opt(P )
requires target dimension t = Ω(log k), even for a dataset P of doubling dimension O(1) [NSIZ21],
and another known lower bound is that t = Ω(ddim(P )), even for k = O(1) [CW25]. It is easy
to tighten these bounds with respect to the dependence on ε, and we include this in Section C for
completeness. We complete the picture and show in Theorem 6.2 that the multiplicative approximation
of Theorem 1.1 requires dimension t = Ω(ε^{−2} log log n), even for k = 1 and a dataset P of doubling
dimension O(1).
To get some intuition about the discrete variant, we briefly recall the hard instance of [CW25],
taking z = 1 for simplicity. Consider k = 2, and let P be the first n standard basis vectors, thus
ddim(P ) = log n. The pairwise distances all equal √2, hence opt(P ) = √2 · n. The standard basis
vectors form a well-known hard instance for the JL Lemma, hence, when using target dimension
t = o(ε^{−2} log n), with high probability, there exists j_1 ∈ [n/2] such that ∥Ge_{j_1}∥ < 1 − 10ε. Similarly,
let j_2 > n/2 be such an index for the last n/2 standard basis vectors. Let Ge_{j_1}, Ge_{j_2} be the two
centers for G(P ), and assign the first n/2 basis vectors to Ge_{j_2} and the last n/2 vectors to Ge_{j_1}. Now
a simple argument using the independence between the two halves, see Section C, shows that
opt(G(P )) ≤ (1 − ε)√2 · n = (1 − ε) opt(P ) with probability 2/3.
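The effect is easy to observe numerically even for k = 1; the sketch below (our own experiment with arbitrary constants, not an argument from the paper) projects the standard basis vectors with a Gaussian JL map and compares the discrete 1-median cost before and after.

```python
import numpy as np

def discrete_1median_cost(X):
    # min over centers c in X of sum_{p in X} ||p - c||
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return D.sum(axis=0).min()

rng = np.random.default_rng(0)
n, t = 512, 10                                   # t well below eps^-2 log n
P = np.eye(n)                                    # the n standard basis vectors
opt_P = np.sqrt(2) * (n - 1)                     # every non-center point is at distance sqrt(2)

G = rng.normal(0, 1 / np.sqrt(t), size=(t, n))   # Gaussian JL map
GP = P @ G.T

print(discrete_1median_cost(GP) / opt_P)         # typically noticeably below 1
```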
A relaxed guarantee. Our main result avoids the log log n term in Theorem 1.1 by slightly
relaxing the guarantee, while keeping it useful for downstream applications.
Theorem 1.2 (Informal version of Theorem 5.1). For suitable t = Õ(ε−2 (ddim(P ) + log k)), with
probability at least 2/3,
2. for all C ⊆ P, |C| ≤ k, we have cost(G(P ), G(C)) ≥ min{(1 − ε) cost(P, C), 100 opt(P )}.
This theorem implies that the optimal value is preserved, i.e., opt(G(P )) ∈ (1 ± ε) opt(P ).
Let us further examine which solutions are preserved under this guarantee: For all C ⊂ P and
1 < β < 100/(1 + ε), if the set of centers G(C) is a β-approximate solution for the instance G(P ), then C
is a (1 + O(ε))β-approximate solution for the instance P . Recall that for Theorem 1.1, we had a
similar claim, but without the restriction β < 100/(1 + ε). The constant 100 here is arbitrary, and can be
changed to any α > 2, at the cost of increasing the target dimension by an additive O(ε^{−2} log log α)
term.
however a (1 + ε)-approximate MST of G(P ) may have large cost for P [NSIZ21]. Ideally, we want
the cost of every solution to have bounded contraction, as it allows to lift any solution for G(P ) to
a solution for P , and we thus consider several different notions for the set of solutions, as follows.
For simplicity, we present these for z = 1 in the discrete setting, but they extend naturally to all
z ≥ 1 and to the continuous setting.
1. Partitions. A solution is a partition P = (P_1, . . . , P_k) of P . Its cost is defined as
   cost(P) := Σ_{i=1}^k min_{c∈P_i} Σ_{p∈P_i} ∥p − c∥.

2. Centers. A solution is a set of centers C = (c_1, . . . , c_k) ⊆ P . Its cost is defined as
   cost(P, C) := Σ_{p∈P} dist(p, C).
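Both notions are easy to evaluate directly; the sketch below (our own illustration, for z = 1) makes the difference concrete: the partition cost optimizes a discrete 1-median inside each part, while the center cost assigns every point to its nearest given center.

```python
import numpy as np

def center_cost(P, C):
    # cost(P, C) = sum_p min_{c in C} ||p - c||
    D = np.linalg.norm(P[:, None, :] - C[None, :, :], axis=2)
    return D.min(axis=1).sum()

def partition_cost(parts):
    # cost(partition) = sum over parts of  min_{c in part} sum_{p in part} ||p - c||
    total = 0.0
    for S in parts:
        D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
        total += D.sum(axis=0).min()
    return total
```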
These definitions are fairly natural, and were used in prior work on dimension reduction, e.g.,
partition-based solutions were used in [MMR19] for k-means and k-median, and center-based
solutions were used in [JKS24] for k-center. It was observed in [CW25] that not all “for all”
guarantees are the same; in particular, “for all centers” and “for all partitions” are incomparable.
However,“for all centers and partitions” is clearly stronger than both.
Next, we define contraction for solutions, capturing the two notions in Theorems 1.1 and 1.2. The
notion in Theorem 1.1 is simply of multiplicative contraction: A solution S has (1 − ε)-contraction if
cost(G(S)) ≥ (1 − ε) cost(S). The notion in Theorem 1.2 is new, at least in the context of dimension
reduction, and goes as follows.
Definition 1.3 (Relaxed Contraction). A solution S has α-relaxed (1 − ε)-contraction (for α > 1,
ε > 0) if cost(G(S)) ≥ min{α opt(P ), (1 − ε) cost(S)}.
Using these definitions, we can restate Theorem 1.1 as having (1 − ε)-contraction for all centers,
and restate Theorem 1.2 as achieving 100-relaxed (1 − ε)-contraction for all centers. In fact, we can
strengthen Theorem 1.1 to assert (1 − ε)-contraction for all centers and partitions.
Theorem 1.4 (Strengthened Theorem 1.1, informal). For suitable t = Õ(ε−2 (ddim(P ) + log k +
log log n)), with probability at least 2/3, for all partitions P = (P1 , . . . , Pk ) of P and sets of centers
C = (c1 , . . . , ck ) ⊆ P ,
cost(G(P), G(C)) ≥ (1 − ε) cost(P, C).
This strengthening is not attainable for Theorem 1.2, as dimension Ω(ε−2 log log n) is needed
to get a “for all centers and partitions” guarantee, even for relaxed contraction (see Theorem 6.4).
However, we do not know if a “for all partitions” guarantee is possible without the log log n term.
If it is possible, then a curious phenomenon will occur: we get a “for all partitions” and a “for
all centers” guarantees, but not a “for all centers and partitions” guarantee. All our results are
summarized in Table 1.
Candidate centers. We consider also a more general variant of k-clustering, where the candidate
centers are part of the input (given either explicitly or implicitly): Given a dataset P and a candidate-
centers set Q, the goal is to find C ⊆ Q of size |C| ≤ k that minimizes Σ_{p∈P} dist_z(p, C). When
Q = R^d or Q = P , we obtain the continuous and discrete variants, respectively.
Problem | Target dimension | ∀ partitions | ∀ centers | Contraction | Reference
Continuous | O(ε^{−2} log k) | yes | no | multiplicative | [MMR19]
Continuous | Ω(ε^{−2} log k) | no | no | even for value | [NSIZ21]
Continuous | > d − 1 | no | yes | even for relaxed | Thm 6.1
Discrete | O(ε^{−2}(ddim + log k + log log n)) | yes | yes | multiplicative | Thm 3.1
Discrete | O(ε^{−2}(ddim + log k)) | no | yes | relaxed | Thm 5.1
Discrete | ? | yes | no | any | OPEN
Discrete | Ω(ε^{−2} log log n) | yes | yes | even for relaxed | Thm 6.4
Discrete | Ω(ε^{−2} log log n) | no | yes | multiplicative | Thm 6.2
Discrete | Ω(ε^{−2} log k) | no | no | even for value | [NSIZ21]
Discrete | Ω(ε^{−2} ddim) | no | no | even for value | [CW25]
Candidate centers | O(ε^{−2} log s) | yes | yes | multiplicative | Thm 4.1
Candidate centers | O(ε^{−2}(ddim + log k + log log n)) | yes | yes | relaxed | Thm 4.4
Candidate centers | O(ε^{−2}(ddim + log k)) | no | yes | relaxed | Thm 5.1
Candidate centers | Ω(ε^{−2} log s) | no | yes | multiplicative | Thm 6.5

Table 1: Summary of our results for dimension reduction for k-clustering. The notions of "for all"
centers and/or partitions, and of multiplicative/relaxed contraction, are as explained in Section 1.2.
Some lower bounds apply even for preserving the optimal value; for clarity, the table notes that
they hold "even for value". In the setting of candidate centers, the size of the candidate set is
denoted by s. We suppress log(1/ε) terms and the dependence on α for α-relaxed contraction.
interpretability might require the centers to be part of the dataset. For example, in applications
based on machine-learning embeddings of objects such as text [XGF16], an arbitrary vector in the
embedding space might not represent any actual object. A similar issue arises for structured data
such as sparse data or images, e.g., the “average image” is visually random noise [LRU20, TZM+ 20]
or the average of sparse vectors is not necessarily sparse. A discrete center, however, represents an
actual underlying object, and thus preserves the underlying properties of the input points.
Failure of extension theorems in the discrete setting. To prove (1) (and possibly more gen-
eral claims), a natural framework based on extension theorems has been widely used in dimension
reduction for clustering. Specifically, given an arbitrary center v in the target space (e.g., v is the
optimal 1-median center of G(P )), one can define an "inverse image" u in the original space such
that cost(P, u) ≤ (1 + ε) cost(G(P ), v), and this directly implies opt(G(P )) ≥ opt(P )/(1 + ε). The key
step of defining the "inverse image" is precisely what an extension theorem does. This framework was
used in prior works such as [MMR19, JKS24], in the spirit of the classic Kirszbraun extension
theorem [Kir34] or the robust one-point extension theorem [MMR19, Theorem 5.2]. However, such
extension theorems are only known to work in the continuous setting, which requires picking the
inverse image from the entire space R^d and cannot be restricted only to the data points v ∈ P .^1
Our techniques. We start with the k = 1 case (a detailed discussion can be found in Section 1.4.1).
In this case, we first obtain a target dimension bound with an O(log log n) factor, by utilizing the
existence of a small movement-based coreset. A coreset is a small accurate proxy of the dataset, and
a movement-based coreset additionally requires the existence of a "local" mapping such that each
data point can be mapped to a nearby coreset point. The dimension reduction simply preserves the
pairwise distances on the coreset, and (1) is argued via the local mapping. A conceptually similar
coreset-to-dimension-reduction idea has also been employed in [CW25]; one main difference is
that we also utilize the movement/local property of the coreset.
Then, to remove the O(log log n) factor, we consider a weaker guarantee as in Theorem 1.2,
where we prove the (1 + ε) relative error only for near-optimal solutions, and for the other solutions
we have a flat 100 opt(P ) error. This relaxed guarantee is strong enough for (1) (and many other
1
We note that the Kirszbraun theorem may be adapted to work for the discrete case when the target dimension
t = O(log n), but this dimension bound is too large to be useful.
applications), and may be of independent interest for further study. Our analysis crucially
builds on this small-versus-large cost dichotomy, although we also need to handle the middle ground
where the two regimes mix.
Finally, we discuss the generalization to k > 1 in Section 1.4.2, which introduces several nontrivial
technical complications from k = 1.
The O(log log n) bound: from coreset to dimension reduction. To prove (1), we use an
approach inspired by the movement-based coreset construction in Euclidean spaces [HM04]. Roughly
speaking, a movement-based coreset^2 is a subset S ⊆ P , such that there exists a mapping σ : P → S
satisfying Σ_{p∈P} ∥p − σ(p)∥ ≤ O(ε) opt(P ). Our framework is summarized as follows: we first
construct a movement-based coreset S to compress the dataset P . Next, we apply the standard JL
lemma to preserve pairwise distances in the coreset S within (1 ± ε), which requires O(ε^{−2} log |S|)
target dimensions. After this step, the optimal value of S is already preserved, namely, opt(G(S)) ∈
(1 ± ε) opt(S). Finally, it suffices to show that the cost of snapping data points to their nearest
neighbor in S (i.e., Σ_{p∈P} ∥p − S(p)∥ and Σ_{p∈P} ∥Gp − GS(p)∥) is negligible in both the original and
target spaces.
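A minimal sketch of this pipeline (our own illustration; it assumes a coreset S is already given and uses nearest-neighbor snapping as the mapping σ) looks as follows.

```python
import numpy as np

def snap_to_coreset(P, S):
    # sigma: map every data point to its nearest coreset point
    D = np.linalg.norm(P[:, None, :] - S[None, :, :], axis=2)
    return S[D.argmin(axis=1)]

def project_via_coreset(P, S, t, rng):
    """Snap P to the coreset S, then apply a Gaussian JL map whose target
    dimension t may be chosen as a function of |S| rather than |P|."""
    G = rng.normal(0, 1 / np.sqrt(t), size=(t, P.shape[1]))
    sigma_P = snap_to_coreset(P, S)   # total movement should be O(eps)*opt(P) for a movement-based coreset
    return sigma_P @ G.T, G
```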
The construction of the coreset is essentially the same as that in [HM04], except that [HM04]
also assigns weights to the coreset points and here we only need the point set itself. We review the
construction. This construction is based on a sequence of nets, a standard tool for discretizing
metrics. Formally, a ρ-net of a point set P is a subset N ⊆ P , such that 1) the interpoint distances
in N are at least ρ, and 2) every point in P has a point in N within distance ρ. (See the more
detailed definition in Definition 2.3.) Let c^* ∈ P be an optimal discrete 1-median center. We
construct nets on a sequence of balls centered at c^* with geometrically decreasing radii. Denote
r_0 := opt(P ) and r_ℓ := r_0/2^ℓ for ℓ = 1, 2, . . . , log n. Construct the level-ℓ net N_ℓ as an εr_ℓ-net of
the ball B(c^*, r_ℓ), and denote by N := ∪_{ℓ=0}^{log n} N_ℓ the union of all log n levels of nets.
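A greedy construction suffices for the sketch below (our own illustration): greedy_net builds a ρ-net by scanning the points and keeping those far from all previously kept ones, and the level-ℓ nets are built on the shrinking balls around the given center c_star.

```python
import numpy as np

def greedy_net(X, rho):
    """Greedy rho-net: kept points are pairwise > rho apart, and every point
    of X is within distance rho of some kept point."""
    net = []
    for x in X:
        if all(np.linalg.norm(x - y) > rho for y in net):
            net.append(x)
    return np.array(net)

def hierarchical_nets(P, c_star, opt_value, eps):
    # Level-ell net: an (eps * r_ell)-net of P ∩ B(c_star, r_ell), with r_ell = opt / 2^ell.
    nets = []
    for ell in range(int(np.ceil(np.log2(len(P)))) + 1):
        r = opt_value / 2 ** ell
        ball = P[np.linalg.norm(P - c_star, axis=1) <= r]
        if len(ball):
            nets.append(greedy_net(ball, eps * r))
    return nets
```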
By the standard packing property of doubling metrics, each net has size |N_ℓ| ≤ O(ε^{−O(ddim)}),
thus |N | ≤ O(ε^{−O(ddim)} log n), which implies a target dimension t = O(ε^{−2}(ddim log ε^{−1} + log log n)).
On the other hand, let G(c) ∈ G(P ) be an optimal discrete 1-median center of G(P ). Then the
total cost of snapping c and all data points to the nearest neighbor in N (i.e., Σ_{p∈P}(∥p − N (p)∥ +
∥c − N (c)∥)) can be bounded by O(ε)(opt(P ) + cost(P, c)) in the original space. Based on results
in [IN07], we further show that this snapping cost in the target space (i.e., Σ_{p∈P}(∥Gp − GN (p)∥ +
∥Gc − GN (c)∥)) can increase by at most a constant factor.

^2 This definition is tailored to our need and may be slightly different from that in the literature.
Finally, we note that the above analysis can be applied to obtain the “for all centers” guarantee
in Theorem 1.1, or even the stronger “for all centers and partitions” guarantee in Theorem 1.4.
Removing the log log n term via a relaxed guarantee. Let us first recall the cause of the
log log n term. We apply the JL Lemma to N , which is a union of log n nets, each of size ε^{−O(ddim)}. The
log log n thus comes from a union bound over all log n levels. To bypass this union bound, we use
two technical ideas. First, we avoid touching cross-level pairs and only apply the union bound
for each N_ℓ separately. This requires us to always snap p and c to the same level of net when
handling each p ∈ P . Second, for a single level, we analyze its maximum distance distortion, which
is a random variable, and bound its expectation. We remark that some levels will be distorted
significantly, but the average distortion is (1 + O(ε)). Similar ideas have been used in prior works
(e.g., [GJK+ 25]).
Consider the following two extremes. First, suppose c is the closest point to c∗ , say, ∀p ∈
P, ∥c − c∗ ∥ ≤ ∥p − c∗ ∥. For every p ∈ P , we can snap p to its nearest neighbor in net Np .
Observe that c can also be covered by Np . The cost of snapping p and c can both be bounded
by O(ε) · ∥p − c∗ ∥, and we show that on average, the cost of snapping Gp and Gc is bounded by
O(ε) · ∥p − c^*∥ as well, which adds up to O(ε) opt(P ). The other extreme is that c is very far from
c^*, i.e., ∥c − c^*∥ > opt(P )/10. In this case, we can no longer snap c to the same net as p (as in the
previous case). We show that in this case, cost(G(P ), Gc) ≥ 100 opt(P ).
If c does not fall into either of the above two extremes, our analysis is a combination of them.
Indeed, we show the relaxed "for all centers" guarantee

cost(G(P ), Gc) ≥ min{100 opt(P ), (1 − ε) cost(P, c)}   for all c ∈ P.

Note that this is exactly the same as guarantee 2 of Theorem 1.2, and that the two terms in
the min correspond to the aforementioned two extremes, respectively. Specifically, we first specify
a level ℓ and its corresponding radius rℓ . If ∥c − c∗ ∥ > rℓ , then we fall into the second extreme
and show that cost(G(P ), Gc) ≥ 100 opt(P ). Otherwise, ∥c − c∗ ∥ ≤ rℓ , then we handle each p ∈ P
differently, depending on the distance ∥p − c∗ ∥. If ∥p − c∗ ∥ ≥ rℓ , then we use the same argument as
the first extreme — snapping both p and c to Np , bounding the snapping cost, and analyzing the
additive contraction. If ∥p − c∗ ∥ < rℓ , then we snap both p and c to Nℓ . Since ℓ is a fixed level, a
union bound over Nℓ is affordable and we obtain cost(G(P ), Gc) ≥ (1 − ε) cost(P, c) in this case.
where C(p) is the center in C closest to p. Note that (3) is weaker than what we desire in Theorem 1.2,
for the following two reasons. First, the target dimension is worse than the O(ε^{−2}(ddim + log k))
in Theorem 1.2. Second, the left-hand side of (3) can be much larger than cost(G(P ), G(C)), since
the image of C(p) under G (i.e., GC(p)) is not necessarily the nearest neighbor of Gp in G(C).
Nonetheless, the proof of (3) already captures most of our key ideas. At the end of this section, we
briefly discuss how we obtain a sharper target dimension bound as well as a stronger guarantee.
Suppose C ∗ ⊆ P is an optimal solution, which induces a clustering C ∗ = {S1∗ , S2∗ , . . . , Sk∗ }. Our
general proof framework is the same as the k = 1 case — considering the “distance” between C
and C ∗ , if C is “far from” C ∗ , then we show cost(G(P ), G(C)) ≥ 100 opt(P ); otherwise we show
cost(G(P ), G(C)) ≥ (1 − ε) cost(P, C).
However, an immediate issue is how to define that C and C ∗ are far from or close to each other.
For each i ∈ [k], we specify a “threshold level” of cluster Si∗ , denoted by ℓi . We say C is “far from”
C ∗ if there exists i ∈ [k], such that dist(c∗i , C) > 10rℓi . In this case, the cost of connecting B(c∗i , rℓi )
to C is already high. We further prove that cost(G(P ), G(C)) ≥ 100 opt(P ), by careful analysis of
the randomness of G.
Now suppose C is “close to” C ∗ , i.e., ∀i ∈ [k], dist(c∗i , C) ≤ 10rℓi . Our key observation is that
for every p ∈ Si∗ , C(p) should also be close to c∗i , i.e.,
As a natural generalization of the k = 1 case, we lower bound ∥Gp − GC(p)∥ for p ∈ Si∗ differently,
depending on the distances ∥C(p) − c∗i ∥. If ∥C(p) − c∗i ∥ ≥ rℓi , then we snap both p and C(p) to the
(enlarged) net Np . (We can do this since (4) holds.) Otherwise, we snap both p and C(p) to the
(enlarged) net Nℓi . The snapping cost and the distance contraction are bounded similarly to the
k = 1 case. This simply introduces an extra log k factor in the target dimension.
Decoupling ddim from log k. So far, we only obtain an O_ε(ddim log k) bound, instead of
O_ε(ddim + log k). This is due to error accumulation: Recall we handle each (optimal) cluster
S_i^* separately, each of which incurs an O(ε) opt(P ) additive error; hence, we have to rescale ε by
a 1/k factor to compensate for the accumulated error of the k clusters, resulting in an O(ε^{−2} ddim log k)
target dimension (naïvely, that results in Õ(ε^{−2} k^2 ddim) target dimension, but this is avoided by
an easy adaptation).
To decouple these two factors, we need more delicate analysis for the error. For “far” points
p ∈ Si∗ with ∥C(p)−c∗i ∥ ≥ rℓi , the snapping and distortion error is O(ε)∥p−c∗i ∥ in expectation, which
adds up to O(ε) opt(P ) and does not incur any error accumulation. However, the error accumulation
happens for "close" points p with ∥C(p) − c_i^*∥ < r_{ℓ_i}, where the snapping cost within a single cluster
S_i^*, namely Σ_{p∈S_i^*} ∥p − N_ℓ(p)∥, is already O(ε) opt(P ), which accumulates to O(kε) opt(P ).
To reduce the error accumulation, we further divide the close points (i.e., ∥C(p) − c∗i ∥ < rℓi )
into two ranges, namely, the close range ∥C(p) − c∗i ∥ < rℓi /k and the middle range ∥C(p) − c∗i ∥ ∈
[rℓi /k, rℓi ], and handle these two ranges differently. The cost of points in the close range can be
bounded by O(ε/k) opt(P ), which adds up to O(ε) opt(P ). For points in the middle range, we
handle them in a point-by-point manner, at the cost of poly(k) · e^{−Ω(ε^2 t)} per point. Since there are
at most k · O(log k) levels in the middle range, a union bound over all net points at these levels will
be affordable.
Handling nearest neighbor assignment in the target space. Recall that (4) concerns the cost
∥Gp − GC(p)∥, which is the cost in the target space with respect to the nearest neighbor assignment
in the original space. However, what we really need is the nearest neighbor assignment in the target
space. To capture this misalignment between the original and target spaces, we define a mapping f to
be the assignment in the target space, i.e., f (p) is the center in C realizing dist(Gp, G(C)), so that
cost(G(P ), G(C)) = Σ_{p∈P} ∥Gp − Gf (p)∥, and f (p) = C(p) does not hold in general. We attempt
to modify the previous analysis to lower bound each ∥Gp − Gf (p)∥ instead of ∥Gp − GC(p)∥.
To lower bound this distance, we attempt to replace every C(p) with f (p) in our previous proof.
The analysis becomes problematic, as our structural observation (4) no longer holds if we change
C(p) to f (p), and this turns out to be the only place where our analysis does not go through.
To resolve this issue, let us focus on the bad scenario where f (p) is sufficiently far from c∗i , i.e.,
∥f (p) − c∗i ∥ ≫ max{∥p − c∗i ∥, rℓi }. This implies f (p) is also far from p. We further show that
∥Gp − Gf (p)∥ ≫ ∥p − c∗i ∥ by careful analysis of G’s randomness. On the other hand, we have
∥p−C(p)∥ ≤ O(∥p−c∗i ∥) by (4). Therefore, we can directly lower bound ∥Gp−Gf (p)∥ by ∥p−C(p)∥
in this case.
2 Preliminaries
Consider a point set P ⊂ R^d. For every x ∈ R^d, denote by P (x) the point in P closest to
x and dist(x, P ) := ∥x − P (x)∥ (recall that throughout ∥ · ∥ is the Euclidean norm). Denote by
diam(P ) := max{dist(p, q) : p, q ∈ P } the diameter of P . For x ∈ R^d and r > 0, denote by
B(x, r) := {y ∈ R^d : ∥x − y∥ ≤ r} the ball centered at x with radius r. Recall that for k ∈ N and z ≥ 1,
the (k, z)-clustering cost of P w.r.t. a center set C ⊂ R^d, |C| ≤ k, is cost_k^z(P, C) := Σ_{p∈P} dist(p, C)^z.
The optimal discrete (k, z)-clustering cost of P w.r.t. a candidate center set Q ⊂ R^d is denoted
by opt_k^z(P, Q) := min_{C⊆Q, |C|≤k} cost_k^z(P, C), and by opt(P, Q) for short when k, z are clear from the
context. Denote opt(P ) := opt(P, P ) and opt-cont(P ) := opt(P, R^d) for simplicity.
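For small instances, the discrete optimum can be computed by brute force, which is convenient for numerically checking the guarantees discussed in this paper; a sketch (our own, exponential in k):

```python
import numpy as np
from itertools import combinations

def discrete_opt(P, Q, k, z=1):
    """opt_k^z(P, Q): minimum over C ⊆ Q with |C| = k of sum_p dist(p, C)^z
    (taking |C| = k exactly is enough, since extra centers never hurt)."""
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)  # |P| x |Q| distances
    best = np.inf
    for C in combinations(range(len(Q)), k):
        best = min(best, float((D[:, list(C)].min(axis=1) ** z).sum()))
    return best
```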
We use the following generalized triangle inequalities.
Lemma 2.1 (Generalized triangle inequalities [MMR19]). Let (X, dist) be a metric space. Then
for every z ≥ 1, ε ∈ (0, 1) and p, q, r ∈ X,
Our proof uses ρ-nets for doubling sets, whose definition and key properties are described here.
2.2 Dimension reduction
For simplicity, we only consider random linear maps defined by a matrix of iid Gaussians, which
are known to satisfy the JL Lemma [IM98, DG03].
Definition 2.5. A Gaussian JL map is a t × d matrix with i.i.d. entries drawn from N (0, 1t ).
Recall the following concentration bound [IN07, Eq. (7)] (see also [NSIZ21, Eq. (5)]), from
which one can deduce the JL lemma.
Lemma 2.6 ([IN07, Eq. (7)]). Let x ∈ Rd , ε > 0 and a Gaussian JL map G ∈ Rt×d . We have
The following two lemmas regard Gaussian JL maps when applied to doubling sets.
Lemma 2.7 ([IN07, Lemma 4.2]). There exist universal constants A_1, A_2 > 0 such that for every
subset P ⊂ B(⃗0, 1) of the Euclidean unit ball in R^d, t > A_1 · ddim(P ) + 1, D ≥ 10, and a Gaussian
JL map G ∈ R^{t×d},

Pr(∃x ∈ P, ∥Gx∥ > D) ≤ e^{−A_2 t D^2}.
Lemma 2.8 ([HJKY25, Lemma 3.21]). There exist universal constants A_1, A_2, L > 1, such that
for every P ⊂ Rd \ B(⃗0, 1), ε > 0, t > A1 ddim(P ), and a Gaussian JL map G ∈ Rt×d ,
Theorem 3.1. Let ε > 0, z ≥ 1 and d, ddim, k, n ∈ N, and let G ∈ R^{t×d} be a Gaussian JL map with suitable
t = O(z^2 ε^{−2}(ddim log(z/ε) + log k + log log n)). For every n-point set P ⊆ R^d with ddim(P ) ≤ ddim, with
probability at least 2/3,
We use the following lemma to bound the clustering cost of a fixed set of centers and partition
of P . The proof is deferred to Section A.
Lemma 3.2. Let ε > 0, z ≥ 1 and d, k ∈ N and a Gaussian JL map G ∈ Rt×d with suitable
t = O(z 2 ε−2 log ε−1 ). For every set P ⊆ Rd , every set of centers (c1 , . . . , ck ) ⊂ Rd and every
partition P = (S1 , . . . , Sk ) of P , with probability at least 9/10,
Proof of Theorem 3.1. Consider an optimal discrete k-median of P . Denote by C ∗ = {c∗1 , . . . , c∗k } ⊆
P and by S1∗ , . . . , Sk∗ the centers and clusters (respectively) in that solution. Applying Lemma 3.2
to the optimal center set C ∗ and the partition P ∗ = (S1∗ , . . . , Sk∗ ), we have that with probability at
least 9/10,

Pr(∥Gx − Gy∥ > (1 + ε)∥x − y∥) ≤ exp(−ε^2 t/8) ≤ ε^{Ω(ddim(P ))}/(k^2 m^2).

Thus, by a union bound, w.p. at least 9/10,
By another union bound, Equations (5) and (6) hold with probability at least 2/3.
We are now ready to prove the second part of the theorem. Let C = {c_1, . . . , c_k} ⊆ P and let
P = (S_1, . . . , S_k) be a partition of P . For every p ∈ P we denote by u_p the nearest net-point to p in the
level such that P_i \ P_{i+1} contains p, and the radius of that level is denoted r_p. Denote by f (p) the
center in C assigned to p according to the partition P. Recall that C^*(p) is a point in C^* that is
nearest to p. Observe that

Σ_{p∈P} r_p^z ≤ n · (r_0/n^{10})^z + Σ_{j=1}^k Σ_{i=0}^{m−1} Σ_{p∈P_{i,j}\P_{i+1,j}} (2∥p − c_j^*∥)^z = O(2^z) · opt(P ),

and

r_{f(p)}^z ≤ 2^{2z−1}(∥p − f (p)∥^z + ∥p − C^*(p)∥^z).   (7)

Therefore,
cost(G(P), G(C))
 ≡ Σ_{p∈P} ∥Gp − Gf (p)∥^z
 ≥ Σ_{p∈P} [ (1 − zε)∥Gu_p − Gu_{f(p)}∥^z − ε^{−z}∥Gp − Gu_p∥^z − ε^{−z}∥Gf (p) − Gu_{f(p)}∥^z ]            (by Lemma 2.1)
 ≥ Σ_{p∈P} [ (1 − zε)(1 − ε)^z ∥u_p − u_{f(p)}∥^z − ε^{−z}(10ε^3 r_p)^z − ε^{−z}(10ε^3 r_{f(p)})^z ]        (by (5) and (6))
 ≥ Σ_{p∈P} [ (1 − zε)^2 (1 − ε)^z ∥p − f (p)∥^z − O(ε)^z r_p^z − O(ε)^z r_{f(p)}^z ]                        (by Lemma 2.1)
 ≥ Σ_{p∈P} [ (1 − 3zε)∥p − f (p)∥^z − O(ε)^z r_p^z − O(ε)^z 2^{2z−1}(∥p − f (p)∥^z + ∥p − C^*(p)∥^z) ]      (by (7))
Theorem 4.1. Let ε > 0, z ≥ 1 and d, k, s ∈ N and a Gaussian JL map G ∈ Rt×d with suitable
t = O(z 2 ε−2 (log s + z log(z/ε))). For every set P ⊆ Rd and every candidate center set Q ⊆ Rd with
|Q| = s ≥ k, with probability at least 2/3,
The proof of Theorem 4.1 uses the following lemma, whose proof is provided in Section 4.1.
Lemma 4.2. There exists a universal constant A_2 > 1, such that for every P ⊂ R^d, ε > 0, z ≥ 1, k ∈
N, c ∈ R^d, and a Gaussian JL map G ∈ R^{t×d}, with probability 1 − ε^{−O(z)} k^2 e^{−A_2 ε^2 t},

∀P ′ ⊆ P :   Σ_{p∈P ′} ∥Gp − Gc∥^z ≥ (1 − ε)^{3z} Σ_{p∈P ′} ∥p − c∥^z − (ε/k^2) · opt-cont_k^z(P ).
Remark 4.3. There is a similar statement in [MMR19], but w.r.t. the optimal center of P ′ . In
contrast, here the center is fixed.
Proof of Theorem 4.1. The first guarantee is the same as Theorem 3.1, so we omit its proof and
focus on the second guarantee. By Lemma 4.2 and a union bound over Q, we have that with
probability 1 − s · ε^{−O(z)} k^2 e^{−A_2 ε^2 t} ≥ 2/3, all centers c ∈ Q satisfy

∀P ′ ⊆ P :   Σ_{p∈P ′} ∥Gp − Gc∥^z ≥ (1 − ε)^{3z} Σ_{p∈P ′} ∥p − c∥^z − (ε/k^2) · opt-cont_k^z(P ).   (8)
To bypass the O(ε−2 log |Q|) barrier in the target dimension, we consider relaxed contraction,
and prove the following.
Theorem 4.4. Let ε > 0, z ≥ 1, α > 2, and d, ddim, k ∈ N, and let G ∈ R^{t×d} be a Gaussian JL map with suitable
t = O(z^2 ε^{−2}(ddim log(z/ε) + log k + log log α + log log n)). For every n-point set P ⊆ R^d and every
candidate center set Q ⊆ R^d with ddim(P ∪ Q) ≤ ddim, with probability at least 2/3, for every C ⊆ Q with |C| ≤ k,

cost_k^z(G(P ), G(C)) ≥ min{α · opt_k^z(P, Q), (1 − ε) cost_k^z(P, C)}.
Proof. The first guarantee is the same as Theorem 3.1, so we omit its proof and focus on the second
guarantee. Consider an optimal discrete k-median of P . Denote by C^* = {c_1^*, . . . , c_k^*} ⊆ P and by
S_1^*, . . . , S_k^* the centers and clusters (respectively) in that solution. Denote r_0 := opt_k^z(P, Q)^{1/z}. Pick
a suitable m = O(log n) such that 2^m = n^{10}. Let L be the same (sufficiently large) constant
as in Lemma 2.8. For i ∈ [− log(10Lα), m] and j ∈ [k], set r_i = r_0/2^i and P_{ij} = S_j^* ∩ B(c_j^*, r_i), i.e.,
for every cluster, we have a sequence of geometrically decreasing balls. Additionally, let N_i be an
ε^3 r_i-net of ∪_j P_{ij}. As in the proof of Theorem 3.1, we have Σ_{p∈P} r_p^z = O(2^z) opt.
By Lemmas 2.6 to 2.8 and a union bound, the following hold with probability at least 2/3,
Equations (9) and (10) are the same as Equations (5) and (6), and hold with probability at least
9/10. Equations (11) and (12) each hold w.p. 9/10 directly by Lemmas 2.7 and 2.8, respectively.
A union bound yields the desired success probability > 2/3.
We are now ready to prove the theorem. Let C = {c_1, . . . , c_k} ⊆ Q and let P = (S_1, . . . , S_k)
be a partition of P . For p ∈ P , denote by f (p) ∈ C the center to which p is assigned. Consider
the following cases.

Case 1, ∃j ∈ [k] s.t. ∥c_j − C^*(c_j)∥ ≥ 10Lα · r_0 and S_j ⊈ {c_j}. By assumption, there exists a point
p ∈ S_j, p ≠ c_j. Then ∥c_j − C^*(p)∥ ≥ ∥c_j − C^*(c_j)∥ ≥ 10Lα · r_0. By (12), ∥Gc_j − GC^*(p)∥ ≥ 10αr_0.
On the other hand, ∥p − C^*(p)∥ ≤ r_0. By (11), ∥Gp − GC^*(p)∥ ≤ 10r_0. Hence,
Case 2, ∀j ∈ [k], ∥c_j − C^*(c_j)∥ ≤ 10Lα · r_0 or S_j ⊆ {c_j}. Without loss of generality, we can
assume that for all j ∈ [k], S_j ⊈ {c_j}. This is since whenever S_j ⊆ {c_j}, we have cost(S_j, c_j) =
cost(G(S_j), Gc_j) = 0.
Therefore, every center in C is covered by the union of nets ∪_i N_i. For every p ∈ P ∪ Q we
denote by u_p the nearest net-point to p in the level such that P_i \ P_{i+1} contains p, and the radius
of that level is denoted r_p. As in the proof of Theorem 3.1, we are able to establish (7) for every
r_{f(p)}. Then
cost(G(P), G(C))
 ≡ Σ_{p∈P} ∥Gp − Gf (p)∥^z
 ≥ Σ_{p∈P} [ (1 − zε)∥Gu_p − Gu_{f(p)}∥^z − ε^{−z}∥Gp − Gu_p∥^z − ε^{−z}∥Gf (p) − Gu_{f(p)}∥^z ]            (by Lemma 2.1)
 ≥ Σ_{p∈P} [ (1 − zε)(1 − ε)^z ∥u_p − u_{f(p)}∥^z − ε^{−z}(10ε^3 r_p)^z − ε^{−z}(10ε^3 r_{f(p)})^z ]        (by (9) and (10))
 ≥ Σ_{p∈P} [ (1 − zε)^2 (1 − ε)^z ∥p − f (p)∥^z − O(ε)^z r_p^z − O(ε)^z r_{f(p)}^z ]                        (by Lemma 2.1)
 ≥ Σ_{p∈P} [ (1 − 3zε)∥p − f (p)∥^z − O(ε)^z r_p^z − O(ε)^z 2^{2z−1}(∥p − f (p)∥^z + ∥p − C^*(p)∥^z) ]      (by (7))
Our proof of Lemma 4.2 is based on [Dan21], and is by reducing to the central symmetric case.
We say a point set X ⊂ Rd is central symmetric with center c ∈ Rd , if for every point x ∈ X, it
holds 2c − x ∈ X. The following lemma shows that the (continuous) 1-median center of a central
symmetric point set coincides with its center of symmetry.
Lemma 4.5. Let z ≥ 1 and X ⊂ Rd be a central symmetric point set centered at point c ∈ Rd .
Then c is an optimal (continuous) (1, z)-clustering center of X.
Denote opt-contzk (P ) as the optimal continuous (k, z)-clustering value of P . The following lemma
is a restatement of [MMR19, Theorem 3.4].
Lemma 4.6 ([MMR19, Theorem 3.4]). Consider a point set X ⊂ R^d. Let G be a random linear
map and C be a random subset of X (which may depend on G). Then with probability at least
1 − O(ε^{−O(z)} k^2 e^{−Ω(ε^2 t)}),

opt-cont_1^z(G(C)) ≥ (1 − ε)^{3z} opt-cont_1^z(C) − (ε/k^2) opt-cont_k^z(X).
Proof of Lemma 4.2. Let P̃ ⊆ P be a subset that maximizes

(1 − ε)^{3z} Σ_{p∈P̃} ∥p − c∥^z − Σ_{p∈P̃} ∥Gp − Gc∥^z.
Lemma 5.2. Σ_{p∈P} r_p^z ≤ 2^z opt_k^z(P, Q).
For C ⊆ Q and p ∈ P , recall we denote by C(p) the point closest to p in C. We have the
following lemma that upper bounds the distance from C(p) to C ∗ (p) (and also the distance from
C(p) to p).
Lemma 5.3. Let C ⊆ Q. Then for every i ∈ [k] and p ∈ Si∗ , it holds that ∥C(p) − c∗i ∥ ≤
4 max{rp , ∥c∗i − C(c∗i )∥}.
Proof.
Proof of Theorem 5.1. The first guarantee is the same as Theorem 3.1, so we omit its proof and
focus on the second guarantee. For a generic solution C ⊆ Q, |C| = k, denote C = {c1 , c2 , . . . , ck }.
Denote f (p) := G−1 (GC(Gp)), i.e., f (p) is a center in C realizing dist(Gp, G(C)). For j ∈ [k],
denote Sj := {p ∈ P : f (p) = cj } as the cluster induced by cj .
For every i ∈ [k], define the “threshold level” of cluster i as
We also define the i-th “buffer” as Ii := [ℓi − log(2000L2 ), ℓi + log(αk)], where L is the (sufficiently
large) constant in Lemma 2.8.
For 0 ≤ ℓ ≤ m, denote random variable βℓ to be the minimum real, such that ∀u, v ∈ Nℓ , ∥Gu −
Gv∥ ≥ (1 − ε − βℓ ε)∥u − v∥. Denote random variable γℓ to be the minimum real, such that
∀u ∈ Nℓ , v ∈ B(u, ε3 rℓ ), ∥Gu − Gv∥ ≤ γℓ ε3 rℓ . For p ∈ P ∪ Q, write βp := βjp and γp := γjp for
simplicity.
In the following lemma, we define our good events and bound their success probability. The
proof is deferred to Section 5.1.
Lemma 5.4. With probability at least 0.99, the following events happen simultaneously.
(a) Σ_{p∈P} β_p r_p^z ≤ e^{−Ω(ε^2 t)} · opt(P, Q), and Σ_{p∈P} γ_p^z r_p^z ≤ 10^z · O(opt(P, Q)).
(e) ∀i ∈ [k], ∀y ∈ (P ∪ Q) \ B(c∗i , 2000L2 · rℓi ), ∥Gy − Gc∗i ∥ > 2000L · rℓi .
(f ) For p ∈ P , denote by ξ_p the random variable ξ_p := min_{y : ∥y−p∥ > 9L·r_{ℓ_i}} ∥Gy − Gp∥. Then ∀i ∈ [k],

Σ_{p∈P_{ℓ_i}^i} ξ_p^z > α opt(P, Q).
Case 1, one cluster with no cover: max1≤i≤k {∥c∗i − C(c∗i )∥ − 10L · rℓi } > 0. Then there exists
i ∈ [k], such that ∥c∗i − C(c∗i )∥ > 10L · rℓi . Intuitively, this means all points in C are far away from
c∗i . Write
cost(G(P ), G(C)) ≥ cost(G(P_{ℓ_i}^i), G(C)) = Σ_{p∈P_{ℓ_i}^i} ∥Gp − Gf (p)∥^z.   (14)
∥p − f (p)∥ ≥ ∥p − C(p)∥
≥ ∥C(p) − c∗i ∥ − ∥p − c∗i ∥
≥ ∥c∗i − C(c∗i )∥ − ∥p − c∗i ∥
> 10L · rℓi − rp
≥ 9L · rℓi .
Case 2, max1≤i≤k {∥c∗i − C(c∗i )∥ − 10L · rℓi } ≤ 0. Then for every i ∈ [k], ∥c∗i − C(c∗i )∥ ≤ 10L · rℓi ,
which intuitively means every center in C ∗ has a nearby neighbor in C.
Comparing “fake” centers to optimal centers. Let i ∈ [k]. For every p ∈ Si∗ , we consider
the distance of p’s “fake” center f (p) (recall, Gf (p) realizes dist(Gp, G(C))) from p’s optimal center
c∗i . There are three ranges we consider for ∥f (p) − c∗i ∥.
Define R_i := {p ∈ S_i^* : r_{ℓ_i}/(αk) ≤ ∥f (p) − c_i^*∥ ≤ 2000L^2 · r_{ℓ_i}}, and denote R := ∪_{i=1}^k R_i
(called "the middle range"). Moreover, define T_i := {p ∈ S_i^* : ∥f (p) − c_i^*∥ ≤ r_{ℓ_i}/(αk)}, and denote
T := ∪_{i=1}^k T_i (called "the close range").
Case 2.1, the middle range p ∈ R. Let us first lower bound ∥Gp − Gf (p)∥ for p ∈ R. Assume
C ∗ (p) = c∗i and f (p) = cj , where i, j ∈ [k]. Since p ∈ Ri , we can assume rℓ+1 < ∥cj − c∗i ∥ ≤ rℓ for
some level ℓ ∈ Ii . Let ui,j be the net point in Nℓ closest to cj . Then
∥Gp − Gf (p)∥^z ≥ (1 − zε)∥Gp − Gu_{i,j}∥^z − ε^{−z}∥Gc_j − Gu_{i,j}∥^z        (by Lemma 2.1)
              ≥ (1 − zε)∥Gp − Gu_{i,j}∥^z − ε^{−z}(10ε^3 r_ℓ)^z               (by event (b))
              ≥ (1 − zε)∥Gp − Gu_{i,j}∥^z − O(ε)^{2z} ∥c_j − c_i^*∥^z
Case 2.2, the close range p ∈ T . This is somewhat of a special case of Case 2.1. Assume
C^*(p) = c_i^* and f (p) = c_j, where i, j ∈ [k]. Since p ∈ T_i, we have ∥c_j − c_i^*∥ ≤ r_ℓ for ℓ = ℓ_i + log(αk).
Let u_{i,j} be the net point in N_ℓ closest to c_j. We have

∥Gp − Gf (p)∥^z ≥ (1 − zε)∥Gp − Gu_{i,j}∥^z − ε^{−z}∥Gc_j − Gu_{i,j}∥^z
              ≥ (1 − zε)∥Gp − Gu_{i,j}∥^z − O(ε)^{2z} r_ℓ^z.

If p ∉ B(c_i^*, r_{ℓ_i+1}), then

r_ℓ ≤ (1/(2k))∥p − c_i^*∥ ≤ (1/(2k))(∥p − c_j∥ + ∥c_j − c_i^*∥) ≤ (1/(2k))(∥p − c_j∥ + r_ℓ).

Rearranging, we obtain r_ℓ ≤ ∥p − c_j∥. Summing over p ∈ T , we have

Σ_{p∈T} ∥Gp − Gf (p)∥^z
 = Σ_{i=1}^k Σ_{j=1}^k Σ_{p∈T_i∩S_j} ∥Gp − Gc_j∥^z
 ≥ Σ_{i=1}^k Σ_{j=1}^k Σ_{p∈T_i∩S_j} [ (1 − zε)∥Gp − Gu_{i,j}∥^z − O(ε)^{2z}∥p − c_j∥^z − O(ε)^{2z} |P ∩ B(c_i^*, r_{ℓ_i+1})| · r_ℓ^z ]
 ≥ Σ_{i=1}^k Σ_{j=1}^k Σ_{p∈T_i∩S_j} [ (1 − zε)∥Gp − Gu_{i,j}∥^z − O(ε)^{2z}∥p − c_j∥^z ] − O(ε opt)      (by choice of ℓ_i)
Case 2.3, the far range p ∉ R ∪ T . We now consider points p ∈ S_i^* \ (R ∪ T ), i.e., ∥f (p) − c_i^*∥ ≥
2000L^2 r_{ℓ_i}. Suppose f (p) = c_j. By (e), ∥Gc_j − Gc_i^*∥ ≥ 2000L r_{ℓ_i}.

Claim. Every such p satisfies r_p ≥ 10L r_{ℓ_i}.

Proof. Assume by contradiction that r_p < 10L r_{ℓ_i}. By Lemma 5.3, ∥C(p) − c_i^*∥ ≤ 4 max{r_p, ∥c_i^* −
C(c_i^*)∥} ≤ 40L r_{ℓ_i}. Thus by (d), Gp, GC(p) ∈ B(Gc_i^*, 400L r_{ℓ_i}). Therefore,

∥Gc_j − Gc_i^*∥ ≤ ∥Gc_j − Gp∥ + ∥Gp − Gc_i^*∥ ≤ ∥GC(p) − Gp∥ + ∥Gp − Gc_i^*∥ ≤ 800L r_{ℓ_i},

a contradiction.
On a high level, as can be seen from the claim, both f (p) and p are far from c_i^*. We
split into cases depending on which of p or f (p) is farther from c_i^* (up to a constant), as follows.
Case 2.3.1, p ∈ S_i^* \ (R ∪ T ), and ∥f (p) − c_i^*∥ > 10Lr_p. By the triangle inequality,
∥p − f (p)∥ ≥ ∥f (p) − c_i^*∥ − ∥p − c_i^*∥ > 10Lr_p − r_p ≥ 9Lr_p. Hence

Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥>10Lr_p} ∥Gp − Gf (p)∥^z
 ≥ Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥>10Lr_p} η_p^z                                          (since ∥p − f (p)∥ ≥ 9Lr_p)
 ≥ Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥>10Lr_p} (9r_p)^z − e^{−Ω(t)} · Σ_{p∈S_i^*} r_p^z        (by event (g))
 ≥ Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥>10Lr_p} (5r_p)^z − e^{−Ω(t)} · Σ_{p∈S_i^*} r_p^z
 ≥ Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥>10Lr_p} ∥p − C(p)∥^z − e^{−Ω(t)} · Σ_{p∈S_i^*} r_p^z    (by (17))   (18)
Case 2.3.2, p ∈ S_i^* \ (R ∪ T ), and ∥f (p) − c_i^*∥ ≤ 10Lr_p. Denote by u_p and u_{f(p)} the net points
in N_{j_p} that are closest to p and f (p), respectively. Then

∥Gp − Gf (p)∥^z
 ≥ (1 − 2zε)∥Gu_p − Gu_{f(p)}∥^z − ε^{−z}∥Gp − Gu_p∥^z − ε^{−z}∥Gf (p) − Gu_{f(p)}∥^z    (by the triangle inequality)
 ≥ (1 − 2zε)(1 − ε − β_p ε)^z ∥u_p − u_{f(p)}∥^z − 2ε^{−z}(γ_p ε^3 r_p)^z                 (by the definitions of β_p, γ_p)
 ≥ (1 − 3zε − β_p zε)∥u_p − u_{f(p)}∥^z − O(ε)^{2z} γ_p^z r_p^z
 ≥ (1 − 3zε − β_p zε)∥p − f (p)∥^z − O(ε)^{2z} r_p^z − O(ε)^{2z} γ_p^z r_p^z.

Since ∥p − f (p)∥ ≤ ∥p − c_i^*∥ + ∥f (p) − c_i^*∥ ≤ r_p + 10Lr_p ≤ 20Lr_p, we have

∥Gp − Gf (p)∥^z ≥ (1 − 3zε)∥p − f (p)∥^z − β_p zε · (20L)^z r_p^z − O(ε)^{2z} r_p^z − O(ε)^{2z} γ_p^z r_p^z.

Therefore,

Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥≤10Lr_p} ∥Gp − Gf (p)∥^z
 ≥ (1 − 3zε) Σ_{p∈S_i^*\(R∪T ) : ∥f (p)−c_i^*∥≤10Lr_p} ∥p − f (p)∥^z − zε(20L)^z Σ_{p∈S_i^*} β_p r_p^z − O(ε)^{2z} Σ_{p∈S_i^*} (1 + γ_p^z) r_p^z.   (19)
 ≥ (1 − 3zε) Σ_{p∈P\(R∪T )} ∥p − C(p)∥^z − zε(20L)^z e^{−Ω(ε^2 t)} · opt(P, Q) − O(ε)^{2z} · opt(P, Q)
 ≥ (1 − 3zε) Σ_{p∈P\(R∪T )} ∥p − C(p)∥^z − O(ε) · opt(P, Q),   (20)
where the second last inequality follows from event (a) and Lemma 5.2. Finally, we combine (15),(16)
and (20) and obtain
(e) ∀i ∈ [k], ∀y ∈ (P ∪ Q) \ B(c_i^*, 2000L^2 · r_{ℓ_i}), ∥Gy − Gc_i^*∥ > 2000L · r_{ℓ_i}.

(f ) For p ∈ P , denote by ξ_p the random variable ξ_p := min_{y : ∥y−p∥ > 9L·r_{ℓ_i}} ∥Gy − Gp∥. Then ∀i ∈ [k],

Σ_{p∈P_{ℓ_i}^i} ξ_p^z > α opt(P, Q).
Proof. It suffices to show that each of the events happens with probability at least 0.999, then a
union bound concludes the proof.
Event (a). We first show that ∀ℓ ∈ [m], E(β_ℓ) = e^{−Ω(ε^2 t)}. Recall we define β_ℓ to be the minimum
real such that ∀u, v ∈ N_ℓ, ∥Gu − Gv∥ ≥ (1 − ε − β_ℓ ε)∥u − v∥. Then for every h ≥ 0, we have

Pr(β_ℓ > h) ≤ Pr(∃u, v ∈ N_ℓ, ∥Gu − Gv∥ < (1 − (h + 1)ε)∥u − v∥)
           ≤ ε^{−O(ddim)} e^{−(h+1)^2 ε^2 t/8}
           ≤ e^{−a·ε^2 t(h+1)^2}.

The last inequality holds since the target dimension t = Ω(ε^{−2} ddim log ε^{−1}), and thus a is a constant.
Therefore,

E(β_ℓ) = ∫_0^{+∞} Pr(β_ℓ > h) dh ≤ ∫_0^{+∞} e^{−aε^2 t(h+1)^2} dh = ∫_1^{+∞} e^{−aε^2 th^2} dh ≤ ∫_1^{+∞} h e^{−aε^2 th^2} dh = (1/(2aε^2 t)) e^{−aε^2 t}.

Hence,

E(Σ_{p∈P} β_p r_p^z) = Σ_{p∈P} r_p^z · E(β_p) = e^{−Ω(ε^2 t)} · Σ_{p∈P} r_p^z = e^{−Ω(ε^2 t)} 2^z · opt(P, Q) ≤ e^{−Ω(ε^2 t)} · opt(P, Q).
Similarly, recall that γ_ℓ is the minimum real such that ∀u ∈ N_ℓ, v ∈ B(u, ε^3 r_ℓ), ∥Gu − Gv∥ ≤ γ_ℓ ε^3 r_ℓ. Then for every h > 10, we have

Pr(γ_ℓ > h) ≤ Pr(∃u ∈ N_ℓ, v ∈ B(u, ε^3 r_ℓ), ∥Gu − Gv∥ > hε^3 r_ℓ)
           ≤ ε^{−O(ddim)} e^{−A_3 h^2 t}            (by Lemma 2.7)
           ≤ e^{−bth^2},

Hence,

E(Σ_{p∈P} γ_p^z r_p^z) = Σ_{p∈P} r_p^z · E(γ_p^z) = O(10^z) · Σ_{p∈P} r_p^z ≤ O(10^z) · opt(P, Q).
Event (b). Fix i ∈ [k], level ℓ ∈ Ii and a net point u ∈ Nℓ . By Lemma 2.7, Pr(∃v ∈
B(u, ε3 rℓ ), ∥Gu − Gv∥ > 10ε3 rℓ ) ≤ e−Ω(t) . Noting that |Ii | = O(log(αk)), a union bound over all
k · (log k + log α) · ε−O(ddim) tuples (i, ℓ, u) completes the proof.
Event (c). Fix a point u ∈ P . By Lemma 4.2, with probability 1 − ε^{−O(z)} k^2 e^{−Ω(ε^2 t)},

∀P ′ ⊆ P :   Σ_{p∈P ′} ∥Gp − Gu∥^z ≥ (1 − ε)^{3z} Σ_{p∈P ′} ∥p − u∥^z − (ε/k^2) opt-cont_k^z(P )
                                  ≥ (1 − ε)^{3z} Σ_{p∈P ′} ∥p − u∥^z − (ε/k^2) opt(P, Q).

A union bound over all k · (log k + log α) · ε^{−O(ddim)} tuples (i, ℓ, u) completes the proof.
Event (d). For every i ∈ [k], by Lemma 2.7, Pr(∃y ∈ B(c∗i , 40L · rℓi ), ∥Gy − Gc∗i ∥ > 400L · rℓi ) ≤
e−Ω(t) . A union bound over i ∈ [k] completes the proof.
Event (e). For every i ∈ [k], by Lemma 2.8, Pr(∃y ∈ (P ∪ Q) \ B(c∗i , 2000L2 · rℓi ), ∥Gy − Gc∗i ∥ ≤
2000L · rℓi ) ≤ e−Ω(t) . A union bound over i ∈ [k] completes the proof.
By Markov’s inequality and a union bound over i ∈ [k], with probability 0.999,
X X X
ξp ≥ (9rℓi )z − O(k · e−A2 t ) · (9rℓi )z ≥ 8|Pℓii | · rℓzi > α opt(P, Q), ∀i ∈ [k].
p∈Pℓi p∈Pℓi p∈Pℓi
i i i
Event (g). By Lemma 2.8, we have Pr(η_p < 9r_p) ≤ e^{−A_2 t}. Hence,
By Markov's inequality and a union bound over i ∈ [k], with probability 0.999,

Σ_{p∈S_i^*} max{0, 9r_p − η_p} ≤ O(k · e^{−A_2 t}) · Σ_{p∈S_i^*} 9^z r_p^z ≤ e^{−Ω(t)} Σ_{p∈S_i^*} r_p^z,   ∀i ∈ [k].
6 Lower bounds
In this section, we provide our lower bounds. For simplicity, we do not try to optimize the dependence
on z. All lower bounds are presented for z = 1.
Theorem 6.1. Let n, d ∈ N, and P = {0_d}^n. Let G ∈ R^{(d−1)×d} be any linear map. Then there
exists c ∈ R^d such that Σ_{p∈P} ∥Gp − Gc∥ = 0 and Σ_{p∈P} ∥p − c∥ = n.
Proof. Take a unit length vector c ∈ ker(G), i.e., Gc = 0 and ∥c∥ = 1. The proof follows immediately.
Theorem 6.2. Let n, d ∈ N and ε ∈ (0, 1/2). There exists P ⊂ R^d of size |P | = n and ddim(P ) =
Θ(1), such that if G is a Gaussian JL map onto dimension t ≤ aε^{−2} log log n for a sufficiently small
constant a > 0, then with probability at least 2/3, there exists c ∈ P such that Σ_{p∈P} ∥Gp − Gc∥ ≤
(1 − ε) Σ_{p∈P} ∥p − c∥.
To prove the theorem, we will use the following lemma, whose proof appears in Section B for
completeness (see [ZZ20] for a stronger bound).
Lemma 6.3. Let t > 2 be an integer, let X_t be a chi-squared random variable with t degrees of
freedom, and let ε ∈ (0, 1/2). We have

Pr(X_t < t/(1 + ε)) ≥ e^{−O(ε^2 t)}/t.
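This lower-tail behavior is easy to verify numerically; the short check below (our own, using scipy, with arbitrarily chosen t, ε, and constant 1 in the exponent) compares the exact CDF value with the e^{−O(ε²t)}/t shape.

```python
import numpy as np
from scipy.stats import chi2

t, eps = 64, 0.2
lhs = chi2.cdf(t / (1 + eps), df=t)    # Pr(X_t < t / (1 + eps))
rhs = np.exp(-eps ** 2 * t) / t        # the e^{-O(eps^2 t)} / t shape
print(lhs, rhs)                        # lhs comfortably exceeds rhs for these parameters
```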
Proof of Theorem 6.2. Denote by e_i the i-th standard basis vector. Pick P = {2^{−i} e_i}_{i=0}^{(1/2) log n} ∪
{0_d}^{n − (1/2) log n}. For each i ∈ [0, (1/2) log n], by Lemma 6.3,

Pr(∥Ge_i∥ ≤ 1 − ε) ≥ e^{−O(ε^2 t)}/t ≥ 10/ log n.

Therefore, the probability that for all i ∈ [0, (1/2) log n], we have ∥Ge_i∥ > 1 − ε, is at most
(1 − 10/ log n)^{(1/2) log n} ≤ 1/10. Suppose this event does not happen, therefore there exists i^* ∈ [0, (1/2) log n]
such that ∥Ge_{i^*}∥ ≤ 1 − ε. Moreover, assume that max_{i∈[0,(1/2) log n]} ∥Ge_i∥ ≤ log log n, which holds with high
probability. Pick c = 2^{−i^*} e_{i^*}. We have

Σ_{p∈P} ∥Gp − Gc∥ ≤ (n − (1/2) log n) · ∥Gc∥ + (1/2) log n · 2 log log n
                  ≤ (1 − ε)(n − (1/2) log n) 2^{−i^*} + log n · log log n
                  ≤ (1 − ε/2)(n − (1/2) log n) 2^{−i^*},

where we used 2^{−i^*} ≥ 2^{−(1/2) log n} = 1/√n, so for sufficiently large n, we have log n · log log n ≤
(ε/(2√n))(n − (1/2) log n). Moreover,

Σ_{p∈P} ∥p − c∥ ≥ (n − (1/2) log n) 2^{−i^*}.

The proof follows by rescaling ε and combining the two inequalities.
Therefore, the probability that for all i ∈ [0, (1/2) log n], we have ∥Ge_i∥ > 1 − ε, is at most
(1 − 10/ log n)^{(1/2) log n} ≤ 1/10. Suppose this event does not happen, therefore there exists i^* ∈ [0, (1/2) log n]
such that ∥Ge_{i^*}∥ ≤ 1 − ε. Moreover, assume that Σ_{i∈[0,(1/2) log n]} ∥Ge_i∥(1 − ε)^i ≤ (1 + ε) Σ_{i∈[0,(1/2) log n]} (1 − ε)^i
= (1 + ε) opt, which holds with high probability by [Dan21, Theorem A.3.1] since t = Ω(ε^{−2})
(alternatively, one can use Lemma 4.6 with k = 1, and get that this bound holds w.h.p. if
t = Ω(ε^{−2} log(1/ε))). Pick c_1 = 0, c_2 = (1 − ε)^{i^*} e_{i^*} and P_2 = {0_d}^{50ε^{−1}·(1−ε)^{−i^*}}, P_1 = P \ P_2. (Note
that 50ε^{−1} · (1 − ε)^{−i^*} ≤ 50ε^{−1} · 2^{(1/2) log n} ≤ n/2, so this assignment is feasible.) We have

Σ_{i∈{1,2}} Σ_{p∈P_i} ∥p − c_i∥ = (1 − ε)^{i^*} · 50ε^{−1} · (1 − ε)^{−i^*} + Σ_{i≠i^*} (1 − ε)^i ≥ 51ε^{−1} − 2;

and

Σ_{i∈{1,2}} Σ_{p∈P_i} ∥Gp − Gc_i∥ ≤ (1 − ε)^{i^*} · 50ε^{−1} · (1 − ε)^{−i^*} · (1 − ε) + Σ_{i≠i^*} ∥Ge_i∥(1 − ε)^i
 ≤ (1 − ε)50ε^{−1} + (1 + ε) Σ_{i∈[0,(1/2) log n]} (1 − ε)^i
 ≤ (1 − ε)50ε^{−1} + (1 + ε)ε^{−1}
 = 51ε^{−1} − 49,

which is smaller than both Σ_{i∈{1,2}} Σ_{p∈P_i} ∥p − c_i∥ and 100 opt = 100ε^{−1}, concluding the proof.
6.4 Discrete, for all centers, with candidate center set
Theorem 6.5. Let n, s, d ∈ N and ε ∈ (0, 1/2). There exist P, Q ⊂ R^d of sizes |P | = n, |Q| = s,
and ddim(P ∪ Q) = O(1), such that if G is a Gaussian JL map onto dimension t ≤ aε^{−2} log s for
a sufficiently small constant a > 0, then with probability at least 2/3, there exists c ∈ Q such that
Σ_{p∈P} ∥Gp − Gc∥ ≤ (1 − ε) Σ_{p∈P} ∥p − c∥.
Proof. Consider P = {0_d}^n and Q = {2^i e_i}_{i=1}^s. For each i ∈ [s], by Lemma 6.3,

Pr(∥Ge_i∥ ≤ 1 − ε) ≥ e^{−O(ε^2 t)}/t ≥ 10/s.

With probability at least 1 − (1 − 10/s)^s ≥ 1 − e^{−10}, there exists i^* ∈ [s] such that ∥Ge_{i^*}∥ ≤ 1 − ε.
Pick c = 2^{i^*} e_{i^*}. Then

cost(G(P ), Gc) = n · 2^{i^*} ∥Ge_{i^*}∥ ≤ n · 2^{i^*} · (1 − ε) = (1 − ε) cost(P, c).
References
[BBC+ 19] Luca Becchetti, Marc Bury, Vincent Cohen-Addad, Fabrizio Grandoni, and Chris
Schwiegelshohn. Oblivious dimension reduction for k-means: beyond subspaces and
the Johnson-Lindenstrauss Lemma. In Proceedings of the 51st Annual ACM SIGACT
Symposium on Theory of Computing, STOC, pages 1039–1050, 2019. doi:10.1145/33
13276.3316318.
[BRS11] Yair Bartal, Ben Recht, and Leonard J Schulman. Dimensionality reduction: beyond the
johnson-lindenstrauss bound. In Proceedings of the twenty-second annual ACM-SIAM
symposium on Discrete Algorithms, pages 868–887. SIAM, 2011.
[BZD10] Christos Boutsidis, Anastasios Zouzias, and Petros Drineas. Random projections for k-
means clustering. In 24th Annual Conference on Neural Information Processing Systems,
NeurIPS, pages 298–306. Curran Associates, Inc., 2010. URL: https://proceedings.ne
urips.cc/paper/2010/hash/73278a4a86960eeb576a8fd4c9ec6997-Abstract.html.
[CEM+ 15] Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina
Persu. Dimensionality reduction for k-means clustering and low rank approximation. In
Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing,
STOC, pages 163–172, 2015. doi:10.1145/2746539.2746569.
[CEMN22] Vincent Cohen-Addad, Hossein Esfandiari, Vahab Mirrokni, and Shyam Narayanan.
Improved approximations for Euclidean k-means and k-median, via nested quasi-
independent sets. In Proceedings of the 54th Annual ACM SIGACT Symposium on
Theory of Computing, pages 1621–1628, 2022.
[CFS21] Vincent Cohen-Addad, Andreas Emil Feldmann, and David Saulpic. Near-linear time
approximation schemes for clustering in doubling metrics. Journal of the ACM, 68(6):1–
34, 2021. doi:10.1145/3477541.
[CGL+ 25] Vincent Cohen-Addad, Fabrizio Grandoni, Euiwoong Lee, Chris Schwiegelshohn, and
Ola Svensson. A (2+ ε)-approximation algorithm for metric k-median. In Proceedings
of the 57th Annual ACM Symposium on Theory of Computing, pages 615–624, 2025.
[CJK23] Xiaoyu Chen, Shaofeng H.-C. Jiang, and Robert Krauthgamer. Streaming Euclidean
Max-Cut: Dimension vs data reduction. In Proceedings of the 55th Annual ACM
Symposium on Theory of Computing, STOC, pages 170–182, 2023. doi:10.1145/3564
246.3585170.
[CK19] Vincent Cohen-Addad and C. S. Karthik. Inapproximability of clustering in Lp metrics.
In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS),
pages 519–539. IEEE, 2019. doi:10.1109/FOCS.2019.00040.
[CKL22] Vincent Cohen-Addad, C. S. Karthik, and Euiwoong Lee. Johnson coverage hypothesis:
Inapproximability of k-means and k-median in ℓp -metrics. In Proceedings of the 2022
Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1493–1530.
SIAM, 2022.
[CNW16] Michael B. Cohen, Jelani Nelson, and David P. Woodruff. Optimal approximate ma-
trix product in terms of stable rank. In 43rd International Colloquium on Automata,
Languages, and Programming (ICALP 2016), volume 55 of LIPIcs, pages 11:1–11:14.
Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPICS.ICALP
.2016.11.
[CW25] Moses Charikar and Erik Waingarten. The Johnson-Lindenstrauss Lemma for clustering
and subspace approximation: From coresets to dimension reduction. In SODA, pages
3172–3209. SIAM, 2025. doi:10.1137/1.9781611978322.102.
[Dan21] Matan Danos. Coresets for clustering by uniform sampling and generalized rank aggre-
gation. Master’s thesis, Weizmann Institute of Science, Rehovot, Israel, 2021. URL:
https://www.wisdom.weizmann.ac.il/~robi/files/MatanDanos-MScThesis-202
1_11.pdf.
[DG03] Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson
and Lindenstrauss. Random Struct. Algorithms, 22(1):60–65, 2003. doi:10.1002/rsa.
10073.
[GJK+ 25] Jie Gao, Rajesh Jayaram, Benedikt Kolbe, Shay Sapir, Chris Schwiegelshohn, Sandeep
Silwal, and Erik Waingarten. Randomized dimensionality reduction for Euclidean maxi-
mization and diversity measures. In Forty-second International Conference on Machine
Learning, 2025. URL: https://openreview.net/forum?id=Rcivp36KzO.
[GK15] Lee-Ad Gottlieb and Robert Krauthgamer. A nonlinear approach to dimension reduction.
Discrete & Computational Geometry, 54(2):291–315, 2015. doi:10.1007/s00454-015
-9707-9.
[GKL03] Anupam Gupta, Robert Krauthgamer, and James R. Lee. Bounded geometries, fractals,
and low-distortion embeddings. In 44th Symposium on Foundations of Computer Science,
FOCS, pages 534–543. IEEE Computer Society, 2003. doi:10.1109/SFCS.2003.1238
226.
[HJKY25] Lingxiao Huang, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Di Yue. Near-optimal
dimension reduction for facility location. In Proceedings of the 57th Annual ACM
Symposium on Theory of Computing, STOC, pages 665–676, 2025. doi:10.1145/3717
823.3718214.
[HM04] Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median cluster-
ing. In STOC, pages 291–300. ACM, 2004. doi:10.1145/1007352.1007400.
[IM98] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing
the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on
the Theory of Computing, STOC, pages 604–613, 1998. doi:10.1145/276698.276876.
[IN07] Piotr Indyk and Assaf Naor. Nearest-neighbor-preserving embeddings. ACM Trans.
Algorithms, 3(3):31, 2007. doi:10.1145/1273340.1273347.
[ISZ21] Zachary Izzo, Sandeep Silwal, and Samson Zhou. Dimensionality reduction for Wasser-
stein barycenter. Advances in neural information processing systems, 34:15582–15594,
2021.
[JKS24] Shaofeng H.-C. Jiang, Robert Krauthgamer, and Shay Sapir. Moderate dimension
reduction for k-center clustering. In 40th International Symposium on Computational
Geometry (SoCG 2024), volume 293 of Leibniz International Proceedings in Informatics
(LIPIcs), pages 64:1–64:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024.
doi:10.4230/LIPIcs.SoCG.2024.64.
[JL84] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into
Hilbert space. Contemporary mathematics, 26:189–206, 1984. doi:10.1090/conm/026
/737400.
[KR09] Leonard Kaufman and Peter J Rousseeuw. Finding groups in data: an introduction to
cluster analysis. John Wiley & Sons, 2009.
[Lam10] Christiane Lammersen. Approximation Techniques for Facility Location and Their
Applications in Metric Embeddings. PhD thesis, Technische Universität Dortmund, 2010.
doi:10.17877/DE290R-8506.
[LN17] Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson-Lindenstrauss
Lemma. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS,
pages 633–638, 2017. doi:10.1109/FOCS.2017.64.
[LRU20] Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive data
sets. Cambridge university press, 2020.
[LSS09] Christiane Lammersen, Anastasios Sidiropoulos, and Christian Sohler. Streaming embed-
dings with slack. In 11th International Symposium on Algorithms and Data Structures,
WADS, volume 5664 of Lecture Notes in Computer Science, pages 483–494. Springer,
2009. doi:10.1007/978-3-642-03367-4\_42.
[Mah11] Michael W. Mahoney. Randomized algorithms for matrices and data. Foundations and
Trends in Machine Learning, 3(2):123–224, 2011. doi:10.1561/2200000035.
[NSIZ21] Shyam Narayanan, Sandeep Silwal, Piotr Indyk, and Or Zamir. Randomized dimen-
sionality reduction for facility location and single-linkage clustering. In Proceedings
of the 38th International Conference on Machine Learning, ICML, volume 139 of
Proceedings of Machine Learning Research, pages 7948–7957. PMLR, 2021. URL:
http://proceedings.mlr.press/v139/narayanan21b.html.
[PJ09] Hae-Sang Park and Chi-Hyuck Jun. A simple and fast algorithm for K-medoids cluster-
ing. Expert Syst. Appl., 36(2):3336–3341, 2009. doi:10.1016/J.ESWA.2008.01.039.
[TZM+ 20] Mo Tiwari, Martin J Zhang, James Mayclin, Sebastian Thrun, Chris Piech, and Ilan
Shomorony. Banditpam: Almost linear time k-medoids clustering via multi-armed
bandits. Advances in Neural Information Processing Systems, 33:10211–10222, 2020.
[Woo14] David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and
Trends in Theoretical Computer Science, 10(1—2):1–157, 2014. doi:10.1561/040000
0060.
[XGF16] Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for cluster-
ing analysis. In Proceedings of The 33rd International Conference on Machine Learning,
pages 478–487. PMLR, 2016. URL: https://proceedings.mlr.press/v48/xieb16.h
tml.
[ZZ20] Anru R. Zhang and Yuchen Zhou. On the non-asymptotic and sharp lower tail bounds
of random variables. Stat, 9(1):e314, 2020. doi:10.1002/sta4.314.
We state the following lemma from [MMR19], which bounds the expected distance distortion of a
fixed pair of points under Gaussian JL maps.
Lemma A.1 ([MMR19, Eq. (5)]). Let ε ∈ (0, 1), z ≥ 1 and G ∈ R^{t×d} be a Gaussian JL map. For
p, q ∈ R^d,

E( max{ 0, ∥Gp − Gq∥^z/∥p − q∥^z − (1 + ε)^z } ) ≤ e^{−Ω(ε^2 t)}.
Proof of Lemma 3.2. For every i ∈ [k] and p ∈ S_i, by Lemma A.1,

E( max{ 0, ∥Gp − Gc_i∥^z/∥p − c_i∥^z − (1 + ε)^z } ) ≤ e^{−Ω(ε^2 t)}.

Therefore,

E( max{ 0, ∥Gp − Gc_i∥^z − (1 + ε)^z ∥p − c_i∥^z } ) ≤ e^{−Ω(ε^2 t)} ∥p − c_i∥^z.
 ≥ (e^{−t/(2(1−ε))}/(2^{t/2} Γ(t/2))) ∫_0^{t/(1−ε)} x^{t/2−1} dx
 = (e^{−t/(2(1−ε))}/(2^{t/2} Γ(t/2))) · (1/(t/2)) · (t/(1 − ε))^{t/2}
 = (t^{t/2}/(2^{t/2} Γ(t/2)(t/2))) · e^{−t/(2(1−ε))} · (1/(1 − ε)^{t/2}).

Now we analyze the first term. From [Prond], we have that

Γ(t/2) = (t/2 − 1)! ≤ (t/2 − 1)^{t/2}/e^{t/2−2},

so

t^{t/2}/(2^{t/2} Γ(t/2)(t/2)) ≥ e^{t/2−2} · t^{t/2}/(2^{t/2}(t/2 − 1)^{t/2}(t/2)) ≥ e^{t/2−2} · t^{t/2}/(2^{t/2}(t/2)^{t/2}(t/2)) ≥ Ω(e^{t/2}/t).
Furthermore,

e^{−t/(2(1−ε))} · (1/(1 − ε)^{t/2}) = exp(−(t/2)(1/(1 − ε) + log(1 − ε))) ≥ exp(−(t/2)(1 + 2ε^2)),

where we used log(1 − ε) ≤ −ε and 1/(1 − ε) ≤ 1 + ε + 2ε^2, which hold for ε ≤ 1/2.
Multiplying the above two bounds and canceling the e^{t/2} term, we have

Pr(X < t/(1 − ε)) ≥ e^{−O(tε^2)}/t,

as desired.
Theorem C.2. Let n ∈ N and ε ∈ (0, 1/2), with n = Ω(ε^{−2}). Fix k = 2. There exists P ⊂ R^d of
size |P | = n, such that if G is a Gaussian JL map onto dimension t ≤ aε^{−2} log n for a sufficiently
small constant a > 0, then with probability at least 2/3, opt(G(P )) < (1 − ε) opt(P ).

Proof. The proof is based on [CW25] and a sketch is given in Section 1.1. Recall that the instance
is the first n standard basis vectors. We now complete the proof sketch into a full proof. Let
j_1 ∈ [n/2] and j_2 ∈ [n/2 + 1, n] be the indices minimizing ∥Ge_j∥ in their regime. By Lemma 6.3,
E(∥Ge_{j_i}∥^2) < 1 − ε and E(√(∥Ge_{j_i}∥^2 + 1)) < (1 − ε)√2 for i = 1, 2. Next, consider i ∈ [n/2]. We have

E(∥Ge_{j_2} − Ge_i∥) = E( E( ∥Ge_{j_2} − Ge_i∥ | Ge_{j_2} ) )               (law of total expectation)
                   ≤ E( √( E( ∥Ge_{j_2} − Ge_i∥^2 | Ge_{j_2} ) ) )          (Jensen's inequality)
                   = E( √( ∥Ge_{j_2}∥^2 + 1 ) )                             (independence)
                   < (1 − ε)√2,

and

var(∥Ge_{j_2} − Ge_i∥) ≤ E(∥Ge_{j_2} − Ge_i∥^2) = E(∥Ge_{j_2}∥^2 + 1) < 2 − ε,

and the same holds for i ∈ [n/2 + 1, n] and j_1. Therefore, by Chebyshev's inequality and a union
bound, with probability 2/3,

opt(G(P )) ≤ Σ_{i∈[n/2]} ∥Ge_i − Ge_{j_2}∥ + Σ_{i∈[n/2+1,n]} ∥Ge_i − Ge_{j_1}∥ < (1 − Ω(ε))√2 · n.