Geometric Medians On Product Manifolds
Abstract
Product manifolds arise when heterogeneous geometric variables are recorded jointly. While the
Fréchet mean on Riemannian manifolds separates cleanly across factors, the canonical geometric median
couples them, and its behavior in product spaces has remained largely unexplored. In this paper, we
give the first systematic treatment of this problem. After formulating the coupled objective, we establish
general existence and uniqueness results: the median is unique on any Hadamard product, and remains
locally unique under sharp conditions on curvature and injectivity radius even when one or more factors
have positive curvature. We then prove that the estimator enjoys Lipschitz stability to perturbations and
the optimal breakdown point, extending classical robustness guarantees to the product-manifold setting.
Two practical solvers are proposed, including a Riemannian subgradient method with global sublinear
convergence and a product-aware Weiszfeld iteration that achieves local linear convergence when safely
away from data singularities. Both algorithms update the factors independently while respecting the
latent coupling term, enabling implementation with standard manifold primitives. Simulations on pa-
rameter spaces of univariate and multivariate Gaussian distributions endowed with the Bures-Wasserstein
geometry show that the median is more resilient to contamination than the Fréchet mean. The results
provide both theoretical foundations and computational tools for robust location inference with hetero-
geneous manifold-valued data.
1 Introduction
Robustness is a foundational principle in modern statistical methodology, particularly in settings where data
may be contaminated by outliers, subject to model misspecification, or exhibit heavy-tailed behavior (Huber,
1981). A canonical example is the sample mean, which is widely regarded as a natural estimator of central
tendency under Gaussian models due to its optimality properties. However, this optimality is highly sensitive
to deviations from idealized assumptions: the sample mean can exhibit substantial bias in the presence of
even a single extreme observation. Such sensitivity renders the sample mean unreliable in a wide range of
real-world applications.
The geometric median addresses these limitations by offering a promising alternative. Defined as the
point minimizing the sum of distances to observed data, it naturally attenuates the influence of outliers.
Unlike the sample mean, which aggregates squared distances and thus amplifies extremity, the geometric
median prioritizes positional consensus. Beyond its robustness, it also enjoys computational simplicity and
clear geometric intuition, which have spurred its adoption across a variety of domains.
In recent years, this idea has been extended from flat spaces to curved geometries. Riemannian manifolds
now host a growing literature on statistical inference, accommodating datasets that live on spheres, rotation
groups, and spaces of covariance matrices. For example, directional statistics leverages the geometry of the
sphere (Mardia and Jupp, 2000), diffusion tensor imaging works with the manifold of symmetric positive-
definite matrices (Dryden et al., 2009; You and Park, 2021), and robotics operates on matrix Lie groups
(Selig, 2005). Foundational tools like the Fréchet mean (Fréchet, 1948) and more recently, the Riemannian
geometric median (Afsari, 2011; Fletcher et al., 2009), have been developed for such settings. These advances
even extend to spaces of probability measures, where Wasserstein geometry underpins robust summaries (You
et al., 2025).
Yet one major class of manifolds remains underexplored in this context: product manifolds. These
arise when heterogeneous geometric variables are measured jointly, as in multimodal applications. For
instance, combined measurements of diffusion tensors and principal directions yield data on the product of a
symmetric positive-definite manifold and a sphere; in pose estimation, one must account for both translation
and rotation; in neuroimaging, functional connectivity matrices may be paired with cortical coordinates.
Such product structures are ubiquitous, but robust location inference in these spaces has been given little
formal treatment.
A key distinction arises when comparing means and medians on product manifolds. The Fréchet mean
decomposes additively across factors, allowing independent computation on each component. In contrast,
the geometric median involves an ℓ1 -like objective that couples the components through a norm structure.
This non-separability introduces both theoretical challenges and algorithmic complexity, precluding direct
application of existing median methods designed for single manifolds.
This paper develops the first comprehensive theory of geometric medians on product manifolds, combining
geometric analysis, robustness theory, and algorithmic design. Our contributions are fourfold:
1. We develop a general theoretical framework for geometric medians on product manifolds, establishing
existence and uniqueness results under conditions involving curvature bounds and injectivity radii of the
component manifolds.
2. We characterize the inherent non-separability of the geometric median in product spaces, demonstrating
that the minimizer cannot, in general, be recovered via marginal optimization over individual factors.
This coupling is formalized and analyzed.
3. We derive robustness guarantees in the form of perturbation bounds and breakdown properties, showing
that the geometric median retains desirable stability characteristics under mild geometric conditions.
4. We introduce two algorithmic strategies for computing the geometric median based on subgradient de-
scent and a Riemannian generalization of the Weiszfeld algorithm. Both algorithms operate component-
wise while incorporating the coupling structure of the objective. Convergence properties are established
under suitable regularity assumptions.
The remainder of the paper is organized as follows. Section 2 reviews the geometry of product manifolds
and the formulation of geometric medians in the Riemannian setting. Section 3 presents the main theoretical
results on existence, uniqueness, and robustness. Section 4 introduces computational algorithms and analyzes
their convergence properties. Section 5 illustrates the methodology through representative examples. Proofs
are deferred to the appendix.
2 Preliminaries
This section reviews the mathematical foundations for the study of geometric medians on product manifolds
at the minimal level. We first recall the definition and properties of the geometric median in the Riemannian
setting, followed by a description of product manifold geometry, which plays a key role in both our theoretical
and computational developments.
Given a random sample x_1, . . . , x_n on a Riemannian manifold (M, d) with positive weights w_1, . . . , w_n summing to one, the weighted L_p center of mass is defined as a minimizer of

min_{x ∈ M} Σ_{i=1}^n w_i d(x, x_i)^p.    (1)

This formulation extends the L_p center of mass in Euclidean spaces to general Riemannian manifolds. For instance, setting M = R^d and p = 2 recovers the standard weighted average as the solution of Equation (1). For general manifolds, the problem and its solution are known as the Fréchet or Karcher mean when p = 2, which is a direct generalization of the sample mean. Another important case is p = 1, whose minimizer is known as the geometric median (Fletcher et al., 2009).
Denote by F(x) = Σ_{i=1}^n w_i d(x, x_i) the objective function in the geometric median problem. When the point x ∈ M is distinct from every datum x_i and the geodesic between x and each x_i is unique and length-minimizing, F(x) is directionally differentiable and locally Lipschitz. At such points, the subdifferential of F contains the vector

−Σ_{i=1}^n w_i · log_x(x_i) / ∥log_x(x_i)∥ ∈ ∂F(x),
where logx (xi ) ∈ Tx M denotes the Riemannian logarithmic map from x to xi , which gives the direction and
magnitude of the geodesic from x to xi . This subgradient formulation generalizes the Euclidean subdiffer-
ential of the ℓ1 norm to the Riemannian setting (Ferreira and Oliveira, 1998). When x = xj for some j, or
when multiple minimizing geodesics exist, the subdifferential is a set and includes a convex set of descent
directions.
Existence of the geometric median is guaranteed under mild conditions, the weakest of which is complete-
ness of M. However, uniqueness is much more subtle, contingent on both the curvature of the manifold and
the dispersion of the data distribution. In particular, on Hadamard manifolds that are complete, simply-
connected, and nonpositively curved, the uniqueness is immediate by the convexity of the distance function.
For positively curved manifolds, uniqueness may fail unless the data lie within a convex geodesic ball of
sufficiently small radius (Bhattacharya and Bhattacharya, 2012).
Let M and N be Riemannian manifolds and write M = M × N for their product. At each point (p, q) ∈ M, the tangent space decomposes as a direct sum
T(p,q) (M × N ) = Tp M ⊕ Tq N, (2)
and the inner product between tangent vectors (v_1, w_1), (v_2, w_2) ∈ T_{(p,q)}M is defined by

⟨(v_1, w_1), (v_2, w_2)⟩_{(p,q)} = ⟨v_1, v_2⟩_p + ⟨w_1, w_2⟩_q.    (3)
For the rest of this paper, we will refer to M and N as factor manifolds, or simply factors, for the components defining a product manifold.
Geodesics in M are given by pointwise pairing of geodesics from each factor. That is, if γM : [0, 1] → M
and γN : [0, 1] → N are geodesics in M and N respectively, then γ(t) = (γM (t), γN (t)) defines a geodesic in
M. The exponential map on M likewise acts factorwise,

exp_{(p,q)}(v, w) = (exp_p(v), exp_q(w)),   log_{(p,q)}(p′, q′) = (log_p(p′), log_q(q′)),

for (v, w) ∈ T_{(p,q)}M and (p′, q′) ∈ M. The geodesic distance on M is induced by the product metric,

dM((p_1, q_1), (p_2, q_2)) = √(dM(p_1, p_2)² + dN(q_1, q_2)²),

where dM and dN on the right-hand side denote the geodesic distances of the factors M and N, respectively.
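To make these operations concrete, the following minimal Python sketch assembles product-manifold primitives from factor-level ones. The class names and the callable interface (dist, exp, log per factor) are our own illustrative conventions, not those of any particular library; any factor implementation exposing the same three operations would slot in unchanged.

```python
import numpy as np

class Euclidean:
    """A flat factor used for illustration; a sphere or SPD factor with the
    same dist/exp/log signatures would work identically."""
    def dist(self, x, y): return np.linalg.norm(y - x)
    def exp(self, x, v):  return x + v
    def log(self, x, y):  return y - x

class ProductManifold:
    def __init__(self, M, N):
        self.M, self.N = M, N

    def dist(self, z1, z2):
        (p1, q1), (p2, q2) = z1, z2
        # Product distance: root of the summed squared factor distances.
        return np.sqrt(self.M.dist(p1, p2) ** 2 + self.N.dist(q1, q2) ** 2)

    def exp(self, z, vw):
        (p, q), (v, w) = z, vw
        return (self.M.exp(p, v), self.N.exp(q, w))      # factorwise

    def log(self, z1, z2):
        (p1, q1), (p2, q2) = z1, z2
        return (self.M.log(p1, p2), self.N.log(q1, q2))  # factorwise
```

Later sketches in this paper assume this hypothetical factor interface.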
3 Theory
3.1 Problem formulation
Let M = M × N be a product manifold equipped with the product Riemannian metric gM and the distance
function dM as defined in Section 2. Suppose we have a random sample {(x_i, y_i)}_{i=1}^n ⊂ M, with associated positive weights w_1, . . . , w_n > 0 such that Σ_{i=1}^n w_i = 1. The geometric median is defined as a minimizer
(p∗ , q ∗ ) ∈ M of the objective function
Fmedian(p, q) = Σ_{i=1}^n w_i dM((p, q), (x_i, y_i)) = Σ_{i=1}^n w_i √(dM(p, x_i)² + dN(q, y_i)²).    (4)
This formulation naturally couples the variables p ∈ M and q ∈ N on factor manifolds through the norm
structure of the product distance. For comparison, consider the Fréchet mean objective
Fmean(p, q) = Σ_{i=1}^n w_i dM((p, q), (x_i, y_i))² = Σ_{i=1}^n w_i dM(p, x_i)² + Σ_{i=1}^n w_i dN(q, y_i)²,
which admits separation of the objective into additive components, allowing independent optimization over
M and N . Unfortunately, the geometric median objective Fmedian lacks such separability. Hence, the
minimizer (p∗ , q ∗ ) cannot be obtained by solving two independent problems and this interdependency entails
nontrivial implications for both theoretical properties and computational treatment.
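The coupling is visible numerically even in the simplest product R × R, where the product metric is just the Euclidean plane. In the short sketch below (plain NumPy, with an illustrative three-point dataset of our own choosing), the vector of factorwise medians differs from the joint geometric median:

```python
import numpy as np

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # data on R x R

# Marginal optimization: the median of each factor separately.
marginal = np.median(pts, axis=0)                      # -> [0., 0.]

# Joint objective: Euclidean Weiszfeld iterations for the geometric median.
z = pts.mean(axis=0)
for _ in range(200):
    d = np.maximum(np.linalg.norm(pts - z, axis=1), 1e-12)
    z = (pts / d[:, None]).sum(axis=0) / (1.0 / d).sum()

print(marginal)  # [0. 0.]
print(z)         # approx [0.211, 0.211]; the joint median is not (0, 0)
```

Even in this flat setting, minimizing the coupled sum of root-sum-square distances moves the solution away from the pair of marginal medians.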
3.2 Existence
We begin by establishing the existence of a geometric median on the product manifold M, the argument of
which relies on classical results in variational analysis.
Theorem 3.1. Let M and N be complete Riemannian manifolds and let M = M × N denote the associated
product manifold with the product metric. Given a random sample {(x_i, y_i)}_{i=1}^n and weights w_1, . . . , w_n > 0 satisfying Σ_{i=1}^n w_i = 1, there exists at least one minimizer of the objective function

Fmedian(p, q) = Σ_{i=1}^n w_i dM((p, q), (x_i, y_i)).
We note that neither M nor N is assumed to be compact. If both were compact, existence would be immediate, since continuous functions on compact Riemannian manifolds are bounded and attain their extrema. The content of the theorem is that coercivity of the objective supplies the missing compactness, so existence of a geometric median is guaranteed even when one or both component manifolds are noncompact.
3.3 Uniqueness
We move on to investigating conditions under which the geometric median on a product manifold is uniquely defined. Throughout this section, we impose the standard assumption that not all points lie on a common geodesic. This prevents degeneracy of the problem and ensures strict convexity of the objective. We also assume that the data lie within a geodesically convex region so that distances are uniquely defined and differentiable.
The uniqueness of the geometric median is closely tied to the curvature properties of the underlying manifold.
We begin by investigating the sectional curvature of product manifolds in terms of those of their factors.
Proposition 3.2. Let M and N be Riemannian manifolds with sectional curvatures bounded above
by secM ≤ κM and secN ≤ κN , respectively, for κM , κN ≥ 0. Then the sectional curvature of the product
manifold M = M × N equipped with the product metric is bounded above by max(κM , κN ).
This proposition provides a geometric characterization of the curvature profile of product manifolds in
that the curvature of any 2-plane in M is controlled by the worst-case curvature among the individual
components.
A direct consequence of the curvature bound established in Proposition 3.2 is the following uniqueness result for the geometric median on products of manifolds of nonpositive curvature.
Proposition 3.3. Let M and N be Hadamard manifolds; that is, complete, simply connected Riemannian
manifolds with nonpositive sectional curvature. Then the product manifold M = M × N is also Hadamard,
and the geometric median of any finite set of weighted points in M is unique.
This result simplifies both the theoretical analysis and the algorithmic treatment of the geometric median in product settings with Hadamard components, including Euclidean space, the manifold of symmetric positive-definite matrices, and hyperbolic space.
Many important applications, however, involve manifolds with bounded positive curvature, compact components, or a mixture of the two, for which global uniqueness cannot be expected and a local analysis becomes necessary. We establish sufficient conditions for the local uniqueness of the geometric median based on curvature upper bounds and injectivity radii. This framework also covers mixed-curvature settings where the factor manifolds are Euclidean spaces and unit hyperspheres, a combination common in directional statistics.
Theorem 3.4. Let M and N be complete Riemannian manifolds with bounded sectional curvatures secM ≤ κM, secN ≤ κN, and let κ := max(κM, κN). Suppose there exists a point (p_0, q_0) ∈ M such that a random sample {(x_i, y_i)}_{i=1}^n ⊂ M lies within the geodesic ball B((p_0, q_0), r) ⊂ M with

r < min( injM(p_0), injN(q_0), π/(4√κ) ),

where injM(x) is the injectivity radius of a manifold M at x ∈ M and the last term is read as +∞ when κ = 0. Then the geometric median exists uniquely within the geodesic ball B((p_0, q_0), r).
Theorem 3.4 provides a general criterion for the local uniqueness of the geometric median. The radius condition reflects a standard trade-off between positive curvature and geodesic convexity. When κ = 0, as in flat or Hadamard spaces, uniqueness holds globally. When κ > 0, uniqueness holds in a sufficiently small neighborhood where convexity is preserved. This generalizes convexity-based uniqueness results for the Fréchet mean to the non-smooth geometric median setting (Afsari, 2011).
A particularly useful implication arises when one factor is nonpositively curved while the other is compact
with bounded positive curvature, such as the unit hypersphere. In such mixed-curvature settings, the
uniqueness argument can still be ensured by focusing the radius constraint entirely on the positively curved
factor as formalized in the following corollary.
Corollary 3.5. Suppose M is a Hadamard manifold and N is compact with secN ≤ κN. Then, for any sufficiently small ball B((p_0, q_0), r) ⊂ M containing a random sample {(x_i, y_i)}_{i=1}^n ⊂ M with a radius

r < min( injN(q_0), π/(4√κN) ),

the geometric median exists uniquely within B((p_0, q_0), r).
3.4 Robustness
In this subsection, we establish theoretical guarantees regarding the robustness of the geometric median on
product manifolds. To recall, robustness refers to the stability of the estimator under small perturbations of
the data and its resilience to outliers (Huber, 1981). These properties are well known in the Euclidean setting
and have been studied in various geometric contexts on several manifolds (Fletcher et al., 2009). We show
that similar guarantees hold in the product setting, both locally and globally, under suitable conditions.
We begin by formalizing the stability of the geometric median under data perturbations. The result
assumes that both the original and perturbed datasets lie within a convex geodesic ball whose radius is
determined by curvature and injectivity radius as previously established.
Proposition 3.6. Let M and N be complete Riemannian manifolds and assume the conditions of Theorem
3.4. Suppose the geometric median (p∗ , q ∗ ) uniquely exists within a geodesic ball B((p0 , q0 ), r) ⊂ M. Let the
perturbed sample {(x′i , yi′ )}ni=1 ⊂ M satisfy
dM (xi , x′i ) ≤ εM
i , dN (yi , yi′ ) ≤ εN
i for all i ∈ [n],
and let (p′∗ , q ′∗ ) denote the geometric median of the perturbed sample. Provided both (p′∗ , q ′∗ ) and (p∗ , q ∗ )
lie within B((p0 , q0 ), r), the following bound holds:
dM((p∗, q∗), (p′∗, q′∗))² ≤ (1/µ²) Σ_{i=1}^n w_i [(ε_i^M)² + (ε_i^N)²],

for some constant µ > 0 that depends on the geometry of M in the ball B((p_0, q_0), r). In particular, µ = 1 when both M and N are Hadamard manifolds.
Proposition 3.6 formalizes the intuition that the geometric median is a Lipschitz-continuous functional of
the data when restricted to sufficiently regular domains. In particular, it shows that small local perturbations
to the input induce a proportionally small displacement of the output.
Next, we characterize the estimator’s resilience to adversarial contamination. The following result estab-
lishes that the breakdown point of the geometric median on product manifolds is asymptotically optimal.
Proposition 3.7. Let the conditions of Proposition 3.6 hold with a dataset of n weighted points {(x_i, y_i)}_{i=1}^n ⊂ M. Suppose m < n points are replaced arbitrarily, yielding a contaminated dataset {(x′_i, y′_i)}_{i=1}^n such that (x′_i, y′_i) = (x_i, y_i) for i ∉ I with |I| = m. Let (p̃, q̃) ∈ M denote the geometric median of the contaminated dataset. Then,

sup_{(x′_i, y′_i): i ∈ I} dM((p∗, q∗), (p̃, q̃)) < ∞   if and only if   m < ⌊n/2⌋.

Hence, the geometric median on M has a breakdown point of 1/2.
This confirms that the geometric median is maximally robust against outliers, in the sense that it can
withstand up to 49% adversarial contamination without drastic degradation. This robustness, along with
the stability under smooth perturbations, makes the geometric median a highly desirable estimator as in the
traditional regime on single manifolds (Fletcher et al., 2009).
4 Computation
We present computational strategies for estimating the geometric median on product manifolds. The objec-
tive function Fmedian is nonsmooth, which calls for tools from nonsmooth Riemannian optimization. We first
formulate a general subgradient-based approach and show how the classical Riemannian Weiszfeld algorithm
arises as a special case. We then study the convergence behavior of these algorithms under mild assumptions
on the geometry of the manifold and the regularity of the data.
4.1 Algorithms
The product structure of M = M × N allows computation of the geometric median in a component-wise manner, with updates performed independently on each factor manifold through an interleaving term that connects the two. That is, the optimization problem on the product manifold naturally reduces to a coupled problem over M and N, where each update step involves operations in the respective tangent spaces T_p M and T_q N. This structure invites direct application of manifold optimization routines without requiring complex constructions on the full product space. We posit that a random sample {(x_i, y_i)}_{i=1}^n ⊂ M is given with positive weights {w_i}_{i=1}^n that sum to 1. For simplicity, we denote the objective Fmedian simply as F throughout the rest of this section.
The subdifferential of F at (p, q) decomposes along the factors as ∂F(p, q) ⊆ ∂_p F(p, q) × ∂_q F(p, q), where ∂_p F(p, q) ⊂ T_p M and ∂_q F(p, q) ⊂ T_q N are the partial subdifferentials of F with respect to each component. Using the standard notation, we can express the partial subdifferentials explicitly as

∂_p F(p, q) = −Σ_{i: p ≠ x_i} w_i · log_p(x_i) / dM((p, q), (x_i, y_i)) + Σ_{i: p = x_i} w_i · B_p,

∂_q F(p, q) = −Σ_{j: q ≠ y_j} w_j · log_q(y_j) / dM((p, q), (x_j, y_j)) + Σ_{j: q = y_j} w_j · B_q,

where log_p : M → T_p M and log_q : N → T_q N are the logarithmic maps (Absil et al., 2008), and B_p = {v ∈ T_p M | ∥v∥ ≤ 1} and B_q = {w ∈ T_q N | ∥w∥ ≤ 1} are the closed unit balls in the respective tangent spaces, representing the set-valued contributions from points where the distance function is non-differentiable.
In order to derive explicit expressions of the subgradients, we need to consider three cases for each i ∈ [n].
First, consider the case when p ̸= xi and q ̸= yi . This is the regular differentiable case and the contribution
of the i-th term to the subdifferentials is

∂_p F_i(p, q) = −w_i · log_p(x_i) / dM((p, q), (x_i, y_i)),   ∂_q F_i(p, q) = −w_i · log_q(y_i) / dM((p, q), (x_i, y_i)).
Next, consider when equality holds on one of the factor manifolds; without loss of generality, suppose p = x_i and q ≠ y_i. In this case dM(p, x_i) = 0, hence log_p(x_i) = 0, and by the construction of the geodesic distance we have dM((p, q), (x_i, y_i)) = dN(q, y_i). Therefore, the individual contribution to ∂_p F(p, q) is the subdifferential of the norm at zero scaled by w_i, while the contribution to ∂_q F(p, q) retains the same form:

∂_p F_i(p, q) = {v ∈ T_p M | ∥v∥ ≤ w_i},   ∂_q F_i(p, q) = −w_i · log_q(y_i) / dN(q, y_i).
When p = x_i and q = y_i, the case of maximal nondifferentiability, directional derivatives in the classical sense cannot be defined. Instead, one can still define the subdifferentials, by the same logic as before, as

∂_p F_i(p, q) = {v ∈ T_p M | ∥v∥ ≤ w_i},   ∂_q F_i(p, q) = {w ∈ T_q N | ∥w∥ ≤ w_i}.

These represent all candidate directions scaled by the weight w_i, whose sum across i still gives a well-defined convex set of subgradients.
Putting the components together, we obtain a subgradient descent scheme on M, where updates are computed independently along each factor. Let (ξ_p^{(k)}, ξ_q^{(k)}) ∈ ∂F(p^{(k)}, q^{(k)}) denote a valid choice of subgradients at the k-th iteration. The iterates are updated according to the rules

p^{(k+1)} = exp_{p^{(k)}}(−η_k ξ_p^{(k)}),   q^{(k+1)} = exp_{q^{(k)}}(−η_k ξ_q^{(k)}),

where exp_p : T_p M → M and exp_q : T_q N → N are the exponential maps on each factor and η_k is a step size. This formulation updates both coordinates simultaneously, while the two updates share the interleaving term dM((p^{(k)}, q^{(k)}), (x_i, y_i)) that respects the formulation of the geometric median. In practice, a specific choice of subgradient can be made by selecting a representative from each set-valued term, such as the element of minimal norm in ∂_p F(p, q) and ∂_q F(p, q), or even random vectors of small magnitude. We note that any measurable selection yields a valid subgradient step under standard conditions.
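As a minimal sketch, one iteration of this scheme can be written as follows, assuming the hypothetical factor interface (dist, exp, log) from the sketch in Section 2. The zero-subgradient choice at coincident points is one valid selection from the set-valued terms, not the only one.

```python
import numpy as np

def subgradient_step(p, q, data, weights, eta, M, N, tol=1e-12):
    """One Riemannian subgradient step for F on M x N (illustrative sketch)."""
    xi_p = np.zeros_like(M.log(p, p))   # accumulators in T_p M and T_q N
    xi_q = np.zeros_like(N.log(q, q))
    for (x, y), w in zip(data, weights):
        d = np.sqrt(M.dist(p, x) ** 2 + N.dist(q, y) ** 2)
        if d < tol:
            continue  # (p, q) = (x_i, y_i): select 0 from the unit-ball term
        # Regular case; if only p = x_i, then log_p(x_i) = 0 automatically
        # selects the zero element of the ball contribution on that factor.
        xi_p = xi_p - w * M.log(p, x) / d
        xi_q = xi_q - w * N.log(q, y) / d
    # Move along the negative subgradient on both factors simultaneously.
    return M.exp(p, -eta * xi_p), N.exp(q, -eta * xi_q)
```

Note that the coupling enters only through the shared denominator d, so each factor update still uses its own exponential and logarithm maps.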
A well-known alternative to subgradient descent is a fixed-point iteration that enforces the standard first-order condition of vanishing gradients. When the current iterate (p, q) is distinct in each factor from all of the (x_i, y_i)'s, that is, p ≠ x_i and q ≠ y_i for all i, the subdifferentials ∂_p F(p, q) and ∂_q F(p, q) are singletons and the vanishing-gradient conditions become

Σ_{i=1}^n w_i · log_p(x_i) / dM((p, q), (x_i, y_i)) = 0,   Σ_{i=1}^n w_i · log_q(y_i) / dM((p, q), (x_i, y_i)) = 0.
Solving these first-order conditions by fixed-point iteration motivates an update that exponentiates an average of logarithmic directions weighted by the inverses of the distances. The resulting scheme, known as the Riemannian Weiszfeld algorithm (Fletcher et al., 2009), reads

p^{(k+1)} = exp_{p^{(k)}}( Σ_{i=1}^n w̃_i^{(k)} log_{p^{(k)}}(x_i) )   and   q^{(k+1)} = exp_{q^{(k)}}( Σ_{i=1}^n w̃_i^{(k)} log_{q^{(k)}}(y_i) ),
with adaptive weights w̃_i^{(k)} given by

w̃_i^{(k)} = ( Σ_{j=1}^n w_j / dM((p^{(k)}, q^{(k)}), (x_j, y_j)) )^{−1} · w_i / dM((p^{(k)}, q^{(k)}), (x_i, y_i)).
Each update corresponds to a weighted Fréchet mean on Riemannian manifolds where the contribution
of each data point is scaled inversely by its distance from the current iterate. This yields an implicit
normalization of the descent direction and avoids explicit tuning of step size.
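A correspondingly compact sketch of the product-aware Weiszfeld update is given below, again assuming the hypothetical factor interface introduced earlier. The adaptive weights are computed once from the coupled product distances and shared by both factor updates.

```python
import numpy as np

def weiszfeld_update(p, q, data, weights, M, N, eps=1e-12):
    """One product-aware Weiszfeld update (illustrative sketch)."""
    d = np.array([np.sqrt(M.dist(p, x) ** 2 + N.dist(q, y) ** 2)
                  for (x, y) in data])
    wt = np.asarray(weights) / np.maximum(d, eps)  # w_i / d_i, guarded
    wt = wt / wt.sum()                             # adaptive weights w~_i
    # Average the log directions with the shared weights, then exponentiate.
    v = sum(w * M.log(p, x) for w, (x, _) in zip(wt, data))
    u = sum(w * N.log(q, y) for w, (_, y) in zip(wt, data))
    return M.exp(p, v), N.exp(q, u)
```

The eps guard above is one instance of the regularization remedies discussed next.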
The Weiszfeld algorithm is parameter-free and often exhibits linear convergence near the solution under
appropriate regularity. It is particularly effective when the geometric median lies deep within a geodesically
convex neighborhood of the data. However, its fixed-point nature and reliance on inverse-distance scaling can
result in instability near singularities, requiring care in implementation. There are several ad hoc remedies to cope with such scenarios (Beck and Sabach, 2015). One is to regularize terms with small distances dM by replacing them with, or adding, a small constant ε. Another strategy is to replace each update with a convex combination of the current iterate and the raw Weiszfeld direction, a process called damping. There are also algorithmic remedies that restart the algorithm, switch to a subgradient update, or simply terminate at the current point.
In contrast to subgradient descent, the Weiszfeld method exploits structure but lacks robustness in
degenerate regimes. In practice, hybrid schemes that use Weiszfeld iterations when far from singularities
and fall back to subgradient updates when near data points can be particularly effective.
Lemma 4.1. Given two Hadamard manifolds M and N , the geometric median objective Fmedian is geodesi-
cally convex on the product manifold M = M × N .
In this setting, the subgradient method enjoys global convergence guarantees from nonsmooth Rieman-
nian optimization theory (Zhang and Sra, 2016). Specifically, if subgradients are uniformly bounded and the
step sizes satisfy η_k = η_0/√(k+1) for some η_0 > 0, then the following holds:

min_{0≤j≤k} Fmedian(p^{(j)}, q^{(j)}) − Fmedian(p∗, q∗) = O(log k / √k),
where (p∗ , q ∗ ) are minimizers of the objective.
If M or N has positive curvature, global convexity is no longer available, yet convexity can still be recovered locally within a sufficiently small geodesic ball. That is, if the data and all iterates remain within such a ball, the same convergence guarantees hold locally.
Theorem 4.2. Let M := M × N be a product of complete Riemannian manifolds with curvature upper
bounds secM ≤ κM and secN ≤ κN, and let κ := max(κM, κN). Suppose a random sample {(x_i, y_i)}_{i=1}^n and the initial point (p^{(0)}, q^{(0)}) all lie within a geodesic ball centered at (p_0, q_0) with a radius r as prescribed in Theorem 3.4. Assume that the iterates (p^{(k)}, q^{(k)}) along the path of the subgradient method remain in the ball and the step size is chosen as η_k = η_0/√(k+1) for some η_0 > 0. Then the geometric median is unique in the ball B((p_0, q_0), r), and the method achieves a sublinear rate of convergence

min_{0≤j≤k} Fmedian(p^{(j)}, q^{(j)}) − Fmedian(p∗, q∗) = O(log k / √k).
The rate O(log k/√k) stated in Theorem 4.2 arises from applying a specific bound with the step size sequence η_k = η_0/√(k+1). Although this bound carries a factor of log k, the standard optimal rate of decrease in function value for the subgradient method on geodesically convex, Lipschitz functions is known to be O(1/√k). This tighter rate can typically be achieved with alternative step-size selection strategies, such as fixing the total number of iterations a priori or using step sizes that depend on the norm of the subgradient (e.g., Polyak step sizes), or through a more refined convergence analysis.
We now turn to the Weiszfeld algorithm. When the manifold M is Hadamard, the algorithm enjoys
global convergence to the unique geometric median, provided the iterates remain distinct from all data
points. This is due to the fact that the Weiszfeld update corresponds to a normalized fixed-point iteration
of the subgradient condition, which is well-defined and contractive in convex geodesic regions.
In positively curved or mixed-curvature settings, however, convergence is no longer global. Nonetheless,
under the same local convexity condition as Theorem 4.2, the Weiszfeld algorithm converges to the unique
minimizer in a sufficiently small geodesic ball, provided that the iterates avoid the singularities induced by
the data points.
Corollary 4.3. Under the same assumptions as in Theorem 4.2, suppose the iterates of the Riemannian
Weiszfeld algorithm remain within the ball B((p0 , q0 ), r) and that no iterates collapse onto any data point
(xi , yi ) for all i ∈ [n]. Then, the algorithm converges to the unique geometric median in the ball. Moreover,
if the distances dM ((p(k) , q (k) ), (xi , yi )) are uniformly bounded away from zero, the convergence is locally
linear.
This result confirms that the Weiszfeld algorithm is both theoretically sound and computationally attrac-
tive in product manifold settings, particularly when the geometry permits local convexity. The assumption
that iterates remain distinct from the data points excludes a measure-zero singular set where the denom-
inator in the update becomes ill-defined. In practice, this is enforced by remedial strategies we discussed,
including numerical damping or regularization.
The local linear convergence rate hinges on the conditioning of the problem near the minimizer. Specif-
ically, when the geometric median lies well-separated from the data, the denominator terms in the update
remain uniformly bounded away from zero, and the algorithm behaves like a contractive fixed-point map.
This is analogous to strong convexity in Euclidean settings, though it arises here from local convexity and
smoothness of the Riemannian distance function within the ball. In contrast, when data points cluster
tightly or lie near the median, the conditioning deteriorates, and convergence may slow down or stall due to
near-singular behavior. This sensitivity motivates hybrid strategies, where one begins with robust subgradi-
ent iterations that are insensitive to non-differentiability and switches to Weiszfeld updates once the iterates
approach the minimizer. Such hybrid schemes help to balance global robustness with local acceleration and
are particularly effective when the median lies near high-curvature regions of the manifold.
5 Examples
In this section, we illustrate the proposed framework for computing geometric medians on product manifolds
using simulated datasets. Specifically, we focus on the space of univariate and multivariate Gaussian distribu-
tions, where each distribution is viewed as a point on a product manifold endowed with the Bures-Wasserstein
geometry (Takatsu, 2011). These examples highlight the estimator’s robustness to contamination and the
feasibility of the proposed algorithms.
First, consider the space of univariate Gaussian laws N (µ, σ 2 ), parametrized by a mean µ ∈ R and a
standard deviation σ > 0. This space can be identified with R × R+ , equipped with the 2-Wasserstein
distance. In this case, the distance between two Gaussians P1 = N (µ1 , σ12 ) and P2 = N (µ2 , σ22 ) admits a
closed-form expression:
dW(P_1, P_2)² = (µ_1 − µ_2)² + (σ_1 − σ_2)².    (5)
While this geometry appears Euclidean, it reflects optimal transport geometry and forms a special case of
the broader Bures-Wasserstein structure.
We simulate n = 1000 observations N (µi , σi2 ) by drawing µi ∼ N (−1, 1/4) and σi2 ∼ Beta(5, 5), forming
the signal distribution. The most probable realization is centered at N (−1, 1/2), corresponding to the modal
values of the generative distributions for the parameters. These parameter pairs are treated as points on
M = R × R+ , with distance measured according to Equation (5).
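Since the geometry of Equation (5) is flat, the geometric median of the simulated (µ_i, σ_i) pairs can be computed with plain Euclidean Weiszfeld iterations. The following sketch reproduces the uncontaminated signal design described above; the seed and iteration count are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.normal(-1.0, 0.5, size=1000)            # mu_i ~ N(-1, 1/4)
sigma = np.sqrt(rng.beta(5.0, 5.0, size=1000))   # sigma_i^2 ~ Beta(5, 5)
pts = np.column_stack([mu, sigma])               # points on M = R x R_+

z = pts.mean(axis=0)                             # initialize at the mean
for _ in range(100):                             # flat Weiszfeld iterations
    d = np.maximum(np.linalg.norm(pts - z, axis=1), 1e-12)
    z = (pts / d[:, None]).sum(axis=0) / (1.0 / d).sum()

print(z)  # close to (-1, sqrt(1/2)), the modal parameter values
```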
In order to assess robustness, we introduce contamination by replacing a fraction α of the data with
outliers. Specifically, ⌊αn⌋ samples are replaced with Gaussians whose parameters are drawn from µi ∼
N (5, 1) and σi2 ∼ 5 · Beta(5, 5). We compute both the Fréchet mean, which corresponds to the Wasserstein
barycenter in the literature of optimal transport, and the geometric median of the parameterized points.
The former is available in closed form, while the latter is computed using the Weiszfeld algorithm.
Figure 1: Visualization of the univariate Gaussian example. (A) Model densities of the signal and noise are
presented. (B) A representative set of realized signal and noise distributions is shown. (C) Estimation error
as a function of contamination rate is given for both Fréchet mean and geometric median.
Figure 1 illustrates the results. As the contamination rate increases, the discrepancy between the Fréchet
mean and the model signal grows linearly. In contrast, the geometric median remains stable until the
contamination level approaches 50%, at which point a sharp transition in performance is observed, reflecting
the aforementioned theoretical breakdown point. Beyond this threshold, the geometric median shifts toward
the model noise, indicating that the notion of “signal” itself may become ill-posed when it constitutes a
minority of the data.
We now extend the experiment to multivariate Gaussian distributions Nd (µ, Σ) characterized by mean
vector µ ∈ Rd and covariance matrix Σ ∈ Rd×d , where Σ is symmetric and positive-definite. For P1 =
Nd (µ1 , Σ1 ) and P2 = Nd (µ2 , Σ2 ), the 2-Wasserstein distance is
dW(P_1, P_2)² = ∥µ_1 − µ_2∥² + tr( Σ_1 + Σ_2 − 2(Σ_2^{1/2} Σ_1 Σ_2^{1/2})^{1/2} ),    (6)

where Σ^{1/2} denotes the matrix square root, i.e., Σ^{1/2} · Σ^{1/2} = Σ. While this expression simplifies to Equation (5)
when d = 1, the space of multivariate Gaussian distributions is different from the univariate case in several
key senses. First, while the 2-Wasserstein distance in one dimension decomposes cleanly into additive con-
tributions from mean and variance differences, the multivariate case involves intricate interactions between
covariance matrices, requiring matrix square roots and trace terms that reflect both scale and orientation.
Second, the geometry of the space becomes non-Euclidean as the set of multivariate Gaussians endowed
with the 2-Wasserstein metric forms a Riemannian manifold, where geodesics and interpolation paths are
curved and shape-aware, unlike the linear interpolation in R. Finally, optimal transport maps in higher
dimensions are no longer monotonic functions but instead involve linear transformations that align mass in
both direction and spread, introducing substantial geometric and computational complexity.
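Equation (6) is nonetheless straightforward to evaluate numerically. A small sketch using SciPy's matrix square root is given below; the function name is our own, and the real part is taken only to discard negligible imaginary round-off that sqrtm can introduce.

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein_sq(mu1, S1, mu2, S2):
    """Squared 2-Wasserstein distance between N(mu1, S1) and N(mu2, S2)."""
    root = sqrtm(S2)                            # Sigma_2^{1/2}
    cross = np.real(sqrtm(root @ S1 @ root))    # (S2^1/2 S1 S2^1/2)^1/2
    mean_part = np.sum((np.asarray(mu1) - np.asarray(mu2)) ** 2)
    return mean_part + np.trace(S1 + S2 - 2.0 * cross)
```

For d = 1 this reduces exactly to Equation (5).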
We employ the previous experimental design by simulating n = 1000 replicates of the signal distribution
Nd (µi , Σi ), each of which is a perturbed version of Nd (0, Id ), where Id is the d × d identity matrix. Instead of
directly sampling the parameters, each replicate is generated by drawing a sample of size 2d from Nd (0, Id )
and computing the corresponding maximum likelihood estimates (MLEs), which serve as the parameters
(µi , Σi ) of the simulated signal distributions. To model contamination, we replace ⌊αn⌋ distributions with
noise. The noise distributions are sampled from N_d(10 · 1_d, Σ_AR), where 1_d denotes the d-dimensional vector of ones and Σ_AR is the autoregressive covariance matrix of an AR(1) process, defined as

Σ_AR(i, j) = ρ^{|i−j|},   1 ≤ i, j ≤ d,

for a decay parameter ρ ∈ (0, 1), as described in Bickel and Levina (2008).
each noise distribution is based on a sample of size 2d, and its MLEs provide the parameters of the noise
distributions under the Gaussian model. We consider varying the dimensionality d ∈ {10, 50, 100} and the
decay parameter ρ ∈ {0.1, 0.5, 0.9}, and compute both the Fréchet mean and the geometric median in the
parameter space. In this experiment, we restrict the contamination level to at most 49%, in order to avoid
the regime where the proportion of noise exceeds that of the signal, as discussed in the previous example.
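For concreteness, the data-generating mechanism of this experiment can be sketched as follows; the function names and default arguments are ours, not part of the reported pipeline.

```python
import numpy as np

def ar1_cov(d, rho):
    """AR(1) covariance: Sigma(i, j) = rho^|i-j|."""
    idx = np.arange(d)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate_params(n=1000, d=10, rho=0.5, alpha=0.2, seed=1):
    """MLE parameter pairs (mu_i, Sigma_i); the first floor(alpha*n) are noise."""
    rng = np.random.default_rng(seed)
    m, params = int(np.floor(alpha * n)), []
    for i in range(n):
        if i < m:  # noise replicate: sample from N_d(10 * 1_d, Sigma_AR)
            X = rng.multivariate_normal(10.0 * np.ones(d), ar1_cov(d, rho), size=2 * d)
        else:      # signal replicate: sample from N_d(0, I_d)
            X = rng.multivariate_normal(np.zeros(d), np.eye(d), size=2 * d)
        mu_hat = X.mean(axis=0)                       # Gaussian MLEs from 2d draws
        Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / (2 * d)
        params.append((mu_hat, Sigma_hat))
    return params
```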
Figure 2 reports the estimation error as a function of contamination. Across all configurations, the
geometric median exhibits markedly better resilience than the Fréchet mean. While the mean degrades
steadily under increasing contamination, the median remains robust. The only discernible trend is that the
discrepancy grows with dimension, primarily due to the increased contribution of the mean vectors to the
overall Wasserstein distance. These results reinforce the utility of the geometric median for robust estimation
on product manifolds, particularly in high-dimensional and contaminated settings.
Figure 2: Comparison of robustness to contamination between the Fréchet mean and the geometric median of contaminated sets of multivariate Gaussian distributions across different settings of data dimension d ∈ {10, 50, 100} and autocorrelation structure controlled by ρ ∈ {0.1, 0.5, 0.9}.
6 Conclusion
This work provides a systematic treatment of geometric medians on product manifolds, a setting that
naturally arises whenever heterogeneous geometric variables are observed conjointly. We formulated the
median objective in the Riemannian framework and showed that the L1 criterion couples the factor manifolds
in a fundamentally non-separable manner, unlike the Fréchet mean. Building on this, we developed a
general theory establishing existence and uniqueness of the geometric median under curvature and injectivity
conditions. In particular, we proved that uniqueness holds globally in Hadamard products and locally under
explicit radius constraints, even when one or more components have positive curvature. On the robustness
side, we showed that the estimator inherits classical properties: Lipschitz stability under perturbations and
an optimal breakdown point of 50%. These results extend the well-established behavior of the geometric
median in Euclidean and single-manifold settings to the product-manifold regime.
On the algorithmic front, we proposed two practical solvers. The first is a Riemannian subgradient
method, which is globally convergent under mild assumptions and suitable for general settings. The second
is a product-aware Weiszfeld iteration, which achieves local linear convergence when safely away from data
singularities. Both methods are designed to update components independently while respecting the coupling
structure of the objective. This modularity allows them to leverage existing manifold toolkits and scale
effectively to high-dimensional settings. Through examples on the space of univariate and multivariate
Gaussians equipped with the Bures-Wasserstein metric, we demonstrated that the geometric median exhibits
substantial resilience against contamination, outperforming the Fréchet mean in both accuracy and stability
across a range of dimensions and covariance structures.
Despite the breadth of the present analysis, several open directions remain. First, while our theory ex-
tends readily to products of more than two manifolds, empirical behavior in higher-order products such as
tensor bundles in medical imaging has yet to be fully characterized, particularly under curvature-induced
ill-conditioning. Second, our results establish robustness but not yet a complementary statistical theory
including consistency and distributional theory. Future work could develop central limit theorems, boot-
strap procedures, or inference frameworks for geometric medians in product spaces. Third, the algorithms
presented here operate in batch mode. Stochastic or streaming extensions with theoretical guarantees in
non-Euclidean settings would be especially valuable for large-scale applications in domains like neuroimag-
ing, robotics, and climate science. Finally, many real-world datasets reside not on strict product manifolds,
but on richer structures such as fiber bundles or quotient manifolds. Extending robustness principles and
optimization strategies to these more intricate geometries remains an exciting avenue for future research.
References
Absil, P.-A., Mahony, R. and Sepulchre, R. (2008). Optimization algorithms on matrix manifolds, Princeton
University Press, Princeton, NJ.
Afsari, B. (2011). Riemannian Lp center of mass: Existence, uniqueness, and convexity, Proceedings of the American Mathematical Society 139(2): 655–673.
Agueh, M. and Carlier, G. (2011). Barycenters in the Wasserstein Space, SIAM Journal on Mathematical
Analysis 43(2): 904–924.
Ambrosio, L., Gigli, N. and Savaré, G. (2005). Gradient flows: in metric spaces and in the space of probability
measures, Lectures in mathematics ETH Zürich, Birkhäuser, Boston.
Beck, A. and Sabach, S. (2015). Weiszfeld’s Method: Old and New Results, Journal of Optimization Theory
and Applications 164(1): 1–40.
Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices, The Annals of Statistics 36(1): 199–227.
Boyd, S. P. and Vandenberghe, L. (2004). Convex optimization, Cambridge University Press, Cambridge,
UK ; New York.
Bridson, M. R. and Haefliger, A. (1999). Metric spaces of non-positive curvature, number 319 in Grundlehren
der mathematischen Wissenschaften, Springer, Berlin.
Dacorogna, B. (2008). Direct methods in the calculus of variations, Vol. 78 of Applied Mathematical Sciences, 2nd edn, Springer, New York, NY.
Dryden, I. L., Koloydenko, A. and Zhou, D. (2009). Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging, The Annals of Applied Statistics 3(3): 1102–1123.
Fletcher, P. T., Venkatasubramanian, S. and Joshi, S. (2009). The geometric median on Riemannian mani-
folds with application to robust atlas estimation, NeuroImage 45(1): S143–S152.
Fréchet, M. R. (1948). Les éléments aléatoires de nature quelconque dans un espace distancié, Annales de
l’institut Henri Poincaré 10(4): 215–310.
Huber, P. J. (1981). Robust statistics, Wiley series in probability and mathematical statistics, Wiley, New
York.
Lee, J. M. (1997). Riemannian manifolds: an introduction to curvature, number 176 in Graduate texts in
mathematics, Springer, New York.
Lee, J. M. (2012). Introduction to Smooth Manifolds, Vol. 218 of Graduate Texts in Mathematics, Springer
New York, New York, NY.
Lee, J. M. (2018). Introduction to Riemannian Manifolds, number 176 in Graduate Texts in Mathematics, 2nd edn, Springer, Cham.
Mardia, K. V. and Jupp, P. E. (2000). Directional statistics, Wiley series in probability and statistics, J.
Wiley, Chichester ; New York.
Petersen, P. (2006). Riemannian Geometry, Vol. 171 of Graduate Texts in Mathematics, Springer New York.
Rockafellar, R. T. (1997). Convex analysis, Princeton Landmarks in Mathematics and Physics, Princeton University Press, Princeton, NJ.
Selig, J. M. (2005). Lie Groups and Lie Algebras in Robotics, in J. Byrnes (ed.), Computational Noncom-
mutative Algebra and Applications, Vol. 136, Kluwer Academic Publishers, Dordrecht, pp. 101–125.
Takatsu, A. (2011). Wasserstein geometry of Gaussian measures, Osaka Journal of Mathematics 48(4): 1005
– 1026.
Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth, Proceedings of
the National Academy of Sciences 97(4): 1423–1426.
You, K. and Park, H.-J. (2021). Re-visiting Riemannian geometry of symmetric positive definite matrices
for the analysis of functional connectivity, NeuroImage 225: 117464.
You, K., Shung, D. and Giuffrè, M. (2025). On the Wasserstein Median of Probability Measures, Journal of
Computational and Graphical Statistics 34(1): 253–266.
Zhang, H. and Sra, S. (2016). First-order Methods for Geodesically Convex Optimization, in V. Feldman,
A. Rakhlin and O. Shamir (eds), 29th Annual Conference on Learning Theory, Vol. 49 of Proceedings of
Machine Learning Research, PMLR, Columbia University, New York, New York, USA, pp. 1617–1638.
Appendix
Proof of Theorem 3.1
We are interested in showing that the objective function

Fmedian(p, q) = Σ_{i=1}^n w_i dM((p, q), (x_i, y_i)) = Σ_{i=1}^n w_i √(dM(p, x_i)² + dN(q, y_i)²)

attains its minimum on M. Consider any sequence {(p_k, q_k)} that leaves every compact subset of M. Fixing an arbitrary index i, the product distance from (p_k, q_k) to (x_i, y_i) then diverges, which yields

Fmedian(p_k, q_k) ≥ w_i √(dM(p_k, x_i)² + dN(q_k, y_i)²) → ∞.
Hence, Fmedian is coercive, i.e., it diverges along any sequence that escapes compact subsets of M.
The function Fmedian : M → R_+ is continuous and coercive: coercivity means that its sublevel sets are bounded, while continuity ensures they are closed. Since M is a complete Riemannian manifold, it is also a proper metric space in which closed and bounded sets are compact. Therefore, each sublevel set of Fmedian is compact.
is compact. By the direct method of the calculus of variations (Dacorogna, 2008), a continuous, coercive
function defined on a proper metric space always attains its minimum (Ambrosio et al., 2005). Therefore,
Fmedian admits at least one minimizer (p∗ , q ∗ ) ∈ M.
For brevity, denote (|X1 |M , |X2 |N , |Y1 |M , |Y2 |N ) = (a, b, c, d), where each quantity is nonnegative. Divid-
ing both sides of Equation (7) by |X ∧ Y |2M ×N leads to
We now examine the last term of Equation (8), whose denominator is nonnegative. The case of equality,
which corresponds to degenerate configurations where X and Y are collinear, is excluded from our analysis as
degenerate planes do not define sectional curvature. The numerator is nonnegative by the Cauchy-Schwarz
inequality:
where R(X, Y )Y denotes the Riemannian curvature tensor. Since the curvature tensor of a product mani-
fold only acts nontrivially on the vertical or horizontal planes and mixed components contribute zero, the
numerator becomes
Given the upper curvature bounds secM ≤ κM and secN ≤ κN , we can bound the terms on the right-hand
side using the definition of sectional curvature:
Hence, we obtain

sec_{M×N}(X ∧ Y) ≤ (κM |X_1 ∧ Y_1|²_M + κN |X_2 ∧ Y_2|²_N) / |X ∧ Y|²_{M×N}
               ≤ max(κM, κN) · (|X_1 ∧ Y_1|²_M + |X_2 ∧ Y_2|²_N) / |X ∧ Y|²_{M×N}
               ≤ max(κM, κN) · |X ∧ Y|²_{M×N} / |X ∧ Y|²_{M×N} = max(κM, κN),
where the last inequality uses Equation (9). This completes the proof.
and each factor is complete and simply connected. From Proposition 3.2, the product manifold M = M × N
has nonpositive sectional curvature. Furthermore, the product of two simply connected manifolds is also
simply connected. Hence, M is Hadamard.
On a Hadamard manifold, the squared distance function x 7→ d(x, xi )2 is strictly convex for every xi
(Lee, 2018). In consequence, the square root of this mapping, x ↦ d(x, x_i), is convex, and the sum becomes strictly convex when the data points are not collinear (Rockafellar, 1997). Therefore, the objective Fmedian is strictly
convex under mild non-degeneracy conditions, ensuring that it admits a unique minimizer on M.
is strictly convex on the ball B((p_0, q_0), r) once the radius r is smaller than the injectivity radius at each factor and the diameter 2r stays below the comparison threshold π/(2√κ).
The geometric median objective is a convex combination of the maps

(p, q) ↦ √(dM(p, x_i)² + dN(q, y_i)²),

each being the composition of the square root with a strictly convex function. Although the square root is concave on R_+, composing it with a strictly convex function that stays uniformly positive still yields a strictly convex function on convex domains (Boyd and Vandenberghe, 2004). Since the radius r is chosen so that the distance between (p, q) and each (x_i, y_i) is uniformly positive on this domain, the composition remains strictly convex.
Since Fmedian is a finite convex combination of strictly convex functions with positive weights, it is strictly convex on B((p_0, q_0), r), hence admits a unique minimizer within this ball.
Let (p∗ , q ∗ ) and (p′∗ , q ′∗ ) be the unique minimizers of F and F ′ , respectively. By assumption, both lie
in the same geodesic ball B((p0 , q0 ), r) ⊂ M, where F and F ′ are geodesically convex and directionally
differentiable.
Let ξ ∈ ∂F(p∗, q∗) and ξ′ ∈ ∂F′(p′∗, q′∗). By strong monotonicity of the subdifferential mapping in a convex ball, we have

⟨ξ′ − ξ, log_{(p∗,q∗)}(p′∗, q′∗)⟩ ≥ µ · dM((p∗, q∗), (p′∗, q′∗))².
Next, we estimate ∥ξ − ξ′∥ using the triangle inequality:

∥ξ − ξ′∥ ≤ Σ_{i=1}^n w_i |dM((p∗, q∗), (x_i, y_i)) − dM((p∗, q∗), (x′_i, y′_i))|.
Therefore,

∥ξ − ξ′∥ ≤ Σ_{i=1}^n w_i · √((ε_i^M)² + (ε_i^N)²).

Applying Cauchy–Schwarz together with Σ_{i=1}^n w_i = 1,

∥ξ − ξ′∥ ≤ √(Σ_{i=1}^n w_i) · √(Σ_{i=1}^n w_i [(ε_i^M)² + (ε_i^N)²]) = √(Σ_{i=1}^n w_i [(ε_i^M)² + (ε_i^N)²]).

Combining this estimate with the strong monotonicity inequality above yields the claimed perturbation bound.
Suppose first that m < ⌊n/2⌋, write J := [n] \ I for the indices of the clean points and z_j := (x_j, y_j), and fix any compact set K ⊂ M containing all clean points {(x_j, y_j)}_{j∈J}. As z → ∞, the distances d(z, z_j) diverge for all j ∈ J, while the corrupted terms remain bounded or arbitrary.
Since the clean portion dominates the total weight (more than half), F (z) becomes arbitrarily large
outside K, and hence the minimum is attained within a compact subset. Thus, the contaminated median
(p̃, q̃) remains bounded.
Conversely, if m ≥ ⌊n/2⌋, then the adversary controls at least half of the total weight. By placing all
corrupted points arbitrarily far from the clean data, they can dominate the objective and force the median
(p̃, q̃) to escape to infinity. Thus, the breakdown point is 1/2.
Fix arbitrary points (p, q), (p′, q′) ∈ M, let γ(t) = (γ_M(t), γ_N(t)), t ∈ [0, 1], be the geodesic joining them, and set f_i(t) := dM(γ(t), (x_i, y_i)). We will prove that each f_i(t) is convex, which implies that F(γ(t)) is convex as a weighted sum of convex functions.
Since M and N are Hadamard manifolds, the squared distance functions

a_i(t) := dM(γ_M(t), x_i)²,   b_i(t) := dN(γ_N(t), y_i)²

are convex functions in t (Bridson and Haefliger, 1999). Therefore, their sum s_i(t) := a_i(t) + b_i(t) is also convex on [0, 1].
Now, define f_i(t) = √(s_i(t)). Since the square root function ϕ(s) = √s is concave but strictly increasing on (0, ∞), and since the composition ϕ ∘ s_i(t) is convex whenever s_i(t) is convex and positive, we conclude that f_i(t) is convex for all t such that s_i(t) > 0. Note that s_i(t) = 0 only if γ_M(t) = x_i and γ_N(t) = y_i, which occurs only at isolated t unless (p, q) = (x_i, y_i) or (p′, q′) = (x_i, y_i). In such cases, the function f_i(t) is continuous and a pointwise limit of convex functions. Therefore, each f_i(t) is convex on [0, 1], and hence
F(γ(t)) = Σ_{i=1}^n w_i f_i(t)
is convex as a nonnegative weighted sum of convex functions. Since this statement holds for arbitrary
(p, q), (p′ , q ′ ), we conclude that F is geodesically convex on M = M × N .
By Theorem 3.4 and the convexity argument of Lemma 4.1 applied locally, the objective Fmedian is geodesically convex within the ball and admits a unique minimizer in B((p_0, q_0), r).
From the algorithmic derivation, the subgradient (ξ_p, ξ_q) ∈ T_p M × T_q N at any (p, q) ∈ M satisfies

ξ_p := −Σ_{i: p ≠ x_i} w_i · log_p(x_i) / dM((p, q), (x_i, y_i)),   ξ_q := −Σ_{i: q ≠ y_i} w_i · log_q(y_i) / dM((p, q), (x_i, y_i)).
Since the iterates and all data points are assumed to lie in the compact geodesic ball B((p_0, q_0), r), both the numerator and denominator terms in the above expression are bounded. Hence, the norm of the full subgradient vector ξ^{(k)} := (ξ_p^{(k)}, ξ_q^{(k)}) is uniformly bounded; that is, ∥ξ^{(k)}∥ ≤ G < ∞ for all k.
We now apply the standard convergence result for Riemannian subgradient descent on geodesically convex functions (Zhang and Sra, 2016). Since the objective is geodesically convex on a convex geodesic ball, the subgradients are uniformly bounded, and the step size is chosen as η_k = η_0/√(k+1), it follows that

min_{0≤j≤k} F(p^{(j)}, q^{(j)}) − F(p∗, q∗) ≤ (D² + η_0² G² log(k+1)) / (2 η_0 √(k+1)),

where D := dM((p^{(0)}, q^{(0)}), (p∗, q∗)) is the Riemannian distance from the initialization to the unique minimizer (p∗, q∗). This establishes the desired sublinear convergence rate of O(log k/√k).
Write the Weiszfeld increment on the first factor as ∆_p^{(k)} := Σ_{i=1}^n w̃_i^{(k)} log_{p^{(k)}}(x_i), and similarly for ∆_q^{(k)}, where all terms are well-defined given that (p^{(k)}, q^{(k)}) ≠ (x_i, y_i) for all i. The assumption
that the iterates remain in a convex geodesic ball ensures that both the log maps and distance terms vary
smoothly. Therefore, the update map defines a continuous self-map on a compact convex domain. By
standard contraction arguments (Fletcher et al., 2009), the Weiszfeld iteration converges to the unique fixed
point when the iterates are well-separated from singularities.
In addition, suppose all pairwise distances dM ((p(k) , q (k) ), (xi , yi )) are bounded below by a fixed δ >
0. Then the weights in the denominator remain uniformly bounded and the iteration map is a Lipschitz
continuous map with Lipschitz constant strictly less than 1 in a neighborhood of the solution (Vardi and
Zhang, 2000). This guarantees local linear convergence to the minimizer.