Beyond R-Barycenters: An Effective Averaging Method On Stiefel and Grassmann Manifolds
Abstract—In this paper, the issue of averaging data on a manifold is addressed. While the Fréchet mean resulting from Riemannian geometry appears ideal, it is unfortunately not always available and often computationally very expensive. To overcome this, R-barycenters have been proposed and successfully applied to Stiefel and Grassmann manifolds. However, R-barycenters still suffer severe limitations as they rely on iterative algorithms and complicated operators. We propose simpler, yet efficient, barycenters that we call RL-barycenters. We show that, in the setting relevant to most applications, our framework yields astonishingly simple barycenters: arithmetic means projected onto the manifold. We apply this approach to the Stiefel and Grassmann manifolds. On simulated data, our approach is competitive with respect to existing averaging methods, while computationally cheaper.

Index Terms—Means on matrix manifolds; R-barycenters; Riemannian geometry; Stiefel manifold; Grassmann manifold.

Florent Bouchard is with Université Paris Saclay, CNRS, CentraleSupélec, laboratoire des signaux et systèmes. Nils Laurent and Nicolas Le Bihan are with Université Grenoble Alpes, CNRS, Grenoble INP, Gipsa-lab. Salem Said is with Université Grenoble Alpes, CNRS, Grenoble INP, laboratoire Jean Kuntzmann. This work has been partially supported by MIAI @ Grenoble Alpes (ANR-19-P3IA-0003).

I. INTRODUCTION

In statistical signal processing and machine learning, it is often necessary to average data. Indeed, this is for instance leveraged for classification (e.g., nearest centroid classifier [1], [2]), clustering (e.g., K-means [3]), shrinkage (to build the target matrix) [4], [5], batch normalization [6], etc. When data possess a specific structure, e.g., when they belong to a smooth manifold, one should expect their average to possess the same structure and be adapted to the geometry of the manifold. In such a case, the arithmetic mean is not well-suited. Examples of such structured data are covariance matrices, which are symmetric positive definite matrices (see, e.g., [7] for a full review oriented on geometry); orthogonal matrices, which are embedded in the Stiefel manifold [8]–[10]; or subspaces, which correspond to the Grassmann manifold [8]–[13]. While this letter aims to deal with generic smooth manifolds, special attention is given to the Stiefel and Grassmann manifolds. These are especially useful in the context of dimensionality reduction (see e.g., [14] with an application to clustering) or deep learning [15].

To average data on a smooth manifold, Riemannian geometry is often exploited. Riemannian geometry indeed induces geodesics, which generalize the notion of straight lines, and a distance on the manifold; see, e.g., [9], [10]. These in turn lead to the definition of the Fréchet mean, which perfectly fits the geometry of the manifold. While such a Fréchet mean appears ideal, it is unfortunately not always available and often computationally quite expensive. Indeed, the distance is not always known in closed form – e.g., for the Stiefel manifold – or involves complicated operators – such as the matrix logarithm for Grassmann [8], [11]–[13]. Even when available, an iterative algorithm is usually needed to compute the Fréchet mean; see e.g., [7], [16] for SPD matrices or [11], [12] for Grassmann. This algorithm relies on two objects: the Riemannian exponential, which maps tangent vectors onto the manifold following geodesics, and its inverse, the Riemannian logarithm.

To overcome the limitations of the Riemannian Fréchet mean, [17], [18] have proposed simpler averaging methods on manifolds: the so-called R-barycenters. They are defined through a fixed-point equation that mimics the one that characterizes the Riemannian Fréchet mean. The Riemannian exponential is replaced by a simpler tool: a retraction [9], which can simply be a first order approximation of the Riemannian exponential. The Riemannian logarithm is then replaced by the inverse of the chosen retraction. This approach has been successfully applied on the Stiefel and Grassmann manifolds in [17], [18]. While the R-barycenter framework is simpler than Riemannian Fréchet means, it still features major drawbacks. Indeed, an iterative procedure is still needed and one has to combine a retraction with its exact inverse. This second point appears as the most limiting one. Indeed, for all considered retractions in [17], [18], either the retraction or its inverse involves costly and possibly unstable operations.

In this letter, we follow a different path, recalling that the idea behind retractions is to simplify Riemannian exponentials. Rather than choosing the inverse retraction to replace the Riemannian logarithm, we propose to leverage simpler liftings, which map points on the manifold onto tangent spaces, hence approximating the Riemannian logarithm. This yields the so-called RL-barycenters. Choosing the widely used projection-based retraction [19] and the simplest lifting built on the Riemannian projection onto tangent spaces, we find that the resulting RL-barycenter is astonishingly simple. Indeed, it is just the projection onto the manifold of the arithmetic mean of the data. Applied to the Stiefel manifold, we show that the resulting barycenter is in fact a closed form solution of the R-barycenter associated with the orthographic retraction from [17]. We also extend our result to the projection based on the QR decomposition, showing that the resulting projected mean is also an RL-barycenter. In order to apply our approach on the Grassmann manifold, we derive the projection from the ambient space onto the manifold. Numerical experiments are conducted on simulated data. Our projected means perform better than existing R-barycenters on Stiefel. We also do not lose too much accuracy as compared to the Riemannian Fréchet mean on Grassmann. Due to their simplicity and reasonable complexity, on Stiefel and Grassmann manifolds, our proposed projected means appear very advantageous as compared to other existing averaging methods.

To ensure reproducibility, the code is available at https://github.com/flbouchard/projection_barycenter.
II. BACKGROUND

A. Stiefel and Grassmann manifolds

The real Stiefel manifold is the homogeneous space of p×k orthogonal matrices [8]–[10], i.e.,

St_{p,k} = {U ∈ R^{p×k} : U^⊤ U = I_k}.   (1)

The projection map from R^{p×k} onto St_{p,k} according to the Euclidean distance is [20, Theorem 4.1]

P^{St_{p,k}}(X) = argmin_{U ∈ St_{p,k}} ∥X − U∥²₂ = uf(X),   (2)

where uf(·) returns the orthogonal factor of the polar decomposition. The tangent space of St_{p,k} at U is [8]–[10]

T_U St_{p,k} = {ξ ∈ R^{p×k} : U^⊤ ξ + ξ^⊤ U = 0}.   (3)

Since St_{p,k} is a submanifold of the Euclidean space R^{p×k}, it can simply be turned into a Riemannian manifold by endowing it with the Euclidean metric

⟨ξ, η⟩_U = tr(ξ^⊤ η).   (4)

The corresponding orthogonal projection from R^{p×k} onto T_U St_{p,k} is [8]–[10]

P_U^{St_{p,k}}(Z) = Z − U sym(U^⊤ Z),   (5)

where sym(·) returns the symmetrical part of its argument.

The Grassmann manifold is the manifold of k-dimensional subspaces in the Euclidean space R^p [8]–[13]. There exist various ways of representing it. For instance, it can be viewed as a quotient manifold of the Stiefel manifold St_{p,k} with the orthogonal group O_k [8]–[13]. In this article, as in [12], [13], we identify it with the set of orthogonal rank k projectors, i.e.,

Gr_{p,k} = {P ∈ S_p : P² = P, rank(P) = k},   (6)

where S_p denotes the Euclidean space of p × p symmetric matrices. This representation of the Grassmann manifold Gr_{p,k} is linked to the Stiefel manifold St_{p,k} through the projection mapping

π : U ∈ St_{p,k} ↦ U U^⊤ ∈ Gr_{p,k}.   (7)

Even though the formula is quite intuitive and related to principal component analysis, we could not find the projection map from S_p onto Gr_{p,k} identified as (6) in the literature. We thus provide it in Section III, which contains our contributions. The tangent space of the Grassmann manifold identified as (6) at P ∈ Gr_{p,k} is [13]

T_P Gr_{p,k} = {ξ ∈ S_p : P ξ + ξ P = ξ}.   (8)

Since, in this case, Gr_{p,k} is a submanifold of S_p, it can also be turned into a Riemannian manifold by endowing it with the Euclidean metric (4). The corresponding orthogonal projection from S_p onto T_P Gr_{p,k} is [13]

P_P^{Gr_{p,k}}(Z) = 2 sym((I_p − P) Z P).   (9)
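As an illustration of the objects above, here is a minimal NumPy sketch of the projections (2), (5), (7) and (9); the function names are ours and are not taken from the paper's companion code.

```python
import numpy as np

def sym(A):
    """Symmetric part of a square matrix."""
    return 0.5 * (A + A.T)

def proj_stiefel(X):
    """Projection (2) of X onto St_{p,k}: orthogonal factor uf(X) of the polar decomposition."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

def proj_tangent_stiefel(U, Z):
    """Orthogonal projection (5) of Z onto the tangent space T_U St_{p,k}."""
    return Z - U @ sym(U.T @ Z)

def stiefel_to_grassmann(U):
    """Mapping (7): represent the subspace spanned by U by the projector U U^T."""
    return U @ U.T

def proj_tangent_grassmann(P, Z):
    """Orthogonal projection (9) of a symmetric Z onto T_P Gr_{p,k}."""
    p = P.shape[0]
    return 2.0 * sym((np.eye(p) - P) @ Z @ P)
```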
B. Barycenters on matrix manifolds

When aiming to compute a barycenter on a Riemannian matrix manifold M, the ideal solution appears to employ the Riemannian mean. Such a manifold is equipped with a Riemannian metric ⟨·, ·⟩_·, which yields a Riemannian distance δ(·, ·) on M. This distance can be exploited to define the corresponding Riemannian mean (or Fréchet mean). Given samples {M_i}_{i=1}^n in M, their Riemannian mean G ∈ M is the solution to the optimization problem [21]

G = argmin_{G ∈ M} Σ_{i=1}^n δ²(M_i, G).   (10)

It is usually not known in closed form. To compute it, one can employ the Riemannian gradient descent, which yields the following fixed-point algorithm [21], [22]

G^{(t+1)} = exp_{G^{(t)}}( (1/n) Σ_{i=1}^n log_{G^{(t)}}(M_i) ),   (11)

where exp_G : T_G M → M and log_G : M → T_G M are the Riemannian exponential and logarithm at G ∈ M. The Riemannian exponential is defined through the geodesics, which generalize the notion of straight lines to Riemannian manifolds. The Riemannian logarithm is its (local) inverse.

Unfortunately, even though it seems the most natural option, the Riemannian mean is often very complicated to compute in practice. This is because the Riemannian exponential and logarithm operators are computationally expensive in many cases. In fact, they are not always known in closed form (especially the Riemannian logarithm) and, even when they are, their computation usually involves costly operations. For instance, for the Stiefel manifold, the Riemannian exponential involves a matrix exponential [8], [9], [23] while the Riemannian logarithm is not known in closed form and can only be computed with a heavy iterative algorithm [23]–[25].

To overcome the fact that the Riemannian exponential is often too expensive, a simpler tool to map tangent vectors onto the manifold has been designed in the context of optimization: the retraction [9]. A retraction is, at G ∈ M, a mapping R_G : T_G M → M such that R_G(ξ) = G + ξ + o(∥ξ∥). Retractions are (at least) first order approximations of the Riemannian exponential. Notice that on a manifold, there are often several retractions available. Beyond optimization, retractions have been leveraged to design barycenters on manifolds: the so-called R-barycenters [17], [18]. The goal is to propose simpler barycenters than the Riemannian mean while respecting the structure of the manifold. This appears particularly attractive for manifolds whose Riemannian exponential and/or logarithm are not known in closed form, such as the Stiefel manifold. The idea is to mimic (11), replacing the Riemannian exponential and logarithm with a retraction and its inverse [17], [18]. Formally, the resulting fixed-point algorithm is

G^{(t+1)} = R_{G^{(t)}}( (1/n) Σ_{i=1}^n R_{G^{(t)}}^{−1}(M_i) ).   (12)
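The fixed-point schemes (11) and (12) share the same template. A schematic NumPy version is sketched below, with the manifold-specific maps passed as callables (for (11) the Riemannian exponential and logarithm, for (12) a retraction and its inverse); the function name and stopping rule are ours.

```python
import numpy as np

def fixed_point_mean(samples, map_to_manifold, map_to_tangent, n_iter=100, tol=1e-10):
    """Generic averaging iteration G <- map_to_manifold(G, mean_i map_to_tangent(G, M_i)).

    With the Riemannian exponential/logarithm this is (11); with a retraction
    and its inverse retraction this is the R-barycenter iteration (12).
    """
    G = samples[0]  # initialization, e.g., with the first sample
    for _ in range(n_iter):
        xi = sum(map_to_tangent(G, M) for M in samples) / len(samples)
        G_next = map_to_manifold(G, xi)
        if np.linalg.norm(G_next - G) < tol:
            return G_next
        G = G_next
    return G
```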
In practice, this approach has been exploited on the Stiefel manifold with various retractions [17]. The first one is the one based on the projection (2) (polar decomposition), i.e.,

R_U^{uf}(ξ) = P^{St_{p,k}}(U + ξ) = uf(U + ξ).   (13)

The second one is based on the QR decomposition, i.e.,

R_U^{qf}(ξ) = qf(U + ξ),   (14)

where qf(·) returns the orthogonal factor of the QR decomposition. For these two retractions, computing the inverse is not straightforward. In both cases, it involves solving equations not admitting closed form solutions. The third retraction is the so-called orthographic retraction [17]. This has a straightforward inverse, while the retraction itself is implicitly defined and involves solving a Riccati equation. The inverse retraction exploits the orthogonal projection (5) and is given by

R_U^{o,−1}(V) = P_U^{St_{p,k}}(V − U).   (15)

For all above R-barycenters, a simple expression exists either for the retraction R_·(·) or the inverse retraction R_·^{−1}(·), but numerically solving an equation, possibly costly and unstable, is necessary for the other operation. Indeed, as explained in [17], a solution to such an equation is only guaranteed in a neighborhood of U ∈ St_{p,k}. Hence, the resulting procedure (12) appears quite complicated and heavy. Moreover, the motivation behind retractions is to simplify the Riemannian exponential. Exactly taking the inverse retraction, which is complicated, does not seem to follow this philosophy.
Pn
III. P ROJECTION BASED BARYCENTERS d F (G)[ξ] = ⟨PG (G − n1 i=1 M i ), ξ⟩.
This section contains our contribution. Our original idea By identification, it follows that the
is to simplify (12) by dropping the requirement of choosing PRiemannian
n
gradient of
F at G is ∇F (G) = PG (G − n1 i=1 M i ). Moreover, by
the inverse retraction. We rather replace that with a lifting,
which, at G ∈ M, is a mapping LG : M → TG M
definition
Pn of the projected meanPn G, ∇F (G) = 0. Hence,
PG ( n1 i=1 (M i − G)) = n1 i=1 LG (M i ) = 0.
such that LG (M ) = M − G + o(∥M ∥). The resulting
barycenters, named retraction-lifting barycenters, and denoted In particular, this approach can be employed with the Stiefel
RL-barycenters, are defined in Definition 1. manifold Stp,k with the retraction and lifting resulting from
projections (2) and (5). It is interesting to notice that both
Definition 1 (RL-barycenters). Given the retraction R· :
the retraction and lifting were previously considered in the
T· M → M and lifting L· : M → T· M, the so-called RL-
context of R-barycenters. Indeed, the retraction corresponds
barycenter G ∈ M of samples {M i }ni=1 in M, if it exists, is
to the polar retraction (13) while the lifting corresponds to the
solution to the fixed-point equation
! inverse retraction (15) of the orthographic retraction. One of
n
1X the results of the present paper isPthat, as a direct consequence
G = RG LG (M i ) . n
of Proposition 1, G = uf( n1 i=1 M i ) is a closed form
n i=1
solution for the R-barycenter with the orthographic retraction.
1
Pn that the point G ∈ M is solution if it verifies
Notice Hence, in this case, the iterative procedure (12) is no longer
n i=1 LG (M i ) = 0. necessary. To apply our approach on the Grassmann manifold
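Definition 1 translates directly into an iteration analogous to (12), with the lifting in place of the inverse retraction. The sketch below is our own schematic rendering; it also uses the remark above (the mean of the liftings vanishing) as a stopping criterion.

```python
import numpy as np

def rl_barycenter(samples, retraction, lifting, n_iter=100, tol=1e-10):
    """RL-barycenter of Definition 1: fixed point of G = R_G(mean_i L_G(M_i)).

    `retraction(G, xi)` maps a tangent vector xi at G back to the manifold and
    `lifting(G, M)` maps a manifold point M to the tangent space at G.
    """
    G = samples[0]
    for _ in range(n_iter):
        xi = sum(lifting(G, M) for M in samples) / len(samples)
        if np.linalg.norm(xi) < tol:  # mean of the liftings vanishes: G is a solution
            return G
        G = retraction(G, xi)
    return G
```

On St_{p,k}, for instance, the polar retraction and the tangent-space projection sketched earlier can be plugged in as `retraction` and `lifting`.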
In this work, we are particularly interested in the retractions that arise from the projection from the ambient space E to the matrix manifold M [19], defined as P(X) = argmin_{G ∈ M} ∥X − G∥²₂. The corresponding retraction is

R_G(ξ) = P(G + ξ).   (16)

For the lifting, we consider the orthogonal projection mapping on tangent spaces corresponding to the Euclidean metric of E. At G ∈ M, it is denoted P_G : E → T_G M. The lifting is

L_G(M) = P_G(M − G).   (17)

This retraction and lifting appear as the simplest natural choices on M. Interestingly, as shown in Proposition 1, the resulting barycenter admits a simple closed form expression: it is the projection on M of the arithmetic mean of {M_i}_{i=1}^n (which belongs to E).

Proposition 1 (Projection based barycenters). Given the retraction (16) and the lifting (17), the RL-barycenter of {M_i}_{i=1}^n, according to Definition 1, is

G = P( (1/n) Σ_{i=1}^n M_i ).

Proof. By definition, G = argmin_{G ∈ M} ∥(1/n) Σ_{i=1}^n M_i − G∥²₂. Let F(G) = ∥(1/n) Σ_{i=1}^n M_i − G∥²₂. The directional derivative of F at G ∈ M in direction ξ ∈ T_G M is d F(G)[ξ] = ⟨G − (1/n) Σ_{i=1}^n M_i, ξ⟩, where ⟨·, ·⟩ denotes the Euclidean metric on E. Since ξ ∈ T_G M, one has

d F(G)[ξ] = ⟨P_G(G − (1/n) Σ_{i=1}^n M_i), ξ⟩.

By identification, it follows that the Riemannian gradient of F at G is ∇F(G) = P_G(G − (1/n) Σ_{i=1}^n M_i). Moreover, by definition of the projected mean G, ∇F(G) = 0. Hence, P_G( (1/n) Σ_{i=1}^n (M_i − G) ) = (1/n) Σ_{i=1}^n L_G(M_i) = 0.

In particular, this approach can be employed with the Stiefel manifold St_{p,k} with the retraction and lifting resulting from projections (2) and (5). It is interesting to notice that both the retraction and lifting were previously considered in the context of R-barycenters. Indeed, the retraction corresponds to the polar retraction (13) while the lifting corresponds to the inverse retraction (15) of the orthographic retraction. One of the results of the present paper is that, as a direct consequence of Proposition 1, G = uf( (1/n) Σ_{i=1}^n M_i ) is a closed form solution for the R-barycenter with the orthographic retraction. Hence, in this case, the iterative procedure (12) is no longer necessary. To apply our approach on the Grassmann manifold Gr_{p,k}, the projection map from S_p onto Gr_{p,k} is required. It is provided in Proposition 2.

Proposition 2 (Projection on the Grassmann manifold). The projection map from S_p onto Gr_{p,k} according to the Euclidean distance is

P^{Gr_{p,k}}(X) = argmin_{P ∈ Gr_{p,k}} ∥X − P∥²₂ = V_k V_k^⊤,

where V_k is composed of the k eigenvectors corresponding to the k largest eigenvalues of X.
Proof. See Supplementary materials.
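In practice, Propositions 1 and 2 reduce the averaging step to a single projection of the arithmetic mean. A minimal NumPy sketch (function names are ours, not from the paper's repository) is:

```python
import numpy as np

def projected_mean_stiefel(samples):
    """Proposition 1 on St_{p,k}: uf of the arithmetic mean (polar projection (2))."""
    M_bar = np.mean(samples, axis=0)
    U, _, Vt = np.linalg.svd(M_bar, full_matrices=False)
    return U @ Vt

def projected_mean_grassmann(samples, k):
    """Proposition 2 on Gr_{p,k}: projector onto the k leading eigenvectors of the mean."""
    M_bar = np.mean(samples, axis=0)
    _, eigvec = np.linalg.eigh(M_bar)  # eigenvalues in ascending order
    V_k = eigvec[:, -k:]               # k eigenvectors with largest eigenvalues
    return V_k @ V_k.T
```

No fixed-point iteration such as (12) is needed; each estimator costs one SVD or one eigenvalue decomposition of the arithmetic mean.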
We further believe that our results extend to more generic projections, i.e., mappings P̃ : E → M such that P̃²(X) = P̃(X). From [19], we know that, in the case of P : X ∈ E ↦ argmin_{G ∈ M} ∥X − G∥²₂, we have P_G(Z) = d P(G)[Z]. Hence, if we set R_G(ξ) = P̃(G + ξ) and L_G(M) = d P̃(G)[M − G], then the corresponding RL-barycenter is the arithmetic mean projected on M with P̃, i.e., G = P̃( (1/n) Σ_{i=1}^n M_i ). To obtain this, one needs to show that, with G = P̃( (1/n) Σ_{i=1}^n M_i ), we have d P̃(G)[ (1/n) Σ_{i=1}^n M_i − G ] = 0. Intuitively, this seems to be the case but proving it is beyond the scope of the present letter in the general case. In supplementary materials, we show that this actually works on St_{p,k} with the projection based on the QR decomposition, i.e.,

G = qf( (1/n) Σ_{i=1}^n M_i )   (18)

is the RL-barycenter of {M_i}_{i=1}^n with the QR retraction (14) and the lifting L_G(M) = d qf(G)[M − G].

IV. NUMERICAL EXPERIMENTS

On the Stiefel manifold, the proposed projected means are compared to the R-barycenters based on the polar and QR retractions [17]. On the Grassmann manifold, the projected mean is compared to the Riemannian mean; see, e.g., [11], [12]. In every case, iterative algorithms are initialized with the first sample of the dataset to average.

Let us now describe how simulated data are obtained. For the Stiefel manifold, a random center G_{St_{p,k}} is obtained by taking the k first columns of a p × p orthogonal matrix uniformly drawn on O_p. From there, n random samples U_i are generated according to U_i = expm(σ Ω_i) G_{St_{p,k}}, where σ > 0 and Ω_i is obtained by taking the skew-symmetrical part of a p × p matrix whose elements are independently drawn from the centered normal distribution with unit variance. For Grassmann, the random center G_{Gr_{p,k}} as well as the random samples P_i are obtained by projecting G_{St_{p,k}} and U_i on Gr_{p,k} through (7).

To measure the performance on St_{p,k}, we rely on the same similarity measure as in [17], i.e.,

err_{St_{p,k}}(G_{St_{p,k}}, Ĝ) = ∥G_{St_{p,k}}^⊤ Ĝ − I_k∥²₂.   (19)

For Gr_{p,k}, we employ the Riemannian distance [13], yielding

err_{Gr_{p,k}}(G_{Gr_{p,k}}, Ĝ) = ∥ (1/2) logm( (I_p − 2 G_{Gr_{p,k}})(I_p − 2 Ĝ) ) ∥₂.   (20)

[Figure 1: error measure (19), in dB, as a function of n, for σ = 0.3 and σ = 0.5; curves “R polar”, “R QR”, “proj polar”, “proj QR”.]
Fig. 1. Medians (solid lines), 10% and 90% quantiles (filled areas) over 100 realizations of error measure (19) of mean estimators on the Stiefel manifold St_{p,k}. “R polar” and “R QR” correspond to R-barycenters with polar and QR retractions. “proj polar” and “proj QR” correspond to the projected arithmetic means with the projections on St_{p,k} based on the polar and QR decompositions, respectively. In these simulations, p = 10 and k = 5.

[Figure 2: error measure (20) as a function of n; curves “Riemannian mean” and “proj evd”.]
Fig. 2. Medians (solid lines), 10% and 90% quantiles (filled areas) over 100 realizations of error measure (20) of mean estimators on the Grassmann manifold Gr_{p,k}. In the legend, “proj evd” corresponds to the projected arithmetic mean with the projection on Gr_{p,k} based on the eigenvalue decomposition. In these simulations, p = 10, k = 5 and σ = 0.5.

Obtained results are displayed in Figures 1 and 2. Notice that, on St_{p,k}, the results obtained with the R-barycenter associated to the orthographic retraction are not displayed since, as expected, it yields the same results as the projected arithmetic mean with the projection based on the polar decomposition (in all considered cases, the difference is lower than 10⁻¹⁰). We observe that our proposed projected means perform well on both Stiefel and Grassmann manifolds as compared to other considered barycenters on these simulated data. On Stiefel, R-barycenters based on polar and QR retractions do not perform well as the distance of samples to the mean increases, while our proposed projected arithmetic means remain competitive.
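The simulation setup and error measures above can be reproduced with a short script; the sketch below relies on SciPy's matrix exponential and logarithm and uses our own helper names.

```python
import numpy as np
from scipy.linalg import expm, logm

def random_orthogonal(p, rng):
    """p x p orthogonal matrix drawn uniformly on O_p (QR of a Gaussian matrix)."""
    Q, R = np.linalg.qr(rng.standard_normal((p, p)))
    return Q * np.sign(np.diag(R))

def simulate_stiefel(p, k, n, sigma, rng):
    """Center G_St and samples U_i = expm(sigma * Omega_i) G_St, as described above."""
    G = random_orthogonal(p, rng)[:, :k]
    samples = []
    for _ in range(n):
        A = rng.standard_normal((p, p))
        Omega = 0.5 * (A - A.T)        # skew-symmetric part
        samples.append(expm(sigma * Omega) @ G)
    return G, samples

def err_stiefel(G, G_hat):
    """Error measure (19)."""
    return np.linalg.norm(G.T @ G_hat - np.eye(G.shape[1])) ** 2

def err_grassmann(P, P_hat):
    """Error measure (20)."""
    p = P.shape[0]
    L = logm((np.eye(p) - 2 * P) @ (np.eye(p) - 2 * P_hat))
    return 0.5 * np.linalg.norm(L)
```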
SUPPLEMENTARY MATERIALS: PROOF THAT (18) IS AN RL-BARYCENTER

The differential of qf at B = Q R in direction dB, where Q_⊥ denotes an orthonormal basis of the orthogonal complement of the range of Q, is

d qf(B)[dB] = Q_⊥ Q_⊥^⊤ dB + Q( tril(Q^⊤ dB R^{−1}) − tril(Q^⊤ dB R^{−1})^⊤ ).

We are interested in d qf(G)[A − G], where A = (1/n) Σ_{i=1}^n M_i and G = qf(A). By construction, G ∈ St_{p,k} and A = G R. Hence,

d qf(G)[A − G] = G_⊥ G_⊥^⊤ (G R − G) + G( tril(G^⊤ (G R − G)) − tril(G^⊤ (G R − G))^⊤ ).

By definition, we have G_⊥^⊤ G = 0. Moreover, tril(G^⊤ (G R − G)) = tril(R − I_k) = 0. It is enough to conclude that G = qf( (1/n) Σ_{i=1}^n M_i ) is indeed an RL-barycenter on St_{p,k} with the retraction based on the QR decomposition defined in (14) and with the lifting L_G(M) = d qf(G)[M − G].
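As a sanity check of this conclusion (our own verification, not part of the paper), one can confirm numerically that the directional derivative of qf at G = qf of the arithmetic mean, taken in the direction of the arithmetic mean minus G, vanishes:

```python
import numpy as np

def qf(X):
    """Orthogonal factor of the QR decomposition, with positive diagonal of R."""
    Q, R = np.linalg.qr(X)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

rng = np.random.default_rng(0)
p, k, n = 10, 5, 50
samples = [qf(rng.standard_normal((p, k))) for _ in range(n)]  # points on St_{p,k}

M_bar = np.mean(samples, axis=0)   # arithmetic mean in the ambient space
G = qf(M_bar)                      # candidate RL-barycenter (18)

# Central finite-difference estimate of d qf(G)[M_bar - G]; it should be
# numerically zero, i.e., the stationarity condition of Definition 1 holds.
h = 1e-6
D = (qf(G + h * (M_bar - G)) - qf(G - h * (M_bar - G))) / (2 * h)
print(np.linalg.norm(D))           # close to zero (only rounding error remains)
```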