International Journal of Computer Vision (2024) 132:3484–3508

https://doi.org/10.1007/s11263-024-02043-5

Hyperbolic Deep Learning in Computer Vision: A Survey


Pascal Mettes¹ · Mina Ghadimi Atigh¹ · Martin Keller-Ressel² · Jeffrey Gu³ · Serena Yeung³

Received: 10 May 2023 / Accepted: 27 February 2024 / Published online: 26 March 2024
© The Author(s) 2024

Abstract
Deep representation learning is a ubiquitous part of modern computer vision. While Euclidean space has been the de facto
standard manifold for learning visual representations, hyperbolic space has recently gained rapid traction for learning in
computer vision. Specifically, hyperbolic learning has shown a strong potential to embed hierarchical structures, learn from
limited samples, quantify uncertainty, add robustness, limit error severity, and more. In this paper, we provide a categorization
and in-depth overview of current literature on hyperbolic learning for computer vision. We research both supervised and
unsupervised literature and identify three main research themes in each direction. We outline how hyperbolic learning is
performed in all themes and discuss the main research problems that benefit from current advances in hyperbolic learning for
computer vision. Moreover, we provide a high-level intuition behind hyperbolic geometry and outline open research questions
to further advance research in this direction.

Keywords Hyperbolic deep learning · Computer vision · Representation learning

Communicated by Jean-Michel Morel.

¹ University of Amsterdam, Amsterdam, The Netherlands
² TU Dresden, Dresden, Germany
³ Stanford University, Stanford, CA, USA

1 Introduction

From image segmentation to future frame prediction and from video grounding to generating images, deep representation learning is the central component that drives modern computer vision problems (LeCun et al., 2015). In short succession, many differentiable layers and network architectures have been proposed to tackle visual research problems (Gu et al., 2018; Bommasani et al., 2021; Khan et al., 2022). While different in structure, scope, and inductive biases, all are based on Euclidean operators and therefore, implicitly or explicitly, assume that data is best represented on regular grids.

Euclidean space forms an intuitive and grounded underlying manifold, but its inherent properties are not the best match for all types of data. Consider for example hierarchical structures such as trees, ontologies, and taxonomies. Hierarchies are foundational building blocks across all scientific disciplines to formalize our knowledge (Noy & Hafner, 1997). In hierarchies, the number of nodes grows exponentially with depth, from few coarse-grained to many fine-grained nodes. The volume of a ball in Euclidean space, however, grows only polynomially with its diameter. An alternative geometry is needed to match the nature of hierarchies.

In the quest for a more appropriate geometry of hierarchies, hyperbolic geometry provides a direct fit (Bridson & Haefliger, 2013). In essence, hyperbolic and Euclidean geometry are different in only one aspect: the parallel postulate. In Euclidean space, given a line and a point not on it, there is exactly one line through that point that is parallel to the given line. In hyperbolic space, there are at least two such parallel lines. This change comes with many consequences and as a result, hyperbolic geometry can be seen as a geometry of constant negative curvature. In the context of deep learning this geometry has many attractive properties, such as its hierarchical structure and exponential expansion.
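The mismatch between exponential node growth and polynomial volume growth can be made concrete with a small numeric sketch. It is not taken from the survey itself: it assumes a complete tree with branching factor at least 2, the Euclidean disc area πr², and the standard area formula 2π(cosh r − 1) for a disc of radius r in the hyperbolic plane of curvature −1. All function names are illustrative.

```python
import math


def tree_nodes(depth: int, branching: int = 2) -> int:
    """Number of nodes in a complete tree of the given depth (root at depth 0).

    Assumes branching >= 2 (geometric series formula).
    """
    return (branching ** (depth + 1) - 1) // (branching - 1)


def euclidean_disc_area(r: float) -> float:
    """Area of a disc of radius r in the Euclidean plane (grows like r^2)."""
    return math.pi * r ** 2


def hyperbolic_disc_area(r: float) -> float:
    """Area of a disc of radius r in the hyperbolic plane of curvature -1.

    Grows like pi * exp(r) for large r.
    """
    return 2.0 * math.pi * (math.cosh(r) - 1.0)


if __name__ == "__main__":
    # Compare tree size at depth r with the area available within radius r.
    for r in (1, 5, 10, 20):
        print(r, tree_nodes(r), euclidean_disc_area(r), hyperbolic_disc_area(r))
```

Under these assumptions the number of nodes of a binary tree quickly overtakes the Euclidean disc area at matching depth and radius, whereas the hyperbolic disc area keeps pace with it.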


Empowered by these geometric properties, hierarchical embeddings have in recent years been performed in hyperbolic space with great success (Nickel & Kiela, 2017), leading to unparalleled abilities to embed deep and complex trees with minimal distortion (Ganea et al., 2018a; Sala et al., 2018; Sonthalia & Gilbert, 2020; Verbeek & Suri, 2014; Chami et al., 2020a). This has led to rapid advances in hyperbolic deep learning across many disciplines and research areas, including but not limited to graph networks (Chami et al., 2019; Liu et al., 2019; Dai et al., 2021; Sarkar, 2011; Sun et al., 2021a; Yang et al., 2022a; Wang et al., 2023b), text embeddings (Tifrea et al., 2019; Zhu et al., 2020; Dhingra et al., 2018; Dai et al., 2020), molecular representation learning (Klimovskaia et al., 2020; Yu et al., 2020; Wu et al., 2021; Qu & Zou, 2022b), and recommender systems (Mirvakhabova et al., 2020; Wang et al., 2021; Yang et al., 2022b; Li et al., 2022; Vinh Tran et al., 2020; Chamberlain et al., 2019; Vinh et al., 2018).

In the wake of other disciplines, computer vision has in recent years also benefited from research into deep learning in hyperbolic space. A quickly growing body of literature has shown that hyperbolic embeddings benefit few-shot learning (Fang et al., 2021; Khrulkov et al., 2020; Gao et al., 2021; Guo et al., 2022), zero-shot recognition (Long et al., 2020; Liu et al., 2020; Ghadimi Atigh et al., 2021; Hong et al., 2023b), out-of-distribution generalization (Khrulkov et al., 2020; Hong et al., 2023a; Guo et al., 2022), uncertainty quantification (Khrulkov et al., 2020; Ghadimi Atigh et al., 2022; Chen et al., 2022), generative learning (Kingma & Welling, 2013; Rezende et al., 2014; Lazcano et al., 2021; Heusel et al., 2017), and hierarchical representation learning (Dhall et al., 2020; Long et al., 2020; Gulshad et al., 2023; Liu et al., 2020; Ghadimi Atigh et al., 2022) amongst others. These works show evidence that hyperbolic geometry has a lot of potential for learning in computer vision.

This survey provides an in-depth overview and categorization of the recent boom in hyperbolic computer vision literature. These works have investigated hyperbolic learning across many visual research problems with different solutions. As a result, it is unclear how current literature is connected, what is common and new in each work, and in which direction the field is heading. This survey seeks to fill this void. We investigate both supervised and unsupervised papers. For supervised learning, we identify three shared themes amongst current papers, where samples are matched to either gyroplanes, prototypes, or other samples in hyperbolic space. For unsupervised papers, we dive into the three main axes explored in current papers, namely generative learning, clustering, and self-supervised learning. Peng et al. (2021) have recently written a general survey on hyperbolic neural networks but their main focus is not on advances in computer vision. Fang et al. (2023b) have made a concurrent overview of hyperbolic learning in the context of computer vision. Our survey extends the survey of Fang et al. (2023b) by providing a grouping of the advances in supervised and unsupervised hyperbolic learning, delivering an in-depth overview of hyperbolic geometry with its most important functionalities for deep learning, and discussing emerging advances such as fully hyperbolic learning.

The rest of the paper is organised as follows. In Sect. 2 we provide the background on hyperbolic geometry and foundational papers on hyperbolic embeddings and hyperbolic neural networks. Sections 3 and 4 provide an overview of supervised and unsupervised hyperbolic visual learning literature. Lastly in Sect. 5 we outline advantages and improvements reported in current papers, as well as open challenges for the field.

2 Background on Hyperbolic Geometry

2.1 What is Hyperbolic Geometry?

Hyperbolic geometry was initially developed in the 19th century by Gauss, Lobachevsky, Bolyai and others as a concrete example of a non-Euclidean geometry (Cannon et al., 1997). Soon after it found important applications in physics, as the mathematical basis of Einstein’s special theory of relativity. It can be characterized as the geometry of constant negative curvature, differentiating it from the flat geometry of Euclidean space and the positively curved geometry of spheres and hyperspheres. From the point of view of representation learning, its attractive properties are its exponential expansion and its hierarchical, tree-like structure. Exponential expansion means that the volume of a ball in hyperbolic space grows exponentially with its diameter, in contrast to Euclidean space, where the rate of growth is polynomial. The ‘tree-likeness’ of a metric space can be quantified by Gromov’s hyperbolicity (Bridson & Haefliger, 2013), which is zero for tree graphs, finite (but non-zero) for hyperbolic space, and infinite for Euclidean space.

2.2 Models of Hyperbolic Geometry

Several different, but isometric, models of hyperbolic geometry exist (Cannon et al., 1997). They differ in their coordinate representations of points and in their expressions for distances, geodesics, and other quantities. Although they can be isometrically mapped to each other, certain models may be preferred for a given task, for reasons of numerical efficiency, ease of visualization, or simplified calculations. The most commonly used models are the Poincaré model, the hyperboloid (or ‘Lorentz’) model, the Klein model, and the upper half-space model.


Fig. 1 Circle Limit I (1958). This artwork by M. C. Escher is based on the Poincaré disc model of hyperbolic geometry

Fig. 2 Hyperboloid and Poincaré disc model. This figure shows the relationship between the hyperboloid model and the Poincaré model of hyperbolic geometry. In each model, two points (red) and their connecting geodesic arc (blue) are shown, as well as the tangent plane (light blue) at one of the points in the hyperboloid model

• The Poincaré model represents $d$-dimensional hyperbolic space by the unit ball

$$\mathbb{D}^d = \{ p \in \mathbb{R}^d : p_1^2 + \cdots + p_d^2 < 1 \}$$

which, in the frequently considered case $d = 2$, becomes the unit disc. Geodesics (‘shortest paths’) are arcs of Euclidean circles (or lines), meeting the boundary of $\mathbb{D}^d$ at a right angle. While distances, area and volume are distorted in comparison to their Euclidean counterparts, the model is conformal, i.e., hyperbolic angles are measured as in Euclidean geometry. In its two-dimensional form as Poincaré disc, the model is popular for visualizations; it is also the geometric basis of the art works Circle Limits I-IV of M. C. Escher; see Fig. 1.

• The hyperboloid model uses a single sheet of the two-sheeted hyperboloid

$$\mathbb{H}^d = \{ p \in \mathbb{R}^{d+1} : p_0^2 - (p_1^2 + \cdots + p_d^2) = 1,\ p_0 > 0 \}$$

as a model of $d$-dimensional hyperbolic geometry. Contrary to the other models, its ambient space $\mathbb{R}^{d+1}$ adds one dimension to the modeled space. Many formulas involving the hyperboloid model can be written in concise form by introducing the Lorentz product $p \circ q = p_0 q_0 - (p_1 q_1 + \cdots + p_d q_d)$. An advantage of the hyperboloid model is that it retains some linear structure; translations and other isometries, for example, can be represented by linear maps. Expressions for distances and geodesics are simpler compared to other models. Notably, the Poincaré model can be derived as a projection (‘stereographic projection’) of the hyperboloid model to the unit ball (Cannon et al., 1997; Ratcliffe, 1994). Fig. 2 shows how the hyperboloid model and the Poincaré ball model are related.

• The Klein model $\mathbb{K}^d$ also uses the unit ball to represent hyperbolic space. In contrast to the Poincaré model, it is not conformal; its geodesics, however, are Euclidean (‘straight’) lines, which can be beneficial from a computational point of view, e.g., when computing barycenters.

• Lastly, the upper half-space model represents $d$-dimensional hyperbolic space by the set $\mathbb{U}^d = \{ p \in \mathbb{R}^d : p_d > 0 \}$. It is a conformal model and shares many properties with the Poincaré model; geodesics, for example, are also arcs of Euclidean circles (or lines), meeting the boundary of $\mathbb{U}^d$ at a right angle.

2.3 Five core Hyperbolic Operations

Within the context of deep learning and computer vision, we find that five core operations form the basic building blocks of the vast majority of algorithms that use hyperbolic geometry for learning. The ability to work with these five operations will cover most of the existing literature:

1. Measuring the distance of two points $p$ and $q$;
2. Finding the geodesic arc (the distance-minimizing curve) from $p$ to $q$;
3. Forming a geodesic, by extending a geodesic arc as far as possible;


4. Using the exponential map, to determine the result of following a geodesic in direction $u$, at speed $r$, starting at a point $p$;
5. Moving a cloud of points, while preserving all their pairwise hyperbolic distances, by applying a hyperbolic translation.

The distance of two points is given, in the Poincaré and the hyperboloid model respectively, by

$$d_{\mathbb{D}}(p, q) = \frac{1}{\sqrt{\kappa}} \operatorname{arcosh}\left(1 + \frac{2|p - q|^2}{(1 - |p|^2)(1 - |q|^2)}\right), \tag{1}$$

$$d_{\mathbb{H}}(p, q) = \frac{1}{\sqrt{\kappa}} \operatorname{arcosh}(p \circ q). \tag{2}$$

In the less frequently used Klein and upper half-space models, distances are given by

$$d_{\mathbb{K}}(p, q) = \frac{1}{\sqrt{\kappa}} \operatorname{arcosh}\left(\frac{1 - p^\top q}{\sqrt{1 - |p|^2}\,\sqrt{1 - |q|^2}}\right), \tag{3}$$

$$d_{\mathbb{U}}(p, q) = \frac{1}{\sqrt{\kappa}} \operatorname{arcosh}\left(1 + \frac{|p - q|^2}{2 p_d q_d}\right), \tag{4}$$

see Ratcliffe (1994, §6.1). The scaling factor of distances is controlled by the curvature parameter $\kappa \in (0, \infty)$, which is often standardized to $\kappa = 1$. The sectional curvature (in the sense of differential geometry) of hyperbolic space is constant, negative and equal to $-\kappa$. Given the distance function, it makes sense to speak of geodesics and geodesic arcs, that is, (locally) distance-minimizing curves, either extending infinitely or connecting two points. In the hyperboloid model for example, each geodesic is the intersection of $\mathbb{H}^d$ with a Euclidean hyperplane in the ambient space $\mathbb{R}^{d+1}$. The geodesic at a point $p \in \mathbb{H}^d$ in direction $u$ can be written as

$$\lambda_{\mathbb{H}}(t) = \cosh(t\sqrt{\kappa})\,p + \sinh(t\sqrt{\kappa})\,u, \quad t \in \mathbb{R}, \tag{5}$$

where $u$ is an element of the tangent space $T_p = \{u \in \mathbb{R}^{d+1} : p \circ u = 0\}$, normalized to $u \circ u = -1$. In the Poincaré model, the geodesics are precisely the segments of Euclidean circles and lines that meet the boundary of $\mathbb{D}^d$ at a right angle. A convenient formula for the geodesic arc between two points $p, q \in \mathbb{D}^d$ can be given in terms of gyrovectorspace calculus, see (8).

The value of the exponential map $\exp_p(tu)$ is the result of following a geodesic in a normalized direction $u$ at a speed $t > 0$, after starting at a given point $p$ in hyperbolic space. Identifying $\mathbb{R}^d$ with the tangent space $T_p$ at $p$, the exponential mapping provides a convenient way to embed $\mathbb{R}^d$ into hyperbolic space with origin at $p$. The exponential map is the most often used function in hyperbolic learning for computer vision, as it allows us to map visual representations from Euclidean to hyperbolic space. In the hyperboloid model, the exponential mapping coincides with the expression of the geodesic given in (5). In the Poincaré model the exponential map can be conveniently written in terms of gyrovectorspace addition and is given in (9). In practice, the exponential and logarithmic mapping functions are tools in vision for mapping representations from Euclidean to hyperbolic space or vice versa. This is common for example when using hyperbolic embeddings on top of standard encoders or when using pre-trained networks.

Finally, the hyperbolic translation $\tau_p$, also called Lorentz boost, Möbius transformation, or gyrovectorspace addition, is the unique distance-preserving transformation of hyperbolic space, which moves $0$ to a given point $p$. Concatenations of logarithmic maps, parallel transport in the tangent space and exponential maps, as used for example in Ganea et al. (2018a), can be expressed in terms of hyperbolic translations, or equivalently in terms of gyrovectorspace addition; see Eq. (26) in Ganea et al. (2018a). In the hyperboloid model, the hyperbolic translation can be represented by the linear map

$$\tau_p(q) = L_p \cdot q, \quad \text{where} \tag{6}$$

$$L_p = \begin{pmatrix} p_0 & \bar{p}^\top \\ \bar{p} & \sqrt{I_d + \bar{p}\bar{p}^\top} \end{pmatrix} \quad \text{with } \bar{p} = (p_1, \ldots, p_d). \tag{7}$$

In the Poincaré model hyperbolic translations are also known as gyrovectorspace addition and form the basic operation of gyrovectorspace calculus. For the equivalence of gyrovectorspace addition and hyperbolic translations, one can compare Eq. (4) in Ganea et al. (2018a) and Eq. (4.5.5) in Ratcliffe (1994). For the equivalence of hyperbolic translations and Lorentz boosts see e.g., Sec. 2.2 in Chen et al. (2021).

2.4 Gyrovectorspace Calculus

Gyrovectorspace calculus, as introduced by Ungar (2005, 2012), provides a convenient and rapidly adopted framework for calculations in the Poincaré ball model. Its first basic operation is the (non-commutative) gyrovectorspace addition

$$p \oplus q = \frac{(1 - |p|^2)\,q + (1 + 2 p^\top q + |q|^2)\,p}{1 + 2 p^\top q + |p|^2 |q|^2}.$$

As a secondary operation, the (commutative) gyrovectorspace scalar product

$$t \otimes p = p \otimes t = \tanh\big(t \operatorname{artanh}(|p|)\big)\,\frac{p}{|p|}$$

with a scalar $t \in \mathbb{R}$ is introduced. Hyperbolic translations are directly given by $\tau_p(q) = p \oplus q$ and the geodesic arc connecting $p$ and $q$ is given by

$$\lambda_{\mathbb{D}}(t) = p \oplus \big(((-p) \oplus q) \otimes t\big), \quad t \in [0, 1]. \tag{8}$$

Letting $t$ range through all of $\mathbb{R}$, a full geodesic line is obtained.

In the context of gyrovectorspace calculus, the Poincaré ball is often rescaled with the square root of curvature, setting

$$\mathbb{D}^d_\kappa = \{ p \in \mathbb{R}^d : p_1^2 + \cdots + p_d^2 < 1/\kappa \}.$$

The advantage of this rescaling is that Euclidean space is obtained as a continuous limit as $\kappa \to 0$. In the rescaled model, gyrovectorspace addition and scalar product become

$$p \oplus_\kappa q = \frac{1}{\sqrt{\kappa}} \big( (\sqrt{\kappa}\,p) \oplus (\sqrt{\kappa}\,q) \big)$$

and

$$t \otimes_\kappa p = \frac{1}{\sqrt{\kappa}} \big( t \otimes (\sqrt{\kappa}\,p) \big)$$

for $p, q \in \mathbb{D}^d_\kappa$. The exponential map in the direction of a tangent vector $v \in T_p$ can then be written as

$$\exp^\kappa_p(v) = p \oplus_\kappa \left( \tanh\left(\frac{\sqrt{\kappa}\,|v|}{1 - \kappa |p|^2}\right) \frac{v}{\sqrt{\kappa}\,|v|} \right) \tag{9}$$

for $p \in \mathbb{D}^d_\kappa$, see Ganea et al. (2018b).

2.5 Non-visual Hyperbolic Learning

The traction of hyperbolic learning in computer vision is built upon advances in embedding hierarchical structures, designing hyperbolic network layers, and hyperbolic learning on other data types such as graphs, text, and more. Below, we discuss these works and their relevance for hyperbolic visual learning literature.

Hyperbolic embedding of hierarchies. Embedding hierarchical structures like trees and taxonomies in Euclidean space suffers from large distortion (Bachmann et al., 2020) and polynomial volume expansion, limiting its capacity to capture the exponential complexity of hierarchies. However, hyperbolic space can be thought of as a continuous version of trees (Nickel & Kiela, 2017) and has tree-like properties (Hamann, 2018; Ungar, 2008), like the exponential growth of distances when moving from the origin towards the boundary. Encouraged by this, Nickel & Kiela (2017) propose to embed hierarchical structures on the Poincaré model. The goal is to learn hyperbolic representations for the nodes of a hierarchy, such that the distance in the embedding space has an inverse relation with semantic similarity. Let $\mathcal{D} = \{(u, v)\}$ denote the set of node pairs connected in a hierarchy. To embed the nodes in the Poincaré model, Nickel & Kiela (2017) minimize the following loss function:

$$\mathcal{L}(\Theta) = \sum_{(u,v) \in \mathcal{D}} \log \frac{e^{-d(u,v)}}{\sum_{v' \in \mathcal{N}(u)} e^{-d(u,v')}}, \tag{10}$$

where $\mathcal{N}(u) = \{v' \mid (u, v') \notin \mathcal{D}\} \cup \{v\}$ denotes the set of the nodes not related to $u$, including $v$, as negative examples. The loss function pushes unrelated nodes farther apart than the related ones. To evaluate the embedded hierarchy, the distances between pairs of connected nodes $(u, v)$ are calculated and ranked among the negative pairs of nodes (i.e., the pairs not in $\mathcal{D}$), and the mean average precision (MAP) is calculated based on the ranking. Later, Sala et al. (2018) propose a combinatorial construction to embed the trees in hyperbolic space without optimization and with low distortion, relieving the optimization problems in existing works. Ganea et al. (2018a) address drawbacks of Nickel & Kiela (2017), including the collapse of the points on the boundary of the space as a result of the loss function and the incapability of encoding asymmetric relations. They introduce entailment cones to embed hierarchies, using a max-margin loss function:

$$\mathcal{L} = \sum_{(u,v) \in P} E(u, v) + \sum_{(u',v') \in N} \max(0, \gamma - E(u', v')), \tag{11}$$

where $\gamma$, $P$, and $N$ indicate the margin and the positive and negative edges, respectively. $E(u, v)$ is a penalty term that forces child nodes to fall under the cone of the parent node. Amongst others, hyperbolic embeddings have been proposed for multi-relational graphs (Balazevic et al., 2019), low-dimensional knowledge graphs (Chami et al., 2020b), and learning continuous hierarchies in the Lorentz model given pairwise similarity measurements (Nickel & Kiela, 2018). Nickel & Kiela (2018) propose to learn embeddings $\Theta = \{u_i\}_{i=1}^m$ in the Lorentz model by optimizing

$$\max_\Theta \sum_{i,j} \log \Pr(\phi(i, j) = j \mid \Theta), \tag{12}$$

where, given $\mathcal{N}(i, j)$ as the set of concepts to embed,

$$\phi(i, j) = \operatorname*{arg\,min}_{z \in \mathcal{N}(i,j)} d(u_i, u_z),$$

$$\Pr(\phi(i, j) = j \mid \Theta) = \frac{e^{-d(u_i, u_j)}}{\sum_{z \in \mathcal{N}(i,j)} e^{-d(u_i, u_z)}}.$$

Hyperbolic neural networks. Foundational in the transition of deep learning towards hyperbolic space is the development of hyperbolic network layers and their optimization.
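Before turning to network layers, the Poincaré-ball operations of Sects. 2.3 and 2.4 and the embedding objective of Eq. (10) can be sketched in a few lines of plain Python. This is a minimal illustration rather than any paper's implementation: it assumes curvature parameter κ = 1, points given as plain lists of floats strictly inside the unit ball, and hypothetical function names. For the objective it returns the negative logarithm of the softmax term for a single related pair, a sign convention chosen here so that smaller values are better.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def dot(p: Vec, q: Vec) -> float:
    return sum(a * b for a, b in zip(p, q))


def norm(p: Vec) -> float:
    return math.sqrt(dot(p, p))


def mobius_add(p: Vec, q: Vec) -> List[float]:
    """Gyrovectorspace addition p (+) q in the unit Poincare ball (kappa = 1)."""
    pq, pp, qq = dot(p, q), dot(p, p), dot(q, q)
    denom = 1.0 + 2.0 * pq + pp * qq
    return [((1.0 + 2.0 * pq + qq) * a + (1.0 - pp) * b) / denom
            for a, b in zip(p, q)]


def mobius_scalar(t: float, p: Vec) -> List[float]:
    """Gyrovectorspace scalar product t (x) p; the origin is mapped to itself."""
    n = norm(p)
    if n == 0.0:
        return [0.0 for _ in p]
    scale = math.tanh(t * math.atanh(n)) / n
    return [scale * a for a in p]


def poincare_distance(p: Vec, q: Vec) -> float:
    """Poincare distance of Eq. (1) with kappa = 1."""
    diff = [a - b for a, b in zip(p, q)]
    x = 1.0 + 2.0 * dot(diff, diff) / ((1.0 - dot(p, p)) * (1.0 - dot(q, q)))
    return math.acosh(x)


def geodesic(p: Vec, q: Vec, t: float) -> List[float]:
    """Eq. (8): point at parameter t on the geodesic from p (t=0) to q (t=1)."""
    neg_p = [-a for a in p]
    return mobius_add(p, mobius_scalar(t, mobius_add(neg_p, q)))


def exp_map(p: Vec, v: Vec) -> List[float]:
    """Eq. (9) with kappa = 1: exponential map at p applied to tangent vector v."""
    n = norm(v)
    if n == 0.0:
        return list(p)
    scale = math.tanh(n / (1.0 - dot(p, p))) / n
    return mobius_add(p, [scale * a for a in v])


def embedding_nll(u: Vec, v: Vec, negatives: Sequence[Vec]) -> float:
    """Negative log of the softmax term in Eq. (10) for one related pair (u, v)."""
    logits = [-poincare_distance(u, v)]
    logits += [-poincare_distance(u, w) for w in negatives]
    m = max(logits)  # shift for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]
```

Two properties are easy to check with these helpers: `geodesic(p, q, 0.5)` lies at half the hyperbolic distance between `p` and `q`, and `poincare_distance(p, q)` agrees with `2 * atanh(norm(mobius_add(-p, q)))`, which reflects that hyperbolic translations preserve distances.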


We consider two pivotal papers here that provide a such the- Following Ganea et al. (2018b) and Shimizu et al. (2021),
oretical foundation, namely Hyperbolic Neural Networks by Yang et al. (2023) investigate the hierarchical representa-
Ganea et al. (2018b) and Hyperbolic Neural Networks++ by tion ability of the existing HNNs and HGNNs, improving
Shimizu et al. (2021). Ganea et al. (2018b) propose multino- the hierarchical ability through hyperbolic informed embed-
mial logistic regression in the Poincaré ball. ding (HIE) via incorporating hierarchical distance of the node
Given k ∈ {1, ..., K } classes, pk ∈ Dnc , ∀q ∈ Dnc , and to origin. HIE is task- and model-agnostic and can be used
ak ∈ Dnc \{0}, hyperbolic logistic regression is performed to improve the hierarchical embedding ability of different
using hyperbolic models (i.e., Poincaré model and Lorentz model).
Park et al. (2023) use hyperbolic neural networks and propose

λcpk ak  a Hyperbolic Affinity Learning method for spatial propa-
p(y = k|q) ∝ exp √
c gation and learning the hierarchical relationship among the
 √  (13) pixels.
−1 2 c− pk ⊕c q, ak 
sinh . Hyperbolic learning of graphs, text, and more. The advances
(1 − c− pk ⊕c q2 )ak 
in hyperbolic embeddings of hierarchies and the introduc-
Intuitively, the above equation describes the distance to the tion of hyperbolic network layers have spurred research in
margin hyperplane in hyperbolic space. As an extension, a several other research directions as well. As a logical exten-
hyperbolic version of linear layer f is given as f : Rn → sion of hierarchical embeddings, graph networks have been
Rm , a Möbius version of f where the map from Dn → Dm extended to hyperbolic space. Liu et al. (2019) and Chami
is defined as: et al. (2019) propose a tangent-based view to hyperbolic
graph networks. Both approaches model a graph layer by first
f ⊗c := expc0 ( f (logc0 (q))), (14) mapping node embeddings to the tangent space, then per-
forming the transformation and aggregation in the tangent
space, after which the updated node embeddings are pro-
with expc0 : T0m Dmc → Dc and log0 : Dc → T0n Dc . They
m c n n
jected back to the hyperbolic manifold at hand. Since tangent
furthermore outline how to create recurrent network layers.
operations only provide an approximation of the graph oper-
Bdeir et al. (2023) also provide the Lorentzian formulation
ations on the manifold, several works have proposed graph
of 2D convolutional layer, batch normalization, and multino-
networks that better abide the underlying hyperbolic geome-
mial logistic regression. As (Bdeir et al., 2023) show, given
try, such as constant curvature κ-GCNs (Bachmann et al.,
parameters ac ∈ R and z c ∈ Rn , the logit for class c and
2020), hyperbolic-to-hyperbolic GCNs (Dai et al., 2021),
input x ∈ LnK is given as:
Lorentzian GCNs (Zhang et al., 2021c), Lorentzian nested
1 √ α hyperbolic GCNs (Fan et al. 2022), attention-based hyper-
vz c ,ac (x) = √ sign(α)β| sinh−1 ( −K )| bolic graph networks (Gulcehre et al., 2019; Zhang et al.,
−K β
√ √ 2021b), dynamic hyperbolic graph attention network (Li et
α = cosh( −K a)z, xs  − sinh ( −K a) (15)
al., 2023a), and embedding graphs by combining hyperbolic
√ √ and diffusion geometry (Lin et al., 2023c). Hyperbolic graph
β =  cosh ( −K a)z2 − (sinh ( −K a)z)2 .
networks have shown to improve node, link, and graph clas-
sification compared to Euclidean variants, especially when
Shimizu et al. (2021) reformulate the hyperbolic logistic
graphs have latent hierarchical structures.
regression of Ganea et al. (2018b) to reduce the number of
Hyperbolic embeddings have also been investigated for
parameters to the same level as the Euclidean logistic regres-
text. Tifrea et al. (2019), Dhingra et al. (2018), and Leimeis-
sion. Their linear layer is given as:
ter & Wilson (2018) propose hyperbolic alternatives for word
 embeddings. Zhu et al. (2020) introduce HyperText to endow
y = F c ( p; Z , r ) := w(1 + 1 + cw2 )−1 (16) FastText with hyperbolic geometry. Embedding text in hyper-
bolic space has the potential to improve similarity, analogy,
where Z = {z k ∈ T0 Bnc = Rn }m k=1 , r = {rk ∈ R}k=1 , and
m
and hypernymy detection, most notably with few embedding
1 √
w := (c− 2 sinh( cvk ( p)))mk=1 . More importantly for com- dimensions.
puter vision, they show how to formulate convolutional layers Beyond text and graphs, hyperbolic learning has shown
using Poincaré fully connected layer and β-concatenation. to be beneficial for several other research directions, includ-
To do so, they show how to generalize the hyperbolic linear ing but not limited to learning representations for molec-
layer to image patches through β-splits, and β-concatenation, ular/cellular structures (Klimovskaia et al., 2020; Yu et
leading in principle to arbitrary-dimensional convolutional al., 2020; Wu et al., 2021), recommender systems (Mir-
layers. Moreover, Poincaré multi-head attention is possible vakhabova et al., 2020; Wang et al., 2021; Yang et al., 2022b),
through the same operators. reinforcement learning (Cetin et al., 2022), music genera-

123
3490 International Journal of Computer Vision (2024) 132:3484–3508

Fig. 3 The three core strategies for supervised hyperbolic learning hyperbolic class hyperplanes, i.e., gyroplanes, (ii) to hyperbolic class
in computer vision. Current literature performs hyperbolic learning prototypes, or (iii) by contrasting to other samples
of visual embeddings by learning to match training samples (i) to

tion (Huang et al., 2023), skeletal data (Franco et al., 2023; space, and networks are optimized to minimize hyperbolic
Chen et al., 2023), LiDAR data (Tong et al., 2022; Wang et distances between samples and prototypes.
al., 2023a), point clouds (Montanaro et al., 2022; Anvekar & 3. Sample-to-sample learning denotes the setting where net-
Bazazian, 2023; Lin et al., 2023b; Onghena et al., 2023), 3D works are optimized by learning metrics or contrastive
shapes (Chen et al., 2020b; Onghena et al., 2023; Leng et al., objectives between samples in a batch.
2023), and remote sensing data (Hamzaoui et al., 2023). In
summary, hyperbolic geometry has impacted a wide range of
For all strategies, let (x, y) denote the visual input x, which
research fields. This survey focuses specifically on the impact
can be an image or a video, and the corresponding label
and potential in the visual domain.
y ∈ Y. Let f θ (x) ∈ R D denote its Euclidean embedding
after going through a network. This representation is mapped
to hyperbolic space using the exponential map, denoted as
g(x) = exp0 ( f θ (x)). In many hyperbolic works, additional
3 Supervised Hyperbolic Visual Learning
information about hierarchical relations between classes is
assumed. Let H = (Y, P, R), with Y the class labels denot-
In Fig. 3, we provide an overview of literature on supervised
ing the leaf nodes of the hierarchy, P the internal nodes, and
learning with hyperbolic geometry in computer vision. In cur-
R the set of hypernym-hyponym relations of the hierarchy.
rent vision works, hyperbolic learning is mostly performed
Below, we discuss how current literature tackles each strat-
at the embedding- or classifier-level. In other words, current
egy in detail sequentially.
works rely on standard networks for feature learning and
transform the output embeddings to hyperbolic space for the
final learning stage. For supervised learning in hyperbolic 3.1 Sample-to-Gyroplane Learning
space, we have identified three main optimization strategies:
The most direct way to induce hyperbolic geometry in the
classification space is by replacing the classification layer by
1. Sample-to-gyroplane learning denotes the setting where a hyperbolic alternative. This can be done either by means of
classes are represented by hyperbolic hyperplanes, i.e., a hyperbolic logistic regression or through hyperbolic kernel
gyroplanes, with networks optimized based on confidence machines.
logit scores between samples and gyroplanes. Hyperbolic logistic regression. Khrulkov et al. (2020) incor-
2. Sample-to-prototype learning denotes the setting where porate a hyperbolic classifier by taking a standard convolu-
class semantics are represented as points in hyperbolic tional network and mapping the outputs of the last hidden

123
International Journal of Computer Vision (2024) 132:3484–3508 3491

layer to hyperbolic space using an exponential map. Afterwards, the hyperbolic multinomial logistic regression as described by Ganea et al. (2018b) is used to obtain class logits, which can be optimized with cross-entropy. They find that training a hyperbolic classifier on top of a convolutional network provides uncertainty information based on the distance of the hyperbolic image embeddings to the origin. Out-of-distribution samples on average have a smaller norm, making it possible to differentiate in- from out-of-distribution samples by sorting them by their distance to the origin. Hong et al. (2023a) show that hyperbolic classification is beneficial for visual anomaly recognition tasks, such as out-of-distribution detection in image classification and segmentation. Araño et al. (2021) use hyperbolic layers to perform multi-modal sentiment analysis based on the audio, video, and text modalities. Ahmad & Lecue (2022) also show the effect of hyperbolic space for object recognition with ultra-wide field-of-view lenses. Han et al. (2023) show that hyperbolic embeddings with logistic regression and an extra contrastive loss benefit face anti-spoofing.

Guo et al. (2022) address a limitation when training classifiers in hyperbolic space, namely a vanishing gradient problem due to the hybrid architecture of current hyperbolic approaches in computer vision, where Euclidean features are connected to a hyperbolic classifier. Equation 13 highlights that to maximize the likelihood of correct predictions, the distance to hyperbolic gyroplanes needs to be maximized. In practice, embeddings of samples are therefore pushed to the boundary of the Poincaré ball. As a result, the inverse of the Riemannian metric tensor approaches zero, resulting in small gradients. This finding is in line with several other works on vanishing gradients in hyperbolic representation learning in the Poincaré and Lorentz models (Nickel & Kiela, 2018; Liu et al., 2019).

To combat the vanishing gradient problem, Guo et al. (2022) propose to clip the Euclidean embeddings of samples before the exponential mapping, i.e.:

$$ f_\theta^{\text{clipped}}(x) = \min\left\{1, \frac{r}{\|f_\theta(x)\|}\right\} \cdot f_\theta(x), \tag{17} $$

with r as a hyperparameter. This trick improves learning with hyperbolic multinomial logistic regression, especially when dealing with many classes such as on ImageNet. Furthermore, training with clipped hyperbolic classifiers improves out-of-distribution detection over training with Euclidean classifiers, while also being more robust to adversarial attacks. However, Moreira et al. (2023) study hyperbolic prototypical networks with a high-dimensional output space in a few-shot learning task, showing that the hyperbolic representations concentrate close to the surface, resulting in boundary saturation. Mishne et al. (2023) analyze the limitations of and differences between the Poincaré and Lorentz models, along with a Euclidean parametrization of hyperbolic space. These works indicate the need for more robust representation and optimization when working in hyperbolic space.

Next to global classification, a few recent works have investigated hyperbolic logistic regression for structured prediction tasks such as object detection and image segmentation. Valada (2022) extend object detection with hyperbolic geometry, amongst others by replacing the classifier head of a two-stage detector like Sparse R-CNN (Sun et al., 2021b) with a hyperbolic logistic regression, improving object detection performance in standard and zero-shot settings. Ghadimi Atigh et al. (2022) introduce Hyperbolic Image Segmentation, where the final per-pixel classification is performed in hyperbolic space. Starting from the geometric interpretation of hyperbolic gyroplanes of Ganea et al. (2018b), they find that simultaneously computing class logits over all pixels of all images in a batch, as is customary in Euclidean networks, is not directly applicable in hyperbolic space. This is because the explicit computation of the Möbius addition requires evaluating a tensor in $\mathbb{R}^{W \times H \times |Y| \times d}$ for an image of size $W \times H$ with $d$ embedding dimensions. Instead, they rewrite the Möbius addition as:

$$
f_1 \oplus_c f_2 = \alpha f_1 + \beta f_2, \qquad
\alpha = \frac{1 + 2c\langle f_1, f_2\rangle + c\|f_2\|^2}{1 + 2c\langle f_1, f_2\rangle + c^2\|f_1\|^2\|f_2\|^2}, \qquad
\beta = \frac{1 - c\|f_1\|^2}{1 + 2c\langle f_1, f_2\rangle + c^2\|f_1\|^2\|f_2\|^2}. \tag{18}
$$

This rewrite reduces the addition to adding two tensors in $\mathbb{R}^{W \times H \times |Y|}$, allowing for per-pixel evaluation on image batches. For training, Ghadimi Atigh et al. (2022) incorporate hierarchical information by replacing the one-hot softmax with a hierarchical softmax:

$$ p(\hat{y} = y \mid g(x)_{ij}) = \prod_{h \in H_y} \frac{\exp(\xi_h(g(x)_{ij}))}{\sum_{s \in S_h} \exp(\xi_s(g(x)_{ij}))}, \tag{19} $$

with $H_y = \{y\} \cup A_y$ the set containing y and its ancestors and $S_h$ the set of siblings of class h. Performing per-pixel classification with hyperbolic hierarchical logistic regression opens up multiple new doors for image segmentation. First, the notion of uncertainty as given by the hyperbolic norm of output embeddings generalizes naturally to the pixel level. As shown in Fig. 4, the norm of pixel embeddings correlates with semantic ambiguity; the closer the pixel is to a semantic boundary, the lower the pixel norm. Chen et al. (2022) have already used this insight to improve image segmentation. They outline a hyperbolic uncertainty loss, where the cross-entropy loss of a pixel $x_{ij}$ is weighted as follows:


$$ uw(x_{ij}) = 1 + \frac{1}{\log\left(t + \frac{d_h(g(x)_{ij},\, 0)}{d_h(g(s),\, 0)}\right)}, \tag{20} $$

with s the most confident pixel and t a hyperparameter set to 1.02 in order to have a wide weight variation while avoiding division by zero. Adding this weight to the cross-entropy pixel loss consistently improves segmentation results for well-known segmentation networks. Other benefits of hyperbolic image segmentation include better zero-label generalization and higher effectiveness with few embedding dimensions compared to Euclidean pixel embeddings.

Fig. 4 Hyperbolic image segmentation naturally provides us per-pixel uncertainty information. Pixels with low hyperbolic norm constitute pixels with high uncertainty and are strongly correlated with closeness to semantic boundaries. Figure reproduced with permission of Ghadimi Atigh et al. (2022)

Hyperbolic kernel machines. Next to logistic regression, Cho et al. (2019) provide a general formulation for kernel methods in hyperbolic space with large-margin classifiers. Fang et al. (2021, 2023a) introduce positive definite kernel functions in hyperbolic space and show their potential for computer vision. Specifically, they propose hyperbolic instantiations of tangent kernels, radial basis function kernels, (generalized) Laplace kernels, and binomial kernels. The kernels can be plugged on top of convolutional networks and trained with cross-entropy to benefit from both the representation learning of the convolutional layers and the hyperbolic kernel dynamics in the classifier. Deep learning with hyperbolic kernel methods improves few-shot learning, person re-identification, and knowledge distillation. Zero-shot learning is even enabled through kernel distances between visual embeddings and semantic class representations.

3.2 Sample-to-Prototype Learning

The most popular strategy in hyperbolic learning is to represent classes as prototypes, i.e., as points in hyperbolic space. In this research direction, there are two solutions: embedding classes based on their sample mean, in the spirit of Prototypical Networks (ProtoNet) (Snell et al., 2017), or embedding classes based on a given hierarchy over all classes.

Hyperbolic ProtoNet. In Prototypical Networks (Snell et al., 2017), the prototype of a class k is determined as the mean vector of the samples belonging to that class:

$$ P_R(k) = \frac{1}{|S_k|} \sum_{x_s \in S_k} f_\theta(x_s), \tag{21} $$

with $S_k$ the set of samples belonging to class k. Inference can in turn be performed by assigning the label of the nearest prototype to a test sample. Khrulkov et al. (2020) generalize this formulation to Hyperbolic Prototypical Networks. Since computing averages in the Poincaré ball model requires expensive Fréchet mean calculations, they perform averaging using the Einstein midpoint, given in Klein coordinates as:

$$ P_K(k) = \sum_{i=1}^{|S_k|} \gamma_i\, g_K(x_i) \Big/ \sum_{i=1}^{|S_k|} \gamma_i, \tag{22} $$

with $\gamma_i$ the Lorentz factors:

$$ \gamma_i = \frac{1}{\sqrt{1 - c\|g_K(x_i)\|^2}}. \tag{23} $$

Since Khrulkov et al. (2020) operate in the Poincaré ball model, this averaging operation requires transforming embeddings to and from the Klein model:

$$ g_K(x_i) = \frac{2\, g_D(x_i)}{1 + c\|g_D(x_i)\|^2}, \qquad g_D(x_i) = \frac{g_K(x_i)}{1 + \sqrt{1 - c\|g_K(x_i)\|^2}}, \tag{24} $$

with $g_D(x_i)$ and $g_K(x_i)$ the embeddings of input $x_i$ in respectively the Poincaré ball model and the Klein model. Akin to its Euclidean counterpart, Hyperbolic ProtoNet is used to address few-shot learning, where the sample mean prototype serves as the class representation. Khrulkov et al. (2020) show that performing prototypical few-shot learning in hyperbolic space is competitive with Euclidean prototypical learning, even resulting in better accuracy scores when relying on a 4-layer ConvNet as the backbone.

As a follow-up work, Gao et al. (2021) show that different tasks and even individual classes in few-shot learning favor different curvatures. They propose to generate a per-class curvature based on the second-order statistics of its in-class and out-of-class sample representations. Using the second-order statistics, a multi-layer perceptron with sigmoid activation is


learned to fix the range of the curvature to [0, 1]. Given class-
specific curvatures, prototypes are obtained by constructing
an intra-class distance matrix on top of which an MLP is
trained. The MLP serves as weights for each in-class sam-
ple. The procedure is repeated for the closest samples in the
out-of-class set, after which the per-class prototype is given
as the weighted hyperbolic average over the in-class and
closest out-of-class samples. The curvature generation and
weighted hyperbolic averaging improve few-shot learning in
both inductive and transductive settings.
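
The hyperbolic averaging used for prototypes, i.e., the Einstein midpoint of Eqs. 22 to 24, can be sketched in a few lines. This is a minimal pure-Python illustration and not the implementation of Khrulkov et al. (2020) or Gao et al. (2021): the function names are hypothetical, and the optional `weights` argument, which simply rescales each Lorentz factor, is an assumption about how a weighted average could be formed.

```python
import math

def poincare_to_klein(p, c=1.0):
    """Eq. 24, first map: Poincare-ball point -> Klein coordinates."""
    sq = sum(a * a for a in p)
    return [2.0 * a / (1.0 + c * sq) for a in p]

def klein_to_poincare(k, c=1.0):
    """Eq. 24, second map: Klein coordinates -> Poincare-ball point."""
    sq = sum(a * a for a in k)
    return [a / (1.0 + math.sqrt(1.0 - c * sq)) for a in k]

def einstein_midpoint(points, weights=None, c=1.0):
    """Einstein midpoint (Eqs. 22-23) of points given in the Poincare ball.

    `weights` is a hypothetical extension; None reproduces the plain midpoint.
    """
    if weights is None:
        weights = [1.0] * len(points)
    klein = [poincare_to_klein(p, c) for p in points]
    # Lorentz factor of each sample, computed in Klein coordinates
    gammas = [w / math.sqrt(1.0 - c * sum(a * a for a in k))
              for w, k in zip(weights, klein)]
    total = sum(gammas)
    dim = len(points[0])
    mid = [sum(g * k[d] for g, k in zip(gammas, klein)) / total
           for d in range(dim)]
    # Map the Klein-model midpoint back to the Poincare ball
    return klein_to_poincare(mid, c)
```

For example, `einstein_midpoint([[0.5, 0.0], [-0.5, 0.0]])` returns the origin, while giving the first sample a larger weight moves the prototype towards it without leaving the ball.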
The hyperbolic clipping of Guo et al. (2022) is also effective for few-shot learning, consistently outperforming the standard ProtoNet and Hyperbolic ProtoNet on the CUB Birds and miniImageNet few-shot benchmarks. A few other works have extended Hyperbolic ProtoNet for few-shot learning with set- and grouplet-based learning and will be discussed in the sample-to-sample learning section.

Recently, Gao et al. (2022) investigate feature augmentation in hyperbolic space to solve the overfitting problem when dealing with limited data. On top, they introduce a scheme to estimate the feature distribution using neural-ODE. These elements are then plugged into few-shot approaches such as the hyperbolic prototypical networks of Khrulkov et al. (2020), improving performance. Choudhary & Reddy (2022) improve hyperbolic few-shot learning by reformulating hyperbolic neural networks through Taylor series expansions of hyperbolic trigonometric functions and show that this improves scalability and compatibility, and outperforms Hyperbolic ProtoNet.

Fig. 5 Hierarchical knowledge amongst classes provides a structure for hyperbolic embeddings in computer vision approaches, where classes are represented as points or prototypes in hyperbolic space according to their hypernym-hyponym relations. For example, Long et al. (2020) exploit hierarchical relations from different actions for action hierarchies (right). Figure reproduced with permission of Long et al. (2020)

Hierarchical embedding of prototypes. Where Hyperbolic ProtoNets are effective in few-shot settings, a number of works have also investigated prototype-based solutions for general classification. As a starting point, these works commonly assume that the classes in a dataset are organized in a hierarchy, see Fig. 5. Long et al. (2020) embed an action class hierarchy H in hyperbolic space using hyperbolic entailment cones (Ganea et al., 2018a), with an additional loss to increase the angular separation between leaf nodes to avoid inter-label confusion amongst class labels Y. With $L_H(H)$ as the hyperbolic embedding loss for hierarchy H, let P denote the leaf nodes of the hierarchy. Then the separation-based loss is given over the leaf nodes as:

$$ L_S(P) = \mathbf{1}^T (\hat{P} \hat{P}^T - I) \mathbf{1}, \tag{25} $$

with $\hat{P}$ the $\ell_2$-normalized representations of the leaf nodes. By combining the hierarchical and separation-based losses, the hierarchy is embedded to balance both hierarchical constraints and discriminative abilities. The embedding is learned a priori, after which video embeddings are projected to the same hyperbolic space and optimized to their correct class embedding. This approach improves action recognition, zero-shot action classification, and hierarchical action search. In a similar spirit, Dhall et al. (2020) show that using hyperbolic entailment cones for image classification is empirically better than using Euclidean entailment cones. Rather than separating hierarchical and visual embedding learning, Yu et al. (2022b) propose to simultaneously learn hierarchical and visual representations for skin lesion recognition in images. Image embeddings are optimized towards their correct class prototype, while the classes are optimized to abide by their hyperbolic entailment cones with an extra distortion loss to obtain better hierarchical embeddings. Gulshad et al. (2023) propose Hierarchical Prototype Explainer, a reasoning model in hyperbolic space to provide explainability in video action recognition. Their approach learns hierarchical prototypes at different levels of granularity, e.g., parent and grandparent levels, to explain the recognized action in the video. By learning the hierarchical prototypes, they can provide explanations on different levels of granularity, including interpretation of the prediction of a specific class label and providing information on the spatiotemporal parts that contribute to the final prediction. Li et al. (2023c) investigate the semantic space of action recognition datasets and bridge the gap between different labeling systems. To achieve unified action learning, actions are connected into a hierarchy using VerbNet (Schuler, 2005) and embedded as prototypes in hyperbolic space.

Hierarchical prototype embeddings have also been successfully employed in the zero-shot domain. Liu et al. (2020) show how to perform zero-shot learning with hyperbolic embeddings. Classes are embedded by taking their WordNet-based Poincaré Embeddings (Nickel & Kiela, 2017) and text-based Poincaré GloVe embeddings (Tifrea et al., 2019). Both are concatenated to obtain class prototypes. By optimizing seen training images to their prototypes, it becomes possible


to generalize to unseen classes during testing through a nearest neighbor search in the concatenated hyperbolic space. Xu et al. (2022) also perform hyperbolic zero-shot learning by training hyperbolic graph layers (Chami et al., 2019) on top of hyperbolic word embeddings. Dengxiong & Kong (2023) show the potential of hyperbolic space in generalized open set recognition, which classifies unknown samples based on side information. A side information (taxonomy) learning framework is introduced to embed the information in hyperbolic space with low distortion and identify the unknown samples. Moreover, an ancestor search algorithm is outlined to find the most similar ancestor in the taxonomy of the known classes.

For standard classification, Ghadimi Atigh et al. (2021) show how to integrate uniformity amongst prototypes in hyperbolic space by embedding classes with maximum separation on the boundary of the Poincaré ball given by Mettes et al. (2019); Kasarla et al. (2022). With prototypes now at the boundary of the ball, standard distance functions no longer apply, since the prototypes are at infinite distance from any point within the ball. To that end, they propose to use the Busemann distance, which is given for hyperbolic image embedding g(x) and prototype p as:

$$ b_p(g(x)) = \log\left(\frac{\|p - g(x)\|^2}{1 - \|g(x)\|^2}\right). \tag{26} $$

By fixing prototypes with maximum separation a priori and minimizing this distance function with an extra regularization towards the origin, it becomes possible to perform hyperbolic prototypical learning with prototypes at the ideal boundary. Ghadimi Atigh et al. (2021) show that such an approach has direct links with conventional logistic regression in the binary case, highlighting its inherent properties. Moreover, maximally separated prototypes can also be replaced by prototypes from word embeddings or hierarchical knowledge, depending on the available knowledge and task at hand. In addition to standard classification, hierarchical hyperbolic embeddings have demonstrated effectiveness in continual learning (Gao et al., 2023). To learn the new data, Gao et al. (2023) propose a dynamically expanding geometry through a mixed-curvature space, enabling learning of complex hierarchies in a data stream. To prevent forgetting, angle-regularization and neighbor-robustness losses are used to preserve the geometry of the old data.

Few-shot learning has also been investigated with hierarchical knowledge. Zhang et al. (2022) perform such few-shot learning by first training a network on a joint classification and hierarchical consistency objective. The classification is given as a softmax over the class probabilities, as well as the softmax over the superclasses. In the few-shot inference stage, class prototypes are obtained through hyperbolic graph propagation to deal with the limited sample setting, improving few-shot learning as a result.

3.3 Sample-to-Sample Learning

Lastly, a number of recent works have investigated hyperbolic learning by contrasting between samples.

Hyperbolic metric learning. Ermolov et al. (2022) investigate the potential of hyperbolic embedding for metric learning. In metric learning, the de facto solution is to match representations of sample pairs based on embeddings given by a pre-trained encoder. Rather than relying on Euclidean distances and contrastive learning for optimization, they propose a hyperbolic pairwise cross-entropy loss. Given a dataset with |Y| classes, each batch samples two samples from each category, i.e., K = 2 · |Y|. Then the loss function for a positive pair with the same class label is given as:

$$ \ell_{ij} = -\log \frac{\exp(-D(g(x_i), g(x_j))/\tau)}{\sum_{k=1}^{K} \exp(-D(g(x_i), g(x_k))/\tau)}, \tag{27} $$

where D(·, ·) can be either a hyperbolic or a cosine distance and τ denotes a temperature hyperparameter. This loss is computed over all positive pairs (i, j) and (j, i) in a batch. Using supervised (Dosovitskiy et al., 2021) and self-supervised (Caron et al., 2021) vision transformers as encoders, hyperbolic metric learning consistently outperforms Euclidean alternatives and sets state-of-the-art on fine-grained datasets.

Hyperbolic metric learning has been shown to be effective in overcoming overfitting and catastrophic forgetting in few-shot class-incremental learning tasks, as explored by Cui et al. (2022). This is done by adding a metric learning loss as a part of the distillation in continual learning. They also propose a hyperbolic version of Reciprocal Point Learning (Chen et al., 2020a) to provide extra-class space for known categories in the few-shot learning stage. Yan et al. (2023) also explore hyperbolic metric learning, incorporating noise-insensitive and adaptive hierarchical similarity to handle noisy labels and multi-level relations. Kim et al. (2022) add a hierarchical regularization term on top of the metric learning approaches, with the goal of learning hierarchical ancestors in hyperbolic space without any annotation. Hyperbolic metric learning is furthermore effective in semantic hashing (Amin et al., 2022), face recognition via large-margin nearest-neighbor learning (Trpin & Boshkoska, 2022), and multi-modal alignment given videos and knowledge graphs (Guo et al., 2021).

Following the progress of large language models and the success of vision-language models (e.g., CLIP (Radford et al., 2021)) in multimodal representation learning, Desai et al. (2023) propose a hyperbolic image-text representation in the Lorentz model. The proposed method first processes the input image and text using two separate encoders. Then, the generated embedding is projected into the hyperbolic space, and training is performed using a contrastive and entailment loss.
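
The projection-then-contrast recipe described here can be illustrated with a small pure-Python sketch. It is not the implementation of Desai et al. (2023): it assumes unit negative curvature, lifts Euclidean encoder outputs with the exponential map at the hyperboloid origin, uses the negative Lorentzian geodesic distance as similarity (an assumption of this example), and omits the entailment loss entirely. All function names and the temperature value are hypothetical.

```python
import math

def lorentz_lift(u):
    """Exponential map at the hyperboloid origin (curvature -1): u -> [x0, x_space]."""
    norm = math.sqrt(sum(a * a for a in u))
    if norm < 1e-12:
        return [1.0] + [0.0] * len(u)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in u]

def lorentz_distance(x, y):
    """Geodesic distance from the Lorentzian inner product."""
    inner = -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))
    return math.acosh(max(1.0, -inner))  # clamp guards against rounding below 1

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def contrastive_loss(img_feats, txt_feats, tau=0.1):
    """Symmetric contrastive loss over a batch of paired image/text features."""
    imgs = [lorentz_lift(u) for u in img_feats]
    txts = [lorentz_lift(u) for u in txt_feats]
    n = len(imgs)
    total = 0.0
    for i in range(n):
        # image -> text direction: the i-th text is the positive
        logits = [-lorentz_distance(imgs[i], txts[k]) / tau for k in range(n)]
        total += logsumexp(logits) - logits[i]
        # text -> image direction: the i-th image is the positive
        logits = [-lorentz_distance(txts[i], imgs[k]) / tau for k in range(n)]
        total += logsumexp(logits) - logits[i]
    return total / (2 * n)
```

With correctly paired features the loss is small; shifting the text features by one position within the batch, so that every pair is mismatched, increases it.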


Desai et al. (2023) show that their approach outperforms the Euclidean CLIP, as it is capable of capturing hierarchical multimodal relations in hyperbolic space. Hong et al. (2023b) also explore multimodal data to perform zero-shot learning with audio-visual data with a curvature-aware geometric solution. To align the features extracted from the audio and video modalities, Hong et al. (2023b) propose Hyper-align, a hyperbolic alignment loss in a fixed curvature setup, followed by Hyper-single, a module to enable learnable curvature, and Hyper-multiple, to calculate the alignment loss within different curvatures.

Hyperbolic set-based learning. Where sample-to-prototype and sample-to-sample approaches compare samples to individual elements, some works have shown that set-based and group-based distances are more effective and robust. Ma et al. (2022) introduce an adaptive sample-to-set distance function in the context of few-shot learning. Rather than aggregating support samples to a single prototype, an adaptive sample-to-set approach is proposed to increase the robustness to outliers. The sample-to-set function is a weighted average of the distance from the query to all support samples, where the distance is calculated with a small network over the feature maps of the query and support samples. This approach benefits few-shot learning, especially when dealing with outliers.

In the context of metric learning, Zhang et al. (2021a) argue that sample-to-sample learning is computationally expensive, while sample-to-prototype learning is less accurate. They propose a hybrid strategy based on grouplets. Each grouplet is a random subset of samples and the set of grouplets is matched with prototypes through a differentiable optimal transport. Akin to Ermolov et al. (2022), they show that using hyperbolic embedding spaces improves metric learning on fine-grained datasets. Moreover, they provide empirical evidence that other metric-based losses benefit from hyperbolic embeddings, highlighting the general utility of hyperbolic space for metric learning.

4 Unsupervised Hyperbolic Visual Learning

Hyperbolic learning has been actively researched in the unsupervised domain of computer vision. We identify three dominant research directions in which hyperbolic deep learning has found success: generative learning, clustering, and self-supervised learning. Below, each is discussed separately.

Fig. 6 The three major methods for unsupervised hyperbolic learning in computer vision. Current literature performs unsupervised learning in
hyperbolic space using (i) generative models, (ii) clustering, (iii) self-supervised learning


4.1 Generative Approaches

4.1.1 Hyperbolic VAEs

Variational autoencoders (VAEs) (Kingma & Welling, 2013; Rezende et al., 2014) with hyperbolic latent space have been used to learn representations of images. Nagano et al. (2019) propose the hyperbolic wrapped normal distribution in the Lorentz model and derive algorithms for both reparametrizable sampling and computing the probability density function. They then derive a hyperbolic β-VAE (Higgins et al., 2017) using the wrapped normal function as the prior and posterior, replacing the usual (Euclidean) Gaussian distribution. The wrapped normal distribution in a manifold M is the pushforward measure under the exponential map $\exp^M$. Thus, a sample z can be obtained as (Mathieu et al., 2019):

$$ z = \exp^{M}_{\mu}\left(G(\mu)^{-1/2} v\right), \quad v \sim \mathcal{N}(\cdot \mid 0, \Sigma) \tag{28} $$

where $\exp^M_\mu$ is the exponential map of M at μ, G is the matrix representation of the metric of M, and v is a random sample from a Euclidean normal distribution with mean 0 and variance Σ. To accommodate the geometry of the latent space, exponential and logarithmic maps were added at the end of the VAE encoder and before the start of the VAE decoder, respectively. In order to train their hyperbolic VAE with the typical evidence lower bound, Nagano et al. (2019) compute the density of the wrapped normal distribution using the change-of-variables formula. Since their sampling algorithm required the exponential and parallel transport maps, Nagano et al. (2019) compute the log-determinants and inverses of these maps in order to apply the change-of-variables formula. Nagano et al. (2019) then use their VAE to learn representations of MNIST and Atari 2600 Breakout screens. On MNIST, hyperbolic representations outperform Euclidean representations at low latent dimensions but were overtaken starting at dimension 10.

Mathieu et al. (2019) extend the work of Nagano et al. (2019) by introducing the Riemannian normal distribution and deriving reparametrizable sampling schemes for both the Riemannian normal and wrapped normal using hyperbolic polar coordinates. The Riemannian normal views the Euclidean normal distribution as the distribution maximizing the entropy for a given mean and standard deviation and defines a new normal distribution on hyperbolic space with this property:

$$ \mathcal{N}^R_M(z \mid \mu, \sigma^2) = \frac{1}{Z^R} \exp\left(-\frac{d_M(\mu, z)^2}{2\sigma^2}\right) \tag{29} $$

where $Z^R$ is a normalizing constant, and μ and $\sigma^2$ are the mean and variance. Mathieu et al. (2019) additionally introduce the use of a gyroplane layer as the first layer of the decoder, following Ganea et al. (2018b). Noting that a Euclidean affine transform can be written as

$$ f_{a,p}(z) = \mathrm{sign}(\langle a, z - p\rangle)\, \|a\|\, d_E(z, H_{a,p}) $$

where $H_{a,p} = \{z \in \mathbb{R}^n \mid \langle a, z - p\rangle = 0\}$ is the decision hyperplane, they replace each piece of the formula with its hyperbolic counterpart to obtain

$$ f^c_{a,p}(z) = \mathrm{sign}(\langle a, \log^c_p(z)\rangle_p)\, \|a\|_p\, d^c_p(z, H^c_{a,p}) \tag{30} $$

where $H^c_{a,p} = \{z \in \mathbb{H} \mid \langle a, \log^c_p(z)\rangle = 0\}$. The closed-form formula for the distance term in the Poincaré ball is

$$ d^c_p(z, H^c_{a,p}) = \frac{1}{\sqrt{c}} \sinh^{-1}\left(\frac{2\sqrt{c}\, |\langle -p \oplus_c z, a\rangle|}{(1 - c\|{-p} \oplus_c z\|^2)\, \|a\|}\right) \tag{31} $$

Mathieu et al. (2019) also use their hyperbolic VAE to learn representations of MNIST and find that using both the Riemannian normal and the gyroplane layer improves test log-likelihoods, especially at low latent dimensions.

Fig. 7 The standard hyperbolic wrapped normal (top) and rotated hyperbolic wrapped normal (bottom). In (a), the principal axes of the normal distribution are illustrated. In (b), the principal axes of the transported normal distribution are visualized. The densities of the two distributions are visualized in (c). Image courtesy of Cho et al. (2022)

Cho et al. (2022) extend the previous two works by proposing a new version of the hyperbolic wrapped normal distribution (HWN) in the Lorentz model. Their primary observation is that for the wrapped normal distribution, the principal axes of the distributions are not aligned with the local standard axes, see Fig. 7. They propose a new sampling process that fixes the alignment of the principal axes, resulting in a new distribution which they call the rotated hyperbolic wrapped normal (RoWN). Given a mean μ in the Lorentz model of hyperbolic geometry and a diagonal covariance matrix Σ, samples from the RoWN distribution are sampled as follows:


1. Find the rotation matrix R that rotates the x-axis $x = [\pm 1, \ldots, 0]$ to $y = \mu_{1:}$. We can compute R as

$$ R = I + (y^T x - x^T y) + \frac{(y^T x - x^T y)^2}{1 + \langle x, y\rangle} \tag{32} $$

2. Rotate Σ by R: $\hat{\Sigma} = R \Sigma R^T$
3. Now sample as in the usual hyperbolic wrapped normal: sample $v \sim \mathcal{N}(0, \hat{\Sigma})$ and then map it to hyperbolic space as follows: $\exp_\mu(\mathrm{PT}_{0 \to \mu}([0, v]))$

Fig. 8 Hierarchical attribute editing in hyperbolic space is possible due to hyperbolic space's ability to encode semantic hierarchical structure within image data. Changing the high-level, category-relevant details (closest to the origin) changes the category, while changing low-level (farthest from the origin), category-irrelevant attributes varies images within categories. Image courtesy of Li et al. (2023b)

Cho et al. (2022) find that RoWN outperforms HWN in a variety of settings, such as the Atari 2600 Breakout image generation experiment first examined in Nagano et al. (2019).
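
The three RoWN sampling steps can be traced in a short pure-Python sketch. It is an illustration under stated assumptions rather than the code of Cho et al. (2022): it fixes the curvature to -1, normalises $\mu_{1:}$ to unit length before building the rotation (needed for Eq. 32 to yield an orthogonal matrix), picks the sign of x that keeps the denominator away from zero, and uses the standard closed forms for parallel transport from the origin and for the exponential map on the hyperboloid. The function names are hypothetical.

```python
import math
import random

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rotation_to(y):
    """Eq. 32: rotation R with R x = y for a unit vector y and x = [+-1, 0, ..., 0]."""
    n = len(y)
    x = [1.0 if y[0] >= 0 else -1.0] + [0.0] * (n - 1)  # sign keeps 1 + <x, y> >= 1
    dot = sum(a * b for a, b in zip(x, y))
    # A = y^T x - x^T y: outer products under the row-vector convention
    A = [[y[i] * x[j] - x[i] * y[j] for j in range(n)] for i in range(n)]
    A2 = matmul(A, A)
    R = [[float(i == j) + A[i][j] + A2[i][j] / (1.0 + dot) for j in range(n)]
         for i in range(n)]
    return R, x

def sample_rown(mu, diag_var, rng=random):
    """One RoWN sample around mu, a point on the unit hyperboloid (curvature -1)."""
    space = mu[1:]
    n = len(space)
    norm = math.sqrt(sum(a * a for a in space))
    if norm < 1e-12:
        R = [[float(i == j) for j in range(n)] for i in range(n)]  # mu is the origin
    else:
        R, _ = rotation_to([a / norm for a in space])  # assumption: y is normalised
    # Steps 2-3: v ~ N(0, R Sigma R^T), drawn as R (Sigma^{1/2} eps) with eps ~ N(0, I)
    scaled = [math.sqrt(s) * rng.gauss(0.0, 1.0) for s in diag_var]
    v = [sum(R[i][j] * scaled[j] for j in range(n)) for i in range(n)]
    # Parallel transport of [0, v] from the origin to mu
    m = sum(a * b for a, b in zip(space, v))
    coef = m / (1.0 + mu[0])
    u = [m] + [v[i] + coef * space[i] for i in range(n)]
    # Exponential map at mu, using the Lorentzian norm of the tangent vector u
    r = math.sqrt(max(sum(a * a for a in u[1:]) - u[0] ** 2, 0.0))
    if r < 1e-12:
        return list(mu)
    return [math.cosh(r) * a + math.sinh(r) / r * b for a, b in zip(mu, u)]
```

A quick sanity check is that every returned sample z satisfies the hyperboloid constraint $-z_0^2 + \|z_{1:}\|^2 = -1$ up to rounding.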
4.1.2 Hyperbolic GANs

Using the intuition that images are organized hierarchically, several works have proposed hyperbolic generative adversarial networks (GANs). Lazcano et al. (2021) propose a hyperbolic GAN which replaces some of the Euclidean layers in both the generator and discriminator with hyperbolic layers (Ganea et al., 2018a) with learnable curvature. Lazcano et al. (2021) propose hyperbolic variants of the original GAN (Goodfellow et al., 2020), the Wasserstein GAN WGAN-GP (Gulrajani et al., 2017) and the conditional GAN CGAN (Mirza & Osindero, 2014). The paper finds that their best configurations of Euclidean and hyperbolic layers generally improved the Inception Score (Salimans et al., 2016) and Fréchet Inception Distance (Heusel et al., 2017) on MNIST image generation, with the best improvements in the GAN architecture. The best learned curvatures are close to zero. Unlike other hyperbolic generative models (VAEs and normalizing flows), good results are observed at large latent dimensions.

Qu & Zou (2022a) propose HAEGAN, a hyperbolic autoencoder and GAN framework in the Lorentz model $\mathbb{L}$ (also known as the hyperboloid model) of hyperbolic geometry. The GAN is based on the structure of WGAN-GP (Arjovsky et al., 2017; Gulrajani et al., 2017). The structure of HAEGAN consists of an encoder, which takes in real data and generates real representations, and a generator, which takes in noise and generates fake representations. A critic is trained to distinguish between the two representations, and a decoder takes the fake representations and produces the final generated object. Qu & Zou (2022a) generalize WGAN-GP to hyperbolic space using three operations: the first is the hyperbolic linear layer $\mathrm{HLinear}_{n,m} : \mathbb{L}^n_K \to \mathbb{L}^m_K$ of Chen et al. (2021), the second the hyperbolic centroid distance layer $\mathrm{HCDist}_{n,m}(x) : \mathbb{L}^n_K \to \mathbb{R}^m$ of Liu et al. (2019), and the third a new Lorentz concatenation layer:

$$ \mathrm{HCat}\left(\{x_i\}_{i=1}^{N}\right) = \left[\sqrt{\sum_{i=1}^{N} x_{i,t}^2 + (N - 1)/K},\; x_{1,s}, \ldots, x_{N,s}\right] \tag{33} $$

Compared to previous work (Shimizu et al., 2021), the HCat layer has the advantage of always having bounded gradients. Compared to Lazcano et al. (2021), HAEGAN shows improved results on MNIST image generation.

Li et al. (2023b) propose a hyperbolic method for few-shot image generation. The main idea is that hyperbolic space encodes a semantic hierarchy, where the root of the hierarchy (i.e., at the center of hyperbolic space) is a category, e.g., dog. At lower levels, we have more fine-grained separations, such as subcategories, e.g., Shih-Tzu and Ridgeback dogs. Finally, at the lowest level, there are category-irrelevant features, e.g., the hair color or pose of the dog (see Fig. 8). This method builds on the Euclidean pSp method (Richardson et al., 2021) for image-to-image translation. The pSp method uses a feature pyramid to extract feature maps and uses a set of projection heads on these feature maps to produce each of the style vectors required by StyleGAN (Karras et al., 2019, 2020), which is commonly denoted the $W^+$-space. Image-to-image translation can then be done by editing or replacing style vectors. Li et al. (2023b) generalize to hyperbolic space by mapping the output of a frozen, pre-trained pSp encoder to hyperbolic space and then back to the $W^+$-space of style vectors, and then feeding the style vectors into a frozen, pre-trained StyleGAN. Projection to hyperbolic space is done using the Möbius layer $f^{\otimes_c}$ of Ganea et al. (2018b), with the full projection layer having the form

$$ z_{D_i} = f^{\otimes_c}\left(\exp^c_0(\mathrm{MLP}_E(w_i))\right) \tag{34} $$


with mapping back to the $W^+$-space achieved by a logarithmic map plus an MLP. Li et al. (2023b) supervise the hyperbolic latent space with a hyperbolic classification loss based on the multinomial logistic regression formulation of Ganea et al. (2018b). After calculating the probabilities, the loss function is just the negative log-likelihood

$$ \mathcal{L}_{\mathrm{hyper}} = -\frac{1}{N} \sum_{i=1}^{N} \log(p_i) \tag{35} $$

The full loss function is the pSp loss function plus this term, excluding a specific facial reconstruction loss used by the pSp method, since Li et al. (2023b) do not focus on face generation. Li et al. (2023b) perform image generation as follows: given an image $x_i$, the image is embedded in hyperbolic space with representation $g_{\mathbb{D}}(x_i)$ and is rescaled to the desired radius (i.e., fine-grained-ness) $r$. A random vector is then sampled from the seen categories and a point is taken on the geodesic between the two points. Li et al. (2023b) find that their method is competitive with state-of-the-art methods and shows promise for image-to-image transfer.

4.1.3 Hyperbolic Normalizing Flows

Bose et al. (2020) propose a hyperbolic normalizing flow in the Lorentz model that generalizes the Euclidean normalizing flow RealNVP (Dinh et al., 2016) to hyperbolic space. They propose two types of hyperbolic normalizing flows. The first, which they call tangent coupling, carries out the coupling layer of RealNVP in the tangent space at the hyperbolic origin $o$:

$$ \tilde{f}^{TC}(\tilde{x}) = \begin{cases} \tilde{z}_1 = \tilde{x}_1 \\ \tilde{z}_2 = \tilde{x}_2 \odot \sigma(s(\tilde{x}_1)) + t(\tilde{x}_1) \end{cases} \tag{36} $$

$$ f^{TC}(x) = \exp^K_o\big(\tilde{f}^{TC}(\log^K_o(x))\big) \tag{37} $$

where $s, t$ are neural networks and $\sigma$ is a pointwise non-linearity.

The second, wrapped hyperboloid coupling, extends tangent coupling by using parallel transport to map intermediate vectors from the tangent space of the origin to the tangent space of another point in hyperbolic space:

$$ \tilde{f}^{WHC}(\tilde{x}) = \begin{cases} \tilde{z}_1 = \tilde{x}_1 \\ \tilde{z}_2 = \log^K_o\Big(\exp^K_{t(\tilde{x}_1)}\big(\mathrm{PT}_{o \to t(\tilde{x}_1)}(v)\big)\Big) \end{cases} \tag{38} $$

$$ v = \tilde{x}_2 \odot \sigma(s(\tilde{x}_1)) \tag{39} $$

$$ f^{WHC}(x) = \exp^K_o\big(\tilde{f}^{WHC}(\log^K_o(x))\big) \tag{40} $$

Compared to tangent coupling, wrapped hyperboloid coupling allows the flow to leverage different parts of the manifold instead of just the origin. The paper also derives the inverse and Jacobian determinants of the two flows. As is the case for hyperbolic VAEs, Bose et al. (2020) also benchmark on MNIST, and find a similar trend as Nagano et al. (2019): the performance of hyperbolic models exceeds that of the equivalent Euclidean model at low dimension, but as early as latent dimension 6 Euclidean models overtake hyperbolic models in performance. Bose et al. (2020) find that hyperbolic normalizing flows outperform hyperbolic VAEs at these low latent dimensions.

4.2 Clustering

Due to the close relationship between hyperbolic space, hierarchies, and trees, several works have explored hierarchical clustering using hyperbolic space. Monath et al. (2019) propose to perform hierarchical clustering using hyperbolic representations. Given a dataset $\mathcal{D} = \{x_i\}_{i=1}^{N}$, Monath et al. (2019) require a hyperbolic representation at the edge of the Poincaré disk $\mathbb{D}^d$ for each data point $x_i \in \mathcal{D}$; these representations become the leaves of the hierarchical clustering. The method of Monath et al. (2019) creates a hierarchical clustering by optimizing the hyperbolic representations for a fixed number of internal nodes. Parent–child dissimilarity between a child representation $z_c$ and a parent representation $z_p$ is measured by

$$ d_{cp}(z_c, z_p) = d_{\mathbb{D}}(z_c, z_p)\big(1 + \max\{\|z_p\|_{\mathbb{D}} - \|z_c\|_{\mathbb{D}}, 0\}\big) \tag{41} $$

which encourages children to have larger norms than their parents. A discrete tree can then be extracted as follows:

$$ \mathrm{Parent}(z_c) = \arg\min_{\|z_p\| < \|z_c\|} d_{cp}(z_c, z_p) \tag{42} $$

The internal node representations are supervised by two losses: first, a hierarchical clustering loss based on Dasgupta's cost (Dasgupta, 2016) and a continuous extension due to Wang & Wang (2018) that reformulates the loss in terms of lowest common ancestors (LCAs), and second, a parent–child margin objective that encourages parent nodes to have smaller norm than their children.

Suppose $\mathcal{D}$ has pairwise similarities $\{w_{ij}\}_{i,j \in [N]}$. A hierarchical clustering of $\mathcal{D}$ is a rooted tree $T$ such that each leaf is a data point. For leaves $i, j \in T$, denote their LCA by $i \vee j$, the subtree rooted at $i \vee j$ by $T[i \vee j]$, and the leaves of $T[i \vee j]$ by $\mathrm{leaves}(T[i \vee j])$. Finally, let the relation $\{i, j | k\}$ hold if $i \vee j$ is a descendant of $i \vee j \vee k$. Then Dasgupta's cost can be formulated as

$$ C_{\mathrm{Dasgupta}}(T; w) = \sum_{ij} w_{ij}\, |\mathrm{leaves}(T[i \vee j])| \tag{43} $$
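As an illustration of the parent–child dissimilarity (Eq. 41) and the tree extraction rule (Eq. 42), the following is a minimal sketch in plain Python, not the implementation of Monath et al. (2019). It assumes points are lists of floats inside the unit Poincaré ball with curvature −1, and it takes the norm $\|z\|_{\mathbb{D}}$ to be the hyperbolic distance to the origin; the function names `poincare_dist`, `hyp_norm`, `d_cp`, and `parent` are hypothetical.

```python
import math

def poincare_dist(x, y):
    """Geodesic distance between two points of the open unit (Poincare) ball."""
    sq = lambda v: sum(c * c for c in v)
    diff = sq([a - b for a, b in zip(x, y)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - sq(x)) * (1.0 - sq(y))))

def hyp_norm(z):
    """Assumed reading of ||z||_D: hyperbolic distance from z to the origin."""
    return poincare_dist(z, [0.0] * len(z))

def d_cp(z_c, z_p):
    """Eq. 41: distance, inflated when the parent lies farther out than the child."""
    return poincare_dist(z_c, z_p) * (1.0 + max(hyp_norm(z_p) - hyp_norm(z_c), 0.0))

def parent(z_c, candidates):
    """Eq. 42: closest candidate under d_cp among those with a smaller norm."""
    inner = [z for z in candidates if hyp_norm(z) < hyp_norm(z_c)]
    return min(inner, key=lambda z_p: d_cp(z_c, z_p)) if inner else None
```

A node with no candidate of smaller norm receives `None` here, which one could treat as the root; that convention is a choice of this sketch rather than something stated in the text.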

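To make Dasgupta's cost (Eq. 43) concrete, here is a small sketch that evaluates it on a discrete tree. It is only an illustration under stated assumptions: the tree is encoded as nested tuples whose leaves are integer indices, `w` is a symmetric similarity matrix given as nested lists, and the sum runs over unordered leaf pairs. The names `leaves` and `dasgupta_cost` are hypothetical.

```python
from itertools import combinations

def leaves(tree):
    """Leaf indices below a nested-tuple tree; a bare int is a leaf."""
    if isinstance(tree, int):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def dasgupta_cost(tree, w):
    """Sum of w[i][j] * |leaves(T[i v j])| over unordered leaf pairs (Eq. 43)."""
    if isinstance(tree, int):
        return 0.0
    child_leaves = [leaves(child) for child in tree]
    n_here = sum(len(c) for c in child_leaves)
    cost = 0.0
    # Leaves falling in two different children have their LCA at this node.
    for a, b in combinations(child_leaves, 2):
        for i in a:
            for j in b:
                cost += w[i][j] * n_here
    # Pairs inside a single child are accounted for recursively.
    return cost + sum(dasgupta_cost(child, w) for child in tree)
```

For three points where points 0 and 1 are highly similar, the tree `((0, 1), 2)` that merges them first has a lower cost than `((0, 2), 1)`, which is the behaviour the cost is designed to reward.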

Wang & Wang (2018) show that

$$ C_{\mathrm{Dasgupta}}(T; w) = \sum_{ijk} \big[w_{ij} + w_{ik} + w_{jk} - w_{ijk}(T; w)\big] + 2 \sum_{ij} w_{ij} \tag{44} $$

where

$$ w_{ijk}(T; w) = w_{ij}\, \mathbb{1}[\{i, j | k\}] + w_{ik}\, \mathbb{1}[\{i, k | j\}] + w_{jk}\, \mathbb{1}[\{j, k | i\}] \tag{45} $$

The margin parent–child dissimilarity is given as

$$ d_{cp}(z_c, z_p; \gamma) = d_{\mathbb{D}}(z_c, z_p)\big(1 + \max\{\|z_p\|_{\mathbb{D}} - \|z_c\|_{\mathbb{D}} + \gamma, 0\}\big) \tag{46} $$

and the total margin objective is

$$ \mathcal{L}_{cp} = \sum_{z_c} d_{cp}(z_c, \mathrm{Parent}(z_c); \gamma) \tag{47} $$

The embedding is alternately optimized between the clustering objective and the parent–child objective. Optimization of the hyperbolic parameters is done via the method of Nickel & Kiela (2017). Using this method, Monath et al. (2019) are able to embed ImageNet using representations taken from the last layer of a pre-trained Inception neural network.

Similar to Monath et al. (2019), Chami et al. (2020a) base their method on Dasgupta's cost (Eq. 43) and Wang and Wang's reformulation in terms of LCAs (Eq. 44). Chami et al. (2020a) define the LCA of two points in hyperbolic space to be the point on the geodesic connecting the two points that is closest to the hyperbolic origin, and provide a formula to calculate this point in the Poincaré disk $\mathbb{D}$. This formula allows Eq. 44 to be directly optimized by replacing the $w_{ijk}(T; w)$ terms with their continuous counterparts. A hierarchical clustering tree can then be produced by iteratively merging the most similar pairs, where similarity is measured by their hyperbolic LCA distance from the origin. Unlike the method of Monath et al. (2019), Chami et al. (2020a) do not require hyperbolic embeddings to be available, and optimize the hyperbolic embeddings of the whole tree, not just the leaves.

Recently, Long & van Noord (2023) proposed scalable Hyperbolic Hierarchical Clustering (sHHC), which enables learning of a continuous hierarchy and also scales to large datasets. They use clustering to extract hierarchical pseudo-labels from sound and vision and perform a downstream cross-modal self-supervised task, achieving competitive performance. They augment the hyperbolic clustering of Chami et al. (2020a) with a pre-clustering of the data point features into evenly-sized clusters, performed using the Sinkhorn fixed-point iteration method of Asano et al. (2019). Hyperbolic clustering is then performed using the method of Chami et al. (2020a). Finally, the clustering is self-supervised using the method of Long et al. (2020).

Lin et al. (2023a) propose a neural-network based framework for the hierarchical clustering of multi-view data. The framework consists of two steps: first, improving representation quality via reconstruction loss, contrastive learning between different views, and a weighted triplet loss between positive examples and mined hard negative examples, and second, applying the hyperbolic hierarchical clustering framework of Chami et al. (2020a).

The contrastive loss in Lin et al. (2023a) is the usual contrastive loss (see following section) where positive examples are views from the same object and negative examples are views from different objects. The weighted triplet loss is

$$ \mathcal{L}_m = \frac{1}{N} \sum_{i=1}^{N} w_m(a_i, p_i)\big[m + \|a_i - p_i\|_2^2 - \|a_i - n_i\|_2^2\big]_+ \tag{48} $$

where $a_i$ are the anchor points, $p_i$ are the positive examples, and $n_i$ are the negative examples. Positive and negative examples are mined based on the method of Iscen et al. (2017), which measures the similarity of a pair of points based on estimating the data manifold using k-nearest neighbors graphs. Lin et al. (2023b) apply their method to multi-view clustering on a variety of multi-view image datasets.

4.3 Self-Supervised Learning

In Sect. 4.3.1, we describe methods for hyperbolic self-supervision that are primarily based on triplet losses, and in Sect. 4.3.2 we discuss methods for hyperbolic self-supervision which are primarily based on contrastive losses.

4.3.1 Hyperbolic Self-Supervision

Based on the idea that biomedical images are inherently hierarchical, Hsu et al. (2021) propose to learn patch-level representations of 3D biomedical images using a 3D hyperbolic VAE and to perform 3D unsupervised segmentation by clustering the representations. Hsu et al. (2021) extend the hyperbolic VAE architecture of Mathieu et al. (2019) using a 3D convolutional encoder and decoder as well as a gyroplane convolutional layer that generalizes the Euclidean convolution with the gyroplane layer of Ganea et al. (2018b) (see Eqs. 30 and 31). In order to learn good representations, the paper proposes to use a hierarchical self-supervised loss that


captures the implicit hierarchical structure of 3D biomedical images.

To capture the hierarchical structure of 3D biomedical images, Hsu et al. (2021) propose, given a parent patch $\mu_p$, to sample a child patch $\mu_c$ which is a subpatch of the parent patch, and a negative patch $\mu_n$ that does not overlap with the parent patch. The hierarchical self-supervised loss is then defined as a margin triplet loss:

$$ \mathcal{L}_{\mathrm{hierarchical}} = \max\big(0, d_{\mathbb{D}}(\mu_p, \mu_c) - d_{\mathbb{D}}(\mu_p, \mu_n) + \gamma\big) \tag{49} $$

This encourages the representations of subpatches to be children or descendants of the representation of the main patch, and faraway patches (which likely contain different structures) to be on other branches of the learned hierarchical representation.

To perform unsupervised segmentation, the learned latent representations are extracted and clustered using a hyperbolic k-means algorithm, where the traditional Euclidean mean is replaced with the Fréchet mean. For a manifold $\mathcal{M}$ with metric $d_{\mathcal{M}}$, the Fréchet mean of a set of points $\{z_i\}_{i=1}^{k}$, $z_i \in \mathcal{M}$, is defined as the point $\mu$ that minimizes the squared distance to all points $z_i$:

$$ \mu_{\mathrm{Fr}} = \arg\min_{\mu \in \mathcal{M}} \frac{1}{k} \sum_{i=1}^{k} d_{\mathcal{M}}(z_i, \mu)^2 \tag{50} $$

and is one way to generalize the concept of a mean to manifolds. Unfortunately, the Fréchet mean on the Poincaré ball does not admit a closed-form solution, so Hsu et al. (2021) compute the Fréchet mean with the iterative algorithm of Lou et al. (2020). The paper finds that this strategy is effective for the unsupervised segmentation of both synthetic biological data and 3D brain tumor MRI scans (Menze et al., 2014; Bakas et al., 2017, 2018).

Weng et al. (2021) propose to leverage the hierarchical structure of objects within images to perform weakly-supervised long-tail instance segmentation. To capture this hierarchical structure, Weng et al. (2021) learn hyperbolic representations which are supervised with several hyperbolic self-supervised losses. Instance segmentation is done in three stages: first, mask proposals are generated using a pre-trained mask proposal network. Mask proposals consist of bounding boxes $\{B_i\}_{i=1}^{k}$ and masks $\{M_i\}_{i=1}^{k}$. Define $x_i^{\mathrm{full}}$ to be the original image cropped to bounding box $B_i$, $x_i^{\mathrm{bg}}$ to be the cropped image with the object masked out using mask $1 - M_i$, and $x_i^{\mathrm{fg}}$ to be the same cropped image with the background masked out using mask $M_i$. We will refer to these as the full object image, object background, and object, respectively.

Second, hyperbolic representations $z_i^{\mathrm{bg}} = g(x_i^{\mathrm{bg}})$ and $z_i^{\mathrm{fg}} = g(x_i^{\mathrm{fg}})$ (with $z_i^{\mathrm{full}}$ defined analogously) are learned by a pre-trained feature extractor and supervised by a combination of three self-supervised losses. The representations are fixed to have latent dimension 2. The first self-supervised loss encourages the representation of the object to be similar to that of the full object image and farther away from the representation of the object background:

$$ \mathcal{L}_{\mathrm{mask}} = \sum_{i=1}^{k} \max\big(0, \gamma - d(z_i^{\mathrm{full}}, z_i^{\mathrm{fg}}) + d(z_i^{\mathrm{full}}, z_i^{\mathrm{bg}})\big) \tag{51} $$

The second loss is a triplet loss that requires the sampling of positive and negative examples.

$$ \mathcal{L}_{\mathrm{object}} = \sum_{i=1}^{k} \max\big(0, \gamma - d(z_i^{\mathrm{fg}}, \hat{z}^{\mathrm{fg}}) + d(z_i^{\mathrm{fg}}, \bar{z}_i^{\mathrm{fg}})\big) \tag{52} $$

where $\hat{z}^{\mathrm{fg}}$ and $\bar{z}_i^{\mathrm{fg}}$ are the features of the positive and negative samples.

The third loss is similar to the hierarchical triplet loss of Hsu et al. (2021) described above, except with the origin taking the place of negative samples:

$$ \mathcal{L}_{\mathrm{hierarchical}} = \sum_{i=1}^{k} \max\big(0, \gamma - d(z_i^{\mathrm{child}}, o) - d(z_i^{\mathrm{fg}}, o)\big) \tag{53} $$

where $o$ represents the origin of the Poincaré ball, and $z_i^{\mathrm{child}}$ is the feature of the child mask of proposal $i$.

Finally, the representations are clustered using hyperbolic k-means clustering. Unlike Hsu et al. (2021), to compute the mean they map the representations from the Poincaré disk to the hyperboloid model $\mathbb{L}$ and compute the (weighted) hyperboloid midpoint proposed by Law et al. (2019):

$$ \mu = \sqrt{\beta}\, \frac{\sum_{i=1}^{k} \nu_i x_i}{\Big|\, \big\|\sum_{i=1}^{k} \nu_i x_i\big\|_{\mathbb{L}} \Big|} \tag{54} $$

where $\beta$ is $-1/\text{curvature}$.

Compared to the Fréchet mean, this mean has the advantage of having a closed-form formula, making it more computationally efficient. Weng et al. (2021) find that their method improves over other partially-supervised methods on the LVIS long-tail segmentation dataset (Gupta et al., 2019).

4.3.2 Hyperbolic Contrastive Learning

Hyperbolic contrastive learning methods have also been proposed. Surís et al. (2021) propose to learn hyperbolic representations for video action prediction because of their ability to combine representing hierarchy and giving a measure of uncertainty (see Fig. 9). Surís et al. (2021) learn an action hierarchy where more abstract actions are near the origin of the Poincaré disk and more fine-grained actions are


near the edge. If the preceding video frames are ambiguous, this hierarchical representation makes it possible to predict a more general parent category of action (e.g., greeting) instead of having to predict more fine-grained child categories of action (e.g., handshake or high-five). The parent of two actions is computed as the hyperbolic mean of their hyperbolic representations, which Surís et al. (2021) compute as the midpoint of the geodesic connecting the two representations. Surís et al. (2021) propose a two-stage framework for video action prediction which consists first of contrastive pre-training of hyperbolic representations, then freezing the representations and training a linear classifier for action prediction.

Fig. 9 Surís et al. (2021) model uncertainty with hyperbolic representations. If the model is uncertain, it can predict an abstraction of all possible actions (red square), and if it is certain it can predict a more specific action (blue square). The pink circle shows how computing the mean of two representations (pink squares) increases the generality. Figure reproduced with permission of Surís et al. (2021)

Self-supervised pre-training proceeds as follows: let $x_t$ be a frame of the video, and a representation $z_t = f(x_t)$ is produced by an encoder $f$. The pretext task is to predict the representation $z_{t+\delta}$ of a clip $\delta$ frames into the future. The model produces an estimate $\hat{z}_{t+\delta} = \phi(c_t, \delta)$, where $c_t = g(z_1, \ldots, z_t)$ is an encoding of all past video frames. All functions $f, g, \phi$ are parameterized by neural networks. The training is supervised by a contrastive loss:

$$ \mathcal{L} = -\sum_{i} \log \frac{\exp(-d_{\mathbb{D}}^2(\hat{z}_i, z_i))}{\sum_{j} \exp(-d_{\mathbb{D}}^2(\hat{z}_i, z_j))} \tag{55} $$

which encourages the positive pairs $\hat{z}_i, z_i$ to have similar representations while pushing $\hat{z}_i$ away from the representations of all negative examples $z_j$. One key feature of this loss is that under the presence of uncertainty, say when actions $a, b$ are probable, $\mathcal{L}$ is minimized by predicting the midpoint on the geodesic connecting $a, b$, which is equivalent to moving one level up the hierarchy to the parent of $a, b$.

Fig. 10 The learned hierarchy of Ge et al. (2023) has objects near the origin of the Poincaré disk and scenes near the edge of hyperbolic space. Image courtesy of Ge et al. (2023)

Ge et al. (2023) propose to improve contrastive learning by incorporating the hierarchical structure of images with a scene-object hierarchy (see Fig. 10). Ge et al. (2023) use a hyperbolic version of the MoCo architecture (He et al., 2020), which the authors call HCL. Ge et al. (2023) extend the MoCo architecture in several ways: first, unlike previous works for visual contrastive learning, HCL requires that object regions be extracted from the input image. Secondly, a hyperbolic backbone along with a corresponding momentum encoder is added to MoCo's Euclidean backbone and its momentum encoder. The Euclidean backbone and momentum encoder are trained the same way as in He et al. (2020), but the inputs are not images but the extracted object regions. The hyperbolic branch takes as input a scene region $u$ and an object region $v$ that is a subregion of the scene $u$, and negative objects $N_u = \{n_1, \ldots, n_k\}$ that are not subregions of the scene $u$. Let the representations of $u, v, n_j$ be $z_u, z_v, z_j$, respectively. The hyperbolic branch is then trained with a contrastive loss with hyperbolic distance as the similarity measure:

$$ \mathcal{L}_{\mathrm{hyp}} = -\log \frac{\exp\big(-\frac{d_{\mathbb{D}}(z_u, z_v)}{\tau}\big)}{\exp\big(-\frac{d_{\mathbb{D}}(z_u, z_v)}{\tau}\big) + \sum_{j} \exp\big(-\frac{d_{\mathbb{D}}(z_u, z_j)}{\tau}\big)} \tag{56} $$

where $\tau$ is a temperature parameter. This loss encourages representations to form a scene-object hierarchy where scenes have the highest norm (i.e., are at the edge of the Poincaré ball $\mathbb{D}$) and objects have the smallest norm (i.e., are at the center of $\mathbb{D}$). The authors find that their method achieves small gains over the original MoCo and MoCo augmented with bounding box information. They also examine the representations of out-of-context objects using their method, and find that they generally have higher distance to the scene images.

Yue et al. (2023) propose a different method for hyperbolic contrastive learning that is based on SimCLR (Chen et al., 2020c). Like Ge et al. (2023), Yue et al. (2023) replace the dot-product similarity of the contrastive loss with the hyperbolic distance:

$$ \mathcal{L}^{\mathrm{self}}_{\mathrm{hyp}} = -\sum_{i \in I} \log \frac{\exp(-d_{\mathbb{D}}(z_i, z_{j(i)})/\tau)}{\sum_{a \in A(i)} \exp(-d_{\mathbb{D}}(z_i, z_a)/\tau)} \tag{57} $$

but unlike Ge et al. (2023), they only have a hyperbolic branch and do not retain a Euclidean branch. Yue et al. (2023) also propose to extend the supervised contrastive learning method SupCon (Khosla et al., 2020) in the same way. Yue et al. (2023) also propose to train an adversarially robust contrastive learner that extends the Robust Contrastive Learning (RoCL) (Kim et al., 2020) method to hyperbolic space by replacing the Euclidean contrastive losses in RoCL's adversarial training loss with their hyperbolic contrastive loss:

$$ \mathcal{L}^{\mathrm{self}}_{\mathrm{hyp}}\big(\tilde{x}, \{\tilde{x}^{+}, \tilde{x}^{\mathrm{adv}}\}, \{\tilde{x}^{-}\}\big) + \lambda\, \mathcal{L}^{\mathrm{self}}_{\mathrm{hyp}}\big(\tilde{x}^{\mathrm{adv}}, \tilde{x}^{+}, \{\tilde{x}^{-}\}\big) \tag{58} $$

where $\tilde{x}$ is a given image, $\tilde{x}^{+}$ is a positive example, $\tilde{x}^{-}$ is a negative example, and $\tilde{x}^{\mathrm{adv}}$ is an adversarial example that is within $\delta$ of $\tilde{x}$. As in Ge et al. (2023), Yan et al. (2021) find that hyperbolic contrastive learning generally achieves small gains over its Euclidean counterparts.

Doan et al. (2023) tackle the Open World Object Detection (OWOD) task by leveraging the object unknownness level with respect to the context. To this end, Doan et al. (2023) propose Hyp-OW, consisting of three main parts: hyperbolic contrastive learning, to learn a hierarchical class representation; a super class regularizer, to push semantically similar classes close; and adaptive relabeling, to detect unknown objects using hyperbolic distance based relabeling. The hyperbolic contrastive loss is the usual contrastive loss with temperature (e.g., Eq. 57) performed on hyperbolic features, which are extracted from a Euclidean feature extractor and embedded into hyperbolic space using the exponential map. Positive and negative examples are drawn from both the batch $B$ as well as a buffer $M$. In super class regularization, a category $p$ consisting of classes $S_p = \{c_1, \ldots, c_n\}$ is embedded as the hyperbolic average (using the hyperbolic average of Khrulkov et al. (2020)) of the hyperbolic embeddings of its constituent classes. Category embeddings are then supervised by the same contrastive loss at the category level. Finally, in adaptive relabeling, the maximum distance $\delta_B$ from each matched example (that is, one that has a ground-truth label, denoted $z_m$) to every class centroid $z_c$ is computed (where the class centroid is computed by the hyperbolic average above):

$$ \delta_B = \max_{m \in B,\, c \in K} d_{\mathbb{D}}(z_m, z_c) \tag{59} $$

Unlabelled examples $z_u$ are then labelled if they satisfy the condition

$$ \min_{c \in K} d_{\mathbb{D}}(z_u, z_c) \le \delta_B \tag{60} $$

which essentially says that if an unlabelled example is "as certain" as some labelled example, it should be labelled.

Durrant & Leontidis (2023) also propose a hyperbolic self-supervised approach, using ideal prototypes to extend masked Siamese networks (Assran et al., 2022) to hyperbolic space. To do this, the dot product similarity used by Assran et al. (2022) is replaced with distance on the Poincaré ball. Similarities with prototypes are replaced with the Busemann function on the Poincaré ball. Finally, a hyperbolic projection head is used in place of a Euclidean projection head, using the hyperbolic linear layers of Shimizu et al. (2021).

5 Conclusions and Future Outlook

This survey provides an overview of the current state of affairs in hyperbolic deep learning for computer vision. Based on the organization of supervised and unsupervised literature, we conclude the survey by discussing which types of problems currently benefit most from hyperbolic learning and discussing open problems for future research.

5.1 When is Hyperbolic Learning Most Effective?

From current works, we identify four main axes of improvement that have come with the recent shift towards learning in hyperbolic space for computer vision:

• Hierarchical learning. The inherent links between hierarchical data and hyperbolic embeddings are well known. It is therefore not all too surprising to see that a wide range of works have used hyperbolic learning to improve


hierarchical objectives in computer vision. The ability to incorporate hierarchical knowledge, for example through hyperbolic embeddings or hierarchical hyperbolic logistic regression, has been utilized for several problems. Hierarchical learning in hyperbolic space can, among other things, reduce error severity, resulting in smaller mistakes and more consistent retrieval, see e.g., Long et al. (2020), Dhall et al. (2020) and Yu et al. (2022b). This is a key property for example in medical domains, where large mistakes need to be avoided at all costs.
  Hierarchical learning has also been shown to have applications in zero-shot learning. By embedding class hierarchies in hyperbolic space and mapping examples of seen classes to their corresponding embedding, it becomes possible to generalize to examples of unseen classes (Liu et al., 2020). In general, hierarchical information between classes helps to structure the semantics of the task at hand, and embedding such knowledge in hyperbolic space is preferred over Euclidean space.
• Few-sample learning. Few-shot learning is popular in hyperbolic deep learning for computer vision. Many works have shown that consistent improvements can be made by performing this task with hyperbolic embeddings and prototypes, both with [e.g., (Zhang et al., 2022)] and without [e.g., (Khrulkov et al., 2020)] hierarchical knowledge. In few-shot learning, samples are scarce when it comes to generalization, and working in hyperbolic space consistently improves accuracy. These results indicate that hyperbolic space can generalize from fewer examples, with potential in domains where examples are scarce. This is already visible in the unsupervised domain, where generative learning is better in hyperbolic space when working with constrained data sources.
• Robust learning. Across several axes, hyperbolic learning has been shown to be more robust. For example, hyperbolic embeddings improve out-of-distribution detection, provide a natural way to quantify uncertainty about samples [see e.g., (Ghadimi Atigh et al., 2022)], pinpoint unsupervised out-of-context samples [see e.g., (Ge et al., 2023)], and can improve robustness to adversarial attacks [see e.g., (Guo et al., 2022)]. Robustness and uncertainty are key challenges in deep learning in general, and hyperbolic deep learning can provide a natural solution to robustify networks.
• Low-dimensional learning. For many applications, networks and embedding spaces need to be constrained, for example when learning on embedded devices or when visualizing data. In the unsupervised domain, hyperbolic learning consistently improves over Euclidean learning when working with smaller embedding spaces [see e.g., (Nagano et al., 2019)]. Similarly, the embedding space in supervised problems can be substantially reduced while maintaining downstream performance in hyperbolic space [see e.g., (Ghadimi Atigh et al., 2021)]. As such, hyperbolic learning has the potential to enable learning in compressed and embedded domains.

5.2 Open Research Questions

Hyperbolic learning has made an impact on computer vision with many promising avenues ahead. The field is however still in the early stages with many challenges and opportunities ahead. Several directions stand out:

• Fully hyperbolic learning. Hyperbolic learning papers in computer vision commonly share one perspective: hyperbolic learning should be done in the embedding space. For the most part, the representation learning of earlier layers is done in Euclidean space, resulting in hybrid networks. Works from neuroscience indicate that for the earlier layers in neural networks, hyperbolic space can also play a prominent role (Chossat, 2020). Recently, Zhang et al. (2023) have shown that spatial relations in the hippocampus are more hyperbolic than Euclidean.
  Learning deep networks fully in hyperbolic space requires rethinking all layers, from convolutions to self-attention and normalization. At the time of writing the survey, two works have made steps in this direction. Bdeir et al. (2023) introduce a hyperbolic convolutional network in the Lorentz model of hyperbolic space. They outline how to perform convolutions, batch normalization, and residual connections. Simultaneously, van Spengler et al. (2023a) introduce Poincaré ResNet, with convolutions, residuals, batch normalization, and better network initialization in the Poincaré ball model. The works provide a foundation towards fully hyperbolic learning, but many open questions remain. Which model is most suitable for fully hyperbolic learning? Or do different layers work best in different models? And how can fully hyperbolic learning scale to ImageNet and beyond? Should each stage of the network have the same curvature? And how effective can hyperbolic networks become across all possible tasks compared to Euclidean networks? A lot more research is needed to answer these questions.
• Computational challenges. Performing gradient-based learning in hyperbolic space changes how networks are optimized and how parameters behave. Compared to their Euclidean counterparts, however, hyperbolic networks and embeddings can be numerically more unstable, with issues at the boundary of the ball (Moreira et al., 2023), vanishing gradients, and more. Moreover, hyperbolic operations can be more involved and computationally heavy depending on the model used, leading to less efficient networks. Such computational challenges are relevant for all domains of hyperbolic learning and are a broader topic that is receiving attention.


• Open source community. Modern deep learning libraries are centered around Euclidean geometry. Any new researcher in hyperbolic learning, therefore, does not have the opportunity to quickly implement networks and layers to get an intuition into their workings. Moreover, any new advances have to be either implemented from scratch or imported from code repositories of other papers. What is missing is an open-source community and a shared repository that houses advances in hyperbolic learning for computer vision. Such a community and code base are vital to get further traction and attract a wide audience, including practitioners. Whether it be part of existing libraries or as a separate library, continued development of open-source hyperbolic learning code is key for the future of the field. In recent years, several libraries have initiated learning and optimizing in hyperbolic space, including geoopt (Kochurov et al., 2020), geomstats (Miolane et al., 2020), manifolds.jl (Axen et al., 2021), and HypLL (van Spengler et al., 2023b). These libraries will form a great basis towards the development of hyperbolic learning.
• Large and multimodal learning. In computer vision, and Artificial Intelligence in general, there is a strong trend towards learning at large scale and learning with multiple modalities, e.g., image-text or video-audio models. It is therefore a natural desire for the field to arrive at hyperbolic foundation models. While early work has shown that large-scale and/or multimodal learning is viable with hyperbolic embeddings (Desai et al., 2023), hyperbolic foundation models form a longer-term commitment as they require solutions to all open problems mentioned above, from stable, fully hyperbolic learning to continued open source development.
• Multiple hyperbolic models. Unique to hyperbolic geometry is the existence of multiple models to perform numerical operations. Multiple papers have shown that different operations are preferred in different models. For example, for computing the mean of a distribution, the Klein model (Dai et al., 2021) is preferred over the Poincaré ball model as this avoids having to compute the expensive Fréchet mean (van Spengler et al., 2023a; Khrulkov et al., 2020). For representation layers, multiple papers advocate for the Lorentz model over the Poincaré ball model as it is faster and more robust (Chen et al., 2021; Dai et al., 2021). Recently, Mishne et al. (2023) have also investigated the limitations of the Poincaré ball model and the Lorentz model. As shown in Mishne et al. (2023), while the Poincaré ball model has a larger capacity to accurately represent points, the Lorentz model is stronger from an optimization and training perspective. It remains an open question which model is most suitable overall and whether one model suits all or we should employ different hyperbolic models for different operations.

Acknowledgements MKR acknowledges funding from ScaDS.AI Dresden/Leipzig Center for Scalable Data Analytics and Artificial Intelligence.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Ahmad, O., & Lecue, F. (2022). Fisheyehdk: Hyperbolic deformable kernel learning for ultra-wide field-of-view image recognition. In AAAI conference on artificial intelligence.
Amin, F., Mondal, A., & Mathew, J. (2022). Deep semantic hashing with structure-semantic disagreement correction via hyperbolic metric learning. In International workshop on multimedia signal processing.
Anvekar, T., & Bazazian, D. (2023). Gpr-net: Geometric prototypical network for point cloud few-shot learning. arXiv.
Araño, K. A., Orsenigo, C., Soto, M., & Vercellis, C. (2021). Multimodal sentiment and emotion recognition in hyperbolic space. Expert Systems with Applications.
Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein generative adversarial networks. In International conference on machine learning.
Asano, Y. M., Rupprecht, C., & Vedaldi, A. (2019). Self-labelling via simultaneous clustering and representation learning. arXiv preprint arXiv:1911.05371.
Assran, M., Caron, M., Misra, I., Bojanowski, P., Bordes, F., Vincent, P., Joulin, A., Rabbat, M., & Ballas, N. (2022). Masked siamese networks for label-efficient learning. In European conference on computer vision. Berlin: Springer (pp. 456–473).
Axen, S. D., Baran, M., Bergmann, R., & Rzecki, K. (2021). Manifolds.JL: An extensible Julia framework for data analysis on manifolds. arXiv preprint arXiv:2106.08777.
Bachmann, G., Bécigneul, G., & Ganea, O. (2020). Constant curvature graph convolutional networks. In International conference on machine learning.
Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J. S., Freymann, J. B., Farahani, K., & Davatzikos, C. (2017). Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific data.
Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler, M., Crimi, A., Shinohara, R. T., Berger, C., Ha, S. M., Rozycki, M., et al. (2018). Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge. arXiv.
Balazevic, I., Allen, C., & Hospedales, T. (2019). Multi-relational poincaré graph embeddings. In Advances in neural information processing systems.
Bdeir, A., Schwethelm, K., & Landwehr, N. (2023). Hyperbolic geometry in computer vision: A novel framework for convolutional neural
tions. networks. arXiv.


Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., et al. (2021). On the opportunities and risks of foundation models. arXiv.
Bose, J., Smofsky, A., Liao, R., Panangaden, P., & Hamilton, W. (2020). Latent variable modelling with hyperbolic normalizing flows. In International conference on machine learning.
Bridson, M. R., & Haefliger, A. (2013). Metric Spaces of Non-positive Curvature (Vol. 319). Berlin: Springer.
Cannon, J. W., Floyd, W. J., Kenyon, R., Parry, W. R., et al. (1997). Hyperbolic geometry. Flavors of Geometry, 31, 59–115.
Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., & Joulin, A. (2021). Emerging properties in self-supervised vision transformers. In International conference on computer vision.
Cetin, E., Chamberlain, B., Bronstein, M., & Hunt, J. J. (2022). Hyperbolic deep reinforcement learning. arXiv preprint arXiv:2210.01542.
Chamberlain, B. P., Hardwick, S. R., Wardrope, D. R., Dzogang, F., Daolio, F., & Vargas, S. (2019). Scalable hyperbolic recommender systems. arXiv preprint arXiv:1902.08648.
Chami, I., Ying, Z., Ré, C., & Leskovec, J. (2019). Hyperbolic graph convolutional neural networks. In Advances in neural information processing systems.
Chami, I., Gu, A., Chatziafratis, V., & Ré, C. (2020a). From trees to continuous embeddings and back: Hyperbolic hierarchical clustering. In Advances in neural information processing systems.
Chami, I., Wolf, A., Juan, D. C., Sala, F., Ravi, S., & Ré, C. (2020b). Low-dimensional hyperbolic knowledge graph embeddings. arXiv.
Chen, B., Peng, W., Cao, X., & Röning, J. (2022). Hyperbolic uncertainty aware semantic segmentation. Transactions on Intelligent Transportation Systems.
Chen, G., Qiao, L., Shi, Y., Peng, P., Li, J., Huang, T., Pu, S., & Tian, Y. (2020a). Learning open set network with discriminative reciprocal points. In European conference on computer vision.
Chen, J., Qin, J., Shen, Y., Liu, L., Zhu, F., & Shao, L. (2020b). Learning attentive and hierarchical representations for 3d shape recognition. In European conference on computer vision.
Chen, J., Jin, Z., Wang, Q., & Meng, H. (2023). Self-supervised 3D behavior representation learning based on homotopic hyperbolic embedding. IEEE Transactions on Image Processing, 32, 6061–6074.
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020c). A simple framework for contrastive learning of visual representations. In International conference on machine learning.
Chen, W., Han, X., Lin, Y., Zhao, H., Liu, Z., Li, P., Sun, M., & Zhou, J. (2021). Fully hyperbolic neural networks. arXiv.
Cho, H., DeMeo, B., Peng, J., & Berger, B. (2019). Large-margin classification in hyperbolic space. In International conference on artificial intelligence and statistics.
Cho, S., Lee, J., Park, J., & Kim, D. (2022). A rotated hyperbolic wrapped normal distribution for hierarchical representation learning. arXiv.
Chossat, P. (2020). The hyperbolic model for edge and texture detection in the primary visual cortex. The Journal of Mathematical Neuroscience.
Choudhary, N., & Reddy, C. K. (2022). Towards scalable hyperbolic neural networks using Taylor series approximations. arXiv.
Cui, Y., Yu, Z., Peng, W., & Liu, L. (2022). Rethinking few-shot class-incremental learning with open-set hypothesis in hyperbolic geometry. arXiv.
Dai, J., Wu, Y., Gao, Z., & Jia, Y. (2021). A hyperbolic-to-hyperbolic graph convolutional network. In Computer vision and pattern recognition.
Dai, S., Gan, Z., Cheng, Y., Tao, C., Carin, L., & Liu, J. (2020). APO-VAE: Text generation in hyperbolic space. arXiv preprint arXiv:2005.00054.
Dasgupta, S. (2016). A cost function for similarity-based hierarchical clustering. In ACM symposium on theory of computing.
Dengxiong, X., & Kong, Y. (2023). Ancestor search: Generalized open set recognition via hyperbolic side information learning. In Winter conference on applications of computer vision.
Desai, K., Nickel, M., Rajpurohit, T., Johnson, J., & Vedantam, R. (2023). Hyperbolic image-text representations. arXiv.
Dhall, A., Makarova, A., Ganea, O., Pavllo, D., Greeff, M., & Krause, A. (2020). Hierarchical image classification using entailment cone embeddings. In Computer vision and pattern recognition workshops.
Dhingra, B., Shallue, C. J., Norouzi, M., Dai, A. M., & Dahl, G. E. (2018). Embedding text in hyperbolic spaces. In Workshop on graph-based methods for natural language processing.
Dinh, L., Sohl-Dickstein, J., & Bengio, S. (2016). Density estimation using Real NVP. arXiv.
Doan, T., Li, X., Behpour, S., He, W., Gou, L., & Ren, L. (2023). HYP-OW: Exploiting hierarchical structure learning with hyperbolic distance enhances open world object detection. arXiv preprint arXiv:2306.14291.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In International conference on learning representations.
Durrant, A., & Leontidis, G. (2023). Hmsn: Hyperbolic self-supervised learning by clustering with ideal prototypes. arXiv preprint arXiv:2305.10926.
Ermolov, A., Mirvakhabova, L., Khrulkov, V., Sebe, N., & Oseledets, I. (2022). Hyperbolic vision transformers: Combining improvements in metric learning. In Computer vision and pattern recognition.
Fan, X., Yang, C. H., & Vemuri, B. C. (2022). Nested hyperbolic spaces for dimensionality reduction and hyperbolic NN design. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 356–365).
Fang, P., Harandi, M., & Petersson, L. (2021). Kernel methods in hyperbolic spaces. In International conference on computer vision.
Fang, P., Harandi, M., Lan, Z., & Petersson, L. (2023a). Poincaré kernels for hyperbolic representations. International Journal of Computer Vision, pp. 1–23.
Fang, P., Harandi, M., Le, T., & Phung, D. (2023b). Hyperbolic geometry in computer vision: A survey. arXiv preprint arXiv:2304.10764.
Franco, L., Mandica, P., Munjal, B., & Galasso, F. (2023). Hyperbolic self-paced learning for self-supervised skeleton-based action representations. In International conference on learning representations.
Ganea, O., Bécigneul, G., & Hofmann, T. (2018a). Hyperbolic entailment cones for learning hierarchical embeddings. In International conference on machine learning.
Ganea, O., Bécigneul, G., & Hofmann, T. (2018b). Hyperbolic neural networks. In Advances in neural information processing systems.
Gao, Z., Wu, Y., Jia, Y., & Harandi, M. (2021). Curvature generation in curved spaces for few-shot learning. In International conference on computer vision.
Gao, Z., Wu, Y., Jia, Y., & Harandi, M. (2022). Hyperbolic feature augmentation via distribution estimation and infinite sampling on manifolds. In Advances in neural information processing systems.
Gao, Z., Xu, C., Li, F., Jia, Y., Harandi, M., & Wu, Y. (2023). Exploring data geometry for continual learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 24325–24334).
Ge, S., Mishra, S., Kornblith, S., Li, C. L., & Jacobs, D. (2023). Hyperbolic contrastive learning for visual representations beyond objects. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6840–6849).
Ghadimi Atigh, M., Keller-Ressel, M., & Mettes, P. (2021). Hyperbolic Busemann learning with ideal prototypes. In Advances in neural information processing systems.
Ghadimi Atigh, M., Schoep, J., Acar, E., van Noord, N., & Mettes, P. (2022). Hyperbolic image segmentation. In Computer vision and pattern recognition.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM.
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G., Cai, J., et al. (2018). Recent advances in convolutional neural networks. Pattern Recognition.
Gulcehre, C., Denil, M., Malinowski, M., Razavi, A., Pascanu, R., Hermann, K. M., Battaglia, P., Bapst, V., Raposo, D., Santoro, A., et al. (2019). Hyperbolic attention networks. In International conference on learning representations.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. C. (2017). Improved training of Wasserstein GANs. In Advances in neural information processing systems.
Gulshad, S., Long, T., & van Noord, N. (2023). Hierarchical explanations for video action recognition. arXiv.
Guo, H., Tang, J., Zeng, W., Zhao, X., & Liu, L. (2021). Multi-modal entity alignment in hyperbolic space. Neurocomputing.
Guo, Y., Wang, X., Chen, Y., & Yu, S. X. (2022). Clipped hyperbolic classifiers are super-hyperbolic classifiers. In Computer vision and pattern recognition.
Gupta, A., Dollar, P., & Girshick, R. (2019). LVIS: A dataset for large vocabulary instance segmentation. In Computer vision and pattern recognition.
Hamann, M. (2018). On the tree-likeness of hyperbolic spaces. In Mathematical proceedings of the Cambridge philosophical society, Cambridge: Cambridge University Press (pp. 345–361).
Hamzaoui, M., Chapel, L., Pham, M. T., & Lefèvre, S. (2023). Hyperbolic prototypical network for few shot remote sensing scene classification. Pattern Recognition Letters.
Han, S., Cai, R., Cui, Y., Yu, Z., Hu, Y., & Kot, A. (2023). Hyperbolic face anti-spoofing. arXiv preprint arXiv:2308.09107.
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In Computer vision and pattern recognition.
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in neural information processing systems.
Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., & Lerchner, A. (2017). beta-VAE: Learning basic visual concepts with a constrained variational framework. In International conference on learning representations.
Hong, J., Fang, P., Li, W., Han, J., Petersson, L., & Harandi, M. (2023a). Curved geometric networks for visual anomaly recognition. IEEE Transactions on Neural Networks and Learning Systems.
Hong, J., Hayder, Z., Han, J., Fang, P., Harandi, M., & Petersson, L. (2023b). Hyperbolic audio-visual zero-shot learning. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 7873–7883).
Hsu, J., Gu, J., Wu, G., Chiu, W., & Yeung, S. (2021). Capturing implicit hierarchical structure in 3d biomedical images with self-supervised hyperbolic representations. In Advances in neural information processing systems.
Huang, W., Yu, Y., Xu, H., Su, Z., & Wu, Y. (2023). Hyperbolic music transformer for structured music generation. IEEE Access, 11, 26893–26905.
Iscen, A., Tolias, G., Avrithis, Y., Furon, T., & Chum, O. (2017). Efficient diffusion on region manifolds: Recovering small objects with compact CNN representations. In Computer vision and pattern recognition.
Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Computer vision and pattern recognition.
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). Analyzing and improving the image quality of StyleGAN. In Computer vision and pattern recognition.
Kasarla, T., Burghouts, G., van Spengler, M., van der Pol, E., Cucchiara, R., & Mettes, P. (2022). Maximum class separation as inductive bias in one matrix. In Advances in neural information processing systems.
Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2022). Transformers in vision: A survey. ACM Computing Surveys.
Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., & Krishnan, D. (2020). Supervised contrastive learning. In Advances in neural information processing systems.
Khrulkov, V., Mirvakhabova, L., Ustinova, E., Oseledets, I., & Lempitsky, V. (2020). Hyperbolic image embeddings. In Computer vision and pattern recognition.
Kim, M., Tack, J., & Hwang, S. J. (2020). Adversarial self-supervised contrastive learning. In Advances in neural information processing systems.
Kim, S., Jung, B., & Kwak, S. (2022). HIER: Metric learning beyond class labels via hierarchical regularization. arXiv.
Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv.
Klimovskaia, A., Lopez-Paz, D., Bottou, L., & Nickel, M. (2020). Poincaré maps for analyzing complex hierarchies in single-cell data. Nature Communications.
Kochurov, M., Karimov, R., & Kozlukov, S. (2020). Geoopt: Riemannian optimization in PyTorch. arXiv preprint arXiv:2005.02819.
Law, M., Liao, R., Snell, J., & Zemel, R. (2019). Lorentzian distance learning for hyperbolic representations. In International conference on machine learning (pp. 3672–3681).
Lazcano, D., Franco, N. F., & Creixell, W. (2021). HGAN: Hyperbolic generative adversarial network. IEEE Access.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature.
Leimeister, M., & Wilson, B. J. (2018). Skip-gram word embeddings in hyperbolic space. arXiv.
Leng, Z., Wu, S. C., Saleh, M., Montanaro, A., Yu, H., Wang, Y., Navab, N., Liang, X., & Tombari, F. (2023). Dynamic hyperbolic attention network for fine hand-object reconstruction. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 14894–14904).
Li, A., Yang, B., Hussain, F. K., & Huo, H. (2022). HSR: Hyperbolic social recommender. Information Sciences, 585, 275–288.
Li, H., Jiang, H., Ye, D., Wang, Q., Du, L., Zeng, Y., Wang, Y., Chen, C., et al. (2023a). Dhgat: Hyperbolic representation learning on dynamic graphs via attention networks. Neurocomputing, p. 127038.
Li, L., Zhang, Y., & Wang, S. (2023b). The Euclidean space is evil: Hyperbolic attribute editing for few-shot image generation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 22714–22724).
Li, Y. L., Wu, X., Liu, X., Dou, Y., Ji, Y., Zhang, J., Li, Y., Tan, J., Lu, X., & Lu, C. (2023c). From isolated islands to Pangea: Unifying semantic space for human action understanding. arXiv.
Lin, F., Bai, B., Guo, Y., Chen, H., Ren, Y., & Xu, Z. (2023a). MHCN: A hyperbolic neural network model for multi-view hierarchical clustering. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 16525–16535).
Lin, F., Yue, Y., Hou, S., Yu, X., Xu, Y., Yamada, K. D., & Zhang, Z. (2023b). Hyperbolic chamfer distance for point cloud completion. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 14595–14606).
Lin, Y. W. E., Coifman, R. R., Mishne, G., & Talmon, R. (2023c). Hyperbolic diffusion embedding and distance for hierarchical representation learning. arXiv preprint arXiv:2305.18962.
Liu, Q., Nickel, M., & Kiela, D. (2019). Hyperbolic graph neural networks. In Advances in neural information processing systems.
Liu, S., Chen, J., Pan, L., Ngo, C. W., Chua, T. S., & Jiang, Y. G. (2020). Hyperbolic visual embedding learning for zero-shot recognition. In Computer vision and pattern recognition.
Long, T., & van Noord, N. (2023). Cross-modal scalable hyperbolic hierarchical clustering. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 16655–16664).
Long, T., Mettes, P., Shen, H. T., & Snoek, C. G. M. (2020). Searching for actions on the hyperbole. In Computer vision and pattern recognition.
Lou, A., Katsman, I., Jiang, Q., Belongie, S., Lim, S. N., & De Sa, C. (2020). Differentiating through the Fréchet mean. In International conference on machine learning.
Ma, R., Fang, P., Drummond, T., & Harandi, M. (2022). Adaptive Poincaré point to set distance for few-shot classification. In AAAI conference on artificial intelligence.
Mathieu, E., Le Lan, C., Maddison, C. J., Tomioka, R., & Teh, Y. W. (2019). Continuous hierarchical representations with Poincaré variational auto-encoders. In Advances in neural information processing systems (Vol. 32).
Menze, B. H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al. (2014). The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging.
Mettes, P., Van der Pol, E., & Snoek, C. (2019). Hyperspherical prototype networks. In Advances in neural information processing systems.
Miolane, N., Guigui, N., Le Brigant, A., Mathe, J., Hou, B., Thanwerdas, Y., Heyder, S., Peltre, O., Koep, N., Zaatiti, H., et al. (2020). Geomstats: A Python package for Riemannian geometry in machine learning. The Journal of Machine Learning Research, 21(1), 9203–9211.
Mirvakhabova, L., Frolov, E., Khrulkov, V., Oseledets, I., & Tuzhilin, A. (2020). Performance of hyperbolic geometry models on top-n recommendation tasks. In ACM conference on recommender systems.
Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. arXiv.
Mishne, G., Wan, Z., Wang, Y., & Yang, S. (2023). The numerical stability of hyperbolic representation learning. In International conference on machine learning, PMLR (pp. 24925–24949).
Monath, N., Zaheer, M., Silva, D., McCallum, A., & Ahmed, A. (2019). Gradient-based hierarchical clustering using continuous representations of trees in hyperbolic space. In International conference on knowledge discovery & data mining.
Montanaro, A., Valsesia, D., & Magli, E. (2022). Rethinking the compositionality of point clouds through regularization in the hyperbolic space. arXiv.
Moreira, G., Marques, M., Costeira, J. P., & Hauptmann, A. (2023). Hyperbolic vs Euclidean embeddings in few-shot learning: Two sides of the same coin. arXiv preprint arXiv:2309.10013.
Nagano, Y., Yamaguchi, S., Fujita, Y., & Koyama, M. (2019). A wrapped normal distribution on hyperbolic space for gradient-based learning. In International conference on machine learning.
Nickel, M., & Kiela, D. (2017). Poincaré embeddings for learning hierarchical representations. In Advances in neural information processing systems (Vol. 30).
Nickel, M., & Kiela, D. (2018). Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In International conference on machine learning.
Noy, N. F., & Hafner, C. D. (1997). The state of the art in ontology design: A survey and comparative review. AI Magazine.
Onghena, P., Gigli, L., & Velasco-Forero, S. (2023). Rotation-invariant hierarchical segmentation on Poincaré ball for 3d point cloud. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 1765–1774).
Park, J. H., Choe, J., Bae, I., & Jeon, H. G. (2023). Learning affinity with hyperbolic representation for spatial propagation. In International conference on machine learning.
Peng, W., Varanka, T., Mostafa, A., Shi, H., & Zhao, G. (2021). Hyperbolic deep neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Qu, E., & Zou, D. (2022a). Autoencoding hyperbolic representation for adversarial generation. arXiv.
Qu, E., & Zou, D. (2022b). Hyperbolic neural networks for molecular generation. arXiv preprint arXiv:2201.12825.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. (2021). Learning transferable visual models from natural language supervision. In International conference on machine learning.
Ratcliffe, J. G. (1994). Foundations of hyperbolic manifolds (Vol. 149). Berlin: Springer.
Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. In International conference on machine learning.
Richardson, E., Alaluf, Y., Patashnik, O., Nitzan, Y., Azar, Y., Shapiro, S., & Cohen-Or, D. (2021). Encoding in style: A StyleGAN encoder for image-to-image translation. In Computer vision and pattern recognition.
Sala, F., De Sa, C., Gu, A., & Ré, C. (2018). Representation tradeoffs for hyperbolic embeddings. In International conference on machine learning.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training GANs. In Advances in neural information processing systems.
Sarkar, R. (2011). Low distortion Delaunay embedding of trees in hyperbolic plane. In International symposium on graph drawing (pp. 355–366). Berlin: Springer.
Schuler, K. K. (2005). VerbNet: A broad-coverage, comprehensive verb lexicon. Philadelphia: University of Pennsylvania.
Shimizu, R., Mukuta, Y., & Harada, T. (2021). Hyperbolic neural networks++. In International conference on learning representations.
Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. In Advances in neural information processing systems.
Sonthalia, R., & Gilbert, A. (2020). Tree! I am no tree! I am a low dimensional hyperbolic embedding. Advances in Neural Information Processing Systems, 33, 845–856.
van Spengler, M., Berkhout, E., & Mettes, P. (2023a). Poincaré resnet. arXiv.
van Spengler, M., Wirth, P., & Mettes, P. (2023b). Hypll: The hyperbolic learning library. arXiv preprint arXiv:2306.06154.
Sun, J., Cheng, Z., Zuberi, S., Pérez, F., & Volkovs, M. (2021). HGCF: Hyperbolic graph convolution networks for collaborative filtering. In Proceedings of the web conference, 2021 (pp. 593–601).
Sun, P., Zhang, R., Jiang, Y., Kong, T., Xu, C., Zhan, W., Tomizuka, M., Li, L., Yuan, Z., Wang, C., et al. (2021b). Sparse R-CNN: End-to-end object detection with learnable proposals. In Computer vision and pattern recognition.
Surís, D., Liu, R., & Vondrick, C. (2021). Learning the predictability of the future. In Computer vision and pattern recognition.
Tifrea, A., Bécigneul, G., & Ganea, O. E. (2019). Poincaré glove: Hyperbolic word embeddings. In International conference on learning representations.
Tong, J., Yang, F., Yang, S., Dong, E., Du, S., Wang, X., & Yi, X. (2022). Hyperbolic cosine transformer for lidar 3d object detection. arXiv.


Trpin, A., & Boshkoska, B. (2022). Face recognition with a hyperbolic metric classification model. In International convention on information, communication and electronic technology.
Ungar, A. A. (2005). Gyrovector spaces and their differential geometry. Nonlinear Functional Analysis and Applications, 10(5), 791–834.
Ungar, A. A. (2008). A gyrovector space approach to hyperbolic geometry. Synthesis Lectures on Mathematics and Statistics, 1(1), 1–194.
Ungar, A. A. (2012). Beyond the Einstein addition law and its gyroscopic Thomas precession: The theory of gyrogroups and gyrovector spaces (Vol. 117). Berlin: Springer.
Valada, A. (2022). On hyperbolic embeddings in object detection. In German conference on pattern recognition.
Verbeek, K., & Suri, S. (2014). Metric embedding, hyperbolic space, and social networks. In Proceedings of the thirtieth annual symposium on computational geometry (pp. 501–510).
Vinh, T. D. Q., Tay, Y., Zhang, S., Cong, G., & Li, X. L. (2018). Hyperbolic recommender systems. arXiv preprint arXiv:1809.01703.
Vinh Tran, L., Tay, Y., Zhang, S., Cong, G., & Li, X. (2020). Hyperml: A boosting metric learning approach in hyperbolic space for recommender systems. In Proceedings of the 13th international conference on web search and data mining (pp. 609–617).
Wang, D., & Wang, Y. (2018). An improved cost function for hierarchical cluster trees. arXiv.
Wang, L., Hu, F., Wu, S., & Wang, L. (2021). Fully hyperbolic graph convolution network for recommendation. In Proceedings of the 30th ACM international conference on information & knowledge management.
Wang, S., Kang, Q., She, R., Wang, W., Zhao, K., Song, Y., & Tay, W. P. (2023a). Hypliloc: Towards effective lidar pose regression with hyperbolic fusion. arXiv.
Wang, Y., Wang, H., Lu, W., & Yan, Y. (2023). Hygge: Hyperbolic graph attention network for reasoning over knowledge graphs. Information Sciences, 630, 190–205.
Weng, Z., Ogut, M. G., Limonchik, S., & Yeung, S. (2021). Unsupervised discovery of the long-tail in instance segmentation using hierarchical self-supervision. In Computer vision and pattern recognition (pp. 2603–2612).
Wu, Z., Jiang, D., Hsieh, C. Y., Chen, G., Liao, B., Cao, D., & Hou, T. (2021). Hyperbolic relational graph convolution networks plus: A simple but highly efficient QSAR-modeling method. Briefings in Bioinformatics.
Xu, Y., Mu, L., Ji, Z., Liu, X., & Han, J. (2022). Meta hyperbolic networks for zero-shot learning. Neurocomputing.
Yan, J., Luo, L., Deng, C., & Huang, H. (2021). Unsupervised hyperbolic metric learning. In Computer vision and pattern recognition.
Yan, J., Luo, L., Deng, C., & Huang, H. (2023). Adaptive hierarchical similarity metric learning with noisy labels. IEEE Transactions on Image Processing.
Yang, M., Zhou, M., Li, Z., Liu, J., Pan, L., Xiong, H., & King, I. (2022a). Hyperbolic graph neural networks: A review of methods and applications. arXiv preprint arXiv:2202.13852.
Yang, M., Zhou, M., Liu, J., Lian, D., & King, I. (2022b). HRCF: Enhancing collaborative filtering via hyperbolic geometric regularization. In Proceedings of the ACM web conference.
Yang, M., Zhou, M., Ying, R., Chen, Y., & King, I. (2023). Hyperbolic representation learning: Revisiting and advancing. arXiv preprint arXiv:2306.09118.
Yu, K., Visweswaran, S., & Batmanghelich, K. (2020). Semi-supervised hierarchical drug embedding in hyperbolic space. Journal of Chemical Information and Modeling.
Yu, Z., Nguyen, T., Gal, Y., Ju, L., Chandra, S. S., Zhang, L., Bonnington, P., Mar, V., Wang, Z., & Ge, Z. (2022b). Skin lesion recognition with class-hierarchy regularized hyperbolic embeddings. In International conference on medical image computing and computer-assisted intervention.
Yue, Y., Lin, F., Yamada, K. D., & Zhang, Z. (2023). Hyperbolic contrastive learning. arXiv.
Zhang, B., Jiang, H., Feng, S., Li, X., Ye, Y., & Ye, R. (2022). Hyperbolic knowledge transfer with class hierarchy for few-shot learning. In International joint conference on artificial intelligence.
Zhang, H., Rich, P. D., Lee, A. K., & Sharpee, T. O. (2023). Hippocampal spatial representations exhibit a hyperbolic geometry that expands with experience. Nature Neuroscience.
Zhang, Y., Luo, L., Xian, W., & Huang, H. (2021a). Learning better visual data similarities via new grouplet non-Euclidean embedding. In International conference on computer vision.
Zhang, Y., Wang, X., Shi, C., Jiang, X., & Ye, Y. (2021b). Hyperbolic graph attention network. IEEE Transactions on Big Data.
Zhang, Y., Wang, X., Shi, C., Liu, N., & Song, G. (2021c). Lorentzian graph convolutional networks. In Proceedings of the web conference.
Zhu, Y., Zhou, D., Xiao, J., Jiang, X., Chen, X., & Liu, Q. (2020). Hypertext: Endowing fasttext with hyperbolic geometry. In Empirical methods in natural language processing.

Publisher’s Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
