Ann. Henri Poincaré Online First
© 2025 Springer Nature Switzerland AG
https://doi.org/10.1007/s00023-025-01557-z
Annales Henri Poincaré

Order p Quantum Wasserstein Distances from Couplings
Emily Beatty and Daniel Stilck França

Abstract. Optimal transport provides a powerful mathematical framework
with applications spanning numerous fields. A cornerstone within
this domain is the p-Wasserstein distance, which serves to quantify the
cost of transporting one probability measure to another. While recent
attempts have sought to extend this measure to the realm of quantum
states, existing definitions often present certain limitations, such as not
being faithful. In this work, we present a new definition of quantum
Wasserstein distances. This definition, leveraging the coupling method
and a metric applicable to pure states, draws inspiration from a prop-
erty characterising the classical Wasserstein distance—its determination
based on its value on point masses. Subject to certain continuity proper-
ties, our definition exhibits numerous attributes expected of an optimal
quantum rendition of the Wasserstein distance. Notably, our approach
seamlessly integrates metrics familiar to quantum information theory,
such as the trace distance. Moreover, it provides an organic extension
for metrics, like Nielsen’s complexity metric, allowing their application to
mixed states with a natural operational interpretation. We analyse this
metric’s attributes in the context of random quantum states, unveil phase
transitions concerning the complexity of subsystems of random states and
use it to derive circuit lower bounds. In addition, we show how we can use
our definition to study hypercontractive inequalities for quantum channels
that do not admit a faithful fixed point, allowing us to derive concentra-
tion inequalities. Finally, we discuss how our distance is well suited to
define quantum Wasserstein generative adversarial networks.

Contents
1. Introduction
2. Basic Concepts and Notation
2.1. Classical Optimal Transport
2.2. Quantum Information Framework
E. Beatty and D. Stilck França Ann. Henri Poincaré

3. Motivations and Definitions

3.1. Other Approaches
4. General Properties
4.1. Dual Picture
5. Special Instances
5.1. W_1^H Distance on n-qudit Systems
5.2. Trace Distance
5.3. Complexity Geometry
6. Applications
6.1. Results for Random Quantum States
6.2. Operational Interpretation in Terms of Classical–Quantum
Sources
6.3. Hypercontractivity and Noise
6.4. Quantum Wasserstein Generative Adversarial Networks
7. Conclusion
Acknowledgements
Appendix A: Comparison to Other Proposed Definitions of a Quantum
Wasserstein Distance
Appendix B: Development of the Proposed Definition
Definition of a Transport Plan
7.1. Entangled Couplings and Transport Plans
7.2. Transport Plans Defined from Absolutely Continuous Measures
Appendix C: Towards the Triangle Inequality
Appendix D: Auxiliary Proofs
Proof of Proposition 17 on the Continuity of W_p^d
Proof of Corollary 33 on the Approximate Gate Complexity of Random
Pure States
References

1. Introduction
Optimal transport has established itself as a powerful tool in various areas
of science and pure mathematics, such as machine learning [30], information
theory [58], partial differential equations [65] and economics [31]. In light of
this, it should come as no surprise that the last years have seen a surge of
interest in the quantum generalisation of optimal transport [6,15,18,24,25,29,
32,33,50,52].
One of the central concepts of classical optimal transport is the set of
p-Wasserstein distances Wp , which is a family of distances on the set of prob-
ability measures on a metric space. Roughly speaking, if we imagine a prob-
ability measure as describing a distribution of mass, these distances measure
the cost of transporting one measure onto another in terms of a cost function
on the underlying metric space. They can recover many widely studied metrics
in probability spaces in a unified way. For instance, the total variation
distance corresponds to the Wasserstein distance obtained when we equip the
space with the discrete metric. In the classical setting [65], they admit many
different equivalent characterisations, which we can broadly categorise into a
geometrical formulation, a dual formulation and a coupling formulation.
There now exist many different proposals of quantum Wasserstein dis-
tances that take one of the approaches mentioned above [12,15,17,29,33,50,
52,54,56]. Important examples of definitions that follow the geometric route
are the definitions by Carlen and Maas [15,16], which have been extensively
studied in a series of works [17,25,56]. Roughly speaking, the geometric frame-
work gives a natural generalisation of W2 in terms of a Riemannian metric on
the set of full rank states such that the relative entropy with respect to a ref-
erence state arises as a gradient flow. However, it has the undesirable feature
of depending on an underlying semigroup converging to a reference state.
A good example of a distance that is based on a dual approach is the
definition of De Palma et al. [50]. Whereas the geometric approach is well
suited to generalise the case p = 2, this approach works well for p = 1, and
the authors were able to define a natural quantum generalisation of W1 with
respect to the Hamming distance on the hypercube. The dual approach works
by first defining a good generalisation of a so-called Lipschitz constant of ob-
servables, which in some sense measures how much the expectation value of
the observable can change between two points when normalised by the under-
lying metric. From this, defining the distance induced by such observables is
straightforward through duality. This generalisation has also found numerous
applications in quantum information and computation [24,38,49,51,54,57],
and has been extended to a distance between channels [21].
Finally, some works have followed the coupling approach [12,18,26,27,29,
52]. Recall that given two random variables, a coupling is a joint distribution
whose marginals are the two random variables. In the classical setting, the
Wasserstein distance can be defined as the infimum of the expected cost of
transporting one random variable onto the other, where the infimum is taken
over all couplings of the two random variables. The main advantage of the
coupling approach is that it can yield a definition of a quantum p-Wasserstein
distance for all values of p. Existing approaches that attempt to quantise the
coupling approach have significant downsides and usually do not satisfy one
or more of the key properties expected from the Wasserstein distance. For
example, the definition of [12] is not a semidistance, and the definition of [52]
is not faithful.
In this work, we propose a new definition of the quantum Wasserstein
distance based on the coupling approach. Our novel definition departs from
the observation that, classically, the Wasserstein distance is completely deter-
mined by its value on point masses. As pure states are quantum analogues of
point masses, we propose a coupling definition of the quantum p-Wasserstein
distance induced by a metric on pure states and then consider an optimisation
over all separable couplings of the underlying states.
We show that as long as the metric on pure states satisfies some simple
continuity properties, the induced quantum p-Wasserstein has many desirable
properties expected from a good quantisation of the Wasserstein distance. In
particular, we show that our definition recovers widely studied metrics on the
set of quantum states, such as the trace distance, and is closely related to De
Palma’s quantum Wasserstein distance of order 1. Furthermore, we show that
it offers a natural way of extending distances from pure to mixed states in
a way that preserves all the symmetries of the original metric and recovers
its value on pure states. We also discuss some interesting ways in which our
quantised distance behaves differently from its classical counterpart.
The main aim of this work is to cover a gap in the existing literature by
expanding the theory on p-Wasserstein distances beyond the established orders
p = 1 and p = 2, and to establish a general definition that is not restricted
by the structure or natural geometry of the space to which it is applied. The
approach given defines a quantum p-Wasserstein distance in very broad gen-
erality, allowing for much greater flexibility in application. This flexibility is
enhanced by its ability to adapt to any underlying geometry on the system’s
projective Hilbert space subject to the aforementioned simple continuity prop-
erties. It follows that these distances can be adapted to represent the transport
cost with respect to different qualitative properties of quantum states, as long
as those properties can be captured by a distance on the projective Hilbert
space.
As an application of our new definition, we give a generalisation of
Nielsen’s complexity geometry [46] to mixed states that has a natural opera-
tional interpretation of measuring the complexity of mapping one mixed state
to another. We then study the behaviour of this metric for various ensem-
bles of random quantum states and identify phase transitions in the expected
Wasserstein distance between them. More concretely, we show that if we look
at reduced density matrices of regions whose size is at most one-ninth of the
total size, then the complexity distance is exponentially small. In contrast,
for large enough subsystems, the distance is maximal. This result can be in-
terpreted as formalising the intuitive notion that small enough subsystems of
random states are essentially maximally mixed and trivial from a complex-
ity point of view, whereas large enough subsystems are highly entangled and
complex. This generalisation also allows us to obtain bounds which reflect the
exponential nature of complexity in the number n of qubits, in contrast with
existing methods from quantum Wasserstein distances which can only give
bounds which are linear in n.
We also study De Palma et al.’s W_1^H distance under random quantum
states, showing that they are essentially maximally far apart. Previous works
[57] hinted at the possibility that the W_1^H distance captures how well states can be
distinguished by local observables. However, as random states cannot be distinguished
by local observables, and as our results show that random states
can be distinguished by the W_1^H distance, we see that this original intuition
is false. This mirrors a result found concurrently in [49]. This shows that the
W_1^H distance, and in turn the set of Lipschitz observables defined from its
dual, have a significantly richer structure than previously expected.
Finally, we discuss three further applications of our distance: an operational
interpretation of our newly introduced distance in terms of classical-quantum
sources, analysis of the noise in a channel in terms of its hypercontractivity
under the quantum p-Wasserstein distance, and theoretical improvements
to the qualitative accuracy of quantum Wasserstein generative adversarial
networks (qWGANs) using p-Wasserstein distances. Our hypercontractive
inequalities have the advantage of not requiring the underlying quantum channels
to have a faithful fixed point, an issue with other approaches [7], while
still allowing for some of the typical applications of hypercontractivity, like
concentration inequalities.
Each of these is an application which is only possible thanks to the broad
flexibility of this new definition. We are able to discuss arbitrary pth moments
of the distance between the output of classical-quantum sources thanks to the
definition of p-Wasserstein distances for arbitrary p, something which has not
yet been possible.
Theoretical improvements to qWGANs come from the flexibility in terms
of the underlying distance on the projective Hilbert space. And discussing hy-
percontractive properties of quantum channels requires comparing p-
Wasserstein distances of two different orders p1 , p2 , which has been made pos-
sible by this new definition. Although existing definitions with enough basic
properties to be usable cover both p = 1 and p = 2, these are applicable in
wildly different contexts, and so direct comparison is not feasible without the
new definition presented in this work.

2. Basic Concepts and Notation

2.1. Classical Optimal Transport
In the classical setting, the domain of optimal transport introduced in [43] is
well developed and finds applications in a wide variety of areas. The focus is
to understand the cost of transporting one measure onto another and find the
minimal cost of such transportation. More formally, for measurable spaces X , Y
with probability measures μ, ν, respectively, we assign a cost c : X × Y → R≥0
which represents the cost c(x, y) of moving one unit of measure from x to y.
From here we optimise the total cost of transporting μ onto ν for each coupling
[36] γ of μ and ν.
Definition 1. Let μ, ν be probability measures on measurable spaces X , Y, respectively.
A coupling of μ and ν is a measure γ on X × Y such that for all
measurable sets A ⊆ X , we have γ(A × Y) = μ(A), and for all measurable sets
B ⊆ Y we have γ(X × B) = ν(B). Denote by C(μ, ν) the set of couplings of μ
with ν.
In other words, a coupling is a measure γ on X × Y with marginals μ and
ν. For a cost function c on X × Y, we can then define the transport cost of a
coupling γ as

\[ T(\gamma) = \int_{\mathcal{X} \times \mathcal{Y}} c(x, y) \, d\gamma(x, y). \tag{1} \]

The optimal transport cost of transporting μ onto ν is then given by minimising
(1) over all couplings γ. Throughout we will only consider the case where
X = Y, X is equipped with a metric d, and the cost function c is a power of
the metric d. This leads to the definition of a classical Wasserstein distance
[64, p. 207].
Definition 2. Let (X , d) be a measurable metric space with probability measures
μ, ν. Given p ≥ 1, the pth-order Wasserstein distance between μ and ν
is given by

\[ W_p(\mu, \nu) = \left( \inf_{\gamma \in C(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{X}} d(x, y)^p \, d\gamma(x, y) \right)^{1/p}. \tag{2} \]

This notion also extends to p = ∞ in the following way:

Definition 3. Let (X , d) be a measurable metric space with probability mea-
sures μ, ν. The infinite-order Wasserstein distance between μ and ν is given
by
\[ W_\infty(\mu, \nu) = \inf_{\gamma \in C(\mu, \nu)} \; \sup_{(x, y) \in \operatorname{supp}(\gamma)} d(x, y). \tag{3} \]

It is desirable to generalise these distances to quantities on quantum
states to mirror many of their classical applications. We are particularly interested
in generalisations of the case where the underlying metric d is the discrete
metric, for which Wp is the total variation distance, and the case where the
underlying metric is the Hamming distance on the hypercube, for which W1 is
Ornstein’s d̄-distance [48]. These are arguably the most widely used distances
in this context so far and a good generalisation should also recover them.
2.2. Quantum Information Framework
We represent a quantum system by a separable Hilbert space H over C. Pure
states on this system are modelled as elements |ψ⟩ of the projective space PH,
where PH = (H\{0})/C. We notate the Hermitian conjugate of such a pure
state by ⟨ψ|, and then in a slight abuse of notation we write |ψ⟩⟨ψ| for the
projection onto |ψ⟩, taking the convention Tr[|ψ⟩⟨ψ|] = 1. We also use the
term ‘pure state’ to refer to this projection in the set of operators on H, and
the distinction will be clear from the context. For most of this manuscript, and
in particular the examples and applications, we will discuss finite-dimensional
Hilbert spaces.
The set of mixed states on H is defined as the set of positive semidefinite
linear operators of trace 1, otherwise known as density operators, and we will
denote this set by D(H). These operators are all compact, and so can be
decomposed spectrally in the form

\[ \rho = \sum_j q_j |\psi_j\rangle\langle\psi_j| \tag{4} \]

where Σ_j q_j = 1 and q_j > 0. The term ‘state’ could refer to either an element
|ψ⟩ of PH or a trace one positive semidefinite linear operator ρ, and the
meaning will again be clear from the context.
The combination of two systems H1 and H2 is represented by their tensor
product H1 ⊗ H2 . Their projective space is then P(H1 ⊗ H2 ). Similarly, the
set of density operators on this bipartite system is D(H1 ⊗ H2 ).
The quantum equivalent of taking the marginal is the partial trace oper-
ation. Using this, we can define a quantum coupling of states ρ, σ.
Definition 4. Let ρ, σ be quantum states on Hilbert spaces H1 , H2 , respec-
tively. A quantum coupling of ρ with σ is a quantum state τ on H1 ⊗ H2 such
that
Tr2 τ = ρ and Tr1 τ = σ. (5)
As we focus primarily on distances in this manuscript, we will almost
always refer to the case where H1 = H2 . Throughout this work, we will also
reference a few key objects from quantum information theory and quantum
computation.
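Definition 5. For states ρ and σ on Hilbert space H, their trace distance is
half the trace norm of their difference, defined by

\[ d_1(\rho, \sigma) = \frac{1}{2} \|\rho - \sigma\|_1 = \frac{1}{2} \operatorname{Tr} \sqrt{(\rho - \sigma)^\dagger (\rho - \sigma)}. \tag{6} \]

We also have the equivalence

\[ d_1(\rho, \sigma) = \sup_{P^\dagger = P,\; 0 \le P \le I} \operatorname{Tr}[P(\rho - \sigma)]. \tag{7} \]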

Definition 6. Let U(n) be the group of n × n unitary matrices. The Haar
measure [66] on U(n), denoted μ_Haar, is the unique measure on the group U(n)
which is invariant under left-multiplication and for which μ_Haar(U(n)) = 1.
The Haar measure on the unitary group induces a measure on PH for H
of dimension n. This measure has the form of the pushforward M_# μ_Haar for
the map M(U) = U|0⟩ for some fixed |0⟩. The choice of |0⟩ does not change
the final measure. We refer to this as the Haar measure on PH.
Definition 7. The 2 × 2 Pauli operators X, Y, Z are given by the matrices

\[ X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{8} \]

seen as unitary transformations with respect to the computational basis. On
the n-fold tensor product of C^2, a Pauli string is a product σ1 ⊗ · · · ⊗ σn where
each σi ∈ {I, X, Y, Z}. The weight of such a string is the number of indices for
which σi ≠ I, so the weight of I ⊗ X ⊗ I is one and the weight of X ⊗ Y ⊗ Z
is three.
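The weight in Definition 7 can be illustrated with a minimal sketch. It assumes a Pauli string is encoded as a Python string over the letters I, X, Y, Z, one letter per tensor factor; the function name `pauli_weight` is hypothetical and not taken from the paper.

```python
def pauli_weight(pauli_string: str) -> int:
    """Count the tensor factors of a Pauli string that differ from the identity."""
    allowed = set("IXYZ")
    if not set(pauli_string) <= allowed:
        raise ValueError("expected only the symbols I, X, Y, Z")
    # Each non-identity factor contributes one to the weight.
    return sum(1 for factor in pauli_string if factor != "I")


# The two examples from Definition 7: I ⊗ X ⊗ I and X ⊗ Y ⊗ Z.
print(pauli_weight("IXI"))
print(pauli_weight("XYZ"))
```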

3. Motivations and Definitions

To generalise the Wasserstein distances to the quantum setting in a way that
is useful, it is desirable to replicate as many basic properties of the classical
definition as possible. To allow this, we note a key property of the classical
Wasserstein distances. For all orders p ≥ 1 and all x, y ∈ X , the Wp distance
between point masses agrees with the metric distance. That is, for Dirac
measures δx and δy , we have

Wp (δx , δy ) = d(x, y). (9)

It follows, therefore, that the classical Wasserstein distances are deter-
mined by their values on point masses. We also see that we can recover the
underlying metric d from the Wasserstein distance.
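One way to see why (9) holds is sketched here, under the assumption that singletons are measurable (for instance when X carries its Borel σ-algebra). Any coupling of two Dirac measures is forced to be the Dirac measure at the pair (x, y), so the infimum in (2) runs over a single element.

```latex
\begin{align*}
\gamma \in C(\delta_x, \delta_y)
  &\implies \gamma(\{x\} \times \mathcal{X}) = 1 \text{ and } \gamma(\mathcal{X} \times \{y\}) = 1 \\
  &\implies \gamma(\{(x, y)\}) = 1
   \implies \gamma = \delta_{(x, y)}, \\
W_p(\delta_x, \delta_y)
  &= \left( \int_{\mathcal{X} \times \mathcal{X}} d(x', y')^p \, d\delta_{(x, y)}(x', y') \right)^{1/p}
   = \left( d(x, y)^p \right)^{1/p} = d(x, y).
\end{align*}
```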
This property of determination by point masses is the key motivation of
this work. For any quantum Wasserstein distance which is defined for general
order p, the distances between the quantum versions of point masses (pure
states) should be independent of p, and should agree with the underlying
metric on the space if one exists. This definition of a quantity on mixed states
stemming from a quantity on pure states is akin to the convex roof construction
in the definition of entanglement measures on mixed states, as in [14,62,67,68].
To replicate this property, consider a Hilbert space H with projective
space PH, equipped with metric d. In this setting, for all orders p ≥ 1 and
states |ψ⟩ and |ϕ⟩ we would expect a quantum Wasserstein distance W_p^d to
satisfy

\[ W_p^d(|\psi\rangle\langle\psi|, |\varphi\rangle\langle\varphi|) = d(|\psi\rangle, |\varphi\rangle) \tag{10} \]

taking the convention that |ψ⟩⟨ψ| and |ϕ⟩⟨ϕ| both have unit trace. Given this
motivation for a property the distance should satisfy, it remains to generalise
the definition of a transport cost in (1) to the quantum setting. One possible
generalisation is given in the definition below.
Definition 8. Let H1 , H2 be separable Hilbert spaces with projective spaces
PH1 , PH2 , respectively, and let ρ ∈ D(H1 ), σ ∈ D(H2 ). We define a quantum
transport plan between ρ and σ as any countable set Q of triples

\[ Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j \in J} \tag{11} \]

such that

\[ \sum_{j \in J} q_j |\psi_j\rangle\langle\psi_j| = \rho \quad \text{and} \quad \sum_{j \in J} q_j |\varphi_j\rangle\langle\varphi_j| = \sigma, \tag{12} \]

where the indexing set J is countable.

We denote the set of all quantum transport plans between ρ and σ
by Q(ρ, σ). Note that Q(ρ, σ) ≠ ∅; indeed for spectral decompositions ρ =
Σ_{j∈J} λ_j |ψ_j⟩⟨ψ_j| and σ = Σ_{k∈K} μ_k |ϕ_k⟩⟨ϕ_k| we have

\[ Q = \{(\lambda_j \mu_k, |\psi_j\rangle, |\varphi_k\rangle)\}_{j \in J, k \in K} \in \mathcal{Q}(\rho, \sigma). \tag{14} \]

Transport plans can be defined in any way satisfying the criteria above. However,
when discussing the number of elements in a finite transport plan, we
say that two elements (q, |ψ⟩, |ϕ⟩), (q′, |ψ⟩, |ϕ⟩) transporting the same states
never appear in the same transport plan: any plan written as such is implied
to contain the element (q + q′, |ψ⟩, |ϕ⟩) in their place.
This equivalence between transport plans and ways of writing couplings
reflects the classical case, although we note that one quantum coupling could
give rise to multiple quantum transport plans. For example, the separable
quantum coupling τ = |0⟩⟨0| ⊗ I/2 between |0⟩⟨0| and I/2 on a two-qubit space
gives rise to transport plans {(1/2, |0⟩, |0⟩), (1/2, |0⟩, |1⟩)} and
{(1/2, |0⟩, |+⟩), (1/2, |0⟩, |−⟩)}.
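The two-qubit example above can be checked numerically. The following is a minimal sketch in plain Python, assuming states are stored as lists of complex amplitudes in the computational basis; the helper names (`outer`, `kron_vec`, `coupling_from_plan`, `close`) are hypothetical and introduced only for illustration.

```python
import math


def outer(u, v):
    """Matrix |u><v| as a nested list."""
    return [[a * b.conjugate() for b in v] for a in u]


def kron_vec(u, v):
    """Tensor product |u> ⊗ |v> of two state vectors."""
    return [a * b for a in u for b in v]


def coupling_from_plan(plan):
    """Separable state sum_j q_j |psi_j><psi_j| ⊗ |phi_j><phi_j| built from a plan."""
    dim = len(plan[0][1]) * len(plan[0][2])
    tau = [[0j] * dim for _ in range(dim)]
    for q, psi, phi in plan:
        joint = kron_vec(psi, phi)
        proj = outer(joint, joint)
        for r in range(dim):
            for c in range(dim):
                tau[r][c] += q * proj[r][c]
    return tau


def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


s = 1 / math.sqrt(2)
zero, one = [1 + 0j, 0j], [0j, 1 + 0j]
plus, minus = [s + 0j, s + 0j], [s + 0j, -s + 0j]

plan_z = [(0.5, zero, zero), (0.5, zero, one)]
plan_x = [(0.5, zero, plus), (0.5, zero, minus)]

tau_z = coupling_from_plan(plan_z)
tau_x = coupling_from_plan(plan_x)

# |0><0| ⊗ I/2 written out in the computational basis |00>, |01>, |10>, |11>.
target = [[0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(close(tau_z, target), close(tau_x, target))
```

Both plans reproduce the same operator, which is the sense in which one coupling corresponds to several transport plans.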
In building quantum transport plans, we can also refer to partial quantum
transport plans. This is a quantum transport plan where ρ and σ are instead
positive semidefinite operators of equal trace at most 1, and the partial plan
transports the ‘partial state’ ρ onto the ‘partial state’ σ.
We can then use this notion of a quantum transport plan to replicate the
classical definitions of transport cost and Wasserstein distance in the quantum
setting. The concept of a quantum transport plan is defined for all separable
Hilbert spaces H1 , H2 , but from this point forwards we require H1 = H2 in
order to have a metric between elements of transport plans. From here onwards
we refer to these both as H.
Definition 9. Let H be a separable Hilbert space and let d be a distance on
PH. Let p ≥ 1. For any transport plan Q = {(q_j, |ψ_j⟩, |ϕ_j⟩)}_{j∈J} we define its
pth-order quantum transport cost as

\[ T_p^d(Q) = \sum_{j \in J} q_j \, d(|\psi_j\rangle, |\varphi_j\rangle)^p. \tag{15} \]

The pth-order quantum Wasserstein distance on D(H) is then defined as

\[ W_p^d(\rho, \sigma) = \left( \inf_{Q \in \mathcal{Q}(\rho, \sigma)} T_p^d(Q) \right)^{1/p}. \tag{16} \]
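To make the transport cost concrete, the sketch below evaluates it for two transport plans from |0⟩⟨0| to I/2 on a single qubit. It assumes d is the trace distance restricted to pure states, for which the standard closed form d_1(|ψ⟩, |ϕ⟩) = sqrt(1 − |⟨ψ|ϕ⟩|²) holds; the function names are hypothetical. Each value T_p^d(Q)^{1/p} is only an upper bound on W_p^d(ρ, σ), since no optimisation over plans is performed here.

```python
import math


def inner(u, v):
    """Inner product <u|v> of two state vectors."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def trace_distance_pure(psi, phi):
    """Trace distance between pure states: sqrt(1 - |<psi|phi>|^2)."""
    overlap = abs(inner(psi, phi)) ** 2
    return math.sqrt(max(0.0, 1.0 - overlap))


def transport_cost(plan, p, d=trace_distance_pure):
    """T_p^d(Q) = sum_j q_j d(psi_j, phi_j)^p for a plan given as (q, psi, phi) triples."""
    return sum(q * d(psi, phi) ** p for q, psi, phi in plan)


def upper_bound(plan, p, d=trace_distance_pure):
    """T_p^d(Q)^(1/p), an upper bound on the order-p quantum Wasserstein distance."""
    return transport_cost(plan, p, d) ** (1.0 / p)


s = 1 / math.sqrt(2)
zero, one = [1 + 0j, 0j], [0j, 1 + 0j]
plus, minus = [s + 0j, s + 0j], [s + 0j, -s + 0j]

# Two plans transporting |0><0| onto the maximally mixed qubit state I/2.
plan_z = [(0.5, zero, zero), (0.5, zero, one)]
plan_x = [(0.5, zero, plus), (0.5, zero, minus)]

for p in (1, 4):
    print(p, upper_bound(plan_z, p), upper_bound(plan_x, p))
```

In this small example the plan giving the smaller bound changes with p: the computational-basis plan gives the smaller value at p = 1, while the plan through |+⟩ and |−⟩ gives the smaller value at p = 4.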

We will see in Proposition 16 that for finite-dimensional H and for d
continuous, the infimum in this definition is equivalent to the minimum. We
will also see in that proposition that the minimum is attained at a transport
plan with at most 2D^2 elements, where D is the dimension of the Hilbert space
H. However, we do not strictly limit the number of states in the definition of
a transport plan in finite dimension. This is because defining a plan Q from
ρ to σ upper bounds the Wasserstein distance W_p^d(ρ, σ) by T_p^d(Q)^{1/p}, and we
do not want to restrict the set of plans for which this is possible.
Note that we have restricted to transporting from pure states in PH to
pure states in PH above. This is because there is no natural extension of the
distance d to a transportation cost for entangled pure states on H ⊗ H. In
Appendix B 2, we discuss one possible extension and then argue that, at least
for p = 1, couplings with entangled states are not advantageous and we can
restrict to the couplings defined above without loss of generality.
We could also have considered transport plans defined by an integrable
measure q on PH1 × PH2 . We discuss this possibility in Appendix B 3 and show
that these also do not give an advantage over the transport plans as defined above.
However, note that it is still possible to upper bound W_p^d(ρ, σ) by T_p^d(q)^{1/p} for
such a q, where T_p^d(q) is defined in Appendix B 3.
As with many other ordered distances, we can also define the infinite-
order quantum Wasserstein distance.
Definition 10. Let H be a separable Hilbert space and let d be a distance
on PH. For any states ρ, σ ∈ D(H), we define their infinite-order quantum
Wasserstein distance as

\[ W_\infty^d(\rho, \sigma) = \inf_{Q \in \mathcal{Q}(\rho, \sigma)} \; \sup_{j \in J} d(|\psi_j\rangle, |\varphi_j\rangle) \tag{17} \]

for quantum transport plans Q = {(q_j, |ψ_j⟩, |ϕ_j⟩)}_{j∈J}.

As a natural consequence of the convex nature of these definitions, we
see that this family of Wasserstein distances is monotone in p. Indeed, for any
transport plan Q between ρ and σ we have for p1 < p2 that

\begin{align}
T_{p_1}(Q)^{1/p_1} &= \Big( \sum_{j \in J} q_j \, d(|\psi_j\rangle, |\varphi_j\rangle)^{p_1} \Big)^{1/p_1} \tag{18} \\
&\le \Big( \Big( \sum_{j \in J} q_j \big( d(|\psi_j\rangle, |\varphi_j\rangle)^{p_1} \big)^{p_2/p_1} \Big)^{1/p_1} \Big)^{p_1/p_2} \tag{19} \\
&= \Big( \sum_{j \in J} q_j \, d(|\psi_j\rangle, |\varphi_j\rangle)^{p_2} \Big)^{1/p_2} \tag{20} \\
&= T_{p_2}(Q)^{1/p_2} \tag{21}
\end{align}

by an application of Jensen’s inequality. Taking the infimum over Q ∈ Q(ρ, σ),
we get

\[ W_{p_1}^d(\rho, \sigma) \le W_{p_2}^d(\rho, \sigma). \tag{22} \]

The same applies when p2 = ∞, as T_{p1}(Q)^{1/p1} ≤ max_{j∈J} d(|ψ_j⟩, |ϕ_j⟩).
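The chain of inequalities (18) to (21) only involves the weights q_j and the distances d(|ψ_j⟩, |ϕ_j⟩), so it can be checked numerically without constructing any states. The sketch below does this for one randomly drawn set of weights and distances; the numbers are arbitrary illustrative data, and the function name `cost_root` is hypothetical.

```python
import random


def cost_root(weights, distances, p):
    """(sum_j q_j d_j^p)^(1/p) for a plan summarised by weights q_j and distances d_j."""
    return sum(q * d ** p for q, d in zip(weights, distances)) ** (1.0 / p)


random.seed(0)
raw = [random.random() for _ in range(6)]
weights = [r / sum(raw) for r in raw]            # probabilities summing to one
distances = [random.random() for _ in range(6)]  # stand-ins for d(psi_j, phi_j)

orders = [1, 1.5, 2, 3, 5, 10]
values = [cost_root(weights, distances, p) for p in orders]

# The values should be nondecreasing in p and bounded by the largest distance.
for p, value in zip(orders, values):
    print(p, value)
print("max distance", max(distances))
```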
This definition is given for all 1 ≤ p ≤ ∞. However, at least in the
classical case, the most important cases to consider for interesting applications
are p = 1, 2, and ∞ [55,63,65].
Although many desirable properties of these W_p^d are proven in Sect. 4,
establishing a triangle inequality remains a difficult open problem. This is
true of many attempts to establish a quantum generalisation of the classical
Wasserstein distances, and attempts to establish it are
discussed in Appendix C. However, by looking at the dual picture in Sect. 4.1,
we establish a closely related norm on the space of traceless Hermitian operators
for which W_1^d is an upper bound, and which agrees with the original
distance d on differences between pure states. For order p = 1 this norm allows
a trade-off between ease of proving the triangle inequality, and operational interpretation
in terms of moving mass. This dual formulation closely mirrors the
Kantorovich–Rubinstein theorem in the classical case [64, p. 34]. Furthermore,
we show in Sect. 5 that it allows us to essentially recover the trace distance
and the definition of [50], thus our results on the dual definition give strong
evidence that this W_p^d is a good quantum generalisation.
3.1. Other Approaches
Many proposals have been suggested for generalisations of the classical Wasser-
stein distance to quantum states, in various contexts, to varying degrees of
success. Arguably, the most desirable classical metrics to generalise are the
discrete metric on a finite space and the Hamming distance on the hypercube.
The most prominent example is perhaps the first-order distance defined
in [50] which generalises the W1 distance on the hypercube whose underlying
geometry is the Hamming distance. They define states ρ, σ on (C^d)^{⊗n} to be
neighbouring if there exists some qudit i for which Tr_i[ρ − σ] = 0. In other
words, ρ and σ are the same when qudit i is traced out. The norm ‖·‖_{W_1^H}
over traceless Hermitian operators is then defined such that its unit ball is
the convex hull of all differences between neighbouring states. This means
‖ρ − σ‖_{W_1^H} can be written as

\[ \|\rho - \sigma\|_{W_1^H} = \min \left\{ \sum_{i=1}^n c_i : c_i > 0, \; \sum_{i=1}^n c_i (\rho^{(i)} - \sigma^{(i)}) = \rho - \sigma, \; \operatorname{Tr}_i[\rho^{(i)} - \sigma^{(i)}] = 0 \right\}. \tag{23} \]
This is a true metric and it has led to many applications such as in quantum
spin systems [54] and variational quantum algorithms [24]. However, the ap-
proach taken is very specific to the Hamming distance and does not lend itself
to general transport costs.
For order p = 2, a number of definitions have been proposed in various
contexts, such as [33] which defines a second-order optimal transport cost in
the context of mean field limits that was shown in [13] to have links with the
Brenier formulation of classical optimal transport [64, p. 238].
More precisely, for a set of Hermitian operators {R_1, . . . , R_K} on H, the
second-order transport cost of a coupling Π on H ⊗ H is given by

\[ C(\Pi) = \sum_{i=1}^K \operatorname{Tr}[(R_i \otimes I - I \otimes R_i) \Pi (R_i \otimes I - I \otimes R_i)] \tag{24} \]

from which a second-order Wasserstein distance is derived by taking the square
root of the infimum of the cost of all couplings between ρ and σ in the usual
way. Defining the quantity

\[ d(|\psi\rangle, |\varphi\rangle) = \sum_{i=1}^K \Big( \|R_i|\psi\rangle\|^2 + \|R_i|\varphi\rangle\|^2 - 2 \langle\psi|R_i|\psi\rangle \langle\varphi|R_i|\varphi\rangle \Big), \]

we see that this is a variation of our definition above,
although with a specific d which is not a true metric, and with the infimum
taken over all couplings as opposed to just those which are separable.
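The link between the cost (24) and the quantity d can be made explicit for a product coupling. The following worked step is a sketch that uses only the cyclicity of the trace, the fact that R_i ⊗ I and I ⊗ R_i commute, and that each R_i is Hermitian.

```latex
\begin{align*}
\Pi &= |\psi\rangle\langle\psi| \otimes |\varphi\rangle\langle\varphi|, \\
\operatorname{Tr}\big[(R_i \otimes I - I \otimes R_i)\, \Pi\, (R_i \otimes I - I \otimes R_i)\big]
  &= \operatorname{Tr}\big[(R_i^2 \otimes I + I \otimes R_i^2 - 2\, R_i \otimes R_i)\, \Pi\big] \\
  &= \langle\psi|R_i^2|\psi\rangle + \langle\varphi|R_i^2|\varphi\rangle
     - 2\, \langle\psi|R_i|\psi\rangle \langle\varphi|R_i|\varphi\rangle \\
  &= \|R_i|\psi\rangle\|^2 + \|R_i|\varphi\rangle\|^2
     - 2\, \langle\psi|R_i|\psi\rangle \langle\varphi|R_i|\varphi\rangle .
\end{align*}
```

Summing over i = 1, ..., K gives the cost of the product coupling, and by linearity the cost of a separable coupling built from a transport plan is the corresponding weighted sum of such terms.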
Following the Brenier formulation more closely, the distance in [15] is also
a second-order distance and has been used [25] to prove a quantum version of
the HWI inequality. Specifically, the 2-Wasserstein distance is defined here as
the geodesic distance on the set of full-rank states equipped with a Riemannian
metric defined by the continuity equation. This definition is a world away from
our framework, as it leans heavily into the intricacies of the classical dynamical
formulation.
We should also mention the definition in [18] and refined in [42] which
defines a second-order cost based on couplings with a specific cost function
given by an asymmetric projection. While it is conjectured that this gives a
true distance, we show in Appendix A that this definition cannot be extended
to other underlying geometries, such as analogues of the Hamming distance
on multipartite spaces. Further approaches include the more flexible [52], an
alternative to [33], which defines a second-order distance based on couplings
that is not faithful. It has been conjectured [53] that a natural modification
of this quantity is a true distance, though this remains an open problem. A
similarly flexible approach appears in [29], following a naïve translation of the
classical formulation into the quantum setting. This definition takes a cost
matrix C on H ⊗ H, and defines a Wasserstein distance by

\[ W_p(\rho, \sigma) = \left( \inf_{\tau \in C(\rho, \sigma)} \operatorname{Tr}[C^p \tau] \right)^{1/p} \tag{25} \]

where C(ρ, σ) is the set of quantum couplings of ρ with σ. The particular case
of C being the projection onto the asymmetric subspace is studied in detail
and coincides with the definition in [18]. In the general case, however, this is
not shown to be a semidistance.
Each of the approaches seems to be generalising one particular aspect
or application of Wasserstein distances to the noncommutative setting, be it
obtaining a distance for a given value of p or a given underlying geometry of the
Hilbert space. However, it is not clear how they relate to each other or how to
extend them beyond their original setting. The definition in this work adapts to
any order p and any underlying metric d on the set of pure states of the Hilbert
space provided that d satisfies a few basic continuity properties. This broad
flexibility allows us to talk about the moments of the cost of moving between
classical-quantum sources in great generality (Sect. 6.2) and also allows us
to talk about the noise of an operator by comparing transport distances of
different orders in an analogue of hypercontractivity (Sect. 6.3).
To avoid confusion, we give in Table 1 a summary of all relevant Wasser-
stein distances used throughout this work.
Table 1. An overview of the Wasserstein distances used in this work, and the relevant notation

Notation            Definition
W_p^d(μ, ν)         The classical Wasserstein distance of order p between measures
                    μ, ν on metric space (X, d); see Eqs. (2) and (3).
W_p^d(ρ, σ)         The quantum Wasserstein distance of order p between states
                    ρ, σ on Hilbert space H and metric d on PH; see Eqs. (16) and
                    (17).
‖ρ − σ‖_{W_1^H}     The quantum Wasserstein norm of order one from [50]; see Eq.
                    (23).
W_p^H(ρ, σ)         The quantum Wasserstein distance of order p on H = (C^d)^{⊗n}
                    defined by metric d_H(|ψ⟩, |ϕ⟩) = ‖|ψ⟩⟨ψ| − |ϕ⟩⟨ϕ|‖_{W_1^H}; see
                    Sect. 5.1. Write W_{p=1}^H when p = 1 to avoid confusion with
                    norm ‖·‖_{W_1^H}.
W_p^1(ρ, σ)         The quantum Wasserstein distance of order p on
                    H = C^D defined by the trace distance d_1(|ψ⟩, |ϕ⟩) =
                    (1/2)‖|ψ⟩⟨ψ| − |ϕ⟩⟨ϕ|‖_1; see Sect. 5.2.
W_p^C(ρ, σ)         The quantum Wasserstein distance of order p on H = (C^2)^{⊗n}
                    defined by the complexity geometry metric d_C [46]; see
                    Sect. 5.3.

4. General Properties
The goals of this section are to prove some fundamental attributes of Wpd which
will give an idea of how it behaves in the case of general d. Throughout we will
require some basic regularity conditions on d. As we will see in more detail in
the discussion following Cor. 13, not every metric d on the set of pure states
induces a faithful Wasserstein distance. Thus, we will restrict to metrics with
respect to which the 2-norm on the set of traceless self-adjoint operators is
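Hölder continuous. In other words, we require that there exist α ∈ (0, 1] and
C > 0 such that for all $|\psi\rangle, |\varphi\rangle \in PH$ we have
\[ C\, d(|\psi\rangle, |\varphi\rangle)^{\alpha} \ge \big\| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \big\|_2 . \tag{26} \]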

Note that as we work in finite dimension, the 2-norm is chosen here without
loss of generality. In the case α = 1, this is equivalent to the 2-norm being
Lipschitz with respect to d. The constant C is a function of the metric space
(PH, d) and so can depend on the dimension of H. For some other properties
such as continuity, we will require that d be continuous, noting that on H of
finite dimension this is equivalent to d being uniformly continuous.
Given the main philosophical motivation of this definition, that the trans-
port distance between point masses in the classical setting is given by the un-
derlying metric, it’s important to show the quantum equivalent here. That is,
for all orders p the quantum Wasserstein distance between pure states agrees
with the underlying metric.
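Proposition 11. Let H be a separable Hilbert space with distance d on PH.
Then, for pure states $|\psi\rangle\langle\psi|$, $|\varphi\rangle\langle\varphi|$, we have
\[ W_p^d(|\psi\rangle\langle\psi|, |\varphi\rangle\langle\varphi|) = d(|\psi\rangle, |\varphi\rangle). \tag{27} \]

Proof. As $|\psi\rangle\langle\psi|$ and $|\varphi\rangle\langle\varphi|$ are pure, the only permitted transport plan is
$Q = \{(1, |\psi\rangle, |\varphi\rangle)\}$, which has cost
\[ T_p^d(Q) = d(|\psi\rangle, |\varphi\rangle)^p \tag{28} \]
and therefore we have $W_p^d(|\psi\rangle\langle\psi|, |\varphi\rangle\langle\varphi|) = d(|\psi\rangle, |\varphi\rangle)$.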

We can then employ the enforced Hölder continuity condition to prove
that $W_p^d$ is nondegenerate. This is captured in the following lemma. Zero
self-distance in $W_p^d$ is also fairly clear, as is symmetry. Note that these
fundamental properties have not always been present in previous definitions
such as in [52], where the definition has nonzero self-distance.

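Lemma 12. Let H be a separable Hilbert space with distance d on PH with
respect to which the 2-norm is Hölder continuous with constant C > 0 and
exponent α ∈ (0, 1]. Then for all ρ, σ ∈ D(H), we have
\[ W_p^d(\rho, \sigma) \ge \frac{1}{C} \|\rho - \sigma\|_2^{1/\alpha} . \tag{29} \]
Proof. We have $C\, d(|\psi\rangle, |\varphi\rangle)^{\alpha} \ge \| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_2$ on PH, for C > 0 and
α ∈ (0, 1]. Let $\{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}$ be a quantum transport plan between ρ and
σ. The functions $x \mapsto x^p$ and $X \mapsto \|X\|_2^{1/\alpha}$ for traceless X are both convex.
Hence,
\begin{align}
\sum_j q_j\, d(|\psi_j\rangle, |\varphi_j\rangle)^p
&\ge \Big( \sum_j q_j\, d(|\psi_j\rangle, |\varphi_j\rangle) \Big)^p \tag{30} \\
&\ge \Big( \sum_j q_j \frac{1}{C} \big\| |\psi_j\rangle\langle\psi_j| - |\varphi_j\rangle\langle\varphi_j| \big\|_2^{1/\alpha} \Big)^p \tag{31} \\
&\ge \Big( \frac{1}{C} \Big\| \sum_j q_j \big( |\psi_j\rangle\langle\psi_j| - |\varphi_j\rangle\langle\varphi_j| \big) \Big\|_2^{1/\alpha} \Big)^p \tag{32} \\
&= \frac{1}{C^p} \|\rho - \sigma\|_2^{p/\alpha} . \tag{33}
\end{align}
Hence, taking the infimum and the pth root, we have
$W_p^d(\rho, \sigma) \ge \frac{1}{C} \|\rho - \sigma\|_2^{1/\alpha}$.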

Corollary 13. Let d be a distance on PH with respect to which the 2-norm is
Hölder continuous. For all ρ, σ ∈ D(H), we have $W_p^d(\rho, \sigma) \ge 0$ with equality
iff ρ = σ.
Proof. $W_p^d(\rho, \sigma) \ge 0$ is clear, and equality when ρ = σ can be obtained by
taking the transport plan $\{(c_j, |\psi_j\rangle, |\psi_j\rangle)\}_{j \in J}$ where $\rho = \sum_{j \in J} c_j |\psi_j\rangle\langle\psi_j|$ is
a spectral decomposition. Faithfulness is a direct consequence of Lemma 12.
For a general distance d, not necessarily satisfying property (26), we get
from the same proof above that $W_p^d(\rho, \sigma) \ge 0$ and $W_p^d(\rho, \rho) = 0$. It is not
guaranteed, however, that a general distance on PH leads to a nondegenerate
$W_p^d$. For example, let $H = \mathbb{C}^2$ with the standard basis $\{|0\rangle, |1\rangle\}$. For any
non-negative real-valued function f on PH with $f(|0\rangle) = 0$ and f positive
elsewhere, we can define a metric d as follows:
\[ d(|\psi\rangle, |\varphi\rangle) = \begin{cases} 0 & \text{if } |\psi\rangle = |\varphi\rangle \\ f(|\psi\rangle) + f(|\varphi\rangle) & \text{otherwise.} \end{cases} \tag{34} \]
This forms a version of the SNCF metric, also known as the centralised railway
metric [19, p. 327], with $|0\rangle$ at the centre. This is a metric in which travel is
only permitted along rays emanating from a central point. We will consider f
such that
\[ f\Big( \sqrt{\tfrac{n-1}{n}}\, |0\rangle + \sqrt{\tfrac{1}{n}}\, |1\rangle \Big) = f\Big( \sqrt{\tfrac{n-1}{n}}\, |1\rangle - \sqrt{\tfrac{1}{n}}\, |0\rangle \Big) = \frac{1}{n} \tag{35} \]
for n ∈ N and $f(|\psi\rangle) = 2$ otherwise. Consider the sequence of transport plans
\[ Q_n = \Big\{ \Big( \tfrac{1}{2}, |0\rangle, \sqrt{\tfrac{n-1}{n}}\, |0\rangle + \sqrt{\tfrac{1}{n}}\, |1\rangle \Big), \Big( \tfrac{1}{2}, |0\rangle, \sqrt{\tfrac{n-1}{n}}\, |1\rangle - \sqrt{\tfrac{1}{n}}\, |0\rangle \Big) \Big\} \tag{36} \]
for n ∈ N. These are all plans which transport $|0\rangle\langle 0|$ onto $I/2$, and each has cost
$T_1^d(Q_n) = \frac{1}{n}$. This gives $W_1^d(|0\rangle\langle 0|, I/2) \le T_1^d(Q_n) = \frac{1}{n} \to 0$, and therefore
$W_1^d(|0\rangle\langle 0|, I/2) = 0$. We have shown that the Hölder continuity condition is
sufficient for $W_p^d$ to be nondegenerate, but it is not immediately obvious whether
or not it is necessary.
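The degenerate example above can be checked numerically. The following is a minimal sketch, assuming NumPy and assuming that the amplitudes in Eqs. (35) and (36) are the square roots needed for normalisation; the helper names (`targets`, `f`, `d`, `plan`, `cost`) are hypothetical, and `f` only encodes the values that the plan $Q_n$ touches for one fixed n ≥ 2.

```python
import numpy as np

ZERO = np.array([1, 0], dtype=complex)  # the central state |0>

def proj(v):
    """Rank-one projector |v><v|."""
    return np.outer(v, v.conj())

def same_ray(u, v, tol=1e-12):
    """True if the normalised vectors u and v agree up to a global phase."""
    return abs(abs(np.vdot(u, v)) - 1.0) < tol

def targets(n):
    """The two target states of Q_n (assumed normalised amplitudes)."""
    a = np.array([np.sqrt((n - 1) / n), np.sqrt(1 / n)], dtype=complex)
    b = np.array([-np.sqrt(1 / n), np.sqrt((n - 1) / n)], dtype=complex)
    return a, b

def f(v, n):
    """Illustrative f for one fixed n: 0 at |0>, 1/n on the targets, 2 elsewhere."""
    if same_ray(v, ZERO):
        return 0.0
    a, b = targets(n)
    return 1.0 / n if (same_ray(v, a) or same_ray(v, b)) else 2.0

def d(u, v, n):
    """SNCF-type metric of Eq. (34) built from f."""
    return 0.0 if same_ray(u, v) else f(u, n) + f(v, n)

def plan(n):
    """Transport plan Q_n as a list of (weight, source, target) triples."""
    a, b = targets(n)
    return [(0.5, ZERO, a), (0.5, ZERO, b)]

def marginals(Q):
    """Source and target states of a plan."""
    rho = sum(q * proj(s) for q, s, _ in Q)
    sigma = sum(q * proj(t) for q, _, t in Q)
    return rho, sigma

def cost(Q, n, p=1):
    """p-th order transport cost of the plan."""
    return sum(q * d(s, t, n) ** p for q, s, t in Q)
```

For each n ≥ 2 the plan has marginals $|0\rangle\langle 0|$ and $I/2$ while its first-order cost is 1/n, which is the mechanism behind the vanishing infimum.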
Having established that Wpd is at least a semidistance, we turn our atten-
tion to its basic behavioural properties. Firstly, we note that all symmetries of
(PH, d) are inherited by the quantum Wasserstein distances.
Proposition 14. Let 1 ≤ p ≤ ∞ and d be a metric on PH. The group of unitary
symmetries of the underlying metric d is exactly the group of conjugational
symmetries of $W_p^d$.
Proof. Let U be a symmetry of d. U is invertible; therefore, there is a direct
correspondence
\[ \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j \in J} \longleftrightarrow \{(q_j, U|\psi_j\rangle, U|\varphi_j\rangle)\}_{j \in J} \tag{37} \]
between quantum transport plans from ρ to σ and from $U\rho U^\dagger$ to $U\sigma U^\dagger$. The
distance d is invariant under U, so cost is preserved under this correspondence,
and therefore the optimal cost is also preserved.
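Conversely, let unitary V be a conjugational symmetry of $W_p^d$. Then, for
all $|\psi\rangle, |\varphi\rangle \in PH$, we have
\begin{align}
d(|\psi\rangle, |\varphi\rangle) &= W_p^d(|\psi\rangle\langle\psi|, |\varphi\rangle\langle\varphi|) \notag \\
&= W_p^d(V|\psi\rangle\langle\psi|V^\dagger, V|\varphi\rangle\langle\varphi|V^\dagger) = d(V|\psi\rangle, V|\varphi\rangle) \tag{38}
\end{align}
and so V is a symmetry of d.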

This allows us to prove a result on data processing for mixed unitary

channels. The data processing inequality is a central concept in both classical
and quantum information theory, and represents the idea that you can’t create
new information by processing old data: all the information you have about an
object is encoded in the rawest data you have. This concept also exists in
classical optimal transport: for a measure μ on a metric space (X, d) and a
measurable map f : X → X, we define the pushforward $f_*\mu$ by
$f_*\mu(B) = \mu(f^{-1}(B))$ for measurable sets B. If f is 1-Lipschitz, then
$W_p^d(\mu, \nu) \ge W_p^d(f_*\mu, f_*\nu)$. While
many quantities in quantum information theory satisfy data processing for all
quantum channels, in the case of the Wasserstein distances we would only
expect the data processing inequality to be satisfied for those channels which
respect the geometry of the underlying space. For example, for an arbitrary
unitary map U we would not expect the channel $\rho \mapsto U\rho U^\dagger$ to satisfy data
processing unless U has $d(|\psi\rangle, |\varphi\rangle) \ge d(U|\psi\rangle, U|\varphi\rangle)$ for all pure states.
For general d, the largest class of channels
from D(H) → D(H) that respect the geometry of the underlying space is the
class of mixed unitary channels whose unitaries are isometries of (PH, d). In
this case, we do indeed have data processing as follows:

Proposition 15. Suppose Φ is a mixed unitary channel written as a countable
sum
\[ \Phi(\rho) = \sum_{k \in K} a_k U_k \rho U_k^\dagger, \qquad a_k \ge 0, \quad \sum_{k \in K} a_k = 1 \tag{39} \]
for which multiplication by each $U_k$ is an isometry with respect to d. Then, for
all orders 1 ≤ p ≤ ∞ and all states ρ, σ, we have $W_p^d(\rho, \sigma) \ge W_p^d(\Phi(\rho), \Phi(\sigma))$.

Proof. Let $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j \in J}$ be any pth-order transport plan between ρ
and σ. We can then define a transport plan
$Q' = \{(q_j a_k, U_k|\psi_j\rangle, U_k|\varphi_j\rangle)\}_{j \in J, k \in K}$ between Φ(ρ) and Φ(σ) which has the
same pth-order transport cost as Q. Taking the infimum over Q gives the result.
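The construction in this proof can be illustrated numerically. The sketch below assumes NumPy and takes d to be the trace distance on pure states, which is invariant under every unitary, so any mixed unitary channel qualifies; the channel (Pauli X and Z with weights 0.3 and 0.7) and the two-element plan are arbitrary illustrative choices, and all function names are hypothetical.

```python
import numpy as np

def proj(v):
    """Rank-one projector |v><v|."""
    return np.outer(v, v.conj())

def trace_distance(u, v):
    """Trace distance between pure states: sqrt(1 - |<u|v>|^2)."""
    return np.sqrt(max(0.0, 1.0 - abs(np.vdot(u, v)) ** 2))

def marginals(plan):
    """Source and target states of a plan of (weight, source, target) triples."""
    rho = sum(q * proj(s) for q, s, _ in plan)
    sigma = sum(q * proj(t) for q, _, t in plan)
    return rho, sigma

def cost(plan, d, p):
    """p-th order transport cost of a plan with respect to the metric d."""
    return sum(q * d(s, t) ** p for q, s, t in plan)

def channel(rho, weights, unitaries):
    """Mixed unitary channel: sum_k a_k U_k rho U_k^dagger."""
    return sum(a * (U @ rho @ U.conj().T) for a, U in zip(weights, unitaries))

def push_plan(plan, weights, unitaries):
    """The plan Q' from the proof: every element is split over the unitaries."""
    return [(q * a, U @ s, U @ t)
            for q, s, t in plan
            for a, U in zip(weights, unitaries)]

# Illustrative data (assumptions, not taken from the paper).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
weights, unitaries = [0.3, 0.7], [X, Z]

zero = np.array([1, 0], dtype=complex)
one = np.array([0, 1], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
plan = [(0.4, zero, plus), (0.6, plus, one)]
pushed = push_plan(plan, weights, unitaries)
```

The pushed plan has marginals Φ(ρ) and Φ(σ) and the same cost as the original plan for every order p, which is all the proof needs before taking the infimum.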

It is also possible, in finite dimensions, to guarantee the existence of an
optimal transport plan when the underlying distance d is continuous and to
bound the number of elements of an optimal transport plan. This applies in
particular when d is induced by a norm: that is to say, that there exists a
norm $\|\cdot\|_d$ on the self-adjoint traceless operators on H such that
$d(|\psi\rangle, |\varphi\rangle) = \| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_d$ everywhere.
Proposition 16. Let d be a continuous distance on PH and let H have dimension
D < ∞. The infimum in (16) is attained with a transport plan containing
at most $2D^2$ elements.
Proof. To show that the infimum over all plans is the same as the infimum over
plans of finite size, let $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j \in \mathbb{N}}$ be an infinite-sized transport
plan between ρ and σ. For ε > 0, let
\[ Q_\varepsilon = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle) : j \in \mathbb{N},\ q_j \ge \varepsilon\} \cup Q'_\varepsilon \tag{40} \]
for $Q'_\varepsilon$ a partial transport plan of the form in Eq. (13) between
$\rho - \sum_{j : q_j \ge \varepsilon} q_j |\psi_j\rangle\langle\psi_j|$ and $\sigma - \sum_{j : q_j \ge \varepsilon} q_j |\varphi_j\rangle\langle\varphi_j|$. This $Q_\varepsilon$ is then a finite
transport plan between ρ and σ. As ε → 0, the transport cost of the first part
of $Q_\varepsilon$ tends to $T_p^d(Q)$ and the cost of $Q'_\varepsilon$ tends to 0, which proves the equality
of the infima.
For the bound $2D^2$, suppose $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j \in J}$ is a transport plan
between ρ and σ with more than $2D^2$ elements. We will show that there
exists a transport plan $Q'$ with strictly fewer elements than Q, whose pth-order
transport cost is at most that of Q. Let $M_D^{sa}$ be the space of self-adjoint
operators on D dimensions. The space $M_D^{sa} \times M_D^{sa}$ has real dimension $2D^2$.
Therefore, among the elements $(|\psi_j\rangle\langle\psi_j|, |\varphi_j\rangle\langle\varphi_j|)$ in $M_D^{sa} \times M_D^{sa}$ we can find
a nontrivial linear relation
\[ \sum_{j \in J} c_j \big( |\psi_j\rangle\langle\psi_j|, |\varphi_j\rangle\langle\varphi_j| \big) = 0. \tag{41} \]
Define subsets K, L of J by $K = \{k \in J : c_k > 0\}$ and $L = \{l \in J : c_l < 0\}$.
Rearranging (41) then gives
\[ \sum_{k \in K} c_k \big( |\psi_k\rangle\langle\psi_k|, |\varphi_k\rangle\langle\varphi_k| \big) = \sum_{l \in L} (-c_l) \big( |\psi_l\rangle\langle\psi_l|, |\varphi_l\rangle\langle\varphi_l| \big) \tag{42} \]
so that the coefficients $c_k$ and $-c_l$ are strictly positive. Without loss of
generality, we may assume that
\[ \sum_{k \in K} c_k\, d(|\psi_k\rangle, |\varphi_k\rangle)^p \ge \sum_{l \in L} (-c_l)\, d(|\psi_l\rangle, |\varphi_l\rangle)^p . \tag{43} \]
We will aim to replace a portion of the transport plan corresponding to the
left-hand side with a portion corresponding to the right-hand side.
Let $m = \min_{k \in K} q_k / c_k$, and let $k^*$ be a minimiser. We can then form a
new transport plan
\begin{align}
Q' = {} & \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j \notin K \cup L} \cup \{(q_k - m c_k, |\psi_k\rangle, |\varphi_k\rangle)\}_{k \in K} \notag \\
& \cup \{(q_l - m c_l, |\psi_l\rangle, |\varphi_l\rangle)\}_{l \in L} \tag{44}
\end{align}
replacing m times the left-hand side of (42) with m times the right-hand side.
$k^*$ attains the minimum in the definition of m, so any multiple of $(|\psi_{k^*}\rangle, |\varphi_{k^*}\rangle)$
is removed from the plan. The linear relation means that the resulting plan is
still a transport plan between ρ and σ, and (43) means that the cost is not
increased. Hence we can find a transport plan between ρ and σ with fewer
elements, without increasing the transport cost.
We may then optimise the transport cost over all transport plans of size
at most $2D^2$. This set is compact and the pth-order cost (15) of a transport
plan is a continuous function of the transport plan Q for d continuous, so the
infimum is attained.
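The reduction step used for the $2D^2$ bound can be sketched in code. This is an illustration under stated assumptions rather than the authors' implementation: it assumes NumPy, uses the trace distance as a stand-in for the continuous metric d, finds the linear relation as a numerical null vector via a singular value decomposition, and relies on a small tolerance `tol` to decide which weights count as zero. All function names are hypothetical.

```python
import numpy as np

def proj(v):
    """Rank-one projector |v><v|."""
    return np.outer(v, v.conj())

def trace_distance(u, v):
    """Trace distance between pure states: sqrt(1 - |<u|v>|^2)."""
    return np.sqrt(max(0.0, 1.0 - abs(np.vdot(u, v)) ** 2))

def marginals(plan):
    rho = sum(q * proj(s) for q, s, _ in plan)
    sigma = sum(q * proj(t) for q, _, t in plan)
    return rho, sigma

def cost(plan, d, p):
    return sum(q * d(s, t) ** p for q, s, t in plan)

def reduce_once(plan, d, p, tol=1e-12):
    """Remove at least one element without changing the marginals
    or increasing the p-th order cost (needs more than 2*D^2 elements)."""
    # Real vectorisation of each pair (|psi_j><psi_j|, |phi_j><phi_j|).
    cols = []
    for _, s, t in plan:
        z = np.concatenate([proj(s).ravel(), proj(t).ravel()])
        cols.append(np.concatenate([z.real, z.imag]))
    A = np.array(cols).T
    # The last right-singular vector is a null vector: a linear relation.
    c = np.linalg.svd(A)[2][-1]
    costs = np.array([d(s, t) ** p for _, s, t in plan])
    if c @ costs < 0:
        c = -c  # orient the relation so the positive part is the costlier side
    q = np.array([w for w, _, _ in plan])
    pos = c > tol
    m = np.min(q[pos] / c[pos])  # largest shift keeping all weights >= 0
    new_q = q - m * c
    return [(w, s, t) for w, (_, s, t) in zip(new_q, plan) if w > tol]

def reduce_plan(plan, d, p):
    """Repeat the reduction until at most 2*D^2 elements remain."""
    D = len(plan[0][1])
    while len(plan) > 2 * D * D:
        plan = reduce_once(plan, d, p)
    return plan

def random_plan(D, N, seed=0):
    """Random plan with N elements; its marginals define rho and sigma."""
    rng = np.random.default_rng(seed)
    def state():
        v = rng.normal(size=D) + 1j * rng.normal(size=D)
        return v / np.linalg.norm(v)
    w = rng.random(N)
    w /= w.sum()
    return [(w[j], state(), state()) for j in range(N)]
```

Calling `reduce_plan(random_plan(2, 12), trace_distance, 2)` on a qubit plan with twelve elements returns a plan with at most eight elements and, up to floating-point error, the same marginals and no larger cost.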
The existence of an optimal transport plan will be particularly useful
later when looking at examples, applications, and further properties.
Another key property of this Wpd is continuity, which we can prove pro-
vided that d is uniformly continuous. This will be necessary for a coherent
definition of a dual in Sect. 4.1.
Proposition 17. Suppose d is uniformly continuous on PH and let 1 ≤ p < ∞.
Then Wpd is uniformly continuous.
Proof. The proof is quite technical and not particularly instructive, so has
been placed in Appendix D 1 for the convenience of the reader.
For p = 1 specifically, we furthermore have joint convexity.
Proposition 18. W1d is jointly convex.
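Proof. Suppose $\rho_1, \sigma_1, \rho_2$, and $\sigma_2$ are quantum states, and let $r_1 + r_2 = 1$,
$r_i \ge 0$. Let $Q_1 = \{(q_{1,j}, |\psi_{1,j}\rangle, |\varphi_{1,j}\rangle)\}_{j \in J}$ be any transport plan between
$\rho_1$ and $\sigma_1$, and $Q_2 = \{(q_{2,k}, |\psi_{2,k}\rangle, |\varphi_{2,k}\rangle)\}_{k \in K}$ any transport plan
between $\rho_2$ and $\sigma_2$. Then
$Q = r_1 Q_1 \cup r_2 Q_2 := \{(r_1 q_{1,j}, |\psi_{1,j}\rangle, |\varphi_{1,j}\rangle)\}_{j \in J} \cup \{(r_2 q_{2,k}, |\psi_{2,k}\rangle, |\varphi_{2,k}\rangle)\}_{k \in K}$
is a transport plan between $r_1\rho_1 + r_2\rho_2$ and $r_1\sigma_1 + r_2\sigma_2$.
Then, we have
\begin{align}
T_1^d(Q) &= r_1 \sum_{j \in J} q_{1,j}\, d(|\psi_{1,j}\rangle, |\varphi_{1,j}\rangle) + r_2 \sum_{k \in K} q_{2,k}\, d(|\psi_{2,k}\rangle, |\varphi_{2,k}\rangle) \notag \\
&= r_1 T_1^d(Q_1) + r_2 T_1^d(Q_2). \tag{45}
\end{align}
Taking the infimum over $Q_1$ and $Q_2$ shows that
$W_1^d(r_1\rho_1 + r_2\rho_2, r_1\sigma_1 + r_2\sigma_2) \le r_1 W_1^d(\rho_1, \sigma_1) + r_2 W_1^d(\rho_2, \sigma_2)$.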
For p > 1, joint convexity does not hold even in the classical case: consider
for example the Hamming distance on the cube and measures $\mu_1 = \delta_{000}$, $\nu_1 = \delta_{001}$,
$\mu_2 = \delta_{111}$, $\nu_2 = \delta_{111}$ with $r_1 = r_2 = \frac{1}{2}$. To show that joint convexity does
not hold in the quantum case for p > 1, we give an example in Sect. 5.2.
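The classical counterexample can be verified by direct computation. The sketch below uses only the Python standard library; it relies on the standard fact that, for uniform measures on equally many atoms, an optimal coupling can be taken to be a permutation (Birkhoff-von Neumann), so a brute-force search over permutations suffices. The function names are hypothetical.

```python
from itertools import permutations

def hamming(x, y):
    """Hamming distance between two equal-length strings."""
    return sum(a != b for a, b in zip(x, y))

def wasserstein_uniform(xs, ys, p):
    """W_p between the uniform measures on the atom lists xs and ys
    (same length, repetitions allowed), by brute force over permutations."""
    n = len(xs)
    best = min(
        sum(hamming(x, ys[i]) ** p for x, i in zip(xs, perm))
        for perm in permutations(range(n))
    )
    return (best / n) ** (1 / p)

p = 2
# Left-hand side: distance between the mixtures (mu_1 + mu_2)/2 and (nu_1 + nu_2)/2.
lhs = wasserstein_uniform(["000", "111"], ["001", "111"], p)
# Right-hand side: the convex combination of the individual distances.
rhs = 0.5 * wasserstein_uniform(["000"], ["001"], p) \
    + 0.5 * wasserstein_uniform(["111"], ["111"], p)
```

Here `lhs` equals $(1/2)^{1/p}$ while `rhs` equals 1/2, so the mixture is strictly further apart than the convex combination whenever p > 1, and the two agree at p = 1.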
It is difficult to talk about how tensor products interact with Wpd in
general, as for two spaces H1 , H2 whose projectivizations are equipped with
metrics d1 , d2 , respectively, there is no natural choice of a metric on P(H1 ⊗H2 ).
However, if we equip P(H1 ⊗ H2 ) with a distance which is subadditive with
respect to d1 and d2 , then Wpd is also subadditive:
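Proposition 19. Let $H_a$, $H_b$ be finite-dimensional Hilbert spaces whose
projectivizations $PH_a$, $PH_b$ are equipped with metrics $d_a$, $d_b$, respectively. Let
$P(H_a \otimes H_b)$ be equipped with metric d which satisfies
\[ d_a(|\psi_a\rangle, |\varphi_a\rangle) + d_b(|\psi_b\rangle, |\varphi_b\rangle) \ge d(|\psi_a\rangle \otimes |\psi_b\rangle, |\varphi_a\rangle \otimes |\varphi_b\rangle). \tag{46} \]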
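Then for all p and all states $\rho_a, \sigma_a$ on $H_a$ and $\rho_b, \sigma_b$ on $H_b$, we have
\[ W_p^{d_a}(\rho_a, \sigma_a) + W_p^{d_b}(\rho_b, \sigma_b) \ge W_p^d(\rho_a \otimes \rho_b, \sigma_a \otimes \sigma_b). \tag{47} \]

Proof. Let $Q_a = \{(q_{j,a}, |\psi_{j,a}\rangle, |\varphi_{j,a}\rangle)\}_{j \in J}$ and $Q_b = \{(q_{k,b}, |\psi_{k,b}\rangle, |\varphi_{k,b}\rangle)\}_{k \in K}$
be transport plans from $\rho_a$ to $\sigma_a$ and from $\rho_b$ to $\sigma_b$, respectively. We will show
that the transport plan
$Q = \{(q_{j,a} q_{k,b}, |\psi_{j,a}\rangle \otimes |\psi_{k,b}\rangle, |\varphi_{j,a}\rangle \otimes |\varphi_{k,b}\rangle)\}_{j \in J, k \in K}$
has
\[ T_p^{d_a}(Q_a)^{1/p} + T_p^{d_b}(Q_b)^{1/p} \ge T_p^d(Q)^{1/p} . \tag{48} \]
Let $X_a$ be the random variable taking value j with probability
$q_{j,a}$, and similarly let $X_b$, independent of $X_a$, take value k with probability $q_{k,b}$.
Let $A_a = d_a(|\psi_{X_a,a}\rangle, |\varphi_{X_a,a}\rangle)$ and $A_b = d_b(|\psi_{X_b,b}\rangle, |\varphi_{X_b,b}\rangle)$. We see for i = a, b
that $T_p^{d_i}(Q_i)^{1/p}$ is by its definition the pth root of the pth moment of $A_i$. As
the pth root of the pth moment is an $L^p$ norm on random variables on the
measure space $(\mathcal{X}, \mu)$ induced by $X_a$ and $X_b$, we know that
\[ T_p^{d_a}(Q_a)^{1/p} + T_p^{d_b}(Q_b)^{1/p} \ge \mathbb{E}_\mu\big[ (A_a + A_b)^p \big]^{1/p} . \tag{49} \]
Equation (46) means that
$A_a + A_b \ge d(|\psi_{X_a,a}\rangle \otimes |\psi_{X_b,b}\rangle, |\varphi_{X_a,a}\rangle \otimes |\varphi_{X_b,b}\rangle)$, and so
\begin{align}
\mathbb{E}_\mu\big[ (A_a + A_b)^p \big]^{1/p} &\ge \mathbb{E}_\mu\big[ d(|\psi_{X_a,a}\rangle \otimes |\psi_{X_b,b}\rangle, |\varphi_{X_a,a}\rangle \otimes |\varphi_{X_b,b}\rangle)^p \big]^{1/p} \notag \\
&= T_p^d(Q)^{1/p} \tag{50}
\end{align}
as claimed.

Note that this result also holds for p = ∞, as can be seen by replacing
$W_p$ by $W_\infty$ and $T_p(\cdot)^{1/p}$ by $\sup_{j \in J} d(|\psi_j\rangle, |\varphi_j\rangle)$ in the proof.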

4.1. Dual Picture

We can now take a look at the dual picture for the first-order transport dis-
tance, with some regularity conditions on the underlying metric d which allow
the dual to be well defined. As we will see later, this will then allow us to
define a version of the Wasserstein distance for p = 1 that satisfies the triangle
inequality and inherits many of the properties of the original W1d .

Definition 20. Let d be a metric on PH. Suppose d is continuous and that
the 2-norm is Lipschitz with respect to d. We define the dual constant of a
self-adjoint operator O as
\[ L_d(O) = \sup_{\rho \ne \sigma} \frac{\mathrm{Tr}[O(\rho - \sigma)]}{W_1^d(\rho, \sigma)} . \tag{51} \]
We refer to this as the d-Lipschitz constant. Usually, in the classical
case, the Lipschitz constant of a function with respect to a metric is defined
by taking the supremum only with respect to point masses, not probability
distributions. As we see below in Proposition 21, we can also take the
supremum only with respect to pure states without loss of generality; we define
it with mixed states only for convenience.
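Proposition 21. Let d be a metric on PH. Suppose that the 2-norm is Lipschitz
with respect to d. Then,
\[ L_d(O) = \sup_{|\psi\rangle \ne |\varphi\rangle} \frac{\mathrm{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]}{d(|\psi\rangle, |\varphi\rangle)} . \tag{52} \]

Proof. For any ρ and σ, and any ε > 0, let $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j \in J}$ be a transport plan
between them whose first-order transport cost is at most $W_1^d(\rho, \sigma) + \varepsilon$. Then,
\begin{align}
L_d(O) &\le \sup_{\rho \ne \sigma} \frac{\mathrm{Tr}[O(\rho - \sigma)]}{W_1^d(\rho, \sigma)} \tag{53} \\
&\le \sup_{\rho \ne \sigma} \frac{\sum_{j \in J} q_j \mathrm{Tr}[O(|\psi_j\rangle\langle\psi_j| - |\varphi_j\rangle\langle\varphi_j|)]}{\sum_{j \in J} q_j \big( d(|\psi_j\rangle, |\varphi_j\rangle) - \varepsilon \big)} \tag{54} \\
&\le \sup_{\rho \ne \sigma} \max_{j \in J} \frac{\mathrm{Tr}[O(|\psi_j\rangle\langle\psi_j| - |\varphi_j\rangle\langle\varphi_j|)]}{d(|\psi_j\rangle, |\varphi_j\rangle) - \varepsilon} \tag{55} \\
&\le \sup_{|\psi\rangle \ne |\varphi\rangle} \frac{\mathrm{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]}{d(|\psi\rangle, |\varphi\rangle) - \varepsilon} \tag{56} \\
&= \sup_{|\psi\rangle \ne |\varphi\rangle} \frac{\mathrm{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]}{W_1^d(|\psi\rangle\langle\psi|, |\varphi\rangle\langle\varphi|) - \varepsilon} \tag{57} \\
&\to \sup_{|\psi\rangle \ne |\varphi\rangle} \frac{\mathrm{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]}{W_1^d(|\psi\rangle\langle\psi|, |\varphi\rangle\langle\varphi|)} \quad \text{as } \varepsilon \to 0 \tag{58} \\
&\le L_d(O) \tag{59}
\end{align}
and so this is a chain of equalities.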

Importantly, this means that this $L_d$ is a function of the metric d and is
independent of our definition of $W_p^d$. It also means that our choice of p = 1 in
the definition is arbitrary: defining $L_{d,p}(O) = \sup_{\rho \ne \sigma} \mathrm{Tr}[O(\rho - \sigma)] / W_p^d(\rho, \sigma)$
gives the same quantity as $L_d(O)$. It is also important to note that under
regularity conditions only slightly stricter than already established on d, this
quantity is a norm.

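Proposition 22. Let d be a metric on PH such that the 2-norm is Lipschitz
with respect to d. Then, $L_d(O)$ is a norm on the space of traceless self-adjoint
operators.

Proof. For finiteness, we use
$|\mathrm{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]| \le \|O\|_\infty \| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_1$
by Hölder's inequality and then that
$\|O\|_\infty \| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_1 \le 2 \|O\|_\infty \| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_2$
since $|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|$ has rank at most 2. As Lipschitz is the
same as Hölder continuous with exponent α = 1, we have that this is at most
$2C \|O\|_\infty W_1^d(|\psi\rangle\langle\psi|, |\varphi\rangle\langle\varphi|)$ for some positive constant C by Lemma 12.
To show nondegeneracy, suppose O is nonzero and let $|\psi\rangle$, $|\varphi\rangle$ be eigenstates
of O with positive and negative eigenvalues, respectively, which must exist
as O is traceless. Then, $\rho = |\psi\rangle\langle\psi|$, $\sigma = |\varphi\rangle\langle\varphi|$ gives
$\mathrm{Tr}[O(\rho - \sigma)] / W_1^d(\rho, \sigma) > 0$.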
For the other norm properties, clearly $L_d(\lambda O) = |\lambda| L_d(O)$. Now suppose
we have Hermitian operators $O_1$ and $O_2$. For any states ρ and σ, we have
\begin{align}
& \mathrm{Tr}[(O_1 + O_2)(\rho - \sigma)] / W_1^d(\rho, \sigma) \notag \\
& \quad = \mathrm{Tr}[O_1(\rho - \sigma)] / W_1^d(\rho, \sigma) + \mathrm{Tr}[O_2(\rho - \sigma)] / W_1^d(\rho, \sigma) \le L_d(O_1) + L_d(O_2) \tag{60}
\end{align}
and taking the supremum of the left-hand side over such ρ and σ proves the triangle
inequality for $L_d$.
We can then dualise one more time and consider the norm
\[ \|\rho - \sigma\|_{DW_1^d} = \sup_{L_d(O) \le 1} \mathrm{Tr}[O(\rho - \sigma)]. \tag{61} \]

Proposition 23. Let $\|\cdot\|_{DW_1^d}$ be defined by $\|X\|_{DW_1^d} = \sup_{L_d(O) \le 1} \mathrm{Tr}[OX]$ on
the space of traceless Hermitian operators. Then, this is a norm, and on quantum
states ρ, σ we have
\[ W_1^d(\rho, \sigma) \ge \|\rho - \sigma\|_{DW_1^d} . \tag{62} \]

Proof. For the norm, it is clear that when X = 0 we have $\|X\|_{DW_1^d}$ equal to
zero. The fact that $\|\lambda X\|_{DW_1^d} = |\lambda| \|X\|_{DW_1^d}$ is clear from the definition, and
the triangle inequality holds as
\begin{align}
\|X_1 + X_2\|_{DW_1^d} &= \sup_{L_d(O) \le 1} \mathrm{Tr}[O(X_1 + X_2)] \notag \\
&\le \sup_{L_d(O_1) \le 1} \mathrm{Tr}[O_1 X_1] + \sup_{L_d(O_2) \le 1} \mathrm{Tr}[O_2 X_2] \notag \\
&= \|X_1\|_{DW_1^d} + \|X_2\|_{DW_1^d} . \tag{63}
\end{align}
For nondegeneracy, take O to be the projector onto the positive part of X
normalised to $L_d(O) = 1$, which gives $\mathrm{Tr}[OX] > 0$.
On states ρ, σ, let O be any operator with $L_d(O) \le 1$. Then
\[ 1 \ge L_d(O) = \sup_{\rho' \ne \sigma'} \frac{\mathrm{Tr}[O(\rho' - \sigma')]}{W_1^d(\rho', \sigma')} \ge \frac{\mathrm{Tr}[O(\rho - \sigma)]}{W_1^d(\rho, \sigma)} \tag{64} \]
and so $\mathrm{Tr}[O(\rho - \sigma)] \le W_1^d(\rho, \sigma)$. Taking the supremum over O gives
$\|\rho - \sigma\|_{DW_1^d} \le W_1^d(\rho, \sigma)$.
The inequality $W_1^d(\rho, \sigma) \ge \|\rho - \sigma\|_{DW_1^d}$ is important in our understanding
of these quantities. It is as yet unclear whether or not equality holds in
this inequality, and establishing the conditions for equality is an important open
problem, not least because this would allow us to recover the triangle inequality
for $W_1^d$. There are some simple necessary conditions for equality,
namely that $W_1^d$ depends only on ρ − σ, and that it scales like a
norm.
However, we can consider the case where there exists a norm $\|\cdot\|_d$ such
that $d(|\psi\rangle, |\varphi\rangle) = \| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_d$. Replacing $\|\cdot\|_2$ with $\|\cdot\|_d$ in the proof
of Lemma 12, and using $d(|\psi\rangle, |\varphi\rangle) = \| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_d$, we get that
$W_1^d(\rho, \sigma) \ge \|\rho - \sigma\|_d$. This allows us to recover even more properties of $\|\cdot\|_{DW_1^d}$:
Proposition 24. Suppose that H is finite-dimensional and that there exists a norm $\|\cdot\|_d$
such that $d(|\psi\rangle, |\varphi\rangle) = \| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_d$. Then:
1. $W_1^d(\rho, \sigma) \ge \|\rho - \sigma\|_{DW_1^d} \ge \|\rho - \sigma\|_d$.
2. When ρ and σ are pure, this becomes an equality.
3. For all 1 ≤ q ≤ ∞, we have $W_q^d = W_q^{DW_1^d}$.

Proof. As discussed above, we know that $\|\rho - \sigma\|_d \le W_1^d(\rho, \sigma)$. Letting $\|\cdot\|_D$
be the dual to $\|\cdot\|_d$ on the space of traceless Hermitian operators, we have
then that $\|O\|_D \ge L_d(O)$. Dualising once more, as in finite dimension the
corresponding Banach space is reflexive, we know for all X that $\|X\|_d \le \|X\|_{DW_1^d}$.
Combining this with Proposition 23 gives the chain
\[ W_1^d(\rho, \sigma) \ge \|\rho - \sigma\|_{DW_1^d} \ge \|\rho - \sigma\|_d . \tag{65} \]
If ρ, σ are pure then by Proposition 11 we have $W_1^d(\rho, \sigma) = \|\rho - \sigma\|_d$, and so
the whole chain (65) is an equality. It follows that for $|\psi\rangle, |\varphi\rangle$, we have
\[ d(|\psi\rangle, |\varphi\rangle) = \| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_{DW_1^d} . \tag{66} \]
This means in turn that for all 1 ≤ q ≤ ∞, we have $W_q^d = W_q^{DW_1^d}$.
We restricted to the finite-dimensional case here to ensure that reflexivity
of the Banach space is automatic. In the infinite-dimensional separable case,
this result also holds when the Banach space corresponding to $\|\cdot\|_d$ is reflexive,
though this is no longer guaranteed. The first statement of Proposition 24 tells
us that $\|\cdot\|_{DW_1^d}$ is maximal among norms that agree with $\|\cdot\|_d$ on distances
between pure states, and the third statement tells us that we can, without
changing $W_1^d$, assume that the underlying norm $\|\cdot\|_d$ is just $\|\cdot\|_{DW_1^d}$.
We present this norm here, not as a replacement for $W_p^d$, but because
the two complement each other nicely. This method of defining $\|\cdot\|_{DW_1^d}$ via a
supremum of a trace of the difference of two states over Lipschitz observables
mirrors exactly the Kantorovich–Rubinstein theorem from classical optimal
transport [64, p. 34]. In taking the double dual, we gain an easy proof of the
triangle inequality, albeit at the expense of flexibility of order p and of natural
interpretation in terms of transport plans and couplings. We have seen here
that $W_p^d$ and $\|\cdot\|_{DW_1^d}$ are closely related, particularly when d is induced by a
norm. We will see later in Sect. 5.1 and Sect. 5.2 that in many cases $\|\cdot\|_{DW_1^d}$
essentially recovers the original norm $\|\cdot\|_d$.

5. Special Instances
5.1. $W_1^H$ Distance on n-qudit Systems
[50] introduced a quantum Wasserstein distance of order 1 which generalises
the Hamming distance on the discrete hypercube. This is defined on the Hilbert
space $H = (\mathbb{C}^d)^{\otimes n}$. The distance defined is normed, and we denote it here by
$\|\rho - \sigma\|_{W_1^H}$.
The norm $\|\cdot\|_{W_1^H}$ has the interesting property that it recovers the classical
first-order Wasserstein distance $W_1^H$ on the Hamming cube, for states that are
diagonal in the computational basis. That is, for r and s probability distributions
on $\{0, 1, \ldots, d-1\}^n$, and states $\rho = \sum_{x \in \{0,1,\ldots,d-1\}^n} r(x) |x\rangle\langle x|$ and
$\sigma = \sum_{y \in \{0,1,\ldots,d-1\}^n} s(y) |y\rangle\langle y|$ we have
\[ \|\rho - \sigma\|_{W_1^H} = W_1^H(r, s). \tag{67} \]
Furthermore, its formulation mirrors the Kantorovich–Rubinstein theorem
with the definition of the quantum Lipschitz constant [50, Definition 8]:
\[ \|O\|_L = \max\{ \mathrm{Tr}[O(\rho - \sigma)] : \|\rho - \sigma\|_{W_1^H} \le 1 \}. \tag{68} \]
It then induces a metric $d_H$ on PH, given by $d_H(|\psi\rangle, |\varphi\rangle) = \| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_{W_1^H}$,
from which we can define a pth-order Wasserstein distance $W_p^H$ as above. For
the special case p = 1, we will write $W^H_{p=1}$ to avoid confusion.
We might expect that the property in Eq. (67) extends to the $W_p^H$ distance:
that is, that for classical states ρ and σ defined by r and s as above,
\[ W_p^H(\rho, \sigma) = W_p^H(r, s). \tag{69} \]
However, in this case, we recover some interesting differences between the
classical and quantum definitions of the p-Wasserstein distances, by taking
'quantum shortcuts' in our transport plans.
For p = 1, equality occurs in (69): an optimal classical coupling γ between
r and s gives us the transport plan $\{(\gamma(x, y), |x\rangle, |y\rangle)\}_{x,y \in \{0,1,\ldots,d-1\}^n}$ which
has transport cost $W_1^H(r, s)$, and we know that
$W^H_{p=1}(\rho, \sigma) \ge \|\rho - \sigma\|_{W_1^H} = W_1^H(r, s)$ from Proposition 24 above and Proposition 6 of [50].

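However, for p > 1, we find an interesting phenomenon of quantum 'shortcuts'
between classical states. Indeed, consider the case n = d = 2 and the
states $\rho = |00\rangle\langle 00|$, $\sigma = I^{\otimes 2}/4$. Classically, the distribution $\delta_{00}$ and the uniform
distribution have a $W_\infty^H$ distance of 2. However, consider the quantum
transport plan
\[ \Big\{ \Big( \tfrac{1}{4}, |00\rangle, |\phi^{\pm\pm}\rangle \Big) \Big\} \tag{70} \]
for the standard 2-qubit Bell states
\begin{align}
|\phi^{++}\rangle &= \tfrac{1}{\sqrt{2}} (|00\rangle + |11\rangle), & |\phi^{+-}\rangle &= \tfrac{1}{\sqrt{2}} (|00\rangle - |11\rangle), \notag \\
|\phi^{-+}\rangle &= \tfrac{1}{\sqrt{2}} (|01\rangle + |10\rangle), & |\phi^{--}\rangle &= \tfrac{1}{\sqrt{2}} (|01\rangle - |10\rangle). \tag{71}
\end{align}
This corresponds to the separable coupling
\[ \tau = \sum_{\pm\pm} \tfrac{1}{4} |00\rangle\langle 00| \otimes |\phi^{\pm\pm}\rangle\langle\phi^{\pm\pm}| \tag{72} \]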
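of ρ and σ. Proposition 2 of [50] shows that both
\[ \big\| |00\rangle\langle 00| - |\phi^{+\pm}\rangle\langle\phi^{+\pm}| \big\|_{W_1^H} \le \sqrt{2} \quad \text{and} \quad \big\| |00\rangle\langle 00| - |\phi^{-\pm}\rangle\langle\phi^{-\pm}| \big\|_{W_1^H} \le \frac{\sqrt{5} + 3}{4} . \tag{73} \]
This in turn gives us $W_\infty^H(\rho, \sigma) \le \max\{ \sqrt{2}, \frac{\sqrt{5}+3}{4} \} = \sqrt{2}$. While these quantum
shortcuts are intriguing, the full extent to which they
can appear, and how they behave for different distances d, is as yet unclear.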
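The Bell-state plan can be checked to be a valid transport plan with a few lines of code. The sketch below assumes NumPy; it only verifies the marginals of the plan in Eq. (70) and then combines the upper bounds quoted in Eq. (73), which are taken as given here rather than computed, since evaluating the $W_1^H$ norm itself is beyond this illustration.

```python
import numpy as np

s = 1 / np.sqrt(2)
e = np.eye(4, dtype=complex)  # rows: |00>, |01>, |10>, |11>

# The four Bell states of Eq. (71).
bell = [s * (e[0] + e[3]), s * (e[0] - e[3]),
        s * (e[1] + e[2]), s * (e[1] - e[2])]

# Transport plan of Eq. (70): weight 1/4 from |00> to each Bell state.
plan = [(0.25, e[0], b) for b in bell]

rho = sum(q * np.outer(u, u.conj()) for q, u, _ in plan)
sigma = sum(q * np.outer(v, v.conj()) for q, _, v in plan)

# Upper bounds on d_H(|00>, Bell state), quoted from Eq. (73), not computed.
quoted_bounds = [np.sqrt(2), np.sqrt(2),
                 (np.sqrt(5) + 3) / 4, (np.sqrt(5) + 3) / 4]
w_inf_upper = max(quoted_bounds)  # bound on the order-infinity cost of the plan
classical_w_inf = 2               # Hamming distance from 00 to 11
```

The marginals come out as $|00\rangle\langle 00|$ and $I/4$, and the resulting bound $\sqrt{2}$ lies strictly below the classical value 2.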
From Proposition 14, we see that $W_p^H$ does inherit the invariance properties
of $\|\cdot\|_{W_1^H}$, namely invariance under local unitaries and qudit swaps. We
can also relate the double-dual norm $\|\cdot\|_{DW_1^H}$ to the norm $\|\cdot\|_{W_1^H}$ as follows.
Proposition 25. Let ρ, σ be states on $H = (\mathbb{C}^d)^{\otimes n}$. Then,
\[ 2 \|\rho - \sigma\|_{W_1^H} \ge \|\rho - \sigma\|_{DW_1^H} \ge \|\rho - \sigma\|_{W_1^H} . \tag{74} \]

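Proof. The lower bound comes from Eq. (65). For the upper bound, we refer
to the dual $\|O\|_L$ and show that $\|O\|_L \le 2 L_d(O)$. Taking the dual of this
inequality will give the upper bound required.
We have from [50, Proposition 15] that
\[ 2 \max_{1 \le i \le n} \| O - \mathrm{Tr}_i O \otimes I_i \|_\infty \ge \|O\|_L . \tag{75} \]
So we will show that $L_d(O) \ge \max_{1 \le i \le n} \| O - \mathrm{Tr}_i O \otimes I_i \|_\infty$.
Fix i, and let $|\psi\rangle$ be an eigenvector of $O - \mathrm{Tr}_i O \otimes I_i$ with an eigenvalue
which is maximal in absolute value, assuming without loss of generality that
this is positive. Let $\mu_{\mathrm{Haar}}$ be the Haar measure on the unitary group U(d)
acting on the ith qudit. Writing $U_i$ for a unitary on the ith qudit, $I_i$ for the
identity on the ith qudit, and $I_{\hat{i}}$ for the identity acting on all except the ith
qudit, we note that
\begin{align}
& \mathbb{E}_{U_i \sim \mu_{\mathrm{Haar}}} \mathrm{Tr}\big[ (O - \mathrm{Tr}_i O \otimes I_i)\, (U_i \otimes I_{\hat{i}}) |\psi\rangle\langle\psi| (U_i^\dagger \otimes I_{\hat{i}}) \big] \notag \\
& \quad = \mathrm{Tr}\big[ (O - \mathrm{Tr}_i O \otimes I_i)\, (\mathrm{Tr}_i |\psi\rangle\langle\psi| \otimes I_i / d) \big] = 0. \tag{76}
\end{align}
Therefore, taking any $U_i \in U(d)$ such that
$\mathrm{Tr}[ (O - \mathrm{Tr}_i O \otimes I_i)\, (U_i \otimes I_{\hat{i}}) |\psi\rangle\langle\psi| (U_i^\dagger \otimes I_{\hat{i}}) ] < 0$,
let $|\varphi\rangle = (U_i \otimes I_{\hat{i}}) |\psi\rangle$. Note that $\mathrm{Tr}_i[|\psi\rangle\langle\psi|] = \mathrm{Tr}_i[|\varphi\rangle\langle\varphi|]$
and therefore $\| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_{W_1^H} \le 1$. This gives
\begin{align}
\frac{\mathrm{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]}{\| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_{W_1^H}}
&\ge \frac{\mathrm{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]}{1} \tag{77} \\
&= \mathrm{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)] - \mathrm{Tr}\big[ \mathrm{Tr}_i O\, (\mathrm{Tr}_i |\psi\rangle\langle\psi| - \mathrm{Tr}_i |\varphi\rangle\langle\varphi|) \big] \tag{78} \\
&= \mathrm{Tr}[(O - \mathrm{Tr}_i O \otimes I_i)(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)] \tag{79} \\
&\ge \| O - \mathrm{Tr}_i O \otimes I_i \|_\infty \tag{80}
\end{align}
which concludes the proof.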
For this specific instance, we have a natural way to look at pth-order
quantum Wasserstein distances under tensor products. Indeed, for qudit systems
$H_a$ and $H_b$ each equipped with $\|\cdot\|_{W_1^H}$ on $n_a$ and $n_b$ qudits, respectively,
$H_a \otimes H_b$ can be equipped with $\|\cdot\|_{W_1^H}$ on $n_a + n_b$ qudits. We can therefore
apply the general result of Proposition 19 to $W_p^H$, as $\|\cdot\|_{W_1^H}$ is additive under tensor products
[50, Proposition 4].

5.2. Trace Distance

In this section, we consider the case where d is induced by the trace distance
(that is, half the trace norm). For pure states $|\psi\rangle, |\varphi\rangle$, we have
\[ d(|\psi\rangle, |\varphi\rangle) = \frac{1}{2} \big\| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \big\|_1 = \sqrt{1 - |\langle\psi|\varphi\rangle|^2} . \tag{81} \]
The trace distance, when applied to pure states on a D-dimensional space,
is a direct analogue of the discrete metric on a space with D elements. This
means that it is the simplest case to consider in terms of optimal transport.
The trace distance on mixed states can also be considered as a quantum
generalisation of the total variation distance $d_{TV}(\mu, \nu) = \frac{1}{2} \sum_{i=1}^{D} |\mu(i) - \nu(i)|$, which
is the classical $W_1$ distance on this discrete space.
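The identity in Eq. (81) is easy to confirm numerically. The following minimal sketch assumes NumPy and compares half the trace norm, computed from eigenvalues, with the closed form $\sqrt{1 - |\langle\psi|\varphi\rangle|^2}$ on randomly drawn pure states; the function names and the sampling scheme are illustrative choices.

```python
import numpy as np

def random_state(D, rng):
    """A random normalised vector in C^D."""
    v = rng.normal(size=D) + 1j * rng.normal(size=D)
    return v / np.linalg.norm(v)

def half_trace_norm(u, v):
    """(1/2) || |u><u| - |v><v| ||_1 via the eigenvalues of the difference."""
    X = np.outer(u, u.conj()) - np.outer(v, v.conj())
    return 0.5 * np.abs(np.linalg.eigvalsh(X)).sum()

def overlap_formula(u, v):
    """Closed form sqrt(1 - |<u|v>|^2) for pure states."""
    return np.sqrt(max(0.0, 1.0 - abs(np.vdot(u, v)) ** 2))

rng = np.random.default_rng(1)
pairs = [(random_state(D, rng), random_state(D, rng))
         for D in (2, 3, 5) for _ in range(20)]
max_gap = max(abs(half_trace_norm(u, v) - overlap_formula(u, v))
              for u, v in pairs)
```

Orthogonal states give distance 1 and identical states give 0, which is the sense in which this metric plays the role of the discrete metric.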
We write the pth-order transport cost associated with this distance as
$T_p^1$ and the associated quantum Wasserstein distance as $W_p^1$. We note from
Proposition 2 of [52] that this agrees with $W_p^H$ in the case of a single qudit.
From this we inherit all the properties of $W_p^H$, notably Proposition 25 which
gives us $\|\rho - \sigma\|_1 \ge \|\rho - \sigma\|_{DW_1^1} \ge \frac{1}{2} \|\rho - \sigma\|_1$. However, we can go one step
further.

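Proposition 26. Let ρ, σ be states on $\mathbb{C}^d$. Then,
\[ \|\rho - \sigma\|_{DW_1^1} = \frac{1}{2} \|\rho - \sigma\|_1 . \tag{82} \]
Proof. That $\|\rho - \sigma\|_{DW_1^1} \ge \frac{1}{2} \|\rho - \sigma\|_1$ is a direct consequence of Proposition
24. For the other direction, we once again consider the dual. The double dual
of the 1-norm is itself, and so to show inequality in the other direction we need
only show for any traceless Hermitian operator O that
\[ \sup_{\rho \ne \sigma} \frac{\mathrm{Tr}[O(\rho - \sigma)]}{\frac{1}{2} \|\rho - \sigma\|_1} \le L_1(O) \tag{83} \]
as taking the dual of this inequality would give the inequality in the other
direction.
For any ρ ≠ σ, let $\rho - \sigma = \rho' - \sigma'$ for $\rho', \sigma'$ positive semidefinite operators
with $\rho' \perp \sigma'$. Then, write $\rho' = \sum_{j \in J} \mu_j |\psi_j\rangle\langle\psi_j|$ and $\sigma' = \sum_{k \in K} \nu_k |\varphi_k\rangle\langle\varphi_k|$ in
spectral decompositions. Let γ be a classical coupling of the measures μ and
ν. It then follows that
\[ \frac{1}{2} \|\rho - \sigma\|_1 = \frac{1}{2} \sum_{j \in J, k \in K} \gamma_{j,k} \big\| |\psi_j\rangle\langle\psi_j| - |\varphi_k\rangle\langle\varphi_k| \big\|_1 \tag{84} \]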
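and so
\begin{align}
\frac{\mathrm{Tr}[O(\rho - \sigma)]}{\frac{1}{2} \|\rho - \sigma\|_1}
&= \frac{\sum_{j \in J, k \in K} \gamma_{j,k} \mathrm{Tr}[O(|\psi_j\rangle\langle\psi_j| - |\varphi_k\rangle\langle\varphi_k|)]}{\frac{1}{2} \sum_{j \in J, k \in K} \gamma_{j,k} \| |\psi_j\rangle\langle\psi_j| - |\varphi_k\rangle\langle\varphi_k| \|_1} \notag \\
&\le \sup_{|\psi\rangle \ne |\varphi\rangle} \frac{\mathrm{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]}{\frac{1}{2} \| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \|_1} = L_1(O) \tag{85}
\end{align}
which concludes the proof.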
We can also now give an example to show that joint convexity does not
hold in the case p > 1, analogously to the classical case. Using the notation of
Proposition 18 consider $\rho_1 = |0\rangle\langle 0|$, $\sigma_1 = |0\rangle\langle 0|$, $\rho_2 = |0\rangle\langle 0|$, and $\sigma_2 = |1\rangle\langle 1|$,
with $r_1 = r_2 = \frac{1}{2}$. In this case $r_1 W_p^1(\rho_1, \sigma_1) + r_2 W_p^1(\rho_2, \sigma_2) = \frac{1}{2}$ no matter
the value of p, and $r_1\rho_1 + r_2\rho_2 = |0\rangle\langle 0|$, $r_1\sigma_1 + r_2\sigma_2 = I/2$. By the proof of
Proposition 24, we have that for any transport plan $Q = \{(q_j, |0\rangle, |\varphi_j\rangle)\}_{j \in J}$
between $|0\rangle\langle 0|$ and $I/2$ we have
\[ T_p^1(Q)^{1/p} \ge T_1^1(Q) \ge \frac{1}{2} \Big\| |0\rangle\langle 0| - \frac{I}{2} \Big\|_1 = \frac{1}{2} . \tag{86} \]
Equality in the first comparison happens if and only if $d(|0\rangle, |\varphi_j\rangle)$ is constant
in j since $x \mapsto x^p$ is strictly convex. This means that the $|\varphi_j\rangle$ must all
take the form $\alpha |0\rangle + \sqrt{1 - |\alpha|^2}\, e^{i\theta_j} |1\rangle$ for some fixed α. We must have $\alpha = \frac{1}{\sqrt{2}}$ to
ensure the end state is indeed $I/2$. Equality in the second comparison then
happens only if all $|0\rangle\langle 0| - |\varphi_j\rangle\langle\varphi_j|$ commute, meaning all phases $e^{i\theta_j}$ must be the
same. But then the end state cannot be $I/2$, therefore a plan Q between $|0\rangle\langle 0|$
and $I/2$ satisfying equality here cannot exist. We conclude that joint convexity
does not hold for p > 1.
5.3. Complexity Geometry
A distance d(I, U) giving lower bounds for the complexity of synthesising a
unitary $U \in SU(2^n)$ from a universal one- and two-qubit gate set was defined
in [46] and refined in [22,45]. This distance d on $SU(2^n)$ is a geodesic distance
of a Riemannian manifold, where the Riemannian metric is chosen such that
local travel is fast in directions corresponding to multiplication by low-weight
unitaries, and slow in directions corresponding to multiplication by high-weight
unitaries.
This idea of expressing quantum gate complexity in terms of Riemannian
geometry has seen renewed interest in recent years, from applications in black
hole thermodynamics [9,35] to rigid bodies [10] and the complexity of typical
unitaries [8]. However, this metric is originally defined as a distance between
unitaries, with the complexity of a unitary expressed in terms of its distance
from the identity. This extends naturally to distances between pure states as
the lowest complexity of a unitary which transforms one into the other. Our
optimal transport formulation allows for a natural extension of this metric
to mixed states, in a way that can be considered as quantifying the lowest
possible complexity of transforming one mixed state into another. A related
approach in [59] extends a variation of this complexity geometry metric, one
in which multiplication by a unitary is independent of weight, to mixed states

using purification methods described in [2]. Another approach in [39] instead
studies complexity using the · W1H norm.
Formally, [46] defines a right-invariant Riemannian metric $g$ on $\mathrm{SU}(2^n)$
given by the following inner product on $T_I\mathrm{SU}(2^n)$. A vector $X$ in the tangent
space $T_U\mathrm{SU}(2^n)$ to $\mathrm{SU}(2^n)$ at $U$ is identified with its Hamiltonian
representation $H$, given by $X = \frac{d}{dt}\, e^{-iHt}U \big|_{t=0}$. $H$ can be decomposed
as $H_P + H_Q$, where $H_P$ is a linear combination of Pauli matrices of weight at most 2, and
$H_Q$ is a linear combination of Pauli matrices of weight at least 3. Defining
operations $\mathcal{P}$ and $\mathcal{Q}$ by $\mathcal{P}(H) = H_P$ and $\mathcal{Q}(H) = H_Q$,
the inner product on $T_U\mathrm{SU}(2^n)$ is then defined by
\[
  \langle H, J\rangle = \frac{\mathrm{Tr}(H\mathcal{P}(J)) + q\,\mathrm{Tr}(H\mathcal{Q}(J))}{2^n} \tag{87}
\]
where $q > 4^n$ is a penalty parameter. The length $\sqrt{\langle H, H\rangle}$ will then be of
order 1 for 2-local Hamiltonians, and of order $\sqrt{q}$ otherwise. The choice $q > 4^n$ is made
such that for any geodesic, the associated Hamiltonian path has an approximately
low-weight decomposition.
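As a small illustration of Eq. (87), the sketch below evaluates the penalised inner product for Hamiltonians given by their Pauli coefficients. It assumes a hypothetical representation (a dictionary from Pauli strings such as `"XXI"` to real coefficients) and uses only the orthogonality relation $\mathrm{Tr}(\sigma\sigma') = 2^n\delta_{\sigma\sigma'}$, so the factor $2^n$ cancels. The names `pauli_weight`, `penalty_inner` and `normalised_hs_norm`, and the value of `q`, are illustrative choices rather than anything taken from [46].

```python
from math import sqrt


def pauli_weight(label: str) -> int:
    """Number of qubits on which a Pauli string acts non-trivially."""
    return sum(1 for c in label if c != "I")


def penalty_inner(H: dict, J: dict, q: float) -> float:
    """<H, J> as in Eq. (87) for H, J given by real Pauli coefficients.

    Since Tr(sigma sigma') = 2^n when sigma == sigma' and 0 otherwise,
    the 2^n in the denominator cancels and only matching strings contribute.
    """
    total = 0.0
    for label, h in H.items():
        penalty = 1.0 if pauli_weight(label) <= 2 else q
        total += penalty * h * J.get(label, 0.0)
    return total


def normalised_hs_norm(H: dict) -> float:
    """2^{-n/2} ||H||_2, which equals the Euclidean norm of the coefficients."""
    return sqrt(sum(h * h for h in H.values()))


n = 3
q = 4.0 ** n + 1.0                     # any q > 4^n; the "+ 1" is arbitrary
H_local = {"XXI": 0.6, "IZI": 0.8}     # only weight <= 2 terms
H_nonlocal = {"XYZ": 1.0}              # a single weight-3 term

len_local = sqrt(penalty_inner(H_local, H_local, q))
len_nonlocal = sqrt(penalty_inner(H_nonlocal, H_nonlocal, q))
```

In this toy case the 2-local Hamiltonian has length independent of `q`, while the weight-3 term has its length scaled by the penalty, and in both cases the length is at least `normalised_hs_norm`.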
The length of a curve $U(t)$ defined by evolution $\dot{U}(t) = -iH(t)U(t)$ is
then defined in the standard way by
\[
  \int \sqrt{\langle H(t), H(t)\rangle}\,dt \tag{88}
\]
and the distance $d(I, U)$ is simply the geodesic distance on this Riemannian
manifold.
The main purpose of $d$ is to find a geometric interpretation of gate complexity,
and [22, Equation 3] gives bounds for $d$ in terms of gate complexity.
Let $G(U)$ be the exact gate complexity of $U$, i.e. the minimal number of one-
and two-qubit gates required to synthesise $U$ exactly. For $\epsilon \geq 0$, let $G(U, \epsilon)$ be
the minimal number of one- and two-qubit gates required to synthesise a gate
$V$ such that $\|U - V\|_\infty \leq \epsilon$, also known as the $\epsilon$-approximate gate complexity.
We then have bounds
\[
  \frac{\kappa\, G(U, \epsilon)^{1/3}\epsilon^{2/3}}{n^2} \leq d(I, U) \leq G(U) \tag{89}
\]
for some constant $\kappa > 0$.
With these unitaries acting on the $n$-qubit space $\mathcal{H} = (\mathbb{C}^2)^{\otimes n}$, we can
define a metric $d_C$ on $P\mathcal{H}$ by
\[
  d_C(|\psi\rangle, |\varphi\rangle) = \min\{d(I, U) : U \in \mathrm{SU}(2^n),\ U|\psi\rangle = |\varphi\rangle\}. \tag{90}
\]
The metric properties of $d_C$ come directly from the metric properties and
right-invariance of $d$. Operationally, this gives a lower bound for the minimum
circuit complexity required to synthesise $|\varphi\rangle$ from $|\psi\rangle$. Indeed,
\[
  \frac{\kappa\,(\min\{G(U, \epsilon) : U|\psi\rangle = |\varphi\rangle\})^{1/3}\epsilon^{2/3}}{n^2}
  \leq d_C(|\psi\rangle, |\varphi\rangle)
  \leq \min\{G(U) : U|\psi\rangle = |\varphi\rangle\}. \tag{91}
\]
In order to say anything useful about the quantum Wasserstein distance
induced by this metric $d_C$, we must demonstrate some basic continuity
properties. Continuity of $d_C$ is inherited from the fact that $d$ is a geodesic distance
on a Riemannian manifold, and Hölder continuity of the 2-norm with respect
to $d_C$ comes from the following proposition.

Proposition 27. Let $U \in \mathrm{SU}(2^n)$. Then,
\[
  d(I, U) \geq 2^{-n/2}\,\|U - I\|_2. \tag{92}
\]

Proof. Suppose $U(t)$ is a geodesic curve in $\mathrm{SU}(2^n)$ with $U(0) = I$ and $U(T) = U$,
generated by Hamiltonian $H(t)$ satisfying $\dot{U}(t) = -iH(t)U(t)$. By a family
of smooth curves uniformly approximating $U$ we may assume that both $U$ and
$H$ are smooth, with first-order Taylor expansion
$U(t+h) = U(t) - iH(t)U(t)h + \delta(t, h)h$ for some $\delta$ such that for all $t$,
$\delta(t, h) \to 0$ as $h \to 0$. By smoothly extending $U$ we may assume that $\delta$ is defined on
some $[0, T] \times (-\epsilon, \epsilon)$, and as $[0, T]$ is compact this convergence to 0 is
uniform in $t$. Then, for any $N$, we have
\begin{align}
  \|U - I\|_2 &\leq \sum_{j=0}^{N-1} \|U((j+1)T/N) - U(jT/N)\|_2 \tag{93} \\
  &\leq \sum_{j=0}^{N-1} \left\| -i(T/N)H(jT/N)U(jT/N) + (T/N)\delta(jT/N, T/N) \right\|_2 \tag{94} \\
  &\leq \sum_{j=0}^{N-1} \left( \frac{T}{N}\|{-i}H(jT/N)U(jT/N)\|_2 + \frac{T}{N}\|\delta(jT/N, T/N)\|_2 \right)
     \quad \text{by the triangle inequality} \tag{95} \\
  &\leq \left( \sum_{j=0}^{N-1} \frac{T}{N}\|H(jT/N)\|_2 \right) + T \sup_t \|\delta(t, T/N)\|_2
     \quad \text{by unitary invariance of } \|\cdot\|_2 \tag{96} \\
  &\to \int_0^T \|H(t)\|_2\,dt \quad \text{as } N \to \infty \text{ by smoothness of } H \tag{97} \\
  &\leq 2^{n/2} \int_0^T \sqrt{\langle H(t), H(t)\rangle}\,dt \tag{98}
\end{align}
using the inequality $2^{-n/2}\|H\|_2 \leq \sqrt{\langle H, H\rangle}$ in the last line. Taking the
infimum over such curves gives $d(I, U) \geq 2^{-n/2}\|U - I\|_2$.

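Corollary 28. Let $|\psi\rangle, |\varphi\rangle \in P\mathcal{H}$. Then,
\[
  2^{(n+1)/2}\, d_C(|\psi\rangle, |\varphi\rangle) \geq \big\| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \big\|_2. \tag{99}
\]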

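Proof. Let $U \in \mathrm{SU}(2^n)$ have $U|\psi\rangle = |\varphi\rangle$. Then,
\begin{align}
  \|U - I\|_2^2 &= \mathrm{Tr}[(U - I)^\dagger (U - I)] \tag{100} \\
  &\geq \langle\psi|(U - I)^\dagger (U - I)|\psi\rangle \tag{101} \\
  &\geq 2(1 - |\langle\psi|\varphi\rangle|). \tag{102}
\end{align}
Using the formula
$\big\| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \big\|_2^2 = 2(1 - |\langle\psi|\varphi\rangle|^2)$, we get
\begin{align}
  \|U - I\|_2^2 &\geq 2\left(1 - \sqrt{1 - \tfrac{1}{2}\big\| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \big\|_2^2}\right) \tag{103} \\
  &\geq \tfrac{1}{2}\big\| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \big\|_2^2. \tag{104}
\end{align}
Combining this with Proposition 27 gives the result.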
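The two algebraic facts used in this proof can be checked numerically on random pure states. The following is a minimal sketch using only the Python standard library; the helper names (`random_state`, `overlap`, `hs_dist_sq`) and the choice of dimension and seed are assumptions made for illustration. It checks the identity $\||\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|\|_2^2 = 2(1 - |\langle\psi|\varphi\rangle|^2)$ and the step from Eq. (102) to Eq. (104), namely $2(1 - |\langle\psi|\varphi\rangle|) \geq \frac{1}{2}\||\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|\|_2^2$.

```python
import random


def random_state(dim, rng):
    """Normalised complex Gaussian vector (uniform on the unit sphere)."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = sum(abs(a) ** 2 for a in v) ** 0.5
    return [a / norm for a in v]


def overlap(psi, phi):
    """Inner product <psi|phi>."""
    return sum(a.conjugate() * b for a, b in zip(psi, phi))


def hs_dist_sq(psi, phi):
    """Squared Hilbert-Schmidt norm of |psi><psi| - |phi><phi|."""
    total = 0.0
    for i in range(len(psi)):
        for j in range(len(psi)):
            diff = psi[i] * psi[j].conjugate() - phi[i] * phi[j].conjugate()
            total += abs(diff) ** 2
    return total


rng = random.Random(0)
worst_identity_error = 0.0
inequality_ok = True
for _ in range(200):
    psi, phi = random_state(4, rng), random_state(4, rng)
    ov = abs(overlap(psi, phi))
    hs = hs_dist_sq(psi, phi)
    # Identity used between Eqs. (102) and (103).
    worst_identity_error = max(worst_identity_error, abs(hs - 2 * (1 - ov ** 2)))
    # Consequence of Eqs. (102)-(104).
    if 2 * (1 - ov) < 0.5 * hs - 1e-12:
        inequality_ok = False
```

The inequality holds because $2(1-a) \geq (1-a)(1+a)$ for $a \in [0,1]$; the loop only confirms this on samples and is not a proof.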

It follows from Proposition 12 that all $W_p^C$ defined from this $d_C$ are
nondegenerate. This gives a natural extension of the ideas of state complexity
to mixed states, to which we can apply results such as those which will be
discussed in Sect. 6.2 on classical-quantum (cq) sources.
Looking at $\rho$ and $\sigma$ in their eigenbases, we can give a concrete
interpretation of these values. Indeed, let
\[
  \rho = \sum_{b \in \{0,1\}^n} r_b\, |\psi_b\rangle\langle\psi_b| \qquad
  \sigma = \sum_{c \in \{0,1\}^n} s_c\, |\varphi_c\rangle\langle\varphi_c| \tag{105}
\]
for $r$ and $s$ some classical probability distributions on $\{0, 1\}^n$. Letting $U$ be
any unitary such that all $U|\psi_b\rangle = |b\rangle$, $V$ any unitary such that all
$V|\varphi_c\rangle = |c\rangle$, and $q$ be an optimal $p$th-order classical coupling of the
distributions $r$ and $s$, we can consider the transport plan
\[
  Q = \{(q_{bc}, |\psi_b\rangle, |\varphi_c\rangle)\}_{b,c \in \{0,1\}^n} \tag{106}
\]
which has a first-order transport cost of at most $G(U) + G(V) + W_1^H(r, s)$.
This allows us to conclude that
\[
  W_1^C(\rho, \sigma) \leq G(U) + G(V) + W_1^H(r, s). \tag{107}
\]
This splits the quantum transport distance arising from complexity geometry
into two parts: the classical transport cost between the states, and their
quantum complexity as a whole.
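For this specific instance we also have subadditivity as discussed in Proposition 19,
as there is a natural choice for the distance $d_C$ on the tensor product.
Namely, for two systems $\mathcal{H}_a, \mathcal{H}_b$ on $n_a, n_b$ qubits, respectively, we can equip
$P(\mathcal{H}_a \otimes \mathcal{H}_b)$ with the distance $d_C$ on $n_a + n_b$ qubits. For states
$|\psi_a\rangle, |\varphi_a\rangle$ on $\mathcal{H}_a$ and $|\psi_b\rangle, |\varphi_b\rangle$ on $\mathcal{H}_b$, let
$U_a|\psi_a\rangle = |\varphi_a\rangle$ and $U_b|\psi_b\rangle = |\varphi_b\rangle$. We have
\begin{align}
  d(I_a \otimes I_b, U_a \otimes U_b) &\leq d(I_a \otimes I_b, U_a \otimes I_b) + d(U_a \otimes I_b, U_a \otimes U_b) \tag{108} \\
  &= d(I_a \otimes I_b, U_a \otimes I_b) + d(I_a \otimes I_b, I_a \otimes U_b) \tag{109} \\
  &\leq d(I_a, U_a) + d(I_b, U_b) \tag{110}
\end{align}
and therefore
$d_C(|\psi_a\rangle \otimes |\psi_b\rangle, |\varphi_a\rangle \otimes |\varphi_b\rangle) \leq d_C(|\psi_a\rangle, |\varphi_a\rangle) + d_C(|\psi_b\rangle, |\varphi_b\rangle)$.
The condition of Proposition 19 is satisfied, and so under the natural choice
$d_C$ on $P(\mathcal{H}_a \otimes \mathcal{H}_b)$ the $W_p^d$ distances are subadditive.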

6. Applications
6.1. Results for Random Quantum States
To understand these quantities in general, it is useful to look at how they
behave on random quantum states. We will look at both the versions stemming
from $\|\cdot\|_{W_1^H}$ and from complexity geometry.
We look at a few regimes in the definition of ‘random’ states. For random
pure states, we generate $|\psi\rangle$ according to the uniform measure on the unit
sphere in $\mathcal{H}$, and take $\rho = |\psi\rangle\langle\psi|$. For mixed states, we adjoin an auxiliary
system $A$ of dimension $s$, generate a random pure state $|\psi\rangle$ on $\mathcal{H} \otimes A$, then
take $\rho = \mathrm{Tr}_A |\psi\rangle\langle\psi|$. The distribution of $\rho$ depends entirely on the values of $s$
and $\dim\mathcal{H}$ chosen.
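The sampling procedure just described can be sketched in a few lines. The code below is an illustrative standard-library implementation, not code from the paper: it draws a pure state on $\mathcal{H} \otimes A$ by normalising a complex Gaussian vector (which is uniformly distributed on the unit sphere), then takes the partial trace over $A$. The function names `random_mixed_state`, `trace` and `purity` are assumptions introduced here.

```python
import random


def random_mixed_state(dim_h, dim_a, rng):
    """Return rho = Tr_A |psi><psi| for a uniformly random pure state on H (x) A.

    The amplitudes are stored as a dim_h x dim_a matrix psi[i][k], so that
    rho[i][j] = sum_k psi[i][k] * conj(psi[j][k]).
    """
    amps = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim_a)]
            for _ in range(dim_h)]
    norm = sum(abs(a) ** 2 for row in amps for a in row) ** 0.5
    amps = [[a / norm for a in row] for row in amps]
    return [[sum(amps[i][k] * amps[j][k].conjugate() for k in range(dim_a))
             for j in range(dim_h)]
            for i in range(dim_h)]


def trace(rho):
    return sum(rho[i][i] for i in range(len(rho)))


def purity(rho):
    """Tr[rho^2] for a Hermitian matrix, i.e. the sum of |rho_ij|^2."""
    return sum(abs(x) ** 2 for row in rho for x in row)


rng = random.Random(7)
rho_pure = random_mixed_state(4, 1, rng)    # s = 1: a random pure state
rho_low = random_mixed_state(4, 2, rng)     # s < dim H: low-rank regime
rho_high = random_mixed_state(4, 64, rng)   # s > dim H: high-rank regime
```

Since the rank of $\rho$ is at most $\min(s, \dim\mathcal{H})$, the purity of each sample is at least $1/\min(s, \dim\mathcal{H})$, and it equals 1 when $s = 1$.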
Note that for the two versions studied, the underlying space $\mathcal{H}$ is a qudit
space $\mathcal{H} = (\mathbb{C}^d)^{\otimes n}$. It is convenient, in this case, to write $s = d^m$, as this
allows us to consider the qudit ratio $c = \frac{m}{n}$ between the auxiliary and base
systems. Note that while $s$ is always an integer, $m$ need not be, i.e. we will also
consider auxiliary dimensions that are not a power of $d$. We can then consider
the regime $c < 1$ as ‘low rank’, and the regime $c > 1$ as ‘high rank’. We will see
that there is a marked phase transition between values $c < 1$, in which $W_1^H$
and $W_1^C$ grow on average like $\mathrm{diam}(P\mathcal{H})$ in $n$, and values $c > c^*$ for
some threshold $c^* \geq 1$ depending on $\mathcal{H}$, in which all $W_1^d$ decay exponentially
with $n$ on average.
Previous works have highlighted similar properties for other quantities.
For example, [60] shows that for the regime $c > 1$ we have
$\mathbb{E}[S(\rho)] \geq n \log d - \frac{1}{2} d^{-(c-1)n}$, from which we can show that the quantum
relative entropy $D(\rho\|\sigma) = \mathrm{Tr}[\rho \log \rho - \rho \log \sigma]$ has
$\mathbb{E}[D(\rho\|\sigma)]$ decaying exponentially in $n$ for $c > 1$.
Meanwhile, two random states $\rho, \sigma$ with $c < 1$ have
$\mathrm{span}\,\rho \not\subseteq \mathrm{span}\,\sigma$ with probability 1, and so $D(\rho\|\sigma)$
is in general infinite. Similar results have been found for the trace distance [61], showing
that for $c < 1$ we have $\mathbb{E}\left[\frac{1}{2}\|\rho - \sigma\|_1\right] \to 1$ as $n \to \infty$,
and for $c > 1$ we have $\mathbb{E}\left[\frac{1}{2}\|\rho - \sigma\|_1\right] \to 0$ as $n \to \infty$.
However, these existing analyses relate to the notion of distinguishability of states, whereas
analysing the phase transition in $W_1^d$ allows us to go beyond this regime to discuss other
properties such as the computational complexity between random states.
For mixed states in the high-rank regime, we see exponential decay in
the expected $W_1^d$ distance between two i.i.d. states, no matter the underlying
metric $d$. This is summarised in the following proposition. Note that while it
is phrased for qudit systems, it applies for any $\mathcal{H}, A$ where
$c = \frac{\log \dim A}{\log \dim \mathcal{H}}$ and $d^n = \dim \mathcal{H}$.
We write $\mathrm{diam}_d(P\mathcal{H})$ for the diameter of $P\mathcal{H}$ under metric $d$.
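Proposition 29. Let $\mathcal{H} = (\mathbb{C}^d)^{\otimes n}$ with i.i.d. random mixed states $\rho, \sigma$
generated by an auxiliary system $A$ of dimension $s = d^m$. Let $c = \frac{m}{n}$, and suppose
$c > 1$. Then, for any $\beta > 0$, we have
\[
  \mathbb{P}\left[ W_1^d(\rho, \sigma) \geq \beta d^{-(c-3)n/2}\, \mathrm{diam}_d(P\mathcal{H}) \right] \leq \frac{1}{\beta^2}. \tag{111}
\]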
Proof. For random $\rho, \sigma$ generated from large auxiliary systems, we generally
expect both to be close to maximally mixed. And so letting the minimum
eigenvalue among $\rho$ and $\sigma$ be $\frac{1}{d^n} - \delta$, we can split up $\rho$ into parts
$\rho - \left(\frac{1}{d^n} - \delta\right) I$ and $\left(\frac{1}{d^n} - \delta\right) I$, both of which
are positive semidefinite. We can do the same for $\sigma$. We can then transport
$\left(\frac{1}{d^n} - \delta\right) I$ onto $\left(\frac{1}{d^n} - \delta\right) I$ at zero cost, and transport
$\rho - \left(\frac{1}{d^n} - \delta\right) I$ onto $\sigma - \left(\frac{1}{d^n} - \delta\right) I$ via any
partial transport plan, at a maximum cost of
$\mathrm{Tr}\left[\rho - \left(\frac{1}{d^n} - \delta\right) I\right] \mathrm{diam}_d(P\mathcal{H}) = \delta d^n\, \mathrm{diam}_d(P\mathcal{H})$.
We will show that this is most likely very small.

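Focusing on $\rho$, we know from [60] that
\[
  \mathbb{E}[S(\rho)] \geq n \log d - \frac{1}{2} d^{-(c-1)n}. \tag{112}
\]
Knowing also that for the von Neumann entropy $S$ we have $S(\rho) \leq n \log d$,
and using the top-down Markov inequality, we get that for any $\alpha > 0$ we have
\[
  \mathbb{P}\left[ S(\rho) \geq n \log d - \frac{\alpha}{2} d^{-(c-1)n} \right] \geq 1 - \frac{1}{\alpha}. \tag{113}
\]
Using then that $D\left(\rho \,\middle\|\, \frac{I}{d^n}\right) = n \log d - S(\rho)$, and the quantum
Pinsker's inequality, we have
\[
  \mathbb{P}\left[ \left\| \rho - \frac{I}{d^n} \right\|_1 \leq \sqrt{\frac{\alpha}{2}}\, d^{-(c-1)n/2} \right] \geq 1 - \frac{1}{\alpha} \tag{114}
\]
and so
\[
  \mathbb{P}\left[ \left\| \rho - \frac{I}{d^n} \right\|_\infty \leq \sqrt{\frac{\alpha}{2}}\, d^{-(c-1)n/2} \right] \geq 1 - \frac{1}{\alpha}. \tag{115}
\]
Reintroducing $\sigma$, we then get that
\[
  \mathbb{P}\left[ \max\left\{ \left\| \rho - \frac{I}{d^n} \right\|_\infty, \left\| \sigma - \frac{I}{d^n} \right\|_\infty \right\}
  \leq \sqrt{\frac{\alpha}{2}}\, d^{-(c-1)n/2} \right] \geq \left(1 - \frac{1}{\alpha}\right)^2 \geq 1 - \frac{2}{\alpha}. \tag{116}
\]
Note that for any $\rho, \sigma$ with
$\max\left\{ \left\| \rho - \frac{I}{d^n} \right\|_\infty, \left\| \sigma - \frac{I}{d^n} \right\|_\infty \right\} \leq \sqrt{\frac{\alpha}{2}}\, d^{-(c-1)n/2}$,
we can take $\delta = \sqrt{\frac{\alpha}{2}}\, d^{-(c-1)n/2}$. Then, setting $\beta = \sqrt{\alpha/2}$, and applying the
transport plan described above, we have
\[
  \mathbb{P}\left[ W_1^d(\rho, \sigma) \geq \sqrt{\frac{\alpha}{2}}\, d^{-(c-1)n/2} \cdot d^n\, \mathrm{diam}_d(P\mathcal{H}) \right] \leq \frac{2}{\alpha} \tag{117}
\]
and so
\[
  \mathbb{P}\left[ W_1^d(\rho, \sigma) \geq \beta d^{-(c-3)n/2}\, \mathrm{diam}_d(P\mathcal{H}) \right] \leq \frac{1}{\beta^2}. \tag{118}
\]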

This result can be easily applied to the two underlying distances mentioned
above. For qudit ratios $c > 3 + \frac{2}{n} \log_d \mathrm{diam}_d(P\mathcal{H})$ we can define $\beta$ by,
for any $\lambda \in (0, 1)$, the value
\[
  \log_d \beta = \lambda \left( c - 3 - \frac{2}{n} \log_d \mathrm{diam}_d(P\mathcal{H}) \right) \frac{n}{2} \tag{119}
\]
to show that the probability of an exponentially small deviation falls exponentially
as the number $n$ of qudits increases.
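For the $W_1^H$ case, we know $\mathrm{diam}_d(P\mathcal{H}) = n$, and so this exponential
decay applies for qudit ratio $c > 3$ and large enough $n$. Applying this to the
expectation gives, taking $\lambda = 1/3$ for an optimal decay rate,
\begin{align}
  \mathbb{E}_{\rho,\sigma} W_1^H(\rho, \sigma)
  &\leq \left(1 - \frac{1}{\beta^2}\right) \beta d^{-(c-3)n/2}\, \mathrm{diam}_d(P\mathcal{H}) + \frac{1}{\beta^2}\, \mathrm{diam}_d(P\mathcal{H}) \tag{120} \\
  &\leq n^{1-\lambda} \exp_d\left( -\tfrac{1}{2}(1 - \lambda)(c - 3)n \right) + n^{1+2\lambda} \exp_d\left( -\lambda(c - 3)n \right) \tag{121} \\
  &= \exp_d\left( -\tfrac{1}{3}(c - 3)n + \tfrac{2}{3} \log_d n \right) + \exp_d\left( -\tfrac{1}{3}(c - 3)n + \tfrac{5}{3} \log_d n \right) \tag{122} \\
  &= \left( n^{2/3} + n^{5/3} \right) \exp_d\left( -\tfrac{1}{3}(c - 3)n \right). \tag{123}
\end{align}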
For the $W_1^C$ case, any $n$-qubit gate can be synthesised in at most $2^n(2^n - 1)$
one- and two-qubit gates [44], and so the metric space $P\mathcal{H}$ has diameter at
most $2^{2n}$. It then follows that for any qudit ratio $c > 7$, the probability of an
exponentially large deviation becomes exponentially small.
Applying Eq. (111) to the expectation gives, taking $\lambda = \frac{c-3}{3(c-7)}$ for an
optimal decay rate,
\begin{align}
  \mathbb{E}_{\rho,\sigma} W_1^C(\rho, \sigma)
  &\leq \left(1 - \frac{1}{\beta^2}\right) \beta\, 2^{-(c-3)n/2}\, \mathrm{diam}_d(P\mathcal{H}) + \frac{1}{\beta^2}\, \mathrm{diam}_d(P\mathcal{H}) \tag{124} \\
  &\leq 2^{-(1-\lambda)(c-3)n/2}\, 2^{(1-\lambda)2n} + 2^{-\lambda(c-3)n}\, 2^{(1+2\lambda)2n} \tag{125} \\
  &= 2^{-(1-\lambda)(c-7)n/2} + 2^{-(\lambda(c-7)-2)n} \tag{126} \\
  &= 2 \cdot 2^{-(c-9)n/3} \tag{127}
\end{align}
giving exponential decay in expectation for qubit ratios $c > 9$.
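The choice of $\lambda$ in the $W_1^C$ case can be sanity-checked by comparing the two exponents in Eq. (126). The sketch below is an illustrative check, with the function names `exponent_terms` and `balanced_lambda` and the sample value of `c` chosen here for demonstration. It evaluates both per-qubit exponents, confirms that they coincide at $\lambda = \frac{c-3}{3(c-7)}$ with common value $-(c-9)/3$, and uses a coarse grid search to confirm that this balance point minimises the larger of the two.

```python
def exponent_terms(c, lam):
    """Per-qubit base-2 exponents of the two terms in Eq. (126)."""
    first = -(1 - lam) * (c - 7) / 2
    second = -(lam * (c - 7) - 2)
    return first, second


def balanced_lambda(c):
    """Value of lambda at which the two exponents in Eq. (126) agree."""
    return (c - 3) / (3 * (c - 7))


c = 12.0                                  # sample qubit ratio with c > 9
lam = balanced_lambda(c)
first, second = exponent_terms(c, lam)

# Grid search over lambda in (0, 1): the slower-decaying term dominates,
# so the best lambda minimises the maximum of the two exponents.
grid = [i / 1000 for i in range(1, 1000)]
best = min(grid, key=lambda l: max(exponent_terms(c, l)))
```

The first exponent increases with $\lambda$ and the second decreases, which is why the optimum sits where they cross; the condition $\lambda < 1$ is equivalent to $c > 9$.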
For the low-rank setting, we look first at the $W_1^H$ distance. There are
two lines of intuition here. The first is that $W_1^H$ generalises the Hamming
distance, and the Hamming distance quantifies the local distinguishability of
$d$-ary strings. If this property propagated to the quantum setting, we would
expect $W_1^H$ to be small on average, as random pure states are generally locally
indistinguishable. This was the behaviour conjectured in [24]. The second is
that the average Hamming distance between two $d$-ary strings of length $n$ is
$n(1 - 1/d)$, and so we might also expect the average $W_1^H$ distance between
random pure states to grow linearly with the number of qudits.
Let $\mathcal{H} = (\mathbb{C}^d)^{\otimes n}$ and let $m = \log_d s$, noting again that, while $s$ is an
integer, $m$ need not be. In the case $m < n$, we can apply Theorem 9.1 of [54]
to lower bound the expected distance between two low-rank random mixed
states. A similar result was noted independently in [49].
Proposition 30. Let $\rho, \sigma$ be two i.i.d. random mixed states on $\mathcal{H} = (\mathbb{C}^d)^{\otimes n}$
generated using an auxiliary system of dimension $s = d^m$ for $m < n$. Write
$c = \frac{m}{n}$. Then,
\[
  \mathbb{E}_{\rho,\sigma} W_{p=1}^H(\rho, \sigma) \geq \lambda_c n \tag{128}
\]
where $\lambda_c$ satisfies $(1 - c) \log d = h_2(\lambda_c) + \lambda_c \log(d^2 - 1)$ for $h_2$ the binary entropy.
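Proof. First note from Proposition 24 that $W_{p=1}^H(\rho, \sigma) \geq \|\rho - \sigma\|_{W_1^H}$, and so
we prove that $\mathbb{E}_{\rho,\sigma} \|\rho - \sigma\|_{W_1^H} \geq \lambda_c n$.
Fix $\rho$, and note that averaging over $\sigma$ and using convexity of the norm,
we have
\begin{align}
  \mathbb{E}_\sigma \|\rho - \sigma\|_{W_1^H} &\geq \|\rho - \mathbb{E}_\sigma \sigma\|_{W_1^H} \tag{129} \\
  &= \left\| \rho - \frac{I_d^{\otimes n}}{d^n} \right\|_{W_1^H}. \tag{130}
\end{align}
Applying Theorem 9.1 of [54] then gives
\begin{align}
  (1 - c) \log d &\leq \frac{1}{n} \left| S(\rho) - S\left( \frac{I_d^{\otimes n}}{d^n} \right) \right| \notag \\
  &\leq h_2\left( \frac{\|\rho - I_d^{\otimes n}/d^n\|_{W_1^H}}{n} \right)
     + \frac{\|\rho - I_d^{\otimes n}/d^n\|_{W_1^H}}{n} \log(d^2 - 1). \tag{131}
\end{align}
Noting then that the function $g(t) = h_2(t) + t \log(d^2 - 1)$ takes the value
$(1 - c) \log d$ at exactly one value $\lambda_c \in [0, 1]$, and that for $t < \lambda_c$ we have
$g(t) < g(\lambda_c)$ and for $t > \lambda_c$ we have $g(t) > g(\lambda_c)$, we conclude that
$\|\rho - I_d^{\otimes n}/d^n\|_{W_1^H} \geq \lambda_c n$. Averaging over $\rho$ gives the result.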
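The constant $\lambda_c$ in Proposition 30 is defined only implicitly, so a short numerical sketch may help to see its size. The code below solves $(1 - c)\log d = h_2(\lambda) + \lambda \log(d^2 - 1)$ by bisection. It is an illustration under stated assumptions: natural logarithms are used throughout (the solution does not depend on the base, as long as $h_2$ and $\log$ use the same one), the search is restricted to $[0, 1 - 1/d^2]$ where $g$ is increasing, and the names `h2`, `g` and `lambda_c` are introduced here.

```python
from math import log


def h2(t):
    """Binary entropy in nats, with h2(0) = h2(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * log(t) - (1 - t) * log(1 - t)


def g(t, d):
    """g(t) = h2(t) + t log(d^2 - 1), as in the proof of Proposition 30."""
    return h2(t) + t * log(d * d - 1)


def lambda_c(c, d, tol=1e-12):
    """Solve g(lambda) = (1 - c) log d by bisection.

    g is increasing on [0, 1 - 1/d^2] and reaches 2 log d at the right end,
    so the target (1 - c) log d <= log d is attained exactly once there.
    """
    target = (1 - c) * log(d)
    lo, hi = 0.0, 1.0 - 1.0 / (d * d)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid, d) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


lam_pure_qubits = lambda_c(0.0, 2)    # random pure states (m = 0), qubits
lam_half_qubits = lambda_c(0.5, 2)    # auxiliary system with c = 1/2
```

As expected from the proposition, the returned value decreases as $c$ grows towards 1, where the bound becomes trivial.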

For $\rho, \sigma$ random pure states, we simply take the case $m = 0$. In general,
this shows that the expected $W_{p=1}^H$ distance between two random states
generated using small auxiliary systems grows linearly with the number $n$ of
qudits. This is in direct contrast to the initial conjecture, and brings up
interesting ideas about the nature of $\|\cdot\|_{W_1^H}$. On one hand, this gives insight into
the behaviour of the widely celebrated $\|\cdot\|_{W_1^H}$ norm, showing that it behaves
qualitatively very differently in the quantum setting to the classical setting. It
also highlights the importance of entangled states to the analysis of $\|\cdot\|_{W_1^H}$. This
in turn opens up questions about the qualitative nature of Lipschitz operators
according to its dual norm $\|\cdot\|_L$. We have shown here that Lipschitz operators
applied to $\rho - \sigma$ do not necessarily detect local distinguishability, so what do
they detect? On the other hand, this calls into question the assumptions made
in [38] that the weighted sum of Pauli coefficients of a Hermitian operator $O$
is a good approximation to $\|O\|_L$, as such an approximation links directly to
local distinguishability in the dual setting. This will be discussed further in
Sect. 6.4.
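This property can be understood more concretely by looking at the distance
operator [28] of a state $|\psi\rangle$. The distance operator
$\Lambda_{|\psi\rangle} = \sum_{i=1}^{n} i\, \Pi_{V_i \cap V_{i-1}^\perp}$
is defined as a weighted sum of projectors onto subspaces $V_i \cap V_{i-1}^\perp$, where
\[
  V_i = \mathrm{span}\{O|\psi\rangle : O \text{ acts on at most } i \text{ qudits}\}. \tag{132}
\]
This has been specifically constructed so that $\mathrm{Tr}[\Lambda_{|\psi\rangle} |\psi\rangle\langle\psi|] = 0$, and that if
$|\varphi\rangle$ can be written as a sum of states which differ from $|\psi\rangle$ in at most $k$ qudits,
then $\mathrm{Tr}[\Lambda_{|\psi\rangle} |\varphi\rangle\langle\varphi|] \leq k$. It was proven in [54] that
$\|\Lambda_{|\psi\rangle}\|_L \leq 1$, and analysis of the dimension of $V_i$ shows that
$\mathrm{Tr}[\Lambda_{|\psi\rangle}] = O(n d^n)$. It follows that, for fixed $|\psi\rangle$,
$\mathbb{E}_{|\varphi\rangle \sim \mu_{\mathrm{Haar}}} \mathrm{Tr}[\Lambda_{|\psi\rangle} (|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)] = O(n)$.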
Turning our attention to the $W_1^C$ distance generated from complexity
geometry on $P\left((\mathbb{C}^2)^{\otimes n}\right)$, we see a similar picture for low-rank states. As noted
earlier, for the approximate gate complexity $G(U, \epsilon)$ and the gate complexity
$G(U)$, we have the bound
\[
  \frac{\kappa\, G(U, \epsilon)^{1/3} \epsilon^{2/3}}{n^2} \leq d(I, U) \leq G(U) \tag{133}
\]
for some constant $\kappa > 0$.
For pure states, we also know that
$W_p^C(|\psi\rangle\langle\psi|, |\varphi\rangle\langle\varphi|) = d_C(|\psi\rangle, |\varphi\rangle)$.
And so to show the behaviour of $W_p^C$ distances on low-rank states, we look at
$d_C$.
Lemma 31. Let $\mathcal{H} = (\mathbb{C}^2)^{\otimes n}$ be an $n$-qubit space, and let $S \subseteq \mathrm{SU}(4)$ be a
finite universal gate set with inverses. Letting $G_S(U, \epsilon)$ be the $\epsilon$-approximate
gate complexity of $U$ from set $S$ viewed as a set of gates on 2 qubits, and
$G(U, \epsilon)$ the $\epsilon$-approximate gate complexity of $U$ using any one- or two-qubit
gates, we have
\[
  G(U, \epsilon)\, \mathrm{poly}\left( \log(G(U, \epsilon)) + \log(\epsilon^{-1}) \right) \geq G_S(U, 2\epsilon). \tag{134}
\]
Proof. Let $V_1, \ldots, V_{G(U,\epsilon)}$ be any circuit of one- and two-qubit gates to synthesise
$V$ such that $\|U - V\|_\infty \leq \epsilon$. Using Solovay–Kitaev, each of these can be
approximated to within error $\epsilon/G(U, \epsilon)$ in $\mathrm{poly}(\log(G(U, \epsilon)/\epsilon))$ gates from $S$.
Compounding errors linearly, these form a circuit of length
$G(U, \epsilon)\, \mathrm{poly}(\log(G(U, \epsilon)) + \log(\epsilon^{-1}))$ of gates from $S$ which
synthesises $U$ to within operator norm $2\epsilon$.
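Lemma 32. Let $S \subseteq \mathrm{SU}(4)$ be a finite universal gate set with inverses and
$\rho, \sigma$ i.i.d. random quantum states on $\mathcal{H} = (\mathbb{C}^2)^{\otimes n}$ generated by auxiliary
system $A$ of integer dimension $s = 2^{cn}$ where $0 \leq c < 1$. Let
$G_S(U^{\mathrm{opt}}_{\rho\to\sigma}, \epsilon) = \min\left\{ G_S\left(U^{\mathrm{opt}}_{|\psi\rangle\to|\varphi\rangle}, \epsilon\right) : |\psi\rangle \in \mathrm{span}\,\rho,\ |\varphi\rangle \in \mathrm{span}\,\sigma \right\}$.
Then,
\[
  \mathbb{P}_{\rho,\sigma}\left[ G_S(U^{\mathrm{opt}}_{\rho\to\sigma}, \epsilon) \leq 2^{(1-\delta)n} \right] \leq e^{-\Omega(2^n \log(1/\epsilon))}. \tag{135}
\]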
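Proof. Let $\mathcal{C}_x$ be the set of circuits of $S$ of length $x$. Then
\[
  G_S(U^{\mathrm{opt}}_{\rho\to\sigma}, \epsilon) \leq 2^{(1-\delta)n} \implies
  \exists\, |\psi\rangle \in \mathrm{span}\,\rho,\ |\varphi\rangle \in \mathrm{span}\,\sigma,\ C \in \mathcal{C}_{2^{(1-\delta)n}}
  \text{ s.t. } C|\psi\rangle \approx_\epsilon |\varphi\rangle. \tag{136}
\]
Note that $\mathrm{span}\,\rho$ and $\mathrm{span}\,\sigma$ are i.i.d. hyperplanes distributed according to the
Haar measure of the Grassmannian $\mathrm{Gr}_s\mathcal{H}$. Fix an element $R$ of $\mathrm{Gr}_s\mathcal{H}$ and choose
an $\epsilon$-ball covering of $(R \setminus \{0\})/\mathbb{C}$ of minimal size $N = e^{O(2^{cn} \log(1/\epsilon))}$ centred on
elements $\{y_i\}_{i=1}^N$. Choose fixed unitaries $U_\rho$ and $U_\sigma$ such that $U_\rho R = \mathrm{span}\,\rho$
and $U_\sigma R = \mathrm{span}\,\sigma$, and independent random unitaries $V_\rho, V_\sigma \sim \mu_{\mathrm{Haar}}$ on
$\mathrm{span}\,\rho$, $\mathrm{span}\,\sigma$, respectively. This gives randomly chosen independent
$\epsilon$-covers $\{Y_i = V_\rho U_\rho y_i\}_{i=1}^N$ of $\mathrm{span}\,\rho$ and $\{Z_j = V_\sigma U_\sigma y_j\}_{j=1}^N$ of
$\mathrm{span}\,\sigma$. Each $Y_i$, $Z_j$ is distributed according to the Haar measure on
$\mathrm{span}\,\rho$, $\mathrm{span}\,\sigma$, respectively. And so
\begin{align}
  &\mathbb{P}\left[ \exists\, |\psi\rangle \in \mathrm{span}\,\rho,\ |\varphi\rangle \in \mathrm{span}\,\sigma,\ C \in \mathcal{C}_{2^{(1-\delta)n}} \text{ s.t. } C|\psi\rangle \approx_\epsilon |\varphi\rangle \right] \tag{137} \\
  &\quad\leq \mathbb{P}\left[ \exists\, 1 \leq i, j \leq N,\ C \in \mathcal{C}_{2^{(1-\delta)n}} \text{ s.t. } C Y_i \approx_{3\epsilon} Z_j \right] \tag{138} \\
  &\quad\leq \sum_{i,j=1}^{N} \sum_{C \in \mathcal{C}_{2^{(1-\delta)n}}} \mathbb{P}\left[ C Y_i \approx_{3\epsilon} Z_j \right] \tag{139} \\
  &\quad= N^2\, |\mathcal{C}_{2^{(1-\delta)n}}|\, \mathbb{P}_{|\psi\rangle, |\varphi\rangle \sim_{\mathrm{i.i.d.}} \mu_{\mathrm{Haar}}}\left[ |\psi\rangle \approx_{3\epsilon} |\varphi\rangle \right] \tag{140} \\
  &\quad= e^{O(2^{cn} \log(1/\epsilon))}\, (n(n-1)|S|)^{2^{(1-\delta)n}}\, \frac{\mathrm{vol}(3\epsilon\text{-ball})}{\mathrm{vol}(P\mathcal{H})} \tag{141} \\
  &\quad= e^{O(2^{cn} \log(1/\epsilon))}\, e^{O(2^{(1-\delta)n} \log n)}\, e^{-\Omega(2^n \log(1/\epsilon))} \tag{142} \\
  &\quad= e^{-\Omega(2^n \log(1/\epsilon))}. \tag{143}
\end{align}

Combining these two lemmas with Eq. (89) gives the following result.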
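Corollary 33. Let $\rho, \sigma$ be i.i.d. states on $\mathcal{H} = (\mathbb{C}^2)^{\otimes n}$ generated by an auxiliary
system $A$ of dimension $s = 2^{cn}$ where $0 \leq c < 1$. For all $\delta > 0$,
\[
  \mathbb{P}\left[ W_1^C(\rho, \sigma) \leq n^{-2} \kappa\, \epsilon^{2/3} \left( \frac{2^{(1-\delta)n}}{\mathrm{poly}(n, \log \epsilon^{-1})} \right)^{1/3} \right]
  \leq e^{-\Omega(2^n \log((2\epsilon)^{-1}))}. \tag{144}
\]
Proof. The proof is technical and not particularly instructive, so has been
placed in Appendix D.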
In other words, the chance of two independent low-rank states on n qubits
being less than exponentially far apart in the complexity geometry quantum
Wasserstein distance becomes exponentially small as n tends towards inﬁnity.
As far as we are aware, no such link between Wasserstein distances and
circuit complexity has been made in the classical case due to the discreteness
of classical circuits. In particular, such comments on the computational com-
plexity cannot ever be made from a classical point of view, as the maximum
classical computational complexity between any two n-bit strings is at most
n, whereas the quantum computational complexity between any two n-qubit
states is, in general, exponential in n. Statements which accurately quantify the
computational complexity between mixed quantum states will always require
quantum tools, which again points to the importance of the $W_p^C$ distances.

The first-order Wasserstein distance of [50] has been used in the quantum
setting to give lower bounds on the circuit complexity of shallow random
quantum circuits [39], though, just as from the classical point of view, the
bound $\|\rho - \sigma\|_{W_1^H} \leq n$ greatly restricts the effectiveness of this lower bound.
The results are only significant for circuits with a number of gates linear in the
number of qubits, as the maximum lower bound possible using $\|\cdot\|_{W_1^H}$ is $n$. One
potential application of $W_p^C$ could be to extend this result to give lower bounds
on the circuit complexity of random quantum circuits of arbitrary depth, as it
does not suffer from this linear constraint.

6.2. Operational Interpretation in Terms of Classical–Quantum Sources

In this section, we show that the $W_p^d$ distances have an operational significance
for distances between classical–quantum (cq) states and classical–quantum
(cq) sources. This application is particularly relevant to the quantum Wasserstein
distances presented in this work because its value is enhanced both by the
adaptability in terms of the underlying distance $d$, and by the generality in
order $p$. Indeed, let $R$ and $S$ be two cq sources, each controlled by a classical
random variable $X$ on $\{1, \ldots, N\}$ with probabilities $p_i$. On input $i$, let $R$
output $|\psi_i\rangle\langle\psi_i|$ and $S$ output $|\varphi_i\rangle\langle\varphi_i|$. We define $r$ to be the random variable
which is the output of $R$, and $s$ the random variable which is the output of $S$.
Let the output be in finite-dimensional Hilbert space $\mathcal{H}$ and equip $P\mathcal{H}$ with
distance $d$. These sources can be simulated by measuring the first (classical)
register of the following cq states in the standard basis:
\[
  \tilde{\rho} = \sum_{i=1}^{N} p_i\, |i\rangle\langle i| \otimes |\psi_i\rangle\langle\psi_i| \qquad
  \tilde{\sigma} = \sum_{i=1}^{N} p_i\, |i\rangle\langle i| \otimes |\varphi_i\rangle\langle\varphi_i|. \tag{145}
\]
Letting then $\rho = \sum_{i=1}^{N} p_i |\psi_i\rangle\langle\psi_i|$ and $\sigma = \sum_{i=1}^{N} p_i |\varphi_i\rangle\langle\varphi_i|$, these $\rho$ and
$\sigma$ are the expected outputs of $R$ and $S$, respectively. Using $d$, we can then
talk about the distance between the outputs $r$ and $s$ as a random variable
taking value $d(|\psi_i\rangle, |\varphi_i\rangle)$ when $X = i$. The $W_p^d$ distance provides a lower
bound on the $p$th moment of the distance between the outputs. Broadly
speaking, we interpret the distance as the cost of moving between $R$ and $S$.
Proposition 34. Let $R$ and $S$ be cq sources with expected outputs $\rho$ and $\sigma$,
respectively, on dimension $D$, and let $1 \leq p < \infty$. Given access to the output
register and classical control register, the expected distance $d$ between the
outputs $r$ and $s$ satisfies
\[
  \mathbb{E}_X[d(r, s)^p] \geq W_p^d(\rho, \sigma)^p \tag{146}
\]
and this bound is sharp.
Proof. For the lower bound, consider the quantum transport plan
$Q = \{(p_i, |\psi_i\rangle, |\varphi_i\rangle)\}_{i=1}^N$ between $\rho$ and $\sigma$. The $p$th-order cost of this
transport plan is
\[
  T_p^d(Q) = \sum_{i=1}^{N} p_i\, d(|\psi_i\rangle, |\varphi_i\rangle)^p = \mathbb{E}_X[d(r, s)^p] \tag{147}
\]
which is lower bounded by the optimal $p$th-order quantum transport cost
$W_p^d(\rho, \sigma)^p$.
For sharpness, let $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j \in J}$ be any finite transport plan
between $\rho$ and $\sigma$. The sources
\[
  R = \sum_{j \in J} q_j\, |j\rangle\langle j| \otimes |\psi_j\rangle\langle\psi_j| \qquad
  S = \sum_{j \in J} q_j\, |j\rangle\langle j| \otimes |\varphi_j\rangle\langle\varphi_j| \tag{148}
\]
controlled by random variable $Y$ taking values in $J$ with probabilities $q_j$, then
have $p$th moment
\[
  \mathbb{E}_Y[d(r, s)^p] = \sum_{j \in J} q_j\, d(|\psi_j\rangle, |\varphi_j\rangle)^p. \tag{149}
\]
Taking the infimum over all transport plans gives sharpness.
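To make the objects in this proof concrete, the following sketch evaluates the $p$th-order cost of Eq. (147) for two hand-picked transport plans between $|0\rangle\langle 0|$ and $I/2$ on a single qubit, and checks that both plans have the required marginals. It is an illustration under stated assumptions: the underlying metric is taken to be the trace distance between pure states, $d(|\psi\rangle, |\varphi\rangle) = \sqrt{1 - |\langle\psi|\varphi\rangle|^2}$, and the names `cost`, `marginal`, `plan_basis` and `plan_hadamard` are introduced here. Neither plan is claimed to be optimal.

```python
from math import sqrt


def overlap(psi, phi):
    return sum(a.conjugate() * b for a, b in zip(psi, phi))


def trace_distance(psi, phi):
    """Trace distance between pure states: sqrt(1 - |<psi|phi>|^2)."""
    return sqrt(max(0.0, 1.0 - abs(overlap(psi, phi)) ** 2))


def marginal(plan, index):
    """Sum_j q_j |v_j><v_j| with v_j the source (index 1) or target (index 2)."""
    dim = len(plan[0][1])
    rho = [[0j] * dim for _ in range(dim)]
    for entry in plan:
        q, vec = entry[0], entry[index]
        for a in range(dim):
            for b in range(dim):
                rho[a][b] += q * vec[a] * vec[b].conjugate()
    return rho


def cost(plan, p, d=trace_distance):
    """p-th order transport cost T_p^d(Q) = sum_j q_j d(psi_j, phi_j)^p."""
    return sum(q * d(psi, phi) ** p for q, psi, phi in plan)


def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))


s = 1 / sqrt(2)
zero, one = [1 + 0j, 0j], [0j, 1 + 0j]
plus, minus = [s + 0j, s + 0j], [s + 0j, -s + 0j]

# Two plans from |0><0| to I/2, written as lists of (q_j, |psi_j>, |phi_j>).
plan_basis = [(0.5, zero, zero), (0.5, zero, one)]
plan_hadamard = [(0.5, zero, plus), (0.5, zero, minus)]

rho_target = [[1 + 0j, 0j], [0j, 0j]]
sigma_target = [[0.5 + 0j, 0j], [0j, 0.5 + 0j]]
same_marginals = all(
    close(marginal(pl, 1), rho_target) and close(marginal(pl, 2), sigma_target)
    for pl in (plan_basis, plan_hadamard)
)
```

Under this choice of metric, `plan_basis` is cheaper for $p = 1$ while `plan_hadamard` is cheaper for $p = 3$, which illustrates that the preferred plan can depend on the order $p$.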

We also get from Proposition 16 that when $d$ is continuous, we can take
the first register to have size at most $2D^2$ in the equality case.
In the case where $d$ is the complexity geometry metric $d_C$, this means
that $W_p^C$ effectively quantifies the $p$th moment of the gate complexity of
transforming one source into another, post-output. Indeed, for sources $R$ and $S$ as
above, and $U^{\mathrm{opt}}_{|\psi\rangle\to|\varphi\rangle} \in \mathrm{SU}(2^n)$ a unitary with
$U^{\mathrm{opt}}_{|\psi\rangle\to|\varphi\rangle}|\psi\rangle = |\varphi\rangle$, and minimal
complexity among all such $U$, we know that
\[
  \mathbb{E}_X\left[ G(U^{\mathrm{opt}}_{r\to s})^p \right] \geq W_p^C(\rho, \sigma)^p. \tag{150}
\]
From Proposition 16, for any $\rho, \sigma$ we may take an optimal $p$th-order transport
plan $Q$ from $\rho$ to $\sigma$ and let $R$ and $S$ be cq sources defined from $Q$ as above.
From Eq. (133), we have
\[
  d_C(|\psi_j\rangle, |\varphi_j\rangle) \geq \frac{\kappa\, G\left(U^{\mathrm{opt}}_{|\psi_j\rangle\to|\varphi_j\rangle}, \epsilon\right)^{1/3} \epsilon^{2/3}}{n^2}, \tag{151}
\]
and therefore, raising to the $p$th power and averaging, there exist cq sources $R, S$
controlled by the same random variable with expected outputs $\rho, \sigma$, respectively, such that
\[
  W_p^C(\rho, \sigma)^p \geq \frac{\kappa^p\, \mathbb{E}\left[ G(U^{\mathrm{opt}}_{r\to s}, \epsilon)^{p/3} \right] \epsilon^{2p/3}}{n^{2p}}. \tag{152}
\]
In other words, for any $\rho, \sigma$ there exist cq sources $R$ and $S$ with expected
outputs $\rho, \sigma$ such that the $p$th power of the $p$th-order Wasserstein distance
between their expected outputs upper bounds, up to the prefactor above, the $(p/3)$th moment of
the $\epsilon$-approximate gate complexity of transforming between them post-output. We
note that such a broad application to arbitrary moments is only possible for
our $W_p^d$, as existing quantum definitions do not cover beyond $p = 1$ and $p = 2$.
In the infinite-order setting, the $W_\infty^d$ distance gives a lower bound for the
highest possible value of $d(r, s)$. It follows from the definition of $W_\infty^d$ that
\[
  \max_{1 \leq i \leq N} d(|\psi_i\rangle, |\varphi_i\rangle) \geq W_\infty^d(\rho, \sigma) \tag{153}
\]
and taking the infimum of the left-hand side over $\mathcal{Q}(\rho, \sigma)$ shows that the bound
is sharp. Operationally, for the $W_\infty^C$ distance, this means $W_\infty^C(\rho, \sigma)$ is a lower
bound for the worst-case scenario cost of transforming $R$ into $S$, post-output.
For the lower bound, we cannot guarantee an analogue to Eq. (152) as the
existence of an optimal transport plan requires continuity of the Wasserstein
distances, which is not given (either classically or quantumly) for $p = \infty$.
6.3. Hypercontractivity and Noise
In the hierarchy of $W_p^d$ in order $p$, we saw in Eq. (22) that if $p_1 < p_2$ then
$W_{p_1}^d(\rho, \sigma) \leq W_{p_2}^d(\rho, \sigma)$. This hierarchy mirrors the hierarchy in the standard
$L^p$ norms, for which the notion of hypercontractivity [4,5,37] has been used in
various ways to quantify the noise of an operation [37,47]. Broadly speaking,
an operator $T$ on a space of functions is hypercontractive if, for some $p_2 > p_1$,
we have for all functions $f$ that $\|Tf\|_{p_2} \leq \|f\|_{p_1}$. Hypercontractivity and the
noise of an operator are most closely linked in the hypercontractivity theorem
[5, Chapter 7], which demonstrates the hypercontractive properties of the
standard Boolean noise operator.
We will show that this idea carries over to the quantum $W_p^d$ distances,
and that the ratio $\frac{W_{p_1}^d(\rho, \sigma)}{W_{p_2}^d(\mathcal{N}(\rho), \mathcal{N}(\sigma))}$
can be considered as a measure of noise
in the channel $\mathcal{N}$ and be used as a tool to derive other useful properties, like
concentration inequalities. This is a technique which is unique to this definition
of a quantum Wasserstein distance, as it requires direct comparison of quantum
Wasserstein distances of two different orders $p_1, p_2$. Such a comparison has not
yet been possible as no other definition covers more than one order $p$. While
different definitions of a quantum Wasserstein distance exist for both $p = 1$ and
$p = 2$, they are qualitatively so different that they cannot yet be compared in
any meaningful way to discuss hypercontractivity. Furthermore, our definition
of hypercontractivity does not require us to restrict to quantum channels that
have a faithful fixed point, as is the case for the standard approach [7].
To illustrate this advantage, we will study hypercontractive properties of
the replacement channel $R_{\delta,x}$, given by
\[
  R_{\delta,x}(\rho) = (1 - \delta)\rho + \delta |x\rangle\langle x| \tag{154}
\]
and the depolarising channel $S_\delta$ given by
\[
  S_\delta(\rho) = (1 - \delta)\rho + \delta I/D \tag{155}
\]
where $D = \dim \mathcal{H}$. These are both examples of a more general type of
depolarising map
\[
  S_{\delta,\tau}(\rho) = (1 - \delta)\rho + \delta\tau \tag{156}
\]
for some fixed quantum state $\tau$. But for the case of $R_{\delta,x}$, it is clear that it does
not admit a faithful fixed point. We leave studying hypercontractive properties
of more general channels to future work.
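Proposition 35. Let $\rho, \sigma$ be two quantum states on $\mathcal{H}$ and let $1 \leq p_1 < \infty$,
and suppose $W_{p_1}^d(\rho, \sigma) = M$. For $p_2 > p_1$, let $1 - \delta \leq (M/\mathrm{diam}_d(P\mathcal{H}))^{p_2 - p_1}$.
Then, the general depolarising channel $S_{\delta,\tau}$ has
\[
  W_{p_2}^d(S_{\delta,\tau}(\rho), S_{\delta,\tau}(\sigma)) \leq W_{p_1}^d(\rho, \sigma). \tag{157}
\]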
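Proof. Let $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j \in J}$ be any $p_1$th-order transport plan from $\rho$
to $\sigma$, and let
\[
  Q' = (1 - \delta)Q \cup \delta\{(\lambda_i, |\omega_i\rangle, |\omega_i\rangle)\}_{i \in I} \tag{158}
\]
for $\tau = \sum_{i \in I} \lambda_i |\omega_i\rangle\langle\omega_i|$ a spectral decomposition. This is a transport plan from
$S_{\delta,\tau}(\rho)$ to $S_{\delta,\tau}(\sigma)$. This gives
\begin{align}
  T_{p_2}^d(Q')^{1/p_2} &= \left( (1 - \delta) \sum_{j \in J} q_j\, d(|\psi_j\rangle, |\varphi_j\rangle)^{p_2} \right)^{1/p_2}
  = (1 - \delta)^{1/p_2}\, T_{p_2}^d(Q)^{1/p_2} \notag \\
  &\leq \left( \frac{M}{\mathrm{diam}_d(P\mathcal{H})} \right)^{1 - p_1/p_2} T_{p_2}^d(Q)^{1/p_2}. \tag{159}
\end{align}
So it remains to show that
\[
  T_{p_1}^d(Q)^{1/p_1} \geq \left( \frac{M}{\mathrm{diam}_d(P\mathcal{H})} \right)^{1 - p_1/p_2} T_{p_2}^d(Q)^{1/p_2}. \tag{160}
\]
Given quantities $\{A_j\}_{j \in J}$ subject to constraints
\begin{align}
  \sum_{j \in J} q_j A_j^{p_1} &= M^{p_1} \tag{161} \\
  0 \leq A_j &\leq \mathrm{diam}_d(P\mathcal{H}), \tag{162}
\end{align}
the maximum value of $\sum_{j \in J} q_j A_j^{p_2}$ is achieved when all $A_j$ are either
$\mathrm{diam}_d(P\mathcal{H})$ or 0, with values $q_1 = (M/\mathrm{diam}_d(P\mathcal{H}))^{p_1}$ and $q_2 = 1 - q_1$. This gives
\[
  T_{p_2}^d(Q) \leq (M/\mathrm{diam}_d(P\mathcal{H}))^{p_1}\, \mathrm{diam}_d(P\mathcal{H})^{p_2} \tag{163}
\]
and therefore that
\[
  \frac{T_{p_1}^d(Q)^{1/p_1}}{T_{p_2}^d(Q)^{1/p_2}} \geq \frac{M}{\mathrm{diam}_d(P\mathcal{H})} \left( \frac{\mathrm{diam}_d(P\mathcal{H})}{M} \right)^{p_1/p_2}
  = \left( \frac{M}{\mathrm{diam}_d(P\mathcal{H})} \right)^{1 - p_1/p_2}. \tag{164}
\]
This gives $T_{p_2}^d(Q')^{1/p_2} \leq T_{p_1}^d(Q)^{1/p_1}$, and taking the infimum over $Q$ gives the
result.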
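The two inequalities that drive this proof, the moment bound of Eq. (163) and the contraction obtained from the condition on $\delta$, can be checked on randomly generated plans. The sketch below is illustrative only: a plan is represented just by its weights $q_j$ and distances $A_j \in [0, \mathrm{diam}]$, the values of `diam`, `p1` and `p2` are arbitrary, and the names `moment` and `delta_threshold` are assumptions introduced here.

```python
import random


def moment(weights, values, p):
    """sum_j q_j A_j^p, i.e. the p-th order cost of a plan with distances A_j."""
    return sum(q * a ** p for q, a in zip(weights, values))


def delta_threshold(M, diam, p1, p2):
    """Smallest delta allowed by the condition 1 - delta <= (M/diam)^(p2 - p1)."""
    return 1.0 - (M / diam) ** (p2 - p1)


rng = random.Random(1)
diam, p1, p2 = 2.0, 1.0, 3.0
all_ok = True
for _ in range(1000):
    raw = [rng.random() + 1e-9 for _ in range(6)]
    total = sum(raw)
    weights = [r / total for r in raw]                 # probabilities q_j
    values = [rng.uniform(0.0, diam) for _ in range(6)]  # distances A_j
    M = moment(weights, values, p1) ** (1.0 / p1)      # plays the role of M
    lhs = moment(weights, values, p2)
    rhs = (M / diam) ** p1 * diam ** p2                # right side of Eq. (163)
    delta = delta_threshold(M, diam, p1, p2)
    # Cost of the mixed plan Q' after the depolarising step, as in Eq. (159).
    contracted = ((1.0 - delta) * lhs) ** (1.0 / p2)
    if lhs > rhs + 1e-9 or contracted > M + 1e-9:
        all_ok = False
```

The threshold grows quickly as $M/\mathrm{diam}$ shrinks: with the sample values above, $M = \mathrm{diam}/2$ already requires $\delta \geq 3/4$.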
We note from the condition $1 - \delta \leq (M/\mathrm{diam}_d(P\mathcal{H}))^{p_2 - p_1}$ that this result
is applicable only when $W_{p_1}^d$ is close to $\mathrm{diam}_d(P\mathcal{H})$ or when $p_2 - p_1$ is small.
Otherwise, we would require $\delta$ so large that the channels are no longer of
practical use. Our results on random states from Sect. 6.1 demonstrate that
this is indeed a relevant case, as for random low-rank states, $W_p^H$ and $W_p^C$ are
generally high.
Particularly in the case of complexity geometry, this shows that channel
noise reduces complexity. These results also demonstrate the value of considering
arbitrary $p$ in our construction of the $W_p^d$ distances. Such consideration
is not possible without being able to relate Wasserstein distances of different
orders to one another. In general, for arbitrary $p_1, p_2$, this shows that
$\frac{W_{p_1}^d(\rho, \sigma)}{W_{p_2}^d(\mathcal{N}(\rho), \mathcal{N}(\sigma))}$
can be well considered as a measure of noise in the channel $\mathcal{N}$.
This is applicable, in particular, in situations where we have no knowledge
of the actual structure of the noise of a channel, and so cannot determine it
efficiently, but still want to gauge how noisy it is on a qualitative level.
In situations where we do not have $M/\mathrm{diam}_d(P\mathcal{H})$ large, we can still use
the notion of hypercontractivity to explore the nature of quantum channels.
For example, hypercontractivity can tell us how a channel $\mathcal{N}$ affects the
concentration of measure of one state with respect to the span of another. This
is formalised in the following proposition. Here, we assume that the span of
$\mathcal{N}(\sigma)$ is very small. We can then use hypercontractivity to discuss how much
the mass of $\mathcal{N}(\rho)$ concentrates within a distance $K$ of the span of $\mathcal{N}(\sigma)$.
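Proposition 36. Let $\rho, \sigma$ be states on $\mathcal{H}$ and suppose $P\mathcal{H}$ is equipped with metric
$d$. Let $\mathcal{N} : D(\mathcal{H}) \to D(\mathcal{H})$ be a quantum channel, and suppose that we have
hypercontractivity in $\mathcal{N}$ with respect to orders $p_1 < p_2$: that is,
$W_{p_1}^d(\rho, \sigma) \geq W_{p_2}^d(\mathcal{N}(\rho), \mathcal{N}(\sigma))$. Then, for
$K \in \left[ W_{p_1}^d(\rho, \sigma), \mathrm{diam}_d(P\mathcal{H}) \right]$, there is a coupling
of $\mathcal{N}(\rho)$ and $\mathcal{N}(\sigma)$, $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j \in J}$, such that
\[
  \sum_{j : d(|\psi_j\rangle, |\varphi_j\rangle) > K} q_j \leq \left( \frac{W_{p_1}^d(\rho, \sigma)}{K} \right)^{p_2}. \tag{165}
\]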

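Proof. Let $M = W_{p_1}^d(\rho, \sigma) \geq W_{p_2}^d(\mathcal{N}(\rho), \mathcal{N}(\sigma))$. It follows that in the limit of
optimal $p_2$th-order transport plans $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j \in J}$ between $\mathcal{N}(\rho)$ and
$\mathcal{N}(\sigma)$, we have
\[
  \sum_{j \in J} q_j\, d(|\psi_j\rangle, |\varphi_j\rangle)^{p_2} \leq M^{p_2}. \tag{166}
\]
Let $L = \sum_{j \in J : d(|\psi_j\rangle, |\varphi_j\rangle) > K} q_j$. This also has
$L = \mathbb{P}\left( d(|\psi_X\rangle, |\varphi_X\rangle) > K \right)$,
where $X$ is a random variable taking values in $J$ according to law $\mathbb{P}(X = j) = q_j$.
Note therefore that by Markov's inequality
\[
  L \leq \frac{\sum_{j \in J} q_j\, d(|\psi_j\rangle, |\varphi_j\rangle)^{p_2}}{K^{p_2}} \leq \left( \frac{M}{K} \right)^{p_2}. \tag{167}
\]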

Thus, we see that a hypercontractive inequality implies that most of the
transport plan is concentrated on states that are close, and this concentration
becomes more pronounced the lower the value of $p_1$ (as the initial distance is
monotonically increasing in $p_1$) and the higher the value of $p_2$.
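The Markov step in Eq. (167) is easy to see on a toy example. The sketch below uses a hypothetical plan given only by pairs $(q_j, d_j)$ of weights and distances, and compares the mass transported further than $K$ with the moment bound for a few orders $p_2$. The numbers and the names `tail_mass` and `markov_bound` are made up for illustration.

```python
def tail_mass(plan, K):
    """Total weight of plan entries whose distance exceeds K."""
    return sum(q for q, dist in plan if dist > K)


def markov_bound(plan, K, p2):
    """Right-hand side of the first inequality in Eq. (167)."""
    return sum(q * dist ** p2 for q, dist in plan) / K ** p2


# Hypothetical plan: most mass moves a short distance, a little moves far.
plan = [(0.7, 0.1), (0.2, 0.5), (0.1, 1.2)]
K = 1.0
tail = tail_mass(plan, K)
bounds = {p2: markov_bound(plan, K, p2) for p2 in (1, 2, 4)}
```

In this example every bound dominates the true tail mass, as Markov's inequality guarantees for any plan and any $K > 0$.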
6.4. Quantum Wasserstein Generative Adversarial Networks
One of the most promising near-term applications of quantum devices is in
quantum machine learning. Among these methods, one framework which has
attracted attention in recent years is that of generative adversarial networks,
or GANs.
GANs [34] form a framework in machine learning which seeks to generate
new data which are indistinguishable from the training data. For example, they
can be used to generate images which look like authentic photographs when
trained on datasets of photographs [3], or provide image-to-text translation
for automatic image description for visually impaired users [40]. The main
feature of GANs is learning via a zero-sum game between a generator, which
is trained to generate data, and a discriminator, which is trained to distinguish
between the generated data and the training data. The standard GAN takes
‘far’, meaning that the generated data is easily told apart from the training
data, to be ‘large Jensen-Shannon divergence’.
Wasserstein GANs [1] were proposed in 2017 as an alternative to standard
GANs, which redefine ‘far’ to mean ‘large first-order Wasserstein distance’.
These provide numerous improvements over standard GANs, including the
reduction of mode collapse, and allowing reference distributions which are
concentrated to small dimension in a space of large dimension.
Quantum GANs [20] are a natural quantum equivalent of classical GANs,
where a quantum state generator G parameterised by θ is trained to a target
distribution ρtar . The standard architecture for such a generator is a shallow
quantum circuit, whose gates are functions of θ.
With standard distinguishing functions, such as the trace distance or
fidelity, these suffer greatly from the problem of barren plateaus [41]—areas
where, as the number n of qudits grows, the gradient of the objective function
decays to 0. As in the classical setting, GANs which use the theory behind the
first-order quantum Wasserstein distance of De Palma et al. [50] have been
proposed [23,38] to eliminate this problem.
The form of such a quantum Wasserstein GAN (qWGAN) is as follows.
The generated state is parameterised by $\theta$, and is stored as a set
$\{(p_1, U_1(\theta)), \ldots, (p_r, U_r(\theta))\}$ where $p(\theta) = \{p_1, \ldots, p_r\}$ is a probability
distribution also parameterised by $\theta$. This represents the state
\[
G(\theta) = \sum_{i=1}^{r} p_i \, U_i(\theta) |0\rangle\langle 0| U_i(\theta)^{\dagger}. \tag{168}
\]

The discriminator is a Lipschitz observable O, which aims to discriminate

between G(θ) and the target state ρtar . One round of theoretical qWGAN
optimisation takes the following form:
1. Replace $O$ by $O_{\max} = \mathrm{argmax}_{\|O\|_L \le 1} \mathrm{Tr}[O(G(\theta) - \rho_{\mathrm{tar}})]$,
2. Calculate the gradients in θ of Tr[Omax (G(θ) − ρtar )],
3. Update p(θ) and each Ui (θ) according to these gradients.
For practical reasons, linked to the difficulty of optimising over the set
$\{O : \|O\|_L \le 1\}$, the algorithm in practice uses an approximation to $\|O\|_L$
rather than $\|O\|_L$ itself. This has, at least for some toy examples such as the
GHZ state, been able to eliminate the issue of barren plateaus seen in standard
quantum GANs. This highlights the need for flexibility in the distinguishability
metric used in quantum GANs.
Another problem along similar lines, seen frequently in generative networks
‘in the wild’, is the misalignment of human ideas of indistinguishability
and computer ideas of indistinguishability. Many structures which generate
data are only able to mimic the training data at a surface level, making them
seem correct at first glance but with unsettling artefacts of computer genera-
tion or with fundamental incorrectness or impossibility of the data generated.
This again highlights the need for broad flexibility in the distinguishability
metric used in quantum Wasserstein GANs.
Such broad flexibility can theoretically be easily achieved by replacing the
Lipschitz norm $\|\cdot\|_L$ associated with the $\|\cdot\|_{W_1^H}$ norm, with the more general
$L_d(O)$ constant defined in Eq. 51, for an underlying distance $d$ satisfying the
required continuity conditions for this to exist. This would, in theory, allow us
once again to avoid barren plateaus, but also to prioritise qualitatively different
ideas of discrimination via the choice of the metric d.
This would give rounds of qWGAN optimisation of the following form:
1. Replace $O$ by $O_{\max} = \mathrm{argmax}_{L_d(O) \le 1} \mathrm{Tr}[O(G(\theta) - \rho_{\mathrm{tar}})]$,
2. Calculate the gradients in θ of Tr[Omax (G(θ) − ρtar )],
3. Update p(θ) and each Ui (θ) according to these gradients.
This model opens a theoretical avenue to qWGANs with flexible notions
of distinguishability, though we note that implementation of this lies beyond
the scope of this work due to its highly variable nature. In particular, the big
challenge is accurately calculating and maximising Ld (O), and it is likely that
this will require bespoke approximations to Ld for each choice of d.
Any such approximation would need to be careful to preserve the nature
of the distance chosen, in order to avoid the issue of misalignment of notions
of distinguishability. For example, the above mentioned works [23,38] discuss
qWGANs which supposedly give convergence in $\|\cdot\|_{W_1^H}$, but the approximation
used for the Lipschitz constant of $O$ is a weighted sum of the coefficients of the
Pauli decomposition of $O$, which directly measures local distinguishability as
discussed in [38]. We have shown in Sect. 6.1 that the $\|\cdot\|_{W_1^H}$ norm does not reflect
local distinguishability in general.
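More concretely, they only optimise over operators $O$ of the form
\[
O = \omega_I I + \sum_{i=1}^{n} \sum_{P \in \{X,Y,Z\}} \omega_P^{(i)} P^{(i)} + \sum_{1 \le i < j \le n} \sum_{P,Q \in \{X,Y,Z\}} \omega_{P,Q}^{(i,j)} P^{(i)} \otimes Q^{(j)} \tag{169}
\]
where $P^{(i)}$ is the Pauli operator $P$ acting on qubit $i$, and tuples $(i, j)$ are
unordered. To ensure these have $\|O\|_L \le 1$, they force the coefficients $\omega_P^{(i)}$ and
$\omega_{P,Q}^{(i,j)}$ to satisfy
\[
2 \max_{1 \le i \le n} \left( \sum_{P \in \{X,Y,Z\}} \left| \omega_P^{(i)} \right| + \sum_{j \ne i} \sum_{P,Q \in \{X,Y,Z\}} \left| \omega_{P,Q}^{(i,j)} \right| \right) \le 1. \tag{170}
\]

Let us call the set of such operators on $n$ qubits $\tilde{L}_n$.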
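Membership in this set is straightforward to test once the Pauli coefficients are known. The following is a minimal sketch, assuming the sums in Eq. (170) run over absolute values of the coefficients (the proof of Proposition 37 speaks of the "sum of absolute values of the Pauli coefficients"). The dictionary layout and the names `lipschitz_proxy` and `in_L_tilde` are hypothetical choices for illustration, not taken from [38].

```python
# Sketch: evaluate the left-hand side of Eq. (170) for given Pauli coefficients.
# Assumed data layout (hypothetical):
#   single[(i, P)]      -> coefficient of Pauli P on qubit i
#   pair[(i, j, P, Q)]  -> coefficient of P on qubit i times Q on qubit j, i < j
# Qubits are labelled 0..n-1 and missing keys count as zero.

def lipschitz_proxy(n, single, pair):
    per_qubit = [0.0] * n
    for (i, _p), w in single.items():
        per_qubit[i] += abs(w)
    for (i, j, _p, _q), w in pair.items():
        # An unordered pair (i, j) contributes to both qubit i and qubit j.
        per_qubit[i] += abs(w)
        per_qubit[j] += abs(w)
    return 2 * max(per_qubit)

def in_L_tilde(n, single, pair, tol=1e-12):
    """True if the coefficients satisfy the constraint of Eq. (170)."""
    return lipschitz_proxy(n, single, pair) <= 1 + tol

single = {(0, "Z"): 0.2, (1, "X"): -0.1}
pair = {(0, 1, "Z", "Z"): 0.25, (1, 2, "X", "Y"): 0.1}
print(lipschitz_proxy(3, single, pair), in_L_tilde(3, single, pair))
```

The identity coefficient does not enter the constraint, so it is left out of the data layout.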

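Proposition 37. Let $|\psi\rangle, |\varphi\rangle$ be independent pure states on $n$ qubits distributed
according to the Haar measure. Then
\[
\mathbb{P}\left[ \max_{O \in \tilde{L}_n} \mathrm{Tr}\left[ O \left( |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \right) \right] \ge n 2^{4 - \frac{n}{4}} \right] \le 6(3n^2 - 1) 2^{-\frac{n}{4}}. \tag{171}
\]

Proof. Let $R$ be a Pauli string on one or two qubits. Note that $\|R\|_\infty = 1$,
and so $\mathrm{Tr}[R(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)] \le \|\rho - \sigma\|_1 \le W_1^1(\rho, \sigma)$ where $\rho$ and $\sigma$ are
the marginals of $|\psi\rangle\langle\psi|$ and $|\varphi\rangle\langle\varphi|$, respectively, on two qubits containing the
support of $R$. So from Proposition 29 applied with qubit ratio $c = \frac{n}{2} - 1$,
underlying dimension 4, and parameter $\beta = 2^{n/4}$, we have that
\[
\mathbb{P}\left[ \mathrm{Tr}\left[ R \left( |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \right) \right] \ge 2^{4 - \frac{n}{4}} \right] \le 4 \cdot 2^{-\frac{n}{2}}. \tag{172}
\]
Let $\omega$ be the sum of absolute values of the Pauli coefficients of $O$ (except
$\omega_I$). From Eq. (170), we have $\omega \le n$. It follows via a union bound that
\[
\mathbb{P}\left[ \mathrm{Tr}\left[ O \left( |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \right) \right] \ge n 2^{4 - \frac{n}{4}} \right] \le 6(3n^2 - 1) 2^{-\frac{n}{4}}. \tag{173}
\]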

This shows that the probability of an exponentially small deviation from
zero decays exponentially in n. In terms of the qWGAN algorithm of [38],
this means that even with a convergence threshold which decays exponentially
in n, the ability of the discriminator to distinguish between randomly chosen
states decays exponentially too. This has significant practical implications.
Namely, even for exponentially low convergence thresholds, the algorithm will
on average fail at the first iteration, as a randomly chosen initial $\rho_{\mathrm{init}} = |\psi\rangle\langle\psi|$
will be indistinguishable from the target $\rho_{\mathrm{tar}} = |\varphi\rangle\langle\varphi|$ by any $O$ allowed by
the algorithm. However, our results on random states show that random pure
states $|\psi\rangle\langle\psi|$ and $|\varphi\rangle\langle\varphi|$ are on average maximally far apart in $\|\cdot\|_{W_1^H}$, so the
algorithm will believe that the states have converged at the initial step despite
them being maximally far apart.
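To get a feel for the scales involved, one can tabulate both sides of the bound. The sketch below assumes Eq. (171) reads as a deviation threshold of $n 2^{4 - n/4}$ and a probability bound of $6(3n^2 - 1)2^{-n/4}$; the function names are hypothetical. Under that reading the probability bound only drops below 1 once $n$ is in the mid-sixties, so the statement is an asymptotic one.

```python
# Sketch: evaluate the two sides of Eq. (171) under the reading stated above.

def deviation_threshold(n):
    """n * 2^(4 - n/4): the deviation from zero that the discriminator must exceed."""
    return n * 2 ** (4 - n / 4)

def probability_bound(n):
    """6 (3 n^2 - 1) 2^(-n/4): upper bound on the probability of exceeding it."""
    return 6 * (3 * n ** 2 - 1) * 2 ** (-n / 4)

# Smallest number of qubits for which the probability bound is non-trivial.
first_nontrivial = next(n for n in range(1, 1000) if probability_bound(n) < 1)

for n in (first_nontrivial, 100, 200):
    print(n, deviation_threshold(n), probability_bound(n))
```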
When considering qWGANs defined with respect to $W_1^d$, avoiding
such misalignment in the approximation of $L_d(O)$ would be an important
consideration for any practical implementation of such algorithms.

7. Conclusion
In this work, we have introduced a novel definition of the quantum Wasser-
stein distance by combining the coupling method and a metric on the set of
pure states. This novel definition successfully captures the essence of the clas-
sical Wasserstein distance. A significant aspect of our approach is its inherent
adaptability, effortlessly incorporating established metrics on quantum states
such as trace distance and naturally extending pure-state metrics, for instance,
Nielsen’s complexity metric, to cater to mixed states.
Furthermore, we established various properties of this new definition.
While we acknowledge that these properties are not exhaustive, they never-
theless offer advantages compared to other recent generalisations, such as the
fact that one definition can cover several examples in the literature in a unified
way. A significant challenge remains: establishing the triangle inequality in its
generality. Our findings, though, hint at the possibility that under suitable
conditions the dual approach yields a proof of the triangle inequality.
Our exploration of specific cases reveals the $W_1^1$ distance as a potentially
powerful tool for bounding the trace distance between quantum states, draw-
ing parallels with classical approaches for determining total variation distance
and mixing time bounds. Additionally, our work enriches the understanding of
the complexity geometry of quantum states, offering a new lens through which
to view and quantify the complexity of transformations within quantum en-
sembles.
Our examination of the behaviour of the Wasserstein distance under ran-
dom quantum states unveils various phase transitions. These results, derived
from entropic inequalities and continuity bounds, debunk some existing
speculations (e.g. that $\|\cdot\|_{W_1^H}$ captures local distinguishability) while affirming
others (the complexity of small subsystems of random quantum states is low).
In conclusion, our research represents a significant stride forward in the
pursuit of understanding optimal transport in quantum mechanics. The novel
quantum Wasserstein distance that we have proposed holds great promise, not
only as an analytical tool but also as a medium for further exploration and
discovery in quantum computation and information. While our work has paved
the way, many challenges and open problems remain, especially when it comes
to practical implementation of the applications laid out here.

Acknowledgements
DSF and EB would like to thank Alexander Müller-Hermes, Raul García-Patrón,
Philippe Faist, Fanch Coudreuse, and Guillaume Aubrun for
esting discussions. This project received support from the PEPR integrated
project EPiQ ANR-22-PETQ-0007 part of Plan France 2030.

Declarations

Conflict of interest The authors declare no conﬂict of interest.

Publisher’s Note Springer Nature remains neutral with regard to jurisdic-
tional claims in published maps and institutional aﬃliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive
rights to this article under a publishing agreement with the author(s) or other
rightsholder(s); author self-archiving of the accepted manuscript version of
this article is solely governed by the terms of such publishing agreement and
applicable law.

Appendix A: Comparison to Other Proposed Definitions of a Quantum Wasserstein Distance
Until now, finding a coherent way to define a quantum optimal transport cost,
regardless of the underlying metric on the Hilbert space, has proved elusive.
Some definitions (such as in [52], [15] and [33]) work well for specific orders p,
and specific underlying metrics. A further definition was proposed recently in
[18] using standard quantum couplings and using projection onto the asym-
metric subspace as a cost function, with the motivation of mimicking the defi-
nition of total variation distance as the Wasserstein distance corresponding to
the trivial metric.
Originally, [18] defined a second-order Wasserstein semi-distance $W$ on
states on $\mathcal{H} = \mathbb{C}^d$ as follows. Letting $F |a\rangle \otimes |b\rangle = |b\rangle \otimes |a\rangle$ define the flip
operator on $\mathcal{H} \otimes \mathcal{H}$, we can then define the symmetric projector $P_{\mathrm{sym}}(d)$ as
the projector onto the 1-eigenspace of $F$, and $P_{\mathrm{asym}}(d)$ the projector onto its
$(-1)$-eigenspace. Note $P_{\mathrm{sym}}(d) + P_{\mathrm{asym}}(d) = I_{\mathbb{C}^d \otimes \mathbb{C}^d}$.
They then define the quantum optimal transport cost
\[
T(\rho, \sigma) = \min_{\tau_{AB} \in \mathcal{D}(\mathbb{C}^d \otimes \mathbb{C}^d),\ \tau_A = \rho,\ \tau_B = \sigma} \mathrm{Tr}[\tau_{AB} P_{\mathrm{asym}}(d)] \tag{174}
\]
and $W(\rho, \sigma) = \sqrt{T(\rho, \sigma)}$. This gives a 2-Wasserstein semi-distance which is
equivalent to the trace distance. This definition was further studied in [29] and
[12], and refined in [42].
Specifically, [42] showed that this original definition does not satisfy a
data-processing inequality, so it does not mirror the original trace distance in
this regard. The definition in Eq. (174) was changed to give the data processing
inequality by considering a complete version:

\[
T_s(\rho, \sigma) = T(\rho \otimes I/2, \sigma \otimes I/2) \tag{175}
\]

and $W_s(\rho, \sigma) = \sqrt{T_s(\rho, \sigma)}$. Note that in general, the complete version of a
quantity requires an optimisation over auxiliary systems of arbitrary size; how-
ever, [42] shows that the maximally mixed state of one qubit suffices.
There are two main differences between this path to obtaining a gener-
alisation of the Wasserstein distance and ours. First, in this path, the cost of
transforming one state into another is given by the expectation value of an
observable on the bipartite system. In our case, we depart from a metric on
the set of point masses/pure states. Although the metric version is in direct
analogy with the classical version, optimising over the expectation value of
an observable certainly has a more operational interpretation. Second, the
definition of Eq. (174) optimises over all couplings, entangled as well as
separable. In our definition, we only optimise over separable ones,
which gives the definition in Eq. (174) a more quantum flavour. Furthermore,
computing Eq. (174) corresponds to an SDP, so it can be computed efficiently
in the dimension, whereas it is unclear if we can efficiently compute our version
in general. However, one of the main motivations for our work was to obtain
a unified way of obtaining various versions of $W_p$ present in the literature,
including the definition of [50], and it appears that approaches like that of
Eq. (174) cannot mimic the behaviour of the Wasserstein distance of De Palma
et al. Consider, for example, the potential extension to an $n$-qudit space given
by
\[
T(\rho, \sigma) = \min \left\{ \mathrm{Tr}\left[ \tau_{AB} \sum_{i=1}^{n} P_{i,\mathrm{asym}}(d) \right] : \tau_{AB} \in \mathcal{D}\left( (\mathbb{C}^d)^{\otimes n} \otimes (\mathbb{C}^d)^{\otimes n} \right),\ \tau_A = \rho,\ \tau_B = \sigma \right\} \tag{176}
\]
where Pi,asym (d) is the projection onto the asymmetric subspace of the ith
qudits (and is the identity on the other qudits). In stabilising, it makes little
diﬀerence whether we stabilise each qudit individually with its own copy of
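$I/2$ or share one, so we stabilise this definition to
\begin{align}
T_s(\rho, \sigma) = \min \Big\{ & \mathrm{Tr}\Big[ \tau_{AB} \sum_{i=1}^{n} P_{i,\mathrm{asym}}(d \otimes 2) \Big] : \tau_{AB} \in \mathcal{D}\left( (\mathbb{C}^d \otimes \mathbb{C}^2)^{\otimes n} \otimes (\mathbb{C}^d \otimes \mathbb{C}^2)^{\otimes n} \right), \tag{177} \\
& \tau_A = \rho \otimes (I/2)^{\otimes n},\ \tau_B = \sigma \otimes (I/2)^{\otimes n} \Big\} \tag{178}
\end{align}
and $W_s(\rho, \sigma) = \sqrt{T_s(\rho, \sigma)}$.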
We will show, however, that any such type of generalisation leads to some
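states $\rho$ with $T_s(\rho, \rho) > 0$. Take, in this example, $n = d = 2$ and $\rho$ to be a Bell
state $|\psi^+\rangle\langle\psi^+|$. As $\rho$ is pure, the set of possible couplings $\tau$ is very limited.
In the non-stabilised definition, $\tau$ can only be $|\psi^+\rangle\langle\psi^+| \otimes |\psi^+\rangle\langle\psi^+|$. In the
stabilised definition, $\tau$ must have form $|\psi^+\rangle\langle\psi^+| \otimes |\psi^+\rangle\langle\psi^+| \otimes \omega$ for some
coupling $\omega$ of $I^{\otimes 2}/4$ and $I^{\otimes 2}/4$.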
However, when looking at applying Pasym to the individual qubits, we
note that as the Bell state has qubit marginals I/2, we will have nonzero
transport cost from ρ to itself. Indeed in the non-stabilised case,

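\begin{align}
T(\rho, \rho) &= \mathrm{Tr}\left[ |\psi^+\rangle\langle\psi^+| \otimes |\psi^+\rangle\langle\psi^+| \left( \sum_{i=1}^{2} P_{i,\mathrm{asym}}(2) \right) \right] \tag{179} \\
&= 2 \, \mathrm{Tr}\left[ \left( \frac{I}{2} \otimes \frac{I}{2} \right) P_{\mathrm{asym}}(2) \right] \tag{180} \\
&= \frac{1}{2} \mathrm{Tr}[P_{\mathrm{asym}}(2)] = \frac{1}{2}. \tag{181}
\end{align}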
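In the stabilised case, we still have for any $\tau$, that
\begin{align}
& \mathrm{Tr}\left[ \tau_{AB} \sum_{i=1}^{2} P_{i,\mathrm{asym}}(2 \otimes 2) \right] \tag{182} \\
&= \mathrm{Tr}\left[ \left( |\psi^+\rangle\langle\psi^+| \otimes |\psi^+\rangle\langle\psi^+| \otimes \omega \right) \left( \sum_{i=1}^{2} P_{i,\mathrm{asym}}(2 \otimes 2) \right) \right] \tag{183} \\
&= \sum_{i=1}^{2} \mathrm{Tr}\left[ \left( \frac{I}{2} \otimes \frac{I}{2} \otimes \omega_i \right) P_{\mathrm{asym}}(2 \otimes 2) \right] \tag{184} \\
&= \sum_{i=1}^{2} \mathrm{Tr}\left[ \left( \frac{I}{2} \otimes \frac{I}{2} \otimes \omega_i \right) \left( P_{\mathrm{asym}}(2) \otimes P_{\mathrm{sym}}(2) + P_{\mathrm{sym}}(2) \otimes P_{\mathrm{asym}}(2) \right) \right] \tag{185} \\
&= \sum_{i=1}^{2} \left( \frac{1}{4} \mathrm{Tr}[\omega_i P_{\mathrm{sym}}(2)] + \frac{3}{4} \mathrm{Tr}[\omega_i P_{\mathrm{asym}}(2)] \right) \tag{186} \\
&\ge \sum_{i=1}^{2} \left( \frac{1}{4} \mathrm{Tr}[\omega_i P_{\mathrm{sym}}(2)] + \frac{1}{4} \mathrm{Tr}[\omega_i P_{\mathrm{asym}}(2)] \right) = \sum_{i=1}^{2} \frac{1}{4} \mathrm{Tr}[\omega_i] = \frac{1}{2} \tag{187}
\end{align}
where in line (184), $\omega_i$ is a coupling of $I/2$ with itself, on the $i$th stabilising
qubits.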
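The value 1/2 in Eqs. (179)-(181) can be reproduced numerically. The sketch below is an illustration only: it assumes the Bell state $(|00\rangle + |11\rangle)/\sqrt{2}$ (any Bell state has the same marginals), uses the identity $P_{\mathrm{asym}} = (I - F)/2$ for the flip operator $F$, and stores states as dictionaries from bit tuples $(a_1, a_2, b_1, b_2)$ to real amplitudes. The function names are hypothetical.

```python
# Sketch: self-transport cost of a two-qubit Bell state under the
# non-stabilised n-qudit cost of Eq. (176), with n = d = 2.
from itertools import product
from math import sqrt

def bell_pair_product():
    """Amplitudes of |psi+>_A (x) |psi+>_B, keyed by bits (a1, a2, b1, b2)."""
    bell = {(0, 0): 1 / sqrt(2), (1, 1): 1 / sqrt(2)}
    return {a + b: x * y for (a, x), (b, y) in product(bell.items(), repeat=2)}

def swap_expectation(state, i, n):
    """<state| F_i |state>, where F_i swaps qubit i of A with qubit i of B."""
    total = 0.0
    for bits, amp in state.items():
        swapped = list(bits)
        swapped[i], swapped[n + i] = swapped[n + i], swapped[i]
        total += amp * state.get(tuple(swapped), 0.0)  # amplitudes are real
    return total

def self_cost(state, n):
    """sum_i <P_{i,asym}> computed from P_asym = (I - F) / 2."""
    return sum((1 - swap_expectation(state, i, n)) / 2 for i in range(n))

bell_cost = self_cost(bell_pair_product(), n=2)
product_cost = self_cost({(0, 0, 0, 0): 1.0}, n=2)  # |00> coupled with itself
print(bell_cost, product_cost)
```

The product state |00> has zero self-cost, while the entangled Bell state does not, which is the behaviour the example is meant to expose.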
We see from the details of this example that, no matter whether or not
we stabilise using many qubits or a single qubit, and no matter the size of
the space, any attempt to put a Pasym projector on a subspace of H will
result in highly entangled pure states, those which have marginals close to
maximally mixed on subspaces where Pasym is placed, having nonzero self-
distance. This is largely because the coupling of a pure state with itself is
forced to be the product of the state with itself, and because the identity
on Cd ⊗ Cd has a nonzero asymmetric and a nonzero symmetric component.
Thus, it appears that a version of the Wasserstein distance that is not based
on a single observable, like ours, is more suited to obtain generalisations of
quantities like the one of De Palma et al.

Appendix B: Development of the Proposed Definition

Definition of a Transport Plan
During the development of this proposal, many different ways of defining a
transport plan were considered. They all had different versions of the condi-
tions

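\[
\sum_{j \in J} q_j |\psi_j\rangle\langle\psi_j| = \rho, \quad \sum_{j \in J} q_j |\varphi_j\rangle\langle\varphi_j| = \sigma, \quad q_j > 0. \tag{188}
\]

As in the definition of the $W_1$ norm in [50], a first proposal was simply the
condition
\[
\sum_{j \in J} q_j \left( |\psi_j\rangle\langle\psi_j| - |\varphi_j\rangle\langle\varphi_j| \right) = \rho - \sigma, \quad q_j > 0. \tag{189}
\]

It quickly becomes clear that using a transport plan of this form containing
telescoping sums could lead to the degeneracy of $W_p^d$ when $p > 1$. The main
stumbling block here centred on the unboundedness of $\sum_{j \in J} q_j$.

Two proposals for restricting $\sum_j q_j$ were then considered, the first being
\[
\sum_{j \in J} q_j = 1 \tag{190}
\]
and the second
\[
\sum_{j \in J} q_j = \frac{1}{2} \|\rho - \sigma\|_1. \tag{191}
\]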

Both of these were discounted when considering the transport plans that were
allowed under these regimes, and their impact on the norm $W_\infty^H$.

For transport plans of the form (190), consider the $W_\infty^H$ distance on
$\mathcal{H} = (\mathbb{C}^3)^{\otimes 2}$, and the states $\rho = \frac{1}{2}|00\rangle\langle 00| + \frac{1}{2}|22\rangle\langle 22|$,
$\sigma = \frac{1}{2}|11\rangle\langle 11| + \frac{1}{2}|22\rangle\langle 22|$.
Classically, the infinite-order transport distance between the measures
$\mu = \frac{1}{2}\delta_{00} + \frac{1}{2}\delta_{22}$ and $\nu = \frac{1}{2}\delta_{11} + \frac{1}{2}\delta_{22}$ on the Hamming cube $\{0, 1, 2\}^2$ is 2. With
the definition from (190) however, we could define a transport plan
\[
Q = \left\{ \left( \tfrac{1}{2}, |00\rangle, |01\rangle \right), \left( \tfrac{1}{2}, |01\rangle, |11\rangle \right) \right\} \tag{192}
\]
with a maximum $d(|\psi_j\rangle, |\varphi_j\rangle)$ of 1. Though as in (70) we are perfectly happy
that the $W_\infty^H$ distance might not match Ornstein's $\bar{d}$-distance for classical
measures, it certainly should not differ because of a transport plan of a classical
nature such as this one.
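Because all states in this example are diagonal in the computational basis, the claim can be checked with classical bookkeeping. The sketch below (helper names are hypothetical) represents each diagonal state as a dictionary over basis labels and verifies that the plan of Eq. (192) has total weight 1 and reproduces $\rho - \sigma$ as in condition (189), yet does not have $\rho$ and $\sigma$ as its marginals as required by (188), while moving each element by Hamming distance at most 1.

```python
# Sketch: classical check of the telescoping plan in Eq. (192).
from collections import defaultdict

def hamming(x, y):
    """Hamming distance between two equal-length basis labels."""
    return sum(a != b for a, b in zip(x, y))

def plan_difference(plan):
    """sum_j q_j (delta_{x_j} - delta_{y_j}), with zero entries dropped."""
    diff = defaultdict(float)
    for q, x, y in plan:
        diff[x] += q
        diff[y] -= q
    return {k: v for k, v in diff.items() if abs(v) > 1e-12}

def state_difference(rho, sigma):
    """rho - sigma for diagonal states, with zero entries dropped."""
    keys = set(rho) | set(sigma)
    diff = {k: rho.get(k, 0.0) - sigma.get(k, 0.0) for k in keys}
    return {k: v for k, v in diff.items() if abs(v) > 1e-12}

def marginals(plan):
    first, second = defaultdict(float), defaultdict(float)
    for q, x, y in plan:
        first[x] += q
        second[y] += q
    return dict(first), dict(second)

rho = {"00": 0.5, "22": 0.5}
sigma = {"11": 0.5, "22": 0.5}
plan_192 = [(0.5, "00", "01"), (0.5, "01", "11")]

satisfies_189 = plan_difference(plan_192) == state_difference(rho, sigma)
satisfies_188 = marginals(plan_192) == (rho, sigma)
total_weight = sum(q for q, _, _ in plan_192)
max_distance = max(hamming(x, y) for _, x, y in plan_192)
print(satisfies_189, satisfies_188, total_weight, max_distance)
```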
The picture for condition (191) is very similar. Consider $W_\infty^H$ on
$\mathcal{H} = (\mathbb{C}^2)^{\otimes 2}$, with states $\rho = \frac{1}{2}|00\rangle\langle 00| + \frac{1}{2}|01\rangle\langle 01|$,
$\sigma = \frac{1}{2}|01\rangle\langle 01| + \frac{1}{2}|11\rangle\langle 11|$.
Classically, these have an infinite-order transport distance of 1 on the Hamming
cube $\{0, 1\}^2$. However, because $\frac{1}{2}\|\rho - \sigma\|_1$ is small in this case, the only
permitted transport plan is
\[
Q = \left\{ \left( \tfrac{1}{2}, |00\rangle, |11\rangle \right) \right\} \tag{193}
\]
with a maximum $d(|\psi_j\rangle, |\varphi_j\rangle)$ of 2. The condition (191) does not permit quantum
versions of the standard classical transport plans, and so this definition
of a transport plan seems strictly worse than the one we propose in this work.

7.1. Entangled Couplings and Transport Plans

The main reason we need to restrict to separable couplings is the fact that it is
not obvious at ﬁrst how to attribute a distance to an entangled state starting
from a distance on PH and then attribute a transportation cost to transport
plans that include entangled pure states and satisfy the marginal constraints.
One possibility is as follows: given a state $|\psi\rangle$ in $\mathcal{P}(\mathcal{H} \otimes \mathcal{H})$ and $SD(|\psi\rangle)$
the set of Schmidt decompositions of $|\psi\rangle$, we define $d(|\psi\rangle)$ as:
\[
d(|\psi\rangle) = \inf_{\sum_i \sqrt{p_i} |\phi_1^i\rangle \otimes |\phi_2^i\rangle \in SD(|\psi\rangle)} \sum_i p_i \, d(|\phi_1^i\rangle, |\phi_2^i\rangle). \tag{194}
\]

It is then easy to see that we recover the original distance when considering
product states. However, it is also easy to see that, at least for the case $p = 1$,
enlarging the set of possible states to include entangled states offers no
advantage. Indeed, given any Schmidt decomposition of an entangled
$|\psi\rangle = \sum_i \sqrt{p_i} |\phi_1^i\rangle \otimes |\phi_2^i\rangle$, adding instead $\{(p_i, |\phi_1^i\rangle, |\phi_2^i\rangle)\}_i$ to the transport plan will
give the same cost and still satisfy the marginal constraints. Thus, at least
for this possible generalisation of the distance to entangled states, there is no
advantage in considering entangled couplings.

7.2. Transport Plans Defined from Absolutely Continuous Measures

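The definition of a transport plan as given in Eq. (11) takes a countable set
$\{q_j\}_{j \in J}$ of positive weights such that $\sum_{j \in J} q_j = 1$, and applies these weights
to pairs $(|\psi_j\rangle, |\varphi_j\rangle)$ in $\mathcal{P}\mathcal{H}_1 \times \mathcal{P}\mathcal{H}_2$. This is equivalent to a probability measure
$q$ on $\mathcal{P}\mathcal{H}_1 \times \mathcal{P}\mathcal{H}_2$ of the form
\[
q = \sum_{j \in J} q_j \, \delta_{(|\psi_j\rangle, |\varphi_j\rangle)} \tag{195}
\]
such that
\[
\int_{\mathcal{P}\mathcal{H}_1} |\psi\rangle\langle\psi| \, d\big( (\pi_1)_{\#} q \big)(|\psi\rangle) = \rho, \quad \int_{\mathcal{P}\mathcal{H}_2} |\varphi\rangle\langle\varphi| \, d\big( (\pi_2)_{\#} q \big)(|\varphi\rangle) = \sigma \tag{196}
\]
for $\pi_1$ and $\pi_2$ the projections onto the first and second coordinates of
$\mathcal{P}\mathcal{H}_1 \times \mathcal{P}\mathcal{H}_2$, respectively.
We could, instead, relax this definition to consider transport plans defined
by arbitrary probability measures $q$ on $\mathcal{P}\mathcal{H}_1 \times \mathcal{P}\mathcal{H}_2$ satisfying Eq. (196). For a
distance $d$ on $\mathcal{P}\mathcal{H}$, taking $\mathcal{H} = \mathcal{H}_1 = \mathcal{H}_2$, this would give transport cost
\[
T_p^d(q) = \int_{\mathcal{P}\mathcal{H} \times \mathcal{P}\mathcal{H}} d(|\psi\rangle, |\varphi\rangle)^p \, dq(|\psi\rangle, |\varphi\rangle). \tag{197}
\]

In order to replicate the proof of continuity from Appendix D 1, we would take
$q$ to be absolutely continuous with respect to the Haar measure on $\mathcal{P}\mathcal{H} \times \mathcal{P}\mathcal{H}$,
and then all of the proofs in this work can be replicated replacing sums with
integrals and $q_j$ with the density $q(|\psi\rangle, |\varphi\rangle)$ with respect to the Haar measure.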
The following discussion shows that the infimum over transport plans de-
fined from absolutely continuous measures is at least the infimum over count-
able transport plans, and so we lose nothing in transport cost by only consider-
ing countable plans. Again, we restrict to uniformly continuous d for continuity.
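Let $q$ be a measure on $\mathcal{P}\mathcal{H} \times \mathcal{P}\mathcal{H}$ which has density $q(|\psi\rangle, |\varphi\rangle)$ with respect
to the Haar measure. Take a dense subset $\{(|\psi_j\rangle, |\varphi_j\rangle)\}_{j=1}^{\infty}$ of $\mathcal{P}\mathcal{H} \times \mathcal{P}\mathcal{H}$. Let
$B_j(\epsilon)$ be the open ball of radius $\epsilon$ around $(|\psi_j\rangle, |\varphi_j\rangle)$, with respect to the
distance $\| |\psi\rangle\langle\psi| - |\psi'\rangle\langle\psi'| \|_2 + \| |\varphi\rangle\langle\varphi| - |\varphi'\rangle\langle\varphi'| \|_2$. Let $\{u_j\}_{j=1}^{\infty}$ be a partition
of unity over $B_j(\epsilon)$. We can then define
\[
q_j = \int_{B_j(\epsilon)} q(|\psi\rangle, |\varphi\rangle) \, u_j(|\psi\rangle, |\varphi\rangle) \, d\mu_{\mathrm{Haar}}(|\psi\rangle, |\varphi\rangle) \tag{198}
\]
and let
\[
Q_\epsilon = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j=1}^{\infty} \tag{199}
\]
which is a countable transport plan between some states $\rho_\epsilon, \sigma_\epsilon$. We will show
that $\rho_\epsilon$ and $\sigma_\epsilon$ are very close to $\rho$ and $\sigma$, and that $Q_\epsilon$ has a very similar transport
cost to $q$.

Writing $d\mu = d\mu_{\mathrm{Haar}}(|\psi\rangle, |\varphi\rangle)$ and $q u_j = q(|\psi\rangle, |\varphi\rangle) u_j(|\psi\rangle, |\varphi\rangle)$ for brevity, we have that
\begin{align}
& \left\| q_j |\psi_j\rangle\langle\psi_j| - \int_{B_j(\epsilon)} |\psi\rangle\langle\psi| \, q u_j \, d\mu \right\|_2 \tag{200} \\
&= \left\| \int_{B_j(\epsilon)} \left( |\psi_j\rangle\langle\psi_j| - |\psi\rangle\langle\psi| \right) q u_j \, d\mu \right\|_2 \tag{201} \\
&\le \int_{B_j(\epsilon)} \left\| |\psi_j\rangle\langle\psi_j| - |\psi\rangle\langle\psi| \right\|_2 q u_j \, d\mu \tag{202} \\
&\le \epsilon \int_{B_j(\epsilon)} q u_j \, d\mu \tag{203} \\
&= \epsilon q_j \tag{204}
\end{align}
and therefore that $\|\rho_\epsilon - \rho\|_\infty \le \|\rho_\epsilon - \rho\|_2 \le \epsilon$. The same holds for $\sigma$.
For the similarity of cost,
\begin{align}
\left| T_p^d(Q_\epsilon) - T_p^d(q) \right|
&= \left| \sum_{j=1}^{\infty} q_j \, d(|\psi_j\rangle, |\varphi_j\rangle)^p - \int_{\mathcal{P}\mathcal{H} \times \mathcal{P}\mathcal{H}} d(|\psi\rangle, |\varphi\rangle)^p \, q(|\psi\rangle, |\varphi\rangle) \, d\mu \right| \tag{205} \\
&= \left| \sum_{j=1}^{\infty} \int_{B_j(\epsilon)} q u_j \left( d(|\psi_j\rangle, |\varphi_j\rangle)^p - d(|\psi\rangle, |\varphi\rangle)^p \right) d\mu \right| \tag{206} \\
&\le \sum_{j=1}^{\infty} \int_{B_j(\epsilon)} q u_j \left| d(|\psi_j\rangle, |\varphi_j\rangle)^p - d(|\psi\rangle, |\varphi\rangle)^p \right| d\mu \tag{207} \\
&\le \sum_{j=1}^{\infty} q_j \sup_{\| |\psi\rangle\langle\psi| - |\psi'\rangle\langle\psi'| \|_2 + \| |\varphi\rangle\langle\varphi| - |\varphi'\rangle\langle\varphi'| \|_2 \le \epsilon} \left| d(|\psi'\rangle, |\varphi'\rangle)^p - d(|\psi\rangle, |\varphi\rangle)^p \right| \tag{208} \\
&= \sup_{\| |\psi\rangle\langle\psi| - |\psi'\rangle\langle\psi'| \|_2 + \| |\varphi\rangle\langle\varphi| - |\varphi'\rangle\langle\varphi'| \|_2 \le \epsilon} \left| d(|\psi'\rangle, |\varphi'\rangle)^p - d(|\psi\rangle, |\varphi\rangle)^p \right|, \tag{209}
\end{align}
swapping limits in line (206) as the terms in each part are all positive, and
combining into one as the sums and integrals are all bounded. This final line
tends to 0 with $\epsilon$ due to the uniform continuity of $d$.
From the proof of continuity in Appendix D 1, we can find a countable
transport plan $Q'_\epsilon$ from $\rho$ to $\sigma$ with cost at most the cost of $Q_\epsilon$ in the limit
$\epsilon \to 0$. Therefore in the limit $\epsilon \to 0$, $T_p^d(Q'_\epsilon) \le T_p^d(q)$ and we are done.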

Appendix C: Towards the Triangle Inequality

As with many other attempts to generalise the classical Wasserstein distances
to the quantum setting, the triangle inequality eludes us. The central barrier
to the triangle inequality is the quantum marginal problem: even in the case
where we do not require our couplings to be separable, there is no coherent
way to build a coupling τ13 from coupling τ12 of ρ1 and ρ2 and coupling τ23 of
ρ2 and ρ3 .
All known proofs of the triangle inequality for the classical $W_p$ distances
rely on this idea: given a coupling $\gamma_{12}$ of measures $\mu_1$ and $\mu_2$, and coupling
$\gamma_{23}$ of $\mu_2$ and $\mu_3$, we can find measure $\gamma_{123}$ with $(1, 2)$-marginal $\gamma_{12}$ and
$(2, 3)$-marginal $\gamma_{23}$. We can then take the $(1, 3)$-marginal $\gamma_{13}$ of this overarching
measure and show that [11]
\[
T_p(\gamma_{13})^{1/p} \le T_p(\gamma_{12})^{1/p} + T_p(\gamma_{23})^{1/p} \tag{210}
\]
via Minkowski's inequality. Without an equivalent $\tau_{123}$ in the quantum setting
from which to form coupling $\tau_{13}$, there is as of yet no clear path to a triangle
inequality.
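For finite classical measures the gluing step is easy to carry out explicitly. The following sketch uses the standard construction $\gamma_{123}(x, y, z) = \gamma_{12}(x, y)\gamma_{23}(y, z)/\mu_2(y)$, which is one common choice rather than necessarily the construction of [11], and checks Eq. (210) on a toy example. The couplings, the metric on the integers, and the function names are illustrative assumptions.

```python
# Sketch: classical gluing of two couplings and a check of Eq. (210).
from itertools import product

def glue(g12, g23):
    """(1,3)-marginal of g123(x, y, z) = g12(x, y) * g23(y, z) / mu2(y)."""
    mu2 = {}
    for (_, y), w in g12.items():
        mu2[y] = mu2.get(y, 0.0) + w
    g13 = {}
    for ((x, y), w12), ((y2, z), w23) in product(g12.items(), g23.items()):
        if y == y2 and mu2[y] > 0:
            g13[(x, z)] = g13.get((x, z), 0.0) + w12 * w23 / mu2[y]
    return g13

def root_cost(g, d, p):
    """T_p(g)^(1/p) for a coupling g and a metric d."""
    return sum(w * d(x, y) ** p for (x, y), w in g.items()) ** (1 / p)

def dist(x, y):
    """Toy metric on the integers."""
    return abs(x - y)

# g12 couples mu1 with mu2, g23 couples mu2 with mu3; both have
# mu2 = {0: 0.25, 1: 0.75} as the shared marginal.
g12 = {(0, 0): 0.25, (0, 1): 0.25, (2, 1): 0.5}
g23 = {(0, 3): 0.25, (1, 1): 0.5, (1, 3): 0.25}
g13 = glue(g12, g23)

for p in (1, 2, 3):
    lhs = root_cost(g13, dist, p)
    rhs = root_cost(g12, dist, p) + root_cost(g23, dist, p)
    print(p, lhs, rhs, lhs <= rhs)
```

The division by $\mu_2(y)$ is exactly the step with no general quantum counterpart: there need not be a joint state $\tau_{123}$ with prescribed overlapping marginals.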
For the case $p = 1$, we can focus on the case where $\|\rho - \sigma\|_{DW_1^d} = W_1^d(\rho, \sigma)$,
as $\|\cdot\|_{DW_1^d}$ does satisfy the triangle inequality. As mentioned in
Sect. 4.1, this remains a key open problem of our work.

Appendix D: Auxiliary Proofs

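Proof of Proposition 17 on the Continuity of $W_p^d$
Proposition 17. Suppose $d$ is continuous on $\mathcal{P}\mathcal{H}$ and let $1 \le p < \infty$. Then $W_p^d$
is uniformly continuous.
Proof. Take $\epsilon > 0$. Let $\rho, \sigma, \rho', \sigma'$ be states on $\mathcal{H}$ with $\|\rho - \rho'\|_\infty \le \delta$ and
$\|\sigma - \sigma'\|_\infty \le \delta$, for some $\delta$ to be chosen later. For any transport plan from $\rho$
to $\sigma$, we will form a transport plan from $\rho'$ to $\sigma'$ with a similar cost. Note that
as $\mathcal{H}$ is finite-dimensional, $\mathcal{P}\mathcal{H}$ is compact and so $d$ is uniformly continuous.
Let $c \gg 1$ (also to be chosen later, but note scale $1 \gg c\delta$) and let $S_{\rho'}$,
$S_{\sigma'}$, respectively, be the span of the eigenvectors of $\rho'$, $\sigma'$ whose eigenvalues
are at least $c\delta$. Let $\Pi_{\rho'}$, $\Pi_{\sigma'}$ be the projectors onto $S_{\rho'}$, $S_{\sigma'}$, respectively.
Let
\[
Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j \in J} \tag{211}
\]
be any $p$th-order transport plan from $\rho$ to $\sigma$. Define then $|\psi_j\rangle = |\psi_j^{\rho'}\rangle + |\psi_j^{\perp}\rangle$
where $|\psi_j^{\rho'}\rangle = \Pi_{\rho'} |\psi_j\rangle$ and $|\psi_j^{\perp}\rangle$ is perpendicular to $S_{\rho'}$. Define $|\varphi_j^{\sigma'}\rangle$ and
$|\varphi_j^{\perp}\rangle$ analogously.
We can then begin to define a transport plan from $\rho'$ to $\sigma'$. Let
\[
\tilde{\rho} = \Pi_{\rho'} \rho \Pi_{\rho'} = \sum_{j \in J} q_j |\psi_j^{\rho'}\rangle\langle\psi_j^{\rho'}| \tag{212}
\]
and define $\tilde{\sigma}$ analogously. Note that $\|\Pi_{\rho'} (\rho - \rho') \Pi_{\rho'}\|_\infty \le \|\rho - \rho'\|_\infty < \delta$.
Therefore,
\[
\tilde{\rho} \le \Pi_{\rho'} \rho' \Pi_{\rho'} + \delta I_{S_{\rho'}} \le \left( 1 + \frac{1}{c} \right) \Pi_{\rho'} \rho' \Pi_{\rho'} \le \left( 1 + \frac{1}{c} \right) \rho' \tag{213}
\]
and so $\frac{c}{c+1} \tilde{\rho} \le \rho'$.
The same holds for $\sigma$, so $\frac{c}{c+1} \tilde{\sigma} \le \sigma'$.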
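We may then begin to build our new transport plan starting with the
partial transport plan
\[
Q_1 = \left\{ \left( q'_j, \frac{|\psi_j^{\rho'}\rangle}{\sqrt{\langle\psi_j^{\rho'}|\psi_j^{\rho'}\rangle}}, \frac{|\varphi_j^{\sigma'}\rangle}{\sqrt{\langle\varphi_j^{\sigma'}|\varphi_j^{\sigma'}\rangle}} \right) \right\}_{j \in J} \tag{214}
\]
where $q'_j = \frac{c}{c+1} q_j \min\{\langle\psi_j^{\rho'}|\psi_j^{\rho'}\rangle, \langle\varphi_j^{\sigma'}|\varphi_j^{\sigma'}\rangle\}$. Note then that
\[
\rho' \ge \frac{c}{c+1} \tilde{\rho} \ge \sum_{j \in J} q'_j \frac{|\psi_j^{\rho'}\rangle\langle\psi_j^{\rho'}|}{\langle\psi_j^{\rho'}|\psi_j^{\rho'}\rangle}
\quad \text{and} \quad
\sigma' \ge \frac{c}{c+1} \tilde{\sigma} \ge \sum_{j \in J} q'_j \frac{|\varphi_j^{\sigma'}\rangle\langle\varphi_j^{\sigma'}|}{\langle\varphi_j^{\sigma'}|\varphi_j^{\sigma'}\rangle}. \tag{215}
\]
We can then transport the positive semidefinite operator
$\rho' - \sum_{j \in J} q'_j \frac{|\psi_j^{\rho'}\rangle\langle\psi_j^{\rho'}|}{\langle\psi_j^{\rho'}|\psi_j^{\rho'}\rangle}$
onto
$\sigma' - \sum_{j \in J} q'_j \frac{|\varphi_j^{\sigma'}\rangle\langle\varphi_j^{\sigma'}|}{\langle\varphi_j^{\sigma'}|\varphi_j^{\sigma'}\rangle}$
via any partial transport plan. Let these transport
plan elements form set $Q_2$. It follows that $Q_1 \cup Q_2$ is a transport plan from $\rho'$
to $\sigma'$.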
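We can now attempt bounding the cost of this transport plan. We will
show that $Q_2$ has a very small cost and that $Q_1$ has a cost very close to $T_p^d(Q)$.
Starting with the $Q_2$ part, we know that
\[
T_p^d(Q_2) \le \mathrm{Tr}\left[ \rho' - \sum_{j \in J} q'_j \frac{|\psi_j^{\rho'}\rangle\langle\psi_j^{\rho'}|}{\langle\psi_j^{\rho'}|\psi_j^{\rho'}\rangle} \right] \mathrm{diam}_d(\mathcal{P}\mathcal{H})^p. \tag{216}
\]
We will show that this trace part is very small. Indeed
\begin{align}
\mathrm{Tr}\left[ \rho' - \sum_{j \in J} q'_j \frac{|\psi_j^{\rho'}\rangle\langle\psi_j^{\rho'}|}{\langle\psi_j^{\rho'}|\psi_j^{\rho'}\rangle} \right]
&= \left\| \rho' - \sum_{j \in J} q'_j \frac{|\psi_j^{\rho'}\rangle\langle\psi_j^{\rho'}|}{\langle\psi_j^{\rho'}|\psi_j^{\rho'}\rangle} \right\|_1 \tag{217} \\
&\le \|\rho - \rho'\|_1 + \left\| \rho - \sum_{j \in J} q'_j \frac{|\psi_j^{\rho'}\rangle\langle\psi_j^{\rho'}|}{\langle\psi_j^{\rho'}|\psi_j^{\rho'}\rangle} \right\|_1 \tag{218} \\
&\le \delta \dim\mathcal{H} + \left\| \rho - \sum_{j \in J} q'_j \frac{|\psi_j^{\rho'}\rangle\langle\psi_j^{\rho'}|}{\langle\psi_j^{\rho'}|\psi_j^{\rho'}\rangle} \right\|_1. \tag{219}
\end{align}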

Choose M > 0 (again to be determined later), and select L >

and we can split up these 1-norm terms into j ∈ K and j ∈ / K. For j ∈ K, we

have

|ψjρ ψjρ |

|ψjρ ψjρ |

1
/ qj , let {|α}α be some orthonormal basis for the orthogonal complement
j ∈K
of Sρ . We have
⎡ ⎤

ρ − ρ ∞ dimH ≥ α|ρ − ρ |α ≥ Tr ⎣ qj (|ψj⊥ ψj⊥ | + |ϕ⊥ ⊥ ⎦
j ϕj |)
α j

1
≤ dimH ( ρ − ρ ∞ + σ − σ ∞ + 2cδ) (226)
L
2δ(c + 1)
≤ dimH. (227)
L
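This gives us
\[
\left\| \rho - \sum_{j \in J} q'_j \frac{|\psi_j^{\rho'}\rangle\langle\psi_j^{\rho'}|}{\langle\psi_j^{\rho'}|\psi_j^{\rho'}\rangle} \right\|_1 \le 2\sqrt{L} + L + \frac{1}{c+1} + \frac{4\delta(c+1)}{L} \dim\mathcal{H}. \tag{228}
\]

By symmetry this holds for $\sigma$, and so substituting into Eq. (219) and then
(216) gives
\[
T_p^d(Q_2) \le \left( \delta \dim\mathcal{H} + 2\sqrt{L} + L + \frac{1}{c+1} + \frac{4\delta(c+1)}{L} \dim\mathcal{H} \right) \mathrm{diam}_d(\mathcal{P}\mathcal{H})^p. \tag{229}
\]
This bounds $T_p^d(Q_2)$ above.

For the bounding of $Q_1$, we have
\begin{align}
T_p^d(Q_1) &= \sum_{j \in J} q'_j \, d\left( |\psi_j^{\rho'}\rangle, |\varphi_j^{\sigma'}\rangle \right)^p \tag{230} \\
&= \sum_{j \in K} q'_j \, d\left( |\psi_j^{\rho'}\rangle, |\varphi_j^{\sigma'}\rangle \right)^p + \sum_{j \notin K} q'_j \, d\left( |\psi_j^{\rho'}\rangle, |\varphi_j^{\sigma'}\rangle \right)^p \tag{231} \\
&\le \sum_{j \in K} q_j \, d(|\psi_j\rangle, |\varphi_j\rangle)^p + M + \frac{2\delta}{L} \dim\mathcal{H} \cdot \mathrm{diam}_d(\mathcal{P}\mathcal{H})^p \tag{232} \\
&\le T_p^d(Q) + M + \frac{2\delta}{L} \dim\mathcal{H} \cdot \mathrm{diam}_d(\mathcal{P}\mathcal{H})^p. \tag{233}
\end{align}
It follows then that
\[
T_p^d(Q_1 \cup Q_2) \le T_p^d(Q) + M + \left( \frac{(4c+6)\delta}{L} \dim\mathcal{H} + \delta \dim\mathcal{H} + 2\sqrt{L} + L + \frac{1}{c+1} \right) \mathrm{diam}_d(\mathcal{P}\mathcal{H})^p. \tag{234}
\]
And so choosing $M < \frac{\epsilon}{4}$, $L$ sufficiently small such that the continuity
condition for $M$ is satisfied and such that $2\sqrt{L} + L \le \frac{\epsilon}{4 \, \mathrm{diam}_d(\mathcal{P}\mathcal{H})^p}$,
$c = \frac{4 \, \mathrm{diam}_d(\mathcal{P}\mathcal{H})^p}{\epsilon}$, and then choosing
$\delta < \frac{\epsilon}{4 (1 + (4c+6)/L) \dim\mathcal{H} \, \mathrm{diam}_d(\mathcal{P}\mathcal{H})^p}$ gives
$T_p^d(Q_1 \cup Q_2) \le T_p^d(Q) + \epsilon$. Thus we have a transport plan for $\rho'$ to $\sigma'$ with
cost at most $\epsilon$ more than the transport cost of plan $Q$ between $\rho$ and $\sigma$.
Taking the infimum over all such transport plans $Q$, we see that whenever
$\|\rho - \rho'\|_\infty < \delta$ and $\|\sigma - \sigma'\|_\infty < \delta$, we have
\[
\inf_{Q' \in \mathcal{Q}(\rho', \sigma')} T_p^d(Q') \le \epsilon + \inf_{Q \in \mathcal{Q}(\rho, \sigma)} T_p^d(Q). \tag{235}
\]

It follows that $W_p^d$ is uniformly continuous.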

Proof of Corollary 33 on the Approximate Gate Complexity of Random Pure
States
⊗n
Corollary 33. Let ρ, σ be i.i.d. states on H = C2 generated by an auxiliary
system A of dimension s = 2cn where 0 ≤ c < 1. For all δ > 0,
' 1/3 (
(1−δ)n
2 −1
P|ϕ W1C (ρ, σ) ≤ 2/3 n−1 κ ≤ e−Ω(2 log((2) )) .
n

poly(n, log −1 )
(236)
Proof. For any ﬁxed |ψ , |ϕ, using Eq. (89) and Lemma 31, we have
2/3 −2
dC (|ψ , |ϕ) ≥ min κG(U, )1/3 n (237)
U ∈SU (2n ),U |ψ =|ϕ
1/3
GS (U, 2 ) 2/3 −2
≥ min κ −1 ))
n
U ∈SU (2 ),U |ψ =|ϕ
n poly(log(G(U, )) + log(
(238)
1/3
GS (U, 2 ) 2/3 −2
≥ min κ n . (239)
U ∈SU (2n ),U |ψ =|ϕ poly(n, log( −1 ))
Given that W1C (ρ, σ) ≥ min {dC (|ψ , |ϕ) : |ψ ∈ span ρ, |ϕ ∈ span σ}, it
follows from Lemma 32 that
' 1/3 (
2/3 −1 2(1−δ)n
Pρ,σ W1 (ρ, σ) ≤
C
n κ (240)
poly(n, log −1 )
' 1/3
GS (U, 2 ) 2/3 −2
≤ Pρ,σ min κ n
U :U |ψ =|ϕ ,|ψ ∈ span ρ,|ϕ ∈ span σ poly(n, log( −1 ))
1/3 (
2/3 −2 2(1−δ)n
≤ n κ (241)
poly(n, log −1 )
! opt "
= Pρ,σ GS Uρ→σ , 2 ≤ 2(1−δ)n (242)
≤ e−Ω(2
n
log(1/2)
(243)
as claimed.

References
[1] Arjovsky, M., Chintala, S., & Bottou, L.: Wasserstein generative adversarial net-
works. In Proceedings of the 34th International Conference on Machine Learning,
volume 70 of Proceedings of Machine Learning Research, pages 214–223. PMLR,
06–11 (Aug 2017)
[2] Agón, C.A., Headrick, M., Swingle, B.: Subsystem complexity and holography.
J High Energy Phys 2019(2), 1 (2019)
[3] Brock, A., Donahue, J. and Simonyan, K.: Large scale gan training for high
fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, (September
2018)
[4] Biswal, P.: Hypercontractivity and its applications. arXiv preprint
arXiv:1101.2913, (January 2011)
[5] Bonami, A.: Étude des coefficients de Fourier des fonctions de lp (g). Annales de
l’Institut Fourier 20(2), 335–402 (1970)
[6] Bunth, G., Pitrik, J., Titkos, T. and Virosztek, D.: On the metric property of
quantum Wasserstein divergences. arXiv preprint arXiv:2402.13150, (2024)
[7] Bardet, I., Rouzé, C.: Hypercontractivity and logarithmic Sobolev inequality for
non-primitive quantum Markov semigroups and estimation of decoherence rates.
arXiv preprint arXiv:1803.05379, (2018)
[8] Brown, A.R.: A quantum complexity lower bound from differential geometry.
Nat. Phys. 19, 401–406 (2023)
[9] Brown, A.R., Roberts, D.A., Susskind, L., Swingle, B., Zhao, Y.: Complexity,
action, and black holes. Phys. Rev. D 93, 086006 (2016)
[10] Brown, A.R., Susskind, L.: Complexity geometry of a single qubit. Phys. Rev.
D 100, 046020 (2019)
[11] Clement, P., Desch, W.: An elementary proof of the triangle inequality for the
Wasserstein metric. Proc. Am. Math. Soc. 136, 333–340 (2008)
[12] Cole, S., Eckstein, M., Friedland, S., Zyczkowski, K.: On quantum optimal trans-
port. Math. Phys. Anal. Geom. 26, 14 (2023)
[13] Caglioti, E., Golse, F., Paul, T.: Towards optimal transport for quantum densi-
ties. Annali Scuola Normale Superiore - Classe di Scienze, (January 2021)
[14] Coffman, V., Kundu, J., Wootters, W.K.: Distributed entanglement. Phys. Rev.
A 61(5), 052306 (2000)
[15] Carlen, E.A., Maas, J.: An analog of the 2-Wasserstein metric in non-commutative
probability under which the Fermionic Fokker-Planck equation is gradient flow
for the entropy. Commun. Math. Phys. 331(3), 887–926 (2014)
[16] Carlen, E.A., Maas, J.: Gradient flow and entropy inequalities for quantum
Markov semigroups with detailed balance. J. Funct. Anal. 273(5), 1810–1869
(2017)
[17] Carlen, E.A., Maas, J.: Non-commutative calculus, optimal transport and func-
tional inequalities in dissipative quantum systems. J. Stat. Phys. 178(2), 319–378
(2019)
[18] Chakrabarti, S., Yiming, H., Li, T., Feizi, S., Wu, X.: Quantum Wasserstein
generative adversarial networks. In H. Wallach, H. Larochelle, A. Beygelzimer,
F. d’Alché-Buc, E. Fox, and R. Garnett, editors, Adv. Neural Inf. Process. Syst.,
32. (2019)
[19] Michel Marie Deza and Elena Deza: Encyclopedia of Distances. Springer, Berlin
Heidelberg (2013)
[20] Dallaire-Demers, P., Killoran, N.: Quantum generative adversarial networks.
Phys. Rev. A, 98(1), (2018)
[21] Duvenhage, R., Mapaya, M.: Quantum Wasserstein distance of order 1 between
channels. Infinite Dimens. Anal. Quant. Prob. Relat. Topics, 26(3):2350006,
(2023)
[22] Dowling, M.R., Nielsen, M.A.: The geometry of quantum computation. Quant.
Inf. Comput. 8(10), 861–899 (2008)
[23] De Palma, G., Klein, T., Pastorello, D.: Classical shadows meet quantum optimal
mass transport. J. Math. Phys., 65(9), (2024)
[24] De Palma, G., Marvian, M., Rouzé, C., França, D.S.: Limitations of variational
quantum algorithms: a quantum optimal transport approach. PRX Quant. 4(1),
010309 (2023)
[25] Datta, N., Rouzé, C.: Relating relative entropy, optimal transport and Fisher
information: a quantum HWI inequality. Annales Henri Poincaré 21(7), 2115–
2150 (2020)
[26] Duvenhage, R., Skosana, S., & Snyman, M.: Extending quantum detailed balance
through optimal transport. arXiv preprint arXiv:2206.15287, (June 2022)
[27] Duvenhage, R.: Quadratic Wasserstein metrics for von Neumann algebras via
transport plans. J. Operat. Theory, 88(2), (2022)
[28] Eldar, L., & Harrow, A. W.: Local Hamiltonians whose ground states are hard to
approximate. In 2017 IEEE 58th Annual Symposium on Foundations of Computer
Science (FOCS), pages 427–438. IEEE, (October 2017)
[29] Friedland, S., Eckstein, M., Cole, S., & Życzkowski, K.: Quantum Monge-
Kantorovich problem and transport distance between density matrices. Phys.
Rev. Lett., 129(11), (2022)
[30] Frogner, C., Zhang, C., Mobahi, H., Araya, M., & Poggio, T. A.: Learning with
a Wasserstein loss. In Adv. Neural Inf. Process. Syst. (NIPS) 28, (2015)
[31] Galichon, A.: Optimal transport methods in economics. Princeton University
Press, (2018)
[32] Gao, L., Junge, M., & Li, H.: Geometric approach towards complete logarithmic
Sobolev inequalities. arXiv preprint arXiv:2102.04434, (February 2021)
[33] Golse, F., Mouhot, C., Paul, T.: On the mean field and classical limits of quantum
mechanics. Commun. Math. Phys. 343, 165–205 (2015)
[34] Goodfellow, Ian J., Pouget-Abadie, Jean, Mirza, Mehdi, Xu, Bing, Warde-Farley,
David, Ozair, Sherjil, Courville, Aaron, Bengio, Yoshua: Generative adversarial
nets. Proceedings of the International Conference on Neural Information Pro-
cessing (NIPS 2014), (2014)
[35] Heller, M.P.: Geometry and complexity scaling. Nat. Phys. 19, 312–313 (2023)
[36] Kantorovich, L.V.: On the translocation of masses. Doklady Akademii Nauk
SSSR 37(7–8), 227–229 (1942)
[37] Keevash, P., Lifshitz, N., Long, E., Minzer, D.: Global hypercontractivity and
its applications. arXiv preprint arXiv:2103.04604, (2021)
[38] Tussi Kiani, B., De Palma, G., Marvian, M., Liu, Z.W., Lloyd, S.: Learning
quantum data with the quantum earth mover’s distance. Quant. Sci. Technol.
7(4), 045002 (2022)
[39] Li, L., Bu, K., Koh, D. E., Jaffe, A., & Lloyd, S.: Wasserstein complexity of
quantum circuits. arXiv preprint arXiv:2208.06306, (2022)
[40] Liu, J., & Wu, W.: Automatic image annotation using improved Wasserstein
generative adversarial networks. IAENG Int. J. Comput. Sci., 48(3), (2021)
[41] McClean, Jarrod R., Boixo, Sergio, Smelyanskiy, Vadim N., Babbush, Ryan,
Neven, Hartmut: Barren plateaus in quantum neural network training land-
scapes. Nat. Commun., 9(1), (2018)
[42] Müller-Hermes, A.: On the monotonicity of a quantum optimal transport cost,
(November 2022)
[43] Monge, G.: Mémoire sur la théorie des déblais et des remblais. Imprimerie royale,
(1781)
[44] Nielsen, Michael A., Chuang, Isaac L.: Quantum Computation and Quantum
Information: 10th Anniversary Edition. Cambridge University Press, (2010)
[45] Nielsen, M.A., Dowling, M.R., Gu, M., Doherty, A.C.: Quantum computation
as geometry. Science 311(5764), 1133–1135 (2006)
[46] Nielsen, M.A.: A geometric approach to quantum circuit lower bounds. Quant.
Inf. Comput. 6(3), 213–262 (2006)
[47] O’Donnell, Ryan: Analysis of Boolean Functions. Cambridge University Press,
(2014)
[48] Ornstein, D.S.: An application of ergodic theory to probability theory. Ann.
Prob. 1(1), 43–58 (1973)
[49] De Palma, G., Klein, T., & Pastorello, D.: Classical shadows meet quantum
optimal mass transport. arXiv preprint arXiv:2309.08426, (September 2023)
[50] De Palma, G., Marvian, M., Trevisan, D., Lloyd, S.: The quantum Wasserstein
distance of order 1. IEEE Trans. Inf. Theory 67, 6627–6643 (2021)
[51] De Palma, G., Rouzé, C.: Quantum concentration inequalities. Annales Henri
Poincaré 23(9), 3391–3429 (2022)
[52] De Palma, G., Trevisan, D.: Quantum optimal transport with quantum channels.
Annales Henri Poincaré 22(10), 3199–3234 (2021)
[53] De Palma, G., & Trevisan, D.: Quantum optimal transport: Quantum channels
and qubits. In Summer School on Optimal Transport on Quantum Structures,
Erdős Center, Rényi Institute, Budapest, Hungary, arXiv:2307.16268, (September
2022)
[54] De Palma, G., & Trevisan, D.: The Wasserstein distance of order 1 for quantum
spin systems on infinite lattices. Annales Henri Poincaré, (June 2023)
[55] Panaretos, V.M., Zemel, Y.: Statistical aspects of Wasserstein distances. Annu.
Rev. Stat. Appl. 6(1), 405–431 (2019)
[56] Rouzé, C., Datta, N.: Concentration of quantum states from quantum functional
and transportation cost inequalities. J. Math. Phys. 60(1), 012202 (2019)
[57] Rouzé, C., França, D.S.: Learning quantum many-body systems from a few
copies. arXiv preprint arXiv:2107.03333, (July 2021)
[58] Raginsky, M., Sason, I.: Concentration of measure inequalities and
their communication and information-theoretic applications. arXiv preprint
arXiv:1510.02947, (October 2015)
[59] Ruan, S.-M.: Circuit Complexity of Mixed States. PhD thesis, University of Waterloo,
(2021)
[60] Sánchez-Ruiz, J.: Simple proof of Page’s conjecture on the average entropy of a
subsystem. Phys. Rev. E 52, 5653–5655 (1995)
[61] Joaquim Telles de Miranda and Tobias Micklitz: Subsystem trace-distances of
two random states. J. Phys. A Math. Theor. 56(17), 175301 (2023)
[62] Tóth, G., Moroder, T., Gühne, O.: Evaluating convex roof entanglement mea-
sures. Phys. Rev. Lett. 114(16), 10501 (2015)
[63] Trashorras, J.: Large deviations for a matching problem related to the infinity-Wasserstein
distance. Lat. Am. J. Prob. Math. Stat 15, 247–278 (2018)
[64] Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics.
American Mathematical Society, (2003)
[65] Villani, Cédric: Optimal transport: old and new, volume 338. Springer Science &
Business Media, (2008)
[66] Weil, A.: L’intégration dans les groupes topologiques et ses applications. Actualités
Scientifiques et Industrielles, (1940)
[67] Wei, T.-C., Goldbart, P.M.: Geometric measure of entanglement and applica-
tions to bipartite and multipartite quantum states. Phys. Rev. A, 68 (2003)
[68] Wootters, W.K.: Entanglement of formation of an arbitrary state of two qubits.
Phys. Rev. Lett. 80, 2245–2248 (1998)

Emily Beatty and Daniel Stilck França

Univ Lyon, ENS Lyon, UCBL, CNRS, Inria, LIP
69342 Lyon Cedex 07
France
e-mail: [email protected];
daniel.stilck [email protected]

Daniel Stilck França

Department of Mathematical Sciences
University of Copenhagen
Universitetsparken 5
2100 Copenhagen
Denmark

Communicated by David Pérez-García.

Received: May 13, 2024.
Accepted: February 15, 2025.

