Order p Quantum Wasserstein Distances from Couplings: Emily Beatty and Daniel Stilck França
Contents
1. Introduction
2. Basic Concepts and Notation
2.1. Classical Optimal Transport
2.2. Quantum Information Framework
E. Beatty and D. Stilck França Ann. Henri Poincaré
1. Introduction
Optimal transport has established itself as a powerful tool in various areas
of science and pure mathematics, such as machine learning [30], information
theory [58], partial differential equations [65] and economics [31]. In light of
this, it should come as no surprise that the last years have seen a surge of
interest in the quantum generalisation of optimal transport [6,15,18,24,25,29,
32,33,50,52].
One of the central concepts of classical optimal transport is the set of
p-Wasserstein distances Wp , which is a family of distances on the set of prob-
ability measures on a metric space. Roughly speaking, if we imagine a prob-
ability measure as describing a distribution of mass, these distances measure
the cost of transporting one measure onto another in terms of a cost function
on the underlying metric space. They can recover many widely studied met-
rics in probability spaces in a unified way. For instance, the total variation
Order p Quantum Wasserstein Distances from Couplings
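where $q_j > 0$, and taking the convention $\operatorname{Tr}|\psi_j\rangle\langle\psi_j| = \operatorname{Tr}|\varphi_j\rangle\langle\varphi_j| = 1$. This is
equivalent to an expression of a quantum coupling $\tau$ of $\rho$ and $\sigma$ as a convex
combination of pure bipartite states
$$\tau = \sum_{j\in J} q_j\, |\psi_j\rangle\langle\psi_j| \otimes |\varphi_j\rangle\langle\varphi_j| \tag{13}$$
$\rho = \sum_{j\in J} \lambda_j |\psi_j\rangle\langle\psi_j|$ and $\sigma = \sum_{k\in K} \mu_k |\varphi_k\rangle\langle\varphi_k|$ we have
$$Q = \{(\lambda_j \mu_k, |\psi_j\rangle, |\varphi_k\rangle)\}_{j\in J, k\in K} \in \mathcal{Q}(\rho, \sigma). \tag{14}$$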
Transport plans can be defined in any way satisfying the criteria above. However,
when discussing the number of elements in a finite transport plan, we
say that two elements $(q, |\psi\rangle, |\varphi\rangle)$, $(q', |\psi\rangle, |\varphi\rangle)$ transporting the same states
never appear in the same transport plan: any plan written as such is implied
to contain the element $(q + q', |\psi\rangle, |\varphi\rangle)$ in their place.
This equivalence between transport plans and ways of writing couplings
reflects the classical case, although we note that one quantum coupling could
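give rise to multiple quantum transport plans. For example, the separable
quantum coupling $\tau = |0\rangle\langle 0| \otimes \frac{I}{2}$ between $|0\rangle\langle 0|$ and $\frac{I}{2}$ on a two-qubit space
gives rise to transport plans $\{(1/2, |0\rangle, |0\rangle), (1/2, |0\rangle, |1\rangle)\}$ and
$\{(1/2, |0\rangle, |+\rangle), (1/2, |0\rangle, |-\rangle)\}$.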
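The example above can be checked numerically. The following is a minimal sketch in plain Python, assuming the coupling of a plan is $\sum_j q_j |\psi_j\rangle\langle\psi_j| \otimes |\varphi_j\rangle\langle\varphi_j|$ as in Eq. (13); the helper names `outer`, `kron`, `coupling` and `close` are illustrative and not taken from the paper.

```python
import math

def outer(v):
    """Rank-one projector |v><v| as a nested list."""
    return [[a * b.conjugate() for b in v] for a in v]

def kron(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def coupling(plan):
    """Coupling sum_j q_j |psi_j><psi_j| (x) |phi_j><phi_j| of a transport plan."""
    dim = len(plan[0][1]) * len(plan[0][2])
    tau = [[0j] * dim for _ in range(dim)]
    for q, psi, phi in plan:
        term = kron(outer(psi), outer(phi))
        for i in range(dim):
            for j in range(dim):
                tau[i][j] += q * term[i][j]
    return tau

def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

s = 1 / math.sqrt(2)
zero, one = [1 + 0j, 0j], [0j, 1 + 0j]
plus, minus = [s + 0j, s + 0j], [s + 0j, -s + 0j]

# The two transport plans from the text.
plan_z = [(0.5, zero, zero), (0.5, zero, one)]
plan_x = [(0.5, zero, plus), (0.5, zero, minus)]

# Target coupling |0><0| (x) I/2.
target = kron(outer(zero), [[0.5, 0], [0, 0.5]])
same_coupling = close(coupling(plan_z), target) and close(coupling(plan_x), target)
```

Both plans reproduce the same coupling, which illustrates that the map from transport plans to couplings is not injective.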
In building quantum transport plans, we can also refer to partial quantum
transport plans. This is a quantum transport plan where ρ and σ are instead
positive semidefinite operators of equal trace at most 1, and the partial plan
transports the ‘partial state’ ρ onto the ‘partial state’ σ.
We can then use this notion of a quantum transport plan to replicate the
classical definitions of transport cost and Wasserstein distance in the quantum
setting. The concept of a quantum transport plan is defined for all separable
Hilbert spaces H1 , H2 , but from this point forwards we require H1 = H2 in
order to have a metric between elements of transport plans. From here onwards
we refer to these both as H.
Definition 9. Let H be a separable Hilbert space and let d be a distance on
PH. Let p ≥ 1. For any transport plan $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j\in J}$ we define its
pth-order quantum transport cost as
$$T_p^d(Q) = \sum_{j\in J} q_j\, d(|\psi_j\rangle, |\varphi_j\rangle)^p. \tag{15}$$
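As a small illustration of Eq. (15), the sketch below evaluates the pth-order cost of a finite plan. The choice of the trace distance between pure states as the metric d is an assumption made only for this example, and the function names are hypothetical.

```python
import math

def overlap(psi, phi):
    """Inner product <psi|phi> of two state vectors."""
    return sum(a.conjugate() * b for a, b in zip(psi, phi))

def trace_distance(psi, phi):
    """(1/2)|| |psi><psi| - |phi><phi| ||_1 = sqrt(1 - |<psi|phi>|^2) for unit vectors."""
    return math.sqrt(max(0.0, 1.0 - abs(overlap(psi, phi)) ** 2))

def transport_cost(plan, d, p):
    """T_p^d(Q) = sum_j q_j d(psi_j, phi_j)^p for a plan of (q, psi, phi) triples."""
    return sum(q * d(psi, phi) ** p for q, psi, phi in plan)

s = 1 / math.sqrt(2)
zero, one = [1 + 0j, 0j], [0j, 1 + 0j]
plus, minus = [s + 0j, s + 0j], [s + 0j, -s + 0j]

# Two plans transporting |0><0| onto I/2.
plan_z = [(0.5, zero, zero), (0.5, zero, one)]
plan_x = [(0.5, zero, plus), (0.5, zero, minus)]

cost_z_1 = transport_cost(plan_z, trace_distance, 1)
cost_x_1 = transport_cost(plan_x, trace_distance, 1)
cost_x_2 = transport_cost(plan_x, trace_distance, 2)
```

Different plans between the same pair of states generally have different costs, which is why the Wasserstein distance is obtained by optimising over plans.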
Appendix B 2, we discuss one possible extension and then argue that, at least
for p = 1, couplings with entangled states are not advantageous and we can
restrict to the couplings defined above without loss of generality.
We could also have considered transport plans defined by an integrable
measure $q$ on $PH_1 \times PH_2$. We discuss this possibility in Appendix B 3 and show
that these also do not give an advantage over the transport plans as defined above.
However, note that it is still possible to upper bound $W_p^d(\rho, \sigma)$ by $T_p^d(q)^{1/p}$ for
such a $q$, where $T_p^d(q)$ is defined in Appendix B 3.
As with many other ordered distances, we can also define the infinite-
order quantum Wasserstein distance.
Definition 10. Let H be a separable Hilbert space and let d be a distance
on PH. For any states ρ, σ ∈ D(H), we define their infinite-order quantum
Wasserstein distance as
$$W_\infty^d(\rho, \sigma) = \inf_{Q\in\mathcal{Q}(\rho,\sigma)} \sup_{j\in J} d(|\psi_j\rangle, |\varphi_j\rangle) \tag{17}$$
i=1 i=1
(23)
This is a true metric and it has led to many applications such as in quantum
spin systems [54] and variational quantum algorithms [24]. However, the ap-
proach taken is very specific to the Hamming distance and does not lend itself
to general transport costs.
For order p = 2, a number of definitions have been proposed in various
contexts, such as [33] which defines a second-order optimal transport cost in
the context of mean field limits that was shown in [13] to have links with the
Brenier formulation of classical optimal transport [64, p. 238].
More precisely, for a set of Hermitian operators $\{R_1, \dots, R_K\}$ on H, the
second-order transport cost of a coupling Π on H ⊗ H is defined by
$$C(\Pi) = \sum_{i=1}^{K} \operatorname{Tr}[(R_i \otimes I - I \otimes R_i)\,\Pi\,(R_i \otimes I - I \otimes R_i)] \tag{24}$$
from which a second-order Wasserstein distance is derived by taking the square
root of the infimum of the cost of all couplings between ρ and σ in the usual
way. Defining the quantity
$$d(|\psi\rangle, |\varphi\rangle) = \sum_{i=1}^{K} \left( \|R_i|\psi\rangle\|^2 + \|R_i|\varphi\rangle\|^2 - 2\langle\psi|R_i|\psi\rangle\langle\varphi|R_i|\varphi\rangle \right),$$
we see that this is a variation of our definition above,
although with a specific d which is not a true metric, and with the infimum
taken over all couplings as opposed to just those which are separable.
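The link between the cost (24) on product couplings and the quantity d can be checked numerically. The sketch below uses the identity $\operatorname{Tr}[A\,\Pi\,A] = \|A|\psi\rangle|\varphi\rangle\|^2$ for Hermitian $A$ and $\Pi = |\psi\rangle\langle\psi| \otimes |\varphi\rangle\langle\varphi|$; taking the $R_i$ to be the Pauli matrices is an assumption made purely for illustration, and all function names are hypothetical.

```python
import cmath
import math

def matvec(M, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inner(u, v):
    """Inner product <u|v>."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def kron_vec(u, v):
    """Tensor product of two vectors."""
    return [a * b for a in u for b in v]

def cost_direct(Rs, psi, phi):
    """sum_i || (R_i (x) I - I (x) R_i) |psi>|phi> ||^2, i.e. Eq. (24) on a product coupling."""
    total = 0.0
    for R in Rs:
        left = kron_vec(matvec(R, psi), phi)
        right = kron_vec(psi, matvec(R, phi))
        diff = [a - b for a, b in zip(left, right)]
        total += inner(diff, diff).real
    return total

def cost_formula(Rs, psi, phi):
    """sum_i ||R_i psi||^2 + ||R_i phi||^2 - 2 <psi|R_i|psi><phi|R_i|phi>."""
    total = 0.0
    for R in Rs:
        r_psi, r_phi = matvec(R, psi), matvec(R, phi)
        total += (inner(r_psi, r_psi) + inner(r_phi, r_phi)
                  - 2 * inner(psi, r_psi) * inner(phi, r_phi)).real
    return total

# Illustrative choice of Hermitian operators: the Pauli matrices.
paulis = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
psi = [math.cos(0.3) + 0j, cmath.exp(0.7j) * math.sin(0.3)]
phi = [math.cos(1.1) + 0j, cmath.exp(-0.4j) * math.sin(1.1)]

direct_value = cost_direct(paulis, psi, phi)
formula_value = cost_formula(paulis, psi, phi)
```

The two values agree up to floating-point error, as expected when the expectation values of the $R_i$ are real.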
Following the Brenier formulation more closely, the distance in [15] is also
a second-order distance and has been used [25] to prove a quantum version of
the HWI inequality. Specifically, the 2-Wasserstein distance is defined here as
the geodesic distance on the set of full-rank states equipped with a Riemannian
metric defined by the continuity equation. This definition is a world away from
our framework, as it leans heavily into the intricacies of the classical dynamical
formulation.
We should also mention the definition introduced in [18] and refined in [42], which
defines a second-order cost based on couplings with a specific cost function
given by an asymmetric projection. While it is conjectured that this gives a
true distance, we show in Appendix A that this definition cannot be extended
to other underlying geometries, such as analogues of the Hamming distance
on multipartite spaces. Further approaches include the more flexible [52], an
alternative to [33], which defines a second-order distance based on couplings
that is not faithful. It has been conjectured [53] that a natural modification
of this quantity is a true distance, though this remains an open problem. A
similarly flexible approach appears in [29], following a naïve translation of the
classical formulation into the quantum setting. This definition takes a cost
matrix C on H ⊗ H, and defines a Wasserstein distance by
$$W_p(\rho, \sigma) = \inf_{\tau \in \mathcal{C}(\rho,\sigma)} \left(\operatorname{Tr}[C^p \tau]\right)^{1/p} \tag{25}$$
where C(ρ, σ) is the set of quantum couplings of ρ with σ. The particular case
of C being the projection onto the asymmetric subspace is studied in detail
and coincides with the definition in [18]. In the general case, however, this is
not shown to be a semidistance.
Each of the approaches seems to be generalising one particular aspect
or application of Wasserstein distances to the noncommutative setting, be it
obtaining a distance for a given value of p or a given underlying geometry of the
Hilbert space. However, it is not clear how they relate to each other or how to
extend them beyond their original setting. The definition in this work adapts to
any order p and any underlying metric d on the set of pure states of the Hilbert
space provided that d satisfies a few basic continuity properties. This broad
flexibility allows us to talk about the moments of the cost of moving between
classical-quantum sources in great generality (Sect. 6.2) and also allows us
to talk about the noise of an operator by comparing transport distances of
different orders in an analogue of hypercontractivity (Sect. 6.3).
To avoid confusion, we give in Table 1 a summary of all relevant Wasser-
stein distances used throughout this work.
Notation Definition
4. General Properties
The goals of this section are to prove some fundamental attributes of Wpd which
will give an idea of how it behaves in the case of general d. Throughout we will
require some basic regularity conditions on d. As we will see in more detail in
the discussion following Cor. 13, not every metric d on the set of pure states
induces a faithful Wasserstein distance. Thus, we will restrict to metrics with
respect to which the 2-norm on the set of traceless self-adjoint operators is
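Hölder continuous. In other words, we require that there exist $\alpha \in (0, 1]$ and
$C > 0$ such that for all $|\psi\rangle, |\varphi\rangle \in PH$ we have
$$\big\| |\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi| \big\|_2 \le C\, d(|\psi\rangle, |\varphi\rangle)^{\alpha}. \tag{26}$$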
Note that as we work in finite dimension, the 2-norm is chosen here without
loss of generality. In the case α = 1, this is equivalent to the 2-norm being
Lipschitz with respect to d. The constant C is a function of the metric space
(PH, d) and so can depend on the dimension of H. For some other properties
such as continuity, we will require that d be continuous, noting that on H of
finite dimension this is equivalent to d being uniformly continuous.
Given the main philosophical motivation of this definition, that the trans-
port distance between point masses in the classical setting is given by the un-
derlying metric, it’s important to show the quantum equivalent here. That is,
for all orders p the quantum Wasserstein distance between pure states agrees
with the underlying metric.
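Proof. As $|\psi\rangle\langle\psi|$ and $|\varphi\rangle\langle\varphi|$ are pure, the only permitted transport plan is
$Q = \{(1, |\psi\rangle, |\varphi\rangle)\}$, which has cost
$$T_p^d(Q) = d(|\psi\rangle, |\varphi\rangle)^p \tag{28}$$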
Proof. $W_p^d(\rho, \sigma) \ge 0$ is clear, and equality when ρ = σ can be obtained by
taking the transport plan $\{(c_j, |\psi_j\rangle, |\psi_j\rangle)\}_{j\in J}$ where $\rho = \sum_{j\in J} c_j |\psi_j\rangle\langle\psi_j|$ is
a spectral decomposition. Faithfulness is a direct consequence of Proposition
12.
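For a general distance d, not necessarily satisfying property (26), we get
from the same proof above that $W_p^d(\rho, \sigma) \ge 0$ and $W_p^d(\rho, \rho) = 0$. It is not
guaranteed, however, that a general distance on PH leads to a nondegenerate
$W_p^d$. For example, let $H = \mathbb{C}^2$ with the standard basis $\{|0\rangle, |1\rangle\}$. For any
non-negative real-valued function f on PH with $f(|0\rangle) = 0$ and f positive
elsewhere, we can define a metric d as follows:
$$d(|\psi\rangle, |\varphi\rangle) = \begin{cases} 0 & \text{if } |\psi\rangle = |\varphi\rangle \\ f(|\psi\rangle) + f(|\varphi\rangle) & \text{otherwise.} \end{cases} \tag{34}$$
This forms a version of the SNCF metric, also known as the centralised railway
metric [19, p. 327], with $|0\rangle$ at the centre. This is a metric in which travel is
only permitted along rays emanating from a central point. We will consider f
such that
$$f\left(\sqrt{\tfrac{n-1}{n}}\,|0\rangle + \sqrt{\tfrac{1}{n}}\,|1\rangle\right) = f\left(\sqrt{\tfrac{n-1}{n}}\,|1\rangle - \sqrt{\tfrac{1}{n}}\,|0\rangle\right) = \frac{1}{n} \tag{35}$$
for n ∈ N and $f(|\psi\rangle) = 2$ otherwise. Consider the sequence of transport plans
$$Q_n = \left\{ \left(\tfrac{1}{2}, |0\rangle, \sqrt{\tfrac{n-1}{n}}\,|0\rangle + \sqrt{\tfrac{1}{n}}\,|1\rangle\right), \left(\tfrac{1}{2}, |0\rangle, \sqrt{\tfrac{n-1}{n}}\,|1\rangle - \sqrt{\tfrac{1}{n}}\,|0\rangle\right) \right\} \tag{36}$$
for n ∈ N. These are all plans which transport $|0\rangle\langle 0|$ onto $\frac{I}{2}$ and each has cost
$T_1^d(Q_n) = \frac{1}{n}$. This gives $W_1^d\left(|0\rangle\langle 0|, \frac{I}{2}\right) \le T_1^d(Q_n) = \frac{1}{n} \to 0$, and therefore
$W_1^d\left(|0\rangle\langle 0|, \frac{I}{2}\right) = 0$. We have shown that the Hölder continuity condition is
sufficient for $W_p^d$ to be nondegenerate, but it is not immediately obvious whether
or not it is necessary.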
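The degenerate example can be verified directly. The following sketch, restricted to n ≥ 2 and to real amplitudes, checks that the plans $Q_n$ send $|0\rangle\langle 0|$ to $I/2$ and have cost $1/n$ under the railway-type metric (34); the function f is only modelled on the states that actually occur in $Q_n$, which is a simplifying assumption of this example.

```python
import math

ZERO = (1.0, 0.0)

def plan(n):
    """Transport plan Q_n of Eq. (36) for n >= 2, with real amplitudes."""
    a, b = math.sqrt((n - 1) / n), math.sqrt(1 / n)
    return [(0.5, ZERO, (a, b)), (0.5, ZERO, (-b, a))]

def f(state, n):
    """f of Eq. (35), evaluated only on |0> and on the two targets of Q_n."""
    return 0.0 if state == ZERO else 1.0 / n

def railway_metric(psi, phi, n):
    """Metric of Eq. (34): 0 on the diagonal, f(psi) + f(phi) otherwise."""
    return 0.0 if psi == phi else f(psi, n) + f(phi, n)

def target_state(entries):
    """Real 2x2 matrix sum_j q_j |phi_j><phi_j| reached by the plan."""
    return [[sum(q * phi[i] * phi[k] for q, _, phi in entries) for k in range(2)]
            for i in range(2)]

def cost(n):
    """First-order transport cost T_1^d(Q_n)."""
    return sum(q * railway_metric(psi, phi, n) for q, psi, phi in plan(n))

costs = {n: cost(n) for n in (2, 5, 100)}
targets = {n: target_state(plan(n)) for n in (2, 5, 100)}
```

The costs shrink like $1/n$ while the target state stays maximally mixed, which is exactly how the infimum reaches zero for two distinct states.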
Having established that Wpd is at least a semidistance, we turn our atten-
tion to its basic behavioural properties. Firstly, we note that all symmetries of
(PH, d) are inherited by the quantum Wasserstein distances.
Proposition 14. Let 1 ≤ p ≤ ∞ and d be a metric on PH. The group of unitary
symmetries of the underlying metric d is exactly the group of conjugational
symmetries of Wpd .
Proof. Let U be a symmetry of d. U is invertible; therefore, there is a direct
correspondence
$$\{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j\in J} \longleftrightarrow \{(q_j, U|\psi_j\rangle, U|\varphi_j\rangle)\}_{j\in J} \tag{37}$$
between quantum transport plans from ρ to σ and from $U\rho U^\dagger$ to $U\sigma U^\dagger$. The
distance d is invariant under U, so cost is preserved under this correspondence,
and therefore the optimal cost is also preserved.
Proof. Let $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j\in J}$ be any pth-order transport plan between ρ
and σ. We can then define a transport plan $Q' = \{(q_j a_k, U_k|\psi_j\rangle, U_k|\varphi_j\rangle)\}_{j\in J, k\in K}$
between Φ(ρ) and Φ(σ) which has the same pth-order transport
cost as Q. Taking the infimum over Q gives the result.
Therefore, among the elements $(|\psi_j\rangle\langle\psi_j|, |\varphi_j\rangle\langle\varphi_j|)$ in $M_D^{sa} \times M_D^{sa}$ we can find
so that the coefficients $c_k$ and $-c_l$ are strictly positive. Without loss of generality,
we may assume that
$$\sum_{k\in K} c_k\, d(|\psi_k\rangle, |\varphi_k\rangle)^p \ge \sum_{l\in L} (-c_l)\, d(|\psi_l\rangle, |\varphi_l\rangle)^p. \tag{43}$$
We may then optimise the transport cost over all transport plans of size
at most $2D^2$. This set is compact and the pth-order cost (15) of a transport
plan is a continuous function of the transport plan Q for d continuous, so the
infimum is attained.
The existence of an optimal transport plan will be particularly useful
later when looking at examples, applications, and further properties.
Another key property of $W_p^d$ is continuity, which we can prove provided
that d is uniformly continuous. This will be necessary for a coherent
definition of a dual in Sect. 4.1.
Proposition 17. Suppose d is uniformly continuous on PH and let 1 ≤ p < ∞.
Then $W_p^d$ is uniformly continuous.
Proof. The proof is quite technical and not particularly instructive, so has
been placed in Appendix D 1 for the convenience of the reader.
For p = 1 specifically, we furthermore have joint convexity.
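Proposition 18. $W_1^d$ is jointly convex.
Proof. Suppose $\rho_1, \sigma_1, \rho_2$, and $\sigma_2$ are quantum states, and let $r_1 + r_2 = 1$,
$r_i \ge 0$. Let $Q_1 = \{(q_{1,j}, |\psi_{1,j}\rangle, |\varphi_{1,j}\rangle)\}_{j\in J}$ be any transport plan between
$\rho_1$ and $\sigma_1$, and $Q_2 = \{(q_{2,k}, |\psi_{2,k}\rangle, |\varphi_{2,k}\rangle)\}_{k\in K}$ any transport plan
between $\rho_2$ and $\sigma_2$. Then $Q = r_1 Q_1 \cup r_2 Q_2 := \{(r_1 q_{1,j}, |\psi_{1,j}\rangle, |\varphi_{1,j}\rangle)\}_{j\in J} \cup
\{(r_2 q_{2,k}, |\psi_{2,k}\rangle, |\varphi_{2,k}\rangle)\}_{k\in K}$ is a transport plan between $r_1\rho_1 + r_2\rho_2$ and
$r_1\sigma_1 + r_2\sigma_2$.
Then, we have
$$T_1^d(Q) = r_1 \sum_{j\in J} q_{1,j}\, d(|\psi_{1,j}\rangle, |\varphi_{1,j}\rangle) + r_2 \sum_{k\in K} q_{2,k}\, d(|\psi_{2,k}\rangle, |\varphi_{2,k}\rangle)$$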
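Proof. Let $Q_a = \{(q_{j,a}, |\psi_{j,a}\rangle, |\varphi_{j,a}\rangle)\}_{j\in J}$ and $Q_b = \{(q_{k,b}, |\psi_{k,b}\rangle, |\varphi_{k,b}\rangle)\}_{k\in K}$
be transport plans from $\rho_a$ to $\sigma_a$ and from $\rho_b$ to $\sigma_b$, respectively. We’ll show
that the transport plan $Q = \{(q_{j,a} q_{k,b}, |\psi_{j,a}\rangle \otimes |\psi_{k,b}\rangle, |\varphi_{j,a}\rangle \otimes |\varphi_{k,b}\rangle)\}_{j\in J, k\in K}$
has
$$T_p^{d_a}(Q_a)^{1/p} + T_p^{d_b}(Q_b)^{1/p} \ge T_p^d(Q)^{1/p}. \tag{48}$$
Let $X_a$ be the random variable taking value j with probability
$q_{j,a}$, and similarly let $X_b$, independent of $X_a$, take value k with probability $q_{k,b}$.
Let $A_a = d_a(|\psi_{X_a,a}\rangle, |\varphi_{X_a,a}\rangle)$ and $A_b = d_b(|\psi_{X_b,b}\rangle, |\varphi_{X_b,b}\rangle)$. We see for i = a, b
that $T_p^{d_i}(Q_i)^{1/p}$ is by its definition the pth root of the pth moment of $A_i$. As
the pth root of the pth moment is an $L^p$ norm on random variables on the
measure space $(\mathcal{X}, \mu)$ induced by $X_a$ and $X_b$, the triangle inequality for this norm gives
$$T_p^{d_a}(Q_a)^{1/p} + T_p^{d_b}(Q_b)^{1/p} \ge \mathbb{E}_\mu\left[(A_a + A_b)^p\right]^{1/p}. \tag{49}$$
Equation (46) means that $A_a + A_b \ge d(|\psi_{X_a,a}\rangle \otimes |\psi_{X_b,b}\rangle, |\varphi_{X_a,a}\rangle \otimes |\varphi_{X_b,b}\rangle)$, and so
$$\mathbb{E}_\mu\left[(A_a + A_b)^p\right]^{1/p} \ge \mathbb{E}_\mu\left[d(|\psi_{X_a,a}\rangle \otimes |\psi_{X_b,b}\rangle, |\varphi_{X_a,a}\rangle \otimes |\varphi_{X_b,b}\rangle)^p\right]^{1/p} = T_p^d(Q)^{1/p} \tag{50}$$
as claimed.
Note that this result also holds for p = ∞, as can be seen by replacing
$W_p$ by $W_\infty$ and $T_p(\cdot)^{1/p}$ by $\sup_{j\in J} d(|\psi_j\rangle, |\varphi_j\rangle)$ in the proof.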
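The key step of this argument is Minkowski's inequality on the product measure. The sketch below checks it numerically for randomly chosen weights and costs; the cost lists stand in for the values $d_a(|\psi_{j,a}\rangle, |\varphi_{j,a}\rangle)$ and $d_b(|\psi_{k,b}\rangle, |\varphi_{k,b}\rangle)$, and all names are illustrative.

```python
import random

def root_moment(weights, costs, p):
    """(sum_j q_j c_j^p)^(1/p): the p-th root of the p-th moment of the cost."""
    return sum(q * c ** p for q, c in zip(weights, costs)) ** (1.0 / p)

def product_bound(qa, ca, qb, cb, p):
    """E[(A_a + A_b)^p]^(1/p) under the product distribution, as in (49)."""
    total = sum(x * y * (a + b) ** p
                for x, a in zip(qa, ca) for y, b in zip(qb, cb))
    return total ** (1.0 / p)

def random_distribution(rng, size):
    """Random probability vector of the given length."""
    raw = [rng.random() for _ in range(size)]
    norm = sum(raw)
    return [r / norm for r in raw]

rng = random.Random(0)
checks = []
for p in (1, 2, 3.5):
    qa, qb = random_distribution(rng, 4), random_distribution(rng, 3)
    ca = [rng.random() for _ in qa]
    cb = [rng.random() for _ in qb]
    lhs = root_moment(qa, ca, p) + root_moment(qb, cb, p)
    checks.append(lhs >= product_bound(qa, ca, qb, cb, p) - 1e-12)
```

For p = 1 the two sides coincide, which is why a small tolerance is used in the comparison.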
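Proposition 21. Let d be a metric on PH. Suppose that the 2-norm is Lipschitz
with respect to d. Then,
$$L_d(O) = \sup_{|\psi\rangle \ne |\varphi\rangle} \frac{\operatorname{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]}{d(|\psi\rangle, |\varphi\rangle)}. \tag{52}$$
Proof. For any ρ and σ, let $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j\in J}$ be a transport plan
between them whose first-order transport cost is at most $W_1^d(\rho, \sigma) + \epsilon$. Then,
\begin{align}
L_d(O) &\le \sup_{\rho \ne \sigma} \frac{\operatorname{Tr}[O(\rho - \sigma)]}{W_1^d(\rho, \sigma)} \tag{53}\\
&\le \sup_{\rho \ne \sigma} \frac{\sum_{j\in J} q_j \operatorname{Tr}[O(|\psi_j\rangle\langle\psi_j| - |\varphi_j\rangle\langle\varphi_j|)]}{\sum_{j\in J} q_j \left(d(|\psi_j\rangle, |\varphi_j\rangle) - \epsilon\right)} \tag{54}\\
&\le \sup_{\rho \ne \sigma} \max_{j\in J} \frac{\operatorname{Tr}[O(|\psi_j\rangle\langle\psi_j| - |\varphi_j\rangle\langle\varphi_j|)]}{d(|\psi_j\rangle, |\varphi_j\rangle) - \epsilon} \tag{55}\\
&\le \sup_{|\psi\rangle \ne |\varphi\rangle} \frac{\operatorname{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]}{d(|\psi\rangle, |\varphi\rangle) - \epsilon} \tag{56}\\
&= \sup_{|\psi\rangle \ne |\varphi\rangle} \frac{\operatorname{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]}{W_1^d(|\psi\rangle\langle\psi|, |\varphi\rangle\langle\varphi|) - \epsilon} \tag{57}\\
&\to \sup_{|\psi\rangle \ne |\varphi\rangle} \frac{\operatorname{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]}{W_1^d(|\psi\rangle\langle\psi|, |\varphi\rangle\langle\varphi|)} \quad \text{as } \epsilon \to 0 \tag{58}\\
&\le L_d(O) \tag{59}
\end{align}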
For the other norm properties, clearly $L_d(\lambda O) = |\lambda| L_d(O)$. Now suppose
we have Hermitian operators $O_1$ and $O_2$. For any states ρ and σ, we have
$$\frac{\operatorname{Tr}[(O_1 + O_2)(\rho - \sigma)]}{W_1^d(\rho, \sigma)} = \frac{\operatorname{Tr}[O_1(\rho - \sigma)]}{W_1^d(\rho, \sigma)} + \frac{\operatorname{Tr}[O_2(\rho - \sigma)]}{W_1^d(\rho, \sigma)} \le L_d(O_1) + L_d(O_2) \tag{60}$$
and taking the supremum of the left-hand side over such ρ and σ proves the triangle
inequality for $L_d$.
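To make the pure-state formula (52) concrete, the sketch below estimates $L_d(O)$ for a single qubit by sampling pairs of pure states. Taking d to be the trace distance is an assumption made for this example; in that case the supremum equals the spectral spread $\lambda_{\max}(O) - \lambda_{\min}(O)$, which the sampled value approaches from below. All function names are hypothetical.

```python
import cmath
import math
import random

def random_qubit(rng):
    """Pure qubit state drawn uniformly from the Bloch sphere."""
    theta = math.acos(1 - 2 * rng.random())
    phase = 2 * math.pi * rng.random()
    return [math.cos(theta / 2) + 0j, cmath.exp(1j * phase) * math.sin(theta / 2)]

def expectation(O, psi):
    """<psi|O|psi> for a 2x2 Hermitian matrix O."""
    return sum(psi[i].conjugate() * O[i][j] * psi[j]
               for i in range(2) for j in range(2)).real

def trace_distance(psi, phi):
    """sqrt(1 - |<psi|phi>|^2), the trace distance between pure states."""
    ov = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return math.sqrt(max(0.0, 1.0 - abs(ov) ** 2))

def sampled_lipschitz(O, samples=2000, seed=0):
    """Monte Carlo lower estimate of the supremum in Eq. (52)."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(samples):
        psi, phi = random_qubit(rng), random_qubit(rng)
        dist = trace_distance(psi, phi)
        if dist > 1e-6:  # skip nearly identical pairs to avoid unstable ratios
            best = max(best, (expectation(O, psi) - expectation(O, phi)) / dist)
    return best

def spectral_spread(O):
    """lambda_max - lambda_min of a 2x2 Hermitian matrix."""
    a, c, b = O[0][0].real, O[1][1].real, O[0][1]
    return 2 * math.sqrt(((a - c) / 2) ** 2 + abs(b) ** 2)

observable = [[1.0 + 0j, 0.5 - 0.2j], [0.5 + 0.2j, -0.3 + 0j]]
estimate = sampled_lipschitz(observable)
exact = spectral_spread(observable)
```

Sampling only gives a lower estimate of the supremum; the closed form used for comparison is specific to the trace-distance choice of d.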
We can then dualise one more time and consider the norm
$$\|\rho - \sigma\|_{DW_1^d} = \sup_{L_d(O) \le 1} \operatorname{Tr}[O(\rho - \sigma)]. \tag{61}$$
Proof. For the norm, it is clear that when X = 0 we have $\|X\|_{DW_1^d}$ equal to
zero. The fact that $\|\lambda X\|_{DW_1^d} = |\lambda| \|X\|_{DW_1^d}$ is clear from the definition, and
the triangle inequality holds as
$$\|X_1 + X_2\|_{DW_1^d} = \sup_{L_d(O) \le 1} \operatorname{Tr}[O(X_1 + X_2)]$$
5. Special Instances
5.1. W1H Distance on n-qudit Systems
[50] introduced a quantum Wasserstein distance of order 1 which generalises
the Hamming distance on the discrete hypercube. This is defined on the Hilbert
space $H = (\mathbb{C}^d)^{\otimes n}$. The distance defined is induced by a norm, and we denote it here by
$\|\rho - \sigma\|_{W_1^H}$.
$\|\cdot\|_{W_1^H}$ has the interesting property that it recovers the classical first-order
Wasserstein distance $W_1^H$ on the Hamming cube, for states that are
diagonal in the computational basis. That is, for r and s probability distributions
on $\{0, 1, \dots, d-1\}^n$, and states $\rho = \sum_{x\in\{0,1,\dots,d-1\}^n} r(x)|x\rangle\langle x|$ and
$\sigma = \sum_{y\in\{0,1,\dots,d-1\}^n} s(y)|y\rangle\langle y|$ we have
It then induces a metric $d_H$ on PH, given by $d_H(|\psi\rangle, |\varphi\rangle) = \big\||\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|\big\|_{W_1^H}$,
from which we can define a pth-order Wasserstein distance $W_p^H$ as above. For
the special case p = 1, we will write $W_{p=1}^H$ to avoid confusion.
We might expect that the property in Eq. (67) extends to the $W_p^H$ distance:
that is, that for classical states ρ and σ defined by r and s as above,
Proof. The lower bound comes from Eq. (65). For the upper bound, we refer
to the dual $\|O\|_L$ and show that $\|O\|_L \le 2L_d(O)$. Taking the dual of this
equation will give the upper bound required.
We have from [50, Proposition 15], that
$$2 \max_{1\le i\le n} \|O - \operatorname{Tr}_i O \otimes I_i\|_\infty \ge \|O\|_L. \tag{75}$$
For this specific instance, we have a natural way to look at pth-order
quantum Wasserstein distances under tensor products. Indeed, for qudit systems
$H_a$ and $H_b$ each equipped with $\|\cdot\|_{W_1^H}$ on $n_a$ and $n_b$ qudits, respectively,
$H_a \otimes H_b$ can be equipped with $\|\cdot\|_{W_1^H}$ on $n_a + n_b$ qudits. We can therefore
apply the general result of Proposition 19 to $W_p^H$, as $\|\cdot\|_{W_1^H}$ is additive under tensor products
[50, Proposition 4].
Proposition 2 of [52] that this agrees with $W_p^H$ in the case of a single qudit.
From this we inherit all the properties of $W_p^H$, notably Proposition 25 which
gives us $\|\rho - \sigma\|_1 \ge \|\rho - \sigma\|_{DW_1^1} \ge \frac{1}{2}\|\rho - \sigma\|_1$. However, we can go one step
further.
as taking the dual of this equation would give the inequality in the other
direction.
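For any ρ ≠ σ, let $\rho - \sigma = \rho' - \sigma'$ for $\rho', \sigma'$ positive semidefinite operators
with $\rho' \perp \sigma'$. Then, write $\rho' = \sum_{j\in J} \mu_j |\psi_j\rangle\langle\psi_j|$ and $\sigma' = \sum_{k\in K} \nu_k |\varphi_k\rangle\langle\varphi_k|$ in
and so
$$\frac{\operatorname{Tr}[O(\rho - \sigma)]}{\frac{1}{2}\|\rho - \sigma\|_1} = \frac{\sum_{j\in J, k\in K} \gamma_{j,k} \operatorname{Tr}[O(|\psi_j\rangle\langle\psi_j| - |\varphi_k\rangle\langle\varphi_k|)]}{\frac{1}{2}\sum_{j\in J, k\in K} \gamma_{j,k} \big\||\psi_j\rangle\langle\psi_j| - |\varphi_k\rangle\langle\varphi_k|\big\|_1} \le \sup_{|\psi\rangle \ne |\varphi\rangle} \frac{\operatorname{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)]}{\frac{1}{2}\big\||\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|\big\|_1} = L_1(O) \tag{85}$$
which concludes the proof.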
We can also now give an example to show that joint convexity does not
hold in the case p > 1, analogously to the classical case. Using the notation of
Proposition 18, consider $\rho_1 = |0\rangle\langle 0|$, $\sigma_1 = |0\rangle\langle 0|$, $\rho_2 = |0\rangle\langle 0|$, and $\sigma_2 = |1\rangle\langle 1|$,
with $r_1 = r_2 = \frac{1}{2}$. In this case $r_1 W_p^1(\rho_1, \sigma_1) + r_2 W_p^1(\rho_2, \sigma_2) = \frac{1}{2}$ no matter
the value of p, and $r_1\rho_1 + r_2\rho_2 = |0\rangle\langle 0|$, $r_1\sigma_1 + r_2\sigma_2 = \frac{I}{2}$. By the proof of
Proposition 24, we have that for any transport plan $Q = \{(q_j, |0\rangle, |\varphi_j\rangle)\}_{j\in J}$
between $|0\rangle\langle 0|$ and $\frac{I}{2}$ we have
$$T_p^1(Q)^{1/p} \ge T_1^1(Q) \ge \frac{1}{2}\left\||0\rangle\langle 0| - \frac{I}{2}\right\|_1 = \frac{1}{2}. \tag{86}$$
Equality in the first comparison happens if and only if $d(|0\rangle, |\varphi_j\rangle)$ is constant
in j, since $x \mapsto x^p$ is strictly convex. This means that the $|\varphi_j\rangle$ must all
take the form $\alpha|0\rangle + \sqrt{1 - |\alpha|^2}\, e^{i\theta_j}|1\rangle$ for some fixed α. We must have $\alpha = \frac{1}{\sqrt{2}}$ to
ensure the end state is indeed $\frac{I}{2}$. Equality in the second comparison then happens
only if all $|0\rangle\langle 0| - |\varphi_j\rangle\langle\varphi_j|$ commute, meaning all phases $e^{i\theta_j}$ must be the
same. But then the end state cannot be $\frac{I}{2}$, therefore a plan Q between $|0\rangle\langle 0|$
and $\frac{I}{2}$ satisfying equality here cannot exist. We conclude that joint convexity
does not hold for p > 1.
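The gap can be illustrated on a simple family of plans. The sketch below only considers two-element plans $\{(1/2, |0\rangle, |u\rangle), (1/2, |0\rangle, |u^\perp\rangle)\}$ with $|u\rangle, |u^\perp\rangle$ an orthonormal basis, and assumes d is the trace distance between pure states, so that the two distances are $\sqrt{1-t}$ and $\sqrt{t}$ with $t = |\langle 0|u\rangle|^2$. It is an illustration over this restricted family, not a proof over all plans.

```python
import math

def two_outcome_cost(t, p):
    """T_p^(1/p) of the plan {(1/2,|0>,|u>), (1/2,|0>,|u_perp>)} with t = |<0|u>|^2."""
    d_u, d_perp = math.sqrt(1 - t), math.sqrt(t)
    return (0.5 * d_u ** p + 0.5 * d_perp ** p) ** (1 / p)

# Scan the overlap parameter t on a grid and record the cheapest plan per order p.
grid = [k / 100 for k in range(101)]
min_cost = {p: min(two_outcome_cost(t, p) for t in grid) for p in (1, 1.5, 2, 3)}
```

Within this family the value 1/2 is reached for p = 1 but every order p > 1 stays strictly above it, in line with the failure of joint convexity described above.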
5.3. Complexity Geometry
A distance d(I, U) giving lower bounds for the complexity of synthesising a
unitary $U \in SU(2^n)$ from a universal one- and two-qubit gate set was defined
in [46] and refined in [22,45]. This distance d on $SU(2^n)$ is a geodesic distance
of a Riemannian manifold, where the Riemannian metric is chosen such that
local travel is fast in directions corresponding to multiplication by low-weight
unitaries, and slow in directions corresponding to multiplication by high-weight
unitaries.
This idea of expressing quantum gate complexity in terms of Riemannian
geometry has seen renewed interest in recent years, from applications in black
hole thermodynamics [9,35] to rigid bodies [10] and the complexity of typical
unitaries [8]. However, this metric is originally defined as a distance between
unitaries, with the complexity of a unitary expressed in terms of its distance
from the identity. This extends naturally to distances between pure states as
the lowest complexity of a unitary which transforms one into the other. Our
optimal transport formulation allows for a natural extension of this metric
to mixed states, in a way that can be considered as quantifying the lowest
possible complexity of transforming one mixed state into another. A related
approach in [59] extends a variation of this complexity geometry metric, one
and the distance d(I, U ) is simply the geodesic distance on this Riemannian
manifold.
The main purpose of d is to find a geometric interpretation of gate complexity,
and [22, Equation 3] gives bounds for d in terms of gate complexity.
Let G(U) be the exact gate complexity of U, i.e. the minimal number of one-
and two-qubit gates required to synthesise U exactly. For ε ≥ 0, let G(U, ε) be
the minimal number of one- and two-qubit gates required to synthesise a gate
V such that $\|U - V\|_\infty \le \epsilon$, also known as the ε-approximate gate complexity.
We then have bounds
$$\frac{\kappa\, G(U, \epsilon)^{1/3} \epsilon^{2/3}}{n^2} \le d(I, U) \le G(U) \tag{89}$$
for some constant κ > 0.
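With these unitaries acting on the n-qubit space $H = (\mathbb{C}^2)^{\otimes n}$, we can
define a metric $d_C$ on PH by
$$d_C(|\psi\rangle, |\varphi\rangle) = \min\{d(I, U) : U \in SU(2^n),\ U|\psi\rangle = |\varphi\rangle\}. \tag{90}$$
The metric properties of $d_C$ come directly from the metric properties and
right-invariance of d. Operationally, this gives a lower bound for the minimum
circuit complexity required to synthesise $|\varphi\rangle$ from $|\psi\rangle$. Indeed,
$$\frac{\kappa \left(\min\{G(U, \epsilon) : U|\psi\rangle = |\varphi\rangle\}\right)^{1/3} \epsilon^{2/3}}{n^2} \le d_C(|\psi\rangle, |\varphi\rangle) \le \min\{G(U) : U|\psi\rangle = |\varphi\rangle\}. \tag{91}$$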
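\begin{align}
&\le \sum_{j=0}^{N-1} \left\| i\frac{T}{N} H(jT/N)\, U(jT/N) + \frac{T}{N}\,\delta(jT/N, T/N) \right\|_2 \tag{94}\\
&\le \sum_{j=0}^{N-1} \left( \frac{T}{N} \left\| iH(jT/N)\, U(jT/N) \right\|_2 + \frac{T}{N} \left\| \delta(jT/N, T/N) \right\|_2 \right) \quad \text{by the triangle inequality} \tag{95}\\
&\le \left( \sum_{j=0}^{N-1} \frac{T}{N} \left\| H(jT/N) \right\|_2 \right) + T \sup_t \left\| \delta(t, T/N) \right\|_2 \quad \text{by unitary invariance of } \|\cdot\|_2 \tag{96}\\
&\to \int_0^T \|H(t)\|_2 \, dt \quad \text{as } N \to \infty \text{ by smoothness of } H \tag{97}\\
&\le 2^{n/2} \int_0^T \sqrt{\langle H(t), H(t) \rangle} \, dt \tag{98}
\end{align}
using the inequality $2^{-n/2}\|H\|_2 \le \sqrt{\langle H, H \rangle}$ in the last line. Taking the
infimum over such curves gives $d(I, U) \ge 2^{-n/2}\|U - I\|_2$.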
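Using the formula $\big\||\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|\big\|_2^2 = 2(1 - |\langle\psi|\varphi\rangle|^2)$, we get
\begin{align}
\|U - I\|_2^2 &\ge 2\left(1 - \sqrt{1 - \frac{1}{2}\big\||\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|\big\|_2^2}\right) \tag{103}\\
&\ge \frac{1}{2}\big\||\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|\big\|_2^2. \tag{104}
\end{align}
Combining this with Proposition 27 gives the result.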
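Both ingredients of this step are easy to sanity-check numerically. The sketch below compares the entrywise Hilbert-Schmidt distance of two random projectors with the overlap formula $2(1 - |\langle\psi|\varphi\rangle|^2)$, and checks the elementary inequality behind the passage from (103) to (104); the sampling routine and function names are illustrative assumptions.

```python
import math
import random

def random_state(rng, dim):
    """Unit vector obtained by normalising a complex Gaussian vector."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]

def hs_distance_sq(psi, phi):
    """|| |psi><psi| - |phi><phi| ||_2^2 computed entry by entry."""
    return sum(abs(psi[i] * psi[j].conjugate() - phi[i] * phi[j].conjugate()) ** 2
               for i in range(len(psi)) for j in range(len(psi)))

def overlap_formula(psi, phi):
    """2 (1 - |<psi|phi>|^2)."""
    ov = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return 2 * (1 - abs(ov) ** 2)

def lower_bound_103(x):
    """Right-hand side of (103) as a function of x = squared HS distance."""
    return 2 * (1 - math.sqrt(max(0.0, 1 - x / 2)))

rng = random.Random(1)
pairs = [(random_state(rng, 4), random_state(rng, 4)) for _ in range(50)]
formula_errors = [abs(hs_distance_sq(a, b) - overlap_formula(a, b)) for a, b in pairs]
step_holds = [lower_bound_103(hs_distance_sq(a, b)) >= hs_distance_sq(a, b) / 2 - 1e-12
              for a, b in pairs]
```

The second check is the scalar inequality $1 - \sqrt{1 - y} \ge y/2$ for $y \in [0, 1]$ applied with $y = x/2$.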
It follows from Proposition 12 that all $W_p^C$ defined from this $d_C$ are
nondegenerate. This gives a natural extension of the ideas of state complexity
to mixed states, to which we can apply results such as those which will be
discussed in Sect. 6.2 on classical-quantum (cq) sources.
Looking at ρ and σ in their eigenbases, we can give a concrete interpretation
of these values. Indeed, let
$$\rho = \sum_{b\in\{0,1\}^n} r_b |\psi_b\rangle\langle\psi_b|, \qquad \sigma = \sum_{c\in\{0,1\}^n} s_c |\varphi_c\rangle\langle\varphi_c| \tag{105}$$
which has a first-order transport cost of at most $G(U) + G(V) + W_1^H(r, s)$.
This allows us to conclude that
This splits the quantum transport distance arising from complexity geome-
try into two parts: the classical transport cost between the states, and their
quantum complexity as a whole.
For this specific instance we also have subadditivity as discussed in Proposition
19, as there is a natural choice for the distance $d_C$ on the tensor product.
Namely, for two systems $H_a$, $H_b$ on $n_a$, $n_b$ qubits, respectively, we can equip
$P(H_a \otimes H_b)$ with the distance $d_C$ on $n_a + n_b$ qubits. For states $|\psi_a\rangle, |\varphi_a\rangle$ on
$H_a$ and $|\psi_b\rangle, |\varphi_b\rangle$ on $H_b$, let $U_a|\psi_a\rangle = |\varphi_a\rangle$, and $U_b|\psi_b\rangle = |\varphi_b\rangle$. We have
and therefore $d_C(|\psi_a\rangle \otimes |\psi_b\rangle, |\varphi_a\rangle \otimes |\varphi_b\rangle) \le d_C(|\psi_a\rangle, |\varphi_a\rangle) + d_C(|\psi_b\rangle, |\varphi_b\rangle)$.
The condition of Proposition 19 is satisfied, and so under the natural choice
$d_C$ on $P(H_a \otimes H_b)$ the $W_p^d$ distances are subadditive.
6. Applications
6.1. Results for Random Quantum States
To understand these quantities in general, it is useful to look at how they
behave on random quantum states. We will look at both the versions stemming
from $\|\cdot\|_{W_1^H}$ and from complexity geometry.
We look at a few regimes in the definition of ‘random’ states. For random
pure states, we generate $|\psi\rangle$ according to the uniform measure on the unit
sphere in H, and take $\rho = |\psi\rangle\langle\psi|$. For mixed states, we adjoin an auxiliary
system A of dimension s, generate a random pure state $|\psi\rangle$ on H ⊗ A, then
take $\rho = \operatorname{Tr}_A |\psi\rangle\langle\psi|$. The distribution of ρ depends entirely on the values of s
and dim H chosen.
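The sampling procedure just described can be sketched as follows. The code draws a uniformly random pure state on H ⊗ A by normalising a complex Gaussian vector (a standard way to sample from the unit sphere) and traces out A; the index convention `psi[h * dim_a + a]` and the function names are assumptions of this example.

```python
import math
import random

def haar_state(rng, dim):
    """Uniformly random unit vector: a normalised complex Gaussian vector."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]

def reduced_state(psi, dim_h, dim_a):
    """rho = Tr_A |psi><psi| for psi indexed as psi[h * dim_a + a]."""
    return [[sum(psi[i * dim_a + a] * psi[k * dim_a + a].conjugate()
                 for a in range(dim_a))
             for k in range(dim_h)] for i in range(dim_h)]

def random_mixed_state(dim_h, dim_a, seed=None):
    """Random mixed state on H induced by an auxiliary system of dimension dim_a."""
    rng = random.Random(seed)
    return reduced_state(haar_state(rng, dim_h * dim_a), dim_h, dim_a)

def purity(rho):
    """Tr[rho^2] for a Hermitian matrix given as a nested list."""
    return sum(abs(x) ** 2 for row in rho for x in row)

low_rank = random_mixed_state(dim_h=4, dim_a=2, seed=0)    # s < dim H
high_rank = random_mixed_state(dim_h=4, dim_a=64, seed=0)  # s > dim H
pure = random_mixed_state(dim_h=4, dim_a=1, seed=0)        # s = 1 gives a pure state
```

Larger auxiliary dimensions typically give states closer to the maximally mixed state, which is the intuition behind the high-rank regime discussed next.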
Note that for the two versions studied, the underlying space H is a qudit
space $H = (\mathbb{C}^d)^{\otimes n}$. It is convenient, in this case, to write $s = d^m$, as this
allows us to consider the qudit ratio $c = \frac{m}{n}$ between the auxiliary and base
systems. Note that while s is always an integer, m need not be, i.e. we will also
consider auxiliary dimensions that are not a power of d. We can then consider
the regime c < 1 as ‘low rank’, and the regime c > 1 as ‘high rank’. We will see
that there is a marked phase transition between values c < 1, in which $W_1^H$
and $W_1^C$ grow on average like diam(PH) in n, and values $c > c^*$ for
some threshold $c^* \ge 1$ depending on H, in which all $W_1^d$ decay exponentially
with n on average.
Previous works have highlighted similar properties for other quantities.
For example, [60] shows that for the regime c > 1 we have
$\mathbb{E}[S(\rho)] \ge n \log d - \frac{1}{2} d^{-(c-1)n}$, from which we can show that the quantum relative entropy
$D(\rho\|\sigma) = \operatorname{Tr}[\rho \log \rho - \rho \log \sigma]$ has $\mathbb{E}[D(\rho\|\sigma)]$ decaying exponentially in n for c > 1.
Meanwhile, two random states ρ, σ with c < 1 have span ρ ⊄ span σ with probability
1, and so $D(\rho\|\sigma)$ is in general infinite. Similar results have been found
for the trace distance [61], showing that for c < 1 we have
$\mathbb{E}\left[\frac{1}{2}\|\rho - \sigma\|_1\right] \to 1$ as n → ∞, and for c > 1 we have
$\mathbb{E}\left[\frac{1}{2}\|\rho - \sigma\|_1\right] \to 0$ as n → ∞. However, these existing analyses relate
to the notion of distinguishability of states, whereas analysing the phase
transition in $W_1^d$ allows us to go beyond this regime to discuss other properties
such as the computational complexity between random states.
For mixed states in the high-rank regime, we see exponential decay in
the expected $W_1^d$ distance between two i.i.d. states, no matter the underlying
metric d. This is summarised in the following proposition. Note that while it
is phrased for qudit systems, it applies for any H, A where $c = \frac{\log \dim A}{\log \dim H}$ and
$d^n = \dim H$. We write $\operatorname{diam}_d(PH)$ for the diameter of PH under metric d.
Proposition 29. Let $H = (\mathbb{C}^d)^{\otimes n}$ with i.i.d. random mixed states ρ, σ generated
by an auxiliary system A of dimension $s = d^m$. Let $c = \frac{m}{n}$, and suppose
c > 1. Then, for any β > 0, we have
$$\mathbb{P}\left[ W_1^d(\rho, \sigma) \ge \beta d^{-(c-3)n/2} \operatorname{diam}_d(PH) \right] \le \frac{1}{\beta^2}. \tag{111}$$
Proof. For random ρ, σ generated from large auxiliary systems, we generally
expect both to be close to maximally mixed. And so letting the minimum
eigenvalue among ρ and σ be $\frac{1}{d^n} - \delta$, we can split up ρ into parts $\rho - \left(\frac{1}{d^n} - \delta\right) I$
and $\left(\frac{1}{d^n} - \delta\right) I$, both of which are positive semidefinite. We can do the same for
σ. We can then transport $\left(\frac{1}{d^n} - \delta\right) I$ onto $\left(\frac{1}{d^n} - \delta\right) I$ at zero cost, and transport
$\rho - \left(\frac{1}{d^n} - \delta\right) I$ onto $\sigma - \left(\frac{1}{d^n} - \delta\right) I$ via any partial transport plan, at a maximum
cost of $\operatorname{Tr}\left[\rho - \left(\frac{1}{d^n} - \delta\right) I\right] \operatorname{diam}_d(PH) = \delta d^n \operatorname{diam}_d(PH)$. We will show that this
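For the $W_1^H$ case, we know $\operatorname{diam}_d(PH) = n$, and so this exponential
decay applies for qubit ratio c > 3 and large enough n. Applying this to the
expectation gives, taking λ = 1/3 for an optimal decay rate,
\begin{align}
\mathbb{E}_{\rho,\sigma} W_1^H(\rho, \sigma) &\le \left(1 - \frac{1}{\beta^2}\right) \beta d^{-(c-3)n/2} \operatorname{diam}_d(PH) + \frac{1}{\beta^2} \operatorname{diam}_d(PH) \tag{120}\\
&\le n^{1-\lambda} \exp_d\left(-\tfrac{1}{2}(1-\lambda)(c-3)n\right) + n^{1+2\lambda} \exp_d\left(-\lambda(c-3)n\right) \tag{121}\\
&= \exp_d\left(-\tfrac{1}{3}(c-3)n + \tfrac{2}{3}\log_d n\right) + \exp_d\left(-\tfrac{1}{3}(c-3)n + \tfrac{5}{3}\log_d n\right) \tag{122}\\
&= \left(n^{2/3} + n^{5/3}\right) \exp_d\left(-\tfrac{1}{3}(c-3)n\right) \tag{123}
\end{align}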
For the $W_1^C$ case, any n-qubit gate can be synthesised in at most $2^n(2^n - 1)$
one- and two-qubit gates [44], and so the metric space PH has diameter at
most $2^{2n}$. It then follows that for any qubit ratio c > 7, the probability of an
exponentially large deviation becomes exponentially small.
Applying Eq. (111) to the expectation gives, taking $\lambda = \frac{c-3}{3(c-7)}$ for an
optimal decay rate,
\begin{align}
\mathbb{E}_{\rho,\sigma} W_1^C(\rho, \sigma) &\le \left(1 - \frac{1}{\beta^2}\right) \beta\, 2^{-(c-3)n/2} \operatorname{diam}_d(PH) + \frac{1}{\beta^2} \operatorname{diam}_d(PH) \tag{124}\\
&\le 2^{-(1-\lambda)(c-3)n/2}\, 2^{(1-\lambda)2n} + 2^{-\lambda(c-3)n}\, 2^{(1+2\lambda)2n} \tag{125}\\
&= 2^{-(1-\lambda)(c-7)n/2} + 2^{-(\lambda(c-7)-2)n} \tag{126}\\
&= 2 \cdot 2^{-(c-9)n/3} \tag{127}
\end{align}
giving exponential decay in expectation for qubit ratios c > 9.
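The choices of λ in the two expectation bounds can be checked with exact rational arithmetic. The sketch below computes the coefficient of n in each exponential term of (121) and (126) and confirms that the stated values of λ balance the two terms; the function names are illustrative.

```python
from fractions import Fraction

def hamming_exponents(c, lam):
    """Coefficients of n in the two exp_d terms of (121)."""
    return (-(1 - lam) * (c - 3) / 2, -lam * (c - 3))

def complexity_exponents(c, lam):
    """Coefficients of n in the two powers of 2 in (126)."""
    return (-(1 - lam) * (c - 7) / 2, -(lam * (c - 7) - 2))

def complexity_lambda(c):
    """lambda = (c - 3) / (3 (c - 7)), the value used before (124)."""
    return Fraction(c - 3) / (3 * (c - 7))

c_values = [Fraction(10), Fraction(12), Fraction(25, 2)]
hamming = {c: hamming_exponents(c, Fraction(1, 3)) for c in c_values}
complexity = {c: complexity_exponents(c, complexity_lambda(c)) for c in c_values}
```

For the complexity-geometry case both coefficients equal $-(c-9)/3$, which is negative exactly when c > 9.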
For the low-rank setting, we look first at the $W_1^H$ distance. There are
two lines of intuition here. The first is that $W_1^H$ generalises the Hamming
distance, and the Hamming distance quantifies the local distinguishability of
d-ary strings. If this property propagated to the quantum setting, we’d expect
$W_1^H$ to be small on average as random pure states are generally locally
indistinguishable. This was the behaviour conjectured in [24]. The second is
that the average Hamming distance between two d-ary strings of length n is
n(1 − 1/d), and so we might also expect the average $W_1^H$ distance between
random pure states to grow linearly with the number of qudits.
Let $H = (\mathbb{C}^d)^{\otimes n}$ and let $m = \log_d s$, noting again that, while s is an
integer, m need not be. In the case m < n, we can apply Theorem 9.1 of [54]
to lower bound the expected distance between two low-rank random mixed
states. A similar result was noted independently in [49].
Proposition 30. Let ρ, σ be two i.i.d. random mixed states on $H = (\mathbb{C}^d)^{\otimes n}$
generated using an auxiliary system of dimension $s = d^m$ for m < n. Write
$c = \frac{m}{n}$. Then,
$$\mathbb{E}_{\rho,\sigma} W_{p=1}^H(\rho, \sigma) \ge \lambda_c n \tag{128}$$
where $\lambda_c$ satisfies $(1 - c) \log d = h_2(\lambda_c) + \lambda_c \log(d^2 - 1)$ for $h_2$ the binary entropy.
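The constant $\lambda_c$ is defined only implicitly, so a numerical solver is convenient. The following sketch finds it by bisection, assuming natural logarithms throughout (the equation is unchanged if every logarithm uses the same base) and using the fact that $g(\lambda) = h_2(\lambda) + \lambda \log(d^2 - 1)$ is increasing on $[0, 1 - 1/d^2]$; the function names are hypothetical.

```python
import math

def binary_entropy(x):
    """h_2(x) with natural logarithms and the convention 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def g(lam, d):
    """g(lambda) = h_2(lambda) + lambda log(d^2 - 1)."""
    return binary_entropy(lam) + lam * math.log(d * d - 1)

def lambda_c(c, d, tol=1e-12):
    """Solve g(lambda) = (1 - c) log d by bisection for 0 <= c <= 1."""
    target = (1 - c) * math.log(d)
    lo, hi = 0.0, 1.0 - 1.0 / (d * d)  # g is increasing on this interval
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid, d) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

qubit_rates = {c: lambda_c(c, 2) for c in (0.0, 0.25, 0.5, 0.9, 1.0)}
```

The resulting rate decreases as the ratio c grows and vanishes at c = 1, consistent with the bound becoming trivial at the edge of the low-rank regime.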
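Proof. First note from Proposition 24 that $W_{p=1}^H(\rho, \sigma) \ge \|\rho - \sigma\|_{W_1^H}$, and so
we prove that $\mathbb{E}_{\rho,\sigma}\left[\|\rho - \sigma\|_{W_1^H}\right] \ge \lambda n$.
Fix ρ, and note that averaging over σ and using convexity of the norm,
we have
\begin{align}
\mathbb{E}_\sigma\left[\|\rho - \sigma\|_{W_1^H}\right] &\ge \|\rho - \mathbb{E}_\sigma \sigma\|_{W_1^H} \tag{129}\\
&= \left\|\rho - \frac{I_d^{\otimes n}}{d^n}\right\|_{W_1^H}. \tag{130}
\end{align}
(1 − c) log d at exactly one value λ ∈ [0, 1], and that for t < λ we have g(t) <
g(λ) and for t > λ we have g(t) > g(λ), we conclude that $\left\|\rho - \frac{I_d^{\otimes n}}{d^n}\right\|_{W_1^H} > \lambda n$.
Averaging over ρ gives the result.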
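is defined as a weighted sum of projectors onto subspaces $V_i \cap V_{i-1}^\perp$, where
$$V_i = \operatorname{span}\{O|\psi\rangle : O \text{ acts on at most } i \text{ qudits}\}. \tag{132}$$
This has been specifically constructed so that $\operatorname{Tr}[\Lambda_{|\psi\rangle} |\psi\rangle\langle\psi|] = 0$, and that if
$|\varphi\rangle$ can be written as a sum of states which differ from $|\psi\rangle$ in at most k qubits,
then $\operatorname{Tr}[\Lambda_{|\psi\rangle} |\varphi\rangle\langle\varphi|] \le k$. It was proven in [54] that $\|\Lambda_{|\phi\rangle}\|_L \le 1$, and analysis
of the dimension of $V_i$ shows that $\operatorname{Tr}[\Lambda_{|\psi\rangle}] = O(n d^n)$. It follows that, for fixed
$|\psi\rangle$, $\mathbb{E}_{|\varphi\rangle \sim \mu_{\mathrm{Haar}}} \operatorname{Tr}[\Lambda_{|\psi\rangle}(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)] = O(n)$.
Turning our attention to the $W_1^C$ distance generated from complexity
geometry on $P\left((\mathbb{C}^2)^{\otimes n}\right)$, we see a similar picture for low-rank states. As noted
earlier, for the approximate gate complexity G(U, ε) and the gate complexity
G(U), we have the bound
$$\frac{\kappa\, G(U, \epsilon)^{1/3} \epsilon^{2/3}}{n^2} \le d(I, U) \le G(U) \tag{133}$$
for some constant κ > 0.
For pure states, we also know that $W_p^C(|\psi\rangle\langle\psi|, |\varphi\rangle\langle\varphi|) = d_C(|\psi\rangle, |\varphi\rangle)$.
And so to show the behaviour of $W_p^C$ distances on low-rank states, we look at
$d_C$.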
Lemma 31. Let $\mathcal{H} = (\mathbb{C}^2)^{\otimes n}$ be an $n$-qubit space, and let $S \subseteq SU(4)$ be a finite universal gate set with inverses. Letting $G_S(U, \epsilon)$ be the $\epsilon$-approximate gate complexity of $U$ from set $S$ viewed as a set of gates on 2 qubits, and $G(U, \epsilon)$ the $\epsilon$-approximate gate complexity of $U$ using any one- or two-qubit gates, we have
\[ G(U, \epsilon)\, \mathrm{poly}\left(\log(G(U, \epsilon)) + \log(\epsilon^{-1})\right) \geq G_S(U, 2\epsilon). \tag{134} \]
Proof. Let $V_1, \dots, V_{G(U,\epsilon)}$ be any circuit of one- and two-qubit gates to synthesise $V$ such that $\|U - V\|_\infty \leq \epsilon$. Using Solovay–Kitaev, each of these can be approximated to within error $\epsilon/G(U, \epsilon)$ in $\mathrm{poly}(\log(G(U, \epsilon)/\epsilon))$ gates from $S$. Compounding errors linearly, these form a circuit of length $G(U, \epsilon)\, \mathrm{poly}(\log(G(U, \epsilon)) + \log(\epsilon^{-1}))$ of gates from $S$ which synthesises $U$ to within operator norm $2\epsilon$.
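The counting in this proof can be sketched numerically. The snippet below is an illustration only: the function name `compiled_length_bound` is hypothetical, and the polylogarithm is modelled as `sk_constant * log(1/error) ** sk_exponent`, where both the constant and the exponent (set here to 3.97, a value often quoted for Solovay–Kitaev constructions) are assumptions rather than quantities fixed by the lemma.

```python
import math

def compiled_length_bound(g, eps, sk_exponent=3.97, sk_constant=1.0):
    """Length and error of a circuit recompiled into a finite gate set S.

    g           -- G(U, eps): number of arbitrary one- and two-qubit gates
    eps         -- operator-norm error of the original circuit
    sk_exponent -- ASSUMED exponent of the Solovay-Kitaev polylog
    sk_constant -- ASSUMED multiplicative constant
    """
    per_gate_error = eps / g                      # each gate approximated to eps / G
    per_gate_cost = sk_constant * math.log(1.0 / per_gate_error) ** sk_exponent
    total_length = g * per_gate_cost              # G * poly(log G + log(1/eps))
    total_error = eps + g * per_gate_error        # errors compound linearly: 2 * eps
    return total_length, total_error

length, error = compiled_length_bound(g=100, eps=1e-3)
```

The returned `error` is exactly `2 * eps`, matching the right-hand side $G_S(U, 2\epsilon)$ of Eq. (134).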
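Lemma 32. Let $S \subseteq SU(4)$ be a finite universal gate set with inverses and $\rho, \sigma$ i.i.d. random quantum states on $\mathcal{H} = (\mathbb{C}^2)^{\otimes n}$ generated by auxiliary system $A$ of integer dimension $s = 2^{cn}$ where $0 \leq c < 1$. Let
\[ G_S(U^{\mathrm{opt}}_{\rho\to\sigma}, \epsilon) = \min\left\{ G_S\left(U_{|\psi\rangle\to|\varphi\rangle}, \epsilon\right) : |\psi\rangle \in \mathrm{span}\,\rho,\ |\varphi\rangle \in \mathrm{span}\,\sigma \right\}. \]
Then,
\[ \mathbb{P}_{\rho,\sigma}\left[ G_S(U^{\mathrm{opt}}_{\rho\to\sigma}, \epsilon) \leq 2^{(1-\delta)n} \right] \leq e^{-\Omega(2^n \log(1/\epsilon))}. \tag{135} \]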
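Note that $\mathrm{span}\,\rho$ and $\mathrm{span}\,\sigma$ are i.i.d. hyperplanes distributed according to the Haar measure of the Grassmannian $\mathrm{Gr}_s\mathcal{H}$. Fix element $R$ of $\mathrm{Gr}_s\mathcal{H}$ and choose an $\epsilon$-ball covering of $(R \setminus \{0\})/\mathbb{C}$ of minimal size $N = e^{O(2^{cn}\log(1/\epsilon))}$ centred on elements $\{y_i\}_{i=1}^N$. Choose fixed unitaries $U_\rho$ and $U_\sigma$ such that $U_\rho R = \mathrm{span}\,\rho$ and $U_\sigma R = \mathrm{span}\,\sigma$, and independent random unitaries $V_\rho, V_\sigma \sim \mu_{\mathrm{Haar}}$ on $\mathrm{span}\,\rho$, $\mathrm{span}\,\sigma$, respectively. This gives randomly chosen independent $\epsilon$-covers $\{Y_i = V_\rho U_\rho y_i\}_{i=1}^N$ of $\mathrm{span}\,\rho$ and $\{Z_j = V_\sigma U_\sigma y_j\}_{j=1}^N$ of $\mathrm{span}\,\sigma$. Each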
The first-order Wasserstein distance of [50] has been used in the quantum setting to give lower bounds on the circuit complexity of shallow random quantum circuits [39], though, just as from the classical point of view, the bound $\|\rho - \sigma\|_{W_1^H} \leq n$ greatly restricts the effectiveness of this lower bound. The results are only significant for circuits with a number of gates linear in the number of qubits, as the maximum lower bound possible using $\|\cdot\|_{W_1^H}$ is $n$. One potential application of $W_p^C$ could be to extend this result to give lower bounds on the circuit complexity of random quantum circuits of arbitrary depth, as it does not suffer from this linear constraint.
\[ T_p^d(Q) = \sum_{i=1}^{N} p_i\, d(|\psi_i\rangle, |\varphi_i\rangle)^p = \mathbb{E}_X[d(r, s)^p] \tag{147} \]
which is lower bounded by the optimal $p$th-order quantum transport cost $W_p^d(\rho, \sigma)$.
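For sharpness, let $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j\in J}$ be any finite transport plan between $\rho$ and $\sigma$. The sources
\[ R = \sum_{j\in J} q_j |j\rangle\langle j| \otimes |\psi_j\rangle\langle\psi_j|, \qquad S = \sum_{j\in J} q_j |j\rangle\langle j| \otimes |\varphi_j\rangle\langle\varphi_j| \tag{148} \]
and taking the infimum of the left-hand side over $\mathcal{Q}(\rho, \sigma)$ shows that the bound is sharp. Operationally, for the $W_\infty^C$ distance, this means $W_\infty^C(\rho, \sigma)$ is a lower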
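Proof. Let $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j\in J}$ be any $p_1$th-order transport plan from $\rho$ to $\sigma$, and let
\[ Q' = (1 - \delta)Q \cup \delta\{(\lambda_i, |\omega_i\rangle, |\omega_i\rangle)\}_{i\in I} \tag{158} \]
for $\tau = \sum_{i\in I} \lambda_i |\omega_i\rangle\langle\omega_i|$ a spectral decomposition. This is a transport plan from $S_{\delta,\tau}(\rho)$ to $S_{\delta,\tau}(\sigma)$. This gives
\begin{align}
T^d_{p_2}(Q')^{1/p_2} &= \Big( (1 - \delta) \sum_{j\in J} q_j\, d(|\psi_j\rangle, |\varphi_j\rangle)^{p_2} \Big)^{1/p_2} = (1 - \delta)^{1/p_2}\, T^d_{p_2}(Q)^{1/p_2} \notag \\
&\leq \left( \frac{M}{\mathrm{diam}_d(P\mathcal{H})} \right)^{1 - p_1/p_2} T^d_{p_2}(Q)^{1/p_2}. \tag{159}
\end{align}
So it remains to show that
\[ T^d_{p_1}(Q)^{1/p_1} \geq \left( \frac{M}{\mathrm{diam}_d(P\mathcal{H})} \right)^{1 - p_1/p_2} T^d_{p_2}(Q)^{1/p_2}. \tag{160} \]
Given quantities $\{A_j\}_{j\in J}$ subject to constraints
\begin{align}
\sum_{j\in J} q_j A_j^{p_1} &= M^{p_1} \tag{161} \\
0 \leq A_j &\leq \mathrm{diam}_d(P\mathcal{H}), \tag{162}
\end{align}
the maximum value of $\sum_{j\in J} q_j A_j^{p_2}$ is achieved when all $A_j$ are either $\mathrm{diam}_d(P\mathcal{H})$ or $0$, with respective total weights $q_1 = (M/\mathrm{diam}_d(P\mathcal{H}))^{p_1}$ and $q_2 = 1 - q_1$. This gives
\[ T^d_{p_2}(Q) \leq (M/\mathrm{diam}_d(P\mathcal{H}))^{p_1}\, \mathrm{diam}_d(P\mathcal{H})^{p_2} \tag{163} \]
and therefore that
\[ \frac{T^d_{p_1}(Q)^{1/p_1}}{T^d_{p_2}(Q)^{1/p_2}} \geq \frac{M}{\mathrm{diam}_d(P\mathcal{H})} \left( \frac{\mathrm{diam}_d(P\mathcal{H})}{M} \right)^{p_1/p_2} = \left( \frac{M}{\mathrm{diam}_d(P\mathcal{H})} \right)^{1 - p_1/p_2}. \tag{164} \]
This gives $T^d_{p_2}(Q')^{1/p_2} \leq T^d_{p_1}(Q)^{1/p_1}$, and taking the infimum over $Q$ gives the result.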
We note from the condition $1 - \delta \leq (M/\mathrm{diam}_d(P\mathcal{H}))^{p_2 - p_1}$ that this result is applicable only when $W^d_{p_1}$ is close to $\mathrm{diam}_d(P\mathcal{H})$ or when $p_2 - p_1$ is small. Otherwise, we would require $\delta$ so large that the channels are no longer of practical use. Our results on random states from Sect. 6.1 demonstrate that this is indeed a relevant case, as for random low-rank states, $W_p^H$ and $W_p^C$ are generally high.
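To make the condition on $\delta$ and the extremal bound of Eq. (163) concrete, here is a small numerical sketch. The names `min_delta` and `extremal_check`, the random weights and the choice of diameter 2 are all illustrative assumptions, and the quantities `A` stand in for the distances $d(|\psi_j\rangle, |\varphi_j\rangle)$ of a hypothetical transport plan.

```python
import numpy as np

def min_delta(M, diam, p1, p2):
    """Smallest delta satisfying 1 - delta <= (M / diam) ** (p2 - p1)."""
    return 1.0 - (M / diam) ** (p2 - p1)

def extremal_check(q, A, diam, p1, p2):
    """Return both sides of Eq. (163) for weights q and distances A in [0, diam]."""
    M = np.sum(q * A**p1) ** (1.0 / p1)       # constraint (161) defines M
    lhs = float(np.sum(q * A**p2))            # order-p2 cost of the plan
    rhs = float((M / diam) ** p1 * diam**p2)  # claimed maximum
    return lhs, rhs

rng = np.random.default_rng(0)
q = rng.dirichlet(np.ones(8))                 # hypothetical plan weights
A = rng.uniform(0.0, 2.0, size=8)             # hypothetical distances, diameter 2
lhs, rhs = extremal_check(q, A, diam=2.0, p1=1.0, p2=3.0)

# Extremal configuration: all distances equal to 0 or to the diameter.
lhs_ext, rhs_ext = extremal_check(np.array([0.25, 0.75]), np.array([2.0, 0.0]),
                                  diam=2.0, p1=1.0, p2=3.0)
```

In this sketch `min_delta(1.0, 2.0, 1.0, 2.0)` evaluates to `0.5`, illustrating how quickly the required $\delta$ grows once $M$ is well below the diameter.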
Particularly in the case of complexity geometry, this shows that channel
noise reduces complexity. These results also demonstrate the value of considering arbitrary $p$ in our construction of the $W_p^d$ distances. Such consideration is not possible without being able to relate Wasserstein distances of different orders to one another. In general, for arbitrary $p_1, p_2$, this shows that the ratio $W^d_{p_1}(\rho, \sigma) / W^d_{p_2}(\mathcal{N}(\rho), \mathcal{N}(\sigma))$ can be well considered as a measure of noise in the channel $\mathcal{N}$.
This is applicable, in particular, in situations where we have no knowledge
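Proof. Let $M = W^d_{p_1}(\rho, \sigma) \geq W^d_{p_2}(\mathcal{N}(\rho), \mathcal{N}(\sigma))$. It follows that in the limit of optimal $p_2$th-order transport plans $Q = \{(q_j, |\psi_j\rangle, |\varphi_j\rangle)\}_{j\in J}$ between $\mathcal{N}(\rho)$ and $\mathcal{N}(\sigma)$, we have
\[ \sum_{j\in J} q_j\, d(|\psi_j\rangle, |\varphi_j\rangle)^{p_2} \leq M^{p_2}. \tag{166} \]
Let $L = \sum_{j\in J:\, d(|\psi_j\rangle, |\varphi_j\rangle) > K} q_j$. This also has $L = \mathbb{P}(d(|\psi_X\rangle, |\varphi_X\rangle) > K)$, where $X$ is a random variable taking values in $J$ according to law $\mathbb{P}(X = j) = q_j$. Note therefore that by Markov's inequality
\[ L \leq \frac{\sum_{j\in J} q_j\, d(|\psi_j\rangle, |\varphi_j\rangle)^{p_2}}{K^{p_2}} \leq \left( \frac{M}{K} \right)^{p_2}. \tag{167} \]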
Thus, we see that a hypercontractive inequality implies that most of the
transport plan is concentrated on states that are close, and this concentration
becomes more pronounced the lower the value of p1 (as the initial distance is
monotonically increasing in p1 ) and the higher the value of p2 .
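The concentration statement can be checked on a toy transport plan. The sketch below uses hypothetical random weights and distances (not derived from any channel) and compares the mass placed on pairs further apart than $K$ with the first bound in Eq. (167); the function names `tail_mass` and `markov_bound` are illustrative.

```python
import numpy as np

def tail_mass(q, dist, K):
    """L: total weight of plan elements whose distance exceeds K."""
    return float(np.sum(q[dist > K]))

def markov_bound(q, dist, K, p2):
    """Markov bound sum_j q_j d_j^{p2} / K^{p2} from Eq. (167)."""
    return float(np.sum(q * dist**p2) / K**p2)

rng = np.random.default_rng(1)
q = rng.dirichlet(np.ones(50))            # hypothetical plan weights, summing to 1
dist = rng.uniform(0.0, 1.0, size=50)     # hypothetical distances d(psi_j, phi_j)

L = tail_mass(q, dist, K=0.8)
bound = markov_bound(q, dist, K=0.8, p2=4.0)
```

Raising `p2` in this sketch tightens the bound whenever the order-`p2` cost stays below `K ** p2`, in line with the remark that the concentration is more pronounced for higher $p_2$.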
6.4. Quantum Wasserstein Generative Adversarial Networks
One of the most promising near-term applications of quantum devices is in
quantum machine learning. Among these methods, one framework which has
attracted attention in recent years is that of generative adversarial networks,
or GANs.
GANs [34] form a framework in machine learning which seeks to generate
new data which are indistinguishable from the training data. For example, they
can be used to generate images which look like authentic photographs when
trained on datasets of photographs [3], or provide image-to-text translation
for automatic image description for visually impaired users [40]. The main
feature of GANs is learning via a zero-sum game between a generator, which
is trained to generate data, and a discriminator, which is trained to distinguish
between the generated data and the training data. The standard GAN takes
its definition of ‘far’ to be ‘large Jensen-Shannon divergence’.
Wasserstein GANs [1] were proposed in 2017 as an alternative to standard
GANs, which redefine ‘far’ to mean ‘large first-order Wasserstein distance’.
These provide numerous improvements over standard GANs, including the
reduction of mode collapse, and allowing reference distributions which are
concentrated to small dimension in a space of large dimension.
Quantum GANs [20] are a natural quantum equivalent of classical GANs,
where a quantum state generator G parameterised by θ is trained to a target
distribution ρtar . The standard architecture for such a generator is a shallow
quantum circuit, whose gates are functions of θ.
With standard distinguishing functions, such as the trace distance or
fidelity, these suffer greatly from the problem of barren plateaus [41]—areas
where, as the number n of qudits grows, the gradient of the objective function
decays to 0. As in the classical setting, GANs which use the theory behind the
first-order quantum Wasserstein distance of De Palma et al. [50] have been
proposed [23,38] to eliminate this problem.
The form of such a quantum Wasserstein GAN (qWGAN) is as follows. The generated state is parameterised by $\theta$, and is stored as a set $\{(p_1, U_1(\theta)), \dots, (p_r, U_r(\theta))\}$ where $p(\theta) = \{p_1, \dots, p_r\}$ is a probability distribution also parameterised by $\theta$. This represents the state
\[ G(\theta) = \sum_{i=1}^{r} p_i\, U_i(\theta) |0\rangle\langle 0| U_i(\theta)^\dagger. \tag{168} \]
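As a minimal sketch of Eq. (168), the following builds the density matrix $G(\theta)$ from a list of probabilities and unitaries. The helper names `ry` and `generated_state` are hypothetical, and the single-qubit rotation used as a parameterised gate is only an illustrative choice, not the architecture of any cited work.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the Y axis (illustrative parameterised gate)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def generated_state(probs, unitaries):
    """G(theta) = sum_i p_i U_i |0><0| U_i^dagger as a dense density matrix."""
    dim = unitaries[0].shape[0]
    zero = np.zeros(dim, dtype=complex)
    zero[0] = 1.0                                  # the reference state |0>
    state = np.zeros((dim, dim), dtype=complex)
    for p, U in zip(probs, unitaries):
        v = U @ zero                               # U_i |0>
        state += p * np.outer(v, v.conj())         # p_i U_i |0><0| U_i^dagger
    return state

G = generated_state([0.5, 0.5], [ry(0.0), ry(np.pi)])
```

With these two rotations the mixture is the maximally mixed qubit state, so `G` is close to `np.eye(2) / 2`.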
This again highlights the need for broad flexibility in the distinguishability
metric used in quantum Wasserstein GANs.
Such broad flexibility can theoretically be easily achieved by replacing the Lipschitz norm $\|\cdot\|_L$ associated with the $\|\cdot\|_{W_1^H}$ norm, with the more general $L_d(O)$ constant defined in Eq. 51, for an underlying distance $d$ satisfying the required continuity conditions for this to exist. This would, in theory, allow us once again to avoid barren plateaus, but also to prioritise qualitatively different ideas of discrimination via the choice of the metric $d$.
This would give rounds of qWGAN optimisation of the following form:
1. Replace $O$ by $O_{\max} = \mathrm{argmax}_{L_d(O) \leq 1} \mathrm{Tr}[O(G(\theta) - \rho_{\mathrm{tar}})]$,
2. Calculate the gradients in $\theta$ of $\mathrm{Tr}[O_{\max}(G(\theta) - \rho_{\mathrm{tar}})]$,
3. Update $p(\theta)$ and each $U_i(\theta)$ according to these gradients.
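The three steps above can be sketched on a toy example. This is an illustration under strong assumptions, not an implementation of the general scheme: the feasible set $\{O : L_d(O) \leq 1\}$ is replaced by a hypothetical finite list of signed single-qubit Pauli operators, the generator is a single pure state with one parameter (so $r = 1$ and $p(\theta)$ is trivial), and gradients are taken by central finite differences.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
# HYPOTHETICAL stand-in for the set {O : L_d(O) <= 1}.
CANDIDATES = [sign * P for P in (X, Y, Z) for sign in (1, -1)]

def generator(theta):
    """Toy one-parameter generator: G(theta) = R_y(theta)|0><0|R_y(theta)^dagger."""
    v = np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)
    return np.outer(v, v.conj())

def objective(O, theta, rho_tar):
    return float(np.real(np.trace(O @ (generator(theta) - rho_tar))))

def qwgan_round(theta, rho_tar, lr=0.05, h=1e-5):
    # Step 1: pick the best discriminator among the candidates.
    O_max = max(CANDIDATES, key=lambda O: objective(O, theta, rho_tar))
    value = objective(O_max, theta, rho_tar)
    # Step 2: gradient in theta (central finite difference).
    grad = (objective(O_max, theta + h, rho_tar)
            - objective(O_max, theta - h, rho_tar)) / (2 * h)
    # Step 3: gradient-descent update of the generator parameter.
    return theta - lr * grad, value

rho_tar = generator(np.pi / 2)        # target state |+><+|
theta, history = 0.3, []
for _ in range(200):
    theta, value = qwgan_round(theta, rho_tar)
    history.append(value)
```

Because the maximum over discriminators is non-smooth, a constant learning rate leaves `theta` oscillating within roughly `lr` of the target angle; a decaying step size would be one way to tighten this.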
This model opens a theoretical avenue to qWGANs with flexible notions
of distinguishability, though we note that implementation of this lies beyond
the scope of this work due to its highly variable nature. In particular, the big
challenge is accurately calculating and maximising Ld (O), and it is likely that
this will require bespoke approximations to Ld for each choice of d.
Any such approximation would need to be careful to preserve the nature of the distance chosen, in order to avoid the issue of misalignment of notions of distinguishability. For example, the above mentioned works [23,38] discuss qWGANs which supposedly give convergence in $\|\cdot\|_{W_1^H}$, but the approximation used for the Lipschitz constant of $O$ is a weighted sum of the coefficients of the Pauli decomposition of $O$, which directly measures local distinguishability as discussed in [38]. We have shown in Sect. 6.1 that the $\|\cdot\|_{W_1^H}$ norm does not reflect local distinguishability in general.
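More concretely, they only optimise over operators $O$ of the form
\[ O = \omega_I I + \sum_{i=1}^{n} \sum_{P\in\{X,Y,Z\}} \omega_P^{(i)} P^{(i)} + \sum_{1\leq i<j\leq n} \sum_{P,Q\in\{X,Y,Z\}} \omega_{P,Q}^{(i,j)} P^{(i)} \otimes Q^{(j)} \tag{169} \]
where $P^{(i)}$ is the Pauli operator $P$ acting on qubit $i$, and tuples $(i, j)$ are unordered. To ensure these have $\|O\|_L \leq 1$, they force the coefficients $\omega_{P,Q}$ to satisfy
\[ 2 \max_{1\leq i\leq n} \Big( \sum_{P\in\{X,Y,Z\}} \big|\omega_P^{(i)}\big| + \sum_{j\neq i} \sum_{P,Q\in\{X,Y,Z\}} \big|\omega_{P,Q}^{(i,j)}\big| \Big) \leq 1. \tag{170} \]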
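The constraint of Eq. (170) is straightforward to evaluate for a given set of Pauli coefficients. The sketch below assumes a hypothetical dictionary encoding of the coefficients (qubits indexed from 0, two-qubit terms stored once per unordered pair) and that the coefficients enter through their absolute values; it is not taken from the implementation of [38].

```python
def pauli_constraint_value(one_body, two_body, n):
    """Left-hand side of Eq. (170).

    one_body -- dict {(i, P): coefficient} for single-qubit Pauli terms
    two_body -- dict {(i, j, P, Q): coefficient} for two-qubit terms, i != j
    n        -- number of qubits (indexed 0, ..., n-1 in this sketch)
    """
    per_qubit = [0.0] * n
    for (i, _), w in one_body.items():
        per_qubit[i] += abs(w)
    for (i, j, _, _), w in two_body.items():
        per_qubit[i] += abs(w)     # a two-qubit term counts towards both qubits
        per_qubit[j] += abs(w)
    return 2.0 * max(per_qubit)

one_body = {(0, "X"): 0.1, (1, "Z"): -0.2}
two_body = {(0, 1, "X", "X"): 0.1}
value = pauli_constraint_value(one_body, two_body, n=2)
feasible = value <= 1.0
```

Here qubit 1 carries total weight 0.3, so `value` is 0.6 and the operator is admissible under this constraint.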
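the marginals of $|\psi\rangle\langle\psi|$ and $|\varphi\rangle\langle\varphi|$, respectively, on two qubits containing the support of $R$. So from Proposition 29 applied with qubit ratio $c = \frac{n}{2} - 1$, underlying dimension 4, and parameter $\beta = 2^{n/4}$, we have that
\[ \mathbb{P}\left[ \mathrm{Tr}[R(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)] \geq 2^{4 - \frac{n}{4}} \right] \leq 4 \cdot 2^{-\frac{n}{2}}. \tag{172} \]
Let $\omega$ be the sum of absolute values of the Pauli coefficients of $O$ (except $\omega_I$). From Eq. (170), we have $\omega \leq n$. It follows via a union bound that
\[ \mathbb{P}\left[ \mathrm{Tr}[O(|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|)] \geq n\, 2^{4 - \frac{n}{4}} \right] \leq 6(3n^2 - 1)\, 2^{-\frac{n}{4}}. \tag{173} \]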
This shows that the probability of an exponentially small deviation from
zero decays exponentially in n. In terms of the qWGAN algorithm of [38],
this means that even with a convergence threshold which decays exponentially
in n, the ability of the discriminator to distinguish between randomly chosen
states decays exponentially too. This has significant practical implications.
Namely, even for exponentially low convergence thresholds, the algorithm will on average fail at the first iteration, as a randomly chosen initial $\rho_{\mathrm{init}} = |\psi\rangle\langle\psi|$ will be indistinguishable from the target $\rho_{\mathrm{tar}} = |\varphi\rangle\langle\varphi|$ by any $O$ allowed by the algorithm. However, our results on random states show that random pure states $|\psi\rangle\langle\psi|$ and $|\varphi\rangle\langle\varphi|$ are on average maximally far apart in $\|\cdot\|_{W_1^H}$, so the algorithm will believe that the states have converged at the initial step despite them being maximally far apart.
When considering qWGANs defined with respect to $W_1^d$, avoiding such misalignment in the approximation of $L_d(O)$ would be an important consideration for any practical implementation of such algorithms.
7. Conclusion
In this work, we have introduced a novel definition of the quantum Wasser-
stein distance by combining the coupling method and a metric on the set of
pure states. This novel definition successfully captures the essence of the clas-
sical Wasserstein distance. A significant aspect of our approach is its inherent
adaptability, effortlessly incorporating established metrics on quantum states
such as trace distance and naturally extending pure-state metrics, for instance,
Nielsen’s complexity metric, to cater to mixed states.
Furthermore, we established various properties of this new definition.
While we acknowledge that these properties are not exhaustive, they never-
theless offer advantages compared to other recent generalisations, such as the
fact that one definition can cover several examples in the literature in a unified
way. A significant challenge remains: establishing the triangle inequality in its
generality. Our findings, though, hint at the possibility that under suitable
conditions the dual approach yields a proof of the triangle inequality.
Our exploration of specific cases reveals the W11 distance as a potentially
powerful tool for bounding the trace distance between quantum states, draw-
ing parallels with classical approaches for determining total variation distance
and mixing time bounds. Additionally, our work enriches the understanding of
the complexity geometry of quantum states, offering a new lens through which
to view and quantify the complexity of transformations within quantum en-
sembles.
Our examination of the behaviour of the Wasserstein distance under random quantum states unveils various phase transitions. These results, derived from entropic inequalities and continuity bounds, debunk some existing speculations (e.g. that $\|\cdot\|_{W_1^H}$ captures local distinguishability) while affirming others (the complexity of small subsystems of random quantum states is low).
In conclusion, our research represents a significant stride forward in the
pursuit of understanding optimal transport in quantum mechanics. The novel
quantum Wasserstein distance that we have proposed holds great promise, not
only as an analytical tool but also as a medium for further exploration and
discovery in quantum computation and information. While our work has paved
the way, many challenges and open problems remain, especially when it comes
to practical implementation of the applications laid out here.
Acknowledgements
DSF and EB would like to thank Alexander Müller-Hermes, Raul García-Patrón, Philippe Faist, Fanch Coudreuse, and Guillaume Aubrun for interesting discussions. This project received support from the PEPR integrated
project EPiQ ANR-22-PETQ-0007 part of Plan France 2030.
Declarations
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive
rights to this article under a publishing agreement with the author(s) or other
rightsholder(s); author self-archiving of the accepted manuscript version of
this article is solely governed by the terms of such publishing agreement and
applicable law.
[18] using standard quantum couplings and using projection onto the asym-
metric subspace as a cost function, with the motivation of mimicking the defi-
nition of total variation distance as the Wasserstein distance corresponding to
the trivial metric.
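Originally, [18] defined a second-order Wasserstein semi-distance $W$ on states on $\mathcal{H} = \mathbb{C}^d$ as follows. Letting $F|a\rangle \otimes |b\rangle = |b\rangle \otimes |a\rangle$ define the flip operator on $\mathcal{H} \otimes \mathcal{H}$, we can then define the symmetric projector $P_{\mathrm{sym}}(d)$ as the projector onto the 1-eigenspace of $F$, and $P_{\mathrm{asym}}(d)$ the projector onto its $(-1)$-eigenspace. Note $P_{\mathrm{sym}}(d) + P_{\mathrm{asym}}(d) = \mathbb{I}_{\mathbb{C}^d \otimes \mathbb{C}^d}$.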
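These operators are easy to construct explicitly. The sketch below, with hypothetical function names `flip_operator` and `sym_asym_projectors`, builds $F$ in the computational basis and obtains the two projectors as $(\mathbb{I} \pm F)/2$, which is the standard expression for the projectors onto the $\pm 1$ eigenspaces of an involution.

```python
import numpy as np

def flip_operator(d):
    """Matrix of F on C^d (x) C^d, defined by F|a>|b> = |b>|a>."""
    F = np.zeros((d * d, d * d))
    for a in range(d):
        for b in range(d):
            F[b * d + a, a * d + b] = 1.0
    return F

def sym_asym_projectors(d):
    """Projectors onto the (+1)- and (-1)-eigenspaces of the flip operator."""
    F = flip_operator(d)
    identity = np.eye(d * d)
    return (identity + F) / 2, (identity - F) / 2

P_sym, P_asym = sym_asym_projectors(3)
```

For $d = 3$ the traces of `P_sym` and `P_asym` are $d(d+1)/2 = 6$ and $d(d-1)/2 = 3$, the dimensions of the two eigenspaces.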
They then define the quantum optimal transport cost
and W (ρ, σ) = T (ρ, σ). This gives a 2-Wasserstein semi-distance which is
equivalent to the trace distance. This definition was further studied in [29] and
[12], and refined in [42].
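Specifically, [42] showed that this original definition does not satisfy a data-processing inequality, so it does not mirror the original trace distance in this regard. The definition in Eq. (174) was changed to give the data processing inequality by considering a complete version:
\[ T(\rho, \sigma) = \min\Big\{ \mathrm{Tr}\Big[ \tau_{AB} \sum_{i=1}^{n} P_{i,\mathrm{asym}}(d) \Big] : \tau_{AB} \in \mathcal{D}\big( (\mathbb{C}^d)^{\otimes n} \otimes (\mathbb{C}^d)^{\otimes n} \big),\ \tau_A = \rho,\ \tau_B = \sigma \Big\} \tag{176} \]
where $P_{i,\mathrm{asym}}(d)$ is the projection onto the asymmetric subspace of the $i$th qudits (and is the identity on the other qudits). In stabilising, it makes little difference whether we stabilise each qudit individually with its own copy of $\mathbb{I}/2$ or share one, so we stabilise this definition to
\begin{align}
T_s(\rho, \sigma) = \min\Big\{ & \mathrm{Tr}\Big[ \tau_{AB} \sum_{i=1}^{n} P_{i,\mathrm{asym}}(d \otimes 2) \Big] : \tag{177} \\
& \tau_{AB} \in \mathcal{D}\big( (\mathbb{C}^d \otimes \mathbb{C}^2)^{\otimes n} \otimes (\mathbb{C}^d \otimes \mathbb{C}^2)^{\otimes n} \big),\ \tau_A = \rho \otimes (\mathbb{I}/2)^{\otimes n},\ \tau_B = \sigma \otimes (\mathbb{I}/2)^{\otimes n} \Big\} \tag{178}
\end{align}
and $W_s(\rho, \sigma) = T_s(\rho, \sigma)$.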
We will show, however, that any such type of generalisation leads to some states $\rho$ with $T_s(\rho, \rho) > 0$. Take, in this example, $n = d = 2$ and $\rho$ to be a Bell state $|\psi^+\rangle\langle\psi^+|$. As $\rho$ is pure, the set of possible couplings $\tau$ is very limited. In the non-stabilised definition, $\tau$ can only be $|\psi^+\rangle\langle\psi^+| \otimes |\psi^+\rangle\langle\psi^+|$. In the stabilised definition, $\tau$ must have the form $|\psi^+\rangle\langle\psi^+| \otimes |\psi^+\rangle\langle\psi^+| \otimes \omega$ for some coupling $\omega$ of $\mathbb{I}^{\otimes 2}/4$ and $\mathbb{I}^{\otimes 2}/4$.
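However, when looking at applying $P_{\mathrm{asym}}$ to the individual qubits, we note that as the Bell state has qubit marginals $\mathbb{I}/2$, we will have nonzero transport cost from $\rho$ to itself. Indeed in the non-stabilised case,
\begin{align}
T(\rho, \rho) &= \mathrm{Tr}\Big[ |\psi^+\rangle\langle\psi^+| \otimes |\psi^+\rangle\langle\psi^+| \sum_{i=1}^{2} P_{i,\mathrm{asym}}(2) \Big] \tag{179} \\
&= 2\, \mathrm{Tr}\Big[ \Big( \frac{\mathbb{I}}{2} \otimes \frac{\mathbb{I}}{2} \Big) P_{\mathrm{asym}}(2) \Big] \tag{180} \\
&= \frac{1}{2} \mathrm{Tr}[P_{\mathrm{asym}}(2)] = \frac{1}{2}. \tag{181}
\end{align}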
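The value $1/2$ in Eq. (181) can be confirmed numerically. The sketch below is an illustration in which the qubit ordering $(A_1, A_2, B_1, B_2)$, the choice $|\psi^+\rangle = (|01\rangle + |10\rangle)/\sqrt{2}$ and all variable names are assumptions; the coupling is rearranged so that each pair $(A_i, B_i)$ is adjacent before applying $P_{\mathrm{asym}}(2)$ to it.

```python
import numpy as np

# Flip operator on C^2 (x) C^2 and the projector onto its (-1)-eigenspace.
F = np.zeros((4, 4))
for a in range(2):
    for b in range(2):
        F[b * 2 + a, a * 2 + b] = 1.0
P_asym = (np.eye(4) - F) / 2

# Bell state |psi^+> = (|01> + |10>)/sqrt(2); assumed convention.
bell = np.array([0.0, 1.0, 1.0, 0.0]) / np.sqrt(2)
rho = np.outer(bell, bell)

# The only coupling of the pure state with itself: rho (x) rho,
# with qubits ordered (A1, A2, B1, B2).
tau = np.kron(rho, rho)

# Reorder qubits to (A1, B1, A2, B2) so each P_{i,asym} acts on adjacent qubits.
tau_paired = (tau.reshape([2] * 8)
                 .transpose(0, 2, 1, 3, 4, 6, 5, 7)
                 .reshape(16, 16))

cost_operator = np.kron(P_asym, np.eye(4)) + np.kron(np.eye(4), P_asym)
self_cost = float(np.trace(tau_paired @ cost_operator))
```

The result `self_cost` is nonzero even though both marginals of the coupling are the same state, which is the degeneracy discussed in the text.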
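In the stabilised case, we still have for any $\tau$, that
\begin{align}
& \mathrm{Tr}\Big[ \tau_{AB} \sum_{i=1}^{2} P_{i,\mathrm{asym}}(2 \otimes 2) \Big] \tag{182} \\
&= \mathrm{Tr}\Big[ |\psi^+\rangle\langle\psi^+| \otimes |\psi^+\rangle\langle\psi^+| \otimes \omega \sum_{i=1}^{2} P_{i,\mathrm{asym}}(2 \otimes 2) \Big] \tag{183} \\
&= \sum_{i=1}^{2} \mathrm{Tr}\Big[ \Big( \frac{\mathbb{I}}{2} \otimes \frac{\mathbb{I}}{2} \otimes \omega_i \Big) P_{\mathrm{asym}}(2 \otimes 2) \Big] \tag{184} \\
&= \sum_{i=1}^{2} \mathrm{Tr}\Big[ \Big( \frac{\mathbb{I}}{2} \otimes \frac{\mathbb{I}}{2} \otimes \omega_i \Big) \big( P_{\mathrm{asym}}(2) \otimes P_{\mathrm{sym}}(2) + P_{\mathrm{sym}}(2) \otimes P_{\mathrm{asym}}(2) \big) \Big] \tag{185} \\
&= \sum_{i=1}^{2} \Big( \frac{1}{4} \mathrm{Tr}[\omega_i P_{\mathrm{sym}}(2)] + \frac{3}{4} \mathrm{Tr}[\omega_i P_{\mathrm{asym}}(2)] \Big) \tag{186} \\
&\geq \sum_{i=1}^{2} \Big( \frac{1}{4} \mathrm{Tr}[\omega_i P_{\mathrm{sym}}(2)] + \frac{1}{4} \mathrm{Tr}[\omega_i P_{\mathrm{asym}}(2)] \Big) = \sum_{i=1}^{2} \frac{1}{4} \mathrm{Tr}[\omega_i] = \frac{1}{2} \tag{187}
\end{align}
where in line (184), $\omega_i$ is a coupling of $\mathbb{I}/2$ with itself, on the $i$th stabilising qubits.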
We see from the details of this example that, no matter whether we stabilise using many qubits or a single qubit, and no matter the size of the space, any attempt to put a $P_{\mathrm{asym}}$ projector on a subspace of $\mathcal{H}$ will result in highly entangled pure states, those which have marginals close to maximally mixed on subspaces where $P_{\mathrm{asym}}$ is placed, having nonzero self-distance. This is largely because the coupling of a pure state with itself is forced to be the product of the state with itself, and because the identity on $\mathbb{C}^d \otimes \mathbb{C}^d$ has a nonzero asymmetric and a nonzero symmetric component. Thus, it appears that a version of the Wasserstein distance that is not based on a single observable, like ours, is more suited to obtain generalisations of quantities like the one of De Palma et al.
As in the definition of the $W_1$ norm in [50], a first proposal was simply the condition
\[ \sum_{j\in J} q_j \big( |\psi_j\rangle\langle\psi_j| - |\varphi_j\rangle\langle\varphi_j| \big) = \rho - \sigma, \qquad q_j > 0. \tag{189} \]
It quickly becomes clear that using a transport plan of this form containing telescoping sums could lead to the degeneracy of $W_p^d$ when $p > 1$. The main stumbling block here centred on the unboundedness of $\sum_{j\in J} q_j$.
Two proposals for restricting $\sum_j q_j$ were then considered, the first being
\[ \sum_{j\in J} q_j = 1 \tag{190} \]
Both of these were discounted when considering the transport plans that were allowed under these regimes, and their impact on the norm $W_\infty^H$.
It is then easy to see that we recover the original distance when considering product states. However, it is also easy to see that, at least for the case $p = 1$, enlarging the set of possible states to include entangled states offers no advantage. Indeed, given any Schmidt decomposition of an entangled $|\psi\rangle = \sum_i \sqrt{p_i}\, |\phi_1^i\rangle \otimes |\phi_2^i\rangle$, adding instead $\{(p_i, |\phi_1^i\rangle, |\phi_2^i\rangle)\}_i$ to the transport plan will give the same cost and still satisfy the marginal constraints. Thus, at least for this possible generalisation of the distance to entangled states, there is no advantage in considering entangled couplings.
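The marginal part of this argument can be illustrated numerically: the product elements obtained from a Schmidt decomposition reproduce both reduced states of the entangled vector. The sketch below uses a hypothetical random state on $\mathbb{C}^2 \otimes \mathbb{C}^3$ and illustrative function names; it checks only the marginal constraints, not the cost, since the cost assigned to entangled elements is not specified in this passage.

```python
import numpy as np

def schmidt_plan(psi, d1, d2):
    """Product plan elements (p_i, |phi_1^i>, |phi_2^i>) from a Schmidt decomposition."""
    M = psi.reshape(d1, d2)                       # coefficient matrix of |psi>
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return [(s[i] ** 2, U[:, i], Vh[i, :]) for i in range(len(s))]

def marginals(psi, d1, d2):
    """Reduced density matrices of |psi><psi| on the first and second factor."""
    M = psi.reshape(d1, d2)
    return M @ M.conj().T, M.T @ M.conj()

rng = np.random.default_rng(2)
psi = rng.normal(size=6) + 1j * rng.normal(size=6)   # hypothetical entangled state
psi /= np.linalg.norm(psi)

plan = schmidt_plan(psi, 2, 3)
rho1 = sum(p * np.outer(a, a.conj()) for p, a, b in plan)
rho2 = sum(p * np.outer(b, b.conj()) for p, a, b in plan)
target1, target2 = marginals(psi, 2, 3)
```

Both `rho1` and `rho2` agree with the reduced states of `psi`, so replacing the entangled element by its Schmidt components leaves the marginal constraints unchanged.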
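such that
\[ \int_{P\mathcal{H}_1} |\psi\rangle\langle\psi|\, d(\pi_{1\#} q)(|\psi\rangle) = \rho, \qquad \int_{P\mathcal{H}_2} |\varphi\rangle\langle\varphi|\, d(\pi_{2\#} q)(|\varphi\rangle) = \sigma \tag{196} \]
for $\pi_1$ and $\pi_2$ the projections onto the first and second coordinates of $P\mathcal{H}_1 \times P\mathcal{H}_2$, respectively.
We could, instead, relax this definition to consider transport plans defined by arbitrary probability measures $q$ on $P\mathcal{H}_1 \times P\mathcal{H}_2$ satisfying Eq. (196). For a distance $d$ on $P\mathcal{H}$, taking $\mathcal{H} = \mathcal{H}_1 = \mathcal{H}_2$, this would give transport cost
\[ T_p^d(q) = \int_{P\mathcal{H}\times P\mathcal{H}} d(|\psi\rangle, |\varphi\rangle)^p\, dq(|\psi\rangle, |\varphi\rangle). \tag{197} \]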
and let
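We have that
\begin{align}
& \Big\| q_j |\psi_j\rangle\langle\psi_j| - \int_{B_j(\epsilon)} |\psi\rangle\langle\psi|\, q(|\psi\rangle, |\varphi\rangle)\, u_j(|\psi\rangle, |\varphi\rangle)\, d\mu_{\mathrm{Haar}}(|\psi\rangle, |\varphi\rangle) \Big\|_2 \tag{200} \\
&= \Big\| \int_{B_j(\epsilon)} \big( |\psi_j\rangle\langle\psi_j| - |\psi\rangle\langle\psi| \big)\, q(|\psi\rangle, |\varphi\rangle)\, u_j(|\psi\rangle, |\varphi\rangle)\, d\mu_{\mathrm{Haar}}(|\psi\rangle, |\varphi\rangle) \Big\|_2 \tag{201} \\
&\leq \int_{B_j(\epsilon)} \big\| |\psi_j\rangle\langle\psi_j| - |\psi\rangle\langle\psi| \big\|_2\, q(|\psi\rangle, |\varphi\rangle)\, u_j(|\psi\rangle, |\varphi\rangle)\, d\mu_{\mathrm{Haar}}(|\psi\rangle, |\varphi\rangle) \tag{202} \\
&\leq \epsilon \int_{B_j(\epsilon)} q(|\psi\rangle, |\varphi\rangle)\, u_j(|\psi\rangle, |\varphi\rangle)\, d\mu_{\mathrm{Haar}}(|\psi\rangle, |\varphi\rangle) \tag{203} \\
&= \epsilon\, q_j \tag{204}
\end{align}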
and therefore that ρ − ρ ∞ ≤ ρ − ρ 2 ≤ . The same holds for σ.
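For the similarity of cost,
\begin{align}
& \big| T_p^d(Q) - T_p^d(q) \big| \notag \\
&= \Big| \sum_{j=1}^{\infty} q_j\, d(|\psi_j\rangle, |\varphi_j\rangle)^p - \int_{P\mathcal{H}\times P\mathcal{H}} d(|\psi'\rangle, |\varphi'\rangle)^p\, q(|\psi'\rangle, |\varphi'\rangle)\, d\mu_{\mathrm{Haar}}(|\psi'\rangle, |\varphi'\rangle) \Big| \tag{205} \\
&= \Big| \sum_{j=1}^{\infty} \int_{B_j(\epsilon)} q(|\psi'\rangle, |\varphi'\rangle)\, u_j(|\psi'\rangle, |\varphi'\rangle) \big( d(|\psi_j\rangle, |\varphi_j\rangle)^p - d(|\psi'\rangle, |\varphi'\rangle)^p \big)\, d\mu_{\mathrm{Haar}}(|\psi'\rangle, |\varphi'\rangle) \Big| \tag{206} \\
&\leq \sum_{j=1}^{\infty} \int_{B_j(\epsilon)} q(|\psi'\rangle, |\varphi'\rangle)\, u_j(|\psi'\rangle, |\varphi'\rangle) \big| d(|\psi_j\rangle, |\varphi_j\rangle)^p - d(|\psi'\rangle, |\varphi'\rangle)^p \big|\, d\mu_{\mathrm{Haar}}(|\psi'\rangle, |\varphi'\rangle) \tag{207} \\
&\leq \sum_{j=1}^{\infty} q_j \sup_{\| |\psi\rangle\langle\psi| - |\psi'\rangle\langle\psi'| \|_2 + \| |\varphi\rangle\langle\varphi| - |\varphi'\rangle\langle\varphi'| \|_2 \leq \epsilon} \big| d(|\psi'\rangle, |\varphi'\rangle)^p - d(|\psi\rangle, |\varphi\rangle)^p \big| \tag{208} \\
&= \sup_{\| |\psi\rangle\langle\psi| - |\psi'\rangle\langle\psi'| \|_2 + \| |\varphi\rangle\langle\varphi| - |\varphi'\rangle\langle\varphi'| \|_2 \leq \epsilon} \big| d(|\psi'\rangle, |\varphi'\rangle)^p - d(|\psi\rangle, |\varphi\rangle)^p \big|, \tag{209}
\end{align}
swapping limits in line (206) as the terms in each part are all positive, and combining into one as the sums and integrals are all bounded. This final line tends to 0 with $\epsilon$ due to the uniform continuity of $d$.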
From the proof of continuity in Appendix D 1, we can find a countable
transport plan Q from ρ to σ with cost at most the cost of Q in the limit
→ 0. Therefore in the limit → ∞, Tpd (Q ) ≤ Tpd (q) and we are done.
γ23 of μ2 and μ3 , we can find measure γ123 with (1, 2)-marginal γ12 and (2, 3)-
marginal γ23 . We can then take the (1, 3)-marginal γ13 of this overarching
measure and show that [11]
We may then begin to build our new transport plan starting with the
partial transport plan
⎧⎛ ) ) ⎞⎫
⎨ ρ σ ⎬
ψ j ϕj
Q1 = ⎝qj , ,
⎠ (214)
⎩ ψ ρ |ψ ρ ϕσ |ϕσ ⎭
j j j j
j∈J
where qj = c
c+1 qj min{ψjρ |ψjρ , ϕσj |ϕσj }. Note then that
) * ) *
ρ σ
c ψj ψjρ c ϕj ϕσj
ρ ≥ ρ̃ ≥ qj and σ ≥ σ̃ ≥ qj .
c+1
j∈J ψjρ |ψjρ c+1
j∈J
ϕσj |ϕσj
(215)
)*
ρ
ψj ψjρ
We can then transport the positive semidefinite operator ρ − j∈J qj
ψjρ |ψjρ
)*
σ
ϕj ϕσ
j
onto σ − j∈J qj ϕσ
via any partial transport plan. Let these transport
j |ϕj
σ
qj |ψj ψj | − qj ρ ρ
≤ qj |ψj ψj | − ρ ρ
+ |qj − qj |
Tr|ψj ψj | Tr|ψj ψj |
1 1
(221)
√ 1
≤ 2qj L + L + qj
c+1
√ 1
= 2 L+L+ qj (222)
c+1
ρ ρ
|ψj ψj |
and for j ∈
/ K, we have qj |ψj ψj | − qj ≤ 2qj . In order to bound
Tr|ψjρ ψjρ |
1
/ qj , let {|α}α be some orthonormal basis for the orthogonal complement
j ∈K
of Sρ . We have
⎡ ⎤
ρ − ρ ∞ dimH ≥ α|ρ − ρ |α ≥ Tr ⎣ qj (|ψj⊥ ψj⊥ | + |ϕ⊥ ⊥ ⎦
j ϕj |)
α j
−cδdimH. (223)
The same applies to σ. Hence,
1
qj ≤ qj Tr[|ψj⊥ ψj⊥ | + |ϕ⊥ ⊥
j ϕj |] (224)
L
j ∈K
/ j ∈K
/
⎡ ⎤
1 ⎣ ⊥ ⎦
≤ Tr qj (|ψj⊥ ψj⊥ | + |ϕ⊥j ϕj |) (225)
L j
1
≤ dimH ( ρ − ρ ∞ + σ − σ ∞ + 2cδ) (226)
L
2δ(c + 1)
≤ dimH. (227)
L
This gives us
) *
ρ
ψj ψjρ
√
ρ −
qj ≤ 2 L + L + 1 + 4δ(c + 1) dimH. (228)
ρ ρ
ψj |ψj c+1 L
j∈J
1
By symmetry this holds for σ, and so substituting into Eq. (219) and then
(216) gives
√ 1 4δ(c + 1)
Tpd (Q2 ) ≤ δdimH + 2 L + L + + dimH diamd (PH)p .
c L
(229)
This bounds Tpd (Q2 ) above.
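cost at most $\epsilon$ more than the transport cost of plan $Q$ between $\rho$ and $\sigma$. Taking the infimum over all such transport plans $Q$, we see that whenever $\|\rho - \rho'\|_\infty < \delta$ and $\|\sigma - \sigma'\|_\infty < \delta$, we have
\[ \inf_{Q' \in \mathcal{Q}(\rho', \sigma')} T_p^d(Q') \leq \epsilon + \inf_{Q \in \mathcal{Q}(\rho, \sigma)} T_p^d(Q). \tag{235} \]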
poly(n, log −1 )
(236)
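Proof. For any fixed $|\psi\rangle, |\varphi\rangle$, using Eq. (89) and Lemma 31, we have
\begin{align}
d_C(|\psi\rangle, |\varphi\rangle) &\geq \min_{U \in SU(2^n),\, U|\psi\rangle = |\varphi\rangle} \kappa\, G(U, \epsilon)^{1/3} \epsilon^{2/3} n^{-2} \tag{237} \\
&\geq \min_{U \in SU(2^n),\, U|\psi\rangle = |\varphi\rangle} \kappa \left( \frac{G_S(U, 2\epsilon)}{\mathrm{poly}(\log(G(U, \epsilon)) + \log(\epsilon^{-1}))} \right)^{1/3} \epsilon^{2/3} n^{-2} \tag{238} \\
&\geq \min_{U \in SU(2^n),\, U|\psi\rangle = |\varphi\rangle} \kappa \left( \frac{G_S(U, 2\epsilon)}{\mathrm{poly}(n, \log(\epsilon^{-1}))} \right)^{1/3} \epsilon^{2/3} n^{-2}. \tag{239}
\end{align}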
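Given that $W_1^C(\rho, \sigma) \geq \min\{ d_C(|\psi\rangle, |\varphi\rangle) : |\psi\rangle \in \mathrm{span}\,\rho,\ |\varphi\rangle \in \mathrm{span}\,\sigma \}$, it follows from Lemma 32 that
\begin{align}
& \mathbb{P}_{\rho,\sigma}\left[ W_1^C(\rho, \sigma) \leq \epsilon^{2/3} n^{-2} \kappa \left( \frac{2^{(1-\delta)n}}{\mathrm{poly}(n, \log \epsilon^{-1})} \right)^{1/3} \right] \tag{240} \\
&\leq \mathbb{P}_{\rho,\sigma}\left[ \min_{U : U|\psi\rangle = |\varphi\rangle,\ |\psi\rangle \in \mathrm{span}\,\rho,\ |\varphi\rangle \in \mathrm{span}\,\sigma} \kappa \left( \frac{G_S(U, 2\epsilon)}{\mathrm{poly}(n, \log(\epsilon^{-1}))} \right)^{1/3} \epsilon^{2/3} n^{-2} \leq \epsilon^{2/3} n^{-2} \kappa \left( \frac{2^{(1-\delta)n}}{\mathrm{poly}(n, \log \epsilon^{-1})} \right)^{1/3} \right] \tag{241} \\
&= \mathbb{P}_{\rho,\sigma}\left[ G_S\big(U^{\mathrm{opt}}_{\rho\to\sigma}, 2\epsilon\big) \leq 2^{(1-\delta)n} \right] \tag{242} \\
&\leq e^{-\Omega(2^n \log(1/2\epsilon))} \tag{243}
\end{align}
as claimed.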
References
[1] Arjovsky, M., Chintala, S., & Bottou, L.: Wasserstein generative adversarial net-
works. In Proceedings of the 34th International Conference on Machine Learning,
volume 70 of Proceedings of Machine Learning Research, pages 214–223. PMLR,
06–11 (Aug 2017)
[2] Agón, C.A., Headrick, M., Swingle, B.: Subsystem complexity and holography.
J High Energy Phys 2019(2), 1 (2019)
[3] Brock, A., Donahue, J. and Simonyan, K.: Large scale gan training for high
fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, (September
2018)
[4] Biswal, P.: Hypercontractivity and its applications. arXiv preprint
arXiv:1101.2913, (January 2011)
[5] Bonami, A.: Étude des coefficients de Fourier des fonctions de lp (g). Annales de
l’Institut Fourier 20(2), 335–402 (1970)
[6] Bunth, G., Pitrik, J., Titkos, T. and Virosztek, D.: On the metric property of
quantum Wasserstein divergences. arXiv preprint arXiv:2402.13150, (2024)
[7] Bardet, I., Rouzé, C.: Hypercontractivity and logarithmic Sobolev inequality for
non-primitive quantum Markov semigroups and estimation of decoherence rates.
arXiv preprint arXiv:1803.05379, (2018)
[8] Brown, A.R.: A quantum complexity lower bound from differential geometry.
Nat. Phys. 19, 401–406 (2023)
[9] Brown, A.R., Roberts, D.A., Susskind, L., Swingle, B., Zhao, Y.: Complexity,
action, and black holes. Phys. Rev. D 93, 086006 (2016)
[10] Brown, A.R., Susskind, L.: Complexity geometry of a single qubit. Phys. Rev.
D 100, 046020 (2019)
[11] Clement, P., Desch, W.: An elementary proof of the triangle inequality for the
Wasserstein metric. Proc. Am. Math. Soc. 136, 333–340 (2008)
[12] Cole, S., Eckstein, M., Friedland, S., Życzkowski, K.: On quantum optimal transport. Math. Phys. Anal. Geom. 26, 14 (2023)
[13] Caglioti, E., Golse, F., Paul, T.: Towards optimal transport for quantum densi-
ties. Annali Scuola Normale Superiore - Classe di Scienze, (January 2021)
[14] Coffman, V., Kundu, J., Wootters, W.K.: Distributed entanglement. Phys. Rev.
A 61(5), 052306 (2000)
[15] Carlen, E.A., Maas, J.: An analog of the 2-Wasserstein metric in non-commutative
probability under which the Fermionic Fokker-Planck equation is gradient flow
for the entropy. Commun. Math. Phys. 331(3), 887–926 (2014)
[16] Carlen, E.A., Maas, J.: Gradient flow and entropy inequalities for quantum
Markov semigroups with detailed balance. J. Funct. Anal. 273(5), 1810–1869
(2017)
[17] Carlen, E.A., Maas, J.: Non-commutative calculus, optimal transport and func-
tional inequalities in dissipative quantum systems. J. Stat. Phys. 178(2), 319–378
(2019)
[18] Chakrabarti, S., Huang, Y., Li, T., Feizi, S., Wu, X.: Quantum Wasserstein
generative adversarial networks. In H. Wallach, H. Larochelle, A. Beygelzimer,
F. d’ Alché-Buc, E. Fox, and R. Garnett, editors, Adv. Neural Inf. Process. Syst.,
32. (2019)
[19] Michel Marie Deza and Elena Deza: Encyclopedia of Distances. Springer, Berlin
Heidelberg (2013)
[20] Dallaire-Demers, P., Killoran, N.: Quantum generative adversarial networks.
Phys. Rev. A, 98(1), (2018)
[21] Duvenhage, R., Mapaya, M.: Quantum wasserstein distance of order 1 between
channels. Infinite Dimens. Anal. Quant. Prob. Relat. Topics, 26(3):2350006,
(2023)
[22] Dowling, M.R., Nielsen, M.A.: The geometry of quantum computation. Quant.
Inf. Comput. 8(10), 861–899 (2008)
[23] De Palma, G., Klein, T., Pastorello, D.: Classical shadows meet quantum optimal
mass transport. J. Math. Phys., 65(9), (2024)
[24] De Palma, G., Marvian, M., Rouzé, C., França, D.S.: Limitations of variational
quantum algorithms: a quantum optimal transport approach. PRX Quant. 4(1),
010309 (2023)
[25] Datta, N., Rouzé, C.: Relating relative entropy, optimal transport and Fisher
information: a quantum HWI inequality. Annales Henri Poincaré 21(7), 2115–
2150 (2020)
[26] Duvenhage, R., Skosana, S., & Snyman, M.: Extending quantum detailed balance
through optimal transport. arXiv preprint arXiv:2206.15287, (June 2022)
[27] Duvenhage, R.: Quadratic Wasserstein metrics for von Neumann algebras via
transport plans. J. Operat. Theory, 88(2), (2022)
[28] Eldar, L., & Harrow, A. W.: Local hamiltonians whose ground states are hard to
approximate. In 2017 IEEE 58th Annual Symposium on Foundations of Com-
puter Science (FOCS), page 427-438. IEEE, (October 2017)
[29] Friedland, S., Eckstein, M., Cole, S., & Życzkowski, K.: Quantum Monge-
Kantorovich problem and transport distance between density matrices. Phys.
Rev. Lett., 129(11), (2022)
[30] Frogner, C., Zhang, C., Mobahi, H., Araya, M., & Poggio, T. A.: Learning with
a Wasserstein loss. In Adv. Neural Inf. Process. Syst. (NIPS) 28, (2015)
[31] Galichon, A.: Optimal transport methods in economics. Princeton University
Press, (2018)
[32] Gao, L., Junge, M., & Li, H.: Geometric approach towards complete logarithmic
sobolev inequalities. arXiv preprint arXiv:2102.04434, (February 2021)
[33] Golse, F., Mouhot, C., Paul, T.: On the mean field and classical limits of quan-
tum mechanics. Commun. Math. Phys. 343, 165–205 (2015)
[34] Goodfellow, Ian J., Pouget-Abadie, Jean, Mirza, Mehdi, Xu, Bing, Warde-Farley,
David, Ozair, Sherjil, Courville, Aaron, Bengio, Yoshua: Generative adversarial
nets. Proceedings of the International Conference on Neural Information Pro-
cessing (NIPS 2014), (2014)
[35] Heller, M.P.: Geometry and complexity scaling. Nat. Phys. 19, 312–313 (2023)
[36] Kantorovich, L.V.: On the translocation of masses. Doklady Akademii Nauk
SSSR 37(7–8), 227–229 (1942)
[37] Keevash, P., Lifshitz, N., Long, E., Minzer, D.: Global hypercontractivity and
its applications. arXiv preprint arXiv:2103.04604, (2021)
[38] Tussi Kiani, B., De Palma, G., Marvian, M., Liu, Z.W., Lloyd, S.: Learning
quantum data with the quantum earth mover’s distance. Quant. Sci. Technol.
7(4), 045002 (2022)
[39] Li, L., Bu, K., Koh, D. E., Jaffe, A., & Lloyd, S.: Wasserstein complexity of
quantum circuits. arXiv preprint arXiv:2208.06306, (2022)
[40] Liu, J., & Wu, W.: Automatic image annotation using improved wasserstein
generative adversarial networks. IAENG Int. J. Comput. Sci., 48(3), (2021)
[41] McClean, Jarrod R., Boixo, Sergio, Smelyanskiy, Vadim N., Babbush, Ryan,
Neven, Hartmut: Barren plateaus in quantum neural network training land-
scapes. Nat. Commun., 9(1), (2018)
[42] Müller-Hermes, A.: On the monotonicity of a quantum optimal transport cost,
(November 2022)
[43] Monge, G.: Mémoire sur la théorie des déblais et des remblais. Imprimerie royale,
(1781)
[44] Nielsen, Michael A., Chuang, Isaac L.: Quantum Computation and Quantum
Information: 10th Anniversary Edition. Cambridge University Press, (2010)
[45] Nielsen, M.A., Dowling, M.R., Gu, M., Doherty, A.C.: Quantum computation
as geometry. Science 311(5764), 1133–1135 (2006)
[46] Nielsen, M.A.: A geometric approach to quantum circuit lower bounds. Quant.
Inf. Comput. 6(3), 213–262 (2006)
[47] O’Donnell, Ryan: Analysis of Boolean Functions. Cambridge University Press,
(2014)
[48] Ornstein, D.S.: An application of ergodic theory to probability theory. Ann.
Prob. 1(1), 43–58 (1973)
[49] De Palma, G., Klein, T., & Pastorello, D.: Classical shadows meet quantum
optimal mass transport. arXiv preprint arXiv:2309.08426, (September 2023)
[50] De Palma, G., Marvian, M., Trevisan, D., Lloyd, S.: The quantum Wasserstein
distance of order 1. IEEE Trans. Inf. Theory 67, 6627–6643 (2021)
[51] De Palma, G., Rouzé, C.: Quantum concentration inequalities. Annales Henri
Poincaré 23(9), 3391–3429 (2022)
[52] De Palma, G., Trevisan, D.: Quantum optimal transport with quantum channels.
Annales Henri Poincaré 22(10), 3199–3234 (2021)
[53] De Palma, G., & Trevisan, D.: Quantum optimal transport: Quantum channels
and qubits. In Summer School on Optimal Transport on Quantum Structures,
Erdős Center, Rényi Institute, Budapest, Hungary, arXiv:2307.16268, (September
2022)
[54] De Palma, G., & Trevisan, D.: The Wasserstein distance of order 1 for quantum
spin systems on infinite lattices. Annales Henri Poincaré, (June 2023)
[55] Panaretos, V.M., Zemel, Y.: Statistical aspects of Wasserstein distances. Annu.
Rev. Stat. Appl. 6(1), 405–431 (2019)
[56] Rouzé, C., Datta, N.: Concentration of quantum states from quantum functional
and transportation cost inequalities. J. Math. Phys. 60(1), 012202 (2019)
[57] Rouzé, C., França, D.S.: Learning quantum many-body systems from a few
copies. arXiv preprint arXiv:2107.03333, (July 2021)
[58] Raginsky, M., Sason, I.: Concentration of measure inequalities and
their communication and information-theoretic applications. arXiv preprint
arXiv:1510.02947, (October 2015)
[59] Ruan, S.-M.: Circuit Complexity of Mixed States. PhD thesis, University of
Waterloo, (2021)
[60] Sánchez-Ruiz, J.: Simple proof of Page’s conjecture on the average entropy of a
subsystem. Phys. Rev. E 52, 5653–5655 (1995)
[61] Telles de Miranda, J., Micklitz, T.: Subsystem trace-distances of
two random states. J. Phys. A Math. Theor. 56(17), 175301 (2023)
[62] Tóth, G., Moroder, T., Gühne, O.: Evaluating convex roof entanglement
measures. Phys. Rev. Lett. 114(16), 160501 (2015)
[63] Trashorras, J.: Large deviations for a matching problem related to the infinity-
Wasserstein distance. Lat. Am. J. Prob. Math. Stat 15, 247–278 (2018)
[64] Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics.
American Mathematical Society, (2003)
[65] Villani, C.: Optimal Transport: Old and New, vol. 338. Springer Science &
Business Media, (2008)
[66] Weil, A.: L’intégration dans les groupes topologiques et ses applications.
Actualités Scientifiques et Industrielles, (1940)
[67] Wei, T.-C., Goldbart, P.M.: Geometric measure of entanglement and applications
to bipartite and multipartite quantum states. Phys. Rev. A, 68 (2003)
[68] Wootters, W.K.: Entanglement of formation of an arbitrary state of two qubits.
Phys. Rev. Lett. 80, 2245–2248 (1998)