Improved Lower Bounds for Learning Quantum Channels in Diamond Distance

Aadil Oufkir, Mohammed VI Polytechnic University, Rocade Rabat-Salé, Technopolis, Morocco

Filippo Girardi, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy

Abstract

We prove that learning an unknown quantum channel with input dimension $d_{A}$, output dimension $d_{B}$, and Choi rank $r$ to diamond distance $\varepsilon$ requires $\Omega\!\left(\frac{d_{A}d_{B}r}{\varepsilon\log(d_{B}r/\varepsilon)}\right)$ queries. This improves the best previous $\Omega(d_{A}d_{B}r)$ bound by introducing explicit $\varepsilon$-dependence, with a scaling in $\varepsilon$ that is near-optimal when $d_{A}=rd_{B}$ but not tight in general. The proof constructs an ensemble of channels that are well-separated in diamond norm yet admit Stinespring isometries that are close in operator norm.

1 Introduction

In [AMele2025, chen2025quantumchanneltomographyestimation] it is proved that there exists a quantum learning algorithm that uses

N=O\left(\frac{d_{A}d_{B}r}{\varepsilon^{2}}\right)

(1)

parallel queries to any unknown channel $\pazocal{N}$ with input dimension $d_{A}$, output dimension $d_{B}$, and Choi rank $r$ and, with probability at least $2/3$, outputs a classical description of a channel $\hat{\pazocal{N}}$ that is within diamond distance $\varepsilon$ of $\pazocal{N}$. Moreover, in [Girardi2025Dec] it is proved that any quantum algorithm learning $\pazocal{N}$ up to constant error with success probability at least $2/3$ needs

N=\Omega(d_{A}d_{B}r)

(2)

queries to $\pazocal{N}$. The aim of this work is to improve this lower bound by making it depend on the target diamond distance $\varepsilon$. More precisely, our main result (see Theorem 3) identifies the new lower bound

N=\Omega\left(\frac{d_{A}d_{B}r}{\varepsilon\log(d_{B}r/\varepsilon)}\right).

(3)

This result, combined with the upper bound of [chen2025quantumchanneltomographyestimation], shows that the optimal dependency in the precision parameter $\varepsilon$ is $\widetilde{\Theta}(\frac{1}{\varepsilon})$ when $d_{A}=d_{B}r$. In particular, for learning unitary channels, our result recovers the optimal lower bound $\Omega(\frac{d^{2}}{\varepsilon})$ of [haah2023query] up to a logarithmic factor, via a different proof strategy that applies specifically in the coherent setting. Moreover, our approach generalizes to non-unitary channels.

When $d_{A}=1$, channel learning becomes state learning, for which the optimal complexity is $\widetilde{\Theta}(\frac{d_{B}r}{\varepsilon^{2}})$ [ODonnell2016-1, haah2017sample]. This shows that the $\varepsilon$-dependency in our lower bound is not optimal in general.

The proof of our lower bound consists of two steps. The first is a general lower bound (Theorem 1), which leverages an arbitrary ensemble of channels $\{\Phi_{i}\}$ that are pairwise far in diamond distance, yet whose Stinespring isometries are pairwise close in operator norm. The resulting lower bound scales logarithmically with the size of the ensemble and inversely with the operator-norm distance between the Stinespring isometries. The second step is the construction of a suitable ensemble of channels. A natural strategy to this end would be to use existing packing nets. However, as we show in Appendix A, this approach, although simpler, yields a weaker bound:

\displaystyle N\geq\Omega\left(\frac{d_{A}d_{B}r}{\sqrt{\varepsilon}\log(d_{B}r/\sqrt{\varepsilon})}\right).

(4)

Instead, using a probabilistic approach, we construct a random family of Stinespring isometries which, with positive probability, are sufficiently close in operator norm, but induce channels that are far enough apart in diamond distance. This argument ensures the existence of an ensemble that produces the desired lower bound (Theorem 3).

The remainder of the manuscript is organised as follows. In Section 1 we introduce the notation and the definitions that we are going to use in the paper. In Section 2 we state and prove Theorem 1, i.e. the general lower bound on channel learning constructed in terms of ensembles of quantum channels. In Section 3 we prove the lower bound (3) leveraging a family of particular isometries (Lemma 2) in order to construct a suitable ensemble of channels to be used in Theorem 1 (see Theorem 3). The isometries of Lemma 2 are identified using a random construction, which is discussed in Section 4. In the Appendix we provide the proofs that were deferred in the previous sections to improve readability.

1.1 Notation

All Hilbert spaces considered in this paper are finite-dimensional. Let $\mathcal{H}_{A}\cong\mathbb{C}^{d_{A}}$ and $\mathcal{H}_{B}\cong\mathbb{C}^{d_{B}}$ denote the input and output spaces. We write $\mathcal{L}(\mathcal{H})$ for the linear operators on $\mathcal{H}$, and $\mathcal{D}(\mathcal{H})$ for the quantum states (positive semi-definite operators with unit trace). The operator norm is $\|X\|_{\mathrm{op}}=\sup_{\|\psi\|=1}\|X\ket{\psi}\|$, and the trace norm is $\|X\|_{1}=\operatorname{Tr}\sqrt{X^{\dagger}X}$. For $\rho\in\mathcal{D}(\mathcal{H})$, the von Neumann entropy is $S(\rho)=-\operatorname{Tr}[\rho\log\rho]$. All logarithms are in base $\mathrm{e}$.

1.2 Quantum channels and their representations

A quantum channel $\Phi:\mathcal{L}(\mathcal{H}_{A})\to\mathcal{L}(\mathcal{H}_{B})$ is a completely positive trace-preserving (CPTP) map. It can be written in the Kraus representation as $\Phi(X)=\sum_{i=1}^{r}K_{i}XK_{i}^{\dagger}$ with $\sum_{i}K_{i}^{\dagger}K_{i}=\mathds{1}_{A}$. The minimal $r$ is called the Kraus rank.

For any channel $\Phi$, there exists an isometry $V:\mathcal{H}_{A}\to\mathcal{H}_{B}\otimes\mathcal{H}_{E}$ (that is, $V^{\dagger}V=\mathds{1}_{A}$) such that $\Phi(X)=\operatorname{Tr}_{E}(VXV^{\dagger})$. Such an isometry $V$ is called a Stinespring dilation of $\Phi$. The minimal dimension of $\mathcal{H}_{E}$ equals the Kraus rank.

The Choi state of the channel is

J(\Phi)\coloneqq(\mathds{1}_{A^{\prime}}\otimes\Phi)(\ket{\Psi}\bra{\Psi})\in\mathcal{L}(\mathcal{H}_{A^{\prime}}\otimes\mathcal{H}_{B}),

(5)

where $\ket{\Psi}_{A^{\prime}A}\coloneqq\frac{1}{\sqrt{d_{A}}}\sum_{i=1}^{d_{A}}\ket{i}_{A^{\prime}}\otimes\ket{i}_{A}$ is the normalised maximally entangled state between $A^{\prime}$ and $A$. The linear map $\Phi$ is CPTP if and only if $J(\Phi)\geq 0$ and $\operatorname{Tr}_{B}J(\Phi)=\mathds{1}_{A^{\prime}}/d_{A}$.

The Choi rank $\operatorname{rank}_{\text{Choi}}{\Phi}\coloneqq\operatorname{rank}(J(\Phi))$ equals the Kraus rank and the minimal environment dimension of Stinespring dilations.

1.3 Channel ensembles with distance constraints

Let $\mathcal{C}(d_{A},d_{B},r)$ denote the set of quantum channels $\Phi:\mathcal{L}(\mathcal{H}_{A})\to\mathcal{L}(\mathcal{H}_{B})$ with Choi rank at most $r$ , i.e.

\mathcal{C}(d_{A},d_{B},r)\coloneqq\{\Phi\text{ quantum channel}\mid\operatorname{rank}_{\text{Choi}}{\Phi}\leq r\}.

(6)

For a channel $\Phi\in\mathcal{C}(d_{A},d_{B},r)$ , let $V:\mathcal{H}_{A}\to\mathcal{H}_{B}\otimes\mathcal{H}_{E}$ be a Stinespring isometry with minimal environment dimension $d_{E}\leq r$ .

For channels $\Phi,\Psi:\mathcal{L}(\mathcal{H}_{A})\to\mathcal{L}(\mathcal{H}_{B})$ , the diamond distance is defined as

\|\Phi-\Psi\|_{\diamond}\coloneqq\sup_{\rho_{RA}\in\mathcal{D}(\mathcal{H}_{R}\otimes\mathcal{H}_{A})}\|{(\mathds{1}_{R}\otimes\Phi)(\rho_{RA})-(\mathds{1}_{R}\otimes\Psi)(\rho_{RA})}\|_{1},

(7)

where the supremum is over all auxiliary spaces $\mathcal{H}_{R}$ and states $\rho_{RA}$ .

We say that two channels $\Phi,\Psi\in\mathcal{C}(d_{A},d_{B},r)$ are $\varepsilon$ -diamond far if

\|\Phi-\Psi\|_{\diamond}>\varepsilon.

(8)

We say that their Stinespring isometries are $\eta$-operator-norm close if there exist choices of Stinespring isometries $V_{\Phi},V_{\Psi}$ such that

\|V_{\Phi}-V_{\Psi}\|_{\mathrm{op}}\leq\eta.

(9)

Finally, we define $\mathcal{E}(d_{A},d_{B},r,\varepsilon,\eta)$ as the set of ensembles of channels that are pairwise $2\varepsilon$ -diamond-far but have $\eta$ -close Stinespring isometries:

\mathcal{E}(d_{A},d_{B},r,\varepsilon,\eta)=\left\{\{\Phi_{i}\}_{i=1}^{M}\subset\mathcal{C}(d_{A},d_{B},r)\;\middle|\;\begin{array}{l}\forall i\neq j:\|\Phi_{i}-\Phi_{j}\|_{\diamond}>2\varepsilon,\\ \exists\text{ Stinespring isometries }\{V_{i}\}_{i=1}^{M}\\ \text{such that }\forall i\neq j:\|V_{i}-V_{j}\|_{\mathrm{op}}\leq\eta\end{array}\right\}.

(10)

1.4 The coherent query model

In the coherent query model for quantum channel learning, the learning algorithm is allowed to interleave queries to the unknown channel with arbitrary, adaptively chosen quantum operations. Formally, the algorithm prepares an initial quantum state $\rho$ on a system comprising the input space $\mathcal{H}_{A}$ of dimension $d_{A}$ together with an auxiliary system $\mathcal{H}_{\text{aux}}$ of arbitrary dimension. It then performs $N$ uses of the unknown channel $\Phi$, interspersed with arbitrary quantum channels $\{\pazocal{N}_{i}\}_{i=1}^{N-1}$ (the intermediate operations) that act jointly on the output space $\mathcal{H}_{B}$ and the auxiliary system. The final state after $N$ queries is

\rho^{\text{output}}=\bigl[\Phi\otimes{\rm Id}_{\text{aux}}\bigr]\circ\pazocal{N}_{N-1}\circ\bigl[\Phi\otimes{\rm Id}_{\text{aux}}\bigr]\circ\cdots\circ\pazocal{N}_{1}\circ\bigl[\Phi\otimes{\rm Id}_{\text{aux}}\bigr](\rho)\,,

(11)

where each $\Phi\otimes{\rm Id}_{\text{aux}}$ denotes a query to the unknown channel acting on the input system while leaving the auxiliary system unchanged. Finally, the algorithm measures $\rho^{\text{output}}$ with a positive operator-valued measure (POVM) to produce a classical description of an estimate $\hat{\Phi}$. The query complexity is the minimum number $N$ of uses of $\Phi$ required to output, with high probability, an estimate $\hat{\Phi}$ such that $\|\Phi-\hat{\Phi}\|_{\diamond}\leq\varepsilon$.

This model generalizes the parallel (or non-adaptive) query model, in which all $N$ uses of $\Phi$ are applied in parallel on a (possibly entangled) input state, corresponding to the special case where the intermediate operations $\pazocal{N}_{i}$ are all identity channels. The coherent model captures the most general physically realizable learning procedure that respects causality and does not assume access to the inverse or conjugate of $\Phi$. It is the natural setting for studying the fundamental quantum limits of channel learning when arbitrary quantum processing between queries is allowed.

2 A general lower bound on channel learning

In this section, we prove a general lower bound for learning a general quantum channel in diamond distance.

Theorem 1 (General lower bound).

Let $d_{A},d_{B},r\geq 1$, $M\geq 3$ and $\varepsilon,\eta\in(0,1/2)$. Consider an ensemble $\{\Phi_{i}\}_{i=1}^{M}\in\mathcal{E}(d_{A},d_{B},r,\varepsilon,\eta)$ of $M$ quantum channels that are pairwise $2\varepsilon$-diamond-far and whose Stinespring isometries are $\eta$-operator-norm-close. Any coherent algorithm that constructs $\hat{\Phi}_{i}$ such that $\|\Phi_{i}-\hat{\Phi}_{i}\|_{\diamond}\leq\varepsilon$ with probability at least $2/3$ for all $i\in[M]$ needs at least

N=\left\lceil\frac{(2/3)\log(M)-\log 2}{4\eta\log(d_{B}r/\eta)}\right\rceil

(12)

uses of the unknown channel $\Phi_{i}$.

We follow a standard strategy for proving lower bounds for learning problems (e.g., [flammia2012quantum, haah2017sample, lowe2022lower, fawzi2023lower, oufkir2023sample, Bluhm2024Mar, Rosenthal2024Sep, Mele_2025]).

Proof.

Figure 1: Schematic representation of the ensemble $\{\Phi_{i}\}_{i=1}^{M}$.

Let us consider any fixed coherent algorithm that constructs $\hat{\Phi}_{i}$ such that $\|\Phi_{i}-\hat{\Phi}_{i}\|_{\diamond}\leq\varepsilon$ with probability at least $2/3$ for all $i\in[M]$. Let $X\sim{\rm Uniform}[M]$ and let $Y$ be the index output by this algorithm upon receiving the quantum channel $\Phi_{X}^{\otimes N}$. Since the quantum channels $\{\Phi_{i}\}_{i=1}^{M}$ are pairwise $2\varepsilon$-diamond-far, the algorithm can recover $X$ by picking the channel in $\{\Phi_{i}\}_{i=1}^{M}$ that is $\varepsilon$-close to $\hat{\Phi}_{X}$ (see Figure 1): whenever $\|\Phi_{X}-\hat{\Phi}_{X}\|_{\diamond}\leq\varepsilon$, the triangle inequality gives $\|\Phi_{j}-\hat{\Phi}_{X}\|_{\diamond}>\varepsilon$ for every $j\neq X$. Hence $Y=X$ with probability at least $2/3$, and by Fano’s inequality [FANO] we have:

I(X:Y)\geq(2/3)\log(M)-\log 2.

(13)

A coherent algorithm using the quantum channel $\Phi_{x}(\,\cdot\,)$ chooses the input state $\rho$, the channels $\pazocal{N}_{1},\dots,\pazocal{N}_{N-1}$, and measures the output state:

\sigma_{x}^{N}=[\Phi_{x}\otimes{\rm Id}]\circ\pazocal{N}_{N-1}\circ\cdots\circ\pazocal{N}_{1}\circ[\Phi_{x}\otimes{\rm Id}](\rho).

(14)

We can suppose that the $N$ uses of $\Phi_{x}$ act on different systems $\{A_{k}\}_{k\in[N]}$ of dimension $d_{A}$ (we can include swap channels in $\pazocal{N}_{1},\dots,\pazocal{N}_{N-1}$ if necessary). We can assume, without loss of generality, that all the channels $\pazocal{N}_{1},\dots,\pazocal{N}_{N-1}$ are isometries, up to modifying the measurement at the end. Similarly, we can suppose that the Stinespring isometry $V_{x}^{A_{k}\to B_{k}}=V_{\Phi_{x}}^{A_{k}\to B_{k}}$ is applied instead of $\Phi_{x}$ directly after $\pazocal{N}_{k-1}$ for $k=1,\dots,N$. The global system is thus $B_{1}\cdots B_{N}E$, where $|B_{k}|=d_{B}r$ and $E$ is an ancilla system of arbitrary dimension. The global state before measurement becomes

\sigma_{x}^{N}=[\pazocal{V}_{x}\otimes{\rm Id}]\circ\pazocal{N}_{N-1}\circ\cdots\circ\pazocal{N}_{1}\circ[\pazocal{V}_{x}\otimes{\rm Id}](\rho),

(15)

with $\pazocal{V}_{x}(\cdot)=V_{x}(\cdot)V_{x}^{\dagger}$ . For $k\in[N]$ , we denote $\sigma_{x}^{k}=\pazocal{V}_{x}\circ\pazocal{N}_{k-1}\circ\cdots\circ\pazocal{N}_{1}\circ\pazocal{V}_{x}(\rho)$ , $\sigma_{x}^{0}=\rho$ and $\pazocal{N}_{0}={\rm Id}$ , so that we have

\sigma_{x}^{k}=\pazocal{V}_{x}\circ\pazocal{N}_{k-1}(\sigma_{x}^{k-1}).

(16)

Denote by $\pi_{k}=\frac{1}{M}\sum_{x=1}^{M}\pazocal{V}_{x}\circ\pazocal{N}_{k-1}(\sigma_{x}^{k-1})$ and $\xi_{k}=\frac{1}{M}\sum_{x=1}^{M}\pazocal{V}_{1}\circ\pazocal{N}_{k-1}(\sigma_{x}^{k-1})$. Hence the mutual information between $X$ and the observation of the coherent algorithm $Y$ can be bounded as follows

\begin{aligned}
I(X:Y)&\overset{\text{(i)}}{\leq}S\left(\frac{1}{M}\sum_{x=1}^{M}\sigma_{x}^{N}\right)-\frac{1}{M}\sum_{x=1}^{M}S\left(\sigma_{x}^{N}\right)\\
&\overset{\text{(ii)}}{=}\sum_{k=1}^{N}S\left(\frac{1}{M}\sum_{x=1}^{M}\sigma_{x}^{k}\right)-\sum_{k=1}^{N}S\left(\frac{1}{M}\sum_{x=1}^{M}\sigma_{x}^{k-1}\right)\\
&\overset{\text{(iii)}}{=}\sum_{k=1}^{N}S\left(\frac{1}{M}\sum_{x=1}^{M}\pazocal{V}_{x}\circ\pazocal{N}_{k-1}(\sigma_{x}^{k-1})\right)-\sum_{k=1}^{N}S\left(\frac{1}{M}\sum_{x=1}^{M}\pazocal{V}_{1}\circ\pazocal{N}_{k-1}(\sigma_{x}^{k-1})\right)\\
&=\sum_{k=1}^{N}S\left(B_{1}\cdots B_{k-1}B_{k}A_{k+1}\cdots A_{N}E\right)_{\pi_{k}}-\sum_{k=1}^{N}S\left(B_{1}\cdots B_{k-1}B_{k}A_{k+1}\cdots A_{N}E\right)_{\xi_{k}},
\end{aligned}

(17)

where (i) uses Holevo’s theorem [holevo1973bounds]; (ii) is a telescoping sum and uses the fact that $S\left(\sigma_{x}^{N}\right)=S\left(\rho\right)$ for all $x$, as all the applied operations are isometries; (iii) uses the assumption that $\pazocal{N}_{k-1}$ and $\pazocal{V}_{1}$ are isometry channels.

Now, observe that we have that $\mathrm{Tr}_{B_{k}}\left[\pi_{k}\right]=\mathrm{Tr}_{B_{k}}\left[\xi_{k}\right]$ so we can apply the continuity bound of [Berta2024Aug, Theorem 5]

\begin{aligned}
&S\left(B_{1}\cdots B_{k-1}B_{k}A_{k+1}\cdots A_{N}E\right)_{\pi_{k}}-S\left(B_{1}\cdots B_{k-1}B_{k}A_{k+1}\cdots A_{N}E\right)_{\xi_{k}}\\
&\quad=H\left(B_{k}|B_{1}\cdots B_{k-1}A_{k+1}\cdots A_{N}E\right)_{\pi_{k}}-H\left(B_{k}|B_{1}\cdots B_{k-1}A_{k+1}\cdots A_{N}E\right)_{\xi_{k}}\\
&\quad\leq\|\pi_{k}-\xi_{k}\|_{1}\log(|B_{k}|^{2})+h_{2}(\|\pi_{k}-\xi_{k}\|_{1})
\end{aligned}

(18)

with $h_{2}(a)=-a\log a-(1-a)\log(1-a)$ being the binary entropy. We have that

\begin{aligned}
\|\pi_{k}-\xi_{k}\|_{1}&=\left\|\frac{1}{M}\sum_{x=1}^{M}\pazocal{V}_{x}\circ\pazocal{N}_{k-1}(\sigma_{x}^{k-1})-\frac{1}{M}\sum_{x=1}^{M}\pazocal{V}_{1}\circ\pazocal{N}_{k-1}(\sigma_{x}^{k-1})\right\|_{1}\\
&\leq\frac{1}{M}\sum_{x=1}^{M}\left\|(\pazocal{V}_{x}-\pazocal{V}_{1})\circ\pazocal{N}_{k-1}(\sigma_{x}^{k-1})\right\|_{1}\\
&=\frac{1}{M}\sum_{x=1}^{M}\left\|V_{x}\zeta V_{x}^{\dagger}-V_{1}\zeta V_{1}^{\dagger}\right\|_{1}
\end{aligned}

(19)

with $\zeta=\pazocal{N}_{k-1}(\sigma_{x}^{k-1})$ being a quantum state. Using the triangle inequality, we obtain

\begin{aligned}
\frac{1}{M}\sum_{x=1}^{M}\left\|V_{x}\zeta V_{x}^{\dagger}-V_{1}\zeta V_{1}^{\dagger}\right\|_{1}&\leq\frac{1}{M}\sum_{x=1}^{M}\left(\left\|(V_{x}-V_{1})\zeta V_{x}^{\dagger}\right\|_{1}+\left\|V_{1}\zeta(V_{x}-V_{1})^{\dagger}\right\|_{1}\right)\\
&\leq\frac{1}{M}\sum_{x=1}^{M}\left(\|V_{x}-V_{1}\|_{\mathrm{op}}\left\|\zeta V_{x}^{\dagger}\right\|_{1}+\|V_{x}-V_{1}\|_{\mathrm{op}}\left\|V_{1}\zeta\right\|_{1}\right)\\
&\leq 2\eta.
\end{aligned}

(20)

Therefore, we deduce

\begin{aligned}
I(X:Y)&\leq\sum_{k=1}^{N}S\left(B_{1}\cdots B_{k-1}B_{k}A_{k+1}\cdots A_{N}E\right)_{\pi_{k}}-\sum_{k=1}^{N}S\left(B_{1}\cdots B_{k-1}B_{k}A_{k+1}\cdots A_{N}E\right)_{\xi_{k}}\\
&\leq\sum_{k=1}^{N}\left[\|\pi_{k}-\xi_{k}\|_{1}\log(|B_{k}|^{2})+h_{2}(\|\pi_{k}-\xi_{k}\|_{1})\right]\\
&\leq\sum_{k=1}^{N}\left[2\eta\log((d_{B}r)^{2})+h_{2}(2\eta)\right]\\
&\leq 4N\eta\log(d_{B}r/\eta),
\end{aligned}

(21)

where we used that $h_{2}(a)\leq 2a\log(1/a)$ for $a\in(0,\frac{1}{2})$ . Since $I(X:Y)\geq(2/3)\log(M)-\log 2$ we deduce that

N\geq\frac{(2/3)\log(M)-\log 2}{4\eta\log(d_{B}r/\eta)}.

(22)

This concludes the proof. ∎
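The last step of (21) can be sanity-checked numerically. The following is only a minimal sketch, not part of the proof: it evaluates both sides of the per-query inequality $2\eta\log((d_{B}r)^{2})+h_{2}(2\eta)\leq 4\eta\log(d_{B}r/\eta)$ on a grid of $\eta\in(0,1/2)$. The function names, the grid, and the sample values standing in for $d_{B}r$ are illustrative assumptions.

```python
import math


def binary_entropy(a: float) -> float:
    """Binary entropy h_2(a) in nats, with h_2(0) = h_2(1) = 0."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * math.log(a) - (1.0 - a) * math.log(1.0 - a)


def per_query_terms(eta: float, dim: int) -> tuple:
    """Return (lhs, rhs) of the last step of (21) for a single query.

    lhs = 2*eta*log(dim^2) + h_2(2*eta) and rhs = 4*eta*log(dim/eta),
    where `dim` stands for the product d_B * r.
    """
    lhs = 2.0 * eta * math.log(dim ** 2) + binary_entropy(2.0 * eta)
    rhs = 4.0 * eta * math.log(dim / eta)
    return lhs, rhs


def check_grid(dims=(1, 2, 8, 64, 1024), steps: int = 400) -> bool:
    """Check lhs <= rhs for eta on a uniform grid inside (0, 1/2)."""
    for dim in dims:
        for j in range(1, steps):
            eta = 0.5 * j / steps
            lhs, rhs = per_query_terms(eta, dim)
            if lhs > rhs + 1e-12:  # small tolerance for rounding
                return False
    return True
```

A grid check of this kind cannot replace the analytic argument; it only guards against slips in the constants.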

Given Theorem 1, we can prove lower bounds on learning quantum channels by constructing an ensemble $\{\Phi_{i}\}_{i=1}^{M}$ within $\mathcal{E}(d_{A},d_{B},r,\varepsilon,\eta)$ containing $M$ quantum channels that are pairwise $2\varepsilon$-diamond-far, yet whose Stinespring isometries are pairwise $\eta$-operator-norm-close. The resulting lower bound then scales with $\log M$ and inversely with $\eta$. To strengthen this bound, we should aim to construct an ensemble that maximizes $M$ while minimizing $\eta$. (Footnote 1: note that, by the inequality $\|\Phi_{x}-\Phi_{y}\|_{\diamond}\leq\|V_{x}-V_{y}\|_{\mathrm{op}}$ [kretschmann2008information], the parameter $\eta$ should be at least $2\varepsilon$.)

A natural approach to constructing such an ensemble is to use existing packing nets. However, this leads to an ensemble in $\mathcal{E}(d_{A},d_{B},r,\varepsilon,4\sqrt{\varepsilon})$ of cardinality $M$ satisfying $\log M=\Omega(d_{A}d_{B}r)$, which by Theorem 1 implies only the following weaker lower bound (see Appendix A for details):

\displaystyle N\geq\Omega\left(\frac{d_{A}d_{B}r}{\sqrt{\varepsilon}\log(d_{B}r/\sqrt{\varepsilon})}\right).

(23)

In what follows, we improve the $\varepsilon$ -dependence of this lower bound by constructing a new ensemble in $\mathcal{E}(d_{A},d_{B},r,\varepsilon,\eta)$ with comparable cardinality but with $\eta=O(\varepsilon)$ rather than $O(\sqrt{\varepsilon})$ .
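To illustrate why the size of $\eta$ matters, the sketch below evaluates the right-hand side of (12) for the two scalings discussed above, $\eta=4\sqrt{\varepsilon}$ (packing nets) and $\eta=60\varepsilon$ (the value reached at the end of Section 3). It is only an illustration: the function names are hypothetical, and the value of $\log M$ passed in is a free input rather than the constant obtained in the paper.

```python
import math


def theorem1_bound(log_m: float, eta: float, d_b: int, r: int) -> int:
    """Evaluate the right-hand side of (12).

    The result is clipped at 0 when the numerator is negative,
    in which case the bound carries no information.
    """
    if not 0.0 < eta < 0.5:
        raise ValueError("Theorem 1 assumes 0 < eta < 1/2")
    numerator = (2.0 / 3.0) * log_m - math.log(2.0)
    denominator = 4.0 * eta * math.log(d_b * r / eta)
    return max(0, math.ceil(numerator / denominator))


def compare_eta_scalings(eps: float, log_m: float, d_b: int, r: int) -> tuple:
    """Return the bound for eta = 4*sqrt(eps) and for eta = 60*eps."""
    n_sqrt = theorem1_bound(log_m, 4.0 * math.sqrt(eps), d_b, r)
    n_lin = theorem1_bound(log_m, 60.0 * eps, d_b, r)
    return n_sqrt, n_lin


# Hypothetical inputs: eps = 1e-4, log M = 1000, d_B = 16, r = 4.
n_sqrt, n_lin = compare_eta_scalings(1e-4, 1000.0, 16, 4)
```

For these inputs $60\varepsilon<4\sqrt{\varepsilon}$, so the second value is the larger one, reflecting the improved $\varepsilon$-dependence.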

3 An ensemble yielding an improved lower bound

In this section, we improve the bound (23) by constructing an ensemble in $\mathcal{E}(d_{A},d_{B},r,\Omega(\varepsilon),2\varepsilon)$ with cardinality $M$ satisfying $\log M=\Omega(d_{A}d_{B}r)$. More precisely, we construct a set of isometries $\{V_{x}\}_{x\in[M]}$ corresponding to quantum channels $\{\Phi_{x}\}_{x\in[M]}$ such that $\|\Phi_{x}-\Phi_{y}\|_{\diamond}\geq\Omega(\varepsilon)$, $\|V_{x}-V_{y}\|_{\infty}\leq O(\varepsilon)$, and $\log M=\Omega(d_{A}d_{B}r)$, as in Figure 2.

We prove the existence of such a set using a probabilistic argument. Let $\widetilde{\Phi}_{0}$ be a quantum channel with Kraus operators $\{K_{0,i}\}_{i\in[r]}$ satisfying

\displaystyle\left|\operatorname{Tr}\left[K_{0,i}^{\dagger}K_{0,j}\right]\right|\leq\frac{2d_{A}}{r}\delta_{i,j}\,,\quad\forall i,j\in[r].

(24)

The existence of such a channel is shown in Appendix B. Let $\widetilde{V}_{0}^{A\to BE}=\sum_{i=1}^{r}\ket{i}_{E}\otimes K_{0,i}$ be a Stinespring isometry of the quantum channel $\widetilde{\Phi}_{0}$.

Lemma 2.

There exists a set of isometries $\{\widetilde{V}^{A\to BE}_{x}\}_{x\in[M]}$ such that $\log M=\frac{1}{1201}d_{A}d_{B}r$ and, for all $x\neq y$, we have

\left\|\mathrm{Tr}_{E}\left[\widetilde{V}_{0}\Psi_{A^{\prime}A}(\widetilde{V}_{x}^{\dagger}-\widetilde{V}_{y}^{\dagger})\right]\right\|_{1}\geq 0.05,

(25)

where $\Psi_{A^{\prime}A}\coloneqq\ket{\Psi}\!\!\bra{\Psi}_{A^{\prime}A}$ is the maximally entangled state with $A^{\prime}\simeq A$.

Proof.

The proof of this lemma is deferred to Section 4. ∎

We now have all the ingredients to prove the main result.

Proof of Theorem 3.

Given the set of isometries $\{\widetilde{V}^{A\to BE}_{x}\}_{x\in[M]}$ provided by Lemma 2, we define the isometries

V^{A\to FBE}_{x}\coloneqq\sqrt{1-\varepsilon^{2}}\ket{0}_{F}\otimes\widetilde{V}_{0}^{A\to BE}+\varepsilon\ket{1}_{F}\otimes\widetilde{V}_{x}^{A\to BE}

(27)

and let $\Phi_{x}(\,\cdot\,)\coloneqq\mathrm{Tr}_{E}\left[V_{x}\,\cdot\,V_{x}^{\dagger}\right]$ be the corresponding quantum channel. It has Kraus rank at most $|E|=r$. The quantum channel $\Phi_{x}$ has input system $A$ of dimension $d_{A}$ and output system $FB$ of dimension $2d_{B}$. We have that

\|V_{x}-V_{y}\|_{\infty}=\|\varepsilon\ket{1}\otimes\widetilde{V}_{x}-\varepsilon\ket{1}\otimes\widetilde{V}_{y}\|_{\infty}=\varepsilon\|\widetilde{V}_{x}-\widetilde{V}_{y}\|_{\infty}\leq 2\varepsilon.

(28)

On the other hand, lower bounding the diamond norm by choosing the input state to be the maximally entangled state $\Psi_{A^{\prime}A}$, we have

\begin{aligned}
\left\|\Phi_{x}-\Phi_{y}\right\|_{\diamond}&\geq\left\|\Phi_{x}(\Psi)-\Phi_{y}(\Psi)\right\|_{1}\\
&=\left\|\mathrm{Tr}_{E}\left[V_{x}\Psi V_{x}^{\dagger}\right]-\mathrm{Tr}_{E}\left[V_{y}\Psi V_{y}^{\dagger}\right]\right\|_{1}\\
&=\bigg\|\varepsilon^{2}\ket{1}\!\!\bra{1}\otimes\left(\mathrm{Tr}_{E}\left[\widetilde{V}_{x}\Psi\widetilde{V}_{x}^{\dagger}\right]-\mathrm{Tr}_{E}\left[\widetilde{V}_{y}\Psi\widetilde{V}_{y}^{\dagger}\right]\right)\\
&\qquad+\varepsilon\sqrt{1-\varepsilon^{2}}\ket{0}\bra{1}\otimes\mathrm{Tr}_{E}\left[\widetilde{V}_{0}\Psi(\widetilde{V}_{x}^{\dagger}-\widetilde{V}_{y}^{\dagger})\right]\\
&\qquad+\varepsilon\sqrt{1-\varepsilon^{2}}\ket{1}\bra{0}\otimes\mathrm{Tr}_{E}\left[(\widetilde{V}_{x}-\widetilde{V}_{y})\Psi\widetilde{V}_{0}^{\dagger}\right]\bigg\|_{1}\\
&\overset{\text{(i)}}{\geq}\varepsilon\sqrt{1-\varepsilon^{2}}\left\|\ket{0}\bra{1}\otimes\mathrm{Tr}_{E}\left[\widetilde{V}_{0}\Psi(\widetilde{V}_{x}^{\dagger}-\widetilde{V}_{y}^{\dagger})\right]+\ket{1}\bra{0}\otimes\mathrm{Tr}_{E}\left[(\widetilde{V}_{x}-\widetilde{V}_{y})\Psi\widetilde{V}_{0}^{\dagger}\right]\right\|_{1}\\
&\quad-\varepsilon^{2}\left\|\ket{1}\!\!\bra{1}\otimes(\widetilde{V}_{x}\Psi\widetilde{V}_{x}^{\dagger}-\widetilde{V}_{y}\Psi\widetilde{V}_{y}^{\dagger})\right\|_{1}\\
&\overset{\text{(ii)}}{=}2\varepsilon\sqrt{1-\varepsilon^{2}}\left\|\mathrm{Tr}_{E}\left[\widetilde{V}_{0}\Psi(\widetilde{V}_{x}^{\dagger}-\widetilde{V}_{y}^{\dagger})\right]\right\|_{1}-\varepsilon^{2}\left\|\widetilde{V}_{x}\Psi\widetilde{V}_{x}^{\dagger}-\widetilde{V}_{y}\Psi\widetilde{V}_{y}^{\dagger}\right\|_{1}\\
&\overset{\text{(iii)}}{\geq}0.1\,\varepsilon\sqrt{1-\varepsilon^{2}}-2\varepsilon^{2}\\
&\overset{\text{(iv)}}{\geq}0.07\,\varepsilon,
\end{aligned}

(29)

where in (i) we have used the reverse triangle inequality and the bound

\big\|\mathrm{Tr}_{E}\left[X_{FBE}\right]\big\|_{1}\leq\big\|X_{FBE}\big\|_{1},

(30)

which holds for every operator $X_{FBE}$ and follows from the data-processing inequality for the trace norm; in (ii) we have noticed that

\begin{aligned}
&\big\|\ket{0}\bra{1}\otimes X_{BE}+\ket{1}\bra{0}\otimes X_{BE}^{\dagger}\big\|_{1}\\
&\quad=\operatorname{Tr}\sqrt{\left(\ket{0}\bra{1}\otimes X_{BE}+\ket{1}\bra{0}\otimes X_{BE}^{\dagger}\right)^{\dagger}\left(\ket{0}\bra{1}\otimes X_{BE}+\ket{1}\bra{0}\otimes X_{BE}^{\dagger}\right)}\\
&\quad=\operatorname{Tr}\sqrt{\ket{1}\!\!\bra{1}\otimes X_{BE}^{\dagger}X_{BE}+\ket{0}\!\!\bra{0}\otimes X_{BE}X_{BE}^{\dagger}}\\
&\quad=\operatorname{Tr}\sqrt{X_{BE}^{\dagger}X_{BE}}+\operatorname{Tr}\sqrt{X_{BE}X_{BE}^{\dagger}}\\
&\quad=2\|X_{BE}\|_{1};
\end{aligned}

(31)

in (iii) we have used Lemma 2 and upper bounded $\|\widetilde{V}_{x}\Psi\widetilde{V}_{x}^{\dagger}-\widetilde{V}_{y}\Psi\widetilde{V}_{y}^{\dagger}\|_{1}\leq\|\widetilde{V}_{x}\Psi\widetilde{V}_{x}^{\dagger}\|_{1}+\|\widetilde{V}_{y}\Psi\widetilde{V}_{y}^{\dagger}\|_{1}=2$; finally, in (iv) we used $\varepsilon\leq 0.01$.

To sum up, we showed the existence of $\{\Phi_{x}\}_{x\in[M]}\in\mathcal{E}(d_{A},2d_{B},r,0.035\varepsilon,2\varepsilon)$ with cardinality $M$ satisfying $\log M=\frac{1}{1201}d_{A}d_{B}r$. Replacing $d_{B}$ by $d_{B}/2$ and $\varepsilon$ by $\varepsilon/0.035$, so that $\eta=2\varepsilon/0.035\leq 60\varepsilon$, this implies the existence of $\{\Phi_{x}\}_{x\in[M]}\in\mathcal{E}(d_{A},d_{B},r,\varepsilon,60\varepsilon)$ with cardinality $M$ satisfying $\log M=\frac{1}{2\cdot 1201}d_{A}d_{B}r$ for $\varepsilon\leq 10^{-4}$. By Theorem 1, we conclude:

N\geq\frac{(2/3)\log(M)-\log 2}{4\eta\log(d_{B}r/\eta)}\geq\frac{d_{A}d_{B}r-2500}{10^{6}\varepsilon\cdot\log(d_{B}r/(60\varepsilon))},

(32)

which completes the proof. ∎
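The numerical constants in steps (iii) and (iv) of (29) and in the subsequent rescaling can be checked with a few lines of code. This is a minimal sketch under the assumption that the constant of Lemma 2 is $0.05$, as stated in (25); the function names and the grid are illustrative.

```python
import math


def separation_lower_bound(eps: float, c: float = 0.05) -> float:
    """Right-hand side of step (iii) in (29).

    Computes 2*eps*sqrt(1 - eps^2)*c - 2*eps^2, where c is the
    lower bound of Lemma 2 on the trace norm of the cross term.
    """
    return 2.0 * eps * math.sqrt(1.0 - eps ** 2) * c - 2.0 * eps ** 2


def check_step_iv(steps: int = 1000, eps_max: float = 0.01) -> bool:
    """Check step (iii) >= 0.07*eps on a grid of eps in (0, eps_max]."""
    for j in range(1, steps + 1):
        eps = eps_max * j / steps
        if separation_lower_bound(eps) < 0.07 * eps:
            return False
    return True


def eta_over_eps_after_rescaling(half_separation: float = 0.035,
                                 eta_coeff: float = 2.0) -> float:
    """Ratio eta/eps after rescaling eps so the channels are 2*eps-far."""
    return eta_coeff / half_separation
```

The last function returns a value slightly above 57, which is consistent with the rounded constant $60$ used for $\eta$ in the final ensemble.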

4 Proof of Lemma 2

In order to construct the set of isometries $\{\widetilde{V}^{A\to BE}_{x}\}_{x\in[M]}$ claimed in the statement of Lemma 2, we use the following construction. Let us introduce

S^{A\to BE}\coloneqq\sum_{i=1}^{d_{A}}\ket{i}_{BE}\bra{i}_{A}\in\mathbb{C}^{rd_{B}\times d_{A}}\qquad\text{and}\qquad\widetilde{V}_{x}^{A\to BE}\coloneqq U_{x}^{BE}S^{A\to BE},

(33)

where $U_{x}\in{\rm U}(rd_{B})$ is a unitary operator. In our construction, we will consider independent random unitaries $U_{x}$ sampled according to the Haar measure. We are going to leverage the following technical statements in the proof of Lemma 2.

Lemma 4.

Let $U_{x},U_{y}\in{\rm U}(rd_{B})$ and let $\widetilde{V}_{x},\widetilde{V}_{y}$ be defined as in (33). Then, let us define the operator

C\coloneqq\mathrm{Tr}_{E}\left[\widetilde{V}_{0}\Psi_{A^{\prime}A}(\widetilde{V}_{x}^{\dagger}-\widetilde{V}_{y}^{\dagger})\right],

(34)

where $\Psi_{A^{\prime}A}$ is the maximally entangled state between the systems $A$ and $A^{\prime}$. The function $f(U_{x},U_{y})\coloneqq\left\|C\right\|_{1}$ is $\sqrt{\frac{2}{d_{A}}}$-Lipschitz with respect to the $\ell_{2}$-sum of the 2-norms, namely

|f(U_{x},U_{y})-f(U^{\prime}_{x},U^{\prime}_{y})|\leq\sqrt{\frac{2}{d_{A}}}\|(U_{x},U_{y})-(U_{x}^{\prime},U_{y}^{\prime})\|_{2}

(35)

for all $U_{x},U_{x}^{\prime},U_{y},U_{y}^{\prime}\in{\rm U}(rd_{B})$ , where $\|(A,B)\|_{2}\coloneqq\sqrt{\|A\|_{2}^{2}+\|B\|_{2}^{2}}$ is the $\ell_{2}$ -sum of the 2-norms. Furthermore, if we consider independent random unitaries $U_{x},U_{y}\sim{\rm Haar}({\rm U}(rd_{B}))$ , we have

(a) $\mathbb{E}\operatorname{Tr}\big[|C|^{2}\big]=\frac{2}{r}$,

(b) $\mathbb{E}\operatorname{Tr}\big[|C|^{4}\big]\leq\frac{128}{r^{3}}$.

Proof.

See Appendix C. ∎

Lemma 5 ([meckes2013spectral, Corollary 17]).

Let $k,d\geq 1$ . Suppose that $f:\big({\rm U}(d)\big)^{k}\to\mathbb{R}$ is $L$ -Lipschitz with respect to the $\ell_{2}$ -sum of the 2-norms, i.e.

\big|f(U_{1},\dots,U_{k})-f(U^{\prime}_{1},\dots,U^{\prime}_{k})\big|\leq L\sqrt{\sum_{i=1}^{k}\|U_{i}-U_{i}^{\prime}\|_{2}^{2}}

(36)

for all $U_{i},U_{i}^{\prime}\in{\rm U}(d)$ , with $i=1,\dots,k$ . Then, if we independently sample $U_{1},\dots,U_{k}$ according to the Haar measure on ${\rm U}(d)$ , the following inequality holds for each $t>0$ :

\mathbb{P}\left(f(U_{1},\dots,U_{k})\geq\mathbb{E}\left[f(U_{1},\dots,U_{k})\right]+t\right)\leq\exp\left(-\frac{dt^{2}}{12L^{2}}\right).

(37)

Now we have all the ingredients to prove Lemma 2.

Proof of Lemma 2.

Let $C$ be as in Lemma 4. By Hölder’s inequality applied to $\mathbb{E}\operatorname{Tr}[\,\cdot\,]$ with conjugate exponents $3$ and $3/2$, we get

\mathbb{E}[\operatorname{Tr}\big[|C|^{2}\big]]=\mathbb{E}[\operatorname{Tr}\big[|C|^{4/3}|C|^{2/3}\big]]\leq\left(\mathbb{E}\left[\operatorname{Tr}\big[|C|^{4}\big]\right]\right)^{1/3}\left(\mathbb{E}\left[\operatorname{Tr}\big[|C|\big]\right]\right)^{2/3},

(38)

which yields

\big(\mathbb{E}[\operatorname{Tr}\big[|C|^{2}\big]]\big)^{3}\leq\mathbb{E}\left[\operatorname{Tr}\big[|C|^{4}\big]\right]\left(\mathbb{E}\left[\operatorname{Tr}\big[|C|\big]\right]\right)^{2}.

(39)

Whence, by the bounds (a) and (b) of Lemma 4 combined with (39), we have

\displaystyle\big(\mathbb{E}\operatorname{Tr}\big[|C|\big]\big)^{2}\geq\frac{\big(\mathbb{E}\operatorname{Tr}\big[|C|^{2}\big]\big)^{3}}{\mathbb{E}\operatorname{Tr}\big[|C|^{4}\big]}\geq\frac{(\frac{2}{r})^{3}}{\frac{128}{r^{3}}}=\frac{1}{16}.

(40)
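The arithmetic behind (40) does not depend on $r$, which can be confirmed with exact rational arithmetic. The snippet below is a small illustrative check that takes the moment values (a) and (b) of Lemma 4 as given; the function name is hypothetical.

```python
from fractions import Fraction


def first_moment_lower_bound_squared(r: int) -> Fraction:
    """Lower bound on (E Tr|C|)^2 obtained from (39).

    Uses E Tr|C|^2 = 2/r from (a) and the upper bound
    E Tr|C|^4 <= 128/r^3 from (b), both taken from Lemma 4.
    """
    second_moment = Fraction(2, r)
    fourth_moment_upper = Fraction(128, r ** 3)
    return second_moment ** 3 / fourth_moment_upper


# The ratio equals 1/16 for every Choi rank r >= 1.
values = {first_moment_lower_bound_squared(r) for r in range(1, 51)}
```

Taking the square root gives $\mathbb{E}\operatorname{Tr}|C|\geq 1/4$, the value used in (42).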

Furthermore, since the function $f(U_{x},U_{y})\coloneqq\|C\|_{1}$ is $\sqrt{\frac{2}{d_{A}}}$-Lipschitz, when we sample two independent unitaries $U_{x},U_{y}\sim{\rm Haar}({\rm U}(rd_{B}))$, by Lemma 5 (applied, to be precise, to $-f$, which is also $\sqrt{\frac{2}{d_{A}}}$-Lipschitz), we have

\displaystyle\mathbb{P}\left(\mathbb{E}\left[f(U_{x},U_{y})\right]-f(U_{x},U_{y})\geq\frac{1}{5}\right)\leq\exp\left(-\frac{rd_{B}}{300\cdot\frac{2}{d_{A}}}\right)=\exp\left(-\frac{d_{A}d_{B}r}{600}\right)\eqcolon\delta

(41)

Therefore, with probability at least $1-\delta$ , we have

\displaystyle f(U_{x},U_{y})>\mathbb{E}\left[f(U_{x},U_{y})\right]-\frac{1}{5}\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(i)}}}}{{\geq}}\sqrt{\frac{1}{16}}-\frac{1}{5}=\frac{1}{20}.

(42)

where in (i) we have used the lower bound (40). Let

M\coloneqq\left\lfloor\exp\left(\frac{d_{A}d_{B}r}{1201}\right)\right\rfloor

(43)

and let $\{U_{x}\}_{x\in[M]}$ be i.i.d. Haar-random unitaries. Note that $M^{2}\delta<1$, since $M^{2}\leq\exp(2d_{A}d_{B}r/1201)<\exp(d_{A}d_{B}r/600)=1/\delta$, and that $\log M=\Omega(d_{A}d_{B}r)$. By the union bound, we have

\mathbb{P}\left(\exists x\neq y:f(U_{x},U_{y})<\frac{1}{20}\right)\leq M(M-1)\;\mathbb{P}\left(f(U_{x},U_{y})<\frac{1}{20}\right)\leq M^{2}\delta<1.

(44)

Hence, there exists a family $\{U_{x}\}_{x\in[M]}$ such that, for all $x\neq y$ ,

\displaystyle f(U_{x},U_{y})=\left\|\mathrm{Tr}_{E}\left[\mathaccent 869{V}_{0}\ket{\Psi}\!\!\bra{\Psi}_{A^{\prime}A}(\mathaccent 869{V}_{x}^{\dagger}-\mathaccent 869{V}_{y}^{\dagger})\right]\right\|_{1}\geq\frac{1}{20}.

(45)

This concludes the proof of Lemma 2. ∎

5 Conclusion

We have proved that learning an unknown quantum channel to diamond distance $\varepsilon$ requires $\Omega\bigl(d_{A}d_{B}r/(\varepsilon\log(d_{B}r/\varepsilon))\bigr)$ queries, improving upon the previous $\Omega(d_{A}d_{B}r)$ bound. The proof constructs ensembles of channels that are well-separated in diamond norm yet admit Stinespring isometries that are close in operator norm.

Several natural questions remain open. First, the precise $\varepsilon$ -dependence in the general case is still unclear: while our bound scales as $\mathaccent 869{\Omega}(1/\varepsilon)$ , the state-learning regime suggests $\Theta(1/\varepsilon^{2})$ is necessary in some parameter ranges. Second, it is unknown whether coherent strategies offer any advantage over parallel strategies for channel learning in diamond distance. Finally, the role of quantum memory in the query complexity requires further exploration.

Acknowledgments

FG acknowledges financial support from the European Union (ERC StG ETQO, Grant Agreement no. 101165230).

References

Appendix A A weaker lower bound using existing packing nets

A natural approach to constructing such an ensemble in $\mathcal{E}(d_{A},d_{B},r,\varepsilon,\eta)$ is to use packing nets. Assume that $d_{B}\geq 2$ . From [Girardi2025Dec, Lemma 14], we have

\displaystyle\hskip 0.0pt\log\mathcal{M}\left(\mathcal{C}(d_{A},d_{B},\lfloor\tfrac{r}{2}\rfloor),\ \|\cdot\|_{\diamond},\ 1/2\right)=\Theta\left(r\,d_{A}d_{B}\right),

(46)

where $\mathcal{M}(\mathcal{S},\|\cdot\|,\delta)$ denotes the $\delta$ -packing number of the set $\mathcal{S}$ with respect to the norm $\|\cdot\|$ .

Let $M=\mathcal{M}\big(\mathcal{C}(d_{A},d_{B},\lfloor\frac{r}{2}\rfloor),\ \|\cdot\|_{\diamond},\ 1/2\big)$ , and let $\{\mathaccent 869{\Phi}_{x}\}_{x\in[M]}$ be a $1/2$ -diamond-norm packing of quantum channels, with corresponding Stinespring isometries $\{\mathaccent 869{V}_{x}\}_{x\in[M]}$ .

For a given $\varepsilon\in(0,\frac{1}{4})$ and each $x\in[M]$ , we define the convex mixture

\displaystyle\Phi_{x}=(1-4\varepsilon)\mathaccent 869{\Phi}_{1}+4\varepsilon\mathaccent 869{\Phi}_{x}.

(47)

This is a valid quantum channel of Choi rank at most $2\lfloor\frac{r}{2}\rfloor\leq r$ . We observe that for any distinct $x,y\in[M]$ ,

\displaystyle\|\Phi_{x}-\Phi_{y}\|_{\diamond}

\displaystyle=4\varepsilon\|\mathaccent 869{\Phi}_{x}-\mathaccent 869{\Phi}_{y}\|_{\diamond}>2\varepsilon,

(48)

since $\|\mathaccent 869{\Phi}_{x}-\mathaccent 869{\Phi}_{y}\|_{\diamond}>1/2$ by the packing property. Moreover, by [kretschmann2008information], we have

\displaystyle\inf_{V_{\Phi_{x}}}\|V_{\Phi_{x}}-V_{1}\|_{\mathrm{op}}^{2}\leq\|\Phi_{x}-\Phi_{1}\|_{\diamond}=4\varepsilon\|\mathaccent 869{\Phi}_{x}-\mathaccent 869{\Phi}_{1}\|_{\diamond}\leq 4\varepsilon,

(49)

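where $V_{1}$ is a Stinespring isometry for $\Phi_{1}$ and the infimum runs over the Stinespring isometries $V_{\Phi_{x}}$ of $\Phi_{x}$ . Let $V_{x}$ be a Stinespring isometry for $\Phi_{x}$ achieving this infimum. Then for all $x,y\in[M]$ ,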

\displaystyle\hskip 0.0pt\|V_{x}-V_{y}\|_{\mathrm{op}}

\displaystyle\leq\|V_{x}-V_{1}\|_{\mathrm{op}}+\|V_{y}-V_{1}\|_{\mathrm{op}}\leq 4\sqrt{\varepsilon}.

(50)

Thus, $\{\Phi_{x}\}_{x\in[M]}\in\mathcal{E}(d_{A},d_{B},r,\varepsilon,4\sqrt{\varepsilon})$ , which, by Theorem 1 with $\eta=4\sqrt{\varepsilon}$ , implies the lower bound

\displaystyle N\geq\frac{(2/3)\log(M)-\log 2}{4\eta\log(d_{B}r/\eta)}\geq\Omega\left(\frac{d_{A}d_{B}r}{\sqrt{\varepsilon}\log(d_{B}r/\sqrt{\varepsilon})}\right).

(51)
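To see how much is lost by the packing-net construction, one can compare the scaling of (51) with that of the main bound (3). The sketch below drops all constants hidden in the $\Omega(\cdot)$ notation (an assumption made purely for illustration), and the function names are hypothetical.

```python
import math

def main_bound(d_A, d_B, r, eps):
    """Scaling of (3): d_A d_B r / (eps * log(d_B r / eps)), constants dropped."""
    return d_A * d_B * r / (eps * math.log(d_B * r / eps))

def packing_bound(d_A, d_B, r, eps):
    """Scaling of (51): same expression with eps replaced by sqrt(eps)."""
    eta = math.sqrt(eps)
    return d_A * d_B * r / (eta * math.log(d_B * r / eta))

def improvement_factor(d_A, d_B, r, eps):
    """Ratio of the two scalings; it grows like 1/sqrt(eps) up to a factor in [1/2, 1]."""
    return main_bound(d_A, d_B, r, eps) / packing_bound(d_A, d_B, r, eps)

for eps in (1e-1, 1e-2, 1e-4):
    print(eps, improvement_factor(4, 2, 3, eps))
```

For $\varepsilon\in(0,1)$ and $d_{B}r\geq 1$ the ratio of the two logarithms lies between $1/2$ and $1$ , so the improvement factor is between $\frac{1}{2\sqrt{\varepsilon}}$ and $\frac{1}{\sqrt{\varepsilon}}$ .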

Appendix B Existence of the quantum channel $\mathaccent 869{\Phi}_{0}$

In this section, we want to show the existence of a quantum channel $\mathaccent 869{\Phi}_{0}$ with Kraus operators $\{K_{0,i}\}_{i\in[r]}$ satisfying

\displaystyle\left|\operatorname{Tr}\left[K_{0,i}^{\dagger}K_{0,j}\right]\right|\leq\frac{2d_{A}}{r}\delta_{i,j}\,,\quad\forall i,j\in[r].

(52)

We distinguish two cases, depending on whether $d_{A}\leq d_{B}$ or not.

•

Case $1$ : $d_{A}\leq d_{B}$ . Let $k=\bigl\lfloor\frac{d_{B}}{d_{A}}\bigr\rfloor\in[1,r]$ and $l=\bigl\lceil\frac{r}{k}\bigr\rceil$ . Note that $l\leq\bigl\lceil\frac{rd_{A}}{d_{B}}\bigr\rceil\leq d_{A}^{2}$ . We decompose $\mathbb{C}^{d_{B}}\simeq\bigl(\bigoplus_{i=1}^{k}\mathbb{C}^{d_{A}}\bigr)\oplus\mathbb{C}^{d_{C}}$ , where $d_{C}=d_{B}-kd_{A}<d_{A}$ .

For each block $A_{i}\simeq A$ ( $i=1,\dots,k$ ), we can choose $l\leq d_{A}^{2}$ orthogonal $d_{A}\times d_{A}$ unitary matrices $\{U_{i,j}\}_{j\in[l]}$ (for example, a subset of the generalized Pauli operators). Since $kl=k\bigl\lceil\frac{r}{k}\bigr\rceil\geq r$ , we may select a subset $S\subset[k]\times[l]$ with $|S|=r$ . For each $(i,j)\in S$ , define the Kraus operator

\displaystyle\hskip 0.0ptK_{i,j}=\left(0\oplus\tfrac{1}{\sqrt{r}}\,U_{i,j}\right),

(53)

where the direct sum is taken with respect to the decomposition above, and $U_{i,j}$ acts nontrivially only on the $i$ -th $\mathbb{C}^{d_{A}}$ summand.

We then verify:

(a)

Completeness:

\displaystyle\sum_{(i,j)\in S}K_{i,j}^{\dagger}K_{i,j}=\sum_{(i,j)\in S}\frac{1}{r}\,U_{i,j}^{\dagger}U_{i,j}=\mathbb{I}_{A}.

(54)

(b)

Orthogonality: For all $(i,j),(i^{\prime},j^{\prime})\in S$ ,

\displaystyle\hskip 0.0pt\operatorname{Tr}\!\big[K_{i,j}^{\dagger}K_{i^{\prime},j^{\prime}}\big]=\frac{d_{A}}{r}\,\delta_{i,i^{\prime}}\delta_{j,j^{\prime}}.

(55)

•

Case $2$ : $d_{A}>d_{B}$ . Let $k=\bigl\lfloor\frac{d_{A}}{d_{B}}\bigr\rfloor\in[1,r]$ and write $d_{A}=kd_{B}+d_{C}$ with $0\leq d_{C}<d_{B}$ . We can then decompose $\mathds{1}_{A}=\mathds{1}_{B_{1}}\oplus\cdots\oplus\mathds{1}_{B_{k}}\oplus\mathds{1}_{C}$ , where each $B_{i}\simeq B$ (i.e., $\dim B_{i}=d_{B}$ ).

For each block $B_{i}$ ( $i=1,\dots,k$ ), construct $l=\bigl\lfloor\frac{r}{k}\bigr\rfloor\in[1,d_{B}^{2}]$ orthogonal $d_{B}\times d_{B}$ unitary matrices $\{U_{i,j}\}_{j\in[l]}$ that are supported on $B_{i}$ and define the corresponding $d_{A}\times d_{B}$ matrices

\displaystyle\hskip 0.0ptK_{i,j}=\bigl(0\oplus\tfrac{1}{\sqrt{l}}\,U_{i,j}\bigr)\quad(j\in[l]),

(56)

where the direct sum is taken with respect to the decomposition $\mathbb{C}^{d_{A}}\simeq\bigl(\bigoplus_{i=1}^{k}\mathbb{C}^{d_{B}}\bigr)\oplus\mathbb{C}^{d_{C}}$ and $U_{i,j}$ acts nontrivially only on the $i$ -th $d_{B}$ -dimensional summand.

For the remaining block $C$ , since $d_{C}<d_{B}$ we can apply Case $1$ and construct $r^{\prime}=\bigl\lceil\frac{rd_{C}}{d_{A}}\bigr\rceil\in[1,d_{C}d_{B}]$ orthogonal $d_{C}\times d_{B}$ isometries $\{V_{i^{\prime}}\}_{i^{\prime}\in[r^{\prime}]}$ and define

\displaystyle\hskip 0.0ptK_{k+1,i^{\prime}}=\bigl(0\oplus\tfrac{1}{\sqrt{r^{\prime}}}\,V_{i^{\prime}}\bigr)\quad(i^{\prime}\in[r^{\prime}]),

(57)

where now $V_{i^{\prime}}$ acts nontrivially only on the $\mathbb{C}^{d_{C}}$ summand. We can check

(a)

Completeness:

\displaystyle\sum_{i=1}^{k}\sum_{j=1}^{l}K_{i,j}^{\dagger}K_{i,j}+\sum_{i^{\prime}=1}^{r^{\prime}}K_{k+1,i^{\prime}}^{\dagger}K_{k+1,i^{\prime}}

\displaystyle=\sum_{i=1}^{k}\sum_{j=1}^{l}\frac{1}{l}\,\mathbb{I}_{B_{i}}+\sum_{i^{\prime}=1}^{r^{\prime}}\frac{1}{r^{\prime}}\,\mathbb{I}_{C}=\mathbb{I}_{A}.

(58)

(b)

Orthogonality: For all $i,i^{\prime}\in[k]$ and $j,j^{\prime}\in[l]$ , using $l=\bigl\lfloor\frac{r}{k}\bigr\rfloor\geq\frac{r}{2k}$ (valid since $k\leq r$ ) and $kd_{B}\leq d_{A}$ ,

\displaystyle\operatorname{Tr}\!\big[K_{i,j}^{\dagger}K_{i^{\prime},j^{\prime}}\big]

\displaystyle=\delta_{i,i^{\prime}}\delta_{j,j^{\prime}}\frac{d_{B}}{l}\leq\delta_{i,i^{\prime}}\delta_{j,j^{\prime}}\frac{2d_{A}}{r},

(59)

and for $i^{\prime},i^{\prime\prime}\in[r^{\prime}]$ ,

\displaystyle\hskip 0.0pt\operatorname{Tr}\!\big[K_{k+1,i^{\prime}}^{\dagger}K_{k+1,i^{\prime\prime}}\big]=\delta_{i^{\prime},i^{\prime\prime}}\frac{d_{C}}{r^{\prime}}\leq\delta_{i^{\prime},i^{\prime\prime}}\frac{d_{A}}{r}.

(60)

(c)

Kraus rank: The total number of Kraus operators is

\displaystyle lk+r^{\prime}=\Bigl\lfloor\frac{r}{k}\Bigr\rfloor k+\Bigl\lceil\frac{rd_{C}}{d_{A}}\Bigr\rceil\leq 2r.

(61)
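The integer bookkeeping of Case $2$ can be verified exhaustively for small dimensions. The sketch below is an illustration under the assumptions $d_{A}>d_{B}$ and $r\geq k$ ; the function name `case2_parameters` is hypothetical. It checks the Kraus-operator count (61) and the trace bound with the factor $2$ that appears in (52).

```python
from fractions import Fraction

def case2_parameters(d_A, d_B, r):
    """Return (k, l, d_C, r_prime) as defined in Case 2 (requires d_A > d_B, r >= k)."""
    k = d_A // d_B                   # number of full d_B-dimensional blocks
    d_C = d_A - k * d_B              # dimension of the leftover block C
    l = r // k                       # unitaries per full block
    r_prime = -(-r * d_C // d_A)     # ceil(r * d_C / d_A), zero when d_C == 0
    return k, l, d_C, r_prime

def check_case2(d_A, d_B, r):
    """True if the count (61) and the trace bounds required by (52) hold."""
    k, l, d_C, r_prime = case2_parameters(d_A, d_B, r)
    ok = l * k + r_prime <= 2 * r
    ok = ok and Fraction(d_B, l) <= Fraction(2 * d_A, r)
    if d_C > 0:
        ok = ok and Fraction(d_C, r_prime) <= Fraction(d_A, r)
    return ok

print(case2_parameters(4, 2, 3), check_case2(4, 2, 3))
```

For instance, $(d_{A},d_{B},r)=(4,2,3)$ gives $k=2$ , $l=1$ , $d_{C}=0$ and $r^{\prime}=0$ , so that $d_{B}/l=2$ while $2d_{A}/r=8/3$ .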

Appendix C Proof of Lemma 4

Let us start by upper bounding

\displaystyle\|C\|_{1}

\displaystyle\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(i)}}}}{{\leq}}\|\mathaccent 869{V}_{0}\Psi_{A^{\prime}A}(\mathaccent 869{V}_{x}^{\dagger}-\mathaccent 869{V}_{y}^{\dagger})\|_{1}

\displaystyle\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(ii)}}}}{{=}}\|\Psi_{A^{\prime}A}(\mathaccent 869{V}_{x}^{\dagger}-\mathaccent 869{V}_{y}^{\dagger})\|_{1}

\displaystyle=\sqrt{\bra{\Psi_{A^{\prime}A}}(\mathaccent 869{V}_{x}^{\dagger}-\mathaccent 869{V}_{y}^{\dagger})(\mathaccent 869{V}_{x}-\mathaccent 869{V}_{y})\ket{\Psi_{A^{\prime}A}}}

\displaystyle\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(iii)}}}}{{=}}\sqrt{\frac{1}{d_{A}}\operatorname{Tr}\big[|\mathaccent 869{V}_{x}-\mathaccent 869{V}_{y}|^{2}\big]}

\displaystyle=\frac{1}{\sqrt{d_{A}}}\|\mathaccent 869{V}_{x}-\mathaccent 869{V}_{y}\|_{2}

\displaystyle\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(iv)}}}}{{\leq}}\frac{1}{\sqrt{d_{A}}}\|U_{x}-U_{y}\|_{2},

(62)

where in (i) we have used the data-processing inequality in a similar way to (30); in (ii) we have leveraged the variational characterisation of the 1-norm $\|\,\cdot\,\|_{1}=\max_{-\mathds{1}\leq X\leq\mathds{1}}\operatorname{Tr}[X\,\cdot\,]$ and absorbed the isometry $\mathaccent 869{V}_{0}$ into $X$ ; in (iii) we have recalled the identity

\displaystyle\bra{\Psi_{A^{\prime}A}}(\mathaccent 869{V}_{x}^{\dagger}-\mathaccent 869{V}_{y}^{\dagger})(\mathaccent 869{V}_{x}-\mathaccent 869{V}_{y})\ket{\Psi_{A^{\prime}A}}=\frac{1}{d_{A}}\operatorname{Tr}\left[(\mathaccent 869{V}_{x}^{\dagger}-\mathaccent 869{V}_{y}^{\dagger})(\mathaccent 869{V}_{x}-\mathaccent 869{V}_{y})\right];

(63)

finally, in (iv) we have noticed that

\displaystyle\|\mathaccent 869{V}_{x}-\mathaccent 869{V}_{y}\|_{2}^{2}

\displaystyle=\operatorname{Tr}\big[S^{\dagger}(U_{x}-U_{y})^{\dagger}(U_{x}-U_{y})S\big]

\displaystyle=\operatorname{Tr}\big[SS^{\dagger}(U_{x}-U_{y})^{\dagger}(U_{x}-U_{y})\big]\leq\|U_{x}-U_{y}\|_{2}^{2},

(64)

as $SS^{\dagger}\leq\mathds{1}_{BE}$ .
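Step (iv) is easy to visualise in the special case where $S$ is a coordinate embedding, so that right-multiplication by $S$ keeps the first $d_{A}$ columns of a matrix. The sketch below makes exactly this assumption (a general isometry $S$ differs from it by a unitary change of basis); the helper names are hypothetical and the matrix `X` merely stands in for $U_{x}-U_{y}$ .

```python
def frobenius_sq(M):
    """Squared Hilbert-Schmidt (Frobenius) norm of a matrix given as nested lists."""
    return sum(abs(entry) ** 2 for row in M for entry in row)

def times_coordinate_isometry(M, d_A):
    """M @ S for the coordinate isometry S embedding C^{d_A} into the first d_A coordinates."""
    return [row[:d_A] for row in M]

# Stand-in for U_x - U_y (any complex matrix works for this inequality).
X = [[1, 2j, 3],
     [0, 1, -1],
     [2, 0, 1j]]

restricted = times_coordinate_isometry(X, 2)
print(frobenius_sq(restricted), "<=", frobenius_sq(X))
```

Dropping columns can only remove nonnegative terms $|X_{mn}|^{2}$ from the sum, which is the content of $SS^{\dagger}\leq\mathds{1}_{BE}$ in this basis.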

Calling $f(U_{x},U_{y})\coloneqq\|C\|_{1}$ , we have

\displaystyle|f(U_{x},U_{y})-f(U^{\prime}_{x},U^{\prime}_{y})|

\displaystyle\quad=\left|\;\left\|\mathrm{Tr}_{E}\left[\mathaccent 869{V}_{0}\ket{\Psi}\!\!\bra{\Psi}_{A^{\prime}A}(\mathaccent 869{V}_{x}^{\dagger}-\mathaccent 869{V}_{y}^{\dagger})\right]\right\|_{1}-\left\|\mathrm{Tr}_{E}\left[\mathaccent 869{V}_{0}\ket{\Psi}\!\!\bra{\Psi}_{A^{\prime}A}(\mathaccent 869{V}_{x}^{\prime\dagger}-\mathaccent 869{V}_{y}^{\prime\dagger})\right]\right\|_{1}\;\right|

\displaystyle\quad\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(v)}}}}{{\leq}}\left\|\mathrm{Tr}_{E}\left[\mathaccent 869{V}_{0}\ket{\Psi}\!\!\bra{\Psi}_{A^{\prime}A}(\mathaccent 869{V}_{x}^{\dagger}-\mathaccent 869{V}_{y}^{\dagger})\right]-\mathrm{Tr}_{E}\left[\mathaccent 869{V}_{0}\ket{\Psi}\!\!\bra{\Psi}_{A^{\prime}A}(\mathaccent 869{V}_{x}^{\prime\dagger}-\mathaccent 869{V}_{y}^{\prime\dagger})\right]\right\|_{1}

\displaystyle\quad\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(vi)}}}}{{\leq}}\left\|\mathrm{Tr}_{E}\left[\mathaccent 869{V}_{0}\ket{\Psi}\!\!\bra{\Psi}_{A^{\prime}A}(\mathaccent 869{V}_{x}^{\dagger}-\mathaccent 869{V}_{x}^{\prime\dagger})\right]\right\|_{1}+\left\|\mathrm{Tr}_{E}\left[\mathaccent 869{V}_{0}\ket{\Psi}\!\!\bra{\Psi}_{A^{\prime}A}(\mathaccent 869{V}_{y}^{\prime\dagger}-\mathaccent 869{V}_{y}^{\dagger})\right]\right\|_{1}

\displaystyle\quad\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(vii)}}}}{{\leq}}\sqrt{\frac{1}{d_{A}}}\|U_{x}-U_{x}^{\prime}\|_{2}+\sqrt{\frac{1}{d_{A}}}\|U_{y}-U_{y}^{\prime}\|_{2}

\displaystyle\quad\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(viii)}}}}{{\leq}}\sqrt{\frac{2}{d_{A}}}\sqrt{\|U_{x}-U_{x}^{\prime}\|_{2}^{2}+\|U_{y}-U_{y}^{\prime}\|_{2}^{2}}

\displaystyle\quad=\sqrt{\frac{2}{d_{A}}}\|(U_{x},U_{y})-(U_{x}^{\prime},U_{y}^{\prime})\|_{2},

(65)

where in (v) we have used the reverse triangle inequality, in (vi) we have leveraged the triangle inequality, in (vii) we have bounded as in (62), and in (viii) we have recalled the inequality $|a|+|b|\leq\sqrt{2(a^{2}+b^{2})}$ . This completes the first part of the proof of Lemma 4. Now, we want to prove that

\displaystyle\mathbb{E}\operatorname{Tr}\big[|C|^{2}\big]

\displaystyle=\frac{2}{r}

(66)

when we sample independent random unitaries $U_{x},U_{y}\sim{\rm Haar}({\rm U}(rd_{B}))$ . Let $\{\ket{i}_{E}\}_{i\in[r]}$ be an orthonormal basis for $E$ . For $i=1,\dots,r$ , let $K_{0,i}^{A\to B}\coloneqq\bra{i}_{E}\mathaccent 869{V}_{0}^{A\to BE}$ and $K_{x,i}^{A\to B}\coloneqq\bra{i}_{E}\mathaccent 869{V}_{x}^{A\to BE}$ be the Kraus operators obtained from the isometries $\mathaccent 869{V}_{0}$ and $\mathaccent 869{V}_{x}$ , respectively. Writing the trace on the system $E$ in terms of the basis $\{\ket{i}_{E}\}_{i\in[r]}$ , we get

\displaystyle\mathbb{E}\operatorname{Tr}\big[|C|^{2}\big]

\displaystyle=\mathbb{E}\operatorname{Tr}\left[\sum_{i,j=1}^{r}K_{0,i}\Psi_{A^{\prime}A}(K_{x,i}-K_{y,i})^{\dagger}(K_{x,j}-K_{y,j})\Psi_{A^{\prime}A}K_{0,j}^{\dagger}\right]

\displaystyle\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(viii)}}}}{{=}}\sum_{i,j=1}^{r}\operatorname{Tr}\left[K_{0,i}\Psi_{A^{\prime}A}\left(\frac{2\mathds{1}}{r}\delta_{i,j}\right)\Psi_{A^{\prime}A}K_{0,j}^{\dagger}\right]

\displaystyle=\frac{2}{r}\sum_{i=1}^{r}\operatorname{Tr}\left[K_{0,i}\Psi_{A^{\prime}A}^{2}K_{0,i}^{\dagger}\right]

\displaystyle=\frac{2}{r}\operatorname{Tr}\left[\Psi_{A^{\prime}A}\sum_{i=1}^{r}K_{0,i}^{\dagger}K_{0,i}\right]

\displaystyle=\frac{2}{r},

(67)

where in (viii) we have expanded

\displaystyle\mathbb{E}\left[(K_{x,i}-K_{y,i})^{\dagger}(K_{x,j}-K_{y,j})\right]

\displaystyle=\mathbb{E}\left[K_{x,i}^{\dagger}K_{x,j}\right]+\mathbb{E}\left[K_{y,i}^{\dagger}K_{y,j}\right]-\mathbb{E}\left[K_{x,i}^{\dagger}K_{y,j}\right]-\mathbb{E}\left[K_{y,i}^{\dagger}K_{x,j}\right]

(68)

and, for $z_{1},z_{2}\in\{x,y\}$ , we have computed

\displaystyle\mathbb{E}\left[K_{z_{1},i}^{\dagger}K_{z_{2},j}\right]

\displaystyle=\mathbb{E}\left[\mathaccent 869{V}_{z_{1}}^{\dagger}\ket{i}_{E}\bra{j}_{E}\mathaccent 869{V}_{z_{2}}\right]=S^{\dagger}\mathbb{E}\left[U_{z_{1}}^{\dagger}\ket{i}_{E}\bra{j}_{E}U_{z_{2}}\right]S

\displaystyle=S^{\dagger}\left(\frac{\delta_{z_{1},z_{2}}}{rd_{B}}\operatorname{Tr}\left[\ket{i}_{E}\bra{j}_{E}\otimes\mathds{1}_{B}\right]\right)S=\frac{\delta_{z_{1},z_{2}}\delta_{i,j}}{r}\mathds{1}_{A},

(69)

leveraging the facts that $\underset{U\in{\rm U}(d)}{\mathbb{E}}[U]=0$ , $\underset{U\in{\rm U}(d)}{\mathbb{E}}[U^{\dagger}XU]=\frac{\operatorname{Tr}[X]}{d}\mathds{1}$ and $S^{\dagger}S=\mathds{1}_{A}$ .
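The twirling identity $\mathbb{E}[U^{\dagger}XU]=\frac{\operatorname{Tr}[X]}{d}\mathds{1}$ can be checked exactly without sampling, because averaging over the generalized Pauli (Weyl-Heisenberg) operators reproduces this particular Haar moment (they form a unitary 1-design). The sketch below uses that substitution, which is not the Haar average itself, and does not test $\mathbb{E}[U]=0$ ; all function names are hypothetical.

```python
import cmath

def weyl_operators(d):
    """All d*d generalized Pauli matrices X^a Z^b, with (X^a Z^b)|n> = w^(b n)|n+a mod d>."""
    w = cmath.exp(2j * cmath.pi / d)
    return [[[w ** (b * n) if m == (n + a) % d else 0j
              for n in range(d)] for m in range(d)]
            for a in range(d) for b in range(d)]

def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def twirl(X):
    """Uniform average of P^dagger X P over the Weyl operators P."""
    d = len(X)
    ops = weyl_operators(d)
    total = [[0j] * d for _ in range(d)]
    for P in ops:
        term = matmul(dagger(P), matmul(X, P))
        for i in range(d):
            for j in range(d):
                total[i][j] += term[i][j] / len(ops)
    return total

X = [[1, 2, 0], [0, 3, 1j], [4, 0, 5]]   # Tr[X] = 9, d = 3
print(twirl(X))
```

With $\operatorname{Tr}[X]=9$ and $d=3$ , the output should be $3\,\mathds{1}$ up to floating-point error.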

The only inequality we are left to prove is

\displaystyle\hskip 0.0pt\mathbb{E}\operatorname{Tr}\big[|C|^{4}\big]\leq\frac{128}{r^{3}}.

(70)

We have

\displaystyle\mathbb{E}\operatorname{Tr}\big[|C|^{4}\big]

\displaystyle=\mathbb{E}\operatorname{Tr}\Bigg[\sum_{i,j=1}^{r}K_{0,i}\Psi_{A^{\prime}A}(K_{x,i}-K_{y,i})^{\dagger}(K_{x,j}-K_{y,j})\Psi_{A^{\prime}A}K_{0,j}^{\dagger}

\displaystyle\qquad\qquad\qquad\qquad\times\sum_{k,l=1}^{r}K_{0,k}\Psi_{A^{\prime}A}(K_{x,k}-K_{y,k})^{\dagger}(K_{x,l}-K_{y,l})\Psi_{A^{\prime}A}K_{0,l}^{\dagger}\Bigg]

\displaystyle=\mathbb{E}\sum_{i,j=1}^{r}\sum_{k,l=1}^{r}\bra{\Psi_{A^{\prime}A}}K_{0,l}^{\dagger}K_{0,i}\ket{\Psi_{A^{\prime}A}}\bra{\Psi_{A^{\prime}A}}K_{0,j}^{\dagger}K_{0,k}\ket{\Psi_{A^{\prime}A}}

\displaystyle\phantom{\mathbb{E}\sum_{i,j=1}^{r}\sum_{k,l=1}^{r}}\quad\times\bra{\Psi_{A^{\prime}A}}(K_{x,i}-K_{y,i})^{\dagger}(K_{x,j}-K_{y,j})\ket{\Psi_{A^{\prime}A}}

\displaystyle\phantom{\mathbb{E}\sum_{i,j=1}^{r}\sum_{k,l=1}^{r}}\quad\times\bra{\Psi_{A^{\prime}A}}(K_{x,k}-K_{y,k})^{\dagger}(K_{x,l}-K_{y,l})\ket{\Psi_{A^{\prime}A}}

\displaystyle\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(ix)}}}}{{\leq}}\mathbb{E}\sum_{i,j=1}^{r}\frac{4}{r^{2}}\left|\bra{\Psi_{A^{\prime}A}}(K_{x,i}-K_{y,i})^{\dagger}(K_{x,j}-K_{y,j})\ket{\Psi_{A^{\prime}A}}\right|^{2},

(71)

where in (ix) we have noticed that, by (24),

\displaystyle\big|\bra{\Psi_{A^{\prime}A}}K_{0,l}^{\dagger}K_{0,i}\ket{\Psi_{A^{\prime}A}}\bra{\Psi_{A^{\prime}A}}K_{0,j}^{\dagger}K_{0,k}\ket{\Psi_{A^{\prime}A}}\big|

\displaystyle\qquad\qquad=\left|\frac{1}{d_{A}}\operatorname{Tr}[K_{0,l}^{\dagger}K_{0,i}]\frac{1}{d_{A}}\operatorname{Tr}[K_{0,j}^{\dagger}K_{0,k}]\right|\leq\frac{4}{r^{2}}\delta_{i,l}\delta_{j,k}.

(72)

Hence

\displaystyle\mathbb{E}\operatorname{Tr}\big[|C|^{4}\big]

\displaystyle\leq\mathbb{E}\sum_{i,j=1}^{r}\frac{4}{r^{2}}\left|\bra{\Psi_{A^{\prime}A}}(K_{x,i}-K_{y,i})^{\dagger}(K_{x,j}-K_{y,j})\ket{\Psi_{A^{\prime}A}}\right|^{2}

\displaystyle\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(x)}}}}{{\leq}}\mathbb{E}\sum_{i,j=1}^{r}\frac{16}{r^{2}}\Big(\left|\bra{\Psi_{A^{\prime}A}}K_{x,i}^{\dagger}K_{x,j}\ket{\Psi_{A^{\prime}A}}\right|^{2}+\left|\bra{\Psi_{A^{\prime}A}}K_{y,i}^{\dagger}K_{y,j}\ket{\Psi_{A^{\prime}A}}\right|^{2}

\displaystyle\phantom{\leq\mathbb{E}\sum_{i,j=1}^{r}\frac{16}{r^{2}}}\quad+2\left|\bra{\Psi_{A^{\prime}A}}K_{x,i}^{\dagger}K_{y,j}\ket{\Psi_{A^{\prime}A}}\right|^{2}\Big)

\displaystyle\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(xi)}}}}{{\leq}}\mathbb{E}\sum_{i,j=1}^{r}\frac{32}{r^{2}}\left(\left|\bra{\Psi_{A^{\prime}A}}K_{x,i}^{\dagger}K_{x,j}\ket{\Psi_{A^{\prime}A}}\right|^{2}+\left|\bra{\Psi_{A^{\prime}A}}K_{y,i}^{\dagger}K_{y,j}\ket{\Psi_{A^{\prime}A}}\right|^{2}\right)

\displaystyle=\frac{64}{r^{2}}\sum_{i,j=1}^{r}\mathbb{E}\left|\bra{\Psi_{A^{\prime}A}}K_{x,i}^{\dagger}K_{x,j}\ket{\Psi_{A^{\prime}A}}\right|^{2},

(73)

where in (x) and in (xi) we have leveraged the inequality $2ab\leq a^{2}+b^{2}$ multiple times.

Recalling that we defined $\mathaccent 869{V}_{x}=U_{x}S$ and $K_{x,i}=\bra{i}_{E}\mathaccent 869{V}_{x}$ , we compute

\displaystyle\mathbb{E}\left|\bra{\Psi_{A^{\prime}A}}K_{x,i}^{\dagger}K_{x,j}\ket{\Psi_{A^{\prime}A}}\right|^{2}

\displaystyle\quad=\frac{1}{d_{A}^{2}}\mathbb{E}\left[\operatorname{Tr}[K_{x,i}^{\dagger}K_{x,j}]\operatorname{Tr}[K_{x,j}^{\dagger}K_{x,i}]\right]

\displaystyle\quad=\frac{1}{d_{A}^{2}}\sum_{k,l=1}^{d_{A}}\mathbb{E}\operatorname{Tr}\left[K_{x,i}^{\dagger}K_{x,j}\ket{l}\bra{k}K_{x,j}^{\dagger}K_{x,i}\ket{k}\bra{l}\right]

\displaystyle\quad\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(xii)}}}}{{=}}\frac{1}{d_{A}^{2}}\sum_{k,l=1}^{d_{A}}\mathbb{E}\operatorname{Tr}\left[U_{x}^{\dagger}\big(\ket{i}_{E}\bra{j}_{E}\otimes\mathds{1}_{B}\big)U_{x}S\ket{l}\bra{k}S^{\dagger}U_{x}^{\dagger}\big(\ket{j}_{E}\bra{i}_{E}\otimes\mathds{1}_{B}\big)U_{x}S\ket{k}\bra{l}S^{\dagger}\right]

\displaystyle\quad\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(xiii)}}}}{{=}}\frac{1}{d_{A}^{2}}\sum_{k,l=1}^{d_{A}}\sum_{\alpha,\beta\in S_{2}}\operatorname{Wg}(\beta\alpha,d_{B}r)\mathrm{Tr}_{\beta}\left[S\ket{k}\bra{l}S^{\dagger},S\ket{l}\bra{k}S^{\dagger}\right]

\displaystyle\quad\phantom{\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(xii)}}}}{{=}}\sum_{\alpha,\beta\in S_{2}}\operatorname{Wg}(\beta\alpha,d_{B}r)}\quad\times\mathrm{Tr}_{\alpha(12)}\left[\ket{i}_{E}\bra{j}_{E}\otimes\mathds{1}_{B},\ket{j}_{E}\bra{i}_{E}\otimes\mathds{1}_{B}\right]

\displaystyle\quad=\frac{1}{d_{A}^{2}}\sum_{k,l=1}^{d_{A}}\Big(\operatorname{Wg}((1)(2),d_{B}r)\big(\delta_{k,l}d_{B}+\delta_{i,j}d_{B}^{2}\big)+\operatorname{Wg}((12),d_{B}r)\big(\delta_{k,l}\delta_{i,j}d_{B}^{2}+d_{B}\big)\Big)

\displaystyle\quad\stackrel{{\scriptstyle\mathclap{\scriptsize\mbox{(xiv)}}}}{{=}}\frac{1}{d_{A}^{2}}\cdot\frac{1}{(d_{B}r)^{2}-1}\sum_{k,l=1}^{d_{A}}\Big(\delta_{k,l}d_{B}+\delta_{i,j}d_{B}^{2}-\frac{1}{d_{B}r}\big(\delta_{k,l}\delta_{i,j}d_{B}^{2}+d_{B}\big)\Big)

\displaystyle\quad=\frac{1}{d_{A}}\cdot\frac{1}{(rd_{B})^{2}-1}\Big(d_{B}+\delta_{i,j}d_{A}d_{B}^{2}-\frac{1}{d_{B}r}\big(\delta_{i,j}d_{B}^{2}+d_{A}d_{B}\big)\Big),

(74)

where in (xii) we have expanded $K_{x,i}=\bra{i}_{E}U_{x}S$ and leveraged the cyclicity of the trace; in (xiii) we have used Lemma 6 with $A_{1}=\ket{i}_{E}\bra{j}_{E}\otimes\mathds{1}_{B}$ , $A_{2}=\ket{j}_{E}\bra{i}_{E}\otimes\mathds{1}_{B}$ , $B_{1}=S\ket{k}\bra{l}S^{\dagger}$ and $B_{2}=S\ket{l}\bra{k}S^{\dagger}$ ; in (xiv) we have used the values given in Lemma 7. Combining (73) with (74), we get

\displaystyle\mathbb{E}\operatorname{Tr}\big[|C|^{4}\big]

\displaystyle\leq\frac{64}{r^{2}}\sum_{i,j=1}^{r}\mathbb{E}\left|\bra{\Psi_{A^{\prime}A}}K_{x,i}^{\dagger}K_{x,j}\ket{\Psi_{A^{\prime}A}}\right|^{2}

\displaystyle=\frac{64}{r^{2}}\cdot\frac{1}{d_{A}}\cdot\frac{1}{(rd_{B})^{2}-1}\sum_{i,j=1}^{r}\Big(d_{B}+\delta_{i,j}d_{A}d_{B}^{2}-\frac{1}{d_{B}r}\big(\delta_{i,j}d_{B}^{2}+d_{A}d_{B}\big)\Big)

\displaystyle=\frac{64}{r^{2}}\cdot\frac{1}{d_{A}d_{B}}\cdot\frac{1}{(rd_{B})^{2}-1}\left(d_{B}^{2}r^{2}+rd_{A}d_{B}^{3}-d_{B}^{2}-d_{A}d_{B}r\right)

\displaystyle=\frac{64}{r^{2}}\cdot\left(\frac{1}{d_{A}d_{B}}+\frac{1}{r}+\frac{1-d_{B}^{2}+\frac{d_{A}d_{B}}{r}-d_{A}d_{B}r}{d_{A}d_{B}(r^{2}d_{B}^{2}-1)}\right)\leq\frac{128}{r^{3}},

(75)

where in the last line we have recalled that $r\leq d_{A}d_{B}$ . This concludes the proof.
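The algebra leading from the Weingarten expansion to the bound $128/r^{3}$ can be double-checked with exact fractions. The following sketch re-evaluates the sum over $k,l$ in (74) term by term and then sums over $i,j$ ; it assumes $d_{B}r\geq 2$ (so that the Weingarten values are finite), $d_{A}\leq rd_{B}$ and $r\leq d_{A}d_{B}$ . The function names are hypothetical.

```python
from fractions import Fraction

def second_moment_term(d_A, d_B, r, same_index):
    """E|<Psi|K_{x,i}^dag K_{x,j}|Psi>|^2, summed over k, l as in (74).

    same_index=True corresponds to i == j, False to i != j.
    """
    delta_ij = 1 if same_index else 0
    d = d_B * r
    wg_id = Fraction(1, d * d - 1)             # Wg((1)(2), d)
    wg_swap = Fraction(-1, d * (d * d - 1))    # Wg((12), d)
    total = Fraction(0)
    for k in range(d_A):
        for l in range(d_A):
            delta_kl = 1 if k == l else 0
            total += wg_id * (delta_kl * d_B + delta_ij * d_B ** 2)
            total += wg_swap * (delta_kl * delta_ij * d_B ** 2 + d_B)
    return total / d_A ** 2

def fourth_moment_bound(d_A, d_B, r):
    """(64 / r^2) times the sum over i, j in [r] of the terms above."""
    diag = second_moment_term(d_A, d_B, r, True)
    off = second_moment_term(d_A, d_B, r, False)
    return Fraction(64, r * r) * (r * diag + r * (r - 1) * off)

print(fourth_moment_bound(2, 2, 3), "<=", Fraction(128, 27))
```

In every admissible case tried, the value stays below $128/r^{3}$ , in line with the observation that the last fraction in the final expression is nonpositive and that $\frac{1}{d_{A}d_{B}}\leq\frac{1}{r}$ .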

Appendix D Weingarten Calculus

As we use a random channel constructed from sampling a ${\rm Haar}$ -random unitary matrix in our lower bound proofs, we need some facts from Weingarten calculus in order to compute the corresponding expectation values with respect to the Haar measure. If $\pi\in S_{n}$ is a permutation of $[n]$ , let $\operatorname{Wg}(\pi,d)$ denote the Weingarten function of dimension $d$ . The following lemma is useful for our results.

Lemma 6 ([gu2013moments]).

Let $U$ be a ${\rm Haar}$ -distributed unitary $(d\times d)$ -matrix and let $\{A_{i},B_{i}\}_{i=1}^{n}$ be a sequence of complex $(d\times d)$ -matrices. We have the following formula for the expectation value:

\displaystyle\mathbb{E}\left[\operatorname{Tr}(UB_{1}U^{\dagger}A_{1}U\dots UB_{n}U^{\dagger}A_{n})\right]

\displaystyle\qquad=\sum_{\alpha,\beta\in S_{n}}\operatorname{Wg}(\beta\alpha^{-1},d)\operatorname{Tr}_{\beta^{-1}}(B_{1},\dots,B_{n})\operatorname{Tr}_{\alpha\gamma_{n}}(A_{1},\dots,A_{n}),

(76)

where $\gamma_{n}=(12\dots n)$ and, writing $\sigma$ in terms of cycles $\{C_{j}\}$ as $\sigma=\prod_{j}C_{j}$ ,

\displaystyle\operatorname{Tr}_{\sigma}(M_{1},\dots,M_{n})\coloneqq\prod_{j}\operatorname{Tr}\prod_{i\in C_{j}}M_{i}.

(77)

We will also need some values of the Weingarten function.

Lemma 7 ([collins2006integration]).

The function $\operatorname{Wg}(\pi,d)$ has the following values:

•

$\operatorname{Wg}((1),d)=\frac{1}{d}$ ,
•

$\operatorname{Wg}((12),d)=\frac{-1}{d(d^{2}-1)}$ ,
•

$\operatorname{Wg}((1)(2),d)=\frac{1}{d^{2}-1}$ .
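The two $S_{2}$ values above can be cross-checked against standard consistency relations. The sketch below (with hypothetical function names, and assuming $d\geq 2$ ) verifies that the Weingarten function inverts the Gram matrix with entries $d^{\#\mathrm{cycles}}$ , and that it reproduces the well-known moment $\mathbb{E}|U_{11}|^{4}=\frac{2}{d(d+1)}$ of a Haar-random unitary.

```python
from fractions import Fraction

def wg_s2(perm, d):
    """Weingarten function on S_2: perm is 'id' for (1)(2) or 'swap' for (12)."""
    if d < 2:
        raise ValueError("the S_2 Weingarten function requires d >= 2")
    if perm == "id":
        return Fraction(1, d * d - 1)
    if perm == "swap":
        return Fraction(-1, d * (d * d - 1))
    raise ValueError("perm must be 'id' or 'swap'")

def fourth_moment_entry(d):
    """E|U_11|^4 as a sum of Wg(beta alpha^{-1}, d) over alpha, beta in S_2."""
    return 2 * wg_s2("id", d) + 2 * wg_s2("swap", d)

for d in range(2, 6):
    # Gram-matrix inversion: the identity has 2 cycles, the swap has 1.
    row_identity = wg_s2("id", d) * d ** 2 + wg_s2("swap", d) * d
    row_swap = wg_s2("id", d) * d + wg_s2("swap", d) * d ** 2
    print(d, row_identity, row_swap, fourth_moment_entry(d))
```

Each printed row should read $1$ , $0$ and $\frac{2}{d(d+1)}$ after the dimension.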

