Batch Signatures, Revisited

Carlos Aguilar-Melchor (SandboxAQ)
Martin R. Albrecht (SandboxAQ)
Thomas Bailleux (SandboxAQ)
Nina Bindel (SandboxAQ)
James Howe (SandboxAQ)
Andreas Hülsing (Eindhoven University of Technology)
David Joseph (SandboxAQ)
Marc Manzano (SandboxAQ)

ABSTRACT
We revisit batch signatures (previously considered in a draft RFC, and used in multiple recent works), where a single, potentially expensive, “inner” digital signature authenticates a Merkle tree constructed from many messages. We formalise a construction and prove its unforgeability and privacy properties.
We also show that batch signing allows us to scale slow signing algorithms, such as those recently selected for standardisation as part of NIST’s post-quantum project, to high throughput, with a mild increase in latency. We demonstrate the practical efficiency of batch signing in the context of TLS. For the example of Falcon-512 in TLS, we can increase the number of connections per second by a factor of 3.2x, at the cost of an increase in the signature size by ∼14% and in the median latency by ∼25%, where both are run on the same 30 core server.
We also discuss applications where batch signatures allow us to increase throughput and to save bandwidth. For example, again for Falcon-512, once one batch signature is available, the additional bandwidth for each of the remaining N − 1 is only 82 bytes.

1 INTRODUCTION
Unkeyed and symmetric cryptography is known to be significantly cheaper than asymmetric cryptography from a computational perspective. Indeed, hash functions and stream or block ciphers typically require between a few cycles [1] and a few hundred cycles [12], whereas key establishment and digital signature primitives require between tens of thousands and hundreds of millions of cycles [6]. In situations where a substantial volume of signatures must be handled (e.g. a Hardware Security Module (HSM) renewing a large set of short-lived certificates or a load balancer terminating a large number of TLS connections per second) this may pose serious limitations on scaling these and related scenarios.
These challenges are amplified by upcoming public-key cryptography standards: In July 2022, the US National Institute of Standards and Technology (NIST) announced four algorithms for post-quantum cryptography (PQC) standardisation. In particular, three digital signature algorithms, namely Dilithium [16], Falcon [21], and SPHINCS+ [14], were selected, and migration from current standards to these new algorithms is already underway [27]. One of the key issues when considering migrating to PQC is that the computational costs of the new digital signature algorithms are significantly higher than those of ECDSA, the fastest currently-deployed primitive for signing. This severely impacts the ability of systems to scale and inhibits their migration to PQC, especially in higher-throughput settings.
For instance, at a 128-bit security level, using the standard SUPERCOP platform benchmarks on a Core i7 Tigerlake processor [6], ECDSA over an Edwards curve requires 85K cycles for signing. The equivalent Dilithium, Falcon and SPHINCS+ signatures need 272K, 570K and 25M cycles, respectively (considering the fastest alternative among existing variants for each).¹ The performance gap between ECDSA and the three PQC alternatives is vast. Furthermore, there are good reasons to choose Falcon or SPHINCS+ over Dilithium for certain scenarios, which increases the gap further: Falcon provides smaller signatures and verification key sizes, which makes it a strong contender in networking applications, and SPHINCS+ relies on conservative security assumptions, which are appropriate for long-term security.
¹ See also the discussion of Falcon’s performance in Section 2.3.
In 2020 an RFC draft [2] proposed Batch Signing for TLS to solve existing scalability challenges of classical digital signature standards in a high-throughput TLS setting. In this approach, one expensive “inner” signing operation signs the root of a Merkle tree constructed from a batch of messages. Then, the final signature for each message contains the sibling nodes of a message to recover the Merkle tree’s root and the original “inner” signature. This represents a logarithmic increase in the signature size but asymptotically reduces the amortised cost to a few hash computations. The draft was not finalised, and thus has now become a deprecated TLS working group document [3]. While the proposal was motivated by classical signature standards and TLS, the approach can be generalised and used with any signature scheme, e.g. with one of the PQC schemes, and in a myriad of additional settings.
Other recent works have considered using Merkle trees to reduce (amortised) signature size in certificates by signing them in batches and using the fact that they all share the same “inner” signature. A recent work [9] focuses on stateful signatures targeting such signature size reduction. Stateful signature schemes are capable of producing small signatures, which are ideal for use cases such as certificate authorities, but at the expense of a more involved design, with the critical need for state management. A stateless approach, in contrast, generalises more easily and allows
for a more flexible design applicable to a plethora of use cases for increasing the throughput of signature schemes. A recent RFC draft [4] proposes a stateless system to reduce certificate sizes by defining CAs that only sign certificates in batches and rely on a Certificate Transparency channel to deliver the large “inner” signatures.
These works highlight the increasing community interest in batch signatures.

1.1 Contributions
In this work, we study batch signatures and provide (i) a refinement of the construction from [2] to reduce the size by removing the collision-resistance property from the requirements of the hash function used to build the Merkle tree, (ii) a formal treatment of the unforgeability and privacy of batch signatures, (iii) a description of several settings in which batch signatures can have a positive impact in terms of either throughput increase or bandwidth reduction and (iv) an empirical study into batch signatures for both classical and PQC schemes in the context of TLS and certificate signing. Overall, our performance study indicates that, while the approach introduces a logarithmic overhead in signature sizes (cf. Table 1) and signing latency, it significantly reduces the CPU burden (cf. Table 2), allowing us to scale to a larger number of signatures per second compared with the plain approach using the same number of cores. Moreover, our benchmarks also indicate that in some applications batch signatures result in significant bandwidth savings.
In more detail, after some preliminaries in Section 2, we formally define batch signing and its security properties in Section 3. In particular, in addition to the usual unforgeability property, we also define batch privacy notions that essentially control the leakage of information due to signing in batches. We define two variants of batch privacy, with our construction achieving the weaker one. We then specify our batch signing scheme in Section 4, which is essentially a refined version of that in [2]. The main difference is that we do not need to rely on collision resistance but instead on target collision resistance [5], allowing us to pick smaller parameters and thus reduce the signature size. We prove the security properties of our construction in Section 5. We describe some real-world applications where batch signatures can provide significant improvements in Section 6 and finally describe our implementation for the TLS scenario in Section 7.

2 PRELIMINARIES
We write x ← y for assigning y to x and x ←$ D for sampling x from some distribution D. If D is a finite set, we assume the uniform distribution over this set. We write PPT for probabilistic polynomial time and BQP for bounded-error quantum polynomial time.

Table 1: Batch signature sizes for a targeted security level λ.

Scheme               λ     |vk|    |σ|     N    |sig|   |sig_c|
ECDSA P256           128   64      64      32   162     98
Dilithium2           128   1312    2420    32   2518    98
Dilithium5           256   2592    4595    32   4693    98
Falcon-512           128   897     666     16   748     82
Falcon-1024          256   1793    1280    32   1378    98
Falcon-512-fpemu     128   897     666     16   758     82
Falcon-1024-fpemu    256   1793    1280    16   1362    82

All sizes are in bytes. Batch signature size is given in the column |sig|, verification key size in |vk|, “inner” signature size in |σ|; all for a batch of size N. The compressed batch signature size, assuming the inner signature for multiple batch signatures is cached, is given in column |sig_c|. We assume two bytes are used to encode N in sig.
Signature sizes (or certificates) grow by fewer than one hundred bytes. This represents, for the algorithms considered (except ECDSA), at most ten percent when considering signatures and at most five percent when considering certificates (signatures + verification keys).

Table 2: Handshakes per second and latency for different percentiles in TLS using different signing algorithms.

Scheme                Instantiation   Handshakes    Latency (ms)          Signing
                                      per second    med    p90    p99     cores
ECDSA P256            Plain           39,000        1.2    1.3    1.5     1.2
                      MT N=32         49,000        1.3    1.7    2.7     1.0
Dilithium2            Plain           29,000        1.6    1.8    2.1     2.9
                      MT N=32         50,000        1.8    2.2    2.7     1.0
Dilithium5            Plain           25,000        1.9    2.2    2.4     4.4
                      MT N=32         43,000        2.2    2.6    3.2     1.0
Falcon-512            Plain           28,000        1.1    1.3    1.5     7.8
                      MT N=16         43,000        1.5    1.8    2.5     2.0
Falcon-1024           Plain           24,000        2.0    2.1    2.3     13.1
                      MT N=32         43,000        2.2    2.5    3.3     2.0
Falcon-512 (fpemu)    Plain           5,000         5.1    5.2    6.0     20.0
                      MT N=16         16,000        6.4    7.6    8.4     8.0
Falcon-1024 (fpemu)   Plain           2,600         9.9    10.0   11.0    22.5
                      MT N=16         8,200         12.0   15.0   17.0    8.0

All experiments are run on a 30 core machine with HyperThreading disabled. The results are presented for a ‘plain’ multi-threaded implementation (pool of as many threads as CPU cores, with select/poll handling), and for the Merkle Tree (MT) approach with a limit N to the maximum size of a tree. The amount of cores used for signatures is estimated for the plain approach, out of the computational cost of one signature, and fixed (by reserving cores explicitly) for the Merkle Tree approach.
The number of handshakes per second is roughly doubled for fast algorithms, and multiplied by a factor between three and four for slow algorithms. Latency (99th percentile) is increased by roughly fifty percent (one millisecond for fast algorithms and up to six milliseconds for the slower ones).
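The headline Falcon-512 figures quoted in the abstract (3.2x throughput, ∼14% signature size, ∼25% median latency) can be recomputed from Tables 1 and 2. The following is a small arithmetic sketch, assuming that the abstract's example refers to the Falcon-512 (fpemu) rows; the abstract does not say so explicitly, and all variable names are ours.

```python
# Values copied from the Falcon-512 (fpemu) rows of Tables 1 and 2.
# Assumption: these are the rows behind the abstract's Falcon-512 example.
INNER_SIG_BYTES = 666      # |σ| in Table 1
BATCH_SIG_BYTES = 758      # |sig| in Table 1 (N = 16)
PLAIN_HANDSHAKES = 5_000   # handshakes per second, "Plain" in Table 2
BATCH_HANDSHAKES = 16_000  # handshakes per second, "MT N=16" in Table 2
PLAIN_MEDIAN_MS = 5.1      # median latency, "Plain"
BATCH_MEDIAN_MS = 6.4      # median latency, "MT N=16"


def relative_increase(before: float, after: float) -> float:
    """Return (after - before) / before, e.g. 0.25 for a 25% increase."""
    return (after - before) / before


throughput_factor = BATCH_HANDSHAKES / PLAIN_HANDSHAKES
size_increase = relative_increase(INNER_SIG_BYTES, BATCH_SIG_BYTES)
latency_increase = relative_increase(PLAIN_MEDIAN_MS, BATCH_MEDIAN_MS)

print(f"throughput factor:       {throughput_factor:.1f}x")
print(f"signature size increase: {size_increase:.1%}")
print(f"median latency increase: {latency_increase:.1%}")
```

Under this reading, the three ratios land close to the rounded values given in the abstract.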

2.1 Hash Functions
In this work we consider tweakable hash functions. These are keyed hash functions that take an additional input which can be thought of as a domain separator (while the key or public parameter serves as a separator between users). When used correctly, tweakable hash functions allow one to tightly achieve target collision resistance even in multi-target settings where an adversary wins when they manage to attack one out of many targets.

Definition 2.1 (Tweakable Hash Function [5]). Let n, m ∈ ℕ, let P be the public parameters space and T the tweak space. A tweakable hash function is a tuple of algorithms H = (KeyGen, Eval) such that:
KeyGen: The setup function takes the security parameter 1^λ and outputs a (possibly empty) public parameter p. We write p ← KeyGen(1^λ).
Eval: The evaluation function takes public parameters p, a tweak t, an input x ∈ {0, 1}^m and returns a hash value h. We write h ← Eval(p, t, x) or simply h ← H(p, t, x). This is a deterministic function.

In what follows, we avoid relying on plain collision resistance and instead rely on target collision resistance of tweakable hash functions.

Definition 2.2 (Target Collision Resistant Hash Function [5]). An efficient tweakable hash function H = (KeyGen, Eval) is called single-function multiple-targets target-collision resistant for distinct tweaks (SM-TCR) if the advantage Adv^{sm-tcr}_{A,H}(k, λ) of any (PPT/BQP) algorithm A = (A_0, A_1) that defines up to k targets in the SM-TCR experiment defined in Figure 1 is negligible, with

\[ \mathsf{Adv}^{\text{sm-tcr}}_{A,H}(k, \lambda) := \Pr\big[\text{SM-TCR}^{A}_{H}(k, \lambda) \Rightarrow 1\big]. \]

SM-TCR^A_H(k, λ)
  P ← KeyGen(1^λ)
  S ← A_0^{H(P,·,·)}(1^λ)
  / Q = {(t_0, μ_0), . . . , (t_{k−1}, μ_{k−1})} queries submitted to H(P, ·, ·)
  for i, ℓ ∈ {0, . . . , k − 1}, i ≠ ℓ do
    if t_i = t_ℓ : return ⊥
  (j, μ) ← A_1(1^λ, S, P, Q)
  return 0 ≤ j < k ∧ μ ≠ μ_j ∧ H(P, t_j, μ_j) = H(P, t_j, μ)

Figure 1: Single-function, Multi-Target Collision Resistance for distinct tweaks (SM-TCR).

We will also rely on one-time pseudorandomness to argue privacy.

Definition 2.3 (OT-PRF). Let n, m ∈ ℕ, F : {0, 1}^λ × {0, 1}^m → {0, 1}^n be a keyed function. We define

\[ \mathsf{Adv}^{\text{ot-prf}}_{A,F}(\lambda) := \Pr\big[\text{OT-PRF}^{A}_{F}(\lambda) \Rightarrow 1\big] \]

for OT-PRF^A_F(λ) as in Figure 2 and say F is an OT-PRF if no PPT/BQP adversary A has non-negligible advantage Adv^{ot-prf}_{A,F}(λ).

OT-PRF^A_F(λ)
  b ←$ {0, 1}
  b′ ← A^{RoR(·)}()
  return b = b′

RoR(x)
  k ←$ {0, 1}^λ
  y_0 ← F(k, x)
  y_1 ←$ {0, 1}^n
  return y_b

Figure 2: One-time Pseudorandom Function (OT-PRF)

2.2 Digital Signatures
Definition 2.4 (Signature Scheme). A signature scheme S consists of three PPT algorithms (KeyGen, Sign, Verify) such that:
KeyGen: The key generation algorithm is a randomised algorithm that takes as input a security parameter 1^λ and outputs a pair (vk, sk), the verification key and signing key, respectively. We write (vk, sk) ← KeyGen(1^λ).
Sign: The signing algorithm takes as input a signing key sk and a message μ and outputs a signature σ. We write this as σ ← Sign(sk, μ). The signing algorithm may be randomised or deterministic. We may write σ ← Sign(sk, μ; r) to make the randomness used explicit.
Verify: The verification algorithm takes as input a verification key vk, a signature σ and a message μ and outputs a bit b, with b = 1 meaning the signature is valid and b = 0 meaning the signature is invalid. Verify is a deterministic algorithm. We write b ← Verify(vk, σ, μ).
We require that except with negligible probability over (vk, sk) ← KeyGen(1^λ), it holds that Verify(vk, Sign(sk, μ), μ) = 1 for all μ.

We rely on the standard notion of existential unforgeability under chosen message attacks:

Definition 2.5 (EUF-CMA). We define

\[ \mathsf{Adv}^{\text{euf-cma}}_{A,S}(\lambda) := \Pr\big[\text{EUF-CMA}^{A}_{S}(\lambda) \Rightarrow 1\big] \]

for EUF-CMA^A_S(λ) as in Figure 3 and say a signature scheme S is EUF-CMA secure if no PPT/BQP adversary A has non-negligible advantage Adv^{euf-cma}_{A,S}(λ).

EUF-CMA^A_S(λ) / BEUF-CMA^A_S(λ)
  Q ← ∅
  vk, sk ← KeyGen(1^λ)
  (μ★, σ★) ← A^{Sign}(vk)     / EUF-CMA
  (μ★, σ★) ← A^{BSign}(vk)    / BEUF-CMA
  return (μ★, ·) ∉ Q ∧ Verify(vk, σ★, μ★) = 1

Sign(μ)
  σ ← Sign(sk, μ)
  Q ← Q ∪ {(μ, σ)}
  return σ

BSign(M)
  S ← BSign(sk, M)
  for 0 ≤ j < |M| do
    q_j ← (M[j], S[j])
    Q ← Q ∪ {q_j}
  return S

Figure 3: Existential Unforgeability under Chosen Message Attacks for Signatures (EUF-CMA) and Batch Signatures (BEUF-CMA).

Remark. In our construction, the signature scheme takes as inputs and outputs batches of messages and signatures, respectively. We formally define batch signature schemes and their security (BEUF-CMA) in Section 3.

2.3 Falcon Signature Scheme
Since our flagship demonstrator is the composition of our scheme with Falcon [20] (based on the GPV paradigm [11]), we give a stylised description in Figure 4, which suffices for our purposes here. Let (TrapGen, SampD, SampPre) be PPT algorithms with the following syntax and properties [10, 11, 17]:
• (A, td) ← TrapGen(1^η, 1^ℓ, q, R, β) takes dimensions η, ℓ ∈ ℕ, a modulus q ∈ ℕ, a ring R, and a norm bound β ∈ ℝ. It generates a matrix A ∈ R_q^{η×ℓ} and a trapdoor td. For any n ∈ poly(λ) and ℓ ≥ lhl(R, η, q, β), the distribution of A is within negl(λ) statistical distance to the uniform distribution on R_q^{η×ℓ}.
• u ← SampD(1^η, 1^ℓ, R, β′) with ℓ ≥ lhl(R, η, q, β) outputs an element u ∈ R^ℓ with norm bound β′ ≥ β. We have that v := A · u mod q is within negl(λ) statistical distance to the uniform distribution on R_q^η.
• u ← SampPre(td, v, β′) with ℓ ≥ lhl(R, η, q, β) takes a trapdoor td, a vector v ∈ R_q^η, and a norm bound β′ ≥ β. It samples u ∈ R^ℓ satisfying A · u ≡ v mod q and ∥u∥ ≤ β′. Furthermore, u is within negl(λ) statistical distance to u ← SampD(1^η, 1^ℓ, R, β′) conditioned on v ≡ A · u mod q.

KeyGen(1^λ)
  A, td ← TrapGen(1^1, 1^2, q, R, β)
  return vk_i = A, sk_i = td

Sign(μ_j, sk_i; r)
  y_j ← SampPre(td, H(μ_j, r), β′)
  return y_j

Verify(σ_j, μ_j, vk_i)
  return ∥y_j∥ ≤ β′ ∧ H(μ_j) ≡ A · y_j

Figure 4: Falcon signatures [11, 21]

Assumption 2.6. The Falcon signature scheme is EUF-CMA secure. In particular, no (quantum) adversary exists to forge messages with cost ≪ 2^128 for Falcon-512 and no such adversary exists with cost ≪ 2^256 for Falcon-1024.

Performance. Consider Falcon-512, which minimises the signature size among the NIST selected post-quantum signature algorithms. An optimised implementation beats RSA-2048 signing by roughly a factor of five [6]. Critically, however, this optimised implementation relies on constant-time double-precision floating point arithmetic. This is not completely out of reach, as demonstrated by constant-time Falcon implementations [19] on several different CPUs, working around several CPU instructions’ behaviours. However, the long-term reliability of this approach is less certain than for bit or integer operations. That is, future instructions or optimisations might prevent the desired constant-time behaviour. Furthermore, many CPUs to date simply lack fast constant-time double-precision arithmetic [13].
On systems where no sufficiently constant-time floating point unit is available or where floating-point arithmetic is avoided for the reasons mentioned above, floating-point arithmetic can be emulated (in constant time) at a hefty (approximately 20x) overhead [19].

3 BATCH SIGNATURES
We formally define batch signatures.

Definition 3.1 (Batch Signature Scheme). A batch signature scheme S consists of three PPT algorithms (KeyGen, BSign, Verify) such that:
KeyGen: The key generation algorithm is a randomised algorithm that takes as input a security parameter 1^λ and outputs a pair (vk, sk), the verification key and signing key, respectively. We write (vk, sk) ← KeyGen(1^λ).
BSign: The batch signing algorithm takes as input a signing key sk and a list of messages M = {μ_i} and outputs a list of signatures S = {sig_i}. We write this as S ← BSign(sk, M). The signing algorithm may be randomised or deterministic. We may write S ← BSign(sk, M; r) to make the randomness used explicit.
Verify: The verification algorithm takes as input a verification key vk, a signature sig and a message μ and outputs a bit b, with b = 1 meaning the signature is valid and b = 0 meaning the signature is invalid. Verify is a deterministic algorithm. We write b ← Verify(vk, sig, μ).
We require that except with negligible probability over (vk, sk) ← KeyGen(1^λ), for all M := {μ_i} and S ← BSign(sk, {μ_i}) it holds that ∀ sig_i ∈ S : Verify(vk, sig_i, μ_i) = 1.

Definition 3.2 (EUF-CMA for Batch Signature Schemes). We define

\[ \mathsf{Adv}^{\text{euf-cma}}_{A,S}(\lambda) := \Pr\big[\text{BEUF-CMA}^{A}_{S}(\lambda) \Rightarrow 1\big] \]

for BEUF-CMA^A_S(λ) as in Figure 3 and say a batch-signature scheme S is EUF-CMA secure if no PPT/BQP adversary A has non-negligible advantage Adv^{euf-cma}_{A,S}(λ).

The following proposition is immediate, by simply calling Sign for all μ_i ∈ M. We call this the naïve construction.

Proposition 3.3. Every EUF-CMA secure signature scheme can be turned into an EUF-CMA secure batch signature scheme.

We also define two privacy notions for batch signatures. These assert that no efficient adversary can distinguish whether signatures were signed in the same batch or not. A weak variant of privacy only guarantees that signatures from the same batch do not leak anything about a message for which no signature is made available.

Definition 3.4 ((Weak) Batch Privacy). We define

\[ \mathsf{Adv}^{\text{batch-priv}}_{A,S}(\lambda) := \Pr\big[\text{BATCH-PRIV}^{A}_{S}(\lambda) \Rightarrow 1\big] - 1/2 \]

and

\[ \mathsf{Adv}^{\text{wbatch-priv}}_{A,S}(\lambda) := \Pr\big[\text{wBATCH-PRIV}^{A}_{S}(\lambda) \Rightarrow 1\big] - 1/2 \]

for the games defined in Figure 5 and say a signature scheme S has (weak) batch privacy if no PPT/BQP adversary A has non-negligible advantage Adv^{(w)batch-priv}_{A,S}(λ).

Our construction in Section 4 achieves wBATCH-PRIV but not BATCH-PRIV and thus establishes that there are schemes achieving the former but not the latter. Next, we establish that an adversary breaking wBATCH-PRIV can also break BATCH-PRIV.

Lemma 3.5. Let A be an adversary against wBATCH-PRIV with advantage Adv^{wbatch-priv}_{A,S}(λ). Then there is an adversary B against BATCH-PRIV with advantage

\[ \mathsf{Adv}^{\text{batch-priv}}_{B,S}(\lambda) \geq 1/2 \cdot \mathsf{Adv}^{\text{wbatch-priv}}_{A,S}(\lambda). \]
BATCH-PRIV^A_S(λ)
  b ←$ {0, 1}
  b★ ← A^{Sign}(vk)
  return b★ = b

Sign(M)
  if b = 0 then
    for μ_i ∈ M do
      {sig_i} ← BSign(sk, {μ_i})
    S ← {sig_i}_{0≤i<|M|}
  else
    S ← BSign(sk, M)
  return S

wBATCH-PRIV^A_S(λ)
  b ←$ {0, 1}
  b★ ← A^{Sign}(vk)
  return b★ = b

Sign(M, i, {μ_0, μ_1})
  if i ≥ |M| ∨ i < 0 then return ⊥
  M_i ← μ_b     / i-th message is μ_b
  S ← BSign(sk, M)
  S_i ← ⊥       / delete i-th signature
  return S

Figure 5: (Weak) Batch Privacy.

Proof. To construct the adversary B against BATCH-PRIV, we use the BATCH-PRIV signing oracle to simulate the call to BSign. For this, we sample a bit c to decide what set M to submit to the BATCH-PRIV signing oracle. When the adversary outputs c★ = c we output b★ = 1, otherwise we output b★ = 0.
To bound Adv^{batch-priv}_{B,S}(λ) note that if b = 0 the signatures returned by the BATCH-PRIV signing oracle are independent of μ_0 and μ_1 by construction and thus the advantage of A is zero. If b = 1 then our signing oracle faithfully emulates the wBATCH-PRIV signing oracle. Thus,

\[
\begin{aligned}
\mathsf{Adv}^{\text{batch-priv}}_{B,S}(\lambda)
  &= \Pr\big[\text{BATCH-PRIV}^{B}_{S}(\lambda) \Rightarrow 1\big] - 1/2 \\
  &= 1/2 \cdot \big(\Pr\big[\text{wBATCH-PRIV}^{A}_{S}(\lambda) \Rightarrow 1 \mid b = 0\big] - 1/2\big)
   + 1/2 \cdot \big(\Pr\big[\text{wBATCH-PRIV}^{A}_{S}(\lambda) \Rightarrow 1 \mid b = 1\big] - 1/2\big) \\
  &= 0 + 1/2 \cdot \big(\Pr\big[\text{wBATCH-PRIV}^{A}_{S}(\lambda) \Rightarrow 1 \mid b = 1\big] - 1/2\big) \\
  &= 0 + 1/2 \cdot \mathsf{Adv}^{\text{wbatch-priv}}_{A,S}(\lambda).
\end{aligned}
\]

Finally, we note that BATCH-PRIV is achievable:

Proposition 3.6. The naïve construction of batch signatures from the Falcon signature scheme is batch private.

4 CONSTRUCTION
Our construction relies on a Merkle tree. When addressing nodes in a Merkle tree of height h with N leaves, we may label nodes and leaves in the tree by their position: n_{i,k} is the i-th node at height k, counting from left to right and from bottom upwards (i.e. leaves are on height 0 and the root is on height h). We illustrate this in Figure 6.

                        ρ
                     n_{3,0}
           n_{2,0}             n_{2,1}
      n_{1,0}   n_{1,1}   n_{1,2}   n_{1,3}
    μ_0  μ_1  μ_2  μ_3  μ_4  μ_5  μ_6  μ_7

Figure 6: A Merkle tree and addressing scheme.

Let S = (KeyGen, Sign, Verify) be a digital signature scheme as defined in Definition 2.4, let H be a tweakable hash function as defined in Definition 2.1. We define our batch signature scheme BaS = (KeyGen, BSign, Verify) with KeyGen := S.KeyGen and BSign and Verify as in Algorithms 1 and 2 respectively.

Remark. For clarity we restrict our presentation to a fixed, power-of-two batch size N. To handle batches that do not satisfy this, we break down lists of messages that are too long into several batches of size at most N. To handle batches of size less than N we can either pad the tree by repeating leaves or use incomplete trees (see e.g. “L-trees” in [8]). Since this is standard in the literature, we omit the details here.

Algorithm 1 BSign(sk, M = [μ_0, μ_1, . . . , μ_{N−1}]) for N = 2^n
 1: T ← []
 2: id ←$ {0, 1}^λ                                  ⊲ Tree identifier
 3: for 0 ≤ i < N do                                ⊲ Generate N leaves
 4:     r_i ←$ {0, 1}^λ
 5:     T[0, i] ← H(id, 0 | i, r_i | μ_i)
 6: end for
 7: h ← log2 N
 8: for 0 ≤ k < h do
 9:     for 0 ≤ j < 2^{h−k−1} do                    ⊲ Build tree
10:         left, right ← T[k, 2j], T[k, 2j + 1]
            ⊲ id is public parameter, (1 | (k + 1) | j) is tweak
11:         T[k + 1, j] ← H(id, 1 | (k + 1) | j, left | right)
12:     end for
13: end for
14: root ← T[h, 0]
15: σ ← S.Sign(sk, id | root | N)
16: for 0 ≤ i < N do                                ⊲ Generate user signature
17:     path_i ← []
18:     for 0 ≤ k < log2 N do
19:         j ← ⌊i/2^k⌋
20:         if j mod 2 = 0 then
21:             path_i[k] ← T[k, j + 1]
22:         else
23:             path_i[k] ← T[k, j − 1]
24:         end if
25:     end for
26:     sig_i ← (id, N, σ, i, r_i, path_i)
27: end for
28: return {sig_0, sig_1, . . . , sig_{N−1}}
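Algorithm 1 and the matching verification procedure (Algorithm 2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The tweakable hash H is instantiated here as SHA-256 truncated to 16 bytes over an ad hoc fixed-width encoding, and the inner scheme S is replaced by an HMAC placeholder, which is not a public-key signature and only keeps the sketch within the standard library. The names H, toy_sign, toy_verify, root_message, bsign and verify are ours. Verification recomputes each parent with the same tweak, (1 | (k + 1) | ⌊j/2⌋), that Algorithm 1 uses when it builds the tree.

```python
import hashlib
import hmac
import secrets

LAMBDA_BYTES = 16  # λ = 128 bits; also used as hash output length (assumption)


def H(pub: bytes, tweak: tuple, data: bytes) -> bytes:
    """Toy tweakable hash H(P, t, x): truncated SHA-256, fixed-width tweak."""
    encoded_tweak = b"".join(v.to_bytes(4, "big") for v in tweak)
    return hashlib.sha256(pub + encoded_tweak + data).digest()[:LAMBDA_BYTES]


def toy_sign(sk: bytes, msg: bytes) -> bytes:
    """Placeholder for S.Sign. HMAC is NOT a signature scheme."""
    return hmac.new(sk, msg, hashlib.sha256).digest()


def toy_verify(vk: bytes, sigma: bytes, msg: bytes) -> bool:
    """Placeholder for S.Verify(vk, σ, μ); here vk equals sk (toy only)."""
    return hmac.compare_digest(sigma, toy_sign(vk, msg))


def root_message(tree_id: bytes, root: bytes, n: int) -> bytes:
    """Encode id | root | N, with N on two bytes."""
    return tree_id + root + n.to_bytes(2, "big")


def bsign(sk: bytes, msgs: list) -> list:
    n = len(msgs)
    if n < 1 or n & (n - 1):
        raise ValueError("batch size must be a power of two")
    height = n.bit_length() - 1                        # h = log2 N
    tree_id = secrets.token_bytes(LAMBDA_BYTES)        # line 2
    rs = [secrets.token_bytes(LAMBDA_BYTES) for _ in msgs]
    # Lines 3-6: leaves, tweak (0 | i) padded to three fields.
    levels = [[H(tree_id, (0, 0, i), rs[i] + m) for i, m in enumerate(msgs)]]
    for k in range(height):                            # lines 8-13
        prev = levels[k]
        levels.append([
            H(tree_id, (1, k + 1, j), prev[2 * j] + prev[2 * j + 1])
            for j in range(len(prev) // 2)
        ])
    sigma = toy_sign(sk, root_message(tree_id, levels[height][0], n))  # line 15
    sigs = []
    for i in range(n):                                 # lines 16-27
        # Sibling of node j = ⌊i/2^k⌋ is j + 1 if j is even, else j - 1.
        path = [levels[k][(i >> k) ^ 1] for k in range(height)]
        sigs.append((tree_id, n, sigma, i, rs[i], path))
    return sigs


def verify(vk: bytes, msg: bytes, sig: tuple) -> bool:
    tree_id, n, sigma, i, r, path = sig
    if n < 1 or n & (n - 1) or not 0 <= i < n:
        return False
    if len(path) != n.bit_length() - 1:
        return False
    h = H(tree_id, (0, 0, i), r + msg)                 # recompute the leaf
    for k, sibling in enumerate(path):
        j = i >> k
        pair = h + sibling if j % 2 == 0 else sibling + h
        h = H(tree_id, (1, k + 1, j // 2), pair)       # parent of node j
    return toy_verify(vk, sigma, root_message(tree_id, h, n))


key = secrets.token_bytes(32)
batch = [f"message {i}".encode() for i in range(16)]
signatures = bsign(key, batch)
print(all(verify(key, m, s) for m, s in zip(batch, signatures)))
```

Building the tree costs N leaf hashes plus N − 1 inner-node hashes, i.e. 2N − 1 hash calls and a single inner signature for the whole batch. A real deployment would replace toy_sign and toy_verify with an actual signature scheme such as those listed in Table 1.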
Algorithm 2 Verify(vk, μ, sig = (id, N, σ, i, r, path))
 1: h ← H(id, 0 | i, r | μ)
 2: k ← 0
 3: for 0 ≤ k < log2 N do                           ⊲ Construct root
 4:     j ← ⌊i/2^k⌋
 5:     if j mod 2 = 0 then
 6:         h ← H(id, 1 | (k + 1) | ⌊j/2⌋, h | path[k])
 7:     else
 8:         h ← H(id, 1 | (k + 1) | ⌊j/2⌋, path[k] | h)
 9:     end if
10: end for
11: return S.Verify(vk, σ, id | h | N)

5 SECURITY PROOF
Theorem 5.1. Let S = (KeyGen, Sign, Verify) be a digital signature scheme as in Definition 2.4, H be a tweakable hash function as in Definition 2.1. Let BaS = (KeyGen, BSign, Verify) be the batch signature scheme with BSign and Verify defined in Algorithms 1 and 2, respectively. If there exists a (classical or quantum) adversary A that breaks BEUF-CMA of BaS (see Definition 3.2) with q_s queries to the signing oracles, then it holds that

\[ \mathsf{Adv}^{\text{BEUF-CMA}}_{\mathsf{BaS},A}(\lambda) \leq \mathsf{Adv}^{\text{EUF-CMA}}_{S,B}(\lambda) + q_s \cdot \mathsf{Adv}^{\text{SM-TCR}}_{H,C}(N, \lambda), \]

where B makes q_s queries to its signing oracle.

The idea of the proof is as follows. Assume adversary A forges a signature of BaS on some message. By definition of unforgeability this message has not been queried to the signing oracle. This enables us to distinguish two cases. Either the root (that is included in the signature) was part of a query response or not. If it has not been part of a response, we can extract a forgery for S. In the other case, there must be a collision somewhere in the hash tree which we can use to solve SM-TCR.

Proof. Let A be an adversary against BEUF-CMA of BaS. More concretely, assume that the adversary A gets the verification key vk, has access to a signing oracle, and outputs a signature sig★ := (id★, N★, σ★, i★, r★, path★) for a message μ★ that has not been queried before, i.e. (μ★, ·) ∉ Q. We proceed via a series of game hops. Throughout, we let Adv_i denote the advantage of A in Game_i. Also, we implicitly define root_i (and root★) by sig_i (and sig★) since they can be computed deterministically: it is the value that comes out of the authentication path evaluation in Verify, cf. up to Line 10 of Algorithm 2.

Game_0: BEUF-CMA against BaS. So

\[ \mathsf{Adv}_0 = \mathsf{Adv}^{\text{BEUF-CMA}}_{\mathsf{BaS},A}. \]

Game_1: Excluding S-forgeries. Game_1 is identical to Game_0 except that it aborts if (·, (id★, root★, N★, · · · )) ∉ Q. Here, we use that sig★ and sig_i implicitly define root★ and root_i. In this case, ((id★, root★, N★), σ★) is an EUF-CMA forgery for S found by A. In particular, we have that
S.Verify(vk, σ★, (id★, root★, N★)) = 1
if Verify(vk, μ★, sig★) = 1.
To bound the distance between both games, we construct an algorithm B that breaks EUF-CMA of S using A. Given vk, B runs A(vk). It implements the signing oracle for A following Algorithm 1 with the only difference that it asks its own S-signing oracle to sign (id, root, N) in line 15. Hence, B makes the same number of signing queries A makes. Consider the event that Game_1 aborts but Game_0 does not. We can bound this probability by B’s advantage Adv^{EUF-CMA}_{S,B} with q_s many queries. So

\[ \mathsf{Adv}_0 - \mathsf{Adv}_1 \leq \mathsf{Adv}^{\text{EUF-CMA}}_{S,B}. \]

Bounding Adv_1: Forgery in the tree. We now bound the probability that an adversary succeeds in Game_1. Note that if we did not abort in Game_1, we have that the id and root of the forgery (id★, root★, N★) are identical to those of a tree that has been created during a signing query. Let this query be the j-th query M_j = {μ_0, . . . , μ_{N★−1}} with response S_j. Hence (id★, root★, N★) = (id_k, root_k, N_k) ∀ sig_k ∈ S_j. Here, again, we implicitly define root★ and root_k by sig★ and sig_k. Also, given the fact that μ★, sig★ = (id★, N★, σ★, i★, r_{i★}, path_{i★}) is a forgery, by definition of BEUF-CMA, we must have that μ★ ≠ μ_{i★}. Running Verify(vk, μ★, sig★) and Verify(vk, μ_{i★}, sig_{i★}) we note that this computes the same branch in two hash trees of the same height and with identical roots but differing starting values. By the pigeonhole principle, this implies that there must be a collision in these paths which can be extracted.
We use the above observation to construct an adversary C against the SM-TCR-security of H. At the beginning of the game, C guesses which signing query j the collision will occur in. To answer the j-th signing query, instead of sampling id (Line 2), C builds the tree using calls to its H(P, ·, ·) oracle (where P is chosen by the SM-TCR challenger, see Figure 1). After finishing the tree, C requests P from the challenger before Line 15 and finishes Algorithm 1. Later, when the adversary A outputs a forgery sig★ on μ★, C extracts the collision using Verify as outlined above. The algorithm submits the colliding value from the forgery as the solution in the SM-TCR^A_H(λ) game.
We can bound the probability that A succeeds in Game_1 by C’s advantage Adv^{SM-TCR}_{H,C}(N, λ) and the probability of C guessing the right query j. So

\[ \mathsf{Adv}_1 \leq q_s \cdot \mathsf{Adv}^{\text{SM-TCR}}_{H,C}(N, \lambda). \]

Combining both bounds confirms the claimed statement. □

Remark. We note that our proof is not tight due to the factor q_s incurred from guessing the right query to play the SM-TCR game with. It is plausible that this factor of q_s can be removed by a more careful analysis of the required SM-TCR property. More precisely, we use a different public parameter id for each tree. For a good tweakable hash function, a query under public parameter id should not leak any information about the outcome using a different parameter id′ ≠ id. Hence, an adversary should intuitively not gain any advantage from targeting multiple instances of H at the same time, as long as they use different public parameters as we do. We leave an analysis of this property for follow-up work.

Theorem 5.2. Let S = (KeyGen, Sign, Verify) be a digital signature scheme as in Definition 2.4, H be a tweakable hash function as in Definition 2.1. Let BaS = (KeyGen, BSign, Verify) be the batch
signature scheme with BSign and Verify defined in Algorithms 1 and 2, respectively. If there exists a (classical or quantum) adversary A that breaks wBATCH-PRIV of BaS (see Definition 3.4), then it holds that

\[ \mathsf{Adv}^{\text{wBATCH-PRIV}}_{\mathsf{BaS},A} \leq 2 \cdot \mathsf{Adv}^{\text{OT-PRF}}_{F,B}, \]

for F(k, (id, i, x)) := H(id, 0 | i, k | x).

Proof. On receipt of M, we run the signing oracle as usual but call our RoR oracle with input (id, i, μ_b) in Line 5 of Algorithm 1. If the oracle returns random outputs (which happens with probability 1/2), the advantage of A is zero. Otherwise, if A returns the correct answer, this allows us to distinguish F from random, which is bounded by Adv^{OT-PRF}_{F,B}. □

6 APPLICATIONS
Batch signatures reduce the computational cost of signing by replacing one signature per message with fewer than two hashes per message and one signature per batch. They can thus be used to increase the throughput attainable at a given amount of computational power. In some applications, the amount of data that needs to be sent can be reduced in addition; namely, if a given entity (is aware that it) receives multiple signatures from the same batch. In this case, sending the signed root multiple times is redundant and we can asymptotically reduce the amount of received information to a few hashes per message.
In the following, we describe how to use batch signatures to reduce computational costs and communication costs in two operations: certificate generation (typically in an HSM) and transcript signature (typically in TLS). We then discuss two scenarios in which this can be particularly beneficial.

[...] is full before the HSM is ready, the agent starts a new tree resulting in a queue of trees to be signed.
When the signature of the Merkle tree root is returned by the HSM, it is added to each certificate together with the sibling path associated to that request (see Line 26, Algorithm 1), resulting in the final certificate. Of course, this assumes that the certificate requester is able to verify batch signatures. Moreover, the CA generating the certificates using batch signatures needs to be updated accordingly. Naturally this increases the throughput at which certificates can be signed by roughly a factor equal to the batch size, as we need only one signature per 32 certificates. For example, in the cloud HSM setting we would go, for ECDSA, from hundreds to hundreds of thousands of signatures per second.
We expect that this effect will be more pronounced in a post-quantum setting where signing operations are much more expensive (as mentioned above). However, hard performance figures for post-quantum signatures on HSMs are not yet available, so we cannot estimate the likely throughput.

6.1.2 Transport Layer Security. TLS, being one of the most popular and commonly used cryptographic protocols today, will also suffer substantial impact from the transition to post-quantum cryptography. Indeed, many of the recent benchmarks (e.g. [23]) show a significant performance penalty, especially on the computational and communication costs associated with signatures (in general done server-side), and the gap becomes much more apparent when considering packet loss [18]. This performance degradation in PQ TLS is incurred due to larger signature sizes and slower signing speeds. KEM sizes and performance, while also worse, are much closer to ECC in comparison. As a result, to circumvent the use of (PQ) signatures for authentication in TLS, KEMTLS [22] was proposed which
replaces static server authentication with a static KEM, so that only
6.1 Computational Costs the involved KEM public keys need to be signed rather than the
As noted, we will consider two scenarios: HSMs that generate a transcript. The results reported in [22] show a reduction in the
large set of short-lived certificates, and server-side signing for TLS. bandwidth required for the client and server communications, as
well as reducing the computational costs on the server’s CPU.
6.1.1 Hardware Security Modules. Generating a large set of certifi-
However, despite the performance virtues of KEMTLS, it requires
cates, for example when they are renewed for a group of entities,
a number of significant infrastructure changes in order for it to
implies in general computing a signature per certificate, and thus
fully reach fruition. Specifically, in order for KEMTLS to be used
can represent a significant computational burden. Moreover, those
in practice, it will rely on changes to (i) include KEM public keys
signatures are in general computed on HSMs, which are signifi-
into a public-key infrastructure (PKI) and (ii) TLS implementations
cantly slower than traditional CPUs. For example, where a mod-
to operate with different state machines on both client and server
ern commodity CPU can sign tens of thousands of messages per
sides. These points inhibit the design of a KEMTLS standard and its
second with ECDSA [6], some widely used enterprise-grade cloud
uptake compared with “plain” PQ TLS. We illustrate the messages
HSMs can just sign a few hundred messages per second [7]. Thus,
exchanged in TLS 1.3 and in KEMTLS in Figures 7a and 7b.
short-lived certificates [26], for example, can put significant stress
A less invasive proposal would be to use batch signing for server-
on HSMs, especially when certificate renewal concerns a large set
side computations. This approach goes back to [2] and is explored
of devices or containers (e.g. an Envoy mesh network with 10K to
in this work. As in Algorithm 1, a server can amortise its signa-
100K containers [15]).
ture computation costs by adding each incoming client to a Merkle
In such a setting, deploying a batch signing approach is quite
tree, building the tree, and returning the signed root to each client,
straightforward. Interfacing with the HSM, an agent waits for a
in addition to some auxiliary information. This then reduces the
signal from the HSM indicating that it is ready to start a new sig-
number of “inner” signature computations required (by a factor
nature. While waiting, the agent gathers incoming certificate re-
equal to the batch size), which is the major contributing factor for
quests that are hashed and a fixed size (e.g. of 32 leaves) Merkle
the high-throughput improvements shown in Table 2. This signif-
tree is built. When the HSM signals “ready”, the agent completes
icantly improves the performance of PQ TLS without any major
the Merkle tree with the appropriate number of zeroed leaves and
changes to the PKI.
sends it for signing to the HSM. In the opposite case, when the tree
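The Merkle-tree batching used in both the HSM and the TLS settings can be sketched as follows. This is a minimal illustration and not the paper's Algorithm 1: SHA-256 with ad hoc domain-separation tags stands in for the tweakable hash H, the per-leaf randomness is omitted (so the sketch offers no batch privacy), the inner signature over the root is left out, and all function names are hypothetical.

```python
import hashlib


def _h(tag: bytes, *parts: bytes) -> bytes:
    """Domain-separated SHA-256; an assumed stand-in for the tweakable hash H."""
    h = hashlib.sha256(tag)
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))  # length prefix avoids ambiguity
        h.update(part)
    return h.digest()


def leaf_hash(index: int, message: bytes) -> bytes:
    return _h(b"leaf", index.to_bytes(4, "big"), message)


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"node", left, right)


def build_batch(messages):
    """Return (root, paths); paths[i] is the sibling path of messages[i]."""
    n = len(messages)
    if n == 0 or n & (n - 1):
        raise ValueError("number of messages must be a power of two")
    level = [leaf_hash(i, m) for i, m in enumerate(messages)]
    paths = [[] for _ in range(n)]
    positions = list(range(n))  # position of each leaf's ancestor in `level`
    while len(level) > 1:
        for i in range(n):
            paths[i].append(level[positions[i] ^ 1])  # sibling at this level
            positions[i] >>= 1
        level = [node_hash(level[j], level[j + 1]) for j in range(0, len(level), 2)]
    return level[0], paths


def root_from_path(index: int, message: bytes, path) -> bytes:
    """Recompute the root from one message and its sibling path."""
    node = leaf_hash(index, message)
    for sibling in path:
        node = node_hash(sibling, node) if index & 1 else node_hash(node, sibling)
        index >>= 1
    return node


messages = [b"client-%d" % i for i in range(4)]
root, paths = build_batch(messages)
assert root_from_path(2, messages[2], paths[2]) == root
```

In a full batch signature the signer would sign the root once with the inner scheme, and each recipient would check that inner signature against the root it recomputes from its own message and path.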
[Figure 7, two message-flow diagrams between Client and Server.
(a) High-level overview of TLS 1.3 1-RTT handshake. Certificate Verify contains the signature. Messages, in order: ClientHello, KeyShare; ServerHello, KeyShare, Cert., Cert. Verify, Finished; Finished, App. Data; Application Data.
(b) High-level overview of KEMTLS handshake. Messages, in order: ClientHello, KeyShare; ServerHello, KeyShare, Certificate; KeyShare, Finished, App. Data; Finished; Application Data.]

Figure 7: TLS 1.3 and KEMTLS.
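The amortisation claim of Section 6 (one signature per message replaced by fewer than two hashes per message and one signature per batch) can be checked with a little arithmetic. The sketch below assumes a full binary tree with a power-of-two number of leaves, counts one hash per leaf and one per internal node, and ignores every cost other than hashing and the inner signature; the function names are illustrative only.

```python
def batch_cost(batch_size: int) -> dict:
    """Per-message signing cost for one full batch (power-of-two size)."""
    if batch_size < 1 or batch_size & (batch_size - 1):
        raise ValueError("batch_size must be a power of two")
    total_hashes = batch_size + (batch_size - 1)  # leaves + internal nodes
    return {
        "hashes_per_message": total_hashes / batch_size,
        "inner_signatures_per_message": 1 / batch_size,
    }


def max_batched_rate(inner_signatures_per_second: float, batch_size: int) -> float:
    """Upper bound on messages signed per second, ignoring hashing cost."""
    return inner_signatures_per_second * batch_size


cost = batch_cost(32)
print(cost["hashes_per_message"])            # 1.96875, i.e. fewer than two
print(cost["inner_signatures_per_message"])  # 0.03125, one signature per 32 messages
```

The bound returned by `max_batched_rate` is only reached when batches are always full and hashing is negligible next to the inner signature, which is the regime the text has in mind for slow signers.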

These improvements come at a few extra minor costs in TLS: a slight increase in the overall batch signature sizes (adding between 82 and 98 bytes, as shown in Table 1) compared to the non-batch version of the signature, as well as a slight increase in latency (shown in Table 2) and in computation for the client, which is due to the hash function calls when building the Merkle tree.

6.2 Bandwidth Reduction
As noted previously, in [4] a reduction in certificate sizes is proposed by a new type of CA which would exclusively sign a new type of batch oriented certificates. Such a CA would only be used together with a Certificate Transparency authority which changes the usual required flows for authentication. The benefits obtained in certificate size and verification/signature costs are significant. It also implies that the main criteria for being in the same batch are: being signed by the same CA, and being signed roughly at the same time.
In this section we consider what benefits we can obtain in a simpler setting, supposing just that a usual CA and its users can use multiple signing algorithms, one of them being a batch signature. In this case multiple certificates will be in the same batch when a user considers this beneficial. In this section we show how this can be beneficial (besides reducing the load of the CA). There are two flows in which we can expect bandwidth reduction for the entities with certificates belonging to the same signed batch.

6.2.1 First Flow: HSMs. The first flow is from the signing authority (again, typically, an HSM) to the entities corresponding to the same batch of signed certificates. Indeed, all the issued certificates have a signature that contains the same Merkle tree root signature but a different sibling path. Depending on the exact situation, it is then possible for the HSM to broadcast the root signature or the entire Merkle tree and remove any information from the certificates that are sent back that can be reconstructed from the broadcasts.

6.2.2 Second Flow: TLS. The second flow is from the entities holding the certificates signed in the same batch to the entities receiving that certificate. This typically happens in TLS when the server sends its certificate to the client. In that setting, if the client is going to interact over the lifetime of the certificates with multiple servers from the same batch group, it can inform the server that the tree root signature is already known (for example in TLS with a variation of the TLS Cached Information Extension, which allows a client to notify a server that some information is already known). The certificate can then contain just the sibling path as the signature, leading to a bandwidth usage reduction (between 1 KB and 3 KB if using Falcon or Dilithium, and up to 30 KB if using SPHINCS+).

6.3 Use-Cases
We consider here two more fleshed-out examples of situations where forming batches is natural and can produce significant gains.

6.3.1 Fleet of Load Balancers. To reduce downtime (e.g. because of a Denial-of-Service attack, server maintenance, etc.) and to improve scalability (e.g. to efficiently distribute requests geographically under heavy network traffic), load balancers are essential for most of today's web applications. At the same time, they often act as a TLS termination proxy, and as such decrypt, encrypt, and sign the incoming and outgoing HTTPS traffic to offload cryptographic computations from back-end server(s), see Figure 8. As a consequence, when cryptographic computations become more costly due to the transition to PQC, load balancers themselves may become a throughput bottleneck. Implementing batch signatures could significantly increase their workload capacity.
In this use-case, we consider a fleet of load balancers belonging to a large cloud provider that renew their certificates periodically (say weekly). They form a natural group in the certificate renewal process, and making a batch certificate signature request can significantly reduce the computational load on the associated HSMs, as described in Section 6.1.1, and the communications, as described at the beginning of this section. Most importantly, in such a setting, the fleet of load balancers would send full certificates only once per week and per user, and in all remaining connections load balancers will reduce the size of the certificates by 1 to 30 KB. From a user perspective, if major cloud providers use this system, a user will only have to download full sized certificates a few times per week (for connections served through cloud provider load balancers).

Figure 8: Client–Load Balancer–Backend server connection with nested batch signatures.
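To get a feel for the saving when the root signature is already cached, the following sketch compares the bytes sent with and without the inner signature. It rests on assumptions that are not taken from the paper: 16-byte hash outputs, a sibling path of one hash per tree level, and no other fields (such as an index or randomness) in the batch signature, so it will not reproduce the exact sizes of Table 1. The 2420-byte inner signature is an illustrative value in the range the text quotes for Falcon and Dilithium.

```python
def batch_signature_bytes(inner_sig_bytes: int, batch_size: int,
                          hash_bytes: int = 16, root_cached: bool = False) -> int:
    """Bytes sent per signature; only the sibling path if the root is cached."""
    if batch_size < 1 or batch_size & (batch_size - 1):
        raise ValueError("batch_size must be a power of two")
    height = batch_size.bit_length() - 1      # e.g. 32 leaves -> height 5
    path_bytes = height * hash_bytes          # one sibling hash per level
    return path_bytes if root_cached else inner_sig_bytes + path_bytes


full = batch_signature_bytes(2420, 32)                       # 2500
cached = batch_signature_bytes(2420, 32, root_cached=True)   # 80
print(full - cached)                                         # 2420 bytes saved
```

Under these assumptions the saving per connection is exactly the size of the inner signature, which is why the gain grows with schemes that have large signatures.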
Instead of considering the bandwidth reduction, this can be seen as a usability question for schemes with large signatures such as SPHINCS+. This algorithm relies on mild assumptions, and thus is a good candidate for CAs. Unfortunately, increasing certificate sizes to tens of kilobytes can be considered too steep a requirement. Batch signing can solve this in the load balancer setting (and similar situations) as full certificates are only sent exceptionally.

6.3.2 mTLS Mesh. The second scenario is a large-scale container mesh network that renews the mutual TLS (mTLS) certificates in its Envoy instances. In this scenario, by considering all the Envoy instances as a single batch, there is no need to ever transmit the root signature between peers, as every peer has the root signature in its own certificate (see Figure 9). Of course, during the generation of the certificates we also benefit from the already described computational and communication reduction for the HSM, which can be quite significant for mesh networks with thousands or tens of thousands of containers. The storage requirements for the whole set of mesh certificates (in cached key servers that are used throughout mesh networks) are also greatly reduced, e.g. for a ten thousand container mesh using SPHINCS+ signatures they would be reduced from hundreds of megabytes to hundreds of kilobytes.

Figure 9: Mesh network with envoys using nested signatures within mTLS

7 IMPLEMENTATION & BENCHMARK SETUP
As a demonstrator for batch signatures we implement them for one of the most important use cases, TLS, and set up a load balancer benchmark. Here we provide details on our implementation and on the benchmark setup.

7.1 Implementation details
There are a number of components that make up the overall implementation for our proof-of-concept batch signing experiments in TLS. These allow us to estimate realistic conditions for secure network communications and thus accurately learn how effective our construction is under such conditions. These components are:
• bsign_engine: the Rust implementation of batch signing, which includes Merkle tree building and batch signing functionalities. We also have a benchmarking wrapper to produce results.
• openssl: includes a fork of OpenSSL 1.1.1, using support from the liboqs library [25] and batchtls_engine.
• tcpserver: a TCP server, using openssl and bsign_engine.
The bsign_engine is the core component of our implementation. It is responsible for asynchronously gathering client requests into Merkle trees and signing the associated roots with a fixed number of signing threads.
We introduce two implementation-specific parameters, described below, which are used to tweak the performance of our implementation. This fine-tuning allows us to increase performance based on a number of factors, most importantly what signature algorithm is used, the server specifications, and the resulting latency.

Merkle Tree Size: This quantifies the number of messages handled per batch signing transaction, which thus affects the latency and throughput of the server. Having a larger number of clients (messages as tree leaves) will reduce the average computation time per client, but at the same time will increase the size of the final signature (due to the longer path) and increase median and worst-case latency. We provide batch signature sizes in Table 1.
We parameterise the Merkle tree using the number of leaves in the tree and commonly fix this to a power of two for efficiency reasons, e.g. MT_size = 2^5, which produces a (balanced) tree of height 5. However, this parameter may need to be adapted to fit other hardware or performance constraints.

Signing Threads: These are responsible for taking the ownership of a Merkle tree and for signing its root. When a signing thread is currently signing a Merkle tree root, it cannot handle the next tree. Thus, if the bsign_engine has a single signing thread, the worst observed latency will correspond to twice the time needed for a signature. By adding additional signing threads, the latency will get closer to the time needed for a signature. However, having too many signing threads may saturate the scheduling and may
slow down the engine, or become sub-optimal at the very least. Conversely, if we reduce the number of signing threads we also limit the number of cores used for signatures. In Table 2, we choose the number of signing threads/cores as the smallest number that allows us to increase the number of handshakes significantly (with respect to the plain implementation) with a low latency increase.
We use the open-source Rust library, ring [24], for our ECDSA implementation and the open-source C library, liboqs [25], for our implementations of post-quantum signatures. One reason for using this library in particular is that it has integration into OpenSSL 1.1.1 (see https://openquantumsafe.org/applications/tls.html), which we patch for our experiments to work with the bsign_engine. The patch modifies the state machine of OpenSSL to inject batch signing under certain conditions, but in such a way that it remains functional with classic TLS when these conditions are not met. More specifically, our patch adds a new structure, BATCHTLS_CTX, to the OpenSSL context, SSL_CTX, to track the context of the engine. We use environment variables for setting the parameters in bsign_engine for simplicity, as opposed to adding new APIs on top of SSL_CTX.
These aforementioned components are the ingredients that make up our overall implementation for batch signing in TLS, once combined with a TCP server. Implementing it this way gives us a realistic environment in which to run our experiments, providing conditions that we would expect to see in the real world.

7.2 Benchmark Setup
We demonstrate the application of batch signatures for the TLS use case, with the results shown in Table 2. For this setup we took between one and four client machines and a single server machine. Each of them uses a Google Cloud C2 instance which has an Intel 3.9 GHz Cascade Lake processor. The specific instance type we used was the c2-standard-30, which offers 30 (virtual) CPU threads, 120 GB memory, and a (max) egress bandwidth of 32 Gbps. We disable hyper-threading in order to have more stable tests.
The results in Table 2 labelled as 'plain' are taken from a multi-threaded implementation (with a pool of as many threads as computer cores and with select/poll handling). For the batch signing results, we use a limit on the maximum size of the Merkle tree. The number of cores used for signatures is estimated for the plain approach from the computational cost of one signature, and fixed (by reserving cores explicitly) for the batch signing approach.
The results in Table 2 provide both handshakes per second (essentially, throughput) and latency (for various percentiles). We kept Merkle tree sizes to either 16 or 32, since larger sizes incurred much higher latency costs. Indeed, for large trees the latency added is unrealistic, despite throughput being increased.
The performance results are presented in Table 2 and discussed in the associated caption. Results for SPHINCS+ and a second benchmark in the HSM setting will be presented in a full version of this paper.

REFERENCES
[1] Kahraman Akdemir, Martin Dixon, Patrick Fay, Wajdi Feghali, Vinodh Gopal, Jim Guilford, Erdinc Ozturk, Gil Wolrich, and Ronen Zohar. 2010. Breakthrough AES Performance with Intel® AES New Instructions. Whitepaper. Intel.
[2] David Benjamin. 2020. Batch Signing for TLS. Internet-Draft draft-ietf-tls-batch-signing-00. Internet Engineering Task Force. https://datatracker.ietf.org/doc/draft-ietf-tls-batch-signing/00/ Work in Progress.
[3] David Benjamin. 2022. private communication. (2022).
[4] David Benjamin, D O'Brien, and Bas Westerbaan. 2023. Merkle Tree Certificates for TLS. Internet-Draft draft-davidben-tls-merkle-tree-certs-00. Internet Engineering Task Force. https://www.ietf.org/id/draft-davidben-tls-merkle-tree-certs-00.html Work in Progress.
[5] Daniel J. Bernstein, Andreas Hülsing, Stefan Kölbl, Ruben Niederhagen, Joost Rijneveld, and Peter Schwabe. 2019. The SPHINCS+ Signature Framework. In ACM CCS 2019, Lorenzo Cavallaro, Johannes Kinder, XiaoFeng Wang, and Jonathan Katz (Eds.). ACM Press, 2129–2146. https://doi.org/10.1145/3319535.3363229
[6] Daniel J Bernstein and Tanja Lange. 2018. SUPERCOP: System for unified performance evaluation related to cryptographic operations and primitives. https://bench.cr.yp.to/supercop.html. (2018).
[7] AWS CloudHSM. 2023. FAQS - Performance and capacity. https://aws.amazon.com/cloudhsm/faqs/#Performance_and_capacity. (2023). [Online; accessed 21-February-2023].
[8] Erik Dahmen, Katsuyuki Okeya, Tsuyoshi Takagi, and Camille Vuillaume. 2008. Digital Signatures Out of Second-Preimage Resistant Hash Functions. In Post-quantum cryptography, second international workshop, PQCRYPTO 2008, Johannes Buchmann and Jintai Ding (Eds.). Springer, Heidelberg, 109–123. https://doi.org/10.1007/978-3-540-88403-3_8
[9] Andrew Fregly, Joseph Harvey, Burton S. Kaliski Jr., and Swapneel Sheth. 2022. Merkle Tree Ladder Mode: Reducing the Size Impact of NIST PQC Signature Algorithms in Practice. Cryptology ePrint Archive, Report 2022/1730. (2022). https://eprint.iacr.org/2022/1730.
[10] Nicholas Genise and Daniele Micciancio. 2018. Faster Gaussian Sampling for Trapdoor Lattices with Arbitrary Modulus. In EUROCRYPT 2018, Part I (LNCS), Jesper Buus Nielsen and Vincent Rijmen (Eds.), Vol. 10820. Springer, Heidelberg, 174–203. https://doi.org/10.1007/978-3-319-78381-9_7
[11] Craig Gentry, Chris Peikert, and Vinod Vaikuntanathan. 2008. Trapdoors for hard lattices and new cryptographic constructions. In 40th ACM STOC, Richard E. Ladner and Cynthia Dwork (Eds.). ACM Press, 197–206. https://doi.org/10.1145/1374376.1374407
[12] Shay Gueron and Vlad Krasnov. 2012. Parallelizing message schedules to accelerate the computations of hash functions. Journal of Cryptographic Engineering 2, 4 (2012), 241–253.
[13] James Howe and Bas Westerbaan. 2022. Benchmarking and Analysing the NIST PQC Finalist Lattice-Based Signature Schemes on the ARM Cortex M7. Cryptology ePrint Archive, Report 2022/405. (2022). https://eprint.iacr.org/2022/405.
[14] Andreas Hülsing, Daniel J. Bernstein, Christoph Dobraunig, Maria Eichlseder, Scott Fluhrer, Stefan-Lukas Gazdag, Panos Kampanakis, Stefan Kölbl, Tanja Lange, Martin M Lauridsen, Florian Mendel, Ruben Niederhagen, Christian Rechberger, Joost Rijneveld, Peter Schwabe, Jean-Philippe Aumasson, Bas Westerbaan, and Ward Beullens. 2022. SPHINCS+. Technical Report. National Institute of Standards and Technology. available at https://csrc.nist.gov/Projects/post-quantum-cryptography/selected-algorithms-2022.
[15] Matt Klein. 2017. Lyft's Envoy: Experiences Operating a Large Service Mesh. SREcon17 Americas. (2017). available at https://www.usenix.org/sites/default/files/conference/protected-files/srecon17americas_slides_klein.pdf.
[16] Vadim Lyubashevsky, Léo Ducas, Eike Kiltz, Tancrède Lepoint, Peter Schwabe, Gregor Seiler, Damien Stehlé, and Shi Bai. 2022. CRYSTALS-DILITHIUM. Technical Report. National Institute of Standards and Technology. available at https://csrc.nist.gov/Projects/post-quantum-cryptography/selected-algorithms-2022.
[17] Daniele Micciancio and Chris Peikert. 2012. Trapdoors for Lattices: Simpler, Tighter, Faster, Smaller. In EUROCRYPT 2012 (LNCS), David Pointcheval and Thomas Johansson (Eds.), Vol. 7237. Springer, Heidelberg, 700–718. https://doi.org/10.1007/978-3-642-29011-4_41
[18] Christian Paquin, Douglas Stebila, and Goutam Tamvada. 2020. Benchmarking Post-quantum Cryptography in TLS. In Post-Quantum Cryptography - 11th International Conference, PQCrypto 2020, Jintai Ding and Jean-Pierre Tillich (Eds.). Springer, Heidelberg, 72–91. https://doi.org/10.1007/978-3-030-44223-1_5
[19] Thomas Pornin. 2019. New Efficient, Constant-Time Implementations of Falcon. Cryptology ePrint Archive, Report 2019/893. (2019). https://eprint.iacr.org/2019/893.
[20] Thomas Prest, Pierre-Alain Fouque, Jeffrey Hoffstein, Paul Kirchner, Vadim Lyubashevsky, Thomas Pornin, Thomas Ricosset, Gregor Seiler, William Whyte, and Zhenfei Zhang. 2020. FALCON. Technical Report. National Institute of Standards and Technology. available at https://csrc.nist.gov/projects/post-quantum-cryptography/post-quantum-cryptography-standardization/round-3-submissions.
[21] Thomas Prest, Pierre-Alain Fouque, Jeffrey Hoffstein, Paul Kirchner, Vadim Lyubashevsky, Thomas Pornin, Thomas Ricosset, Gregor Seiler, William Whyte, and Zhenfei Zhang. 2022. FALCON. Technical Report. National Institute of Standards and Technology. available at https://csrc.nist.gov/Projects/post-quantum-cryptography/selected-algorithms-2022.
[22] Peter Schwabe, Douglas Stebila, and Thom Wiggers. 2020. Post-Quantum TLS Without Handshake Signatures. In ACM CCS 2020, Jay Ligatti, Xinming Ou, Jonathan Katz, and Giovanni Vigna (Eds.). ACM Press, 1461–1480. https://doi.org/10.1145/3372297.3423350
[23] Dimitrios Sikeridis, Panos Kampanakis, and Michael Devetsikiotis. 2020. Post-Quantum Authentication in TLS 1.3: A Performance Study. In NDSS 2020. The Internet Society.
[24] Brian Smith. 2023. Crate ring. https://github.com/briansmith/ring. (2023). [Online; accessed 24-February-2023].
[25] Douglas Stebila and Michele Mosca. 2016. Post-quantum Key Exchange for the Internet and the Open Quantum Safe Project. In SAC 2016 (LNCS), Roberto Avanzi and Howard M. Heys (Eds.), Vol. 10532. Springer, Heidelberg, 14–37. https://doi.org/10.1007/978-3-319-69453-5_2
[26] Emin Topalovic, Brennan Saeta, Lin-Shung Huang, Collin Jackson, and Dan Boneh. 2012. Towards Short-Lived Certificates. In IEEE Oakland Web 2.0 Security and Privacy (W2SP).
[27] Shalanda D. Young. 2022. National Security Memo on Promoting United States Leadership in Quantum Computing While Mitigating Risks to Vulnerable Cryptographic Systems (NSM-10). Executive Office of the President, Office of Management and Budget, Washington, DC, USA. (2022). https://www.whitehouse.gov/wp-content/uploads/2022/11/M-23-02-M-Memo-on-Migrating-to-Post-Quantum-Cryptography.pdf
