Batch Signatures, Revisited: Carlos Aguilar-Melchor Martin R. Albrecht Thomas Bailleux
2.1 Hash Functions
In this work we consider tweakable hash functions. These are keyed hash functions that take an additional input which can be thought of as a domain separator (while the key or public parameter serves as a separator between users). When used correctly, tweakable hash functions make it possible to achieve target collision resistance tightly, even in multi-target settings where an adversary wins if they manage to attack one out of many targets.

Definition 2.1 (Tweakable Hash Function [5]). Let $n, m \in \mathbb{N}$, let $\mathcal{P}$ be the public parameter space and $\mathcal{T}$ the tweak space. A tweakable hash function is a tuple of algorithms $H = (\mathsf{KeyGen}, \mathsf{Eval})$ such that:
KeyGen The setup function takes the security parameter $1^\lambda$ and outputs a (possibly empty) public parameter $p$. We write $p \leftarrow \mathsf{KeyGen}(1^\lambda)$.
Eval The evaluation function takes public parameters $p$, a tweak $t$, an input $x \in \{0,1\}^m$ and returns a hash value $h$. We write $h \leftarrow \mathsf{Eval}(p, t, x)$ or simply $h \leftarrow H(p, t, x)$. This is a deterministic function.

In what follows, we avoid relying on plain collision resistance and instead rely on target collision resistance of tweakable hash functions.

Definition 2.2 (Target Collision Resistant Hash Function [5]). An efficient tweakable hash function $H = (\mathsf{KeyGen}, \mathsf{Eval})$ is called single-function multiple-targets target-collision resistant for distinct tweaks (SM-TCR) if the advantage $\mathsf{Adv}^{\text{sm-tcr}}_{\mathcal{A},H}(k, \lambda)$ of any (PPT/BQP) algorithm $\mathcal{A} = (\mathcal{A}_0, \mathcal{A}_1)$ that defines up to $k$ targets in the SM-TCR experiment defined in Figure 1 is negligible, with
\[ \mathsf{Adv}^{\text{sm-tcr}}_{\mathcal{A},H}(k, \lambda) := \Pr[\text{SM-TCR}^{\mathcal{A}}_{H}(k, \lambda) \Rightarrow 1]. \]

  $\text{SM-TCR}^{\mathcal{A}}_{H}(k, \lambda)$:
    $P \leftarrow \mathsf{KeyGen}(1^\lambda)$
    $S \leftarrow \mathcal{A}_0^{H(P,\cdot,\cdot)}(1^\lambda)$
    // $\mathcal{Q} = \{(t_0, \mu_0), \ldots, (t_{k-1}, \mu_{k-1})\}$: queries submitted to $H(P, \cdot, \cdot)$
    for $i, \ell \in \{0, \ldots, k-1\}$, $i \neq \ell$ do
      if $t_i = t_\ell$: return $\bot$
    $(j, \mu) \leftarrow \mathcal{A}_1(1^\lambda, S, P, \mathcal{Q})$
    return $0 \le j < k \wedge \mu \neq \mu_j \wedge H(P, t_j, \mu_j) = H(P, t_j, \mu)$
Figure 1: Single-function, Multi-Target Collision Resistance for distinct tweaks (SM-TCR).

We will also rely on one-time pseudorandomness to argue privacy.

Definition 2.3 (OT-PRF). Let $n, m \in \mathbb{N}$ and let $F : \{0,1\}^\lambda \times \{0,1\}^m \to \{0,1\}^n$ be a keyed function. We define
\[ \mathsf{Adv}^{\text{ot-prf}}_{\mathcal{A},F}(\lambda) := \Pr[\text{OT-PRF}^{\mathcal{A}}_{F}(\lambda) \Rightarrow 1] \]
for $\text{OT-PRF}^{\mathcal{A}}_{F}(\lambda)$ as in Figure 2 and say $F$ is an OT-PRF if no PPT/BQP adversary $\mathcal{A}$ has non-negligible advantage $\mathsf{Adv}^{\text{ot-prf}}_{\mathcal{A},F}(\lambda)$.

  $\text{OT-PRF}^{\mathcal{A}}_{F}(\lambda)$:
    $b \leftarrow_\$ \{0,1\}$
    $b' \leftarrow \mathcal{A}^{\mathsf{RoR}(\cdot)}()$
    return $b = b'$
  $\mathsf{RoR}(x)$:
    $k \leftarrow_\$ \{0,1\}^\lambda$
    $y_0 \leftarrow F(k, x)$
    $y_1 \leftarrow_\$ \{0,1\}^n$
    return $y_b$
Figure 2: One-time Pseudorandom Function (OT-PRF).

2.2 Digital Signatures
Definition 2.4 (Signature Scheme). A signature scheme $S$ consists of three PPT algorithms $(\mathsf{KeyGen}, \mathsf{Sign}, \mathsf{Verify})$ such that:
KeyGen The key generation algorithm is a randomised algorithm that takes as input a security parameter $1^\lambda$ and outputs a pair $(\mathsf{vk}, \mathsf{sk})$, the verification key and signing key, respectively. We write $(\mathsf{vk}, \mathsf{sk}) \leftarrow \mathsf{KeyGen}(1^\lambda)$.
Sign The signing algorithm takes as input a signing key $\mathsf{sk}$ and a message $\mu$, and outputs a signature $\sigma$. We write this as $\sigma \leftarrow \mathsf{Sign}(\mathsf{sk}, \mu)$. The signing algorithm may be randomised or deterministic. We may write $\sigma \leftarrow \mathsf{Sign}(\mathsf{sk}, \mu; r)$ to make the randomness used explicit.
Verify The verification algorithm takes as input a verification key $\mathsf{vk}$, a signature $\sigma$ and a message $\mu$ and outputs a bit $b$, with $b = 1$ meaning the signature is valid and $b = 0$ meaning the signature is invalid. Verify is a deterministic algorithm. We write $b \leftarrow \mathsf{Verify}(\mathsf{vk}, \sigma, \mu)$.
We require that, except with negligible probability over $(\mathsf{vk}, \mathsf{sk}) \leftarrow \mathsf{KeyGen}(1^\lambda)$, it holds that $\mathsf{Verify}(\mathsf{vk}, \mathsf{Sign}(\mathsf{sk}, \mu), \mu) = 1$ for all $\mu$.

We rely on the standard notion of existential unforgeability under chosen message attacks:

Definition 2.5 (EUF-CMA). We define
\[ \mathsf{Adv}^{\text{euf-cma}}_{\mathcal{A},S}(\lambda) := \Pr[\text{EUF-CMA}^{\mathcal{A}}_{S}(\lambda) \Rightarrow 1] \]
for $\text{EUF-CMA}^{\mathcal{A}}_{S}(\lambda)$ as in Figure 3 and say a signature scheme $S$ is EUF-CMA secure if no PPT/BQP adversary $\mathcal{A}$ has non-negligible advantage $\mathsf{Adv}^{\text{euf-cma}}_{\mathcal{A},S}(\lambda)$.

  $\text{EUF-CMA}^{\mathcal{A}}_{S}(\lambda)$ / $\text{BEUF-CMA}^{\mathcal{A}}_{S}(\lambda)$:
    $\mathcal{Q} \leftarrow \emptyset$
    $(\mathsf{vk}, \mathsf{sk}) \leftarrow \mathsf{KeyGen}(1^\lambda)$
    $(\mu^\star, \sigma^\star) \leftarrow \mathcal{A}^{\mathsf{Sign}}(\mathsf{vk})$   // EUF-CMA
    $(\mu^\star, \sigma^\star) \leftarrow \mathcal{A}^{\mathsf{BSign}}(\mathsf{vk})$   // BEUF-CMA
    return $(\mu^\star, \cdot) \notin \mathcal{Q} \wedge \mathsf{Verify}(\mathsf{vk}, \sigma^\star, \mu^\star) = 1$
  $\mathsf{Sign}(\mu)$:
    $\sigma \leftarrow \mathsf{Sign}(\mathsf{sk}, \mu)$
    $\mathcal{Q} \leftarrow \mathcal{Q} \cup \{(\mu, \sigma)\}$
    return $\sigma$
  $\mathsf{BSign}(\mathcal{M})$:
    $\mathcal{S} \leftarrow \mathsf{BSign}(\mathsf{sk}, \mathcal{M})$
    for $0 \le j < |\mathcal{M}|$ do
      $q_j \leftarrow (\mathcal{M}[j], \mathcal{S}[j])$
      $\mathcal{Q} \leftarrow \mathcal{Q} \cup \{q_j\}$
    return $\mathcal{S}$
Figure 3: Existential Unforgeability under Chosen Message Attacks for Signatures (EUF-CMA) and Batch Signatures (BEUF-CMA).

Remark. In our construction, the signature scheme takes batches of messages as input and outputs batches of signatures. We formally define batch signature schemes and their security (BEUF-CMA) in Section 3.

2.3 Falcon Signature Scheme
Our flagship demonstrator is the composition of our scheme with Falcon [20] (based on the GPV paradigm [11]). We give a stylised description in Figure 4, which suffices for our purposes here. Let $(\mathsf{TrapGen}, \mathsf{SampD}, \mathsf{SampPre})$ be PPT algorithms with the following syntax and properties [10, 11, 17]:
• $(\boldsymbol{A}, \mathsf{td}) \leftarrow \mathsf{TrapGen}(1^\eta, 1^\ell, q, \mathcal{R}, \beta)$ takes dimensions $\eta, \ell \in \mathbb{N}$, a modulus $q \in \mathbb{N}$, a ring $\mathcal{R}$, and a norm bound $\beta \in \mathbb{R}$. It generates a matrix $\boldsymbol{A} \in \mathcal{R}_q^{\eta \times \ell}$ and a trapdoor $\mathsf{td}$. For any $n \in \mathsf{poly}(\lambda)$ and $\ell \ge \mathsf{lhl}(\mathcal{R}, \eta, q, \beta)$, the distribution of $\boldsymbol{A}$ is within $\mathsf{negl}(\lambda)$ statistical distance to the uniform distribution on $\mathcal{R}_q^{\eta \times \ell}$.
• $\boldsymbol{u} \leftarrow \mathsf{SampD}(1^\eta, 1^\ell, \mathcal{R}, \beta')$ with $\ell \ge \mathsf{lhl}(\mathcal{R}, \eta, q, \beta)$ outputs an element $\boldsymbol{u} \in \mathcal{R}^\ell$ with norm bound $\beta' \ge \beta$. We have that $\boldsymbol{v} := \boldsymbol{A} \cdot \boldsymbol{u} \bmod q$ is within $\mathsf{negl}(\lambda)$ statistical distance to the uniform distribution on $\mathcal{R}_q^{\eta}$.
• $\boldsymbol{u} \leftarrow \mathsf{SampPre}(\mathsf{td}, \boldsymbol{v}, \beta')$ with $\ell \ge \mathsf{lhl}(\mathcal{R}, \eta, q, \beta)$ takes a trapdoor $\mathsf{td}$, a vector $\boldsymbol{v} \in \mathcal{R}_q^{\eta}$, and a norm bound $\beta' \ge \beta$. It samples $\boldsymbol{u} \in \mathcal{R}^\ell$ satisfying $\boldsymbol{A} \cdot \boldsymbol{u} \equiv \boldsymbol{v} \bmod q$ and $\lVert \boldsymbol{u} \rVert \le \beta'$. Furthermore, $\boldsymbol{u}$ is within $\mathsf{negl}(\lambda)$ statistical distance to $\boldsymbol{u} \leftarrow \mathsf{SampD}(1^\eta, 1^\ell, \mathcal{R}, \beta')$ conditioned on $\boldsymbol{v} \equiv \boldsymbol{A} \cdot \boldsymbol{u} \bmod q$.

  $\mathsf{KeyGen}(1^\lambda)$:
    $\boldsymbol{A}, \mathsf{td} \leftarrow \mathsf{TrapGen}(1^1, 1^2, q, \mathcal{R}, \beta)$
    return $\mathsf{vk}_i = \boldsymbol{A}$, $\mathsf{sk}_i = \mathsf{td}$
  $\mathsf{Sign}(\mu_j, \mathsf{sk}_i; r)$:
    $\boldsymbol{y}_j \leftarrow \mathsf{SampPre}(\mathsf{td}, H(\mu_j, r), \beta')$
    return $\boldsymbol{y}_j$
  $\mathsf{Verify}(\sigma_j, \mu_j, \mathsf{vk}_i)$:
    return $\lVert \boldsymbol{y}_j \rVert \overset{?}{\le} \beta' \wedge H(\mu_j) \overset{?}{\equiv} \boldsymbol{A} \cdot \boldsymbol{y}_j$
Figure 4: Falcon signatures [11, 21].

Assumption 2.6. The Falcon signature scheme is EUF-CMA secure. In particular, no (quantum) adversary exists that forges signatures with cost $\ll 2^{128}$ for Falcon-512, and no such adversary exists with cost $\ll 2^{256}$ for Falcon-1024.

Performance. Consider Falcon-512, which minimises the signature size among the NIST-selected post-quantum signature algorithms. An optimised implementation beats RSA-2048 signing by roughly a factor of five [6]. Critically, however, this optimised implementation relies on constant-time double-precision floating-point arithmetic. This is not completely out of reach, as demonstrated by constant-time Falcon implementations [19] on several different CPUs, which work around the behaviour of several CPU instructions. However, the long-term reliability of this approach is less certain than for bit or integer operations. That is, future instructions or optimisations might prevent the desired constant-time behaviour. Furthermore, many CPUs to date simply lack fast constant-time double-precision arithmetic [13].
On systems where no sufficiently constant-time floating-point unit is available, or where floating-point arithmetic is avoided for the reasons mentioned above, floating-point arithmetic can be emulated (in constant time) at a hefty overhead of approximately 20x [19].

3 BATCH SIGNATURES
We formally define batch signatures.

Definition 3.1 (Batch Signature Scheme). A batch signature scheme $S$ consists of three PPT algorithms $(\mathsf{KeyGen}, \mathsf{BSign}, \mathsf{Verify})$ such that:
KeyGen The key generation algorithm is a randomised algorithm that takes as input a security parameter $1^\lambda$ and outputs a pair $(\mathsf{vk}, \mathsf{sk})$, the verification key and signing key, respectively. We write $(\mathsf{vk}, \mathsf{sk}) \leftarrow \mathsf{KeyGen}(1^\lambda)$.
BSign The batch signing algorithm takes as input a signing key $\mathsf{sk}$ and a list of messages $\mathcal{M} = \{\mu_i\}$, and outputs a list of signatures $\mathcal{S} = \{\mathsf{sig}_i\}$. We write this as $\mathcal{S} \leftarrow \mathsf{BSign}(\mathsf{sk}, \mathcal{M})$. The signing algorithm may be randomised or deterministic. We may write $\mathcal{S} \leftarrow \mathsf{BSign}(\mathsf{sk}, \mathcal{M}; r)$ to make the randomness used explicit.
Verify The verification algorithm takes as input a verification key $\mathsf{vk}$, a signature $\mathsf{sig}$ and a message $\mu$ and outputs a bit $b$, with $b = 1$ meaning the signature is valid and $b = 0$ meaning the signature is invalid. Verify is a deterministic algorithm. We write $b \leftarrow \mathsf{Verify}(\mathsf{vk}, \mathsf{sig}, \mu)$.
We require that, except with negligible probability over $(\mathsf{vk}, \mathsf{sk}) \leftarrow \mathsf{KeyGen}(1^\lambda)$, for all $\mathcal{M} := \{\mu_i\}$ and $\mathcal{S} \leftarrow \mathsf{BSign}(\mathsf{sk}, \{\mu_i\})$ it holds that $\forall\, \mathsf{sig}_i \in \mathcal{S} : \mathsf{Verify}(\mathsf{vk}, \mathsf{sig}_i, \mu_i) = 1$.

Definition 3.2 (EUF-CMA for Batch Signature Schemes). We define
\[ \mathsf{Adv}^{\text{euf-cma}}_{\mathcal{A},S}(\lambda) := \Pr[\text{BEUF-CMA}^{\mathcal{A}}_{S}(\lambda) \Rightarrow 1] \]
for $\text{BEUF-CMA}^{\mathcal{A}}_{S}(\lambda)$ as in Figure 3 and say a batch signature scheme $S$ is EUF-CMA secure if no PPT/BQP adversary $\mathcal{A}$ has non-negligible advantage $\mathsf{Adv}^{\text{euf-cma}}_{\mathcal{A},S}(\lambda)$.

The following proposition is immediate, by simply calling $\mathsf{Sign}$ for all $\mu_i \in \mathcal{M}$. We call this the naïve construction.

Proposition 3.3. Every EUF-CMA secure signature scheme can be turned into an EUF-CMA secure batch signature scheme.

We also define two privacy notions for batch signatures. These assert that no efficient adversary can distinguish whether signatures were signed in the same batch or not. A weak variant of privacy only guarantees that signatures from the same batch do not leak anything about a message for which no signature is made available.

Definition 3.4 ((Weak) Batch Privacy). We define
\[ \mathsf{Adv}^{\text{batch-priv}}_{\mathcal{A},S}(\lambda) := \Pr[\text{BATCH-PRIV}^{\mathcal{A}}_{S}(\lambda) \Rightarrow 1] - 1/2 \]
and
\[ \mathsf{Adv}^{\text{wbatch-priv}}_{\mathcal{A},S}(\lambda) := \Pr[\text{wBATCH-PRIV}^{\mathcal{A}}_{S}(\lambda) \Rightarrow 1] - 1/2 \]
for the games defined in Figure 5 and say a signature scheme $S$ has (weak) batch privacy if no PPT/BQP adversary $\mathcal{A}$ has non-negligible advantage $\mathsf{Adv}^{\text{(w)batch-priv}}_{\mathcal{A},S}(\lambda)$.

Our construction in Section 4 achieves wBATCH-PRIV but not BATCH-PRIV and thus establishes that there are schemes achieving the former but not the latter. Next, we establish that an adversary breaking wBATCH-PRIV can also break BATCH-PRIV.

Lemma 3.5. Let $\mathcal{A}$ be an adversary against wBATCH-PRIV with advantage $\mathsf{Adv}^{\text{wbatch-priv}}_{\mathcal{A},S}(\lambda)$. Then there is an adversary $\mathcal{B}$ against BATCH-PRIV with advantage
\[ \mathsf{Adv}^{\text{batch-priv}}_{\mathcal{B},S}(\lambda) \ge 1/2 \cdot \mathsf{Adv}^{\text{wbatch-priv}}_{\mathcal{A},S}(\lambda). \]
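  $\text{BATCH-PRIV}^{\mathcal{A}}_{S}(\lambda)$:
    $b \leftarrow_\$ \{0,1\}$
    $b^\star \leftarrow \mathcal{A}^{\mathsf{Sign}}(\mathsf{vk})$
    return $b^\star = b$
  $\mathsf{Sign}(\mathcal{M})$:
    if $b = 0$ then
      for $\mu_i \in \mathcal{M}$ do
        $\{\mathsf{sig}_i\} \leftarrow \mathsf{BSign}(\mathsf{sk}, \{\mu_i\})$
      $\mathcal{S} \leftarrow \{\mathsf{sig}_i\}_{0 \le i < |\mathcal{M}|}$
    else
      $\mathcal{S} \leftarrow \mathsf{BSign}(\mathsf{sk}, \mathcal{M})$
    return $\mathcal{S}$

  $\text{wBATCH-PRIV}^{\mathcal{A}}_{S}(\lambda)$:
    $b \leftarrow_\$ \{0,1\}$
    $b^\star \leftarrow \mathcal{A}^{\mathsf{Sign}}(\mathsf{vk})$
    return $b^\star = b$
  $\mathsf{Sign}(\mathcal{M}, i, \{\mu_0, \mu_1\})$:
    if $i \ge |\mathcal{M}| \vee i < 0$ then return $\bot$
    $\mathcal{M}_i \leftarrow \mu_b$   // $i$-th message is $\mu_b$
    $\mathcal{S} \leftarrow \mathsf{BSign}(\mathsf{sk}, \mathcal{M})$
    $\mathcal{S}_i \leftarrow \bot$   // delete $i$-th signature
    return $\mathcal{S}$
Figure 5: (Weak) Batch Privacy.

Proof. To construct the adversary $\mathcal{B}$ against BATCH-PRIV, we use the BATCH-PRIV signing oracle to simulate the call to $\mathsf{BSign}$. For this, we sample a bit $c$ to decide which set $\mathcal{M}$ to submit to the BATCH-PRIV signing oracle. When the adversary outputs $c^\star = c$ we output $b^\star = 1$, otherwise we output $b^\star = 0$.
To bound $\mathsf{Adv}^{\text{batch-priv}}_{\mathcal{B},S}(\lambda)$, note that if $b = 0$ the signatures returned by the BATCH-PRIV signing oracle are independent of $\mu_0$ and $\mu_1$ by construction, and thus the advantage of $\mathcal{A}$ is zero. If $b = 1$ then our signing oracle faithfully emulates the wBATCH-PRIV signing oracle. Thus,
\begin{align*}
\mathsf{Adv}^{\text{batch-priv}}_{\mathcal{B},S}(\lambda) &= \Pr[\text{BATCH-PRIV}^{\mathcal{B}}_{S}(\lambda) \Rightarrow 1] - 1/2 \\
&= 1/2 \cdot \left( \Pr[\text{wBATCH-PRIV}^{\mathcal{A}}_{S}(\lambda) \Rightarrow 1 \mid b = 0] - 1/2 \right) \\
&\quad + 1/2 \cdot \left( \Pr[\text{wBATCH-PRIV}^{\mathcal{A}}_{S}(\lambda) \Rightarrow 1 \mid b = 1] - 1/2 \right) \\
&= 0 + 1/2 \cdot \left( \Pr[\text{wBATCH-PRIV}^{\mathcal{A}}_{S}(\lambda) \Rightarrow 1 \mid b = 1] - 1/2 \right) \\
&= 0 + 1/2 \cdot \mathsf{Adv}^{\text{wbatch-priv}}_{\mathcal{A},S}(\lambda).
\end{align*}
□

Finally, we note that BATCH-PRIV is achievable:

Figure 6: A Merkle tree and addressing scheme. (The diagram shows the root $\rho$ above node $n_{3,0}$, internal nodes $n_{2,0}, n_{2,1}$ and $n_{1,0}, \ldots, n_{1,3}$, and leaves $\mu_0, \ldots, \mu_7$.)

Algorithm 1 $\mathsf{BSign}(\mathsf{sk}, M = [\mu_0, \mu_1, \ldots, \mu_{N-1}])$ for $N = 2^n$
  $T \leftarrow [\,]$
  $\mathsf{id} \leftarrow_\$ \{0,1\}^\lambda$   ⊲ Tree identifier
  for $0 \le i < N$ do   ⊲ Generate $N$ leaves
    $r_i \leftarrow_\$ \{0,1\}^\lambda$
    $T[0, i] \leftarrow H(\mathsf{id},\ 0 \mid i,\ r_i \mid \mu_i)$
  end for
  $h \leftarrow \log_2 N$
  for $0 \le k < h$ do
    for $0 \le j < 2^{h-k-1}$ do   ⊲ Build tree
      $\mathsf{left}, \mathsf{right} \leftarrow T[k, 2j],\ T[k, 2j+1]$
      ⊲ $\mathsf{id}$ is the public parameter, $(1 \mid (k+1) \mid j)$ is the tweak
      $T[k+1, j] \leftarrow H(\mathsf{id},\ 1 \mid (k+1) \mid j,\ \mathsf{left} \mid \mathsf{right})$
    end for
  end for
  $\mathsf{root} \leftarrow T[h, 0]$
  $\sigma \leftarrow S.\mathsf{Sign}(\mathsf{sk},\ \mathsf{id} \mid \mathsf{root} \mid N)$
  for $0 \le i < N$ do   ⊲ Generate user signature
    $\mathsf{path}_i \leftarrow [\,]$
    for $0 \le k < \log_2 N$ do
      $j \leftarrow \lfloor i / 2^k \rfloor$
      if $j \bmod 2 = 0$ then
        $\mathsf{path}_i[k] = T[k, j+1]$
      else
        $\mathsf{path}_i[k] = T[k, j-1]$
      end if
    end for
    $\mathsf{sig}_i \leftarrow (\mathsf{id}, N, \sigma, i, r_i, \mathsf{path}_i)$
  end for
  return $\{\mathsf{sig}_0, \mathsf{sig}_1, \ldots, \mathsf{sig}_{N-1}\}$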
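To make the data flow of Algorithm 1 concrete, the following Python sketch mirrors its steps and adds a matching verification routine. Several parts are assumptions made purely for illustration and are not taken from the paper: the tweakable hash `H` is instantiated with SHA-256 over length-prefixed inputs, the byte encodings of the tweaks (`leaf_tweak`, `node_tweak`) are arbitrary, `toy_sign`/`toy_verify` are an HMAC placeholder standing in for the base scheme S (it is symmetric and therefore not a signature scheme), and `bverify` is inferred from the structure of `sig_i` rather than copied from a specification.

```python
import hashlib
import hmac
import secrets

LAMBDA_BYTES = 16  # assumed length of id and r_i (128 bits)


def H(pub: bytes, tweak: bytes, msg: bytes) -> bytes:
    """Stand-in tweakable hash: SHA-256 over length-prefixed (pub, tweak, msg)."""
    h = hashlib.sha256()
    for part in (pub, tweak, msg):
        h.update(len(part).to_bytes(4, "big") + part)
    return h.digest()


def leaf_tweak(i: int) -> bytes:
    # Hypothetical encoding of the tweak (0 | i).
    return b"\x00" + i.to_bytes(4, "big")


def node_tweak(level: int, j: int) -> bytes:
    # Hypothetical encoding of the tweak (1 | level | j).
    return b"\x01" + level.to_bytes(1, "big") + j.to_bytes(4, "big")


def toy_sign(sk: bytes, msg: bytes) -> bytes:
    # Placeholder for S.Sign; a real deployment would call e.g. Falcon here.
    return hmac.new(sk, msg, hashlib.sha256).digest()


def toy_verify(vk: bytes, sigma: bytes, msg: bytes) -> bool:
    # Placeholder for S.Verify; vk equals sk in this symmetric stand-in.
    return hmac.compare_digest(sigma, toy_sign(vk, msg))


def bsign(sk: bytes, msgs: list) -> list:
    n = len(msgs)
    assert n >= 1 and n & (n - 1) == 0, "N must be a power of two"
    height = n.bit_length() - 1
    tree_id = secrets.token_bytes(LAMBDA_BYTES)
    rs = [secrets.token_bytes(LAMBDA_BYTES) for _ in msgs]
    # Level 0 holds the randomised leaves T[0, i].
    levels = [[H(tree_id, leaf_tweak(i), rs[i] + msgs[i]) for i in range(n)]]
    for k in range(height):
        prev = levels[k]
        levels.append([
            H(tree_id, node_tweak(k + 1, j), prev[2 * j] + prev[2 * j + 1])
            for j in range(len(prev) // 2)
        ])
    root = levels[height][0]
    sigma = toy_sign(sk, tree_id + root + n.to_bytes(4, "big"))
    sigs = []
    for i in range(n):
        # The sibling of node j at level k is j + 1 if j is even, else j - 1.
        path = [levels[k][(i >> k) ^ 1] for k in range(height)]
        sigs.append((tree_id, n, sigma, i, rs[i], path))
    return sigs


def bverify(vk: bytes, sig: tuple, msg: bytes) -> bool:
    tree_id, n, sigma, i, r, path = sig
    if n < 1 or n & (n - 1) or not 0 <= i < n:
        return False
    if len(path) != n.bit_length() - 1:
        return False
    node = H(tree_id, leaf_tweak(i), r + msg)
    for k, sibling in enumerate(path):
        j = i >> k
        pair = node + sibling if j % 2 == 0 else sibling + node
        node = H(tree_id, node_tweak(k + 1, j >> 1), pair)
    # The recomputed root must match what the base signature covers.
    return toy_verify(vk, sigma, tree_id + node + n.to_bytes(4, "big"))
```

In this sketch one call to `toy_sign` serves all N messages, and each per-message signature carries log2(N) sibling hashes in `path`, which is why larger trees lengthen the signature while amortising the base signing cost.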
(a) High-level overview of TLS 1.3 1-RTT handshake. Certificate Verify contains the signature.
(b) High-level overview of KEMTLS handshake.
These improvements come at a few minor extra costs in TLS: a slight increase in the overall batch signature sizes (adding between 82 and 98 bytes, as shown in Table 1) compared to the non-batch version of the signature, as well as a slight increase in latency (shown in Table 2) and in computation for the client, which is due to the hash function calls when building the Merkle tree.

6.2 Bandwidth Reduction
As noted previously, [4] proposes a reduction in certificate sizes through a new type of CA which would exclusively sign a new type of batch-oriented certificate. Such a CA would only be used together with a Certificate Transparency authority, which changes the usual required flows for authentication. The benefits obtained in certificate size and verification/signature costs are significant. It also implies that the main criteria for being in the same batch are: being signed by the same CA, and being signed at roughly the same time.
In this section we consider what benefits we can obtain in a simpler setting, supposing just that a usual CA and its users can use multiple signing algorithms, one of them being a batch signature. In this case multiple certificates will be in the same batch when a user considers this beneficial. We show how this can be beneficial (besides reducing the load of the CA). There are two flows in which we can expect bandwidth reduction for the entities with certificates belonging to the same signed batch.

6.2.1 First Flow: HSMs. The first flow is from the signing authority (again, typically, an HSM) to the entities corresponding to the same batch of signed certificates. Indeed, all the issued certificates have a signature that contains the same Merkle tree root signature but a different sibling path. Depending on the exact situation, it is then possible for the HSM to broadcast the root signature or the entire Merkle tree and remove from the certificates that are sent back any information that can be reconstructed from the broadcasts.

6.2.2 Second Flow: TLS. The second flow is from the entities holding the certificates signed in the same batch to the entities receiving those certificates. This typically happens in TLS when the server sends its certificate to the client. In that setting, if the client is going to interact over the lifetime of the certificates with multiple servers from the same batch group, it can inform the server that the tree root signature is already known (for example in TLS with a variation of the TLS Cached Information Extension, which allows a client to notify a server that some information is already known). The certificate can then contain just the sibling path as the signature, leading to a bandwidth usage reduction (between 1 KB and 3 KB if using Falcon or Dilithium, and up to 30 KB if using SPHINCS+).

6.3 Use-Cases
We consider here two more fleshed-out examples of situations where forming batches is natural and can produce significant gains.

6.3.1 Fleet of Load Balancers. To reduce downtime (e.g. because of a Denial-of-Service attack, server maintenance, etc.) and to improve scalability (e.g. to efficiently distribute requests geographically under heavy network traffic), load balancers are essential for most of today's web applications. At the same time, they often act as a TLS termination proxy, and as such decrypt, encrypt, and sign the incoming and outgoing HTTPS traffic to offload cryptographic computations from back-end server(s), see Figure 8. As a consequence, when cryptographic computations become more costly due to the transition to PQC, load balancers themselves may become a throughput bottleneck. Implementing batch signatures could significantly increase their workload capacity.
In this use-case, we consider a fleet of load balancers belonging to a large cloud provider that renew their certificates periodically (say weekly). They form a natural group in the certificate renewal process, and making a batch certificate signature request can significantly reduce the computational load on the associated HSMs, as described in Section 6.1.1, and the communications, as described at the beginning of this section. Most importantly, in such a setting, the fleet of load balancers would send full certificates only once per week and per user, and in all remaining connections load balancers will reduce the size of the certificates by 1 to 30 KB. From a user perspective, if major cloud providers use this system, a user will only
have to download full-sized certificates a few times per week (for connections served through cloud provider load balancers).
Instead of considering the bandwidth reduction, this can be seen as a usability question for schemes with large signatures such as SPHINCS+. This algorithm relies on mild assumptions, and thus is a good candidate for CAs. Unfortunately, increasing certificate sizes to tens of kilobytes can be considered too steep a requirement. Batch signing can solve this in the load balancer setting (and similar situations) as full certificates are only sent exceptionally.

Figure 8: Client–Load Balancer–Backend server connection with nested batch signatures.

Figure 9: Mesh network with envoys using nested signatures within mTLS.

6.3.2 mTLS Mesh. The second scenario is a large-scale container mesh network that renews the mutual TLS (mTLS) certificates in its Envoy instances. In this scenario, by considering all the Envoy instances as a single batch, there is no need to ever transmit the root signature between peers, as every peer has the root signature in its own certificate (see Figure 9). Of course, during the generation of the certificates we also benefit from the already described computational and communication reduction for the HSM, which can be quite significant for mesh networks with thousands or tens of thousands of containers. The storage requirements for the whole set of mesh certificates (in cached key servers that are used throughout mesh networks) are also greatly reduced, e.g. for a ten-thousand-container mesh using SPHINCS+ signatures they would be reduced from hundreds of megabytes to hundreds of kilobytes.

7 IMPLEMENTATION & BENCHMARK SETUP
As a demonstrator for batch signatures we implement them for one of the most important use cases, TLS, and set up a load balancer benchmark. Here we provide details on our implementation and on the benchmark setup.

7.1 Implementation details
There are a number of components that make up the overall implementation for our proof-of-concept batch signing experiments in TLS. These allow us to approximate realistic conditions for secure network communications and thus accurately learn how effective our construction is under such conditions. These components are:
• bsign_engine: the Rust implementation of batch signing, which includes Merkle tree building and batch signing functionalities. We also have a benchmarking wrapper to produce results.
• openssl: includes a fork of OpenSSL 1.1.1, using support from the liboqs library [25] and batchtls_engine.
• tcpserver: a TCP server, using openssl and bsign_engine.
The bsign_engine is the core component of our implementation. It is responsible for asynchronously gathering client requests into Merkle trees and signing the associated roots with a fixed number of signing threads.
We introduce two implementation-specific parameters, described below, which are used to tweak the performance of our implementation. This fine-tuning allows us to increase performance based on a number of factors, most importantly which signature algorithm is used, the server specifications, and the resulting latency.

Merkle Tree Size: This quantifies the number of messages handled per batch signing transaction, which thus affects the latency and throughput of the server. Having a larger number of clients (messages as tree leaves) will reduce the average computation time per client, but at the same time will increase the size of the final signature (due to the longer path) and increase median and worst-case latency. We provide batch signature sizes in Table 1.
We parameterise the Merkle tree using the number of leaves in the tree and commonly fix this to a power of two for efficiency reasons, e.g. MT_size = $2^5$, which produces a (balanced) tree of height 5. However, this parameter may need to be adapted to fit other hardware or performance constraints.

Signing Threads: These are responsible for taking ownership of a Merkle tree and for signing its root. While a signing thread is signing a Merkle tree root, it cannot handle the next tree. Thus, if the bsign_engine has a single signing thread, the worst observed latency will correspond to twice the time needed for a signature. By adding additional signing threads, the latency will get closer to the time needed for a signature. However, having too many signing threads may saturate the scheduling and may
slow down the engine, or become sub-optimal at the very least. Conversely, if we reduce the number of signing threads we also limit the number of cores used for signatures. In Table 2, we choose the number of signing threads/cores as the smallest number that significantly increases the number of handshakes (with respect to the plain implementation) with a low latency increase.
We use the open-source Rust library, ring [24], for our ECDSA implementation and the open-source C library, liboqs [25], for our implementations of post-quantum signatures. One reason for using this library in particular is that it has integration into OpenSSL 1.1.1 (see https://openquantumsafe.org/applications/tls.html), which we patch for our experiments to work with the bsign_engine. The patch modifies the state machine of OpenSSL to inject batch signing under certain conditions, but in such a way that it remains functional with classic TLS when these conditions are not met. More specifically, our patch adds a new structure, BATCHTLS_CTX, to the OpenSSL context, SSL_CTX, to track the context of the engine. We use environment variables for setting the parameters in bsign_engine for simplicity, as opposed to adding new APIs on top of SSL_CTX.
These components are the ingredients that make up our overall implementation for batch signing in TLS, once combined with a TCP server. Implementing it this way gives us a realistic environment in which to run our experiments, providing conditions that we would expect to see in the real world.

7.2 Benchmark Setup
We demonstrate the application of batch signatures for the TLS use case, with the results shown in Table 2. For this setup we took between one and four client machines and a single server machine. Each of them uses a Google Cloud C2 instance which has an Intel 3.9 GHz Cascade Lake processor. The specific instance type we used was the c2-standard-30, which offers 30 (virtual) CPU threads, 120 GB memory, and a (max) egress bandwidth of 32 Gbps. We disable hyper-threading in order to have more stable tests.
The results in Table 2 labelled as 'plain' are taken from a multi-threaded implementation (with a pool of as many threads as computer cores and with select/poll handling). For the batch signing results, we use a limit on the maximum size of the Merkle tree. The number of cores used for signatures is estimated for the plain approach from the computational cost of one signature, and fixed (by reserving cores explicitly) for the batch signing approach.
The results in Table 2 provide both handshakes per second (essentially, throughput) and latency (for various percentiles). We kept Merkle tree sizes to either 16 or 32, since larger sizes incurred much higher latency costs. Indeed, for large trees the latency added is unrealistic, despite throughput being increased.
The performance results are presented in Table 2 and discussed in the associated caption. Results for SPHINCS+ and a second benchmark in the HSM setting will be presented in a full version of this paper.

REFERENCES
[1] Kahraman Akdemir, Martin Dixon, Patrick Fay, Wajdi Feghali, Vinodh Gopal, Jim Guilford, Erdinc Ozturk, Gil Wolrich, and Ronen Zohar. 2010. Breakthrough AES Performance with Intel® AES New Instructions. Whitepaper. Intel.
[2] David Benjamin. 2020. Batch Signing for TLS. Internet-Draft draft-ietf-tls-batch-signing-00. Internet Engineering Task Force. https://datatracker.ietf.org/doc/draft-ietf-tls-batch-signing/00/ Work in Progress.
[3] David Benjamin. 2022. private communication. (2022).
[4] David Benjamin, D O'Brien, and Bas Westerbaan. 2023. Merkle Tree Certificates for TLS. Internet-Draft draft-davidben-tls-merkle-tree-certs-00. Internet Engineering Task Force. https://www.ietf.org/id/draft-davidben-tls-merkle-tree-certs-00.html Work in Progress.
[5] Daniel J. Bernstein, Andreas Hülsing, Stefan Kölbl, Ruben Niederhagen, Joost Rijneveld, and Peter Schwabe. 2019. The SPHINCS+ Signature Framework. In ACM CCS 2019, Lorenzo Cavallaro, Johannes Kinder, XiaoFeng Wang, and Jonathan Katz (Eds.). ACM Press, 2129–2146. https://doi.org/10.1145/3319535.3363229
[6] Daniel J Bernstein and Tanja Lange. 2018. SUPERCOP: System for unified performance evaluation related to cryptographic operations and primitives. https://bench.cr.yp.to/supercop.html. (2018).
[7] AWS CloudHSM. 2023. FAQS - Performance and capacity. https://aws.amazon.com/cloudhsm/faqs/#Performance_and_capacity. (2023). [Online; accessed 21-February-2023].
[8] Erik Dahmen, Katsuyuki Okeya, Tsuyoshi Takagi, and Camille Vuillaume. 2008. Digital Signatures Out of Second-Preimage Resistant Hash Functions. In Post-quantum cryptography, second international workshop, PQCRYPTO 2008, Johannes Buchmann and Jintai Ding (Eds.). Springer, Heidelberg, 109–123. https://doi.org/10.1007/978-3-540-88403-3_8
[9] Andrew Fregly, Joseph Harvey, Burton S. Kaliski Jr., and Swapneel Sheth. 2022. Merkle Tree Ladder Mode: Reducing the Size Impact of NIST PQC Signature Algorithms in Practice. Cryptology ePrint Archive, Report 2022/1730. (2022). https://eprint.iacr.org/2022/1730.
[10] Nicholas Genise and Daniele Micciancio. 2018. Faster Gaussian Sampling for Trapdoor Lattices with Arbitrary Modulus. In EUROCRYPT 2018, Part I (LNCS), Jesper Buus Nielsen and Vincent Rijmen (Eds.), Vol. 10820. Springer, Heidelberg, 174–203. https://doi.org/10.1007/978-3-319-78381-9_7
[11] Craig Gentry, Chris Peikert, and Vinod Vaikuntanathan. 2008. Trapdoors for hard lattices and new cryptographic constructions. In 40th ACM STOC, Richard E. Ladner and Cynthia Dwork (Eds.). ACM Press, 197–206. https://doi.org/10.1145/1374376.1374407
[12] Shay Gueron and Vlad Krasnov. 2012. Parallelizing message schedules to accelerate the computations of hash functions. Journal of Cryptographic Engineering 2, 4 (2012), 241–253.
[13] James Howe and Bas Westerbaan. 2022. Benchmarking and Analysing the NIST PQC Finalist Lattice-Based Signature Schemes on the ARM Cortex M7. Cryptology ePrint Archive, Report 2022/405. (2022). https://eprint.iacr.org/2022/405.
[14] Andreas Hulsing, Daniel J. Bernstein, Christoph Dobraunig, Maria Eichlseder, Scott Fluhrer, Stefan-Lukas Gazdag, Panos Kampanakis, Stefan Kolbl, Tanja Lange, Martin M Lauridsen, Florian Mendel, Ruben Niederhagen, Christian Rechberger, Joost Rijneveld, Peter Schwabe, Jean-Philippe Aumasson, Bas Westerbaan, and Ward Beullens. 2022. SPHINCS+. Technical Report. National Institute of Standards and Technology. available at https://csrc.nist.gov/Projects/post-quantum-cryptography/selected-algorithms-2022.
[15] Matt Klein. 2017. Lyft's Envoy: Experiences Operating a Large Service Mesh. SREcon17 Americas. (2017). available at https://www.usenix.org/sites/default/files/conference/protected-files/srecon17americas_slides_klein.pdf.
[16] Vadim Lyubashevsky, Léo Ducas, Eike Kiltz, Tancrède Lepoint, Peter Schwabe, Gregor Seiler, Damien Stehlé, and Shi Bai. 2022. CRYSTALS-DILITHIUM. Technical Report. National Institute of Standards and Technology. available at https://csrc.nist.gov/Projects/post-quantum-cryptography/selected-algorithms-2022.
[17] Daniele Micciancio and Chris Peikert. 2012. Trapdoors for Lattices: Simpler, Tighter, Faster, Smaller. In EUROCRYPT 2012 (LNCS), David Pointcheval and Thomas Johansson (Eds.), Vol. 7237. Springer, Heidelberg, 700–718. https://doi.org/10.1007/978-3-642-29011-4_41
[18] Christian Paquin, Douglas Stebila, and Goutam Tamvada. 2020. Benchmarking Post-quantum Cryptography in TLS. In Post-Quantum Cryptography - 11th International Conference, PQCrypto 2020, Jintai Ding and Jean-Pierre Tillich (Eds.). Springer, Heidelberg, 72–91. https://doi.org/10.1007/978-3-030-44223-1_5
[19] Thomas Pornin. 2019. New Efficient, Constant-Time Implementations of Falcon. Cryptology ePrint Archive, Report 2019/893. (2019). https://eprint.iacr.org/2019/893.
[20] Thomas Prest, Pierre-Alain Fouque, Jeffrey Hoffstein, Paul Kirchner, Vadim Lyubashevsky, Thomas Pornin, Thomas Ricosset, Gregor Seiler, William Whyte, and Zhenfei Zhang. 2020. FALCON. Technical Report. National Institute of Standards and Technology. available at https://csrc.nist.gov/projects/post-quantum-cryptography/post-quantum-cryptography-standardization/round-3-submissions.
[21] Thomas Prest, Pierre-Alain Fouque, Jeffrey Hoffstein, Paul Kirchner, Vadim Lyubashevsky, Thomas Pornin, Thomas Ricosset, Gregor Seiler, William Whyte, and Zhenfei Zhang. 2022. FALCON. Technical Report. National Institute of Standards and Technology. available at https://csrc.nist.gov/Projects/post-quantum-cryptography/selected-algorithms-2022.
[22] Peter Schwabe, Douglas Stebila, and Thom Wiggers. 2020. Post-Quantum TLS Without Handshake Signatures. In ACM CCS 2020, Jay Ligatti, Xinming Ou, Jonathan Katz, and Giovanni Vigna (Eds.). ACM Press, 1461–1480. https://doi.org/10.1145/3372297.3423350
[23] Dimitrios Sikeridis, Panos Kampanakis, and Michael Devetsikiotis. 2020. Post-Quantum Authentication in TLS 1.3: A Performance Study. In NDSS 2020. The Internet Society.
[24] Brian Smith. 2023. Crate ring. https://github.com/briansmith/ring. (2023). [Online; accessed 24-February-2023].
[25] Douglas Stebila and Michele Mosca. 2016. Post-quantum Key Exchange for the Internet and the Open Quantum Safe Project. In SAC 2016 (LNCS), Roberto Avanzi and Howard M. Heys (Eds.), Vol. 10532. Springer, Heidelberg, 14–37. https://doi.org/10.1007/978-3-319-69453-5_2
[26] Emin Topalovic, Brennan Saeta, Lin-Shung Huang, Collin Jackson, and Dan Boneh. 2012. Towards Short-Lived Certificates. In IEEE Oakland Web 2.0 Security and Privacy (W2SP).
[27] Shalanda D. Young. 2022. National Security Memo on Promoting United States Leadership in Quantum Computing While Mitigating Risks to Vulnerable Cryptographic Systems (NSM-10). Executive Office of the President, Office of Management and Budget, Washington, DC, USA. (2022). https://www.whitehouse.gov/wp-content/uploads/2022/11/M-23-02-M-Memo-on-Migrating-to-Post-Quantum-Cryptography.pdf