Formally Verified Correctness Bounds for Lattice-Based Cryptography
Manuel Barbosa
University of Porto (FCUP), Porto, Portugal
INESC TEC, Porto, Portugal

Matthias J. Kannwischer
Chelpis Quantum Corp, Taipei, Taiwan

Thing-han Lim
Academia Sinica, Taipei, Taiwan

Peter Schwabe
MPI-SP, Bochum, Germany
Radboud University, Nijmegen, The Netherlands

Pierre-Yves Strub
PQShield, Paris, France

Abstract
Decryption errors play a crucial role in the security of KEMs based on the Fujisaki-Okamoto transformation, because the concrete security guarantees provided by this transformation directly depend on the probability of such an event being bounded by a small real number. In this paper we present an approach to formally verify the claimed statistical bounds on the probability of incorrect decryption in lattice-based KEM constructions. Our main motivating example is the public-key encryption (PKE) scheme underlying ML-KEM. We formalize the statistical event that is used in the literature to heuristically approximate ML-KEM decryption errors and confirm that the upper bounds given in the literature for this event are correct. We consider FrodoKEM as an additional example, to demonstrate the wider applicability of the approach and the verification of a correctness bound without heuristic approximations. We also discuss other (non-approximate) approaches to bounding the probability of ML-KEM decryption errors.

CCS Concepts
• Security and privacy → Logic and verification; Cryptography.

Keywords
Computer-Aided Cryptography, Formal Verification, EasyCrypt

1 Introduction
The transition to post-quantum cryptography (PQC) saw an important development in 2024 with the publication by NIST of the first PQC standards: FIPS-203 [31], FIPS-204 [32] and FIPS-205 [33]. The standardized algorithms are called ML-KEM, ML-DSA and SLH-DSA, and match the Kyber, Dilithium and SPHINCS+ submissions, respectively, with minor changes. These algorithms will see large-scale deployment in the near future in many practical applications as mitigation for the potential arrival of a quantum computer. Key Encapsulation Mechanisms (KEMs) such as ML-KEM are arguably the most critical components in the PQC transition, as they protect against so-called harvest now, decrypt later attacks, in which an attacker records data exchanged today and decrypts it later with a future quantum computer. For this reason, ML-KEM is already being deployed by software giants such as Google and AWS [5, 12], and the number of deployed implementations is expected to grow fast in the near future. Another competitor in the NIST PQC competition for KEMs is called FrodoKEM. Although not selected by NIST for standardization, FrodoKEM's conservative design (its security is based on the standard Learning With Errors (LWE) assumption, rather than the Module LWE (MLWE) assumption used by ML-KEM) has led to its endorsement by entities such as the German Federal Office for Information Security (BSI) [23] and the French National Agency for the Security of Information Systems (ANSSI) [1] for adoption in the transition to PQC. Additionally, ISO/IEC has approved its standardization in the revision of ISO/IEC 18033-2 [24].

Widely deployed cryptographic (de facto) standards such as ML-KEM and FrodoKEM will be critical security components in the ITC infrastructure of the coming decades, and so it is crucial that their design is validated to the highest level of assurance. For ML-KEM, several recent works have looked at formally verifying both the design and efficient implementation of the standard. In particular, Almeida et al. [4] presented formally verified proofs of cryptographic security (IND-CCA) and correctness (the guarantee that decapsulation inverts encapsulation) in EasyCrypt. Alternative proofs of IND-CPA security and correctness were given by
Kreuzer [29] in Isabelle. However, in both of these works, there is one aspect of the security and correctness claims that support the ML-KEM design that is not formally verified: the concrete values for the probability of a failed decryption. Both works account for the probability of a failed decryption by defining a statistical event over the distribution of a complex noise expression, and then proving that bounding the probability of such an event yields an upper bound for decryption failures. However, neither work provides a means to compute or even upper-bound this concrete probability to a high level of assurance. In this paper we address this gap. We begin by recalling the importance of decryption errors in post-quantum KEM security.

The importance of decryption errors. Unlike Diffie–Hellman and RSA-based constructions, which typically yield perfectly correct cryptographic constructions, lattice-based constructions often allow for a low probability of error in order to optimize the compromise between security and performance. One might think that a decryption error would represent only an inconvenience for practical applications, e.g., in that it would cause message transmission to sometimes fail. However, it is well known that, when freely exposed to an adversary, decryption errors can lead to devastating attacks on lattice-based constructions [7, 9, 13, 14, 20–22]. Put differently, lattice-based KEM constructions such as ML-KEM and FrodoKEM are supported by IND-CCA security proofs where the overall bound on an attacker's advantage in breaking the KEM must typically account for the probability that the attacker can cause a decryption error to occur. This means that, in order to have a concrete security bound for the construction, one must bound the probability of a decryption error.

Intuitively, it is easy to explain why this is the case. Both ML-KEM and FrodoKEM internally use the Fujisaki–Okamoto [25] transformation, where IND-CCA security is achieved by having the decapsulation algorithm check consistency of a recovered decryption result via re-encryption. Informally, decapsulation checks that $C = \mathsf{Enc}(pk, M; H(M))$, where $M = \mathsf{Dec}(sk, C)$ and $H(M)$ is used to derive all randomness required by encryption pseudo-randomly. If the check succeeds, then decapsulation proceeds; otherwise the ciphertext is rejected. Indeed, correct decryption and re-encryption is taken as evidence that $C$ was honestly constructed by the adversary starting from $M$, rather than by mauling another ciphertext from which it is trying to extract information. The soundness of this technique crucially depends on the adversary not being able to exploit decryption errors, which is why the probability of a correctness error appears in the security bound for the IND-CCA construction.

Bounding the probability of decryption errors. Among the algorithms considered for the last round of the NIST PQC competition, four were very close in structure: Kyber [35], Saber [15], FrodoKEM [30], and NTRU LPRime [8].¹ All of these schemes start from a lattice-based IND-CPA encryption scheme and then apply the Fujisaki–Okamoto transform outlined above. However, while NTRU LPRime selects parameters avoiding decryption errors altogether, the other three proposals support the soundness of their designs and parameter choices by computing exact bounds for statistical events that permit setting upper bounds for the probability of a decryption failure of the IND-CPA scheme, which affects not only the decryption failure probability of the IND-CCA scheme, but also the corresponding security bound.² Interestingly, the bounds for all three schemes were computed using somewhat similar Python scripts, which trace their origins back to the script used to bound the failure probability of NewHope [3].³

¹ FrodoKEM and NTRU LPRime were not finalists, but were kept as alternate candidates.
² The results in this paper focus on the IND-CPA public-key encryption scheme subcomponents of the above algorithms. This means that we can talk interchangeably about Kyber (round 3) and ML-KEM, since there is no difference in their IND-CPA subcomponents. To avoid confusion, and because we believe this is where the interest lies for practical applications, we will mostly refer to ML-KEM from this point onwards when we talk about our results.
³ See https://github.com/newhopecrypto/newhope-usenix/blob/master/scripts/failure.py

For FrodoKEM, the computation performed by this script can be described as follows. The IND-CPA scheme decryption procedure of FrodoKEM recovers $M' = C_2 - C_1 S$, where $M'$, $C_1$ and $C_2$ are matrices of (binary) field elements and $M'$ encodes a message in the most significant bits of its entries. Here, $(C_1, C_2)$ are produced by the encryption procedure as $C_1 = S'A + E'$ and $C_2 = S'B + E'' + M$, where matrices $A$ and $B = AS + E$ are fixed by the public encryption key, the matrix $S$ is the secret key, and $S'$, $E$, $E'$ and $E''$ are noise matrices sampled from distributions with very small support: every finite field element produced by these distributions is an element close to 0 chosen from a small set of possibilities. A straightforward linear algebra argument shows that $\mathsf{Decode}(M') = \mathsf{Decode}(M)$ if the noise expression $E''' = S'E + E'' - E'S$ results in a matrix where all entries are field elements with a small norm, i.e., they are small enough that the entries in $M$ and $M'$ have the same most significant bits. The Python script brute-force computes the probability mass function of a coefficient in $E'''$ and computes the tail probability of a value exceeding the correctness threshold. The overall correctness bound follows from arguing that all entries in $E'''$, individually, have the same distribution, and computing a union bound. We note that these computations are performed using high-precision floating-point arithmetic and result in values of the order of $2^{-200}$.

The cases of ML-KEM and Saber are slightly more intricate due to the use of rounding, but the principle is the same. Prior to this work, the correctness of the above simplification steps (which are crucial to allow an efficient computation of the error), and therefore of the computed bounds that support the ML-KEM standard and the FrodoKEM and Saber proposals, had not been subject to formal verification.

Our Contributions. Our main contribution is an EasyCrypt formalization that permits connecting the formal definition of a decryption error for a KEM construction to an efficiently computable specification of a statistical event that provably yields an upper bound for this security-critical parameter. In more detail, our individual contributions are the following.
• We provide a framework to reason in EasyCrypt about distributions over a restricted class of matrix expressions, and to prove that the relevant events related to decryption errors can be expressed as a union bound over events that can be checked for only one of the matrix entries. We extend this result to cases where matrix entries are expressions in a certain class of polynomial rings, in which the event is checked for only one of the polynomial coefficients. This framework reduces the problem of
bounding the probability of decryption errors to the problem of comparing the absolute value of a finite field element sampled from a distribution to a fixed threshold.
• We propose an approach to connect EasyCrypt specifications of probability bounds as above to OCaml computations that are guaranteed by construction to provide a concrete upper bound. We then build on this feature to compute upper bounds for decryption failure probabilities for FrodoKEM and ML-KEM. The algorithm has a reasonable execution time whenever the distributions have a simple description and small enough support. On a modest personal machine, the more costly computations we performed for FrodoKEM take a few hours to complete. Our results can be seen as a formally verified implementation of the Python scripts used to obtain the upper bounds presented in the NIST post-quantum submissions.
• We show that, for FrodoKEM, our EasyCrypt development permits connecting the formal definition of correctness to a fully concrete correctness bound, where all statistical terms can be computed: our correctness theorem relates the adversary's advantage in winning the correctness game for the KEM to a description of the computation required to determine the probability value, which is then carried out in OCaml. As a side contribution, we give a computer-verified security proof for the IND-CPA component of FrodoKEM that goes down to a variant of the standard LWE problem (rather than MLWE as in ML-KEM [10] or LWR as in Saber [27]). This proof is similar in structure to those given in [4, 27, 29] but, to the best of our knowledge, such a proof had not been previously verified. In particular, our proof includes a hybrid argument that reduces the LWE problem to the multi-instance LWE problem required for FrodoKEM.
• We revisit the formally verified correctness proofs for ML-KEM in [4, 29] and resolve one of the proof goals left for future work: formally verifying that the simplified (heuristic) computations for the correctness bounds given in the documentation that supported this algorithm in the NIST PQC competition are correct. This shows the generality of our method and extends the formal verification results for ML-KEM [4] to cover all the correctness claims that supported it in the NIST competition. We also provide a new (more conservative) bound for ML-KEM decryption errors that can be justified under the MLWE assumption, i.e., we prove that this bound is correct unless MLWE can be broken.⁴

⁴ Proving the claim that the heuristic bound, which is computed over a simplified distribution, applies to ML-KEM is an open problem. Our new bound provably applies, but it is significantly larger than the heuristic one.

Related Work. Two previous works presented formally verified proofs of security and correctness for ML-KEM [4, 29]. Although these works covered security and correctness guarantees, neither of them addressed the problem of proving that the concrete bounds for decryption failures claimed for the construction hold. We are not aware of prior work formally verifying any of the FrodoKEM security and correctness claims.

The impact of decryption failures in lattice-based KEM security has been studied in the literature from two perspectives: a provable security perspective, and an attack perspective. In this work we are interested in the provable security perspective, i.e., how one can obtain a concrete (formally verified) bound for the decryption error probability that is relevant for the setting of parameters of lattice-based KEMs. Our results confirm that, for FrodoKEM, this is easy to do, whereas for ML-KEM obtaining a formal proof comes at a cost of significantly overestimating the probability.⁵

⁵ We do not exclude that a better provably secure bound can be established using different techniques, but we leave this as an open problem.

Alternatively, Hövelmanns, Hülsing, and Majenz [26] observe that the notion of cryptographic correctness (i.e., absence of decryption failures) used in Fujisaki-Okamoto security proofs may be too strong, in that it requires the bound to hold against an adversary that learns the secret key. The authors propose an alternative (weaker) definition that removes this requirement, but fundamentally modifies the way in which decryption failures are estimated: one needs to bound the difference in probability of failure with respect to another key pair. We are not aware of concrete bounds computed for these definitions, but it is an interesting direction for future work to formally verify their correctness. In this work we therefore work with the more standard (stronger) notion of decryption failure probability and study how bounds can be formally verified for ML-KEM and FrodoKEM.

A sequence of works [13, 14, 16, 18, 19] studies the potential of exploiting decryption failures in lattice-based schemes, and Saber and Kyber in particular, in both single and multi-target scenarios. These works also investigate how to obtain good estimates for decryption failure probabilities, and various estimation techniques are proposed to deal with correlations between rounding noise across coefficients. In particular, these works point out that assuming independence across coefficients may be overly optimistic. We work in the simpler setting of single-key attack models, and consider only the most basic technique for approximate probability estimation in ML-KEM, which consists of assuming that all rounding errors across coefficients are independent. This was the approach used in the Kyber submission to NIST. We leave it as an interesting direction for future work to formally verify the correctness of other approximate estimation techniques. The impact of decryption failures in other families of cryptographic constructions has also been studied, e.g. in [36] for code-based cryptography, but these analyses are so far out of reach of our formal framework.

Structure of this paper. In Section 2 we provide some necessary background on ML-KEM, FrodoKEM, and EasyCrypt. Then in Section 3 and in Section 4 we describe the proofs that were formally verified in EasyCrypt. Finally, in Section 5 we discuss our approach to computing upper bounds in a formally verified way, and present our results for ML-KEM and FrodoKEM.

Access to development. The EasyCrypt and OCaml code described in this paper is submitted as supplementary material.

2 Preliminaries
We now briefly discuss the mechanized reasoning tools we use for our proofs and give an overview of the IND-CPA encryption schemes that underlie the FrodoKEM and ML-KEM constructions, which is all that we need to present our work on formally verifying the correctness bounds for both schemes. The cryptographic definitions used are standard and we try to keep our presentation
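of the constructions close to the specifications of the algorithms found in the literature [30, 35].

Algorithm 1 K-PKE.Gen(): key generation
Ensure: Secret key $sk \in R_q^k$ and public key $pk \in \hat{R}_q^k \times \{0,1\}^{256}$
1: $d \leftarrow_\$ \{0,1\}^{256}$
2: $(\rho, \sigma) \leftarrow G(d)$
3: $\hat{\mathbf{A}} \leftarrow \mathsf{Parse}(\mathsf{XOF}(\rho))$
4: $\mathbf{s}, \mathbf{e} \leftarrow \mathsf{CBD}_{\eta_1}(\mathsf{PRF}(\sigma))$    ⊲ $\mathbf{s}, \mathbf{e} \in R_q^k$
5: $\hat{\mathbf{s}} \leftarrow \mathsf{NTT}(\mathbf{s})$
6: $\hat{\mathbf{e}} \leftarrow \mathsf{NTT}(\mathbf{e})$
7: $\hat{\mathbf{t}} \leftarrow \hat{\mathbf{A}}\hat{\mathbf{s}} + \hat{\mathbf{e}}$
8: return $sk = \hat{\mathbf{s}}$ and $pk = (\hat{\mathbf{t}}, \rho)$

Algorithm 2 K-PKE.Enc(pk, $m$): encryption
Require: Public key $pk = (\hat{\mathbf{t}}, \rho) \in R_q^k \times \{0,1\}^{256}$, message $m \in \{0,1\}^{256}$
Ensure: Ciphertext $c \in R_{d_u}^k \times R_{d_v}$
1: $r \leftarrow_\$ \{0,1\}^{256}$
2: $\hat{\mathbf{A}} \leftarrow \mathsf{Parse}(\mathsf{XOF}(\rho))$
3: $\mathbf{r} \leftarrow \mathsf{CBD}_{\eta_1}(\mathsf{PRF}(r))$    ⊲ $\mathbf{r} \in R_q^k$
4: $\mathbf{e}_1, e_2 \leftarrow \mathsf{CBD}_{\eta_2}(\mathsf{PRF}(r))$    ⊲ $\mathbf{e}_1 \in R_q^k$, $e_2 \in R_q$
5: $\hat{\mathbf{r}} \leftarrow \mathsf{NTT}(\mathbf{r})$
6: $\mathbf{u} \leftarrow \mathsf{NTT}^{-1}(\hat{\mathbf{A}}^T \hat{\mathbf{r}}) + \mathbf{e}_1$
7: $v \leftarrow \mathsf{NTT}^{-1}(\hat{\mathbf{t}}^T \hat{\mathbf{r}}) + e_2 + \mathsf{ToPoly}(m)$
8: $\mathbf{c}_1 \leftarrow \mathsf{Compress}_q(\mathbf{u}, d_u)$
9: $c_2 \leftarrow \mathsf{Compress}_q(v, d_v)$
10: return $c = (\mathbf{c}_1, c_2)$

Algorithm 3 K-PKE.Dec(sk, $c$): decryption
Require: Secret key $sk = \hat{\mathbf{s}} \in R_q^k$ and ciphertext $c = (\mathbf{c}_1, c_2) \in R_{d_u}^k \times R_{d_v}$
Ensure: Message $m \in \{0,1\}^{256}$
1: $\tilde{\mathbf{u}} \leftarrow \mathsf{Decompress}_q(\mathbf{c}_1, d_u)$
2: $\tilde{v} \leftarrow \mathsf{Decompress}_q(c_2, d_v)$
3: $m \leftarrow \mathsf{ToMsg}(\tilde{v} - \mathsf{NTT}^{-1}(\hat{\mathbf{s}}^T \mathsf{NTT}(\tilde{\mathbf{u}})))$
4: return $m$

Game COR:
1: $(pk, sk) \leftarrow_\$ \mathsf{Gen}^{O}()$
2: $m \leftarrow_\$ \mathcal{A}^{O}(pk, sk)$
3: $c \leftarrow_\$ \mathsf{Enc}^{O}(pk, m)$
4: return $(m \neq \mathsf{Dec}^{O}(sk, c))$

Game IND-CPA:
1: $(pk, sk) \leftarrow_\$ \mathsf{Gen}^{O}()$
2: $(m_0, m_1, st) \leftarrow_\$ \mathcal{A}_1^{O}(pk)$
3: $b \leftarrow_\$ \{0,1\}$
4: $c^* \leftarrow_\$ \mathsf{Enc}^{O}(pk, m_b)$
5: $b' \leftarrow_\$ \mathcal{A}_2^{O}(c^*, st)$
6: return $b' = b$

Figure 1: Correctness and Security of a PKE in the Random Oracle Model.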
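
As a rough illustration of the correctness game in Figure 1 (left), the following Python sketch runs the game against a toy one-dimensional LWE-style scheme and also computes the failure probability exactly by enumerating the noise term left over after decryption. Everything in it is an assumption made for illustration: the modulus `Q = 16`, the uniform noise support `NOISE`, and the helper names (`gen`, `enc`, `dec`, `cor_game`, `exact_failure_probability`, `estimate_failure_rate`) are hypothetical and are not taken from ML-KEM, FrodoKEM, or the EasyCrypt development. The toy scheme is insecure by design and omits the random oracle.

```python
import random
from fractions import Fraction
from itertools import product

Q = 16                      # toy modulus (assumption, far smaller than real schemes)
NOISE = (-2, -1, 0, 1, 2)   # toy uniform noise support (assumption)


def encode(bit):
    """Place the message bit in the most significant bit of Z_Q."""
    return bit * (Q // 2)


def decode(x):
    """Round x in Z_Q to the nearest multiple of Q/2 and return the bit."""
    return ((2 * (x % Q) + Q // 2) // Q) % 2


def gen(rng):
    a = rng.randrange(Q)
    s, e = rng.choice(NOISE), rng.choice(NOISE)
    return (a, (a * s + e) % Q), s            # pk = (a, b), sk = s


def enc(pk, bit, rng):
    a, b = pk
    s1, e1, e2 = (rng.choice(NOISE) for _ in range(3))
    return (a * s1 + e1) % Q, (b * s1 + e2 + encode(bit)) % Q


def dec(sk, c):
    c1, c2 = c
    return decode(c2 - c1 * sk)


def cor_game(adversary, rng):
    """One run of the COR game: True means a decryption failure."""
    pk, sk = gen(rng)
    m = adversary(pk, sk)      # the adversary is given the secret key
    c = enc(pk, m, rng)
    return m != dec(sk, c)


def exact_failure_probability(bit):
    """Enumerate all noise choices; dec sees encode(bit) + e*s1 + e2 - e1*s."""
    fails = 0
    for s, e, s1, e1, e2 in product(NOISE, repeat=5):
        if decode(encode(bit) + e * s1 + e2 - e1 * s) != bit:
            fails += 1
    return Fraction(fails, len(NOISE) ** 5)


def estimate_failure_rate(adversary, runs=20000, seed=1):
    """Monte Carlo estimate of Pr[COR => 1] for a given adversary."""
    rng = random.Random(seed)
    return sum(cor_game(adversary, rng) for _ in range(runs)) / runs
```

In this toy the failure event depends only on the noise term, so the adversary's choice of message does not change the probability, and the sampled estimate should land close to the enumerated value. For real parameter sets the failure probability is far too small to observe by sampling, which is why an exact computation over the noise distribution is needed.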

2.1 Public-Key Encryption

Syntax. A public-key encryption scheme consists of three algorithms $\mathsf{PKE} = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ and a finite message space $\mathcal{M}$. The key generation algorithm $\mathsf{Gen}$ outputs a key pair $(pk, sk)$. The encryption algorithm $\mathsf{Enc}$, on input $pk$ and a message $m \in \mathcal{M}$, outputs a ciphertext $c \leftarrow_\$ \mathsf{Enc}(pk, m)$. The decryption algorithm $\mathsf{Dec}$, on input $sk$ and a ciphertext $c$, outputs either a message $m \leftarrow \mathsf{Dec}(sk, c) \in \mathcal{M}$ or a special symbol $\bot \notin \mathcal{M}$.

Correctness. Correctness of a PKE is defined as in Figure 1 (left). We give the definition in the Random Oracle Model, as this is what we are going to use. Note that the adversary gets the secret key as an input. We say a PKE is $\delta$-correct if, for all (possibly computationally unbounded) adversaries $\mathcal{A}$ placing at most $q$ queries to the random oracle, we have that $\Pr[\mathsf{COR}^{\mathcal{A}}_{\mathsf{PKE}} \Rightarrow 1] \le \delta(q)$.

Security. In this paper we are only considering IND-CPA security. We define the IND-CPA game as in Figure 1 (right), and the IND-CPA advantage function of an adversary $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$ against $\mathsf{PKE}$ as

$$\mathsf{Adv}^{\text{IND-CPA}}_{\mathsf{PKE}}(\mathcal{A}) = \left| \Pr[\text{IND-CPA}^{\mathcal{A}}_{\mathsf{PKE}} \Rightarrow 1] - 1/2 \right| .$$

2.2 The IND-CPA PKE underlying ML-KEM
We give a high-level algorithmic description of K-PKE, the IND-CPA-secure public-key encryption scheme underlying ML-KEM, in Algorithms 1 to 3. For a more implementation-oriented description that operates on byte arrays, see [31, Algs. 12–14].

ML-KEM works in the ring $R_q = \mathbb{Z}_q[X]/(X^n + 1)$ with $q = 3329$ and $n = 256$. The core operations are on small-dimension vectors and matrices over $R_q$; the dimension depends on the parameter $k$, which is different for different parameter sets of ML-KEM: ML-KEM-512 (NIST security level 1) uses $k = 2$, ML-KEM-768 (NIST security level 3) uses $k = 3$, and ML-KEM-1024 (NIST security level 5) uses $k = 4$. We denote elements in $R_q$ with regular lower-case letters (e.g., $v$), vectors over $R_q$ with bold-face lower-case letters (e.g., $\mathbf{u}$), and matrices over $R_q$ with bold-face upper-case letters (e.g., $\mathbf{A}$).

In these descriptions, XOF is an extendable output function that in ML-KEM is instantiated with SHAKE-128 [34]. Parse interprets outputs of XOF as a sequence of 12-bit unsigned integers and runs rejection sampling to obtain coefficients that look uniformly random modulo $q$. $\mathsf{CBD}_\eta$ denotes sampling coefficients from a centered binomial distribution with parameter $\eta$;⁶ extension from coefficients to (vectors of) polynomials is done by sampling each coefficient independently from $\mathsf{CBD}_\eta$. For example, both ML-KEM-768 and ML-KEM-1024 use $\eta_1 = \eta_2 = 2$. The sampling routine is parameterized by a pseudorandom function $\mathsf{PRF}_k$ with key $k$. NTT is the number-theoretic transform of a polynomial in $R_q$. Both input and output of NTT can be written as a sequence of 256 coefficients in $\mathbb{Z}_q$, and typical implementations perform the transform in place. However, output coefficients do not have any meaning as a polynomial in $R_q$. We therefore denote the output domain as $\hat{R}_q$; we apply the same notation for elements in $\hat{R}_q$, e.g., $\hat{u} = \mathsf{NTT}(u)$. Application of NTT to vectors and matrices over $R_q$ is done element-wise.

⁶ This means we have $B(n, p)$ with $p = 1/2$, $n = 2\eta$ and expected value shifted to 0.
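
To make the centered binomial distribution concrete, here is a small Python sketch that enumerates the distribution of a single coefficient and checks it against the shifted binomial $B(2\eta, 1/2)$ described in footnote 6. It assumes that a coefficient is generated as the difference of two sums of $\eta$ fair bits, which is the usual way such samples are produced; the function names `cbd_pmf` and `shifted_binomial_pmf` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def cbd_pmf(eta):
    """Exact pmf of (a_1 + ... + a_eta) - (b_1 + ... + b_eta) over fair bits."""
    pmf = {}
    weight = Fraction(1, 2 ** (2 * eta))      # each bit string is equally likely
    for bits in product((0, 1), repeat=2 * eta):
        x = sum(bits[:eta]) - sum(bits[eta:])
        pmf[x] = pmf.get(x, 0) + weight
    return pmf


def shifted_binomial_pmf(eta):
    """pmf of B(2*eta, 1/2) with its expected value shifted to 0."""
    n = 2 * eta
    return {j - eta: Fraction(comb(n, j), 2 ** n) for j in range(n + 1)}


if __name__ == "__main__":
    for value, prob in sorted(cbd_pmf(2).items()):
        print(value, prob)
```

Exact rational arithmetic is used here instead of floating point so that the two distributions can be compared for equality; the support is tiny ($2\eta + 1$ values), which is what makes exhaustive computations over such noise feasible.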
$\mathsf{Compress}_q$ compresses elements in $R_q$ (or $R_q^k$) by rounding coefficients to a smaller modulus $2^{d_v}$ (or $2^{d_u}$). For ML-KEM-768 we have $d_v = 4$ and $d_u = 10$. For ML-KEM-1024 we have $d_v = 5$ and $d_u = 11$. $\mathsf{Decompress}_q$ is an approximate inverse of $\mathsf{Compress}_q$. For an integer $x \in [0..3329)$, these functions are defined as:

$$\mathsf{Compress}_q(x, d) = \lfloor (2^d/q) \cdot x \rceil \bmod 2^d$$
$$\mathsf{Decompress}_q(x, d) = \lfloor (q/2^d) \cdot x \rceil .$$

ToPoly maps 256-bit strings to elements in $R_q$ by mapping a zero bit to a zero coefficient and mapping a one bit to a $q/2$ coefficient; ToMsg rounds coefficients to bits to recover a message from a noisy version of a polynomial generated by ToPoly.

2.3 The IND-CPA PKE underlying FrodoKEM
FrodoKEM is based on the algebraically unstructured LWE problem. It uses an error distribution that closely approximates a wide Gaussian distribution, parameterized to guarantee that the best known attacks on the resulting LWE instance require a computational effort that is well beyond the one mandated by the target security level. In addition, FrodoKEM is designed with simplicity in mind [2], as evidenced by: (1) its use of an integer modulus $q \le 2^{16}$, which is always a power of 2; (2) the main operations in the scheme consisting of simple matrix-vector multiplications, unlike the more complex operations in systems based on algebraically structured LWE variants; and (3) its straightforward encoding of secret bits by multiplying by $q/2^B$ (for $B$ bits), avoiding the complex bandwidth-saving optimizations required by some Ring-LWE-based and Module-LWE-based schemes.

FrodoKEM is parameterized by the pseudorandom function (PRF) used to generate the public matrix $\mathbf{A}$. Two options are available for generating $\mathbf{A}$: AES-128 and SHAKE-128.

2.3.1 Technical description of FrodoKEM. We give a high-level description of the IND-CPA PKE scheme underlying FrodoKEM in Algorithm 4, Algorithm 5 and Algorithm 6. FrodoKEM works under a quotient ring $\mathbb{Z}_q$ and the main operations are on matrices over $\mathbb{Z}_q$, where FrodoKEM-640 uses $q = 2^{15}$ while FrodoKEM-976 and FrodoKEM-1344 use $q = 2^{16}$. Gen generates a pseudorandom matrix $\mathbf{A}$ either by SHAKE128 or AES128. The SHA-3-derived extendable output function SHAKE is either SHAKE128 or SHAKE256, determined by the parameter set (FrodoKEM-640 uses SHAKE128, while FrodoKEM-976 and FrodoKEM-1344 use SHAKE256). The SampleMatrix function samples an $n_1$-by-$n_2$ matrix with each entry sampled from the error distribution $\chi$, which is a discrete and symmetric distribution centered at zero and closely approximating a moderately wide Gaussian distribution (denoted as $\chi_{\text{FrodoKEM-640}}$, $\chi_{\text{FrodoKEM-976}}$, $\chi_{\text{FrodoKEM-1344}}$ for the 3 NIST security levels respectively).

The Encode function encodes a bit string into a matrix and the Decode function decodes a matrix into a bit string using the following encoding and decoding functions, given $2^B \le q$ and $0 \le k < 2^B$:

$$\mathsf{Encode}(k) = k \cdot q/2^B \qquad \mathsf{Decode}(c) = \lfloor c \cdot 2^B/q \rceil \bmod 2^B$$

Algorithm 4 FrodoPKE.Gen(): key generation
Ensure: Key pair $(pk, sk) \in (\{0,1\}^{len_{\mathsf{seedA}}} \times \mathbb{Z}_q^{n \times \bar{n}}) \times \mathbb{Z}_q^{\bar{n} \times n}$
1: $\mathsf{seedA} \leftarrow_\$ \{0,1\}^{len_{\mathsf{seedA}}}$
2: $\mathbf{A} \leftarrow \mathsf{Gen}(\mathsf{seedA})$
3: $\mathsf{seedSE} \leftarrow_\$ \{0,1\}^{len_{\mathsf{seedSE}}}$
4: $(\mathbf{r}^{(0)}, \ldots, \mathbf{r}^{(2n\bar{n}-1)}) \leftarrow \mathsf{SHAKE}(\mathtt{0x5F} \,\|\, \mathsf{seedSE},\ 2n\bar{n} \cdot len_\chi)$
5: $\mathbf{S}^T \leftarrow \mathsf{SampleMatrix}((\mathbf{r}^{(0)}, \ldots, \mathbf{r}^{(n\bar{n}-1)}), \bar{n}, n, T_\chi)$
6: $\mathbf{E} \leftarrow \mathsf{SampleMatrix}((\mathbf{r}^{(n\bar{n})}, \ldots, \mathbf{r}^{(2n\bar{n}-1)}), n, \bar{n}, T_\chi)$
7: $\mathbf{B} = \mathbf{A}\mathbf{S} + \mathbf{E}$
8: return $(pk, sk) \leftarrow ((\mathsf{seedA}, \mathbf{B}), \mathbf{S}^T)$

Algorithm 5 FrodoPKE.Enc(pk, $\mu$): encryption
Require: Public key $pk = (\mathsf{seedA}, \mathbf{B}) \in \{0,1\}^{len_{\mathsf{seedA}}} \times \mathbb{Z}_q^{n \times \bar{n}}$ and message $\mu \in \mathcal{M}$
Ensure: Ciphertext $c = (\mathbf{C}_1, \mathbf{C}_2) \in \mathbb{Z}_q^{\bar{m} \times n} \times \mathbb{Z}_q^{\bar{m} \times \bar{n}}$
1: $\mathbf{A} \leftarrow \mathsf{Gen}(\mathsf{seedA})$
2: $\mathsf{seedSE} \leftarrow_\$ \{0,1\}^{len_{\mathsf{seedSE}}}$
3: $(\mathbf{r}^{(0)}, \ldots, \mathbf{r}^{(2\bar{m}n + \bar{m}\bar{n} - 1)}) \leftarrow \mathsf{SHAKE}(\mathtt{0x96} \,\|\, \mathsf{seedSE},\ (2\bar{m}n + \bar{m}\bar{n}) \cdot len_\chi)$
4: $\mathbf{S}' \leftarrow \mathsf{SampleMatrix}((\mathbf{r}^{(0)}, \ldots, \mathbf{r}^{(\bar{m}n-1)}), \bar{m}, n, T_\chi)$
5: $\mathbf{E}' \leftarrow \mathsf{SampleMatrix}((\mathbf{r}^{(\bar{m}n)}, \ldots, \mathbf{r}^{(2\bar{m}n-1)}), \bar{m}, n, T_\chi)$
6: $\mathbf{E}'' \leftarrow \mathsf{SampleMatrix}((\mathbf{r}^{(2\bar{m}n)}, \ldots, \mathbf{r}^{(2\bar{m}n + \bar{m}\bar{n} - 1)}), \bar{m}, \bar{n}, T_\chi)$
7: $\mathbf{B}' = \mathbf{S}'\mathbf{A} + \mathbf{E}'$
8: $\mathbf{V} = \mathbf{S}'\mathbf{B} + \mathbf{E}''$
9: return $c \leftarrow (\mathbf{C}_1, \mathbf{C}_2) = (\mathbf{B}', \mathbf{V} + \mathsf{Encode}(\mu))$

Algorithm 6 FrodoPKE.Dec(sk, c): decryption
Require: Ciphertext $c = (\mathbf{C}_1, \mathbf{C}_2) \in \mathbb{Z}_q^{\bar{m} \times n} \times \mathbb{Z}_q^{\bar{m} \times \bar{n}}$ and secret key $sk = \mathbf{S}^T \in \mathbb{Z}_q^{\bar{n} \times n}$
Ensure: Decrypted message $\mu' \in \mathcal{M}$
1: $\mathbf{M} = \mathbf{C}_2 - \mathbf{C}_1 \mathbf{S}$
2: return message $\mu' \leftarrow \mathsf{Decode}(\mathbf{M})$

2.4 The EasyCrypt proof assistant
EasyCrypt [6] (https://easycrypt.info) is a proof assistant for formalizing proofs of cryptographic properties. Its primary feature is the Probabilistic Relational Hoare Logic (pRHL), which we use throughout to prove equivalences between games. pRHL is designed to support reasoning about equivalences of probabilistic programs while reasoning only locally (within oracles) and without reasoning about the distribution of specific variables: it essentially keeps track only of the fact that variables in one program are distributed identically to variables in the other, but not of what that distribution may be. This logic has proved highly expressive for the bulk of cryptographic proof work. However, some steps require more global reasoning (about the entire execution) or keeping track of the distribution of individual variables. Logical rules to support such reasoning steps are implemented in EasyCrypt, but are often unwieldy to apply in a concrete context. The EasyCrypt team has, over the years, developed a number of generic libraries that abstract those more complex reasoning rules into "game transformations" or
equivalence results that can be instantiated as part of other proofs. Our proof makes use, in particular, of the Hybrid theory, which provides a formalized and generic argument for bounding the distance between two games that differ only in one oracle, but where the transition must be done query-by-query for the purpose of the proof. We also rely on the PROM theory, which provides a generic argument, initially intended to apply to programmable random oracles, that encapsulates the widely used argument that one can move the sampling of a value that is independent of the adversary's view.

3 Analysis of K-PKE
All the results stated in this section are formally verified in EasyCrypt. Our proofs are conducted over the simplified specification of the PKE shown in Figure 2. We factor out the encoding and decoding of ring elements (and vectors thereof) to operators ★_encd and ★_decd. For public keys and secret keys, these operators are simply assumed to form a bijection, which was formally proved in [4], and implies that they play no role in the security and correctness analyses. For ciphertexts and messages, we will see that the definitions of encoding and decoding vary with different variants of ML-KEM. However, one can express all results generically so as to cover all variants, and only deal with a full instantiation when a computation of a probability is needed. We assume an arbitrary distribution SD over seeds, and consider the case where all the ring elements are sampled from the same binomial distribution $\mathcal{B}$ (shown as $\mathcal{B}^k$ when applied to vectors).⁸ Finally, the matrix $\mathbf{A}$ is taken as the output of a random oracle.

We justify the simplifications in this specification as follows. In practice, the sampling procedure for $\mathbf{A}$ is public, so there is really no way to argue that $\mathbf{A}$ has a distribution that looks uniform to an adversary. However, modeling the sampling procedure as a random oracle allows formally relating the security of ML-KEM to the standard MLWE problem, and it is aligned with the intuition of using SHA-3-based rejection sampling to compress $\mathbf{A}$ into a small seed. In our analysis we will also take $\mathcal{B}$ to be the binomial distribution, rather than the SHA-3-based sampling procedure used in practice. It was proved in [4] that this procedure generates noise that is computationally close to the binomial distribution if the specific variant of SHA-3 used in that process is a secure PRF, and this holds statistically in the random oracle model. To summarize: our results rely heavily on the random oracle heuristic to justify that the distributions over which we perform the computations are good

the need for and difficulties associated with building a reduction to MLWE when reasoning about correctness in the rest of the section.

The proof is carried out in the random oracle model (ROM) in two steps. We first define the MLWE problem in the ROM in the natural way: the adversary has access to a random oracle mapping a seed to a matrix $\mathbf{A}$, and receives as challenge a vector $\mathbf{t} = \mathbf{A}\mathbf{s} + \mathbf{e}$ and a random seed $sd$ that was used by the challenger to retrieve $\mathbf{A}$ from the random oracle. A simple reduction permits proving that distinguishing $\mathbf{t}$ from a vector sampled uniformly at random is equivalent to the standard MLWE problem where $\mathbf{A}$ is given directly to the adversary as part of the challenge: the reduction just lazily simulates the RO itself, programming $\mathbf{A}$ in the point defined by $sd$.

The IND-CPA proof then proceeds in two hops justified using MLWE in the ROM. The first hop replaces the $\mathbf{t}$ vector in the public key with a uniform vector. The second hop uses the fact that $\mathbf{t}$ can now be seen as an extra row in an MLWE challenge matrix and replaces both MLWE samples computed in the ciphertext, $(\mathbf{u}, \langle \mathbf{t}, \mathbf{r} \rangle + e_2)$, with uniform values. Both steps are justified by reductions that receive an MLWE challenge and construct for the adversary a perfect interpolation between the two games involved in the hop: if the reduction is given an MLWE sample, the adversary is run in the game on the left, and otherwise it is run in the game on the right. In the final game it is clear that all information about the message is information-theoretically hidden from the adversary, as this is masked by a value sampled uniformly and independently at random. So the probability of correctly guessing the challenge bit $b$ is exactly 1/2. Combining all the proof steps, we can express the security of the K-PKE in the ROM in terms of the standard MLWE problem.

3.2 Correctness Analysis
The proof of correctness first rearranges the correctness game in Figure 1 instantiated with the K-PKE into the form shown in Figure 3 (left). A simple algebraic argument permits showing that the noise that remains added to the message $m$ is given by the expression assigned to $\tilde{n}$ in the figure, and $\lfloor q/4 \rfloor - 1$ is the maximum noise value above which a decoding error can occur in the message recovery. Note that the noise expression includes two terms $\mathbf{c}_u$ and $c_v$ that capture the inaccuracy introduced by encoding ciphertexts via rounding to a smaller modulus: these are expressed as additive noise, each of them defined as the difference between the original value
approximations of those occurring in ML-KEM. Nevertheless, the and the decoded value.
simplifications we introduce in this way are natural and they are Our goal is to provide an upper-bound for the probability that the
aligned with prior analyses of ML-KEM [11]. ˜
noise threshold is exceeded in at least one of the 256 coefficients in 𝑛.
The way that this is typically done in the literature is to argue that,
3.1 Security analysis although the joint distribution of all 256 coefficients is complex, the
distribution of each coefficient individually can actually be proven
We have formally verified a security proof of the K-PKE alternative
to be the same. Moreover, this distribution is simple enough that
to that given in [4]. The difference in this formalization is that we
one can just exhaustively compute the probability mass function
establish a direct connection to the standard MLWE assumption,
over all elements in the support to obtain an exact upperbound
which we are able to do because we model the sampling of A as
for the probability 𝜖 of one coefficient in 𝑛˜ exceeding the noise
coming from a random oracle. We do not claim any novelty here,
threshold. A union bound then permits obtaining a final bound of
but we present the result because the intuition helps understand
256 · 𝜖.
8 This means our proof doesn’t strictly cover ML-KEM-512, but it can be easily extended As we will see in the next section, this argument applies directly
to do so. in the case of FrodoKEM, because the noise expression nicely

decomposes into summations and products of values independently sampled from distributions with small support. However, in the case of ML-KEM there is a problem: the distribution of the noise is affected by $\mathbf{c}_u$ and $c_v$, which break this convenient behavior of the noise expression. We now consider three alternatives to addressing this problem.

Algorithm Gen^O():
  1: sd ←$ SD
  2: A ← O(sd)
  3: s, e ←$ B^k
  4: t ← As + e
  5: pk ← pk_encd(t, sd)
  6: sk ← sk_encd(s)
  7: return (pk, sk)

Algorithm Enc^O(pk, m):
  1: (t, sd) ← pk_decd(pk)
  2: A ← O(sd)
  3: r, e1 ←$ B^k
  4: e2 ←$ B
  5: u ← A^T r + e1
  6: v ← ⟨t, r⟩ + e2 + m_encd(m)
  7: c ← c_encd(u, v)
  8: return c

Algorithm Dec^O(sk, c):
  1: s ← sk_decd(sk)
  2: (u, v) ← c_decd(c)
  3: m ← m_decd(v − ⟨s, u⟩)
  4: return m

Figure 2: Abstract specification of K-PKE.

Game COR_ML-KEM:
  1: sd ←$ SD
  2: s, e ←$ B^k
  3: r, e1 ←$ B^k
  4: e2 ←$ B
  5: A ← O(sd)
  6: t ← As + e
  7: pk ← pk_encd(t, sd)
  8: sk ← sk_encd(s)
  9: m ←$ A^O(pk, sk)
  10: u ← A^T r + e1
  11: v ← ⟨t, r⟩ + e2 + m_encd(m)
  12: (c_u, c_v) ← c_decd(c_encd(u, v)) − (u, v)
  13: ñ ← ⟨e, r⟩ − ⟨s, e1⟩ − ⟨s, c_u⟩ + e2 + c_v
  14: return ‖ñ‖∞ > ⌊q/4⌋ − 1

Game COR^heuristic_ML-KEM (steps 1 and 5 to 9 are empty):
  2: s, e ←$ B^k
  3: r, e1 ←$ B^k
  4: e2 ←$ B
  10: u ←$ U(R_q^k)
  11: v ←$ U(R_q)
  12: (c_u, c_v) ← c_decd(c_encd(u, v)) − (u, v)
  13: ñ ← ⟨e, r⟩ − ⟨s, e1⟩ − ⟨s, c_u⟩ + e2 + c_v
  14: return ‖ñ‖∞ > ⌊q/4⌋ − 1

Figure 3: Rearranged correctness experiment for ML-KEM (left). Heuristic approximation (right).

3.2.1 Heuristic approximation. The solution proposed in [11] and used in the Kyber submission to the NIST competition simply assumes that one can take the probability of the event defined in the experiment in Figure 3 (right) as an upper bound for the correctness error. The assumption here is that $(\mathbf{u}, v)$ are taken to be values sampled uniformly at random and independently from everything else in the noise expression. This immediately allows brute-forcing the probability computation: what is crucial here is that the distribution of the error introduced by rounding each ciphertext coefficient is independent from other coefficients, which allows deriving a simple (computable) description of the final noise distribution.

Justifying that this assumption is reasonable seems, at first sight, to follow from the MLWE problem: in fact, in the security proof we described above, $(\mathbf{u}, v)$ are shown to be computationally close to uniform after the two game hops. However, the same game-hopping reasoning does not directly apply here: to transform the COR experiment we would need to build a reduction $\mathcal{B}$ from MLWE, and this algorithm would need to 1) provide the secret key to the adversary, and 2) construct its guess based on whether the COR experiment would return true or false. We note that not only does $\mathcal{B}$ not know the secret key $\mathbf{s}$ to give to $\mathcal{A}$, but point 2) also implies computing the noise expression explicitly using the secret key $\mathbf{s}$ plus all ephemeral noise values not revealed by the MLWE experiment. In Section 5 we explain how we obtain a formally verified computed bound for the heuristic explained above that confirms the accuracy of the claims in [11]. We conclude this section by discussing how one could provably bound the probability above without the heuristic.

3.2.2 Removing the adversary. The discussion above shows that, unless one removes the need to provide adversary $\mathcal{A}$ with the secret key, there is little hope of relying on the MLWE assumption in this setting. However, it is clear that the adversary's influence in the outcome of the experiment is limited to choosing $m$, so one can just consider the worst-case value of $c_v$, i.e., the smallest integer $t^{\max}_{c_v}$ such that

\[ \Pr[\mathrm{COR}_{\text{ML-KEM}} : \|c_v\|_\infty > t^{\max}_{c_v}] = 0 \]

and redefine the experiment to output $\|\tilde{n}'\|_\infty > \lfloor q/4 \rfloor - 1 - t^{\max}_{c_v}$, where

\[ \tilde{n}' := \langle \mathbf{e}, \mathbf{r} \rangle - \langle \mathbf{s}, \mathbf{e}_1 \rangle - \langle \mathbf{s}, \mathbf{c}_u \rangle + e_2 \]

We show the resulting game $\mathrm{COR1}_{\text{ML-KEM}}$ in Figure 4 (left).⁹ It is straightforward to prove that upper-bounding the probability that the (reduced) noise threshold is reached in this new experiment also yields an upper bound for the original experiment and therefore for the correctness of the K-PKE. We note that one still cannot immediately justify that $\mathbf{u}$ can be assumed to be uniform under MLWE: a reduction from MLWE would still need to decide whether the noise threshold is exceeded in order to produce a guess, and this implies explicitly computing $\tilde{n}'$. Nevertheless, it is still interesting to assess how much is lost by max-ing out the adversary's influence in the bound, so we also consider this case in Section 5.¹⁰

Game COR1_ML-KEM:
  1: sd ←$ SD
  2: s, e ←$ B^k
  3: r, e1 ←$ B^k
  4: e2 ←$ B
  5: A ← O(sd)
  6: u ← A^T r + e1
  7: c_u ← c_decd(c_encd(u)) − u
  8: ñ1 ← ⟨e, r⟩ − ⟨s, e1⟩ + e2
  9: ñ2 ← ⟨s, c_u⟩
  10: return ‖ñ1 − ñ2‖∞ > ⌊q/4⌋ − 1 − t^max_{c_v}

Game COR2_ML-KEM:
  1: s ←$ B^k
  2: u ←$ U(R_q^k)
  3: c_u ← c_decd(c_encd(u)) − u
  4: ñ2 ← ⟨s, c_u⟩
  5: return ‖ñ2‖∞ > t_{c_u}

Figure 4: Removing the adversary (left). Provable bound under MLWE (right).

⁹ We split the noise expression of $\tilde{n}'$ into two sub-expressions, because this is useful for the discussion that follows.
¹⁰ To be precise, this heuristic bound can be defined in terms of the experiment in Figure 4 (right), considering the event returned by the experiment in Figure 4 (left).
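The worst-case rounding term of Section 3.2.2 can be illustrated with a short exhaustive computation. The sketch below is not taken from the EasyCrypt development: it assumes that c_encd and c_decd act coefficient-wise as the Compress and Decompress functions of FIPS 203 with q = 3329, and that the second ciphertext component is rounded to d_v = 4 bits (ML-KEM-512 and ML-KEM-768) or d_v = 5 bits (ML-KEM-1024). All function names are hypothetical.

```python
Q = 3329  # ML-KEM modulus (assumption: parameters as in FIPS 203)


def compress(x: int, d: int) -> int:
    """Round x in Z_q to a d-bit value (round half up), following FIPS 203 Compress_d."""
    return (((x << d) + Q // 2) // Q) % (1 << d)


def decompress(y: int, d: int) -> int:
    """Map a d-bit value back to Z_q, following FIPS 203 Decompress_d."""
    return (Q * y + (1 << (d - 1))) >> d


def centered(x: int) -> int:
    """Representative of x modulo Q in the interval (-Q/2, Q/2]."""
    x %= Q
    return x - Q if x > Q // 2 else x


def rounding_error(x: int, d: int) -> int:
    """The additive noise c_decd(c_encd(x)) - x for a single coefficient."""
    return centered(decompress(compress(x, d), d) - x)


def worst_case_error(d: int) -> int:
    """Smallest t such that |rounding_error(x, d)| > t never happens."""
    return max(abs(rounding_error(x, d)) for x in range(Q))


def reduced_threshold(d_v: int) -> int:
    """Noise threshold left after max-ing out the c_v component."""
    return Q // 4 - 1 - worst_case_error(d_v)


if __name__ == "__main__":
    for d_v in (4, 5):
        print(d_v, worst_case_error(d_v), reduced_threshold(d_v))
```

Under these assumptions the adversary can shift the noise by at most the printed worst-case error, which is exactly what is subtracted from the decoding threshold in the game of Figure 4 (left).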

3.2.3 A provable bound. To obtain a (sub-optimal) provable bound, we introduce two noise thresholds $t$ and $t_{c_u}$, where the latter is allocated to the term depending on $\mathbf{c}_u$ and the former to the remaining terms in the noise expression. Setting $t + t_{c_u} = \lfloor q/4 \rfloor - 1 - t^{\max}_{c_v}$, the probability that the experiment returns true can be upper-bounded by taking the union bound as follows:

\[ \Pr[\mathrm{COR1}_{\text{ML-KEM}} : \top] \le \Pr[\mathrm{COR1}_{\text{ML-KEM}} : \|\tilde{n}_1\|_\infty > t] + \Pr[\mathrm{COR1}_{\text{ML-KEM}} : \|\tilde{n}_2\|_\infty > t_{c_u}]. \]

In other words, we can analyze the two events independently. Furthermore, the first probability, corresponding to $\tilde{n}_1$, is already in a form that can be computed by exhaustive evaluation; note that this intuitively corresponds to the probability of a decryption error if no rounding was used by ML-KEM. On the other hand, the term corresponding to $\tilde{n}_2$ can now be justified using MLWE: we introduce an additional experiment $\mathrm{COR2}_{\text{ML-KEM}}$ in Figure 4 (right) and prove that a simple reduction to MLWE can be used to justify that:

\[ \left| \Pr[\mathrm{COR1}_{\text{ML-KEM}} : \|\tilde{n}_2\|_\infty > t_{c_u}] - \Pr[\mathrm{COR2}_{\text{ML-KEM}} : \|\tilde{n}_2\|_\infty > t_{c_u}] \right| \le \epsilon_{\mathrm{LWE}}. \]

The reduction is now trivial, since it only needs to generate $\mathbf{s}$ itself to check if an error occurred. Finally, we can use $\mathrm{COR2}_{\text{ML-KEM}}$ to brute-force the probability of exceeding threshold $t_{c_u}$, with the guarantee that any significant error introduced by the approximation would imply an attack on MLWE:

\[ \Pr[\mathrm{COR1}_{\text{ML-KEM}} : \top] \le \Pr[\mathrm{COR1}_{\text{ML-KEM}} : \|\tilde{n}_1\|_\infty > t] + \Pr[\mathrm{COR2}_{\text{ML-KEM}} : \|\tilde{n}_2\|_\infty > t_{c_u}] + \epsilon_{\mathrm{LWE}}. \]

Note that this provable bound comes at a cost: we are over-approximating the decryption error by declaring the adversary the winner whenever noise terms, which could cancel each other out in the full noise expression, individually exceed a partial threshold. We will see the impact in Section 5. Another way to interpret this bound is the following: we can statistically bound the error when no rounding is used, so we exclude this possibility with a conservative threshold. Then, the only way that an error could occur is through the rounding component, and we are able to use the MLWE assumption to formally exclude this possibility. In this analysis we lose precision because the summation of both noise components could still be small enough not to cause an error, even if one of them exceeds its threshold.

4 Analysis of FrodoKEM PKE
We now turn our attention to FrodoKEM PKE. We proceed in the same way as we did for ML-KEM, first introducing the abstract specification over which we conducted the analysis, and then describing the security and correctness proofs. The main difference to ML-KEM is that, here, we can present a security proof that goes down to the standard LWE problem and, most importantly, we can give a functional correctness proof that doesn't require any heuristic approximations (except for the ROM) and yields a machine-checked proof that the PKE decryption error probability is bounded by the numbers claimed by the designers of FrodoKEM. We believe that, even though these observations are folklore knowledge in the community, it is interesting to highlight the fact that there are provable security costs inherent to the introduction of optimizations in the design of post-quantum schemes. As before, all the results stated in this section have been machine-checked in EasyCrypt.

4.0.1 The specification. The specification is given in Figure 5. We follow very much the same approach as for ML-KEM in modeling the encoding and decoding operations, and the introduction of the random oracle to sample $\mathbf{A}$. The main difference is the use of a general theory for matrices that allows fixing the dimensions dynamically, rather than working with hardwired dimensions in the type. This allows us to reason about the relations between various LWE definitions and reductions involving matrices of different dimensions, as well as distributions over matrices of different dimensions, in a unified context. Note in the figure the use of $\mathcal{M}^{a \times b}_{\mathcal{X}}$ to denote the lifting of a distribution $\mathcal{X}$ over field elements to matrices of dimension $a \times b$. Again, the encoding and decoding of public and secret keys is irrelevant for our results and is treated as an abstract bijection. Moreover, for FrodoKEM, the encoding and decoding of ciphertexts is also a bijection, so the only encoding/decoding operators that need to be taken into consideration for the correctness bound are the ones applied to the message.

Algorithm Gen^O():
  1: sd ←$ SD
  2: A ← O(sd)
  3: S ←$ M_X^{n×n̄}
  4: E ←$ M_X^{m×n̄}
  5: B ← AS + E
  6: pk ← pk_encd(B, sd)
  7: sk ← sk_encd(S)
  8: return (pk, sk)

Algorithm Enc^O(pk, m):
  1: (B, sd) ← pk_decd(pk)
  2: A ← O(sd)
  3: S′ ←$ M_X^{m̄×m}
  4: E′ ←$ M_X^{m̄×n}
  5: E″ ←$ M_X^{m̄×n̄}
  6: U ← S′A + E′
  7: V ← S′B + E″ + m_encd(m)
  8: c ← c_encd(U, V)
  9: return c

Algorithm Dec^O(sk, c):
  1: S ← sk_decd(sk)
  2: (U, V) ← c_decd(c)
  3: m ← m_decd(V − US)
  4: return m

Figure 5: Abstract specification of FrodoKEM PKE.

4.1 Security analysis
We have formally verified a security proof of the FrodoKEM PKE. The novelty here compared to prior machine-checked security proofs for post-quantum PKEs is pushing the reduction down to standard LWE, relying on the fact that we are assuming the matrix $\mathbf{A}$ to be produced by a random oracle. The proof is carried out in the random oracle model in three steps.

4.1.1 From LWE to Matrix LWE. We first define the standard LWE problem and the Matrix LWE problem. Both provide the adversary with a challenge $(\mathbf{A}, \mathbf{B})$, and the adversary must guess whether this challenge comes from a real or an ideal distribution. In the real distribution, the challenge is constructed as $\mathbf{B} = \mathbf{A}\mathbf{S} + \mathbf{E}$ where $\mathbf{A}$, $\mathbf{S}$ and $\mathbf{E}$ are matrices over some ring $R_q$. Matrix $\mathbf{A}$ is sampled from the uniform distribution, whereas $\mathbf{S}$ and $\mathbf{E}$ are sampled from some arbitrary distribution $\mathcal{X}$ lifted to the appropriate matrix dimensions. In the ideal distribution $\mathbf{A}$ and $\mathbf{B}$ are sampled independently and uniformly at random. For the Matrix LWE problem, we have $\mathbf{A} \in$

$R_q^{m \times n}$, $\mathbf{S} \in R_q^{n \times \bar{n}}$, and $\mathbf{B}, \mathbf{E} \in R_q^{m \times \bar{n}}$. In the standard LWE problem we have $\bar{n} = 1$, so $\mathbf{S}$, $\mathbf{B}$ and $\mathbf{E}$ are column vectors. We prove the following theorem:

Theorem 4.1. For every adversary $\mathcal{A}$ attacking the Matrix LWE problem, there exists an adversary $\mathcal{B}$ such that:

\[ \Pr[\mathrm{LWE}^{m,n,\bar{n}}_{\mathcal{X}}(\mathcal{A}) \Rightarrow 1 \mid b = 1] - \Pr[\mathrm{LWE}^{m,n,\bar{n}}_{\mathcal{X}}(\mathcal{A}) \Rightarrow 1 \mid b = 0] \le \bar{n} \cdot \left( \Pr[\mathrm{LWE}^{m,n,1}_{\mathcal{X}}(\mathcal{B}) \Rightarrow 1 \mid b = 1] - \Pr[\mathrm{LWE}^{m,n,1}_{\mathcal{X}}(\mathcal{B}) \Rightarrow 1 \mid b = 0] \right) \]

The proof follows a hybrid argument, where at each step one of the columns of the Matrix LWE challenge is flipped from the real distribution to the ideal distribution. This proof, which is straightforward on paper, required some effort to machine check. Notably, we needed to develop a general theory of distributions over matrices that permits reasoning about composing distributions over submatrices. Once this library was in place, we first re-expressed the Matrix LWE assumption as an experiment that samples the column vectors of the challenge matrix $\mathbf{B}$ in a while loop, one at a time; the new library allowed us to prove equivalence by induction. From that point on, we relied on the EasyCrypt libraries for general hybrid arguments, with extra support from the PROM theory when we needed to argue that the crucial $i$-th sample involved in a hybrid step could be pre-sampled outside of the loop. This is needed to construct a reduction to LWE for the $i$-th step: the challenge sample is given upfront, and then needs to be programmed into the $i$-th loop iteration.

4.1.2 IND-CPA security in the ROM. The second step in the proof is to show that the Matrix LWE assumption in the ROM follows from the Matrix LWE assumption in the standard model and therefore from LWE. This proof step is similar to the one we presented in the previous section for MLWE. Finally, the IND-CPA security proof for the FrodoKEM PKE follows the same structure as the one for the K-PKE, comprising two hops. The first hop uses the Matrix LWE assumption (in the ROM) to justify sampling matrix $\mathbf{B}$ in the public key as a uniform matrix. The second hop uses the Matrix LWE assumption (in the ROM) to justify making the ciphertext uniform. By showing that the adversary's advantage in the final game is 0, and plugging in the previous results on Matrix LWE, we obtain the following theorem for FrodoKEM.

Theorem 4.2. The FrodoKEM PKE is IND-CPA secure under the LWE assumption in the Random Oracle Model. More precisely, for every adversary $\mathcal{A}$ against FrodoKEM, there exist adversaries $\mathcal{B}_1$ and $\mathcal{B}_2$ such that:

\[ \Pr[\text{IND-CPA}^{\mathrm{RO}}_{\mathrm{FrodoKEM}}(\mathcal{A}) \Rightarrow 1 \mid b = 1] - \Pr[\text{IND-CPA}^{\mathrm{RO}}_{\mathrm{FrodoKEM}}(\mathcal{A}) \Rightarrow 1 \mid b = 0] \le \bar{n} \cdot \left( \Pr[\mathrm{LWE}^{m,n,1}_{\mathcal{X}}(\mathcal{B}_1) \Rightarrow 1 \mid b = 1] - \Pr[\mathrm{LWE}^{m,n,1}_{\mathcal{X}}(\mathcal{B}_1) \Rightarrow 1 \mid b = 0] \right) + \bar{m} \cdot \left( \Pr[\mathrm{LWE}^{m,n+\bar{n},1}_{\mathcal{X}}(\mathcal{B}_2) \Rightarrow 1 \mid b = 1] - \Pr[\mathrm{LWE}^{m,n+\bar{n},1}_{\mathcal{X}}(\mathcal{B}_2) \Rightarrow 1 \mid b = 0] \right) \]

Again, although the proof is straightforward on paper and conceptually identical to the proof for the K-PKE, there were some machine-checking challenges we needed to overcome in order to conclude it. In particular, the second hop in the security proof again requires reasoning about the distributions of sub-matrices: the public key $(\mathbf{A}, \mathbf{B})$ is now seen as a monolithic LWE public matrix $[\mathbf{A}\,\mathbf{B}]$, so one must reason compositionally about the distribution of the Matrix LWE challenge when proving that the reduction matches the distribution of the security games over which the hop is being carried out (in these games $\mathbf{A}$ and $\mathbf{B}$ are constructed separately).

4.2 Correctness Analysis
We carry out our analysis of the FrodoKEM PKE correctness in much the same way as was presented for ML-KEM. However, in this case, the analysis is much simpler. A simple algebraic argument allows us to show that the experiment in Figure 6 provides a provable upper bound for the failure probability in FrodoKEM, for any adversary. Indeed, the absence of any compression in ciphertexts gives us a nice cancellation in the decryption process, ending up with an error distribution that can be characterized based only on the distributions of the noise matrices. This allows us to formally connect the probability of a decryption failure with a machine-checked computed probability bound for this statistical event, as we will describe in the next section.

Game COR^provable_FrodoKEM:
  1: S ←$ M_X^{n×n̄}
  2: E ←$ M_X^{m×n̄}
  3: S′ ←$ M_X^{m̄×m}
  4: E′ ←$ M_X^{m̄×n}
  5: E″ ←$ M_X^{m̄×n̄}
  6: (c_U, c_V) ← c_decd(c_encd(U, V)) − (U, V)
  7: N ← S′E − E′S + E″
  8: return ∃ i j, ¬(−q/2^{B+1} ≤ N[i, j] < q/2^{B+1})

Figure 6: Provable statistical bound for FrodoKEM.

5 Computing formally-verified upper-bounds
In order to provide a machine-checked computation of an upper bound for each of the statistical events defined in Section 3 and Section 4, three steps are needed:

(1) Prove that the statistical event can be upper-bounded using a union bound and, in some cases, reducing the problem to the probability that a single integer modulo $q$ is within a prescribed range. I.e., for ML-KEM we need to consider only the distribution of one polynomial coefficient, and in FrodoKEM we need only consider the distribution of one matrix entry.
(2) Prove that the probability above can be computed using an explicit functional formula over the reals, which essentially represents the construction of the probability mass function, followed by the computation of the tail probability.
(3) Extract the specification obtained in EasyCrypt to an OCaml program and execute it to compute the probability upper bound. One of the current limitations of our work is that this step is not done automatically: we have carefully crafted an OCaml program that syntactically closely matches the EasyCrypt specifications (with some caveats described below) and leave it as a direction for future work to automate this process.

We now describe how we achieve these goals.
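The three steps can be illustrated on a toy instance. The sketch below is written in Python rather than in the OCaml used for the extracted program, and it is not the authors' code: it builds a probability mass function with the addition and multiplication combiners defined in Section 5.1, sums n independent copies with a double-and-add loop, and then applies the tail probability and the union bound. Exact rational arithmetic is used in place of the interval-style real arithmetic of the actual computation, the centered binomial parameter eta = 2 is an assumption, and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb

Q = 3329  # assumption: the ML-KEM modulus


def centered(x):
    """Representative of x modulo Q in the interval (-Q/2, Q/2]."""
    x %= Q
    return x - Q if x > Q // 2 else x


def centered_binomial(eta):
    """PMF of the sum of eta differences of fair bits, with support [-eta, eta]."""
    return {v: Fraction(comb(2 * eta, v + eta), 4 ** eta)
            for v in range(-eta, eta + 1)}


def combine(d1, d2, op):
    """PMF of op(a, b) in Z_q for independent a from d1 and b from d2."""
    out = {}
    for a, pa in d1.items():
        for b, pb in d2.items():
            c = centered(op(a, b))
            out[c] = out.get(c, Fraction(0)) + pa * pb
    return out


def add(d1, d2):
    return combine(d1, d2, lambda a, b: a + b)


def mul(d1, d2):
    return combine(d1, d2, lambda a, b: a * b)


def n_fold_sum(d, n):
    """Sum of n independent samples of d, using O(log n) applications of add."""
    result = {0: Fraction(1)}
    power = d
    while n:
        if n & 1:
            result = add(result, power)
        n >>= 1
        if n:
            power = add(power, power)
    return result


def inner_product(d1, d2, n):
    """PMF of an inner product of two length-n vectors with i.i.d. entries."""
    return n_fold_sum(mul(d1, d2), n)


def tail(d, t):
    """Probability that the absolute value exceeds t."""
    return sum((p for v, p in d.items() if abs(v) > t), Fraction(0))


def union_bound(coefficient_dist, t, coefficients=256):
    """Bound on the probability that any coefficient exceeds t."""
    return min(Fraction(1), coefficients * tail(coefficient_dist, t))


if __name__ == "__main__":
    b = centered_binomial(2)
    toy_n = 4  # toy size; the real computation uses 256 * k terms
    noise = add(add(inner_product(b, b, toy_n), inner_product(b, b, toy_n)), b)
    print(float(union_bound(noise, 20)))
```

With the toy size the whole computation is instantaneous; at the real sizes the same structure applies, but an efficient implementation would likely store the mass functions in arrays rather than dictionaries of rationals.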

5.1 Modular reasoning about distributions
Our formalization starts with a few basic definitions of distribution combiners. Throughout our formal development we consider only lossless distributions, where the summation of the probabilities of all values in the support adds up to 1. When performing approximate computations, this property may, of course, not be preserved.

Let $\mathcal{D}_i$, for $i \in \{1, 2, \ldots\}$, denote arbitrary distributions over a ring $R$. Then we can define the distributions induced by addition, subtraction, multiplication and inner products as

\[ \mathcal{D}_1 \oplus \mathcal{D}_2 := \{ a + b \mid a \gets_{\$} \mathcal{D}_1;\ b \gets_{\$} \mathcal{D}_2 \} \]
\[ \mathcal{D}_1 \ominus \mathcal{D}_2 := \{ a - b \mid a \gets_{\$} \mathcal{D}_1;\ b \gets_{\$} \mathcal{D}_2 \} \]
\[ \mathcal{D}_1 \otimes \mathcal{D}_2 := \{ a \cdot b \mid a \gets_{\$} \mathcal{D}_1;\ b \gets_{\$} \mathcal{D}_2 \} \]
\[ \langle \mathcal{D}_1, \mathcal{D}_2 \rangle_n := \Big\{ \sum_{i=1}^{n} \mathbf{a}_i \cdot \mathbf{b}_i \;\Big|\; \mathbf{a} \gets_{\$} \mathcal{D}_1^n;\ \mathbf{b} \gets_{\$} \mathcal{D}_2^n \Big\} \]
\[ \langle \mathcal{D}_1, \mathcal{D}_2 \rangle_n^{I} := \Big\{ \sum_{i=1}^{n} (-1)^{i \in I} \cdot \mathbf{a}_i \cdot \mathbf{b}_i \;\Big|\; \mathbf{a} \gets_{\$} \mathcal{D}_1^n;\ \mathbf{b} \gets_{\$} \mathcal{D}_2^n \Big\} \]

Here, the last distribution is a generalization of the inner product, where some terms are added and other terms are subtracted. This distribution is useful to describe multiplication in the polynomial ring underlying ML-KEM.

We note that, independently of the cardinality of the ring, if distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ have small enough support, then the probability mass functions of the distributions resulting from a small number of applications of these combiners can be computed by exhaustive evaluation. In particular, this is the case when $\mathcal{D}_i$ is one of the following distributions:

(1) The binomial distribution $\mathcal{B}$ over $\mathbb{Z}_q$ as used in all the variants of ML-KEM.
(2) The distribution of the rounding error resulting from rounding a uniform element in $\mathbb{Z}_q$ to a smaller modulus, required to analyze the probability of error in all of the ML-KEM variants.
(3) The distribution of the noise $\mathcal{X}$ over $\mathbb{Z}_q$ used in all of the FrodoKEM variants.

We first proved the following general result in EasyCrypt, for any ring and any $\mathcal{D}_1$ and $\mathcal{D}_2$:

\[ \langle \mathcal{D}_1, \mathcal{D}_2 \rangle_n = \bigoplus_{1}^{n} \mathcal{D}_1 \otimes \mathcal{D}_2 \]

This result permits computing the distribution of an inner product using a standard double-and-add algorithm, starting from the probability mass function of $\mathcal{D}_1 \otimes \mathcal{D}_2$ and computing the $\oplus$ combiner $O(\lg n)$ times.

We also define a restricted class of distributions, called good, if they satisfy the following two requirements, which intuitively mean that the distribution is symmetric and centered around zero and, furthermore, that zero is not the only element in the support:¹¹

\[ \mathrm{good}(\mathcal{D}) \Rightarrow \begin{cases} \Pr[x = 0 \mid x \gets_{\$} \mathcal{D}] < 1 \\ \forall c,\ \Pr[x = c \mid x \gets_{\$} \mathcal{D}] = \Pr[x = -c \mid x \gets_{\$} \mathcal{D}] \end{cases} \]

¹¹ This latter requirement ensures that we are actually working with non-trivial noise distributions throughout the computation. It was recently pointed out to us that this requirement, although intuitive, may be unnecessarily strong if the goal is only to simplify the final probability expressions. Removing it would simplify the proofs of preservation of good and may allow deriving more general results, e.g., for commutative rings, that can be important for other use cases. This is not relevant for our examples, but the adaptation is simple and will be considered in future work.

Note that both $\mathcal{B}$ and $\mathcal{X}$ mentioned above satisfy this property, which we prove in EasyCrypt, but the distribution of rounding errors in ML-KEM does not.

The following properties are straightforward to prove in EasyCrypt for any ring:

\[ \mathrm{good}(\mathcal{D}_2) \Rightarrow \mathcal{D}_1 \oplus \mathcal{D}_2 = \mathcal{D}_1 \ominus \mathcal{D}_2 \]
\[ \mathrm{good}(\mathcal{D}_1) \Rightarrow \mathrm{good}(\mathcal{D}_2) \Rightarrow \mathrm{good}(\mathcal{D}_1 \oplus \mathcal{D}_2) \]

Furthermore, if working over a field, which is the case of $\mathbb{Z}_q$ in ML-KEM (but not in FrodoKEM), we prove that, for any $I$ and any $n$, we have that:

\[ \mathrm{good}(\mathcal{D}_1) \Rightarrow \mathrm{good}(\mathcal{D}_2) \Rightarrow \langle \mathcal{D}_1, \mathcal{D}_2 \rangle_n = \langle \mathcal{D}_1, \mathcal{D}_2 \rangle_n^{I} \]
\[ \mathrm{good}(\mathcal{D}_1) \Rightarrow \mathrm{good}(\mathcal{D}_2) \Rightarrow \mathrm{good}(\langle \mathcal{D}_1, \mathcal{D}_2 \rangle_n) \]

The existence of multiplicative inverses for all non-zero elements permits showing that multiplication preserves the good property, and the result then follows by induction.

Equipped with these general results, we can now look at how they permit proving the correctness of simple and efficiently computable formulas for the probability bounds defined in Section 3 and Section 4.

5.2 When all noise coefficients are alike
5.2.1 ML-KEM without rounding. The most elegant result can be established for the probability defined in Section 3 as

\[ \Pr[\mathrm{COR1}_{\text{ML-KEM}} : \|\tilde{n}_1\|_\infty > \lfloor q/4 \rfloor - 1 - t^{\max}_{c_v} - t_{c_u}] \]

We recall that this is intuitively the probability that the noise expression in ML-KEM exceeds a threshold $t$, if one does not consider the rounding noise. The noise expression in this case is given by:

\[ \tilde{n}_1 := \langle \mathbf{e}, \mathbf{r} \rangle - \langle \mathbf{s}, \mathbf{e}_1 \rangle + e_2 \]

Here, $\mathbf{e}$, $\mathbf{r}$, and $\mathbf{s}$ are vectors of polynomials of size $k$, where each polynomial has 256 coefficients. $e_2$ and $\tilde{n}_1$ are each a single polynomial. Each coefficient in each of the input polynomials is sampled independently at random from the binomial distribution $\mathcal{B}$. Using the general results above we can prove the following theorem.

Theorem 5.1. The distribution of each coefficient of $\tilde{n}_1$ is given by

\[ \langle \mathcal{B}, \mathcal{B} \rangle_{256k} \oplus \langle \mathcal{B}, \mathcal{B} \rangle_{256k} \oplus \mathcal{B} \]

Sketch. We first consider the operations over the ML-KEM polynomial ring. Addition is done coefficient-wise, so one can easily propagate the good property. Multiplication in the polynomial ring can be defined, for each coefficient of the result, as a generalized inner product over the coefficients. More precisely, if we see two polynomials $a$ and $b$ as vectors of coefficients of size 256, and we

slightly abuse notation, it is well known that one can write the formula for coefficient $i$ of the product as:

\[ (a \cdot b)_i = \langle a, b \rangle_{256}^{I} \quad \text{for } I = \{ k > i \mid k \in [0..256) \} \]

The EasyCrypt proof relies on this fact and the goodness of $\mathcal{B}$ to derive that a coefficient of the product of two polynomials sampled from $\mathcal{B}^{256}$ is given by $\langle \mathcal{B}, \mathcal{B} \rangle_{256}$. The proof, for a fixed $i$, is by induction on the summation that builds the coefficient value and relies on the general properties of distributions given above. A second inductive proof over $k$ allows us to use the properties we established for ring addition and multiplication and extend the result to vectors of polynomials. Note that the final expression does not use $\ominus$ at all, which is possible due to the propagation of the good property throughout the whole computation. □

The above result has the nice property that the distribution of all noise coefficients is the same. This means that one can use a union bound and derive that:

\[ \Pr[\mathrm{COR1}_{\text{ML-KEM}} : \|\tilde{n}_1\|_\infty > t] \le 256 \cdot \Pr[\, |c| > t \mid c \gets_{\$} \langle \mathcal{B}, \mathcal{B} \rangle_{256k} \oplus \langle \mathcal{B}, \mathcal{B} \rangle_{256k} \oplus \mathcal{B} \,] \quad \text{where } t = \lfloor q/4 \rfloor - 1 - t^{\max}_{c_v} - t_{c_u}. \tag{1} \]

5.2.2 FrodoKEM. The case of FrodoKEM is similar to the proof above, with the important caveat that we are working over two-dimensional structures. For our machine-checked proof this introduces additional complexity, so we developed a theory of distributions over matrices that permits seeing distributions over matrices as distributions over lists. Using this framework, we proved a result that is the analogue of the one presented above for the ML-KEM polynomial ring, but expressed over the ring of matrices used by FrodoKEM. Consider the expression for the noise matrix we obtained in Section 4:

\[ \mathbf{N} := \mathbf{S}'\mathbf{E} - \mathbf{E}'\mathbf{S} + \mathbf{E}'' \]

Here $\mathbf{S}'$ is a matrix of dimensions $\bar{n} \times n$, $\mathbf{E}$ has dimensions $n \times \bar{n}$, $\mathbf{E}'$ has dimensions $\bar{n} \times n$, $\mathbf{S}$ has dimensions $n \times \bar{n}$ and $\mathbf{E}''$ has dimensions $\bar{n} \times \bar{n}$.¹² We prove the following theorem:

¹² In comparison to the general result shown in Section 4 that applies to a PKE based on LWE using arbitrary, yet consistent, matrix dimensions, we focus here on the concrete case of FrodoKEM where we have $n = m$ and $\bar{m} = \bar{n}$.

Theorem 5.2. The distribution of each coefficient of $\mathbf{N}$ is given by

\[ \langle \mathcal{X}, \mathcal{X} \rangle_n \ominus \langle \mathcal{X}, \mathcal{X} \rangle_n \oplus \mathcal{X} \]

The proof is similar in structure to the one presented for the ML-KEM expression, but conceptually simpler because matrix multiplication can be expressed directly using simple (rather than generalized) inner products. We cannot, however, propagate the good property from the input distribution to the inner product distribution because we are not working over a field. For this reason the final expression of the noise distribution still uses $\ominus$.

Nevertheless, we still obtain the nice result that all noise coefficients are equally distributed, and so we can derive the following upper bound for the error probability:

\[ \Pr[\mathrm{COR}^{\mathrm{provable}}_{\mathrm{FrodoKEM}} : \exists\, i\, j,\ \neg(-q/2^{B+1} \le \mathbf{N}[i,j] < q/2^{B+1})] \le \bar{n}^2 \cdot \Pr[\, \neg(-q/2^{B+1} \le c < q/2^{B+1}) \mid c \gets_{\$} \langle \mathcal{X}, \mathcal{X} \rangle_n \ominus \langle \mathcal{X}, \mathcal{X} \rangle_n \oplus \mathcal{X} \,] \tag{2} \]

We note that both the results for ML-KEM and FrodoKEM are obtained generically, and so we can use them for different variants of each construction.

5.3 Dealing with rounding in ML-KEM
5.3.1 The provable bound. We now return to ML-KEM and assess the impact of rounding in the analysis. Let us begin with the isolated event associated with a rounding error in the ciphertext component $\mathbf{u}$, that we defined in Section 3 as:

\[ \Pr[\mathrm{COR2}_{\text{ML-KEM}} : \|\tilde{n}_2\|_\infty > t_{c_u}] \quad \text{where } \tilde{n}_2 := \langle \mathbf{s}, \mathbf{c}_u \rangle \]

The distribution of the error is defined as

\[ \mathcal{D}_{c_u} := \{ \mathbf{c}_u \mid \mathbf{c}_u \gets \mathrm{c\_decd}(\mathrm{c\_encd}(\mathbf{u})) - \mathbf{u};\ \mathbf{u} \gets_{\$} \mathcal{U}(R_q^k) \} \]

For concreteness, when rounding a coefficient to 10 bits, as in ML-KEM-768, this distribution can be defined by the following frequency list

{(−2, 128), (−1, 1024), (0, 1024), (1, 1024), (2, 129)}

where the first element in each pair represents the element in $\mathbb{Z}_q$ and the second represents the number of occurrences. Probabilities can be obtained by dividing the second elements by 3329. When rounding to 11 bits, as in ML-KEM-1024, we get {(−1, 640), (0, 2048), (1, 641)}. So, these distributions have small support, but they do not satisfy the good definition due to the lack of symmetry.

The implication of this is that we cannot simplify the description of the distribution beyond the statement in the following theorem.

Theorem 5.3. The distribution of coefficient $i$ in $\tilde{n}_2$ is given by

\[ \langle \mathcal{B}, \mathcal{D}_{c_u} \rangle_{k(i+1)} \ominus \langle \mathcal{B}, \mathcal{D}_{c_u} \rangle_{k(255-i)} \]

The machine-checked proof is tedious, as it requires reasoning about the associativity of $\oplus$ and $\ominus$ when applied to general distributions. Using this property we first aggregate the terms which are added and those which are subtracted in each ring multiplication, and then aggregate them again across the inner products of vectors $\mathbf{s}$ and $\mathbf{c}_u$ of size $k$. The resulting probability distribution now depends on the coefficient index, which means that the computation of the bound cannot be simplified beyond the following summation over all coefficients:

\[ \Pr[\mathrm{COR2}_{\text{ML-KEM}} : \|\tilde{n}_2\|_\infty > t_{c_u}] \le \sum_{i=0}^{255} \Pr[\, |c| > t_{c_u} \mid c \gets_{\$} \langle \mathcal{B}, \mathcal{D}_{c_u} \rangle_{k(i+1)} \ominus \langle \mathcal{B}, \mathcal{D}_{c_u} \rangle_{k(255-i)} \,] \tag{3} \]

The above bound, combined with the one we obtained in the previous section and stated in Equation (1), allows us to obtain a provably secure correctness bound for ML-KEM under the MLWE assumption, as discussed in Section 3. However, even optimizing for the most favorable $t_{c_u}$, this bound is significantly worse than the heuristic approximations that we will discuss next.
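The frequency lists quoted in Section 5.3.1 can be reproduced by exhaustive enumeration over $\mathbb{Z}_q$. The sketch below assumes that c_encd and c_decd act on each coefficient as the Compress and Decompress functions of FIPS 203 with q = 3329; this instantiation and the function names are assumptions made for illustration and are not taken from the EasyCrypt specification.

```python
from collections import Counter

Q = 3329  # assumption: the ML-KEM modulus


def compress(x: int, d: int) -> int:
    """Round x in Z_q to a d-bit value (round half up), following FIPS 203 Compress_d."""
    return (((x << d) + Q // 2) // Q) % (1 << d)


def decompress(y: int, d: int) -> int:
    """Map a d-bit value back to Z_q, following FIPS 203 Decompress_d."""
    return (Q * y + (1 << (d - 1))) >> d


def centered(x: int) -> int:
    """Representative of x modulo Q in the interval (-Q/2, Q/2]."""
    x %= Q
    return x - Q if x > Q // 2 else x


def rounding_error_frequencies(d: int):
    """Sorted (error, count) pairs of decompress(compress(u)) - u over all u in Z_q."""
    counts = Counter(centered(decompress(compress(u, d), d) - u) for u in range(Q))
    return sorted(counts.items())


def is_symmetric(frequencies) -> bool:
    """Symmetry requirement of the good predicate: equal mass on c and -c."""
    table = dict(frequencies)
    return all(table.get(-value, 0) == count for value, count in table.items())


if __name__ == "__main__":
    for d_u in (10, 11):
        freqs = rounding_error_frequencies(d_u)
        print(d_u, freqs, is_symmetric(freqs))
```

Under these assumptions the enumeration yields the two lists given above, and the symmetry check fails for both, which is the reason the $\ominus$ combiner cannot be eliminated in Theorem 5.3.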

5.3.2 The heuristic bounds. The two heuristic bounds we consider are defined as follows.

$$\Pr[\mathrm{COR}^{\mathrm{heuristic}}_{\mathrm{ML\text{-}KEM}} : \|\tilde{n}\|_\infty > \lfloor q/4 \rfloor - 1] \tag{4}$$

$$\Pr[\mathrm{COR}^{1}_{\mathrm{ML\text{-}KEM}} : \|\tilde{n}'\|_\infty > \lfloor q/4 \rfloor - 1 - t^{\max}_{c_v}] \tag{5}$$

where

$$\tilde{n} := \langle \mathbf{e}, \mathbf{r}\rangle - \langle \mathbf{s}, \mathbf{e}_1 + \mathbf{c}_u\rangle + e_2 + c_v$$

$$\tilde{n}' := \langle \mathbf{e}, \mathbf{r}\rangle - \langle \mathbf{s}, \mathbf{e}_1 + \mathbf{c}_u\rangle + e_2$$

The first one corresponds to assuming that both $\mathbf{c}_u$ and $c_v$ result from rounding uniform (vectors of) ring elements, as is done in the original Kyber analysis [11]. In the second one we make the assumption only for $\mathbf{c}_u$ and max out the $c_v$ component. The formulas we obtain for the noise expressions are given by the following theorems.

Theorem 5.4. The (heuristic approximation of the) distribution of coefficient $i$ in $\tilde{n}$ is given by

$$\langle B, B\rangle_{256k} \ominus (\langle B, B \oplus D_{\mathbf{c}_u}\rangle_{k(i+1)} \ominus \langle B, B \oplus D_{\mathbf{c}_u}\rangle_{k(255-i)}) \oplus B \oplus D_{c_v}$$

Theorem 5.5. The (heuristic approximation of the) distribution of coefficient $i$ in $\tilde{n}'$ is given by

$$\langle B, B\rangle_{256k} \ominus (\langle B, B \oplus D_{\mathbf{c}_u}\rangle_{k(i+1)} \ominus \langle B, B \oplus D_{\mathbf{c}_u}\rangle_{k(255-i)}) \oplus B$$

Since the distribution depends on the index of the coefficient, the overall upper bound is obtained by computing the probability for each $i$ and summing all 256 values to obtain the union bound.

In the next section we describe how we perform floating point computations that are guaranteed to provide an upper bound for the mathematical quantities defined in this section. We will conclude the section and the technical part of the paper with our numeric results.

5.4 Computing the Upper Bounds

Our formal development provides rigorous upper bounds for the statistical events defined in Section 3 and Section 4. This development relies on the explicit construction of discrete probability distributions and the computation of concrete upper bounds of some probability events defined on them. However, while EasyCrypt allows for exact computations in theory, performing them in practice within the tool is not possible.¹³ To overcome this limitation, we have mirrored the constructive definitions of these distributions in OCaml, enabling practical computation of failure probabilities. We took care to keep the OCaml definitions syntactically as close as possible to their EasyCrypt counterparts to ensure correctness and maintain a strong link between the formal proofs and the numerical computations.

¹³ Exact computations over the reals in general remain out of reach due to constraints such as the need for unbounded rational numbers. The particular examples that we handle in this paper might be within reach for a powerful machine with a well-optimized implementation, but we decided to adopt a more pragmatic approach.

This can be observed in the development provided as supplementary material: the expression that describes how a probability distribution is constructed in EasyCrypt is easy to match to the expression that does this in OCaml. We deviated from a strict translation of the EasyCrypt computations only to introduce two optimizations that are as yet unverified: computing the distribution of an $n$-fold summation of identically and independently distributed values by a double-and-add algorithm, and using memoization to avoid the repeated computation of some intermediate results. We plan to provide EasyCrypt proofs that these optimizations are correct in the future.

The OCaml code then emulates the required computations over the reals using the MPFR library, with a precision of 500 bits and a rounding mode toward positive infinity. This ensures that we always overapproximate the mass functions of the distributions, thereby guaranteeing that the computed failure probabilities serve as a valid upper bound.

One other potential direction for future improvements is developing a robust extraction mechanism for EasyCrypt that enables efficient computation outside its virtual machine while preserving formal guarantees. Alternatively, implementing a more efficient evaluator directly within EasyCrypt could make direct computations feasible without resorting to extraction to OCaml.

5.5 Results and Discussion

The results of our verified probability computations are given in Table 1. We show in blue the results that confirm the claims in the submissions to the NIST post-quantum competition. For FrodoKEM these correspond to Equation (2) and they are given in the Provable column; this is because we can formally connect them to the definition of cryptographic correctness required for security proofs of CCA security. For ML-KEM, the claims in the NIST submissions correspond to the heuristically simplified distributions (in column Heur. $\mathbf{c}_u, c_v$) captured by the bound in Equation (4), i.e., assuming that both rounding errors $\mathbf{c}_u$ and $c_v$ result from rounding uniform elements.

The Provable bounds for ML-KEM have been computed as the summation of two probabilities given by Equation (1) and Equation (3). Recall that, in the analysis, we assign a threshold of $t_{\mathbf{c}_u}$ to the rounding error noise term, and a threshold of $\lfloor q/4 \rfloor - 1 - t^{\max}_{c_v} - t_{\mathbf{c}_u}$ to the noise terms unrelated to rounding. To obtain the final bound we tried all possible values of $t_{\mathbf{c}_u}$ and selected the thresholds that provided the best upper bounds. We illustrate the observed behavior in Figure 7. The optimal value for $t_{\mathbf{c}_u}$ is 296 in ML-KEM-768. This corresponds to partial error probabilities of $2^{-81}$ and $2^{-82}$. The optimal value for $t_{\mathbf{c}_u}$ is 240 in ML-KEM-1024. This corresponds to partial error probabilities of $2^{-96}$ and $2^{-97}$. Finally, we also report for ML-KEM the heuristic bound that results from maxing out $c_v$ and assuming only that $\mathbf{c}_u$ is computed over a uniform $\mathbf{u}$. This corresponds to Equation (5).

Remark. We emphasize that our search for a provable bound for ML-KEM is not motivated by a belief that previously claimed bounds are incorrect, but rather to emphasize that, so far and to the best of our knowledge, they have not been formally justified under MLWE in a way that is compatible with the definition of cryptographic correctness required for CCA security proofs. The intuition of such a proof would be that, if the simplification used to compute the claimed heuristic bounds were wrong, then one should be able to break MLWE. We prove that this is indeed the case, but only when the simplification is done separately on different noise terms, which has the cost of yielding a significantly worse
bound.¹⁴ The take-away message is that to obtain stronger provably secure guarantees for the current parameters one needs better ways to estimate the probability of failure without relying on the MLWE assumption, i.e., without assuming that rounding is applied to uniform and independently distributed coefficients.

¹⁴ Indeed, according to Figure 7, our approach to obtain a justification via MLWE cannot result in a better upper bound than the one reported in the Provable column in Table 1.

Table 1: Results of probability computations. Provable bounds: proved in EasyCrypt to apply to the cryptographic definition of correctness. Heuristic (Heur.) bounds: assume that the errors introduced by rounding one or both of the ciphertext elements in ML-KEM are distributed as if one rounded a uniform value. Values in blue confirm the claims in the submissions to the NIST competition.

Algorithm   Variant   Provable     Heur. $\mathbf{c}_u, c_v$   Heur. $\mathbf{c}_u$
ML-KEM      768       $2^{-80}$    $2^{-164}$                  $2^{-158}$
ML-KEM      1024      $2^{-95}$    $2^{-174}$                  $2^{-169}$
FrodoKEM    640       $2^{-138}$   -                           -
FrodoKEM    976       $2^{-199}$   -                           -
FrodoKEM    1344      $2^{-252}$   -                           -

Figure 7: Behavior of the provable bound (y axis) computed for ML-KEM-768 and ML-KEM-1024 for varying values of $t_{\mathbf{c}_u}$ (x axis).

6 Conclusions and Future Work

Further Discussion. A natural question to ask is whether our results somehow formally verify the implementations of the Python scripts that were previously used to compute the correctness bounds we corroborate. Strictly speaking this is not the case, as there are several points in which the code we use for computations differs from the original Python scripts. For example, the original Python code “cleans up” intermediate distributions by removing points with very low mass, and it performs composition of distributions in different ways than we do. We did not initially have an intuition on the potential impact of these differences. However, the fact that our code produces results that are close enough to those of the original scripts gives us a good indication that they are indeed correct.

In terms of technical challenges, the main hurdle we faced was in showing that the distribution combiners can be applied to ML-KEM, where noise expressions are computed via polynomial ring operations, by leveraging the cyclotomic structure of the ring. Indeed, deriving that all coefficients in the noise expression follow a distribution with a simple enough description that allows efficient computation was, to the best of our knowledge, never proved in a machine-checked setting.

Take-Away Messages. When we set out to do this work, our primary motivation was to unambiguously formalize the claims about correctness errors in ML-KEM, which is clearly the most practically relevant algorithm. We have achieved this: we have formalized the (simplified) distribution of noise that is used in the literature supporting ML-KEM, and we have a mechanized proof that the reported probability bounds for this distribution are correct. This is perhaps the most important result for practice in the immediate future.

Our secondary motivation was to clarify what the claimed bounds for ML-KEM mean from a provable security point of view. We do this in two ways: 1) we highlight the fact that the simplified distribution above is a heuristic approximation, i.e., that it is an assumption that currently underlies the security of ML-KEM (this has already been noted in the literature); and 2) we provide a worst-case scenario for removing this assumption by using MLWE to simplify the distribution in a way that is compatible with the provable security results for ML-KEM. The take-away message from these results is not that our worst-case bound is the correct one to use for parameter selection, but rather that further investigation is needed on how to bound the error probability without relying on MLWE.

To further clarify the area we decided to look at FrodoKEM for two reasons: 1) it is also being endorsed for practical uses by public entities and 2) its conservative design permits obtaining an efficiently computable bound for the failure probability that can be directly plugged into IND-CCA2 security proofs. The second point, we believe, highlights a tradeoff between optimization and provable security that in our opinion was not well understood in the past.

Future Work. There are many interesting directions for future work. Our current approach to connecting the EasyCrypt development to OCaml code requires human intervention (and validation), and so it is natural to consider either a fully automatic extraction mechanism, or an EasyCrypt extension that can perform such computations directly inside the tool. Our techniques should naturally extend to probability bounds computed explicitly for other lattice-based constructions and ML-DSA in particular. Also, we did not yet consider Saber because it seems not to have the same immediate practical relevance as ML-KEM and FrodoKEM; however, formally verifying the correctness bounds for this algorithm may also raise interesting questions on how to deal with conditional probabilities when simplifying the analyses of error distributions, as discussed in [17, 28]. A more exploratory direction is to consider concrete probability bounds claimed for other families of cryptographic primitives such as code-based and multivariate polynomial cryptography.
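As a rough illustration of the kind of computation discussed in Section 5.4 (an $n$-fold sum of i.i.d. values computed by double-and-add with memoization, evaluated with rounding toward positive infinity), consider the following Python sketch. It is not the OCaml code of the development: all names are hypothetical, Python's `decimal` module with `ROUND_CEILING` and 50 digits is used purely as a stand-in for MPFR at 500 bits, and the three-point base distribution is a toy example. The exact rational computation at the end exists only to check that the rounded result is indeed an upper bound.

```python
from decimal import Context, Decimal, ROUND_CEILING
from fractions import Fraction

# 50 significant digits; every context operation rounds toward +infinity.
CTX = Context(prec=50, rounding=ROUND_CEILING)


def round_up(p):
    """Decimal upper approximation of the exact rational p."""
    return CTX.divide(Decimal(p.numerator), Decimal(p.denominator))


def convolve_up(d1, d2):
    """Sum of independent samples; masses are non-negative and rounded up."""
    out = {}
    for a, pa in d1.items():
        for b, pb in d2.items():
            out[a + b] = CTX.add(out.get(a + b, Decimal(0)),
                                 CTX.multiply(pa, pb))
    return out


def make_nfold(base):
    """Return a memoized double-and-add evaluator for n-fold sums (n >= 1)."""
    memo = {1: base}

    def nfold(n):
        if n < 1:
            raise ValueError("n must be at least 1")
        if n not in memo:
            half = nfold(n // 2)                 # reuse smaller results
            doubled = convolve_up(half, half)    # "double"
            memo[n] = convolve_up(doubled, base) if n % 2 else doubled  # "add"
        return memo[n]

    return nfold


def tail_up(d, t):
    """Upper approximation of Pr[|c| > t]."""
    total = Decimal(0)
    for v, p in d.items():
        if abs(v) > t:
            total = CTX.add(total, p)
    return total


def exact_nfold(base, n):
    """Reference: naive n-fold sum with exact rational masses."""
    acc = dict(base)
    for _ in range(n - 1):
        nxt = {}
        for a, pa in acc.items():
            for b, pb in base.items():
                nxt[a + b] = nxt.get(a + b, 0) + pa * pb
        acc = nxt
    return acc


exact_base = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}
nfold = make_nfold({v: round_up(p) for v, p in exact_base.items()})

upper = tail_up(nfold(13), 6)
exact = sum(p for v, p in exact_nfold(exact_base, 13).items() if abs(v) > 6)
print(Fraction(upper) >= exact)  # the rounded value never undershoots
```

Because every mass is non-negative and every multiplication and addition rounds up, each entry of the rounded distribution is at least the exact mass, which is the property that makes the final number a valid upper bound. The memo table is shared across calls, so asking for several values of `n` reuses the intermediate distributions already built.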
Acknowledgements

Most of the work was carried out while Manuel Barbosa was at MPI-SP and while Thing-han Lim was at Chelpis Quantum Corp. The research was supported by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) as part of the Excellence Strategy of the German Federal and State Governments – EXC 2092 CASA - 390781972 and by the German Federal Ministry of Education and Research (BMBF) in the framework of the 6GEM research hub under grant number 16KISK038.

References

[1] Agence nationale de la sécurité des systèmes d’information (ANSSI). 2023. ANSSI Views on the Post-Quantum Cryptography Transition: Follow-up Position Paper. Technical Report. Agence nationale de la sécurité des systèmes d’information (ANSSI). https://cyber.gouv.fr/sites/default/files/document/follow_up_position_paper_on_post_quantum_cryptography.pdf Accessed: 2025-02-11.
[2] Erdem Alkim, Joppe Bos, Léo Ducas, Patrick Longa, Ilya Mironov, Michael Naehrig, Valeria Nikolaenko, Chris Peikert, Ananth Raghunathan, and Douglas Stebila. 2021. FrodoKEM Learning With Errors Key Encapsulation Algorithm Specifications And Supporting Documentation. https://frodokem.org/files/FrodoKEM-specification-20210604.pdf Accessed: 2024-12-27.
[3] Erdem Alkim, Léo Ducas, Thomas Pöppelmann, and Peter Schwabe. 2016. Post-quantum Key Exchange - A New Hope. 327–343.
[4] José Bacelar Almeida, Santiago Arranz Olmos, Manuel Barbosa, Gilles Barthe, François Dupressoir, Benjamin Grégoire, Vincent Laporte, Jean-Christophe Léchenet, Cameron Low, Tiago Oliveira, Hugo Pacheco, Miguel Quaresma, Peter Schwabe, and Pierre-Yves Strub. 2024. Formally Verifying Kyber - Episode V: Machine-Checked IND-CCA Security and Correctness of ML-KEM in EasyCrypt. 384–421. https://doi.org/10.1007/978-3-031-68379-4_12
[5] Nouri Alnahawi, Johannes Müller, Jan Oupický, and Alexander Wiesmaier. 2024. A Comprehensive Survey on Post-Quantum TLS. 1, 2 (2024), 6. https://doi.org/10.62056/ahee0iuc
[6] Gilles Barthe, Benjamin Grégoire, Sylvain Heraud, and Santiago Zanella Béguelin. 2011. Computer-Aided Security Proofs for the Working Cryptographer. 71–90. https://doi.org/10.1007/978-3-642-22792-9_5
[7] Daniel J. Bernstein, Leon Groot Bruinderink, Tanja Lange, and Lorenz Panny. 2018. HILA5 Pindakaas: On the CCA Security of Lattice-Based Encryption with Error Correction. 203–216. https://doi.org/10.1007/978-3-319-89339-6_12
[8] Daniel J. Bernstein, Billy Bob Brumley, Ming-Shing Chen, Chitchanok Chuengsatiansup, Tanja Lange, Adrian Marotzke, Bo-Yuan Peng, Nicola Tuveri, Christine van Vredendaal, and Bo-Yin Yang. 2020. NTRU Prime. Technical Report. National Institute of Standards and Technology. Available at https://csrc.nist.gov/projects/post-quantum-cryptography/post-quantum-cryptography-standardization/round-3-submissions.
[9] Nina Bindel and John M. Schanck. 2020. Decryption Failure Is More Likely After Success. 206–225. https://doi.org/10.1007/978-3-030-44223-1_12
[10] Joppe W. Bos, Léo Ducas, Eike Kiltz, Tancrède Lepoint, Vadim Lyubashevsky, John M. Schanck, Peter Schwabe, Gregor Seiler, and Damien Stehlé. 2018. CRYSTALS - Kyber: A CCA-Secure Module-Lattice-Based KEM. In 2018 IEEE European Symposium on Security and Privacy, EuroS&P 2018. IEEE, 353–367. https://doi.org/10.1109/EuroSP.2018.00032
[11] Joppe W. Bos, Léo Ducas, Eike Kiltz, Tancrède Lepoint, Vadim Lyubashevsky, John M. Schanck, Peter Schwabe, Gregor Seiler, and Damien Stehlé. 2018. CRYSTALS - Kyber: A CCA-Secure Module-Lattice-Based KEM. 353–367. https://doi.org/10.1109/EuroSP.2018.00032
[12] Sofía Celi, Armando Faz-Hernández, Nick Sullivan, Goutam Tamvada, Luke Valenta, Thom Wiggers, Bas Westerbaan, and Christopher A. Wood. 2021. Implementing and Measuring KEMTLS. 88–107. https://doi.org/10.1007/978-3-030-88238-9_5
[13] Jan-Pieter D’Anvers and Senne Batsleer. 2022. Multitarget Decryption Failure Attacks and Their Application to Saber and Kyber. 3–33. https://doi.org/10.1007/978-3-030-97121-2_1
[14] Jan-Pieter D’Anvers, Qian Guo, Thomas Johansson, Alexander Nilsson, Frederik Vercauteren, and Ingrid Verbauwhede. 2019. Decryption Failure Attacks on IND-CCA Secure Lattice-Based Schemes. 565–598. https://doi.org/10.1007/978-3-030-17259-6_19
[15] Jan-Pieter D’Anvers, Angshuman Karmakar, Sujoy Sinha Roy, Frederik Vercauteren, Jose Maria Bermudo Mera, Michiel Van Beirendonck, and Andrea Basso. 2020. SABER. Technical Report. National Institute of Standards and Technology. Available at https://csrc.nist.gov/projects/post-quantum-cryptography/post-quantum-cryptography-standardization/round-3-submissions.
[16] Jan-Pieter D’Anvers, Mélissa Rossi, and Fernando Virdia. 2020. (One) Failure Is Not an Option: Bootstrapping the Search for Failures in Lattice-Based Encryption Schemes. 3–33. https://doi.org/10.1007/978-3-030-45727-3_1
[17] Jan-Pieter D’Anvers, Angshuman Karmakar, Sujoy Sinha Roy, and Frederik Vercauteren. 2018. Saber: Module-LWR based key exchange, CPA-secure encryption and CCA-secure KEM. Cryptology ePrint Archive, Report 2018/230. https://eprint.iacr.org/2018/230
[18] Jan-Pieter D’Anvers, Frederik Vercauteren, and Ingrid Verbauwhede. 2018. The impact of error dependencies on Ring/Mod-LWE/LWR based schemes. Cryptology ePrint Archive, Report 2018/1172. https://eprint.iacr.org/2018/1172
[19] Jan-Pieter D’Anvers, Frederik Vercauteren, and Ingrid Verbauwhede. 2018. On the impact of decryption failures on the security of LWE/LWR based schemes. Cryptology ePrint Archive, Report 2018/1089. https://eprint.iacr.org/2018/1089
[20] Jintai Ding, Saed Alsayigh, R V Saraswathy, Scott Fluhrer, and Xiaodong Lin. 2017. Leakage of signal function with reused keys in RLWE key exchange. In 2017 IEEE International Conference on Communications (ICC). 1–6. https://doi.org/10.1109/ICC.2017.7996806
[21] Jintai Ding, Scott R. Fluhrer, and Saraswathy RV. 2018. Complete Attack on RLWE Key Exchange with Reused Keys, Without Signal Leakage. 467–486. https://doi.org/10.1007/978-3-319-93638-3_27
[22] Scott Fluhrer. 2016. Cryptanalysis of ring-LWE based key exchange with key share reuse. Cryptology ePrint Archive, Report 2016/085. https://eprint.iacr.org/2016/085
[23] Federal Office for Information Security (BSI). 2024. Cryptographic Mechanisms: Recommendations and Key Lengths. Technical Guideline. https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/TechGuidelines/TG02102/BSI-TR-02102-1.pdf.
[24] FrodoKEM Team. 2024. FrodoKEM: Practical Quantum-Secure Key Encapsulation from Generic Lattices. https://frodokem.org/ Accessed: 2024-12-27, see the "News" section.
[25] Eiichiro Fujisaki and Tatsuaki Okamoto. 1999. Secure Integration of Asymmetric and Symmetric Encryption Schemes. 537–554. https://doi.org/10.1007/3-540-48405-1_34
[26] Kathrin Hövelmanns, Andreas Hülsing, and Christian Majenz. 2022. Failing Gracefully: Decryption Failures and the Fujisaki-Okamoto Transform. 414–443. https://doi.org/10.1007/978-3-031-22972-5_15
[27] Andreas Hülsing, Matthias Meijers, and Pierre-Yves Strub. 2022. Formal Verification of Saber’s Public-Key Encryption Scheme in EasyCrypt. 622–653. https://doi.org/10.1007/978-3-031-15802-5_22
[28] Zhengzhong Jin and Yunlei Zhao. 2017. Optimal Key Consensus in Presence of Noise. Cryptology ePrint Archive, Report 2017/1058. https://eprint.iacr.org/2017/1058
[29] Katharina Kreuzer. 2024. Verification of Correctness and Security Properties for CRYSTALS-KYBER. In 37th IEEE Computer Security Foundations Symposium, CSF 2024. IEEE, 511–526. https://doi.org/10.1109/CSF61375.2024.00016
[30] Michael Naehrig, Erdem Alkim, Joppe Bos, Léo Ducas, Karen Easterbrook, Brian LaMacchia, Patrick Longa, Ilya Mironov, Valeria Nikolaenko, Christopher Peikert, Ananth Raghunathan, and Douglas Stebila. 2020. FrodoKEM. Technical Report. National Institute of Standards and Technology. Available at https://csrc.nist.gov/projects/post-quantum-cryptography/post-quantum-cryptography-standardization/round-3-submissions.
[31] National Institute of Standards and Technology. 2024. FIPS PUB 203 – ML-KEM: Module-Lattice-Based Key-Encapsulation Mechanism Standard. https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.203.pdf.
[32] National Institute of Standards and Technology. 2024. FIPS PUB 204 – ML-DSA: Module-Lattice-Based Digital Signature Standard. https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.204.pdf.
[33] National Institute of Standards and Technology. 2024. FIPS PUB 205 – SLH-DSA: Stateless Hash-Based Digital Signature Standard. https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.205.pdf.
[34] NIST 2015. FIPS PUB 202 – SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions. https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf.
[35] Peter Schwabe, Roberto Avanzi, Joppe Bos, Léo Ducas, Eike Kiltz, Tancrède Lepoint, Vadim Lyubashevsky, John M. Schanck, Gregor Seiler, and Damien Stehlé. 2020. CRYSTALS-KYBER. Technical Report. National Institute of Standards and Technology. Available at https://csrc.nist.gov/projects/post-quantum-cryptography/post-quantum-cryptography-standardization/round-3-submissions.
[36] Tianrui Wang, Anyu Wang, and Xiaoyun Wang. 2023. Exploring Decryption Failures of BIKE: New Class of Weak Keys and Key Recovery Attacks. 70–100. https://doi.org/10.1007/978-3-031-38548-3_3
