Zero-Knowledge Proofs Overview 2023
Zero-Knowledge Proofs Overview 2023
Data Protection
Master in Cyber Security (UPC) 2023
Jorge L. Villar
Contents
1 Interactive Zero Knowledge Proofs 1
2 Proofs of Knowledge 2
2.1 Sigma Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2.1 Knowledge of a Discrete Logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2.2 Knowledge of an e-th Root Modulo n . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2.3 Knowledge of Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.4 Equality of Discrete Logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.5 Inequality of Discrete Logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 AND Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.4 OR Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4 Applications 8
4.1 Knowledge of Secret Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.2 Plaintext Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.3 Electronic Voting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.4 Directed Signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1
• Correctness: If the protocol is run by an honest prover and an honest verifier, the verifier always
accepts the proof.
• Soundness: An honest verifier accepts the proof of a dishonest prover for a false statment with a
probability not greater than certain bound (e.g., 1/2).
• Zero Knowledge: The verifier learns nothing else about the honest prover’s private input from the
protocol execution, other than the fact that the claimed statment is true.
The last property in particular means that after running the protocol, the verifier can prove nothing about
the statment to a third party. Therefore, the proof convinces only the verifier taking part in the protocol
execution.
The “learns nothing” part needs a more precise definition. The technical way to define the zero-knowledge
property is by means of a simulator S of the honest prover, that tries to impersonate the real prover but
without receiving any private input. In particular, S simulates the protocol for V sending and receiving
the appropriate messages, and in the end, it outputs the transcript of a real execution with an identical
probability distribution.
The existence of such simulator proves that whatever the verifier obtains from the protocol execution with
an honest prover can also be obtained by the verifier alone (actually, the simulator extracts the information
from the verifier, without any intervention of any prover).
Some relaxed versions of the previous notions are considered in practice. For instance, in Zero Knowledge Ar-
guments the soundness property holds only in a computational sense (e.g., if the prover breaks the soundness
property, then it can also solve a conjectured hard problem, like the Discrete Logarithm Problem).
The zero-knowledge property can also be relaxed to be computational zero-knowledge if we only require
that the probability distributions of the output of the simulator and the transcript of the real protocol are
computationally indistinguishable.
Another relaxation of the zero-knowledge notion is the honest-verifier zero-knowledge, in which the property
is only guaranteed to hold for honest verifiers.
2 Proofs of Knowledge
A proof of knowledge is a special type of zero-knowledge proof in which the statment has the form “P
knows a witness for the statment”. For this special type of proofs, the soundness (also known as special
soundness) is defined via a knowledge extractor. Namely, there exists an algorithm E that interacts
with any (possibly dishonest) prover with acceptance probability above certain bound, and outputs a valid
witness for the proven statment.
A classical example of a zero-knowledge proof of knowledge is showing that the prover knows an isomorphism
between two graphs. Informally, a graph is a finite set of points (called vertices) together with a set of pairwise
connections (called edges). More formally, a (finite) graph is defined by the tuple G = (V, E), where V and
E are a finite sets, and every element of E is an unordered pair of different elements in V . For instance,
the graph defined by V = {a, b, c, d} and E = {{a, b}, {b, c}, {c, d}, {a, c}, {a, d}} can be drawn as a square
polygon with one of the two diagonals.
Two graphs are called isomorphic if there is relabelling of the vertices that converts one graph to the other.
In the previous example, one can transform the square with the diagonal a-c into the same square with the
other diagonal b-d by cyclically permuting the vertices a 7→ b 7→ c 7→ d 7→ a.
Finding an isomorphism between two large random graphs is a hard problem. Therefore, it makes sense
proving the knowledge of such an isomorphism without revealing it. Assume then that a prover P and a
verifier V know two isomorphic graphs G0 and G1 , and P knows also a graph isomorphism φ : G0 → G1 .
We assume, for simplicity, that both graphs have the same set of vertices, and then the isomorphism is also
a permutation of this set.
2
To start the protocol, P chooses a random permutation ψ1 of the vertices and defines the graph G2 = ψ1 (G1 ),
that is clearly isomorphic to G1 . P also defines the isomorphism ψ0 = ψ1 ◦φ, that is an isomorphism bewteen
G0 and G2 . Then, P sends the graph G2 to V , for instance, giving a list or a description of its edges. Next,
V chooses a random bit c ∈ {0, 1} as sends it to P as a challenge. P responds the challenge sending V the
isomorphism ψc . V will accept the proof that P knows an isomorphism between G0 and G1 if ψc is bijective
and G2 = ψc (Gc ).
The protocol is trivially correct, since an honest verifier will always accept the proof given by an honest prover.
On the other hand, if a (possibly dishonest) prover P is accepted by an honest verifier V with probability
noticeably greater than 1/2, then there is a non-neglibible probability that P can answer correctly the two
challenges c = 0 and c = 1 for the same G2 . Therefore, from the two answers ψ0 and ψ1 , P itself can
compute the isomorphism φ = ψ1−1 ◦ ψ0 .
For the zero-knowledge property, there exists a simulator that can obtain from a (possible dishonest) verifier
an accepting transcript that is indistinguishable from the one obtained from a real execution of the protocol
between an honest prover and the same verifier. Indeed, the simulator (who does not know the isomorphism
φ : G0 → G1 ), chooses a random bit c ∈ {0, 1}. Then, it chooses random permutations ψ0 and ψ1 of the set
of vertices, and defines G2 = ψc (Gc ). Next, the simulator starts a protocol execution by sending G2 to V . If
the challenge given by V is different from c, then it aborts the protocol execution and restarts by choosing
a new random bit. Otherwise, the simulator outputs the transcript (G2 , c, ψc ).
Observe that the distribution of the simulated transcript is identical to the one of a real execution, even if
the challenge given by the verifier is biased in a way that depends on G0 , G1 or G2 .
The soundness error of 1/2 can be reduced arbitrarily by repeating the protocol many independent times.
The verifier will accept the aggregated proof only if all individual executions are accepted.
3
2.2 Examples
2.2.1 Knowledge of a Discrete Logarithm
In this protocol, due to Chaum and Pedersen, the common input consists of param = (G, q, g), where G is
a cyclic group of prime order q generated by g, together with the group element y = g x . The witness (i.e.,
the prover’s private input) is the discrete logarithm x, which knowledge is to be proved by P .
In order to generate the commitment, P chooses a random r ∈ Zq and computes a = g r .
After receiving a random challenge c ∈ Zq from the verifier, P computes the response t = r + cx mod q.
Finally, the verifier accepts the proof if g t = ay c .
Correctness is straightforward, since for honest parties g t = g r+cx = g r (g x )c = ay c .
On the other hand, extracting the witness from two accepting transcripts (a, c1 , t1 ), (a, c2 , t2 ) with a common
commitment but different challenges is also easy. Indeed, a as a group element can be written as a = g r for
a certain r ∈ Zq . Similarly, y = g x for a certain x ∈ Zq . Then, the proof acceptance equations for both
transcripts imply:
t1 = r + c1 x (mod q)
t2 = r + c2 x (mod q)
Now, using that c1 6= c2 we can recover the witness:
x = (t2 − t1 )(c2 − c1 )−1 (mod q)
Finally, the honest-verifier zero-knowledge property is based on a simulator that computes identically dis-
tributed accepting transcripts without knowing the witness. Namely, for a random c ∈ Zq , the simulator
chooses a random response t ∈ Zq and computes the matching commitment a = g t y −c .
Observe that the main difference from the simulator and the real protocol is the order in which the different
components of the transcript are generated. Indeed, the prover convinces the verifier only if the commitment
is generated prior receiving the challenge. Otherwise, an accepting transcript can be easily generated without
the knowledge of x.
The protocol is very similar to the previous one. The common input is an RSA modulus n = pq for two
random λ-bit primes, a prime number e of k bits, and the element y ∈ Zn× . The witness is the e-th root of
y, that is x ∈ Zn× such that y = xe mod n.
The commitment a is computed from a random r ∈ Zn× as a = re mod n, and, given a random challenge
c ∈ {0, ..., e − 1}, the response is t = rxc mod n.
The acceptance equation is te = ay c mod n.
Correctness is immediate, and from any two accepting transcripts (a, c1 , t1 ), (a, c2 , t2 ) with a common com-
mitment but different challenges, the witness can be recovered in the following way:
x = (t2 /t1 )α y β mod n,
where α and β are integers, computed with the extended Euclid gcd algorithm, fulfilling the equation
α(c2 − c1 ) + βe = 1.
Observe that c2 − c1 and e are always coprime, because the challenge is always smaller than e. If e is chosen
too small, then the tolerated soundness error probability 1/e (i.e., the probability that a dishonest prover
passes the verification without knowing any witness) can be not enough for practical applications.
However, one can always perform repeated independent executions of the protocol to reduce this error
probability to a reasonably small amount, at the cost of lossing some efficiency.
The honest-verifier zero-knowledge property works as in the previous example: First, choose random c and
t, then, compute the right commitment a = te y −c . It is easy to see that the simulated transcripts and the
4
real transcripts have the same exact probability distribution.
A generalization of the discrete logarithm to more than one independent base elemente is known as repre-
sentation of a group element. Given param = (G, q, g, h), where G is a cyclic group of prime order q, and
g and h are two independent generators (the discrete logarithm of h with respect to g is unknown), for a
group element y ∈ G the exponents (α, β) such that y = g α hβ are called a representation of y on base (g, h).
A direct generalization of Chaum-Pedersen proof allows the prover to convince a verifier that it knows a
representation without revealing it.
The common input is again (param, y), and the witness is a pair (α, β) such that y = g α hβ . The prover
generates the commitment as a = g r1 hr2 , for two random exponents r1 , r2 ∈ Zq . The response to a random
challenge c ∈ Zq is the pair t = (t1 , t2 ) = (r1 + cα, r2 + cβ). The verifier checks the equation g t1 ht2 = ay c .
From two accepting transcripts (a, c, t1 , t2 ), (a, c0 , t01 , t02 ) with the same commitment but different challenges,
one can efficiently extract the representation y = g α hβ , where α = (t01 − t1 )/(c0 − c) mod q and β =
(t02 − t2 )/(c0 − c) mod q. Observe that there are many different representations of the same group element,
but the prover just needs to prove the knowledge of a single one. Indeed, the knowledge of two different
representations of the same group element implies the knowledge of the discrete logarithm of h with respect
to g.
The honest-verifier zero-knowledge simulator just computes random c and (t1 , t2 ), and computes a =
g t1 ht2 y −c .
This example is actually the aggregation of two instances on the proof of knowledge of a discrete logarithm.
The statment proved in the protocol is informally “I know these two discrete logarithms, and they are equal”.
With the same setup than in Chaum-Pedersen proof, the common input is param = (G, q, g, h), where G is
a cyclic group of prime order q, and g and h are two generators, together with the group elements y and z.
The witness is the discrete logarithm x, such that y = g x and z = hx , which knowledge is to be proved by
P.
The commitment a is the pair a = (a1 , a2 ) = (g r , hr ), for a random r ∈ Zq . On reception of a random
challenge c ∈ Zq , the prover computes its response as t = r + cx mod q.
The verification equations are g t = a1 y c and ht = a2 z c .
From the soundness of Chaum-Pedersen proof, we know that a possibly dishonest prover that passes the
verification with a probability noticeable greater than 1/q can extract the discrete logarithm of y with respect
to g, and the discrete logarithm of z with respect to h. Furthermore, both discrete logarithms must be equal.
Indeed, even if the prover tries to cheat the verifier, one can always write y = g x1 , z = hx2 , a1 = g r1 and a2 =
hr2 , for suitable x1 , x2 , r1 , r2 ∈ Zq . Then, the verification equations imply t = r1 +cx1 = r2 +cx2 mod q, that
in case of x1 6= x2 they can only be satisfied by a single value of the challenge, c = (r2 − r1 )/(x2 − x1 ) mod q.
But this fact contradicts the assumption that the acceptance probability is greater than 1/q.
The honest-verifier zero-knowledge property uses a similar simulator to the one used in Chaum-Pedersen
proof.
Observe that the proof applies also when two different cyclic groups G1 , G2 of the same prime order q are
used, one generated by g and the other by h. Then, g, y, a1 ∈ G1 and h, z, a2 ∈ G2 .
Similarly, one can apply the same ideas to the proof of knowledge of a representation to obtain a proof of
equality of two representations, in which the knowledge of a solution (α, β) to the equations y1 = g1α hβ1 and
y2 = g2α hβ2 is proven. The commitment will be now a = (a1 , a2 ) = (g1r1 hr12 , g2r1 hr22 ), and the response will be
t = (t1 , t2 ) = (r1 + cα, r2 + cβ). The verification equations are now g1t1 ht12 = a1 y1−c and g2t1 ht22 = a2 y2−c .
5
2.2.5 Inequality of Discrete Logarithms
Proving inequality of discrete logarithms is not so easy as proving their equality. Actually, we have to be
more precise in the meaning of “proving inequality”. Here, we will refer to the statment “I know one discrete
logarithm, and also know that a second discrete logarithm is different from the first (but I do not prove the
knowledge of the second discrete logarithm)”.
The inequality proof is constructed from a variation of the proof of equality of two representations. Namely,
for a random nonzero exponent r ∈ Zq× , the discrete logarithm equation y = g x is transformed into 1 =
y r g −rx , that is, (r, −rx) is a representation of 1 with the basis (y, g). Similarly, the inequality z 6= hx is
transformed into a0 = z r h−rx 6= 1. This last equation means that (r, −rx) is also a representation of a0 with
the basis (z, h), but a0 6= 1.
The common input of the protocol is param = (G, q, g, h), together with the group elements y and z. The
witness is the discrete logarithm x, such that y = g x , which knowledge is to be proved by P , but now it is also
proven than z 6= hx . The commitment is the tuple a = (a0 , a1 , a2 ) = (z r h−rx , y r1 g r2 , z r1 hr2 ), for random
r ∈ Zq× and r1 , r2 ∈ Zq . The response given to the challenge c ∈ Zq is t = (t1 , t2 ) = (r1 + cr, r2 − crx). The
verification equations are y t1 g t2 = a1 , z t1 ht2 = a2 a−c
0 and a0 6= 1.
From two accepting transcripts (a0 , a1 , a2 , c, t1 , t2 ), (a0 , a1 , a2 , c0 , t01 , t02 ) with the same commitment but dif-
ferent challenges, the extractor computes x = −(t02 − t2 )/(t01 − t1 ) mod q. Observe that t1 and t01 cannot be
equal. Otherwise, from the verification equations, t2 = t02 and then a0 = 1, which is a contradiction.
For the honest-verifier zero-knowledge property, the simulator chooses random c, t1 , t2 ∈ Zq and a0 ∈ G\{1},
and computes a1 = y t1 g t2 and a2 = z t1 ht2 a−c
0 .
2.4 OR Proofs
Following a similar methodology, two sigma protocol instances can be combined together to form a single
sigma protocol instance which statment is the disjunction (OR) of the two individual statments. In this case,
the prover convinces the verifier that it knows either one or the other witness (but it does not necessarily
know both). The trick here is using the honest-verifier zero-knowledge simulator for the instance with the
unknown witness.
In the OR sigma protocol, the prover first runs the simulator for the instance with the unknown witness (say
the instance 1). Thus, it obtains an accepting simulated transcript (a1 , c1 , t1 ), that has exactly the same
probability distribution than a real accepting transcript for an honest verifier. Next, the prover normally
runs the other sigma protocol, and generates the corresponding commitment a2 . Then, it sends the two
6
commitments (a1 , a2 ) to the verifier. The verifier then sends a challenge c, and the prover splits it into two
random pieces: c1 and c2 = c − c1 . Then, the prover computes the response t2 following the second sigma
protocol normal instance, and it sends the two responses (t1 , t2 ) to the verifier. The verifier accepts the proof
by checking all the verification equations of the two individual sigma protocols.
The only restriction of the two protocols is that the equation c2 = c − c1 for a random c produces the right
probability distribution for the challenge in the second protocol instance. Normally, this is guaranteed if
both sigma protocols share the same set of possible challenges, and there is a group operation (addition,
XOR, etc.) defined on it.
The knowledge extractor for the aggregated sigma protocol runs on two accepting transcripts with common
commitments and different challenges. If the aggregated challenges c and c0 differ, then at least the challenges
of one of the individual protocols also differ. Then, the individual knowledge extractor of this sigma protocol
instance will correctly extract the corresponding witness.
Similarly, the honest-verifier zero-knowledge simulator for the aggregated sigma protocol just runs the indi-
vidual simulators, and then it computes the aggregated challenge as c = c1 + c2 .
Observe that in the OR proof, nothing is revealed about which of the two witnesses is used by the prover.
Indeed, the protocol executions (from the view of the verifier) using the witness from sigma protocols 1 or
2 are identically distributed.
Combining both AND and OR aggregation techniques one can virtually build sigma protocols for any ar-
bitrary boolean combination of statments, where the size of the challenge (and then the soundness error
probability) does not depend on the number of aggregation steps made. However, the lengths of the commit-
ment and the response will dramatically grow with the complexity of the boolean formula used to combine
the individual statments.
7
maintaining a table with random images associated to all queried values of the hash function). Then, the
extractor runs several instances of the prover to find two accepting transcripts with the same commitment
a but a different value of H(g, y, a), by changing some of the answers to the random oracle queries made by
the prover.
For the zero-knowledge property, the simulator can easily generate a transcript (a, t) with exactly the same
probability distribution as the one generated by an honest prover. Namely, from a random response t, the
commitment is computed as a = g t y −H(g,y,a) . Observe that in the non-interactive version of the protocol, a
dishonest verifier cannot introduce any bias in the probability distribution of the challenge.
The computation of the hash can involve some extra information, like a message m, turning the proof of
knowledge into a digital signature scheme (like in the Schnorr signature). Indeed, with the message included
in the hash, that is c = H(g, y, a, m), the actual statment proven in the protocol is “I am the one who knows
the witness and I also know (agree on) the message”. The secret signing key is the witness, the public key
is the statment fulfilled by the witness and the signature is the non-interactive proof. The unforgeability of
the signature is equivalent to the soundness of the zero-knowledge proof.
Observe that the security properties of the non-interactive protocols obtained from Fiat-Shamir conversion
only hold in the Random Oracle Model. But when the random oracle H() is replaced by a real hash function,
the security properties are no longer guaranteed.
4 Applications
4.1 Knowledge of Secret Keys
One important application of zero-knowledge proofs of knowledge is key-registration. For security, a certi-
fication autority can ask a key-holder to give a proof that he knows the secret key associated to its public
key. In the case of an ElGamal key, this exactly means proving the knowledge of a discrete logarithm. Then,
Chaum-Pedersen interactive protocol, or its non-interactive counterpart would be enough for this purpose.
8
In a similar way, one can provide a “proof of decryption” for an ElGamal ciphertext, without revealing the
secret key or disclosing the randomness. Namely, proving that (c0 , c1 ) decrypts to m is equivalent to prove
that the discrete logarithm of y with respect to g is equal to the discrete logarithm of c1 /m with respect to
c0 .
The two previous proofs require the knowledge of the secret key. However, in a similar way, we can provide
proofs that can be performed only form the knowledge of the randomness used in the encryptions.
For instance, in the latter case, one can prove that the discrete logarithm of c1 /m with respect to y equals
the discrete logarithm of c0 with respect to g.