Proof Theory Lecture Notes 1994
Helmut Schwichtenberg
These notes represent the content of a lecture course on proof theory given during Sommersemester 1994 at the Mathematisches Institut der Universität München. They are still
rather sketchy and will have to undergo many more revisions.
For their help in preparing these notes I would like to thank Ulrich Berger, Michael
Bopp, Felix Joachimski, Ralph Matthes, Karl–Heinz Niggl, Jaco van de Pol and Robert
Stärk.
Helmut Schwichtenberg
Contents
Part I: Preliminaries
1 Typed languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Natural deduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3 Hilbert style systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5 The strong existential quantifier . . . . . . . . . . . . . . . . . . . . . . . 38
6 Realizing terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7 Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Appendix
15 Permutative conversions . . . . . . . . . . . . . . . . . . . . . . . . . . 87
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Part I: Preliminaries
1 Typed languages
Let us first fix our language L. Let G be a set of ground types (e.g. nat and boole). Types
(also called object types or simple types) are formed from G by the operations ρ → σ and
ρ × σ. The level of a type ρ is defined by
lev(ι) := 0 for any ground type ι,
lev(ρ → σ) := max(lev(ρ) + 1, lev(σ)),
lev(ρ × σ) := max(lev(ρ), lev(σ)).
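For readers who like to experiment, the level function can be transcribed directly. The tuple encoding of types below is our own convention, not part of the notes: ground types are strings, ("->", ρ, σ) stands for ρ → σ, and ("x", ρ, σ) for ρ × σ.

```python
# A direct transcription of lev, under our own tuple encoding of types.

def level(rho):
    if isinstance(rho, str):                    # ground type iota
        return 0
    tag, left, right = rho
    if tag == "->":                             # max(lev(rho) + 1, lev(sigma))
        return max(level(left) + 1, level(right))
    if tag == "x":                              # max(lev(rho), lev(sigma))
        return max(level(left), level(right))
    raise ValueError(f"unknown type constructor {tag!r}")
```

So (nat → nat) → nat gets level 2, while nat × (nat → nat) stays at level 1: products do not raise the level, arrows on the left do.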
For any type ρ let a countably infinite set of variables of type ρ be given. We denote
variables of type ρ by xρ , y ρ , . . .. We also assume that a set C of constants denoted cρ
is given, each of an arbitrary type ρ. Furthermore, we assume that a set F of function
symbols denoted f is given, each of a “functionality” (ρ1 , . . . , ρm ) → σ. They are intended
to denote external functionals, which need not be represented by objects in the model.
Note that it is a consequence of this that we cannot freely abstract variables in a term, for
otherwise we could form λxρ f (x) for an arbitrary function symbol f , and hence f would
be an object of our model.
We define inductively terms tρ of type ρ, the set FV(tρ ) of variables free in tρ and the
set nonabs(tρ ) of non–abstractable variables in FV(tρ ).
• If t is a term of type ρ0 ×ρ1 and i ∈ {0, 1}, then πi (t) is a term of type ρi . FV(πi (t)) =
FV(t), nonabs(πi (t)) = nonabs(t).
We now fix our notion of a formula. Let an additional ground type ◦ be given, to be viewed
as the type of propositions. ◦ is not to be used in object types ρ, σ, · · ·. We assume that
R(t1 , . . . , tn ) and ⊥ are called atomic formulas or atoms. Note that ⊥ is not a propositional symbol.
We use (also with indices)
r, s, t for terms,
x, y, z for variables,
a, b, c for constants,
P, Q, R for relation symbols,
f, g, h for function symbols,
A, B, C for formulas.
¬A := A → ⊥,
A ∨ B := ¬A ∧ ¬B → ⊥,
∃x A := ¬∀x ¬A.
For simplicity we identify terms and formulas which differ only by the names of bound
variables. This makes it possible to define substitution r[tρ /xρ ] and A[tρ /xρ ] of a term tρ
for a variable xρ in a particularly simple fashion.
For a term r, variable xρ and a term tρ we define the result of substituting tρ for xρ
in r inductively on r:
y[t/x] := t, if x = y; y, otherwise.
c[t/x] := c.
f (t1 , . . . , tn )[t/x] := f (t1 [t/x], . . . , tn [t/x]),
(λy ρ r)[t/x] := λy ρ r[t/x], where y ≠ x and y ∉ FV(t),
(rs)[t/x] := r[t/x]s[t/x],
⟨t0 , t1 ⟩[t/x] := ⟨t0 [t/x], t1 [t/x]⟩,
πi (r)[t/x] := πi (r[t/x]),
and similarly for a formula A, variable xρ and a term tρ we define the result of substituting
tρ for xρ in A inductively on A:
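The substitution clauses for terms can be turned into a small program. The tuple encoding and the helper `fresh` below are our own; renaming of bound variables implements the convention of identifying terms which differ only in the names of bound variables.

```python
# Capture-avoiding substitution r[t/x], a sketch under our own term
# encoding: ("var", x), ("const", c), ("app", r, s), ("lam", x, r),
# ("pair", r, s), ("proj", i, r).

def free_vars(r):
    tag = r[0]
    if tag == "var":
        return {r[1]}
    if tag == "const":
        return set()
    if tag in ("app", "pair"):
        return free_vars(r[1]) | free_vars(r[2])
    if tag == "lam":
        return free_vars(r[2]) - {r[1]}
    if tag == "proj":
        return free_vars(r[2])

counter = 0
def fresh(y):
    """A new variable name not used so far (a simple global supply)."""
    global counter
    counter += 1
    return f"{y}_{counter}"

def subst(r, t, x):
    tag = r[0]
    if tag == "var":
        return t if r[1] == x else r
    if tag == "const":
        return r
    if tag in ("app", "pair"):
        return (tag, subst(r[1], t, x), subst(r[2], t, x))
    if tag == "proj":
        return ("proj", r[1], subst(r[2], t, x))
    if tag == "lam":
        y, body = r[1], r[2]
        if y == x:                    # x is bound here, nothing to substitute
            return r
        if y in free_vars(t):         # rename y so t's free y is not captured
            z = fresh(y)
            body = subst(body, ("var", z), y)
            y = z
        return ("lam", y, subst(body, t, x))
```

The clause for λ is where the side condition "y ≠ x and y ∉ FV(t)" from the definition above is enforced by renaming.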
We now want to formulate a general notion of a structure or model for our typed
languages, called environment model in [3].
M = ((Dρ )ρ , I0 , I1 ) is called an L–structure, if Dρ is a nonempty set for any type ρ,
I0 is a mapping assigning to any constant symbol cρ ∈ C an object I0 (c) ∈ Dρ and to any
function symbol f ∈ F of functionality (ρ1 , . . . , ρn ) → σ a function
I0 (f ): Dρ1 × · · · × Dρn → Dσ ,
(iii) a mapping assigning to any term t and any environment U (i.e. a mapping from the set of variables into ⋃ρ Dρ such that for each xρ we have U (xρ ) ∈ Dρ ) an element [[tρ ]]U ∈ Dρ such that the following holds.
Note that for a given L–structure M = ((Dρ )ρ , I0 , I1 ) and bijections Φρ,σ and Ψρ,σ it is
in general not possible to define [[t]]U by the requirements above. The reason is that the
function f : Dρ → Dσ defined by f (a) = [[t]]U[a/xρ ] may not be in the range [Dρ → Dσ ]
of Φρ,σ , and hence we cannot define [[λxρ t]]U by Φ−1 ρ,σ (f ). — A trivial way out of this
difficulty is of course to let [Dρ → Dσ ] be the set of all functions from Dρ to Dσ . Hence
we obtain the following trivial
Example of an environment model. Let Dι for any ground type ι be an arbitrary
nonempty set. Choose Dρ→σ to be the set of all functions from Dρ to Dσ , and Dρ×σ =
Dρ ×Dσ . If we then take Φρ,σ and Ψρ,σ to be identities, we can define [[t]]U by the equations
above.
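In this trivial model the clauses for [[t]]U become an ordinary interpreter: Python functions play the role of Dρ→σ and Python pairs the role of Dρ×σ, with Φ and Ψ the identities. The term encoding below is our own.

```python
# Evaluating a term in the trivial environment model: U maps variables to
# values, I0 maps constants to values. Term encoding (ours): ("var", x),
# ("const", c), ("app", r, s), ("lam", x, r), ("pair", r, s), ("proj", i, r).

def denote(t, U, I0):
    tag = t[0]
    if tag == "var":
        return U[t[1]]
    if tag == "const":
        return I0[t[1]]
    if tag == "app":                  # Phi is the identity here
        return denote(t[1], U, I0)(denote(t[2], U, I0))
    if tag == "lam":                  # [[lam x t]]_U maps a to [[t]]_{U[a/x]}
        return lambda a, t=t: denote(t[2], {**U, t[1]: a}, I0)
    if tag == "pair":                 # Psi is the identity as well
        return (denote(t[1], U, I0), denote(t[2], U, I0))
    if tag == "proj":
        return denote(t[2], U, I0)[t[1]]

# [[lam x. pi_0 <x, c>]] applied to 5 gives back 5.
example = ("lam", "x", ("proj", 0, ("pair", ("var", "x"), ("const", "c"))))
```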
There are more interesting examples of environment models, most notably the model
of the partial continuous functionals due to Scott [27] and Ersov [9]; see also [24] for an
exposition.
We will usually be somewhat sloppy with our notation and leave out Φ and Ψ. So we
write
[[t]]U [[s]]U instead of (Φρ,σ ([[t]]U ))([[s]]U ),
([[t]]U )i instead of (Ψρ,σ ([[t]]U ))i ,
[[[t0 ]]U , [[t1 ]]U ] instead of Ψ−1ρ,σ ([[[t0 ]]U , [[t1 ]]U ]).
We often write f M for the interpretation I0 (f ) of a function symbol and RM for the
interpretation I1 (R) of a relation symbol.
For any environment model M, environment U and formula A we define a model
relation M |= A[U ] by induction on A.
Coincidence Lemma. (i) If U (x) = U ′ (x) for all x ∈ FV(t), then [[t]]U = [[t]]U ′ .
2 Natural deduction

As our deductive formalism we use the system of natural deduction introduced by Gerhard
Gentzen in [11]. It consists of the following introduction and elimination rules for →, ∧
and ∀.
For any L–formula A let countably many assumption variables of type A be given.
We use uA , v A , wA to denote assumption variables of type A.
Later we will define substitution d[t/x] of object terms t for object variables x in
derivation terms d. We want to avoid that different assumption variables are identified by a substitution of object terms, as e.g. in ⟨u0^{R(t)} , u0^{R(x)} ⟩[t/x]. Therefore we assume that for any two assumption variables ui^A , uj^B in a derivation d in case A ≠ B we also have i ≠ j (cf. [33], p. 30).
The notions of a derivation term dA in minimal logic and its set FA(dA ) of free
assumption variables are defined inductively by
(λu^A d^B )^{A→B}
(d^{A→B} e^A )^B
⟨d^A , e^B ⟩^{A∧B}
πi (d^{A0∧A1} )^{Ai} is a derivation term with FA(πi (d^{A0∧A1} )^{Ai} ) := FA(d^{A0∧A1} ) (for i ∈ {0, 1}).
(λx^ρ d^A )^{∀x^ρ A}
(→− ) From derivations of A → B and of A we obtain a derivation of B.
(∧+ ) From derivations of A0 and of A1 we obtain a derivation of A0 ∧ A1 .
(∧−i ) From a derivation of A0 ∧ A1 we obtain a derivation of Ai (for i ∈ {0, 1}).
(∀+ ) From a derivation of A we obtain a derivation of ∀x A, provided x does not occur free in any open assumption of the given derivation of A.
(∀− ) From a derivation of ∀x A and a term t we obtain a derivation of A[t/x].
d^B [u1^{A1} , . . . , un^{An} ]
P → (Q → P ),
(P → Q → R) → (P → Q) → P → R,
(∀x.P (x) → Q(x)) → ∀x P (x) → ∀x Q(x).
Definition. For any derivation d we define its set FV(d) of free (object) variables by
FV(uA ) := FV(A).
FV(λuA dB ) := FV(A) ∪ FV(dB ).
FV(dA→B eA ) := FV(dA→B ) ∪ FV(eA ).
FV(⟨dA , eB ⟩) := FV(dA ) ∪ FV(eB ).
FV(πi (dA∧B )) := FV(dA∧B ).
FV(λx dA ) := FV(dA ) \ {x}.
FV(d∀x A t) := FV(d∀x A ) ∪ FV(t).
Example. From the assumption u: R(x) we obtain by →+ u a derivation d of R(x) → R(x).
Then we have
d = λuR(x) uR(x) ,
FA(d) = ∅,
FV(d) = {x}.
For derivation terms we have two kinds of substitution: we can substitute a derivation
term f A for an assumption variable uA , and we can substitute an object term t for an
object variable x. These substitutions are defined as follows. For simplicity we again
identify derivation terms which differ only in the names of bound variables.
Definition.
v[f /u] := f, if u = v; v, otherwise.
(λv d)[f /u] := λv d[f /u], where u ≠ v and v ∉ FA(f ).
(de)[f /u] := d[f /u]e[f /u].
(⟨d, e⟩)[f /u] := ⟨d[f /u], e[f /u]⟩.
(πi (d))[f /u] := πi (d[f /u]).
(λx d)[f /u] := λx d[f /u], where x ∉ ⋃{FV(A) : v^A ∈ FA(f )}.
(dt)[f /u] := d[f /u]t.
Definition.
uA [t/x] := uA[t/x] .
(λuA d)[t/x] := λuA[t/x] d[t/x].
(de)[t/x] := d[t/x]e[t/x].
(⟨d, e⟩)[t/x] := ⟨d[t/x], e[t/x]⟩.
(πi (d))[t/x] := πi (d[t/x]).
(λy d)[t/x] := λy d[t/x], where x ≠ y and y ∉ FV(t).
(dr)[t/x] := d[t/x]r[t/x].
Recall here the requirement for derivation terms mentioned above: for any two assumption variables ui^A , uj^B in a derivation d in case A ≠ B we also have i ≠ j (cf. [33], p. 30).
Lemma. (i) If d, f are derivation terms and t is an object term, then d[f /u] and d[t/x]
are derivation terms.
((P → Q) → P ) → P
((((P → Q) → P ) → P ) → Q) → Q
((((P → Q) → P ) → P ) → R) → R
Stability Lemma. From stability assumptions StabR for any relation symbol R occurring
in a formula A we can derive ¬¬A → A.
Proof by induction on A.
Case R(~t). Use StabR .
Case ⊥. ¬¬⊥ → ⊥ ≡ ((⊥ → ⊥) → ⊥) → ⊥: from the assumption (⊥ → ⊥) → ⊥ applied to the (trivially derivable) ⊥ → ⊥ we obtain ⊥.
Case A → B. Use
⊢ (¬¬B → B) → ¬¬(A → B) → A → B.
The derivation: assume ¬¬B → B, ¬¬(A → B) and A; to obtain B it suffices, by ¬¬B → B, to derive ¬¬B. So assume ¬B; under the further assumption A → B we obtain B from A, hence ⊥ from ¬B, and thus ¬(A → B); together with ¬¬(A → B) this yields ⊥, as required.
Case A ∧ B. Use
⊢ (¬¬A → A) → (¬¬B → B) → ¬¬(A ∧ B) → A ∧ B.
Case ∀x A. Use
⊢ (¬¬A → A) → ¬¬∀x A → A.
The derivation: assume ¬¬A → A and ¬¬∀x A; given x, it suffices, by ¬¬A → A, to derive ¬¬A. So assume ¬A; under the further assumption ∀x A we obtain A by ∀–elimination with x, hence ⊥ from ¬A, and thus ¬∀x A; together with ¬¬∀x A this yields ⊥.
Similarly we can show that from our ex–falso–quodlibet axioms we can derive ex–
falso–quodlibet for arbitrary formulas (again in our → ∧∀–language).
Ex–falso–quodlibet Lemma. From assumptions EfqR for any relation symbol R occur-
ring in a formula A we can derive ⊥ → A in intuitionistic logic.
Proof by induction on A.
From ¬¬A → A one can clearly derive ⊥ → A. Hence any formula derivable in
intuitionistic logic is also derivable in classical logic.
3 Hilbert style systems
By a Hilbert style system we mean a derivation system without the possibility to discharge
open assumptions, i.e. without the rule →+ of →–introduction. Such a system for minimal
quantifier logic is the following. Axioms:
K: B → A → B,
S: (A → B → C) → (A → B) → A → C,
∧+ –Ax: A → B → A ∧ B,
∧−i –Ax: A0 ∧ A1 → Ai ,
∀+ –Ax: (∀x.A → B) → A → ∀x B if x ∉ FV(A),
∀− –Ax: ∀x A → A[t/x].
Rules: only →− , ∀+ . Sometimes →− is also called modus ponens, and ∀+ the rule of
generalization. Note that in addition to the axioms (or better axiom constants) we also
allow assumption variables; the variable condition in the rule ∀+ only refers to the free
assumption variables, not to the axiom constants.
We want to prove that this Hilbert system is equivalent to the natural deduction
system for minimal quantifier logic introduced in §2.
For the non–trivial direction we consider an intermediate system which has the axioms
above and as rules only →+ , →− , ∀+ .
We first show that for any derivation d[~u: ~A ]: B in the natural deduction system we can find a derivation d′ [~u: ~A ]: B in the intermediate system. The proof is by induction on d.
Case ⟨d0 , d1 ⟩. Take ∧+ –Ax d′0 d′1 .
Case πi (d). Take ∧− –Ax d′ .
Case dt. Take ∀− –Ax t.
We now show the so–called deduction theorem, i.e. that for any derivation
d[u^A , ~u: ~A ]: B
K: ∀(B → A → B),
S: ∀((A → B → C) → (A → B) → A → C),
∧+ –Ax: ∀(A → B → A ∧ B),
∧−i –Ax: ∀(A0 ∧ A1 → Ai ),
K ∀ : ∀(B → ∀x B) if x ∉ FV(B),
S ∀ : ∀((∀x.B → C) → ∀x B → ∀x C),
∀− –Ax: ∀(∀x A → A[t/x]),
where ∀(A) denotes the universal closure of the formula A. Rules: only →− . Again in
addition to the axioms we also allow assumption variables. Show that for any derivation d[~u: ~A ]: B in the natural deduction system we can find a derivation d′ [~u: ~A ]: B in this variant of the Hilbert system.
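On the level of terms, the deduction theorem is the well-known "bracket abstraction" translation: a λ-abstraction is compiled away using only K and S, following the induction in the proof case by case. The sketch below, with our own tuple encoding and a naive reducer, illustrates this for untyped combinator terms.

```python
# Bracket abstraction: compile ("lam", x, b) into applications of the
# combinators "K" and "S" only, mirroring the deduction-theorem induction.
# Leaves are "K", "S" and ("var", x); applications are ("app", f, a).

def occurs(x, m):
    if isinstance(m, str):
        return False
    if m[0] == "var":
        return m[1] == x
    if m[0] == "app":
        return occurs(x, m[1]) or occurs(x, m[2])
    return m[1] != x and occurs(x, m[2])        # lam

def compile_term(m):
    if isinstance(m, str) or m[0] == "var":
        return m
    if m[0] == "app":
        return ("app", compile_term(m[1]), compile_term(m[2]))
    return abstract(m[1], compile_term(m[2]))   # lam

def abstract(x, m):
    if m == ("var", x):                         # identity: I = S K K
        return ("app", ("app", "S", "K"), "K")
    if not occurs(x, m):                        # x not used: K m
        return ("app", "K", m)
    return ("app", ("app", "S", abstract(x, m[1])), abstract(x, m[2]))

def step(m):                                    # one parallel K/S step
    if isinstance(m, str) or m[0] != "app":
        return m
    f, a = step(m[1]), step(m[2])
    if isinstance(f, tuple) and f[0] == "app":
        if f[1] == "K":                         # K p q -> p
            return f[2]
        g = f[1]
        if isinstance(g, tuple) and g[0] == "app" and g[1] == "S":
            return ("app", ("app", g[2], a), ("app", f[2], a))  # S p q r -> p r (q r)
    return ("app", f, a)

def nf(m):                                      # iterate to normal form
    while (m2 := step(m)) != m:
        m = m2
    return m
```

For example, compile_term(λx.x) yields S K K, and (S K K) a reduces to a; likewise the compiled λx.λy.x applied to a and b reduces to a, as the deduction theorem predicts.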
4 Normalization
We show in this section that any derivation d can be transformed by appropriate con-
version steps into a normal form. A derivation in normal form has the property that it
does not make “detours”, or more precisely, that it cannot occur that an elimination rule
immediately follows an introduction rule. Derivations in normal form have many pleasant
properties, and can be used for a variety of results. We also construct in this section the
so–called long normal form, by means of an additional conversion step called η–expansion.
Implementing normalization of λ–terms with free variables (like derivations d) in the
usual recursive fashion is quite inefficient. However, it is possible to compute the long
normal form of such a term by evaluating it in an appropriate model (cf. [2] and also [5]).
This makes it possible to use the built–in evaluation mechanism of e.g. Scheme (a Lisp
dialect, defined in [4]) to efficiently implement normalization.
We finally show that the requirement to give a normal derivation of a derivable formula
can sometimes be unrealistic. Following Orevkov [16] we give examples of formulas C k
which are easily derivable with non–normal derivations (whose number of nodes is linear
in k), but which require a non–elementary (in k) number of nodes in any normal derivation.
For the arguments in this section it is convenient to use the following notation.
Definition. M →0 M ′ is defined by
(λx M )N →0 M [N/x]. (β)
λx.M x →0 M, if x ∉ FV(M ) ∪ FA(M ). (η)
πi ⟨M0 , M1 ⟩ →0 Mi for i ∈ {0, 1}. (β×,i )
⟨π0 (M ), π1 (M )⟩ →0 M. (η× )
A term M is called β–convertible (η–convertible), if it has the form of a left hand side of
(β) or (β×,i ) ((η) or (η× )). Such terms are also called β–redex or η–redex (for reducible
expression).
From →0 (also denoted by →0βη ) one derives a one–step reduction relation → (also
denoted by →βη ) as follows. Intuitively M → M ′ means that M ′ is obtained from M by
converting exactly one subterm.
M →0 M ′ =⇒ M → M ′ .
M → M ′ =⇒ λx M → λx M ′ . (+)
M → M ′ =⇒ M N → M ′ N. (−0 )
N → N ′ =⇒ M N → M N ′ . (−1 )
M → M ′ =⇒ ⟨M, N ⟩ → ⟨M ′ , N ⟩. (+×,0 )
N → N ′ =⇒ ⟨M, N ⟩ → ⟨M, N ′ ⟩. (+×,1 )
M → M ′ =⇒ πi (M ) → πi (M ′ ) for i = 0, 1. (−×,i )
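A naive normalizer performing the (β) and (β×,i ) conversions bottom-up can be sketched as follows. The encoding is our own; substitution is kept naive (so bound variable names in examples are assumed distinct), and only the β-part is implemented, not the η-conversions.

```python
# beta-normalization by repeated conversion. Term encoding (ours):
# ("var", x), ("app", r, s), ("lam", x, r), ("pair", r, s), ("proj", i, r).

def subst(m, t, x):                   # naive: assumes no variable capture
    tag = m[0]
    if tag == "var":
        return t if m[1] == x else m
    if tag == "lam":
        return m if m[1] == x else ("lam", m[1], subst(m[2], t, x))
    if tag in ("app", "pair"):
        return (tag, subst(m[1], t, x), subst(m[2], t, x))
    return ("proj", m[1], subst(m[2], t, x))

def normalize(m):
    tag = m[0]
    if tag == "var":
        return m
    if tag == "lam":
        return ("lam", m[1], normalize(m[2]))
    if tag == "pair":
        return ("pair", normalize(m[1]), normalize(m[2]))
    if tag == "app":
        f, a = normalize(m[1]), normalize(m[2])
        if f[0] == "lam":             # (beta): (lam x M)N -> M[N/x]
            return normalize(subst(f[2], a, f[1]))
        return ("app", f, a)
    p = normalize(m[2])               # proj
    if p[0] == "pair":                # (beta_x,i): pi_i <M0, M1> -> M_i
        return p[1 + m[1]]
    return ("proj", m[1], p)
```

On typed terms this terminates by the strong normalization argument of this section; on untyped terms it may of course loop.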
A term M is strongly computable under substitution if for any list ~N of strongly computable terms M [ ~N ] is strongly computable.
Proof by simultaneous induction on the type ρ. For ground types ι both claims are trivial.
Case ρ → σ. (i) Let M ρ→σ be strongly computable. By IH(ii) and the definition
of strong computability (M x)σ is strongly computable. Hence by IH(i) any reduction
sequence starting with M x terminates. Then clearly this also holds for M , since any
subterm of a strongly normalizable term is also strongly normalizable. (ii) Let ~M be a list of strongly computable terms or 0 or 1 such that x ~M is of ground type. We have to show that x ~M is strongly computable, which for a ground type means the same as strongly normalizable. But this follows from IH(i), which says that any reduction sequence starting with some Mi terminates.
Case ρ×σ. (i) Let M ρ×σ be strongly computable. By definition e.g. π0 (M )ρ is strongly
computable and hence by IH(i) also strongly normalizable. This clearly also holds for M ,
since any subterm of a strongly normalizable term is also strongly normalizable. (ii) As in
Case ρ → σ.
Proof by induction on M .
Case x. The claim follows from Lemma 1(ii).
Case M N . Let ~K be strongly computable. We have to show that M [ ~K ]N [ ~K ] is strongly computable. But this clearly holds, since by IH M [ ~K ] as well as N [ ~K ] are strongly computable.
Case π0 (M ), π1 (M ). Similar.
Case λx M . Let ~K be strongly computable. We have to show that λx M [ ~K ] is strongly computable. So let N be strongly computable and ~L a list of strongly computable terms or 0 or 1. We have to show that (λx M [ ~K ])N ~L is strongly computable, i.e. that any reduction sequence starting with it terminates. So assume that we have such a reduction sequence. M [ ~K ], N and ~L are all strongly normalizable. Hence we can assume that in this reduction sequence there is a term (λx M [ ~K ′ ])N ′ ~L′ with M [ ~K ] →∗ M [ ~K ′ ], N →∗ N ′ and ~L →∗ ~L′ to which in the next step a head conversion has been applied. This must be either a β– or an η–conversion. In both cases the result is
M [ ~K ′ ][N ′ ] ~L′ .
Because of M [ ~K ] →∗ M [ ~K ′ ] we also have M [N, ~K ] →∗ M [ ~K ′ ][N ′ ]. Since by IH M is strongly computable under substitution we know that M [N, ~K ] is strongly computable, and hence by Lemma 2 also M [ ~K ′ ][N ′ ]. Since again by Lemma 2 ~L′ is strongly computable we obtain that M [ ~K ′ ][N ′ ] ~L′ is strongly computable and hence strongly normalizable.
⟨π0 (P ), π1 (P )⟩ ~L′ → P ~L′ .
Clearly the proof just given remains correct if we leave out the η–conversion rules (η)
and (η× ). The corresponding normal form is called β–normal form.
This proof can be extended to terms involving primitive recursion operators (see
e.g. [30, 31, 21]), the general recursion or fixed point operator (see [18]) or the bounded
fixed point operator (see [26]).
We now want to draw some conclusions from the fact that any derivation can be brought
into β–normal form. For this purpose we have to analyse more closely the form of a normal
derivation. First we need some terminological preparations.
With p, q ∈ {0, 1}∗ we denote positions in a term M . Pos(M ) is the set of all positions
in M , and [] denotes the empty list, i.e. the root position. p0 and p1 are the extensions of
the position p by 0 or 1. The subterm of M at position p is denoted by M/p, and M [N ]p
results from M by replacing the subterm at position p by N . We write p ≤ q if p is a prefix
of q. p and q are called independent (written p ∥ q) if p ≰ q and q ≰ p. Furthermore let
Endpos(M ) := {p ∈ Pos(M ) : endposM (p) = p}.
For β–normal terms M the leaf positions and the minimal positions are in a bijective
correspondence. Note however that two minimal positions can yield the same end position,
as e.g. in the derivation of P ∧ Q by ∧+ from P and Q.
is called the branch in M determined by the leaf position p. In particular, minposM (p) is
an element of the branch determined by p. In case endposM (p) = [] the branch is called a
main branch. In a β–normal term any branch p1 ≥ · · · ≥ pn has a particularly perspicuous
form: all elimination rules must come before all introduction rules.
In the case of a normal derivation term we want to draw from this some conclusions
on the formulas attached to the positions. To do this we need the following notions.
The relation “A is a subformula of B” is the reflexive transitive closure of the relation
“immediate subformula”, defined as follows.
If e^C is a subderivation of d^B , then C is a subformula of B or of some Ai .
Proof by induction on the length of the end position of a branch in d. Here we need the
properties of branches just mentioned.
Proof . To simplify the notation we only treat the case ∀x A ⊢ B with quantifier free A, B.
From the given derivation we can construct a normal derivation dB [u∀x A ]. By induction
on the length of the end position of a branch it can be shown easily that any branch ends
with a derivation of a quantifier free formula and begins with the rule ∀− , i.e. with u∀x A ti ,
or with a bound assumption variable v^C with quantifier free C. Now replace in the first case the subderivation u^{∀x A} ti by vi^{A[ti ]} with a new assumption variable vi .
4.3 Eta–expansion
Definition. M →η–exp M ′ iff there is a p ∈ Minpos(M ) such that the type of N := M/p is composed, and in case typ(N ) = ρ → σ we have
M ′ = M [λxρ .N x]p ,
while in case typ(N ) = ρ × σ we have
M ′ = M [⟨π0 (N ), π1 (N )⟩]p .
The key idea of Di Cosmo’s and Kesner’s proof is the observation that η–expansion
can be simulated by β–conversion:
(λy ρ→σ λxρ .yx)M →β λx.M x if x ∉ FV(M ),
(λxρ×σ ⟨π0 (x), π1 (x)⟩)M →β ⟨π0 (M ), π1 (M )⟩.
Lemma.
∆^k M ~N →∗β ∆^k (M ∆^k ~N ). (1)
(2). We again use induction on k. The case k = 0 is trivial, and for the induction
step we obtain
∆^{k+1}_{ρ→σ} (λx M ) = ∆^k_{ρ→σ} (∆ρ→σ (λx M ))
(3). Similar.
By a ∆–expansion of M we mean a term M ∆ obtained from M by replacing simulta-
neously every η–expandable subterm N of type ρ in M by ∆kρ N for some k > 0.
By an inner ∆–expansion of M we mean a term M i∆ obtained from M by replacing simultaneously every η–expandable subterm N of type ρ in M except the term M itself by ∆kρ N for some k > 0.
By a strong ∆–expansion of M we mean a term M s∆ obtained from M by replacing
simultaneously every η–expandable subterm N of type ρ in M by ∆kρ N for some k > 0
and every other subterm N of type ρ in M by ∆kρ N for some k ≥ 0.
For the proof of the following theorem we need an inductive definition of η–expansion,
which can be given as follows.
M →0η–exp M ′ =⇒ M →η–exp M ′ .
M →η–exp M ′ =⇒ (λx M ) ~K →η–exp (λx M ′ ) ~K .
M →η–exp M ′ =⇒ ⟨M, N ⟩ ~K →η–exp ⟨M ′ , N ⟩ ~K .
N →η–exp N ′ =⇒ ⟨M, N ⟩ ~K →η–exp ⟨M, N ′ ⟩ ~K .
N →η–exp N ′ =⇒ M N ~K →η–exp M N ′ ~K .
M →0β M ′ =⇒ M ~N →β M ′ ~N .
M →β M ′ =⇒ (λx M ) ~K →β (λx M ′ ) ~K .
M →β M ′ =⇒ ⟨M, N ⟩ ~K →β ⟨M ′ , N ⟩ ~K .
N →β N ′ =⇒ ⟨M, N ⟩ ~K →β ⟨M, N ′ ⟩ ~K .
N →β N ′ =⇒ M N ~K →β M N ′ ~K .
Proof by induction on the definition of M →βη –exp N . Recall →βη –exp =→β ∪ →η –exp .
We only treat the cases concerning types ρ → σ; the cases concerning product types ρ × σ
can be dealt with in a similar fashion.
Case (λx M )N ~N →β M [N/x] ~N . Any ∆–expansion of (λx M )N ~N must be of the form ∆k ((λx M ∆ )N ∆ ~N ∆ ) for some k > 0 with ∆–expansions M ∆ of M , N ∆ of N and ~N ∆ of ~N . One β–reduction yields ∆k (M ∆ [N ∆ /x] ~N ∆ ). Since this term is a strong ∆–expansion of M [N/x] ~N we can β–reduce it by the previous Lemma to a ∆–expansion of M [N/x] ~N .
Case M →0η−exp λx.M x. Then M is not a λ–expression. Hence any ∆–expansion
of M must be of the form ∆k+1 M i∆ with an inner ∆–expansion M i∆ of M . From ∆ =
λyλx.∆(y∆x) we obtain immediately
Since this term is a strong ∆–expansion of λx.M x we can β–reduce it by the previous
Lemma to a ∆–expansion of λx.M x.
In the following cases let → denote either →η –exp or else →β .
Case (λx M ) ~K → (λx M ′ ) ~K . Any ∆–expansion of (λx M ) ~K must be of the form ∆k ((λx M ∆ ) ~K ∆ ) for some k > 0 with ∆–expansions M ∆ of M and ~K ∆ of ~K . By IH we can find a ∆–expansion M ′∆ of M ′ such that M ′∆ can be obtained from M ∆ by a non–empty sequence of β–reductions. Let ((λx M ′ ) ~K )∆ := ∆k ((λx M ′∆ ) ~K ∆ ).
We now use the long normal form of derivation terms to show that any classical proof
of ⊥ from so–called generalized (definite) Horn formulas can be converted into a proof in
minimal logic. Moreover we describe a reasonable algorithm to do this conversion; here we
follow [22]. This result can be used to prove completeness of SLD–Resolution.
A formula is called Horn formula if it has the form ∀x1 , . . . , xn .A1 → . . . → Am → B
with Ai and B atomic. It is called definite Horn formula if in addition we have B ≠ ⊥. If
instead of atomic Ai we allow universally quantified atomic formulas, the result is called a
generalized (definite) Horn formula.
Proof by induction on the total number of stability axioms used. Note first that bound
assumption variables u in the given normal proof can only occur in the context
StabR^{∀~x.¬¬R(~x)→R(~x)} ~t (λu d)
with u of type ¬R(~t) and d of type ⊥. The reason for this is that all top formulas different
from stability axioms are generalized Horn formulas which never have an implication in
the premise of another implication.
Case 1. There is at least one occurrence of a bound assumption variable in the
proof. Since we assume our proof to be in long normal form, any of the occurrences of
an assumption variable u of type ¬R(~t) must be the main premise of an →–elimination,
i.e. must be in a context ud1 where d1 derives R(~t). Now choose an uppermost occurrence
of a bound assumption variable, i.e. a subderivation ud1 where d1 does not contain an
occurrence of any bound assumption variable. Since d1 derives R(~t), we can replace the
whole subderivation StabR (~t)(λu d) of R(~t) (the one where u is bound) by d1 . Hence we
have removed one occurrence of a stability axiom.
Case 2. Otherwise. If there are no more stability axioms in the proof, we are
done. If not, choose an uppermost occurrence of a stability axiom, i.e. a subderivation
StabR (~t)(λu d) where d does not contain stability axioms. Since we are in case 2 here d
also cannot contain free assumption variables which are bound elsewhere in the proof. But
since d derives ⊥, we can replace the whole proof (which also has ⊥ as its end formula) by
d and hence we are done again.
Note that Theorem 1 is best possible in the sense that it becomes false if we allow an
implication in the body of one of the Horn formulas. A counterexample (due to U. Berger)
is
((P → Q) → ⊥) → (P → ⊥) → ⊥,
which is provable in classical but not in minimal logic. For if it were, we could replace
⊥ in this proof (which in minimal logic is just another propositional variable) by P , and
hence we would obtain a proof in minimal logic of the Peirce formula
((P → Q) → P ) → P, which is impossible.
Proof by a simple modification of the argument for Theorem 1. Note that in case 2
it cannot happen that stability axioms occur in the proof since then we would have a
derivation d of ⊥ from definite Horn formulas, which is clearly impossible.
We now show the uniqueness of normal forms. More precisely, we show that the one–step
relations →βη and →βη –exp are confluent and hence that the βη normal form as well as
the long normal form are uniquely determined. This follows from the local confluence and
termination of these relations by the Lemma of Newman [15].
Remark : If we leave out the conversion rule
⟨π0 (M ), π1 (M )⟩ →0 M (η× )
can be proved. For →βη instead of →par this property does not hold. The reason that it
holds for →par is that →par is compatible with substitution in the following sense: we have
Lemma. (i) M → M ′ =⇒ M [N ] → M ′ [N ].
(ii) N → N ′ =⇒ M [N ] →∗ M [N ′ ].
Note that (ii) with → instead of →∗ does not hold. The reason is that the variable x
to be substituted can have multiple occurrences in M , while the relation M → M ′ means
that M ′ is obtained from M by conversion at exactly one position.
Case −0 +, β.
(λx M )N → (λx M ′ )N
↓ ↓
M [N ] → M ′ [N ]
Case −1 , β.
(λx M )N → (λx M )N ′
↓ ↓
M [N ] →∗ M [N ′ ]
Case +−1 , η. This case cannot occur, since there is no N such that x → N .
Case +β, η.
λx.(λx M )x → λx M
↓ =
λx M = λx M
We now treat the group of cases concerning pairs and components. We leave out the
index ×.
Case −0 , −1 .
⟨M, N ⟩ → ⟨M ′ , N ⟩
↓ ↓
⟨M, N ′ ⟩ → ⟨M ′ , N ′ ⟩
π0 ⟨M0 , M1 ⟩ → π0 ⟨M0′ , M1 ⟩
↓ ↓
M0 → M0′
Case −i η, βi .
πi ⟨π0 (M ), π1 (M )⟩ → πi (M )
↓ =
πi (M ) = πi (M )
⟨π0 (M ), π1 (M )⟩ → ⟨π0 (M ′ ), π1 (M )⟩
↓ ↓∗
M → M′
Proof . A term M is called good if it satisfies the property of confluence, i.e. if for all
M ′ , M ′′ we have
M → M1 →∗ M′
↓ ↓∗
M2 →∗ M3
↓∗
M ′′
M → M1 →∗ M′
↓ ↓∗ ↓∗
M2 →∗ M3 →∗ N
↓∗ ↓∗ ↓∗
M ′′ →∗ K →∗ L
Lemma. (i) M → M ′ =⇒ M [N ] → M ′ [N ].
(ii) N → N ′ =⇒ ∃K.M [N ], M [N ′] →∗ K.
Case −0 +, β.
(λx M )N → (λx M ′ )N
↓ ↓
M [N ] → M ′ [N ]
Case −0 , −1 .
⟨M, N ⟩ → ⟨M ′ , N ⟩
↓ ↓
⟨M, N ′ ⟩ → ⟨M ′ , N ′ ⟩
π0 ⟨M0 , M1 ⟩ → π0 ⟨M0′ , M1 ⟩
↓ ↓
M0 → M0′
We now show that normalization can be achieved by evaluation, following [2]. We make use
of the fact proved above that any term M has a unique long normal form which we now
denote by lnf(M ). The following properties of the long normal form will be used.
We define a model (Dρ )ρ of our language as follows. For any ground type ι let Dι be
the set Lι of all terms of type ι in long normal form. Furthermore let D ρ→σ be the set of
all functions f : Dρ → Dσ and Dρ×σ be the cartesian product Dρ × Dσ .
Similar to Tait’s notion of strong computability we define for any type ρ a relation Rρ
on Tρ × Dρ by
Rι (M, a) ⇐⇒ lnf(M ) = a,
Rρ→σ (M, f ) ⇐⇒ ∀N, a.Rρ (N, a) =⇒ Rσ (M N, f (a)),
Rρ×σ (M, [a, b]) ⇐⇒ Rρ (M 0, a) and Rσ (M 1, b).
Note that M =βη M ′ implies R(M, a) ⇐⇒ R(M ′ , a); this follows from (2) by induction
on M .
Lemma 2. Assume R(c, [[c]]) for any constant c. Then for any term M [~x ] of type ρ and any environment U
R( ~K , U (~x )) =⇒ R(M [ ~K /~x ], [[M ]]U ).
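An untyped miniature of this "normalization by evaluation" idea fits in a few lines: terms are evaluated into a model whose function space is the Python function space, and values are read back ("reified") as terms. Without type information the sketch produces β-normal forms rather than long normal forms, since η-expansion needs the types; the encoding and the fresh-name scheme are ours.

```python
import itertools

# Term encoding (ours): ("var", x), ("app", f, a), ("lam", x, b).
# Values are Python functions (for lambdas) or "neutral" terms whose
# head is a variable.

def evaluate(t, env):
    tag = t[0]
    if tag == "var":
        return env.get(t[1], t)       # free variables denote themselves
    if tag == "lam":
        return lambda v, t=t, env=env: evaluate(t[2], {**env, t[1]: v})
    f = evaluate(t[1], env)           # app
    a = evaluate(t[2], env)
    return f(a) if callable(f) else ("app", f, a)

def reify(v, fresh):
    if callable(v):                   # read back a semantic function
        x = f"v{next(fresh)}"
        return ("lam", x, reify(v(("var", x)), fresh))
    if v[0] == "app":                 # neutral application
        return ("app", reify(v[1], fresh), reify(v[2], fresh))
    return v                          # a variable

def nbe(t):
    return reify(evaluate(t, {}), itertools.count())
```

For instance (λx.λy.x y)(λz.z) evaluates to the identity function of the model and reads back as λv0.v0; the intermediate β-redexes are never built as terms, which is the source of the efficiency mentioned above.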
We now show that the requirement to give a normal derivation of a derivable formula can
sometimes be unrealistic. Following Orevkov [16] we give examples of formulas C k which
are easily derivable with non–normal derivations (whose number of nodes is linear in k),
but which require a non–elementary (in k) number of nodes in any normal derivation.
The example is related to Gentzen's proof in [12] of transfinite induction up to ω_k in
arithmetic; see e.g. [20] for an exposition. There the function y ⊕ ω^x plays a crucial role,
and also the assignment of a “lifting”–formula A+ to any formula A, by
A+ := ∀y.(∀z ≺ y)A[z/x] → (∀z ≺ y ⊕ ω^x )A[z/x].
Here we consider the numerical function y + 2^x instead, and axiomatize its graph by
means of Horn clauses. The formula Ck expresses that from these axioms the existence of 2_k follows, where 2_0 := 1 and 2_{i+1} := 2^{2_i}. A short, non–normal proof of this fact can then be given by a modification of Gentzen's idea, and it is easily seen that any normal proof of Ck must contain at least 2_k nodes.
The derivations to be given make heavy use of the existential quantifier ∃ defined by
¬∀¬. In particular we need:
Existence–Introduction–Lemma. ⊢ A → ∃x A.
Note that the stability assumption ¬¬B → B is not needed if B does not contain an
atom ≠ ⊥ as a strictly positive subformula. This will be the case for the derivations below,
where B will always be an existential formula.
Let us now fix our language. We use a ternary relation symbol R to represent the
graph of the function y + 2^x ; so R(y, x, z) is intended to mean y + 2^x = z. We now
axiomatize R by means of Horn clauses. For simplicity we use a unary function symbol
s (to be viewed as the successor function) and a constant 0; one could use logic without
function symbols instead — as Orevkov does —, but this makes the formulas somewhat less readable and the proofs less perspicuous.
To obtain the short proof of the goal formula Ck we use formulas Ai with a free
parameter x; for ease in reading we write A[r] instead of A[r/x].
Proof . We give an informal argument, which can easily be converted into a formal proof.
Note that the Existence–Elimination–Lemma is used only with existential formulas as
conclusions. Hence it is not necessary to use stability axioms and we have a derivation in
minimal logic.
Case i = 0. Obvious by Hyp1 .
Case i = 1. Let x with A0 [x] be given. It is sufficient to show A0 [s(x)], that is
∀y∃z1 R(y, s(x), z1 ). So let y be given. We know
A0 [x] = ∀y∃z R(y, x, z). (1)
Applying (1) to our y gives z such that R(y, x, z). Applying (1) again to this z gives z1
such that R(z, x, z1 ). By Hyp2 we obtain R(y, s(x), z1).
Case i + 2. Let x with Ai+1 [x] be given. It suffices to show Ai+1 [s(x)], that is
∀y.Ai [y] → ∃z.Ai [z] ∧ R(y, s(x), z). So let y with Ai [y] be given. We know
Ai+1 [x] = ∀y.Ai [y] → ∃z1 .Ai [z1 ] ∧ R(y, x, z1). (2)
Applying (2) to our y gives z such that Ai [z] and R(y, x, z). Applying (2) again to this z
gives z1 such that Ai [z1 ] and R(z, x, z1 ). By Hyp2 we obtain R(y, s(x), z1).
Remark . Note that the derivations given have a fixed length, independent of i.
Proof . We give an informal argument, which can easily be converted into a formal proof.
Again the Existence–Elimination–Lemma is used only with existential formulas as conclu-
sions, and hence we have a derivation in minimal logic.
Ak [0] applied to 0 and Ak−1 [0] yields zk with Ak−1 [zk ] and R(0, 0, zk ).
Ak−1 [zk ] applied to 0 and Ak−2 [0] yields zk−1 with Ak−2 [zk−1 ] and R(0, zk , zk−1 ).
...
A1 [z2 ] applied to 0 and A0 [0] yields z1 with A0 [z1 ] and R(0, z2 , z1 ).
A0 [z1 ] applied to 0 yields z0 with R(0, z1 , z0 ).
Remark . Note that the derivations given have length linear in k.
We want to compare the length of this derivation of Ck with the length of an arbitrary
normal derivation.
Lemma. Any normal derivation of Ck from Hyp1 and Hyp2 has at least 2_k nodes, where 2_0 := 2^0 and 2_{i+1} := 2^{2_i}.
Proof . Let a normal derivation d of falsity ⊥ from Hyp1 , Hyp2 and the additional hypo-
thesis
u: ∀zk , . . . , z0 .R(0, 0, zk ) → R(0, zk , zk−1 ) → . . . → R(0, z1 , z0 ) → ⊥
be given. We may assume that d does not contain free object variables (otherwise substitute
them by 0). The main branch of d must begin with u, and its side premises are all of the
form R(0, s^n(0), s^k(0)).
Observe that any normal derivation of R(s^m(0), s^n(0), s^k(0)) from Hyp1, Hyp2 and u
has at least 2^n occurrences of Hyp1 and is such that k = m + 2^n. This can be seen easily
by induction on n. Note also that such a derivation cannot involve u.
If we apply this observation to the above derivations of the side premises we see that
they derive
R(0, 0, s^{2_0}(0)), R(0, s^{2_0}(0), s^{2_1}(0)), . . . , R(0, s^{2_{k−1}}(0), s^{2_k}(0)),
where 2_0 := 2^0 and 2_{i+1} := 2^{2_i}.
We now extend our language L by a strong existential quantifier written ∃∗ (as opposed
to ∃ defined by ¬∀¬). There are two approaches to deal with formulas containing ∃∗ in a
constructive setting, e.g. in minimal or intuitionistic logic.
• A formula containing ∃∗ is considered not to be an entity the deduction system can deal
with: some “realizing terms” are required to turn it into a “judgement” (this terminol-
ogy is due to Weyl [34] and has been taken up by Martin–Löf). E.g. r realizes ∃∗ x A
is a judgement, which can be translated into A[r/x].
• (Heyting) The logic is extended by axioms expressing the intended meaning of the
strong existential quantifier.
We will treat both approaches here. At first sight, Weyl’s point of view is more convincing.
However, Heyting’s is more prominent in the literature, and we also need it to properly
discuss Friedman’s A–translation.
Let us first describe Heyting’s approach. Here we extend our notion of an L–formula
by adding a clause
In the inductive definition of derivation terms dA in minimal logic and their sets FA(dA )
of free assumptions we have to add two more clauses:
(∃∗+) If A is a formula and x is a variable, then
∃∗+_{x,A}: ∀.∀x.A → ∃∗x A
(∃∗−) If A, B are formulas and x is a variable of type ρ such that x ∉ FV(B), then
∃∗−_{x,A,B}: ∀.(∃∗x^ρ A) → (∀x^ρ.A → B) → B
It can be shown that any derivation term has a unique normal form with respect to βη∃∗ –
conversion.
An alternative (in fact more usual) way to introduce the strong existential quantifier
∃∗ into our natural deduction calculus is to use rules instead of axiom schemata. These
rules have been formulated by Gentzen [11], as follows.
(∃∗+)
      |
     A[t]
   --------- ∃∗+
    ∃∗x A

(∃∗−)
               [A]
      |         |
   ∃∗x A        B
   --------------- ∃∗−
         B
provided x does not occur free in B and in any open assumption of the given derivation
of B apart from the assumption A shown.
is a derivation term with FA(∃∗− (d, x, u, e)) = FA(d) ∪ (FA(e) \ {u}). The variables x
and u are viewed as bound by ∃∗− (d, x, u, e).
Also the definition of the set FV(d) of free (object) variables has to be extended by
Similarly one can also introduce a strong disjunction into our natural deduction cal-
culus, written ∨∗ (as opposed to ∨ defined by A ∨ B := ¬(¬A ∧ ¬B)). We again extend
our notion of an L–formula by adding a clause
In the inductive definition of derivation terms dA in minimal logic and their sets FA(dA )
of free assumptions we have to add three more clauses:
(∨∗+_0) If A, B are formulas, then
∨∗+_{A,B,0}: ∀.A → A ∨∗ B
(∨∗+_1) If A, B are formulas, then
∨∗+_{A,B,1}: ∀.B → A ∨∗ B
(∨∗−) If A, B, C are formulas, then
∨∗−_{A,B,C}: ∀.A ∨∗ B → (A → C) → (B → C) → C
Clearly FV(∨∗+_i) = FV(∨∗−) = ∅.
An alternative (and again more usual) way to introduce strong disjunction ∨∗ into
our natural deduction calculus is to use rules instead of axiom schemata. These rules have
been formulated by Gentzen [11], as follows.
(∨∗+_0)
      |
      A
   --------- ∨∗+_0
   A ∨∗ B

(∨∗+_1)
      |
      B
   --------- ∨∗+_1
   A ∨∗ B

(∨∗−)
              [A]    [B]
      |        |      |
   A ∨∗ B      C      C
   ---------------------- ∨∗−
            C
(∨∗+_0) If d^A is a derivation term, then
∨∗+_0(d^A)^{A ∨∗ B}
is a derivation term with FA(∨∗+_0(d^A)) = FA(d).
(∨∗+_1) If e^B is a derivation term, then
∨∗+_1(e^B)^{A ∨∗ B}
is a derivation term with FA(∨∗+_1(e^B)) = FA(e).
(∨∗−) If d^{A ∨∗ B}, e^C and f^C are derivation terms and u^A, v^B are assumption variables, then
∨∗−(d, u, e, v, f)^C
is a derivation term with FA(∨∗−(d, u, e, v, f)) = FA(d) ∪ (FA(e) \ {u}) ∪ (FA(f) \ {v}).
Again the variables u, v are viewed as bound by ∨∗−(d, u, e, v, f).
Also the definition of the set FV(d) of free (object) variables has to be extended by
FV(∨∗+_0(d^A)^{A ∨∗ B}) := FV(d) ∪ FV(B),
FV(∨∗+_1(e^B)^{A ∨∗ B}) := FV(e) ∪ FV(A),
FV(∨∗−(d, u, e, v, f)) := FV(d) ∪ FV(e) ∪ FV(f).
Note that one can easily extend the Ex–falso–quodlibet Lemma to the present situation
and prove ⊥ → A for an arbitrary formula A. In the cases ∃∗ x A and A ∨∗ B just use the
corresponding introduction axiom (or rule).
As an application of normalization we obtain the so–called existence and disjunction
properties of minimal and intuitionistic logic. To formulate it we introduce the notion of
an instance of a formula possibly involving ∃∗ , ∨∗ . It is obtained by recursively replacing
strictly positive subformulas ∃∗x A by A[t/x] and A ∨∗ B by A or B. More precisely, we
give the following inductive definition.
Let us now describe Weyl’s approach. We restrict ourselves to formulas without ∨∗ , since
in the presence of a ground type of booleans we can define A ∨∗ B by
r1^{ρ1}, . . . , rm^{ρm} mr A.
τ(A → B) := ~ρ → σ1, . . . , ~ρ → σn,
τ(A ∧ B) := ~ρ, ~σ,
τ(∀x^ρ B) := ρ → σ1, . . . , ρ → σn,
τ(∃∗x^ρ B) := ρ, ~σ,
where τ(A) = ~ρ = ρ1, . . . , ρm and τ(B) = ~σ = σ1, . . . , σn; for atomic A we let τ(A) := ε, the empty list.
Instead of ~ρ → σ1, . . . , ~ρ → σn we will sometimes write ~ρ → ~σ. To give some examples, let
n, m, k be of type nat. Then
Note that τ (A) = ε iff A is a Harrop formula (i.e. contains ∃∗ in premises of → only).
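The type assignment τ can be sketched in Python on a tuple encoding of formulas; the encoding and all names below are illustrative assumptions, not part of the formal system.

```python
# Tuple encoding (an assumption): ('atom',), ('imp', A, B), ('and', A, B),
# ('all', rho, B), ('ex', rho, B).  Types are strings; tau returns the list
# tau(A) = sigma_1, ..., sigma_n, with the empty list playing the role of epsilon.

def arrow(rhos, sigma):
    # rho_1 -> ... -> rho_m -> sigma, as a string
    t = sigma
    for rho in reversed(rhos):
        t = '(' + rho + '->' + t + ')'
    return t

def tau(A):
    tag = A[0]
    if tag == 'atom':
        return []                                     # tau(atomic) = epsilon
    if tag == 'imp':
        rhos, sigmas = tau(A[1]), tau(A[2])
        return [arrow(rhos, s) for s in sigmas]       # vec rho -> sigma_i
    if tag == 'and':
        return tau(A[1]) + tau(A[2])                  # vec rho, vec sigma
    if tag == 'all':
        return [arrow([A[1]], s) for s in tau(A[2])]  # rho -> sigma_i
    if tag == 'ex':
        return [A[1]] + tau(A[2])                     # rho, vec sigma
    raise ValueError(tag)

def is_harrop(A):
    # tau(A) = epsilon iff A is a Harrop formula
    return tau(A) == []
```

For example, τ(∀n ∃∗m atom) comes out as the single type nat → nat, and a formula with ∃∗ only in a premise is recognized as Harrop.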
For any judgement we now define its modified realizability interpretation, i.e. its trans-
lation in our ∃∗ –free language.
ε mr R(~t) := R(~t),
r1 , . . . , rn mr (A → B) := ∀~x.~x mr A → r1 ~x, . . . , rn~x mr B,
~r, ~s mr (A ∧ B) := ~r mr A ∧ ~s mr B,
r1 , . . . , rn mr ∀xρ B := ∀xρ r1 x, . . . , rn x mr B,
r, ~s mr ∃∗ xρ B := ~s mr B[r/x].
Note that for Harrop formulas A we have ε mr A ≡ A iff A does not contain ∃∗ .
Let us now set up a relation between the implicit and the explicit approach to deal
with the existential quantifier.
Definition. Assume that to any assumption variable u^B we have assigned a list ~x_u^{τ(B)} = x_{u1}^{ρ1}, . . . , x_{un}^{ρn} of distinct variables, where ρ1, . . . , ρn = τ(B). Relative to this assignment we define for any derivation d^A its extracted terms ets(d^A), by induction on d^A. If τ(A) = σ1, . . . , σk, then ets(d^A) will be a list r1^{σ1}, . . . , rk^{σk}.
ets(uA ) = ~xτu(A) ,
ets(λuA dB ) = λ~xτu(A) ets(d),
ets(dA→B eA ) = ets(d)ets(e),
ets(hdA , eB i) = ets(dA ), ets(eB ),
ets(π0 (dA∧B )) = the head of ets(dA∧B ) of same length as τ (A),
ets(π1 (dA∧B )) = the tail of ets(dA∧B ) of same length as τ (B),
ets(λxρ dA ) = λxρ ets(d),
ets(d^{∀x^ρ A} t^ρ) = ets(d)t,
ets(∃∗+_{x,A}) = λ~x λx λ~y. x, ~y,
ets(∃∗−_{x,A,B}) = λ~x λx λ~y λz1 . . . λzn. z1 x ~y, . . . , zn x ~y.
Note that if ets(d) = r1 , . . . , rk and ets(e) = ~s, then ets(d)ets(e) = r1~s, . . . , rk~s and
λ~x ets(d) = λ~x r1 , . . . , λ~x rk . In the last clause the (omitted) types are
x^ρ, (~y)^{τ(A)} and zj^{ρ→τ(A)→σj},
where τ (B) = σ1 , . . . , σn .
The following can be proved easily.
Lemma. FV(ets(d)) ⊆ FV(d) ∪ {~x_u^{τ(A)} : u^A ∈ FA(d)}.
Hence we can safely identify terms with the same βη∃∗ –normal forms.
µ(d): ets(d) mr A
Proof by induction on d.
i.e. of
∀~x^{~ρ}. ~x mr B → ets(d^A) mr A,
since (λ~x^{~ρ} ets(d^A)) ~x =β ets(d^A) and terms with the same β–normal form are identified.
Hence we can take
µ(λu^B d^A) := λ~x λũ^{~x mr B} µ(d).
Hence we have
µ(d)ets(e)µ(e): ets(d)ets(e) mr B
and since ets(de) = ets(d)ets(e) we can take µ(de) := µ(d)ets(e)µ(e).
Case hdA , eB i. By IH we have
µ(d): ets(d) mr A,
µ(e): ets(e) mr B,
hence
By the definition of modified realizability µ(d) proves a conjunction. Let ℓ be the length
of τ (A) and headℓ (ets(d)) the head of ets(d) of length ℓ. Then
Let us now extend these considerations to arithmetic. We allow constants for primitive
recursive functionals of arbitrary types (i.e. terms of Gödel’s T ), identifying terms with
the same normal form (w.r.t. the usual conversion rules for Gödel’s T ). It is assumed
that at least the ground types nat of natural numbers and boole of booleans are present.
We restrict ourselves to decidable atomic formulas; it is convenient to represent them by
boolean terms, i.e. in the form atom(t boole ) where atom is a distinguished relation symbol.
We could equally well take equations r = s with r, s terms of type nat as the only boolean
formulas. We do not need ⊥ as an extra atomic formula, since it can be defined by
⊥ := atom(false). Let us use n, m as variables of type nat and p, q as variables of type
boole. Our induction schemata are the universal closures of
We can use boolean induction (i.e. case analysis) to prove stability ¬¬atom(p) → atom(p)
and from this we can as before conclude the stability ¬¬A → A of formulas A built from
atoms by →, ∧ and ∀. As already remarked, strong disjunction ∨∗ can be defined by
means of the strong existential quantifier ∃∗ .
Let us now carry out this program. First we extend the notion of a term by adding
the clauses
are terms.
R_{nat,ρ} r s 0 →0 r,
R_{nat,ρ} r s (t + 1) →0 s t (R_{nat,ρ} r s t)
and
R_{boole,ρ} r s true →0 r,
R_{boole,ρ} r s false →0 s.
We now show that adding these conversion rules for R does not destroy the termination
of our reduction relation.
There are many treatments of this problem in the literature. Troelstra in [30], p. 107–
108 gives a proof which either uses König’s Lemma (or the fan theorem) or else the Church–
Rosser Theorem for →. A proof which works for closed terms only (but in a general setting
where bounded fixed point operators are allowed) is given in [26]. The short proof below
is due to Ulrich Berger; it works for arbitrary arithmetical terms and does not use the
Church–Rosser Theorem.
Theorem. → is terminating.
Proof. We extend the argument in §4, which was based on Tait’s strong computability
predicates. Let us write SC(r) to mean that r is strongly computable, and SN(r) to mean
that r is strongly normalizable. It remains to be shown that the constants Rnat,ρ and
Rboole,ρ are strongly computable. We restrict ourselves to Rnat,ρ ; for Rboole,ρ the argument
is similar (and simpler). So we have to show that for any terms r, s, t, ~t
since for ground types SC and SN coincide. So assume SC(r, s, t, ~t). We prove (1) by
induction on triples
(h(r, s, t), #t, h(~t))
ordered lexicographically, where #r is the length of r and h(r1 , . . . , rn ) is the sum of the
heights of the reductions trees for r1 , . . . , rn (assuming SN(r1 , . . . , rn )); here we have to
use König’s Lemma. It clearly suffices to show
If the reduction takes place within a subterm r, s, t, ~t the assigned triple gets smaller and the
claim follows by IH (here we need that strong computability is preserved under conversion
steps; cf. Lemma 2 in §4). Since the case t ≡ 0, t′ ≡ r~t is trivial it suffices to consider the
case t ≡ t0 + 1, t′ ≡ st0 (Rrst0 )~t. For all ~u with SC(~u) we have
SC(~u) =⇒ SN(Rrst0~u),
Since terms with the same normal form are identified, every closed term of type nat is identified with some S(S(. . . (S0) . . .)), and every
closed term of type boole is identified with either true or false. Such terms are denoted by
n and called numbers (even if they are of type boole).
Let us consider some examples of arithmetical terms. Addition n + m can be defined
easily by recursion on m; note that the parameter n remains fixed, since n + (Sm) =
S(n + m). Let
+ := λn, m.Rnat,nat n(λx, y.Sy)m.
Then we have
Similarly one can define the predecessor pred and the zero-test zero?: nat → boole. Equal-
ity n = m presents a slight problem, since in a definition by recursion on n the parameter
m has to be changed: we must define Sn = Sm by n = m. Therefore we represent
=: nat → nat → boole as the function which maps n to λm.n = m. More formally,
with
f := zero?,
g := λn, h, m.Rboole,boole false (h(pred m))(zero? m).
Then we have
Problem: Define a closed term f : nat → nat representing the Fibonacci sequence
f 0 = 1,
f 1 = 1,
f (n + 2) = f n + f (n + 1).
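One possible solution sketch, again in a Python model of the recursor: the recursion runs over the type nat × nat and keeps the pair (f n, f(n+1)); Python tuples model pairs, and the encoding is an illustrative assumption, not the official term language.

```python
# Python model of the recursion constant (nat as int, pairs as tuples).
def R_nat(r, s, t):
    # R r s 0 -> r ;  R r s (t+1) -> s t (R r s t)
    acc = r
    for i in range(t):
        acc = s(i)(acc)
    return acc

# fib := lambda n. pi_0 (R <1, 1> (lambda x, p. <pi_1 p, pi_0 p + pi_1 p>) n)
fib = lambda n: R_nat((1, 1),
                      lambda x: lambda p: (p[1], p[0] + p[1]),
                      n)[0]
```

The step function ignores the recursion counter x and only shifts the pair, so this is really a definition by simultaneous primitive recursion as discussed below.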
Ind_{n,A} ~r d e 0 →0 d,
Ind_{n,A} ~r d e (t + 1) →0 e t (Ind_{n,A} ~r d e t)
and
Ind_{p,A} ~r d e true →0 d,
Ind_{p,A} ~r d e false →0 e.
Again it can be shown by standard methods — just as for Gödel’s T , cf. [30] — that any
derivation term in arithmetic has a unique βη∃∗ R–normal form, where R refers to the
conversion rules above.
i.e.
Ri ~y f~0 = yi ,
Ri ~y f~(z + 1) = fi z(R1 ~y f~z) . . . (Rk ~y f~z);
here = denotes equality of βηR–normal forms. Using these equations we can then prove
the above claim easily (recall that terms with the same normal form are identified). The
operators R1 , . . . , Rn can be defined easily from the recursion constant Rnat,ρ1 ×...×ρk . We
could equally well have introduced them as constants and added the equations above as
conversion rules.
Boolean induction, i.e. case analysis is treated similarly. We let
ets(Indp,A ) := λ~x.R1 , . . . , Rk ,
where now R1 , . . . , Rk are simultaneous primitive recursion (or case splitting) operators of
type Ri: ~ρ → ~ρ → boole → ρi satisfying
Ri ~y~z true = yi ,
Ri ~y~z false = zi .
The lemmata in §6 stating that ets commutes with substitution and reduction remain valid
since the conversion rules for induction and recursion fit together.
The following remarks and definitions will be helpful later. Let us call a formula A
decidable if there is a term tA such that ⊢ A ↔ atom(tA ).
1. Every quantifier–free formula is decidable. First let ⊃:= λpλq.Rq true p and & :=
λpλq.R(R true false q) false p. Clearly
t_{atom(r)} := r,
t_{A→B} := ⊃ t_A t_B,
t_{A∧B} := & t_A t_B.
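The boolean terms of remark 1 can be tried out in the same Python model of R_boole; the tuple encoding of quantifier-free formulas is an illustrative assumption.

```python
# boole modelled as Python bool; R_boole is the if-then-else of the text.
def R_boole(r, s, t):
    # R r s true -> r ;  R r s false -> s
    return r if t else s

# imp := lambda p, q. R q true p ;  conj := lambda p, q. R (R true false q) false p
imp = lambda p: lambda q: R_boole(q, True, p)
conj = lambda p: lambda q: R_boole(R_boole(True, False, q), False, p)

def t_of(A):
    # the boolean term t_A with |- A <-> atom(t_A); formulas are
    # ('atom', b), ('imp', A, B), ('and', A, B) -- an illustrative encoding
    tag = A[0]
    if tag == 'atom':
        return A[1]
    if tag == 'imp':
        return imp(t_of(A[1]))(t_of(A[2]))
    if tag == 'and':
        return conj(t_of(A[1]))(t_of(A[2]))
    raise ValueError(tag)
```

Note that imp and conj agree with the classical truth tables, as they must for the equivalence A ↔ atom(t_A) to be provable.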
2. We can do case splitting according to decidable formulas A, i.e. for every formula
B[~x] we can prove
CasesA,B : (A → B) → (¬A → B) → B.
The derivation Cases_{A,B} is given by
λu1, u2. Ind ~x (λu3 λu4. u3 T) (λu5 λu6. u6 ¬F) t_A (λu7. u1(d1 u7)) (λu8. u2(d2 u8)),
where d1^{atom(t_A)→A} and d2^{¬atom(t_A)→¬A} are derivations which exist according to 1, and the
axioms and assumption variables with indices are (writing t for atom(t))
Ind_{p,(p→B)→(¬p→B)→B},
u1^{A→B}, u2^{¬A→B}, u3^{true→B}, u4^{¬true→B}, u5^{false→B}, u6^{¬false→B}, u7^{t_A} and u8^{¬t_A}.
The extracted terms of Cases_{A,B} are given by
where if := R (λ~y1, ~z1. ~y1) (λ~y1, ~z1. ~z1) and ~y, ~z, ~y1, ~z1 are lists of variables of types ~ρ := τ(B).
Clearly
if true ~r~s =βR ~r, if false ~r~s =βR ~s.
For better readability we use for if tA~r~s the notation
For every formula A let A∼ be obtained from A by relativizing all quantifiers ∀x^ρ and
∃∗x^ρ to x ∼ρ x.
(3) x ∼ρ x → y ∼ρ y → (x ∼ρ y ↔ (x =ρ y)∼ ).
Using (2) and (3) we can show that to every derivation d: A in E–HAω there exists a
derivation d∼ : A∼ in N –HAω with FA(d∼ ) = {B ∼ |B ∈ FA(d)} ∪ {x ∼ x|x ∈ FV(d)}. If d
is an extensionality axiom we use (3). If d is an ∀-elimination we use (2). All other cases
are easy. The conservativity result now follows since in N –HAω we can prove x ∼ρ x for
types ρ of level ≤ 1 and hence A ↔ A∼ if A contains quantifiers of type level ≤ 1 only.
The proof also shows that for universal quantifiers in negative position and existential
quantifiers in positive positions we can remove the restriction on types.
2. Howard has shown in [30], Appendix B that the formula
∀x, y, u∃z.xz = yz → ux = uy
with x, y: nat → nat and u: (nat → nat) → nat (which is equivalent to Extnat→nat,nat ) is
an example of an ∀∃-formula which is provable in E–HAω such that the corresponding
∀∃∗ -formula
∀x, y, u∃∗ z.xz = yz → ux = uy
is not provable in E–HAω (since the latter is equivalent to the Dialectica Interpretation of
Extnat→nat,nat this shows that the extensionality axioms are not Dialectica interpretable).
Therefore E–HAω is not closed under Markov’s rule, i.e. for quantifier free A we have
that E–HAω ⊢ ¬¬∃∗ xA (note that ⊢ ¬¬∃∗ xA ↔ ∃x A) does not imply E–HAω ⊢ ∃∗ xA in
general. In the next chapter we will show that N –HAω is closed under Markov’s rule.
Part II: Computational content of
proofs
8 A–translation with program extraction
and the direct method
First note that a classical proof of ∀x∃y A generally does not yield a program to compute
y from x. The reason for this is that there might be a universal quantifier ∀z right after
∃y, i.e. after ¬∀y¬, and this makes it possible that an assumption
∀y¬∀z B
is instantiated with a non–constant term containing critical variables which are bound
later by ∀z.
It is well known that this is not just a technical difficulty: if T denotes Kleene’s
T –predicate, then
∀n∃m∀k.T (n, n, k) → T (n, n, m)
is trivially provable even in minimal logic (with ∃m defined as ¬∀m¬, i.e. in classical logic),
but there is no computable function f satisfying
∀n, k. T(n, n, k) → T(n, n, f(n)),
for then ∃k T (n, n, k) would be decidable: it would be true if and only if T (n, n, f (n))
holds.
Hence in the rest of this section we will only consider formulas of the form ∀x∃y A
with A quantifier–free.
We first describe a “direct method” (cf. [23]) to extract the computational content from a
classical proof.
By a Π–formula we mean a formula built without the strong existential quantifier ∃∗,
which has no (universal) quantifier in premises of implications. For instance any Horn
formula ∀~x.P1 (~x) → · · · → Pn (~x) → Q(~x) is a Π–formula, but
is not. Clearly every Π–formula is equivalent (in minimal logic) to a conjunction of formulas
∀C where C is quantifier–free and without ∧. So from now on we will assume that Π–
formulas are of this form.
A derivation d is called a refutation of Π–assumptions if d derives a closed false atom
from assumptions FA(d) = {v1 : ∀C1 , . . . , vn : ∀Cn } where each Ci is quantifier–free.
Now let d be a refutation of Π–assumptions. We may assume FV(d) = ∅ (if not,
substitute arbitrary closed terms for the free variables in d). Next we can normalize d. Let
d↓ be the result. Again d↓ is a refutation of Π–assumptions. We then can read off from
d↓ a list |d| of closed terms called the “first instance” of d↓ (cf. [23]) such that one of the
Π–assumptions is false at |d|. To make this notion easier to understand let us restrict the
general situation slightly. A closed quantifier–free formula B is true respectively false if
tB normalizes to the boolean constant true respectively false. A closed Π–formula ∀~x C is
true iff for all closed terms ~t the formula C[~t/~x] is true.
Let d: ⊥ be a normal derivation with FV(d) = ∅ of ⊥ from assumptions
where ∀C1 , . . . , ∀Cn are true closed Π–formulas. We define a list |d| of closed terms, called
the first instance of d, such that B1 [|d|], . . . , Bm [|d|] are true. |d| is defined by induction
on d. Since d is normal and FV(d) = ∅ it does not contain axioms (except the truth axiom,
which is a closed Π–formula and hence may be assumed to be among the Π–assumptions
∀Ci ). To see this recall that the normal form of any closed term of type nat is of the form
S(S(S . . . (S0) . . .)) and of any closed term of type boole is either true or false; hence all
induction axioms unfold. Therefore d is of the form
w~sd1 . . . dk ,
where ~s are closed terms and d1 , . . . , dk are derivations of closed quantifier–free formulas.
We distinguish two cases.
1. d1 , . . . , dk derive only true formulas (which can be decided, since the formulas are
quantifier–free and closed). Then w cannot be one of the vi since all ∀Ci are true.
Hence d = u~sd1 . . . dk and the di derive Bi [~s]. So let |d| := ~s.
Hence from a proof d of ∀~x∃~y ∧i Bi[~x, ~y] from true Π–assumptions with Bi quantifier–
free we can obtain the following algorithm to compute for any ~r an ~s such that all Bi[~r, ~s]
hold. First instantiate d with ~r, i.e. form d~r: ∃~y ∧i Bi[~r, ~y]. Since ∃ is ¬∀¬, we have d~ru: ⊥
with u: ∀~y.B1[~r, ~y] → . . . → Bm[~r, ~y] → ⊥ a new Π–assumption (of a false formula!). Now
normalize d~ru. From its normal form (d~ru)↓, which is a refutation from Π–assumptions,
we can read off the first instance |(d~ru)↓| of (d~ru)↓. These are closed terms ~s such that all
Bi[~r, ~s] are true.
It might seem that instead of the method described, which chooses the branch to follow
by checking whether some quantifier–free formulas are false or true, one could alternatively
look for an occurrence of the false Π–assumption u in the proof whose arguments do not
contain u any more. However, in our general case where Π–assumptions (and not just
Horn formulas) are allowed these arguments may contain free assumption variables bound
later (by →+ ) in the proof, and so we cannot conclude that all arguments of u derive true
formulas. If, however, we restrict attention to the special case where we only allow Horn–
formulas as assumptions, then this phenomenon cannot happen and we have a variant of
the direct method.
Some comments are to be made here.
using ex–falso–quodlibet. If we do this preparatory step first, then the variant of the
direct method can be applied to the general case of Π–assumptions as well.
8.3 A–translation
We now describe Friedman’s A–translation from [10]. Let A be an arbitrary but fixed
formula. The A–translation B A of a formula B is obtained by replacing any atomic sub-
formula P of B by (P → A) → A.
Note that any derivation d of some formula B from assumptions C1, . . . , Cn becomes
after the A–translation a derivation of B^A from C1^A, . . . , Cn^A. To see this recall that our
logical rules are those of minimal logic and hence give no extra treatment to falsity. Also
the axiom schemes (except the truth axiom, which can be viewed as a Π–assumption) remain
instances of the same axiom scheme after the A–translation. E.g. boolean induction
B[true/p] → B[false/p] → ∀p B
is translated into
B A [true/p] → B A [false/p] → ∀p B A ,
which again is an instance of boolean induction.
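The translation B ↦ B^A is a simple structural recursion over formulas, sketched here in Python on a tuple encoding; the encoding is an illustrative assumption, not the official syntax.

```python
# Friedman's A-translation: replace every atomic subformula P by
# (P -> A) -> A, leaving the logical structure untouched.
# Formulas (an assumed encoding): ('atom', name), ('imp', B, C),
# ('and', B, C), ('all', x, B).

def a_translate(B, A):
    tag = B[0]
    if tag == 'atom':
        return ('imp', ('imp', B, A), A)   # P  becomes  (P -> A) -> A
    if tag in ('imp', 'and'):
        return (tag, a_translate(B[1], A), a_translate(B[2], A))
    if tag == 'all':
        return ('all', B[1], a_translate(B[2], A))
    raise ValueError(tag)
```

Since the recursion only touches atoms, the translation of an implication is the implication of the translations, which is exactly why instances of axiom schemes remain instances of the same schemes.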
Let us look at what happens with Π–assumptions under the A–translation. As in §8.2
we may assume that all formulas considered do not contain ∧.
So assume
ũ: ~B → R,
ṽi: Bi^A,
w: R → A.
We must show A.
Case ui−: ¬Bi for some i. Let Bi ≡ ~Ci → Pi with atoms Pi. Then
ṽi: ~Ci^A → (Pi → A) → A
and we have
eij[ui−] :≡ Stab_{Cij} λv^{¬Cij}. ui− λ~u^{~Ci}. Efq_{Pi}(v uj): Cij,
ei[ui−] :≡ λwi^{Pi}. ui− λ~u^{~Ci} wi: ¬Pi.
By IH we have d_{Cij}: Cij^A → Cij. Hence
ṽi (d_{Ci1} ei1) . . . (d_{Cini} eini) (λwi^{Pi}. Efq_A(ei wi)): A.
Case ui+: Bi for all i. Then
w(ũ u1+ . . . um+): A
...
if ¬Bm then ~xm dets_{Cm1} . . . dets_{Cmnm} ~0 else
~z fi . . . fi,
where ~xi, ~z are the lists of variables associated with ṽi: Bi^A, w: R → A.
Here we have used case splitting according to the quantifier free formulas Bi which is
admissible by the remark at the end of §7.
If we want to use the A–translation to extract the computational content from a
classical proof we have to choose a particular A involving the strong existential quantifier.
Lemma 2. Let Bi[~x, ~y] be quantifier–free formulas and A[~x] := ∃∗~y ∧i Bi[~x, ~y]. Then we
can find a derivation of (∀~y.B1[~x, ~y] → . . . → Bm[~x, ~y] → ⊥)^{A[~x]}.
Case ui−: ¬Bi for some i. Let Bi ≡ ~Ci → Pi with atoms Pi. Then
ṽi: ~Ci^A → (Pi → A) → A
and we have
eij[ui−] :≡ Stab_{Cij} λv^{¬Cij}. ui− λ~u^{~Ci}. Efq_{Pi}(v uj): Cij,
ei[ui−] :≡ λwi^{Pi}. ui− λ~u^{~Ci} wi: ¬Pi.
Using d_{Cij}: Cij^A → Cij from Lemma 1 we obtain
ṽi (d_{Ci1} ei1) . . . (d_{Cini} eini) (λwi^{Pi}. Efq_A(ei wi)): A.
Case ui+: Bi for all i. Then
∃+ ~y ⟨u1+, . . . , um+⟩: A
d^{A[~x]}[u′: (∀~y.B1[~x, ~y] → . . . → Bm[~x, ~y] → ⊥)^{A[~x]}, v1′: (∀C1)^{A[~x]}, . . . , vn′: (∀Cn)^{A[~x]}]
: (⊥ → A[~x]) → A[~x].
By Lemma 2 (now using the particular choice of A[~x]) the A[~x]–translation of the assump-
tion u is provable without assumptions:
where
dtr ≡ d^{A[~x]}[du, dv1, . . . , dvn] Efq_{A[~x]}.
Having obtained a proof dtr of an existential formula ∃∗~y ∧i Bi[~x, ~y] we can then apply
the general method of extracting terms to this proof. It yields
(dtr)ets ≡ (d^{A[~x]})ets[dets_u, dets_{v1}, . . . , dets_{vn}] ~0, (1)
1. The most straightforward way is to prove the formula without any assumptions. This
means that we are not allowed to use lemmata, and hence that such a proof tends to
be rather long, and difficult to produce.
2. The next straightforward way is to pack all Π–Lemmata used in the proof (and
proved explicitly in 1 above) into purely generalized atoms ∀~x atom(t). However,
this means that we have to introduce rather complex boolean terms, which will later
show up in the proof as tests for case distinctions.
8.4 Comparison
We now prove that the value of the extracted terms (dtr)ets when instantiated with a list
~r of closed terms is in fact the same as the result of the direct method described in §8.2.
So consider again the situation of Friedman’s Theorem, i.e. a derivation
d[u: ∀~y .B1 [~x, ~y ] → . . . → Bm [~x, ~y] → ⊥, v1 : ∀C1 , . . . , vn : ∀Cn ]: ⊥
with Bi , Cj quantifier–free. We just observed that the program (dtr )ets extracted from the
translated derivation has the form (1) above. Let us try to understand how this program
works. First, (dA[~x] )ets closely follows the structure of d. The reason is that dA[~x] differs
from d only with respect to the formulas affixed, and when forming the extracted terms
this affects only the types and the arities of the lists of object variables associated with
assumption variables.
In order to comprehend dets_{vi} and dets_u let us have a second look at the proofs of Lemma 1
and 2. First note that dvi[vi: ∀Ci]: (∀Ci)^{A[~x]} is obtained from di: Ci^{A[~x]} → Ci constructed
in the proof of Lemma 1 by
dvi ≡ λ~yi. di(vi ~yi).
Since vi has type ∀Ci, which is a Harrop formula, we have dets_{vi} ≡ λ~yi dets_i. Now from the
proof of Lemma 1 we obtain
dets_i ≡ λ~x1, . . . , ~xm, ~z. if ¬B1 then ~x1 dets_{C11} . . . dets_{C1n1} ~0 else
. . .
if ¬Bm then ~xm dets_{Cm1} . . . dets_{Cmnm} ~0 else
~z fi . . . fi, (2)
where Ci ≡ B1 → . . . → Bm → R with Bi ≡ ~Ci → Pi and ~xi, ~z are the lists of variables
associated with ṽi, w. Furthermore, dets_{Cij} are the extracted terms of derivations d_{Cij}: Cij^{A[~x]} →
Cij constructed by previous applications of Lemma 1. Similarly
dets_u ≡ λ~y, ~x1, . . . , ~xm, ~z. if ¬B1 then ~x1 dets_{C11} . . . dets_{C1n1} ~0 else
. . .
if ¬Bm then ~xm dets_{Cm1} . . . dets_{Cmnm} ~0 else
~y fi . . . fi, (3)
where Bi ≡ ~Ci → Pi and ~xi, ~z are the lists of variables associated with ṽi, w. Furthermore,
dets_{Cij} are the extracted terms of derivations d_{Cij}: Cij^{A[~x]} → Cij constructed by previous
applications of Lemma 1.
This analysis makes it possible to prove that the value of the extracted terms when
instantiated with a list ~r of closed terms is in fact the same as the result of the direct method
described in §8.2 to read off the first instance provided by the instantiated derivation
d[ū: ∀~y.B1[~r, ~y] → . . . → Bm[~r, ~y] → ⊥, v1: ∀C1, . . . , vn: ∀Cn]: ⊥.
Claim. For any normal derivation e of ⊥ from these assumptions we have
|e| = [[(e^{A[~r]})ets[dets_u[~r/~x], dets_{v1}, . . . , dets_{vn}] ~0]].
We then obtain that the instantiation of the extracted terms (1) with ~r for ~x, i.e.
(d^{A[~x]})ets[dets_u, dets_{v1}, . . . , dets_{vn}] ~0 [~r/~x] ≡ (d[~r/~x]^{A[~r]})ets[dets_u[~r/~x], dets_{v1}, . . . , dets_{vn}] ~0
has as its value the list of closed terms which is the first instance of the instantiated
derivation d[~r/~x], i.e. |d[~r/~x]↓|. For by the claim we have
|d[~r/~x]↓| = [[((d[~r/~x]↓)^{A[~r]})ets[dets_u[~r/~x], dets_{v1}, . . . , dets_{vn}] ~0]]
= [[((d[~r/~x]^{A[~r]})ets)↓[dets_u[~r/~x], dets_{v1}, . . . , dets_{vn}] ~0]]
= [[(d[~r/~x]^{A[~r]})ets[dets_u[~r/~x], dets_{v1}, . . . , dets_{vn}] ~0]],
since normalization (i.e. βη∃∗R–conversion) commutes with A[~r]–translation and the for-
mation of extracted terms.
It remains to prove the claim. We use induction on e. Since e is normal, it must be
of the form e = w~s e1 . . . ek with w ∈ {ū, v1, . . . , vn}.
Case 1. e1, . . . , ek derive only true formulas. Then w = ū, k = m and the ei derive
Bi[~r, ~s]. By definition |e| := ~s. Furthermore
(e^{A[~r]})ets[dets_u[~r/~x], dets_{v1}, . . . , dets_{vn}] ~0
≡ dets_u[~r/~x] ~s (e1^{A[~r]})ets[dets_u[~r/~x], dets_{v1}, . . . , dets_{vn}] . . . (em^{A[~r]})ets[dets_u[~r/~x], dets_{v1}, . . . , dets_{vn}] ~0
=βR ~s.
Case 2. There is a minimal i such that ei derives a false formula, Di1[~s] → · · · →
Dini[~s] → ⊥ say. Then Di1[~s], . . . , Dini[~s] are true. Without loss of generality we may
assume that ei = λw1^{Di1[~s]} . . . λwni^{Dini[~s]} f where f: ⊥ contains assumptions among
(e^{A[~r]})ets[dets_u[~r], dets_{v1}, . . . , dets_{vn}] ~0
≡ dets_u[~r] ~s (e1^{A[~r]})ets[dets_u[~r], dets_{v1}, . . . , dets_{vn}] . . . (ek^{A[~r]})ets[dets_u[~r], dets_{v1}, . . . , dets_{vn}] ~0 if w = ū,
≡ dets_{vi} ~s (e1^{A[~r]})ets[dets_u[~r], dets_{v1}, . . . , dets_{vn}] . . . (ek^{A[~r]})ets[dets_u[~r], dets_{v1}, . . . , dets_{vn}] ~0 if w = vi,
=βR (ei^{A[~r]})ets[dets_u[~r], dets_{v1}, . . . , dets_{vn}] dets_{w1} . . . dets_{wni} ~0 by (3) and (2), respectively
=β (f^{A[~r]})ets[dets_u[~r], dets_{v1}, . . . , dets_{vn}, dets_{w1}, . . . , dets_{wni}] ~0,
In applications it will be important to produce extracted terms with as few case distinctions
as possible, and also to make the case distinctions over boolean terms that are as simple
as possible. The following example will show that such improvements are indeed necessary.
Let f : nat → nat be an unbounded function with f (0) = 0. Then we can prove
If e.g. f(m) = m^2, then this formula expresses the existence of an integer square root
m := [√n] for any n. More formally we can prove
Here < : nat → nat → boole is the characteristic function of the natural ordering of the
natural numbers and r < s denotes atom(< r s). We expressed f(m) ≤ n by ¬n < f(m) and
f (0) = 0 by ∀n ¬n < f (0) to keep the formal proof as simple as possible. In order to
have Π–assumptions we had to express the unboundedness of f by a witnessing function
g. First note that the Π–assumptions v1 and v2 are not closed as required previously. But
this is no problem since A–translation and program extraction clearly also work in this
case: we just get a program containing f and g as parameters.
Now let us prove (1). Let n be given and assume
We have to show ⊥. From v1 and u we inductively get ∀m ¬n < f (m). For m := g(n) this
yields a contradiction to v2 .
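The computational content of this informal argument is a bounded linear search, which we can sketch in Python; the names root, sq and g below are illustrative, and this is only a reading of the proof, not the literal extracted term.

```python
# Starting from m = 0 (where f(0) = 0 <= n), search for the first m with
# n < f(m+1); unboundedness, witnessed by g, guarantees termination.
def root(f, g, n):
    m = 0
    while not n < f(m + 1):   # invariant: f(m) <= n
        m += 1
        assert m <= g(n)      # the witness g bounds the search
    return m                  # f(m) <= n < f(m+1)

# With f(m) = m^2 and g(n) = n + 1 (so that f(g(n)) > n), root computes
# the integer square root [sqrt(n)].
sq = lambda m: m * m
g = lambda n: n + 1
```

For instance root(sq, g, 10) finds the first m with 10 < (m+1)^2, i.e. the integer square root of 10.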
The derivation term corresponding to this proof is
Now let
A := ∃∗ m.¬n < f (m) ∧ n < f (m + 1).
The program extracted from d is
(d^A)ets is the same as d except that Ind_{m,¬n<f(m)} has to be replaced by
λn R_{nat,(nat→nat)→nat→nat}.
dets_u ≡ λm, x1, x2, k. if n < f(m) then x1 id 0 else
if ¬n < f(m + 1) then x2 0 else
m fi fi,
dets_{v1} ≡ λn, x1, k. if ¬n < f(0) then x1 0 else
k fi,
dets_{v2} ≡ λn, k. k.
Informally, (dtr)ets = H(g(n), λk.k, 0) where H: nat → (nat → nat) → nat → nat is such
that
This program is correct, but it is unnecessarily complicated. We will now describe a refined
A–translation which will simplify the type of the auxiliary function H as well as its if–
then–else structure. The type reduction will be achieved by not replacing all atoms P
by (P → A) → A. This of course requires that atom is not the only relation symbol.
Furthermore, to reduce case splitting we will construct “better” proofs of C → C A for
quantifier–free C without ∧.
We first extend our arithmetical system by introducing relation symbols different
from atom, and also reintroducing a special symbol ⊥ for falsity. The reason for the latter
addition is that otherwise e.g. the boolean induction axiom
atom(true) → atom(false) → ∀p atom(p)
would translate into
atom(true) → A → ∀p atom(p),
which certainly is not derivable. Formulas are now built from atomic formulas ⊥, atom(t)
and possibly other atoms P (~t) by means of →, ∧, ∀ and ∃∗ .
We also extend the notion of a derivation term by adding the following clauses.
¬F: ¬atom(false) is a derivation term with FA(¬F) := FV(¬F) := ∅, and Efq_false: ⊥ →
atom(false) is a derivation term with FA(Efq_false) := FV(Efq_false) := ∅. Note that by
Efqfalse and the falsity axiom ¬F: ¬atom(false) we have
⊢ atom(false) ↔ ⊥.
Stabatom : ∀p.¬¬atom(p) → atom(p) can again be proved easily by boolean induction, using
the truth axiom in the case true, and the falsity axiom ¬F and Efqfalse in the case false.
As before we can conclude ⊢ ¬¬A → A for formulas A built with ⊥, →, ∧ and ∀ and
containing atom as the only relation symbol. ⊥ → A is derivable for any formula A
containing atom as the only relation symbol.
Let us call a relation symbol P decidable if there is a term tP such that ⊢ ∀~x.P (~x) ↔
atom(tP ~x). Again a formula A is called decidable if there is a term tA such that ⊢
A ↔ atom(tA ). As before every quantifier–free formula containing only decidable relation
symbols is decidable, and we can do case splitting according to decidable formulas A.
Now let L be a set of formulas. In our applications L will consist of the quantifier–free
kernels of the lemmata ∀Ci, and in addition of the formula B1[~x, ~y] → … → Bm[~x, ~y] → ⊥,
if our goal formula is ∀~x ∃~y ⋀_i Bi[~x, ~y].
The set of L–critical relation symbols is the smallest set satisfying the following con-
dition.
If (~C1 → P1) → … → (~Cm → Pm) → R(~t) is a positive subformula of an L–formula,
and if for some i, Pi ≡ ⊥ or Pi ≡ Q(~s) for some L–critical relation symbol Q, then R
is L–critical.
We will write P ∈ CL if the atom P is of the form R(~t) for some L–critical R or P ≡ ⊥.
A quantifier–free formula B1 → . . . → Bm → R will be called L–relevant if R ∈ CL .
Lemma 1∗ . For any B ∈ Neg(L) and any C ∈ Pos(L) we can find the following deriva-
tions.
(~B → R) → ~B^A → (R → A) → A.
So assume
u: B~ → R,
vi : BiA ,
w: R → A.
We must show A.
Let Bi for i ∈ {1, . . . , k} be relevant and Bj for j ∈ {k + 1, . . . , m} be irrelevant.
Assume ui : Bi . Note that by IH(iii) fBj : BjA → Bj and hence fBj vj : Bj . From u: B1 →
. . . → Bk → Bk+1 → . . . → Bm → R and w: R → A we get A. Now cancel uk : Bk ,
yielding Bk → A. Using the IH gBk : BkA → (Bk → A) → A and the assumption BkA we
get A. Repeating this procedure we finally cancel u1 : B1 , yielding B1 → A. Using the IH
gB1 : B1A → (B1 → A) → A and the assumption B1A we get A, as required. The derivation
term is
d_C ≡ λu, v1, …, vm, w. g_B1 v1 (λu1^{B1}. ⋯ g_Bk vk (λuk^{Bk}. w (u u1 … uk (f_Bk+1 vk+1) … (f_Bm vm))) ⋯)
and we get
d_C^ets ≡ λ~x1, …, ~xk, ~z. g_B1^ets ~x1 (g_B2^ets ~x2 … (g_Bk^ets ~xk ~z) …),
Figure 1: from u1: ¬R and u2: R we obtain ⊥, hence R → D via ⊥ → D; applying
u: (R → D) → A yields A, and cancelling u1 gives ¬R → A; together with v: R → A
and the case–distinction axiom (R → A) → (¬R → A) → A this yields A.
(~B → R) → ~B^A → R.

So assume u: ~B → R and ~B^A. Using the IH f_Bi: Bi^A → Bi we obtain ~B and hence R. In
this case d_C^ets ≡ ε.
(ii) Case R with R ∈ CL, R ≢ ⊥. Then R^A ≡ (R → A) → A, and we must derive

((R → D) → A) → (R → A) → A,

and we get

e_{R,D}^ets ≡ λ~x, ~y. if R then ~y else ~x fi.
In case R ≡ ⊥ we must derive

((⊥ → D) → A) → A.

But this clearly can be done using Efq_D, and we have e_{⊥,D}^ets ≡ λ~x. ~x.
Figure 2: from u3: B → C, u2: B and u1: C → D we obtain D, hence (B → C) → D;
applying u: ((B → C) → D) → A yields A, and cancelling u2 gives B → A; together
with v: B^A and B^A → (B → A) → A this again yields A, hence (C → D) → A.
[((B → C) → D) → A] → B^A → C^A

e_{C,D}: ((C → D) → A) → C^A.

λu, v. e_{C,D} (λu1. g_B v (λu2. u (λu3. u1 (u3 u2))))   or   λu, v. e_{C,D} (λu1. u (λu3. u1 (u3 (f_B v))))
((R → A) → A) → (R → A) → A,

which is trivial. We have g_R^ets ≡ λ~x. ~x.
Case R with R ≡ ⊥. Then R^A ≡ A, and we must derive

A → (⊥ → A) → A,

which again is trivial. We have g_⊥^ets ≡ λ~x, ~y. ~x.
Case C → B. By assumption B is relevant. We must derive
(C A → B A ) → ((C → B) → A) → A.
So assume u: C A → B A and v: (C → B) → A.
We first consider the case where C is relevant. Then we have e_{C,B}: ((C → B) → A) →
C^A by IH(ii), hence B^A (using v and u). By IH(iv) for the shorter formula B we know
g_B: B^A → (B → A) → A, hence (B → A) → A; since B → A is easily derived from v, we
obtain A, and we get

g_{C→B}^ets ≡ λ~x, ~y. g_B^ets (~x (e_{C,B}^ets ~y)) ~y.
We finally consider the case where C is irrelevant. The derivation now uses CasesC ;
so it suffices to first derive A under the additional hypothesis C, and then derive A under
the additional hypothesis ¬C.
So assume u+ : C. Since by IH(i) dC : C → C A we obtain C A and hence B A (using
u: C A → B A ). By IH(iv) for the shorter formula B we know gB : B A → (B → A) → A
and hence (B → A) → A. But B → A can easily be derived from our hypothesis v: (C →
B) → A and hence we obtain A, as required.
Now assume u− : ¬C. Then by ex–falso–quodlibet we obtain C → B and hence A,
using our hypothesis v: (C → B) → A.
The derivation term is
and we get
g_{C→B}^ets ≡ λ~x, ~y. if C then g_B^ets (~x d_C^ets) ~y else ~y fi.
involving Cases_Bi and d_Cij: Cij → Cij^A for relevant Bi ≡ ~Ci → Pi, and f_Bj: Bj^A → Bj for
irrelevant Bj.
Proof. Let ~y be given and assume v1: B1^A, …, vm: Bm^A. We must show A.
For relevant Bi ≡ ~Ci → Pi this means

vi: ~Ci^A → (Pi → A) → A

in case Pi ≢ ⊥, and

vi: ~Ci^A → A

in case Pi ≡ ⊥. We have
e_ij[ui−] :≡ Stab_Cij (λv^{¬Cij}. ui− (λ~u^{~Ci}. Efq_Pi (v uj))): Cij,
e_i[ui−] :≡ λwi^{Pi}. ui− (λ~u^{~Ci}. wi): ¬Pi.
Using d_Cij: Cij → Cij^A from Lemma 1∗ we obtain in case Pi ≢ ⊥

and in case Pi ≡ ⊥

vi (d_Ci1 e_i1) … (d_Cini e_ini): A.
Case ui+: Bi for all i ∈ {1, …, k}. Then

∃+ ~y ⟨u1+, …, uk+, f_Bk+1 vk+1, …, f_Bm vm⟩: A.

For ⋀_i Bi → A direct case distinctions are only necessary for the relevant Bi; for the other
Bj we have a derivation of Bj^A → Bj by Lemma 1∗.
Let us now come back to our initial example and study the effect of our refinements
there. We can relativize the A–translation to the set L of formulas consisting of
n < f (0) → ⊥,
n < f (g(n)),
(n < f (m) → ⊥) → n < f (m + 1) → ⊥.
(d^A)^ets is the same as d except that Ind_{m,¬n<f(m)} has to be replaced by λn R^{nat,nat}
(since τ((n < f(m) → ⊥)^A) = τ(n < f(m) → A) = nat), and the assumption variables
u, v1 in d have to be replaced by (unary lists of) object variables x_u, x_v1,
whereas v2 has to be replaced by the empty list. The subprograms d_u^ets, d_v1^ets (of the same
types as x_u, x_v1) are given by (cf. Lemmata 2∗ and 1∗)

d_u^ets ≡ λm, x. if n < f(m) then x else m fi,
d_v1^ets ≡ λn.0.
h(0) = 0,
h(m + 1) = if n < f (m) then h(m) else m fi.
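These recursion equations can be run directly. The following is a minimal Python sketch (my own rendering, not part of the notes), with f and g as parameters as discussed above; the instances f(m) = m and g(n) = n + 1 used below are hypothetical choices satisfying f(0) = 0 and n < f(g(n)):

```python
def extracted_search(f, g, n):
    """Run the extracted program: h(0) = 0,
    h(m+1) = (h(m) if n < f(m) else m), returning h(g(n)).

    Given f(0) = 0 and a witness g with n < f(g(n)), the result m
    satisfies f(m) <= n < f(m+1)."""
    acc = 0                    # h(0) = 0
    for i in range(g(n)):      # unfold h(1), ..., h(g(n))
        acc = acc if n < f(i) else i
    return acc

# Hypothetical instances (not from the text): f(m) = m is unbounded
# with f(0) = 0, and g(n) = n + 1 witnesses n < f(g(n)).
f = lambda m: m
g = lambda n: n + 1
assert extracted_search(f, g, 5) == 5   # f(5) = 5 <= 5 < 6 = f(6)
```

The loop keeps the last i at which f(i) ≤ n, so for every m′ between the result and g(n) we have n < f(m′), which is exactly the witnessing property.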
C[~y] ≡ (~C1 → P1) → … → (~Cm → Pm) → Q1 → … → Qn → R

with atoms Pi, Qj, R. Assume that we have derivations d_Cik: Cik → Cik^A with extracted
terms d_Cik^ets. Show that there exists a derivation of ∀~y. C → C^A whose extracted termlist
has the form

if ¬(~C1 → P1) then ~x1 d_C11^ets … d_C1n1^ets ~0 else
…
if ¬(~Cm → Pm) then ~xm d_Cm1^ets … d_Cmnm^ets ~0 else
~x1′ (~x2′ … (~xn′ ~z) …) fi … fi.
Solution. Assume
u: (~C1 → P1) → … → (~Cm → Pm) → Q1 → … → Qn → R,
vi: ~Ci^A → (Pi → A) → A,
vj′: (Qj → A) → A,
w: R → A.

Assume further d_Cik: Cik → Cik^A.
Case ~ui: ~Ci, ui′: ¬Pi for some i. Then

Case u1″: ~C1 → P1, …, um″: ~Cm → Pm. Then we have u u1″ … um″: Q1 → … → Qn → R
and hence

v1′ (λu1^{Q1}. v2′ (λu2^{Q2}. … vn′ (λun^{Qn}. w (u u1″ … um″ u1 … un)) …)): A.
For the extracted terms let ~xi be the list of variables associated with vi , ~x′j be the list of
variables associated with vj′ and ~z be the list of variables associated with w. Then clearly
the extracted termlist is as given in the statement of the lemma.
Part III: Classifying arithmetical
proofs
10 Ordinals below epsilon zero
We want to discuss the derivability and underivability of initial cases of transfinite in-
duction in arithmetical systems. In order to do that we shall need some knowledge and
notations for ordinals. We do not want to assume set theory here; hence we introduce
a certain initial segment of the ordinals (the ordinals < ε0) in a formal, combinatorial way,
i.e. via ordinal notations. Our treatment is based on the Cantor normal form for ordi-
nals; cf. [1]. We also introduce some elementary relations and operations for such ordinal
notations, which will be used later.
We define the two notions

• α is an ordinal notation,
• α < β for ordinal notations α, β

simultaneously by induction: 0 is an ordinal notation, and if αm, …, α0 are ordinal
notations with αm ≥ ⋯ ≥ α0, then ω^αm + ⋯ + ω^α0 is an ordinal notation. Furthermore

ω^αm + ⋯ + ω^α0 < ω^βn + ⋯ + ω^β0

iff there is an i ≥ 0 such that αm−i < βn−i, αm−i+1 = βn−i+1, …, αm = βn, or else
m < n and αm = βn, …, α0 = βn−m.
It is easy to see (by induction on the levels in the inductive definition) that < is a linear
order with 0 being the smallest element.
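The order just defined can be checked on a concrete encoding. In the following Python sketch (the encoding is mine, not from the text) a notation is represented as the tuple of its exponents' notations in weakly decreasing order, with 0 as the empty tuple; the comparison proceeds from the largest exponent downwards, a proper initial segment counting as smaller, exactly as in the definition above:

```python
def cmp_ord(a, b):
    """Compare two ordinal notations.

    A notation is the tuple of its exponents' notations, listed in
    weakly decreasing order; 0 is the empty tuple.  Returns -1, 0
    or 1."""
    for x, y in zip(a, b):
        c = cmp_ord(x, y)
        if c != 0:
            return c
    return (len(a) > len(b)) - (len(a) < len(b))

ZERO = ()            # 0
ONE = (ZERO,)        # omega^0 = 1
TWO = (ZERO, ZERO)   # omega^0 + omega^0 = 2
OMEGA = (ONE,)       # omega^1

assert cmp_ord(ZERO, ONE) == -1    # 0 < 1
assert cmp_ord(TWO, OMEGA) == -1   # 2 < omega
assert cmp_ord(OMEGA, OMEGA) == 0
```

On any finite set of such notations trichotomy and transitivity can then be verified exhaustively, illustrating the linearity claim.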
We shall use the notation 1 for ω^0, a for ω^0 + ⋯ + ω^0 with a copies of ω^0, and ω^α · a
for ω^α + ⋯ + ω^α, again with a copies of ω^α.
We now define addition for ordinal notations:

ω^αm + ⋯ + ω^α0 + ω^βn + ⋯ + ω^β0 := ω^αm + ⋯ + ω^αi + ω^βn + ⋯ + ω^β0,

where αi is the least of the exponents αm, …, α0 with αi ≥ βn (the terms with smaller
exponents are absorbed); if there is no such i, the sum is just ω^βn + ⋯ + ω^β0.
The natural sum is defined by

(ω^αm + ⋯ + ω^α0) # (ω^βn + ⋯ + ω^β0) := ω^γ_{m+n+1} + ⋯ + ω^γ_0,

where γ_{m+n+1}, …, γ_0 is a weakly decreasing rearrangement of αm, …, α0, βn, …, β0.
Furthermore, for any γ < β + ω^α we can find a δ < α and an a such that

γ < β + ω^δ · a.
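Both operations are short to sketch on a concrete encoding (again my own representation, not from the text: a notation is the weakly decreasing tuple of its exponents' notations, 0 the empty tuple). Ordinary addition absorbs the exponents of the left argument lying below the leading exponent of the right one, while the natural sum merges and re-sorts all exponents:

```python
import functools

def cmp_ord(a, b):
    # lexicographic from the largest exponent; a proper prefix is smaller
    for x, y in zip(a, b):
        c = cmp_ord(x, y)
        if c != 0:
            return c
    return (len(a) > len(b)) - (len(a) < len(b))

def add(a, b):
    """Ordinal addition: exponents of a below the leading exponent
    of b are absorbed."""
    if not b:
        return a
    keep = tuple(e for e in a if cmp_ord(e, b[0]) >= 0)
    return keep + b

def nat_sum(a, b):
    """Natural sum #: merge all exponents, re-sorted into weakly
    decreasing order."""
    key = functools.cmp_to_key(cmp_ord)
    return tuple(sorted(a + b, key=key, reverse=True))

ZERO, ONE = (), ((),)
OMEGA = (ONE,)
assert add(ONE, OMEGA) == OMEGA                  # 1 + omega = omega
assert add(OMEGA, ONE) == (ONE, ZERO)            # omega + 1
assert nat_sum(ONE, OMEGA) == add(OMEGA, ONE)    # 1 # omega = omega + 1
```

The familiar non–commutativity of ordinal addition (1 + ω = ω but ω + 1 > ω) and the symmetry of # are visible directly in the assertions.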
For any ordinal notation α we define its Gödel number |α| inductively by

|0| := 0,
|ω^αm · am + ⋯ + ω^α0 · a0| := (∏_{i≤m} p_{|αi|}^{ai}) − 1,

where p_j denotes the j–th prime number (p_0 = 2).
For any nonnegative integer x we define its corresponding ordinal notation o(x) inductively
by

o(0) = 0,
o((∏_{i≤m} p_i^{ai}) − 1) = ∑_{i≤m} ω^{o(i)} · ai,

the summands being arranged so that the exponents are weakly decreasing.
Lemma.
1. o(|α|) = α,
2. |o(x)| = x.
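The two maps and the lemma can be tested mechanically. A Python sketch under the same tuple encoding of notations (my own representation, not from the notes): gnum computes |α| by turning every occurrence of an exponent e into one factor p_{|e|} (so coefficients become prime–power exponents), and o factors x + 1 back into such exponents:

```python
import functools

_primes = [2]
def nth_prime(i):
    k = _primes[-1]
    while len(_primes) <= i:
        k += 1
        if all(k % p for p in _primes):
            _primes.append(k)
    return _primes[i]

def cmp_ord(a, b):
    # order on notations: tuples of exponent notations, largest first
    for x, y in zip(a, b):
        c = cmp_ord(x, y)
        if c != 0:
            return c
    return (len(a) > len(b)) - (len(a) < len(b))

def gnum(a):
    """|alpha|: each exponent occurrence e contributes a factor p_{|e|}."""
    if not a:
        return 0
    prod = 1
    for e in a:
        prod *= nth_prime(gnum(e))
    return prod - 1

def o(x):
    """Inverse map: factor x+1, turn each prime p_i into one exponent
    o(i), and sort the exponents into weakly decreasing order."""
    if x == 0:
        return ()
    terms, y, i = [], x + 1, 0
    while y > 1:
        p = nth_prime(i)
        while y % p == 0:
            terms.append(o(i))
            y //= p
        i += 1
    key = functools.cmp_to_key(cmp_ord)
    return tuple(sorted(terms, key=key, reverse=True))

assert all(gnum(o(x)) == x for x in range(200))   # part 2 of the Lemma
assert o(2) == (((),),)                           # |omega| = 2
```

Running the first assertion over an initial segment of the integers checks |o(x)| = x; o(|α|) = α then follows for every notation in the range of o.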
11 Provability of initial cases of transfinite induction

We now derive initial cases of the principle of transfinite induction in arithmetic, i.e. of

[∀x.(∀y.y ≺ x → P(y)) → P(x)] → ∀x.x ≺ a → P(x)

for some number a and a predicate symbol P. In §13 we will see that our results here
are optimal in the sense that for larger segments of the ordinals transfinite induction is
underivable. All these results are due to Gentzen [12].
Our arithmetical systems are based on a fixed (possibly countably infinite) supply of
function constants and predicate constants which are assumed to denote fixed functions and
predicates on the nonnegative integers for which a computation procedure is known. An
example is the formal system of arithmetic described in §7. Among the function constants
there must be a constant S for the successor function and 0 for (the 0–place function)
zero. Among the predicate constants there must be a constant = for equality and ≺ for
the ordering of type ε0 of the natural numbers, as introduced in §10. In order to formulate
the general principle of transfinite induction we also assume that a predicate symbol P is
present.
Terms are built up from object variables x, y, z by means of f (t1 , . . . , tm ), where f
is a function constant. We identify closed terms which have the same value; this is a
convenient way to express in our formal systems the assumption that for each function
constant a computation procedure is known. Terms of the form S(S(. . . S(0) . . .)) are
called numbers. We use the notation S^i 0 or even i for them. Formulas are built up from
prime formulas P (t1 , . . . , tm ) with P a predicate constant or a predicate symbol and ⊥ by
means of (A → B) and ∀x A. As usual we abbreviate A → ⊥ by ¬A.
The axioms of our arithmetical systems will always include the Peano–axioms
∀x.x = x,
∀x, y.x = y → y = x,
∀x, y, z.x = y → y = z → x = z,
∀~x, ~y .x1 = y1 → . . . → xm = ym → f (~x) = f (~y ),
∀~x, ~y .x1 = y1 → . . . → xm = ym → P (~x) → P (~y )
for any function constant f and predicate constant or predicate symbol P . We also require
irreflexivity and transitivity for ≺ as axioms. For our negative results we can also allow
that for any predicate symbol P the stability axioms

∀~x.¬¬P(~x) → P(~x)

are present. However, for the positive results no stability axioms are needed. We express
our assumption that for any predicate constant a decision procedure is known by adding
the axiom
P(S^{i1} 0, …, S^{im} 0)

whenever P(~i) is true, and

¬P(S^{i1} 0, …, S^{im} 0)

whenever P(~i) is false.
We finally allow in any of our arithmetical systems Z an arbitrary supply of true
Π–formulas as axioms. Our (positive and negative) results concerning initial cases of
transfinite induction will not depend on which of those axioms we have chosen, except that
for the positive results we always assume the universal closures of
x ⊀ 0, (1)
z ≺ y ⊕ ω^0 → z ⊀ y → z ≠ y → ⊥, (2)
x ⊕ 0 = x, (3)
x ⊕ (y ⊕ z) = (x ⊕ y) ⊕ z, (4)
0 ⊕ x = x, (5)
ω^x · 0 = 0, (6)
ω^x · (Sy) = ω^x · y ⊕ ω^x, (7)
z ≺ y ⊕ ω^x → x ≠ 0 → z ≺ y ⊕ ω^{f(x,y,z)} · g(x,y,z), (8)
z ≺ y ⊕ ω^x → x ≠ 0 → f(x,y,z) ≺ x, (9)
is derivable in Z.
We first show
A is progressive =⇒ A+ is progressive,
where “B is progressive” means ∀x.(∀y.y ≺ x → B[y]) → B[x]. So assume that A is progressive and
∀y.y ≺ x → A+ [y]. (10)
We have to show A+ . So assume further
So assume the left–hand side, i.e. assume that A is progressive. Case 0. From x ≺ ω0
we get x = 0 by (5), (2) and (1), and A[0] follows from the progressiveness of A by (1).
Case n + 1. Since A is progressive, by what we have shown above also A+ is progressive.
Applying the IH to A+ yields ∀x.x ≺ ωn → A+[x], and hence A+[ωn] by the progressiveness
of A+. Now the definition of A+ (together with (1) and (5)) yields ∀z.z ≺ ω^ωn → A[z].
Note that in these derivations the induction scheme was used for formulas of un-
bounded complexity.
We now want to refine the Theorem to a corresponding result for the subsystems Zk
of Z. Note first that if A is a formula of level ≤ k, then the formula A+ constructed in the
proof of the Theorem has level ≤ k + 1, and for the proof of
A is progressive =⇒ A+ is progressive
we have used induction with an induction formula of level ≤ k.
Now let A be of level ≤ 1, and assume that A is progressive. Let A0 := A, Ai+1 :=
(Ai)+. Then Ai is of level ≤ i + 1, and hence in Zk we can derive that A1, A2, …, Ak
are all progressive. Let ω1[m] := m, ωi+1[m] := ω^{ωi[m]}. Since in Zk we can derive that
Ak is progressive, we can also derive Ak [0], Ak [1], Ak [2] and generally Ak [m] for any m,
i.e. Ak [ω1 [m]]. But since
Ak ≡ (Ak−1)+ ≡ ∀y.(∀z.z ≺ y → Ak−1[z]) → ∀z.z ≺ y ⊕ ω^x → Ak−1[z],
we first get (with y = 0) ∀z.z ≺ ω2 [m] → Ak−1 [z] and then Ak−1 [ω2 [m]] by the progres-
siveness of Ak−1 . Repeating this argument we finally obtain ∀z.z ≺ ωk+1 [m] → A0 [z].
Hence we have
If more generally we start out with a formula A of level ≤ ℓ instead, where 1 ≤ ℓ ≤ k, then
a similar argument yields the following result (cf. Parsons [17]).
Our next aim is to prove that these bounds are sharp. More precisely, we will show that
in Z (no matter how many true Π–formulas we have added as axioms) one cannot derive
transfinite induction up to ε0 , i.e. the formula
[∀x.(∀y.y ≺ x → P (y)) → P (x)] → ∀x P (x)
with a free predicate symbol P , and that in Zk one cannot derive transfinite induction up
to ωk+1 , i.e. the formula
[∀x.(∀y.y ≺ x → P (y)) → P (x)] → ∀x.x ≺ ωk+1 → P (x).
This will follow from the method of normalization applied to arithmetical systems, which
we have to develop first.
12 Normalization for arithmetic with the
omega rule
We will show in §14 that a normalization theorem does not hold for a system of arithmetic
like Z in §11, in the sense that for any formula A derivable in Z there is a derivation of the
same formula A in Z which only uses formulas of a level bounded by the level of A. The
reason for this failure is the presence of the induction axioms, which can be of arbitrary
level.
Here we remove that obstacle against normalization and replace the induction ax-
ioms by a rule with infinitely many premises, the so–called ω–rule (suggested by Hilbert
and studied by Lorenzen, Novikov and Schütte), which allows to conclude ∀x A from
A[0], A[1], A[2], · · ·.
Clearly this ω–rule can also be used to replace the rule ∀+ . As a consequence we do
not need to consider free object variables.
So we introduce the system Z ∞ of ω–arithmetic as follows. Z ∞ has the same language
and — apart from the induction axioms — the same axioms as Z. Derivations in Z ∞ are
infinite objects; they are built up from assumption variables uA , v B and constants axA for
any axiom A of Z other than an induction axiom by means of the rules
(λu^A d^B)^{A→B},
(d^{A→B} e^A)^B,
⟨d_i^{A[i]}⟩_{i<ω}^{∀x A},
(d^{∀x A} i)^{A[i]},

denoted by →+, →−, ω and ∀−, respectively.
More precisely, we define the notion of a ~u–derivation (i.e. a derivation in Z ∞ with
free assumption variables among ~u) of height ≤ α and degree ≤ k inductively, as below.
Note that derivations are infinite objects now. They may be viewed as mappings from
finite lists of natural numbers (= nodes in the derivation tree) to lists of data including
the formula appearing at that node, the rule applied last, a list of assumption variables
including all those free in the subderivation (starting at that node), a bound on the height
of the subderivation, and a bound on the degree of the subderivation.
Intuitively, the degree of a derivation is the least number ≥ the level of any sub-
derivation λx d in a context (λx d)e or ⟨d_i⟩_{i<ω} in a context ⟨d_i⟩_{i<ω} j, where the level of a
derivation is the level of its type, i.e. the formula it derives. This notion of a degree is
needed for the normalization proof we give below.
* Any assumption variable uA and any axiom axA is a ~u–derivation of height ≤ α and
degree ≤ k, for any list ~u of assumption variables (containing u in the first case),
ordinal α and number k.
We now embed our systems Zk (i.e. arithmetic with induction restricted to formulas of
level ≤ k) and hence Z into Z ∞ .
Lemma 12.1. Let d^B be a derivation in Zk with free assumption variables among ~u^{~A}
which contains ≤ m instances of the induction scheme, all with induction formulas of level
≤ k. Let σ be a substitution of numbers for object variables such that ~Aσ, Bσ do not
contain free object variables. Then we can find a ~u^{~Aσ}–derivation (d∞)^{Bσ} in Z∞ of height
≤ ω^m + h for some h < ω and degree ≤ k.
Proof by induction on the height of the given derivation, which we may assume to be in
long normal form. The only case which requires some argument is when the derivation
consists of two applications of →− to an instance of the induction scheme. Then it must
have the form
Indx,A dA[0] (λx, v A eA[S(x)] ).
By IH we obtain derivations

d∞^{A[0]} of height ≤ ω^{m−1} + h0,
e∞^{A[1]}[d∞^{A[0]}] of height ≤ ω^{m−1} · 2 + h1,
e∞^{A[2]}[e∞^{A[1]}[d∞^{A[0]}]] of height ≤ ω^{m−1} · 3 + h2,

and so on, all of degree ≤ k. Combining all these derivations of A[i] as premises of the
ω–rule yields a derivation of ∀x A of height ≤ ω^m and degree ≤ k.
A derivation is called convertible if it is of the form (λu d)e or else ⟨d_i⟩_{i<ω} j, which can
be converted into d[e/u] or dj, respectively. Here d[e/u] is obtained from d by substituting
e for all free occurrences of u in d. A derivation is called normal if it does not contain a
convertible subderivation. Note that a derivation of degree 0 must be normal.
We want to define an operation which by repeated conversions transforms a given
derivation into a normal one with the same end formula and no more assumption variables.
The methods employed in §4 to achieve such a task have to be adapted properly in order
to deal with the new situation of infinitary derivations. Here we give a particularly simple
argument due to Tait [29].
Lemma 12.3. For any ~u–derivation d^A of height ≤ α and degree ≤ k + 1 we can find a
~u–derivation (d^k)^A of height ≤ 2^α and degree ≤ k.

Proof by induction on α. The only case which requires some argument is when the deriva-
tion is of the form de with d of height ≤ α1 < α and e of height ≤ α2 < α. We first
consider the subcase where d^k = λu d1 and lev(d) = k + 1. Then lev(e) ≤ k by the defi-
nition of level, and hence (d1)[e^k/u] has degree ≤ k by Lemma 12.2. Furthermore, also
by Lemma 12.2, (d1)[e^k/u] has height ≤ 2^{α2} + 2^{α1} ≤ 2^{max(α2,α1)+1} ≤ 2^α. Hence we can
take (de)^k to be (d1)[e^k/u]. If we are not in the above subcase, we can simply take (de)^k
to be d^k e^k. This derivation clearly has height ≤ 2^α. Also it has degree ≤ k, which can be
seen as follows. If lev(d) ≤ k we are done. If however lev(d) ≥ k + 2, then d must be of
the form d0 d1 … dm for some assumption variable or axiom d0 (since the given derivation
has degree ≤ k + 1). But then d^k has the form d0 d1^k … dm^k and we are done again. (To be
completely precise, this last statement has to be added to the formulation of the Lemma
above and proved simultaneously with it.)
As an immediate consequence we obtain
13 Unprovable initial cases of transfinite induction

We now apply the technique of normalization for arithmetic with the ω–rule for a proof
that transfinite induction up to ε0 is underivable in Z. To this end we extend Z∞ by a
progression rule Prog(P) for the predicate symbol P:

If d_i^{P(i)} are ~u–derivations of heights ≤ αi < α and degrees ≤ ki ≤ k (i ≺ j), then
⟨d_i^{P(i)}⟩_{i≺j}^{P(j)} is a ~u–derivation of height ≤ α and degree ≤ k.
Since this progression rule only deals with derivations of prime formulas it does not affect
the degrees of derivations. Hence the proof of normalization for Z ∞ carries over unchanged
to Z ∞ + Prog(P ). In particular we have
We now show that from the progression rule for P we can easily derive the progres-
siveness of P .
Proof. By the ω–rule it suffices to derive (∀y.y ≺ j → P (y)) → P (j) for any j with height
≤ 4. We argue informally. Assume ∀y.y ≺ j → P (y). By ∀− we have i ≺ j → P (i) for any
i. Now for any i ≺ j we have i ≺ j as an axiom; hence P (i) for any such i. An application
of the progression rule yields P (j), with a derivation of height ≤ 3. Now by →+ and ω
the claim follows.
The crucial observation now is that a normal derivation of P (|β|) must essentially have
a height of at least β. However, to obtain the right estimates for our subsystems Zk we
cannot apply Lemma 13.1 down to degree 0 (i.e. to the normal form) but must stop already
at degree 1. Such derivations, i.e. those of degree ≤ 1, will be called almost normal; they
can also be analyzed easily. An almost normal derivation d in Z∞ + Prog(P) is called a
P(|~α|), ¬P(|~β|)–refutation if d derives a formula ~A → B with ~A and the free assumptions
in d among P(|~α|) := P(|α1|), …, P(|αm|), ¬P(|~β|) := ¬P(|β1|), …, ¬P(|βn|) and
true prime formulas, and B a false prime formula or else among P(|~β|).
where lg ~α denotes the length of the list ~α.
Proof by induction on |d|. Note that we may assume that d does not contain the ω–rule,
since any application of it must be in a context hdi ij , which can be replaced by dj . We can
also assume that d contains ∀− only in a context where leading universal quantifiers of an
axiom are removed. Note also that d cannot derive an instance |γ| = |δ| → P (|γ|) → P (|δ|)
of an equality axiom with γ = δ true, since we have assumed that α ~ and β~ are disjoint.
We distinguish cases according to the last rule in d.
Case →+ . By our definition of refutations the claim follows immediately from the IH.
Case →−. Then d ≡ f^{A→(~A→B)} e^A. If A is a true prime formula, the claim follows
from the IH for f. If A is a false prime formula, the claim follows from the IH for e. If A is
¬¬P(|γ|) (and hence f ≡ Stab_P^{∀x.¬¬P(x)→P(x)} |γ|), then, since the level of ¬¬P(|γ|) is 2, the
derivation e^{¬¬P(|γ|)} must end with an introduction rule, i.e. e ≡ λu^{¬P(|γ|)} e0^⊥ (for otherwise,
since no axiom contains some ¬¬P(t) as a strictly positive subformula, we would get a
contradiction against the assumption that d has degree ≤ 1). The claim now follows from
the IH for e0. The only remaining case is when A is P(|γ|). Then f is an almost normal
P(|γ|), P(|~α|), ¬P(|~β|)–refutation and e is an almost normal P(|~α|), ¬P(|~β|), ¬P(|γ|)–
refutation. We may assume that γ is not among ~α, since otherwise the claim follows
immediately from the IH for f . Hence we have by the IH for f
min(~β) ≤ |f| + lg ~α + 1 ≤ |d| + lg ~α.

min(~β, δ) ≤ |d_δ| + lg ~α < |d| + lg ~α

and hence

min(~β, γ) ≤ |d| + lg ~α.
Now we can show the following result (cf. Mints [13] and Parsons [17]).
Proof. We restrict ourselves to the second part. So assume that transfinite induction up
to ωk+1 is derivable in Zk. Then by the embedding of Zk into Z∞ (Lemma 12.1) and the
normal derivability of the progressiveness of P in Z∞ + Prog(P) with finite height (Lemma
13.2) we can conclude that ∀x.x ≺ ωk+1 → P(x) is derivable in Z∞ + Prog(P) with height
< ω^m + h for some m, h < ω and degree ≤ k. Now k − 1 applications of Lemma 13.1
yield a derivation of the same formula ∀x.x ≺ ωk+1 → P(x) in Z∞ + Prog(P) with height
≤ γ := 2_{k−1}(ω^m + h) < ωk+1 (writing 2_i(α) for the i–times iterated exponential with
base 2) and degree ≤ 1, hence also a derivation of P(|γ + 1|) in
Z∞ + Prog(P) with height ≤ γ and degree ≤ 1. But this contradicts Lemma 13.3.
14 Normalization for arithmetic is
impossible
The normalization theorem for first–order logic applied to arithmetic Z is not particularly
useful since we may have used in our derivation induction axioms of arbitrary complexity.
Hence it is tempting to first eliminate the induction scheme in favour of an induction rule
allowing to conclude ∀x A from a derivation of A[0] and a derivation of A[S(x)] with an
additional assumption A to be cancelled at this point (note that this rule is equivalent to
the induction scheme), and then to try to normalize the resulting derivation in the new
system Z with the induction rule. We will apply our results from §13 to show that even a
very weak form of the normalization theorem cannot hold in Z with the induction rule.
Theorem 14.1. The following weak form of a normalization theorem for Z with the
induction rule is false: For any ~u^{~A}–derivation d^B with ~A, B formulas of degree ≤ ℓ there
is a ~u^{~A}–derivation (d∗)^B containing only formulas of degree ≤ k, with k depending only
on ℓ.
Proof. Assume that such a normalization theorem holds. Consider the formula
15 Permutative conversions

Prawitz in [19] proves strong normalization for conversion rules on proof trees for logic
with ∨ and ∃. The conversion rules not only include the usual β–rules but also so–
called permutative rules. The method Prawitz uses is based on so-called strong validity
predicates; they are a variant of Tait’s notion of a strongly computable term treated in §4.
The following exposition is based on Prawitz’ proof and on a study of it by van de
Pol [32]. It uses derivation terms instead of proof trees for a cleaner exposition. Also
the definition of an end segment is formalized. In the definition of strong validity a small
oversight in Prawitz’ formulation has been corrected.
We first define the reduction rules. In order to have a uniform notation it is convenient
to write ∘−(d, ~α) for an application of the elimination rule belonging to a connective ∘,
with main premise d and the remaining premises and terms collected in ~α.
We now show that → is terminating, i.e. that any reduction sequence starting with
d terminates after finitely many steps. We write d →∗ d′ (or d →+ d′ ) if d′ is a member
of a reduction sequence (a reduction sequence with at least two elements) starting with d.
Hence →∗ is the reflexive transitive closure of →.
Clearly conversion is compatible with substitution:
Clearly the end segment relation is transitive, i.e. ES(d, e) and ES(e, f ) imply ES(d, f ).
We now prove that an introduction in the end segment of the main premise of an exists–
elimination can always be removed by conversion.
Lemma 2. Consider ∃− (d1 , x, u, f ) and assume ES(d1 , ∃+ (t, e)). Then there exists a d2
such that ∃− (d1 , x, u, f ) →+ d2 and ES(d2 , f [t, e/x, u]).
Lemma 3. Consider ∨−(d1, u0, e0, u1, e1) and assume ES(d1, ∨i+(e)). Then there exists a
d2 such that ∨−(d1, u0, e0, u1, e1) →+ d2 and ES(d2, ei[e/ui]).
We now define the central notion of a strongly valid derivation dA . The definition
is by induction on the number of logical symbols in A, and for fixed A it is an inductive
definition. We write SV(d) to mean that d is strongly valid.
– SV(∨i+(d)) if SV(d).
– If d is not an introduction, then SV(d) if the following three conditions all hold.
(b) If d = ∨−(d1, u0, e0, u1, e1), then for i = 0 and i = 1 we have SV(ei) and for all
d′ with d1 →∗ d′ and all e with ES(d′, ∨i+(e)) we have SV(ei[e/ui]).
(c) If d = ∃− (d1 , x, u, e), then SV(e) and for all d′ with d1 →∗ d′ and all t, f with
ES(d′ , ∃+ (t, f )) we have SV(e[t, f /x, u]).
Remark. In (c) a small oversight in Prawitz’ formulation has been corrected by van
de Pol. Prawitz (in [19], p. 293) does not mention that the top formula ∃x B in the end
segment of d′ should have been obtained by ∃+ from B[t]; this is needed in the proofs
below. It could also have been obtained by
→− applied to B[t] → ∃x B and B[t], yielding ∃x B.
Lemma 6. (i) If ES(d, e) and SV(d), then SV(e). (ii) If ES(d, e) and e → e′ , then there
is a d′ such that ES(d′ , e′ ) and d → d′ . (iii) If ES(d, e) and SN(d), then SN(e).
Proof. (i) is proved easily by induction over the definition of ES(d, e), using the first part
of (b) and (c) in the definition of SV(d). (ii) is also proved easily by induction over the
definition of ES(d, e). (iii) is an immediate consequence of (ii).
(ii) If the last rule of d is neither ∨− nor ∃−, then SV(e) for any immediate subderivation
e of d.
(iii) If the last rule of d is either ∨− or ∃−, then conditions (b) and (c) of the definition of
strong validity hold. More explicitly: (b) if d = ∨−(d1, u0, e0, u1, e1), then for i = 0
and i = 1 we have SV(ei) and for all d′ with d1 →∗ d′ and all e with ES(d′, ∨i+(e)) we
have SV(ei[e/ui]); (c) if d = ∃−(d1, x, u, e), then SV(e) and for all d′ with d1 →∗ d′
and all t, f with ES(d′, ∃+(t, f)) we have SV(e[t, f/x, u]).
k = the length of the longest reduction sequence from the major premise of d,
ℓ = the depth of the major premise of d,
m = the sum of the lengths of the longest reduction sequences from the immediate
subderivations of d.
Case ∀− (∀+ (x, d), t) →0 d[t/x]. We must prove SV(d[t/x]). By (ii) we know that
SV(∀+ (x, d)). Hence SV(d[t/x]) by the definition of strong validity.
Case ∨− (∨+ i (d), u0 , e0 , u1 , e1 ) →0 ei [d/ui ]. We must prove SV(ei [d/ui ]). This follows
immediately from (iii), condition (b).
Case ∃− (∃+ (t, d), x, u, e) →0 e[t, d/x, u]. Similar.
III. d′ is a permutative reduct of d. We only treat the case of an existential permutative
conversion. Then
Let (k ′ , ℓ′ , m′ ) be the induction value of d′ . We will show in (i) below that k ′ and m′
are finite. We have to prove SV(d′ ). Note that the major premise of d′ is an immediate
subderivation of the major premise of d. Therefore k ≥ k ′ and ℓ > ℓ′ . So the induction
value is lowered and we can use the IH, which says that it is enough to prove (i) and (iii)
for d′ .
(i). SN(d1) clearly follows from (i) for d. SN(∘−(d2, ~α)) follows by Lemma 5 from
SV(∘−(d2, ~α)), which will be proved as (∗) below.
(iii). We have to show (∗) SV(∘−(d2, ~α)) and (∗∗): if d1 →∗ d1′ with ES(d1′, ∃+(t, f)),
then SV(∘−(d2[t, f/x, u], ~α)). First note the following facts.

SN(~α), SN(∃−(d1, x, u, d2)) and SN(d2). (F1)
The first and second follow from assumption (ii) for d, and the third follows from the
second by the definition of strong validity. Now we can prove (∗) and (∗∗).
(∗). Let (k1, ℓ1, m1) be the induction value of ∘−(d2, ~α). Then k ≥ k1 and ℓ > ℓ1.
Hence we can apply the IH, which says that it suffices to prove (i)–(iii) for ∘−(d2, ~α). (i)
follows from (F1) and (ii) follows from (F2). For (iii) we only treat the case ∘−(d2, ~α) =
∃−(d2, y, v, d3); the case with ∨ is similar. Then
We have to prove SV(d3 ) and for all d′2 with d2 →∗ d′2 and all s, g such that ES(d′2 , ∃+ (s, g))
we have SV(d3 [s, g/y, v]). The first follows from (c1 ) for d. For the second assume
d2 →∗ d′2 and ES(d′2 , ∃+ (s, g)). Then SV(d3 [s, g/y, v]) follows from (c2 ) for d, since
∃− (d1 , x, u, d2) →∗ ∃− (d1 , x, u, d′2) and ES(∃− (d1 , x, u, d′2 ), ∃+ (s, g)).
(∗∗). Let d1 →∗ d1′ and ES(d1′, ∃+(t, f)). We must show SV(∘−(d2[t, f/x, u], ~α)). Let
(k2, ℓ2, m2) be the induction value of ∘−(d2[t, f/x, u], ~α). We will need yet another fact:

There is an e such that ∃−(d1, x, u, d2) →+ e and ES(e, d2[t, f/x, u]). (F3)

This follows from Lemma 2, since d1 →∗ d1′ and ES(d1′, ∃+(t, f)). From (F3) we immedi-
ately obtain k > k2, using Lemma 6. Clearly m2 is finite by (F1). So the induction value
is lowered and we can use the IH, which says that it is enough to prove (i), (ii) and (iii)
for ∘−(d2[t, f/x, u], ~α).
(i). SN(~α) holds by (F1). SN(d2 [t, f /x, u]) follows from (F3), Lemma 6(iii) and (F1).
(ii). Let ◦ ∈ {∧, ⊃, ∀}. Then we obtain SV(~α) by (F2). SV(d2 [t, f /x, u]) follows from
SV(∃− (d1 , x, u, d2 )) (which holds by (F2)) by (c) in the definition of strong validity (since
d1 →∗ d′1 and ES(d′1 , ∃+ (t, f ))).
(iii). We only treat the case ◦− (d2 , ~α) = ∃− (d2 , y, v, d3 ); the case with ∨ is similar.
We have to prove SV(d3 ) and for all d′2 with d2 [t, f /x, u] →∗ d′2 and all s, g such that
ES(d′2 , ∃+ (s, g)) we have SV(d3 [s, g/y, v]). The first follows from (c1 ) for d. For the second
assume d2 [t, f /x, u] →∗ d′2 and ES(d′2 , ∃+ (s, g)). We must show SV(d3 [s, g/y, v]). By (F3)
we have an e such that
But then, because of d2 [t, f /x, u] →∗ d′2 , Lemma 6(ii) gives us an e′ such that
Because of ES(d′2 , ∃+ (s, g)) and the transitivity of the end segment relation we have
ES(e′ , ∃+ (s, g)). Therefore the major premise of d reduces to e′ with ∃+ (s, g) in its end
segment. By (c2 ) for d we can conclude that the side premise of d with s, g substituted is
strongly valid, i.e. SV(d3 [s, g/y, v]).
A derivation d is called strongly valid under substitution (written SV∗ (d)) if for any
object terms ~t and strongly valid derivations ~f we have SV(d[~t, ~f ]). Using the Main
Lemma 7 we can now prove
Case ⊃+ (u, d). Let ~f be strongly valid. We have to show that ⊃+ (u, d[~t, ~f ]) is strongly
valid. So let e be strongly valid. We have to show that d[e, ~f ] is strongly valid. But this
follows from the IH for d.
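The clause of the definition of strong validity verified in this case can be summarized as follows (a paraphrase of the argument just given, not a restatement of the official definition stated earlier in these notes):

```latex
% Strong validity at an implication introduction, as used in the case above:
\mathrm{SV}(\supset^{+}(u,d))
  \quad\text{provided}\quad
  \mathrm{SV}(e) \Rightarrow \mathrm{SV}(d[e/u])
  \quad\text{for every derivation } e \text{ of the assumption formula of } u.
```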
Case ∀+ (x, d). Similar.
Case ◦− (~α). Let ~f be strongly valid. We have to prove (i), (ii) and (iii) of the Main
Lemma 7 for ◦− (~α)[~t, ~f ]. By the IH we have SV(αi [~t, ~f ]) for every immediate subderivation
αi , and by Lemma 5 we can conclude that SN(αi [~t, ~f ]). This proves (i) and (ii). For (iii) we
have to prove conditions (a), (b) and (c) of the definition of strong validity. We only treat (c).
So let d = ∃− (d1 , x, u, e), hence d[~t, ~f ] = ∃− (d1 [~t, ~f ], x, u, e[~t, ~f ]). Let d1 [~t, ~f ] →∗ d′ and
ES(d′ , ∃+ (t, f )). We must show SV(e[t, ~t, f, ~f ]). Since SV(d1 [~t, ~f ]) by the IH, we get SV(d′ ) by
Lemma 4, hence SV(∃+ (t, f )) by Lemma 6(i) and hence SV(f ) by the definition of strong
validity. Now from the IH for e we obtain SV(e[t, ~t, f, ~f ]).
From Lemma 5 and Lemma 8 we immediately obtain
[12] Gerhard Gentzen. Beweisbarkeit und Unbeweisbarkeit von Anfangsfällen der trans-
finiten Induktion in der reinen Zahlentheorie. Mathematische Annalen, 119:140–161,
1943.
[13] G.E. Mints. Exact estimates of the provability of transfinite induction in the initial
segments of arithmetic. Journal of Soviet Math, 1:85–91, 1973. Translated from
Zapiski Nauch. Sem. Leningrad 20, 134–144 (1971).
[14] G.E. Mints. On e–theorems (in Russian). Zapiski, 40:110–118, 1974.
[15] M.H.A. Newman. On theories with a combinatorial definition of “equivalence”. Annals
of Mathematics, 43(2):223–243, 1942.
[16] V. P. Orevkov. Lower bounds for increasing complexity of derivations after cut elim-
ination. Zapiski Nauchnykh Seminarov Leningradskogo, 88:137–161, 1979.
[17] Charles Parsons. Transfinite induction in subsystems of number theory (abstract).
Journal of Symbolic Logic, 38(3):544–545, 1973.
[18] Gordon D. Plotkin. LCF considered as a programming language. Theoretical Com-
puter Science, 5:223–255, 1977.
[19] Dag Prawitz. Ideas and results in proof theory. In Jens Erik Fenstad, editor, Pro-
ceedings of the second Scandinavian Logic Symposium, pages 235–307. North–Holland,
Amsterdam, 1971.
[20] Helmut Schwichtenberg. Normalization. In F.L. Bauer, editor, Logic, Algebra and
Computation. Proceedings of the International Summer School Marktoberdorf, Ger-
many, July 25 – August 6, 1989, Series F: Computer and Systems Sciences, Vol. 79,
pages 201–237, Berlin, 1991. NATO Advanced Study Institute, Springer.
[21] Helmut Schwichtenberg. Primitive recursion on the partial continuous functionals. In
Manfred Broy, editor, Informatik und Mathematik, pages 251–269. Springer, Berlin,
1991.
[22] Helmut Schwichtenberg. Minimal from classical proofs. In E. Börger, G. Jäger,
H. Kleine-Büning, and M.M. Richter, editors, Computer Science Logic, pages 326–
328. Springer LNCS 626, 1992.
[23] Helmut Schwichtenberg. Proofs as programs. In Peter Aczel, Harold Simmons, and
Stanley S. Wainer, editors, Proof Theory. A selection of papers from the Leeds Proof
Theory Programme 1990, pages 81–113. Cambridge University Press, 1992.
[24] Helmut Schwichtenberg. Density and choice for partial continuous functionals. In
preparation, 1993.
[25] Helmut Schwichtenberg. Logikprogrammierung. Vorlesungsmanuskript, Universität
München, 1994.
[26] Helmut Schwichtenberg and Stan Wainer. Ordinal bounds for programs. Submitted
for publication in: Feasible Mathematics II (ed. Jeff Remmel), August 1993.
[27] Dana Scott. Domains for denotational semantics. In E. Nielsen and E. M. Schmidt,
editors, Automata, Languages and Programming, Lecture Notes in Computer Science,
Volume 140, pages 577–613. Springer, Berlin, 1982. A corrected and expanded version
of a paper prepared for ICALP’82, Aarhus, Denmark.
[28] Martin Stein. Interpretationen der Heyting–Arithmetik endlicher Typen. PhD thesis,
Universität Münster, Fachbereich Mathematik, 1976.
[29] William W. Tait. Infinitely long terms of transfinite type. In J. Crossley and M. Dum-
mett, editors, Formal Systems and Recursive Functions, pages 176–185, Amsterdam,
1965. North–Holland.
[30] Anne S. Troelstra, editor. Metamathematical Investigations of Intuitionistic Arith-
metic and Analysis, volume 344 of Lecture Notes in Mathematics. Springer, Berlin,
1973.
[31] Anne S. Troelstra and Dirk van Dalen. Constructivism in Mathematics. An Intro-
duction, volume 121, 123 of Studies in Logic and the Foundations of Mathematics.
North–Holland, Amsterdam, 1988.
[32] Jaco van de Pol. Strong normalization of FOL with permutative conversions. Manuscript,
May 1994.
[33] Anton Wallner. Komplexe Existenzbeweise in der Arithmetik. Master’s thesis, Ma-
thematisches Institut der Universität München, 1993.
[34] Hermann Weyl. Über die neue Grundlagenkrise der Mathematik. Mathematische
Zeitschrift, 10, 1921.
Index
∃∗ –rules, 39
∨∗ –rules, 41
almost normal, 84
assumption variable, 6
atom, 2
atomic formula, 2
beta–eta–conversion, 15
beta–eta–normal, 16
branch, 20
classical logic, 9
closed, 2
closed derivation term, 8
Coincidence Lemma, 5
constant, 1
conversion relation, 15
critical, 65
decidable, 51, 65
deduction theorem, 13
definite Horn formula, 25
delta–expansion, 23
derivable formula, 8
derivation term, 6
disjunction property, 41
end segment, 88
environment, 4
eta–expansion, 21
eta–expansor, 22
Ex–falso–quodlibet axiom, 10
Ex–falso–quodlibet Lemma, 11
existence property, 41
Existence–Elimination–Lemma, 35
Existence–Introduction–Lemma, 35
extensionality axioms, 52
formula, 1
free assumption variable, 6
function symbol, 1
Harrop formula, 43
Herbrand’s theorem, 20
Heyting, 38
Hilbert system, 13
Horn formula, 25
immediate reduct, 87
immediate subformula, 20
indirect proof, 10
inner delta–expansion, 23
instance, 41
intuitionistic logic, 9
judgement, 43
level, 1, 76
long normal form, 21
minimal logic, 6
Mints formula, 10
modified realizability interpretation, 43
natural deduction, 6
normalization by evaluation, 32
number, 49
relation symbol, 2
relevant, 66
SLD–Resolution, 25
stability axiom, 10
Stability Lemma, 10
term, 1
type, 1
types associated with A, 43
variable, 1
Weyl, 38