
Proof Theory

Helmut Schwichtenberg

Mathematisches Institut der Universität München


Sommersemester 1994
Preface

These notes represent the content of a lecture course on proof theory given during Sommersemester 1994 at the Mathematisches Institut der Universität München. They are still rather sketchy and will have to undergo many more revisions.
For their help in preparing these notes I would like to thank Ulrich Berger, Michael
Bopp, Felix Joachimski, Ralph Matthes, Karl–Heinz Niggl, Jaco van de Pol and Robert
Stärk.

München, July 1994

Helmut Schwichtenberg

Contents

Part I: Preliminaries
1 Typed languages
2 Natural deduction
3 Hilbert style systems
4 Normalization
5 The strong existential quantifier
6 Realizing terms
7 Arithmetic

Part II: Computational content of proofs
8 A–translation with program extraction and the direct method
9 The root example; refinements

Part III: Classifying arithmetical proofs
10 Ordinals below epsilon zero
11 Provability of initial cases of transfinite induction
12 Normalization for arithmetic with the omega rule
13 Unprovable initial cases of transfinite induction
14 Normalization for arithmetic is impossible

Appendix
15 Permutative conversions
References
Index

Part I: Preliminaries
1 Typed languages

Let us first fix our language L. Let G be a set of ground types (e.g. nat and boole). Types
(also called object types or simple types) are formed from G by the operations ρ → σ and
ρ × σ. The level of a type ρ is defined by
lev(ι) := 0 for any ground type ι,
lev(ρ → σ) := max(lev(ρ) + 1, lev(σ)),
lev(ρ × σ) := max(lev(ρ), lev(σ)).
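The level function is easy to compute mechanically. The following is a small sketch in Python under an assumed encoding (not from the notes): ground types are strings, and tagged tuples ("->", ρ, σ) and ("x", ρ, σ) stand for function and product types.

```python
# Sketch of lev on simple types.  The encoding is an assumption made for
# illustration: ground types are strings, ("->", rho, sigma) is a function
# type, ("x", rho, sigma) is a product type.

def lev(rho):
    if isinstance(rho, str):            # ground type: lev(iota) = 0
        return 0
    tag, a, b = rho
    if tag == "->":                     # lev(rho -> sigma) = max(lev(rho) + 1, lev(sigma))
        return max(lev(a) + 1, lev(b))
    if tag == "x":                      # lev(rho x sigma) = max(lev(rho), lev(sigma))
        return max(lev(a), lev(b))
    raise ValueError("unknown type constructor")
```

For instance, nat → nat has level 1, while (nat → nat) → nat has level 2 and (nat → nat) × nat has level 1.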
For any type ρ let a countably infinite set of variables of type ρ be given. We denote variables of type ρ by xρ, yρ, . . .. We also assume that a set C of constants denoted cρ is given, each of an arbitrary type ρ. Furthermore, we assume that a set F of function symbols denoted f is given, each of a "functionality" (ρ1, . . . , ρn) → σ. They are intended to denote external functionals, which need not be represented by objects in the model. Note that as a consequence we cannot freely abstract variables in a term, for otherwise we could form λxρ f(x) for an arbitrary function symbol f, and hence f would be an object of our model.
We define inductively terms tρ of type ρ, the set FV(tρ) of variables free in tρ and the set nonabs(tρ) of non–abstractable variables in FV(tρ).

• xρ and cρ are terms of type ρ, FV(xρ) = {xρ}, FV(cρ) = ∅, nonabs(xρ) = ∅, nonabs(cρ) = ∅.

• If t1ρ1, . . . , tnρn are terms and f ∈ F a function symbol of functionality (ρ1, . . . , ρn) → σ, then f(t1, . . . , tn) is a term of type σ. FV(f(t1, . . . , tn)) = FV(t1) ∪ · · · ∪ FV(tn), nonabs(f(t1, . . . , tn)) = FV(t1) ∪ · · · ∪ FV(tn).

• If t is a term of type σ and xρ ∉ nonabs(t), then λxρ t is a term of type ρ → σ. FV(λxρ t) = FV(t) \ {xρ}, nonabs(λxρ t) = nonabs(t).

• If t is a term of type ρ → σ and s is a term of type ρ, then ts is a term of type σ. FV(ts) = FV(t) ∪ FV(s), nonabs(ts) = nonabs(t) ∪ nonabs(s).

• If ti is a term of type ρi for i ∈ {0, 1}, then ⟨t0, t1⟩ is a term of type ρ0 × ρ1. FV(⟨t0, t1⟩) = FV(t0) ∪ FV(t1), nonabs(⟨t0, t1⟩) = nonabs(t0) ∪ nonabs(t1).

• If t is a term of type ρ0 × ρ1 and i ∈ {0, 1}, then πi(t) is a term of type ρi. FV(πi(t)) = FV(t), nonabs(πi(t)) = nonabs(t).
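The clauses above can be turned directly into two recursive functions. This is a sketch under an assumed toy encoding (not from the notes): ("var", x), ("const", c), ("fun", f, args), ("lam", x, t), ("app", t, s), ("pair", t0, t1), ("proj", i, t).

```python
# Sketch of FV and nonabs on a toy term encoding (an assumption for
# illustration).  Types are omitted; only the variable bookkeeping is shown.

def fv(t):
    tag = t[0]
    if tag == "var":   return {t[1]}
    if tag == "const": return set()
    if tag == "fun":   return set().union(set(), *(fv(a) for a in t[2]))
    if tag == "lam":   return fv(t[2]) - {t[1]}
    if tag == "proj":  return fv(t[2])
    return fv(t[1]) | fv(t[2])                  # app, pair

def nonabs(t):
    tag = t[0]
    if tag in ("var", "const"): return set()
    if tag == "fun":
        # all variables free below an external function symbol
        # become non-abstractable
        return set().union(set(), *(fv(a) for a in t[2]))
    if tag == "lam":   return nonabs(t[2])
    if tag == "proj":  return nonabs(t[2])
    return nonabs(t[1]) | nonabs(t[2])          # app, pair

def is_abstractable(x, t):
    """lambda x t is only a term if x is not in nonabs(t)."""
    return x not in nonabs(t)
```

So λx f(x) is rejected, as the text demands: x ∈ nonabs(f(x)).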

We now fix our notion of a formula. Let an additional ground type ◦ be given, to be viewed as the type of propositions. ◦ is not to be used in object types ρ, σ, . . .. We assume that a set P of relation symbols R of functionalities (ρ1, . . . , ρn) → ◦ is given. 0–ary relation symbols are called propositional symbols. Formulas are defined inductively by

• If t1ρ1, . . . , tnρn are terms and R ∈ P a relation symbol of functionality (ρ1, . . . , ρn) → ◦, then R(t1, . . . , tn) is a formula.

• ⊥ (to be read “falsity”) is a formula.

• If A and B are formulas, then A → B is a formula.

• If A and B are formulas, then A ∧ B is a formula.

• If A is a formula and xρ is a variable, then ∀xρ A is a formula.

R(t1, . . . , tn) and ⊥ are called atomic formulas or atoms. Note that ⊥ is not a propositional symbol.
We use (also with indices)

r, s, t for terms,
x, y, z for variables,
a, b, c for constants,
P, Q, R for relation symbols,
f, g, h for function symbols,
A, B, C for formulas.

Negation, disjunction and the existential quantifier are defined by

¬A := A → ⊥,
A ∨ B := ¬A ∧ ¬B → ⊥,
∃x A := ¬∀x ¬A.
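These three definitions are pure formula-building operations, and can be sketched as such. The encoding is an assumption made for illustration: atoms as strings, "bot" for ⊥, ("->", A, B), ("&", A, B), ("all", x, A).

```python
# Sketch of the defined connectives on an assumed formula encoding:
# atoms are strings, "bot" is falsity, ("->", A, B) is implication,
# ("&", A, B) conjunction, ("all", x, A) universal quantification.

def neg(a):       return ("->", a, "bot")                       # ¬A := A -> ⊥
def disj(a, b):   return ("->", ("&", neg(a), neg(b)), "bot")   # A ∨ B := ¬A ∧ ¬B -> ⊥
def exists(x, a): return neg(("all", x, neg(a)))                # ∃x A := ¬∀x ¬A
```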

The set FV(A) of variables free in an L–formula A is defined as usual:

FV(R(t1, . . . , tn)) := FV(t1) ∪ . . . ∪ FV(tn),


FV(⊥) := ∅.
FV(A → B) := FV(A) ∪ FV(B).
FV(A ∧ B) := FV(A) ∪ FV(B).
FV(∀x A) := FV(A) \ {x}.

A term t and a formula A are called closed, if FV(t) = ∅ or FV(A) = ∅, respectively. We write t[x1, . . . , xn] or A[x1, . . . , xn] to indicate that all variables free in t or A are in the list x1, . . . , xn.

For simplicity we identify terms and formulas which differ only by the names of bound
variables. This makes it possible to define substitution r[tρ /xρ ] and A[tρ /xρ ] of a term tρ
for a variable xρ in a particularly simple fashion.
For a term r, variable xρ and a term tρ we define the result of substituting tρ for xρ in r by induction on r:

y[t/x] := t, if x = y; y, otherwise.
c[t/x] := c.
f(t1, . . . , tn)[t/x] := f(t1[t/x], . . . , tn[t/x]).
(λyρ r)[t/x] := λyρ r[t/x], where y ≠ x and y ∉ FV(t).
(rs)[t/x] := r[t/x]s[t/x].
⟨t0, t1⟩[t/x] := ⟨t0[t/x], t1[t/x]⟩.
πi(r)[t/x] := πi(r[t/x]).

and similarly for a formula A, variable xρ and a term tρ we define the result of substituting tρ for xρ in A by induction on A:

R(t1, . . . , tn)[t/x] := R(t1[t/x], . . . , tn[t/x]).
⊥[t/x] := ⊥.
(A → B)[t/x] := A[t/x] → B[t/x].
(A ∧ B)[t/x] := A[t/x] ∧ B[t/x].
(∀y A)[t/x] := ∀y A[t/x], where y ≠ x and y ∉ FV(t).
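The side condition "y ≠ x and y ∉ FV(t)" can always be met by renaming the bound variable, since terms are identified up to such renaming. A sketch in Python, on the toy encoding assumed earlier (("var", x), ("const", c), ("fun", f, args), ("lam", x, r), ("app", r, s), ("pair", r0, r1), ("proj", i, r)), performing the renaming explicitly:

```python
import itertools

# Sketch of substitution r[t/x] on an assumed toy term encoding.  The notes
# identify terms up to renaming of bound variables; here a fresh name is
# picked whenever the side condition "y != x and y not in FV(t)" fails.

def fv(r):
    tag = r[0]
    if tag == "var":   return {r[1]}
    if tag == "const": return set()
    if tag == "fun":   return set().union(set(), *(fv(a) for a in r[2]))
    if tag == "lam":   return fv(r[2]) - {r[1]}
    if tag == "proj":  return fv(r[2])
    return fv(r[1]) | fv(r[2])          # app, pair

def fresh(avoid):
    # hypothetical naming scheme v0, v1, ... used only for illustration
    return next(v for i in itertools.count() if (v := f"v{i}") not in avoid)

def subst(r, t, x):
    tag = r[0]
    if tag == "var":   return t if r[1] == x else r
    if tag == "const": return r
    if tag == "fun":   return ("fun", r[1], [subst(a, t, x) for a in r[2]])
    if tag == "app":   return ("app", subst(r[1], t, x), subst(r[2], t, x))
    if tag == "pair":  return ("pair", subst(r[1], t, x), subst(r[2], t, x))
    if tag == "proj":  return ("proj", r[1], subst(r[2], t, x))
    # lambda: rename the bound variable if the side condition fails
    y, body = r[1], r[2]
    if y == x:
        return r                        # x is bound here; nothing to substitute
    if y in fv(t):
        z = fresh(fv(t) | fv(body) | {x})
        body, y = subst(body, ("var", z), y), z
    return ("lam", y, subst(body, t, x))
```

For example, substituting y for x in λy x first renames the bound y, so no capture occurs.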

We now want to formulate a general notion of a structure or model for our typed
languages, called environment model in [3].
M = ((Dρ )ρ , I0 , I1 ) is called an L–structure, if Dρ is a nonempty set for any type ρ,
I0 is a mapping assigning to any constant symbol cρ ∈ C an object I0 (c) ∈ Dρ and to any
function symbol f ∈ F of functionality (ρ1 , . . . , ρn ) → σ a function

I0 (f ): Dρ1 × · · · × Dρn → Dσ ,

and I1 is a mapping assigning to any relation symbol R ∈ P of functionality (ρ1 , . . . , ρn ) →


◦ a relation
I1 (R) ⊆ Dρ1 × · · · × Dρn .
In case n = 0 we have I1 (R) ∈ {true, false}. An L–environment model is given by an
L–structure M = ((Dρ )ρ , I0 , I1 ) and in addition
(i) bijections Φρ,σ : Dρ→σ → [Dρ → Dσ ] between Dρ→σ and some set [Dρ → Dσ ] of
functions from Dρ to Dσ ,
(ii) bijections Ψρ,σ : Dρ×σ → Dρ × Dσ , and

(iii) a mapping assigning to any term t and any environment U (i.e. a mapping from the set of variables into ⋃ρ Dρ such that for each xρ we have U(xρ) ∈ Dρ) an element [[tρ]]U ∈ Dρ such that the following holds.

[[xρ]]U = U(xρ),
[[cρ]]U = I0(cρ),
[[f(t1, . . . , tn)]]U = I0(f)([[t1]]U, . . . , [[tn]]U),
[[λxρ t]]U = Φ⁻¹ρ,σ(f), where f(a) = [[t]]U[a/xρ] for all a ∈ Dρ,
[[ts]]U = (Φρ,σ([[t]]U))([[s]]U),
[[⟨t0, t1⟩]]U = Ψ⁻¹ρ,σ([[[t0]]U, [[t1]]U]),
[[πi(t)]]U = (Ψρ,σ([[t]]U))i for i ∈ {0, 1}.

Note that for a given L–structure M = ((Dρ)ρ, I0, I1) and bijections Φρ,σ and Ψρ,σ it is in general not possible to define [[t]]U by the requirements above. The reason is that the function f: Dρ → Dσ defined by f(a) = [[t]]U[a/xρ] may not be in the range [Dρ → Dσ] of Φρ,σ, and hence we cannot define [[λxρ t]]U by Φ⁻¹ρ,σ(f). A trivial way out of this difficulty is of course to let [Dρ → Dσ] be the set of all functions from Dρ to Dσ. Hence we obtain the following trivial
Example of an environment model. Let Dι for any ground type ι be an arbitrary
nonempty set. Choose Dρ→σ to be the set of all functions from Dρ to Dσ , and Dρ×σ =
Dρ ×Dσ . If we then take Φρ,σ and Ψρ,σ to be identities, we can define [[t]]U by the equations
above.
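In this trivial model the defining equations for [[t]]U become an evaluator. A sketch, on the toy term encoding assumed earlier, with Φ and Ψ as identities and Python functions and pairs playing the role of Dρ→σ and Dρ×σ (I0 interprets constants and function symbols, U is the environment):

```python
# Sketch of evaluation in the trivial environment model, on an assumed toy
# term encoding.  [Drho -> Dsigma] is the set of all Python functions, so
# the lambda clause never fails; Phi and Psi are identities.

def denote(t, U, I0):
    tag = t[0]
    if tag == "var":   return U[t[1]]                   # [[x]]U = U(x)
    if tag == "const": return I0[t[1]]                  # [[c]]U = I0(c)
    if tag == "fun":   return I0[t[1]](*[denote(a, U, I0) for a in t[2]])
    if tag == "lam":   return lambda a: denote(t[2], {**U, t[1]: a}, I0)
    if tag == "app":   return denote(t[1], U, I0)(denote(t[2], U, I0))
    if tag == "pair":  return (denote(t[1], U, I0), denote(t[2], U, I0))
    if tag == "proj":  return denote(t[2], U, I0)[t[1]]
```

With I0 interpreting 0 and a successor symbol over the natural numbers, [[succ(0)]]U evaluates to 1.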
There are more interesting examples of environment models, most notably the model of the partial continuous functionals due to Scott [27] and Ersov [9]; see also [24] for an exposition.
We will usually be somewhat sloppy with our notation and leave out Φ and Ψ. So we write

[[t]]U[[s]]U instead of (Φρ,σ([[t]]U))([[s]]U),
([[t]]U)i instead of (Ψρ,σ([[t]]U))i,
[[[t0]]U, [[t1]]U] instead of Ψ⁻¹ρ,σ([[[t0]]U, [[t1]]U]).

We often write f M for the interpretation I0 (f ) of a function symbol and RM for the
interpretation I1 (R) of a relation symbol.
For any environment model M, environment U and formula A we define a model
relation M |= A[U ] by induction on A.

M |= (R(t1, . . . , tn))[U] ⇐⇒ ([[t1]]U, . . . , [[tn]]U) ∈ RM.
M ⊭ ⊥[U].
M |= (A → B)[U] ⇐⇒ if M |= A[U], then M |= B[U].
M |= (A ∧ B)[U] ⇐⇒ M |= A[U] and M |= B[U].
M |= (∀x A)[U] ⇐⇒ for all a ∈ |M| we have M |= A[U[a/x]].

Coincidence Lemma. (i) If U (x) = U ′ (x) for all x ∈ FV(t), then [[t]]U = [[t]]U ′ .

(ii) If U (x) = U ′ (x) for all x ∈ FV(A), then M |= A[U ] ⇐⇒ M |= A[U ′ ].

Proof by induction on t and A. □

Substitution Lemma. (i) [[r[t/x]]]U = [[r]]U ′ with U ′ = U [[[t]]U /x].

(ii) M |= (A[t/x])[U ] ⇐⇒ M |= A[U ′ ] with U ′ = U [[[t]]U /x].

Proof by induction on r and A, using the Coincidence Lemma. □


2 Natural deduction

As our deductive formalism we use the system of natural deduction introduced by Gerhard
Gentzen in [11]. It consists of the following introduction and elimination rules for →, ∧
and ∀.
For any L–formula A let countably many assumption variables of type A be given.
We use uA , v A , wA to denote assumption variables of type A.
Later we will define substitution d[t/x] of object terms t for object variables x in derivation terms d. We want to avoid that different assumption variables are identified by a substitution of object terms, as e.g. in ⟨u0R(t), u0R(x)⟩[t/x]. Therefore we assume that for any two assumption variables uiA, ujB in a derivation d, in case A ≠ B we also have i ≠ j (cf. [33], p. 30).
The notions of a derivation term dA in minimal logic and its set FA(dA) of free assumption variables are defined inductively by

(A) uA is a derivation term with FA(uA) = {uA}.

(→+) If dB is a derivation term, then

(λuA dB)A→B

is a derivation term with FA(λuA dB) = FA(dB) \ {uA}.

(→−) If dA→B and eA are derivation terms, then

(dA→B eA)B

is a derivation term with FA(dA→B eA) = FA(dA→B) ∪ FA(eA).

(∧+) If dA and eB are derivation terms, then

⟨dA, eB⟩A∧B

is a derivation term with FA(⟨dA, eB⟩A∧B) := FA(dA) ∪ FA(eB).

(∧−) If dA0∧A1 is a derivation term, then

πi(dA0∧A1)Ai

is a derivation term with FA(πi(dA0∧A1)Ai) := FA(dA0∧A1) (for i ∈ {0, 1}).

(∀+) If dA is a derivation term and xρ ∉ ⋃{FV(B) : uB ∈ FA(dA)}, then

(λxρ dA)∀xρ A

is a derivation term with FA(λxρ dA) = FA(dA).

(∀−) If d∀xρ A is a derivation term and tρ is a term, then

(d∀xρ A tρ)A[tρ/xρ]

is a derivation term with FA(d∀xρ A tρ) = FA(d∀xρ A).

It is sometimes useful to display derivation terms in the following graphical fashion.

(→+)
    [u: A]
      |
      B
   ------- →+ u
    A → B

(→−)
     |         |
   A → B       A
   ------------- →−
         B

(∧+)
    |       |
   A0       A1
   ----------- ∧+
    A0 ∧ A1

(∧−i)
      |
   A0 ∧ A1
   -------- ∧−i
      Ai

(∀+)
      |
      A
   ------- ∀+ x
    ∀x A

provided x does not occur free in any open assumption of the given derivation of A.

(∀−)
      |
    ∀x A    t
   ----------- ∀−
     A[t/x]

A derivation term dA is called closed, if FA(dA) = ∅. We write

dB[u1A1, . . . , unAn]

to indicate that the assumption variables free in dB are in the list u1A1, . . . , unAn. We also use the notation d: A instead of dA.

Definition. A formula A is called derivable from assumptions A1, . . . , An, if there is a derivation term dA[u1A1, . . . , unAn] with distinct assumption variables u1A1, . . . , unAn.

Let S be a (finite or infinite) set of formulas. We write S ⊢ B, if the formula B is derivable from finitely many assumptions A1, . . . , An ∈ S.
Examples of derivable formulas are

P → (Q → P),
(P → Q → R) → (P → Q) → P → R,
(∀x.P(x) → Q(x)) → ∀x P(x) → ∀x Q(x).
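Derivability of the propositional examples can be witnessed by concrete derivation terms and checked mechanically. The following sketch, under an assumed encoding (("u", name, A) for an assumption variable of type A, ("lam", name, A, d) for →+ discharging name: A, ("app", d, e) for →−), computes the formula a derivation term proves:

```python
# Sketch of a formula checker for derivation terms in the ->-fragment.
# The encoding is an assumption made for illustration; formulas are atoms
# (strings) or implications ("->", A, B).

def formula_of(d):
    tag = d[0]
    if tag == "u":
        return d[2]                     # assumption variable u^A proves A
    if tag == "lam":
        return ("->", d[2], formula_of(d[3]))   # ->+ : discharge d[1]: d[2]
    # ->- : the left premise must be an implication matching the right premise
    f, a = formula_of(d[1]), formula_of(d[2])
    assert f[0] == "->" and f[1] == a, "ill-typed application"
    return f[2]

# The derivation term lambda u^P lambda v^Q u for the first example formula:
K_deriv = ("lam", "u", "P", ("lam", "v", "Q", ("u", "u", "P")))
```

Here formula_of(K_deriv) yields ("->", "P", ("->", "Q", "P")), i.e. P → (Q → P); the second example is witnessed analogously by λu λv λw uw(vw).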

Definition. For any derivation d we define its set FV(d) of free (object) variables by

FV(uA ) := FV(A).
FV(λuA dB ) := FV(A) ∪ FV(dB ).
FV(dA→B eA ) := FV(dA→B ) ∪ FV(eA ).
FV(hdA , eB i) := FV(dA ) ∪ FV(eB ).
FV(πi (dA∧B )) := FV(dA∧B ).
FV(λx dA ) := FV(dA ) \ {x}.
FV(d∀x A t) := FV(d∀x A ) ∪ FV(t).

Example: Let d be the derivation given by

u: R(x)
→+ u
R(x) → R(x)

Then we have
d = λuR(x) uR(x) ,
FA(d) = ∅,
FV(d) = {x}.

For derivation terms we have two kinds of substitution: we can substitute a derivation
term f A for an assumption variable uA , and we can substitute an object term t for an
object variable x. These substitutions are defined as follows. For simplicity we again
identify derivation terms which differ only in the names of bound variables.
2 Natural deduction 9

Definition.

v[f/u] := f, if u = v; v, otherwise.
(λv d)[f/u] := λv d[f/u], where u ≠ v and v ∉ FA(f).
(de)[f/u] := d[f/u]e[f/u].
(⟨d, e⟩)[f/u] := ⟨d[f/u], e[f/u]⟩.
(πi(d))[f/u] := πi(d[f/u]).
(λx d)[f/u] := λx d[f/u], where x ∉ ⋃{FV(A) : vA ∈ FA(f)}.
(dt)[f/u] := d[f/u]t.

Definition.

uA[t/x] := uA[t/x].
(λuA d)[t/x] := λuA[t/x] d[t/x].
(de)[t/x] := d[t/x]e[t/x].
(⟨d, e⟩)[t/x] := ⟨d[t/x], e[t/x]⟩.
(πi(d))[t/x] := πi(d[t/x]).
(λy d)[t/x] := λy d[t/x], where x ≠ y and y ∉ FV(t).
(dr)[t/x] := d[t/x]r[t/x].

Recall here the requirement for derivation terms mentioned above: for any two assumption variables uiA, ujB in a derivation d, in case A ≠ B we also have i ≠ j (cf. [33], p. 30).

Lemma. (i) If d, f are derivation terms and t is an object term, then d[f /u] and d[t/x]
are derivation terms.

(ii) FA(d[f /u]) ⊆ (FA(d) \ {u}) ∪ FA(f ).

(iii) FV(d[f /u]) ⊆ FV(d) ∪ FV(f ).

(iv) FA(d[t/x]) = {uA[t/x] : uA ∈ FA(d)}.

(v) FV(d[t/x]) ⊆ (FV(d) \ {x}) ∪ FV(t).

Proof by simultaneous induction (cf. [33], p. 30/31). □


Derivation terms in intuitionistic and in classical logic are obtained by adding to the first (assumption–) clause of the definition

• in the case of intuitionistic logic: For any relation symbol R ∈ P we let

EfqR: ∀~x.⊥ → R(~x)

be a derivation term with FA(EfqR) = ∅ (Ex–falso–quodlibet axiom).

• in the case of classical logic: For any relation symbol R ∈ P we let

StabR: ∀~x.¬¬R(~x) → R(~x)

be a derivation term with FA(StabR) = ∅ (Stability axiom).

Clearly FV(StabR) := FV(EfqR) := ∅.


We write Γ ⊢ A (Γ ⊢I A, Γ ⊢C A), if there is a derivation term dA in minimal
(intuitionistic, classical) logic such that for any uB ∈ FA(d) we have B ∈ Γ.
Here are some more interesting examples: The Peirce formula

((P → Q) → P ) → P

is derivable in classical, but not in minimal logic. The Mints formula

((((P → Q) → P ) → P ) → Q) → Q

is derivable in minimal logic. However, its variant

((((P → Q) → P ) → P ) → R) → R

is derivable in classical but not in minimal logic.


For obvious reasons the stability axiom is also called the principle of indirect proof for the relation symbol R. We now want to show that from our stability axioms we can derive the principle of indirect proof for arbitrary formulas (in our →∧∀–language).

Stability Lemma. From stability assumptions StabR for any relation symbol R occurring
in a formula A we can derive ¬¬A → A.

Proof by induction on A.
Case R(~t). Use StabR.
Case ⊥. ¬¬⊥ → ⊥ ≡ ((⊥ → ⊥) → ⊥) → ⊥, and the latter is derived by applying the assumption (⊥ → ⊥) → ⊥ to the derivation λu⊥ u of ⊥ → ⊥.
Case A → B. Use

⊢ (¬¬B → B) → ¬¬(A → B) → A → B,

derived as follows: from assumptions ¬¬B → B, ¬¬(A → B) and A we obtain B. Assume ¬B; from further assumptions A → B and A we get B, hence ⊥. Discharging A → B gives ¬(A → B), which together with ¬¬(A → B) yields ⊥. Discharging ¬B gives ¬¬B, and ¬¬B → B yields B.

Case A ∧ B. Use

⊢ (¬¬A → A) → (¬¬B → B) → ¬¬(A ∧ B) → A ∧ B.

Case ∀x A. Use

⊢ (¬¬A → A) → ¬¬∀x A → A,

derived as follows: from assumptions ¬¬A → A and ¬¬∀x A we obtain A. Assume ¬A; from a further assumption ∀x A, ∀–elimination with x gives A, hence ⊥. Discharging ∀x A gives ¬∀x A, which together with ¬¬∀x A yields ⊥. Discharging ¬A gives ¬¬A, and ¬¬A → A yields A. □

Similarly we can show that from our ex–falso–quodlibet axioms we can derive ex–falso–quodlibet for arbitrary formulas (again in our →∧∀–language).

Ex–falso–quodlibet Lemma. From assumptions EfqR for any relation symbol R occurring in a formula A we can derive ⊥ → A in intuitionistic logic.

Proof by induction on A. □
From ¬¬A → A one can clearly derive ⊥ → A. Hence any formula derivable in intuitionistic logic is also derivable in classical logic.
3 Hilbert style systems

By a Hilbert style system we mean a derivation system without the possibility to discharge
open assumptions, i.e. without the rule →+ of →–introduction. Such a system for minimal
quantifier logic is the following. Axioms:

K: B → A → B,
S: (A → B → C) → (A → B) → A → C,
∧+–Ax: A → B → A ∧ B,
∧−i–Ax: A0 ∧ A1 → Ai,
∀+–Ax: (∀x.A → B) → A → ∀x B if x ∉ FV(A),
∀−–Ax: ∀x A → A[t/x].

Rules: only →−, ∀+. Sometimes →− is also called modus ponens, and ∀+ the rule of generalization. Note that in addition to the axioms (or better, axiom constants) we also allow assumption variables; the variable condition in the rule ∀+ only refers to the free assumption variables, not to the axiom constants.
We want to prove that this Hilbert system is equivalent to the natural deduction
system for minimal quantifier logic introduced in §2.
For the non–trivial direction we consider an intermediate system which has the axioms
above and as rules only →+ , →− , ∀+ .
We first show that for any derivation d[~u: ~A]: B in the natural deduction system we can find a derivation d′[~u: ~A]: B in the intermediate system. The proof is by induction on d.
Case ⟨d0, d1⟩. Take ∧+–Ax d′0 d′1.
Case πi(d). Take ∧−i–Ax d′.
Case dt. Take ∀−–Ax d′, with the axiom instance ∀x A → A[t/x]. □
We now show the so-called deduction theorem, i.e. that for any derivation

d[uA, ~u: ~A]: B

in the Hilbert system we can find a derivation (λ∗uA d)[~u: ~A]: A → B again in the Hilbert system. The proof is again by induction on d.
Case v ≠ u or c. Take Kd.
Case u. Take SKK.
Case de. Take λ∗uA.de := S(λ∗uA d)(λ∗uA e).
Case λx d. Take λ∗uA(λx d) := ∀+–Ax(λx(λ∗uA d)). □
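The propositional core of this λ∗ translation (the familiar bracket abstraction into S and K) can be sketched concretely. The encoding is an assumption for illustration: ("u", name) assumption variables, ("K",) and ("S",) axiom constants, ("app", d, e) modus ponens; the evaluator is included only to check that the translation behaves correctly.

```python
# Sketch of the lambda* translation restricted to the propositional
# ->-fragment, following the case analysis of the deduction theorem.

def lam_star(u, d):
    if d == ("u", u):
        # Case u: SKK derives A -> A
        return ("app", ("app", ("S",), ("K",)), ("K",))
    if d[0] == "app":
        # Case de: S (lambda* u d) (lambda* u e)
        return ("app", ("app", ("S",), lam_star(u, d[1])), lam_star(u, d[2]))
    # Case v != u, or an axiom constant: K d
    return ("app", ("K",), d)

# A small combinator evaluator, used only to check the translation.
def ev(d, env):
    if d[0] == "u": return env[d[1]]
    if d[0] == "K": return lambda x: lambda y: x
    if d[0] == "S": return lambda x: lambda y: lambda z: x(z)(y(z))
    return ev(d[1], env)(ev(d[2], env))
```

For instance, ev(lam_star("u", ("u", "u")), {}) behaves as the identity function, matching the fact that SKK derives A → A.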


Now we can easily prove that for any derivation d[~u: ~A]: B in the natural deduction system we can find a derivation of B from ~A in the Hilbert system.
For the proof we can assume that d is a derivation in the intermediate system. We now use again induction on d. The only interesting case occurs when the derivation is of the form λuA d[uA, ~u: ~A]: A → B. By IH we can assume that d already is in the Hilbert system. But then the deduction theorem gives us a derivation (λ∗uA d)[~u: ~A]: A → B of A → B from ~A in the Hilbert system.
Problem: Consider the following variant of the Hilbert system above. Axioms:

K: ∀(B → A → B),
S: ∀((A → B → C) → (A → B) → A → C),
∧+–Ax: ∀(A → B → A ∧ B),
∧−i–Ax: ∀(A0 ∧ A1 → Ai),
K∀: ∀(B → ∀x B) if x ∉ FV(B),
S∀: ∀((∀x.B → C) → ∀x B → ∀x C),
∀−–Ax: ∀(∀x A → A[t/x]),

where ∀(A) denotes the universal closure of the formula A. Rules: only →−. Again, in addition to the axioms we also allow assumption variables. Show that for any derivation d[~u: ~A]: B in the natural deduction system we can find a derivation d′[~u: ~A]: B in this variant of the Hilbert system.
4 Normalization

We show in this section that any derivation d can be transformed by appropriate con-
version steps into a normal form. A derivation in normal form has the property that it
does not make “detours”, or more precisely, that it cannot occur that an elimination rule
immediately follows an introduction rule. Derivations in normal form have many pleasant
properties, and can be used for a variety of results. We also construct in this section the
so–called long normal form, by means of an additional conversion step called η–expansion.
Implementing normalization of λ–terms with free variables (like derivations d) in the
usual recursive fashion is quite inefficient. However, it is possible to compute the long
normal form of such a term by evaluating it in an appropriate model (cf. [2] and also [5]).
This makes it possible to use the built–in evaluation mechanism of e.g. Scheme (a Lisp
dialect, defined in [4]) to efficiently implement normalization.
We finally show that the requirement to give a normal derivation of a derivable formula can sometimes be unrealistic. Following Orevkov [16] we give examples of formulas Ck which are easily derivable with non–normal derivations (whose number of nodes is linear in k), but which require a non–elementary (in k) number of nodes in any normal derivation.

4.1 Beta–eta normal forms

For the arguments in this section it is convenient to use the following notation.

• By a term we mean a derivation term as well as an object term. M, N, K, . . . denote


terms in this sense. For simplicity we assume that object terms do not contain function
symbols.

• x, y, z . . . denote assumption variables as well as object variables.

• We identify terms differing only by the names of their bound variables.

• ρ, σ, τ . . . denote formulas as well as types, and ι denotes atomic formulas or ⊥ as well


as ground types. ρ × σ denotes product types as well as conjunctions, and ρ → σ
denotes function types as well as implications or universal formulas. → associates to
the right and × to the left. Furthermore × has a higher precedence than →.

We now define a conversion relation M →0 M′ between terms. Since we have allowed pairing and components, we have β– and η–conversion for → as well as for ×.

Definition. M →0 M′ is defined by

(λx M)N →0 M[N/x]. (β)
λx.Mx →0 M, if x ∉ FV(M) ∪ FA(M). (η)
πi⟨M0, M1⟩ →0 Mi for i ∈ {0, 1}. (β×,i)
⟨π0(M), π1(M)⟩ →0 M. (η×)

A term M is called β–convertible (η–convertible) if it has the form of a left hand side of (β) or (β×,i) ((η) or (η×)). Such terms are also called β–redexes or η–redexes (redex for reducible expression).

From →0 (also denoted by →0βη ) one derives a one–step reduction relation → (also
denoted by →βη ) as follows. Intuitively M → M ′ means that M ′ is obtained from M by
converting exactly one subterm.

Definition. M → M′ is defined inductively by

M →0 M′ =⇒ M → M′.
M → M′ =⇒ λx M → λx M′. (+)
M → M′ =⇒ MN → M′N. (−0)
N → N′ =⇒ MN → MN′. (−1)
M → M′ =⇒ ⟨M, N⟩ → ⟨M′, N⟩. (+×,0)
N → N′ =⇒ ⟨M, N⟩ → ⟨M, N′⟩. (+×,1)
M → M′ =⇒ πi(M) → πi(M′) for i = 0, 1. (−×,i)

Definition. A term M is βη–normal if M has no β– or η–convertible subterm.

Hence a term M is βη–normal if and only if there is no M′ with M → M′. We now show that → is terminating, i.e. that any reduction sequence starting with M terminates after finitely many steps. By a reduction sequence we mean a (finite or infinite) sequence

M1, M2, . . . , Mn, . . .

such that Mi+1 arises from Mi by a β– or η–conversion of a subterm, i.e. Mi → Mi+1. We write M →∗ M′ (or M →+ M′) if M′ is a member of a reduction sequence (a reduction sequence with at least two elements) starting with M. Hence →∗ is the reflexive transitive closure of →.
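The β–part of the reduction relation can be made concrete as a normalizer. This sketch works on the bare term skeleton (an assumed toy encoding), applies (β) and (β×,i) exhaustively, and omits the η–rules; the naive substitution assumes bound variables are distinct from the free variables of substituted terms, and termination is guaranteed only for terms coming from typed derivations.

```python
# Sketch of beta-normalization on an assumed toy term skeleton:
# ("var", x), ("lam", x, t), ("app", t, s), ("pair", t0, t1), ("proj", i, t).

def subst(r, t, x):
    tag = r[0]
    if tag == "var":  return t if r[1] == x else r
    if tag == "lam":  return r if r[1] == x else ("lam", r[1], subst(r[2], t, x))
    if tag == "proj": return ("proj", r[1], subst(r[2], t, x))
    return (tag, subst(r[1], t, x), subst(r[2], t, x))   # app, pair

def normalize(m):
    tag = m[0]
    if tag == "var":
        return m
    if tag == "lam":
        return ("lam", m[1], normalize(m[2]))
    if tag == "pair":
        return ("pair", normalize(m[1]), normalize(m[2]))
    if tag == "proj":
        t = normalize(m[2])
        if t[0] == "pair":                    # (beta_x,i): pi_i<t0, t1> -> t_i
            return t[1 + m[1]]
        return ("proj", m[1], t)
    # application
    f = normalize(m[1])
    if f[0] == "lam":                         # (beta): (lambda x M)N -> M[N/x]
        return normalize(subst(f[2], normalize(m[2]), f[1]))
    return ("app", f, normalize(m[2]))
```

For example, (λx x)y normalizes to y, and π0⟨a, b⟩ normalizes to a.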
For many of our later arguments it is useful to write M0 for π0(M) and M1 for π1(M).
To prove termination of → we make use of a method due to W.W. Tait and define so–called strong computability predicates. They are defined by induction on the type ρ of a term Mρ as follows.

• Mι is strongly computable if Mι is strongly normalizable, i.e. if any reduction sequence starting with Mι terminates after finitely many steps.

• Mρ→σ is strongly computable if for any strongly computable Nρ also (MN)σ is strongly computable.

• Mρ×σ is strongly computable if π0(M)ρ and π1(M)σ are strongly computable.

A term M is strongly computable under substitution if for any list ~N of strongly computable terms M[~N] is strongly computable.

Lemma 1. (i) Any strongly computable term Mρ is strongly normalizable.

(ii) Any variable xρ is strongly computable.

Proof by simultaneous induction on the type ρ. For ground types ι both claims are trivial.
Case ρ → σ. (i) Let Mρ→σ be strongly computable. By IH(ii) and the definition of strong computability (Mx)σ is strongly computable. Hence by IH(i) any reduction sequence starting with Mx terminates. Then clearly this also holds for M, since any subterm of a strongly normalizable term is also strongly normalizable. (ii) Let ~M be a list of strongly computable terms or 0 or 1 such that x~M is of ground type. We have to show that x~M is strongly computable, which for a ground type means the same as strongly normalizable. But this follows from IH(i), which says that any reduction sequence starting with some Mi terminates.
Case ρ × σ. (i) Let Mρ×σ be strongly computable. By definition e.g. π0(M)ρ is strongly computable and hence by IH(i) also strongly normalizable. This clearly also holds for M, since any subterm of a strongly normalizable term is also strongly normalizable. (ii) As in Case ρ → σ. □

Lemma 2. If M →∗ M′ and M is strongly computable, then so is M′.

Proof. Let ~N be a list of strongly computable terms or 0 or 1 such that M~N is of ground type. We have to show that M′~N is strongly computable, i.e. that any reduction sequence starting with M′~N terminates. But this clearly holds, since any such sequence can be extended to a reduction sequence starting with M~N. □

Lemma 3. Any term M is strongly computable under substitution.

Proof by induction on M.
Case x. The claim follows from Lemma 1(ii).
Case MN. Let ~K be strongly computable. We have to show that M[~K]N[~K] is strongly computable. But this clearly holds, since by IH M[~K] as well as N[~K] are strongly computable.
Case π0(M), π1(M). Similar.
Case λx M. Let ~K be strongly computable. We have to show that λx M[~K] is strongly computable. So let N be strongly computable and ~L a list of strongly computable terms or 0 or 1. We have to show that (λx M[~K])N~L is strongly computable, i.e. that any reduction sequence starting with it terminates. So assume that we have such a reduction sequence. M[~K], N and ~L are all strongly normalizable. Hence we can assume that in this reduction sequence there is a term (λx M[~K]′)N′~L′ with M[~K] →∗ M[~K]′, N →∗ N′ and ~L →∗ ~L′ to which in the next step a head conversion has been applied. This must be either a β– or an η–conversion. In both cases the result is

M[~K]′[N′]~L′.

Because of M[~K] →∗ M[~K]′ we also have M[N, ~K] →∗ M[~K]′[N′]. Since by IH M is strongly computable under substitution we know that M[N, ~K] is strongly computable, and hence by Lemma 2 also M[~K]′[N′]. Since again by Lemma 2 ~L′ is strongly computable we obtain that M[~K]′[N′]~L′ is strongly computable and hence strongly normalizable.
Case ⟨M, N⟩. Similarly. Let ~K be strongly computable. We have to show that ⟨M[~K], N[~K]⟩ is strongly computable. So let ~L be a list of strongly computable terms or 0 or 1. We have to show that ⟨M[~K], N[~K]⟩~L is strongly computable, i.e. that any reduction sequence starting with it terminates. So assume that we have such a reduction sequence. M[~K], N[~K] and ~L are all strongly normalizable. Hence we can assume that in this reduction sequence there is a term ⟨M[~K]′, N[~K]′⟩~L′ with M[~K] →∗ M[~K]′, N[~K] →∗ N[~K]′ and ~L →∗ ~L′ to which in the next step a head conversion has been applied. This must be either a β×,i– or an η×–conversion. In case it is a β×,i–conversion ~L′ must begin with i, hence ~L′ = i, ~M′ and ~L = i, ~M with ~M →∗ ~M′, and e.g. for i = 0 the head conversion yields

M[~K]′~M′.

But M[~K]′ and ~M′ are strongly computable by Lemma 2. Hence M[~K]′~M′ is strongly computable and therefore strongly normalizable. In case it is an η×–conversion M[~K]′ and N[~K]′ are of the form π0(P) and π1(P), and the head conversion yields

⟨π0(P), π1(P)⟩~L′ → P~L′.

By IH and Lemma 2 π0(P) and π1(P) are strongly computable, hence by definition also P. Since again by Lemma 2 ~L′ is strongly computable we obtain that P~L′ is strongly computable and hence strongly normalizable. □
From Lemma 3 and Lemma 1 we immediately obtain

Theorem. → is terminating, i.e. any term M is strongly normalizable. □

Clearly the proof just given remains correct if we leave out the η–conversion rules (η)
and (η× ). The corresponding normal form is called β–normal form.
This proof can be extended to terms involving primitive recursion operators (see
e.g. [30, 31, 21]), the general recursion or fixed point operator (see [18]) or the bounded
fixed point operator (see [26]).

4.2 Subformula property and Herbrand’s Theorem

We now want to draw some conclusions from the fact that any derivation can be brought
into β–normal form. For this purpose we have to analyse more closely the form of a normal
derivation. First we need some terminological preparations.
With p, q ∈ {0, 1}∗ we denote positions in a term M. Pos(M) is the set of all positions in M, and [] denotes the empty list, i.e. the root position. p0 and p1 are the extensions of the position p by 0 or 1. The subterm of M at position p is denoted by M/p, and M[N]p results from M by replacing the subterm at position p by N. We write p ≤ q if p is a prefix of q. p and q are called independent (written p ∥ q) if p ≰ q and q ≰ p. Furthermore let

Leafpos(M) := {p ∈ Pos(M) : M/p a variable or constant},
Elimpos(M) := {p ∈ Pos(M) : M/p formed by an elimination rule},
Intropos(M) := {p ∈ Pos(M) : M/p formed by an introduction rule}.
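Positions and subterms can be sketched directly on the term skeleton assumed earlier. The child-numbering convention below is an assumption made for illustration (the text only fixes that positions are words over {0, 1}):

```python
# Sketch of Pos(M) and the subterm M/p on an assumed toy term skeleton;
# positions are tuples of child indices, () being the root position [].

def children(m):
    tag = m[0]
    if tag in ("var", "const"): return []
    if tag == "lam":  return [m[2]]           # one subterm, at index 0
    if tag == "proj": return [m[2]]
    return [m[1], m[2]]                       # app, pair

def positions(m):
    pos = [()]
    for i, c in enumerate(children(m)):
        pos += [(i,) + q for q in positions(c)]
    return pos

def subterm(m, p):
    """The subterm M/p."""
    return m if p == () else subterm(children(m)[p[0]], p[1:])
```

For (λx x)y the positions are the root, the two children, and the body of the abstraction.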

For any p ∈ Elimpos(M) ∪ Leafpos(M) we define its associated minimal position by

minposM([]) := [].
minposM(pi) := minposM(p), if i = 0 and p ∈ Elimpos(M);
minposM(pi) := pi, if i = 1 and p ∈ Elimpos(M), or p ∈ Intropos(M).

Clearly minposM(p) ∈ Elimpos(M) ∪ Leafpos(M). Let

Minpos(M) := {p ∈ Elimpos(M) ∪ Leafpos(M) : minposM(p) = p}.

For any p ∈ Pos(M) we define its associated end position by

endposM([]) := [].
endposM(pi) := endposM(p), if p ∈ Intropos(M), or i = 0 and p ∈ Elimpos(M);
endposM(pi) := pi, if i = 1 and p ∈ Elimpos(M).

Let

Endpos(M) := {p ∈ Pos(M) : endposM(p) = p}.

For β–normal terms M the leaf positions and the minimal positions are in a bijective correspondence. Note however that two minimal positions can yield the same end position, as e.g. in

P     Q
------- ∧+
 P ∧ Q

Let M be a term and p ∈ Leafpos(M ). Then

{q ∈ Pos(M ) : p ≥ q ≥ endposM (p)}

is called the branch in M determined by the leaf position p. In particular, minposM (p) is
an element of the branch determined by p. In case endposM (p) = [] the branch is called a
main branch. In a β–normal term any branch p_1 ≥ · · · ≥ p_n has a particularly perspicuous
form: all elimination rules must come before all introduction rules.
In the case of a normal derivation term we want to draw from this some conclusions
on the formulas attached to the positions. To do this we need the following notions.
The relation “A is a subformula of B” is the reflexive transitive closure of the relation
“immediate subformula”, defined as follows.

(i) A and B are immediate subformulas of A → B,

(ii) A and B are immediate subformulas of A ∧ B,

(iii) A[t/x] is an immediate subformula of ∀x A.

Furthermore we need the notion “A is a strictly positive subformula of B”. This is to


be the reflexive transitive closure of the relation “immediate strictly positive subformula”,
defined as follows.

(i) B is an immediate strictly positive subformula of A → B,

(ii) A and B are immediate strictly positive subformulas of A ∧ B,

(iii) A[t/x] is an immediate strictly positive subformula of ∀x A.
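The closure relation just defined can be computed directly. Here is a hedged sketch (our own encoding of formulas, not the notes'): formulas are tuples ("imp", A, B), ("and", A, B), ("all", x, A); atoms are strings. Substitution into the body of a universal formula is ignored, so subformulas are computed "up to the substituted term", which suffices to show the closure shape.

```python
def spos_subformulas(f):
    """Strictly positive subformulas: reflexive transitive closure of the
    immediate-strictly-positive relation defined above."""
    out = {f}
    if isinstance(f, tuple):
        tag = f[0]
        if tag == "imp":                 # only the conclusion B is s.p. in A -> B
            out |= spos_subformulas(f[2])
        elif tag == "and":               # both components are s.p. in A & B
            out |= spos_subformulas(f[1]) | spos_subformulas(f[2])
        elif tag == "all":               # the body is s.p. in (all x)A
            out |= spos_subformulas(f[2])
    return out

f = ("imp", ("and", "P", "Q"), ("all", "x", "R"))   # (P & Q) -> forall x R
assert spos_subformulas(f) == {f, ("all", "x", "R"), "R"}
# "P" and "Q" are subformulas of f but not strictly positive ones:
assert "P" not in spos_subformulas(f)
```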

Now let d be a β–normal derivation term, p_1 ≥ · · · ≥ p_n a branch in d, p_m =
minpos_d(p_1) and A_i the type of d/p_i. Then A_m must be a strictly positive subformula of
all A_i with i ≠ m. This A_m is called the minimal formula of the branch. Furthermore any
A_i with i ≤ m is a strictly positive subformula of A_1, and any A_i with i ≥ m is a strictly
positive subformula of A_n.

Theorem (subformula property). If d^B[u_1^{A_1}, . . . , u_n^{A_n}] is a β–normal derivation and
e^C is a subderivation of d^B, then C is a subformula of B or of some A_i.

Proof by induction on the length of the end position of a branch in d. Here we need the
properties of branches just mentioned. 

Theorem (Herbrand). If ∀x⃗_1 A_1, . . . , ∀x⃗_m A_m ⊢ B with quantifier free A_1, . . . , A_m, B,
then one can find terms t⃗_11, . . . , t⃗_1n_1, . . . , t⃗_m1, . . . , t⃗_mn_m such that

A_1[t⃗_11], . . . , A_1[t⃗_1n_1], . . . , A_m[t⃗_m1], . . . , A_m[t⃗_mn_m] ⊢ B.



Proof. To simplify the notation we only treat the case ∀x A ⊢ B with quantifier free A, B.
From the given derivation we can construct a normal derivation d^B[u^{∀x A}]. By induction
on the length of the end position of a branch it can be shown easily that any branch ends
with a derivation of a quantifier free formula and begins with the rule ∀−, i.e. with u^{∀x A} t_i,
or with a bound assumption variable v^C with quantifier free C. Now replace in the first
case the subderivation u^{∀x A} t_i by v_i^{A[t_i]} with a new assumption variable v_i. 
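A small worked instance (our example, not from the notes) may help fix the picture. Take A := P(x) → P(s(x)) and the quantifier free goal B := P(0) → P(s(s(0))); a normal derivation of B from u^{∀x A} uses the assumption only at the instances t_1 = 0 and t_2 = s(0), so Herbrand's Theorem yields:

```latex
% Instances extracted from a normal derivation of \forall x\,A \vdash B,
% where A = P(x) \to P(s(x)) and B = P(0) \to P(s(s(0))):
\[
  P(0) \to P(s(0)),\; P(s(0)) \to P(s(s(0)))
  \;\vdash\; P(0) \to P(s(s(0))).
\]
```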

4.3 Eta–expansion

We now define an η–expansion relation →η –exp between terms. M →η –exp M ′ is to mean


that M ′ can be obtained from M by η–expansion of a subterm N , i.e. by replacing N
by λx.N x or by hπ0 (N ), π1 (N )i respectively, but only if no new β–convertible subterm is
created. This can be ensured if η–expansions are allowed at minimal positions only. For
instance a variable F of type (ρ → ρ) → ρ can be η–expanded as follows:

F (ρ→ρ)→ρ →η –exp λf ρ→ρ .F f →η –exp λf.F (λx.f x).

Definition. M →η–exp M′ iff there is a p ∈ Minpos(M) such that the type of N := M/p
is composed and in case typ(N) = ρ → σ we have

M′ = M[λx^ρ.N x]_p

with x ∉ FV(N), and in case typ(N) = ρ × σ we have

M′ = M[⟨π_0(N), π_1(N)⟩]_p.

Note that in case M →η –exp M ′ the term M ′ again is β–normal if M is.
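For a neutral term the fully η-expanded form can be computed by recursion on the type, exactly as in the example for F above. The following is a hedged Python sketch (our names and encoding; products omitted, and the depth-indexed fresh names are an assumption that is adequate for expanding a single neutral term):

```python
def eta_expand(term, typ, depth=0):
    """Fully eta-expand a neutral term at the given type (arrow types only).
    Types are "iota" or ("arrow", dom, cod); terms are strings,
    ("lam", x, M) or ("app", M, N)."""
    if typ == "iota":
        return term
    _, dom, cod = typ
    x = f"x{depth}"                            # fresh enough along one spine
    # expand the bound variable at the domain type before applying the term
    body = ("app", term, eta_expand(x, dom, depth + 1))
    return ("lam", x, eta_expand(body, cod, depth + 1))

rho = "iota"
t = ("arrow", ("arrow", rho, rho), rho)        # (rho -> rho) -> rho
# F  ~~>  lam x0. F (lam x1. x0 x1), matching the expansion of F shown above
assert eta_expand("F", t) == \
    ("lam", "x0", ("app", "F", ("lam", "x1", ("app", "x0", "x1"))))
```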

Definition. A term M is in long normal form if it has no β–convertible or η–expandable


subterm.

Write M →βη –exp M ′ if M ′ is obtained from M by β–conversion or (context depen-


dent) η–expansion of exactly one subterm, i.e. if M →β M ′ or M →η –exp M ′ , where
M →β M ′ denotes the one step relation defined using β–conversion rules only. Following
Di Cosmo and Kesner [6],[7] we now prove that also →βη –exp is terminating. A different
proof based on an extension of Tait’s method of computability predicates (and using ideas
from Prawitz [19]) has been given by Dougherty [8], pages 142–146.
Note that the proof above cannot be extended to this situation, for the following
reason. In Lemma 2 we have shown that from M →∗ M′ and strong computability of M
we can infer strong computability of M′. In the proof we have made use of the fact that any
reduction sequence starting with M′N⃗ can be extended to a reduction sequence starting
with M N⃗. But this does not hold any more, since η–expansion is context dependent.

The key idea of Di Cosmo’s and Kesner’s proof is the observation that η–expansion
can be simulated by β–conversion:
(λy^{ρ→σ} λx^ρ.yx)M →β λx.M x    if x ∉ FV(M),
(λx^{ρ×σ}⟨π_0(x), π_1(x)⟩)M →β ⟨π_0(M), π_1(M)⟩.

More precisely, for any type ρ we define an η–expansor ∆_ρ by

∆_ι := λx^ι.x^ι,
∆_{ρ→σ} := λy^{ρ→σ} λx^ρ.∆_σ(y ∆_ρ x),
∆_{ρ×σ} := λx^{ρ×σ}.⟨∆_ρ π_0(x), ∆_σ π_1(x)⟩.

Note that ∆_ρ is a closed term of type ρ → ρ. We write ∆^k_ρ M for ∆_ρ(∆_ρ(. . . (∆_ρ M) . . .))
with k occurrences of ∆_ρ. The operator ∆^k is to have a higher precedence than application,
so ∆^k M N means (∆^k M)N and ∆^k M 0 means (∆^k M)0, i.e. π_0(∆^k M). We write ∆^k M⃗ for
∆^k M_1 . . . ∆^k M_n.
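As a small worked computation (ours, not from the notes), at the lowest arrow type the expansor indeed reproduces the η-expansion of a variable by β-steps alone:

```latex
\begin{align*}
  \Delta_{\iota\to\iota}\,F
    &= (\lambda y^{\iota\to\iota}\lambda x^{\iota}.\,
        \Delta_\iota(y\,(\Delta_\iota x)))\,F\\
    &\to_\beta \lambda x.\,\Delta_\iota(F\,(\Delta_\iota x))\\
    &\to_\beta^{*} \lambda x.\,F x
      \qquad\text{since } \Delta_\iota = \lambda x^{\iota}.x .
\end{align*}
```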

Lemma.

∆^k M N⃗ →∗_β ∆^k(M ∆^k N⃗),                          (1)
∆^k_{ρ→σ}(λx M) →∗_β λx.∆^k_σ(M[∆^k_ρ x/x]),         (2)
∆^k_{ρ×σ}⟨M, N⟩ →∗_β ⟨∆^k_ρ M, ∆^k_σ N⟩.             (3)

Proof. Note that in (1) N⃗ is meant to be a list of terms or 0 or 1.
(1). We first assume that N⃗ has length 1. So we have to show

∆^k_{ρ→σ} M N →∗_β ∆^k_σ(M ∆^k_ρ N),                 (1.1)
π_0(∆^k_{ρ×σ} M) →∗_β ∆^k_ρ(π_0(M)),                 (1.2)
π_1(∆^k_{ρ×σ} M) →∗_β ∆^k_σ(π_1(M)).                 (1.3)

We use induction on k. The case k = 0 is trivial, and in the induction step we obtain for
(1.1)

∆^{k+1}_{ρ→σ} M N = ∆_{ρ→σ}(∆^k_{ρ→σ} M)N
  →∗_β ∆_σ(∆^k_{ρ→σ} M ∆_ρ N)          by definition of ∆_{ρ→σ}
  →∗_β ∆_σ(∆^k_σ(M ∆^k_ρ(∆_ρ N)))      by IH
  = ∆^{k+1}_σ(M ∆^{k+1}_ρ N)

and e.g. for (1.3)

π_1(∆^{k+1}_{ρ×σ} M) = π_1(∆_{ρ×σ}(∆^k_{ρ×σ} M))
  →∗_β ∆_σ(π_1(∆^k_{ρ×σ} M))           by definition of ∆_{ρ×σ}
  →∗_β ∆_σ(∆^k_σ(π_1(M)))              by IH
  = ∆^{k+1}_σ(π_1(M)).

We now prove (1) for an arbitrary N⃗, by induction on the length of N⃗. For empty N⃗ this
is trivial, and for the induction step we obtain

∆^k_{ρ→σ} M N N⃗ →∗_β ∆^k_σ(M ∆^k_ρ N)N⃗ →∗_β ∆^k(M ∆^k_ρ N ∆^k N⃗)     by IH.

(2). We again use induction on k. The case k = 0 is trivial, and for the induction
step we obtain

∆^{k+1}_{ρ→σ}(λx M) = ∆_{ρ→σ}(∆^k_{ρ→σ}(λx M))
  →∗_β λx.∆_σ(∆^k_{ρ→σ}(λx M)∆_ρ x)                by definition of ∆_{ρ→σ}
  →∗_β λx.∆_σ(∆^k_σ(M[∆^k_ρ x/x])[∆_ρ x/x])        by IH
  = λx.∆^{k+1}_σ(M[∆^{k+1}_ρ x/x]).

(3). Similar. 
By a ∆–expansion of M we mean a term M^∆ obtained from M by replacing simultaneously
every η–expandable subterm N of type ρ in M by ∆^k_ρ N for some k > 0.
By an inner ∆–expansion of M we mean a term M^{i∆} obtained from M by replacing
simultaneously every η–expandable subterm N of type ρ in M except the term M itself by
∆^k_ρ N for some k > 0.
By a strong ∆–expansion of M we mean a term M^{s∆} obtained from M by replacing
simultaneously every η–expandable subterm N of type ρ in M by ∆^k_ρ N for some k > 0
and every other subterm N of type ρ in M by ∆^k_ρ N for some k ≥ 0.

Lemma. Let M^{s∆} be a strong ∆–expansion of M. Then we can find a ∆–expansion M^∆
of M such that M^{s∆} β–reduces to M^∆.

Proof by induction on M, using the previous Lemma.
Case λx M. Any (λx M)^{s∆} has the form ∆^k(λx M^{s∆}) with k ≥ 0. By the previous
Lemma ∆^k(λx M^{s∆}) β–reduces to λx.∆^k(M^{s∆}[∆^k x/x]). By IH(M) we can find a ∆–
expansion M^∆ of M such that ∆^k(M^{s∆}[∆^k x/x]) β–reduces to M^∆. Let (λx M)^∆ :=
λx M^∆.
Case ⟨M, N⟩. Similar.
Case M N⃗, M an introduction and N⃗ not empty. Any (M N⃗)^{s∆} must have the form
∆^k(∆^m M^{s∆} N⃗^{s∆}) with k > 0, m ≥ 0 and M^{s∆} an introduction. By the previous Lemma
∆^m M^{s∆} N⃗^{s∆} β–reduces to ∆^m(M^{s∆} ∆^m N⃗^{s∆}). By IH(M) we can find a ∆–expansion M^∆
of M such that M^{s∆} β–reduces to M^∆. By IH(N_i) we can find a ∆–expansion N_i^∆ of N_i
such that ∆^m N_i^{s∆} β–reduces to N_i^∆. Let (M N⃗)^∆ := ∆^{k+m}(M^∆ N⃗^∆).
Case xN⃗. Any (xN⃗)^{s∆} has the form ∆^k(∆^m x N⃗^{s∆}) with k > 0 and m ≥ 0. By the
previous Lemma ∆^m x N⃗^{s∆} β–reduces to ∆^m(x(∆^m N⃗^{s∆})). By IH(N_i) we can find a ∆–
expansion N_i^∆ of N_i such that ∆^m N_i^{s∆} β–reduces to N_i^∆. Let (xN⃗)^∆ := ∆^{k+m}(x N⃗^∆). 

For the proof of the following theorem we need an inductive definition of η–expansion,
which can be given as follows.

Definition. M →0_{η–exp} M′ is defined by

M →0_{η–exp} λx.M x              if M is not of the form λy N,
M →0_{η–exp} ⟨π_0(M), π_1(M)⟩    if M is not a pair.

From →0η−exp one derives a one–step η–expansion relation →η –exp as follows.

Definition. M →η–exp M′ is defined inductively by

M →0_{η–exp} M′ =⇒ M →η–exp M′.
M →η–exp M′ =⇒ (λx M)K⃗ →η–exp (λx M′)K⃗.
M →η–exp M′ =⇒ ⟨M, N⟩K⃗ →η–exp ⟨M′, N⟩K⃗.
N →η–exp N′ =⇒ ⟨M, N⟩K⃗ →η–exp ⟨M, N′⟩K⃗.
N →η–exp N′ =⇒ M N K⃗ →η–exp M N′ K⃗.

Note that a similar definition of →β can be given, as follows:

Definition. M →β M′ is defined inductively by

M →0_β M′ =⇒ M N⃗ →β M′ N⃗.
M →β M′ =⇒ (λx M)K⃗ →β (λx M′)K⃗.
M →β M′ =⇒ ⟨M, N⟩K⃗ →β ⟨M′, N⟩K⃗.
N →β N′ =⇒ ⟨M, N⟩K⃗ →β ⟨M, N′⟩K⃗.
N →β N′ =⇒ M N K⃗ →β M N′ K⃗.

This remark will be used in the proof of the theorem below.

Theorem. If M →βη –exp N , then for any ∆–expansion M ∆ of M we can find a ∆–


expansion N ∆ of N such that N ∆ can be obtained from M ∆ by a non–empty sequence of
β–reductions.

Proof by induction on the definition of M →βη –exp N . Recall →βη –exp =→β ∪ →η –exp .
We only treat the cases concerning types ρ → σ; the cases concerning product types ρ × σ
can be dealt with in a similar fashion.
Case (λx M)N N⃗ →β M[N/x]N⃗. Any ∆–expansion of (λx M)N N⃗ must be of the
form ∆^k((λx M^∆)N^∆ N⃗^∆) for some k > 0 with ∆–expansions M^∆ of M, N^∆ of N and
N⃗^∆ of N⃗. One β–reduction yields ∆^k(M^∆[N^∆/x]N⃗^∆). Since this term is a strong ∆–
expansion of M[N/x]N⃗ we can β–reduce it by the previous Lemma to a ∆–expansion of
M[N/x]N⃗.
Case M →0_{η–exp} λx.M x. Then M is not a λ–expression. Hence any ∆–expansion
of M must be of the form ∆^{k+1} M^{i∆} with an inner ∆–expansion M^{i∆} of M. From
∆ = λyλx.∆(y∆x) we obtain immediately

∆^{k+1} M^{i∆} = ∆^k(∆M^{i∆}) →β ∆^k λx.∆(M^{i∆} ∆x).

Since this term is a strong ∆–expansion of λx.M x we can β–reduce it by the previous
Lemma to a ∆–expansion of λx.M x.
In the following cases let → denote either →η –exp or else →β .
Case (λx M)K⃗ → (λx M′)K⃗. Any ∆–expansion of (λx M)K⃗ must be of the form
∆^k((λx M^∆)K⃗^∆) for some k > 0 with ∆–expansions M^∆ of M and K⃗^∆ of K⃗. By IH
we can find a ∆–expansion M′^∆ of M′ such that M′^∆ can be obtained from M^∆ by a
non–empty sequence of β–reductions. Let ((λx M′)K⃗)^∆ := ∆^k((λx M′^∆)K⃗^∆).
Case xM⃗ N K⃗ → xM⃗ N′ K⃗. Then any ∆–expansion of xM⃗ N K⃗ must be of the form
∆^k(xM⃗^∆ N^∆ K⃗^∆) for some k > 0 with ∆–expansions M⃗^∆ of M⃗, N^∆ of N and K⃗^∆ of K⃗.
By IH we can find a ∆–expansion N′^∆ of N′ such that N′^∆ can be obtained from N^∆ by
a non–empty sequence of β–reductions. Let (xM⃗ N′ K⃗)^∆ := ∆^k(xM⃗^∆ N′^∆ K⃗^∆).
Case (λx M)M⃗ N K⃗ → (λx M)M⃗ N′ K⃗. Any ∆–expansion of (λx M)M⃗ N K⃗ must be of
the form ∆^k((λx M^∆)M⃗^∆ N^∆ K⃗^∆) for some k > 0 with ∆–expansions M^∆ of M, M⃗^∆ of
M⃗, N^∆ of N and K⃗^∆ of K⃗. By IH we can find a ∆–expansion N′^∆ of N′ such that N′^∆ can
be obtained from N^∆ by a non–empty sequence of β–reductions. Let ((λx M)M⃗ N′ K⃗)^∆ :=
∆^k((λx M^∆)M⃗^∆ N′^∆ K⃗^∆). 

Corollary. →βη –exp is terminating. 

4.4 Minimal from classical proofs

We now use the long normal form of derivation terms to show that any classical proof
of ⊥ from so–called generalized (definite) Horn formulas can be converted into a proof in
minimal logic. Moreover we describe a reasonable algorithm to do this conversion; here we
follow [22]. This result can be used to prove completeness of SLD–Resolution.
A formula is called Horn formula if it has the form ∀x1 , . . . , xn .A1 → . . . → Am → B
with Ai and B atomic. It is called definite Horn formula if in addition we have B 6= ⊥. If
instead of atomic Ai we allow universally quantified atomic formulas, the result is called a
generalized (definite) Horn formula.

Theorem 1. Let A_1, . . . , A_n be generalized Horn formulas. We have a quadratic algorithm
transforming a classical proof in long normal form of ⊥ from A_1, . . . , A_n into a proof in
minimal logic of ⊥ from the same assumptions.

Proof by induction on the total number of stability axioms used. Note first that bound
assumption variables u in the given normal proof can only occur in the context

Stab_R^{∀x⃗.¬¬R(x⃗)→R(x⃗)} t⃗ (λu d)

with u of type ¬R(t⃗) and d of type ⊥. The reason for this is that all top formulas different
from stability axioms are generalized Horn formulas which never have an implication in
the premise of another implication.
Case 1. There is at least one occurrence of a bound assumption variable in the
proof. Since we assume our proof to be in long normal form, any of the occurrences of
an assumption variable u of type ¬R(~t) must be the main premise of an →–elimination,
i.e. must be in a context ud1 where d1 derives R(~t). Now choose an uppermost occurrence
of a bound assumption variable, i.e. a subderivation ud1 where d1 does not contain an
occurrence of any bound assumption variable. Since d1 derives R(~t), we can replace the
whole subderivation StabR (~t)(λu d) of R(~t) (the one where u is bound) by d1 . Hence we
have removed one occurrence of a stability axiom.
Case 2. Otherwise. If there are no more stability axioms in the proof, we are
done. If not, choose an uppermost occurrence of a stability axiom, i.e. a subderivation
StabR (~t)(λu d) where d does not contain stability axioms. Since we are in case 2 here d
also cannot contain free assumption variables which are bound elsewhere in the proof. But
since d derives ⊥, we can replace the whole proof (which also has ⊥ as its end formula) by
d and hence we are done again. 
Note that Theorem 1 is best possible in the sense that it becomes false if we allow an
implication in the body of one of the Horn formulas. A counterexample (due to U. Berger)
is
((P → Q) → ⊥) → (P → ⊥) → ⊥,
which is provable in classical but not in minimal logic. For if it were, we could replace
⊥ in this proof (which in minimal logic is just another propositional variable) by P , and
hence we would obtain a proof in minimal logic of the Peirce formula

((P → Q) → P ) → P,

which is known to be underivable.


By essentially the same argument we obtain the following variant of Theorem 1 for
generalized definite Horn formulas:

Theorem 2. Let A1 , . . . , An be generalized definite Horn formulas. We have a quadratic


algorithm transforming a classical proof in long normal form of an atomic formula B from
A1 , . . . , An into a proof in minimal logic of B from the same assumptions.

Proof by a simple modification of the argument for Theorem 1. Note that in case 2
it cannot happen that stability axioms occur in the proof since then we would have a
derivation d of ⊥ from definite Horn formulas, which is clearly impossible. 

4.5 Uniqueness of normal forms

We now show the uniqueness of normal forms. More precisely, we show that the one–step
relations →βη and →βη –exp are confluent and hence that the βη normal form as well as
the long normal form are uniquely determined. This follows from the local confluence and
termination of these relations by the Lemma of Newman [15].
Remark : If we leave out the conversion rule

⟨π_0(M), π_1(M)⟩ →0 M                    (η_×)

(expressing surjectivity of pairing), then there is an elegant alternative method to prove


confluence of →βη , which does not use termination. The idea of this proof goes back
to J.B. Rosser and W.W. Tait and uses a parallel reduction relation →par satisfying
→βη ⊆→par ⊆→∗βη . For this relation the diamond property

M →par M ′ , M →par M ′′ =⇒ ∃M ′′′ .M ′ →par M ′′′ , M ′′ →par M ′′′

can be proved. For →βη instead of →par this property does not hold. The reason that it
holds for →par is that →par is compatible with substitution in the following sense: we have

M →par M ′ , N →par N ′ =⇒ M [N ] →par M ′ [N ′ ].

An exposition can be found in [25].


Let → be the one–step relation →βη . For the proof of local confluence of → we need
the following fact concerning compatibility of → with substitution.

Lemma. (i) M → M ′ =⇒ M [N ] → M ′ [N ].
(ii) N → N ′ =⇒ M [N ] →∗ M [N ′ ]. 

Note that (ii) with → instead of →∗ does not hold. The reason is that the variable x
to be substituted can have multiple occurrences in M , while the relation M → M ′ means
that M ′ is obtained from M by conversion at exactly one position.

Lemma. →βη is locally confluent, i.e.

M → M ′ , M → M ′′ =⇒ ∃M ′′′ .M ′ →∗ M ′′′ , M ′′ →∗ M ′′′ .

Proof by induction on M . If M → M ′ and M → M ′′ are obtained by the same rule


the claim follows immediately from the IH. The remaining cases split into two groups,

depending on whether M is an abstraction or an application, or a pair or a component,


respectively. We start with the first group of cases.
Case −0 , −1 .
MN → M ′N
↓ ↓
MN′ → M ′N ′

Case −0 +, β.
(λx M )N → (λx M ′ )N
↓ ↓
M [N ] → M ′ [N ]

Here we have used M → M ′ =⇒ M [N ] → M ′ [N ].


Case −0 η, β.
(λx.M x)N → MN
↓ =
MN = MN

Case −1 , β.
(λx M )N → (λx M )N ′
↓ ↓
M [N ] →∗ M [N ′ ]

Here we have used N → N ′ =⇒ M [N ] →∗ M [N ′ ].


Case +−0 , η.
λx.M x → λx.M ′ x
↓ ↓
M → M′

Case +−1 , η. This case cannot occur, since there is no N such that x → N .
Case +β, η.
λx.(λx M )x → λx M
↓ =
λx M = λx M

We now treat the group of cases concerning pairs and components. We leave out the
index ×.

Case −0 , −1 .
hM, N i → hM ′ , N i
↓ ↓
hM, N ′ i → hM ′ , N ′ i

Case −0 +0 , β0 (and similarly −1 +1 , β1 ).

π0 hM0 , M1 i → π0 hM0′ , M1 i
↓ ↓
M0 → M0′

Case −0 +1 , β0 (and similarly −1 +0 , β1 ).

π0 hM0 , M1 i → π0 hM0 , M1′ i


↓ ↓
M0 = M0

Case −i η, βi .
πi hπ0 (M ), π1 (M )i → πi (M )
↓ =
πi (M ) = πi (M )

Case +0 −0 , η (and similarly +1 −1 , η).

hπ0 (M ), π1 (M )i → hπ0 (M ′ ), π1 (M )i
↓ ↓∗
M → M′

Case +0 β0 , η (and similarly +1 β1 , η).

hπ0 hM0 , M1 i, π1 hM0 , M1 ii → hM0 , π1 hM0 , M1 ii


↓ ↓
hM0 , M1 i = hM0 , M1 i



Lemma of Newman. If → is locally confluent and terminating, then → is confluent, i.e.

M →∗ M ′ , M →∗ M ′′ =⇒ ∃M ′′′ .M ′ →∗ M ′′′ , M ′′ →∗ M ′′′ .

Proof . A term M is called good if it satisfies the property of confluence, i.e. if for all
M ′ , M ′′ we have

M →∗ M ′ , M →∗ M ′′ =⇒ ∃M ′′′ .M ′ →∗ M ′′′ , M ′′ →∗ M ′′′ ;

otherwise M is called ambiguous. Now assume that an ambiguous term M exists. We


show that then there is an ambiguous term N such that M →+ N . This contradicts our
assumption that → is terminating.
So let M be ambiguous, and M ′ , M ′′ be terms satisfying M →∗ M ′ and M →∗ M ′′
and such that there is no term M ′′′ satisfying M ′ →∗ M ′′′ and M ′′ →∗ M ′′′ . We then have
M1 , M2 such that M → M1 →∗ M ′ and M → M2 →∗ M ′′ . From the local confluence of
→ we obtain an M3 such that

M → M1 →∗ M′
↓ ↓∗
M2 →∗ M3
↓∗
M ′′

Now if M1 , M2 , M3 were all good, we would have terms N, K, L such that

M → M1 →∗ M′
↓ ↓∗ ↓∗
M2 →∗ M3 →∗ N
↓∗ ↓∗ ↓∗
M ′′ →∗ K →∗ L

But this contradicts our assumption that M is ambiguous. 
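The content of Newman's Lemma can be watched in miniature on a finite abstract reduction system (our toy example, unrelated to the term calculus): the system below is terminating and locally confluent, so every element has exactly one normal form.

```python
# A finite one-step reduction relation: a -> b, a -> c, b -> d, c -> d.
# It is terminating and locally confluent (the peak b <- a -> c rejoins at d),
# hence confluent by Newman's Lemma.
STEP = {
    "a": {"b", "c"},
    "b": {"d"},
    "c": {"d"},
    "d": set(),
}

def normal_forms(x):
    """All normal forms reachable from x by ->* (exhaustive search)."""
    reducts = STEP.get(x, set())
    if not reducts:
        return {x}
    return set().union(*(normal_forms(y) for y in reducts))

# Confluence of a terminating relation = unique normal forms everywhere.
assert normal_forms("a") == {"d"}
assert all(len(normal_forms(x)) == 1 for x in STEP)
```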

Corollary (Church–Rosser). M =βη N ⇐⇒ ∃K.M →∗ K, N →∗ K.

Proof : ⇐=. Clear. =⇒. Induction on the definition of M =βη N . 


Now let → be the one–step relation →βη –exp . For the proof of local confluence of →
we need the following fact concerning compatibility of → with substitution.

Lemma. (i) M → M ′ =⇒ M [N ] → M ′ [N ].
(ii) N → N ′ =⇒ ∃K.M [N ], M [N ′] →∗ K.

Proof. (ii) If N → N ′ is not a head η–expansion we have M [N ] →∗ M [N ′ ] and are done.


Otherwise, i.e. if N ′ = λz.N z or N ′ = hπ0 (N ), π1 (N )i, we generally have M [N ] 6→∗ M [N ′ ],
since η–expansion is context dependent. But in this case we can find a K such that
M [N ], M [N ′] →∗ K. To see this consider the occurrences of x in M . If x is in a context xL
with L term or 0 or 1 replace xL by N L; otherwise replace x by λz.N z or hπ0 (N ), π1 (N )i,
respectively. 

Lemma. → is locally confluent, i.e.

M → M ′ , M → M ′′ =⇒ ∃M ′′′ .M ′ →∗ M ′′′ , M ′′ →∗ M ′′′ .

Proof. We argue as before in the case →βη , by induction on M . If M → M ′ and M → M ′′


are obtained by the same rule or M → M ′ or M → M ′′ are formed by head η–expansion,
the claim follows immediately from the IH. The remaining cases split into two groups,
depending on whether M is an abstraction or an application, or a pair or a component,
respectively. We start with the first group of cases.
Case −0 , −1 .
MN → M ′N
↓ ↓
MN′ → M ′N ′

Case −0 +, β.
(λx M )N → (λx M ′ )N
↓ ↓
M [N ] → M ′ [N ]

Here we have used M → M ′ =⇒ M [N ] → M ′ [N ].


Case −1 , β.
(λx M )N → (λx M )N ′
↓ ↓∗
M [N ] →∗ K

Here we have used N → N ′ =⇒ ∃K.M [N ], M [N ′] →∗ K.


We now treat the group of cases concerning pairs and components. We leave out the
index ×.

Case −0 , −1 .
hM, N i → hM ′ , N i
↓ ↓
hM, N ′ i → hM ′ , N ′ i

Case −0 +0 , β0 (and similarly −1 +1 , β1 ).

π0 hM0 , M1 i → π0 hM0′ , M1 i
↓ ↓
M0 → M0′

Case −0 +1 , β0 (and similarly −1 +0 , β1 ).

π0 hM0 , M1 i → π0 hM0 , M1′ i


↓ ↓
M0 = M0

4.6 Normalization by evaluation

We now show that normalization can be achieved by evaluation, following [2]. We make use
of the fact proved above that any term M has a unique long normal form which we now
denote by lnf(M). The following properties of the long normal form will be used.

M =βη lnf(M),                                                    (1)
M =βη M′ =⇒ lnf(M) = lnf(M′),                                    (2)
lnf(M x) = N =⇒ lnf(M) = λx N if x ∉ FV(M),                      (3)
lnf(M 0) = N_0 and lnf(M 1) = N_1 =⇒ lnf(M) = ⟨N_0, N_1⟩,        (4)
lnf(N⃗) = K⃗ and xN⃗ of ground type =⇒ lnf(xN⃗) = xK⃗.              (5)

A term M is in long normal form if lnf(M) = M. Let

T_ρ = the set of all terms of type ρ,
L_ρ = the set of all terms of type ρ in long normal form,
A_ρ = the set of all terms of type ρ of the form xN⃗ with N⃗ in long normal form.

We define a model (Dρ )ρ of our language as follows. For any ground type ι let Dι be
the set Lι of all terms of type ι in long normal form. Furthermore let D ρ→σ be the set of
all functions f : Dρ → Dσ and Dρ×σ be the cartesian product Dρ × Dσ .

For any type ρ we define functions ϕ_ρ : D_ρ → L_ρ and ψ_ρ : A_ρ → D_ρ by simultaneous
recursion on ρ. To see that ϕ_ρ and ψ_ρ are well–defined we need the properties (1)–(5) of
the long normal form listed above.

ϕ_ι(M) = M,
ϕ_{ρ→σ}(f) = λx.ϕ_σ(f(ψ_ρ(x))),   x a new variable,
ϕ_{ρ×σ}([a, b]) = ⟨ϕ_ρ(a), ϕ_σ(b)⟩

and

ψ_ι(xM⃗) = xM⃗,
ψ_{ρ→σ}(xM⃗)(a) = ψ_σ(xM⃗ ϕ_ρ(a)),
ψ_{ρ×σ}(xM⃗) = [ψ_ρ(xM⃗ 0), ψ_σ(xM⃗ 1)].

This definition of ψ_ρ can be condensed into

ψ_ρ(xM⃗)(a⃗) = xM⃗ ϕ(a⃗).

Similar to Tait’s notion of strong computability we define for any type ρ a relation Rρ
on Tρ × Dρ by
Rι (M, a) ⇐⇒ lnf(M ) = a,
Rρ→σ (M, f ) ⇐⇒ ∀N, a.Rρ (N, a) =⇒ Rσ (M N, f (a)),
Rρ×σ (M, [a, b]) ⇐⇒ Rρ (M 0, a) and Rσ (M 1, b).
Note that M =βη M ′ implies R(M, a) ⇐⇒ R(M ′ , a); this follows from (2) by induction
on M .

Lemma 1. (i) R_ρ(M, a) =⇒ lnf(M) = ϕ_ρ(a).
(ii) M⃗ in long normal form =⇒ R_ρ(xM⃗, ψ_ρ(xM⃗)).

Proof . Induction on ρ. Case ι. Clear.


Case ρ → σ, (i). Assume Rρ→σ (M, f ). We must show lnf(M ) = λx ϕσ (f (ψρ (x))).
By (3) it suffices to show that lnf(M x) = ϕσ (f (ψρ (x))). By IHσ (i) it suffices to show
Rσ (M x, f (ψρ(x))). By definition of Rρ→σ (M, f ) it suffices to show Rρ (x, ψρ (x)). But this
holds by IHρ (ii).
Case ρ × σ, (i). Assume Rρ×σ (M, [a, b]), hence Rρ (M 0, a) and Rσ (M 1, b). We must
show lnf(M ) = hϕρ (a), ϕσ (b)i. By (4) it suffices to show lnf(M 0) = ϕρ (a) and lnf(M 1) =
ϕσ (b). But this follows from IHρ (i) and IHσ (i).
Case (ii). Let M⃗ be in long normal form. We must show R_ρ(xM⃗, ψ_ρ(xM⃗)). So let N⃗,
a⃗ with R(N⃗, a⃗) be given. We must show lnf(xM⃗ N⃗) = xM⃗ ϕ(a⃗). By (5) it suffices to show
lnf(N⃗) = ϕ(a⃗). But this follows from IH(i). 
Since in our model Dρ→σ is the set of all functions f : Dρ → Dσ and Dρ×σ is the
cartesian product Dρ × Dσ we can define the value [[M ]]U ∈ Dρ for any term M of type ρ
and any environment U in the standard way (cf. §1).

Lemma 2. Assume R(c, [[c]]) for any constant c. Then for any term M[x⃗] of type ρ and
any environment U

R(K⃗, U(x⃗)) =⇒ R(M[K⃗/x⃗], [[M]]_U).

Proof by induction on M. Case x_i. Clear. Case c. By assumption.
Case λx M. Let N, a be given such that R_ρ(N, a). We must show

R_σ((λx M[K⃗/x⃗])N, [[λx M]]_U(a)).

Since by the remark above R_σ is compatible with =βη it suffices to show

R_σ(M[N, K⃗/x, x⃗], [[M]]_{U^a_x}).

But this holds by IH for M.
Case M N. We must show

R((M N)[K⃗/x⃗], [[M N]]_U).

By definition of R it suffices to show R(M[K⃗/x⃗], [[M]]_U) and R(N[K⃗/x⃗], [[N]]_U). But this
holds by IH for M and N.
Case ⟨M, N⟩. We must show R_ρ(M[K⃗/x⃗], [[M]]_U) and R_σ(N[K⃗/x⃗], [[N]]_U). But this
holds by IH for M and N.
Case M i. We must show R((M i)[K⃗/x⃗], [[M i]]_U). By definition of R it suffices to show
R(M[K⃗/x⃗], [[M]]_U). But this holds by IH for M. 
Now consider the special environment Uψ (xσ ) := ψσ (x). We write [[M ]]ψ for [[M ]]Uψ .
Then we have R(x, Uψ (x)) by Lemma 1(ii) and hence by Lemma 2 R(M, [[M ]] ψ ) for any
term M . Hence by Lemma 1(i) ϕρ ([[M ]]ψ ) is the long normal form of M .
Remark . This observation leads to an efficient implementation of normalization for
terms with free variables. The ϕρ and ψρ can be computed easily, and one can use the
built–in evaluation mechanism of Scheme to provide the value [[M ]] ψ of a term M in the
environment Uψ .
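The remark can be made concrete in any language with first-class functions; below is a small sketch in Python rather than Scheme (our code, not the notes' implementation: arrow types only, no products or constants, and the names reify/reflect play the roles of ϕ/ψ above). Semantic values of arrow type are Python functions, exactly as D_{ρ→σ} is a function space in the model.

```python
import itertools

def reify(typ, val, fresh):
    """phi: read a semantic value back as a term in long normal form."""
    if typ == "iota":
        return val                       # ground-type values are terms already
    _, dom, cod = typ                    # ("arrow", dom, cod)
    x = f"v{next(fresh)}"
    return ("lam", x, reify(cod, val(reflect(dom, x, fresh)), fresh))

def reflect(typ, neutral, fresh):
    """psi: embed a neutral term x M1 ... Mn as a semantic value."""
    if typ == "iota":
        return neutral
    _, dom, cod = typ
    return lambda a: reflect(cod, ("app", neutral, reify(dom, a, fresh)), fresh)

def evaluate(term, env):
    """[[M]]_U: the standard semantics, Python closures interpret lambdas."""
    if isinstance(term, str):
        return env[term]
    if term[0] == "lam":
        _, x, body = term
        return lambda a: evaluate(body, {**env, x: a})
    _, fun, arg = term
    return evaluate(fun, env)(evaluate(arg, env))

def lnf(term, typ, free=None):
    """Long normal form: evaluate in the environment U_psi, then reify."""
    fresh = itertools.count()
    env = {x: reflect(t, x, fresh) for x, t in (free or {}).items()}
    return reify(typ, evaluate(term, env), fresh)

arrow = lambda a, b: ("arrow", a, b)
# beta: (lam f. lam y. f y)(lam z. z) normalizes to lam v0. v0
t1 = ("app", ("lam", "f", ("lam", "y", ("app", "f", "y"))), ("lam", "z", "z"))
assert lnf(t1, arrow("iota", "iota")) == ("lam", "v0", "v0")
# eta: a free variable F of arrow type gets its long form lam v0. F v0
assert lnf("F", arrow("iota", "iota"), {"F": arrow("iota", "iota")}) == \
    ("lam", "v0", ("app", "F", "v0"))
```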

4.7 Normal versus non–normal derivations

We now show that the requirement to give a normal derivation of a derivable formula can
sometimes be unrealistic. Following Orevkov [16] we give examples of formulas C_k which
are easily derivable with non–normal derivations (whose number of nodes is linear in k),
but which require a non–elementary (in k) number of nodes in any normal derivation.
The example is related to Gentzen's proof in [12] of transfinite induction up to ω_k in
arithmetic; see e.g. [20] for an exposition. There the function y ⊕ ω^x plays a crucial role,
and also the assignment of a "lifting"–formula A^+ to any formula A, by

A^+ := ∀y.(∀z ≺ y)A[z/x] → (∀z ≺ y ⊕ ω^x)A[z/x].

Here we consider the numerical function y + 2^x instead, and axiomatize its graph by
means of Horn clauses. The formula C_k expresses that from these axioms the existence of
2_k (the k–times iterated exponential of 2) follows. A short, non–normal proof of this fact
can then be given by a modification of Gentzen's idea, and it is easily seen that any normal
proof of C_k must contain at least 2_k nodes.
The derivations to be given make heavy use of the existential quantifier ∃ defined by
¬∀¬. In particular we need:

Existence–Introduction–Lemma. ⊢ A → ∃x A.

Proof. A derivation term is

λu^A λv^{∀x¬A}.vxu. 

Existence–Elimination–Lemma. ⊢ (¬¬B → B) → ∃x A → (∀x.A → B) → B if
x ∉ FV(B).

Proof. A derivation term is

λu^{¬¬B→B} λv^{¬∀x¬A} λw^{∀x.A→B}.u(λu_2^{¬B}.v(λx λu_1^A.u_2(wxu_1))). 

Note that the stability assumption ¬¬B → B is not needed if B does not contain an
atom 6= ⊥ as a strictly positive subformula. This will be the case for the derivations below,
where B will always be an existential formula.
Let us now fix our language. We use a ternary relation symbol R to represent the
graph of the function y + 2^x; so R(y, x, z) is intended to mean y + 2^x = z. We now
axiomatize R by means of Horn clauses. For simplicity we use a unary function symbol
s (to be viewed as the successor function) and a constant 0; one could use logic without
function symbols instead — as Orevkov does —, but this makes the formulas somewhat
less readable and the proofs less perspicuous.

Hyp1 : ∀y R(y, 0, s(y))


Hyp2 : ∀y, x, z, z1.R(y, x, z) → R(z, x, z1 ) → R(y, s(x), z1)

The goal formula then is

Ck := ∃zk , . . . , z0 .R(0, 0, zk ) ∧ R(0, zk , zk−1 ) ∧ . . . ∧ R(0, z1 , z0 ).
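Read operationally, Hyp1 and Hyp2 form a tiny logic program computing y + 2^x; the following sketch (ours, not from the notes) mirrors how a derivation of R(y, x, z) is built up from the two clauses:

```python
def derive_R(y, x):
    """Return the unique z with R(y, x, z) derivable from Hyp1 and Hyp2,
    i.e. z = y + 2**x."""
    if x == 0:
        return y + 1                # Hyp1: R(y, 0, s(y))
    z = derive_R(y, x - 1)          # Hyp2: from R(y, x, z) and R(z, x, z1)
    return derive_R(z, x - 1)       #       conclude R(y, s(x), z1)

assert derive_R(0, 3) == 8          # 0 + 2**3
assert all(derive_R(y, x) == y + 2 ** x for y in range(5) for x in range(5))
```

Note that the recursion duplicates work exactly as a normal derivation duplicates subderivations; this is the source of the exponential blow-up below.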

To obtain the short proof of the goal formula Ck we use formulas Ai with a free
parameter x; for ease in reading we write A[r] instead of A[r/x].

A0 := ∀y∃z R(y, x, z),


Ai+1 := ∀y.Ai [y] → ∃z.Ai [z] ∧ R(y, x, z).

Lemma. ⊢ Hyp1 → Hyp2 → Ai [0].

Proof . We give an informal argument, which can easily be converted into a formal proof.
Note that the Existence–Elimination–Lemma is used only with existential formulas as
conclusions. Hence it is not necessary to use stability axioms and we have a derivation in
minimal logic.
Case i = 0. Obvious by Hyp1 .
Case i = 1. Let x with A0 [x] be given. It is sufficient to show A0 [s(x)], that is
∀y∃z1 R(y, s(x), z1). So let y be given. We know

A0 [x] = ∀y∃z R(y, x, z). (1)

Applying (1) to our y gives z such that R(y, x, z). Applying (1) again to this z gives z1
such that R(z, x, z1 ). By Hyp2 we obtain R(y, s(x), z1).
Case i + 2. Let x with Ai+1 [x] be given. It suffices to show Ai+1 [s(x)], that is
∀y.Ai [y] → ∃z.Ai [z] ∧ R(y, s(x), z). So let y with Ai [y] be given. We know

Ai+1 [x] = ∀y.Ai [y] → ∃z1 .Ai [z1 ] ∧ R(y, x, z1). (2)

Applying (2) to our y gives z such that Ai [z] and R(y, x, z). Applying (2) again to this z
gives z1 such that Ai [z1 ] and R(z, x, z1 ). By Hyp2 we obtain R(y, s(x), z1). 
Remark . Note that the derivations given have a fixed length, independent of i.

Lemma. ⊢ Hyp1 → Hyp2 → Ck .

Proof . We give an informal argument, which can easily be converted into a formal proof.
Again the Existence–Elimination–Lemma is used only with existential formulas as conclu-
sions, and hence we have a derivation in minimal logic.
Ak [0] applied to 0 and Ak−1 [0] yields zk with Ak−1 [zk ] and R(0, 0, zk ).
Ak−1 [zk ] applied to 0 and Ak−2 [0] yields zk−1 with Ak−2 [zk−1 ] and R(0, zk , zk−1 ).
A1 [z2 ] applied to 0 and A0 [0] yields z1 with A0 [z1 ] and R(0, z2 , z1 ).
A0 [z1 ] applied to 0 yields z0 with R(0, z1 , z0 ). 
Remark . Note that the derivations given have length linear in k.
We want to compare the length of this derivation of Ck with the length of an arbitrary
normal derivation.

Lemma. Any normal derivation of Ck from Hyp1 and Hyp2 has at least 2k nodes.

Proof . Let a normal derivation d of falsity ⊥ from Hyp1 , Hyp2 and the additional hypo-
thesis
u: ∀zk , . . . , z0 .R(0, 0, zk ) → R(0, zk , zk−1 ) → . . . → R(0, z1 , z0 ) → ⊥

be given. We may assume that d does not contain free object variables (otherwise substitute
them by 0). The main branch of d must begin with u, and its side premises are all of the
form R(0, s^n(0), s^k(0)).
Observe that any normal derivation of R(s^m(0), s^n(0), s^k(0)) from Hyp_1, Hyp_2 and u
has at least 2^n occurrences of Hyp_1 and is such that k = m + 2^n. This can be seen easily
by induction on n. Note also that such a derivation cannot involve u.
If we apply this observation to the above derivations of the side premises we see that
they derive

R(0, 0, s^{2_0}(0)), R(0, s^{2_0}(0), s^{2_1}(0)), . . . , R(0, s^{2_{k−1}}(0), s^{2_k}(0)),

where 2_0 := 2^0 = 1 and 2_{i+1} := 2^{2_i}. The last of these derivations uses Hyp_1 at
least 2^{2_{k−1}} = 2_k times. 
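Numerically, the witnesses demanded by C_k are iterated exponentials; a quick computation (ours) of z_k, . . . , z_0:

```python
def witnesses(k):
    """[z_k, ..., z_0] as forced by R(0,0,z_k), R(0,z_k,z_{k-1}), ...:
    z_k = 0 + 2**0 and each further step forces z_i = 0 + 2**z_{i+1}."""
    zs = [1]
    for _ in range(k):
        zs.append(2 ** zs[-1])
    return zs

assert witnesses(3) == [1, 2, 4, 16]
# so a normal derivation of C3 already needs at least 2**4 = 16 occurrences
# of Hyp1 in the derivation of its last side premise
```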


5 The strong existential quantifier

We now extend our language L by a strong existential quantifier written ∃∗ (as opposed
to ∃ defined by ¬∀¬). There are two approaches to deal with formulas containing ∃∗ in a
constructive setting, e.g. in minimal or intuitionistic logic.

• A formula containing ∃∗ is considered not to be an entity the deduction system can deal
with: some “realizing terms” are required to turn it into a “judgement” (this terminol-
ogy is due to Weyl [34] and has been taken up by Martin–Löf). E.g. r realizes ∃∗ x A
is a judgement, which can be translated into A[r/x].

• (Heyting) The logic is extended by axioms expressing the intended meaning of the
strong existential quantifier.

We will treat both approaches here. At first sight, Weyl’s point of view is more convincing.
However, Heyting’s is more prominent in the literature, and we also need it to properly
discuss Friedman’s A–translation.
Let us first describe Heyting’s approach. Here we extend our notion of an L–formula
by adding a clause

• If A is a formula, then ∃∗ xρ A is a formula.

In the inductive definition of derivation terms dA in minimal logic and their sets FA(dA )
of free assumptions we have to add two more clauses:

(∃*+) If A is a formula and x is a variable of type ρ, then

    ∃*+_{x,A} : ∀∀x.A → ∃*x A

is a derivation term, where ∀C denotes the universal closure of C, and FA(∃*+_{x,A}) = ∅.

(∃*−) If A, B are formulas and x is a variable of type ρ such that x ∉ FV(B), then

    ∃*−_{x,A,B} : ∀.(∃*x^ρ A) → (∀x^ρ.A → B) → B

is a derivation term with FA(∃*−_{x,A,B}) = ∅.

Clearly FV(∃*+_{x,A}) = FV(∃*−_{x,A,B}) = ∅.


For these new derivation terms we have the following conversion rule:
 
    ∃*− ~t (∃*+ ~t t d^{A[t/x]}) e  ↦  e t d.        (∃*)

It can be shown that any derivation term has a unique normal form with respect to βη∃∗ –
conversion.
An alternative (in fact more usual) way to introduce the strong existential quantifier
∃∗ into our natural deduction calculus is to use rules instead of axiom schemata. These
rules have been formulated by Gentzen [11], as follows.

(∃*+)
         |
       A[t]
    ----------- ∃*+
      ∃*x A

(∃*−)
                  [A]
         |         |
      ∃*x A        B
    -------------------- ∃*−
             B

provided x does not occur free in B nor in any open assumption of the given derivation of B apart from the assumption A shown.

It is easy to show that both calculi are equivalent.


Note that the calculus with ∃∗ –rules can also be viewed as an inductive definition of
a set of derivation terms, as in §2. We just have to add two clauses:

(∃*+) If d^{A[t]} is a derivation term, then

    ∃*+(t, d^{A[t]})^{∃*x A}

is a derivation term with FA(∃*+(t, d^{A[t]})) = FA(d^{A[t]}).


(∃*−) If d^{∃*x^ρ A} and e^B are derivation terms, u^A is an assumption variable and x ∉ FV(B) ∪ ⋃{FV(C) : v^C ∈ FA(e^B) \ {u^A}}, then

    ∃*−(d^{∃*x^ρ A}, x^ρ, u^A, e^B)^B

is a derivation term with FA(∃*−(d, x, u, e)) = FA(d) ∪ (FA(e) \ {u}). The variables x and u are viewed as bound by ∃*−(d, x, u, e).

Also the definition of the set FV(d) of free (object) variables has to be extended by

FV(∃∗+ (t, d)) := FV(d),


FV(∃∗− (d, x, u, e)) := FV(d) ∪ (FV(e) \ {x}).

Similarly one can also introduce a strong disjunction into our natural deduction cal-
culus, written ∨∗ (as opposed to ∨ defined by A ∨ B := ¬(¬A ∧ ¬B)). We again extend
our notion of an L–formula by adding a clause

• If A, B are formulas, then A ∨∗ B is a formula.

In the inductive definition of derivation terms dA in minimal logic and their sets FA(dA )
of free assumptions we have to add three more clauses:

(∨*+_0) If A, B are formulas, then

    ∨*+_{A,B,0} : ∀.A → A ∨* B

is a derivation term, and FA(∨*+_{A,B,0}) = ∅.

(∨*+_1) If A, B are formulas, then

    ∨*+_{A,B,1} : ∀.B → A ∨* B

is a derivation term, and FA(∨*+_{A,B,1}) = ∅.

(∨*−) If A, B, C are formulas, then

    ∨*−_{A,B,C} : ∀.A ∨* B → (A → C) → (B → C) → C

is a derivation term with FA(∨*−_{A,B,C}) = ∅.

Clearly FV(∨*+_{A,B,i}) = FV(∨*−_{A,B,C}) = ∅.
An alternative (and again more usual) way to introduce strong disjunction ∨∗ into
our natural deduction calculus is to use rules instead of axiom schemata. These rules have
been formulated by Gentzen [11], as follows.

(∨*+_0)
        |
        A
    ----------- ∨*+_0
     A ∨* B

(∨*+_1)
        |
        B
    ----------- ∨*+_1
     A ∨* B

(∨*−)
                [A]    [B]
         |       |      |
      A ∨* B     C      C
    -------------------------- ∨*−
              C

It is easy to show that both calculi are equivalent.


Note that the calculus with ∨∗ –rules can again be viewed as an inductive definition
of a set of derivation terms, as in §2. We just have to add three clauses:

(∨*+_0) If d^A is a derivation term, then

    ∨*+_0(d^A)^{A ∨* B}

is a derivation term with FA(∨*+_0(d^A)) = FA(d^A).

(∨*+_1) If e^B is a derivation term, then

    ∨*+_1(e^B)^{A ∨* B}

is a derivation term with FA(∨*+_1(e^B)) = FA(e^B).

(∨*−) If d^{A ∨* B} and e^C, f^C are derivation terms and u^A, v^B are assumption variables, then

    ∨*−(d^{A ∨* B}, u^A, e^C, v^B, f^C)^C

is a derivation term with FA(∨*−(d, u, e, v, f)) = FA(d) ∪ (FA(e) \ {u}) ∪ (FA(f) \ {v}). Again the variables u, v are viewed as bound by ∨*−(d, u, e, v, f).

Also the definition of the set FV(d) of free (object) variables has to be extended by

    FV(∨*+_0(d^A)^{A ∨* B}) := FV(d) ∪ FV(B),
    FV(∨*+_1(e^B)^{A ∨* B}) := FV(e) ∪ FV(A),
    FV(∨*−(d, u, e, v, f)) := FV(d) ∪ FV(e) ∪ FV(f).

Note that one can easily extend the Ex–falso–quodlibet Lemma to the present situation
and prove ⊥ → A for an arbitrary formula A. In the cases ∃∗ x A and A ∨∗ B just use the
corresponding introduction axiom (or rule).
As an application of normalization we obtain the so–called existence and disjunction
properties of minimal and intuitionistic logic. To formulate it we introduce the notion of
an instance of a formula possibly involving ∃*, ∨*. It is obtained by recursively replacing strictly positive subformulas ∃*x A by A[t/x] and A ∨* B by A or B. More precisely, we
give the following inductive definition.

Definition. The relation “A′ is an instance of A” is defined inductively by the following


clauses.

(i) A is an instance of A if A does not contain an ∃∗ – or ∨∗ –formula as strictly positive


subformula.

(ii) If B ′ is an instance of B, then A → B ′ is an instance of A → B.

(iii) If A′ is an instance of A and B ′ is an instance of B, then A′ ∧ B ′ is an instance of


A ∧ B.

(iv) If A′ is an instance of A, then ∀x A′ is an instance of ∀x A.

(v) If A′ is an instance of A, then A′ is an instance of A ∨∗ B. If B ′ is an instance of B,


then B ′ is an instance of A ∨∗ B.

(vi) If A′ is an instance of A[t], then A′ is an instance of ∃∗ x A.

Note that if A′ is an instance of A, then ⊢ A′ → A.

Theorem. Let d[u1 : A1 , . . . , un : An ]: B where A1 , . . . , An do not contain an ∃∗ – or ∨∗ –


formula as strictly positive subformula. Then we can find an instance B ′ of B and a
derivation d′ [u1 : A1 , . . . , un : An ]: B ′ .

Proof by an analysis of the normal form of d. 


We can also show that the logic with ∃* and ∨* is conservative over the fragment without these connectives. This can either be proved using the modified realizability interpretation to be introduced in the next section, or else directly by an analysis of the normal form of the derivation.
6 Realizing terms

Let us now describe Weyl’s approach. We restrict ourselves to formulas without ∨∗ , since
in the presence of a ground type of booleans we can define A ∨∗ B by

A ∨∗ B := ∃∗ p.(p = true → A) ∧ (p = false → B).

We define judgements to be expressions of the form

    r_1^{ρ_1}, . . . , r_m^{ρ_m} mr A

(to be read r_1^{ρ_1}, . . . , r_m^{ρ_m} modified realizes A), where A is a formula built from atomic formulas using →, ∧, ∀ and ∃*, and ρ_1, . . . , ρ_m = τ(A) is a list of types associated with A, defined as follows.

    τ(R(~t)) := ε,
where ε denotes the empty list, and if τ(A) = ~ρ and τ(B) = σ_1, . . . , σ_n we let

    τ(A → B) := ~ρ → σ_1, . . . , ~ρ → σ_n,
    τ(A ∧ B) := ~ρ, ~σ,
    τ(∀x^ρ B) := ρ → σ_1, . . . , ρ → σ_n,
    τ(∃*x^ρ B) := ρ, ~σ.

Instead of ~ρ → σ_1, . . . , ~ρ → σ_n we will sometimes write ~ρ → ~σ. To give some examples, let n, m, k be of type nat. Then

    τ(∀n ∃*m R(n, m)) = nat → nat,
    τ(∀n ∃*m ∃*k R(n, m, k)) = (nat → nat), (nat → nat),
    τ(∀n ∃*m R(n, m) → ∃*k Q(k)) = (nat → nat) → nat.

Note that τ (A) = ε iff A is a Harrop formula (i.e. contains ∃∗ in premises of → only).
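The clauses for τ above can be rendered directly as a small recursive function. The following Python sketch is ours, not part of the notes: the tuple-based formula representation ('atom',), ('imp', A, B), ('and', A, B), ('all', ρ, B), ('ex', ρ, B) is a hypothetical encoding, and an arrow type ~ρ → σ is built by folding from the right.

```python
# Hypothetical formula representation (our choice, not from the notes):
# ('atom',), ('imp', A, B), ('and', A, B), ('all', rho, B), ('ex', rho, B).
def arrow(rhos, sigma):
    # rhos -> sigma, i.e. rho_1 -> ... -> rho_n -> sigma, folded from the right
    for rho in reversed(rhos):
        sigma = ('->', rho, sigma)
    return sigma

def tau(a):
    tag = a[0]
    if tag == 'atom':                        # tau(R(t)) = empty list
        return []
    if tag == 'imp':                         # rhos -> sigma_j for each sigma_j in tau(B)
        rhos = tau(a[1])
        return [arrow(rhos, s) for s in tau(a[2])]
    if tag == 'and':                         # concatenation of the two lists
        return tau(a[1]) + tau(a[2])
    if tag == 'all':                         # rho -> sigma_j
        return [('->', a[1], s) for s in tau(a[2])]
    if tag == 'ex':                          # rho followed by tau(B)
        return [a[1]] + tau(a[2])
    raise ValueError(tag)

# tau(forall n exists* m R(n, m)) = nat -> nat
ex_m = ('ex', 'nat', ('atom',))
print(tau(('all', 'nat', ex_m)))             # [('->', 'nat', 'nat')]
```

Running it on the three examples above reproduces the type lists given in the text.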
For any judgement we now define its modified realizability interpretation, i.e. its trans-
lation in our ∃∗ –free language.

ε mr R(~t) := R(~t),
r1 , . . . , rn mr (A → B) := ∀~x.~x mr A → r1 ~x, . . . , rn~x mr B,
~r, ~s mr (A ∧ B) := ~r mr A ∧ ~s mr B,
r1 , . . . , rn mr ∀xρ B := ∀xρ r1 x, . . . , rn x mr B,
r, ~s mr ∃∗ xρ B := ~s mr B[r/x].

Note that for Harrop formulas A we have ε mr A ≡ A iff A does not contain ∃∗ .
Let us now set up a relation between the implicit and the explicit approach to deal
with the existential quantifier.
Definition. Assume that to any assumption variable u^B we have assigned a list ~x_u^{τ(B)} = x_1^{ρ_1}, . . . , x_n^{ρ_n} of distinct variables, where ρ_1, . . . , ρ_n = τ(B). Relative to this assignment we define for any derivation d^A its extracted terms ets(d^A), by induction on d^A. If τ(A) = σ_1, . . . , σ_k, then ets(d^A) will be a list r_1^{σ_1}, . . . , r_k^{σ_k}.

    ets(u^A) = ~x_u^{τ(A)},
    ets(λu^A d^B) = λ~x_u^{τ(A)} ets(d),
    ets(d^{A→B} e^A) = ets(d) ets(e),
    ets(⟨d^A, e^B⟩) = ets(d^A), ets(e^B),
    ets(π_0(d^{A∧B})) = the head of ets(d^{A∧B}) of same length as τ(A),
    ets(π_1(d^{A∧B})) = the tail of ets(d^{A∧B}) of same length as τ(B),
    ets(λx^ρ d^A) = λx^ρ ets(d),
    ets(d^{∀x^ρ A} t^ρ) = ets(d) t,
    ets(∃*+_{x,A}) = λ~x λx λ~y. x, ~y,
    ets(∃*−_{x,A,B}) = λ~x λx λ~y λz_1 . . . λz_n. z_1 x ~y, . . . , z_n x ~y.

Note that if ets(d) = r_1, . . . , r_k and ets(e) = ~s, then ets(d) ets(e) = r_1 ~s, . . . , r_k ~s and λ~x ets(d) = λ~x r_1, . . . , λ~x r_k. In the last clause the (omitted) types are

    x^ρ,   (~y)^{τ(A)}   and   z_j^{ρ → τ(A) → σ_j},

where τ(B) = σ_1, . . . , σ_n.
The following can be proved easily.
Lemma. FV(ets(d)) ⊆ FV(d) ∪ {~x_u^{τ(A)} : u^A ∈ FA(d)}.

Lemma. We have ets(d[t/x]) = ets(d)[t/x] and ets(d[e/u]) = ets(d)[ets(e)/~xu ], hence

d =β∃∗ e =⇒ ets(d) =β ets(e)


d =η e =⇒ ets(d) =η ets(e) 

Hence we can safely identify terms with the same βη∃∗ –normal forms.

Soundness Theorem. Assume that to any assumption variable u^A we have assigned a list ~x_u^{τ(A)} and a new assumption variable ũ: ~x_u^{τ(A)} mr A. Relative to this assignment we can find for any derivation d: A with FA(d) = {u_1: A_1, . . . , u_n: A_n} a derivation

    μ(d): ets(d) mr A

with FA(μ(d)) = {ũ_1: ~x_1 mr A_1, . . . , ũ_n: ~x_n mr A_n}.

Proof by induction on d.

Case ui : Ai . Then ũi : ~xui mr Ai .


Case λuB dA . Let τ (B) = ρ~ and τ (A) = σ1 , . . . , σm . Then we have τ (B → A) = ρ~ →
σ1 , . . . , ~ρ → σm . We look for a derivation of

λ~xρ~ ets(dA ) mr (B → A),

i.e. of
∀~xρ~ .~x mr B → ets(dA ) mr A,
since (λ~xρ~ ets(dA ))~x =β ets(dA ) and terms with the same β–normal form are identified.
Hence we can take
µ(λuB dA ) := λ~xλũ~x mr B µ(d).

Case dA→B eA . By IH we have

µ(d): ets(d) mr (A → B),


µ(e): ets(e) mr A.

By definition the first of these means

µ(d): ∀~x.~x mr A → ets(d)~x mr B.

Hence we have
µ(d)ets(e)µ(e): ets(d)ets(e) mr B
and since ets(de) = ets(d)ets(e) we can take µ(de) := µ(d)ets(e)µ(e).
Case hdA , eB i. By IH we have

µ(d): ets(d) mr A,
µ(e): ets(e) mr B,

hence

hµ(d), µ(e)i: ets(d) mr A ∧ ets(e) mr B.

Now since ets(d) mr A ∧ ets(e) mr B ≡ ets(d), ets(e) mr (A ∧ B) ≡ ets(hd, ei) mr (A ∧ B) it


suffices to define µ(hd, ei) := hµ(d), µ(e)i.
Case π0 (dA∧B ). By IH we have

µ(d): ets(d) mr (A ∧ B).

By the definition of modified realizability µ(d) proves a conjunction. Let ℓ be the length
of τ (A) and headℓ (ets(d)) the head of ets(d) of length ℓ. Then

π0 (µ(d)): headℓ (ets(d)) mr A.



So it suffices to define µ(π0 (d)) := π0 (µ(d)).


Case π1 (dA∧B ). Similar.
Case λxρ dA . We have to find a derivation of
λxρ ets(d) mr ∀xρ A,
i.e. of (since we identify terms with the same β–normal forms)
∀xρ ets(d) mr A.
By IH we have µ(d): ets(d) mr A with free assumptions not involving x. Hence we can take
µ(λxρ d) := λxρ µ(d).
Case d^{∀x^ρ A} t^ρ. By IH we have
µ(d): ets(d) mr ∀x A,
i.e.
µ(d): ∀x ets(d)x mr A.
Hence µ(d)t: ets(d)t mr A[t/x] and we can take µ(dt) := µ(d)t.
Case ∃*+_{x,A}. We look for a derivation μ(∃*+_{x,A}) of

    (λ~x λx λ~y. x, ~y) mr ∀~x ∀x^ρ. A → ∃*x^ρ A,
    ∀~x ∀x^ρ (λ~y. x, ~y) mr (A → ∃*x^ρ A),
    ∀~x ∀x^ρ ∀~y. ~y mr A → x, ~y mr ∃*x^ρ A,
    ∀~x ∀x^ρ ∀~y. ~y mr A → ~y mr A.

Hence we can take μ(∃*+_{x,A}) := λ~x λx λ~y λu u where u: ~y mr A.
Case ∃*−_{x,A,B}. Recall that

    ∃*−_{x,A,B} : ∀~x. ∃*x A → (∀x. A → B) → B

with x ∉ FV(B). We look for a derivation μ(∃*−_{x,A,B}) of

    (λ~x λx λ~y λz_1 . . . λz_n. z_1 x ~y, . . . , z_n x ~y) mr ∀~x. ∃*x A → (∀x. A → B) → B,
    ∀~x (λx λ~y λz_1 . . . λz_n. z_1 x ~y, . . . , z_n x ~y) mr (∃*x A → (∀x. A → B) → B),
    ∀~x ∀x ∀~y. x, ~y mr ∃*x A → (λz_1 . . . λz_n. z_1 x ~y, . . . , z_n x ~y) mr ((∀x. A → B) → B),
    ∀~x ∀x ∀~y. ~y mr A → ∀~z. ~z mr (∀x. A → B) → z_1 x ~y, . . . , z_n x ~y mr B.

Now since modulo β–equivalence ~z mr (∀x. A → B) is the same as

    ∀x ∀~y. ~y mr A → z_1 x ~y, . . . , z_n x ~y mr B,

we can easily derive this formula by

    μ(∃*−_{x,A,B}) := λ~x λx λ~y λu λ~z λv. v x ~y u,

where u: ~y mr A and v: ∀x ∀~y. ~y mr A → z_1 x ~y, . . . , z_n x ~y mr B. □
Remark: Mints has already shown in [14] that for any d: ∃*x A with FA(d) = FV(d) = ∅ the extracted terms ets(d) reduce to the terms that can be read off from the long normal form of d: ∃*x A. This has later been generalized by Stein in his thesis [28].
7 Arithmetic

Let us now extend these considerations to arithmetic. We allow constants for primitive
recursive functionals of arbitrary types (i.e. terms of Gödel’s T ), identifying terms with
the same normal form (w.r.t. the usual conversion rules for Gödel’s T ). It is assumed
that at least the ground types nat of natural numbers and boole of booleans are present.
We restrict ourselves to decidable atomic formulas; it is convenient to represent them by
boolean terms, i.e. in the form atom(t boole ) where atom is a distinguished relation symbol.
We could equally well take equations r = s with r, s terms of type nat as the only boolean
formulas. We do not need ⊥ as an extra atomic formula, since it can be defined by
⊥ := atom(false). Let us use n, m as variables of type nat and p, q as variables of type
boole. Our induction schemata are the universal closures of

A[0/n] → (∀n.A → A[n + 1/n]) → ∀n A,


A[true/p] → A[false/p] → ∀p A.

We can use boolean induction (i.e. case analysis) to prove stability ¬¬atom(p) → atom(p)
and from this we can as before conclude the stability ¬¬A → A of formulas A built from
atoms by →, ∧ and ∀. As already remarked, strong disjunction ∨∗ can be defined by
means of the strong existential quantifier ∃∗ .
Let us now carry out this program. First we extend the notion of a term by adding
the clauses

– The constants 0^nat, S^{nat→nat}, true^boole and false^boole are terms.

– For each type ρ the constants

Rnat,ρ : ρ → (nat → ρ → ρ) → nat → ρ,


Rboole,ρ : ρ → ρ → boole → ρ

are terms.

We add the following conversion rules (writing t + 1 for St).

Rnat,ρ rs0 →0 r,
Rnat,ρ rs(t + 1) →0 st(Rnat,ρ rst)

and

Rboole,ρ rs true →0 r,
Rboole,ρ rs false →0 s.
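To get a feel for these conversion rules, here is a minimal Python sketch of the two recursors, with numerals and booleans modelled by Python ints and bools. This rendering and the `double` example are ours, not part of the notes.

```python
def rec_nat(r, s, t):
    # R_nat r s 0       -> r
    # R_nat r s (t + 1) -> s t (R_nat r s t)
    if t == 0:
        return r
    return s(t - 1, rec_nat(r, s, t - 1))

def rec_boole(r, s, b):
    # R_boole r s true -> r ;  R_boole r s false -> s
    return r if b else s

# doubling defined by recursion: double 0 = 0, double (t+1) = S(S(double t))
double = lambda t: rec_nat(0, lambda _, prev: prev + 2, t)
print(double(5))   # 10
```

The step function receives both the recursion index and the previously computed value, exactly as in the rule for R_{nat,ρ} r s (t + 1).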

We now show that addition of these conversion rules for R does not destroy the
property of our reduction relation to be terminating.
There are many treatments of this problem in the literature. Troelstra in [30], p. 107–
108 gives a proof which either uses König’s Lemma (or the fan theorem) or else the Church–
Rosser Theorem for →. A proof which works for closed terms only (but in a general setting
where bounded fixed point operators are allowed) is given in [26]. The short proof below
is due to Ulrich Berger; it works for arbitrary arithmetical terms and does not use the
Church–Rosser Theorem.

Theorem. → is terminating.

Proof. We extend the argument in §4, which was based on Tait's strong computability predicates. Let us write SC(r) to mean that r is strongly computable, and SN(r) to mean that r is strongly normalizable. It remains to be shown that the constants R_{nat,ρ} and
Rboole,ρ are strongly computable. We restrict ourselves to Rnat,ρ ; for Rboole,ρ the argument
is similar (and simpler). So we have to show that for any terms r, s, t, ~t

SC(r, s, t, ~t) =⇒ SN(Rrst~t), (1)

since for ground types SC and SN coincide. So assume SC(r, s, t, ~t). We prove (1) by
induction on triples
(h(r, s, t), #t, h(~t))
ordered lexicographically, where #r is the length of r and h(r1 , . . . , rn ) is the sum of the
heights of the reductions trees for r1 , . . . , rn (assuming SN(r1 , . . . , rn )); here we have to
use König’s Lemma. It clearly suffices to show

∀t′ .Rrst~t → t′ =⇒ SN(t′ ).

If the reduction takes place within a subterm r, s, t, ~t the assigned triple gets smaller and the
claim follows by IH (here we need that strong computability is preserved under conversion
steps; cf. Lemma 2 in §4). Since the case t ≡ 0, t′ ≡ r~t is trivial it suffices to consider the
case t ≡ t0 + 1, t′ ≡ st0 (Rrst0 )~t. For all ~u with SC(~u) we have

(h(r, s, t0 ), #t0 , h(~u)) < (h(r, s, t), #t, h(~t)).

Hence by IH for any terms ~u

SC(~u) =⇒ SN(Rrst0~u),

since clearly SN(t0 ). Therefore SC(Rrst0 ) and we obtain SN(t′ ). 


Now as in §4 we can conclude via Newman’s Lemma that the normal form is uniquely
determined. For simplicity we identify terms with the same βηR–normal form. Hence any
closed term of type nat is identified with a term of the form S(S(S . . . (S0) . . .)) and any

closed term of type boole is identified with either true or false. Such terms are denoted by
n and called numbers (even if they are of type boole).
Let us consider some examples of arithmetical terms. Addition n + m can be defined
easily by recursion on m; note that the parameter n remains fixed, since n + (Sm) =
S(n + m). Let
+ := λn, m.Rnat,nat n(λx, y.Sy)m.
Then we have

+n0 → Rn(λx, y.Sy)0 → n,


+n(Sm) → Rn(λx, y.Sy)(Sm)
→ (λx, y.Sy)m(Rn(λx, y.Sy)m)
→ S(Rn(λx, y.Sy)m)
=β S(+nm).

Similarly one can define the predecessor pred and the zero-test zero?: nat → boole. Equal-
ity n = m presents a slight problem, since in a definition by recursion on n the parameter
m has to be changed: we must define Sn = Sm by n = m. Therefore we represent
=: nat → nat → boole as the function which maps n to λm.n = m. More formally,

equal? := λn, m.Rnat,nat→boole f gnm

with

f := zero?,
g := λn, h, m.Rboole,boole false (h(pred m))(zero? m).

Then we have

equal? 00 → Rf g00 → zero? 0 → true,


equal? 0(Sm) → Rf g0(Sm) → zero?(Sm) → false,
equal? (Sn)0 → Rf g(Sn)0 → gn(Rf gn)0 → false,
equal? (Sn)(Sm) → Rf g(Sn)(Sm)
→ gn(Rf gn)(Sm)
→ Rboole,boole false (Rf gnm)(zero? (Sm))
→ Rf gnm
=β equal? nm.
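The definition of equal? can be checked by running a direct transliteration. The following Python sketch is our rendering, not part of the notes; rec_nat and rec_boole transliterate the recursion constants, reproduced here so the snippet is self-contained.

```python
def rec_nat(r, s, t):
    # R_nat r s 0 -> r ;  R_nat r s (t+1) -> s t (R_nat r s t)
    return r if t == 0 else s(t - 1, rec_nat(r, s, t - 1))

def rec_boole(r, s, b):
    # R_boole r s true -> r ;  R_boole r s false -> s
    return r if b else s

zero_p = lambda n: n == 0          # zero?
pred = lambda n: max(n - 1, 0)     # predecessor

# equal? := lam n m. R_{nat,nat->boole} f g n m with
#   f := zero?
#   g := lam n h m. R_boole false (h (pred m)) (zero? m)
f = zero_p
g = lambda n, h: (lambda m: rec_boole(False, h(pred(m)), zero_p(m)))
equal = lambda n, m: rec_nat(f, g, n)(m)

print(equal(3, 3), equal(3, 4), equal(0, 0))   # True False True
```

Note how the recursion on n produces a function of type nat → boole, so the second argument m can change in the recursive call, exactly as discussed above.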

Problem: Define a closed term f : nat → nat representing the Fibonacci sequence

f 0 = 1,
f 1 = 1,
f (n + 2) = f n + f (n + 1).

Hint: Define an auxiliary term f̄ : nat → nat → nat such that

    f̄ n m = f m,  if m < n,
    f̄ n m = 0,    otherwise.
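One way the hint can be realized is sketched below in Python rather than in Gödel's T; this is our rendering of one possible solution (the names fbar, step are ours), with the recursion constant transliterated as rec_nat.

```python
def rec_nat(r, s, t):
    # R_nat r s 0 -> r ;  R_nat r s (t+1) -> s t (R_nat r s t)
    return r if t == 0 else s(t - 1, rec_nat(r, s, t - 1))

# fbar n is a function with fbar n m = f m for m < n, and 0 otherwise;
# it is defined by recursion on n, so the recursion constant suffices.
def step(n, h):
    # h is fbar n; the result is fbar (n+1)
    def g(m):
        if m >= n + 1:
            return 0
        if m < n:
            return h(m)
        # m == n: the newly computed value f n, from the two previous values
        return 1 if n < 2 else h(n - 2) + h(n - 1)
    return g

fbar = lambda n: rec_nat(lambda m: 0, step, n)
fib = lambda n: fbar(n + 1)(n)
print([fib(n) for n in range(7)])   # [1, 1, 2, 3, 5, 8, 13]
```

The point of the auxiliary function is that f(n+2) needs two earlier values, while the recursor only hands over one previous result; recursion at the higher type nat → nat makes the whole course of values available.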

Formulas are built from atomic formulas by means of →, ∧, ∀ and ∃*. We let ⊥ := atom(false), ¬A := A → ⊥ and (writing p for atom(p)) A ∨* B := ∃*p.(p → A) ∧ (¬p → B).
We also extend the notion of a derivation term by constants for the truth axiom and induction axioms. Hence derivation terms in arithmetic are obtained by adding the clauses
We also extend the notion of a derivation term by constants for the truth axiom and
induction axioms. Hence derivation terms in arithmetic are obtained by adding the clauses

(T) T: atom(true) is a derivation term with FA(T) = ∅.

(Ind) For any formula ∀n A

Indn,A : ∀.A[0/n] → (∀n.A → A[n + 1/n]) → ∀n A

is a derivation term with FA(Indn,A ) = ∅. Similarly, for any formula ∀p A

Indp,A : ∀.A[true/p] → A[false/p] → ∀p A

is a derivation term with FA(Indp,A ) = ∅.

Clearly FV(T) := FV(Ind_{n,A}) := FV(Ind_{p,A}) = ∅.


In the sequel ⊢ refers to derivability in the arithmetical system determined by the
derivation terms just described.
Stabatom : ∀p.¬¬atom(p) → atom(p) can now be proved easily by boolean induction,
using the truth axiom in the case true. As in §2 we can conclude ⊢ ¬¬A → A for formulas
A without ∃∗ , i.e. built with →, ∧ and ∀. We also have ⊢ ⊥ → A for any A. Furthermore
we can derive the usual axioms for ∨∗ .
Problem: Give a derivation of ∀n atom(equal?nn) for the closed term equal?: nat →
nat → boole defined above.
For these new derivation terms we have the following conversion rules.

    Ind_{n,A} ~r d e 0 →_0 d,
    Ind_{n,A} ~r d e (t + 1) →_0 e t (Ind_{n,A} ~r d e t)

and

    Ind_{p,A} ~r d e true →_0 d,
    Ind_{p,A} ~r d e false →_0 e.

Again it can be shown by standard methods — just as for Gödel’s T , cf. [30] — that any
derivation term in arithmetic has a unique βη∃∗ R–normal form, where R refers to the
conversion rules above.

The notion of extracted terms can straightforwardly be extended to this situation,


and the Soundness Theorem carries over easily. In the case of Indn,A we have to prove

ets(Indn,A ) mr ∀~x.A[0/n] → (∀n.A → A[n + 1/n]) → ∀n A,

i.e.

    ∀~x ∀~y ∀~f ∀n. ~y mr A[0/n] → (∀n ∀~y_1. ~y_1 mr A → ~f n ~y_1 mr A[n + 1/n])
                  → ets(Ind_{n,A}) ~x ~y ~f n mr A.

Hence we let ets(Ind_{n,A}) := λ~x. R_1, . . . , R_k where k is the length of τ(A) = ρ_1, . . . , ρ_k and R_1, . . . , R_k are simultaneous primitive recursion operators of type R_i : ~ρ → (nat → ~ρ → ~ρ) → nat → ρ_i satisfying

    R_i ~y ~f 0 = y_i,
    R_i ~y ~f (z + 1) = f_i z (R_1 ~y ~f z) . . . (R_k ~y ~f z);

here = denotes equality of βηR–normal forms. Using these equations we can then prove the above claim easily (recall that terms with the same normal form are identified). The operators R_1, . . . , R_k can be defined easily from the recursion constant R_{nat,ρ_1×...×ρ_k}. We could equally well have introduced them as constants and added the equations above as conversion rules.
Boolean induction, i.e. case analysis is treated similarly. We let

ets(Indp,A ) := λ~x.R1 , . . . , Rk ,

where now R_1, . . . , R_k are simultaneous primitive recursion (or case splitting) operators of type R_i : ~ρ → ~ρ → boole → ρ_i satisfying

Ri ~y~z true = yi ,
Ri ~y~z false = zi .

The lemmata in §6 stating that ets commutes with substitution and reduction remain valid
since the conversion rules for induction and recursion fit together.
The following remarks and definitions will be helpful later. Let us call a formula A
decidable if there is a term tA such that ⊢ A ↔ atom(tA ).
1. Every quantifier–free formula is decidable. First let ⊃ := λp λq. R q true p and & := λp λq. R (R true false q) false p. Clearly

    ∀p, q. (atom(p) → atom(q)) ↔ atom(⊃ p q),
    ∀p, q. (atom(p) ∧ atom(q)) ↔ atom(& p q)

are provable. Hence we let

    t_{atom(r)} := r,
    t_{A→B} := ⊃ t_A t_B,
    t_{A∧B} := & t_A t_B.
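The two boolean combinators and the term t_A can be checked by evaluation. This Python sketch is ours: R_boole r s p is read as `r if p else s`, and the tuple AST ('atom', b) | ('imp', A, B) | ('and', A, B) for quantifier-free formulas is a hypothetical encoding of our own choosing.

```python
def rec_boole(r, s, p):
    # R_boole r s true -> r ;  R_boole r s false -> s
    return r if p else s

imp = lambda p, q: rec_boole(q, True, p)                             # "p implies q"
conj = lambda p, q: rec_boole(rec_boole(True, False, q), False, p)   # & p q

# t_A for quantifier-free A, over the hypothetical AST described above
def t(a):
    if a[0] == 'atom':
        return a[1]
    left, right = t(a[1]), t(a[2])
    return imp(left, right) if a[0] == 'imp' else conj(left, right)

print(t(('imp', ('atom', False), ('atom', False))))   # True
```

For instance, the final line evaluates t for the formula ⊥ → ⊥, which is true, as expected.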

2. We can do case splitting according to decidable formulas A, i.e. for every formula
B[~x] we can prove
CasesA,B : (A → B) → (¬A → B) → B.
The derivation Cases_{A,B} is given by

    λu_1, u_2. Ind ~x (λu_3 λu_4. u_3 T) (λu_5 λu_6. u_6 ¬F) t_A (λu_7. u_1(d_1 u_7)) (λu_8. u_2(d_2 u_8))

where d_1^{atom(t_A)→A} and d_2^{¬atom(t_A)→¬A} are derivations which exist according to 1, and the axioms and assumption variables with indices are (writing t for atom(t))

    Ind_{p,(p→B)→(¬p→B)→B},
    u_1^{A→B}, u_2^{¬A→B}, u_3^{true→B}, u_4^{¬true→B}, u_5^{false→B}, u_6^{¬false→B}, u_7^{t_A} and u_8^{¬t_A}.

The extracted terms of Cases_{A,B} are given by

    ets(Cases_{A,B}) = λ~y, ~z. if t_A ~y ~z =_η if t_A,

where if := R(λ~y_1, ~z_1. ~y_1)(λ~y_1, ~z_1. ~z_1) and ~y, ~z, ~y_1, ~z_1 are lists of variables of types ~ρ := τ(B). Clearly

    if true ~r ~s =_{βR} ~r,    if false ~r ~s =_{βR} ~s.
For better readability we use for if tA~r~s the notation

if A then ~r else ~s fi.

Up to now we have restricted ourselves to the “neutral” system of arithmetic without


extensionality axioms (this system is denoted by N –HAω in [30]). Let us now discuss
extensionality axioms, i.e. axioms postulating the compatibility of extensional equality
with application. To formulate them we first have to define extensional equality =ρ for an
arbitrary type ρ, by
z1 =ρ→σ z2 := ∀xρ z1 x =σ z2 x,
where x =nat y is shorthand for atom(= x y) and =: nat → nat → boole is the characteristic
function of equality on nat; x =boole y is defined similarly. Then we extend the notion of
a derivation term by constants for the extensionality axioms, as follows.

(Extρσ ) For any types ρ, σ


Extρσ : ∀x, y, z.x =ρ y → zx =σ zy

is a derivation term with FA(Extρσ ) = ∅.


(Extf ) For any function symbol f ∈ F
Extf : ∀~x∀~y .x1 =ρ1 y1 → . . . → xm =ρm ym → f (~x) =σ f (~y )
is a derivation term with FA(Extf ) = ∅.

Clearly FV(Ext_{ρσ}) := FV(Ext_f) := ∅. The resulting system of arithmetic is called E–HAω in [30]. Below we discuss some relationships between the neutral and the extensional system.
1. E–HAω is conservative over N–HAω with respect to formulas containing variables (free or bound) of types of level ≤ 1 only. This can be seen by introducing logical relations (called hereditary extensionality in [30], pp. 155)

    n ∼_nat m := n = m,
    f ∼_{ρ→σ} g := ∀x, y. x ∼_ρ y → f x ∼_σ g y.
In N –HAω the following facts can be proved easily.

(1) ∼ρ is symmetric and transitive and hence x ∼ρ y → x ∼ρ x.

(2) ~x ∼ ~y → r ∼ r[~y /~x] for each term r with FV(r) ⊆ ~x.

For every formula A let A∼ be obtained from A by relativizing all quantifiers ∀xρ and
∗ ρ
∃ x to x ∼ρ x.

(3) x ∼ρ x → y ∼ρ y → (x ∼ρ y ↔ (x =ρ y)∼ ).

Using (2) and (3) we can show that to every derivation d: A in E–HAω there exists a
derivation d∼ : A∼ in N –HAω with FA(d∼ ) = {B ∼ |B ∈ FA(d)} ∪ {x ∼ x|x ∈ FV(d)}. If d
is an extensionality axiom we use (3). If d is an ∀-elimination we use (2). All other cases
are easy. The conservativity result now follows since in N –HAω we can prove x ∼ρ x for
types ρ of level ≤ 1 and hence A ↔ A∼ if A contains quantifiers of type level ≤ 1 only.
The proof also shows that for universal quantifiers in negative position and existential
quantifiers in positive positions we can remove the restriction on types.
2. Howard has shown in [30], Appendix B that the formula
∀x, y, u∃z.xz = yz → ux = uy
with x, y: nat → nat and u: (nat → nat) → nat (which is equivalent to Extnat→nat,nat ) is
an example of an ∀∃-formula which is provable in E–HAω such that the corresponding
∀∃∗ -formula
∀x, y, u∃∗ z.xz = yz → ux = uy
is not provable in E–HAω (since the latter is equivalent to the Dialectica Interpretation of
Extnat→nat,nat this shows that the extensionality axioms are not Dialectica interpretable).
Therefore E–HAω is not closed under Markov's rule, i.e. for quantifier–free A we have that E–HAω ⊢ ¬¬∃*x A (note that ⊢ ¬¬∃*x A ↔ ∃x A) does not imply E–HAω ⊢ ∃*x A in general. In the next chapter we will show that N–HAω is closed under Markov's rule.
Part II: Computational content of proofs
8 A–translation with program extraction and the direct method

As is well known a proof of a ∀∃–theorem with a quantifier–free kernel — where ∃ is viewed


as defined by ¬∀¬ — can be used as a program. We describe a “direct method” to use such
a proof as a program, and compare it with Harvey Friedman’s A–translation [10] followed
by the well–known program extraction from constructive proofs.
The arguments presented work only for proofs not involving extensionality axioms.

8.1 Kreisel’s counterexample

First note that a classical proof of ∀x∃y A generally does not yield a program to compute
y from x. The reason for this is that there might be a universal quantifier ∀z right after
∃y, i.e. after ¬∀y¬, and this makes it possible that an assumption

∀y¬∀z B

is instantiated with a non–constant term containing critical variables which are bound
later by ∀z.
It is well known that this is not just a technical difficulty: if T denotes Kleene’s
T –predicate, then
∀n∃m∀k.T (n, n, k) → T (n, n, m)
is trivially provable even in minimal logic (with ∃m defined as ¬∀m¬, i.e. in classical logic),
but there is no computable function f satisfying

∀n, k.T (n, n, k) → T (n, n, f (n)),

for then ∃k T (n, n, k) would be decidable: it would be true if and only if T (n, n, f (n))
holds.
Hence in the rest of this section we will only consider formulas of the form ∀x∃y A
with A quantifier–free.

8.2 The direct method

We first describe a “direct method” (cf. [23]) to extract the computational content from a
classical proof.
By a Π–formula we mean a formula built without the strong existential quantifier ∃*, which has no (universal) quantifier in premises of implications. For instance any Horn formula ∀~x. P_1(~x) → · · · → P_n(~x) → Q(~x) is a Π–formula, but

∀x, y, u.(∀z.xz = yz) → ux = uy

is not. Clearly every Π–formula is equivalent (in minimal logic) to a conjunction of formulas
∀C where C is quantifier–free and without ∧. So from now on we will assume that Π–
formulas are of this form.
A derivation d is called a refutation of Π–assumptions if d derives a closed false atom
from assumptions FA(d) = {v1 : ∀C1 , . . . , vn : ∀Cn } where each Ci is quantifier–free.
Now let d be a refutation of Π–assumptions. We may assume FV(d) = ∅ (if not,
substitute arbitrary closed terms for the free variables in d). Next we can normalize d. Let
d↓ be the result. Again d↓ is a refutation of Π–assumptions. We then can read off from
d↓ a list |d| of closed terms called the “first instance” of d↓ (cf. [23]) such that one of the
Π–assumptions is false at |d|. To make this notion easier to understand let us restrict the
general situation slightly. A closed quantifier–free formula B is true respectively false if
tB normalizes to the boolean constant true respectively false. A closed Π–formula ∀~x C is
true iff for all closed terms ~t the formula C[~t/~x] is true.
Let d: ⊥ be a normal derivation with FV(d) = ∅ of ⊥ from assumptions

u: ∀~y .B1 [~y ] → . . . → Bm [~y ] → ⊥,


v1 : ∀C1 , . . . , vn : ∀Cn

where ∀C1 , . . . , ∀Cn are true closed Π–formulas. We define a list |d| of closed terms, called
the first instance of d, such that B1 [|d|], . . . , Bm [|d|] are true. |d| is defined by induction
on d. Since d is normal and FV(d) = ∅ it does not contain axioms (except the truth axiom, which is a closed Π–formula and hence may be assumed to be among the Π–assumptions ∀C_i). To see this recall that the normal form of any closed term of type nat is of the form
S(S(S . . . (S0) . . .)) and of any closed term of type boole is either true or false; hence all
induction axioms unfold. Therefore d is of the form

w~sd1 . . . dk ,

where ~s are closed terms and d1 , . . . , dk are derivations of closed quantifier–free formulas.
We distinguish two cases.

1. d1 , . . . , dk derive only true formulas (which can be decided, since the formulas are
quantifier–free and closed). Then w cannot be one of the vi since all ∀Ci are true.
Hence d = u~sd1 . . . dk and the di derive Bi [~s]. So let |d| := ~s.

2. There is a minimal i such that d_i derives a false formula, A_1 → · · · → A_{n_i} → ⊥ say. Then A_1, . . . , A_{n_i} are true. Without loss of generality we may assume that d_i = λw_1^{A_1} . . . λw_{n_i}^{A_{n_i}} e where e: ⊥ contains assumptions among

u: ∀~y .B1 [~y ] → . . . → Bm [~y ] → ⊥,


v1 : ∀C1 , . . . , vn : ∀Cn ,
w1 : A1 , . . . , wni : Ani .

Therefore we can recursively define |d| := |e|.

Hence from a proof d of ∀~x ∃~y ∧_i B_i[~x, ~y] from true Π–assumptions with B_i quantifier–free we can obtain the following algorithm to compute for any ~r an ~s such that all B_i[~r, ~s] hold. First instantiate d with ~r, i.e. form d~r: ∃~y ∧_i B_i[~r, ~y]. Since ∃ is ¬∀¬, we have d~r u: ⊥ with
u: ∀~y. B_1[~r, ~y] → . . . → B_m[~r, ~y] → ⊥ a new Π–assumption (of a false formula!). Now
normalize d~ru. From its normal form (d~ru)↓, which is a refutation from Π–assumptions,
we can read off the first instance |(d~ru)↓| of (d~ru)↓. These are closed terms ~s such that all
Bi [~r, ~s] are true.
It might seem that instead of the method described, which chooses the branch to follow
by checking whether some quantifier–free formulas are false or true, one could alternatively
look for an occurrence of the false Π–assumption u in the proof whose arguments do not
contain u any more. However, in our general case where Π–assumptions (and not just
Horn formulas) are allowed these arguments may contain free assumption variables bound
later (by →+ ) in the proof, and so we cannot conclude that all arguments of u derive true
formulas. If, however, we restrict attention to the special case where we only allow Horn–
formulas as assumptions, then this phenomenon cannot happen and we have a variant of
the direct method.
Some comments are to be made here.

1. In principle, of course, we could replace any quantifier–free formula C by the atomic
formula atom(tC). However, this introduces quite a lot of somewhat artificial boolean
functions, which makes the proof much harder to read and work with.

2. If we are prepared to apply some preparatory "pruning"–step to our derivation then
we may assume that such additional assumptions v are always true. For if we had a
v: A with A false which is bound later in the proof yielding A → C, then we could
replace the whole subproof above this occurrence of A → C by a derivation of A → C
using ex–falso–quodlibet. If we do this preparatory step first, then the variant of the
direct method can be applied to the general case of Π–assumptions as well.
8.3 A–translation

We now describe Friedman’s A–translation from [10]. Let A be an arbitrary but fixed
formula. The A–translation B A of a formula B is obtained by replacing any atomic sub-
formula P of B by (P → A) → A.
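The translation is a simple structural recursion on formulas. As a concrete illustration (not part of the notes), here is a Python sketch over a hypothetical tagged-tuple encoding of formulas with atoms, → and ∀ only (∧ is omitted, as in §8.2):

```python
# Formulas as tagged tuples (a hypothetical encoding, not from the notes):
#   ("atom", name)   atomic formula P
#   ("imp", B, C)    implication B -> C
#   ("all", x, B)    universal quantification  forall x. B

def a_translate(formula, A):
    """B^A: replace every atomic subformula P of B by (P -> A) -> A."""
    tag = formula[0]
    if tag == "atom":
        return ("imp", ("imp", formula, A), A)
    if tag == "imp":
        return ("imp", a_translate(formula[1], A), a_translate(formula[2], A))
    if tag == "all":
        return ("all", formula[1], a_translate(formula[2], A))
    raise ValueError("unknown formula tag")

A = ("atom", "A")
P, Q = ("atom", "P"), ("atom", "Q")
PA = ("imp", ("imp", P, A), A)   # P^A = (P -> A) -> A
```

Since the recursion only rewrites atoms, a derivation is carried over unchanged except for the formulas attached to it, which is exactly the observation made below for axiom schemes such as boolean induction.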
Note that any derivation d of some formula B from assumptions C1, . . . , Cn becomes
after the A–translation a derivation of B^A from C1^A, . . . , Cn^A. To see this recall that our
logical rules are those of minimal logic and hence give no extra treatment to falsity. Also
the axiom schemes (except the truth axiom, which can be viewed as a Π–assumption) remain
instances of the same axiom scheme after the A–translation. E.g. boolean induction
B[true/p] → B[false/p] → ∀p B
is translated into
B^A[true/p] → B^A[false/p] → ∀p B^A,
which again is an instance of boolean induction.
Let us look at what happens with Π–assumptions under the A–translation. As in §8.2
we may assume that all formulas considered do not contain ∧.
Lemma 1. For any quantifier–free formula C we can find a derivation d: C → C^A.

Proof by induction on C. Let C ≡ B1 → . . . → Bm → R with an atom R. We must derive

(~B → R) → ~B^A → (R → A) → A.

So assume

ũ: ~B → R,
ṽi: Bi^A,
w: R → A.

We must show A.
Case u− ~
i : ¬Bi for some i. Let Bi ≡ Ci → Pi with atoms Pi . Then

~i A → (Pi → A) → A
ṽi : C
and we have
~
eij [u−
i ]: ≡ StabCij λv
¬Cij −
.ui λ~uCi .EfqPi (vuj ): Cij ,
Pi − ~
ei [u− uCi wi : ¬Pi .
i ]: ≡ λwi .ui λ~
By IH we have dCij: Cij → Cij^A. Hence

ṽi(dCi1 ei1) . . . (dCini eini)(λwi^Pi.EfqA(ei wi)): A.

Case ui^+: Bi for all i. Then

w(ũ u1^+ . . . um^+): A.

The extracted terms for this derivation are

d^ets ≡ λ~x1, . . . , ~xm, ~z. if ¬B1 then ~x1 dC11^ets . . . dC1n1^ets ~0 else
        . . .
        if ¬Bm then ~xm dCm1^ets . . . dCmnm^ets ~0 else
        ~z fi . . . fi,

where ~xi, ~z are the lists of variables associated with ṽi: Bi^A, w: R → A. □
Here we have used case splitting according to the quantifier–free formulas Bi, which is
admissible by the remark at the end of §7.
If we want to use the A–translation to extract the computational content from a
classical proof we have to choose a particular A involving the strong existential quantifier.
Lemma 2. Let Bi[~x, ~y] be quantifier–free formulas and A[~x] := ∃∗~y ⋀i Bi[~x, ~y]. Then we
can find a derivation of (∀~y.B1[~x, ~y] → . . . → Bm[~x, ~y] → ⊥)^A[~x].

Proof. Let ~y be given and assume ṽi: Bi^A and w: ⊥ → A. We must show A.
Case ui^−: ¬Bi for some i. Let Bi ≡ ~Ci → Pi with atoms Pi. Then

ṽi: ~Ci^A → (Pi → A) → A

and we have

eij[ui^−] :≡ StabCij λv^¬Cij.ui^− λ~u^~Ci.EfqPi(v uj): Cij,
ei[ui^−] :≡ λwi^Pi.ui^− λ~u^~Ci wi: ¬Pi.

Using dCij: Cij → Cij^A from Lemma 1 we obtain

ṽi(dCi1 ei1) . . . (dCini eini)(λwi^Pi.EfqA(ei wi)): A.

Case ui^+: Bi for all i. Then

∃^+~y ⟨u1^+, . . . , um^+⟩: A.

The extracted terms for this derivation are

d^ets ≡ λ~y, ~x1, . . . , ~xm, ~z. if ¬B1 then ~x1 dC11^ets . . . dC1n1^ets ~0 else
        . . .
        if ¬Bm then ~xm dCm1^ets . . . dCmnm^ets ~0 else
        ~y fi . . . fi,

where ~xi, ~z are the lists of variables associated with ṽi: Bi^A, w: ⊥ → A. □
Theorem (Friedman). For any derivation

d[u: ∀~y.B1[~x, ~y] → . . . → Bm[~x, ~y] → ⊥, v1: ∀C1, . . . , vn: ∀Cn]: ⊥

with Bi, Cj quantifier–free we can find a derivation

d^tr[v1: ∀C1, . . . , vn: ∀Cn]: ∃∗~y ⋀i Bi[~x, ~y].

Proof. Let A[~x] := ∃∗~y ⋀i Bi[~x, ~y] and consider the A[~x]–translation

d^A[~x][u′: (∀~y.B1[~x, ~y] → . . . → Bm[~x, ~y] → ⊥)^A[~x], v1′: (∀C1)^A[~x], . . . , vn′: (∀Cn)^A[~x]]
: (⊥ → A[~x]) → A[~x]

of d, obtained by just changing some formulas. By Lemma 1 we have

dvi[vi: ∀Ci]: (∀Ci)^A[~x].

By Lemma 2 (now using the particular choice of A[~x]) the A[~x]–translation of the assump-
tion u is provable without assumptions:

du: (∀~y.B1[~x, ~y] → . . . → Bm[~x, ~y] → ⊥)^A[~x].

Substituting dvi[vi: ∀Ci] for vi′ and du for u′ we obtain

d^tr[v1: ∀C1, . . . , vn: ∀Cn]: ∃∗~y ⋀i Bi[~x, ~y],

where
d^tr ≡ d^A[~x][du, dv1, . . . , dvn] EfqA[~x]. □
Having obtained a proof d^tr of an existential formula ∃∗~y ⋀i Bi[~x, ~y] we can then apply
the general method of extracting terms to this proof. It yields

(d^tr)^ets ≡ (d^A[~x])^ets[du^ets, dv1^ets, . . . , dvn^ets]~0,   (1)

since extracting terms commutes with substitution.
Note that there are many ways to use Friedman's method to extract a term from a
given arithmetical proof of a weak existential formula.

1. The most straightforward way is to prove the formula without any assumptions. This
means that we are not allowed to use lemmata, and hence that such a proof tends to
be rather long, and difficult to produce.

2. The next straightforward way is to pack all Π–Lemmata used in the proof (and
proved explicitly in 1 above) into purely generalized atoms ∀~x atom(t). However,
this means that we have to introduce rather complex boolean terms, which will later
show up in the proof as tests for case distinctions.

8.4 Comparison
We now prove that the value of the extracted terms (d^tr)^ets when instantiated with a list
~r of closed terms is in fact the same as the result of the direct method described in §8.2.
So consider again the situation of Friedman's Theorem, i.e. a derivation

d[u: ∀~y.B1[~x, ~y] → . . . → Bm[~x, ~y] → ⊥, v1: ∀C1, . . . , vn: ∀Cn]: ⊥
with Bi, Cj quantifier–free. We just observed that the program (d^tr)^ets extracted from the
translated derivation has the form (1) above. Let us try to understand how this program
works. First, (d^A[~x])^ets closely follows the structure of d. The reason is that d^A[~x] differs
from d only with respect to the formulas affixed, and when forming the extracted terms
this affects only the types and the arities of the lists of object variables associated with
assumption variables.
In order to comprehend dvi^ets and du^ets let us have a second look at the proofs of Lemma 1
and 2. First note that dvi[vi: ∀Ci]: (∀Ci)^A[~x] is obtained from di: Ci → Ci^A[~x] constructed
in the proof of Lemma 1 by

dvi ≡ λ~yi.di(vi ~yi).

Since vi has type ∀Ci, which is a Harrop formula, we have dvi^ets ≡ λ~yi di^ets. Now from the
proof of Lemma 1 we obtain
proof of Lemma 1 we obtain
ets ~
dets
i ≡ λ~x1 , . . . , ~xm , ~z. if ¬B1 then ~x1 dets
C . . . dC 0 else
11 1n1

...
if ¬Bm then ~xm dets ets ~
Cm1 . . . dCmnm 0 else
~z fi . . . fi, (2)
where Ci ≡ B1 → . . . → Bm → R with Bi ≡ C ~ i → Pi and ~xi , ~z are the lists of variables
ets
associated with ṽi , w. Furthermore, dCij are the extracted terms of derivations dCij : Cij →
A[~
x]
Cij constructed by previous applications of Lemma 1. Similarly
ets ~
dets y , ~x1 , . . . , ~xm , ~z. if ¬B1 then ~x1 dets
u ≡ λ~ C11 . . . dC1n1 0 else
...
if ¬Bm then ~xm dets ets ~
Cm1 . . . dCmnm 0 else
~y fi . . . fi, (3)
where Bi ≡ C ~ i → Pi and ~xi , ~z are the lists of variables associated with ṽi , w. Furthermore,
A[~
x]
dets
Cij are the extracted terms of derivations dCij : Cij → Cij constructed by previous
applications of Lemma 1.
This analysis makes it possible to prove that the value of the extracted terms when
instantiated with a list ~r of closed terms is in fact the same as the result of the direct method
described in §8.2 to read off the first instance provided by the instantiated derivation

d[ū: ∀~y.B1[~r, ~y] → . . . → Bm[~r, ~y] → ⊥, v1: ∀C1, . . . , vn: ∀Cn]: ⊥.
Below we will show the following

Claim. For any normal derivation

e[ū: ∀~y.B1[~r, ~y] → . . . → Bm[~r, ~y] → ⊥, v1: ∀C1, . . . , vn: ∀Cn]: ⊥

with FV(e) = ∅ we have

|e| = [[(e^A[~r])^ets[du^ets[~r/~x], dv1^ets, . . . , dvn^ets]~0]].
We then obtain that the instantiation of the extracted terms (1) with ~r for ~x, i.e.

(d^A[~x])^ets[du^ets, dv1^ets, . . . , dvn^ets]~0[~r/~x] ≡ (d[~r/~x]^A[~r])^ets[du^ets[~r/~x], dv1^ets, . . . , dvn^ets]~0

has as its value the list of closed terms which is the first instance of the instantiated
derivation d[~r/~x], i.e. |d[~r/~x]↓|. For by the claim we have

|d[~r/~x]↓| = [[((d[~r/~x]↓)^A[~r])^ets[du^ets[~r/~x], dv1^ets, . . . , dvn^ets]~0]]
          = [[((d[~r/~x]^A[~r])^ets)↓[du^ets[~r/~x], dv1^ets, . . . , dvn^ets]~0]]
          = [[(d[~r/~x]^A[~r])^ets[du^ets[~r/~x], dv1^ets, . . . , dvn^ets]~0]],

since normalization (i.e. βη∃∗R–conversion) commutes with A[~r]–translation and the for-
mation of extracted terms.
It remains to prove the claim. We use induction on e. Since e is normal, it must be
of the form e = w~s e1 . . . ek with w ∈ {ū, v1, . . . , vn}.
Case 1. e1, . . . , ek derive only true formulas. Then w = ū, k = m and the ei derive
Bi[~r, ~s]. By definition |e| := ~s. Furthermore

(e^A[~r])^ets[du^ets[~r/~x], dv1^ets, . . . , dvn^ets]~0
  ≡ du^ets[~r/~x]~s (e1^A[~r])^ets[du^ets[~r/~x], dv1^ets, . . . , dvn^ets] . . . (em^A[~r])^ets[du^ets[~r/~x], dv1^ets, . . . , dvn^ets]~0
  =βR ~s

by the form (3) of du^ets, since all Bi[~r, ~s] ≡ (~Ci → Pi)[~r, ~s] are true.
Case 2. There is a minimal i such that ei derives a false formula, Di1[~s] → · · · →
Dini[~s] → ⊥ say. Then Di1[~s], . . . , Dini[~s] are true. Without loss of generality we may
assume that ei = λw1^Di1[~s] . . . λwni^Dini[~s] f, where f: ⊥ contains assumptions among

ū: ∀~y.B1[~r, ~y] → . . . → Bm[~r, ~y] → ⊥,
v1: ∀C1, . . . , vn: ∀Cn,
w1: Di1, . . . , wni: Dini.

Therefore by definition |e| := |f|. Furthermore, using the notation dwj[wj]: Dij^A[~r] for the
derivation obtained by applying Lemma 1 to Dij, we have
(e^A[~r])^ets[du^ets[~r], dv1^ets, . . . , dvn^ets]~0
  ≡ du^ets[~r]~s (e1^A[~r])^ets[du^ets[~r], dv1^ets, . . . , dvn^ets] . . . (ek^A[~r])^ets[du^ets[~r], dv1^ets, . . . , dvn^ets]~0   if w = ū
  ≡ dvi^ets ~s (e1^A[~r])^ets[du^ets[~r], dv1^ets, . . . , dvn^ets] . . . (ek^A[~r])^ets[du^ets[~r], dv1^ets, . . . , dvn^ets]~0   if w = vi
  =βR (ei^A[~r])^ets[du^ets[~r], dv1^ets, . . . , dvn^ets] dw1^ets . . . dwni^ets ~0   by (3) and (2), respectively
  =β (f^A[~r])^ets[du^ets[~r], dv1^ets, . . . , dvn^ets, dw1^ets, . . . , dwni^ets]~0,

so the claim follows from the IH. □
9 The root example; refinements
In applications it will be important to produce extracted terms with as few case distinctions
as possible, and also that the case distinctions should be over boolean terms that are as
simple as possible. The following example will show that such improvements are indeed
necessary.
Let f : nat → nat be an unbounded function with f (0) = 0. Then we can prove

∀n∃m.f (m) ≤ n < f (m + 1).
If e.g. f(m) = m², then this formula expresses the existence of an integer square root
m := [√n] for any n. More formally we can prove

∀n∃m.¬n < f (m) ∧ n < f (m + 1) (1)

from the assumptions

v1 : ∀n ¬n < f (0), v2 : ∀n n < f (g(n)).

Here <: nat → nat → boole is the characteristic function of the natural ordering of the natural
numbers and r < s denotes atom(< r s). We expressed f(m) ≤ n by ¬n < f(m) and
f (0) = 0 by ∀n ¬n < f (0) to keep the formal proof as simple as possible. In order to
have Π–assumptions we had to express the unboundedness of f by a witnessing function
g. First note that the Π–assumptions v1 and v2 are not closed as required previously. But
this is no problem since A–translation and program extraction clearly also work in this
case: we just get a program containing f and g as parameters.
Now let us prove (1). Let n be given and assume

u: ∀m.¬n < f (m) → n < f (m + 1) → ⊥.

We have to show ⊥. From v1 and u we inductively get ∀m ¬n < f (m). For m := g(n) this
yields a contradiction to v2 .
The derivation term corresponding to this proof is

d := Indm,¬n<f (m) n(v1 n)u(g(n))(v2n): ⊥.

Now let
A := ∃∗ m.¬n < f (m) ∧ n < f (m + 1).
The program extracted from d is
(d^tr)^ets ≡ (d^A)^ets[du^ets, dv1^ets, dv2^ets]0: nat.
(d^A)^ets is the same as d except that Indm,¬n<f(m) has to be replaced by

λn Rnat,(nat→nat)→nat→nat

(since τ((¬n < f(m))^A) = τ(((n < f(m) → A) → A) → (⊥ → A) → A) = (nat → nat) →
nat → nat), and the assumption variables u, v1, v2 in d have to be replaced by (unary lists
of) object variables

xu: nat → [(nat → nat) → nat → nat] → (nat → nat) → nat → nat,
xv1: nat → (nat → nat) → nat → nat,
xv2: nat → nat → nat.

The subprograms du^ets, dv1^ets, dv2^ets (of the same types as xu, xv1, xv2) are given by

du^ets ≡ λm, x1, x2, k. if n < f(m) then x1 id 0 else
          if ¬n < f(m + 1) then x2 0 else
          m fi fi,
dv1^ets ≡ λn, x1, k. if ¬n < f(0) then x1 0 else
          k fi,
dv2^ets ≡ λn, k k.
Hence the normal form of (d^tr)^ets is

R(λx1, k. if ¬n < f(0) then x1 0 else k fi)
 (λm, w1, w2, k. if n < f(m) then w1 id 0 else
  if ¬n < f(m + 1) then w2 0 else
  m fi fi)
 (g(n))
 (λk k)
 0.

Informally, (d^tr)^ets = H(g(n), λk k, 0) where H: nat → (nat → nat) → nat → nat is such
that

H(0, x1, k) = if ¬n < f(0) then x1 0 else k fi,
H(m + 1, x1, k) = if n < f(m) then H(m, id, 0) else
                  if ¬n < f(m + 1) then x1 0 else
                  m fi fi.
This program is correct, but it is unnecessarily complicated. We will now describe a refined
A–translation which will simplify the type of the auxiliary function H as well as its
if–then–else structure. The type reduction will be achieved by not replacing all atoms P
by (P → A) → A. This of course requires that atom is not the only relation symbol.
Furthermore, to reduce case splitting we will construct "better" proofs of C → C^A for
quantifier–free C without ∧.
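Before turning to the refinement, the unrefined program can be transcribed and run directly. The following Python sketch is our own transcription; the particular instances f(m) = m² and g(n) = n + 1 (a valid witnessing function, since n < (n+1)²) are illustrative choices, not part of the notes:

```python
def root_unrefined(n, f, g):
    """Transcription of (d_tr)^ets = H(g(n), id, 0), with H as in the
    recursion equations above; f and g are parameters of the program."""
    def H(m, x1, k):
        if m == 0:
            return x1(0) if not n < f(0) else k
        # H(m, ...) with m = m'+1, so m' = m - 1:
        if n < f(m - 1):                 # case "n < f(m')"
            return H(m - 1, lambda y: y, 0)
        if not n < f(m):                 # case "not n < f(m'+1)"
            return x1(0)
        return m - 1
    return H(g(n), lambda k: k, 0)

# With f(m) = m*m and g(n) = n + 1 this computes the integer square root:
print(root_unrefined(10, lambda m: m * m, lambda n_: n_ + 1))   # → 3
```

Note how the extra functional argument x1 and the continuation argument k are dead weight at run time; this is exactly the complexity the refined translation below removes.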
We first extend our arithmetical system by introducing relation symbols different
from atom, and also reintroducing a special symbol ⊥ for falsity. The reason for the latter
addition is that otherwise e.g. the boolean induction axiom
atom(true) → atom(false) → ∀p atom(p)
would translate into
atom(true) → A → ∀p atom(p),
which certainly is not derivable. Formulas are now built from atomic formulas ⊥, atom(t)
and possibly other atoms P (~t) by means of →, ∧, ∀ and ∃∗ .
We also extend the notion of a derivation term by adding the following clauses.
¬F: ¬atom(false) is a derivation term with FA(¬F) := FV(¬F) := ∅, and Efqfalse: ⊥ →
atom(false) is a derivation term with FA(Efqfalse) := FV(Efqfalse) := ∅. Note that by
Efqfalse and the falsity axiom ¬F: ¬atom(false) we have
⊢ atom(false) ↔ ⊥.
Stabatom: ∀p.¬¬atom(p) → atom(p) can again be proved easily by boolean induction, using
the truth axiom in the case true, and the falsity axiom ¬F and Efqfalse in the case false.
As before we can conclude ⊢ ¬¬A → A for formulas A built with ⊥, →, ∧ and ∀ and
containing atom as the only relation symbol. ⊥ → A is derivable for any formula A
containing atom as the only relation symbol.
Let us call a relation symbol P decidable if there is a term tP such that ⊢ ∀~x.P (~x) ↔
atom(tP ~x). Again a formula A is called decidable if there is a term tA such that ⊢
A ↔ atom(tA ). As before every quantifier–free formula containing only decidable relation
symbols is decidable, and we can do case splitting according to decidable formulas A.
Now let L be a set of formulas. In our applications L will consist of the quantifier–free
kernels of the lemmata ∀Ci, and in addition of the formula B1[~x, ~y] → . . . → Bm[~x, ~y] → ⊥,
if our goal formula is ∀~x∃~y ⋀i Bi[~x, ~y].
The set of L–critical relation symbols is the smallest set satisfying the following con-
dition.

If (~C1 → P1) → . . . → (~Cm → Pm) → R(~t) is a positive subformula of an L–formula,
and if for some i Pi ≡ ⊥ or Pi ≡ Q(~s) for some L–critical relation symbol Q, then R
is L–critical.
We now define an A–translation relative to L:

R(~t)^A := (R(~t) → A) → A   if R is L–critical,
R(~t)^A := R(~t)             otherwise,
⊥^A := A,
(B → C)^A := B^A → C^A.
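The relative translation differs from the plain one only at atoms. A sketch over a hypothetical tagged-tuple encoding of formulas (our own illustration, with the set of L–critical relation symbols passed in as a precomputed parameter):

```python
# Formulas as tagged tuples (hypothetical encoding, not from the notes):
#   ("bot",)          falsity
#   ("pred", R, ts)   atom R(ts) for a relation symbol R
#   ("imp", B, C)     implication, ("all", x, B) universal quantification

def rel_translate(formula, A, critical):
    """A-translation relative to L: only bottom and L-critical atoms move."""
    tag = formula[0]
    if tag == "bot":
        return A
    if tag == "pred":
        if formula[1] in critical:
            return ("imp", ("imp", formula, A), A)
        return formula                    # R not L-critical: atom untouched
    if tag == "imp":
        return ("imp", rel_translate(formula[1], A, critical),
                       rel_translate(formula[2], A, critical))
    if tag == "all":
        return ("all", formula[1], rel_translate(formula[2], A, critical))
    raise ValueError("unknown formula tag")
```

With an empty critical set only ⊥ moves; this is what makes (n < f(m) → ⊥)^A become n < f(m) → A in the root example, collapsing the type of the auxiliary function.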
We will write P ∈ CL if the atom P is of the form R(~t) for some L–critical R or P ≡ ⊥.
A quantifier–free formula B1 → . . . → Bm → R will be called L–relevant if R ∈ CL .
Lemma 1∗. For any B ∈ Neg(L) and any C ∈ Pos(L) we can find the following deriva-
tions.

(i) dC: C → C^A. Let C ≡ ~B → R. Then in case R ∈ CL we need gBi for Bi L–relevant
and fBj for Bj L–irrelevant, and in case R ∉ CL we need f~B.

(ii) eC,D: ((C → D) → A) → C^A for C L–relevant. Let C ≡ ~B → R. Then we need
CasesR if R ≢ ⊥, and also gBi or fBi depending on whether Bi is L–relevant or not.

(iii) fB: B^A → B for B L–irrelevant. Let B ≡ ~C → R. Then we need d~C.

(iv) gB: B^A → (B → A) → A for B L–relevant. Let B ≡ ~C → R. Then we need some
eCi,D for Ci L–relevant, and CasesCj and also dCj for Cj L–irrelevant.
Proof. Simultaneously by induction on the subformulas of L–formulas.
(i) Let C ≡ ~B → R. Case R ∈ CL, R ≢ ⊥. We must derive

(~B → R) → ~B^A → (R → A) → A.

So assume

u: ~B → R,
vi: Bi^A,
w: R → A.

We must show A.
Let Bi for i ∈ {1, . . . , k} be relevant and Bj for j ∈ {k + 1, . . . , m} be irrelevant.
Assume ui: Bi. Note that by IH(iii) fBj: Bj^A → Bj and hence fBj vj: Bj. From u: B1 →
. . . → Bk → Bk+1 → . . . → Bm → R and w: R → A we get A. Now cancel uk: Bk,
yielding Bk → A. Using the IH gBk: Bk^A → (Bk → A) → A and the assumption Bk^A we
get A. Repeating this procedure we finally cancel u1: B1, yielding B1 → A. Using the IH
gB1: B1^A → (B1 → A) → A and the assumption B1^A we get A, as required. The derivation
term is

dC ≡ λu, v1, . . . , vm, w.gB1 v1 λu1^B1. · · · gBk vk λuk^Bk.w(u u1 . . . uk (fBk+1 vk+1) . . . (fBm vm))

and we get

dC^ets ≡ λ~x1, . . . , ~xk, ~z.gB1^ets ~x1 (gB2^ets ~x2 . . . (gBk^ets ~xk ~z) . . .),

where ~xi, ~z are the lists of variables associated with vi, w.
Figure 1. Derivation of A from u: (R → D) → A and v: R → A: from u1: ¬R and u2: R
we get ⊥, hence D by Efq and thus R → D by cancelling u2; applying u yields A, and
cancelling u1 gives ¬R → A; finally CasesR: (R → A) → (¬R → A) → A applied to v and
¬R → A yields A.
Case R ≡ ⊥. Similarly, using EfqA instead of w: R → A. Then in dC^ets we leave out
λ~z and replace ~z in the kernel by ~0.
Case R ∉ CL. Then all Bi are irrelevant. We must derive

(~B → R) → ~B^A → R.

So assume u: ~B → R and ~B^A. Using the IH fBi: Bi^A → Bi we obtain ~B and hence R. In
this case dC^ets ≡ ε.
(ii) Case R with R ∈ CL, R ≢ ⊥. Then R^A ≡ (R → A) → A, and we must derive

((R → D) → A) → (R → A) → A.

So assume u: (R → D) → A and v: R → A. Then clearly A can be derived, using CasesR
and EfqD; a derivation is given in Figure 1.
The derivation term is

eR,D ≡ λu, v.CasesR v λu1^¬R.u λu2^R.EfqD(u1 u2)

and we get

eR,D^ets ≡ λ~x, ~y. if R then ~y else ~x fi.

Case R with R ≡ ⊥. Then R^A ≡ A, and we must derive

((⊥ → D) → A) → A.

But this clearly can be done using EfqD, and we have e⊥,D^ets ≡ λ~x ~x.
Figure 2. Derivation of (C → D) → A from u: ((B → C) → D) → A and v: B^A: from
u3: B → C and u2: B we get C, hence D by u1: C → D, thus (B → C) → D by cancelling
u3 and A by u; cancelling u2 yields B → A, and applying B^A → (B → A) → A to v and
B → A gives A; cancelling u1 yields (C → D) → A.
Case B → C. We must derive

[((B → C) → D) → A] → B^A → C^A.

So assume u: ((B → C) → D) → A and v: B^A. We must show C^A. First note that we
can derive (C → D) → A from our assumptions, using the fact that by IH we can derive
B^A → (B → A) → A (with gB or fB, depending on whether B is relevant or not). A
derivation is given in Figure 2. But then the claim follows, since by IH

eC,D: ((C → D) → A) → C^A.

The derivation term is

λu, v.eC,D λu1.gB v λu2.u λu3.u1(u3 u2)   or   λu, v.eC,D λu1.u λu3.u1(u3(fB v))

depending on whether B is relevant or not, and we obtain the extracted terms

λ~x, ~y.eC,D^ets(gB^ets ~y ~x)   or   λ~x, ~y.eC,D^ets(~x ~y).
(iii) Let B ≡ ~C → R. Since B is irrelevant we have R ∉ CL. Then (~C → R)^A ≡
~C^A → R. We must derive

(~C^A → R) → ~C → R.

But this is easy, using the IH dCi: Ci → Ci^A. Clearly fB^ets ≡ ε.
(iv) Case R with R ∈ CL, R ≢ ⊥. Then R^A ≡ (R → A) → A, and we must derive

((R → A) → A) → (R → A) → A,

which is trivial. We have gR^ets ≡ λ~x ~x.
Case R with R ≡ ⊥. Then R^A ≡ A, and we must derive

A → (⊥ → A) → A,

which again is trivial. We have g⊥^ets ≡ λ~x, ~y ~x.
Case C → B. By assumption B is relevant. We must derive

(C^A → B^A) → ((C → B) → A) → A.

So assume u: C^A → B^A and v: (C → B) → A.
We first consider the case where C is relevant. Then we have eC,B: ((C → B) → A) →
C^A by IH(ii), hence B^A (using v and u). By IH(iv) for the shorter formula B we know
gB: B^A → (B → A) → A and hence (B → A) → A. But B → A can easily be derived
from our hypothesis v: (C → B) → A and hence we obtain A, as required. The derivation
term is

gC→B ≡ λu, v.gB(u(eC,B v)) λu1^B.v λu2^C u1

and we get

gC→B^ets ≡ λ~x, ~y.gB^ets(~x(eC,B^ets ~y)) ~y.
We finally consider the case where C is irrelevant. The derivation now uses CasesC;
so it suffices to first derive A under the additional hypothesis C, and then derive A under
the additional hypothesis ¬C.
So assume u^+: C. Since by IH(i) dC: C → C^A we obtain C^A and hence B^A (using
u: C^A → B^A). By IH(iv) for the shorter formula B we know gB: B^A → (B → A) → A
and hence (B → A) → A. But B → A can easily be derived from our hypothesis v: (C →
B) → A and hence we obtain A, as required.
Now assume u^−: ¬C. Then by ex–falso–quodlibet we obtain C → B and hence A,
using our hypothesis v: (C → B) → A.
The derivation term is

gC→B ≡ λu, v.CasesC(λu^+.gB(u(dC u^+)) λu1^B.v λu2^C u1)(λu^−.v λu3^C.EfqB(u^− u3))

and we get

gC→B^ets ≡ λ~x, ~y. if C then gB^ets(~x dC^ets) ~y else ~y fi. □
Lemma 2∗. Let Bi be quantifier–free formulas and A := ∃∗~y ⋀i Bi. Then we can find a
derivation of

∀~y.B1^A → . . . → Bm^A → A

involving CasesBi and dCij: Cij → Cij^A for relevant Bi ≡ ~Ci → Pi and fBj: Bj^A → Bj for
irrelevant Bj.
Proof. Let ~y be given and assume v1: B1^A, . . . , vm: Bm^A. We must show A.
We may assume that Bi for i ∈ {1, . . . , k} are relevant and Bj for j ∈ {k + 1, . . . , m}
are irrelevant.
Case ui^−: ¬Bi for some i ∈ {1, . . . , k}. Let Bi ≡ ~Ci → Pi with atoms Pi. Then

vi: ~Ci^A → (Pi → A) → A

in case Pi ≢ ⊥ and

vi: ~Ci^A → A

in case Pi ≡ ⊥. We have

eij[ui^−] :≡ StabCij λv^¬Cij.ui^− λ~u^~Ci.EfqPi(v uj): Cij,
ei[ui^−] :≡ λwi^Pi.ui^− λ~u^~Ci wi: ¬Pi.

Using dCij: Cij → Cij^A from Lemma 1∗ we obtain in case Pi ≢ ⊥

vi(dCi1 ei1) . . . (dCini eini)(λwi^Pi.EfqA(ei wi)): A

and in case Pi ≡ ⊥

vi(dCi1 ei1) . . . (dCini eini): A.

Case ui^+: Bi for all i ∈ {1, . . . , k}. Then

∃^+~y ⟨u1^+, . . . , uk^+, fBk+1 vk+1, . . . , fBm vm⟩: A.

The extracted terms for this derivation are

d^ets ≡ λ~y, ~x1, . . . , ~xm. if ¬B1 then ~x1 dC11^ets . . . dC1n1^ets ~0 else
        . . .
        if ¬Bk then ~xk dCk1^ets . . . dCknk^ets ~0 else
        ~y fi . . . fi,

where ~xi is the list of variables associated with vi: Bi^A. □
This A–translation relative to L simplifies the extracted terms a lot. First the deriva-
tions of C → C^A are necessary only for those lemmata ∀C where C involves L–critical
relation symbols (for otherwise we have C^A ≡ C). For the other lemmata a derivation of
C → C^A provided by Lemma 1∗ involves in most cases only very few case distinctions.
Finally, in the derivation given by Lemma 2∗ of the A–translation B1^A → . . . → Bm^A → A
of our false assumption, direct case distinctions are only necessary for the relevant Bi; for
the other Bj we have a derivation of Bj^A → Bj by Lemma 1∗.
Let us now come back to our initial example and study the effect of our refinements
there. We can relativize the A–translation to the set L of formulas consisting of

n < f (0) → ⊥,
n < f (g(n)),
(n < f (m) → ⊥) → n < f (m + 1) → ⊥.

Since no positive subformula of an L–formula is an implication with a <–atom as its
conclusion there are no L–critical relation symbols. Hence only negations ~B → ⊥ are
L–relevant.
We now repeat our treatment of the root example, based on the A–translation relative
to L and the refined Lemmata 1∗ and 2∗. The derivation term corresponding to the
informal proof is

d := Indm,¬n<f(m) n (v1 n) u (g(n)) (v2 n): ⊥.

Now let

A := ∃∗m.¬n < f(m) ∧ n < f(m + 1).

The program extracted from d is

(d^tr)^ets ≡ (d^A)^ets[du^ets, dv1^ets, dv2^ets]: nat.
(d^A)^ets is the same as d except that Indm,¬n<f(m) has to be replaced by

λn Rnat,nat

(since τ((n < f(m) → ⊥)^A) = τ(n < f(m) → A) = nat), and the assumption variables
u, v1 in d have to be replaced by (unary lists of) object variables

xu: nat → nat → nat,
xv1: nat → nat,

whereas v2 has to be replaced by the empty list. The subprograms du^ets, dv1^ets (of the same
types as xu, xv1) are given by (cf. Lemmata 2∗ and 1∗)

du^ets ≡ λm, x. if n < f(m) then x else
          m fi,
dv1^ets ≡ λn 0.
Hence the normal form of (d^tr)^ets is

R 0 (λm, x. if n < f(m) then x else m fi)(g(n)).

Informally, (d^tr)^ets = h(g(n)) where h: nat → nat is such that

h(0) = 0,
h(m + 1) = if n < f(m) then h(m) else m fi.
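The refined program is short enough to transcribe and test directly. As before, the instances f(m) = m² and g(n) = n + 1 are our own illustrative choices:

```python
def root(n, f, g):
    """Transcription of (d_tr)^ets = h(g(n)), with h as in the two
    recursion equations above; f and g are parameters of the program."""
    def h(m):
        if m == 0:
            return 0
        return h(m - 1) if n < f(m - 1) else m - 1
    return h(g(n))

# With f(m) = m*m and g(n) = n + 1 this is the integer square root:
print(root(10, lambda m: m * m, lambda n_: n_ + 1))   # → 3
```

Compared to the unrefined program, the auxiliary function has type nat → nat instead of nat → (nat → nat) → nat → nat, and each step performs a single case distinction.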
Problem. Let C = C[~y] be the quantifier–free formula

C[~y] ≡ (~C1 → P1) → . . . → (~Cm → Pm) → Q1 → . . . → Qn → R

with atoms Pi, Qj, R. Assume that we have derivations dCik: Cik → Cik^A with extracted
terms dCik^ets. Show that there exists a derivation of ∀~y.C → C^A whose extracted termlist
has the form

λ~y, ~x1, . . . , ~xm, ~x1′, . . . , ~xn′, ~z. if ¬(~C1 → P1) then ~x1 dC11^ets . . . dC1n1^ets ~0 else
    . . .
    if ¬(~Cm → Pm) then ~xm dCm1^ets . . . dCmnm^ets ~0 else
    ~x1′(~x2′ . . . (~xn′ ~z) . . .) fi . . . fi.
Solution. Assume

u: (~C1 → P1) → . . . → (~Cm → Pm) → Q1 → . . . → Qn → R,
vi: ~Ci^A → (Pi → A) → A,
vj′: (Qj → A) → A,
w: R → A.

Assume further dCik: Cik → Cik^A.
Case ~ui: ~Ci, ui′: ¬Pi for some i. Then

vi(dCi1 ui1) . . . (dCimi uimi)(λwi^Pi.EfqA(ui′ wi)): A.

Case u1′′: ~C1 → P1, . . . , um′′: ~Cm → Pm. Then we have u u1′′ . . . um′′: Q1 → . . . → Qn → R
and hence

v1′(λu1^Q1.v2′(λu2^Q2. . . . vn′(λun^Qn.w(u u1′′ . . . um′′ u1 . . . un)) . . .)): A.

For the extracted terms let ~xi be the list of variables associated with vi, ~xj′ be the list of
variables associated with vj′ and ~z be the list of variables associated with w. Then clearly
the extracted termlist is as given in the statement of the problem. □
Part III: Classifying arithmetical proofs
10 Ordinals below epsilon zero

We want to discuss the derivability and underivability of initial cases of transfinite in-
duction in arithmetical systems. In order to do that we shall need some knowledge and
notations for ordinals. Now we do not want to assume set theory here; hence we introduce
a certain initial segment of the ordinals (the ordinals < ε0) in a formal, combinatorial way,
i.e. via ordinal notations. Our treatment is based on the Cantor normal form for ordi-
nals; cf. [1]. We also introduce some elementary relations and operations for such ordinal
notations, which will be used later.
We define the two notions

• α is an ordinal notation

• α < β for ordinal notations α, β

simultaneously by induction:

1. If αm, . . . , α0 are ordinal notations and αm ≥ . . . ≥ α0 (where α ≥ β means α > β or
α = β), then

ω^αm + · · · + ω^α0

is an ordinal notation. Note that the empty sum denoted by 0 is allowed here.

2. If ω^αm + · · · + ω^α0 and ω^βn + · · · + ω^β0 are ordinal notations, then

ω^αm + · · · + ω^α0 < ω^βn + · · · + ω^β0

iff there is an i ≥ 0 such that αm−i < βn−i, αm−i+1 = βn−i+1, . . . , αm = βn, or else
m < n and αm = βn, . . . , α0 = βn−m.

It is easy to see (by induction on the levels in the inductive definition) that < is a linear
order with 0 being the smallest element.
We shall use the notation 1 for ω^0, a for ω^0 + · · · + ω^0 with a copies of ω^0, and ω^α·a
for ω^α + · · · + ω^α, again with a copies of ω^α.
We now define addition for ordinal notations:

ω^αm + · · · + ω^α0 + ω^βn + · · · + ω^β0 := ω^αm + · · · + ω^αi + ω^βn + · · · + ω^β0

where i is minimal such that αi ≥ βn.
It is easy to see that + is an associative operation which is strictly monotonic in
the second argument and weakly monotonic in the first argument. Note that + is not
commutative: 1 + ω = ω ≠ ω + 1.
The natural (or Hessenberg) sum of two ordinal notations is defined by

(ω^αm + · · · + ω^α0) # (ω^βn + · · · + ω^β0) := ω^γm+n + · · · + ω^γ0,

where γm+n, . . . , γ0 is a decreasing permutation of αm, . . . , α0, βn, . . . , β0.
Again it is easy to see that # is associative, commutative and strictly monotonic in
both arguments.
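Both operations can be sketched over the same kind of tuple encoding (a notation is the tuple of its exponents, leading term first; the encoding and the comparison helper are our own illustration):

```python
import functools

# Encoding: a notation is the tuple of its exponents, leading term first.
ZERO, ONE = (), ((),)
OMEGA = (ONE,)

def less(a, b):
    # lexicographic comparison of the exponent sequences (clause 2)
    for x, y in zip(a, b):
        if less(x, y):
            return True
        if less(y, x):
            return False
    return len(a) < len(b)

def add(a, b):
    """alpha + beta: keep only those leading terms of alpha whose exponent
    is >= the leading exponent of beta, then append beta."""
    if not b:
        return a
    return tuple(e for e in a if not less(e, b[0])) + b

def nat_sum(a, b):
    """Hessenberg sum #: a decreasing rearrangement of all exponents."""
    def cmp(x, y):
        return -1 if less(x, y) else (1 if less(y, x) else 0)
    return tuple(sorted(a + b, key=functools.cmp_to_key(cmp), reverse=True))
```

In particular `add(ONE, OMEGA) == OMEGA` while `add(OMEGA, ONE)` is ω + 1, exhibiting the non-commutativity noted above; the natural sum gives ω + 1 in both orders.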
We will also need to know how ordinal notations of the form β + ω^α can be approxi-
mated from below. First note that

δ < α → β + ω^δ·a < β + ω^α.

Furthermore, for any γ < β + ω^α we can find a δ < α and an a such that

γ < β + ω^δ·a.

We now define 2^α for ordinal notations α. Let αm ≥ · · · ≥ α0 ≥ ω > kn ≥ · · · ≥ k1 > 0.
Then

2^(ω^αm + · · · + ω^α0 + ω^kn + · · · + ω^k1 + ω^0·a) := ω^(ω^αm + · · · + ω^α0 + ω^(kn−1) + · · · + ω^(k1−1)) · 2^a.

It is easy to see that 2^(α+1) = 2^α + 2^α and that 2^α is strictly monotonic in α.
In order to work with ordinal notations in a purely arithmetical system we set up
a bijection between ordinal notations and nonnegative integers (i.e. a Gödel numbering).
For its definition it is useful to refer to ordinal notations in the form

ω^αm·am + · · · + ω^α0·a0   with αm > · · · > α0.

For any ordinal notation α we define its Gödel number |α| inductively by

|0| := 0,
|ω^αm·am + · · · + ω^α0·a0| := (∏_{i≤m} p_{|αi|}^{ai}) − 1.

For any nonnegative integer x we define its corresponding ordinal notation o(x) inductively
by

o(0) = 0,
o((∏_{i≤m} p_i^{ai}) − 1) = Σ_{i≤m} ω^{o(i)}·ai,

where the sum is to be understood as the natural sum.
Lemma.
1. o(|α|) = α,
2. |o(x)| = x.

This can be proved easily by induction. □
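The numbering and its inverse can be transcribed and checked on small values. The tuple encoding of notations and the helper functions below are our own illustration (with p_0 = 2 the first prime):

```python
import functools

def nth_prime(i):
    """The i-th prime, counting from p_0 = 2."""
    count, cand = -1, 1
    while count < i:
        cand += 1
        if all(cand % d for d in range(2, int(cand ** 0.5) + 1)):
            count += 1
    return cand

def less(a, b):                     # comparison of tuple-encoded notations
    for x, y in zip(a, b):
        if less(x, y):
            return True
        if less(y, x):
            return False
    return len(a) < len(b)

def gnum(a):
    """|alpha|: multiply p_{|alpha_i|}^{a_i} over the distinct exponents."""
    prod = 1
    for e in set(a):
        prod *= nth_prime(gnum(e)) ** a.count(e)
    return prod - 1

def onot(x):
    """o(x): factor x + 1; prime p_i with exponent a_i contributes a_i
    copies of the exponent o(i); sort decreasingly (the natural sum)."""
    n, i, terms = x + 1, 0, []
    while n > 1:
        p = nth_prime(i)
        while n % p == 0:
            n //= p
            terms.append(onot(i))
        i += 1
    cmp = lambda u, v: -1 if less(u, v) else (1 if less(v, u) else 0)
    terms.sort(key=functools.cmp_to_key(cmp), reverse=True)
    return tuple(terms)
```

For example o(1) = ω^0 = 1 (since 2 = p_0) and o(2) = ω (since 3 = p_1), so small numbers already code infinite ordinals.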
Hence we have a bijection between ordinal notations and nonnegative integers. Us-
ing this bijection we can transfer our relations and operations on ordinal notations to
computable relations and operations on nonnegative integers. We will use the notations

x ≺ y for o(x) < o(y),
ωx for |ω^o(x)|,
x ⊕ y for |o(x) + o(y)|.
11 Provability of initial cases of transfinite
induction

We now derive initial cases of the principle of transfinite induction in arithmetic, i.e. of

[∀x.(∀y.y ≺ x → P (y)) → P (x)] → ∀x.x ≺ a → P (x)

for some number a and a predicate symbol P . In §13 we will see that our results here
are optimal in the sense that for larger segments of the ordinals transfinite induction is
underivable. All these results are due to Gentzen [12].
Our arithmetical systems are based on a fixed (possibly countably infinite) supply of
function constants and predicate constants which are assumed to denote fixed functions and
predicates on the nonnegative integers for which a computation procedure is known. An
example is the formal system of arithmetic described in §7. Among the function constants
there must be a constant S for the successor function and 0 for (the 0–place function)
zero. Among the predicate constants there must be a constant = for equality and ≺ for
the ordering of type ε0 of the natural numbers, as introduced in §10. In order to formulate
the general principle of transfinite induction we also assume that a predicate symbol P is
present.
Terms are built up from object variables x, y, z by means of f (t1 , . . . , tm ), where f
is a function constant. We identify closed terms which have the same value; this is a
convenient way to express in our formal systems the assumption that for each function
constant a computation procedure is known. Terms of the form S(S(. . . S(0) . . .)) are
called numbers. We use the notation S i 0 or even i for them. Formulas are built up from
prime formulas P (t1 , . . . , tm ) with P a predicate constant or a predicate symbol and ⊥ by
means of (A → B) and ∀x A. As usual we abbreviate A → ⊥ by ¬A.
The axioms of our arithmetical systems will always include the Peano–axioms

∀x, y.S(x) = S(y) → x = y,
∀x S(x) ≠ 0.

Any instance of the induction scheme

A[0/x] → (∀x.A → A[S(x)/x]) → ∀x A

with A an arbitrary formula is an axiom of full arithmetic Z. We will also consider


subsystems Zk of Z where the formulas A in the induction scheme are restricted to formulas
of level lev(A) ≤ k. The level of a formula A is defined by

lev(P ) := 0 for any atom P ,


lev(A → B) := max(lev(A) + 1, lev(B)),
lev(∀x A) := max(1, lev(A)).
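Written out as code, the level function reads as follows; the tuple encoding of formulas is our own, hypothetical choice.

```python
def lev(formula):
    """Level of a formula, following the three clauses above.
    Encoding: ('atom', name), ('imp', A, B) for A -> B, ('all', x, A)."""
    tag = formula[0]
    if tag == 'atom':
        return 0
    if tag == 'imp':
        return max(lev(formula[1]) + 1, lev(formula[2]))
    if tag == 'all':
        return max(1, lev(formula[2]))
    raise ValueError('unknown formula: %r' % (formula,))
```

Thus P → P has level 1, (P → P) → P has level 2, and a universal quantifier raises the level to at least 1.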

In addition, in any arithmetical system we have the equality axioms

∀x.x = x,
∀x, y.x = y → y = x,
∀x, y, z.x = y → y = z → x = z,
∀~x, ~y .x1 = y1 → . . . → xm = ym → f (~x) = f (~y ),
∀~x, ~y .x1 = y1 → . . . → xm = ym → P (~x) → P (~y )

for any function constant f and predicate constant or predicate symbol P . We also require
irreflexivity and transitivity for ≺ as axioms. For our negative results we can also allow
that for any predicate symbol P the stability axioms

∀~x.¬¬P (~x) → P (~x)

are present. However, for the positive results no stability axioms are needed. We express
our assumption that for any predicate constant a decision procedure is known by adding
the axiom
P (S i1 0, . . . , S im 0)
whenever P (~i) is true, and
¬P (S i1 0, . . . , S im 0)
whenever P (~i) is false.
We finally allow in any of our arithmetical systems Z an arbitrary supply of true
Π–formulas as axioms. Our (positive and negative) results concerning initial cases of
transfinite induction will not depend on which of those axioms we have chosen, except that
for the positive results we always assume the universal closures of

x ⊀ 0, (1)
z ≺ y ⊕ ω^0 → z ⊀ y → z ≠ y → ⊥, (2)
x ⊕ 0 = x, (3)
x ⊕ (y ⊕ z) = (x ⊕ y) ⊕ z, (4)
0 ⊕ x = x, (5)
ω^x·0 = 0, (6)
ω^x·S(y) = ω^x·y ⊕ ω^x, (7)
z ≺ y ⊕ ω^x → x ≠ 0 → z ≺ y ⊕ ω^{f(x,y,z)}·g(x, y, z), (8)
z ≺ y ⊕ ω^x → x ≠ 0 → f(x, y, z) ≺ x, (9)

where in (8) and (9) f and g are function constants.

Theorem 11.1 (Gentzen). Transfinite induction up to ωn (with ω1 := ω, ωn+1 := ω^{ωn}),
i.e. the formula

∀x ((∀y.y ≺ x → A[y]) → A) → ∀x.x ≺ ωn → A

is derivable in Z.

Proof. To any formula A we assign a formula A+ (with respect to a fixed variable x) by

A+ := ∀y.(∀z.z ≺ y → A[z/x]) → ∀z.z ≺ y ⊕ ω^x → A[z/x].

We first show
A is progressive =⇒ A+ is progressive,
where “B is progressive” means ∀x.(∀y.y ≺ x → B[y]) → B. So assume that A is progres-
sive and
∀y.y ≺ x → A+ [y]. (10)
We have to show A+ . So assume further

∀z.z ≺ y → A[z] (11)

and z ≺ y ⊕ ω^x. We have to show A[z]. Case x = 0. From z ≺ y ⊕ ω^0 we have by
(2) z ≺ y ∨ z = y. If z ≺ y, then A[z] follows from (11), and if z = y, then A[z]
follows from (11) and the progressiveness of A. Case x ≠ 0. From z ≺ y ⊕ ω^x we
obtain z ≺ y ⊕ ω^{f(x,y,z)}·g(x, y, z) by (8) and f(x, y, z) ≺ x by (9). From (10) we obtain
A+[f(x, y, z)]. By the definition of A+ we get

(∀u.u ≺ y ⊕ ω^{f(x,y,z)}·v → A[u]) → ∀u.u ≺ (y ⊕ ω^{f(x,y,z)}·v) ⊕ ω^{f(x,y,z)} → A[u]

and hence, using (4) and (7)

(∀u.u ≺ y ⊕ ω^{f(x,y,z)}·v → A[u]) → ∀u.u ≺ y ⊕ ω^{f(x,y,z)}·S(v) → A[u].

Also from (11) and (6), (3) we obtain

∀u.u ≺ y ⊕ ω^{f(x,y,z)}·0 → A[u].

Using an appropriate instance of the induction scheme we can conclude

∀u.u ≺ y ⊕ ω^{f(x,y,z)}·g(x, y, z) → A[u]

and hence A[z].


We now show, by induction on n, how to obtain a derivation of

∀x ((∀y.y ≺ x → A[y]) → A) → ∀x.x ≺ ωn → A.

So assume the left–hand side, i.e. assume that A is progressive. Case 0. From x ≺ ω^0
we get x = 0 by (5), (2) and (1), and A[0] follows from the progressiveness of A by (1).
Case n + 1. Since A is progressive, by what we have shown above also A+ is progressive.
Applying the IH to A+ yields ∀x.x ≺ ωn → A+, and hence A+[ωn] by the progressiveness
of A+. Now the definition of A+ (together with (1) and (5)) yields ∀z.z ≺ ω^{ωn} → A[z]. 

Note that in these derivations the induction scheme was used for formulas of un-
bounded complexity.
We now want to refine the Theorem to a corresponding result for the subsystems Zk
of Z. Note first that if A is a formula of level ≤ k, then the formula A+ constructed in the
proof of the Theorem has level ≤ k + 1, and for the proof of
A is progressive =⇒ A+ is progressive
we have used induction with an induction formula of level ≤ k.
Now let A be of level ≤ 1, and assume that A is progressive. Let A0 := A, Ai+1 :=
(Ai)+. Then Ai is of level ≤ i + 1, and hence in Zk we can derive that A1, A2, . . . , Ak
are all progressive. Let ω1[m] := m, ωi+1[m] := ω^{ωi[m]}. Since in Zk we can derive that
Ak is progressive, we can also derive Ak[0], Ak[1], Ak[2] and generally Ak[m] for any m,
i.e. Ak[ω1[m]]. But since

Ak ≡ (Ak−1)+ ≡ ∀y.(∀z.z ≺ y → Ak−1[z]) → ∀z.z ≺ y ⊕ ω^x → Ak−1[z],

we first get (with y = 0) ∀z.z ≺ ω2[m] → Ak−1[z] and then Ak−1[ω2[m]] by the progressiveness of Ak−1. Repeating this argument we finally obtain ∀z.z ≺ ωk+1[m] → A0[z].
Hence we have

Theorem 11.2. Let A be a formula of level 1. Then in Zk we can derive transfinite


induction for A up to ωk+1 [m] for any m, i.e.
Zk ⊢ [∀x.(∀y.y ≺ x → A[y]) → A] → ∀x.x ≺ ωk+1 [m] → A. 

If more generally we start out with a formula A of level ≤ ℓ instead, where 1 ≤ ℓ ≤ k, then
a similar argument yields the following result (cf. Parsons [17]).

Theorem 11.3. Let A be a formula of level ≤ ℓ, 1 ≤ ℓ ≤ k. Then in Zk we can derive


transfinite induction for A up to ωk+2−ℓ [m] for any m, i.e.
Zk ⊢ [∀x.(∀y.y ≺ x → A[y]) → A] → ∀x.x ≺ ωk+2−ℓ [m] → A. 

Our next aim is to prove that these bounds are sharp. More precisely, we will show that
in Z (no matter how many true Π–formulas we have added as axioms) one cannot derive
transfinite induction up to ε0 , i.e. the formula
[∀x.(∀y.y ≺ x → P (y)) → P (x)] → ∀x P (x)
with a free predicate symbol P , and that in Zk one cannot derive transfinite induction up
to ωk+1 , i.e. the formula
[∀x.(∀y.y ≺ x → P (y)) → P (x)] → ∀x.x ≺ ωk+1 → P (x).
This will follow from the method of normalization applied to arithmetical systems, which
we have to develop first.
12 Normalization for arithmetic with the omega rule

We will show in §14 that a normalization theorem does not hold for a system of arithmetic
like Z in §11: it is not the case that every formula A derivable in Z has a derivation in
Z which only uses formulas of a level bounded in terms of the level of A. The reason for
this failure is the presence of the induction axioms, which can be of arbitrary level.
Here we remove that obstacle against normalization and replace the induction axioms
by a rule with infinitely many premises, the so–called ω–rule (suggested by Hilbert
and studied by Lorenzen, Novikov and Schütte), which allows us to conclude ∀x A from
A[0], A[1], A[2], . . .
Clearly this ω–rule can also be used to replace the rule ∀+ . As a consequence we do
not need to consider free object variables.
So we introduce the system Z ∞ of ω–arithmetic as follows. Z ∞ has the same language
and — apart from the induction axioms — the same axioms as Z. Derivations in Z ∞ are
infinite objects; they are built up from assumption variables u^A, v^B and constants ax^A
for any axiom A of Z other than an induction axiom by means of the rules

(λu^A d^B)^{A→B},
(d^{A→B} e^A)^B,
⟨d_i^{A[i]}⟩_{i<ω}^{∀x A},
(d^{∀x A} i)^{A[i]},

denoted by →+, →−, ω and ∀−, respectively.
More precisely, we define the notion of a ~u–derivation (i.e. a derivation in Z ∞ with
free assumption variables among ~u) of height ≤ α and degree ≤ k inductively, as below.
Note that derivations are infinite objects now. They may be viewed as mappings from
finite lists of natural numbers (= nodes in the derivation tree) to lists of data including
the formula appearing at that node, the rule applied last, a list of assumption variables
including all those free in the subderivation (starting at that node), a bound on the height
of the subderivation, and a bound on the degree of the subderivation.
Intuitively, the degree of a derivation is the least number ≥ the level of any subderivation
λu d occurring in a context (λu d)e, or ⟨d_i⟩_{i<ω} occurring in a context ⟨d_i⟩_{i<ω} j, where the
level of a derivation is the level of its type, i.e. of the formula it derives. This notion of a
degree is needed for the normalization proof we give below.

* Any assumption variable u^A and any axiom ax^A is a ~u–derivation of height ≤ α and
degree ≤ k, for any list ~u of assumption variables (containing u in the first case),
ordinal α and number k.

→+ If d^B is a ~u, u, ~v–derivation of height ≤ α0 < α and degree ≤ k, then (λu^A d^B)^{A→B} is
a ~u, ~v–derivation of height ≤ α and degree ≤ k.

→− If d^{A→B} and e^A are ~u–derivations of heights ≤ αi < α and degrees ≤ ki ≤ k (i =
1, 2), then (d^{A→B} e^A)^B is a ~u–derivation of height ≤ α and degree ≤ m with m =
max(k, lev(A → B)), if d^{A→B} is generated by the rule →+, or of degree ≤ k otherwise.

ω If the r_i^{A[i]} are ~u–derivations of heights ≤ αi < α and degrees ≤ ki ≤ k (i < ω), then
⟨r_i^{A[i]}⟩_{i<ω}^{∀x A} is a ~u–derivation of height ≤ α and degree ≤ k.

∀− If d^{∀x A} is a ~u–derivation of height ≤ α0 < α and degree ≤ k, then (d^{∀x A} i)^{A[i]} is a
~u–derivation of height ≤ α and degree ≤ m with m = max(k, lev(∀x A)), if d^{∀x A} is
generated by the rule ω, or of degree ≤ k otherwise.

We now embed our systems Zk (i.e. arithmetic with induction restricted to formulas of
level ≤ k) and hence Z into Z ∞ .
Lemma 12.1. Let d^B be a derivation in Zk with free assumption variables among ~u^{~A}
which contains ≤ m instances of the induction scheme, all with induction formulas of level
≤ k. Let σ be a substitution of numbers for object variables such that ~Aσ and Bσ do not
contain free object variables. Then we can find a ~u^{~Aσ}–derivation (d∞)^{Bσ} in Z∞ of height
≤ ω^m + h for some h < ω and degree ≤ k.

Proof by induction on the height of the given derivation, which we may assume to be in
long normal form. The only case which requires some argument is when the derivation
consists of two applications of →− to an instance of the induction scheme. Then it must
have the form
Ind_{x,A} d^{A[0]} (λx, v^A e^{A[S(x)]}).
By IH we obtain derivations

d∞^{A[0]} of height ≤ ω^{m−1} + h0,
e∞^{A[1]}[d∞^{A[0]}] of height ≤ ω^{m−1}·2 + h1,
e∞^{A[2]}[e∞^{A[1]}[d∞^{A[0]}]] of height ≤ ω^{m−1}·3 + h2,

and so on, all of degree ≤ k. Combining all these derivations of A[i] as premises of the
ω–rule yields a derivation of ∀x A of height ≤ ω^m and degree ≤ k. 
A derivation is called convertible if it is of the form (λu d)e or else ⟨d_i⟩_{i<ω} j, which can
be converted into d[e/u] or d_j, respectively. Here d[e/u] is obtained from d by substituting
e for all free occurrences of u in d. A derivation is called normal if it does not contain a
convertible subderivation. Note that a derivation of degree 0 must be normal.
We want to define an operation which by repeated conversions transforms a given
derivation into a normal one with the same end formula and no more assumption variables.
The methods employed in §4 to achieve such a task have to be adapted properly in order
to deal with the new situation of infinitary derivations. Here we give a particularly simple
argument due to Tait [29].

Lemma 12.2. If d is a ~u, u^A, ~v–derivation of height ≤ α and degree ≤ k and e^A is a
~u, ~v–derivation of height ≤ β and degree ≤ ℓ, then d[e/u] is a ~u, ~v–derivation of height
≤ β + α and degree ≤ max(k, ℓ, lev(e)).

This is proved by a straightforward induction on the height of d. 

Lemma 12.3. For any ~u–derivation d^A of height ≤ α and degree ≤ k + 1 we can find a
~u–derivation (d^k)^A of height ≤ 2^α and degree ≤ k.

Proof by induction on α. The only case which requires some argument is when the derivation
is of the form de with d of height ≤ α1 < α and e of height ≤ α2 < α. We first
consider the subcase where d^k = λu d1 and lev(d) = k + 1. Then lev(e) ≤ k by the
definition of level, and hence d1[e^k/u] has degree ≤ k by Lemma 12.2. Furthermore, also
by Lemma 12.2, d1[e^k/u] has height ≤ 2^{α2} + 2^{α1} ≤ 2^{max(α2,α1)+1} ≤ 2^α. Hence we can
take (de)^k to be d1[e^k/u]. If we are not in the above subcase, we can simply take (de)^k
to be d^k e^k. This derivation clearly has height ≤ 2^α. Also it has degree ≤ k, which can be
seen as follows. If lev(d) ≤ k we are done. If however lev(d) ≥ k + 2, then d must be of
the form d0 d1 . . . dm for some assumption variable or axiom d0 (since the given derivation
has degree ≤ k + 1). But then d^k has the form d0 d1^k . . . dm^k and we are done again. (To be
completely precise, this last statement has to be added to the formulation of the Lemma
above and proved simultaneously with it.) 
As an immediate consequence we obtain

Theorem 12.4 (Normalization for Z∞). For any ~u–derivation d^A of height ≤ α and
degree ≤ k we can find a normal ~u–derivation (d∗)^A of height ≤ 2_k α (where 2_0 α := α,
2_{m+1} α := 2^{2_m α}). 
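For finite heights the bound 2_k α is just an iterated power of two, which can be computed directly; the following one-line sketch handles natural-number arguments only (ordinal exponentiation is needed in general):

```python
def iter_exp2(m, a):
    """2_m a for natural numbers a: 2_0 a = a and 2_{m+1} a = 2 ** (2_m a)."""
    for _ in range(m):
        a = 2 ** a
    return a
```

For example, 2_3 1 is already 2^{2^{2^1}} = 16, which illustrates how fast the bound grows with the degree.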
13 Unprovable initial cases of transfinite induction

We now apply the technique of normalization for arithmetic with the ω–rule for a proof
that transfinite induction up to ε0 is underivable in Z, i.e. of

Z ⊬ [∀x.(∀y.y ≺ x → P(y)) → P(x)] → ∀x P(x)

with a predicate symbol P , and that transfinite induction up to ωk+1 is underivable in Zk ,


i.e. of
Zk ⊬ [∀x.(∀y.y ≺ x → P(y)) → P(x)] → ∀x.x ≺ ωk+1 → P(x).
Our proof is based on an idea of Schütte, which consists in adding a so–called progression
rule to the infinitary systems. This rule allows us to conclude P(j) (where j is any number)
from P(i) for all i ≺ j.
More precisely, we define the notion of a ~u–derivation in Z ∞ + Prog(P ) of height ≤ α
and degree ≤ k by the inductive clauses of §12 and the additional clause Prog(P ):

If the d_i^{P(i)} are ~u–derivations of heights ≤ αi < α and degrees ≤ ki ≤ k (i ≺ j), then
⟨d_i^{P(i)}⟩_{i≺j}^{P(j)} is a ~u–derivation of height ≤ α and degree ≤ k.

Since this progression rule only deals with derivations of prime formulas it does not affect
the degrees of derivations. Hence the proof of normalization for Z ∞ carries over unchanged
to Z ∞ + Prog(P ). In particular we have

Lemma 13.1. For any ~u–derivation d^A in Z∞ + Prog(P) of height ≤ α and degree ≤ k + 1
we can find a ~u–derivation (d^k)^A in Z∞ + Prog(P) of height ≤ 2^α and degree ≤ k. 

We now show that from the progression rule for P we can easily derive the progres-
siveness of P .

Lemma 13.2. We have a normal derivation of ∀x.(∀y.y ≺ x → P (y)) → P (x) in Z ∞ +


Prog(P ) with height ≤ 5.

Proof. By the ω–rule it suffices to derive (∀y.y ≺ j → P (y)) → P (j) for any j with height
≤ 4. We argue informally. Assume ∀y.y ≺ j → P (y). By ∀− we have i ≺ j → P (i) for any
i. Now for any i ≺ j we have i ≺ j as an axiom; hence P (i) for any such i. An application
of the progression rule yields P (j), with a derivation of height ≤ 3. Now by →+ and ω
the claim follows. 
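The derivation described in this proof can be pictured as a proof tree. The following LaTeX rendering is our own sketch (u is the assumption cancelled by →+; heights 0–5 are counted from the leaves, matching the bounds in the proof):

```latex
% Derivation of (forall y. y prec j -> P(y)) -> P(j), of height <= 4,
% to which the omega-rule is applied once more (height <= 5).
\[
\dfrac{\left(
    \dfrac{\dfrac{u\colon \forall y.\,y \prec j \to P(y)}
                 {i \prec j \to P(i)}\;\forall^{-}
           \qquad i \prec j\ \ \text{(axiom)}}
          {P(i)}\;\to^{-}
  \right)_{i \prec j}}
      {P(j)}\;\mathrm{Prog}(P)
\]
% followed by ->^+ u, and finally the omega-rule over all j yields
% \forall x.\,(\forall y.\,y \prec x \to P(y)) \to P(x).
```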
The crucial observation now is that a normal derivation of P (|β|) must essentially have
a height of at least β. However, to obtain the right estimates for our subsystems Zk we
cannot apply Lemma 13.1 down to degree 0 (i.e. to the normal form) but must stop already

at degree 1. Such derivations, i.e. those of degree ≤ 1, will be called almost normal; they
can also be analyzed easily. An almost normal derivation d in Z∞ + Prog(P) is called a
P(|~α|), ¬P(|~β|)–refutation if d derives a formula ~A → B with ~A and the free assumptions
in d among P(|~α|) := P(|α1|), . . . , P(|αm|) and ¬P(|~β|) := ¬P(|β1|), . . . , ¬P(|βn|) and
true prime formulas, and B a false prime formula or else among P(|~β|).

Lemma 13.3. Let d be an almost normal P(|~α|), ¬P(|~β|)–refutation of height ≤ |d| with
~α and ~β disjoint. Then

min(~β) ≤ |d| + lg ~α,

where lg ~α denotes the length of the list ~α.

Proof by induction on |d|. Note that we may assume that d does not contain the ω–rule,
since any application of it must be in a context ⟨d_i⟩_{i<ω} j, which can be replaced by d_j. We can
also assume that d contains ∀− only in a context where leading universal quantifiers of an
axiom are removed. Note also that d cannot derive an instance |γ| = |δ| → P(|γ|) → P(|δ|)
of an equality axiom with γ = δ true, since we have assumed that ~α and ~β are disjoint.
We distinguish cases according to the last rule in d.
Case →+. By our definition of refutations the claim follows immediately from the IH.
Case →−. Then d ≡ f^{A→(~A→B)} e^A. If A is a true prime formula, the claim follows
from the IH for f. If A is a false prime formula, the claim follows from the IH for e. If A is
¬¬P(|γ|) (and hence f ≡ Stab_P |γ| with Stab_P of type ∀x.¬¬P(x) → P(x)), then since
the level of ¬¬P(|γ|) is 2 the derivation e must end with an introduction rule, i.e.
e ≡ λu^{¬P(|γ|)} e0^⊥ (for otherwise, since no axiom contains some ¬¬P(t) as a strictly
positive subformula, we would get a contradiction against the assumption that d has degree
≤ 1). The claim now follows from the IH for e0. The only remaining case is when A is
P(|γ|). Then f is an almost normal P(|γ|), P(|~α|), ¬P(|~β|)–refutation and e is an almost
normal P(|~α|), ¬P(|~β|), ¬P(|γ|)–refutation. We may assume that γ is not among ~α, since
otherwise the claim follows immediately from the IH for f. Hence we have by the IH for f

min(~β) ≤ |f| + lg ~α + 1 ≤ |d| + lg ~α.


Case Prog(P). Then d ≡ ⟨d_δ^{P(|δ|)}⟩_{δ<γ}^{P(|γ|)}. By IH, since d_δ is a P(|~α|), ¬P(|~β|), ¬P(|δ|)
–refutation, we have for all δ < γ

min(~β, δ) ≤ |d_δ| + lg ~α < |d| + lg ~α

and hence

min(~β, γ) ≤ |d| + lg ~α. 

Now we can show the following result (cf. Mints [13] and Parsons [17]).

Theorem 13.4. Transfinite induction up to ε0 is underivable in Z, i.e.

Z ⊬ [∀x.(∀y.y ≺ x → P(y)) → P(x)] → ∀x P(x)

with a predicate symbol P , and transfinite induction up to ωk+1 is underivable in Zk , i.e.

Zk ⊬ [∀x.(∀y.y ≺ x → P(y)) → P(x)] → ∀x.x ≺ ωk+1 → P(x).

Proof. We restrict ourselves to the second part. So assume that transfinite induction up
to ωk+1 is derivable in Zk. Then by the embedding of Zk into Z∞ (Lemma 12.1) and the
normal derivability of the progressiveness of P in Z∞ + Prog(P) with finite height (Lemma
13.2) we can conclude that ∀x.x ≺ ωk+1 → P(x) is derivable in Z∞ + Prog(P) with height
< ω^m + h for some m, h < ω and degree ≤ k. Now k − 1 applications of Lemma 13.1
yield a derivation of the same formula ∀x.x ≺ ωk+1 → P(x) in Z∞ + Prog(P) with height
≤ γ := 2_{k−1}(ω^m + h) < ωk+1 and degree ≤ 1, hence also a derivation of P(|γ + 1|) in
Z∞ + Prog(P) with height ≤ γ and degree ≤ 1. But this contradicts Lemma 13.3. 
14 Normalization for arithmetic is impossible

The normalization theorem for first–order logic applied to arithmetic Z is not particularly
useful since we may have used in our derivation induction axioms of arbitrary complexity.
Hence it is tempting to first eliminate the induction scheme in favour of an induction rule
allowing us to conclude ∀x A from a derivation of A[0] and a derivation of A[S(x)] from an
additional assumption A which is cancelled at this point (note that this rule is equivalent to
the induction scheme), and then to try to normalize the resulting derivation in the new
system Z with the induction rule.
very weak form of the normalization theorem cannot hold in Z with the induction rule.

Theorem 14.1. The following weak form of a normalization theorem for Z with the
induction rule is false: For any ~u^{~A}–derivation d^B with ~A, B formulas of degree ≤ ℓ there
is a ~u^{~A}–derivation (d∗)^B containing only formulas of degree ≤ k, with k depending only
on ℓ.

Proof. Assume that such a normalization theorem would hold. Consider the formula

[∀x.(∀y.y ≺ x → P (y)) → P (x)] → ∀x.x ≺ ωn+1 → P (x)

expressing transfinite induction up to ωn+1 , which is of degree 3. By Theorem 11.1 it is


derivable in Z. Hence there exists a derivation of the same formula containing only formulas
of degree ≤ k, for some k independent of n. Hence Zk derives transfinite induction up to
ωn+1 for any n. But this clearly contradicts Theorem 13.4. 
Appendix
15 Permutative conversions

Prawitz in [19] proves strong normalization for conversion rules on proof trees for logic
with ∨ and ∃. The conversion rules not only include the usual β–rules but also so–
called permutative rules. The method Prawitz uses is based on so-called strong validity
predicates; they are a variant of Tait’s notion of a strongly computable term treated in §4.
The following exposition is based on Prawitz’ proof and on a study of it by van de
Pol [32]. It uses derivation terms instead of proof trees for a cleaner exposition. Also
the definition of an end segment is formalized. In the definition of strong validity a small
oversight in Prawitz’ formulation has been corrected.
We first define the reduction rules. In order to have a uniform notation it is convenient
to write

⊃+(u, d) instead of (λu^A d^B)^{A→B},
⊃−(d, e) instead of (d^{A→B} e^A)^B,
∧+(d, e) instead of ⟨d^A, e^B⟩^{A∧B},
∧−_i(d) instead of (π_i(d^{A0∧A1}))^{Ai},
∀+(x, d) instead of (λx^ρ d^A)^{∀x^ρ A},
∀−(d, t) instead of (d^{∀x^ρ A} t^ρ)^{A[t/x^ρ]}.

The conversion rules are

⊃−(⊃+(u, d), e) →0 d[e/u],
∧−_i(∧+(d0, d1)) →0 di,
∨−(∨+_i(d), u0, e0, u1, e1) →0 ei[d/ui],
∀−(∀+(x, d), t) →0 d[t/x],
∃−(∃+(t, d), x, u, e) →0 e[t, d/x, u],
◦−(∨−(d, u, e, v, f), ~α) →0 ∨−(d, u, ◦−(e, ~α), v, ◦−(f, ~α)),
◦−(∃−(d, x, u, e), ~α) →0 ∃−(d, x, u, ◦−(e, ~α)).

Here we have used ◦ to denote an arbitrary element of {⊃, ∧, ∨, ∀, ∃}. ~α denotes an
appropriate sequence of object– and assumption variables. For simplicity we write ∨, ∃
instead of ∨∗, ∃∗.
The right hand side of each rule is called an immediate reduct of the left hand side.
In the last two cases we call it a permutative reduct. From →0 one derives a one–step
reduction relation → as usual (cf. §4).
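As a toy illustration, the first two conversion rules can be executed on derivation terms encoded as nested tuples. The encoding, the helper names, and the simplifying assumption that all assumption variables are distinct (so that substitution needs no renaming) are our own, not from the text:

```python
def subst(d, u, e):
    """d[e/u]: replace the assumption variable u by the derivation e.
    Assumes all variable names are globally distinct (no capture handling)."""
    if d == ('var', u):
        return e
    if isinstance(d, tuple):
        return tuple(subst(c, u, e) if isinstance(c, tuple) else c for c in d)
    return d

def convert(d):
    """One conversion step at the root, for the implication and
    conjunction rules only:
      ('imp-', ('imp+', u, b), e)    ->0  b[e/u]
      ('and-', i, ('and+', d0, d1))  ->0  d_i
    Returns None if d is not convertible at the root."""
    if d[0] == 'imp-' and d[1][0] == 'imp+':
        _, (_, u, body), e = d
        return subst(body, u, e)
    if d[0] == 'and-' and d[2][0] == 'and+':
        return d[2][1 + d[1]]
    return None
```

For instance, the redex ⊃−(⊃+(u, u), e) converts to e, exactly as in the first rule above.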

We now show that → is terminating, i.e. that any reduction sequence starting with
d terminates after finitely many steps. We write d →∗ d′ (or d →+ d′ ) if d′ is a member
of a reduction sequence (a reduction sequence with at least two elements) starting with d.
Hence →∗ is the reflexive transitive closure of →.
Clearly conversion is compatible with substitution:

Lemma 1. If d → d′, then d[t] → d′[t] and d[e] → d′[e].

Proof by induction on the definition of d → d′ . 


We now define what it means that a derivation e is an end segment of the derivation
d. This is written ES(d, e), and the relation ES is defined inductively as follows.

(i) ES(d, d).

(ii) If ES(d′ , e), then ES(∃− (d, x, u, d′), e).

(iii) If ES(di , e) for some i, then ES(∨− (d, u0 , d0 , u1 , d1 ), e).

Clearly the end segment relation is transitive, i.e. ES(d, e) and ES(e, f ) imply ES(d, f ).
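The three clauses translate directly into a recursive check; the tuple encoding ('exists-', d1, x, u, d2) for ∃−(d1, x, u, d2) and ('or-', d1, u0, e0, u1, e1) for ∨−(d1, u0, e0, u1, e1) is our own:

```python
def is_end_segment(d, e):
    """ES(d, e) by clauses (i)-(iii): either e is d itself, or e is an end
    segment of a minor derivation of an exists- or or-elimination at the root."""
    if d == e:
        return True
    if isinstance(d, tuple) and d:
        if d[0] == 'exists-':   # ('exists-', d1, x, u, d2): recurse on d2
            return is_end_segment(d[4], e)
        if d[0] == 'or-':       # ('or-', d1, u0, e0, u1, e1): recurse on e0, e1
            return is_end_segment(d[3], e) or is_end_segment(d[5], e)
    return False
```

Note that the major premise d1 is never entered: only the minor derivations of ∨− and ∃− contribute end segments, which is exactly what the permutative conversions exploit.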
We now prove that an introduction in the end segment of the main premise of an exists–
elimination can always be removed by conversion.

Lemma 2. Consider ∃− (d1 , x, u, f ) and assume ES(d1 , ∃+ (t, e)). Then there exists a d2
such that ∃− (d1 , x, u, f ) →+ d2 and ES(d2 , f [t, e/x, u]).

Proof by induction on the definition of ES(d1 , ∃+ (t, e)).


Case d1 = ∃+ (t, e). Then we can carry out a proper conversion, i.e. let d2 =
f [t, e/x, u].
Case d1 = ∃− (g, y, v, d′) with ES(d′ , ∃+ (t, e)). Then we have an existential permuta-
tive conversion
∃− (d1 , x, u, f ) →0 ∃− (g, y, v, ∃−(d′ , x, u, f )).
By IH we can reduce the last term to ∃− (g, y, v, d′2) with ES(d′2 , f [t, e/x, u]). Let d2 =
∃− (g, y, v, d′2).
Case d1 = ∨− (d, u0 , d0 , u1 , d1 ) with ES(di , ∃+ (t, e)). Similar. 
Similarly we can prove that an introduction in the end segment of an or–elimination
can always be removed by conversion.

Lemma 3. Consider ∨−(d1, u0, e0, u1, e1) and assume ES(d1, ∨+_i(e)). Then there exists a
d2 such that ∨−(d1, u0, e0, u1, e1) →+ d2 and ES(d2, ei[e/ui]). 

We now define the central notion of a strongly valid derivation dA . The definition
is by induction on the number of logical symbols in A, and for fixed A it is an inductive
definition. We write SV(d) to mean that d is strongly valid.

– SV(∧+ (d, e)) if SV(d) and SV(e).

– SV(∨+_i(d)) if SV(d).

– SV(∃+ (t, d)) if SV(d).

– SV(⊃+ (u, d)) if for any e with SV(e) we have SV(d[e/u]).

– SV(∀+ (x, d)) if for any t we have SV(d[t/x]).

– If d is not an introduction, then SV(d) if the following three conditions all hold.

(a) For all d′ with d → d′ we have SV(d′ ).

(b) If d = ∨−(d1, u0, e0, u1, e1), then for i = 0 and i = 1 we have SV(ei) and for all
d′ with d1 →∗ d′ and all e with ES(d′, ∨+_i(e)) we have SV(ei[e/ui]).

(c) If d = ∃− (d1 , x, u, e), then SV(e) and for all d′ with d1 →∗ d′ and all t, f with
ES(d′ , ∃+ (t, f )) we have SV(e[t, f /x, u]).
Remark. In (c) a small oversight in Prawitz' formulation has been corrected by van
de Pol. Prawitz (in [19], p. 293) does not mention that the top formula ∃x B in the end
segment of d′ should have been obtained by ∃+ from B[t]; this is needed in the proofs
below. It could also have been obtained by

B[t] → ∃x B     B[t]
─────────────────────
        ∃x B

Lemma 4. If d → d′ and SV(d), then SV(d′ ).

Proof by induction on the definition of SV(d). If d is an introduction, then the claim


follows immediately from the IH. If d is not an introduction, then by (a) in the definition
of SV(d) we have SV(d′ ). 

Lemma 5. If SV(d), then SN(d).

Proof by induction on the definition of SV(d). If d is an introduction, then the claim


follows immediately from the IH and the fact that any assumption variable is strongly
valid. If d is not an introduction, then by definition of SV(d) we have SV(e) for any e such
that d → e. By IH we have SN(e) for any such e, and hence SN(d). 

Lemma 6. (i) If ES(d, e) and SV(d), then SV(e). (ii) If ES(d, e) and e → e′ , then there
is a d′ such that ES(d′ , e′ ) and d → d′ . (iii) If ES(d, e) and SN(d), then SN(e).

Proof. (i) is proved easily by induction over the definition of ES(d, e), using the first part
of (b) and (c) in the definition of SV(d). (ii) is also proved easily by induction over the
definition of ES(d, e). (iii) is an immediate consequence of (ii). 

Main Lemma 7. If d is an elimination, then SV(d) holds if the following conditions hold.

(i) SN(e) for any immediate subderivation e of d.

(ii) If the last rule of d is neither ∨− nor ∃− , then SV(e) for any immediate subderivation
e of d.

(iii) If the last rule of d is either ∨− or ∃−, then conditions (b) and (c) of the definition of
strong validity hold. More explicitly: (b) If d = ∨−(d1, u0, e0, u1, e1), then for i = 0
and i = 1 we have SV(ei) and for all d′ with d1 →∗ d′ and all e with ES(d′, ∨+_i(e)) we
have SV(ei[e/ui]); (c) If d = ∃−(d1, x, u, e), then SV(e) and for all d′ with d1 →∗ d′
and all t, f with ES(d′, ∃+(t, f)) we have SV(e[t, f/x, u]).

Proof. To any elimination d we assign an induction value, which is a triple (k, ℓ, m) of


natural numbers where

k = the length of the longest reduction sequence from the major premise of d,
ℓ = the depth of the major premise of d,
m = the sum of the lengths of the longest reduction sequences from the immediate
subderivations of d.

The induction values are ordered lexicographically.


Let d be an elimination such that (i)–(iii) hold. Let (k, ℓ, m) be the induction value.
Note that k and m are finite by assumption (i). By the definition of SV(d) it suffices to
show SV(d′ ) for all d′ such that d → d′ . We distinguish three cases.
I. d′ is obtained from d by reducing a proper subderivation of d. Let (k ′ , ℓ′ , m′ ) be
the induction value of d′ . If the major premise of d was reduced, then k > k ′ . Otherwise
k = k ′ , ℓ = ℓ′ and m > m′ . So the induction value is lowered and we can use the IH,
which says that it is enough to prove (i), (ii) and (iii) for d′. (i) follows from (i) for d.
(ii) follows from (ii) for d, using Lemma 4. For (iii) we only treat the existential case.
So let d = ∃− (d1 , x, u, e1 ). Then d′ = ∃− (d2 , x, u, e2 ) with d1 → d2 and e1 = e2 , or
d1 = d2 and e1 → e2 . We must show SV(e2 ) and for all d′ with d2 →∗ d′ and all t, f with
ES(d′ , ∃+ (t, f )) we have SV(e2 [t, f /x, u]). SV(e2 ) holds, since by assumption d satisfies (iii)
(we need Lemma 4 here in case e1 → e2 ). Now let d2 →∗ d′ and t, f with ES(d′ , ∃+ (t, f ))
be given. Then also d1 →∗ d′ . With (iii) for d it follows that SV(e1 [t, f /x, u]). Then also
SV(e2 [t, f /x, u]) (we need Lemmata 1 and 4 here in case e1 → e2 ).
II. d′ is a proper reduct of d. Here the claim follows easily from the assumptions
(i)–(iii) and the definition of strong validity.
Case ⊃− (⊃+ (u, d), e) →0 d[e/u]. We must prove SV(d[e/u]). By (ii) we know
SV(⊃+ (u, d)) and SV(e). Hence SV(d[e/u]) by the definition of strong validity.
Case ∧−_i(∧+(d0, d1)) →0 di. We must prove SV(di). By (ii) we know SV(∧+(d0, d1)).
Hence SV(di) by the definition of strong validity.

Case ∀− (∀+ (x, d), t) →0 d[t/x]. We must prove SV(d[t/x]). By (ii) we know that
SV(∀+ (x, d)). Hence SV(d[t/x]) by the definition of strong validity.
Case ∨−(∨+_i(d), u0, e0, u1, e1) →0 ei[d/ui]. We must prove SV(ei[d/ui]). This follows
immediately from (iii), condition (b).
Case ∃− (∃+ (t, d), x, u, e) →0 e[t, d/x, u]. Similar.
III. d′ is a permutative reduct of d. We only treat the case of an existential permutative
conversion. Then

d = ◦−(∃−(d1, x, u, d2), ~α),
d′ = ∃−(d1, x, u, ◦−(d2, ~α)).

Let (k ′ , ℓ′ , m′ ) be the induction value of d′ . We will show in (i) below that k ′ and m′
are finite. We have to prove SV(d′ ). Note that the major premise of d′ is an immediate
subderivation of the major premise of d. Therefore k ≥ k ′ and ℓ > ℓ′ . So the induction
value is lowered and we can use the IH, which says that it is enough to prove (i) and (iii)
for d′ .
(i). SN(d1) clearly follows from (i) for d. SN(◦−(d2, ~α)) follows by Lemma 5 from
SV(◦−(d2, ~α)), which will be proved as (∗) below.
(iii). We have to show (∗) SV(◦−(d2, ~α)) and (∗∗): if d1 →∗ d′1 with ES(d′1, ∃+(t, f)),
then SV(◦−(d2[t, f/x, u], ~α)). First note the following facts.

SN(~α), SN(∃−(d1, x, u, d2)) and SN(d2). (F1)

This follows from assumption (i) for d. We also have

If ◦ ∈ {∧, ⊃, ∀}, then SV(~α), SV(∃−(d1, x, u, d2)) and SV(d2). (F2)

The first and second follow from assumption (ii) for d, and the third follows from the
second by the definition of strong validity. Now we can prove (∗) and (∗∗).
(∗). Let (k1, ℓ1, m1) be the induction value of ◦−(d2, ~α). Then k ≥ k1 and ℓ > ℓ1.
Hence we can apply the IH, which says that it suffices to prove (i)–(iii) for ◦−(d2, ~α). (i)
follows from (F1) and (ii) follows from (F2). For (iii) we only treat the case ◦−(d2, ~α) =
∃−(d2, y, v, d3); the case with ∨ is similar. Then

d = ∃− (∃− (d1 , x, u, d2), y, v, d3),


d′ = ∃− (d1 , x, u, ∃− (d2 , y, v, d3)).

We have to prove SV(d3 ) and for all d′2 with d2 →∗ d′2 and all s, g such that ES(d′2 , ∃+ (s, g))
we have SV(d3 [s, g/y, v]). The first follows from (c1 ) for d. For the second assume
d2 →∗ d′2 and ES(d′2 , ∃+ (s, g)). Then SV(d3 [s, g/y, v]) follows from (c2 ) for d, since
∃− (d1 , x, u, d2) →∗ ∃− (d1 , x, u, d′2) and ES(∃− (d1 , x, u, d′2 ), ∃+ (s, g)).
92 15 Permutative conversions
(∗∗). Let d1 →∗ d′1 and ES(d′1, ∃+(t, f)). We must show SV(□−(d2[t, f/x, u], ~α)). Let (k2, ℓ2, m2) be the induction value of □−(d2[t, f/x, u], ~α). We will need yet another fact:

    There is an e such that ∃−(d1, x, u, d2) →+ e and ES(e, d2[t, f/x, u]).    (F3)

This follows from Lemma 2, since d1 →∗ d′1 and ES(d′1, ∃+(t, f)). From (F3) we immediately obtain k > k2, using Lemma 6. Clearly m2 is finite by (F1). So the induction value is lowered and we can use the IH, which says that it is enough to prove (i), (ii) and (iii) for □−(d2[t, f/x, u], ~α).
(i). SN(~α) holds by (F1). SN(d2[t, f/x, u]) follows from (F3), Lemma 6(iii) and (F1).
(ii). Let □ ∈ {∧, ⊃, ∀}. Then we obtain SV(~α) by (F2). SV(d2[t, f/x, u]) follows from SV(∃−(d1, x, u, d2)) (which holds by (F2)) by (c) in the definition of strong validity (since d1 →∗ d′1 and ES(d′1, ∃+(t, f))).
(iii). We only treat the case □−(d2, ~α) = ∃−(d2, y, v, d3); the case with ∨ is similar. Then

    d = ∃−(∃−(d1, x, u, d2), y, v, d3),
    d′ = ∃−(d1, x, u, ∃−(d2, y, v, d3)).
We have to prove SV(d3) and for all d′2 with d2[t, f/x, u] →∗ d′2 and all s, g such that ES(d′2, ∃+(s, g)) we have SV(d3[s, g/y, v]). The first follows from (c1) for d. For the second assume d2[t, f/x, u] →∗ d′2 and ES(d′2, ∃+(s, g)). We must show SV(d3[s, g/y, v]). By (F3) we have an e such that

    ∃−(d1, x, u, d2) →+ e and ES(e, d2[t, f/x, u]).

But then, because of d2[t, f/x, u] →∗ d′2, Lemma 6(ii) gives us an e′ such that

    ∃−(d1, x, u, d2) →+ e′ and ES(e′, d′2).
Because of ES(d′2, ∃+(s, g)) and the transitivity of the end segment relation we have ES(e′, ∃+(s, g)). Therefore the major premise of d reduces to e′ with ∃+(s, g) in its end segment. By (c2) for d we can conclude that the side premise of d with s, g substituted is strongly valid, i.e. SV(d3[s, g/y, v]). ∎
A derivation d is called strongly valid under substitution (written SV∗ (d)) if for any ob-
ject terms ~t and strongly valid derivations f~ we have SV(d[~t, f~]). Using the Main Lemma 7
we can now prove
Lemma 8. Any derivation d is strongly valid under substitution.
Proof by induction over d. Case u. Then SV(u) by definition.
Cases ∧+(d, e), ∨+i(d), ∃+(t, d). The claim follows immediately from the definition of strong validity and the IH.
Case ⊃+(u, d). Let f~ be strongly valid. We have to show that ⊃+(u, d[~t, f~]) is strongly valid. So let e be strongly valid. We have to show that d[~t, e, f~] is strongly valid. But this follows from the IH for d.
Case ∀+(x, d). Similar.
Case □−(~α). Let f~ be strongly valid. We have to prove (i), (ii) and (iii) of the Main Lemma 7 for □−(~α)[~t, f~]. By IH SV(αi[~t, f~]) for every immediate subderivation αi, and by Lemma 5 we can conclude that SN(αi[~t, f~]). This proves (i) and (ii). For (iii) we have to prove conditions (a), (b) and (c) of the definition of strong validity. We only treat (c). So let d = ∃−(d1, x, u, e), hence d[~t, f~] = ∃−(d1[~t, f~], x, u, e[~t, f~]). Let d1[~t, f~] →∗ d′ and ES(d′, ∃+(t, f)). We must show SV(e[t, ~t, f, f~]). Since SV(d1[~t, f~]) by IH, we get SV(d′) by Lemma 4, hence SV(∃+(t, f)) by Lemma 6(i) and hence SV(f) by the definition of strong validity. Now from the IH for e we obtain SV(e[t, ~t, f, f~]). ∎
From Lemma 5 and Lemma 8 we immediately obtain
Theorem. → is terminating, i.e. any derivation d is strongly normalizable. ∎
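The termination claim can be exercised on the toy tuple encoding used earlier: a leftmost-outermost strategy that keeps contracting permutative redexes always halts. The driver below is again only a sketch under our assumed encoding (permutative steps only), counting the steps until a normal form is reached.

```python
# Leftmost-outermost rewriting with the existential permutative conversion;
# strong normalization guarantees that the while-loop terminates.

ELIM_TAGS = {'AndE0', 'AndE1', 'ImpE', 'AllE', 'OrE', 'ExE'}

def step(d):
    """Contract one permutative redex somewhere in d, or return None."""
    if not isinstance(d, tuple):
        return None
    if d[0] in ELIM_TAGS and isinstance(d[1], tuple) and d[1][0] == 'ExE':
        _, d1, x, u, d2 = d[1]
        return ('ExE', d1, x, u, (d[0], d2) + d[2:])
    for i, part in enumerate(d):
        r = step(part)
        if r is not None:
            return d[:i] + (r,) + d[i + 1:]
    return None

def normalize(d):
    """Iterate step() to a normal form, counting the reduction steps."""
    steps = 0
    r = step(d)
    while r is not None:
        d, steps = r, steps + 1
        r = step(d)
    return d, steps

d = ('ImpE', ('ExE', 'a', 'x', 'u', ('ExE', 'b', 'y', 'v', 'c')), 'arg')
nf, n = normalize(d)
# Two permutative steps push ImpE through both ExE's:
# nf == ('ExE', 'a', 'x', 'u', ('ExE', 'b', 'y', 'v', ('ImpE', 'c', 'arg')))
```

The loop needs no step bound precisely because of the theorem: every reduction sequence from a given derivation is finite.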
References
[1] Heinz Bachmann. Transfinite Zahlen. Springer, Berlin, 1955.
[2] Ulrich Berger and Helmut Schwichtenberg. An inverse of the evaluation functional for typed λ–calculus. In Rao Vemuri, editor, Proceedings of the Sixth Annual IEEE Symposium on Logic in Computer Science, pages 203–211. IEEE Computer Society Press, Los Alamitos, 1991.
[3] Kim B. Bruce, Albert R. Meyer, and John C. Mitchell. The semantics of second order
lambda-calculus. Information and Computation, 85:76–134, 1990.
[4] William Clinger and Jonathan Rees (editors), with H. Abelson, N.I. Adams IV, D.H. Bartley, G. Brooks, R.K. Dybvig, D.P. Friedman, R. Halstead, C. Hanson, C.T. Haynes, E. Kohlbecker, D. Oxley, K.M. Pitman, G.J. Rozas, G.L. Steele Jr., G.J. Sussman, and M. Wand. Revised⁴ Report on the Algorithmic Language Scheme, 1991. Appeared in ACM Lisp Pointers IV, July–September 1991, and also as MIT AI Memo 848b. It can be obtained by anonymous ftp at the two Scheme repositories, altdorf.ai.mit.edu and nexus.yorku.ca.
[5] Catarina Coquand. From semantics to rules: A machine assisted analysis. Submitted
to CSL’93, September 1993.
[6] Roberto Di Cosmo and Delia Kesner. A confluent reduction for the extensional typed
λ–calculus with pairs, sums, recursion and terminal object. In ICALP 93, number 700
in Lecture Notes in Computer Science, pages 645–656, Heidelberg, 1993. Springer.
[7] Roberto Di Cosmo and Delia Kesner. Simulating expansions without expansions. Rapport de recherche 1911, INRIA, Rocquencourt, May 1993.
[8] Daniel J. Dougherty. Some lambda calculi with categorical sums and products. In Proceedings of the Fifth International Conference on Rewriting Techniques and Applications (RTA), number 690 in Lecture Notes in Computer Science, pages 137–151. Springer, 1993.
[9] Yuri L. Ershov. Model C of partial continuous functionals. In R. Gandy and M. Hy-
land, editors, Logic Colloquium 1976, pages 455–467. North Holland, Amsterdam,
1977.
[10] Harvey Friedman. Classically and intuitionistically provably recursive functions. In
Dana S. Scott and Gert H. Müller, editors, Higher Set Theory, pages 21–28. Springer
Lecture Notes in Mathematics, Volume 699, 1978.
[11] Gerhard Gentzen. Untersuchungen über das logische Schließen. Mathematische
Zeitschrift, 39:176–210, 405–431, 1934.
[12] Gerhard Gentzen. Beweisbarkeit und Unbeweisbarkeit von Anfangsfällen der trans-
finiten Induktion in der reinen Zahlentheorie. Mathematische Annalen, 119:140–161,
1943.
[13] G.E. Mints. Exact estimates of the provability of transfinite induction in the initial
segments of arithmetic. Journal of Soviet Math, 1:85–91, 1973. Translated from
Zapiski Nauch. Sem. Leningrad 20, 134–144 (1971).
[14] G.E. Mints. On E–theorems (in Russian). Zapiski, 40:110–118, 1974.
[15] M.H.A. Newman. On theories with a combinatorial definition of “equivalence”. Annals
of Mathematics, 43(2):223–243, 1942.
[16] V. P. Orevkov. Lower bounds for increasing complexity of derivations after cut elim-
ination. Zapiski Nauchnykh Seminarov Leningradskogo, 88:137–161, 1979.
[17] Charles Parsons. Transfinite induction in subsystems of number theory (abstract).
Journal of Symbolic Logic, 38(3):544–545, 1973.
[18] Gordon D. Plotkin. LCF considered as a programming language. Theoretical Com-
puter Science, 5:223–255, 1977.
[19] Dag Prawitz. Ideas and results in proof theory. In Jens Erik Fenstad, editor, Pro-
ceedings of the second Scandinavian Logic Symposium, pages 235–307. North–Holland,
Amsterdam, 1971.
[20] Helmut Schwichtenberg. Normalization. In F.L. Bauer, editor, Logic, Algebra and
Computation. Proceedings of the International Summer School Marktoberdorf, Ger-
many, July 25 – August 6, 1989, Series F: Computer and Systems Sciences, Vol. 79,
pages 201–237, Berlin, 1991. NATO Advanced Study Institute, Springer.
[21] Helmut Schwichtenberg. Primitive recursion on the partial continuous functionals. In
Manfred Broy, editor, Informatik und Mathematik, pages 251–269. Springer, Berlin,
1991.
[22] Helmut Schwichtenberg. Minimal from classical proofs. In E. Börger, G. Jäger,
H. Kleine-Büning, and M.M. Richter, editors, Computer Science Logic, pages 326–
328. Springer LNCS 626, 1992.
[23] Helmut Schwichtenberg. Proofs as programs. In Peter Aczel, Harold Simmons, and
Stanley S. Wainer, editors, Proof Theory. A selection of papers from the Leeds Proof
Theory Programme 1990, pages 81–113. Cambridge University Press, 1992.
[24] Helmut Schwichtenberg. Density and choice for partial continuous functionals. In
preparation, 1993.
[25] Helmut Schwichtenberg. Logikprogrammierung. Vorlesungsmanuskript, Universität
München, 1994.
[26] Helmut Schwichtenberg and Stan Wainer. Ordinal bounds for programs. Submitted
for publication in: Feasible Mathematics II (ed. Jeff Remmel), August 1993.
[27] Dana Scott. Domains for denotational semantics. In E. Nielsen and E. M. Schmidt,
editors, Automata, Languages and Programming, Lecture Notes in Computer Science,
Volume 140, pages 577–613. Springer, Berlin, 1982. A corrected and expanded version
of a paper prepared for ICALP’82, Aarhus, Denmark.
[28] Martin Stein. Interpretationen der Heyting–Arithmetik endlicher Typen. PhD thesis,
Universität Münster, Fachbereich Mathematik, 1976.
[29] William W. Tait. Infinitely long terms of transfinite type. In J. Crossley and M. Dum-
mett, editors, Formal Systems and Recursive Functions, pages 176–185, Amsterdam,
1965. North–Holland.
[30] Anne S. Troelstra, editor. Metamathematical Investigations of Intuitionistic Arith-
metic and Analysis, volume 344 of Lecture Notes in Mathematics. Springer, Berlin,
1973.
[31] Anne S. Troelstra and Dirk van Dalen. Constructivism in Mathematics. An Intro-
duction, volume 121, 123 of Studies in Logic and the Foundations of Mathematics.
North–Holland, Amsterdam, 1988.
[32] Jaco van de Pol. Strong normalization of fol with permutative conversions. Manuscript, May 1994.
[33] Anton Wallner. Komplexe Existenzbeweise in der Arithmetik. Master's thesis, Mathematisches Institut der Universität München, 1993.
[34] Hermann Weyl. Über die neue Grundlagenkrise der Mathematik. Mathematische
Zeitschrift, 10, 1921.
Index

∃∗–rules, 39
∨∗–rules, 41

almost normal, 84
assumption variable, 6
atom, 2
atomic formula, 2

beta–eta–conversion, 15
beta–eta–normal, 16
branch, 20

classical logic, 9
closed, 2
closed derivation term, 8
Coincidence Lemma, 5
constant, 1
conversion relation, 15
critical, 65

decidable, 51, 65
deduction theorem, 13
definite Horn formula, 25
delta–expansion, 23
derivable formula, 8
derivation term, 6
disjunction property, 41

end segment, 88
environment, 4
eta–expansion, 21
eta–expansor, 22
Ex–falso–quodlibet axiom, 10
Ex–falso–quodlibet Lemma, 11
existence property, 41
Existence–Elimination–Lemma, 35
Existence–Introduction–Lemma, 35
extensionality axioms, 52

formula, 1
free assumption variable, 6
function symbol, 1

generalized definite Horn formula, 25

Harrop formula, 43
Herbrand's theorem, 20
Heyting, 38
Hilbert system, 13
Horn formula, 25

immediate reduct, 87
immediate subformula, 20
indirect proof, 10
inner delta–expansion, 23
instance, 41
intuitionistic logic, 9

judgement, 43

level, 1, 76
long normal form, 21

minimal logic, 6
Mints formula, 10
modified realizability interpretation, 43

natural deduction, 6
normalization by evaluation, 32
number, 49

one–step reduction relation, 16, 87
Orevkov, 34

parallel reduction relation, 27
Peano–axioms, 76
Peirce formula, 10
permutative reduct, 87
position, 19
    elimination, 19
    end, 19
    independent, 19
    introduction, 19
    leaf, 19
    minimal, 19
progression rule, 83
progressive, 78
propositional symbol, 2

relation symbol, 2
relevant, 66

SLD–Resolution, 25
stability axiom, 10
Stability Lemma, 10
strong computability predicates, 16
strong delta–expansion, 23
strong disjunction, 40
strong existential quantifier, 38
strongly valid, 88
strongly valid under substitution, 92
subformula, 20
subformula property, 20
substitution, 3
Substitution Lemma, 5
surjectivity of pairing, 27

term, 1
type, 1
types associated with A, 43

uniqueness of normal forms, 27

variable, 1

Weyl, 38