Proofs and Computations
Driven by the question “What is the computational content of a (formal) proof?”, this
book studies fundamental interactions between proof theory and computability. It
provides a unique self-contained text for advanced students and researchers in
mathematical logic and computer science.
Part 1 covers basic proof theory, computability and Gödel’s theorems. Part 2 studies
and classifies provable recursion in classical systems, from fragments of Peano
arithmetic up to Π11 -CA0 . Ordinal analysis and the (Schwichtenberg–Wainer)
subrecursive hierarchies play a central role, and are used in proving the “modified
finite Ramsey” and “extended Kruskal” independence results for PA and Π11 -CA0 .
Part 3 develops the theoretical underpinnings of the first author’s proof-assistant
MINLOG. Three chapters cover higher-type computability via information systems, a
constructive theory TCF of computable functionals, realizability, Dialectica
interpretation, computationally significant quantifiers and connectives, and polytime
complexity in a two-sorted, higher-type arithmetic with linear logic.
HELMUT SCHWICHTENBERG is an Emeritus Professor of Mathematics at
Ludwig-Maximilians-Universität München. He has recently developed the
“proof-assistant” MINLOG, a computer-implemented logic system for proof/program
development and extraction of computational content.
STANLEY S. WAINER is an Emeritus Professor of Mathematics at the University of
Leeds and a past President of the British Logic Colloquium.
PERSPECTIVES IN LOGIC
The Perspectives in Logic series publishes substantial, high-quality books whose central
theme lies in any area or aspect of logic. Books that present new material not now
available in book form are particularly welcome. The series ranges from introductory
texts suitable for beginning graduate courses to specialized monographs at the frontiers
of research. Each book offers an illuminating perspective for its intended audience.
The series has its origins in the old Perspectives in Mathematical Logic series edited
by the Ω-Group for “Mathematische Logik” of the Heidelberger Akademie der Wis-
senschaften, whose beginnings date back to the 1960s. The Association for Symbolic
Logic has assumed editorial responsibility for the series and changed its name to reflect
its interest in books that span the full range of disciplines in which logic plays an
important role.
Thomas Scanlon, Managing Editor
Department of Mathematics, University of California Berkeley
Editorial Board:
Michael Benedikt
Department of Computing Science, University of Oxford
Stephen A. Cook
Computer Science Department, University of Toronto
Michael Glanzberg
Department of Philosophy, University of California Davis
Antonio Montalbán
Department of Mathematics, University of Chicago
Michael Rathjen
School of Mathematics, University of Leeds
Simon Thomas
Department of Mathematics, Rutgers University
ASL Publisher
Richard A. Shore
Department of Mathematics, Cornell University
For more information, see www.aslonline.org/books_perspectives.html
PERSPECTIVES IN LOGIC
Proofs and Computations
HELMUT SCHWICHTENBERG
Ludwig-Maximilians-Universität München
STANLEY S. WAINER
University of Leeds
association for symbolic logic
cambridge university press
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Tokyo, Mexico City
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521517690
Association for Symbolic Logic
Richard Shore, Publisher
Department of Mathematics, Cornell University, Ithaca, NY 14853
http://www.aslonline.org
© Association for Symbolic Logic 2012
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2012
Printed in the United Kingdom at the University Press, Cambridge
A catalogue record for this publication is available from the British Library
ISBN 978-0-521-51769-0 Hardback
Cambridge University Press has no responsibility for the persistence or
accuracy of URLs for external or third-party internet websites referred to in
this publication, and does not guarantee that any content on such websites is,
or will remain, accurate or appropriate.
To Ursula and Lib
for their love and patience
In memory of our teachers
Dieter Rödding (1937–1984)
Martin H. Löb (1921–2006)
CONTENTS
Preface xi
Preliminaries 1
Part 1. Basic proof theory and computability
Chapter 1. Logic 5
1.1. Natural deduction 6
1.2. Normalization 20
1.3. Soundness and completeness for tree models 44
1.4. Soundness and completeness of the classical fragment 52
1.5. Tait calculus 57
1.6. Notes 59
Chapter 2. Recursion theory 61
2.1. Register machines 61
2.2. Elementary functions 65
2.3. Kleene’s normal form theorem 73
2.4. Recursive definitions 78
2.5. Primitive recursion and for-loops 84
2.6. The arithmetical hierarchy 90
2.7. The analytical hierarchy 94
2.8. Recursive type-2 functionals and well-foundedness 98
2.9. Inductive definitions 102
2.10. Notes 110
Chapter 3. Gödel’s theorems 113
3.1. IΔ0 (exp) 114
3.2. Gödel numbers 123
3.3. The notion of truth in formal theories 133
3.4. Undecidability and incompleteness 135
3.5. Representability 137
3.6. Unprovability of consistency 141
3.7. Notes 145
Part 2. Provable recursion in classical systems
Chapter 4. The provably recursive functions of arithmetic 149
4.1. Primitive recursion and IΣ1 151
4.2. ε0 -recursion in Peano arithmetic 157
4.3. Ordinal bounds for provable recursion in PA 173
4.4. Independence results for PA 185
4.5. Notes 192
Chapter 5. Accessible recursive functions, ID<ω and Π11 -CA0 195
5.1. The subrecursive stumbling block 195
5.2. Accessible recursive functions 199
5.3. Proof-theoretic characterizations of accessibility 215
5.4. ID<ω and Π11 -CA0 231
5.5. An independence result: extended Kruskal theorem 237
5.6. Notes 245
Part 3. Constructive logic and complexity
Chapter 6. Computability in higher types 249
6.1. Abstract computability via information systems 249
6.2. Denotational and operational semantics 266
6.3. Normalization 290
6.4. Computable functionals 296
6.5. Total functionals 304
6.6. Notes 309
Chapter 7. Extracting computational content from proofs 313
7.1. A theory of computable functionals 313
7.2. Realizability interpretation 327
7.3. Refined A-translation 352
7.4. Gödel’s Dialectica interpretation 367
7.5. Optimal decoration of proofs 380
7.6. Application: Euclid’s theorem 388
7.7. Notes 392
Chapter 8. Linear two-sorted arithmetic 395
8.1. Provable recursion and complexity in EA(;) 397
8.2. A two-sorted variant T(;) of Gödel’s T 404
8.3. A linear two-sorted variant LT(;) of Gödel’s T 412
8.4. Two-sorted systems A(;), LA(;) 422
8.5. Notes 428
Bibliography 431
Index 457
PREFACE
This book is about the deep connections between proof theory and recur-
sive function theory. Their interplay has continuously underpinned and
motivated the more constructively orientated developments in mathemati-
cal logic ever since the pioneering days of Hilbert, Gödel, Church, Turing,
Kleene, Ackermann, Gentzen, Péter, Herbrand, Skolem, Malcev, Kol-
mogorov and others in the 1930s. They were all concerned in one way or
another with the links between logic and computability. Gödel’s theorem
utilized the logical representability of recursive functions in number the-
ory; Herbrand’s theorem extracted explicit loop-free programs (sets of wit-
nessing terms) from existential proofs in logic; Ackermann and Gentzen
analysed the computational content of ε-reduction and cut-elimination
in terms of transfinite recursion; Turing not only devised the classical
machine-model of computation, but (what is less well known) already
foresaw the potential of transfinite induction as a method for program
verification; and of course the Herbrand–Gödel–Kleene equation calcu-
lus presented computability as a formal system of equational derivation
(with “call by value” being modelled by a substitution rule which itself is
a form of “cut” but at the level of terms).
That these two fields—proof and recursion—have developed side by
side over the intervening seventy-five years so as to form now a cor-
nerstone in the foundations of computer science, testifies to the power
and importance of mathematical logic in transferring what was orig-
inally a body of philosophically inspired ideas and results, down to
the frontiers of modern information technology. A glance through the
contents of any good undergraduate text on the fundamentals of com-
puting should lend conviction to this argument, but we hope also that
some of the examples and applications in this book will support it fur-
ther.
Our book is not about “technology transfer” however, but rather about
a fundamental area of mathematical logic which underlies it. We would
not presume to compete with “classics” in the field like Kleene’s [1952],
Schütte’s [1960], Takeuti’s [1987], Girard’s [1987] or Troelstra and van
Dalen’s [1988], but rather we aim to complement them and extend their
range of proof-theoretic applications with a treatment of topics which re-
flect our own personal interests over many years and include some which
have not previously been covered in textbook form. Our contribution
could be seen as building on the books by Rose [1984] and Troelstra and
Schwichtenberg [2000]. Thus the theory of proofs, recursions, provably
recursive functions, their subrecursive hierarchy classifications, and the
computational significance and application of these, will constitute the
driving theme. The methods will be those now-classical ones of cut elimi-
nation, normalization, functional interpretations, program extraction and
ordinal analyses, but restricted to the “small-to-medium-sized” range of
mathematically significant proof systems between polynomial time arith-
metic and (restricted) Π11 -comprehension or ID<ω . Within this range we
hope to have something new to contribute. Beyond it, the “outer limits”
of ordinal analysis and the emerging connections there with large cardinal
theory are presently undergoing rapid and surprising development. Who
knows where that will lead?—others are far better equipped to comment.
The fundamental point of proof theory as we see it is Kreisel’s dictum:
a proof of a theorem conveys more information than the mere statement
that it is true (at least it does if we know how to analyse its structure). In
a computational context, knowledge of the truth of a “program specifica-
tion”
∀x∈N ∃y∈N Spec(x, y)
tells us that there is a while-program
y := 0; while ¬Spec(x, y) do y := y + 1; p := y
which satisfies it in the sense that
∀x∈N Spec(x, p(x)).
However, we know nothing about the complexity of the program without
knowing why the specification was true in the first place. What we need is a
proof! Even when we have one it might use lemmas of logical complexity
far greater than Σ01 , and this would prevent us from analysing directly
the computational structure embedded within it. So what is required is a
method of reducing the proof and the applications of lemmas in it, to a
“computational” (Σ01 ) form, together with some means of measuring the
cost or complexity of that reduction. The method is cut elimination or
normalization and the measurement is achieved by ordinal analysis.
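To make the specification concrete, here is a minimal Haskell sketch of the while-program above, under the assumption that Spec is given by some decidable test; the instance `spec` below is our own toy example, not from the book.

    -- Unbounded search for the least witness y with Spec(x, y).
    spec :: Integer -> Integer -> Bool
    spec x y = y * y >= x          -- toy instance: y bounds the square root of x

    p :: Integer -> Integer        -- y := 0; while ¬Spec(x, y) do y := y + 1; p := y
    p x = head [y | y <- [0 ..], spec x y]

    main :: IO ()
    main = print (map p [0, 2, 10, 50])   -- prints [0,2,4,8]

The point of the passage is visible in the code: `p` terminates for every x only because this particular instance of the specification happens to be true everywhere, and nothing in the program itself tells us how long the search will run.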
One may wonder why transfinite ordinals enter into the measurement
of program complexity. The reason is this: a program, say over the
natural numbers, is a syntactic description of a type-2 recursive functional
which takes variable “given functions” g to output functions f. By
unravelling the intended operation of the program according to the various
function calls it makes in the course of evaluation, one constructs a tree of
subcomputations, each branch of which is determined by an input number
for the function f being computed together with a particular choice of
given function g. To say that the program “terminates everywhere” is to
say that every branch of the computation tree ends with an output value
after finitely many steps. Thus
termination = well-foundedness.
But what is the obvious way to measure the size of an infinite well-founded
tree? Of course, by its ordinal height or rank!
We thus have a natural hierarchy of total recursive functionals in terms
of the (recursive) ordinal ranks of their defining programs. Kleene was
already aware in 1958 that this hierarchy continues to expand throughout
the recursive ordinals—i.e., for each recursive ordinal α there is a total
recursive functional which cannot be defined by any program of rank < α.
The “subrecursive classification problem” therefore has a perfectly natural
and satisfying solution when viewed in the light of type-2 functionals,
in stark contrast to the rather disappointing state of affairs in the case
of type-1 functions—where “intensionality” and the question “what is
a natural well-ordering?” are stumbling blocks which have long been a
barrier to achieving any useful hierarchy classification of all recursive
functions (in one go). Nevertheless there has been good progress in
classifying subclasses of the recursive functions which arise naturally in
a proof-theoretic context, and the later parts of this book will be much
concerned with this.
Proofs in mathematics generally deal with abstract, “higher-type” ob-
jects. Therefore an analysis of computational aspects of such proofs
must be based on a theory of computation in higher types. A mathe-
matically satisfactory such theory has been provided by Scott [1970] and
Ershov [1977] (see also Chernov [1976]). The basic concept is that of
a partial continuous functional. Since each such can be seen as a limit
of its finite approximations, we get for free the notion of a computable
functional: it is given by a recursive enumeration of finite approximations.
The price to pay for this simplicity is that functionals are now partial, in
stark contrast to the view of Gödel [1958]. However, the total function-
als can be defined as a subset of the partial ones. In fact, as observed
by Kreisel, they form a dense subset with respect to the Scott topology.
The next step is to build a theory, with the partial continuous functionals
as the intended range of its (typed) variables. This is TCF, a “theory
of computable functionals”. It suffices to restrict the prime formulas to
those built with inductively defined predicates. For instance, falsity can
be defined by F := Eq(ff, tt), where Eq is the inductively defined Leib-
niz equality. The only logical connectives are implication and universal
quantification: existence, conjunction and disjunction can all be seen as
inductively defined (with parameters). TCF is well suited to reflect on
the computational content of proofs, along the lines of the Brouwer–
Heyting–Kolmogorov interpretation, or more technically a realizability
interpretation in the sense of Kleene and Kreisel. Moreover the computa-
tional content of classical (or “weak”) existence proofs can be analyzed in
TCF, by way of Gödel’s [1958] Dialectica interpretation and the so-called
A-translation of Friedman [1978] and Dragalin [1979]. The difference
between TCF and well-established theories like Martin-Löf’s [1984] intuitionis-
tic type theory or the theory of constructions underlying the Coq proof
assistant is that TCF treats partial continuous functionals as first-class
citizens. Since they are the mathematically correct domain of computable
functionals, it seems (to us) that this is a reasonable step to take.
Our aim is to bring these issues together as two sides of the same coin:
on one the proof-theoretic aspects of computation, and on the other the
computational aspects of proof. We shall try to do this in progressive
stages through three distinct parts, keeping in mind that we want the
book to be self-contained, orderly and fairly complete in its presentation
of our material, and also useful as a reference. Thus we begin with two
basic chapters on proof theory and recursion theory, followed by Chap-
ter 3 on Gödel’s theorems, providing the fundamental material without
which any book with this title would be incomplete. Part 2 deals with the
now fairly classical results on hierarchies of provably recursive functions
for a spectrum of theories ranging between IΔ0 (exp) and Π11 -CA0 . The
point is that, just as in other areas of mathematical logic, ordinals (in
our case recursive ordinals) provide a fundamental abstract mathematical
scale against which we can measure and compare the logical complexity of
inductive proofs and the computational complexity of recursive programs
specified by them. The bridge is formed by the fast-, medium- and slow-
growing hierarchies of proof-theoretic bounding functions which are quite
naturally associated with the ordinals themselves, and which also “model”
in a clear way the basic computational paradigms: “functional”, “while-
loop” and “term-reduction”. We also bring out connections between
fast-growing functions and combinatorial independence results such as
the modified finite Ramsey theorem and the extended Kruskal theorem,
for labelled trees. Part 3 develops the fundamental theory of computable
functionals TCF. This is also the theory underlying the first author’s
proof-assistant and program-extraction system Minlog1 . The implemen-
tation is not discussed here, but the underlying proof-theoretic ideas and
the various aspects of constructive logic involved are dealt with in some
detail. Thus: the domain of continuous functionals in which higher-
type computation naturally arises, functional interpretations, and finally
implicit complexity, where ideas developed throughout the whole book
are brought to bear on certain newer weak systems with more “feasible”
1 See http://www.minlog-system.de.
provable functions. Every chapter is intended to contain some examples
or applications illustrating our intended theme: the link between proof
theory, recursion and computation.
Although we have struggled with this book project over many years, we
have found the writing of it more and more stimulating as it got closer to
fruition. The reason for this has been a divergence of our mathematical
“standpoints”—while one (S.W.) holds to a more pragmatic middle-of-
the-road stance, the other (H.S.) holds a somewhat clearer and committed
constructive view of the mathematical world. The difference has led
to many happy hours of dispute and this inevitably may be evident in
the choice of topics and their presentations which follow. Despite these
differences, both authors believe (to a greater or lesser extent) that it is a
rather extreme position to hold that existence is really equivalent to the
impossibility of non-existence. Foundational studies—even if classically
inspired—should surely investigate these positions to see what relative
properties the “strong” (∃) and “weak” (∃˜ ) existential quantifiers might
possess.
A brief guide to the reader. Part 1 could be the basis for a year-long
graduate logic course, possibly extended by selected material from parts 2
and 3. We have endeavoured (though not with complete success) to make
each chapter fairly self-contained, so that the interested reader might
access any one of them directly. All succeeding chapters require a small
amount of material from chapters 1 and 2, and chapters 7 and 8 rely on
some concepts from chapter 6 (recursion operators, algebras and types),
but otherwise all chapters can be read almost independently.
Acknowledgement. We would like to thank the many people who have
contributed to this book in one way or another. They are too numerous to
be listed here, but most of them appear in the references. The material in
parts 1 and 2 has been used as a basis for graduate lecture courses by both
authors, and we gratefully acknowledge the many useful student contri-
butions to both the exposition and the content. Simon Huber—in his
diploma thesis [2010]—provided many improvements and/or corrections
to part 3. Wilfried Buchholz’s work has significantly influenced part 2,
and he made helpful comments on the final section of chapter 3. Our
special thanks go to Josef Berger and Grigori Mints, who kindly agreed
to critically read the manuscript. Thanks also to the Mathematisches
Forschungsinstitut Oberwolfach for allowing us the opportunity to meet
occasionally, under the Research in Pairs Programme, while the book was
in its formative stages.
PRELIMINARIES
Referencing. References are by chapter, section and subsection: i.j.k
refers to subsection k of section j in chapter i. Theorems and the like
are referred to, not by number, but by their names or the number of
the subsection they appear in. Equations are numbered within a chapter
where necessary; reference to equation n in section j is in the form “(j.n)”.
Mathematical notation. Definitional equivalence or equality (accord-
ing to context) is written :=. Application of terms is left associative,
and lambda abstraction binds stronger than application. For example,
MNK means (MN )K and not M (NK ), and λx MN means (λx M )N , not
λx (MN ). We also sometimes save on parentheses by writing, e.g., Rxyz,
Rt0 t1 t2 instead of R(x, y, z), R(t0 , t1 , t2 ), where R is some predicate sym-
bol. Similarly for a unary function symbol with a (typographically) simple
argument, we write fx for f(x), etc. In this case no confusion will arise.
But readability requires that we write in full R(fx, gy, hz), instead of
Rfxgyhz. Binary function and relation symbols are usually written in
infix notation, e.g., x +y instead of +(x, y), and x < y instead of <(x, y).
We write t ≠ s for ¬(t = s) and t ≮ s for ¬(t < s).
Logical formulas. We use the notation →, ∧, ∨, ⊥, ¬A, ∀x A, ∃x A,
where ⊥ means logical falsity and negation is defined (most of the time)
by ¬A := A → ⊥. Note that the bound variable in a quantifier is written as
a subscript—the authors believe this to be typographically more pleasing.
Bounded quantifiers are written like ∀i<n A. In writing formulas we save
on parentheses by assuming that ∀, ∃, ¬ bind more strongly than ∧, ∨, and
that in turn ∧, ∨ bind more strongly than →, ↔ (where A ↔ B abbreviates
(A → B) ∧ (B → A)). Outermost parentheses are also usually dropped.
Thus A ∧ ¬B → C is read as ((A ∧ (¬B)) → C ). In the case of iterated
implications we sometimes use the short notation
A1 → A2 → · · · → An−1 → An for
A1 → (A2 → · · · → (An−1 → An ) . . . ).
1
Part 1
BASIC PROOF THEORY AND COMPUTABILITY
Chapter 1
LOGIC
The main subject of Mathematical Logic is mathematical proof. In this
introductory chapter we deal with the basics of formalizing such proofs
and, via normalization, analysing their structure. The system we pick for
the representation of proofs is Gentzen’s natural deduction from [1935].
Our reasons for this choice are twofold. First, as the name says this is a
natural notion of formal proof, which means that the way proofs are repre-
sented corresponds very much to the way a careful mathematician writing
out all details of an argument would go anyway. Second, formal proofs
in natural deduction are closely related (via the so-called Curry–Howard
correspondence) to terms in typed lambda calculus. This provides us not
only with a compact notation for logical derivations (which otherwise tend
to become somewhat unmanageable tree-like structures), but also opens
up a route to applying (in part 3) the computational techniques which
underpin lambda calculus.
Apart from classical logic we will also deal with more constructive
logics: minimal and intuitionistic logic. This will reveal some interesting
aspects of proofs, e.g., that it is possible and useful to distinguish between
existential proofs that actually construct witnessing objects, and others
that don’t.
An essential point for Mathematical Logic is to fix a formal language
to be used. We take implication → and the universal quantifier ∀ as
basic. Then the logic rules correspond precisely to lambda calculus. The
additional connectives (i.e., the existential quantifier ∃, disjunction ∨ and
conjunction ∧) can then be added either as rules or as axiom schemes. It is
“natural” to treat them as rules, and that is what we do here. However, later
(in chapter 7) they will appear instead as axioms formalizing particular
inductive definitions. In addition to the use of inductive definitions as
a unifying concept, another reason for that change of emphasis will be
that it fits more readily with the more computational viewpoint adopted
there.
We shall not develop sequent-style logics, except for Tait’s one-sided
sequent calculus for classical logic, it (and the associated cut elimination
process) being a most convenient tool for the ordinal analysis of classical
theories, as done in part 2. There are many excellent treatments of sequent
calculus in the literature and we have little of substance to add. Rather,
we concentrate on those logical issues which have interested us. This
chapter does not simply introduce basic proof theory, but in addition
there is an underlying theme: to bring out the constructive content of logic,
particularly in regard to the relationship between minimal and classical
logic. For us the latter is most appropriately viewed as a subsystem of the
former.
1.1. Natural deduction
Rules come in pairs: we have an introduction and an elimination rule
for each of the logical connectives. The resulting system is called minimal
logic; it was introduced by Kolmogorov [1932], Gentzen [1935] and Jo-
hansson [1937]. Notice that no negation is yet present. If we go on and
require ex-falso-quodlibet for the nullary propositional symbol ⊥ (“fal-
sum”) we can embed intuitionistic logic with negation as A → ⊥. To
embed classical logic, we need to go further and add as an axiom schema
the principle of indirect proof, also called stability (∀x⃗ (¬¬R x⃗ → R x⃗ ) for
relation symbols R), but then it is appropriate to restrict to the language
based on →, ∀, ⊥ and ∧. The reason for this restriction is that we can
neither prove ¬¬∃x A → ∃x A nor ¬¬(A ∨ B) → A ∨ B, for there are
countermodels to both (the former is Markov’s scheme). However, we
can prove them for the classical existential quantifier and disjunction de-
fined by ¬∀x ¬A and ¬A → ¬B → ⊥. Thus we need to make a distinction
between two kinds of “exists” and two kinds of “or”: the classical ones are
“weak” and the non-classical ones “strong” since they have constructive
content. In situations where both kinds occur together we must mark the
distinction, and we shall do this by writing a tilde above the weak disjunc-
tion and existence symbols thus ∨˜ , ∃˜ . Of course, in a classical context this
distinction does not arise and the tilde is not necessary.
1.1.1. Terms and formulas. Let a countably infinite set {vi | i ∈ N} of
variables be given; they will be denoted by x, y, z. A first-order language
L then is determined by its signature, which is to mean the following.
(i) For every natural number n ≥ 0 a (possibly empty) set of n-ary
relation symbols (or predicate symbols). 0-ary relation symbols are
called propositional symbols. ⊥ (read “falsum”) is required as a
fixed propositional symbol. The language will not, unless stated
otherwise, contain = as a primitive. Binary relation symbols can be
marked as infix.
(ii) For every natural number n ≥ 0 a (possibly empty) set of n-ary func-
tion symbols. 0-ary function symbols are called constants. Binary
function symbols can also be marked as infix.
We assume that all these sets of variables, relation and function symbols
are disjoint. L is kept fixed and will only be mentioned when necessary.
Terms are inductively defined as follows.
(i) Every variable is a term.
(ii) Every constant is a term.
(iii) If t1 , . . . , tn are terms and f is an n-ary function symbol with n ≥ 1,
then f(t1 , . . . , tn ) is a term. (If r, s are terms and ◦ is a binary
function symbol, then (r ◦ s) is a term.)
From terms one constructs prime formulas, also called atomic formulas
or just atoms: If t1 , . . . , tn are terms and R is an n-ary relation symbol,
then R(t1 , . . . , tn ) is a prime formula. (If r, s are terms and ∼ is a binary
relation symbol, then (r ∼ s) is a prime formula.)
Formulas are inductively defined from prime formulas by
(i) Every prime formula is a formula.
(ii) If A and B are formulas, then so are (A → B) (“if A then B”),
(A ∧ B) (“A and B”) and (A ∨ B) (“A or B”).
(iii) If A is a formula and x is a variable, then ∀x A (“A holds for all x”)
and ∃x A (“there is an x such that A”) are formulas.
Negation is defined by
¬A := (A → ⊥).
We shall often need to do induction on the height, denoted |A|, of
formulas A. This is defined as follows: |P| = 0 for atoms P, |A ◦ B| =
max(|A|, |B|) + 1 for binary operators ◦ (i.e., →, ∧, ∨) and | ◦ A| = |A| + 1
for unary operators ◦ (i.e., ∀x , ∃x ).
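The two inductive definitions above can be transcribed directly as Haskell data types (a sketch; the constructor names are ours, and ⊥ is modelled as a distinguished 0-ary relation symbol):

    data Term = Var String | Fun String [Term]   -- constants are 0-ary function symbols
      deriving (Eq, Show)

    data Formula
      = Atom String [Term]      -- prime formula R(t1,...,tn); Atom "bot" [] plays ⊥
      | Imp Formula Formula     -- A → B
      | And Formula Formula     -- A ∧ B
      | Or  Formula Formula     -- A ∨ B
      | All String Formula      -- ∀x A
      | Ex  String Formula      -- ∃x A
      deriving (Eq, Show)

    neg :: Formula -> Formula   -- ¬A := A → ⊥
    neg a = Imp a (Atom "bot" [])

    height :: Formula -> Int    -- the height |A| defined above
    height (Atom _ _) = 0
    height (Imp a b)  = 1 + max (height a) (height b)
    height (And a b)  = 1 + max (height a) (height b)
    height (Or  a b)  = 1 + max (height a) (height b)
    height (All _ a)  = 1 + height a
    height (Ex  _ a)  = 1 + height a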
1.1.2. Substitution, free and bound variables. Expressions E, E′ which
differ only in the names of bound (occurrences of) variables will be re-
garded as identical. This is sometimes expressed by saying that E and E′
are α-equal. In other words, we are only interested in expressions “mod-
ulo renaming of bound variables”. There are methods of finding unique
representatives for such expressions, e.g., the name-free terms of de Bruijn
[1972]. For the human reader such representations are less convenient, so
we shall stick to the use of bound variables.
In the definition of “substitution of expression E′ for variable x in
expression E”, either one requires that no variable free in E′ becomes
bound by a variable-binding operator in E, when the free occurrences of x
are replaced by E′ (also expressed by saying that there must be no “clashes
of variables”), “E′ is free for x in E”, or the substitution operation is
taken to involve a systematic renaming operation for the bound variables,
avoiding clashes. Having stated that we are only interested in expressions
modulo renaming bound variables, we can without loss of generality
assume that substitution is always possible.
Also, it is never a real restriction to assume that distinct quantifier
occurrences are followed by distinct variables, and that the sets of bound
and free variables of a formula are disjoint.
Notation. “FV” is used for the (set of) free variables of an expression;
so FV(r) is the set of variables free in the term r, FV(A) the set of variables
free in formula A etc. A formula A is said to be closed if FV(A) = ∅.
E[x := r] denotes the result of substituting the term r for the variable
x in the expression E. Similarly, E[x⃗ := r⃗ ] is the result of simultaneously
substituting the terms r⃗ = r1 , . . . , rn for the variables x⃗ = x1 , . . . , xn ,
respectively.
In a given context we shall adopt the following convention. Once a
formula has been introduced as A(x), i.e., A with a designated variable x,
we write A(r) for A[x := r], and similarly with more variables.
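Over the Formula type of the previous sketch, FV and substitution come out as follows; following the convention just stated, `subst` is deliberately naive, assuming bound variables have been renamed apart so that no clash of variables can occur.

    fvT :: Term -> [String]
    fvT (Var x)    = [x]
    fvT (Fun _ ts) = concatMap fvT ts

    fv :: Formula -> [String]              -- the set FV(A), as a list
    fv (Atom _ ts) = concatMap fvT ts
    fv (Imp a b)   = fv a ++ fv b
    fv (And a b)   = fv a ++ fv b
    fv (Or  a b)   = fv a ++ fv b
    fv (All x a)   = filter (/= x) (fv a)
    fv (Ex  x a)   = filter (/= x) (fv a)

    substT :: String -> Term -> Term -> Term        -- t[x := r]
    substT x r (Var y)    = if y == x then r else Var y
    substT x r (Fun f ts) = Fun f (map (substT x r) ts)

    subst :: String -> Term -> Formula -> Formula   -- A[x := r], r free for x in A
    subst x r (Atom p ts) = Atom p (map (substT x r) ts)
    subst x r (Imp a b)   = Imp (subst x r a) (subst x r b)
    subst x r (And a b)   = And (subst x r a) (subst x r b)
    subst x r (Or  a b)   = Or  (subst x r a) (subst x r b)
    subst x r (All y a)   = if y == x then All y a else All y (subst x r a)
    subst x r (Ex  y a)   = if y == x then Ex  y a else Ex  y (subst x r a)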
1.1.3. Subformulas. Unless stated otherwise, the notion of subformula
will be that defined by Gentzen.
Definition. (Gentzen) subformulas of A are defined by
(a) A is a subformula of A;
(b) if B ◦ C is a subformula of A then so are B, C , for ◦ = →, ∧, ∨;
(c) if ∀x B(x) or ∃x B(x) is a subformula of A, then so is B(r).
Definition. The notions of positive, negative, strictly positive subfor-
mula are defined in a similar style:
(a) A is a positive and a strictly positive subformula of itself;
(b) if B ∧ C or B ∨ C is a positive (negative, strictly positive) subformula
of A, then so are B, C ;
(c) if ∀x B(x) or ∃x B(x) is a positive (negative, strictly positive) subfor-
mula of A, then so is B(r);
(d) if B → C is a positive (negative) subformula of A, then B is a negative
(positive) subformula of A, and C is a positive (negative) subformula
of A;
(e) if B → C is a strictly positive subformula of A, then so is C .
A strictly positive subformula of A is also called a strictly positive part
(s.p.p.) of A. Note that the set of subformulas of A is the union of the
positive and negative subformulas of A.
Example. (P → Q) → R ∧ ∀x S(x) has as s.p.p.’s the whole formula,
R ∧ ∀x S(x), R, ∀x S(x), S(r). The positive subformulas are the s.p.p.’s
and in addition P; the negative subformulas are P → Q, Q.
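The strictly positive parts can be read off clause by clause (a sketch over the Formula type above; for quantified subformulas we keep the kernel B(x) rather than enumerating all instances B(r)):

    spp :: Formula -> [Formula]
    spp a = a : case a of        -- clause (a): A is an s.p.p. of itself
      And b c  -> spp b ++ spp c
      Or  b c  -> spp b ++ spp c
      All _ b  -> spp b
      Ex  _ b  -> spp b
      Imp _ c  -> spp c          -- clause (e): only the conclusion
      Atom _ _ -> []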
1.1.4. Examples of derivations. To motivate the rules for natural de-
duction, let us start with informal proofs of some simple logical facts.
(A → B → C ) → (A → B) → A → C.
Informal proof. Assume A → B → C . To show: (A → B) → A → C .
So assume A → B. To show: A → C . So finally assume A. To show: C .
Using the third assumption twice we have B → C by the first assumption,
and B by the second assumption. From B → C and B we then obtain C .
Then A → C , cancelling the assumption on A; (A → B) → A → C
cancelling the second assumption; and the result follows by cancelling the
first assumption.
∀x (A → B) → A → ∀x B, if x ∉ FV(A).
Informal proof. Assume ∀x (A → B). To show: A → ∀x B. So assume
A. To show: ∀x B. Let x be arbitrary; note that we have not made any
assumptions on x. To show: B. We have A → B by the first assumption.
Hence also B by the second assumption. Hence ∀x B. Hence A → ∀x B,
cancelling the second assumption. Hence the result, cancelling the first
assumption.
A characteristic feature of these proofs is that assumptions are intro-
duced and eliminated again. At any point in time during the proof the
free or “open” assumptions are known, but as the proof progresses, free
assumptions may become cancelled or “closed” because of the implies-
introduction rule.
We reserve the word proof for the informal level; a formal representation
of a proof will be called a derivation.
An intuitive way to communicate derivations is to view them as labelled
trees each node of which denotes a rule application. The labels of the
inner nodes are the formulas derived as conclusions at those points, and
the labels of the leaves are formulas or terms. The labels of the nodes
immediately above a node k are the premises of the rule application. At
the root of the tree we have the conclusion (or end formula) of the whole
derivation. In natural deduction systems one works with assumptions at
leaves of the tree; they can be either open or closed (cancelled). Any of
these assumptions carries a marker. As markers we use assumption vari-
ables denoted u, v, w, u0 , u1 , . . . . The variables of the language previously
introduced will now often be called object variables, to distinguish them
from assumption variables. If at a node below an assumption the depen-
dency on this assumption is removed (it becomes closed) we record this by
writing down the assumption variable. Since the same assumption may
be used more than once (this was the case in the first example above),
the assumption marked with u (written u : A) may appear many times.
Of course we insist that distinct assumption formulas must have distinct
markers. An inner node of the tree is understood as the result of passing
from premises to the conclusion of a given rule. The label of the node then
contains, in addition to the conclusion, also the name of the rule. In some
cases the rule binds or closes or cancels an assumption variable u (and
hence removes the dependency of all assumptions u : A thus marked). An
application of the ∀-introduction rule similarly binds an object variable
x (and hence removes the dependency on x). In both cases the bound
assumption or object variable is added to the label of the node.
Definition. A formula A is called derivable (in minimal logic), written
⊢ A, if there is a derivation of A (without free assumptions) using the
natural deduction rules. A formula B is called derivable from assumptions
A1 , . . . , An , if there is a derivation of B with free assumptions among
A1 , . . . , An . Let Γ be a (finite or infinite) set of formulas. We write
Γ ⊢ B if the formula B is derivable from finitely many assumptions
A1 , . . . , An ∈ Γ.
We now formulate the rules of natural deduction.
1.1.5. Introduction and elimination rules for → and ∀. First we have an
assumption rule, allowing us to write down an arbitrary formula A together
with a marker u:
u: A assumption.
The other rules of natural deduction split into introduction rules (I-rules
for short) and elimination rules (E-rules) for the logical connectives which,
for the time being, are just → and ∀. For implication → there is an intro-
duction rule →+ and an elimination rule →− also called modus ponens.
The left premise A → B in →− is called the major (or main) premise,
and the right premise A the minor (or side) premise. Note that with an
application of the →+ -rule all assumptions above it marked with u : A
are cancelled (which is denoted by putting square brackets around these
assumptions), and the u then gets written alongside. There may of course
be other uncancelled assumptions v : A of the same formula A, which may
get cancelled at a later stage.
   [u : A]
     | M              | M        | N
     B               A → B        A
  -------- →+ u      ------------- →−
   A → B                   B
For the universal quantifier ∀ there is an introduction rule ∀+ (again
marked, but now with the bound variable x) and an elimination rule ∀−
whose right premise is the term r to be substituted. The rule ∀+ x with
conclusion ∀x A is subject to the following (eigen-)variable condition: the
derivation M of the premise A should not contain any open assumption
having x as a free variable.
     | M               | M
     A               ∀x A(x)   r
  ------- ∀+ x       ------------ ∀−
   ∀x A                  A(r)
We now give derivations of the two example formulas treated informally
above. Since in many cases the rule used is determined by the conclusion,
we suppress in such cases the name of the rule.
  u : A → B → C   w : A      v : A → B   w : A
  ---------------------      -----------------
         B → C                       B
         ------------------------------
                      C
                  --------- →+ w
                   A → C
           ------------------- →+ v
           (A → B) → A → C
     ----------------------------------- →+ u
     (A → B → C ) → (A → B) → A → C
  u : ∀x (A → B)   x
  ------------------
        A → B          v : A
        ---------------------
                 B
              -------- ∀+ x
               ∀x B
           ------------ →+ v
            A → ∀x B
    -------------------------- →+ u
    ∀x (A → B) → A → ∀x B
Note that the variable condition is satisfied: x is not free in A (and also
not free in ∀x (A → B)).
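Via the Curry–Howard correspondence developed in 1.2.1 below, the first of these derivations is an ordinary typed lambda term; in Haskell notation (a sketch: the function arrow plays →, quantifier-free fragment only), GHC accepts precisely the example formula as its type.

    example1 :: (a -> b -> c) -> (a -> b) -> a -> c
    example1 u v w = u w (v w)   -- u, v, w named as in the derivation tree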
1.1.6. Properties of negation. Recall that negation is defined by ¬A :=
(A → ⊥). The following can easily be derived.
A → ¬¬A,
¬¬¬A → ¬A.
However, ¬¬A → A is in general not derivable (without stability—we will
come back to this later on).
Lemma. The following are derivable.
(A → B) → ¬B → ¬A,
¬(A → B) → ¬B,
¬¬(A → B) → ¬¬A → ¬¬B,
(⊥ → B) → (¬¬A → ¬¬B) → ¬¬(A → B),
¬¬∀x A → ∀x ¬¬A.
Derivations are left as an exercise.
1.1.7. Introduction and elimination rules for disjunction ∨, conjunction
∧ and existence ∃. For disjunction the introduction and elimination rules
are
                                             [u : A]   [v : B]
    | M            | M             | M         | N       | K
    A              B              A ∨ B        C         C
  ------ ∨+0     ------ ∨+1      ----------------------------- ∨− u, v
  A ∨ B          A ∨ B                         C
For conjunction we have
                                    [u : A]   [v : B]
    | M     | N              | M       | N
    A       B               A ∧ B       C
  ----------- ∧+            ------------------ ∧− u, v
    A ∧ B                           C
and for the existential quantifier
                                     [u : A]
       | M                  | M        | N
   r   A(r)                ∃x A        B
  ------------ ∃+          ----------------- ∃− x, u (var.cond.)
    ∃x A(x)                        B
Similar to ∀+ x the rule ∃− x, u is subject to an (eigen-)variable condition:
in the derivation N the variable x (i) should not occur free in the formula
of any open assumption other than u : A, and (ii) should not occur free
in B.
Again, in each of the elimination rules ∨− , ∧− and ∃− the left premise
is called major (or main) premise, and the right premise is called the minor
(or side) premise.
It is easy to see that for each of the connectives ∨, ∧, ∃ the rules and
the following axioms are equivalent over minimal logic; this is left as an
exercise. For disjunction the introduction and elimination axioms are
∨+0 : A → A ∨ B,
∨+1 : B → A ∨ B,
∨− : A ∨ B → (A → C ) → (B → C ) → C.
For conjunction we have
∧+ : A → B → A ∧ B, ∧− : A ∧ B → (A → B → C ) → C
and for the existential quantifier
∃+ : A → ∃x A, ∃− : ∃x A → ∀x (A → B) → B (x ∉ FV(B)).
Remark. All these axioms can be seen as special cases of a general
schema, that of an inductively defined predicate, which is defined by some
introduction rules and one elimination rule. Later we will study this kind
of definition in full generality.
We collect some easy facts about derivability; B ← A means A → B.
Lemma. The following are derivable.
(A ∧ B → C ) ↔ (A → B → C ),
(A → B ∧ C ) ↔ (A → B) ∧ (A → C ),
(A ∨ B → C ) ↔ (A → C ) ∧ (B → C ),
(A → B ∨ C ) ← (A → B) ∨ (A → C ),
(∀x A → B) ← ∃x (A → B) if x ∉ FV(B),
(A → ∀x B) ↔ ∀x (A → B) if x ∉ FV(A),
(∃x A → B) ↔ ∀x (A → B) if x ∉ FV(B),
(A → ∃x B) ← ∃x (A → B) if x ∉ FV(A).
Proof. A derivation of the final formula is
                          w : A → B   v : A
                          ------------------
                    x            B
                   ----------------- ∃+
  u : ∃x (A → B)        ∃x B
  ---------------------------- ∃− x, w
              ∃x B
          ------------ →+ v
           A → ∃x B
   --------------------------- →+ u
   ∃x (A → B) → A → ∃x B
The variable condition for ∃− is satisfied since the variable x (i) is not free
in the formula A of the open assumption v : A, and (ii) is not free in ∃x B.
The rest of the proof is left as an exercise.
As already mentioned, we distinguish between two kinds of “exists”
and two kinds of “or”: the “weak” or classical ones and the “strong” or
non-classical ones, with constructive content. In the present context both
kinds occur together and hence we must mark the distinction; we shall do
this by writing a tilde above the weak disjunction and existence symbols
thus
A ∨˜ B := ¬A → ¬B → ⊥, ∃˜ x A := ¬∀x ¬A.
These weak variants of disjunction and the existential quantifier are no
stronger than the proper ones (in fact, they are weaker):
A ∨ B → A ∨˜ B, ∃x A → ∃˜ x A.
This can be seen easily by putting C := ⊥ in ∨− and B := ⊥ in ∃− .
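For the disjunction case, the weakening map is again just a lambda term (our own illustration, reading ⊥ as an abstract type f and strong ∨ as Either): putting C := ⊥ in ∨− is exactly a case distinction.

    orWeak :: Either a b -> (a -> f) -> (b -> f) -> f   -- A ∨ B → A ∨˜ B
    orWeak (Left a)  na _  = na a
    orWeak (Right b) _  nb = nb b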
Remark. Since ∃˜ x ∃˜ y A unfolds into a rather awkward formula we extend
the ∃˜ -terminology to lists of variables:
∃˜ x1 ,...,xn A := ∀x1 ,...,xn (A → ⊥) → ⊥.
Moreover let
∃˜ x1 ,...,xn (A1 ∧˜ · · · ∧˜ Am ) := ∀x1 ,...,xn (A1 → · · · → Am → ⊥) → ⊥.
This allows us to stay in the →, ∀ part of the language. Notice that ∧˜ only
makes sense in this context, i.e., in connection with ∃˜ .
1.1.8. Intuitionistic and classical derivability. In the definition of deriv-
ability in 1.1.4 falsity ⊥ plays no role. We may change this and require
ex-falso-quodlibet axioms, of the form
∀x⃗ (⊥ → R x⃗ )
with R a relation symbol distinct from ⊥. Let Efq denote the set of all
such axioms. A formula A is called intuitionistically derivable, written
⊢i A, if Efq ⊢ A. We write Γ ⊢i B for Γ ∪ Efq ⊢ B.
We may even go further and require stability axioms, of the form
∀x⃗ (¬¬R x⃗ → R x⃗ )
with R again a relation symbol distinct from ⊥. Let Stab denote the set of
all these axioms. A formula A is called classically derivable, written ⊢c A,
if Stab ⊢ A. We write Γ ⊢c B for Γ ∪ Stab ⊢ B.
It is easy to see that intuitionistically (i.e., from Efq) we can derive
⊥ → A for an arbitrary formula A, using the introduction rules for the
connectives. A similar generalization of the stability axioms is only pos-
sible for formulas in the language not involving ∨, ∃. However, it is still
possible to use the substitutes ∨˜ and ∃˜ .
Theorem (Stability, or principle of indirect proof).
(a) ⊢ (¬¬A → A) → (¬¬B → B) → ¬¬(A ∧ B) → A ∧ B.
(b) ⊢ (¬¬B → B) → ¬¬(A → B) → A → B.
(c) ⊢ (¬¬A → A) → ¬¬∀x A → A.
(d) ⊢c ¬¬A → A for every formula A without ∨, ∃.
Proof. (a) is left as an exercise.
(b) For simplicity, in the derivation to be constructed we leave out
applications of →+ at the end.
                   u2 : A → B   w : A
                   ------------------
        u1 : ¬B            B
        --------------------
                  ⊥
             ----------- →+ u2
  v : ¬¬(A → B)   ¬(A → B)
  --------------------------
               ⊥
          --------- →+ u1
  u : ¬¬B → B   ¬¬B
  ------------------
            B
(c)
               u2 : ∀x A   x
               --------------
     u1 : ¬A         A
     -----------------
              ⊥
          --------- →+ u2
  v : ¬¬∀x A   ¬∀x A
  -------------------
            ⊥
        -------- →+ u1
  u : ¬¬A → A   ¬¬A
  ------------------
            A
(d) Induction on A. The case R t⃗ with R distinct from ⊥ is given by
Stab. In the case ⊥ the desired derivation is

                            u : ⊥
                          -------- →+ u
  v : (⊥ → ⊥) → ⊥          ⊥ → ⊥
  --------------------------------
                 ⊥
In the cases A ∧ B, A → B and ∀x A use (a), (b) and (c), respectively.
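Case (b) of the theorem can also be read off as a lambda term (Curry–Howard again, our own rendering with ⊥ as an abstract type f); the term follows the derivation tree above, u1 and u2 being the cancelled assumptions.

    stabImp :: (((b -> f) -> f) -> b)    -- u : ¬¬B → B
            -> (((a -> b) -> f) -> f)    -- v : ¬¬(A → B)
            -> a -> b                    -- w : A, conclusion B
    stabImp u v w = u (\u1 -> v (\u2 -> u1 (u2 w)))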
Using stability we can prove some well-known facts about the interac-
tion of weak disjunction and the weak existential quantifier with implica-
tion. We first prove a more refined claim, stating to what extent we need
to go beyond minimal logic.
Lemma. The following are derivable.
(∃˜ x A → B) → ∀x (A → B) if x ∉ FV(B), (1)
(¬¬B → B) → ∀x (A → B) → ∃˜ x A → B if x ∉ FV(B), (2)
(⊥ → B[x:=c]) → (A → ∃˜ x B) → ∃˜ x (A → B) if x ∉ FV(A), (3)
∃˜ x (A → B) → A → ∃˜ x B if x ∉ FV(A). (4)
The last two items can also be seen as simplifying a weakly existentially
quantified implication whose premise does not contain the quantified vari-
able. In case the conclusion does not contain the quantified variable we
have
(¬¬B → B) → ∃˜ x (A → B) → ∀x A → B if x ∉ FV(B), (5)
∀x (¬¬A → A) → (∀x A → B) → ∃˜ x (A → B) if x ∉ FV(B). (6)
Proof. (1)
      u1 : ∀x ¬A   x
      ---------------
            ¬A          A
            --------------
                   ⊥
               --------- →+ u1
  ∃˜ x A → B    ¬∀x ¬A
  ---------------------
             B
(2)

                  ∀x (A → B)   x
                  ---------------
                      A → B        u1 : A
                      --------------------
             u2 : ¬B          B
             ------------------
                       ⊥
                   ------- →+ u1
                     ¬A
                  --------- ∀+ x
  ¬∀x ¬A           ∀x ¬A
  -----------------------
              ⊥
         --------- →+ u2
  ¬¬B → B    ¬¬B
  ----------------
           B
(3) Writing B0 for B[x:=c] we have

                    ∀x ¬(A → B)   x      u1 : B
                    ----------------    -------- →+
                       ¬(A → B)          A → B
                       -------------------------
                                   ⊥
                               ------- →+ u1
                                 ¬B
                              --------- ∀+ x
        A → ∃˜ x B   u2 : A
        --------------------
              ∃˜ x B                ∀x ¬B
              ----------------------------
   ⊥ → B0                ⊥
   ------------------------
               B0
           ---------- →+ u2
  ∀x ¬(A → B)   c        A → B0
  ----------------
     ¬(A → B0 )
     ----------------------------
                   ⊥
(4)
  ∀x ¬B   x       u1 : A → B    A
  ----------      ----------------
      ¬B                 B
      --------------------
                ⊥
           ----------- →+ u1
            ¬(A → B)
          -------------- ∀+ x
  ∃˜ x (A → B)   ∀x ¬(A → B)
  ---------------------------
                ⊥
(5)
                     ∀x A   x
                     ---------
       u1 : A → B        A
       -------------------
  u2 : ¬B        B
  -----------------
           ⊥
      ----------- →+ u1
       ¬(A → B)
     --------------- ∀+ x
  ∃˜ x (A → B)   ∀x ¬(A → B)
  ---------------------------
               ⊥
           ------- →+ u2
  ¬¬B → B    ¬¬B
  ----------------
           B
(6) We derive ∀x (⊥ → A) → (∀x A → B) → ∀x ¬(A → B) → ¬¬A.
Writing Ax, Ay for A(x), A(y) we have
                     ∀y (⊥ → Ay)   y     u1 : ¬Ax   u2 : Ax
                     ---------------     -------------------
                         ⊥ → Ay                   ⊥
                         ---------------------------
                                      Ay
                                  --------- ∀+ y
                      ∀x Ax → B     ∀y Ay
                      --------------------
  ∀x ¬(Ax → B)   x             B
  -----------------       ---------- →+ u2
     ¬(Ax → B)             Ax → B
     -------------------------------
                   ⊥
               --------- →+ u1
                 ¬¬Ax
Using this derivation M we obtain
                    ∀x (¬¬Ax → Ax)   x       | M
                    ------------------      ¬¬Ax
                        ¬¬Ax → Ax
                        --------------------------
                                    Ax
                                 --------- ∀+ x
                     ∀x Ax → B     ∀x Ax
                     --------------------
  ∀x ¬(Ax → B)   x            B
  -----------------      ---------- →+
     ¬(Ax → B)            Ax → B
     ------------------------------
                   ⊥
Since clearly ⊢ (¬¬A → A) → ⊥ → A the claim follows.
Remark. An immediate consequence of (6) is the classical derivability
of the “drinker formula” ∃˜ x (Px → ∀x Px), to be read “in every non-empty
bar there is a person such that, if this person drinks, then everybody
drinks”. To see this let A := Px and B := ∀x Px in (6).
Corollary.
⊢c (∃˜ x A → B) ↔ ∀x (A → B) if x ∉ FV(B) and B without ∨, ∃,
⊢i (A → ∃˜ x B) ↔ ∃˜ x (A → B) if x ∉ FV(A),
⊢c ∃˜ x (A → B) ↔ (∀x A → B) if x ∉ FV(B) and A, B without ∨, ∃.
There is a similar lemma on weak disjunction:
Lemma. The following are derivable.
(A ∨˜ B → C ) → (A → C ) ∧ (B → C ),
(¬¬C → C ) → (A → C ) → (B → C ) → A ∨˜ B → C,
(⊥ → B) → (A → B ∨˜ C ) → (A → B) ∨˜ (A → C ),
(A → B) ∨˜ (A → C ) → A → B ∨˜ C,
(¬¬C → C ) → (A → C ) ∨˜ (B → C ) → A → B → C,
(⊥ → C ) → (A → B → C ) → (A → C ) ∨˜ (B → C ).
Proof. The derivation of the final formula is
            A → B → C    u1 : A
            --------------------
                 B → C     u2 : B
                 -----------------
                        C
                    -------- →+ u1
  ¬(A → C )          A → C
  --------------------------
   ⊥ → C          ⊥
   -----------------
           C
       -------- →+ u2
  ¬(B → C )   B → C
  ------------------
           ⊥
The other derivations are similar to the ones above, if one views ∃˜ as an
infinitary version of ∨˜ .
Corollary.
⊢c (A ∨˜ B → C ) ↔ (A → C ) ∧ (B → C ) for C without ∨, ∃,
⊢i (A → B ∨˜ C ) ↔ (A → B) ∨˜ (A → C ),
⊢c (A → C ) ∨˜ (B → C ) ↔ (A → B → C ) for C without ∨, ∃.
Remark. It is easy to see that weak disjunction and the weak existential
quantifier satisfy the same axioms as the strong variants, if one restricts
the conclusion of the elimination axioms to formulas without ∨, ∃. In
fact, we have
⊢ A → A ∨˜ B, ⊢ B → A ∨˜ B,
⊢c A ∨˜ B → (A → C ) → (B → C ) → C (C without ∨, ∃),
⊢ A → ∃˜ x A,
⊢c ∃˜ x A → ∀x (A → B) → B (x ∉ FV(B), B without ∨, ∃).
The derivations of the second and the fourth formula are
                     A → C    u2 : A
                     ----------------
           u1 : ¬C         C                      B → C    u3 : B
           -----------------                      ----------------
                   ⊥                    u1 : ¬C          C
               ------- →+ u2            ------------------
  ¬A → ¬B → ⊥    ¬A                              ⊥
  -------------------                        ------- →+ u3
        ¬B → ⊥                                 ¬B
        ----------------------------------------
                          ⊥
                     --------- →+ u1
          ¬¬C → C      ¬¬C
          ------------------
                   C
and
              ∀x (A → B)   x
              ---------------
                  A → B        u2 : A
                  --------------------
         u1 : ¬B          B
         ------------------
                  ⊥
              ------- →+ u2
                ¬A
             --------- ∀+ x
  ¬∀x ¬A      ∀x ¬A
  ------------------
           ⊥
       -------- →+ u1
  ¬¬B → B   ¬¬B
  ---------------
          B
1.1.9. Gödel–Gentzen translation. Classical derivability Γ ⊢c B was
defined in 1.1.8 by Γ ∪ Stab ⊢ B. This embedding of classical logic into
minimal logic can be expressed in a somewhat different and very explicit
form, namely as a syntactic translation A ↦ A^g of formulas such that A
is derivable in classical logic if and only if its translation A^g is derivable in
minimal logic.
Definition (Gödel–Gentzen translation A^g ).
(R t⃗ )^g := ¬¬R t⃗ for R distinct from ⊥,
⊥^g := ⊥,
(A ∨ B)^g := A^g ∨˜ B^g ,
(∃x A)^g := ∃˜ x A^g ,
(A ◦ B)^g := A^g ◦ B^g for ◦ = →, ∧,
(∀x A)^g := ∀x A^g .
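Over the Formula type from the sketch in 1.1.1, the translation can be transcribed directly, with ∨˜ and ∃˜ unfolded into their →, ∀, ⊥ definitions from 1.1.7 (our rendering, so far as the clauses above determine it):

    bot :: Formula
    bot = Atom "bot" []

    gg :: Formula -> Formula
    gg (Atom p ts)
      | p == "bot" = bot                                       -- ⊥^g := ⊥
      | otherwise  = neg (neg (Atom p ts))                     -- (R t)^g := ¬¬R t
    gg (Or a b)    = Imp (neg (gg a)) (Imp (neg (gg b)) bot)   -- A^g ∨˜ B^g
    gg (Ex x a)    = neg (All x (neg (gg a)))                  -- ∃˜x A^g
    gg (Imp a b)   = Imp (gg a) (gg b)
    gg (And a b)   = And (gg a) (gg b)
    gg (All x a)   = All x (gg a)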
Lemma. ⊢ ¬¬A^g → A^g .
Proof. Induction on A.
Case R t⃗ with R distinct from ⊥. We must show ¬¬¬¬R t⃗ → ¬¬R t⃗ ,
which is a special case of ¬¬¬B → ¬B.
Case ⊥. Use ¬¬⊥ → ⊥.
Case A ∨ B. We must show ¬¬(A^g ∨˜ B^g ) → A^g ∨˜ B^g , which is a
special case of ¬¬(¬C → ¬D → ⊥) → ¬C → ¬D → ⊥:

             u1 : ¬C → ¬D → ⊥    ¬C
             -----------------------
                    ¬D → ⊥        ¬D
                    ------------------
                            ⊥
                      ------------------ →+ u1
  ¬¬(¬C → ¬D → ⊥)      ¬(¬C → ¬D → ⊥)
  -------------------------------------
                   ⊥
Case ∃x A. In this case we must show ¬¬∃˜ x A^g → ∃˜ x A^g , but this is a
special case of ¬¬¬B → ¬B, because ∃˜ x A^g is the negation ¬∀x ¬A^g .
Case A ∧ B. We must show ¬¬(A^g ∧ B^g ) → A^g ∧ B^g . By induction
hypothesis ¬¬A^g → A^g and ¬¬B^g → B^g . Now use part (a) of the
stability theorem in 1.1.8.
The cases A → B and ∀x A are similar, using parts (b) and (c) of the
stability theorem instead.
Theorem. (a) Γ ⊢c A implies Γ^g ⊢ A^g .
(b) Γ^g ⊢ A^g implies Γ ⊢c A for Γ, A without ∨, ∃.
Proof. (a) We use induction on Γ ⊢c A. For a stability axiom
∀x⃗ (¬¬R x⃗ → R x⃗ ) we must derive ∀x⃗ (¬¬¬¬R x⃗ → ¬¬R x⃗ ), which is
easy (as above). For the rules →+ , →− , ∀+ , ∀− , ∧+ and ∧− the claim
follows immediately from the induction hypothesis, using the same rule
again. This works because the Gödel–Gentzen translation acts as a ho-
momorphism for these connectives. For the rules ∨+i , ∨− , ∃+ and ∃− the
claim follows from the induction hypothesis and the remark at the end of
1.1.8. For example, in case ∃− the induction hypothesis gives

                   u : A^g
     | M             | N
  ∃˜ x A^g   and     B^g
with x ∉ FV(B^g ). Now use (¬¬B^g → B^g ) → ∃˜ x A^g → ∀x (A^g →
B^g ) → B^g . Its premise ¬¬B^g → B^g is derivable by the lemma above.
(b) First note that ⊢c (B ↔ B^g ) if B is without ∨, ∃. Now assume that
Γ, A are without ∨, ∃. From Γ^g ⊢ A^g we obtain Γ ⊢c A as follows. We
argue informally. Assume Γ. Then Γ^g by the note, hence A^g because of
Γ^g ⊢ A^g , hence A again by the note.
1.2. Normalization
A derivation in normal form does not make “detours”, or more pre-
cisely, it cannot occur that an elimination rule immediately follows an
introduction rule. We use “conversions” to remove such “local maxima”
of complexity, thus reducing any given derivation to normal form.
First we consider derivations involving →, ∀-rules only. We prove that
every such reduction sequence terminates after finitely many steps, and
that the resulting “normal form” is uniquely determined. Uniqueness
of normal form will be shown by means of an application of Newman’s
lemma; we will also introduce and discuss the related notions of conflu-
ence, weak confluence and the Church–Rosser property. Moreover we
analyse the shape of derivations in normal form, and prove the (crucial)
subformula property, which says that every formula in a normal derivation
is a subformula of the end-formula or else of an assumption.
We then show that the requirement to give a normal derivation of a
derivable formula can sometimes be unrealistic. Following Statman [1978]
and Orevkov [1979] we give examples of simple →, ∀-formulas Ci which
need derivation height superexponential in i if normal derivations are
required, but have non-normal derivations of height linear in i. The non-
normal derivations of Ci make use of auxiliary formulas with an i-fold
nesting of implications and universal quantifiers. This sheds some light
on the power of abstract notions in mathematics: their use can shorten
proofs dramatically.
Finally we extend the study of normalization to the rules for ∨, ∧ and ∃.
However, here the elimination rules create a difficulty: the minor premise
reappears in the conclusion. Hence we can have a situation where we
first introduce a logical connective, then do not touch it (by carrying it
along in minor premises of ∨− , ∧− , ∃− ), and finally eliminate the con-
nective. What has to be done is a “permutative” conversion: permute an
elimination immediately following an ∨− , ∧− , ∃− -rule over this rule to the
minor premise. We will show that any sequence of such conversion steps
terminates in a normal form. This easily implies uniqueness of normal
forms, using Newman’s lemma again.
Derivations in normal form continue to have many pleasant properties.
First, we again have the subformula property: every formula occurring
in a normal derivation is a subformula of either the conclusion or else an
assumption. Second, there is an explicit definability property: a normal
derivation of a formula ∃x A(x) from assumptions not involving disjunc-
tive or existential strictly positive parts ends with an existence introduc-
tion, hence provides a term r and a derivation of A(r). Finally, we have
a disjunction property: a normal derivation of a disjunction A ∨ B from
assumptions not involving disjunctions as strictly positive parts ends with
a disjunction introduction, hence also provides either a derivation of A or
else one of B.
1.2.1. The Curry–Howard correspondence. Since natural deduction de-
rivations can be notationally cumbersome, it will be convenient to repre-
sent them as typed “derivation terms”, where the derived formula is the
“type” of the term (and displayed as a superscript). This representation
goes under the name of Curry–Howard correspondence. It dates back to
Curry [1930] and somewhat later Howard, published only in [1980], who
noted that the types of the combinators used in combinatory logic are
exactly the Hilbert style axioms for minimal propositional logic. Subse-
quently Martin-Löf [1972] transferred these ideas to a natural deduction
setting where natural deduction proofs of formulas A now correspond
exactly to lambda terms with type A. This representation of natural
deduction proofs will henceforth be used consistently.
We give an inductive definition of such derivation terms for the →, ∀-
rules in table 1 where for clarity we have written the corresponding deriva-
tions to the left. Later (in 1.2.6, table 2) this will be extended to the rules
for ∨, ∧ and ∃.
Every derivation term carries a formula as its type. However, we shall
usually leave these formulas implicit and write derivation terms without
them. Notice that every derivation term can be written uniquely in one of
the forms
| v M | (v M )N L,
uM
where u is an assumption variable or assumption constant, v is an as-
sumption variable or object variable, and M , N , L are derivation terms
or object terms. Here the final form is not normal: (v M )N L is called a
-redex (for “reducible expression”). It can be reduced by a “conversion”.
A conversion removes a detour in a derivation, i.e., an elimination imme-
diately following an introduction. We consider the following conversions,
for derivations written in tree notation and also as derivation terms.
→-conversion.
   [u : A]
     | M
     B
  -------- →+ u     | N
   A → B             A               | N
   ------------------- →−      ↦     A
            B                        | M
                                     B
    Derivation                                    Term

    u : A                                         u^A

     [u : A]
       | M
       B                                          (λu^A M^B )^{A→B}
    -------- →+ u
     A → B

     | M        | N
    A → B        A                                (M^{A→B} N^A )^B
    --------------- →−
          B

       | M
       A                                          (λx M^A )^{∀x A}
    ------- ∀+ x (with var.cond.)                 (with var.cond.)
     ∀x A

        | M
     ∀x A(x)   r                                  (M^{∀x A(x)} r)^{A(r)}
    ------------- ∀−
        A(r)

            Table 1. Derivation terms for → and ∀
or written as derivation terms
(λu M (u^A )^B )^{A→B} N^A ↦ M (N^A )^B .
The reader familiar with λ-calculus should note that this is nothing other
than β-conversion.
∀-conversion.
      | M
     A(x)
   --------- ∀+ x
    ∀x A(x)        r              | M
   ------------------ ∀−     ↦   A(r)
         A(r)
or written as derivation terms
(λx M (x)^{A(x)} )^{∀x A(x)} r ↦ M (r).
The closure of the conversion relation ↦ is defined by
(a) If M ↦ M′ , then M → M′ .
(b) If M → M′ , then also MN → M′N , NM → NM′ , λv M → λv M′
(inner reductions).
Therefore M → N means that M reduces in one step to N , i.e., N is
obtained from M by replacement of (an occurrence of) a redex M′ of
M by a conversum M″ of M′ , i.e., by a single conversion. Here is an
example:
(λx λy λz (xz(yz)))(λu λv u)(λu′ λv′ u′ ) →
(λy λz ((λu λv u)z(yz)))(λu′ λv′ u′ ) →
(λy λz ((λv z)(yz)))(λu′ λv′ u′ ) →
(λy λz z)(λu′ λv′ u′ ) → λz z.
The relation →+ (“properly reduces to”) is the transitive closure of →, and
→∗ (“reduces to”) is the reflexive and transitive closure of →. The relation
→∗ is said to be the notion of reduction generated by ↦.
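The one-step relation is easy to implement for untyped terms (a sketch: we drop the formula decorations of derivation terms and, as in 1.1.2, assume bound variables are renamed apart so that substitution cannot capture).

    data Tm = V String | Ap Tm Tm | L String Tm  deriving Show

    sub :: String -> Tm -> Tm -> Tm   -- sub v n m  is  m[v := n]
    sub v n (V x)    = if x == v then n else V x
    sub v n (Ap m k) = Ap (sub v n m) (sub v n k)
    sub v n (L x m)  = if x == v then L x m else L x (sub v n m)

    step :: Tm -> Maybe Tm            -- leftmost one-step reduction, if any
    step (Ap (L v m) n) = Just (sub v n m)     -- β-conversion of the head redex
    step (Ap m n) = case step m of
      Just m' -> Just (Ap m' n)                -- inner reductions
      Nothing -> Ap m <$> step n
    step (L v m) = L v <$> step m
    step (V _)   = Nothing

    normalize :: Tm -> Tm             -- iterate →; sends the example above to λz z
    normalize m = maybe m normalize (step m)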
Lemma (Substitutivity of →).
(a) If M (v) → M′ (v), then M (N ) → M′ (N ).
(b) If N → N′ , then M (N ) →∗ M (N′ ).
Proof. (a) is proved by induction on M (v) → M′ (v); (b) by induction
on M (v). Notice that the reason for →∗ in (b) is the fact that v may have
many occurrences in M (v).
1.2.2. Strong normalization. A term M is in normal form, or M is
normal, if M does not contain a redex. M has a normal form if there
is a normal N such that M →∗ N . A reduction sequence is a (finite or
infinite) sequence M0 → M1 → M2 . . . such that Mi → Mi+1 , for all
i. Finite reduction sequences are partially ordered under the initial part
relation; the collection of finite reduction sequences starting from a term
M forms a tree, the reduction tree of M . The branches of this tree may
be identified with the collection of all infinite and all terminating finite
reduction sequences. A term is strongly normalizing if its reduction tree is
finite.
Remark. It may well happen that reasonable “simplification” steps on
derivations may lead to reduction loops. The following example is due to
Ekman [1994]. Consider the derivation
    u : A → A → B   w : A
    ───────────────────── →⁻
          A → B               w : A
    ─────────────────────────────── →⁻
                 B
             ──────────── →⁺ w
    v : (A → B) → A   A → B (∗)
    ─────────────────────────── →⁻
                 A
    u : A → A → B
    ─────────────────────────── →⁻
          A → B (∗)                 | M
                                    A
    ─────────────────────────────────── →⁻
                 B

where M is

    u : A → A → B   w : A
    ───────────────────── →⁻
          A → B               w : A
    ─────────────────────────────── →⁻
                 B
             ──────────── →⁺ w
    v : (A → B) → A   A → B
    ─────────────────────── →⁻
                 A
Its derivation term is

    u(v(λ_w(uww)))(v(λ_w(uww))).
Here the following “pruning” simplification can be performed. In between
the two occurrences of A → B marked with (∗) no →+ rule is applied.
Therefore we may cut out or prune the intermediate part and obtain
    u : A → A → B   w : A
    ───────────────────── →⁻
          A → B               w : A
    ─────────────────────────────── →⁻
                 B
             ──────────── →⁺ w
           A → B                    | M
                                    A
    ─────────────────────────────────── →⁻
                 B

whose derivation term is

    (λ_w(uww))(v(λ_w(uww))).
But now an →-conversion can be performed, which leads to the derivation
we started with.
We show that every term is strongly normalizing. To this end, define by
recursion on k a relation sn(M, k) between terms M and natural numbers
k with the intention that k is an upper bound on the number of reduction
steps up to normal form.
    sn(M, 0) := M is in normal form,
    sn(M, k + 1) := sn(M′, k) for all M′ such that M → M′.
Clearly a term is strongly normalizing if there is a k such that sn(M, k).
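For concrete terms the relation sn can be run directly; the following
sketch reuses the Term type and oneSteps from the sketch in 1.2.1.

    -- sn m k checks that every reduction sequence from m has length at
    -- most k; m is strongly normalizing iff sn m k holds for some k.
    sn :: Term -> Int -> Bool
    sn m 0 = null (oneSteps m)                        -- sn(M,0): M normal
    sn m k = all (\m' -> sn m' (k - 1)) (oneSteps m)
    -- e.g. sn example 5 == True for the example term of 1.2.1, whose
    -- longest reduction sequence has five steps.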
We first prove some closure properties of the relation sn, but a word
about notation is crucial here. Whenever we write an applicative term as
M N⃗ := M N₁ … N_k, the convention is that bracketing operates to the left,
that is, M N⃗ = (…(M N₁)…N_k).
Lemma (Properties of sn). (a) If sn(M, k), then sn(M, k + 1).
(b) If sn(M N⃗, k), then sn(M, k).
(c) If sn(Mᵢ, kᵢ) for i = 1, …, n, then sn(uM₁ … Mₙ, k₁ + ⋯ + kₙ).
(d) If sn(M, k), then sn(λ_v M, k).
(e) If sn(M(N)L⃗, k) and sn(N, l), then sn((λ_v M(v))N L⃗, k + l + 1).
Proof. (a) Induction on k. Assume sn(M, k). We show sn(M, k + 1).
Let M′ with M → M′ be given; because of sn(M, k) we must have k > 0.
We have to show sn(M′, k). Because of sn(M, k) we have sn(M′, k − 1),
hence by induction hypothesis sn(M′, k).

(b) Induction on k. Assume sn(MN, k). We show sn(M, k). In case
k = 0 the term MN is normal, hence also M is normal and therefore
sn(M, 0). Let k > 0 and M → M′; we have to show sn(M′, k − 1). From
M → M′ we obtain MN → M′N. Because of sn(MN, k) we have by
definition sn(M′N, k − 1), hence sn(M′, k − 1) by induction hypothesis.

(c) Assume sn(Mᵢ, kᵢ) for i = 1, …, n. We show sn(uM₁ … Mₙ, k) with
k := k₁ + ⋯ + kₙ. Again we employ induction on k. In case k = 0
all Mᵢ are normal, hence also uM₁ … Mₙ. Let k > 0 and uM₁ … Mₙ →
M′. Then M′ = uM₁ … M′ᵢ … Mₙ with Mᵢ → M′ᵢ. We have to show
sn(uM₁ … M′ᵢ … Mₙ, k − 1). Because of Mᵢ → M′ᵢ and sn(Mᵢ, kᵢ) we
have kᵢ > 0 and sn(M′ᵢ, kᵢ − 1), hence sn(uM₁ … M′ᵢ … Mₙ, k − 1) by
induction hypothesis.

(d) Assume sn(M, k). We have to show sn(λ_v M, k). Use induction on
k. In case k = 0, M is normal, hence λ_v M is normal, hence sn(λ_v M, 0).
Let k > 0 and λ_v M → L. Then L has the form λ_v M′ with M → M′.
So sn(M′, k − 1) by definition, hence sn(λ_v M′, k − 1) by induction
hypothesis.

(e) Assume sn(M(N)L⃗, k) and sn(N, l). We show sn((λ_v M(v))N L⃗,
k + l + 1). We use induction on k + l. In case k + l = 0 the terms N
and M(N)L⃗ are normal, hence also M and all Lᵢ. So there is exactly
one term K such that (λ_v M(v))N L⃗ → K, namely M(N)L⃗, and this K
is normal. Now let k + l > 0 and (λ_v M(v))N L⃗ → K. We have to show
sn(K, k + l).

Case K = M(N)L⃗, i.e., we have a head conversion. From sn(M(N)L⃗,
k) we obtain sn(M(N)L⃗, k + l) by (a).

Case K = (λ_v M′(v))N L⃗ with M → M′. Then we have M(N)L⃗ →
M′(N)L⃗. Now sn(M(N)L⃗, k) implies k > 0 and sn(M′(N)L⃗, k − 1).
The induction hypothesis yields sn((λ_v M′(v))N L⃗, k − 1 + l + 1).

Case K = (λ_v M(v))N′L⃗ with N → N′. Now sn(N, l) implies l > 0
and sn(N′, l − 1). The induction hypothesis yields sn((λ_v M(v))N′L⃗, k +
l − 1 + 1), since sn(M(N′)L⃗, k) by (a).
The essential idea of the strong normalization proof is to view the
last three closure properties of sn from the preceding lemma without the
information on the bounds as an inductive definition of a new set SN:

     M⃗ ∈ SN              M ∈ SN             M(N)L⃗ ∈ SN    N ∈ SN
    ────────── (Var)    ─────────── (λ)    ─────────────────────── (β)
    u M⃗ ∈ SN           λ_v M ∈ SN           (λ_v M(v))N L⃗ ∈ SN
Corollary. For every term M ∈ SN there is a k ∈ N such that sn(M, k).
Hence every term M ∈ SN is strongly normalizing.
Proof. By induction on M ∈ SN, using the previous lemma.
In what follows we shall show that every term is in SN and hence is
strongly normalizing. Given the definition of SN we only have to show
that SN is closed under application. In order to prove this we must prove
simultaneously the closure of SN under substitution.
Theorem (Properties of SN). For all formulas A,
(a) for all M(v) ∈ SN, if N^A ∈ SN, then M(N) ∈ SN,
(b) for all M(x) ∈ SN, M(r) ∈ SN,
(c) if M ∈ SN derives A → B and N^A ∈ SN, then MN ∈ SN,
(d) if M ∈ SN derives ∀x A, then Mr ∈ SN.
Proof. By course-of-values induction on |A|, with a side induction on
M ∈ SN. Let N^A ∈ SN. We distinguish cases on the form of M.

Case u M⃗ by (Var) from M⃗ ∈ SN. (a) The side induction hypothesis
(a) yields Mᵢ(N) ∈ SN for all Mᵢ from M⃗. In case u ≠ v we immediately
have u M⃗(N) ∈ SN. Otherwise we need N M⃗(N) ∈ SN. But this follows
by multiple applications of induction hypothesis (c), since every Mᵢ(N)
derives a subformula of A with smaller height. (b) Similar, and simpler.
(c), (d) Use (Var) again.

Case λ_v M by (λ) from M ∈ SN. (a), (b) Use (λ) again. (c) Our
goal is (λ_v M(v))N ∈ SN. By (β) it suffices to show M(N) ∈ SN and
N ∈ SN. The latter holds by assumption, and the former by the side
induction hypothesis (a). (d) Similar, and simpler.

Case (λ_w M(w))K L⃗ by (β) from M(K)L⃗ ∈ SN and K ∈ SN. (a) The
side induction hypothesis (a) yields M(N)(K(N))L⃗(N) ∈ SN and
K(N) ∈ SN, hence (λ_w M(N))K(N)L⃗(N) ∈ SN by (β). (b) Similar,
and simpler. (c), (d) Use (β) again.
Corollary. For every term we have M ∈ SN; in particular every term
M is strongly normalizing.
Proof. Induction on the (first) inductive definition of derivation terms
M. In cases u and λ_v M the claim follows from the definition of SN, and
in case MN it follows from the preceding theorem.
1.2.3. Uniqueness of normal forms. We show that normal forms w.r.t.
the →,∀-conversions are uniquely determined. This is also expressed by
saying that the reduction relation is “confluent”. The proof relies on the
fact that the reduction relation terminates, and uses Newman’s lemma to
infer confluence from the (easy to prove) “local confluence”.
A relation → is said to be confluent, or to have the Church–Rosser
property (CR), if, whenever M0 → M1 and M0 → M2 , then there is an
M3 such that M1 → M3 and M2 → M3 . A relation → is said to be
weakly confluent, or to have the weak Church–Rosser property (WCR),
if, whenever M0 → M1 and M0 → M2 then there is an M3 such that
M1 →∗ M3 and M2 →∗ M3 , where →∗ is the reflexive and transitive
closure of →.
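Newman's lemma concerns abstract relations, so it can be illustrated
independently of derivation terms. The sketch below (our own names; the
search terminates only for finite, terminating systems) computes reducts,
tests weak confluence over a finite carrier, and collects normal forms,
which are then unique by the lemma that follows.

    import Data.List (nub)

    -- An abstract rewrite system, given by its one-step reducts.
    type Step a = a -> [a]

    -- Reflexive-transitive closure: everything reachable from x.
    reducts :: Eq a => Step a -> a -> [a]
    reducts step x = go [x]
      where
        go xs = let xs' = nub (xs ++ concatMap step xs)
                in if length xs' == length xs then xs else go xs'

    joinable :: Eq a => Step a -> a -> a -> Bool
    joinable step y z = any (`elem` reducts step z) (reducts step y)

    -- Weak Church-Rosser, checked over a finite carrier.
    wcr :: Eq a => Step a -> [a] -> Bool
    wcr step carrier =
      and [ joinable step y z | x <- carrier, y <- step x, z <- step x ]

    normalForms :: Eq a => Step a -> a -> [a]
    normalForms step x = nub [ y | y <- reducts step x, null (step y) ]

    -- Example: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4 is weakly confluent and
    -- terminating, so 4 is the unique normal form of 1.
    demo :: Bool
    demo = wcr step [1, 2, 3, 4] && normalForms step 1 == [4]
      where
        step :: Int -> [Int]
        step 1 = [2, 3]
        step 2 = [4]
        step 3 = [4]
        step _ = []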
Lemma (Newman [1942]). Assume that → is weakly confluent. Then the
normal form w.r.t. → of a strongly normalizing M is unique. Moreover, if all
terms are strongly normalizing w.r.t. →, then the relation →∗ is confluent.
Proof. We write N ← M for M → N, and N ←∗ M for M →∗ N.
Call M good if it satisfies the confluence property w.r.t. →∗, i.e., whenever
K ←∗ M →∗ L, then K →∗ N ←∗ L for some N. We show that
every strongly normalizing M is good, by transfinite induction on the
well-founded partial order →⁺, restricted to all terms occurring in the
reduction tree of M. So let M be given and assume

    every M′ with M →⁺ M′ is good.

We must show that M is good, so assume K ←∗ M →∗ L. We may further
assume that there are M′, M″ such that K ←∗ M′ ← M → M″ →∗ L,
for otherwise the claim is trivial. But then the claim follows from the
assumed weak confluence and the induction hypothesis for M′ and M″,
as shown in Figure 1.
    Figure 1. Proof of Newman’s lemma: single steps M → M′ and
    M → M″; weak confluence yields N with M′ →∗ N ←∗ M″; the
    induction hypothesis for M′ (applied to K ←∗ M′ →∗ N) yields N′
    with K →∗ N′ ←∗ N; the induction hypothesis for M″ (applied to
    N′ ←∗ M″ →∗ L) yields N″ with N′ →∗ N″ ←∗ L; hence
    K →∗ N″ ←∗ L.
Proposition. → is weakly confluent.
Proof. Assume N0 ← M → N1 . We show that N0 →∗ N ←∗ N1 for
some N , by induction on M . If there are two inner reductions both on
the same subterm, then the claim follows from the induction hypothesis
using substitutivity. If they are on distinct subterms, then the subterms
do not overlap and the claim is obvious. It remains to deal with the
case of a head reduction together with an inner conversion. This is done
in Figure 2 on page 28, where for the lower left arrows we have used
substitutivity again.
    Figure 2. Weak confluence: the critical pairs combine the head
    conversion (λ_v M(v))N L⃗ → M(N)L⃗ with an inner reduction in
    M (to M′), in N (to N′) or in L⃗ (to L⃗′); the two reducts rejoin
    in M′(N)L⃗, in M(N′)L⃗ (using substitutivity, so in general in
    several steps), or in M(N)L⃗′, respectively.
Corollary. Normal forms are unique.
Proof. By the proposition → is weakly confluent. From this and the
fact that it is strongly normalizing we can infer (using Newman’s lemma)
that normal forms are unique.
1.2.4. The structure of normal derivations. To analyze normal deriva-
tions, it will be useful to introduce the notion of a track in a proof tree,
which makes sense for non-normal derivations as well.
Definition. A track of a derivation M is a sequence of formula occur-
rences (f.o.) A0 , . . . , An such that
(a) A0 is a top f.o. in M (possibly discharged by an application of →⁺);
(b) Ai for i < n is not the minor premise of an instance of →− , and Ai+1
is directly below Ai ;
(c) An is either the minor premise of an instance of →− , or the conclusion
of M .
The track of order 0, or main track, in a derivation is the (unique) track
ending in the conclusion of the whole derivation. A track of order n + 1
is a track ending in the minor premise of an →− -application, with major
premise belonging to a track of order n.
Lemma. In a derivation each formula occurrence belongs to some track.
Proof. By induction on derivations.
Now consider a normal derivation M . Since by normality an E-rule
cannot have the conclusion of an I-rule as its major premise, the E-rules
have to precede the I-rules in a track, so the following is obvious: a track
may be divided into an E-part, say A0 , . . . , Ai−1 , a minimal formula Ai ,
and an I-part Ai+1 , . . . , An . In the E-part all rules are E-rules; in the I-part
all rules are I-rules; Ai is the conclusion of an E-rule and, if i < n, a premise
of an I-rule. Tracks are pieces of branches of the tree with successive f.o.’s
in the subformula relationship: either Ai+1 is a subformula of Ai or vice
versa. As a result, all formulas in a track A0 , . . . , An are subformulas of
A0 or of An ; and from this, by induction on the order of tracks, we see
that every formula in M is a subformula either of an open assumption or
of the conclusion. To summarize:
Theorem. In a normal derivation each formula is a subformula of either
the end formula or else an assumption formula.
Proof. One proves this for tracks of order n, by induction on n.
Notice that the minimal formula in a track can be an implication A → B
or generalization ∀x A. However, we can apply an “η-expansion” and
replace the occurrence of A → B or ∀x A by

    A → B    u : A               ∀x A
    ────────────── →⁻           ────── ∀⁻
          B                       A
       ──────── →⁺ u            ────── ∀⁺ x
        A → B                    ∀x A
Repeating this process we obtain a derivation in “long normal form”, all
of whose minimal formulas are neither implications nor generalizations.
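For the implicational fragment the η-expansion step is easily made
algorithmic; the sketch below (our own minimal representation, with
fresh variables drawn from a counter) computes the η-long form.

    data Type = Base | Arr Type Type deriving (Eq, Show)
    data Trm  = V String | L String Trm | A Trm Trm deriving (Eq, Show)

    -- etaExpand i ty m: the eta-long form of m at type ty; the counter
    -- i seeds fresh variable names.
    etaExpand :: Int -> Type -> Trm -> Trm
    etaExpand _ Base      m = m
    etaExpand i (Arr a b) m =
      L x (etaExpand (i + 1) b (A m (etaExpand (i + 1) a (V x))))
      where x = "x" ++ show i
    -- e.g. etaExpand 0 (Arr (Arr Base Base) Base) (V "f")
    --   == L "x0" (A (V "f") (L "x1" (A (V "x0") (V "x1"))))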
1.2.5. Normal vs. non-normal derivations. We work in a language with
a ternary relation symbol R, a constant 0 and a unary function symbol S.
The intuitive meaning of Ryxz is y + 2^x = z, and we can express this by
means of two (“Horn”-) clauses
Hyp1 := ∀y R(y, 0, Sy),
Hyp2 := ∀y,x,z,z1 (Ryxz → Rzxz1 → R(y, Sx, z1 )).
Let
Di := ∃˜ zi ,zi−1 ,...,z0 (R00zi ∧˜ R0zi zi−1 ∧˜ · · · ∧˜ R0z1 z0 ),
Ci := Hyp1 → Hyp2 → Di .
(for ∧̃ cf. the remark at the end of 1.1.7). Dᵢ intuitively means that there
are numbers zᵢ = 1, zᵢ₋₁ = 2^{zᵢ} = 2, zᵢ₋₂ = 2^{zᵢ₋₁} = 2², zᵢ₋₃ = 2^{zᵢ₋₂} = 2^{2²}
and finally z₀ = 2ᵢ (where 2₀ := 1, 2_{n+1} := 2^{2ₙ}).
To obtain short derivations of Ci we use the following “lifting” formulas:
A0 (x) := ∀y ∃˜ z Ryxz,
Ai+1 (x) := ∀y∈Ai ∃˜ z∈Ai Ryxz,
where ∀z∈Ai B abbreviates ∀z (Ai (z) → B).
Lemma. There are derivations of
(a) ∀x (Ai (x) → Ai (Sx)) from Hyp2 and of
(b) Ai (0) from Hyp1 and Hyp2 ,
both of constant (i.e., independent of i) height.
Proof. Unfolding ∃˜ gives
Di = ∀zi ,zi−1 ,...,z0 (R00zi → R0zi zi−1 → · · · → R0z1 z0 → ⊥) → ⊥,
A0 (x) = ∀y (∀z (Ryxz → ⊥) → ⊥),
Ai+1 (x) = ∀y∈Ai (∀z∈Ai (Ryxz → ⊥) → ⊥).
(a) Derivations Mi of ∀x (Ai (x) → Ai (Sx)) from Hyp2 with constant
height are constructed as follows. We use assumption variables
d : Ai (x), e3 : Ryxz, e5 : Rzxz1 , w0 : ∀z1 ¬R(y, Sx, z1 )
and in case i > 0
e1 : Ai−1 (y), e2 : Ai−1 (z), e4 : Ai−1 (z1 ), w : ∀z1 ∈Ai−1 ¬R(y, Sx, z1 ).
Take in case i = 0

    Mᵢ := λ_{x,d,y,w₀} (dy λ_{z,e₃} (dz λ_{z₁,e₅} (w₀z₁(Hyp₂ yxzz₁e₃e₅))))

and in case i > 0

    Mᵢ := λ_{x,d,y,e₁,w} (dye₁ λ_{z,e₂,e₃} (dze₂ λ_{z₁,e₄,e₅} (wz₁e₄(Hyp₂ yxzz₁e₃e₅)))).
Notice that d is used twice in these derivations.
(b) Clearly A0 (0) can be derived from Hyp1 . For i > 0 the required
derivation of Ai (0) from Hyp1 , Hyp2 of constant height can be constructed
from Mi−1 : ∀x (Ai−1 (x) → Ai−1 (Sx)) and the assumption variables
d : Ai−1 (x), e : ∀z∈Ai−1 ¬Rx0z.
Take

    Nᵢ := λ_{x,d,e} (e(Sx)(Mᵢ₋₁xd)(Hyp₁x)).
Proposition. There are derivations of Di from Hyp1 and Hyp2 with
height linear in i.
Proof. Let Nᵢ be the derivation of Aᵢ(0) from Hyp₁, Hyp₂ constructed in
the lemma above. Let

    K₀ := w₀z₀v₀,
    K₁ := u₁0 λ_{z₀,v₀} (w₁z₁v₁z₀v₀),
    Kᵢ := uᵢ0Nᵢ₋₂ λ_{zᵢ₋₁,uᵢ₋₁,vᵢ₋₁} Kᵢ₋₁[wᵢ₋₁ := wᵢzᵢvᵢ]   (i ≥ 2)
with assumption variables
ui : Ai−1 (zi ) (i > 0),
vi : R0zi+1 zi ,
wi : ∀zi (R0zi+1 zi → ∀zi−1 (R0zi zi−1 → . . . ∀z0 (R0z1 z0 → ⊥) . . . )).
Ki has free object variables zi+1 , zi and free assumption variables ui , vi , wi
(with ui missing in case i = 0). Substitute zi+1 by 0 and zi by S0 in Ki .
The result has free assumption variables among Hyp1 , Hyp2 and
ui : Ai−1 (S0) (i > 0),
vi : R(0, 0, S0),
wi : ∀zi (R00zi → ∀zi−1 (R0zi zi−1 → . . . ∀z0 (R0z1 z0 → ⊥) . . . )).
Now Ai−1 (S0) can be derived from Hyp1 , Hyp2 with constant height by
the lemma above, and clearly R(0, 0, S0) as well. Ki has height linear in i.
Hence we have a derivation of
∀zi (R00zi → ∀zi−1 (R0zi zi−1 → . . . ∀z0 (R0z1 z0 → ⊥) . . . )) → ⊥
from Hyp1 , Hyp2 of height linear in i. But this formula is up to making
the premise prenex the same as Di , and this transformation can clearly be
done by a derivation of height again linear in i.
Theorem. Every normal derivation of Dᵢ from Hyp₁, Hyp₂ has at least
2ᵢ nodes.
Proof. Let Lᵢ be a normal derivation of ⊥ from Hyp₁, Hyp₂ and the
assumption

    Eᵢ := ∀_{zᵢ,zᵢ₋₁,…,z₀} (R00zᵢ → R0zᵢzᵢ₋₁ → ⋯ → R0z₁z₀ → ⊥).

We can assume that Lᵢ has no free variables; otherwise replace them by 0.
The main branch of Lᵢ starts with Eᵢ followed by i + 1 applications of
∀⁻ followed by i + 1 applications of →⁻. All minor premises are of the
form R0n̄k̄ (where 0̄ := 0 and the numeral for n + 1 is Sn̄).

Let M be an arbitrary normal derivation of Rm̄n̄k̄ from Eᵢ, Hyp₁,
Hyp₂. We show that M (i) contains at least 2ⁿ occurrences of Hyp₁, and
(ii) satisfies m + 2ⁿ = k. We prove (i) and (ii) by induction on n. The
base case is obvious. For the step case we can assume that every normal
derivation of Rm̄n̄k̄ from Eᵢ, Hyp₁, Hyp₂ contains at least 2ⁿ occurrences
of Hyp₁, and satisfies m + 2ⁿ = k. Now consider an arbitrary normal
derivation of R(m̄, Sn̄, k̄). It must end with

                                        | M₁
    Rm̄n̄l̄ → Rl̄n̄k̄ → R(m̄, Sn̄, k̄)          Rm̄n̄l̄
    ──────────────────────────────────────── →⁻      | M₂
             Rl̄n̄k̄ → R(m̄, Sn̄, k̄)                      Rl̄n̄k̄
             ───────────────────────────────────────────── →⁻
                           R(m̄, Sn̄, k̄)

By induction hypothesis both M₁, M₂ contain at least 2ⁿ occurrences of
Hyp₁, and we have m + 2ⁿ = l and l + 2ⁿ = k, hence m + 2ⁿ⁺¹ = k. (It
is easy to see that M does not use the assumption Eᵢ.)

We now come back to the main branch of Lᵢ, in particular its minor
premises. They derive R001̄, R01̄2̄ and so on until R(0, 2̄ᵢ₋₁, 2̄ᵢ). Hence
altogether we have ∑_{j≤i} 2ⱼ ≥ 2ᵢ occurrences of Hyp₁.
      Derivation                                   Term

        | M              | M
        A                B
      ─────── ∨⁺₀      ─────── ∨⁺₁                 (∨⁺₀,B M^A)^{A∨B}   (∨⁺₁,A M^B)^{A∨B}
       A ∨ B            A ∨ B

                 [u : A]   [v : B]
                   | N       | K
        | M        C         C
       A ∨ B
      ──────────────────────────── ∨⁻ u, v         (M^{A∨B} (u^A.N^C, v^B.K^C))^C
                   C

        | M    | N
        A      B
      ──────────── ∧⁺                              ⟨M^A, N^B⟩^{A∧B}
         A ∧ B

                 [u : A] [v : B]
        | M          | N
       A ∧ B          C
      ──────────────────── ∧⁻ u, v                 (M^{A∧B} (u^A, v^B.N^C))^C
               C

            | M
        r   A(r)
      ──────────── ∃⁺                              (∃⁺_{x,A} r M^{A(r)})^{∃x A(x)}
        ∃x A(x)

                 [u : A]
        | M        | N
       ∃x A        B
      ──────────────── ∃⁻ x, u (var.cond.)         (M^{∃x A} (x, u^A.N^B))^B (var.cond.)
              B

            Table 2. Derivation terms for ∨, ∧ and ∃
1.2.6. Conversions for ∨, ∧, ∃. In addition to the →, ∀-conversions
treated in 1.2.1, we consider the following conversions:
∨-conversion.

      | M            [u : A]   [v : B]
      A                | N       | K               | M
    ─────── ∨⁺₀        C         C                 A
     A ∨ B                                   ↦     | N
    ─────────────────────────────── ∨⁻ u, v        C
                  C

or as derivation terms

    (∨⁺₀,B M^A)^{A∨B} (u^A.N(u)^C, v^B.K(v)^C) ↦ N(M^A)^C,

and similarly for ∨⁺₁ with K instead of N.
∧-conversion.

     | M   | N        [u : A]  [v : B]             | M   | N
     A     B              | K                      A     B
    ─────────── ∧⁺        C                  ↦       | K
       A ∧ B                                         C
    ──────────────────────────── ∧⁻ u, v
               C

or ⟨M^A, N^B⟩^{A∧B} (u^A, v^B.K(u, v)^C) ↦ K(M^A, N^B)^C.
∃-conversion.

          | M         [u : A(x)]                | M
     r    A(r)           | N                    A(r)
    ──────────── ∃⁺      B                ↦     | N
      ∃x A(x)                                   B
    ──────────────────────── ∃⁻ x, u
            B

or

    (∃⁺_{x,A} r M^{A(r)})^{∃x A(x)} (x, u^{A(x)}.N(x, u)^B) ↦ N(r, M^{A(r)})^B.
However, there is a difficulty here: an introduced formula may be used
as a minor premise of an application of an elimination rule for ∨, ∧ or
∃, then stay the same throughout a sequence of applications of these
rules, being eliminated at the end. This also constitutes a local maximum,
which we should like to eliminate; permutative conversions are designed for
exactly this situation. In a permutative conversion we permute an E-rule
upwards over the minor premises of ∨− , ∧− or ∃− . They are defined as
follows.
∨-permutative conversion.

              [u : A]  [v : B]
      | M       | N      | K
     A ∨ B      C        C
    ──────────────────────── ∨⁻ u, v
               C                  | L
             ───────────────────────── E-rule
                       D

                      ↦

              [u : A]           [v : B]
                | N    | L        | K    | L
      | M       C                 C
     A ∨ B    ────────── E-rule ────────── E-rule
                 D                 D
    ─────────────────────────────────────── ∨⁻ u, v
                      D

or with for instance →⁻ as E-rule

    (M^{A∨B} (u^A.N^{C→D}, v^B.K^{C→D}))^{C→D} L^C
       ↦ (M^{A∨B} (u^A.(N^{C→D}L^C)^D, v^B.(K^{C→D}L^C)^D))^D.
∧-permutative conversion.

             [u : A] [v : B]                          [u : A] [v : B]
      | M        | N                                      | N    | K
     A ∧ B       C                          | M           C
    ──────────────── ∧⁻ u, v         ↦     A ∧ B        ───────── E-rule
           C         | K                                    D
         ──────────────── E-rule          ──────────────────────── ∧⁻ u, v
                D                                    D

or (M^{A∧B} (u^A, v^B.N^{C→D}))^{C→D} K^C ↦ (M^{A∧B} (u^A, v^B.(N^{C→D}K^C)^D))^D.
∃-permutative conversion.

             [u : A]                                  [u : A]
      | M      | N                                      | N    | K
     ∃x A      B                            | M         B
    ────────────── ∃⁻ x, u           ↦     ∃x A       ───────── E-rule
           B        | K                                   D
         ──────────────── E-rule          ──────────────────────── ∃⁻ x, u
                D                                   D

or (M^{∃x A} (u^A.N^{C→D}))^{C→D} K^C ↦ (M^{∃x A} (u^A.(N^{C→D}K^C)^D))^D.
We further need so-called simplification conversions. These are some-
what trivial conversions, which remove unnecessary applications of the
elimination rules for ∨, ∧ and ∃. For ∨ we have
              [u : A]  [v : B]
      | M       | N      | K                    | N
     A ∨ B      C        C               ↦      C
    ──────────────────────── ∨⁻ u, v
               C

if u : A is not free in N, or (M^{A∨B} (u^A.N^C, v^B.K^C))^C ↦ N^C; similarly
for the second component. For ∧ there is the conversion

              [u : A]  [v : B]
      | M          | N                          | N
     A ∧ B         C                     ↦      C
    ──────────────────── ∧⁻ u, v
              C

if neither u : A nor v : B is free in N, or (M^{A∧B} (u^A, v^B.N^C))^C ↦ N^C.
For ∃ the simplification conversion is

              [u : A]
      | M       | N                             | N
     ∃x A       B                        ↦      B
    ────────────── ∃⁻ x, u
           B

if again u : A is not free in N, or (M^{∃x A} (u^A.N^B))^B ↦ N^B.
1.2.7. Strong normalization for β-, π- and σ-conversions. We now ex-
tend the proof of strong normalization in 1.2.2 to the new conversion rules.
We shall write derivation terms without formula super- or subscripts. For
instance, we write ∃⁺ instead of ∃⁺_{x,A}. Hence we consider derivation terms
M, N, K now of the forms

    u | λ_v M | λ_y M | ∨⁺₀M | ∨⁺₁M | ⟨M, N⟩ | ∃⁺rM |
    MN | Mr | M(v₀.N₀, v₁.N₁) | M(v, w.N) | M(x, v.N)
where, in these expressions, the variables v, y, v0 , v1 , w, x are bound.
To simplify the technicalities, we restrict our treatment to the rules for
→ and ∃. The argument easily extends to the full set of rules. Hence we
consider
    u | λ_v M | ∃⁺rM | MN | M(x, v.N).
The strategy for strong normalization is set out below. We reserve the
letters E, F, G for eliminations, i.e., expressions of the form (x, v.N), and
R, S, T for both terms and eliminations. Using this notation we obtain a
second (and clearly equivalent) inductive definition of terms:

    u M⃗ | u M⃗ E | λ_v M | ∃⁺rM | (λ_v M)N R⃗ | ∃⁺rM(x, v.N)R⃗ | u M⃗ E R S⃗.

Here the final three forms are not normal: (λ_v M)N R⃗ and ∃⁺rM(x, v.N)R⃗
both are β-redexes, and u M⃗ E R S⃗ is a permutative redex. The conversion
rules for them are

    (λ_v M(v))N ↦ M(N)                       β→-conversion,
    ∃⁺_{x,A} rM(x, v.N(x, v)) ↦ N(r, M)      β∃-conversion,
    M(x, v.N)R ↦ M(x, v.NR)                  permutative conversion.

In addition we also allow

    M(x, v.N) ↦ N    if v : A is not free in N.

The latter is called a simplification conversion and M(x, v.N) a simplifica-
tion redex.
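As in 1.2.1, the conversions can be animated. The sketch below is ours:
ExI r m stands for ∃⁺rM and ExE m (x, v) n for M(x, v.N); object terms
are restricted to variables for brevity, and all bound names are assumed
distinct (so the naive substitution is adequate).

    data Tm = Var String | Lam String Tm | App Tm Tm
            | ExI String Tm | ExE Tm (String, String) Tm
      deriving (Eq, Show)

    free :: String -> Tm -> Bool
    free x (Var y)          = x == y
    free x (Lam y m)        = x /= y && free x m
    free x (App m n)        = free x m || free x n
    free x (ExI _ m)        = free x m
    free x (ExE m (y, v) n) = free x m || (x /= y && x /= v && free x n)

    subst :: String -> Tm -> Tm -> Tm
    subst x n (Var y)          = if x == y then n else Var y
    subst x n (Lam y m)        = if x == y then Lam y m else Lam y (subst x n m)
    subst x n (App m m')       = App (subst x n m) (subst x n m')
    subst x n (ExI r m)        = ExI r (subst x n m)
    subst x n (ExE m (y, v) k) =
      ExE (subst x n m) (y, v) (if x == y || x == v then k else subst x n k)

    -- One conversion at the root, if any applies (first match wins).
    convert :: Tm -> Maybe Tm
    convert (App (Lam v m) n)        = Just (subst v n m)        -- beta ->
    convert (ExE (ExI r m) (x, v) n) =
      Just (subst v m (subst x (Var r) n))                       -- beta exists
    convert (App (ExE m (x, v) n) r) =
      Just (ExE m (x, v) (App n r))                              -- permutative
    convert (ExE _ (_, v) n)
      | not (free v n)               = Just n                    -- simplification
    convert _                        = Nothing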
The closure of these conversions is defined by
(a) If M → M for = , , , then M → M .
(b) If M → M , then MR → M R, NM → NM , N (x, v.M ) →
N (x, v.M ), v M → v M , ∃+ rM → ∃+ rM (inner reductions).
So M → N means that M reduces in one step to N , i.e., N is obtained
from M by replacement of (an occurrence of) a redex M of M by a
conversum M of M , i.e., by a single conversion.
We inductively define a set SN of derivation terms. In doing so we take
care that for a given M there is exactly one rule applicable to generate
M ∈ SN. This will be crucial to make the later proofs work.
Definition (SN).

     M⃗ ∈ SN                M ∈ SN              M ∈ SN
    ────────── (Var₀)    ─────────── (λ)     ──────────── (∃)
    u M⃗ ∈ SN            λ_v M ∈ SN           ∃⁺rM ∈ SN

      M⃗, N ∈ SN                    u M⃗ (x, v.NR) S⃗ ∈ SN
    ──────────────────── (Var)    ─────────────────────── (Varπ)
    u M⃗ (x, v.N) ∈ SN             u M⃗ (x, v.N) R S⃗ ∈ SN

     M(N) R⃗ ∈ SN    N ∈ SN
    ──────────────────────── (β→)
     (λ_v M(v)) N R⃗ ∈ SN

     N(r, M) R⃗ ∈ SN    M ∈ SN
    ───────────────────────────────────── (β∃)
     ∃⁺_{x,A} rM(x, v.N(x, v)) R⃗ ∈ SN

In (Varπ) we require that x (from ∃x A) and v are not free in R.
It is easy to see that SN is closed under substitution for object variables:
if M(x) ∈ SN, then M(r) ∈ SN. The proof is by induction on M ∈
SN, applying the induction hypothesis first to the premise(s) and then
reapplying the same rule.
We write M ↓ to mean that M is strongly normalizing, i.e., that every
reduction sequence starting from M terminates. By analysing the possible
reduction steps we now show that the set {M | M ↓} has the closure
properties of the definition of SN above, and hence SN ⊆ {M | M ↓}.
Lemma. Every term in SN is strongly normalizing.
Proof. We distinguish cases according to the generation rule of SN
applied last. The following rules deserve special attention.
Case (Varπ). We prove, as an auxiliary lemma, that

    u M⃗ (x, v.NR) S⃗ ↓  implies  u M⃗ (x, v.N) R S⃗ ↓.

As a typical case consider

    u M⃗ (x, v.N(x′, v′.N′)) T⃗ ↓  implies  u M⃗ (x, v.N)(x′, v′.N′) T⃗ ↓.

However, it is easy to see that any infinite reduction sequence of the latter
would give rise to an infinite reduction sequence of the former.

Case (β→). We show that M(N)R⃗↓ and N↓ imply (λ_v M(v))N R⃗↓.
This is done by induction on N↓, with a side induction on M(N)R⃗↓.
We need to consider all possible reducts of (λ_v M(v))N R⃗. In case of an
outer β-reduction use the assumption. If N is reduced, use the induction
hypothesis. Reductions in M and in R⃗ as well as permutative reductions
within R⃗ are taken care of by the side induction hypothesis.

Case (β∃). We show that

    N(r, M)R⃗↓ and M↓ together imply ∃⁺rM(x, v.N(x, v))R⃗↓.

This is done by a threefold induction: first on M↓, second on N(r, M)R⃗↓
and third on the length of R⃗. We need to consider all possible reducts
of ∃⁺rM(x, v.N(x, v))R⃗. In the case of an outer β-reduction it must
reduce to N(r, M)R⃗, hence the result by assumption. If M is reduced, use
the first induction hypothesis. Reductions in N(x, v) and in R⃗ as well as
permutative reductions within R⃗ are taken care of by the second induction
hypothesis. The only remaining case is when R⃗ = S S⃗′ and (x, v.N(x, v))
is permuted with S, to yield ∃⁺rM(x, v.N(x, v)S)S⃗′, in which case the
third induction hypothesis applies.
For later use we prove a slightly generalized form of the rule (Varπ):

Proposition. If M(x, v.NR)S⃗ ∈ SN, then M(x, v.N)RS⃗ ∈ SN.

Proof. Induction on the generation of M(x, v.NR)S⃗ ∈ SN. We dis-
tinguish cases according to the form of M.

Case u T⃗(x, v.NR)S⃗ ∈ SN. If T⃗ = M⃗ (i.e., T⃗ consists of deriva-
tion terms only), use (Varπ). Else u M⃗(x′, v′.N′)R⃗′(x, v.NR)S⃗ ∈ SN.
This must be generated by repeated applications of (Varπ) from
u M⃗(x′, v′.N′R⃗′(x, v.NR)S⃗) ∈ SN, and finally by (Var) from M⃗ ∈ SN and
N′R⃗′(x, v.NR)S⃗ ∈ SN. The induction hypothesis for the latter fact yields
N′R⃗′(x, v.N)RS⃗ ∈ SN, hence u M⃗(x′, v′.N′R⃗′(x, v.N)RS⃗) ∈ SN by (Var)
and u M⃗(x′, v′.N′)R⃗′(x, v.N)RS⃗ ∈ SN by (Varπ).

Case ∃⁺rM′T⃗(x, v.N(x, v)R)S⃗ ∈ SN. Similarly, with (β∃) instead of
(Varπ). In detail: if T⃗ is empty, by (β∃) this came from N(r, M′)RS⃗ ∈ SN
and M′ ∈ SN, hence ∃⁺rM′(x, v.N(x, v))RS⃗ ∈ SN again by (β∃). Oth-
erwise we have ∃⁺rM′(x′, v′.N′(x′, v′))T⃗(x, v.NR)S⃗ ∈ SN. This must
be generated by (β∃) from N′(r, M′)T⃗(x, v.NR)S⃗ ∈ SN. The induction
hypothesis yields N′(r, M′)T⃗(x, v.N)RS⃗ ∈ SN, hence
∃⁺rM′(x′, v′.N′(x′, v′))T⃗(x, v.N)RS⃗ ∈ SN by (β∃).

Case (λ_w M(w))N′R⃗′(x, v.NR)S⃗ ∈ SN. By (β→) this came from N′ ∈
SN and M(N′)R⃗′(x, v.NR)S⃗ ∈ SN. But the induction hypothesis yields
M(N′)R⃗′(x, v.N)RS⃗ ∈ SN, hence (λ_w M(w))N′R⃗′(x, v.N)RS⃗ ∈ SN by
(β→).
We show, finally, that every term is in SN and hence is strongly nor-
malizing. Given the definition of SN we only have to show that SN is
closed under →− and ∃− . But in order to prove this we must prove
simultaneously the closure of SN under substitution.
Theorem (Properties of SN). For all formulas A,
(a) for all M ∈ SN, if M proves A = A₀ → A₁ and N ∈ SN, then MN ∈
    SN,
(b) for all M ∈ SN, if M proves A = ∃x B and N ∈ SN, then M(x, v.N) ∈
    SN,
(c) for all M(v) ∈ SN, if N^A ∈ SN, then M(N) ∈ SN.
Proof. Induction on |A|. We prove (a) and (b) before (c), and hence
have (a) and (b) available for the proof of (c). More formally, by induction
on A we simultaneously prove that (a) holds, that (b) holds and that (a),
(b) together imply (c).
(a) By side induction on M ∈ SN. Let M ∈ SN and assume that M
proves A = A₀ → A₁ and N ∈ SN. We distinguish cases according to
how M ∈ SN was generated. For (Var₀), (Varπ), (β→) and (β∃) use the
same rule again.

Case u M⃗(x, v.N′) ∈ SN by (Var) from M⃗, N′ ∈ SN. Then N′N ∈ SN
by side induction hypothesis for N′, hence u M⃗(x, v.N′N) ∈ SN by (Var),
hence u M⃗(x, v.N′)N ∈ SN by (Varπ).

Case (λ_v M(v))^{A₀→A₁} ∈ SN by (λ) from M(v) ∈ SN. Use (β→); for this
we need to know M(N) ∈ SN. But this follows from induction hypothesis
(c) for M(v), since N derives A₀.

(b) By side induction on M ∈ SN. Let M ∈ SN and assume that
M proves A = ∃x B and N ∈ SN. The goal is M(x, v.N) ∈ SN. We
distinguish cases according to how M ∈ SN was generated. For (Varπ),
(β→) and (β∃) use the same rule again.

Case u M⃗ ∈ SN by (Var₀) from M⃗ ∈ SN. Use (Var).

Case (∃⁺rM)^{∃x B} ∈ SN by (∃) from M ∈ SN. We must show that
∃⁺rM(x, v.N(x, v)) ∈ SN. Use (β∃); for this we need to know N(r, M) ∈
SN. But this follows from induction hypothesis (c) for N(r, v) (which is
in SN by the remark above), since M derives B(r).

Case u M⃗(x′, v′.N′) ∈ SN by (Var) from M⃗, N′ ∈ SN. Then
N′(x, v.N) ∈ SN by side induction hypothesis for N′, hence
u M⃗(x′, v′.N′(x, v.N)) ∈ SN by (Var) and therefore
u M⃗(x′, v′.N′)(x, v.N) ∈ SN by (Varπ).

(c) By side induction on M(v) ∈ SN. Let N^A ∈ SN; the goal is
M(N) ∈ SN. We distinguish cases according to how M(v) ∈ SN was
generated. For (λ), (∃), (β→) and (β∃) use the same rule again, after
applying the induction hypothesis to the premise(s).

Case u M⃗(v) ∈ SN by (Var₀) from M⃗(v) ∈ SN. Then M⃗(N) ∈ SN
by side induction hypothesis (c). If u ≠ v, use (Var₀) again. If u = v,
we must show N M⃗(N) ∈ SN. Note that N proves A; hence the claim
follows from M⃗(N) ∈ SN by (a) with M := N.

Case u M⃗(v)(x′, v′.N′(v)) ∈ SN by (Var) from M⃗(v), N′(v) ∈ SN. If
u ≠ v, use (Var) again. If u = v, we must show N M⃗(N)(x′, v′.N′(N)) ∈
SN. Note that N proves A; hence in case M⃗(v) is empty the claim follows
from (b) with M := N, and otherwise from (a), (b) and the induction
hypothesis.

Case u M⃗(v)(x′, v′.N′(v))R(v)S⃗(v) ∈ SN has been obtained by (Varπ)
from u M⃗(v)(x′, v′.N′(v)R(v))S⃗(v) ∈ SN. If u ≠ v, use (Varπ) again.
If u = v, we obtain N M⃗(N)(x′, v′.N′(N)R(N))S⃗(N) ∈ SN from the
side induction hypothesis. Now use the proposition above with M :=
N M⃗(N).
Corollary. Every derivation term is in SN and therefore strongly nor-
malizing.
Proof. Induction on the (first) inductive definition of derivation terms.
In cases u, λ_v M and ∃⁺rM the claim follows from the definition of SN,
and in cases MN and M(x, v.N) from parts (a), (b) of the previous
theorem.
Incorporating the full set of rules adds no other technical complications
but merely increases the length. For the energetic reader, however, we
include here the details necessary for disjunction. The conjunction case is
entirely straightforward.
We have additional β-conversions

    ∨⁺ᵢM(v₀.N₀, v₁.N₁) ↦ Nᵢ[vᵢ := M]    β∨ᵢ-conversion.

The definition of SN needs to be extended by

        M ∈ SN                    M⃗, N₀, N₁ ∈ SN
    ────────────── (∨ᵢ)     ─────────────────────────── (Var∨)
     ∨⁺ᵢM ∈ SN              u M⃗(v₀.N₀, v₁.N₁) ∈ SN

     u M⃗(v₀.N₀R, v₁.N₁R)S⃗ ∈ SN
    ──────────────────────────────── (Var∨,π)
     u M⃗(v₀.N₀, v₁.N₁)RS⃗ ∈ SN

     Nᵢ[vᵢ := M]R⃗ ∈ SN    N₁₋ᵢR⃗ ∈ SN    M ∈ SN
    ───────────────────────────────────────────── (β∨ᵢ)
     ∨⁺ᵢM(v₀.N₀, v₁.N₁)R⃗ ∈ SN

The former rules (Var), (Varπ) should then be renamed (Var∃), (Var∃,π).
The lemma above stating that every term in SN is strongly normalizing
needs to be extended by an additional clause:

Case (β∨ᵢ). We show that Nᵢ[vᵢ := M]R⃗↓, N₁₋ᵢR⃗↓ and M↓ together
imply ∨⁺ᵢM(v₀.N₀, v₁.N₁)R⃗↓. This is done by a fourfold induction: first
on M↓, second on Nᵢ[vᵢ := M]R⃗↓, third on N₁₋ᵢR⃗↓ and
fourth on the length of R⃗. We need to consider all possible reducts of
∨⁺ᵢM(v₀.N₀, v₁.N₁)R⃗. In case of an outer β-reduction use the assump-
tion. If M is reduced, use the first induction hypothesis. Reductions in Nᵢ
and in R⃗ as well as permutative reductions within R⃗ are taken care of by the
second induction hypothesis. Reductions in N₁₋ᵢ are taken care of by the
third induction hypothesis. The only remaining case is when R⃗ = S S⃗′ and
(v₀.N₀, v₁.N₁) is permuted with S, to yield (v₀.N₀S, v₁.N₁S). Apply the
fourth induction hypothesis, since (NᵢS)[vᵢ := M]S⃗′ = Nᵢ[vᵢ := M]S S⃗′.
Finally the theorem above stating properties of SN needs an additional
clause:
(b′) for all M ∈ SN, if M proves A = A₀ ∨ A₁ and N₀, N₁ ∈ SN, then
M(v₀.N₀, v₁.N₁) ∈ SN.
Proof. The new clause is proved by induction on M ∈ SN. Let M ∈
SN and assume that M proves A = A₀ ∨ A₁ and N₀, N₁ ∈ SN. The goal is
M(v₀.N₀, v₁.N₁) ∈ SN. We distinguish cases according to how M ∈ SN
was generated. For (Var∃,π), (Var∨,π), (β→), (β∃) and (β∨ᵢ) use the same
rule again.

Case u M⃗ ∈ SN by (Var₀) from M⃗ ∈ SN. Use (Var∨).

Case (∨⁺ᵢM)^{A₀∨A₁} ∈ SN by (∨ᵢ) from M ∈ SN. Use (β∨ᵢ); for this we
need to know Nᵢ[vᵢ := M] ∈ SN and N₁₋ᵢ ∈ SN. The latter is assumed,
and the former follows from the main induction hypothesis (with Nᵢ) for the
substitution clause of the theorem, since M derives Aᵢ.

Case u M⃗(x′, v′.N′) ∈ SN by (Var∃) from M⃗, N′ ∈ SN. For brevity let
E := (v₀.N₀, v₁.N₁). Then N′E ∈ SN by side induction hypothesis for N′,
so u M⃗(x′, v′.N′E) ∈ SN by (Var∃) and therefore u M⃗(x′, v′.N′)E ∈ SN
by (Var∃,π).

Case u M⃗(v₀′.N₀′, v₁′.N₁′) ∈ SN by (Var∨) from M⃗, N₀′, N₁′ ∈ SN. Let
E := (v₀.N₀, v₁.N₁). Then Nᵢ′E ∈ SN by side induction hypothesis for
Nᵢ′, so u M⃗(v₀′.N₀′E, v₁′.N₁′E) ∈ SN by (Var∨) and therefore
u M⃗(v₀′.N₀′, v₁′.N₁′)E ∈ SN by (Var∨,π).
Clause (c) now needs additional cases, e.g.,

Case u M⃗(v₀.N₀, v₁.N₁) ∈ SN by (Var∨) from M⃗, N₀, N₁ ∈ SN. If u ≠ v,
use (Var∨). If u = v, we show N M⃗[v := N](v₀.N₀[v := N], v₁.N₁[v :=
N]) ∈ SN. Note that N proves A; hence in case M⃗ is empty the claim
follows from (b′), and otherwise from (a) and the induction hypothesis.
Since we now have strong normalization, the proof of uniqueness of
normal forms in 1.2.3 can easily be extended to the present case where β-,
permutative and simplification conversions are admitted.
Proposition. The extended reduction relation → for the full set of con-
nectives is weakly confluent.
Proof. The argument for the corresponding proposition in 1.2.3 can
easily be extended.
Corollary. Normal forms are unique.
Proof. As in 1.2.3, using Newman’s lemma.
1.2.8. The structure of normal derivations, again. As mentioned already,
normalizations aim at removing local maxima of complexity, i.e., formula
occurrences which are first introduced and immediately afterwards elimi-
nated. However, an introduced formula may be used as a minor premise
of an application of ∨− , ∧− or ∃− , then stay the same throughout a se-
quence of applications of these rules, being eliminated at the end. This
also constitutes a local maximum, which we should like to eliminate; for
that we need permutative conversions. To analyse normal derivations, it
will be useful to introduce the notion of a segment and to modify accord-
ingly the notion of a track in a proof tree already considered in 1.2.4. Both
make sense for non-normal derivations as well.
Definition. A segment (of length n) in a derivation M is a sequence
A0 , . . . , An of occurrences of the same formula A such that
(a) for 0 ≤ i < n, Ai is a minor premise of an application of ∨− , ∧− or
∃− , with conclusion Ai+1 ;
(b) An is not a minor premise of ∨− , ∧− or ∃− .
(c) A0 is not the conclusion of ∨− , ∧− or ∃− .
Notice that a formula occurrence (f.o.) which is neither a minor premise
nor the conclusion of an application of ∨− , ∧− or ∃− always constitutes
a segment of length 1. A segment is maximal or a cut (segment) if An is
the major premise of an E-rule, and either n > 0, or n = 0 and A0 = An
is the conclusion of an I-rule.
We use σ, τ for segments. σ is called a subformula of τ if the formula
A in σ is a subformula of the formula B in τ.
The notion of a track is designed to retain the subformula property
in case one passes through the major premise of an application of a
∨− , ∧− , ∃− -rule. In a track, when arriving at an Ai which is the major
premise of an application of such a rule, we take for Ai+1 a hypothesis
discharged by this rule.
Definition. A track of a derivation M is a sequence of f.o.’s A0 , . . . , An
such that
(a) A0 is a top f.o. in M not discharged by an application of a ∨− , ∧− , ∃− -
rule;
(b) Ai for i < n is not the minor premise of an instance of →− , and either
(i) Ai is not the major premise of an instance of a ∨− , ∧− , ∃− -rule
and Ai+1 is directly below Ai , or
(ii) Ai is the major premise of an instance of a ∨− , ∧− , ∃− -rule and
Ai+1 is an assumption discharged by this instance;
(c) An is either
(i) the minor premise of an instance of →− , or
(ii) the end formula of M , or
(iii) the major premise of an instance of a ∨− , ∧− , ∃− -rule in case
there are no assumptions discharged by this instance.
Lemma. In a derivation each formula occurrence belongs to some track.
Proof. By induction on derivations. For example, suppose a derivation
K ends with a ∃− -application:
             [u : A]
      | M      | N
     ∃x A      B
    ───────────── ∃⁻ x, u
          B

B in N belongs to a track π (induction hypothesis); either this does not
start in u : A, and then π, B is a track in K which ends in the end formula;
or π starts in u : A, and then there is a track π′ in M (induction hypothesis)
such that π′, π, B is a track in K ending in the end formula. The other
cases are left to the reader.
Definition. A track of order 0, or main track, in a derivation is a track
ending either in the end formula of the whole derivation or in the major
premise of an application of a ∨− , ∧− or ∃− -rule, provided there are no
assumption variables discharged by the application. A track of order n + 1
is a track ending in the minor premise of an →− -application, with major
premise belonging to a track of order n.
A main branch of a derivation is a branch π (i.e., a linearly ordered
subtree) in the proof tree such that π passes only through premises of
I-rules and major premises of E-rules, and π begins at a top node and ends
in the end formula.
Since by simplification conversions we have removed every application
of an ∨− , ∧− or ∃− -rule that discharges no assumption variables, each
track of order 0 in a normal derivation is a track ending in the end formula
of the whole derivation. Note also that if we search for a main branch
going upwards from the end formula, the branch to be followed is unique
as long as we do not encounter an ∧+ -application. Now let us consider
normal derivations. Recall the notion of a strictly positive part of a
formula, defined in 1.1.3.
Proposition. Let M be a normal derivation, and let π = σ₀, …, σₙ be
a track in M. Then there is a segment σᵢ in π, the minimum segment or
minimum part of the track, which separates two (possibly empty) parts of
π, called the E-part (elimination part) and the I-part (introduction part)
of π, such that
(a) for each σⱼ in the E-part one has j < i, σⱼ is a major premise of an
    E-rule, and σⱼ₊₁ is a strictly positive part of σⱼ, and therefore each σⱼ
    is a s.p.p. of σ₀;
(b) for each σⱼ which is in the I-part or is the minimum segment one has
    i ≤ j, and if j < n, then σⱼ is a premise of an I-rule and a s.p.p. of
    σⱼ₊₁, so each σⱼ is a s.p.p. of σₙ.
Proof. By tracing through the definitions.
Theorem (Subformula property). Let M be a normal derivation. Then
each formula occurring in the derivation is a subformula of either the end
formula or else an (uncancelled ) assumption formula.
Proof. As noted above, each track of order 0 in M is a track ending in
the end formula of M . Furthermore each track has an E-part above an
I-part. Therefore any formula on a track of order 0 is either a subformula
of the end formula or else a subformula of an (uncancelled) assumption.
We can now prove the theorem for tracks of order n, by induction on n.
So assume the result holds for tracks of order n. If A is any formula on
a track of order n + 1, either A lies in the E-part in which case it is a
subformula of an assumption, or else it lies in the I-part and is therefore a
subformula of the minor premise of an →⁻ whose major premise belongs
to a track of order n. In this case A is a subformula of a formula on a
track of order n and we can apply the induction hypothesis.
Theorem (Disjunction property). If no strictly positive part of a formula
in Γ is a disjunction, then Γ ⊢ A ∨ B implies Γ ⊢ A or Γ ⊢ B.
Proof. Consider a normal derivation M of A ∨ B from assumptions
Γ not containing a disjunction as s.p.p. The end formula A ∨ B is the
final formula of a (main) track. If the I-part of this track is empty, then
the structure of main tracks ensures that A ∨ B would be a s.p.p. of an
assumption in Γ, but this is not allowed. Hence A ∨ B lies in the I-part
of a main track. If above A ∨ B this track goes through a minor premise
of an ∨⁻, then the major premise would again be a disjunctive s.p.p. of
an assumption, which is not allowed. Thus A ∨ B belongs to a segment
within the I-part of the track, above which there can only be finitely many
∃⁻ and ∧⁻ followed by an ∨⁺ᵢ. Its premise is either A or B, and therefore
we can replace the segment of A ∨ B’s by a segment of A’s or a segment
of B’s, thus transforming the proof into either a proof of A or a proof
of B.
There is a similar theorem for the existential quantifier:
Theorem (Explicit definability under hypotheses). If no strictly posi-
tive part of a formula in Γ is existential, then Γ ⊢ ∃x A(x) implies Γ ⊢
A(r₁) ∨ ⋯ ∨ A(rₙ) for some terms r₁, …, rₙ. If in addition no s.p.p. of a
formula in Γ is disjunctive, then Γ ⊢ ∃x A(x) implies there is even a single
term r such that Γ ⊢ A(r).
Proof. Consider a normal derivation M of ∃x A(x) from assumptions
Γ not containing an existential s.p.p. We use induction on the derivation,
and distinguish cases on the last rule.
By assumption the last rule cannot be ∃− , using a similar argument
to the above. Again as before, the only critical case is when the last rule
is ∨− .
              [u : B]      [v : C]
      | M      | N₀          | N₁
    B ∨ C    ∃x A(x)       ∃x A(x)
    ─────────────────────────────── ∨⁻ u, v
              ∃x A(x)

By assumption again neither B nor C can have an existential s.p.p. Ap-
plying the induction hypothesis to N₀ and N₁ we obtain

              [u : B]                       [v : C]
                 |                             |
          A(r₁) ∨ ⋯ ∨ A(rₙ)           A(rₙ₊₁) ∨ ⋯ ∨ A(rₙ₊ₘ)
      | M ───────────────────── ∨⁺    ───────────────────── ∨⁺
    B ∨ C A(r₁) ∨ ⋯ ∨ A(rₙ₊ₘ)         A(r₁) ∨ ⋯ ∨ A(rₙ₊ₘ)
    ─────────────────────────────────────────────────────── ∨⁻ u, v
                     A(r₁) ∨ ⋯ ∨ A(rₙ₊ₘ)
The remaining cases are left to the reader.
The second part of the theorem is proved similarly; by assumption the
last rule can be neither ∨− nor ∃− , so it may be an ∧− . In that case there is
only one minor premise and so no need to duplicate instances of A(x).
1.3. Soundness and completeness for tree models
It is an obvious question to ask whether the logical rules we have been
considering suffice, i.e., whether we have forgotten some necessary rules.
To answer this question we first have to fix the meaning of a formula,
i.e., provide a semantics. This will be done by means of the tree models
introduced by Beth [1956]. Using this concept of a model we will prove
soundness and completeness.
1.3.1. Tree models. Consider a finitely branching tree of “possible
worlds”. The worlds are represented as nodes in this tree. They may
be thought of as possible states such that all nodes “above” a node k are
the ways in which k may develop in the future. The worlds are increasing;
that is, if an atomic formula Rs⃗ is true in a world k, then Rs⃗ is true in all
future worlds k′.
More formally, each tree model is based on a finitely branching tree T .
A node k over a set S is a finite sequence k = ⟨a₀, a₁, …, aₙ₋₁⟩ of elements
of S; lh(k) is the length of k. We write k ≼ k′ if k is an initial segment
of k′. A tree on S is a set of nodes closed under initial segments. A
tree T is finitely branching if every node in T has finitely many immediate
successors. A tree T is infinite if for every n ∈ N there is a node k ∈ T
such that lh(k) = n. A branch of a tree T is a linearly ordered subtree of
T , and a leaf of T is a node without successors in T . A tree T is complete
if every node in T has an immediate successor, i.e., T has no leaves.
For the proof of the completeness theorem, the complete tree over {0, 1}
(whose branches constitute Cantor space) will suffice. The nodes will be
all the finite sequences of 0’s and 1’s, and the ordering is as above. The
root is the empty sequence and k0 is the sequence k with the element 0
added at the end; similarly for k1.
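These combinatorial notions have a direct rendering; the encoding below,
with nodes as lists of bits, is ours.

    import Data.List (isPrefixOf)

    -- Nodes of the complete binary tree as finite 0-1 sequences.
    type Node = [Int]

    lh :: Node -> Int
    lh = length

    -- k `initialSegOf` k' is the ordering on nodes used throughout.
    initialSegOf :: Node -> Node -> Bool
    initialSegOf = isPrefixOf

    -- Immediate successors k0 and k1 of a node k.
    succs :: Node -> [Node]
    succs k = [k ++ [0], k ++ [1]]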
For the rest of this section, fix a countable formal language L.
Definition. Let T be a finitely branching tree. A tree model on T is a
triple T = (D, I0 , I1 ) such that
(a) D is a non-empty set;
(b) for every n-ary function symbol f (in the underlying language L), I0
assigns to f a map I0 (f) : D n → D;
(c) for every n-ary relation symbol R and every node k ∈ T, I₁(R, k) ⊆
    Dⁿ is assigned in such a way that monotonicity is preserved:

        k ≼ k′ → I₁(R, k) ⊆ I₁(R, k′).

If n = 0, then I₁(R, k) is either true or false. There is no special re-
quirement set on I₁(⊥, k). (Recall that minimal logic places no particular
constraints on falsum ⊥.) We write R^T(a⃗, k) for a⃗ ∈ I₁(R, k), and |T| to
denote the domain D.
It is obvious from the definition that any tree T can be extended to a
complete tree T̄ (i.e., without leaves), in which for every leaf k ∈ T all
sequences k0, k00, k000, . . . are added to T . For every node k0 . . . 0, we
then add I1 (R, k0 . . . 0) := I1 (R, k). In the sequel we assume that all trees
T are complete.
An assignment (or variable assignment) η in D is a map η assigning to
every variable x ∈ dom(η) a value η(x) ∈ D. Finite assignments will
be written as [x₁ := a₁, …, xₙ := aₙ] or else as [a₁/x₁, …, aₙ/xₙ], with
distinct x₁, …, xₙ. If η is an assignment in D and a ∈ D, let η_x^a be the
assignment in D mapping x to a and coinciding with η elsewhere:

    η_x^a(y) := η(y)  if y ≠ x,
    η_x^a(y) := a     if y = x.
Let a tree model T = (D, I₀, I₁) and an assignment η in D be given. We
define a homomorphic extension of η (denoted by η as well) to terms t
whose variables lie in dom(η) by

    η(c) := I₀(c),
    η(f(t₁, …, tₙ)) := I₀(f)(η(t₁), …, η(tₙ)).

Observe that the extension of η depends on T; we often write t^T[η] for
η(t).
Definition. T, k ⊩ A[η] (T forces A at node k for an assignment
η) is defined inductively. We write k ⊩ A[η] when it is clear from the
context what the underlying model T is, and ∀_{k′≽ₙk} A for ∀_{k′≽k} (lh(k′) =
lh(k) + n → A).

    k ⊩ (Rs⃗)[η]     := ∃ₙ∀_{k′≽ₙk} R^T(s⃗^T[η], k′),
    k ⊩ (A ∨ B)[η]  := ∃ₙ∀_{k′≽ₙk} (k′ ⊩ A[η] ∨ k′ ⊩ B[η]),
    k ⊩ (∃x A)[η]   := ∃ₙ∀_{k′≽ₙk} ∃_{a∈|T|} (k′ ⊩ A[η_x^a]),
    k ⊩ (A → B)[η]  := ∀_{k′≽k} (k′ ⊩ A[η] → k′ ⊩ B[η]),
    k ⊩ (A ∧ B)[η]  := k ⊩ A[η] ∧ k ⊩ B[η],
    k ⊩ (∀x A)[η]   := ∀_{a∈|T|} (k ⊩ A[η_x^a]).
Thus in the atomic, disjunctive and existential cases, the set of k whose
length is lh(k) + n acts as a “bar” in the complete tree. Note that the
implicational case is treated differently, and refers to the “unbounded
future”.
In this definition, the logical connectives →, ∧, ∨, ∀, ∃ on the left hand
side are part of the object language, whereas the same connectives on the
right hand side are to be understood in the usual sense: they belong to
the “metalanguage”. It should always be clear from the context whether
a formula is part of the object or the metalanguage.
1.3.2. Covering lemma. It is easily seen (using the definition and mono-
tonicity) that from k ⊩ A[η] and k ≼ k′ we can conclude k′ ⊩ A[η]. The
converse is true as well:

Lemma (Covering). ∀_{k′≽ₙk} (k′ ⊩ A[η]) → k ⊩ A[η].

Proof. Induction on A. We write k ⊩ A for k ⊩ A[η].

Case Rs⃗. Assume

    ∀_{k′≽ₙk} (k′ ⊩ Rs⃗),

hence by definition

    ∀_{k′≽ₙk} ∃ₘ∀_{k″≽ₘk′} R^T(s⃗^T[η], k″).

Since T is a finitely branching tree,

    ∃ₘ∀_{k′≽ₘk} R^T(s⃗^T[η], k′).

Hence k ⊩ Rs⃗.

The cases A ∨ B and ∃x A are handled similarly.

Case A → B. Let k′ ⊩ A → B for all k′ ≽ k with lh(k′) = lh(k) + n.
We show

    ∀_{l≽k} (l ⊩ A → l ⊩ B).

Let l ≽ k and l ⊩ A. We must show l ⊩ B. To this end we apply the
induction hypothesis to B and m := max(lh(k) + n, lh(l)). So assume
l′ ≽ l and lh(l′) = m. It is sufficient to show l′ ⊩ B. If lh(l′) = lh(l),
then l′ = l and we are done. If lh(l′) = lh(k) + n > lh(l), then l′ is
an extension of l as well as of k and has length lh(k) + n, and hence
l′ ⊩ A → B by assumption. Moreover, l′ ⊩ A, since l ≼ l′ and l ⊩ A. It
follows that l′ ⊩ B.

The cases A ∧ B and ∀x A are easy.
1.3.3. Soundness.
Lemma (Coincidence). Let T be a tree model, t a term, A a formula and
η, ξ assignments in |T|.
(a) If η(x) = ξ(x) for all x ∈ vars(t), then η(t) = ξ(t).
(b) If η(x) = ξ(x) for all x ∈ FV(A), then T, k ⊩ A[η] if and only if
    T, k ⊩ A[ξ].

Proof. Induction on terms and formulas.

Lemma (Substitution). Let T be a tree model, t, r(x) terms, A(x) a
formula and η an assignment in |T|. Then
(a) η(r(t)) = η_x^{η(t)}(r(x)).
(b) T, k ⊩ A(t)[η] if and only if T, k ⊩ A(x)[η_x^{η(t)}].

Proof. Induction on terms and formulas.

Theorem (Soundness). Let Γ ∪ {A} be a set of formulas such that Γ ⊢ A.
Then, if T is a tree model, k any node and η an assignment in |T|, it follows
that T, k ⊩ Γ[η] implies T, k ⊩ A[η].
Proof. Induction on derivations.

We begin with the axiom schemes ∨⁺₀, ∨⁺₁, ∨⁻, ∧⁺, ∧⁻, ∃⁺ and ∃⁻.
k′ ⊩ C[η] is abbreviated k′ ⊩ C when η is known from the context.

Case ∨⁺₀ : A → A ∨ B. We show k ⊩ A → A ∨ B. Assume for k′ ≽ k
that k′ ⊩ A. Show: k′ ⊩ A ∨ B. This follows from the definition, since
k′ ⊩ A. The case ∨⁺₁ : B → A ∨ B is symmetric.

Case ∨⁻ : A ∨ B → (A → C) → (B → C) → C. We show that
k ⊩ A ∨ B → (A → C) → (B → C) → C. Assume for k′ ≽ k that
k′ ⊩ A ∨ B, k′ ⊩ A → C and k′ ⊩ B → C (we can safely assume that k′
is the same for all three premises). Show that k′ ⊩ C. By definition, there
is an n s.t. for all k″ ≽ₙ k′, k″ ⊩ A or k″ ⊩ B. In both cases it follows
that k″ ⊩ C, since k′ ⊩ A → C and k′ ⊩ B → C. By the covering
lemma, k′ ⊩ C.

The cases ∧⁺, ∧⁻ are easy.

Case ∃⁺ : A → ∃x A. We show k ⊩ (A → ∃x A)[η]. Assume k′ ≽ k and
k′ ⊩ A[η]. We show k′ ⊩ (∃x A)[η]. Since η = η_x^{η(x)} there is an a ∈ |T|
(namely a := η(x)) such that k′ ⊩ A[η_x^a]. Hence k′ ⊩ (∃x A)[η].

Case ∃⁻ : ∃x A → ∀x (A → B) → B with x ∉ FV(B). We show
that k ⊩ (∃x A → ∀x (A → B) → B)[η]. Assume that k′ ≽ k and
k′ ⊩ (∃x A)[η] and k′ ⊩ ∀x (A → B)[η]. We show k′ ⊩ B[η]. By
definition, there is an n such that for all k″ ≽ₙ k′ we have an a ∈ |T| with
k″ ⊩ A[η_x^a]. From k′ ⊩ ∀x (A → B)[η] it follows that k″ ⊩ B[η_x^a], and
since x ∉ FV(B), from the coincidence lemma, k″ ⊩ B[η]. Then, finally,
by the covering lemma k′ ⊩ B[η].

This concludes the treatment of the axioms. We now consider the rules.
In case of the assumption rule u : A we have A ∈ Γ and the claim is
obvious.

Case →⁺. Assume k ⊩ Γ. We show k ⊩ A → B. Assume k′ ≽ k and
k′ ⊩ A. Our goal is k′ ⊩ B. We have k′ ⊩ Γ ∪ {A}. Thus, k′ ⊩ B by
induction hypothesis.

Case →⁻. Assume k ⊩ Γ. The induction hypothesis gives us k ⊩ A → B
and k ⊩ A. Hence k ⊩ B.

Case ∀⁺. Assume k ⊩ Γ[η] and x ∉ FV(Γ). We show k ⊩ (∀x A)[η],
i.e., k ⊩ A[η_x^a] for an arbitrary a ∈ |T|. We have

    k ⊩ Γ[η_x^a]   by the coincidence lemma, since x ∉ FV(Γ),
    k ⊩ A[η_x^a]   by induction hypothesis.

Case ∀⁻. Let k ⊩ Γ[η]. We show that k ⊩ A(t)[η]. This follows from

    k ⊩ (∀x A(x))[η]       by induction hypothesis,
    k ⊩ A(x)[η_x^{η(t)}]   by definition,
    k ⊩ A(t)[η]            by the substitution lemma.

This concludes the proof.
1.3.4. Counter models. With soundness at hand, it is easy to build
counter models proving that certain formulas are underivable in minimal
or intuitionistic logic. A tree model for intuitionistic logic is a tree model
T = (D, I0 , I1 ) in which I1 (⊥, k) is false for all k. This is equivalent to
saying that ⊥ is never forced:
Lemma. Given any tree model T, ⊥^T(k) is false at all nodes k if and only
if k ⊮ ⊥ for all nodes k.

Proof. Clearly if k ⊮ ⊥ then ⊥ is false at node k. Conversely, suppose
⊥^T(k′) is false at all nodes k′. We must show ∀_k (k ⊮ ⊥). Let k be given.
Then, since ⊥^T(k′) is false at all nodes k′, it is certainly false at some
k′ ≽ₙ k, for every n. This means k ⊮ ⊥ by definition.
Therefore by unravelling the implication clause in the forcing definition,
one sees that in any tree model for intuitionistic logic,

    (k ⊩ ¬A) ↔ ∀_{k′≽k} (k′ ⊮ A),
    (k ⊩ ¬¬A) ↔ ∀_{k′≽k} (k′ ⊮ ¬A)
              ↔ ∀_{k′≽k} ∃_{k″≽k′} (k″ ⊩ A).
As an example we show that ⊬ᵢ ¬¬P → P. We describe the desired
tree model by means of a diagram below. Next to every node we write all
propositions forced at that node.

                 .
                 .
           •P    •
             \  /
        •P    •
          \  /
     •P    •
       \  /
        •

This is a tree model because monotonicity clearly holds. Observe also
that I₁(⊥, k) is false at all nodes k. Hence this is an intuitionistic tree
model, and moreover ⊮ P. Using the remark above, it is easily seen
that ⊩ ¬¬P. Thus ⊮ (¬¬P → P) and hence ⊬ᵢ (¬¬P → P). The
model also shows that the Peirce formula ((P → Q) → P) → P is not
derivable in intuitionistic logic.
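This counter model can be checked mechanically. In the sketch below (our
encoding: P is true at k iff k contains a 0, matching the nodes marked
•P) the atom P is forced at k exactly if P is true at k, since the bar can
then be taken to be k itself; ¬¬P is tested through the unravelled clause
above, sampling extensions only up to a depth d, so the function is a
finite approximation (for this model depth 1 already settles everything).

    type Node = [Int]

    -- k forces the atom P iff there is a bar of P-nodes above k; as P
    -- here is monotone ("contains a 0"), this holds iff P is true at k.
    forcesP :: Node -> Bool
    forcesP = elem 0

    -- All extensions of k by at most d further bits.
    upTo :: Int -> Node -> [Node]
    upTo 0 k = [k]
    upTo d k = k : concatMap (upTo (d - 1)) [k ++ [0], k ++ [1]]

    -- k forces ~~P iff every k' >= k has some k'' >= k' forcing P
    -- (here sampled up to depth d).
    forcesNotNotP :: Int -> Node -> Bool
    forcesNotNotP d k =
      and [ or [ forcesP k'' | k'' <- upTo d k' ] | k' <- upTo d k ]

    -- forcesP [] == False, but forcesNotNotP 1 [] == True: the root
    -- forces ~~P but not P, so ~~P -> P is not forced at the root.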
As another example we show that the drinker formula ∃˜ x (Px → ∀x Px)
from 1.1.8 is intuitionistically underivable, using a quite different tree
model. In this case the underlying tree is the full binary one, i.e., its
nodes are the finite sequences k = ⟨i₀, i₁, …, iₙ₋₁⟩ of numbers 0 or 1. For
the language determined by ⊥ and a unary predicate symbol P consider
T := (D, I₀, I₁) with I₁(⊥, k) false, D := ℕ and

    I₁(P, ⟨i₀, …, iₙ₋₁⟩) := {a ∈ D | ⟨i₀, …, iₙ₋₁⟩ contains at least a zeros}.

Clearly T is an intuitionistic tree model (monotonicity is easily checked),
k ⊮ ∀x Px for every k, and ∀_{a,k} ∃_{l≽k} (l ⊩ Px[x := a]). Therefore

    ∀_{a,k} (k ⊮ (Px → ∀x Px)[x := a]),  i.e.,  ⊩ ∀x ¬(Px → ∀x Px).

Hence ⊬ᵢ ¬∀x ¬(Px → ∀x Px).
1.3.5. Completeness.
Theorem (Completeness). Let Γ ∪ {A} be a set of formulas. Then the
following propositions are equivalent.
(a) Γ ⊢ A.
(b) Γ ⊨ A, i.e., for all tree models T, nodes k and assignments η,

    T, k ⊩ Γ[η] → T, k ⊩ A[η].
Proof. Soundness already gives “(a) implies (b)”. For the other direc-
tion we employ a technique due to Harvey Friedman and construct a tree
model T (over the set T₀₁ of all finite 0–1-sequences) whose domain D
is the set of all terms of the underlying language, with the property that
Γ ⊢ B is equivalent to T, ⟨⟩ ⊩ B[id]. We can assume here that Γ and also
A are closed.

In order to define T, we will need an enumeration A₀, A₁, A₂, … of
the underlying language L (assumed countable), in which every formula
occurs infinitely often. We also fix an enumeration x₀, x₁, … of distinct
variables. Since Γ is countable it can be written Γ = ⋃ₙ Γₙ with finite sets
Γₙ such that Γₙ ⊆ Γₙ₊₁. With every node k ∈ T₀₁, we associate a finite
set Δ_k of formulas and a set V_k of variables, by induction on the length
of k.
Let Δ_⟨⟩ := ∅ and V_⟨⟩ := ∅. Take a node k such that lh(k) = n and
suppose that Δ_k, V_k are already defined. Write Δ ⊢ₙ B to mean that there
is a derivation of length ≤ n of B from Δ. We define Δ_{k0}, V_{k0} and Δ_{k1},
V_{k1} as follows:

Case 0. FV(Aₙ) ⊈ V_k. Then let

    Δ_{k0} := Δ_{k1} := Δ_k and V_{k0} := V_{k1} := V_k.

Case 1. FV(Aₙ) ⊆ V_k and Γₙ, Δ_k ⊬ₙ Aₙ. Let

    Δ_{k0} := Δ_k and Δ_{k1} := Δ_k ∪ {Aₙ},
    V_{k0} := V_{k1} := V_k.

Case 2. FV(Aₙ) ⊆ V_k and Γₙ, Δ_k ⊢ₙ Aₙ = A′ₙ ∨ A″ₙ. Let

    Δ_{k0} := Δ_k ∪ {Aₙ, A′ₙ} and Δ_{k1} := Δ_k ∪ {Aₙ, A″ₙ},
    V_{k0} := V_{k1} := V_k.

Case 3. FV(Aₙ) ⊆ V_k and Γₙ, Δ_k ⊢ₙ Aₙ = ∃x A′ₙ(x). Let

    Δ_{k0} := Δ_{k1} := Δ_k ∪ {Aₙ, A′ₙ(xᵢ)} and V_{k0} := V_{k1} := V_k ∪ {xᵢ},

where xᵢ is the first variable ∉ V_k.

Case 4. FV(Aₙ) ⊆ V_k and Γₙ, Δ_k ⊢ₙ Aₙ, with Aₙ neither a disjunction
nor an existentially quantified formula. Let

    Δ_{k0} := Δ_{k1} := Δ_k ∪ {Aₙ} and V_{k0} := V_{k1} := V_k.
Obviously FV(Δ_k) ⊆ V_k, and k ≼ k′ implies Δ_k ⊆ Δ_{k′}. Notice
also that because of ∃x (⊥ → ⊥) and the fact that this formula is repeated
infinitely often in the given enumeration, for every variable xᵢ there is an
m such that xᵢ ∈ V_k for all k with lh(k) = m.

We note that

    ∀_{k′≽ₙk} (Γ, Δ_{k′} ⊢ B) → Γ, Δ_k ⊢ B,  provided FV(B) ⊆ V_k.   (7)

It is sufficient to show that, for FV(B) ⊆ V_k,

    (Γ, Δ_{k0} ⊢ B) ∧ (Γ, Δ_{k1} ⊢ B) → (Γ, Δ_k ⊢ B).

In cases 0, 1 and 4, this is obvious. For case 2, the claim follows immedi-
ately from the axiom schema ∨⁻. In case 3, we have FV(Aₙ) ⊆ V_k and
Γₙ, Δ_k ⊢ₙ Aₙ = ∃x A′ₙ(x). Assume Γ, Δ_k ∪ {Aₙ, A′ₙ(xᵢ)} ⊢ B with xᵢ ∉ V_k,
and FV(B) ⊆ V_k. Then xᵢ ∉ FV(Δ_k ∪ {Aₙ, B}), hence Γ, Δ_k ∪ {Aₙ} ⊢ B
by ∃⁻ and therefore Γ, Δ_k ⊢ B.
Next, we show

    Γ, Δ_k ⊢ B → ∃ₙ∀_{k′≽ₙk} (B ∈ Δ_{k′}),  provided FV(B) ⊆ V_k.   (8)

Choose n ≥ lh(k) such that B = Aₙ and Γₙ, Δ_k ⊢ₙ Aₙ. For all k′ ≽ k, if
lh(k′) = n + 1 then Aₙ ∈ Δ_{k′} (cf. the cases 2–4).

Using the sets Δ_k we can define a tree model T as (Ter, I₀, I₁) where Ter
denotes the set of terms of the underlying language, I₀(f)(s⃗) := fs⃗ and

    R^T(s⃗, k) = I₁(R, k)(s⃗) := (Rs⃗ ∈ Δ_k).
Obviously, t^T[id] = t for all terms t.

Now write k ⊩ B for T, k ⊩ B[id]. We show:

Claim. Γ, Δ_k ⊢ B ↔ k ⊩ B, provided FV(B) ⊆ V_k.

The proof is by induction on B.

Case Rs⃗. Assume FV(Rs⃗) ⊆ V_k. The following are equivalent:

    Γ, Δ_k ⊢ Rs⃗,
    ∃ₙ∀_{k′≽ₙk} (Rs⃗ ∈ Δ_{k′})     by (8) and (7),
    ∃ₙ∀_{k′≽ₙk} R^T(s⃗, k′)        by definition of T,
    k ⊩ Rs⃗                        by definition of ⊩, since t^T[id] = t.
Case B ∨ C. Assume FV(B ∨ C) ⊆ V_k. For the implication → let
Γ, Δ_k ⊢ B ∨ C. Choose an n ≥ lh(k) such that Γₙ, Δ_k ⊢ₙ Aₙ = B ∨ C.
Then, for all k′ ≽ k s.t. lh(k′) = n,

    Δ_{k′0} = Δ_{k′} ∪ {B ∨ C, B} and Δ_{k′1} = Δ_{k′} ∪ {B ∨ C, C},

and therefore by induction hypothesis

    k′0 ⊩ B and k′1 ⊩ C.

Then by definition we have k ⊩ B ∨ C. For the reverse implication ←
argue as follows.

    k ⊩ B ∨ C,
    ∃ₙ∀_{k′≽ₙk} (k′ ⊩ B ∨ k′ ⊩ C),
    ∃ₙ∀_{k′≽ₙk} ((Γ, Δ_{k′} ⊢ B) ∨ (Γ, Δ_{k′} ⊢ C))   by induction hypothesis,
    ∃ₙ∀_{k′≽ₙk} (Γ, Δ_{k′} ⊢ B ∨ C),
    Γ, Δ_k ⊢ B ∨ C                                    by (7).
Case B ∧ C. This is evident.

Case B → C. Assume FV(B → C) ⊆ V_k. For → let Γ, Δ_k ⊢ B → C.
We must show k ⊩ B → C, i.e.,

    ∀_{k′≽k} (k′ ⊩ B → k′ ⊩ C).

Let k′ ≽ k be such that k′ ⊩ B. By induction hypothesis, it follows that
Γ, Δ_{k′} ⊢ B. Hence Γ, Δ_{k′} ⊢ C follows by assumption. Then again by
induction hypothesis k′ ⊩ C.

For ← let k ⊩ B → C, i.e., ∀_{k′≽k} (k′ ⊩ B → k′ ⊩ C). We show that
Γ, Δ_k ⊢ B → C, using (7). Choose n ≥ lh(k) such that B = Aₙ. For all
k′ ≽ₘ k with m := n − lh(k) we show that Γ, Δ_{k′} ⊢ B → C.

If Γₙ, Δ_{k′} ⊢ₙ Aₙ, then k′ ⊩ B by induction hypothesis, and k′ ⊩ C by
assumption. Hence Γ, Δ_{k′} ⊢ C again by induction hypothesis and thus
Γ, Δ_{k′} ⊢ B → C.

If Γₙ, Δ_{k′} ⊬ₙ Aₙ, then by definition Δ_{k′1} = Δ_{k′} ∪ {B}. Hence
Γ, Δ_{k′1} ⊢ B, and thus k′1 ⊩ B by induction hypothesis. Now k′1 ⊩ C by
assumption, and finally Γ, Δ_{k′1} ⊢ C by induction hypothesis. From
Δ_{k′1} = Δ_{k′} ∪ {B} it follows that Γ, Δ_{k′} ⊢ B → C.
Case ∀x B(x). Assume FV(∀x B(x)) ⊆ V_k. For → let Γ, Δ_k ⊢ ∀x B(x).
Fix a term t. Then Γ, Δ_k ⊢ B(t). Choose n such that FV(B(t)) ⊆ V_{k′}
for all k′ ≽ₙ k. Then ∀_{k′≽ₙk} (Γ, Δ_{k′} ⊢ B(t)), hence ∀_{k′≽ₙk} (k′ ⊩ B(t)) by
induction hypothesis, hence k ⊩ B(t) by the covering lemma. This holds
for every term t, hence k ⊩ ∀x B(x).

For ← assume k ⊩ ∀x B(x). Pick k′ ≽ₙ k such that Aₘ = ∃x (⊥ → ⊥),
for m := lh(k) + n. Then at height m we put some xᵢ into the variable
sets: for k′ ≽ₙ k we have xᵢ ∉ V_{k′} but xᵢ ∈ V_{k′j}. Clearly k′j ⊩ B(xᵢ),
hence Γ, Δ_{k′j} ⊢ B(xᵢ) by induction hypothesis, hence (since at this height
we consider the trivial formula ∃x (⊥ → ⊥)) also Γ, Δ_{k′} ⊢ B(xᵢ). Since
xᵢ ∉ V_{k′} we obtain Γ, Δ_{k′} ⊢ ∀x B(x). This holds for all k′ ≽ₙ k, hence
Γ, Δ_k ⊢ ∀x B(x) by (7).
Case ∃x B(x). Assume FV(∃x B(x)) ⊆ V_k. For → let Γ, Δ_k ⊢ ∃x B(x).
Choose an n ≥ lh(k) such that Γₙ, Δ_k ⊢ₙ Aₙ = ∃x B(x). Then, for all
k′ ≽ k with lh(k′) = n,

    Δ_{k′0} = Δ_{k′1} = Δ_{k′} ∪ {∃x B(x), B(xᵢ)}

where xᵢ ∉ V_{k′}. Hence by induction hypothesis for B(xᵢ) (applicable
since FV(B(xᵢ)) ⊆ V_{k′j} for j = 0, 1)

    k′0 ⊩ B(xᵢ) and k′1 ⊩ B(xᵢ).

It follows by definition that k ⊩ ∃x B(x).

For ← assume k ⊩ ∃x B(x). Then ∀_{k′≽ₙk} ∃_{t∈Ter} (k′ ⊩ B(x)[id_x^t]) for
some n, hence ∀_{k′≽ₙk} ∃_{t∈Ter} (k′ ⊩ B(t)). For each of the finitely many
k′ ≽ₙ k pick an m such that ∀_{k″≽ₘk′} (FV(B(t_{k′})) ⊆ V_{k″}). Let m₀ be the
maximum of all these m. Then

    ∀_{k″≽_{m₀+n}k} ∃_{t∈Ter} ((k″ ⊩ B(t)) ∧ FV(B(t)) ⊆ V_{k″}).

The induction hypothesis for B(t) yields

    ∀_{k″≽_{m₀+n}k} ∃_{t∈Ter} (Γ, Δ_{k″} ⊢ B(t)),
    ∀_{k″≽_{m₀+n}k} (Γ, Δ_{k″} ⊢ ∃x B(x)),
    Γ, Δ_k ⊢ ∃x B(x) by (7),

and this completes the proof of the claim.
Now we can finish the proof of the completeness theorem by showing
that (b) implies (a). We apply (b) to the tree model T constructed above
from Γ, the empty node and the assignment η = id. Then T, ⟨⟩ ⊩ Γ[id]
by the claim (since each formula in Γ is derivable from Γ). Hence T, ⟨⟩ ⊩
A[id] by (b) and therefore Γ ⊢ A by the claim again.
Completeness of intuitionistic logic follows as a corollary.
Corollary. Let Γ ∪ {A} be a set of formulas. The following propositions
are equivalent.
(a) Γ ⊢_i A.
(b) Γ, Efq ⊨ A, i.e., for all tree models T for intuitionistic logic, nodes k
and assignments η,
T, k ⊩ Γ[η] → T, k ⊩ A[η]. (9)
1.4. Soundness and completeness of the classical fragment
We give a proof of completeness of classical logic which relies on the
above completeness proof for minimal logic. As far as the authors are
aware, Ulrich Berger was the first to give a proof by this method.
1.4.1. Models. We define the notion of a (classical) model (or more
accurately, L-model), and what the value of a term and the meaning of
a formula in a model should be. The latter definition is by induction on
formulas, where in the quantifier case we need a quantifier in the definition.
For the rest of this section, fix a countable formal language L; we do
not mention the dependence on L in the notation. Since we deal with
classical logic, we only consider formulas built without ∨, ∃.
Definition. A model is a triple M = (D, I_0, I_1) such that
(a) D is a non-empty set;
(b) for every n-ary function symbol f, I_0 assigns to f a map I_0(f) : D^n → D;
(c) for every n-ary relation symbol R, I_1 assigns to R an n-ary relation
on D. In case n = 0, I_1(R) is either true or false. We require that
I_1(⊥) is false.
We write |M| for the carrier set D of M and f^M, R^M for the interpretations
I_0(f), I_1(R) of the function and relation symbols. Assignments η
and their homomorphic extensions are defined as in 1.3.1. Again we write
t^M[η] for η(t).
Definition (Validity). For every model M, assignment η in |M| and
formula A such that FV(A) ⊆ dom(η) we define M |= A[η] (read: A is
valid in M under the assignment η) by induction on A.
M |= (Rs⃗)[η] := R^M(s⃗^M[η]),
M |= (A → B)[η] := ((M |= A[η]) → (M |= B[η])),
M |= (A ∧ B)[η] := ((M |= A[η]) ∧ (M |= B[η])),
M |= (∀x A)[η] := ∀_{a∈|M|} (M |= A[η^x_a]).
Since I_1(⊥) is false, we never have M |= ⊥[η].
1.4.2. Soundness of classical logic.
Lemma (Coincidence). Let M be a model, t a term, A a formula and
η, ξ assignments in |M|.
(a) If η(x) = ξ(x) for all x ∈ vars(t), then η(t) = ξ(t).
(b) If η(x) = ξ(x) for all x ∈ FV(A), then M |= A[η] if and only if
M |= A[ξ].
Proof. Induction on terms and formulas.
Lemma (Substitution). Let M be a model, t, r(x) terms, A(x) a formula
and η an assignment in |M|. Then
(a) η(r(t)) = η^x_{η(t)}(r(x)).
(b) M |= A(t)[η] if and only if M |= A(x)[η^x_{η(t)}].
Proof. Induction on terms and formulas.
A model M is called classical if ¬¬R^M(a⃗) → R^M(a⃗) for all relation
symbols R and all a⃗ ∈ |M|. We prove that every formula derivable in
classical logic is valid in an arbitrary classical model.
Theorem (Soundness of classical logic). Let Γ ∪ {A} be a set of formulas
such that Γ ⊢_c A. Then, if M is a classical model and η an assignment
in |M|, it follows that M |= Γ[η] implies M |= A[η].
Proof. Induction on derivations. We begin with the axioms in Stab
and the axiom schemes ∧⁺, ∧⁻. M |= C[η] is abbreviated M |= C when η
is known from the context.
For the stability axiom ∀_{x⃗} (¬¬Rx⃗ → Rx⃗) the claim follows from our
assumption that M is classical, i.e., ¬¬R^M(a⃗) → R^M(a⃗) for all a⃗ ∈ |M|.
The axioms ∧⁺, ∧⁻ are clearly valid.
This concludes the treatment of the axioms. We now consider the rules.
In case of the assumption rule u : A we have A ∈ Γ and the claim is
obvious.
Case →⁺. Assume M |= Γ. We show M |= (A → B). So assume in
addition M |= A. We must show M |= B. By induction hypothesis (with
Γ ∪ {A} instead of Γ) this clearly holds.
Case →⁻. Assume M |= Γ. We must show M |= B. By induction
hypothesis, M |= (A → B) and M |= A. The claim follows from the
definition of |=.
Case ∀⁺. Assume M |= Γ[η] and x ∉ FV(Γ). We show M |= (∀x A)[η],
i.e., M |= A[η^x_a] for an arbitrary a ∈ |M|. We have
M |= Γ[η^x_a] by the coincidence lemma, since x ∉ FV(Γ),
M |= A[η^x_a] by induction hypothesis.
Case ∀⁻. Let M |= Γ[η]. We show that M |= A(t)[η]. This follows
from
M |= (∀x A(x))[η] by induction hypothesis,
M |= A(x)[η^x_{η(t)}] by definition,
M |= A(t)[η] by the substitution lemma.
This concludes the proof.
1.4.3. Completeness of classical logic. We give a constructive analysis
of the completeness of classical logic by using, in the metatheory below,
constructively valid arguments only, mentioning explicitly any assump-
tions which go beyond. When dealing with the classical fragment we of
course need to restrict to classical models. The only non-constructive
principle will be the use of the axiom of dependent choice for the weak
existential quantifier
∃̃_x A(0, x) → ∀_{n,x} (A(n, x) → ∃̃_y A(n + 1, y)) → ∃̃_f ∀_n A(n, fn).
Recall that we only consider formulas without ∨, ∃.
Theorem (Completeness of classical logic). Let Γ ∪ {A} be a set of formulas.
Assume that for all classical models M and assignments η,
M |= Γ[η] → M |= A[η].
Then there must exist a derivation of A from Γ ∪ Stab.
Proof. Since “there must exist a derivation” expresses the weak existential
quantifier in the metalanguage, we need to prove a contradiction
from the assumption Γ, Stab ⊬ A.
By the completeness theorem for minimal logic, there must be a tree
model T = (Ter, I_0, I_1) on the complete binary tree T_01 and a node l_0 such
that l_0 ⊩ Γ, Stab and l_0 ⊮ A.
Call a node k consistent if k ⊮ ⊥, and stable if k ⊩ Stab. We prove
k ⊮ B → ∃̃_{k′⪰k} (k′ ⊩ ¬B ∧ k′ ⊮ ⊥) (k stable). (10)
Let k be a stable node, and B a formula (without ∨, ∃). Then Stab ⊢
¬¬B → B by the stability theorem, and therefore k ⊩ ¬¬B → B.
Hence from k ⊮ B we obtain k ⊮ ¬¬B. By definition this implies
¬∀_{k′⪰k} (k′ ⊩ ¬B → k′ ⊩ ⊥), which proves (10).
Let α be a branch in the underlying tree T_01. We define
α ⊩ A := ∃̃_{k∈α} (k ⊩ A),
α is consistent := α ⊮ ⊥,
α is stable := ∃̃_{k∈α} (k ⊩ Stab).
Note that from α ⊩ A and ⊢ A → B it follows that α ⊩ B. To see this,
consider α ⊩ A. Then k ⊩ A for some k ∈ α, since α is linearly ordered.
From ⊢ A → B it follows that k ⊩ B, i.e., α ⊩ B.
A branch α is generic (in the sense that it generates a classical model) if
it is consistent and stable, if in addition for all formulas B
(α ⊩ B) ∨̃ (α ⊩ ¬B), (11)
and if for all formulas ∀_{y⃗} B(y⃗) with B(y⃗) not a universal formula
∀_{s⃗∈Ter} (α ⊩ B(s⃗)) → α ⊩ ∀_{y⃗} B(y⃗). (12)
For a branch α, we define a classical model M_α = (Ter, I_0, I_1^α) as
I_1^α(R)(s⃗) := ∃̃_{k∈α} I_1(R, k)(s⃗) (R ≠ ⊥).
Since the weak existential quantifier ∃̃ is used in this definition, M_α is
stable.
We show that for every generic branch α and formula B (without ∨, ∃)
α ⊩ B ↔ M_α |= B. (13)
The proof is by induction on the logical complexity of B.
Case Rs⃗ with R ≠ ⊥. Then (13) holds for all α by definition.
Case ⊥. We have α ⊮ ⊥ since α is consistent, and M_α ⊭ ⊥ by definition.
Case B → C. Let α ⊩ B → C and M_α |= B. We must show that
M_α |= C. Note that α ⊩ B by induction hypothesis, hence α ⊩ C, hence
M_α |= C again by induction hypothesis. Conversely let M_α |= B → C.
Clearly (M_α |= B) ∨̃ (M_α ⊭ B). If M_α |= B, then M_α |= C. Hence
α ⊩ C by induction hypothesis and therefore α ⊩ B → C. If M_α ⊭ B
then α ⊮ B by induction hypothesis. Hence α ⊩ ¬B by (11) and therefore
α ⊩ B → C, since α is stable (and ⊢ (¬¬C → C) → ⊥ → C). [Note
that for this argument to be constructively valid one needs to observe that
the formula α ⊩ B → C is a negation, and therefore one can argue by
the case distinction based on ∨̃. This is because, with P_1 := (M_α |= B),
P_2 := (M_α ⊭ B) and Q := (α ⊩ B → C), the formula (P_1 ∨̃ P_2) → (P_1 →
Q) → (P_2 → Q) → Q is derivable in minimal logic.]
Case B ∧ C. Easy.
Case ∀_{y⃗} B(y⃗) (y⃗ not empty) where B(y⃗) is not a universal formula.
The following are equivalent.
α ⊩ ∀_{y⃗} B(y⃗),
∀_{s⃗∈Ter} (α ⊩ B(s⃗)) by (12),
∀_{s⃗∈Ter} (M_α |= B(s⃗)) by induction hypothesis,
M_α |= ∀_{y⃗} B(y⃗).
This concludes the proof of (13).
Next we show that for every consistent and stable node k there must be
a generic branch containing k:
k ⊮ ⊥ → k ⊩ Stab → ∃̃_α (α generic ∧ k ∈ α). (14)
For the proof, let A_0, A_1, . . . enumerate all formulas. We define a sequence
k = k_0 ⪯ k_1 ⪯ k_2 ⪯ . . . of consistent stable nodes by dependent choice. Let
k_0 := k. Assume that k_n is defined. We write A_n in the form ∀_{y⃗} B(y⃗)
(with y⃗ possibly empty) where B(y⃗) is not a universal formula. In case
k_n ⊩ ∀_{y⃗} B(y⃗) let k_{n+1} := k_n. Otherwise we have k_n ⊮ B(s⃗) for some s⃗,
and by (10) there must be a consistent node k′ ⪰ k_n such that k′ ⊩ ¬B(s⃗).
Let k_{n+1} := k′. Since k_n ⪯ k_{n+1}, the node k_{n+1} is stable as well.
Let α := {l | ∃_n (l ⪯ k_n)}, hence k ∈ α. We show that α is generic.
Clearly α is consistent and stable. We now prove both (11) and (12).
Let C = ∀_{y⃗} B(y⃗) (with y⃗ possibly empty) where B(y⃗) is not a universal
formula, and choose n such that C = A_n. In case k_n ⊩ ∀_{y⃗} B(y⃗) we are
done. Otherwise by construction k_{n+1} ⊩ ¬B(s⃗) for some s⃗. For (11) we
get k_{n+1} ⊩ ¬∀_{y⃗} B(y⃗) since ⊢ ∀_{y⃗} B(y⃗) → B(s⃗), and (12) follows from the
consistency of α. This concludes the proof of (14).
Now we can finalize the completeness proof. Recall that l_0 ⊩ Γ, Stab
and l_0 ⊮ A. Since l_0 ⊮ A and l_0 is stable, (10) yields a consistent node
k ⪰ l_0 such that k ⊩ ¬A. Evidently, k is stable as well. By (14) there
must be a generic branch α such that k ∈ α. Since k ⊩ ¬A it follows that
α ⊩ ¬A, hence M_α |= ¬A by (13). Moreover, α ⊩ Γ, thus M_α |= Γ by
(13). This contradicts our assumption.
1.4.4. Compactness and Löwenheim–Skolem theorems. Among the
many corollaries of the completeness theorem, the compactness and
Löwenheim–Skolem theorems stand out.
A set Γ of formulas is consistent if Γ ⊬_c ⊥, and satisfiable if there is (in the
weak sense) a classical model M and an assignment η in |M| such that
M |= Γ[η].
Corollary. Let Γ be a set of formulas.
(a) If Γ is consistent, then Γ is satisfiable.
(b) (Compactness). If each finite subset of Γ is satisfiable, Γ is satisfiable.
Proof. (a) Assume Γ ⊬_c ⊥ and, to derive a contradiction, that Γ is
not satisfiable. Then for all classical models M and assignments η,
M |= Γ[η] implies M |= ⊥[η] (vacuously), so the completeness theorem
yields Γ, Stab ⊢ ⊥, a contradiction.
(b) Otherwise by the completeness theorem there must be a derivation
of ⊥ from Γ ∪ Stab, hence also from Γ_0 ∪ Stab for some finite subset
Γ_0 ⊆ Γ. This contradicts the assumption that Γ_0 is satisfiable.
Corollary (Löwenheim–Skolem). Let Γ be a set of formulas (we as-
sume that L is countable). If Γ is satisfiable, then Γ is satisfiable in a model
with a countably infinite carrier set.
Proof. Assume that Γ is not satisfiable in a model with a countably
infinite carrier set. If Γ, Stab ⊬ ⊥, the completeness proof would produce
a model M_α of Γ with the countable carrier set Ter; hence Γ ∪ Stab ⊢ ⊥.
Therefore by the soundness theorem Γ cannot be satisfiable at all.
Of course one often wishes to incorporate equality into the formal
language. One adds the equality axioms
x=x (reflexivity),
x=y→y=x (symmetry),
x=y→y=z→x=z (transitivity),
x1 = y1 → · · · → xn = yn → f(x1 , . . . , xn ) = f(y1 , . . . , yn ),
x1 = y1 → · · · → xn = yn → R(x1 , . . . , xn ) → R(y1 , . . . , yn ).
Clearly they induce a congruence relation on any model. By “collapsing”
the domain to congruence classes any model would become a “normal”
model in which = is interpreted as identity. One thus obtains complete-
ness, compactness etc. for theories with equality and their normal models.
1.5. Tait calculus
In this section we deal with classical logic only and hence disregard the
distinction between strong and weak existential quantifiers and disjunc-
tions. In classical logic one has the de Morgan laws (¬(A∧B) ↔ ¬A∨¬B,
¬∀x A ↔ ∃x ¬A, etc.) and these allow any formula to be brought into nega-
tion normal form, i.e., built up from atoms or negated atoms by applying
∨, ∧, ∃, ∀. For such formulas Tait [1968] devised a deceptively simple calculus
with just one rule for each symbol. However, it depends crucially
on the principle that finite sets of formulas Γ, Δ etc. are derived, rather
than single formulas. The rules of Tait’s calculus are as follows where, in
order to single out a particular formula from a finite set, the convention
is that Γ, A denotes the finite set Γ ∪ {A}.
(Ax) Γ, Rt⃗, ¬Rt⃗ is an axiom,
(∨) from Γ, A_0, A_1 infer Γ, A_0 ∨ A_1,
(∧) from Γ, A_0 and Γ, A_1 infer Γ, A_0 ∧ A_1,
(∃) from Γ, A(t) infer Γ, ∃x A(x),
(∀) from Γ, A infer Γ, ∀x A,
(Cut) from Γ, C and Γ, ¬C infer Γ,
where in the axioms Rt⃗ is an atom, and in the ∀-rule x is not free in Γ.
That this is an equivalent formulation of classical logic is easy to see.
First notice that any finite set derivable as above is, when considered as a
disjunction, valid in all classical models and therefore (by completeness)
classically derivable. In the opposite direction, if Γ ⊢_c A, then ¬Γ, A is
derivable in the pure Tait calculus (where ¬Γ is the finite set consisting
of the negation normal forms of all ¬B for B ∈ Γ). We treat some
examples.
(→⁻). The →⁻-rule from assumptions Γ embeds into the Tait calculus
as follows: from ¬Γ, A → B (which is equiderivable with ¬Γ, ¬A, B) and
¬Γ, A derive ¬Γ, B by (Cut), after first weakening ¬Γ, A to ¬Γ, A, B.
(→⁺). From ¬Γ, ¬A, B one obtains ¬Γ, ¬A ∨ B and hence ¬Γ, A → B.
(∀⁻). First note that the Tait calculus easily derives A, ¬A, for any A.
From A(t), ¬A(t) derive A(t), ∃x ¬A(x) by (∃). Hence from ¬Γ, ∀x A(x)
(and some weakenings) we have ¬Γ, A(t) by (Cut).
(∀⁺) is given by the Tait (∀)-rule.
It is well known that from any derivation in the pure Tait calculus one
can eliminate the (Cut) rule. Cut elimination plays a role analogous to
normalization in natural deduction. We do not treat it here in detail be-
cause it will appear in much more detail in part 2, where cut elimination
will be the principal tool in extracting bounds for existential theorems
in a hierarchy of infinitary theories based on arithmetic. Of course nor-
malization could be used instead, but the main point behind the use of
the Tait calculus is that the natural dualities between ∃ and ∀, ∨ and ∧,
simplify the reduction processes involved and reduce the number of cases
to be considered. Briefly, one shows that the “cut rank” of any Tait proof
(i.e., the maximum height of cut formulas C appearing in it) can be suc-
cessively reduced to zero. For suppose Γ, C and Γ, ¬C are the premises
of a cut, and that both are derivable with cut rank smaller than the height
of C itself. By the duality between C and ¬C , one needs only to consider
the cases where the cut formula C is atomic, disjunctive or existential.
By induction through the derivation of Γ, C , and by inverting its dual
Γ, ¬C , one sees easily that in each case the cut may be replaced by one of
smaller rank (whose cut formula is now a subformula of C ). Repeating
this process through the entire proof thus reduces the cut rank (at the cost
of an exponential increase in its height).
1.6. Notes
Gentzen [1935] introduced natural deduction systems NJ and NK for
intuitionistic and classical logic respectively, using a tree notation as we
have done here. Before him, Jaśkowski [1934] had already given such a
formalism for classical logic, but in linear, not in tree format. However,
Gentzen’s exposition was particularly convincing and made the system
widely known and used.
We have stressed minimal logic based on implication → and universal
quantification ∀ as the possibly “purest” part of natural deduction, since
it is close to lambda calculus and hence allows for the formation of proof
terms. Disjunction ∨, conjunction ∧ and existence ∃ can then be defined
either by axioms or else by introduction and elimination rules, as in 1.1.7.
Later (in 7.1.4) we will see that they are all instances of inductively defined
predicates; this was first discovered by Martin-Löf [1971]. The elimination
rule for conjunction was first proposed by Schroeder-Heister [1984].
The first axiom system for minimal logic was given by Kolmogorov
[1925]. Johansson [1937] seems to be the first to have coined the term
“minimal logic”.
The first published proof of the existence of a normal form for arbi-
trary derivations in natural deduction is due to Prawitz [1965], though
unpublished notes of Gentzen, recently discovered by Negri and von
Plato [2008], indicate that Gentzen already had a normalization proof.
Prawitz also considered permutative and simplification conversions. The
proof presented in 1.2.2 is based on ideas of van de Pol [1995]. The so-called
SN-technique was introduced by van Raamsdonk and Severi [1995] and was
further developed and extended by Joachimski and Matthes [2003]. The
result in 1.2.5 is an adaptation of Orevkov [1979] (which in turn is based on
Statman [1978]) to natural deduction.
Tree models as used here were first introduced (for intuitionistic logic)
by Beth [1956], [1959], and are often called Beth models in the literature,
for instance in Troelstra and van Dalen [1988]. Kripke [1965] further
developed Beth models, but with variable domains, to provide semantics
both for intuitionistic and various modal logics. The completeness proof
we give for minimal logic in 1.3 is due to Friedman; a published version
appears in Troelstra and van Dalen [1988].
Tait introduced his calculus in [1968], as a convenient refinement of the
sequent calculus of Gentzen [1935]. Due to its usage of the negation nor-
mal form it is applicable only to classical logic, but then it can exploit the
∨, ∧ and ∃, ∀ dualities in order to reduce the number of cases considered
in proof analysis (see particularly part 2). The cut elimination theorem
for his sequent calculus was proved by Gentzen [1935]; for more recent
expositions see Schwichtenberg [1977], Troelstra and van Dalen [1988],
Mints [2000], Troelstra and Schwichtenberg [2000], Negri and von Plato
[2001].
Chapter 2
RECURSION THEORY
In this chapter we develop the basics of recursive function theory, or as it
is more generally known, computability theory. Its history goes back to
the seminal works of Turing, Kleene and others in the 1930s.
A computable function is one defined by a program whose operational
semantics tell an idealized computer what to do to its storage locations
as it proceeds deterministically from input to output, without any prior
restrictions on storage space or computation time. We shall be concerned
with various program styles and the relationships between them, but the
emphasis throughout this chapter and in part 2 will be on one underlying
data type, namely the natural numbers, since it is there that the most basic
foundational connections between proof theory and computation are to
be seen in their clearest light. This is not to say that computability over
more general and abstract data types is less important. Quite the con-
trary. For example, from a logical point of view, Stoltenberg-Hansen and
Tucker [1999], Tucker and Zucker [2000], [2006] and Moschovakis [1997]
give excellent presentations of a more abstract approach, and our part 3
develops a theory in higher types from a completely general standpoint.
The two best-known models of machine computation are the Turing
Machine and the (Unlimited) Register Machine of Shepherdson and
Sturgis [1963]. We base our development on the latter since it affords the
quickest route to the results we want to establish (see also Cutland [1980]).
2.1. Register machines
2.1.1. Programs. A register machine stores natural numbers in registers
denoted u, v, w, x, y, z possibly with subscripts, and it responds step by
step to a program consisting of an ordered list of basic instructions:
I_0
I_1
⋮
I_{k−1}
Each instruction has one of the following three forms whose meanings are
obvious:
Zero: x := 0,
Succ: x := x + 1,
Jump: [if x = y then In else Im ].
The instructions are obeyed in order starting with I0 except when a condi-
tional jump instruction is encountered, in which case the next instruction
will be either In or Im according as the numerical contents of registers x
and y are equal or not at that stage. The computation terminates when
it runs out of instructions, that is when the next instruction called for is
Ik . Thus if a program of length k contains a jump instruction as above
then it must satisfy the condition n, m ≤ k and Ik means “halt”. Notice
of course that some programs do not terminate, for example the following
one-liner:
[if x = x then I0 else I1 ]
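This operational semantics is easy to make concrete. The following Python sketch is our own illustration (the tuple encoding of instructions and the name run are our conventions, not part of the formal development); the step budget is only there because, as the one-liner above shows, some programs never terminate.

    # Minimal register machine simulator (illustrative sketch).
    def run(program, registers, max_steps=10**6):
        """Instructions: ('zero', x)           for  x := 0
                         ('succ', x)           for  x := x + 1
                         ('jump', x, y, n, m)  for  [if x = y then In else Im].
        Execution starts at I0 and halts when the next instruction is Ik."""
        i = 0
        for _ in range(max_steps):
            if i >= len(program):            # instruction Ik called for: halt
                return registers
            ins = program[i]
            if ins[0] == 'zero':
                registers[ins[1]] = 0; i += 1
            elif ins[0] == 'succ':
                registers[ins[1]] += 1; i += 1
            else:
                _, x, y, n, m = ins
                i = n if registers[x] == registers[y] else m
        raise RuntimeError('step budget exhausted; the program may diverge')

    # The "transfer" construct x := y from 2.1.2 below:
    transfer = [('zero', 'x'),
                ('jump', 'x', 'y', 4, 2),
                ('succ', 'x'),
                ('jump', 'x', 'x', 1, 1)]
    print(run(transfer, {'x': 0, 'y': 3}))   # {'x': 3, 'y': 3}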
2.1.2. Program constructs. We develop some shorthand for building up
standard sorts of programs.
Transfer. “x := y” is the program
x := 0
[if x = y then I4 else I2 ]
x := x + 1
[if x = x then I1 else I1 ],
which copies the contents of register y into register x.
Predecessor. The program “x := y −· 1” copies the modified predecessor
of y into x, and simultaneously copies y into z:
x := 0
z := 0
[if x = y then I8 else I3 ]
z := z + 1
[if z = y then I8 else I5 ]
z := z + 1
x := x + 1
[if z = y then I8 else I5 ].
Composition. “P ; Q” is the program obtained by concatenating pro-
gram P with program Q. However, in order to ensure that jump instruc-
tions in Q of the form “[if x = y then In else Im ]” still operate properly
within Q they need to be re-numbered by changing the addresses n, m to
k + n, k + m respectively where k is the length of program P. Thus the
effect of this program is to do P until it halts (if ever) and then do Q.
Conditional. “if x = y then P else Q fi” is the program
[if x = y then I_1 else I_{k+2}]
P
[if x = x then I_{k+2+l} else I_2]
Q
where k, l are the lengths of the programs P, Q respectively, and again
their jump instructions must be appropriately re-numbered by adding 1
to the addresses in P and k + 2 to the addresses in Q. Clearly if x = y
then program P is obeyed and the next jump instruction automatically
bypasses Q and halts. If x = y then program Q is performed.
For loop. “for i = 1 . . . x do P od” is the program
i := 0
[if x = i then I_{k+4} else I_2]
i := i + 1
P
[if x = i then I_{k+4} else I_2]
where, again, k is the length of program P and the jump instructions in
P must be appropriately re-addressed by adding 3. The intention of this
new program is that it should iterate the program P x times (do nothing
if x = 0). This requires the restriction that the register x and the “local”
counting-register i are not re-assigned new values inside P.
While loop. “while x ≠ 0 do P od” is the program
y := 0
[if x = y then I_{k+3} else I_2]
P
[if x = y then I_{k+3} else I_2]
where, again, k is the length of program P and the jump instructions in
P must be re-addressed by adding 2. This program keeps on doing P
until (if ever) the register x becomes 0; it requires the restriction that the
auxiliary register y is not re-assigned new values inside P.
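In the same illustrative Python setting as in 2.1.1, the while construct can be assembled mechanically. The helper below (the names readdress and while_nonzero are ours) carries out exactly the re-addressing by 2 described above, with a register aux playing the role of the auxiliary y.

    # Assemble "while x ≠ 0 do P od" from jump instructions (sketch).
    def readdress(body, offset):
        out = []
        for ins in body:
            if ins[0] == 'jump':
                _, x, y, n, m = ins
                out.append(('jump', x, y, n + offset, m + offset))
            else:
                out.append(ins)
        return out

    def while_nonzero(x, body):
        k = len(body)
        return ([('zero', 'aux'),
                 ('jump', x, 'aux', k + 3, 2)]
                + readdress(body, 2)
                + [('jump', x, 'aux', k + 3, 2)])

    # The body zeroes x, so the loop runs exactly once (run as in 2.1.1):
    prog = while_nonzero('x', [('succ', 'z'), ('zero', 'x')])
    print(run(prog, {'x': 5, 'z': 0, 'aux': 0}))   # x = 0, z = 1, aux = 0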
2.1.3. Register machine computable functions. A register machine pro-
gram P may have certain distinguished “input registers” and “output
registers”. It may also use other “working registers” for scratchwork and
these will initially be set to zero. We write P(x1 , . . . , xk ; y) to signify that
program P has input registers x1 , . . . , xk and one output register y, which
are distinct.
Definition. The program P(x1 , . . . , xk ; y) is said to compute the k-
ary partial function ϕ : Nk → N if, starting with any numerical values
n1 , . . . , nk in the input registers, the program terminates with the number
m in the output register if and only if ϕ(n1 , . . . , nk ) is defined with value
m. In this case, the input registers hold their original values.
A function is register machine computable if there is some program
which computes it.
Here are some examples.
Addition. “Add(x, y; z)” is the program
z := x ; for i = 1, . . . , y do z := z + 1 od
which adds the contents of registers x and y into register z.
Subtraction. “Subt(x, y; z)” is the program
z := x ; for i = 1, . . . , y do w := z −· 1 ; z := w od
which computes the modified subtraction function x −· y.
Bounded sum. If P(x_1, . . . , x_k, w; y) computes the (k + 1)-ary function
ϕ then the program Q(x_1, . . . , x_k, z; x)
x := 0 ;
for i = 1, . . . , z do w := i −· 1 ; P(x_1, . . . , x_k, w; y) ; v := x ; Add(v, y; x) od
computes the function
ψ(x_1, . . . , x_k, z) = Σ_{w<z} ϕ(x_1, . . . , x_k, w)
which will be undefined if, for some w < z, ϕ(x_1, . . . , x_k, w) is undefined.
Multiplication. Deleting “w := i −· 1 ; P” from the last example gives
a program Mult(z, y; x) which places the product of y and z into x.
Bounded product. If in the bounded sum example the instruction x :=
x + 1 is inserted immediately after x := 0, and if Add(v, y; x) is replaced
by Mult(v, y; x), then the resulting program computes the function
ψ(x_1, . . . , x_k, z) = Π_{w<z} ϕ(x_1, . . . , x_k, w).
Composition. If P_j(x_1, . . . , x_k; y_j) computes ϕ_j for each j = 1, . . . , n
and if P_0(y_1, . . . , y_n; y_0) computes ϕ_0, then the program Q(x_1, . . . , x_k; y_0)
P_1(x_1, . . . , x_k; y_1) ; . . . ; P_n(x_1, . . . , x_k; y_n) ; P_0(y_1, . . . , y_n; y_0)
computes the function
ψ(x_1, . . . , x_k) = ϕ_0(ϕ_1(x_1, . . . , x_k), . . . , ϕ_n(x_1, . . . , x_k))
which will be undefined if any of the ϕ-subterms on the right hand side is
undefined.
Unbounded minimization. If P(x_1, . . . , x_k, y; z) computes ϕ then the
program Q(x_1, . . . , x_k; z)
y := 0 ; z := 0 ; z := z + 1 ;
while z ≠ 0 do P(x_1, . . . , x_k, y; z) ; y := y + 1 od ;
z := y −· 1
computes the function
ψ(x_1, . . . , x_k) = μ_y (ϕ(x_1, . . . , x_k, y) = 0),
that is, the least number y such that ϕ(x_1, . . . , x_k, y′) is defined for every
y′ ≤ y and ϕ(x_1, . . . , x_k, y) = 0.
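Transcribed into Python (a sketch; the name mu is ours), unbounded minimization is just an unbounded search, and it is precisely here that partiality enters: the search diverges exactly when no witness exists.

    def mu(f):
        """Return the function n ↦ least y with f(n, y) = 0; loops
        forever if f has no zero (or is undefined below the first one)."""
        def search(*n):
            y = 0
            while f(*n, y) != 0:
                y += 1
            return y
        return search

    # Least y with y·y ≥ n, i.e. the rounded-up integer square root:
    isqrt_up = mu(lambda n, y: 0 if y * y >= n else 1)
    print(isqrt_up(10))   # 4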
2.2. Elementary functions
2.2.1. Definition and simple properties. The elementary functions of
Kalmár [1943] are those number-theoretic functions which can be de-
fined explicitly by compositional terms built up from variables and the
constants 0, 1 by repeated applications of addition +, modified subtrac-
tion −· , bounded sums and bounded products.
By omitting bounded products, one obtains the subelementary func-
tions.
The examples in the previous section show that all elementary functions
are computable and totally defined. Multiplication and exponentiation
are elementary since
m · n = Σ_{i<n} m and m^n = Π_{i<n} m
and hence by repeated composition all exponential polynomials are ele-
mentary.
In addition the elementary functions are closed under
Definition by cases,
f(n⃗) = g_0(n⃗) if h(n⃗) = 0, and f(n⃗) = g_1(n⃗) otherwise,
since f can be defined from g_0, g_1 and h by
f(n⃗) = g_0(n⃗) · (1 −· h(n⃗)) + g_1(n⃗) · (1 −· (1 −· h(n⃗))).
Bounded minimization,
f(n⃗, m) = μ_{k<m} (g(n⃗, k) = 0),
since f can be defined from g by
f(n⃗, m) = Σ_{i<m} (1 −· Σ_{k≤i} (1 −· g(n⃗, k))).
Note: this definition gives value m if there is no k < m such that
g(n⃗, k) = 0. It shows that not only the elementary but in fact the
subelementary functions are closed under bounded minimization. Furthermore,
we define μ_{k≤m} (g(n⃗, k) = 0) as μ_{k<m+1} (g(n⃗, k) = 0).
Lemma.
(a) For every elementary function f : N^r → N there is a number k such that
for all n⃗ = n_1, . . . , n_r,
f(n⃗) < 2_k(max(n⃗))
where 2_0(m) := m and 2_{k+1}(m) := 2^{2_k(m)}.
(b) Hence the function n ↦ 2_n(1) is not elementary.
Proof. (a) By induction on the build-up of the compositional term
defining f. The result clearly holds if f is any one of the base functions:
f(n⃗) = 0 or 1 or n_i or n_i + n_j or n_i −· n_j.
If f is defined from g by application of bounded sum or product,
f(n⃗, m) = Σ_{i<m} g(n⃗, i) or f(n⃗, m) = Π_{i<m} g(n⃗, i),
where g(n⃗, i) < 2_k(max(n⃗, i)), then we have
f(n⃗, m) ≤ (2_k(max(n⃗, m)))^m < 2_{k+2}(max(n⃗, m))
using n^n < 2_2(n) (since n^n < (2^n)^n ≤ 2^{2^n} for n > 3).
If f is defined from g_0, g_1, . . . , g_l by composition,
f(n⃗) = g_0(g_1(n⃗), . . . , g_l(n⃗)),
where for each j ≤ l we have g_j(−) < 2_{k_j}(max(−)), then with k := max_j k_j,
f(n⃗) < 2_k(2_k(max(n⃗))) = 2_{2k}(max(n⃗)),
and this completes the first part.
(b) If 2_n(1) were an elementary function of n then by (a) there would
be a positive k such that for all n,
2_n(1) < 2_k(n);
but then putting n = 2_k(1) yields 2_{2_k(1)}(1) < 2_k(2_k(1)) = 2_{2k}(1), a
contradiction (since 2_k(1) ≥ 2k).
2.2.2. Elementary relations. A relation R on N^k is said to be elementary
if its characteristic function
c_R(n⃗) = 1 if R(n⃗), and c_R(n⃗) = 0 otherwise,
is elementary. In particular, the “equality” and “less than” relations are
elementary since their characteristic functions can be defined as follows:
c_<(n, m) = 1 −· (1 −· (m −· n)), c_=(n, m) = 1 −· (c_<(n, m) + c_<(m, n)).
Furthermore if R is elementary then so is the function
f(n⃗, m) = μ_{k<m} R(n⃗, k)
since R(n⃗, k) is equivalent to 1 −· c_R(n⃗, k) = 0.
Lemma. The elementary relations are closed under applications of propositional
connectives and bounded quantifiers.
Proof. For example, the characteristic function of ¬R is
1 −· c_R(n⃗).
The characteristic function of R_0 ∧ R_1 is
c_{R_0}(n⃗) · c_{R_1}(n⃗).
The characteristic function of ∀_{i<m} R(n⃗, i) is
c_=(m, μ_{i<m} (c_R(n⃗, i) = 0)).
Examples. The above closure properties enable us to show that many
“natural” functions and relations of number theory are elementary; thus
⌊n/m⌋ = μ_{k<n} (n < (k + 1)m),
n mod m = n −· ⌊n/m⌋ · m,
Prime(n) ↔ 1 < n ∧ ¬∃_{m<n} (1 < m ∧ n mod m = 0),
p_n = μ_{m≤2^{2^n}} (Prime(m) ∧ n = Σ_{i<m} c_{Prime}(i)),
so p_0, p_1, p_2, . . . gives the enumeration of primes in increasing order. The
estimate p_n ≤ 2^{2^n} for the n-th prime p_n can be proved by induction on n:
For n = 0 this is clear, and for n ≥ 1 we obtain
p_n ≤ p_0 p_1 · · · p_{n−1} + 1 ≤ 2^{2^0} 2^{2^1} · · · 2^{2^{n−1}} + 1 = 2^{2^n − 1} + 1 < 2^{2^n}.
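As a quick illustration, the definition of p_n is directly executable (a Python sketch; is_prime and nth_prime are our names). The bound 2^{2^n} is what keeps the search bounded, while the early exit keeps the sketch fast.

    def is_prime(n):
        return 1 if n > 1 and all(n % m for m in range(2, n)) else 0

    def nth_prime(n):
        # p_n = μ_{m ≤ 2^(2^n)} (Prime(m) ∧ n = Σ_{i<m} c_Prime(i))
        for m in range(2 ** (2 ** n) + 1):
            if is_prime(m) and sum(is_prime(i) for i in range(m)) == n:
                return m

    print([nth_prime(n) for n in range(5)])   # [2, 3, 5, 7, 11]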
2.2.3. The class E.
Definition. The class E consists of those number-theoretic functions
which can be defined from the initial functions: constant 0, successor S,
projections (onto the i-th coordinate), addition +, modified subtraction
−·, multiplication · and exponentiation 2^x, by applications of composition
and bounded minimization.
The remarks above show immediately that the characteristic functions
of the equality and less than relations lie in E, and that (by the proof of
the lemma) the relations in E are closed under propositional connectives
and bounded quantifiers.
Furthermore the above examples show that all the functions in the class
E are elementary. We now prove the converse, which will be useful later.
Lemma. There are “pairing functions” π, π_1, π_2 in E with the following
properties:
(a) π maps N × N bijectively onto N,
(b) π(a, b) + b + 2 ≤ (a + b + 1)² for a + b ≥ 1, hence π(a, b) < (a + b + 1)²,
(c) π_1(c), π_2(c) ≤ c,
(d) π(π_1(c), π_2(c)) = c,
(e) π_1(π(a, b)) = a,
(f) π_2(π(a, b)) = b.
Proof. Enumerate the pairs of natural numbers as follows:
⋮
6 . . .
3 7 . . .
1 4 8 . . .
0 2 5 9 . . .
At position (0, b) we clearly have the sum of the lengths of the preceding
diagonals, and along each diagonal a + b remains constant. Let π(a, b)
be the number written at position (a, b). Then we have
π(a, b) = Σ_{i≤a+b} i + a = ½(a + b)(a + b + 1) + a.
Clearly π : N × N → N is bijective. Moreover, a, b ≤ π(a, b) and in case
π(a, b) ≠ 0 also a < π(a, b). Let
π_1(c) := μ_{x≤c} ∃_{y≤c} (π(x, y) = c),
π_2(c) := μ_{y≤c} ∃_{x≤c} (π(x, y) = c).
Then clearly π_i(c) ≤ c for i ∈ {1, 2} and
π_1(π(a, b)) = a, π_2(π(a, b)) = b, π(π_1(c), π_2(c)) = c.
π, π_1 and π_2 are in E by definition. For π(a, b) we have the estimate
π(a, b) + b + 2 ≤ (a + b + 1)² for a + b ≥ 1.
This follows with n := a + b from
½n(n + 1) + n + 2 ≤ (n + 1)² for n ≥ 1,
which is equivalent to n(n + 1) + 2(n + 1) ≤ 2((n + 1)² − 1) and hence to
(n + 2)(n + 1) ≤ 2n(n + 2), which holds for n ≥ 1.
The proof shows that π, π_1 and π_2 are in fact subelementary.
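The pairing machinery is easily transcribed (a Python sketch; pair, pi1 and pi2 are our names for π, π_1, π_2, with the bounded searches written out literally).

    def pair(a, b):                        # π(a, b) = ½(a+b)(a+b+1) + a
        return (a + b) * (a + b + 1) // 2 + a

    def pi1(c):                            # μ_{x≤c} ∃_{y≤c} (π(x, y) = c)
        return next(x for x in range(c + 1)
                    if any(pair(x, y) == c for y in range(c + 1)))

    def pi2(c):                            # μ_{y≤c} ∃_{x≤c} (π(x, y) = c)
        return next(y for y in range(c + 1)
                    if any(pair(x, y) == c for x in range(c + 1)))

    assert all(pair(pi1(c), pi2(c)) == c for c in range(100))
    assert pi1(pair(3, 4)) == 3 and pi2(pair(3, 4)) == 4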
Theorem (Gödel’s β-function). There is in E a function β with the following
property: For every sequence a_0, . . . , a_{n−1} < b of numbers less than
b we can find a number c ≤ 4 · 4^{n(b+n+1)⁴} such that β(c, i) = a_i for all i < n.
Proof. Let
π̂(x, y) := π(x, y) + 1 and π̂_i(x) := π_i(x −· 1) for i = 1, 2.
Define
a := π̂(b, n) and d := Π_{i<n} (1 + π̂(a_i, i) · a!).
From a! and d we can, for each given i < n, reconstruct the number a_i as
the unique x < b such that 1 + π̂(x, i)a! divides d. For clearly
a_i is such an x, and if some x < b were to satisfy the same condition, then
because 1 ≤ π̂(x, i) < a and the numbers 1 + ka! are relatively prime for
k ≤ a, we would have π̂(x, i) = π̂(a_j, j) for some j < n. Hence x = a_j
and i = j, thus x = a_i. Therefore
a_i = μ_{x<b} ∃_{z<d} ((1 + π̂(x, i)a!) · z = d).
We can now define Gödel’s β-function as
β(c, i) := μ_{x<π̂_1(c)} ∃_{z<π̂_2(c)} ((1 + π̂(x, i) · π̂_1(c)) · z = π̂_2(c)).
Clearly β is in E. Furthermore with c := π̂(a!, d) we see that β(c, i) = a_i.
It is then not difficult to estimate the given bound on c, using π(b, n) <
(b + n + 1)².
The above definition of β shows that it is subelementary.
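The construction runs as written, even on a machine, provided we make two harmless substitutions for feasibility: π is inverted arithmetically (via an exact integer square root) rather than by bounded search, and the bounded ∃z search is replaced by a divisibility test. This is a Python sketch under those assumptions; encode, beta, pair1 and unpair are our names.

    import math

    def pair(a, b):
        return (a + b) * (a + b + 1) // 2 + a

    def unpair(c):
        # arithmetic inverse of pair; feasible even for very large c,
        # unlike the bounded search used in the formal definition
        w = (math.isqrt(8 * c + 1) - 1) // 2
        a = c - w * (w + 1) // 2
        return a, w - a

    def pair1(x, y):                       # π̂(x, y) = π(x, y) + 1
        return pair(x, y) + 1

    def encode(a_list):
        # following the proof: a := π̂(b, n), d := Π_{i<n}(1 + π̂(a_i, i)·a!)
        b = max(a_list, default=0) + 1
        f = math.factorial(pair1(b, len(a_list)))
        d = 1
        for i, ai in enumerate(a_list):
            d *= 1 + pair1(ai, i) * f
        return pair1(f, d)                 # c := π̂(a!, d)

    def beta(c, i):
        f, d = unpair(c - 1)               # recover a! and d
        x = 0                              # a_i is the unique x < b such
        while d % (1 + pair1(x, i) * f):   # that 1 + π̂(x, i)·a! divides d
            x += 1
        return x

    seq = [3, 1, 4, 1, 5]
    c = encode(seq)
    assert [beta(c, i) for i in range(len(seq))] == seq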
2.2.4. Closure properties of E.
Theorem. The class E is closed under limited recursion. Thus if g, h, k
are given functions in E and f is defined from them according to the schema
f(m⃗, 0) = g(m⃗),
f(m⃗, n + 1) = h(n, f(m⃗, n), m⃗),
f(m⃗, n) ≤ k(m⃗, n),
then f is in E also.
Proof. Let f be defined from g, h and k in E, by limited recursion as
above. Using Gödel’s β-function as in the last theorem we can find for
any given m⃗, n a number c such that β(c, i) = f(m⃗, i) for all i ≤ n. Let
R(m⃗, n, c) be the relation
β(c, 0) = g(m⃗) ∧ ∀_{i<n} (β(c, i + 1) = h(i, β(c, i), m⃗)),
and note by the remarks above that its characteristic function is in E. It is
clear, by induction, that if R(m⃗, n, c) holds then β(c, i) = f(m⃗, i), for all
i ≤ n. Therefore we can define f explicitly by the equation
f(m⃗, n) = β(μ_c R(m⃗, n, c), n).
f will lie in E if c can be bounded by an E-function. However, the
theorem on Gödel’s β-function gives a bound 4 · 4^{(n+1)(b+n+2)⁴}, where in
this case b can be taken as the maximum of k(m⃗, i) for i ≤ n. But this
can be defined in E as k(m⃗, i_0), where i_0 = μ_{i≤n} ∀_{j≤n} (k(m⃗, j) ≤ k(m⃗, i)).
Hence c can be bounded by an E-function.
Remark. Note that it is in this proof only that the exponential function
is required, in providing a bound for c.
Corollary. E is the class of all elementary functions.
Proof. It is sufficient merely to show that E is closed under bounded
sums and bounded products. Suppose, for instance, that f is defined from
g in E by bounded summation: f(m⃗, n) = Σ_{i<n} g(m⃗, i). Then f can be
defined by limited recursion, as follows:
f(m⃗, 0) = 0,
f(m⃗, n + 1) = f(m⃗, n) + g(m⃗, n),
f(m⃗, n) ≤ n · max_{i<n} g(m⃗, i),
and the functions (including the bound) from which it is defined are in
E. Thus f is in E by the theorem. If, instead, f is defined by bounded
product, then proceed similarly.
2.2.5. Coding finite lists. Computation on lists is a practical necessity,
so because we are basing everything here on the single data type N we
must develop some means of “coding” finite lists or sequences of natural
numbers into N itself. There are various ways to do this and we shall
adopt one of the most traditional, based on the pairing functions π, π_1, π_2.
The empty sequence is coded by the number 0 and a sequence n_0, n_1,
. . . , n_{k−1} is coded by the “sequence number”
⟨n_0, n_1, . . . , n_{k−1}⟩ = π̂(. . . π̂(π̂(0, n_0), n_1), . . . , n_{k−1})
with π̂(a, b) := π(a, b) + 1, thus recursively,
⟨⟩ := 0,
⟨n_0, n_1, . . . , n_k⟩ := π̂(⟨n_0, n_1, . . . , n_{k−1}⟩, n_k).
Because of the surjectivity of π, every number a can be decoded uniquely
as a sequence number a = ⟨n_0, n_1, . . . , n_{k−1}⟩. If a is greater than zero,
hd(a) := π_2(a −· 1) is the “head” (i.e., rightmost element) and tl(a) :=
π_1(a −· 1) is the “tail” of the list. The k-th iterate of tl is denoted tl^(k),
and since tl(a) is less than or equal to a, tl^(k)(a) is elementarily definable
(by limited recursion). Thus we can define elementarily the “length” and
“decoding” functions:
lh(a) := μ_{k≤a} (tl^(k)(a) = 0),
(a)_i := hd(tl^(lh(a)−·(i+1))(a)).
Then if a = ⟨n_0, n_1, . . . , n_{k−1}⟩ it is easy to check that
lh(a) = k and (a)_i = n_i for each i < k.
Furthermore (a)_i = 0 when i ≥ lh(a). We shall write (a)_{i,j} for ((a)_i)_j
and (a)_{i,j,k} for (((a)_i)_j)_k. This elementary coding machinery will be used
at various crucial points in the following.
Note that our previous remarks show that the functions lh(·) and (a)_i
are subelementary, and so is ⟨n_0, n_1, . . . , n_{k−1}⟩ for each fixed k.
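The coding machinery, transcribed as a Python sketch (pair and unpair as in the β-function sketch above; cons, seq, hd, tl, lh, at and concat are our names, with concat anticipating the operation ∗ defined below).

    import math

    def pair(a, b): return (a + b) * (a + b + 1) // 2 + a
    def unpair(c):
        w = (math.isqrt(8 * c + 1) - 1) // 2
        a = c - w * (w + 1) // 2
        return a, w - a

    def cons(a, n): return pair(a, n) + 1      # π̂(a, n)
    def seq(*ns):                              # ⟨n0, ..., n_{k−1}⟩
        a = 0
        for n in ns:
            a = cons(a, n)
        return a

    def hd(a): return unpair(a - 1)[1]         # rightmost element
    def tl(a): return unpair(a - 1)[0]

    def lh(a):                                 # least k with tl^(k)(a) = 0
        k = 0
        while a:
            a, k = tl(a), k + 1
        return k

    def at(a, i):                              # (a)_i, = 0 for i ≥ lh(a)
        if i >= lh(a):
            return 0
        for _ in range(lh(a) - (i + 1)):
            a = tl(a)
        return hd(a)

    def concat(b, a):                          # b ∗ a
        for i in range(lh(a)):
            b = cons(b, at(a, i))
        return b

    s = seq(5, 0, 7)
    assert lh(s) == 3 and [at(s, i) for i in range(3)] == [5, 0, 7]
    assert concat(seq(5), seq(0, 7)) == s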
Lemma (Estimate for sequence numbers). Write s_k := ⟨n, . . . , n⟩ for
the sequence number with k entries n. Then
(n + 1)k ≤ s_k < (n + 1)^{2^k} for n ≥ 1.
Proof. We prove a slightly strengthened form of the second estimate:
s_k + n + 1 ≤ (n + 1)^{2^k},
by induction on k. For k = 0 the claim is clear. In the step k → k + 1 we
have
s_{k+1} + n + 1 = π(s_k, n) + n + 2
≤ (s_k + n + 1)² by the lemma in 2.2.3
≤ (n + 1)^{2^{k+1}} by induction hypothesis.
For the first estimate the base case k = 0 is clear, and in the step we have
s_{k+1} = π(s_k, n) + 1
≥ s_k + n + 1
≥ (n + 1)(k + 1) by induction hypothesis.
Concatenation of sequence numbers b ∗ a is defined thus:
b ∗ ⟨⟩ := b,
b ∗ ⟨n_0, n_1, . . . , n_k⟩ := π(b ∗ ⟨n_0, n_1, . . . , n_{k−1}⟩, n_k) + 1.
To check that this operation is also elementary, define h(b, a, i) by recursion
on i as follows:
h(b, a, 0) = b,
h(b, a, i + 1) = π(h(b, a, i), (a)_i) + 1,
and note that since
h(b, a, i) = ⟨(b)_0, . . . , (b)_{lh(b)−·1}, (a)_0, . . . , (a)_{i−·1}⟩ for i ≤ lh(a),
it follows from the estimate above that h(b, a, i) ≤ (b + a)^{2^{lh(b)+i}}. Thus h
is definable by limited recursion from elementary functions and hence is
itself elementary. Finally
b ∗ a = h(b, a, lh(a)).
Lemma. The class E is closed under limited course-of-values recursion.
Thus if h, k are given functions in E and f is defined from them according
to the schema
f(m⃗, n) = h(n, ⟨f(m⃗, 0), . . . , f(m⃗, n −· 1)⟩, m⃗),
f(m⃗, n) ≤ k(m⃗, n),
then f is in E also.
Proof. f̄(m⃗, n) := ⟨f(m⃗, 0), . . . , f(m⃗, n −· 1)⟩ is definable by
f̄(m⃗, 0) = 0,
f̄(m⃗, n + 1) = f̄(m⃗, n) ∗ ⟨h(n, f̄(m⃗, n), m⃗)⟩,
f̄(m⃗, n) ≤ (Σ_{i<n} k(m⃗, i) + 2)^{2^n},
using ⟨n, . . . , n⟩ < (n + 1)^{2^k} (with k entries). But f(m⃗, n) = (f̄(m⃗, n + 1))_n.
The next lemma gives closure of E under limited course-of-values recursion
but with parameter substitution allowed. Here we are working at
the extremity of elementary definability, but this generalized schema will
be crucially important for the elementary arithmetization of syntax which
is developed prior to Gödel’s theorems in the next chapter (particularly
in regard to the substitution function). Unfortunately this last closure
property of E is rather complicated to state, because it requires notational
details to do with iteration of parameter substitutions.
Lemma. The class E is closed under limited course-of-values recursion
with parameter substitution. Suppose g, h, k, p_i and a_i (for i ≤ l) are all
in E and let f be defined from them as follows:
f(m, 0) = g(m), and for n ≠ 0
f(m, n) = h(n, f(p_0(m, n), a_0(n)), . . . , f(p_l(m, n), a_l(n)), m),
f(m, n) ≤ k(m, n),
where a_i(n) < n when n > 0. Then f is also in E provided that the iterated
parameter function p(σ, m, n) defined below is elementarily bounded.
For any sequence σ := ⟨i_0, i_1, . . . , i_{r−1}⟩ of numbers ≤ l define n(σ) by:
n(⟨⟩) := n, n(σ ∗ ⟨i⟩) := a_i(n(σ)) if n(σ) ≠ 0 and := 0 otherwise. Then
p(σ, m, n) is given by the course-of-values recursion:
p(⟨⟩, m, n) = m,
p(σ ∗ ⟨i⟩, m, n) = p_i(p(σ, m, n), n(σ)) if n(σ) ≠ 0,
p(σ ∗ ⟨i⟩, m, n) = p(σ, m, n) if n(σ) = 0.
Proof. First note that, since p(σ, m, n) is defined by a course-of-values
recursion and, by supposition, is elementarily bounded, it is itself in E by
the last lemma. Similarly, n(σ) is elementary.
We code the computation of f(m, n) as a finitely branching tree of
height ≤ n + 1. Nodes are sequence numbers σ = ⟨i_0, i_1, . . . , i_{r−1}⟩ with
i_j ≤ l, and each such node is bounded in value by (l + 1)^{2^{n+1}}. At each
node σ is attached the value of f at the current parameter substitution
p(σ, m, n) and the current stage n(σ). Let Q(m, n, z) be the elementary
relation expressing the fact that z correctly encodes the computation tree
for f(m, n), with (z)_σ being the correct value at current node σ. Thus
Q(m, n, z) is the following condition, for all nodes σ ≤ (l + 1)^{2^{n+1}}: if
n(σ) ≠ 0 then (z)_σ = h(n(σ), (z)_{σ∗⟨0⟩}, . . . , (z)_{σ∗⟨l⟩}, p(σ, m, n)), and if n(σ) = 0
then (z)_σ = g(p(σ, m, n)). Clearly Q is an elementary relation, and if z
is the least such that Q(m, n, z) holds then f(m, n) = (z)_{⟨⟩}. Therefore f
will be elementary if z can be bounded by an elementary function. This
is now easy because z = ⟨(z)_0, (z)_1, . . . , (z)_{(l+1)^{2^{n+1}}}⟩ where each (z)_σ =
f(p(σ, m, n), n(σ)) ≤ k(p(σ, m, n), n(σ)). Therefore
z ≤ (max{k(p(σ, m, n), n(σ)) | σ ≤ (l + 1)^{2^{n+1}}} + 1)^{2^{(l+1)^{2^{n+1}}}}
and this is elementary.
2.3. Kleene’s normal form theorem
2.3.1. Program numbers. The three types of register machine instructions
I can be coded by “instruction numbers” ⌜I⌝ thus, where v_0, v_1, v_2,
. . . is a list of all variables used to denote registers:
If I is “v_j := 0” then ⌜I⌝ = ⟨0, j⟩.
If I is “v_j := v_j + 1” then ⌜I⌝ = ⟨1, j⟩.
If I is “if v_j = v_l then I_m else I_n” then ⌜I⌝ = ⟨2, j, l, m, n⟩.
Clearly, using the sequence coding and decoding apparatus above, we
can check elementarily whether or not a given number is an instruction
number.
Any register machine program P = I_0, I_1, . . . , I_{k−1} can then be coded
by a “program number” or “index” ⌜P⌝ thus:
⌜P⌝ = ⟨⌜I_0⌝, ⌜I_1⌝, . . . , ⌜I_{k−1}⌝⟩
and again (although it is tedious) we can elementarily check whether
or not a given number is indeed of the form ⌜P⌝ for some program P.
Tradition has it that e is normally reserved as a variable over putative
program numbers.
Standard program constructs such as those in 2.1 have associated
“index-constructors”, i.e., functions which, given indices of the subprograms,
produce an index for the constructed program. The point is that for
standard program constructs the associated index-constructor functions
are elementary. For example, there is an elementary index-constructor
comp such that, given programs P_0, P_1 with indices e_0, e_1, comp(e_0, e_1) is
an index of the program P_0 ; P_1. A moment’s thought should convince
the reader that the appropriate definition of comp is as follows:
comp(e_0, e_1) = e_0 ∗ ⟨r(e_0, e_1, 0), r(e_0, e_1, 1), . . . , r(e_0, e_1, lh(e_1) −· 1)⟩
where
r(e_0, e_1, i) = ⟨2, (e_1)_{i,1}, (e_1)_{i,2}, (e_1)_{i,3} + lh(e_0), (e_1)_{i,4} + lh(e_0)⟩ if (e_1)_{i,0} = 2,
r(e_0, e_1, i) = (e_1)_i otherwise,
re-addresses the jump instructions in P_1. Clearly r and hence comp are
elementary functions.
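Continuing the Python sketch (building on seq, lh, at and concat from the sequence-coding sketch in 2.2.5; inum, pnum and comp are our names), instruction and program numbers and the index-constructor comp look as follows. The numbers grow violently with program length, since each application of π̂ roughly squares its argument, but for tiny programs this is still executable.

    def inum(ins):                     # instruction number ⌜I⌝
        if ins[0] == 'zero':
            return seq(0, ins[1])
        if ins[0] == 'succ':
            return seq(1, ins[1])
        _, j, l, m, n = ins
        return seq(2, j, l, m, n)

    def pnum(prog):                    # program number ⌜P⌝
        return seq(*[inum(ins) for ins in prog])

    def comp(e0, e1):                  # index of P0 ; P1
        k = lh(e0)
        rs = []
        for i in range(lh(e1)):
            ins = at(e1, i)
            if at(ins, 0) == 2:        # re-address a jump by +lh(e0)
                ins = seq(2, at(ins, 1), at(ins, 2),
                          at(ins, 3) + k, at(ins, 4) + k)
            rs.append(ins)
        return concat(e0, seq(*rs))

    e0, e1 = pnum([('succ', 0)]), pnum([('jump', 0, 0, 1, 1)])
    assert comp(e0, e1) == pnum([('succ', 0), ('jump', 0, 0, 2, 2)])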
Definition. Henceforth, ϕ_e^(r) denotes the partial function computed by
the register machine program with program number e, operating on the
input registers v_1, . . . , v_r and with output register v_0. There is no loss of
generality here, since the variables in any program can always be renamed
so that v_1, . . . , v_r become the input registers and v_0 the output. If e is not a
program number, or it is but does not operate on the right variables, then
we adopt the convention that ϕ_e^(r)(n_1, . . . , n_r) is undefined for all inputs
n_1, . . . , n_r. An alternative notation for ϕ_e^(r)(n_1, . . . , n_r) is {e}(n_1, . . . , n_r).
This is used in chapter 5, where ϕ has a different significance.
2.3.2. Normal form.
Theorem (Kleene’s normal form). For each arity r there is an elementary
function U and an elementary relation T such that, for all e and all
inputs n_1, . . . , n_r,
(a) ϕ_e^(r)(n_1, . . . , n_r) is defined if and only if ∃_s T(e, n_1, . . . , n_r, s),
(b) ϕ_e^(r)(n_1, . . . , n_r) = U(e, n_1, . . . , n_r, μ_s T(e, n_1, . . . , n_r, s)).
Proof. A computation of a register machine program P(v_1, . . . , v_r; v_0)
on numerical inputs n⃗ = n_1, . . . , n_r proceeds deterministically, step by
step, each step corresponding to the execution of one instruction. Let e
be its program number, and let v_0, . . . , v_l be all the registers used by P,
including the “working registers”, so r ≤ l.
The “state” of the computation at step s is defined to be the sequence
number
state(e, n⃗, s) = ⟨e, i, m_0, m_1, . . . , m_l⟩
where m_0, m_1, . . . , m_l are the values stored in the registers v_0, v_1, . . . , v_l
after step s is completed, and the next instruction to be performed is the
i-th one, thus (e)_i is its instruction number.
The “state transition function” tr : N → N computes the “next state”.
So suppose that x = ⟨e, i, m_0, m_1, . . . , m_l⟩ is any putative state. Then in
what follows, e = (x)_0, i = (x)_1, and m_j = (x)_{j+2} for each j ≤ l. The
definition of tr(x) is therefore as follows:
tr(x) = ⟨e, i′, m′_0, m′_1, . . . , m′_l⟩
where
(i) If (e)_i = ⟨0, j⟩ where j ≤ l then i′ = i + 1, m′_j = 0, and all other
registers remain unchanged, i.e., m′_k = m_k for k ≠ j.
(ii) If (e)_i = ⟨1, j⟩ where j ≤ l then i′ = i + 1, m′_j = m_j + 1, and all
other registers remain unchanged.
(iii) If (e)_i = ⟨2, j_0, j_1, i_0, i_1⟩ where j_0, j_1 ≤ l and i_0, i_1 ≤ lh(e) then
i′ = i_0 or i′ = i_1 according as m_{j_0} = m_{j_1} or not, and all registers
remain unchanged, i.e., m′_j = m_j for all j ≤ l.
(iv) Otherwise, if e is not a program number, or if it refers to a register
v_k with l < k, or if lh(e) ≤ i, then tr(x) simply repeats the same
state x, so i′ = i and m′_j = m_j for every j ≤ l.
Clearly tr is an elementary function, since it is defined by elementarily
decidable cases, with (a great deal of) elementary decoding and re-coding
involved in each case.
Consequently, the “state function” state(e, n⃗, s) is also elementary because
it can be defined by iterating the transition function by limited
recursion on s as follows:
state(e, n⃗, 0) = ⟨e, 0, 0, n_1, . . . , n_r, 0, . . . , 0⟩,
state(e, n⃗, s + 1) = tr(state(e, n⃗, s)),
state(e, n⃗, s) ≤ h(e, n⃗, s),
where for the bounding function h we can take
h(e, n⃗, s) = ⟨e, e⟩ ∗ ⟨max(n⃗) + s, . . . , max(n⃗) + s⟩
with l + 1 occurrences of max(n⃗) + s. This is because the maximum value
of any register at step s cannot be greater than max(n⃗) + s. Now this
expression clearly is elementary, since ⟨m, . . . , m⟩ with i occurrences of m
is definable by a limited recursion with bound (m + i)^{2^i}, as is easily seen
by induction on i.
Now recall that if program P has program number e then computation
terminates when instruction I_{lh(e)} is encountered. Thus we can define the
“termination relation” T(e, n⃗, s), meaning “computation terminates at
step s”, by
T(e, n⃗, s) := ((state(e, n⃗, s))_1 = lh(e)).
Clearly T is elementary and
ϕ_e^(r)(n⃗) is defined ↔ ∃_s T(e, n⃗, s).
The output on termination is the value of register v_0, so if we define the
“output function” U(e, n⃗, s) by
U(e, n⃗, s) := (state(e, n⃗, s))_2
then U is also elementary and
ϕ_e^(r)(n⃗) = U(e, n⃗, μ_s T(e, n⃗, s)).
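The proof is again executable in miniature. In the Python sketch below (names ours; states are plain tuples (i, registers) rather than sequence numbers, and the decoded program stands in for its number e), tr is the transition function, T the termination relation, U the output function, and phi recovers ϕ_e^(r) as U applied at the least s satisfying T.

    def tr(prog, state):                        # one computation step
        i, regs = state
        if i >= len(prog):                      # case (iv): repeat the state
            return state
        ins, regs = prog[i], list(regs)
        if ins[0] == 'zero':
            regs[ins[1]] = 0; return (i + 1, tuple(regs))
        if ins[0] == 'succ':
            regs[ins[1]] += 1; return (i + 1, tuple(regs))
        _, j0, j1, i0, i1 = ins
        return (i0 if regs[j0] == regs[j1] else i1, tuple(regs))

    def state(prog, inputs, s, nregs):          # state(e, n⃗, s)
        st = (0, (0,) + tuple(inputs) + (0,) * (nregs - len(inputs) - 1))
        for _ in range(s):
            st = tr(prog, st)
        return st

    def T(prog, inputs, s, nregs):              # termination at step s
        return state(prog, inputs, s, nregs)[0] == len(prog)

    def U(prog, inputs, s, nregs):              # contents of v0
        return state(prog, inputs, s, nregs)[1][0]

    def phi(prog, inputs, nregs):               # U(e, n⃗, μ_s T(e, n⃗, s))
        s = 0
        while not T(prog, inputs, s, nregs):
            s += 1
        return U(prog, inputs, s, nregs)

    # A program computing v0 := v1 + 1, registers numbered 0 and 1:
    succ1 = [('zero', 0), ('jump', 0, 1, 4, 2), ('succ', 0),
             ('jump', 0, 0, 1, 1), ('succ', 0)]
    print(phi(succ1, (7,), 2))                  # 8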
2.3.3. Σ⁰₁-definable relations and μ-recursive functions. A relation R of
arity r is said to be Σ⁰₁-definable if there is an elementary relation E, say of
arity r + l, such that for all n⃗ = n_1, . . . , n_r,
R(n⃗) ↔ ∃_{k_1} . . . ∃_{k_l} E(n⃗, k_1, . . . , k_l).
A partial function ϕ is said to be Σ⁰₁-definable if its graph
{(n⃗, m) | ϕ(n⃗) is defined and = m}
is Σ⁰₁-definable.
To say that a non-empty relation R is Σ⁰₁-definable is equivalent to saying
that the set of all sequences n⃗ satisfying R can be enumerated (possibly
with repetitions) by some elementary function f : N → N. Such relations
are called elementarily enumerable. For choose any fixed sequence
a_1, . . . , a_r satisfying R and define
f(m) = ⟨(m)_1, . . . , (m)_r⟩ if E((m)_1, . . . , (m)_{r+l}), and
f(m) = ⟨a_1, . . . , a_r⟩ otherwise.
Conversely if R is elementarily enumerated by f then
R(n⃗) ↔ ∃_m (f(m) = ⟨n⃗⟩)
is a Σ⁰₁-definition of R.
The μ-recursive functions are those (partial) functions which can be defined
from the initial functions: constant 0, successor S, projections (onto
the i-th coordinate), addition +, modified subtraction −· and multiplication
·, by applications of composition and unbounded minimization.
Note that it is through unbounded minimization that partial functions
may arise.
Lemma. Every elementary function is μ-recursive.
Proof. By simply removing the bounds on μ in the lemmas in 2.2.3 one
obtains μ-recursive definitions of the pairing functions π, π_1, π_2 and of
Gödel’s β-function. Then by removing all mention of bounds from the
theorem in 2.2.4 one sees that the μ-recursive functions are closed under
(unlimited) primitive recursive definitions: f(m⃗, 0) = g(m⃗), f(m⃗, n + 1) =
h(n, f(m⃗, n), m⃗). Thus one can μ-recursively define bounded sums and
bounded products, and hence all elementary functions.
2.3.4. Computable functions.
Definition. The while programs are those programs which can be built
up from assignment statements x := 0, x := y, x := y + 1, x := y −· 1,
by conditionals, composition, for loops and while loops as in 2.1 (on
program constructs).
Theorem. The following are equivalent:
(a) ϕ is register machine computable,
(b) ϕ is Σ⁰₁-definable,
(c) ϕ is μ-recursive,
(d) ϕ is computable by a while program.
Proof. The normal form theorem shows immediately that every register
machine computable function ϕ_e^(r) is Σ⁰₁-definable since
ϕ_e^(r)(n⃗) = m ↔ ∃_s (T(e, n⃗, s) ∧ U(e, n⃗, s) = m)
and the relation T(e, n⃗, s) ∧ U(e, n⃗, s) = m is clearly elementary. If ϕ is
Σ⁰₁-definable, say
ϕ(n⃗) = m ↔ ∃_{k_1} . . . ∃_{k_l} E(n⃗, m, k_1, . . . , k_l),
then ϕ can be defined μ-recursively by
ϕ(n⃗) = (μ_m E(n⃗, (m)_0, (m)_1, . . . , (m)_l))_0,
using the fact (above) that elementary functions are μ-recursive. The
examples of computable functions in 2.1 show how the definition of
any μ-recursive function translates automatically into a while program.
Finally, 2.1 shows how to implement any while program on a register
machine.
Henceforth computable means “register machine computable” or any
of its equivalents.
Corollary. The function ϕ_e^(r)(n_1, . . . , n_r) is a computable partial function
of the r + 1 variables e, n_1, . . . , n_r.
Proof. Immediate from the normal form.
Lemma. A relation R of arity n is computable if and only if both R and
its complement N^n \ R are Σ⁰₁-definable.
Proof. We can assume that both R and N^n \ R are non-empty, and (for
simplicity) also that n = 1.
“→”. By the theorem above every computable relation is Σ⁰₁-definable,
and with R clearly its complement is computable.
“←”. Let f, g ∈ E enumerate R and N \ R, respectively. Then
h(n) := μ_i (f(i) = n ∨ g(i) = n)
is a total μ-recursive function, and R(n) ↔ f(h(n)) = n.
2.3.5. Undecidability of the halting problem. The above corollary says
that there is a single “universal” program which, given numbers e and n⃗,
computes ϕ_e^(r)(n⃗) if it is defined. However, we cannot decide in advance
whether or not it will be defined. There is no program which, given e and
n⃗, computes the total function
h(e, n⃗) = 1 if ϕ_e^(r)(n⃗) is defined, and h(e, n⃗) = 0 if ϕ_e^(r)(n⃗) is undefined.
For suppose there were such a program. Then the function
ψ(n⃗) = μ_m (h(n_1, n⃗) = 0)
would be computable, say with fixed program number e_0, and therefore
ϕ_{e_0}^(r)(n⃗) = 0 if h(n_1, n⃗) = 0, and ϕ_{e_0}^(r)(n⃗) is undefined if h(n_1, n⃗) = 1.
But then fixing n_1 = e_0 gives
ϕ_{e_0}^(r)(n⃗) defined ↔ h(e_0, n⃗) = 0 ↔ ϕ_{e_0}^(r)(n⃗) undefined,
a contradiction. Hence the relation R(e, n⃗), which holds if and only if
ϕ_e^(r)(n⃗) is defined, is not recursive. It is however Σ⁰₁-definable.
There are numerous attempts to classify total computable functions
according to the complexity of their termination proofs.
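The diagonal argument can even be written out as a Python sketch; here halts is a hypothetical name for the decider whose existence the proof refutes, so of course the sketch cannot actually be completed.

    # Hypothetical: a total decider for the halting problem (cannot exist).
    def halts(e, n):
        raise NotImplementedError          # no such program

    def psi(n):
        # the diagonal function: search forever whenever halts(n, n) = 1
        while halts(n, n) == 1:
            pass
        return 0

    # If psi had program number e0, then psi(e0) would be defined
    # iff halts(e0, e0) = 0 iff psi(e0) is undefined: a contradiction.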
2.4. Recursive definitions
2.4.1. Least fixed points of recursive definitions. By a recursive definition
of a partial function ϕ of arity r from given partial functions ψ_1, . . . , ψ_m
of fixed but unspecified arities, we mean a defining equation of the form
ϕ(n_1, . . . , n_r) = t(ψ_1, . . . , ψ_m, ϕ; n_1, . . . , n_r)
where t is any compositional term built up from the numerical variables
n⃗ = n_1, . . . , n_r and the constant 0 by repeated applications of the successor
and predecessor functions, the given functions ψ_1, . . . , ψ_m, the function
ϕ itself, and the “definition by cases” function:
dc(x, y, u, v) = u if x, y are both defined and equal,
dc(x, y, u, v) = v if x, y are both defined and unequal,
dc(x, y, u, v) undefined otherwise.
There may be many partial functions ϕ satisfying such a recursive
definition, but the one we wish to single out is the least defined one, i.e.,
the one whose defined values arise inevitably by lazy evaluation of the
term t “from the outside in”, making only those function calls which are
absolutely necessary. This presupposes that each of the functions from
which t is constructed already comes equipped with an evaluation strategy.
In particular if a subterm dc(t_1, t_2, t_3, t_4) is called then it is to be evaluated
according to the program construct
x := t_1 ; y := t_2 ; [if x = y then t_3 else t_4].
Some of the function calls demanded by the term t may be for further
values of ϕ itself, and these must be evaluated by repeated unravellings of
t (in other words by recursion).
This “least solution” ϕ will be referred to as the function defined by that
recursive definition or its least fixed point. Its existence and its computabil-
ity are guaranteed by Kleene’s recursion theorem below.
2.4.2. The principles of finite support and monotonicity, and the effective
index property. Suppose we are given any fixed partial functions
ψ_1, . . . , ψ_m and χ, of the appropriate arities, and fixed inputs n⃗. If the term
t = t(ψ_1, . . . , ψ_m, χ; n⃗) evaluates to a defined value k then the following
principles clearly hold:
Finite support principle. Only finitely many values of ψ_1, . . . , ψ_m and χ
are used in that evaluation of t.
Monotonicity principle. The same value k will be obtained no matter
how the partial functions ψ_1, . . . , ψ_m and χ are extended.
Note also that any such term t satisfies the
Effective index property. There is an elementary function f such that if
ψ_1, . . . , ψ_m and χ are computable partial functions with program numbers
e_1, . . . , e_m and e respectively, then according to the lazy evaluation strategy
just described,
t(ψ_1, . . . , ψ_m, χ; n⃗)
defines a computable function of n⃗ with program number f(e_1, . . . , e_m, e).
The proof of the effective index property is by induction over the build-up
of the term t. The base case is where t is just one of the constants 0, 1
or a variable n_j, in which case it defines either a constant function n⃗ ↦ 0
or n⃗ ↦ 1, or a projection function n⃗ ↦ n_j. Each of these is trivially
computable with a fixed program number, and it is this program number
we take as the value of f(e_1, . . . , e_m, e). Since in this case f is a constant
function, it is clearly elementary. The induction step is where t is built up
by applying one of the given functions — successor, predecessor, definition
by cases, χ, or one of the ψ_i; write ψ for whichever applies — to previously
constructed subterms t_i(ψ_1, . . . , ψ_m, χ; n⃗), i = 1, . . . , l, thus:
t = ψ(t_1, . . . , t_l).
Inductively we can assume that for each i = 1, . . . , l, t_i defines a partial
function of n⃗ = n_1, . . . , n_r which is register machine computable by some
program P_i with program number given by an already constructed elementary
function f_i = f_i(e_1, . . . , e_m, e). Therefore if ψ is computed by
a program Q with program number e′, we can put P_1, . . . , P_l and Q together
to construct a new program obeying the evaluation strategy for
t. Furthermore, by the remark on index-constructors in 2.3.1 we will
be able to compute its program number f(e_1, . . . , e_m, e) from the given
numbers f_1, . . . , f_l and e′, by some elementary function.
2.4.3. Recursion theorem.
Theorem (Kleene’s recursion theorem). For given partial functions
ψ_1, . . . , ψ_m, every recursive definition
ϕ(n⃗) = t(ψ_1, . . . , ψ_m, ϕ; n⃗)
has a least fixed point, i.e., a least defined solution, ϕ. Moreover if
ψ_1, . . . , ψ_m are computable, so is the least fixed point ϕ.
Proof. Let ψ_1, . . . , ψ_m be fixed partial functions of the appropriate
arities. Let Φ be the functional from partial functions of arity r to partial
functions of arity r defined by lazy evaluation of the term t as described
above:
Φ(χ)(n⃗) = t(ψ_1, . . . , ψ_m, χ; n⃗).
Let ϕ_0, ϕ_1, ϕ_2, . . . be the sequence of partial functions of arity r generated
by Φ thus: ϕ_0 is the completely undefined function, and ϕ_{i+1} = Φ(ϕ_i)
for each i. Then by induction on i, using the monotonicity principle
above, we see that each ϕ_i is a subfunction of ϕ_{i+1}. That is, whenever
ϕ_i(n⃗) is defined with a value k then ϕ_{i+1}(n⃗) is defined with that same
value. Since their defined values are consistent with one another we can
therefore construct the “union” ϕ of the ϕ_i’s as follows:
ϕ(n⃗) = k ↔ ∃_i (ϕ_i(n⃗) = k).
(i) This ϕ is then the required least fixed point of the recursive definition.
To see that it is a fixed point, i.e., ϕ = Φ(ϕ), first suppose ϕ(n⃗) is
defined with value k. Then by the definition of ϕ just given, there is
an i > 0 such that ϕ_i(n⃗) is defined with value k. But ϕ_i = Φ(ϕ_{i−1}),
so Φ(ϕ_{i−1})(n⃗) is defined with value k. Therefore by the monotonicity
principle for Φ, since ϕ_{i−1} is a subfunction of ϕ, Φ(ϕ)(n⃗) is defined with
value k. Hence ϕ is a subfunction of Φ(ϕ).
It remains to show the converse, that Φ(ϕ) is a subfunction of ϕ. So suppose Φ(ϕ)(n⃗) is defined with value k. Then by the finite support principle, only finitely many defined values of ϕ are called for in this evaluation. By the definition of ϕ there must be some i such that ϕi already supplies all of these required values, and so already at stage i we have Φ(ϕi)(n⃗) = ϕi+1(n⃗) defined with value k. Since ϕi+1 is a subfunction of ϕ it follows that ϕ(n⃗) is defined with value k. Hence Φ(ϕ) is a subfunction of ϕ.
To see that ϕ is the least such fixed point, suppose ϕ′ is any fixed point of Φ. Then Φ(ϕ′) = ϕ′ so by the monotonicity principle, since ϕ0 is a subfunction of ϕ′, it follows that Φ(ϕ0) = ϕ1 is a subfunction of Φ(ϕ′) = ϕ′. Then again by monotonicity, Φ(ϕ1) = ϕ2 is a subfunction of Φ(ϕ′) = ϕ′, et cetera, so that for each i, ϕi is a subfunction of ϕ′. Since ϕ is the union of the ϕi’s it follows that ϕ itself is a subfunction of ϕ′. Hence ϕ is the least fixed point of Φ.
(ii) Finally we have to show that ϕ is computable if the given functions ψ1, . . . , ψm are. For this we need the effective index property of the term t, which supplies an elementary function f such that if ψ is computable with program number e then Φ(ψ) is computable with program number f(e) = f(e1, . . . , em, e). Thus if u is any fixed program number for the completely undefined function of arity r, f(u) is a program number for ϕ1 = Φ(ϕ0), f^2(u) = f(f(u)) is a program number for ϕ2 = Φ(ϕ1), and in general f^i(u) is a program number for ϕi. Therefore in the notation of the normal form theorem,

ϕi(n⃗) = ϕ_{f^i(u)}^{(r)}(n⃗),

and by the corollary (in 2.3.4) to the normal form theorem, this is a computable function of i and n⃗, since f^i(u) is a computable function of i definable (informally) say by a for-loop of the form “for j = 1 . . . i do f od”. Therefore by the earlier equivalences, ϕi(n⃗) is a Σ01-definable function of i and n⃗, and hence so is ϕ itself because

ϕ(n⃗) = m ↔ ∃i (ϕi(n⃗) = m).

So ϕ is computable and this completes the proof.
Note. The above proof works equally well if ϕ is a vector-valued function: in other words if, instead of defining a single partial function ϕ, the recursive definition in fact defines a finite list ϕ⃗ of such functions simultaneously. For example, the individual components of the machine state of any register machine at step s are clearly defined by a simultaneous recursive definition, from zero and successor.
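The approximation sequence ϕ0, ϕ1, ϕ2, . . . can also be watched concretely. The following Python sketch is ours and purely illustrative (finite dictionaries stand in for partial functions, and Phi for the functional Φ of a sample recursive definition ϕ(n) = dc(n, 0, 1, n · ϕ(n −· 1))); it iterates Φ on the completely undefined function and reads the least fixed point off the union of the approximations:

  # Partial functions are modelled as finite dictionaries (undefined = absent key).
  # Phi is the functional of the recursive definition
  #   phi(n) = 1 if n = 0, and n * phi(n - 1) otherwise,
  # evaluated lazily: a value is defined only if the values it calls for are.
  def Phi(phi):
      new = dict(phi)
      for n in range(100):            # a finite window of arguments suffices here
          if n == 0:
              new[0] = 1
          elif n - 1 in phi:          # defined only if the recursive call is defined
              new[n] = n * phi[n - 1]
      return new

  phi = {}                            # phi_0, the completely undefined function
  for i in range(10):
      phi = Phi(phi)                  # phi_{i+1} = Phi(phi_i)
  print(phi[5])                       # the union yields the factorial: prints 120

Each pass extends the previous approximation without revising it, exactly the subfunction chain ϕi ⊆ ϕi+1 of the proof.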
2.4.4. Recursive programs and partial recursive functions. A recursive program is a finite sequence of possibly simultaneous recursive definitions:

ϕ⃗0(n1, . . . , nr0) = t0(ϕ⃗0; n1, . . . , nr0)
ϕ⃗1(n1, . . . , nr1) = t1(ϕ⃗0, ϕ⃗1; n1, . . . , nr1)
ϕ⃗2(n1, . . . , nr2) = t2(ϕ⃗0, ϕ⃗1, ϕ⃗2; n1, . . . , nr2)
  ...
ϕ⃗k(n1, . . . , nrk) = tk(ϕ⃗0, . . . , ϕ⃗k−1, ϕ⃗k; n1, . . . , nrk).
A partial function is said to be partial recursive if it is one of the func-
tions defined by some recursive program as above. A partial recursive
function which happens to be totally defined is called simply a recursive
function.
Theorem. A function is partial recursive if and only if it is computable.
Proof. The recursion theorem tells us immediately that every partial recursive function is computable. For the converse we use the equivalence of computability with μ-recursiveness already established in 2.3.4. Thus we need only show how to translate any μ-recursive definition into a recursive program:

The constant 0 function is defined by the recursive program

ϕ(n⃗) = 0

and similarly for the constant 1 function.

The addition function ϕ(m, n) = m + n is defined by the recursive program

ϕ(m, n) = dc(n, 0, m, ϕ(m, n −· 1) + 1)

and the subtraction function ϕ(m, n) = m −· n is defined similarly but with the successor function +1 replaced by the predecessor −· 1. Multiplication
is defined recursively from addition in much the same way. Note that
in each case the right hand side of the recursive definition is an allowed
term.
The composition schema is a recursive definition as it stands.
Finally, given a recursive program defining ψ, if we add to it the recursive definition

ϕ(n⃗, m) = dc(ψ(n⃗, m), 0, m, ϕ(n⃗, m + 1))

followed by

ϕ′(n⃗) = ϕ(n⃗, 0)

then the computation of ϕ′(n⃗) proceeds as follows:

ϕ′(n⃗) = ϕ(n⃗, 0)
       = ϕ(n⃗, 1)    if ψ(n⃗, 0) ≠ 0
       = ϕ(n⃗, 2)    if ψ(n⃗, 1) ≠ 0
         ...
       = ϕ(n⃗, m)    if ψ(n⃗, m − 1) ≠ 0
       = m           if ψ(n⃗, m) = 0.

Thus the recursive program for ϕ′ defines unbounded minimization:

ϕ′(n⃗) = μm (ψ(n⃗, m) = 0).
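In executable terms this translation is just a tail recursion. The following Python sketch is ours (with dc written out as a conditional and ψ passed as a parameter); it computes ϕ′(n⃗) = μm (ψ(n⃗, m) = 0):

  def mu(psi, *n):
      # phi(n, m) = dc(psi(n, m), 0, m, phi(n, m + 1)):
      # the value is m if psi(n, m) = 0, and phi(n, m + 1) otherwise.
      def phi(m):
          return m if psi(*n, m) == 0 else phi(m + 1)
      return phi(0)                   # phi'(n) = phi(n, 0)

  # Example: the least m with 10 -. m*m = 0, i.e. the integer ceiling of sqrt(10).
  print(mu(lambda n, m: max(n - m * m, 0), 10))   # prints 4

Like the recursive program itself, the sketch diverges when ψ(n⃗, m) never takes the value 0.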
2.4.5. Relativized recursion. If, in a recursive program, arbitrary (non-
recursive) function parameters g are introduced, then the functions so
defined are said to be partial recursive in g . We only consider here the
case where these function parameters are totally defined. It is notationally
convenient to regard them as being coded into a single, unary function
g. As g varies over all such functions, the program thus defines a partial
) where the index e codes a “relativized register
recursive functional Φe (g, n
machine” which computes the solution to the recursive program with the
aid of a new kind of instruction. The “oracle call” instruction acts on
a sequence number , which we imagine as supplying a finite segment
of the values of g, and provides on request (and in one machine step)
the i-th value of the sequence where i is the numerical content of some
pre-determined register. By the finite support principle we know that
any computation from a given g will only use a finite segment of its
values. Thus if = ḡ(s) := g(0), g(1), . . . , g(s − 1) and if s is large
enough, this will supply all the values of g required by the computation.
Therefore with this new kind of instruction relativized register machines
will compute relativized recursion. The normal form theorem now extends
straightforwardly to a relativized version.
Theorem (Kleene’s relativized normal form). For each arity r there is
an elementary function U (1) and an elementary relation T (1) such that, for
) of the e-th partial recursive
all e and all inputs n1 , . . . , nr , the value Φe (g, n
functional satisfies the following:
) is defined if and only if ∃s T (1) (e, n
(a) Φe (g, n , ḡ(s)),
) = U (1) (e, n
(b) Φe (g, n , ḡ(s0 )) where s0 = s T (1) (e, n
, ḡ(s))).
If f( ) is totally defined, then f is said to be “recursive
n ) = Φe (g, n
in” g. The relation “f is recursive in g and g is recursive in f” is an equi-
valence which splits all total number-theoretic functions into equivalence
classes called the “degrees of unsolvability” or “Turing degrees”; see, e.g.,
Soare [1987], Cooper [2003].
2.5. Primitive recursion and for-loops
2.5.1. Primitive recursive functions. A primitive recursive program over
N is a recursive program in which each recursive definition is of one of the
following five special kinds:
(Z) fi (n) = 0,
(S) fi (n) = n + 1,
(Ujk ) fi (n1 , . . . , nk ) = nj ,
(Crk ) fi (n1 , . . . , nk ) = fi0 ( fi1 (n1 , . . . , nk ), . . . , fir (n1 , . . . , nk ) ),
(PR) fi (n1 , . . . , nk , 0) = fi0 (n1 , . . . , nk ),
fi (n1 , . . . , nk , m + 1) = fi1 (n1 , . . . , nk , m, fi (n1 , . . . , nk , m)),
where, in (C) and (PR), i0, i1, . . . , ir < i. Recall that functions are allowed to be 0-ary, so k may be 0. Note that the two equations in the (PR) schema can easily be combined into one recursive definition using the dc and −· functions. The reason for using f rather than ϕ to denote the functions in such a program is that they are obviously totally defined (we try to maintain the convention that f, g, h, . . . denote total functions).
Definition. The primitive recursive functions are those which are defi-
nable by primitive recursive programs. The class of all primitive recursive
functions is denoted “Prim”.
Lemma (Explicit definitions). If t is a term built up from numerical con-
stants, variables n1 , . . . , nk and function symbols f1 , . . . , fm denoting previ-
ously defined primitive recursive functions, then the function f defined from
them by
f(n1 , . . . , nk ) = t(f1 , . . . , fm ; n1 , . . . , nk )
is also primitive recursive.
Proof. By induction over the generation of term t.
If t is a constant l then using the (Z), (S) and (U ) schemes
f(n1 , . . . , nk ) = (S ◦ S . . . S ◦ Z ◦ U1k ) (n1 , . . . , nk ).
If t is one of the variables nj then using the (Ujk ) schema
f(n1 , . . . , nk ) = nj .
If t is an applicative term fi (t1 , . . . , tr ) then by the (Crk ) schema
f(n1 , . . . , nk ) = fi (t1 (n1 , . . . , nk ), . . . , tr (n1 , . . . , nk )).
Lemma. Every elementary function is primitive recursive, but not conversely.

Proof. Addition f(n, m) = n + m is defined from successor by the primitive recursion

f(n, 0) = n,   f(n, m + 1) = f(n, m) + 1,

and modified subtraction f(n, m) = n −· m is defined similarly, replacing +1 by −· 1. Note that the predecessor −· 1 is definable by a trivial primitive recursion:

f(0) = 0,   f(m + 1) = m.

Bounded sum f(n⃗, m) = Σi<m g(n⃗, i) is definable from + by another primitive recursion:

f(n⃗, 0) = 0,   f(n⃗, m + 1) = f(n⃗, m) + g(n⃗, m).

Multiplication is then defined explicitly by a bounded sum, and bounded product by a further primitive recursion. The above lemma then gives closure under all explicit definitions using these principles. Hence every elementary function is primitive recursive.

We have already seen that the function n ↦ 2_n(1) is not elementary. However, it can be defined primitive recursively from the (elementary) exponential function thus:

2_0(1) = 1,   2_{n+1}(1) = 2^{2_n(1)}.
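The five schemas can also be transcribed directly as higher-order combinators. The following Python sketch is ours (the names Z, S, U, C, PR are not the book’s); it builds addition, predecessor and modified subtraction exactly along the lines of the proof above:

  Z = lambda *n: 0                                 # (Z)  constant zero
  S = lambda n: n + 1                              # (S)  successor
  U = lambda j: (lambda *n: n[j - 1])              # (U)  j-th projection
  def C(f, *gs):                                   # (C)  composition
      return lambda *n: f(*(g(*n) for g in gs))
  def PR(g, h):                                    # (PR) primitive recursion
      def f(*args):
          *n, m = args
          acc = g(*n)
          for i in range(m):                       # f(n, i+1) = h(n, i, f(n, i))
              acc = h(*n, i, acc)
          return acc
      return f

  add  = PR(U(1), C(S, U(3)))                      # f(n,0) = n, f(n,m+1) = f(n,m)+1
  pred = PR(Z, U(1))                               # f(0) = 0,  f(m+1) = m
  sub  = PR(U(1), C(pred, U(3)))                   # n -. m
  print(add(3, 4), sub(3, 5), sub(5, 3))           # prints 7 0 2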
2.5.2. Loop-programs. The loop-programs over N are built up from
• assignments x := 0, x := x + 1, x := y, x := y −· 1 using
• compositions . . . ; . . . ,
• conditionals if x = y then . . . else . . . fi, and
• for-loops for i = 1 . . . y do . . . od,
where i is not reset between do and od.
Lemma. Every primitive recursive function is computable by a loop-
program.
Proof. Composition corresponds to “;” and primitive recursion

f(n⃗, 0) = g(n⃗),   f(n⃗, m + 1) = h(n⃗, m, f(n⃗, m))

can be recast as a for-loop (with input variables x⃗, y and output variable z) thus:

z := g(x⃗) ; for i = 1 . . . y do z := h(x⃗, i − 1, z) od.
We now describe the operational semantics of loop-programs. Each loop-program P on “free variables” x⃗ = x1, . . . , xk (i.e., those not “bound” by for-loops) can be considered as a “state-transformer” function from N^k to N^k, and we write P(n⃗) to denote the output state (n′1, . . . , n′k) which results after applying program P to input (n1, . . . , nk). Note that loop-programs always terminate! The definition of P(n⃗) runs as follows, according to the form of program P:

Assignments. For example if P is “xi := xj −· 1” then

P(n1, . . . , ni, . . . , nk) = (n1, . . . , nj −· 1, . . . , nk),

the value nj −· 1 appearing in the i-th place.

Composition. If P is “Q ; R” then

P(n⃗) = (R ∘ Q)(n⃗).

Conditionals. If P is “if xi = xj then Q else R fi” then

P(n⃗) = Q(n⃗) if ni = nj,
P(n⃗) = R(n⃗) if ni ≠ nj.

For-loops. If P is “for i = 1 . . . xj do Q(i, x⃗) od” then P is defined by P(n1, . . . , nj, . . . , nk) = Q*(nj, n1, . . . , nj, . . . , nk) with Q* defined by primitive recursion on i thus:

Q*(0, n1, . . . , nj, . . . , nk) = (n1, . . . , nj, . . . , nk),
Q*(i + 1, n1, . . . , nj, . . . , nk) = Q(i + 1, Q*(i, n1, . . . , nj, . . . , nk)).

Note that the above description actually gives P as a primitive recursive function from N^k to N^k and not from N^k to N as the formal definition of primitive recursion requires. However, this is immaterial when working over N because we can work with “coded” sequences ⟨n⃗⟩ ∈ N instead of vectors (n⃗) ∈ N^k so as to define

P⟨n1, . . . , nk⟩ = ⟨n′1, . . . , n′k⟩.

The coding and decoding can all be done elementarily, so for any loop-program P the output function P(n⃗) will always be primitive recursive. We therefore have:
Theorem. The primitive recursive functions are exactly those computed
by loop-programs.
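The state-transformer semantics can be animated directly. Here is a small Python interpreter sketch of our own (the tuple encoding of programs is an arbitrary choice, not the book’s); a state is a dictionary of registers and each construct transforms it exactly as described above:

  # ('zero',x) ('succ',x) ('copy',x,y) ('pred',x,y) encode the assignments;
  # ('seq',Q,R), ('if',x,y,Q,R) and ('for',x,Q) the remaining constructs.
  def run(P, s):
      s = dict(s)
      op = P[0]
      if op == 'zero':   s[P[1]] = 0
      elif op == 'succ': s[P[1]] += 1
      elif op == 'copy': s[P[1]] = s[P[2]]
      elif op == 'pred': s[P[1]] = max(s[P[2]] - 1, 0)
      elif op == 'seq':  s = run(P[2], run(P[1], s))        # (R o Q)(state)
      elif op == 'if':   s = run(P[3] if s[P[1]] == s[P[2]] else P[4], s)
      elif op == 'for':
          for _ in range(s[P[1]]):                          # bound read on entry
              s = run(P[2], s)
      return s

  # Addition: z := x ; for i = 1 ... y do z := z + 1 od.
  add = ('seq', ('copy', 'z', 'x'), ('for', 'y', ('succ', 'z')))
  print(run(add, {'x': 3, 'y': 4, 'z': 0})['z'])            # prints 7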
2.5.3. Reduction to primitive recursion. Various somewhat more general
kinds of recursion can be transformed into ordinary primitive recursion.
Two important examples are:
Course of values recursion. A trivial example is the Fibonacci function
f(0) := 1, f(1) := 2, f(n + 2) := f(n) + f(n + 1),
which calls for several “previous” values (in this case two) in order to
compute the “next” value. This is not formally a primitive recursion,
but it could be transformed into one because it can be computed by the
for-loop (with x, y as input and output variables):
y := 1 ; z := 1 ; for i = 1 . . . x do u := y ; y := y + z ; z := u od.
Recursion with parameter substitution. This has the form

f(n, 0) = g(n),
f(n, m + 1) = h(n, m, f(p(n, m), m)).

Again this is not formally a primitive recursion as it stands, but it can be transformed into the following primitive recursive program:

(PR)  q(n, m, 0) = n,
      q(n, m, i + 1) = p(q(n, m, i), m −· (i + 1)),
(C)   g′(n, m) = g(q(n, m, m)),
(C)   h′(n, m, i, j) = h(q(n, m, m −· (i + 1)), i, j),
(PR)  f′(n, m, 0) = g′(n, m),
      f′(n, m, i + 1) = h′(n, m, i, f′(n, m, i)),
(C)   f(n, m) = f′(n, m, m).

We leave it as an exercise to check that this program defines the correct function f.
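As a partial check on the exercise, the transformation can be tested mechanically. In the following Python sketch (ours; the sample g, h, p are arbitrary choices) f is computed both directly and by the primitive recursive program, and the two agree on a grid of inputs:

  g = lambda n: n + 1                 # sample data, chosen arbitrarily
  h = lambda n, m, j: n + 2 * m + j
  p = lambda n, m: n + m
  monus = lambda a, b: max(a - b, 0)

  def f_direct(n, m):                 # f(n, m+1) = h(n, m, f(p(n, m), m))
      return g(n) if m == 0 else h(n, m - 1, f_direct(p(n, m - 1), m - 1))

  def f_program(n, m):                # the transformed primitive recursive program
      def q(n, m, i):                 # q(n, m, i+1) = p(q(n, m, i), m -. (i+1))
          for j in range(i):
              n = p(n, monus(m, j + 1))
          return n
      g1 = lambda n, m: g(q(n, m, m))
      h1 = lambda n, m, i, j: h(q(n, m, monus(m, i + 1)), i, j)
      acc = g1(n, m)                  # f'(n, m, 0) = g'(n, m)
      for i in range(m):              # f'(n, m, i+1) = h'(n, m, i, f'(n, m, i))
          acc = h1(n, m, i, acc)
      return acc                      # f(n, m) = f'(n, m, m)

  print(all(f_direct(n, m) == f_program(n, m)
            for n in range(6) for m in range(6)))           # prints True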
2.5.4. A complexity hierarchy for Prim. Given a register machine program I0, I1, . . . , Im, . . . , Ik−1 where, for example, Im is a jump instruction “if xp = xq then Ir else Is fi”, and given numerical inputs in the registers x⃗, the ensuing computation as far as step y can be performed by a single for-loop as follows, where j counts the “next instruction” to be obeyed:
j := 0 ;
for i = 1 . . . y do
if j = 0 then I0 ; j := 1 else
if j = 1 then I1 ; j := 2 else
...
if j = m then if xp = xq then j := r else j := s fi else
...
... fi ... fi fi
od.
Definition. Lk consists of all loop-programs which contain nested for-
loops with maximum depth of nesting k. Thus L0 -programs are loop-free
and Lk+1 -programs only contain for-loops of the form for i = 1 . . . y do
P od where P is an Lj-program for some j ≤ k.
Definition. A bounding function for a loop-program P is an increasing function BP : N → N (that is, BP(n) ≥ n) such that for all n ∈ N we have

BP(n) ≥ n + maxi≤n #P(i)

where #P(i) denotes the number of steps executed by P when called with input i. Note that BP(n) will also bound the size of the output for any input i ≤ n, since at most 1 can be added to any register at any step.

With each loop-program there is a naturally associated bounding function as follows:

P = assignment:                       BP(n) = n + 1,
P = if xi = xj then Q else R fi:      BP(n) = max(BQ(n), BR(n)) + 1,
P = Q ; R:                            BP(n) = BR(BQ(n)),
P = for i = 1 . . . xk do Q od:       BP(n) = BQ^n(n),

where BQ^n denotes the n-times iterate of BQ.
It is obvious that the defined BP is a bounding function when P is an assignment or a conditional. When P is a composed program P = Q ; R then, given any input i ≤ n, let s := #Q(i). Then n + s ≤ BQ(n) and so the output j of the computation of Q on i is also ≤ BQ(n). Now let s′ := #R(j). Then BR(BQ(n)) ≥ BQ(n) + s′ ≥ n + s + s′. Hence BR(BQ(n)) ≥ n + maxi≤n #P(i) and therefore BR ∘ BQ is an appropriate bounding function for P. Finally if P is a for-loop as indicated, then for any input i ≤ n the computation simply composes Q a certain number of times, say k, where k ≤ n. Therefore, by what we have just done for composition, BQ^n(n) ≥ BQ^k(n) ≥ n + #P(i). Again this justifies our choice of bounding functions for for-loops.
Definition. The sequence F0, F1, . . . , Fk, . . . of Prim functions is given by

F0(n) = n + 1,   Fk+1(n) = Fk^n(n).

Definition. For each increasing function g : N → N let Comp(g) denote the class of all total functions f : N^r → N which can be computed by register machines in such a way that on (all but finitely many) inputs n⃗, the number of steps required to compute f(n⃗) is bounded by g(max(n⃗)).

Theorem. For each k ≥ 1 we have

Lk-computable = ⋃i Comp(Fk^i)

and hence

Prim = ⋃k Comp(Fk).
Proof. The second part follows immediately from the first since for all n ≥ i, Fk^i(n) ≤ Fk^n(n) = Fk+1(n).

To prove the left-to-right containment of the first part, proceed by induction on k ≥ 0 to show that for every Lk-program P there is a fixed i such that BP ≤ Fk^i, where BP is the bounding function associated with P as above. It then follows that the function computed by P lies in Comp(BP) which is contained in Comp(Fk^i). The basis of the induction is trivial since L0-programs terminate in a constant number of steps i, so that BP(n) = n + i = F0^i(n). For the induction step the crucial case is where P is an Lk+1-program of the form for j = 1 . . . xm do Q od with Q ∈ Lk. By the induction hypothesis there is an i such that BQ ≤ Fk^i and hence, using F1(n) = 2n ≤ Fk+1(n), we have

BP(n) = BQ^n(n) ≤ Fk^{in}(n) ≤ Fk+1(in) ≤ Fk+1(2^{i−1}n) ≤ Fk+1^i(n)

as required.

For the right-to-left containment, suppose f ∈ Comp(Fk^i) for some fixed i and k. Then there is a register machine which computes f(n⃗) within Fk^i(max(n⃗)) steps. Now Fk is defined by k successive iterations (nested for-loops) starting with F0 = succ. So Fk is Lk-computable and (by composing i times) so is Fk^i. Therefore if k ≥ 1 we can compute f(n⃗) by an Lk-program:

x := max(n⃗) ; y := Fk^i(x) ; compute y steps in the computation of f

since, as we have already noted, an L1-program suffices to perform any predetermined number of steps of a register machine program. This completes the proof.
Corollary. The “Ackermann–Péter function” F : N² → N defined as

F(k, n) = Fk(n)

is not primitive recursive.
Proof. Since every loop-program has one of the Fk^i as a bounding function, it follows that every Prim function f is dominated by some Fk^i and therefore for all n ≥ max(k + 1, i) we have

f(n) < Fk^i(n) ≤ Fk^n(n) = Fk+1(n) = F(k + 1, n) ≤ F(n, n).

Thus the binary function F cannot be primitive recursive, for otherwise we could take f(n) = F(n, n) and obtain a contradiction.
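The definitions of Fk and F transcribe directly, although the values explode almost immediately. A Python sketch (ours):

  def F(k, n):
      # F_0(n) = n + 1;  F_{k+1}(n) = F_k^n(n), the n-th iterate of F_k at n.
      if k == 0:
          return n + 1
      x = n
      for _ in range(n):
          x = F(k - 1, x)
      return x

  print([F(1, n) for n in range(5)])  # F_1(n) = 2n:      [0, 2, 4, 6, 8]
  print([F(2, n) for n in range(5)])  # F_2(n) = n * 2^n: [0, 2, 8, 24, 64]
  print(F(3, 2))                      # 2048; F(3, 3) is already astronomical

The diagonal F(n, n) = Fn(n) eventually dominates every Fk^i, which is exactly why it escapes Prim.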
Corollary. The elementary functions are just those definable by L2-programs, since

Elem = ⋃i Comp(F2^i)

where F2(n) = n · 2^n.

Proof. It is very easy to see that the elementary functions (like the primitive recursive ones) form an “honest” class in the sense that every elementary function is computable within a number of steps bounded by some (other) elementary function, and hence by some iterated exponential, and hence by F2^i for some i. Conversely if f ∈ Comp(F2^i) then by the normal form theorem there is a program number e such that for all n⃗,

f(n⃗) = U(e, n⃗, μs T(e, n⃗, s))

and furthermore the number of computation steps μs T(e, n⃗, s) is bounded elementarily by F2^i(max(n⃗)). Thus the unbounded minimization is in this case replaced by an elementarily bounded minimization, and since U and T are both elementary, so therefore is f.
2.6. The arithmetical hierarchy
The goal of this section is to give a classification of the relations definable
by arithmetical formulas. We have already made a step in this direction
when we discussed the Σ01 -definable relations.
As a preparatory step we prove the substitution lemma and as its corol-
lary the fixed point lemma, also known as Kleene’s second recursion
theorem.
2.6.1. Kleene’s second recursion theorem.
Lemma (Substitution lemma). There is a binary elementary function S such that

ϕe(q+1)(m, n⃗) = ϕS(e,m)(q)(n⃗).
Proof. The details are left as an exercise; we only describe the basic
idea here. To construct S(e, m) we view e as code of a register machine
program computing a q +1-ary function ϕ. Then S(e, m) is to be a code of
a register machine program computing the q-ary function obtained from
ϕ by fixing its first argument to be m. So the program coded by S(e, m)
should work as follows. Shift all inputs one register to the right, and write
m in the first register. Then compute as prescribed by e.
Theorem (Fixed point lemma or Kleene’s second recursion theorem). Fix an arity q. Then for every e we can find an e0 such that for all n⃗ = n1, . . . , nq

ϕe0(q)(n⃗) = ϕe(q+1)(e0, n⃗).

Proof. Let ϕh(m, n⃗) = ϕe(S(m, m), n⃗) and e0 := S(h, h). Then by the substitution lemma

ϕe0(n⃗) = ϕS(h,h)(n⃗) = ϕh(h, n⃗) = ϕe(S(h, h), n⃗) = ϕe(e0, n⃗).
2.6.2. Characterization of Σ01 -definable and recursive relations. We now
give a useful characterization of the Σ01 -definable relations, which will lead
us to the arithmetical hierarchy. Let
We(q) := {n⃗ | ∃s T(e, n⃗, s)}.
The Σ01 -definable relations are also called recursively enumerable (r.e.) re-
lations.
Lemma.
(a) The We(q) enumerate for e = 0, 1, 2, . . . the q-ary Σ01 -definable relations.
(b) For fixed arity q, We(q)(n⃗) as a relation of e, n⃗ is Σ01-definable, but not recursive.
Proof. (a) If R = We(q), then R is Σ01-definable by definition. For the converse assume that R is Σ01-definable, i.e., that there is an elementary relation E, say of arity q + r, such that for all n⃗ = n1, . . . , nq

R(n⃗) ↔ ∃k1 . . . ∃kr E(n⃗, k1, . . . , kr).

Then clearly R is the domain of the partial recursive function ϕ given by the following μ-recursive definition:

ϕ(n⃗) = μm [lh(m) = r ∧ E(n⃗, (m)0, (m)1, . . . , (m)r−1)].

For ϕ = ϕe we have by the normal form theorem R(n⃗) ↔ ∃s T(e, n⃗, s).
(b) It suffices to show that We(n⃗) is not recursive. So assume it were. Then we could pick e0 such that

We0(e, n⃗) ↔ ¬We(e, n⃗);

for e = e0 we obtain a contradiction.
From the substitution lemma above we can immediately infer

We(q+1)(m, n⃗) ↔ WS(e,m)(q)(n⃗);

this fact is sometimes called the substitution lemma for Σ01-definable relations.
Note. We have already seen in 2.3.4 that a relation R is recursive if and
only if both R and its complement ¬R are Σ01 -definable.
2.6.3. Arithmetical relations. A relation R of arity q is said to be arith-
metical if there is an elementary relation E, say of arity q + r, such that
for all n⃗ = n1, . . . , nq

R(n⃗) ↔ (Q1)k1 . . . (Qr)kr E(n⃗, k1, . . . , kr) with Qi ∈ {∀, ∃}.
Note that we may assume that the quantifiers Qi are alternating, since e.g.
∀n ∀m R(n, m) ↔ ∀k R((k)0 , (k)1 ).
A relation R of arity q is said to be Σ0r-definable if there is an elementary relation E such that for all n⃗

R(n⃗) ↔ ∃k1 ∀k2 . . . Qkr E(n⃗, k1, . . . , kr)

with Q = ∀ if r is even and Q = ∃ if r is odd. Similarly, a relation R of arity q is said to be Π0r-definable if there is an elementary relation E such that for all n⃗

R(n⃗) ↔ ∀k1 ∃k2 . . . Qkr E(n⃗, k1, . . . , kr)

with Q = ∃ if r is even and Q = ∀ if r is odd. A relation R is said to be Δ0r-definable if it is Σ0r-definable as well as Π0r-definable.
A partial function ϕ is said to be arithmetical (Σ0r-definable, Π0r-definable, Δ0r-definable) if its graph {(n⃗, m) | ϕ(n⃗) is defined and = m} is. By the note above, a relation R is Δ01-definable if and only if it is recursive.
Example. Let Tot := {e | ϕe(1) is total}. Then we have
e ∈ Tot ↔ ϕe(1) is total
↔ ∀n ∃m (ϕe (n) = m)
↔ ∀n ∃m ∃s (T (e, n, s) ∧ U (e, n, s) = m).
Therefore Tot is Π02 -definable. We will show below that Tot is not Σ02 -
definable.
2.6.4. Closure properties.
Lemma. Σ0r-, Π0r- and Δ0r-definable relations are closed under conjunction, disjunction and bounded quantifiers ∃m<n, ∀m<n. The Δ0r-definable relations are closed under negation. Moreover, for r > 0 the Σ0r-definable relations are closed under the existential quantifier ∃ and the Π0r-definable relations are closed under the universal quantifier ∀.

Proof. This can be seen easily. For instance, closure under the bounded universal quantifier ∀m<n follows from

∀m<n ∃k R(n⃗, n, m, k) ↔ ∃l ∀m<n R(n⃗, n, m, (l)m).
The relative positions of the Σ0r , Π0r and Δ0r -definable relations are shown
in Figure 1.
Figure 1. The arithmetical hierarchy: Δ01 lies below Σ01 and Π01, which lie below Δ02, and so on upwards through Σ0r, Π0r and Δ0r+1.
2.6.5. Universal Σ0r+1-definable relations. We now generalize the enumeration We(1) of the unary Σ01-definable relations and construct binary universal Σ0r+1-definable relations U^0_{r+1}:

U^0_1(e, n) := ∃s T(e, n, s) (↔ n ∈ We(1)),
U^0_{r+1}(e, n) := ∃m ¬U^0_r(e, n ∗ ⟨m⟩).

For example,

U^0_2(e, n) := ∃m ∀s ¬T(e, n ∗ ⟨m⟩, s),
U^0_3(e, n) := ∃m1 ∀m2 ∃s T(e, n ∗ ⟨m1, m2⟩, s).

Clearly the relations U^0_{r+1}(e, ⟨n⃗⟩) enumerate for e = 0, 1, 2, . . . the q-ary Σ0r+1-definable relations, and their complements the q-ary Π0r+1-definable relations.
Now it easily follows that all inclusions in Figure 1 are proper. To see this, assume for example that ∃m ∀s ¬T(e, n, m, s) were Π02. Pick e0 such that

∀m ∃s T(e0, n, m, s) ↔ ∃m ∀s ¬T(n, n, m, s);

for n := e0 we obtain a contradiction. As another example, assume that

A := {2⟨e, n⟩ | ∃m ∀s ¬T(e, n, m, s)} ∪ {2⟨e, n⟩ + 1 | ∀m ∃s T(e, n, m, s)},

which is a Δ03-set, were Σ02. Then from

∀m ∃s T(e, n, m, s) ↔ 2⟨e, n⟩ + 1 ∈ A

it would follow that {(e, n) | ∀m ∃s T(e, n, m, s)} is a Σ02-definable relation, a contradiction.
2.6.6. Σ0r-complete relations. We now develop an easy method to obtain precise classifications in the arithmetical hierarchy. Since by sequence coding we can pass in an elementary way between relations R of arity q and relations R′(n) := R((n)1, . . . , (n)q) of arity 1, it is no real loss of generality if we henceforth restrict to q = 1 and only deal with sets A, B ⊆ N (i.e., unary relations). First we introduce the notion of many-
one reducibility.
Let A, B ⊆ N. B is said to be many-one reducible to A if there is a total
recursive function f such that for all n
n ∈ B ↔ f(n) ∈ A.
A set A is said to be Σ0r -complete if
1. A is Σ0r -definable, and
2. every Σ0r -definable set B is many-one reducible to A.
Lemma. If A is Σ0r -complete, then A is Σ0r -definable but not Π0r -definable.
Proof. Let A be Σ0r -complete and assume that A is Π0r -definable. Pick
a set B which is Σ0r -definable but not Π0r -definable. By Σ0r -completeness of
A the set B is many-one reducible to A via a recursive function f:
n ∈ B ↔ f(n) ∈ A.
But then B would be Π0r -definable too, contradicting the choice of B.
Remark. In the definition and the lemma above we can replace Σ0r by Π0r. This gives the notion of Π0r-completeness, and the proposition that every Π0r-complete set A is Π0r-definable but not Σ0r-definable.
Example. We have seen above that the set Tot := {e | ϕe(1) is total} is Π02-definable. We can now show that Tot is not Σ02-definable. By the lemma it suffices to prove that Tot is Π02-complete. So let B be an arbitrary Π02-definable set. Then, for some e ∈ N,

n ∈ B ↔ ∀m ∃s T(e, n, m, s).

Consider the partial recursive function

ϕe′(n, m) := U(e, n, m, μs T(e, n, m, s)).

By the substitution lemma we have

n ∈ B ↔ ∀m (ϕe′(n, m) is defined)
      ↔ ∀m (ϕS(e′,n)(m) is defined)
      ↔ ϕS(e′,n) is total
      ↔ S(e′, n) ∈ Tot.

Therefore B is many-one reducible to Tot.
2.7. The analytical hierarchy
We now generalize the arithmetical hierarchy and give a classification
of the relations definable by analytical formulas, i.e., formulas involving
number as well as function quantifiers.
2.7.1. Analytical relations. First note that the substitution lemma as
well as the fixed point lemma in 2.6.1 continue to hold if function argu-
ments are present, with the same function S in the substitution lemma. We
also extend the enumeration We(q) of the Σ01-definable relations: by 2.6.2, suitably extended to allow additional function arguments g⃗ = g1, . . . , gp, the sets

We(p,q) := {(g⃗, n⃗) | ∃s T2(e, g⃗, n⃗, s)}

enumerate for e = 0, 1, 2, . . . the (p, q)-ary Σ01-definable relations. With the same argument as in 2.6 we see that for fixed arity (p, q), We(p,q)(g⃗, n⃗) as a relation of g⃗, e, n⃗ is Σ01-definable, but not recursive. The treatment of the arithmetical hierarchy can now be extended without difficulties to (p, q)-ary relations.
Examples. (a) The set R of all recursive functions is Σ03-definable, since

R(f) ↔ ∃e ∀n ∃s [T(e, n, s) ∧ U(e, n, s) = f(n)].

(b) Let LinOrd denote the set of all functions f such that

≤f := {(n, m) | f⟨n, m⟩ = 1}

is a linear ordering of its field Mf := {n | ∃m (f⟨n, m⟩ = 1 ∨ f⟨m, n⟩ = 1)}. LinOrd is Π01-definable, since

LinOrd(f) ↔ ∀n (n ∈ Mf → f⟨n, n⟩ = 1) ∧
            ∀n,m (f⟨n, m⟩ = 1 ∧ f⟨m, n⟩ = 1 → n = m) ∧
            ∀n,m,k (f⟨n, m⟩ = 1 ∧ f⟨m, k⟩ = 1 → f⟨n, k⟩ = 1) ∧
            ∀n,m (n, m ∈ Mf → f⟨n, m⟩ = 1 ∨ f⟨m, n⟩ = 1).

Here we have written n ∈ Mf for ∃m (f⟨n, m⟩ = 1 ∨ f⟨m, n⟩ = 1).
A relation R of arity (p, q) is said to be analytical if there is an arithmetical relation P, say of arity (r + p, q), such that for all g⃗ = g1, . . . , gp and n⃗ = n1, . . . , nq

R(g⃗, n⃗) ↔ (Q1)f1 . . . (Qr)fr P(f1, . . . , fr, g⃗, n⃗) with Qi ∈ {∀, ∃}.

Note that we may assume that the quantifiers Qi are alternating, since for instance

∀f ∀g R(f, g) ↔ ∀h R((h)0, (h)1),

where (h)i(n) := (h(n))i. A relation R of arity (p, q) is said to be Σ1r-definable if there is an (r + p, q)-ary arithmetical relation P such that for all g⃗, n⃗

R(g⃗, n⃗) ↔ ∃f1 ∀f2 . . . Qfr P(f1, . . . , fr, g⃗, n⃗)

with Q = ∀ if r is even and Q = ∃ if r is odd. Similarly, a relation R of arity (p, q) is said to be Π1r-definable if there is an arithmetical relation P such that for all g⃗, n⃗

R(g⃗, n⃗) ↔ ∀f1 ∃f2 . . . Qfr P(f1, . . . , fr, g⃗, n⃗)

with Q = ∃ if r is even and Q = ∀ if r is odd. A relation R is said to be Δ1r-definable if it is Σ1r-definable as well as Π1r-definable.

A partial functional Φ is said to be analytical (Σ1r-definable, Π1r-definable, Δ1r-definable) if its graph {(g⃗, n⃗, m) | Φ(g⃗, n⃗) is defined and = m} is.
Lemma. A relation R is Σ1r-definable if and only if it can be written in the form

R(g⃗, n⃗) ↔ ∃f1 ∀f2 . . . Qfr Q′m P(f1, . . . , fr, g⃗, n⃗, m)

with an elementary relation P, where Q ∈ {∀, ∃} and Q′ := ∃ if Q = ∀, and Q′ := ∀ if Q = ∃. Similarly, a relation R is Π1r-definable if and only if it can be written in the form

R(g⃗, n⃗) ↔ ∀f1 ∃f2 . . . Qfr Q′m P(f1, . . . , fr, g⃗, n⃗, m)

with Q, Q′ as above and an elementary relation P.

Proof. Use

∀n ∃f R(f, n) ↔ ∃g ∀n R((g)n, n) with (g)n(m) := g⟨n, m⟩,
∀n R(n) ↔ ∀f R(f(0)).

For example, the prefix ∀f ∃n ∀m is transformed first into ∀f ∃n ∀g, then into ∀f ∀h ∃n, and finally into ∀g ∃n.
Example. Define

WOrd(f) := (≤f is a well-ordering of its field Mf).

Then WOrd satisfies

WOrd(f) ↔ LinOrd(f) ∧ ∀g [∀n (f⟨g(n + 1), g(n)⟩ = 1) → ∃m (g(m + 1) = g(m))].

Hence WOrd is Π11-definable.
2.7.2. Closure properties.
Lemma (Closure properties). The Σ1r-, Π1r- and Δ1r-definable relations are closed under conjunction, disjunction and numerical quantifiers ∃n, ∀n. The Δ1r-definable relations are closed under negation. Moreover, for r > 0 the Σ1r-definable relations are closed under the existential function quantifier ∃f and the Π1r-definable relations are closed under the universal function quantifier ∀f.

Proof. This can be seen easily. For instance, closure of the Σ11-definable relations under universal numerical quantifiers follows from the transformation of ∀n ∃f ∀m first into ∃g ∀n ∀m and then into ∃g ∀k.
The relative positions of the Σ1r-, Π1r- and Δ1r-definable relations are shown in Figure 2. Here

Δ0∞ := ⋃r≥1 Σ0r (= ⋃r≥1 Π0r)

is the set of all arithmetical relations, and

Δ1∞ := ⋃r≥1 Σ1r (= ⋃r≥1 Π1r)

is the set of all analytical relations.

Figure 2. The analytical hierarchy: the arithmetical hierarchy of Figure 1 up to Δ0∞, followed by Δ11, then Σ11 and Π11, then Δ12, and so on upwards to Δ1∞.
2.7.3. Universal Σ1r+1-definable relations.

Lemma (Universal relations). Among the Σ1r+1 (Π1r+1)-definable relations there is a (p, q + 1)-ary relation enumerating all (p, q)-ary Σ1r+1 (Π1r+1)-definable relations.

Proof. As an example, we prove the lemma for Σ12 and Σ11. All Σ12-definable relations are enumerated by

∃g ∀h ∃s T2(e, f⃗, n⃗, g, h, s),

and all Σ11-definable relations are enumerated by

∃g ∀s ¬T2(e, f⃗, n⃗, g, s).
Lemma. All inclusions in Figure 2 above are proper.

Proof. We postpone (to 2.9.8) the proof of Δ0∞ ⊊ Δ11. The rest of the proof is obvious from the following examples. Assume ∃g ∀h ∃s T2(e, n, g, h, s) were Π12. Pick e0 such that

∀g ∃h ∀s ¬T2(e0, n, g, h, s) ↔ ∃g ∀h ∃s T2(n, n, g, h, s);

for n := e0 we obtain a contradiction. As another example, assume that

A := {2⟨e, n⟩ | ∃g ∀h ∃s T2(e, n, g, h, s)} ∪ {2⟨e, n⟩ + 1 | ∀g ∃h ∀s ¬T2(e, n, g, h, s)},

which is a Δ13-set, were Σ12. Then from

∀g ∃h ∀s ¬T2(e, n, g, h, s) ↔ 2⟨e, n⟩ + 1 ∈ A

it would follow that {(e, n) | ∀g ∃h ∀s ¬T2(e, n, g, h, s)} is a Σ12-definable relation, a contradiction.
2.7.4. Σ1r-complete relations. A set A ⊆ N is said to be Σ1r-complete if
1. A is Σ1r-definable, and
2. every Σ1r-definable set B ⊆ N is many-one reducible to A.
Lemma. If A ⊆ N is Σ1r -complete, then A is Σ1r -definable but not Π1r -
definable.
Proof. Let A be Σ1r -complete and assume that A is Π1r -definable. Pick a
set B ⊆ N which is Σ1r -definable but not Π1r -definable. By Σ1r -completeness
of A the set B is many-one reducible to A via a recursive function f:
n ∈ B ↔ f(n) ∈ A.
But then B would be Π1r -definable too, contradicting the choice of B.
Remark. In the definition and the lemma above we can replace Σ1r by Π1r. This gives the notion of Π1r-completeness, and the proposition that every Π1r-complete set A is Π1r-definable but not Σ1r-definable.
2.8. Recursive type-2 functionals and well-foundedness
2.8.1. Computation trees. To each oracle program with index e, associate its “tree of non-past-secured sequence numbers”:

Tree(e) := {⟨n0, . . . , nl−1⟩ | ∀k<l ¬T(1)(e, n0, ⟨n1, . . . , nk−1⟩)},

called the computation tree of the given program.

We imagine the computation tree as growing downwards by extension; that is, if σ and τ are any two sequence numbers (or nodes) in the tree then τ comes below σ if and only if τ is a proper extension of σ, i.e., lh(σ) < lh(τ) and ∀i<lh(σ) ((τ)i = (σ)i). We write τ ⊃ σ to denote this. Note that if τ is in the tree and τ ⊃ σ then σ is automatically in the tree, by definition. An infinite branch of the tree is thus determined by a number n and a function g : N → N such that ∀s ¬T(1)(e, n, ḡ(s)). Therefore by the relativized normal form theorem, an infinite branch is a witness to the fact that for some n and some g, Φe(g)(n) = Φe(g, n) is not defined. To say that the tree is “well-founded” is to say that there are no infinite branches, and hence:
Theorem. Φe is total if and only if Tree(e) is well-founded.
2.8.2. Ordinal assignments; recursive ordinals. This equivalence is the
basis for a natural theory of ordinal assignments, measuring (in some
sense) the “complexity” of those oracle programs which terminate “every-
where” (on all oracles and all numerical inputs). We shall later investigate
in some detail these ordinal assignments and the ways in which they mea-
sure complexity, but to begin with we shall merely describe the hierarchy
which immediately arises. It is due to Kleene [1958], but appears there
only as a brief footnote to the first page.
Definition. If Tree(e) is well-founded we can assign to each of its nodes σ an ordinal ‖σ‖ by recursion “up the tree” as follows: if σ is a terminal node (no extension of it belongs to the tree) then ‖σ‖ = 0; otherwise

‖σ‖ = sup{‖τ‖ + 1 | τ ⊃ σ ∧ τ ∈ Tree(e)}.

Then we can assign an ordinal to the whole tree by defining ‖e‖ := ‖⟨⟩‖.
Example. The for-loop (with input variable x and output variable y)

y := 0 ; for i = 1 . . . x do y := g(y) od

computes the iteration functional It(g)(n) = g^n(0). For fixed g and n the branch through its computation tree will terminate in a node

⟨n, g(0), . . . , g^2(0), . . . , g^{n−1}(0), . . . , g^n(0), . . . , g(s − 1)⟩,

where s is the least number such that (i) ḡ(s) contains all the necessary oracle information concerning g, so s > g^{n−1}(0), and (ii) computation of the program terminates by step s.

Working down this g-branch (and remembering that g is any function at all) we see that for i < n, once the value of g^i(0) is chosen, it determines the length of the ensuing segment as far as g^{i+1}(0). The greater the value of g^i(0), the greater is the length of this segment. Therefore as we take the supremum over all branches issuing from a node

⟨n, g(0), . . . , g^2(0), . . . , g(g^{i−1}(0) − 1)⟩

the successive segments g^i(0), . . . , g^{i+1}(0) have unbounded length, depending on the value of g^i(0). So each such segment adds one more ω to the ordinal height of the tree. Since there are n − 1 such segments, the height of the subtree below node ⟨n⟩ will be ω · (n − 1). Therefore the height of the computation tree for this loop-program is supn ω · (n − 1) = ω².
Definition. An ordinal is recursive if it is the order-type of some recursive well-ordering relation ≺ ⊆ N × N. Any predecessor of a recursive ordinal is recursive and so is its successor, so the recursive ordinals form an initial segment of the countable ordinals. The least non-recursive ordinal is a limit, denoted ω1^CK, the “CK” standing for Church–Kleene.

Note that if Φe is total recursive then Tree(e) can be well-ordered by the so-called Kleene–Brouwer ordering: τ <KB σ if and only if either τ ⊃ σ or else there is an i < min(lh(τ), lh(σ)) such that ∀j<i ((τ)j = (σ)j) and (τ)i < (σ)i. This is a recursive (in fact elementary) well-ordering with order-type ≥ ‖e‖. Hence ‖e‖ is a recursive ordinal.
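The comparison itself is elementary, as a Python sketch (ours, with nodes represented as tuples) makes plain:

  def kb_less(tau, sigma):
      # tau <_KB sigma: tau properly extends sigma, or branches off to the left.
      for t, s in zip(tau, sigma):
          if t != s:
              return t < s            # the first difference decides
      return len(tau) > len(sigma)    # otherwise the deeper node comes first

  print(kb_less((1, 0), (1,)))        # True: a proper extension lies below
  print(kb_less((0, 7), (1,)))        # True: left branch precedes
  print(kb_less((1,), (1, 0)))        # False

Well-foundedness of the tree is what makes this linear order a well-order; an infinite branch would yield an infinite <KB-descending sequence.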
2.8.3. A hierarchy of total recursive functionals. Kleene’s hierarchy of total recursive functionals consists of the classes

R2(α) := {Φe | Φe total ∧ ‖e‖ < α}

where α ranges over all recursive ordinals. Thus R2(α) ⊆ R2(β) if α < β.

Theorem (Hierarchy theorem). Every total recursive functional belongs to R2(α) for some recursive ordinal α. Furthermore the hierarchy continues to expand as α increases through ω1^CK; that is, for every recursive ordinal α there is a total recursive functional F such that F ∉ R2(α).
Proof. The first part is immediate since if Φe is total it belongs to
R2 (α + 1) where α is the order-type of the Kleene–Brouwer ordering on
Tree(e).
For the second part suppose α is any fixed recursive ordinal, and let ≺α be a fixed recursive well-ordering with that order-type. We define a total recursive functional Vα(f, g, e, σ) with two unary function arguments f and g, where e ranges over indices for oracle programs and σ ranges over sequence numbers. Note first that if σ = ⟨n0, n1, . . . , nk−1⟩ is a non-terminal node in Tree(e) then for any function g : N → N the sequence number

σ ∗ ⟨g(lh(σ) − 1)⟩ := ⟨n0, n1, . . . , nk−1, g(k − 1)⟩

is also a node in Tree(e), below σ. The definition of Vα is as follows, by recursion down the g-branch of Tree(e) starting with node σ, but controlled by the well-ordering ≺α via the other function argument f:

Vα(f, g, e, σ) = Vα(f, g, e, σ ∗ ⟨g(lh(σ) − 1)⟩)   if σ ∈ Tree(e) and f(σ ∗ ⟨g(lh(σ) − 1)⟩) ≺α f(σ),
Vα(f, g, e, σ) = U(1)(e, (σ)0, (σ)1, . . . , (σ)k−1)   otherwise.

This is a recursive definition and furthermore it is always defined, since repeated application of the first clause leads to a descending sequence

· · · ≺α f(σ″) ≺α f(σ′) ≺α f(σ)

which must terminate after finitely many steps because ≺α is a well-ordering. Hence the second clause must eventually apply and the computation terminates. Therefore Vα is total recursive.

Now if Φe is any total recursive functional such that ‖e‖ < α then there will be an order-preserving map from Tree(e) into α, and hence a function fe : N → N such that whenever τ ⊃ σ in Tree(e) then fe(τ) ≺α fe(σ). For this particular e and fe it is easy to see by induction up the computation tree, and using the relativized normal form theorem, that for all g and n

Φe(g)(n) = Vα(fe, g, e, ⟨n⟩).

Consequently the total recursive functional F defined from Vα by

F(g)(n) = Vα(λx g(x + 1), g, g(0), ⟨n⟩) + 1

cannot lie in R2(α). For if it did there would be an e and fe as above such that F = Φe and hence for all g and all n

Vα(λx g(x + 1), g, g(0), ⟨n⟩) + 1 = Vα(fe, g, e, ⟨n⟩).

A contradiction follows immediately by choosing g so that g(0) = e and g(x + 1) = fe(x). This completes the proof.
Remark. For relatively simple but fundamental reasons based in effective descriptive set theory, no such “nice” hierarchy exists for the recursive functions. For whereas the class of all indices e of total recursive functionals is definable by the Π11 condition

∀g ∀n ∃s T(1)(e, n, ḡ(s)),

the set of all indices of total recursive functions is given merely by an arithmetical Π02 condition:

∀n ∃s T(e, n, s).

So by the so-called “boundedness property” of hyperarithmetic theory, any inductive hierarchy classification of all the recursive functions is sure to “collapse” before ω1^CK. Already in the 1960s this point was clear.
Moschovakis wrote an unpublished note to this effect, and Feferman
[1962] developed a rich general theory of such hierarchies, part of which
will be briefly summarized in chapter 5. In practice the collapse usually
occurs at the very first limit stage and the hierarchy gives no interesting
information.
Nevertheless if we adopt a more constructive view and take into ac-
count also the ways in which a countable ordinal may be presented as a
well-ordering, rather than just accepting its set-theoretic existence, then
interesting hierarchies of proof theoretically important sub-classes of re-
cursive functions begin to emerge (see part 2).
2.9. Inductive definitions
We have already used an inductive definition in our proof of Kleene’s
recursion theorem in 2.4.3. Now we treat inductive definitions quite gen-
erally, and discuss how far they will carry us in the analytical hierarchy.
We also discuss the rather important dual concept of “coinductive” defi-
nitions.
2.9.1. Monotone operators. Let U be a fixed non-empty set. A map
Γ : P(U ) → P(U ) is called an operator on U . Γ is called monotone if
X ⊆ Y implies Γ(X ) ⊆ Γ(Y ), for all X, Y ⊆ U .
IΓ := ⋂{X ⊆ U | Γ(X) ⊆ X}
is the set defined inductively by the monotone operator Γ; so IΓ is the
intersection of all Γ-closed subsets of U . Dually,
CΓ := ⋃{X ⊆ U | X ⊆ Γ(X)}
is the set defined coinductively by the monotone operator Γ; so CΓ is the
union of all subsets of U that are extended by Γ. Definitions of this kind
are called (generalized) monotone inductive or coinductive definitions.
Theorem (Knaster–Tarski). Let Γ be a monotone operator.
(a) If Γ(X ) ⊆ X , then IΓ ⊆ X .
(b) If X ⊆ Γ(X ), then X ⊆ CΓ .
(c) Γ(IΓ ) = IΓ and Γ(CΓ ) = CΓ .
In particular IΓ is the least fixed point of Γ, and CΓ is the greatest fixed
point of Γ.
Proof. (a), (b) These follow immediately from the definitions of IΓ
and CΓ .
(c) From Γ(X ) ⊆ X we can conclude IΓ ⊆ X by (a), hence Γ(IΓ ) ⊆
Γ(X ) ⊆ X by the monotonicity of Γ. By definition of IΓ we obtain
Γ(IΓ ) ⊆ IΓ . Using monotonicity of Γ we can infer Γ(Γ(IΓ )) ⊆ Γ(IΓ ),
hence IΓ ⊆ Γ(IΓ ) again by definition of IΓ . The argument for Γ(CΓ ) = CΓ
is the same.
Example. Let 0 ∈ U and consider an arbitrary function S : U → U .
For every set X ⊆ U we define
Γ(X ) := {0} ∪ {S(v) | v ∈ X }.
Clearly Γ is monotone, and both IΓ and CΓ consist of the (not necessarily
distinct) elements 0, S(0), S(S(0)), . . . .
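On a finite universe both fixed points can be computed outright, anticipating the approximation method of 2.9.3 below. A Python sketch (ours), with the example operator cut down to U = {0, . . . , 9} and S(v) = v + 2:

  U = set(range(10))

  def Gamma(X):
      # Gamma(X) = {0} u {S(v) | v in X}, with S(v) = v + 2, restricted to U
      return ({0} | {v + 2 for v in X}) & U

  def lfp(Gamma):                     # iterate upwards from the empty set
      X = set()
      while Gamma(X) != X:
          X = Gamma(X)
      return X

  def gfp(Gamma):                     # iterate downwards from the whole universe
      X = set(U)
      while Gamma(X) != X:
          X = Gamma(X)
      return X

  print(sorted(lfp(Gamma)))           # [0, 2, 4, 6, 8]: the elements 0, S0, SS0, ...
  print(sorted(gfp(Gamma)))           # here the same set, as the text observes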
2.9.2. Induction and coinduction principles. The premise Γ(X ) ⊆ X in
part (a) of the Knaster–Tarski theorem is in the special case of the example
above equivalent to
∀u (u = 0 ∨ ∃v (v ∈ X ∧ u = S(v)) → u ∈ X ),
i.e., to
0 ∈ X ∧ ∀v (v ∈ X → S(v) ∈ X ),
and the conclusion is ∀u (u ∈ IΓ → u ∈ X ). Hence part (a) of the
Knaster–Tarski theorem expresses some kind of a general induction prin-
ciple. However, in the “induction step” we do not quite have the desired
form: instead of ∀v (v ∈ X → S(v) ∈ X ) we would like to have ∀v∈IΓ (v ∈
X → S(v) ∈ X ) (called the strengthened form of induction). But this can
be achieved easily. The theorem below formulates this in the general case.
Theorem (Induction principle for monotone inductive definitions). Let
Γ be a monotone operator. If Γ(X ∩ IΓ ) ⊆ X , then IΓ ⊆ X .
Proof. Because of Γ(X ∩IΓ ) ⊆ Γ(IΓ ) = IΓ we obtain from the premise
Γ(X ∩ IΓ ) ⊆ X ∩ IΓ . Therefore we have IΓ ⊆ X ∩ IΓ by definition of IΓ ,
hence IΓ ⊆ X .
Similarly, the premise X ⊆ Γ(X ) in part (b) is in the special case of the
example above equivalent to
∀u (u ∈ X → u = 0 ∨ ∃v (v ∈ X ∧ u = S(v))),
and the conclusion is ∀u (u ∈ X → u ∈ CΓ ). This can be viewed as
a dual form of the induction principle, called coinduction. Again we
obtain a more appropriate form of the “coinduction step”: instead of
∃v (v ∈ X ∧ u = S(v)) we can have ∃v∈CΓ ∪X (u = S(v)) (called the
strengthened form of coinduction). Generally:
Theorem (Coinduction principle for monotone inductive definitions).
Let Γ be a monotone operator. If X ⊆ Γ(X ∪ CΓ ), then X ⊆ CΓ .
Proof. Because of CΓ = Γ(CΓ ) ⊆ Γ(X ∪CΓ ) we obtain from the premise
X ∪ CΓ ⊆ Γ(X ∪ CΓ ). Then X ∪ CΓ ⊆ CΓ by definition of CΓ , hence
X ⊆ CΓ .
2.9.3. Approximation of the least and greatest fixed point. The least
fixed point IΓ of the monotone operator Γ was defined “from above”,
as intersection of all sets X such that Γ(X ) ⊆ X . We now show that it
can also be obtained by stepwise approximation “from below”. In the
general situation considered here we need a transfinite iteration of the
approximation steps along the ordinals. Similarly the greatest fixed point
CΓ was defined “from below”, as the union of all sets X such that X ⊆
Γ(X ). We show that it can also be obtained by stepwise approximation
“from above”. For an arbitrary operator Γ : P(U ) → P(U ) we define
Γ↑α and Γ↓α by transfinite recursion on ordinals α:
Γ↑0 := ∅,   Γ↑(α + 1) := Γ(Γ↑α),   Γ↑λ := ⋃β<λ Γ↑β,
Γ↓0 := U,   Γ↓(α + 1) := Γ(Γ↓α),   Γ↓λ := ⋂β<λ Γ↓β,

where λ denotes a limit ordinal. It turns out that not only monotone but
also certain other operators Γ have fixed points that can be approximated
by these Γ↑α or Γ↓α. Call an operator Γ inclusive if X ⊆ Γ(X ) and
selective if X ⊇ Γ(X ), for all X ⊆ U .
Lemma. Let Γ be a monotone or inclusive operator.
(a) Γ↑α ⊆ Γ↑(α + 1) for all ordinals α.
(b) If Γ↑α = Γ↑(α + 1), then Γ↑(α + β) = Γ↑α for all ordinals β.
(c) Γ↑α = Γ↑(α + 1) for some α such that Card(α) ≤ Card(U).

So Γ↑∞ := ⋃β∈On Γ↑β = Γ↑α, where α is the least ordinal such that Γ↑α = Γ↑(α + 1), and On denotes the class of all ordinals. This α is called the closure ordinal of Γ and is denoted by |Γ|↑. The set Γ↑∞ is called the closure of the operator Γ. Clearly Γ↑∞ is a fixed point of Γ.
Proof. (a) For monotone Γ we use transfinite induction on α. The
case α = 0 is trivial. In the successor case we have
Γ↑α = Γ(Γ↑(α − 1)) ⊆ Γ(Γ↑α) = Γ↑(α + 1).
Here we have used the induction hypothesis and the monotonicity of Γ.
In the limit case we obtain

Γ↑λ = ⋃β<λ Γ↑β ⊆ ⋃β<λ Γ↑(β + 1) = ⋃β<λ Γ(Γ↑β) ⊆ Γ(⋃β<λ Γ↑β) = Γ↑(λ + 1).

Again we have used the induction hypothesis and the monotonicity of Γ. In case Γ is inclusive we simply have

Γ↑α ⊆ Γ(Γ↑α) = Γ↑(α + 1).

(b) By transfinite induction on β. The case β = 0 is trivial. In the successor case we have by induction hypothesis

Γ↑(α + β + 1) = Γ(Γ↑(α + β)) = Γ(Γ↑α) = Γ↑(α + 1) = Γ↑α,

and in the limit case again by induction hypothesis

Γ↑(α + λ) = ⋃β<λ Γ↑(α + β) = Γ↑α.

(c) Assume that for all α such that Card(α) ≤ Card(U) we have Γ↑α ⊊ Γ↑(α + 1), and let uα ∈ Γ↑(α + 1) \ Γ↑α. This defines an injective map α ↦ uα from {α | Card(α) ≤ Card(U)} into U. But this set {α | Card(α) ≤ Card(U)} is exactly the least cardinal larger than Card(U), so this is impossible.
Similarly we obtain
Lemma. Let Γ be a monotone or selective operator.
(a) Γ↓α ⊇ Γ↓(α + 1) for all ordinals α.
(b) If Γ↓α = Γ↓(α + 1), then Γ↓(α + β) = Γ↓α for all ordinals β.
(c) Γ↓α = Γ↓(α + 1) for some α such that Card(α) ≤ Card(U).

So Γ↓∞ := ⋂β∈On Γ↓β = Γ↓α, where α is the least ordinal such that Γ↓α = Γ↓(α + 1), and On denotes the class of all ordinals. This α is called the coclosure ordinal of Γ and is denoted by |Γ|↓. The set Γ↓∞ is called the coclosure of the operator Γ. Clearly Γ↓∞ is a fixed point of Γ.
We now show that for a monotone operator Γ its closure Γ↑∞ is in fact its least fixed point IΓ and its coclosure Γ↓∞ is its greatest fixed point CΓ.
Lemma. Let Γ be a monotone operator. Then for all ordinals α we have
(a) Γ↑α ⊆ IΓ .
(b) If Γ↑α = Γ↑(α + 1), then Γ↑α = IΓ .
(c) Γ↓α ⊇ CΓ .
(d) If Γ↓α = Γ↓(α + 1), then Γ↓α = CΓ .
Proof. (a) By transfinite induction on α. The case α = 0 is trivial. In
the successor case we have by induction hypothesis Γ↑(α − 1) ⊆ IΓ . Since
Γ is monotone this implies
Γ↑α = Γ(Γ↑(α − 1)) ⊆ Γ(IΓ ) = IΓ .
In the limit case we obtain from the induction hypothesis Γ↑β ⊆ IΓ for all β < λ. This implies

Γ↑λ = ⋃β<λ Γ↑β ⊆ IΓ.
(b) Let Γ↑α = Γ↑(α + 1), hence Γ↑α = Γ(Γ↑α). Then Γ↑α is a fixed
point of Γ, hence IΓ ⊆ Γ↑α. The reverse inclusion follows from (a).
For (c) and (d) the proofs are similar.
2.9.4. Continuous operators. We now consider the important special
case of continuous operators. A subset Z ⊆ P(U ) is called directed if for
every finite Z0 ⊆ Z there is an X ∈ Z such that Y ⊆ X for all Y ∈ Z0 .
An operator Γ : P(U ) → P(U ) is called continuous if
Γ(⋃Z) = ⋃{Γ(X) | X ∈ Z}
for every directed subset Z ⊆ P(U ). We also need a dual notion: a subset
Z ⊆ P(U ) is called codirected if for every finite Z0 ⊆ Z there is an X ∈ Z
such that X ⊆ Y for all Y ∈ Z0 . An operator Γ : P(U ) → P(U ) is called
cocontinuous if
Γ(⋂Z) = ⋂{Γ(X) | X ∈ Z}
for every codirected subset Z ⊆ P(U ).
Lemma. Every continuous or cocontinuous operator Γ is monotone.
Proof. For X, Y ⊆ U such that X ⊆ Y we obtain Γ(Y ) = Γ(X ∪Y ) =
Γ(X )∪Γ(Y ) from the continuity of Γ, and hence Γ(X ) ⊆ Γ(Y ). Similarly
we obtain Γ(X ) = Γ(X ∩ Y ) = Γ(X ) ∩ Γ(Y ) from the cocontinuity of Γ,
and hence Γ(X ) ⊆ Γ(Y ).
For a continuous (cocontinuous) operator the transfinite approxima-
tion of its least (greatest) fixed point stops after ω steps. Hence in this case
we have an easy characterization of the least fixed point “from below”,
and of the greatest fixed point “from above”.
Lemma. (a) Let Γ be a continuous operator. Then IΓ = Γ↑ω.
(b) Let Γ be a cocontinuous operator. Then CΓ = Γ↓ω.

Proof. (a) It suffices to show Γ↑(ω + 1) = Γ↑ω:

Γ↑(ω + 1) = Γ(Γ↑ω) = Γ(⋃n<ω Γ↑n) = ⋃n<ω Γ(Γ↑n) = ⋃n<ω Γ↑(n + 1) = Γ↑ω,

where in the third to last equation we have used the continuity of Γ.

(b) Similarly it suffices to show Γ↓(ω + 1) = Γ↓ω:

Γ↓(ω + 1) = Γ(Γ↓ω) = Γ(⋂n<ω Γ↓n) = ⋂n<ω Γ(Γ↓n) = ⋂n<ω Γ↓(n + 1) = Γ↓ω,

where in the third to last equation we have used the cocontinuity of Γ.
2.9.5. The accessible part of a relation. An important example of a monotone inductive definition is the following construction of the accessible part of a binary relation ≺ on U. Note that ≺ is not required to be transitive, so (U, ≻) may be viewed as a reduction system. For X ⊆ U let Γ≺(X) be the set of all u all of whose ≺-predecessors lie in X:

Γ≺(X) := {u | ∀v≺u (v ∈ X)}.
Clearly Γ≺ is monotone; its least fixed point IΓ≺ is called the accessible
part of (U, ≺) and denoted by acc(≺) or acc≺ . If IΓ≺ = U , then the
relation ≺ is called well-founded; the inverse relation ≻ is called noetherian
or terminating. In this special case the Knaster–Tarski theorem and the
induction principle for monotone inductive definitions in 2.9.2 can be
combined as follows.
∀u (∀v≺u (v ∈ X ∩ acc≺ ) → u ∈ X ) → ∀u∈acc≺ (u ∈ X ). (1)
acc≺ is Γ≺ -closed, i.e., ∀v≺u (v ∈ acc≺ ) implies u ∈ acc≺ . (2)
Every u ∈ acc≺ is from Γ≺ (acc≺ ), i.e., ∀u∈acc≺ ∀v≺u (v ∈ acc≺ ). (3)
Note that (1) expresses an induction principle: to show that all elements u ∈ acc≺ are in a set X it suffices to prove the “induction step”: we may infer u ∈ X from the assumption that all smaller v ≺ u are accessible and in X.
By a reduction sequence we mean a finite or infinite sequence u1, u2, . . . such that ui ≻ ui+1. As an easy application one can show that u ∈ acc≺ if and only if every reduction sequence starting with u terminates after finitely many steps. For the direction from left to right we use induction on u ∈ acc≺. So let u ∈ acc≺ and assume that for every u′ such that u ≻ u′ every reduction sequence starting with u′ terminates after finitely many steps. Then clearly also every reduction sequence starting with u must terminate, since its second member is such a u′. Conversely, suppose we had some u ∉ acc≺. We construct an infinite reduction sequence u = u1, u2, . . . , un, . . . such that un ∉ acc≺; this yields the desired contradiction. So let un ∉ acc≺. By (2) we then have a v ∉ acc≺ such that un ≻ v; pick un+1 as such a v.
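For a finite relation this characterization is directly computable: acc≺ is the least fixed point of Γ≺, reached by the upward iteration. A Python sketch (ours), on a small relation containing a cycle:

  # prec[u] is the set of predecessors of u (a finite relation given explicitly).
  prec = {0: set(), 1: {0}, 2: {1, 3}, 3: {2}, 4: {0, 1}}

  def acc(prec):
      # least fixed point of Gamma(X) = {u | every predecessor of u lies in X}
      X = set()
      while True:
          Y = {u for u in prec if prec[u] <= X}
          if Y == X:
              return X
          X = Y

  print(sorted(acc(prec)))            # [0, 1, 4]: the cycle through 2 and 3 is inaccessible

Every reduction sequence from 0, 1 or 4 terminates, while 2 and 3 admit the infinite reduction sequence 2, 3, 2, 3, . . . .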
2.9.6. Inductive definitions over N. We now turn to inductive definitions
over the set N and their relation to the arithmetical and analytical hier-
archies. An operator Γ : P(N) → P(N) is called Σ0r -definable if there is a
Σ0r -definable relation QΓ such that for all A ⊆ N and all n ∈ N
n ∈ Γ(A) ↔ QΓ (cA , n).
Π0r-, Δ0r-, Σ1r-, Π1r- and Δ1r-definable operators are defined similarly.

It is easy to show that every Σ01-definable monotone operator Γ is continuous, and hence by a lemma in 2.9.4 has closure ordinal |Γ| ≤ ω. We now show that this consequence still holds for inclusive operators.

Lemma. Let Γ be a monotone or inclusive Σ01-definable operator. Then |Γ| ≤ ω.

Proof. By assumption

n ∈ Γ(A) ↔ ∃s T(1)(e, n, c̄A(s))

for some e ∈ N. It suffices to show that Γ(Γ↑ω) ⊆ Γ↑ω. Suppose n ∈ Γ(Γ↑ω), so T(1)(e, n, c̄Γ↑ω(s)) for some s. Since Γ↑ω is the union of the increasing chain Γ↑0 ⊆ Γ↑1 ⊆ Γ↑2 ⊆ · · · , for some r we must have c̄Γ↑ω(s) = c̄Γ↑r(s). Therefore n ∈ Γ(Γ↑r) = Γ↑(r + 1) ⊆ Γ↑ω.
2.9.7. Definability of least fixed points for monotone operators. Next
we prove that the closure of a monotone Σ01 -definable operator is Σ01 -
definable as well (this will be seen to be false for inclusive operators).
As a tool in the proof we need König’s lemma. Here and later we use
starred function variables f ∗ , g ∗ , h ∗ , . . . to range over 0-1-valued func-
tions.
Lemma (König). Let T be a binary tree, i.e., T consists of (codes for) sequences of 0’s and 1’s only and is closed under the formation of initial segments. Then

∀n ∃x (lh(x) = n ∧ ∀i<n ((x)i ≤ 1) ∧ x ∈ T) ↔ ∃f* ∀s (f̄*(s) ∈ T).

Proof. The direction from right to left is obvious. For the converse assume the left hand side and let

M := {y | ∀i<lh(y) ((y)i ≤ 1) ∧ ∀m ∃z (lh(z) = m ∧ ∀i<m ((z)i ≤ 1) ∧ ∀j≤lh(y)+m (Init(y ∗ z, j) ∈ T))}.
M can be seen as the set of all “fertile” nodes, possessing arbitrarily long extensions within T. To construct the required infinite path f* we use the axiom of dependent choice:

∃y A(0, y) → ∀n,y (A(n, y) → ∃z A(n + 1, z)) → ∃f ∀n A(n, f(n)),

with A(n, y) expressing that y is a fertile node of length n:

A(n, y) := (y ∈ M ∧ lh(y) = n).

Now ∃y A(0, y) is obvious (take y := ⟨⟩). For the step case assume that y is a fertile node of length n. Then at least one of the two possible extensions y ∗ ⟨0⟩ and y ∗ ⟨1⟩ must be fertile, i.e., in M; pick z accordingly.
Corollary. If R is Π01-definable, then so is

Q(g⃗, n⃗) := ∃f* ∀s R(f̄*(s), g⃗, n⃗).

Proof. By König’s lemma we have

Q(g⃗, n⃗) ↔ ∀n ∃x≤⟨1,...,1⟩ (lh(x) = n ∧ ∀i<n ((x)i ≤ 1) ∧ R(x, g⃗, n⃗)).
We now show that the Π11- and Σ01-definable relations are closed under monotone inductive definitions.

Theorem. Let Γ : P(N) → P(N) be a monotone operator.
(a) If Γ is Π11-definable, then so is its least fixed point IΓ.
(b) If Γ is Σ01-definable, then so is its least fixed point IΓ.

Proof. Let Γ : P(N) → P(N) be a monotone operator and n ∈ Γ(A) ↔ QΓ(cA, n).

(a) Assume QΓ is Π11-definable. Then IΓ is the intersection of all Γ-closed sets, so

n ∈ IΓ ↔ ∀f (∀m (QΓ(f, m) → f(m) = 1) → f(n) = 1).

This shows that IΓ is Π11-definable.

(b) Assume QΓ is Σ01-definable. Then IΓ can be represented in the form

n ∈ IΓ ↔ ∀f* (∀m (QΓ(f*, m) → f*(m) = 1) → f*(n) = 1)
       ↔ ∀f* ∃m R(f*, m, n)   with R recursive
       ↔ ∀f* ∃s T(1)(e, n, f̄*(s))   for some e.

By the corollary to König’s lemma IΓ is Σ01-definable.
2.9. Inductive definitions 109
2.9.8. Some counter examples. If Γ is a non-monotone but only inclu-
sive Σ01 -definable operator, then its closure Γ need not even be arithmetical.
Recall from 2.6.5 the definition of the universal Σ0r+1 -definable relations
0
Ur+1 (e, n):
U10 (e, n) := ∃s T (e, n, s) ( ↔ n ∈ We(1) ),
0
Ur+1 (e, n) := ∃m ¬Ur0 (e, n ∗ m).
Let
U0 := {r, e, n
| Ur+1
0
(e,
n )}.
Clearly for every arithmetical relation R there are r, e such that R( n) ↔
r, e, n
∈ U0 . Hence U0 cannot be arithmetical, for if it were say Σ0r+1 -
definable, then every arithmetical relation R would be Σ0r+1 -definable, con-
tradicting the fact that the arithmetical hierarchy is properly expanding.
On the other hand we have
Lemma. There is an inclusive Σ01 -definable operator Γ such that Γ = U0 ;
hence Γ is not arithmetical.
Proof. We define Γ such that
Γ↑r = {t, e, n
| 0 < t ≤ r ∧ Ut0 (e,
n )}. (4)
Let
s ∈ Γ(A) := s ∈ A ∨ ∃e,n (s = 1, e, n
∧ U10 (e,
n )) ∨
∃t,e,n (s = t + 1, e, n
∧ ∃e1 ,m t, e1 , m
∈ A ∧ ∃m t, e, n
, m ∈
/ A).
We now prove (4) by induction on r. The base case r = 0 is obvious. In the step case we have
s ∈ Γ(Γ↑r) ↔ s ∈ Γ↑r ∨ ∃e,n⃗ (s = ⟨1, e, n⃗⟩ ∧ U^0_1(e, n⃗)) ∨
      ∃t,e,n⃗ (s = ⟨t + 1, e, n⃗⟩ ∧ ∃e1,m⃗ ⟨t, e1, m⃗⟩ ∈ Γ↑r ∧ ∃m ⟨t, e, n⃗ ∗ m⟩ ∉ Γ↑r)
   ↔ s ∈ Γ↑r ∨ ∃e,n⃗ (s = ⟨1, e, n⃗⟩ ∧ U^0_1(e, n⃗)) ∨
      ∃t,e,n⃗ (s = ⟨t + 1, e, n⃗⟩ ∧ 0 < t ≤ r ∧ ∃m ¬U^0_t(e, n⃗ ∗ m))
   ↔ ∃t,e,n⃗ (s = ⟨t, e, n⃗⟩ ∧ 0 < t ≤ r ∧ U^0_t(e, n⃗)) ∨
      ∃e,n⃗ (s = ⟨1, e, n⃗⟩ ∧ U^0_1(e, n⃗)) ∨
      ∃t,e,n⃗ (s = ⟨t + 1, e, n⃗⟩ ∧ 0 < t ≤ r ∧ U^0_{t+1}(e, n⃗))
   ↔ s ∈ {⟨t, e, n⃗⟩ | 0 < t ≤ r + 1 ∧ U^0_t(e, n⃗)}.
Clearly Γ is a Σ^0_1-definable inclusive operator. By 2.9.6 its closure ordinal |Γ| is ≤ ω, so Γ↑ω = Γ̄. But clearly Γ↑ω = ⋃r Γ↑r = U0.
On the positive side we have
Lemma. For every inclusive Δ^1_1-definable operator Γ its closure Γ̄ is Δ^1_1-definable.
Proof. Let Γ be an inclusive operator such that n ∈ Γ(A) ↔ QΓ(cA, n) for some Δ^1_1-definable relation QΓ. Let
f∗(p) := 1 if p = ⟨r + 1, n⟩ and n ∈ Γ↑r, and f∗(p) := 0 otherwise,
and consider the following Δ^1_1-definable relation R:
R(g) := ∀p (g(p) ≤ 1) ∧
        ∀p (lh(p) ≠ 2 ∨ (p)0 = 0 → g(p) = 0) ∧
        ∀r ∀n (g⟨r + 1, n⟩ = 1 ↔ QΓ(λx g⟨r, x⟩, n)).
Clearly R(f∗). Moreover, for every g such that R(g) we have g(p) = 0 for all p not of the form ⟨r + 1, n⟩, and it is easy to prove that
g⟨r + 1, n⟩ = 1 ↔ n ∈ Γ↑r.
Therefore f∗ is the unique member of R, and we have
n ∈ Γ̄ ↔ ∃r (n ∈ Γ↑r) ↔ ∃r (f∗⟨r + 1, n⟩ = 1)
      ↔ ∃g (R(g) ∧ ∃r (g⟨r + 1, n⟩ = 1))
      ↔ ∀g (R(g) → ∃r (g⟨r + 1, n⟩ = 1)).
Hence Γ̄ is Δ^1_1-definable.
Corollary. U0 ∈ Δ^1_1 \ Δ^0_∞.
2.10. Notes
The history of recursive function theory goes back to the pioneering
work of Turing, Kleene and others in the 1930s. We have based our
approach to the theory on the concept of an (unlimited) register machine
of Shepherdson and Sturgis [1963], which allows a particularly simple
development. The normal form theorem and the undecidability of the
halting problem are classical results from the 1930s, due to Kleene and
Church, respectively.
Our treatment of recursion in terms of equational definability is very much based on the Herbrand–Gödel–Kleene equation calculus (see Kleene [1952]) and is related more closely to the general development in McCarthy [1963].
The subclass of the elementary functions treated in 2.2.2 was introduced
by Kalmár [1943].
Grzegorczyk [1953] was the first to classify the primitive recursive functions by means of a hierarchy E^n, which coincides with the levels of L_k-computability for n = k + 1 ≥ 3; see also Cleave [1963]. In addition, E^2 is the class of subelementary functions. Ritchie [1963] and Schwichtenberg [1967] (see also Rödding [1968]) all gave hierarchy classifications of the elementary functions in terms of iterated exponential complexity bounds, beginning with E^2 at the level of polynomial bounds. These are polynomials in the input, not in the binary length of inputs. Thus, in terms of binary length, E^2 corresponds to linear space on a Turing machine: see Ritchie [1963], and also Handley and Wainer [1999].
Volume II of Odifreddi [1999] contains a comprehensive 212-page sur-
vey, with many further references, on hierarchies of recursive functions.
The treatment ranges through complexity classes such as logarithmic
space and polynomial time, to subrecursive classes like “elementary” and
“primitive recursive”, up to the ordinally indexed hierarchies whose proof-
theoretic significance will emerge in chapter 4.
Chapter 3
GÖDEL’S THEOREMS
This is the point at which we bring proof and recursion together and
begin to study connections between the computational complexity of re-
cursive functions and the logical complexity of their formal termination
or existence proofs. The rest of the book will largely be motivated by
this theme, and will make repeated use of the basics laid out here and
the proof-theoretic methods developed earlier. It should be stressed that
by “computational complexity” we mean complexity “in the large” or
“in theory”; not necessarily feasible or practical complexity. Feasibil-
ity is always desirable if one can achieve it, but the fact is that natural
formal theories of even modest logical strength prove the termination of
functions with enormous growth rate, way beyond the realm of practical
computability. Since our aim is to unravel the computational constraints
implicit in the logic of a given theory, we do not wish to have any prior
bounds imposed on the levels of complexity allowed.
At the base of our hierarchy of theories lie ones with polynomially
or at most exponentially bounded complexity, and these are studied in
part 3 at the end of the book. The principal objects of study in this
chapter are the elementary functions, which (i) will be characterized as
those provably terminating in the theory IΔ0 (exp) of bounded induction,
and (ii) will be shown to be adequate for the arithmetization of syntax
leading to Gödel's theorems, a fact which most logicians believe but which has rarely received a complete treatment elsewhere. We believe (i) to be
a fundamental theorem of mathematical logic, and one which—along
with realizability interpretations (see part 3)—underlies the whole area of
“proofs as programs” now actively developed in computer science. The
proof is completely straightforward, but it will require us, once and for
all, to develop some routine basic arithmetic inside IΔ0 (exp).
Later (in part 3) we shall see how to build alternative versions of this
theory, without the explicit bounds of Δ0 -formulas, but which still cha-
racterize the elementary functions and natural subclasses such as poly-
time. Such theories are reflective of a more recent research trend towards
“implicit complexity”. At first sight they resemble theories of full arith-
metic, but they incorporate ideas of Bellantoni and Cook [1992], and of
Leivant [1995b] in which composition (quantification) and recursion (in-
duction) act on different kinds of variables. It is this variable separation
which brings apparently strong theories down to “more feasible” levels.
One cannot write a text on proof theory without bowing to Gödel, and
this chapter seems the obvious place in which to give a short but we hope
reasonably complete treatment of the two incompleteness theorems.
All of the results in this chapter are developed as if the logic is classical.
However, every result goes through in much the same way in a constructive
context.
3.1. IΔ0 (exp)
IΔ0(exp) is a theory in classical logic, based on the language
{=, 0, S, P, +, −·, ·, exp2}
where S, P denote the successor and predecessor functions. We shall generally use the infix notations x + 1, x −· 1, 2^x rather than the more formal S(x), P(x), exp2(x) etcetera. The axioms of IΔ0(exp) are the usual axioms for equality (1.4.4), the following defining axioms for the constants:
x + 1 ≠ 0            x + 1 = y + 1 → x = y
0 −· 1 = 0           (x + 1) −· 1 = x
x + 0 = x            x + (y + 1) = (x + y) + 1
x −· 0 = x           x −· (y + 1) = (x −· y) −· 1
x · 0 = 0            x · (y + 1) = (x · y) + x
2^0 = 1 (= 0 + 1)    2^(x+1) = 2^x + 2^x
and the axiom-schema of “bounded induction”:
B(0) ∧ ∀x (B(x) → B(x + 1)) → ∀x B(x)
for all “bounded” formulas B as defined below.
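Read computationally, the defining axioms are exactly the recursion equations of the functions they axiomatize. The following direct Python transcription is ours and purely illustrative; it computes the intended interpretations over N:

    def pred(x):              # P:  0 -. 1 = 0,  (x+1) -. 1 = x
        return 0 if x == 0 else x - 1

    def add(x, y):            # x + 0 = x,  x + (y+1) = (x + y) + 1
        return x if y == 0 else add(x, y - 1) + 1

    def monus(x, y):          # x -. 0 = x,  x -. (y+1) = (x -. y) -. 1
        return x if y == 0 else pred(monus(x, y - 1))

    def mul(x, y):            # x . 0 = 0,  x . (y+1) = (x . y) + x
        return 0 if y == 0 else add(mul(x, y - 1), x)

    def exp2(x):              # 2^0 = 1,  2^(x+1) = 2^x + 2^x
        if x == 0:
            return 1
        e = exp2(x - 1)
        return add(e, e)

    def leq(a, b):            # t1 <= t2 abbreviates t1 -. t2 = 0
        return monus(a, b) == 0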
Definition. We write t1 ≤ t2 for t1 −· t2 = 0 and t1 < t2 for t1 + 1 ≤ t2, where t1, t2 denote arbitrary terms of the language.
A Δ0- or bounded formula is a formula in the language of IΔ0(exp) in which all quantifiers occur bounded; thus ∀x<t B(x) stands for ∀x (x < t → B(x)) and ∃x<t B(x) stands for ∃x (x < t ∧ B(x)) (similarly with ≤ instead of <).
A Σ1 -formula is any formula of the form ∃x1 ∃x2 . . . ∃xk B where B is
a bounded formula. The prefix of unbounded existential quantifiers is
allowed to be empty, thus bounded formulas are Σ1 .
3.1.1. Basic arithmetic in IΔ0 (exp). The first task in any axiomatic the-
ory is to develop, from the axioms, those basic algebraic properties which
are going to be used frequently without further reference. Thus, in the
case of IΔ0 (exp) we need to establish the usual associativity, commuta-
tivity and distributivity laws for addition and multiplication, the laws of
exponentiation, and rules governing the relations ≤ and < just defined.
Lemma. In IΔ0(exp) one can prove (the universal closures of) the case-distinction
x = 0 ∨ x = (x −· 1) + 1,
the associativity laws for addition and multiplication
x + (y + z) = (x + y) + z and x · (y · z) = (x · y) · z,
the distributivity law
x · (y + z) = x · y + x · z,
the commutativity laws
x + y = y + x and x · y = y · x,
the law
x −· (y + z) = (x −· y) −· z,
and the exponentiation law
2^(x+y) = 2^x · 2^y.
Proof. Since 0 = 0 and x + 1 = ((x + 1) −· 1) + 1 by axioms, a trivial induction on x gives the case-distinction. A straightforward induction on z gives associativity for +, and distributivity follows from this by an equally straightforward induction, again on z. Associativity of multiplication is proven similarly, but requires distributivity. The commutativity of + is done by induction on y (or x) using sub-inductions to first prove 0 + x = x and (y + x) + 1 = (y + 1) + x. Commutativity of · is done similarly using 0 · x = 0 and y · x + x = (y + 1) · x, this latter requiring both associativity and commutativity of +. That x −· (y + z) = (x −· y) −· z follows easily by a direct induction on z. The base case for the exponentiation law is
2^(x+0) = 2^x = 0 + 2^x = 2^x · 0 + 2^x = 2^x · (0 + 1) = 2^x · 2^0
and the induction step needs distributivity to give 2^(x+y+1) = 2^x · 2^y + 2^x · 2^y = 2^x · 2^(y+1).
Lemma. The following (and their universal closures) are provable in
IΔ0 (exp):
1. x ≤ 0 ↔ x = 0 and ¬ x < 0,
2. 0 ≤ x and x ≤ x and x < x + 1,
3. x < y + 1 ↔ x ≤ y,
4. x ≤ y ↔ x < y ∨ x = y,
5. x ≤ y ∧ y ≤ z → x ≤ z and x < y ∧ y < z → x < z,
6. x ≤ y ∨ y < x,
7. x < y → x + z < y + z,
8. x < y → x · (z + 1) < y · (z + 1),
9. x < 2^x and x < y → 2^x < 2^y.
Proof. (1) This is an immediate consequence of the axioms x −· 0 = x and x + 1 ≠ 0. (2) A simple induction proves 0 −· x = 0, that is 0 ≤ x.
Another induction on y gives (x + 1) − · (y + 1) = x − · y, and then
a further induction proves x − · x = 0, which is x ≤ x. Replacing
x by x + 1 then gives x < x + 1. (3) This follows straight from the
equation (x + 1) − · (y + 1) = x − · y. (4) From x ≤ x we obtain
x = y → x ≤ y, and from x − · y = (x + 1) − · (y + 1) we obtain
x < y → x ≤ y, hence x < y ∨ x = y → x ≤ y. The converse
x ≤ y → x < y ∨ x = y is proven by a case-distinction on y, the case
y = 0 being immediate from part 1. In the other case y = (y − · 1) + 1
and one obtains x ≤ y → x − · (y − · 1) = 0 ∨ x − · (y −· 1) = 1 by a
case-distinction on x − · (y − · 1). Since (x + 1) −· y =x− · (y −· 1) this
gives x ≤ y → x < y ∨ x − · (y −· 1) = 1. It therefore remains only to
prove x − · (y −· 1) = 1 → x = y. But this follows immediately from
x− · z = 0 → x = z + (x − · z), which is proven by induction on z using
(z + 1) + (x −· (z + 1)) = z + (x − · (z + 1)) + 1 = z + ((x −
· z) −· 1) + 1 =
z + (x −· z). (5) Transitivity of ≤ is proven by induction on z using parts
1 for the basis and 4 for the induction step. Then, by replacing x by
x + 1 and y by y + 1, the transitivity of < follows. (6) can be proved
by induction on x. The basis is immediate from 0 ≤ y. The induction
step is straightforward since y < x → y < x + 1 by transitivity, and
x ≤ y → x < y ∨ x = y → x + 1 ≤ y ∨ y < x + 1 by previous
facts. (7) requires a simple induction on z, the induction step being
x + z < y + z → x + z + 1 ≤ y + z < y + z + 1. (8) follows from part 7 and transitivity by another easy induction on z. (9) Using part 7 and transitivity again, one easily proves 2^x < 2^(x+1) by induction. Then x < 2^x follows straightforwardly by another induction, as does x < y → 2^x < 2^y by induction on y, the induction step being: x < y + 1 implies x ≤ y implies 2^x ≤ 2^y implies 2^x < 2^(y+1), by means of transitivity.
Note. All of the inductions used in the lemmas above are inductions on
“open”, i.e., quantifier-free, formulas.
3.1.2. Provable recursion in IΔ0 (exp). Of course in any theory many
new functions and relations can be defined out of the given constants.
What we are interested in are those which can not only be defined in the
language of the theory, but also can be proven to exist. This gives rise to
one of the main definitions in this book.
Definition. We say that a function f : N^k → N is provably Σ1 or provably recursive in an arithmetical theory T if there is a Σ1-formula F(x⃗, y), called a “defining formula” for f, such that
(a) f(n⃗) = m if and only if F(n⃗, m) is true (in the standard model);
(b) T ⊢ ∃y F(x⃗, y);
(c) T ⊢ F(x⃗, y) ∧ F(x⃗, y′) → y = y′.
If, in addition, F is a bounded formula and there is a bounding term t(x⃗) for f such that T ⊢ F(x⃗, y) → y < t(x⃗), then we say that f is provably bounded in T. In this case we clearly have T ⊢ ∃y<t(x⃗) F(x⃗, y).
The importance of this definition is brought out by the following:
Theorem. If f is provably Σ1 in T we may conservatively extend T by adding a new function symbol for f together with the defining axiom F(x⃗, f(x⃗)).
Proof. This is simply because any model M of T can be made into a
model (M, f) of the extended theory, by interpreting f as the function
on M uniquely determined by the second and third conditions above. So
if A is a closed formula not involving f, provable in the extended theory,
then it is true in (M, f) and hence true in M. Then by completeness, A
must already be provable in T .
Since Σ1 -definable functions are recursive, we shall often use the terms
“provably Σ1 ” and “provably recursive” synonymously. We next develop
the stock of functions provably Σ1 in IΔ0 (exp), and prove that they are
exactly the elementary functions.
Lemma. Each term defines a provably bounded function of IΔ0(exp).
Proof. Let f be the function defined explicitly by f(n⃗) = t(n⃗), where t is any term of IΔ0(exp). Then we may take y = t(x⃗) as the defining formula for f, since ∃y (y = t(x⃗)) derives immediately from the axiom t(x⃗) = t(x⃗), and y = t(x⃗) ∧ y′ = t(x⃗) implies y = y′ is an equality axiom. Furthermore, as y = t(x⃗) is a bounded formula and y = t implies y < t + 1 is provable, f is provably bounded.
Lemma. Define 2_k(x) by 2_0(x) = x and 2_{k+1}(x) = 2^(2_k(x)). Then for every term t(x1, . . . , xn) built up from the constants 0, S, P, +, −·, ·, exp2 there is a k such that
IΔ0(exp) ⊢ t(x1, . . . , xn) < 2_k(x1 + · · · + xn).
Proof. We can prove in IΔ0(exp) both 0 < 2^x and x < 2^x. Now suppose t is any term constructed from subterms t0, t1 by application of one of the function constants. Assume inductively that t0 < 2_{k0}(s0) and t1 < 2_{k1}(s1) are both provable, where s0, s1 are the sums of all variables appearing in t0, t1 respectively. Let s be the sum of all variables appearing in either t0 or t1, and let k be the maximum of k0 and k1. Then, by the various arithmetical laws in the preceding lemmas, we can prove t0 < 2_k(s) and t1 < 2_k(s), and it is then a simple matter to prove t0 + 1 < 2_{k+1}(s), t0 −· 1 < 2_k(s), t0 −· t1 < 2_k(s), t0 + t1 < 2_{k+1}(s), t0 · t1 < 2_{k+1}(s) and 2^(t0) < 2_{k+1}(s). Hence IΔ0(exp) proves t < 2_{k+1}(s).
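The functions 2_k are easily computed, and the shape of the lemma can be sanity-checked numerically on small instances. A small Python illustration (ours; the sample term and the choice k = 2 are our own):

    def iter_exp(k, x):
        """2_0(x) = x and 2_{k+1}(x) = 2 ** 2_k(x)."""
        return x if k == 0 else 2 ** iter_exp(k - 1, x)

    # The lemma's bound on the sample term t(x, y) = (x + y) * 2^x:
    t = lambda x, y: (x + y) * 2 ** x
    assert all(t(x, y) < iter_exp(2, x + y)      # k = 2 suffices here
               for x in range(5) for y in range(5))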
Lemma. Suppose f is defined by composition
f(n⃗) = g0(g1(n⃗), . . . , gm(n⃗))
from functions g0, g1, . . . , gm, each of which is provably bounded in IΔ0(exp). Then f is provably bounded in IΔ0(exp).
Proof. By the definition of “provably bounded” there is for each gi (i ≤ m) a bounded defining formula Gi and (by the last lemma) a number ki such that, for 1 ≤ i ≤ m, IΔ0(exp) ⊢ ∃yi<2_{ki}(s) Gi(x⃗, yi), where s is the sum of the variables x⃗; and for i = 0,
IΔ0(exp) ⊢ ∃y<2_{k0}(s0) G0(y1, . . . , ym, y),
where s0 is the sum of the variables y1, . . . , ym. Let k := max(k0, k1, . . . , km) and let F(x⃗, y) be the bounded formula
∃y1<2_k(s) . . . ∃ym<2_k(s) C(x⃗, y1, . . . , ym, y)
where C(x⃗, y1, . . . , ym, y) is the conjunction
G1(x⃗, y1) ∧ · · · ∧ Gm(x⃗, ym) ∧ G0(y1, . . . , ym, y).
Then, clearly, F is a defining formula for f, and by prenex operations,
IΔ0(exp) ⊢ ∃y F(x⃗, y).
Furthermore, by the uniqueness condition on each Gi, we can also prove in IΔ0(exp)
C(x⃗, y1, . . . , ym, y) ∧ C(x⃗, z1, . . . , zm, y′)
→ y1 = z1 ∧ · · · ∧ ym = zm ∧ G0(y1, . . . , ym, y) ∧ G0(y1, . . . , ym, y′)
→ y = y′,
and hence by the quantifier rules of logic
IΔ0(exp) ⊢ F(x⃗, y) ∧ F(x⃗, y′) → y = y′.
Thus f is provably Σ1 with F as a bounded defining formula, and it only remains to find a bounding term. But IΔ0(exp) proves
C(x⃗, y1, . . . , ym, y) → y1 < 2_k(s) ∧ · · · ∧ ym < 2_k(s) ∧ y < 2_k(y1 + · · · + ym)
and
y1 < 2_k(s) ∧ · · · ∧ ym < 2_k(s) → y1 + · · · + ym < 2_k(s) · m.
Therefore by taking t(x⃗) to be the term 2_k(2_k(s) · m) we obtain
IΔ0(exp) ⊢ C(x⃗, y1, . . . , ym, y) → y < t(x⃗)
and hence
IΔ0(exp) ⊢ F(x⃗, y) → y < t(x⃗).
This completes the proof.
Lemma. Suppose f is defined by bounded minimization
f(n⃗, m) = μk<m (g(n⃗, k) = 0)
from a function g which is provably bounded in IΔ0(exp). Then f is provably bounded in IΔ0(exp).
Proof. Let G be a bounded defining formula for g and let F(x⃗, z, y) be the bounded formula
y ≤ z ∧ ∀i<y ¬G(x⃗, i, 0) ∧ (y = z ∨ G(x⃗, y, 0)).
Obviously F(n⃗, m, k) is true in the standard model if and only if either k is the least number less than m such that g(n⃗, k) = 0, or there is no such number and k = m. But this is exactly what it means for k to be the value of f(n⃗, m), so F is a defining formula for f. Furthermore IΔ0(exp) ⊢ F(x⃗, z, y) → y < z + 1, so t(x⃗, z) = z + 1 can be taken as a bounding term for f. Also it is clear that we can prove
F(x⃗, z, y) ∧ F(x⃗, z, y′) ∧ y < y′ → G(x⃗, y, 0) ∧ ¬G(x⃗, y, 0)
and similarly with y and y′ interchanged. Therefore
IΔ0(exp) ⊢ F(x⃗, z, y) ∧ F(x⃗, z, y′) → ¬ y < y′ ∧ ¬ y′ < y
and hence, because y < y′ ∨ y′ < y ∨ y = y′ is provable, we have
IΔ0(exp) ⊢ F(x⃗, z, y) ∧ F(x⃗, z, y′) → y = y′.
It remains to check that IΔ0(exp) ⊢ ∃y F(x⃗, z, y). This is the point where bounded induction comes into play, since ∃y F(x⃗, z, y) is a bounded formula. We prove it by induction on z.
For the basis, recall that y ≤ 0 ↔ y = 0 and ¬ i < 0 are provable. Therefore F(x⃗, 0, 0) is provable, and hence so is ∃y F(x⃗, 0, y).
For the induction step from z to z + 1, we can prove y ≤ z → y + 1 ≤ z + 1 and, using i < y + 1 ↔ i < y ∨ i = y,
∀i<y ¬G(x⃗, i, 0) ∧ (y = z ∧ ¬G(x⃗, y, 0)) → ∀i<y+1 ¬G(x⃗, i, 0) ∧ y + 1 = z + 1.
Therefore
F(x⃗, z, y) → F(x⃗, z + 1, y + 1) ∨ F(x⃗, z + 1, y)
and hence
∃y F(x⃗, z, y) → ∃y F(x⃗, z + 1, y),
which completes the proof.
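Computationally, bounded minimization is just an explicit search with a default value. A Python illustration (ours; the integer-square-root instance is our own choice):

    def bounded_min(g, args, m):
        """mu_{k<m}(g(args, k) = 0): least k < m with g(args, k) = 0,
        and m if there is none -- the scheme described by the defining
        formula F in the lemma."""
        for k in range(m):
            if g(*args, k) == 0:
                return k
        return m

    # Example: integer square root as a bounded search.
    isqrt = lambda n: bounded_min(
        lambda n, k: 0 if (k + 1) ** 2 > n else 1, (n,), n + 1)
    print([isqrt(n) for n in range(10)])  # [0, 1, 1, 1, 2, 2, 2, 2, 2, 3]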
Theorem. Every elementary function is provably bounded in the theory
IΔ0 (exp).
Proof. As we have seen earlier in 2.2, the elementary functions can be characterized as those definable from the constants 0, S, P, +, −·, ·, exp2 by composition and bounded minimization. The above lemmas show that each such function is provably bounded in IΔ0(exp).
3.1.3. Proof-theoretic characterization.
Definition. A closed Σ1-formula ∃z⃗ B(z⃗), with B a bounded formula, is said to be “true at m”, and we write m |= ∃z⃗ B(z⃗), if there are numbers m⃗ = m1, m2, . . . , ml, all less than m, such that B(m⃗) is true (in the standard model). A finite set Γ of closed Σ1-formulas is “true at m”, written m |= Γ, if at least one of them is true at m.
If Γ(x1, . . . , xk) is a finite set of Σ1-formulas all of whose free variables occur among x1, . . . , xk, and if f : N^k → N, then we write f |= Γ to mean that for all numerical assignments n⃗ = n1, . . . , nk to the variables x⃗ = x1, . . . , xk we have f(n⃗) |= Γ(n⃗).
Note (Persistence). For sets Γ of closed Σ1-formulas, if m |= Γ and m < m′ then m′ |= Γ. Similarly for sets Γ(x⃗) of Σ1-formulas with free variables: if f |= Γ and f(n⃗) ≤ f′(n⃗) for all n⃗ ∈ N^k, then f′ |= Γ.
Lemma. If Γ(x⃗) is a finite set of Σ1-formulas (whose disjunction is) provable in IΔ0(exp) then there is an elementary function f, strictly increasing in each of its variables, such that f |= Γ.
Proof. It is convenient to use a Tait-style formalization of IΔ0 (exp).
The axioms will be all sets of formulas Γ which contain either a complementary pair of equations t1 = t2, t1 ≠ t2, or an identity t = t, or an equality axiom t1 ≠ t2, ¬e(t1), e(t2) where e(t) is any equation or inequation with a distinguished subterm t, or a substitution instance of one of the defining axioms for the constants. The axiom schema of bounded induction will be replaced by the induction rule
Γ, B(0)    Γ, ¬B(y), B(y + 1)
─────────────────────────────
Γ, B(t)
where B is any bounded formula, y is not free in Γ and t is any term.
Note that if Γ is provable in IΔ0 (exp) then it has a proof in the formalism
just described, in which all cut formulas are Σ1 . For if Γ is classically
derivable from non-logical axioms A1 , . . . , As then there is a cut-free proof
in Tait-style logic of ¬A1 , Δ, Γ where Δ = ¬A2 , . . . , ¬As . We show how
to cancel ¬A1 using a Σ1 -cut. If A1 is an induction axiom on the formula
B we have a cut-free proof in logic of
B(0) ∧ ∀y (¬B(y) ∨ B(y + 1)) ∧ ∃x ¬B(x), Δ, Γ
and hence, by inversion, cut-free proofs of B(0), Δ, Γ and ¬B(y), B(y+1),
Δ, Γ and ∃x ¬B(x), Δ, Γ. From the first two of these we obtain B(x), Δ, Γ
by the induction rule above, then ∀x B(x), Δ, Γ, and then from the third
we obtain Δ, Γ by a cut on the Σ1 -formula ∃x ¬B(x). If A1 is the universal
closure of any other (quantifier-free) axiom then we immediately obtain
Δ, Γ by a cut on the Σ1 -formula ¬A1 . Having thus cancelled ¬A1 we can
similarly cancel each of ¬A2 , . . . , ¬As in turn, so as to yield the desired
proof of Γ which only uses cuts on Σ1 -formulas.
Now, choosing such a proof for Γ(x⃗), we proceed by induction on its height, showing at each new proof-step how to define the required elementary function f such that f |= Γ.
(i) If Γ(x⃗) is an axiom then for all n⃗, Γ(n⃗) contains a true atom. Therefore f |= Γ for any f. To make f sufficiently increasing choose f(n⃗) = n1 + · · · + nk.
(ii) If Γ, B0 ∨ B1 arises by an application of the ∨-rule from Γ, B0 , B1
then (because of our definition of Σ1 -formula) B0 and B1 must both be
bounded formulas. Thus by our definition of “true at”, any function f
satisfying f |= Γ, B0 , B1 must also satisfy f |= Γ, B0 ∨ B1 .
(iii) Only a slightly more complicated argument applies to the dual case where Γ, B0 ∧ B1 arises by an application of the ∧-rule from the premises Γ, B0 and Γ, B1. For if f0(n⃗) |= Γ(n⃗), B0(n⃗) and f1(n⃗) |= Γ(n⃗), B1(n⃗) for all n⃗, then it is easy to see (by persistence) that f |= Γ, B0 ∧ B1 where f(n⃗) = f0(n⃗) + f1(n⃗).
(iv) If Γ, ∀y B(y) arises from Γ, B(y) by the ∀-rule (y not free in Γ) then, since all the formulas are Σ1, ∀y B(y) must be bounded and so B(y) must be of the form ¬ y<t ∨ B′(y) for some term t. Now assume f0 |= Γ, ¬ y<t, B′(y) for some increasing elementary function f0. Then for all assignments n⃗ to the free variables x⃗, and all assignments k to the variable y,
f0(n⃗, k) |= Γ(n⃗), ¬ k<t(n⃗), B′(n⃗, k).
Therefore by defining f(n⃗) = Σk<g(n⃗) f0(n⃗, k), where g is an increasing elementary function bounding t, we easily see that either f(n⃗) |= Γ(n⃗) or else, by persistence, B′(n⃗, k) is true for every k < t(n⃗). Hence f |= Γ, ∀y B(y) as required, and clearly f is elementary since f0 and g are.
(v) Now suppose Γ, ∃y A(y, x⃗) arises from Γ, A(t, x⃗) by the ∃-rule, where A is Σ1. Then by the induction hypothesis there is an elementary f0 such that for all n⃗
f0(n⃗) |= Γ(n⃗), A(t(n⃗), n⃗).
Then either f0(n⃗) |= Γ(n⃗) or else f0(n⃗) bounds true witnesses for all the existential quantifiers already in A(t(n⃗), n⃗). Therefore by choosing any elementary bounding function g for the term t, and defining f(n⃗) = f0(n⃗) + g(n⃗), we see that either f(n⃗) |= Γ(n⃗) or f(n⃗) |= ∃y A(y, n⃗), for all n⃗.
(vi) If Γ comes about by the cut rule with Σ1 cut formula C := ∃z⃗ B(z⃗) then the two premises are Γ, ∀z⃗ ¬B(z⃗) and Γ, ∃z⃗ B(z⃗). The universal quantifiers in the first premise can be inverted (without increasing proof-height) to give Γ, ¬B(z⃗), and since B is bounded the induction hypothesis can be applied to this to give an elementary f0 such that for all numerical assignments n⃗ to the (implicit) variables x⃗ and all assignments m⃗ to the new free variables z⃗,
f0(n⃗, m⃗) |= Γ(n⃗), ¬B(n⃗, m⃗).
Applying the induction hypothesis to the second premise gives an elementary f1 such that for all n⃗, either f1(n⃗) |= Γ(n⃗) or else there are fixed witnesses m⃗ < f1(n⃗) such that B(n⃗, m⃗) is true. Therefore if we define f by substitution from f0 and f1 thus:
f(n⃗) = f0(n⃗, f1(n⃗), . . . , f1(n⃗)),
then f will be elementary, greater than or equal to f1, and strictly increasing since both f0 and f1 are. Furthermore f |= Γ. For otherwise there would be a tuple n⃗ such that Γ(n⃗) is not true at f(n⃗) and hence, by persistence, not true at f1(n⃗). So B(n⃗, m⃗) is true for certain numbers m⃗ < f1(n⃗). But then f0(n⃗, m⃗) < f(n⃗) and so, again by persistence, Γ(n⃗) cannot be true at f0(n⃗, m⃗). This means B(n⃗, m⃗) is false, by the above, and so we have a contradiction.
(vii) Finally suppose Γ(x⃗), B(x⃗, t) arises by an application of the induction rule on the bounded formula B. The premises are Γ(x⃗), B(x⃗, 0) and Γ(x⃗), ¬B(x⃗, y), B(x⃗, y + 1). Applying the induction hypothesis to each of the premises one obtains increasing elementary functions f0 and f1 such that for all n⃗ and all k,
f0(n⃗) |= Γ(n⃗), B(n⃗, 0),
f1(n⃗, k) |= Γ(n⃗), ¬B(n⃗, k), B(n⃗, k + 1).
Now define f(n⃗) = f0(n⃗) + Σk<g(n⃗) f1(n⃗, k) where g is some increasing elementary bounding function for the term t. Then f is elementary and increasing, and by persistence from the above properties of f0 and f1, either f(n⃗) |= Γ(n⃗), or else B(n⃗, 0) and B(n⃗, k) → B(n⃗, k + 1) are true for all k < t(n⃗). In this latter case B(n⃗, t(n⃗)) is true by induction on k up to the value of t(n⃗). Either way, we have f |= Γ(x⃗), B(x⃗, t(x⃗)) and this completes the proof.
Theorem. A number-theoretic function is elementary if and only if it is
provably Σ1 in IΔ0 (exp).
Proof. We have already shown that every elementary function is prov-
ably bounded, and hence provably Σ1, in IΔ0(exp). Conversely suppose f is provably Σ1. Then there is a Σ1-formula
F(x⃗, y) := ∃z1 . . . ∃zk B(x⃗, y, z1, . . . , zk)
which defines f and such that
IΔ0(exp) ⊢ ∃y F(x⃗, y).
By the lemma immediately above, there is an elementary function g such that for every tuple of arguments n⃗ there are numbers m0, m1, . . . , mk less than g(n⃗) satisfying the bounded formula B(n⃗, m0, m1, . . . , mk). Using the elementary sequence-coding schema developed earlier in 2.2, let
h(n⃗) = ⟨g(n⃗), g(n⃗), . . . , g(n⃗)⟩
so that if m = ⟨m0, m1, . . . , mk⟩ where m0, m1, . . . , mk < g(n⃗), then m < h(n⃗). Then, because f(n⃗) is the unique m0 for which there are m1, . . . , mk satisfying B(n⃗, m0, m1, . . . , mk), we can define f as follows:
f(n⃗) = (μm<h(n⃗) B(n⃗, (m)0, (m)1, . . . , (m)k))0.
Since B is a bounded formula of IΔ0(exp) it is elementarily decidable, and since the least number operator is bounded by the elementary function h, the entire definition of f therefore involves only elementary operations. Hence f is an elementary function.
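The converse direction is in effect a program-extraction recipe: from the bounding function g one recovers f by bounded search and decoding. A schematic Python rendering (ours; B, g and the toy instance below are stand-ins for the objects in the proof, and tuples replace the coded search below h):

    from itertools import product

    def extract(B, g, k):
        """Given a decider B(n, m0, ..., mk) for the bounded kernel and an
        elementary bound g, return f by bounded search, mirroring
        f(n) = (mu_{m < h(n)} B(n, (m)_0, ..., (m)_k))_0."""
        def f(*n):
            bound = g(*n)
            for ms in product(range(bound), repeat=k + 1):
                if B(*n, *ms):
                    return ms[0]          # the unique witness m0 = f(n)
            raise ValueError("bound too small: not a correct Sigma_1 proof")
        return f

    # Toy instance: f(n) = n // 2 from B(n, m0, m1) := n = 2*m0 + m1, m1 < 2.
    f = extract(lambda n, m0, m1: n == 2 * m0 + m1 and m1 < 2,
                lambda n: n + 1, 1)
    print([f(n) for n in range(6)])       # [0, 0, 1, 1, 2, 2]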
3.2. Gödel numbers
We will assign numbers—so-called Gödel numbers, GN for short—
to the syntactical constructs developed in chapter 1: terms, formulas
and derivations. Using the elementary sequence-coding and decoding
machinery developed earlier we will be able to construct the code number
of a composed object from its parts, and conversely to disassemble the
code number of a composed object into the code numbers of its parts.
3.2.1. Gödel numbers of terms, formulas and derivations. Let L be a countable first-order language. Assume that we have injectively assigned to every n-ary relation symbol R a symbol number sn(R) of the form ⟨1, n, i⟩ and to every n-ary function symbol f a symbol number sn(f) of the form ⟨2, n, j⟩. Call L elementarily presented if the set SymbL of all these symbol numbers is elementary. In what follows we shall always assume that the languages L considered are elementarily presented. In particular this applies to every language with finitely many relation and function symbols.
Let sn(Var) := 0. For every L-term r we define recursively its Gödel number ⌜r⌝ by
⌜xi⌝ := ⟨sn(Var), i⟩,
⌜f r1 . . . rn⌝ := ⟨sn(f), ⌜r1⌝, . . . , ⌜rn⌝⟩.
Assign numbers to the logical symbols by sn(→) := ⟨3, 0⟩ and sn(∀) := ⟨3, 1⟩. For simplicity we leave out the logical connectives ∧, ∨ and ∃ here; they could be treated similarly. We define for every L-formula A its Gödel number ⌜A⌝ by
⌜R r1 . . . rn⌝ := ⟨sn(R), ⌜r1⌝, . . . , ⌜rn⌝⟩,
⌜A → B⌝ := ⟨sn(→), ⌜A⌝, ⌜B⌝⟩,
⌜∀xi A⌝ := ⟨sn(∀), i, ⌜A⌝⟩.
We define symbol numbers for the names of the natural deduction rules: sn(AssVar) := ⟨4, 0⟩, sn(→+) := ⟨4, 1⟩, sn(→−) := ⟨4, 2⟩, sn(∀+) := ⟨4, 3⟩, sn(∀−) := ⟨4, 4⟩. For a derivation M we define its Gödel number ⌜M⌝ by
⌜ui^A⌝ := ⟨sn(AssVar), i, ⌜A⌝⟩,
⌜λui^A M⌝ := ⟨sn(→+), i, ⌜A⌝, ⌜M⌝⟩,
⌜M N⌝ := ⟨sn(→−), ⌜M⌝, ⌜N⌝⟩,
⌜λxi M⌝ := ⟨sn(∀+), i, ⌜M⌝⟩,
⌜M r⌝ := ⟨sn(∀−), ⌜M⌝, ⌜r⌝⟩.
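These clauses mechanize directly. In the Python sketch below (ours, not part of the formal development) tuples stand in for the numeric sequence coding ⟨. . .⟩ of 2.2.5, so the results are Gödel trees rather than numbers; composing with any elementary tupling function would yield genuine Gödel numbers.

    SN_VAR = 0
    def sn_rel(n, i): return (1, n, i)   # symbol number of an n-ary relation
    def sn_fun(n, j): return (2, n, j)   # symbol number of an n-ary function
    SN_IMP, SN_ALL = (3, 0), (3, 1)

    def gn_term(t):
        """t is ('var', i) or ('fun', symbolnumber, [subterms])."""
        if t[0] == 'var':
            return (SN_VAR, t[1])
        return (t[1],) + tuple(gn_term(r) for r in t[2])

    def gn_formula(a):
        if a[0] == 'rel':                            # R r1 ... rn
            return (a[1],) + tuple(gn_term(r) for r in a[2])
        if a[0] == 'imp':                            # A -> B
            return (SN_IMP, gn_formula(a[1]), gn_formula(a[2]))
        if a[0] == 'all':                            # forall x_i A
            return (SN_ALL, a[1], gn_formula(a[2]))

    # Example: the code of  forall x0 (x0 = x0),  with '=' given index 0.
    eq = sn_rel(2, 0)
    x0 = ('var', 0)
    print(gn_formula(('all', 0, ('rel', eq, [x0, x0]))))
    # ((3, 1), 0, ((1, 2, 0), (0, 0), (0, 0)))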
It will be helpful in the sequel to have some general estimates on Gödel numbers, which we provide here. For a term r or formula A we define its sum of maximal sequence lengths ||r|| or ||A|| by
||xi|| := 2,
||f r0 . . . rk−1|| := k + 1 + max(||ri||),
||R r0 . . . rk−1|| := k + 1 + max(||ri||),
||A → B|| := 3 + max(||A||, ||B||),
||∀xi A|| := 3 + ||A||,
and its symbol bound sb(r) or sb(A) by
sb(xi) := 1 + max(sn(Var), i),
sb(f) := 1 + sn(f),
sb(f r0 . . . rk) := max(sn(f), max(sb(ri))),
sb(R) := 1 + sn(R),
sb(R r0 . . . rk) := max(sn(R), max(sb(ri))),
sb(A → B) := max(sn(→), sb(A), sb(B)),
sb(∀xi A) := max(sn(∀), i, sb(A)).
Lemma. ||r|| ≤ ⌜r⌝ < sb(r)^(2^||r||) and ||A|| ≤ ⌜A⌝ < sb(A)^(2^||A||).
Proof. We prove ||r|| ≤ ⌜r⌝ by induction on r. Case xi.
||xi|| = 2 ≤ ⟨sn(Var), i⟩ = ⌜xi⌝.
Case f r0 . . . rk−1. First note that k + Σi<k ni ≤ ⟨n0, . . . , nk−1⟩ can be proved easily, by induction on k. Hence
⌜f r0 . . . rk−1⌝ = ⟨sn(f), ⌜r0⌝, . . . , ⌜rk−1⌝⟩
≥ k + 1 + max(⌜ri⌝)
≥ k + 1 + max(||ri||)   by induction hypothesis
= ||f r0 . . . rk−1||.
The proof of ||A|| ≤ ⌜A⌝ is similar. For ⌜r⌝ < sb(r)^(2^||r||) we again use induction on r. For a variable xi we obtain by the estimate in 2.2.5
⌜xi⌝ = ⟨sn(Var), i⟩ < sb(xi)^(2^2) = sb(xi)^(2^||xi||),
and for a constant f
⌜f⌝ = sn(f) = sb(f) −· 1 < sb(f)^2 = sb(f)^(2^||f||).
For a term r := f r0 . . . rk−1 built with a function symbol f of arity k > 0 we have
⌜f r0 . . . rk−1⌝ = ⟨sn(f), ⌜r0⌝, . . . , ⌜rk−1⌝⟩
≤ ⟨n −· 1, n −· 1, . . . , n −· 1⟩ (k + 1 entries)   with n := sb(r)^(2^max ||ri||), by ind. hyp.
< n^(2^(k+1))   by the estimate in 2.2.5
= sb(r)^(2^(k+1+max ||ri||)) = sb(r)^(2^||r||).
The proof of ⌜A⌝ < sb(A)^(2^||A||) is again similar, but we spell out the quantifier case A := ∀xi B:
⌜∀xi B⌝ = ⟨sn(∀), i, ⌜B⌝⟩
≤ ⟨n −· 1, n −· 1, n −· 1⟩   with n := sb(A)^(2^||B||), by ind. hyp.
< n^(2^3)   by the estimate in 2.2.5
= sb(A)^(2^(3+||B||)) = sb(A)^(2^||A||).
3.2.2. Elementary functions on Gödel numbers. We shall define an ele-
mentary predicate Deriv such that Deriv(d ) if and only if d is the Gödel
number of a derivation. To this end we need a number of auxiliary func-
tions and relations, which will all be elementary and have the properties
described. (The convention is that relations are capitalized and functions
are lower case). First we need some basic notions:
Ter(t) t is GN of a term,
For(a) a is GN of a formula,
α-Eq(x, y) the terms/formulas with GNs x, y are α-equal,
FV(i, y) the variable xi is free in the term or formula with GN y,
fmla(d ) GN of the formula derived by the derivation with GN d .
By the context of a derivation M we mean the set {u_{i0}^{A0}, . . . , u_{i(n−1)}^{A(n−1)}} of its free assumption variables, where i0 < · · · < i(n−1). Its Gödel number is defined to be the least number c such that ∀ℓ<n ((c)_{iℓ} = ⌜Aℓ⌝).
ctx(d)   GN of the context of the derivation with GN d,
Cons(c1, c2)   the contexts with GN c1, c2 are consistent.
Then Deriv can be defined by course-of-values recursion, using the next-to-last lemma in 2.2.5:
Deriv(d) := ((d)0 = sn(AssVar) ∧ lh(d) = 3 ∧ For((d)2)) ∨
  ((d)0 = sn(→+) ∧ lh(d) = 4 ∧ For((d)2) ∧ Deriv((d)3) ∧
    ((ctx((d)3))(d)1 ≠ 0 → (ctx((d)3))(d)1 = (d)2)) ∨
  ((d)0 = sn(→−) ∧ lh(d) = 3 ∧ Deriv((d)1) ∧ Deriv((d)2) ∧
    Cons(ctx((d)1), ctx((d)2)) ∧
    (fmla((d)1))0 = sn(→) ∧ (fmla((d)1))1 = fmla((d)2)) ∨
  ((d)0 = sn(∀+) ∧ lh(d) = 3 ∧ Deriv((d)2) ∧
    ∀i<lh(ctx((d)2)) ((ctx((d)2))i ≠ 0 → ¬FV((d)1, (ctx((d)2))i))) ∨
  ((d)0 = sn(∀−) ∧ lh(d) = 3 ∧ Deriv((d)1) ∧ Ter((d)2) ∧
    (fmla((d)1))0 = sn(∀)).
Still further auxiliary functions are needed. A substitution is a map xi0 → r0, . . . , xi(n−1) → r(n−1) with i0 < · · · < i(n−1) from variables to terms; its Gödel number is the least number s such that ∀ℓ<n ((s)_{iℓ} = ⌜rℓ⌝). Hence (s)i = 0 indicates that s leaves xi unchanged.
union(c1 , c2 ) GN of the union of the consistent contexts with GN c1 , c2 ,
remove(c, i) GN of result of removing ui from the context with GN c,
sub(x, s) GN of the result of applying the substitution with GN s
to the term or formula with GN x,
update(s, i, t) GN of the result of updating the substitution with GN s
by changing its entry at i to the term with GN t.
We now give definitions of all these; from the form of the definitions it
will be clear that they have the required properties, and are elementary.
Update. This can be defined explicitly, using the bounded least number
operator:
update(s, i, t) := x<h(max(s,t),max(lh(s),i)) ((x)i = t ∧
∀k<max(lh(s),i) (k = i → (x)k = (s)k ))
k
where h(n, k) := (n + 1)2 is the elementary function defined earlier with
the property n, . . . , n ≤ h(n, k).
Substitution. The substitution function defined next takes a formula or term with GN x and applies to it a substitution with GN s to produce a new formula with GN y. The substitution works by assigning specific terms to the free variables, but in order to avoid clashes it must also reassign new variables to the universally bound ones. This occurs in the final clause of the definition where, to be on the safe side, we (recursively) assign to a bound variable the new variable with index x + i(s), where i(s) is the maximum index of any variable occurring in a value term (s)j of s. Notice that i(s) ≤ s. We define substitution by a limited course-of-values recursion with parameter substitutions:
sub(x, s) :=
  x   if (x)0 = sn(Var) ∧ (s)(x)1 = 0,
  (s)(x)1   if (x)0 = sn(Var) ∧ (s)(x)1 ≠ 0,
  μy≤k(x,s) (lh(x) = lh(y) ∧ (x)0 = (y)0 ∧ ∀i<l (sub((x)i+1, s) = (y)i+1))
      if (x)0,0 = 1 ∨ (x)0,0 = 2 ∨ (x)0 = sn(→),
  ⟨sn(∀), x + i(s), sub((x)2, update(s, (x)1, ⟨sn(Var), x + i(s)⟩))⟩   if (x)0 = sn(∀),
  0   otherwise,
sub(x, s) ≤ k(x, s),
where it is assumed that the relation and function symbols in the given language L all have arity ≤ l. The bound k(x, s) and a bound for the iterated parameter updates remain to be provided, so that the last lemma in 2.2.5 can be applied. Then sub will be elementary.
First notice that, as s is continually updated by the recursion, for the sake of (the formula or term with GN) x the first update assigns to a bound variable in x a “new” variable with index x + i(s). The next update will then assign to a bound variable in some subformula x′ of x a new variable with index x′ + x + i(s), etcetera. The final update will therefore be a sequence of length ≤ x^2 + i(s), whose entries are all < max(s, ⟨sn(Var), x^2 + i(s)⟩). Thus a bound for all iterated updates starting from s and x is this last expression to the power of 2^(x^2+i(s)), which is elementary.
Using the lemma in 3.2.1 above we can see that if x is the GN of a term or a formula X and s is the GN of a substitution S, so that we may write sub(x, s) = ⌜X[S]⌝, then sb(X[S]) ≤ max(s, x, x^2 + i(s)) ≤ x^2 + s and, clearly, ||X[S]|| ≤ x + s. The lemma then gives an elementary bound k(x, s) := (x^2 + s)^(2^(x+s)) for sub(x, s).
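Minus the arithmetical bounding, the same recursion fits in a few lines of Python. This sketch is ours; tuples again replace the numeric coding, and a global counter of fresh indices replaces the index x + i(s) of the text.

    import itertools

    SN_VAR, SN_IMP, SN_ALL = 0, (3, 0), (3, 1)   # as in the sketch in 3.2.1

    def substitute(x, s, fresh=itertools.count(1000)):
        """Apply a substitution s (dict: variable index -> coded term) to
        the tuple-coded term/formula x, renaming bound variables so that
        no capture can occur."""
        if x[0] == SN_VAR:                       # the two variable clauses
            return s.get(x[1], x)
        if x[0] == SN_IMP:
            return (SN_IMP, substitute(x[1], s, fresh),
                            substitute(x[2], s, fresh))
        if x[0] == SN_ALL:                       # rename the bound variable
            j = next(fresh)
            return (SN_ALL, j,
                    substitute(x[2], {**s, x[1]: (SN_VAR, j)}, fresh))
        # relation or function application: substitute in all arguments
        return (x[0],) + tuple(substitute(r, s, fresh) for r in x[1:])

    # (forall x1 (x0 = x1))[x0 := x1]: the bound x1 is renamed, no capture.
    EQ = (1, 2, 0)
    A = (SN_ALL, 1, (EQ, (SN_VAR, 0), (SN_VAR, 1)))
    print(substitute(A, {0: (SN_VAR, 1)}))
    # ((3, 1), 1000, ((1, 2, 0), (0, 1), (0, 1000)))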
Remove, union, consistency, context. Removal of an assumption variable from a context is defined by
remove(c, i) := μx≤c ((x)i = 0 ∧ ∀j<lh(c) (j ≠ i → (x)j = (c)j)).
The union of two consistent contexts can again be defined by the bounded μ-operator:
union(c1, c2) := μc≤c1∗c2 ∀i<max(lh(c1),lh(c2)) ((c)i = max((c1)i, (c2)i)).
Consistency of two contexts is defined by
Cons(c1, c2) := ∀i<max(lh(c1),lh(c2)) ((c1)i ≠ 0 → (c2)i ≠ 0 → α-Eq((c1)i, (c2)i)).
The context of a derivation is defined by
ctx(d) := μc≤d (((d)0 = sn(AssVar) ∧ (c)(d)1 = (d)2) ∨
  ((d)0 = sn(→+) ∧ c = remove(ctx((d)3), (d)1)) ∨
  ((d)0 = sn(→−) ∧ c = union(ctx((d)1), ctx((d)2))) ∨
  ((d)0 = sn(∀+) ∧ c = ctx((d)2)) ∨
  ((d)0 = sn(∀−) ∧ c = ctx((d)1))).
Formulas, terms. The end formula of a derivation is defined by
fmla(d) := μa≤f(d) (((d)0 = sn(AssVar) ∧ a = (d)2) ∨
  ((d)0 = sn(→+) ∧ a = ⟨sn(→), (d)2, fmla((d)3)⟩) ∨
  ((d)0 = sn(→−) ∧ a = (fmla((d)1))2) ∨
  ((d)0 = sn(∀+) ∧ a = ⟨sn(∀), (d)1, fmla((d)2)⟩) ∨
  ((d)0 = sn(∀−) ∧ sub((fmla((d)1))2, μs≤d ((s)(fmla((d)1))1 = (d)2)) = a)),
where the elementary bound f(d) remains to be provided. Clearly it suffices to have an elementary estimate of ⌜A(r)⌝ in terms of a = ⌜∀x A(x)⌝ and b = ⌜r⌝. For the GN s of the substitution assigning r to x we have s ≤ a^(2^b). Hence ⌜A(r)⌝ = sub(a, s) ≤ k(a, a^(2^b)) ≤ k(d, d^(2^d)) =: f(d). Notice that this is the only place in our definitions of auxiliary functions and relations where the substitution function is needed.
Freeness of a variable xi in a term or formula is defined by
FV(i, y) := ((y)0 = sn(Var) ∧ (y)1 = i) ∨
  ((y)0,0 = 1 ∧ ∃j<lh(y)−·1 FV(i, (y)j+1)) ∨
  ((y)0,0 = 2 ∧ ∃j<lh(y)−·1 FV(i, (y)j+1)) ∨
  ((y)0 = sn(→) ∧ (FV(i, (y)1) ∨ FV(i, (y)2))) ∨
  ((y)0 = sn(∀) ∧ i ≠ (y)1 ∧ FV(i, (y)2)).
To define α-equality (i.e., equality up to renaming of bound variables) of formulas we use a relation Corr(n, m, s, t) due to Robert Stärk. The intuitive meaning is this: two numbers n, m (indices of variables) are “correlated” w.r.t. coded lists s, t (of mutually inverted pairs of indices) if one of the following holds.
(i) There is a first element ⟨n, v⟩ of the form ⟨n, . . .⟩ in s and a first element ⟨m, u⟩ of the form ⟨m, . . .⟩ in t, and v = m, u = n.
(ii) There is no element of the form ⟨n, . . .⟩ in s and no element of the form ⟨m, . . .⟩ in t, and n = m.
We define Corr by
Corr(n, m, s, t) := ∃i<lh(s) ∃j<lh(t) ((s)i = ⟨n, (s)i,1⟩ ∧ ∀i′<i (s)i′,0 ≠ n ∧
    (t)j = ⟨m, (t)j,1⟩ ∧ ∀j′<j (t)j′,0 ≠ m ∧
    (s)i,1 = m ∧ (t)j,1 = n) ∨
  (n = m ∧ ∀i<lh(s) (s)i,0 ≠ n ∧ ∀j<lh(t) (t)j,0 ≠ m).
Now define α-Eq′ by
α-Eq′(a, b, s, t) :=
  ((a)0 = (b)0 = sn(Var) ∧ Corr((a)1, (b)1, s, t)) ∨
  ((a)0 = (b)0 ∧ SymbL((a)0) ∧ ∀i<(a)0,1 α-Eq′((a)i+1, (b)i+1, s, t)) ∨
  ((a)0 = (b)0 = sn(→) ∧ α-Eq′((a)1, (b)1, s, t) ∧ α-Eq′((a)2, (b)2, s, t)) ∨
  ((a)0 = (b)0 = sn(∀) ∧ α-Eq′((a)2, (b)2, ⟨(a)1, (b)1⟩ ∗ s, ⟨(b)1, (a)1⟩ ∗ t)).
α-Eq′ is an elementary relation because it is here defined by course-of-values recursion with parameter substitution, where iterates of the (quadratic) parameter updates are elementarily bounded. Finally α-Eq(x, y) := α-Eq′(x, y, ⟨⟩, ⟨⟩).
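Stärk's correlation trick is easy to express directly. A Python transcription on tuple-coded formulas (ours; since the two lists grow in lockstep, a single parallel walk suffices for the first-occurrence test):

    SN_VAR, SN_ALL = 0, (3, 1)                   # as in the sketch in 3.2.1

    def corr(n, m, s, t):
        """The first pair mentioning n on the left or m on the right
        decides; if neither is ever bound, n and m must be equal."""
        for (a, b), (c, d) in zip(s, t):
            if a == n or c == m:
                return a == n and c == m and b == m and d == n
        return n == m

    def alpha_eq(a, b, s=(), t=()):
        if a[0] == SN_VAR == b[0]:
            return corr(a[1], b[1], s, t)
        if a[0] == SN_ALL == b[0]:               # push the inverted pairs
            return alpha_eq(a[2], b[2],
                            ((a[1], b[1]),) + s, ((b[1], a[1]),) + t)
        return (a[0] == b[0] and len(a) == len(b) and
                all(alpha_eq(u, v, s, t) for u, v in zip(a[1:], b[1:])))

    # forall x0 (x0 = x1)  vs  forall x2 (x2 = x1): alpha-equal.
    EQ = (1, 2, 0)
    A = (SN_ALL, 0, (EQ, (SN_VAR, 0), (SN_VAR, 1)))
    B = (SN_ALL, 2, (EQ, (SN_VAR, 2), (SN_VAR, 1)))
    print(alpha_eq(A, B))                        # True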
The sets of formulas and terms are defined by
For(a) :=
((a)0,0 = 1 ∧ SymbL ((a)0 ) ∧ lh(a) = (a)0,1 + 1 ∧ ∀j<(a)0,1 Ter((a)j+1 )) ∨
((a)0 = sn(→) ∧ lh(a) = 3 ∧ For((a)1 ) ∧ For((a)2 )) ∨
((a)0 = sn(∀) ∧ lh(a) = 3 ∧ For((a)2 )),
Ter(t) := ((t)0 = sn(Var) ∧ lh(t) = 2) ∨ ((t)0,0 = 2 ∧
SymbL ((t)0 ) ∧ lh(t) = (t)0,1 +1 ∧ ∀j<(t)0,1 Ter((t)j+1 )).
Recall that for simplicity we have left out the logical connectives ∧, ∨ and
∃. They could be added easily, including an extension of the notion of a
derivation to also allow their axioms as listed in 1.1.7.
3.2.3. Axiomatized theories. Let L be an elementarily presented language with = in L. Call a relation recursive if its (total) characteristic function is recursive. A set S of formulas is called recursive (elementary, primitive recursive, recursively enumerable) if ⌜S⌝ := {⌜A⌝ | A ∈ S} is recursive (elementary, primitive recursive, recursively enumerable). Clearly the sets EfqL of ex-falso-quodlibet axioms and EqL of L-equality axioms are elementary. A theory T with L(T) ⊆ L is recursively (elementarily, primitive recursively) axiomatizable if there is a recursive (elementary, primitive recursive) set S of closed L-formulas such that T = {A ∈ L | S ∪ EqL ⊢ A}.
Theorem. For theories T with L(T ) ⊆ L the following are equivalent.
(a) T is recursively axiomatizable.
(b) T is primitive recursively axiomatizable.
(c) T is elementarily axiomatizable.
(d) T is recursively enumerable.
Proof. (d) → (c). Let T be recursively enumerable. Then there is an elementary f such that ⌜T⌝ = ran(f). Let f(n) = ⌜An⌝. We define an elementary function g with the property g(n) = ⌜A0 ∧ · · · ∧ An⌝ by
g(0) := f(0),
g(n + 1) := g(n) ∧˙ f(n + 1),
g(n) < mn^(2^(3n))   where mn := 1 + max(sn(∧), maxi≤n f(i)),
with a ∧˙ b := ⟨sn(∧), a, b⟩. The estimate is proved by induction on n. The base case is clear, and in the step we have
g(n + 1) = ⟨sn(∧), g(n), f(n + 1)⟩
≤ ⟨m(n+1) −· 1, mn^(2^(3n)) −· 1, m(n+1) −· 1⟩   by induction hypothesis
< (m(n+1)^(2^(3n)))^(2^3)   by the estimate in 2.2.5
= m(n+1)^(2^(3(n+1))).
For S := {A0 ∧ · · · ∧ An | n ∈ N} we have ⌜S⌝ = ran(g), and this set is elementary because of a ∈ ran(g) ↔ ∃n<a (a = g(n)). T is elementarily axiomatizable, since T = {A ∈ L | S ∪ EqL ⊢ A}.
(c) → (b) and (b) → (a) are clear.
(a) → (d). Let T be axiomatized by S with ⌜S⌝ recursive. Then
a ∈ ⌜T⌝ ↔ ∃d (Deriv(d) ∧ fmla(d) = a ∧ ∀i<a ¬FV(i, a) ∧ ∀i<lh(ctx(d)) ((ctx(d))i ∈ ⌜EqL⌝ ∪ ⌜S⌝)).
Hence ⌜T⌝ is recursively enumerable.
Call a theory T in our elementarily presented language L axiomatized
if it is given by a recursively enumerable axiom system AxT . By the
theorem just proved we can even assume that AxT is elementary. For such
axiomatized theories we define a binary relation PrfT by
PrfT(d, a) := Deriv(d) ∧ fmla(d) = a ∧ ∀i<lh(ctx(d)) ((ctx(d))i ∈ ⌜EqL⌝ ∪ ⌜AxT⌝).
Clearly PrfT is elementary, and PrfT(d, a) holds if and only if d is the GN of a derivation of the formula with GN a from a context composed of equality axioms and formulas from AxT. A theory T is consistent if ⊥ ∉ T; otherwise T is inconsistent. A theory T is complete if for every closed formula A we have A ∈ T or ¬A ∈ T, and incomplete otherwise.
Corollary. Let T be a consistent theory. If T is axiomatized and
complete then T is recursive.
Proof. We define the characteristic function cT of ⌜T⌝ as follows. cT(a) is 0 if ¬For(a) or ∃i<a FV(i, a). Otherwise it is defined by
cT(a) := (μx ((PrfT((x)0, a) ∧ (x)1 = 1) ∨ (PrfT((x)0, ¬̇a) ∧ (x)1 = 0)))1
with ¬̇a := ⟨sn(→), a, sn(⊥)⟩. Completeness of T implies that cT is total, and consistency that it is indeed the characteristic function of ⌜T⌝.
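The unbounded search in this proof is a genuine, if hopelessly impractical, algorithm. Schematically in Python (ours; prf_T and neg are placeholders for the elementary PrfT and the map a ↦ ⌜¬A⌝ defined above):

    def decide(prf_T, neg, a):
        """Decide membership of the sentence coded by a in a complete,
        axiomatized, consistent theory: enumerate all candidate proof
        codes d, stopping at the first proof of A or of its negation."""
        d = 0
        while True:                 # total, by completeness of T;
            if prf_T(d, a):         # consistency rules out finding
                return True         # proofs of both A and not A
            if prf_T(d, neg(a)):
                return False
            d += 1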
3.2.4. Undefinability of the notion of truth. Let M be an L-structure. A relation R ⊆ |M|^n is called definable in M if there is an L-formula A(x1, . . . , xn) such that
R = {(a1, . . . , an) ∈ |M|^n | M |= A(x1, . . . , xn)[x1 := a1, . . . , xn := an]}.
We assume in this section that |M| = N, 0 is a constant in L and S is a unary function symbol in L with 0^M = 0 and S^M(a) = a + 1. Recall that for every a ∈ N the numeral a ∈ TerL is defined recursively: the numeral of 0 is the constant 0, and the numeral of n + 1 is S applied to the numeral of n. Observe that in this case the definability of R ⊆ N^n by A(x1, . . . , xn) is equivalent to
R = {(a1, . . . , an) ∈ N^n | M |= A(a1, . . . , an)}.
Furthermore let L be an elementarily presented language. We assume in this section that every elementary relation is definable in M. A set S of formulas is called definable in M if ⌜S⌝ := {⌜A⌝ | A ∈ S} is.
We shall show that already from these assumptions it follows that the
notion of truth for M, more precisely the set Th(M) of all closed formulas
valid in M, is undefinable in M. From this it will follow that the notion
of truth is in fact undecidable, for otherwise the set Th(M) would be
recursive, hence recursively enumerable, and hence definable, because we
have assumed already that all elementary relations are definable in M and
so their projections are definable also. For the proof we shall need the
following fixed point lemma, which will be generalized in 3.3.2.
Lemma (Semantical fixed point lemma). If every elementary relation is definable in M, then for every L-formula B(z) we can find a closed L-formula A such that
M |= A if and only if M |= B(⌜A⌝).
Proof. Let s be the elementary function satisfying, for every formula C = C(z) with z := x0,
s(⌜C⌝, k) = ⌜C(k)⌝,
defined by means of the substitution function sub from 3.2.2. Hence in particular
s(⌜C⌝, ⌜C⌝) = ⌜C(⌜C⌝)⌝.
By assumption the graph Gs of s is definable in M, by As(x1, x2, x3) say. Let
C := ∃x (B(x) ∧ As(z, z, x)),   A := C(⌜C⌝),
and therefore
A = ∃x (B(x) ∧ As(⌜C⌝, ⌜C⌝, x)).
Hence M |= A if and only if ∃a∈N ((M |= B(a)) ∧ a = ⌜C(⌜C⌝)⌝), which is the same as M |= B(⌜A⌝).
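The diagonal substitution s(⌜C⌝, ⌜C⌝) = ⌜C(⌜C⌝)⌝ is the same device that produces self-reproducing programs. A deliberately crude Python caricature (ours; strings stand in for Gödel numbers, and the replace-based s ignores all syntactic niceties):

    def s(C, k):
        """The substitution function of the lemma, with strings playing
        the role of codes: from C(z) and k, produce C(k)."""
        return C.replace("z", repr(k))

    def fixed_point(B):
        """Return a 'sentence' A equal to B applied to the code of A,
        mimicking C(z) := exists x (B(x) and As(z, z, x)), A := C(#C)."""
        C = B("s(z, z)")       # C(z) says: B holds of the diagonal of z
        return s(C, C)         # A := C(#C)

    A = fixed_point(lambda arg: f"B({arg})")
    print(A)   # B(s('B(s(z, z))', 'B(s(z, z))'))
    # Evaluating the inner s(...) reproduces A itself:
    assert s("B(s(z, z))", "B(s(z, z))") == A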
Theorem (Tarski's undefinability theorem). Assume that every elementary relation is definable in M. Then Th(M) is undefinable in M, hence in particular not recursively enumerable.
Proof. Assume that Th(M) is definable by BW(z). Then for all closed formulas A,
M |= A if and only if M |= BW(⌜A⌝).
Now consider the formula ¬BW(z) and choose by the fixed point lemma a closed L-formula A such that
M |= A if and only if M |= ¬BW(⌜A⌝).
This contradicts the equivalence above.
We have already noticed that all recursively enumerable relations are definable in M. Hence it follows that Th(M) cannot be recursively enumerable.
3.3. The notion of truth in formal theories
We now want to generalize the arguments of the previous section. There
we have made essential use of the notion of truth in a structure M, i.e., of
the relation M |= A. The set of all closed formulas A such that M |= A
has been called the theory of M, denoted Th(M).
Now instead of Th(M) we shall start more generally from an arbitrary
theory T . We consider the question as to whether in T there is a notion of
truth (in the form of a truth formula B(z)), such that B(z) “means” that
z is “true”. A consequence is that we have to explain all the notions used
without referring to semantical concepts at all.
(i) z ranges over closed formulas (or sentences) A, or more precisely over their Gödel numbers ⌜A⌝.
(ii) “A is true” is to be replaced by T ⊢ A.
(iii) “C is equivalent to D” is to be replaced by T ⊢ C ↔ D.
Hence the question now is whether there is a truth formula B(z) such that T ⊢ A ↔ B(⌜A⌝) for all sentences A. The result will be that this is impossible, under rather weak assumptions on the theory T.
the issue will be to replace the notion of definability by the notion of
“representability” within a formal theory. We begin with a discussion of
this notion.
In this section we assume that L is an elementarily presented language
with 0, S and = in L, and T an L-theory containing the equality axioms
EqL .
3.3.1. Representable relations and functions.
Definition. A relation R ⊆ N^n is representable in T if there is a formula A(x1, . . . , xn) such that
T ⊢ A(a1, . . . , an)   if (a1, . . . , an) ∈ R,
T ⊢ ¬A(a1, . . . , an)   if (a1, . . . , an) ∉ R.
A function f : N^n → N is called representable in T if there is a formula A(x1, . . . , xn, y) representing the graph Gf ⊆ N^(n+1) of f, i.e., such that
T ⊢ A(a1, . . . , an, f(a1, . . . , an)),   (1)
T ⊢ ¬A(a1, . . . , an, c)   if c ≠ f(a1, . . . , an),   (2)
and such that in addition
T ⊢ A(a1, . . . , an, y) ∧ A(a1, . . . , an, z) → y = z   for all a1, . . . , an ∈ N.   (3)
Note that in case T ⊢ b ≠ c for b < c, condition (2) follows from (1) and (3).
Lemma. If the characteristic function cR of a relation R ⊆ N^n is representable in T, then so is the relation R itself.
Proof. For simplicity assume n = 1. Let A(x, y) be a formula representing cR. We show that A(x, 1) represents the relation R. Assume a ∈ R. Then cR(a) = 1, hence (a, 1) ∈ GcR, hence T ⊢ A(a, 1). Conversely, assume a ∉ R. Then cR(a) = 0, hence (a, 1) ∉ GcR, hence T ⊢ ¬A(a, 1).
3.3.2. Undefinability of the notion of truth in formal theories.
Lemma (Fixed point lemma). Assume that all elementary functions are representable in T. Then for every formula B(z) we can find a closed formula A such that
T ⊢ A ↔ B(⌜A⌝).
Proof. The proof is very similar to the proof of the semantical fixed point lemma. Let s be the elementary function introduced there and As(x1, x2, x3) a formula representing s in T. Let
C := ∃x (B(x) ∧ As(z, z, x)),   A := C(⌜C⌝),
and therefore
A = ∃x (B(x) ∧ As(⌜C⌝, ⌜C⌝, x)).
Because of s(⌜C⌝, ⌜C⌝) = ⌜C(⌜C⌝)⌝ = ⌜A⌝ we can prove in T
As(⌜C⌝, ⌜C⌝, x) ↔ x = ⌜A⌝,
hence by definition of A also
A ↔ ∃x (B(x) ∧ x = ⌜A⌝)
and therefore
A ↔ B(⌜A⌝).
Note that for T = Th(M) we obtain the semantical fixed point lemma above as a special case.
Theorem. Let T be a consistent theory such that all elementary functions are representable in T. Then there cannot exist a formula B(z) defining the notion of truth, i.e., such that for all closed formulas A
T ⊢ A ↔ B(⌜A⌝).
Proof. Assume we had such a B(z). Consider the formula ¬B(z) and choose by the fixed point lemma a closed formula A such that
T ⊢ A ↔ ¬B(⌜A⌝).
For this A we obtain T ⊢ A ↔ ¬A, contradicting the consistency of T.
With T := Th(M) Tarski’s undefinability theorem is a special case.
3.4. Undecidability and incompleteness
Consider a consistent formal theory T with the property that all recursive functions are representable in T. This is a very weak assumption, as we shall show in the next section: it is always satisfied if the theory allows the development of a certain minimum of arithmetic. We shall show that such a theory necessarily is undecidable. First we shall prove a (weak) first incompleteness theorem saying that every axiomatized such theory must be incomplete, and then we prove a sharpened form of this theorem, due to Gödel and then Rosser, which explicitly provides a closed formula A such that neither A nor ¬A is provable in the theory T.
In this section let L again be an elementarily presented language with
0, S, = in L and T a theory containing the equality axioms EqL .
3.4.1. Undecidability.
Theorem (Undecidability). Assume that T is a consistent theory such
that all recursive functions are representable in T . Then T is not recursive.
Proof. Assume that T is recursive. By assumption there exists a formula B(z) representing ⌜T⌝ in T. Choose by the fixed point lemma a closed formula A such that
T ⊢ A ↔ ¬B(⌜A⌝).
We shall prove (∗) T ⊬ A and (∗∗) T ⊢ A; this is the desired contradiction.
Ad (∗). Assume T ⊢ A. Then A ∈ T, hence ⌜A⌝ ∈ ⌜T⌝, hence T ⊢ B(⌜A⌝) (because B(z) represents in T the set ⌜T⌝). By the choice of A it follows that T ⊢ ¬A, which contradicts the consistency of T.
Ad (∗∗). By (∗) we know T ⊬ A. Therefore A ∉ T, hence ⌜A⌝ ∉ ⌜T⌝, and therefore T ⊢ ¬B(⌜A⌝). By the choice of A it follows that T ⊢ A.
3.4.2. Incompleteness.
Theorem (First incompleteness theorem). Assume that T is an axiom-
atized consistent theory with the property that all recursive functions are
representable in T . Then T is incomplete.
Proof. This is an immediate consequence of the fact that every axiom-
atized consistent theory which is complete is also recursive (a corollary in
3.2.3), and the undecidability theorem above.
As already mentioned, we now sharpen the incompleteness theorem in the sense that we actually produce a formula A such that neither A nor ¬A is provable. Gödel's first incompleteness theorem provided such an A under the assumption that the theory satisfies a stronger condition than mere consistency, namely “ω-consistency”. Rosser then improved Gödel's result by showing, with a somewhat more complicated formula, that consistency is all that is required.
Theorem (Gödel–Rosser). Let T be axiomatized and consistent. Assume that there is a formula L(x, y)—written x < y—such that
T ⊢ ∀x<n (x = 0 ∨ · · · ∨ x = n − 1),   (4)
T ⊢ ∀x (x = 0 ∨ · · · ∨ x = n ∨ n < x).   (5)
Assume also that every elementary function is representable in T. Then we can find a closed formula A such that neither A nor ¬A is provable in T.
Proof. We first define RefutT ⊆ N × N by
RefutT(d, a) := PrfT(d, ¬̇a).
Then RefutT is elementary, and RefutT(d, a) holds if and only if d is the GN of a derivation of the negation of the formula with GN a from a context composed of equality axioms and formulas from AxT. Let BPrfT(x1, x2) and BRefutT(x1, x2) be formulas representing PrfT and RefutT, respectively. Choose by the fixed point lemma a closed formula A such that
T ⊢ A ↔ ∀x (BPrfT(x, ⌜A⌝) → ∃y<x BRefutT(y, ⌜A⌝)).
A expresses its own underivability, in the form (due to Rosser): “For every
proof of me there is a shorter proof of my negation”.
We shall show (∗) T ⊬ A and (∗∗) T ⊬ ¬A.
Ad (∗). Assume T ⊢ A. Choose n such that PrfT(n, ⌜A⌝). Then we also have
not RefutT(m, ⌜A⌝)   for all m,
since T is consistent. Hence
T ⊢ BPrfT(n, ⌜A⌝),
T ⊢ ¬BRefutT(m, ⌜A⌝)   for all m.
By (4) we can conclude
T ⊢ BPrfT(n, ⌜A⌝) ∧ ∀y<n ¬BRefutT(y, ⌜A⌝).
Hence
T ⊢ ∃x (BPrfT(x, ⌜A⌝) ∧ ∀y<x ¬BRefutT(y, ⌜A⌝)),
T ⊢ ¬A.
This contradicts the assumed consistency of T.
Ad (∗∗). Assume T ⊢ ¬A. Choose n such that RefutT(n, ⌜A⌝). Then we also have
not PrfT(m, ⌜A⌝)   for all m,
since T is consistent. Hence
T ⊢ BRefutT(n, ⌜A⌝),
T ⊢ ¬BPrfT(m, ⌜A⌝)   for all m.
This implies
T ⊢ ∀x (BPrfT(x, ⌜A⌝) → ∃y<x BRefutT(y, ⌜A⌝)),
as can be seen easily by cases on x, using (5). Hence T ⊢ A. But this again contradicts the assumed consistency of T.
Finally we formulate a variant of this theorem which does not assume that the theory T talks about numbers only. Call T a theory with defined natural numbers if there is a formula N(x)—written Nx—such that T ⊢ N0 and T ⊢ ∀x∈N N(Sx), where ∀x∈N A is short for ∀x (Nx → A). Representing a function in such a theory of course means that the free variables in (3) are relativized to N:
T ⊢ ∀y,z∈N (A(a1, . . . , an, y) ∧ A(a1, . . . , an, z) → y = z)   for all a1, . . . , an ∈ N.
Theorem (Gödel–Rosser). Assume that T is an axiomatized consistent theory with defined natural numbers, and that there is a formula L(x, y)—written x < y—such that
T ⊢ ∀x∈N (x < n → x = 0 ∨ · · · ∨ x = n − 1),
T ⊢ ∀x∈N (x = 0 ∨ · · · ∨ x = n ∨ n < x).
Assume also that every elementary function is representable in T. Then one can find a closed formula A such that neither A nor ¬A is provable in T.
Proof. As for the Gödel–Rosser theorem above; just relativize all quantifiers to N.
3.5. Representability
We show in this section that already very simple theories have the prop-
erty that all recursive functions are representable in them.
3.5.1. Weak arithmetical theories.
Theorem. Let L be an elementarily presented language with 0, S, = in L and T a consistent theory with defined natural numbers containing the equality axioms EqL and the ex-falso-quodlibet axiom ∀x,y∈N (⊥ → x = y). Assume that there is a formula L(x, y)—written x < y—such that
T ⊢ Sa ≠ 0   for all a ∈ N,   (6)
T ⊢ Sa = Sb → a = b   for all a, b ∈ N,   (7)
the functions + and · are representable in T,   (8)
T ⊢ ∀x∈N ¬ x < 0,   (9)
T ⊢ ∀x∈N (x < Sb → x < b ∨ x = b)   for all b ∈ N,   (10)
T ⊢ ∀x∈N (x < b ∨ x = b ∨ b < x)   for all b ∈ N.   (11)
Then T fulfills the assumptions of the Gödel–Rosser theorem relativized to N, i.e.,
T ⊢ ∀x∈N (x < a → x = 0 ∨ · · · ∨ x = a − 1)   for all a ∈ N,   (12)
T ⊢ ∀x∈N (x = 0 ∨ · · · ∨ x = a ∨ a < x)   for all a ∈ N,   (13)
and every recursive function is representable in T.
Proof. (12) can be proved easily by induction on a. The base case follows from (9), and the step from the induction hypothesis and (10). (13) follows immediately from the trichotomy law (11), using (12).
For the representability of recursive functions, first note that the formulas x = y and x < y actually do represent in T the equality and the less-than relations, respectively. From (6) and (7) we can see immediately that T ⊢ a ≠ b when a ≠ b. Assume a ≮ b. We show T ⊢ ¬ a < b by induction on b. T ⊢ ¬ a < 0 follows from (9). In the step we have a ≮ b + 1, hence a ≮ b and a ≠ b, hence by the induction hypothesis and the representability (above) of the equality relation, T ⊢ ¬ a < b and T ⊢ a ≠ b, hence by (10) T ⊢ ¬ a < Sb. Now assume a < b. Then T ⊢ a ≠ b and T ⊢ ¬ b < a, hence by (11) T ⊢ a < b.
We now show by induction on the definition of μ-recursive functions
that every recursive function is representable in T . Recall (from 3.3.1) that
the second condition (2) in the definition of representability of a function
automatically follows from the other two (and hence need not be checked
further). This is because T ⊢ ā ≠ b̄ for a ≠ b.
The initial functions constant 0, successor and projection (onto the i-th
coordinate) are trivially represented by the formulas 0 = y, Sx = y
and xi = y respectively. Addition and multiplication are represented in
T by assumption. Recall that the one remaining initial function of μ-
recursiveness is ∸, but this is definable from the characteristic function
of < by a ∸ b = μi (b + i ≥ a) = μi (c< (b + i, a) = 0). We now show
that the characteristic function of < is representable in T . (It will then
follow that ∸ is representable, once we have shown that the representable
functions are closed under μ.) We show that
A(x1 , x2 , y) := (x1 < x2 ∧ y = 1) ∨ (x1 ≮ x2 ∧ y = 0)
represents c< . First notice that ∀y,z∈N (A(ā1 , ā2 , y) ∧ A(ā1 , ā2 , z) → y =
z) already follows logically from the equality axiom and the ex-falso-
quodlibet axiom for equality (by cases on the alternatives of A). Assume
a1 < a2 . Then T ⊢ ā1 < ā2 , hence T ⊢ A(ā1 , ā2 , 1). Now assume
a1 ≮ a2 . Then T ⊢ ā1 ≮ ā2 , hence T ⊢ A(ā1 , ā2 , 0).
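To make the recursion-theoretic reading of the equation a ∸ b = μi (c< (b + i, a) = 0) concrete, here is a small Python illustration (ours, not part of the formal development; the names mu, c_less and monus are our own):

    def mu(p):
        # least i with p(i) true; diverges if no witness exists
        i = 0
        while not p(i):
            i += 1
        return i

    def c_less(a, b):
        # characteristic function of <
        return 1 if a < b else 0

    def monus(a, b):
        # a monus b = mu i (c_less(b + i, a) = 0), i.e. max(a - b, 0)
        return mu(lambda i: c_less(b + i, a) == 0)

    assert monus(7, 3) == 4 and monus(3, 7) == 0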
For the composition case, suppose f is defined from h, g1 , . . . , gm by
f(a⃗) = h(g1 (a⃗), . . . , gm (a⃗)).
By induction hypothesis we already have representing formulas Agi (x⃗, yi )
and Ah (y⃗, z). As representing formula for f we take
Af := ∃y⃗∈N (Ag1 (x⃗, y1 ) ∧ · · · ∧ Agm (x⃗, ym ) ∧ Ah (y⃗, z)).
Assume f(a⃗) = c. Then there are b1 , . . . , bm such that T ⊢ Agi (a⃗, b̄i )
for each i, and T ⊢ Ah (b⃗, c̄), so by logic T ⊢ Af (a⃗, c̄). It remains to
show uniqueness T ⊢ ∀z1 ,z2 ∈N (Af (a⃗, z1 ) ∧ Af (a⃗, z2 ) → z1 = z2 ). But this
follows by logic from the induction hypothesis for gi , which gives
T ⊢ ∀y1i ,y2i ∈N (Agi (a⃗, y1i ) ∧ Agi (a⃗, y2i ) → y1i = y2i = gi (a⃗))
and the induction hypothesis for h, which gives
T ⊢ ∀z1 ,z2 ∈N (Ah (b⃗, z1 ) ∧ Ah (b⃗, z2 ) → z1 = z2 ) with bi = gi (a⃗).
For the μ case, suppose f is defined from g (taken here to be bi-
nary for notational convenience) by f(a) = μi (g(i, a) = 0), assuming
∀a ∃i (g(i, a) = 0). By induction hypothesis we have a formula Ag (y, x, z)
representing g. In this case we represent f by the formula
Af (x, y) := Ny ∧ Ag (y, x, 0) ∧ ∀v∈N (v < y → ∃u∈N;u≠0 Ag (v, x, u)).
We first show the representability condition (1), that is T ⊢ Af (ā, b̄)
when f(a) = b. Because of the form of Af this follows from the assumed
representability of g together with T ⊢ ∀v∈N (v < b̄ → v = 0 ∨ · · · ∨ v =
b − 1).
We now tackle the uniqueness condition (3). Given a, let b := f(a)
(thus g(b, a) = 0 and b is the least such). It suffices to show
T ⊢ ∀y∈N (Af (ā, y) → y = b̄).
We prove T ⊢ ∀y∈N (y < b̄ → ¬Af (ā, y)) and T ⊢ ∀y∈N (b̄ < y →
¬Af (ā, y)), and then appeal to the trichotomy law and the ex-falso-
quodlibet axiom for equality.
We first show T ⊢ ∀y∈N (y < b̄ → ¬Af (ā, y)). Now since, for any
i < b, T ⊢ ¬Ag (ī, ā, 0) by the assumed representability of g, we obtain
immediately T ⊢ ¬Af (ā, ī). Hence because of T ⊢ ∀y∈N (y < b̄ →
y = 0 ∨ · · · ∨ y = b − 1) the claim follows.
Secondly, T ⊢ ∀y∈N (b̄ < y → ¬Af (ā, y)) follows almost immediately
from T ⊢ ∀y∈N (b̄ < y → Af (ā, y) → ∃u∈N;u≠0 Ag (b̄, ā, u)) and the
uniqueness for g, T ⊢ ∀u∈N (Ag (b̄, ā, u) → u = 0).
3.5.2. Robinson’s theory Q. We conclude this section by considering
a special and particularly simple arithmetical theory due originally to
Robinson [1950]. Let L1 be the language given by 0, S, +, · and =, and let
Q be the theory determined by the axioms EqL1 , ex-falso-quodlibet for
equality ⊥ → x = y and
Sx ≠ 0, (14)
Sx = Sy → x = y, (15)
x + 0 = x, (16)
x + Sy = S(x + y), (17)
x · 0 = 0, (18)
x · Sy = x · y + x, (19)
∃z (x + Sz = y) ∨ x = y ∨ ∃z (y + Sz = x). (20)
Theorem (Robinson’s Q). Every consistent theory T ⊇ Q fulfills the
assumptions of the Gödel–Rosser theorem w.r.t. the definition L(x, y) :=
∃z (x + Sz = y) of the <-relation. In particular, every recursive function is
representable in T .
Proof. We show that T satisfies the conditions of the previous theorem.
For (6) and (7) this is clear. For (8) we can take x + y = z and x · y = z
as representing formulas. For (9) we have to show ¬∃z (x + Sz = 0); this
follows from (17) and (14). For the proof of (10) we need the auxiliary
proposition
x = 0 ∨ ∃y (x = 0 + Sy), (21)
which will be attended to below. Assume x + Sz = Sb, hence also
S(x + z) = Sb and therefore x + z = b. We must show ∃y (x + Sy =
b) ∨ x = b. But this follows from (21) for z. In case z = 0 we obtain
x = b, and in case ∃y (z = 0 + Sy) we have ∃y (x + Sy = b), since
0 + Sy = S(0 + y). Thus (10) is proved. (11) follows immediately from
(20). For the proof of (21) we use (20) with y = 0. It clearly suffices
to exclude the first case ∃z (x + Sz = 0). But this means S(x + z) = 0,
contradicting (14).
Corollary (Essential undecidability of Q). Every consistent theory
T ⊇ Q in an elementarily presented language is non-recursive.
Proof. This follows from the theorem above and the undecidability
theorem in 3.4.1.
Corollary (Undecidability of logic). The set of formulas derivable in
the classical fragment of minimal logic is non-recursive.
Proof. Otherwise Q would be recursive, because a formula A is deriv-
able in Q if and only if the implication B → A is derivable, where B is the
conjunction of the finitely many axioms and equality axioms of Q.
Remark. Note that it suffices that the underlying language contains one
binary relation symbol (for =), one constant symbol (for 0), one unary
function symbol (for S) and two binary function symbols (for + and ·).
The study of decidable fragments of first-order logic is one of the oldest
research areas of mathematical logic. For more information see Börger,
Grädel, and Gurevich [1997].
3.5.3. Σ1 -formulas. Reading the above proof of representability, one
can see that the representing formulas used are of a restricted form, having
no unbounded universal quantifiers and therefore defining Σ⁰₁-relations.
This will be of crucial importance for our proof of Gödel’s second incom-
pleteness theorem to follow, but in addition we need to make a syntac-
tically precise definition of the class of formulas involved, more specific
and apparently more restrictive than the notion of Σ1 -formula used ear-
lier. However, as proved in the corollary below, we can still represent all
recursive functions even in the weak theory Q by means of Σ1 -formulas
in this more restrictive sense. Consequently provable Σ1 -ness will be the
same whichever definition we take.
Definition. For the remainder of this chapter, the Σ1 -formulas of the
language L1 will be those generated inductively by the following clauses:
(a) Only atomic formulas of the restricted forms x = y, x ≠ y, 0 = x,
Sx = y, x + y = z and x · y = z are allowed as Σ1 -formulas.
(b) If A and B are Σ1 -formulas, then so are A ∧ B and A ∨ B.
(c) If A is a Σ1 -formula, then so is ∀x<y A, which is an abbreviation for
∀x (∃z (x + Sz = y) → A).
(d) If A is a Σ1 -formula, then so is ∃x A.
Corollary. Every recursive function is representable in Q by a Σ1 -
formula in the language L1 .
Proof. This can be seen immediately by inspecting the proof of the
theorem above on weak arithmetical theories. Only notice that because of
the equality axioms ∃z (x+Sz = y) is equivalent to ∃z ∃w (Sz = w∧x+w =
y) and A(0) is equivalent to ∃x (0 = x ∧ A(x)).
3.6. Unprovability of consistency
We have seen in the theorem of Gödel–Rosser how, for every axiom-
atized consistent theory T satisfying certain weak assumptions, we can
construct an undecidable sentence A meaning “For every proof of me
there is a shorter proof of my negation”. Because A is unprovable, it is
clearly true.
Gödel’s second incompleteness theorem provides a particularly interes-
ting alternative to A, namely a formula ConT expressing the consistency
of T . Again it turns out to be unprovable and therefore true. We shall
prove this theorem in a sharpened form due to Löb.
3.6.1. Σ1 -completeness. We prove an auxiliary proposition, expressing
the completeness of Q with respect to Σ1 -formulas.
Lemma (Σ1 -completeness). Let A(x1 , . . . , xn ) be a Σ1 -formula of the lan-
guage L1 . Assume that N1 |= A(a1 , . . . , an ) where N1 is the standard model
of L1 . Then Q ⊢ A(ā1 , . . . , ān ).
Proof. By induction on the Σ1 -formulas of the language L1 . For atomic
formulas, the cases have been dealt with either in the earlier parts of the
proof of the theorem above on weak arithmetical theories, or (for x+y = z
and x · y = z) they follow from the recursion equations (16)–(19).
Cases A ∧ B, A ∨ B. The claim follows immediately from the induction
hypothesis.
Case ∀x<y A(x, y, z1 , . . . , zn ); for simplicity assume n = 1. Suppose
N1 |= (∀x<y A)(b, c). Then also N1 |= A(i, b, c) for each i < b and hence
by induction hypothesis Q ⊢ A(ī, b̄, c̄). Now by the theorem above on
Robinson’s Q
Q ⊢ ∀x<b̄ (x = 0 ∨ · · · ∨ x = b − 1),
hence
Q ⊢ (∀x<y A)(b̄, c̄).
Case ∃x A(x, y1 , . . . , yn ); for simplicity again take n = 1. Assume
N1 |= (∃x A)(b). Then N1 |= A(a, b) for some a ∈ N, hence by induction
hypothesis Q ⊢ A(ā, b̄) and therefore Q ⊢ (∃x A)(b̄).
3.6.2. Derivability conditions. Let T be an axiomatized consistent the-
ory with T ⊇ Q, and let Prf T (p, z) be a Σ1 -formula of the language L1
which represents in Robinson’s theory Q the recursive relation “a is the
Gödel number of a proof in T of the formula with Gödel number b”.
Consider the following L1 -formulas:
ThmT (x) := ∃y Prf T (y, x),
ConT := ¬∃y Prf T (y, ⌜⊥⌝).
Then ThmT (x) defines in N1 the set of formulas provable in T , and
we have N1 |= ConT if and only if T is consistent. We write □A for
ThmT (⌜A⌝); hence ConT can be written ¬□⊥. Now suppose, in addition,
that T satisfies the following two derivability conditions, due to Hilbert and
Bernays [1939]:
T ⊢ □A → □□A, (22)
T ⊢ □(A → B) → □A → □B. (23)
(22) formalizes Σ1 -completeness of the theory T for the closed formulas
□A, and (23) is a formalization of its closure under modus ponens (i.e., →⁻ ).
The derivability conditions place further restrictions on the theory T
and its proof predicate Prf T . We check them under the assumption that
T contains IΔ0 (exp), and Prf T is as defined earlier. (There are non-
standard ways of coding proofs which lead to various “pathologies”—see,
e.g., Feferman [1960]).
The formalized version of modus ponens is easy to see, assuming that
T can be conservatively extended to include a “proof term” t(y, y′) such
that one may prove
Prf T (y, ⌜A → B⌝) → Prf T (y′, ⌜A⌝) → Prf T (t(y, y′), ⌜B⌝),
for then (23) follows immediately by quantifier rules.
(22) is harder. A detailed proof requires a great deal of syntactic
machinery to do with the construction of proof terms, as above, acting
on Gödel numbers so as to mimic the various rules inside T . We merely
content ourselves here with a short indication of why (22) holds; this
should be sufficient to convince the reader of its validity.
Assume that T contains IΔ0 (exp). Then, as we have seen at the be-
ginning of this chapter, the elementary functions are provably recursive
and so we may take their definitions as having been added conservatively.
Working informally “inside” T one shows, by induction on y, that
Prf T (y, ⌜A⌝) → Prf T (f(y), ⌜□A⌝)
where f is elementary. Then (22) follows by the quantifier rules.
If y is the Gödel number of a derivation (in T ) consisting of an axiom
A then there will be a term t, elementarily computable from y, such that
Prf T (t, ⌜A⌝) and hence □A are derivable in T . This derivation may be
syntactically complex, but it will essentially consist of checking that t, as
a Gödel number, encodes the right thing. Thus the derivation of □A has
a fixed Gödel number (depending on t and hence y) and this is what we
take as the value of f(y).
If y is the Gödel number of a derivation of A in which one of the
rules is finally applied, say to premises A′ and A″, then there will be
y′, y″ < y such that Prf T (y′, ⌜A′⌝) and Prf T (y″, ⌜A″⌝). By the induction
hypothesis, f(y′) and f(y″) will be the Gödel numbers of T -derivations
of □A′ and □A″, and as in the modus-ponens case above, there will be a
fixed derivation which combines these two into a new derivation of □A.
We take, as the value f(y), the Gödel number of this final derivation,
computable from f(y′) and f(y″) by applying some additional (sub-
elementary) coding corresponding to the additional steps from □A′ and
□A″ to □A.
The function f will be definable from elementary functions by a course-
of-values recursion in which the recursion steps are in fact computed sub-
elementarily. Therefore it will be a limited course-of-values recursion and,
by a result in chapter 2, f will therefore be elementary as required.
Theorem (Gödel’s second incompleteness theorem). Let T be an ax-
iomatized consistent extension of Q, satisfying the derivability conditions
(22) and (23). Then T ⊬ ConT .
Proof. Let C := ⊥ in Löb’s theorem below, which is a generalization
of Gödel’s original result.
Theorem (Löb). Let T be an axiomatized consistent extension of Q
satisfying the derivability conditions (22) and (23). Then for any closed
L1 -formula C , if T ⊢ □C → C , then already T ⊢ C .
Proof. Assume T ⊢ □C → C . We must show T ⊢ C . Choose A by
the fixed point lemma such that
Q ⊢ A ↔ (□A → C ). (24)
First we show T ⊢ □A → C . We obtain
T ⊢ A → □A → C by (24)
T ⊢ □(A → □A → C ) by Σ1 -completeness
T ⊢ □A → □(□A → C ) by (23)
T ⊢ □A → □□A → □C again by (23)
T ⊢ □A → □C since T ⊢ □A → □□A by (22).
Therefore the assumption T ⊢ □C → C implies T ⊢ □A → C . Hence
T ⊢ A by (24), and then T ⊢ □A by Σ1 -completeness. But T ⊢ □A → C
as we have just shown, therefore T ⊢ C .
Remark. It follows that if T is any axiomatized consistent extension of
Q satisfying the derivability conditions (22) and (23), then the reflection
schema
□C → C for closed L1 -formulas C
is not derivable in T . For by Löb’s theorem, it cannot be derivable when
C is underivable.
By adding to Q the induction schema for all formulas we obtain Peano
arithmetic PA, which is the most natural example of a theory T to which
the results above apply. However, various weaker fragments of PA, ob-
tained by restricting the classes of induction formulas, would serve equally
well as examples of such T . As we have seen, in fact, T ⊇ IΔ0 (exp)
suffices.
3.7. Notes
The fundamental paper on incompleteness is Gödel [1931]. This paper
already contains the β-function crucially needed for the representation
theorem; the fixed point lemma is used implicitly. Gödel’s first incom-
pleteness theorem uses the formula “I am not provable”, a fixed point of
¬ThmT (x). To prove independence of this proposition from the underly-
ing theory T one needs ω-consistency of T (which is automatically fulfilled
if T is a subtheory of the theory of the standard model). Rosser [1936]
found the sharpening presented here, using the formula “For every proof
of me there is a shorter proof of my negation”. Löb’s theorem [1955]
is based on the formula A, which says “If I am provable, then C ”. A
consequence is that, just as “Gödel sentences”, which assert their own
unprovability, are true, so also are so-called “Henkin sentences” true, i.e.,
those which assert their own provability.
Undefinability of the notion of truth was proved originally by Tarski
[1936], and undecidability of predicate logic is a result of Church [1936].
The arithmetical theory Q is due to Robinson [1950]. Buss [1998a] gives a
detailed exposition of various weak and strong fragments of PA, bounded
arithmetic, Q, polynomial-time arithmetization and Gödel’s theorems.
There is also much more work on general reflection principles, which
we have only touched on in the simplest case. One should mention
here Smoryński [1991], Feferman [1960], Girard [1987] and Beklemishev
[2003].
The volumes of Gödel’s collected works edited by Feferman, Dawson
et al. [1986, 1990, 1995, 2002a, 2002b] provide excellent commentaries on
his massive contributions to logic.
Part 2
PROVABLE RECURSION IN CLASSICAL SYSTEMS
Chapter 4
THE PROVABLY RECURSIVE FUNCTIONS OF
ARITHMETIC
This chapter develops the classification theory of the provably recursive
functions of arithmetic. The topic has a long history tracing back to
Kreisel [1951], [1952] who, in setting out his “no-counter-example” inter-
pretation, gave the first explicit characterization of the functions “com-
putable in” arithmetic, as those definable by recursions over standard
well-orderings of the natural numbers with order types less than ε0 . Such
a characterization seems now, perhaps, not so surprising in light of the
groundbreaking work of Gentzen [1936], [1943], showing that these well-
orderings are just the ones over which one can prove transfinite induc-
tion in arithmetic, and hence prove the totality of functions defined by
recursions over them. Subsequent work of the present authors [1970],
[1971], [1972], extending previous results of Grzegorczyk [1953] and Rob-
bin [1965], then provided other complexity characterizations in terms of
natural, simply defined hierarchies of so-called “fast growing” bounding
functions. What was surprising was the deep connection later discovered,
first by Ketonen and Solovay [1981], between these bounding functions
and a variety of combinatorial results related to the “modified” finite Ram-
sey theorem of Paris and Harrington [1977]. It is through this connection
that one gains immediate access to a range of mathematically meaningful
independence results for arithmetic and stronger theories. Thus, classify-
ing the provably recursive functions of a theory not only gives a measure of
its computational power; it also serves to delimit its mathematical power
in providing natural examples of true mathematical statements it cannot
prove. The devil lies in the detail, however, and that’s what we present
here.
The main ingredients of the chapter are: (i) Parsons’ [1966] oft-quoted
but seldom fully exposited refinement of Kreisel’s result, characterizing
the functions provably recursive in fragments of arithmetic with restricted
induction-complexity; (ii) their corresponding classifications in terms of
the fast-growing hierarchy; and (iii) applications to two of the best-known
independence results: that of Kirby and Paris [1982] on Goodstein’s the-
orem [1944] and the modified finite Ramsey theorem already mentioned.
Whereas Kreisel’s original proof (that the provably recursive functions
are “ordinal-recursive” at levels below ε0 ) was based on Ackermann’s
[1940] analysis of the epsilon-substitution method for arithmetic, our
principal method will be that first developed by Schütte [1951], namely
cut-elimination in infinitary logics with ordinal bounds. A wide variety
of other treatments of these, and related, topics is to be found in the liter-
ature, some along similar lines to those presented here, some using quite
different model-theoretic ideas, and some applying to stronger theories
than just arithmetic (as we shall do in the next chapter—for once the basic
classification theory is established, there is no reason to stop at ε0 ). See for
example Tait [1961], [1968], Löb and Wainer [1970], Wainer [1970], [1972],
Schwichtenberg [1971], [1975], [1977], [1992], Parsons [1972], Borodin
and Constable [1971], Constable [1972], Mints [1973], Zemke [1977], Paris
[1980], Kirby and Paris [1982], Rose [1984], Sieg [1985], [1991], Buch-
holz and Wainer [1987], Buchholz [1980], Girard [1987], Takeuti [1987],
Hájek and Pudlák [1993], Feferman [1992], Rathjen [1992], [1999], Som-
mer [1992], [1995], Tucker and Zucker [1992], Ratajczyk [1993], Buch-
holz, Cichon, and Weiermann [1994], Buss [1994], [1998b], Friedman
and Sheard [1995], Weiermann [1996], [1999], [2004], [2005], [2006], Avi-
gad and Sommer [1997], Fairtlough and Wainer [1998], Troelstra and
Schwichtenberg [2000], Feferman and Strahm [2000], [2010], Strahm and
Zucker [2008], Bovykin [2009].
Recall, from the previous chapter, that a function f : Nk → N is provably
Σ1 , or provably recursive, in an arithmetical theory T if there is a Σ1 -
formula F (x⃗, y) (i.e., one obtained by prefixing finitely many unbounded
existential quantifiers to a Δ0 (exp)-formula) such that
(i) f(n⃗) = m if and only if F (n⃗, m) is true (in the standard model),
(ii) T ⊢ ∃y F (x⃗, y),
(iii) T ⊢ F (x⃗, y) ∧ F (x⃗, y′) → y = y′.
The theories we shall be concerned with in this chapter are PA (Peano
arithmetic) and its inductive fragments IΣn , all based on classical logic.
We take, as our formalization of PA, IΔ0 (exp) together with all induction
axioms
A(0) ∧ ∀a (A(a) → A(a + 1)) → A(t)
for arbitrary formulas A and (substitutable) terms t.
Historically of course, the Peano axioms only include definitions of zero,
successor, addition and multiplication, whereas the base theory we have
chosen includes predecessor, modified subtraction and exponentiation as
well. We do this because IΔ0 (exp) is both a natural and convenient theory
to have available from the start. However, these extra functions can all
be provably Σ1 -defined in IΣ1 from the “pure” Peano axioms, using the
Chinese remainder theorem, so we are not actually increasing the strength
of any of the theories here by including them. Furthermore the results in
this chapter would not at all be affected by adding to the base theory any
other primitive recursively defined functions one wishes.
IΣn has the same base theory IΔ0 (exp), but the induction axioms are
restricted to formulas A of the form Σi or Πi with i ≤ n, defined for the
purposes of this chapter as follows:
Definition. Σ1 -formulas have already been defined. A Π1 -formula is
the dual, or (classically) the negation, of a Σ1 -formula. For n > 1, a Σn -
formula is one obtained by prefixing just one existential quantifier to
a Πn−1 -formula, and a Πn -formula is one formed by prefixing just one
universal quantifier to a Σn−1 -formula. Thus only in the cases Σ1 and
Π1 do strings of like quantifiers occur. In all other cases, strings of like
quantifiers are assumed to have been contracted into one such, using the
pairing functions ⟨·, ·⟩, π1 , π2 which are available in IΔ0 (exp). This is no real
restriction, but merely a matter of convenience for later results.
Note. It doesn’t matter whether one restricts to Σn - or Πn -induction
formulas since, in the presence of the subtraction function, induction on a
Πn -formula A is reducible to induction on its Σn dual ¬A, and vice versa.
For if one replaces A(a) by ¬A(t ∸ a) in the induction axiom, and then
contraposes, one obtains
A(t ∸ t) ∧ ∀a (A(t ∸ (a + 1)) → A(t ∸ a)) → A(t ∸ 0),
from which follows the induction axiom for A(a) itself, since t ∸ t = 0,
t ∸ 0 = t, and t ∸ a = (t ∸ (a + 1)) + 1 if t ∸ a ≠ 0.
the least number principle
∃a A(a) → ∃a (A(a) ∧ ∀b<a ¬A(b))
is obtained by contraposing the induction axiom for the formula B(a) :=
∀b<a ¬A(b).
4.1. Primitive recursion and IΣ1
One of the most fundamental results about provable recursiveness, due
originally to Parsons [1966] but see also Mints [1973] and Takeuti [1987],
is the fact that the provably recursive functions of IΣ1 are exactly the
primitive recursive functions. The proof is very similar to the one in the
last chapter, characterizing the elementary functions as those provably
recursive in IΔ0 (exp), but the extra power of induction on unbounded ex-
istentially quantified formulas now allows us to prove that every primitive
recursion terminates.
4.1.1. Primitive recursive functions are provable in IΣ1 .
Lemma. Every primitive recursive function is provably recursive in IΣ1 .
Proof. We must show how to assign, to each primitive recursive defi-
nition of a function f, a Σ1 -formula F (x⃗, y) := ∃z C (x⃗, y, z) such that
1. f(n⃗) = m if and only if F (n⃗, m) is true (in the standard model),
2. T ⊢ ∃y F (x⃗, y),
3. T ⊢ F (x⃗, y) ∧ F (x⃗, y′) → y = y′.
In each case, C (x⃗, y, z) will be a Δ0 (exp)-formula constructed using the
sequence coding machinery already shown to be definable (by bounded
formulas) in IΔ0 (exp). It expresses that z is a uniquely determined se-
quence number coding the computation of f(x⃗) = y, and containing the
output value y as its final component, so that y = π2 (z). Condition 1 will
hold automatically because of the definition of C , and condition 3 will be
satisfied because of the uniqueness of z. We consider, in turn, each of the
five definitional schemes by which the function f may be introduced:
First suppose f is the constant-zero function f(x) = 0. Then we take
C (x, y, z) to be the formula y = 0 ∧ z = 0. Conditions 1, 2 and 3 are
then immediately satisfied.
Similarly, if f is the successor function f(x) = x + 1 we take C (x, y, z)
to be the formula y = x + 1 ∧ z = x + 1. Again, the conditions hold
trivially.
Similarly, if f is a projection function f(x⃗) = xi we take C (x⃗, y, z) to
be the formula y = xi ∧ z = xi .
Now suppose f is defined by substitution from previously generated
primitive recursive functions f0 , f1 , . . . , fk thus:
f(x⃗) = f0 (f1 (x⃗), . . . , fk (x⃗)).
For typographical ease, and without any real loss of generality, we shall
fix k = 2. So assume inductively that f0 , f1 , f2 have already been shown
to be provably recursive, with associated Δ0 (exp)-formulas C0 , C1 , C2
coding their computations. For the function f itself, define C (x⃗, y, z)
to be the conjunction of the formulas lh(z) = 4, C1 (x⃗, π2 ((z)1 ), (z)1 ),
C2 (x⃗, π2 ((z)2 ), (z)2 ), C0 (π2 ((z)1 ), π2 ((z)2 ), y, (z)0 ), and (z)3 = y.
Then condition 1 holds because f(n⃗) = m if and only if there are
numbers m1 , m2 such that f1 (n⃗) = m1 , f2 (n⃗) = m2 and f0 (m1 , m2 ) =
m; and these hold if and only if there are numbers k1 , k2 , k0 such that
C1 (n⃗, m1 , k1 ) and C2 (n⃗, m2 , k2 ) and C0 (m1 , m2 , m, k0 ) are all true; and
these hold if and only if C (n⃗, m, ⟨k0 , k1 , k2 , m⟩) is true. Thus f(n⃗) = m
if and only if F (n⃗, m) := ∃z C (n⃗, m, z) is true.
Condition 2 holds as well, since from C1 (x⃗, y1 , z1 ), C2 (x⃗, y2 , z2 ) and
C0 (y1 , y2 , y, z0 ) we can immediately derive C (x⃗, y, ⟨z0 , z1 , z2 , y⟩) in
IΔ0 (exp). So from ∃y ∃z C1 (x⃗, y, z), ∃y ∃z C2 (x⃗, y, z) and ∀x1 ∀x2 ∃y ∃z
C0 (x1 , x2 , y, z) we obtain a proof of ∃y F (x⃗, y) := ∃y ∃z C (x⃗, y, z) as required.
Condition 3 holds because, from the corresponding property for each
of C0 , C1 and C2 , we can easily derive C (x⃗, y, z) ∧ C (x⃗, y′, z′) → y =
y′ ∧ z = z′.
Finally suppose f is defined from f0 and f1 by primitive recursion:
f(v⃗, 0) = f0 (v⃗) and f(v⃗, x + 1) = f1 (v⃗, x, f(v⃗, x))
where f0 and f1 are already assumed to be provably recursive with asso-
ciated Δ0 (exp)-formulas C0 and C1 . Define C (v⃗, x, y, z) to be the conjunc-
tion of the formulas C0 (v⃗, π2 ((z)0 ), (z)0 ), ∀i<x C1 (v⃗, i, π2 ((z)i ), π2 ((z)i+1 ),
(z)i+1 ), (z)x+1 = y, π2 ((z)x ) = y and lh(z) = x + 2.
Then condition 1 holds because f(l⃗, n) = m if and only if there is a
sequence number k = ⟨k0 , . . . , kn , m⟩ such that k0 codes the computation
of f(l⃗, 0) with value π2 (k0 ), and for each i < n, ki+1 codes the computa-
tion of f(l⃗, i + 1) = f1 (l⃗, i, π2 (ki )) with value π2 (ki+1 ), and π2 (kn ) = m.
This is equivalent to saying F (l⃗, n, m) ↔ ∃z C (l⃗, n, m, z) is true.
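The shape of such a computation sequence can be pictured with a small Python sketch (ours; Python lists stand in for the arithmetized codes ⟨. . .⟩, and each entry records only the step value that π2 would extract from the full step code):

    # Computation sequence for f defined by primitive recursion from f0, f1.
    def comp_seq(f0, f1, v, x):
        ks = [f0(v)]                    # value of f(v, 0)
        for i in range(x):
            ks.append(f1(v, i, ks[i]))  # value of f(v, i + 1)
        return ks + [ks[x]]             # final component is the output y

    # Example: f(v, x) = v + x from f0(v) = v and f1(v, i, w) = w + 1.
    z = comp_seq(lambda v: v, lambda v, i, w: w + 1, 5, 3)
    assert z == [5, 6, 7, 8, 8]         # length x + 2, last component y = 8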
For condition 2 note that in IΔ0 we can prove
C0 (v⃗, y, z) → C (v⃗, 0, y, ⟨z, y⟩)
and
C (v⃗, x, y, z) ∧ C1 (v⃗, x, y, y′, z′) → C (v⃗, x + 1, y′, t)
for a suitable term t which removes the end component y of z, re-
places it by z′, and then adds the final value component y′. Speci-
fically t = ⟨⟨π1 (z), z′⟩, y′⟩. Hence from ∃y ∃z C0 (v⃗, y, z) we obtain
∃y ∃z C (v⃗, 0, y, z), and also from ∀y ∃y′ ∃z′ C1 (v⃗, x, y, y′, z′) we can derive
∃y ∃z C (v⃗, x, y, z) → ∃y ∃z C (v⃗, x + 1, y, z).
By the assumed provable recursiveness of f0 and f1 , we therefore can
prove outright ∃y F (v⃗, 0, y) and ∃y F (v⃗, x, y) → ∃y F (v⃗, x + 1, y). Then
Σ1 -induction allows us to derive ∃y F (v⃗, x, y) immediately.
To show that condition 3 holds we argue informally in IΔ0 (exp). As-
sume C (v⃗, x, y, z) and C (v⃗, x, y′, z′). Then z and z′ are sequence num-
bers of the same length x + 2. Furthermore we have C0 (v⃗, π2 ((z)0 ), (z)0 )
and C0 (v⃗, π2 ((z′)0 ), (z′)0 ) so by the assumed uniqueness condition for C0
we have (z)0 = (z′)0 . Similarly we have ∀i<x C1 (v⃗, i, π2 ((z)i ), π2 ((z)i+1 ),
(z)i+1 ), and the same with z replaced by z′. So if (z)i = (z′)i we can
deduce (z)i+1 = (z′)i+1 using the assumed uniqueness condition for C1 .
Therefore by Δ0 (exp)-induction we obtain ∀i≤x ((z)i = (z′)i ). The fi-
nal conjuncts in C give (z)x+1 = π2 ((z)x ) = y and the same with z
replaced by z′ and y replaced by y′. But since (z)x = (z′)x this means
y = y′ and, since all their components are equal, z = z′. Hence we have
F (v⃗, x, y) ∧ F (v⃗, x, y′) → y = y′. This completes the proof.
4.1.2. IΣ1 -provable functions are primitive recursive.
Definition. A closed Σ1 -formula ∃z⃗ B(z⃗), with B ∈ Δ0 (exp), is said
to be “true at m”, and we write m |= ∃z⃗ B(z⃗), if there are numbers
m⃗ = m1 , . . . , ml all less than m such that B(m⃗ ) is true (in the standard
model). A finite set Γ of closed Σ1 -formulas is “true at m”, written m |= Γ,
if at least one of them is true at m.
If Γ(x1 , . . . , xk ) is a finite set of Σ1 -formulas all of whose free variables
occur among x1 , . . . , xk , and if f : Nk → N, then we write f |= Γ to
mean that for all numerical assignments n⃗ = n1 , . . . , nk to the variables
x⃗ = x1 , . . . , xk we have f(n⃗) |= Γ(n⃗).
Note (Persistence). For sets Γ of closed Σ1 -formulas, if m |= Γ and
m < m′ then m′ |= Γ. Similarly for sets Γ(x⃗) of Σ1 -formulas with free
variables, if f |= Γ and f(n⃗) ≤ f′(n⃗) for all n⃗ ∈ Nk then f′ |= Γ.
Lemma (Σ1 -induction). If Γ(x⃗) is a finite set of Σ1 -formulas (whose
disjunction is) provable in IΣ1 then there is a primitive recursive function f,
strictly increasing in each of its variables, such that f |= Γ.
Proof. It is convenient to use a Tait-style formalization of IΣ1 , just like
the one used for IΔ0 (exp) in the last chapter, except that the induction
rule
Γ, A(0) Γ, ¬A(y), A(y + 1)
Γ, A(t)
with y not free in Γ and t any term, now applies to any Σ1 -formula A.
Note that if Γ is provable in this system then it has a proof in which all
the non-atomic cut formulas are induction formulas (in this case Σ1 ). For
if Γ is classically derivable from non-logical axioms A1 , . . . , As then there is
a cut-free proof in (Tait-style) logic of ¬A1 , Δ, Γ where Δ = ¬A2 , . . . , ¬As .
Then if A1 is an induction axiom on a formula F we have a cut-free proof
in logic of
F (0) ∧ ∀y (¬F (y) ∨ F (y + 1)) ∧ ¬F (t), Δ, Γ
and hence, by inversion, cut-free proofs of F (0), Δ, Γ and ¬F (y), F (y +
1), Δ, Γ and ¬F (t), Δ, Γ. From the first two of these we obtain F (t), Δ, Γ
by the induction rule above, and then from the third we obtain Δ, Γ by a
cut on the formula F (t). Similarly we can detach ¬A2 , . . . , ¬As in turn, to
yield finally a proof of Γ which only uses cuts on (Σ1 ) induction formulas
or on atoms arising from other non-logical axioms. Such proofs are said
to be “free-cut” free.
Choosing such a proof for Γ(x⃗), we proceed by induction on its height,
showing at each new proof-step how to define the required primitive re-
cursive function f satisfying f |= Γ.
If Γ(x⃗) is an axiom then for all n⃗, Γ(n⃗) contains a true atom. Therefore
f |= Γ for any f, so choose f(n⃗) = n1 + · · · + nk in order to make it
strictly increasing.
If Γ, B0 ∨ B1 arises by an application of the ∨-rule from Γ, B0 , B1
then (because of our definition of Σ1 -formula) B0 and B1 must both be
Δ0 (exp)-formulas. Thus by our definition of “true at”, any function f
satisfying f |= Γ, B0 , B1 must also satisfy f |= Γ, B0 ∨ B1 .
Only a slightly more complicated argument applies to the dual case
where Γ, B0 ∧ B1 arises by an application of the ∧-rule from the premises
Γ, B0 and Γ, B1 . For if f0 (n⃗) |= Γ(n⃗), B0 (n⃗) and f1 (n⃗) |= Γ(n⃗), B1 (n⃗)
for all n⃗, then it is easy to see (by persistence) that f |= Γ, B0 ∧ B1 where
f(n⃗) = f0 (n⃗) + f1 (n⃗).
If Γ, ∀y B(y) arises from Γ, B(y) by the ∀-rule (y not free in Γ) then
since all formulas are Σ1 , ∀y B(y) must be Δ0 (exp) and so B(y) must be of
the form y ≮ t ∨ B′(y) for some (elementary, or even primitive recursive)
term t. Now assume that f0 |= Γ, y ≮ t ∨ B′(y) for some increasing
primitive recursive function f0 . Then for all assignments n⃗ to the free
variables x⃗, and all assignments k to the variable y,
f0 (n⃗, k) |= Γ(n⃗), B′(n⃗, k), k ≮ t(n⃗).
Therefore by defining f(n⃗) = Σk<g(n⃗) f0 (n⃗, k), where g is some increasing
elementary (or primitive recursive) function bounding the values of term
t, we easily see that either f(n⃗) |= Γ(n⃗) or else B′(n⃗, k) is true for every
k < t(n⃗). Hence f |= Γ, ∀y B(y) as required, and clearly f is primitive
recursive.
Now suppose Γ, ∃y A(y) arises from Γ, A(t) by the ∃-rule, where A is
Σ1 . Then by the induction hypothesis there is a primitive recursive f0 such
that for all n⃗,
f0 (n⃗) |= Γ(n⃗), A(t(n⃗), n⃗).
Then either f0 (n⃗) |= Γ(n⃗) or else f0 (n⃗) bounds true witnesses for all the
existential quantifiers already in A(t(n⃗), n⃗). Therefore by again choosing
an elementary bounding function g for the term t, and defining f(n⃗) =
f0 (n⃗) + g(n⃗), we see that either f(n⃗) |= Γ(n⃗) or f(n⃗) |= ∃y A(y, n⃗) for
all n⃗.
If Γ comes about by the cut rule with Σ1 cut formula C := ∃z⃗ B(z⃗)
then the two premises are Γ, ∀z⃗ ¬B(z⃗) and Γ, ∃z⃗ B(z⃗). The universal
quantifiers in the first premise can be inverted (without increasing proof-
height) to give Γ, ¬B(z⃗) and since B is Δ0 (exp) the induction hypothesis
can be applied to this to give a primitive recursive f0 such that for all
numerical assignments n⃗ to the (implicit) variables x⃗ and all assignments
m⃗ to the new free variables z⃗,
f0 (n⃗, m⃗ ) |= Γ(n⃗), ¬B(n⃗, m⃗ ).
Applying the induction hypothesis to the second premise gives a primitive
recursive f1 such that for all n⃗, either f1 (n⃗) |= Γ(n⃗) or else there are fixed
witnesses m⃗ < f1 (n⃗) such that B(n⃗, m⃗ ) is true. Therefore if we define f
by substitution from f0 and f1 thus:
f(n⃗) = f0 (n⃗, f1 (n⃗), . . . , f1 (n⃗))
then f will be primitive recursive, greater than or equal to f1 , and strictly
increasing since both f0 and f1 are. Furthermore f |= Γ. For otherwise
there would be a tuple n⃗ such that Γ(n⃗) is not true at f(n⃗) and hence,
by persistence, not true at f1 (n⃗). So B(n⃗, m⃗ ) is true for certain numbers
m⃗ < f1 (n⃗). But then f0 (n⃗, m⃗ ) < f(n⃗) and so, again by persistence,
Γ(n⃗) cannot be true at f0 (n⃗, m⃗ ). This means B(n⃗, m⃗ ) is false, by the
above, and so we have a contradiction.
Finally suppose Γ(x⃗), A(x⃗, t) arises by an application of the induction
rule on the Σ1 induction formula A(x⃗, y) := ∃z⃗ B(x⃗, y, z⃗). The premises
are Γ(x⃗), A(x⃗, 0) and Γ(x⃗), ¬A(x⃗, y), A(x⃗, y + 1). By inverting the
universal quantifiers over z⃗ in ¬A(x⃗, y), the second premise becomes
Γ(x⃗), ¬B(x⃗, y, z⃗), A(x⃗, y + 1) which is now a set of Σ1 -formulas, and
the height of its proof is not increased. Thus we can apply the induction
hypothesis to each of the premises to obtain increasing primitive recursive
functions f0 and f1 such that for all n⃗, all k and all m⃗ ,
f0 (n⃗) |= Γ(n⃗), A(n⃗, 0),
f1 (n⃗, k, m⃗ ) |= Γ(n⃗), ¬B(n⃗, k, m⃗ ), A(n⃗, k + 1).
Now define f by primitive recursion from f0 and f1 as follows:
f(n⃗, 0) = f0 (n⃗) and f(n⃗, k + 1) = f1 (n⃗, k, f(n⃗, k), . . . , f(n⃗, k)).
Then for all n⃗ and all k, f(n⃗, k) |= Γ(n⃗), A(n⃗, k). This is shown
by induction on k. The base case is immediate by the definition of
f(n⃗, 0). The induction step is much like the cut case above. Assume
that f(n⃗, k) |= Γ(n⃗), A(n⃗, k). If Γ(n⃗) is not true at f(n⃗, k + 1) then by
persistence it is not true at f(n⃗, k), and so f(n⃗, k) |= A(n⃗, k). There-
fore there are numbers m⃗ < f(n⃗, k) such that B(n⃗, k, m⃗ ) is true. Hence
f1 (n⃗, k, m⃗ ) |= Γ(n⃗), A(n⃗, k + 1), and since f1 (n⃗, k, m⃗ ) ≤ f(n⃗, k + 1) we
have, by persistence, f(n⃗, k + 1) |= Γ(n⃗), A(n⃗, k + 1) as required.
It only remains to substitute, for the final argument k in f, an in-
creasing elementary (or primitive recursive) function g which bounds the
values of term t, so that with f′(n⃗) = f(n⃗, g(n⃗)) we have f′(n⃗) |=
Γ(n⃗), A(n⃗, t(n⃗)) for all n⃗, and hence f′ |= Γ(x⃗), A(x⃗, t) by persistence.
This completes the proof.
Theorem. The provably recursive functions of IΣ1 are exactly the primi-
tive recursive functions.
Proof. We have already shown that every primitive recursive function
is provably recursive in IΣ1 . For the converse, suppose g : Nk → N is Σ1 -
defined by the formula F (x⃗, y) := ∃z C (x⃗, y, z) with C ∈ Δ0 (exp), and
IΣ1 ⊢ ∃y F (x⃗, y). Then by the lemma above, there is a primitive recursive
function f such that for all n⃗ ∈ Nk ,
f(n⃗) |= ∃y ∃z C (n⃗, y, z).
This means that for every n⃗ there is an m < f(n⃗) and a k < f(n⃗) such
that C (n⃗, m, k) is true, and that this (unique) m must be the value of g(n⃗).
We can therefore define g primitive recursively from f as follows:
g(n⃗) = (μm<h(n⃗) C (n⃗, (m)0 , (m)1 ))0
where h(n⃗) = ⟨f(n⃗), f(n⃗)⟩. This completes the proof.
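The last equation can be mirrored concretely. In this Python sketch (ours; pair and unpair stand in for the book's pairing machinery, and all names are our own) a bounded search over coded pairs extracts g from its primitive recursive bound f:

    def pair(a, b):                 # Cantor pairing, standing in for <a, b>
        return (a + b) * (a + b + 1) // 2 + a

    def unpair(m):                  # inverse of pair
        s = 0
        while (s + 1) * (s + 2) // 2 <= m:
            s += 1
        a = m - s * (s + 1) // 2
        return a, s - a

    def extract_g(f, C):
        # g(n) = (mu m < h(n). C(n, (m)_0, (m)_1))_0, h(n) = pair(f(n), f(n))
        def g(n):
            for m in range(pair(f(n), f(n))):
                m0, m1 = unpair(m)
                if C(n, m0, m1):
                    return m0
            return 0
        return g

    # Example: C(n, m, k) holds iff 2m + k = n with k < 2, so g(n) = n // 2.
    g = extract_g(lambda n: n + 2, lambda n, m, k: 2 * m + k == n and k < 2)
    assert [g(n) for n in range(6)] == [0, 0, 1, 1, 2, 2]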
4.2. ε0 -recursion in Peano arithmetic
We now set about showing that the provably recursive functions of
Peano arithmetic are exactly the “ε0 -recursive” functions, i.e., those de-
finable from the primitive recursive functions by substitutions and (arbi-
trarily nested) recursions over “standard” well-orderings of the natural
numbers with order types less than the ordinal
ε0 = sup{ω, ω^ω, ω^ω^ω, . . . }.
As preliminaries, we must first develop some of the basic theory of these or-
dinals, and their standard codings as well-orderings on N . Then we define
the hierarchies of fast-growing bounding functions naturally associated
with them. These will provide an important complexity characterization
through which we can more easily obtain the main result.
4.2.1. Ordinals below ε0 . Throughout the rest of this chapter, α, β,
γ, δ, . . . will denote ordinals less than ε0 . Every such ordinal is ei-
ther 0 or can be represented uniquely in so-called Cantor normal form
thus:
α = ω^β1 · c1 + ω^β2 · c2 + · · · + ω^βk · ck
where βk < · · · < β2 < β1 < α and the coefficients c1 , c2 , . . . , ck are
arbitrary positive integers. If βk = 0 then α is a successor ordinal, writ-
ten Succ(α), and its immediate predecessor α − 1 has the same repre-
sentation but with ck reduced to ck − 1. Otherwise α is a limit ordi-
nal, written Lim(α), and it has infinitely many possible “fundamental
sequences”, i.e., increasing sequences of smaller ordinals whose supre-
mum is α. However, we shall pick out one particular fundamental se-
quence {α(n) | n = 0, 1, 2, . . . } for each such limit ordinal α, as fol-
lows: First write α as γ + ω^β where γ = ω^β1 · c1 + · · · + ω^βk · (ck − 1)
and β = βk . Assume inductively that when β is a limit, its fundamen-
tal sequence {β(n)} has already been specified. Then define, for each
n ∈ N,
α(n) = γ + ω^(β−1) · (n + 1) if Succ(β),
α(n) = γ + ω^β(n) if Lim(β).
Clearly {α(n)} is an increasing sequence of ordinals with supremum α.
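These clauses translate directly into a small program. In the following Python sketch (our illustration, not the book's formal coding) an ordinal below ε0 is represented by its Cantor normal form as a list of (exponent, coefficient) pairs, exponents being ordinals of the same kind and listed in decreasing order, with the empty list representing 0:

    ZERO = []

    def is_succ(a):                # a successor iff its last exponent is 0
        return bool(a) and a[-1][0] == ZERO

    def pred(a):                   # predecessor of a successor ordinal
        e, c = a[-1]
        return a[:-1] if c == 1 else a[:-1] + [(e, c - 1)]

    def fund(a, n):                # the fundamental sequence a(n), Lim(a)
        e, c = a[-1]
        gamma = a[:-1] if c == 1 else a[:-1] + [(e, c - 1)]
        if is_succ(e):             # gamma + w^(b+1) -> gamma + w^b * (n+1)
            return gamma + [(pred(e), n + 1)]
        return gamma + [(fund(e, n), 1)]   # gamma + w^l -> gamma + w^(l(n))

    ONE = [(ZERO, 1)]
    OMEGA = [(ONE, 1)]
    assert fund(OMEGA, 3) == [(ZERO, 4)]                # omega(3) = 4
    assert fund([(OMEGA, 1)], 2) == [([(ZERO, 3)], 1)]  # (w^w)(2) = w^3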
Definition. With each α < ε0 and each natural number n, associate a
finite set of ordinals α[n] as follows:
α[n] = ∅ if α = 0,
α[n] = (α − 1)[n] ∪ {α − 1} if Succ(α),
α[n] = α(n)[n] if Lim(α).
Lemma. For each α = γ + ω^β and all n,
α[n] = γ[n] ∪ {γ + ω^δ1 · c1 + · · · + ω^δk · ck | ∀i (δi ∈ β[n] ∧ ci ≤ n)}.
Proof. By induction on β. If β = 0 then β[n] is empty and so the
right hand side is just γ[n] ∪ {γ}, which is the same as α[n] = (γ + 1)[n]
according to the definition above.
If β is a limit then β[n] = β(n)[n] so the set on the right hand side is the
same as the one with β(n)[n] instead of β[n]. By the induction hypothesis
applied to α(n) = γ + ω^β(n) , this set equals α(n)[n], which is just α[n]
again by definition.
Now suppose β is a successor. Then α is a limit and α[n] = α(n)[n]
where α(n) = γ + ω^(β−1) · (n + 1). This we can write as α(n) = α(n − 1) +
ω^(β−1) where, in case n = 0, α(−1) = γ. By the induction hypothesis for
β − 1, the set α[n] is therefore equal to
α(n − 1)[n] ∪ {α(n − 1) + ω^δ1 · c1 + · · · + ω^δk · ck |
∀i (δi ∈ (β − 1)[n] ∧ ci ≤ n)}
and similarly for each of α(n − 1)[n], α(n − 2)[n], . . . , α(1)[n]. Since for
each m ≤ n, α(m − 1) = γ + ω^(β−1) · m, this last set is the same as
γ[n] ∪ {γ + ω^(β−1) · m + ω^δ1 · c1 + · · · + ω^δk · ck |
m ≤ n ∧ ∀i (δi ∈ (β − 1)[n] ∧ ci ≤ n)}
and this is the set required because β[n] = (β − 1)[n] ∪ {β − 1}. This
completes the proof.
Corollary. For every limit α < ε0 and every n, α(n) ∈ α[n + 1].
Furthermore if β ∈ α[n] then ω^β ∈ (ω^α)[n] provided n ≠ 0.
Definition. The maximum coefficient of β = ω^δ1 · b1 + · · · + ω^δl · bl is
defined inductively to be the maximum of all the bi and all the maximum
coefficients of the exponents δi .
Lemma. If β < α and the maximum coefficient of β is ≤ n then β ∈ α[n].
Proof. By induction on α. Let α = γ + ω^δ . If β < γ, then β ∈ γ[n]
by induction hypothesis and γ[n] ⊆ α[n] by the last lemma. Otherwise
β = γ + ω^δ1 · b1 + · · · + ω^δk · bk with α > δ > δ1 > · · · > δk and bi ≤ n.
By induction hypothesis δi ∈ δ[n]. Hence β ∈ α[n] again by the last
lemma.
Definition. Let Gα (n) denote the cardinality of the finite set α[n].
Then immediately from the definition of α[n] we have
Gα (n) = 0 if α = 0,
Gα (n) = Gα−1 (n) + 1 if Succ(α),
Gα (n) = Gα(n) (n) if Lim(α).
The hierarchy of functions Gα is called the “slow-growing” hierarchy.
Lemma. If α = γ + ω^β then for all n
Gα (n) = Gγ (n) + (n + 1)^(Gβ (n)).
Therefore for each α < ε0 , Gα (n) is the elementary function which results
by substituting n + 1 for every occurrence of ω in the Cantor normal form
of α.
Proof. By induction on β. If β = 0 then α = γ + 1, so Gα (n) =
Gγ (n) + 1 = Gγ (n) + (n + 1)^0 as required. If β is a successor then α is a
limit and α(n) = γ + ω^(β−1) · (n + 1), so by n + 1 applications of the
induction hypothesis for β − 1 we have Gα (n) = Gα(n) (n) = Gγ (n) +
(n + 1)^(Gβ−1 (n)) · (n + 1) = Gγ (n) + (n + 1)^(Gβ (n)) since Gβ−1 (n) + 1 =
Gβ (n). Finally, if β is a limit then α(n) = γ + ω^β(n) , so applying the
induction hypothesis to β(n), we have Gα (n) = Gα(n) (n) = Gγ (n) +
(n + 1)^(Gβ(n) (n)) which immediately
gives the desired result since Gβ(n) (n) = Gβ (n) by definition.
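The lemma makes Gα directly computable: in the representation of the earlier sketch (a list of (exponent, coefficient) pairs, [] for 0), substituting n + 1 for ω is one line of Python (again our illustration):

    def G(a, n):   # slow-growing hierarchy: substitute n+1 for omega
        return sum(c * (n + 1) ** G(e, n) for (e, c) in a)

    ZERO = []
    OMEGA = [([(ZERO, 1)], 1)]
    assert G(OMEGA, 4) == 5           # G_omega(n) = n + 1
    assert G([(OMEGA, 1)], 2) == 27   # G_(w^w)(n) = (n+1)^(n+1)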
Definition (Coding ordinals). Encode each ordinal β = ω^β1 · b1 +
ω^β2 · b2 + · · · + ω^βl · bl by the sequence number β̄ constructed recursively
as follows:
β̄ = ⟨⟨β̄1 , b1 ⟩, ⟨β̄2 , b2 ⟩, . . . , ⟨β̄l , bl ⟩⟩.
The ordinal 0 is coded by the empty sequence number, which is 0. Note
that β̄ is numerically greater than the maximum coefficient of β, and
greater than the codes β̄i of all its exponents, and their exponents etcetera.
Lemma. (a) There is an elementary function h(m, n) such that, with
m = β̄,
h(β̄, n) = 0 if β = 0,
h(β̄, n) = the code of β − 1 if Succ(β),
h(β̄, n) = the code of β(n) if Lim(β).
(b) For each fixed α < ε0 there is an elementary well-ordering ≺α ⊂ N2
such that for all b, c ∈ N, b ≺α c if and only if b = β̄ and c = γ̄ for
some β < γ < α.
Proof. (a) Thinking of m as a β̄, define h(m, n) as follows: First set
h(0, n) = 0. Then if m is a non-zero sequence number, see if its final
(rightmost) component π2 (m) is a pair ⟨m′, n′⟩. If so, and m′ = 0 but
n′ ≠ 0, then β is a successor and the code of its predecessor, h(m, n), is
then defined to be the new sequence number obtained by reducing n′ by
one (or removing this final component altogether if n′ = 1). Otherwise
if π2 (m) = ⟨m′, n′⟩ where m′ and n′ are both positive, then β is a limit
of the form γ + ω^δ · n′ where m′ = δ̄. Now let k be the code of γ +
ω^δ · (n′ − 1), obtained by reducing n′ by one inside m (or if n′ = 1,
deleting the final component from m). Set k aside for the moment.
At the “right hand end” of β we have a spare ω^δ which, in order to
produce β(n), must be reduced to ω^(δ−1) · (n + 1) if Succ(δ), or to ω^δ(n)
if Lim(δ). Therefore the required code h(m, n) of β(n) will in this case
be obtained by tagging on to the end of the sequence number k one
extra pair coding this additional term. But if we assume inductively
that h(m′, n) has already been defined for m′ < m then this additional
component must be either ⟨h(m′, n), n + 1⟩ if Succ(δ) or ⟨h(m′, n), 1⟩ if
Lim(δ).
This defines h(m, n), once we agree to set its value to zero in all extrane-
ous cases where m is not a sequence number of the right form. However,
the definition so far given is a primitive recursion (depending on previ-
ous values for smaller m’s). To make it elementary we need to check
that h(m, n) is also elementarily bounded, for then h is defined by “lim-
ited recursion” from elementary functions, and we know that the result
will then be an elementary function. Now when m codes a successor
then, clearly, h(m, n) < m. In the limit case, h(m, n) is obtained from
the sequence number k (numerically smaller than m) by adding one new
pair on the end. Recall that an extra item i is tagged on to the end
of a sequence number k by the function ⟨k, i⟩ which is quadratic in
k and i. If the item added is the pair ⟨h(m′, n), n + 1⟩ where Succ(δ),
then h(m′, n) < m and so h(m, n) is numerically bounded by some fixed
polynomial in m and n. In the other case, however, all we can say imme-
diately is that h(m, n) is numerically less than some fixed polynomial of
m and h(m′, n). But since m′ codes an exponent in the Cantor normal
form coded by m, this second polynomial cannot be iterated more than
d times, where d is the “exponential height” of the normal form. There-
fore h(m, n) is bounded by some d-times iterated polynomial of m + n.
Since d < m it is therefore bounded by the elementary function 2^(2^(c(m+n)))
for some constant c. Thus h(m, n) is defined by limited recursion, so it is
elementary.
(b) Fix α < ε0 and let d be the exponential height of its Cantor
normal form. We use the function h just defined in part (a), and note
that if we only apply it to codes for ordinals below α, they will all have
exponential height ≤ d , and so with this restriction we can consider
h as being bounded by some fixed polynomial of its two arguments.
Define g(0, n) = ᾱ and g(i + 1, n) = h(g(i, n), n), and notice that g
is therefore bounded by an i-times iterated polynomial, so g is defined
by an elementarily limited recursion from h, and hence is itself elemen-
tary.
Now define b ≺α c if and only if c ≠ 0 and there are i and j
such that 0 < i < j ≤ Gα (max(b, c) + 1) and g(i, max(b, c)) = c
and g(j, max(b, c)) = b. Since the functions g and Gα are elemen-
tary, and since the quantifiers are bounded, the relation ≺α is elemen-
tary. Furthermore by the properties of h it is clear that if i < j then
g(i, max(b, c)) codes an ordinal greater than g(j, max(b, c)) (provided
the first is not zero). Hence if b ≺α c then b = β̄ and c = γ̄ for some
β < γ < α.
We must show the converse, so suppose b = β̄ and c = γ̄ where
β < γ < α. Then since the code of an ordinal is greater than its maximum
coefficient, we have β ∈ α[max(b, c)] and γ ∈ α[max(b, c)]. This means
that the sequence starting with α and at each stage descending from a δ
to either δ − 1 if Succ(δ) or δ(max(b, c)) if Lim(δ), must pass through γ
first and later β. In terms of codes it means that there is an i and a
j such that 0 < i < j and g(i, max(b, c)) = c and g(j, max(b, c)) = b.
Thus b ≺α c holds if we can show that j ≤ Gα (max(b, c) + 1). In
the descending sequence just described, only the successor stages actu-
ally contribute an element δ − 1 to α[max(b, c)]. At the limit stages,
δ(max(b, c)) does not get put in. However, although δ(n) does not be-
long to δ[n], it does belong to δ[n + 1]. Therefore all the ordinals in the
descending sequence lie in α[max(b, c) + 1]. So j can be no bigger than
the cardinality of this set, which is Gα (max(b, c) + 1). This completes the
proof.
Thus the principles of transfinite induction and transfinite recursion over
initial segments of the ordinals below ε0 can all be expressed in the language
of elementary recursive arithmetic.
4.2.2. Introducing the fast-growing hierarchy.
Definition. The “Hardy hierarchy” {Hα }α<ε0 is defined by recursion
on α thus (cf. Hardy [1904]):
Hα (n) = n if α = 0,
Hα (n) = Hα−1 (n + 1) if Succ(α),
Hα (n) = Hα(n) (n) if Lim(α).
The “fast-growing hierarchy” {Fα }α<ε0 is defined by recursion on α thus:
Fα (n) = n + 1 if α = 0,
Fα (n) = Fα−1^(n+1) (n) if Succ(α),
Fα (n) = Fα(n) (n) if Lim(α),
where Fα−1^(n+1) (n) is the (n + 1)-times iterate of Fα−1 on n.
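Both hierarchies run directly on the normal-form representation used in the earlier sketches (helpers repeated so that this illustration, again ours, is self-contained; feasible for tiny arguments only, since the values explode):

    ZERO = []
    def is_succ(a): return bool(a) and a[-1][0] == ZERO
    def pred(a):
        e, c = a[-1]
        return a[:-1] if c == 1 else a[:-1] + [(e, c - 1)]
    def fund(a, n):
        e, c = a[-1]
        g = a[:-1] if c == 1 else a[:-1] + [(e, c - 1)]
        return g + ([(pred(e), n + 1)] if is_succ(e) else [(fund(e, n), 1)])

    def H(a, n):                       # Hardy hierarchy H_a(n)
        while a != ZERO:
            if is_succ(a):
                a, n = pred(a), n + 1
            else:
                a = fund(a, n)
        return n

    def F(a, n):                       # fast-growing hierarchy F_a(n)
        if a == ZERO:
            return n + 1
        if is_succ(a):
            for _ in range(n + 1):     # (n+1)-fold iterate of F_(a-1)
                n = F(pred(a), n)
            return n
        return F(fund(a, n), n)

    ONE = [(ZERO, 1)]
    OMEGA = [(ONE, 1)]
    assert H(OMEGA, 3) == 7            # H_omega(n) = 2n + 1
    assert F([(ZERO, 2)], 3) == 63     # F_2(n) = 2^(n+1) * (n+1) - 1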
Note. The Hα and Fα functions could equally well be defined purely
number-theoretically, by working over the well-orderings ≺α instead of
directly over the ordinals themselves. Thus they are ε0 -recursive functions.
Lemma. For all α, β and all n,
(a) Hα+β (n) = Hα (Hβ (n)),
(b) Hω^α (n) = Fα (n).
Proof. The first part is proven by induction on β, the unstated as-
sumption being that the Cantor normal form of α + β is just the re-
sult of concatenating their two separate Cantor normal forms, so that
(α + β)(n) = α + β(n). This of course requires that the leading expo-
nent in the normal form of β is not greater than the final exponent in the
normal form of α. We shall always make this assumption when writing
α + β.
If β = 0 the equation holds trivially because H0 is the identity function.
If Succ(β) then by the definition of the Hardy functions and the induction
hypothesis for β − 1,
Hα+β (n) = Hα+(β−1) (n + 1) = Hα (Hβ−1 (n + 1)) = Hα (Hβ (n)).
If Lim(β) then by the induction hypothesis for β(n),
Hα+β (n) = Hα+β(n) (n) = Hα (Hβ(n) (n)) = Hα (Hβ (n)).
The second part is proved by induction on α. If α = 0 then Hω^0 (n) =
H1 (n) = n + 1 = F0 (n). If Succ(α) then by the limit case of the definition
of H , the induction hypothesis, and the first part above,
Hω^α (n) = Hω^(α−1)·(n+1) (n) = (Hω^(α−1))^(n+1) (n) = Fα−1^(n+1) (n) = Fα (n).
If Lim(α) then the equation follows immediately by the induction hypoth-
esis for α(n). This completes the proof.
Lemma. For each α < ε0 , Hα is strictly increasing and Hβ (n) < Hα (n)
whenever β ∈ α[n]. The same holds for Fα , with the slight restriction that
n ≠ 0, for when n = 0 we have Fα (0) = 1 for all α.
Proof. By induction on α. The case α = 0 is trivial since H0 is
the identity function and 0[n] is empty. If Succ(α) then Hα is Hα−1
composed with the successor function, so it is strictly increasing by the
induction hypothesis. Furthermore if β ∈ α[n] then either β ∈ (α − 1)[n]
or β = α − 1 so, again by the induction hypothesis, Hβ (n) ≤ Hα−1 (n) <
Hα−1 (n + 1) = Hα (n). If Lim(α) then Hα (n) = Hα(n) (n) < Hα(n) (n + 1)
by the induction hypothesis. But as noted previously, α(n) ∈ α[n + 1] =
α(n+1)[n+1], so by applying the induction hypothesis to α(n+1) we have
Hα(n) (n + 1) < Hα(n+1) (n + 1) = Hα (n + 1). Thus Hα (n) < Hα (n + 1).
Furthermore if β ∈ α[n] then β ∈ α(n)[n] so Hβ (n) < Hα(n) (n) = Hα (n)
straight away by the induction hypothesis for α(n).
The same holds for Fα = Hω^α provided we restrict to n ≠ 0, since if
β ∈ α[n] we then have ω^β ∈ (ω^α)[n]. This completes the proof.
Lemma. If β ∈ α[n] then Fβ+1 (m) ≤ Fα (m) for all m ≥ n.
Proof. By induction on α, the zero case being trivial. If α is a successor
then either β ∈ (α − 1)[n] in which case the result follows straight from
the induction hypothesis, or β = α − 1 in which case it’s immediate. If α
is a limit then we have β ∈ α(n)[n] and hence by the induction hypothesis
Fβ+1 (m) ≤ Fα(n) (m). But Fα(n) (m) ≤ Fα (m) either by definition of F in
case m = n, or by the last lemma when m > n since then α(n) ∈ α[m].
4.2.3. α-recursion and ε0 -recursion.
Definition (α-recursion).
(a) An α-recursion is a function-definition of the following form, defining
f : Nk+1 → N from given functions g0 , g1 , . . . , gs by two clauses (in
the second, n ≠ 0):
f(0, m⃗ ) = g0 (m⃗ ),
f(n, m⃗ ) = T (g1 , . . . , gs , f≺n , n, m⃗ )
where T (g1 , . . . , gs , f≺n , n, m⃗ ) is a fixed term built up from the num-
ber variables n, m⃗ by applications of the functions g1 , . . . , gs and the
function f≺n given by
f≺n (n′, m⃗ ) = f(n′, m⃗ ) if n′ ≺α n, and f≺n (n′, m⃗ ) = 0 otherwise.
It is of course always assumed, when doing α-recursion, that α ≠ 0.
(b) An unnested α-recursion is one of the special form:
f(0, m⃗ ) = g0 (m⃗ ),
f(n, m⃗ ) = g1 (n, m⃗ , f(g2 (n, m⃗ ), . . . , gk+2 (n, m⃗ )))
with just one recursive call on f, where g2 (n, m⃗ ) ≺α n for all n and m⃗ .
(c) Let ε0 (0) = ω and ε0 (i + 1) = ω^ε0(i) . Then for each fixed i, a
function is said to be ε0 (i)-recursive if it can be defined from primitive
recursive functions by successive substitutions and α-recursions with
α < ε0 (i). It is unnested ε0 (i)-recursive if all the α-recursions used in
its definition are unnested. It is ε0 -recursive if it is ε0 (i)-recursive for
some (any) i.
Note. The ε0 (0)-recursive functions are just the primitive recursive
ones, since if α < ω then α-recursion is just a finitely iterated substi-
tution. So the definition of ε0 (0)-recursion simply amounts to the closure
of the primitive recursive functions under substitution, which of course
does not enlarge the primitive recursive class.
Lemma (Bounds for α-recursion). Suppose f is defined from g1 , . . . , gs
by an α-recursion:
f(0, m⃗ ) = g0 (m⃗ ),
f(n, m⃗ ) = T (g1 , . . . , gs , f≺n , n, m⃗ )
where for each i ≤ s, gi (a⃗) < Fγ (k + max a⃗) for all numerical arguments
a⃗. (The γ and k are arbitrary constants, but it is assumed that the last
exponent in the Cantor normal form of γ is ≥ the first exponent in the
normal form of α, so that γ + α is automatically in Cantor normal form.)
Then there is a constant d such that for all n, m⃗ ,
f(n, m⃗ ) < Fγ+α (k + 2d + max(n, m⃗ )).
Proof. The constant d will be the depth of nesting of the term T ,
where variables have depth of nesting 0 and each compositional term
g(T1 , . . . , Tl ) has depth of nesting one greater than the maximum depth
of nesting of the subterms Tj .
First suppose n lies in the field of the well-ordering ≺α . Then n = β̄ for
some β < α. We claim by induction on β that
f(n, m⃗ ) < Fγ+β+1 (k + 2d + max(n, m⃗ )).
This holds immediately when n = 0, because g0 (m⃗ ) < Fγ (k + max m⃗ )
and Fγ is strictly increasing and bounded by Fγ+1 . So suppose n ≠ 0 and
assume the claim for all n′ = β̄′ where β′ < β.
Let T ′ be any subterm of T (g1 , . . . , gs , f≺n , n, m⃗ ) with depth of nesting
d′, built up by application of one of the functions g1 , . . . , gs or f≺n to
subterms T1 , . . . , Tl . Now assume (for a sub-induction on d′) that each of
these Tj ’s has numerical value vj less than Fγ+β^(2(d′−1)) (k + 2d′ + max(n, m⃗ )).
If T ′ is obtained by application of one of the functions gi then its numerical
value will be
gi (v1 , . . . , vl ) < Fγ (k + Fγ+β^(2(d′−1)) (k + 2d′ + max(n, m⃗ )))
< Fγ+β^(2d′) (k + 2d′ + max(n, m⃗ ))
since if k < u then Fδ (k + u) < Fδ (2u) < Fδ^2 (u) provided δ ≠ 0. On the
other hand, if T ′ is obtained by application of the function f≺n , its value
will be f(v1 , . . . , vl ) if v1 ≺α n, or 0 otherwise. Suppose v1 = β̄′ ≺α β̄.
Then by the induction hypothesis,
f(v1 , . . . , vl ) < Fγ+β′+1 (k + 2d + max v⃗ ) ≤ Fγ+β (k + 2d + max v⃗ )
because v1 is greater than the maximum coefficient of β′, so β′ ∈ β[v1 ], so
γ + β′ ∈ (γ + β)[v1 ] and hence Fγ+β′+1 is bounded by Fγ+β on arguments
≥ v1 . Therefore, inserting the assumed bounds for the vj , we have
f(v1 , . . . , vl ) < Fγ+β (k + 2d + Fγ+β^(2(d′−1)) (k + 2d′ + max(n, m⃗ )))
and then by the same argument as before,
f(v1 , . . . , vl ) < Fγ+β^(2d′) (k + 2d′ + max(n, m⃗ )).
We have now shown that the value of every subterm of T with depth of
nesting d′ is less than Fγ+β^(2d′) (k + 2d′ + max(n, m⃗ )). Applying this to T
itself with depth of nesting d we thus obtain
f(n, m⃗ ) < Fγ+β^(2d) (k + 2d + max(n, m⃗ ))
< Fγ+β+1 (k + 2d + max(n, m⃗ ))
as required. This proves the claim.
To derive the result of the lemma is now easy. If n = β̄ lies in the field
of ≺α then γ + β + 1 ∈ (γ + α)[n] and so
f(n, m⃗ ) < Fγ+β+1 (k + 2d + max(n, m⃗ )) ≤ Fγ+α (k + 2d + max(n, m⃗ )).
If n does not lie in the field of ≺α then the function f≺n is the constant zero
function, and so in evaluating f(n, m⃗ ) by the term T only applications of
the gi -functions come into play. Therefore a much simpler version of the
above argument gives the desired
f(n, m⃗ ) < Fγ^(2d) (k + 2d + max(n, m⃗ )) < Fγ+α (k + 2d + max(n, m⃗ ))
since α ≠ 0. This completes the proof.
Theorem. For each i, a function is ε0 (i)-recursive if and only if it is
register-machine computable in a number of steps bounded by Fα for some
α < ε0 (i).
Proof. For the “if” part, recall that for every register-machine com-
putable function g there is an elementary function U such that for all
arguments m⃗ , if s(m⃗ ) bounds the number of steps needed to compute
g(m⃗ ), then g(m⃗ ) = U (m⃗ , s(m⃗ )). Thus if g is computable in a number of
steps bounded by Fα , this means that g can be defined from Fα by the
substitution
g(m⃗ ) = U (m⃗ , Fα (max m⃗ )).
Hence g will be ε0 (i)-recursive if Fα is. We therefore need to show that
if α < ε0 (i) then Fα is ε0 (i)-recursive. This is clearly true when i = 0
since then α is finite, and the finite levels of the F hierarchy are all prim-
itive recursive, and therefore ε0 (0)-recursive. Suppose then that i > 0,
and that α = ω^{α1}·c1 + · · · + ω^{αk}·ck is less than ε0 (i). Adding one to
each exponent, and inserting a successor term at the end, produces the
ordinal β = α′ + n where α′ is the limit ω^{α1+1}·c1 + · · · + ω^{αk+1}·ck.
Since i > 0 it is still the case that β < ε0 (i). Obviously, from the code
for α, here denoted a, we can elementarily compute the code for α′,
denoted a′, and then b = (a′, 0, n) will be the code for β. Con-
versely from such a b we can elementarily decode a′ and hence a, and
also the n. Choosing a large enough γ < ε0 (i) so that β < γ, we can
now define a function f(b, m) by γ-recursion, with the property that
when b is the code for β = α′ + n, then f(b, m) = Fα^n(m). To ex-
plicate matters we shall expose the components from which b is con-
structed by writing b = (a, n). Then the recursion defining f(b, m) =
f((a, n), m) has the following form, using the elementary function h(a, n)
defined earlier, which gives the code for α − 1 if Succ(α), or α(n) if
Lim(α):
f((a, n), m) =
   m + n                           if a = 0 or n = 0
   f((h(a, m), m + 1), m)          if Succ(a) and n = 1
   f((h(a, m), 1), m)              if Lim(a) and n = 1
   f((a, 1), f((a, n − 1), m))     if n > 1
   0                               otherwise.
Clearly then f is ε0 (i)-recursive, and Fα (m) = f((ᾱ, 1), m), so Fα is
ε0 (i)-recursive for every α < ε0 (i).
For the “only if” part note first that the number of steps needed to
compute a compositional term g(T1 , . . . , Tl ) is the sum of the numbers
of steps needed to compute all the subterms Tj , plus the number of steps
needed to compute g(v1 , . . . , vl ) where vj is the value of Tj . Furthermore,
in a register-machine computation, these values vj are bounded by the
number of computation steps plus the maximum input. This means that
we can compute a bound on the computation-steps for any such term,
and we can do it elementarily from given bounds for the input data. Now
suppose f(n, m⃗) = T(g1, . . . , gs, f≺n, n, m⃗) is any recursion-step of an
α-recursion. Then if we are given bounding functions on the numbers
of steps to compute each of the gi's, and we assume inductively that
we already have a bound on the number of steps to compute f(n′, −)
whenever n′ ≺α n, it follows that we can elementarily estimate a bound
on the number of steps to compute f(n, m⃗). In other words, for any
function defined by an α-recursion from given functions g⃗, a bounding
function (on the number of steps needed to compute f) is also definable
by α-recursion from given bounding functions for the g's. Exactly the
same thing holds for primitive recursions. But in the preceding lemma
we showed that as we successively define functions by α-recursions, with
α < ε0 (i), their values are bounded by functions F_{β+α} where also β <
ε0 (i). But ε0 (i) is closed under addition, so β + α < ε0 (i). Hence every
ε0 (i)-recursive function is register-machine computable in a number of
steps bounded by some Fγ where γ < ε0 (i). This completes the proof.
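As an aside, the recursion displayed in the proof of the theorem above is directly executable. The following Python sketch is our illustration, not part of the original text: it codes an ordinal below ε0 in Cantor normal form as a tuple of (exponent, coefficient) pairs with exponents strictly decreasing (the empty tuple coding 0), implements h(a, n) with the fundamental sequences used here (so that (ω^{δ+1})(n) = ω^δ·(n + 1)), and then transcribes the displayed recursion for f((a, n), m) = Fα^n(m). All names are our own choices, and only tiny inputs are feasible, since Fα grows so fast.

    # Ordinal codes: tuples of (exponent, coefficient) pairs with exponents
    # strictly decreasing; exponents are themselves such tuples. () codes 0.
    ZERO = ()
    ONE = ((ZERO, 1),)       # omega^0 * 1
    OMEGA = ((ONE, 1),)      # omega^1 * 1

    def is_succ(a):
        # a codes a successor ordinal iff its last exponent is 0
        return a != ZERO and a[-1][0] == ZERO

    def h(a, n):
        # code for alpha - 1 if Succ(alpha), for alpha(n) if Lim(alpha)
        *rest, (e, c) = a
        if e == ZERO:                            # successor: remove one from the tail
            return tuple(rest) + (((e, c - 1),) if c > 1 else ())
        if c > 1:                                # gamma + omega^e * c with c > 1
            return tuple(rest) + ((e, c - 1),) + h(((e, 1),), n)
        if is_succ(e):                           # (omega^(d+1))(n) = omega^d * (n + 1)
            return tuple(rest) + ((h(e, n), n + 1),)
        return tuple(rest) + ((h(e, n), 1),)     # (omega^lambda)(n) = omega^(lambda(n))

    def f(a, n, m):
        # the displayed recursion: f((a, n), m) = F_alpha^n(m)
        if a == ZERO or n == 0:
            return m + n
        if n == 1:
            return f(h(a, m), m + 1 if is_succ(a) else 1, m)
        return f(a, 1, f(a, n - 1, m))

    def F(a, m):
        return f(a, 1, m)

    assert F(ZERO, 5) == 6                  # F_0(m) = m + 1
    assert F(ONE, 5) == 11                  # F_1(m) = 2m + 1
    assert F(((ZERO, 2),), 2) == 23         # F_2(2) = 2^3 * 3 - 1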
The following reduction of nested to unnested recursion is due to Tait
[1961]; see also Fairtlough and Wainer [1992].
Corollary. For each i, a function is ε0 (i)-recursive if and only if it is
unnested ε0 (i + 1)-recursive.
Proof. By the theorem, every ε0 (i)-recursive function is computable in
“time” bounded by Fα = H_{ω^α} where α < ε0 (i). It is therefore primitive
recursively definable from H_{ω^α}. But H_{ω^α} is defined by an unnested ω^α-
recursion, and clearly ω^α < ε0 (i + 1). Hence arbitrarily nested ε0 (i)-
recursions are reducible to unnested ε0 (i + 1)-recursions.
Conversely, suppose f is defined from given functions g0, g1, . . . , gk+2
by an unnested α-recursion where α < ε0 (i + 1):
f(0, m⃗) = g0(m⃗)
f(n, m⃗) = g1(n, m⃗, f(g2(n, m⃗), . . . , gk+2(n, m⃗)))
with g2(n, m⃗) ≺α n for all n and m⃗. Then the number of recursion-steps
needed to compute f(n, m⃗) is f′(n, m⃗) where
f′(0, m⃗) = 0
f′(n, m⃗) = 1 + f′(g2(n, m⃗), . . . , gk+2(n, m⃗))
and f is then primitive recursively definable from g2, . . . , gk+2 and any
bound for f′. Now assume that the given functions gj are all primitive
recursively definable from, and bounded by, Hδ where δ < ε0 (i + 1). Then
a similar, but easier, argument to that used in proving the lemma above
providing bounds for α-recursion shows that f′(n, m⃗) is bounded by H_{δ·γ}
where n = γ̄. This is simply because
H_{δ·(γ+1)}(x) = H_{δ·γ+δ}(x) = H_{δ·γ}(Hδ(x)).
Therefore f is primitive recursively definable from Hδ and H_{δ·α}. Clearly,
since δ, α < ε0 (i + 1) we may choose δ = ω^{δ′} and α = ω^{α′} for appropriate
α′ ≤ δ′ < ε0 (i). Then Hδ = F_{δ′} and H_{δ·α} = F_{δ′+α′} where of course
δ′ + α′ < ε0 (i). Therefore f is ε0 (i)-recursive.
4.2.4. Provable recursiveness of Hα and Fα . We now prove that for every
α < ε0 (i), with i > 0, the function Fα is provably recursive in the theory
IΣi+1 .
Since all of the machinery we have developed for coding ordinals below
ε0 is elementary, we can safely assume that it can be defined (with all
relevant properties proven) in IΔ0 (exp). In particular we shall again make
use of the function h such that if a codes a successor ordinal α then h(a, n)
codes α − 1, and if a codes a limit ordinal α then h(a, n) codes α(n). Note
that we can decide whether a codes a successor ordinal (Succ(a)) or a limit
ordinal (Lim(a)) by asking whether h(a, 0) = h(a, 1) or not. It is easiest
to develop first the provable recursiveness of the Hardy functions Hα ,
since they have a simpler, unnested recursive definition. The fast-growing
functions are then easily obtained by the equation Fα = H_{ω^α}.
168 4. The provably recursive functions of arithmetic
Definition. Let H (a, x, y, z) be a Δ0 (exp)-formula expressing
(z)0 = ⟨0, y⟩ ∧ (z)_{lh(z)−1} = ⟨a, x⟩ ∧
∀i<lh(z) (lh((z)i ) = 2 ∧ (i > 0 → (z)i,0 > 0)) ∧
∀0<i<lh(z) (Succ((z)i,0 ) → (z)i−1,0 = h((z)i,0 , (z)i,1 ) ∧
(z)i−1,1 = (z)i,1 +1) ∧
∀0<i<lh(z) (Lim((z)i,0 ) → (z)i−1,0 = h((z)i,0 , (z)i,1 ) ∧ (z)i−1,1 = (z)i,1 ).
Lemma (Definability of Hα ). Hα (n) = m if and only if ∃z H (ᾱ, n, m, z)
is true. Furthermore, for each α < ε0 we can prove in IΣ1 ,
∃z H(ᾱ, x, y, z) ∧ ∃z H(ᾱ, x, y′, z) → y = y′.
Proof. The meaning of the formula ∃z H(ᾱ, n, m, z) is that there is a
finite sequence of pairs ⟨αi, ni⟩, beginning with ⟨0, m⟩ and ending with
⟨α, n⟩, such that at each i > 0, if Succ(αi) then αi−1 = αi − 1 and
ni−1 = ni + 1, and if Lim(αi) then αi−1 = αi(ni) and ni−1 = ni. Thus
by induction up along the sequence, and using the original definition of
Hα, we easily see that for each i > 0, H_{αi}(ni) = m, and thus at the end,
Hα(n) = m. Conversely, if Hα(n) = m then there must exist such a
computation sequence, and this proves the first part of the lemma.
For the second part notice that, by induction on the length of the
computation sequence s, we can prove for each n, m, m′, s, s′ that
H(ᾱ, n, m, s) → H(ᾱ, n, m′, s′) → s = s′ ∧ m = m′.
This proof can be formalized directly in IΔ0 (exp) to give
H(ᾱ, x, y, z) → H(ᾱ, x, y′, z′) → z = z′ ∧ y = y′
and hence
∃z H(ᾱ, x, y, z) → ∃z H(ᾱ, x, y′, z) → y = y′.
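As a concrete illustration (ours, not the book's): the computation sequences that z codes are trivial to generate, reusing the CNF ordinal codes and the function h from the sketch in 4.2.3. Iterating the two clauses ⟨α + 1, n⟩ ↦ ⟨α, n + 1⟩ and ⟨λ, n⟩ ↦ ⟨λ(n), n⟩ and recording the pairs produces, read in reverse, exactly a sequence witnessing ∃z H(ᾱ, n, m, z):

    def hardy_sequence(a, n):
        # pairs <alpha_i, n_i> from <a, n> down to <0, m>; reversed, this is
        # a computation sequence in the sense above, with H_a(n) = m at the end
        pairs = [(a, n)]
        while a != ZERO:
            if is_succ(a):
                a, n = h(a, n), n + 1        # H_(alpha+1)(n) = H_alpha(n + 1)
            else:
                a = h(a, n)                  # H_lambda(n) = H_(lambda(n))(n)
            pairs.append((a, n))
        return pairs

    # Example: H_omega(3) = H_4(3) = 7
    assert hardy_sequence(OMEGA, 3)[-1] == (ZERO, 7)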
Remark. Thus in order for Hα to be provably recursive it remains only
to prove (in the required theory) ∃y ∃z H (ᾱ, x, y, z).
Lemma. In IΔ0 (exp) we can prove
∃z H(ω^a, x, y, z) → ∃z H(ω^a·c, y, w, z) → ∃z H(ω^a·(c + 1), x, w, z)
where ω^a·c is the elementary term ⟨a, c⟩ which constructs, from the code
a of an ordinal α, the code for the ordinal ω^α · c.
Proof. By assumption we have sequences s, s′ satisfying H(ω^a, x, y, s)
and H(ω^a·c, y, w, s′). Add ω^a·c to the first component of each pair in s.
Then the last pair in s and the first pair in s′ become identical. By
concatenating the two—taking this repeated pair only once—construct
an elementary term t(s, s′) satisfying H(ω^a·(c + 1), x, w, t). We can then
prove
H(ω^a, x, y, s) → H(ω^a·c, y, w, s′) → H(ω^a·(c + 1), x, w, t)
in a conservative extension of IΔ0 (exp), and hence in IΔ0 (exp) derive
∃z H(ω^a, x, y, z) → ∃z H(ω^a·c, y, w, z) → ∃z H(ω^a·(c + 1), x, w, z).
Lemma. Let H(ω^a) be the Π2-formula ∀x ∃y ∃z H(ω^a, x, y, z). Then with
Π2-induction we can prove the following:
(a) H(ω^0).
(b) Succ(a) → H(ω^{h(a,0)}) → H(ω^a).
(c) Lim(a) → ∀x H(ω^{h(a,x)}) → H(ω^a).
Proof. The term t0 = ⟨⟨0, x + 1⟩, ⟨ω^0, x⟩⟩ witnesses H(ω^0, x, x + 1, t0)
in IΔ0 (exp), so H(ω^0) is immediate.
With the aid of the lemma just proven we can derive
H(ω^{h(a,0)}) → H(ω^{h(a,0)}·c) → H(ω^{h(a,0)}·(c + 1)).
Therefore by Π2-induction we obtain
H(ω^{h(a,0)}) → H(ω^{h(a,0)}·(x + 1))
and then
H(ω^{h(a,0)}) → ∃y ∃z H(ω^{h(a,0)}·(x + 1), x, y, z).
But there is an elementary term t1 with the property
Succ(a) → H(ω^{h(a,0)}·(x + 1), x, y, z) → H(ω^a, x, y, t1)
since t1 only needs to tag on to the end of the sequence z the new pair
⟨ω^a, x⟩, thus t1 = (z, ⟨ω^a, x⟩). Hence by the quantifier rules,
Succ(a) → H(ω^{h(a,0)}) → H(ω^a).
The final case is now straightforward, since the term t1 just constructed
also gives
Lim(a) → H(ω^{h(a,x)}, x, y, z) → H(ω^a, x, y, t1)
and so by quantifier rules again,
Lim(a) → ∀x H(ω^{h(a,x)}) → H(ω^a).
Definition (Structural transfinite induction). The structural progres-
siveness of a formula A(a) is expressed by SProga A, which is the con-
junction of the formulas A(0), ∀a (Succ(a) → A(h(a, 0)) → A(a)), and
∀a (Lim(a) → ∀x A(h(a, x)) → A(a)). The principle of structural transfi-
nite induction up to an ordinal α is then the following axiom schema, for
all formulas A:
SProga A → ∀a≺ᾱ A(a)
where a ≺ ᾱ means a lies in the field of the well-ordering ≺α , in other
words a = 0 ∨ 0 ≺α a.
Note. The last lemma shows that the Π2-formula H(ω^a) is structurally
progressive, and that this is provable with Π2-induction.
We now make use of a famous result of Gentzen [1936], which says that
transfinite induction is provable in arithmetic up to any α < ε0 . For later
use we prove this fact in a slightly more general form, where one can recur
to all points strictly below the present one, and need not refer explicitly to
distinguished fundamental sequences.
Definition (Transfinite induction). The (general) progressiveness of a
formula A(a) is
Proga A := ∀a (∀b≺a A(b) → A(a)).
The principle of transfinite induction up to an ordinal α is the schema
Proga A → ∀a≺ᾱ A(a)
where again a ≺ ᾱ means a lies in the field of the well-ordering ≺α .
Lemma. Structural transfinite induction up to α is derivable from trans-
finite induction up to α.
Proof. Let A be an arbitrary formula and assume SProga A; we must
show ∀a≺ᾱ A(a). Using transfinite induction for the formula a ≺ ᾱ →
A(a) it suffices to prove
∀a (∀b≺a;b≺ᾱ A(b) → a ≺ ᾱ → A(a))
which is equivalent to
∀a≺ᾱ (∀b≺a A(b) → A(a)).
This is easily proved from SProga A, using the properties of the h function,
and distinguishing the cases a = 0, Succ(a) and Lim(a).
Remark. Induction over an arbitrary well-founded set is an easy conse-
quence. Comparisons are made by means of a “measure function” μ, into
an initial segment of the ordinals. The principle of “general induction”
up to an ordinal α is
Prog^μ_x A(x) → ∀x;μx≺ᾱ A(x)
where Prog^μ_x A(x) expresses “μ-progressiveness” w.r.t. the measure func-
tion μ and the ordering ≺ := ≺α:
Prog^μ_x A(x) := ∀a (∀y;μy≺a A(y) → ∀x;μx=a A(x)).
We claim that general induction up to an ordinal α is provable from
transfinite induction up to α.
Proof. Assume Prog^μ_x A(x); we must show ∀x;μx≺ᾱ A(x). Consider
B(a) := ∀x;μx=a A(x).
It suffices to prove ∀a≺ᾱ B(a), which is ∀a≺ᾱ ∀x;μx=a A(x). By transfinite
induction it suffices to prove Proga B, which is
∀a (∀b≺a ∀y;μy=b A(y) → ∀x;μx=a A(x)).
But this follows from the assumption Prog^μ_x A(x), since ∀b≺a ∀y;μy=b A(y)
implies ∀y;μy≺a A(y).
4.2.5. Gentzen’s theorem on transfinite induction in PA. To complete the
provable recursiveness of Hα and Fα we make use of Gentzen’s analysis of
provable instances of transfinite induction below ε0 , subsequently refined
by Parsons [1972], [1973]. In the proof we will need some properties of
≺, and of the elementary addition function ⊕ on ordinal codes, which
concatenates ᾱ with β̄ to form ᾱ ⊕ β̄, the code of α + β. These can all be proved
in IΔ0 (exp): e.g., irreflexivity and transitivity of ≺, and also—following
Schütte—
a ≺ 0 → A,                                                    (1)
c ≺ b ⊕ ω^0 → (c ≺ b → A) → (c = b → A) → A,                  (2)
a ⊕ 0 = a,                                                    (3)
a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c,                                    (4)
0 ⊕ a = a,                                                    (5)
ω^a · 0 = 0,                                                  (6)
ω^a · (x + 1) = ω^a · x ⊕ ω^a,                                (7)
a ≠ 0 → c ≺ b ⊕ ω^a → c ≺ b ⊕ ω^{e(a,b,c)} · m(a, b, c),      (8)
a ≠ 0 → c ≺ b ⊕ ω^a → e(a, b, c) ≺ a.                         (9)
Here, e and m denote appropriate function constants (the reader should
check that they can both be taken to be elementary).
Theorem (Gentzen, Parsons). For every Π2 -formula F and each i > 0
we can prove in IΣi+1 the principle of transfinite induction up to α for all
α < ε0 (i).
Proof. Starting with any Πj-formula A(a), we construct the formula
A⁺(a) := ∀b (∀c≺b A(c) → ∀c≺b⊕ω^a A(c)).
Note that since A is Πj then, by reduction to prenex form, A⁺ is (provably
equivalent to) a Πj+1-formula. The crucial point is that
IΣj ⊢ Proga A(a) → Proga A⁺(a).
So assume Proga A(a), that is, ∀a (∀b≺a A(b) → A(a)), and
∀b≺a A⁺(b).                                                  (10)
We have to show A⁺(a). So assume further
∀c≺b A(c)                                                     (11)
and c ≺ b ⊕ ω^a. We have to show A(c).
If a = 0, then c ≺ b ⊕ ω^0. By (2) it suffices to derive A(c) from c ≺ b
as well as from c = b. If c ≺ b, then A(c) follows from (11), and if c = b,
then A(c) follows from (11) and Proga A.
If a ≠ 0, from c ≺ b ⊕ ω^a we obtain c ≺ b ⊕ ω^{e(a,b,c)} · m(a, b, c) by (8)
and e(a, b, c) ≺ a by (9). From (10) we obtain A⁺(e(a, b, c)). By the
definition of A⁺(x) we get
∀u≺b⊕ω^{e(a,b,c)}·x A(u) → ∀u≺(b⊕ω^{e(a,b,c)}·x)⊕ω^{e(a,b,c)} A(u)
and hence, using (4) and (7),
∀u≺b⊕ω^{e(a,b,c)}·x A(u) → ∀u≺b⊕ω^{e(a,b,c)}·(x+1) A(u).
Also from (11) and (6), (3) we obtain
∀u≺b⊕ω^{e(a,b,c)}·0 A(u).
Using an appropriate instance of Πj-induction we then conclude
∀u≺b⊕ω^{e(a,b,c)}·m(a,b,c) A(u)
and hence A(c). Thus IΣj ⊢ Proga A(a) → Proga A⁺(a).
Now fix i > 0 and (throughout the rest of this proof) let ≺ denote
the well-ordering ≺ε0 (i). Given any Π2-formula F(v) define A(a) to be
the formula ∀v≺a F(v). Then (contracting like quantifiers) A becomes Π2
also, and furthermore it is easy to see that Progv F(v) → Proga A(a) is
derivable in IΔ0 (exp). Therefore by iterating the above procedure i times
starting with j = 2, we obtain successively the formulas A⁺, A⁺⁺, . . . , A^(i)
where A^(i) is Πi+2 and
IΣi+1 ⊢ Progv F(v) → Progu A^(i)(u).
Now fix any α < ε0 (i) and choose k so that α ≤ ε0 (i)(k). By applying
k + 1 times the progressiveness of A^(i)(u), one obtains A^(i)(k + 1) without
need of any further induction, since k is fixed. Therefore
IΣi+1 ⊢ Progv F(v) → A^(i)(k + 1).
But by instantiating the outermost universally quantified variable of A^(i)
to zero we have A^(i)(k + 1) → A^(i−1)(ω^{k+1}). Again instantiating to zero
the outermost universally quantified variable in A^(i−1) we similarly obtain
A^(i−1)(ω^{k+1}) → A^(i−2)(ω^{ω^{k+1}}). Continuing in this way, and noting that
ε0 (i)(k) consists of an exponential stack of i ω's with k + 1 on the top,
we finally get down (after i steps) to
IΣi+1 ⊢ Progv F(v) → A(ε0 (i)(k)).
Since A(ε0 (i)(k)) is just ∀v≺ε0 (i)(k) F(v) we have therefore proved, in IΣi+1,
transfinite induction for F up to ε0 (i)(k), and hence up to the given α.
Theorem. For each i and every α < ε0 (i), the fast-growing function Fα
is provably recursive in IΣi+1 .
Proof. If i = 0 then α is finite and Fα is therefore primitive recursive,
so it is provably recursive in IΣ1 .
Now suppose i > 0. Since Fα = H_{ω^α} we need only show, for every α <
ε0 (i), that H_{ω^α} is provably recursive in IΣi+1. But a lemma above shows
that its defining Π2-formula H(ω^a) is provably (structurally) progressive
in IΣ2, and therefore by Gentzen’s result,
IΣi+1 ⊢ ∀a≺ᾱ H(ω^a).
One further application of progressiveness then gives
IΣi+1 ⊢ H(ω^ᾱ)
which, together with the definability of Hα already proven earlier, com-
pletes the provable Σ1-definability of H_{ω^α} in IΣi+1.
Corollary. Any ε0 (i)-recursive function is provably recursive in IΣi+1 .
Proof. We have seen already that each ε0 (i)-recursive function is
register-machine computable in a number of steps bounded by some Fα
with α < ε0 (i). Consequently, each such function is primitive recursively,
even elementarily, definable from an Fα which itself is provably recursive
in IΣi+1 . But primitive recursions only need Σ1 -inductions to prove them
defined (see 4.1). Thus in IΣi+1 we can prove the Σ1 -definability of all
ε0 (i)-recursive functions.
4.3. Ordinal bounds for provable recursion in PA
For the converse of the above result we perform an ordinal analysis of PA
proofs in a system which allows higher levels of induction to be reduced,
via cut elimination, to Σ1 -inductions. The cost of such reductions is a
successive exponential increase in the ordinals involved, but in the end, by
a generalization of Parsons’ theorem on primitive recursion, this enables
us to read off fast-growing bounding functions for provable recursion.
It would be naive to try to carry through cut elimination directly on
PA proofs since the inductions would get in the way. Instead, following
Schütte [1951], the trick is to unravel the inductions by means of the
-rule: from the infinite sequence of premises {A(n) | n ∈ N} derive
∀x A(x). The disadvantage is that this embeds PA into a “semi-formal”
system with an infinite rule, so proofs will now be well-founded trees
with ordinals measuring their heights. The advantage is that this system
admits cut elimination, and furthermore it bears a close relationship with
the fast-growing hierarchy, as we shall see.
4.3.1. The infinitary system n : N ⊢^α Γ. We shall inductively generate,
according to the rules below, an infinitary system of (classical) one-sided
sequents
n : N ⊢^α Γ
in Tait style (i.e., with negation of compound formulas defined by de
Morgan’s laws) where:
(i) n : N is a new kind of atomic formula, declaring a bound on numer-
ical “inputs” from which terms appearing in Γ are computed according
to the N -rules and axioms.
(ii) Γ is any finite set of closed formulas, either of the form m : N , or else
formulas in the language of arithmetic based on {=, 0, S, P, +, − · , ·, exp2 },
possibly with the addition of any number of further primitive recursively
defined function symbols. Recall that Γ, A denotes the set Γ ∪ {A}, and
Γ, Γ denotes Γ ∪ Γ etc.
(iii) Ordinals α, β, γ, . . . denote bounds on the heights of derivations,
assigned in a carefully controlled way due originally to Buchholz whose
work strongly influences and underpins the infinitary systems here and in
the next chapter; see Buchholz [1987], also Buchholz and Wainer [1987]
and Fairtlough and Wainer [1998]. Essentially, the condition is that if
a sequent with bound α is derived from a premise with bound β then
β ∈ α[n] where n is the declared input bound.
(iv) Any occurrence of a number n in a formula should of course be read
as its corresponding numeral, but we need not introduce explicit notation
for this since the intention will be clear in context.
The first axiom and rule are “computation rules” for N , and the rest
are just formalized versions of the truth definition, with Cut added.
(N 1): For arbitrary α,
n : N ⊢^α Γ, m : N   provided m ≤ n + 1.
(N 2): For β, γ ∈ α[n], from the premises
n : N ⊢^β n′ : N and n′ : N ⊢^γ Γ
derive n : N ⊢^α Γ.
(Ax): If Γ contains a true atom (i.e., an equation or inequation between
closed terms) then for arbitrary α,
n : N ⊢^α Γ.
(∨): For β ∈ α[n], from n : N ⊢^β Γ, A, B derive
n : N ⊢^α Γ, A ∨ B.
(∧): For β, γ ∈ α[n], from n : N ⊢^β Γ, A and n : N ⊢^γ Γ, B derive
n : N ⊢^α Γ, A ∧ B.
(∃): For β, γ ∈ α[n], from n : N ⊢^β m : N and n : N ⊢^γ Γ, A(m) derive
n : N ⊢^α Γ, ∃x A(x).
(∀): Provided αi ∈ α[max(n, i)] for every i, from the premises
max(n, i) : N ⊢^{αi} Γ, A(i) for every i ∈ N
derive n : N ⊢^α Γ, ∀x A(x).
(Cut): For β, γ ∈ α[n], from n : N ⊢^β Γ, C and n : N ⊢^γ Γ′, ¬C derive
n : N ⊢^α Γ, Γ′
(C is called the “cut formula”).
Remark. The ordinal bounds used here are the standard ones below ε0 .
If arbitrary ordinal bounds were allowed then one could easily show this
system to be complete for first-order arithmetic, since the rules build-in
the truth definition. However, the ordinal structures thereby assigned
to derivations of true sentences would be chaotic and not in any sense
standard, so no informative analysis would be obtained.
Definition. The functions Bα are defined by the recursion
B0(n) = n + 1,   B_{α+1}(n) = Bα(Bα(n)),   B_λ(n) = B_{λ(n)}(n)
where λ denotes any limit ordinal with assigned fundamental sequence
λ(n).
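Transcribed into the same illustrative Python as before (our sketch, reusing ZERO, is_succ and h from 4.2.3, where h already computes α − 1 at successors and λ(n) at limits), the definition reads:

    def B(a, n):
        # B_0(n) = n+1; B_(a+1)(n) = B_a(B_a(n)); B_lambda(n) = B_(lambda(n))(n)
        if a == ZERO:
            return n + 1
        if is_succ(a):
            b = h(a, n)
            return B(b, B(b, n))             # composed with itself just once
        return B(h(a, n), n)

    # For finite m one gets B_m(n) = n + 2^m, e.g.:
    assert B(((ZERO, 3),), 4) == 4 + 2**3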
Note. Since, at successor stages, Bα is just composed with itself once, an
easy comparison with the fast-growing Fα shows that Bα(n) ≤ Fα(n) for
all n > 0. It is also easy to see that for each positive integer k, B_{ω·k}(n) is
the 2^{n+1}-times iterate of B_{ω·(k−1)} on n. Thus another comparison with the
definition of Fk shows that Fk(n) ≤ B_{ω·k}(n) for all n. Thus every primitive
recursive function is bounded by a B_{ω·k} for some k. Furthermore, just as
for Hα and Fα, Bα is strictly increasing and Bβ(n) < Bα(n) whenever β ∈
α[n]. The next two lemmas show that these functions Bα are intimately
related with the infinitary system we have just set up.
Lemma. m ≤ Bα(n) if and only if n : N ⊢^α m : N is derivable by the N 1
and N 2 rules only.
Proof. For the “if” part, note that the proviso on the axiom N 1 is
that m ≤ n + 1 and therefore m ≤ Bα(n) is automatic. Secondly if
n : N ⊢^α m : N arises by the N 2 rule from premises n : N ⊢^β n′ : N
and n′ : N ⊢^γ m : N where β, γ ∈ α[n] then, assuming inductively
that m ≤ Bγ(n′) and n′ ≤ Bβ(n), we have m ≤ Bγ(Bβ(n)) and hence
m ≤ Bα(n).
For the “only if” proceed by induction on α, assuming m ≤ Bα(n). If
α = 0 then m ≤ n + 1 and so n : N ⊢^α m : N by N 1. If α = β + 1
then m ≤ Bβ(n′) where n′ = Bβ(n), so by the induction hypothesis,
n : N ⊢^β n′ : N and n′ : N ⊢^β m : N. Hence n : N ⊢^α m : N by
N 2 since β ∈ α[n]. Finally, if α is a limit then m ≤ B_{α(n)}(n) and so
n : N ⊢^{α(n)} m : N by the induction hypothesis. But since α[n] = α(n)[n]
the ordinal bounds on the premises of this last derivation also lie in
α[n], which means that n : N ⊢^α m : N as required.
Definition. A sequent n : N ⊢^α Γ is said to be term controlled if
every closed (i.e., variable-free) term occurring in Γ has numerical value
bounded by Bα(n). An infinitary derivation is then term controlled if
every one of its sequents is term controlled.
Note. For a derivation to be term controlled it is sufficient that each
axiom is term controlled, since in any rule the closed terms occurring
in the conclusion must already occur in a premise (in the case of the ∀
rule, the premise i = 0). Thus, inductively, if α is the ordinal bound on
the conclusion, every such closed term will be bounded by a Bβ(n) for
some β ∈ α[n] and hence is bounded by Bα(n) as required. There is
one slightly more complicated case, namely the N 2 rule. But here, each
closed term in the conclusion appears already in the right hand premise,
so that it is bounded by Bγ(n′), and n′ ≤ Bβ(n) by the left hand premise.
Therefore the term is bounded by Bγ(Bβ(n)) which again is ≤ Bα(n)
since β, γ ∈ α[n].
Lemma (Bounding lemma). Let Γ be a set of Σ1-formulas or atoms of
the form m : N. If n : N ⊢^α Γ has a term controlled derivation in which all
cut formulas are Σ1, then Γ is true at B_{α+1}(n). Here, the definition of “true
at” is extended to include atoms m : N by saying that m : N is true at k if
m < k.
Proof. By induction over α according to the generation of the sequent
n : N ⊢^α Γ, which we shall denote by S.
(Axioms) If S is either a logical axiom or of the form N 1 then Γ contains
either a true atomic equation or inequation, or else an atom m : N where
m < n + 2, so Γ is automatically true at B_{α+1}(n).
(N 2) If S arises by the N 2 rule from premises n : N ⊢^β n′ : N and
n′ : N ⊢^γ Γ where β, γ ∈ α[n] then, by the induction hypothesis, Γ is
true at B_{γ+1}(n′) where n′ < B_{β+1}(n). Therefore by persistence, Γ is true
at B_{γ+1}(B_{β+1}(n)) which is less than or equal to Bα(Bα(n)) = B_{α+1}(n).
So by persistence again, Γ is true at B_{α+1}(n).
(∨, ∧) Because of our definition of Σ1-formulas, the ∨ and ∧ rules only
apply to bounded (Δ0 (exp)) formulas, so the result is immediate in these
cases (by persistence and the fact that the rules preserve truth).
(∀) Similarly, the only way in which the ∀ rule can be applied is in
a bounded context, where Γ = Γ′, ∀x(x < t ∨ A(x)), t is a closed
term, and A(x) a bounded formula. Suppose then that S arises by
the ∀ rule from premises max(n, i) : N ⊢^{αi} Γ′, i < t ∨ A(i) where
αi ∈ α[max(n, i)] for every i. Since the derivation is term controlled
we know that (the numerical value of) t is less than or equal to Bα(n).
Therefore by the induction hypothesis and persistence again, for every
i < t, the set Γ′, A(i) is true at B_{αi+1}(Bα(n)). But αi ∈ α[Bα(n)] and so
B_{αi+1}(Bα(n)) ≤ Bα(Bα(n)) = B_{α+1}(n). Hence Γ is true at B_{α+1}(n) using
persistence once more.
(∃) If Γ contains a Σ1-formula ∃x A(x) and S arises by the ∃ rule from
premises n : N ⊢^β m : N and n : N ⊢^γ Γ, A(m) then by the induction
hypothesis, Γ, A(m) is true at B_{γ+1}(n) where m < B_{β+1}(n). Therefore,
by the definition of “true at”, Γ is true at whichever is the greater of
B_{β+1}(n) and B_{γ+1}(n). But since β, γ ∈ α[n] both of these are less than
B_{α+1}(n), so Γ is again true at B_{α+1}(n).
(Cut) Finally suppose S comes about by a cut on the Σ1-formula C :=
∃x⃗ D(x⃗) with D bounded. Then the premises are n : N ⊢^β Γ, C and n : N
⊢^γ Γ′, ¬C with ordinal bounds β, γ ∈ α[n] respectively. By the induction
hypothesis applied to the first premise, we have numbers m⃗ < B_{β+1}(n) such
that Γ, D(m⃗) is true at B_{β+1}(n). From the second premise it is easy to see,
by induction on γ, that the universal quantifiers in ¬C := ∀x⃗ ¬D(x⃗) may
be instantiated at m⃗ to give max(n, m⃗) : N ⊢^γ Γ′, ¬D(m⃗). Then by the
induction hypothesis (since Γ′, ¬D(m⃗) is now a set of Σ1-formulas) we have
Γ′, ¬D(m⃗) true at B_{γ+1}(max(n, m⃗)), which is less than B_{γ+1}(B_{β+1}(n)),
which is less than or equal to B_{α+1}(n). Therefore (by persistence) Γ, Γ′
must be true at B_{α+1}(n), for otherwise both D(m⃗) and ¬D(m⃗) would be
true, and this cannot be.
4.3.2. Embedding of PA. The bounding lemma above becomes appli-
cable to PA if we can embed it into the infinitary system and then (as done
in the next sub-section) reduce all the cuts to Σ1 -form. This is standard
proof-theoretic procedure. First comes a simple technical lemma which
will be needed frequently.
Lemma (Weakening). If n : N ⊢^α Γ and n ≤ n′ and Γ ⊆ Γ′ and α[m] ⊆
α′[m] for every m ≥ n′ then n′ : N ⊢^{α′} Γ′. Furthermore, if the given
derivation of n : N ⊢^α Γ is term controlled then so will be the derivation of
n′ : N ⊢^{α′} Γ′ provided of course that any new closed terms introduced in
Γ′ \ Γ are suitably bounded.
Proof. Proceed by induction on α. Note first that if n : N ⊢^α Γ is
an axiom then Γ, and hence also Γ′, contains either a true atom or a
declaration m : N where m ≤ n + 1. Thus n′ : N ⊢^{α′} Γ′ is an axiom also.
(N 2) If n : N ⊢^α Γ arises by the N 2 rule from premises n : N ⊢^β m : N
and m : N ⊢^γ Γ where β, γ ∈ α[n] then, by applying the induction
hypothesis to each of these, n can be increased to n′ in the first, and Γ can
be increased to Γ′ in the second. But then since α[n] ⊆ α[n′] ⊆ α′[n′] the
rule N 2 can be re-applied to yield the desired n′ : N ⊢^{α′} Γ′.
(∃) If n : N ⊢^α Γ arises by the ∃ rule from premises n : N ⊢^β m : N
and n : N ⊢^γ Γ, A(m) where ∃x A(x) ∈ Γ and β, γ ∈ α[n] then, by
applying the induction hypothesis to each premise, n can be increased to
n′ and Γ increased to Γ′. The ∃ rule can then be re-applied to yield the
desired n′ : N ⊢^{α′} Γ′, since as above, β, γ ∈ α′[n′].
(∀) Suppose n : N ⊢^α Γ arises by the ∀ rule from premises
max(n, i) : N ⊢^{αi} Γ, A(i)
where ∀x A(x) ∈ Γ and αi ∈ α[max(n, i)] for every i. Then, by applying
the induction hypothesis to each of these premises, n can be increased to n′
and Γ increased to Γ′. The ∀ rule can then be re-applied to yield the desired
n′ : N ⊢^{α′} Γ′, since for each i, αi ∈ α[max(n′, i)] ⊆ α′[max(n′, i)].
The remaining rules, ∨, ∧ and Cut, are handled easily by increasing n
to n′ and Γ to Γ′ in the premises, and then re-applying the rule.
Theorem (Embedding). Suppose PA ⊢ Γ(x1, . . . , xk) where x1, . . . , xk
are all the free variables occurring in Γ. Then there is a fixed number d such
that, for all numerical instantiations n1, n2, . . . , nk of the free variables, we
have a term controlled derivation of
max(n1, n2, . . . , nk) : N ⊢^{ω·d} Γ(n1, n2, . . . , nk).
Furthermore, the (non-atomic) cut formulas occurring in this derivation are
just the induction formulas which occur in the original PA proof.
Proof. We work with a Tait style formalisation of PA in which the
induction axioms are replaced by corresponding rules: from the premises
Γ, A(0) and Γ, ¬A(z), A(z + 1) infer Γ, A(t),
with z not free in Γ and t any term. As in the proof of the Σ1-induction
lemma in 4.1, we may suppose that the given PA proof of Γ(x⃗) has been
reduced to “free-cut” free form, wherein the only non-atomic cut formulas
are the induction formulas. We simply have to transform each step of this
PA proof into an appropriate, term controlled infinitary derivation.
(Axioms) If Γ(x⃗) is an axiom of PA then with n⃗ = n1, n2, . . . , nk
substituted for the variables x⃗ = x1, x2, . . . , xk, there must occur a true
atom in Γ(n⃗). Thus we automatically have a derivation of max n⃗ : N ⊢^α
Γ(n⃗) for arbitrary α. However, we must choose α appropriately so that,
for all n⃗, this sequent is term controlled. To do this, simply note that, since
PA only has primitive recursively defined function constants, every one of
the (finitely many) terms t(x⃗) appearing in Γ(x⃗) is primitive recursive,
and therefore there is a number d such that for all n⃗, B_{ω·d}(max n⃗) bounds
the value of every such t(n⃗). So choose α = ω·d.
(∨, ∧, Cut) If Γ(x⃗) arises by a ∨, ∧ or cut rule from premises Γ0(x⃗)
and Γ1(x⃗) then, inductively, we can assume that we already have infinitary
derivations of max n⃗ : N ⊢^{ω·d0} Γ0(n⃗) and max n⃗ : N ⊢^{ω·d1} Γ1(n⃗) where
d0 and d1 are independent of n⃗. So choose d = max(d0, d1) + 1 and note
that ω·d0 and ω·d1 both belong to ω·d[max n⃗]. Then by re-applying
the corresponding infinitary rule, we obtain max n⃗ : N ⊢^{ω·d} Γ(n⃗) as
required, and this derivation will again be term controlled provided the
premises were.
(∀) Suppose Γ(x⃗) arises by an application of the ∀ rule from the premise
Γ0(x⃗), A(x⃗, z) where Γ = Γ0, ∀z A(x⃗, z). Assume that we already have
a d0 such that for all n⃗ and all m, there is a term controlled derivation
of max(n⃗, m) : N ⊢^{ω·d0} Γ0(n⃗), A(n⃗, m). Then with d = d0 + 1 we have
ω·d0 ∈ ω·d[max(n⃗, m)], and so an application of the infinitary ∀ rule
immediately gives max n⃗ : N ⊢^{ω·d} Γ(n⃗). This is also term controlled
because any closed term appearing in Γ(n⃗) must appear in Γ0(n⃗), A(n⃗, 0)
and so is already bounded by B_{ω·d0}(max n⃗).
(∃) Suppose Γ(x⃗) arises by an application of the ∃ rule from the premise
Γ0(x⃗), A(x⃗, t(x⃗)) where Γ = Γ0, ∃z A(x⃗, z). If the witnessing term t
contains any other variables besides x1, . . . , xk we can assume they have
been substituted by zero. Thus by the induction we have, for every n⃗, a term
controlled derivation of max n⃗ : N ⊢^{ω·d0} Γ0(n⃗), A(n⃗, t(n⃗)) for some fixed
d0 independent of n⃗. Now it is easy to see, by checking through the rules,
that any occurrences of the term t(n⃗) may be replaced by (the numeral for)
its value, say m. Furthermore, because the derivation is term controlled,
m ≤ B_{ω·d0}(max n⃗) and hence max n⃗ : N ⊢^{ω·d0} m : N. Therefore by the
∃ rule we immediately obtain max n⃗ : N ⊢^{ω·d} Γ0(n⃗), ∃z A(n⃗, z) where
d = d0 + 1, and this derivation is again term controlled.
(Induction) Finally, suppose Γ(x⃗) = Γ0(x⃗), A(x⃗, t(x⃗)) arises by the
induction rule from premises Γ0(x⃗), A(x⃗, 0) and Γ0(x⃗), ¬A(x⃗, z), A(x⃗,
z + 1). Assume inductively, that we have d0 and d1 and, for all n⃗ and all
i, term controlled derivations of
max n⃗ : N ⊢^{ω·d0} Γ0(n⃗), A(n⃗, 0)
max(n⃗, i) : N ⊢^{ω·d1} Γ0(n⃗), ¬A(n⃗, i), A(n⃗, i + 1).
Now let d2 be any number ≥ max(d0, d1) and such that B_{ω·d2} bounds
every subterm of t(x⃗) (again there is such a d2 because every subterm of
t defines a primitive recursive function of its variables). Then for all n⃗,
if m is the numerical value of the term t(n⃗) we have a term controlled
derivation of
max(n⃗, m) : N ⊢^{ω·(d2+1)} Γ0(n⃗), A(n⃗, m).
For in the case m = 0 this follows immediately from the first premise
above by weakening the ordinal bound; and if m > 0 then by successive
cuts on A(n⃗, i) for i = 0, 1, . . . , m − 1, with weakenings where necessary,
we obtain first a term controlled derivation of
max(n⃗, m) : N ⊢^{ω·d2+m} Γ0(n⃗), A(n⃗, m)
and then, since ω·d2 + m ∈ ω·(d2 + 1)[max(n⃗, m)], another weakening
provides the desired ordinal bound ω·(d2 + 1).
Since by our choice of d2, max(n⃗, m) ≤ B_{ω·d2}(max n⃗) we also have
max n⃗ : N ⊢^{ω·d2} max(n⃗, m) : N
and so, combining this with the sequent just derived, the N 2 rule gives
max n⃗ : N ⊢^{ω·(d2+2)} Γ0(n⃗), A(n⃗, m).
It therefore only remains to replace the numeral m by the term t(n⃗), whose
value it is. But it is easy to check, by induction over the logical structure
of formula A, that provided d2 is in addition chosen to be at least twice
the height of the “formation tree” of A, then for all n⃗ there is a cut-free
derivation of
max n⃗ : N ⊢^{ω·d2} Γ0(n⃗), ¬A(n⃗, m), A(n⃗, t(n⃗)).
Therefore, fixing d2 accordingly and setting d = d2 + 3, a final cut on the
formula A(n⃗, m) yields the desired term controlled derivation, for all n⃗, of
max n⃗ : N ⊢^{ω·d} Γ0(n⃗), A(n⃗, t(n⃗)).
This completes the induction case, and hence the proof, noting that the
only non-atomic cuts introduced are on induction formulas.
4.3.3. Cut elimination. Once a PA proof is embedded in the infinitary
system, we need to reduce the cut complexity before the bounding lemma
becomes applicable. As we shall see, this entails an iterated exponential
increase in the original ordinal bound. Thus ε0 , the first exponentially
closed ordinal after ω, is a measure of the proof-theoretic complexity of
PA.
Lemma (∀-inversion). If n : N ⊢^α Γ, ∀a A(a) then for every m we have
max(n, m) : N ⊢^α Γ, A(m).
Proof. We proceed by induction on α. Note first that if the sequent
n : N ⊢^α Γ, ∀a A(a) is an axiom then so is n : N ⊢^α Γ and then the
desired result follows immediately by weakening.
Suppose n : N ⊢^α Γ, ∀a A(a) is the consequence of a ∀ rule with ∀a A(a)
the “main formula” proven. Then the premises are, for each i,
max(n, i) : N ⊢^{αi} Γ, A(i), ∀a A(a)
where αi ∈ α[max(n, i)]. So by applying the induction hypothesis to
the case i = m one immediately obtains max(n, m) : N ⊢^{αm} Γ, A(m).
Weakening then allows the ordinal bound αm to be increased to α.
In all other cases the formula ∀a A(a) is a “side formula” occurring in
the premise(s) of the final rule applied. So by the induction hypothesis,
∀a A(a) can be replaced by A(m) and n by max(n, m). The result then
follows by re-applying that final rule.
Definition. We insert a subscript “Σr” on the proof-gate thus:
n : N ⊢^α_{Σr} Γ
to signify that, in the infinitary derivation, all cut formulas are of the form
Σi or Πi where i ≤ r.
Lemma (Cut reduction). Let n : N ⊢^α_{Σr} Γ, C and n : N ⊢^β_{Σr} Γ′, ¬C
have term controlled derivations, where r ≥ 1 and C is a Σr+1-formula.
Suppose also that α[n′] ⊆ β[n′] for all n′ ≥ n. Then there is a term
controlled derivation of
n : N ⊢^{β+α}_{Σr} Γ, Γ′.
Proof. Note that one could obtain Γ, Γ′ straightaway by applying a Σr+1
cut, but the whole point is to replace this by a derivation with Σr cuts only.
We proceed by induction on α according to the given derivation of
n : N ⊢^α_{Σr} Γ, C. If this is an axiom then C, being non-atomic, can be
deleted, and it’s still a term controlled axiom, and so is n : N ⊢^{β+α}_{Σr} Γ, Γ′
because B_{β+α}(n) ≥ max(Bβ(n), Bα(n)).
Now suppose C is the “main formula” proven in the final rule of the
derivation. Since C := ∃x D(x) with D a Πr-formula, this final rule is an
∃ rule with premises n : N ⊢^{α0}_{Σr} m : N and n : N ⊢^{α1}_{Σr} Γ, D(m), C where
α0, α1 ∈ α[n] ⊆ β[n]. By the induction hypothesis we then have a term
controlled derivation of
n : N ⊢^{β+α1}_{Σr} Γ, D(m), Γ′.    (∗)
Since ¬C := ∀x ¬D(x), and since every closed term in D(m) is bounded by
B_{α1}(n) and hence by Bβ(n), we can apply the above proof of ∀-inversion to
the given derivation of n : N ⊢^β_{Σr} Γ′, ¬C so as to obtain a term controlled
derivation of max(n, m) : N ⊢^β_{Σr} Γ′, ¬D(m). Hence by the N 2 rule, using
n : N ⊢^{α0}_{Σr} m : N,
n : N ⊢^{β+α1}_{Σr} Γ′, ¬D(m).    (∗∗)
Then from (∗) and (∗∗) a cut on D(m) gives the desired result:
n : N ⊢^{β+α}_{Σr} Γ, Γ′,
and this derivation is again term controlled. Notice, however, that (∗∗)
requires α1 to be non-zero so that β ∈ β + α1[n]. If, on the other hand,
α1 = 0 then either n : N ⊢^0_{Σr} Γ is an axiom or else D(m) is a true atom,
in which case ¬D(m) may be deleted from max(n, m) : N ⊢^β_{Σr} Γ′, ¬D(m)
and then, by N 2, n : N ⊢^{β+α}_{Σr} Γ′. Whichever is the case, the desired result
follows by weakening, and term control is preserved.
Finally suppose otherwise, that C is a “side formula” in the final rule
of the derivation of n : N ⊢^α_{Σr} Γ, C. Then by applying the induction
hypothesis to the premise(s), C gets replaced by Γ′, the ordinal bounds α′
are replaced by β + α′ and term control is preserved. Re-application of
that final rule then yields n : N ⊢^{β+α}_{Σr} Γ, Γ′ as required.
Theorem (Cut elimination). If n : N ⊢^α_{Σr+1} Γ where n ≥ 1 then
n : N ⊢^{ω^α}_{Σr} Γ.
Furthermore, if the given derivation is term controlled so is the resulting one.
Proof. Proceeding by induction on α, first suppose n : N ⊢^α_{Σr+1} Γ, Γ′
comes about by a cut on a Σr+1- or Πr+1-formula C. Then the premises
are n : N ⊢^{α0}_{Σr+1} Γ, C and n : N ⊢^{α1}_{Σr+1} Γ′, ¬C where α0, α1 ∈ α[n]. By
an appropriate weakening we may increase whichever is the smaller of
α0, α1 so that both ordinal bounds become δ = max(α0, α1). Applying
the induction hypothesis we obtain
n : N ⊢^{ω^δ}_{Σr} Γ, C and n : N ⊢^{ω^δ}_{Σr} Γ′, ¬C.
Then since one of C, ¬C is Σr+1, the above cut reduction lemma with
α = β = ω^δ yields
n : N ⊢^{ω^δ·2}_{Σr} Γ, Γ′.
But δ ∈ α[n] and so ω^δ·2[m] ⊆ ω^α[m] for every m ≥ n. Therefore by
weakening, n : N ⊢^{ω^α}_{Σr} Γ, Γ′.
Now suppose n : N ⊢^α_{Σr+1} Γ arises by any rule other than a cut on a Σr+1-
or Πr+1-formula. First, apply the induction hypothesis to the premises,
thus reducing r + 1 to r and increasing ordinal bounds δ to ω^δ, and then
re-apply that final rule to obtain n : N ⊢^{ω^α}_{Σr} Γ, noting that if δ ∈ α[n]
then ω^δ ∈ ω^α[n] provided n ≥ 1.
All the steps preserve term control.
Theorem (Preliminary cut elimination). If n : N ⊢^{ω·d+c}_{Σr+1} Γ with r ≥ 1
and n ≥ 1, then
n : N ⊢^{ω^d·2^{c+1}}_{Σr} Γ
and this derivation is term controlled if the first one is.
Proof. This is just a special case of the main cut elimination theorem
above, where α < ω². Essentially the same steps are applied, but with a
few extra technicalities.
Suppose n : N ⊢^{ω·d+c}_{Σr+1} Γ, Γ′ arises by a cut on a Σr+1-formula C. By
weakening we may assume that the premises are n : N ⊢^δ_{Σr+1} Γ, C and
n : N ⊢^δ_{Σr+1} Γ′, ¬C both with the same ordinal bound δ ∈ ω·d + c[n].
Thus δ = ω·k + l where either k = d and l < c or k < d and l ≤ n. The
induction hypothesis then gives n : N ⊢^γ_{Σr} Γ, C and n : N ⊢^γ_{Σr} Γ′, ¬C
where γ = ω^k·2^{l+1}. The cut reduction lemma then gives n : N ⊢^{γ·2}_{Σr} Γ, Γ′.
If k = d and l < c then γ·2[m] ⊆ ω^d·2^{c+1}[m] for all m ≥ n and so the
desired result follows immediately by weakening. On the other hand, if
k < d and l ≤ n then, setting n′ = 2^{n+2}, we have γ·2[m] ⊆ ω^d[m] for all
m ≥ n′. Thus again by weakening, n′ : N ⊢^{ω^d}_{Σr} Γ, Γ′. But B_{γ+1}(n) ≥ n′
so n : N ⊢^{γ+1}_{Σr} n′ : N. Therefore by the N 2 rule we again have
n : N ⊢^{ω^d·2^{c+1}}_{Σr} Γ, Γ′
as required, since γ + 1 and ω^d both belong to ω^d·2^{c+1}[n] when n ≥ 1.
If n : N ⊢^{ω·d+c}_{Σr+1} Γ comes about by the ∀-rule then the premises are, for
each i, max(n, i) : N ⊢^{δi}_{Σr+1} Γ, A(i) where each δi is of the form ω·k + l
with either k = d and l < c or k < d and l ≤ max(n, i). Applying
the induction hypothesis to each premise gives max(n, i) : N ⊢^{γi}_{Σr} Γ, A(i)
where γi = ω^k·2^{l+1}. If k = d and l < c then γi ∈ ω^d·2^{c+1}[max(n, i)]. If
k < d and l ≤ max(n, i) then with n′ = 2^{n+1} and i′ = 2^{i+1} we obtain first,
by weakening, max(n′, i′) : N ⊢^{ω^d}_{Σr} Γ, A(i), and second, max(n, i) : N ⊢^{γi}_{Σr}
max(n′, i′) : N because B_{γi}(max(n, i)) ≥ max(n′, i′). Therefore by N 2,
max(n, i) : N ⊢^{ω^d+1}_{Σr} Γ, A(i). Thus in either case we have, for each i, an
ordinal βi ∈ ω^d·2^{c+1}[max(n, i)] such that
max(n, i) : N ⊢^{βi}_{Σr} Γ, A(i).
The desired result then follows by re-applying the ∀-rule.
Finally suppose n : N ⊢^{ω·d+c}_{Σr+1} Γ arises by any other rule or axiom. Then
the premises (if any) are of the form n : N ⊢^δ_{Σr+1} Γ′ or, in the case of N 2
(with a weakening to make the ordinal bounds the same) m : N ⊢^δ_{Σr+1} Γ
and n : N ⊢^δ_{Σr+1} m : N. In each case δ ∈ ω·d + c[n] and so δ = ω·k + l
where either k = d and l < c or k < d and l ≤ n. The induction
hypothesis then transforms each such premise, reducing r + 1 to r and
increasing δ to ω^k·2^{l+1}. If k = d and l < c then ω^k·2^{l+1} belongs to
ω^d·2^{c+1}[n]. If k < d and l ≤ n then, just as before, we can use N 2 and
weakening to increase the bound ω^k·2^{l+1} to ω^d + 1 which again belongs
to ω^d·2^{c+1}[n] since n is assumed to be ≥ 1. Thus whichever is the case,
each premise of the rule applied now has r instead of r + 1 and an ordinal
bound belonging to ω^d·2^{c+1}[n]. Re-application of that final rule (or
axiom) then immediately gives
n : N ⊢^{ω^d·2^{c+1}}_{Σr} Γ
as required, and each step preserves term control.
4.3.4. The classification theorem.
Theorem. For each i the following are equivalent:
(a) f is provably recursive in IΣi+1 ;
(b) f is elementarily definable from Fα = H_{ω^α} for some α < ε0 (i);
(c) f is computable in Fα -bounded time, for some α < ε0 (i);
(d) f is ε0 (i)-recursive.
Proof. The theorem in 4.2.3 characterizing ε0 (i)-recursive functions
gives the equivalence of (c) and (d), and its proof also shows their
equivalence with (b). The implication from (d) to (a) was a corollary in 4.2.5.
It therefore only remains to prove that (a) implies (b).
Suppose that f : N^k → N is provably recursive in IΣi+1. Then there is
a Σ1-formula F(x⃗, y) such that for all n⃗ and m, f(n⃗) = m if and only if
F(n⃗, m) is true, and such that
IΣi+1 ⊢ ∃y F(x⃗, y).
In the case i = 0 we have already proved that f is primitive recursive
and hence ε0 (0)-recursive, so henceforth assume i > 0. By the embedding
theorem there is a fixed number d and, for all instantiations n⃗ of the
variables x⃗, a term controlled derivation of
max n⃗ : N ⊢^{ω·d}_{Σi+1} ∃y F(n⃗, y).
Let n = max n⃗ if max n⃗ > 0 and n = 1 if max n⃗ = 0. Then by the
preliminary cut elimination theorem with c = 0,
n : N ⊢^{ω^d·2}_{Σi} ∃y F(n⃗, y)
and by weakening, since ω^d·2[m] ⊆ ω^{d+1}[m] for all m ≥ n,
n : N ⊢^{ω^{d+1}}_{Σi} ∃y F(n⃗, y).
Now, if i > 1, apply the main cut elimination theorem i − 1 times, bringing
the cuts down to the Σ1-level and simultaneously increasing the ordinal
bound ω^{d+1} by i − 1 iterated exponentiations to the base ω. This produces
n : N ⊢^α_{Σ1} ∃y F(n⃗, y)
with ordinal bound α < ε0 (i) (recalling that, as defined earlier, ε0 (i)
consists of an exponential stack of i + 1 ω’s). Since this last derivation is
still term controlled, we can next apply the bounding lemma to conclude
that ∃y F(n⃗, y) is true at B_{α+1}(n), which is less than or equal to F_{α+1}(n).
This means that for all n⃗, F_{α+1}(n) bounds the value m of f(n⃗) and bounds
witnesses for all the existential quantifiers in the prefix of the Σ1 defining
formula F(n⃗, m). Thus, relative to F_{α+1}, the defining formula is bounded
and therefore elementarily decidable, and f can be defined from it by a
bounded least-number operator. That is, f is elementarily definable from
F_{α+1}.
Corollary. Every function provably recursive in IΣi+1 is bounded by an
Fα = H_{ω^α} for some α < ε0 (i). Hence H_{ε0 (i+1)} is not provably recursive in
IΣi+1, for otherwise it would dominate itself.
4.4. Independence results for PA
If the Hardy hierarchy is extended to ε0 itself by the definition
H_{ε0}(n) = H_{ε0 (n)}(n)
then clearly (by what we have already done) the provable recursiveness
of H_{ε0} is a consequence of transfinite induction up to ε0. However, this
function is obviously not provably recursive in PA, for if it were we would
have an α < ε0 such that H_{ε0}(n) ≤ Hα(n) for all n, contradicting the fact
that α ∈ ε0 [m] for some m and hence Hα(m) < H_{ε0}(m). Thus, although
transfinite induction up to any fixed ordinal below ε0 is provable in PA,
transfinite induction all the way up to ε0 itself is not. This is Gentzen’s
result, that ε0 is the least upper bound of the “provable ordinals” of PA.
Together with the Gödel incompleteness phenomena, it underlies all log-
ical independence results for PA and related theories. The question that
remained, until the later 1970s, was whether there might be other inde-
pendence results of a more natural and clear mathematical character, i.e.,
genuine mathematical statements formalizable in the language of arith-
metic which, though true, are not provable in PA. A variety of such results
have emerged since the first, and most famous, one of Paris and Harring-
ton [1977] which is treated below. But we begin with a much simpler one,
due also to Paris and his then student Kirby (Kirby and Paris [1982]).
The proofs given here however (due respectively to Cichon [1983] and Ke-
tonen and Solovay [1981]) are quite different from their originals, which
had more model-theoretic motivations. In each case there emerges a deep
connection with the Hardy hierarchy.
4.4.1. Goodstein sequences. Choose any two positive numbers a and x,
and write a in base-(x + 1) normal form thus:
a = (x + 1)^{a1}·m1 + (x + 1)^{a2}·m2 + · · · + (x + 1)^{ak}·mk
where 1 ≤ m1, m2, . . . , mk ≤ x and a1 > a2 > · · · > ak. Then write each
exponent ai in base-(x + 1) normal form, and each of their exponents,
etcetera until all exponents are ≤ x. The expression finally obtained is
called the complete base-(x + 1) form of a.
Definition. Let g(a, x) be the number which results by first writing
a − 1 in complete base-(x + 1) form, and then increasing the base from
(x + 1) to (x + 2), leaving all the coefficients mi ≤ x fixed.
Definition. The Goodstein sequence on (a, x) is then the sequence of
numbers {ai }i≥x generated by iteration of the operation g thus: ax = a
and ax+j+1 = g(ax+j , x + j).
For example, the Goodstein sequence on (16, 1) begins a1 = 16, a2 =
112, a3 = 1,284, a4 = 18,753, a5 = 326,594, etc.
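These values are easy to reproduce mechanically. In the following Python sketch (ours, purely illustrative; all names our own) the base change is performed by recursively "bumping" every exponent of the complete base-b form and re-evaluating at base b + 1:

    def bump(a, b):
        # value of a when its complete base-b form is reread in base b + 1
        if a == 0:
            return 0
        total, power = 0, 0
        while a > 0:
            a, digit = divmod(a, b)                   # next base-b digit
            total += digit * (b + 1) ** bump(power, b)
            power += 1
        return total

    def goodstein(a, x, steps):
        # first `steps` terms of the Goodstein sequence on (a, x)
        terms = [a]
        for j in range(steps - 1):
            a = bump(a - 1, x + j + 1)                # g: subtract 1, then bump base
            terms.append(a)
        return terms

    assert goodstein(16, 1, 5) == [16, 112, 1284, 18753, 326594]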
Definition. Given a number a written in complete base-(x + 1) form,
let ord(a, x) be the ordinal in Cantor normal form obtained by replacing
the base (x + 1) throughout by ω.
Definition. For α > 0 define the x-predecessor of α to be Px (α) = the
maximum element of α[x].
Lemma. ord(a − 1, x) = Px(ord(a, x)).
Proof. The proof is by induction on a. If a = 1 then ord(a − 1, x) =
0 = Px(1). Suppose then, that a > 1, and let the complete base-(x + 1)
form of a be
a = (x + 1)^{a1}·m1 + (x + 1)^{a2}·m2 + · · · + (x + 1)^{ak}·mk.
If ak = 0 then ord(a, x) is a successor and ord(a − 1, x) = ord(a, x) − 1 =
Px(ord(a, x)). If ak > 0 let
b = (x + 1)^{a1}·m1 + (x + 1)^{a2}·m2 + · · · + (x + 1)^{ak}·(mk − 1).
Then in complete base-(x + 1) we have
a − 1 = b + (x + 1)^{ak−1}·x + (x + 1)^{ak−2}·x + · · · + (x + 1)^0·x.
Let α = ord(a, x), β = ord(b, x) and αk = ord(ak, x). Then α = β + ω^{αk}
and by the induction hypothesis we have
ord(a − 1, x) = β + ω^{Px(αk)}·x + ω^{Px²(αk)}·x + ω^{Px³(αk)}·x + · · · + x
where Px(αk), Px²(αk), Px³(αk), . . . , 0 are all the elements of αk[x] in de-
scending order. Therefore ord(a − 1, x) is the maximum element of
β + ω^{αk}[x]. But this set is just α[x], so the proof is complete.
Lemma. Let {ai }i≥x be the Goodstein sequence on (a, x). Then for each
j > 0,
ord (ax+j , x + j) = Px+j−1 Px+j−2 · · · Px+1 Px (ord (a, x)).
Proof. By induction on j. The basis j = 1 follows immediately from
the last lemma since, by the definitions, ord (ax+1 , x + 1) = ord (ax −
1, x) = ord (a − 1, x). Similarly for the step from j to j + 1:
ord (ax+j+1 , x + j + 1) = ord (ax+j − 1, x + j)
= Px+j (ord (ax+j , x + j))
and the result then follows immediately by the induction hypothesis.
Since the ordinals associated with the stages of a Goodstein sequence
decrease, it follows that every such sequence must eventually terminate
at 0. This was established by Goodstein himself many years ago. However,
the following result, due to Cichon [1983], brings out a surprisingly close
connection with the Hardy hierarchy.
Theorem. Every Goodstein sequence terminates: if {ai }i≥x is the Good-
stein sequence on (a, x) then there is an m such that am = 0. Furthermore
the least such m is given by m = Hord (a,x) (x).
Proof. Since ord(a_{x+j+1}, x + j + 1) = P_{x+j}(ord(a_{x+j}, x + j)) it follows
straight away by well-foundedness that there must be a first stage k at
which ord(a_{x+k}, x + k) = 0 and hence a_{x+k} = 0. Letting m = x + k we
therefore have, by the last lemma,
m = μy>x (P_{y−1}P_{y−2} · · · P_{x+1}Px(ord(a, x)) = 0).
But it is very easy to check by induction on α > 0 that for all x,
Hα(x) = μy>x (P_{y−1}P_{y−2} · · · P_{x+1}Px(α) = 0)
since Px(1) = 0, Px(α + 1) = α and Px(λ) = Px(λ(x)) for limits λ. Hence
m = H_{ord(a,x)}(x).
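This identity can be watched in action on the CNF codes of the earlier sketches (our illustration; is_succ, h, ZERO and OMEGA are as in 4.2.3, and P is only ever applied to a non-zero code):

    def P(a, x):
        # x-predecessor: P_x(lambda) = P_x(lambda(x)), P_x(alpha + 1) = alpha
        while not is_succ(a):        # requires a != ZERO
            a = h(a, x)
        return h(a, x)

    def hardy_via_P(a, x):
        # least y > x with P_(y-1) ... P_(x+1) P_x(a) = 0
        y = x
        while a != ZERO:
            a, y = P(a, y), y + 1
        return y

    # matches hardy_sequence(OMEGA, 3) above: H_omega(3) = 7
    assert hardy_via_P(OMEGA, 3) == 7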
The theorem of Kirby and Paris [1982] now follows immediately:
Corollary. The statement “every Goodstein sequence terminates” is
expressible in the language of PA and, though true, is not provable in PA.
Proof. The Goodstein sequence {ai }i≥x on (a, x) is generated by iter-
ation of the function g which is clearly primitive recursive. Therefore ai
is a primitive recursive function of a, x, i, and hence there is a Σ1 -formula
which (provably) defines it in PA. Thus the fact that every Goodstein
sequence terminates, i.e., ∀a>0 ∀x>0 ∃y>x (ay = 0), is expressible in PA. It
cannot be proved in PA however, for otherwise the function Hord (a,x) (x) =
the least y > x such that ay = 0 would be provably recursive. But this
is impossible because, by substituting for a the primitive recursive func-
tion e(x) consisting of an iterated exponential stack of (x + 1)’s with
stack-height (x + 1), one obtains ord (e(x), x) = ε0 (x). Hence Hε0 (x) =
Hord (e(x),x) (x) would be provably recursive also; a contradiction.
4.4.2. The modified finite Ramsey theorem. Ramsey’s theorem for in-
finite sets [1930] says that for every positive integer n, each finite parti-
tioning (or “colouring”) of the n-element subsets of an infinite set X has
an infinite homogeneous (or “monochromatic”) subset Y ⊂ X , mean-
ing all n-element subsets of Y have the same colour (lie in the same
partition). Ramsey also proved a version for finite sets: the finite Ram-
sey theorem states that given any positive integers n, k, l with n < k,
there is an m so large that every partitioning of the n-element subsets of
m = {0, 1, . . . , m − 1}, into l (disjoint) classes, has a homogeneous subset
Y ⊂ m with cardinality at least k. This is usually written
∀n,k,l ∃m (m → (k)^n_l)
where, letting m^{[n]} denote the collection of all n-element subsets of m,
m → (k)^n_l means that for every function (colouring) c : m^{[n]} → l there is
a subset Y ⊂ m of cardinality at least k, which is homogeneous for c, i.e.,
c is constant on the n-element subsets of Y.
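For very small parameters this relation can be tested by sheer enumeration of all colourings, which also makes its elementary character plausible. A brute-force Python check (ours, purely illustrative, and already infeasible beyond tiny inputs):

    from itertools import combinations, product

    def arrow(m, k, n, l):
        # decide m -> (k)^n_l by trying every colouring c : m^[n] -> l
        tuples = list(combinations(range(m), n))
        for colours in product(range(l), repeat=len(tuples)):
            c = dict(zip(tuples, colours))
            if not any(len({c[t] for t in combinations(Y, n)}) == 1
                       for Y in combinations(range(m), k)):
                return False         # this colouring has no homogeneous k-set
        return True

    # the classical Ramsey number R(3, 3) = 6:
    assert arrow(6, 3, 2, 2) and not arrow(5, 3, 2, 2)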
Whereas the infinite Ramsey theorem is not first-order expressible (see
Jockusch [1972] for a recursion-theoretic analysis of the degrees of ho-
mogeneous sets) the finite Ramsey theorem clearly is. For by standard
coding, the relation m → (k)^n_l is easily seen to be elementary recursive and
so expressible as a Δ0 (exp)-formula. The statement therefore asserts the
existence of a recursive function which computes the least such m from
n, k, l . This function is known to have super-exponential growth rate, so it
is primitive recursive but not elementary. Thus the finite Ramsey theorem
is independent of IΔ0 (exp) but provable in IΣ1 .
The modified finite Ramsey theorem of Paris and Harrington [1977]
is also expressible as a Π02 -formula, but it is now independent of Peano
arithmetic. Their modification is to replace the requirement that the fi-
nite homogeneous set Y has cardinality at least k, by the requirement
that Y is “large” in the sense that its cardinality is at least as big as its
smallest element, i.e., |Y| ≥ min Y. (Thus {5, 7, 8, 9, 10} is large but
{6, 7, 80, 900, 10^10} is not.) We can now (if we wish, and it’s simpler to do
so) dispense with the parameter k and state the modified version as
∀n,l ∃m (m → (large)^n_l)
where m → (large)^n_l means that every colouring c : m^{[n]} → l has a large
homogeneous set Y ⊂ m, it being assumed always that Y must have at
least n + 1 elements in order to avoid the trivial case Y = m = n.
That the modified finite Ramsey theorem is indeed true follows easily
from the infinite Ramsey theorem. For assume, toward a contradiction,
that it is false. Then there are fixed n and l such that for every m there
is a colouring cm : m [n] → l with no large homogeneous set. Define a
“diagonal” colouring on all (n + 1)-element subsets of N by
d({x0, x1, . . . , x_{n−1}, x_n}) = c_{x_n}({x0, x1, . . . , x_{n−1}})
where x0, x1, . . . , x_{n−1}, x_n are written in increasing order. Then by the in-
finite Ramsey theorem, d has an infinite homogeneous set Y ⊂ N. We can
therefore select from Y an increasing sequence {y0, y1, . . . , y_{y0}} with y0 ≥
n + 1. Now let m = y_{y0} and choose Y0 = {y0, y1, . . . , y_{y0−1}}. Then Y0 is
a large subset of m and is homogeneous for c_m since c_m(x0, . . . , x_{n−1}) =
d(x0, . . . , x_{n−1}, m) is constant on all {x0, . . . , x_{n−1}} ∈ Y0^{[n]}. This is the
desired contradiction.
Remark. For fixed n, the infinite Ramsey theorem for (n + 1)-element
subsets, and this derivation of the modified finite Ramsey theorem from
it, can both be proven in the second-order theory of arithmetical com-
prehension, which is conservative over PA. Therefore the modified finite
Ramsey theorem for each fixed n is provable in PA.
Paris and Harrington’s original proof that
PA ⊬ ∀n ∀l ∃m (m → (large)^n_l)
has a more model-theoretic flavour, relying on Gödel’s second incomplete-
ness theorem. Later, Ketonen and Solovay [1981] gave a refined, purely
combinatorial analysis of the rate of growth of the Paris–Harrington func-
tion
PH(n, l) = μm (m → (large)^n_l)
showing that for sufficiently large n,
F_{ε0}(n − 3) ≤ PH(n, 8) ≤ F_{ε0}(n − 2).
The lower bound immediately gives the independence result, since it
says that PH(n, 8) eventually dominates every provably recursive func-
tion of PA. The basic ingredients of the Ketonen–Solovay method for
the lower bound are set out concisely in the book Ramsey Theory by
Graham, Rothschild and Spencer [1990] where a somewhat weaker re-
sult is presented. However, it is not difficult to adapt their treatment
so as to obtain a fairly short proof that, for a suitable elementary func-
tion l (n),
Hε0 (n) ≤ PH(n + 1, l (n)).
Though it does not give the refined bounds of Ketonen–Solovay, this is
enough for the independence result.
The proof has two parts. First, define certain colourings on finite sets
of ordinals below ε0 , for which we can prove that all of their homoge-
neous sets must be “relatively small”. Then, as in the foregoing result
on Goodstein sequences, use the Hardy functions to associate numbers
x between n and Hε0 (n) with ordinals Px Px−1 . . . Pn (ε0 ). By this corre-
spondence one obtains colourings on (n + 1)-element subsets of Hε0 (n)
which have no large homogeneous sets. Hence PH must grow at least as
fast as Hε0 .
Definition. Given Cantor normal forms α = ω^{α1}·a1 + · · · + ω^{αr}·ar
and β = ω^{β1}·b1 + · · · + ω^{βs}·bs with α > β, let D(α, β) denote the first (i.e.
greatest) exponent αi at which they differ. Thus ω^{α1}·a1 + · · · + ω^{αi−1}·ai−1 =
ω^{β1}·b1 + · · · + ω^{βi−1}·bi−1 and ω^{αi}·ai > ω^{βi}·bi + · · · + ω^{βs}·bs.
Definition. For each n ≥ 2 the function Cn from the (n + 1)-element
subsets of ε0 (n − 1) into 2n − 1 is given by the following induction. The
definition of Cn ({α0 , α1 , . . . , αn }) requires that the ordinals are listed in
descending order; to emphasise this we write Cn (α0 , α1 , . . . , αn )> instead.
Note that if α, < ε0 (n − 1) then D(α, ) < ε0 (n − 2).
⎧
⎪
⎨0 if D(α0 , α1 ) > D(α1 , α2 )
C2 (α0 , α1 , α2 )> = 1 if D(α0 , α1 ) < D(α1 , α2 )
⎪
⎩
2 if D(α0 , α1 ) = D(α1 , α2 )
190 4. The provably recursive functions of arithmetic
and for each n > 2,

Cn(α0, . . . , αn)> = 2 · C_{n−1}({β0, . . . , β_{n−1}}) if D(α0, α1) > D(α1, α2)
Cn(α0, . . . , αn)> = 2 · C_{n−1}({β_{n−1}, . . . , β0}) + 1 if D(α0, α1) < D(α1, α2)
Cn(α0, . . . , αn)> = 2^n − 2 if D(α0, α1) = D(α1, α2)

where βi = D(αi, α_{i+1}) for each i < n.
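To see that these colourings are effectively computable, here is a small Haskell sketch (our rendering, with names of our own choosing): ordinals below ε0 are represented by their Cantor normal forms, diffExp computes D, and c2 is the base colouring C2.

  newtype CNF = CNF [(CNF, Int)] deriving Eq   -- sum of terms w^e * c, with
                                               -- exponents e strictly decreasing
  instance Ord CNF where                       -- lexicographic comparison of the
    compare (CNF xs) (CNF ys) = compare xs ys  -- term lists gives ordinal order

  diffExp :: CNF -> CNF -> CNF                 -- D(a, b), for a /= b
  diffExp (CNF xs0) (CNF ys0) = go xs0 ys0
    where
      go ((e1, c1) : xs) ((e2, c2) : ys)
        | e1 == e2 && c1 == c2 = go xs ys      -- identical term: keep scanning
        | otherwise            = max e1 e2     -- greatest exponent of disagreement
      go ((e1, _) : _) [] = e1
      go [] ((e2, _) : _) = e2
      go [] []            = error "diffExp: equal ordinals"

  c2 :: CNF -> CNF -> CNF -> Int               -- arguments in descending order
  c2 a0 a1 a2 = case compare (diffExp a0 a1) (diffExp a1 a2) of
    GT -> 0
    LT -> 1
    EQ -> 2

  nat :: Int -> CNF                            -- the finite ordinal c
  nat 0 = CNF []
  nat c = CNF [(CNF [], c)]
  -- e.g. with omega = CNF [(nat 1, 1)]:  c2 omega (nat 2) (nat 1) == 0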
Lemma. If S = {β0, β1, . . . , βr}> is homogeneous for Cn then, letting
max(β0) denote the maximum coefficient of β0 and k(n) = 1 + 2 + · · · +
(n − 1) + 2, we have |S| < max(β0) + k(n).
Proof. Proceed by induction on n ≥ 2.
For the base case we have ε0(1) = ω^ω and C2 : (ω^ω)^[3] → 3. Since S is
a subset of ω^ω, the values of D(βi, β_{i+1}), for i < r, are integers. Let β0, the
greatest member of S, have Cantor normal form
β0 = ω^m · cm + ω^{m−1} · c_{m−1} + · · · + ω^2 · c2 + ω · c1 + c0
where some of c_{m−1}, . . . , c1, c0 may be zero, but cm > 0. Then for each
i < r, D(βi, β_{i+1}) ≤ m ≤ max(β0). Now if C2 has constant value 0 or 1 on
S^[3] then all D(βi, β_{i+1}), for i < r, are distinct, and since we have r distinct
numbers ≤ max(β0) it follows that |S| = r + 1 < max(β0) + 3 as required.
If, on the other hand, C2 has constant value 2 on S^[3] then all the D(βi, β_{i+1})
are equal, say to j. But then the Cantor normal form of each βi contains
a term ω^j · c_{i,j} where 0 ≤ c_{r,j} < c_{r−1,j} < · · · < c_{0,j} = cj ≤ max(β0).
In this case we have r + 1 distinct numbers ≤ max(β0) and hence, again,
|S| = r + 1 < max(β0) + 3.
For the induction step assume n > 2. Assume also that r ≥ k(n), for
otherwise the desired result |S| < max(β0) + k(n) is automatic.
First, suppose Cn is constant on S^[n+1] with even value < 2^n − 2.
Note that the final (n + 1)-tuple of S is (β_{r−n}, β_{r−n+1}, β_{r−n+2}, . . . , βr)>.
Therefore, by the first case in the definition of Cn,
D(β0, β1) > D(β1, β2) > · · · > D(β_{r−n+1}, β_{r−n+2})
and this set is homogeneous for C_{n−1} (the condition r ≥ k(n) ensures that
it has more than n elements). Consequently, by the induction hypothesis,
r − n + 2 < max(D(β0, β1)) + k(n − 1) and therefore, since D(β0, β1)
occurs as an exponent in the Cantor normal form of β0,
|S| = r + 1 < max(D(β0, β1)) + k(n − 1) + (n − 1) ≤ max(β0) + k(n)
as required.
Second, suppose Cn is constant on S^[n+1] with odd value. Then by the
definition of Cn we have
D(β_{r−n+1}, β_{r−n+2}) > D(β_{r−n}, β_{r−n+1}) > · · · > D(β0, β1)
and this set is homogeneous for C_{n−1}. So by applying the induction
hypothesis, r − n + 2 < max(D(β_{r−n+1}, β_{r−n+2})) + k(n − 1) and hence
|S| = r + 1 < max(D(β_{r−n+1}, β_{r−n+2})) + k(n).
Now in this case, since D(β1, β2) > D(β0, β1), it follows that the initial
segments of the Cantor normal forms of β0 and β1 are identical down to
and including the term with exponent D(β1, β2). Therefore D(β1, β2) =
D(β0, β2). Similarly D(β2, β3) = D(β1, β3) = D(β0, β3), and by repeating
this argument one obtains eventually D(β_{r−n+1}, β_{r−n+2}) = D(β0, β_{r−n+2}).
Thus D(β_{r−n+1}, β_{r−n+2}) is one of the exponents in the Cantor normal
form of β0, so its maximum coefficient is bounded by max(β0) and, again,
|S| < max(β0) + k(n).
Finally suppose Cn is constant on S^[n+1] with value 2^n − 2. In this case all
the D(βi, β_{i+1}) are equal, say to γ, for i < r − n + 2. Let di be the coefficient
of ω^γ in the Cantor normal form of βi. Then d0 > d1 > · · · > d_{r−n+1} > 0
and so r − n + 1 < d0 ≤ max(β0). Therefore |S| = r + 1 < max(β0) + k(n),
and this completes the proof.
Lemma. For each n ≥ 2 let l(n) = 2k(n) + 2^n − 1. Then there is a
colouring cn : H_{ε0(n−1)}(k(n))^[n+1] → l(n) which has no large homogeneous
sets.
Proof. Fix n ≥ 2 and let k = k(n). Recall that
H_{ε0(n−1)}(k) = μy>k (P_{y−1} P_{y−2} · · · Pk(ε0(n − 1)) = 0).
As i increases from k up to H_{ε0(n−1)}(k) − 1, the associated sequence of
ordinals αi = Pi P_{i−1} · · · Pk(ε0(n − 1)) strictly decreases to 0. Therefore, from
the above colouring Cn on sets of ordinals below ε0(n − 1), we can define
a colouring dn on the (n + 1)-subsets of {2k, 2k + 1, . . . , H_{ε0(n−1)}(k) − 1}
thus:
dn(x0, x1, . . . , xn)< = Cn(α_{x0−k}, α_{x1−k}, . . . , α_{xn−k})>.
Clearly, every homogeneous set {y0, y1, . . . , yr}< for dn corresponds to a
homogeneous set {α_{y0−k}, α_{y1−k}, . . . , α_{yr−k}}> for Cn, and by the previous
lemma it has fewer than max(α_{y0−k}) + k elements. Now the maximum
coefficient of any Pi(β) is no greater than the maximum of i and max(β), so
max(α_{y0−k}) ≤ y0 − k. Therefore every homogeneous set {y0, y1, . . . , yr}<
for dn has fewer than y0 elements.
From dn construct cn : H_{ε0(n−1)}(k)^[n+1] → l(n) as follows:
cn(x0, x1, . . . , xn)< = dn(x0, x1, . . . , xn) if x0 ≥ 2k
cn(x0, x1, . . . , xn)< = x0 + 2^n − 1 if x0 < 2k.
Suppose {y0, y1, . . . , yr}< is homogeneous for cn with colour ≥ 2^n − 1.
Then by the second clause, y0 + 2^n − 1 = cn(y0, y1, . . . , yn) = cn(y1, y2, . . . ,
y_{n+1}) = y1 + 2^n − 1 and hence y0 = y1, which is impossible. Therefore any
homogeneous set for cn has least element y0 ≥ 2k and, by the first clause,
it must be homogeneous for dn also. Thus it has fewer than y0 elements,
and hence the colouring cn has no large homogeneous sets.
Theorem (Paris–Harrington). The modified finite Ramsey theorem
∀n ∀l ∃m (m → (large)^n_l) is true but not provable in PA.
Proof. Suppose, toward a contradiction, that ∀n ∀l ∃m (m → (large)^n_l)
were provable in PA. Then the function
PH(n, l) = μm (m → (large)^n_l)
would be provably recursive in PA, and so also would be the function
f(n) = PH(n + 2, l(n + 1)). For each n, f(n) is so big that every
colouring on f(n)^[n+2] with l(n + 1) colours has a large homogeneous
set. The last lemma, with n replaced by n + 1, gives a colouring c_{n+1} :
H_{ε0(n)}(k(n + 1))^[n+2] → l(n + 1) with no large homogeneous sets. Therefore
f(n) > H_{ε0(n)}(k(n + 1)), for otherwise c_{n+1}, restricted to f(n)^[n+2], would
have a large homogeneous set. Since H_{ε0(n)} is increasing, H_{ε0(n)}(k(n +
1)) > H_{ε0(n)}(n) = H_{ε0}(n). Hence f(n) > H_{ε0}(n) for all n, and since H_{ε0}
eventually dominates all provably recursive functions of PA it follows that
f cannot be provably recursive. This is the contradiction.
4.5. Notes
Intuitionistic (Heyting) arithmetic has the same provably recursive
functions as PA, for if a Σ1 -definition ∃y F (x, y) were provable in PA
then (writing ∃˜ for ∃ as in chapter 1) we would have in minimal logic
∀y (F (x, y) → ⊥) → ⊥
from stability for F (x, y), i.e., ∀x,y (¬¬F (x, y) → F (x, y)). The latter is
assumed to be derivable since we consider Σ1 -formulas. But since minimal
logic has no special rule for ⊥, it could be replaced here by the formula
∃y F (x, y). Then the premise of the implication becomes provable, and
so ∃y F (x, y) follows constructively. (See chapter 7 for the A-translation
etc.)
The characterization and classification of functions provably recursive
in the fragments IΣi of PA seems to have remained well-known folklore for
many years. The authors are not aware of a full, explicit treatment before
Fairtlough and Wainer [1998], a precursor to this, though Paris [1980]
indicates a model-theoretic approach. In order to show merely that the
provably recursive functions of PA are “captured” by the fast-growing
hierarchy below ε0 , without classifying the fragments IΣi , one does not
need to worry about “term control” and the proof becomes more straight-
forward. The cut reduction lemma easily extends to cut formulas C of
existential, disjunctive or atomic form, and then step-by-step application
of cut elimination reduces all infinitary derivations to cut-free ones, at
finitely iterated exponential cost but still with ordinal bounds α < ε0 .
An existential theorem must then come about by an ∃-rule, the left hand
premise of which immediately gives a Bα -bounded witness. Furthermore,
Gentzen’s fundamental result, that the consistency of PA is a consequence
of transfinite induction below ε0 , follows immediately. This is because
any PA proof of a false atom (e.g., 0 = 1) would embed into the infinitary
system and then, by transfinite induction, have a cut-free derivation with
bound less than ε0 . But inspection of the cut-free rules shows straight
away that this is impossible. The argument is formalizable in elementary
or primitive recursive arithmetic. There are many published variations on
this theme, listed at the beginning of this chapter, but Weiermann [2006]
appears to give the shortest treatment yet. There are other non-standard
model-theoretic approaches too, which are mathematically appealing; see
for example Avigad and Sommer [1997] and Hájek and Pudlák [1993].
The authors, however, remain biased toward direct proof-theoretic anal-
ysis.
Schwichtenberg [1977] and Girard [1987] provide many further details
and applications of Gentzen’s cut elimination method [1935], [1943], in-
cluding the characterization of provably recursive functions. There is also
a result of Kreisel and Lévy [1968] that, over PA, the scheme of transfinite
induction up to ε0 is equivalent to the uniform reflection principle (i.e.,
formalized soundness: if a formula is PA provable, it’s true). Clearly this
implies ConPA, the PA consistency statement, so PA + TI(ε0) ⊢ ConPA
follows, and in fact IΣ1 + TI(ε0) ⊢ ConPA. Paris and Harrington [1977]
prove that the uniform reflection principle for Σ1 -formulas is equivalent,
over PA, to their modified finite Ramsey theorem, hence to the totality
of the associated function PH. This in turn is equivalent to the totality
of Hε0 . For more on the connection between reflection principles and the
Hardy (or, in their terms, “descent recursive”) functions, see Friedman
and Sheard [1995]. Such results, on the equivalence (over suitable base
theories) of “logical” with “mathematical” independence statements, were
the beginnings of the wide area now known as reverse mathematics; see
Simpson [2009].
In a related direction, Weiermann has, over recent years, developed
an extensive body of work on “threshold” results, the most immediate
example being the following: call a finite subset Y of N f-large if it is of
size at least f(min Y ). Let PHf denote the Paris–Harrington function,
modified to the requirement that every colouring should have an f-large
homogeneous set. Then with f any finite iteration of the base-2 log
function, PHf is still not provably recursive in PA, whereas with f = log∗,
where log∗(n) = log^(n)(n), the provability threshold is reached, and
PHf becomes provable. Many further results and interconnections with
finite combinatorics can be found in Weiermann [2004], [2005], [2007].
Bovykin [2009] gives an excellent survey of the area, with model-theoretic
proofs of basic results. Generalisations of “largeness” play a role already in
Ketonen and Solovay [1981], where an interval I = [n, m] in N is α-large if
α = 0 and n ≤ m, or α is a successor and I contains a proper end-segment
which is (α − 1)-large, or α is a limit and I is α(n)-large. Then I is α-large
iff it is fα -large in the previous sense, where fα (n) = Hα (n) − (n − 1).
For more on α-largeness and connections with the fast-growing hierarchy,
see Hájek and Pudlák [1993].
Chapter 5
ACCESSIBLE RECURSIVE FUNCTIONS,
ID<ω AND Π11-CA0
As we shall see in section 5.1 below, the class of all recursive functions fails
to possess a natural hierarchical structure, generated inductively “from
within”. On the other hand, many proof-theoretically significant sub-
recursive classes do. This chapter attempts to measure the limits of pred-
icative generation in this context, by classifying and characterizing those
(predictably terminating) recursive functions which can be successively
defined according to an autonomy principle of the form: allow recursions
only over well-orderings which have already been “coded” at previous
levels. The question is: how can a recursion code a well-ordering? The
answer lies in Girard’s theory of dilators [1981], but it is reworked here
in an entirely different and much simplified framework specific to our
subrecursive purposes. The “accessible” recursive functions thus gener-
ated turn out to be those provably recursive in the theory ID<ω of finitely
iterated inductive definitions, or equivalently in the second-order theory
Π11-CA0 of Π11-comprehension.
5.1. The subrecursive stumblingblock
An obvious goal would be to find, once and for all, a natural trans-
finite hierarchy classification of all the recursive functions which clearly
reflects their computational and termination complexity. There is one
for the total type-2 recursive functionals, as we saw in 2.8. So why
isn’t there one for the type-1 recursive functions as well? The reason
is that the termination statement for a type-2 recursive functional is a
well-foundedness condition—i.e., a statement that a certain recursive or-
dinal exists—whereas the termination statements for recursive functions
are merely arithmetical and have nothing apparently to do with ordinals.
This is all somewhat vague, but there are some basic negative results of
general recursion theory which help explain it more precisely.
Firstly, it is simply not possible to classify recursive functions in terms
of the order-types of termination orderings, since every recursive function
has an easily definable (e.g., Δ0 or elementary) termination ordering of
length ω. This result goes back to Myhill [1953], Routledge [1953] and
Liu [1960].
5.1.1. An old result of Myhill, Routledge and Liu.
Theorem. For every recursive function ϕe there is an elementary recursive
well-ordering <e of order-type ω in which the rank of any point (n, 0) is a
bound on the number of steps needed to compute ϕe(n). Thus ϕe is definable
by an easy recursion over <e.
Proof. Define the well-ordering <e ⊆ N × N by: (n, s) <e (n′, s′) if
and only if either (i) n < n′ or (ii) n = n′, s > s′ and ϕe(n) is undefined
at step s′. Then the well-foundedness of <e is just a restatement of
the assumption that the computation of ϕe(n) terminates for every n.
Furthermore the rank or height of the point (n, 0) is just the rank of
(n − 1, 0) (if n > 0) plus the number of steps needed to compute ϕe(n).
Using the notation of Kleene’s normal form theorem in chapter 2, we can
define the rank r of any point in the well-ordering quite simply by

r(n, s) = 0 if n = 0 ∧ T(e, 0, s)
r(n, s) = r(n, s + 1) + 1 if ¬T(e, n, s)
r(n, s) = r(n − 1, 0) + 1 if n > 0 ∧ T(e, n, s)

and then for each n we have ϕe(n) = U(e, n, r(n, 0)).
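The recursion for r is immediately programmable once a step-bounded halting test is given. In the following Haskell sketch (ours), halts plays the role of the predicate T(e, ·, ·) for a fixed index e, and is an assumed input rather than a real interface:

  type Halts = Int -> Int -> Bool   -- halts n s: computation on n done within s steps?

  rank :: Halts -> Int -> Int -> Int
  rank halts n s
    | not (halts n s) = rank halts n (s + 1) + 1   -- case not T(e, n, s)
    | n == 0          = 0                          -- case n = 0 and T(e, 0, s)
    | otherwise       = rank halts (n - 1) 0 + 1   -- case n > 0 and T(e, n, s)

  -- toy example: for a machine needing n * n steps on input n,
  -- rank (\n s -> s >= n * n) n 0 bounds its running time.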
This result tells us that subrecursive hierarchies must inevitably be
“notation-dependent”. They must depend upon given well-orderings,
not just on their order-types. So what is a subrecursive hierarchy?
5.1.2. Subrecursive hierarchies and constructive ordinals.
Definition. By a subrecursive hierarchy we mean a triple (C, P, ≺) where
≺ is a recursively enumerable relation, P is a linearly and hence well-
ordered initial segment of the accessible part of ≺ and, uniformly to each
a ∈ P, C assigns an effectively generated class C (a) of recursive functions
so that C(a′) ⊆ C(a) whenever a′ ≺ a. Furthermore we require that
there are elementary relations Lim, Succ and Zero which decide, for each
a ∈ P, whether a represents a limit ordinal, a successor or zero, and an
elementary function pred which computes the immediate predecessor of
a if it happens to represent a successor. We also assume that pred (a) is
numerically smaller than a whenever Succ(a) holds.
For example, the classes C (a) could be the functions elementary in Fa
where F is some version of the fast-growing hierarchy, but what gives the
hierarchy its power is the size and structure of its well-ordering (P, ≺).
There is a universal system of notations for such well-orderings, called
Kleene’s O, and it will be convenient, now and for later, to develop its basic
properties. Our somewhat modified version will however be denoted W.
Definition. (i) The set W of “constructive ordinal notations” is the
smallest set closed under the following inductive rule:
a ∈ W if a = 0 ∨ ∃b∈W (a = 2b + 1) ∨ ∃e (∀n ([e](n) ∈ W) ∧ a = 2e)
where [e] denotes the e-th elementary function in some standard primitive
recursive enumeration.
(ii) For each a ∈ W its “rank” is the ordinal |a| given by
|0| = 0 ;  |2b + 1| = |b| + 1 ;  |2e| = sup_n {|[e](n)| + 1}.
These ordinals are called the “constructive” or “recursive” ordinals. Their
least upper bound is denoted ω1^CK.
(iii) The recursively enumerable relation ≺W defined inductively by
a′ ≺W a if ∃b (a = 2b + 1 ∧ a′ ≼W b) ∨ ∃e ∃n (a = 2e ∧ a′ ≼W [e](n))
partially orders and is well-founded on W. In fact W is the accessible part
of ≺W.
(iv) A path in W is any subset P ⊆ W which is linearly (and hence well-)
ordered by ≺W and contains with each a ∈ P all its ≺W -predecessors. If
it contains a notation for every recursive ordinal then it is called a path
through W.
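Structurally, W is one more iterated inductive definition, and its three clauses can be mirrored by a datatype. A Haskell sketch of ours (the numerical codes 0, 2b + 1, 2e are replaced by constructors, so Lim f abstracts the code 2e of an elementary function e; |Lim f| is then sup_n (|f n| + 1)):

  data W = Zero | Suc W | Lim (Int -> W)

  -- a notation for the ordinal omega: the limit of 0, 1, 2, ...
  omegaW :: W
  omegaW = Lim fromNat
    where fromNat 0 = Zero
          fromNat n = Suc (fromNat (n - 1))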
Theorem. If P is a path in W then the well-ordering (P, ≺W ) satisfies
the conditions of the definition of a subrecursive hierarchy. Conversely every
well-ordering (P, ≺) satisfying those conditions is isomorphic to a path in W.
Proof. It is clear from the last set of definitions that if P is a path in W
then (P, ≺W ) is a well-ordering satisfying the conditions of the definition
of a subrecursive hierarchy. For the converse, let (P, ≺) be any well-
ordering satisfying those conditions. As ≺ is a recursively enumerable
relation, there is an elementary recursive function pr of two variables
such that for every number a the function n → pr(a, n) enumerates
{a′ | a′ ≺ a} provided it is non-empty.
We now define an elementary recursive function w such that for every
a ∈ P we have w(a) ∈ W and |w(a)| is the ordinal represented by a in
the well-ordering (P, ≺).
w(a) = 0 if Zero(a)
w(a) = 2 · w(pred(a)) + 1 if Succ(a)
w(a) = 2 · e(a) if Lim(a)
where e(a) is an elementary index such that for every n,
[e(a)](n) = w(pr(a, n)).
Clearly e(a) is computed by a standard index-construction using as pa-
rameters a given index for the function pr and an assumed one for
w itself. Since pred(a) < a the definition of w is thus a course-of-values
primitive recursion, and it is bounded by some fixed elementary function
depending on the chosen method of indexing. Thus w is definable ele-
mentarily from its own index as a parameter, and the second recursion
theorem justifies this principle of definition.
It is obvious by induction that if a ∈ P then w(a) ∈ W and |w(a)| is the
ordinal represented by a in the well-ordering (P, ≺). Thus {w(a) | a ∈ P}
is a path in W isomorphic with the given (P, ≺). Note that if w(a) ∈ W
then although a may not be in P it certainly will lie in the accessible part
of ≺.
Theorem. W is a complete Π11 set.
Proof. Since W is the intersection of all sets satisfying a positive arith-
metical closure condition, it is Π11 . Furthermore if
S = {n | ∀g ∃s T(e, n, ḡ(s))}
is any Π11 subset of N then as in 2.8, the Kleene–Brouwer ordering of
non-past-secured sequence numbers gives, uniformly to each n, an (ele-
mentary) recursive linear ordering (Pn , ≺n ) having n as its top element,
which is well-founded if and only if n ∈ S, and on which it is possible
(elementarily) to distinguish limits from successors, and compute the pre-
decessor of any successor point. Since, in this case, membership in each Pn
is decidable, the function w of the above proof is easily modified (adding
n as a new parameter) so that w(n, a) ∈ W if and only if a belongs to the
accessible part of ≺n . Therefore with a = n, the top element of Pn , we
get the reduction
n ∈ S ↔ w(n, n) ∈ W.
Therefore S is “many-one reducible” to W and hence W is Π11 complete.
Thus every subrecursive hierarchy can be represented in the form (C, P)
where P is some path in W, the underlying relation ≺W now being the same
in each case. The level of definability of P then serves as a rough measure
of the logical complexity of the given hierarchy. To say that the hierarchy
(C, P) is “inductively generated” is therefore to say that the path P is Π11 .
Since one can easily manufacture such hierarchies of arbitrary recursive
ordinal-lengths, the question of the existence of inductively generated
hierarchies which are “complete” in the sense that they capture all recursive
functions and extend through all recursive ordinal-levels, becomes the
question: is there a subrecursive hierarchy (C, P) where P is a Π11 path
through W? The “stumblingblock” is that the answer is “No”.
5.1.3. Incompleteness along Π11 -paths through W.
Theorem. There is no subrecursive hierarchy (C, P) such that P is a Π11
path through W and ⋃{C(a) | a ∈ P} contains all recursive functions.
Proof. Suppose there were such a hierarchy. Then since at each level
a ∈ P, the class ⋃{C(a′) | a′ ≼W a} is a recursively enumerable set
of recursive functions, there must always be, at any level a ∈ P, a new
recursive function which has not yet appeared. This enables us to define
W as follows:
a ∈ W ↔ ∃e (ϕe is total ∧ ∀c (c ∈ P ∧ ϕe ∈ C (c) → a ∈ W ∧ |a| < |c|)).
Now for c ∈ P there is a uniform Σ11 -definition of the condition a ∈
W ∧ |a| < |c| since it is equivalent to saying there is an order-preserving
function from {d | d ≼W a} into {d | d ≺W c}. Also, notice that the
Π11 -condition c ∈ P occurs negatively. Since all other components of the
right hand side are arithmetical the above yields a Σ11 -definition of W.
This is impossible since W is a complete Π11 -set; if it were also Σ11 then
every Π11 -set would be Σ11 and conversely.
The classic Feferman [1962] was the first to provide a detailed technical
investigation of the general theory of subrecursive hierarchies. Many fun-
damental results are proved there, of which the above is just one relatively
simple but important example. It is also shown that there are subrecur-
sive hierarchies (C, P) which contain all recursive functions, but where the
path P is arithmetically definable and very short (e.g., of length ω^3). These
pathological hierarchies are not generated “from below” either, since they
are constructed out of an assumed enumeration of indices for all total
recursive functions. The classification problem for all recursive functions
thus seems intractable. On the other hand, as we have already seen and
shall see further, there are good hierarchies for “naturally occurring” r.e.
subclasses such as the ones provably recursive in arithmetical theories. An
axiomatic treatment of such subrecursive classes is attempted in Heaton
and Wainer [1996].
5.2. Accessible recursive functions
Before one accepts a computable function as being recursive, a proof of
totality is required. This will generally be an induction over a tree in which
computations and, below them, their respective sub-computations, may
be embedded according to the given defining algorithm. If the tree is well-
founded, the strength of the induction principle over its Kleene–Brouwer
well-ordering thus serves as a bound on the proof-theoretic complexity
of the given function. One of the earliest examples of such a “program
proof” was Turing’s use of the ordinal ω^3 in a 1949 report to the inaugural
conference of the EDSAC computer at Cambridge University; see Morris
and Jones [1984].
The aim of this chapter is to isolate and characterize those recursive
functions which may be termed “predicatively accessible” or “predictably
terminating” according to the following hierarchy principle: one is allowed
to generate a function at a new level only if it is provably recursive over a
well-ordering already coded in a previous level, i.e., only if one has already
constructed a method to prove its termination.
This begs the question: what should it mean for a well-ordering to be
“coded in a previous level”? Certainly it is not enough merely to require
that the characteristic function of its ordering relation should have been
generated at an earlier stage, since by the Myhill–Routledge observation
in the last section, the resulting hierarchy would then collapse in the sense
that all recursive functions would appear immediately once the elementary
relations had been produced. In order to avoid this circularity, a more
delicate notion of “code” for well-orderings is needed, but one which is
still finitary in that it should be determined by number-theoretic functions
only. The crucial idea is the one underpinning Girard’s Π12 -logic [1981],
and this section can be viewed as a reconstruction of some of the main
results there. However, our approach is quite different and, since the
concern is with only those parts of the general framework specific to
subrecursive hierarchies, it can be developed in (what we hope is) a simpler
and more basic context, first published as Wainer [1999]. The slogan is:
code well-orderings by number-theoretic functors whose direct limits are
(isomorphic to) the well-orderings themselves. This functorial connection
is easily explained.
A well-ordering is an “intensional ordinal”. If the ordinal is countable
then the additional intensional component should amount to a particu-
lar choice of enumeration of its elements. Thus, by a presentation of a
countable ordinal α we shall mean a chosen sequence of finite subsets of
it, denoted α[n], n ∈ N, such that
∀n (α[n] ⊆ α[n + 1]) and ∀β<α ∃n (β ∈ α[n]).
It will later be convenient to require also that when β + 1 < α,
β + 1 ∈ α[n] → β ∈ α[n] and β ∈ α[n] → β + 1 ∈ α[n + 1].
Note that a presentation of α immediately induces a sub-presentation for
each β < α by β[n] := α[n] ∩ β, and consequently a system of “rank
functions” given by
G(β, n) := card β[n]
for β ≤ α, so that if β belongs to α[n] then it is the G(β, n)-th element
in ascending order. Thus G(γ, n) < G(β, n) whenever γ ∈ β[n]. This
system G, called the “slow-growing hierarchy” on the given presentation,
determines a functor
G(α) : N0 → N
where N0 is the category {0 → 1 → 2 → 3 → · · · } in which there
is only one arrow (the identity function) imn : m → n if m ≤ n, and
where N is the category of natural numbers in which the morphisms
between m and n are all strictly increasing maps from {0, 1, . . . , m − 1} to
{0, 1, . . . , n − 1}. The definition of G(α) is straightforward: on numbers
we take G(α)(n) = card α[n] and on arrows we take G(α)(imn ) to be
the map p : G(α, m) → G(α, n) such that if k < G(α, m) and the k-th
element of α[m] in ascending order is β, then p(k) = G(β, n).
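If a presentation is supplied concretely as the function n ↦ α[n], with each α[n] a finite ascending list, both parts of the functor G(α) are directly computable. A minimal Haskell sketch (names ours):

  type Presentation o = Int -> [o]     -- n |-> alpha[n], ascending and cumulative

  gNum :: Presentation o -> Int -> Int -- on numbers: G(alpha)(n) = card alpha[n]
  gNum pres n = length (pres n)

  -- on the arrow i_mn (m <= n): send the k-th element beta of alpha[m]
  -- to its position G(beta, n) inside alpha[n]
  gArrow :: Eq o => Presentation o -> Int -> Int -> Int -> Int
  gArrow pres m n k = length (takeWhile (/= beta) (pres n))
    where beta = pres m !! k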
It is easy to check that if instead we view G(α) as a functor from
N0 into the larger category of all countable linear orderings, with order-
preserving maps as morphisms, then G(α) has a direct limit which will be
a well-ordered structure isomorphic to the presentation of α we started
with. For recall that a direct limit or colimit of G(α) will be an initial (in
this context, “minimal”) object among all linear orderings L for which
there is a system of maps πn : G(α)(n) → L such that whenever m ≤ n,
πm = πn ◦ G(α)(imn). It is clear that α plays this role in conjunction
with the system of maps πn, where πn simply lists the elements of α[n] in
increasing order. To describe this situation we shall therefore write
(α, [ ]) = Lim→ G(α)
or more loosely, when the presentation is understood,
α = Lim→ G(α).
G(α) will be taken as the canonical functorial code of the given presentation.
Note further that, given two presentations (α, [ ]) and (α′, [ ]′), the exis-
tence of a natural transformation from G(α) to G(α′) is equivalent to the
existence of an order-preserving map θ from α into α′ such that for every
n, θ takes α[n] into α′[n]′. Hence if β ∈ α[n] then G(β, n) ≤ G(θ(β), n).
Thus although the notion of a “natural well-ordering” or “natural pre-
sentation” remains unclear (see Feferman [1996] for a discussion of this
bothersome problem) there is nevertheless a “natural” partial ordering on
them which ensures that majorization is preserved.
We can now begin to describe what is meant by an accessible recursive
function. Firstly we need to develop recursion within a robust hierarchical
framework, one which closely reflects provable termination on the one
hand, and complexity on the other. That is, if a function is provably
recursive over a well-ordering of order-type α then the chosen hierarchy
should provide a complexity bound for it, at or near level α. The “fast-
growing hierarchy” has this property as we have already seen, and the
version B turns out to be a particularly convenient form to work with.
Definition. Given a presentation of α, define for each β ≤ α the
function Bβ : N → N as follows:
B0(n) = n + 1 and Bβ(n) = Bγ ◦ Bγ(n) if β ≠ 0
where γ is the maximum element of β[n]. (If β[n] happens to be empty,
again take Bβ(n) = n + 1.)
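Anticipating the structured tree ordinals of 5.2.1, for which the finite sets β[n] become computable, the whole definition of B is a short program. A Haskell sketch of ours (assuming well-founded, structured input; on ill-founded input it need not terminate):

  data O1 = Zr | Sc O1 | Lm (Int -> O1)   -- countable tree ordinals, cf. 5.2.1

  bracket :: O1 -> Int -> [O1]            -- the finite set beta[n], ascending
  bracket Zr     _ = []
  bracket (Sc a) n = bracket a n ++ [a]
  bracket (Lm f) n = bracket (f n) n

  fastB :: O1 -> Int -> Int               -- B_beta(n), exactly as defined above
  fastB b n = case bracket b n of
    [] -> n + 1                           -- beta[n] empty
    gs -> let g = last gs in fastB g (fastB g n)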
Theorem. For a suitably large class of ordinal presentations α, the func-
tion Bα naturally extends to a functor on N. This functor is, in the sense
described earlier, a canonical code for a (larger) ordinal presentation α⁺.
Thus Bα = G(α⁺) and hence α⁺ = Lim→ Bα.
Definition. The accessible part of the fast-growing hierarchy is defined
to be (Bα)α<τ where τ = sup ωi and the presentations ωi are generated as
follows:
ω0 = ω and ω_{i+1} = Lim→ B_{ωi} = ωi⁺.
The accessible recursive functions are those computable within Bα-bounded
time or space, for any α < τ (or those Kalmár-elementary in Bα’s, α < τ).
Theorem. τ is a presentation of the proof-theoretic ordinal of the the-
ory Π11-CA0. The accessible recursive functions are therefore the provably
recursive functions of this theory.
The main effort of this section will lie in computing the operation
α ↦ α⁺ and establishing the functorial identity Bα = G(α⁺). The
following section will characterize the ordinals ωi and their limit τ proof-
theoretically. In fact ω_{i+2} will turn out to be the ordinal of the theory IDi
of an i-times-iterated inductive definition. In order to compute these and
other moderately large recursive ordinals we shall need to make uniform
recursive definitions of systems of “fast-growing” operations on ordinal
presentations. It will therefore be convenient to develop a more explicitly
computational theory of ordinal presentations within a uniform inductive
framework. This is where “structured tree ordinals” come into play, but
we shall need to generalize them to all finite number classes, the idea
being that large ordinals in one number class can be presented in terms
of a fast-growing hierarchy indexed by ordinal presentations in the next
number class.
Note. It will be intuitively clear that the ordinals computed are indeed
recursive ordinals, having notations in the set W. However, throughout
this section we shall suppress all the recursion-theoretic machinery of
W to do with coding limit ordinals by recursive indices etcetera, and
concentrate purely on their abstract structure as unrestricted tree ordinals
in Ω, wherein arbitrary sequences are allowed and not just recursive (or
elementary) ones. Later we shall be forced to code them up as ordinal
notations, and it will be fairly obvious how this should be done. However,
it all adds a further level of technical and syntactical complexity that we
don’t need to be bothered with at present. Things are complicated enough
without at the same time having to worry about recursion indices. So
for the time being let us agree to work over the classical Ω instead of the
constructive W, and appeal to Church’s thesis whenever we want to claim
that a tree ordinal is recursive.
5.2.1. Structured tree ordinals. The sets Ω0 ⊂ Ω1 ⊂ Ω2 ⊂ . . . of finite,
countable and higher-level tree ordinals (hereafter denoted by lower-case
Greek letters) are generated by the following iterated inductive definition:
α ∈ Ωk if α = 0 ∨ ∃β∈Ωk (α = β + 1) ∨ ∃i<k (α : Ωi → Ωk)
where β + 1 denotes β ∪ {β}, and if α : Ωi → Ωk we call it a limit and
often write, more suggestively, α = sup_{Ωi} αξ or even α = sup αξ when the
level i is understood, the subscript denoting evaluation of the function
at ξ. We often use λ to denote such limits. The subtree partial ordering ≺
on Ωk is the transitive closure of β ≺ β + 1 and λξ ≺ λ for each ξ. The
identity function on Ωi will be denoted ιi, so that ιi = sup_{Ωi} ξ ∈ Ωk
whenever i < k.
The principal method of proof in this chapter will be “induction on
α ∈ Ωk”, by which is meant ≺-induction, over the generation of α in Ωk.
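The first two levels of this iterated definition can be mirrored by nested datatypes, a limit at level k being a function from a lower level into level k. A Haskell sketch (our rendering, not part of the formal development), which also realizes ι0 and ι1 concretely:

  data O1 = Zr | Sc O1 | Lm (Int -> O1)      -- Omega_1: limits over Omega_0 = N
  data O2 = Zr2 | Sc2 O2
          | Lm02 (Int -> O2)                 -- limits over Omega_0
          | Lm12 (O1 -> O2)                  -- limits over Omega_1

  -- iota_0 = sup_{Omega_0} z, an element of Omega_1
  iota0 :: O1
  iota0 = Lm nat where nat 0 = Zr; nat z = Sc (nat (z - 1))

  -- iota_1 = sup_{Omega_1} xi, an element of Omega_2: the limit function
  -- is just the inclusion of Omega_1 into Omega_2
  iota1 :: O2
  iota1 = Lm12 embed
    where embed Zr     = Zr2
          embed (Sc a) = Sc2 (embed a)
          embed (Lm f) = Lm02 (embed . f)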
Definition. For each α ∈ Ωk and ξ = (ξ_{k−1}, ξ_{k−2}, . . . , ξ0) ∈ Ω_{k−1} ×
Ω_{k−2} × · · · × Ω0 define the finite linearly-ordered set α[ξ] of ≺-predecessors
of α by induction as follows:
0[ξ] = ∅ ;  (α + 1)[ξ] = α[ξ] ∪ {α} ;  (sup_{Ωi} αζ)[ξ] = α_{ξi}[ξ].
Note that for i < k and α ∈ Ω_{i+1}, α[ξ] = α[ξi, . . . , ξ0].
Definition (Structuredness). The subset Ω^S_k of structured tree ordinals
at level k is defined by induction on k. If each Ω^S_i has already been defined
for i < k, let ≺S ⊆ Ωk × Ωk be the transitive closure of β ≺S β + 1 and
λξ ≺S λ for every ξ ∈ Ω^S_i, in the case where λ : Ωi → Ωk. Then Ω^S_k
consists of those α ∈ Ωk such that for every λ ≼S α with λ = sup_{Ωi} λξ,
the following condition holds:
∀ξ∈Ω^S_{k−1}×Ω^S_{k−2}×···×Ω^S_0 ∀ζ∈ιi[ξ] (λζ ∈ λ[ξ]).
Remark. The structuredness condition above ensures that “fundamen-
tal sequences” mesh together appropriately. In particular, since Ω^S_0 =
Ω0 = N and since ι0[x] = {0, 1, 2, . . . , x − 1} for each x ∈ Ω0, the
condition for countable tree ordinals λ = sup_{Ω0} λz simply amounts to
∀x ∀z<x (λz ∈ λ[x] = λx[x]).
Note also that if α ∈ Ω^S_k and β ≺S α then β ∈ Ω^S_k, that ι0, ι1, . . . , ι_{k−1}
are structured at level k, and that Ω^S_i ⊂ Ω^S_k whenever i < k.
Definition. Tree ordinals carry a natural arithmetic, obtained by ex-
tending the usual number-theoretic definitions in a formally “continuous”
manner at limits. Thus addition on Ωk is defined by
α + 0 = α,  α + (β + 1) = (α + β) + 1,  α + λ = sup (α + λξ),
multiplication by
α · 0 = 0,  α · (β + 1) = (α · β) + α,  α · λ = sup (α · λξ)
and exponentiation similarly:
α^0 = 1,  α^{β+1} = α^β · α,  α^λ = sup α^{λξ}.
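At the countable level these clauses read off as structural recursion on the second argument. A Haskell sketch (ours), which also builds ω = 1 + ι0 and ε0 exactly as in the remark following this definition:

  data O1 = Zr | Sc O1 | Lm (Int -> O1)   -- as in the earlier sketch

  addO, mulO, expO :: O1 -> O1 -> O1
  addO a Zr     = a
  addO a (Sc b) = Sc (addO a b)
  addO a (Lm f) = Lm (addO a . f)         -- continuous at limits

  mulO _ Zr     = Zr
  mulO a (Sc b) = addO (mulO a b) a
  mulO a (Lm f) = Lm (mulO a . f)

  expO _ Zr     = Sc Zr                   -- alpha^0 = 1
  expO a (Sc b) = mulO (expO a b) a
  expO a (Lm f) = Lm (expO a . f)

  iota0, omegaT, epsilon0 :: O1
  iota0    = Lm nat where nat 0 = Zr; nat z = Sc (nat (z - 1))
  omegaT   = addO (Sc Zr) iota0           -- omega = 1 + iota0
  epsilon0 = Lm eps where eps 0 = omegaT; eps i = expO omegaT (eps (i - 1))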
Remark. It is easy to check that, if γ ∈ β[ξ] then (i) α + γ ∈ (α + β)[ξ],
(ii) α · γ ∈ (α · β)[ξ] provided that 0 ∈ α[ξ], and (iii) α^γ ∈ α^β[ξ] provided
1 ∈ α[ξ]. Therefore if α and β are structured at level k, then so are
(i) α + β, (ii) α · β and (iii) α^β, but with the proviso in (ii) that 0 ∈ α[ξ]
whenever ιi[ξ] is non-empty, and in (iii) that 1 ∈ α[ξ] whenever ιi[ξ] is
non-empty.
In particular, at level 1, Ω^S_1 is closed under addition and exponentiation
to the base ω = 1 + ι0. One could then define ε0 ∈ Ω1 by ε0(0) = ω and
ε0(i + 1) = ω^{ε0(i)}, and this too is structured. The tree ordinals ≺ ε0 are
now precisely those expressible in Cantor normal form, with exactly the
same fundamental sequences as were used in chapter 4. Tree ordinals thus
provide a general setting for the development of many different varieties
of ordinal notation systems. It must be kept in mind that ≺ is a partial,
not total, ordering on Ω^S_1; for example, 1 + ι0 and ι0 are just different
limits, incomparable under ≺. However, as we now show, ≺ well-orders
the predecessors of any fixed α ∈ Ω^S_1.
Theorem. If α ∈ Ω^S_1 then
∀x (α[x] ⊆ α[x + 1]) and ∀β≺α ∃x (β ∈ α[x]).
Proof. By induction over the generation of countable tree ordinals α.
The α = 0 and α = β + 1 cases are trivial. Suppose α = sup αz
where z ranges over Ω0 = N, and assume inductively that the result
holds for each αz individually. Since α is structured, we have for each
x ∈ N, αx ∈ α[x + 1] and hence αx[x + 1] ⊆ α[x + 1]. Thus α[x] =
αx[x] ⊆ αx[x + 1] ⊆ α[x + 1]. For the second part, if β ≺ α, then
β ≼ αz for some z. So by the induction hypothesis β ∈ αz[x] ∪ {αz} for
all sufficiently large x. Therefore choosing x > z we have αz ∈ α[x] by
structuredness, and hence αz[x] ∪ {αz} ⊆ α[x] and hence β ∈ α[x].
Theorem (Structure). If α is a countable structured tree ordinal then
{β | β ≺ α} is well-ordered by ≺, and if β + 1 ≺ α then we have, for all n,
β + 1 ∈ α[n] → β ∈ α[n] and β ∈ α[n] → β + 1 ∈ α[n + 1]. Therefore by
associating to each β its set-theoretic ordinal |β| = sup{|γ| + 1 | γ ≺ β},
it is clear that α determines a presentation of the countable ordinal |α|
given by |α|[n] = {|β| | β ∈ α[n]}.
Proof. By the lemma above, if β ≺ α and γ ≺ α, then for some large
enough x, β and γ both lie in α[x], and so β ≺ γ or γ ≺ β or β = γ.
Hence ≺ well-orders {β | β ≺ α}. The rest is quite straightforward.
Thus Ω^S_1 provides a convenient structure over which ordinal presen-
tations can be computed. The reason for introducing higher-level tree
ordinals Ω^S_k is that they will enable us to name large elements of Ω^S_1 in a
uniform way, by higher-level versions of the fast-growing hierarchy.
Definition. The ϕ-hierarchy at level k is the function
ϕ^(k) : Ω_{k+1} × Ωk → Ωk
defined by the following recursion over α ∈ Ω_{k+1}:
ϕ^(k)(0, β) := β + 1,
ϕ^(k)(α + 1, β) := ϕ^(k)(α, ϕ^(k)(α, β)),
ϕ^(k)(sup_{Ωi} αξ, β) := sup_{Ωi} ϕ^(k)(αξ, β) if i < k,
ϕ^(k)(sup_{Ωk} αξ, β) := ϕ^(k)(αβ, β).
When the context is clear, the superscript (k) will be suppressed. Also
ϕ(α, β) will sometimes be written ϕα(β). These functions will play a
fundamental role throughout this chapter.
Note. At level k = 0 we have ϕ^(0)_α = Bα where Bα is the fast-growing
function defined in the introduction according to the presentation deter-
mined by α ∈ Ω^S_1 as in the structure theorem. This is because α[n] = αn[n]
if α is a limit, and the maximum element of (α + 1)[n] is α.
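A sketch of the first two levels of the ϕ-hierarchy, in the two-level datatypes used in the earlier sketches (our rendering); as the note above records, phi0 is literally the B-function:

  data O1 = Zr | Sc O1 | Lm (Int -> O1)
  data O2 = Zr2 | Sc2 O2 | Lm02 (Int -> O2) | Lm12 (O1 -> O2)

  phi0 :: O1 -> Int -> Int        -- phi^(0) : Omega_1 x Omega_0 -> Omega_0
  phi0 Zr     n = n + 1
  phi0 (Sc a) n = phi0 a (phi0 a n)
  phi0 (Lm f) n = phi0 (f n) n    -- limit over Omega_k (k = 0): evaluate at n

  phi1 :: O2 -> O1 -> O1          -- phi^(1) : Omega_2 x Omega_1 -> Omega_1
  phi1 Zr2      b = Sc b
  phi1 (Sc2 a)  b = phi1 a (phi1 a b)
  phi1 (Lm02 f) b = Lm (\n -> phi1 (f n) b)  -- i < k: take the sup pointwise
  phi1 (Lm12 f) b = phi1 (f b) b             -- i = k: evaluate the limit at b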
Lemma (Properties of ϕ). For any level k, all α, α′ ∈ Ω_{k+1}, all β ∈ Ωk,
and all ξ ∈ Ω_{k−1} × · · · × Ω0,
α′ ∈ α[β, ξ] → ϕ(α′, β) ∈ ϕ(α, β)[ξ].
Proof. By induction on α ∈ Ω_{k+1}. The implication holds vacuously
if α = 0 since 0[β, ξ] is empty. For the successor step α to α + 1, if
α′ ∈ (α + 1)[β, ξ] then α′ ∈ α[β, ξ] ∪ {α}, so by the induction hypothesis,
ϕ(α′, β) ∈ ϕ(α, β)[ξ] ∪ {ϕ(α, β)}. But γ ∈ ϕ(α, γ)[ξ] for any γ, so
putting γ = ϕ(α, β) gives ϕ(α′, β) ∈ ϕ(α, ϕ(α, β))[ξ] = ϕ(α + 1, β)[ξ] as
required. Now suppose α = sup_{Ωi} αζ. If i < k then from the definitions
α[β, ξ] = α_{ξi}[β, ξ] and ϕ(α, β)[ξ] = ϕ(α_{ξi}, β)[ξ], so the result follows
immediately from the induction hypothesis for α_{ξi} ≺ α. The final case
i = k follows in exactly the same way, but with ξi replaced by β.
Theorem. ϕ preserves structuredness. For any level k, if α ∈ Ω^S_{k+1} and
β ∈ Ω^S_k then ϕ(α, β) ∈ Ω^S_k.
Proof. By induction on α ∈ Ω^S_{k+1}. The zero and successor cases are
immediate, and so is the limit case α = sup_{Ωk} αξ, since if β ∈ Ω^S_k then by
definition αβ ∈ Ω^S_{k+1} and hence ϕ(α, β) = ϕ(αβ, β) ∈ Ω^S_k. Suppose then
that α = sup_{Ωi} αξ where i < k. Then for every ξ ∈ Ω^S_i we have αξ ∈ Ω^S_{k+1}
and so ϕ(αξ, β) ∈ Ω^S_k by the induction hypothesis. It remains only
to check the structuredness condition for λ = ϕ(α, β) = sup_{Ωi} ϕ(αξ, β).
Assume ξ ∈ Ω^S_{k−1} × · · · × Ω^S_0 and ζ ∈ ιi[ξ]. Then by the structuredness of
α we have αζ ∈ α[β, ξ], because ζ ∈ ιi[ξ] implies ζ ∈ ιi[β, ξ] when i < k.
Therefore by the last lemma, ϕ(αζ, β) ∈ ϕ(α, β)[ξ]. So ϕ(α, β) ∈ Ω^S_k.
Corollary. Define, for each positive integer k,
ωk = ϕ^(1)(ϕ^(2)(. . . ϕ^(k)(ιk, ι_{k−1}) . . . , ι1), ι0)
and set ω0 = ι0. Thus ω1 = ϕ^(1)(ι1, ι0), ω2 = ϕ^(1)(ϕ^(2)(ι2, ι1), ι0)
etcetera. Then each ωk ∈ Ω^S_1, and since
ι_{k−1} ∈ ϕ^(k)(ιk, ι_{k−1})[ι_{k−2}, . . . , ι0, k]
we have ω_{k−1} ∈ ωk[k] by repeated application of the last lemma. Therefore
τ = sup ωk is also structured.
Our notion of structuredness is closely related to the earlier work of
Schmidt [1976] on “step-down” relations and “built-up” systems of fun-
damental sequences. See also Kadota [1993] for an earlier alternative
treatment of τ.
5.2.2. Collapsing properties of G. For the time being we set structured-
ness to one side and review some of the “arithmetical” properties of the
slow-growing G-function developed in Wainer [1989]. These will be fun-
damental to what follows later. Recall that G(α, n) measures the size of
α[n]. Since we are now working with tree ordinals α and the presentations
determined by them according to the structure theorem it is clear that G
can be defined by recursion over α ∈ Ω1 as follows:
G(0, n) = 0, G(α + 1, n) = G(α, n) + 1, G(sup αz , n) = G(αn , n).
Note that the parameter n does not change in this recursion, so what we
are actually defining is a function Gn : Ω1 → Ω0 for each fixed n where
Gn (α) := G(α, n). We need to lift this to higher levels.
Definition. Fix n ∈ N and, by induction on k, define the functions
Gn : Ω_{k+1} → Ωk and Ln : Ωk → Ω_{k+1} as follows:
Gn(0) = 0,  Ln(0) = 0,
Gn(α + 1) = Gn(α) + 1,  Ln(β + 1) = Ln(β) + 1,
Gn(sup_{Ω0} αz) = Gn(αn),
Gn(sup_{Ωi+1} αξ) = sup_{Ωi} Gn(α_{Ln ξ}),  Ln(sup_{Ωi} βξ) = sup_{Ωi+1} Ln(β_{Gn ξ}).
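In the two-level sketch the collapsing and lifting maps come out as follows (a sketch of ours, names included); by the lemma below, gn1 n inverts ln1 n, and similarly one level up:

  data O1 = Zr | Sc O1 | Lm (Int -> O1)
  data O2 = Zr2 | Sc2 O2 | Lm02 (Int -> O2) | Lm12 (O1 -> O2)

  gn1 :: Int -> O1 -> Int                 -- G_n : Omega_1 -> Omega_0
  gn1 _ Zr     = 0
  gn1 n (Sc a) = gn1 n a + 1
  gn1 n (Lm f) = gn1 n (f n)

  ln1 :: Int -> Int -> O1                 -- L_n : Omega_0 -> Omega_1 (embedding)
  ln1 _ 0 = Zr
  ln1 n m = Sc (ln1 n (m - 1))

  gn2 :: Int -> O2 -> O1                  -- G_n : Omega_2 -> Omega_1
  gn2 _ Zr2      = Zr
  gn2 n (Sc2 a)  = Sc (gn2 n a)
  gn2 n (Lm02 f) = gn2 n (f n)
  gn2 n (Lm12 f) = Lm (\z -> gn2 n (f (ln1 n z)))

  ln2 :: Int -> O1 -> O2                  -- L_n : Omega_1 -> Omega_2
  ln2 _ Zr     = Zr2
  ln2 n (Sc a) = Sc2 (ln2 n a)
  ln2 n (Lm f) = Lm12 (\xi -> ln2 n (f (gn1 n xi)))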
Lemma. For all β ∈ Ωk we have Gn ◦ Ln(β) = β. Hence for every
positive k, Gn(ιk) = ι_{k−1}.
Proof. By induction on β ∈ Ωk. The zero and successor cases are
immediate, and if β = sup_{Ωi} βξ then, assuming we have already proved
Gn ◦ Ln(γ) = γ for every γ ∈ Ωi (i < k), we can simply unravel the above
definitions to obtain
Gn ◦ Ln(β) = Gn(sup_{Ωi+1} Ln(β_{Gn ξ})) = sup_{Ωi} Gn ◦ Ln(β_{Gn Ln ξ}) = sup_{Ωi} βξ = β.
Hence Gn ◦ Ln is the identity and since ιk = sup_{Ωk} ξ we have
Gn(ιk) = sup_{Ωk−1} Gn(Ln ξ) = sup_{Ωk−1} ξ = ι_{k−1}.
Note that this lemma holds for every fixed n.
Definition. For each fixed n define the subset Ω^G_k(n) of Ωk as follows,
by induction on k. Set Ω^G_0(n) = Ω0 and assume Ω^G_i(n) defined for each
i < k. Take ≺G_n ⊆ Ωk × Ωk to be the transitive closure of β ≺G_n β + 1
and λξ ≺G_n λ for every ξ ∈ Ω^G_i(n), if λ : Ωi → Ωk. Then Ω^G_k(n) consists
of those α ∈ Ωk such that for every λ = sup_{Ωi} λξ ≼G_n α the following
condition holds:
∀ξ∈Ω^G_i(n) (Gn(λξ) = Gn(λ_{Ln Gn ξ})).
Call this the “G-condition”. Note that Ω^G_0(n) = Ω0 and Ω^G_1(n) = Ω1 for
every n, since ξ = Ln Gn ξ if ξ ∈ Ω0.
Lemma. For each fixed n ∈ N:
(a) If λ = sup_{Ωi+1} λξ ∈ Ω^G_k(n) then Gn(λξ) = Gn(λ)_{Gn(ξ)} for every ξ ∈
Ω^G_{i+1}(n).
(b) ι0, ι1, . . . , ι_{k−1} ∈ Ω^G_k(n).
(c) Ln : Ωk → Ω^G_{k+1}(n).
Proof. First, if λ = sup_{Ωi+1} λξ ∈ Ω^G_k(n) then Gn(λ) = sup_{Ωi} Gn(λ_{Ln ξ}),
so if ξ ∈ Ω^G_{i+1}(n) we can put ζ = Gn(ξ) to obtain
Gn(λ)_{Gn(ξ)} = Gn(λ_{Ln Gn ξ}) = Gn(λξ)
by the G-condition.
Second, note that ι0 ∈ Ω1 = Ω^G_1(n) ⊆ Ωk if k > 0. If k > i + 1 and
λ ≺G_n ι_{i+1} then λ ∈ Ω^G_{i+1}(n), so it satisfies the G-condition. If λ = ι_{i+1} then
it is just the identity function on Ω_{i+1} and so the G-condition amounts to
Gn(ξ) = Gn(Ln Gn ξ), which holds because Gn ◦ Ln is the identity. Hence
ι_{i+1} ∈ Ω^G_k(n).
Third, we show Ln(β) ∈ Ω^G_{k+1}(n) for every β ∈ Ωk, by induction
on β. The zero and successor cases are immediate. If β = sup_{Ωi} βξ
then Ln(β) = sup_{Ωi+1} Ln(β_{Gn ξ}) and Ln(β_{Gn ξ}) ∈ Ω^G_{k+1}(n) by the induction
hypothesis. For Ln(β) ∈ Ω^G_{k+1}(n), it remains to check the G-condition:
Gn(Ln(β_{Gn ξ})) = Gn(Ln(β_{Gn Ln Gn ξ})).
Again, this holds because Gn ◦ Ln is the identity.
Theorem. For each fixed n ∈ N and every k, if α ∈ Ω^G_{k+2}(n) and
β ∈ Ω^G_{k+1}(n) then
(a) ϕ^(k+1)(α, β) ∈ Ω^G_{k+1}(n),
(b) Gn(ϕ^(k+1)(α, β)) = ϕ^(k)(Gn(α), Gn(β)).
Proof. By induction on α ∈ Ω^G_{k+2}(n). The zero and successor cases
are straightforward, and so is the case where α = sup_{Ωk+1} αξ because then
we have (1) ϕ^(k+1)(α, β) = ϕ^(k+1)(αβ, β) ∈ Ω^G_{k+1}(n) by the induction
hypothesis, and (2) Gn(αβ) = Gn(α)_{Gn β} by the last lemma, so by the
induction hypothesis and the definition of the ϕ-functions,
Gn(ϕ^(k+1)(αβ, β)) = ϕ^(k)(Gn(α)_{Gn β}, Gn(β)) = ϕ^(k)(Gn(α), Gn(β)).
Now suppose α = sup_{Ωi} αξ with i ≤ k. Then for (1) we have
ϕ^(k+1)(α, β) = sup_{Ωi} ϕ^(k+1)(αξ, β)
and for each ξ ∈ Ω^G_i(n), ϕ^(k+1)(αξ, β) ∈ Ω^G_{k+1}(n) by the induction hy-
pothesis. Furthermore, α_{Ln Gn ξ} ∈ Ω^G_{k+2}(n) because Ln takes Ωi into
Ω^G_{i+1}(n), and Gn(αξ) = Gn(α_{Ln Gn ξ}). So by the induction hypothesis
Gn(ϕ^(k+1)(αξ, β)) and Gn(ϕ^(k+1)(α_{Ln Gn ξ}, β)) are identical. Thus ϕ^(k+1)(α,
β) ∈ Ω^G_{k+1}(n). For part (2), if i = 0 then Gn(α) = Gn(αn) and by the
induction hypothesis,
Gn(ϕ^(k+1)(α, β)) = Gn(ϕ^(k+1)(αn, β))
= ϕ^(k)(Gn(αn), Gn(β))
= ϕ^(k)(Gn(α), Gn(β)).
If i > 0 then for every ξ ∈ Ω_{i−1} we have Ln ξ ∈ Ω^G_i(n), so α_{Ln ξ} ∈ Ω^G_{k+2}(n)
and Gn(α_{Ln ξ}) = Gn(α)_ξ. Therefore, using the induction hypothesis once
more,
Gn(ϕ^(k+1)(α, β)) = sup_{Ωi−1} Gn(ϕ^(k+1)(α_{Ln ξ}, β))
= sup_{Ωi−1} ϕ^(k)(Gn(α_{Ln ξ}), Gn(β))
= sup_{Ωi−1} ϕ^(k)(Gn(α)_ξ, Gn(β))
= ϕ^(k)(Gn(α), Gn(β)).
This completes the proof.
Corollary. Recalling the definition
ωk = ϕ^(1)(ϕ^(2)(. . . ϕ^(k)(ιk, ι_{k−1}) . . . , ι1), ι0)
and the fact that ϕ^(0) = B, we have for each k > 0
G(ωk, n) = Gn(ωk) = B_{ω_{k−1}}(n) for all n ∈ N.
Our next task is to extend this to a functorial identity. The following
simple lemma plays a crucial role.
Lemma (Bijectivity). Fix n and k. If α ∈ Ω^G_k(n) and ξi ∈ Ω^G_i(n) for
each 0 < i < k, then Gn bijectively collapses α[ξ_{k−1}, . . . , ξ1, n] onto
Gn(α)[Gn(ξ_{k−1}), . . . , Gn(ξ1)].
Proof. By an easy induction on α, noting that Gn(α + 1) = Gn(α) + 1
and Gn(αξ) = Gn(α)_{Gn ξ} for ξ ∈ Ω^G_i(n).
5.2.3. The functors G, B and ϕ. Henceforth we shall restrict attention
to those tree ordinals which simultaneously are structured and possess the
G-collapsing properties above.
Definition. Ok := Ω^S_k ∩ ⋂_{n∈N} Ω^G_k(n).
Thus O0 = N, O1 = Ω^S_1, and if α = sup_{Ωi} αξ ∈ Ok then αξ ∈ Ok
whenever ξ ∈ Oi. The preceding subsections give ιi ∈ Ok for every i < k
and ϕ(α, β) ∈ Ok whenever α ∈ O_{k+1} and β ∈ Ok, so we can build lots
of elements in Ok using the ϕ-functions. The importance of the Ok’s is
that when restricted to them, the ϕ-functions can be made into functors.
Definition. Set O^{<k} = O_{k−1} × O_{k−2} × · · · × O0 for each k > 0. Make
O^{<k} into a category by choosing as morphisms
π : (ξ_{k−1}, . . . , ξ0) → (ξ′_{k−1}, . . . , ξ′0)
all ≺-preserving maps from ξ_{k−1}[ξ_{k−1}, . . . , ξ0] into ξ′_{k−1}[ξ′_{k−1}, . . . , ξ′0].
Note that ξ_{k−1}[ξ_{k−1}, . . . , ξ0] is the same as ξ_{k−1}[ξ_{k−2}, . . . , ξ0].
Definition. The functor G : O^{<k+1} → O^{<k} is given by:
• G(ξk, . . . , ξ1, n) = (Gn(ξk), . . . , Gn(ξ1)).
• If π : (ξk, . . . , ξ1, n) → (ξ′k, . . . , ξ′1, m) then G(π) = Gm ◦ π ◦ Gn^{−1},
where Gn^{−1} is the inverse of the bijection Gn : ξk[ξ_{k−1}, . . . , ξ1, n] →
Gn(ξk)[Gn(ξ_{k−1}), . . . , Gn(ξ1)] given by the bijectivity lemma.
Note. Each α ∈ O1 can be made into a functor α from N0 = {0 →
1 → 2 → 3 → . . . } into O^{<2} by defining α(n) = (α, n) and α(inm) to
be the identity embedding of α[n] into α[m]. Then G ◦ α is exactly the
functor G(α) : N0 → N defined in the introduction to this section.
Before defining ϕα as a functor we need a “normal form lemma”.
Lemma (Normal form). For α ∈ Ω_{k+1}, β ∈ Ωk, ξ ∈ Ω_{k−1} × · · · × Ω0:
if γ ∈ ϕ^(k)(α, β)[ξ] then either γ ∈ β[ξ] ∪ {β} or else γ is expressible
uniquely in the “normal form”
γ = ϕ(αr, ϕ(α_{r−1}, . . . ϕ(α1, ϕ(α0, β)) . . . )),
where α0 ∈ α[β, ξ] and α_{i+1} ∈ αi[ϕ(αi, . . . ϕ(α0, β) . . . ), ξ].
Furthermore, if α, β and ξ are structured then for each i < r
α_{i+1} ∈ α[ϕ(αi, . . . ϕ(α0, β) . . . ), ξ].
Proof. By induction on α. If α = 0 then ϕ(α, β)[ξ] = β[ξ] ∪ {β}. If
α is a limit then ϕ(α, β)[ξ] = ϕ(αζ, β)[ξ] where ζ = β or ξ_{k−1} or . . . or
ξ0, so the result follows immediately. For the successor step α to α + 1,
note that ϕ(α + 1, β)[ξ] = ϕ(α, β1)[ξ] where β1 = ϕ(α, β). Therefore if
γ ∈ ϕ(α + 1, β)[ξ] then by the induction hypothesis for α, either γ ∈ β1[ξ],
in which case the unique normal form is as stated, or γ has a normal form
over β1, in which case the normal form is as stated but with β replaced by
β1 = ϕ(α, β).
If, furthermore, α, β and ξ are structured, then by induction on i =
0, 1, 2, . . . , r − 1 we show
α_{i+1} ∈ α[ϕ(αi, . . . ϕ(α0, β) . . . ), ξ].
Firstly, we have α[β, ξ] ⊆ α[ϕ(α0, β), ξ] by a simple induction on α. The
only non-trivial case is where α : Ωk → Ω_{k+1}, so α[β, ξ] = αβ[β, ξ] ⊆
αβ[ϕ(α0, β), ξ]. But β ∈ ϕ(α0, β)[ξ] = ιk[ϕ(α0, β), ξ], so by struc-
turedness αβ ∈ α[ϕ(α0, β), ξ] and hence αβ[ϕ(α0, β), ξ] ⊆ α[ϕ(α0, β), ξ].
Thus for the base-case i = 0, α0 ∈ α[β, ξ] ⊆ α[ϕ(α0, β), ξ] and hence
α1 ∈ α0[ϕ(α0, β), ξ] ⊆ α[ϕ(α0, β), ξ]. And for the induction step i to
i + 1, replacing β by ϕ(αi, . . . , ϕ(α0, β) . . . ) and α0 by α_{i+1} gives
α_{i+2} ∈ α_{i+1}[ϕ(α_{i+1}, . . . ϕ(α0, β) . . . ), ξ] ⊆ α[ϕ(α_{i+1}, . . . ϕ(α0, β) . . . ), ξ]
as required.
Note. Conversely, if γ has the normal form above then γ ∈ ϕ^(k)(α, β)[ξ]
by repeated application of the lemma on properties of ϕ in 5.2.1.
Definition (Functorial definition of ϕ). The functor ϕ^(k)_α : O^{<k+1} →
O^{<k+1} is defined as follows. First, assume α ∈ O_{k+1} has already been
made into a functor α : O^{<k+1} → O^{<k+2} such that α(ξk, . . . , ξ0) =
(α, ξk, . . . , ξ0) and α(i) = i_{α(ξ)α(ξ′)}, where i denotes the identity em-
bedding of the finite set ξk[ξk, . . . , ξ0] = ξk[ξ_{k−1}, . . . , ξ0] as a subset of
ξ′k[ξ′_{k−1}, . . . , ξ′0] when it exists. Note that this amounts to a monotonicity
condition: if π is a subfunction of π′ then α(π) will be a subfunction of
α(π′). Girard called such functors “flowers”. We can now define ϕ^(k)_α as
a functor on O^{<k+1}; the superscript (k) will be omitted.
(i) ϕα(ξk, ξ_{k−1}, . . . , ξ0) = (ϕα(ξk), ξ_{k−1}, . . . , ξ0).
(ii) If π : (ξk, . . . , ξ0) → (ξ′k, . . . , ξ′0) then
ϕα(π) : ϕα(ξk)[ξ_{k−1}, . . . , ξ0] → ϕα(ξ′k)[ξ′_{k−1}, . . . , ξ′0]
is the map γ ↦ γ′ built up inductively on γ ∈ ϕα(ξk)[ξ] according to its
normal form as in the normal form lemma:
• if γ ∈ ξk[ξ_{k−1}, . . . , ξ0] then γ′ = π(γ);
• if γ = ξk then γ′ = ξ′k;
• if γ = ϕ_{αr} ◦ . . . ϕ_{α1} ◦ ϕ_{α0}(ξk) then set γ′ = ϕ_{α′r} ◦ . . . ϕ_{α′1} ◦ ϕ_{α′0}(ξ′k), where
for each i = 0, 1, 2, . . . , r, α′i = α(πi)(αi) with πi the previously
determined subfunction taking
δ ∈ ϕ_{α_{i−1}} ◦ · · · ◦ ϕ_{α0}(ξk)[ξ] to δ′ ∈ ϕ_{α′_{i−1}} ◦ · · · ◦ ϕ_{α′0}(ξ′k)[ξ′].
Note that πi is a subfunction of π_{i+1} and so α(πi) is a subfunction of
α(π_{i+1}). This means that α_{i+1} occurs below αi in the domain of α(π_{i+1}),
and hence α′_{i+1} occurs below α′i in α[ϕ_{α′i} ◦ · · · ◦ ϕ_{α′0}(ξ′k), ξ′]. Thus α′_{i+1} ∈
α′i[ϕ_{α′i} ◦ · · · ◦ ϕ_{α′0}(ξ′k), ξ′] for each i. So γ′ ∈ ϕα(ξ′k)[ξ′] as required, by the
above note. This completes the definition.
Note. A careful reading of the preceding definition should convince
the reader that the maps ϕ^(k)_α(π) do in fact constitute a functor; that is,
ϕα(id_ξ) = id_{ϕα(ξ)} and ϕα(π′ ◦ π) = ϕα(π′) ◦ ϕα(π). This depends,
of course, on the assumed functoriality of α. Furthermore, ϕα also
satisfies the “flower” property, in the sense that if ξk = ξ′k and π is the
identity function from ξk[ξ_{k−1}, . . . , ξ0] into ξk[ξ′_{k−1}, . . . , ξ′0] then ϕα(π) is
the identity function from ϕα(ξk)[ξ] into ϕα(ξk)[ξ′]. Again, this depends
on the assumption that α is a “flower”.
Theorem (Commutation). Fix k > 0 and suppose α ∈ O_{k+1} satisfies
the assumptions of the functorial definition of ϕ. Suppose also that there
is a β ∈ Ok such that Gn(α) = β for every n, and β determines a functor
β : O^{<k} → O^{<k+1} satisfying G ◦ α = β ◦ G. Then
G ◦ ϕ^(k)_α = ϕ^(k−1)_β ◦ G.
Proof. Firstly, if (ξk, . . . , ξ1, n) ∈ O^{<k+1} then by the theorem in 5.2.2
and the definition of the functor G,
G ◦ ϕ^(k)_α(ξk, . . . , ξ1, n) = G(ϕ^(k)_α(ξk), ξ_{k−1}, . . . , ξ1, n)
= (Gn ϕ^(k)_α(ξk), Gn(ξ_{k−1}), . . . , Gn(ξ1))
= (ϕ^(k−1)_β(Gn ξk), Gn(ξ_{k−1}), . . . , Gn(ξ1))
= ϕ^(k−1)_β(Gn(ξk), Gn(ξ_{k−1}), . . . , Gn(ξ1))
= ϕ^(k−1)_β ◦ G(ξk, . . . , ξ1, n).
Secondly, if π : (ξk, . . . , ξ1, n) → (ξ′k, . . . , ξ′1, m) in O^{<k+1} then, using the
notation of the functorial definition of ϕ and again the definition of the
functor G,
G ◦ ϕ^(k)_α(π) : ϕ^(k−1)_β ◦ G(ξk, . . . , ξ1, n) → ϕ^(k−1)_β ◦ G(ξ′k, . . . , ξ′1, m)
is the map sending Gn(γ) to Gm(γ′) whenever γ ↦ γ′ under ϕ^(k)_α(π).
Therefore in order to prove
G ◦ ϕ^(k)_α(π) = ϕ^(k−1)_β ◦ G(π)
we have to check that for every γ ∈ ϕα(ξk)[ξ_{k−1}, . . . , ξ1, n]
Gm(γ′) = ϕ^(k−1)_β(G(π))(Gn(γ)).
Recall that by the bijectivity lemma, Gn always collapses ξk[ξ_{k−1}, . . . , ξ1,
n] bijectively onto Gn(ξk)[Gn(ξ_{k−1}), . . . , Gn(ξ1)]. Now according to the
definition of ϕ^(k)_α(π) in the functorial definition of ϕ, there are three cases
to consider:
(i) If γ ∈ ξk[ξ_{k−1}, . . . , ξ1, n] then γ′ = π(γ) and so
Gm(γ′) = Gm ◦ π(γ) = G(π)(Gn(γ)) = ϕ^(k−1)_β(G(π))(Gn(γ))
because in this case we have Gn(γ) ∈ Gn(ξk)[Gn(ξ_{k−1}), . . . , Gn(ξ1)].
(ii) If γ = ξk then γ′ = ξ′k, and so Gn(γ) = Gn(ξk) and in this case
Gm(γ′) = Gm(ξ′k) = ϕ^(k−1)_β(G(π))(Gn(γ)).
(iii) If γ = ϕ^(k)_{αr} ◦ · · · ◦ ϕ^(k)_{α0}(ξk) is in ϕ^(k)_α(ξk)[ξ_{k−1}, . . . , ξ1, n], where
each α_{i+1} occurs below αi in α[ϕ^(k)_{αi} ◦ · · · ◦ ϕ^(k)_{α0}(ξk), ξ], then Gn(γ) =
ϕ^(k−1)_{βr} ◦ · · · ◦ ϕ^(k−1)_{β0}(Gn(ξk)) with βi = Gn(αi) for each i ≤ r. The
collapsing property in the bijectivity lemma of Gn ensures that β_{i+1} occurs
below βi in β[ϕ^(k−1)_{βi} ◦ · · · ◦ ϕ^(k−1)_{β0}(Gn ξk), Gn ξ] and that
Gn(γ) ∈ ϕ^(k−1)_β(Gn ξk)[Gn(ξ_{k−1}), . . . , Gn(ξ1)].
Furthermore, every element of ϕ^(k−1)_β(Gn ξk)[Gn(ξ_{k−1}), . . . , Gn(ξ1)] oc-
curs as such a Gn(γ). In this case, we have γ′ = ϕ^(k)_{α′r} ◦ · · · ◦ ϕ^(k)_{α′0}(ξ′k) where
α′i = α(πi)(αi), and πi is the previously generated subfunction taking
δ ∈ ϕ^(k)_{α_{i−1}} ◦ · · · ◦ ϕ^(k)_{α0}(ξk)[ξ] to δ′ ∈ ϕ^(k)_{α′_{i−1}} ◦ · · · ◦ ϕ^(k)_{α′0}(ξ′k)[ξ′]. There-
fore Gm(γ′) = ϕ^(k−1)_{β′r} ◦ · · · ◦ ϕ^(k−1)_{β′0}(Gm ξ′k) where β′i = Gm(α′i) for each
i ≤ r. But since G ◦ α = β ◦ G, it follows that β′i = G ◦ α(πi)(βi) =
β ◦ G(πi)(βi), and G(πi) is the subfunction taking Gn(δ) ∈ ϕ^(k−1)_{β_{i−1}} ◦ · · · ◦
ϕ^(k−1)_{β0}(Gn ξk)[Gn ξ] to Gm(δ′) ∈ ϕ^(k−1)_{β′_{i−1}} ◦ · · · ◦ ϕ^(k−1)_{β′0}(Gm ξ′k)[Gm ξ′]. All of
this means that Gm(γ′) = ϕ^(k−1)_β(G(π))(Gn(γ)) according to the definition
of ϕ^(k−1)_β(G(π)).
Theorem. Again recall the definition
ωk = ϕ^(1)(ϕ^(2)(. . . ϕ^(k)(ιk, ι_{k−1}) . . . , ι1), ι0).
As before, write B = ϕ^(0) and, whenever α ∈ O1 has been made into a
functor α : O^{<1} → O^{<2}, write G(α) = G ◦ α. Then we have the functorial
identity:
G(ωk) = B_{ω_{k−1}}.
Hence for each k > 0,
Lim→ B_{ω_{k−1}} = ωk.
Proof. Fix k > 0 and for each i = 1, . . . , k set
αi = ϕ^(i)(ϕ^(i+1)(. . . ϕ^(k)(ιk, ι_{k−1}) . . . , ιi), ι_{i−1}).
Then from the previous subsections we have αi ∈ Oi. Make ιi into a
functor ιi : O^{<i+1} → O^{<i+2} by setting
• ιi(ξi, . . . , ξ0) = (ιi, ξi, . . . , ξ0),
• ιi(π) = π whenever π : (ξi, . . . , ξ0) → (ξ′i, . . . , ξ′0).
This makes good sense because ιi[ξi, . . . , ξ0] is the same set of tree ordinals
as ξi[ξ_{i−1}, . . . , ξ0]. Since ιi is just the identity functor it automatically has
the “flower” property. Therefore, starting with ιk and repeatedly applying
the functorial definition of ϕ and its accompanying note, we can make
each αi into a functor αi : O^{<i} → O^{<i+1}, again with the flower property,
by setting
αk = ϕ^(k)_{ιk} ◦ ι_{k−1}
and for each i = k − 1, . . . , 1 in turn,
αi = ϕ^(i)_{α_{i+1}} ◦ ι_{i−1}.
Exactly the same thing can be done with
βi = ϕ^(i−1)(ϕ^(i)(. . . ϕ^(k−1)(ι_{k−1}, ι_{k−2}) . . . , ι_{i−1}), ι_{i−2})
for 1 < i ≤ k, and we claim that for each such i
G ◦ αi = βi ◦ G.
The proof is by downward induction on i = k, k − 1, . . . , 2. If i = k then,
since Gn(ιk) = ι_{k−1} for every n and G ◦ ιk = ι_{k−1} ◦ G, we can apply
the commutation theorem to get
G ◦ αk = ϕ^(k−1)_{ι_{k−1}} ◦ G ◦ ι_{k−1} = ϕ^(k−1)_{ι_{k−1}} ◦ ι_{k−2} ◦ G = βk ◦ G.
The induction step from i + 1 to i is similar. First note that by the theorem
in 5.2.2, Gn(α_{i+1}) = β_{i+1} for every n, and G ◦ α_{i+1} = β_{i+1} ◦ G by the
induction hypothesis. So the commutation theorem applies again, giving
G ◦ αi = ϕ^(i−1)_{β_{i+1}} ◦ G ◦ ι_{i−1} = ϕ^(i−1)_{β_{i+1}} ◦ ι_{i−2} ◦ G = βi ◦ G.
This proves the claim, and the theorem follows from G ◦ α2 = β2 ◦ G
by one more application of the commutation theorem:
G(ωk) = G ◦ α1 = ϕ^(0)_{β2} ◦ G ◦ ι0 = B_{ω_{k−1}}
since α1 = ωk, β2 = ω_{k−1}, ϕ^(0) = B and G ◦ ι0 = id_N.
More generally, by the same method one may prove:
Theorem. Let α ∈ Ω1 be given by any term built up from ιi’s by appli-
cation of ϕ^(j)-functions (j ≥ 1). Lift α to α′ ∈ Ω2 by replacing each ιi by
ι_{i+1} and each ϕ^(j) by ϕ^(j+1). Define α⁺ = ϕ^(1)(α′, ι0). Then we have the
functorial identity
G(α⁺) = Bα,
and hence
Lim→ Bα = α⁺.
Remark. Girard’s dilators [1981] are functors on the category of (set-
theoretic) ordinals which commute with direct limits and with pull-backs.
Commutation with direct limits provides number-theoretic representation
systems for countable ordinals named by the dilator, and commutation
with pull-backs ensures uniqueness of representation. Although our con-
text is different, the ϕ-functors above are nevertheless dilator-like, since
the commutation theorem essentially expresses preservation of “limits”
under G, and the normal form lemma gives uniqueness of representation
with respect to ϕ.
Example. B_{ω0} is the following functor on O^{<1}:
(i) B_{ω0}(n) = n + 2^n,
(ii) B_{ω0}(π : n → m) is the map taking
n + 2^{n0} + 2^{n1} + · · · + 2^{nr} ↦ m + 2^{π(n0)} + 2^{π(n1)} + · · · + 2^{π(nr)}.
Thus
Lim→ B_{ω0} = ω1 = ϕ^(1)(ι1, ι0) = ϕ^(1)(ι0, ι0) = ι0 + 2^{ι0}
and B_{ω0} constructs a presentation of the (disappointingly small) ordinal
ω · 2. Do not be deceived, however. After this point the B-functors really
get moving. At the next step in the ω-sequence we have
Lim→ B_{ω1} = ω2
= ϕ^(1)(ϕ^(2)(ι2, ι1), ι0)
= ϕ^(1)(ϕ^(2)(ι1, ι1), ι0)
= ϕ^(1)(ϕ^(2)(ι0, ι1), ι0)
= ϕ^(1)(ι1 + 2^{ι0}, ι0)
= sup_n ϕ^(1)(ι1 + 2^n, ι0)
= sup_n ϕ^(1)_{ι1} ◦ · · · ◦ ϕ^(1)_{ι1}(ι0)
where, in the last line, there are 2^{2^n} iterations of ϕ^(1)_{ι1} for each n. But for
each β ∈ Ω1 we have ϕ^(1)_{ι1}(β) = β + 2^β. So |ω2| is the limit of iterated
exponentials, starting with ι0. In other words, ω2 is a presentation of
epsilon-zero. Then ω3 is a presentation of the Bachmann–Howard ordinal,
as we shall see in the next section.
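Against the earlier sketches, the first of these computations can be checked numerically: with ι0 = sup_z z and the program fastB, B_{ω0}(n) = n + 2^n on the nose. A self-contained Haskell check (ours):

  data O1 = Zr | Sc O1 | Lm (Int -> O1)
  iota0 :: O1
  iota0 = Lm nat where nat 0 = Zr; nat z = Sc (nat (z - 1))

  bracket :: O1 -> Int -> [O1]
  bracket Zr     _ = []
  bracket (Sc a) n = bracket a n ++ [a]
  bracket (Lm f) n = bracket (f n) n

  fastB :: O1 -> Int -> Int
  fastB b n = case bracket b n of
    [] -> n + 1
    gs -> let g = last gs in fastB g (fastB g n)

  checkOmega0 :: Int -> Bool
  checkOmega0 n = fastB iota0 n == n + 2 ^ n   -- True for e.g. all of [0..12]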
5.2.4. The accessible recursive functions. The accessible part of the fast-
growing hierarchy is generated from Ω_0 by iteration of the principle: given
α, first form B_α and then take its direct limit to obtain the next ordinal α⁺.
Note that the equation
   α⁺ = Lim_→ B_α
is really only an isomorphism of presentations. However, this is enough to
ensure that the B-functions are uniquely determined. So by the theorem
above, we may take Ω_0⁺ = τ_1, Ω_0⁺⁺ = τ_2, Ω_0⁺⁺⁺ = τ_3, etcetera. Thus to
sum up where we are so far:
Theorem. The accessible recursive functions are exactly the functions
Kalmár elementary in the functions B_α, α ≺ τ, where τ = sup τ_i. Similarly
they are exactly those which are elementary in the functions G_α, α ≺ τ. τ is
the first point in this scale at which the elementary closures of {B_α | α ≺ τ}
and {G_α | α ≺ τ} are equal.
Our next task is to characterize the accessible recursive functions:
Theorem. The accessible recursive functions are exactly the functions
provably recursive in the theories ID_{<ω} and Π¹₁-CA₀.
5.3. Proof-theoretic characterizations of accessibility
We first characterize the accessible recursive functions as those provably
recursive in the (first-order) theory ID_{<ω} of finitely iterated inductive
definitions. Later we will show that these, in turn, are the same as the
functions provably recursive in the second-order theory Π¹₁-CA₀.
Since the systems of ϕ-functions, and the number-theoretic functions
B_α indexed by them, are all defined by (admittedly somewhat abstract)
recursion equations, they are, at least in an intuitive sense, recursive.
It should therefore be possible (and it is, as we now show) to develop
them, instead, in a more formal recursive setting, where the sets of tree
ordinals Ω_k are replaced by sets W_k of Kleene-style ordinal notations, and
the uncountable "regular cardinals" Ω_k are replaced by their "recursively
regular" analogues, the "admissibles" ω_k^{CK}. After all, the Ω_k's are only
used to index certain strong kinds of diagonalization, so if we know that
it is only necessary to diagonalize over recursive sequences, the ω_k^{CK}'s
should do just as well. The end result will be that we can formalize
the definition of each W_k in a first-order arithmetical theory ID_k of a
k-times iterated inductive definition, then develop recursive analogues of
the functions ϕ^{(i)}, i ≤ k, within it, and hence prove the recursiveness
of B_α for at least α ≺ τ_k (and in fact α ≺ τ_{k+2}). Thus every accessible
recursive function, being elementary recursive in B_α for some α ≺ τ, will
be provably recursive in ID_{<ω} = ⋃_k ID_k. The converse will be proven in
subsequent subsections, using ordinal analysis methods due to Buchholz,
in particular his Ω-rules; see Buchholz [1987] and Buchholz, Feferman,
Pohlers, and Sieg [1981].
5.3.1. Finitely iterated inductive definitions. We can generate a recursive
analogue W_k of the set of tree ordinals Ω_k by starting with 0 as a notation
for the ordinal zero, choosing ⟨0, b⟩ as a notation for the successor of b,
and choosing ⟨i+1, e⟩ as a notation for the limit of the recursive sequence
{e}(x) taken over x ∈ W_i. Thus W_k is obtained by k successive (iterated)
inductive definitions thus: a ∈ W_k if a = 0 or a = ⟨0, b⟩ for some b ∈ W_k
or a = ⟨i+1, e⟩ for some i < k where {e}(x) ∈ W_k for all x ∈ W_i. We
can formalize these constructions of W_1, …, W_i, …, W_k in a sequence of
first-order arithmetical theories ID_i(W), i ≤ k, as follows: first, for each
k and any formula A with a distinguished free variable, let F_k(A, a) be the
"positive-in-A" formula (i.e., A does not occur as a negative subformula)
   a = 0 ∨ ∃_b (a = ⟨0, b⟩ ∧ A(b)) ∨
   ∃_e (a = ⟨1, e⟩ ∧ ∀_x ∃_y ({e}(x) = y ∧ A(y))) ∨
   ⋁_{1≤i<k} ∃_e (a = ⟨i+1, e⟩ ∧ ∀_x (W_i(x) → ∃_y ({e}(x) = y ∧ A(y))))
where {e}(x) = y abbreviates ∃_z (T(e, x, z) ∧ U(e, x, z) = y).
Definition. ID_0(W) is just Peano arithmetic, and for each k > 0,
ID_k(W) is the theory in the language of PA expanded by new predicates
W_1, …, W_k, having for each i = 1, …, k the inductive closure axioms
   ∀_a (F_i(W_i, a) → W_i(a))
and the least-fixed-point axiom schemes
   ∀_a (F_i(A, a) → A(a)) → ∀_a (W_i(a) → A(a))
where A is any formula in the language of ID_k(W). ID_{<ω}(W) is then the
union of the ID_k(W)'s.
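For orientation, in the simplest case k = 1 (where the big disjunction over 1 ≤ i < k in F_k is empty) the two axioms read:
   ∀_a ((a = 0 ∨ ∃_b (a = ⟨0, b⟩ ∧ W_1(b)) ∨ ∃_e (a = ⟨1, e⟩ ∧ ∀_x ∃_y ({e}(x) = y ∧ W_1(y)))) → W_1(a))
and, for each formula A of ID_1(W),
   ∀_a ((a = 0 ∨ ∃_b (a = ⟨0, b⟩ ∧ A(b)) ∨ ∃_e (a = ⟨1, e⟩ ∧ ∀_x ∃_y ({e}(x) = y ∧ A(y)))) → A(a)) → ∀_a (W_1(a) → A(a)).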
Note that the i-th least-fixed-point axiom applied to the formula A :=
Wk (a) gives
∀a (Fi (Wk , a) → Wk (a)) → ∀a (Wi (a) → Wk (a)),
from which follows ∀a (Wi (a) → Wk (a)), as ∀a (Fi (Wk , a) → Fk (Wk , a))
is immediate by the definition of Fk , and then ∀a (Fi (Wk , a) → Wk (a))
by the inductive closure axiom for Wk .
These theories were first studied by Kreisel [1963], Feferman [1970]
and Friedman [1970]. The next ten years saw major developments in
ordinal analysis and in the close interrelationships between theories of
(transfinitely) iterated inductive definitions and subsystems of analysis. A
comprehensive treatment of this fundamental area, by four of its prime
movers, is to be found in Buchholz, Feferman, Pohlers, and Sieg [1981].
We have chosen to concentrate on the IDk (W )’s, with just one inductive
definition giving the set Wk , rather than the more general IDk ’s which
allow arbitrary k-times iterated inductive definitions to be thrown in to-
gether. Since Wk is a “complete” set at that level, it makes no difference,
in the end, to the strength of the theory, and IDk (W ) is easier to present.
As an illustration of what can be done in the ID_k(W) theories, let
f^{(k)} : W_{k+1} × W_k → W_k be the partial recursive function which mimics
ϕ^{(k)} on ordinal notations. Thus f^{(k)} is defined by the recursion theorem
to satisfy
   f^{(k)}_0(b) := ⟨0, b⟩,
   f^{(k)}_{⟨0,a⟩}(b) := f^{(k)}_a(f^{(k)}_a(b)),
   f^{(k)}_{⟨i+1,e⟩}(b) := ⟨i+1, d⟩ where {d}(x) = f^{(k)}_{{e}(x)}(b), if i < k,
   f^{(k)}_{⟨k+1,e⟩}(b) := f^{(k)}_{{e}(b)}(b),
where, as is done here, we often write the first argument of the binary f^{(k)}
as a subscript. It is easy to check that if Ω_k is replaced by ω_k^{CK}, if α ∈ Ω_{k+1}
is then replaced by a notation a ∈ W_{k+1}, and if β ∈ Ω_k is replaced by
a notation b ∈ W_k, then ϕ^{(k)}_α(β) ∈ Ω_k gets replaced by f^{(k)}_a(b) ∈ W_k.
In particular then, the countable ϕ^{(1)}_α(β) is a recursive ordinal. Also, for
each recursive α with notation a ∈ W_1, B_α(x) = ϕ^{(0)}_α(x) = f^{(0)}_a(ẋ) where
ẋ is the notation for the integer x. For a more general development of proof-
theoretic ordinal functions as functions on the admissible analogues of
"large" cardinals, see Rathjen [1993].
Furthermore we can actually prove f^{(k)} : W_{k+1} × W_k → W_k in
ID_{k+1}(W). For let A(a) be the formula
   ∀_b (W_k(b) → ∃_c (f^{(k)}_a(b) = c ∧ W_k(c)))
where f^{(k)}_a(b) = c abbreviates a suitable Σ₁-computation formula. Then
the recursive definition of f^{(k)} together with the inductive closure axiom
for W_k enable one easily to prove ∀_a (F_{k+1}(A, a) → A(a)). The least-fixed-
point axiom for W_{k+1} then gives ∀_a (W_{k+1}(a) → A(a)). Thus, provably
in ID_{k+1}(W) we have f^{(k)} : W_{k+1} × W_k → W_k.
Now, starting from ω_i^{CK} with notation ⟨i+1, e_0⟩ ∈ W_{i+1} where e_0 is an
index for the identity function, we can immediately deduce the existence
in W_1 of a notation
   t_k = f^{(1)}(f^{(2)}(… f^{(k)}(⟨k+1, e_0⟩, ⟨k, e_0⟩) …, ⟨2, e_0⟩), ⟨1, e_0⟩)
for the (tree) ordinal τ_k.
Now let C_B be a computation formula for the function B, so that for
any notation a ∈ W_1 for the recursive ordinal α, ∃_{y,z} C_B(a, x, y, z) is
a Σ₁-definition of B_α. By the same argument that we have just applied
above, and writing B_α(x)↓ for the formula ∃_{y,z} C_B(a, x, y, z), we can prove
∀_a (W_1(a) → ∀_x B_α(x)↓). Thus B_α is provably recursive in ID_k(W) for
any ordinal α which, provably in ID_k(W), has a notation in W_1.
Suppose α ≺ τ_k. We check that α itself has a notation in W_1, provably
in ID_k(W). Firstly, the relation a ≺ b is the restriction of a Σ₁-relation
to W_1. This is because a ≺ b if and only if there is a finite sequence of
pairs ⟨b_i, x_i⟩ such that b_0 = b and the last b_ℓ = a and for each i < ℓ,
b_{i+1} = {e}(x_i) if b_i = ⟨1, e⟩ and b_{i+1} = c if b_i = ⟨0, c⟩. Now let W_1^≺(b)
be the formula ∀_{a≺b} W_1(a), and notice that ∀_b (F_1(W_1^≺, b) → W_1^≺(b))
is easily checked in ID_k(W). Therefore by the least-fixed-point axiom
for W_1 we have ∀_b (W_1(b) → W_1^≺(b)). Hence if α ≺ τ_k has notation
a ∈ W_1 then since we can prove W_1(t_k) it follows that we can prove
W_1(a) (using the fact that the true Σ₁-statement a ≺ t_k is provable in
arithmetic). We have shown that every α ≺ τ_k has a recursive ordinal
notation, provably in ID_k(W). Therefore by the last paragraph, B_α, and
hence any function elementarily (or primitive recursively) definable from
it, is provably recursive in ID_k(W). This proves
Theorem. Every accessible recursive function is provably recursive in
ID_{<ω}(W).
It can be refined further:
Theorem. Each α ≺ τ_{k+2} has a notation a for which ID_k(W) ⊢ W_1(a).
Therefore every function in the elementary (or primitive recursive) closure
of {B_α | α ≺ τ_{k+2}} is provably recursive in ID_k(W).
Proof. The second part follows immediately from the first since (i) as
above, B_α is provably recursive whenever α is provably a recursive ordinal,
and (ii) provably recursive functions will always be closed under primitive
recursion in the presence of Σ₁ induction.
For the first part, suppose α ≺ τ_{k+2}. Then α has a recursive ordinal
notation a ≺ t_{k+2} = ⟨1, e_{k+2}⟩ and so, for some fixed n, a ≺ {e_{k+2}}(n).
If we can show that W_1({e_{k+2}}(n)) is provable in ID_k(W) then by the
earlier remarks ID_k(W) ⊢ W_1(a).
Now by unravelling the definition of t_{k+2} according to the recursion
equations for f^{(1)}, …, f^{(k+2)}, it is not difficult to check that
   {e_{k+2}}(n) = f^{(1)}(f^{(2)}(… f^{(k)}((f^{(k+1)}_{⟨k+2,e_0⟩})^m(⟨k+1, e_0⟩), ⟨k, e_0⟩) …, ⟨2, e_0⟩), ⟨1, e_0⟩)
with m = 2^{2^n} iterates of f^{(k+1)}_{⟨k+2,e_0⟩}. (Recall that ⟨k, e_0⟩ is the chosen notation
for ω_{k-1}^{CK} in W_k.) It therefore will be enough to prove in ID_k(W) that
   f^{(k)}((f^{(k+1)}_{⟨k+2,e_0⟩})^m(⟨k+1, e_0⟩), ⟨k, e_0⟩) ∈ W_k
for then W_1({e_{k+2}}(n)) follows immediately. (Note that we could prove
it easily in ID_{k+1}(W) but the point is that one only needs ID_k(W).)
The following is a lifting to ID_k(W) of Gentzen's original argument
showing that transfinite induction up to any ordinal below ε₀ is provable
in PA. First let A_i be the formula generated by
   A_0(d) := ∀_b (W_k(b) → ∃_a (f^{(k)}_d(b) = a ∧ W_k(a))),
   A_{i+1}(d) := ∀_c (A_i(c) → ∃_a (f^{(k+1)}_d(c) = a ∧ A_i(a))).
Then in ID_k(W) it is easy to check, from the definitions of f^{(k)} and f^{(k+1)},
that for every i, F_k(A_i, d) → A_i(d) and hence ∀_d (W_k(d) → A_i(d)).
Furthermore if d is a limit notation of the form ⟨k+1, e⟩ then, again for
each i, ∀_b (W_k(b) → A_i({e}(b))) → A_i(d), and in particular, therefore,
A_i(⟨k+1, e_0⟩) for every i.
Now by a downward meta-induction on j = m+1, m, …, 1 we show,
still in ID_k(W), that A_i(c_j) for every i ≤ j where c_j denotes the
(m+1−j)-th iterate of f^{(k+1)}_{⟨k+2,e_0⟩} starting on ⟨k+1, e_0⟩.
The j = m+1 case simply states A_i(⟨k+1, e_0⟩) shown above. For the
induction step assume the result holds for j > 1 and let i ≤ j−1. Then
we have A_{i+1}(c_j) and A_i(c_j) and hence
   ∃_a (f^{(k+1)}_{c_j}(c_j) = a ∧ A_i(a)).
But f^{(k+1)}_{c_j}(c_j) = f^{(k+1)}_{⟨k+2,e_0⟩}(c_j) = c_{j-1} and so A_i(c_{j-1}).
This completes the induction and putting j = 1, i = 0 we immediately
obtain
   ID_k(W) ⊢ ∀_b (W_k(b) → ∃_a (f^{(k)}_{c_1}(b) = a ∧ W_k(a))).
Therefore with b = ⟨k, e_0⟩,
   ID_k(W) ⊢ W_k(f^{(k)}((f^{(k+1)}_{⟨k+2,e_0⟩})^m(⟨k+1, e_0⟩), ⟨k, e_0⟩))
as required.
5.3.2. The infinitary system ID_k(W)^∞. With a view to the converse
of the above, we now set up an infinitary system suitable for the ordinal
analysis of IDk (W ), particularly cut elimination and “collapsing” results
in the style of Buchholz, from which bounds can be computed directly. The
crucial component is a version of Buchholz’s Ωi -rule, a major technical
innovation in the analysis of larger systems of finitely and transfinitely
iterated inductive definitions (see, e.g., Buchholz [1987] and Buchholz,
Feferman, Pohlers, and Sieg [1981]). We have the basics already from the
previous chapter on Peano arithmetic, but now sequents will be of the
more complex form
   δ_k : Ω^S_k, δ_{k-1} : Ω^S_{k-1}, …, δ_1 : Ω^S_1, n : N ⊢^α Γ
which we shall immediately abbreviate to δ⃗, n ⊢^α Γ. The particular se-
quence ω⃗ := ω_{k-1}, ω_{k-2}, …, ω_1, ω_0, where ω_i = sup_{β∈Ω_i} β ∈ Ω^S_{i+1}, will
be of special significance. The ordinal bound α will be in Ω^S_{k+1}. We stress
that throughout this and the following subsections all tree ordinals will be
structured as in 5.2.1. Note that we could equally well replace each Ω_i by
its recursive analogue W_i and each ω_i by ω_i^{CK}, as in the previous section.
The results below would work in just the same way.
The system IDk (W )∞ is, as before, in Tait style, Γ being a set of closed
formulas in the language of IDk (W ), and written in negation normal
form. The rules are as follows:
(N1): For arbitrary α and δ⃗,
   δ⃗, n ⊢^α Γ, m : N   provided m ≤ n + 1.
(N2): For β_0, β_1 ∈ α[δ⃗, n],
   δ⃗, n ⊢^{β_0} m : N    δ⃗, m ⊢^{β_1} Γ
   ────────────────────────────
   δ⃗, n ⊢^α Γ.
(Ax): If Γ contains a true atom (i.e., an equation or inequation between
closed terms) then, for arbitrary α,
   δ⃗, n ⊢^α Γ.
(∨): For β ∈ α[δ⃗, n] and i = 0, 1,
   δ⃗, n ⊢^β Γ, A_i
   ────────────────
   δ⃗, n ⊢^α Γ, A_0 ∨ A_1.
(∧): For β_0, β_1 ∈ α[δ⃗, n],
   δ⃗, n ⊢^{β_0} Γ, A_0    δ⃗, n ⊢^{β_1} Γ, A_1
   ─────────────────────────────
   δ⃗, n ⊢^α Γ, A_0 ∧ A_1.
(∃): For β_1 ∈ α[δ⃗, n] and β_0 ∈ β_1[δ⃗, n],
   δ⃗, n ⊢^{β_0} m : N    δ⃗, n ⊢^{β_1} Γ, A(m)
   ─────────────────────────────
   δ⃗, n ⊢^α Γ, ∃x A(x).
(∀): Provided β_i ∈ α[δ⃗, max(n, i)] for every i,
   δ⃗, max(n, i) ⊢^{β_i} Γ, A(i)   for every i ∈ N
   ──────────────────────────────
   δ⃗, n ⊢^α Γ, ∀x A(x).
(Cut): For β_0, β_1 ∈ α[δ⃗, n], with C the "cut formula",
   δ⃗, n ⊢^{β_0} Γ, C    δ⃗, n ⊢^{β_1} Γ, ¬C
   ────────────────────────────
   δ⃗, n ⊢^α Γ.
(W_i-Ax): For arbitrary α and δ⃗ and 1 ≤ i ≤ k,
   δ⃗, n ⊢^α Γ, ¬W_i(m), W_i(m)   provided m ≤ n.
(W_i): For β_1 ∈ α[δ⃗, n], β_0 ∈ β_1[δ⃗, n] and 1 ≤ i ≤ k,
   δ⃗, n ⊢^{β_0} m : N    δ⃗, n ⊢^{β_1} Γ, F_i(W_i, m)
   ─────────────────────────────────
   δ⃗, n ⊢^α Γ, W_i(m).
(Ω_i): For β_0, β_1 ∈ α[δ⃗, n], ω_i ≼ α and 1 ≤ i ≤ k,
   δ⃗, n ⊢^{β_0} Γ_0, W_i(m)    δ⃗, n; W_i(m) →^{β_1} Γ_1
   ───────────────────────────────────
   δ⃗, n ⊢^α Γ_0, Γ_1
where δ⃗, n; W_i(m) →^{β_1} Γ_1 means: whenever δ⃗′, n′ ⊢^{γ_0}_0 Δ, W_i(m) is a
cut-free derivation with (i) γ ∈ Ω^S_i and γ_0 ∈ γ[δ⃗′, n′], (ii) δ⃗[i → γ],
n ⪯ δ⃗′, n′ as defined below, and (iii) Δ is a set of "positive-in-W_{≥i}"
formulas (i.e., containing no negative occurrences of W_j for j ≥ i);
then δ⃗′, n′ ⊢^{β_1} Δ, Γ_1. In (ii), δ⃗[i → γ] denotes the sequence δ⃗ with δ_i
replaced by γ.
We indicate that all cut formulas in a derivation are of "size" ≤ r by
writing δ⃗, n ⊢^α_r Γ. The requirements on "size" are that a subformula must
be of smaller size than a formula, that atomic formulas m : N have size
0 and that all other atomic formulas have size 1. We first need to extend
slightly our notation to do with sets of structured tree ordinals.
Definition. For δ⃗, n and δ⃗′, n′ in Ω^S_k × · · · × Ω^S_1 × N we write δ⃗, n ⪯ δ⃗′, n′
to mean n ≤ n′ and for all i ≤ k either δ_i = δ′_i or δ_i ∈ δ′_i[δ⃗′, n′].
Lemma. If α ∈ Ω^S_{k+1} and δ⃗, n ⪯ δ⃗′, n′ then α[δ⃗, n] ⊆ α[δ⃗′, n′].
Proof. By induction on α. If α = 0 or α is a successor the result
follows trivially from the definition of α[δ⃗, n]. In the case where α =
sup_{Ω_i} α_ξ where i > 0, we have α[δ⃗, n] = α_{δ_i}[δ⃗, n] ⊆ α_{δ_i}[δ⃗′, n′] by induction
hypothesis. Either δ_i = δ′_i or δ_i ∈ δ′_i[δ⃗′, n′] since δ⃗, n ⪯
δ⃗′, n′. By the definition of structuredness we then have α_{δ_i} ∈ α[δ⃗′, n′] and
hence immediately α_{δ_i}[δ⃗′, n′] ⊆ α[δ⃗′, n′]. Therefore α[δ⃗, n] ⊆ α[δ⃗′, n′] as
required. The case i = 0 is similar.
Lemma. The relation ⪯ is transitive.
Proof. Suppose δ⃗, n ⪯ δ⃗′, n′ and δ⃗′, n′ ⪯ δ⃗″, n″. Then n ≤ n″ immedi-
ately and for each i = 1, …, k, either δ_i = δ′_i or δ_i ∈ δ′_i[δ⃗′, n′], in other
words δ_i ∈ δ′_i + 1[δ⃗′, n′]. By the last lemma, δ′_i + 1[δ⃗′, n′] ⊆ δ′_i + 1[δ⃗″, n″].
Now either δ′_i = δ″_i or δ′_i ∈ δ″_i[δ⃗″, n″] so δ′_i + 1[δ⃗″, n″] ⊆ δ″_i + 1[δ⃗″, n″].
Therefore δ_i ∈ δ″_i + 1[δ⃗″, n″] for each i = 1, …, k, hence δ⃗, n ⪯ δ⃗″, n″ as
required.
Lemma (Weakening). Suppose δ⃗, n ⊢^α Γ in ID_k(W)^∞.
(a) If Γ ⊆ Γ′ and α[δ⃗, n′] ⊆ α′[δ⃗, n′] for all n′ ≥ n then δ⃗, n ⊢^{α_0+α′} Γ′ for
any α_0.
(b) If δ⃗, n ⪯ δ⃗′, n′ then δ⃗′, n′ ⊢^α Γ.
Proof. (a) By induction on α with cases according to the last rule
applied in deriving δ⃗, n ⊢^α Γ. In all cases except the (∀) and (Ω_i) rules the
result follows by first applying the induction hypothesis to the premises
and then re-applying the final rule. This final rule becomes applicable
because if β ∈ α[δ⃗, n] then β ∈ α′[δ⃗, n] by assumption, and consequently
α_0 + β ∈ α_0 + α′[δ⃗, n].
Case (∀). Suppose ∀x A(x) ∈ Γ and for each i we have δ⃗, max(n, i) ⊢^{β_i}
Γ, A(i) for some β_i ∈ α[δ⃗, max(n, i)]. By the induction hypothesis ap-
plied to each of these premises we obtain δ⃗, max(n, i) ⊢^{α_0+β_i} Γ′, A(i).
Also α_0 + β_i ∈ (α_0 + α′)[δ⃗, max(n, i)] because β_i ∈ α[δ⃗, max(n, i)] ⊆
α′[δ⃗, max(n, i)] by assumption. Now we can re-apply the (∀) rule to
obtain the required δ⃗, n ⊢^{α_0+α′} Γ′.
Case (Ω_i). Let Γ = Γ_0, Γ_1 where δ⃗, n ⊢^{β_0} Γ_0, W_i(m) and δ⃗, n;
W_i(m) →^{β_1} Γ_1. It is easy to see that, in this case, we can apply the
induction hypothesis straight away to each of the premises, to obtain
δ⃗, n ⊢^{α_0+β_0} Γ_0, W_i(m) and δ⃗, n; W_i(m) →^{α_0+β_1} Γ_1. Then by re-applying
the (Ω_i) rule one obtains, as before, the desired δ⃗, n ⊢^{α_0+α′} Γ_0, Γ_1.
(b) Again by induction on α with cases according to the last rule
applied in deriving δ⃗, n ⊢^α Γ. We treat the (Ω_i) rules separately. In all
other cases one applies the induction hypothesis to the premises, increasing
δ_i up to δ′_i in the declaration, and then one re-applies the same rule,
noticing that if β ∈ α[δ⃗, n] then β ∈ α[δ⃗′, n′] by the first lemma above.
Finally, the (Ω_i) rule has δ⃗, n ⊢^{β_0} Γ_0, W_i(m) and δ⃗, n; W_i(m) →^{β_1}
Γ_1 as its premises, and the conclusion is δ⃗, n ⊢^α Γ_0, Γ_1. Again the
induction hypothesis can be applied immediately to the first premise, so
as to increase δ⃗, n to δ⃗′, n′ in the declaration. For the second premise
assume δ⃗″, l ⊢^{γ_0}_0 Δ, W_i(m) where γ_0 ∈ γ[δ⃗″, l] and δ⃗′[i → γ], n′ ⪯ δ⃗″, l. The
assumption δ⃗, n ⪯ δ⃗′, n′ together with the transitivity of ⪯ therefore gives
γ_0 ∈ γ[δ⃗″, l] and δ⃗[i → γ], n ⪯ δ⃗″, l. Thus we can apply δ⃗, n; W_i(m) →^{β_1} Γ_1
to obtain δ⃗″, l ⊢^{β_1} Δ, Γ_1. We have now shown the second premise with up-
dated declaration δ⃗′, n′, i.e. δ⃗′, n′; W_i(m) →^{β_1} Γ_1. Since β_0, β_1 ∈ α[δ⃗, n]
and α[δ⃗, n] ⊆ α[δ⃗′, n′] by the first lemma above, one can now re-apply the
(Ω_i)-rule to obtain the required δ⃗′, n′ ⊢^α Γ_0, Γ_1.
5.3.3. Embedding ID_k(W) into ID_k(W)^∞.
Lemma. Suppose δ⃗, n ⊢^{γ}_0 Γ where γ ∈ Ω^S_i and all occurrences of W_i in Γ
are positive. Suppose also that n′ = {0, …, n′−1} ⊆ ω_k[δ⃗, n′] for every
positive n′ ≥ n. Let A(a) be an arbitrary formula of ID_k(W). Then there
are fixed d and r such that in ID_k(W)^∞ one can prove
   δ⃗, max(n, d) ⊢^{ω_k+γ}_r ¬∀_a (F_i(A, a) → A(a)), Γ*
where Γ* results from Γ by replacing some, but not necessarily all, occur-
rences of W_i by A.
Proof. By induction according to the last rule applied in deriving
δ⃗, n ⊢^{γ}_0 Γ. Notice that if this is an axiom then it cannot be a (W_i)-axiom
because all occurrences of W_i are positive. Hence Γ* is (essentially) the
same axiom. The result in this case then follows because any ordinal
bound can be assigned to an axiom.
For any rule other than the (Wi ) and (Ωi ) rules the result follows
straightforwardly by applying the induction hypothesis to the premises
and then re-applying that rule.
In the case of a (W_i) rule suppose δ⃗, n ⊢^{γ}_0 Γ, W_i(m) comes from the
premises δ⃗, n ⊢^{γ_0}_0 m : N and δ⃗, n ⊢^{γ_1}_0 Γ, F_i(W_i, m) where γ_1 ∈ γ[δ⃗, n] and
γ_0 ∈ γ_1[δ⃗, n]. By the induction hypothesis
   δ⃗, max(n, d) ⊢^{ω_k+γ_1}_r Γ*, ¬∀_a (F_i(A, a) → A(a)), F_i(A, m).
By logic we can prove F_i(A, m) ∧ ¬A(m), ¬F_i(A, m), A(m) for any formula
A. The proof can be translated directly into a proof in ID_k(W)^∞ of finite
height. We choose d to be this height, and r to be the size of the formula
F_i(A, m). Since d[δ⃗, n′] ⊆ n′ ⊆ ω_k[δ⃗, n′] for all n′ ≥ max(n, d) we can
apply part (a) of the weakening lemma to give
   δ⃗, max(n, d) ⊢^{ω_k}_0 Γ*, F_i(A, m) ∧ ¬A(m), ¬F_i(A, m), A(m).
By weakening the other premise δ⃗, n ⊢^{γ_0}_0 m : N to δ⃗, max(n, d) ⊢^{ω_k+γ_0}_0
m : N and then applying the (∃) rule,
   δ⃗, max(n, d) ⊢^{ω_k+γ_1}_0 Γ*, ¬∀_a (F_i(A, a) → A(a)), ¬F_i(A, m), A(m).
A cut on the formula F_i(A, m), of size r, immediately gives
   δ⃗, max(n, d) ⊢^{ω_k+γ}_r Γ*, ¬∀_a (F_i(A, a) → A(a)), A(m)
which is the required sequent in this case.
In the case of an (Ω_j) rule, suppose Γ = Γ_0, Γ_1 and the premises are
δ⃗, n ⊢^{γ_0}_0 Γ_0, W_j(m) and δ⃗, n; W_j(m) →^{γ_1}_0 Γ_1 where γ_0, γ_1 ∈ γ[δ⃗, n]. Note
that since γ ∈ Ω^S_i it cannot be the case that j > i and so any set Δ of
positive-in-W_{≥j} formulas can only contain positive occurrences of W_i.
Therefore by applying the induction hypothesis to each of these premises
we easily obtain
   δ⃗, max(n, d) ⊢^{ω_k+γ_0}_0 ¬∀_a (F_i(A, a) → A(a)), Γ_0*, W_j(m),
   δ⃗, max(n, d); W_j(m) →^{ω_k+γ_1}_r ¬∀_a (F_i(A, a) → A(a)), Γ_1*.
We can then re-apply the (Ω_j) rule to obtain
   δ⃗, max(n, d) ⊢^{ω_k+γ}_r ¬∀_a (F_i(A, a) → A(a)), Γ_0*, Γ_1*
since ω_k + γ_0, ω_k + γ_1 ∈ ω_k + γ[δ⃗, n]. This concludes the proof.
Theorem (Embedding). Suppose ID_k(W) ⊢ Γ(x⃗). Then there are
fixed numbers d and r, determined by this derivation, such that for all
n⃗ = n_1, n_2, …, if max(n⃗, d) < n then ω⃗, n ⊢^{ω_k·2+ω_0}_r Γ(n⃗) in ID_k(W)^∞.
Proof. By induction on the height of the given derivation of Γ(x⃗) in
ID_k(W) we show ω⃗, n ⊢^{ω_k·2+d}_r Γ(n⃗) for a suitable d and sufficiently large
n. The number r is an upper bound on the cut rank and the induction
complexity of the ID_k(W) derivation. This proof will now simply be an
extension of the corresponding embedding of Peano arithmetic into its
infinitary system. Since the logic and the rules are essentially the same
we need only consider the new axioms built into ID_k(W). Note that any
finite steps in a derivation, say from s to s+1, can always be replaced by
steps from ω_k · s to ω_k · (s+1) because ω_k · s ∈ ω_k · (s+1)[δ⃗, n] provided
δ⃗, n are all non-zero. Thus if we can derive in ID_k(W)^∞ the axioms of
ID_k(W) with bounds ω_k · d we can derive any consequence from them,
proven with height h, with a corresponding bound ω_k · (d + h).
Firstly for any axiom Γ, ¬W_i(x), W_i(x) we immediately have, by
(W_i-Ax), ω⃗, n ⊢^{ω_k·2}_r Γ(n⃗), ¬W_i(m), W_i(m) for any m ≤ n.
For the inductive closure axioms note that
   ω⃗, n ⊢^{d_0}_0 Γ(n⃗), ¬F_i(W_i, m), F_i(W_i, m)
for some d_0 depending only on the size of the formula F_i(W_i, m). Fur-
thermore for m ≤ n we have ω⃗, n ⊢^0_0 m : N by (N1). Thus by the
(W_i) rule, ω⃗, n ⊢^{d_0+1}_0 Γ(n⃗), ¬F_i(W_i, m), W_i(m). Hence by two appli-
cations of the (∨) rule followed by the (∀) rule we obtain ω⃗, n ⊢^{d_0+4}_0
Γ(n⃗), ∀_a (F_i(W_i, a) → W_i(a)). Therefore, provided we choose d ≥ d_0 + 4,
we have d_0 + 4 ∈ ω_k[ω⃗, n], so by weakening ω⃗, n ⊢^{ω_k·2}_0 Γ(n⃗), ∀_a (F_i(W_i,
a) → W_i(a)).
For the least-fixed-point axioms for W_i we apply the (Ω_i) rule and
the last lemma. First, in order to show the right hand premise of an
(Ω_i)-rule, assume δ⃗′, n′ ⊢^{γ_0}_0 Δ, W_i(m) where γ ∈ Ω^S_i, ω_{i-1} ∈ γ[δ⃗′, n′] and
ω⃗[i → γ], max(n, m) ⪯ δ⃗′, n′. Since Δ, W_i(m) is positive-in-W_{≥i} the
lemma applies, giving
   δ⃗′, n′ ⊢^{ω_k+γ}_r Δ, ¬∀_a (F_i(A, a) → A(a)), A(m)
where max(n⃗, d) < n ≤ n′, d and r depend on the size of the formula A,
and n⃗ are numerals substituted for any other free variables not mentioned
explicitly. Now because ω⃗[i → γ], max(n, m) ⪯ δ⃗′, n′ it follows that
ω_k[δ⃗′, n′] ⊇ ω_i[δ⃗′, n′] ⊇ γ[δ⃗′, n′] and hence ω_k·2[δ⃗′, n′] ⊇ ω_k + γ[δ⃗′, n′].
Therefore by part (a) of the weakening lemma,
   δ⃗′, n′ ⊢^{ω_k·2}_r Δ, ¬∀_a (F_i(A, a) → A(a)), A(m).
We have now shown
   ω⃗, max(n, m); W_i(m) →^{ω_k·2}_r ¬∀_a (F_i(A, a) → A(a)), A(m).
Now we use the (Ω_i) rule to combine this with the axiom ω⃗, max(n, m) ⊢^0_0
Γ(n⃗), ¬W_i(m), W_i(m) so as to derive
   ω⃗, max(n, m) ⊢^{ω_k·2+1}_r Γ(n⃗), ¬W_i(m), ¬∀_a (F_i(A, a) → A(a)), A(m).
Hence by two (∨) rules followed by the (∀) rule,
   ω⃗, n ⊢^{ω_k·2+4}_r Γ(n⃗), ¬∀_a (F_i(A, a) → A(a)), ∀_a (W_i(a) → A(a)).
Hence the least-fixed-point axiom for W_i.
The ordinary induction axioms can be treated as for PA and would yield
a bound ω_0 + 4, which could be weakened to ω_k·2 + 4. As noted above,
the logical rules of ID_k(W) are easily transferred to "finite step" rules
in the infinitary calculus. It now follows that any derivation in ID_k(W)
transforms into an infinitary one ω⃗, n ⊢^{ω_k·2+d}_r Γ(n⃗) for suitable d and r.
Weakening then replaces d by ω_0 as d < n is assumed.
5.3.4. Ordinal analysis of ID_k. In this subsection we compute ordi-
nal bounds for Σ₁-theorems, and hence provably recursive functions, of
ID_k(W). The methods are cut elimination and collapsing à la Buch-
holz [1987]. The point is to estimate their cost in terms of suitable ordinal
functions. The Ωi rules used here are a variation on his original invention,
but tailored to a step-by-step collapsing process. It should be remarked
that the development here is similar to (though somewhat more complex
than) the PhD thesis of Williams [2004] which analyses finitely iterated
inductive definitions from a somewhat different point of view, based on a
weak “predicative” arithmetic with a “pointwise” induction scheme (see
Wainer and Williams [2005] for the uniterated case, and Wainer [2010] for
an overview).
Lemma (Inversion). In ID_k(W)^∞:
(a) If δ⃗, n ⊢^α_r Γ, A_0 ∧ A_1, then δ⃗, n ⊢^α_r Γ, A_i, for each i = 0, 1.
(b) If δ⃗, n ⊢^α_r Γ, ∀x A(x), then δ⃗, max(n, m) ⊢^α_r Γ, A(m).
Proof. The parts are very similar, so we shall only do part (b). Further-
more the fundamental ideas for this are already dealt with in the ordinal
analysis for Peano arithmetic.
We proceed by induction on α. Note first that if the sequent δ⃗, n ⊢^α_r
Γ, ∀x A(x) is an axiom of ID_k(W)^∞ then so is δ⃗, n ⊢^α_r Γ and then the
desired result follows immediately by weakening.
Suppose δ⃗, n ⊢^α_r Γ, ∀x A(x) is the consequence of a (∀) rule with ∀x A(x)
the "main formula" proven. Then the premises are, for each m,
   δ⃗, max(n, m) ⊢^{β_m}_r Γ, A(m), ∀x A(x)
where β_m ∈ α[δ⃗, max(n, m)]. So by applying the induction hypothesis one
immediately obtains δ⃗, max(n, m) ⊢^{β_m}_r Γ, A(m). Weakening then allows
the ordinal bound β_m to be increased to α.
In all other cases the formula ∀x A(x) is a “side formula” occurring in
the premise(s) of the final rule applied. So by the induction hypothesis,
∀x A(x) can be replaced by A(m) and n by max(n, m). The result then
follows by re-applying that final rule.
Lemma (Cut reduction). Suppose δ⃗, n ⊢^α_r Γ, C and δ⃗, n ⊢^{β}_r Γ′, ¬C in
ID_k(W)^∞ where C is a formula of size r+1 and of shape C_0 ∨ C_1 or
∃x C_0(x) or W_i(m) or a false atom. Then
   δ⃗, n ⊢^{β+α}_r Γ, Γ′.
Proof. By induction on α with cases according to the last rule applied in
deriving δ⃗, n ⊢^α_r Γ, C.
If it is an axiom then either Γ is already an axiom or else C is W_i(m)
and Γ′ contains ¬C. In this case we can weaken δ⃗, n ⊢^{β}_r Γ′, ¬C to obtain
δ⃗, n ⊢^{β+α}_r Γ, Γ′ as required.
If it arises by any rule in which C is a side formula then the induction
hypothesis applied to the premises replaces C by Γ′ and adds β to the left
of the ordinal bound. But if γ ∈ α[δ⃗, n] then β + γ ∈ β + α[δ⃗, n], so by
re-applying the final rule one again obtains δ⃗, n ⊢^{β+α}_r Γ, Γ′. This applies
to all of the rules including the (W_i) rule and the (Ω_i) rule.
Finally suppose C is the "main formula" proven in the final rule of the
derivation. There are two cases:
If C is C_0 ∨ C_1 then the premise is δ⃗, n ⊢^{γ}_r Γ, C_i, C with γ ∈ α[δ⃗, n]. By
the induction hypothesis we therefore have δ⃗, n ⊢^{β+γ}_r Γ, C_i, Γ′. By invert-
ing δ⃗, n ⊢^{β}_r Γ′, ¬C where ¬C is ¬C_0 ∧ ¬C_1 we obtain δ⃗, n ⊢^{β+γ}_r Γ′, ¬C_i, Γ
by weakening. Now we can apply a cut on C_i (which has size ≤ r) to
produce δ⃗, n ⊢^{β+α}_r Γ, Γ′.
If C is ∃x C_0(x) the premises are δ⃗, n ⊢^{γ_0}_r m : N and δ⃗, n ⊢^{γ_1}_r Γ, C_0(m), C
where γ_1 ∈ α[δ⃗, n] and γ_0 ∈ γ_1[δ⃗, n]. By the induction hypothesis we
therefore have δ⃗, n ⊢^{β+γ_1}_r Γ, C_0(m), Γ′. Now by inverting δ⃗, n ⊢^{β}_r Γ′, ¬C
where ¬C is ∀x ¬C_0(x) we get δ⃗, max(n, m) ⊢^{β}_r Γ′, ¬C_0(m), Γ by weak-
ening. Observe that the first premise δ⃗, n ⊢^{γ_0}_r m : N can be weakened
to the ordinal bound β + γ_0 and from this, by the (N2) rule, we obtain
δ⃗, n ⊢^{β+γ_1}_r Γ′, ¬C_0(m), Γ. Now we can apply a cut on C_0(m) (which has
size ≤ r) to produce δ⃗, n ⊢^{β+α}_r Γ, Γ′.
Theorem (Cut elimination). In ID_k(W)^∞, if δ⃗, n ⊢^α_{r+1} Γ then we have
δ⃗, n ⊢^{2^α}_r Γ, and by repeating this, δ⃗, n ⊢^{α*}_0 Γ where α* = 2_{r+1}(α).
Proof. By induction on α.
If δ⃗, n ⊢^α_{r+1} Γ arises by any rule other than a cut of rank r+1, simply
apply the induction hypothesis to the premises and then re-apply this final
rule, using the fact that β ∈ α[δ⃗, n] implies 2^β ∈ 2^α[δ⃗, n].
If, on the other hand, it arises by a cut of rank r+1, then the premises
will be δ⃗, n ⊢^{β_0}_{r+1} Γ, C and δ⃗, n ⊢^{β_1}_{r+1} Γ, ¬C where β_0, β_1 ∈ α[δ⃗, n] and
C has size r+1. By weakening if necessary we may assume β_0 = β_1.
Applying the induction hypothesis to these one then obtains δ⃗, n ⊢^{2^{β_0}}_r Γ, C
and δ⃗, n ⊢^{2^{β_0}}_r Γ, ¬C. Cut reduction then gives δ⃗, n ⊢^{2^{β_0+1}}_r Γ and, since
β_0 ∈ α[δ⃗, n], 2^{β_0} ∈ 2^α[δ⃗, n] and therefore 2^{β_0+1}[δ⃗, n′] ⊆ 2^α[δ⃗, n′] for all
n′ ≥ n. Weakening then gives δ⃗, n ⊢^{2^α}_r Γ as required.
Theorem (Collapsing). If δ⃗, n ⊢^α_0 Γ in ID_k(W)^∞ where Γ is a set of
positive-in-W_k formulas, then
   δ⃗, n ⊢^{ϕ(α,δ_k)}_0 Γ
by a derivation in which there are no (Ω_k) rules. Here ϕ denotes the function
ϕ^{(k)} : Ω^S_{k+1} × Ω^S_k → Ω^S_k defined at the beginning of this chapter.
Proof. By induction on α, as usual, with cases according to the last
rule applied in deriving δ⃗, n ⊢^α_0 Γ. In the case of axioms there is nothing
to do, because the ordinal bound can be chosen arbitrarily. In all rules
except the (Ω) rules the process is the same: apply the induction hypothesis to
the premises, and then re-apply the final rule. For instance, if the final
rule is a (∀) where ∀x A(x) ∈ Γ, then the premises are δ⃗, max(n, i) ⊢^{β_i}_0
Γ, A(i) with β_i ∈ α[δ⃗, max(n, i)] for all i. By the induction hypothesis
we therefore have a derivation of δ⃗, max(n, i) ⊢^{ϕ(β_i,δ_k)}_0 Γ, A(i) in which
there are no (Ω_k) rules. Since β_i ∈ α[δ⃗, max(n, i)], an earlier calculation
gives ϕ(β_i, δ_k) ∈ ϕ(α, δ_k)[δ⃗, max(n, i)]. Re-applying the (∀) rule gives
δ⃗, n ⊢^{ϕ(α,δ_k)}_0 Γ.
Now suppose δ⃗, n ⊢^α_0 Γ_0, Γ_1 comes about by an application of an
(Ω_i) rule where i < k. Applying the induction hypothesis to the first
premise δ⃗, n ⊢^{β_0}_0 Γ_0, W_i(m) gives immediately the derivation δ⃗, n ⊢^{ϕ(β_0,δ_k)}_0
Γ_0, W_i(m) in which there are no (Ω_k) rules. In the case of the second
premise δ⃗, n; W_i(m) →^{β_1}_0 Γ_1 assume δ⃗′, n′ ⊢^{γ_0}_0 Δ, W_i(m) where γ_0 ∈ γ[δ⃗′, n′]
and δ⃗[i → γ], n ⪯ δ⃗′, n′. Since γ ∈ Ω^S_i the first component δ′_k in the declaration
plays no role and may be chosen arbitrarily, so now replace δ′_k by δ_k.
Then by applying the second premise δ⃗, n; W_i(m) →^{β_1}_0 Γ_1 one obtains
δ⃗′[k → δ_k], n′ ⊢^{β_1}_0 Δ, Γ_1. Then the induction hypothesis gives
   δ⃗′[k → δ_k], n′ ⊢^{ϕ(β_1,δ_k)}_0 Δ, Γ_1
with no (Ω_k) rules, and since ϕ(β_1, δ_k) ∈ Ω^S_k the δ_k at the front of the
declaration may again be replaced by anything, in particular the original
δ′_k. This proves δ⃗, n; W_i(m) →^{ϕ(β_1,δ_k)}_0 Γ_1. We can therefore re-apply this
(Ω_i) rule, using ϕ(β_0, δ_k), ϕ(β_1, δ_k) ∈ ϕ(α, δ_k)[δ⃗, n] to obtain δ⃗, n ⊢^{ϕ(α,δ_k)}_0
Γ_0, Γ_1, again with no (Ω_k) rules.
Finally suppose δ⃗, n ⊢^α_0 Γ_0, Γ_1 comes about by an application of an
(Ω_k) rule. By a simple weakening of one of the premises we may safely
assume that they both have the same ordinal bound β ∈ α[δ⃗, n]. Ap-
plying the induction hypothesis to the first premise δ⃗, n ⊢^{β}_0 Γ_0, W_k(m)
we obtain a derivation δ⃗, n ⊢^{ϕ(β,δ_k)}_0 Γ_0, W_k(m) as before, without (Ω_k)
rules. Since Γ_0 is a set of positive-in-W_k formulas we can apply the
second premise δ⃗, n; W_k(m) →^{β}_0 Γ_1 with γ = ϕ(β, δ_k), Δ = Γ_0 and
δ⃗′, n′ = γ, δ_{k-1}, …, δ_1, n to obtain γ, δ_{k-1}, …, δ_1, n ⊢^{β}_0 Γ_0, Γ_1. Hence
by the induction hypothesis, γ, δ_{k-1}, …, δ_1, n ⊢^{ϕ(β,γ)}_0 Γ_0, Γ_1 without (Ω_k)
rules, and since the ordinal bound is now in Ω^S_k the γ at the front is
redundant and may be replaced by anything, in particular δ_k. Thus
δ⃗, n ⊢^{ϕ(β,γ)}_0 Γ_0, Γ_1 where ϕ(β, γ) = ϕ(β+1, δ_k). Since β ∈ α[δ⃗, n], we have
ϕ(β+1, δ_k)[δ⃗, n] ⊆ ϕ(α, δ_k)[δ⃗, n], so a final weakening gives δ⃗, n ⊢^{ϕ(α,δ_k)}_0
Γ_0, Γ_1. Notice that we have eliminated this application of the (Ω_k) rule.
5.3.5. Accessible = provably recursive in ID_{<ω}. Now we can put the
above results together:
Theorem. If ID_k(W) ⊢ Γ(x⃗) where Γ is a set of purely arithmetical
(or even positive-in-W_1) formulas, then there are fixed numbers d and r,
and a fixed (countable) α ≺ τ_{k+2}, such that if n > max(n⃗, d, r) then in
ID_0(W)^∞ + (W_1) = PA^∞ + (W_1) we can derive
   n ⊢^α_0 Γ(n⃗).
Recall τ_{k+2} = ϕ^{(1)}(ϕ^{(2)}(… ϕ^{(k+1)}(ϕ^{(k+2)}(Ω_{k+2}, Ω_{k+1}), Ω_k), …, Ω_1), Ω_0).
Proof. The embedding theorem shows that if Γ(x⃗) is provable in
ID_k(W) then there are fixed numbers d, r, determined by this proof,
such that if n > max(n⃗, d) then
   ω⃗, n ⊢^{ω_k+ω_k+d}_r Γ(n⃗).
Repeated applications of, first, cut elimination and then collapsing will
immediately transform this into a derivation in ID_0(W)^∞ + (W_1), with
a tree ordinal bound of approximately the right form, but it will not be
≺ τ_{k+2} as required. In order to achieve a bound α ≺ τ_{k+2}, we need to
perform some additional calculations which only the most determined fan
of tree ordinals will wish to follow.
First, we can slightly weaken the result of embedding, by noting that
whenever we have an infinitary derivation ω⃗, n ⊢^{ω_k+β}_r Γ, the ordinal bound
can be replaced by ω_k + 2^β thus: ω⃗, n ⊢^{ω_k+2^β}_r Γ. This is easily shown
by induction on β, for we only need to check the ordinal assignment
conditions, that if ω_k + γ ∈ ω_k + β[ω⃗, n] then γ ∈ β[ω⃗, n], hence 2^γ ∈
2^β[ω⃗, n] and finally ω_k + 2^γ ∈ ω_k + 2^β[ω⃗, n]. Therefore
   ω⃗, n ⊢^{ω_k+2^{ω_k+d}}_r Γ(n⃗).
Now ω_k + 2^{ω_k}·(1 + 2^{2^{ω_k}})[ω⃗, n] = ω_k + 2^{ω_k}·(1 + 2^{2^n})[ω⃗, n]. Such calculations
are easily checked: the rightmost ω_k diagonalizes to ω_{k-1}, then to ω_{k-2},
and so on down to ω_0 and then to n. Thus if n > d we see that ω_k + 2^{ω_k+d}
belongs to ω_k + 2^{ω_k}·(1 + 2^{2^{ω_k}})[ω⃗, n]. We can now use part (a) of the
weakening lemma to increase the ordinal bound ω_k + 2^{ω_k+d} to ω_k + 2^{ω_k}·
(1 + 2^{2^{ω_k}}), which is the same as ϕ^{(k+1)}(Ω_{k+1}, ϕ^{(k+1)}(Ω_{k+1}, ω_k)). This is
because for β ∈ Ω^S_{k+1} the definition of ϕ^{(k+1)} gives ϕ^{(k+1)}(Ω_{k+1}, β) =
ϕ^{(k+1)}(β, β) = β + 2^β. Thus
   ω⃗, n ⊢^{ϕ^{(k+1)}(Ω_{k+1},ϕ^{(k+1)}(Ω_{k+1},ω_k))}_r Γ(n⃗).
By the cut elimination theorem, any derivation with cut rank r and
ordinal bound β ∈ Ω^S_{k+1} can be transformed into a derivation with
cut rank r−1 and ordinal bound 2^β. But this could be weakened to
β + 2^β = ϕ^{(k+1)}(Ω_{k+1}, β). Therefore we can successively reduce the cut rank
by iterating the function θ(·) := ϕ^{(k+1)}(Ω_{k+1}, ·) to obtain
   ω⃗, n ⊢^{θ^{r+2}(ω_k)}_0 Γ(n⃗).
By repeated collapsing
   n ⊢^α_0 Γ(n⃗)
where α = ϕ^{(1)}(ϕ^{(2)}(… ϕ^{(k)}(θ^{r+2}(ω_k), ω_{k-1}) …, ω_1), ω_0). Since this is
now a cut-free derivation with a countable ordinal bound, it has neither
(W_i) rules for i > 1 nor any (Ω_i) rules and the prefix δ⃗ is redundant.
Hence in ID_0(W)^∞ + (W_1) we have
   n ⊢^α_0 Γ(n⃗).
It remains to check that α ≺ τ_{k+2}. Firstly ϕ^{(k+1)}(ϕ^{(k+2)}(Ω_{k+2}, Ω_{k+1}),
Ω_k) = ϕ^{(k+1)}(Ω_{k+1} + 2^{Ω_{k+1}}, Ω_k) = ϕ^{(k+1)}(Ω_{k+1} + 2^{Ω_k}, Ω_k). Furthermore
we have ϕ^{(k+1)}(Ω_{k+1} + 2^{Ω_k}, Ω_k)[ω⃗, n] = ϕ^{(k+1)}(Ω_{k+1} + 2^n, Ω_k)[ω⃗, n] =
θ^{2^{2^n}}(ω_k)[ω⃗, n] and this set contains θ^{r+2}(ω_k) as long as 2^{2^n} > r + 2. This
is because β ∈ θ(β)[ω⃗, n] for all β ∈ Ω^S_{k+1}. Hence
   θ^{r+2}(ω_k) ∈ ϕ^{(k+1)}(ϕ^{(k+2)}(Ω_{k+2}, Ω_{k+1}), Ω_k)[ω⃗, n].
Now recall that if β ∈ β′[ω⃗, n] then ϕ^{(k)}(β, ω_{k-1}) ∈ ϕ^{(k)}(β′, ω_{k-1})[ω⃗, n].
So
   ϕ^{(k)}(θ^{r+2}(ω_k), ω_{k-1}) ∈
   ϕ^{(k)}(ϕ^{(k+1)}(ϕ^{(k+2)}(Ω_{k+2}, Ω_{k+1}), Ω_k), ω_{k-1})[ω_{k-2}, …, ω_0, n].
Repeating this process at levels k−1, k−2, …, 1 we thus obtain
   α = ϕ^{(1)}(ϕ^{(2)}(… ϕ^{(k)}(θ^{r+2}(ω_k), ω_{k-1}) …, ω_1), ω_0)
     ∈ ϕ^{(1)}(… ϕ^{(k)}(ϕ^{(k+1)}(ϕ^{(k+2)}(Ω_{k+2}, Ω_{k+1}), Ω_k), ω_{k-1}) …, ω_0)[n]
     = τ_{k+2}[n].
We have checked, in the course of the above, that this holds provided n is
large enough, for example n > max(d, r). Thus α ∈ τ_{k+2}[max(d, r)] and
hence α ≺ τ_{k+2}.
Lemma (Witnessing lemma). Suppose Γ is a finite set of Σ₁-formulas
such that n ⊢^α_0 Γ(n⃗). We may assume, by weakening if necessary, that
the derivation is "term controlled" in the sense that the ordinals assigned to
sub-derivations are sufficient to ensure that B bounds the numerical values
of any elementary terms appearing (e.g., β + 1 = β + 2^0 ≺ β + 2^β). Then one of
the formulas in Γ is true with existential witnesses < B_α(n + 1).
Proof. Proceed by induction on α with cases according to the last rule
applied. If Γ is an axiom, then it contains a true atom, which is a trivial
Σ₁-formula requiring no witnesses, and so we are done. (N2) is easy.
If the last rule applied is anything other than an (∃) rule then its principal
formula has only bounded quantifiers. Either this is true, in which case
it is again a true Σ₁-formula requiring no witnesses and we are done,
or else one of the premises is of the form n ⊢^{β}_0 Γ′(n⃗), C where C is a
false subformula of that principal formula and n is less than the value of
some term t(n⃗). Applying the induction hypothesis to this premise we see
that Γ′(n⃗) ⊆ Γ(n⃗) contains a true Σ₁-formula with witnesses less than
B_β(n + 1). The term control ensures that n + 1 ≤ B_β(n + 1) and so the
witnesses are bounded by B_β(B_β(n + 1)) ≤ B_α(n + 1) as required.
Finally suppose the last rule applied is an (∃) rule with conclusion Γ(n⃗)
where the principal formula is ∃_y D(y, n⃗) ∈ Γ(n⃗) and the premises are
n ⊢^{β_0}_0 m : N and n ⊢^{β_1}_0 Γ(n⃗), D(m, n⃗). Then by the induction hypothesis,
either Γ(n⃗) already contains a true Σ₁-formula with witnesses less than
B_{β_1}(n + 1) or else D(m, n⃗) is a true Σ₁-formula with witnesses less than
B_{β_1}(n + 1) and the new witness m for ∃_y is also less than B_{β_1}(n + 1).
(Recall the bounding lemma, that n ⊢^{β_0}_0 m : N implies m ≤ B_{β_0}(n).) Since
B_{β_1}(n + 1) is less than B_α(n + 1) we are done.
Corollary. If ID_k(W) ⊢ ∃_y A(x⃗, y) with A a bounded formula, then
there is an α ≺ τ_{k+2} and a number d such that for any n⃗ there are m⃗ <
B_α(max(n⃗, d) + 2) such that A(n⃗, m⃗) holds.
Proof. By the theorem we have n ⊢^α_0 ∃_y A(n⃗, y) if n > max(n⃗, d) and
we can safely assume that the derivation is term controlled (if not, weaken
it to one that is, by transforming each ordinal bound β into γ + 2^β where
γ is chosen to ensure that B_γ bounds all terms in A and γ + 2^α ≺ τ_{k+2}).
Applying the lemma we have true witnesses m⃗ < B_α(max(n⃗, d) + 2) such
that A(n⃗, m⃗) holds.
From the foregoing and 5.3.1 we immediately have
Theorem. The provably recursive functions of ID_k(W) are exactly those
elementary, or primitive, recursive in {B_α | α ≺ τ_{k+2}}. Hence the accessible
recursive functions are exactly those provably recursive in ID_{<ω}(W).
From the corollary in 5.2.2 in the previous section of this chapter, one
now sees immediately that witnesses for existential theorems of ID_k(W)
may alternatively be bounded by levels of the slow-growing hierarchy
G_α with α ≺ τ_{k+3}. Arai [1991] and Schwichtenberg [1992] give quite
different analyses, for the ID_k's and ID_1 respectively, both of which are
quite novel in that they directly bound existential witnesses in terms of the
slow-growing, rather than fast-growing, hierarchy.
5.3.6. Provable ordinals of ID_k(W). By a "provable ordinal" of the
theory ID_k(W) we mean one for which there is a recursive ordinal notation
a such that ID_k(W) ⊢ W_1(a). This is equivalent to proving transfinite
induction up to a in the form
   ∀_b (F_1(A, b) → A(b)) → ∀_{b≺a} A(b).
Thus in case k = 0, although W_1 is not a predicate symbol of PA, it
nevertheless makes perfectly good sense to refer to the ordinals below
ε₀ = |τ_2| as the provable ordinals of ID_0(W), since we know already that
they are the ones for which PA proves transfinite induction.
Theorem. For each k, the provable ordinals of ID_k(W) are exactly those
less than |τ_{k+2}|.
Proof. Any ordinal less than |τ_{k+2}| is represented by a tree ordinal
α ≺ τ_{k+2} and, by 5.3.1, this has a notation a for which ID_k(W) ⊢ W_1(a).
Thus every ordinal below |τ_{k+2}| is a provable one of ID_k(W).
Conversely, if ID_k(W) ⊢ W_1(a), then by 5.3.4 one can derive a ⊢^α_0
W_1(a) in ID_0(W)^∞ + (W_1) for some fixed α ≺ τ_{k+2}. Therefore a ⊢^{β}_0
F_1(W_1, a) for some β ≺ α. By inverting this, and deleting existential side
formulas which contain false atomic conjuncts, one easily sees that either
a = 0 or a = ⟨0, b⟩ and a ⊢^{γ}_0 W_1(b) for some γ ≺ β, or else a = ⟨1, e⟩
and for every n, max(a, n) ⊢^{γ}_0 W_1({e}(n)), again where γ ≺ β. Thus one
can prove by induction on α that if n ⊢^α_0 W_1(a) then |a| ≤ |α|.
Our ϕ^{(k)}-functions—a variation on the tree ordinal approach devel-
oped by Buchholz [1987]—give a system of ordinal representations which
is somewhat different in style from those more commonly used in proof-
theoretical analysis (e.g., based on the Bachmann–Veblen hierarchy of
"critical" functions). The outcome is the same nevertheless—a compu-
tation of the "ordinal" τ_{k+2} of ID_k(W). Thus in particular, since it
gives the ordinal of ID_1, τ_3 is a presentation of the so-called Bachmann–
Howard ordinal, often denoted φ_{ε_{Ω+1}}(0) or ψ_Ω(ε_{Ω+1}), and in this latter
notation the ordinal |τ| of ID_{<ω}(W) is ψ_Ω(Ω_ω). For a wealth of further
information on "higher" ordinal analyses of transfinitely iterated induc-
tive definitions, strong subsystems of second-order arithmetic and related
admissible set theories extending beyond ID_{<ω}(W), see for instance the
work of Buchholz and Pohlers [1978], Buchholz, Feferman, Pohlers, and
Sieg [1981], Buchholz [1987], Jäger [1986], Pohlers [1998], [2009], Rath-
jen [1999] which gives an expert overview, Arai [2000], Carlson [2001],
Rathjen [2005].
5.4. ID_{<ω} and Π¹₁-CA₀
Π¹₁-CA₀ is the second-order (classical) theory whose language extends
that of PA by the addition of new variables X, Y, … and f, g, … denot-
ing sets of numbers and (respectively) unary number-theoretic functions.
Thus f(x, y) stands for f(⟨x, y⟩). Of course, the set variables may be
eliminated in favour of function variables only, for example replacing
∃_X A(X) by ∃_f A({x | f(x) = 0}) and t ∈ {x | f(x) = 0} by f(t) = 0.
For later convenience we shall consider this done, but for the time being
we continue to use the set notation as an abbreviation.
The axioms of Π¹₁-CA₀ are those of PA together with the single in-
duction axiom (not the schema—this is what the subscript 0 in "-CA₀"
indicates)
   ∀_X (0 ∈ X ∧ ∀_x (x ∈ X → x+1 ∈ X) → ∀_x (x ∈ X))
and the comprehension schema
   ∃_X ∀_x (x ∈ X ↔ C(x))
restricted to Π¹₁-formulas C which do not contain X free but may have
first- and second-order parameters. Recall that Π¹₁-formulas are those of
the form ∀_f A(f) with A arithmetical (i.e., containing no second-order
quantifiers). The comprehension principle gives sets, but we do not yet
have a principle guaranteeing the existence of functions whose graphs
are definable. To this end we need to add the so-called graph princi-
ple:
   ∀_{x⃗} ∃!_z (⟨x⃗, z⟩ ∈ X) → ∃_h ∀_{x⃗} (⟨x⃗, h(x⃗)⟩ ∈ X).
As an example of how this is used we show, briefly, that the following
version of the axiom of choice,
   ∀_x ∃_f A(x, f) → ∃_h ∀_x A(x, h_x),
is provable in Π¹₁-CA₀, for arithmetical A and where h_x(y) := h(x, y).
First, let s ≼ f mean that s is a sequence number with length lh(s)
and ∃_{i<lh(s)} (∀_{j<i} ((s)_j = f(j)) ∧ (s)_i ≤ f(i)), and let s ⊂ f mean
∀_{j<lh(s)} ((s)_j = f(j)). By Π¹₁ comprehension, define sets X_1 = {⟨x, s⟩ |
∀_f (A(x, f) → s ≼ f)} and X_2 = {⟨x, s⟩ | ∀_f (A(x, f) → s ⊂ f)}.
Then set X = X_1 \ X_2. Now assume ∃_f A(x, f). Then X is the set
of all pairs ⟨x, s⟩ such that s is an initial segment of the "leftmost"
function f satisfying A(x, f), and it is easy to prove by induction on
y that ∀_y ∃!_s (⟨x, s⟩ ∈ X ∧ lh(s) = y+1). The graph principle therefore
gives
   ∀_x ∃_f A(x, f) → ∃_g ∀_{x,y} (⟨x, g(x, y)⟩ ∈ X ∧ lh(g(x, y)) = y+1)
so by defining h(x, y) = (g(x, y))_y one sees that for each x, h_x is the
"leftmost" branch through A(x, ·) and hence
   ∀_x ∃_f A(x, f) → ∃_h ∀_x A(x, h_x).
A similar style of argument, though only requiring arithmetical compre-
hension, also proves König’s lemma, that an infinite, finitely branching
tree must have an infinite branch.
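The "leftmost branch" at work in both arguments is a classical, non-effective construction. The following Haskell fragment (ours; the predicate alive is an assumed oracle, not computable in general, and all names are hypothetical) shows its combinatorial content for a finitely branching tree:

   -- the leftmost infinite branch through a finitely branching tree,
   -- built one level at a time using an assumed oracle
   leftmostBranch :: ([Int] -> Bool)  -- alive s: s has extensions of every length
                  -> Int              -- branching bound of the tree
                  -> [Int]            -- the (lazy, infinite) leftmost branch
   leftmostBranch alive width = go []
     where
       go s = case [ j | j <- [0 .. width - 1], alive (s ++ [j]) ] of
                (i:_) -> i : go (s ++ [i])  -- least alive child; by pigeonhole
                                            -- one exists if the tree is infinite
                []    -> []                 -- cannot happen for an infinite tree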
The foundational importance of Π¹₁-CA₀ is that it is strong enough to
formalize and develop large (most) parts of core mathematics—up to,
for example, the Cantor–Bendixson theorem, Ulm's structure theory for
arbitrary countable abelian groups, and many other fundamental results.
The reader is recommended to consult the major work of Simpson [2009]
where Π¹₁-CA₀ features at the "top" of a hierarchy of five mathematically
significant subsystems of second-order arithmetic (RCA₀, WKL₀, ACA₀,
ATR₀ and Π¹₁-CA₀), each of which captures and (in reverse) characterizes
deep mathematical principles in terms of the levels of comprehension
allowed.
Takeuti [1967] gave the first constructive consistency proof for Π¹₁-
analysis, using his ordinal diagrams, and Feferman [1970] proved that
various Π¹₁-systems, in particular Π¹₁-CA₀, can be reduced to theories of
iterated inductive definitions. We follow here the treatment reviewed in
chapter 1 of Buchholz, Feferman, Pohlers, and Sieg [1981], to show that
Π¹₁-CA₀ and ID_{<ω}(W) have the same first-order theorems (and thus
the same provably recursive functions). However, a certain amount
of additional care must be taken in the reduction of Π¹₁-CA₀ to the
finitely iterated ID system used here, because our W_i's are defined in
terms of unrelativized partial recursive sequencing at limits, and this
means that the usual Π¹₁ normal form cannot be reduced directly to a
W_i set without further manipulation into a special normal form due to
Richter [1965].
5.4.1. Embedding ID_{<ω}(W) in Π¹₁-CA₀. First, as a straightforward
illustration of the power of Π¹₁-CA₀, we show that ID_{<ω}(W) is eas-
ily embedded into it. One only needs to prove the existence of sets
X_1, …, X_k, … satisfying the inductive closure and least-fixed-point ax-
ioms for the operator forms F_1, …, F_k, … respectively. For each k
let F_k(X_1, …, X_{k-1}, Y, a) be the formula obtained from the operator
form F_k(A, a) by replacing each occurrence of W_i by the set variable
X_i and the formula A by Y. Then let C_k(X_1, …, X_{k-1}, Z) be the for-
mula
   ∀_z (z ∈ Z ↔ ∀_Y (∀_a (F_k(X_1, …, X_{k-1}, Y, a) → a ∈ Y) → z ∈ Y))
expressing that Z is the intersection of all sets Y which are inductively
closed under F_k with respect to the set parameters X_1, …, X_{k-1}. Since
this is a Π¹₁-condition we have by Π¹₁-comprehension,
   ∃_Z C_k(X_1, …, X_{k-1}, Z).
By its very definition, this Z is the least fixed point of F_k(X_1, …, X_{k-1}, Y)
as an operator on sets Y. The corresponding least-fixed-point schema of
ID_k(W), on arithmetically defined sets Y, is a consequence of comprehen-
sion. The inductive closure axiom holds because F_k(X_1, …, X_{k-1}, Y, a)
is positive (and thus, as an operator, monotone) in Y , for we have
Fk (X1 , . . . , Xk−1 , Z, z) →
∀Y (Z ⊆ Y → Fk (X1 , . . . , Xk−1 , Y, z)) →
∀Y (∀a (Fk (X1 , . . . , Xk−1 , Y, a) → a ∈ Y ) → Fk (X1 , . . . , Xk−1 , Y, z)) →
∀Y (∀a (Fk (X1 , . . . , Xk−1 , Y, a) → a ∈ Y ) → z ∈ Y ) →
z ∈ Z.
Hence if X1 , . . . , Xk−1 are the sets W1 , . . . , Wk−1 then Z is the set Wk .
The provable formula
∃X1 ∃X2 . . . ∃Xk (C1 (X1 ) ∧ C2 (X1 , X2 ) ∧ · · · ∧ Ck (X1 , . . . , Xk−1 , Xk ))
therefore establishes the existence of W1 , . . . , Wk in Π11 -CA0 .
Conversely, we need to show that, for first-order arithmetical sentences
at least, Π¹₁-CA₀ is conservative over ID_{<ω}(W). This will require a (many-
one recursive) reduction of any Π¹₁-form to one of the W_i sets, and a suit-
able interpretation of the second-order theory Π¹₁-CA₀ inside ID_{<ω}(W).
We leave aside the interpretation for the time being, and concentrate first
on Richter's Π¹₁-reduction without bothering explicitly about its formal-
ization.
5.4.2. Reduction of Π¹₁-forms to W_i sets. By standard quantifier manip-
ulations, using Kleene's normal form for partial recursion and the usual
overbar to denote the course-of-values function f̄(x) = ⟨f(0), …, f(x)⟩,
any Π¹₁-formula C with set parameter X and number variable a can be
brought to the form
   C(a, X) := ∀_f ∃_x R(a, f̄(x), ḡ_1(x), ḡ_2(x))
where R is some "primitive recursive" formula (having only bounded
quantifiers) and g_1, g_2 are the strictly increasing functions enumerating X
and its complement (denoted here X̄). Let h encode the three functions
f, g_1, g_2 by h(x) = ⟨f(x), g_1(x), g_2(x)⟩ so that h_0(x) = (h(x))_0 = f(x),
h_1(x) = g_1(x) and h_2(x) = g_2(x). Then the negation of the above form
is equivalent to
   ∃_h ∀_x (h(x) ∈ N × X × X̄ ∧ ¬R_1(a, h̄(x)))
where ¬R_1(a, h̄(x)) expresses the conjunction of h_1(x−1) < h_1(x),
h_2(x−1) < h_2(x) and ∃_{y≤x} (h_1(y) = x ∨ h_2(y) = x) and ¬R(a, h̄_0(x),
h̄_1(x), h̄_2(x)). Negating once again, one sees that the original Π¹₁-form is
equivalent to
   C(a, X) := ∀_h ∃_x (h(x) ∉ N × X × X̄ ∨ R_1(a, h̄(x))).
We refer to this as the "Richter normal form" on N, X, X̄.
Now, to take this a stage further, suppose that X were expressible in
Richter normal form from parameter Y thus:
   x ∈ X ↔ ∀_f ∃_y (f(y) ∉ Y ∨ R_2(x, f̄(y))).
Then, putting the two forms together, we have
   C(a, X) ↔ ∀_h ∃_x ∀_f ∃_y (h_1(x) ∉ X ∨ f(y) ∉ Y ∨ R_1(a, h̄(x)) ∨
   R_2(h_2(x), f̄(y))).
Replacing ∃_x ∀_f P(x, f) by the (classically) equivalent ∀_g ∃_x P(x, g(x, ·)),
the right hand side now becomes
   ∀_h ∀_g ∃_x ∃_y (h_1(x) ∉ X ∨ g(x, y) ∉ Y ∨ R_1(a, h̄(x)) ∨
   R_3(h_2(x), ḡ(x, y)))
where R_3 is a suitably modified version of R_2. The negation of this says
that there are functions h, g such that, for all x, y,
   h_1(x) ∈ X ∧ g(x, y) ∈ Y ∧ ¬R_1(a, h̄(x)) ∧ ¬R_3(h_2(x), ḡ(x, y)).
By combining the functions h = ⟨h_0, h_1, h_2⟩ and g into a new function
f(x, y) = ⟨h_0(x), h_1(x), h_2(x), g(x, y)⟩, and adding a new primi-
tive recursive clause ¬R_4(f̄(x, y)) expressing, for each i = 0, 1, 2,
that f_i = h_i is independent of y, i.e., f_i(x, y) = f_i(x, 0), one
sees that, for all x, y, this last line is equivalent to the conjunction of
f_1(x, y) ∈ X ∧ f_3(x, y) ∈ Y and ¬R_1(a, ⟨f_0, f_1, f_2⟩(x, y)) and
¬R_3(f_2(x, y), f̄_3(x, y)) and ¬R_4(f̄(x, y)). Negating back again,
and contracting the pair x, y into a single z (= ⟨x, y⟩), one obtains
   C(a, X) ↔ ∀_f ∃_z (f(z) ∉ N × X × N × Y ∨ R_5(a, f̄(z)))
where R_5 is a disjunction of suitably modified versions of R_1, R_3, R_4. It
is then a simple matter to combine the two occurrences of N into one,
so that C is now expressed in Richter normal form on the parameters
N, X, Y.
Thus, if C is Π¹₁ in X and X is expressible in Richter normal form
on parameter(s) Y, then C is expressible in Richter normal form on
parameters N, X, Y. Now one sees that since W_1, being Π¹₁ in no set
parameters, is therefore expressible in Richter normal form on parameter
N, any set Π¹₁ in W_1 is expressible in Richter normal form on parameters
N, W_1, and since this includes W_2, any set Π¹₁ in W_2 is expressible in Richter
normal form on parameters N, W_1, W_2 (one can always combine multiple
occurrences of N in the parameter list). Iterating this, it follows that any
set Π¹₁ in W_k is expressible in Richter normal form on the parameters
N, W_1, …, W_k.
Lemma. If S is Π¹₁ in W_k then it is many-one reducible to W_{k+1}. That is,
there is a recursive function g such that a ∈ S ↔ g(a) ∈ W_{k+1}.
Proof. If S is Π¹₁ in W_k then, by the above, there is a primitive recursive
relation R such that, for all a,
   a ∈ S ↔ ∀_f ∃_z (f(z) ∈ N × W_1 × · · · × W_k → R(a, f̄(z))),
and we may assume that R(a, s) implies R(a, s′) for any extension s′ of
s by simply replacing R(a, s) by ∃_{t⊆s} R(a, t) if necessary. For notational
simplicity only, we carry the proof through for k = 1, the general case
being entirely similar. The function g is given by g(a) = g(a, ⟨⟩) where,
for arbitrary sequence numbers s, the binary g(a, s) is defined (from its
own index) by the recursion theorem thus:
   g(a, s) = 0, if R(a, s);
   g(a, s) = ⟨1, Λi ⟨2, Λj g(a, s ∗ ⟨i, j⟩)⟩⟩, otherwise.
Here, the Kleene notation Λi t denotes any index of t regarded as a re-
cursive function of i only, and s ∗ s′ denotes the new sequence number
obtained by concatenating s with s′. We must show
   g(a, s) ∈ W_2 ↔ ∀_f ∃_z (f(z) ∈ N × W_1 → R(a, s ∗ f̄(z)))
so that the required result follows immediately by putting s = the empty
sequence ⟨⟩.
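The branching structure of this definition is perhaps easier to see when indices are replaced by functions. Below is a small Haskell sketch (ours; Λ-indices become closures, W_1-notations are flattened to plain integers, and the coded pair ⟨i, j⟩ is appended as two separate entries), intended only to show the shape of the tree that g(a, s) builds:

   -- the reduction tree of g(a,s): <1,e> becomes an N-branching node,
   -- <2,e> a node branching over (codes for) W_1-notations
   data GTree = Accept                  -- g(a,s) = 0: R(a,s) holds
              | BranchN (Int -> GTree)  -- <1, Lambda i ...>
              | BranchW (Int -> GTree)  -- <2, Lambda j ...>

   gTree :: ([Int] -> Bool) -> [Int] -> GTree
   gTree r s
     | r s       = Accept
     | otherwise = BranchN (\i -> BranchW (\j -> gTree r (s ++ [i, j])))

A branch f through this tree is cut off at an Accept leaf exactly when R accepts some finite initial segment s ∗ f̄(z), which is the content of the equivalence to be proved.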
For the left-to-right implication we use (informally) the least-fixed-
point property of W_2, by applying it to
   A(b) := ∀_{c⪯b} ∀_s (c = g(a, s) → ∀_f ∃_z (f(z) ∈ N × W_1 →
   R(a, s ∗ f̄(z))))
with ⪯ the "sub-tree" partial ordering on W_2. We show that ∀_b (F_2(A,
b) → A(b)) from which we get the required left-to-right implication by
putting b = g(a, s). Note that this is an abuse of the language of ID_2(W)
because A is not even a first-order formula. However, it will be when
we come to formalize this argument subsequently. So assume F_2(A, b).
This means that A(c) holds for every c ≺ b. If b = ⟨0, c⟩ or ⟨2, e⟩
then A(b) is automatic because b is not a value of g. If b = 0 and
b = g(a, s) then R(a, s) holds and we again have A(b). Finally suppose
b = ⟨1, e⟩ and b = g(a, s). Then for each i, {e}(i) = ⟨2, e_i⟩ ≺ b where
{e_i}(j) = g(a, s ∗ ⟨i, j⟩) for every j ∈ W_1. Hence for every n, considered
as a pair n = ⟨i, j⟩, if j ∈ W_1 we have g(a, s ∗ n) ≺ b and therefore
A(g(a, s ∗ n)), thus ∀_f ∃_z (f(z) ∈ N × W_1 → R(a, s ∗ n ∗ f̄(z))). Since
n is arbitrary, ∀_f ∃_z (f(z) ∈ N × W_1 → R(a, s ∗ f̄(z))) and again we
have A(b).
For the right-to-left implication use the inductive closure property of
W_2. Suppose g(a, s) ∉ W_2. Then g(a, s) is defined by its second clause
and so there is an i and a j ∈ W_1 such that g(a, s ∗ ⟨i, j⟩) ∉ W_2. Let n_0
be the least such pair ⟨i, j⟩ so that g(a, s ∗ n_0) ∉ W_2. Then let n_1 be the
least such pair so that g(a, s ∗ n_0 ∗ n_1) ∉ W_2. Clearly this process can be
repeated ad infinitum to obtain a function f(z) = n_z such that for all z,
¬R(a, s ∗ f̄(z)). This completes the proof.
Note. The function f just defined is recursive in W_2. More generally,
the proof that any set Π¹₁ in W_k is many-one reducible to W_{k+1} needs only
refer to functions recursive in W_{k+1}.
5.4.3. Conservativity of Π¹₁-CA₀ over ID_{<ω}(W).
Theorem. Any Π¹₁-CA₀ proof of a first-order arithmetical sentence can
be interpreted in some ID_{k+1}(W) by restricting the function variables to
range over those recursive in W_{k+1}. Thus Π¹₁-CA₀ and ID_{<ω}(W) prove the
same arithmetical formulas.
Proof. Suppose the given Π¹₁-CA₀ proof uses k+1 instances of Π¹₁-
comprehension, defining sets X_0, …, X_k. Imagine them ordered in such
a way that the definition of each X_i uses only parameters from the list
X_0, …, X_{i-1}. Then by induction on i one sees, by the lemma, that each
X_i is many-one reducible to W_{i+1} and thence to W_{k+1}, and the proof of this
refers only to functions recursive in W_{k+1}. Therefore by interpreting all
function variables in the original proof as ranging over functions recursive
in W_{k+1}, replacing ∃_f C(f) by ∃_e (∀_x ∃_y ({e}^{W_{k+1}}(x) = y) ∧ C({e}^{W_{k+1}}))
etc., every second-order formula is translated into a first-order formula of
ID_{k+1}(W), first-order formulas remaining unchanged. Thus the second-
order quantifier rules become first-order ones provable in ID_{k+1}(W),
the graph principle becomes provable too, and the induction axiom of
Π¹₁-CA₀ becomes provable by the usual first-order schema. Further-
more, under this interpretation, all of the second-order quantifier ma-
nipulations used in the above reduction of Π¹₁-forms to Richter nor-
mal forms are provably correct (because of standard recursion-theoretic
uniformities). The proof of the lemma then translates into a proof
in ID_{k+1}(W), and consequently each application of Π¹₁-comprehension
becomes a theorem. If the endformula of the given Π¹₁-CA₀ proof
is first-order arithmetical, it therefore remains provable in ID_{k+1}(W).
The provably recursive functions of Π¹₁-CA₀ are therefore the accessible
ones, |τ| is the supremum of its provable ordinals, and both the fast-
growing B_τ and, by 5.2.2, the slow-growing G_τ eventually dominate all of
these functions.
5.5. An independence result: extended Kruskal theorem
Kruskal’s theorem [1960] states that every infinite sequence {Ti } of
finite trees has an i < j such that Ti is embeddable in Tj . By “finite tree”
is meant a rooted (finite) partial ordering in which the nodes below any
given one are totally ordered. An embedding of Ti into Tj is then just a
one-to-one function from the nodes of Ti to nodes of Tj preserving infs
(greatest lower bounds).
Friedman [1981] showed this theorem to be independent of the theory
ATR0 (see Simpson [2009]) and went on, in Friedman [1982], to develop
a significant extension of it which is independent of Π11 -CA0 . This, and its
relationship to the graph minor theorem of Robertson and Seymour, are
reported in the subsequent Friedman, Robertson, and Seymour [1987].
The extended Kruskal theorem concerns finite trees in which the nodes
carry labels from a fixed finite list {0, 1, 2, . . . , k}. By a more delicate
argument, he proved that for any k, every infinite sequence {Ti } of finite
≤ k-labelled trees has an embedding Ti → Tj where i < j. However, the
notion of embedding is now more complex. Ti → Tj means that there is
an embedding f in the former sense, but which also preserves labels (i.e.,
the label of a node is the same as that of its image under f) and satisfies
the gap condition which states: if node x comes immediately below node
y in Ti , and if z is an intermediate node strictly between f(x) and f(y)
in Tj , then the label of z must be ≥ the label of f(y).
Both of these statements are Π11 , expressed by a universal set/function
quantifier, but Friedman showed that they can be “miniaturized” to Π02
forms which (i) now fall within the realm of “finitary combinatorics”,
expressible in the language of first-order arithmetic, but (ii) still reflect the
proof-theoretic strength of the original results. See Simpson [1985] for an
excellent short exposition.
The miniaturized Kruskal theorem for labelled trees runs as follows:
For any number c and fixed k there is a number Kk (c) so large that for
every sequence {Ti } of finite ≤ k-labelled trees of length Kk (c), and where
each Ti is bounded in size by Ti ≤ c · (i + 1), there is an embedding
Ti → Tj with i < j. In fact we shall consider a slight variant of this—
where the size restriction Ti ≤ c · (i + 1) is weakened to Ti ≤ c · 2i .
By the “size” of a tree is simply meant the number of its nodes. Friedman
showed that, by slowing down the sequence, 2i may be replaced by i + 1
without affecting the result’s proof-theoretic strength.
That the miniaturized version is a consequence of the full theorem fol-
lows from König’s lemma, for suppose the miniaturized version fails.
Then there is a c such that for every there is a sequence of size-
bounded, ≤ k-labelled finite trees, of length , which is “bad” (i.e., con-
tains no embedding Ti → Tj with i < j). Arrange these bad sequences
into a big tree, each node of which is itself a finite labelled tree. Be-
cause of the size-bound, this big tree is finitely branching—only finitely
many branches issue from each node. However, it has infinitely many
levels, so by König’s lemma there is an infinite branch. This infinite
branch is then an infinite bad sequence, contradicting the full theo-
rem.
5.5. An independence result: extended Kruskal theorem 239
In this section we give a proof that the miniaturized Kruskal theorem
for labelled trees, and hence the full theorem, are independent of Π11 -CA0 .
Our proof makes fundamental use of the slow-growing G hierarchy. It
consists in showing directly that the natural computation sequence for
Gk (n) is bad. It follows that, for all k, n, Gk (n) < Kk (ck (n)) for a
suitably small ck (n). Therefore, since from the previous results G (n) =
Gn (n) dominates all provably recursive functions of Π11 -CA0 , so does K
as a function of both k and c. Thus K is not provably recursive, and
hence the miniaturized Kruskal theorem for labelled trees is not provable,
in Π11 -CA0 . It becomes provable if the number of labels is specified in
advance.
5.5.1. ϕ-terms, trees and i-sequences. Henceforth we shall regard the
ϕ-functions as function symbols and use them, together with the constants
0, j , to build terms. Each such term will of course denote a (structured)
tree ordinal, but it is important to lay stress, in this section, upon the fact
that a tree ordinal may be denoted by many different terms—for example
the terms
ϕ (1) (1 + 1, 0 ),
ϕ (1) (1 , ϕ (1) (1 , 0 )),
ϕ (1) (ϕ (1) (1 , 0 ), ϕ (1) (1 , 0 ))
all denote the same tree ordinal 0 + 20 + 20 +2 0 .
Definition. An i-term, for i > 0, is either i−1 or else of the form
ϕα(i) () (alternatively written ϕ (i) (α, ))
where is an i-term and α is a j-term with j ≤ i + 1. (0-terms are just
numerals n̄ built from 0 by repeated applications of the successor, denoted
here ϕ (0) without subscript.) Note that each i-term may be viewed as a
finite, labelled, binary tree whose root has label i, whose left hand subtree
is the tree α and whose right hand subtree is the tree . The tree i−1
consists of a single node labelled i, and the zero tree is the single node
labelled 0. When necessary, we indicate that is an i-term by writing it with
superscript i, thus i . As tree ordinals, we always have i−1 i ∈ Ωi .
Notation. For each ≤ i-term and (i − 1)-term i−1 (assuming i > 1)
it will be notationally useful in this subsection to abbreviate the term
ϕ(i−1) ( ) by the shorthand ( i−1 ). With association to the left, a typical
i-term then would be written as
ir ir−1 i1
( )( )...( )( i )
where (the “indicator” of how computation is to proceed) is either 0 or
an j . In particular, the tree-ordinal k may be written as
k (k−1 )(k−2 ) . . . (0 ).
240 5. Accessible recursive functions, ID< and Π11 -CA0
Definition (Stepwise term reduction at level i). Fix i = 1 . . . k +1 and
n, and let ( i−1 , i−2 ) abbreviate the 1-term ( i−1 )( i−2 )(i−4 ) . . . (0 ),
where ( i−2 )(i−4 ) . . . (0 ) is omitted if i = 2 and ( i−1 )( i−2 )(i−4 ) . . .
(0 ) is omitted if i = 1.
Then one-step i-reduction is defined by the seven cases below, according
to the computation rules for the ϕ functions:
i−1 i−2
( , )
i-reduces (or rewrites) in one step to
i−1 i−2
n̄( , ) if =0 ,
j−1 ( i−1
, i−2
) if =j and 0 < j < i − 2,
j
( i−1
, i−2
) if =j and j = i − 2 or i − 1,
( ( i−1
), i−2
) if = + 1=0( ),
α ( )( i−1
, i−2
) if =α( i ) and
α( i , i−1
) (i + 1)-reduces to α ( , i−1
),
i−1 i−2 i−1
α ( )( , ) if =α( ) and
α( i−1 , i−2
) i-reduces to α ( , i−2
),
i−1 i−2 i−2
α ( )( , ) if =α( ) and α = α + 1, = α () or
α( i−1
, i−2
) i-reduces to α ( i−1
, ).
Note. If ( i−1 , i−2 ) i-reduces in one step to ( i−1 , i−2 ) then:
(i) as tree ordinals, ; and
(ii) as labelled trees, ( i−1 , i−2 ) results from ( i−1 , i−2 ) by copying
a subtree (or inserting the numeral n̄) at its indicator place, and slightly
rearranging the branching when is a successor. Thus one step of i-
reduction at most doubles the size of the tree (or increases it by n).
Definition (The i-sequences). The i-sequence from ( i−1 , i−2 ) and
fixed n is the sequence {r ( ri−1 , ri−2 )} generated from by successive one-
step i-reductions, thus: 0 ( 0i−1 , 0i−2 ) is ( i−1 , i−2 ) and r ( ri−1 , ri−2 )
i−1 i−2
one-step i-reduces to r+1 ( r+1 , r+1 ).
Since i-reduction is deterministic, once the initial parameters i−1 , i−2
are fixed the successive pairs ri−1 , ri−2 may be suppressed, and we shall
simply write →i to signify that = r occurs in the i-sequence
beginning .
An i-sequence terminates when it reaches 0.
Note. If an i-sequence terminates at 0( i−1 , i−2 ) where i−1 = ( )
then a new i-sequence begins with ( , i−1 ( i−2 )). If, on the other hand,
i−1 = i−2 then there are no further i-sequences because the “next” term
would not be of the correct form ( i−1 , i−2 ) for an i-reduction ( would
be empty).
5.5. An independence result: extended Kruskal theorem 241
Definition (The computation sequence). For fixed k and n, there is
just one 1-sequence beginning with k and n, and it is called the compu-
tation sequence. Note that all terms after the initial k have labels ≤ k,
because the single k gets reduced immediately to k−1 .
Lemma (Termination). In the computation sequence from k and n, every
i-sequence terminates, for each i = 1 . . . k + 1. Hence, with i = 1, the
computation sequence terminates.
Proof. We prove that each i-sequence terminates by induction down-
ward from i = k + 1 to i = 1.
For the basis i = k + 1, the first (k + 1)-reduction starts with k , and
this reduces successively to k−1 , k−2 , . . . , 0 and then to n (suppressing
the numeral overbar) followed by n − 1, n − 2, . . . , 2, 1, 0. This completes
the first (k + 1)-sequence. At this point the k-term ϕ0 ϕ1 . . . ϕn−1 (k−1 )
will have been generated at level k, so the next (k + 1)-sequence will be
simply {0}, the next {1, 0}, the next {0} and the next {2, 1, 0}, etcetera.
There will be 2n terminating (k + 1)-sequences in all, each subsequent one
of the form {m, m − 1, . . . , 1, 0} beginning with an m < n.
For the induction step assume that every (i + 1)-sequence terminates,
and consider any i-sequence beginning ( i−1 , i−2 ). We show that it
terminates by transfinite induction over the ordinal denoted by . Clearly
there is nothing to do if = 0. Now we apply the above case-by-case
definition of one-step i-reduction. Suppose that ( i−1 , i−2 ) i-reduces
in one step to ( i−1 , i−2 ).
(i) If this happens by any of the first four cases then, as tree ordinals,
≺ , so by the induction hypothesis →i →i 0 as required.
(ii) If case five applies then = α( i ) is an i-term where α( i , i−1 )
(i + 1)-reduces to α ( , i−1 ) and = α ( ). By the assumption
that every (i + 1)-sequence terminates, α →i+1 α →i+1 0 and hence
→i α ( ) →i 0( ) →i for some where, as tree ordinals, ≺ .
Therefore by the induction hypothesis, →i 0 and hence →i 0.
(iii) If the reduction happens because of case six then = α( i−1 ) is an
(i −1)-term and = α ( ) where α( i−1 , i−2 ) i-reduces to α ( , i−2 ).
If α is a j-term with j ≤ i − 1 then, as tree ordinals, we have α ≺
(this is easily checked at each step of i-reduction). Then by the induction
hypothesis, α →i 0 and hence →i 0( ) for some ≺ . Applying the
induction hypothesis, →i 0 and then →i 0. On the other hand, α
might be an i-term other than i−1 . But then its subscript will, by the
inductive assumption, reduce to 0 by a sequence of (i + 1)-reductions, and
consequently α →i 0(α1 ) →i α1 for some α1 ≺ α. Therefore →i →i
α1 (1 ) for some 1 ≺ . This process can be repeated if α1 is an i-term
other than i−1 , to yield α2 and 2 such that →i α1 (1 ) →i α2 (2 ) where
α2 ≺ α1 ≺ α and 2 ≺ . By well-foundedness, the process can only be
242 5. Accessible recursive functions, ID< and Π11 -CA0
repeated finitely often, so there is a ≺ such that →i i−1 ( ). But
i−1 ( ) →i ( ) and by the induction hypothesis, →i 0. Then
→i 0( ) →i for some ≺ . By the induction hypothesis again,
→i 0 and hence →i 0.
(iv) Finally suppose the reduction happens because of case seven, so =
α( i−2 ) is an (i −2)-term. If α is a j-term with j ≤ i −2 then α ≺ so
α →i 0 by the induction hypothesis, and then = α() →i 0( ) for some
≺ and by the induction hypothesis again, →i →i 0 as required.
The more awkward case is when α is an (i − 1)-term other than i−2 . In
general α will be of the form (si−1 ) . . . (1i−1 ) where is either i−2 or
not an (i − 1)-term, and each r+1 i−1
is ordinally smaller than the immediate
(i −1)-term which contains it as a subscript, i.e. r+1 i−1 i−1
(r ). Now if is an
i-term it reduces to 0 as in part (ii) above, and if it is a j-term with j ≤ i −2
then it must be ordinally less than and therefore eventually reduces to 0 by
the induction hypothesis. Hence, with reduced to 0, α will be reduced
i−1
to si−1 (si−1 (s−1 )) . . . (1i−1 ) and we may similarly unravel the (i − 1)-
i−1
subscripts of s in the same way as for α. Since each such subscript is
ordinally less than the immediate (i − 1)-term containing it, repetition of
the process must stop after finitely many stages, and α will then have been
reduced to the case (si−1 ) . . . (1i−1 ) where = i−2 . This i-reduces to
α1 = (si−1 ) . . . (1i−1 ) in one step, and α1 ≺ α because of “small sup”
diagonalization over ∈ Ωi−2 . We therefore have = α() →i α1 (1 )
where α1 ≺ α and 1 ≺ . The entire process can now be repeated
on α1 to produce →i α1 (1 ) →i α2 (2 ) where α2 ≺ α1 ≺ α and
2 ≺ . Etcetera. By well-foundedness the descending chain of (i − 1)-
terms α must end with i−2 , and then →i i−2 ( ) for some ≺ .
Since i−2 ( ) i-reduces to ( ) and since →i 0 by the induction
hypothesis, we obtain →i ( ) →i 0( ) →i for some ≺ .
Therefore →i 0 by the induction hypothesis again, and hence →i 0
as required. This completes the proof.
Lemma. Suppose = α( ) occurs in an i-sequence. Then (i) →
j j i
and (ii) if α is also a j-term then →i α.
Proof. (i) By transfinite induction on the ordinal of . The last lemma
gives α →i 0 or α →i+1 0. Let αs ≺ · · · ≺ α2 ≺ α1 be all the terms
whose successors appear in the reduction sequence from α to 0. Then
→i 0( ) →i where = αs (. . . (α2 (α1 ())) . . . ) and ≺ . By
repeated applications of the induction hypothesis, →i and hence
→i .
(ii) Now suppose j = α( j ) where α is also a j-term. Then by the
definition of one-step i-reduction, either α = as a result of a reduction
from j () to (), or else results by an i-reduction from α ( ) where
α is also a j-term and α →i α. If we assume that →i α then →i
by part (i) and so →i α →i α as required. Therefore since k satisfies
5.5. An independence result: extended Kruskal theorem 243
(ii) vacuously the result follows by an induction along the i-sequences
issuing from it.
Lemma. Fix k and n. Then each i-sequence is non-repeating.
Proof. This is clear. If the term ( s ) . . . ( 2 )( 1 ) appeared twice in
the same i-sequence then ( s ) . . . ( 2 ) would appear twice in the same i-
or (i + 1)-sequence, because any occurrence of 0 in between them would
create a change in the 1 ’s. Continuing this, one finally sees that the
indicator would have to appear twice in the same reduction sequence.
But this is impossible, for either = 0 in which case a change in s is
caused, or = j in which case no subsequent reduced term can be a
(j + 1)-term.
Lemma. The r-th member of the computation sequence from k and n is
bounded in size by ck (n) · 2r where ck (n) is max(2k + 1, n).
Proof. As already noted, at each step of the computation sequence,
the reduct is at most twice the size of the previous term or tree, or else
greater by n. It remains only to note that the size of the starting tree k is
2k + 1.
Lemma. The length of the computation sequence from k and n is greater
than Gk (n).
Proof. The computation sequence first reduces k , branching at the
fixed n when countable limits are encountered, until a successor is com-
puted, and then immediately after, its predecessor representing the tree
ordinal Pn (k ). The process is then repeated on this until Pn Pn (k ) is
computed, and so on down to 0. Thus the sequence passes through every
tree ordinal in the set k [n]. Its length is therefore greater than the size of
this set which, by definition, is just Gk (n).
5.5.2. The computation sequence is bad.
Definition. →+ means that, as labelled trees, → (i.e., is
embeddable in , preserving labels, infs and satisfying the gap condition)
and furthermore, if is a j -term, the embedding does not completely
embed inside any j-subterm of where j < j .
Lemma. Fix k and n. Then for each i with 1 ≤ i ≤ k + 1 and every term
, if →i and →+ then and are identical.
Proof. By induction on i from k + 1 down to 1, and within that an
induction over the term or tree , and within that a sub-induction over .
For the basis i = k + 1, we have already noted that the first (k + 1)-
sequence is k−1 , k−2 , . . . , 0 , n, n − 1, . . . , 2, 1, 0 and that there will be
finitely many others, each of the form {m, m − 1, . . . , 1, 0} beginning
with an m < n. Clearly, in each such (k + 1)-sequence, no term can be
embedded in any follower.
244 5. Accessible recursive functions, ID< and Π11 -CA0
Now suppose 1 ≤ i < k and assume the result for i + 1. We proceed by
induction on the term . If = j or 0 and →+ the only possibility is
that is . Suppose then that is of the form ϕα(j) (). Then cannot be
the (j + 1)-term j for any j ≥ j because →+ , and it cannot be j
with j < j because none of its followers in the i-sequence could then be
j-terms. Thus is also of the form ϕα(j ) ( ). By →+ we have j ≤ j
and by →i we have j ≥ j, so j = j. Also, we cannot have →i
for otherwise, by the gap condition, →+ implies →+ , so by the
sub-induction hypothesis and would be identical, and then would
contain as a proper subterm, contradicting → .
The situation then is this: = ϕα(j) (j)
( ), = ϕα (), →
i
and
→ . Furthermore, since → , a consequence of the reduction
+ i
steps is that either and are identical, or else must be of the form
ϕα(j) (j) (j)
r . . . ϕα2 ϕα1 ( ) where α → . . . α1 → . . . α2 → . . . αr → . . . α is the
initial part of an (i + 1)- or i-sequence from α , and α is ordinally greater
than α. (This is because any occurrence of zero immediately gets stripped
away, leaving what remains before it.) In this latter case we cannot have
α →+ α for otherwise the induction hypotheses would imply that α and
α are identical, contradicting the fact that the first is ordinally greater
than the second. Hence, if α →+ α then and must be identical.
Now there are four possible ways in which can embed in , only two
of which can actually happen.
Case 1. →+ . Then →i →i belong to the same i-sequence, so
by the induction hypothesis is then identical to . Therefore the ordinal
denoted by is strictly less than the ordinal of . But this is impossible
because i-sequences are non-increasing.
Case 2. →+ α. Then there is a j-subterm of α such that →+ .
This occurrence of in the subscript α of must be created anew as the
i-sequence proceeds from to . There are two ways in which this could
arise:
(i) At some intervening stage a ϕα(j) ( ) occurs, where the indicator
of α is j . The next stage replaces by and then reduces to .
Thus →i →i and →+ so by the induction hypothesis ( being
a proper subterm of ) and are identical. But this is impossible since
is ordinally greater than and is greater than or equal to .
(ii) α has a j-subterm such that →i , thus causing the reductions
of α to α and hence to . Since →+ α any subterm of α containing
must be a j -term with j ≥ j, therefore any subterm of α containing
must also be a j -term with j ≥ j. Thus the embedding →+ entails
→+ because of the gap condition. But is a proper subterm of , so
by the induction hypothesis, and are identical. This is impossible how-
ever, because it means would be embeddable in a proper subtree of itself.
5.6. Notes 245
Case 3. →+ where the embedding takes the root of to the root
of and α → and → α. By the gap condition, since and are
j-terms, α must be either a j-term or a (j + 1)-term and α a j -term
with j ≤ j. Therefore, since α → α, it can only be the case that both
α and α are j-terms. Thus →i α →i α and, again because of the gap
condition, →+ α. Consequently, by the induction hypothesis, and
α are identical, and as i-sequences don’t repeat, they are both identical to
α . Since α and α are identical, so are and , and hence so are and .
Case 4. →+ where the embedding takes the root of to the root
of and α → α and → . Since α →i+1 α or α →i α and, again
by the gap condition, α →+ α, it follows from the induction hypotheses
that α and α are identical. Therefore and are identical and so too
are and . This completes the proof.
Theorem. The computation sequence from k and n is a bad sequence, and
therefore its length is bounded by the Kruskal function Kk (ck (n)). Hence
Gk (n) < Kk (ck (n)).
Proof. We already have seen that the computation sequence is non-
repeating, satisfies the size-bound |r | ≤ ck (n) · 2r , and has length greater
than Gk (n). Now apply the above lemma with i = 1, noting that if and
are 1-terms then → automatically implies →+ since 1-terms never
get inserted inside numerals. Thus if 1 precedes 1 in the computation
sequence then cannot be embeddable in , for otherwise →+ and
they would be identical. Hence the computation sequence is bad, and its
length must therefore be bounded by Kk (ck (n)).
Corollary. Neither Kruskal’s theorem for labelled trees, nor its minia-
turized version, is provable in Π11 -CA0 .
Proof. If the miniaturized version were provable then K and hence
Kn (cn (n)) would be provably recursive in Π11 -CA0 and therefore majorized
by G (n) = Gn (n), contradicting the theorem.
5.6. Notes
This chapter, like the previous one, has studied theories in classical logic
only. However, Buchholz, Feferman, Pohlers, and Sieg [1981] give con-
servation results for classical ID theories over their intuitionistic counter-
parts, at least for sets of formulas including Π02 . Thus in particular,
the provably recursive functions remain the same, whether in classical
or intuitionistic settings. Further attention has been given to general
methods for reducing classical to constructive systems, by Coquand and
Hofmann [1999] and by Avigad [2000], where forcing techniques are em-
ployed.
246 5. Accessible recursive functions, ID< and Π11 -CA0
The relationship between the fast- and slow-growing hierarchies is a
delicate one, which appears to have a lot to do with the question “What is a
standard well-ordering?”. Weiermann [1995] is the only one to have made
a deep study of this relationship when one moves beyond the first “catch-
up” point , and starts to look for more, but many questions remain as to
what these “subrecursively inaccessible” ordinals are. He also has done
much work investigating and illustrating the (extreme) sensitivity of the
slow-growing G to choices of fundamental sequences, where seemingly
“small” changes may create dramatic increases or decreases in rate of
growth (see, e.g., Weiermann [1999]).
Though we have not considered them here, it is worth noting that by
dropping the least-fixed-point schema from the ID theories, and writing
the inductive closure axioms as equivalences rather than just implications,
one obtains the weaker, so-called “fixed point theories” first studied in
detail by Feferman [1982] and extensively since then; see, e.g., Jäger,
Kahle, Setzer, and Strahm [1999].
Our treatment of Friedman’s extended Kruskal theorem is somewhat
different from others, being purely computational in nature. It imme-
diately gives refinements. For example, notice that 3 , a presentation
of the Bachmann–Howard ordinal, may be written as 0 (2 )(1 )(0 )
and therefore can be represented as a binary tree with four labels. Since
G3 = Bε0 dominates Peano arithmetic, it follows that Kruskal’s theo-
rem for binary trees with four labels is independent of PA. With five
labels it would be independent of ID1 , etcetera. (These are not “best
possible”; cleverer codings of the ϕ-functions would reduce the number
of labels needed—see Jervell [2005] for very concise tree representations).
Bovykin [2009] and Weiermann [2007] give more information on a variety
of Friedman-style independence results and related threshold theorems,
and Rathjen and Weiermann [1993] provides proof-theoretic analyses of
the Kruskal theorem.
Part 3
CONSTRUCTIVE LOGIC AND COMPLEXITY
Chapter 6
COMPUTABILITY IN HIGHER TYPES
In this chapter we will develop a somewhat more general view of com-
putability theory, where not only numbers and functions appear as argu-
ments, but also functionals of any finite type.
6.1. Abstract computability via information systems
There are two principles on which our notion of computability will be
based: finite support and monotonicity, both of which have already been
used (at the lowest type level) in section 2.4.
It is a fundamental property of computation that evaluation must be
finite. So in any evaluation of Φ(ϕ) the argument ϕ can be called upon only
finitely many times, and hence the value—if defined—must be determined
by some finite subfunction of ϕ. This is the principle of finite support (cf.
section 2.4).
Let us carry this discussion somewhat further and look at the situation
one type higher up. Let H be a partial functional of type 3, mapping
type-2 functionals Φ to natural numbers. Suppose Φ is given and H(Φ)
evaluates to a defined value. Again, evaluation must be finite. Hence the
argument Φ can only be called on finitely many functions ϕ. Furthermore
each such ϕ must be presented to Φ in a finite form (explicitly say, as a
set of ordered pairs). In other words, H and also any type-2 argument
Φ supplied to it must satisfy the finite support principle, and this must
continue to apply as we move up through the types.
To describe this principle more precisely, we need to introduce the
notion of a “finite approximation” Φ0 of a functional Φ. By this we mean
a finite set X of pairs (ϕ0 , n) such that (i) ϕ0 is a finite function, (ii) Φ(ϕ0 )
is defined with value n, and (iii) if (ϕ0 , n) and (ϕ0 , n ) belong to X where
ϕ0 and ϕ0 are “consistent”, then n = n . The essential idea here is that Φ
should be viewed as the union of all its finite approximations. Using this
notion of a finite approximation we can now formulate the
Principle of finite support. If H(Φ) is defined with value n,
then there is a finite approximation Φ0 of Φ such that H(Φ0 ) is
defined with value n.
249
250 6. Computability in higher types
The monotonicity principle formalizes the simple idea that once H(Φ)
is evaluated, then the same value will be obtained no matter how the
argument Φ is extended. This requires the notion of “extension”. Φ
extends Φ if for any piece of data (ϕ0 , n) in Φ there exists another (ϕ0 , n)
in Φ such that ϕ0 extends ϕ0 (note the contravariance!). The second basic
principle is then
Monotonicity principle. If H(Φ) is defined with value n and Φ
extends Φ, then also H(Φ ) is defined with value n.
An immediate consequence of finite support and monotonicity is that
the behaviour of any functional is indeed determined by its set of finite
approximations. For if Φ, Φ have the same finite approximations and
H(Φ) is defined with value n, then by finite support, H(Φ0 ) is defined
with value n for some finite approximation Φ0 , and then by monotonicity
H(Φ ) is defined with value n. Thus H(Φ) = H(Φ ), for all H.
This observation now allows us to formulate a notion of abstract com-
putability:
Effectivity principle. An object is computable just in case its set
of finite approximations is (primitive) recursively enumerable
(or equivalently, Σ01 -definable).
This is an “externally induced” notion of computability, and it is of definite
interest to ask whether one can find an “internal” notion of computability
coinciding with it. This will be done by means of a fixed point operator in-
troduced into this framework by Platek, and the result we shall eventually
prove is due to Plotkin [1978].
The general theory of computability concerns partial functions and
partial operations on them. However, we are primarily interested in total
objects, so once the theory of partial objects is developed, we can look
for ways to extract the total ones. In the last section of this chapter
Kreisel’s density theorem (that the total functionals are dense in the space
of all partial continuous functionals) and the associated effective choice
principle are presented.
The organization of the remaining sections is as follows. First we give
an abstract, axiomatic formulation of the above principles, in terms of the
so-called information systems of Scott [1982]. From these we define the
notion of a continuous functional of arbitrary finite type, over N and also
over general free algebras. Plotkin’s theorem will then characterize the
computable ones as those generated by certain natural schemata, just as
-recursion or least fixed points generate the partial recursive functions.
6.1.1. Information systems. The basic idea of information systems is
to provide an axiomatic setting to describe approximations of abstract
objects (like functions or functionals) by concrete, finite ones. We do not
attempt to analyze the notion of “concreteness” or finiteness here, but
rather take an arbitrary countable set A of “bits of data” or “tokens” as
6.1. Abstract computability via information systems 251
a basic notion to be explained axiomatically. In order to use such data
to build approximations of abstract objects, we need a notion of “consis-
tency”, which determines when the elements of a finite set of tokens are
consistent with each other. We also need an “entailment relation” between
consistent sets U of data and single tokens a, which intuitively expresses
the fact that the information contained in U is sufficient to compute the bit
of information a. The axioms below are a minor modification of Scott’s
[1982], due to Larsen and Winskel [1991].
Definition. An information system is a structure (A, Con, ) where A
is a countable set (the tokens), Con is a non-empty set of finite subsets
of A (the consistent sets) and is a subset of Con × A (the entailment
relation), which satisfy
U ⊆ V ∈ Con → U ∈ Con,
{a} ∈ Con,
U a → U ∪ {a} ∈ Con,
a ∈ U ∈ Con → U a,
U, V ∈ Con → ∀a∈V (U a) → V b → U b.
The elements of Con are called formal neighborhoods. We use U, V, W
to denote finite sets, and write
U V for U ∈ Con ∧ ∀a∈V (U a),
a↑b for {a, b} ∈ Con (a, b are consistent),
U ↑V for ∀a∈U,b∈V (a ↑ b).
Definition. The ideals (also called objects) of an information system
A = (A, Con, ) are defined to be those subsets x of A which satisfy
U ⊆ x → U ∈ Con (x is consistent),
x⊇U a→a∈x (x is deductively closed).
For example the deductive closure U := {a ∈ A | U a} of U ∈ Con is
an ideal. The set of all ideals of A is denoted by |A|.
Examples. Every countable set A can be turned into a flat information
system by letting the set of tokens be A, Con := {∅} ∪ {{a} | a ∈ A} and
U a mean a ∈ U . In this case the ideals are just the elements of Con.
For A = N we have the following picture of the Con-sets.
{0} {1} {2}
• • • . . .
•
∅
252 6. Computability in higher types
A rather important example is the following, which concerns approxi-
mations of functions from a countable set A into a countable set B. The
tokens are the pairs (a, b) with a ∈ A and b ∈ B, and
Con := {{(ai , bi ) | i < k} | ∀i,j<k (ai = aj → bi = bj )},
U (a, b) := (a, b) ∈ U.
It is not difficult to verify that this defines an information system whose
ideals are (the graphs of) all partial functions from A to B.
Yet another example is provided by any fixed partial functional Φ. A
token should now be a pair (ϕ0 , n) where ϕ0 is a finite function and Φ(ϕ0 )
is defined with value n. Thus if we take Con to be the set of all finite sets
of tokens and for U := {(ϕi , ni ) | i = 1, . . . , k} define U (ϕ0 , n) if and
only if ϕ0 extends some ϕi , then this structure becomes an information
system. The ideals in this case are all sets x of tokens with the property
that whenever (ϕ0 , n) belongs to x, then also all (ϕ0 , n) with ϕ0 extending
ϕ0 belong to x.
6.1.2. Domains with countable basis.
Definition. (D, (, ⊥) is a complete partial ordering (cpo for short), if
( is a partial ordering (i.e., reflexive, transitive and antisymmetric) on D
with least element
⊥, and moreover every directed subset S ⊆ D has a
supremum S in D. Here S ⊆ D is called directed if S is inhabited and
for any x, y ∈ S there is a z ∈ S such that x ( z and y ( z.
Lemma. Let A = (A, Con, ) be an information system.
Then (|A|, ⊆, ∅)
is a complete partial ordering with supremum operator .
Proof. Exercise.
Definition. Let (D, (, ⊥) be a complete partial ordering. An element
x ∈ D is called compact if for every directed subset S ⊆ D with x ( S
there is a z ∈ S such that x ( z. The set of all compact elements of D is
called the basis of D; it is denoted by Dc .
Lemma. Let A = (A, Con, ) be an information system. The compact
elements of the complete partial ordering (|A|, ⊆, ∅) can be represented in
the form
|A|c = {U | U ∈ Con},
where U := {a ∈ A | U a} is the deductive closure of U .
Proof. Let z ∈ |A| be compact. We must show z = U for some
U ∈ Con. The family {U | U ⊆ z} is directed (because for U, V ⊆ z an
upper bound of U und V is given by U ∪ V ), and we have z ⊆ U ⊆z U .
Since z is compact, we have z ⊆ U for some U ⊆ z. Now z is deductively
closed as well, hence U ⊆ z.
6.1. Abstract computability via information systems 253
Conversely, let U ∈ Con. We must show U ∈ |A|c . Clearly U ∈ |A|.
that U is compact. So let S ⊆ |A| be a directed subset
It remains to show
satisfying U ⊆ S. With U = {a1 , . . . , an } we have ai ∈ zi ∈ S. Since
S is directed, there is a z ∈ S with z1 , . . . , zn ⊆ z, hence U ⊆ z and
therefore U ⊆ z.
Definition. A complete partial ordering (D, (, ⊥) is algebraic if every
x ∈ D is the supremum of its compact approximations:
x = {u ∈ Dc | u ( x}.
Lemma. Let A = (A, Con, ) be an information system. Then (|A|, ⊆, ∅)
is algebraic.
Proof. Assume x ∈ |A|. Clearly x = {U | U ⊆ x}.
Definition. A complete partial ordering (D, (, ⊥) is bounded complete
(or consistently
complete) if every bounded subset S of D has a least upper
bound S in D. It is a domain (or Scott–Ershov domain) if it is algebraic
and bounded complete.
Now we can prove that the ideals of an information system form a
domain with a countable basis.
Theorem. For every information system A = (A, Con, ) the structure
(|A|, ⊆, ∅) is a domain, whose set of compact elements can be represented
as |A|c = {U | U ∈ Con}.
Proof. We already noticed that (|A|, ⊆, ∅) is a complete partial order-
ing and algebraic. If S ⊆ |A| is bounded, then S is its least upper
bound. Hence (|A|, ⊆, ∅) is bounded complete. The characterization of
the compact elements has been proved above.
Remark. The converse is true as well: one can show easily that every
domain with countable basis can be represented in the way just described,
as the set of all ideals of an appropriate information system.
6.1.3. Function spaces. We now define the “function space” A → B
between two information systems A and B.
Definition. Let A = (A, ConA , A ) and B = (B, ConB , B ) be infor-
mation systems. Define A → B = (C, Con, ) by
C := ConA × B,
{(Ui , bi ) | i ∈ I } ∈ Con := ∀J ⊆I ( Uj ∈ ConA →
j∈J
{bj | j ∈ J } ∈ ConB ).
For the definition of the entailment relation it is helpful to first define the
notion of an application of W := {(Ui , bi ) | i ∈ I } ∈ Con to U ∈ ConA :
{(Ui , bi ) | i ∈ I }U := {bi | U A Ui }.
254 6. Computability in higher types
From the definition of Con we know that this set is in ConB . Now define
W (U, b) by WU B b.
Clearly application is monotone in the second argument, in the sense
that U A U implies (WU ⊆ WU , hence also) WU B WU . In fact,
application is also monotone in the first argument, i.e.,
W W implies WU B W U.
To see this let W = {(Ui , bi ) | i ∈ I } and W = {(Uj , bj ) | j ∈ J }. By
definition W U = {bj | U A Uj }. Now fix j such that U A Uj ; we
must show WU B bj . By assumption W (Uj , bj ), hence WUj B bj .
Because of WU ⊇ WUj the claim follows.
Lemma. If A and B are information systems, then so is A → B defined
as above.
Proof. Let A = (A, ConA , A ) and B = (B, ConB , B ). The first,
second and fourth property of the definition are clearly satisfied. For the
third, suppose
{(U1 , b1 ), . . . , (Un , bn )} (U, b), i.e., {bj | U A Uj } B b.
We have to show that {(U1 , b1 ), . . . , (Un , bn ), (U, b)} ∈ Con. So let I ⊆
{1, . . . , n} and suppose
U∪ Ui ∈ ConA .
i∈I
We must show that {b} ∪ {bi | i ∈ I } ∈ ConB . Let J ⊆ {1, . . . , n} consist
of those j with U A Uj . Then also
U∪ Ui ∪ Uj ∈ ConA .
i∈I j∈J
Since
Ui ∪ Uj ∈ ConA ,
i∈I j∈J
from the consistency of {(U1 , b1 ), . . . , (Un , bn )} we can conclude that
{bi | i ∈ I } ∪ {bj | j ∈ J } ∈ ConB .
But {bj | j ∈ J } B b by assumption. Hence
{bi | i ∈ I } ∪ {bj | j ∈ J } ∪ {b} ∈ ConB .
For the final property, suppose
W W and W (U, b).
We have to show W (U, b), i.e., WU B b. We obtain WU B W U
by monotonicity in the first argument, and W U b by definition.
6.1. Abstract computability via information systems 255
We shall now give two alternative characterizations of the function
space: firstly as “approximable maps”, and secondly as continuous maps
w.r.t. the so-called Scott topology.
The basic idea for approximable maps is the desire to study “information
respecting” maps from A into B. Such a map is given by a relation r
between ConA and B, where r(U, b) intuitively means that whenever we
are given the information U ∈ ConA , then we know that at least the token
b appears in the value.
Definition. Let A = (A, ConA , A ) and B = (B, ConB , B ) be infor-
mation systems. A relation r ⊆ ConA × B is an approximable map if it
satisfies the following:
(a) if r(U, b1 ), . . . , r(U, bn ), then {b1 , . . . , bn } ∈ ConB ;
(b) if r(U, b1 ), . . . , r(U, bn ) and {b1 , . . . , bn } B b, then r(U, b);
(c) if r(U , b) and U A U , then r(U, b).
We write r : A → B to mean that r is an approximable map from A to B.
Theorem. Let A and B be information systems. Then the ideals of
A → B are exactly the approximable maps from A to B.
Proof. Let A = (A, ConA , A ) and B = (B, ConB , B ). If r ∈ |A →
B| then r ⊆ ConA × B is consistent and deductively closed. We have to
show that r satisfies the axioms for approximable maps.
(a) Let r(U, b1 ), . . . , r(U, bn ). We must show that {b1 , . . . , bn } ∈ ConB .
But this clearly follows from the consistency of r.
(b) Let r(U, b1 ), . . . , r(U, bn ) and {b1 , . . . , bn } B b. We must show
that r(U, b). But
{(U, b1 ), . . . , (U, bn )} (U, b)
by the definition of the entailment relation in A → B, hence r(U, b)
since r is deductively closed.
(c) Let U A U and r(U , b). We must show that r(U, b). But
{(U , b)} (U, b)
since {(U , b)}U = {b} (which follows from U A U ), hence again
r(U, b), again since r is deductively closed.
For the other direction suppose that r : A → B is an approximable map.
We must show that r ∈ |A → B|.
Consistency of r. Suppose r(U1 , b1 ), . . . , r(Un , bn ) and U = {Ui | i ∈
I } ∈ ConA for some I ⊆ {1, . . . , n}. We must show that {bi | i ∈ I } ∈
ConB . Now from r(Ui , bi ) and U A Ui we obtain r(U, bi ) by axiom (c)
for all i ∈ I , and hence {bi | i ∈ I } ∈ ConB by axiom (a).
Deductive closure of r. Suppose r(U1 , b1 ), . . . , r(Un , bn ) and
W := {(U1 , b1 ), . . . , (Un , bn )} (U, b).
256 6. Computability in higher types
We must show r(U, b). By definition of for A → B we have WU B b,
which is {bi | U A Ui } B b. Further by our assumption r(Ui , bi ) we
know r(U, bi ) by axiom (c) for all i with U A Ui . Hence r(U, b) by
axiom (b).
Definition. Suppose A = (A, Con, ) is an information system and
U ∈ Con. Define OU ⊆ |A| by
OU := {x ∈ |A| | U ⊆ x}.
Note that, since the ideals x ∈ |A| are deductively closed, x ∈ OU
implies U ⊆ x.
Lemma. The system of all OU with U ∈ Con forms the basis of a topo-
logy on |A|, called the Scott topology.
Proof. Suppose U, V ∈ Con and x ∈ OU ∩ OV . We have to find
W ∈ Con such that x ∈ OW ⊆ OU ∩ OV . Choose W = U ∪ V .
Lemma. Let A be an information system and O ⊆ |A|. Then the following
are equivalent.
(a) O is open in the Scott topology.
(b) O satisfies
(i) If x ∈ O and x ⊆ y, then y ∈ O (Alexandrov condition).
If x ∈ O, then U ∈ O for some U ⊆ x (Scott condition).
(ii)
(c) O = U ∈O OU .
Hence open sets O may be seen as those determined by a (possibly
infinite) system of finitely observable properties, namely all U such that
U ∈ O.
Proof. (a) → (b). If O is open, then O is the union of some OU ’s,
U ∈ Con. Since each OU is upwards closed, also O is; this proves the
Alexandrov condition. For the Scott condition assume x ∈ O. Then
x ∈ OU ⊆ O for some U ∈ Con. Note that U ∈ OU , hence U ∈ O, and
U ⊆ x since x ∈ OU .
(b) → (c). Assume that O ⊆ |A| satisfies the Alexandrov and Scott
conditions. Let x ∈ O. By the Scott condition, U ∈ O for some U ⊆ x,
so x ∈ OU for this U . Conversely, let x ∈ OU for some U ∈ O. Then
U ⊆ x. Now x ∈ O follows from U ∈ O by the Alexandrov condition.
(c) → (a). The OU ’s are the basic open sets of the Scott topology.
We now give some simple characterizations of the continuous functions
f : |A| → |B|. Call f monotone if x ⊆ y implies f(x) ⊆ f(y).
Lemma. Let A and B be information systems and f : |A| → |B|. Then
the following are equivalent.
(a) f is continuous w.r.t. the Scott topology.
(b) f is monotone and satisfies the “principle of finite support” PFS: If
b ∈ f(x), then b ∈ f(U ) for some U ⊆ x.
6.1. Abstract computability via information systems 257
(c) f is monotone and commutes with directed unions: for every directed
D ⊆ |A|
f( x) = f(x).
x∈D x∈D
Note that in (c) the set {f(x) | x ∈ D} is directed by monotonicity
of f; hence its union is indeed an ideal in |A|. Note also that from PFS
and monotonicity of f it follows immediately that if V ⊆ f(x), then
V ⊆ f(U ) for some U ⊆ x.
Hence continuous maps f : |A| → |B| are those that can be completely
described from the point of view of finite approximations of the abstract
objects x ∈ |A| and f(x) ∈ |B|: Whenever we are given a finite approxi-
mation V to the value f(x), then there is a finite approximation U to the
argument x such that already f(U ) contains the information in V ; note
that by monotonicity f(U ) ⊆ f(x).
Proof. (a) → (b). Let f be continuous. Then for any basic open set
OV ⊆ |B| (so V ∈ ConB ) the set f −1 [OV ] = {x | V ⊆ f(x)} is open in
|A|. To prove monotonicity assume x ⊆ y; we must show f(x) ⊆ f(y).
So let b ∈ f(x), i.e., {b} ⊆ f(x). The open set f −1 [O{b} ] = {z | {b} ⊆
f(z)} satisfies the Alexandrov condition, so from x ⊆ y we can infer
{b} ⊆ f(y), i.e., b ∈ f(y). To prove PFS assume b ∈ f(x). The open
set {z | {b} ⊆ f(z)} satisfies the Scott condition, so for some U ⊆ x we
have {b} ⊆ f(U ).
(b) → (a). Assume that f satisfies monotonicity and PFS. We must
show that f is continuous, i.e., that for any fixed V ∈ ConB the set
f −1 [OV ] = {x | V ⊆ f(x)} is open. We prove
{x | V ⊆ f(x)} = {OU | U ∈ ConA and V ⊆ f(U )}.
Let V ⊆ f(x). Then by PFS V ⊆ f(U ) for some U ∈ ConA such that
U ⊆ x, and U ⊆ x implies x ∈ OU . Conversely, let x ∈ OU for some
U ∈ ConA such that V ⊆ f(U ). Then U ⊆ x, hence V ⊆ f(x) by
monotonicity.
For (b) ↔ (c) assume
that f is monotone. Let f satisfy PFS, and D ⊆
|A| be directed. f( x∈D x) ⊇ x∈D f(x) follows from monotonicity.
inclusion let b ∈ f( x∈D x). Then by PFS b ∈ f(U ) for
For the reverse
some U ⊆ x∈D x. From the directedness and the fact that U is finite we
obtain U ⊆ z for some z ∈ D. From b ∈ f(U ) and monotonicity infer
b ∈ f(z). Conversely, let f commute with directed unions, and assume
b ∈ f(x). Then
b ∈ f(x) = f( U) = f(U ),
U ⊆x U ⊆x
hence b ∈ f(U ) for some U ⊆ x.
258 6. Computability in higher types
Clearly the identity and constant functions are continuous, and also the
composition g◦f of continuous functions f : |A| → |B| and g : |B| → |C |.
Theorem. Let A and B = (B, ConB , B ) be information systems. Then
the ideals of A → B are in a natural bijective correspondence with the
continuous functions from |A| to |B|, as follows.
(a) With any approximable map r : A → B we can associate a continuous
function |r| : |A| → |B| by
|r|(z) := {b ∈ B | r(U, b) for some U ⊆ z}.
We call |r|(z) the application of r to z.
(b) Conversely, with any continuous function f : |A| → |B| we can associate
an approximable map fˆ : A → B by
ˆ
f(U, b) := (b ∈ f(U )).
These assignments are inverse to each other, i.e., f = |f|
ˆ and r = |r|.
Proof. Let r be an ideal of A → B; then by the theorem just proved
r is an approximable map. We first show that |r| is well-defined. So let
z ∈ |A|.
|r|(z) is consistent: let b1 , . . . , bn ∈ |r|(z). Then there are U1 , . . . , Un ⊆
z such that r(Ui , bi ). Hence U := U1 ∪ · · · ∪ Un ⊆ z and r(U, bi ) by
axiom (c) of approximable maps. Now from axiom (a) we can conclude
that {b1 , . . . , bn } ∈ ConB .
|r|(z) is deductively closed: let b1 , . . . , bn ∈ |r|(z) and {b1 , . . . , bn } B
b. We must show b ∈ |r|(z). As before we find U ⊆ z such that r(U, bi ).
Now from axiom (b) we can conclude r(U, b) and hence b ∈ |r|(z).
Continuity of |r| follows immediately from part (b) of the lemma above,
since by definition |r| is monotone and satisfies PFS.
Now let f : |A| → |B| be continuous. It is easy to verify that fˆ is indeed
an approximable map. Furthermore
ˆ
b ∈ |f|(z) ˆ
↔ f(U, b) for some U ⊆ z
↔ b ∈ f(U ) for some U ⊆ z
↔ b ∈ f(z) by monotonicity and PFS.
Finally, for any approximable map r : A → B we have
r(U, b) ↔ ∃V ⊆U r(V, b) by axiom (c) for approximable maps
↔ b ∈ |r|(U )
b),
↔ |r|(U,
so r = |r|.
Moreover, one can easily check that
r ◦ s := {(U, c) | ∃V ((U, V ) ⊆ s ∧ (V, c) ∈ r)}
6.1. Abstract computability via information systems 259
is an approximable map (where (U, V ) := {(U, b) | b ∈ V }), and
|r ◦ s| = |r| ◦ |s|, f
◦ g = fˆ ◦ ĝ.
From now on we will usually write r(z) for |r|(z), and similarly f(U, b)
ˆ
for f(U, b). It should always be clear from the context where the mods
and hats should be inserted.
6.1.4. Algebras and types. We now consider concrete information sys-
tems, our basis for continuous functionals.
Types will be built from base types by the formation of function types,
→ . As domains for the base types we choose non-flat and possibly
infinitary free algebras, given by their constructors. The main reason
for taking non-flat base domains is that we want the constructors to be
injective and with disjoint ranges. This generally is not the case for flat
domains.
Definition (Algebras and types). Let , α be distinct type variables;
the αl are called type parameters. We inductively define type forms , , ∈
Ty(α ), constructor type forms κ ∈ KT (α ) and algebra forms ∈ Alg(α );
all these are called strictly positive in α. In case α is empty we abbreviate
Ty(α ) by Ty and call its elements types rather than type forms; similarly
for the other notions.
∈ Alg(α) ∈ Ty ∈ Ty(α
)
αl ∈ Ty(α ), , ,
∈ Ty(α
) → ∈ Ty(α )
κ0 , . . . , κk−1 ∈ KT (α )
(k ≥ 1),
(κ0 , . . . , κk−1 ) ∈ Alg(α )
∈ Ty(α ) 0 , . . . , n−1 ∈ Ty
(n ≥ 0).
→ ( → )<n → ∈ KT (α )
We use for algebra forms and , , for type forms. → means
0 → · · · → n−1 → , associated to the right. For → ( → )<n →
∈ KT (α ) call the parameter argument types and the → recursive
argument types. To avoid empty types, we require that there is a nullary
constructor type, i.e., one without recursive argument types.
Here are some examples of algebras.
U := (unit),
B := ( , ) (booleans),
N := ( , → ) (natural numbers, unary),
P := ( , → , → ) (positive numbers, binary),
D := ( , → → ) (binary trees, or derivations),
O := ( , → , (N → ) → ) (ordinals),
T0 := N, Tn+1 := ( , (Tn → ) → ) (trees).
260 6. Computability in higher types
Important examples of algebra forms are
L(α) := ( , α → → ) (lists),
α × := (α → → ) (product),
α + := (α → , → ) (sum).
Remark (Substitution for type parameters). Let ∈ Ty(α ); we write
(α ) for to indicate its dependence on the type parameters α.
We can
substitute types for α,
to obtain ( ). Examples are L(B), the type of
lists of booleans, and N × N, the type of pairs of natural numbers.
Note that often there are many equivalent ways to define a particular
type. For instance, we could take U + U to be the type of booleans, L(U)
to be the type of natural numbers, and L(B) to be the type of positive
binary numbers.
For every constructor type κi ( ) of an algebra = (
κ ) we provide
a (typed) constructor symbol Ci of type κi (). In some cases they have
standard names, for instance
ttB , ff B for the two constructors of the type B of booleans,
0N , SN→N for the type N of (unary) natural numbers,
P
1 , S0P→P , S1P→P for the type P of (binary) positive numbers,
L() →L()→L()
nil , cons for the type L() of lists,
→+ →+
(inl ) , (inr ) for the sum type + .
We denote the constructors of the type D of derivations by 0D (axiom)
and CD→D→D (rule).
One can extend the definition of algebras and types to simultaneously
defined algebras: just replace by a list = 0 , . . . , N −1 of type variables
and change the algebra introduction rule to
κ0 , . . . , κk−1 ∈ KT (α
)
(k ≥ 1, j < N ).
( (κ0 , . . . , κk−1 ))j ∈ Alg(α
)
with each κi of the form
→ ( → j )<n → j.
The definition of a “nullary” constructor type is a little more delicate here.
We require that for every j (j < N ) there is a κij with final value type
j , each of whose recursive argument types has a final value type j with
j < j. Examples of simultaneously defined algebras are
(Ev, Od) := , ( , → , → ) (even and odd numbers),
(Ts(), T()) := , ( , → → , → , → ) (tree lists and trees).
6.1. Abstract computability via information systems 261
T() defines finitely branching trees, and Ts() finite lists of such trees;
the trees carry objects of a type at their leaves. The constructor symbols
and their types are
EmptyTs() , TconsT()→Ts()→Ts() ,
Leaf →T() , BranchTs()→T() .
However, for simplicity we often consider non-simultaneous algebras only.
An algebra is finitary if all its constructor types (i) only have finitary
algebras as parameter argument types, and (ii) have recursive argument
types of the form only (so the in the general definition are all empty).
Structure-finitary algebras are defined similarly, but without conditions
on parameter argument types. In the examples above U, B, N, P and
D are all finitary, but O and Tn+1 are not. L(), × and + are
structure-finitary, and finitary if their parameter types are. An argument
position in a type is called finitary if it is occupied by a finitary algebra.
An algebra is explicit if all its constructor types have parameter argu-
ment types only (i.e., no recursive argument types). In the examples above
U, B, × and + are explicit, but N, P, L(), D, O and Tn+1 are not.
We will also need the notion of the level of a type, which is defined by
lev() := 0, lev( → ) := max{lev( ), 1 + lev()}.
Base types are types of level 0, and a higher type has level at least 1.
6.1.5. Partial continuous functionals. For every type we define the
information system C = (C , Con , ). The ideals x ∈ |C | are the
partial continuous functionals of type . Since we will have C → = C →
C , the partial continuous functionals of type → will correspond to
the continuous functions from |C | to |C | w.r.t. the Scott topology. It
will not be possible to define C by recursion on the type , since we allow
algebras with constructors having function arguments (like O and Sup).
Instead, we shall use recursion on the “height” of the notions involved,
defined below.
Definition (Information system of type ). We simultaneously define
C , C→ , Con and Con→ .
(a) The tokens a ∈ C are the type correct constructor expressions
Ca1∗ . . . an∗ where ai∗ is an extended token, i.e., a token or the spe-
cial symbol ∗ which carries no information.
(b) The tokens in C→ are the pairs (U, b) with U ∈ Con and b ∈ C .
(c) A finite set U of tokens in C is consistent (i.e., ∈ Con ) if all its
elements start with the same constructor C, say of arity 1 → · · · →
n → , and all Ui ∈ Coni for i = 1, . . . , n, where Ui consists of
all (proper) tokens at the i-th argument position of some token in
U = {Ca1∗ , . . . , Cam∗ }.
262 6. Computability in higher types
.
..
S(S(S0)) •
@@
S(S0) • @• S(S(S∗))
@@
S0 • @• S(S∗)
@@
0 • @• S∗
Figure 1. Tokens and entailment for N
(d) {(Ui , bi ) | i ∈ I } ∈ Con→ is defined to mean ∀J ⊆I ( j∈J Uj ∈
Con → {bj | j ∈ J } ∈ Con ).
Building on this definition, we define U a for U ∈ Con and a ∈ C .
(e) {Ca1∗ , . . . , Cam∗ } C a∗ is defined to mean C = C , m ≥ 1 and
Ui ai∗ , with Ui as in (c) above (and U ∗ taken to be true).
(f) W → (U, b) is defined to mean WU b, where application WU
of W = {(Ui , bi ) | i ∈ I } ∈ Con→ to U ∈ Con is defined to be
{bi | U Ui }; recall that U V abbreviates ∀a∈V (U a).
If we define the height of the syntactic expressions involved by
|Ca1∗ . . . an∗ | := 1 + max{|ai∗ | | i = 1, . . . , n}, | ∗ | := 0,
|(U, b)| := max{1 + |U |, 1 + |b|},
|{ai | i ∈ I }| := max{1 + |ai | | i ∈ I },
|U a| := max{1 + |U |, 1 + |a|},
these are definitions by recursion on the height.
It is easy to see that (C , Con , ) is an information system. Observe
that all the notions involved are computable: a ∈ C , U ∈ Con and
U a.
Definition (Partial continuous functionals). For every type let C
be the information system (C , Con , ). The set |C | of ideals in C is
the set of partial continuous functionals of type . A partial continuous
functional x ∈ |C | is computable if it is recursively enumerable when
viewed as a set of tokens.
Notice that C → = C → C as defined generally for information
systems.
For example, the tokens for the algebra N are shown in Figure 1. For
tokens a, b we have {a} b if and only if there is a path from a (up) to
b (down). As another (more typical) example, consider the algebra D of
derivations with a nullary constructor 0 and a binary C. Then {C0∗, C∗0}
is consistent, and {C0∗, C∗0} C00.
6.1. Abstract computability via information systems 263
6.1.6. Constructors as continuous functions. Let be an algebra. Every
constructor C generates the following ideal in the function space:
, Ca∗ ) | U
rC := {(U a∗ }.
Here (U , a) abbreviates (U1 , (U2 , . . . (Un , a) . . . )).
According to the general definition of a continuous function associated
to an ideal in a function space the continuous map |rC | satisfies
x ) = {Ca∗ | ∃U ⊆x (U
|rC |( a∗ )}.
An immediate consequence is that the (continuous maps corresponding
to) constructors are injective and their ranges are disjoint, which is what
we wanted to achieve by associating non-flat rather than flat information
systems with algebras.
Lemma (Constructors are injective and have disjoint ranges). Let be
an algebra and C be a constructor of . Then
|rC |(
x ) ⊆ |rC |(
y)↔x
⊆y
.
If C1 , C2 are distinct constructors of , then |rC1 |(
x ) = |rC2 |(
y ), since the
two ideals are non-empty and disjoint.
Proof. Immediate from the definitions.
Remark. Notice that neither property holds for flat information sys-
tems, since for them, by monotonicity, constructors need to be strict (i.e.,
if one argument is the empty ideal, then the value is as well). But then we
have
|rC |(∅, y) = ∅ = |rC |(x, ∅),
|rC1 |(∅) = ∅ = |rC2 |(∅)
where in the first case we have one binary and, in the second, two unary
constructors.
Lemma (Ideals of base type). Every non-empty ideal in the information
system associated to an algebra has the form |rC |( x ) with a constructor C
and ideals x .
Proof. Let z be a non-empty ideal and Ca0∗ b0∗ ∈ z, where for simplicity
we assume that C is a binary constructor. Let x := {a | Ca∗ ∈ z} and y :=
{b | C∗b ∈ z}; clearly x, y are ideals. We claim that z = |rC |(x, y). For ⊇
consider Ca ∗ b ∗ with a ∗ ∈ x ∪ {∗} and b ∗ ∈ y ∪ {∗}. Then by definition
{Ca ∗ ∗, C∗b ∗ } ⊆ z, hence Ca ∗ b ∗ ∈ z by deductive closure. Conversely,
notice that an arbitrary element of z must have the form Ca ∗ b ∗ , because of
consistency. Then {Ca ∗ ∗, C∗b ∗ } ⊆ z again by deductive closure. Hence
a ∗ ∈ x ∪ {∗} and b ∗ ∈ y ∪ {∗}, and therefore Ca ∗ b ∗ ∈ |rC |(x, y).
It is in this proof that we need entailment to be a relation between finite
sets of tokens and single tokens, not just a binary relation between tokens.
264 6. Computability in higher types
Information systems with the latter property are called atomic; they have
been studied in Schwichtenberg [2006c].
The information systems C enjoy the pleasant property of coherence,
which amounts to the possibility to locate inconsistencies in two-element
sets of data objects. Generally, an information system A = (A, Con, )
is coherent if it satisfies: U ⊆ A is consistent if and only if all of its
two-element subsets are.
Lemma. Let A and B be information systems. If B is coherent, then so
is A → B.
Proof. Let A = (A, ConA , A ) and B = (B, ConB , B ) be information
systems, and consider {(U1 , b1 ), . . . , (Un , bn )} ⊆ ConA × B. Assume
∀1≤i<j≤n ({(Ui , bi ), (Uj , bj )} ∈ Con).
We have to show {(U1 , b1 ), . . . , (Un , bn )} ∈ Con. Let I ⊆ {1, . . . , n} and
i∈I Ui ∈ ConA . We must show {bi | i ∈ I } ∈ ConB . Now since B is
coherent by assumption, it suffices to show that {bi , bj } ∈ ConB for all
i, j ∈ I . So let i, j ∈ I . By assumption we have Ui ∪ Uj ∈ ConA , and
hence {bi , bj } ∈ ConB .
By a similar argument we can prove
Lemma. The information systems C are all coherent.
Proof. By induction of the height |U | of consistent finite sets of tokens
in C , as defined in parts (c) and (d) of the definition in 6.1.5.
6.1.7. Total and cototal ideals in a finitary algebra. In the information
system C associated with an algebra , the “total” and “cototal” ideals
are of special interest. Here we give an explicit definition for finitary
algebras. For general algebras totality can be defined inductively and
cototality coinductively (cf. 7.1.6).
Recall that a token in is a constructor tree P possibly containing
the special symbol ∗. Because of the possibility of parameter arguments
we need to distinguish between “structure-” and “fully” total and cototal
ideals. For the definition it is easiest to refer to a constructor tree P(∗) with
a distinguished occurrence of ∗. This occurrence is called non-parametric
if the path from it to the root does not pass through a parameter argument
of a constructor. For a constructor tree P(∗), an arbitrary P(Ca∗ ) is called
one-step extension of P(∗), written P(Ca∗ ) 1 P(∗).
Definition. Let be an algebra, and C its associated information
system. An ideal x ∈ |C | is cototal if every constructor tree P(∗) ∈ x
has a 1 -predecessor P(C∗ ) ∈ x; it is called total if it is cototal and the
relation 1 on x is well-founded. It is called structure-cototal (structure-
total) if the same holds with 1 defined w.r.t. P(∗) with a non-parametric
distinguished occurrence of ∗.
6.1. Abstract computability via information systems 265
If there are no parameter arguments, we shall simply speak of total
and cototal ideals. For example, for the algebra N every total ideal is
the deductive closure of a token S(S . . . (S0) . . . ), and the set of all tokens
S(S . . . (S∗) . . . ) is a cototal ideal. For the algebra L(N) of lists of natural
numbers the total ideals are the finite lists and the cototal ones the finite
or infinite lists. For the algebra D of derivations the total ideals can
be viewed as the finite derivations, and the cototal ones as the finite or
infinite “locally correct” derivations of Mints [1978]; arbitrary ideals can
be viewed as “partial” or “incomplete” derivations, with “holes”.
Remark. From a categorical perspective (as in Hagino [1987], Rutten
[2000]) finite lists of natural numbers can be seen as making up the initial
algebra of the functor TX = 1 + (N × X ), and infinite lists (or streams)
of natural numbers as making up the terminal coalgebra of the functor
TX = N × X . In the present setting both finite and infinite lists of natural
numbers appear as cototal ideals in the algebra L(N), with the finite ones
the total ideals. However, to properly deal with computability we need
to accommodate partiality, and hence there are more ideals in the algebra
L(N).
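To make the distinction concrete, here is a small Haskell sketch of ours (not part of the formal development): Haskell's lazy lists contain finite, infinite and partial values, so they model the ideals of L(N), with the finite lists total and the finite or infinite ones cototal.

```haskell
-- Lazy lists model the ideals of L(N): the finite lists are the total
-- ideals, the finite or infinite lists the cototal ones, and lists
-- ending in undefined correspond to "partial" ideals with a hole.
totalIdeal :: [Integer]
totalIdeal = [0, 1, 2]             -- total: a finite list

cototalIdeal :: [Integer]
cototalIdeal = [0 ..]              -- cototal but not total: a stream

partialIdeal :: [Integer]
partialIdeal = 0 : 1 : undefined   -- neither: a list with a "hole"

main :: IO ()
main = do
  print (sum totalIdeal)           -- fine: 3
  print (take 5 cototalIdeal)      -- fine: [0,1,2,3,4]
  print (take 2 partialIdeal)      -- fine: [0,1]; forcing more diverges
```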
We consider two examples (both due to Berger [2009]) of algebras whose cototal ideals are of interest. The first one concerns the algebra
I := μξ(ξ, ξ → ξ, ξ → ξ, ξ → ξ)
of standard rational intervals, whose constructors we name I (for the initial interval [−1, 1]) and C−1, C0, C1 (for the left, middle, right part of the argument interval, of half its length). For example, C−1I, C0I and C1I should be viewed as the intervals [−1, 0], [−1/2, 1/2] and [0, 1]. Every total ideal then can be seen as a standard interval
I_{i·2^−k, k} := [i/2^k − 1/2^k, i/2^k + 1/2^k]  for −2^k < i < 2^k.
However, the cototal ideals include {C−1^n ∗ | n ≥ 0}, which can be seen as a “stream” representation of the real −1, and also {C1 C−1^n ∗ | n ≥ 0} and {C−1 C1^n ∗ | n ≥ 0}, which both represent the real 0. Generally, the cototal ideals give us all reals in [−1, 1], in the well-known (non-unique) stream representation using “signed digits” from {−1, 0, 1}.
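As an illustration, here is a sketch of ours (the helper name sdApprox is hypothetical): signed-digit streams, two of the non-unique representations of 0, and the rational approximation after n digits.

```haskell
import Data.Ratio ((%))

type Digit = Integer               -- intended range: -1, 0, 1

-- A signed-digit stream d1, d2, ... represents the real sum_i di*2^(-i).
minusOne :: [Digit]
minusOne = repeat (-1)             -- the real -1

zeroA, zeroB :: [Digit]            -- two different streams for the real 0
zeroA = 1 : repeat (-1)
zeroB = (-1) : repeat 1

-- Rational approximation after n digits; the represented real lies
-- within 2^(-n) of this value.
sdApprox :: Int -> [Digit] -> Rational
sdApprox n ds = sum [d % (2 ^ i) | (d, i) <- zip (take n ds) [1 ..]]
```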
The second example concerns the simultaneously defined algebras
(W, R) := μ(ξ, ζ)(ξ, ζ → ξ, ξ → ζ, ξ → ζ, ξ → ζ, ζ → ζ → ζ → ζ).
The constructors with their type and intended meaning are
W0 : W  stop,
W : R → W  quit writing and go into read mode,
Rd : W → R  quit reading and write d (d ∈ {−1, 0, 1}),
R : R → R → R → R  read the next digit and stay in read mode.
Consider a well-founded “read tree”, i.e., a constructor tree built from R
(ternary) with Rd at its leaves. The digit d at a leaf means that, after
reading all input digits on the path leading to the leaf, the output d is
written. Notice that the tree may consist of a single leaf Rd , which means
that, without any input, d is written as output. Let Rd1 , . . . , Rdn be all
leaves of such a well-founded tree. At a leaf Rdi we continue with W
(indicating that we now write di ), and continue with another such well-
founded read tree, and carry on. The result is a “W-cototal R-total” ideal,
which can be viewed as a representation of a uniformly continuous real
function f : I → I. For example, let P := R(R1(WP), R0(WP), R−1(WP)). Then P represents the function f(x) := −x, and R0(WP) represents the function f(x) := −x/2.
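In Haskell one might render the two algebras and the tree P as the following mutually recursive datatypes (a sketch of ours; the helper run, which applies such a tree to a signed-digit stream, is our addition for illustration):

```haskell
type Digit = Int                    -- intended range: -1, 0, 1

data W = Stop | Write R             -- constructors W0 and W
data R = Out Digit W                -- Rd: quit reading, write d
       | In R R R                   -- R: read a digit, stay in read mode

-- The tree P from the text: reading digit d immediately writes -d,
-- so P represents f(x) = -x (branches for input -1, 0, 1).
p :: R
p = In (Out 1 (Write p)) (Out 0 (Write p)) (Out (-1) (Write p))

-- Run a read-write tree against an input stream of signed digits.
run :: R -> [Digit] -> [Digit]
run (Out d w)  ds       = d : runW w ds
run (In l m r) (d : ds) = case d of
  -1 -> run l ds
  0  -> run m ds
  _  -> run r ds
run (In _ _ _) []       = []

runW :: W -> [Digit] -> [Digit]
runW Stop      _  = []
runW (Write r) ds = run r ds
```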
6.2. Denotational and operational semantics
For every type ρ, we have defined what a partial continuous functional of type ρ is: an ideal consisting of tokens at this type. These tokens, or rather the formal neighborhoods formed from them, are syntactic in nature; they are reminiscent of Kreisel's “formal neighborhoods” (Kreisel [1959], Martin-Löf [1983], Coquand and Spiwack [2006]). However, in contrast to Martin-Löf [1983], we do not have to deal separately with a notion of consistency for formal neighborhoods: this concept is built into information systems.
Let us now turn our attention to a formal (functional programming) language, in the style of Plotkin's PCF [1977], and see how we can provide a denotational semantics (that is, a “meaning”) for the terms of this language. A closed term M of type ρ will denote a partial continuous functional of this type, that is, a consistent and deductively closed set of tokens of type ρ. We will define this set inductively.
It will turn out that these sets are recursively enumerable. In this sense
every closed term M of type denotes a computable partial continuous
functional of type . However, it is not a good idea to define a computable
functional in this way, by providing a recursive enumeration of its tokens.
We rather want to be able to use recursion equations for such definitions.
Therefore we extend the term language by constants D defined by certain
“computation rules”, as in Berger, Eberl, and Schwichtenberg [2003],
Berger [2005b]. Our semantics will cover these as well. The resulting
term system can be seen as a common extension of Gödel’s T [1958] and
Plotkin’s PCF; we call it T+ . There are some natural questions one can
ask for such a term language:
(a) Preservation of values under conversion (as in Martin-Löf [1983, First
Theorem]). Here we need to include applications of computation
rules.
(b) An adequacy theorem (cf. Plotkin [1977, Theorem 3.1] or Martin-
Löf [1983, Second Theorem]), which in our setting says that whenever
a closed term of ground type has a proper token in the ideal it denotes,
then it evaluates to a constructor term entailing this token.
Property (a) will be proved in 6.2.7, and (b) in 6.2.8.
6.2.1. Structural recursion operators and Gödel's T. We begin with a discussion of particularly important examples of such constants D, the (structural) higher type recursion operators R_ι^τ introduced by Gödel [1958]. They are used to construct maps from the algebra ι to τ, by recursion on the structure of ι. For instance, R_N^τ has type N → τ → (N → τ → τ) → τ. The first argument is the recursion argument, the second one gives the base value, and the third one gives the step function, mapping the recursion argument and the previous value to the next value. For example, R_N^N n m λn,p(Sp) defines addition m + n by recursion on n.
Generally, in order to define the type of the recursion operators w.r.t. ι = μξ(κ0, ..., κ_{k−1}) and result type τ, we first define for each constructor type
κ = ρ⃗ → (σ⃗ν → ξ)ν<n → ξ ∈ KT(ξ)
the step type
δ := ρ⃗ → (σ⃗ν → ι)ν<n → (σ⃗ν → τ)ν<n → τ.
The recursion operator R_ι^τ then has type
ι → δ0 → ··· → δ_{k−1} → τ,
where k is the number of constructors. The recursion argument is of type ι. In the step type above, the ρ⃗ are parameter types, (σ⃗ν → ι)ν<n are the types of the predecessor components in the recursion argument, and (σ⃗ν → τ)ν<n are the types of the previously defined values.
For some common algebras listed in 6.1.4 we spell out the type of their recursion operators:
R_B : B → τ → τ → τ,
R_N : N → τ → (N → τ → τ) → τ,
R_P : P → τ → (P → τ → τ) → (P → τ → τ) → τ,
R_O : O → τ → (O → τ → τ) → ((N → O) → (N → τ) → τ) → τ,
R_L(ρ) : L(ρ) → τ → (ρ → L(ρ) → τ → τ) → τ,
R_ρ+σ : ρ + σ → (ρ → τ) → (σ → τ) → τ,
R_ρ×σ : ρ × σ → (ρ → σ → τ) → τ.
One can extend the definition of the (structural) recursion operators to simultaneously defined algebras ι⃗ = μξ⃗(κ0, ..., κ_{k−1}) and result types τ⃗. Then for each constructor type
κ = ρ⃗ → (σ⃗ν → ξ_{jν})ν<n → ξ_j ∈ KT(ξ⃗)
we have the step type
δ := ρ⃗ → (σ⃗ν → ι_{jν})ν<n → (σ⃗ν → τ_{jν})ν<n → τ_j.
The j-th simultaneous recursion operator R_{ι⃗,j}^{τ⃗} has type
ι_j → δ0 → ··· → δ_{k−1} → τ_j,
where k is the total number of constructors. The recursion argument is of type ι_j. In the step type δ, the ρ⃗ are parameter types, (σ⃗ν → ι_{jν})ν<n are the types of the predecessor components in the recursion argument, and (σ⃗ν → τ_{jν})ν<n are the types of the previously defined values. We will often omit the upper indices τ⃗ when they are clear from the context. Notice that in the case of a non-simultaneous free algebra we write R_ι^τ for R_{ι,1}^τ. An example of a simultaneous recursion on tree lists and trees will be given below.
Definition. Terms of Gödel's T are inductively defined from typed variables x^ρ and constants for constructors C_i^ι and recursion operators R_{ι⃗,j}^{τ⃗} by abstraction λx^ρ M^σ and application (M^{ρ→σ} N^ρ)^σ.
6.2.2. Conversion. To define the conversion relation for the structural recursion operators, it will be helpful to use the following notation. Let ι⃗ = μξ⃗ κ⃗,
κ_i = ρ0 → ··· → ρ_{m−1} → (σ⃗0 → ξ_{j0}) → ··· → (σ⃗_{n−1} → ξ_{j_{n−1}}) → ξ_j ∈ KT(ξ⃗),
and consider C_i N⃗ of type ι_j. We write N⃗^P = N_0^P, ..., N_{m−1}^P for the parameter arguments N_0^{ρ0}, ..., N_{m−1}^{ρ_{m−1}}, and N⃗^R = N_0^R, ..., N_{n−1}^R for the recursive arguments N_m^{σ⃗0→ι_{j0}}, ..., N_{m+n−1}^{σ⃗_{n−1}→ι_{j_{n−1}}}, and n^R for the number n of recursive arguments.
We define a conversion relation ↦ between terms of the same type by
(λx M(x))N ↦ M(N), (1)
λx(Mx) ↦ M  if x ∉ FV(M) (M not an abstraction), (2)
R_j(C_i N⃗)M⃗ ↦ M_i N⃗^P ((R_{j0}·M⃗) ∘ N_0^R) ... ((R_{j_{n−1}}·M⃗) ∘ N_{n−1}^R). (3)
Here we have written R_j·M⃗ for λx^{ι_j}(R_j x^{ι_j} M⃗); ∘ denotes ordinary composition. The rule (1) is called β-conversion, and (2) η-conversion; their left hand sides are called β-redexes or η-redexes, respectively. The left hand side of (3) is called an R-redex; it is a special case of a redex associated with a constant D defined by “computation rules” (cf. 6.2.4), and hence is also called a D-redex.
Let us look at some examples of what can be defined in Gödel's T. We define the canonical inhabitant ε^ρ of a type ρ ∈ Ty:
ε^{ι_j} := C_i^{ι_j} ε⃗ (λx⃗1 ε^{ι_{j1}}) ... (λx⃗n ε^{ι_{jn}}),  ε^{ρ→σ} := λx ε^σ.
The projections of a pair to its components can be defined easily:
M0 := R_ρ×σ^ρ M (λx,y x),  M1 := R_ρ×σ^σ M (λx,y y).
The append function ∗ for lists is defined recursively as follows. We write x :: l as shorthand for cons(x, l):
nil ∗ l2 := l2,  (x :: l1) ∗ l2 := x :: (l1 ∗ l2).
It can be defined as the term
l1 ∗ l2 := R_L(α)^{L(α)→L(α)} l1 (λl2 l2) (λx,_,p,l2(x :: (p l2))) l2.
Here “_” is a name for a bound variable which is not used.
Using the append function ∗ we can define list reversal Rev by
Rev(nil) := nil,  Rev(x :: l) := Rev(l) ∗ (x :: nil).
The corresponding term is
Rev(l) := R_L(α)^{L(α)} l nil (λx,_,p(p ∗ (x :: nil))).
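For comparison, a small Haskell sketch of ours: the structural recursion operator for lists is foldr, and the two definitions above become instances of it.

```haskell
-- foldr plays the role of the recursion operator R_L: it takes the
-- step function (for cons) and the base value (for nil).
append :: [a] -> [a] -> [a]
append l1 l2 = foldr (:) l2 l1   -- nil*l2 = l2; (x:l1)*l2 = x:(l1*l2)

-- Reversal via append, exactly as in the recursion equations above
-- (quadratic, like the textbook definition).
rev :: [a] -> [a]
rev = foldr (\x p -> p `append` [x]) []
```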
Assume we want to define by simultaneous recursion two functions on N, say even, odd : N → B. We want
even(0) := tt,  odd(0) := ff,
even(Sn) := odd(n),  odd(Sn) := even(n).
This can be achieved by using pair types: we recursively define the single function evenodd : N → B × B. The step types are
δ0 = B × B,  δ1 = N → B × B → B × B,
and we can define evenodd := λm(R_N^{B×B} m ⟨tt, ff⟩ λn,p⟨p1, p0⟩).
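In Haskell (our sketch) the same trick reads:

```haskell
-- Simultaneous recursion via a pair: compute (even n, odd n) in one
-- pass, swapping the two components at each step.
evenOdd :: Integer -> (Bool, Bool)
evenOdd 0 = (True, False)
evenOdd n = let (e, o) = evenOdd (n - 1) in (o, e)

evenN, oddN :: Integer -> Bool
evenN = fst . evenOdd
oddN  = snd . evenOdd
```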
Another example concerns the algebras (Ts(N), T(N)) simultaneously defined in 6.1.4 (we write them without the argument N here), whose constructors C_i^{(Ts,T)} for i ∈ {0, ..., 3} are
Empty^Ts,  Tcons^{T→Ts→Ts},  Leaf^{N→T},  Branch^{Ts→T}.
Recall that the elements of the algebra T (i.e., T(N)) are just the finitely branching trees, which carry natural numbers on their leaves.
Let us compute the types of the recursion operators w.r.t. the result types τ0, τ1, i.e., of R_{(Ts,T),Ts}^{(τ0,τ1)} and R_{(Ts,T),T}^{(τ0,τ1)}, or shortly R_Ts and R_T. The step types are
δ0 := τ0,
δ1 := Ts → T → τ0 → τ1 → τ0,
δ2 := N → τ1,
δ3 := Ts → τ0 → τ1.
Hence the types of the recursion operators are
R_Ts : Ts → δ0 → δ1 → δ2 → δ3 → τ0,
R_T : T → δ0 → δ1 → δ2 → δ3 → τ1.
As a concrete example we recursively define addition ⊕ : Ts → T → Ts and + : T → T → T. The tree list bs ⊕ a is the result of replacing each (labelled) leaf in bs by the tree a, and b + a is defined similarly. The recursion equations to be satisfied are
⊕(Empty) = λa Empty,
⊕(Tcons b bs) = λa(Tcons(+ b a)(⊕ bs a)),
+(Leaf n) = λa a,
+(Branch bs) = λa(Branch(⊕ bs a)).
We define ⊕ and + by means of the recursion operators R_Ts and R_T with result types τ0 := T → Ts and τ1 := T → T. The step terms are
M0 := λa Empty,
M1 := λbs,b,f,g,a(Tcons(g^{τ1} a)(f^{τ0} a)),
M2 := λn,a a,
M3 := λbs,f,a(Branch(f^{τ0} a)).
Then
bs ⊕ a := R_Ts bs M⃗ a,  b + a := R_T b M⃗ a.
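A Haskell rendering of the two algebras and the simultaneous recursion (a sketch of ours):

```haskell
-- Mutually recursive algebras: tree lists and finitely branching
-- trees with natural-number labels at the leaves.
data Ts = Empty | Tcons T Ts
data T  = Leaf Integer | Branch Ts

-- Simultaneous recursion: oplus replaces every leaf of a tree list
-- by the tree a; plus does the same for a single tree.
oplus :: Ts -> T -> Ts
oplus Empty        _ = Empty
oplus (Tcons b bs) a = Tcons (plus b a) (oplus bs a)

plus :: T -> T -> T
plus (Leaf _)    a = a
plus (Branch bs) a = Branch (oplus bs a)
```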
We finally introduce some special cases of structural recursion and also a generalization; both will be important later on.
Simplified simultaneous recursion. In a recursion on simultaneously defined algebras one may need to recur on some of those algebras only. Then one can simplify the type of the recursion operator accordingly, by
(i) omitting all step types δi with irrelevant value type τj, and
(ii) simplifying the remaining step types by omitting from the recursive argument types (σ⃗ν → ι_{jν})ν<n, and also from their duplicates (σ⃗ν → τ_{jν})ν<n, all those with irrelevant τ_{jν}.
In the (Ts, T)-example, if we only want to recur on Ts, then the step types are
δ0 := τ0,  δ1 := Ts → τ0 → τ0.
Hence the type of the simplified recursion operator is
R_Ts : Ts → δ0 → δ1 → τ0.
An example is the recursive definition of the length of a tree list. The recursion equations are
lh(Empty) = 0,  lh(Tcons b bs) = lh(bs) + 1.
The step terms are
M0 := 0,  M1 := λbs,p(p + 1).
Cases. There is an important variant of recursion, where no recursive calls occur. This variant is called the cases operator; it distinguishes cases according to the outer constructor form. Here all step types have the form
δi := ρ⃗ → (σ⃗ν → ι_{jν})ν<n → τ_j.
The intended meaning of the cases operator is given by the conversion rule
C_j(C_i N⃗)M⃗ ↦ M_i N⃗. (4)
Notice that only those step terms are used whose value type is the present τ_j; this is due to the fact that there are no recursive calls. Therefore the type of the cases operator is
C_j^{ι_j→τ_j} : ι_j → δ_{i0} → ··· → δ_{i_{q−1}} → τ_j,
where i0, ..., i_{q−1} consists of all i with value type τ_j. We write C_j^{ι_j} or even C_j for C_j^{ι_j→τ_j}.
The simplest example (for the type B) is if-then-else. Another example is
C_N : N → τ → (N → τ) → τ.
It can be used to define the predecessor function on N, i.e., P0 := 0 and P(Sn) := n, by the term
Pm := C_N^N m 0 (λn n).
In the (Ts, T)-example we have
C_Ts^{τ0} : Ts → τ0 → (T → Ts → τ0) → τ0.
When computing the value of a cases term, we do not want to (eagerly) evaluate all arguments, but rather compute the test argument first and, depending on the result, (lazily) evaluate at most one of the other arguments. This phenomenon is well known in functional languages; for instance, in Scheme the if-construct is called a special form (as opposed to an operator). Therefore instead of taking the cases operator applied to a full list of arguments, one rather uses a case-construct to build this term; it differs from the former only in that it employs lazy evaluation. Hence the predecessor function is written in the form [case m of 0 | λn n]. If there are exactly two cases, we also write [if m then 0 else λn n] instead.
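In Haskell (our sketch) case analysis is lazy by construction: only the branch selected by the test argument is ever evaluated.

```haskell
data Nat = Zero | Succ Nat

-- Predecessor via the case-construct; the branches are evaluated
-- only after the test argument m has been matched.
pred' :: Nat -> Nat
pred' m = case m of
  Zero   -> Zero
  Succ n -> n
```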
General recursion with respect to a measure. In practice it often happens that one needs to recur on an argument which is not an immediate component of the present constructor object; this is not allowed in structural recursion. Of course, in order to ensure that the recursion terminates we have to assume that the recurrence is w.r.t. a given well-founded set; for simplicity we restrict ourselves to the algebra N. However, we do allow that the recurrence is with respect to a measure function μ, with values in N. The operator F of general recursion then is defined by
FμxG = Gx(λy[if μy < μx then FμyG else ε]), (5)
where ε denotes a canonical inhabitant of the range. We leave it as an exercise to prove that F is definable from an appropriate structural recursion operator.
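A Haskell sketch of (5) (the names genRec, gcd' are ours): the guard μy < μx makes every recursive call decrease the measure, so the recursion terminates; out-of-measure calls return the canonical inhabitant.

```haskell
-- General recursion with respect to a measure mu. The step function g
-- receives the argument x and a restricted recursive call that
-- answers eps whenever the measure does not decrease.
genRec :: (a -> Integer)               -- measure mu
       -> b                            -- canonical inhabitant eps
       -> (a -> (a -> b) -> b)         -- step function G
       -> a -> b
genRec mu eps g = go
  where
    go x = g x (\y -> if mu y < mu x then go y else eps)

-- Example: Euclid's gcd, with the second component as measure.
gcd' :: Integer -> Integer -> Integer
gcd' a b = genRec snd' 0 step (a, b)
  where
    snd' (_, y) = y
    step (x, 0) _   = x
    step (x, y) rec = rec (y, x `mod` y)
```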
6.2.3. Corecursion. We will show in 6.3 that an arbitrary “reduction sequence” beginning with a term in Gödel's T terminates. For this to hold it is essential that the constants allowed in T are restricted to constructors C and recursion operators R. A consequence will be that every closed term of a base type denotes a total ideal. The conversion rules for R (cf. 6.2.2) work from the leaves towards the root, and terminate because total ideals are well-founded. If, however, we deal with cototal ideals (infinitary derivations, for example), then a similar operator is available to define functions with cototal ideals as values, namely “corecursion”. For simplicity we restrict ourselves to finitary algebras, and only consider the non-simultaneous case. The corecursion operator coR_ι^τ is used to construct a mapping from τ to ι by corecursion on the structure of ι. We define the (single) step type by
δ := τ → Σ_{i<k}(ρ⃗i × (ι + τ) × ··· × (ι + τ)),
with summation over all constructors of ι. Here ρ⃗i are the types of the parameter arguments of the i-th constructor, followed by as many (ι + τ)'s as there are recursive arguments. The corecursion operator coR_ι^τ has type τ → δ → ι.
We list the types of the corecursion operators for some algebras:
coR_B^τ : τ → (τ → U + U) → B,
coR_N^τ : τ → (τ → U + (N + τ)) → N,
coR_P^τ : τ → (τ → U + (P + τ) + (P + τ)) → P,
coR_D^τ : τ → (τ → U + (D + τ) × (D + τ)) → D,
coR_L(ρ)^τ : τ → (τ → U + ρ × (L(ρ) + τ)) → L(ρ),
coR_I^τ : τ → (τ → U + (I + τ) + (I + τ) + (I + τ)) → I.
The conversion relation for each of these is defined below. For f : ρ → τ and g : σ → τ we denote λx(R_ρ+σ x f g) of type ρ + σ → τ by [f, g], and similarly for ternary sum types etcetera. x1, x2 are shorthand for the two projections of x of type ρ × σ. The identity functions id below are of type ι → ι, with ι the respective algebra.
coR_B NM ↦ [λ_ tt, λ_ ff](MN),
coR_N NM ↦ [λ_ 0, λx(S([id^{N→N}, λy(coR_N yM)]x))](MN),
coR_P NM ↦ [λ_ 1, λx(S0([id, P_P]x)), λx(S1([id, P_P]x))](MN),
coR_D NM ↦ [λ_ 0, λx(C([id, P_D]x1)([id, P_D]x2))](MN),
coR_L(ρ) NM ↦ [λ_ nil, λx(x1 :: [id, λy(coR_L(ρ) yM)]x2)](MN),
coR_I NM ↦ [λ_ I, λx(C−1([id, P_I]x)), λx(C0([id, P_I]x)), λx(C1([id, P_I]x))](MN),
with P_α := λy(coR_α yM) for α ∈ {P, D, I}.
As an example of a function defined by corecursion (again due to Berger [2009]) consider the transformation of an “abstract” real in the interval [−1, 1] into a stream representation using signed digits from {−1, 0, 1}. Assume that we work in an abstract (axiomatic) theory of reals, having an unspecified type ρ, and that we have a type Q of rationals as well. Assume that the abstract theory provides us with a function g : ρ → Q → Q → B comparing a real x with a proper rational interval p < q:
g(x, p, q) = tt → x ≤ q,
g(x, p, q) = ff → p ≤ x.
From g we define a function h : ρ → U + (I + ρ) + (I + ρ) + (I + ρ) by
h(x) := 2x + 1 in the rhs of the left I + ρ,  if g(x, −1/2, 0) = tt,
h(x) := 2x in the rhs of the middle I + ρ,  if g(x, −1/2, 0) = ff and g(x, 0, 1/2) = tt,
h(x) := 2x − 1 in the rhs of the right I + ρ,  if g(x, 0, 1/2) = ff.
h is definable by a closed term M in Gödel's T. Then the desired function f : ρ → I transforming an abstract real x into a cototal ideal (i.e., a stream) in I can be defined by
f(x) := coR_I xM.
This f(x) will thus be a stream of digits −1, 0, 1.
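As a Haskell sketch (ours, with rationals standing in for the abstract reals, so that the comparison g becomes simply decidable): the corecursive step emits one digit and returns the rescaled remainder, and unfolding the step yields the stream.

```haskell
import Data.Ratio ((%))

type Digit = Int

-- One corecursion step: pick a digit d such that the remainder 2*x - d
-- is again in [-1,1]. Any threshold inside the overlaps would do;
-- the representation is non-unique.
step :: Rational -> (Digit, Rational)
step x
  | x <= (-1) % 2 = (-1, 2 * x + 1)
  | x <= 1 % 2    = ( 0, 2 * x)
  | otherwise     = ( 1, 2 * x - 1)

-- The corecursive unfolding: a cototal "stream" ideal of digits.
signedDigits :: Rational -> [Digit]
signedDigits x = let (d, x') = step x in d : signedDigits x'

-- take 6 (signedDigits (1 % 3))  ==  [0,1,0,1,0,1]
```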
6.2.4. A common extension T+ of Gödel's T and Plotkin's PCF. Terms of T+ are built from (typed) variables and (typed) constants (constructors C or defined constants D, see below) by (type-correct) application and abstraction:
M, N ::= x^ρ | C^ρ | D^ρ | (λx^ρ M^σ)^{ρ→σ} | (M^{ρ→σ} N^ρ)^σ.
Definition (Computation rule). Every defined constant D comes with a system of computation rules, consisting of finitely many equations
D P⃗i(y⃗i) = Mi  (i = 1, ..., n) (6)
with free variables of P⃗i(y⃗i) and Mi among y⃗i, where the arguments on the left hand side must be “constructor patterns”, i.e., lists of applicative terms built from constructors and distinct variables. To ensure consistency of the defining equations, we require that for i ≠ j, P⃗i and P⃗j have disjoint free variables, and either P⃗i and P⃗j are non-unifiable (i.e., there is no substitution which identifies them), or else for the most general unifier ϑ of P⃗i and P⃗j we have Miϑ = Mjϑ. Notice that the substitution ϑ assigns to the variables y⃗i in Mi constructor patterns R⃗k(z⃗) (k = i, j). A further requirement on a system of computation rules D P⃗i(y⃗i) = Mi is that the lengths of all P⃗i(y⃗i) are the same; this number is called the arity of D, denoted by ar(D). A substitution instance of a left hand side of (6) is called a D-redex.
More formally, constructor patterns are defined inductively by (we write P⃗(x⃗) to indicate all variables in P⃗):
(a) x is a constructor pattern.
(b) The empty list is a constructor pattern.
(c) If P⃗(x⃗) and Q(y⃗) are constructor patterns whose variables x⃗ and y⃗ are disjoint, then (P⃗, Q)(x⃗, y⃗) is a constructor pattern.
(d) If C is a constructor and P⃗ a constructor pattern, then so is CP⃗, provided it is of ground type.
Remark. The requirement of disjoint variables in constructor patterns P⃗i and P⃗j used in computation rules of a defined constant D is needed to ensure that applying the most general unifier produces constructor patterns again. However, for readability we take this as an implicit convention, and write computation rules with possibly non-disjoint variables.
Examples of constants D defined by computation rules are abundant.
The defining equations in 6.2.2 can all be seen as computation rules, for
(i) the append-function ∗,
(ii) list reversal Rev,
(iii) the simultaneously defined functions even, odd : N → B and
(iv) the two simultaneously defined functions ⊕ : Ts → T → Ts and
+ : T → T → T.
Moreover, the structural recursion operators themselves can be viewed
as defined by computation rules, which in this case are called conversion
rules; cf. 6.2.2.
The boolean connectives andb, impb and orb are defined by
tt andb y = y,   ff andb y = ff,   x andb tt = x,   x andb ff = ff,
tt orb y = tt,   ff orb y = y,    x orb tt = tt,   x orb ff = x,
tt impb y = y,   ff impb y = tt,  x impb tt = tt.
Notice that when two such rules overlap, their right hand sides are equal
under any unifier of the left hand sides.
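In a language with sequential pattern matching these overlapping rules cannot all be taken literally. A Haskell sketch of ours for andb: the first two rules alone already make andb strict in its first argument, so the symmetric rules like x andb ff = ff survive only as provable equations, not as extra matching power.

```haskell
andb :: Bool -> Bool -> Bool
andb True  y = y       -- tt andb y = y
andb False _ = False   -- ff andb y = ff

-- The remaining computation rules hold as equations, e.g.
-- andb x False == False for every total x, but Haskell's top-down
-- matching cannot use them without first evaluating x.
```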
Generally, for finitary algebras we define the boolean-valued function E_ι : ι → B (existence, corresponding to total ideals in finitary algebras) and for structure-finitary algebras SE_ι : ι → B (structural existence, corresponding to structure-total ideals) by
E_{ι_j}(C_i x⃗) = E x_0^P andb ··· andb E x_{m−1}^P andb E x_m^R andb ··· andb E x_{m+n−1}^R,
SE_{ι_j}(C_i x⃗) = SE x_m^R andb ··· andb SE x_{m+n−1}^R
(recall the notation from 6.2.2 for parameter and recursive arguments of a constructor). Examples are
E_N 0 = tt,  E_N(Sn) = E_N n,
SE_L(α)(nil) = tt,  SE_L(α)(x :: l) = SE_L(α) l.
Decidable equality =_ι : ι → ι → B for a finitary algebra ι is defined by
(C_i x⃗ =_ι C_j y⃗) = ff  if i ≠ j,
(C_i x⃗ =_ι C_i y⃗) = (x⃗^P = y⃗^P) andb (x_m^R = y_m^R) andb ··· andb (x_{m+n−1}^R = y_{m+n−1}^R).
For example,
(0 =_N 0) = tt,  (Sm =_N 0) = ff,
(0 =_N Sn) = ff,  (Sm =_N Sn) = (m =_N n).
The predecessor functions introduced in 6.2.1 by means of the cases operator C can also be viewed as defined constants:
P0 = 0,  P(Sn) = n.
Another example is the destructor function, disassembling a constructor-built argument into its parts. For the type T1 := μξ(ξ, (N → ξ) → ξ) it is
D_T1 : T1 → U + (N → T1),
defined by the computation rules
D_T1 0 = inl(u),  D_T1(Sup(f)) = inr(f).
Generally, the type of the destructor function for ι := μξ(κ0, ..., κ_{k−1}) with κi = ρ⃗i → (σ⃗ν → ξ)ν<ni → ξ is
D_ι : ι → Σ_{i<k}(ρ⃗i × Π_{ν<ni}(σ⃗ν → ι)).
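In Haskell (our sketch) the destructor for T1, the type of countably branching trees, is the usual one-level unfolding of the datatype:

```haskell
data T1 = Zero | Sup (Integer -> T1)

-- Destructor: expose the outermost constructor, yielding () for Zero
-- or the branching function for Sup.
destr :: T1 -> Either () (Integer -> T1)
destr Zero    = Left ()
destr (Sup f) = Right f
```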
6.2.5. Confluence. The β-conversion rule (cf. 6.2.2) together with the computation rules of the defined constants D generate a “reduction” relation → between terms of T+. We show that the reflexive and transitive closure →∗ of → is “confluent”, i.e., any two reduction sequences starting from the same term can be continued to lead to the same term. The proof uses a method due to W. W. Tait and P. Martin-Löf (cf. Barendregt [1984, 3.2]). The idea is to use a “parallel” reduction relation →p, which intuitively has the following meaning. Mark some β- or D-redexes in a given term. Then convert all of them, working in parallel from the leaves to the root. Notice that redexes newly generated by this process are not converted. Confluence of the relation →p can be easily proved using the notion of a “complete development” M∗ of a term M due to Takahashi [1995], and confluence of →p immediately implies the confluence of →∗.
Recall the definition of the β-conversion relation ↦ in 6.2.2. We extend it with the computation rules of T+: for every such rule D P⃗i(y⃗i) = Mi(y⃗i) we have the conversion D P⃗i(N⃗i) ↦ Mi(N⃗i). The one step reduction relation → between terms in T+ is defined as follows: M → N if N is obtained from M by replacing a subterm M′ of M by N′, where M′ ↦ N′. The reduction relations →+ and →∗ are the transitive and the reflexive transitive closure of →, respectively.
Definition. A binary relation R has the diamond property if from xRy1 and xRy2 we can infer the existence of a z such that y1Rz and y2Rz. We call R confluent if its reflexive and transitive closure has the diamond property.
Lemma. Every binary relation R with the diamond property is confluent.
Proof. We write xR^n y if there is a sequence x = x0 R x1 R x2 R ... R xn = y. By induction on n + m one proves that from xR^n y1 and xR^m y2 we can infer the existence of a z such that y1 R^m z and y2 R^n z.
Definition. Parallel reduction →p is defined inductively by the following rules:
x →p x,  C →p C,  D →p D; (7)
from M →p M′ infer λx M →p λx M′; (8)
from M →p M′ and N →p N′ infer MN →p M′N′; (9)
from M(x) →p M′(x) and N →p N′ infer (λx M(x))N →p M′(N′); (10)
from N⃗ →p N⃗′ infer D P⃗(N⃗) →p M(N⃗′), for D P⃗(y⃗) = M(y⃗) a computation rule. (11)
Lemma (Substitutivity of →p). If M(x⃗) →p M′(x⃗) and K⃗ →p K⃗′, then M(K⃗) →p M′(K⃗′).
Proof. By induction on M →p M′. All cases are easy, with the possible exception of (10) and (11).
Case (10). Consider: from M(y, x⃗) →p M′(y, x⃗) and N(x⃗) →p N′(x⃗) infer
(λy M(y, x⃗))N(x⃗) →p M′(N′(x⃗), x⃗).
Assume K⃗ →p K⃗′. By induction hypothesis M(y, K⃗) →p M′(y, K⃗′) and N(K⃗) →p N′(K⃗′). Then an application of (10) gives
(λy M(y, K⃗))N(K⃗) →p M′(N′(K⃗′), K⃗′),
since we can assume y ∉ FV(K⃗).
Case (11). Consider: from N⃗(x⃗) →p N⃗′(x⃗) infer
D P⃗(N⃗(x⃗)) →p M(N⃗′(x⃗)),
with D P⃗(y⃗) = M(y⃗) a computation rule. Assume K⃗ →p K⃗′. By induction hypothesis N⃗(K⃗) →p N⃗′(K⃗′). Then an application of (11) gives
D P⃗(N⃗(K⃗)) →p M(N⃗′(K⃗′)).
Here we have made use of our assumption that all free variables in D P⃗(y⃗) = M(y⃗) are among y⃗.
Definition (Complete development M∗ of M).
x∗ := x,  C∗ := C for constructors C,
D∗ := D if ar(D) > 0, or if ar(D) = 0 and D has no rules,
(λx M)∗ := λx M∗,
(MN)∗ := M∗N∗  if MN is neither a β- nor a D-redex,
((λx M(x))N)∗ := M∗(N∗),
(D P⃗(N⃗))∗ := M(N⃗∗)  for D P⃗(y⃗) = M(y⃗) a computation rule.
To see that M∗ is well-defined assume D P⃗1(N⃗1) = D P⃗2(N⃗2), where D P⃗i(y⃗i) = Mi(y⃗i) (i = 1, 2) are computation rules. We must show M1(N⃗1∗) = M2(N⃗2∗). By our conditions on computation rules there is a most general unifier ϑ of P⃗1(y⃗1) and P⃗2(y⃗2) such that M1(y⃗1)ϑ = M2(y⃗2)ϑ. Notice that y⃗iϑ is a constructor pattern; without loss of generality we can assume that both y⃗1ϑ and y⃗2ϑ are parts of the same constructor pattern. Then we can write N⃗i = (y⃗iϑ)(K⃗), where the substitution of K⃗ is for the variables x⃗ of this pattern. Hence
Mi(N⃗i∗) = Mi((y⃗iϑ)(K⃗∗)) = Mi(y⃗i)ϑ(K⃗∗),
and therefore M1(N⃗1∗) = M2(N⃗2∗).
The crucial property of the complete development M∗ of M is that the result of an arbitrary parallel reduction of M can be further reduced, in one parallel step, to M∗.
Lemma. M →p M′ implies M′ →p M∗.
Proof. By induction on M, distinguishing cases on M →p M′. The initial cases (7) are easy.
Case (8). The last rule was: from M →p M′ infer λx M →p λx M′. By induction hypothesis M′ →p M∗. Then another application of (8) yields λx M′ →p λx M∗ = (λx M)∗.
Case (9). We distinguish cases on M. Subcase MN, which is neither a β- nor a D-redex. Then (MN)∗ = M∗N∗, and the last rule was: from M →p M′ and N →p N′ infer MN →p M′N′. By induction hypothesis M′ →p M∗ and N′ →p N∗. Another application of (9) yields M′N′ →p M∗N∗.
Subcase (λx M(x))N. Then ((λx M(x))N)∗ = M∗(N∗), and the last rule was: from λx M(x) →p λx M′(x) and N →p N′ infer (λx M(x))N →p (λx M′(x))N′. Then we also have M(x) →p M′(x). By induction hypothesis M′(x) →p M∗(x) and N′ →p N∗. Therefore (λx M′(x))N′ →p M∗(N∗) by (10), which was to be shown.
Subcase D P⃗(M⃗), where D P⃗(y⃗) = K(y⃗) is a computation rule. Then (D P⃗(M⃗))∗ = K(M⃗∗). The last rule derived D P⃗(M⃗) →p N for some N. Since this rule was (9) we have N = DN⃗′ and P⃗(M⃗) →p N⃗′. But P⃗(y⃗) is a constructor pattern, hence N⃗′ = P⃗(M⃗′) with M⃗ →p M⃗′. By induction hypothesis M⃗′ →p M⃗∗. Therefore N = D P⃗(M⃗′) →p K(M⃗∗) = (D P⃗(M⃗))∗, by (11).
Case (10). The last rule was: from M(x) →p M′(x) and N →p N′ infer (λx M(x))N →p M′(N′). We must show M′(N′) →p ((λx M(x))N)∗ (= M∗(N∗)). But this follows with the induction hypotheses M′(x) →p M∗(x) and N′ →p N∗ from the substitutivity of →p.
Case (11). The last rule was: from N⃗ →p N⃗′ infer D P⃗(N⃗) →p M(N⃗′), for D P⃗(y⃗) = M(y⃗) a computation rule. We must show M(N⃗′) →p (D P⃗(N⃗))∗ (= M(N⃗∗)). Again this follows with the induction hypothesis N⃗′ →p N⃗∗ from the substitutivity of →p.
Corollary. →∗ is confluent.
Proof. The reflexive closure of → is contained in →p , which itself is
contained in →∗ . Hence →∗ is the reflexive and transitive closure of →p .
Since →p has the diamond property by the previous lemma, an earlier
lemma implies that →∗ is confluent.
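To see the Tait–Martin-Löf method concretely, here is a Haskell sketch of ours for the pure λ-fragment: cd computes the complete development M∗ by converting every currently visible β-redex, while redexes created by the conversion itself are left alone, and this is exactly what yields the diamond property.

```haskell
-- Untyped lambda terms with de Bruijn indices.
data Tm = Var Int | Lam Tm | App Tm Tm deriving (Eq, Show)

-- Shift free indices >= c by d.
shift :: Int -> Int -> Tm -> Tm
shift d c (Var i)   = Var (if i >= c then i + d else i)
shift d c (Lam t)   = Lam (shift d (c + 1) t)
shift d c (App t u) = App (shift d c t) (shift d c u)

-- Substitute s for the outermost bound variable (index 0) in t,
-- decrementing the remaining free indices.
subst :: Tm -> Tm -> Tm
subst s = go 0
  where
    go k (Var i)
      | i == k    = shift k 0 s
      | i > k     = Var (i - 1)
      | otherwise = Var i
    go k (Lam b)   = Lam (go (k + 1) b)
    go k (App f a) = App (go k f) (go k a)

-- Complete development: convert all visible beta-redexes in parallel,
-- from the leaves towards the root.
cd :: Tm -> Tm
cd (Var i)         = Var i
cd (Lam t)         = Lam (cd t)
cd (App (Lam b) a) = subst (cd a) (cd b)
cd (App f a)       = App (cd f) (cd a)
```

Every parallel reduct of a term M again parallel-reduces to cd M; this is the lemma just proved, restricted to β-redexes.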
6.2.6. Ideals as denotation of terms. How can we use computation rules to define an ideal z in a function space? The general idea is to inductively define the set of tokens (U⃗, b) that make up z. It is convenient to define the value [[λx⃗ M]], where M is a term with free variables among x⃗. Since this value is a token set, we can define inductively the relation (U⃗, b) ∈ [[λx⃗ M]].
For a constructor pattern P(x⃗) and a list V⃗ of the same length and types as x⃗ we define a list P(V⃗) of formal neighborhoods of the same length and types as P(x⃗), by induction on P(x⃗). x(V) is the singleton list V, and for the empty pattern we take the empty list. (P⃗, Q)(V⃗, W) is covered by the induction hypothesis. Finally
(CP⃗)(V⃗) := {Cb⃗* | b*_i ∈ P_i(V⃗_i) if P_i(V⃗_i) ≠ ∅, and b*_i = ∗ otherwise}.
We use the following notation. (U⃗, b) means (U1, ... (Un, b) ...), and (U⃗, V) ⊆ [[λx⃗ M]] means (U⃗, b) ∈ [[λx⃗ M]] for all (finitely many) b ∈ V.
Definition (Inductive, of (U⃗, b) ∈ [[λx⃗ M]]).
(V) If Ui ⊢ b, then (U⃗, b) ∈ [[λx⃗ xi]].
(A) If (U⃗, V) ⊆ [[λx⃗ N]] and (U⃗, V, c) ∈ [[λx⃗ M]], then (U⃗, c) ∈ [[λx⃗(MN)]].
For every constructor C and defined constant D we have:
(C) If V⃗ ⊢ b⃗*, then (U⃗, V⃗, Cb⃗*) ∈ [[λx⃗ C]].
(D) If (U⃗, V⃗, b) ∈ [[λx⃗,y⃗ M]] and W⃗ ⊢ P⃗(V⃗), then (U⃗, W⃗, b) ∈ [[λx⃗ D]],
with one such rule (D) for every computation rule D P⃗(y⃗) = M.
The height of a derivation of (U⃗, b) ∈ [[λx⃗ M]] is defined as usual, by adding 1 at each rule. We define its D-height similarly, where only rules (D) count.
We begin with some simple consequences of this definition. The following transformations preserve D-height:
if V⃗ ⊢ U⃗, then (U⃗, b) ∈ [[λx⃗ M]] → (V⃗, b) ∈ [[λx⃗ M]], (12)
(U⃗, V, b) ∈ [[λx⃗,y M]] ↔ (U⃗, b) ∈ [[λx⃗ M]]  if y ∉ FV(M), (13)
(U⃗, V, b) ∈ [[λx⃗,y(My)]] ↔ (U⃗, V, b) ∈ [[λx⃗ M]]  if y ∉ FV(M), (14)
(U⃗, V⃗, b) ∈ [[λx⃗,y⃗(M(P⃗(y⃗)))]] ↔ (U⃗, P⃗(V⃗), b) ∈ [[λx⃗,z⃗(M(z⃗))]]. (15)
Proof. (12) and (13) are both proved by easy inductions on the respective derivations.
(14). Assume (U⃗, V, b) ∈ [[λx⃗,y(My)]]. By (A) we then have a W such that (U⃗, V, W) ⊆ [[λx⃗,y y]] (i.e., V ⊢ W) and (U⃗, V, W, b) ∈ [[λx⃗,y M]]. By (12) from the latter we obtain (U⃗, V, V, b) ∈ [[λx⃗,y M]]. Now since y ∉ FV(M), (13) yields (U⃗, V, b) ∈ [[λx⃗ M]], as required. Conversely, assume (U⃗, V, b) ∈ [[λx⃗ M]]. Since y ∉ FV(M), (13) yields (U⃗, V, V, b) ∈ [[λx⃗,y M]]. Clearly we have (U⃗, V, V) ⊆ [[λx⃗,y y]]. Hence by (A), (U⃗, V, b) ∈ [[λx⃗,y(My)]], as required. Notice that the D-height did not change in these transformations.
(15). By induction on P⃗, with a side induction on M. We distinguish cases on M. The cases xi, C and D follow immediately from (13). In case MN the following are equivalent by induction hypothesis:
(U⃗, V⃗, b) ∈ [[λx⃗,y⃗((MN)(P⃗(y⃗)))]],
∃W((U⃗, V⃗, W) ⊆ [[λx⃗,y⃗(N(P⃗(y⃗)))]] ∧ (U⃗, V⃗, W, b) ∈ [[λx⃗,y⃗(M(P⃗(y⃗)))]]),
∃W((U⃗, P⃗(V⃗), W) ⊆ [[λx⃗,z⃗(N(z⃗))]] ∧ (U⃗, P⃗(V⃗), W, b) ∈ [[λx⃗,z⃗(M(z⃗))]]),
(U⃗, P⃗(V⃗), b) ∈ [[λx⃗,z⃗((MN)(z⃗))]].
The final case is where M is zi. Then we have to show
(U⃗, V⃗, b) ∈ [[λx⃗,y⃗(P(y⃗))]] ↔ P(V⃗) ⊢ b.
We now distinguish cases on P(y⃗). If P(y⃗) is yj, then both sides are equivalent to Vj ⊢ b. In case P(y⃗) is (CQ⃗)(y⃗) the following are equivalent, using the induction hypothesis for Q⃗(y⃗):
(U⃗, V⃗, b) ∈ [[λx⃗,y⃗((CQ⃗)(y⃗))]],
(U⃗, V⃗, b) ∈ [[λx⃗,y⃗(C Q⃗(y⃗))]],
(U⃗, Q⃗(V⃗), b) ∈ [[λx⃗,u⃗(C u⃗)]],
(U⃗, Q⃗(V⃗), b) ∈ [[λx⃗ C]]  by (14),
∃b⃗*(b = Cb⃗* ∧ Q⃗(V⃗) ⊢ b⃗*),
CQ⃗(V⃗) ⊢ b.
This concludes the proof.
Let ∼ denote the equivalence relation on formal neighborhoods generated by entailment, i.e., U ∼ V means (U ⊢ V) ∧ (V ⊢ U).
If U⃗ ⊢ P⃗(V⃗), then there are W⃗ such that U⃗ ∼ P⃗(W⃗) and W⃗ ⊢ V⃗. (16)
Proof. By induction on P⃗. The cases x and the empty pattern are clear, and in case P⃗, Q we can apply the induction hypothesis. It remains to treat the case CP⃗(x⃗). Since U ⊢ CP⃗(V⃗) there is a b⃗*0 such that Cb⃗*0 ∈ U. Let
Ui := {a | ∃a⃗*(Ca⃗* ∈ U ∧ a = a*_i)}.
For the constructor pattern Cx⃗ consider CU⃗. By definition
CU⃗ = {Ca⃗* | a*_i ∈ Ui if Ui ≠ ∅, and a*_i = ∗ otherwise}.
We first show U ∼ CU⃗. Assume Ca⃗* ∈ CU⃗. For each i with Ui ≠ ∅ there is a token Cc⃗*_(i) ∈ U with (c⃗*_(i))_i = a*_i; if Ui = ∅ then a*_i = ∗. Hence
U ⊇ {Cc⃗*_(i) | Ui ≠ ∅} ∪ {Cb⃗*0} ⊢ Ca⃗*.
Conversely assume Ca⃗* ∈ U. We define Cb⃗* ∈ CU⃗ by b*_i := a*_i if a*_i ≠ ∗, b*_i := ∗ if Ui = ∅, and otherwise (i.e., if a*_i = ∗ and Ui ≠ ∅) take an arbitrary b*_i ∈ Ui. Clearly {Cb⃗*} ⊢ Ca⃗*.
By definition U⃗ ⊢ P⃗(V⃗). Hence by induction hypothesis there are W⃗ such that U⃗ ∼ P⃗(W⃗) and W⃗ ⊢ V⃗. Therefore U ∼ CU⃗ ∼ CP⃗(W⃗).
Lemma (Unification). If P⃗1(V⃗1) ∼ ··· ∼ P⃗n(V⃗n), then P⃗1, ..., P⃗n are unifiable with a most general unifier ϑ and there exists W⃗ such that
(P⃗1ϑ)(W⃗) = ··· = (P⃗nϑ)(W⃗) ∼ P⃗1(V⃗1) ∼ ··· ∼ P⃗n(V⃗n).
Proof. Assume P⃗1(V⃗1) ∼ ··· ∼ P⃗n(V⃗n). Then P⃗1(V⃗1), ..., P⃗n(V⃗n) are componentwise consistent and hence P⃗1, ..., P⃗n are unifiable with a most general unifier ϑ. We now proceed by induction on P⃗1, ..., P⃗n. If they are either all empty or all variables the claim is trivial. In the case (P⃗1, P′1), ..., (P⃗n, P′n) it follows from the linearity condition on variables that a most general unifier of (P⃗1, P′1), ..., (P⃗n, P′n) is the union of most general unifiers of P⃗1, ..., P⃗n and of P′1, ..., P′n. Hence the induction hypothesis applies. In the case CP⃗1, ..., CP⃗n the assumption CP⃗1(V⃗1) ∼ ··· ∼ CP⃗n(V⃗n) implies P⃗1(V⃗1) ∼ ··· ∼ P⃗n(V⃗n) and hence again the induction hypothesis applies. The remaining case is when some are variables and the other ones of the form CP⃗i, say x, CP⃗2, ..., CP⃗n. By assumption
V1 ∼ CP⃗2(V⃗2) ∼ ··· ∼ CP⃗n(V⃗n).
By induction hypothesis we obtain the required W⃗ such that
(P⃗2ϑ)(W⃗) = ··· = (P⃗nϑ)(W⃗) ∼ P⃗2(V⃗2) ∼ ··· ∼ P⃗n(V⃗n).
Lemma (Consistency). [[λx⃗ M]] is consistent.
Proof. Let (U⃗i, bi) ∈ [[λx⃗ M]] for i = 1, 2. By coherence (cf. the corollary at the end of 6.1.6) it suffices to prove that (U⃗1, b1) and (U⃗2, b2) are consistent. We shall prove this by induction on the maximum of the D-heights and a side induction on the maximum of the heights.
Case (V). Let (U⃗1, b1), (U⃗2, b2) ∈ [[λx⃗ xi]], and assume that U⃗1 and U⃗2 are componentwise consistent. Then U1i ⊢ b1 and U2i ⊢ b2. Since U1i ∪ U2i is consistent, b1 and b2 must be consistent as well.
Case (C). For i = 1, 2 we have V⃗i ⊢ b⃗*_i, hence (U⃗i, V⃗i, Cb⃗*_i) ∈ [[λx⃗ C]]. Assume U⃗1, V⃗1 and U⃗2, V⃗2 are componentwise consistent. The consistency of Cb⃗*_1 and Cb⃗*_2 follows from V⃗i ⊢ b⃗*_i and the consistency of V⃗1 and V⃗2.
Case (A). For i = 1, 2 we have (U⃗i, Vi) ⊆ [[λx⃗ N]] and (U⃗i, Vi, ci) ∈ [[λx⃗ M]], hence (U⃗i, ci) ∈ [[λx⃗(MN)]]. Assume U⃗1 and U⃗2 are componentwise consistent. By the side induction hypothesis for the right premises, V1 ∪ V2 is consistent. Hence by the side induction hypothesis for the left premises, c1 and c2 are consistent.
Case (D). For i = 1, 2 we have (U⃗i, V⃗i, bi) ∈ [[λx⃗,y⃗i Mi(y⃗i)]] and W⃗i ⊢ P⃗i(V⃗i), hence (U⃗i, W⃗i, bi) ∈ [[λx⃗ D]], for computation rules D P⃗i(y⃗i) = Mi(y⃗i). Assume U⃗1, W⃗1 and U⃗2, W⃗2 are componentwise consistent; we must show that b1 and b2 are consistent. Since W⃗1 ∪ W⃗2 ⊢ P⃗i(V⃗i) for i = 1, 2, by (16) there are V⃗′1, V⃗′2 such that V⃗′i ⊢ V⃗i and W⃗1 ∪ W⃗2 ∼ P⃗i(V⃗′i). Then by the unification lemma there are W⃗ such that (P⃗1ϑ)(W⃗) ∼ P⃗i(V⃗′i) ⊢ P⃗i(V⃗i) for i = 1, 2, where ϑ is the most general unifier of P⃗1 and P⃗2. But then also
(y⃗iϑ)(W⃗) ⊢ V⃗i,
and hence by (12) we have
(U⃗i, (y⃗iϑ)(W⃗), bi) ∈ [[λx⃗,y⃗i Mi(y⃗i)]]
with lesser D-height. Now (15) gives
(U⃗i, W⃗, bi) ∈ [[λx⃗,z⃗ Mi(y⃗i)ϑ]]
without increasing the D-height. Notice that M1(y⃗1)ϑ = M2(y⃗2)ϑ by our condition on computation rules. Hence the induction hypothesis applied to (U⃗1, W⃗, b1), (U⃗2, W⃗, b2) ∈ [[λx⃗,z⃗ M1(y⃗1)ϑ]] implies the consistency of b1 and b2, as required.
Lemma (Deductive closure). [[λx⃗ M]] is deductively closed, i.e., if W ⊆ [[λx⃗ M]] and W ⊢ (V⃗, c), then (V⃗, c) ∈ [[λx⃗ M]].
Proof. By induction on the maximum of the D-heights and a side induction on the maximum of the heights of the derivations of W ⊆ [[λx⃗ M]]. We distinguish cases on the last rule of these derivations (which is determined by M).
Case (V). For all (U⃗, b) ∈ W we have Ui ⊢ b, hence (U⃗, b) ∈ [[λx⃗ xi]]. We must show Vi ⊢ c. By assumption W ⊢ (V⃗, c), hence WV⃗ ⊢ c. It suffices to prove Vi ⊢ WV⃗. Let b ∈ WV⃗; we show Vi ⊢ b. There are U⃗ such that V⃗ ⊢ U⃗ and (U⃗, b) ∈ W. But then by the above Ui ⊢ b, hence Vi ⊢ Ui ⊢ b.
Case (A). Let W = {(U⃗1, b1), ..., (U⃗n, bn)}. For each (U⃗i, bi) ∈ W there is Ui such that (U⃗i, Ui) ⊆ [[λx⃗ N]] and (U⃗i, Ui, bi) ∈ [[λx⃗ M]], hence (U⃗i, bi) ∈ [[λx⃗(MN)]].
Define U := ⋃{Ui | V⃗ ⊢ U⃗i}. We first show that U is consistent. Let a, b ∈ U. There are i, j such that a ∈ Ui, b ∈ Uj and V⃗ ⊢ U⃗i, U⃗j. Then U⃗i and U⃗j are componentwise consistent; hence by the consistency of [[λx⃗ N]] proved above, a and b are consistent as well.
Next we show (V⃗, U) ⊆ [[λx⃗ N]]. Let a ∈ U; we show (V⃗, a) ∈ [[λx⃗ N]]. Fix i such that a ∈ Ui and V⃗ ⊢ U⃗i, and let Wi := {(U⃗i, b) | b ∈ Ui} ⊆ [[λx⃗ N]]. Since by the side induction hypothesis [[λx⃗ N]] is deductively closed, it suffices to prove Wi ⊢ (V⃗, a), i.e., {b | b ∈ Ui ∧ V⃗ ⊢ U⃗i} ⊢ a. But the latter set equals Ui, and a ∈ Ui.
Finally we show (V⃗, U, c) ∈ [[λx⃗ M]]. Let
W′ := {(U⃗1, U1, b1), ..., (U⃗n, Un, bn)} ⊆ [[λx⃗ M]].
By the side induction hypothesis it suffices to prove that W′ ⊢ (V⃗, U, c), i.e., {bi | V⃗ ⊢ U⃗i ∧ U ⊢ Ui} ⊢ c. But by definition of U the latter set equals {bi | V⃗ ⊢ U⃗i}, which in turn entails c, because by assumption W ⊢ (V⃗, c). Now we can use (A) to infer (V⃗, c) ∈ [[λx⃗(MN)]], as required.
Case (C). Assume W ⊆ [[λx⃗ C]]. Then W consists of elements (U⃗, U⃗′, Cb⃗*) such that U⃗′ ⊢ b⃗*. Assume further W ⊢ (V⃗, V⃗′, c). Then
{Cb⃗* | ∃U⃗,U⃗′((U⃗, U⃗′, Cb⃗*) ∈ W ∧ V⃗ ⊢ U⃗ ∧ V⃗′ ⊢ U⃗′)} ⊢ c.
By definition of entailment c has the form Cc⃗* such that
Wi := {b | ∃U⃗,U⃗′,b⃗*(b = b*_i ∧ (U⃗, U⃗′, Cb⃗*) ∈ W ∧ V⃗ ⊢ U⃗ ∧ V⃗′ ⊢ U⃗′)} ⊢ c*_i.
We must show (V⃗, V⃗′, Cc⃗*) ∈ [[λx⃗ C]], i.e., V⃗′ ⊢ c⃗*. It suffices to show V′_i ⊢ Wi, for every i. Let b ∈ Wi. Then there are U⃗, U⃗′, b⃗* such that b = b*_i, (U⃗, U⃗′, Cb⃗*) ∈ W and V⃗′ ⊢ U⃗′. Hence V′_i ⊢ U′_i ⊢ b*_i = b.
Case (D). Let W = {(U⃗1, U⃗′1, b1), ..., (U⃗n, U⃗′n, bn)}. For every i there is a U⃗′′i such that (U⃗i, U⃗′′i, bi) ∈ [[λx⃗,y⃗i Mi(y⃗i)]] and U⃗′i ⊢ P⃗i(U⃗′′i), hence (U⃗i, U⃗′i, bi) ∈ [[λx⃗ D]], for D P⃗i(y⃗i) = Mi(y⃗i) a computation rule. Assume W ⊢ (V⃗, V⃗′, c). We must prove (V⃗, V⃗′, c) ∈ [[λx⃗ D]]. Let
I := {i | 1 ≤ i ≤ n ∧ V⃗ ⊢ U⃗i ∧ V⃗′ ⊢ U⃗′i}.
Then {bi | i ∈ I} ⊢ c, hence I ≠ ∅. For i ∈ I we have V⃗′ ⊢ U⃗′i ⊢ P⃗i(U⃗′′i), hence by (16) there are V⃗i such that V⃗′ ∼ P⃗i(V⃗i) and V⃗i ⊢ U⃗′′i. In particular for i, j ∈ I
V⃗′ ∼ P⃗i(V⃗i) ∼ P⃗j(V⃗j).
To simplify notation assume I = {1, ..., m}. Hence by the unification lemma P⃗1, ..., P⃗m are unifiable with a most general unifier ϑ and there exists W⃗ such that
(P⃗1ϑ)(W⃗) = ··· = (P⃗mϑ)(W⃗) ∼ P⃗1(V⃗1) ∼ ··· ∼ P⃗m(V⃗m).
Let i, j ∈ I. Then by the conditions on computation rules Miϑ = Mjϑ. Also (y⃗iϑ)(W⃗) ⊢ V⃗i ⊢ U⃗′′i. Therefore by (12)
(V⃗, (y⃗iϑ)(W⃗), bi) ∈ [[λx⃗,y⃗i Mi(y⃗i)]]
and hence by (15)
(V⃗, W⃗, bi) ∈ [[λx⃗,z⃗ Mi(y⃗i)ϑ]].
But Mi(y⃗i)ϑ = Miϑ = M1ϑ = M1(y⃗1)ϑ, and hence for all i ∈ I
(V⃗, W⃗, bi) ∈ [[λx⃗,z⃗ M1(y⃗1)ϑ]].
Therefore X := {(V⃗, W⃗, bi) | i ∈ I} ⊆ [[λx⃗,z⃗ M1(y⃗1)ϑ]]. Since {bi | i ∈ I} ⊢ c, we have X ⊢ (V⃗, W⃗, c) and hence the induction hypothesis implies (V⃗, W⃗, c) ∈ [[λx⃗,z⃗ M1(y⃗1)ϑ]]. Using (15) again we obtain (V⃗, (y⃗1ϑ)(W⃗), c) ∈ [[λx⃗,y⃗1 M1(y⃗1)]]. Since V⃗′ ∼ P⃗1(V⃗1) ∼ P⃗1((y⃗1ϑ)(W⃗)), we obtain (V⃗, V⃗′, c) ∈ [[λx⃗ D]], by (D).
Corollary. [[λx⃗ M]] is an ideal.
6.2.7. Preservation of values. We now prove that our definition above of the denotation of a term is reasonable, in the sense that it is not changed by an application of the standard (β- and η-) conversions or a computation rule. For the β-conversion part of this proof it is helpful to first introduce a more standard notation, which involves variable environments.
Definition. Assume that all free variables in M are among x⃗, y⃗. Let
[[M]]_x⃗^U⃗ := {b | (U⃗, b) ∈ [[λx⃗ M]]}  and  [[M]]_x⃗,y⃗^u⃗,V⃗ := ⋃_{U⃗⊆u⃗}[[M]]_x⃗,y⃗^U⃗,V⃗,
where the u⃗ are ideals.
From (13) we obtain [[M]]_x⃗,y^U⃗,V = [[M]]_x⃗^U⃗ if y ∉ FV(M), and similarly for ideals u⃗, v instead of U⃗, V. We have a useful monotonicity property, which follows from the deductive closure of [[λx⃗ M]].
Lemma. (a) If V⃗ ⊢ U⃗, b ⊢ c and b ∈ [[M]]_x⃗^U⃗, then c ∈ [[M]]_x⃗^V⃗.
(b) If v⃗ ⊇ u⃗, b ⊢ c and b ∈ [[M]]_x⃗^u⃗, then c ∈ [[M]]_x⃗^v⃗.
Proof. (a) V⃗ ⊢ U⃗, b ⊢ c and (U⃗, b) ∈ [[λx⃗ M]] together imply (V⃗, c) ∈ [[λx⃗ M]], by the deductive closure of [[λx⃗ M]]. (b) follows from (a).
Lemma. (a) [[xi]]_x⃗^U⃗ = Ūi (the deductive closure of Ui) and [[xi]]_x⃗^u⃗ = ui.
(b) [[λy M]]_x⃗^U⃗ = {(V, b) | b ∈ [[M]]_x⃗,y^U⃗,V} and [[λy M]]_x⃗^u⃗ = {(V, b) | b ∈ [[M]]_x⃗,y^u⃗,V}.
(c) [[MN]]_x⃗^U⃗ = [[M]]_x⃗^U⃗ [[N]]_x⃗^U⃗ and [[MN]]_x⃗^u⃗ = [[M]]_x⃗^u⃗ [[N]]_x⃗^u⃗.
Proof. (b) It suffices to prove the first part. But (V, b) ∈ [[λy M]]_x⃗^U⃗ and b ∈ [[M]]_x⃗,y^U⃗,V are both equivalent to (U⃗, V, b) ∈ [[λx⃗,y M]].
(c) For the first part we argue as follows:
c ∈ [[M]]_x⃗^U⃗ [[N]]_x⃗^U⃗ ↔ ∃_{V⊆[[N]]_x⃗^U⃗}((V, c) ∈ [[M]]_x⃗^U⃗)
↔ ∃_V((U⃗, V) ⊆ [[λx⃗ N]] ∧ (U⃗, V, c) ∈ [[λx⃗ M]])
↔ (U⃗, c) ∈ [[λx⃗(MN)]]  by (A)
↔ c ∈ [[MN]]_x⃗^U⃗.
The second part is an easy consequence:
c ∈ [[M]]_x⃗^u⃗ [[N]]_x⃗^u⃗ ↔ ∃_{V⊆[[N]]_x⃗^u⃗}((V, c) ∈ [[M]]_x⃗^u⃗)
↔ ∃_{V⊆[[N]]_x⃗^u⃗}∃_{U⃗⊆u⃗}((V, c) ∈ [[M]]_x⃗^U⃗)
↔ ∃_{U⃗1⊆u⃗}∃_{V⊆[[N]]_x⃗^U⃗1}∃_{U⃗⊆u⃗}((V, c) ∈ [[M]]_x⃗^U⃗)
↔(∗) ∃_{U⃗⊆u⃗}∃_{V⊆[[N]]_x⃗^U⃗}((V, c) ∈ [[M]]_x⃗^U⃗)
↔ ∃_{U⃗⊆u⃗}(c ∈ [[M]]_x⃗^U⃗ [[N]]_x⃗^U⃗)
↔ ∃_{U⃗⊆u⃗}(c ∈ [[MN]]_x⃗^U⃗)  by the first part
↔ c ∈ [[MN]]_x⃗^u⃗.
Here is the proof of the equivalence marked (∗). The upward direction is obvious. For the downward direction we use monotonicity. Assume U⃗1 ⊆ u⃗, V ⊆ [[N]]_x⃗^U⃗1, U⃗ ⊆ u⃗ and (V, c) ∈ [[M]]_x⃗^U⃗. Let U⃗2 := U⃗1 ∪ U⃗ ⊆ u⃗. Then by monotonicity V ⊆ [[N]]_x⃗^U⃗2 and (V, c) ∈ [[M]]_x⃗^U⃗2.
Corollary. [[λy M]]_x⃗^u⃗ v = [[M]]_x⃗,y^u⃗,v.
Proof.
b ∈ [[λy M]]_x⃗^u⃗ v ↔ ∃_{V⊆v}((V, b) ∈ [[λy M]]_x⃗^u⃗)
↔ ∃_{V⊆v}(b ∈ [[M]]_x⃗,y^u⃗,V)  by the lemma, part (b)
↔ b ∈ [[M]]_x⃗,y^u⃗,v.
Lemma (Substitution). [[M(z)]]_x⃗,z^{u⃗,[[N]]_x⃗^u⃗} = [[M(N)]]_x⃗^u⃗.
Proof. By induction on M, and cases on the form of M.
Case λy M. For readability we leave out x⃗ and u⃗:
[[λy M(z)]]_z^[[N]] = {(V, b) | b ∈ [[M(z)]]_z,y^[[N]],V}
= {(V, b) | b ∈ [[M(N)]]_y^V}  by induction hypothesis
= [[λy M(N)]]  by the last lemma, part (b)
= [[(λy M)(N)]].
The other cases are easy.
Lemma (Preservation of values, β). [[(λy M(y))N]]_x⃗^u⃗ = [[M(N)]]_x⃗^u⃗.
Proof. Again we leave out x⃗, u⃗. By the last two lemmata and the corollary,
[[(λy M(y))N]] = [[λy M(y)]][[N]] = [[M(y)]]_y^[[N]] = [[M(N)]].
Lemma (Preservation of values, η). [[λy(My)]]_x⃗^u⃗ = [[M]]_x⃗^u⃗ if y ∉ FV(M).
Proof.
(V, b) ∈ [[λy(My)]]_x⃗^u⃗ ↔ ∃_{U⃗⊆u⃗}((U⃗, V, b) ∈ [[λx⃗,y(My)]])
↔ ∃_{U⃗⊆u⃗}((U⃗, V, b) ∈ [[λx⃗ M]])  by (14)
↔ (V, b) ∈ [[M]]_x⃗^u⃗.
Lemma (Inversion). Let D P⃗(y⃗) = M be a computation rule of a defined constant D. Then (P⃗(V⃗), b) ∈ [[D]] implies (V⃗, b) ∈ [[λy⃗ M]].
Proof. Assume (P⃗(V⃗), b) ∈ [[D]]. Then there is a computation rule D P⃗1(y⃗1) = M1 such that (V⃗1, b) ∈ [[λy⃗1 M1]] and P⃗(V⃗) ⊢ P⃗1(V⃗1) for some V⃗1. Hence by (16) there are V⃗0 such that P⃗(V⃗) ∼ P⃗1(V⃗0) and V⃗0 ⊢ V⃗1. By the unification lemma in 6.2.6, P⃗ and P⃗1 are unifiable with a most general unifier ϑ (hence Mϑ = M1ϑ) and there exists W⃗ such that
(P⃗ϑ)(W⃗) = (P⃗1ϑ)(W⃗) ∼ P⃗(V⃗) ∼ P⃗1(V⃗0).
From (V⃗1, b) ∈ [[λy⃗1 M1]] we obtain (V⃗0, b) ∈ [[λy⃗1 M1]], since V⃗0 ⊢ V⃗1. Now (P⃗1ϑ)(W⃗) ∼ P⃗1(V⃗0) implies (y⃗1ϑ)(W⃗) ∼ V⃗0. Hence we obtain
((y⃗1ϑ)(W⃗), b) ∈ [[λy⃗1 M1]],
(W⃗, b) ∈ [[λz⃗ M1ϑ]]  by (15),
(W⃗, b) ∈ [[λz⃗ Mϑ]]  since Mϑ = M1ϑ,
((y⃗ϑ)(W⃗), b) ∈ [[λy⃗ M]]  again by (15).
But (P⃗ϑ)(W⃗) ∼ P⃗(V⃗) implies (y⃗ϑ)(W⃗) ∼ V⃗. Hence (V⃗, b) ∈ [[λy⃗ M]].
We can now prove preservation of values under computation rules:
Lemma. For every computation rule D P⃗(y⃗) = M of a defined constant D,
[[λy⃗(D P⃗(y⃗))]]_x⃗^u⃗ = [[λy⃗ M]]_x⃗^u⃗.
Proof. For readability we omit x⃗ and u⃗.
(V⃗, b) ∈ [[λy⃗(D P⃗(y⃗))]] ↔ (P⃗(V⃗), b) ∈ [[λz⃗(D z⃗)]]  by (15)
↔ (P⃗(V⃗), b) ∈ [[D]]  by (14)
↔ (V⃗, b) ∈ [[λy⃗ M]],  by the inversion lemma.
6.2.8. Operational semantics; adequacy. The adequacy theorem of Plotkin [1977, Theorem 3.1] says that whenever the value of a closed term M is a numeral, then M head-reduces to this numeral. So in this sense the (denotational) semantics is (computationally) “adequate”. Plotkin's proof is by induction on the types, and uses a computability predicate. We prove an adequacy theorem in our setting. However, for technical reasons we require that the left hand sides of computation rules are non-unifiable.
Operational semantics. Recall that a token of an algebra ι is a constructor tree whose outermost constructor is one of the constructors of ι.
Definition. For closed terms M we inductively define M ↦1 N (“M head-reduces to N”) and M ∈ Nf (“M is in normal form”):
(λx M(x))NK⃗ ↦1 M(N)K⃗,
D P⃗(N⃗)K⃗ ↦1 M(N⃗)K⃗  for D P⃗(y⃗) = M(y⃗) a computation rule,
CM⃗ ↦1 CN⃗  if M⃗ ↦1 N⃗, where CM⃗ is of base type; here M⃗ ↦1 N⃗ means that Mi ↦1 Ni for at least one i, and for all i either Mi ↦1 Ni or Mi = Ni ∈ Nf. In the final rule we assume that M⃗ has length ar(D), but is not an instance of P⃗(y⃗) for any constructor pattern of a computation rule of D:
DM⃗K⃗ ↦1 DN⃗K⃗  if M⃗ ↦1 N⃗.
If none of these rules applies, then M ∈ Nf.
Clearly for every term M there is at most one M′ such that M ↦1 M′. Let ↦ denote the reflexive transitive closure of ↦1.
We define an “operational interpretation” (Martin-Löf [1983]) of formal neighborhoods U. To this end we define a notion M ∈ [a], for M closed.
Definition (M ∈ [a]). We define a relation M ∈ [a] by recursion on the height |a| of a (defined in 6.1.5). Let M ∈ [U] mean ∀_{a∈U}(M ∈ [a]). M ∈ [∗] is defined to be true.
M ∈ [Ca⃗*] := ∃_{N⃗∈[a⃗*]}(M ↦ CN⃗),
M ∈ [(U, b)] := ∀_{N∈[U]}(MN ∈ [b]).
Remark. Notice that the first clause of the definition generalizes to M⃗ ∈ [P⃗(a⃗*)] → ∃_{N⃗∈[a⃗*]}(M⃗ ↦ P⃗(N⃗)). But this implies
M⃗ ∈ [P⃗(V⃗)] → ∃_{N⃗∈[V⃗]}(M⃗ ↦ P⃗(N⃗)), (17)
which can be seen as follows. To simplify notation assume V⃗ is a single V. Let b ∈ V; recall that V is finite. Then M ∈ [P(b)], and hence there is Nb ∈ [b] such that M ↦ P(Nb). All these Nb's must have a common reduct N, and by the next lemma N ∈ [V].
We prove some easy but useful properties of the relation M ∈ [a]. The first one says that [a] is closed under backward and forward reduction steps.
Lemma. (a) M− ↦1 M → M ∈ [a] → M− ∈ [a].
(b) M ↦1 M+ → M ∈ [a] → M+ ∈ [a].
Proof. (a) By induction on |a|. Case M ∈ [Ca⃗*]. Then M ↦ CN⃗ for some N⃗ ∈ [a⃗*]. From M− ↦1 M we obtain M− ↦ CN⃗. Hence M− ∈ [Ca⃗*]. Case M ∈ [(U, b)]. Assume M− ↦1 M. We must show M− ∈ [(U, b)]. Let N ∈ [U]; we must show M−N ∈ [b]. By assumption we have MN ∈ [b]. Because of M− ↦1 M at an arrow type and the trailing K⃗ in the rules for ↦1 at arrow types, we also have M−N ↦1 MN. By induction hypothesis M−N ∈ [b].
(b) By induction on |a|. Case M ∈ [Ca⃗*]. Then M ↦ CN⃗ for some N⃗ ∈ [a⃗*]. Subcase M = CN⃗. Then M+ = CN⃗+ with N⃗ ↦1 N⃗+. By induction hypothesis N⃗+ ∈ [a⃗*]. Hence M+ = CN⃗+ ∈ [Ca⃗*] by definition. Subcase M ↦1 M+ ↦ CN⃗. Then M+ ∈ [Ca⃗*], again by definition. Case M ∈ [(U, b)]. Assume M ↦1 M+. We must show M+ ∈ [(U, b)]. Let N ∈ [U]; we must show M+N ∈ [b]. By assumption we have MN ∈ [b]. Because of M ↦1 M+ we obtain MN ↦1 M+N, as above. By induction hypothesis M+N ∈ [b].
The next lemma allows us to decrease the information in U.
Lemma. M ∈ [U] → U ⊢ b → M ∈ [b].
Proof. By induction on the type, and a side induction on |U|.
Case U = {Ca⃗*_i | i ∈ I} and M ∈ [Ca⃗*_i] for all i ∈ I. Then M ↦ CN⃗_i for some N⃗_i ∈ [a⃗*_i]. Since ↦1 is deterministic (i.e., the reduct is unique if it exists), there is a common reduct CN⃗ of all CN⃗_i, and by the previous lemma N⃗ ∈ [a⃗*_i]. Since U = {Ca⃗*_i | i ∈ I} ⊢ b, the token b is of the form Cb⃗* with {a*_{ij} | i ∈ I} ⊢ b*_j. Hence N⃗ ∈ [b⃗*] by induction hypothesis, and therefore M ∈ [Cb⃗*].
Case U = {(U⃗_i, b_i) | i ∈ I} and M ∈ [(U⃗_i, b_i)] for all i ∈ I. Assume U ⊢ (V⃗, c), i.e., {b_i | V⃗ ⊢ U⃗_i} ⊢ c. We must show M ∈ [(V⃗, c)]. Let N⃗ ∈ [V⃗]; we must show MN⃗ ∈ [c]. From V⃗ ⊢ U⃗_i we obtain N⃗ ∈ [U⃗_i] by induction hypothesis. Now M ∈ [(U⃗_i, b_i)] yields MN⃗ ∈ [b_i]. From {b_i | V⃗ ⊢ U⃗_i} ⊢ c we obtain MN⃗ ∈ [c], again by induction hypothesis.
Theorem (Adequacy). For closed terms M,
a ∈ [[M]] → M ∈ [a].
Proof. We show, for arbitrary terms M with free variables among x⃗,
(U⃗, b) ∈ [[λx⃗ M]] → λx⃗ M ∈ [(U⃗, b)],
by induction on the rules defining (U⃗, b) ∈ [[λx⃗ M]], and cases on the form of M.
Case xi. By rule (V), (U⃗, b) ∈ [[λx⃗ xi]] was inferred from Ui ⊢ b. We must show λx⃗ xi ∈ [(U⃗, b)], i.e., ∀_{N⃗∈[U⃗]}((λx⃗ xi)N⃗ ∈ [b]). Let N⃗ ∈ [U⃗]. It suffices to show Ni ∈ [b]. But this follows from Ni ∈ [Ui] and Ui ⊢ b.
Case MN. By rule (A), (U⃗, c) ∈ [[λx⃗(M(x⃗)N(x⃗))]] was inferred from (U⃗, V) ⊆ [[λx⃗ N(x⃗)]] and (U⃗, V, c) ∈ [[λx⃗ M(x⃗)]]. We must show ∀_{K⃗∈[U⃗]}(M(K⃗)N(K⃗) ∈ [c]). Let K⃗ ∈ [U⃗]. By induction hypothesis, for all b ∈ V we have λx⃗ N(x⃗) ∈ [(U⃗, b)] and hence N(K⃗) ∈ [b]. This means N(K⃗) ∈ [V]. Also, by induction hypothesis we have λx⃗ M(x⃗) ∈ [(U⃗, V, c)]. Therefore (λx⃗ M(x⃗))K⃗N(K⃗) ∈ [c] and hence M(K⃗)N(K⃗) ∈ [c].
Case C. By rule (C), (U⃗, V⃗, Cb⃗*) ∈ [[λx⃗ C]] was inferred from V⃗ ⊢ b⃗*. We must show λx⃗ C ∈ [(U⃗, V⃗, Cb⃗*)], i.e., ∀_{K⃗∈[U⃗]}∀_{L⃗∈[V⃗]}(CL⃗ ∈ [Cb⃗*]). Let L⃗ ∈ [V⃗]. Since V⃗ ⊢ b⃗* we have L⃗ ∈ [b⃗*]. Hence CL⃗ ∈ [Cb⃗*] by definition.
Case D. By rule (D), (U⃗, W⃗, b) ∈ [[λx⃗ D]] was inferred from (U⃗, V⃗, b) ∈ [[λx⃗,y⃗ M]] and W⃗ ⊢ P⃗(V⃗), with D P⃗(y⃗) = M(y⃗) a computation rule. To simplify notation assume that x⃗ is empty. We must show D ∈ [(W⃗, b)]. Assume L⃗ ∈ [W⃗]; we must show DL⃗ ∈ [b]. Since W⃗ ⊢ P⃗(V⃗) we have L⃗ ∈ [P⃗(V⃗)]. By (17) there are N⃗ ∈ [V⃗] such that L⃗ ↦ P⃗(N⃗). Because of closure of [V⃗] under backwards reduction steps we can assume that in the head reduction sequence from L⃗ to P⃗(N⃗) this N⃗ is chosen such that before P⃗(N⃗) we have not yet reached the pattern P⃗. Hence DL⃗ ↦ D P⃗(N⃗) ↦1 M(N⃗). (Here we need that the left hand sides of computation rules are non-unifiable.) Because of (V⃗, b) ∈ [[λy⃗ M]], by induction hypothesis λy⃗ M ∈ [(V⃗, b)], i.e., ∀_{N⃗∈[V⃗]}((λy⃗ M)N⃗ ∈ [b]). Hence (λy⃗ M)N⃗ ∈ [b] for the N⃗ above, and by the next-to-last lemma DL⃗ ∈ [b].
6.3. Normalization
In the adequacy theorem we have seen that whenever a closed term de-
notes a numeral, then a particular reduction method—head reduction—
terminates and the result will be this numeral. However, in general we
cannot expect that reducing an arbitrary term will terminate. Our quite
general computation rules exclude this; in fact, the definition Yf =
f(Yf) of the least-fixed-point operator easily leads to an example of
non-termination. Moreover, we should not expect anything else, since
our terms denote partial functionals.
Now suppose we want to concentrate on total functionals. This can
be achieved if one gives up general computation rules and restricts at-
tention to the (structural) higher type recursion operators introduced by
Gödel [1958]. In his system T Gödel considers terms built from (typed)
variables and (typed) constants for the constructors and recursion op-
erators by (type-correct) application and abstraction. For the recursion
operators one can formulate their natural conversion rules, which have
the form of computation rules. We will prove in this section that not
only head reduction will terminate for such terms, but also arbitrary re-
duction sequences. The proof will be given by a “predicative” method
(that is, “from below”, without quantifying over all predicates or sets). It
is well known (see for instance Troelstra and van Dalen [1988]) that by formalizing this proof in HA one can show that every function definable in Gödel's T is in fact provably total in HA. The converse will follow by means of the realizability interpretation treated in detail in the next chapter: if M is a proof in HA of ∀x⃗∃y C(x⃗, y) with C(x⃗, y) a Δ0(exp) formula, then its extracted term et(M) is in Gödel's T and satisfies ∀x⃗ C(x⃗, et(M)(x⃗)).
In the final subsection we address the question whether the normal form of a term can be computed by evaluating the term in an appropriate model. This indeed can be done, but of course the value obtained must be “reified” to a term, which turns out to be the long normal form. For simplicity we restrict attention to λ-terms without defined higher order constants given by computation rules; however, the method works for the general case as well. In fact, the question arose when implementing normalization in Minlog. Since the underlying programming language is Scheme, a member of the Lisp family with a built-in efficient evaluation, it was tempting to use exactly this evaluation mechanism to compute normal forms. This is done in Minlog.
6.3.1. Strong normalization. We consider terms in Gödel's T. Recall the definition of the conversion relation ↦ in 6.2.2. We define the one step reduction relation → between terms in T as follows: M → N if N is obtained from M by replacing a subterm M′ of M by N′, where M′ ↦ N′. The reduction relations →+ and →∗ are the transitive and the reflexive transitive closure of →, respectively. For M⃗ = M1, ..., Mn we write M⃗ → M⃗′ if Mi → M′i for some i ∈ {1, ..., n} and Mj = M′j for all j ≠ i. A term M is normal (or in normal form) if there is no term N such that M → N.
Clearly normal closed terms of ground type are of the form Ci N⃗.
Definition. The set SN of strongly normalizing terms is inductively
defined by
∀N ;M →N (N ∈ SN) → M ∈ SN.
Note that with M clearly every subterm of M is strongly normalizing.
Definition. We define strong computability predicates SC^ρ by induction on ρ.
Case ι_j = (μξ⃗ κ⃗)_j. Then M ∈ SC^{ι_j} if
∀_{N; M→N}(N ∈ SC), and (18)
M = Ci N⃗ implies N⃗^P ∈ SC ∧ ∀_{p<n^R}∀_{K⃗∈SC}(N_p^R K⃗ ∈ SC^{ι_{jp}}). (19)
Case ρ → σ. SC^{ρ→σ} := {M | ∀_{N∈SC^ρ}(MN ∈ SC^σ)}.
The reference to N⃗^P ∈ SC and K⃗ ∈ SC in (19) is legal, because the types ρ⃗, σ⃗_i of N⃗ must have been generated before ι_j. Note also that by (19), Ci N⃗ ∈ SC implies N⃗ ∈ SC.
We now set up a sequence of lemmata leading to a proof that every term
is strongly normalizing.
Lemma (Closure of SC under reduction). If M ∈ SC^ρ and M → M′, then M′ ∈ SC^ρ.
Proof. Induction on ρ. Case ι. By (18). Case ρ → σ. Assume M ∈ SC^{ρ→σ} and M → M′; we must show M′ ∈ SC. So let N ∈ SC^ρ; we must show M′N ∈ SC^σ. But this follows from MN → M′N and MN ∈ SC^σ, by the induction hypothesis for σ.
Lemma (Closure of SC under variable application).
∀_{M⃗∈SN}(M⃗ ∈ SC → (xM⃗) ∈ SC).
Proof. Induction on M⃗ ∈ SN. Assume M⃗ ∈ SN and M⃗ ∈ SC; we must show (xM⃗) ∈ SC. So assume xM⃗ → N; we must show N ∈ SC. Now by the form of the conversion rules N must be of the form xM⃗′ with M⃗ → M⃗′. But M⃗′ ∈ SC by closure of SC under reduction, hence xM⃗′ ∈ SC by the induction hypothesis for M⃗′.
Lemma. (a) SC^ρ ⊆ SN, (b) x^ρ ∈ SC^ρ.
Proof. By simultaneous induction on ρ. Case ι_j = (μξ⃗ κ⃗)_j. (a) We show that M ∈ SC^{ι_j} implies M ∈ SN, by (side) induction on M ∈ SC^{ι_j}. So assume M ∈ SC^{ι_j}; we must show M ∈ SN. But for every N with M → N we have N ∈ SC by (18), hence N ∈ SN by the side induction hypothesis. (b) x ∈ SC^{ι_j} holds trivially.
Case ρ → σ. (a) Assume M ∈ SC^{ρ→σ}; we must show M ∈ SN. By induction hypothesis (b) for ρ we have x^ρ ∈ SC^ρ, hence Mx ∈ SC^σ, hence Mx ∈ SN by induction hypothesis (a) for σ. But Mx ∈ SN clearly implies M ∈ SN. (b) We must show xM⃗ ∈ SC for all M⃗ ∈ SC of the appropriate types. But this follows from the closure of SC under variable application, using induction hypothesis (a) for the types of the M⃗.
It follows that each constructor is strongly computable:
Corollary. N⃗ ∈ SC → Ci N⃗ ∈ SC, i.e., Ci ∈ SC.
Proof. First show ∀_{N⃗∈SN}(N⃗ ∈ SC → Ci N⃗ ∈ SC), by induction on N⃗ ∈ SN, as in the proof of closure of SC under variable application; then use SC ⊆ SN.
Lemma. ∀_{M,N,N⃗∈SN}(M(N)N⃗ ∈ SC^ι → (λx M(x))NN⃗ ∈ SC^ι).
Proof. By induction on M, N, N⃗ ∈ SN. Let M, N, N⃗ ∈ SN and assume that M(N)N⃗ ∈ SC; we must show (λx M(x))NN⃗ ∈ SC. Assume (λx M(x))NN⃗ → K; we must show K ∈ SC. Case K = (λx M′(x))N′N⃗′ with M, N, N⃗ → M′, N′, N⃗′. Then M(N)N⃗ →∗ M′(N′)N⃗′, hence by (18) from our assumption M(N)N⃗ ∈ SC we can infer M′(N′)N⃗′ ∈ SC, and therefore (λx M′(x))N′N⃗′ ∈ SC by the induction hypothesis. Case K = M(N)N⃗. Then K ∈ SC by assumption.
By induction on the type (using SC ⊆ SN) it follows that this property holds for arbitrary types as well:
∀_{M,N,N⃗∈SN}(M(N)N⃗ ∈ SC → (λx M(x))NN⃗ ∈ SC). (20)
Lemma. ∀_{N∈SC_{ι_j}} ∀_{M⃗,L⃗∈SN} (M⃗, L⃗ ∈ SC → R_j N M⃗ L⃗ ∈ SC).

Proof. By main induction on N ∈ SC_{ι_j}, and side induction on M⃗, L⃗ ∈
SN. Assume

    R_j N M⃗ L⃗ → L′.

To show L′ ∈ SC we distinguish cases according to how this one-step
reduction was generated.

Case 1. R_j N M⃗′ L⃗′ ∈ SC by the side induction hypothesis.

Case 2. R_j N′ M⃗ L⃗ ∈ SC by the main induction hypothesis.

Case 3. N = Cᵢ N⃗ and

    L′ = Mᵢ N⃗ ((R_{j₀} · M⃗) ∘ N₀^R) ... ((R_{j_{n−1}} · M⃗) ∘ N_{n−1}^R) L⃗.

M⃗ ∈ SC by assumption. N⃗ ∈ SC follows from N = Cᵢ N⃗ ∈ SC by (19).
Note that for all recursive arguments N_p^R of N and all strongly computable
K⃗ by (19) we have the induction hypothesis for N_p^R K⃗ available. It remains
to show (R_{j_p} · M⃗) ∘ N_p^R = λ_{x⃗_p}(R_{j_p}(N_p^R x⃗_p)M⃗) ∈ SC. So let K⃗, Q⃗ ∈ SC
be given. We must show (λ_{x⃗_p}(R_{j_p}(N_p^R x⃗_p)M⃗))K⃗ Q⃗ ∈ SC. By induction
hypothesis for N_p^R K⃗ we have R_{j_p}(N_p^R K⃗)M⃗ Q⃗ ∈ SC, since M⃗, Q⃗ ∈ SN
because of SC ⊆ SN. Now (20) yields the claim.

So in particular R_j ∈ SC.
Definition. A substitution θ is strongly computable if θ(x) ∈ SC for
all variables x. A term M is strongly computable under substitution if
Mθ ∈ SC for all strongly computable substitutions θ.

Theorem. Every term in Gödel's T is strongly computable under substitution.

Proof. Induction on the term M. Case x. xθ ∈ SC, since θ is strongly
computable. Cases Cᵢ and R_j have been treated above. Case MN. By
induction hypothesis Mθ, Nθ ∈ SC, hence (MN)θ = (Mθ)(Nθ) ∈ SC.
Case λx M. Let θ be a strongly computable substitution; we must show
(λx M)θ = λx(Mθ′) ∈ SC, where θ′ := θ[x := x]. So let N ∈ SC; we must
show (λx(Mθ′))N ∈ SC. By induction hypothesis Mθ[x := N] ∈ SC,
hence (λx(Mθ′))N ∈ SC by (20).
It follows that every term in Gödel’s T is strongly normalizing.
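To make the reduction relation concrete, here is a small runnable Haskell sketch of a combinatory fragment of T (numerals, a recursor R, and a combinator K in place of λ-abstraction, so that terms stay first-order). The names Tm, step and normalize are ours, not Minlog's, and the strategy chosen is just one of the many that strong normalization licenses.

    -- A combinatory fragment of T; Tm, step, normalize are illustrative names.
    data Tm = Z | S | K | R | App Tm Tm deriving (Eq, Show)

    infixl 9 .$
    (.$) :: Tm -> Tm -> Tm
    (.$) = App

    -- One-step reduction: a conversion at the root if possible, otherwise
    -- reduce in the leftmost subterm; by strong normalization any strategy
    -- terminates.
    step :: Tm -> Maybe Tm
    step (App (App (App R a) _) Z)         = Just a                              -- R a f 0 converts to a
    step (App (App (App R a) f) (App S n)) = Just (f .$ n .$ (R .$ a .$ f .$ n)) -- R a f (S n)
    step (App (App K x) _)                 = Just x                              -- K x y converts to x
    step (App m n) = case step m of
      Just m' -> Just (App m' n)
      Nothing -> App m <$> step n
    step _ = Nothing

    -- Iterating one-step reduction yields the normal form.
    normalize :: Tm -> Tm
    normalize t = maybe t normalize (step t)

    -- Example: a + m as R a (K S) m; here 1 + 2 normalizes to S (S (S Z)).
    three :: Tm
    three = normalize (R .$ (S .$ Z) .$ (K .$ S) .$ (S .$ (S .$ Z)))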
6.3.2. Normalization by evaluation. A basic question is how to prac-
tically normalize a term, say in a system like Minlog. There are many
ways to do this; however, one wants to compute the normal form in a
rational and efficient way. We show here—as an aside—that this can be
done simply by evaluating the term itself, but in an appropriate model. Of
course the value obtained must then be “reified” to a term, which turns
out to be the long normal form.
Recall the notion of a simple type: ι, ρ → σ; we may also include
ρ × σ. The set Λ of terms is defined by x^ρ, (λ_{x^ρ} M^σ)^{ρ→σ}, (M^{ρ→σ} N^ρ)^σ.
Let Λ_ρ denote the set of all terms of type ρ. We consider the set of terms
in long normal form (i.e., normal w.r.t. β-reduction and η-expansion):
(x M₁ ... Mₙ)^ι and λx M. Abbreviate M₁ ... Mₙ by M⃗, and λ_{x₁} ... λ_{xₙ} M by
λ_{x⃗} M. By nf(M) we denote the long normal form of M, i.e., the (unique)
term in long normal form βη-equal to M.
Our goal is to define a normalization function that
(i) first evaluates a term M in a suitable (denotational) model to some
object, say a, and then
(ii) converts a back into a term which is the long normal form of M .
We take terms of base type as base type objects, and all functions as
possible function type objects:

    [[ι]] := Λ_ι,   [[ρ → σ]] := [[σ]]^{[[ρ]]} (the full function space).

It is crucial that all terms (of base type) are present, not just the closed
ones.
Next we need an assignment ↑ lifting a variable to an object, and a
function ↓ giving us a normal term from an object. They should meet
the following condition, to be called "correctness of normalization by
evaluation":

    ↓([[M]]↑) = nf(M),

where [[M]]↑ ∈ [[ρ]] denotes the value of M^ρ under the assignment ↑.

Two such functions ↓ and ↑ can be defined simultaneously, by induction
on the type. It is convenient to define ↑ on all terms (not just on variables).
Define ↓_ρ : [[ρ]] → Λ_ρ and ↑_ρ : Λ_ρ → [[ρ]] (called reify and reflect) by

    ↓_ι(M) := M,   ↑_ι(M) := M,
    ↓_{ρ→σ}(a) := λx(↓_σ(a(↑_ρ(x)))) (x "new"),   ↑_{ρ→σ}(M)(a) := ↑_σ(M ↓_ρ(a)).

x "new" is not a problem for an implementation, where we have an
operational understanding and may use something like gensym, but it is
for a mathematical model. We therefore refine our model by considering
term families.
Since normalization by evaluation needs to create bound variables when
“reifying” abstract objects of higher type, it is useful to follow de Bruijn’s
[1972] style of representing bound variables in terms. This is done here—
as in Berger and Schwichtenberg [1991], Filinski [1999]—by means of
term families. A term family is a parametrized version of a given term
M . The idea is that the term family of M at index k reproduces M with
bound variables renamed starting at k. For example, for
    M := λ_{u,v}(c λx(vx) λ_{y,z}(zu))

the associated term family M∞ at index 3 yields

    M∞(3) = λ_{x₃,x₄}(c λ_{x₅}(x₄x₅) λ_{x₅,x₆}(x₆x₃)).
We denote terms by M, N, K, . . . , and term families by r, s, t, . . . .
To every term M we assign a term family M∞ : N → Λ by

    x∞(k) := x,
    (λy M)∞(k) := λ_{x_k}(M[y := x_k]∞(k + 1)),
    (MN)∞(k) := M∞(k)N∞(k).

The application of a term family r : N → Λ_{ρ→σ} to s : N → Λ_ρ is the family
rs : N → Λ_σ defined by (rs)(k) := r(k)s(k). Hence, e.g., (MN)∞ =
M∞N∞. Let k > FV(M) mean that k is greater than all i such that
x_i^ρ ∈ FV(M) for some type ρ.
Lemma. (a) If M =α N, then M∞ = N∞.
(b) If k > FV(M), then M∞(k) =α M.

Proof. (a) Induction on the height |M| of M. Only the case where
M and N are abstractions is critical. So assume λy M =α λz N. Then
M[y := P] =α N[z := P] for all terms P. In particular M[y := x_k] =α
N[z := x_k] for arbitrary k ∈ N. Hence we have M[y := x_k]∞(k + 1) =
N[z := x_k]∞(k + 1), by induction hypothesis. Therefore

    (λy M)∞(k) = λ_{x_k}(M[y := x_k]∞(k + 1))
               = λ_{x_k}(N[z := x_k]∞(k + 1))
               = (λz N)∞(k).

(b) Induction on |M|. We only consider the case λy M. The assumption
k > FV(λy M) implies x_k ∉ FV(λy M), hence λy M =α λ_{x_k}(M[y := x_k]).
Furthermore k + 1 > FV(M[y := x_k]), hence M[y := x_k]∞(k + 1) =α
M[y := x_k], by induction hypothesis. Therefore

    (λy M)∞(k) = λ_{x_k}(M[y := x_k]∞(k + 1)) =α λ_{x_k}(M[y := x_k]) =α λy M.
Let ext(r) := r(k), where k is the least number greater than all i such
that some variable of the form xi occurs (free or bound) in r(0).
Lemma. ext(M ∞ ) =α M .
Proof. ext(M ∞ ) = M ∞ (k) for the least k > i for all i such that xi
occurs (free or bound) in M ∞ (0), hence k > FV(M ). Now use part (b)
of the lemma above.
We now aim at proving correctness of normalization by evaluation.
First we refine our model by allowing term families:

    [[ι]] := Λ_ι^N,   [[ρ → σ]] := [[σ]]^{[[ρ]]} (full function spaces).

For every type ρ we define two functions,

    ↓_ρ : [[ρ]] → (N → Λ_ρ) ("reify"),   ↑_ρ : (N → Λ_ρ) → [[ρ]] ("reflect"),

simultaneously, by induction on ρ:

    ↓_ι(r) := r,   ↑_ι(r) := r,
    ↓_{ρ→σ}(a)(k) := λ_{x_k}(↓_σ(a(↑_ρ(x_k∞)))(k + 1)),   ↑_{ρ→σ}(r)(b) := ↑_σ(r ↓_ρ(b)).

Then, for aᵢ ∈ [[ρᵢ]],

    ↑_{ρ⃗→ι}(r)(a₁, ..., aₙ) = ↑_ι(r ↓_{ρ₁}(a₁) ... ↓_{ρₙ}(aₙ)). (21)
Theorem (Correctness of normalization by evaluation). For terms M
in long normal form we have

    ↓([[M]]↑) = M∞,

where [[M]]↑ denotes the value of M in the environment given by ↑.

Proof. Induction on the height of M. Case λy M.

    ↓([[λy M]]↑)(k) = λ_{x_k}(↓([[λy M]]↑(↑(x_k∞)))(k + 1))
                    = λ_{x_k}(↓([[M[y := x_k]]]↑)(k + 1))
                    = λ_{x_k}(M[y := x_k]∞(k + 1))   by induction hypothesis
                    = (λy M)∞(k).

Case (x M⃗)^ι. By (21) and the induction hypothesis we obtain

    [[x M⃗]]↑ = ↑(x∞)([[M⃗]]↑) = x∞ ↓([[M⃗]]↑) = x∞ M⃗∞ = (x M⃗)∞.
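As an aside, the whole subsection condenses into a few lines of Haskell. The sketch below is ours, an untyped variant producing β-normal forms (the typed version above additionally η-expands at arrow types): semantic values are either functions or term families, and bound variables are created from the index k exactly as in M∞(k).

    -- Normalization by evaluation with term families; all names are ours.
    data Term = Var String | Lam String Term | App Term Term deriving Show

    data Sem = Fun (Sem -> Sem)     -- objects of arrow type
             | Syn (Int -> Term)    -- term families at base type

    -- reify ("down"): from a semantic value back to a term family.
    reify :: Sem -> Int -> Term
    reify (Syn r) k = r k
    reify (Fun f) k = Lam x (reify (f (Syn (\_ -> Var x))) (k + 1))
      where x = "x" ++ show k       -- bound variable named from index k

    -- Application in the model; on term families it is pointwise, as in (rs)(k).
    app :: Sem -> Sem -> Sem
    app (Fun f) a = f a
    app (Syn r) a = Syn (\k -> App (r k) (reify a k))

    -- Evaluation in an environment; unbound variables reflect to themselves.
    eval :: [(String, Sem)] -> Term -> Sem
    eval env (Var x)   = maybe (Syn (\_ -> Var x)) id (lookup x env)
    eval env (Lam x m) = Fun (\a -> eval ((x, a) : env) m)
    eval env (App m n) = app (eval env m) (eval env n)

    -- nbe = evaluate, then reify at index 0.
    nbe :: Term -> Term
    nbe m = reify (eval [] m) 0

For example, nbe (App (Lam "x" (Var "x")) (Lam "y" (Var "y"))) yields Lam "x0" (Var "x0"), the normal form with bound variables renamed from index 0.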
6.4. Computable functionals
We now study our abstract notion of computability in more detail. The
essential tool will be recursion, and in the proof of the Kleene recursion
theorem in 2.4.3 we have already seen that solutions to recursive definitions
can be obtained as least fixed points of certain higher type operators. This
approach can be carried over to recursion in a higher order setting by
means of least-fixed-point operators Y of type ( → ) → defined by
the computation rule
Y f = f(Y f).
[[Y]] has the property that (W, b) ∈ [[Y]] implies W^n∅ ⊢ b for some n. We
will prove this fact in 6.4.1, from the inductive definition of [[Y]].
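In a lazy language the computation rule can be executed literally. A sketch, with fix and approx as our illustrative names; approx k mirrors the finite iterates w^{k+1}∅ studied below:

    -- The computation rule Y f = f (Y f), executable verbatim.
    fix :: (a -> a) -> a
    fix f = f (fix f)

    -- Example: factorial as a least fixed point.
    fact :: Integer -> Integer
    fact = fix (\rec n -> if n == 0 then 1 else n * rec (n - 1))

    -- Finite stages of the fixed point: approx k f unfolds f k times and
    -- then gives no information, mirroring the iterates w^(k+1) ∅ of 6.4.1.
    approx :: Int -> ((a -> b) -> a -> b) -> a -> b
    approx 0 _ = \_ -> error "no information (the empty approximation)"
    approx k f = f (approx (k - 1) f)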
We need to consider some further continuous functionals, the parallel
conditional pcond of type B → N → N → N and a continuous ap-
proximation ∃ of type (N → B) → B to the existential quantifier. The
main result of this section is that a continuous functional is computable
if and only if it is “recursive in valmax, pcond and ∃”, where the latter
notion (defined below) refers to the fixed point operators. The func-
tion valmax : N → N → N is of an “administrative” character only:
valmax(x, S^n0) compares the ideal x ∈ |C_N| with the ideal generated by
the n-th token aₙ:

    valmax(x, S^n0) = x     if aₙ ∈ x,
                      {aₙ}  otherwise.
The denotation of the constants valmax, pcond and ∃ will be defined
in a “point-free” way, i.e., not referring to “points” or ideals but rather
to their (finite) approximations. This is done by adding some rules to the
inductive definition of (U⃗, a) ∈ [[λ_{x⃗} M]] in 6.2.6.
It will be necessary to refer to a given enumeration (aₙ)_{n∈N} of the tokens
in our underlying information system C_N. To simplify notation somewhat
we shall, throughout this section, identify an index n with its token S^n0.
It will usually be clear from the context which "n" is intended. We shall use
n̄ to denote the ideal generated by S^n0.
6.4.1. Fixed point operators. Recall that the fixed point operators Y_ρ
were defined by the computation rule Yf = f(Yf).

Proposition. For every n > 0, there is a derivation of (W, b) ∈ [[Y]] with
D-height n if and only if W^n∅ ⊢ b.
Proof. Every derivation of (W, b) ∈ [[Y]] must end with an application
of the rule (D) for the computation rule Yf = f(Yf), concluding
(W, b) ∈ [[Y]] from (Ŵ, b) ∈ [[λf(f(Yf))]] with W ⊇ Ŵ. This premise in
turn has been obtained from (Ŵ, V, b) ∈ [[λf f]], i.e., Ŵ ⊢ (V, b), and
(Ŵ, bᵢ) ∈ [[λf(Yf)]] for each bᵢ ∈ V; the latter come from (Ŵ, Wᵢ, bᵢ) ∈
[[λf Y]] together with (Ŵ, Vᵢⱼ, bᵢⱼ) ∈ [[λf f]], i.e., Ŵ ⊢ (Vᵢⱼ, bᵢⱼ), where
V := {bᵢ | i ∈ I}, Wᵢ := {(Vᵢⱼ, bᵢⱼ) | j ∈ Iᵢ}.

"→". By induction on the D-height. We have (Ŵ, Wᵢ, bᵢ) ∈ [[λf Y]],
Ŵ ⊢ Wᵢ and Ŵ ⊢ (V, b). By induction hypothesis Wᵢ^{nᵢ}∅ ⊢ bᵢ, and
Ŵ^{nᵢ}∅ ⊇ Wᵢ^{nᵢ}∅ by monotonicity of application. Because of Ŵ^{n+1}∅ ⊇ Ŵ^n∅
(proved by induction on n, using monotonicity) we obtain Ŵ^n∅ ⊢ bᵢ with
n := max nᵢ, i.e., Ŵ^n∅ ⊢ V. Recall that Ŵ ⊢ (V, b) was defined to mean
Ŵ V ⊢ b. Hence Ŵ(Ŵ^n∅) ⊢ b and therefore W^{n+1}∅ ⊢ b.

"←". By induction on n. Let W(W^n∅) ⊢ b, i.e., W ⊢ (V, b) with
V := W^n∅ =: {bᵢ | i ∈ I}. Then W^n∅ ⊢ bᵢ, hence by induction
hypothesis (W, bᵢ) ∈ [[Y]]. Substituting W for Ŵ and all Wᵢ in the
derivation above gives the claim (W, b) ∈ [[Y]].
Corollary. The fixed point operator Y has the property

    b ∈ [[Y]]w ↔ ∃k (b ∈ w^{k+1}∅). (22)

Proof. Since w^{k+1}∅ for fixed k is continuous in w, from b ∈ w^{k+1}∅
we can infer W^{k+1}∅ ⊢ b for some W ⊆ w, and conversely. Moreover
b ∈ [[Y]]w is equivalent to (W, b) ∈ [[Y]] for some W ⊆ w, by (A). Now
apply the proposition.
6.4.2. Rules for pcond, ∃ and valmax. For pcond we have the rules

    (P₁)  from U ⊢ tt and V ⊢ a infer (U⃗, U, V, W, a) ∈ [[λx⃗ pcond]],
    (P₂)  from U ⊢ ff and W ⊢ a infer (U⃗, U, V, W, a) ∈ [[λx⃗ pcond]],
    (P₃)  from V ⊢ a and W ⊢ a infer (U⃗, U, V, W, a) ∈ [[λx⃗ pcond]],

and for ∃

    (E₁)  from U ⊢ (S^n∗, ff) and U ⊢ (S^i0, ff) (all i < n) infer (U⃗, U, ff) ∈ [[λx⃗ ∃]],
    (E₂)  from U ⊢ ({S^n0}, tt) infer (U⃗, U, tt) ∈ [[λx⃗ ∃]].

The rules for valmax are

    (M₁)  from U ⊢ aₙ and U ⊢ a infer (U⃗, U, {S^n0}, a) ∈ [[λx⃗ valmax]],
    (M₂)  from {aₙ} ⊢ a infer (U⃗, U, {S^n0}, a) ∈ [[λx⃗ valmax]].
One can check easily that the lemmata proved in 6.2.6 and 6.2.7 continue
to hold for the extended set of rules. Moreover one can prove easily that
pcond, ∃ and valmax denote the intended (continuous) functionals:

Lemma (Properties of pcond, ∃ and valmax).

    [[pcond]](z, x, y) = x      if z = [[tt]],
                         y      if z = [[ff]],
                         x ∩ y  if z = ∅,

    [[∃]](x) = [[ff]]  if (∅, ff) ∈ x,
               [[tt]]  if ({S∗}, tt) ∈ x or ({0}, tt) ∈ x,

    [[valmax]](x, y) = x     if S^n0 ∈ y and aₙ ∈ x,
                       {aₙ}  if S^n0 ∈ y and aₙ ∉ x.
Note that an n with S^n0 ∈ y is uniquely determined if it exists. Note also
that for an algebra with at most unary constructors any two consistent
ideals x, y ∈ |C_ι| are comparable, i.e., x ⊆ y or y ⊆ x. (A counterexample
for an algebra with a binary constructor C and a nullary 0 is
{C∗0} and {C0∗}: they are consistent, but incomparable.) Hence, if the
token aₙ is consistent with the ideal x, then

    [[valmax]](x, S^n0) = {aₙ} ∪ x. (23)
This will be needed below. From pcond we can explicitly define the parallel
or of type B → B → B by ∨(p, q) := pcond(p, tt, q). Then

    [[∨]](x, y) = [[tt]]  if x = [[tt]] or y = [[tt]],
                  [[ff]]  if x = y = [[ff]].
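On finite approximations the defining equations of the lemma become an ordinary program. A sketch, ours, with Nothing playing the role of the empty ideal ∅ on flat domains (this only illustrates the equations; genuine ideals at higher types are infinite objects):

    -- pcond on finite approximations: if the boolean is still unknown (∅),
    -- return the common information of the two branches (their intersection).
    pcond :: Eq a => Maybe Bool -> Maybe a -> Maybe a -> Maybe a
    pcond (Just True)  x _ = x
    pcond (Just False) _ y = y
    pcond Nothing      x y = if x == y then x else Nothing  -- x ∩ y on flat domains

    -- Parallel or, defined from pcond as in the text: ∨(p, q) = pcond(p, tt, q).
    por :: Maybe Bool -> Maybe Bool -> Maybe Bool
    por p q = pcond p (Just True) q

Note that por (Just True) Nothing = Just True even though the second argument carries no information yet; a sequential "or" cannot do this.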
6.4.3. Plotkin’s definability theorem.
Definition. A partial continuous functional Φ of type ρ₁ → ··· →
ρ_p → N is said to be recursive in valmax, pcond and ∃ if it can be defined
explicitly by a term involving the constructors 0, S and the constants
predecessor, the fixed point operators Y_ρ, valmax, pcond and ∃.
Theorem (Plotkin). A partial continuous functional is computable if and
only if it is recursive in valmax, pcond and ∃.
Proof. The fact that the constants are defined by the rules above im-
plies that the ideals they denote are recursively enumerable. Hence every
functional recursive in valmax, pcond and ∃ is computable.
For the converse let Φ be computable of type ρ₁ → ··· → ρ_p → N.
Then Φ is a primitive recursively enumerable set of tokens

    Φ = {(U_{f₁n}, ..., U_{f_pn}, a_{gn}) | n ∈ N},

where for each type ρⱼ, (Uᵢ^{ρⱼ})_{i∈N} is an enumeration of Con_{ρⱼ}, and f₁, ...,
f_p and g are fixed primitive recursive functions. Henceforth we will drop
the superscripts from the U's. For each such function f let f̄ denote a
strict continuous extension of f to ideals, such that f̄ applied to n̄ yields
the ideal generated by fn, and f̄∅ = ∅.

Let φ⃗ = φ₁, ..., φ_p be arbitrary continuous functionals of types ρ₁, ...,
ρ_p respectively. We show that Φ is definable by the equation

    Φφ⃗ = Y w_{φ⃗} 0̄

with w_{φ⃗} of type (N → N) → N → N given by

    w_{φ⃗} ϑ x := pcond(incons₁(φ₁, f̄₁x) ∨ ··· ∨ incons_p(φ_p, f̄_px),
                       ϑ(x + 1), valmax(ϑ(x + 1), ḡx)).

Here the inconsᵢ's of type ρᵢ → N → B are continuous functionals such
that

    incons(φ, n̄) = {tt}  if φ ∪ Uₙ is inconsistent,
                   {ff}  if φ ⊇ Uₙ.

We will prove in the lemma below that there are such functionals recursive
in valmax, pcond and ∃; their definition will involve the functional ∃.
For notational simplicity we assume p = 1 in the argument to follow,
and write w for w_φ. We first prove that

    ∀n (a ∈ w^{k+1}∅n̄ → ∃_{n≤l≤n+k} (φ ⊇ U_{fl} ∧ {a_{gl}} ⊢ a)).

The proof is by induction on k. For the base case assume a ∈ w∅n̄, i.e.,

    a ∈ pcond(incons(φ, f̄n̄), ∅, valmax(∅, ḡn̄)).

Then clearly φ ⊇ U_{fn} and {a_{gn}} ⊢ a. For the step k → k + 1 we have

    a ∈ w^{k+2}∅n̄ = w(w^{k+1}∅)n̄ = pcond(incons(φ, f̄n̄), v, valmax(v, ḡn̄))

with v := w^{k+1}∅(n + 1). Then either a ∈ v or else φ ⊇ U_{fn} and {a_{gn}} ⊢ a,
and hence the claim follows from the induction hypothesis.

Now Φφ ⊇ Yw0̄ follows easily. Assume a ∈ Yw0̄. Then a ∈ w^{k+1}∅0̄
for some k, by the proposition in 6.4.1. Therefore there is an l with
0 ≤ l ≤ k such that φ ⊇ U_{fl} and {a_{gl}} ⊢ a. But this implies a ∈ Φφ.

For the converse assume a ∈ Φφ. Then for some U ⊆ φ we have
(U, a) ∈ Φ. By our assumption on Φ this means that we have an n such
that U = U_{fn} and a = a_{gn}. We show

    a ∈ w^{k+1}∅(n − k) for k ≤ n.

The proof is by induction on k. For the base case k = 0, because of φ ⊇
U_{fn} we have incons(φ, f̄n̄) = {ff} and hence wϑn̄ = valmax(ϑ(n + 1),
ḡn̄) ∋ a_{gn} = a for any ϑ. For the step k → k + 1, by definition of w
(:= w_φ),

    v′ := w^{k+2}∅(n − k − 1)
        = w(w^{k+1}∅)(n − k − 1)
        = pcond(incons(φ, f̄(n − k − 1)), v, valmax(v, ḡ(n − k − 1)))

with v := w^{k+1}∅(n − k). By induction hypothesis a ∈ v; we show a ∈ v′.
If a and a_{g(n−k−1)} are inconsistent, then a ∈ Φφ and (U_{f(n−k−1)}, a_{g(n−k−1)}) ∈
Φ imply that φ ∪ U_{f(n−k−1)} is inconsistent, hence incons(φ, f̄(n − k − 1))
= {tt} and therefore v′ = v. Now assume that a and a_{g(n−k−1)} are consistent.
Since our underlying algebra C_N has at most unary constructors
it follows that a and a_{g(n−k−1)} are comparable. In case {a_{g(n−k−1)}} ⊢ a
we have valmax(v, ḡ(n − k − 1)) ⊇ {a_{g(n−k−1)}} ⊢ a, and hence a ∈ v′
because a ∈ v. In case {a} ⊢ a_{g(n−k−1)} we have a_{g(n−k−1)} ∈ v because
a ∈ v, hence valmax(v, ḡ(n − k − 1)) = v and therefore again a ∈ v′.

Now the converse inclusion Φφ ⊆ Yw_φ0̄ can be seen easily. Assume
a ∈ Φφ. The claim just proved for k := n gives a ∈ w_φ^{n+1}∅0̄, and this
implies a ∈ Yw_φ0̄.
Lemma. There are functionals en_ρ of type N → N → ρ and incons_ρ of
type ρ → N → B, both recursive in valmax, pcond and ∃, such that

(a) en_ρ(m̄) enumerates all finitely generated extensions of U_m, thus

    en_ρ(m̄, ∅) = Ū_m,
    en_ρ(m̄, n̄) = Ūₙ  if Uₙ ⊇ U_m;

(b)

    incons_ρ(φ, n̄) = {tt}  if φ ∪ Uₙ is inconsistent,
                     {ff}  if φ ⊇ Uₙ.
Proof. By induction on ρ.

(a) We first prove that there is a functional en_ρ recursive in valmax,
pcond and ∃ with the properties stated. For its definition we need to look
in more detail into the definition of the sets U_m of type ρ.

For any type ρ, fix an enumeration (Uₙ^ρ)_{n∈N} of Con_ρ such that U₀ = ∅
and the following relations are primitive recursive:

    Uₙ ⊆ U_m,
    Uₙ ∪ U_m ∈ Con_ρ,
    Uₙ^{ρ→σ} U_m^ρ = U_k^σ,
    Uₙ ∪ U_m = U_k  (with k = 0 if Uₙ ∪ U_m ∉ Con_ρ).

We also assume an enumeration (bᵢ)_{i∈N} of the set of tokens of type ρ.

Recall that any primitive recursive function f can be lifted to a continuous
functional f̄ of type N → ··· → N → N. It is easy to see that
any primitive recursive function can be represented in this way by a term
involving 0, successor, predecessor, the least-fixed-point operator Y_{N→N}
and the cases operator C. For instance, addition can be written as

    m + n = Y_{N→N}(λφ λx [if x = 0 then m else φ(x − 1) + 1 fi])n.
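Transcribed into Haskell the displayed equation reads as follows (a sketch; fix as in 6.4.1):

    -- Addition as a least fixed point, following the displayed equation.
    fix :: (a -> a) -> a
    fix f = f (fix f)

    plus :: Integer -> Integer -> Integer
    plus m n = fix (\phi x -> if x == 0 then m else phi (x - 1) + 1) n
    -- e.g. plus 2 3 == 5, computed by repeated unfolding of the fixed point.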
Let ρ = ρ₁ → ··· → ρ_p → N and let j, k and h be primitive recursive
functions such that

    U_m = {(U_{j(m,1,l)}, ..., U_{j(m,p,l)}, a_{k(m,l)}) | l < hm}.
en_ρ will be defined from an auxiliary functional Ψ of type ρ₁ → ··· →
ρ_p → N → N → N → N so that

    Ψ(φ⃗, m, d, 0) := d,
    Ψ(φ⃗, m, d, l + 1) := pcond(p_l, Ψ(φ⃗, m, d, l), valmax(Ψ(φ⃗, m, d, l), k(m, l))),

where p_l denotes incons₁(φ₁, j(m, 1, l)) ∨ ··· ∨ incons_p(φ_p, j(m, p, l)).
Hence

    p_l = {tt}  if φᵢ ∪ U_{j(m,i,l)} is inconsistent for some i = 1, ..., p,
          {ff}  if φᵢ ⊇ U_{j(m,i,l)} for all i = 1, ..., p,
          ∅     otherwise.
Let

    U_m⁰ := ∅,   U_m^{l+1} := U_m^l ∪ {(U_{j(m,1,l)}, ..., U_{j(m,p,l)}, a_{k(m,l)})}.

We first show that

    Ψ(φ⃗, m, ∅, l) = U_m^l(φ⃗). (24)

This is proved by induction on l. For l = 0 both sides are ∅. In the step
l → l + 1 we distinguish three cases according to the possible values {tt},
{ff} and ∅ of p_l.

Case p_l = {tt}. By definition of Ψ, the induction hypothesis and the
fact that p_l = {tt} implies that φᵢ ∪ U_{j(m,i,l)} is inconsistent for some i = 1, ..., p,
we obtain

    Ψ(φ⃗, m, ∅, l + 1) = Ψ(φ⃗, m, ∅, l) = U_m^l(φ⃗) = U_m^{l+1}(φ⃗).

Case p_l = {ff}. Then φᵢ ⊇ U_{j(m,i,l)} for all i = 1, ..., p. Now the consistency
of U_m^{l+1} implies that U_m^l(φ⃗) ∪ {a_{k(m,l)}} is consistent and therefore
by (23)

    valmax(U_m^l(φ⃗), k(m, l)) = {a_{k(m,l)}} ∪ U_m^l(φ⃗) = U_m^{l+1}(φ⃗).

Hence the claim, by definition of Ψ and the induction hypothesis.

Case p_l = ∅. Then by definition of Ψ and the (rule-based) definition of
valmax

    Ψ(φ⃗, m, ∅, l + 1) = Ψ(φ⃗, m, ∅, l)

(both ideals consist of the same tokens). Moreover U_m^{l+1}(φ⃗) = U_m^l(φ⃗),
by definition of p_l. This completes the proof of (24).

Next we show

    Ψ(φ⃗, m, d, l) = d for d ⊇ U_m(φ⃗). (25)

The proof is by induction on l. For l = 0 we have Ψ(φ⃗, m, d, 0) = d by
definition. In the step l → l + 1 we again distinguish cases according to
the possible values of p_l. In case p_l = {tt} we know that φᵢ ∪ U_{j(m,i,l)}
is inconsistent for some i = 1, ..., p, hence we have Ψ(φ⃗, m, d, l + 1) =
Ψ(φ⃗, m, d, l) = d by induction hypothesis. In case p_l = {ff} we know
a_{k(m,l)} ∈ U_m(φ⃗) ⊆ d. Hence the claim follows from the induction
hypothesis and the property (23) of valmax. In case p_l = ∅ we have
Ψ(φ⃗, m, d, l + 1) = Ψ(φ⃗, m, d, l) by definition of Ψ and the definition
of valmax, and the claim follows from the induction hypothesis. This
completes the proof of (25).

We can now proceed with the proof of (a). Define

    Φ(φ⃗, x, d) := Ψ(φ⃗, x, d, h̄x),
    en(x, y, φ⃗) := Φ(φ⃗, x, Φ(φ⃗, y, ∅)).
Recall that Φ(φ⃗, m, ∅) = U_m(φ⃗) by (24). The first property of en is now
obvious, since

    en(m, ∅, φ⃗) = Φ(φ⃗, m, Φ(φ⃗, ∅, ∅)) = Φ(φ⃗, m, ∅) = U_m(φ⃗).

For the second property let Uₙ ⊇ U_m and φ⃗ be given, and d := Uₙ(φ⃗).
Then by definition en(m, n, φ⃗) = Φ(φ⃗, m, d), and Φ(φ⃗, m, d) = d follows
from (25).
(b) Let ρ = σ → τ and f, g be primitive recursive functions such that
the i-th token at type ρ is aᵢ = (U_{fi}, a_{gi}). We will define incons_ρ from
similar functionals [ic]_ρ of type ρ → N → B with the property

    [ic](φ, ī) = {tt}  if φ ∪ {aᵢ} is inconsistent,
                 {ff}  if φ ⊇ {aᵢ}.

Note that by monotonicity this implies

    tt ∈ [ic](φ, ī) ↔ φ ∪ {aᵢ} is inconsistent,
    ff ∈ [ic](φ, ī) ↔ φ ⊇ {aᵢ}.

To see that there are such [ic]'s recursive in valmax, pcond and ∃ observe
that the following are equivalent:

    [ic]_ρ(φ, ī) = {tt},
    φ ∪ {aᵢ} is inconsistent,
    φ ∪ {(U_{fi}, a_{gi})} is inconsistent,
    ∃n (Uₙ ⊇ U_{fi} and φ(Ūₙ) ∪ {a_{gi}} is inconsistent),
    ∃n (φ(en_σ(f̄i, n̄)) ∪ {a_{gi}} is inconsistent),
    ∃n ([ic]_τ(φ(en_σ(f̄i, n̄)), ḡi) = {tt}),

and also the following are equivalent:

    [ic]_ρ(φ, ī) = {ff},
    aᵢ ∈ φ,
    (U_{fi}, a_{gi}) ∈ φ,
    a_{gi} ∈ φ(Ū_{fi}),
    [ic]_τ(φ(en_σ(f̄i, ∅)), ḡi) = {ff}.

Hence we can define

    [ic]_ρ(φ, x) := ∃(λz [ic]_τ(φ(en_σ(f̄x, z)), ḡx)).

We still have to define incons_ρ from [ic]_ρ. Let

    [ic]*(φ, x, 0) := ff,
    [ic]*(φ, x, y + 1) := [ic]*(φ, x, y) ∨ [ic](φ, j̄(x, y)),
where j(n, l) is defined by Uₙ = {a_{j(n,l)} | l < hn}. It is now easy to see
that incons_ρ with the properties required above can be defined by

    incons_ρ(φ, x) := [ic]*(φ, x, h̄x).

Note that we need the coherence of Con_ρ here. Note also that we do need
the parallel or in the definition of [ic]*.
6.5. Total functionals
We now single out the total continuous functionals from the partial
ones. Our main goal will be the density theorem, which says that every
finite functional can be extended to a total one.
6.5.1. Total and structure-total ideals. The total and structure-total
ideals in the information system C_ι of a finitary algebra ι have been
defined in 6.1.7. We now extend this definition to arbitrary types.

Definition. The total ideals of type τ are defined by induction on τ.

(a) Case ι. For an algebra ι, we inductively define when an ideal is total.
Recall that any ideal x of type ι has the form C̄ y⃗^P y⃗^R (where C̄ denotes
the continuous function |r_C|). x is total if the y⃗^P are total, and for every
y_p^R the following holds. Let σ⃗_p → ι be the type of y_p^R. Then for all
total z⃗ of types σ⃗_p the result |y_p^R|(z⃗) of applying y_p^R to z⃗ must be total.

(b) Case ρ → σ. An ideal r of type ρ → σ is total if and only if for all
total z of type ρ, the result |r|(z) of applying r to z is total.

The structure-total ideals are defined similarly; the difference is that in
case ι the ideals at parameter positions of C need not be total. We write
x ∈ G_τ to mean that x is a total ideal of type τ.
Remark. Note that in the arrow case of the definition of totality, we
have made use of the universal quantifier "for all total z of type ρ" with
an implication in its kernel. So using the concept of a total computable
functional to explain the meaning of the logical connectives, as is done
in the Brouwer–Heyting–Kolmogorov interpretation (see 7.1.1), is in this
sense somewhat circular.
6.5.2. Equality for total functionals.

Definition. An equivalence ∼_τ between total ideals x₁, x₂ ∈ G_τ is
defined by induction on τ.

(a) Case ι. For an algebra ι, we inductively define when two total ideals
x₁, x₂ are equivalent. This is the case if both are of the form C̄ y⃗ᵢ^P y⃗ᵢ^R
with the same constructor C of ι, we have y⃗₁^P ∼ y⃗₂^P, and y_{1p}^R z⃗ ∼ y_{2p}^R z⃗
for all total z⃗ of types σ⃗_p and all p.

(b) Case ρ → σ. For f, g ∈ G_{ρ→σ} define f ∼_{ρ→σ} g by ∀_{x∈G_ρ}(fx ∼_σ gx).

Clearly ∼_τ is an equivalence relation. Similarly, we can define an equivalence
relation ≈_τ between structure-total ideals x₁, x₂.
We obviously want to know that ∼_τ (and similarly ≈_τ) is compatible
with application; we only treat ∼_τ here. The non-trivial part of this
argument is to show that x ∼ y implies fx ∼ fy. First we need some
lemmata. Recall that our partial continuous functionals are ideals (i.e.,
certain sets of tokens) in the information systems C_ρ.
Lemma (Extension). If f ∈ G_ρ, g ∈ |C_ρ| and f ⊆ g, then g ∈ G_ρ.

Proof. By induction on ρ. For base types use induction on the definition
of f ∈ G_ι. Case ρ → σ: Assume f ∈ G_{ρ→σ} and f ⊆ g. We must show
g ∈ G_{ρ→σ}. So let x ∈ G_ρ. We have to show gx ∈ G_σ. But gx ⊇ fx ∈ G_σ,
so the claim follows by induction hypothesis.
Lemma. (f₁ ∩ f₂)x = f₁x ∩ f₂x, for f₁, f₂ ∈ |C_{ρ→σ}| and x ∈ |C_ρ|.

Proof. By the definition of |r|,

    |f₁ ∩ f₂|x = {b ∈ C_σ | ∃_{U⊆x}((U, b) ∈ f₁ ∩ f₂)}
               = {b ∈ C_σ | ∃_{U₁⊆x}((U₁, b) ∈ f₁)} ∩ {b ∈ C_σ | ∃_{U₂⊆x}((U₂, b) ∈ f₂)}
               = |f₁|x ∩ |f₂|x.

The part ⊆ of the middle equality is obvious. For ⊇, let Uᵢ ⊆ x with
(Uᵢ, b) ∈ fᵢ be given. Choose U := U₁ ∪ U₂. Then clearly (U, b) ∈ fᵢ (as
{(Uᵢ, b)} ⊢ (U, b) and fᵢ is deductively closed).
Lemma. f ∼_ρ g if and only if f ∩ g ∈ G_ρ, for f, g ∈ G_ρ.

Proof. By induction on ρ. For ι use induction on the definitions of
f ∼_ι g and G_ι. Case ρ → σ:

    f ∼_{ρ→σ} g ↔ ∀_{x∈G_ρ}(fx ∼_σ gx)
                ↔ ∀_{x∈G_ρ}(fx ∩ gx ∈ G_σ)   by induction hypothesis
                ↔ ∀_{x∈G_ρ}((f ∩ g)x ∈ G_σ)  by the last lemma
                ↔ f ∩ g ∈ G_{ρ→σ}.

Theorem. x ∼_ρ y implies fx ∼_σ fy, for x, y ∈ G_ρ and f ∈ G_{ρ→σ}.

Proof. Since x ∼ y we have x ∩ y ∈ G_ρ by the previous lemma. Now
fx, fy ⊇ f(x ∩ y) ∈ G_σ and hence fx ∩ fy ∈ G_σ. But this implies
fx ∼ fy, again by the previous lemma.
6.5.3. Dense and separating sets. We prove the density theorem, which
says that every finitely generated functional (i.e., every Ū with U ∈ Con_ρ)
can be extended to a total one. Notice that we need to know here that
the base types have nullary constructors, as required in 6.1.4. Otherwise,
density might fail for the trivial reason that there are no total ideals at all
(e.g., in (ι → ι′)).
Definition. A type ρ is called dense if

    ∀_{U∈Con_ρ}∃_{x∈G_ρ}(U ⊆ x),

and separating if

    ∀_{U,V∈Con_ρ}(¬(U ↑ V) → ∃_{x∈G_{ρ→B}}((U, tt) ∈ x ∧ (V, ff) ∈ x)).

We prove that every type is both dense and separating. This extended
claim is needed for the inductive argument.

Recall the definition (given in 6.1.5) of the height |a*| of an extended
token a*, and |U| of a formal neighborhood U, by

    |Ca₁* ... aₙ*| := 1 + max{|aᵢ*| | i = 1, ..., n},   |∗| := 0,
    |(U, b)| := max{1 + |U|, 1 + |b|},
    |{aᵢ | i ∈ I}| := max{1 + |aᵢ| | i ∈ I}.

Remark. Let U ∈ Con_ι be non-empty. Then every token in U starts
with the same constructor C. Let Uᵢ consist of all tokens at the i-th
argument position of some token in U. Then CU⃗ ⊢ U (and also U ⊢
CU⃗), and |Uᵢ| < |U|. (Recall

    CU⃗ := {Ca⃗* | aᵢ* ∈ Uᵢ if Uᵢ ≠ ∅, and aᵢ* = ∗ otherwise}

was defined in 6.2.6, in the proof of (16).)
Theorem (Density). For all U, V ∈ Con_ρ:
(a) ∃_{x∈G_ρ}(U ⊆ x), and
(b) ¬(U ↑ V) → ∃_{x∈G_{ρ→B}}((U, tt) ∈ x ∧ (V, ff) ∈ x).
Moreover, the required x ∈ G can be chosen to be Σ⁰₁-definable in both cases.

Proof. The proof is by induction on max{|U|, |V|}, using a case
distinction on the form of the type ρ.

Case ι. For U = ∅ both claims are easy. Notice that for (a) we need
that every base type has a total ideal. Now assume that U ∈ Con_ι is
non-empty. Define the Uᵢ from U as in the remark above; then CU⃗ ⊢ U.

(a) By induction hypothesis (a) there are xᵢ ∈ G such that Uᵢ ⊆ xᵢ.
Then for x := |r_C| x⃗ ∈ G we have U ⊆ x, since CU⃗ ⊆ x and CU⃗ ⊢ U.

(b) Assume ¬(U ↑ V). We need z ∈ G_{ι→B} such that (U, tt), (V, ff) ∈ z.
Define the Vᵢ from V as in the remark above; then C′V⃗ ⊢ V. If C = C′,
we have ¬(Uᵢ ↑ Vᵢ) for some i. The induction hypothesis (b) for Uᵢ, Vᵢ yields
z′ ∈ G_{σᵢ→B} such that (Uᵢ, tt), (Vᵢ, ff) ∈ z′. Define p ∈ G_{ι→σᵢ} by the
computation rules p(C x⃗) = xᵢ and p(C″ y⃗) = y for every constructor
C″ ≠ C, with a fixed y ∈ G_{σᵢ}. Let z := z′ ∘ p. Then z ∈ G_{ι→B}, and
(U, tt) ∈ z because of CU⃗ ⊢ U, (CU⃗, Uᵢ) ⊆ p and (Uᵢ, tt) ∈ z′; similarly
(V, ff) ∈ z. If C ≠ C′, define z ∈ G_{ι→B} by z(C x⃗) = tt and z(C″ y⃗) = ff for
all constructors C″ ≠ C. Then clearly (U, tt), (V, ff) ∈ z.
Case ρ → σ. (b) Let W₁, W₂ ∈ Con_{ρ→σ} and assume ¬(W₁ ↑ W₂). Then
there are (Uᵢ, aᵢ) ∈ Wᵢ (i = 1, 2) with U₁ ↑ U₂ but ¬(a₁ ↑ a₂). Because of
|U₁ ∪ U₂| < max{|W₁|, |W₂|}, by induction hypothesis (a) we have x ∈ G_ρ
such that U₁ ∪ U₂ ⊆ x. By induction hypothesis (b) we have v ∈ G_{σ→B}
such that ({a₁}, tt), ({a₂}, ff) ∈ v. We need z ∈ G_{(ρ→σ)→B} such that
(W₁, tt), (W₂, ff) ∈ z. It suffices to have ({(U₁, a₁)}, tt), ({(U₂, a₂)}, ff) ∈
z. Define z by zy := v(yx) (with v, x fixed as above). Then clearly
z ∈ G_{(ρ→σ)→B}. Since z{(Uᵢ, aᵢ)} = v({(Uᵢ, aᵢ)}x) ⊇ v({aᵢ}) and
({a₁}, tt), ({a₂}, ff) ∈ v we obtain ({(U₁, a₁)}, tt), ({(U₂, a₂)}, ff) ∈ z.
(a) Fix W = {(Uᵢ, aᵢ) | i ∈ I} ∈ Con_{ρ→σ} with I := {0, ..., n − 1}.
Consider i < j such that ¬(aᵢ ↑ aⱼ). Then ¬(Uᵢ ↑ Uⱼ). By induction
hypothesis (b) there are z_{ij} ∈ G_{ρ→B} such that (Uᵢ, tt), (Uⱼ, ff) ∈ z_{ij}. Define
for every U ∈ Con_ρ a set I_U of indices i ∈ I such that "U behaves as Uᵢ
with respect to the z_{ij}". More precisely, let

    I_U := {k ∈ I | ∀_{i<k}(¬(aᵢ ↑ a_k) → (U, ff) ∈ z_{ik}) ∧
                    ∀_{j>k}(¬(a_k ↑ aⱼ) → (U, tt) ∈ z_{kj})}.

Notice that k ∈ I_{U_k}. We first show

    V_U := {a_k | k ∈ I_U} ∈ Con_σ.

It suffices to prove that aᵢ ↑ aⱼ for all i, j ∈ I_U. Since aᵢ ↑ aⱼ is decidable
we can argue indirectly. So let i, j ∈ I_U with i < j and assume that
¬(aᵢ ↑ aⱼ). Then (U, ff), (U, tt) ∈ z_{ij} and hence z_{ij} would be inconsistent.
This contradiction proves aᵢ ↑ aⱼ and hence V_U ∈ Con_σ.

By induction hypothesis (a) we can find y_{V_U} ∈ G_σ such that V_U ⊆ y_{V_U}.
Let f ⊆ Con_ρ × C_σ consist of all (U, a) such that

    (a ∈ y_{V_U} ∧ ∀_{i<j<n}(¬(aᵢ ↑ aⱼ) → (U, tt) ∈ z_{ij} ∨ (U, ff) ∈ z_{ij})) ∨ V_U ⊢ a, (26)

which is a Σ⁰₁-formula. We will show f ∈ G_{ρ→σ} and W ⊆ f.

For W ⊆ f we show (Uᵢ, aᵢ) ∈ f for all i ∈ I. But this holds, since
i ∈ I_{U_i}, hence aᵢ ∈ V_{U_i}.
We now show f ∈ |C_{ρ→σ}|. To prove this we verify the defining properties
of approximable maps (cf. 6.1.3).

First we show that (U, a) ∈ f and (U, b) ∈ f imply a ↑ b. But from the
premises we obtain a, b ∈ y_{V_U} and hence a ↑ b.

Next we show that (U, b₁), ..., (U, bₙ) ∈ f and {b₁, ..., bₙ} ⊢ b imply
(U, b) ∈ f. We argue by cases. If the left hand side of the disjunction in
(26) holds for one b_k, then {b₁, ..., bₙ} ⊆ y_{V_U}, hence b ∈ y_{V_U} and thus
(U, b) ∈ f. Otherwise V_U ⊢ {b₁, ..., bₙ} ⊢ b and therefore (U, b) ∈ f as
well.

Finally we show that (U, a) ∈ f and U′ ⊢ U imply (U′, a) ∈ f. We
again argue by cases. If the left hand side of the disjunction in (26) holds,
we have a ∈ y_{V_U}, and from U′ ⊢ U we obtain

    ∀_{i<j<n}(¬(aᵢ ↑ aⱼ) → (U′, tt) ∈ z_{ij} ∨ (U′, ff) ∈ z_{ij}).

We show a ∈ y_{V_{U′}}. We have I_{U′} = I_U, hence V_{U′} = V_U, hence y_{V_{U′}} =
y_{V_U}. Now assume V_U ⊢ a. Because of U′ ⊢ U we have I_U ⊆ I_{U′}, hence
V_U ⊆ V_{U′}, hence V_{U′} ⊢ a, and therefore (U′, a) ∈ f in both cases.

It remains to prove f ∈ G_{ρ→σ}. Let x ∈ G_ρ. We show fx ∈ G_σ, i.e.,

    {a ∈ C_σ | ∃_{U⊆x}((U, a) ∈ f)} ∈ G_σ.

Recall z_{ij} ∈ G_{ρ→B} for all i < j with ¬(aᵢ ↑ aⱼ). Hence tt ∈ z_{ij}x or
ff ∈ z_{ij}x for all such i, j, and we have U_{ij} ⊆ x with (U_{ij}, tt) ∈ z_{ij} or
(U_{ij}, ff) ∈ z_{ij}. Hence ∀_{i<j<n}(¬(aᵢ ↑ aⱼ) → (U, tt) ∈ z_{ij} ∨ (U, ff) ∈ z_{ij}) holds
with U := ⋃ U_{ij}. Therefore (U, a) ∈ f for all a ∈ y_{V_U}, i.e., y_{V_U} ⊆ fx
and hence fx ∈ G_σ, by the first lemma in 6.5.2.
An easy consequence of the density theorem is a further characterization
of the equivalence between total ideals.
Corollary. x ∼_ρ y if and only if x ∪ y is consistent, for x, y ∈ G_ρ.

Proof. "→". We use induction on the definition of x ∼_ρ y, and only
treat the case where f, g ∈ G_{ρ→σ} and f ∼_{ρ→σ} g has been inferred from
∀_{x∈G_ρ}(fx ∼_σ gx). Let (U, a) ∈ f and (V, b) ∈ g and assume U ↑ V.
We must show a ↑ b. By the density theorem there is an x ∈ G_ρ with
U ∪ V ⊆ x. Hence a ∈ fx and b ∈ gx. By induction hypothesis fx ∪ gx
is consistent, and therefore a ↑ b.

"←". Let x ∪ y be consistent. Then z := x ∪ y is an ideal extending
both x, y ∈ G. Hence z is total as well. Moreover x ∩ z = x ∈ G and
y ∩ z = y ∈ G. By the last lemma in 6.5.2 we obtain x ∼ z ∼ y.
As a final application of the density theorem we prove a choice principle
for total continuous functionals.
Theorem (Choice principle for total functionals). There is an ideal Γ ∈
|C_{(ρ→σ→B)→ρ→σ}| such that for every F ∈ G_{ρ→σ→B} satisfying

    ∀_{x∈G_ρ}∃_{y∈G_σ}(F(x, y) ∋ tt)

we have Γ(F) ∈ G_{ρ→σ} and

    ∀_{x∈G_ρ}(F(x, Γ(F, x)) ∋ tt).

Proof. Let V₀, V₁, V₂, ... be an enumeration of Con_σ. By the density
theorem we can find yₙ ∈ G_σ such that Vₙ ⊆ yₙ. Define a relation
Γ ⊆ Con_{ρ→σ→B} × C_{ρ→σ} by

    Γ := {(W, U, a) | ∃m (W̄Ū yₘ ∋ tt ∧ a ∈ yₘ ∧ ∀_{i<m}(W̄Ū yᵢ ∋ ff))}.

We first show that Γ is an approximable map. To prove this we have to
verify the clauses of the definition of approximable maps.

(a) (W, U₁, a₁), (W, U₂, a₂) ∈ Γ imply (U₁, a₁) ↑ (U₂, a₂). Assume the
premise and U := U₁ ∪ U₂ ∈ Con_ρ. We show a₁ ↑ a₂. The numbers mᵢ in
the definition of (W, Uᵢ, aᵢ) ∈ Γ are the same, = m say. Hence a₁, a₂ ∈ yₘ,
and the claim follows from the consistency of yₘ.
(b) (W, U₁, a₁), ..., (W, Uₙ, aₙ) ∈ Γ and {(U₁, a₁), ..., (Uₙ, aₙ)} ⊢
(U, a) imply (W, U, a) ∈ Γ. Assume the premise and let I := {i | U ⊢ Uᵢ}.
Then {aᵢ | i ∈ I} ⊢ a. Therefore the numbers mᵢ in the definition of
(W, Uᵢ, aᵢ) ∈ Γ are all the same, = m say. Hence {aᵢ | i ∈ I} ⊆ yₘ, and
the claim follows from the deductive closure of yₘ.

(c) (W′, U, a) ∈ Γ and W ⊢ W′ imply (W, U, a) ∈ Γ. The claim
follows from the definition of Γ, since the m from (W′, U, a) ∈ Γ can be
used for (W, U, a) ∈ Γ.
We finally show that for all F ∈ G_{ρ→σ→B} satisfying

    ∀_{x∈G_ρ}∃_{y∈G_σ}(F(x, y) ∋ tt)

and all x ∈ G_ρ we have Γ(F, x) ∈ G_σ and F(x, Γ(F, x)) ∋ tt. So let F
and x with these properties be given. By assumption there is a y ∈ G_σ
such that F(x, y) ∋ tt. Hence by the definition of application there is
a Vₙ ∈ Con_σ such that F(x, V̄ₙ) ∋ tt. Since Vₙ ⊆ yₙ we also have
F(x, yₙ) ∋ tt. Clearly we may assume here that n is minimal with this
property, i.e., that

    F(x, y₀) ∋ ff, ..., F(x, yₙ₋₁) ∋ ff.

We show that Γ(F, x) ⊇ yₙ; this suffices because every superset of a total
ideal is total. Recall that

    Γ(F) = {(U, a) ∈ Con_ρ × C_σ | ∃_{W⊆F}((W, U, a) ∈ Γ)}

and

    Γ(F, x) = {a ∈ C_σ | ∃_{U⊆x}((U, a) ∈ Γ(F))}
            = {a ∈ C_σ | ∃_{U⊆x}∃_{W⊆F}((W, U, a) ∈ Γ)}.

Let a ∈ yₙ. By the choice of n we get U ⊆ x and W ⊆ F such that

    ∀_{i<n}(W̄Ū yᵢ ∋ ff) and W̄Ū yₙ ∋ tt.

Therefore (W, U, a) ∈ Γ and hence a ∈ Γ(F, x).
It is easy to see from the proof that the functional Γ is in fact Σ⁰₁-definable.
This "effective" choice principle generalizes the simple fact that
whenever we know the truth of ∀_{x∈N}∃_{y∈N} Rxy with Rxy decidable, then
given x we can just search for a y such that Rxy holds; the truth of
∀_{x∈N}∃_{y∈N} Rxy guarantees termination of the search.
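In the simple first-order case the effective content of this principle is plain unbounded search; a minimal sketch, assuming a decidable R given as a Boolean-valued function:

    -- Given a decidable R with ∀x ∃y R x y, search for the least witness.
    search :: (Integer -> Integer -> Bool) -> Integer -> Integer
    search r x = head [y | y <- [0 ..], r x y]
    -- Termination is guaranteed by the truth of ∀x ∃y R x y, exactly as in
    -- the remark above; on an x without a witness the search diverges.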
6.6. Notes
The development of constructive theories of computable functionals
of finite type began with Gödel’s [1958]. There the emphasis was on
particular computable functionals, the structural (or primitive) recursive
ones. In contrast to what was done later by Kreisel, Kleene, Scott and
Ershov, the domains for these functionals were not constructed explicitly,
but rather considered as described axiomatically by the theory.
Denotational semantics for PCF-like languages is well-developed, and
usually (as in Plotkin’s [1977]) done in a domain-theoretic setting. For
thorough coverage of domain theory see Stoltenberg-Hansen, Griffor,
and Lindström [1994] or Abramsky and Jung [1994]. The study of the se-
mantics of non-overlapping higher type recursion equations—called here
computation rules—has been initiated in Berger, Eberl, and Schwichten-
berg [2003], again in a domain-theoretic setting. Berger [2005a] intro-
duced a “strict” variant of this domain-theoretic semantics, and used it
to prove strong normalization of extensions of Gödel’s T by different
versions of bar recursion. Information systems have been conceived by
Scott [1982] as an intuitive approach to domains for denotational se-
mantics. Coherent information systems have been introduced by Plotkin
[1978, p. 210]. Taking up Kreisel’s [1959] idea of neighborhood systems,
Martin-Löf developed in unpublished notes [1983] a domain-theoretic
interpretation of his type theory. The intersection type discipline of
Barendregt, Coppo, and Dezani-Ciancaglini [1983] can be seen as a
different style of presenting the idea of a neighborhood system. The
desire to have a more general framework for these ideas has led Martin-
Löf, Sambin and others to develop a formal topology; cf. Coquand,
Sambin, Smith, and Valentini [2003] and the forthcoming book of Sam-
bin.
The first proof of an adequacy theorem (not under this name) is due to
Plotkin [1977, Theorem 3.1]; Plotkin’s proof is by induction on the types,
and uses a computability predicate. A similar result in a type-theoretic
setting is in Martin-Löf’s notes [1983, Second Theorem]. Adequacy the-
orems have been proved in many contexts, by Abramsky [1991], Amadio
and Curien [1998], Barendregt, Coppo, and Dezani-Ciancaglini [1983],
Martin-Löf [1983]; Coquand and Spiwack [2006]—building on the work
of Martin-Löf [1983] and Berger [2005a]—observed that the adequacy
result even holds for untyped languages, hence also for dependently typed
ones.
The problem of proving strong normalization for extensions of typed λ-
calculi by higher order rewrite rules has been studied extensively in the lit-
erature: Tait [1971], Girard [1971], Troelstra [1973], Blanqui, Jouannaud,
and Okada [1999], Abel and Altenkirch [2000], Berger [2005a]. Most of
these proofs use impredicative methods (e.g., by reducing the problem to
strong normalization of second-order propositional logic, called system
F by Girard [1971]). Our definition of the strong computability predi-
cates and also the proof are related to Zucker’s [1973] proof of strong
normalization of his term system for recursion on the first three number
or tree classes. However, Zucker uses a combinatory term system and de-
fines strong computability for closed terms only. Following some ideas in
an unpublished note of Berger, Benl (in his diploma thesis [1998]) trans-
ferred this proof to terms in simply typed λ-calculus, possibly involving
free variables. Here it is adapted to the present context. Normalization
by evaluation has been introduced in Berger and Schwichtenberg [1991],
and extended to constants defined by computation rules in Berger, Eberl,
and Schwichtenberg [2003].
In 6.4.3 we have proved that every computable functional Φ is recursive
in valmax, pcond and ∃. If in addition one requires that Φ is total,
then in fact the “parallel” computation involved in pcond and ∃ can be
avoided. This has been conjectured by Berger [1993b] and proved by
Normann [2000]. For a good survey of these and related results we refer
the reader to Normann [2006].
The density theorem was first stated by Kreisel [1959]. Proofs of var-
ious versions of it have been given by Ershov [1972], Berger [1993b],
Stoltenberg-Hansen, Griffor, and Lindström [1994], Schwichtenberg
[1996] and Kristiansen and Normann [1997]. The proof given here is
based on the one given by Berger in [1993b], and extends to the case
where the base domains are not just the flat domain of natural numbers,
but non-flat and possibly parametrized free algebras. At several points it
makes use of ideas from Huber [2010].
Chapter 7
EXTRACTING COMPUTATIONAL CONTENT
FROM PROOFS
The treatment of our subject—proof and computation—would be incom-
plete if we could not address the issue of extracting computational content
from formalized proofs. The first author has over many years developed
a machine-implemented proof assistant, Minlog, within which this can be
done where, unlike many other similar systems, the extracted content lies
within the logic itself. Many non-trivial examples have been developed,
illustrating both the breadth and the depth of Minlog, and some of them
will be seen in what follows. Here we shall develop the theoretical un-
derpinnings of this system. It will be a theory of computable functionals
(TCF), a self-generating system built from scratch and based on mini-
mal logic, whose intended model consists of the computable functions on
partial continuous objects, as treated in the previous chapter. The main
tool will be (iterated) inductive definitions of predicates and their elimi-
nation (or least-fixed-point) axioms. Its computational strength will be
roughly that of ID_{<ω}, but it will be more adaptable and computationally
applicable.
After developing the system TCF, we shall concentrate on delicate ques-
tions to do with finding computational content in both constructive and
classical existence proofs. We discuss three “proof interpretations” which
achieve this task: realizability for constructive existence proofs and, for
classical proofs, the refined A-translation and Gödel’s Dialectica interpre-
tation. After presenting these concepts and proving the crucial soundness
theorem for each of them, we address the question of how to implement
such proof interpretations. However, we do not give a description of
Minlog itself, but prefer to present the methods and their implementation
by means of worked examples. For references to the Minlog system see
Schwichtenberg [1992], [2006b] and http://www.minlog-system.de.
7.1. A theory of computable functionals
7.1.1. Brouwer–Heyting–Kolmogorov and Gödel. The Brouwer–Hey-
ting–Kolmogorov interpretation (BHK-interpretation for short) of intui-
tionistic (and minimal) logic explains what it means to prove a logically
compound statement in terms of what it means to prove its components;
the explanations use the notions of construction and constructive proof as
unexplained primitive notions. For prime formulas the notion of proof is
supposed to be given. The clauses of the BHK-interpretation are:
(i) p proves A ∧ B if and only if p is a pair p0 , p1 and p0 proves A, p1
proves B;
(ii) p proves A → B if and only if p is a construction transforming any
proof q of A into a proof p(q) of B;
(iii) ⊥ is a proposition without proof;
(iv) p proves ∀x∈D A(x) if and only if p is a construction such that for
all d ∈ D, p(d ) proves A(d );
(v) p proves ∃x∈D A(x) if and only if p is of the form d, q with d an
element of D, and q a proof of A(d ).
The problem with the BHK-interpretation clearly is its reliance on the
unexplained notions of construction and constructive proof. Gödel was
concerned with this problem for more than 30 years. In 1941 he gave
a lecture at Yale university with the title “In what sense is intuitionistic
logic constructive?”. According to Kreisel, Gödel “wanted to establish
that intuitionistic proofs of existential theorems provide explicit realizers”
(Feferman, Dawson et al. [1986, 1990, 1995, 2002a, 2002b], Vol. II,
p. 219). Gödel published his “Dialectica interpretation” in [1958], and
revised this work over and over again; its state in 1972 has been published
in the same volume. Troelstra, in his introductory note to the latter two
papers writes [loc. cit., pp. 220/221]:
Gödel argues that, since the finististic methods considered are
not sufficient to carry out Hilbert’s program, one has to admit at
least some abstract notions in a consistency proof; . . . However,
Gödel did not want to go as far as admitting Heyting’s abstract
notion of constructive proof; hence he tried to replace the
notion of constructive proof by something more definite, less
abstract (that is, more nearly finitistic), his principal candidate
being a notion of “computable functional of finite type” which
is to be accepted as sufficiently well understood to justify the
axioms and rules of his system T, an essentially logic-free theory
of functionals of finite type.
We intend to utilize the notion of a computable functional of finite type
as an ideal in an information system, as explained in the previous chapter.
However, Gödel noted that his proof interpretation is largely indepen-
dent of a precise definition of computable functional; one only needs to
know that certain basic functionals are computable (including primitive
recursion operators in finite types), and that they are closed under com-
position. Building on Gödel [1958], we assign to every formula A a new
one ∃x A1 (x) with A1 (x) ∃-free. Then from a derivation of A we want
to extract a “realizing term” r such that A1 (r). Of course its meaning
should in some sense be related to the meaning of the original formula
A. However, Gödel explicitly states in [1958, p. 286] that his Dialectica
interpretation is not the one intended by the BHK-interpretation.
7.1.2. Formulas and predicates. When we want to make propositions
about computable functionals and their domains of partial continuous
functionals, it is perfectly natural to take, as initial propositions, ones
formed inductively or coinductively. However, for simplicity we postpone
the treatment of coinductive definitions to 7.1.7 and deal with inductive
definitions only until then. For example, in the algebra N we can induc-
tively define totality by the clauses
T 0, ∀n (Tn → T (Sn)).
Its least-fixed-point scheme will now be taken in the form
∀n (Tn → A(0) → ∀n (Tn → A(n) → A(Sn)) → A(n)).
The reason for writing it in this way is that it fits more conveniently
with the logical elimination rules, which will be useful in the proof of
the soundness theorem in 7.2.8. It expresses that every “competitor”
{n | A(n)} satisfying the same clauses contains T . This is the usual
induction schema for natural numbers, which clearly only holds for “total”
numbers (i.e., total ideals in the information system for N). Notice that we
have used a “strengthened” form of the “step formula”, namely ∀n (Tn →
A(n) → A(Sn)) rather than ∀n (A(n) → A(Sn)). In applications of the
least-fixed-point axiom this simplifies the proof of the “induction step”,
since we have the additional hypothesis T (n) available. Totality for an
arbitrary algebra can be defined similarly. Consider for example the non-
finitary algebra O (cf. 6.1.4), with constructors 0, successor S of type
O → O and supremum Sup of type (N → O) → O. Its clauses are
    T_O 0,  ∀x (T_O x → T_O (Sx)),  ∀f (∀_{n∈T_N} T_O (fn) → T_O (Sup(f))),

and its least-fixed-point scheme is

    ∀x (T_O x → A(0) →
        ∀x (T_O x → A(x) → A(Sx)) →
        ∀f (∀_{n∈T_N} T_O (fn) → ∀_{n∈T_N} A(fn) → A(Sup(f))) →
        A(x)).
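The algebra O and the computational content of its least-fixed-point scheme, structural recursion with a function argument under Sup, can be sketched in Haskell as follows (recO and the cut-off example are our names):

    -- The non-finitary algebra O, with a recursive call under Sup.
    data O = Zero | Suc O | Sup (Integer -> O)

    recO :: a -> (O -> a -> a) -> ((Integer -> O) -> (Integer -> a) -> a) -> O -> a
    recO z s sup = go
      where
        go Zero    = z
        go (Suc x) = s x (go x)
        go (Sup f) = sup f (\n -> go (f n))

    -- Genuine suprema are not available in Haskell; as an illustration,
    -- evaluate an O-expression at "approximation level" k:
    approxHeight :: Integer -> O -> Integer
    approxHeight k = recO 0 (\_ r -> r + 1) (\_ g -> maximum (0 : [g n | n <- [0 .. k]]))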
Generally, an inductively defined predicate I is given by k clauses, which
are of the form

    ∀_{x⃗}(A⃗ᵢ → (∀_{y⃗ᵢ}(B⃗ᵢ → I s⃗ᵢ))_{<nᵢ} → I t⃗ᵢ)   (i < k).
Our formulas will be defined by the operations of implication A →
B and universal quantification ∀x A from inductively defined predicates
μ_X(K⃗), where X is a "predicate variable", and the Kᵢ are "clauses". Every
predicate has an arity, which is a possibly empty list of types.
Definition (Predicates and formulas). Let X, Y⃗ be distinct predicate
variables; the Y_l are called predicate parameters. We inductively define
formula forms A, B, C, D ∈ F(Y⃗), predicate forms P, Q, I, J ∈ Preds(Y⃗)
and clause forms K ∈ Cl_X(Y⃗); all these are called strictly positive in
Y⃗. In case Y⃗ is empty we abbreviate F(Y⃗) by F and call its elements
formulas; similarly for the other notions. (However, for brevity we often
say "formula" etc. when it is clear from the context that parameters may
occur.)

    Y_l r⃗ ∈ F(Y⃗),     A ∈ F   B ∈ F(Y⃗)        A ∈ F(Y⃗)
                       -----------------,      ------------,
                       A → B ∈ F(Y⃗)           ∀x A ∈ F(Y⃗)

    C ∈ F(Y⃗)                  P ∈ Preds(Y⃗)
    -------------------,      --------------,
    {x⃗ | C} ∈ Preds(Y⃗)       P r⃗ ∈ F(Y⃗)

    K₀, ..., K_{k−1} ∈ Cl_X(Y⃗)
    --------------------------------- (k ≥ 1),
    μ_X(K₀, ..., K_{k−1}) ∈ Preds(Y⃗)

    A⃗ ∈ F(Y⃗)   B⃗₀, ..., B⃗_{n−1} ∈ F
    ------------------------------------------------ (n ≥ 0).
    ∀_{x⃗}(A⃗ → (∀_{y⃗}(B⃗ → X s⃗))_{<n} → X t⃗) ∈ Cl_X(Y⃗)

Here A⃗ → B means A₀ → ··· → A_{n−1} → B, associated to the right.
For a clause ∀_{x⃗}(A⃗ → (∀_{y⃗}(B⃗ → X s⃗))_{<n} → X t⃗) ∈ Cl_X(Y⃗) we call the A⃗
parameter premises and the ∀_{y⃗}(B⃗ → X s⃗) recursive premises. We require
that in μ_X(K₀, ..., K_{k−1}) the clause K₀ is "nullary", without recursive
premises. The terms r are those introduced in section 6.2, i.e., typed terms
built from variables and constants by abstraction and application, and
(importantly) those with a common reduct are identified.

A predicate of the form {x⃗ | C} is called a comprehension term. We
identify {x⃗ | C(x⃗)}r⃗ with C(r⃗). The letter I will be used for predicates of
the form μ_X(K₀, ..., K_{k−1}); they are called inductively defined predicates.
Remark (Substitution for predicate parameters). Let A ∈ F(Y⃗); we
write A(Y⃗) for A to indicate its dependence on the predicate parameters
Y⃗. Similarly we write I(Y⃗) for I if I ∈ Preds(Y⃗). We can substitute
predicates P⃗ for Y⃗, to obtain A(P⃗) and I(P⃗), respectively.

An inductively defined predicate is finitary if its clauses have recursive
premises of the form X s⃗ only (so the y⃗ and B⃗ in the general definition
are all empty).
Definition (Theory of computable functionals, TCF). TCF is the
system in minimal logic for → and ∀, whose formulas are those in F
above, and whose axioms are the following. For each inductively defined
predicate, there are "closure" or introduction axioms, together with
a "least-fixed-point" or elimination axiom. In more detail, consider an
inductively defined predicate I := μ_X(K₀, ..., K_{k−1}). For each of the k
clauses we have an introduction axiom, as follows. Let the i-th clause for
I be

    Kᵢ(X) := ∀_{x⃗}(A⃗ → (∀_{y⃗}(B⃗ → X s⃗))_{<n} → X t⃗).

Then the corresponding introduction axiom is Kᵢ(I), that is,

    ∀_{x⃗}(A⃗ → (∀_{y⃗}(B⃗ → I s⃗))_{<n} → I t⃗). (1)

The elimination axiom is

    ∀_{x⃗}(I x⃗ → (Kᵢ(I, P))_{i<k} → P x⃗), (2)

where

    Kᵢ(I, P) := ∀_{x⃗}(A⃗ → (∀_{y⃗}(B⃗ → I s⃗))_{<n} →
                          (∀_{y⃗}(B⃗ → P s⃗))_{<n} → P t⃗).

We label each introduction axiom Kᵢ(I) by Iᵢ⁺ and the elimination axiom
by I⁻.
7.1.3. Equalities. A word of warning is in order here: we need to dis-
tinguish four separate but closely related equalities.
(i) Firstly, defined function constants D are introduced by computation
rules, written l = r, but intended as left-to-right rewrites.
(ii) Secondly, we have Leibniz equality Eq inductively defined below.
(iii) Thirdly, pointwise equality between partial continuous functionals
will be defined inductively as well.
(iv) Fourthly, if l and r have a finitary algebra as their type, l = r can be
read as a boolean term, where = is the decidable equality defined in
6.2.4 as a boolean-valued binary function.
Leibniz equality. We define Leibniz equality by

    Eq(ρ) := μ_X(∀_x X(x^ρ, x^ρ)).

The introduction axiom is

    ∀_x Eq(x^ρ, x^ρ)

and the elimination axiom

    ∀_{x,y}(Eq(x, y) → ∀_x Pxx → Pxy),

where Eq(x, y) abbreviates Eq(ρ)(x^ρ, y^ρ).
Lemma (Compatibility of Eq). ∀x,y (Eq(x, y) → A(x) → A(y)).
Proof. Use the elimination axiom with Pxy := (A(x) → A(y)).
Using compatibility of Eq one easily proves symmetry and transitivity.
Define falsity by F := Eq(ff, tt). Then we have
Theorem (Ex-falso-quodlibet). For every formula A without predicate
parameters we can derive F → A.
Proof. We first show that F → Eq(x , y ). To see this, we first obtain
Eq([if ff then x else y], [if ff then x else y]) from the introduction axiom,
since [if ff then x else y] is an allowed term, and then from Eq(ff, tt) we
get Eq([if tt then x else y], [if ff then x else y]) by compatibility. Hence
Eq(x , y ).
The claim can now be proved by induction on A ∈ F. Case I s⃗. Let Kᵢ
be the nullary clause, with final conclusion I t⃗. By induction hypothesis
from F we can derive all parameter premises. Hence I t⃗. From F we also
obtain Eq(sᵢ, tᵢ), by the remark above. Hence I s⃗ by compatibility. The
cases A → B and ∀x A are obvious.
A crucial use of the equality predicate Eq is that it allows us to lift
a boolean term r^B to a formula, using atom(r^B) := Eq(r^B, tt). This
opens up a convenient way to deal with equality on finitary algebras. The
computation rules ensure that, for instance, the boolean term Sr =N Ss,
or more precisely =N (Sr, Ss), is identified with r =N s. We can now
turn this boolean term into the formula Eq(Sr =N Ss, tt), which again is
abbreviated by Sr =N Ss, but this time with the understanding that it is a
formula. Then (importantly) the two formulas Sr =N Ss and r =N s are
identified because the latter is a reduct of the first. Consequently there is
no need to prove the implication Sr =N Ss → r =N s explicitly.
Pointwise equality =_ρ. For every constructor Cᵢ of an algebra ι we have
an introduction axiom

    ∀_{y⃗,z⃗}(y⃗^P = z⃗^P → (∀_{x⃗}(y_{m+p}^R x⃗ = z_{m+p}^R x⃗))_{p<n} → Cᵢ y⃗ =_ι Cᵢ z⃗).

For an arrow type ρ → σ the introduction axiom is explicit, in the sense
that it has no recursive premise:

    ∀_{x₁,x₂}(∀_y(x₁y =_σ x₂y) → x₁ =_{ρ→σ} x₂).
For example, =N is inductively defined by
0 =N 0,
∀n1 ,n2 (n1 =N n2 → Sn1 =N Sn2 ),
and the elimination axiom is
∀n1 ,n2 (n1 =N n2 → P00 →
∀n1 ,n2 (n1 =N n2 → Pn1 n2 → P(Sn1 , Sn2 )) →
Pn1 n2 ).
An example with the non-finitary algebra T1 (cf. 6.1.4) is:
0 =T1 0,
∀f1 ,f2 (∀n (f1 n =T1 f2 n) → Sup(f1 ) =T1 Sup(f2 )),
and the elimination axiom is
∀x1 ,x2 (x1 =T1 x2 → P00 →
∀f1 ,f2 (∀n (f1 n =T1 f2 n) → ∀n P(f1 n, f2 n) →
P(Sup(f1 ), Sup(f2 ))) →
Px1 x2 ).
The main purpose of pointwise equality is that it allows us to formulate the
extensionality axiom: we express the extensionality of our intended model
by stipulating that pointwise equality is equivalent to Leibniz equality.
Axiom (Extensionality). ∀x1 ,x2 (x1 = x2 ↔ Eq(x1 , x2 )).
We write E-TCF when the extensionality axioms are present.
7.1.4. Existence, conjunction and disjunction. One of the main points
of TCF is that it allows the logical connectives existence, conjunction
and disjunction to be inductively defined as predicates. This was first
discovered by Martin-Löf [1971].
Existential quantifier. We define

    Ex(Y) := μ_X(∀_x(Y x → X)).
The introduction axiom is
∀x (A → ∃x A),
where ∃x A abbreviates Ex({x | A}), and the elimination axiom is
∃x A → ∀x (A → P) → P.
Conjunction. We define

    And(Y, Z) := μ_X(Y → Z → X).
The introduction axiom is
A→B →A∧B
where A ∧ B abbreviates And({| A}, {| B}), and the elimination axiom is
A ∧ B → (A → B → P) → P.
Disjunction. We define

    Or(Y, Z) := μ_X(Y → X, Z → X).
The introduction axioms are
A → A ∨ B, B → A ∨ B,
where A ∨ B abbreviates Or({| A}, {| B}), and the elimination axiom is
A ∨ B → (A → P) → (B → P) → P.
Remark. Alternatively, disjunction A ∨ B could be defined by the for-
mula ∃p ((p → A)∧(¬p → B)) with p a boolean variable. However, for an
analysis of the computational content of coinductively defined predicates
it is better to define it inductively.
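The computational content of these inductive definitions is familiar: in Haskell the introduction axioms are datatype constructors and the elimination axioms the corresponding case analyses. A sketch, ours, not Minlog syntax:

    {-# LANGUAGE ExistentialQuantification, RankNTypes #-}
    -- Connectives as datatypes; the constructors are the introduction axioms.
    data Or a b  = InL a | InR b              -- A -> A ∨ B,  B -> A ∨ B
    data And a b = Pair a b                   -- A -> B -> A ∧ B
    data Ex p    = forall x. ExIntro x (p x)  -- A(d) -> ∃x A(x)

    -- The elimination axioms, as eliminators.
    orElim :: Or a b -> (a -> c) -> (b -> c) -> c
    orElim (InL a) f _ = f a
    orElim (InR b) _ g = g b

    andElim :: And a b -> (a -> b -> c) -> c
    andElim (Pair a b) f = f a b

    exElim :: Ex p -> (forall x. x -> p x -> c) -> c
    exElim (ExIntro x px) f = f x px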
7.1.5. Further examples. We give some more familiar examples of in-
ductively defined predicates.
The even numbers. The introduction axioms are
Even(0), ∀n (Even(n) → Even(S(Sn)))
and the elimination axiom is
∀n (Even(n) → P0 → ∀n (Even(n) → Pn → P(S(Sn))) → Pn).
Transitive closure. Let ≺ be a binary relation. The transitive closure of
≺ is inductively defined as follows. The introduction axioms are
∀x,y (x ≺ y → TC(x, y)),
∀x,y,z (x ≺ y → TC(y, z) → TC(x, z))
and the elimination axiom is
∀x,y (TC(x, y) → ∀x,y (x ≺ y → Pxy) →
∀x,y,z (x ≺ y → TC(y, z) → Pyz → Pxz) →
Pxy).
Accessible part. Let ≺ again be a binary relation. The accessible part
of ≺ is inductively defined as follows. The introduction axioms are
∀x (F → Acc(x)),
∀x (∀y≺x Acc(y) → Acc(x)),
and the elimination axiom is
∀x (Acc(x) → ∀x (F → Px) →
∀x (∀y≺x Acc(y) → ∀y≺x Py → Px) →
Px).
7.1.6. Totality and induction. In 6.1.7 we have defined what the total
and structure-total ideals of a finitary algebra ι are. We now inductively
define corresponding predicates G_ι (no connection whatsoever with the
slow-growing hierarchy) and T_ι; this inductive definition works for arbitrary
algebras ι. The least-fixed-point axiom for T_ι will provide us with
the induction axiom.
Let us first look at some examples. We already have stated the clauses
defining totality for the algebra N:
TN 0, ∀n (TN n → TN (Sn)).
The least-fixed-point axiom is
∀n (TN n → P0 → ∀n (TN n → Pn → P(Sn)) → Pn).
Clearly the partial continuous functionals with TN interpreted as the total
ideals for N provide a model of TCF extended by these axioms.
For the algebra D of derivations totality is inductively defined by the
clauses
TD 0D , ∀x,y (TD x → TD y → TD (CD→D→D xy)),
with least-fixed-point axiom
∀x (TD x → P0D →
∀x,y (TD x → TD y → Px → Py → P(CD→D→D xy)) →
Px).
Again, the partial continuous functionals with TD interpreted as the total
ideals for D (i.e., the finite derivations) provide a model.
As an example of a finitary algebra with parameters consider L(α). The
clauses defining its (full, "gesamt") totality predicate G_{L(α)} are

    G_{L(α)}(nil),  ∀_{x,l}(G_α x → G_{L(α)} l → G_{L(α)}(x :: l)),

where G_α is assumed to be defined already; x :: l is shorthand for
cons(x, l). In contrast, the clauses for the predicate T_{L(α)} expressing
structure-totality are

    T_{L(α)}(nil),  ∀_{x,l}(T_{L(α)} l → T_{L(α)}(x :: l)),

with no assumptions on x.
Generally, for arbitrary types ρ we inductively define predicates Gρ of totality and Tρ of structure-totality, by induction on ρ. This definition is relative to an assignment of predicate variables Gα, Tα of arity (α) to type variables α.
Definition. In case ι ∈ Alg(α) we have ι = μξ (κ0, . . . , κk−1), with κi = ρ → (σ → ξ)ν<n → ξ. Then Gι := μX (K0, . . . , Kk−1), with

Ki := ∀x (G xP → (∀y (G y → X (xνR y)))ν<n → X (Ci x)).

Similarly, Tι := μX (K0, . . . , Kk−1), with

Ki := ∀x ((∀y (T y → X (xνR y)))ν<n → X (Ci x)).

For arrow types the definition is explicit; that is, the clauses have no recursive premises but parameter premises only.

Gρ→σ := μX ∀f (∀x (Gρ x → Gσ (fx)) → Xf),
Tρ→σ := μX ∀f (∀x (Tρ x → Tσ (fx)) → Xf).

This concludes the definition.
In the case of an algebra ι the introduction axioms for Tι are

(Tι)i+ : ∀x ((∀y (T y → T (xνR y)))ν<n → T (Ci x))

and the elimination axiom is

Tι− : ∀x (T x → K0 (T, P) → · · · → Kk−1 (T, P) → Px),

where

Ki (T, P) := ∀x ((∀y (T y → T (xνR y)))ν<n → (∀y (T y → P(xνR y)))ν<n → P(Ci x)).
In the arrow type case, the introduction and elimination axioms are

∀x (T x → T (fx)) → Tρ→σ f,
Tρ→σ f → ∀x (T x → T (fx)).

(The “official” axiom Tρ→σ f → (∀x (T x → T (fx)) → P) → P is clearly equivalent to the one stated.) Abbreviating ∀x (Tx → A) by ∀x∈T A allows a shorter formulation of these axioms:

(∀y∈T T (xνR y))ν<n → T (Ci x),
∀x∈T (K0 (T, P) → · · · → Kk−1 (T, P) → Px),
∀x∈T T (fx) → Tρ→σ f,
∀f∈Tρ→σ, x∈T T (fx),

where

Ki (T, P) := ∀xP ∀xR∈T ((∀y∈T P(xνR y))ν<n → P(Ci x)).
Hence the elimination axiom Tι− is the induction axiom, and the Ki (T, P) are its step formulas. We write Indιx,P or Indx,P for Tι−, and omit the indices x, P when they are clear from the context. Examples are

Indp,P : ∀p∈T (Ptt → Pff → Pp^B),
Indn,P : ∀n∈T (P0 → ∀n∈T (Pn → P(Sn)) → Pn^N),
Indl,P : ∀l∈T (P(nil) → ∀x ∀l∈T (Pl → P(x :: l)) → Pl^L(α)),
Indz,P : ∀z∈T (∀x,y P⟨x, y⟩ → Pz^ρ×σ),

where x :: l is shorthand for cons(x, l) and ⟨x, y⟩ for ×+ xy.
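Read computationally, each of these induction axioms is a recursion operator. The following Haskell sketch spells out the operators corresponding to the four examples; the names are ours, and Int and [a] stand in for the algebras N and L(α).

recB :: p -> p -> Bool -> p                 -- Ind_{p,P}
recB tt ff b = if b then tt else ff

recN :: p -> (Int -> p -> p) -> Int -> p    -- Ind_{n,P}, assumes n >= 0
recN base step 0 = base
recN base step n = step (n - 1) (recN base step (n - 1))

recL :: p -> (a -> [a] -> p -> p) -> [a] -> p   -- Ind_{l,P}
recL nil cons []      = nil
recL nil cons (x : l) = cons x l (recL nil cons l)

recPair :: (a -> b -> p) -> (a, b) -> p     -- Ind_{z,P}
recPair f (x, y) = f x y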
All this can be done similarly for the Gι. A difference only occurs for algebras with parameters: for example, list induction then is

∀l∈G (P(nil) → ∀x,l∈G (Pl → P(x :: l)) → Pl^L(α)).
Parallel to general recursion, one can also consider general induction, which allows recurrence to all points “strictly below” the present one. For applications it is best to make the necessary comparisons w.r.t. a “measure function” μ. Then it suffices to use an initial segment of the ordinals instead of a well-founded set. For simplicity we here restrict ourselves to the segment given by ω, so the ordering we refer to is just the standard <-relation on the natural numbers. The principle of general induction then is

∀μ,x∈T (Progμ,x Px → Px) (3)

where Progμ,x Px expresses “progressiveness” w.r.t. the measure function μ and the ordering <:

Progμ,x Px := ∀x∈T (∀y∈T; μy<μx Py → Px).

It is easy to see that in our special case of the <-relation we can prove (3) from structural induction. However, it will be convenient to use general induction as a primitive axiom.
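A minimal Haskell sketch of the corresponding general recursion along a measure (names ours; the error branch marks calls that progressiveness never makes, since the measure strictly decreases on every recursive call):

genRec :: (a -> Int)                -- measure function mu
       -> ((a -> r) -> a -> r)      -- step: gets recursive calls below x
       -> a -> r
genRec mu step = go
  where
    go x = step (\y -> if mu y < mu x
                         then go y
                         else error "call not below the measure")
                x

-- e.g. gcd by general recursion on the second component as measure:
myGcd :: (Int, Int) -> Int
myGcd = genRec snd step
  where step rec (a, 0) = a
        step rec (a, b) = rec (b, a `mod` b)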
7.1.7. Coinductive definitions. We now extend TCF by allowing coin-
ductive definitions as well as inductive ones. For instance, in the algebra
N we can coinductively define cototality by the clause
co TN n → Eq(n, 0) ∨ ∃m (Eq(n, Sm) ∧ co TN m).

Its greatest-fixed-point axiom is

Pn → ∀n (Pn → Eq(n, 0) ∨ ∃m (Eq(n, Sm) ∧ (co TN m ∨ Pm))) → co TN n.
It expresses that every “competitor” P satisfying the same clause is a
subset of co TN . The partial continuous functionals with co TN interpreted
as the cototal ideals for N provide a model of TCF extended by these
axioms. The greatest-fixed-point axiom is called the coinduction axiom
for natural numbers.
Similarly, for the algebra D of derivations with constructors 0D and
CD→D→D cototality is coinductively defined by the clause
co TD x → Eq(x, 0) ∨ ∃y,z (Eq(x, Cyz) ∧ co TD y ∧ co TD z).

Its greatest-fixed-point axiom is

Px → ∀x (Px → Eq(x, 0) ∨ ∃y,z (Eq(x, Cyz) ∧ (co TD y ∨ Py) ∧ (co TD z ∨ Pz))) → co TD x.
The partial continuous functionals with co TD interpreted as the cototal
ideals for D (i.e., the finite or infinite locally correct derivations) provide
a model.
For the algebra I of standard rational intervals cototality is defined by
co TI x → Eq(x, I) ∨ ∃y (Eq(x, C−1 y) ∧ co TI y) ∨
          ∃y (Eq(x, C0 y) ∧ co TI y) ∨
          ∃y (Eq(x, C1 y) ∧ co TI y).
A model is provided by the set of all finite or infinite streams of signed digits
from {−1, 0, 1}, i.e., the well-known (non-unique) stream representation
of real numbers.
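In a lazy language these cototal ideals are literally the possibly infinite values of the corresponding data type. A Haskell sketch (constructor names ours), reading C d x′ as the real (d + x′)/2:

data SD = M1 | Z0 | P1          -- the signed digits -1, 0, 1

data I = Hole | C SD I          -- Hole: the trivial interval [-1,1];
                                -- C d x' denotes the real (d + x') / 2

-- a cototal (infinite) ideal: its value x satisfies
-- x = (1 + (x - 1)/2)/2, i.e. x = (x + 1)/4, hence x = 1/3
oneThird :: I
oneThird = C P1 (C M1 oneThird)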
Generally, a coinductively defined predicate J is given by exactly one clause, which is of the form

∀x (J x → ∨i<k ∃yi (∧Ai ∧ ∧ν<ni ∀yiν (Biν → J siν))).
More precisely, we must extend the definition of formulas and predicates in 7.1.2 by (co)clause forms K ∈ coClX (Y), and need the additional rules

K ∈ coClX (Y)
─────────────────
νX K ∈ Preds(Y)

Ai ∈ F(Y), Bi0, . . . , Bi,ni−1 ∈ F (i < k)
──────────────────────────────────────────────────────────────
∀x (X x → ∨i<k ∃yi (∧Ai ∧ ∧ν<ni ∀yiν (Biν → X siν))) ∈ coClX (Y)

where we require k > 0 and n0 = 0. The letter J will be used for predicates of the form νX K, called coinductively defined. For each coinductively defined predicate, there is a closure axiom

J − : ∀x (J x → ∨i<k ∃yi (∧Ai ∧ ∧ν<ni ∀yiν (Biν → J siν)))
and a greatest-fixed-point axiom

J + : ∀x (P x → ∀x (P x → ∨i<k ∃yi (∧Ai ∧ ∧ν<ni ∀yiν (Biν → (J siν ∨ P siν)))) → J x).
Notice that the proof of the ex-falso-quodlibet theorem in 7.1.3 can be
easily extended by a case J s with J coinductively defined: use the greatest-
fixed-point axiom for J with P x := F. Since k > 0 and n0 = 0 it suffices to prove F → ∃y0 (∧A0). But this follows from the induction hypothesis.

A coinductively defined predicate is finitary if its clause has the form ∀x (J x → ∨i<k ∃yi (∧Ai ∧ ∧ν<ni J siν)) (so the yiν and Biν in the general definition are all empty). We will often restrict to finitary coinductively defined predicates only.
The most important coinductively defined predicates for us will be those
of cototality and structure-cototality; we have seen some examples above.
Generally, for a finitary algebra cototality and structure-cototality are coinductively defined by

co G x → ∨i<k ∃yi (Eq(x, Ci yi) ∧ ∧ co G yi),
co T x → ∨i<k ∃yiP,yiR (Eq(x, Ci yiP yiR) ∧ ∧ co T yiR).
Finally we consider simultaneous inductive/coinductive definitions of
predicates. An example where this comes up is the formalization of an
abstract theory of (uniformly) continuous real functions f : I → I where
I := [−1, 1] (cf. 6.1.7); “continuous” is to mean “uniformly continuous”
here. Let Cf abbreviate the formula expressing that f is a continuous
real function, and Ip,k := [p − 2^−k, p + 2^−k]. Assume we can prove in the
abstract theory
Cf → ∀k ∃l Bl,k f, with Bl,k f := ∀p ∃q (f[Ip,l ] ⊆ Iq,k ). (4)
The converse is true as well: every real function f satisfying ∀k ∃l Bl,k f is
(uniformly) continuous.
For d ∈ {−1, 0, 1} let Id be defined by I−1 := [−1, 0], I0 := [−1/2, 1/2] and I1 := [0, 1]. We define continuous real functions ind, outd such that ind[I] = Id and outd[Id] = I by

ind(x) := (d + x)/2, outd(x) := 2x − d.
Clearly both functions are inverse to each other.
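As a quick sanity check, the two maps can be written out in Haskell (a sketch with Double standing in for the reals; names ours):

inD, outD :: Double -> Double -> Double
inD  d x = (d + x) / 2    -- in_d maps I = [-1,1] onto I_d
outD d x = 2 * x - d      -- out_d maps I_d back onto I

-- outD d (inD d x) == x and inD d (outD d x) == x (up to rounding)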
We give an inductive definition of a predicate IY depending on a pa-
rameter Y by
f[I] ⊆ Id → Y (outd ◦ f) → IY f (d ∈ {−1, 0, 1}), (5)
IY (f ◦ in−1 ) → IY (f ◦ in0 ) → IY (f ◦ in1 ) → IY f. (6)
The corresponding least-fixed-point axiom is

IY f → (∀f (f[I] ⊆ Id → Y (outd ◦ f) → Pf))d∈{−1,0,1} →
∀f ((IY (f ◦ ind))d∈{−1,0,1} → (P(f ◦ ind))d∈{−1,0,1} → Pf) →
Pf. (7)
Using IY we give a “simultaneous inductive/coinductive” definition of a
predicate J by
Jf → Eq(f, id) ∨ IJ f. (8)
The corresponding greatest-fixed-point axiom is
Qf → ∀f (Qf → Eq(f, id) ∨ IJ ∨Q f) → Jf. (9)
We now restrict attention to continuous functions f on the interval I
satisfying f[I] ⊆ I. Define
Bl,k f := ∀p ∃q (f[Ip,l ∩ I] ⊆ Iq,k ).
Lemma. (a) Bl,k (outd ◦ f) → Bl,k+1 f.
(b) Assume Bld,k+1 (f ◦ ind) for all d ∈ {−1, 0, 1}. Then Bl,k+1 f with l := 1 + maxd∈{−1,0,1} ld.

Proof. (a) Let p and x be given such that

max(−1, p − 2^−l) ≤ x ≤ min(p + 2^−l, 1).

We need q such that

q − 2^−(k+1) ≤ f(x) ≤ q + 2^−(k+1).

By assumption we have q′ such that

q′ − 2^−k ≤ 2f(x) − d ≤ q′ + 2^−k.

Let q := (q′ + d)/2.
(b) Let p and x be given such that

max(−1, p − 2^−l) ≤ x ≤ min(p + 2^−l, 1).

Then

max(−2, 2p − 2^−maxd ld) ≤ 2x ≤ min(2p + 2^−maxd ld, 2).

By choosing d ∈ {−1, 0, 1} appropriately we can ensure that −1 ≤ 2x − d ≤ 1. Hence

max(−1, 2p − d − 2^−ld) ≤ 2x − d ≤ min(2p − d + 2^−ld, 1).

The assumption Bld,k+1 (f ◦ ind) for 2p − d yields q such that

q − 2^−(k+1) ≤ f(ind(2x − d)) ≤ q + 2^−(k+1).

But ind(2x − d) = x.
Proposition. (a) ∀f (Cf → f[I] ⊆ I → Jf).
(b) ∀f (Jf → f[I] ⊆ I → ∀k ∃l Bl,k f).
Proof. (a) Assume Cf. We use (9) with Q := {f | Cf ∧ f[I] ⊆ I}.
Let f be arbitrary; it suffices to show Qf → IJ ∨Q f. Assume Qf, i.e.,
Cf and f[I] ⊆ I. By (4) we have an l such that Bl,2 f. We prove
∀l,f (Bl,2 f → Cf → f[I] ⊆ I → IJ ∨Q f) by induction on l . Base,
l = 0. B0,2 f implies that there is a rational q such that f[I0,0 ] ⊆ Iq,2 .
Because of I0,0 = I, f[I] ⊆ I and the fact that there is a d such that
Iq,2 ∩ I ⊆ Id we have f[I] ⊆ Id and hence (outd ◦ f)[I] ⊆ I. Then
Q(outd ◦ f) since outd ◦ f is continuous. Hence IJ ∨Q f by (5). Step.
Assume l > 0. Then Bl −1,2 (f ◦ ind ) because of Bl,2 f, and clearly f ◦ ind
is continuous and satisfies (f ◦ ind )[I] ⊆ I, for every d . By induction
hypothesis IJ ∨Q (f ◦ ind ). Hence IJ ∨Q f by (6).
(b) We prove ∀k ∀f (Jf → f[I] ⊆ I → ∃l Bl,k f) by induction on k.
Base. Because of f[I] ⊆ I clearly B0,0 f. Step, k → k + 1. Assume
Jf and f[I] ⊆ I. Then Eq(f, id) ∨ IJ f by (8). In case Eq(f, id) the
claim is trivial, since then clearly Bk+1,k+1 f. We prove ∀f (IJ f → f[I] ⊆
I → ∃l Bl,k+1 f) using (7), that is, by a side induction on IJ f. Side
induction base. Assume f[I] ⊆ Id and J (outd ◦ f). We must show
f[I] ⊆ I → ∃l Bl,k+1 f. Because of f[I] ⊆ Id we have (outd ◦ f)[I] ⊆ I.
The main induction hypothesis yields an l such that Bl,k (outd ◦ f), hence
Bl,k+1 f by the lemma. Side induction step. Assume, as side induction
hypothesis, (f ◦ ind )[I] ⊆ I → Bld ,k+1 (f ◦ ind ) for all d ∈ {−1, 0, 1}.
We must show f[I] ⊆ I → ∃l Bl,k+1 f. Assume f[I] ⊆ I. Then clearly
(f ◦ ind )[I] ⊆ I. Hence Bld ,k+1 (f ◦ ind ) for all d ∈ {−1, 0, 1}. By the
lemma this implies Bl,k+1 f with l := 1 + maxd ∈{−1,0,1} ld .
Our general form of simultaneous inductive/coinductive definitions of
predicates is based on an inductively defined IY with a predicate parame-
ter Y ; this is needed to formulate the greatest-fixed-point axiom for the
simultaneously defined J . More precisely, we coinductively define J by
∀x (J x → ∨i<k ∃yi (∧Ai ∧ ∧ν<ni ∀yiν (Biν → IJ siν))).

Its greatest-fixed-point axiom then is

J + : ∀x (P x → ∀x (P x → ∨i<k ∃yi (∧Ai ∧ ∧ν<ni ∀yiν (Biν → IJ∨P siν))) → J x).
The definition of formulas and predicates in 7.1.2 can easily be adapted,
and the proof of the ex-falso-quodlibet theorem in 7.1.3 extended. A
simultaneous inductive/coinductive definition is finitary if both parts are.
7.2. Realizability interpretation
We now come to the crucial step of inserting “computational content”
into proofs, which can then be extracted. It consists in “decorating”
our connectives → and ∀, or rather allowing “computational” variants
→c and ∀c as well as non-computational ones →nc and ∀nc . This dis-
tinction (for the universal quantifier) is due to Berger [1993a], [2005b].
The logical meaning of the connectives is not changed by the decoration.
Since we inductively defined predicates by means of clauses built with →
and ∀, we can now introduce computational variants of these predicates.
This will give us the possibility to fine-tune the computational content of
proofs.
For instance, the introduction and elimination axioms for the induc-
tively defined totality predicate T in the algebra N will be decorated as
follows. The clauses are
T 0, ∀ncn (Tn →c T (Sn)),

and its elimination axiom is

∀ncn (Tn →c P0 →c ∀ncn (Tn →c Pn →c P(Sn)) →c Pn).

If T r holds, then this fact must have been derived from the clauses, and
hence we have a total ideal in an algebra built in correspondence with the
clauses, which in this case is N again. The predicate T can be understood
as the least set of witness–argument pairs satisfying the clauses; the witness
being a total ideal.
7.2.1. An informal explanation. The ideas that we develop here are
illustrated by the following simple situation. The computational content
of an implication Pn →c P(Sn) is that demanded of an implication by the
BHK interpretation, namely a function from evidence for Pn to evidence
for P(Sn). The universal quantifier ∀n is non-computational if it merely
supplies n as an “input”, whereas to say that a universal quantifier is
computational means that a construction of input n is also supplied.
Thus a realization of
∀ncn (Pn →c P(Sn))
will be a unary function f such that if r “realizes” Pn, then fr realizes
P(Sn), for every n. On the other hand, a realization of
∀cn (Pn →c P(Sn))
will be a binary function g which, given a number n and a realization r of
Pn, produces a realization g(n, r) of P(Sn). Therefore an induction with
basis and step of the form
P0, ∀ncn (Pn →c P(Sn))

will be realized by the iterates f^(n)(r0), whereas a computational induction

P0, ∀cn (Pn →c P(Sn))

will be realized by the primitive recursively defined h(n, r0) where h(0, r0) = r0 and h(Sn, r0) = g(n, h(n, r0)).
Finally, a word about the non-computational implication: a realizer of
A →nc B will depend solely on the existence of a realizer of A, but will
be completely independent of which one it is. An example would be an
induction
P0, ∀cn (Pn →nc P(Sn))
where the realizer h(n, r0 ) is given by h(0, r0 ) = r0 , h(Sn, r0 ) = g(n),
without recursive calls. The point is that in this case g does not depend
on a realizer for Pn, only upon the number n itself.
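The three realizer shapes can be summarized in a short Haskell sketch (names ours; Int stands in for N and nonnegative inputs are assumed):

-- (1) non-computational quantifier, computational implication:
--     the step realizer is f :: r -> r; induction is realized by iteration
indNC :: r -> (r -> r) -> Int -> r
indNC r0 f n = iterate f r0 !! n            -- f^(n)(r0)

-- (2) computational quantifier and implication:
--     the step realizer g also receives n; primitive recursion h
indC :: r -> (Int -> r -> r) -> Int -> r
indC r0 g 0 = r0
indC r0 g n = g (n - 1) (indC r0 g (n - 1)) -- h(Sn,r0) = g(n, h(n,r0))

-- (3) computational quantifier, non-computational implication:
--     g may use n but no realizer of Pn; no recursive call is needed
indCNC :: r -> (Int -> r) -> Int -> r
indCNC r0 g 0 = r0
indCNC r0 g n = g (n - 1)                   -- h(Sn,r0) = g(n)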
7.2.2. Decorating → and ∀. We adapt the definition in 7.1.2 of pre-
dicates and formulas to the newly introduced decorated connectives →c , ∀c
and →nc , ∀nc . Let → denote either →c or →nc , and similarly ∀ either ∀c or
∀nc . Then the definition in 7.1.2 can be read as it stands.
We also need to adapt our definition of TCF to the decorated connec-
tives →c , →nc and ∀c , ∀nc . The introduction and elimination rules for →c
and ∀c remain as before, and also the elimination rules for →nc and ∀nc .
However, the introduction rules for →nc and ∀nc must be restricted: the
abstracted (assumption or object) variable must be “non-computational”,
in the following sense. Simultaneously with a derivation M we define the sets CV(M) and CA(M) of computational object and assumption variables of M, as follows. Let M^A be a derivation. If A is non-computational (n.c.), i.e., the type τ(A) of A (defined below in 7.2.4) is the “nulltype” symbol ◦, then CV(M^A) := CA(M^A) := ∅. Otherwise

CV(c^A) := ∅ (c^A an axiom),
CV(u^A) := ∅,
CV((λu M^B)^{A→cB}) := CV((λu M^B)^{A→ncB}) := CV(M),
CV((M^{A→cB} N^A)^B) := CV(M) ∪ CV(N),
CV((M^{A→ncB} N^A)^B) := CV(M),
CV((λx M^A)^{∀cx A}) := CV((λx M^A)^{∀ncx A}) := CV(M) \ {x},
CV((M^{∀cx A(x)} r)^{A(r)}) := CV(M) ∪ FV(r),
CV((M^{∀ncx A(x)} r)^{A(r)}) := CV(M),
and similarly
CA(c^A) := ∅ (c^A an axiom),
CA(u^A) := {u},
CA((λu M^B)^{A→cB}) := CA((λu M^B)^{A→ncB}) := CA(M) \ {u},
CA((M^{A→cB} N^A)^B) := CA(M) ∪ CA(N),
CA((M^{A→ncB} N^A)^B) := CA(M),
CA((λx M^A)^{∀cx A}) := CA((λx M^A)^{∀ncx A}) := CA(M),
CA((M^{∀cx A(x)} r)^{A(r)}) := CA((M^{∀ncx A(x)} r)^{A(r)}) := CA(M).
The introduction rules for →nc and ∀nc then are
(i) If M^B is a derivation and u^A ∉ CA(M), then (λu M^B)^{A→ncB} is a derivation.
(ii) If M^A is a derivation, x is not free in any formula of a free assumption variable of M and x ∉ CV(M), then (λx M^A)^{∀ncx A} is a derivation.
An alternative way to formulate these rules is simultaneously with the
notion of the “extracted term” et(M ) of a derivation M . This will be
done in 7.2.5.
Formulas can be decorated in many different ways, and it is a natural question to ask when one such decoration A′ is “stronger” than another one A, in the sense that the former computationally implies the latter, i.e., A′ →c A. We give a partial answer to this question in the proposition below.
We define a relation A′ + A (A′ is a computational strengthening of A) between c.r. formulas A′, A inductively. It is reflexive, transitive and satisfies

(A →nc B) + (A →c B),
(A →c B) + (A →nc B) if A is n.c.,
(A → B′) + (A → B) if B′ + B, with → ∈ {→c, →nc},
(A → B) + (A′ → B) if A′ + A, with → ∈ {→c, →nc},
∀ncx A + ∀cx A,
∀x A′ + ∀x A if A′ + A, with ∀ ∈ {∀c, ∀nc}.
Proposition. If A′ + A, then A′ →c A.

Proof. We show that the relation “A′ →c A” has the same closure properties as “A′ + A”. For reflexivity and transitivity this is clear. For the rest we give some sample derivations.

  A →nc B    u : A
  ────────────────
         B
  ──────────────── (→c)+, u
       A →c B

  A →nc B′   u : A        | assumed
  ────────────────
        B′               B′ →c B
  ───────────────────────────────
               B
  ─────────────────────────────── (→nc)+, u
            A →nc B

where in the last derivation the final (→nc)+-application is correct since u is not a computational assumption variable in the premise derivation of B.

  | assumed
  A′ →c A    u : A′
  ─────────────────
        A               A →nc B
  ──────────────────────────────
               B
  ────────────────────────────── (→nc)+, u
           A′ →nc B

where for the same reason the final (→nc)+-application is correct.
Remark. In 7.2.6 we shall define decorated variants ∃d, ∃l, ∃r, ∧d, ∧l, ∧r, ∨d, ∨l, ∨r, ∨u of the existential quantifier, conjunction and disjunction. For formulas involving these the proposition continues to hold if the definition of computational strengthening is extended by

∃dx A + ∃lx A, ∃rx A,
∃x A′ + ∃x A if A′ + A, with ∃ ∈ {∃d, ∃l, ∃r},
(A ∧d B) + (A ∧l B), (A ∧r B),
(A′ ∧ B) + (A ∧ B) if A′ + A, with ∧ ∈ {∧d, ∧l, ∧r},
(A ∧ B′) + (A ∧ B) if B′ + B, with ∧ ∈ {∧d, ∧l, ∧r},
(A ∨d B) + (A ∨l B), (A ∨r B) + (A ∨u B),
(A′ ∨ B) + (A ∨ B) if A′ + A, with ∨ ∈ {∨d, ∨l, ∨r, ∨u},
(A ∨ B′) + (A ∨ B) if B′ + B, with ∨ ∈ {∨d, ∨l, ∨r, ∨u}.
7.2.3. Decorating inductive definitions. For the introduction and
elimination axioms of computationally relevant (c.r.) inductively defined
predicates I (which is the default case) we can now use arbitrary formulas;
these axioms need to be carefully decorated. In particular, in all clauses
the → after recursive premises must be →c . Generally, the introduction
axioms (or clauses) are

∀x (A → (∀y (B → I s))ν<n →c I t), (10)

and the elimination (or least-fixed-point) axiom is

∀ncx (I x →c (Ki (I, P))i<k →c P x), (11)

where

Ki (I, P) := ∀x (A → (∀y (B → I s))ν<n →c (∀y (B → P s))ν<n →c P t).

The decorated form of the general induction schema is

∀cμ∈T ∀cx∈T (Progμ,x Px →c Px) (12)

with

Progμ,x Px := ∀cx∈T (∀cy∈T (μy < μx →nc Py) →c Px).
We have made use here of totality predicates and the abbreviation ∀cx∈T A;
both are introduced in 7.2.7 below.
The next thing to do is to view a formula A as a “computational prob-
lem”, as done by Kolmogorov [1932]. Then what should be the solution to
the problem posed by the formula I r, where I is inductively defined? The
obvious idea here is to take a “generation tree”, witnessing how the argu-
ments r were put into I . For example, consider the clauses Even(0) and
∀ncn (Even(n) →c Even(S(Sn))). A generation tree for Even(6) should con-
sist of a single branch with nodes Even(0), Even(2), Even(4) and Even(6).
When we want to generally define this concept of a generation tree, it seems natural to let the clauses of I determine the algebra to which such trees belong. Hence we will define ιI to be the algebra μξ (κ0, . . . , κk−1) generated from constructor types κi := τ(Ki), where Ki is the i-th clause of the inductive definition of I as μX (K0, . . . , Kk−1), and τ(Ki) is the type of the clause Ki, relative to τ(X r) := ξ.
More formally, along the inductive definition of formulas, predicates and clauses we will define
(i) the type τ(A) of a formula A, and in particular when A is computationally relevant (c.r.);
(ii) the formula t realizes A, written t r A, for t a term of type τ(A).
This will require other subsidiary notions: for a (c.r.) inductively defined I, (i) its associated algebra ιI of witnesses or generating trees, and (ii) a witnessing predicate I r of arity (ιI, ρ), where ρ is the arity of I. All these notions are defined simultaneously.
We will also allow non-computational (n.c.) inductively defined predi-
cates. However, some restrictions apply for the soundness theorem (cf.
7.2.8) to hold.
A formula A is called invariant (under realizability) if ∃x (x r A) (defined
below) is equivalent to A. Now the restrictions are as follows.
(a) An arbitrary inductively defined predicate I can be marked as non-
computational if in each clause (10) the parameter premises A and all
premises B of recursive premises are invariant. Then the elimination
scheme for I must be restricted to non-computational formulas.
(b) Moreover, there are some special non-computational inductively de-
fined predicates whose elimination schemes need not be restricted.
(i) For every I its witnessing predicate I r. It is special in the sense that I r ts only states that we do have a realizer t for I s.
(ii) Leibniz equality Eq, and uniform (or non-computational) ver-
sions ∃u and ∧u of the existential quantifier and of conjunc-
tion. These are special in the sense that they are defined by just
one clause, which contains →nc , ∀nc only and has no recursive
premises.
7.2.4. The type of a formula, realizability and witnesses. We begin with the definition of the type τ(A) of a formula A, the type of a potential realizer of A. More precisely, τ(A) should be the type of the term (or “program”) to be extracted from a proof of A. Formally, we assign to every formula A an object τ(A) (a type or the “nulltype” symbol ◦). In case τ(A) = ◦ proofs of A have no computational content; such formulas A are called non-computational (n.c.) (or Harrop formulas); the other ones are called computationally relevant (c.r.). The definition can be conveniently written if we extend the use of ρ → σ to the nulltype symbol ◦:

(ρ → ◦) := ◦, (◦ → σ) := σ, (◦ → ◦) := ◦.
With this understanding of → we can simply write
Definition (Type τ(A) of a formula A).

τ(I r) := ιI if I is c.r., ◦ if I is n.c.,
τ(A →c B) := (τ(A) → τ(B)), τ(A →nc B) := τ(B),
τ(∀cx A) := (ρ → τ(A)) for x of type ρ, τ(∀ncx A) := τ(A).
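These clauses are directly executable. A Haskell sketch (all names ours; the formula syntax is a simplified stand-in for that of TCF):

data Ty = Base String | Arr Ty Ty deriving Show

data Tau = Null | T Ty deriving Show    -- tau(A): a type, or the nulltype o

data Formula
  = Pred Bool String        -- an inductively defined I; True means c.r.
  | ImpC  Formula Formula   -- A ->c B
  | ImpNC Formula Formula   -- A ->nc B
  | AllC  Ty Formula        -- forall^c_x A, with x of the given type rho
  | AllNC Formula           -- forall^nc_x A

-- the extended arrow: (rho -> o) = o, (o -> sigma) = sigma, (o -> o) = o
arr :: Tau -> Tau -> Tau
arr _     Null  = Null
arr Null  t     = t
arr (T a) (T b) = T (Arr a b)

tau :: Formula -> Tau
tau (Pred cr i)  = if cr then T (Base ("iota_" ++ i)) else Null
tau (ImpC a b)   = arr (tau a) (tau b)
tau (ImpNC _ b)  = tau b
tau (AllC rho a) = arr (T rho) (tau a)
tau (AllNC a)    = tau a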
We now define realizability. It will be convenient to introduce a special
“nullterm” symbol ε to be used as a “realizer” for n.c. formulas. We
extend term application to the nullterm symbol by
εt := ε, tε := t, εε := ε.
The definition uses the witnessing predicate I r associated with I, which is introduced below.

Definition (t realizes A). Let A be a formula and t either a term of type τ(A) if the latter is a type, or the nullterm symbol ε for n.c. A.

t r I s := I r ts if I is c.r.,
t r (A →c B) := ∀ncx (x r A →nc tx r B),
t r (A →nc B) := ∀ncx (x r A →nc t r B),
t r ∀cx A := ∀ncx (tx r A),
t r ∀ncx A := ∀ncx (t r A).

In case A is n.c., ∀ncx (x r A →nc B(x)) means ε r A →nc B(ε). For a general n.c. inductively defined predicate (with restricted elimination scheme) we define ε r I s to be I s. For the special n.c. inductively defined predicates introduced below realizability is defined by

ε r I r ts := I r ts,
ε r Eq(t, s) := Eq(t, s),
ε r ∃ux A := ∃ux,y (y r A),
ε r (A ∧u B) := ∃ux (x r A) ∧u ∃uy (y r B).
Note. Call two formulas A and A′ computationally equivalent if each of them computationally implies the other, and in addition the identity realizes each of the two derivations of A →c A′ and of A′ →c A. It is an easy exercise to verify that for n.c. A, the formulas A →c B and A →nc B are computationally equivalent, and hence can be identified. In the sequel we shall simply write A → B for either of them. Similarly, for n.c. A the two formulas ∀cx A and ∀ncx A are n.c., and both ε r ∀cx A and ε r ∀ncx A are defined to be ∀x (ε r A). Hence they can be identified as well, and we shall simply write ∀x A for either of them. Since the formula t r A is n.c., under this convention the →, ∀-cases in the definition of realizability can be written

t r (A →c B) := ∀x (x r A → tx r B),
t r (A →nc B) := ∀x (x r A → t r B),
t r ∀cx A := ∀x (tx r A),
t r ∀ncx A := ∀x (t r A).
For every c.r. inductively defined predicate I = μX (K0, . . . , Kk−1) we define the algebra ιI of its generation trees or witnesses.

Definition (Algebra ιI of witnesses). Each clause Ki generates a constructor type κi := τ(Ki), relative to τ(X r) := ξ. Then ιI := μξ (κ0, . . . , κk−1).

The witnessing predicate I r of arity (ιI, ρ) can now be defined as follows.
Definition (Witnessing predicate I r). For every clause

K = ∀x (A → (∀y (B → X s))ν<ni →c X t)

of the original inductive definition of I we require the introduction axiom

∀x,u,f (u r A → (∀y,v (v r B → I r (f y v, s)))ν<n → I r (C x u f, t)), (13)

with the understanding that
(i) only those xj with a computational ∀cxj in K, and
(ii) only those ui with Ai c.r. and followed by →c in K
occur as arguments in C x u f; similarly for y, v and f y v. Here C is the constructor of the algebra ιI generated from the constructor type τ(K).
Notice that in the clause K above →, ∀ can be either of →c or →nc and
∀c or ∀nc , depending on how the clause is formulated. However, in the
introduction axiom (13) all displayed →, ∀ mean →nc , ∀nc , according to
our convention in the note above.
The elimination axiom is

∀ncx ∀cw (I r w x → (Kir (I r, Q))i<k →c Qw x) (14)

with

Kir (I r, Q) := ∀ncx,u,f (u r A → (∀y,v (v r B → I r (f y v, s)))ν<n →
  (∀cy,v (v r B → Q(f y v, s)))ν<n →c
  Q(Ci x u f, t)).
To understand this definition one needs to look at examples. Consider
the totality predicate T for N inductively defined by the clauses

T 0, ∀ncn (Tn →c T (Sn)).

More precisely T := μX (K0, K1) with K0 := X 0, K1 := ∀ncn (Xn →c X (Sn)). These clauses have types κ0 := τ(K0) = τ(X 0) = ξ and κ1 := τ(K1) = τ(∀ncn (Xn →c X (Sn))) = ξ → ξ. Therefore the algebra of witnesses is ιT := μξ (ξ, ξ → ξ), that is, N again. The witnessing predicate T r is defined by the clauses

T r 00, ∀n,m (T r mn → T r (Sm, Sn))

and it has as its elimination axiom

∀ncn ∀cm (T r mn → Q(0, 0) →c
  ∀ncn,m (T r mn →c Qmn →c Q(Sm, Sn)) →c
  Qmn).
As an example involving parameters, consider the formula ∃dx A with a c.r. formula A, and view ∃dx A as inductively defined by the clause

∀cx (A →c ∃dx A).

More precisely, Exd (Y) := μX (K0) with K0 := ∀cx (Yx →c X). Then ∃dx A abbreviates Exd ({x | A}). The single clause has type κ0 := τ(K0) = τ(∀cx (Yx →c X)) = ρ → α → ξ. Therefore the algebra of witnesses is ι := ι∃dx A := μξ (ρ → α → ξ), that is, ρ × α. We write ⟨x, u⟩ for the values of the (only) constructor of ι, i.e., the pairing operator. The witnessing predicate (∃dx A)r is defined by the clause K0r ((∃dx A)r, {x | A}) :=

∀x,u (u r A → (∃dx A)r ⟨x, u⟩)

and its elimination axiom is

∀cw ((∃dx A)r w → ∀ncx,u (u r A →c Q⟨x, u⟩) →c Qw).
Definition (Leibniz equality Eq and ∃u, ∧u). The introduction axioms are

∀ncx Eq(x, x), ∀ncx (A →nc ∃ux A), A →nc B →nc A ∧u B,

and the elimination axioms are

∀ncx,y (Eq(x, y) →nc ∀ncx Pxx →c Pxy),
∃ux A →nc ∀ncx (A →nc P) →c P,
A ∧u B →nc (A →nc B →nc P) →c P.
An important property of the realizing formulas t r A is that they are
invariant.
Proposition. ε r (t r A) is the same formula as t r A.
Proof. By induction on the simultaneous inductive definition of for-
mulas and predicates in 7.1.2.
Case t r I s. By definition the formulas ε r (t r I s ), ε r I r ts, I r ts and
t r I s are identical.
Case I r ts. By definition ε r (ε r I r ts ) and ε r I r ts are identical.
Case Eq(t, s). By definition ε r (ε r (Eq(t, s))) and ε r (Eq(t, s)) are
identical.
Case ∃ux A. The following formulas are identical.
ε r (ε r ∃ux A),
ε r ∃ux ∃uy (y r A),
∃ux (ε r ∃uy (y r A)),
∃ux ∃uy (ε r (y r A)),
∃ux ∃uy (y r A) by induction hypothesis,
ε r ∃ux A.
Case A ∧u B. The following formulas are identical.
ε r (ε r (A ∧u B)),
ε r (∃ux (x r A) ∧u ∃uy (y r B)),
ε r ∃ux (x r A) ∧u ε r ∃uy (y r B),
∃ux (ε r (x r A)) ∧u ∃uy (ε r (y r B)),
∃ux (x r A) ∧u ∃uy (y r B) by induction hypothesis,
ε r (A ∧u B).
Case A →c B. The following formulas are identical.
ε r (t r (A →c B)),
ε r ∀x (x r A → tx r B),
∀x (ε r (x r A) → ε r (tx r B)),
∀x (x r A → tx r B) by induction hypothesis,
t r (A →c B).
Case A →nc B. The following formulas are identical.
ε r (t r (A →nc B)),
ε r ∀x (x r A → t r B),
∀x (ε r (x r A) → ε r (t r B)),
∀x (x r A → t r B) by induction hypothesis,
t r (A →nc B).
Case ∀cx A. The following formulas are identical.

ε r (t r ∀cx A),
ε r ∀x (tx r A),
∀x (ε r (tx r A)),
∀x (tx r A) by induction hypothesis,
t r ∀cx A.

Case ∀ncx A. The following formulas are identical.

ε r (t r ∀ncx A),
ε r ∀x (t r A),
∀x (ε r (t r A)),
∀x (t r A) by induction hypothesis,
t r ∀ncx A.

This completes the proof.
7.2.5. Extracted terms. For a derivation M of a formula A we define
its extracted term et(M ), of type (A). This definition is relative to a
fixed assignment of object variables to assumption variables: to every
assumption variable u A for a formula A we assign an object variable xu of
type (A).
Definition (Extracted term et(M) of a derivation M). For derivations M^A with A n.c. let et(M^A) := ε. Otherwise

et(u^A) := x_u^{τ(A)} (x_u^{τ(A)} uniquely associated with u^A),
et((λu M^B)^{A→cB}) := λ_{x_u} et(M),
et((M^{A→cB} N^A)^B) := et(M) et(N),
et((λx M^A)^{∀cx A}) := λx et(M),
et((M^{∀cx A(x)} r)^{A(r)}) := et(M) r,
et((λu M^B)^{A→ncB}) := et(M),
et((M^{A→ncB} N^A)^B) := et(M),
et((λx M^A)^{∀ncx A}) := et(M),
et((M^{∀ncx A(x)} r)^{A(r)}) := et(M).

Here λ_{x_u} et(M) means just et(M) if A is n.c.
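Extraction is thus an erasure map on derivation terms. A Haskell sketch (all constructor names ours; for brevity we omit the blanket case et(M^A) := ε for n.c. A, which would require carrying formula information with each derivation):

data Term = Var String | App Term Term | Lam String Term | Eps
  deriving Show

data Deriv
  = Assum String             -- u^A, with associated object variable x_u
  | ImpCIntro String Deriv   -- (lambda_u M)^{A ->c B}
  | ImpCElim Deriv Deriv     -- (M N) with M : A ->c B
  | ImpNCIntro Deriv         -- (lambda_u M)^{A ->nc B}
  | ImpNCElim Deriv Deriv    -- (M N) with M : A ->nc B
  | AllCIntro String Deriv   -- (lambda_x M)^{forall^c_x A}
  | AllCElim Deriv Term      -- (M r) with M : forall^c_x A(x)
  | AllNCIntro Deriv         -- (lambda_x M)^{forall^nc_x A}
  | AllNCElim Deriv          -- (M r) with M : forall^nc_x A(x)

et :: Deriv -> Term
et (Assum u)       = Var ("x_" ++ u)
et (ImpCIntro u m) = Lam ("x_" ++ u) (et m)
et (ImpCElim m n)  = App (et m) (et n)
et (ImpNCIntro m)  = et m        -- the n.c. hypothesis leaves no trace
et (ImpNCElim m _) = et m
et (AllCIntro x m) = Lam x (et m)
et (AllCElim m r)  = App (et m) r
et (AllNCIntro m)  = et m
et (AllNCElim m)   = et m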
It remains to define extracted terms for the axioms. Consider a (c.r.) inductively defined predicate I. For its introduction axioms (1) and elimination axiom (2) define

et(Ii+) := Ci, et(I −) := R,

where both the constructor Ci and the recursion operator R refer to the algebra ιI associated with I.
Now consider the special non-computational inductively defined predicates. Since they are n.c., we only need to define extracted terms for their elimination axioms. For the witnessing predicate I r we define et((I r)−) := R (referring to the algebra ιI again), and for Leibniz equality Eq, the n.c. existential quantifier ∃ux A and conjunction A ∧u B we take identities of the appropriate type.
If derivations M are defined simultaneously with their extracted terms et(M), we can formulate the introduction rules for →nc and ∀nc by
(i) If M^B is a derivation and x_u ∉ FV(et(M)), then (λu M^B)^{A→ncB} is a derivation.
(ii) If M^A is a derivation, x is not free in any formula of a free assumption variable of M and x ∉ FV(et(M)), then (λx M^A)^{∀ncx A} is a derivation.
7.2.6. Computational variants of some inductively defined predicates.
We can now define variants of the inductively defined predicates in 7.1.4
and 7.1.5, which take computational aspects into account. For ∃, ∧ and
∨ we obtain ∃d, ∃l, ∃r, ∃u, ∧d, ∧l, ∧r, ∧u, ∨d, ∨l, ∨r, ∨u with d for “double”, l for “left”, r for “right” and u for “uniform”. They are defined by their introduction and elimination axioms, which involve both →c, ∀c and →nc, ∀nc. For ∃u and ∧u these have already been defined (in 7.2.4). For the remaining ones they are

∀cx (A →c ∃dx A), ∃dx A →c ∀cx (A →c P) →c P,
∀cx (A →nc ∃lx A), ∃lx A →c ∀cx (A →nc P) →c P,
∀ncx (A →c ∃rx A), ∃rx A →c ∀ncx (A →c P) →c P,

and similarly for ∧:
A →c B →c A ∧d B, A ∧d B →c (A →c B →c P) →c P,
A →c B →nc A ∧l B, A ∧l B →c (A →c B →nc P) →c P,
A →nc B →c A ∧r B, A ∧r B →c (A →nc B →c P) →c P
and for ∨:
A →c A ∨d B, B →c A ∨d B,
A →c A ∨l B, B →nc A ∨l B,
A →nc A ∨r B, B →c A ∨r B,
A →nc A ∨u B, B →nc A ∨u B
with elimination axioms
A ∨d B →c (A →c P) →c (B →c P) →c P,
A ∨l B →c (A →c P) →c (B →nc P) →c P,
A ∨r B →c (A →nc P) →c (B →c P) →c P,
A ∨u B →c (A →nc P) →c (B →nc P) →c P.
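In terms of realizer types the four disjunctions differ exactly in what they keep. A Haskell sketch of the analogues, for τ(A) = a and τ(B) = b (type names ours):

type OrD a b = Either a b   -- A v^d B: which side holds, plus its witness
type OrL a   = Maybe a      -- A v^l B: a witness for A, or bare "right"
type OrR b   = Maybe b      -- A v^r B: bare "left", or a witness for B
type OrU     = Bool         -- A v^u B: only which side holds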
Let ≺ be a binary relation. A computational variant of the inductively defined transitive closure of ≺ has introduction axioms

∀cx,y (x ≺ y →nc TC(x, y)),
∀cx,y ∀ncz (x ≺ y →nc TC(y, z) →c TC(x, z)),

and the elimination axiom is according to (11)

∀ncx,y (TC(x, y) →c ∀cx,y (x ≺ y →nc Pxy) →c
  ∀cx,y ∀ncz (x ≺ y →nc TC(y, z) →c Pyz →c Pxz) →c
  Pxy).

Consider the accessible part of a binary relation ≺. A computational variant is determined by the introduction axioms

∀cx (F → Acc(x)),
∀ncx (∀cy≺x Acc(y) →c Acc(x)),

where ∀cy≺x A stands for ∀cy (y ≺ x →nc A). The elimination axiom is

∀ncx (Acc(x) →c ∀cx (F →c Px) →c
  ∀ncx (∀cy≺x Acc(y) →c ∀cy≺x Py →c Px) →c
  Px).
7.2.7. Computational variants of totality and induction. We now adapt
the treatment of totality and induction in 7.1.6 to the decorated con-
nectives →c , ∀c and →nc , ∀nc , giving computational variants of totality.
Their elimination axiom provides us with a computational induction ax-
iom, whose extracted term is the recursion operator of the corresponding
algebra.
Recall that the definition of the totality predicates Tρ was relative to a given assignment α ↦ Tα of predicate variables to type variables. In the definition of Tι the clauses are decorated by

Ki := ∀cxP ∀ncxR ((∀ncy (T y →c X (xνR y)))ν<n →c X (Ci x)),

and in the arrow type case the (explicit) clause is decorated by

Tρ→σ := μX ∀ncf (∀ncx (T x →c T (fx)) →c Xf).

Abbreviating ∀ncx (Tx →c A) by ∀cx∈T A allows a shorter formulation of the introduction axioms and elimination schemes:

∀ncf (∀cx∈T T (fx) →c Tρ→σ f),
∀cf∈Tρ→σ, x∈T T (fx),
∀cxP ∀ncxR∈T ((∀cy∈T T (xνR y))ν<n →c T (Ci x)),
∀cx∈T (K0 (T, P) →c · · · →c Kk−1 (T, P) →c Px)

where Ki (T, P) :=

∀cxP ∀ncxR∈T ((∀cy∈T T (xνR y))ν<n →c (∀cy∈T P(xνR y))ν<n →c P(Ci x)).
It is helpful to look at some examples. Let (Tι)i+ denote the i-th introduction axiom for Tι.

(TN)1+ : ∀cn∈T T (Sn),
(TL(α))1+ : ∀cx ∀cl∈T T (x :: l),
(Tρ×σ)0+ : ∀cx,y T ⟨x, y⟩.
The elimination axiom Tι− now is the computational induction axiom, and is denoted accordingly. Examples are

Indp,P : ∀cp∈T (Ptt →c Pff →c Pp^B),
Indn,P : ∀cn∈T (P0 →c ∀cn∈T (Pn →c P(Sn)) →c Pn^N),
Indl,P : ∀cl∈T (P(nil) →c ∀cx ∀cl∈T (Pl →c P(x :: l)) →c Pl^L(α)),
Indz,P : ∀cz∈T (∀cx,y P⟨x, y⟩ →c Pz^ρ×σ).
Notice that for the totality predicates Tρ the type τ(Tρ r) is ρ, provided we extend the definition of τ(A) to the predicate variable Tα assigned to type variable α by stipulating τ(Tα r) := α. This can be proved easily, by induction on ρ. As a consequence, the types of the various computational induction schemes are, with τ := τ(A):

τ(Indp,A) = B → τ → τ → τ,
τ(Indn,A) = N → τ → (N → τ → τ) → τ,
τ(Indl,A) = L(α) → τ → (α → L(α) → τ → τ) → τ,
τ(Indx,A) = α + β → (α → τ) → (β → τ) → τ,
τ(Indz,A) = α × β → (α → β → τ) → τ.

These are the types of the corresponding recursion operators.

The type of general induction (12) is

(α → N) → α → (α → (α → τ) → τ) → τ,

which is the type of the general recursion operator F defined in (5).
7.2.8. Soundness. We prove that every theorem in TCF + Axnci has a
realizer: the extracted term of its proof. Here (Axnci ) is an arbitrary set
of non-computational invariant formulas viewed as axioms.
Theorem (Soundness). Let M be a derivation of A from assumptions
ui : Ci (i < n). Then we can derive et(M ) r A from assumptions xui r Ci
(with xui := ε in case Ci is n.c.).
If not stated otherwise, all derivations are in TCF + Axnci . The proof
is by induction on M .
Proof for the logical rules. Case u : A. Then et(u) = xu.

Case (λu M^B)^{A→cB}. We must find a derivation of

et(λu M) r (A →c B), which is ∀x (x r A → (et(λu M))x r B).

Recall that et(λu M) = λ_{xu} et(M). Renaming x into xu, our goal is to find a derivation of

∀xu (xu r A → et(M) r B),

since we identify terms with the same βη-normal form. But by induction hypothesis we have a derivation of et(M) r B from xu r A. An → and ∀ introduction then give the desired result.
Case M^{A→cB} N^A. We must find a derivation of et(MN) r B. Recall et(MN) = et(M) et(N). By induction hypothesis we have derivations of

et(M) r (A →c B), which is ∀x (x r A → et(M)x r B),

and of et(N) r A. Hence, again by logic, the claim follows.

Case (λx M^A)^{∀cx A}. We must find a derivation of et(λx M) r ∀cx A. By definition et(λx M) = λx et(M). Hence we must derive

λx et(M) r ∀cx A, which is ∀x ((λx et(M))x r A).
Since we identify terms with the same βη-normal form, the claim follows from the induction hypothesis.

Case M^{∀cx A(x)} t. We must find a derivation of et(Mt) r A(t). By definition et(Mt) = et(M)t, and by induction hypothesis we have a derivation of

et(M) r ∀cx A(x), which is ∀x (et(M)x r A(x)).

Hence the claim.

Case (λu M^B)^{A→ncB}. We must find a derivation of et(M) r (A →nc B), i.e., of ∀y (y r A → et(M) r B). But this is immediate from the induction hypothesis.

Case M^{A→ncB} N^A. We must find a derivation of et(M) r B. By induction hypothesis we have derivations of

et(M) r (A →nc B), which is ∀y (y r A → et(M) r B),

and of et(N) r A. Hence the claim.

Case (λx M^A)^{∀ncx A}. We must find a derivation of et(λx M) r ∀ncx A. By definition et(λx M) = et(M). Hence we must derive

et(M) r ∀ncx A, which is ∀x (et(M) r A).

But this follows from the induction hypothesis.

Case M^{∀ncx A(x)} t. We must find a derivation of et(Mt) r A(t). By definition et(Mt) = et(M), and by induction hypothesis we have a derivation of

et(M) r ∀ncx A(x), which is ∀x (et(M) r A(x)).

Hence the claim.
It remains to prove the soundness theorem for the axioms, i.e., that
their extracted terms are realizers. Before doing anything general let us
first look at an example. Totality for N has been inductively defined by
the clauses
T 0, ∀ncn (Tn →c T (Sn)).

Its elimination axiom is

∀ncn (Tn →c P0 →c ∀ncn (Tn →c Pn →c P(Sn)) →c Pn).

We show that their extracted terms 0, S and R are indeed realizers. For the proof recall from the examples in 7.2.4 that the witnessing predicate
We show that their extracted terms 0, S and R are indeed realizers. For
the proof recall from the examples in 7.2.4 that the witnessing predicate
T r is defined by the clauses
T r 00, ∀n,m (T r mn → T r (Sm, Sn)),
and it has as its elimination axiom

∀ncn ∀cm (T r mn → Q00 →c
  ∀ncn,m (T r mn →c Qmn →c Q(Sm, Sn)) →c
  Qmn).
Lemma. (a) 0 r T 0 and S r ∀ncn (Tn →c T (Sn)).
(b) R r ∀ncn (Tn →c P0 →c ∀ncn (Tn →c Pn →c P(Sn)) →c Pn).

Proof. (a) 0 r T 0 is defined to be T r 00. Moreover, by definition S r ∀ncn (Tn →c T (Sn)) unfolds into ∀n,m (T r mn → T r (Sm, Sn)).

(b) Let n, m be given and assume m r Tn. Let further w0, w1 be such that w0 r P0 and w1 r ∀ncn (Tn →c Pn →c P(Sn)), i.e.,

∀n,m (T r mn → ∀g (g r Pn → w1 mg r P(Sn))).
Our goal is
Rmw0 w1 r Pn =: Qmn.
To this end we use the elimination axiom for T r above. Hence it suffices
to prove its premises Q00 and ∀ncn,m (T mn → Qmn → Q(Sm, Sn)). By a
r c
conversion rule for R (cf. 6.2.2) the former is the same as w0 r P0, which
we have. For the latter assume n, m and its premises. We show Q(Sm, Sn),
i.e., R(Sm)w0 w1 r P(Sn). By a conversion rule for R this is the same as
w1 m(Rmw0 w1 ) r P(Sn).
But with g := Rmw0 w1 this follows from what we have.
Proof for the axioms. We first prove soundness for introduction and
elimination axioms of c.r. inductively defined predicates, and show that
the extracted terms defined above indeed are realizers. The proof uses the
introduction axioms (13) and the elimination axiom (14) for I r .
By the clauses (13) for I r we clearly have Ci r Ii+ . For the elimination
axiom we have to prove R r I − , that is,
R r ∀ncx (I x →c (Ki (I, P))i<k →c P x).

Let x, w be given and assume w r I x. Let further w0, . . . , wk−1 be such that wi r Ki (I, P). For simplicity we assume that all universal quantifiers and implications in Ki are computational. Then wi r Ki (I, P) is

∀x,u,f,g (u r A → (∀y,v (v r B → f y v r I(s)))ν<n →
  (∀y,v (v r B → g y v r P(s)))ν<n → (15)
  wi x u f g r P(t)).

Our goal is

R w w⃗ r P x =: Q w x.
We use the elimination axiom (14) for I r with Q(w, x), i.e.,

∀ncx ∀cw (I r w x → (Kir (I r, Q))i<k →c Q w x).

Hence it suffices to prove Kir (I r, Q) for every constructor formula Ki, i.e.,

∀ncx,u,f (u r A → (∀y,v (v r B → I r (f y v, s)))ν<n →
  (∀cy,v (v r B → Q(f y v, s)))ν<n →c (16)
  Q(Ci x u f, t)).

So assume x, u, f and the premises of (16). We show Q(Ci x u f, t), i.e.,

R(Ci x u f)w⃗ r P(t).

By the conversion rules for R (cf. 6.2.2) this is the same as

wi x u f (λy,v R(f y v)w⃗)ν<n r P(t).

To this end we use (15) with x, u, f, (λy,v R(f y v)w⃗)ν<n. Its conclusion is what we want, and its premises follow from the premises of (16).
Now consider non-computational inductively defined predicates. In the general case (with a restricted elimination scheme) we required that in a clause

∀x (A → (∀y (B → I s))ν<n → I t)

the parameter premises A and all premises B of recursive premises are invariant. Then the following are equivalent:

ε r ∀x (A → (∀y (B → I s))ν<n → I t),
∀x,u (u r A → (∀y,v (v r B → ε r I s))ν<n → ε r I t).

Now since A, B are invariant, ∃ui (ui r Ai) is equivalent to Ai, and similarly for B. Moreover by definition ε r I s is I s. Hence ε realizes every introduction axiom. For an elimination axiom

∀x (I x → (Ki (I, P))i<k → P x),

with

Ki (I, P) := ∀x (A → (∀y (B → I s))ν<n → (∀y (B → P s))ν<n → P t),

we have the restriction that P is non-computational. Hence the following are equivalent:

ε r ∀x (I x → (Ki (I, P))i<k → P x),
∀x (I x → (ε r Ki (I, P))i<k → ε r P x),

and for Ki (I, P)

ε r ∀x (A → (∀y (B → I s))ν<n → (∀y (B → P s))ν<n → P t)

is equivalent to

∀x,u (u r A → (∀y,v (v r B → ε r I s))ν<n → (∀y,v (v r B → ε r P s))ν<n → ε r P t).

Again because of the invariance of A, B the resulting formula is just another instance of the same elimination scheme, with ε r P s instead of P s.
We still need to attend to the special n.c. inductively defined predicates I r, Eq, ∃u and ∧u. For I r we must show that ε realizes the introduction axiom (13) and R realizes the elimination axiom (14). The former follows from the very same axiom, using the invariance of realizing formulas (as proved in the proposition at the end of 7.2.4). For the latter we can argue similarly as for the proof of R r I − above. However, we carry this out, since the way the decorations work is rather delicate here.

We have to prove R r (I r)−, that is,

R r ∀ncx ∀cw (I r w x → (Kir (I r, Q))i<k →c Q w x)

with Kir (I r, Q) as in (16). Let x, w be given and assume I r w x. Let further w0, . . . , wk−1 be such that wi r Kir (I r, Q), i.e.,

∀x,u,f,g (u r A → (∀y,v (v r B → I r (f y v, s)))ν<n →
  (∀y,v (v r B → g y v r Q(f y v, s)))ν<n → (17)
  wi x u f g r Q(Ci x u f, t)).

Our goal is

R w w⃗ r Q w x =: Q′ w x.

We use the elimination axiom (14) for I r with Q′ w x, i.e.,

∀ncx ∀cw (I r w x → (Kir (I r, Q′))i<k →c Q′ w x).

Hence it suffices to prove Kir (I r, Q′) for every constructor formula Ki, i.e.,

∀ncx,u,f (u r A → (∀y,v (v r B → I r (f y v, s)))ν<n →
  (∀cy,v (v r B → Q′(f y v, s)))ν<n →c (18)
  Q′(Ci x u f, t)).

So assume x, u, f and the premises of (18). We show Q′(Ci x u f, t), i.e.,

R(Ci x u f)w⃗ r Q(Ci x u f, t).

By the conversion rules for R this is the same as

wi x u f (λy,v R(f y v)w⃗)ν<n r Q(Ci x u f, t).

To this end we use (17) with x, u, f, (λy,v R(f y v)w⃗)ν<n. Its conclusion is what we want, and its premises follow from the premises of (18).
It remains to consider the introduction and elimination axioms for Eq, ∃u and ∧u. We first prove that ε is a realizer for the introduction axioms. The following formulas are identical by definition, and the final one in each block is derivable:

ε r ∀ncx Eq(x, x)
∀x (ε r Eq(x, x))
∀x Eq(x, x)

ε r ∀ncx (A →nc ∃ux A)
∀x (ε r (A →nc ∃ux A))
∀x,y (y r A → ε r ∃ux A)
∀x,y (y r A → ∃ux,y (y r A))

ε r (A →nc B →nc A ∧u B)
∀x (x r A → ∀y (y r B → ε r (A ∧u B)))
∀x (x r A → ∀y (y r B → ∃ux (x r A) ∧u ∃uy (y r B)))

We now prove that the identity is a realizer for the elimination axioms. Again the formulas in each block are identical by definition, and the final one is derivable.

id r ∀ncx,y (Eq(x, y) →nc ∀ncx Pxx →c Pxy)
∀x,y (ε r Eq(x, y) → id r (∀ncx Pxx →c Pxy))
∀x,y (Eq(x, y) → ∀z (z r ∀ncx Pxx → z r Pxy))
∀x,y (Eq(x, y) → ∀z (∀x (z r Pxx) → z r Pxy))

id r (∃ux A →nc ∀ncx (A →nc P) →c P)
ε r ∃ux A → id r (∀ncx (A →nc P) →c P)
∃ux,y (y r A) → ∀z (z r ∀ncx (A →nc P) → z r P)
∃ux,y (y r A) → ∀z (∀x (z r (A →nc P)) → z r P)
∃ux,y (y r A) → ∀z (∀x,y (y r A → z r P) → z r P)

id r (A ∧u B →nc (A →nc B →nc P) →c P)
ε r (A ∧u B) → id r ((A →nc B →nc P) →c P)
∃ux (x r A) ∧u ∃uy (y r B) → ∀z (z r (A →nc B →nc P) → z r P)
∃ux (x r A) ∧u ∃uy (y r B) → ∀z (∀x (x r A → ∀y (y r B → z r P)) → z r P)
We finally show that general recursion provides a realizer for general induction. Recall that according to (12) general induction is the schema

∀cμ∈T ∀cx∈T (Progμ,x Px →c Px)

where Progμ,x Px expresses “progressiveness” w.r.t. the measure function μ and the ordering <:

Progμ,x Px := ∀cx∈T (∀cy∈T (μy < μx →nc Py) →c Px).

We need to show

F r ∀cμ,x∈T (Progμ,x Px →c Px),

that is,

∀cμ,x∈T ∀ncg (g r ∀cx∈T (∀cy∈T; μy<μx Py →c Px) → Fμxg r Px).

Fix μ, x, g and assume the premise, which unfolds into

∀ncx∈T,f (∀ncy∈T; μy<μx (fy r Py) → gxf r Px). (19)

We have to show Fμxg r Px. To this end we use an instance of general induction with the formula Fμxg r Px, that is,

∀cμ,x∈T (∀cx∈T (∀cy∈T; μy<μx (Fμyg r Py) →c Fμxg r Px) →c Fμxg r Px).

It suffices to prove the premise. Assume ∀cy∈T; μy<μx (Fμyg r Py) for a fixed x ∈ T. We must show Fμxg r Px. Recall that by definition (5)

Fμxg = gxf0 with f0 := λy [if μy < μx then Fμyg else ε].

Hence we can apply (19) to x, f0, and it remains to show

∀ncy∈T; μy<μx (f0 y r Py).

Fix y ∈ T with μy < μx. Then f0 y = Fμyg, and by the last assumption we have Fμyg r Py.
Remark (Code-carrying proof). A customer buys some software. He
or she requires proof that it works, so the supplier sends a proof of
the existence of a solution to the specification provided, from which the
program has been automatically extracted (e.g., as a term in Gödel’s T).
However, this particular customer is very discerning and does not fully
trust the supplier’s systems. So he/she makes a further request for proof
that the extraction mechanism (e.g., in Minlog) is itself correct. The
supplier therefore sends the soundness proof for that particular piece of
software. This is practically checkable and is just what is needed.
7.2.9. An example: list reversal. We first give an informal existence
proof for list reversal. Write vw for the result v ∗ w of appending the list
w to the list v, vx for the result v ∗ x: of appending the one element list x:
to the list v, and xv for the result x :: v of constructing a list by writing
an element x in front of a list v, and omit the parentheses in R(v, w) for
(typographically) simple arguments. Assume
InitRev : R(nil, nil), (20)
GenRev : ∀v,w,x (Rvw → R(vx, xw)). (21)
We view R as a predicate variable without computational content. The
reader should not be confused: of course these formulas involving R do
express how a computation of the reversed list should proceed. However,
the predicate variable R itself is a placeholder for a n.c. formula.
A straightforward proof of ∀v∈T ∃w∈T Rvw proceeds as follows. We
first prove a lemma ListInitLastNat stating that every non-empty list
can be written in the form vx. Using it, ∀v∈T ∃w∈T Rvw can be proved
by induction on the length of v. In the step case, our list is non-
empty, and hence can be written in the form vx. Since v has smaller
length, the induction hypothesis yields its reversal w. Then we can take
xw.
Here is the term neterm (for “normalized extracted term”) extracted
from a formalization of this proof, with variable names f for unary func-
tions on lists and p for pairs of lists and numbers:
[x0]
(Rec nat=>list nat=>list nat)x0([v2](Nil nat))
([x2,f3,v4]
[if v4
(Nil nat)
([x5,v6][let p7 (cListInitLastNat v6 x5)
(right p7::f3 left p7)])])
where the square brackets in [x] are a notation for λ-abstraction λx. The term contains the constant cListInitLastNat denoting the content of
the auxiliary proposition, and in the step the function defined recursively
calls itself via f3. The underlying algorithm defines an auxiliary function
g by
g(0, v) := nil,
g(n + 1, nil) := nil,
g(n + 1, xv) := let wy = xv in y :: g(n, w)
and gives the result by applying g to lh(v) and v. It clearly takes quadratic
time. To run this algorithm one has to normalize (via “nt”) the term
obtained by applying neterm to the length of a list and the list itself, and
“pretty print” the result (via “pp”):
(animate "ListInitLastNat")
(animate "Id")
(pp (nt (mk-term-in-app-form
neterm (pt "4") (pt "1::2::3::4:"))))
The returned value is the reversed list 4::3::2::1:. We have made
use here of a mechanism to “animate” or “deanimate” lemmata, or
more precisely the constants that denote their computational content.
This method can be described generally as follows. Suppose a proof
of a theorem uses a lemma. Then the proof term contains just the
name of the lemma, say L. In the term extracted from this proof we
want to preserve the structure of the original proof as much as pos-
sible, and hence we use a new constant cL at those places where the
computational content of the lemma is needed. When we want to exe-
cute the program, we have to replace the constant cL corresponding to
a lemma L by the extracted program of its proof. This can be achieved
by adding computation rules for cL. We can be rather flexible here and
enable/block rewriting by using animate/deanimate as desired. To ob-
tain the let expression in the term above, we have used implicitly the
“identity lemma” Id : P → P; its realizer has the form λf,x (fx). If Id is not animated, the extracted term has the form cId (λx M) N, which is printed as [let x N M].
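For readers who want the extracted algorithm at a glance, here is a transcription into Haskell (ours, not MINLOG output; initLast plays the role of cListInitLastNat, and the Haskell let corresponds to the Id lemma):

-- split a nonempty list v*x into its initial segment and last element
initLast :: [a] -> ([a], a)
initLast [x]     = ([], x)
initLast (x : v) = let (w, y) = initLast v in (x : w, y)

rev :: [a] -> [a]
rev v = g (length v) v
  where
    g 0 _        = []
    g _ []       = []
    g n (x : v') = let (w, y) = initLast (x : v') in y : g (n - 1) w

-- rev [1,2,3,4] == [4,3,2,1]; each step rescans the list, hence the
-- quadratic running time noted above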
We shall later (in 7.5.2) come back to this example. It will turn
out that the method of “refined A-translation” (treated in section
7.3) applied to a weak existence proof (of ∀v∈T ∃˜ w∈T Rvw rather than
∀v∈T ∃w∈T Rvw) together with decoration will make it possible to extract
the usual linear list reversal algorithm from a proof.
7.2.10. Computational content for coinductive definitions. We now ex-
tend the insertion of computational content into the axioms for coinduc-
tively defined predicates. Consider for example the coinductive definition
of cototality for the algebra N in 7.1.7. Taking computational content
into account, it is decorated by
∀ncn (co TN n →c Eq(n, 0) ∨ ∃rm (Eq(n, Sm) ∧ co TN m)).

Its decorated greatest-fixed-point axiom is

∀ncn (Pn →c ∀ncn (Pn →c Eq(n, 0) ∨ ∃rm (Eq(n, Sm) ∧ (co TN m ∨ Pm))) →c co TN n).
If co TN r holds, then by the clause we have a cototal ideal in an algebra
built in correspondence with the clause, which in this case again is (an
isomorphic copy of) N. The predicate co TN can be understood as the
greatest set of witness–argument pairs satisfying the clause, the witness
being a cototal ideal.
Let us also reconsider the example at the end of 6.2.3 concerning “ab-
stract” reals, having an unspecified type . Let Rx abbreviate “x is a real
in [−1, 1]”, and assume that we have a type for rationals, and a predi-
cate Q such that Qp means “p is a rational in [−1, 1]”. To formalize the
argument, we assume that in the abstract theory we can prove that every
real can be compared with a proper interval with rational endpoints:
∀cx∈R;p,q∈Q (p < q → x ≤ q ∨ p ≤ x). (22)
We coinductively define a predicate J of arity (ρ) by the clause

∀ncx (Jx →c Eq(x, 0) ∨ ∃ry (Eq(x, (y − 1)/2) ∧ Jy) ∨
      ∃ry (Eq(x, y/2) ∧ Jy) ∨ (23)
      ∃ry (Eq(x, (y + 1)/2) ∧ Jy)).

Notice that this clause has the same form as the definition of cototality co TI for I in 7.1.7; in particular, its associated algebras (defined below) are the same. The only difference is that the arity of co TI is (I), whereas the arity of J is (ρ), with ρ the unspecified type of reals. This makes it possible to extract computational content (w.r.t. a stream representation) from proofs in an abstract theory of reals. The greatest-fixed-point axiom for J is

∀ncx (Px →c ∀ncx (Px →c Eq(x, 0) ∨ ∃ry (Eq(x, (y − 1)/2) ∧ (Jy ∨ Py)) ∨
      ∃ry (Eq(x, y/2) ∧ (Jy ∨ Py)) ∨ (24)
      ∃ry (Eq(x, (y + 1)/2) ∧ (Jy ∨ Py))) →c Jx).
The types of (23) and (24) are

ι → U + ι + ι + ι,
τ → (τ → U + (ι + τ) + (ι + τ) + (ι + τ)) → ι,

respectively, with ι the algebra associated with this clause (which in fact is I), and τ := τ(Pr). Note that the former is the type of the destructor for ι, and the latter is the type of the corecursion operator coRιτ.
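Destructor and corecursion operator can be sketched in Haskell for this algebra (names ours; SD plays the role of the three constructor labels C−1, C0, C1, so Maybe (SD, I) corresponds, up to isomorphism, to U + ι + ι + ι):

data SD = M1 | Z0 | P1
data I  = Hole | C SD I

destr :: I -> Maybe (SD, I)            -- the destructor
destr Hole    = Nothing
destr (C d x) = Just (d, x)

-- corecursion: unfold a state s, at each step either stopping (Nothing),
-- plugging in an already built ideal (Left), or continuing (Right)
corec :: (s -> Maybe (SD, Either I s)) -> s -> I
corec step s = case step s of
  Nothing            -> Hole
  Just (d, Left x)   -> C d x
  Just (d, Right s') -> C d (corec step s')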
We prove that Rx implies Jx, and that Jx implies that x can be approximated arbitrarily well by a rational. As one can expect from the types above, a realizability interpretation of these proofs will be computationally informative.

Let Ip,k := [p − 2^−k, p + 2^−k] and Bk x := ∃lq (x ∈ Iq,k), meaning that x can be approximated by a rational with accuracy 2^−k.
Proposition. (a) ∀ncx (Rx →c Jx).
(b) ∀ncx (Jx →c ∀ck Bk x).

Proof. (a) We use (24) with R for P. It suffices to prove Rx → ∃ry (Eq(x, (y − 1)/2) ∧ Ry) ∨ ∃ry (Eq(x, y/2) ∧ Ry) ∨ ∃ry (Eq(x, (y + 1)/2) ∧ Ry). Since x ∈ [−1, 1], by (22) either x ∈ [−1, 0] or x ∈ [−1/2, 1/2] or x ∈ [0, 1]. Let for example x ∈ [−1, 0]. Choose y := 2x + 1. Then y ∈ [−1, 1] and therefore Ry, and clearly Eq(x, (y − 1)/2).

(b) We prove ∀ck ∀ncx (Jx →c Bk x) by induction on k. Base, k = 0. Since x ∈ [−1, 1], we clearly have B0 x. Step, k → k + 1. Assume Jx. Then Eq(x, 0) or (for instance) ∃ry (Eq(x, (y − 1)/2) ∧ Jy) by (23). In case Eq(x, 0) the claim is trivial, since Bk+1 0. Otherwise let a real y with Eq(x, (y − 1)/2) and Jy be given. By induction hypothesis we have Bk y. Because of Eq(x, (y − 1)/2) this implies Bk+1 x.
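The computational content of part (a) is a corecursive digit-by-digit unfolding. A Haskell sketch (ours), with Double standing in for the abstract type of reals and exact numeric comparisons replacing the comparison axiom (22):

data SD = M1 | Z0 | P1 deriving Show
data I  = Hole | C SD I

-- unfold x in [-1,1] into a signed-digit ideal, committing to one of
-- the three overlapping cases of the proof at each step
toDigits :: Double -> I
toDigits x
  | x <= -0.25 = C M1 (toDigits (2 * x + 1))  -- case x in [-1,0]
  | x >=  0.25 = C P1 (toDigits (2 * x - 1))  -- case x in [0,1]
  | otherwise  = C Z0 (toDigits (2 * x))      -- case x in [-1/2,1/2]

digits :: Int -> I -> [SD]                    -- finite observation
digits 0 _       = []
digits _ Hole    = []
digits k (C d x) = d : digits (k - 1) x

-- digits 3 (toDigits (1/3)) yields [P1,M1,P1], the start of the periodic
-- representation of 1/3 seen earlier; the unfolding itself never ends,
-- which is exactly the cototal (stream) character of the witness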
The general development follows the one for inductively defined predicates rather closely. For simplicity we only consider the finitary case. Again by default, coinductively defined predicates are computationally relevant (c.r.), with the only exception of the witnessing predicates J r defined below. The clause for a c.r. coinductively defined predicate is decorated by

∀ncx (J x →c ∨i<k ∃ryi (∧Ai ∧ ∧ν<ni J siν))

where the conjunction after each Ai is either ∧d or ∧r, and each conjunction between the J siν is ∧d. Its greatest-fixed-point axiom is decorated by

∀ncx (P x →c ∀ncx (P x →c ∨i<k ∃ryi (∧Ai ∧ ∧ν<ni (J siν ∨ P siν))) →c J x).

The definitions of the type τ(A) of a formula A and of the realizability relation t r A are extended by

τ(J r) := ιJ if J is c.r., ◦ if J is n.c.,
t r J s := J r ts,
ε r J r ts := J r ts.

The algebra ιJ of witnesses for a coinductively defined predicate J := νX K is defined as follows. Let

K = ∀ncx (J x →c ∨i<k ∃ryi (∧Ai ∧ ∧ν<ni J siν)).

Then ιJ has k constructors, the i-th one of type τ(Aim1) → · · · → τ(Aimn) → ιJ → · · · → ιJ with Aim1, . . . , Aimn those of Ai which are c.r. and followed by ∧d (rather than ∧r) in K, and ni + 1 occurrences of ιJ.

The witnessing predicate J r of arity (ιJ, ρ) is coinductively defined by

∀ncx ∀cw (J r w x →nc ∨i<k ∃ryi ∃dui ∃lzi (Eq(w, Ci ui zi) ∧ ui r Ai ∧ ∧ν<ni J r ziν siν))

with the understanding that only those uij occur with Aij c.r. and followed by ∧d in K.

For example, for cototality of N coinductively defined by the clause

∀ncn (co TN n →c Eq(n, 0) ∨ ∃rm (Eq(n, Sm) ∧ co TN m))

the witnessing predicate co TNr has arity (N, N) and is defined by

∀ncn ∀cw (co TNr wn →nc (Eq(w, 0) ∧ Eq(n, 0)) ∨
  ∃rm ∃lz (Eq(w, Sz) ∧ Eq(n, Sm) ∧ co TNr zm)).
The realizing formula t r A continues to be invariant, since ε r (t r J s )
is identical to t r J s. The extracted term of the clause of a coinductively
defined predicate is the destructor of its associated algebra, and for its
greatest-fixed-point axiom it is the corecursion operator of this algebra.
The proof of the soundness theorem can easily be extended.
Finally we reconsider the example from 6.1.7 and 7.1.7 dealing with
(uniformly) continuous real functions, taking computational content into
account. We decorate (5)–(9) as follows.
∀ncf (f[I] ⊆ Id → Y (outd ◦ f) →c IY f) (d ∈ {−1, 0, 1}), (25)
∀ncf (IY (f ◦ in−1) →c IY (f ◦ in0) →c IY (f ◦ in1) →c IY f). (26)

The decorated version of its least-fixed-point axiom is

∀ncf (IY f →c
  (∀ncf (f[I] ⊆ Id → Y (outd ◦ f) →c Pf))d∈{−1,0,1} →c
  ∀ncf ((IY (f ◦ ind))d∈{−1,0,1} →c (P(f ◦ ind))d∈{−1,0,1} →c Pf) →c
  Pf). (27)

The simultaneous inductive/coinductive definition of J is decorated by

∀ncf (Jf →c Eq(f, id) ∨ IJ f) (28)

and its greatest-fixed-point axiom by

∀ncf (Qf →c ∀ncf (Qf →c Eq(f, id) ∨ IJ∨Q f) →c Jf). (29)
The types of (25)–(29) are
α → R(α),
R(α) → R(α) → R(α) → R(α),
R(α) → (α → P )3 → (R(α)3 → P3 → P ) → P ,
W → U + R(W),
Q → (Q → U + (W + R(W + Q ))) → W,
respectively, with α := (Yf), P := (Pr) and Q := (Qs). Substituting
α by W and writing R for R(W) we obtain
W → R,
R → R → R → R,
R → (W → P )3 → (R3 → P3 → P ) → P ,
W → U + R,
Q → (Q → U + (W + R(W + Q ))) → W.
These are the types of the first three constructors for R, the fourth con-
structor for R, the recursion operator RRP , the destructor for W and the
corecursion operator co RWQ , respectively.
352 7. Extracting computational content from proofs
The general form of simultaneous inductive/coinductive definitions of
predicates (in the finitary case) is decorated by
r
∀nc →c
(J x
x ∃yi ( Ai ∧ IJ si ))
i<k <ni
where the conjunction after each Ai is either ∧d or ∧r , and each conjunc-
tion between the IJ si is ∧d . Its greatest-fixed-point axiom is decorated
by
r
J + : ∀nc →c ∀nc
(P x
x →c
(P x
x ∃yi ( Ai ∧ IJ ∨P si )) →c J x
).
i<k <ni
The algebra of witnesses has as constructors those of IJ , and in addition
those caused by the (single) clause of J , as explained above. The witnessing
predicate J r of arity (, ) then needs J -cototal-IJ -total ideals as witnesses.
However, we omit a further development of the general case here.
7.3. Refined A-translation
In this section the connectives →, ∀ denote the computational versions
→c , ∀c , unless stated otherwise.
We will concentrate on the question of classical versus constructive
proofs. It is known, by the so-called “A-translation” of Friedman [1978]
and Dragalin [1979], that any proof of a specification of the form ∀x ∃˜ y B,
with B quantifier-free and a weak (or “classical”) existential quantifier
∃˜ y , can be transformed into a proof of ∀x ∃y B, now with the constructive
existential quantifier ∃y . However, when it comes to extraction of a
program from a proof obtained in this way, one easily ends up with a
mess. Therefore, some refinements of the standard transformation are
necessary. We shall study a refined method of extracting reasonable and
sometimes unexpected programs from classical proofs. It applies to proofs
of formulas of the form ∀x ∃˜ y B where B need not be quantifier-free, but
only has to belong to the larger class of goal formulas defined in 7.3.1.
Furthermore we allow unproven lemmata D to appear in the proof of
∀x ∃˜ y B, where D is a definite formula (also defined in 7.3.1).
We now describe in more detail what this section is about. It is well
known that from a derivation of a classical existential formula ∃˜ y A :=
∀y (A → ⊥) → ⊥ one generally cannot read off an instance. A simple
example has been given by Kreisel: let R be a primitive recursive relation
such that ∃˜ z Rxz is undecidable. Clearly—even logically—
∀x ∃˜ y ∀z (Rxz → Rxy)
but there is no computable f satisfying
∀x ∀z (Rxz → R(x, f(x))),
7.3. Refined A-translation 353
for then ∃˜ z Rxz would be decidable: it would be true if and only if
R(x, f(x)) holds.
However, it is well known that in case ∃˜ y G with G quantifier-free one
can read off an instance. Here is a simple idea of how to prove this: replace
⊥ anywhere in the proof by ∃y G. Then the end formula ∀y (G → ⊥) → ⊥
is turned into ∀y (G → ∃y G) → ∃y G, and since the premise is trivially
provable, we have the claim.
Unfortunately, this simple argument is not quite correct. First, G may
contain ⊥, and hence is changed under the substitution of ∃y G for ⊥.
Second, we may have used axioms or lemmata involving ⊥ (e.g., ⊥ → P),
which need not be derivable after the substitution. But in spite of this, the
simple idea can be turned into something useful.
Assume that the lemmata D and the goal formula G are such that we
can derive
→ Di [⊥ := ∃y G],
D (30)
G[⊥ := ∃y G] → ∃y G. (31)
Assume also that the substitution [⊥ := ∃y G] turns any axiom into an
instance of the same axiom-schema, or else into a derivable formula. Then
from our given derivation (in minimal logic) of D → ∀y (G → ⊥) → ⊥
we obtain
D[⊥ := ∃y G] → ∀y (G[⊥ := ∃y G] → ∃y G) → ∃y G.
Now (30) allows the substitution in D to be dropped, and by (31) the
second premise is derivable. Hence we obtain as desired
→ ∃y G.
D
We shall identify classes of formulas—to be called definite and goal
formulas—such that slight generalizations of (30) and (31) hold. This
will be done in 7.3.1. In 7.3.2 we then prove our main theorem about
extraction from classical proofs.
We end the section with some examples of our general machinery. From
a classical proof of the existence of the Fibonacci numbers we extract
in 7.3.4 a short and efficient program, where -expressions rather than
pairs are passed. In 7.3.6 we consider unary functions f, g, h, s on the
natural numbers, and a simple proof that for s not surjective, h ◦ s ◦ h
cannot be the identity. It turns out that a rather unexpected program is
extracted. In 7.3.5 we treat as a further example a classical proof of the
well-foundedness of < on N. Finally in 7.3.7 we take up a suggestion
of Bezem and Veldman [1993] and present a short classical proof of (the
general form of) Dickson’s lemma, as an interesting candidate for further
study.
354 7. Extracting computational content from proofs
7.3.1. Definite and goal formulas. We simultaneously inductively define
the classes D of definite formulas, G of goal formulas, R of relevant definite
formulas and I of irrelevant goal formulas. Let D, G, R, I range over D,
G, R, I, respectively, P over prime formulas distinct from ⊥, and D0 over
quantifier-free formulas in D.
D, G, R and I are generated by the clauses
(a) R, P, I → D, ∀x D ∈ D.
(b) I, ⊥, R → G, D0 → G ∈ G.
(c) ⊥, G → R, ∀x R ∈ R.
(d) P, D → I, ∀x I ∈ I.
Let AF denote A[⊥ := F], and ¬A, ¬⊥ A abbreviate A → F, A → ⊥,
respectively.
Lemma. We have derivations from F → ⊥ and F → P of
D F → D, (32)
G → ¬⊥ ¬ ⊥ G F , (33)
¬⊥ ¬R → R,
F
(34)
I →I . F
(35)
Proof. We prove (32)–(35) simultaneously, by induction on formulas.
(32). Case ⊥. Then ⊥F = F and the claim follows from our assumption
F → ⊥. Case P. Obvious. Case ∀x D. By induction hypothesis (32) for
D we have D F → D, which clearly implies ∀x D F → ∀x D.
Case R.
¬RF RF
F→⊥ F
| ⊥
¬⊥ ¬RF → R ¬⊥ ¬RF
R
RF → R
Here we have used (34) and F → ⊥.
Case I → D.
|
I → IF I
| I →D
F F
IF
DF → D DF
D
(I F → D F ) → I → D
Here we have used the induction hypotheses (35) for I and (32) for D.
(33). Case ⊥. Clear. Case P. Clear, since P F is P. Case I . This is clear
again, using the induction hypothesis (35).
7.3. Refined A-translation 355
Case R → G. We have to prove (R → G) → ¬⊥ ¬⊥ (RF → G F ). Let
D1 [R → G, ¬⊥ (RF → G F )] : ¬⊥ R be
GF
| R→G R ¬⊥ (R → G )
F F
R → GF
F
G → ¬⊥ ¬ ⊥ G F G ⊥
¬⊥ ¬⊥ G F ¬⊥ G F
⊥
¬⊥ R
(by induction hypothesis (33) for G) and D2 [¬⊥ (RF → G F )] : ¬⊥ ¬RF be
¬RF RF
F
..
.
GF
¬⊥ (RF → G F ) RF → G F
⊥
¬⊥ ¬RF
Note that G F is derivable from F, using our assumption F → P.
D2 [¬⊥ (RF → G F )]
D1 [R → G, ¬⊥ (RF → G F )] | |
| ¬⊥ ¬RF → R ¬⊥ ¬RF
¬⊥ R R
⊥
(R → G) → ¬⊥ ¬⊥ (RF → G F )
Here we have used the induction hypothesis (34) for R.
Case D0 → G. We have to prove (D0 → G) → ¬⊥ ¬⊥ (D0F → G F ). Let
D1 [D0 → G, ¬⊥ (D0F → G F )] : ¬⊥ D0 and D2 [¬⊥ (D0F → G F )] : ¬⊥ ¬D0F
be as above. We use (D0F → ⊥) → (¬D0F → ⊥) → ⊥, i.e., case distinction
on D0F . Hence it suffices to derive from D0 → G and ¬⊥ (D0F → G F )
both ¬⊥ D0F and ¬⊥ ¬D0F ; recall that our goal is (D0 → G) → ¬⊥ (D0F →
G F ) → ⊥. The negative case is provided by D2 [¬⊥ (D0F → G F )], and the
positive case by
D1 [D0 → G, ¬⊥ (D0F → G F )] |
| D0F → D0 D0F
¬⊥ D0 D0
⊥
¬⊥ D0F
Here we have used the induction hypothesis (32) for D0 .
(34). Case ⊥. Clearly ¬⊥ ¬⊥ (F → F) is derivable.
356 7. Extracting computational content from proofs
Case ∀x R.
∀x R F
¬R F
RF
F
¬⊥ ¬∀x RF ¬∀x RF
| ⊥
¬⊥ ¬RF → R ¬⊥ ¬RF
R
∀x R
¬⊥ ¬∀x RF → ∀x R
Here we have used the induction hypothesis (34) for R.
Case G → R.
G F → RF G F
¬R F
RF
F
| ¬⊥ ¬(G F → RF ) ¬(G F → RF )
G → ¬⊥ ¬ ⊥ G F G ⊥
¬ ⊥ ¬⊥ G F ¬⊥ G F
| ⊥
¬⊥ ¬RF → R ¬⊥ ¬RF
R
¬⊥ ¬(G F → RF ) → G → R
Here we have used the induction hypotheses (34) for R and (33) for G.
(35). Case P. Clear. Case ∀x I . This is clear again, using the induction
hypothesis (35) for I .
Case D → I .
|
DF → D DF
| D→I D
I → IF I
IF
(D → I ) → D F → I F
Here we have used the induction hypotheses (35) for I and (32) for D.
Remark. Is D the largest class of formulas such that D → D is provable
F
intuitionistically? This is not the case, as the following example shows.
S := ∀x (((Qx → F) → F) → Qx),
D := (∀x Qx → ⊥) → ⊥.
One can easily derive (S → D)F → S → D, since S F is S and a derivation
of D F → S → D can be found easily.
7.3. Refined A-translation 357
However, S → D ∈ / D, since D ∈
/ D. This is because D is neither (i) in
R nor (ii) of the form I → D1 . For (i), observe that if D were in R, then
its premise ∀x Qx → ⊥ would be in G, hence ∀x Qx in R, which is not the
case. For (ii), observe that ∀x Qx → ⊥ is not in I bcause ⊥ ∈/ I.
It is an open problem to find a useful characterization of the class of
formulas such that D F → D is provable intuitionistically.
= G1 , . . . , Gn we have a derivation from
Lemma. For goal formulas G
F → ⊥ of
F → ⊥) → G
(G → ⊥. (36)
Proof. Assume F → ⊥. By (33)
Gi → (GiF → ⊥) → ⊥
for all i = 1, . . . , n. Now the assertion follows by minimal logic: Assume
G F → ⊥ and G; we must show ⊥. By G1 → (G F → ⊥) → ⊥ it suffices
1
to prove G1 → ⊥. Assume G1F . By G2 → (G2F → ⊥) → ⊥ it suffices
F
to prove G2F → ⊥. Assume G2F . Repeating this pattern, we finally have
assumptions G1F , . . . , GnF available, and obtain ⊥ from G F → ⊥.
7.3.2. Extraction from weak existence proofs.
Theorem (Strong from weak existence proofs). Assume that for arbi-
trary formulas A, definite formulas D and goal formulas G
we have a
derivation M∃˜ of
A → D
→ ∀y (G
→ ⊥) → ⊥. (37)
Then from F → ⊥ and F → P, where F is as defined in 7.1.3, we can derive
A → D
F → ∀y (G
F → ⊥) → ⊥.
G.
for all prime formulas in D, In particular, substitution of the formula
∃y G
F := ∃y (G1F ∧ · · · ∧ GnF )
for ⊥ yields a derivation M∃ from the F → P of
:= ∃y G
A[⊥ F] → D F → ∃y G
F. (38)
Proof. The first assertion follows from (32) (to infer D from D F)
and (36) (to infer G → ⊥ from G → ⊥). The second assertion is
F
a simple consequence since ∀y (G F → ∃y G F ) and F → ∃y G F are both
derivable.
We shall apply the method of realizability to extract computational
content from the resulting strong existence proof M∃ . Recall that this
proof essentially follows the given weak existence proof M∃˜ . The only
from D
difference is that proofs of (32) (to infer D F ) and (36) (to infer
→ ⊥ from G
G F → ⊥) have been inserted. Therefore the extracted term
can be structured in a similar way, with one part determined solely by M∃˜
358 7. Extracting computational content from proofs
and another part depending only on the definite formulas D and and goal
For simplicity let G
formulas G. consist of a single goal formula G.
To make the method work we need to assume that all prime formulas
P appearing in D F , G F are n.c. and invariant (for instance, equalities).
Lemma. Let D be a definite and G a goal formula. Assume that all prime
formulas P in D F , G F are n.c. and invariant.
(a) We have a term tD such that
D F → tD r D
is derivable from ∀y (F → y r ⊥) and F → P.
(b) We have a term sG such that
(G F → v r ⊥) → w r G → sG vw r ⊥
is derivable from ∀y (F → y r ⊥) and F → P.
Proof. The assumption implies that all formulas D F , G F are n.c. and
invariant as well, by the definition of realizability.
(a) By (32) we have a derivation ND of D F → D from assumptions
F → ⊥ and F → P. By the soundness theorem we can take tD := et(ND ).
(b) By (33) we have a derivation HG of (G F → ⊥) → G → ⊥ from as-
sumptions F → ⊥ and F → P. Observe that the following are equivalent:
et(HG ) r ((G F → ⊥) → G → ⊥),
∀v,w (v r (G F → ⊥) → w r G → et(HG )vw r ⊥),
∀v,w ((G F → v r ⊥) → w r G → et(HG )vw r ⊥).
Hence we can take sG := et(HG ).
Theorem (Extraction from weak existence proofs). Assume that for
and a goal formula G(y) we have a derivation M ˜ of
definite formulas D ∃
→ ∀y (G(y) → ⊥) → ⊥.
D
Assume that all prime formulas P in D F , G F (y) are n.c. and invariant. Let
t1 , . . . , tn and s be terms for D1 , . . . , Dn and G according to parts (a) and
(b) of the lemma above. Then from assumptions F → P we can derive
F → G F (et(M ˜ )t1 . . . tn s),
D ∃
where M∃˜ is the result of substituting ∃y G F (y) for ⊥ in M∃˜ .
Proof. By the soundness theorem we have
et(M ˜ ) r (D → ∀y (G(y) → ⊥) → ⊥),
∃
∀u,x ( → x r ∀y (G(y) → ⊥) → et(M ˜ )
urD ∃ u x r ⊥),
∀u,x ( → ∀y,w (w r G(y) → xyw r ⊥) → et(M ˜ )
urD ∃ u x r ⊥).
Instantiating u , x by t, s, respectively, we obtain
→ ∀y,w (w r G(y) → syw r ⊥) → et(M ˜ )ts r ⊥.
t r D ∃
7.3. Refined A-translation 359
Hence by part (a) of the lemma above we have a derivation of
F → ∀y,w (w r G(y) → syw r ⊥) → et(M ˜ )ts r ⊥
D ∃
from ∀y (F → y r ⊥) and F → P. Substituting ⊥ by ∃y G F (y) gives
F → ∀y,w ((w r G(y))[⊥ := ∃y G F (y)] → G F (syw)) → G F (et(M ˜ )ts)
D ∃
from F → P. Substituting ⊥ by ∃y G F (y) in the formula derived in part
(b) of the lemma above gives
(G F (y) → G F (v)) → (w r G(y))[⊥ := ∃y G F (y)] → G F (svw)
from F → P. Instantiating this with v := y we obtain a derivation of
F → G F (et(M ˜ )ts)
D ∃
from F → P, as required.
Remark. The theorem can be generalized by allowing arbitrary formu-
las A as additional premises. Then the final conclusion needs additional
:= ∃y G F (y)], and we must assume that we have term r such
premises A[⊥
:= ∃y G F (y)] → (r r A)[⊥
that A[⊥ := ∃y G F (y)] is derivable. Moreover,
the et(M∃˜ ) in the final conclusion needs r as additional arguments.
Below we will give examples for this “refined” A-translation. However,
let us check first the mechanism of working with definite and goal formulas
for Kreisel’s “non-example” mentioned in the introduction. There we gave
a trivial proof of a ∀∃-formula
˜ that cannot be realized by a computable
function, and we want to make sure that our general result also does not
provide such a function. The example amounts to a proof of
∀z (¬⊥ ¬⊥ Rxz → Rxz) → ∀y ((Rxy → ∀z Rxz) → ⊥) → ⊥.
Here Rxy → ∀z Rxz is a goal formula, but the premise ∀z (¬⊥ ¬⊥ Rxz →
Rxz) is not definite. Replacing R by ¬⊥ S (to get rid of the stability
assumption) does not help, for then ¬⊥ Sxy → ∀z ¬⊥ Sxz is not a goal
formula.
Note (Critical predicate symbols). To apply these results we have to
know that our assumptions are definite formulas and our goal is given
by goal formulas. For quantifier-free formulas this clearly can always
be achieved by inserting double negations in front of every atom (cf.
the definitions of definite and goal formulas). This corresponds to the
original (unrefined) so-called A-translation of Friedman [1978] and Dra-
galin [1979]; see also Leivant [1985]. However, in order to obtain reason-
able programs which do not unnecessarily use higher types or case analysis
we want to insert double negations only at as few places as possible.
We describe a more economical and general way to obtain definite and
goal formulas. It consists in singling out some predicate symbols as being
360 7. Extracting computational content from proofs
“critical”, and then double negating only the atoms formed with critical
predicate symbols; call these critical atoms. Assume we have a proof of
∀x1 C1 → · · · → ∀xn Cn → ∀y (B
→ ⊥) → ⊥
with C , B
quantifier-free. Let
L := {C1 , . . . , Cn , B
→ ⊥}.
The set of L-critical predicate symbols is defined to be the smallest set
satisfying
(i) ⊥ is L-critical.
(ii) If (C1 → R1 s1 ) → · · · → (Cm → Rm sm ) → Rs is a positive
subformula of L, and if some Ri is L-critical, then R is L-critical.
Now if we double negate every L-critical atom different from ⊥ we clearly
obtain definite assumptions C and goal formulas B .
However, in particular cases we might be able to obtain definite and
goal formulas with still fewer double negations: it may not be necessary
to double negate every critical atom.
We now present some simple examples of how to apply this method. In
all of them we will have a single goal formula G. However, before we do
this we describe a useful method to suppress somewhat obvious proofs of
totality in derivation terms.
7.3.3. Suppressing totality proofs. In a derivation involving induction
we need to provide totality proofs in order to be able to use the induction
axiom. For instance, when we want to apply an induction axiom ∀n∈T A(n)
to a term r, we must know T (r) to conclude A(r). However, in many cases
such totality proofs are easy: Suppose r is k+l , and we already know T (k)
and T (l ). Then—referring to a proof of T (+) which is done once and for
all—we clearly know T (k + l ). In order to suppress such trivial proofs,
we mark the addition function + as total, and call a term syntactically
total if it is built from total variables by total function constants. Then we
allow an inference
∀n∈T A(n) r
A(r)
or (written as derivation term) M ∀n∈T A(n) r provided r is syntactically total.
It is clear that and how this “derivation” can be expanded into a proper
one. Since in the rest of the present section all variables will be restricted
to total ones we shall write ∀n A for ∀n∈T A. We also write for N.
7.3.4. Example: Fibonacci numbers. Let αn be the n-th Fibonacci num-
ber, i.e.,
α0 := 0, α1 := 1, αn := αn−2 + αn−1 for n ≥ 2.
7.3. Refined A-translation 361
We give a weak existence proof for the Fibonacci numbers:
∀n ∃˜ k Gnk, i.e., ∀n (∀k (Gnk → ⊥) → ⊥)
from clauses expressing that G is the graph of the Fibonacci function:
v0 : G00, v1 : G11, v2 : ∀n,k,l (Gnk → G(n + 1, l ) → G(n + 2, k + l )).
We view G as a predicate variable without computational content. Clearly
the clause formulas are definite and Gnk is a goal formula. To construct
a derivation, assume (n ∈ T and)
u : ∀k (Gnk → ⊥).
Our goal is ⊥. To this end we first prove a strengthened claim in order to
get the induction through:
∀n B(n) with B(n) := ∀k,l (Gnk → G(n + 1, l ) → ⊥) → ⊥.
This is proved by induction on n. The base case follows from the first two
clauses. In the step case we can assume that we have k, l satisfying Gnk and
G(n + 1, l ). We need k , l such that G(n + 1, k ) and G(n + 2, l ). Using
the third clause simply take k := l and l := k + l . To obtain our goal ⊥
from ∀n B, it suffices to prove its premise ∀k,l (Gnk → G(n + 1, l ) → ⊥).
So let k, l be given and assume u1 : Gnk and u2 : G(n + 1, l ). Then u
applied to k and u1 gives our goal ⊥.
The derivation term is
M∃˜ = ∀u k (Gnk→⊥) (Indn,B nMbase Mstep k,l Gnk G(n+1,l )
u1 u2 (uku1 ))
where
Indn,B(n) : ∀n (B(0) → ∀n (B(n) → B(Sn)) → B(n)),
∀ (G0k→G1l →⊥)
Mbase = wk,l0 (w0 01v0 v1 ),
∀ (G(n+1,k)→G(n+2,l )→⊥)
Mstep = n Bw wk,l1 (
wk,l Gnk G(n+1,l )
u1 u2 (w1 l (k + l )u2 (v2 nklu1 u2 ))).
Let M denote the result of substituting ⊥ by ∃k Gnk in M . Since nei-
ther the clauses nor the goal formula Gnk contain ⊥, the extracted term
according to the theorem above is et(M∃˜ )v v. The term et(M∃˜ ) can be
computed from M∃˜ as follows. For the object variable assigned to an
assumption variable u we shall use the same name:
et(M∃˜ ) = → (→→)→
u (R
n et(Mbase
) et(Mstep )k,l (uk))
where
et(Mbase ) = →→
w0 (w0 01),
et(Mstep ) = n (→→)→
w →→
w1 (wk,l (w1 l (k + l ))).
362 7. Extracting computational content from proofs
The construction of this proof, its A-translation and the extraction of
a realizer can all be done by the Minlog system. The normal form of the
extracted term et(M∃˜ )v v is printed as
[n0]
(Rec nat=>(nat=>nat=>nat)=>nat)n0([f1]f1 0 1)
([n1,p2,f3]p2([n4,n5]f3 n5(n4+n5)))
([n1,n2]n1)
with p (for “previous”) a name for variables of type (nat=>nat=>nat)=>
nat and f of type nat=>nat=>nat. The underlying algorithm defines an
auxiliary functional H by
H (0, f) := f(0, 1), H (n + 1, f) := H (n, k,l f(l, k + l ))
and gives the result by applying H to the original number and the first
projection k,l k. This is a linear algorithm in tail recursive form. It is
somewhat unexpected since it passes functions (rather than pairs, as one
would ordinarily do), and hence uses functional programming in a proper
way, in fact in “continuation passing style”. This clearly is related to the
use of classical logic, which by its use of double negations has a functional
flavour.
7.3.5. Example: well-foundedness of the natural numbers. An interes-
ting phenomenon can occur when we extract a program from a classical
proof which uses the minimum principle. Consider as a simple example
the well-foundedness of < on the natural numbers, i.e.,
∀f → ∃˜ k (fk ≤ fk+1 ).
If one formalizes the classical proof “choose k such that fk is minimal”
and extracts a program one might expect that it computes a k such that
fk is minimal. But this is impossible! In fact the program computes
the least k such that fk ≤ fk+1 instead. This discrepancy between the
classical proof and the extracted program can of course only show up if
the solution is not uniquely determined.
We begin with a rather detailed exposition of the classical proof, since
we need a complete formalization. Our goal is ∃˜ k (fk ≤ fk+1 ), and
the classical proof consists in using the minimum principle to choose a
minimal element in the range of f. This suffices, for if we have such a
minimal element, say n0 , then it must be of the form fk0 , and by the choice
of n0 we have fk0 ≤ fk for every k, so in particular fk0 ≤ fk0 +1 .
Next we need to prove the minimum principle
∃˜ k Rk → ∃˜ k (Rk ∧ ∀l<k (Rl → ⊥))
from ordinary zero-successor-induction. The minimum principle is logi-
cally equivalent to
∀k (∀l<k (Rl → ⊥) → Rk → ⊥) → ∀k (Rk → ⊥).
7.3. Refined A-translation 363
Abbreviate the premise by w1 : Prog; it expresses the “progressiveness” of
Rk → ⊥ w.r.t. <. We give a proof by zero-successor-induction on n w.r.t.
A(n) := ∀k<n (Rk → ⊥).
Base. To show A(0) let k be given and assume w2 : k < 0 and w3 : Rk.
Then the required ⊥ follows by applying an arithmetical lemma v1 : ∀m<0 ⊥
to k and w2 .
Step. Let n be given and assume w4 : A(n). To show A(n + 1) let k
be given and assume w5 : k < n + 1. We will derive Rk → ⊥ by using
w1 : Prog at k. Hence we have to prove
∀l<k (Rl → ⊥).
So, let l be given and assume further w6 : l < k. From w6 and w5 : k < n+1
we infer l < n (using an arithmetical lemma). Hence, by the induction
hypothesis w4 : A(n) at l we get Rl → ⊥.
Now a complete formalization is easy. We express m ≤ k by k < m → ⊥
and formalize a variant of the proof just given with ∀m (fm = k) (i.e.,
∀m (fm = k → ⊥)) instead of Rk → ⊥; this does not change much. The
derivation term is
M∃˜ := ∀v1m<0 ⊥
∀u k ((fk+1 <fk →⊥)→⊥) (
Prog→∀k ∀m (fm =k)
Mcvind Mprog f0 0Lf0 =f0 )
where
Mcvind = Prog
w1 k (Indn,B(n) f(k + 1)Mbase Mstep kL
k<k+1
),
Mbase = k k<0 fm =k
w2 m w3 (v1 kw2 ),
Mstep = n B(n) k<n+1
w4 k w5 (w1 kl l<k
w6 (w4 l (L
l<n
[w6 , w5 ]))),
Mprog = k ∀u1l<k ∀m (fm =l ) m fwm3 =k (umfwm+1
7
<fm
(
u1 fm+1 Lfm+1 <k [w7 , w3 ](m + 1)Lfm+1 =fm+1 )).
Here we have used the abbreviations
Prog = ∀k (∀l<k ∀m (fm = l ) → ∀m (fm = k)),
B(n) = ∀k<n ∀m (fm = k),
Indn,B(n) = ∀f,n (B(0) → ∀n (B(n) → B(n + 1)) → B(n)).
Let M denote the result of substituting ⊥ by ∃k (fk+1 < fk → F) in
M . The term et(M∃˜ ) can be computed from M∃˜ as follows. For the object
variable assigned to an assumption variable u we shall use the same name.
et(M ) = → →→
v1 u
(et(Mcvind
)et(Mprog )f0 0)
364 7. Extracting computational content from proofs
where
et(Mcvind ) = →(→→)→→
w1 k (R→→ (k + 1)et(Mbase )et(Mstep )k),
et(Mbase ) = k,m (v1 k),
et(Mstep ) = n →→
w4 k (w1 kl (w4 l )),
et(Mprog ) = k →→
u1 m (um(u1 fm+1 (m + 1))).
Note that k is not used in et(Mprog ); this is the reason why the optimization
below is possible.
Recall that by the extraction theorem, the term extracted from the
present proof has the form et(M∃˜ )t1 . . . tn s where t1 , . . . , tn and s are
terms for D1 , . . . , Dn and G according to parts (a) and (b) of the lemma
in 7.3.2. In our case we have just one definite formula D = ∀k<0 ⊥, and
since we can derive
∀k<0 F → (n 0) r ∀k<0 ⊥
from ∀k (F → k r ⊥), we can take t := n 0. The goal formula in our case
is G := (fk+1 < fk → ⊥). For this G we can derive directly
((fk+1 < fk → F) → v r ⊥) → (fk+1 < fk → w r ⊥) → svw r ⊥
with
s := v,w [if fk+1 < fk then w else v].
Then the term extracted according to the theorem is
et(M∃˜ )ts = et(Mcvind
)t,s et(Mprog )t,s f0 0
where t,s indicates substitution of t, s for v1 , u. Therefore
et(Mcvind )t,s = w1 ,k (R(k + 1)k,m 0n,w4 ,k (w1 kw4 )k ,
et(Mprog )t,s = k,u1 ,m [if fm+1 < fm then u1 (fm+1 )(m + 1) else m].
Hence we obtain as extracted term
et(M∃˜ )ts = R(f0 + 1)rbase rstep f0 0
with
rbase := k,m 0,
rstep := n,w4 ,k,m [if fm+1 < fm then w4 fm+1 (m + 1) else m].
Since the recursion argument is f0 + 1, we can convert et(M∃˜ )ts into
[if f1 < f0 then Rf0 rbase rstep f1 1 else 0].
The machine-extracted term (original output of Minlog) is almost literally
the same:
7.3. Refined A-translation 365
[if (f 1<f 0)
((Rec nat=>nat=>nat=>nat)(f 0)([n0,n1]0)
([n0,g1,n2,n3][if (f(Succ n3)<f n3)
(g1(f(Succ n3))(Succ n3))
n3])
(f 1)
1)
0]
To make this algorithm more readable we may define
h(0, k, m) = 0,
h(n + 1, k, m) = [if fm+1 < fm then h(n, fm+1 , m + 1) else m]
and then write the result as h(f0 + 1, f0 , 0), or (unfolded) as
[if f1 < f0 then h(f0 , f1 , 1) else 0].
Note that k is not used here (this will always happen if induction is used
in the form of the minimum principle only). Now it is immediate to see
that the program computes the least k such that fk+1 < fk → ⊥, where
f0 + 1 only serves as an upper bound for the search
7.3.6. Example: the hsh-theorem. Let f, g, h, s denote unary functions
on the natural numbers. We show ∃˜ n (h(s(hn)) = n) and extract an
(unexpected) program from it.
Lemma (Surjectivity). g ◦ f surjective implies g surjective.
Lemma (Injectivity). g ◦ f injective implies f injective.
Lemma (Surjectivity–injectivity). g ◦ f surjective and g injective implies
f surjective.
Proof. Assume y is not in the range of f. Consider g(y). Since g ◦f is
surjective, there is an x with g(y) = g(f(x)). The injectivity of g implies
y = f(x), a contradiction.
Theorem (hsh-theorem). ∀n (s(n) = 0) → ¬∀n (h(s(h(n))) = n).
Proof. Assume h ◦ s ◦ h is the identity. Then by the injectivity lemma h
is injective. Hence by the surjectivity–injectivity lemma s ◦ h is surjective,
and therefore by the surjectivity lemma s is surjective, a contradiction.
From the Gödel–Gentzen translation and the fact that we can systemat-
ically replace triple negations by single negations we obtain a derivation of
∀n (s(n) = 0) → ∃˜ n (h(s(hn)) = n).
Now since ∀n (s(n) = 0) is a definite formula, this is in the form where our
general theory applies. The extracted program is, somewhat unexpectedly,
366 7. Extracting computational content from proofs
[s,h][if (h(s(h(h 0)))=h 0)
[if (h(s(h(s(h(h 0)))))=s(h(h 0)))
0
(s(h(h 0)))]
(h 0)]
Let us see why this program indeed provides a counter example to the
assumption that h ◦ s ◦ h is the identity.
If h(s(h(h0))) = h0, take h0. So assume h(s(h(h0))) = h0. If
h(s(h(s(h(h0))))) = s(h(h0)),
then also h(s(h0)) = s(h(h0)), so 0 is a counter example, because the
right hand side cannot be 0 (this was our assumption on s). So assume
h(s(h(s(h(h0))))) = s(h(h0)).
Then s(h(h0)) is a counter example.
7.3.7. Towards more interesting examples. Bezem and Veldman
[1993] suggested Dickson’s lemma [1913] as an interesting case study for
program extraction from classical proofs. It states that for k given infinite
sequences f1 , . . . , fk of natural numbers and a given number l there are
indices i1 , . . . , il such that every sequence fκ increases on i1 , . . . , il , i.e.,
fκ (i1 ) ≤ · · · ≤ fκ (il ) for κ = 1, . . . , k. Here is a short classical proof,
using the minimum principle for undecidable sets.
Call a unary predicate (or set) Q ⊆ N unboundedif ∀x ∃˜ y>x Q(y).
Lemma. Let Q be unbounded and f a function from a superset of Q to
N. Then the set Qf of left f-minima w.r.t. Q is unbounded ; here
Qf (x) := Q(x) ∧ ∀y>x (Q(y) → f(x) ≤ f(y)).
Proof. Let x be given. We must find y > x with Qf (y). The minimum
principle for {y > x | Q(y)} with measure f yields
∃˜ y>x Q(y) → ∃˜ y>x (Q(y) ∧ ∀z>x (Q(z) → f(y) ≤ f(z))).
Since Q is assumed to be unbounded, the premise is true. We show that
the y provided by the conclusion satisfies Qf (y), that is,
Q(y) ∧ ∀z>y (Q(z) → f(y) ≤ f(z)).
Let z > y with Q(z). From y > x we obtain z > x. Hence f(y) ≤
f(z).
Let Q be unbounded and f0 , f1 . . . be functions from a superset of Q
to N. Then for every k there is an unbounded subset Qk of Q such that
f0 , . . . , fk−1 increase on Qk w.r.t. Q, that is, ∀x,y;x<y (Qk (x) → Q(y) →
fi (x) ≤ fi (y)).
7.4. Gödel’s Dialectica interpretation 367
Lemma.
∀x ∃˜ y>x Q(y) → ∀k ∃Qk ⊆Q (∀x ∃˜ y>x Qk (y) ∧
∀i<k ∀x,y;x<y (Qk (x) → Q(y) → fi (x) ≤ fi (y))).
Proof. By induction on k. Base. Let Q0 := Q. Step. Consider
(Qk )fk . By induction hypothesis, f0 , . . . , fk−1 increase on Qk w.r.t. Q,
and therefore also on its subset (Qk )fk . Moreover, by construction also
fk increases on (Qk )fk w.r.t. Q.
Corollary. For every k, l we have
k
∀f1 ,...,fk ∃˜ i0 ,...,il (i < i+1 ∧ fκ (i ) ≤ fκ (i+1 )).
<l κ=1
For k = 2 (i.e., two sequences) this example has been treated by Berger,
Schwichtenberg, and Seisenberger [2001]. However, it is interesting to
look at the general case, since then the brute force search takes time
O(n k ), and we can hope that the program extracted from the classical
proof is better.
7.4. Gödel’s Dialectica interpretation
In his original functional interpretation of [1958], Gödel assigned to
every formula A a new one ∃x ∀y AD ( x, y
) with AD (
x, y
) quantifier-free.
Here x , y are lists of variables of finite types; the use of higher types is
necessary even when the original formula A is first-order. He did this in
such a way that whenever a proof of A say in Peano arithmetic was given,
one could produce closed terms r such that the quantifier-free formula
AD (r, y
) is provable in his quantifier-free system T.
In [1958] Gödel referred to a Hilbert-style proof calculus. However,
since the realizers will be formed in a -calculus formulation of system
T, Gödel’s interpretation becomes more perspicuous when it is done for
a natural deduction calculus. The present exposition is based on such
a setup. Then the need for contractions comes up in the (only) logical
rule with two premises: modus ponens (or implication elimination →− ).
This makes it possible to give a relatively simple proof of the soundness
theorem.
7.4.1. Positive and negative types. We assign to every formula A objects
+ (A), − (A) (a type or the “nulltype” symbol ◦). + (A) is intended to
be the type of a (Dialectica-) realizer to be extracted from a proof of A,
and − (A) the type of a challenge for the claim that this term realizes A.
+ (Ps ) := ◦, − (Ps ) := ◦,
+ (∀x A) := → + (A), − (∀x A) := × − (A),
368 7. Extracting computational content from proofs
+ (∃x A) := × + (A), − (∃x A) := − (A),
+ (A ∧ B) := + (A) × + (B), − (A ∧ B) := − (A) × − (B),
and for implication
+ (A → B) := ( + (A) → + (B)) × ( + (A) → − (B) → − (A)),
− (A → B) := + (A) × − (B).
Recall that ( → ◦) := ◦, (◦ → ) := , (◦ → ◦) := ◦, and ( × ◦) := ,
(◦ × ) := , (◦ × ◦) := ◦.
In case + (A) ( − (A)) is = ◦ we say that A has positive (negative)
computational content. For formulas without positive or without negative
content one can give an easy characterization, involving the well-known
notion of positive or negative occurrences of quantifiers in a formula:
+ (A) = ◦ ↔ A has no positive ∃ and no negative ∀,
− (A) = ◦ ↔ A has no positive ∀ and no negative ∃,
+ (A) = − (A) = ◦ ↔ A is quantifier-free.
Examples. (a) For quantifier-free A0 , B0 ,
+ (∀x A0 ) = ◦, − (∀x A0 ) = ,
+ (∃x A0 ) = , − (∃x A0 ) = ◦,
+ (∀x ∃y A0 ) = ( → ), − (∀x ∃y A0 ) = .
(b) For arbitrary A, B, writing ± A for ± (A)
+ (∀z (A → B)) = → ( + A → + B) × ( + A → − B → − A),
+ (∃z A → B) = ( × + A → + B) × ( × + A → − B → − A),
− (∀z (A → B)) = × ( + A × − B),
− (∃z A → B) = ( × + A) × − B.
Later we will see many more examples.
It is interesting to note that for an existential formula with a quantifier-
free kernel the positive and negative type is the same, irrespective of the
choice of the existential quantifier, constructive or classical.
Lemma. ± (∃˜ x A0 ) = ± (∃x A0 ) for A0 quantifier-free. In more detail,
(a) + (∃˜ x A) = + (∃x A) = × + (A) provided − (A) = ◦,
(b) − (∃˜ x A) = − (∃x A) = − (A) provided + (A) = ◦.
7.4. Gödel’s Dialectica interpretation 369
Proof. For an arbitrary formula A we have
+ (∀x (A → ⊥) → ⊥) = + (∀x (A → ⊥)) → − (∀x (A → ⊥))
= ( → + (A → ⊥)) → ( × − (A → ⊥))
= ( → + (A) → − (A)) → ( × + (A)),
+ (∃x A) = × + (A).
Both types are equal if − (A) = ◦. Similarly
− (∀x (A → ⊥) → ⊥) = + (∀x (A→⊥)) = + (A→⊥)
= + (A) → − (A),
− (∃x A) = − (A).
Both types are = − (A) if + (A) = ◦.
7.4.2. Gödel translation. For every formula A and terms r of type + (A)
and s of type − (A) we define a new quantifier-free formula |A|rs by
induction on A:
|Ps |rs := Ps,
|A ∧ B|rs := |A|r0
s0 ∧ |B|s1 ,
r1
|∀x A(x)|rs := |A(s0)|r(s0)
s1 ,
|A → B|s := |A|r1(s0)(s1) → |B|r0(s0)
r s0
.
|∃x A(x)|rs := |A(r0)|r1
s , s1
The formula ∃x ∀y |A|xy is called the Gödel translation of A and is often
denoted by AD . Its quantifier-free kernel |A|xy is called Gödel kernel of A;
it is denoted by AD .
For readability we sometimes write terms of a pair type in pair form:
|∀z A|fz,y := |A|fz
y ,
|A ∧ B|x,z
y,u := |A|y ∧ |B|u ,
x z
|∃z A|z,x
y := |A| x
y, |A → B|x,u := |A|gxu → |B|fx
f,g x
u .
Examples. (a) For quantifier-free formulas A0 , B0 with x ∈
/ FV(B0 )
+ (∀x A0 → B0 ) = − (∀x A0 ) = , − (∀x A0 → B0 ) = ◦,
−
(∃x (A0 → B0 )) = ,
+
(∃x (A0 → B0 )) = ◦.
Then
|∀x A0 → B0 |xε = |∀x A0 |εx → |B0 |εε = A0 → B0 ,
|∃x (A0 → B0 )|xε = A0 → B0 .
(b) For A with + (A) = ◦ and z ∈
/ FV(A), and arbitrary B
+ (A → ∃z B) = ( × + (B)) × ( + (B) → − (A)),
+ (∃z (A → B)) = × ( + (B) × ( + (B) → − (A))),
− (A → ∃z B) = − (B),
− (∃z (A → B)) = − (B).
370 7. Extracting computational content from proofs
Then
|A → ∃z B|vz,y ,g
= |A|εgv → |∃z B|z,y
v = |A|gv → |B|v ,
ε y
|∃z (A → B)|z,
v
y,g
= |A → B|y,g
v = |A|gv → |B|v .
ε y
(c) For arbitrary A
+ (∀x ∃y A(x, y)) = ( → × + (A)),
+ (∃f → ∀x A(x, fx)) = ( → ) × ( → + (A)),
− (∀x ∃y A(x, y)) = × − (A),
− (∃f → ∀x A(x, fx)) = × − (A).
Then
fx,z
|∀x ∃y A(x, y)|x,u
x
= |∃y A(x, y)|fx,z
u = |A(x, fx)|zu ,
|∃f → ∀x A(x, fx)|f,
x,u
xz
= |∀x A(x, fx)|x,u
xz
= |A(x, fx)|zu .
(d) For arbitrary A, writing ± A for ± (A)
+ (∀z (A → ∃z A)) = → ( + A → × + A) × ( + A → − A → − A),
− (∀z (A → ∃z A)) = × ( + A × − A).
Then
z,x ,x,w w
|∀z (A → ∃z A)|z,z x,w
x
= |A → ∃z A|x,w
x z,x ,x,w w
= |A|xw → |∃z A|z,x
w
= |A|xw → |A|xw .
7.4.3. Characterization. We consider the question when the Gödel
translation of a formula A is equivalent to the formula itself. This will only
hold if we assume the (constructively doubtful) Markov principle (MP),
for higher type variables and quantifier-free formulas A0 , B0 :
(∀x A0 → B0 ) → ∃x (A0 → B0 ) (x ∈
/ FV(B0 )).
We will also need the less problematic axiom of choice (AC)
∀x ∃y A(x, y) → ∃f → ∀x A(x, f(x)).
and the independence of premise axiom (IP)
(A → ∃x B) → ∃x (A → B) (x ∈
/ FV(A), + (A) = ◦).
Notice that (AC) expresses that we can only have continuous dependen-
cies.
Theorem (Characterization).
AC + IP + MP (A ↔ ∃x ∀y |A|xy ).
7.4. Gödel’s Dialectica interpretation 371
Proof. Induction on A; we only treat the implication case.
(A → B) ↔ (∃x ∀y |A|xy → ∃v ∀u |B|vu ) by induction hypothesis
↔ ∀x (∀y |A|xy → ∃v ∀u |B|vu )
↔ ∀x ∃v (∀y |A|xy → ∀u |B|vu ) by (IP)
↔ ∀x ∃v ∀u (∀y |A|xy → |B|vu )
↔ ∀x ∃v ∀u ∃y (|A|xy → |B|vu ) by (MP)
↔ ∃f ∀x ∀u ∃y (|A|xy → |B|fx
u ) by (AC)
↔ ∃f,g ∀x,u (|A|xgxu → |B|fx
u ) by (AC)
↔ ∃f,g ∀x,u |A → B|f,g
x,u
where the last step is by definition.
Without the Markov principle one can still prove some relations between
A and its Gödel translation. This, however, requires conditions G + (A),
G − (A) on A, defined inductively by
G ± (Ps ) := ,,
G + (A → B) := ( − (A) = ◦) ∧ G − (A) ∧ G + (B),
G − (A → B) := G + (A) ∧ G − (B),
G ± (A ∧ B) := G ± (A) ∧ G ± (B),
G ± (∀x A) := G ± (A), G ± (∃x A) := G ± (A).
Proposition.
AC ∃x ∀y |A|xy → A if G − (A), (39)
AC A → ∃x ∀y |A|xy +
if G (A). (40)
Proof. Both directions are proved simultaneously, by induction on A.
Case ∀z A. (39). Assume G − (A).
∃f ∀z,y |∀z A|fz,y → ∃f ∀z,y |A|fz
y by definition
→ ∀z ∃x ∀y |A|xy
→ ∀z A by induction hypothesis, using G − (A).
(40). Assume G + (A).
∀z A → ∀z ∃x ∀y |A|xy by induction hypothesis, using G + (A)
→ ∃f ∀z ∀y |A|fz
y by (AC)
→ ∃f ∀z,y |∀z A|fz,y by definition.
372 7. Extracting computational content from proofs
Case A → B. (39). Assume G + (A) and G − (B).
∃f,g ∀x,u |A → B|f,g
x,u → ∃f,g ∀x,u (|A|gxu → |B|u )
x fx
by definition
→ ∃f ∀x ∀u ∃y (|A|xy → |B|fx
u )
→ ∀x ∃v ∀u ∃y (|A|y → |B|u )
x v
→ ∀x ∃v ∀u (∀y |A|xy → |B|vu )
→ ∀x ∃v (∀y |A|xy → ∀u |B|vu )
→ ∀x (∀y |A|xy → ∃v ∀u |B|vu )
→ (∃x ∀y |A|xy → ∃v ∀u |B|vu )
→ (A → B) by induction hypothesis,
where in the final step we have used G + (A) and G − (B).
(40). Assume − (A) = ◦, G − (A) and G + (B).
(A → B) → (∃x |A|xε → ∃v ∀u |B|vu ) by induction hypothesis
→ ∀x (|A|xε → ∃v ∀u |B|vu )
→ ∀x ∃v ∀u (|A|xε → |B|vu )
→ ∃f ∀x ∀u (|A|xε → |B|fx
u ) by (AC)
→ ∃f ∀x,u |A → B|fx,u by definition.
Case ∃z A. (39). Assume G − (A).
∃z,x ∀y |∃z A|z,x
y → ∃z ∃x ∀y |A|y
x
by definition
→ ∃z A by induction hypothesis, using G − (A).
(40). Assume G + (A).
∃z A → ∃z ∃x ∀y |A|xy by induction hypothesis, using G + (A).
→ ∃z,x ∀y |∃z A|z,x
y by definition.
7.4.4. Soundness. Let Heyting arithmetic HA in all finite types be the
fragment of TCF where (i) the only base types are N and B, and (ii) the
only inductively defined predicates are totality, Leibniz equality Eq, the
(proper) existential quantifier and conjunction. We prove soundness of
the Dialectica interpretation for HA + AC + IP + MP, for our natural
deduction formulation of the underlying logic.
We first treat some axioms, and show that each of them has a “logical
Dialectica realizer”, that is, a term t such that ∀y |A|ty can be proved in
HA .
For (∃+ ) this was proved in example (d) of 7.4.2. The introduction
axioms for totality and Eq, conjunction introduction (∧+ ) and elimina-
tion (∧− ) all have obvious Dialectica realizers. The elimination axioms
for totality (i.e., induction) and for existence are treated below, in their
7.4. Gödel’s Dialectica interpretation 373
(equivalent) rule formulation. The elimination axiom for Eq can be dealt
with similarly.
The axioms (MP), (IP) and (AC) all have the form C → D where
+ (C ) ∼ + (D) and − (C ) ∼ − (D), with ∼ indicating that and
are canonically isomorphic. This has been verified for (MP), (IP) and
(AC) in examples (a)–(c) of 7.4.2, respectively. Such canonical isomor-
phisms can be expressed by -terms
f + : + (C ) → + (D), f − : − (C ) → − (D),
g + : + (D) → + (C ), g − : − (D) → − (C )
(they have been written explicitly in 7.4.2). It is easy to check that the
+
Gödel translations |C |ug − v and |D|fv u are equal (modulo -conversion).
But then f + , u g − is a Dialectica realizer for the axiom C → D, because
+ − +
|C → D|fu,v ,u g = |C |ug − v → |D|fv u .
Theorem (Soundness). Let M be a derivation
HA + AC + IP + MP A
from assumptions ui : Ci (i = 1, . . . , n). Let xi of type + (Ci ) be variables
for realizers of the assumptions, and y be a variable of type − (A) for a
challenge of the goal. Then we can find terms et+ (M ) =: t of type + (A)
with y ∈/ FV(t) and et− −
i (M ) =: ri of type (Ci ), and a derivation in HA
of |A|y from assumptions ūi : |Ci |ri .
t xi
Proof. Induction on M . We begin with the logical rules and leave the
treatment of the remaining axioms—induction, cases and (∃− )—for the
end.
Case u : A. Let x of type + (A) be a variable for a realizer of the
assumption u. Define et+ (u) := x and et− 0 (u) := y.
Case u A M B . By induction hypothesis we have a derivation of |B|tz
from ū : |A|xr and ūi : |Ci |xrii , where ū : |A|xr may be absent. Substitute y0
for x and y1 for z. By (→+ ) we obtain |A|y0 t[x:=y0]
r[x,z:=y0,y1] → |B|y1 , which
is (up to -conversion)
|A → B|yx t,x,z r , from ūi : |Ci |xrii[x,z:=y0,y1] .
Here r is the canonical inhabitant of the type − (A) in case ū : |A|xr is
absent. Hence we can define the required terms by (assuming that u A
is u1 )
et+ (u M ) := (x et+ (M ), x,z et−
1 (M )),
et− −
i (u M ) := eti+1 (M )[x, z := y0, y1].
374 7. Extracting computational content from proofs
Case M A→B N A . By induction hypothesis we have a derivation of
t0(x0)
|A → B|tx = |A|x0
t1(x0)(x1) → |B|x1 from |Ci |xpii , |Ck |xpkk , and of
x
|A|sz from |Cj |qjj , |Ck |xqkk .
Substituting s, y for x in the first derivation and of t1sy for z in the
second derivation gives
|A|st1sy → |B|t0s
y from |Ci |xpi , |Ck |xpk , and
i k
x
|A|st1sy from |Cj |q j , |Ck |xq k .
j k
Now we contract |Ck |xpk and |Ck |xq k : since |Ck |xwk is quantifier-free, there is
k k
a boolean term rCk such that
|Ck |xwk ↔ rCk w = tt. (41)
Hence with rk := [if rCk pk then qk else pk ] we can derive both |Ck |xpk
k
and |Ck |xq k from |Ck |xrkk . The derivation proceeds by cases on the boolean
k
term rCk pk . If it is true, then rk converts into qk , and we only need to
derive |Ck |xpk . But this follows by substituting pk for w in (41). If rCk pk
k
is false, then rk converts into pk , and we only need to derive |Ck |xq k , from
k
|Ck |xpk . But the latter implies ff = tt (substitute again pk for w in (41))
k
and therefore every quantifier-free formula, in particular |Ck |xq k .
k
Using (→− ) we obtain
x
|B|t0s
y from |Ci |xpi , |Cj |q j , |Ck |xrkk .
i j
+
Let et (MN ) := t0s and et−
i (MN ) := et−
pi , −
j (MN ) := qj , etk (MN ) :=
rk .
Case x M A(x) . By induction hypothesis we have a derivation of |A(x)|tz
from ūi : |Ci |xrii . Substitute y0 for x and y1 for z. We obtain |A(y0)|t[x:=y0]
y1 ,
which is (up to -conversion)
|∀x A(x)|yx t , from ūi : |Ci |xrii[x,z:=y0,y1] .
Hence we can define the required terms by
et+ (x M ) := x et+ (M ),
et− −
i (x M ) := eti (M )[x, z := y0, y1].
Case M ∀x A(x) s. By induction hypothesis we have a derivation of
|∀x A(x)|tz = |A(z0)|t(z0)
z1 from |Ci |xrii .
Substituting s, y for z gives
|A(s)|ts
y from |Ci |xrii[z:= s,y ] .
+
Let et (Ms) := ts and et−
i (Ms) := ri [z := s, y].
7.4. Gödel’s Dialectica interpretation 375
Case Indn,A a aM0A(0) M1∀n (A(n)→A(n+1)) ; here we restrict ourselves to N.
Note that we can assume that the induction axiom appears with sufficiently
many arguments, so that it can be seen as an application of the induction
rule. This can always be achieved by means of -expansion. Let Ik be the
set of all indices of assumption variables ui : Ci occuring free in the step
derivation Mk ; in the present case of induction over N we have k ∈ {0, 1}.
By induction hypothesis we have derivations of
|∀n (A(n) → A(n + 1))|tn,f,y =
|A(n) → A(n + 1)|tn
f,y =
|A(n)|ftn1fy → |A(n + 1)|tn0f
y from (|Ci |xri1i (n,f,y) )i∈I1
and of
|A(0)|tx00 from (|Ci |xri0i (x0 ) )i∈I0 .
˜ r̃i with free
It suffices to construct terms (involving recursion operators) t,
variables among x such that
˜
∀n,y ((|Ci |xr̃iiny )i → |A(n)|tn
y ). (42)
˜ and et−
For then define et+ (Indn,A a aM0 M1 ) := ta aM0 M1 ) :=
i (Indn,A a
r̃i ay. The recursion equations for t˜ are
˜ = t0 ,
t0 ˜ + 1) = tn0(tn).
t(n ˜
For r̃i the recursion equations may involve a case distinction correspond-
ing to the well-known need of contraction in the Dialectica interpretation.
This happens for the k-th recursion equation if and only if (i) we are not
in a base case of the induction, and (ii) i ∈ Ik , i.e., the i-th assumption
variable ui : Ci occurs free in Mk . Therefore in the present case of induc-
tion over N the recursion equation for r̃i needs a case distinction only if
i ∈ I1 and we are in the successor case; then
˜ y) =: s if ¬|Ci |xs i ,
ri1 (n, tn,
r̃i (n + 1)y =
˜
r̃i n(tn1(tn)y) otherwise.
For i ∈
/ I1 the second alternative suffices:
˜
r̃i (n + 1)y = r̃i n(tn1(tn)y).
˜ r̃i can be
In the base case we can simply define r̃i 0y = ri0 (y). Now t,
written explicitly with recursion operators:
˜ = Rnt0 n (tn0),
tn
Rn(y ri0 )n,p,y [if rCi s then p(tn1(tn)y)
˜ else s] if i ∈ I1 ,
r̃i n =
Rn(y ri0 )n,p,y (p(tn1(tn)y))
˜ otherwise
376 7. Extracting computational content from proofs
˜ y), as above. It remains to prove (42). We only
with s := ri1 (n, tn,
consider the successor case. Assume
|Ci |xr̃ii(n+1)y for all i. (43)
˜
We must show |A(n + 1)|t(n+1)
y . To this end we prove
|Ci |xri1i (n,tn,y)
˜ for all i ∈ I1 , and (44)
˜
r̃i (n + 1)y = r̃i n(tn1(tn)y) for all i. (45)
First assume i ∈ I1 . Let s := ri1 (n, tn,
˜ y). If ¬|Ci |xs i , then by definition
r̃i (n + 1)y = s, contradicting (43). Hence |Ci |xs i , which is (44). Then, by
definition, (45) holds as well. Now assume i ∈ / I1 . Then (44) does not
apply, and (45) holds by definition.
Recall the global induction hypothesis for the step derivation M1 . Used
˜ y it gives
with n, tn,
˜ ˜
(|Ci |xri1i (n,tn,y)
˜ )i∈I1 → |A(n)|tn ˜
tn1(tn)y → |A(n + 1)|tn0(
y
tn)
.
Because of (44) it suffices to prove the middle premise. By induction
˜
hypothesis (42) with y := tn1(tn)y it suffices to prove |Ci |xr̃iin(tn1(tn)y)
˜ for
all i. But this follows from (43) by (45).
Remark. It is interesting to note that (42) can also be proved by
quantifier-free induction. To this end, define
s̃0zm := z, · l−
s̃(l + 1)zm := t(m − · 1)1(t(m
˜ − · l−
· 1))(s̃lzm).
We fix z and prove by induction on n that
˜
n ≤ m → (|Ci |xr̃iin(s̃(m−·n)zm) )i → |A(n)|tn · n)zm .
s̃(m− (46)
Then (42) will follow with n := m. For the base case n = 0 we must show
˜
(|Ci |xr̃i0i (s̃mzm) )i → |A(0)|t0
s̃mzm .
Recall that the global induction hypothesis for the base derivation gives
with x0 := s̃mzm
(|Ci |xri0i (s̃mzm) )i∈I0 → |A(0)|ts̃mzm
0
.
By definition of t˜ and r̃i this is what we want. Now consider the successor
case. Assume n+1 ≤ m. We write s̃l for s̃lzm, and abbreviate s̃(m− · n−· 1)
by y. Notice that for l + 1 = m − · n by definition of s̃ we have s̃(m − · n) =
˜
tn1(tn)y. With this notation the previous argument goes through literally:
˜
Assume (43). We must show |A(n + 1)|yt(n+1) . To this end we prove (44)
and (45). First assume i ∈ I1 . Let s := ri1 (n, tn,
˜ y). If ¬|Ci |xs i , then by
definition r̃i (n + 1)y = s, contradicting (43). Hence |Ci |xs i , which is (44).
Then by definition (45) holds as well. Now assume i ∈ / I1 . Then (44) does
not apply, and (45) holds by definition.
7.4. Gödel’s Dialectica interpretation 377
Recall the global induction hypothesis for the step derivation M1 . Used
˜ y it gives
with n, tn,
˜ ˜
(|Ci |xri1i (n,tn,y)
˜ )i∈I1 → |A(n)|tn ˜
tn1(tn)y → |A(n + 1)|tn0(
y
tn)
.
Because of (44) it suffices to prove the middle premise. By induction
˜
hypothesis (42) with y := tn1(tn)y it suffices to prove |Ci |xr̃iin(tn1(tn)y)
˜ for
all i. But this follows from (43) by (45).
Case Cn,A aM0A(0) M1∀n A(n+1) . This can be dealt with similarly, but is
somewhat simpler. By induction hypothesis we have derivations of
|∀n A(n + 1)|tn,y = |A(n + 1)|tn
y from |Ci |xri1i (n,y)
and of
|A(0)|ty0 from |Ci |xri0i (y) .
i ranges over all assumption variables in Cn,A aM0 M1 (if necessary choose
˜ r̃i with free
canonical terms ri0 and ri1 ). It suffices to construct terms t,
variables among x such that
˜
∀m,y ((|Ci |xr̃iimy )i → |A(m)|tm
y ). (47)
˜ and et−
For then we can define et+ (Cn,A aM0 M1 ) = ta i (Cn,A aM0 M1 ) =
˜
r̃i ay. The defining equations for t are
˜ = t0 ,
t0 ˜ + 1) = tn
t(n
and for r̃i
r̃i 0y = ri0 , r̃i (n + 1)y = ri1 (n, y) =: s.
˜ r̃i can be written explicitly:
t,
˜ = [if m then t0 else tm],
tm r̃i m = [if m then y ri0 (y) else n,y s]
with s as above. It remains to prove (47). We only consider the successor
˜
case. Assume |Ci |xr̃ii(n+1)y for all i. We must show |A(n + 1)|t(n+1)
y . To see
this, recall that the global induction hypothesis (for the step derivation)
gives
(|Ci |xs i )i → |A(n + 1)|tn
y
and we are done.
Case ∃− x,A,B M
∃x A ∀x (A→B)
N . Again it is easiest to assume that the axiom
appears with two proof arguments, for its two assumptions. Then it can
be seen as an application of the existence elimination rule. We proceed
similar to the treatment of (→− ) above:
By induction hypothesis we have a derivation of
|∀x (A(x) → B)|tx = |A(x0) → B|t(x0)
x1
t(x0)0(x10)
= |A(x0)|x10
t(x0)1(x10)(x11) → |B|x11
378 7. Extracting computational content from proofs
from |Ci |xpii , |Ck |xpkk , and of
x
|∃x A(x)|sz = |A(s0)|s1
z from |Cj |qjj , |Ck |xqkk .
Substituting s0, s1, y for x in the first derivation and of t(s0)1(s1)y
for z in the second derivation gives
|A(s0)|s1
t(s0)1(s1)y → |B|y
t(s0)0(s1)
from |Ci |xpi , |Ck |xpk , and
i k
x
|A(s0)|s1
t(s0)1(s1)y from |Cj |q j , |Ck |xq k .
j k
Now we contract |Ck |xpk and |Ck |xq k as in case (→− ) above; with rk :=
k k
[if rCk pk then qk else pk ] we can derive both |Ck |xpk and |Ck |xq k from
k k
|Ck |xrkk . Using (→− ) we obtain
x
|B|t(s0)0(s1)
y from |Ci |xpi , |Cj |q j , |Ck |xrkk .
i j
So et+ (∃− MN ) := t(s0)0(s1) and
et− −
i (∃ MN ) := pi , et− −
j (∃ MN ) := qj , et− −
k (∃ MN ) := rk .
7.4.5. A unified treatment of modified realizability and the Dialectica in-
terpretation. Following Oliva [2006], we show that modified realizability
can be treated in such a way that similarities with the Dialectica interpre-
tation become visible. To this end, one needs to change the definitions of
+ (A) and − (A) and also of the Gödel translation |A|xy in the implica-
tional case, as follows:
r+ (A → B) := r+ (A) → r+ (B),
||A → B||fx,u := ∀y ||A||xy → ||B||fx
u .
r− (A → B) := r+ (A) × r− (B),
Note that the (changed) Gödel translation ||A||xy is not quantifier-free any
more, but only ∃-free. Then the above definition of r can be expressed in
terms of the (new) ||A||xy :
r r A ↔ ∀y ||A||ry .
This is proved by induction on A. For prime formulas the claim is obvious.
Case A → B, with r+ (A) = ◦, r− (A) = ◦.
r r (A → B) ↔ ∀x (x r A → rx r B) by definition
↔ ∀x (∀y ||A||xy
→ ∀u ||B||rx
u ) by induction hypothesis
↔ ∀x,u (∀y ||A||y → ||B||u )
x rx
= ∀x,u ||A → B||rx,u by definition.
The other cases are similar (even easier).
7.4. Gödel’s Dialectica interpretation 379
7.4.6. Dialectica interpretation of general induction. Recall the general
recursion operator introduced in (5) (in 6.2.1):
FxG = Gx(y [if y < x then FyG else ε]),
where ε denotes a canonical inhabitant of the range. Using general in-
duction one can prove that F is total:
Theorem. If , G and x are total, then so is FxG.
Proof. Fix total functions and G. We apply general induction on x
to show that FxG is total, which we write as (FxG)↓. By (3) it suffices
to show that
∀y;y<x (FyG)↓ → (FxG)↓.
But this follows from (5), using the totality of , G and x.
Again, in our special case of the <-relation general recursion is easily defi-
nable from structural recursion; the details are spelled out in Schwichten-
berg and Wainer [1995, pp. 399f]. However, general recursion is preferable
from an efficiency point of view.
For an implementation of the Dialectica interpretation it is advisable to
replace axioms by rules whenever possible. In particular, more perspicu-
ous realizers for proofs involving general induction can be obtained if the
induction axiom appears with sufficiently many arguments, so that it can
be seen as an application of the induction rule. Note that this can always
be achieved by means of -expansion.
h
Case GIndn,A a hkM Progn A(n) : A(n). By induction hypothesis we can
derive
|Proghn A(n)|tn,f,z =
|∀n (∀m;hm<hn A(m) → A(n))|tn,f,z =
|∀m;hm<hn A(m) → A(n)|tn
f,z =
|∀m;hm<hn A(m)|ftn1fz → |A(n)|tn0f
z =
(h(tn1fz0) < hn → |A(tn1fz0)|f(tn1fz0)
tn1fz1 ) → |A(n)|tn0f
z from |Ci |xrii(n,f,z) ,
where i ranges over all assumption variables in GIndn,A a hkM (if neces-
sary choose canonical terms ri ). It suffices to construct terms (involving
˜ r̃i with free variables among x
general recursion operators) t, such that
˜
∀n,z ((|Ci |xr̃iinz )i → |A(n)|tn
z ), (48)
for then we can define et+ (GIndn,A a hkM ) = tk ˜ and et−
i (GIndn,A a hkM )
˜
= r̃i kz. The recursion equations for t and r̃i are
r (n, [t]˜ <hn , z) =: s if ¬|Ci |xs i ,
˜ <hn , r̃i nz = i
˜ = tn0[t]
tn
[r̃i ]<hn (t 0)(t 1) otherwise,
380 7. Extracting computational content from proofs
with the abbreviations
[r]<hn := m [if hm < hn then rm else ε], t := tn1[t]
˜ <hn z.
It remains to prove (48). For its proof we use general induction. Fix n.
We can assume
˜
∀m;hm<hn ∀z ((|Ci |xr̃iimz )i → |A(m)|tm
z ). (49)
Fix z and assume |Ci |xr̃iinz for all i. We must show |A(n)|tn z . If ¬|Ci |s for
˜ xi
some i, then by definition r̃i nz = s and we have |Ci |xs i , a contradiction.
Hence |Ci |xs i for all i, and therefore r̃i nz = [r̃i ]<hn (t 0)(t 1). The induction
hypothesis (49) with m := t 0 and z := t 1 gives
h(t 0) < hn → (|Ci |xr̃ii(t 0)(t 1) )i → |A(t 0)|t(t
˜ 0)
t1 .
Recall that the global induction hypothesis (for the derivation of progres-
˜ <hn
siveness) gives with f := [t]
(|Ci |xs i )i → (h(t 0) < hn → |A(t 0)|[tt] 1<hn (t 0) ) → |A(n)|tn0[
˜ ˜ <hn
t]
z .
˜ 0) = [t]
Since t(t ˜ <hn (t 0) and r̃i nz = [r̃i ]<hn (t 0)(t 1) = r̃i (t 0)(t 1) we
are done.
Notice that we can view this proof as an application of quantifier-free
general induction, where the formula (|Ci |xr̃iinz )i → |A(n)|tn ˜
z is proved w.r.t.
the measure function h nz := hn.
7.5. Optimal decoration of proofs
In this section we are interested in “fine-tuning” the computational
content of proofs, by inserting decorations. Here is an example (due to
Constable) of why this is of interest. Suppose that in a proof M of a
formula C we have made use of a case distinction based on an auxiliary
lemma stating a disjunction, say L : A ∨ B. Then the extract et(M )
will contain the extract et(L) of the proof of the auxiliary lemma, which
may be large. Now suppose further that in the proof M of C the only
computationally relevant use of the lemma was which one of the two
alternatives holds true, A or B. We can express this fact by using a
weakened form of the lemma instead: L : A ∨u B. Since the extract et(L )
is a boolean, the extract of the modified proof has been “purified” in the
sense that the (possibly large) extract et(L) has disappeared.
In 7.5.1 we consider the question of “optimal” decorations of proofs:
suppose we are given an undecorated proof, and a decoration of its end
formula. The task then is to find a decoration of the whole proof (in-
cluding a further decoration of its end formula) in such a way that any
other decoration “extends” this one. Here “extends” just means that some
7.5. Optimal decoration of proofs 381
connectives have been changed into their more informative versions, dis-
regarding polarities. We show that such an optimal decoration exists, and
give an algorithm to construct it.
We then consider applications. In 7.5.2 we take up the example of list
reversal used by Berger [2005b] to demonstrate that usage of ∀nc rather
than ∀c can significantly reduce the complexity of extracted programs,
in this case from quadratic to linear. The Minlog implementation of
the decoration algorithm automatically finds the optimal decoration. A
similar application of decoration is treated in 7.5.3. It occurs when one
derives double induction (recurring to two predecessors) in continuation
passing style, i.e., not directly, but using as an intermediate assertion
(proved by induction)
∀cn,m ((Qn →c Q(Sn) →c Q(n + m)) →c Q0 →c Q1 →c Q(n + m)).
After decoration, the formula becomes
∀cn ∀nc
m ((Qn → Q(Sn) → Q(n + m)) → Q0 → Q1 → Q(n + m)).
c c c c c
This is applied (as in Chiarabini [2009]) to obtain a continuation based tail
recursive definition of the Fibonacci function, from a proof of its totality.
7.5.1. Decoration algorithm. We denote the sequent of a proof M by
Seq(M ); it consists of its context and end formula.
The proof pattern P(M ) of a proof M is the result of marking in c.r.
formulas of M (i.e., those not above a n.c. formula) all occurrences of
implications and universal quantifiers as non-computational, except the
“uninstantiated” formulas of axioms and theorems. For instance, the
induction axiom for N consists of the uninstantiated formula ∀cn (P0 →c
∀cn (Pn →c P(Sn)) →c Pn N ) with a unary predicate variable P and a
predicate substitution P → {x | A(x)}. Notice that a proof pattern in
most cases is not a correct proof, because at axioms formulas may not fit.
We say that a formula D extends C if D is obtained from C by changing
some (possibly zero) of its occurrences of non-computational implications
and universal quantifiers into their computational variants →c and ∀c .
A proof N extends M if (i) N and M are the same up to variants of
implications and universal quantifiers in their formulas, and (ii) every c.r.
formula of M is extended by the corresponding one in N . Every proof M
whose proof pattern P(M ) is U is called a decoration of U .
Notice that if a proof N extends another one M , then FV(et(N )) is
essentially (that is, up to extensions of assumption formulas) a superset
of FV(et(M )). This can be proven by induction on N .
In the sequel we assume that every axiom has the property that for every
extension of its formula we can find a further extension which is an instance
of an axiom, and which is the least one under all further extensions that
are instances of axioms. This property clearly holds for axioms whose
uninstantiated formula only has the decorated →c and ∀c , for instance
induction. However, in ∀cn (A(0) →c ∀cn (A(n) →c A(Sn)) →c A(n^N )) the
given extension of the four A’s might be different. One needs to pick their
“least upper bound” as further extension.
We will define a decoration algorithm, assigning to every proof pattern
U and every extension of its sequent an “optimal” decoration M∞ of U ,
which further extends the given extension of its sequent.
Theorem. Under the assumption above, for every proof pattern U and
every extension of its sequent Seq(U ) we can find a decoration M∞ of U
such that
(a) Seq(M∞ ) extends the given extension of Seq(U ), and
(b) M∞ is optimal in the sense that any other decoration M of U whose
sequent Seq(M ) extends the given extension of Seq(U ) has the property
that M also extends M∞ .
Proof. By induction on derivations. It suffices to consider derivations
with a c.r. endformula. For axioms the validity of the claim was assumed,
and for assumption variables it is clear.
Case (→nc )+ . Consider the proof pattern
Γ, u : A
   | U
   B
------------- (→nc )+ u
A →nc B
with a given extension Δ ⇒ C →nc D or Δ ⇒ C →c D of its sequent
Γ ⇒ A →nc B. Applying the induction hypothesis for U with sequent
Δ, C ⇒ D, one obtains a decoration M∞ of U whose sequent Δ1 , C1 ⇒
D1 extends Δ, C ⇒ D. Now apply (→nc )+ in case the given extension is
Δ ⇒ C →nc D and xu ∉ FV(et(M∞ )), and (→c )+ otherwise.
For (b) consider a decoration λu M of λu U whose sequent extends the
given extended sequent Δ ⇒ C →nc D or Δ ⇒ C →c D. Clearly the
sequent Seq(M ) of its premise extends Δ, C ⇒ D. Then M extends
M∞ by induction hypothesis for U . If λu M derives a non-computational
implication then the given extended sequent must be of the form Δ ⇒
C →nc D and xu ∉ FV(et(M )), hence xu ∉ FV(et(M∞ )). But then
by construction we have applied (→nc )+ to obtain λu M∞ . Hence λu M
extends λu M∞ . If λu M does not derive a non-computational implication,
the claim follows immediately.
Case (→nc )− . Consider a proof pattern
Φ, Γ           Γ, Ψ
 | U            | V
A →nc B         A
-------------------- (→nc )−
         B
We are given an extension Π, Δ, Σ ⇒ D of Φ, Γ, Ψ ⇒ B. Then we proceed
in alternating steps, applying the induction hypothesis to U and V .
(1) The induction hypothesis for U for the extension Π, Δ ⇒ A →nc D
of its sequent gives a decoration M1 of U whose sequent Π1 , Δ1 ⇒ C1 →
D1 extends Π, Δ ⇒ A →nc D, where → means →nc or →c . This already
suffices if A is n.c., since then the extension Δ1 , Σ ⇒ C1 of V is a correct
proof (recall that in n.c. parts of a proof decorations of implications and
universal quantifiers can be ignored). If A is c.r.:
(2) The induction hypothesis for V for the extension Δ1 , Σ ⇒ C1 of its
sequent gives a decoration N2 of V whose sequent Δ2 , Σ2 ⇒ C2 extends
Δ1 , Σ ⇒ C1 .
(3) The induction hypothesis for U for the extension Π1 , Δ2 ⇒ C2 → D1
of its sequent gives a decoration M3 of U whose sequent Π3 , Δ3 ⇒ C3 →
D3 extends Π1 , Δ2 ⇒ C2 → D1 .
(4) The induction hypothesis for V for the extension Δ3 , Σ2 ⇒ C3 of its
sequent gives a decoration N4 of V whose sequent Δ4 , Σ4 ⇒ C4 extends
Δ3 , Σ2 ⇒ C3 . This process is repeated until no further proper extension
of Δi , Ci is returned. Such a situation will always be reached since there
is a maximal extension, where all connectives are maximally decorated.
But then we easily obtain (a): Assume that in (4) we have Δ4 = Δ3 and
C4 = C3 . Then the decoration
Π3 , Δ3          Δ4 , Σ4
 | M3             | N4
C3 → D3           C4
-------------------- →−
         D3
of UV derives a sequent Π3 , Δ3 , Σ4 ⇒ D3 extending Π, Δ, Σ ⇒ D.
For (b) we need to consider a decoration MN of UV whose sequent
Seq(MN ) extends the given extension Π, Δ, Σ ⇒ D of Φ, Γ, Ψ ⇒ B.
We must show that MN extends M3 N4 . To this end we go through the
alternating steps again.
(1) Since the sequent Seq(M ) extends Π, Δ ⇒ A →nc D, the induction
hypothesis for U for the extension Δ ⇒ A →nc D of its sequent ensures
that M extends M1 .
(2) Since then the sequent Seq(N ) extends Δ1 , Σ ⇒ C1 , the induction
hypothesis for V for the extension Δ1 , Σ ⇒ C1 of its sequent ensures that
N extends N2 .
(3) Therefore Seq(M ) extends the sequent Π1 , Δ2 ⇒ C2 → D1 , and the
induction hypothesis for U for the extension Π1 , Δ2 ⇒ C2 → D1 of U ’s
sequent ensures that M extends M3 .
(4) Therefore Seq(N ) extends Δ3 , Σ2 ⇒ C3 , and induction hypothesis
for V for the extension Δ3 , Σ2 ⇒ C3 of V ’s sequent ensures that N also
extends N4 .
But since Δ4 = Δ3 and C4 = C3 by assumption, MN extends the
decoration M3 N4 of UV constructed above.
Case (∀nc )+ . Consider a proof pattern
  Γ
  | U
  A
------------- (∀nc )+
∀ncx A
with a given extension Δ ⇒ ∀ncx C or Δ ⇒ ∀cx C of its sequent. Applying the
induction hypothesis for U with sequent Δ ⇒ C , one obtains a decoration
M∞ of U whose sequent Δ1 ⇒ C1 extends Δ ⇒ C . Now apply (∀nc )+ in
case the given extension is Δ ⇒ ∀ncx C and x ∉ FV(et(M∞ )), and (∀c )+
otherwise.
For (b) consider a decoration λx M of λx U whose sequent extends
the given extended sequent Δ ⇒ ∀ncx C or Δ ⇒ ∀cx C . Clearly the sequent
Seq(M ) of its premise extends Δ ⇒ C . Then M extends M∞ by induction
hypothesis for U . If λx M derives a non-computational generalization,
then the given extended sequent must be of the form Δ ⇒ ∀ncx C and x ∉
FV(et(M )), hence x ∉ FV(et(M∞ )) (by the remark above). But then by
construction we have applied (∀nc )+ to obtain λx M∞ . Hence λx M extends
λx M∞ . If λx M does not derive a non-computational generalization, the
claim follows immediately.
Case (∀nc )− . Consider a proof pattern
   Γ
   | U
∀ncx A(x)      r
----------------- (∀nc )−
   A(r)
and let Δ ⇒ C (r) be any extension of its sequent Γ ⇒ A(r). The induction
hypothesis for U for the extension Δ ⇒ ∀ncx C (x) produces a decoration
M∞ of U whose sequent extends Δ ⇒ ∀ncx C (x). Then apply (∀nc )− or
(∀c )− , whichever is appropriate, to obtain the required M∞ r.
For (b) consider a decoration Mr of Ur whose sequent Seq(Mr) extends
the given extension Δ ⇒ C (r) of Γ ⇒ A(r). Then M extends M∞ by
induction hypothesis for U , and hence Mr extends M∞ r.
We illustrate the effects of decoration on a simple example involving
implications. Consider A → B → A with the trivial proof M := λu1^A λu2^B u1 .
Clearly only the first implication must transport possible computational
content. To “discover” this by means of the decoration algorithm we
specify as extension of Seq(P(M )) the formula A →nc B →nc A. The
algorithm then returns a proof of A →c B →nc A.
7.5.2. List reversal, again. We first give an informal weak existence
proof for list reversal. Recall that the weak (or “classical”) existential
quantifier is defined by
∃˜ x A := ¬∀x ¬A.
The proof is similar to the one given in 7.2.9. Again assuming (20) and
(21) we prove
∀v ∃˜ w Rvw ( := ∀v (∀w (Rvw → ⊥) → ⊥)). (50)
Fix v and assume u : ∀w ¬Rvw; we need to derive a contradiction. To
this end we prove that all initial segments of v are non-revertible, which
contradicts (20). More precisely, from u and (21) we prove
∀v2 A(v2 ) with A(v2 ) := ∀v1 (v1 v2 = v → ∀w ¬Rv1 w)
by induction on v2 . For v2 = nil this follows from our initial assumption
u. For the step case, assume v1 (xv2 ) = v, fix w and assume further Rv1 w.
We must derive a contradiction. By (21) we conclude that R(v1 x, xw). On
the other hand, properties of the append function imply that (v1 x)v2 = v.
The induction for v1 x gives ∀w ¬R(v1 x, w). Taking xw for w leads to the
desired contradiction.
We formalize this proof, to prepare it for decoration. The following
lemmata will be used:
Compat : ∀P ∀v1 ,v2 (v1 = v2 → Pv1 → Pv2 ),
Symm : ∀v1 ,v2 (v1 = v2 → v2 = v1 ),
Trans : ∀v1 ,v2 ,v3 (v1 = v2 → v2 = v3 → v1 = v3 ),
L1 : ∀v (v = v nil),
L2 : ∀v1 ,x,v2 ((v1 x)v2 = v1 (xv2 )).
The proof term is
M := λv λu^{∀w¬Rvw} (Indv2,A(v2) v v MBase MStep nil T^{nilv=v} nil InitRev)

with

MBase := λv1 λu1^{v1nil=v} (Compat {v | ∀w ¬Rvw} v v1
             (Symm v1 v (Trans v1 (v1 nil) v (L1 v1 ) u1 )) u),

MStep := λx,v2 λu0^{A(v2)} λv1 λu1^{v1(xv2)=v} λw λu2^{Rv1w} (
             u0 (v1 x) (Trans ((v1 x)v2 ) (v1 (xv2 )) v (L2 v1 x v2 ) u1 )
             (xw) (GenRev v1 w x u2 )).
We now have a proof M of ∀v ∃˜ w Rvw from the clauses InitRev : D1 and
GenRev : D2 , with D1 := R(nil, nil) and D2 := ∀v,w,x (Rvw → R(vx,
xw)). Using the refined A-translation (cf. section 7.3) we can replace ⊥
throughout by ∃w Rvw. The end formula ∀v ∃˜ w Rvw := ∀v ¬∀w ¬Rvw :=
∀v (∀w (Rvw → ⊥) → ⊥) is turned into ∀v (∀w (Rvw → ∃w Rvw) →
∃w Rvw). Since its premise is an instance of existence introduction we
obtain a derivation M ∃ of ∀v ∃w Rvw. Moreover, in this case neither the
Di nor any of the axioms used involves ⊥ in its uninstantiated formulas,
and hence the correctness of the proof is not affected by the substitution.
The term neterm extracted in Minlog from a formalization of the proof
above is (after “animating” Compat)
[v0]
(Rec list nat=>list nat=>list nat=>list nat)v0([v1,v2]v2)
([x1,v2,g3,v4,v5]g3(v4:+:x1:)(x1::v5))
(Nil nat)
(Nil nat)
with g a variable for binary functions on lists. In fact, the underlying
algorithm defines an auxiliary function h by
h(nil, v2 , v3 ) := v3 , h(xv1 , v2 , v3 ) := h(v1 , v2 x, xv3 )
and gives the result by applying h to the original list and twice nil.
Notice that the second argument of h is not needed. However, its
presence makes the algorithm quadratic rather than linear, because in
each recursion step v2 x is computed, and the list append function is
defined by recursion on its first argument. We will be able to get rid of
this superfluous second argument by decorating the proof. It will turn out
that in the proof (by induction on v2 ) of the auxiliary formula A(v2 ) :=
∀v1 (v1 v2 = v → ∀w ¬Rv1 w), the variable v1 is not used computationally.
Hence, in the decorated version of the proof, we can use ∀nc v1 .
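To see the quadratic behaviour concretely, here is a hand-written Scheme transcription of h (our own illustration, not Minlog output; the names h-quad and rev-quad are ours):

  (define (h-quad v1 v2 v3)          ; h(nil,v2,v3) := v3
    (if (null? v1)                   ; h(x::v1,v2,v3) := h(v1, v2++[x], x::v3)
        v3
        (h-quad (cdr v1)
                (append v2 (list (car v1)))  ; append recomputed each step: quadratic
                (cons (car v1) v3))))

  (define (rev-quad v) (h-quad v '() '()))   ; the second argument is never used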
Let us now apply the general method of decorating proofs to the ex-
ample of list reversal. To this end, we present our proof in more detail,
particularly by writing proof trees with formulas. The decoration algo-
rithm then is applied to its proof pattern with the sequent consisting of the
context R(nil, nil) and ∀ncv,w,x (Rvw →nc R(vx, xw)) and the end formula
∀ncv ∃lw Rvw.
Rather than describing the algorithm step by step we only display the
end result. Among the axioms used, the only ones in c.r. parts are Compat
and list induction. They appear in the decorated proof in the form
Compat : ∀P ∀ncv1,v2 (v1 = v2 → Pv1 →c Pv2 ),
Ind : ∀cv2 (A(nil) →c ∀cx,v2 (A(v2 ) →c A(xv2 )) →c A(v2 ))

with A(v2 ) := ∀ncv1 (v1 v2 = v → ∀cw ¬∃ Rv1 w) and ¬∃ Rv1 w := Rv1 w →
∃lw Rvw. MBase∃ is the derivation in Figure 1, where N is a derivation
involving L1 with a free assumption u1 : v1 nil = v. MStep∃ is the derivation
in Figure 2, where N1 is a derivation involving L2 with free assumption
u1 : v1 (xv2 ) = v, and N2 is one involving GenRev with the free assumption
u2 : Rv1 w.
The extracted term neterm then is
[v0]
(Rec list nat=>list nat=>list nat)v0([v1]v1)
([x1,v2,f3,v4]f3(x1::v4))
(Nil nat)
                                      [u1 : v1 nil = v]
Compat {v | ∀cw ¬∃ Rvw} v v1                | N
v = v1 → ∀cw ¬∃ Rvw →c ∀cw ¬∃ Rv1 w       v = v1
------------------------------------------------
   ∀cw ¬∃ Rvw →c ∀cw ¬∃ Rv1 w      ∃+ : ∀cw ¬∃ Rvw
   -----------------------------------------------
                ∀cw ¬∃ Rv1 w
   ------------------------------ (→nc )+ u1
   v1 nil = v → ∀cw ¬∃ Rv1 w
   ------------------------------
   ∀ncv1 (v1 nil = v → ∀cw ¬∃ Rv1 w)   (= A(nil))

          Figure 1. The decorated base derivation
                                  [u1 : v1 (xv2 ) = v]
[u0 : A(v2 )]   v1 x                    | N1
(v1 x)v2 = v → ∀cw ¬∃ R(v1 x, w)      (v1 x)v2 = v
--------------------------------------------------
     ∀cw ¬∃ R(v1 x, w)    xw                           [u2 : Rv1 w]
     -------------------------                            | N2
        ¬∃ R(v1 x, xw)                                 R(v1 x, xw)
        ----------------------------------------------------------
                           ∃lw Rvw
        ------------------------------- (→nc )+ u2
                 ¬∃ Rv1 w
        -------------------------------
               ∀cw ¬∃ Rv1 w
        ------------------------------- (→nc )+ u1
        v1 (xv2 ) = v → ∀cw ¬∃ Rv1 w
        ----------------------------------------
        ∀ncv1 (v1 (xv2 ) = v → ∀cw ¬∃ Rv1 w)   (= A(xv2 ))
        ---------------------------------------- (→c )+ u0
                A(v2 ) →c A(xv2 )
        ----------------------------------------
             ∀cx,v2 (A(v2 ) →c A(xv2 ))

          Figure 2. The decorated step derivation
with f a variable for unary functions on lists. To run this algorithm one
has to normalize the term obtained by applying neterm to a list:
(pp (nt (mk-term-in-app-form neterm (pt "1::2::3::4:"))))
The returned value is the reversed list 4::3::2::1:. This time, the un-
derlying algorithm defines an auxiliary function g by
g(nil, w) := w, g(x :: v, w) := g(v, x :: w)
and gives the result by applying g to the original list and nil. In conclusion,
we have obtained (by machine extraction from an automated decoration
of a weak existence proof) the standard linear algorithm for list reversal,
with its use of an accumulator.
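For comparison, a direct Scheme rendering of g (again hand-written; the name rev-lin is ours):

  (define (g v w)                    ; g(nil,w) := w,  g(x::v,w) := g(v, x::w)
    (if (null? v) w (g (cdr v) (cons (car v) w))))

  (define (rev-lin v) (g v '()))     ; one cons per element: linear, tail recursive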
7.5.3. Passing continuations. A similar application of decoration oc-
curs when one derives double induction
∀cn (Qn →c Q(Sn) →c Q(S(Sn))) →c ∀cn (Q0 →c Q1 →c Qn)
in continuation passing style, i.e., not directly, but using as an intermediate
assertion (proved by induction)
∀cn,m ((Qn →c Q(Sn) →c Q(n + m)) →c Q0 →c Q1 →c Q(n + m)).
After decoration, the formula becomes
∀cn ∀ncm ((Qn →c Q(Sn) →c Q(n + m)) →c Q0 →c Q1 →c Q(n + m)).
This can be applied to obtain a continuation based tail recursive defini-
tion of the Fibonacci function, from a proof of its totality. Let G be the
graph of the Fibonacci function, defined by the clauses
G00, G11,
∀ncn,v,w (Gnv →nc G(Sn, w) →nc G(S(Sn), v + w)).
We view G as a predicate variable without computational content. From
these assumptions one can easily derive
∀cn ∃v Gnv,
using double induction (proved in continuation passing style). The term
extracted from this proof is
[n0]
(Rec nat=>nat=>(nat=>nat=>nat)=>nat=>nat=>nat)
n0([n1,k2]k2)
([n1,p2,n3,k4]p2(Succ n3)([n7,n8]k4 n8(n7+n8)))
applied to 0, ([n1,n2]n1), 0 and 1. An unclean aspect of this term is
that the recursion operator has value type
nat=>(nat=>nat=>nat)=>nat=>nat=>nat
rather than (nat=>nat=>nat)=>nat=>nat=>nat, which would corre-
spond to an iteration. However, we can repair this by decoration. After
(automatic) decoration of the proof, the extracted term becomes
[n0]
(Rec nat=>(nat=>nat=>nat)=>nat=>nat=>nat)
n0([k1]k1)
([n1,p2,k3]p2([n6,n7]k3 n7(n6+n7)))
applied to ([n1,n2]n1), 0 and 1. This indeed is iteration in continuation
passing style.
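The following hand-written Scheme sketch (the name fib is ours) mirrors the decorated extracted term: the recursion only builds up a continuation, which is finally applied to the initial continuation ([n1,n2]n1) and the base values 0 and 1.

  (define (fib n)
    (define (F n)                    ; F(0) = identity on continuations,
      (if (zero? n)                  ; F(n+1)(k) = F(n)(lambda (a b) (k b (+ a b)))
          (lambda (k) k)
          (lambda (k) ((F (- n 1)) (lambda (a b) (k b (+ a b)))))))
    (((F n) (lambda (a b) a)) 0 1))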
7.6. Application: Euclid’s theorem
Yiannis Moschovakis suggested the following example of a classical
existence proof with a quantifier-free kernel which does not obviously
contain an algorithm: the gcd of two natural numbers a1 and a2 is a linear
combination of the two. Here we treat this example as a case study for
program extraction from classical proofs. We will apply both methods
discussed above: the refined A-translation (7.3) and the Dialectica inter-
pretation (7.4). It will turn out that in both cases we obtain reasonable
extracted terms, which are in fact quite similar.
7.6.1. Informal proof. We spell out the usual informal proof, which uses
the minimum principle. This is done in rather great detail, because for the
application of the metamathematical methods of proof interpretation we
need a full formalization.
Theorem. Assume 0 < a2 . Then there are natural numbers k1 , k2 such
that 0 < |k1 a1 − k2 a2 | and Rem(ai , |k1 a1 − k2 a2 |) = 0 (i = 1, 2).
Proof. Assume 0 < a2 . Let A(k1 , k2 ) := (0 < |k1 a1 − k2 a2 |). There
are k1 , k2 such that A(k1 , k2 ): take k1 := 0 and k2 := 1. The minimum
principle for A(k1 , k2 ) with measure |k1 a1 − k2 a2 | provides us with k1 , k2
such that
A(k1 , k2 ), (51)
∀l1 ,l2 (|l1 a1 − l2 a2 | < |k1 a1 − k2 a2 | → A(l1 , l2 ) → ⊥). (52)
Assume
∀k1 ,k2 (0 < |k1 a1 − k2 a2 | → Rem(a1 , |k1 a1 − k2 a2 |) = 0 →
Rem(a2 , |k1 a1 − k2 a2 |) = 0 → ⊥).
We must show ⊥. To this end we apply the assumption to k1 , k2 . Since
0 < |k1 a1 − k2 a2 | by (51) it suffices to prove Rem(ai , |k1 a1 − k2 a2 |) = 0
(i = 1, 2); for symmetry reasons we only consider i = 1. Abbreviate
q := Quot(a1 , |k1 a1 − k2 a2 |), r := Rem(a1 , |k1 a1 − k2 a2 |).
Because of 0 < |k1 a1 − k2 a2 | general properties of Quot and Rem ensure
a1 = q|k1 a1 − k2 a2 | + r, r < |k1 a1 − k2 a2 |.
From this—using the step lemma below—we obtain
r = |Step(a1 , a2 , k1 , k2 , q) a1 − qk2 a2 | < |k1 a1 − k2 a2 |,
with l1 := Step(a1 , a2 , k1 , k2 , q) and l2 := qk2 .
(52) applied to l1 , l2 gives A(l1 , l2 ) → ⊥ and hence 0 = |l1 a1 −l2 a2 | = r.
Lemma (Step).
a1 = q · |k1 a1 − k2 a2 | + r → r = |Step(a1 , a2 , k1 , k2 , q)a1 − qk2 a2 |.
Proof. Let

Step(a1 , a2 , k1 , k2 , q) := { qk1 − 1   if k2 a2 < k1 a1 and 0 < q,
                               { qk1 + 1   otherwise.
Clearly the values are natural numbers. Assume 0 < q. If k2 a2 < k1 a1 ,
then a1 = q · (k1 a1 − k2 a2 ) + r, hence

r = (1 − qk1 )a1 + qk2 a2
  = −(qk1 − 1)a1 + qk2 a2
  = −Step(a1 , a2 , k1 , k2 , q)a1 + qk2 a2
  = |Step(a1 , a2 , k1 , k2 , q)a1 − qk2 a2 |,

and in case k2 a2 ≥ k1 a1 we have a1 = −q · (k1 a1 − k2 a2 ) + r, hence

r = (qk1 + 1)a1 − qk2 a2
  = |Step(a1 , a2 , k1 , k2 , q)a1 − qk2 a2 |.
For q = 0, Step(a1 , a2 , k1 , k2 , 0) = 1 and the claim is correct.
7.6.2. Extracted terms. The refined A-translation when applied to a
formalization of the proof above produces a term eta :=
[n0,n1]
[if (0=Rem n0 n1)
(0@1)
[if (0<Rem n0 n1)
((Rec nat=>nat=>nat=>nat@@nat)([n2,n3]0@0)
([n2,f3,n4,n5]
[if (0=Rem n1(Lin n0 n1(n4@n5)))
[if (0=Rem n0(Lin n0 n1(n4@n5)))
(n4@n5)
[if (0<Rem n0(Lin n0 n1(n4@n5)))
(f3 (Step n0 n1(n4@n5)(Quot n0(Lin n0 n1(n4@n5))))
(Quot n0(Lin n0 n1(n4@n5))*n5))
(0@0)]]
[if (0<Rem n1(Lin n0 n1(n4@n5)))
(f3(Quot n1(Lin n0 n1(n4@n5))*n4)
(Step n1 n0(n5@n4)(Quot n1(Lin n0 n1(n4@n5)))))
(0@0)]])
n1
(Step n0 n1(0@1)(Quot n0 n1))
(Quot n0 n1))
(0@0)]]
The term extracted via the Dialectica interpretation from a formalization
of this proof is etd :=
[n0,n1]
[let pf712
((Rec nat=>nat@@nat=>nat@@nat)([p3]0@0)
([n3,pf4,p5]
[if (0<Lin n0 n1 p5 impb
Rem n0(Lin n0 n1 p5)=0 impb
Rem n1(Lin n0 n1 p5)=0 impb False)
(pf4
[let p6
(Step n0 n1 p5(Quot n0(Lin n0 n1 p5))@
Quot n0(Lin n0 n1 p5)*right p5)
[if (Lin n0 n1 p6<n3 impb 0<Lin n0 n1 p6 impb False)
(Quot n1(Lin n0 n1 p5)*left p5@
Step n1 n0(right p5@left p5)(Quot n1(Lin n0 n1 p5)))
p6]])
p5])
n1)
[let p2
[if (0<n1 impb Rem n0 n1=0 impb False)
(pf712(Step n0 n1(0@1)(Quot n0 n1)@Quot n0 n1))
(0@1)]
[if (0<Lin n0 n1 p2 impb
Rem n0(Lin n0 n1 p2)=0 impb
Rem n1(Lin n0 n1 p2)=0 impb False)
(pf712(0@[if (0<n1) 0 2]))
p2]]]
Application of term-to-expr to etd as well as eta results in a Scheme
expression which can be “evaluated”, provided we have “defined” (in the
sense of the underlying programming language) the functions |Step| and
|Lin|:
(define |Step|
(lambda (a1)
(lambda (a2)
(lambda (p)
(lambda (q)
(if (and (< (* (cdr p) a2) (* (car p) a1)) (< 0 q))
(- (* q (car p)) 1)
(+ (* q (car p)) 1)))))))
(define |Lin|
(lambda (a1)
(lambda (a2)
(lambda (p)
(abs (- (* (car p) a1) (* (cdr p) a2)))))))
The result for (((ev (term-to-expr etd)) 66) 27) is (16 . 39). In-
deed |16 ∗ 66 − 39 ∗ 27| = 3, which is the greatest common divisor
of 66 and 27. For (((ev (term-to-expr eta)) 66) 27) the result
is (2 . 5), and again, |2 ∗ 66 − 5 ∗ 27| = 3.
Remarks. As we see from this example the recursion parameter n is
not really used in the computation but just serves as a counter or more
precisely as an upper bound for the number of steps until both remainders
are zero. This will always happen if the induction principle is used only
in the form of the minimum principle (or, equivalently, <-induction),
because then in the extracted terms of <-induction the step term has in
its kernel no free occurrence of n.
If we remove n according to this remark it becomes clear that our gcd
algorithm is similar to Euclid’s. The only difference lies in the fact that we
have kept a1 , a2 fixed in our proof whereas Euclid changes a1 to a2 and a2
to Rem(a1 , a2 ) provided Rem(a1 , a2 ) > 0 (using the fact that this doesn’t
change the ideal).
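To make the comparison concrete, the loop underlying eta (with the counter removed, as just described) can be sketched in Scheme; gcd-lin, lin and step are our own uncurried stand-ins for |Lin| and |Step|:

  (define (lin a1 a2 k1 k2) (abs (- (* k1 a1) (* k2 a2))))

  (define (step a1 a2 k1 k2 q)
    (if (and (< (* k2 a2) (* k1 a1)) (< 0 q))
        (- (* q k1) 1)
        (+ (* q k1) 1)))

  (define (gcd-lin a1 a2)            ; returns (k1 . k2); assumes 0 < a2
    (let loop ((k1 0) (k2 1))
      (let ((m (lin a1 a2 k1 k2)))   ; m = |k1*a1 - k2*a2| stays positive
        (cond ((positive? (remainder a1 m))
               (let ((q (quotient a1 m)))
                 (loop (step a1 a2 k1 k2 q) (* q k2))))
              ((positive? (remainder a2 m))
               (let ((q (quotient a2 m)))
                 (loop (* q k1) (step a2 a1 k2 k1 q))))
              (else (cons k1 k2))))))

By the step lemma each round strictly decreases |k1 a1 − k2 a2 |, so the loop terminates with both remainders zero.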
7.7. Notes
Much of the material in the present chapter is due to Troelstra [1973],
and has its roots in work of Kreisel [1963]. More information on the
BHK-interpretation and its history may be found in Troelstra and van
Dalen [1988].
A very important new theme in the area, though it runs somewhat
orthogonally to the foundational viewpoint of this chapter, is the signif-
icant work of Kohlenbach on “proof mining”. There the emphasis is on
the application of techniques from realizability and functional interpre-
tations to the analysis of, and the extraction of numerical bounds from,
core theorems of mathematics by Kreisel-style “unwinding” of proofs.
Significant applications have been found in the areas of approximation
theory, functional analysis, fixed point theory in hyperbolic spaces, er-
godic theory etcetera; see for example Kohlenbach and Oliva [2003b],
Kohlenbach [2005], Gerhardy and Kohlenbach [2008], Kohlenbach and
Leustean [2003]. The book by Kohlenbach [2008] gives a detailed treat-
ment of what has been done up to 2007. A good introductory survey of
the work can be found in Kohlenbach and Oliva [2003a].
The concept of a “non-computational” universal quantifier treated in
section 7.2 was introduced by Berger [1993a], [2005b]. A somewhat re-
lated idea in the context of so-called pure type systems has been formu-
lated in Miquel [2001]. However, in his Gen rule used to introduce the
non-computational quantifier Miquel is more restrictive: the generalized
variable is required to not occur at all in the given proof M , whereas
Berger only requires that it is not a computational variable in M , which is
expressed here by x ∉ FV(et(M )).
Section 7.3 is based on Berger, Buchholz, and Schwichtenberg [2002].
It generalizes previously known results of Kreisel [1963], Friedman [1978]
and Dragalin [1979] since B in ∀x ∃˜ y B need not be quantifier-free, but
only has to belong to the strictly larger class of goal formulas (defined in
7.3.1). Furthermore we allow unproven lemmata D in the proof of ∀x ∃˜ y B,
where D is a definite formula (defined in 7.3.1). Closely related classes of
formulas have (independently) been introduced by Ishihara [2000].
Section 7.4 develops the Dialectica interpretation of Gödel [1958] from
scratch, using natural deduction. The history of natural-deduction-based
treatments of the Dialectica interpretation is nicely described in Hernest’s
thesis [2006]:
Natural deduction formulations of the Diller and Nahm [1974]
variant of D-interpretation were provided by Diller’s students
Rath [1978] and Stein [1976]. Only in the year 2001 Jørgensen
provided a first Natural Deduction formulation of the original
Gödel functional interpretation. In the Diller–Nahm setting
all choices between the potential realizers of a contraction are
postponed to the very end by collecting all candidates and
making a single final global choice. In contrast, Jørgensen’s
formulation respects Gödel’s original treatment of contraction
by immediate (local) choices. Jørgensen devises a so-called
“contraction lemma” in order to handle (in the given natural
deduction context) the discharging of more than one copy of
an assumption in an implication introduction →+ .
In all these natural deduction formulations of the Dialectica interpretation
open assumptions are viewed as formulas, and consequently the problem
of contractions arises when an application of the implication introduc-
tion rule →+ discharges more than one assumption formula. However,
it seems to be more in the spirit of the Curry–Howard correspondence
(formulas correspond to types, and proofs to terms) to view assump-
tions as assumption variables. This is particularly important when—say
in an implementation—one wants to assign object terms (“realizers”, in
Gödel’s T) to proof terms. To see the point, notice that a proof term
M may have many occurrences of a free assumption variable u^A. The
associated realizer et(M ) then needs to contain an object variable xu of
type τ(A) uniquely associated with u^A, again with many occurrences. To
organize this in an appropriate way it seems mandatory to be able to
refer to an assumption A by means of its “label” u. This is carried out
in section 7.4, which includes a relatively simple proof of the soundness
theorem. The unified treatment of modified realizability and the Dialec-
tica interpretation in 7.4.5 is due to Oliva [2006]. More details on the
Dialectica interpretation of general induction in 7.4.6 can be found in
Schwichtenberg [2008a].
It is of obvious interest to compare the two computational interpreta-
tions of classical logic, refined A-translation and Dialectica interpretation.
Ratiu and Trifonov [2010] have done significant work in this direction,
with the infinite pigeonhole principle as a case study. One may also at-
tempt to transfer Berger’s [1993a] idea of non-computational quantifiers
into the realm of the Dialectica interpretation. Studies in this direction
have been carried out by Hernest [2006], Hernest and Trifonov [2010] and
Trifonov [2009].
Other interesting examples of program extraction from classical proofs
have been studied by Murthy [1990], Coquand’s group (see, e.g., Coquand
and Persson [1999]) in a type-theoretic context, by Kohlenbach [1996]
using a Dialectica interpretation, and by Raffalli [2004].
There is also a different line of research aimed at giving an algorith-
mic interpretation to (specific instances of) the classical double negation
rule. It essentially started with Griffin’s observation [1990] that Felleisen’s
control operator C (Felleisen, Friedman, Kohlbecker, and Duba [1987],
Felleisen and Hieb [1992]) can be given the type of the stability schema
¬¬A → A. This initiated quite a bit of work aimed at extending the
Curry–Howard correspondence to classical logic; for example, by Bar-
banera and Berardi [1993], Constable and Murthy [1991], Krivine [1994]
and Parigot [1992].
The decoration algorithm in 7.5 is taken from Ratiu and Schwichtenberg
[2010]. Some of its applications are based on work of Berger [2005a] and
observations of Chiarabini [2009].
Further case studies of using Minlog for program extraction from proofs
can be found in Schwichtenberg [2005], Berger, Berghofer, Letouzey, and
Schwichtenberg [2006], Schwichtenberg [2008b].
Chapter 8
LINEAR TWO-SORTED ARITHMETIC
In this final chapter we focus much of the technical/logical work of
previous chapters onto theories with limited (more feasible) computa-
tional strength. The initial motivation is the surprising result of Bellan-
toni and Cook [1992] characterizing the polynomial-time functions by
the primitive recursion schemes, but with a judiciously placed semicolon
first used by Simmons [1988], separating the variables into two kinds
(or sorts). The first “normal” kind controls the length of recursions,
and the second “safe” kind marks the places where substitutions are al-
lowed. Various alternative names have arisen for the two sorts of vari-
ables, which will play a fundamental role throughout this chapter, thus
“normal”/“input” and “safe”/“output”; we shall use the input–output
terminology. The important distinction here is that input and output
variables will not just be of base type, but may be of arbitrary higher
type.
We begin by developing a basic version of arithmetic which incorpo-
rates this variable separation. This theory EA(;) will have elementary
recursive strength (hence the prefix E) and sub-elementary (polynomially
bounded) strength when restricted to its Σ1 -inductive fragment. EA(;) is
a first-order theory which we use as a means to illustrate the underlying
principles available in such two-sorted situations. Our aim however is to
extend the Bellantoni and Cook variable separation to also incorporate
higher types. This produces a theory A(;) extending EA(;) with higher
type variables and quantifiers, having as its term system a two-sorted ver-
sion T(;) of Gödel’s T. T(;) will thus give a functional interpretation for
A(;), which has the same elementary computational strength, but is more
expressive and applicable.
We then go a stage further in formulating a theory LA(;) all of whose
provable recursions are polynomially bounded, not just those in the Σ1 -
inductive fragment; but to achieve this, an important additional aspect
now comes into play. We need the logic to be linear (hence the prefix L)
and the corresponding term system LT(;) to have a linearity restriction
on higher type output variables in order to ensure that the computational
content remains polynomial-time computable.
The following relationships will hold between the theories and their
corresponding functional interpretations:
Arithmetic    A(;)    LA(;)
               =       =
Gödel’s T     T(;)    LT(;)
The leading intuition is of course that one should use the Curry–Howard
correspondence between terms in lambda-calculus and derivations in
arithmetic. However, in the two-sorted versions we are about to develop,
care must be taken to arrive at flexible and easy-to-use systems which can
be understood in their own right.
The first recursion-theoretic definition of polynomial-time computable
functions was given by Cobham [1965], and much later Cook and Kapron
[1990] proposed a notion of “basic feasible functional” of higher type,
in their system PV . One should also mention the work of Leivant and
Marion [1993], which gave a “tiered” typed λ-calculus characterization
of poly-time. However, Buss’ [1986] bounded arithmetic gave the first
proof-theoretic characterization of polynomial-time in terms of provable
recursiveness, and then Leivant [1995b], [1995a] characterized it (poly-
time) in a “predicative” theory without explicit bounds on quantifiers.
“Implicit complexity” (in theories without explicit bounds) subsequently
became a topic in itself. Our development is based on EA(;) introduced in
Çaǧman, Ostrin, and Wainer [2000] and Ostrin and Wainer [2005], which
reworks Leivant’s results in a simpler context, and on the papers Bellan-
toni, Niggl, and Schwichtenberg [2000] and Schwichtenberg and Bellan-
toni [2002], where linearity was first introduced in the setting of Gödel’s T,
in conjunction with the Bellantoni–Cook style of two-sorted recursion.
However, the notion of linearity used here is very down-to-earth, meaning
essentially “no contraction”, and it should not be confused with Girard’s
linear logic and its [1998] “light” variant. Other related work is that of
Bellantoni and Hofmann [2002], based on Hofmann’s [1999] concept of
“non-size-increasing” recursion. Many different proof-theoretic charac-
terizations of the poly-time functions can be found in the logic literature,
for example: Cantini’s [2002] and Strahm’s [1997] are based on applicative
theories of combinatory logic; also Marion’s [2001] gives a particularly
simple approach, where quantifiers are restricted to “actual terms”. Hof-
mann [2000] studies safe recursion at higher types, and Strahm [2004]
characterizes the type-2 basic feasible functionals, again in an applica-
tive theory. Other complexity classes, e.g., for boolean circuits, loga-
rithmic space, polynomial space, are studied widely and have their own
proof-theoretic and recursion-theoretic characterizations—see Clote and
Takeuti [1995] and Oitavem [2001]. In a different direction, Beckmann,
Pollett, and Buss [2003] investigate provability and non-provability of well-
foundedness of ordinal notations in subsystems of bounded arithmetic.
8.1. Provable recursion and complexity in EA(;)
In this and the following sections, we consider ways of characterizing
the elementary functions (and complexity subclasses of them) by proof-
theoretic systems which have a more immediate computational relevance
than provable Σ1 -definability in IΔ0 (exp) say. Thus we require new, al-
ternative notions of “provable recursiveness”, more directly related to
recursion and computation than to logical definability. One such alterna-
tive approach, a very natural one due to, and developed extensively by,
Leivant [1995b], [1995a], is based on recursive definability in the equation
calculus. EA(;) will have the same strength as Leivant’s “two-sorted in-
trinsic theory” over N but is different in its conception, the emphasis being
on syntactic simplicity. The axioms are arbitrary equational definitions
of partial recursive functions, and we call a function f, introduced by
a system of defining equations E, “provably recursive” if ∃a (f(x⃗) ≃ a)
is derivable from those axioms E. Of course the logic has to be set up
carefully so as to prevent proofs of ∃a (f(x⃗) ≃ a) when f is only partially
defined. Furthermore, the induction rules must be sufficiently restrictive
that only functions of finitely iterated exponential complexity are prov-
ably total. In contrast to IΔ0 (exp) however, the restriction will not be on
the classes of induction formulas allowed, but on the kinds of variables
allowed, as the genesis of the theory lies in the “normal–safe” recursion
schemes of Bellantoni and Cook [1992]. They show how the polynomial-
time functions can be defined by an amazingly simple, two-sorted variant
of the usual primitive recursion schemes, in which (essentially) one is only
allowed to substitute for safe variables and do recursion over normal vari-
ables. So what if one imposes the same kind of variable separation on
formal arithmetic? Then one obtains a theory with two kinds of number
variables: “safe” or “output” variables which may be quantified over, and
“normal” or “input” variables which control the lengths of inductions
and only occur free! The analogies between this logically weak theory and
classical arithmetic are quite striking.
The key notion is that of “definedness” of a term t, expressed by

t↓ := ∃a (t ≃ a)

and it is this definition which highlights the principal logical restriction
which must be applied to the ∃-introduction and (dually) ∀-elimination
rules of the theory described below. For if arbitrary terms t were allowed
as witnesses in ∃-introduction, then from the axiom t ≃ t we could
immediately deduce ∃a (t ≃ a) and hence in particular f(x⃗)↓ for every
partial recursive f! This is clearly not what we want. Thus we make the
restriction that only “basic” terms—variables or 0 or their successors or
predecessors—may be used as witnesses. This is not quite so restrictive as
it first appears, since from the equality axiom

t ≃ a → A(t) → A(a)

we can derive immediately

t↓ → A(t) → ∃a A(a).

Thus a term may be used to witness an existential quantifier only when
it has been proven to be defined. In particular, if f is introduced by
a defining equation f(x⃗) ≃ t then to prove f(x⃗)↓ we first must prove
(compute) t↓. Here we can begin to see that, provided we formulate
the theory carefully enough, proofs in its Σ1 -fragment will correspond
to computations in the equation calculus, and bounds on proof-size will
yield complexity measures.
8.1.1. The theory EA(;). There will be two kinds of variables: “in-
put” (or “normal”) variables denoted n, m, . . . , and “output” (or “safe”)
variables denoted a, b, c, . . . , both here intended as ranging over natural
numbers. Output variables may be bound by quantifiers, but input vari-
ables will always be free (one might better consider them as uninterpreted
constants). The basic terms are variables of either kind, the constant 0,
or the result of repeated application of the successor S or predecessor P.
General terms are built up in the usual way from 0 and variables of either
kind, by application of S, P and arbitrary function symbols f, g, h, . . .
denoting partial recursive functions given by sets E of Herbrand–Gödel–
Kleene-style defining equations.
Atomic formulas will be equations t1 ≃ t2 between arbitrary terms,
and formulas A, B, . . . are built from these by applying propositional
connectives and quantifiers ∃a , ∀a over output variables a. The negation
of a formula ¬A will be defined as A → F, where (as before) F stands for
“false”.
We shall work in minimal, rather than classical, logic. This is compu-
tationally more natural, and is not a restriction for us here, since (as has
already been shown) a classical proof of f(n)↓ can be transformed, by
the double-negation interpretation, into a proof in minimal logic of
(∃a ((f(n) ≃ a → ⊥) → ⊥) → ⊥) → ⊥
and since minimal logic has no special rule for ⊥ we could replace it
throughout by the formula f(n)↓ and hence obtain an outright proof of
f(n)↓, since the premise of the above implication becomes provable.
It is not necessary to list the propositional rules. However, as stressed
above, the quantifier rules need to be restricted to basic terms as witnesses.
Thus the ∀− rule is
∀a A(a)      t
--------------- ∀−
    A(t)
where t is a basic term, and thus from the ∃+ axiom one obtains A(t) →
∃a A(a), but again only when t is basic.
Two further principles are needed, describing the data-type N, namely
induction
A(0) → ∀a (A(a) → A(Sa)) → A(t)
where t is a basic term on an input variable, and cases
A(0) → ∀a A(Sa) → ∀a A(a).
Definition. Our notion of Σ1 -formula will be restricted to those of
the form ∃a A(
a ) where A is a conjunction of atomic formulas. A typical
example is f(n )↓. Note that a conjunction of such Σ1 -formulas is provably
equivalent to a single Σ1 -formula, by distributivity of ∃ over ∧.
Definition. A k-ary function f is provably recursive in EA(;) if it can
be defined by a system E of equations such that, with input variables
n1 , . . . , nk ,

Ē ⊢ f(n1 , . . . , nk )↓

where Ē denotes the set of universal closures (over output variables) of
the defining equations in E.
8.1.2. Elementary functions are provably recursive. Let E be a system of
defining equations containing the usual primitive recursions for addition
and multiplication:
a + 0 ≃ a,   a + Sb ≃ S(a + b),
a · 0 ≃ 0,   a · Sb ≃ (a · b) + a,

and further equations of the forms

p0 ≃ S0,   pi ≃ pi0 + pi1 ,   pi ≃ pi0 · b
defining a sequence {pi : i = 0, 1, 2, . . . } of polynomials in variables
b⃗ = b1 , . . . , bn . Henceforth we allow p(b⃗) to stand for any one of the
polynomials so generated (clearly all polynomials can be built up in this
way).
Definition. The progressiveness of a formula A(a) with distinguished
free variable a is expressed by the formula
Proga A := A(0) ∧ ∀a (A(a) → A(Sa));
thus the induction principle of EA(;) is equivalent to
Proga A → A(n).
The following lemmas derive extensions of this principle, first to any
polynomial in n⃗, then to any finitely iterated exponential. In the next
subsection we shall see that this is the most that EA(;) can do.
Lemma. Let p(b⃗) be any polynomial defined by a system of equations
E as above. Then for every formula A(a) we have, with input variables
substituted for the variables of p,

Ē, Proga A ⊢ A(p(n⃗)).
Proof. Proceed by induction over the build-up of the polynomial p
according to the given equations E. We argue in an informal natural
deduction style, deriving the succedent of a sequent from its antecedent.
If p is the constant 1 (that is, S0) then A(S0) follows immediately
from A(0) and A(0) → A(S0), the latter arising from substitution of the
defined, basic term 0 for the universally quantified variable a in ∀a (A(a) →
A(Sa)).
Suppose p is p0 + p1 where, by the induction hypothesis, the result
is assumed for each of p0 and p1 separately. First choose A(a) to be
the formula a↓ and note that in this case Proga A is provable. Then the
induction hypothesis applied to p0 gives p0 (n⃗)↓. Now again with an
arbitrary formula A, we can easily derive

Ē, Proga A, A(a) ⊢ Progb (a + b↓ ∧ A(a + b))

because if a + b is assumed to be defined, it can be substituted for the
universally quantified a in ∀a (A(a) → A(Sa)) to yield A(a + b) →
A(a + Sb). Therefore by the induction hypothesis applied to p1 we obtain

Ē, Proga A, A(a) ⊢ a + p1 (n⃗)↓ ∧ A(a + p1 (n⃗))

and hence

Ē, Proga A ⊢ ∀a (A(a) → A(a + p1 (n⃗))).

Finally, substituting the defined term p0 (n⃗) for a, and using the induction
hypothesis on p0 to give A(p0 (n⃗)) we get the desired result

Ē, Proga A ⊢ A(p0 (n⃗) + p1 (n⃗)).

Suppose p is p1 · b where b is a fresh variable not occurring in p1 . By
the induction hypothesis applied to p1 , we have as above p1 (n⃗)↓ and

Ē, Proga A ⊢ ∀a (A(a) → A(a + p1 (n⃗)))

for any formula A. Also, from the defining equations E and since p1 (n⃗)↓,
we have p1 (n⃗) · 0 ≃ 0 and p1 (n⃗) · Sb ≃ (p1 (n⃗) · b) + p1 (n⃗). Therefore
we can prove

Ē, Proga A ⊢ Progb (p1 (n⃗) · b↓ ∧ A(p1 (n⃗) · b))

and an application of the EA(;)-induction principle on variable b gives,
for any input variable n,

Ē, Proga A ⊢ p1 (n⃗) · n↓ ∧ A(p1 (n⃗) · n)

and hence Ē, Proga A ⊢ A(p(n⃗)) as required.
Definition. Extend the system of equations E above by adding the
new recursive definitions

f1 (a, 0) ≃ Sa,   f1 (a, Sb) ≃ f1 (f1 (a, b), b),

and for each k = 2, 3, . . . ,

fk (a, b1 , . . . , bk ) ≃ f1 (a, fk−1 (b1 , . . . , bk ))

so that f1 (a, b) = a + 2^b and fk (a, b⃗) = a + 2^{fk−1(b⃗)}. Finally define

2k (p(n⃗)) ≃ fk (0, . . . , 0, p(n⃗))

for each polynomial p given by E, and similarly for exponential bases
other than 2.
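These growth rates are easy to check by hand; a small Scheme transcription (the names f1 and two-k are ours) might read:

  (define (f1 a b)                   ; f1(a,0)=Sa, f1(a,Sb)=f1(f1(a,b),b); f1(a,b)=a+2^b
    (if (zero? b) (+ a 1) (f1 (f1 a (- b 1)) (- b 1))))

  (define (two-k k n)                ; 2_k(n): a tower of 2s of height k, via f1(0,-)
    (if (zero? k) n (f1 0 (two-k (- k 1) n))))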
Lemma. In EA(;) we can prove, for each k and any formula A(a),
Ē, Proga A ⊢ A(2k (p(n⃗))).
Proof. First note that by a similar argument to one used in the previous
lemma (and going back all the way to Gentzen) we can prove, for any
formula A(a),
Ē, Proga A ⊢ Progb ∀a (A(a) → f1 (a, b)↓ ∧ A(f1 (a, b)))
since the b := 0 case follows straight from Proga A, and the induction step
from b to Sb follows by appealing to the hypothesis twice: from A(a)
we first obtain A(f1 (a, b)) with f1 (a, b)↓, and then (by substituting the
defined f1 (a, b) for the universally quantified variable a) from A(f1 (a, b))
follows A(f1 (a, Sb)) with f1 (a, Sb)↓, using the defining equations for f1 .
The result is now obtained straightforwardly by induction on k. As-
suming Ē and Proga A we derive
Progb ∀a (A(a) → f1 (a, b)↓ ∧ A(f1 (a, b)))
and then by the previous lemma
∀a (A(a) → f1 (a, p(n⃗))↓ ∧ A(f1 (a, p(n⃗))))

and then with a := 0 and using A(0) we have 21 (p(n⃗))↓ and A(21 (p(n⃗))),
which is the case k = 1. For the step from k to k + 1 do the same,
but instead of the previous lemma use the induction to replace p(n⃗) by
2k (p(n⃗)).
Theorem. Every elementary (E³) function is provably recursive in the
theory EA(;), and every sub-elementary (E²) function is provably recursive
in the fragment which allows induction only on Σ1 -formulas.
Proof. Any elementary function g(n⃗) is computable by a register ma-
chine M (working in unary notation with basic instructions “successor”,
“predecessor”, “transfer” and “jump”) within a number of steps bounded
by 2k (p(n⃗)) for some fixed k and polynomial p. Let r1 (c), r2 (c), . . . , rn (c)
be the values held in its registers at step c of the computation, and let i(c)
be the number of the machine instruction to be performed next. Each of
these functions depends also on the input parameters n⃗, but we suppress
mention of these for brevity. The state of the computation i, r1 , r2 , . . . , rn
at step c + 1 is obtained from the state at step c by performing the atomic
act dictated by the instruction i(c). Thus the values of i, r1 , . . . , rn at step
c + 1 can be defined from their values at step c by a simultaneous recursive
definition involving only the successor S, predecessor P and definitions
by cases C . So now, add these defining equations for i, r1 , . . . , rn to the
system E above, together with the equations for predecessor and cases:
P(0) ≃ 0,   P(Sa) ≃ a
C (0, a, b) ≃ a,   C (Sd, a, b) ≃ b
and notice that the cases rule built into EA(;) ensures that we can prove
∀d,a,b C (d, a, b)↓. Since the passage from one step to the next involves only
applications of C or basic terms, all of which are provably defined, it is
easy to convince oneself that the Σ1 -formula

∃a⃗ (i(c) ≃ a0 ∧ r1 (c) ≃ a1 ∧ · · · ∧ rn (c) ≃ an )

is provably progressive in variable c. Call this formula A(n⃗, c). Then by
the second lemma above we can prove

Ē ⊢ A(n⃗, 2k (p(n⃗)))

and hence, with the convention that the final output is the value of r1 when
the computation terminates,

Ē ⊢ r1 (2k (p(n⃗)))↓.

Hence the function g given by g(n⃗) ≃ r1 (2k (p(n⃗))) is provably recursive.
In just the same way, but using only the first lemma above, we see that
any sub-elementary function (which, e.g. by Rödding [1968], is register
machine computable in a number of steps bounded by just a polynomial
of its inputs) is provably recursive in the Σ1 -inductive fragment. This is
because the proof of A(n⃗, p(n⃗)) by the first lemma only uses inductions
on substitution instances of A, and here, A is Σ1 .
8.1.3. Provably recursive functions are elementary. Because the input
variables of EA(;), once introduced by an induction, do not get univer-
sally quantified thereafter, they are never substituted by more complex
terms (as happens in standard single-sorted theories like PA). This means
that, for any fixed numerical assignment to the inputs, the inductions can
be “unravelled” directly within the theory EA(;) itself, but the height of
the resulting unravelled proof will depend linearly on the values of the
numerical inputs. This theory then admits normalization with iterated
exponential complexity. Therefore a proof of f(n⃗)↓ will be transformed
into a normal proof of size elementary in n⃗. This process is completely
uniform in n⃗. Hence an elementary complexity bound for the function f
itself may be extracted and f is therefore elementary. Spoors [2010] devel-
ops a layered hierarchy of EA(;)-style theories whose provable functions
coincide with the levels of the Grzegorczyk hierarchy.
For Σ1 -proofs the argument is similar, but the simpler cut formulas that
occur when one unravels inductions will now lead to polynomial bounds,
because for fixed inputs n the height of the unravelled proof will in fact
be logarithmic in n (since it is a binary branching tree) and so the size
of the proof, and hence the computation of f(n), will be exponential in
log n (more precisely 2^{d·log n}, where d is the number of nested inductions)
and thus polynomial in n. If one begins instead with binary, rather
than unary, representations of numbers, then the complexity would be
polynomial in log n. Thus, in unary style the provable functions of the
Σ1 -inductive fragment of EA(;) will be the “sub-elementary” or “linear-
space” functions, and in binary style, the poly-time functions. We will
return to this and give a more detailed proof later on.
8.1.4. Two-sorted arithmetic in higher types. The theory EA(;) provides
a very basic setting in which more feasible computational notions may
be developed and proven, but in order to build a more robust theory
applicable to program development it would be natural to extend EA(;)
to a theory A(;) incorporating variables in all finite types and a more
elaborate and expressive term structure. The theory A(;) will be to EA(;)
as HAω is to HA.
We shall work with two forms of arrow types, abstraction terms and
quantifiers:

    N ↠ τ                    σ → τ
    λn r      as well as     λa r
    ∀n A                     ∀a A
and a corresponding syntactic distinction between input and output
(typed) variables. The intuition is that a formula ∀n A may be proved
by induction, but a formula ∀a A may not, and similarly a function of type
N ↠ τ may be defined by recursion on its argument, but a function of
type N → τ may not.
The formulas of A(;) will be built from prime formulas by two forms
of implication A ↠ B and A → B and the two forms above of universal
quantifiers. The existential quantifier, conjunction and disjunction will
be defined inductively, as was done previously in 7.1.5.
The induction axiom is
Indn,A : ∀n (A(0) → ∀a (A(a) → A(Sa)) → A(n))
for all “safe” formulas A, i.e., all those not containing ↠ or ∀n . In
addition we have all the other usual axioms of arithmetic in finite types,
as listed in 7.1, with the output arrow → and universal quantification ∀a
over output variables only.
Though it is far more expressive, A(;) will have the same elementary
recursive strength as EA(;). The underlying computational power of the
theory is incorporated into its term system T(;), which we now develop.
We shall later restrict A(;) to a linear-style logic LA(;) with a corre-
sponding term system LT(;). The consequence of this will be that terms
of arbitrary type will then be of polynomial complexity only, so the system
will automatically yield polynomial-time program extraction. Complex-
ity is of course an important consideration when extracting content from
proofs, and the first author’s Minlog system has this capability since both
A(;) and LA(;) are incorporated into it.
8.2. A two-sorted variant T(;) of Gödel’s T
We define a two-sorted variant T(;) of Gödel’s T, by lifting the approach
of Simmons [1988] and Bellantoni and Cook [1992] to higher types. It
is shown that the functions definable in T(;) are exactly the elementary
functions. The proof, an easier version of that given by Schwichtenberg
[2006a] for the linear system LT(;) below, is based on the observation
that β-normalization of terms of rank ≤ k has elementary complexity.
Generally, the two-sortedness restriction allows one to unfold R in a
controlled way, rather as inductions are allowed to be unravelled in EA(;).
8.2.1. Higher order terms with input/output restrictions. We shall work
with two forms of arrow types and abstraction terms:
    N ↠ τ,  λn r      as well as      σ → τ,  λa r

and a corresponding syntactic distinction between input and output
(typed) variables. The intuition is that a function of type N ↠ τ may
recurse on its argument. On the other hand, a function of type N → τ is
not allowed to recurse on its base type argument.
Formally we proceed as follows. The types are

ρ, σ, τ ::= ι | N ↠ τ | σ → τ

with a finitary base type ι. A type is called safe if it does not contain the
input arrow ↠.
The constants are the constructors for all the finitary base types, con-
taining output arrows only, and the recursion and cases operators. The
typing of the recursion operators requires usage of both ↠ and → to en-
sure sufficient control over their unfoldings. In the present case of finitary
base types the recursion operator w.r.t. ι = μα κ⃗ and result type τ is R^τ_ι of
type

ι ↠ δ0 → · · · → δk−1 → τ

where the step types δi are of the form ρ⃗ → σ⃗ → τ⃗ → τ, the ρ⃗, σ⃗ corre-
sponding to the components of the object of type ι under consideration,
and τ⃗ to the previously defined values. Recall that the first argument is
the one that is recursed on and hence must be an input term, so the type
starts with ↠. For example, the recursion operator R^τ_N over (unary)
natural numbers has type

N ↠ τ → (N ↠ τ → τ) → τ.
In general, however, we shall require simultaneous recursion operators
as described in 6.2.1, but now the type of the j-th component will be of
the form ιj ↠ δ0 → · · · → δk−1 → τj .
The typing for the cases variant of recursion is less problematic and can
be done with the output arrow → only. Recall that in the cases operator no
recursive calls occur: one just distinguishes cases according to the outer
constructor form. Thus the cases operator is C^τ_ι of type

ι → δ0 → · · · → δk−1 → τ

where all step types δi now have the simpler form ρ⃗ → σ⃗ → τ. For example
C^τ_N has type

N → τ → (N → τ) → τ.
Because of its more convenient typing we shall normally use the cases
operator rather than the recursion operator for explicit base types.
Note, however, that both the recursion and the cases operators need
to be restricted to safe value types τ. This restriction is necessary in the
proof of the normalization theorem below (analogously to cut reduction
of EA(;)-formulas which, as the reader will recall, only have quantification
over output variables).
Terms are built from these constants and typed input and output vari-
ables by introduction and elimination rules for the two type forms N ↠ τ
and σ → τ, i.e.,

n | a | C (constant) |
(λn r^τ)^{N↠τ} | (r^{N↠τ} s^N)   (s an input term) |
(λa r^τ)^{σ→τ} | (r^{σ→τ} s^σ),

where a term s is called an input term if all its free variables are input
variables.
A function f is said to be definable in T(;) if there is a closed term
tf : N ⇒ · · · ⇒ N ⇒ N (⇒ ∈ {→, ↠}) denoting this
function. Notice that it is always desirable to have more output arrows →
in the type of tf , because then there are fewer restrictions on its argument
terms.
8.2.2. Examples. In EA(;), the functions of interest were provided by
Herbrand–Gödel–Kleene-style defining equations, which is appropriate
for a first-order theory. However, in the present setting of higher order
theories we have to prove the existence of such functions, and moreover
we must decide which are input or output arguments. We will view
input positions as a convenient way to control the size of intermediate
computations, which is well-known to be a crucial requirement for feasible
definitions of functions. For ease of reading, we use n for input and a, b
for output variables of type N, and p for general output variables.
Elementary functions. Addition can be defined by a term t+ of type
N → N ↠ N. The recursion equations are

a + 0 := a,   a + Sn := S(a + n),

and the representing term is

t+ := λa,n (RN n a (λ_,p Sp)).
The predecessor function P can be defined by a term tP of type N → N if
we use the cases operator C:

tP := λa (CN a 0 (λb b)).
From the predecessor function we can define modified subtraction −̇:

a −̇ 0 := a,   a −̇ Sn := P(a −̇ n)

by the term

t−̇ := λa,n (RN n a (λ_,p Pp)).
If f is defined from g by bounded summation f(n⃗, n) := Σi<n g(n⃗, i), i.e.,

f(n⃗, 0) := 0,   f(n⃗, Sn) := f(n⃗, n) + g(n⃗, Sn)

and we have a term tg of type N ↠ · · · ↠ N ↠ N defining g, then we
can build a term tf of the same type defining f by

tf := λn⃗,n (RN n 0 (λn,p (p + (tg n⃗ n)))).
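The intended semantics of these terms can be modelled in Scheme (a sketch of our own; rec-nat stands for RN, and the first argument of each step procedure plays the role of the predecessor):

  (define (rec-nat n a step)         ; rec-nat 0 = a, rec-nat (n+1) = step n (rec-nat n)
    (if (zero? n) a (step (- n 1) (rec-nat (- n 1) a step))))

  (define (plus a n)                 ; t+: recursion on the input n
    (rec-nat n a (lambda (m p) (+ p 1))))

  (define (monus a n)                ; t-.: modified subtraction via the predecessor
    (rec-nat n a (lambda (m p) (if (zero? p) 0 (- p 1)))))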
Higher type definitions. Consider iteration I (n, f) = f^n:

I (0, f, a) := a,                          I (0, f) := id,
                              or
I (n + 1, f, a) := I (n, f, f(a)),         I (n + 1, f) := I (n, f) ◦ f.

It can be defined by a term with f a parameter of type N → N:

If := λn (RN^{N→N} n (λa a) (λ_,p,a (p^{N→N}(fa)))).

In T(;), f can be either an input or an output variable, but in LT(;), f
will need to be an input variable, because the step argument of recursion
is an input argument.
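In the same Scheme model If becomes the following (iter is our own name); the recursion has the safe value type N → N, represented here by one-argument procedures:

  (define (iter n f)                 ; I(n,f) = f^n
    (rec-nat n
             (lambda (a) a)                          ; I(0,f) = id
             (lambda (m p) (lambda (a) (p (f a)))))) ; I(m+1,f) = I(m,f) o f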
For the general definition of iteration, let the pure safe types τk be
defined by τ0 := N and τk+1 := τk → τk . Then we can define

I n ak . . . a0 := ak^n ak−1 . . . a0 ,
with ak of type τk . These variables ak must be output variables, because
the value type of a recursion is required to be safe. Therefore, the definition
F0 ak . . . a0 := I a0 ak . . . a0 which, as noted before, is sufficient to generate
all of Gödel’s T, is not possible: I a0 is not allowed.
This observation also confirms the necessity of the restrictions on the
type of R. We must require that the value type is a safe type, for otherwise
we could define

IE := λn (RN^{N↠N} n (λm m) (λ_,p,m (p^{N↠N}(Em)))),

and IE (n, m) = E^n(m), a function of super-elementary growth.
We also need to require that the “previous” variable is an output vari-
able, because otherwise we could define

S := λn (RN n 0 (λ_,m (Em)))   (super-elementary).

Then S(n) = E^n(0).
8.2.3. Elementary functions are definable. We now show that in spite
of our restrictions on the formation of types and terms we can define
functions of exponential growth.
Probably the easiest function of exponential growth is the fast-growing
B restricted to finite ordinals n, viz. B(n, a) = a + 2^n of type B : N ↠
N → N, with the defining equations

B(0, a) = a + 1,   B(n + 1, a) = B(n, B(n, a)).

We formally define B as a term in T(;) by

B := λn (RN^{N→N} n S (λ_,p,a (p^{N→N}(pa)))).
Notice that this will not be a legal definition in the linear term system
LT(;), because of the double occurrence of the higher type variable p.
From B we can define the exponential function E := λn (Bn0) of type
E : N ↠ N, and also iterated exponential functions like λn (E(En)).
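In the Scheme model the doubled use of p is plainly visible (a hand-written sketch; it is exactly this second occurrence that LT(;) will forbid):

  (define (B n)                      ; B(n) : N -> N with B(n,a) = a + 2^n
    (if (zero? n)
        (lambda (a) (+ a 1))
        (let ((p (B (- n 1))))
          (lambda (a) (p (p a))))))  ; two occurrences of the higher type variable p

  (define (E n) ((B n) 0))           ; E(n) = 2^n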
Theorem. For every elementary function f there is a term tf of type
N ↠ · · · ↠ N ↠ N defining f as a function on inputs.
Proof. We use the characterization in 2.2.3 of the class E of elementary
functions: it consists of those number-theoretic functions which can be
defined from the initial functions: constant 0, successor S, projections
(onto the i-th coordinate), addition +, modified subtraction −̇, multi-
plication · and exponentiation 2^x, by applications of composition and
bounded minimization.
Recall that bounded minimization
f(n⃗, m) = μ_{k<m}(g(n⃗, k) = 0)
is definable from bounded summation and ∸:
f(n⃗, m) = Σ_{i<m} (1 ∸ Σ_{k≤i} (1 ∸ g(n⃗, k))).
The claim follows from the examples above.
The main problem with the representation of the elementary functions
in the theorem above is that they have input arguments only. This prevents
substitution of terms involving output variables, which is a severe restric-
tion on the use of such functions in practice. A possible solution is to (1)
introduce an additional input argument acting as a bound for the results
of intermediate computations, and (2) replace the recursion operator by
the cases operator as much as possible, exploiting the fact that the latter
is of safe type. For example, addition can be obtained as f⁺(a, b, m) with
f⁺ of type N → N → N → N, defined by
f⁺(a, b, 0) := 0,   f⁺(a, b, m + 1) := [if a = 0 then b else f⁺(P(a), b, m) + 1],
where P is the predecessor function of type N → N defined above, using
the cases operator. Then
a, b ≤ m → f⁺(a, b, m) = a + b.
Similarly, multiplication can be obtained as f×(a, b, m) with f× of type
N → N → N → N, by
f×(a, b, 0) := 0,   f×(a, b, m + 1) := [if b = 0 then 0 else f⁺(f×(a, P(b), m), a, m + 1)].
Then
a, b ≤ m → f×(a, b, m²) = a · b.
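A Haskell sketch (ours, with fplus and ftimes as our names) of these bounded definitions; the recursion is on the explicit bound argument only.

  -- Sketch: addition and multiplication recursing only on the bound m.
  predC :: Int -> Int
  predC a = if a == 0 then 0 else a - 1

  fplus :: Int -> Int -> Int -> Int   -- fplus a b m == a + b  whenever a, b <= m
  fplus _ _ 0 = 0
  fplus a b m = if a == 0 then b else fplus (predC a) b (m - 1) + 1

  ftimes :: Int -> Int -> Int -> Int  -- ftimes a b (m*m) == a * b  whenever a, b <= m
  ftimes _ _ 0 = 0
  ftimes a b m = if b == 0 then 0 else fplus (ftimes a (predC b) (m - 1)) a m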
Generally, we have
Theorem. For every n-ary elementary function f we can find a T(;)-term
t_f of type N → · · · → N → N such that, for some k,
a⃗ ≤ m → t_f(a⃗, 2_k(m)) = f(a⃗),
where 2_k denotes the k-fold iterated exponential.
Proof. We proceed as in the theorem in 8.1.2. For arguments a⃗ with
a⃗ ≤ m, any elementary function f(a⃗) is computable by a register ma-
chine M (working in unary notation with basic instructions “successor”,
“predecessor”, “transfer” and “jump”) within a number of steps bounded
by 2_k(m) for some fixed k. Let r_1(n), r_2(n), …, r_l(n) be the values held
in its registers at step n of the computation, and let i(n) be the number
of the machine instruction to be performed next. Each of these functions
depends also on the input parameter m, but we suppress mention of this
for brevity. The state of the computation i, r1 , r2 , . . . , rl at step n + 1 is
obtained from the state at step n by performing the atomic act dictated by
the instruction i(n). Thus the values of i, r1 , . . . , rl at step n + 1 can be
defined from their values at step n by a simultaneous recursive definition
involving only the successor, predecessor and definitions by cases. The
terms representing this will be of the (simultaneous) form
t_j := λa⃗,n. R_N^j n 0 a⃗ (λn,p⃗. C_0)(λn,p⃗. C_1) … (λn,p⃗. C_l),
where j ≤ l, the initial values of i, r_1, r_2, …, r_l are 0 and a⃗ (padded with
zeros), and C_0, C_1, …, C_l are terms which predict the next values of
i, r_1, …, r_l given their previous ones p⃗. The required term t_f will then
be t_1, assuming r_1 is the output register.
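For illustration, here is a Haskell sketch (ours) of the simulation idea: a machine state is advanced by a step function determined by the instruction list, and running n steps is exactly an iteration of that function, which is what the simultaneous recursion above expresses. The instruction set and the list-based register file are our simplifications.

  -- Sketch: a register machine state is (next instruction, registers).
  data Instr = Succ Int Int        -- increment register, then goto
             | Pred Int Int        -- decrement register (cut off at 0), then goto
             | Transfer Int Int Int  -- copy one register to another, then goto
             | Jump Int Int Int    -- if register is 0 goto first label, else second

  step :: [Instr] -> (Int, [Int]) -> (Int, [Int])
  step prog (i, rs)
    | i >= length prog = (i, rs)   -- halted: state stays fixed
    | otherwise = case prog !! i of
        Succ r j       -> (j, upd r (+ 1) rs)
        Pred r j       -> (j, upd r (\x -> max 0 (x - 1)) rs)
        Transfer r s j -> (j, upd s (const (rs !! r)) rs)
        Jump r j k     -> (if rs !! r == 0 then j else k, rs)
    where
      upd k f xs = [if n == k then f x else x | (n, x) <- zip [0 ..] xs]

  -- run prog st n: the analogue of the simultaneous recursion t_j above
  run :: [Instr] -> (Int, [Int]) -> Int -> (Int, [Int])
  run _    st 0 = st
  run prog st n = step prog (run prog st (n - 1))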
8.2.4. Definable functions are elementary. We give an elementary upper
bound on the complexity of functions definable in T(;). This will be
achieved by a careful analysis of the normalization process. Since the
complexity of β-normalization is well-known to be elementary, we can
treat it separately from the elimination of the recursion operator.
Recall the conversion rules (3) for the recursion operator R and (4) for
the cases operator C. In addition we need the β-conversion rule (1), which
we will employ in a slightly generalized form; see below. The η-conversion
rule (2) is not needed, since we are interested in the computation of nu-
merals only. In fact, we can assume that all recursion and cases operators
are “η-expanded”, i.e., appear with sufficiently many arguments for the
conversion rules to apply: if not, apply them to sufficiently many new
variables of the appropriate types and abstract these in front. This η-
expansion process clearly does not change the intended meaning of the
term. Notice that the property of a term to have η-expanded recursion
and cases operators only is preserved under the conversion rules (since
η-conversion is left out).
The size (or length) ||r|| of a term r is the number of occurrences of
constructors, variables and constants in r: ||x|| = ||C|| = 1, ||λn. r|| =
||λa. r|| = ||r|| + 1, and ||rs|| = ||r|| + ||s|| + 1.
Let us first consider β-normalization. Here the distinction between
input and output variables and our two type formers plays no role. It
will be convenient to allow generalized β-conversion:
(λx⃗,x. r(x⃗, x)) s⃗ s → (λx⃗. r(x⃗, s)) s⃗.
β-redexes are instances of the left side of the β-conversion rule. A term is
said to be in β-normal form if it does not contain a β-redex.
We want to show that every term reduces to a β-normal form. This can
be seen easily if we follow a certain order in our conversions. To define
this order we have to make use of the fact that all our terms have types.
A β-redex (λx⃗,x. r(x⃗, x)) s⃗ s is also called a cut, with cut-type the type
of its abstraction λx⃗,x. r(x⃗, x). By the level of a cut we mean the level of
its cut-type. The cut-rank of a term r is the least number bigger than the
levels of all cuts in r. Now let t be a term of cut-rank k + 1. Pick a cut
(λx⃗,x. r(x⃗, x)) s⃗ s of the maximal level k in t, such that s does not contain
another cut of level k (e.g., pick the rightmost cut of level k). Then it is
easy to see that replacing the picked occurrence of (λx⃗,x. r(x⃗, x)) s⃗ s in t
by (λx⃗. r(x⃗, s)) s⃗ reduces the number of cuts of the maximal level k in t
by 1. Hence
Theorem (β-normalization). We have an algorithm which reduces any
given term to a β-normal form.
We now want to give an estimate of the number of conversion steps our
algorithm takes until it reaches the normal form. The key observation for
this estimate is the obvious fact that replacing one occurrence of
(λx⃗,x. r(x⃗, x)) s⃗ s by (λx⃗. r(x⃗, s)) s⃗
in a given term t at most squares the size of t.
An elementary bound E_k(l) for the number of steps our algorithm
takes to reduce the rank of a given term of size l by k can be derived
inductively, as follows. Let E_0(l) := 0. To obtain E_{k+1}(l), first note that
by induction hypothesis it takes ≤ E_k(l) steps to reduce the rank by k.
The size of the resulting term is ≤ l^{2^n} where n := E_k(l), since any step (i.e.,
β-conversion) at most squares the size. Now to reduce the rank by one
more, we convert, as described above, one by one all cuts of the present
rank, where each such conversion does not produce new cuts of this rank.
Therefore the number of additional steps is bounded by the size l^{2^n}. Hence
the total number of steps to reduce the rank by k + 1 is bounded by
E_k(l) + l^{2^{E_k(l)}} =: E_{k+1}(l).
Theorem (Upper bound for the complexity of β-normalization). The
β-normalization algorithm given in the proof above takes at most E_k(l) steps
to reduce a given term of cut-rank k and size l to normal form, where
E_0(l) := 0,   E_{k+1}(l) := E_k(l) + l^{2^{E_k(l)}}.
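Spelled out as a program, this recursion reads as follows (a Haskell sketch, our names; for each fixed k the bound is an elementary function of l):

  -- Sketch: the step-count bound E_k(l) from the theorem above.
  ek :: Integer -> Integer -> Integer
  ek 0 _ = 0
  ek k l = e + l ^ (2 ^ e)
    where e = ek (k - 1) l   -- first reduce the rank by k - 1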
We now show that we can also eliminate the recursion operator, and
still have an elementary estimate on the time needed.
Lemma (R-elimination). Let t(x⃗) be a β-normal term of safe type.
There is an elementary function E_t such that: if r⃗ are safe type R-free
terms and the free variables of t(r⃗) are output variables of safe type, then
in time E_t(||r⃗||) (with ||r⃗|| := Σ_i ||r_i||) one can compute an R-free term
rf(t; x⃗; r⃗) such that t(r⃗) →* rf(t; x⃗; r⃗).
Proof. Induction on ||t||.
If t(x⃗) has the form x u⃗, then x is an output variable and x, u⃗ have
safe type because t has safe type. If t(x⃗) is of the form D u⃗ with D a
variable or a constant different from R, then each u_i is a safe type term.
Here (in case D is a variable) we need that x⃗ and the free variables of t(r⃗)
are of safe type.
In all of the preceding cases, the free variables of each u_i(r⃗) are output
variables of safe type. Apply the induction hypothesis to obtain u_i* :=
rf(u_i; x⃗; r⃗). Let t* be obtained from t by replacing each u_i by u_i*. Then
t* is R-free. The result is obtained in linear time from u⃗*. This finishes
the lemma in all of these cases.
The only remaining case is when t is an R-clause. The first (input)
argument must be present, because the term has safe type and therefore
cannot be R alone. Recall that we may assume that t is of the form R r u s t⃗
(by η-expansion with safe variables, if necessary). We obtain rf(r; x⃗; r⃗)
in time E_r(||r⃗||) by the induction hypothesis. By assumption t(r⃗) has
free output variables only. Hence r(r⃗) is closed, because the type of R
requires r(r⃗) to be an input term. By β-normalization we obtain the
numeral N := nf(rf(r; x⃗; r⃗)) in further elementary time; we may assume
that E_r(||r⃗||) bounds this computation as well, and hence also bounds N.
Here nf(·) denotes a function on terms which produces the β-normal form.
For the step term s we now consider sa with a new variable a, and
let s′ be its β-normal form. Since s is β-normal, ||s′|| ≤ ||s|| + 1 < ||t||.
Applying the induction hypothesis to s′ we obtain a monotone elementary
bounding function E_{sa}. We compute all s_i := rf(s′; x⃗, a; r⃗, i) (i < N) in
a total time of at most
Σ_{i<N} E_{sa}(||r⃗|| + i) ≤ E_r(||r⃗||) · E_{sa}(||r⃗|| + E_r(||r⃗||)).
Consider u and t⃗. The induction hypothesis gives û := rf(u; x⃗; r⃗) in time
E_u(||r⃗||), and all t̂_i := rf(t_i; x⃗; r⃗) in time Σ_i E_{t_i}(||r⃗||). These terms are
also R-free by induction hypothesis.
Using additional time bounded by a polynomial P in the lengths of
these computed values, we construct the R-free term
rf(R r u s t⃗; x⃗; r⃗) := (s_{N−1} … (s_1(s_0 û)) … ) t̂⃗.
Defining E_t(l) := P(E_u(l) + Σ_i E_{t_i}(l) + E_r(l) · E_{sa}(l + E_r(l))), the total
time used in this case is at most E_t(||r⃗||).
Let the R-rank of a term t be the least number bigger than the levels of
all value types of recursion operators R in t. By the rank of a term we
mean the maximum of its cut-rank and its R-rank. Combining the last
two results now gives the following.
Lemma. For every k there is an elementary function N_k such that every
T(;)-term t of rank ≤ k can be reduced in time N_k(||t||) to R-normal form.
It remains to remove the cases operator C. We may assume that only
C_N occurs.
Lemma (C-elimination). Let t be an R-free closed β-normal term of base
type N. Then in time linear in ||t|| one can reduce t to a numeral.
Proof. If the term does not contain C we are done. Otherwise remove
all occurrences of C, as follows. The term has the form Sr or Crts. Proceed
with r and iterate until we reach Crts where r does not contain C. Then r is
0 or Sr_0. In the first case, convert C0ts to t. In the second case, notice that
s has the form λa. s_0(a). Convert C(Sr_0)t(λa. s_0(a)) first into (λa. s_0(a))r_0
and then into s_0(r_0). Each time we have removed one occurrence of C.
We can now combine our results and state the final theorem.
Theorem (Normalization). Let t be a closed T(;)-term of type N ▷
· · · ▷ N ▷ N (▷ ∈ {→, ⊸}). Then t denotes an elementary function.
Proof. We produce an elementary function F_t such that for all numerals
n⃗ of type N we can compute nf(t n⃗) in time F_t(||n⃗||). Let x⃗ be new
variables such that t x⃗ is of type N. The β-normal form β-nf(t x⃗) of t x⃗
is computed in an amount of time that may be large, but it is still only a
constant with respect to n⃗.
By R-elimination we reduce to an R-free term rf(β-nf(t x⃗); x⃗; n⃗) in
time bounded by an elementary function of ||n⃗||. Since the running time
bounds the size of the produced term, ||rf(β-nf(t x⃗); x⃗; n⃗)|| is also bounded
by this elementary function of ||n⃗||. By a further β-normalization we can
therefore compute
R-nf(t n⃗) = β-nf(rf(β-nf(t x⃗); x⃗; n⃗))
in time elementary in ||n⃗||. Finally, in time linear in the result, we can remove
all occurrences of C and arrive at a numeral (elementarily in n⃗).
8.3. A linear two-sorted variant LT(;) of Gödel’s T
We restrict T(;) to a linear-style term system LT(;). The consequence
is that terms of arbitrary type will now be of polynomial-time complexity.
This work first appeared in Schwichtenberg [2006a].
Recall that in the first example concerning T(;) of a recursion producing
exponential growth, we defined B(n, a) = a + 2^n by the term
B := λn. R_{N→N} n S (λn,p,a. p^{N→N}(pa)).
Crucially, the higher type variable p for the “previous” value appears twice
in the step term. The linearity restriction will forbid this in a fairly brutal
way, by simply requiring that higher type output variables are only allowed
to appear (at most) once in a term. Now the output arrow ρ ⊸ σ (for ρ
not a base type) really is the linear arrow, one of the fundamental
features of “linear logic”.
The term definition will now involve the above linearity constraint.
Moreover, the typing of the recursion operator R needs to be carefully
modified, because we now allow higher types as argument types of →, not
just base types like N. The (higher type) step argument may be used many
times; hence we need an input arrow → after it, not the output arrow ⊸
as before, because the linearity of ⊸ would prevent multiple use. The
type of the recursion operator will thus be
N → τ ⊸ (N → τ ⊸ τ) → τ.
The point is that the typing now ensures that the step term of a recursion is
an input argument. This implies that it cannot contain higher type output
variables, which would be duplicated when the recursion is unfolded.
8.3.1. LT(;)-terms. We extend the usage of arrow types and abstraction
terms from 8.2.1, by allowing higher type input variables as well. We work
with two forms of arrow types and abstraction terms:
ρ → σ with λx̄. r,   as well as   ρ ⊸ σ with λx. r,
and a corresponding syntactic distinction between input and output
(typed) variables, the intuition being that a function of type ρ → σ
may recurse on its argument (if it is of base type) or use it many times (if
it is of higher type). On the other hand, a function of type ρ ⊸ σ is not
allowed to recurse on its argument if it is of base type, and can use it only
once if it is of higher type.
At higher types we shall need a large variety of variable names, and a
clear input/output distinction. A convenient way to achieve this is simply
to use an overbar to signify the input case. Thus x, y, z, . . . will now denote
arbitrary output variables, and x̄, ȳ, z̄, . . . will always be input variables.
Formally, the types are
ρ, σ, τ ::= ι | ρ → σ | ρ ⊸ σ,
with ι a finitary base type. Again, a type is called safe if it does not contain
the input arrow →. The j-th component R_j of a simultaneous recursion
operator now has type
ι_j → σ_0 ▷ σ_1 ▷ · · · ▷ σ_{k−1} ▷ τ_j,
where for each i < k, if the step type σ_i demands a recursive call, then the
arrow ▷ after it must be →, and otherwise it must be the linear ⊸.
The typing of R_j, with its careful choices of → and ⊸, deserves some
comment. The first argument is the one that is recursed on and hence
must be an input term, so the type starts with →. The recursive step
arguments are of higher type and will be used many times when the
recursion operator is unfolded, so in LT(;) they must be input terms as
well. Hence we need a → after such step types.
For the base type N of (unary) natural numbers the type of the recursion
operator R_N now is
N → τ ⊸ (N → τ ⊸ τ) → τ.
The type of the cases operator is as for T(;) (cf. 8.2.1). Also, both the
recursion and cases operators need to be restricted to safe value types τ_j.
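These type conditions are easy to make precise; the following Haskell sketch (ours, with In and Out as our constructor names for the two arrows) checks safety, computes the level of a type as in 6.1.4, and builds the type of R_N for a given safe value type.

  -- Sketch: the two arrows as a datatype, with safety and level.
  data Ty = Base            -- a finitary base type, e.g. N or W
          | In  Ty Ty       -- input arrow  rho -> sigma
          | Out Ty Ty       -- output (linear) arrow rho -o sigma

  safe :: Ty -> Bool        -- safe = no input arrow anywhere
  safe Base      = True
  safe (In _ _)  = False
  safe (Out a b) = safe a && safe b

  level :: Ty -> Int        -- level of a type, as in 6.1.4
  level Base      = 0
  level (In a b)  = max (level a + 1) (level b)
  level (Out a b) = max (level a + 1) (level b)

  -- Type of R_N with safe value type t:  N -> t -o (N -> t -o t) -> t.
  -- The whole type is not itself safe; only the value type t must be.
  recTy :: Ty -> Ty
  recTy t = In Base (Out t (In (In Base (Out t t)) t))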
Terms are built from these constants and typed variables x̄ (input
variables) and x (output variables) by introduction and elimination rules
for the two type forms ρ → σ and ρ ⊸ σ, i.e.,
x̄^ρ | x^ρ | C^ρ (constant) |
(λx̄^ρ. r^σ)^{ρ→σ} | (r^{ρ→σ} s^ρ)^σ (s an input term) |
(λx^ρ. r^σ)^{ρ⊸σ} | (r^{ρ⊸σ} s^ρ)^σ (higher type output variables in r, s distinct),
where again a term s is called an input term if all its free variables are
input variables. The restriction on output variables in the formation of
an application r^{ρ⊸σ} s ensures that every higher type output variable can
occur at most once in a given LT(;)-term.
Again a function f is called definable in LT(;) if there is a closed term
t_f : N ▷ · · · ▷ N ▷ N (▷ ∈ {→, ⊸}) denoting this function.
8.3.2. Examples. We now look at some examples intended to explain
what can be done in LT(;), and in particular how our restrictions on the
formation of types and terms make it impossible to obtain exponential
growth. However, for definiteness we first have to say precisely what we
mean by a numeral, this time a binary one.
Terms of the form r_1 :: (r_2 :: … (r_n :: nil) … ) are called lists; we
concentrate on lists of booleans. Let W := L(B), and
1 := nil_B,   S_0 := λv. (ff :: v^W),   S_1 := λv. (tt :: v^W).
Particular lists are S_{i_1}(… (S_{i_n} 1) … ), called binary numerals (or words),
denoted by v, w, … .
Polynomials. It is easy to define ⊕ : W → W ⊸ W such that v ⊕ w
concatenates ||v|| bits onto w:
1 ⊕ w = S_0 w,   (S_i v) ⊕ w = S_0(v ⊕ w).
The representing term is
v̄ ⊕ w := R_{W⊸W} v̄ S_0 (λb,v,p,w. S_0(p^{W⊸W} w)) w.
Similarly we define ⊙ : W → W → W such that v ⊙ w has output
length ||v|| · ||w||:
v ⊙ 1 = v,   v ⊙ (S_i w) = v ⊕ (v ⊙ w).
The representing term is v̄ ⊙ w̄ := R_W w̄ v̄ (λb,w,p. v̄ ⊕ p).
Note that the typing ⊕ : W → W ⊸ W is crucial: it allows using
the output variable p in the definition of ⊙. If we try to go on and
define exponentiation from multiplication, just as ⊙ was defined from
⊕, we find that we cannot proceed, because of the different typing
⊙ : W → W → W.
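A Haskell sketch (ours) of words and these two functions, with W modelled as a list of booleans and oplus, otimes as our names for ⊕ and ⊙:

  -- Sketch: binary words as bit lists; the numeral 1 is nil.
  type W = [Bool]

  one :: W
  one = []

  s0, s1 :: W -> W
  s0 v = False : v
  s1 v = True  : v

  oplus :: W -> W -> W          -- prepends one 0-bit to w per layer of v
  oplus []      w = s0 w
  oplus (_ : v) w = s0 (oplus v w)

  otimes :: W -> W -> W         -- recursion on w, with step v `oplus` _
  otimes v []      = v
  otimes v (_ : w) = oplus v (otimes v w)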
Two recursions. Consider
D(1) := S_0(1),   D(S_i(w)) := S_0(S_0(D(w))),
E(1) := 1,   E(S_i(w)) := D(E(w)).
The corresponding terms are
D := λw̄. R_W w̄ (S_0 1) (λb,w,p. S_0(S_0 p)),   E := λw̄. R_W w̄ 1 (λb,w,p. Dp).
Here D is legal, but E is not: the application Dp is not allowed, since D
expects an input argument, whereas p is an output variable.
Recursion with parameter substitution. Consider
E(1, v) := S_0(v),   E(S_i(w), v) := E(w, E(w, v)),
or equivalently
E(1) := S_0,   E(S_i(w)) := E(w) ∘ E(w).
The corresponding term
λw̄. R_{W⊸W} w̄ S_0 (λb,w,p,v. p^{W⊸W}(pv))
does not satisfy the linearity condition: the higher type variable p occurs
twice, and the typing of R requires p to be an output variable.
Higher argument types. Recall the definition of iteration I(n, f) = f^n
in 8.2.2:
I(0, f, w) := w,   I(n + 1, f, w) := I(n, f, f(w)),
or equivalently
I(0, f) := id,   I(n + 1, f) := I(n, f) ∘ f.
It can be defined by a term with f a parameter of type W ⊸ W:
I_f := λn. R_{W⊸W} n (λw. w) (λn,p,w. p^{W⊸W}(fw)).
In LT(;), f must be an input variable, because the step argument of
a recursion is by definition an input argument. Thus λf. I_f may only
be applied to input terms of type W ⊸ W. This severely restricts the
applicability of I, and raises a crucial point. The fact is that we cannot
define the exponential function by
λn. R_{W⊸W} n S_0 (λn,p. I_p 2),
since on the one hand the step type requires p to be an output variable,
whereas on the other hand I_p is only correctly formed if p is an input
variable.
8.3.3. Polynomial-time functions are LT(;)-definable. We show that the
functions definable in LT(;) are exactly the polynomial-time computable
ones. Recall that for this result to hold it is important that we work with
the binary representation W of the natural numbers. As in 8.2.3 we can
prove
Theorem. For every k-ary polynomial-time computable function f we
can find an LT(;)-term t_f of type W^(k) → W → W such that, for some
polynomial p,
||a⃗|| ≤ m → t_f(a⃗, p(m)) = f(a⃗).
Proof. We analyse successive state transitions of a register machine M
computing f, this time working in binary notation with the two successors
of W. Otherwise the proof is exactly the same.
Corollary. Each polynomial-time function f can be represented by the
term λn⃗. t_f(n⃗, p(max n⃗)) of type W^(k) → W.
8.3.4. LT(;)-definable functions are polynomial-time. To obtain a poly-
nomial-time upper bound on the complexity of functions definable in
LT(;), we again need a careful analysis of the normalization process. In
contrast to the T(;)-case, -normalization and the elimination of the re-
cursion operator cannot be separated but must be treated simultaneously.
Moreover, it will be helpful not to use register machines as our model
of computation, but another one closer to the lambda-terms we have to
work with. This model will be described as we go along; it is routine to
see that it is equivalent to the register machine model.
A dag is a directed acyclic graph. A parse dag is a structure like a parse
tree but admitting in-degree greater than one. For example, a parse dag
for λx. r has a node containing the binder λx and a pointer to a parse dag
for r. A parse dag for an application rs has a node containing a pair of
pointers, one to a parse dag for r and the other to a parse dag for s.
Terminal nodes are labeled by constants and variables.
The size ||d || of a parse dag d is the number of nodes in it. Starting
at any given node in the parse dag, one obtains a term by a depth-first
traversal; it is the term represented by that node. We may refer to a node
as if it were the term it represents.
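As a concrete data structure, parse dags can be modelled with mutable references (a Haskell sketch, our names; the book's formal model of computation is the one described below, not this particular implementation):

  import Data.IORef

  -- Sketch: parse dag nodes; sharing shows up as in-degree > 1.
  data Node = VarN String              -- terminal: variable or constant
            | LamN String (IORef Node) -- binder \x with a pointer to r
            | AppN (IORef Node) (IORef Node)

  -- Build a dag for ((\x. x) t) t, where the single t-node is pointed
  -- to from two parents and hence has in-degree 2.
  sharedExample :: IO (IORef Node)
  sharedExample = do
    t     <- newIORef (VarN "t")
    body  <- newIORef (VarN "x")
    lam   <- newIORef (LamN "x" body)
    redex <- newIORef (AppN lam t)     -- the beta-redex (\x. x) t
    newIORef (AppN redex t)            -- second pointer to the t-node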
A parse dag is conformal if (i) every node having in-degree greater than
1 is of base type, and (ii) every maximal (that is, non-extensible) path to
a bound variable x passes through the same binding λx-node.
A parse dag is h-affine if every higher type variable occurs at most once
in the dag.
We adopt a model of computation over parse dags in which operations
such as the following can be performed in unit time: creation of a node
given its label and pointers to the sub-dags; deletion of a node; obtaining a
pointer to one of the subsidiary nodes given a pointer to an interior node;
conditional test on the type of node or on the constant or variable in
the node. Concerning computation over terms (including numerals), we
use the same model and identify each term with its parse tree. Although
not all parse dags are conformal, every term is conformal (assuming a
relabeling of bound variables).
A term is called simple if it contains no higher type input variables.
Obviously simple terms are closed under reductions, taking of subterms,
and applications. Every simple LT(;)-term is h-affine, due to the linearity
of higher type output variables.
Lemma (Simplicity). Let t be a base type term whose free variables are
of base type. Then nf(t) contains no higher type input variables.
Proof. Suppose a variable x̄^ρ with lev(ρ) > 0 occurs in nf(t). It must
be bound in a subterm (λx̄^ρ. r)^{ρ→σ} of nf(t). By the well-known subtype
property of normal terms, the type ρ → σ either occurs positively in the
type of nf(t), or else negatively in the type of one of the constants or free
variables of nf(t). The former is impossible since t is of base type, and
the latter by inspection of the types of the constants.
Lemma (Sharing normalization). Let t be an R-free simple term. Then
a parse dag for nf(t), of size at most ||t||, can be computed from t in time
O(||t||2 ).
Proof. Under our model of computation, the input t is a parse tree.
Since t is simple, it is an h-affine conformal parse dag of size at most ||t||. If
there are no nodes which represent a redex, then we are done. Otherwise,
locate a node representing a redex; this takes time at most O(||t||). We
show how to update the dag in time O(||t||) so that the size of the dag
has strictly decreased and the redex has been eliminated, while preserving
conformality. Thus, after at most ||t|| iterations the resulting dag represents
the normal-form term nf(t). The total time therefore is O(||t||2 ).
Assume first that the redex in t is (λx. r)s with x of base type (see Fig-
ure 1, where ◦ is a node with in-degree at most one and • is an arbitrary
node); the argument is similar for an input variable x̄. Replace pointers to
x in r by pointers to s. Since s does not contain x, no cycles are created.
Delete the λx-node and the root node for (λx. r)s which points to it. By
conformality (i) no other node points to the λx-node. Update any node
which pointed to the deleted node for (λx. r)s, so that it now points to the
revised r-subdag. This completes the β-reduction on the dag (one may
also delete the x-nodes). Conformality (ii) gives that the updated dag
represents a term t′ such that t → t′.
One can verify that the resulting parse dag is conformal and h-affine,
with conformality (i) following from the fact that s has base type.
Figure 1. Redex (λx. r)s with x of base type.
If the redex in t is (λx. r)s with x of higher type (see Figure 2), then x
occurs at most once in r because the parse dag is h-affine. By confor-
mality (i) there is at most one pointer to that occurrence of x. Update it to
point to s instead, deleting the x-node. As in the preceding case, delete the
λx-node and the (λx. r)s-node pointing to it, and update other nodes to
point to the revised r. Again by conformality (ii) the updated dag represents
a term t′ such that t → t′. Conformality and acyclicity are preserved,
observing this time that conformality (i) follows because there is at most
one pointer to s.
Figure 2. Redex (λx. r)s with x of higher type.
The remaining reductions are for the constant symbols. We only need
to consider the case C(r :: l)ts → srl, with the value type possibly a base
type; see Figure 3.
Figure 3. C(r :: l)ts → srl.
Corollary (Base normalization). Let t be a closed R-free simple term
of type W. Then the binary numeral nf(t) can be computed from t in time
O(||t||2 ), and ||nf(t)|| ≤ ||t||.
Proof. By the sharing normalization lemma we obtain a parse dag
for nf(t) of size at most ||t||, in time O(||t||2 ). Since nf(t) is a binary
numeral, there is only one possible parse dag for it—namely, the parse
tree of the numeral. This is identified with the numeral itself in our model
of computation.
Lemma (R-elimination). Let t(x⃗) be a simple term of safe type. There
is a polynomial P_t such that: if r⃗ are safe type R-free closed simple terms
and the free variables of t(r⃗) are output variables of safe type, then in
time P_t(||r⃗||) one can compute an R-free simple term rf(t; x⃗; r⃗) such that
t(r⃗) →* rf(t; x⃗; r⃗).
Proof. By induction on ||t||.
If t(x⃗) has the form z u⃗, then z is an output variable and z, u⃗ have safe
type because t has safe type. If t(x⃗) is of the form D u⃗ with D a variable
or a constant different from R, then each u_i is a safe type term. Here (in
case D is a variable x_i) we need that x_i is of safe type. In all of these cases,
each u_i(r⃗) has only free output variables of safe type. Apply the induction
hypothesis as required to simple terms u_i to obtain u_i* := rf(u_i; x⃗; r⃗); so
each u_i* is R-free. Let t* be obtained from t by replacing each u_i by u_i*.
Then t* is an R-free simple term; here we need that the r⃗ are closed, to
avoid duplication of variables. The result is obtained in linear time from
u⃗*. This finishes the lemma in all of these cases.
If t is (λy. r)s u⃗ with an output variable y of base type, apply the induction
hypothesis to yield (r u⃗)* := rf(r u⃗; x⃗; r⃗) and s* := rf(s; x⃗; r⃗). Redirect
the pointers to y in (r u⃗)* to point to s* instead. If t is (λȳ. r)s u⃗ with
an input variable ȳ of base type, apply the induction hypothesis to yield
s* := rf(s; x⃗; r⃗). Note that s* is closed, since s is an input term and
the free variables of s(r⃗) are output variables. Then apply the induction
hypothesis again to obtain rf(r u⃗; x⃗, ȳ; r⃗, s*). The total time is at most
Q(||t||) + P_s(||r⃗||) + P_{ru⃗}(||r⃗|| + P_s(||r⃗||)), where Q(||t||) is a linear function
bounding the time it takes to construct r u⃗ from t = (λy. r)s u⃗.
If t is (λy. r(y))s u⃗ with y of higher type, then y can occur at most once
in r, because t is simple. Thus ||r(s) u⃗|| < ||(λy. r)s u⃗||, and hence we may
apply the induction hypothesis to obtain rf(r(s) u⃗; x⃗; r⃗). Note that the
time is bounded by Q(||t||) + P_{r(s)u⃗}(||r⃗||) for a degree-1 polynomial Q, since
it takes at most linear time to make the at-most-one substitution in the
parse tree.
The only remaining case is when the term is an R-clause. Then it is of
the form R l u s t⃗, because the term has safe type, so that in particular the
step argument s must be present. Since l is an input term, all free variables
of l are input variables; they must be in x⃗, since the free variables of
(R l u s t⃗)[x⃗ := r⃗] are output variables. Therefore l(r⃗) is closed, implying
that nf(l(r⃗)) is a list. One obtains rf(l; x⃗; r⃗) in time P_l(||r⃗||) by the
induction hypothesis. Then by base normalization one obtains the list
l̂ := nf(rf(l; x⃗; r⃗)) in further polynomial time. Let l̂ = b_0 :: (b_1 :: …
(b_{N−1} :: nil) … ) and let l_i, i < N, be obtained from l̂ by omitting the
initial elements b_0, …, b_i. Thus all {b_i, l_i | i < N} are obtained in a total
time bounded by some polynomial P_l(||r⃗||).
Now consider szy with new variables z and y^{L(B)}. Applying the in-
duction hypothesis to szy one obtains a monotone bounding polynomial
P_{szy}. One computes all s_i := rf(szy; x⃗, z, y; r⃗, b_i, l_i) in a total time of at
most
Σ_{i<N} P_{szy}(||b_i|| + ||l_i|| + ||r⃗||) ≤ P_l(||r⃗||) · P_{szy}(2 P_l(||r⃗||) + ||r⃗||).
Each si is R-free by the induction hypothesis. Furthermore, no si has a
free output variable: any such variable would also be free in s, contradict-
ing that s is an input term.
Consider u and t⃗. The induction hypothesis gives û := rf(u; x⃗; r⃗) in time
P_u(||r⃗||), and all t̂_i := rf(t_i; x⃗; r⃗) in time Σ_i P_{t_i}(||r⃗||). These terms are
also R-free by induction hypothesis. Clearly u and the t_i do not have any
free (or bound) higher type output variables in common. The same is true
of û and all t̂_i.
Using additional time bounded by a polynomial P in the lengths of
these computed values, one constructs the R-free term
(λx. s_0(s_1 … (s_{N−1} x) … )) û t̂⃗.
Defining P_t(n) := P(P_u(n) + Σ_i P_{t_i}(n) + P_l(n) · P_{szy}(2 P_l(n) + n)), the
total time used in this case is at most P_t(||r⃗||). The result is a term because
û and the t̂_i are terms which do not have any free higher type output
variable in common, and because the s_i do not have any free higher type
output variables at all.
Theorem (Normalization). Let t be a closed LT(;)-term of type W ▷
· · · ▷ W ▷ W (▷ ∈ {→, ⊸}). Then t denotes a poly-time function.
Proof. One must find a polynomial Q_t such that for all R-free simple
closed terms n⃗ of the argument types one can compute nf(t n⃗) in time
Q_t(||n⃗||). Let x⃗ be new variables of these types. The normal form of t x⃗
is computed in an amount of time that may be large, but it is still only a
constant with respect to n⃗. By the simplicity lemma nf(t x⃗) is simple. By
R-elimination one reduces to an R-free simple term rf(nf(t x⃗); x⃗; n⃗) in
time P_t(||n⃗||). Since the running time bounds the size of the produced
term, ||rf(nf(t x⃗); x⃗; n⃗)|| ≤ P_t(||n⃗||). By sharing normalization one can
compute
nf(t n⃗) = nf(rf(nf(t x⃗); x⃗; n⃗))
in time O(P_t(||n⃗||)²), so for Q_t one can take a suitable constant multiple of
P_t(||n⃗||)².
8.3.5. The first-order fragment T_1(;) of T(;). Let T_1(;) be the fragment
of T(;) where recursion and cases operators have base value types only. It
will turn out that, similarly to the restriction of EA(;) to Σ_1-induction,
we can characterize polynomial-time complexity this way. The proof
is a simplification of the argument above. A term is called first-order
if it contains no higher type variables. Obviously first-order terms are
simple, and they are closed under reductions, taking of subterms, and
applications.
Lemma (R-elimination for T_1(;)). Let t(x⃗) be a first-order term of safe
type. There is a polynomial P_t such that: if r⃗ are R-free closed first-
order terms and the free variables of t(r⃗) are output variables, then in time
P_t(||r⃗||) one can compute an R-free first-order term rf(t; x⃗; r⃗) such that
t(r⃗) →* rf(t; x⃗; r⃗).
Proof. By induction on ||t||.
If t(x⃗) has the form z u⃗, then z is an output variable because t has safe
type. If t(x⃗) is of the form D u⃗ with D a variable or a constant different
from R, then each u_i is a safe type first-order term.
In all of the preceding cases, each u_i(r⃗) has free output variables only.
Apply the induction hypothesis as required to first-order safe type terms
u_i to obtain u_i* := rf(u_i; x⃗; r⃗); so each u_i* is R-free. Let t* be obtained
from t by replacing each u_i by u_i*. Then t* is an R-free first-order term.
The result is obtained in linear time from u⃗*. This finishes the lemma in
all of these cases.
If t is (λy. r)s u⃗ with an output variable y, apply the induction hypothesis
to yield (r u⃗)* := rf(r u⃗; x⃗; r⃗) and s* := rf(s; x⃗; r⃗). Redirect the pointers
to y in (r u⃗)* to point to s* instead. If t is (λȳ. r)s u⃗ with an input variable
ȳ, apply the induction hypothesis to yield s* := rf(s; x⃗; r⃗). Note that
s* is closed, since s is an input term and the free variables of s(r⃗) are
output variables. Then apply the induction hypothesis again to obtain
rf(r u⃗; x⃗, ȳ; r⃗, s*). The total time is at most Q(||t||) + P_s(||r⃗||) + P_{ru⃗}(||r⃗|| +
P_s(||r⃗||)), as it takes at most linear time to construct r u⃗ from (λy. r)s u⃗.
The only remaining case is when the term is an R-clause of the form R l u s t⃗.
Since l is an input term, all free variables of l are input variables; they
must be in x⃗, since the free variables of (R l u s t⃗)[x⃗ := r⃗] are output variables.
Therefore l(r⃗) is closed, implying that nf(l(r⃗)) is a list. One obtains rf(l; x⃗; r⃗)
in time P_l(||r⃗||) by the induction hypothesis. Then by base normalization
one obtains the list l̂ := nf(rf(l; x⃗; r⃗)) in further polynomial time. Let
l̂ = b_0 :: (b_1 :: … (b_{N−1} :: nil) … ) and let l_i, i < N, be obtained from
l̂ by omitting the initial elements b_0, …, b_i. Thus all {b_i, l_i | i < N} are
obtained in a total time bounded by P_l(||r⃗||) for a polynomial P_l.
Now consider szy with new variables z and y^{L(B)}. Applying the in-
duction hypothesis to szy one obtains a monotone bounding polynomial
P_{szy}. One computes all s_i := rf(szy; x⃗, z, y; r⃗, b_i, l_i) in a total time of at
most
Σ_{i<N} P_{szy}(||b_i|| + ||l_i|| + ||r⃗||) ≤ P_l(||r⃗||) · P_{szy}(2 P_l(||r⃗||) + ||r⃗||).
Each s_i is R-free by the induction hypothesis. Furthermore, no s_i has a
free output variable: any such variable would also be free in s, contradicting
that s is an input term.
Consider u and t⃗. The induction hypothesis gives û := rf(u; x⃗; r⃗) in time
P_u(||r⃗||), and all t̂_i := rf(t_i; x⃗; r⃗) in time Σ_i P_{t_i}(||r⃗||). These terms are
R-free by induction hypothesis.
Using additional time bounded by a polynomial P in the lengths of
these computed values, one constructs the R-free term
(λx. s_0(s_1 … (s_{N−1} x) … )) û t̂⃗.
Defining P_t(n) := P(P_u(n) + Σ_i P_{t_i}(n) + P_l(n) · P_{szy}(2 P_l(n) + n)), the
total time used in this case is at most P_t(||r⃗||).
Theorem (Normalization). Let t be a closed T_1(;)-term of type W ▷
· · · ▷ W ▷ W (▷ ∈ {→, ⊸}). Then t denotes a poly-time function.
Proof. One must find a polynomial Q_t such that for all numerals n⃗ one
can compute nf(t n⃗) in time Q_t(||n⃗||). Let x⃗ be new variables of type W.
The normal form of t x⃗ is computed in an amount of time that may be
large, but it is still only a constant with respect to n⃗.
nf(t x⃗) clearly is a first-order term, since the R- and C-operators have
base value types only. By R-elimination one reduces to an R-free first-
order term rf(nf(t x⃗); x⃗; n⃗) in time P_t(||n⃗||). Since the running time bounds
the size of the produced term, ||rf(nf(t x⃗); x⃗; n⃗)|| ≤ P_t(||n⃗||).
By sharing normalization one can compute
nf(t n⃗) = nf(rf(nf(t x⃗); x⃗; n⃗))
in time O(P_t(||n⃗||)²). Let Q_t be the polynomial referred to by the big-O
notation.
8.4. Two-sorted systems A(;), LA(;)
Using the fundamental Curry–Howard correspondence, we now trans-
fer the term systems T(;) and LT(;) to corresponding logical systems
A(;) and LA(;) of arithmetic. As a consequence, LA(;) and also the
Σ1 -fragment of A(;) will automatically yield polynomial-time extracts.
The goal is to ensure, by some annotations to proofs, that the extract
of a proof is a term, in LT(;) or T(;), with polynomial complexity. The
annotations are such that if we ignore them, then the resulting proof
is a correct one, in ordinary arithmetic. Of course, we could also first
extract a term in T and then annotate this term to obtain a term in
LT(;). However, the whole point of the present approach is to work with
proofs rather than terms. An additional benefit of annotating proofs is
that when interactively developing such a proof and finally checking its
correctness w.r.t. input/output annotations, one can provide informative
error messages. More precisely, the annotations consist in distinguishing
• two type arrows, → and ⊸,
• two sorts of variables, input ones x̄ and output ones x, and
• two implications, A → B and A ⊸ B.
Implication A → B is the “input” one, involving restrictions on the proofs
of its premise: such proofs are only allowed to use input assumptions or
input variables. In contrast, A ⊸ B is the “output” implication, which
allows at most one use of its hypothesis in case its type is not a base type.
8.4.1. Motivation. To motivate our annotations let us look at some
examples of arithmetical existence proofs exhibiting exponential growth.
Double use of assumptions. Consider
E(1, y) := S_0(y),   E(S_i(x), y) := E(x, E(x, y)),
or equivalently
E(1) := S_0,   E(S_i(x)) := E(x) ∘ E(x).
Then E(x) = S_0^{(2^{||x||−1})}, i.e., E grows exponentially. Here is a correspond-
ing existence proof. We have to show
∀x,y ∃v (||v|| = 2^{||x||−1} + ||y||).
Proof. By induction on x. The base case is obvious. For the step
let x be given and assume (induction hypothesis) ∀y ∃v (||v|| = 2^{||x||−1} +
||y||). We must show ∀y ∃w (||w|| = 2^{||x||} + ||y||). Given y, construct w by
using (induction hypothesis) with y to find v, and then using (induction
hypothesis) again, this time with v, to find w.
The double use of the (“functional”) induction hypothesis clearly is
responsible for the exponential growth. Our linearity restriction on output
implications will exclude such proofs.
Substitution in function parameters. Consider the iteration functional
I(x, f) = f^{(||x||−1)}; it is considered feasible in our setting. However, sub-
stituting the easily definable doubling function D satisfying ||D(x)|| = 2||x||
yields the exponential function I(x, D) = D^{(||x||−1)}. The corresponding
proofs of
∀x (∀y ∃z (||z|| = 2||y||) → ∀y ∃v (||v|| = 2^{||x||−1} + ||y||)),   (1)
∀y ∃z (||z|| = 2||y||)   (2)
are unproblematic, but to avoid explosion we need to forbid applying a
cut here.
Our solution is to introduce a ramification concept. (2) is proved by
induction on y, hence needs a quantifier on an input variable: ∀ȳ ∃z (||z|| =
2||ȳ||). We exclude applicability of a cut by our ramification condition,
which requires that the “kernel” of (1), to be proved by induction on
x, is safe and hence does not contain such universal subformulas proved
by induction.
Iterated induction. It might seem that our restrictions are so tight that
they rule out any form of nested induction. However, this is not true. One
can define, e.g., (a form of) multiplication on top of addition. First one
proves ∀x̄ ∀y ∃z (||z|| = ||x̄|| + ||y||) by induction on x̄, and then ∀ȳ ∃z (||z|| =
||x̄|| · ||ȳ||) by induction on ȳ with a parameter x̄.
8.4.2. LA(;)-proof terms. We assume a given set of inductively defined
predicates I, as in 7.2. Recall that each predicate I is of a fixed arity
(“arity” here means not just the number of arguments, but also covers the
types of the arguments). When writing I(r⃗) we implicitly assume correct
length and types of r⃗. LA(;)-formulas (formulas for short) A, B, … are
I(r⃗) | A → B | A ⊸ B | ∀x̄ A | ∀x A.
In I(r⃗), the r⃗ are terms from T(;). Define falsity F by tt = ff and negation
¬A by A → F.
We adapt the assignment in 7.2.4 of a type τ(A) to a formula A to LA(;)-
formulas. Again it is convenient to extend the use of → and ⊸
to the nulltype symbol ◦: for ▷ ∈ {→, ⊸},
(ρ ▷ ◦) := ◦,   (◦ ▷ σ) := σ,   (◦ ▷ ◦) := ◦.
With this understanding we can simply write
τ(I(r⃗)) := ◦ if I does not require witnesses, and τ(I(r⃗)) := ι_I otherwise,
τ(A → B) := (τ(A) → τ(B)),   τ(∀x̄^ρ A) := (ρ → τ(A)),
τ(A ⊸ B) := (τ(A) ⊸ τ(B)),   τ(∀x^ρ A) := (ρ ⊸ τ(A)).
A formula A is called safe if τ(A) is safe, i.e., →-free. For instance, every
formula without → and without universal quantifiers ∀x̄ over an input
variable x̄ is safe. Recall the definition of the level of a type (in 6.1.4);
types of level 0 are called base types.
The induction axiom for N is
Ind_{n,A} : ∀n (A(0) ⊸ ∀a (A(a) ⊸ A(Sa)) → A(n^N)),
with n an input and a an output variable of type N, and A a safe formula.
It has the type of the recursion operator which will realize it, namely
N → τ ⊸ (N → τ ⊸ τ) → τ, where τ := τ(A) is safe.
The cases axioms are as expected.
By an ordinary proof term we mean a standard proof term built from
axioms, assumption variables and object terms by the usual introduction
and elimination rules for both implications → and ⊸ and both universal
quantifiers (over input and output variables). The construction is as
follows:
c^A (axiom) | ū^A, u^A (input and output assumption variables) |
(λū^A. M^B)^{A→B} | (M^{A→B} N^A)^B | (λu^A. M^B)^{A⊸B} | (M^{A⊸B} N^A)^B |
(λx̄. M^A)^{∀x̄ A} | (M^{∀x̄ A(x̄)} r)^{A(r)} | (λx. M^A)^{∀x A} | (M^{∀x A(x)} r)^{A(r)}.
In the two introduction rules for the universal quantifier we assume the
usual condition on free variables, i.e., that x must not be free in the formula
of any free assumption variable. In the elimination rules for the universal
quantifier, r is a term in T(;) (not necessarily in LT(;)).
If we disregard the difference between input and output variables, and
also between the two implications → and ⊸ and the two type arrows →
and ⊸, then every ordinary proof term becomes a proof term in HA^ω.
Definition (LA(;)-proof term). The proof terms which make up LA(;)
are exactly those whose “extracted terms” (see below) lie in LT(;).
To complete the definition we need to define the extracted term et(M)
of an ordinary proof term M. This definition is an adaptation of the
corresponding one in 7.2.5. We may assume that M derives a formula A
with τ(A) ≠ ◦. Then
et(ū^A) := x̄_ū^{τ(A)},   et(u^A) := x_u^{τ(A)},
et((λū^A. M)^{A→B}) := λx̄_ū^{τ(A)}. et(M),
et((λu^A. M)^{A⊸B}) := λx_u^{τ(A)}. et(M),
et(M^{A→B} N) := et(M^{A⊸B} N) := et(M) et(N),
et((λx̃. M)^{∀x̃ A}) := λx̃. et(M),
et(M^{∀x̃ A} r) := et(M) r,
with x̃ an input or output variable. Extracted terms for the axioms
are defined in the obvious way: constructors for the introductions and
recursion operators for the eliminations, as in 7.2.5.
The LA(;)-proof terms and their corresponding sets CV(M) of compu-
tational variables may alternatively be defined inductively. If τ(A) = ◦, then
every ordinary proof term M^A is an LA(;)-proof term and CV(M) := ∅.
(i) Every assumption constant (axiom) c^A and every input or output
assumption variable ū^A or u^A is an LA(;)-proof term. CV(ū^A) :=
{x̄_ū} and CV(u^A) := {x_u}.
(ii) If M^B is an LA(;)-proof term, then so are (λū^A. M)^{A→B} and
(λu^A. M)^{A⊸B}. CV(λū^A. M) := CV(M) \ {x̄_ū} and CV(λu^A. M) :=
CV(M) \ {x_u}.
(iii) If M^{A→B} and N^A are LA(;)-proof terms, then so is (MN)^B, pro-
vided all variables in CV(N) are input. CV(MN) := CV(M) ∪
CV(N).
(iv) If M^{A⊸B} and N^A are LA(;)-proof terms, then so is (MN)^B, provided
the higher type output variables in CV(M) and CV(N) are disjoint.
CV(MN) := CV(M) ∪ CV(N).
(v) If M^A is an LA(;)-proof term, and x̃ ∉ FV(B) for every formula B of
a free assumption variable in M, then so is (λx̃. M)^{∀x̃ A}. CV(λx̃. M) :=
CV(M) \ {x̃} (x̃ an input or output variable).
(vi) If M^{∀x̄ A(x̄)} is an LA(;)-proof term and r is an input LT(;)-term, then
(Mr)^{A(r)} is an LA(;)-proof term. CV(Mr) := CV(M) ∪ FV(r).
(vii) If M^{∀x A(x)} is an LA(;)-proof term and r is an LT(;)-term, then
(Mr)^{A(r)} is an LA(;)-proof term, provided the higher type output
variables in CV(M) are not free in r. CV(Mr) := CV(M) ∪ FV(r).
It is easy to see that for every LA(;)-proof term M , the set CV(M ) of its
computational variables is the set of variables free in the extracted term
et(M ).
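The clauses above can be prototyped directly. The following Haskell sketch (ours) uses a toy proof-term syntax that ignores formulas, the nulltype ◦ and the input/output distinction; et is the extraction map, and CV(M) is computed as the free variables of et(M).

  import Data.List (delete, nub)

  data Term  = TVar String | TLam String Term | TApp Term Term
  data Proof = Ax String Term       -- axiom together with its realizing term
             | Asm String           -- assumption variable u (or u-bar)
             | ImpI String Proof    -- (\u M)^{A -> B}, either implication
             | ImpE Proof Proof
             | AllI String Proof    -- (\x M)^{forall x A}
             | AllE Proof Term

  et :: Proof -> Term               -- the extracted term et(M)
  et (Ax _ t)   = t
  et (Asm u)    = TVar u            -- the variable x_u
  et (ImpI u m) = TLam u (et m)
  et (ImpE m n) = TApp (et m) (et n)
  et (AllI x m) = TLam x (et m)
  et (AllE m r) = TApp (et m) r

  fv :: Term -> [String]            -- free variables of a term
  fv (TVar x)   = [x]
  fv (TLam x r) = delete x (nub (fv r))
  fv (TApp r s) = nub (fv r ++ fv s)

  cv :: Proof -> [String]           -- computational variables CV(M)
  cv = fv . et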
Theorem (Characterization). The LA(;)-proof terms are exactly those
generated by the above clauses.
Proof. We proceed by induction on M, assuming that M is an ordinary
proof term. We can assume τ(A) ≠ ◦, for otherwise the claim is obvious.
Case M^{A⊸B} N^A with τ(A) ≠ ◦. The following are equivalent.
• MN is generated by the clauses.
• M, N are generated by the clauses, and the higher type output
variables in CV(M) and CV(N) are disjoint.
• et(M) and et(N) are LT(;)-terms, and the higher type output vari-
ables in FV(et(M)) and FV(et(N)) are disjoint.
• et(M) et(N) (= et(MN)) is an LT(;)-term.
The other cases are similar.
The natural deduction framework now provides a straightforward for-
malization of proofs in LA(;). This applies, for example, to the proofs
sketched in 8.4.1. Further examples will be given below.
8.4.3. LA(;) and its provably recursive functions. A k-ary numerical
function f is provably recursive in LA(;) if there is a Σ_1-formula C_f(n_1,
…, n_k, a) denoting the graph of f, and a derivation M_f in LA(;) of
∀n_1,…,n_k ∃a C_f(n_1, …, n_k, a).
Here the n_i, a denote input, respectively output, variables of type W.
Theorem. The functions provably recursive in LA(;) are exactly the defi-
nable functions of LT(;) of type W^k → W, which are exactly the functions
computable in polynomial time.
Proof. Let M be a derivation in LA(;) proving a formula of type
W^k → W. Then et(M) belongs to LT(;) and hence denotes a polynomial-
time function which, by the soundness theorem, is f.
Conversely, any polynomial-time function f is represented by an LT(;)-
term, say t(n⃗), and from t(n⃗) = t(n⃗) we deduce ∀n⃗ ∃a (t(n⃗) = a). We
may take t(n⃗) = a to be the formula C_f. Thus f is provably recursive.
8.4.4. A(;)- and Σ1 -A(;)-proof terms. In much the same way as we have
defined LA(;) from LT(;) above, we can define an arithmetical system A(;)
corresponding to T(;). A(;) is just LA(;), but with all linearity restrictions
removed. The analogue of the theorem above is now
8.4. Two-sorted systems A(;), LA(;) 427
Theorem. The functions provably recursive in A(;) are exactly the defi-
nable functions of T(;) of type N^k → N, which are exactly the elementary
functions.
In 8.3.5 we have defined T1 (;) to be the first-order fragment of T(;),
where recursion and cases operators have base type values only. Let
Σ1 -A(;) be the corresponding arithmetical system; that is, the induction
and cases axioms are allowed for formulas A of base type only, which
is the appropriate generalization of Σ1 -formulas in our setting. Σ1 -A(;)
therefore is the Σ1 -fragment of A(;). Then again
Theorem. The functions provably recursive in Σ_1-A(;) are exactly the
definable functions of T_1(;) of type W^k → W, which are exactly the
polynomial-time computable functions.
8.4.5. Application: insertion sort in LA(;). We show that the insertion
sort algorithm is the computational content of an appropriate proof.
To this end we recursively define a function I inserting an element a
into a list l, in the first place where it finds an element bigger:
I(a, nil) := a :: nil,   I(a, b :: l) := [if a ≤ b then a :: b :: l else b :: I(a, l)],
and, using I, a function S sorting a list l into ascending order:
S(nil) := nil,   S(a :: l) := I(a, S(l)).
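Read as ordinary programs, I and S are just insertion sort; a Haskell sketch (ours, with insertI and sortS as our names):

  -- Sketch: the functions I and S as list programs.
  insertI :: Ord a => a -> [a] -> [a]
  insertI a []      = [a]
  insertI a (b : l)
    | a <= b    = a : b : l
    | otherwise = b : insertI a l

  sortS :: Ord a => [a] -> [a]
  sortS []      = []
  sortS (a : l) = insertI a (sortS l)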
These functions need only be presented to the theory by inductive defini-
tions of their graphs. Thus, writing I(a, l, l′) to denote I(a, l) = l′ and,
similarly, S(l, l′) to denote S(l) = l′, we have the following axioms:
I(a, nil, a :: nil),
a ≤ b → I(a, b :: l, a :: b :: l),
b < a → I(a, l, l′) → I(a, b :: l, b :: l′),
S(nil, nil),
S(l, l′) → I(a, l′, l″) → S(a :: l, l″).
We need that the Σ_1-inductive definitions of I and S are admitted in safe
LA(;)-formulas. As an auxiliary function we use tl_i(l), which is the tail of
the list l of length i if i < lh(l), and l otherwise. Its recursion equations
are
tl_i(nil) := nil,   tl_i(a :: l) := [if i ≤ lh(l) then tl_i(l) else a :: l].
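A Haskell sketch (ours) transcribing these equations directly:

  -- Sketch: tli i l is the tail of l of length i (or l itself if i is too big).
  tli :: Int -> [a] -> [a]
  tli _ []      = []
  tli i (a : l)
    | i <= length l = tli i l
    | otherwise     = a : l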
We will need some easy properties of S and tl:
S(l, l′) → lh(l) = lh(l′),
i ≤ lh(l) → tl_i(b :: l) = tl_i(l),
tl_{lh(l)}(l) = l,   tl_0(l) = nil.
We now want to derive S(l)↓ in LA(;), that is, ∃_{l′} S(l, l′). However, we
shall not be able to do this. All we can achieve is, for an arbitrary input
parameter n, lh(l) ≤ n → S(l)↓.
Lemma (Insertion). ∀a,l,n ∀_{i≤n} ∃_{l′} I(a, tl_{min(i,lh(l))}(l), l′).
Proof. We fix a, l and prove the claim by induction on n. In the base
case we can take l′ := a :: nil, using tl_0(l) = nil. For the step we must
show
∀_{i≤n} ∃_{l′} I(a, tl_{min(i,lh(l))}(l), l′) → ∀_{i≤n+1} ∃_{l′} I(a, tl_{min(i,lh(l))}(l), l′).
Assume the premise, and let i ≤ n + 1. If i ≤ n we are done, by the
premise. So let i = n + 1. If lh(l) ≤ n then the premise for i := n
gives ∃_{l′} I(a, tl_{lh(l)}(l), l′), which is our goal. If n + 1 ≤ lh(l) we need to
show ∃_{l′} I(a, tl_{n+1}(l), l′). Observe that tl_{n+1}(l) = b :: tl_n(l) with b :=
hd(tl_{n+1}(l)), because of n + 1 ≤ lh(l). We now use the definition of I. If
a ≤ b, we explicitly have the desired value l′ := a :: b :: tl_n(l). Otherwise
it suffices to know ∃_{l′} I(a, tl_n(l), l′). But this follows from the premise for
i := n.
Using this we now prove
Lemma (Insertion sort). ∀l,n,m (m ≤ n → ∃_{l′} S(tl_{min(m,lh(l))}(l), l′)).
Proof. We fix l, n and prove the claim by induction on m. In the base
case we can take l′ := nil, using tl_0(l) = nil. For the step we must show
(m ≤ n → ∃_{l′} S(tl_{min(m,lh(l))}(l), l′)) → (m + 1 ≤ n →
∃_{l′} S(tl_{min(m+1,lh(l))}(l), l′)).
Assume the premise and m + 1 ≤ n. If lh(l) ≤ m we are done, by
the premise. If m + 1 ≤ lh(l) we need to show ∃_{l′} S(tl_{m+1}(l), l′). Now
tl_{m+1}(l) = a :: tl_m(l) with a := hd(tl_{m+1}(l)), because of m + 1 ≤ lh(l). By
the definition of S it suffices to find l′, l″ such that S(tl_m(l), l′) and I(a, l′, l″).
Pick by the premise an l′ with S(tl_m(l), l′). Further, the insertion lemma
applied to a, l′, n and i := m gives an l″ such that I(a, tl_{min(m,lh(l′))}(l′), l″).
Using lh(l′) = lh(tl_m(l)) = m we have tl_{min(m,lh(l′))}(l′) = l′, and hence
obtain I(a, l′, l″), as desired.
Specializing this to l, n, n we finally obtain
lh(l) ≤ n → ∃_{l′} S(l, l′).
8.5. Notes
The elementary variant T(;) of Gödel’s T developed in 8.2 has many
relatives in the literature.
Beckmann and Weiermann [1996] characterize the elementary func-
tions by means of a restriction of the combinatory logic version of Gödel's
T. The restriction consists in allowing occurrences of the iteration opera-
tor only when immediately applied to a type N argument. For the proof
they use an ordinal assignment due to Howard [1970] and Schütte [1977].
The authors remark (on p. 477) that the methods of their paper can also be
applied to a λ-formulation of T: the restriction on terms then consists in
allowing only iterators of the form I t^N and in disallowing λ-abstractions
λx … I t^N … where x occurs in t^N; however, no details are
given. Moreover, our restrictions are slightly more liberal (input variables
in t can be abstracted), and also the proof method is very different.
Aehlig and Johannsen [2005] characterize the elementary functions by
means of a fragment of Girard’s system F . They make essential use of
the Church-style representation of numbers in F . A somewhat different
approach for characterizing the elementary functions based on a “pred-
icative” setting has been developed by Leivant [1994].
BIBLIOGRAPHY
Andreas Abel and Thorsten Altenkirch
[2000] A predicative strong normalization proof for a λ-calculus with
interleaving inductive types, Types for Proofs and Programs, Lecture Notes
in Computer Science, vol. 1956, Springer Verlag, Berlin, pp. 21– 40.
Samson Abramsky
[1991] Domain theory in logical form, Annals of Pure and Applied Logic,
vol. 51, pp. 1–77.
Samson Abramsky and Achim Jung
[1994] Domain theory, Handbook of Logic in Computer Science
(S. Abramsky, D. M. Gabbay, and T. S. E. Maibaum, editors), vol. 3,
Clarendon Press, pp. 1–168.
Wilhelm Ackermann
[1940] Zur Widerspruchsfreiheit der Zahlentheorie, Mathematische An-
nalen, vol. 117, pp. 162–194.
Peter Aczel, Harold Simmons, and Stanley S. Wainer
[1992] Proof Theory. A selection of papers from the Leeds Proof Theory
Programme 1990, Cambridge University Press.
Klaus Aehlig and Jan Johannsen
[2005] An elementary fragment of second-order lambda calculus, ACM
Transactions on Computational Logic, vol. 6, pp. 468– 480.
Roberto M. Amadio and Pierre-Louis Curien
[1998] Domains and Lambda-Calculi, Cambridge University Press.
Toshiyasu Arai
[1991] A slow growing analogue to Buchholz’ proof, Annals of Pure and
Applied Logic, vol. 54, pp. 101–120.
[2000] Ordinal diagrams for recursively Mahlo universes, Archive for
Mathematical Logic, vol. 39, no. 5, pp. 353–391.
Jeremy Avigad
[2000] Interpreting classical theories in constructive ones, The Journal of
Symbolic Logic, vol. 65, no. 4, pp. 1785–1812.
Jeremy Avigad and Rick Sommer
[1997] A model theoretic approach to ordinal analysis, The Bulletin of
Symbolic Logic, vol. 3, pp. 17–52.
Franco Barbanera and Stefano Berardi
[1993] Extracting constructive content from classical logic via control-like
reductions, Typed Lambda Calculi and Applications (M. Bezem and J. F.
Groote, editors), Lecture Notes in Computer Science, vol. 664, Springer
Verlag, Berlin, pp. 45–59.
Hendrik Pieter Barendregt
[1984] The Lambda Calculus, second ed., North-Holland, Amsterdam.
Henk Barendregt, Mario Coppo, and Mariangiola Dezani-
Ciancaglini
[1983] A filter lambda model and the completeness of type assignment,
The Journal of Symbolic Logic, vol. 48, no. 4, pp. 931–940.
Arnold Beckmann, Chris Pollett, and Samuel R. Buss
[2003] Ordinal notations and well-orderings in bounded arithmetic, An-
nals of Pure and Applied Logic, vol. 120, pp. 197–223.
Arnold Beckmann and Andreas Weiermann
[1996] A term rewriting characterization of the polytime functions and
related complexity classes, Archive for Mathematical Logic, vol. 36, pp. 11–
30.
Lev D. Beklemishev
[2003] Proof-theoretic analysis of iterated reflection, Archive for Mathe-
matical Logic, vol. 42, no. 6, pp. 515–552.
Stephen Bellantoni and Stephen Cook
[1992] A new recursion-theoretic characterization of the polytime func-
tions, Computational Complexity, vol. 2, pp. 97–110.
Stephen Bellantoni and Martin Hofmann
[2002] A new “feasible” arithmetic, The Journal of Symbolic Logic,
vol. 67, no. 1, pp. 104–116.
Ulrich Berger, Wilfried Buchholz, and Helmut Schwichtenberg
[2000] Higher type recursion, ramification and polynomial time, Annals
of Pure and Applied Logic, vol. 104, pp. 17–30.
Holger Benl
[1998] Konstruktive Interpretation induktiver Definitionen, Master’s the-
sis, Mathematisches Institut der Universität München.
Ulrich Berger
[1993a] Program extraction from normalization proofs, Typed Lambda
Calculi and Applications (M. Bezem and J. F. Groote, editors), Lecture
Notes in Computer Science, vol. 664, Springer Verlag, Berlin, pp. 91–106.
[1993b] Total sets and objects in domain theory, Annals of Pure and
Applied Logic, vol. 60, pp. 91–117.
[2005a] Continuous semantics for strong normalization, Proceedings CiE
2005, Lecture Notes in Computer Science, vol. 3526, pp. 23–34.
[2005b] Uniform Heyting arithmetic, Annals of Pure and Applied Logic,
vol. 133, pp. 125–148.
[2009] From coinductive proofs to exact real arithmetic, Computer Sci-
ence Logic (E. Grädel and R. Kahle, editors), Lecture Notes in Computer
Science, Springer Verlag, Berlin, pp. 132–146.
Ulrich Berger, Stefan Berghofer, Pierre Letouzey, and Helmut
Schwichtenberg
[2006] Program extraction from normalization proofs, Studia Logica,
vol. 82, pp. 27–51.
Ulrich Berger, Wilfried Buchholz, and Helmut Schwichtenberg
[2002] Refined program extraction from classical proofs, Annals of Pure
and Applied Logic, vol. 114, pp. 3–25.
Ulrich Berger, Matthias Eberl, and Helmut Schwichtenberg
[2003] Term rewriting for normalization by evaluation, Information and
Computation, vol. 183, pp. 19– 42.
Ulrich Berger and Helmut Schwichtenberg
[1991] An inverse of the evaluation functional for typed λ-calculus, Pro-
ceedings 6’th Symposium on Logic in Computer Science (LICS ’91) (R. Ve-
muri, editor), IEEE Computer Society Press, Los Alamitos, pp. 203–211.
Ulrich Berger, Helmut Schwichtenberg, and Monika Seisenberger
[2001] The Warshall algorithm and Dickson’s lemma: Two examples
of realistic program extraction, Journal of Automated Reasoning, vol. 26,
pp. 205–221.
Evert Willem Beth
[1956] Semantic construction of intuitionistic logic, Mededelingen der
KNAW, N.S., vol. 19, no. 11.
[1959] The Foundations of Mathematics, North-Holland, Amsterdam.
Marc Bezem and Wim Veldman
[1993] Ramsey’s theorem and the pigeonhole principle in intuitionistic
mathematics, Journal of the London Mathematical Society, vol. 47, pp.
193–211.
Frédéric Blanqui, Jean-Pierre Jouannaud, and Mitsuhiro Okada
[1999] The Calculus of Algebraic Constructions, RTA ’99, Lecture Notes
in Computer Science, vol. 1631.
Egon Börger, Erich Grädel, and Yuri Gurevich
[1997] The Classical Decision Problem, Perspectives in Mathematical
Logic, Springer Verlag, Berlin.
Alan Borodin and Robert L. Constable
[1971] Subrecursive programming languages II: on program size, Journal
of Computer and System Sciences, vol. 5, pp. 315–334.
Andrey Bovykin
[2009] Brief introduction to unprovability, Logic colloquium 2006, Lec-
ture Notes in Logic, Association for Symbolic Logic and Cambridge
University Press, pp. 38–64.
Wilfried Buchholz
[1980] Three contributions to the conference on recent advances in proof
theory, Handwritten notes.
[1987] An independence result for Π¹₁-CA+BI, Annals of Pure and Ap-
plied Logic, vol. 33, no. 2, pp. 131–155.
Wilfried Buchholz, Adam Cichon, and Andreas Weiermann
[1994] A uniform approach to fundamental sequences and hierarchies,
Mathematical Logic Quarterly, vol. 40, pp. 273–286.
Wilfried Buchholz, Solomon Feferman, Wolfram Pohlers, and
Wilfried Sieg
[1981] Iterated Inductive Definitions and Subsystems of Analysis: Re-
cent Proof-Theoretical Studies, Lecture Notes in Mathematics, vol. 897,
Springer Verlag, Berlin.
Wilfried Buchholz and Wolfram Pohlers
[1978] Provable wellorderings of formal theories for transfinitely iterated
inductive definitions, The Journal of Symbolic Logic, vol. 43, pp. 118–125.
Wilfried Buchholz and Stanley S. Wainer
[1987] Provably computable functions and the fast growing hierarchy,
Logic and Combinatorics (S. G. Simpson, editor), Contemporary Mathe-
matics, vol. 65, American Mathematical Society, pp. 179–198.
Samuel R. Buss
[1986] Bounded Arithmetic, Studies in Proof Theory, Lecture Notes,
Bibliopolis, Napoli.
[1994] The witness function method and provably recursive functions of
Peano arithmetic, Proceedings of the 9th International Congress of Logic,
Methodology and Philosophy of Science (D. Prawitz, B. Skyrms, and
D. Westerstahl, editors), North-Holland, Amsterdam, pp. 29–68.
[1998a] First order proof theory of arithmetic, Handbook of Proof The-
ory (S. Buss, editor), North-Holland, Amsterdam, pp. 79–147.
[1998b] Handbook of Proof Theory, Studies in Logic and the Founda-
tions of Mathematics, vol. 137, North-Holland, Amsterdam.
N. Çağman, G. E. Ostrin, and S. S. Wainer
[2000] Proof theoretic complexity of low subrecursive classes, Founda-
tions of Secure Computation (F. L. Bauer and R. Steinbrüggen, editors),
NATO Science Series F, vol. 175, IOS Press, pp. 249–285.
Andrea Cantini
[2002] Polytime, combinatory logic and positive safe induction, Archive
for Mathematical Logic, vol. 41, no. 2, pp. 169–189.
Timothy J. Carlson
[2001] Elementary patterns of resemblance, Annals of Pure and Applied
Logic, vol. 108, pp. 19–77.
Victor P. Chernov
[1976] Constructive operators of finite types, Journal of Mathemati-
cal Science, vol. 6, pp. 465–470, translated from Zapiski Nauch. Sem.
Leningrad, vol. 32, pp. 140–147 (1972).
Luca Chiarabini
[2009] Program Development by Proof Transformation, PhD thesis,
Fakultät für Mathematik, Informatik und Statistik der LMU, München.
Alonzo Church
[1936] A note on the Entscheidungsproblem, The Journal of Symbolic
Logic, vol. 1, pp. 40–41, Correction, ibid., pp. 101–102.
Adam Cichon
[1983] A short proof of two recently discovered independence proofs using
recursion theoretic methods, Proceedings of the American Mathematical
Society, vol. 87, pp. 704–706.
John P. Cleave
[1963] A hierarchy of primitive recursive functions, Zeitschrift für Ma-
thematische Logik und Grundlagen der Mathematik, vol. 9, pp. 331–345.
Peter Clote and Gaisi Takeuti
[1995] First order bounded arithmetic and small boolean circuit complex-
ity classes, Feasible Mathematics II (P. Clote and J. Remmel, editors),
Birkhäuser, Boston, pp. 154–218.
Alan Cobham
[1965] The intrinsic computational difficulty of functions, Logic, Method-
ology and Philosophy of Science II (Y. Bar-Hillel, editor), North-Holland,
Amsterdam, pp. 24–30.
Robert L. Constable
[1972] Subrecursive programming languages I: efficiency and program
structure, Journal of the ACM, vol. 19, pp. 526–568.
Robert L. Constable and Chetan Murthy
[1991] Finding computational content in classical proofs, Logical Frame-
works (G. Huet and G. Plotkin, editors), Cambridge University Press,
pp. 341–362.
Stephen A. Cook and Bruce M. Kapron
[1990] Characterizations of the basic feasible functionals of finite type,
Feasible Mathematics (S. Buss and P. Scott, editors), Birkhäuser, pp. 71–
96.
S. Barry Cooper
[2003] Computability Theory, Chapman & Hall/CRC.
Coq Development Team
[2009] The Coq Proof Assistant Reference Manual – Version 8.2, Inria.
Thierry Coquand and Martin Hofmann
[1999] A new method for establishing conservativity of classical systems
over their intuitionistic version, Mathematical Structures in Computer Sci-
ence, vol. 9, pp. 323–333.
Thierry Coquand and Hendrik Persson
[1999] Gröbner bases in type theory, Types for Proofs and Programs
(T. Altenkirch, W. Naraschewski, and B. Reus, editors), Lecture Notes in
Computer Science, vol. 1657, Springer Verlag, Berlin.
Thierry Coquand, Giovanni Sambin, Jan Smith, and Silvio Valentini
[2003] Inductively generated formal topologies, Annals of Pure and Ap-
plied Logic, vol. 124, pp. 71–106.
Thierry Coquand and Arnaud Spiwack
[2006] A proof of strong normalisation using domain theory, Proceedings
LICS 2006, pp. 307–316.
Haskell B. Curry
[1930] Grundlagen der kombinatorischen Logik, American Journal of
Mathematics, vol. 52, pp. 509–536, 789–834.
Nigel J. Cutland
[1980] Computability: An Introduction to Recursive Function Theory,
Cambridge University Press.
Nicolaas G. de Bruijn
[1972] Lambda calculus notation with nameless dummies, a tool for
automatic formula manipulation, with application to the Church–Rosser
theorem, Indagationes Mathematicae, vol. 34, pp. 381–392.
Leonard E. Dickson
[1913] Finiteness of the odd perfect and primitive abundant numbers
with n distinct prime factors, American Journal of Mathematics, vol. 35,
pp. 413–422.
Justus Diller and W. Nahm
[1974] Eine Variante zur Dialectica-Interpretation der Heyting-
Arithmetik endlicher Typen, Archiv für Mathematische Logik und Grund-
lagenforschung, vol. 16, pp. 49–66.
Albert Dragalin
[1979] New kinds of realizability, Abstracts of the 6th International
Congress of Logic, Methodology and Philosophy of Sciences, Hannover,
Germany, pp. 20–24.
Jan Ekman
[1994] Normal Proofs in Set Theory, PhD thesis, Department of Com-
puter Science, University of Göteborg.
Yuri L. Ershov
[1972] Everywhere defined continuous functionals, Algebra i Logika,
vol. 11, no. 6, pp. 656–665.
[1977] Model C of partial continuous functionals, Logic colloquium 1976
(R. Gandy and M. Hyland, editors), North-Holland, Amsterdam, pp.
455–467.
Matthew V. H. Fairtlough and Stanley S. Wainer
[1992] Ordinal complexity of recursive definitions, Information and Com-
putation, vol. 99, pp. 123–153.
[1998] Hierarchies of provably recursive functions, Handbook of Proof
Theory (S. Buss, editor), Studies in Logic and the Foundations of Mathe-
matics, vol. 137, North-Holland, Amsterdam, pp. 149–207.
Solomon Feferman
[1960] Arithmetization of metamathematics in a general setting, Funda-
menta Mathematicae, vol. XLIX, pp. 35–92.
[1962] Classifications of recursive functions by means of hierarchies,
Transactions American Mathematical Society, vol. 104, pp. 101–122.
[1970] Formal theories for transfinite iterations of generalized inductive
definitions and some subsystems of analysis, Intuitionism and Proof Theory
(A. Kino, J. Myhill, and R. E. Vesley, editors), Studies in Logic and the
Foundations of Mathematics, North-Holland, Amsterdam, pp. 303–325.
[1982] Iterated inductive fixed point theories: applications to Han-
cock’s conjecture, Patras Logic Symposion (G. Metakides, editor), North-
Holland, Amsterdam, pp. 171–196.
[1992] Logics for termination and correctness of functional programs,
Logic from Computer Science, Proceedings of a Workshop held November
13–17, 1989 (Y. N. Moschovakis, editor), MSRI Publications, no. 21,
Springer Verlag, Berlin, pp. 95–127.
[1996] Computation on abstract data types. The extensional approach,
with an application to streams, Annals of Pure and Applied Logic, vol. 81,
pp. 75–113.
Solomon Feferman, John W. Dawson et al.
[1986, 1990, 1995, 2002a, 2002b] Kurt Gödel Collected Works, Volumes
I–V, Oxford University Press.
Solomon Feferman and Thomas Strahm
[2000] The unfolding of non-finitist arithmetic, Annals of Pure and Ap-
plied Logic, vol. 104, pp. 75–96.
[2010] Unfolding finitist arithmetic, Review of Symbolic Logic, vol. 3,
pp. 665–689.
Matthias Felleisen, Daniel P. Friedman, E. Kohlbecker, and B. F.
Duba
[1987] A syntactic theory of sequential control, Theoretical Computer
Science, vol. 52, pp. 205–237.
Matthias Felleisen and R. Hieb
[1992] The revised report on the syntactic theory of sequential control
and state, Theoretical Computer Science, vol. 102, pp. 235–271.
Andrzej Filinski
[1999] A semantic account of type-directed partial evaluation, Principles
and Practice of Declarative Programming 1999, Lecture Notes in Com-
puter Science, vol. 1702, Springer Verlag, Berlin, pp. 378–395.
Harvey Friedman
[1970] Iterated inductive definitions and Σ¹₂-AC, Intuitionism and Proof
Theory (A. Kino, J. Myhill, and R. E. Vesley, editors), Studies in Logic and
the Foundations of Mathematics, North-Holland, Amsterdam, pp. 435–
442.
[1978] Classically and intuitionistically provably recursive functions,
Higher Set Theory (D. S. Scott and G. H. Müller, editors), Lecture Notes
in Mathematics, vol. 669, Springer Verlag, Berlin, pp. 21–28.
[1981] Independence results in finite graph theory, Unpublished
manuscripts, Ohio State University, 76 pages.
[1982] Beyond Kruskal’s theorem I–III, Unpublished manuscripts, Ohio
State University, 48 pages.
Harvey Friedman, Neil Robertson, and Paul Seymour
[1987] The metamathematics of the graph minor theorem, Logic and
Combinatorics (S. G. Simpson, editor), Contemporary Mathematics,
vol. 65, American Mathematical Society, pp. 229–261.
Harvey Friedman and Michael Sheard
[1995] Elementary descent recursion and proof theory, Annals of Pure
and Applied Logic, vol. 71, pp. 1–45.
Gerhard Gentzen
[1935] Untersuchungen über das logische Schließen I, II, Mathematische
Zeitschrift, vol. 39, pp. 176–210, 405–431.
[1936] Die Widerspruchsfreiheit der reinen Zahlentheorie, Mathemati-
sche Annalen, vol. 112, pp. 493–565.
[1943] Beweisbarkeit und Unbeweisbarkeit von Anfangsfällen der trans-
finiten Induktion in der reinen Zahlentheorie, Mathematische Annalen,
vol. 119, pp. 140–161.
Philipp Gerhardy and Ulrich Kohlenbach
[2008] General logical metatheorems for functional analysis, Transac-
tions of the American Mathematical Society, vol. 360, pp. 2615–2660.
Jean-Yves Girard
[1971] Une extension de l’interprétation de Gödel à l’analyse, et son
application à l’élimination des coupures dans l’analyse et la théorie des
types, Proceedings of the Second Scandinavian Logic Symposium (J. E.
Fenstad, editor), North-Holland, Amsterdam, pp. 63–92.
[1981] Π¹₂-logic. Part I: Dilators, Annals of Mathematical Logic, vol. 21,
pp. 75–219.
[1987] Proof Theory and Logical Complexity, Bibliopolis, Napoli.
[1998] Light linear logic, Information and Computation, vol. 143, pp.
175–204.
Kurt Gödel
[1931] Über formal unentscheidbare Sätze der Principia Mathematica
und verwandter Systeme I, Monatshefte für Mathematik und Physik, vol.
38, pp. 173–198.
[1958] Über eine bisher noch nicht benützte Erweiterung des finiten
Standpunkts, Dialectica, vol. 12, pp. 280–287.
Ruben L. Goodstein
[1944] On the restricted ordinal theorem, The Journal of Symbolic Logic,
vol. 9, pp. 33–41.
Ronald Graham, Bruce Rothschild, and Joel Spencer
[1990] Ramsey Theory, second ed., Discrete Mathematics and Opti-
mization, Wiley Interscience.
Timothy G. Griffin
[1990] A formulae-as-types notion of control, Conference Record of the
Seventeenth Annual ACM Symposium on Principles of Programming Lan-
guages, pp. 47–58.
Andrzej Grzegorczyk
[1953] Some Classes of Recursive Functions, Rozprawy Matematyczne,
Warszawa.
Tatsuya Hagino
[1987] A typed lambda calculus with categorical type constructions, Cat-
egory Theory and Computer Science (D. H. Pitt, A. Poigné, and D. E. Ry-
deheard, editors), Lecture Notes in Computer Science, vol. 283, Springer
Verlag, Berlin, pp. 140–157.
Petr Hájek and Pavel Pudlák
[1993] Metamathematics of First-Order Arithmetic, Perspectives in
Mathematical Logic, Springer Verlag, Berlin.
William G. Handley and Stanley S. Wainer
[1999] Complexity of primitive recursion, Computational Logic
(U. Berger and H. Schwichtenberg, editors), NATO ASI Series F, Springer
Verlag, Berlin, pp. 273–300.
Godfrey H. Hardy
[1904] A theorem concerning the infinite cardinal numbers, Quarterly
Journal of Mathematics, vol. 35, pp. 87–94.
Andrew J. Heaton and Stanley S. Wainer
[1996] Axioms for subrecursion theories, Computability, Enumerability,
Unsolvability. Directions in recursion theory (S. B. Cooper, T. A. Slaman,
and S. S. Wainer, editors), London Mathematical Society Lecture Notes
Series, vol. 224, Cambridge University Press, pp. 123–138.
Mircea Dan Hernest
[2006] Feasible Programs from (Non-Constructive) Proofs by the Light
(Monotone) Dialectica Interpretation, PhD thesis, Ecole Polytechnique
Paris and LMU München.
Mircea Dan Hernest and Trifon Trifonov
[2010] Light Dialectica revisited, Annals of Pure and Applied Logic,
vol. 161, pp. 1379–1389.
Arend Heyting
[1959] Constructivity in Mathematics, North-Holland, Amsterdam.
David Hilbert and Paul Bernays
[1939] Grundlagen der Mathematik, vol. II, Springer Verlag, Berlin.
Martin Hofmann
[1999] Linear types and non-size-increasing polynomial time computa-
tion, Proceedings 14’th Symposium on Logic in Computer Science (LICS
’99), pp. 464–473.
[2000] Safe recursion with higher types and BCK-algebra, Annals of Pure
and Applied Logic, vol. 104, pp. 113–166.
William A. Howard
[1970] Assignment of ordinals to terms for primitive recursive functionals
of finite type, Intuitionism and Proof Theory (A. Kino, J. Myhill, and R. E.
Vesley, editors), Studies in Logic and the Foundations of Mathematics,
North-Holland, Amsterdam, pp. 443–458.
[1980] The formulae-as-types notion of construction, To H. B. Curry:
Essays on Combinatory Logic, Lambda Calculus and Formalism (J. P.
Seldin and J. R. Hindley, editors), Academic Press, pp. 479–490.
Simon Huber
[2010] On the computational content of choice axioms, Master’s thesis,
Mathematisches Institut der Universität München.
Hajime Ishihara
[2000] A note on the Gödel–Gentzen translation, Mathematical Logic
Quarterly, vol. 46, pp. 135–137.
Gerhard Jäger
[1986] Theories for Admissible Sets: A Unifying Approach to Proof
Theory, Bibliopolis, Naples.
Gerhard Jäger, Reinhard Kahle, Anton Setzer, and Thomas
Strahm
[1999] The proof-theoretic analysis of transfinitely iterated fixed point
theories, The Journal of Symbolic Logic, vol. 64, no. 1, pp. 53–67.
Stanisław Jaśkowski
[1934] On the rules of supposition in formal logic (Polish), Studia Logica
(old series), vol. 1, pp. 5–32, translated in Polish Logic 1920–39 (S. McCall,
editor), Clarendon Press, Oxford 1967.
Herman R. Jervell
[2005] Finite trees as ordinals, New Computational Paradigms; Proceed-
ings of CiE 2005 (S. B. Cooper, B. Löwe, and L. Torenvliet, editors),
Lecture Notes in Computer Science, vol. 3526, Springer Verlag, Berlin,
pp. 211–220.
Felix Joachimski and Ralph Matthes
[2003] Short proofs of normalisation for the simply-typed λ-calculus,
permutative conversions and Gödel’s T , Archive for Mathematical Logic,
vol. 42, pp. 59–87.
Carl G. Jockusch
[1972] Ramsey’s theorem and recursion theory, The Journal of Symbolic
Logic, vol. 37, pp. 268–280.
Ingebrigt Johansson
[1937] Der Minimalkalkül, ein reduzierter intuitionistischer Formalis-
mus, Compositio Mathematica, vol. 4, pp. 119–136.
Klaus Frovin Jørgensen
[2001] Finite type arithmetic, Master’s thesis, University of Roskilde.
Noriya Kadota
[1993] On Wainer’s notation for a minimal subrecursive inaccessible or-
dinal, Mathematical Logic Quarterly, vol. 39, pp. 217–227.
László Kalmár
[1943] Ein einfaches Beispiel für ein unentscheidbares Problem (Hungar-
ian, with German summary), Mat. Fiz. Lapok, vol. 50, pp. 1–23.
Jussi Ketonen and Robert M. Solovay
[1981] Rapidly growing Ramsey functions, Annals of Mathematics (2),
vol. 113, pp. 267–314.
Akiko Kino, John Myhill, and Richard E. Vesley
[1970] Intuitionism and Proof Theory, Studies in Logic and the Foun-
dations of Mathematics, North-Holland, Amsterdam.
Laurie A. S. Kirby and Jeff B. Paris
[1982] Accessible independence results for Peano arithmetic, Bulletin of
the London Mathematical Society, vol. 14, pp. 285–293.
Stephen C. Kleene
[1952] Introduction to Metamathematics, D. van Nostrand, New York.
[1958] Extension of an effectively generated class of functions by enumer-
ation, Colloquium Mathematicum, vol. 6, pp. 67–78.
Ulrich Kohlenbach
[1996] Analysing proofs in analysis, Logic: from Foundations to Applica-
tions. European Logic Colloquium (Keele, 1993) (W. Hodges, M. Hyland,
C. Steinhorn, and J. Truss, editors), Oxford University Press, pp. 225–260.
[2005] Some logical metatheorems with applications in functional analy-
sis, Transactions of the American Mathematical Society, vol. 357, pp. 89–
128.
[2008] Applied Proof Theory: Proof Interpretations and Their Use in
Mathematics, Springer Verlag, Berlin.
Ulrich Kohlenbach and Laurentiu Leustean
[2003] Mann iterates of directionally nonexpansive mappings in hyper-
bolic spaces, Abstract and Applied Analysis, vol. 8, pp. 449–477.
Ulrich Kohlenbach and Paulo Oliva
[2003a] Proof mining: a systematic way of analysing proofs in math-
ematics, Proceedings of the Steklov Institute of Mathematics, vol. 242,
pp. 136–164.
[2003b] Proof mining in L1 approximation, Annals of Pure and Applied
Logic, vol. 121, pp. 1–38.
Andrey N. Kolmogorov
[1925] On the principle of the excluded middle (Russian), Matematich-
eskij Sbornik. Akademiya Nauk SSSR i Moskovskoe Matematicheskoe Ob-
shchestvo, vol. 32, pp. 646–667, translated in From Frege to Gödel. A Source
Book in Mathematical Logic 1879–1931 (J. van Heijenoort, editor), Har-
vard University Press, Cambridge, MA, 1967, pp. 414–437.
[1932] Zur Deutung der intuitionistischen Logik, Mathematische
Zeitschrift, vol. 35, pp. 58–65.
Georg Kreisel
[1951] On the interpretation of non-finitist proofs I, The Journal of Sym-
bolic Logic, vol. 16, pp. 241–267.
[1952] On the interpretation of non-finitist proofs II, The Journal of
Symbolic Logic, vol. 17, pp. 43–58.
[1959] Interpretation of analysis by means of constructive functionals
of finite types, Constructivity in Mathematics (Arend Heyting, editor),
North-Holland, Amsterdam, pp. 101–128.
[1963] Generalized inductive definitions, Reports for the seminar on foun-
dations of analysis, vol. I, Stanford University, mimeographed.
Georg Kreisel and Azriel Lévy
[1968] Reflection principles and their use for establishing the complexity
of axiomatic systems, Zeitschrift für mathematische Logik und Grundlagen
der Mathematik, vol. 14, pp. 97–142.
Saul A. Kripke
[1965] Semantical analysis of intuitionistic logic I, Formal Systems
and Recursive Functions (J. Crossley and M. Dummett, editors), North-
Holland, Amsterdam, pp. 93–130.
Lill Kristiansen and Dag Normann
[1997] Total objects in inductively defined types, Archive for Mathema-
tical Logic, vol. 36, no. 6, pp. 405–436.
Jean-Louis Krivine
[1994] Classical logic, storage operators and second-order lambda-
calculus, Annals of Pure and Applied Logic, vol. 68, pp. 53–78.
Joseph Bernard Kruskal
[1960] Well-quasi-orderings, the tree theorem and Vazsonyi’s conjecture,
Transactions of the American Mathematical Society, vol. 95, pp. 210–255.
Kim G. Larsen and Glynn Winskel
[1991] Using information systems to solve recursive domain equations,
Information and Computation, vol. 91, pp. 232–258.
Daniel Leivant
[1985] Syntactic translations and provably recursive functions, The Jour-
nal of Symbolic Logic, vol. 50, no. 3, pp. 682–688.
[1994] Predicative recurrence in finite type, Logical Foundations of Com-
puter Science (A. Nerode and Y.V. Matiyasevich, editors), Lecture Notes
in Computer Science, vol. 813, pp. 227–239.
[1995a] Intrinsic theories and computational complexity, Logic and
Computational Complexity, International Workshop LCC ’94, Indianapolis,
IN, USA, October 1994 (D. Leivant, editor), Lecture Notes in Computer
Science, vol. 960, Springer Verlag, Berlin, pp. 177–194.
[1995b] Ramified recurrence and computational complexity I: Word re-
currence and poly-time, Feasible Mathematics II (P. Clote and J. Remmel,
editors), Birkhäuser, Boston, pp. 320–343.
Daniel Leivant and Jean-Yves Marion
[1993] Lambda calculus characterization of poly-time, Fundamenta In-
formaticae, vol. 19, pp. 167–184.
Shih-Chao Liu
[1960] A theorem on general recursive functions, Proceedings American
Mathematical Society, vol. 11, pp. 184–187.
Martin H. Löb
[1955] Solution of a problem of Leon Henkin, The Journal of Symbolic
Logic, vol. 20, pp. 115–118.
Martin H. Löb and Stanley S. Wainer
[1970] Hierarchies of number theoretic functions I, II, Archiv für Mathe-
matische Logik und Grundlagenforschung, vol. 13, pp. 39–51, 97–113.
Jean-Yves Marion
[2001] Actual arithmetic and feasibility, 15th International workshop,
Computer Science Logic, CSL ’01 (L. Fribourg, editor), Lecture Notes in
Computer Science, vol. 2142, Springer Verlag, Berlin, pp. 115–139.
Per Martin-Löf
[1971] Hauptsatz for the intuitionistic theory of iterated inductive defi-
nitions, Proceedings of the Second Scandinavian Logic Symposium (J. E.
Fenstad, editor), North-Holland, Amsterdam, pp. 179–216.
[1972] Infinite terms and a system of natural deduction, Compositio
Mathematica, vol. 24, no. 1, pp. 93–103.
[1983] The domain interpretation of type theory, Talk at the workshop
on semantics of programming languages, Chalmers University, Göteborg,
August.
[1984] Intuitionistic Type Theory, Bibliopolis.
John McCarthy
[1963] A basis for a mathematical theory of computation, Computer
Programs and Formal Methods, North-Holland, Amsterdam, pp. 33–70.
Grigori Mints
[1973] Quantifier-free and one-quantifier systems, Journal of Soviet
Mathematics, vol. 1, pp. 71–84.
[1978] Finite investigations of transfinite derivations, Journal of Soviet
Mathematics, vol. 10, pp. 548–596, translated from Zap. Nauchn. Semin.
LOMI, vol. 49 (1975).
[2000] A Short Introduction to Intuitionistic Logic, Kluwer Aca-
demic/Plenum Publishers, New York.
Alexandre Miquel
[2001] The implicit calculus of constructions. Extending pure type sys-
tems with an intersection type binder and subtyping, Proceedings of the
fifth International Conference on Typed Lambda Calculi and Applications
(TLCA ’01) (Samson Abramsky, editor), Lecture Notes in Computer
Science, vol. 2044, Springer Verlag, Berlin, pp. 344–359.
F. Lockwood Morris and Cliff B. Jones
[1984] An early program proof by Alan Turing, Annals of the History of
Computing, vol. 6, pp. 139–143.
Yiannis Moschovakis
[1997] The logic of functional recursion, Logic and Scientific Methods.
Volume One of the Tenth International Congress of Logic, Methodology
and Philosophy of Science, Florence, August 1995 (M. L. Dalla Chiara,
K. Doets, D. Mundici, and J. van Benthem, editors), Synthese Library,
vol. 259, Kluwer Academic Publishers, Dordrecht, Boston, London, pp.
179–208.
Chetan Murthy
[1990] Extracting constructive content from classical proofs, Technical
Report 90–1151, Dep. of Comp. Science, Cornell Univ., Ithaca, New York,
PhD thesis.
John Myhill
[1953] A stumbling block in constructive mathematics (abstract), The
Journal of Symbolic Logic, vol. 18, p. 190.
Sara Negri and Jan von Plato
[2001] Structural Proof Theory, Cambridge University Press.
Maxwell Hermann Alexander Newman
[1942] On theories with a combinatorial definition of “equivalence”, An-
nals of Mathematics, vol. 43, no. 2, pp. 223–243.
Dag Normann
[2000] Computability over the partial continuous functionals, The Journal
of Symbolic Logic, vol. 65, no. 3, pp. 1133–1142.
[2006] Computing with functionals – computability theory or computer
science?, The Bulletin of Symbolic Logic, vol. 12, pp. 43–59.
Piergiorgio Odifreddi
[1999] Classical Recursion Theory Volume II, vol. 143, North-Holland,
Amsterdam.
Isabel Oitavem
[2001] Implicit Characterizations of Pspace, Proof Theory in Computer
Science (R. Kahle, P. Schroeder-Heister, and R. Stärk, editors), Lecture
Notes in Computer Science, vol. 2183, Springer Verlag, Berlin, pp. 170–
190.
Paulo Oliva
[2006] Unifying functional interpretations, Notre Dame Journal of For-
mal Logic, vol. 47, pp. 262–290.
Vladimir P. Orevkov
[1979] Lower bounds for increasing complexity of derivations after cut
elimination, Zapiski Nauchnykh Seminarov Leningradskogo, vol. 88, pp.
137–161.
Geoffrey E. Ostrin and Stanley S. Wainer
[2005] Elementary arithmetic, Annals of Pure and Applied Logic, vol.
133, pp. 275–292.
Michel Parigot
[1992] λμ-calculus: an algorithmic interpretation of classical natural
deduction, Proc. of Log. Prog. and Automated Reasoning, St. Petersburg,
Lecture Notes in Computer Science, vol. 624, Springer Verlag, Berlin,
pp. 190–201.
Jeff Paris
[1980] A hierarchy of cuts in models of arithmetic, Model theory of
algebra and arithmetic (L. Pacholski et al., editors), Lecture Notes in
Mathematics, vol. 834, Springer Verlag, pp. 312–337.
Jeff Paris and Leo Harrington
[1977] A mathematical incompleteness in Peano arithmetic, Handbook
of Mathematical Logic (J. Barwise, editor), North-Holland, Amsterdam,
pp. 1133–1142.
Charles Parsons
[1966] Ordinal recursion in partial systems of number theory (abstract),
Notices of the American Mathematical Society, vol. 13, pp. 857–858.
[1972] On n-quantifier induction, The Journal of Symbolic Logic, vol. 37,
no. 3, pp. 466–482.
[1973] Transfinite induction in subsystems of number theory (abstract),
The Journal of Symbolic Logic, vol. 38, no. 3, pp. 544–545.
Gordon D. Plotkin
[1977] LCF considered as a programming language, Theoretical Com-
puter Science, vol. 5, pp. 223–255.
[1978] Tω as a universal domain, Journal of Computer and System Sci-
ences, vol. 17, pp. 209–236.
Wolfram Pohlers
[1998] Subsystems of set theory and second order number theory, Hand-
book of Proof Theory (S. R. Buss, editor), Studies in Logic and the Foun-
dations of Mathematics, vol. 137, North-Holland, Amsterdam, pp. 209–
335.
[2009] Proof Theory, Universitext, Springer Verlag, Berlin.
Dag Prawitz
[1965] Natural Deduction, Acta Universitatis Stockholmiensis. Stock-
holm Studies in Philosophy, vol. 3, Almqvist & Wiksell, Stockholm.
Christophe Raffalli
[2004] Getting results from programs extracted from classical proofs,
Theoretical Computer Science, vol. 323, pp. 49–70.
Frank Plumpton Ramsey
[1930] On a problem of formal logic, Proceedings of the London Mathe-
matical Society (2), vol. 30, pp. 264–286.
Zygmunt Ratajczyk
[1993] Subsystems of true arithmetic and hierarchies of functions, Annals
of Pure and Applied Logic, vol. 64, pp. 95–152.
Paul Rath
[1978] Eine verallgemeinerte Funktionalinterpretation der Heyting Arith-
metik endlicher Typen, PhD thesis, Universität Münster, Fachbereich
Mathematik.
Michael Rathjen
[1992] A proof-theoretic characterization of primitive recursive set func-
tions, The Journal of Symbolic Logic, vol. 57, pp. 954–969.
[1993] How to develop proof-theoretic ordinal functions on the basis of
admissible sets, Mathematical Logic Quarterly, vol. 39, pp. 47–54.
[1999] The realm of ordinal analysis, Sets and Proofs: Logic Colloquium
’97 (S. B. Cooper and J. K. Truss, editors), London Mathematical Society
Lecture Notes, vol. 258, Cambridge University Press, pp. 219–279.
[2005] Ordinal analysis of parameter free Π¹₂-comprehension, Archive for
Mathematical Logic, vol. 44, no. 3, pp. 263–362.
Michael Rathjen and Andreas Weiermann
[1993] Proof-theoretic investigations on Kruskal’s theorem, Annals of
Pure and Applied Logic, vol. 60, pp. 49–88.
Diana Ratiu and Helmut Schwichtenberg
[2010] Decorating proofs, Proofs, Categories and Computations. Essays
in honor of Grigori Mints (S. Feferman and W. Sieg, editors), College
Publications, pp. 171–188.
Diana Ratiu and Trifon Trifonov
[2010] Exploring the computational content of the Infinite Pigeonhole
Principle, Journal of Logic and Computation, to appear.
Wayne Richter
[1965] Extensions of the constructive ordinals, The Journal of Symbolic
Logic, vol. 30, no. 2, pp. 193–211.
Robert Ritchie
[1963] Classes of predictably computable functions, Transactions Amer-
ican Mathematical Society, vol. 106, pp. 139–173.
Joel W. Robbin
[1965] Subrecursive Hierarchies, PhD thesis, Princeton University.
Raphael M. Robinson
[1950] An essentially undecidable axiom system, Proceedings of the In-
ternational Congress of Mathematicians (Cambridge 1950), vol. I, pp. 729–
730.
Dieter Rödding
[1968] Klassen rekursiver Funktionen, Proceedings of the Summer School
in Logic, Lecture Notes in Mathematics, vol. 70, Springer Verlag, Berlin,
pp. 159–222.
Harvey E. Rose
[1984] Subrecursion: Functions and Hierarchies, Oxford Logic Guides,
vol. 9, Clarendon Press, Oxford.
Barkley Rosser
[1936] Extensions of some theorems of Gödel and Church, The Journal
of Symbolic Logic, vol. 1, pp. 87–91.
Norman A. Routledge
[1953] Ordinal recursion, Mathematical Proceedings of the Cambridge
Philosophical Society, vol. 49, pp. 175–182.
Jan Rutten
[2000] Universal coalgebra: a theory of systems, Theoretical Computer
Science, vol. 249, pp. 3–80.
Diana Schmidt
[1976] Built-up systems of fundamental sequences and hierarchies of
number-theoretic functions, Archiv für Mathematische Logik und Grundla-
genforschung, vol. 18, pp. 47–53.
Peter Schroeder-Heister
[1984] A natural extension of natural deduction, The Journal of Symbolic
Logic, vol. 49, pp. 1284–1300.
Kurt Schütte
[1951] Beweistheoretische Erfassung der unendlichen Induktion in der
Zahlentheorie, Mathematische Annalen, vol. 122, pp. 369–389.
[1960] Beweistheorie, Springer Verlag, Berlin.
[1977] Proof Theory, Springer Verlag, Berlin.
Helmut Schwichtenberg
[1967] Eine Klassifikation der elementaren Funktionen, Manuscript.
[1971] Eine Klassifikation der ε0-rekursiven Funktionen, Zeitschrift für
Mathematische Logik und Grundlagen der Mathematik, vol. 17, pp. 61–74.
[1975] Elimination of higher type levels in definitions of primitive re-
cursive functionals by means of transfinite recursion, Logic Colloquium ’73
(H. E. Rose and J. C. Shepherdson, editors), North-Holland, Amsterdam,
pp. 279–303.
[1977] Proof theory: some applications of cut-elimination, Handbook of
Mathematical Logic (J. Barwise, editor), Studies in Logic and the Founda-
tions of Mathematics, vol. 90, North-Holland, Amsterdam, pp. 867–895.
[1992] Proofs as programs, Proof Theory (P. Aczel, H. Simmons, and
S. Wainer, editors), Cambridge University Press, pp. 81–113.
[1996] Density and choice for total continuous functionals, Kreiseliana.
About and Around Georg Kreisel (P. Odifreddi, editor), A.K. Peters,
Wellesley, Massachusetts, pp. 335–362.
[2005] A direct proof of the equivalence between Brouwer’s fan theo-
rem and König’s lemma with a uniqueness hypothesis, Journal of Universal
Computer Science, vol. 11, no. 12, pp. 2086–2095.
[2006a] An arithmetic for polynomial-time computation, Theoretical
Computer Science, vol. 357, pp. 202–214.
[2006b] Minlog, The Seventeen Provers of the World (F. Wiedijk, edi-
tor), Lecture Notes in Artificial Intelligence, vol. 3600, Springer Verlag,
Berlin, pp. 151–157.
[2006c] Recursion on the partial continuous functionals, Logic Collo-
quium ’05 (C. Dimitracopoulos, L. Newelski, D. Normann, and J. Steel,
editors), Lecture Notes in Logic, vol. 28, Association for Symbolic Logic,
pp. 173–201.
[2008a] Dialectica interpretation of well-founded induction, Mathemati-
cal Logic Quarterly, vol. 54, no. 3, pp. 229–239.
[2008b] Realizability interpretation of proofs in constructive analysis,
Theory of Computing Systems, vol. 43, no. 3, pp. 583–602.
Helmut Schwichtenberg and Stephen Bellantoni
[2002] Feasible computation with higher types, Proof and System-
Reliability (H. Schwichtenberg and R. Steinbrüggen, editors), Proceed-
ings NATO Advanced Study Institute, Marktoberdorf, 2001, Kluwer Aca-
demic Publisher, pp. 399–415.
Helmut Schwichtenberg and Stanley S. Wainer
[1995] Ordinal bounds for programs, Feasible Mathematics II (P. Clote
and J. Remmel, editors), Birkhäuser, Boston, pp. 387–406.
Dana Scott
[1970] Outline of a mathematical theory of computation, Technical
Monograph PRG-2, Oxford University Computing Laboratory.
[1982] Domains for denotational semantics, Automata, Languages and
Programming (E. Nielsen and E. M. Schmidt, editors), Lecture Notes in
Computer Science, vol. 140, Springer Verlag, Berlin, pp. 577–613.
John C. Shepherdson and Howard E. Sturgis
[1963] Computability of recursive functions, Journal of the Association
for Computing Machinery, vol. 10, pp. 217–255.
Wilfried Sieg
[1985] Fragments of arithmetic, Annals of Pure and Applied Logic, vol.
28, pp. 33–71.
[1991] Herbrand analyses, Archive for Mathematical Logic, vol. 30, pp.
409–441.
Harold Simmons
[1988] The realm of primitive recursion, Archive for Mathematical Logic,
vol. 27, pp. 177–188.
Stephen G. Simpson
[1985] Nonprovability of certain combinatorial properties of finite trees,
Harvey Friedman’s Research on the Foundations of Mathematics (L. Har-
rington, M. Morley, A. Scedrov, and S. G. Simpson, editors), North-
Holland, Amsterdam, pp. 87–117.
[2009] Subsystems of Second Order Arithmetic, second ed., Perspec-
tives in Logic, Association for Symbolic Logic and Cambridge University
Press.
Craig Smoryński
[1991] Logical Number Theory I, Universitext, Springer Verlag, Berlin.
Robert I. Soare
[1987] Recursively Enumerable Sets and Degrees, Perspectives in Math-
ematical Logic, Springer Verlag, Berlin.
Richard Sommer
[1992] Ordinal arithmetic in IΔ0 , Arithmetic, Proof Theory and Compu-
tational Complexity (P. Clote and J. Krajicek, editors), Oxford University
Press.
[1995] Transfinite induction within Peano arithmetic, Annals of Pure and
Applied Logic, vol. 76, pp. 231–289.
Elliott J. Spoors
[2010] A Hierarchy of Ramified Theories Below Primitive Recursive
Arithmetic, PhD thesis, Dept. of Pure Mathematics, Leeds University.
Richard Statman
[1978] Bounds for proof-search and speed-up in the predicate calculus,
Annals of Mathematical Logic, vol. 15, pp. 225–287.
Martin Stein
[1976] Interpretationen der Heyting-Arithmetik endlicher Typen, PhD
thesis, Universität Münster, Fachbereich Mathematik.
Viggo Stoltenberg-Hansen, Edward Griffor, and Ingrid Lind-
ström
[1994] Mathematical Theory of Domains, Cambridge Tracts in Theo-
retical Computer Science, Cambridge University Press.
Viggo Stoltenberg-Hansen and John V. Tucker
[1999] Computable rings and fields, Handbook of Computability Theory
(Edward Griffor, editor), North-Holland, Amsterdam, pp. 363–447.
Thomas Strahm
[1997] Polynomial time operations in explicit mathematics, The Journal
of Symbolic Logic, vol. 62, no. 2, pp. 575–594.
[2004] A proof-theoretic characterization of the basic feasible functio-
nals, Theoretical Computer Science, vol. 329, pp. 159–176.
Thomas Strahm and Jeffery I. Zucker
[2008] Primitive recursive selection functions for existential assertions
over abstract algebras, Journal of Logic and Algebraic Programming, vol.
76, pp. 175–197.
William W. Tait
[1961] Nested recursion, Mathematische Annalen, vol. 143, pp. 236–250.
[1968] Normal derivability in classical logic, The Syntax and Semantics
of Infinitary Languages (J. Barwise, editor), Lecture Notes in Mathemat-
ics, vol. 72, Springer Verlag, Berlin, pp. 204–236.
[1971] Normal form theorem for bar recursive functions of finite type,
Proceedings of the Second Scandinavian Logic Symposium (J. E. Fenstad,
editor), North-Holland, Amsterdam, pp. 353–367.
Masako Takahashi
[1995] Parallel reductions in λ-calculus, Information and Computation,
vol. 118, pp. 120–127.
Gaisi Takeuti
[1967] Consistency proofs of subsystems of classical analysis, Annals of
Mathematics, vol. 86, pp. 299–348.
[1987] Proof Theory, second ed., North-Holland, Amsterdam.
Alfred Tarski
[1936] Der Wahrheitsbegriff in den formalisierten Sprachen, Studia
Philosophica, vol. 1, pp. 261–405.
Trifon Trifonov
[2009] Dialectica interpretation with fine computational control, Proc.
5th Conference on Computability in Europe, Lecture Notes in Computer
Science, vol. 5635, Springer Verlag, Berlin, pp. 467–477.
Anne S. Troelstra
[1973] Metamathematical Investigation of Intuitionistic Arithmetic and
Analysis, Lecture Notes in Mathematics, vol. 344, Springer Verlag, Berlin.
Anne S. Troelstra and Helmut Schwichtenberg
[2000] Basic Proof Theory, second ed., Cambridge University Press.
Anne S. Troelstra and Dirk van Dalen
[1988] Constructivism in Mathematics. An Introduction, Studies in Logic
and the Foundations of Mathematics, vol. 121, 123, North-Holland, Am-
sterdam.
John V. Tucker and Jeffery I. Zucker
[1992] Provable computable selection functions on abstract structures,
Proof Theory (P. Aczel, H. Simmons, and S. Wainer, editors), Cambridge
University Press, pp. 275–306.
[2000] Computable functions and semicomputable sets on many-sorted
algebras, Handbook of Logic in Computer Science, Vol. V (S. Abramsky,
D. Gabbay, and T. Maibaum, editors), Oxford University Press, pp. 317–
523.
[2006] Abstract versus concrete computability: the case of countable
algebras, Logic Colloquium 2003 (V. Stoltenberg-Hansen and J. Väänänen,
editors), ASL Lecture Notes in Logic, vol. 24, AK Peters, pp. 377–408.
Jaco van de Pol
[1995] Two different strong normalization proofs?, HOA 1995
(G. Dowek, J. Heering, K. Meinke, and B. Möller, editors), Lecture Notes
in Computer Science, vol. 1074, Springer Verlag, Berlin, pp. 201–220.
Femke van Raamsdonk and Paula Severi
[1995] On normalisation, Computer Science Report CS-R9545, 1995,
Centrum voor Wiskunde en Informatica.
Jan von Plato
[2008] Gentzen’s proof of normalization for natural deduction, The Bul-
letin of Symbolic Logic, vol. 14, no. 2, pp. 240–257.
Stanley S. Wainer
[1970] A classification of the ordinal recursive functions, Archiv für Ma-
thematische Logik und Grundlagenforschung, vol. 13, pp. 136–153.
[1972] Ordinal recursion, and a refinement of the extended Grzegorczyk
hierarchy, The Journal of Symbolic Logic, vol. 37, pp. 281–292.
[1989] Slow growing versus fast growing, The Journal of Symbolic Logic,
vol. 54, no. 2, pp. 608–614.
[1999] Accessible recursive functions, The Bulletin of Symbolic Logic,
vol. 5, no. 3, pp. 367–388.
[2010] Computing bounds from arithmetical proofs, Ways of Proof The-
ory: Festschrift for W. Pohlers (R. Schindler, editor), Ontos Verlag,
pp. 459–476.
Stanley S. Wainer and Richard S. Williams
[2005] Inductive definitions over a predicative arithmetic, Annals of Pure
and Applied Logic, vol. 136, pp. 175–188.
Andreas Weiermann
[1995] Investigations on slow versus fast growing: how to majorize slow
growing functions nontrivially by fast growing ones, Archive for Mathema-
tical Logic, vol. 34, pp. 313–330.
[1996] How to characterize provably total functions by local predicativity,
The Journal of Symbolic Logic, vol. 61, no. 1, pp. 52–69.
[1999] What makes a (pointwise) subrecursive hierarchy slow growing?,
Sets and Proofs: Logic Colloquium ’97 (S. B. Cooper and J. K. Truss, edi-
tors), London Mathematical Society Lecture Notes, vol. 258, Cambridge
University Press, pp. 403–423.
[2004] A classification of rapidly growing Ramsey functions, Proceedings
of the American Mathematical Society, vol. 132, pp. 553–561.
[2005] Analytic combinatorics, proof-theoretic ordinals, and phase-
transitions for independence results, Annals of Pure and Applied Logic,
vol. 136, pp. 189–218.
[2006] Classifying the provably total functions of PA, The Bulletin of
Symbolic Logic, vol. 12, pp. 177–190.
[2007] Phase transition thresholds for some Friedman-style independence
results, Mathematical Logic Quarterly, vol. 53, pp. 4–18.
Richard S. Williams
[2004] Finitely Iterated Inductive Definitions over a Predicative Arith-
metic, PhD thesis, Department of Pure Mathematics, Leeds University.
Fred Zemke
[1977] P.R.-regulated systems of notation and the subrecursive hierarchy
equivalence property, Transactions of the American Mathematical Society,
vol. 234, pp. 89–118.
Jeffrey Zucker
[1973] Iterated inductive definitions, trees and ordinals, Metamathematical
Investigation of Intuitionistic Arithmetic and Analysis (A. S. Troelstra,
editor), Lecture Notes in Mathematics, vol. 344, Springer Verlag, Berlin,
pp. 392–453.
INDEX
A(r), 8
Bα, 175, 205
Fα, 161
G(α), 201
Gα, 159
Hα, 161
CV, 328
W, 197, 202
O<k, 209
Ok, 209
EA(;), 397
Eq, 335
FV, 8
HAω, 372
IDk(W), 216
IDk(W)∞, 219
ID<ω(W), 216
IΔ0(exp), 114, 150
IΣ1, IΣn, 151
LA(;), 422
LT(;), 405, 412
ΩGk(n), 207
ΩSk, 203
Ωi-rule, 219, 221
Ωk, 203
PA, 144
Π¹₁-CA0, 195, 215, 231
A(;), 422
T(;), 404
T1(;), 420
SC, 291
T, 290
TCF, 316
∀c, ∀nc, 327
α[], 203
α[n], 158, 200
ε0, 157
ε0(i), 163
∃̃, ∨̃, 6, 13
∃d, ∃l, ∃r, ∃u, 335, 338
F, xiii, 317
⊥, 6, 353
→+, 243
∧̃, 13
∧d, ∧l, ∧r, ∧u, 335, 338
∨d, ∨l, ∨r, ∨u, 338
i, 203
→+, 23
→∗, 23
E[x := r], 8
k, 206, 217
→c, →nc, 327
ϕe(r)(n), 74
{e}(n), 74
ϕα(k)(), 205
f |= Γ, 120, 154
k A[], 45
m |= Γ, 120, 153
n : N α Γ, 174
t r A, 333
Abel, 310
Abramsky, 310
Ackermann, xi, 150
adequacy, 289
Aehlig, 429
Alexandrov condition, 256
algebra, 259
explicit, 261
finitary, 261
initial, 265
of witnesses, 333
simultaneously defined, 260
structure-finitary, 261
α-equal, 7, 129
α-recursion, 163
Altenkirch, 310
Amadio, 310 Çağman, 396
application, 253, 258 canonical inhabitant, 269
approximable map, 255 Cantini, 396
Arai, 230, 231 Cantor normal form, 157
argument type Carlson, 231
parameter, 259 case-construct, 271
recursive, 259 C operator, 271
assignment, 45, 53 cases-operator, 405
assumption, 9 Chiarabini, 381, 394
cancelled, 9 Church, xi, 110, 145
closed, 9 Church–Rosser property, 20, 26
open, 9 weak, 26
atom, 7 Cichon, 150, 185, 186
critical, 360 Cleave, 110
A-translation, 352 closure ordinal, 104
Avigad, 150, 193, 245 Clote, 396
axiom coalgebra
independence of premise, 370 terminal, 265
of choice, 232, 370 Cobham, 396
of dependent choice, 54, 108 coclosure ordinal, 105
of extensionality, 319 codirected, 105
coinduction, 103, 323
bar, 45
for monotone inductive definitions, 103
Barbanera, 394
strengthened form, 103
Barendregt, 276, 310
Beckmann, 396, 428 coinductive definition
Beklemishev, 145 monotone, 102
Bellantoni, 113, 395–397, 404 of cototality, 323, 348
Benl, 311 collapsing, 225, 227
Berardi, 394 compactness theorem, 57
Berger, 52, 265, 266, 273, 294, 310, 311, 327, complete partial ordering (cpo), 252
367, 381, 392–394 algebraic, 253
Josef, xv basis, 252
Bernays, 143 bounded complete, 253
Beth, 44, 59 compact element, 252
Bezem, 353, 366 deductive closure in, 252
BHK-interpretation, 304, 313 directed set in, 252
bibliography, 431–455 domain, 253
Blanqui, 310 completeness theorem
Börger, 141 for classical logic, 55
Borodin, 150 for intuitionistic logic, 52
bounded minimization, 65 for minimal logic, 49
bounded product, 64 comprehension term, 316
bounded sum, 64 computation rule, 268, 273
boundedness property, 101 computation sequence, 241
Bovykin, 150, 193, 246 computational strengthening, 329
branch, 44 conclusion, 9
generic, 55 confluence, 20, 276
Brouwer, 198, 304, 313 confluent, 26
Bruijn, de, 7, 294 weakly, 26
Buchholz, xv, 150, 174, 216, 217, 219, 225, conjunction, 12, 319
231, 233, 245, 393 uniform ∧u , 335
Buss, 145, 150, 396 consistency, 142
consistent, 251 normal, 40
consistent set of formulas, 57 segment, 40
Constable, 150, 380, 394 term, 21
constant, 6 track, 41
constructor destructor, 275, 351
as continuous function, 263 Dezani, 310
constructor pattern, 274 Dialectica interpretation, 367
constructor symbol, 260 diamond property, 276
constructor type Dickson’s lemma, 353, 366
nullary, 259, 260 Diller, 393
contents, ix–x direct limit, 201
context, 126 directed, 105
conversion, 21, 268, 274, 284 disjunction, 11, 12, 319
D-, 268, 274 disjunction property, 42
β-, 35, 268, 409 domain, 253
η-, 268 Dragalin, 352, 359, 393
R-, 268 drinker formula, 17
permutative, 20, 33, 35
simplification, 34, 35 E-rule, 10
conversion rule, 35 Eberl, 266, 310, 311
Cook, 113, 395–397, 404 effective index property, 79
Cooper, 83 effectivity principle, 250
Coppo, 310 Ekman, 23
Coquand, 245, 266, 310, 394 elimination, 35
corecursion elimination axiom, 317
operator, 272, 351 embedding of PA, 177
cototality, 323, 324, 348 entailment, 251
course of values recursion, 86 equality
cpo, 252 decidable, 275, 317
CR, 26 Leibniz, xiii, 317
Curien, 310 pointwise, 318
Curry, 21 Ershov, 253, 310, 311
Curry–Howard correspondence, 5, 21, 394, η-expansion, 29
422 ex-falso-quodlibet, 6, 14, 138, 318
cut elimination, 180, 225, 226 existential quantifier, 12
cut formula, 120, 175 uniform ∃u , 335
cut rank, 59 explicit definability, 43
Cutland, 61
Fairtlough, 150, 166, 174, 192
Dalen, van, xi, 60, 392 falsity F, xiii, 317
decoration falsum ⊥, 6
algorithm, 382 Feferman, 101, 143, 145, 150, 199, 216, 217,
of proofs, 381 231, 233, 245, 246
definition Felleisen, 394
coinductive, 102 Filinski, 294
inductive, 102 finite support principle, 79, 249, 256
recursive, 78 fixed point
derivability conditions, 142 least, 79
derivable, 10 lemma, 134
classically, 14 forces, 45
intuitionistically, 14 formal neighborhood, 251
derivation, 9 formula, 7
main branch, 41 Π1 -, 151
Πn -, 151 Gerhardy, 392
Σ1 -, 114, 150 Girard, xi, 145, 150, 193, 195, 200, 214, 310,
Σn -, 151 396
atomic, 7 Gödel, xi, 110, 114, 135, 137, 138, 141, 142,
bounded, 114 144, 313, 314, 367, 393
closed, 8 first incompleteness theorem, 135
computational equivalence, 333 number, 123
computationally relevant (c.r.), 332 second incompleteness theorem, 144
invariant, 332, 335 translation, 369
negative content, 368 Gödel–Gentzen translation, 19
non-computational (n.c.), 328, 332 Gödel–Rosser theorem, 136
positive content, 368 Goodstein, 149
prime, 7 Goodstein sequence, 185–187
safe, 403, 424 Grädel, 141
free (for a variable), 7 Graham, 189
Friedman, 49, 60, 150, 193, 216, 238, 246, greatest fixed point, 102
352, 359, 393 greatest-fixed-point axiom, 323–325, 327
function Griffin, 394
ε0 -recursive, 157, 163 Grzegorczyk, 110, 149
ε0 (i)-recursive, 163 Gurevich, 141
accessible recursive, 199, 201, 215
Ackermann–Péter, 89 Hájek, 150, 193, 194
computable, 77 Hagino, 265
continuous, 255 halting problem, 78
elementary, 65 Handley, 111
monotone, 256 Hardy, 161, 162, 167
μ-recursive, 76 Harrington, 149, 185, 188, 193
partial recursive, 82 Harrop formula, 332
primitive recursive, 84, 151 Heaton, 199
provably Σ1 , 116, 150 height
provably bounded, 117 of a formal neighborhood, 306
provably recursive, 116, 150 of an extended token, 306
recursive, 82 of syntactic expressions, 262
register machine computable, 64 Herbrand, xi, 110
representable, 133 Herbrand–Gödel–Kleene equation calculus,
strict, 263 xi
subelementary, 65, 111 Hernest, 393, 394
function symbol, 6 Heyting, 304, 313
functional Heyting arithmetic, 192, 372
computable, 262 hierarchy
partial continuous, 262 analytical, 94
recursive, 100 arithmetical, 90
recursive in valmax, pcond and ∃, 299 fast-growing, 161, 167, 175, 205
functor, 200 Grzegorczyk, 110
B-, 212 Hardy, 161, 167, 185
G-, 209 slow-growing, 159, 200, 206
ϕ-, 210 subrecursive, 196
fundamental sequence, 157 Hilbert, xi, 143
Hofmann, 245, 396
G-collapsing, 206 homogeneous set, 187
gap condition, 238 honest, 89
Gentzen, xi, 5, 6, 59, 149, 170, 171, 173, 185, Howard, 21, 429
193, 219 Huber, xv, 311
i-reduction, 240 Jung, 310
I-rule, 10
i-sequence, 240 Kadota, 206
i-term, 239 Kahle, 246
ideal, 251 Kalmár, 65, 110, 202, 215
cototal, 264 Kapron, 396
structure-cototal, 264 Ketonen, 149, 185, 188, 194
structure-total, 264, 304 Kirby, 149, 150, 185, 187
total, 264, 304 Kleene, xi, xiii, 61, 99, 110, 196, 198, 309
implication Kleene’s normal form theorem, 74
computational →c , 327 relativized, 83
input →, 422 Kleene’s recursion theorem, 80, 296
non-computational →nc , 327 Kleene’s second recursion theorem, 90
output →, 422 Kleene–Brouwer ordering, 100, 198
incompleteness, 135 Knaster–Tarski theorem, 102, 106
incompleteness theorem König’s lemma, 107
first, 135 Kohlenbach, 392, 394
second, 144 Kolmogorov, xi, 6, 59, 304, 313, 331
index, 457–465 Kreisel, xii, 149, 150, 193, 216, 266, 309–311,
induction 392, 393
computational, 339 Kripke, 60
for monotone inductive definitions, 103, Krivine, 394
106 Kruskal’s theorem, 237
general, 322 extended, 238
strengthened form, 103, 315
transfinite, 170 Lévy, 193
transfinite structural, 169 language
inductive definition elementarily presented, 123
decorated, 331 Larsen, 251
finitely iterated, 216 leaf, 44
monotone, 102 least fixed point, 102
of conjunction, 319 least number operator, 65
of disjunction, 319 least-fixed-point axiom, 317
of existence, 275, 319 least-fixed-point operator, 296
of totality, 315 Leibniz equality Eq, 335
infinitary arithmetic, 174 Leivant, 114, 359, 396, 397, 429
infix, 6 Leustean, 392
information system, 251 Liu, 196
atomic, 264 Löb, 142, 144, 145, 150
coherent, 264 Löb’s theorem, 144
flat, 251 Löwenheim–Skolem theorem, 57
of a type ρ, 261 long normal form, 29, 294
intuitionistic logic, 6 loop-program, 85
Ishihara, 393
Malcev, xi
Jäger, 231, 246 many-one reducible, 93
Jaskowski, 59 Marion, 396
Jervell, 246 marker, 9
Joachimski, 59 Markov principle, 370
Jockusch, 187 Martin-Löf, 21, 59, 266, 276, 287, 310, 319
Jørgensen, 393 Matthes, 59
Johannsen, 429 McCarthy, 110
Johansson, 6, 59 minimal logic, 10
Minlog, xiv, 291, 313, 362, 381, 394, 404 constructive, 197
Mints, xv, 60, 150, 151, 265 provable, 230
Miquel, 392 recursive, 99
model, 53 tree-, 217
classical, 54 Orevkov, 20, 59
tree, 44, 59 Ostrin, 396
modus ponens, 10
monotonicity principle, 79, 250 parallel or, 299
Moschovakis, 61, 101, 388 parameter premise, 316
Murthy, 394 Parigot, 394
Myhill, 196, 200 Paris, 149, 150, 185, 187, 188, 192, 193
Parsons, 149–151, 171, 173
Nahm, 393 part
negation, 7 strictly positive, 8
negation normal form, 58 Peano arithmetic, 144, 150, 157
Negri, 60 Peirce formula, 48
Newman’s lemma, 20, 26 permutative conversion, 20, 33
Niggl, 396 Persson, 394
node, 44 Péter, xi
consistent, 55 Platek, 250
stable, 55 Plato, von, 59
normal form, 23 Plotkin, 250, 266, 287, 299, 310
long, 29, 294 Pohlers, 217, 231, 233, 245
normalization, 20 Pol, van de, 59
by evaluation, 295 Pollett, 396
strong, 23, 34 Prawitz, 59
normalizing predicate
strongly, 23 coinductively defined, 324
Normann, 311 inductively defined, 316
nullary clause, 316 predicate symbol, 6
nullterm, 332 preface, xi–xv
nulltype, 329, 332 premise, 9
numeral, 131 major, 10, 12
minor, 10, 12
Odifreddi, 111 progressive, 170, 323
Oitavem, 396 -, 170
Oliva, 378, 392, 393 structural, 169
ω-consistent, 135, 145 proof, 9
ω-rule, 173 code-carrying, 346
operator, 102 pattern, 381
Σ0r -definable, 107 proof mining, 392
closure of an, 104 propositional symbol, 6
coclosure of an, 105 pruning, 24
cocontinuous, 105 Pudlák, 150, 193, 194
continuous, 105
inclusive, 104 Raamsdonk, van, 59
least-fixed-point, 107, 296 Raffalli, 394
monotone, 102 Ramsey, 149
selective, 104 Ramsey’s theorem, 187
ordinal finite, 187, 188
Bachmann–Howard, 215, 231 infinite, 187, 188
below ε0 , 157 modified finite, 188, 192
coding of, 159 rank of a term, 411
Ratajczyk, 150 well-founded, 106
Rath, 393 relation symbol, 6
Rathjen, 150, 217, 231, 246 renaming, 7
Ratiu, 394 representability, 133
real Richter, 233, 234
abstract, 273, 348 normal form, 234
realizability, 113, 332 Ritchie, 111
recursion Robbin, 149
α-, 163 Robinson, 145
general, 272 Robinson’s Q, 140, 142
operator, 267, 290 Rödding, 111, 402
operator, simultaneous, 268 Rose, xii, 150
relativized, 83 Rosser, 135, 136, 138, 141, 145
recursive premise, 316 Rothschild, 189
recursive program, 82 Routledge, 196, 200
redex rule, 9
D-, 268 Ωi -, 219, 221
β-, 21 Rutten, 265
reduction, 23
Sambin, 310
generated, 23
satisfiable set of formulas, 57
head, 287
Schmidt, 206
inner, 35
Schroeder-Heister, 59
one-step, 23, 35
Schütte, xi, 150, 173, 429
parallel, 276
Schwichtenberg, 111, 150, 193, 230, 266,
proper, 23
294, 310, 311, 313, 367, 379, 393, 394,
reduction sequence, 23, 106
396, 404, 412
reduction system, 106
Scott, 250, 251, 253, 309, 310
reduction tree, 23 Scott condition, 256
register machine, 61 Scott topology, 256
relation Scott–Ershov domain, 253
Δ0r -definable, 92 Seisenberger, 367
Δ1r -definable, 95 sequence
Π0r -definable, 91 reduction, 23
Π1r -definable, 95 sequence number, 70
Σ01 -definable, 76 set of formulas
Σ0r -complete, 93 definable, 131
Σ0r -definable, 91 elementary, 130
Σ1r -complete, 98 primitive recursive, 130
Σ1r -definable, 95 recursive, 130
accessible part, 106, 320, 338 recursively enumerable, 130
analytical, 95 set of tokens
arithmetical, 91 deductively closed, 251
confluent, 276 Setzer, 246
definable, 131 Severi, 59
elementarily enumerable, 76 Sheard, 150, 193
elementary, 66 Shepherdson, 61, 110
noetherian, 106 Sieg, 150, 217, 231, 233, 245
recursive, 130 Σ1 -formulas
recursively enumerable, 91 of the language L1 , 141
representable, 133 signature, 6
terminating, 106 signed digit, 265, 273
universal, 93 Simmons, 395, 404
simplification redex, 35 simple, 416
Simpson, 193, 233, 238 syntactically total, 360
Skolem, xi theory
Smith, 310 axiomatized, 131
Smorynski, 145 complete, 131
Soare, 83 consistent, 131
Solovay, 149, 185, 188, 194 elementarily axiomatizable, 130
Sommer, 150, 193 incomplete, 131
soundness theorem inconsistent, 131
for classical logic, 54 primitive recursively axiomatizable, 130
for Dialectica, 373 recursively axiomatizable, 130
for minimal logic, 46 token, 251
for realizability, 340 extended, 261
Spencer, 189 totality, 315, 320, 321
Spiwack, 310 main, 28
Spoors, 403 main, 28
stability, 14 tree, 44
Stärk, 129 complete, 44
state finitely branching, 44
of computation, 75 infinite, 44
Statman, 20, 59 reduction, 23
Stein, 393 tree embedding, 238
step formula, 322 tree model, 44
Stoltenberg-Hansen, 61, 310, 311 for intuitionistic logic, 48
Strahm, 150, 246, 396 tree ordinal, 203
stream representation, 265, 273, 323 structured, 203
strictly positive, 259, 316 Trifonov, 394
strictly positive part, 8 Troelstra, xi, 60, 150, 310, 314, 392
strong computability, 291 true at, 153, 176
strongly normalizing, 23 Tucker, 61, 150
structure-totality, 321 Turing, xi, 61, 110, 199
Sturgis, 61, 110 type, 259
subformula, 8 base, 261
negative, 8 dense, 306
positive, 8 higher, 261
strictly positive, 8 level of, 261
subformula property, 42 safe, 404, 413
substitution, 7, 126 separating, 306
substitution lemma with nullary constructors, 305
for Σ01 -definable relations, 91 type parameter, 259
substitutivity, 23
undecidability, 135
Tait, 60, 150, 154, 166, 174, 178, 220, 276, of logic, 141
310 undefinability of truth, 132
Tait calculus, 57 undefinability theorem, 132
Takahashi, 276 uninstantiated formula
Takeuti, xi, 150, 151, 233, 396 of an axiom, 381
Tarski, 132, 145 universal quantifier
term, 7 computational ∀c , 327
extracted, 337 non-computational ∀nc , 327
input, 405, 414
of T+ , 273 Valentini, 310
of Gödel’s T, 268, 290 validity, 53
variable, 6 Wainer, 111, 150, 166, 174, 192, 199, 200,
assumption, 9 206, 225, 379, 396
bound, 7 WCR, 26
computational, 328, 425 Weiermann, 150, 193, 246, 428
free, 8 Williams, 225
non-computational, 328 Winskel, 251
normal, 395 witnessing predicate, 334
object, 9
safe, 395
variable condition, 10, 12 Zemke, 150
Veldman, 353, 366 Zucker, 61, 150, 310