Set Theory, Logic, and Their Limitations - Machover, Moshé - 1996 - Cambridge New York - Cambridge University Press - 0521479983 - Anna's Archive

Digitized by the Internet Archive
in 2023 with funding from
Kahle/Austin Foundation

https://archive.org/details/settheorylogicthOO0Omach
Set Theory, Logic and their Limitations

Moshé Machover
King’s College London

CAMBRIDGE UNIVERSITY PRESS
Published by the Press Syndicate of the University of Cambridge
The Pitt Building, Trumpington Street, Cambridge CB2 1RP
40 West 20th Street, New York, NY 10011-4211, USA
10 Stamford Road, Oakleigh, Melbourne 3166, Australia

© Cambridge University Press 1996

First published 1996

Printed in Great Britain at the University Press, Cambridge

A catalogue record for this book is available from the British Library

Library of Congress cataloguing in publication data available

ISBN 0 521 47493 0 hardback


ISBN 0 521 47998 3 paperback

Contents

Preface

0 Mathematical induction
1 Sets and classes
2 Relations and functions
3 Cardinals
4 Ordinals
5 The axiom of choice
6 Finite cardinals and alephs
7 Propositional logic
8 First-order logic
9 Facts from recursion theory
10 Limitative results

Appendix: Skolem’s Paradox
Author index
General index

Preface

This is an edited version of lecture notes distributed to students in two
of my courses, one on set theory, the other on quantification theory
and limitative results of mathematical logic. These courses are de-
signed primarily for philosophy undergraduates at the University of
London who bravely choose the Symbolic Logic paper as one of their
Finals options. They are also offered to mathematics undergraduates at
King’s College, London.
This then is a discourse addressed by a mathematician to an au-
dience with a keen interest in philosophy. The style of technical
presentation is mathematical. In particular, in logical notation and
terminology I generally conform to the usage of mathematicians. (It
seems that in this matter philosophers in any case tend to follow suit –
after some delay.) But philosophical and methodological issues are
often highlighted instead of being glossed over, as is quite common in
texts addressed primarily to students of mathematics.
A naive presentation of set theory may be in order if the main aim is
instrumental: to acquaint would-be practitioners of mathematics with
the basic tools of their chosen trade and to inculcate in them methods
whereby nowadays the entire science is apparently reduced to set
theory. In a course of that kind, the student is understandably not
encouraged to scratch where it does not itch. But in the present course
such an attitude would be out of place. To be sure, here as well
set-theoretic concepts and results are needed as tools for formulating
and proving results in mathematical logic. But it would be perverse not
to alert would-be philosophers to the problematic aspects of set-
theoretic reductionism.
These considerations have largely dictated the presentation of set
theory: axiomatic, albeit unformalized. Critical notes about
set-theoretic reductionism are sounded from time to time as a leitmotiv,
rounded off in a coda on Skolem’s Paradox. Also, the technical
exposition of set theory is accompanied by historical remarks, mainly
because a historical perspective is needed in order to appreciate the
emergence of reductionism and the anti-reductionist critique.
In the exposition of mathematical logic, I have drawn heavily on
Chs. 1, 2, 3 and 7 of B&M (see Note below), which I had used for
many years as a main text for a postgraduate logic course. However,
considerable portions of the material presented in B&M had to be
omitted, either because they are too hard or specialized, or simply for
lack of time.
My greatest regret is that there is not enough time to include both
linear and rule-based logical calculi (my own favourite is the tableau
method). For certain technical reasons I had to sacrifice the latter.
However, as partial compensation, the linear calculi are developed in a
way that makes it clear that the logical axioms are mere stepping-
stones towards rules of deduction: once these rules are established, the
axioms can be shelved. Thus in practice the presentation comes quite
close to being rule-based. The axiom schemes have been designed so as
to make their connection with deduction rules quite direct and trans-
parent.
(The connoisseur will note that the propositional axiom schemes
have been chosen so that omitting one, two or three of them results in
complete systems for intuitionistic implication and negation, classical
implication, and intuitionistic implication. In particular, the only axiom
scheme that is not intuitionistically valid is a purely implicational one.)
Propositional logic is studied with reference to a purely propositional
language, rather than a first-order language as in B&M. This is done
for didactic reasons: although propositional languages in themselves
are of little interest, students are less intimidated by this approach.
For some tedious proofs that have been omitted, the reader is
referred to B&M. These omissions are more than balanced by the
addition of extensive methodological and explanatory comments.
A case in point is Lemma 10.10.12 (see Note below), which is the
main technical result needed for the present version of the Gödel–Rosser
First Incompleteness Theorem. I have omitted its proof, but
added a detailed analysis of the meaning of the lemma and the reason
why its proof works. When this is understood, the proof itself becomes
a mere technicality, almost a foregone conclusion. The analysis is
resumed after the proof of the Gödel–Rosser Theorem, to explain the
meaning of the Gödel–Rosser sentence and the reason for its remarkable
behaviour.
One major respect in which this course is not self-contained is its
heavy borrowing from recursion theory. For further details, see Pre-
view at the beginning of Ch. 9.
The Problems are an essential part of the text; the results contained
in many of them are used later on.

Moshé Machover

Note
• Throughout ‘B&M’ refers to
  J. L. Bell and M. Machover, A course in mathematical logic,
  North-Holland, 1977 (second printing 1986).

• The system of cross-references used here is quite common in
  mathematical texts. It is illustrated by the following example.
  ‘Def. 2.3.4’ refers to the fourth numbered article (which in this
  case is a definition) in §3 of Ch. 2. Within Ch. 2, this definition is
  referred to, more briefly, as ‘Def. 3.4’.

• I would like to express my gratitude to Roger Astley, Michael
  Behrend and Tony Tomlinson of Cambridge University Press for
  their expert help in preparing the manuscript.

Warning
In the last three chapters of this book there is a systematic interplay
between parallel sets of symbols; one set consisting of symbols in
ordinary (feint) typeface:

[list of symbols in ordinary typeface; illegible in this scan]

and the other of their bold-face counterparts:

[list of the corresponding bold-face symbols; illegible in this scan]

For explanations of the purpose of this system of notation, and
warnings against confusing a feint symbol with its bold-face counterpart,
see Warnings 8.1.2, 9.1.4 and 10.1.11 and Rem. 10.1.10.
Unfortunately the bold-face characters could not always be made as
distinct from their feint counterparts as would be desirable. The reader
is therefore urged to exercise special vigilance to discern which type-
face is being used in each instance.
0
Mathematical induction

§1. Intuitive illustration; preliminaries

A familiar trick: dominoes standing on end are arranged in a row; then
the initial domino (here labelled ‘0’) is given a gentle push – and the
whole row comes cascading down.

[Figure: a row of dominoes labelled 0, 1, 2, ..., n, n + 1]
If you want to perform this trick, how can you make sure that all the
dominoes standing in a row will fall? Clearly, the following two
conditions are jointly sufficient.

1. The initial domino (domino 0) is made to fall to the right (for
example, by giving it a push).
2. The dominoes are arranged in such a way that whenever any one
of them (say domino n) falls to the right, it brings down the next
domino after it (domino n + 1) and causes it also to fall to the
right.

A moment’s reflection shows that these two conditions are sufficient
whether the row of dominoes is finite or proceeds ad infinitum. (In the
former case, Condition 2 does not apply to the last domino.)

The reasoning that allows us to infer from Conditions 1 and 2 that all
the dominoes will fall is based on the Principle of Mathematical (or
Complete) Induction. This is a fundamental — arguably the most
fundamental — fact about the so-called natural numbers (0, 1, 2, etc.).
It has several equivalent forms, three of which will be presented here.


WARNING

The term ‘induction’ used here has nothing to do with inductive
reasoning in the empirical sense.

We shall make use of the following terminology and notation.
By number we shall mean natural number. The class {0, 1, 2, ...}
of all numbers will be denoted by ‘N’. We shall use lower-case italic
letters as variables ranging over N.
If P is a property of numbers and n is any number, we write ‘Pn’ to
mean that n has the property P. The extension of P is the class of all
numbers n such that Pn. This class is denoted by ‘{n : Pn}’.
From an extensional point of view, P is identified with its extension:
P = {n : Pn}; and hence Pn is equivalent to n ∈ P. (Here ‘∈’ is short
for ‘is a member of’.)
We write ‘⇒’ as short for ‘implies that’, ‘iff’ or ‘⇔’ as short for ‘if
and only if’, ‘∀’ as short for ‘for all’, and ‘m ≤ n’ as short for ‘m < n
or m = n’.
We state here as ‘facts’ the following elementary properties of the
ordered system of numbers.

1.1. Fact
The relation < between numbers is transitive: whenever k < m and
m < n, then also k < n.

1.2. Fact
The relation < obeys the trichotomy: for any numbers m and n, exactly
one of the following three holds:

m < n or m = n or n < m.

1.3. Fact
Every number n has an immediate successor n + 1, such that, for any
m, n < m iff n + 1 ≤ m.

1.4. Fact
Zero is the least number: 0 ≤ n for all n.

1.5. Fact
For any number m ≠ 0, there is an n such that m = n + 1.

§2. Weak induction

Perhaps the most commonly used form of the Principle of Mathematical
Induction is the so-called ‘Weak’ Principle of Induction. This
asserts, for any property P of numbers, that in order to prove ∀nPn
(i.e., that all numbers have the property P), it is sufficient to prove
two things: first, P0 (i.e., that the number zero has P) and second,
∀n[Pn ⇒ P(n + 1)] (i.e., that whenever n is a number having the
property P then its successor n + 1 also has P). In schematic form:

    P0,  ∀n[Pn ⇒ P(n + 1)]
    ─────────────────────────    (2.1)
             ∀nPn

A proof of a statement ∀nPn by weak induction thus falls into two
sections. One section, called the basis of the inductive proof, is a proof
that P0 holds. The other section, called the induction step, is a proof
that ∀n[Pn ⇒ P(n + 1)]. When these two sections are completed (not
necessarily in the above order), the proof that ∀nPn is complete.
In the induction step, in order to prove that ∀n[Pn ⇒ P(n + 1)],
you have to show that if n is any number such that Pn holds, then
P(n + 1) holds as well. In other words, you have to deduce P(n + 1)
from the assumption that Pn holds. The latter assumption is called the
induction hypothesis.
The induction step is therefore performed as follows. You consider
an arbitrary number, say n, about which you make just one assump-
tion: that Pn holds (the induction hypothesis). Using this assumption,
you try to deduce that P(n + 1). When this is achieved, the induction
step is complete.
In using the induction hypothesis Pn to deduce P(n + 1), you are
merely considering an arbitrary hypothetical n for which Pn holds,
without however committing yourself to the assumption that such a
number exists; in other words, you are adopting Pn as a provisional
hypothesis. If you succeed in deducing P(n + 1) from this provisional
hypothesis, then you have established the conditional statement
Pn ⇒ P(n + 1); and as you have established this for an arbitrary number
n, you are entitled to infer that ∀n[Pn ⇒ P(n + 1)].
Note that if you have completed the induction step only (without the
basis – that is, you have not proved that P0) then you are not entitled
to conclude that Pn holds for all numbers n; indeed you are not even
entitled to conclude that there exist any numbers n for which Pn
holds. For example, let P be the property of being a number that is
greater than itself; so Pn means that n > n. Now, from the hypothesis
n > n it is easy to deduce n + 1 > n + 1 (for example, by adding 1 to
both sides of the hypothesis); so we have shown that ∀n[Pn ⇒
P(n + 1)]. But it doesn’t follow that there is any number greater than
itself.

2.2. Remark
The Weak Principle of Induction was first invoked in 1653 by Pascal in
the proof of one of the results (Corollary 12) in his Traité du triangle
arithmétique (published in 1665). Pascal does not give an explicit
formulation of the principle in general, for arbitrary P; but from his
presentation of the method of proof it is clear that the general principle
is being invoked. We shall not reproduce Pascal’s proof here. Instead,
we shall illustrate the use of weak induction in proving a simpler result.

2.3. Example
We shall prove that, for all n,

(*) 0 + 1 + 2 + ⋯ + n = n(n + 1)/2.

PROOF
Define the property P by stipulating that Pn iff (*) holds for n. We
show by weak induction that ∀nPn.

Basis. For n = 0 the sum on the left-hand side reduces to 0, and the
value of the right-hand side is 0. Thus P0.

Induction step. Let n be any number such that Pn; thus our induction
hypothesis is that (*) holds for this n. Then

0 + 1 + 2 + ⋯ + n + (n + 1) = n(n + 1)/2 + (n + 1)   by ind. hyp.
                            = (n + 1)(n/2 + 1)
                            = (n + 1)(n + 2)/2.

(The last two steps consist of simple algebraic manipulation.) Thus
from the induction hypothesis we have deduced that

0 + 1 + 2 + ⋯ + (n + 1) = (n + 1)(n + 2)/2.

This equation says that P(n + 1) – it is the same as (*), but with n + 1
in place of n. So we have shown that Pn ⇒ P(n + 1). ∎
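
As a purely numerical sanity check (not a substitute for the inductive proof), the identity (*) and the algebra of the induction step can be tested for small n. The following Python sketch is an illustration added here; the names `triangular_sum` and `formula` and the bound 200 are arbitrary choices, not part of the text.

```python
def triangular_sum(n: int) -> int:
    """Return 0 + 1 + 2 + ... + n by direct summation."""
    return sum(range(n + 1))


def formula(n: int) -> int:
    """Right-hand side of (*): n(n + 1)/2, computed exactly with integers."""
    return n * (n + 1) // 2


# Basis: both sides of (*) are 0 for n = 0.
assert triangular_sum(0) == formula(0) == 0

# Spot-check (*) and the algebra of the induction step for n = 0, ..., 199.
for n in range(200):
    assert triangular_sum(n) == formula(n)
    # Adding the next term n + 1 to the formula for n gives the formula for n + 1.
    assert formula(n) + (n + 1) == formula(n + 1)
```

A finite check of this kind can only fail to find a counterexample; it is the induction that covers all numbers at once.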

§3. Strong induction

The so-called ‘Strong’ Principle of Induction can be stated schematically
as follows:

    ∀n[∀m < n Pm ⇒ Pn]
    ────────────────────    (3.1)
           ∀nPn

Here, as before, P is any property of numbers. We have written
‘∀m < n Pm’ as short for ‘all numbers m smaller than n have the
property P’.
Thus, to prove that all numbers have a given property P, it is
enough to prove that ∀n[∀m < n Pm ⇒ Pn]. To do this, you have to
show that if n is any number such that ∀m < n Pm holds, then Pn
holds as well; in other words, you have to deduce Pn from the
assumption that ∀m < n Pm. This assumption is called the induction
hypothesis.
Note that a proof by strong induction does not have a separate
‘basis’ section.
As in the case of weak induction, here too the induction hypothesis
∀m < n Pm is adopted provisionally, without presupposing it to be
actually true.
However, unlike the case of weak induction, here there is one
particular value of n for which the hypothesis ∀m < n Pm is in fact
always automatically true. To see this, observe that there does not
exist any m such that m < 0; this follows at once from Facts 1.2 and
1.4. Therefore any statement of the form ‘for all m < 0, ...’ (that is,
‘∀m < 0 ...’) is considered by convention to be vacuously true. In
particular, ∀m < 0 Pm is always true.
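
The convention of vacuous truth has a direct counterpart in programming, which may help fix the idea. In the Python sketch below (an added illustration; the helper name `holds_below` is hypothetical), the built-in `all` returns `True` when there is nothing to check, so the hypothesis ‘for all m < 0, Pm’ holds even for a property that no number has.

```python
def holds_below(P, n: int) -> bool:
    """Return True iff P(m) holds for every number m < n."""
    return all(P(m) for m in range(n))


def never(m: int) -> bool:
    """A property that no number has."""
    return False


# There is no m < 0, so the hypothesis is vacuously true for n = 0 ...
assert holds_below(never, 0) is True
# ... but it fails as soon as there is some m to test.
assert holds_below(never, 1) is False
```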

3.2. Theorem
The Strong Principle of Induction follows from the Weak Principle of
Induction.
PROOF
Assume that P is a property of numbers such that ∀n[∀m < n Pm ⇒
Pn] holds. We shall show, using weak induction, that ∀nPn holds as
well. To this end, we define a new property Q by stipulating that, for
any number n,

(*) Qn ⇔_df ∀m < n Pm.

(The subscript ‘df’ is short for ‘definition’.) Note that our assumption
regarding P can now be rewritten as

(**) ∀n[Qn ⇒ Pn].

We shall apply weak induction to Q, to prove that ∀nQn holds.
First, observe that by (*) Q0 is the same as ∀m < 0 Pm, which – as
we have noted – is vacuously true.
Next, let n be a number and suppose (as induction hypothesis) that
Qn holds. From this hypothesis we shall deduce that Q(n + 1) holds as
well.
Using our induction hypothesis we infer from (**) that Pn holds. We
therefore have both Qn and Pn. But by (*) Qn means ∀m < n Pm.
Therefore what we have shown is that

(***) Pm holds for all m ≤ n.

From Facts 1.2 and 1.3 it is easy to see that m ≤ n is equivalent to
m < n + 1, hence (***) can be rephrased as

Pm holds for all m < n + 1,

which, by the definition (*) of Q, means that Q(n + 1) holds. This
completes the proof of ∀nQn by weak induction.
From ∀nQn, which we have just proved, together with (**) it
follows at once that Pn holds for all n. ∎

§4. The Least Number Principle

Let M be any class of numbers; that is, M ⊆ N (M is a subclass of N).
By a least member of M we mean a number a ∈ M such that a ≤ m for
all m ∈ M.
Using Fact 1.2, it is easy to see that M cannot have more than one
least member; so if M has a least member we can refer to the latter as
the least member of M.

The Least Number Principle (LNP) states:

If M ⊆ N and M is non-empty then M has a least member.

4.1. Theorem
The LNP follows from the Strong Principle of Induction.

PROOF

Let M ⊆ N and suppose that M does not have a least member. We
must show M is empty. To this end, let P be the property of not
belonging to M. Thus, for any n,

Pn ⇔_df n ∉ M.

To show that M is empty is tantamount to showing that ∀nPn holds.
We shall do so by applying strong induction to P.
So let n be any number, and assume (as induction hypothesis) that
∀m < n Pm holds. By the definition of P, our induction hypothesis
means that for all m < n we have m ∉ M. This is equivalent to saying
that m < n is not the case for any m ∈ M. But by Fact 1.2 this means
that n ≤ m for all m ∈ M. Therefore n cannot belong to M, otherwise
it would be the least member of M, contrary to our assumption that M
has no such member. Hence Pn holds, and our induction is complete. ∎

We shall now complete the cycle by proving:

4.2. Theorem
The Weak Principle of Induction follows from the LNP.

PROOF
Let P be a property of numbers such that P0 and ∀n[Pn ⇒ P(n + 1)]
hold. We must prove that ∀nPn holds. This amounts to showing that
the class

M =_df {n : Pn does not hold}

is empty. By the LNP, it is enough to show that M has no least
member.
Suppose that M does have a least member, m. Since P0 holds, 0 is
not in M; hence m ≠ 0. Therefore by Fact 1.5 there is a number n
such that m = n + 1.
From Fact 1.3 it follows at once that n < m. If n were in M, then we
would have m ≤ n, because m is the least member of M; but m ≤ n is
excluded by Fact 1.2, since we already have n < m. Therefore n
cannot be in M, which means that Pn must hold.
From our assumption that ∀n[Pn ⇒ P(n + 1)] it now follows that
P(n + 1) holds; in other words, Pm holds. But then m cannot be a
member of M, let alone the least member. Thus our assumption that
M has a least member leads to contradiction. ∎

We have thus shown that the Weak Principle of Induction, the Strong
Principle of Induction and the LNP are equivalent to one another.

4.3. Remark
While there is no evidence that the ancient Greek mathematicians
knew the Principles of Weak and Strong Induction, they did use
mathematical induction in the form of the LNP. We shall quote here
from a proof of Proposition 31 in Euclid’s Elements, Book VII.
First we need a few definitions. By arithmos (plural: arithmoi) the
Greeks meant what we call a natural number greater than 1. An arithmos
b is said to measure an arithmos a if b < a and b goes into a (in
modern terminology: b is a proper divisor of a). An arithmos a is said
to be composite if there is an arithmos that measures it; otherwise, a is
said to be prime.
In Proposition 31 of Book VII, Euclid claims that every composite
arithmos is measured by some prime arithmos. He writes:
‘Let a be a composite arithmos. I say that it is measured by some prime
arithmos. For since a is composite, it will be measured by an arithmos,
and let b be the least of the arithmoi measuring it.’
Here the LNP is clearly invoked. The proof is now easily concluded: b
must be prime; otherwise, it would be measured by some smaller
arithmos c, which must then also measure a — contrary to the choice of
b as the least of the arithmoi measuring a.
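
Euclid’s argument is effectively an algorithm: among the arithmoi measuring a, take the least. The Python sketch below is an added illustration that follows the definitions given above; the function names are hypothetical, and the loop merely spot-checks the conclusion for composite a below 500, which is no substitute for the proof.

```python
def measures(b: int, a: int) -> bool:
    """b measures a: b is an arithmos (b > 1), b < a, and b goes into a."""
    return 1 < b < a and a % b == 0


def is_composite(a: int) -> bool:
    """An arithmos a is composite if some arithmos measures it."""
    return any(measures(b, a) for b in range(2, a))


def least_measure(a: int) -> int:
    """Return the least arithmos measuring the composite arithmos a."""
    for b in range(2, a):
        if measures(b, a):
            return b
    raise ValueError("a is not composite, so no arithmos measures it")


# The least arithmos measuring a composite a is never itself composite.
for a in range(2, 500):
    if is_composite(a):
        b = least_measure(a)
        assert measures(b, a) and not is_composite(b)
```

The search in `least_measure` terminates with an answer precisely because a non-empty class of numbers has a least member, which is the LNP at work.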
Euclid also gives another proof of the same proposition, in which he
uses yet another form of the Principle of Induction: There does not
exist an infinite decreasing sequence of natural numbers.¹

¹ On these matters see David Fowler, ‘Could the Greeks have used Mathematical
Induction? Did they use it?’, Physis, vol. 31 (1994), pp. 252–265.
1
Sets and classes

§1. Introduction
1.1. Preview

Set theory occupies a fundamental position in the edifice of modern
mathematics. Its concepts and results are used nowadays in virtually all
standard mathematical discourse — not only in pure mathematics, but
also in applied mathematics and hence in all the mathematics-based
deductive sciences. In particular, set theory is used extensively in
technical discussions of logic and analytical philosophy.
The purpose of Chs. 1-6 is to present a minimal core of set theory,
adequate for the kind of application just mentioned. In particular, we
shall provide the set-theoretical vocabulary, notation and results
needed in later chapters, devoted to Symbolic Logic.
We shall not venture into the higher reaches of the theory, which are
of interest to specialist set-theorists. Nor shall we attempt a systematic
logical-axiomatic investigation of set theory itself.

1.2. Further reading

There are hundreds of books on set theory, many of them very good.
Among those pitched at a level similar to this course, there are two
classics:

Abraham A Fraenkel, Abstract set theory,
Paul R Halmos, Naive set theory.

Both contain more material than our course. Fraenkel’s book is
suitable for readers with relatively little previous mathematical knowledge.
If you are mathematically more experienced, you may find it too
slow or verbose. Halmos is then likely to be more suitable.
For a more advanced, logical-axiomatic study of set theory, the two
original masterpieces are:

Kurt Gödel, The consistency of the continuum hypothesis (1940),
Paul J Cohen, Set theory and the continuum hypothesis (1966).

An alternative exposition of Gödel’s results and some additional
related material is in Chapter 10 of B&M. An alternative exposition of
Cohen’s results and much additional related material is in John L Bell,
Boolean-valued models and independence proofs in set theory.

1.3. Intuitive explanation

Intuitively speaking, a set is a definite collection, a plurality of objects
of any kind, which is itself apprehended as a single object.
For example, think of a lot of sheep grazing in a field. They are a
collection of sheep, a plurality of individual objects. However, we may
(and often do) think of them – it – as a single object: a herd of sheep.¹
Note that in order to qualify as a set, the collection in question must
be definite. By this we mean that, if a is any object whatsoever, then a
either definitely belongs to the collection or definitely does not. For
this reason there is no such thing as the set of all blue cars, if ‘blue’ and
‘car’ are understood in their everyday fuzzy sense: my car is sort of
bluish, and a friend of mine has a vehicle that is half-way between a
car and a sad joke. (Most collections and concepts that are used in
everyday thinking and discourse are fuzzy; some philosophers have
therefore attempted to construct a theory of so-called fuzzy sets —
which are clearly not sets at all in the present sense of the term. This
difficult subject lies outside the scope of our course.)
From now on, whenever we speak of a collection (or plurality) we
shall tacitly take it to be definite, in the sense just explained. We shall
also use the word class as synonymous with collection.
The objects belonging to a class may be of any kind whatsoever —
physical or mental, real or ideal. In fact, being an object (in the sense
in which we shall use this term) is tantamount to being capable of
belonging to a collection.
In particular, since a set is a class regarded as a single object, it can
itself belong to a class. So we can have a class some, or even all, of
whose members are sets. If such a class, in turn, is regarded as a single
object, we get a set having sets as (some of its) members. Thus, there
are sets of sets (sets all of whose members are sets), sets of sets of sets,
and so on.

¹ Cf. Eric Partridge, Usage and abusage: ‘COLLECTIVE NOUNS; ... Such collective nouns
as can be used either in the singular or in the plural (family, clergy, committee,
Parliament), are singular when unity (a unit) is intended; plural, when the idea of
plurality is predominant.’
The objects dealt with by set theory are therefore of two sorts: sets,
and objects that are not sets. An object of the latter sort is called an
individual; the German term Urelement (plural: Urelemente) is often
used as well for such an object. Somewhat surprisingly, it has turned
out that, as far as applications to pure mathematics are concerned,
individuals are in principle dispensable, so that set theory can confine
itself to sets only. We shall not make any ruling on this matter. Unless
otherwise stated, what we shall say will apply regardless of whether, or
how many, individuals are present.

1.4. Definition
We write ‘a ∈ A’ as short for ‘[the object] a belongs to [the class] A’.
The same proposition is also expressed by saying that a is a member of
A, or an element of A, or that A contains a. We write ‘a ∉ A’ to
negate the proposition that a ∈ A.

A class is specified by means of a definite property, say P, for which it
is stipulated that the condition Px is necessary and sufficient for any
object x’s membership in the class.

1.5. Definition
If P is any definite property, such that the condition Px is meaningful
for an arbitrary object x, then the extension of P, denoted by

{x : Px},

is the class of all objects x such that Px. Thus a ∈ {x : Px} iff Pa.

Classes having exactly the same members are regarded as identical. Let
us state this more formally:

1.6. Principle of Extensionality (PX)

If A and B are any classes such that, for every object x,

x ∈ A ⇔ x ∈ B,

then A = B.

For example, the two classes

{x : x is an integer such that x² = x},
{y : y is an integer such that −1 < y < 2}

are equal: although the two defining conditions differ in meaning, they
are satisfied by the same objects – the integers 0 and 1.
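
Python’s built-in sets happen to obey the same principle: two sets compare equal exactly when they have the same members, however they were specified. The sketch below is an added illustration; since a program cannot enumerate all integers, it assumes a finite search range (`universe`, an arbitrary choice) that contains every integer satisfying either condition.

```python
# Finite stand-in for the integers; both defining conditions can only be
# satisfied by 0 and 1, so the bounds -50..50 lose nothing here.
universe = range(-50, 51)

A = {x for x in universe if x * x == x}   # {x : x is an integer such that x² = x}
B = {y for y in universe if -1 < y < 2}   # {y : y is an integer such that −1 < y < 2}

# Different defining conditions, same members, hence equal sets.
assert A == B == {0, 1}
```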

1.7. Remark
Set theory (along with other parts of present-day mathematics) is
dominated by a structuralist ideology, which entails an extensionalist
view of properties. This means that properties having equal extensions
are considered to be equal; thus a property and its extension uniquely
determine each other.

§2. The antinomies; limitation of size

Since ancient times, mathematicians have dealt with infinite pluralities
as a matter of course — an obvious example is the class of positive
integers. However, until well into the 19th century there was great
reluctance to regard such pluralities as single objects, as sets in the
sense explained in 1.3. The infinitude of a class meant that more and
more of its members could be constructed or conceived of, without
limit. But to apprehend such a plurality as a single object seems to
imply that all its members have ‘already’ been constructed or conceived
of, or at least that they are somehow all ‘out there’. This idea of
a completed or actual — rather than potential — infinity was (rightly!)
regarded with utmost suspicion.
However, the needs of mathematics as it developed in the 19th
century drove Georg Cantor (1845-1918) to create his Mengenlehre,
set theory, which admits infinite classes as objects. Despite early
hostility, set theory was soon accepted by the majority of mathemati-
cians as a powerful and indispensable tool; indeed, many regard it as a
framework and foundation for the whole of mathematics.
The success of set theory first lured its adherents into assuming that
every class can be regarded as a set. This assumption, known as the
Comprehension Principle, is however untenable: it leads to certain
logical contradictions or antinomies. The first such antinomy to be
discovered is called the Burali-Forti Paradox, after the person who
first published it, in 1897; but Cantor himself had been aware of it at
least two years earlier. The antinomy results directly from the assump-
tion that the class W of all ordinals is a set. (The theory of ordinals is
an important but quite technical part of set theory. In Ch. 4, when we
study the ordinals, we shall prove that W cannot be a set.) Similar
antinomies were later discovered by Cantor himself and by others.
Cantor was not too disturbed by these discoveries. He noticed that
the antinomies arose from applying the Comprehension Principle to
classes that were not just infinite but extremely vast. (An early result
of his set theory was that not all infinite classes have the same ‘size’.)
He concluded that some classes are not merely infinite but absolutely
infinite, hence simply too large to be comprehended as a single object.
Set theory would be on safe ground if the Comprehension Principle
were restricted to classes of moderate size.¹ However, he did not
specify precisely how to draw the line between moderately large
infinite classes, which can be regarded as sets with impunity, and vast
ones, which cannot be so regarded.
Matters came to a head in 1903, when Bertrand Russell published a
new antinomy, Russell’s Paradox, which he had discovered two years
earlier. Whereas previous antinomies arose in rather technical reaches
of set theory and therefore required lengthy expositions, Russell’s
Paradox checkmated the Comprehension Principle in two simple
moves, as follows. Let

S =_df {x : x is a set such that x ∉ x}.

Assuming that S is a set, it follows that S ∈ S iff S satisfies the defining
condition of S – that is, iff S ∉ S. This is absurd.
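
The core of the argument is purely logical: whatever ‘belongs to’ is taken to mean, no object s can satisfy ‘x belongs to s iff x does not belong to x’ for every x, since taking x = s gives a contradiction. The Python sketch below is an added illustration of this point under simplifying assumptions: it ignores the distinction between sets and individuals, and it only enumerates the binary relations on universes of one, two and three objects. All names in it are hypothetical.

```python
from itertools import product


def has_russell_element(universe, belongs) -> bool:
    """Is there an s such that, for all x, x belongs to s iff x does not belong to x?"""
    return any(
        all(belongs[(x, s)] == (not belongs[(x, x)]) for x in universe)
        for s in universe
    )


def count_relations_with_russell_element(size: int) -> int:
    """Count the binary relations on {0, ..., size - 1} that admit such an s."""
    universe = range(size)
    pairs = [(x, y) for x in universe for y in universe]
    count = 0
    # Enumerate every possible 'belongs to' relation as a truth assignment to pairs.
    for values in product([False, True], repeat=len(pairs)):
        belongs = dict(zip(pairs, values))
        if has_russell_element(universe, belongs):
            count += 1
    return count


# No relation on a small universe admits a 'Russell element'.
for size in (1, 2, 3):
    assert count_relations_with_russell_element(size) == 0
```

The exhaustive search adds nothing to the two-line argument; it only makes visible that the contradiction does not depend on any particular notion of membership.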

The fact that an antinomy follows so easily from apparently sound
assumptions plunged set theory and logic (which cannot be sharply
demarcated from set theory) into a crisis.
In 1908, two solutions were proposed to this crisis. Both amounted
to imposing restrictions on the Comprehension Principle — but in two
very different ways. The first, proposed by Russell himself and embo-
died in his type theory, refused to accept {x : Px} as an object if the
condition Px is impredicative (that is, refers to a totality to which the
object, if it did exist, would belong).² Russell’s type theory, elaborated

¹ See Michael Hallett, Cantorian set theory and limitation of size.
² Russell’s paper, ‘Mathematical logic as based on the theory of types’, is reprinted in
van Heijenoort, From Frege to Gödel.
14 1. Sets and classes

by Whitehead and him in their three-volume Principia Mathematica
(1910, 1912, 1913) as a total system for logic and mathematics, turned
out to be quite complicated and cumbersome; and, at least in part
because of this, has won very few adherents.
The other solution, proposed by Ernst Zermelo, embodied an idea
similar to that entertained by Cantor: limitation of size.¹ Zermelo
proceeded to develop set theory axiomatically: he laid down postul-
ates, or [extralogical] axioms, from which the theorems of set theory
were to be deduced by elementary logical means. Besides an Axiom of
Extensionality (for sets), Zermelo’s axioms include certain particular
cases of the Comprehension Principle, which are regarded as safe
because — as far as one can tell — they do not allow the formation of
over-large sets and do not give rise to antinomies. In addition, Zer-
melo postulated a special axiom, the Axiom of Choice, which is not a
restricted form of the Comprehension Principle, but is needed for
proving certain important results in set theory itself and in other
branches of mathematics.²
In 1921-2, Abraham Fraenkel, Thoralf Skolem and Nels Lennes
(independently of one another) proposed one further postulate, the
Axiom of Replacement, which is vital for the internal needs of set
theory rather than for applications to other branches of mathematics.
This postulate is another apparently safe special case of the
Comprehension Principle.³

The resulting theory, known as Zermelo–Fraenkel set theory (ZF),
has proved to be very convenient and has been adopted almost
universally by users of set theory.
While Zermelo’s axiomatic approach is, as far as we can tell,
sufficient for blocking the logical antinomies, such as the Burali-Forti
and Russell Paradoxes, it does not ward against another sort of
antinomy, which may be called linguistic or semantic.
Here is a modified version of a linguistic antinomy published in 1906
by Russell, who attributed it to G. G. Berry. Some English expressions
define natural numbers; for example, ‘zero’, ‘the square of eighty-
seven’, ‘the least prime number greater than eighty-seven million’.

¹ Russell too had briefly toyed with the same idea in 1905.
² A translation of Zermelo’s paper, ‘Investigations in the foundations of set theory I’, is
printed in van Heijenoort, From Frege to Gödel.
³ This postulate, as well as Zermelo’s Axiom of Separation and Axiom of Union Set, had
in fact been foreshadowed in 1899 by Cantor, in a letter to Dedekind, a translation of
which is printed in van Heijenoort, From Frege to Gödel.

Only finitely many numbers can be defined by English expressions that
use fewer than 87 letters, since clearly there are only finitely many such
expressions. Hence the class M of natural numbers not so definable
must be non-empty. By the Least Number Principle (see § 4 of Ch. 0),
M has a unique least member: the least natural number not definable
by an English expression using fewer than eighty-seven letters. But
observe: the italicized part of the previous sentence is an English
expression using just 86 letters, which (presumably) defines a number
that cannot be defined by an English expression using less than 87
letters!
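The letter count claimed in this argument is easy to check mechanically. The following minimal sketch (the variable names are ours, not the text's) counts only alphabetic characters, ignoring spaces and the hyphen in 'eighty-seven'.

```python
# The italicized expression from the antinomy, copied verbatim.
phrase = ("the least natural number not definable by an English "
          "expression using fewer than eighty-seven letters")

# Count letters only: spaces and the hyphen are not letters.
letter_count = sum(ch.isalpha() for ch in phrase)

print(letter_count)        # 86
print(letter_count < 87)   # True: the expression uses fewer than 87 letters
```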
On the face of it, this antinomy affects arithmetic rather than set
theory. However, as we shall see in §3 of Ch. 4 and §1 of Ch. 6, the
arithmetic of natural numbers can be simulated within set theory, so
that Berry’s antinomy threatens set theory as well.
We cannot go here into a detailed discussion of the linguistic
antinomies. Suffice it to say that the source of the trouble is that the
notion of definite property, and hence also that of class (as the
extension of such a property) has been left too loose and vague. Thus,
for example, the property of being definable by an English expression
using fewer than eighty-seven letters does not have a rigorously defined
meaning.
These antinomies can be blocked by laying down precise conditions
as to what may count as a definite property (or a class).¹ This may be
done by specifying a formal language with precise structure and rules,
and allowing as definite properties only such as can be expressed
formally in this language. For a formalized presentation of ZF see, for
example, Chapter 10 of B&M.
We shall present a fairly rigorous but unformalized version of ZF.
However, if desired it would be easy in principle (though tedious in
practice) to formalize our treatment.

§3. Zermelo’s axioms


Here we present (with minor modifications) Zermelo’s axioms except
for the Axiom of Choice, which we shall discuss in Ch. 5.
First, we shall assume that our universe of discourse — the class of all

¹ The first to formulate such precise conditions was Hermann Weyl in Das Kontinuum
(1918). A similar (and somewhat more formal) characterization was given independently
by Skolem in a 1922 paper whose translation, ‘Some remarks on axiomatized set
theory’, is printed in van Heijenoort, From Frege to Gödel.

objects with which set theory deals — is non-empty. We do not


announce this assumption officially as a special postulate, because it is
conventional to consider it as a logical presupposition.
The objects in the universe of discourse are of two distinct sorts: sets
and individuals. Classes are admitted as extensions of properties: if P
is a definite property of objects, then we admit the class A = {x : Px}.
Note that, by Def. 1.5, to say that a ∈ A is just another way of saying
that Pa (the object a has the property P).
In order to block the semantic antinomies we must however insist
that P be defined in purely set-theoretic terms, without using extran-
eous concepts.
The universe of discourse itself can be presented as a class according
to this format: it is {x : x = x}.
Although we refer to a class in the singular, this is merely a manner
of speaking and does not imply that the class is necessarily a single
object. From the axioms it will follow, however, that certain classes are
sets, and hence objects of set theory. Each set is identified with the
class of all its members.
The universe may also contain other objects, called individuals. An
individual is not a set and has no members. As we shall see shortly,
there is also a set that has no members — the empty set.
A class that is not a set is called a proper class; a proper class is not
an object, and therefore cannot be a member of any class.

As our first postulate we adopt the Principle of Extensionality 1.6. We
shall refer to it briefly as ‘PX’.
Zermelo postulated PX for sets only, as he did not consider classes
(except the universe of discourse) and used properties instead.

Before stating our next postulate, we introduce a useful piece of
notation.

3.1. Definition
If n is any natural number and a₁, a₂, ..., aₙ are any objects, not
necessarily distinct, we put
{a₁, a₂, ..., aₙ} =df {x : x = a₁ or x = a₂ or ... or x = aₙ}.
In particular, for n = 0 we get the empty class { } = {x : x ≠ x}, which
we denote by ‘∅’. (No object can differ from itself!)

3.2. Axiom of Pairing (A2)


For all objects a and b the class {a, b} is a set.

3.3. Remarks
(i) This set is called the pair of a and b. By PX we have
{a, b} = {b, a}.
(ii) For any object a we clearly have {a} = {a, a}, which is a set by
A2. This set is called the singleton of a.
(iii) From our assumption that there exists at least one object a, it
now follows that there exists at least one set, namely {a}. Note
however that we cannot prove the existence of an individual: our
postulates are neutral on this matter.
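Remarks (i) and (ii) can be tried out on a finite model. The sketch below is only an illustration under our own assumptions: sets are modelled by Python frozensets, individuals by strings, and the helper names pair and singleton are hypothetical.

```python
def pair(a, b):
    """The class {a, b}, modelled as a frozenset (order and repetition are ignored)."""
    return frozenset({a, b})

def singleton(a):
    """{a} obtained as the pair {a, a}, as in Rem. 3.3(ii)."""
    return pair(a, a)

a, b = "a", "b"  # two hypothetical individuals

print(pair(a, b) == pair(b, a))        # True: same members, hence equal by extensionality
print(singleton(a) == frozenset({a}))  # True
print(len(singleton(a)))               # 1
```

Equality of frozensets is decided by membership alone, so it plays the role of PX in this model.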

3.4. Definition
Let A and B be classes. If every member of B is also a member of A,
we say that B is a subclass of A (also, B is included in A, or A
includes B), briefly: B ⊆ A.
If B ⊆ A but A ≠ B, we say that B is a proper subclass of A (also,
B is properly included in A, or A properly includes B), briefly:
B ⊂ A.

3.5. Warnings
(i) Beware of confusing ‘contains’ and ‘includes’; the former refers
to the relation of membership ∈ while the latter refers to the
relation ⊆ just defined.
(ii) However, this terminological distinction is not observed by all
authors, so watch out for other usages.
(iii) Also, the notation introduced in Def. 3.4 is not universally
accepted. Some authors use ‘⊂’ instead of ‘⊆’ for not-necessarily-
proper inclusion; and ‘⊊’ instead of ‘⊂’ for proper inclusion.

The following postulate was one of Zermelo’s central ideas.

3.6. Axiom of Subsets (AS)


If B ⊆ A and A is a set then so is B.

3.7. Definition
If A is a class and P is a definite property such that the condition Px is
meaningful for any object x, we put

{x ∈ A : Px} =df {x : x ∈ A and Px}.

3.8. Remarks
(i) Zermelo’s formulation of AS, clearly equivalent to the one used
here, said (in effect) that if A is a set then the class {x ∈ A : Px}
is always a set. Since this class separates or singles out those
members of A that have the property P, he called AS the Axiom
of Separation (Aussonderung). This name is still in current use.
(ii) The intuitive idea behind AS is clear: if B ⊆ A and A is not too
vast, then B cannot be too vast either.
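Zermelo's separation format corresponds closely to a filtered comprehension over an existing collection. The following sketch assumes a finite set A modelled as a frozenset; the function name separate and the sample property are ours.

```python
def separate(A, P):
    """{x ∈ A : Px}: the members of the finite set A that have property P."""
    return frozenset(x for x in A if P(x))

A = frozenset(range(10))
evens = separate(A, lambda x: x % 2 == 0)

print(sorted(evens))   # [0, 2, 4, 6, 8]
print(evens <= A)      # True: the separated class is included in A
```

The point of the design is that separate never produces anything outside the set it is given, which is the sense in which AS is regarded as safe.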

3.9. Theorem
∅ is a set.

PROOF

Clearly ∅ is included in any class, and in particular in any set. By
Rem. 3.3(iii) there exists a set. Hence ∅ is included in some set, and
by AS is itself a set. ∎

3.10. Theorem
The class of all objects (the universe of discourse) and the class of all
sets are proper classes.

PROOF
We saw in §2 that Russell’s class,

{x : x is a set such that x ∉ x}

cannot be a set. Since Russell’s class is included in the class of all sets,
the latter cannot be a set by AS. The same applies to the universe of
discourse. ∎

3.11. Definition
If A is any class, we put

⋃A =df {x : x ∈ y for some y ∈ A}.

⋃A is called the union class of A.

3.12. Axiom of Union set (AU)


If A is a set then so is ⋃A.

3.13. Remarks
(i) The members of ⋃A are the members of the members of A.
(ii) Intuitively, the idea behind AU is that if A is a set then it does
not have ‘too many’ members; and each of these, being an object
(an individual or a set), in turn does not have ‘too many’
members. Therefore ⋃A, obtained by pooling together not-too-
many collections, none of which is too vast, cannot itself be too
vast.

3.14. Definition
For any classes A and B, we put
A ∪ B =df {x : x ∈ A or x ∈ B}.

A ∪ B is called the union (or join) of A and B.

3.15. Theorem
A ∪ B is a set iff both A and B are sets.

PROOF

If A and B are sets, then A ∪ B = ⋃{A, B}, which is a set by A2 and
AU. The converse follows easily from AS. ∎
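The union class and the identity used in this proof can be checked on finite examples. In the sketch below (our own illustration) sets are frozensets, an individual is modelled by a string and contributes no members, and the names big_union and union are hypothetical.

```python
def big_union(A):
    """The union class of A: members of members of A. Individuals contribute nothing."""
    return frozenset(x for y in A if isinstance(y, frozenset) for x in y)

def union(A, B):
    """A ∪ B computed as the union class of the pair {A, B}, as in the proof of Thm. 3.15."""
    return big_union(frozenset({A, B}))

A = frozenset({1, 2})
B = frozenset({2, 3})
family = frozenset({A, B, "an individual"})

print(sorted(big_union(family)))   # [1, 2, 3]
print(union(A, B) == A | B)        # True: agrees with Python's built-in union
```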

3.16. Theorem
If n is any natural number and a₁, a₂, ..., aₙ are any objects, the class
{a₁, a₂, ..., aₙ} is a set.

PROOF

By (weak) induction on n.

Basis. For n = 0 the assertion of our theorem is Thm. 3.9.

Induction step. By Def. 3.14,

{a₁, a₂, ..., aₙ, aₙ₊₁} = {a₁, a₂, ..., aₙ} ∪ {aₙ₊₁},

which is a set by the induction hypothesis, Rem. 3.3(ii) and Thm. 3.15.
∎

3.17. Definition
If A is any class, we put

PA =df {x : x is a set such that x ⊆ A}.


PA is called the power class of A.

3.18. Axiom of Power set (AP)


If A is a set then so is PA.

3.19. Remark
Intuitively, the idea behind AP is that although PA can be very large —
in fact, much larger than A — its size is nevertheless bounded provided
A itself is not too vast.
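For a finite set the bound alluded to here is explicit: a set with n members has exactly 2ⁿ subsets. The following sketch (our own, with the hypothetical name power_set) enumerates them using the standard itertools.combinations.

```python
from itertools import combinations

def power_set(A):
    """PA for a finite set A: every subset of A, each as a frozenset."""
    members = list(A)
    return frozenset(
        frozenset(subset)
        for size in range(len(members) + 1)
        for subset in combinations(members, size)
    )

A = frozenset({"a", "b", "c"})
PA = power_set(A)

print(len(PA))             # 8, that is 2 ** 3
print(frozenset() in PA)   # True: the empty set is included in A
print(A in PA)             # True: A is included in itself
```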

3.20. Problem
Prove that if A is a class of sets (that is, a class all of whose members
are sets) such that ⋃A is a set, then A is a set as well.

The last axiom we shall postulate here is

3.21. Axiom of Infinity (AI)


There exists a set Z such that ∅ ∈ Z and such that for every set x ∈ Z
also x ∪ {x} ∈ Z.

3.22. Remarks
(i) Without AI it is impossible to prove that there are infinite sets.
On the other hand, it is easy to see intuitively that any set Z
satisfying the conditions imposed by AI must be infinite. We shall
be able to prove this rigorously when we have a rigorous defini-
tion of infiniteness.
(ii) A2, AS, AU and AP are clearly particular cases of the Principle
of Comprehension: they say that certain classes are sets. Al-
though AI as it stands is not of this form, we shall see later that it
is equivalent to the proposition that a certain class, ω, is a set.
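The intuition behind Remark (i) can be seen by generating the first few members that AI forces into Z. The sketch below is an illustration only: sets are frozensets, and the names successor and first_stages are ours. Each application of the operation x ∪ {x} yields a new set with one more member, so the process never closes up after finitely many steps.

```python
def successor(x):
    """x ∪ {x}, the operation under which the set Z of AI is closed."""
    return x | frozenset({x})

def first_stages(n):
    """The first n sets obtained from the empty set by repeatedly applying successor."""
    stages, x = [], frozenset()
    for _ in range(n):
        stages.append(x)
        x = successor(x)
    return stages

stages = first_stages(5)
print([len(s) for s in stages])   # [0, 1, 2, 3, 4]
print(len(set(stages)))           # 5: the stages are pairwise distinct
```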

§4. Intersections and differences


The following definitions will be needed later on.

4.1. Definition
If A is any class,
⋂A =df {x : x ∈ y for every y ∈ A}.
⋂A is called the intersection class of A.

4.2. Definition
If A and B are classes,
A ∩ B =df {x : x ∈ A and x ∈ B}.

A ∩ B is called the intersection (or meet) of A and B.

4.3. Definition
If A is any class,
Aᶜ =df {x : x ∉ A}.

Aᶜ is called the complement of A.

4.4. Definition
If A and B are any classes,
A − B =df A ∩ Bᶜ.

A — B is called the difference between A and B.



4.5. Problem
(i) Prove that if A is a non-empty class then ⋂A is a set. What is
⋂∅?
(ii) Prove that if A or B is a set then so is A ∩ B.
(iii) Prove that A and Aᶜ cannot both be sets.
2
Relations and functions

§1. Ordered n-tuples, cartesian products and relations


1.1. Preview
By Def. 1.1.5, the extension of a property P of objects is the class {x :
Px}. Recall (Rem. 1.1.7) that from an extensionalist point of view a
property and its extension determine each other uniquely; so that—
wielding Occam’s razor, the structuralist mathematician’s favourite
instrument—one can identify the two and pretend that a property
simply is its extension. As set theory developed, it transpired that a
similar procedure could be applied to other fundamental mathematical
notions such as relation (among objects) and function: instead of
taking these as independent primitive notions, as had been done in the
early days of set theory, they could be reduced to classes and the
membership relation. In this and the next section we shall see how this
is done.

For any two objects a and b, not necessarily distinct, we need a unique
object (a, b) called the ordered pair of a and b [in this order]. It is not
really important how the ordered pair is defined, so long as the
following condition is satisfied:
(1.2) (a, b) = (c, d) ⇒ a = c and b = d.

1.3. Warning
The ordered pair (a,b) must not be confused with the set {a, b},
sometimes known as an unordered pair, whose members are just a and
b. For example, the sets {a, b} and {b, a} are always equal (see Rem.
1.3.3(i)), but by (1.2) the ordered pairs (a,b) and (b,a) are equal
only if a = b. However, when there is no risk of confusion we shall
often omit the adjective ‘ordered’ and say ‘pair’ when we mean ordered
pair.


As part of the reductionist programme aiming to reduce all mathematical
concepts to the notion of class and the membership relation, the
following rather artificial definition, first proposed by Kazimierz
Kuratowski in 1921, has been widely accepted.

1.4. Definition
For any objects a and b,

(a, b) =df {{a}, {a, b}}.

1.5. Problem
Prove that (1.2) follows from Def. 1.4.
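Before attempting the proof it may help to test condition (1.2) by brute force on a small stock of objects. The sketch below is no substitute for the proof; it assumes frozensets as sets, an arbitrary choice of four objects, and the hypothetical name kpair for Kuratowski's pair.

```python
from itertools import product

def kpair(a, b):
    """Kuratowski's ordered pair {{a}, {a, b}} of Def. 1.4."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# A small, arbitrary stock of objects; the empty set is included as an object.
objects = [0, 1, 2, frozenset()]

# Look for quadruples where the pairs coincide although the components differ.
violations = [
    (a, b, c, d)
    for a, b, c, d in product(objects, repeat=4)
    if kpair(a, b) == kpair(c, d) and (a, b) != (c, d)
]

print(violations)                   # []
print(kpair(0, 1) == kpair(1, 0))   # False, unlike the unordered pairs {0, 1} and {1, 0}
print(len(kpair(0, 0)))             # 1: the pair (a, a) collapses to {{a}}
```

The last line shows the case that a proof must treat separately: when a = b the pair has a single member.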

More generally, for any number n and any n objects a₁, a₂, ..., aₙ
(not necessarily distinct) we need a unique object (a₁, a₂, ..., aₙ)
called the ordered n-tuple of a₁, a₂, ..., aₙ [in this order]. Again, it is
not really important how ordered n-tuples are defined, so long as the
following condition, of which (1.2) is a special case, is satisfied:

(1.6) (a₁, a₂, ..., aₙ) = (b₁, b₂, ..., bₙ) ⇒ aᵢ = bᵢ for i = 1, 2, ..., n.

Again, we shall often say ‘n-tuple’ as short for ‘ordered n-tuple’.

The following definitions deliver the goods. Proceeding inductively, we


supplement Def. 1.4 by:

1.7. Definition
For any n ≥ 2 and objects a₁, a₂, ..., aₙ, aₙ₊₁,

(a₁, a₂, ..., aₙ, aₙ₊₁) =df ((a₁, a₂, ..., aₙ), aₙ₊₁).

1.8. Problem
Prove (1.6) for all n ≥ 2. (Use weak induction on n, taking n = 2 as
basis.)

There remain the cases n = 1 and n = 0. For n = 1, condition (1.6)
reduces to:

(a) = (b) ⇒ a = b.
The simplest way to satisfy this is to adopt the following.

1.9. Definition
(a) =df a.

As for n = 0, condition (1.6) reduces to the unconditional equality
() = (), which will hold trivially, no matter how we define (). Since
∅ is the simplest object, the simplest convention to adopt is

1.10. Definition
() =df ∅.

1.11. Remark
The equality which was decreed by Def. 1.7 for n ≥ 2, now holds also
for n = 1 by virtue of Def. 1.9. However, it does not hold for n = 0,
because by Def. 1.9 (a) = a, whereas by Def. 1.10 ((), a) = (∅, a).
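The inductive scheme of Defs. 1.4, 1.7, 1.9 and 1.10 translates directly into a recursive function. The sketch below is an illustration under our own assumptions (frozensets as sets; the names kpair, ntuple and EMPTY are hypothetical), and it confirms the point of the remark: the recursion works from n = 1 upwards but not from n = 0.

```python
def kpair(a, b):
    """Kuratowski's ordered pair {{a}, {a, b}} of Def. 1.4."""
    return frozenset({frozenset({a}), frozenset({a, b})})

EMPTY = frozenset()

def ntuple(*objs):
    """Ordered n-tuple following Defs. 1.4, 1.7, 1.9 and 1.10."""
    if len(objs) == 0:
        return EMPTY        # Def. 1.10: () is the empty set
    if len(objs) == 1:
        return objs[0]      # Def. 1.9: (a) is a itself
    # Defs. 1.4 and 1.7: pair the tuple of the earlier components with the last one.
    return kpair(ntuple(*objs[:-1]), objs[-1])

print(ntuple(1, 2, 3) == kpair(kpair(1, 2), 3))   # True
print(ntuple(1, 2) == kpair(ntuple(1), 2))        # True: the equality also holds for n = 1
print(ntuple("a") == kpair(ntuple(), "a"))        # False: it fails for n = 0
```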

We proceed to define the notions of cartesian product and cartesian


power.

1.12. Definition
(i) For any classes A₁, A₂, ..., Aₙ, not necessarily distinct, their
cartesian product [in this order] is the class
A₁ × A₂ × ... × Aₙ =df
{(x₁, x₂, ..., xₙ) : x₁ ∈ A₁, x₂ ∈ A₂, ..., xₙ ∈ Aₙ},
that is, the class of all n-tuples whose i-th component belongs to
Aᵢ for i = 1, 2, ..., n.
(ii) The n-th cartesian power of a class A is the cartesian product of
A with itself n times:
Aⁿ =df A × A × ... × A (n times),
that is, the class of all n-tuples of members of A. In particular,
A¹ = A and A⁰ = {()} = {∅}.

1.13. Remarks
(i) In Def. 1.12(i) we have used a convenient generalization of the
class notation introduced in Def. 1.1.5. Although it is almost
self-explanatory, let us spell it out.
Suppose F(x₁, x₂, ..., xₙ) is an object whenever x₁, x₂, ..., xₙ
are objects; and suppose P(x₁, x₂, ..., xₙ) is a condition
involving x₁, x₂, ..., xₙ. Then

{F(x₁, x₂, ..., xₙ) : P(x₁, x₂, ..., xₙ)}

is defined to be the class

{y : there exist x₁, x₂, ..., xₙ such that
F(x₁, x₂, ..., xₙ) = y and P(x₁, x₂, ..., xₙ)}.

(ii) It is easy to see that, for any n ≥ 1, A₁ × A₂ × ... × Aₙ = ∅ iff
Aᵢ = ∅ for at least one i.
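Finite cartesian products are available ready-made in Python as itertools.product, which makes Remark (ii) easy to observe. In the sketch below, which is only an illustration, Python tuples stand in for ordered n-tuples and the name cartesian is ours. One caveat: Python's 1-tuple (a,) differs from a, so this model does not reproduce the convention (a) = a.

```python
from itertools import product

def cartesian(*classes):
    """Cartesian product of finite sets; Python tuples stand in for n-tuples."""
    return frozenset(product(*classes))

A = frozenset({1, 2})
B = frozenset({"x", "y", "z"})
EMPTY = frozenset()

print(len(cartesian(A, B)))                        # 6
print(cartesian(A, EMPTY, B) == EMPTY)             # True: one empty factor empties the product
print(cartesian(A, A, A) == cartesian(*[A] * 3))   # True: the third cartesian power of A
print(cartesian() == frozenset({()}))              # True: the 0-th power has the empty tuple as sole member
```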

Intuitively, if n ≥ 1 and R is an n-ary relation on a class A, then for
any n-tuple of members of A it is meaningful to say that R holds or
does not hold for it. The class of all those n-tuples for which R does
hold is known as the extension of R. From an extensionalist point of
view, two relations are identical iff they have the same extension.
Thus, a relation and its extension uniquely determine each other. In
the spirit of the reductionist programme mentioned above, a relation is
simply identified with its extension. Hence the following

1.14. Definition
(i) For any n ≥ 1 and any class A, an n-ary relation on A is a class of
n-tuples of members of A, that is, a subclass of Aⁿ.
(ii) In particular, a property on A is a unary relation on A, that is, a
subclass of A.

1.15. Remarks
(i) If R is an n-ary relation we shall often write ‘R(a₁, a₂, ..., aₙ)’
as short for ‘(a₁, a₂, ..., aₙ) ∈ R’. In the special case where R is
a binary relation we shall often write ‘aRb’ for ‘(a, b) ∈ R’.
(ii) We could extend Def. 1.14(i) to the case n = 0, but the resulting
notion of 0-ary relation is found to be of little use.

§2. Functions; the axiom of replacement


Intuitively, if f is a function (or map, or mapping) then f assigns to
any object x at most one object fx as value. The class of all objects x
to which a value fx is assigned by f is called the domain [of definition]
of f and denoted by ‘dom f’.
The graph of f is then the class {(x, fx) : x ∈ dom f}. Note that the
graph of a function is a class of pairs. But not every class of pairs can
be the graph of a function: a class G of pairs is the graph of a function
iff for any object x there is at most one object y such that (x, y) ∈ G.
From an extensionalist point of view, two functions are identical if
they have the same graphs. In the spirit of reductionism, we can
therefore identify a function with its graph:

2.1. Definition
A function (a.k.a. map or mapping) is a class f of ordered pairs
satisfying the functionality condition: whenever both (x, y) ∈ f and
(x, z) ∈ f then y = z.

2.2. Definition
Let f be a function.

(i) The domain of f is the class
dom f =df {x : (x, y) ∈ f for some y}.
(ii) If x ∈ dom f, then the value of f at x (usually denoted by ‘fx’) is
the [necessarily unique] y such that (x, y) ∈ f.
(iii) The range of f is the class
ran f =df {fx : x ∈ dom f}.

2.3. Problem
Verify that from Defs. 2.1 and 2.2 it follows that a function f is equal
to its own graph; that is,

f = {(x, fx) : x ∈ dom f}.

Hence prove that functions f and g are equal iff dom f = dom g and
fx = gx for every x in their common domain.

2.4. Definition
Let f be a function.
(i) We say that f is a map from A to B (or that f maps A into B) if
dom f = A and ran f ⊆ B.
(ii) We say that f is a surjection from A to B (or that f maps A onto
B) if dom f = A and ran f = B.
(iii) We say that f is an injection (or a one-to-one map) if whenever x
and y are distinct members of dom f then fx and fy are also
distinct.
(iv) We say that f is a bijection from A to B if it is an injection as
well as a surjection from A to B (that is, a one-to-one map from
A onto B).
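Defs. 2.1, 2.2 and 2.4 can be tried out on finite graphs. The sketch below is our own illustration: a function is a frozenset of Python pairs (standing in for ordered pairs), and the helper names is_function, dom, ran, is_injection and is_bijection are hypothetical.

```python
def is_function(f):
    """Functionality condition of Def. 2.1 for a finite collection f of (x, y) pairs."""
    pairs = set(f)
    # Each first component must occur in exactly one pair.
    return len({x for x, _ in pairs}) == len(pairs)

def dom(f):
    return frozenset(x for x, _ in f)

def ran(f):
    return frozenset(y for _, y in f)

def is_injection(f):
    """Distinct arguments get distinct values (Def. 2.4(iii))."""
    return is_function(f) and len(ran(f)) == len(set(f))

def is_bijection(f, A, B):
    """An injection that is also a surjection from A to B (Def. 2.4(iv))."""
    return is_injection(f) and dom(f) == frozenset(A) and ran(f) == frozenset(B)

square = frozenset((x, x * x) for x in range(-2, 3))
double = frozenset((x, 2 * x) for x in range(3))
not_a_function = frozenset({(1, "a"), (1, "b")})

print(is_function(square), is_injection(square))    # True False
print(sorted(ran(square)))                          # [0, 1, 4]
print(is_bijection(double, {0, 1, 2}, {0, 2, 4}))   # True
print(is_function(not_a_function))                  # False
```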

We shall now enquire when a relation or a function is a set.

2.5. Lemma
Let A and B be non-empty classes. Then A × B is a set iff both A and B
are sets.

PROOF

Let a and b be any members of A and B respectively. Then by Defs.
1.4 and 1.12 we have

{a, b} ∈ {{a}, {a, b}} = (a, b) ∈ A × B.

Therefore by Def. 1.3.11

{a, b} ∈ ⋃(A × B).

Since both a and b belong to {a, b}, it follows, again by Def. 1.3.11,
that both are members of ⋃⋃(A × B). Thus we have shown that
A ⊆ ⋃⋃(A × B) and B ⊆ ⋃⋃(A × B), hence A ∪ B ⊆ ⋃⋃(A × B).
Also, it is easy to see that ⋃⋃(A × B) ⊆ A ∪ B. Therefore by PX
we have
⋃⋃(A × B) = A ∪ B.
If A × B is a set, it follows from AU and Thm. 1.3.15 that A and B
are sets as well.

Conversely, if A and B are sets, then by Thm. 1.3.15 and Prob.
1.3.20 it follows that A × B is a set as well. ∎
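The key identity of this proof, that taking the union class twice of A × B gives back A ∪ B, can be checked on a finite example, and the same check shows why the lemma requires A and B to be non-empty. The sketch is an illustration under our own assumptions (frozensets as sets, Kuratowski pairs; the names kpair, big_union and times are hypothetical).

```python
def kpair(a, b):
    """Kuratowski's ordered pair {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def big_union(A):
    """The union class of a finite set A all of whose members are sets."""
    return frozenset(x for y in A for x in y)

def times(A, B):
    """A × B as the set of Kuratowski pairs (a, b) with a in A and b in B."""
    return frozenset(kpair(a, b) for a in A for b in B)

A = frozenset({1, 2})
B = frozenset({2, 3, 4})

print(big_union(big_union(times(A, B))) == A | B)         # True
print(big_union(big_union(times(A, frozenset()))) == A)   # False: A × ∅ is empty
```

In the second case the product is the empty set, so nothing about A can be recovered from it.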

2.6. Theorem
Let n ≥ 1, and let A₁, A₂, ..., Aₙ be non-empty classes. Then
A₁ × A₂ × ... × Aₙ is a set iff Aᵢ is a set for each i = 1, 2, ..., n.

PROOF

By weak induction on n.

Basis. For n = 1 the assertion of our theorem is trivial, since in this
case A₁ × A₂ × ... × Aₙ is simply A₁ (see Defs. 1.12(i) and 1.9).

Induction step. It is easy to see that

A₁ × A₂ × ... × Aₙ × Aₙ₊₁ = (A₁ × A₂ × ... × Aₙ) × Aₙ₊₁

(use Defs. 1.12(i) and 1.7 and Rem. 1.11). Hence, by Lemma 2.5 and
the induction hypothesis, A₁ × A₂ × ... × Aₙ × Aₙ₊₁ is a set iff Aᵢ is
a set for each i = 1, 2, ..., n + 1. ∎

2.7. Corollary
If A is a set and R is an n-ary relation on A (for some n ≥ 1) then R is a
set as well.

PROOF

By Def. 1.14 we have R ⊆ Aⁿ. If A = ∅ then Aⁿ = ∅ by Def. 1.12(ii)
and Rem. 1.13(ii); hence R = ∅. If A is a non-empty set then Aⁿ is a
set by Thm. 2.6, hence R is a set by AS. ∎

2.8. Theorem
Let f be a function. Then f is a set iff both dom f and ran f are sets.

PROOF

It is easy to verify that

⋃⋃f = dom f ∪ ran f.

From this the required result follows, using the same argument as in
the proof of Lemma 2.5. ∎

At this point we introduce

2.9. Axiom of Replacement (AR)


If f is a function and dom f is a set then ran f is a set as well.

2.10. Remarks
(i) AR is clearly a particular case of the Comprehension Principle.
(ii) In view of Thm. 2.8, AR is equivalent to the proposition that if f
is a function such that dom f is a set then f itself is a set. The
intuitive idea behind AR is that f has exactly ‘as many’ members
as does dom f: for each a ∈ dom f, f contains the corresponding
pair (a, fa). Therefore if dom f is not too vast, neither is f itself.
(iii) In mathematical applications, a function f is almost always
defined as a mapping from A to B, where both A and B are
known in advance to be sets. It then follows from AS and Thm.
2.8 that ran f and f itself are sets. AR is not needed for this. But
as we shall see AR plays an important role within set theory
itself.

§3. Equivalence and order relations


3.1. Preview
In this section we discuss two kinds of relation that are of particular
importance, not only in set theory but in mathematics as a whole.
Throughout the section, A is an arbitrary class.

3.2. Definition
R is an equivalence relation on A if R is a binary relation on A such
that, for any members x, y and z of A, the following three conditions
are satisfied:

xRx (reflexivity),
if xRy then also yRx (symmetry),
if xRy and yRz then also xRz (transitivity).
3.3. Example
The paradigmatic example of an equivalence relation on A is the
binary relation {(x,x):x € A}, called the identity (or diagonal)
relation on A, and denoted by ‘id_A’. By the way, id_A is clearly a
function; indeed, it is a bijection from A to itself.

3.4. Definition
Let R be an equivalence relation on A. For each a € A we put

[a]_R =df {x : xRa}.

We call [a]_R the R-class of a, or the equivalence class of a modulo R.


Where there is no risk of confusion we omit the subscript ‘R’ and write
simply ‘[a]’.

3.5. Theorem
Let R be an equivalence relation on A and let a and b be any members
of A. Then [a] = [b] iff aRb.

PROOF

(⇒). By reflexivity, aRa, so a ∈ [a]. If [a] = [b] then by PX also
a ∈ [b], so that aRb.
(⇐). Suppose aRb. If x ∈ [a], then xRa, hence by transitivity xRb,
so that x ∈ [b]. Thus we have shown that [a] ⊆ [b].
Also, from aRb it follows by symmetry that bRa, so the argument
we have just used shows that [b] ⊆ [a]. Hence by PX [a] = [b]. ∎

3.6. Corollary
Let R be an equivalence relation on A and let a be any member of A.
Then a belongs to exactly one R-class, namely [a].

PROOF

We have seen that a ∈ [a]. If also a ∈ [b] then by Def. 3.4 aRb, so by
Thm. 3.5 it follows that [a] = [b]. ∎
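Thm. 3.5 and Cor. 3.6 can be observed on a concrete equivalence relation. In the sketch below, which is our own example rather than the text's, A is the set of numbers 0 to 9 and xRy means that x and y leave the same remainder on division by 3; the names related and eq_class are hypothetical.

```python
A = frozenset(range(10))

def related(x, y):
    """xRy iff x and y leave the same remainder on division by 3."""
    return (x - y) % 3 == 0

def eq_class(a):
    """[a] = {x : xRa}, with x ranging over A."""
    return frozenset(x for x in A if related(x, a))

classes = {eq_class(a) for a in A}

print(sorted(sorted(c) for c in classes))   # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
# Thm. 3.5: [a] = [b] iff aRb
print(all((eq_class(a) == eq_class(b)) == related(a, b) for a in A for b in A))   # True
# Cor. 3.6: each member of A belongs to exactly one R-class
print(all(sum(a in c for c in classes) == 1 for a in A))                          # True
```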

3.7. Definition
(i) S is a sharp partial order on A if S is a binary relation on A such
that, for any members x, y and z of A, the following two

conditions are satisfied:

if xSy, then ySx does not hold (anti-symmetry),


if xSy and ySz then also xSz (transitivity).

(ii) B is a blunt partial order on A if B is a binary relation on A such


that, for any members x, y and z of A, the following three
conditions are satisfied:

xBx (reflexivity),
if xBy and yBx then x = y (weak anti-symmetry),
if xBy and yBz then also xBz (transitivity).

3.8. Example
Let A be a class of sets (that is, all the members of A are sets rather
than individuals). Let S and B be the restrictions to A of ⊂ and ⊆
respectively; that is,

S =df {(x, y) ∈ A² : x ⊂ y} and B =df {(x, y) ∈ A² : x ⊆ y}.


Then it is easy to see that S and B are a sharp and a blunt partial
order, respectively, on A.

3.9. Problem
Let S and B be a sharp and a blunt partial order, respectively, on A.
Put

S♭ =df S ∪ id_A and B♯ =df B − id_A.

(For the definitions of id_A and − see Ex. 3.3 and Def. 1.4.4.)

(i) Prove that S♭ and B♯ are a blunt and a sharp partial order on A,
respectively.
(ii) Verify that S♭♯ = S and B♯♭ = B.
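The passage between sharp and blunt orders can be tested on the inclusion orders of Ex. 3.8. The sketch below is a finite illustration only: A is taken to be the four subsets of {1, 2}, relations are sets of Python pairs, and the names flat and sharp are ours. For frozensets, Python's operators < and <= mean proper inclusion and inclusion respectively.

```python
# A: all subsets of {1, 2}, a small class of sets.
A = frozenset({
    frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2}),
})

S = frozenset((x, y) for x in A for y in A if x < y)    # proper inclusion (sharp)
B = frozenset((x, y) for x in A for y in A if x <= y)   # inclusion (blunt)
id_A = frozenset((x, x) for x in A)

def flat(R):
    """Blunt version of a sharp order: add the identity relation on A."""
    return R | id_A

def sharp(R):
    """Sharp version of a blunt order: remove the identity relation on A."""
    return R - id_A

print(flat(S) == B, sharp(B) == S)                # True True
print(sharp(flat(S)) == S, flat(sharp(B)) == B)   # True True
```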

3.10. Remarks
(i) The qualifications ‘sharp’ and ‘blunt’ are often omitted and a
partial order of either kind is referred to simply as a ‘partial
order’. There is no real harm in this, for two reasons. First,
because it is usually clear from the context which kind of partial
order is meant. Second, as shown in Prob. 3.9, there is a natural

mutual association between a sharp partial order and a blunt
partial order, whereby the latter is obtained from the former by
applying ♭ and the former from the latter by applying ♯.
(ii) Sharp partial orders are often denoted by symbols such as ‘<’ or
‘≺’; the corresponding blunt partial orders are then denoted by
symbols such as ‘≤’ or ‘≼’ respectively.

3.11. Definition
(i) S is a sharp total order on A if S is a binary relation on A such
that, for any members x, y and z of A, the following two
conditions are satisfied:

exactly one of the following three disjuncts holds:
xSy or x = y or ySx (trichotomy),
whenever xSy and ySz then also xSz (transitivity).

(ii) B is a blunt total order on A if B is a binary relation on A such


that, for any members x, y and z of A, the following three
conditions are satisfied:

xBy or yBx (connectedness),


if xBy and yBx then x = y (weak anti-symmetry),
if xBy and yBz then also xBz (transitivity).

3.12. Problem
Let S and B be a sharp and a blunt total order, respectively, on A.
Prove that
(i) S is a sharp partial order, (ii) S♭ is a blunt total order,
(iii) B is a blunt partial order, (iv) B♯ is a sharp total order,
on A.

§4. Operations on functions


The following definitions will be needed later on.

4.1. Definition
If f and g are functions such that ran f ⊆ dom g, we put

g ∘ f =df {(x, g(fx)) : x ∈ dom f}.

g ∘ f (often denoted briefly ‘gf’) is called the composition of f and g.
(Note reading from right to left!)

4.2. Problem
Show: g ∘ f is a function, dom(g ∘ f) = dom f and ran(g ∘ f) ⊆ ran g.
Moreover, for any x in dom(g ∘ f), which is also dom f, check that
(g ∘ f)x = g(fx).

4.3. Definition
If f is an injective (that is, one-to-one) function we put
f⁻¹ =df {(y, x) : (x, y) ∈ f}.
f⁻¹ is called the inverse of f.

4.4. Problem
Verify that f⁻¹ itself is an injective function and, moreover,
dom(f⁻¹) = ran f, ran(f⁻¹) = dom f,
f⁻¹ ∘ f = id_(dom f), f ∘ f⁻¹ = id_(ran f).

(For the definition of id see Ex. 3.3.)
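Composition and inversion are easy to carry out on finite graphs, which gives a quick check of the identities in Prob. 4.4. The sketch below is an illustration under our own assumptions: functions are frozensets of Python pairs, compose presupposes that ran f is included in dom g, and the names compose, inverse and identity are hypothetical.

```python
def compose(g, f):
    """g ∘ f: the pairs (x, g(fx)) for x in dom f; assumes ran f is included in dom g."""
    g_value = dict(g)   # lookup table: argument -> value of g
    return frozenset((x, g_value[y]) for x, y in f)

def inverse(f):
    """The pairs of f reversed; a function only when f is injective."""
    return frozenset((y, x) for x, y in f)

def identity(A):
    """The identity relation on A."""
    return frozenset((x, x) for x in A)

f = frozenset({(1, "a"), (2, "b"), (3, "c")})      # injective
g = frozenset({("a", 10), ("b", 20), ("c", 10)})   # not injective

print(sorted(compose(g, f)))                                 # [(1, 10), (2, 20), (3, 10)]
print(compose(inverse(f), f) == identity({1, 2, 3}))         # True
print(compose(f, inverse(f)) == identity({"a", "b", "c"}))   # True
```

Note the right-to-left reading: compose(g, f) applies f first and then g.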

4.5. Problem
Prove that if f is a function from a proper class to a set, then f is not
injective.

4.6. Definition
If f is a function and C ⊆ dom f, we put
(i) f↾C =df {(x, fx) : x ∈ C};
(ii) f[C] =df {fx : x ∈ C}.
f↾C is called the restriction of f to C and f[C] is called the image of C
under f.

4.7. Problem
Verify that f↾C is a function, dom(f↾C) = C and ran(f↾C) =
f[C]. Moreover, (f↾C)x = fx for every x ∈ C.

4.8. Problem
Let F be a class whose members are functions. Show that ⋃F is a
function iff the following coherence condition is fulfilled: fx = gx for
all f and g in F and all x ∈ dom f ∩ dom g. Assuming this condition
holds, what are dom ⋃F and ran ⋃F?
3
Cardinals

§1. Equipollence and cardinality


We start by defining a binary relation ≈ on the class of all sets:

1.1. Definition
Let A and B be sets. We say that A and B are equipollent, briefly:
A ≈ B, if there exists a bijection from A to B (that is, a one-to-one
map from A onto B).

1.2. Theorem
Equipollence is an equivalence relation on the class of sets.

PROOF

For any set A, id_A is a bijection from A to itself; so ≈ is reflexive.
If f is a bijection from A to B then clearly f⁻¹ is a bijection from B
to A; so ≈ is symmetric.
Finally, if f is a bijection from A to B and g is a bijection from B to
C, then g ∘ f is a bijection from A to C; so ≈ is transitive. ∎

It is convenient to introduce the following

1.3. Definition (incomplete)


To each set A we assign an object |A|, called the cardinality of A, such
that for any two sets A and B, |A| = |B| iff A ≈ B.
An object of the form |A| for some set A is called a cardinal.

1.4. Remarks

(i) Def. 1.3 is incomplete, because we have not specified what the
object |A| is or how it is to be chosen.
Cantor regarded cardinals as special abstract entities of a new
kind. In effect, this amounted to introducing the notion of
cardinal as a separate primitive notion.
However, it would obviously be more convenient, and conform
to the reductionist programme, if cardinals were among the
hitherto posited objects of set theory. In this spirit, Frege
proposed in 1884 the elegant idea of defining |A| as [A]_≈, the
equivalence class of A modulo ≈ (see Def. 2.3.4). The condition
required by Def. 1.3, |A| = |B| ⇔ A ≈ B, would then follow at
once by Thm. 2.3.5.
This procedure, novel at the time, was to become standard
practice, used with respect to various equivalence relations that
arise in numerous mathematical situations.
Ironically, Frege’s procedure does not work at all well in the
present case, where the equivalence relation is ≈. Unaware that
the Comprehension Principle had to be restricted, he assumed as
a matter of course that [A]_≈ is always a set, hence an object.
Unfortunately, this is in general false. For example, if A is a
singleton, then [A]_≈ is the class of all singletons, and hence
⋃[A]_≈ is the class of all objects, the entire universe of discourse,
which is a proper class by Thm. 1.3.10. Hence by AU [A]_≈ must
be a proper class as well. This is very inconvenient, because we
would like to be able to form classes of cardinals, which is
impossible if cardinals are proper classes.
Fortunately there are other ways of defining cardinals, satisfy-
ing the requirement of Def. 1.3, while ensuring that the cardinals
are sets. Later on, in Ch. 6, we shall follow one such procedure.
In each ≈-class we shall be able to select a unique ‘distinguished’
member. Then, for any set A, we can take |A| to be the
distinguished member of [A]≈ rather than that class itself. Then
Thm. 2.3.5 ensures that the requirement of Def. 1.3 is satisfied.
(ii) For the time being, let us take it on trust that Def. 1.3 can be
completed in a satisfactory way. This is not asking too much,
since our reference to cardinals may be regarded as a mere
convenience: everything that we shall say in this chapter in terms
of cardinals can easily be rephrased (at the cost of some circum-
locution) in terms of sets and mappings between sets.

(iii) The cardinality |A| of a set A is a measure of its size. Cardinals
can be regarded intuitively as generalized natural numbers.
Indeed, if A is a finite set of the form {a₁, a₂, ..., aₙ}, where the
aᵢ are distinct, then we could take |A| to be n, the number of
members of A. Thus, each natural number may be regarded
intuitively as the cardinality of a finite set.
(iv) However, we shall not assume formally that the natural numbers
are in fact cardinals. Rather, in §3 we shall posit for each n a
corresponding cardinal 𝐧, without necessarily identifying the two.

§2. Ordering the cardinals; the Schröder–Bernstein Theorem


We define a binary relation ≤ on the class of cardinals, which, as we
shall soon see, is a [blunt] partial order on that class:

2.1. Definition
Let λ and μ be cardinals. Let A and B be sets such that |A| = λ and
|B| = μ. We say that λ is smaller-than-or-equal-to μ (briefly: λ ≤ μ) if
there is an injection from A to B.

2.2. Remark
This definition is in need of legitimation: we must make sure that the
criterion it provides for asserting that λ ≤ μ depends only on these
cardinals themselves rather than on the choice of particular sets A and
B such that |A| = λ and |B| = μ. This is done as follows. Let A, A′,
B, B′ be sets such that |A| = |A′| and |B| = |B′|. Given an injection
from A to B, it is easy to show (DIY!) that there is also an injection
from A′ to B′.

2.3. Theorem
Let λ and μ be cardinals and let B be a set such that |B| = μ. Then
λ ≤ μ iff B has a subset whose cardinality is λ.

PROOF

Let A be a set such that |A| = λ. By Def. 2.2.4, an injection from A to
B is the same thing as a bijection from A to a subset of B. ∎
2.4. Theorem
The relation ≤ on the class of cardinals is reflexive and transitive.

PROOF
DIY. ∎

To show that ≤ is a partial order, it remains to establish that it is
weakly anti-symmetric (see Def. 2.3.7). This fact was conjectured by
Cantor and proved independently by F. Bernstein and E. Schröder.
The proof we shall present here, due to Zermelo, uses a lemma that is
of some interest in its own right.

2.5. Definition
A map g from a class of sets to a class of sets is monotone if whenever
X and Y are sets in dom g such that X ⊆ Y then gX ⊆ gY.

2.6. Lemma
Let A be a set and let g be a monotone map from PA to itself. Then A
has a subset G such that gG = G.

PROOF

For any subset X of A, the value gX is also a subset of A. Let us say
X is a good set if it is a subset of A such that gX ⊆ X. (For example,
A itself is clearly good.)
Note that if X is good then gX is good as well. Indeed, if gX ⊆ X
then by the monotonicity of g we get g(gX) ⊆ gX, which means that
gX is good.
Let G be the intersection of all good subsets of A, that is:

G = ⋂{X ∈ PA : gX ⊆ X}.

(See Def. 1.4.1.) We claim that G itself is good. To show this, let X be
any good set. Then G ⊆ X because G is the intersection of all good
sets. Therefore by the monotonicity of g we have gG ⊆ gX. Also,
since X is good, we have gX ⊆ X; hence gG ⊆ X. Thus we see gG is
included in every good set. Hence gG must also be included in the
intersection of all good sets. But this intersection is G itself; this means
that gG ⊆ G, so G is good, as claimed.
It now follows that gG is good as well. But G, the intersection of all
good sets, is included in each of them and in particular in the good set
gG. So we have shown both gG ⊆ G and G ⊆ gG. Thus gG = G. ∎
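For a small finite A the construction in this proof can be carried out by brute force: list every subset, keep the good ones, and intersect them. The sketch below is illustrative only; the particular monotone map g and the helper names are assumptions chosen for the example.

```python
from itertools import combinations

def powerset(A):
    """All subsets of A, as frozensets."""
    items = list(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def intersection_of_good_sets(A, g):
    """The set G of the proof: the intersection of all X <= A with g(X) <= X."""
    G = frozenset(A)              # A itself is good, so start from A
    for X in powerset(A):
        if g(X) <= X:             # X is a good set
            G &= X
    return G

A = frozenset(range(6))

def g(X):
    """A hypothetical monotone map on PA: always contains 0, plus x + 2 for each x in X."""
    return frozenset({0}) | frozenset(x + 2 for x in X if x + 2 in A)

G = intersection_of_good_sets(A, g)
```

Here the good sets are exactly those containing 0, 2 and 4, so G is their common part and is fixed by g.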

2.7. Theorem (Schröder–Bernstein)

If λ and μ are cardinals such that λ ≤ μ and μ ≤ λ then λ = μ.

PROOF

Let A be a set such that |A| = μ. Since λ ≤ μ, according to Thm. 2.3 A
has a subset, say B, such that |B| = λ. Since also μ ≤ λ, according to
Def. 2.1 there is an injection, say f, from A to B.
The claim that λ = μ will be proved if we show that there is a
bijection from A to B.
Define a map g from PA into itself by putting, for any X ⊆ A,

gX = (A − B) ∪ f[X].

(For the definitions of A − B and f[X], cf. Def. 1.4.4 and Def. 2.4.6.)
It is easy to see that g is monotone. By Lemma 2.6, there exists some
G ⊆ A such that G = gG. Thus

G = (A − B) ∪ f[G].

Note that f[G] ⊆ B because f maps the whole of A into B. (See Fig.
1. The large rectangle represents A; like Gaul, it is divided into three
parts.)
Now, f↾G is an injection from G to B and a bijection from G to
f[G] (see Prob. 2.4.7). Let us put

h = (f↾G) ∪ id_{A−G}.

[Fig. 1]

Thus h is a map whose domain is the whole of A, such that

hx = fx if x ∈ G, and hx = x if x ∈ A − G.

It is obvious that h is a bijection from A to B. ∎
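The construction is only interesting for infinite sets, so a finite brute-force check is not available. The sketch below instead takes a hypothetical example, A = {0, 1, 2, ...}, B = {1, 2, 3, ...} and fx = x + 2, and inspects h on a finite window. It relies on a fact not proved in the text: for this g, the set G given by Lemma 2.6 consists of exactly those x that can be reached from A − B by applying f repeatedly.

```python
def f(x):
    """A hypothetical injection from A = {0, 1, 2, ...} into B = {1, 2, 3, ...}."""
    return x + 2

def in_G(x):
    """x is in G iff stepping back along f from x ends in A - B = {0}."""
    while x >= 2:       # x is in the range of f, so step back to its preimage
        x -= 2
    return x == 0       # x == 0 lies in A - B; x == 1 lies in B but outside f[A]

def h(x):
    """h = (f restricted to G) together with the identity on A - G."""
    return f(x) if in_G(x) else x

N = 1000
image = {h(x) for x in range(N)}
```

On this example G is the set of even numbers, so h shifts the even numbers by 2 and leaves the odd numbers fixed; the image misses only 0, which is exactly the member of A − B.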

2.8. Remarks
(i) In view of Thms. 2.4 and 2.7, ≤ is a [blunt] partial order on the
class of cardinals.
(ii) As usual in such cases, we denote by ‘<’ the sharp partial order
associated with ≤. (Thus < is ≤^♯; see Prob. 2.3.9.) If λ and μ are
cardinals such that λ < μ we say that λ is smaller than μ.
(iii) Later on we shall prove (using the Axiom of Choice) that ≤ is a
total order on the class of cardinals.

§3. Cardinals for natural numbers


3.1. Definition
If n is a natural number and a₁, a₂, ..., aₙ are distinct objects, we put

𝐧 =df |{a₁, a₂, ..., aₙ}|.

In particular, 𝟎 = |∅| and 𝟏 = |{a}|, where a is any object. We call 𝐧
the cardinal for (or corresponding to) n.

3.2. Remarks
(i) To legitimize Def. 3.1 we must verify that if a₁, a₂, ..., aₙ are
distinct objects and b₁, b₂, ..., bₙ are likewise distinct objects
then

{a₁, a₂, ..., aₙ} ≈ {b₁, b₂, ..., bₙ}.

This is easy: {⟨a₁, b₁⟩, ⟨a₂, b₂⟩, ..., ⟨aₙ, bₙ⟩} is clearly a
bijection from {a₁, a₂, ..., aₙ} to {b₁, b₂, ..., bₙ}.
(ii) By Thm. 2.3, 𝟎 ≤ μ for every cardinal μ.

3.3. Problem
Define cₙ by induction on n as follows:
c₀ = ∅ and cₙ₊₁ = {cₙ} for each n.

Prove that, for each n, the objects c₀, c₁, ..., cₙ are distinct. (Use
induction on n.)
Thus for any natural number n there exist n distinct objects, and
hence the corresponding cardinal 𝐧 exists.
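The objects cₙ can be built concretely with nested frozensets, which gives a quick sanity check (not a proof) of their distinctness for small n. The function name c is simply borrowed from the problem statement.

```python
def c(n):
    """c_0 is the empty set and c_(n+1) = {c_n}, built with frozensets."""
    s = frozenset()
    for _ in range(n):
        s = frozenset({s})
    return s

objects = [c(k) for k in range(6)]
```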

3.4. Theorem
Let a₁, a₂, ..., aₙ be any objects. Then there does not exist an injection
from the set {a₁, a₂, ..., aₙ} to any proper subset of itself.

PROOF

By induction on n. For n = 0 our theorem is trivial, since ∅ has no
proper subset.
For the induction step, consider a set A = {a₁, a₂, ..., aₙ, aₙ₊₁}.
We may assume that the objects a₁, a₂, ..., aₙ, aₙ₊₁ are all distinct;
otherwise, by eliminating one duplication we can write A in the form
‘{b₁, b₂, ..., bₙ}’ and the required result follows at once by the
induction hypothesis.
Suppose f is an injection from A to some B ⊆ A. If B ⊂ A then at
least one member of A must be outside B; and (by relabelling the a’s
if necessary) we may assume that aₙ₊₁ ∉ B.
Since f aₙ₊₁ must be in B, it cannot be aₙ₊₁ itself; and (again, by
relabelling if necessary) we may assume that f aₙ₊₁ = aₙ. Therefore
aₙ ∈ B. Also, since f is injective, aₙ₊₁ is the only x ∈ A such that
fx = aₙ.
It would then follow that f↾{a₁, a₂, ..., aₙ} is an injection from the
set {a₁, a₂, ..., aₙ} to its proper subset B − {aₙ}, contrary to the
induction hypothesis. Thus B cannot be a proper subset of A. ∎
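For very small n the theorem can also be confirmed by exhaustive search: enumerate every map from the set to itself and look for an injective one whose range is not the whole set. This is only an empirical check for n up to 4, with helper names invented for the illustration.

```python
from itertools import product

def injections_into_proper_subset(A):
    """All injective maps from A into A whose range is a proper subset of A."""
    A = list(A)
    found = []
    for values in product(A, repeat=len(A)):
        injective = len(set(values)) == len(A)
        if injective and set(values) != set(A):
            found.append(dict(zip(A, values)))
    return found

results = [injections_into_proper_subset(range(n)) for n in range(5)]
```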

3.5. Theorem
For any natural numbers n and m:

(i) if m ≤ n then 𝐦 ≤ 𝐧, (ii) if m ≠ n then 𝐦 ≠ 𝐧.

(WARNING. The two ‘≤’ here mean different things: the first denotes
the usual order among natural numbers, while the second denotes the
partial order on the cardinals.)

PROOF

(i) Assume m ≤ n. Take n distinct objects a₁, a₂, ..., aₙ (which
exist by Prob. 3.3). Since {a₁, a₂, ..., aₘ} is clearly a subset of
{a₁, a₂, ..., aₙ}, we have 𝐦 ≤ 𝐧 by Thm. 2.3.
(ii) Let m ≠ n. Without loss of generality we may assume m < n.
Take n distinct objects a₁, a₂, ..., aₙ. By Thm. 3.4 there is no
bijection from {a₁, a₂, ..., aₙ} to its proper subset {a₁, a₂, ...,
aₘ}. Therefore 𝐦 ≠ 𝐧. ∎

3.6. Remark
A subtle matter: we have not shown that being a natural number is a
notion of set theory. Rather, we have taken this notion to be under-
stood in advance, prior to the development of set theory. Therefore
Def. 3.1 cannot be regarded as a single definition within this theory.
Rather, it is a definition scheme, a sequence of definitions whereby
each of the cardinals 0, 1, 2, 3, etc., in turn may be defined separately.
Similar caveats apply to the whole of this section as well as to
definitions like 1.3.1 and 2.1.7 and theorems like 1.3.16.

§4. Addition
In this section we shall see how cardinals may be added. But first we
introduce a useful bit of terminology.

4.1. Definition
If A ∩ B = ∅, we say that A and B are disjoint.

4.2. Lemma
For any sets A and B, there are disjoint sets A′ and B′, such that
|A| = |A′| and |B| = |B′|.
PROOF

Take any two distinct objects a and b (for example, ∅ and {∅}; see
Prob. 3.3). Then let

A′ = {a} × A = {⟨a, x⟩ : x ∈ A},  B′ = {b} × B = {⟨b, x⟩ : x ∈ B}.

Using (2.1.2) it is easy to see that A′ ∩ B′ = ∅. Also, a bijection f
from A′ to A is obtained by putting f⟨a, x⟩ = x for every x ∈ A; so
|A| = |A′|. Similarly, |B| = |B′|. ∎

4.3. Lemma
Let A, B, A′, B′ be sets such that A ∩ B = A′ ∩ B′ = ∅, |A| = |A′|
and |B| = |B′|. Then |A ∪ B| = |A′ ∪ B′|.

PROOF

Let f and g be bijections from A to A′ and from B to B′ respectively.
Then it is clear that f ∪ g is a bijection from A ∪ B to A′ ∪ B′. ∎

4.4. Definition
For any cardinals λ and μ, we define the sum of λ and μ:

λ + μ =df |A ∪ B|,

where A and B are disjoint sets such that |A| = λ and |B| = μ.

4.5. Remarks
(i) Def. 4.4 is legitimized by Lemma 4.3.
(ii) In the proof of Thm. 2.7 we made use of a special case of Lemma
4.3. We had there A = G ∪ (A − G) and B = f[G] ∪ (A − G),
where the unions in both cases are between disjoint sets. Also,
|G| = |f[G]| because f is injective. Hence we concluded that
|A| = |B|.

4.6. Theorem
If k, m and n are natural numbers and k + m = n, then 𝐤 + 𝐦 = 𝐧.

PROOF

DIY. (WARNING. The two ‘+’ here mean different things. The first
denotes the operation of addition of numbers. The second denotes
addition of cardinals.) ∎
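For finite sets, Lemma 4.2, Def. 4.4 and Thm. 4.6 can be illustrated directly. In the sketch below ordered pairs are Python tuples and the tags 0 and 1 play the role of the distinct objects a and b; the function names are assumptions made for the example.

```python
def disjoint_copies(A, B, a=0, b=1):
    """A' = {a} x A and B' = {b} x B, as in the proof of Lemma 4.2."""
    return {(a, x) for x in A}, {(b, x) for x in B}

def cardinal_sum(A, B):
    """Size of the union of disjoint copies of A and B (Def. 4.4, finite case)."""
    A1, B1 = disjoint_copies(A, B)
    return len(A1 | B1)

A, B = {1, 2, 3}, {3, 4}      # not disjoint: they share the member 3
A1, B1 = disjoint_copies(A, B)
```

Taking the union of A and B themselves would give only 4 members, which is why the definition insists on disjoint sets.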

4.7. Problem
Verify, for all cardinals κ, λ and μ:
(i) κ + (λ + μ) = (κ + λ) + μ (associativity of addition),
(ii) λ + μ = μ + λ (commutativity of addition),
(iii) λ + 𝟎 = λ (neutrality of 𝟎 w.r.t. addition),
(iv) λ ≤ μ ⇒ κ + λ ≤ κ + μ (weak monotonicity of addition).

4.8. Warning
Although cardinal addition behaves in many ways like ordinary addi-
tion of natural numbers, not all rules of ordinary arithmetic apply
here. For example, as we shall see later, from κ + λ = κ it does not
always follow that λ = 𝟎. Hence the cancellation law does not apply in
general (from κ + λ = κ + μ it does not always follow that λ = μ); nor
is addition of cardinals strongly monotone (from λ < μ it does not
always follow that κ + λ < κ + μ).

Instead of adding just a pair of cardinals at a time, it is possible to


define the sum of many — even infinitely many — cardinals simultan-
eously. However, the legitimation of this definition requires the Axiom
of Choice (AC, see Ch. 5). We shall explain the definition here,
leaving its legitimation for later. First, we need some new notation:

4.9. Definition
If B is a function whose domain is a set X, we sometimes denote the
value of B at x ∈ X by ‘B_x’ rather than by ‘Bx’ and denote B itself by

{B_x | x ∈ X}.

In this connection we refer to X as the index set and to B as the family
of the B_x, indexed by X.

4.10. Remark
Many authors use the vertical stroke ‘|’ instead of the colon for class
abstraction (as in Def. 1.1.5) and so use some other notation for
indexed families.

4.11. Definition
Let {B_x | x ∈ X} be an indexed family of sets (that is, all the B_x are
sets). Let μ_x = |B_x| for each x ∈ X. We put:

Σ{μ_x | x ∈ X} =df |⋃{{x} × B_x : x ∈ X}|.

This is called the sum of the [family of the] μ_x, indexed by X.

4.12. Remarks
(i) Thus, to add up all the μ_x simultaneously, we form the cartesian
product {x} × B_x for each x ∈ X. (Note that these products are
pairwise disjoint: if x ≠ y then {x} × B_x and {y} × B_y are
disjoint, although B_x and B_y need not be disjoint and may even be
equal.) Then we take the union of all these products. Using AR
and AU it is easy to verify that this union is a set. The cardinality
of this set is the required sum.
(ii) To legitimize this definition one must show that if A is another
indexed family of sets with the same index set X such that
|A_x| = |B_x| for all x ∈ X, then

|⋃{{x} × A_x : x ∈ X}| = |⋃{{x} × B_x : x ∈ X}|.

This can easily be done, using AC (see Rem. 5.1.3(iii) below).
(iii) We need to define the sum of a family, rather than a set, of
cardinals because in a set of cardinals each cardinal can occur at
most once: a given cardinal either does or does not belong to a
given set. However, we must not forbid multiple occurrence of a
cardinal in a sum. This is taken care of by our definition, since in
the family {μ_x | x ∈ X} we can have μ_x = μ_y for x ≠ y.
(iv) Def. 4.4 is obtained as a special case of Def. 4.11 by taking the
index set X to have just two members.
(v) The set ⋃{{x} × B_x : x ∈ X} is called the direct sum of the
indexed family {B_x | x ∈ X}.
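The direct sum is easy to form for a finite indexed family. In this sketch a family is represented as a dict from indices to sets; that representation and the name direct_sum are choices made for the illustration.

```python
def direct_sum(family):
    """The union of the products {x} x B_x over all indices x of the family."""
    return {(x, b) for x, B_x in family.items() for b in B_x}

# Two of the sets are equal and the third overlaps them; tagging keeps the copies apart.
family = {"p": {1, 2}, "q": {1, 2}, "r": {2, 3, 4}}
total = len(direct_sum(family))
plain_union = set().union(*family.values())
```

The direct sum has 2 + 2 + 3 members, whereas the plain union of the three sets has only four.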

§5. Multiplication
5.1. Definition
For any cardinals λ and μ, we define the product of λ and μ:

λ · μ =df |A × B|,

where A and B are any sets such that |A| = λ and |B| = μ. We often
abbreviate ‘λ · μ’ as ‘λμ’.

5.2. Remarks

(i) A × B is a set by Rem. 2.1.13(ii) and Lemma 2.2.5.
(ii) Def. 5.1 is legitimized by the easily proved fact that if A′ ≈ A
and B′ ≈ B, then also A′ × B′ ≈ A × B.

For natural numbers m and n, the product mn equals the sum


obtained when n is added to itself m times (this is why the product is
read as ‘m times n’). A similar result also holds in cardinal arithmetic,
in the following sense:

5.3. Theorem
Let λ and κ be any cardinals and let {μ_a | a ∈ A} be an indexed family
of cardinals such that μ_a = κ for every a ∈ A and such that |A| = λ.
Then

Σ{μ_a | a ∈ A} = λκ.

PROOF
Let D be a set such that |D| = κ. Applying Def. 4.11 to the indexed
family of sets {B_a | a ∈ A} such that B_a = D for every a ∈ A, we
obtain

Σ{μ_a | a ∈ A} = |⋃{{a} × D : a ∈ A}|.

However, it is not difficult to verify (DIY!) that

⋃{{a} × D : a ∈ A} = A × D.

Hence Σ{μ_a | a ∈ A} = |A × D| = λκ. ∎
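The key identity of this proof, that the direct sum of a constant family is the cartesian product, can be checked on a small example. The sets A and D below are arbitrary choices for the illustration.

```python
from itertools import product

A = {"x", "y", "z"}
D = {0, 1}

# The constant family with B_a = D for every a in A, and its direct sum.
family = {a: D for a in A}
direct_sum = {(a, d) for a, B_a in family.items() for d in B_a}
cartesian = set(product(A, D))
```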

5.4. Theorem
If k, m and n are natural numbers and km = n, then 𝐤𝐦 = 𝐧.

PROOF

DIY. ∎

5.5. Problem
Verify, for all cardinals κ, λ and μ:

(i) κ(λμ) = (κλ)μ (associativity of multiplication),
(ii) λμ = μλ (commutativity of multiplication),
(iii) λ𝟏 = λ (neutrality of 𝟏 w.r.t. multiplication),
(iv) λ ≤ μ ⇒ κλ ≤ κμ (weak monotonicity of multiplication),
(v) (κ + λ)μ = κμ + λμ
(distributivity of multiplication over addition),
(vi) λμ = 𝟎 ⇔ λ = 𝟎 or μ = 𝟎 (absorptive property of 𝟎).

5.6. Problem
Prove the following generalization of Prob. 5.5(v): if {λ_x | x ∈ X} is
any indexed family of cardinals and μ is any cardinal then

(Σ{λ_x | x ∈ X}) · μ = Σ{λ_x μ | x ∈ X}.

5.7. Warning
The same as 4.8, mutatis mutandis.

As in the case of addition, multiplication can be defined for a whole


family of cardinals rather than just a pair of cardinals. (Legitimation
again requires AC.) We start from a simple observation:

5.8. Lemma
Let C and D be any sets and let u and v be distinct objects. Let P be
the class

{f : f is a function such that dom f = {u, v} and fu ∈ C and fv ∈ D}.

Then P is a set equipollent to C × D.

PROOF

It is quite easy to show, without using AR, that P is a set. However,
we shall not bother to do so. Instead, we shall define a bijection F
from the set C × D to P. Thus by AR the latter is also a set. We put,
for each c ∈ C and d ∈ D,

F⟨c, d⟩ = {⟨u, c⟩, ⟨v, d⟩}.

It is easy to verify that F is indeed a bijection from C × D to P. ∎

The following definition generalizes the construction of Lemma 5.8 to


an arbitrary family of sets.

5.9. Definition
If {B_x | x ∈ X} is an indexed family of sets, the class
{f : f is a function such that dom f = X and fx ∈ B_x for all x ∈ X}
is denoted by

⨉{B_x | x ∈ X}

and called the direct product of the family {B_x | x ∈ X}.

5.10. Lemma
If {B_x | x ∈ X} is any indexed family of sets, then ⨉{B_x | x ∈ X} is a
set.

PROOF

Recall (Def. 4.9) that {B_x | x ∈ X} is the function having the index set
X as its domain, whose value at each x ∈ X is B_x. Therefore the range
of this function is

{B_x : x ∈ X}

and this range is a set by AR. Now let us put

U = ⋃{B_x : x ∈ X}.

U is a set by AU. Next, observe that by Def. 5.9, if f is any member of
⨉{B_x | x ∈ X} then f is a map from X to U. Hence f ⊆ X × U,
which means that f ∈ P(X × U). Thus we have shown that

⨉{B_x | x ∈ X} ⊆ P(X × U).

Since X × U is a set (cf. Rem. 5.2(i)), it follows that P(X × U) is a
set by AP. Hence ⨉{B_x | x ∈ X} is a set by AS. ∎
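For a finite family the direct product of Def. 5.9 can be listed explicitly, with each member represented as a dict. The sketch below also compares the two-index case with the cartesian product, in the spirit of Lemma 5.8; the helper name direct_product and the sample family are assumptions for the illustration.

```python
from itertools import product

def direct_product(family):
    """All functions f (as dicts) with dom f = X and f[x] in B_x for every index x."""
    X = list(family)
    return [dict(zip(X, choice)) for choice in product(*(family[x] for x in X))]

family = {"u": {1, 2}, "v": {"a", "b", "c"}}
P = direct_product(family)

# The correspondence of Lemma 5.8, read backwards: send f to the pair (fu, fv).
pairs = {(f["u"], f["v"]) for f in P}
```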

5.11. Definition
Let {B_x | x ∈ X} be a family of sets and let μ_x = |B_x| for each x ∈ X.
We put

∏{μ_x | x ∈ X} =df |⨉{B_x | x ∈ X}|.

This is called the product of the [family of] μ_x, indexed by X.

5.12. Remarks
(i) Using AC it is easy to legitimize this definition by showing that if
A is another indexed family of sets with the same index set X
such that |A_x| = |B_x| for all x ∈ X, then

|⨉{A_x | x ∈ X}| = |⨉{B_x | x ∈ X}|.

(ii) Def. 5.1 can be regarded as a special case of Def. 5.11. Indeed, if
C and D are any sets, whose cardinalities are κ and λ respectively,
take X = {u, v}, where u and v are distinct objects, and let
{B_x | x ∈ X} be the family such that B_u = C and B_v = D. Then
Lemma 5.8, rewritten in the notation of Def. 5.9, says that

⨉{B_x | x ∈ X} ≈ C × D.

So in this case we have

|⨉{B_x | x ∈ X}| = |C × D|,

which is what Def. 5.1 says κλ should be.

§6. Exponentiation; Cantor’s Theorem


6.1. Definition
Let A and B be any sets. Then

map(A, B) = {f : f is a map from A to B}.

6.2. Remarks
(i) If f is any member of map(A, B) then f ⊆ A × B, hence f is
a member of P(A × B). Thus map(A, B) ⊆ P(A × B), and
map(A, B) is a set.
(ii) Perhaps more instructively, the same result can be derived
from Lemma 5.10, as follows. Consider the indexed family
{D_a | a ∈ A} such that D_a = B for every a ∈ A. Then
⨉{D_a | a ∈ A}, which is a set by Lemma 5.10, is, by Def. 5.9,
equal to

{f : f is a function such that dom f = A and fa ∈ B for all a ∈ A}.

By Def. 6.1 this is exactly map(A, B).

6.3. Definition
For any cardinals λ and μ, we define μ to the [power of] λ:

μ^λ =df |map(A, B)|,

where A and B are sets such that |A| = λ and |B| = μ.

6.4. Remarks
(i) This definition is legitimized by the easily verified fact that if
A ≈ A′ and B ≈ B′ then map(A, B) ≈ map(A′, B′).
(ii) From Rem. 6.2(ii) it follows that exponentiation (raising to a
power) can be achieved by repeated multiplication, in the following
sense: if {κ_a | a ∈ A} is an indexed family of cardinals such
that κ_a = μ for all a ∈ A, and if |A| = λ, then

∏{κ_a | a ∈ A} = μ^λ.

6.5. Problem
Let k, m be natural numbers, and let n = m^k. Verify that 𝐧 = 𝐦^𝐤.
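Before attempting the proof, the count can be checked numerically for small k and m. This sketch simply enumerates all maps from a k-member set to an m-member set; the function name count_maps is invented for the illustration, and the check is of course no substitute for the verification the problem asks for.

```python
from itertools import product

def count_maps(k, m):
    """Number of maps from a k-member set A to an m-member set B."""
    A, B = range(k), range(m)
    # A map is determined by choosing a value in B for each member of A.
    return sum(1 for _ in product(B, repeat=len(A)))

table = {(k, m): count_maps(k, m) for k in range(4) for m in range(4)}
```

Note that there is exactly one map from the empty set to any set (the empty map), in agreement with the convention that m to the power 0 is 1.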

6.6. Problem
Verify that for any cardinals κ, λ and μ:

(v) (λμ)^κ = λ^κ μ^κ.

6.7. Theorem
For any set A, |PA| = 𝟐^|A|.

PROOF
By Def. 6.3, what we have to show is that PA is equipollent to
map(A, B), where B is a set having exactly two members. Let us take
B = {∅, {∅}}. Define a map F from map(A, B) to PA, by putting,
for every f ∈ map(A, B),

Ff = {a ∈ A : fa = ∅}.

It is easy to verify that F is a bijection from map(A, B) to PA. ∎
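The bijection F can be exhibited for a small set A. In the sketch below the two members of B are built from frozensets, maps are dicts, and all_maps is a helper name assumed for the example.

```python
from itertools import product

EMPTY = frozenset()
B = [EMPTY, frozenset({EMPTY})]      # the two-member set used in the proof

def all_maps(A, B):
    """All maps from A to B, each represented as a dict."""
    A = sorted(A)
    return [dict(zip(A, values)) for values in product(B, repeat=len(A))]

def F(f):
    """Send a map f to the subset of A on which f takes the value EMPTY."""
    return frozenset(a for a in f if f[a] == EMPTY)

A = {1, 2, 3}
maps = all_maps(A, B)
subsets = {F(f) for f in maps}
```

Eight maps yield eight distinct subsets, which is every subset of a three-member set.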

6.8. Cantor’s Theorem


For any set A, |A| < |PA|.

PROOF

First, we show that |A| ≤ |PA|. We define a map f from A into PA by
putting fa = {a} for each a ∈ A. Clearly, f is an injection from A to
PA.
We show that |A| ≠ |PA| by reductio. Let g be any map from A to
PA. For each x ∈ A, then, gx is a member of PA, that is, a subset of
A. Put

D = {x ∈ A : x ∉ gx}.

Then D is a subset of A, that is, a member of PA. If g were to map
A onto PA, there would be some d ∈ A for which gd = D. Then
d ∈ gd ⇔ d ∈ D.
But from the definition of D we see that d ∈ D ⇔ d ∉ gd.
Thus, d belongs to gd iff it doesn’t. This contradiction shows that g
cannot map A onto PA, and hence cannot be a bijection from A to
PA. ∎
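The diagonal set D can be computed for every map g from a small set A to PA, confirming in each case that D is not a value of g. This is an exhaustive check for a three-member A only, with helper names chosen for the illustration.

```python
from itertools import combinations, product

def powerset(A):
    """All subsets of A, as frozensets."""
    items = list(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def diagonal(A, g):
    """Cantor's set D: the members x of A that do not belong to g(x)."""
    return frozenset(x for x in A if x not in g[x])

A = [0, 1, 2]
PA = powerset(A)
all_g = [dict(zip(A, values)) for values in product(PA, repeat=len(A))]
never_hit = all(diagonal(A, g) not in g.values() for g in all_g)
```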

6.9. Remark

The idea of Russell’s Paradox derives from this proof. Indeed, if A is
the class of all sets, then it is easy to see that PA ⊆ A. Thus id_A is in
fact a bijection from A to a class (A itself) that includes PA. Taking
id_A as the g in Cantor’s proof, the D of that proof becomes Russell’s
paradoxical class of all sets that do not belong to themselves.
4

Ordinals

§1. Intuitive discussion and preview


The introduction of the set-theoretical cardinals was motivated by the
wish to generalize the natural numbers in their capacity as cardinal
numbers, answering the question ‘how many?’. But the natural num-
bers are also used, in arithmetic as well as in ordinary life, in other
capacities. In my local bank branch there is a number dispenser: on
entering the branch, each customer collects from the dispenser a piece
of paper showing a number. This number is not (at least, not directly)
an answer to a ‘how many?’ question, but an ordinal number, fixing
the place of the customer in the queue.
A finite set can always be arranged as a queue — and if we ignore the
identity of the elements being ordered, this can be done in just one way.
For example, the first three customers in the bank, arranged according
to the numbers assigned to them by the dispenser, always form the
following pattern:

• < • < •
0   1   2

We can use the number three as an ordinal number, to describe this


general abstract pattern, the order type of three objects arranged in a
queue. Note that three is also the number to be assigned to the next
customer, who is about to join the queue. This is quite general: the
ordinal number assigned to each customer is the order-type (the queue
pattern) of the queue of all preceding customers.
Cantor wished to extend this idea of finite queues and finite ordinal
numbers into the transfinite. Imagine that all the old (finite) ordinal
numbers have been dispensed. We have now got an infinite queue


forming the pattern


(*)  • < • < • < • < ...
     0   1   2   3

We need a new ordinal to describe the order type of this infinite
queue. Cantor denoted this new ordinal by ‘ω’. We can assign this
ordinal to the next ‘customer’ and extend the queue by placing that
customer behind all the finite-numbered ones:

• < • < • < • < ... < •
0   1   2   3         ω

The new order type just formed is described by the next ordinal, which
Cantor denoted by ‘ω + 1’. We can continue in this way, getting not
only ω + n for every natural n but also ω + ω, then ω + ω + 1 and so
on and on and on.
Examining the ‘queues’ formed in this way, Cantor saw that they are
not merely totally ordered, but have a special property not shared by
all totally ordered sets: every non-empty subset of the queue has a
least (first) member. Cantor called such queues well-ordered.
An example of a total order that is not a well-ordering is provided by
the integers, ordered according to magnitude:

... < −3 < −2 < −1 < 0 < 1 < 2 < 3 < ...

Note that the fact that the pattern (*), described by the ordinal ω, is
well-ordered is just the Least Number Principle, a form of the Principle
of Mathematical Induction (see § 4 of Ch. 0).
Cantor introduced the ordinals as a new and separate sort of abstract
entity, just as he did with cardinals. However, in 1923 John von
Neumann pointed out that among all well-ordered sets having a given
Cantorian ordinal as their order-type there is a particular one with
some very special properties. In the spirit of reductionism, this particu-
lar set can then be taken to be the ordinal of that order type.
We shall present von Neumann’s theory of ordinals as streamlined
by Raphael M. Robinson and others.

§2. Definition and basic properties


2.1. Definition
Let < be a [sharp] partial order on a class A and let B ⊆ A. If b ∈ B
and b < x for every other x ∈ B, we say that b is least in B with respect
to <.

2.2. Remarks
(i) Instead of demanding that b < x for every other x ∈ B, we may
equivalently demand that b ≤ x for every x ∈ B. Here ≤ is of
course <^♭, the blunt partial order associated with < (see Prob.
2.3.9 and Rem. 2.3.10).
(ii) When there is no risk of confusion, we omit the phrase ‘with
respect to <’.
(iii) Since < is anti-symmetric, if B does have a least member it is
unique and we may therefore refer to it as the least member of
B.

2.3. Definition
A well-ordering on a class A is a partial order on A such that every
non-empty set included in A has a least member.

2.4. Lemma
If < is a well-ordering on a class A then < is a [sharp] total order on
A.

PROOF
According to Def. 2.3.11, we must show that < fulfils the trichotomy
and transitivity conditions. The latter condition is fulfilled because by
Def. 2.3 < is a partial order; so it only remains to verify the
trichotomy.
Let x and y be any members of A. We must show that exactly one
of the three disjuncts

x < y,  x = y,  y < x

holds. That no two of these disjuncts can hold simultaneously follows
at once from the anti-symmetry of <. On the other hand, the set
{x, y} is included in A and so must have a least member; hence at
least one of the three disjuncts must hold. ∎

2.5. Definition
If A is any class, we define the binary relation ∈_A on A, called the
restriction of ∈ to A, by putting

∈_A =df {⟨x, y⟩ ∈ A² : x ∈ y}.

2.6. Remark
The relation ∈_A can also be characterized by the fact that, for all x
and y,

x ∈_A y ⇔ x ∈ A and y ∈ A and x ∈ y.

2.7. Definition
We say that a class A is ∈-well-ordered if the relation ∈_A is a
well-ordering on A.

2.8. Problem
(i) Let A be a class such that ∈_A is a sharp total order on A; let
B ⊆ A and b ∈ B. Prove that b is least in B w.r.t. ∈_A iff b is
either an individual or a set such that b ∩ B = ∅.
(ii) Hence verify that a class A is ∈-well-ordered iff the following two
conditions are satisfied:
(1) ∈_A is a sharp total order on A.
(2) Every non-empty set u included in A has a member v such
that v is either an individual or a set such that v ∩ u = ∅.
(iii) Prove that in (ii) we may replace (1) by the weaker condition:
(1′) For any members x and y of A, at least one of the following
three disjuncts holds:

x ∈ y or x = y or y ∈ x.

(Show that if two of these disjuncts hold simultaneously then the
set u = {x, y} violates (2). To verify that ∈_A is transitive, let x, y
and z be members of A such that x ∈ y ∈ z and apply (2) to the
set u = {x, y, z}.)
(iv) Hence (or directly from Def. 2.7) prove that if B ⊆ A and A is
∈-well-ordered, then so is B.

2.9. Theorem

If A is an ∈-well-ordered class and B is a non-empty subclass of A, then
B has a least member w.r.t. ∈_A.

PROOF

Take any z ∈ B. If z is the least member of B, we need look no
further. So let us suppose z is not the least member of B. Therefore by
Prob. 2.8(i) z is a set rather than an individual and z ∩ B ≠ ∅.
By Prob. 1.4.5(ii) z ∩ B is a set; we have just seen that it is
non-empty; and it is clearly included in B and hence also in A. So by
Def. 2.7 z ∩ B must have a least member w.r.t. ∈_A.
Let y be the least member of z ∩ B. We claim that this y is also the
least member of B. Indeed, if this were untrue, then (applying to y the
argument we have just applied to z) we would find an x such that
x ∈ y ∩ B. Then x ∈ y as well as y ∈ z and by the transitivity of ∈_A it
would follow that x ∈ z, hence x ∈ z ∩ B. But this is impossible,
because x ∈ y and y is the least member of z ∩ B. ∎

2.10. Definition
A class A is transitive if, for all y,

y ∈ A ⇒ y ⊆ A.

2.11. Remarks
(i) Note that every member of a transitive class must be a set rather
than an individual, because by Def. 1.3.4 y ⊆ A holds only if y is
a class. So a class A is transitive iff:
(1) all its members are sets and
(2) ⋃A ⊆ A; that is, for all x and y, x ∈ y ∈ A ⇒ x ∈ A.
(ii) Unfortunately, ‘transitivity’ is used with two meanings: the pre-
sent one and that applicable to binary relations (as, for example,
in Def. 2.3.2). In practice no confusion shall arise, as the context
will indicate which meaning is intended.

2.12. Definition
An ordinal is a transitive and ∈-well-ordered set. The class of all
ordinals is denoted by ‘W’.

2.13. Examples
The empty set ∅ is, vacuously, an ordinal. It is also easy to verify that
{∅} and {∅, {∅}} are ordinals.
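These examples can be checked mechanically for hereditarily finite sets. The sketch below represents sets as frozensets (so there are no individuals) and tests transitivity together with conditions (1′) and (2) of Prob. 2.8; the function names are assumptions for the illustration. Python frozensets can never belong to themselves, so the check says nothing about non-well-founded sets.

```python
from itertools import combinations

def nonempty_subsets(s):
    """All non-empty subsets of the finite set s."""
    items = list(s)
    return [frozenset(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]

def is_transitive(s):
    """Def. 2.10: every member of s is included in s."""
    return all(y <= s for y in s)

def is_ordinal(s):
    """Transitive, and satisfying conditions (1') and (2) of Prob. 2.8."""
    comparable = all(x in y or x == y or y in x for x in s for y in s)
    has_least = all(any(not (v & u) for v in u) for u in nonempty_subsets(s))
    return is_transitive(s) and comparable and has_least

zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
not_ordinal = frozenset({one})    # {{∅}}: its member {∅} is not included in it
```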

2.14. Convention
We shall use lower-case Greek letters (mainly ‘α’, ‘β’, ‘γ’, ‘λ’, ‘ξ’ and
‘η’) as variables ranging over the ordinals.

2.15. Theorem
All members of an ordinal are ordinals; thus, if α is an ordinal,
α = {ξ : ξ ∈ α}.

PROOF

Let y ∈ α. Since α is transitive, we have y ⊆ α. Since α is an
∈-well-ordered set, it follows from Prob. 2.8(iv) that its subset y is also
∈-well-ordered. It remains to show that y is transitive.
So let u ∈ x ∈ y. Using the fact that α is a transitive set, we have
x ∈ α and then in turn also u ∈ α. Hence u and x, as well as y, are
members of α; so by the transitivity of the relation ∈_α we infer from
u ∈ x ∈ y that u ∈ y. ∎

2.16. Lemma
If y is any transitive subset of an ordinal α then y itself is an ordinal;
moreover, y = α or y ∈ α.

PROOF

That y is an ordinal follows at once from Prob. 2.8(iv). Moreover, let
u = α − y. If u = ∅ then y = α. If u is non-empty, then it has a
(unique) least member x w.r.t. ∈_α. We shall show that y = x.
First, let z ∈ x. Since x ∈ α and α is transitive, it follows that z ∈ α.
But z cannot be in u, because z ∈ x, and x is the least member of u;
thus z must be in y. This proves that x ⊆ y.
Conversely, let z ∈ y. Then z = x is impossible because x ∉ y. Also,
x ∈ z is impossible because, by the transitivity of y, it would imply
x ∈ y. Hence by Lemma 2.4 we must have z ∈ x. This proves that
y ⊆ x. Thus y = x ∈ α. ∎

2.17. Theorem
The class W of all ordinals is transitive and ∈-well-ordered.

PROOF
The transitivity of W follows at once from Thm. 2.15. To prove that W
is ∈-well-ordered, we shall make use of Prob. 2.8(iii).
To verify that condition (1′) of Prob. 2.8(iii) holds for W, let α and
β be any ordinals. Since both α and β are transitive, it is easy to see
that α ∩ β is also transitive. Thus by Lemma 2.16 α ∩ β is an ordinal,
say γ; moreover, γ = α or γ ∈ α. Likewise, γ = β or γ ∈ β.
But we cannot have both γ ∈ α and γ ∈ β because then γ ∈ α ∩ β,
that is, γ ∈ γ; and this would violate the anti-symmetry of the well-
ordering relation ∈_γ on γ. Therefore γ = α or γ = β. Hence α = β or
α ∈ β or β ∈ α, which proves condition (1′) for W.
Now let u be any non-empty set of ordinals. We must prove that
there exists an ordinal ξ ∈ u such that ξ ∩ u = ∅. Take any α ∈ u. If
α ∩ u = ∅, we are through.
On the other hand, suppose α ∩ u ≠ ∅. Since α is ∈-well-ordered,
there must exist some member ξ of α ∩ u such that ξ ∩ α ∩ u = ∅.
But ξ ∈ α and α is transitive; so ξ ⊆ α. Hence ξ ∩ u = ξ ∩ α ∩ u = ∅.
∎

2.18. Corollary
W is a proper class (that is, not a set).

PROOF
If W were a set, then by Def. 2.12 and Thm. 2.17 it would be an
ordinal, hence W ∈ W, in violation of the anti-symmetry of the well-
ordering relation ∈_W. ∎

2.19. Remarks
(i) The (naive) assumption that W is a set led to a contradiction.
This was the Burali-Forti Paradox (see § 2 of Ch. 1). Cor. 2.18 is
a ‘tame’ version, within ZF, of the paradox. Similarly, Thm.
1.3.10 is a ‘tame’ ZF version of Russell’s Paradox.
(ii) In the proofs of Thm. 2.17 and Cor. 2.18 we used the argument
that an ordinal γ cannot be a member of itself because this would
violate the anti-symmetry of the well-ordering relation ∈_γ on γ.
In mathematical practice it is often convenient to posit a further
postulate, the Axiom of Foundation (or Regularity), first proposed
by Dimitry Mirimanoff in 1917, one of whose effects is to
exclude any set that belongs to itself. On the other hand, in some
special applications of set theory — notably in so-called situation
semantics, developed by Jon Barwise and others, and in abstract
computation theory — it is convenient to use an extension of ZF
proposed by Peter Aczel, which negates the Axiom of Founda-
tion and admits some sets that belong to themselves. In the
present course we do not commit ourselves either way.

2.20. Corollary
Any class of ordinals is ∈-well-ordered.

PROOF
Immediate from Thm. 2.17 and Prob. 2.8(iv). ∎

2.21. Definition
The ∈-well-ordering on W shall be denoted by ‘<’. Thus for any
ordinals α and β,

α < β ⇔ α ∈ β.

2.22. Remarks
(i) As usual, we denote by ‘≤’ the blunt version of <. Thus

α ≤ β ⇔ α ∈ β or α = β.

(ii) Thm. 2.15 can now be read as saying that if α is any ordinal then
α = {ξ : ξ < α}.
(iii) From now on, whenever we use order-related terminology in
connection with ordinals, we shall take it for granted that the
order relation referred to is the ∈-well-ordering, unless otherwise
stated.

2.23. Definition
Let ≤ be a partial order on a class A and let B ⊆ A.

(i) If u ∈ A and x ≤ u for all x ∈ B, then u is said to be an upper
bound of (or for) B with respect to ≤.
(ii) If u is the least member of the class of upper bounds for B w.r.t.
≤ (that is, if u is an upper bound for B w.r.t. ≤ and if u ≤ v
whenever v is any other upper bound for B w.r.t. ≤) then u is
said to be the least upper bound (abbreviated ‘lub’) for B
w.r.t. ≤.

2.24. Remarks

(i) The phrase ‘with respect to ≤’ is omitted when there is no danger
of confusion.
(ii) A subclass B of A need not in general have any upper bound, let
alone a lub; but if it has a lub, it is unique.

2.25. Theorem
If A is a set of ordinals then its union-set ⋃A is an ordinal. Moreover,
⋃A is the lub of A.

PROOF

To show that ⋃A is transitive, assume that x ∈ y ∈ ⋃A. Then for
some ordinal α we have x ∈ y ∈ α ∈ A. Since α is transitive, it follows
that x ∈ α ∈ A; hence x ∈ ⋃A.
By Thm. 2.15, all the members of ⋃A are ordinals; so by Cor. 2.20
⋃A is ∈-well-ordered. Thus ⋃A is an ordinal.
If α ∈ A then α ⊆ ⋃A by the definition of the union-set, and α is a
transitive set. Therefore by Lemma 2.16 α ≤ ⋃A. This means that ⋃A
is an upper bound for A.
Finally, if β is any upper bound for A, then for each α ∈ A we have
α ≤ β, that is, α ∈ β or α = β. By the transitivity of the set β it
follows that in either case α ⊆ β. Since this holds for each α ∈ A, it
follows that also ⋃A ⊆ β. By Lemma 2.16 we now have ⋃A ≤ β,
which proves that ⋃A is the least upper bound for A. ∎
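
For finite ordinals the theorem can be checked by direct computation. The following is a minimal sketch, assuming finite ordinals are modelled as nested Python frozensets; the helper names `ordinal` and `union_set` are ours, not the text's.

```python
def ordinal(n):
    """Return the finite ordinal with n members, built as a nested frozenset."""
    alpha = frozenset()
    for _ in range(n):
        alpha = alpha | frozenset({alpha})  # alpha' = alpha ∪ {alpha}
    return alpha


def union_set(family):
    """The union-set of a family: all members of members of the family."""
    return frozenset(x for member in family for x in member)


# A hypothetical set A of three finite ordinals.
A = {ordinal(1), ordinal(4), ordinal(2)}
lub = union_set(A)

# lub is an upper bound: every member of A equals it or belongs to it.
is_upper_bound = all(a == lub or a in lub for a in A)
```

Here `lub` coincides with `ordinal(4)`, the greatest member of A, as one expects for a non-empty finite set of ordinals.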

2.26. Definition
For any ordinal α we put α′ = α ∪ {α}. We call α′ the immediate
successor of α. (This terminology is justified by the following
theorem.)

2.27. Theorem
For any α, α′ is an ordinal. Moreover, for any β, β ≤ α iff β < α′
(equivalently: α < β iff α′ ≤ β). Hence α < β iff α′ < β′.

PROOF

Easy; DIY. ∎

2.28. Definition
(i) An ordinal of the form α′ is called a successor ordinal.
(ii) An ordinal that is neither ∅ nor a successor ordinal is called a
limit ordinal.
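
The successor operation and the ordering of Def. 2.21 are easy to experiment with on small cases. The sketch below assumes that finite ordinals are modelled as nested frozensets; the names `successor`, `is_transitive` and `less` are illustrative choices of ours.

```python
EMPTY = frozenset()


def successor(alpha):
    """alpha' = alpha ∪ {alpha}, as in Def. 2.26."""
    return alpha | frozenset({alpha})


def is_transitive(s):
    """Every member of a member of s is itself a member of s."""
    return all(x in s for y in s for x in y)


def less(alpha, beta):
    """alpha < beta means alpha ∈ beta (Def. 2.21)."""
    return alpha in beta


one = successor(EMPTY)
two = successor(one)
three = successor(two)

# Thm. 2.27 on a small sample: beta ≤ two iff beta < two'.
sample = [EMPTY, one, two, three]
thm_2_27_holds = all(
    (beta == two or less(beta, two)) == less(beta, successor(two))
    for beta in sample
)
```

In this model `three` has exactly the three members `EMPTY`, `one` and `two`, which illustrates Rem. 2.22(ii).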

§3. The finite ordinals


3.1. Definition
An ordinal α is said to be finite if no ordinal ξ ≤ α is a limit ordinal.
Otherwise, α is said to be an infinite ordinal. We put
ω =df {α : α is a finite ordinal}.

3.2. Theorem
ω is transitive.

PROOF

Let α be a finite ordinal. We must show that every member of α is also
a finite ordinal. This is easily done (DIY, using Rem. 2.22(ii)). ∎

3.3. Theorem
(i) ∅ is a finite ordinal.
(ii) If α is a finite ordinal then so is α′.

PROOF

(i) We know that ∅ is an ordinal (Ex. 2.13). But by Def. 2.28(ii) ∅
is not a limit ordinal. Since ∅ has no members, the only ξ such
that ξ ≤ ∅ is ∅ itself. Hence ∅ is a finite ordinal.
(ii) Let α be a finite ordinal and let ξ ≤ α′. We must show that ξ is
not a limit ordinal. Now, α′ itself is a successor ordinal, hence
not a limit ordinal. It remains to consider the case where ξ < α′.
By Thm. 2.27 this means that ξ ≤ α. Since α is a finite ordinal, ξ
is not a limit ordinal. ∎

3.4. Theorem
ω is a set.

PROOF

Using the Axiom of Infinity (Ax. 1.3.21), take a set Z such that ∅ ∈ Z
and such that whenever x ∈ Z, then also x ∪ {x} ∈ Z. Thus if an
ordinal α belongs to Z then (by Def. 2.26) so does α′.
Consider the class ω − Z, the class of all finite ordinals not
belonging to Z. If this class is non-empty, then by Thm. 2.9 it must have a
least member, say β. Now, β cannot be ∅, because ∅ does belong to
Z. Also, β, being a finite ordinal, cannot be a limit ordinal. So it must
be a successor ordinal, say β = α′ = α ∪ {α}. But in this case α itself
is a finite ordinal (by Thm. 3.2), such that α < β. Since β was
supposed to be the least finite ordinal not belonging to Z, it follows
that α ∈ Z. Therefore by the assumption on Z also α′ ∈ Z. But this is
impossible, because α′ = β, which is the least finite ordinal not
belonging to Z.
So ω − Z must be empty. Thus ω ⊆ Z; hence ω is a set by AS. ∎

3.5. Corollary
ω is the unique set X having the following three properties:

(i) ∅ ∈ X;
(ii) whenever α ∈ X then also α′ ∈ X;
(iii) X ⊆ Z for any set Z such that ∅ ∈ Z and such that whenever
α ∈ Z then also α′ ∈ Z.

PROOF
Thm. 3.3 says that ω has properties (i) and (ii). The proof of Thm. 3.4
shows that ω has also property (iii). The uniqueness of ω follows by
PX, because if X is any set having the three properties then both
ω ⊆ X and X ⊆ ω. ∎

3.6. Remarks
(i) Our first use of AI was to prove that ω is a set. Conversely, if we
postulate that ω is a set, then by Thm. 3.3 ω is a set satisfying the
conditions that AI lays down for Z. This shows that (in the
presence of the other postulates) AI is equivalent to the proposition
that ω is a set, which is a special case of the Comprehension
Principle.
(ii) In fact, it now transpires (Cor. 3.5) that ω is simply the smallest
set satisfying the conditions of AI.

We restate the fact that ω satisfies condition (iii) of Cor. 3.5 as a
principle in its own right:

3.7. Corollary (Weak Principle of Induction on Finite Ordinals)


Let Z be any set such that ∅ ∈ Z and such that whenever α ∈ Z then
also α′ ∈ Z. Then ω ⊆ Z. ∎

3.8. Remarks
(i) We see that the set ω of finite ordinals, with its ∈-well-ordering,
simulates, within the confines of ZF set theory, the behaviour
that characterizes the system of natural numbers. We can take ∅
as the counterpart of the number 0 and the ∈-well-ordering on ω
as the counterpart of the usual ordering of the natural numbers.
Just as each natural number n has an immediate successor, n + 1,
so every finite ordinal α has an immediate successor, α′.
Moreover, the basic facts about the ordering of the natural
numbers (Facts 0.1.1-0.1.5) are mimicked by theorems about the
finite ordinals and their ∈-well-ordering. And, most importantly,
the Principle of Mathematical Induction is mimicked by the
Principle of Induction on Finite Ordinals. Certainly, within ZF ω
impersonates, plays the role of, ‘the set of natural numbers’. In
fact, Cor. 3.5 reproduces within ZF Richard Dedekind’s famous
characterization of the natural numbers.¹
(ii) The obvious reductionist step at this point is to identify the
ZF-set ω of finite ordinals as the ‘true’ (hitherto intuitive) set N
of natural numbers. This would be a grand reduction indeed,
because work done during the 19th century by several mathematicians
(including Hamilton, Bolzano, Weierstrass, Dedekind and
Cantor) showed that all the concepts of mathematical analysis
could be reduced to those of natural number, set and membership
(plus concepts such as relation and function that we have by
now reduced to set-theoretic concepts). Thus a huge part, if not
the whole, of mathematics would be reduced to set theory.
Many (perhaps most) mathematicians, under the influence of
the dominant structuralist ideology, do proceed in this way, and
frame (or think of) their mathematical discourse as taking place
within set theory.

¹ Was sind und was sollen die Zahlen?, 1888. (English translation in Essays on the theory
of numbers edited by W. W. Beman, 1901.)

3.9. Warning
This reduction, although extremely successful in a formal sense, is by
no means unproblematic, as Skolem pointed out in 1922, when he
published his famous paradox. (We shall discuss Skolem’s Paradox in
the Appendix.)

3.10. Theorem
ω is the least infinite ordinal and the least limit ordinal.

PROOF

That ω is an ordinal follows at once from Cor. 2.20 and Thms. 3.2 and
3.4. Also, ω cannot be a finite ordinal, because that would mean that
ω ∈ ω, which is impossible for an ordinal. Thus ω must be an infinite
ordinal. On the other hand, if ξ < ω (that is, ξ ∈ ω) then by Def. 3.1
ξ is a finite ordinal; hence ω must be the least infinite ordinal.
If ξ ∈ ω then, as we have just seen, ξ is a finite ordinal, hence, a
fortiori, not a limit ordinal. If ω itself were not a limit ordinal then by
Def. 3.1 it would follow that ω is a finite ordinal, contrary to what we
have proved. Thus ω must be a limit ordinal. As we have just
observed, no ordinal smaller than ω can be a limit ordinal. Hence ω is
the least limit ordinal. ∎

3.11. Preview
We have yet to justify the adjectives finite and infinite introduced in
Def. 3.1 in connection with ordinals. Dedekind defined a set as infinite
if there exists an injection from it to a proper subset of itself, and as
finite if there is no such injection. We will not adopt Dedekind’s
definition, but we shall show that finite and infinite ordinals in the
sense of Def. 3.1 are finite and infinite respectively in Dedekind’s
sense.

3.12. Theorem
There does not exist an injection from a finite ordinal to a proper subset
of itself.

PROOF

We proceed by weak induction on finite ordinals (Cor. 3.7). The proof
is a formal (or ‘internalized’) version of the proof of Thm. 3.3.4.
Let Z be the set of all finite ordinals α such that there is no injection
from α to a proper subset of itself. In order to prove our theorem it is
enough to show that ∅ ∈ Z and that if α ∈ Z then also α′ ∈ Z.
That ∅ ∈ Z is obvious, since ∅ has no proper subsets. Now assume,
as induction hypothesis, that α ∈ Z and let f be an injection from α′
(that is, from α ∪ {α}) to a subset B of itself. If B is a proper subset
of α′ then the set α′ − B is non-empty.
Without loss of generality we may assume that α belongs to α′ − B
rather than to B. (In the contrary case, where α ∈ B, take any member
β of α′ − B and let g be the bijection from α′ to itself that
interchanges β and α but leaves all other members of α′ fixed: thus,
gβ = α, gα = β and gξ = ξ for any ξ ∈ α′ other than β and α. Then
use g∘f instead of f itself: it is an injection from α′ to its proper
subset g[B] = (B − {α}) ∪ {β}.)
Our assumption that α ∈ α′ − B means that B ⊆ α. Next, let
γ = fα; then γ must belong to B, since f is a map to B. It now follows
that f↾α is an injection from α to its proper subset B − {γ}. This
contradicts the induction hypothesis. So B cannot be a proper subset
of α′. ∎
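
For very small finite ordinals the statement of Thm. 3.12 can also be confirmed by exhaustive search. The following sketch assumes that the members of the finite ordinal n are represented by the Python integers 0, ..., n − 1; the function name is hypothetical.

```python
from itertools import product


def injections_to_proper_subset(n):
    """Yield every injective map from {0,...,n-1} to itself whose range
    is a proper subset of {0,...,n-1}."""
    domain = range(n)
    for values in product(domain, repeat=n):
        injective = len(set(values)) == n
        if injective and set(values) != set(domain):
            yield dict(zip(domain, values))


# Search all maps on the finite ordinals 0 to 5; the theorem predicts none qualify.
counterexamples = [f for n in range(6) for f in injections_to_proper_subset(n)]
```

The search is only a sanity check on a handful of cases, not a substitute for the induction above.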

3.13. Theorem

If α is an infinite ordinal then there is an injection from α to a proper
subset of itself.

PROOF

First, consider ω. Define a map f on ω (that is, with ω as its domain)
by putting fξ = ξ′ for every finite ordinal ξ. Then f is injective.
Indeed, if ξ and η are distinct, say ξ < η, then by Thm. 2.27 ξ′ < η′,
hence ξ′ and η′ are also distinct. Also, f maps ω to (in fact, onto) its
proper subset ω − {∅}.
Now let α be any infinite ordinal. By Thm. 3.10 we have ω ≤ α,
which means that ω ∈ α or ω = α; and since α is a transitive set, it
follows that ω ⊆ α. Then the map f ∪ id_{α−ω} (with f as before) is
clearly an injection from α to its proper subset α − {∅}. ∎
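
The map f ∪ id_{α−ω} used in this proof can be sketched concretely. We assume, purely for illustration, that ordinals below ω + ω are encoded as pairs: (0, n) stands for the finite ordinal n and (1, n) for the n-th ordinal after ω. The function name `shift` is ours.

```python
def shift(xi):
    """Successor on the finite ordinals, identity on everything from ω onwards."""
    block, n = xi
    return (0, n + 1) if block == 0 else (block, n)


# A finite sample of the (infinite) domain: 50 finite ordinals and 50 from ω onwards.
sample = [(0, n) for n in range(50)] + [(1, n) for n in range(50)]
images = [shift(xi) for xi in sample]

injective_on_sample = len(set(images)) == len(images)
misses_zero = (0, 0) not in images
```

On the sample, no two arguments share an image and the code for ∅ is never hit, in line with the claim that the range is α − {∅}.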

3.14. Theorem

A finite ordinal is not equipollent to any other ordinal.

PROOF

Let α be a finite ordinal and let β be another ordinal. First, suppose β
is finite as well. We have α < β or β < α (that is, α ∈ β or β ∈ α), and
since ordinals are transitive sets it follows that α is a proper subset
of β or β is a proper subset of α; hence
by Thm. 3.12 α and β cannot be equipollent.
Now suppose β is an infinite ordinal. By Thm. 3.13 there exists an
injection, say g, from β to a proper subset of itself. If f were a
bijection from α to β, then clearly f⁻¹∘g∘f would be an injection
from α to a proper subset of itself, which is impossible. ∎

3.15. Definition
A set is finite if it is equipollent to a finite ordinal (in the sense of Def.
3.1). Otherwise, it is infinite.

3.16. Remarks
(i) By virtue of Thm. 3.14, an ordinal is finite (or infinite) in the
sense of Def. 3.1 iff it is finite (or infinite, respectively) in the
sense of Def. 3.15; so there is no conflict between the two
definitions.
(ii) By Thm. 3.14, a finite set is equipollent to a unique finite ordinal.

3.17. Problem
(i) Prove that there does not exist an injection from a finite set to a
proper subset of itself. (Use Thm. 3.12.)
(ii) Prove that if A is a non-empty finite set of ordinals, then A has a
greatest member, that is, an ordinal α ∈ A such that ξ ≤ α for
each ξ ∈ A. (Otherwise, define a map f on A by taking, for each
α ∈ A, fα as the least ξ ∈ A such that α < ξ. Show that f would
be an injection from A to a proper subset of itself.)

3.18. Problem
Let n be a natural number. Show that for any objects a₁, a₂, ..., aₙ,
the set {a₁, a₂, ..., aₙ} is finite. (Use weak mathematical induction
on the number n.)

§ 4. Transfinite induction
Various forms of the Principle of Mathematical Induction have an-
alogues that apply to ordinals. These analogues collectively are known
as the Principle of Transfinite Induction. First, by virtue of the fact
that W is well-ordered, we have immediately by Thm. 2.9:

4.1. Theorem (Least Ordinal Principle)


If X is a non-empty class of ordinals, then X has a least member. ∎

Hence other forms of the Principle of Transfinite Induction can be


deduced.

4.2. Theorem (Strong Principle of Transfinite Induction)


If X is a class of ordinals such that for every ordinal ξ

(*) η ∈ X for every η < ξ ⇒ ξ ∈ X,

then X = W.

PROOF

Let Y = W − X. If Y were non-empty, it would have a least member,
say ξ. So for each η < ξ we would have η ∈ X. But then by (*) ξ ∈ X,
which is impossible. Thus Y must be empty. ∎

4.3. Remark

By Rem. 2.22(ii) the antecedent, η ∈ X for every η < ξ, in condition
(*) of Thm. 4.2 is equivalent to the statement that ξ ⊆ X.

4.4. Theorem (Weak Principle of Transfinite Induction)


If X is a class of ordinals satisfying the following three conditions
(i) ∅ ∈ X,
(ii) for every ordinal ξ, ξ ∈ X ⇒ ξ′ ∈ X,
(iii) for every limit ordinal λ, λ ⊆ X ⇒ λ ∈ X,

then X = W.

PROOF

Assume X satisfies these three conditions. Then by (i) and (iii) X
satisfies condition (*) of Thm. 4.2 for ∅ and for limit ordinals.
Now suppose ξ′ ⊆ X. By Def. 2.26 it follows that ξ ∈ X; hence by
(ii) ξ′ ∈ X. Thus X satisfies (*) also for successor ordinals. ∎

4.5. Remarks
(i) These principles have restricted forms, in which X is assumed to
be a subset of some (arbitrary) given ordinal α rather than a
subclass of W. Thus, the form of Thm. 4.1 restricted to an
arbitrary ordinal α says that a non-empty subset of α has a least
member. The restricted form of Thm. 4.2 says that if X is a
subset of α such that for all ξ < α we have ξ ⊆ X ⇒ ξ ∈ X, then
X = α.
(ii) The Principle of Transfinite Induction restricted to the particular
ordinal ω is precisely the Principle of Induction on Finite
Ordinals.

4.6. Problem
Prove the restricted form of Thm. 4.2. Formulate and prove a form of
Thm. 4.4 restricted to an arbitrary ordinal.

§5. The Representation Theorem


5.1. Preview
In this section we shall show that every well-ordered set is similar in its
ordering to a unique ordinal.

5.2. Definition
A partially ordered set (briefly, poset) is a pair (A, <), where A is a
set and < is a [sharp] partial order on A. A totally ordered set is a
poset (A, <), in which < is a total order on A. A well-ordered set is a
poset (A, <), in which < is a well-ordering on A.

5.3. Remarks
(i) This is just a convenient way of packaging a set A together with a
particular partial order on A into a single object. It saves us
having to keep saying ‘such-and-such a set with such-and-such a
partial order on it’.
(ii) However, we shall often refer, somewhat inaccurately, to A itself
as the poset (or ordered set, or well-ordered set) when, strictly
speaking, we have in mind the pair (A,<). We shall only
commit this peccadillo when it is clear from the context which
relation < is involved. Thus, we refer to an ordinal α as a
well-ordered set, when strictly speaking we mean the pair
(α, <), where < is ∈_α, the ∈-well-ordering on α.

5.4. Definition
A similarity map (a.k.a. isomorphism) from a poset (A, <) to a poset
(A′, <′) is a bijection f from A to A′ such that, for all x and y in A,

x < y ⇔ fx <′ fy.

If such a map exists, (A, <) is said to be similar (or isomorphic) to
(A′, <′).

5.5. Remark
It is easy to see that the identity map id_A is a similarity map from
(A, <) to itself. Also if f is a similarity map from (A, <) to (A′, <′)
then its inverse f⁻¹ is a similarity map from (A′, <′) to (A, <).
Finally, if f is a similarity map from (A, <) to (A′, <′) and g is a
similarity map from (A′, <′) to (A″, <″) then the composition g∘f
is a similarity map from (A, <) to (A″, <″).
It follows that similarity is an equivalence relation on the class of
posets.

5.6. Theorem
If f is a similarity map from an ordinal α to an ordinal β then f is the
identity map id_α, hence α = β.

PROOF

First, we prove by strong transfinite induction (restricted to α) that
ξ ≤ fξ for every ξ ∈ α.
Let ξ ∈ α. By the induction hypothesis, if η < ξ then η ≤ fη. But if
η < ξ then also fη < fξ, since f is a similarity map. Thus for every
η < ξ we have η < fξ. In particular, η ≠ fξ for every η < ξ; in other
words, fξ < ξ is impossible. This proves that ξ ≤ fξ and completes the
induction.
Now, f⁻¹ is a similarity map from β to α; therefore by the same
token we have also ζ ≤ f⁻¹ζ for all ζ ∈ β. Taking ζ to be fξ, where
ξ ∈ α, we obtain fξ ≤ f⁻¹fξ = ξ. Thus fξ ≤ ξ as well as ξ ≤ fξ, which
shows that f must be the identity id_α. ∎
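
The special case α = β of this theorem, for small finite ordinals, can be verified by brute force. The sketch below assumes that the members of the finite ordinal n are represented by the integers 0, ..., n − 1 with their usual order; `similarity_maps` is a hypothetical helper name.

```python
from itertools import permutations


def similarity_maps(n):
    """All order-preserving bijections from {0,...,n-1} to itself.

    A permutation p is read as the map x -> p[x].
    """
    points = range(n)
    return [
        p
        for p in permutations(points)
        if all((x < y) == (p[x] < p[y]) for x in points for y in points)
    ]


# For each n up to 5, collect the similarity maps; only the identity should appear.
only_identity = all(similarity_maps(n) == [tuple(range(n))] for n in range(6))
```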

5.7. Corollary
For any poset (A, <), there exists at most one similarity map from
(A, <) to an ordinal.

PROOF

If f and g are isomorphisms from (A, <) to α and β respectively, then
the composition g∘f⁻¹ is clearly an isomorphism from α to β.
Therefore α = β and g∘f⁻¹ is the identity mapping, which means that
f = g. ∎

5.8. Preliminaries
(i) For the rest of this section, we consider a fixed but otherwise
arbitrary well-ordered set (A, <).
(ii) If B ⊆ A, then B is clearly well-ordered by the relation <_B,
that is:

{(x, y) : x ∈ B, and y ∈ B, and x < y},

which is called the restriction of < to B. Whenever we refer to a
subset B of A as well-ordered, we shall mean B with this
well-ordering, inherited by B from A.
(iii) For each a ∈ A, the segment of A determined by a is the set

A_a =df {x ∈ A : x < a}.

(iv) We define a class F as follows:

F =df {(x, ξ) : x ∈ A, and ξ is an ordinal,
and A_x is similar to ξ}.

By Cor. 5.7, F is a function (see Def. 2.2.1). We may therefore
use functional notation in connection with F. Thus ‘Fx = ξ’
means the same as ‘(x, ξ) ∈ F’.
Clearly, dom F is a subset of A. By AS dom F is a set; hence
by AR ran F is a set as well. Note that all the members of ran F
are ordinals.

5.9. Lemma
Let Fa = α. Then for any ordinal β < α there exists some b < a such
that Fb = β. Conversely, if b < a then b belongs to dom F and Fb is
some ordinal β < α.

PROOF
Let f be the similarity map from A_a to α. Suppose β < α. This means
that β ∈ α. Therefore fb = β for some b ∈ A_a, that is, b < a. Note
that by the transitivity of α we have β ⊆ α. It is easy to verify that
f↾A_b, the restriction of f to A_b, is a similarity map from A_b to β.
Hence Fb = β.
Conversely, suppose that b < a. This means that b ∈ A_a. Therefore
fb = β for some β ∈ α, that is, β < α. As before, it follows that
Fb = β. ∎

5.10. Lemma
F is injective.

PROOF

Let a and b be two distinct members of dom F. We have to show that
Fa ≠ Fb. Without loss of generality, we may assume b < a. Let
Fa = α. Then by Lemma 5.9 it follows that Fb is some ordinal β < α.
∎

5.11. Lemma
The set ran F is an ordinal.

PROOF

As a set of ordinals, ran F is ∈-well-ordered. It remains to prove that it
is a transitive set. Let α ∈ ran F; thus Fa = α for some a ∈ A. Now let
β ∈ α, that is, β < α. Then by Lemma 5.9 β also belongs to ran F,
showing that this set is transitive. ∎

5.12. Theorem (Representation Theorem for well-ordered sets)


There exists a unique similarity map from the well-ordered set A to an
ordinal.

PROOF

Uniqueness follows from Cor. 5.7. To prove existence, we shall show


that F is a similarity map from A to the ordinal ran F. By Lemmas 5.9
and 5.10, F is a similarity map from dom F, which is a subset of A, to
ran F; so it only remains to establish that dom F is the whole of A.
Suppose not. Then, since A is well-ordered, there would be a least
b ∈ A such that b ∉ dom F. Thus, if a ∈ A such that a < b then a must
belong to dom F. On the other hand, if b < a then a cannot be in
dom F because if it were then by the second half of Lemma 5.9 b
would also be in that domain.
It would follow that dom F is exactly A_b. But then F is a similarity
map from A_b to ran F. Thus A_b is similar to the ordinal ran F. By the
definition of F it would then follow that (b, ran F) ∈ F, hence
b ∈ dom F, contradicting the choice of b. ∎
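
For a finite well-ordered set the map F can be computed outright. By Lemma 5.9, Fa = {Fb : b < a}, and the sketch below uses this identity as its recursion. It assumes finite ordinals are modelled as nested frozensets; the names `ordinal` and `representation` and the sample data are ours.

```python
def ordinal(n):
    """Return the finite ordinal with n members, built as a nested frozenset."""
    alpha = frozenset()
    for _ in range(n):
        alpha = alpha | frozenset({alpha})
    return alpha


def representation(elements, less):
    """Map each member a of a finite well-ordered set to the ordinal that is
    similar to its segment, using Fa = {Fb : b < a}."""
    F = {}

    def value(a):
        if a not in F:
            F[a] = frozenset(value(b) for b in elements if less(b, a))
        return F[a]

    for a in elements:
        value(a)
    return F


# A hypothetical well-ordered set: four words in alphabetical order.
words = ["pear", "fig", "apple", "kiwi"]
F = representation(words, lambda x, y: x < y)
ran_F = frozenset(F.values())
```

Here the least word is sent to ∅ and `ran_F` is the ordinal with four members, so F is a similarity map from the four-element set onto that ordinal.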

5.13. Definition
A set is denumerable if it is equipollent to ω. A set is countable if it is
finite or denumerable.

5.14. Problem
(i) Let D be a subset of an ordinal α. By Cor. 2.20, D is
∈-well-ordered; and by Thm. 5.12, D is similar to an ordinal β. Prove
that β ≤ α. (Let f be a similarity map from β to D. Show that
ξ ≤ fξ for every ξ ∈ β.)
(ii) Prove that a set is countable iff it is equipollent to a subset of ω.
(Use (i) to show that every subset of ω is countable.)

§6. Transfinite recursion


6.1. Preview
In this section we validate a powerful method of defining functions on
W (that is, having W as domain). Roughly speaking, Fξ, the value of
the function F at ξ, is defined in terms of the ‘behaviour’ of F for all
ordinals smaller than ξ.

6.2. Convention
Throughout this section we let C be a fixed but arbitrary function such
that dom C is the class of all sets.

6.3. Definition
We shall write ‘R_C(F, α)’ as short for the statement:

F is a function and α′ ⊆ dom F and Fξ = C(F↾ξ) for all ξ ≤ α.

The equation ‘Fξ = C(F↾ξ)’ is called an ordinal recursion equation.

6.4. Remarks
(i) Recall that α′ = {ξ : ξ ≤ α}.
(ii) Note that F↾ξ = {(η, Fη) : η ∈ ξ}. Therefore the recursion
equation determines Fξ in terms of the ‘previous behaviour’ of F,
that is, the restriction of F to the set of all ordinals η < ξ. Note also that
even if F is a proper class, F↾ξ is always a set by AR and Thm.
2a
(iii) R_C(F, α) means that F is defined and satisfies the recursion
equation for all ordinals up to α inclusive. Hence

R_C(F, α) ⇒ R_C(F, β) for all β ≤ α.

6.5. Lemma
If both R_C(F, α) and R_C(G, α) then Fξ = Gξ for all ξ ≤ α.

PROOF

By (strong) transfinite induction, restricted to α′. Let ξ be any ordinal
≤ α (that is, < α′) and assume, as induction hypothesis, that Fη =
Gη for all η < ξ, that is, for all η ∈ ξ. This means that F↾ξ =
G↾ξ, hence C(F↾ξ) = C(G↾ξ). It now follows from R_C(F, α) and
R_C(G, α) that Fξ = Gξ. ∎

6.6. Lemma
For any ordinal α there exists a unique function f_α such that
dom f_α = α′ = {ξ : ξ ≤ α} and such that R_C(f_α, α).

PROOF

Uniqueness follows from Lemma 6.5. We prove existence by strong
transfinite induction. Assume as induction hypothesis that for each
β < α there exists a (necessarily unique) function f_β whose domain is
β′ = {ξ : ξ ≤ β} such that R_C(f_β, β).
If γ < β < α then by Rem. 6.4(iii) we have R_C(f_β, γ) and hence by
Lemma 6.5 f_β(ξ) = f_γ(ξ) for all ξ ≤ γ. This means that f_β and f_γ agree
wherever both of them are defined; in fact, it is easy to see that
f_γ ⊆ f_β. By Prob. 2.4.8, we can therefore glue all the f_β together to
obtain a single function: we put

f = ⋃{f_β : β < α}.

Clearly, f is a function whose domain is {β : β < α} (that is, α itself)
and it satisfies the recursion equation fβ = C(f↾β) for all β < α.
Finally, we extend f to a function defined for all β ≤ α:

f_α = f ∪ {(α, C(f))}.

Then dom f_α = α′. Also, f = f_α↾α and hence f_α(α) = C(f) =
C(f_α↾α). Thus f_α satisfies the required recursion equation for all
β ≤ α. ∎

6.7. Theorem (Definition by transfinite recursion)


We can define a (necessarily unique) function F such that dom F = W
and such that Fξ = C(F↾ξ) for all ξ ∈ W.

PROOF
To define F, note that the f_α of Lemma 6.6 satisfy the recursion
equation wherever they are defined, and any two of them agree with
each other wherever both are defined. Therefore all we have to do is
glue them together:

F =df ⋃{f_α : α ∈ W}.

It is easy to see that indeed dom F = W and Fξ = C(F↾ξ) for every
ξ ∈ W. Moreover, these two conditions fulfilled by F imply that
R_C(F, α) for all α; hence F is unique by Lemma 6.5. ∎
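
Restricted to the finite ordinals, the construction in Lemma 6.6 becomes an ordinary loop: at each stage the function built so far is extended by one pair (ξ, C(F↾ξ)). The sketch below assumes that finite ordinals are represented by Python integers and that a restriction F↾ξ is passed to C as a dictionary; the two sample choices of C are hypothetical.

```python
def define_by_recursion(C, bound):
    """Return the finite function F on {0,...,bound} with F(xi) = C(F↾xi)."""
    F = {}
    for xi in range(bound + 1):
        restriction = {eta: F[eta] for eta in range(xi)}  # F↾xi as a finite map
        F[xi] = C(restriction)
    return F


# Hypothetical C: one more than the sum of all previous values.
doubling = define_by_recursion(lambda r: 1 + sum(r.values()), 5)

# Hypothetical C: the set of all previous values.
ranks = define_by_recursion(lambda r: frozenset(r.values()), 3)
```

With the first C the values are 1, 2, 4, 8, 16, 32; with the second, each value is the set of all earlier values, so the recursion rebuilds the finite ordinals themselves.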

6.8. Remarks
(i) Note the phrasing of Thm. 6.7: it does not claim that such-and-such
an F exists but that we can define it. To say, in set theory,
that F ‘exists’ would mean that it is an object of the theory,
which is false, since F is a proper class. In fact, Thm. 6.7 is not a
single theorem of set theory, but a meta-theorem or a theorem
scheme which shows how, for any given class C fulfilling a certain
condition (Convention 6.2), we can define a class F fulfilling
certain other conditions. The same applies to any other theorem,
postulate and definition in which general statements or stipulations
are made concerning classes, for example Def. 1.3.4 and
Ax. 1.3.6 (AS): they are not individual statements of set theory,
but schemes. (Compare Rem. 3.3.6.)
(ii) From Thm. 6.7 (or directly from Lemma 6.6) it is easy to obtain a
version of definition by transfinite recursion restricted to any
given ordinal α, in which dom F is α instead of W and the
recursion equation Fξ = C(F↾ξ) is satisfied for all ξ < α.
5
The axiom of choice

§1. From the axiom of choice to the well-ordering theorem


1.1. Definition
A choice function on a class 𝒮 of sets is a function g with dom g = 𝒮,
such that gX ∈ X for every X ∈ 𝒮.

1.2. The axiom of choice (AC) states:


If 𝒮 is a set of non-empty sets then there exists a choice function on 𝒮.

1.3. Remarks

(i) AC was the first postulate of set theory (apart from PX) to be
stated as such. Its first known explicit formulation is due to
Giuseppe Peano (1890), who however rejected it as untenable. It
was first proposed as a new valid mathematical principle by
Beppo Levi in 1902, although it had been used inadvertently by
Cantor and others long before that. Zermelo, who was told about
AC by Erhard Schmidt, used it almost at once in his first (1904)
proof of the Well-Ordering Theorem (WOT, Cor. 1.6 below), a
result that had been conjectured by Cantor. Our formulation of
AC is essentially that used by Zermelo in his 1904 paper.
(ii) In his 1908 paper on the foundations of set theory, in which the
theory is given its first fully fledged axiomatic presentation,
Zermelo does not state AC in this form but in a more restricted
version. He assumes that 𝒮 is a set of non-empty sets that are
pairwise disjoint, that is, X ∩ Y = ∅ for any two distinct members
of 𝒮 (see Def. 3.4.1). He then postulates the existence of a
set A such that, for any X ∈ 𝒮, the intersection A ∩ X has
exactly one member.
This restricted version follows at once from AC. Indeed, if 𝒮 is
a set of non-empty pairwise disjoint sets, then by AC there exists
a choice function g on 𝒮. It is then easy to see that, for any
X ∈ 𝒮, ran g ∩ X = {gX}.
Conversely, AC in the form we have stated it follows from the
restricted version. To show this, let 𝒮 be any set of non-empty
sets. Put

𝒯 = {{X} × X : X ∈ 𝒮}.

It is easy to verify that 𝒯 is a set of non-empty and pairwise
disjoint sets. According to the restricted version, there exists a set
A whose intersection with each member of 𝒯 is a singleton. We
now define a function g on 𝒮 as follows. For any X ∈ 𝒮, the set
{X} × X belongs to 𝒯 and hence its intersection with A has
exactly one member. This member must be of the form (X, x₀),
where x₀ is some member of X. We put gX = x₀. Then g is a
choice function on 𝒮.
(iii) Using AC, Def. 3.4.11 is easily legitimized. If |A_x| = |B_x| for
each x ∈ X, then by AC there exists a family f = {f_x | x ∈ X}
such that, for each x, f_x is a bijection from {x} × A_x to {x} × B_x.
Then it is easy to see that ⋃ ran f is a bijection from ⋃{{x} × A_x :
x ∈ X} to ⋃{{x} × B_x : x ∈ X}. A similar argument applies to
Dates it:
(iv) AC has been regarded with suspicion because it is a purely
existential postulate. It asserts the existence of a set — a choice
function — without characterizing it as the extension of some
previously specified property. In other words, AC is not a special
case of the Principle of Comprehension. In this respect AC is
markedly different from all other existential postulates of set
theory. For example, the Power-set Axiom asserts that, for each
set A, there exists the power-set PA, which is characterized as
the extension of the property being a subset of A.
(v) In 1938 Gödel proved that AC is consistent relative to the other,
commonly accepted, postulates of set theory, in the sense that if
they are consistent, then the addition of AC does not result in
inconsistency. In 1963 P. J. Cohen proved that the same holds
also for the negation of AC.
(vi) AC has some weird (counter-intuitive) consequences. However,
its negation has even weirder ones: for example, the direct
product of a family of non-empty sets may well be empty. Note
also that the finite version of AC, in which the set 𝒮 is assumed
to be finite, can be deduced from the remaining postulates of
ZF. Thus AC is only needed as an additional postulate for the
case where 𝒮 is infinite. It therefore appears as a natural
extension to the infinite case of a principle that must in any case be
accepted in the finite case.
(vii) Most mathematicians regard AC as indispensable: without it,
many results in modern mathematics as well as in set theory itself
would be unprovable. However, in view of its somewhat contro-
versial status, when the AC is needed for proving a mathematical
result, it is customary to point this out.
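
The passage from Zermelo's restricted version back to AC, described in Remark 1.3(ii), can be traced on a small finite family (where of course no appeal to AC is needed). In this sketch the family, the selector set and the function name `choice_from_selector` are all hypothetical illustrations.

```python
def choice_from_selector(family, selector):
    """Recover a choice function g on `family` from a set `selector` that
    meets each set {X} × X in exactly one pair (X, x0)."""
    g = {}
    for X in family:
        tagged = {(X, x) for x in X}   # the set {X} × X
        (pair,) = tagged & selector    # exactly one member, by assumption
        g[X] = pair[1]
    return g


# A family of non-empty sets that are NOT pairwise disjoint.
family = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3})}

# A selector set A, picked by hand for this finite example.
selector = {
    (frozenset({1, 2}), 2),
    (frozenset({2, 3}), 2),
    (frozenset({3}), 3),
}

g = choice_from_selector(family, selector)
```

Tagging each member x of X with X itself is what makes the sets {X} × X pairwise disjoint even though the original members of the family overlap.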

1.4. Preview
Starting from AC, we shall prove a chain of other major principles, all
of which turn out to be equivalent to each other and to AC. The first
of these principles, which is also the most important, is a corollary of
the following theorem.

1.5. Theorem
Every set is equipollent to an ordinal.

PROOF
Let A be a set, and let 5 be the set PA — {©} of all non-empty subsets
of A. By AC there exists a choice function g on 3. Since A is a set, it
cannot be the universal class (Thm. 1.3.10); so there exists an object b
that does not belong to A.
We now define a function C whose domain is the class of all sets, as
follows: for any set x we put
F [g(A —ranx) ifx isa map such that ranx C A,
(*) o |b otherwise.

Using transfinite recursion (Thm. 4.6.7), we get a function F with W
as domain, satisfying the recursion equation Fξ = C(F↾ξ) for all
ξ ∈ W. Combining this equation with (*), we obtain for all ξ:

      Fξ = g(A − ran(F↾ξ))   if ran(F↾ξ) ⊂ A,
      Fξ = b                 otherwise.

Let ξ be any ordinal such that Fξ ≠ b. This means that F↾ξ must be a
map from ξ to A, and

      Fξ = g(A − ran(F↾ξ)) ∈ A − ran(F↾ξ).

Thus Fξ is a ‘fresh’ member of A, different from Fη for all η < ξ.


(What happens is that so long as A is not exhausted by previous values
of F, the new value Fξ is chosen, using the choice function g, as a
fresh member of A.)
If Fξ ≠ b for all ordinals ξ, it would follow that F is an injection
from the proper class W (Cor. 4.2.18) to the set A. This is impossible
by Prob. 2.4.5. So there must exist some ordinal ξ for which Fξ = b.
Let α be the least ordinal such that Fα = b. Such an α exists by the
Least Ordinal Principle (Thm. 4.4.1). Then it is easy to see that F↾α is
an injection from α (that is, from the set {ξ : ξ < α}) to A. Also,
ran(F↾α) cannot be a proper subset of A. Thus F↾α is in fact a
bijection from α to A. ∎
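
The recursion in this proof can be tried out on a finite set, where no appeal to AC is needed. The following Python sketch is only an illustration under that assumption: the name enumerate_with_choice is hypothetical, the list F stands in for F↾ξ, and the built-in min plays the role of the choice function g.

```python
def enumerate_with_choice(A, g):
    """Finite analogue of F: the next value is g(A - ran(F|n)) while A is not exhausted."""
    A = frozenset(A)
    F = []                          # F[n] plays the role of F applied to the ordinal n
    while True:
        remaining = A - frozenset(F)
        if not remaining:           # ran(F|n) = A: the next value would be the marker b
            return F
        F.append(g(remaining))      # a 'fresh' member of A, chosen by g


# Hypothetical choice function: take the least element in the usual string order.
g = min
F = enumerate_with_choice({"c", "a", "b"}, g)
```

The returned list is a bijection from the ordinal len(F) to A, which is the finite counterpart of F↾α.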

1.6. Corollary (Well-Ordering Theorem)


For every set A there exists a well-ordering on A.

PROOF

By Thm. 1.5, there exists a bijection F from an ordinal α to A. Now
put

      < =df {⟨Fξ, Fη⟩ : ξ < η < α}.

This means that for any members x and y of A, x < y iff ξ < η, where
ξ and η are the (necessarily unique) ordinals < α such that x = Fξ and
y = Fη. Clearly, < is a well-ordering on A. ∎

1.7. Remarks

(i) With F, α and < as above, F↾α is a similarity map from α to the
well-ordered set ⟨A, <⟩.
(ii) Thm. 1.5 and Cor. 1.6 are equivalent to each other. Indeed, the
former can easily be deduced from the latter using the Representation
Theorem 4.5.12. We shall therefore refer to both Thm.
1.5 and Cor. 1.6 as the WOT.
Another important consequence of Thm. 1.5 is that the class of
cardinals is totally ordered (see Def. 2.3.11 (ii)):

1.8. Corollary
For any sets A and B, |A| ≤ |B| or |B| ≤ |A|.

PROOF

By Thm. 1.5, A and B are equipollent to ordinals, say α and β
respectively. Since the class of ordinals is ∈-well-ordered, it follows
(see Lemma 4.2.4) that α ∈ β or α = β or β ∈ α. But ordinals are
transitive sets, hence α ⊆ β or β ⊆ α. ∎

§2. From the WOT via Zorn’s Lemma back to AC


We start by proving two simple lemmas about finite sets, which do not
depend on AC.

2.1. Lemma

If B ⊂ A and A is equipollent to a finite ordinal α, then B is equipollent
to an ordinal β < α. Hence every subset of a finite set is finite.

PROOF

Let B ⊂ A, where A is equipollent to a finite ordinal α. Then B is
clearly equipollent to some D ⊂ α. By Prob. 4.5.14(i), D is similar
(and hence equipollent) to some ordinal β ≤ α. However, since here α
is finite, Thm. 4.3.12 excludes the possibility that β = α. Therefore
β < α. ∎

2.2. Lemma
If f is a map such that dom f is finite then ran f is finite as well.

PROOF
By Def. 4.3.15, dom f is equipollent to a finite ordinal α. Without loss
of generality we may therefore assume that dom f is α itself. (Otherwise,
replace f by f∘h, where h is a bijection from α to dom f.)
Define a map g from ran f to α by putting, for each x ∈ ran f,

      gx =df the least ξ ∈ α such that fξ = x.

It is easy to see that g is injective, hence it is a bijection from ran f to
some subset D of α. By Lemma 2.1, D is finite; therefore so is ran f. ∎

Next, we lay down a few definitions.

2.3. Definition
Let < be a partial order on a class A. A member a of A is said to be
maximal in A with respect to < if there is no x € A such that a < x.

2.4. Remarks

(i) When there is no risk of confusion, we shall omit the phrase ‘in A
with respect to <’.
(ii) In general, A may not have a maximal member; or it may have
more than one.
(iii) Do not confuse maximal with greatest. However, if < is a total
order on A and a is maximal in A then a is also the greatest
member of A, in the sense that x < a for any other x ∈ A. In this
case it is clear that A cannot have more than one maximal member.

2.5. Definition
If 𝒜 is any class of sets, we put

      ⊂_𝒜 =df {⟨X, Y⟩ : X ∈ 𝒜, Y ∈ 𝒜 and X ⊂ Y}.

⊂_𝒜 is called the restriction of ⊂ to 𝒜.

2.6. Remarks

(i) We can also characterize the relation ⊂_𝒜 by saying that, for any
X and Y,

      X ⊂_𝒜 Y ⇔ X ∈ 𝒜 and Y ∈ 𝒜 and X ⊂ Y.

(ii) As noted in Ex. 2.3.8, if 𝒜 is any class of sets, ⊂_𝒜 is a [sharp]
partial order on 𝒜.

2.7. Definition
A class 𝒜 of sets is of finite character if, for any set X,

      X ∈ 𝒜 ⇔ Y ∈ 𝒜 for every finite Y ⊆ X.


We shall use the WOT to prove the following useful result.

2.8. Theorem (Tukey–Teichmüller Lemma)

If 𝒜 is a set of finite character, then for every A ∈ 𝒜 there exists an
M ∈ 𝒜 such that A ⊆ M and M is maximal in 𝒜 w.r.t. ⊂_𝒜.

PROOF

By the WOT, 𝒜 is equipollent to some ordinal α. Let G be a bijection
from α to 𝒜. Thus

      𝒜 = {Gξ : ξ < α}.

Take any A ∈ 𝒜; we shall hold A fixed for the rest of the proof.
Without loss of generality, we may assume that A = G∅; otherwise,
we could compose G with the bijection from 𝒜 to itself that interchanges
A with G∅ and leaves all other members of 𝒜 alone.
Using transfinite recursion restricted to α (see Rem. 4.6.8(ii)), we
define a map F on α such that, for every ξ < α,

      Fξ = Gξ                 if ⋃{Fη : η < ξ} ⊆ Gξ,
      Fξ = ⋃{Fη : η < ξ}      otherwise.

(Note that {Fη : η < ξ} = ran(F↾ξ), so that here Fξ is indeed being
determined in terms of F↾ξ, as required in transfinite recursion.)
It is clear that F is monotone in the sense that whenever η < ξ < α
then Fη ⊆ Fξ.
We claim that Fξ ∈ 𝒜 for every ξ < α. We shall prove this claim by
strong transfinite induction restricted to α. Let ξ < α; our induction
hypothesis is that Fη ∈ 𝒜 for every η < ξ.
Now, Fξ is Gξ or ⋃{Fη : η < ξ}. Since certainly Gξ ∈ 𝒜, we need
only prove that the union ⋃{Fη : η < ξ} belongs to 𝒜. But 𝒜 is a set
of finite character. So it is enough to show that every finite subset of
⋃{Fη : η < ξ} belongs to 𝒜. We need only deal with non-empty
subsets, since ∅ is a finite subset of A, and as such must in any case
belong to 𝒜.
Let B be a non-empty finite subset of ⋃{Fη : η < ξ}. Then for each
b ∈ B there exists some η < ξ such that b ∈ Fη. Define a map f from
B to ξ by putting, for each b ∈ B,

      fb =df the least η < ξ such that b ∈ Fη.

By Lemma 2.2, ran f is a finite non-empty set of ordinals < ξ. Hence
by Prob. 4.3.17(ii) ran f has a greatest member, say η*. This means
that for every b ∈ B we have fb ≤ η*; and, since F is monotone, it
follows that F(fb) ⊆ F(η*). But by the definition of f we have
b ∈ F(fb); hence

      b ∈ F(fb) ⊆ F(η*) for every b ∈ B.

Thus B ⊆ F(η*). But η* < ξ, so by our induction hypothesis F(η*)
belongs to 𝒜; and since 𝒜 is of finite character B, as a finite subset of
F(η*), must also belong to 𝒜. This completes the proof that Fξ ∈ 𝒜
for every ξ < α.
We now put M = ⋃{Fη : η < α}. We shall show that M has the
properties claimed by our theorem. The fact that M ∈ 𝒜 is proved by
showing, exactly as before, that every finite subset of M belongs to 𝒜.
Also, it is easy to see that F∅ = G∅ = A, hence A ⊆ M.
It remains to show that M is maximal w.r.t. ⊂_𝒜. Suppose this were
not so. Then there would be some X ∈ 𝒜 such that M ⊂ X. Now, X
must be Gξ for some ξ < α, so the assumption M ⊂ X means that
⋃{Fη : η < α} ⊂ Gξ. Hence, a fortiori,

      ⋃{Fη : η < ξ} ⊆ Gξ.

But in this case the definition of F says that Fξ = Gξ. It would then
follow that ⋃{Fη : η < α} ⊂ Fξ, which is impossible. ∎
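
For a finite family the construction of F reduces to a single sweep through an enumeration G of 𝒜. The Python sketch below is a finite illustration only, not part of the proof: the names tt_maximal and is_chain are hypothetical, and as the family of finite character we take, purely for the sake of example, the chains of the divisibility order on {1, ..., 12}.

```python
from itertools import combinations


def tt_maximal(family, A):
    """Sweep through G (with G_0 = A), moving to G_xi whenever it includes the union so far."""
    G = [A] + [X for X in family if X != A]   # enumeration of the family starting with A
    current = frozenset()                      # union of the values of F obtained so far
    for X in G:
        if current <= X:                       # the case 'union is included in G_xi'
            current = X
    return current


def is_chain(X):
    # X is a chain in the divisibility order: any two members are comparable
    return all(x % y == 0 or y % x == 0 for x, y in combinations(X, 2))


universe = range(1, 13)
family = [frozenset(c) for r in range(len(universe) + 1)
          for c in combinations(universe, r) if is_chain(c)]
A = frozenset({2})
M = tt_maximal(family, A)
```

Because F is monotone, the union of the earlier values is just the latest value, which is why one variable suffices here.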

2.9. Definition
Let ⟨A, <⟩ be a poset. A chain in ⟨A, <⟩ is any subset C of A such
that, for all x and y in C, x < y or x = y or y < x.

2.10. Remark
In other words, a chain in (A, <) is a subset of A that is totally
ordered by the restriction of < to it.

We shall use the Tukey–Teichmüller (TT) Lemma to prove:

2.11. Theorem (Hausdorff Maximality Principle)
Let ⟨A, <⟩ be a poset and let 𝒞 be the set of all chains in ⟨A, <⟩.
Then every member of 𝒞 is included in some member of 𝒞 that is
maximal w.r.t. ⊂_𝒞.

PROOF

The condition for C being a chain in ⟨A, <⟩ (see Def. 2.9) involves
only two members of C at a time. Hence it is easy to see that the set 𝒞
of all chains is of finite character. Therefore the TT Lemma applies to
𝒞. ∎

The most famous and frequently used of all the maximality principles
that are equivalent to AC is generally known as ‘Zorn’s Lemma’
although it is arguably due to Kuratowski, who published a version of
it in 1922, thirteen years before Zorn. We shall now deduce it from the
Hausdorff Maximality Principle (HMP). (For the meaning of upper
bound, see Def. 4.2.23.)

2.12. Theorem (Zorn’s Lemma)


Let ⟨A, <⟩ be a poset such that every chain in it has an upper bound in
A. Then for each a ∈ A there is some u ∈ A such that u is maximal in A
w.r.t. < and such that a ≤ u.

PROOF

As before, let 𝒞 be the set of all chains in ⟨A, <⟩, and consider the
poset consisting of 𝒞 with the partial order ⊂_𝒞 on it.
The singleton {a} is, trivially, a chain in ⟨A, <⟩. Hence by the
HMP {a} is included in a chain C that is maximal in 𝒞 w.r.t. ⊂_𝒞. By
hypothesis, C has an upper bound u in A. Since a ∈ C, it follows that
a ≤ u.
It remains to show that u is maximal in A. Suppose it were not
maximal. Then there would exist some v such that u < v. Since u is an
upper bound for C, it would follow that x < v for all x ∈ C. But then
C ∪ {v} would be a chain that properly includes C, contradicting the
maximality of C in 𝒞. ∎

We have shown that


      AC ⇒ WOT ⇒ TT Lemma ⇒ HMP ⇒ Zorn’s Lemma.

Now we shall complete the cycle:



2.13. Theorem
AC follows from Zorn’s Lemma.

PROOF
Let 𝒮 be a set of non-empty sets. We must show that there exists a
choice function on 𝒮.
If 𝒮 is empty then ∅ is the required choice function. So from now on
we may assume that 𝒮 is non-empty.
Let us say that f is a partial choice function (pcf), if f is a choice
function on a subset of 𝒮. Such creatures do exist: for example, if A is
any member of 𝒮 and a is any member of A then {⟨A, a⟩} is a choice
function on {A} and hence a pcf. Let ℱ be the set of all pcfs. (It is
easy to verify that ℱ is indeed a set; DIY.) As we have just seen, ℱ is
non-empty.
We now consider the poset ⟨ℱ, ⊂_ℱ⟩. Note that if f and g are pcfs,
then f ⊆ g means that dom f ⊆ dom g and fX = gX for each X ∈
dom f.
We shall show that ⟨ℱ, ⊂_ℱ⟩ satisfies the condition of Zorn’s
Lemma. To this end, let us consider any chain 𝒞 in this poset. We
claim that its union, ⋃𝒞, is an upper bound for 𝒞 in ℱ.
For any f ∈ 𝒞 we obviously have f ⊆ ⋃𝒞. So it only remains to show
that ⋃𝒞 belongs to ℱ; in other words, that ⋃𝒞 is a pcf.
Since every member of 𝒞, being a pcf, is a set of ordered pairs
⟨X, x⟩ such that x ∈ X ∈ 𝒮, it is clear that ⋃𝒞 likewise is a set of
ordered pairs of this kind. It only remains to show that ⋃𝒞 is a
function.
Now, if both f and g are members of 𝒞 then, since 𝒞 is a chain,
we must have f ⊆ g or g ⊆ f. Therefore if X ∈ dom f ∩ dom g then
fX = gX. Thus the coherence condition is fulfilled, showing that ⋃𝒞 is
indeed a function (see Prob. 2.4.8).
We can now apply Zorn’s Lemma to the poset ⟨ℱ, ⊂_ℱ⟩. Since ℱ is
non-empty, it follows from the Lemma that there exists some g ∈ ℱ
that is maximal w.r.t. ⊂_ℱ. Such g is a pcf, that is, a choice function on a
subset of 𝒮. However, if dom g were not the whole of 𝒮, we could take
any A ∈ 𝒮 − dom g and any a ∈ A, and put

      f = g ∪ {⟨A, a⟩}.

Then f would be a pcf such that g ⊂ f, contradicting the maximality of
g. Therefore g must be a choice function on the whole of 𝒮. ∎

2.14. Remarks

(i) We have now established

      AC ⇒ WOT ⇒ TT Lemma ⇒ HMP ⇒ Zorn’s Lemma ⇒ AC,

hence these five principles are mutually equivalent.


(ii) These principles can be deduced directly from each other, with-
out going round the cycle. Some of these deductions are quite
easy. For example, to deduce AC directly from the WOT, let 𝒮
be any set of non-empty sets. Note that if X ∈ 𝒮 then X ⊆ ⋃𝒮.
By the WOT, there exists a well-ordering < on ⋃𝒮. Then a
choice function g on 𝒮 is obtained by putting, for all X ∈ 𝒮,

      gX = the least member of X w.r.t. <.

It is also not difficult to deduce the TT Lemma directly from


Zorn’s Lemma (DIY!). However, the only direct routes I know
from AC to the three maximality principles (TT Lemma, HMP
and Zorn’s Lemma) are quite rocky.
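
The direct deduction of AC from the WOT described in Rem. 2.14(ii) is easy to imitate for a finite family of finite sets. In the Python sketch below everything is an assumption made for illustration: the name choice_from_well_ordering is hypothetical, and the well-ordering of the union is simply the position of each member in an arbitrary fixed listing.

```python
def choice_from_well_ordering(S, rank):
    """Return g (as a dict) with g[X] = the least member of X w.r.t. the order given by rank."""
    return {X: min(X, key=rank) for X in S}


# A hypothetical family of non-empty sets.
S = [frozenset({"pear", "fig"}), frozenset({"kiwi"}), frozenset({"fig", "plum", "kiwi"})]

listing = sorted(set().union(*S))                   # a fixed listing of the union of S
rank = {x: i for i, x in enumerate(listing)}.get    # position in the listing well-orders the union
g = choice_from_well_ordering(S, rank)
```

Every X in S is a subset of the union, so the least member of X with respect to the chosen order always exists and belongs to X.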
6
Finite cardinals and alephs

§1. Finite cardinals


1.1. Preview
In this chapter we will complete the definition of cardinal and cardinal-
ity, which has so far been left open (see Rem. 3.1.4), and derive some
important results about cardinals. In the present section we confine
ourselves to finite sets and cardinals; here we shall not invoke AC.
Recall that by Def. 4.3.15 a set is finite iff it is equipollent to a finite
ordinal (that is, an ordinal < ω); moreover, by Thm. 4.3.14 this
ordinal is unique. Hence the following definition is legitimate.

1.2. Definition
For any finite set A, the cardinality |A| of A is the (necessarily unique
and finite) ordinal α such that A ~ α. A finite cardinal is an ordinal α
such that |A| = α for some finite set A.

1.3. Remarks

(i) Clearly, if A and B are finite sets then |A| =|B| iff A ~ B, as
required by the incomplete Def. 3.1.3.
(ii) By Def. 1.2, a finite cardinal is a finite ordinal. Conversely, if a is
a finite ordinal, then obviously |a| = a. Thus the finite cardinals
are just the finite ordinals by another name.
(iii) Let n be any natural number. By Def. 3.3.1 and Prob. 4.3.18, the
corresponding cardinal, n, is finite. This result also follows from
the next theorem, in which we calculate these cardinals.


1.4. Theorem

(i) 0 = ∅; 1 = {∅}.
(ii) If α is a finite cardinal, α + 1 = α′; hence α + 1 is a finite
cardinal.
(iii) If m is a natural number and n = m + 1 then n = {0, 1, ..., m}.

PROOF

(i) By Def. 3.3.1, 0 = |∅| and 1 = |{∅}|. But by Thm. 4.3.3 ∅ as
well as ∅′ (which by Def. 4.2.26 is {∅}) are finite ordinals;
hence |∅| = ∅ and |{∅}| = {∅}.
(ii) Here + is the operation of cardinal addition; so by Def. 3.4.4,

      α + 1 = |A ∪ B|,

where A and B are any disjoint sets such that |A| = α and
|B| = 1.
As A we take α itself. As B we may then take any set
equipollent to 1 (that is, any singleton) provided it is disjoint
from α. We put B = {α}, which is disjoint from α because an
ordinal cannot belong to itself (see Rem. 4.2.19(ii)). Hence

      α + 1 = |α ∪ {α}|.

But by Def. 4.2.26 this is |α′|. Moreover, by Thm. 4.3.3(ii), since
α is a finite ordinal so is α′. Hence α + 1 = α′, which (as we
have just noted) is a finite ordinal.
(iii) We proceed by weak mathematical induction on m. For m = 0, n
is 1 and the required result, 1 = {0}, follows at once from (i).
Now assume, as induction hypothesis, that m is a number for
which (iii) holds. Let p = (m + 1) + 1 = n + 1. Then

      p = n + 1                     by Thm. 3.4.6,
        = n ∪ {n}                   by (ii),
        = {0, 1, ..., m} ∪ {n}      by ind. hyp.,
        = {0, 1, ..., m, n}. ∎
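
Thm. 1.4 can be checked mechanically for small numbers. The following Python sketch is an illustration only: the names successor and ordinal are hypothetical, and finite ordinals are modelled as nested frozensets, with ∅ as the empty frozenset and α′ as α ∪ {α}.

```python
def successor(a):
    """The successor a' = a ∪ {a} of a finite ordinal a."""
    return a | frozenset({a})


def ordinal(n):
    """The finite ordinal obtained from the empty set by n successor steps."""
    a = frozenset()
    for _ in range(n):
        a = successor(a)
    return a


three = ordinal(3)
members_of_three = {ordinal(0), ordinal(1), ordinal(2)}
```

Here three has exactly the members ordinal(0), ordinal(1) and ordinal(2), in agreement with part (iii) of the theorem.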

1.5. Theorem
For any finite cardinals α and β, α + β is a finite cardinal. Moreover,
α + 0 = α and α + β′ = (α + β)′.

PROOF
By Prob. 3.4.7(iii), the equality α + 0 = α holds for all cardinals α,
not just for finite ones.
To prove that α + β is a finite cardinal, we apply to β induction on
finite ordinals.
For β = ∅, the sum α + β is α + 0 by Thm. 1.4(i), and we have just
seen that this is the finite cardinal α.
Now assume, as induction hypothesis, that β is a finite cardinal such
that α + β is also a finite cardinal. Then

      α + β′ = α + (β + 1)      by Thm. 1.4(ii),
             = (α + β) + 1      by Prob. 3.4.7(i).

By our induction hypothesis, α + β is a finite cardinal; hence by Thm.
1.4(ii) so is (α + β) + 1. This shows that α + β′ is a finite cardinal, and
completes the induction on β.
Finally, we have just shown, for any finite cardinals α and β, that
α + β′ = (α + β) + 1. By Thm. 1.4(ii) this equals (α + β)′. ∎

1.6. Theorem
For any finite cardinals α and β, α·β is a finite cardinal. Moreover,
α·0 = 0 and α·β′ = α·β + α.

PROOF

DIY: proceed as in the proof of Thm. 1.5, using Prob. 3.5.5. ∎

1.7. Problem

Prove that if < is a [sharp] total order on a finite set A, then < is a
well-ordering on A. (Apply induction on finite ordinals to |A|. For any
non-empty subset B of A you must show that B has a least member. If
B ⊂ A, use Lemma 5.2.1. If B is A itself, let a be any member of A
and apply the induction hypothesis to A — {a}.)

1.8. Remark

In 1889, Peano proposed an axiomatization of the theory of natural
numbers.¹ In addition to some purely logical axioms (which must be
satisfied by any system whatever) he proposed five postulates which we
now state, with some inessential modifications.

¹ A translation of his paper, ‘The principles of arithmetic, presented by a new method’,
is in van Heijenoort, From Frege to Gödel.

(1) 0 is a natural number.


(2) Every natural number m has a unique successor, s(m).
(3) If m and n are distinct natural numbers, then s(m) # s(n).
(4) For every natural number m, s(m) #0.
(5) (Principle of Mathematical Induction.) Let K be any set such
that K contains 0 and such that if it contains any natural number
m then it also contains s(m); then K contains every natural
number.

The operations of addition and multiplication of natural numbers


can then be introduced by means of four further postulates that assert,
for any natural numbers n and m:

(6) m + 0 = m.
(7) m + s(n) = s(m + n).
(8) m·0 = 0.
(9) m·s(n) = m·n + m.

Intuitively speaking, it is clear that these nine postulates express truths


about the system of natural numbers. And in fact they are adequate
for an informal axiomatic development of the arithmetic of natural
numbers.
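
Postulates (6) to (9) can be read as recursive definitions and run directly. The Python sketch below does this on finite ordinals modelled as nested frozensets; it is an illustration under that assumption, and the names ZERO, s, pred, add, mul and numeral are hypothetical. It computes by recursion on the second argument and is not an implementation of cardinal addition as defined in Def. 3.4.4.

```python
ZERO = frozenset()


def s(a):
    """Successor: a' = a ∪ {a}."""
    return a | frozenset({a})


def pred(a):
    """Predecessor of a non-zero finite ordinal: its member with the most elements."""
    return max(a, key=len)


def add(m, n):
    # (6) m + 0 = m;  (7) m + s(n) = s(m + n)
    return m if n == ZERO else s(add(m, pred(n)))


def mul(m, n):
    # (8) m·0 = 0;  (9) m·s(n) = m·n + m
    return ZERO if n == ZERO else add(mul(m, pred(n)), m)


def numeral(k):
    """The finite ordinal reached from ZERO by k successor steps."""
    a = ZERO
    for _ in range(k):
        a = s(a)
    return a


five = add(numeral(2), numeral(3))
six = mul(numeral(2), numeral(3))
```

The recursion terminates because pred strictly decreases the second argument until it reaches ZERO.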
Now, speaking more formally, in ZF we have proved for the finite
cardinals (a.k.a. finite ordinals) theorems that are exact counterparts
of Peano’s postulates. To be precise: if in the statement of these
postulates we replace the words ‘natural number’ by ‘finite cardinal’,
and the symbols ‘0’ and ‘s’ respectively by ‘∅’ and ‘′’ (writing the latter
to the right of its argument instead of to its left) and if we understand
the symbols for addition and multiplication as denoting respectively
addition and multiplication of cardinals, then all nine postulates become
theorems of ZF. In this sense, the system consisting of the set ω
of finite cardinals together with the operations of succession, addition
and multiplication on these cardinals, provides in ZF a model for
Peano’s postulates.
Moreover, this model is structurally unique in the following sense. In
ZF it is not difficult to prove that any system of objects and operations
satisfying the appropriate re-interpretation of Peano’s postulates must
be structurally identical, an exact structural replica of (technically
speaking: isomorphic to) the system of the finite cardinals.
In this sense, the finite cardinals play within ZF the role of natural

numbers. And mathematicians developing (or simulating) various


branches of mathematics within set theory are justified in identifying
the finite cardinals with the natural numbers, for the purpose of this
activity (cf. Rem. 4.3.8).

1.9. Warning
All this does not quite answer the question whether the ZF system of
finite cardinals is a faithful and correct representation of the (informal)
system of natural numbers, which mathematicians had studied long
before the invention of set theory.
Note that for any natural number n, we can prove that the corresponding
cardinal n is a finite cardinal (Thm. 1.4, or Def. 3.3.1 and
Prob. 4.3.18). But we have not proved that
(*) Every finite cardinal has the form n for some natural number n.

At first sight it seems easy to prove (*) by applying induction on finite
ordinals (Cor. 4.3.7) to the ‘set’

      {α ∈ ω : α = n for some natural number n}.

But in order to be able to do so, we must first prove that such a set
exists as an object of set theory. This, in turn, requires the property
being a natural number, in terms of which this would-be set is defined,
to be a set-theoretic concept (see discussion at the end of § 2 and
beginning of § 3 of Ch. 1). But we have taken the notion of natural
number as given in advance, prior to the development of set theory (cf.
Rem. 3.3.6); and without begging the question we cannot presuppose
that it is also a set-theoretic notion.
We have no assurance that the ZF system of finite cardinals is a
faithful and correct representation of the pre-ZF informal system of
natural numbers, so long as the status of (*) is in question. We shall
see in the Appendix that this question has a rather surprising answer.

§2. Cardinals in general


To extend the definition of cardinality to infinite sets, we invoke AC,
via the WOT (Thm. 5.1.5). According to this theorem, every set A is
equipollent to some ordinal, and hence by the Least Ordinal Principle
(Thm. 4.4.1) there is a unique /east ordinal to which A is equipollent.

2.1. Definition
For any set A, the cardinality |A| of A is the least ordinal α such that
A ~ α. A cardinal is an ordinal α such that |A| = α for some set A.

2.2. Remarks
(i) This definition obviously agrees with Def. 1.2 when A is a finite
set.
(ii) Def. 2.1 clearly satisfies the condition imposed in Def. 3.1.3: for
any sets A and B, |A| =|B| iffA ~ B.
(iii) From Def. 2.1 it follows at once that a cardinal is an ordinal that
is not equipollent to any smaller ordinal. Conversely, if an
ordinal α is not equipollent to any smaller ordinal, then clearly
|α| = α, so that α is a cardinal.
(iv) If λ and μ are cardinals, then the statement ‘λ ≤ μ’ is apparently
ambiguous, because we can interpret ‘≤’ according to Def. 4.2.21
(that is, as denoting the order on the class of ordinals) or
according to Def. 3.2.1. In the next lemma we shall prove that
these two interpretations are in fact equivalent. In the formulation
and proof of this lemma we shall use the symbol ‘≤’ in the
sense of Def. 3.2.1 only, so as not to prejudge the issue. Thereafter,
we shall revert to using ‘≤’ in either sense, as it will make
no difference.

2.3. Lemma
For any cardinals λ and μ, λ ≤ μ (in the sense of Def. 3.2.1) iff λ ∈ μ or
λ = μ.

PROOF
Suppose λ ∈ μ or λ = μ. Since ordinals are transitive sets, it follows that
λ ⊆ μ. Hence by Thm. 3.2.3 |λ| ≤ |μ|. But λ and μ are cardinals, so
|λ| = λ and |μ| = μ. Thus λ ≤ μ.
Conversely, suppose that λ ∉ μ and λ ≠ μ. Then, since the class of
ordinals is ∈-well-ordered, we must have μ ∈ λ. In the same way as
before, it now follows that μ ≤ λ. Hence we cannot have λ ≤ μ, as by
the Schröder–Bernstein Theorem 3.2.7 it would then follow that λ = μ,
contrary to hypothesis. ∎

2.4. Problem
Prove that if α is an infinite ordinal then |α| = |α′|; hence α′ cannot be
a cardinal. (Let f be the map such that dom f = α′, fξ = ξ′ for all
finite ξ, fξ = ξ for all infinite ξ < α, and fα = ∅. Show that f is a
bijection from α′ to α.)

2.5. Theorem
ω is the least infinite cardinal. ∎

2.6. Theorem
If A is a set of cardinals, then ⋃A is the lub of A in the class of all
cardinals, that is, the least cardinal λ such that ξ ≤ λ for all ξ ∈ A.

PROOF

For each ξ ∈ A we have ξ ⊆ ⋃A by Def. 1.3.11, hence |ξ| ≤ |⋃A| by
Thm. 3.2.3. But ξ is a cardinal, hence |ξ| = ξ. Thus ξ ≤ |⋃A| for all
ξ ∈ A. This shows that the cardinal |⋃A| is an upper bound for A.
Note that |⋃A|, being a cardinal, is a fortiori an ordinal. But by
Thm. 4.2.25 ⋃A itself is the least upper bound of A in the class of all
ordinals, hence ⋃A ≤ |⋃A|.
On the other hand, from Def. 2.1 it is clear that |α| ≤ α for every
ordinal α. Since by Thm. 4.2.25 ⋃A is an ordinal, it follows that
|⋃A| ≤ ⋃A. Hence |⋃A| is ⋃A itself, and is the lub of A in the class
of cardinals. ∎

2.7. Theorem

For any set A of cardinals there exists a cardinal (and, in particular, an


infinite cardinal) greater than all the members of A.

PROOF

Let λ be the lub of A obtained in Thm. 2.6. By Cantor’s Theorem
3.6.8, there exists a cardinal μ such that λ < μ, and hence also ξ < μ for
all ξ ∈ A. If μ is infinite, there is nothing further to prove. If μ is finite,
then ω is an infinite cardinal such that μ < ω and hence also ξ < ω for
all ξ ∈ A. ∎

2.8. Corollary
The class of all cardinals is a proper class. ∎

2.9. Lemma
We can define a (necessarily unique) function F such that dom F = W
and for every ordinal a,

      Fα = the least infinite cardinal not belonging to ran(F↾α).

PROOF

This follows from Thm. 4.6.7 (definition by transfinite recursion). We
only need to take as the C of that theorem a function such that
whenever x is a set that is also a function, Cx is the least infinite cardinal
not belonging to ran x. (Note that ran x is a set by AR, hence by Thm. 2.7
there exists an infinite cardinal not belonging to it; so by the Least
Ordinal Principle 4.4.1 there is a least such cardinal.) ∎

2.10. Definition
For any ordinal α,

      ℵ_α =df Fα,

where F is the function of Lemma 2.9.

2.11. Remarks
(i) ‘ℵ’ is aleph, the first letter in the Hebrew alphabet. It is also the
first letter of the Hebrew word ‘אינסוף’ (einsoph, meaning
infinity), which is a cabbalistic appellation of the deity. The
notation is due to Cantor, who was deeply interested in mysticism.
(ii) Combining Def. 2.10 with the characterization of F in Lemma
2.9, we obtain:

      ℵ_α = the least infinite cardinal not belonging to the set
      {ℵ_ξ : ξ < α}.

2.12. Theorem
(i) For any α, ℵ_α is an infinite cardinal.
(ii) For any ordinals α and β, α < β ⇒ ℵ_α < ℵ_β.
(iii) ℵ_0 = ω.

PROOF
All three statements follow easily from Rem. 2.11(ii). ∎

2.13. Theorem
Every infinite cardinal is ℵ_α for some ordinal α.

PROOF
From Thm. 2.12(ii) it follows that α ≠ β ⇒ ℵ_α ≠ ℵ_β. This means that
the function F of Lemma 2.9 is a bijection from the class W of all
ordinals to the class {ℵ_α : α ∈ W} of all alephs. Since W is a proper
class (Cor. 4.2.18), it follows from Prob. 2.4.5 that the class of all
alephs must likewise be a proper class.
Now let λ be any infinite cardinal. Then λ, being an ordinal, is a set.
Hence there must be some α such that ℵ_α ∉ λ; otherwise the set λ
would include the class of all alephs, and by AS the latter would be a
set, contrary to what we have just shown.
Since both λ and ℵ_α are ordinals, the fact that ℵ_α ∉ λ implies that
λ ≤ ℵ_α. If λ = ℵ_α, then there is nothing further to prove. On the other
hand, if λ < ℵ_α then by Rem. 2.11(ii) it follows that λ belongs to the
set {ℵ_ξ : ξ < α}. Hence λ = ℵ_ξ for some ξ < α. ∎

2.14. Remarks
(i) By Thms. 2.12 and 2.13, the alephs are just the infinite cardinals
by another name. Moreover, each infinite cardinal is an ℵ_α for
some unique ordinal α.
(ii) The theory of real numbers, as other branches of mathematics,
can be developed within set theory. In doing so, one identifies
the finite cardinals with the natural numbers (see Rem. 1.8). It is
then not difficult to show that Pℵ_0 (= Pω by Thm. 2.12(iii)) is
equipollent to the continuum, that is, the set of all real numbers. (It is
also equipollent to the set of all real numbers lying in any given
interval, for example, between 0 and 1.) The cardinal |Pℵ_0| is
therefore the cardinality of the continuum.
Cantor conjectured (but was unable to prove) that |Pℵ_0| = ℵ_1.
This conjecture is known as the Continuum Hypothesis (CH).
More generally, the Generalized Continuum Hypothesis
(GCH) is the conjecture that |Pℵ_α| = ℵ_α′ for every α.
(iii) In 1938 Gödel proved that GCH is consistent relative to the
commonly accepted postulates of set theory, in the sense that if
they are consistent, then the addition of GCH does not result in
inconsistency. In 1963 P. J. Cohen proved that the same holds
also for the negation of CH (and hence GCH).

§3. Arithmetic of the alephs


3.1. Preview
In this section we shall present some important results in the arithmetic
of the alephs. Some of the proofs are given in a slightly abbreviated
form, omitting a few details. We present separately an outline of the
proof of Thm. 3.2, although it is a special case of Thm. 3.3. This is
done as a dry run, in order to display more clearly, in a simpler
context, the idea of the proof.

3.2. Theorem
ℵ_0 · ℵ_0 = ℵ_0.

PROOF (OUTLINE)
According to Def. 3.5.1, ℵ_0·ℵ_0 is the cardinality of the set A × B,
where A and B are any sets whose cardinality is ℵ_0. We shall take both
A and B to be ℵ_0 itself.
Recall that by Thm. 2.12(iii) ℵ_0 = ω, which is the set of finite
ordinals (as well as the set of finite cardinals). Thus we must show that
the set ω × ω of all ordered pairs of finite ordinals is equipollent to ω
itself.
For any ordinals ξ and η, we let max(ξ, η) be the greater of ξ and η.
(If ξ = η then max(ξ, η) is equal to both of them.)
We define an order < on the set ω × ω as follows. For any finite
ordinals ξ, η, φ and ψ we stipulate that ⟨ξ, η⟩ < ⟨φ, ψ⟩ iff one of the
following three conditions holds:

(1) max(ξ, η) < max(φ, ψ),
(2) max(ξ, η) = max(φ, ψ) and ξ < φ,
(3) max(ξ, η) = max(φ, ψ) and ξ = φ and η < ψ.

To make this clearer, here are the first few members of ω × ω, listed
according to the order <:

      ⟨0,0⟩,
      ⟨0,1⟩, ⟨1,0⟩, ⟨1,1⟩,
      ⟨0,2⟩, ⟨1,2⟩, ⟨2,0⟩, ⟨2,1⟩, ⟨2,2⟩,
      ⟨0,3⟩, ⟨1,3⟩, ⟨2,3⟩, ⟨3,0⟩, ⟨3,1⟩, ⟨3,2⟩, ⟨3,3⟩, ...

It is not difficult to see that ω × ω with this order on it is similar to ω
itself with its ∈-well-ordering. In particular, ω × ω is equipollent to ω.
∎
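
The order defined by conditions (1) to (3) can be generated and checked by machine for an initial segment of ω × ω. In the Python sketch below, pairs of finite ordinals are represented by pairs of Python integers; the names pairs_in_order, key and position are hypothetical, and position is one possible explicit form of the similarity map onto ω, offered as an assumption to be tested rather than as the map intended in the proof.

```python
from itertools import islice


def pairs_in_order():
    """Yield pairs of natural numbers in the order given by conditions (1)-(3)."""
    m = 0
    while True:
        # All pairs whose maximum is m, by first coordinate, then by second.
        yield from [(x, m) for x in range(m)]
        yield from [(m, y) for y in range(m + 1)]
        m += 1


def key(p):
    # Comparing (max, first, second) lexicographically is exactly (1), (2), (3).
    return (max(p), p[0], p[1])


def position(x, y):
    """Place of the pair (x, y) in the listing, counting from 0."""
    m = max(x, y)
    return m * m + (x if x < m else m + y)


first = list(islice(pairs_in_order(), 16))
```

The m*m term counts the pairs whose maximum is smaller than m, so each pair with maximum m lands in a block of length 2m + 1.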

3.3. Theorem
ℵ_α · ℵ_α = ℵ_α for any ordinal α.

PROOF

We proceed by transfinite induction. As induction hypothesis we
assume that ℵ_β·ℵ_β = ℵ_β for all β < α.
As in the proof of Thm. 3.2, we define an order < on ℵ_α × ℵ_α by
stipulating, for any ordinals ξ, η, φ and ψ smaller than ℵ_α, that
⟨ξ, η⟩ < ⟨φ, ψ⟩ iff one of the conditions (1), (2) and (3) listed there
holds.
It is easy to verify that < is a well-ordering on ℵ_α × ℵ_α. Hence, by
the Representation Theorem 4.5.12, there exists a similarity map f
from ℵ_α × ℵ_α to an ordinal δ. Since f is a bijection from ℵ_α × ℵ_α to δ,
it follows that ℵ_α·ℵ_α = |δ|. We shall show that this δ is in fact ℵ_α
itself.
First, note that ℵ_α = 1·ℵ_α ≤ ℵ_α·ℵ_α = |δ| ≤ δ. Now suppose that
ℵ_α < δ. This means that ℵ_α ∈ δ = ran f; so for some ξ and η, both
smaller than ℵ_α, we have f(ξ, η) = ℵ_α.
Since ξ and η are smaller than ℵ_α, their cardinalities are certainly
smaller than ℵ_α. Let ζ = max(ξ, η). Then |ζ| is either a finite cardinal
or some ℵ_β such that β < α.
Let us put A = {⟨φ, ψ⟩ : ⟨φ, ψ⟩ < ⟨ξ, η⟩}. Then, by the definition
of <, for each ⟨φ, ψ⟩ ∈ A we must have φ ≤ ζ and ψ ≤ ζ. Therefore
A is a subset of ζ′ × ζ′, hence |A| ≤ |ζ′|·|ζ′|.
If ζ is finite then ζ′ is finite as well and hence, by Thm. 1.6, so is
|A|.
If |ζ| = ℵ_β for some β < α, then by Prob. 2.4 |ζ′| = ℵ_β as well, so by
the induction hypothesis |A| ≤ ℵ_β. Thus in any case |A| is smaller than
ℵ_α.
However, since f(ξ, η) = ℵ_α, it follows that f↾A is a bijection from
A to ℵ_α and hence |A| = ℵ_α, contrary to what we have just shown.
This contradiction shows that δ must be equal to ℵ_α. ∎

3.4. Remark
In view of Thm. 2.13, Thm. 3.3 means simply that λ·λ = λ for any
infinite cardinal λ.

3.5. Theorem
If μ is an infinite cardinal and λ is any cardinal such that 1 ≤ λ ≤ μ,
then λ·μ = μ.

PROOF

Using Prob. 3.5.5 and Thm. 3.3, we have:

      μ = 1·μ ≤ λ·μ ≤ μ·μ = μ.

Thus both μ ≤ λ·μ and λ·μ ≤ μ. ∎

3.6. Theorem
If μ is an infinite cardinal and λ is any cardinal such that λ ≤ μ, then
λ + μ = μ.

PROOF

Using Probs. 3.4.7 and 3.5.5 and Thm. 3.3, we have:

      μ = 0 + μ ≤ λ + μ ≤ μ + μ = 1·μ + 1·μ = (1 + 1)·μ = 2·μ ≤ μ·μ = μ.

Thus both μ ≤ λ + μ and λ + μ ≤ μ. ∎

3.7. Theorem
If λ is an infinite cardinal and α is any finite cardinal other than 0, then
λ^α = λ.

PROOF

DIY, using induction on the finite ordinal α. ∎

3.8. Definition
Let A be a class. A map from an ordinal α to A is called an A-string of
length α. A map from a finite ordinal to A is called a finite A-string.

3.9. Theorem
Let A be an infinite set and let S be the class of all finite A-strings. Then
S is a set and |S| = |A|.

PROOF

If α is a finite ordinal then α ⊆ ω. Hence every finite A-string is a
subset of ω × A. It follows that S ⊆ P(ω × A); so S is a set by AP and
AS.
For each finite ordinal α, consider the set S_α = map(α, A) of all
A-strings of length α (see Def. 3.6.1 and Rem. 3.6.2). Clearly, the S_α
are pairwise disjoint and

S = ⋃{S_α : α < ω}.

Hence it is easy to see that

(*) |S| = Σ{|S_α| : α < ω}.

Let |A| = λ. Since S_α = map(α, A), it follows from Def. 3.6.3 that
|S_α| = λ^α,
which by Thm. 3.7 is equal to λ itself, except when α = 0, in which
case λ^α = 1. Therefore by (*) |S| is the sum of 1 and ℵ₀ times λ:

|S| = 1 + ℵ₀·λ.

Since ℵ₀ is the least infinite cardinal (Thms. 2.5 and 2.12(iii)), it
follows that ℵ₀ ≤ λ. Hence by Thms. 3.5 and 3.6 |S| = 1 + λ = λ. ∎
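
As a finite sanity check on the counting used in this proof, the following Python sketch lists the A-strings of each length over a small finite set and confirms that there are |A|^α strings of length α, with exactly one string of length 0. This is only an illustration: the theorem itself concerns infinite A, and the three-element set and the function name below are our own arbitrary choices.

```python
from itertools import product

def strings_of_length(alphabet, length):
    """All maps from {0, ..., length-1} to alphabet, written as tuples."""
    return list(product(alphabet, repeat=length))

# Hypothetical finite stand-in for the set A of the theorem.
A = ["a", "b", "c"]

# Number of A-strings of length 0, 1, 2, 3, 4: each equals len(A) ** length.
sizes = [len(strings_of_length(A, n)) for n in range(5)]
```

Strings of different lengths are tuples of different lengths, so the finite analogues of the sets S_α are pairwise disjoint, as in the proof.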
7. Propositional logic

§1. Basic syntax


We shall describe a formal language ℒ. This will be our object
language, an object of our discussion. It must be distinguished from
our metalanguage, the language in which the discussion is conducted:
ordinary English augmented by a special technical vocabulary.

1.1. Specification
The primitive symbols of ℒ fall into two mutually exclusive categories:

(i) an infinite set of propositional symbols;
(ii) two distinct connectives, ¬ and →, called negation symbol and
implication symbol respectively.

1.2. Warning
The statement just made does not mean that, for example, the
implication symbol of ℒ is a boldface arrow-shaped figure. (In fact, for
all we care ℒ may not have a written form at all!) Rather, the boldface
arrow is a syntactic constant, a symbol in our metalanguage, used as a
name for the implication symbol of ℒ.

1.3. Definition
If l is a natural number and s₁, s₂, ..., s_l are primitive symbols of ℒ,
not necessarily distinct, then the concatenation s₁s₂...s_l is called an
ℒ-string and the number l is called its length. (More formally, an
ℒ-string of length l can be defined as a map from the set {1, 2, ..., l}
to the set of primitive symbols of ℒ.) In particular, the empty ℒ-string
has length 0.
We shall usually omit the prefix ‘ℒ-’, and say simply ‘string’ rather
than ‘ℒ-string’. Similar ellipses will be used, when there is no risk of
confusion, in connection with other bits of terminology introduced
later on.

1.4. Definition
ℒ-formulas are strings constructed according to the following three
rules.
(1) A string consisting of a single occurrence of a propositional
symbol is an ℒ-formula.
(2) If β is an ℒ-formula then ¬β (the string obtained by concatenating
a single occurrence of ¬ and the string β, in this order) is an
ℒ-formula.
(3) If β and γ are ℒ-formulas then →βγ (the string obtained by
concatenating a single occurrence of →, the string β and the
string γ, in this order) is an ℒ-formula.

A formula constructed according to (1), that is, a single occurrence of
a propositional symbol, is called a prime formula.
A formula constructed according to (2) is called a negation formula;
here ¬β is the negation of β.
A formula constructed according to (3) is called an implication
formula; here β is the antecedent and γ the consequent of →βγ.
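
As an informal illustration (not part of the formal development), the three rules can be turned into a recognizer for formulas. The sketch below assumes, purely for the sake of the example, that a string is represented as a Python list of tokens, that the tokens ‘¬’ and ‘→’ stand for the two connectives, and that every other token counts as a propositional symbol; the function names are our own.

```python
NEG, IMP = "¬", "→"   # our names for the two connectives

def end_of_formula(tokens, start=0):
    """Return the index just past the formula beginning at `start`,
    or None if no formula begins there."""
    if start >= len(tokens):
        return None
    head = tokens[start]
    if head == NEG:                       # rule (2): ¬β
        return end_of_formula(tokens, start + 1)
    if head == IMP:                       # rule (3): →βγ
        mid = end_of_formula(tokens, start + 1)
        return None if mid is None else end_of_formula(tokens, mid)
    return start + 1                      # rule (1): a propositional symbol

def is_formula(tokens):
    """True iff the whole token list is a single formula."""
    return end_of_formula(tokens) == len(tokens)
```

For instance, `is_formula(["→", "p", "¬", "q"])` holds, whereas `["→", "p"]` and `["p", "q"]` are strings that are not formulas.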

1.5. Warnings
(i) In some books, particularly older ones, what we call ‘strings’ are
referred to as ‘formulas’, whereas what we call ‘formulas’ are
referred to as ‘well-formed formulas’ (‘wffs’).
(ii) Def. 1.4 does not mean that boldface lower-case Greek letters
are ℒ-formulas. Rather, they are syntactic variables, symbols in
our metalanguage used to range over ℒ-formulas.

1.6. Definition
A propositional symbol occurring in a formula α is called a prime
component of α.

1.7. Definition
The degree of complexity of a formula α (briefly, deg α) is the total
number of occurrences of connectives (¬ and →) in α.
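
Both notions are easy to compute. The following sketch assumes, for illustration only, that a formula is given as a Python list of tokens in which ‘¬’ and ‘→’ are the connectives and every other token is a propositional symbol; the function names are our own.

```python
NEG, IMP = "¬", "→"

def degree(tokens):
    """Total number of occurrences of connectives in the formula."""
    return sum(1 for t in tokens if t in (NEG, IMP))

def prime_components(tokens):
    """Set of propositional symbols occurring in the formula."""
    return {t for t in tokens if t not in (NEG, IMP)}

# The formula →¬p→pq has three connective occurrences
# and the two prime components p and q.
example = ["→", "¬", "p", "→", "p", "q"]
```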

1.8. Remark

We shall often wish to prove that all formulas α have some property P
(briefly, ∀αPα). This may be done by [strong] induction on deg α, as
follows. Define a property Q of natural numbers by stipulating that Q
holds for a given number n iff P holds for all formulas α such that
deg α = n. Then clearly ∀αPα is equivalent to ∀nQn. As we know
(see § 3 of Ch. 0), to prove ∀nQn by strong induction we deduce Qn
(for arbitrary n) from the induction hypothesis ∀m < n Qm.
Stated in terms of P rather than Q, this is tantamount to saying: if
we deduce Pα (for arbitrary α) from the induction hypothesis that Pβ
holds for all formulas β such that deg β < deg α, then it follows that
∀αPα.

1.9. Problem
Assign to each primitive symbol s of ℒ a weight w(s) by stipulating: if
s is a propositional symbol then w(s) = −1, while w(¬) = 0 and
w(→) = 1. If s₁, s₂, ..., s_l are primitive symbols, we assign to the
string s₁s₂...s_l the weight

w(s₁s₂...s_l) = w(s₁) + w(s₂) + ··· + w(s_l).

Thus, the weight of a string is the sum obtained by adding −1 for each
occurrence of a propositional symbol and +1 for each occurrence of →
in the string (occurrences of ¬ make no contribution to the weight).
Since a formula is also a string, every formula α has now been assigned
a weight w(α). Show that, for any formula α,

(i) w(α) = −1;
(ii) if α is the string s₁s₂...s_l and k < l, then w(s₁s₂...s_k) ≥ 0.

In other words, (ii) states that any string which is a proper initial
segment of α (an initial part of α short of the whole of α) has
non-negative weight. (Prove (i) and (ii) by strong induction on deg α.)

(iii) Show that if α is an implication formula, α = →βγ, then →β is
the shortest non-empty initial segment of α whose weight is 0.
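
The weights can be tried out on examples before a proof is attempted. The sketch below is an experiment, not a solution of the problem: it assumes that a formula is given as a Python list of tokens with ‘¬’ and ‘→’ as the connectives and all other tokens as propositional symbols, and it uses clause (iii) to split an implication formula into its antecedent and consequent. The function names are our own.

```python
NEG, IMP = "¬", "→"

def weight(tokens):
    """+1 for each →, 0 for each ¬, -1 for each propositional symbol."""
    return sum(1 if t == IMP else 0 if t == NEG else -1 for t in tokens)

def split_implication(tokens):
    """For a formula →βγ return (β, γ); the end of →β is located as the
    shortest non-empty initial segment of weight 0."""
    if not tokens or tokens[0] != IMP:
        raise ValueError("not an implication formula")
    for k in range(1, len(tokens) + 1):
        if weight(tokens[:k]) == 0:
            return tokens[1:k], tokens[k:]
    raise ValueError("not an implication formula")

# The formula →→p¬qr, whose antecedent is →p¬q and whose consequent is r.
alpha = ["→", "→", "p", "¬", "q", "r"]
```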

§2. Notational conventions


In Def. 1.4 we stipulated that in forming an implication formula an
implication symbol is placed before the antecedent. The advantage of
this so-called Polish notation (invented by the Polish logician Jan
Łukasiewicz) is that ℒ has no need for brackets or other punctuation
marks for indicating grouping of symbols. Thus, in an implication
formula (a formula whose initial symbol is →) the antecedent and the
consequent are uniquely determined (see Prob. 1.9). This economy is
both elegant and technically useful.
So far we have mimicked this Polish system also in our metalanguage:
thus in ‘→βγ’ the boldface arrow is placed to the left.
However, in practice this metalinguistic notation is difficult to read,
partly because it does not conform to common usage. The Polish
notation in ℒ itself causes us no inconvenience, because we do not
actually use that language, only talk about it. But in our metalanguage,
which we do use continually, we shall trade off elegance for legibility
and conformity to common usage.

2.1. Definition
(α→β) =df →αβ.

This definition changes nothing in ℒ; as far as ℒ is concerned Def. 1.4
remains in force. The change is purely in the metalanguage: our
metalinguistic notation will no longer mimic the structure of ℒ-formulas,
because we shall write ‘(α→β)’ instead of ‘→αβ’. For the sake of
easier legibility, we use parentheses and brackets of various styles and
sizes. In this context, we refer to all of them simply as brackets. The
brackets are now needed to prevent ambiguity. For example,
[(α→β)→γ] = →→αβγ, but [α→(β→γ)] = →α→βγ.
Here the new notation (introduced in Def. 2.1) is used on the left-hand
side, while the old notation for the same formulas is used on the
right-hand side.
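
The passage from the Polish notation to the bracketed notation is mechanical, as the following sketch suggests. It assumes, for illustration only, that a formula is given as a list of single-character tokens in which ‘¬’ and ‘→’ are the connectives, that the input is a well-formed formula, and that round brackets are used throughout; the function name is our own.

```python
NEG, IMP = "¬", "→"

def to_infix(tokens, start=0):
    """Return (bracketed text, index just past the formula at `start`)."""
    head = tokens[start]
    if head == NEG:
        body, end = to_infix(tokens, start + 1)
        return NEG + body, end
    if head == IMP:
        left, mid = to_infix(tokens, start + 1)    # antecedent
        right, end = to_infix(tokens, mid)         # consequent
        return "(" + left + IMP + right + ")", end
    return head, start + 1                         # propositional symbol

first = to_infix(list("→→αβγ"))[0]    # the formula →→αβγ
second = to_infix(list("→α→βγ"))[0]   # the formula →α→βγ
```

Here `first` and `second` reproduce the two bracketed expressions of the example above, with round brackets in place of square ones.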
We now hit a new snag: in long metalinguistic expressions of this
kind, written in the new style, the proliferation of brackets can hinder
legibility. We therefore abbreviate such expressions by omitting as
many pairs of brackets as convenient. Of course, in order to prevent
ambiguity such omissions must be governed by certain rules, so that
the brackets can be restored to yield a unique unabbreviated expres-
sion. We shall need three such rules. The first rule is very simple:

2.2. Rule (Omission of outermost brackets)


A pair of brackets such that no part of the expression lies outside it may
be omitted.
For example, (α→β)→γ = [(α→β)→γ] and α→(β→γ) =
[α→(β→γ)].
The second rule is easier to formulate as a rule about how to restore
omitted brackets. (So a pair of brackets may be omitted if it could then
be restored according to this rule.)

2.3. Rule (Association to the right)


If there are two or more occurrences of ‘→’ all enclosed in exactly the
same pairs of brackets (or all not enclosed in any brackets) then you
may add a new pair of brackets that enclose only the rightmost of these
occurrences.
For example,

α→β→γ→δ = α→β→(γ→δ) = α→[β→(γ→δ)] = {α→[β→(γ→δ)]},
(α→β→γ)→δ = [α→(β→γ)]→δ = {[α→(β→γ)]→δ},
α→(β→γ)→δ = α→[(β→γ)→δ] = {α→[(β→γ)→δ]},
(α→β)→γ→δ = (α→β)→(γ→δ) = [(α→β)→(γ→δ)],
[(α→β)→γ]→δ = {[(α→β)→γ]→δ}.
The third rule is

2.4. Rule (Adhesion of ‘¬’)

Do not omit a pair of brackets whose left member is immediately
preceded by an occurrence of ‘¬’. Equivalently: In restoring brackets,
do not add a new pair of brackets whose left member immediately
follows an occurrence of ‘¬’.
For example, α→¬β→γ = [α→(¬β→γ)] but α→¬(β→γ) =
[α→¬(β→γ)].
For reasons of economy, we allowed ℒ to have only two connectives,
¬ and →. Other connectives can however be introduced
metalinguistically, by definition.

2.5. Definition
(i) (α∧β) =df ¬(α→¬β),
(ii) (α∨β) =df ¬α→β,
(iii) (α↔β) =df (α→β)∧(β→α).
(α∧β) is called a conjunction formula and α and β its first conjunct
and second conjunct respectively; (α∨β) is called a disjunction formula
and α and β its first disjunct and second disjunct respectively; (α↔β) is
called a bi-implication formula and α and β its left-hand side and
right-hand side respectively.

2.6. Warning
The metalinguistic symbol ‘∧’ does not denote anything; strictly
speaking it has no meaning on its own: only the package ‘(α∧β)’ as a
whole has been defined as an abbreviation for ‘¬(α→¬β)’. This is an
example of a contextual definition. Similar remarks apply to the other
two clauses of Def. 2.5.
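
The contextual character of these definitions can be made concrete by treating each defined symbol as a macro that expands into a string built from ¬ and → alone. The sketch below assumes, for illustration, that formulas are written as Python strings in Polish notation with single-character propositional symbols; the function names are our own.

```python
def neg(a):
    return "¬" + a

def imp(a, b):
    return "→" + a + b

def conj(a, b):
    """(α∧β) =df ¬(α→¬β)"""
    return neg(imp(a, neg(b)))

def disj(a, b):
    """(α∨β) =df ¬α→β"""
    return imp(neg(a), b)

def iff(a, b):
    """(α↔β) =df (α→β)∧(β→α)"""
    return conj(imp(a, b), imp(b, a))
```

For instance, `conj("p", "q")` yields the string `¬→p¬q`, in which no conjunction symbol occurs.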

In view of Def. 2.5 we need to modify our procedure for omitting and
restoring brackets in metalinguistic expressions. We leave Rules 2.2
and 2.4 as they are, but we replace Rule 2.3 by the following more
comprehensive rule for restoring brackets, which takes into account
not only ‘→’ but also the newly introduced metalinguistic symbols ‘∧’,
‘∨’ and ‘↔’.

2.7. Rule (Ranks and association to the right)


If there are occurrences of ‘↔’, ‘→’, ‘∨’ and ‘∧’ (at least two
occurrences in total) all enclosed in exactly the same pairs of brackets
(or all not enclosed in any pair of brackets), order all these occurrences
by rank as follows. Occurrences of ‘↔’ have higher ranks than those
of ‘→’; the latter have higher ranks than those of ‘∨’; and occurrences
of ‘∧’ have lowest ranks. Moreover, of two occurrences of the same
symbol, the one further to the left has the higher rank. Then you may
add a new pair of brackets that encloses only the symbol-occurrence
with the lowest rank.
For example,

α→β∧γ→β→γ = α→(β∧γ)→β→γ = α→(β∧γ)→(β→γ)
= α→[(β∧γ)→(β→γ)] = {α→[(β∧γ)→(β→γ)]};

α∧β→γ↔¬α→β∨γ = (α∧β)→γ↔¬α→β∨γ
= (α∧β)→γ↔¬α→(β∨γ)
= (α∧β)→γ↔[¬α→(β∨γ)] = [(α∧β)→γ]↔[¬α→(β∨γ)]
= {[(α∧β)→γ]↔[¬α→(β∨γ)]}.
The idea behind Rule 2.7 is that — in the absence of brackets that
indicate otherwise — a symbol-occurrence of higher rank separates more
strongly than one of lower rank, in much the same way as in English
punctuation a full stop separates more strongly than a semicolon, and
the latter separates more strongly than a comma.
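
Readers familiar with programming may recognize Rules 2.2, 2.4 and 2.7 as a description of operator precedence with association to the right. The sketch below restores all brackets by a standard precedence-climbing parse; this is our own choice of technique, not a procedure taken from the text. It assumes single-character tokens, a well-formed input, and round brackets in the output, and the names `PREC` and `restore` are ours.

```python
# A larger number means a lower rank, i.e. the symbol is bracketed earlier.
PREC = {"↔": 1, "→": 2, "∨": 3, "∧": 4}
OPEN = "([{"

def restore(text):
    """Return the fully bracketed form of an abbreviated expression."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "¬":                 # Rule 2.4: ¬ adheres to what follows
            return "¬" + atom()
        if tok in OPEN:
            inner = expr(1)
            pos += 1                   # skip the matching closing bracket
            return inner
        return tok

    def expr(min_prec):
        nonlocal pos
        left = atom()
        while pos < len(tokens) and PREC.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            right = expr(PREC[op])     # same level on the right: right-assoc
            left = "(" + left + op + right + ")"
        return left

    return expr(1)
```

Applied to the two examples given under Rule 2.7, `restore` returns `(α→((β∧γ)→(β→γ)))` and `(((α∧β)→γ)↔(¬α→(β∨γ)))` respectively.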

It must be stressed that the definitions and conventions introduced in
this section are metalinguistic devices used in discussing ℒ and do not
change ℒ itself in any way.

§3. Propositional combinations


A formula α is said to be a propositional combination of k formulas β₁,
β₂, ..., β_k, if α can be constructed from the β_i using ¬ and →. The
following definition puts this more precisely.

3.1. Definition
Let β₁, β₂, ..., β_k be any formulas. A propositional combination of
β₁, β₂, ..., β_k is any formula constructed according to the following
three rules.

(1) Each β_i (where 1 ≤ i ≤ k) is a propositional combination of β₁,
β₂, ..., β_k.
(2) If γ is a propositional combination of β₁, β₂, ..., β_k, then ¬γ is
a propositional combination of β₁, β₂, ..., β_k.
(3) If γ and δ are propositional combinations of β₁, β₂, ..., β_k, then
γ→δ is a propositional combination of β₁, β₂, ..., β_k.

For brevity, we shall usually say ‘combination’, omitting the adjective
‘propositional’.

3.2. Warnings
(i) In forming a combination of β₁, β₂, ..., β_k, not all the β_i need
actually be used. For example, according to Def. 3.1, both β₁ and
β₁→β₂ are combinations of β₁, β₂, β₃.
(ii) The β_i need not be mutually independent: for example, one of
them may be a combination of the others. (Indeed, the β_i need
not be distinct: some of them may coincide with each other.) For
this reason one and the same formula may be obtainable from
the β_i in more than one way. For example, if β₃ = ¬β₂ then
β₁→¬β₂, obtained from β₁, β₂, β₃ by using clause (1) of Def. 3.1
twice, then clause (2) and clause (3), is the same formula as
β₁→β₃, which can be obtained from β₁, β₂, β₃ without using
clause (2) of Def. 3.1.

It is clear that every formula is a combination of its prime components


(see Def. 1.6). The following problem goes a bit further.

3.3. Problem
Let β₁, β₂, ..., β_k be distinct prime formulas, among which are all the
prime components of a formula α. Prove that α can be obtained as a
combination of β₁, β₂, ..., β_k in exactly one way. (Use induction on
deg α, distinguishing three cases corresponding to the three clauses of
Def. 1.4.)

§4. Basic semantics


In classical two-valued logic — which is what we are studying here — we
admit two distinct truth values, namely truth and untruth (a.k.a.
falsehood). For brevity, we shall denote them by ‘⊤’ and ‘⊥’
respectively.

4.1. Remark
From a purely technical point of view, it does not matter what the
truth values ⊤ and ⊥ are, so long as they are two distinct objects. But
intuitively it is best to think of them as abstract entities standing
outside the language ℒ.

4.2. Definition
(i) A truth valuation on ℒ is a mapping σ from the set of all prime
ℒ-formulas to the set {⊤, ⊥} of truth values. For any truth
valuation σ and any prime formula α we denote by ‘α^σ’ the truth
value assigned by σ to α.
(ii) Given a truth valuation σ, we now extend the definition of α^σ,
the truth value assigned by σ to α, to cover every ℒ-formula α.
We proceed by induction on deg α, defining α^σ in terms of the
truth values assigned by σ to formulas whose degrees are smaller
than that of α. We distinguish three cases, corresponding to the
three clauses of Def. 1.4.
(1) If α is a prime formula, then α^σ is already defined.
(2) (¬β)^σ = ⊤ if β^σ = ⊥, and (¬β)^σ = ⊥ if β^σ = ⊤.
(3) (β→γ)^σ = ⊥ if β^σ = ⊤ and γ^σ = ⊥, and (β→γ)^σ = ⊤
otherwise.
(iii) Let α be a formula and σ a truth valuation. If α^σ = ⊤ we say that
α is true under σ, whereas if α^σ = ⊥ we say that α is untrue (or
false) under σ.
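
The inductive definition in clause (ii) is in effect a recursive evaluation procedure. The sketch below assumes, for illustration only, that a formula is a list of tokens in Polish notation with ‘¬’ and ‘→’ as connectives, that a truth valuation is a Python dictionary on the propositional symbols, and that `True` and `False` play the roles of ⊤ and ⊥; the function name is our own.

```python
NEG, IMP = "¬", "→"

def value(tokens, sigma, start=0):
    """Return (truth value, index just past the formula at `start`)."""
    head = tokens[start]
    if head == NEG:                                 # case (2)
        v, end = value(tokens, sigma, start + 1)
        return (not v), end
    if head == IMP:                                 # case (3)
        b, mid = value(tokens, sigma, start + 1)
        c, end = value(tokens, sigma, mid)
        # Untrue exactly when the antecedent is true and the consequent untrue.
        return (not (b and not c)), end
    return sigma[head], start + 1                   # case (1)

# A hypothetical valuation on two propositional symbols p and q.
sigma = {"p": True, "q": False}
```

Under this valuation, for example, `value(list("→pq"), sigma)[0]` is `False`, while `value(list("→qp"), sigma)[0]` is `True`.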

4.3. Remarks

(i) Strictly speaking, in Def. 4.2(ii) we defined a new mapping,
which extends σ: whereas dom σ is the set of prime formulas, the
domain of the new mapping is the set of all formulas, but it
agrees with σ on prime formulas. Sacrificing absolute rigour to
convenience, we denote by ‘σ’ this extension as well as the
original mapping itself.
(ii) Note that α^σ is a truth value rather than an expression in ℒ. (Of
course, both ‘α’ and ‘α^σ’ are expressions in our metalanguage.)

4.4. Definition
(i) If α is a formula and σ is a truth valuation such that α^σ = ⊤, we
say that σ satisfies α and write ‘σ ⊨ α’.
(ii) If σ is a truth valuation that satisfies every member of a set Φ of
formulas, we say that σ satisfies Φ and write ‘σ ⊨ Φ’.
(iii) If a formula α is satisfied by every truth valuation, we say that α
is a tautology and write ‘⊨₀ α’.
(iv) If Φ is a set of formulas and α is a formula such that every truth
valuation satisfying Φ also satisfies α, we say that α is a
tautological consequence of Φ and write ‘Φ ⊨₀ α’.
(v) If a set Φ of formulas is not satisfied by any truth valuation, we
say that Φ is [propositionally] unsatisfiable and write ‘Φ ⊨₀’.

4.5. Remarks
(i) According to Def. 4.4(ii), a truth valuation σ fails to satisfy a set
Φ of formulas, iff Φ has a member that fails to be satisfied by σ.
Therefore if σ is any truth valuation, then σ ⊨ ∅. Indeed, ∅ does
not have a member that fails to be satisfied by σ, because it has
no members at all.
(ii) By Def. 4.4(iv), ∅ ⊨₀ α means that every truth valuation satisfies
α (because, as we have just seen, every truth valuation satisfies
the empty set ∅); by Def. 4.4(iii) this means that α is a tautology.
Thus, a formula is a tautology iff it is a tautological consequence
of the empty set.
(iii) In connection with ‘⊨₀’ we employ certain notational
simplifications that ought to be self-explanatory. Thus, for example,
we write ‘Φ, α ⊨₀ β’ instead of ‘Φ ∪ {α} ⊨₀ β’.

4.6. Problem
(i) For any set Φ of formulas and any two formulas α and β, prove
that Φ, α ⊨₀ β iff Φ ⊨₀ α→β.
(ii) Prove that {α₁, α₂, ..., α_k} ⊨₀ β iff ⊨₀ α₁→α₂→···→α_k→β.

4.7. Warning
Never, never get → and ⊨₀ confused with each other. (I was not
referring just now to the symbols ‘→’ and ‘⊨₀’. You are not likely to
get them confused, because you can see they are different: the former
is a boldface arrow-shaped figure, while the latter is shaped like a
double-barred turnstile with a little ring on its lower right-hand side.
Rather, I was referring to what these symbols denote.) Much can be
written about this, but the following should help you to avoid the most
common errors.
Suppose α and β are ℒ-formulas. Then α→β is another such
formula. ‘α→β’ is a nominal phrase: if you write it on its own, you
would not be making any statement, but only referring to that formula,
just as when I say ‘my income-tax statement’ and no more I am not
making a statement but merely referring to my income-tax statement.¹
On the other hand, if you write ‘α ⊨₀ β’ on its own, you would be
stating that β is a tautological consequence of α (or, more precisely, of
the singleton {α}); and if you write ‘⊨₀ α→β’ on its own, you would be
stating that the implication formula α→β is a tautology. By Prob. 4.6,
these two statements are equivalent.

¹ We must exclude here cases of ellipsis, such as when, in reply to the question ‘What
were you doing last night?’, I say ‘My income-tax statement.’ as an ellipsis for the
sentence ‘I was doing my income-tax statement.’

§5. Truth tables


Conditions (2) and (3) of Def. 4.2(ii) may be summarized in truth
tables:

    β | ¬β
    ⊤ |  ⊥
    ⊥ |  ⊤

    β  γ | β→γ
    ⊤  ⊤ |  ⊤
    ⊤  ⊥ |  ⊥
    ⊥  ⊤ |  ⊤
    ⊥  ⊥ |  ⊤

The idea here is that any truth valuation that assigns to β (or to β and
γ) the truth value(s) shown in the first column (or the first two
columns) at a given row must assign to ¬β (or to β→γ) the truth value
shown in the last column at the same row.
This idea can be applied more generally. In the following definition
the formula α is any combination of formulas β₁, β₂, ..., β_k. The
definition prescribes how to construct a truth table for α in terms of β₁,
β₂, ..., β_k. It proceeds by induction on deg α: the induction
hypothesis is that if γ is any combination of β₁, β₂, ..., β_k and
deg γ < deg α then we can construct a truth table for γ in terms of
β₁, β₂, ..., β_k; and using this hypothesis the definition tells us how
to construct a truth table for α in terms of β₁, β₂, ..., β_k.

5.1. Definition
Let the formula α be a combination of formulas β₁, β₂, ..., β_k. A
truth table for α in terms of β₁, β₂, ..., β_k is constructed as follows.
First, set up a rectangular table with k columns (headed ‘β₁’, ‘β₂’,
..., ‘β_k’ respectively) and 2^k rows. In each of the k·2^k spaces enter
‘⊤’ or ‘⊥’, so that no two rows are filled out in the same way. Thus
each of the 2^k different strings of length k made up of ‘⊤’s and ‘⊥’s
should appear in exactly one row. (For the sake of definiteness, regard
these strings as ‘words’ in an alphabet consisting of the two letters ‘⊤’
and ‘⊥’ in this order, and enter the 2^k different strings in lexicographic
order.)
Next, add a new last column, headed ‘α’, and, proceeding by
induction on deg α, fill it out with ‘⊤’s and ‘⊥’s according to the
following three rules corresponding to the three clauses of Def. 3.1.

(1) If α = β_i (where 1 ≤ i ≤ k), copy the entries of the i-th column
(the one headed ‘β_i’) into the last column, headed ‘α’.
(2) If α = ¬γ, where γ is a combination of β₁, β₂, ..., β_k, then by
the induction hypothesis we already know how to construct a ‘γ’
column. Now, in the ‘α’ column put ‘⊤’ in each row where the ‘γ’
column has ‘⊥’, and ‘⊥’ in each row where the ‘γ’ column has
‘⊤’.
(3) If α = γ→δ, where γ and δ are combinations of β₁, β₂, ..., β_k,
then by the induction hypothesis we already know how to
construct ‘γ’ and ‘δ’ columns. Now, in the ‘α’ column put ‘⊥’ in each
row where the ‘γ’ column has ‘⊤’ whereas the ‘δ’ column has ‘⊥’;
and ‘⊤’ elsewhere, that is, in each row where the ‘γ’ column has
‘⊥’ as well as in each row where the ‘δ’ column has ‘⊤’.
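
The construction can be followed step by step in the sketch below. The representation is our own assumption, chosen for illustration: a combination is either an integer i (standing for β_i), a pair ("¬", γ) or a triple ("→", γ, δ), and `True` and `False` play the roles of ⊤ and ⊥. The rows come out in the lexicographic order prescribed above, with ⊤ before ⊥.

```python
from itertools import product

def truth_table(alpha, k):
    """Return (rows, last column) of a truth table for alpha in terms of
    β_1, ..., β_k, where alpha is an int i, ("¬", γ) or ("→", γ, δ)."""
    rows = list(product([True, False], repeat=k))   # 2**k rows, ⊤ first

    def column(phi):
        if isinstance(phi, int):                    # rule (1): copy column i
            return [row[phi - 1] for row in rows]
        if phi[0] == "¬":                           # rule (2)
            return [not v for v in column(phi[1])]
        gamma, delta = column(phi[1]), column(phi[2])   # rule (3)
        return [not (g and not d) for g, d in zip(gamma, delta)]

    return rows, column(alpha)

# The table for β_1→β_2 in terms of β_1, β_2.
rows, last = truth_table(("→", 1, 2), 2)
```

Note that the sketch never looks inside the β_i themselves: like Def. 5.1, it depends only on the way α is built up from them.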

5.2. Warning
Since in general the same α may be obtained as a combination of
formulas β₁, β₂, ..., β_k in more than one way (see Warning 3.2(ii)),
Def. 5.1 may not yield a unique result: α may have more than one
truth table in terms of β₁, β₂, ..., β_k.

5.3. Problem
Construct truth tables in terms of α, β for:

(i) α∧β,
(ii) α∨β,
(iii) α↔β.
(See Def. 2.5.)

5.4. Problem

In a truth table in terms of two formulas α, β there are four (= 2²)
rows; thus the last column can be filled out with ‘⊤’s and ‘⊥’s in 16
(= 2⁴) different ways. Find 16 combinations of α, β whose truth tables
in terms of α, β yield all these 16 different last columns.

5.5. Lemma
Let α be a combination of β₁, β₂, ..., β_k. Consider a given row in a
truth table for α in terms of β₁, β₂, ..., β_k. Let σ be any truth
valuation such that for every i (where i = 1, 2, ..., k) β_i^σ is the truth
value indicated in the given row at the i-th column (the one headed ‘β_i’).
Then α^σ is the truth value indicated in the given row at the last column
(headed ‘α’).

PROOF

Immediate from Def. 5.1 and Def. 4.2(ii), by induction on deg α. ∎

5.6. Theorem (Semantic soundness of truth tables)


Let α be a propositional combination of β₁, β₂, ..., β_k. If in a truth
table for α in terms of β₁, β₂, ..., β_k all the entries in the last (‘α’)
column are ‘⊤’, then α is a tautology.

PROOF

Let σ be any truth valuation. Clearly, the truth values β₁^σ, β₂^σ, ...,
β_k^σ must be respectively the same as those indicated in one particular
row of the given truth table. Hence by Lemma 5.5 α^σ is the truth value
indicated in the same row in the last column. But by assumption this
truth value is ⊤. Thus α^σ = ⊤ for all σ. ∎

5.7. Problem
Verify that for any α, β and γ:

(i) ⊨₀ α→β→α (Law of Affirmation of the Consequent),
(ii) ⊨₀ (α→β→γ)→(α→β)→α→γ
(Self-distributive Law of Implication),
(iii) ⊨₀ [(α→β)→α]→α (Peirce’s Law),
(iv) ⊨₀ ¬α→α→β (Law of Denial of the Antecedent),
(v) ⊨₀ (α→¬α)→¬α (Clavius’ Law).

5.8. Warning
The converse of Thm. 5.6 is not generally true. To see this, let
α = β→γ; then a truth table for α in terms of β, γ is shown above (at
the beginning of this section) and has a ‘⊥’ in its last column. Does it
follow that α cannot be a tautology? No; this truth table only shows
that α^σ = ⊥ provided σ is a truth valuation for which β^σ = ⊤ and
γ^σ = ⊥. But such a truth valuation may not exist; for example, if
γ = β then of course we cannot have both β^σ = ⊤ and γ^σ = ⊥. Or if
γ = δ→δ, then γ is a tautology, and we can never have γ^σ = ⊥,
irrespective of what β^σ happens to be.
However, the converse of Thm. 5.6 does hold, provided the β_i are
subjected to special conditions.

5.9. Theorem (Semantic completeness of truth tables)


Let α be a combination of k distinct prime formulas β₁, β₂, ..., β_k. If
α is a tautology, then in the truth table for α in terms of β₁, β₂, ..., β_k
all the entries in the last (‘α’) column are ‘⊤’.

PROOF

Consider an arbitrary row in this truth table. Since β₁, β₂, ..., β_k are
prime and distinct, there exists a truth valuation σ such that the truth
values β₁^σ, β₂^σ, ..., β_k^σ are respectively the same as those indicated
in this particular row of the truth table. By Lemma 5.5, α^σ is the truth
value indicated in the same row at the last column. But α^σ = ⊤ since α
is a tautology. Thus the entry at the last column in this row is ‘⊤’. ∎

5.10. Remark
Thms. 5.6 and 5.9 together provide us with an algorithm (a mechanically
performable procedure) whereby we can test any formula α and
decide whether or not it is a tautology: construct the truth table for α
in terms of its prime components (or in terms of any distinct prime
formulas among which are all the prime components of α; see Prob.
3.3).
Using Prob. 4.6, this algorithm also enables us to decide, for any
finite set Φ of formulas and any formula α, whether or not Φ ⊨₀ α.
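
A minimal sketch of this decision procedure is given below. It assumes, for illustration, that a prime formula is a Python string, that ¬γ and γ→δ are written as the tuples ("¬", γ) and ("→", γ, δ), and that `True` and `False` stand for ⊤ and ⊥; rather than storing a table, it simply runs through all rows. The function names are our own.

```python
from itertools import product

def primes(phi):
    """Set of prime components of phi."""
    if isinstance(phi, str):
        return {phi}
    return set().union(*(primes(sub) for sub in phi[1:]))

def value(phi, sigma):
    """Truth value of phi under the valuation sigma (a dict)."""
    if isinstance(phi, str):
        return sigma[phi]
    if phi[0] == "¬":
        return not value(phi[1], sigma)
    return (not value(phi[1], sigma)) or value(phi[2], sigma)

def is_tautology(phi):
    """Check every row of the truth table in terms of the prime components."""
    names = sorted(primes(phi))
    return all(value(phi, dict(zip(names, row)))
               for row in product([True, False], repeat=len(names)))

def is_consequence(hypotheses, phi):
    """Decide Φ ⊨₀ φ for a finite list Φ by way of Prob. 4.6(ii)."""
    for h in reversed(hypotheses):
        phi = ("→", h, phi)
    return is_tautology(phi)
```

The number of rows doubles with each additional prime component, so the procedure, although mechanical, quickly becomes laborious for formulas with many prime components.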

5.11. Definition
If α and β are formulas satisfied by exactly the same truth valuations
(that is, both α ⊨₀ β and β ⊨₀ α) we say that α and β are tautologically
equivalent and write ‘α ≡₀ β’.

5.12. Remarks

(i) From Prob. 5.3(iii) it is easy to see that α ≡₀ β iff ⊨₀ α↔β.
(ii) An argument similar to the one used in the proof of Thm. 5.6
shows that if α and β are combinations of β₁, β₂, ..., β_k and if
the ‘α’ and ‘β’ columns respectively in truth tables for α and β in
terms of β₁, β₂, ..., β_k have ‘⊤’s and ‘⊥’s in the same places,
then α ≡₀ β.
(iii) An argument similar to that used in the proof of Thm. 5.9 shows
that the converse of (ii) holds, provided β₁, β₂, ..., β_k are
distinct prime formulas.
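
For formulas built from distinct prime formulas, remarks (ii) and (iii) reduce tautological equivalence to a comparison of last columns. The sketch below makes that comparison row by row; it assumes, for illustration, that prime formulas are Python strings, that ¬γ and γ→δ are the tuples ("¬", γ) and ("→", γ, δ), and that `names` lists distinct prime formulas including all prime components of both formulas. The function names are our own.

```python
from itertools import product

def value(phi, sigma):
    """Truth value of phi under the valuation sigma (a dict)."""
    if isinstance(phi, str):
        return sigma[phi]
    if phi[0] == "¬":
        return not value(phi[1], sigma)
    return (not value(phi[1], sigma)) or value(phi[2], sigma)

def equivalent(a, b, names):
    """True iff a and b receive the same truth value in every row."""
    return all(value(a, sigma) == value(b, sigma)
               for row in product([True, False], repeat=len(names))
               for sigma in [dict(zip(names, row))])

# p→q compared with ¬q→¬p, and with q→p.
contraposition = equivalent(("→", "p", "q"),
                            ("→", ("¬", "q"), ("¬", "p")), ["p", "q"])
converse = equivalent(("→", "p", "q"), ("→", "q", "p"), ["p", "q"])
```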

5.13. Problem
Verify that for any α, β, γ, φ₁, φ₂, ..., φ_k:

(i) α∨β ≡₀ (α→β)→β,
(ii) α→β ≡₀ ¬β→¬α (Law of Contraposition),
(iii) ¬(φ₁∧φ₂∧...∧φ_k) ≡₀ ¬φ₁∨¬φ₂∨...∨¬φ_k,
(iv) ¬(φ₁∨φ₂∨...∨φ_k) ≡₀ ¬φ₁∧¬φ₂∧...∧¬φ_k
(De Morgan’s Laws),
(v) α∧β∧γ ≡₀ (α∧β)∧γ (Associative Law of Conjunction),
(vi) α∨β∨γ ≡₀ (α∨β)∨γ (Associative Law of Disjunction),
(vii) φ₁∧φ₂∧...∧φ_k→α ≡₀ φ₁→φ₂→···→φ_k→α.

5.14. Problem
Let α and β be any formulas. Let Φ be the set of all formulas
obtainable from α and β using negation and conjunction. More
precisely,

(1) α and β are in Φ;
(2) if γ is in Φ then so is ¬γ;
(3) if γ and δ are in Φ then so is γ∧δ.

Find a formula in Φ that is tautologically equivalent to α→β.

5.15. Problem
The same as Prob. 5.14, but with ‘conjunction’ and ‘∧’ replaced by
‘disjunction’ and ‘∨’ respectively.

5.16. Problem
For any formulas α and β, put α|β =df ¬(α∧β). The ‘|’ here is known
as Sheffer’s stroke. The formula α|β is called the non-conjunction of α
and β. Let Φ be the set of all formulas obtainable from α and β using
non-conjunction. Thus,
(1) α and β are in Φ;
(2) if γ and δ are in Φ then so is γ|δ.
Find formulas in Φ that are tautologically equivalent to ¬α and α→β
respectively.

5.17. Problem
Let α and β be distinct prime formulas. Let Φ be defined as in Prob.
5.16, but with ‘non-conjunction’ and ‘|’ replaced by ‘implication’ and
‘→’ respectively. Prove that no formula in Φ is tautologically
equivalent to α∧β.

5.18. Problem
Let α and β be distinct prime formulas. Let Φ be defined as in Prob.
5.14, but with ‘conjunction’ and ‘∧’ replaced by ‘bi-implication’ and
‘↔’ respectively.

(i) Find eight formulas in Φ such that every formula in Φ is
tautologically equivalent to exactly one of the eight.
(ii) Prove that no formula in Φ is tautologically equivalent to α→β.

5.19. Remark
Prob. 5.4 means that all binary truth functions are reducible to
negation and implication. Prob. 5.14 (Prob. 5.15) means that implica-
tion — and hence all binary truth functions — can be reduced to negation
and conjunction (negation and disjunction). Prob. 5.16 means that
negation and implication — and hence all binary truth functions — can
be reduced to non-conjunction. Prob. 5.17 means that conjunction
cannot be reduced to implication (although by Prob. 5.13(i) disjunction
can be so reduced). Prob. 5.18(ii) means that implication cannot be
reduced to negation and bi-implication.

§6. The propositional calculus


The propositional calculus (briefly, Propcal) presented in this section
is a formal mechanism for generating the tautological consequences of
any set Φ of formulas. A central role will be played by modus ponens.

6.1. Definition
Modus ponens is the [formal] operation that may be applied to any two
formulas of the form α and α→β, to yield the formula β; schematically,

    α, α→β
    ──────
       β

In this connection, α and α→β are called the minor premiss and major
premiss respectively, and β is called the conclusion.

6.2. Remark

From Def. 4.2 it follows at once that if α^σ = (α→β)^σ = ⊤ then also
β^σ = ⊤. (By Def. 4.4(iv) this amounts to the same thing as
{α, α→β} ⊨₀ β.) We express this by saying that modus ponens
preserves truth and is therefore semantically sound as a rule of inference.

We designate as propositional axioms all formulas of the following five


forms:

6.3. Axiom scheme i. α→β→α,

6.4. Axiom scheme ii. (α→β→γ)→(α→β)→α→γ,

6.5. Axiom scheme iii. [(α→β)→α]→α,

6.6. Axiom scheme iv. ¬α→α→β,

6.7. Axiom scheme v. (α→¬α)→¬α.

Note that these are not five single axioms but axiom schemes, each
representing infinitely many axioms obtained by all possible choices of
formulas α, β, and γ. We shall refer to them briefly as ‘Ax. i’, ‘Ax. ii’,
etc.

6.8. Definition
(i) A propositional deduction from a set Φ of formulas is a non-empty
finite sequence of formulas φ₁, φ₂, ..., φ_n such that for
each k (k = 1, 2, ..., n) at least one of the following conditions
holds:
(1) φ_k is a propositional axiom,
(2) φ_k ∈ Φ,
(3) φ_k is obtained by modus ponens from two earlier formulas in
the sequence; that is, there are i and j, both smaller than k, such
that φ_j = φ_i→φ_k.
In this connection Φ is called a set of hypotheses.
(ii) A propositional proof is a propositional deduction from the
empty set of hypotheses.
Where there is no risk of ambiguity, we shall usually omit the qualifica-
tion ‘propositional’ and say simply ‘deduction’ and ‘proof’. Similar
ellipses will be used in connection with other bits of terminology
introduced below.
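
Whether a given finite sequence of formulas is a deduction can be checked mechanically, as the following sketch suggests. The representation is our own assumption, made for illustration: prime formulas are Python strings, ¬γ and γ→δ are the tuples ("¬", γ) and ("→", γ, δ), and the strings ‘α’, ‘β’, ‘γ’ serve as metavariables inside the five schemes only. The function names are ours.

```python
def imp(a, b):
    return ("→", a, b)

def neg(a):
    return ("¬", a)

A, B, C = "α", "β", "γ"   # metavariables, used only inside the schemes
SCHEMES = [
    imp(A, imp(B, A)),                                    # Ax. i
    imp(imp(A, imp(B, C)), imp(imp(A, B), imp(A, C))),    # Ax. ii
    imp(imp(imp(A, B), A), A),                            # Ax. iii
    imp(neg(A), imp(A, B)),                               # Ax. iv
    imp(imp(A, neg(A)), neg(A)),                          # Ax. v
]

def match(pattern, formula, binding):
    """Try to extend `binding` so that the pattern becomes the formula."""
    if isinstance(pattern, str):                  # a metavariable
        if pattern in binding:
            return binding[pattern] == formula
        binding[pattern] = formula
        return True
    return (isinstance(formula, tuple) and len(formula) == len(pattern)
            and formula[0] == pattern[0]
            and all(match(p, f, binding)
                    for p, f in zip(pattern[1:], formula[1:])))

def is_axiom(formula):
    return any(match(scheme, formula, {}) for scheme in SCHEMES)

def is_deduction(sequence, hypotheses=()):
    """Check conditions (1)-(3) for every formula of the sequence."""
    for k, phi in enumerate(sequence):
        earlier = sequence[:k]
        by_mp = any(imp(a, phi) in earlier for a in earlier)
        if not (is_axiom(phi) or phi in hypotheses or by_mp):
            return False
    return len(sequence) > 0

# A five-line proof of p→p using Ax. ii, Ax. i and modus ponens.
pp = imp("p", "p")
proof_of_pp = [
    imp(imp("p", imp(pp, "p")), imp(imp("p", pp), pp)),   # Ax. ii
    imp("p", imp(pp, "p")),                               # Ax. i
    imp(imp("p", pp), pp),                                # modus ponens
    imp("p", pp),                                         # Ax. i
    pp,                                                   # modus ponens
]
```

The checker confirms that `proof_of_pp` is a deduction from the empty set of hypotheses, whereas the one-term sequence consisting of p→p alone is not.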

6.9. Definition
(i) A deduction (or proof) whose last formula is α is said to be a
deduction (or proof, respectively) of α.
(ii) If there exists a propositional deduction of a formula α from a set
Φ of formulas, we say that α is [propositionally] deducible from
Φ and write, briefly, ‘Φ ⊢₀ α’.
(iii) If there exists a propositional proof of a formula α (that is, a
deduction of α from the empty set) we say that α is
[propositionally] provable and write, briefly, ‘⊢₀ α’. In this case α is
also called a [propositional] theorem.

In connection with ‘⊢₀’ we employ notational simplifications like those
used in connection with ‘⊨₀’. Thus, for example, we write ‘Φ, α ⊢₀ β’
instead of ‘Φ ∪ {α} ⊢₀ β’.

6.10. Remarks
(i) The calculus we have specified here is a linear calculus, as
distinct from calculi whose deductions have a more complex
tree-like branching form rather than being ordinary (linear) se-
quences as in Def. 6.8. A linear calculus is characterized uniquely
by specifying its axioms (by means of axiom-schemes or in some
other way) and rules of inference. In the present case the axioms
are all instances of Ax. i-Ax. v, and the sole rule of inference is
modus ponens.
(ii) Many calculi described in the literature, based on other axioms or
rules of inference, are equivalent to the one presented here in the
sense, roughly speaking,¹ that their relation of deducibility is
co-extensive with our ⊢₀. (For example, the calculus presented in
B&M, Ch. 1, § 10.) All these calculi, including of course the
present one, are often referred to collectively as the [classical]
propositional calculus. Although, strictly speaking, they are dis-
tinct calculi, their mutual equivalence makes it possible to regard
them as being merely different versions of the same calculus.
(iii) The qualification ‘classical’ is often omitted; it is however needed
sometimes in order to prevent confusion with non-classical
(a.k.a. non-standard or deviant) propositional calculi that are
broadly similar but not equivalent to the present one; for exam-
ple, the intuitionistic propositional calculus (a version of which is
presented in B&M, Ch. 9, § 8).
(iv) We use the term ‘theorem’ with two quite different meanings,
which must be strictly distinguished from each other. A [propositional]
theorem in the sense of Def. 6.9(iii) is a formal expression,
a formula in the language ℒ. In this book we never assert
such a theorem, since we do not use the language ℒ, only talk
about it. On the other hand, a theorem such as Thm. 5.6 (which
we have asserted above) is a proposition stated in our metalanguage.
In order not to get these two kinds of theorem confused
with each other, those of the former kind are sometimes called
formal theorems or ℒ-theorems and those of the latter kind
metatheorems. However, this will rarely be necessary here, as it
will usually be clear from the context which meaning of ‘theorem’
is intended. A similar distinction must be drawn between the two
meanings of terms such as ‘deduction’, ‘hypothesis’ and ‘proof’.
(v) The reason for using the same terms with two alternative meanings
is that there is an intended connection between the two sets
of meanings. Thus formal deductions are supposed to be stylized
and formalized versions or counterparts (or at least analogues)
of ‘ordinary’ deductions in informal or semi-formal axiomatic
theories expounded within mathematics and related hypothetico-deductive
disciplines. Hypotheses in the sense of Def. 6.8 are
supposed to be formal counterparts of the hypotheses or assumptions
adopted as a starting point for real (informal or semi-formal)
mathematical deductions. (When such hypotheses or
assumptions are adopted as a point of departure for a whole
axiomatic theory, rather than for temporary or ad hoc ends, they
are usually called postulates or [extralogical] axioms.)

¹ That is, ignoring irrelevant differences between the formal languages in which these
various calculi are formulated.

(vi) Formal deductions of the kind studied in Symbolic Logic differ
from ‘ordinary’ mathematical deductions not only in being com-
pletely formalized but also in spelling out the logical machinery
used. In informal or semi-formal mathematical deductions you
are allowed to assert any statement that follows logically from
previous ones, but the nature of this relation — being a logical
consequence — is not spelt out fully, if at all. In logical calculi,
such as Propcal, the purely logical steps in formal deductions are
made explicit and formally detailed by specifying logical axioms
(such as Ax. i-Ax. v) and rules of inference (such as modus
ponens).

(vii) In an ordinary mathematical deduction you are allowed to introduce
any statement deduced earlier (by a preceding deduction)
from the same hypotheses. However, this licence is merely a
matter of practical convenience: in principle such a previously
deduced statement could be introduced together with its whole
deduction, so that every deduction would start from first principles.
This latter procedure is mimicked in Def. 6.8.

(viii) Propcal is pitifully inadequate for formalizing any but the most
trivial mathematical deductions. It is, however, of interest as a
sort of pilot project for more powerful and useful systems.

6.11. Example
We show that ⊢₀ α→α for every α. (In other words, we are going to
prove a [meta]theorem about Propcal, which asserts that, for every
formula α, α→α is a propositional theorem, a theorem of Propcal.)
The following sequence of five formulas is a [propositional] proof of
α→α:

    [α→(α→α)→α]→(α→α→α)→α→α,    (Ax. ii)
    α→(α→α)→α,                    (Ax. i)
    (α→α→α)→α→α,                  (m.p.)
    α→α→α,                        (Ax. i)
    α→α.                          (m.p.)

The marginal comments on the right have been added for convenience.
Thus the first formula is an instance of Ax. ii, obtained from (6.4) by
taking β = α→α and γ = α; the second formula is an instance of Ax. i,
with β = α→α; the third formula is obtained by modus ponens from
the preceding two; the fourth formula is an instance of Ax. i, with
β = α; and the fifth formula is again obtained by modus ponens from
the preceding two. In principle these explanations are redundant,
because you can always check whether or not a given formula is an
instance of an axiom scheme, or obtainable by modus ponens from two
earlier formulas.

6.12. Theorem (Semantic soundness of Propcal)


If Φ ⊢₀ α then also Φ ⊨₀ α. In particular, if ⊢₀ α then also ⊨₀ α.

PROOF

Let φ₁, φ₂, ..., φₙ be a deduction of α from Φ; thus φₙ = α. We shall
prove by [strong] induction on k that Φ ⊨₀ φₖ for k = 1, 2, ..., n.
Thus, in particular, for k = n it will follow that Φ ⊨₀ α, as claimed. We
distinguish three cases concerning φₖ, corresponding to the three
conditions in Def. 6.8(i).

Case 1: φₖ is a propositional axiom. In this case it is easy to verify that
⊨₀ φₖ (see Prob. 5.7); in other words, φₖ is satisfied by every truth
valuation. Hence a fortiori Φ ⊨₀ φₖ.

Case 2: φₖ ∈ Φ. Then obviously Φ ⊨₀ φₖ.

Case 3: φₖ is obtained by modus ponens from two earlier formulas in
the deduction; that is, there are i, j < k such that φⱼ = φᵢ→φₖ. In this
case, by Rem. 6.2, {φᵢ, φⱼ} ⊨₀ φₖ. But by the induction hypothesis
both Φ ⊨₀ φᵢ and Φ ⊨₀ φⱼ. Hence clearly Φ ⊨₀ φₖ.
The second claim of our theorem follows from the first by taking
Φ = ∅. ∎

6.13. Theorem (Cut Rule)


If Φ ⊢₀ δᵢ for each i = 1, 2, ..., k and Ψ ∪ {δ₁, δ₂, ..., δₖ} ⊢₀ α
then Φ ∪ Ψ ⊢₀ α.

PROOF
Take a deduction of α from Ψ ∪ {δ₁, δ₂, ..., δₖ} and whenever δᵢ is
used there as a hypothesis replace it by a deduction of δᵢ from Φ. The
result is clearly a deduction of α from Φ ∪ Ψ. ∎

6.14. Remark
The Cut Rule clearly holds for any linear calculus, irrespective of its
axioms and rules of inference. The strange name of this rule is due to
the fact that it allows us to ‘cut out the middlemen’ δᵢ.
We shall often refer to this rule briefly as ‘Cut’.

§7. The Deduction Theorem


7.1. Remark
Suppose Φ ⊢₀ α→β. Since by modus ponens we have {α, α→β} ⊢₀ β,
we can apply Cut to the ‘middleman’ α→β (see Thm. 6.13) and get
Φ, α ⊢₀ β. Thus we have

    Φ ⊢₀ α→β ⇒ Φ, α ⊢₀ β.

The converse of this result, which we prove next, is of central importance.

7.2. Theorem (Deduction Theorem)


If Φ, α ⊢₀ β then Φ ⊢₀ α→β.

PROOF

Let φ₁, φ₂, ..., φₙ be a given deduction of β from Φ ∪ {α}; thus
φₙ = β.
We shall prove, by [strong] induction on k, that Φ ⊢₀ α→φₖ for
k = 1, 2, ..., n. In particular, for k = n it will follow that Φ ⊢₀ α→β,
as claimed. We distinguish three cases concerning φₖ, corresponding to
the three conditions in Def. 6.8(i).

Case 1: φₖ is a propositional axiom. In this case the following sequence
of three formulas is a proof of α→φₖ, and hence a fortiori a
deduction of it from Φ:

    φₖ,            (ax.)
    φₖ→α→φₖ,      (Ax. i)
    α→φₖ.          (m.p.)

Case 2: φₖ ∈ Φ ∪ {α}. Thus φₖ ∈ Φ or φₖ = α. Because α plays here a
special role, we must split our argument into two subcases.

Subcase 2a: φₖ ∈ Φ. Then the same sequence of three formulas as in
Case 1 is a deduction of α→φₖ from Φ, except that now the justification
for the presence of φₖ is that it is one of the hypotheses Φ rather
than that it is an axiom.

Subcase 2b: φₖ = α. Then α→φₖ = α→α, so by Ex. 6.11 ⊢₀ α→φₖ,
and a fortiori Φ ⊢₀ α→φₖ.

Case 3: φₖ is obtained by modus ponens from two earlier formulas in
the given deduction. This means that there are i, j < k such that
φⱼ = φᵢ→φₖ (so φᵢ and φⱼ serve as minor and major premiss, respectively,
to yield φₖ). By the induction hypothesis, both Φ ⊢₀ α→φᵢ and
Φ ⊢₀ α→φⱼ; that is, Φ ⊢₀ α→φᵢ→φₖ.
Thanks to Cut, the required result, Φ ⊢₀ α→φₖ, will follow if we
show that {α→φᵢ, α→φᵢ→φₖ} ⊢₀ α→φₖ. The following sequence of
five formulas is a deduction of α→φₖ from {α→φᵢ, α→φᵢ→φₖ}:

    α→φᵢ,                          (hyp.)
    α→φᵢ→φₖ,                      (hyp.)
    (α→φᵢ→φₖ)→(α→φᵢ)→α→φₖ,      (Ax. ii)
    (α→φᵢ)→α→φₖ,                  (m.p.)
    α→φₖ.                          (m.p.) ∎
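
The proof just given is constructive: it shows how to rewrite a deduction of β from Φ ∪ {α} line by line into a deduction of α→β from Φ. The Python sketch below follows the three cases literally. It is only an illustration under stated assumptions: formulas are encoded as strings (prime formulas) and tuples ("imp", α, β), the function names are invented here, and the small checker knows only Ax. i and Ax. ii, so it can confirm the result only for deductions that use no other axioms. Any input line that is neither α nor obtainable by modus ponens is treated as an axiom or a member of Φ, which the caller is assumed to guarantee.

```python
def imp(a, b):
    return ("imp", a, b)

def is_imp(f):
    return isinstance(f, tuple) and f[0] == "imp"

def is_axiom(f):
    # Ax. i:  a -> (b -> a)
    if is_imp(f) and is_imp(f[2]) and f[2][2] == f[1]:
        return True
    # Ax. ii:  (a -> (b -> c)) -> ((a -> b) -> (a -> c))
    if is_imp(f) and is_imp(f[1]) and is_imp(f[1][2]):
        a, b, c = f[1][1], f[1][2][1], f[1][2][2]
        return f[2] == imp(imp(a, b), imp(a, c))
    return False

def check(deduction, hypotheses):
    """True iff each line is an axiom (i or ii), a hypothesis, or follows by m.p."""
    for k, f in enumerate(deduction):
        earlier = deduction[:k]
        by_mp = any(imp(g, f) in earlier for g in earlier)
        if not (is_axiom(f) or f in hypotheses or by_mp):
            return False
    return True

def self_implication(a):
    """The five-line proof of a -> a from Ex. 6.11."""
    aa = imp(a, a)
    return [imp(imp(a, imp(aa, a)), imp(imp(a, aa), aa)),
            imp(a, imp(aa, a)),
            imp(imp(a, aa), aa),
            imp(a, aa),
            aa]

def deduction_theorem(deduction, alpha):
    """Rewrite a deduction of beta from Phi + {alpha}
    into a deduction of alpha -> beta from Phi."""
    out = []
    for k, phi in enumerate(deduction):
        earlier = deduction[:k]
        # a minor premiss g such that both g and g -> phi occur earlier
        minor = next((g for g in earlier if imp(g, phi) in earlier), None)
        if phi == alpha:                                  # Subcase 2b
            out += self_implication(alpha)
        elif minor is not None:                           # Case 3
            out += [imp(imp(alpha, imp(minor, phi)),
                        imp(imp(alpha, minor), imp(alpha, phi))),  # Ax. ii
                    imp(imp(alpha, minor), imp(alpha, phi)),       # m.p.
                    imp(alpha, phi)]                               # m.p.
        else:                                             # Case 1 or Subcase 2a
            out += [phi,                                  # axiom or member of Phi
                    imp(phi, imp(alpha, phi)),            # Ax. i
                    imp(alpha, phi)]                      # m.p.
    return out

a, b, c = "a", "b", "c"
ab, bc = imp(a, b), imp(b, c)
# a deduction of c from {a -> b, b -> c} together with the extra hypothesis a
new = deduction_theorem([a, ab, b, bc, c], a)
print(new[-1] == imp(a, c), check(new, [ab, bc]))
```

In this example the five-line deduction of c becomes a seventeen-line deduction of a→c that no longer uses a as a hypothesis, which shows how the induction in the proof lengthens deductions by a constant factor.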

7.3. Remarks
(i) We shall refer to the Deduction Theorem briefly as ‘DT’.
(ii) In proving DT (and in Ex. 6.11, which is used in the proof) we
invoked only Ax. i and Ax. ii. In fact, it is not even necessary for
formulas of the forms (6.3) and (6.4) to be axioms: it would have
been enough if they were just theorems. More precisely: if ⊢* is
the relation of deducibility in a linear calculus whose sole rule of
inference is modus ponens and if ⊢* α→β→α as well as
⊢* (α→β→γ)→(α→β)→α→γ for all α, β and γ, then DT holds
for ⊢*, that is: Φ, α ⊢* β ⇒ Φ ⊢* α→β.
(iii) Now that we have DT, we shall not need to invoke Ax. i and Ax.
ii again. Indeed, the sole purpose of adopting these axiom
schemes was to enable us to establish DT.

7.4. Problem
Let ⊢* be the deducibility relation in a calculus that has modus ponens
as a (not necessarily sole) rule of inference.
Show that if Cut and DT hold for ⊢*, then ⊢* α→β→α and
⊢* (α→β→γ)→(α→β)→α→γ for all α, β and γ.

§8. Inconsistency and consistency


8.1. Definition
(i) A set of two formulas {α, ¬α}, one of which is the negation of
the other, is called a contradictory pair.
(ii) A set Φ of formulas is said to be [propositionally] inconsistent
(in symbols: ‘Φ ⊢₀’) if both members of some contradictory pair
are propositionally deducible from Φ; that is, for some formula α
Φ ⊢₀ α as well as Φ ⊢₀ ¬α. Otherwise, Φ is said to be [propositionally]
consistent.

8.2. Warning
Some authors use ‘contradictory’, ‘consistent’ and ‘inconsistent’ as
semantic terms; so that, for example, a set Φ of formulas would be
said to be inconsistent if Φ ⊨₀, that is, if it is not satisfied by any truth
valuation. We shall strictly avoid that semantic usage. Although it will
transpire that a set Φ of formulas is satisfied by some truth valuation
iff it is consistent (in the proof-theoretic sense of Def. 8.1), this fact is
a far from trivial theorem rather than a mere matter of definition.

8.3. Problem
(i) Prove that if Ψ ⊆ Φ and Ψ is inconsistent then Φ is inconsistent.
(ii) Prove that if Φ is inconsistent then it has an inconsistent finite
subset.

8.4. Theorem
An inconsistent set of formulas is not satisfied by any truth valuation: if
Φ ⊢₀ then Φ ⊨₀.

PROOF

Suppose Φ ⊢₀. Then for some α both Φ ⊢₀ α and Φ ⊢₀ ¬α. By the
soundness of Propcal (Thm. 6.12) it follows that both Φ ⊨₀ α and
Φ ⊨₀ ¬α. Thus any truth valuation satisfying Φ would have to satisfy
both α and ¬α, which is impossible by clause (2) of Def. 4.2(ii). ∎

8.5. Corollary (Consistency of Propcal)


It is impossible, for any α, that both ⊢₀ α and ⊢₀ ¬α.

PROOF

The claim is equivalent to saying that the empty set is consistent; but
the empty set is satisfied by every truth valuation (cf. Rem. 4.5(i)). ∎

8.6. Theorem (Inconsistency Effect)


If Φ ⊢₀ then Φ ⊢₀ β for every formula β.

PROOF

Assume Φ ⊢₀. Then for some α both Φ ⊢₀ α and Φ ⊢₀ ¬α. Now, for
any β, the formula ¬α→α→β is an instance of Ax. iv; hence
{α, ¬α} ⊢₀ β. By Cut, Φ ⊢₀ β. ∎

8.7. Remarks
(i) For brevity, we shall refer to the Inconsistency Effect as ‘IE’.
(ii) The converse of Thm. 8.6 is trivial: if all formulas are deducible
from Φ, then in particular both members of any contradictory
pair are deducible from it.
(iii) Our sole purpose in adopting Ax. iv was to enable us to establish
IE. From now on this axiom scheme will not have to be invoked.

8.8. Problem
Let ⊢* be the deducibility relation in a calculus for which both DT and
IE hold. Prove that ⊢* ¬α→α→β for all α and β.

8.9. Theorem (Reductio ad absurdum)


If Φ, α ⊢₀ then Φ ⊢₀ ¬α.

PROOF
Assume Φ, α ⊢₀. Then by IE we have Φ, α ⊢₀ ¬α and hence, by DT,
Φ ⊢₀ α→¬α.
Now, (α→¬α)→¬α is an instance of Ax. v; hence α→¬α ⊢₀ ¬α.
Using Cut, we get Φ ⊢₀ ¬α, as claimed. ∎

8.10. Remarks
(i) The converse of reductio is immediate: if Φ ⊢₀ ¬α then a fortiori
Φ, α ⊢₀ ¬α. But clearly also Φ, α ⊢₀ α; hence Φ, α ⊢₀.
(ii) The sole purpose of adopting Ax. v was to enable us to prove
reductio. Henceforth there will be no need to invoke that axiom
scheme.

8.11. Problem
Let ⊢* be the deducibility relation in a calculus that has modus ponens
as a rule of inference and for which DT and reductio hold. Prove that
⊢* (α→¬α)→¬α for all α.

8.12. Problem
Prove that α ⊢₀ ¬¬α for all α.

8.13. Remark
All the proof-theoretic results we have obtained so far - Cut, DT, IE
and reductio — hold also for the intuitionistic propositional calculus
(the most important non-classical propositional calculus). But the
following result — the inverse of Prob. 8.12 — does not hold for that
calculus, so in order to prove it we shall have to invoke Ax. iii, which
is not valid in intuitionistic logic.

8.14. Lemma
¬¬α ⊢₀ α for all α.

PROOF

Clearly, {α→¬α, α} ⊢₀ α; but also {α→¬α, α} ⊢₀ ¬α, by modus
ponens. Therefore {α→¬α, α} ⊢₀ and by reductio we get¹

    (1)    α→¬α ⊢₀ ¬α.

Now, {¬α, ¬¬α} is a contradictory pair, so it follows from (1) that
{¬¬α, α→¬α} ⊢₀. Hence by IE we have {¬¬α, α→¬α} ⊢₀ α, and
by DT

    (2)    ¬¬α ⊢₀ (α→¬α)→α.

Next, [(α→¬α)→α]→α is an instance of Ax. iii, therefore
(α→¬α)→α ⊢₀ α. From this and (2) we get by Cut ¬¬α ⊢₀ α, as
claimed. ∎

8.15. Theorem (Principle of Indirect Proof)


If Φ, ¬α ⊢₀ then Φ ⊢₀ α.

PROOF

Assume Φ, ¬α ⊢₀. By reductio, Φ ⊢₀ ¬¬α; hence, using Lemma
8.14 and Cut, Φ ⊢₀ α, as claimed. ∎

8.16. Remarks
(i) For brevity, we shall refer to the Principle of Indirect Proof as
‘PIP’.
(ii) Lemma 8.14 is a special case of PIP, for clearly {¬¬α, ¬α} ⊢₀.
(iii) The converse of PIP is immediate.
(iv) The sole purpose of adopting Ax. iii was to enable us to prove
PIP. Henceforth it will no longer be necessary to invoke this
axiom scheme.
(v) Indeed, from now on we shall not invoke any propositional
axiom, because the four proof-theoretic principles (DT, IE,
reductio and PIP) jointly contain all the information that the
choice of axioms was designed to provide (cf. Probs. 7.4, 8.8,
8.11 and 8.18). We use these four principles even where, as in the
proof of Lemma 8.14, it would have been quicker to invoke an
axiom. The reason for this apparent perversity is that the axioms
are forgettable, mere scaffolding, whereas the four principles
(together with modus ponens and Cut) encapsulate the most
important inherent structural facts about the propositional calculus.

¹ We could have got (1) more directly, as in the proof of Thm. 8.9; but see Rem.
8.16(v).

8.17. Warning
Do not commit the solecism of confusing PIP with reductio. The two
principles, though formally similar to each other, are quite distinct.
(Intuitionistic logic rejects the former and upholds the latter.)

8.18. Problem
Let ⊢* be the deducibility relation in a calculus that has modus ponens
as a rule of inference and for which Cut, DT, IE and PIP hold. Prove
that ⊢* [(α→β)→α]→α for all α and β.

8.19. Problem
Prove:

(i) ¬α ⊢₀ α→β,
(ii) β ⊢₀ α→β,
(iii) {α, ¬β} ⊢₀ ¬(α→β),
(iv) ¬(α→β) ⊢₀ α,
(v) ¬(α→β) ⊢₀ ¬β.

8.20. Problem
Using Def. 2.5, prove:
(i) α∧¬α ⊢₀,
(ii) ⊢₀ α∨¬α,
(iii) α∧β ⊢₀ β∧α,
(iv) α∨β ⊢₀ β∨α.

8.21. Remark
In Prob. 8.20, (ii) does not depend on the intuitionistically invalid PIP
(or Ax. iii), whereas (iv) does. On the other hand, it is well known that
in intuitionistic logic the law of excluded middle is invalid, whereas
disjunction has a symmetric meaning. This apparent incongruity is due
to the fact that in intuitionistic logic Def. 2.5(ii) itself is not acceptable,
because disjunction (and, for that matter, conjunction) cannot be
reduced to negation and implication.

8.22. Problem
Prove that ⊢₀ (¬α→β)→(¬α→¬β)→α.

8.23. Remark

The version of the propositional calculus introduced in B&M, Ch. 1,
§ 10 differs from the present one solely in having the axiom scheme
(¬α→β)→(¬α→¬β)→α instead of our last three axiom schemes.
Hence by Prob. 8.22 all the axioms of that version are theorems in the
present version. On the other hand, since (as shown in B&M, Ch. 1,
§ 10) IE, reductio and PIP hold for the B&M version, it follows from
Probs. 8.8, 8.11 and 8.18 that the converse also holds: all axioms of the
present version are theorems in the B&M version. The two versions
are therefore equivalent.

§9. Weak completeness


9.1. Observation
We reproduce below the truth tables for ¬β in terms of β and for β→γ
in terms of β, γ (cf. p. 111). Alongside these tables we quote some
proven results concerning deducibility.

    β | ¬β
    ⊤ |  ⊥       β ⊢₀ ¬¬β (Prob. 8.12),
    ⊥ |  ⊤       ¬β ⊢₀ ¬β (obvious).

    β  γ | β→γ
    ⊤  ⊤ |  ⊤     {β, γ} ⊢₀ β→γ (Prob. 8.19(ii)),
    ⊤  ⊥ |  ⊥     {β, ¬γ} ⊢₀ ¬(β→γ) (Prob. 8.19(iii)),
    ⊥  ⊤ |  ⊤     {¬β, γ} ⊢₀ β→γ (Prob. 8.19(i) or 8.19(ii)),
    ⊥  ⊥ |  ⊤     {¬β, ¬γ} ⊢₀ β→γ (Prob. 8.19(i)).

Observe that there is a systematic relationship between each row in the
truth tables and the deducibility statement to its right. The formulas
involved in each statement are related to the headings of the columns
in the truth tables. Where the entry in the truth table is ‘⊤’, the
corresponding formula on the right is exactly the one indicated at the
head of the relevant column; but where the entry in the truth table is
‘⊥’, the corresponding formula on the right is the negation of that
indicated at the head of the column.
We shall now generalize this observation to all truth tables.

9.2. Lemma
Let α be a combination of formulas β₁, β₂, ..., βₖ. Select a given row
in a truth table for α in terms of β₁, β₂, ..., βₖ. For each i = 1, 2, ...,
k let βᵢ′ be βᵢ or ¬βᵢ, according as the entry in the given row at the i-th
column is ‘⊤’ or ‘⊥’. Similarly, let α′ be α or ¬α, according as the
entry in the given row at the last column is ‘⊤’ or ‘⊥’.
Then {β₁′, β₂′, ..., βₖ′} ⊢₀ α′.

PROOF

For brevity, we put Φ = {β₁′, β₂′, ..., βₖ′}, so we must prove Φ ⊢₀ α′.
We proceed by induction on deg α and distinguish three cases, according
to which of the three rules in Def. 5.1 was used to construct the last
column in the truth table in question.

Case 1: α = βᵢ (where 1 ≤ i ≤ k) and Rule (1) of Def. 5.1 was used. In
this case the entry in the given row and last column is a copy of the one
in the i-th column. Then α′ = βᵢ′ ∈ Φ and obviously Φ ⊢₀ α′.

Case 2: α = ¬γ, where γ is a combination of β₁, β₂, ..., βₖ and Rule
(2) of Def. 5.1 was used. By the induction hypothesis, Φ ⊢₀ γ′, so the
required result, Φ ⊢₀ α′, will follow (thanks to Cut) if we show that
γ′ ⊢₀ α′. We distinguish two subcases.

Subcase 2a: γ′ = γ. Then according to Rule (2) we get α′ = ¬α =
¬¬γ. Thus γ′ ⊢₀ α′ is the same as γ ⊢₀ ¬¬γ, which holds by Prob.
8.12.

Subcase 2b: γ′ = ¬γ. Then according to Rule (2) we get α′ = α =
¬γ; and γ′ ⊢₀ α′ is the same as ¬γ ⊢₀ ¬γ, which is obvious.

Case 3: α = γ→δ, where γ and δ are combinations of β₁, β₂, ..., βₖ
and Rule (3) of Def. 5.1 was used. By the induction hypothesis,
Φ ⊢₀ γ′ and Φ ⊢₀ δ′, so the required result, Φ ⊢₀ α′, will follow
(thanks to Cut) if we show that {γ′, δ′} ⊢₀ α′. We distinguish three
subcases (the first two of which are not mutually exclusive).

Subcase 3a: γ′ = ¬γ. Then according to Rule (3) we get α′ = α =
γ→δ. So γ′ ⊢₀ α′ is the same as ¬γ ⊢₀ γ→δ, which holds by Prob.
8.19(i). Therefore a fortiori {γ′, δ′} ⊢₀ α′.

Subcase 3b: δ′ = δ. According to Rule (3) we have again α′ = α =
γ→δ. So δ′ ⊢₀ α′ is the same as δ ⊢₀ γ→δ, which holds by Prob.
8.19(ii). Therefore a fortiori {γ′, δ′} ⊢₀ α′.

Subcase 3c: neither of the previous two subcases holds; so γ′ = γ
and δ′ = ¬δ. Then by Rule (3) α′ = ¬α = ¬(γ→δ). So {γ′, δ′} ⊢₀ α′
is the same as {γ, ¬δ} ⊢₀ ¬(γ→δ), which holds by Prob. 8.19(iii). ∎
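
The construction of the primed formulas in Lemma 9.2 can be made concrete. The Python sketch below generates, for each row of a truth table, the hypotheses βᵢ′ and the conclusion α′ and prints the deducibility statement that the lemma asserts for that row. It is an illustration only: it does not search for deductions, it restricts the βᵢ to prime formulas (an assumption made here for simplicity), and the tuple encoding and the names neg, imp, value, show and primed_rows are invented for the example.

```python
from itertools import product

# Hypothetical encoding: strings are prime formulas,
# ("not", a) is a negation, ("imp", a, b) an implication.

def neg(a):
    return ("not", a)

def imp(a, b):
    return ("imp", a, b)

def value(f, sigma):
    """Truth value of f under sigma, a dict from prime formulas to bool."""
    if isinstance(f, str):
        return sigma[f]
    if f[0] == "not":
        return not value(f[1], sigma)
    return (not value(f[1], sigma)) or value(f[2], sigma)

def show(f):
    if isinstance(f, str):
        return f
    if f[0] == "not":
        return "¬" + show(f[1])
    return "(" + show(f[1]) + "→" + show(f[2]) + ")"

def primed_rows(alpha, betas):
    """For each row of the truth table of alpha in terms of the prime
    formulas betas, yield the list of primed betas and the primed alpha."""
    for row in product([True, False], repeat=len(betas)):
        sigma = dict(zip(betas, row))
        hyps = [b if sigma[b] else neg(b) for b in betas]
        concl = alpha if value(alpha, sigma) else neg(alpha)
        yield hyps, concl

pq = imp("p", "q")
for hyps, concl in primed_rows(pq, ["p", "q"]):
    print("{" + ", ".join(show(h) for h in hyps) + "} ⊢₀ " + show(concl))
```

For the formula p→q the four printed statements are the four quoted alongside the second table in Observation 9.1, with p and q in place of β and γ.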

9.3. Lemma

Let α be a combination of β₁, β₂, ..., βₖ, and suppose that in some
truth table for α in terms of β₁, β₂, ..., βₖ all the entries in the last
column are ‘⊤’. For each i = 1, 2, ..., k let βᵢ′ be chosen arbitrarily as
βᵢ or ¬βᵢ, the choice being made independently for different i. Then
{β₁′, β₂′, ..., βₖ₋ₚ′} ⊢₀ α for every p = 0, 1, ..., k. In particular, for
p = k: ⊢₀ α.

PROOF

By induction on p. For p = 0 the claim is that {β₁′, β₂′, ..., βₖ′} ⊢₀ α.
This holds by Lemma 9.2, because according to our present assumption
the formula α′ (defined there) is always α itself.
For the induction step, let p < k. We must show that Φ ⊢₀ α, where
Φ = {β₁′, β₂′, ..., βₖ₋₍ₚ₊₁₎′}. (If p = k − 1 then Φ = ∅.)
The induction hypothesis is that Φ, βₖ₋ₚ′ ⊢₀ α. But we are free to
choose βₖ₋ₚ′ in two ways: as βₖ₋ₚ or as ¬βₖ₋ₚ. So we have

    Φ, βₖ₋ₚ ⊢₀ α and Φ, ¬βₖ₋ₚ ⊢₀ α.

Hence

    Φ, ¬α, βₖ₋ₚ ⊢₀ and Φ, ¬α, ¬βₖ₋ₚ ⊢₀.

By reductio and PIP respectively, we get

    Φ, ¬α ⊢₀ ¬βₖ₋ₚ and Φ, ¬α ⊢₀ βₖ₋ₚ.

This shows that Φ, ¬α ⊢₀. So by PIP Φ ⊢₀ α, as required. ∎

We are now in a position to prove a partial converse of Thm. 6.12. The
converse is only partial because of the restriction to finite sets of
formulas; hence also the qualification ‘weak’ in the name of the
theorem:

9.4. Theorem (Weak semantic completeness of Propcal)


For any formula α, if ⊨₀ α then ⊢₀ α. More generally, if Φ is a finite set
of formulas and Φ ⊨₀ α, then Φ ⊢₀ α.

PROOF

Suppose ⊨₀ α. Then by Thm. 5.9 the truth table of α in terms of all its
prime components satisfies the assumption of Lemma 9.3; hence by
that lemma ⊢₀ α.
To prove the second part of the theorem, assume that Φ ⊨₀ α, where
Φ is a finite set of formulas. Let φ₁, φ₂, ..., φₖ be all the members of
Φ; then Φ = {φ₁, φ₂, ..., φₖ} and we have {φ₁, φ₂, ..., φₖ} ⊨₀ α.
By Prob. 4.6(ii) we get ⊨₀ φ₁→φ₂→⋯→φₖ→α. Therefore, by the
first part of the present theorem, ⊢₀ φ₁→φ₂→⋯→φₖ→α. Hence, by
k applications of modus ponens, we obtain {φ₁, φ₂, ..., φₖ} ⊢₀ α,
that is, Φ ⊢₀ α. ∎

A partial converse of Thm. 8.4 can now be proved.

9.5. Theorem
A finite unsatisfiable set of formulas is inconsistent: if Φ is finite and
Φ ⊨₀, then Φ ⊢₀.

PROOF

Suppose Φ ⊨₀. Then trivially Φ ⊨₀ α for any formula α. If Φ is finite,
then by Thm. 9.4 it follows that Φ ⊢₀ α for any α; hence clearly (cf.
Rem. 8.7(ii)) Φ ⊢₀. ∎

9.6. Remarks

(i) Thm. 9.5 has been formulated contrapositively. An equivalent
positive formulation is: A finite consistent set of formulas is
satisfiable [by some truth valuation].
(ii) Thms. 9.4 and 9.5 are equivalent. We have just seen that the
latter follows from the former, but the converse also holds.
Indeed, if Φ is finite and Φ ⊨₀ α, then clearly Φ ∪ {¬α} is finite
and unsatisfiable; hence by Thm. 9.5 Φ, ¬α ⊢₀, and by PIP
Φ ⊢₀ α.

§10. Hintikka sets


10.1. Preview

Our final task in propositional logic will be to prove the full converse
of Thm. 6.12 — the strong semantic completeness of the propositional
calculus. From Rem. 9.6 it should be clear that this task can be
accomplished by proving first the full converse of Thm. 8.4: A consist-
ent — finite or infinite — set of formulas is satisfiable. We shall do so in
three easy stages.
First, we shall show that certain special sets of formulas, called
Hintikka sets, are satisfiable. This will be quite easy, because the
definition of these sets is rigged for this very purpose.
Second, we shall introduce the even more special maximal consistent
sets of formulas and show that each such set is a Hintikka set, and
hence satisfiable. In fact, it will transpire that there is a one-to-one
correspondence between maximal consistent sets and truth valuations.
Finally, using a simple but powerful result from set theory, we shall
show that every consistent set of formulas is a subset of a maximal
consistent set, and is therefore automatically satisfied by the truth
valuation that satisfies the latter.

10.2. Definition
A [propositional] Hintikka set [in ℒ] is a set Φ of formulas satisfying
the following four conditions for all formulas α and β:

(1) If α is prime and α ∈ Φ, then ¬α ∉ Φ.
(2) If ¬¬α ∈ Φ, then also α ∈ Φ.
(3) If α→β ∈ Φ then ¬α ∈ Φ or β ∈ Φ.
(4) If ¬(α→β) ∈ Φ then α ∈ Φ and ¬β ∈ Φ.

10.3. Theorem
If Φ is a Hintikka set, it is satisfied by some truth valuation.

PROOF

Let Φ be a Hintikka set. Define a truth valuation σ by stipulating that
α^σ = ⊤ for every prime formula α belonging to Φ, and α^σ = ⊥ for any
other prime α. We claim that, for any formula φ,

    (a) φ ∈ Φ ⇒ φ^σ = ⊤,    (b) ¬φ ∈ Φ ⇒ φ^σ = ⊥.

We shall prove this double claim simultaneously¹ by induction on
deg φ. We distinguish three cases, corresponding to the three clauses of
Def. 1.4 and those of Def. 4.2(ii).

Case 1: φ is prime.
(1a) φ ∈ Φ ⇒ φ^σ = ⊤                  by the definition of σ.
(1b) ¬φ ∈ Φ ⇒ φ ∉ Φ                    by clause (1) of Def. 10.2,
            ⇒ φ^σ = ⊥                   by the definition of σ.

Case 2: φ is a negation formula; say φ = ¬α.
(2a) φ ∈ Φ ⇒ ¬α ∈ Φ
           ⇒ α^σ = ⊥                    by part (b) of ind. hyp.,
           ⇒ φ^σ = ⊤                    by clause (2) of Def. 4.2(ii).
(2b) ¬φ ∈ Φ ⇒ ¬¬α ∈ Φ
            ⇒ α ∈ Φ                     by clause (2) of Def. 10.2,
            ⇒ α^σ = ⊤                   by part (a) of ind. hyp.,
            ⇒ φ^σ = ⊥                   by clause (2) of Def. 4.2(ii).

Case 3: φ is an implication formula; say φ = α→β.
(3a) φ ∈ Φ ⇒ α→β ∈ Φ
           ⇒ ¬α ∈ Φ or β ∈ Φ            by clause (3) of Def. 10.2,
           ⇒ α^σ = ⊥ or β^σ = ⊤         by ind. hyp.,
           ⇒ φ^σ = ⊤                    by clause (3) of Def. 4.2(ii).
(3b) ¬φ ∈ Φ ⇒ α ∈ Φ and ¬β ∈ Φ         by clause (4) of Def. 10.2,
            ⇒ α^σ = ⊤ and β^σ = ⊥       by ind. hyp.,
            ⇒ φ^σ = ⊥                   by clause (3) of Def. 4.2(ii). ∎
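
For a finite Hintikka set the valuation σ defined in this proof can be computed directly. The Python sketch below checks the four conditions of Def. 10.2, builds σ from the prime members of the set, and confirms that σ satisfies every member. It is a hedged illustration, not part of the text's development: the tuple encoding of formulas and the names neg, imp, value, is_hintikka and hintikka_valuation are assumptions made for the example, and a Python dict stands in for σ, with every prime formula not listed counted as false.

```python
def neg(a):
    return ("not", a)

def imp(a, b):
    return ("imp", a, b)

def value(f, sigma):
    """Truth value of f under sigma; primes missing from sigma are false."""
    if isinstance(f, str):
        return sigma.get(f, False)
    if f[0] == "not":
        return not value(f[1], sigma)
    return (not value(f[1], sigma)) or value(f[2], sigma)

def is_hintikka(phi):
    """Check conditions (1)-(4) of Def. 10.2 for a finite set phi."""
    for f in phi:
        if isinstance(f, str):                               # (1)
            if neg(f) in phi:
                return False
        elif f[0] == "imp":                                  # (3)
            if neg(f[1]) not in phi and f[2] not in phi:
                return False
        elif isinstance(f[1], tuple) and f[1][0] == "not":   # (2)
            if f[1][1] not in phi:
                return False
        elif isinstance(f[1], tuple) and f[1][0] == "imp":   # (4)
            if f[1][1] not in phi or neg(f[1][2]) not in phi:
                return False
    return True

def hintikka_valuation(phi):
    """The valuation of Thm. 10.3: true exactly on the prime members of phi."""
    return {f: True for f in phi if isinstance(f, str)}

phi = {neg(imp("p", "q")), "p", neg("q"), neg(neg("p")), imp("q", "p")}
sigma = hintikka_valuation(phi)
print(is_hintikka(phi), all(value(f, sigma) for f in phi))
```

A set such as {p, ¬p} fails condition (1), and {p→q} alone fails condition (3), so is_hintikka rejects both.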

§11. The ambient metatheory


Let us pause to consider the mathematical presuppositions that underlie
our study of propositional logic. This study is being conducted in
mathematical fashion: we frame precise definitions and prove [meta]theorems
concerning the object language ℒ, its syntax and semantics.
The mathematical theory in which this study is conducted is our
metatheory. (The prefix meta is used here to distinguish this theory,
which is developed in our metalanguage, from formal object theories
that may be constructed in the object language and serve as objects of
our study.)

¹ Note that (a) by itself is sufficient for proving our theorem; and once (a) is established
for all φ then (b) would follow automatically. But if you try to prove (a) on its own,
you will find out that the inductive argument runs into snags.

As any other mathematical theory, our metatheory must start from a
launching pad of presuppositions: some underlying concepts, regarded
as known, in terms of which further concepts of the theory are defined;
and certain fundamental propositions, on the basis of which the
theorems of our theory can be rigorously proved.
Set theory — in the form of ZF or some broadly similar codification —
is certainly strong enough to underpin our study of logic. Indeed, the
entire technical development in the Logic part of this book can be read
as occurring within set theory. Interpreted in this way, not only the
term set but also other mathematical terms such as natural number and
finite, must be understood in the appropriate technical sense: a natural
number as an ordinal belonging to ω, and a finite set as a set
equipollent to a natural number (cf. Rem. 6.1.8).
But all that we have done so far in this chapter does not really
require such a strong ambient theory. Set theory is vital only where
there is need to regard infinite pluralities as single objects: sets that in
turn can themselves be members of classes. So far we have hardly had
any need for positing such completed (or actual) infinities. Though we
have used set-theoretic terminology, this was not essential. For example,
although in Specification 1.1(i) we refer to the totality of propositional
symbols of ℒ as a set, we have never had to regard this totality
as a single object that can be a member of a class. We only need the
stock of propositional symbols to be potentially infinite; so everything
we have done works just as well if we replace the word ‘set’ here by
‘collection’ or by one of its synonyms such as ‘plurality’ or ‘class’. The
same applies to other places where the term ‘set’ has been used.
There was one context that seems to be an exception to what we
have just said and where we did refer to infinite entities as objects. In
Def. 4.4(iii) a formula α was defined to be a tautology, ⊨₀ α, just in
case α^σ = ⊤ for every truth valuation σ. This definition refers (at least
implicitly) to the class of all truth valuations. Now, by Def. 4.2(i), a
truth valuation is a map with an infinite domain, and hence is itself
infinite.
However, this reliance on infinite objects can be avoided by a simple
device. Clearly, the truth value α^σ (defined in Def. 4.2(ii)) depends
only on the values assigned by σ to the prime components of α.
Therefore, instead of truth valuations proper we may consider partial
truth valuations, whose domain is a finite set of propositional symbols,
among which are all the prime components of α. These partial truth
valuations themselves are finite objects; and the notion of tautology
can be redefined by referring to the class of these objects rather than of
truth valuations proper.
A similar device can be used in connection with the definition of the
notion of tautological consequence, Φ ⊨₀ α, provided the collection Φ
is finite. (It is enough to consider partial truth valuations whose
domains are finite sets of propositional symbols, among which are all
the prime components of α and of all the members of Φ.)
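
The device of partial truth valuations is exactly what a truth-table program uses. The following Python sketch tests tautology and tautological consequence for a finite Φ by running through all partial valuations on the relevant prime components. It is a minimal illustration under stated assumptions: formulas are encoded as strings and tuples ("not", α) or ("imp", α, β), a partial valuation is a Python dict, and all function names are invented for the example.

```python
from itertools import product

def neg(a):
    return ("not", a)

def imp(a, b):
    return ("imp", a, b)

def primes(f):
    """The set of prime components of the formula f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(primes(g) for g in f[1:]))

def value(f, sigma):
    """Truth value of f under a partial valuation sigma defined on primes(f)."""
    if isinstance(f, str):
        return sigma[f]
    if f[0] == "not":
        return not value(f[1], sigma)
    return (not value(f[1], sigma)) or value(f[2], sigma)

def partial_valuations(symbols):
    """All partial truth valuations whose domain is the finite set symbols."""
    symbols = sorted(symbols)
    for row in product([True, False], repeat=len(symbols)):
        yield dict(zip(symbols, row))

def is_tautology(alpha):
    return all(value(alpha, s) for s in partial_valuations(primes(alpha)))

def is_consequence(phi, alpha):
    """Tautological consequence for a finite list phi of formulas."""
    symbols = primes(alpha).union(*(primes(f) for f in phi))
    return all(value(alpha, s) for s in partial_valuations(symbols)
               if all(value(f, s) for f in phi))

p, q, r = "p", "q", "r"
axiom_instances = [
    imp(p, imp(q, p)),                                      # Ax. i
    imp(imp(p, imp(q, r)), imp(imp(p, q), imp(p, r))),      # Ax. ii
    imp(imp(imp(p, q), p), p),                              # Ax. iii
    imp(neg(p), imp(p, q)),                                 # Ax. iv
    imp(imp(p, neg(p)), neg(p)),                            # Ax. v
]
print(all(is_tautology(f) for f in axiom_instances))
```

Only finitely many finite objects are inspected in each call, which is the point of the paragraph above; the same functions also give a quick check of Case 1 in the proof of Thm. 6.12 for particular instances of the axiom schemes.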
Thus, provided we restrict parts (iv) and (v) of Def. 4.4 to finite sets
Φ, the rest of the development of propositional logic so far does not
require the framework of set theory. Looked at in this way, terms such
as natural number and finite are to be understood informally rather
than in their technical set-theoretic sense. To be sure, some — relatively
modest — mathematical presuppositions are still needed as underpin-
ning. We shall not attempt to specify these presuppositions in detail,
but merely point out that among them the Principle of Mathematical
Induction takes pride of place.
But such modest mathematical underpinning is no longer adequate
for the development in the following sections of this chapter. Here,
particularly in the proof of Thm. 13.1, some set-theoretic machinery
must be used. So this development must be understood as taking place
within a sufficiently strong ambient set theory. (See, however, Rem.
15/3@);)

§ 12. Maximal consistent sets


12.1. Definition
A maximal [propositionally] consistent set is a consistent set of formu-
las that is not a proper subset of any consistent set of formulas.

12.2. Remarks

(i) In other words, a set Φ of formulas is maximal consistent iff Φ is
consistent, but by adding to Φ even a single new formula (that is,
one not already belonging to it) we obtain an inconsistent set.
(ii) Maximal consistency is an instance of a general set-theoretic
concept. Let X be the class of all consistent sets of formulas. The
relation ⊆_X is then a partial order on X (see Def. 5.2.5 and
Rem. 5.2.6). A maximal consistent set is just a maximal member
of X with respect to the partial order ⊆_X (see Def. 5.2.3).

The following theorem shows that a maximal consistent set is saturated
with respect to deducibility.

12.3. Theorem
If Φ is maximal consistent and Φ ⊢₀ α, then α ∈ Φ.

PROOF

Let Φ be maximal consistent and Φ ⊢₀ α. Suppose it were the case that
α ∉ Φ. Then, by the maximality of Φ, we would get Φ, α ⊢₀ (cf. Rem.
12.2(i)). Hence by reductio we would have Φ ⊢₀ ¬α, showing that Φ
itself is inconsistent, contrary to hypothesis. ∎

The following theorem provides an alternative characterization of
maximal consistent sets.

12.4. Theorem
A consistent set Φ is maximal consistent iff for every formula α either
α ∈ Φ or ¬α ∈ Φ.

PROOF

First, assume Φ is maximal consistent. If α ∉ Φ then by the maximality
of Φ it follows that Φ, α ⊢₀. Hence by reductio Φ ⊢₀ ¬α, and by
Thm. 12.3 ¬α ∈ Φ.
Conversely, assume Φ is consistent and satisfies the condition in
question. Let α be any formula that does not belong to Φ; so by the
assumed condition ¬α ∈ Φ. It follows that Φ, α ⊢₀. Thus we see that
by adding to Φ even a single new formula we get an inconsistent set.
Thus (cf. Rem. 12.2(i)) Φ is maximal consistent. ∎

12.5. Theorem
Every maximal consistent set is a Hintikka set.

PROOF

Let Φ be maximal consistent. We shall show that Φ fulfils the four
conditions of Def. 10.2.
Condition (1) of that definition is obviously satisfied, since Φ is
consistent.
Now suppose ¬¬α ∈ Φ. Then by Lemma 8.14 Φ ⊢₀ α, hence by
Thm. 12.3 α ∈ Φ. Thus condition (2) of Def. 10.2 is satisfied.
Next, suppose α→β ∈ Φ. If ¬α ∈ Φ then condition (3) of Def. 10.2
is satisfied. On the other hand, if ¬α ∉ Φ then by Thm. 12.4 α ∈ Φ.
Since we have assumed that α→β ∈ Φ, it now follows that Φ ⊢₀ β and
hence by Thm. 12.3 β ∈ Φ. Thus condition (3) of Def. 10.2 is satisfied
in this case as well.
Finally, suppose ¬(α→β) ∈ Φ. By parts (iv) and (v) of Prob. 8.19
we have Φ ⊢₀ α and Φ ⊢₀ ¬β. Hence by Thm. 12.3 α ∈ Φ and
¬β ∈ Φ. Thus condition (4) of Def. 10.2 is satisfied. ∎

The following theorem establishes a one-to-one correspondence
between truth valuations and maximal consistent sets.

12.6. Theorem
(i) For any truth valuation σ, the set {φ : φ^σ = ⊤} is maximal
consistent.
(ii) Conversely, if Φ is maximal consistent then Φ = {φ : φ^σ = ⊤}
for some truth valuation σ. Moreover, this σ is the unique truth
valuation satisfying Φ.

PROOF
(i) Put Ψ = {φ : φ^σ = ⊤}. Ψ is evidently satisfiable: it is satisfied by σ.
Hence by Thm. 8.4 it must be consistent.
If α is a formula such that α ∉ Ψ then, by the definition of Ψ, it
follows that α^σ = ⊥. Hence (¬α)^σ = ⊤ and so ¬α ∈ Ψ. Thus by Thm.
12.4 Ψ is maximal consistent.

(ii) Conversely, let Φ be any maximal consistent set. By Thm. 12.5, Φ
is a Hintikka set and hence by Thm. 10.3 it is satisfiable. Let σ be a
truth valuation satisfying Φ. Again let us put Ψ = {φ : φ^σ = ⊤}. Now
Ψ is the set of all formulas satisfied by σ, so Φ ⊆ Ψ. By (i), Ψ is
consistent; but Φ, being maximal consistent, cannot be included in
another consistent set. Therefore Ψ cannot be other than Φ itself.
Thus Φ = Ψ = {φ : φ^σ = ⊤}.
As we have just seen, if σ is any truth valuation satisfying Φ then
φ^σ = ⊤ holds just for formulas φ belonging to Φ and for no others.
This means that σ is uniquely determined by Φ. ∎

12.7. Remark
It is now clear that showing a set of formulas to be satisfiable is
tantamount to showing that it is included in a maximal consistent set.

12.8. Problem (The [classical] logic of implication)


An implicational valuation is a mapping from the set of all prime
formulas and all negation formulas to the set {⊤, ⊥} of truth values.
An implicational valuation is then extended to implication formulas as
well by imposing condition (3) of Def. 4.2(ii). Let ⊨* be the resulting
consequence relation; thus Φ ⊨* α iff every implicational valuation
satisfying Φ also satisfies α.
Let ⊢* be the relation of deducibility in the [classical] calculus of
implication: the linear calculus based on Ax. i, Ax. ii and Ax. iii, with
modus ponens as sole rule of inference.

(i) Verify that the calculus of implication is semantically sound:
Φ ⊢* α ⇒ Φ ⊨* α.
(ii) Show that β→α, (β→γ)→α ⊢* α for all α, β and γ.
(iii) Let α be a formula and let Φ be a set of formulas such that
Φ ⊬* α and which is maximal with this property (that is, Φ is not
a proper subset of any Ψ such that Ψ ⊬* α). Show that Φ is
saturated with respect to ⊢*: if Φ ⊢* β then β ∈ Φ.
(iv) Let α and Φ be as in (iii). Show that there is a unique
implicational valuation that satisfies Φ but does not satisfy α.

§ 13. Strong completeness


The road to the strong completeness theorem is now clear.

13.1. Theorem
Every consistent set of formulas is satisfied by a truth valuation.

PROOF

Let Φ be any set of formulas. If Φ is consistent then clearly every
subset of Φ, and in particular every finite subset, is consistent (cf.
Prob. 8.3(i)). Conversely, if every finite subset of Φ is consistent then
by Prob. 8.3(ii) Φ itself is consistent.
Thus the class X of all consistent sets of formulas is of finite
character (see Def. 5.2.7). It is not difficult to see that X is in fact a
set. (The class S of all ℒ-strings is a set by Thm. 6.3.9; and X is
included in PS.) So if Φ is any consistent set, it follows from the TT
Lemma (Thm. 5.2.8) that Φ is included in some (not necessarily
unique) maximal consistent set Ψ. By Thm. 12.6(ii) Ψ is satisfiable,
and hence so is Φ. ∎

13.2. Theorem (Strong semantic completeness of Propcal)


For any set Φ of formulas and any formula α, if Φ ⊨₀ α then Φ ⊢₀ α.

PROOF

If Φ ⊨₀ α then every truth valuation satisfying Φ must satisfy α and
hence cannot satisfy ¬α. Thus Φ, ¬α ⊨₀. By Thm. 13.1 Φ, ¬α ⊢₀;
hence by PIP Φ ⊢₀ α. ∎

13.3. Remarks
(i) If the primitive symbols of ℒ are given by an explicit enumeration:

{pₙ : n ∈ N},

then the proof of Thm. 13.1 can be made more elementary and
constructive. First, it is easy to define explicitly an enumeration
of all ℒ-formulas:

{φₙ : n ∈ N}.

Next, given a consistent set Φ, we define, by induction on n, sets
Φₙ as follows. We put Φ₀ = Φ; and

Φₙ₊₁ = Φₙ ∪ {φₙ} if this set is consistent,
Φₙ₊₁ = Φₙ otherwise.

It is then quite easy to show that the union Ψ = ⋃{Φₙ : n ∈ N}
is a maximal consistent set; and Ψ clearly includes Φ.
(ii) The soundness and completeness theorems (Thms. 6.12 and 13.2)
jointly mean that the relations of deducibility and tautological
consequence are co-extensive: Φ ⊢₀ α iff Φ ⊨₀ α. Similarly, Thms.
8.4 and 13.1 jointly mean that consistency and satisfiability are
co-extensive: Φ ⊢₀ iff Φ ⊨₀. Therefore any fact proved for ⊢₀
holds also for ⊨₀ and vice versa. An important example is the
following result.

13.4. Theorem (Compactness theorem for propositional logic)


If Φ is a set of formulas such that every finite subset of Φ is satisfiable,
then so is Φ itself.

PROOF

Immediate from Prob. 8.3(ii). ∎
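
The step-by-step extension described in Rem. 13.3(i) can be sketched as
follows for a finite list of formulas. This is an illustration under
stated assumptions, not the construction of the text itself: formulas
are encoded as nested tuples, the enumeration is finite, and consistency
is tested by brute-force satisfiability, which is legitimate here only
because consistency and satisfiability are co-extensive (Thms. 8.4 and
13.1). The names value, primes, satisfiable and extend are hypothetical.

```python
from itertools import product

# Hypothetical encoding: a prime formula is a string; a negation is
# ("not", a); an implication is ("imp", a, b).

def value(phi, sigma):
    """Truth value of phi under the truth valuation sigma (dict: prime -> bool)."""
    if isinstance(phi, str):
        return sigma[phi]
    if phi[0] == "not":
        return not value(phi[1], sigma)
    return (not value(phi[1], sigma)) or value(phi[2], sigma)


def primes(phi):
    """Set of prime formulas occurring in phi."""
    if isinstance(phi, str):
        return {phi}
    return set().union(*(primes(sub) for sub in phi[1:]))


def satisfiable(formulas):
    """Brute-force search for a truth valuation satisfying every formula."""
    atoms = sorted(set().union(*(primes(f) for f in formulas)))
    for bits in product([True, False], repeat=len(atoms)):
        sigma = dict(zip(atoms, bits))
        if all(value(f, sigma) for f in formulas):
            return True
    return False


def extend(phi_set, enumeration):
    """Add each enumerated formula in turn if the result stays consistent."""
    current = list(phi_set)
    for phi in enumeration:
        if phi not in current and satisfiable(current + [phi]):
            current.append(phi)
    return current


start = [("imp", "p", "q")]
enumeration = ["p", ("not", "p"), "q", ("not", "q")]
result = extend(start, enumeration)
```

Here result contains, for each of p and q, exactly one of the formula
and its negation, which is the finite analogue of the condition in
Thm. 12.4. Which maximal extension is reached depends on the order of
the enumeration.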

13.5. Problem (The logic of implication — continued)


Let ⊨* and ⊢* be as in Prob. 12.8. Prove the strong completeness of
the calculus of implication: if Φ ⊨* α then Φ ⊢* α. (If Φ ⊬* α, show
that Φ is included in a set Ψ such that Ψ ⊬* α and such that Ψ is
maximal with this property; then use Prob. 12.8(iv).)
8
First-order logic

§1. Basic syntax


From now on, our formal object language ℒ will be a fixed but (unless
stated otherwise) arbitrary first-order language. We begin by
specifying the primitive symbols of such a language.

1.1. Specification
The primitive symbols of a first-order language ℒ fall into five
mutually exclusive categories:

(i) An infinite sequence of [individual] variables:

v₁, v₂, v₃, ..., vₙ, ...

The order of the variables indicated here will be referred to as
their alphabetic order.
(ii) For each natural number n, a set of n-ary function symbols.
These sets must be pairwise disjoint and some or all of them may
be empty. The 0-ary function symbols (if any) are called
[individual] constants.
(iii) For each positive natural number n, a set of n-ary predicate
symbols. These sets must be pairwise disjoint and at least one of
them must be non-empty.
(iv) Two distinct connectives, ¬ and →, called negation symbol and
implication symbol respectively.
(v) The universal quantifier ∀.

A particular binary predicate symbol = may be singled out as the
equality symbol, in which case ℒ is referred to as a language with
equality. We further stipulate that if ℒ has at least one function symbol
that is not an individual constant (that is, at least one n-ary function
symbol with positive n), then it must be a language with equality.
The variables, the connectives, the universal quantifier and the
equality symbol (if present) are the logical symbols of ℒ. All other
primitive symbols (namely, the function symbols and the predicate
symbols other than =) are extralogical.

1.2. Warnings
(i) Specification 1.1 must not be read as exhibiting any symbol of the
object language ℒ, which indeed may not have a written form.
Thus, for example, it must not be supposed that ‘v₁’ is a variable
of ℒ. Rather, it is a syntactic constant, belonging to our
metalanguage and denoting the first variable (in alphabetic order) of ℒ.
Also, the boldface ‘=’ should not be taken to be the equality symbol
of ℒ. Rather, it is a syntactic constant used to denote the equality
symbol of ℒ, if it has one. (Cf. Warning 7.1.2.)
(ii) Note carefully the distinction between the boldface ‘=’ and the
lightface ‘=’. Both are symbols in our metalanguage. The former is a
name (in the metalanguage) of the equality symbol of the object
language (if it has one); the latter is the equality symbol of the
metalanguage, an abbreviation of the phrase ‘is the same as’.
The similarity of shape between the two symbols, which may be
confusing at first, is an intended pun and a mnemonic device;
see Rem. 4.3(iii) below.

1.3. Remark
The difference in the logical symbols between two different first-order
languages is clearly inessential, and there would be no real loss of
generality if we were to assume that all first-order languages share the
same logical symbols. (In the case of the equality symbol this would
mean that all first-order languages with equality have the same equality
symbol.) Two first-order languages are essentially different if only one
of them is with equality, or if they have different stocks of extralogical
symbols.

1.4. Definition
An ℒ-string is defined in the same way as in propositional logic (see
Def. 7.1.3), namely as a finite sequence of primitive symbols of ℒ.

In propositional logic we had one significant type of string: the
formulas. Here we have two types: terms as well as formulas.

1.5. Definition
ℒ-terms are strings constructed according to the following two rules.

(1) A string consisting of a single occurrence of a variable is an
ℒ-term.
(2) If f is an n-ary function symbol and t₁, t₂, ..., tₙ are ℒ-terms
then the string ft₁t₂ ... tₙ (obtained by concatenating a single
occurrence of f and t₁, t₂, ..., tₙ, in this order) is an ℒ-term.

In a term ft₁t₂ ... tₙ constructed according to clause (2), the terms t₁,
t₂, ..., tₙ are the first argument, second argument, ..., nth
argument, respectively.
For n = 0, (2) says that a single occurrence of a constant is an
ℒ-term (see Specification 1.1(ii)).

1.6. Definition
The degree of complexity of a term t (briefly, deg t) is the total
number of occurrences of function symbols in t.
We shall often use induction on deg t in order to prove general
statements about all terms t.
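
The two clauses of Def. 1.5 and the degree of Def. 1.6 can be tried out
on a toy language. The sketch below is only an illustration: the table
ARITY of function symbols, the convention that variables are tokens
beginning with "v", and the names read_term, is_term and deg are all
assumptions made here, and a string is represented as a list of tokens.

```python
# Hypothetical function symbols with their arities; "0" is a constant.
ARITY = {"0": 0, "s": 1, "+": 2, "*": 2}


def read_term(tokens, start=0):
    """Return the index just past the term that begins at tokens[start]."""
    if start >= len(tokens):
        raise ValueError("string ends before the term is complete")
    head = tokens[start]
    pos = start + 1
    if head in ARITY:                 # clause (2): f followed by n terms
        for _ in range(ARITY[head]):
            pos = read_term(tokens, pos)
        return pos
    if head.startswith("v"):          # clause (1): a single variable
        return pos
    raise ValueError(f"unknown symbol: {head!r}")


def is_term(tokens):
    """True iff the whole token list is a single term."""
    try:
        return len(tokens) > 0 and read_term(tokens) == len(tokens)
    except ValueError:
        return False


def deg(tokens):
    """Degree of complexity: number of occurrences of function symbols."""
    return sum(1 for tok in tokens if tok in ARITY)


example = ["+", "s", "v1", "*", "v2", "0"]   # + (s v1) (* v2 0) in Polish notation
```

Note that the constant "0" counts towards the degree, since constants
are 0-ary function symbols.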

1.7. Definition
ℒ-formulas are strings constructed according to the following four
rules.

(1) If P is an n-ary predicate symbol and t₁, t₂, ..., tₙ are ℒ-terms
then the string Pt₁t₂ ... tₙ (obtained by concatenating a single
occurrence of P and t₁, t₂, ..., tₙ, in this order) is an
ℒ-formula.
(2) If β is an ℒ-formula then ¬β (the string obtained by
concatenating a single occurrence of ¬ and the string β, in this
order) is an ℒ-formula.
(3) If β and γ are ℒ-formulas then →βγ (the string obtained by
concatenating a single occurrence of →, the string β and the
string γ, in this order) is an ℒ-formula.
(4) If x is a variable and β is an ℒ-formula then ∀xβ (the string
obtained by concatenating a single occurrence of ∀, a single
occurrence of x and the string β, in this order) is an ℒ-formula.

A formula Pt₁t₂ ... tₙ constructed according to (1) is called an atomic
formula; the terms t₁, t₂, ..., tₙ are its first argument, second
argument, ..., nth argument, respectively. In the particular case
where P is the equality symbol = (in which case n must be 2) the
atomic formula is also called an equation and its first and second
arguments are called its left-hand side and right-hand side respectively.
In connection with formulas constructed according to (2) and (3) we
use the same terminology as before (see Def. 7.1.4).
A formula ∀xβ constructed according to (4) is called a universal
formula; here x is the variable of quantification and the string xβ is the
scope of the initial occurrence of the universal quantifier.

1.8. Definition
The degree of complexity of a formula α (briefly, deg α) is the total
number of occurrences of connectives (¬ and →) and the universal
quantifier ∀ in α.

1.9. Definition
An ℒ-expression is an ℒ-term or an ℒ-formula.

1.10. Remark
We use ‘r’, ‘s’ and ‘t’ (sometimes with subscripts) as syntactic variables
ranging over ℒ-terms. Boldface lower-case Greek letters (sometimes
with subscripts) are used as syntactic variables ranging over
ℒ-formulas. These and other notational conventions of this kind should be
self-evident.

§2. Adaptation of previous material


In this section we adapt the notational conventions, definitions and
results of Ch. 7 to the new setting. Some of these will be slightly
extended to fit this new setting.
The following problem can be solved similarly to Prob. 7.1.9.

2.1. Problem
Assign to each primitive symbol p of ℒ a weight w(p) by stipulating
that if x is a variable then w(x) = −1; if f is an n-ary function symbol
then w(f) = n − 1; if P is an n-ary predicate symbol then w(P) =
n − 1; while w(¬) = 0, w(→) = 1 and w(∀) = 1. If p₁, p₂, ..., pₗ are
primitive symbols, assign to the string p₁p₂ ... pₗ weight

w(p₁p₂ ... pₗ) = w(p₁) + w(p₂) + ··· + w(pₗ).

Thus, the weight of a string is the sum obtained by adding −1 for each
occurrence of a variable, n − 1 for each occurrence of an n-ary
function symbol or predicate symbol, and +1 for each occurrence of
→ or ∀ in the string (occurrences of ¬ make no contribution to the
weight). Show that, for any term t,

(i) w(t) = −1;
(ii) if t is the string p₁p₂ ... pₗ and k < l, then w(p₁p₂ ... pₖ) ≥ 0.
(iii) Show that if t is a term ft₁t₂ ... tₙ formed according to Def.
1.5(2), then for each k = 0, 1, ..., n, ft₁t₂ ... tₖ is the shortest
non-empty initial segment of t whose weight is n − k − 1.
(iv) Show that if α is a formula Pt₁t₂ ... tₙ formed according to Def.
1.7(1), then for each k = 0, 1, ..., n, Pt₁t₂ ... tₖ is the shortest
non-empty initial segment of α whose weight is n − k − 1.
(v) Also show that the results of Prob. 7.1.9 concerning formulas
hold for the present language ℒ. (For (i) and (ii) of Prob. 7.1.9,
four cases now need to be considered, corresponding to the four
clauses of Def. 1.7. In the case where α is atomic, the previous
results of the present problem are invoked.)
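
The weights of Prob. 2.1 are easy to compute for concrete strings, which
may help in seeing why the claims hold before proving them. The sketch
below is an illustration only: the table ARITY, the token names "not",
"imp" and "all" standing for ¬, → and ∀, and the convention that every
other token is a variable are assumptions made here.

```python
from itertools import accumulate

# Hypothetical function and predicate symbols with their arities.
ARITY = {"0": 0, "s": 1, "+": 2, "*": 2, "P": 2}


def symbol_weight(tok):
    """Weight of one primitive symbol, following the stipulations of Prob. 2.1."""
    if tok in ARITY:                  # n-ary function or predicate symbol
        return ARITY[tok] - 1
    if tok in ("imp", "all"):         # implication symbol, universal quantifier
        return 1
    if tok == "not":                  # negation symbol
        return 0
    return -1                         # anything else is taken to be a variable


def prefix_weights(tokens):
    """Weights of all non-empty initial segments, shortest first."""
    return list(accumulate(symbol_weight(tok) for tok in tokens))


term = ["+", "s", "v1", "*", "v2", "0"]
formula = ["all", "v1", "imp", "P", "v1", "v2", "not", "P", "v1", "v1"]

term_weights = prefix_weights(term)
formula_weights = prefix_weights(formula)
```

In both examples the whole string has weight −1 and every proper
non-empty initial segment has weight at least 0, as claimed in (i) and
(ii); a pair of examples is of course no substitute for the inductive
proof the problem asks for.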

Prob. 2.1 shows that the Polish notation decreed for ℒ makes brackets
and other punctuation marks unnecessary in that language.¹ However,
for reasons explained in §2 of Ch. 7, we decree:

2.2. Definition
(i) The same as Def. 7.2.
(ii) (r=s) =df =rs,
(iii) (r≠s) =df ¬(r=s).

¹ The ambiguities that might otherwise arise are illustrated by a piece that appeared in
The Guardian on 10 October 1985, reporting ‘grisly new details of the murder by Lord
Lucan in 1974 of one of his children’s two nannies’. Did the writer intend to say ‘. . . of
[one of (his children’s two nannies)]’ or ‘. . . of [(one of his children)’s two nannies]’?
Did Lord Lucan murder one of the two nannies of his children, or did he commit the
double murder of two nannies of one of his children?


Also, we introduce by contextual definition surrogates for three addi-
tional connectives and the existential quantifier:

2.3. Definition
(i)-(iii) The same as Def. 7.2.5(i)-(iii).
(iv) ∃xα =df ¬∀x¬α.

With this more conventional metalinguistic notation, brackets are
needed, and so are rules for omitting and restoring them. We adopt
the same rules as before: omission of outermost brackets (Rule
7.2.2), adhesion of ‘¬’ (Rule 7.2.4), ranks and association to the right
(Rule 7.2.7) and add to them one more rule:

2.4. Rule (Adhesion of ‘∀x’ and ‘∃x’)

Do not omit a pair of brackets whose left member is immediately
preceded by an occurrence of ‘∀x’ or ‘∃x’. Equivalently: In restoring
brackets, do not add a new pair of brackets whose left member
immediately follows an occurrence of ‘∀x’ or ‘∃x’. Similarly with ‘x’
replaced by ‘y’, or ‘z’, or by any other syntactic variable ranging over
ℒ-variables, or by a syntactic constant denoting an ℒ-variable.

In order to adapt the rest of the material of Ch. 7 to our present
setting, we need to redefine the notions prime formula and prime
component of a formula.

2.5. Definition
A prime formula is a formula that is atomic or universal.

2.6. Definition
The set of prime components of a formula α is the smallest set of prime
formulas from which α can be obtained as a propositional combination.
In detail, by induction on deg α:

(1) If α is a prime formula, then the set of prime components of α is
{α}.
(2) If α = ¬β then the set of prime components of α is the same as
that of β.
(3) If α = β→γ then the set of prime components of α is the union of
those of β and γ.
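
The inductive clauses of Def. 2.6 translate directly into a recursive
function. In the sketch below the tuple encoding of formulas and the
name prime_components are assumptions made for illustration: a formula
tagged "not" or "imp" is a propositional combination, and any other
formula (atomic or universal) is treated as prime, as in Def. 2.5.

```python
# Hypothetical encoding: ("atom", P, t1, ..., tn), ("not", b),
# ("imp", b, c), ("all", x, b), with terms and variables as strings.

def prime_components(alpha):
    """Set of prime components of alpha, following clauses (1)-(3) of Def. 2.6."""
    tag = alpha[0]
    if tag == "not":                                   # clause (2)
        return prime_components(alpha[1])
    if tag == "imp":                                   # clause (3)
        return prime_components(alpha[1]) | prime_components(alpha[2])
    return {alpha}                                     # clause (1): atomic or universal


px = ("atom", "P", "x")
all_px = ("all", "x", px)          # a universal formula, hence prime
qy = ("atom", "Q", "y")

# alpha is the negation of: (all x Px) implies (Qy implies (all x Px)).
alpha = ("not", ("imp", all_px, ("imp", qy, all_px)))
components = prime_components(alpha)
```

The atomic formula px occurs inside all_px but is not itself a prime
component of alpha: the recursion never looks inside a universal
formula, which is what makes universal formulas behave like
propositional atoms.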

With these redefinitions, all the material of §§3-13 of Ch. 7 carries
over lock, stock and barrel into the present setting. From now on,
whenever we use a piece of notation introduced in Ch. 7, or refer to a
definition, result or remark in that chapter, we shall interpret that
notation, definition, result or remark as relating to the present setting,
in which ℒ is a first-order language rather than the language of Ch. 7.

§3. Mathematical structures


3.1. Preview
Of course, we have not introduced our first-order language ℒ merely
as a vehicle for propositional logic: this would leave the variables, the
function symbols, the predicate symbols and the universal quantifier
without gainful employment, while only the connectives would be
doing a significant job. The point of having a first-order language is
that such a language, when suitably interpreted, can be used to ‘talk
about’ this or that mathematical structure. In this section we shall
explain what a mathematical structure is.
We shall make use of the material presented in Ch. 2; in particular,
the notions of relation and property (Def. 2.1.14) and that of map
(a.k.a. mapping or function, Def. 2.2.1). We shall also need the
following definition.

3.2. Definition
For n ≥ 1, an n-ary operation on a class A is a map from Aⁿ to A.
If f is an n-ary operation on A, and a₁, a₂, ..., aₙ ∈ A, then the
value of f at the n-tuple ⟨a₁, a₂, ..., aₙ⟩ is usually denoted by
‘f(a₁, a₂, ..., aₙ)’ with parentheses instead of corner brackets.

3.3. Remark
From Def. 3.2 and the definitions made in Ch. 2 it is not difficult to see
that f is an n-ary operation on A iff f is an (n + 1)-ary relation on A
such that for any a₁, a₂, ..., aₙ ∈ A there is a unique a ∈ A for which
⟨a₁, a₂, ..., aₙ, a⟩ ∈ f.

So far we have defined the notion of n-ary operation for positive n
only. If we were to extend Def. 3.2 directly to n = 0, then a 0-ary
operation on A would be defined as a set of the form {⟨∅, a⟩}, with
a ∈ A. On the other hand, were we to extend the condition of Rem.
3.3 to the case n = 0, then a 0-ary operation on A would have to be
defined as a set of the form {a}, with a ∈ A. In either case, there
would be a one-to-one correspondence between 0-ary operations on A
and members of A. In fact it turns out to be most convenient to take
neither of these courses, but (in the spirit of reductionism) simply to
identify 0-ary operations on a class with its members:

3.4. Definition
A 0-ary operation on a class A is a member of A.

We are now ready to lay down the main definition of this section.

3.5. Definition
A mathematical structure is a composite entity 𝔘 consisting of the
following ingredients.

(i) A non-empty set U called the domain or universe of 𝔘. The
members of the domain are called the individuals of 𝔘.
(ii) A set of operations on U, called the basic operations of 𝔘.
(iii) A non-empty set of relations on U, called the basic relations of 𝔘.

Note that the set of basic operations may be empty. Among the basic
operations there may be some 0-ary ones, which by Def. 3.4 are
individuals of the structure. Such an object (that is, an individual of
the structure which is also among its basic operations) is called a
designated individual of the structure.

Perhaps the most fundamental structure of classical mathematics is:

3.6. Example
The elementary (or first-order) structure of natural numbers may be
defined as the structure 𝔑 having the following ingredients.

(i) Its domain is the set N = {0, 1, 2, ...} of all natural numbers.
(ii) Its four basic operations are the designated individual 0; the
unary operation s which assigns to each number n its immediate
successor; and two binary operations, + and ×, which assign to
each pair of numbers their sum and product respectively.
(iii) Its only basic relation is the identity relation on N, namely
id_N = {⟨n, n⟩ : n ∈ N}.

3.7. Example
A more general notion of structure than that prescribed by Def. 3.5 is
obtained by allowing the domain to be a proper class rather than a set,
and also admitting a basic relation which is a proper class. The most
important example of this liberalized notion is the structure of sets 𝔐,
having the following ingredients.

(i) Its domain is the class M of all objects, that is sets and individuals
(if any) of set theory, a.k.a. the universal class.
(ii) No basic operations.
(iii) Its basic relations are the identity relation on M and the relation
∈ of membership between objects and sets.

3.8. Remark
A great many mathematical statements are, or can be construed as,
statements about mathematical structures. The structuralist view of
mathematics holds that mathematics is essentially the study of such
structures.

§4. Basic semantics


4.1. Preview
By itself, ℒ is meaningless; its expressions express nothing: they are
just strings of meaningless symbols, combined according to apparently
arbitrary formal syntactic rules. In this section we introduce the basic
semantic apparatus needed to endow ℒ-expressions with meaning.
First, we shall define the notion of ℒ-interpretation (a.k.a.
ℒ-structure). Roughly speaking, an ℒ-interpretation is a mathematical
structure (cf. Def. 3.5) together with a sort of ‘dictionary’ that assigns a
reference to each function symbol of ℒ, making it a name of some
basic operation of the structure; and to each extralogical predicate
symbol of ℒ, making it a name of some basic relation of the structure.
Under a given ℒ-interpretation, each closed term (a term not
containing variables) receives a reference, becoming a name for some
individual (a member of the domain of the structure). A term
containing variables does not receive any particular reference, but once the
variables are assigned values (belonging to the domain) the term itself
receives a value (also belonging to the domain).
Certain formulas, known as sentences, also receive meaning under
an ℒ-interpretation: each sentence expresses a proposition about the
mathematical structure concerned, and thus receives a truth value ⊤ or
⊥, according as that proposition is true or not. A formula that is not a
sentence does not express a proposition and thus cannot be said to be
true or false outright. Rather, it expresses a condition which may or
may not be satisfied by a given assignment of values (belonging to the
domain) to certain variables, the free variables of the formula.
In order to deal with all terms (including those that contain
variables) and all formulas (including those that are not sentences), we
shall introduce the notion of ℒ-valuation, which is an ℒ-interpretation
together with an assignment of an individual (member of the domain)
as value to each variable of ℒ. Under an ℒ-valuation, each term
receives a value (belonging to the domain of the structure) and each
formula receives a truth value.

4.2. Definition
An ℒ-interpretation (or ℒ-structure) is a package 𝔘, that is, a
composite entity (or, to be pedantic, an ordered triple), consisting of
the following three components.

(i) A non-empty set U, called the domain or universe [of discourse]
of 𝔘. The members of U are called individuals of 𝔘.
(ii) A mapping that assigns to each function symbol f of ℒ an
operation f^𝔘 on U, such that if f is an n-ary function symbol then
f^𝔘 is an n-ary operation on U. In particular, if c is a constant of
ℒ then c^𝔘 is an individual of 𝔘. Operations of the form f^𝔘 are
called basic operations of 𝔘; individuals of the form c^𝔘 are called
designated individuals of 𝔘.
(iii) A mapping that assigns to each predicate symbol P of ℒ a
relation P^𝔘 on U, such that if P is an n-ary predicate symbol then
P^𝔘 is an n-ary relation on U and such that if ℒ has the equality
symbol = then =^𝔘 is the identity (diagonal) relation on U,
namely id_U = {⟨u, u⟩ : u ∈ U}.

4.3. Remarks

(i) The requirement that the domain U be non-empty has some
technical advantages and is adopted by most modern authors.
However, it is not essential and some authors (for example,
Wilfrid Hodges, Logic, Penguin 1977) do allow structures with
empty domain; the resulting treatment differs in some minor
points from the conventional one.
(ii) The mappings mentioned in clauses (ii) and (iii) of Def. 4.2 are
not assumed to be one-to-one. For example, it is possible to have
c ≠ d with c^𝔘 = d^𝔘; in other words, two distinct constants may
have the same interpretation. (This is like an object having more
than one name in ordinary language.)
(iii) The special role of the equality symbol of ℒ, and the reason why
we have denoted it by ‘=’, are made clear in clause (iii) of Def.
4.2. Many authors confine the mapping in this clause to
extralogical predicate symbols; and the additional requirement that
the equality symbol = of ℒ be interpreted as denoting the
identity relation on U is then introduced separately as part of the
Basic Semantic Definition (see, for example, B&M, pp. 49 and
51). In the end it amounts to the same thing.
(iv) We use upper-case German (Fraktur) letters to denote ℒ-structures.
We adopt the convention that where a structure is denoted by a
given German letter, its domain will be denoted by the
corresponding upper-case italic, unless specified otherwise.
(v) Note that the meaning of the term ‘individual’ here (as well as in
Def. 3.5) is different from its special meaning in set theory (see
1:13).

4.4. Definition

(i) An ℒ-valuation is a package (say an ordered pair) σ whose two
components are: an ℒ-interpretation 𝔘; and a mapping that
assigns to every variable x of ℒ a value x^σ ∈ U.
(ii) The ℒ-structure that forms part of an ℒ-valuation σ is called the
underlying structure of σ. We also say that σ is based on this
structure.
(iii) If σ is an ℒ-valuation with underlying structure 𝔘, then by the
universe of σ we mean the domain U of 𝔘; and we put f^σ =df f^𝔘
for every function symbol f of ℒ and P^σ =df P^𝔘 for every
predicate symbol P of ℒ.

4.5. Definition
If σ is a valuation and u is an individual in its universe, then σ(x/u) is
the valuation that is based on the same structure as σ and assigns the
same values as σ to all variables other than x, while x^σ(x/u) = u. We say
that σ(x/u) is obtained from σ by revaluing x as u.
Thus f^σ(x/u) = f^σ for every function symbol f; and P^σ(x/u) = P^σ for
every predicate symbol P; and y^σ(x/u) = y^σ for every variable y ≠ x;
while x^σ(x/u) = u.

The following definition is of central importance. It was first stated
explicitly by Alfred Tarski in 1933, but had been used tacitly long
before that. For any valuation σ, the first section of the definition
assigns to each term t a value t^σ belonging to the universe of σ. This is
done by induction on deg t, in two clauses corresponding to those of
Def. 1.5. The second section of the definition assigns to each formula α
a truth value α^σ. This is done by induction on deg α, in four clauses
corresponding to those of Def. 1.7.

4.6. Basic Semantic Definition (BSD)


Let σ be a valuation with universe U.

(T1) If x is a variable, then x^σ is already defined (see Def. 4.4).

(T2) If f is an n-ary function symbol and t₁, t₂, ..., tₙ are terms, then

(ft₁t₂ ... tₙ)^σ = f^σ(t₁^σ, t₂^σ, ..., tₙ^σ).

(F1) If P is an n-ary predicate symbol and t₁, t₂, ..., tₙ are terms,
then

(Pt₁t₂ ... tₙ)^σ = ⊤ if ⟨t₁^σ, t₂^σ, ..., tₙ^σ⟩ ∈ P^σ,
(Pt₁t₂ ... tₙ)^σ = ⊥ otherwise.

In particular,

(s=t)^σ = ⊤ if s^σ = t^σ,
(s=t)^σ = ⊥ otherwise.

(F2) (¬β)^σ = ⊤ if β^σ = ⊥,
     (¬β)^σ = ⊥ otherwise.

(F3) (β→γ)^σ = ⊥ if β^σ = ⊤ and γ^σ = ⊥,
     (β→γ)^σ = ⊤ otherwise.

(F4) (∀xβ)^σ = ⊤ if β^σ(x/u) = ⊤ for every u ∈ U,
     (∀xβ)^σ = ⊥ otherwise.
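
For a structure with a finite universe the clauses of the BSD can be
carried out mechanically. The sketch below is an illustration under
assumptions of its own: the dictionary layout of struct, the tuple
encoding of terms and formulas, and the names term_value, truth and
exists are all hypothetical, and the three-element example structure is
invented for the purpose. The quantifier clause is computable here only
because the universe is finite.

```python
# Hypothetical encoding. Terms: a variable is a string; f t1 ... tn is the
# tuple (f, t1, ..., tn), so a constant c is (c,). Formulas:
# ("pred", P, t1, ..., tn), ("not", b), ("imp", b, c), ("all", x, b).

def term_value(t, struct, assign):
    """Clauses (T1) and (T2): value of the term t."""
    if isinstance(t, str):                       # (T1): a variable
        return assign[t]
    f, *args = t                                 # (T2): f applied to n terms
    return struct["funcs"][f](*(term_value(a, struct, assign) for a in args))


def truth(phi, struct, assign):
    """Clauses (F1)-(F4): truth value of the formula phi."""
    tag = phi[0]
    if tag == "pred":                            # (F1)
        _, p, *args = phi
        values = tuple(term_value(a, struct, assign) for a in args)
        return values in struct["preds"][p]
    if tag == "not":                             # (F2)
        return not truth(phi[1], struct, assign)
    if tag == "imp":                             # (F3)
        return (not truth(phi[1], struct, assign)) or truth(phi[2], struct, assign)
    if tag == "all":                             # (F4): revalue x as each u in U
        _, x, body = phi
        return all(truth(body, struct, {**assign, x: u}) for u in struct["domain"])
    raise ValueError(f"not a formula: {phi!r}")


def exists(x, body):
    """Def. 2.3(iv): the existential quantifier as an abbreviation."""
    return ("not", ("all", x, ("not", body)))


# An invented structure: domain {0, 1, 2}, successor modulo 3, usual order.
struct = {
    "domain": [0, 1, 2],
    "funcs": {"0": lambda: 0, "s": lambda n: (n + 1) % 3},
    "preds": {"=": {(u, u) for u in range(3)}, "<": {(0, 1), (0, 2), (1, 2)}},
}
assign = {"x": 0, "y": 1}

irreflexive = ("all", "x", ("not", ("pred", "<", "x", "x")))
zero_is_a_successor = exists("x", ("pred", "=", ("s", "x"), ("0",)))
x_less_y = ("pred", "<", "x", "y")
```

The dictionary {**assign, x: u} plays the role of the revaluation
σ(x/u) of Def. 4.5: it agrees with assign except possibly at x. The
formula x_less_y has free variables, so its truth value depends on the
assignment, whereas the other two formulas are sentences.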

4.7. Remarks
(i) Strictly speaking, what the BSD defines is a pair of new mappings
induced by the given valuation σ and extending it to larger
domains. This is somewhat obscured by the fact that both of
these two new mappings are also denoted by ‘σ’. The first of
these, defined in (T1) and (T2), is a map from the set of all terms
to the universe U of σ. The second map induced by σ, defined in
(F1)-(F4), maps the set of all formulas to the set {⊤, ⊥} of truth
values.
(ii) Clauses (F2) and (F3) are identical with clauses (2) and (3) of
Def. 7.4.2(ii), and so ensure that [the second mapping induced
by] a valuation assigns truth values to formulas in just the way a
truth valuation is required to do in propositional semantics. Thus,
as far as its effect on formulas is concerned, a valuation may be
regarded as a special case of a truth valuation. Note however that
not every truth valuation can be obtained in this way from a
valuation. For example, if α is any formula and x is any variable,
then by Def. 2.5 the formula ∀x(α→α) is prime; hence there are
truth valuations under which ∀x(α→α) has the truth value ⊥.
But it is easy to see that [∀x(α→α)]^σ = ⊤ for any valuation σ. (If
ℒ is a language with equality, then a simpler counter-example is
provided by the equation x=x, where x is any variable: this
formula is prime, but its truth value under any valuation is ⊤.)
(iii) Due to (F4), the BSD has a strongly non-effective character: it
does not, in general, provide us with a method whereby the truth
value α^σ (for given α and σ) might be found in a finite number of
steps. For, if the universe U is infinite and α is a universal
formula, ∀xβ, then by (F4) the truth value α^σ depends on the
infinitely many truth values β^σ(x/u), for all the infinitely many u in
U. Of course, for some particular x, β and σ with infinite
universe U it may be possible to determine, using some
theoretical argument, whether or not β^σ(x/u) = ⊤ for all the infinitely
many u in U. But there is no a priori reason to suppose that
there is some universally applicable method for arriving at such
an argument. (Indeed, we shall see later that there can be no
such method.) Compare this to the situation in propositional
logic, where the truth value of any given formula under any given
truth valuation can be computed mechanically: for example,
using a truth table.

4.8. Problem
Using Def. 2.3(iv), show that

(∃xβ)^σ = ⊤ ⇔ β^σ(x/u) = ⊤ for some u in the universe U of σ.

4.9. Definition
(i) If φ is a formula and σ is a valuation such that φ^σ = ⊤, we say
that σ satisfies φ and write ‘σ ⊨ φ’.
(ii) If Φ is a set of formulas and σ is a valuation that satisfies every
member of Φ, we say that σ satisfies Φ and write ‘σ ⊨ Φ’.

4.10. Definition
(i) If the formula α is satisfied by every valuation, we say that α is
logically true (or logically valid) and write ‘⊨ α’.
(ii) If Φ is a set of formulas and α is a formula such that every
valuation that satisfies Φ also satisfies α, we say that α is a logical
consequence of Φ and write ‘Φ ⊨ α’. In this connection we
employ simplified notation similar to that used in connection with
‘⊨₀’. For example, we write ‘Φ, φ ⊨ α’ as short for ‘Φ ∪ {φ} ⊨ α’.
(iii) If Φ is a set of formulas that is satisfied by some valuation, we
say that Φ is satisfiable. If Φ is not satisfied by any valuation we
say that it is unsatisfiable and write ‘Φ ⊨’.
(iv) If α ⊨ β and also β ⊨ α (that is, α^σ = β^σ for every valuation σ) then
we say that α and β are logically equivalent and write ‘α ≡ β’.

4.11. Theorem
If Φ ⊨₀ α then also Φ ⊨ α. In particular, if ⊨₀ α then also ⊨ α; and if
α ≡₀ β then also α ≡ β.

PROOF
Immediate from Rem. 4.7(ii). ∎

The converse of this theorem is of course false. As pointed out in
Rem. 4.7(ii), the logically true formula ∀x(α→α) is prime, so cannot
be a tautology. And this same formula is logically equivalent, but not
tautologically equivalent, to the formula ∀x(α↔α).

4.12. Problem
(i) For any set Φ of formulas and any two formulas α and β, prove
that Φ, α ⊨ β iff Φ ⊨ α→β.
(ii) Prove that {α₁, α₂, ..., αₙ} ⊨ β iff ⊨ α₁→α₂→···→αₙ→β.
(iii) Prove that α ≡ β iff ⊨ α↔β.

4.13. Remark
We say that β is a subformula of α if the formula β, regarded as an
ℒ-string, occurs as a consecutive part of the formula α, where the
latter is also regarded as an ℒ-string. (Note that β can occur in α more
than once; but using Prob. 2.1(v) it is easy to show that two distinct
occurrences of β in α cannot overlap.)
An obvious feature of the BSD is that if α is a non-atomic formula,
then α^σ is determined in terms of the truth values of certain subformulas
of α under σ itself and (if α is a universal formula) under certain
other valuations. Note that it is the truth values of these subformulas
that matter, not the subformulas themselves.
This has the following consequence. Suppose that β′ is a formula
such that β′ ≡ β and let α′ result from α when an occurrence of a
subformula β in α is replaced by β′. Then α′ ≡ α. This rather obvious
result can be proved rigorously by a simple but tedious induction on
deg α.

4.14. Remark
Let us pause to consider the issue raised in §11 of Ch. 7: that of the
ambient metatheory. While the mathematical presuppositions required
for the first three sections of this chapter are rather modest, the
Tarskian semantics presented in this section is quite another matter.
This is mainly due to Def. 4.10, which refers (albeit implicitly) to the
class of all ℒ-valuations, and thereby requires these valuations to be
objects.
Now, a valuation is in general an infinite entity, for two reasons.
First, because a valuation must assign interpretations to all extralogical
symbols of ℒ, of which there may be infinitely many; and values to all
variables of ℒ, of which there must be infinitely many. Second,
because one component of a valuation is its universe, which may be an
infinite set.
The first reason is not essential, at least if we are prepared to confine
our semantic treatment to a single formula (or to finitely many
formulas) at a time. We can employ a device similar to that described
in §11 of Ch. 7 in connection with truth valuations: instead of using
full valuations, we may use partial valuations, which assign interpreta-
tions and values to finitely many extralogical symbols and variables,
including all those that occur in the given formula(s). But the possible
infinitude of the universe of a valuation cannot be circumvented in this
way, because clause (F4) of the BSD makes the truth value of a
universal formula, (∀xβ)^σ, dependent on the whole universe U of σ.
For this reason, those parts of our investigation that depend on
concepts defined in Def. 4.10 will generally presuppose the existence
of infinite sets as objects, and must be viewed as taking place in an
ambient theory that incorporates a sufficiently rich set theory.

§5. Free and bound occurrences of variables


5.1. Preview
The value t^σ of a term and the truth value α^σ of a formula under a
valuation σ clearly ought not to depend on the whole of σ but only on
its ‘relevant’ parts. For example, if f is a function symbol that does not
occur in t (or in α) then surely t^σ (or α^σ) ought not to depend on f^σ.
We shall soon state this proposition more precisely, and prove that it is
indeed correct. However, when it comes to variables, we must distinguish
two ways in which they occur in formulas: an occurrence of a
variable in a formula can be either free or bound. It will transpire that
α^σ does not depend on x^σ even if the variable x does occur in α,
provided that all its occurrences are bound.

5.2. Definition
We say that valuations σ and τ agree on a variable x (or function
symbol f, or extralogical predicate symbol P) if σ and τ have the same
universe and x^σ = x^τ (or f^σ = f^τ, or P^σ = P^τ, respectively).

5.3. Remark
We can characterize σ(x/u) as the valuation that agrees with σ on all
extralogical symbols and all variables other than x, whereas x^{σ(x/u)} = u.

5.4. Theorem
Let t be a term and let σ and τ be valuations that agree on all function
symbols and all variables occurring in t. Then t^σ = t^τ.

PROOF
Easy, by induction on deg t. DIY or see B&M, p. 54. ∎

5.5. Remark
In particular, if t contains no variables (and is therefore made up
entirely of constants and other function symbols) and the valuations σ
and τ are based on the same ℒ-structure then t^σ = t^τ.

5.6. Definition
A term t is closed if it contains no variables. If t is such a term and 𝔘 is
an ℒ-structure, we put t^𝔘 =_df t^σ, where σ is some valuation based on
𝔘. (By Rem. 5.5 it makes no difference which valuation based on 𝔘 is
chosen.)

An occurrence of a variable x in a formula α is bound if it falls inside
the scope of a quantifier that has x as its variable of quantification.
Any other occurrence of x in α is free. More precisely, we define these
concepts by induction on deg α.

5.7. Definition
The occurrences of a variable x in a formula α are classified into two
mutually exclusive kinds, free occurrences of x in α and bound
occurrences of x in α, as follows:

(1) If α is atomic, then all occurrences of x in α are free in α.
(2) If α = ¬β then an occurrence of x in α is free in α iff it is free
in β.
(3) If α = β→γ then an occurrence of x in α is free in α iff it is free in
β or in γ.
(4) If α = ∀xβ then all occurrences of x in α are bound in α. But if
α = ∀yβ, where y is a variable other than x, then an occurrence
of x in α is free in α iff it is free in β.

A variable x is free in a formula α if x has a free occurrence in α. The
free variables of a formula are those that are free in it.
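
The four clauses of Def. 5.7 translate directly into a recursive computation of the set of free variables of a formula. The following is a minimal sketch under an assumed encoding that is not part of the text: a variable is a string, a term ft₁...tₙ is the tuple ("f", t1, ..., tn), and formulas are tuples tagged "not", "imp", "all", or else atomic. The function names are likewise hypothetical.

```python
def term_vars(t):
    """Set of variables occurring in the term t."""
    if isinstance(t, str):                       # t is a variable
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))


def free_vars(a):
    """Set of variables that have a free occurrence in the formula a."""
    tag = a[0]
    if tag == "not":                             # clause (2)
        return free_vars(a[1])
    if tag == "imp":                             # clause (3)
        return free_vars(a[1]) | free_vars(a[2])
    if tag == "all":                             # clause (4)
        _, x, body = a
        return free_vars(body) - {x}
    # clause (1): in an atomic formula every occurrence is free
    return set().union(*(term_vars(t) for t in a[1:]))


# Px -> (for all x) Qxy: x is free (first occurrence), y is free
alpha = ("imp", ("P", "x"), ("all", "x", ("Q", "x", "y")))
print(sorted(free_vars(alpha)))

# (for all x)(for all y) Qxy has no free variables
closed = ("all", "x", ("all", "y", ("Q", "x", "y")))
print(free_vars(closed) == set())
```

Note that the sketch records only which variables are free, not which individual occurrences are free; in alpha the variable x has both a free and a bound occurrence.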

5.8. Theorem
Let α be a formula and let σ and τ be valuations that have the same
universe and agree on all the extralogical symbols and free variables of
α. Then α^σ = α^τ.

PROOF

By induction on deg α. We distinguish four cases.

Case 1: α is an atomic formula, say Pt₁t₂...tₙ. Then

    α^σ = ⊤ ⇔ (t₁^σ, t₂^σ, ..., tₙ^σ) ∈ P^σ    by BSD F1,
            ⇔ (t₁^σ, t₂^σ, ..., tₙ^σ) ∈ P^τ    by assumption,
            ⇔ (t₁^τ, t₂^τ, ..., tₙ^τ) ∈ P^τ    by Thm. 5.4,
            ⇔ α^τ = ⊤                          by BSD F1.

Case 2: α is a negation formula, say ¬β. Note that σ and τ agree on
the extralogical symbols and free variables of β, since they are the
same as those of α. Hence

    α^σ = ⊤ ⇔ β^σ = ⊥    by BSD F2,
            ⇔ β^τ = ⊥    by ind. hyp.,
            ⇔ α^τ = ⊤    by BSD F2.

Case 3: α is an implication formula. DIY.

Case 4: α is a universal formula, say ∀xβ. The valuations σ and τ
agree on all the extralogical symbols of β, because they are exactly
those of α. Every free variable of α is also free in β; but β may have
one additional free variable, namely x. Now, σ and τ need not agree
on x; but if u is any member of U then σ(x/u) and τ(x/u) do agree on
x as well: both assign to it the value u. Hence by the induction
hypothesis β^{σ(x/u)} = β^{τ(x/u)}. So

    α^σ = ⊤ ⇔ β^{σ(x/u)} = ⊤ for every u ∈ U    by BSD F4,
            ⇔ β^{τ(x/u)} = ⊤ for every u ∈ U    by ind. hyp.,
            ⇔ α^τ = ⊤                           by BSD F4. ∎

5.9. Remark
In particular, if α has no free variables (so that all occurrences of
variables in it, if any, are bound) and the valuations σ and τ are based
on the same structure then α^σ = α^τ.

5.10. Definition
A sentence is a formula without free variables. If φ is a sentence and 𝔘
is an ℒ-structure, and φ is satisfied by some (and hence, cf. Rem. 5.9,
by every) valuation based on 𝔘, then we say that φ holds (or is
satisfied) in 𝔘, and that 𝔘 is a model for φ, and write ‘𝔘 ⊨ φ’.
If 𝔘 ⊨ φ for every member φ of a set Σ of sentences, we say that 𝔘 is
a model for Σ.

5.11. Problem
Prove: ⊨ ∀x(α→β)→∀xα→∀xβ. (Use Prob. 4.12.)

5.12. Problem
Show that if x is not free in α then ∀xα ≡ α ≡ ∃xα.

5.13. Problem
Assuming that x is not free in β, show that

(i) ∀x(α∧β) ≡ ∀xα∧β,
(ii) ∃x(α∧β) ≡ ∃xα∧β;
(iii) ∀x(α∨β) ≡ ∀xα∨β,
(iv) ∃x(α∨β) ≡ ∃xα∨β;
(v) ∀x(α→β) ≡ ∃xα→β,
(vi) ∃x(α→β) ≡ ∀xα→β;
(vii) ∀x(β→α) ≡ β→∀xα,
(viii) ∃x(β→α) ≡ β→∃xα.

5.14. Problem
Construct a sentence α containing only logical symbols (that is, no
function symbols and no predicate symbols other than =) such that α
holds in a structure 𝔘 iff the domain U of 𝔘 has
(i) at least three members,
(ii) at most three members,
(iii) exactly three members.

5.15. Problem

Let ℒ be a language without = whose only extralogical symbol is a
binary predicate symbol P. Construct an ℒ-sentence α such that α has
no finite model (that is, α does not hold in any structure whose domain
is finite) and such that if U is any infinite set then there is a binary
relation P on U such that the ℒ-structure 𝔘 with domain U and with
P^𝔘 = P is a model for α. (In writing your solution, do not be tempted
to denote the predicate symbol of ℒ by anything other than ‘P’. Note
that any condition that you wish to impose on the interpretation P of P
must be written into α.)

§6. Substitution
Substitution is a purely syntactic operation: occurrences of a variable in
a given expression are replaced by [occurrences of] a term. Thus, three
ℒ-entities are involved: first, the expression in which the substitution is
made; second, the variable for which a term is substituted; and third,
the term which is substituted for occurrences of this variable. We start
with the straightforward case where the first mentioned entity, the
expression in which the substitution is made, is itself a term.
We denote by ‘s(x/t)’ (read: ‘s, with x replaced by t’) the result
obtained from the term s when all occurrences of the variable x in s are
simultaneously replaced by occurrences of the term t. In detail, s(x/t)
is defined by induction on deg s.

6.1. Definition
For any variable x and any term t,

(1a) x(x/t) = t;
(1b) y(x/t) = y for any variable y other than x;
(2) if s = fs₁s₂...sₙ, where f is an n-ary function symbol and s₁, s₂,
..., sₙ are terms, then s(x/t) = fs₁(x/t)s₂(x/t)...sₙ(x/t).

The most important fact about s(x/t) is its semantic behaviour: the way
its value under a valuation σ depends on s, x, t and σ.
We must not expect the value s(x/t)^σ to be the same as s^σ, because
s(x/t) and s are, in general, two different terms. However, note that in
the former term t occupies the same positions that x occupies in the
latter. Thus we ought to expect the value s(x/t)^σ to be the same as the
value of s not under σ itself, but under the valuation obtained from σ
by revaluing x and assigning to it the value that t has under σ (see Def.
4.5). Thus we ought to have:

(6.2) s(x/t)^σ = s^{σ(x/𝑡)}, where 𝑡 = t^σ.
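
The expectation expressed in (6.2) can be checked on a small instance. The following sketch is an illustration only and rests on assumptions not found in the text: a variable is encoded as a string, a term ft₁...tₙ as the tuple ("f", t1, ..., tn), the function symbol f is interpreted as addition on the integers, and the names subst_term and term_value are hypothetical.

```python
def subst_term(s, x, t):
    """The term s(x/t): all occurrences of the variable x in s replaced by t."""
    if isinstance(s, str):                       # s is a variable
        return t if s == x else s
    # s is f s1 ... sn: substitute in each argument
    return (s[0],) + tuple(subst_term(a, x, t) for a in s[1:])


def term_value(s, funcs, env):
    """Value of the term s; funcs interprets function symbols, env the variables."""
    if isinstance(s, str):
        return env[s]
    return funcs[s[0]](*(term_value(a, funcs, env) for a in s[1:]))


funcs = {"f": lambda a, b: a + b}                # assumed interpretation of f
env = {"x": 1, "y": 5}                           # assumed values of x and y

s = ("f", "x", "y")                              # the term fxy
t = ("f", "y", "y")                              # the term fyy

# Left-hand side of (6.2): the value of s(x/t) under the given valuation.
lhs = term_value(subst_term(s, "x", t), funcs, env)

# Right-hand side of (6.2): the value of s after x is revalued to the value of t.
t_val = term_value(t, funcs, env)
rhs = term_value(s, funcs, {**env, "x": t_val})

print(lhs == rhs)
```

A single instance of course proves nothing; it merely shows what the two sides of (6.2) compute.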

6.3. Remark
For purely typographical reasons, the printed form of (6.2) is a bit
more complicated than it need be. When writing this formula by hand,
there is no need to use ‘𝑡’ at all, because the ‘𝑡’ in the main part of the
formula can be replaced by ‘t^σ’. The form then taken by (6.2) is shown
here:

(6.2′) s(x/t)^σ = s^{σ(x/t^σ)}.

Unfortunately, this requires three levels of print and the third-floor
characters have to be smaller than ordinary small print. This is
technically difficult to typeset as well as hard on the eye. So in print we
use the verbose form (6.2); but in hand-written texts it is better to use
the more compact (6.2′). A similar remark applies also to (6.6) below.

6.4. Theorem
(6.2) holds for all s, x, t and σ.

PROOF

By induction on deg s. Three cases must be considered, corresponding
to the three clauses in Def. 6.1. Throughout, we put 𝑡 = t^σ.

Case 1a: s is x. Then s(x/t)^σ = x(x/t)^σ = t^σ, by Def. 6.1. On the other
hand, s^{σ(x/𝑡)} = x^{σ(x/𝑡)} = 𝑡, by Def. 4.5. So (6.2) holds in this case.

Case 1b: s is a variable y ≠ x. Then s(x/t)^σ = y(x/t)^σ = y^σ, by Def.
6.1. On the other hand, s^{σ(x/𝑡)} = y^{σ(x/𝑡)} = y^σ, by Def. 4.5. So (6.2)
holds also in this case.

Case 2: s is fs₁s₂...sₙ. Then

    s(x/t)^σ = [fs₁s₂...sₙ](x/t)^σ
             = [fs₁(x/t)s₂(x/t)...sₙ(x/t)]^σ                      by Def. 6.1,
             = f^σ(s₁(x/t)^σ, s₂(x/t)^σ, ..., sₙ(x/t)^σ)          by BSD T2,
             = f^{σ(x/𝑡)}(s₁(x/t)^σ, s₂(x/t)^σ, ..., sₙ(x/t)^σ)   by Def. 4.5,
             = f^{σ(x/𝑡)}(s₁^{σ(x/𝑡)}, s₂^{σ(x/𝑡)}, ..., sₙ^{σ(x/𝑡)})   by ind. hyp.,
             = [fs₁s₂...sₙ]^{σ(x/𝑡)}                              by BSD T2,
             = s^{σ(x/𝑡)}. ∎

6.5. Remark
Thm. 6.4 does not tell us anything unexpected about the semantic
effect of substitution; on the contrary, the result is what we anticipated.
The point of the theorem is that it confirms that Def. 6.1 was
correct, in the sense of ensuring the desired effect.
Let us turn to the case where a term t is to be substituted for a
variable x in a formula α. For reasons that should now be clear, we
must define the substitution in such a way that

(6.6) α(x/t)^σ = α^{σ(x/𝑡)}, where 𝑡 = t^σ.

Now however complications arise due to the different roles played by
free and bound occurrences of variables. Here we shall only outline the
way these complications are resolved. Full technical details can be
found in B&M, pp. 57-64.
First, it is clear that when substituting t for x in α, only free
occurrences of x in α should be replaced by t. Intuitively speaking, the
reason for this is that the truth value α^σ depends on the value x^σ only
through the free occurrences of x in α (see Thm. 5.8). Besides, if we
replace all occurrences of x by t, the result may not be a formula at all.
Indeed, if x has bound occurrences in α, then at least one of them must
immediately follow an occurrence of ∀. If such an occurrence of x is
replaced by t, then the result will not be a formula, unless t itself
happens to be a variable, because in a formula each ∀ must be
followed by a variable.
Can we therefore define α(x/t) as the result of replacing all free
occurrences of x in α by t? Unfortunately, this does not always work.
Cases where it fails to work are those in which some free occurrence of
x in α occurs within the scope of a y-quantifier, where the variable y
(which must of course be distinct from x) happens to occur in t. If we
then simply replace such an occurrence of x by t, the resulting
occurrence of y in the new formula so obtained will be captured: it
becomes bound by the y-quantifier. It turns out that when capturing
takes place, (6.6) may fail.
For example, let α be ∀y(x=y), where x and y are distinct. If we
were to define α(x/t) as ∀y(t=y) for arbitrary t, then taking t as y
itself we would get α(x/t) = ∀y(y=y). Note that the new (second)
occurrence of y got captured by a y-quantifier. But then (6.6) would
not always hold, because ∀y(y=y) is satisfied by every valuation (it is
logically true), whereas ∀y(x=y) is satisfied just by valuations whose
universe is a singleton.
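
The failure of (6.6) in this example can be observed concretely in a universe with two members. The sketch below is an illustration under assumptions of our own: formulas are restricted to equations between variables and universal formulas, encoded as the tuples ("=", u, v) and ("all", y, body); a valuation is a dictionary from variables to members of the universe; the names naive_subst and truth_value are hypothetical.

```python
def naive_subst(a, x, t):
    """Replace the free occurrences of x in a by the variable t, ignoring capture."""
    tag = a[0]
    if tag == "all":
        _, y, body = a
        # occurrences of x inside an x-quantifier are bound: leave them alone
        return a if y == x else ("all", y, naive_subst(body, x, t))
    if tag == "=":
        return ("=",) + tuple(t if v == x else v for v in a[1:])
    raise ValueError("only equations and universal formulas are handled")


def truth_value(a, env, universe):
    """Truth value of a under the valuation env with the given finite universe."""
    if a[0] == "all":
        _, y, body = a
        return all(truth_value(body, {**env, y: u}, universe) for u in universe)
    return env[a[1]] == env[a[2]]                # equation between two variables


universe = [0, 1]
env = {"x": 0, "y": 1}
alpha = ("all", "y", ("=", "x", "y"))            # the formula: for all y, x = y

captured = naive_subst(alpha, "x", "y")          # for all y, y = y
lhs = truth_value(captured, env, universe)

# Right-hand side of (6.6): alpha itself, with x revalued to the value of y.
rhs = truth_value(alpha, {**env, "x": env["y"]}, universe)

print(captured, lhs, rhs)
```

With these choices the two sides of (6.6) come out different, which is exactly the failure described above.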
Of course, this kind of complication, due to capturing, does not
always arise. Instead of defining α(x/t) outright for all α, x and t, we
proceed in stages. First, we confine ourselves to cases in which
capturing does not take place.

6.7. Definition
If no free occurrence of x in α is within the scope of a y-quantifier,
where y is a variable that occurs in t, then we say that t is free [to be
substituted] for x in α; and in this case we define α(x/t) as the result
obtained from α when all free occurrences of x in α are simultaneously
replaced by t. [For a more detailed version of this definition, proceeding
by induction on deg α, see B&M, p. 59f.]

It is now fairly easy to show that (6.6) holds in the special case where
α(x/t) has so far been defined.

6.8. Theorem
(6.6) holds whenever the term t is free for the variable x in the formula α.

PROOF
DIY or see B&M, p. 60f. ∎

6.9. Remark

There are two special cases where t is free for x in α. First, where t
contains no variable other than x. Def. 6.7 therefore applies in this
case. In particular, in the case where t is x itself, it is easy to see that
α(x/x) is just α, as it ought to be. The second special case is where t
contains no variable that occurs bound in α: in this case α does not
contain any y-quantifier where y occurs in t.

In order to define α(x/t) in the remaining case, where t is not free for
x in α, we must first modify the offending parts of α and make them
harmless. The trouble is caused by free occurrences of x in α that fall
within subformulas of α having the form ∀yβ, where y is a variable
that occurs in t. In order to make the substitution work, so that (6.6) is
ensured, such subformulas of α must first be replaced by logically
equivalent ones that use a harmless variable, say z, instead of y. This
motivates the following

6.10. Definition
If z is a variable that does not occur free in β but is free for y in β, we
say that the formula ∀z[β(y/z)] arises from the formula ∀yβ by
[correct] alphabetic change [of variable of quantification].

6.11. Remarks
(i) The reason for requiring that z be free for y in β is that
otherwise the substitution β(y/z) is not defined as yet. The reason
for requiring that z has no free occurrences in β is that otherwise
the formulas ∀z[β(y/z)] and ∀yβ may not be logically equivalent.
For example, let β be y=z, where y and z are distinct variables.
It is easy to see that ∀z(z=z) and ∀y(y=z) are not logically
equivalent: the former is logically true, whereas the latter is
satisfied by a valuation σ iff the universe of σ is a singleton.
(ii) If z does not occur at all in β, then z clearly fulfils the conditions
in Def. 6.10.
(iii) It is not difficult to show that the operation of alphabetic change
is reversible; in other words, if ∀z[β(y/z)] arises from ∀yβ by an
alphabetic change, then the latter formula can be retrieved from
the former by an alphabetic change (see B&M, p. 61).

6.12. Theorem
If ∀z[β(y/z)] arises from ∀yβ by alphabetic change then these two
formulas are logically equivalent.

PROOF
DIY or see B&M, p. 61. ∎

6.13. Definition
(i) We say that a formula γ is obtained from a formula α by an
alphabetic step if α has a subformula of the form ∀yβ and γ
results from α when one occurrence of ∀yβ is replaced by a
formula ∀z[β(y/z)] that arises from it by alphabetic change.
(ii) We say that α′ is a variant of α, and write ‘α ~ α′’, if α′ can be
obtained from α by a finite number of alphabetic steps.

6.14. Remarks
(i) The relation ~ is easily seen to be an equivalence relation. It is
reflexive: α ~ α always holds because α is obtained from itself by
0 alphabetic steps. It is symmetric: if α ~ α′ then also α′ ~ α
because alphabetic changes, and hence also alphabetic steps, are
reversible. Finally, it is clearly transitive: if α ~ α′ and α′ ~ α″
then also α ~ α″.
(ii) By Thm. 6.12 and Rem. 4.13, if α ~ α′ then α ≡ α′.

We can now define the substitution α(x/t) in full generality.

6.15. Definition
Let a variable x and a term t be given. For any formula α, we select a
formula α′ such that if t is free for x in α, then α′ is α itself; but if t is
not free for x in α, then α′ is a variant of α in which t is free for x.
Thus α′(x/t) is already defined in Def. 6.7. We now define α(x/t) to be
the same as α′(x/t). [For details see B&M, p. 63. If t is not free for x in
α, then it does not really matter which variant of α is selected to be α′,
so long as t is free for x in α′. But a definition must be unambiguous,
so a particular variant α′ must be selected. This is done by induction
on deg α. The gist of the choice is that each offending subformula ∀yβ
of α is replaced by ∀z[β(y/z)], where z is the first variable in the
alphabetic list of ℒ-variables (that is, the vᵢ with the least i) such that
this is a correct alphabetic change and such that z does not occur in t.]
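
The idea behind Def. 6.15 can be sketched as a recursive procedure that performs an alphabetic change whenever it meets an offending quantifier. The sketch is not the definition in B&M: it uses an assumed tuple encoding (variables as strings, terms as ("f", t1, ..., tn), formulas tagged "not", "imp", "all" or atomic), hypothetical function names, and a simplified choice of the new variable, namely the first of v1, v2, ... that occurs neither in the subformula nor in t (such a variable fulfils the conditions of Def. 6.10 by Rem. 6.11(ii)).

```python
from itertools import count


def term_vars(t):
    """Set of variables occurring in the term t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))


def subst_term(s, x, t):
    """The term s(x/t)."""
    if isinstance(s, str):
        return t if s == x else s
    return (s[0],) + tuple(subst_term(a, x, t) for a in s[1:])


def all_vars(a):
    """All variables occurring in the formula a, free or bound."""
    tag = a[0]
    if tag == "not":
        return all_vars(a[1])
    if tag == "imp":
        return all_vars(a[1]) | all_vars(a[2])
    if tag == "all":
        return {a[1]} | all_vars(a[2])
    return set().union(*(term_vars(t) for t in a[1:]))


def free_vars(a):
    """Variables that have a free occurrence in the formula a."""
    tag = a[0]
    if tag == "not":
        return free_vars(a[1])
    if tag == "imp":
        return free_vars(a[1]) | free_vars(a[2])
    if tag == "all":
        return free_vars(a[2]) - {a[1]}
    return set().union(*(term_vars(t) for t in a[1:]))


def fresh(avoid):
    """First variable among v1, v2, ... that is not in the set avoid."""
    return next(f"v{i}" for i in count(1) if f"v{i}" not in avoid)


def subst(a, x, t):
    """The formula a(x/t), renaming quantified variables to avoid capture."""
    tag = a[0]
    if tag == "not":
        return ("not", subst(a[1], x, t))
    if tag == "imp":
        return ("imp", subst(a[1], x, t), subst(a[2], x, t))
    if tag == "all":
        _, y, body = a
        if y == x or x not in free_vars(body):
            return a                             # no free occurrence of x in a
        if y in term_vars(t):                    # offending quantifier
            z = fresh(all_vars(body) | term_vars(t) | {x})
            body, y = subst(body, y, z), z       # alphabetic change from y to z
        return ("all", y, subst(body, x, t))
    return (tag,) + tuple(subst_term(s, x, t) for s in a[1:])


alpha = ("all", "y", ("=", "x", "y"))            # for all y, x = y
print(subst(alpha, "x", "y"))                    # y is not free for x in alpha
print(subst(alpha, "x", ("f", "z")))             # fz is free for x in alpha
```

In the first call the quantified variable is renamed before the substitution is made, so that the substituted y is not captured; in the second call no renaming is needed and the substitution agrees with Def. 6.7.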

6.16. Problem
Show that (6.6) holds for all α, x, t and σ.

§7. Hintikka sets


We shall introduce first-order Hintikka sets because, as in the proposi-
tional case, it is relatively easy to prove that such sets are satisfiable. It
will follow that any set included in a Hintikka set is also satisfiable.
This will come in handy later on, when we shall want to prove the
appropriate completeness theorem.

7.1. Definition
A [first-order] Hintikka set [in ℒ] is a set Φ of ℒ-formulas satisfying
the following nine conditions:

(1) If α is any atomic formula such that α ∈ Φ, then ¬α ∉ Φ.
(2) If α is any formula such that ¬¬α ∈ Φ, then also α ∈ Φ.
(3) If α and β are any formulas such that α→β ∈ Φ, then ¬α ∈ Φ or
β ∈ Φ.
(4) If α and β are any formulas such that ¬(α→β) ∈ Φ, then α ∈ Φ
and ¬β ∈ Φ.
(5) If α is any formula and x is any variable such that ∀xα ∈ Φ, then
α(x/t) ∈ Φ for every ℒ-term t.
(6) If α is any formula and x is any variable such that ¬∀xα ∈ Φ,
then ¬α(x/t) ∈ Φ for some ℒ-term t.
(7) If ℒ is a language with equality, then t=t ∈ Φ for every ℒ-term t.
(8) If n ≥ 1 and s₁, s₂, ..., sₙ and t₁, t₂, ..., tₙ are any 2n ℒ-terms
such that for each i = 1, 2, ..., n the equation sᵢ=tᵢ is in Φ,
then it follows that for every n-ary function symbol f of ℒ the
equation fs₁s₂...sₙ=ft₁t₂...tₙ is also in Φ.
(9) If n ≥ 1 and s₁, s₂, ..., sₙ and t₁, t₂, ..., tₙ are any 2n ℒ-terms
such that for each i = 1, 2, ..., n the equation sᵢ=tᵢ is in Φ, and
if P is any n-ary predicate symbol such that the atomic formula
Ps₁s₂...sₙ is in Φ, then the formula Pt₁t₂...tₙ is also in Φ.

7.2. Remarks
(i) Conditions (8) and (9) of the definition are vacuous if ℒ is a
language without equality. The reason for excluding the case
n = 0 in (8) is that for n = 0 this condition would have reduced to
requiring that if c is any individual constant of ℒ then c=c ∈ Φ,
which is already covered by condition (7).
(ii) Condition (9) applies in particular to the case where n = 2 and P
is = itself. In this special case the condition says that if s₁, s₂, t₁
and t₂ are any four terms such that the three equations s₁=t₁,
s₂=t₂ and s₁=s₂ are in Φ, then the equation t₁=t₂ is also in Φ.
Fig. 2 can be used as a mnemonic for this statement. The four
terms are represented by the four corners of the square; the three
equations assumed to belong to Φ are represented by the three
solid sides, reading from top to bottom and from left to right; and
the fourth equation, which is then required to belong to Φ, is
represented by the dotted side, again reading from left to right.

[Fig. 2: a square with corners s₁ (top left), s₂ (top right), t₁ (bottom
left) and t₂ (bottom right); the left, top and right sides are solid and
the bottom side is dotted.]

For the rest of this section, we let Φ be a fixed but arbitrary Hintikka
set. We shall refer to the nine conditions of Def. 7.1 simply as ‘(1)’,
‘(2)’ and so on.
Our aim is to prove that Φ is satisfiable. We shall define a particular
valuation σ and show that σ ⊨ Φ. In order to define σ, we must specify
its various ingredients: first, we must specify its universe U; next, for
each variable x we shall have to specify its value x^σ, which must of
course be a member of U; then, for each function symbol f we must
specify the corresponding operation f^σ on U; finally, for each
extralogical predicate symbol P we have to specify the corresponding
relation P^σ on U. (As for the logical predicate symbol =, if it is
present in ℒ, we have no choice: =^σ has to be the diagonal relation on
U.)
Of all the ingredients of σ, the first (the universe U) turns out to
require most work. Once U has been properly set up, the rest will
follow quite smoothly. The nature of the members of U (that is, what
‘stuff’ they are made of) is clearly of no importance; what is vital is that
for each term t there should be a member of U to serve as the value t^σ.
In general, the universe of a valuation may have members that do not
serve as the value of any term under that valuation; but in the present
case Occam’s razor turns out to be useful. So we shall define an object
[t] for each term t and (even before deciding what [t] is to be) we put

7.3. Definition
U =_df {[t] : t is an ℒ-term}.

Our plan is to define σ in such a way that t^σ = [t] for every term t. As
we have said, the nature of [t] is unimportant; but we must decide
whether distinct terms are to have distinct values; in other words, if s
and t are distinct, should [s] and [t] also be distinct? The simplest
choice is to answer this question in the affirmative. The good news is
that if ℒ is without equality then this simplest choice actually works.
The bad news is that it does not work if ℒ has equality. The snag is
that in this case Φ may contain equations s=t, where s and t are
distinct terms. If σ is to satisfy Φ, it must in particular satisfy these
equations, which (by the BSD F1) means that s^σ and t^σ must be the
same. As we intend these values to be [s] and [t] respectively, we are
forced to allow [s] and [t] to be equal whenever s=t ∈ Φ, even though
s and t may be distinct. This motivates the following definition of the
relation E between terms:

7.4. Definition
The relation E holds between two terms s and t (briefly, sEt) if
either ℒ is without equality and s is the same as t, or ℒ has equality
and the equation s=t is in Φ.

7.5. Lemma
E is an equivalence relation: it is reflexive, symmetric and transitive.

PROOF

The case where ℒ is without equality is trivial. Now suppose that ℒ
does have equality.
The reflexivity of E follows at once from (7).
To prove that E is symmetric, assume that s=t ∈ Φ. We must show
that also t=s ∈ Φ. We shall make use of Rem. 7.2(ii). Choosing s₁, s₂,
t₁ and t₂ as s, s, t and s respectively, we get the configuration shown in
Fig. 3.
The equation s=t (left side of the square) belongs to Φ by assumption;
and the equation s=s (right and top sides) belongs to Φ by (7).
Hence by Rem. 7.2(ii) the equation t=s (bottom side) must also be in
Φ. So E is symmetric.
To prove that E is transitive, assume that r=s and s=t are in Φ. We
must show that also r=t ∈ Φ. Again, we use Rem. 7.2(ii). This time
we choose s₁, s₂, t₁ and t₂ as r, s, r and t respectively, and obtain the
configuration shown in Fig. 4. The equation r=r (left side of the
square) is in Φ by (7); and the equations r=s and s=t (top and right
sides) are in Φ by assumption. Hence also the equation r=t (bottom
side) is in Φ. So E is transitive. ∎

7.6. Definition
For each term t, we define [t] as the E-class of t (see Def. 2.3.4).
Thus,

[t] =_df [t]_E = {s : sEt}.

7.7. Remarks
(i) If ℒ is without equality, then [t] is simply {t}, so that if s and t
are distinct terms then [s] and [t] are also distinct. If ℒ does have
equality, then [t] is a class of terms that may have several (indeed
even infinitely many) members.
(ii) Recall that by Thm. 2.3.5, [s] = [t] iff sEt. Also, by Cor. 2.3.6,
each term belongs to a unique E-class.
(iii) The class of all ℒ-strings is a set by Thm. 6.3.9. Hence by AS the
class T of all terms is also a set. For each t, [t] is a subset of T
and so, by Def. 7.3, U ⊆ PT. Thus U is a set by AP and AS.

Our intention was to have t^σ = [t] for every term t. For the particular
case where t is a variable we are free to decree this as part of the
specification of σ.

7.8. Definition
We put x^σ = [x] for each variable x.

Next, for each n-ary function symbol f we must define the n-ary
operation on U that is to serve as f^σ. To define f^σ, we must specify, for
each n-tuple of members of U, the member of U produced by
applying f^σ to that n-tuple. Take n arbitrary members of U; by Def.
7.3 they are of the form [t₁], [t₂], ..., [tₙ], where t₁, t₂, ..., tₙ are
terms. We must specify a member of U as f^σ([t₁], [t₂], ..., [tₙ]). This
individual (again by Def. 7.3) must have the form [t] where t is some
term. How shall we choose this t? Clearly, t must involve f and t₁, t₂,
..., tₙ. So an obvious choice is

7.9. Definition
If f is any n-ary function symbol and t₁, t₂, ..., tₙ are any terms,

f^σ([t₁], [t₂], ..., [tₙ]) =_df [ft₁t₂...tₙ].

7.10. Legitimation
If n > 0 (in which case, as stipulated in Sp. 1.1, ℒ must have equality)
then this definition needs to be legitimized. The point is that one and
the same member of U may be represented in more than one way: r
and s may be distinct terms such that the object [r] is the same as [s].
However, the definiendum f^σ([t₁], [t₂], ..., [tₙ]) must depend only on
the objects [t₁], [t₂], ..., [tₙ] and not on the particular terms t₁, t₂,
..., tₙ that happen to represent them. So we have to prove that the
definiens [ft₁t₂...tₙ] depends only on the objects [t₁], [t₂], ..., [tₙ]
rather than on the particular terms t₁, t₂, ..., tₙ used to represent
them. We must therefore show that if [sᵢ] = [tᵢ] for i = 1, 2, ..., n,
then also

[fs₁s₂...sₙ] = [ft₁t₂...tₙ].

This is easily done. Indeed, if [sᵢ] = [tᵢ] for i = 1, 2, ..., n, then by
Rem. 7.7(ii) for each i the equation sᵢ=tᵢ is in Φ. So by (8) the equation
fs₁s₂...sₙ=ft₁t₂...tₙ is also in Φ and, again by Rem. 7.7(ii),
[fs₁s₂...sₙ] = [ft₁t₂...tₙ].

We have not completed our definition of σ: we still have to specify the
relations P^σ. But we are already in a position to prove

7.11. Lemma
t^σ = [t] for every term t.

PROOF
We proceed by induction on deg t. The case where t is a variable is
covered by Def. 7.8. Now let t be ft₁t₂...tₙ. Then

    t^σ = (ft₁t₂...tₙ)^σ = f^σ(t₁^σ, t₂^σ, ..., tₙ^σ)    by BSD T2,
        = f^σ([t₁], [t₂], ..., [tₙ])                    by ind. hyp.,
        = [ft₁t₂...tₙ]                                  by Def. 7.9,
        = [t]. ∎

To complete the definition of σ, we have to define for each extralogical
n-ary predicate symbol P an n-ary relation P^σ on U; that is, P^σ must
be defined as a subset of Uⁿ. To do this, we have to specify, for any
objects [t₁], [t₂], ..., [tₙ], whether the n-tuple ([t₁], [t₂], ..., [tₙ])
is to belong to P^σ. How are we to do this? Note that as we have
just proved, ([t₁], [t₂], ..., [tₙ]) is (t₁^σ, t₂^σ, ..., tₙ^σ). Now, by the
BSD F1, the atomic formula Pt₁t₂...tₙ is going to be satisfied by σ iff
(t₁^σ, t₂^σ, ..., tₙ^σ) ∈ P^σ. But remember what σ is for: it is supposed to
satisfy Φ. Therefore, if Pt₁t₂...tₙ is in Φ we would like the n-tuple
(t₁^σ, t₂^σ, ..., tₙ^σ) to be in P^σ. This suggests

7.12. Definition
If P is any n-ary extralogical predicate symbol, then P^σ is defined to be
the subset of Uⁿ such that for any n terms t₁, t₂, ..., tₙ,

([t₁], [t₂], ..., [tₙ]) ∈ P^σ ⇔ Pt₁t₂...tₙ ∈ Φ.

7.13. Legitimation
This definition too needs legitimation. We must make sure that
whether or not ([t₁], [t₂], ..., [tₙ]) ∈ P^σ holds depends on the objects
[t₁], [t₂], ..., [tₙ] rather than on the terms that happen to represent
them. In other words, it must be proved that if [sᵢ] = [tᵢ] for i = 1, 2,
..., n, then

Ps₁s₂...sₙ ∈ Φ ⇔ Pt₁t₂...tₙ ∈ Φ.

This is easy. DIY, using (9). ∎

7.14. Remark
As mentioned before, if ℒ has equality we have no choice as to the
relation =^σ; we must put, for all terms s and t,

([s], [t]) ∈ =^σ ⇔ [s] = [t].

But by Rem. 7.7(ii) this amounts to

([s], [t]) ∈ =^σ ⇔ s=t ∈ Φ.

This means that Def. 7.12 extends automatically also to the logical
predicate symbol =.

Having completed the definition of σ, we can prove

7.15. Theorem
For any formula φ,

(a) φ ∈ Φ ⇒ φ^σ = ⊤,    (b) ¬φ ∈ Φ ⇒ φ^σ = ⊥.

PROOF
We shall prove this double claim simultaneously by induction on deg φ.
We distinguish four cases, corresponding to the clauses of Def. 1.7.

Case 1: φ is atomic; say φ = Pt₁t₂...tₙ.

(1a) φ ∈ Φ ⇒ Pt₁t₂...tₙ ∈ Φ
           ⇒ ([t₁], [t₂], ..., [tₙ]) ∈ P^σ    by Def. 7.12 and Rem. 7.14,
           ⇒ (t₁^σ, t₂^σ, ..., tₙ^σ) ∈ P^σ    by Lemma 7.11,
           ⇒ (Pt₁t₂...tₙ)^σ = ⊤              by BSD F1,
           ⇒ φ^σ = ⊤.

(1b) ¬φ ∈ Φ ⇒ φ ∉ Φ                          by (1),
            ⇒ Pt₁t₂...tₙ ∉ Φ
            ⇒ ([t₁], [t₂], ..., [tₙ]) ∉ P^σ   by Def. 7.12 and Rem. 7.14,
            ⇒ (t₁^σ, t₂^σ, ..., tₙ^σ) ∉ P^σ   by Lemma 7.11,
            ⇒ (Pt₁t₂...tₙ)^σ = ⊥             by BSD F1,
            ⇒ φ^σ = ⊥.

Case 2: φ is a negation formula. Similar to Case 2 in the proof of the
corresponding theorem for propositional Hintikka sets (Ch. 7).

Case 3: φ is an implication formula. Similar to Case 3 in the proof of
the same theorem.

Case 4: φ is a universal formula; say φ = ∀xα.

(4a) φ ∈ Φ ⇒ ∀xα ∈ Φ
           ⇒ α(x/t) ∈ Φ for every term t                       by (5),
           ⇒ α(x/t)^σ = ⊤ for every term t                     by ind. hyp.,
           ⇒ α^{σ(x/𝑡)} = ⊤ (where 𝑡 = t^σ) for every term t   by Prob. 6.16,
           ⇒ α^{σ(x/[t])} = ⊤ for every term t                 by Lemma 7.11,
           ⇒ α^{σ(x/u)} = ⊤ for every u ∈ U                    by Def. 7.3,
           ⇒ (∀xα)^σ = ⊤                                       by BSD F4,
           ⇒ φ^σ = ⊤.

(4b) ¬φ ∈ Φ ⇒ ¬∀xα ∈ Φ
            ⇒ ¬α(x/t) ∈ Φ for some term t                      by (6),
            ⇒ α(x/t)^σ = ⊥ for some term t                     by ind. hyp.,
            ⇒ α^{σ(x/𝑡)} = ⊥ (where 𝑡 = t^σ) for some term t   by Prob. 6.16,
            ⇒ α^{σ(x/[t])} = ⊥ for some term t                 by Lemma 7.11,
            ⇒ α^{σ(x/u)} = ⊥ for some u ∈ U                    by Def. 7.3,
            ⇒ (∀xα)^σ = ⊥                                      by BSD F4,
            ⇒ φ^σ = ⊥. ∎

We have thus shown that the valuation σ (specified by Defs. 7.3, 7.6,
7.8, 7.9 and 7.12) satisfies the Hintikka set Φ. We shall now obtain an
upper bound for the cardinality of the universe of σ.

7.16. Definition
The cardinality of the set of all primitive symbols of ℒ is called the
cardinality of ℒ and denoted by ‘‖ℒ‖’.

7.17. Theorem
Given a Hintikka set Φ in ℒ, we can define an ℒ-valuation σ such that
the cardinality of the universe of σ is at most ‖ℒ‖ and such that σ ⊨ Φ.

PROOF
Take o as the valuation specified above. By AC, there exists a choice
function on the universe U of o: a function that selects a single term in
each E-class of terms. Since by Rem. 7.7(ii) distinct E-classes are
disjoint, the choice function is an injection from U to the set of all
ℒ-strings, whose cardinality, by Thm. 6.3.9, is exactly ‖ℒ‖. ∎

§8. Prenex formulas; parity


8.1 Definition
(i) A formula is said to be prenex if it is of the form

Q₁x₁Q₂x₂ … Qₖxₖβ,

where k ≥ 0 and, for each i, Qᵢ is either ∀ or ∃, and β is
quantifier-free (that is, contains no quantifiers). In this
connection the string Q₁x₁Q₂x₂ … Qₖxₖ is called the prefix and β the
matrix. If moreover the variables x₁, x₂, …, xₖ in the prefix are
distinct and all of them are free in the matrix β, then the formula
is said to be prenex normal.
(ii) A prenex normal form for a formula α is a prenex normal
formula logically equivalent to α.
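
The two conditions of Def. 8.1 are purely syntactic, so they can be checked mechanically. The following Python sketch is only an illustration and not part of the formal development. It assumes a hypothetical encoding of formulas as nested tuples, `("atom", P, args)`, `("not", β)`, `("imp", β, γ)`, `("all", x, β)` and `("ex", x, β)`, and it assumes for simplicity that all terms are variables written as strings.

```python
def split_prefix(formula):
    """Split a formula into its leading quantifier prefix and the remainder."""
    prefix = []
    while formula[0] in ("all", "ex"):
        prefix.append((formula[0], formula[1]))
        formula = formula[2]
    return prefix, formula


def quantifier_free(formula):
    """True iff the formula contains no quantifiers."""
    tag = formula[0]
    if tag == "atom":
        return True
    if tag == "not":
        return quantifier_free(formula[1])
    if tag == "imp":
        return quantifier_free(formula[1]) and quantifier_free(formula[2])
    return False  # tag is "all" or "ex"


def variables_of(matrix):
    """Variables of a quantifier-free formula (all of them occur free)."""
    tag = matrix[0]
    if tag == "atom":
        return set(matrix[2])
    if tag == "not":
        return variables_of(matrix[1])
    return variables_of(matrix[1]) | variables_of(matrix[2])  # tag is "imp"


def is_prenex(formula):
    """Prefix of quantifiers followed by a quantifier-free matrix."""
    _, matrix = split_prefix(formula)
    return quantifier_free(matrix)


def is_prenex_normal(formula):
    """Prenex, with distinct prefix variables that are all free in the matrix."""
    prefix, matrix = split_prefix(formula)
    if not quantifier_free(matrix):
        return False
    bound = [x for _, x in prefix]
    return len(set(bound)) == len(bound) and set(bound) <= variables_of(matrix)


Pxy = ("atom", "P", ("x", "y"))
print(is_prenex_normal(("all", "x", ("ex", "y", Pxy))))   # True
print(is_prenex_normal(("all", "x", ("all", "x", Pxy))))  # False: x is repeated
print(is_prenex(("not", ("all", "x", Pxy))))              # False: quantifier under a negation
```

Note that a quantifier-free formula counts as prenex normal here, corresponding to the case of an empty prefix.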

8.2. Problem
(i) Let φ be a formula containing n + 1 occurrences of ∀. Show how
to find a formula of the form Qxψ (where Q is ∀ or ∃ and ψ
contains only n occurrences of ∀) which is logically equivalent to
φ. (Proceed by [strong] induction on deg φ. In the case where φ is
α→β, we may assume, by the induction hypothesis, that φ is
logically equivalent to a formula of the form Qxγ→β or α→Qyδ,
and by alphabetic change we can arrange that x is not free in β
and y is not free in α. Then use Prob. 5.13(v)–(viii).)
(ii) Hence show how to obtain a prenex normal form for any given
formula.

8.3. Definition
By induction on deg α, we assign to each formula α a parity pr α,
which is either 0 or 1, as follows:

(1) If α is atomic, then pr α = 0.
(2) If α = ¬β, then pr α = 1 − pr β.
(3) If α = β→γ, then pr α = (1 − pr β)·pr γ.
(4) If α = ∀xβ, then pr α = pr β.

We say that α is even or odd according as pr α is 0 or 1.
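
The four clauses of this definition translate directly into a recursive function. The sketch below is an illustration under an assumed encoding, not something given in the text: formulas are nested tuples tagged `"atom"`, `"not"`, `"imp"` and `"all"`, and the name `parity` is hypothetical.

```python
def parity(formula):
    """Return the parity (0 = even, 1 = odd) of a formula, following Def. 8.3."""
    tag = formula[0]
    if tag == "atom":                      # clause (1)
        return 0
    if tag == "not":                       # clause (2)
        return 1 - parity(formula[1])
    if tag == "imp":                       # clause (3)
        return (1 - parity(formula[1])) * parity(formula[2])
    if tag == "all":                       # clause (4): formula = ("all", x, body)
        return parity(formula[2])
    raise ValueError(f"unknown formula tag: {tag!r}")


P = ("atom", "P", ("x",))
Q = ("atom", "Q", ("y",))

print(parity(P))                                      # 0
print(parity(("not", P)))                             # 1
print(parity(("imp", P, ("not", Q))))                 # 1
print(parity(("all", "x", ("imp", ("not", P), Q))))   # 0
```

In effect the parity treats every atomic formula as true and ignores quantifiers, with 0 playing the role of truth.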

8.4. Problem
(i) Show that the set of all even formulas is a Hintikka set, and
hence is satisfiable.
(ii) Without using (i), define directly a valuation σ such that σ ⊨ α iff
α is even. (Take the universe of σ to be a singleton.)

§9. The first-order predicate calculus


We designate as first-order axioms all ℒ-formulas of the following
eight groups:

9.1. Axiom group 1

All propositional axioms (7.6.3–7.6.7).

9.2. Axiom group 2

∀x(α→β)→∀xα→∀xβ, for any formulas α and β and any variable x.

9.3. Axiom group 3

α→∀xα, for any formula α and any variable x that is not free in α.

9.4. Axiom group 4

∀xα→α(x/t), for any formula α, variable x and term t.

9.5. Axiom group 5

t=t, for any term t.

9.6. Axiom group 6

s₁=t₁→s₂=t₂→⋯→sₙ=tₙ→fs₁s₂…sₙ=ft₁t₂…tₙ,

for any n ≥ 1, any 2n terms s₁, s₂, …, sₙ, t₁, t₂, …, tₙ and any
n-ary function symbol f.

9.7. Axiom group 7

s₁=t₁→s₂=t₂→⋯→sₙ=tₙ→Ps₁s₂…sₙ→Pt₁t₂…tₙ,

for any n ≥ 1, any 2n terms s₁, s₂, …, sₙ, t₁, t₂, …, tₙ and any
n-ary predicate symbol P.

9.8. Axiom group 8

∀x₁∀x₂…∀xₖα, for any k ≥ 1, any variables x₁, x₂, …, xₖ (not
necessarily distinct) and any ℒ-formula α belonging to any of the
preceding axiom groups.

9.9. Remarks
(i) Six of the eight groups of axioms are given by means of schemes;
but the first and last groups are miscellanies. We shall refer to
these eight groups of axioms briefly as ‘Ax. 1’, ‘Ax. 2’ and so on.
(ii) If ℒ is without equality then Ax. 5, 6 and 7 are vacuous, because
then there are no such ℒ-formulas.
(iii) In Ax. 7, P can be the equality symbol =. In this case n = 2 and
we obtain the axiom scheme

s₁=t₁→s₂=t₂→s₁=s₂→t₁=t₂.

Fig. 2 of Rem. 7.2(ii) can be used here too as a mnemonic, with
the proviso that the equations are to be read off the square in the
order: left side, right side, top, bottom.

9.10. Definition
(i) The [classical] first-order predicate calculus [in ℒ] (briefly,
Fopcal) is the linear calculus based on the first-order axioms
listed above, and on modus ponens as sole rule of inference.
(ii) First-order deduction is defined in the same way as propositional
deduction (Def. 7.6.8), except that ‘propositional axiom’ is
replaced by ‘first-order axiom’.
(iii) We use ‘⊢’ to denote first-order deducibility (that is, deducibility
in Fopcal) in the same way as ‘⊢₀’ was used to denote
propositional deducibility.
(iv) All terminological and notational definitions and conventions laid
down in §§6–8 and §12 of Ch. 7 in connection with ⊢₀ and
Propcal are hereby adopted, mutatis mutandis, in connection with
⊢ and Fopcal.

9.11. Theorem
The Cut Rule, the Deduction Theorem, the Inconsistency Effect, reductio
ad absurdum and the Principle of Indirect Proof hold for Fopcal. ∎

9.12. Remark
In B&M a similar system of axioms is used, but Ax. 4 is subject to the
proviso that t be free for x in a. The two versions of Fopcal are
equivalent; the B&M version is more economical, whereas the present
one is a bit more user-friendly.

9.13. Warning
Versions of the classical Fopcal found in the literature fall into two
groups. One group consists of strong versions that are equivalent to
ours. The other group consists of weak versions that are equivalent to
each other, but not to ours. To describe the relationship between the
two groups, let us denote by ‘⊢′’ the relation of deducibility in a weak
version of Fopcal. The following four facts must be noted.

(i) Whenever Φ ⊢ α then also Φ ⊢′ α, but the converse does not
always hold; it is in this sense that ⊢ is stronger than ⊢′.
(ii) For any set Φ of formulas, let Φ′ be a set of sentences obtained
from Φ upon replacing each φ ∈ Φ by ∀x₁∀x₂…∀xₖφ, where
x₁, x₂, …, xₖ are the free variables of φ. Then Φ ⊢′ α iff
Φ′ ⊢ α.
(iii) While DT holds for ⊢ outright (see Thm. 9.11), only a restricted
version of it, subject to certain conditions, holds for ⊢′.
(iv) An unrestricted rule of generalization holds for ⊢′: if Φ ⊢′ α then
also Φ ⊢′ ∀xα, where x is any variable. For ⊢ only a restricted
version of this rule holds, as we shall see.

9.14. Theorem (Semantic soundness of Fopcal)


If Φ ⊢ α then also Φ ⊨ α. In particular, if ⊢α then also ⊨α.

PROOF
Similar to the proof of the soundness of the propositional calculus
(Thm. 7.6.12), except that now it needs to be verified that all
first-order axioms are logically valid. This is straightforward; DIY. ∎

9.15. Theorem
If Φ ⊢₀ α then also Φ ⊢ α. In particular, if ⊢₀α then also ⊢α. ∎

9.16. Problem
Prove that ⊢ α(x/t)→∃xα.

9.17. Problem
Prove that ⊢ ∃x(t=x), provided x does not occur in t. Point out where
you use the assumption about x and t.

§10. Rules of instantiation and generalization


10.1. Theorem (Rule of Universal Instantiation)
If Φ ⊢ ∀xα then Φ ⊢ α(x/t) for any term t. ∎

10.2. Remarks
(i) For brevity we shall refer to this rule as ‘UI’.
(ii) Clearly, UI holds for any linear calculus with modus ponens as a
rule of inference and all formulas of the form ∀xα→α(x/t) as
theorems.
(iii) The only purpose of adopting Ax. 4 was to enable us to establish
UI. Now that we have done so, Ax. 4 need not be invoked again.
Indeed, it is easy to see that any calculus for which UI and DT
hold has all formulas of the form ∀xα→α(x/t) as theorems.
(iv) Closely related to UI is the Rule of Existential Generalization
(briefly, EG): If Φ ⊢ α(x/t) for some term t, then Φ ⊢ ∃xα. This
rule follows at once from Prob. 9.16.

10.3. Definition
A variable is said to be free in a set (or a sequence) of formulas, if that
variable is free in some formula belonging to the set (or the sequence).

10.4. Theorem
Given a deduction D of a formula α from a set Φ of hypotheses, if x is
a variable that is not free in Φ then we can construct a deduction D′ of
∀xα from Φ such that x is not free in D′ and every variable free in D′
is free in D as well.

PROOF
Let D be φ₁, φ₂, …, φₙ; so φₙ = α. We shall show by induction on k
(k = 1, 2, …, n) how to construct a deduction Dₖ of ∀xφₖ from Φ,
such that x is not free in Dₖ, and every variable free in Dₖ is free also
in D. Then we can take Dₙ as the required D′.

Case 1: φₖ is an axiom of Fopcal. Then ∀xφₖ is likewise an axiom
(Ax. 8) and we can take Dₖ as this formula by itself.

Case 2: φₖ ∈ Φ. Then by assumption x is not free in φₖ, and we can
take Dₖ to be

φₖ,            (hyp.)
φₖ→∀xφₖ,       (Ax. 3)
∀xφₖ.          (m.p.)

Case 3: φₖ is obtained by modus ponens from two earlier formulas in
D. Then there are i, j < k such that φⱼ = φᵢ→φₖ. By the induction
hypothesis, we already possess deductions with the required
properties, Dᵢ and Dⱼ of ∀xφᵢ and ∀x(φᵢ→φₖ) respectively. It is now enough
to show that from these two formulas the formula ∀xφₖ can be
deduced by means of a deduction in which x is not free and whose free
variables are all included among those of D. Here is such a deduction:

∀xφᵢ,                        (hyp.)
∀x(φᵢ→φₖ),                   (hyp.)
∀x(φᵢ→φₖ)→∀xφᵢ→∀xφₖ,         (Ax. 2)
∀xφᵢ→∀xφₖ,                   (m.p.)
∀xφₖ.                        (m.p.)
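
The proof is constructive, and its three cases can be read as a procedure. The following Python sketch is one possible rendering under stated assumptions rather than the text's own formulation: formulas are nested tuples tagged `"atom"`, `"imp"` and `"all"`; each step of the given deduction carries a hypothetical justification tag, `("ax",)`, `("hyp",)` or `("mp", i, j)`, where steps `i` and `j` hold φᵢ and φᵢ→φₖ; and the function `generalize` builds a single flattened sequence instead of the separate deductions Dₖ. It does not check that x is not free in the hypotheses; that remains the caller's responsibility.

```python
def generalize(deduction, x):
    """Turn a deduction of α into a deduction of ∀xα, following the three cases above.

    `deduction` is a list of (formula, justification) pairs. The result is a
    list of (formula, note) pairs whose last formula is ("all", x, α).
    """
    out = []
    for formula, just in deduction:
        gen = ("all", x, formula)
        if just[0] == "ax":
            # Case 1: the generalization of an axiom is an axiom of group 8.
            out.append((gen, "Ax. 8"))
        elif just[0] == "hyp":
            # Case 2: x is assumed not to be free in the hypothesis.
            out.append((formula, "hyp."))
            out.append((("imp", formula, gen), "Ax. 3"))
            out.append((gen, "m.p."))
        elif just[0] == "mp":
            # Case 3: ∀xφ_i and ∀x(φ_i→φ_k) were already produced earlier.
            i, j = just[1], just[2]
            gen_i = ("all", x, deduction[i][0])
            gen_imp = ("all", x, deduction[j][0])
            out.append((("imp", gen_imp, ("imp", gen_i, gen)), "Ax. 2"))
            out.append((("imp", gen_i, gen), "m.p."))
            out.append((gen, "m.p."))
        else:
            raise ValueError(f"unknown justification: {just!r}")
    return out


P = ("atom", "P", ("c",))
Q = ("atom", "Q", ("c",))
D = [(P, ("hyp",)), (("imp", P, Q), ("hyp",)), (Q, ("mp", 0, 1))]

for formula, note in generalize(D, "x"):
    print(note, formula)
```

For the three-step deduction `D` above, the result has nine lines and ends with `("all", "x", Q)`.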

10.5. Corollary (Rule of Universal Generalization on a Variable)


If Φ ⊢ α and x is not free in Φ then Φ ⊢ ∀xα. ∎

10.6. Remarks
(i) We shall refer to this rule briefly as ‘UGV’.
(ii) The only purpose of adopting Ax. 2, Ax. 3 and Ax. 8 was to
enable us to prove Thm. 10.4. Now that this has been done these
axioms need not be invoked again.
(iii) It is obvious that if ⊢* is the relation of deducibility in any
calculus for which UGV holds, then from ⊢*α it follows that also
⊢*∀xα for any variable x (cf. Ax. 8). If in addition DT also holds
for ⊢*, then ⊢*α→∀xα for any formula α and any variable x that
is not free in α (cf. Ax. 3). See also Prob. 10.7 below.
(iv) Thm. 10.4 can be strengthened: it is enough to require that x is
not free in any formula of Φ used as a hypothesis in the given
deduction (although it may be free in formulas of Φ that are not
so used). To see this, let Φ_D be the set of those members of Φ
that are used in the given deduction D, and apply the theorem to
Φ_D. Similarly, in Cor. 10.5 it is enough to require that x is not
free in members of Φ used as hypotheses in some particular
deduction of α from Φ. Similar remarks apply also to other
results in the present section.
(v) On the other hand, the proviso that x must not be free in the
hypotheses used to deduce α is essential. For example, let α be
x≠y, where x and y are distinct variables. If not for the proviso
in Cor. 10.5, we would have x≠y ⊢ ∀x(x≠y) and hence, by Thm.
9.14, also x≠y ⊨ ∀x(x≠y). But this is absurd, as x≠y is clearly
satisfied by any valuation that assigns x and y distinct values,
whereas ∀x(x≠y) is satisfied by no valuation.
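
The counter-example in (v) can be checked by brute force on small finite universes. The sketch below is an illustration only: it assumes a tuple encoding of formulas with tags `"eq"`, `"not"` and `"all"`, represents a valuation by a universe together with a dictionary from variables to individuals, and uses the hypothetical helper names `holds` and `satisfied_somewhere`. A finite search is of course no proof of the general claim, which follows because the value of y is always available as a value for x.

```python
from itertools import product


def holds(formula, universe, sigma):
    """Truth of a formula under the valuation sigma (a dict: variable -> individual)."""
    tag = formula[0]
    if tag == "eq":
        return sigma[formula[1]] == sigma[formula[2]]
    if tag == "not":
        return not holds(formula[1], universe, sigma)
    if tag == "all":
        x, body = formula[1], formula[2]
        # Re-assign x to every individual in turn, leaving other variables alone.
        return all(holds(body, universe, {**sigma, x: u}) for u in universe)
    raise ValueError(f"unknown formula tag: {tag!r}")


def satisfied_somewhere(formula, max_size=3):
    """Search all valuations of x and y over universes of size 1..max_size."""
    for size in range(1, max_size + 1):
        universe = list(range(size))
        for a, b in product(universe, repeat=2):
            if holds(formula, universe, {"x": a, "y": b}):
                return True
    return False


x_neq_y = ("not", ("eq", "x", "y"))
all_x_neq_y = ("all", "x", x_neq_y)

print(holds(x_neq_y, [0, 1], {"x": 0, "y": 1}))      # True
print(holds(all_x_neq_y, [0, 1], {"x": 0, "y": 1}))  # False
print(satisfied_somewhere(all_x_neq_y))              # False
```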

10.7. Problem
Let ⊢* be the relation of deducibility in a calculus with modus ponens
as a rule of inference and for which Cut, DT, UI and UGV hold. Show
that ⊢*∀x(α→β)→∀xα→∀xβ for any formulas α and β and any
variable x.

10.8. Definition
For any formula α and variable x, we put

∃!xα =df ∃y∀x(α↔x=y),

where y is the first variable in alphabetic order that differs from x and
is not free in α.

10.9. Problem
(i) Verify that σ ⊨ ∃!xα iff σ(x/u) ⊨ α for exactly one individual u in
the universe U of σ.
(ii) Prove that ⊢ ∃!x(t=x), provided x does not occur in t.

10.10. Theorem (Rule of Universal Generalization on a Constant)


If Φ ⊢ α(x/c), where c is a constant that occurs neither in Φ nor in α,
then also Φ ⊢ ∀xα.

PROOF
Let D be a deduction φ₁, φ₂, …, φₙ of α(x/c) from Φ. Thus
φₙ = α(x/c).
Now let y be a new variable, in the sense that it is distinct from x
and does not occur at all (either free or bound) in the deduction D. Let
D′ be the sequence φ₁′, φ₂′, …, φₙ′ of formulas obtained from D
upon replacing c everywhere by y. We claim that D′ is a deduction of
α(x/y) from Φ.
Indeed, for any k (where 1 ≤ k ≤ n) three cases are possible. First,
φₖ may be an axiom. In this case it is easy to verify that φₖ′ is also an
axiom. Second, φₖ may be a hypothesis, a member of Φ. In this case
φₖ′ is φₖ itself, because c does not occur in Φ. Finally, φₖ may have
been obtained by modus ponens from two earlier formulas in D, φᵢ
and φⱼ. In this case it is obvious that φₖ′ is obtained by modus ponens
from φᵢ′ and φⱼ′. Thus D′ is a deduction of α(x/c)′ from Φ.
We still have to show that α(x/c)′ is in fact α(x/y). To see this, recall
that c does not occur in α. Thus the occurrences of c in α(x/c) are just
those that replace the free occurrences of x in α; there are no other
occurrences of c in α(x/c). Now, α(x/c)′ was obtained from α(x/c)
upon replacing these occurrences of c by the new variable y. Thus
α(x/c)′ can be obtained directly from α upon replacing all free
occurrences of x in α by y. But α(x/y) is obtained from α in precisely the
same way, because y is a new variable, not occurring in α, so that the
substitution of y for x in α does not involve any alphabetic changes.
We have now established that D′ is indeed a deduction of α(x/y)
from Φ. Moreover, note that y does not occur in those members of Φ
that are used as hypotheses in D′: the only occurrences of y in D′ are
those that have replaced occurrences of c, but c does not occur in Φ.
Therefore by UGV we have Φ ⊢ ∀y[α(x/y)].
By UI we have ∀y[α(x/y)] ⊢ α(x/y)(y/x). But it is easy to see that
α(x/y)(y/x) is in fact α itself; hence we have got ∀y[α(x/y)] ⊢ α. Now,
x is clearly not free in ∀y[α(x/y)], so we can use UGV again and
obtain ∀y[α(x/y)] ⊢ ∀xα.
By Cut we finally have Φ ⊢ ∀xα, as required. ∎

10.11. Remark
We shall refer to this rule briefly as ‘UGC’.

§11. Consistency

As decreed in Def. 9.10(iv), a set Φ of ℒ-formulas is [first-order]
inconsistent (briefly, Φ⊢) if both members of a contradictory pair can
be deduced from Φ in Fopcal. Otherwise, Φ is [first-order] consistent.

We have already noted (Thm. 9.11) that IE, reductio and PIP hold
for Fopcal. The other results of §8 of Ch. 7 also have counterparts in
Fopcal. In particular, the following two results are proved similarly to
Thm. 7.8.4 and Cor. 7.8.5.

11.1. Theorem
If Φ⊢ then Φ⊨. ∎

11.2. Corollary (Consistency of Fopcal)
It is impossible that both ⊢α and ⊢¬α. ∎

11.3. Remark
This proof of the consistency of Fopcal uses semantic notions which,
generally speaking, require a relatively powerful set-theoretic ambient
theory (see Rem. 4.14). On the other hand, since deductions are finite
objects, proof-theoretic notions such as deducibility and consistency
are quite elementary. It is therefore natural to ask whether the
consistency of Fopcal can be proved in an elementary way, without
appealing to semantics. Such a proof is outlined in the following
problem.

11.4. Problem
(i) Show that if Φ ⊢ α and Φ is a set of even formulas (see Def. 8.3)
then α is even as well. (Verify that all the axioms of Fopcal are
even formulas and that modus ponens yields an even conclusion
from even premisses.)
(ii) Hence prove the consistency of Fopcal.
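
Part of the verification asked for here is a finite computation, because the parity of a compound formula depends only on the parities of its components. The following Python sketch, offered as an illustration and not as a solution from the text, checks by exhaustive search that modus ponens preserves evenness and that Ax. 2 and Ax. 3 are even. The helper names `pr_not` and `pr_imp` are hypothetical; the propositional axioms and the remaining groups are not covered, and Ax. 4 would additionally need the fact that α(x/t) has the same parity as α.

```python
from itertools import product


def pr_not(a):
    """Parity of ¬β, given the parity a of β (Def. 8.3, clause 2)."""
    return 1 - a


def pr_imp(a, b):
    """Parity of β→γ, given the parities a of β and b of γ (clause 3)."""
    return (1 - a) * b


# By clause (4) a universal quantifier leaves the parity unchanged,
# so quantifiers are simply dropped in the checks below.
for a, b in product((0, 1), repeat=2):
    # Modus ponens: if α and α→β are both even, then β is even.
    if a == 0 and pr_imp(a, b) == 0:
        assert b == 0
    # Ax. 2, read as ∀x(α→β)→(∀xα→∀xβ).
    assert pr_imp(pr_imp(a, b), pr_imp(a, b)) == 0
    # Ax. 3: α→∀xα.
    assert pr_imp(a, a) == 0

# A formula and its negation never have the same parity,
# so they cannot both be even.
for a in (0, 1):
    assert pr_not(a) != a

print("all parity checks passed")
```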

We shall now prove a few results that have no counterpart in the
propositional calculus. These results, which will be needed later, are
concerned with a consistent set Φ of formulas that contains formulas of
the form ¬∀xα. We add to Φ ‘witnessing’ formulas ¬α(x/c), where
the ‘witness’ c is a fresh constant, that does not occur in Φ. We prove
that the resulting set is consistent. First we consider the case where just
one witnessing formula is added; then a finite number; and then an
arbitrary set of such formulas.

11.5. Lemma
If Φ is consistent and ¬∀xα ∈ Φ, and c is a constant that does not
occur in Φ, then Φ ∪ {¬α(x/c)} is also consistent.

PROOF
If Φ, ¬α(x/c) ⊢, then by PIP Φ ⊢ α(x/c). As c does not occur in Φ and
as we are assuming that ¬∀xα ∈ Φ, c cannot occur in α either.
Therefore by UGC Φ ⊢ ∀xα. But this is impossible, since ¬∀xα ∈ Φ
and Φ was assumed to be consistent. ∎

11.6. Problem
Prove the Rule of Existential Instantiation with a Constant (EIC): If
Φ is consistent and ∃xα ∈ Φ, and c is a constant that does not occur in
Φ, then Φ ∪ {α(x/c)} is also consistent.

11.7. Lemma
Let Φ be consistent; for each i = 1, 2, …, k, let ¬∀xᵢαᵢ ∈ Φ, and
let cᵢ be distinct constants that do not occur in Φ.
Then Φ ∪ {¬αᵢ(xᵢ/cᵢ) : i = 1, 2, …, k} is also consistent.

PROOF
DIY by [weak] induction on k, using Lemma 11.5. ∎

11.8. Lemma
Let Φ be consistent; let Φ′ be obtained from Φ by adding, for every
formula of the form ¬∀xα in Φ, a ‘witnessing’ formula ¬α(x/c),
where c does not occur in Φ and where distinct constants c are used for
distinct formulas of the form ¬∀xα. Then Φ′ is consistent as well.

PROOF
It is enough to prove that every finite subset of Φ′ is consistent. (Cf.
Prob. 7.8.3(i): a similar result clearly holds for Fopcal.) However, a
finite subset of Φ′ contains only a finite number of the new witnessing
formulas, and is therefore included in a set of the form Φ ∪
{¬αᵢ(xᵢ/cᵢ) : i = 1, 2, …, k}, which is consistent by Lemma 11.7. ∎

In the sequel we shall need to consider, in addition to a given
first-order language ℒ, languages obtained from it by adding new
individual constants, which will be used in connection with Lemma
11.8. We shall need to be sure that a consistent set of ℒ-formulas
remains consistent within such an extended language.
This is not entirely obvious. Suppose ℒ⁺ is obtained from ℒ by
adding a set C of new constants. Let Φ be a set of ℒ-formulas that is
consistent within ℒ. This means that there exists an ℒ-formula α that
is not deducible from Φ within ℒ. But in ℒ⁺ there are formulas that
do not belong to ℒ and in particular there are more axioms
(additional members of the eight axiom groups) containing new
constants. Can the formula α become deducible from Φ within ℒ⁺ by
using these additional axioms?
We shall now show that this is in fact impossible.

11.9. Theorem
Let Φ be a set of ℒ-formulas that is consistent within ℒ. Let ℒ⁺ be
obtained from ℒ by adding a set C of new individual constants. Then Φ
is consistent within ℒ⁺ as well.

PROOF
By assumption, there is an ℒ-formula α not deducible from Φ in ℒ. It
is enough to show that this remains the case also in ℒ⁺.
Suppose that D is a deduction of α from Φ within ℒ⁺. Since D is a
finite sequence of ℒ⁺-formulas, it can contain only a finite number of
new constants, say c₁, c₂, …, cₘ. Now choose m distinct variables y₁,
y₂, …, yₘ that do not occur (free or bound) in D and let D′ be
obtained from D upon replacing c₁, c₂, …, cₘ throughout by y₁, y₂,
…, yₘ respectively.
An argument similar to that used in the proof of Thm. 10.10 shows
that D′ is a deduction from Φ. Indeed, when D was transformed into
D′ any axiom used in D was transformed into an axiom; any
hypothesis remained unchanged (since Φ is a set of ℒ-formulas, it contains no
new constants); and any application of modus ponens in D was
transformed into an application of modus ponens. Now, D′ is a
deduction within ℒ, because the new constants that were present have
been supplanted by variables. The last member of D′ is still α, which
has remained unchanged as it does not contain any new constants. So
now we have a deduction of α from Φ within ℒ, contrary to our
original assumption. ∎

§ 12. Maximal consistency


By Def. 9.10(iv), a set of ℒ-formulas is maximal [first-order] consistent
[in ℒ] if it is consistent but not included in any other consistent set of
ℒ-formulas. As usual, we omit the qualifications ‘first-order’ and ‘in ℒ’
when there is no risk of confusion. The following two theorems are
proved in exactly the same way as their propositional counterparts.

12.1. Theorem
If Φ is a maximal consistent set and Φ ⊢ α, then α ∈ Φ. ∎

12.2. Theorem
A consistent set Φ is maximal consistent iff for every formula α either
α ∈ Φ or ¬α ∈ Φ. ∎

12.3. Remark
From Thm. 12.2 it follows that if ℒ is extended to a richer language
ℒ⁺, by adding new extralogical symbols (for example, new constants)
then a set Φ of ℒ-formulas that is maximal consistent in ℒ will no
longer be so in ℒ⁺. Indeed, if α is an ℒ⁺-formula containing a new
symbol (one that does not belong to ℒ) then α is not an ℒ-formula, so
neither α nor ¬α can belong to Φ. Of course, by Thm. 11.9 Φ is still
consistent in ℒ⁺.

The following result is proved similarly to Thm. 7.12.6(i).

12.4. Theorem
For any valuation σ, the set {φ : φ^σ = ⊤} is maximal consistent. ∎

The counterpart of Thm. 7.12.6(ii) is also true: every maximal
[first-order] consistent set has the form {φ : φ^σ = ⊤} for a unique valuation
σ. But in order to prove this we must first show that every maximal
consistent set is satisfiable. In propositional logic we were able to show
that every maximal [propositionally] consistent set is a [propositional]
Hintikka set, and hence satisfiable. Here matters are not so simple.

12.5. Theorem
If Φ is maximal consistent, it fulfils conditions (1)–(5) and (7)–(9) of
Def. 7.1.

PROOF
Conditions (1)–(4) are verified as in the proof of Thm. 7.12.5.
Conditions (5) and (7)–(9) are verified by invoking UI and Ax. 5–Ax. 7
respectively and using Thm. 12.1. ∎

The following problem provides a counter-example showing that a


maximal consistent set need not fulfil the missing condition (6) of Def.
7.1, and hence need not be a Hintikka set.

12.6. Problem
Let ℒ be a first-order language with equality but without any
extralogical symbols. Let σ be the ℒ-valuation whose universe is
U = {u, v}, where u and v are distinct, and such that x^σ = u for every
variable x. Let Φ = {φ : φ^σ = ⊤}, so that by Thm. 12.4 Φ is maximal
consistent. Let α be the formula x=y, where x and y are distinct
variables.
Show that ¬∀xα ∈ Φ but there is no ℒ-term t such that
¬α(x/t) ∈ Φ. (Note that the only terms of ℒ are the variables.)

§ 13. Completeness
13.1. Preview
As in propositional logic, the [strong] completeness of Fopcal will
follow immediately once we show that any given consistent set Φ of
ℒ-formulas is satisfiable. Also, exactly as in propositional logic, it is
easy to see that the set of all consistent sets of ℒ-formulas is of finite
character (cf. proof of Thm. 7.13.1); hence, by the Tukey–Teichmüller
Lemma (Thm. 5.2.8), any consistent Φ is included in some Ψ that is
maximal consistent within ℒ. However, since a maximal consistent set
may not be a Hintikka set, we have no direct way of showing that Ψ is
satisfiable.
It is clear from Thm. 12.5 and Prob. 12.6 that the only reason that
may prevent Ψ from being a Hintikka set is the absence in it of
witnessing formulas. To overcome this obstacle, we use Lemma 11.8,
and add to Ψ enough witnessing formulas, using constants as
witnesses. However, in order to make sure that these witness constants do
not occur in Ψ (as Lemma 11.8 requires) we extend ℒ to a richer
language ℒ₁ by adding an adequate supply of new constants. By Thm.
11.9 Ψ is still consistent in ℒ₁, so we may apply Lemma 11.8 there. Let
Φ₁ be the set so obtained. Unfortunately, in ℒ₁ Ψ is no longer
maximal consistent (see Rem. 12.3), nor does the addition of new
witnessing formulas produce a maximal consistent set: all we can say
about Φ₁ is that it is consistent. It seems as though we are back where
we started.
Not despairing, we extend Φ₁ to a maximal consistent set Ψ₁ within
ℒ₁. Then we extend ℒ₁ to a richer language ℒ₂ by adding yet more
new constants, and get Φ₂ from Ψ₁ in the same way as we got Φ₁
from Ψ.
The good news is that by iterating this procedure ad infinitum we
obtain in the limit a set that is not only maximal consistent but also a
Hintikka set, and includes our original set Φ.
Throughout this section we shall be working within set theory (that
is, assume it as an ambient metatheory). In particular, as explained in
Rem. 6.1.8, we shall identify the natural numbers with the finite
ordinals (a.k.a. finite cardinals).

13.2. Definition
A set Φ of ℒ-formulas is a Henkin set in ℒ if Φ is maximal consistent
in ℒ and, for any formula α and variable x, if ¬∀xα ∈ Φ then
¬α(x/t) ∈ Φ for some term t.

13.3. Remark
From Thm. 12.5 and Def. 7.1 it follows at once that a Henkin set in ℒ
is also a Hintikka set in ℒ. Hence by Thm. 7.17 such a set is satisfied
by some valuation whose universe has cardinality not greater than ‖ℒ‖.
From now until the end of the proof of Thm. 13.8 we let Φ be a
fixed but arbitrary consistent set of ℒ-formulas.
By [weak] induction on n we define for each natural number n a
first-order language ℒₙ, a set Φₙ of ℒₙ-formulas, and a set Ψₙ of
ℒₙ-formulas that is maximal consistent in ℒₙ.

13.4. Definition
Basis. We put ℒ₀ = ℒ and Φ₀ = Φ. As Ψ₀ we choose some set of
formulas that is maximal consistent in ℒ₀ and includes Φ₀. (The
existence of such Ψ₀ is ensured by the Tukey–Teichmüller Lemma.)

Induction step. Assume as induction hypothesis that ℒₙ, Φₙ and Ψₙ
have been defined, and that Ψₙ is a set of ℒₙ-formulas that is maximal
consistent within ℒₙ.
For each ℒₙ-formula φ, let c_φ be a new constant (not present in ℒₙ)
such that if φ and ψ are distinct formulas then c_φ and c_ψ are distinct
constants. Let Cₙ be the set of all these new constants:
Cₙ = {c_φ : φ is an ℒₙ-formula}.
We define ℒₙ₊₁ as the language obtained by adding the set of
constants Cₙ to ℒₙ.
Since Ψₙ is maximal consistent in ℒₙ, it follows from Thm. 11.9 that
it is still consistent (albeit not maximally so) in the richer language
ℒₙ₊₁. We define Φₙ₊₁ to be the set of formulas obtained from Ψₙ as
follows: for each formula φ ∈ Ψₙ of the form ¬∀xα, add to Ψₙ the
formula ¬α(x/c_φ), where c_φ is the new constant in Cₙ corresponding
to this particular formula φ. Clearly, Φₙ₊₁ is a set of ℒₙ₊₁-formulas.
And since Ψₙ is a set of ℒₙ-formulas, none of these new constants
occur in it, so by Lemma 11.8 Φₙ₊₁ is consistent.
Finally, we choose as Ψₙ₊₁ some set of formulas that is maximal
consistent in ℒₙ₊₁ and includes Φₙ₊₁. (The existence of such a set is
again ensured by the Tukey–Teichmüller Lemma.)
This concludes our inductive definition.
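
The witness-adding half of the induction step is mechanical and can be sketched in code. The Python fragment below is an illustration under explicit assumptions, not the text's construction itself: formulas are nested tuples tagged `"atom"`, `"not"`, `"imp"` and `"all"`; terms are either variable names (strings) or constants of the hypothetical form `("const", key)`; and the fresh constant for a formula φ at stage n is modelled as `("const", (n, φ))`, which makes distinct formulas receive distinct constants. It operates only on finite sets, and it leaves out the extension to a maximal consistent set, which in the text rests on the Tukey–Teichmüller Lemma and is not a computation of this kind.

```python
def subst(formula, x, c):
    """Replace the free occurrences of the variable x by the constant c."""
    tag = formula[0]
    if tag == "atom":
        return ("atom", formula[1], tuple(c if t == x else t for t in formula[2]))
    if tag == "not":
        return ("not", subst(formula[1], x, c))
    if tag == "imp":
        return ("imp", subst(formula[1], x, c), subst(formula[2], x, c))
    if tag == "all":
        if formula[1] == x:
            return formula  # x is bound here, so nothing below is free
        return ("all", formula[1], subst(formula[2], x, c))
    raise ValueError(f"unknown formula tag: {tag!r}")


def add_witnesses(psi_n, n):
    """From a finite stand-in for the stage-n set, build the set with witnesses added."""
    phi_next = set(psi_n)
    for phi in psi_n:
        if phi[0] == "not" and phi[1][0] == "all":
            _, x, alpha = phi[1]
            c_phi = ("const", (n, phi))          # fresh constant indexed by phi
            phi_next.add(("not", subst(alpha, x, c_phi)))
    return phi_next


Px = ("atom", "P", ("x",))
phi = ("not", ("all", "x", Px))
psi_0 = {phi, ("atom", "Q", ("y",))}

phi_1 = add_witnesses(psi_0, 0)
witness = ("not", ("atom", "P", (("const", (0, phi)),)))
print(witness in phi_1)   # True
print(len(phi_1))         # 3
```

Because a constant contains no variables, substituting it for x cannot capture anything, which is why `subst` needs no alphabetic changes.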

13.5. Remark
From Def. 13.4 it is evident that the Φₙ and Ψₙ form a chain of sets:

Φ₀ ⊆ Ψ₀ ⊆ Φ₁ ⊆ Ψ₁ ⊆ ⋯ ⊆ Φₙ ⊆ Ψₙ ⊆ Φₙ₊₁ ⊆ ⋯

13.6. Definition
We define ℒ_ω as the union of all the languages ℒₙ; and Ψ_ω as the
union of all the sets Ψₙ for n = 0, 1, 2, ….
Thus ℒ_ω is obtained from ℒ by adding to the latter the union of all
the sets Cₙ, for n = 0, 1, 2, …; and an ℒ_ω-formula α belongs to Ψ_ω
iff it belongs to Ψₙ for some n.

13.7. Remark
From Rem. 13.5 it follows that an ℒ_ω-formula α belongs to Ψ_ω iff
there is some n such that α ∈ Ψₖ for all k ≥ n.

13.8. Theorem
Ψ_ω is a Henkin set in ℒ_ω.

PROOF
First, we show that Ψ_ω is consistent. For the same reason as in
propositional logic (cf. Prob. 7.8.3), it is enough to show that every
finite subset of Ψ_ω is consistent. So let α₁, α₂, …, αₘ be members of
Ψ_ω; we shall show that {α₁, α₂, …, αₘ} is consistent.
Since α₁ ∈ Ψ_ω, it follows (see Rem. 13.7) that there is a number n₁
such that α₁ ∈ Ψₖ for all k ≥ n₁. Similarly, there is a number n₂ such
that α₂ ∈ Ψₖ for all k ≥ n₂. And so on for each of the αⱼ, where j = 1,
2, …, m. Now let k be any number greater than the m numbers n₁,
n₂, …, nₘ. Then clearly αⱼ ∈ Ψₖ for j = 1, 2, …, m. It follows that
{α₁, α₂, …, αₘ} ⊆ Ψₖ. But by Def. 13.4 Ψₖ is maximal consistent in
ℒₖ, hence consistent. So its subset {α₁, α₂, …, αₘ} is certainly
consistent, as claimed.
By Thm. 12.2, in order to show that Ψ_ω is maximal consistent in ℒ_ω
it is enough to show that for any ℒ_ω-formula α, either α or ¬α is in
Ψ_ω. So let α be any ℒ_ω-formula. Now, α can only contain a finite
number of the new constants (those not in the original language ℒ);
say these constants are c₁, c₂, …, cₘ. An argument entirely similar to
the one used in the preceding paragraph shows that if k is a sufficiently
big number then all these m constants are present in ℒₖ. Thus α is in
fact an ℒₖ-formula for some k. But by Def. 13.4 Ψₖ is maximal
consistent in ℒₖ, so α or ¬α must belong to Ψₖ and hence also to Ψ_ω,
which includes Ψₖ.
Having proved that Ψ_ω is maximal consistent in ℒ_ω, we need only
show that it fulfils the additional condition: given that ¬∀xα ∈ Ψ_ω we
have to show that ¬α(x/t) ∈ Ψ_ω for some term t. However, if
¬∀xα ∈ Ψ_ω then by Def. 13.6 ¬∀xα ∈ Ψₙ for some n. Therefore by
Def. 13.4 a formula ¬α(x/c) (where c is a suitably chosen new
constant belonging to Cₙ) was one of the formulas added to Ψₙ to
obtain Φₙ₊₁. Thus ¬α(x/c) ∈ Φₙ₊₁ ⊆ Ψₙ₊₁ ⊆ Ψ_ω. ∎

13.9. Theorem
If Φ is a consistent set of ℒ-formulas then Φ is satisfied by some
ℒ-valuation whose universe has cardinality not greater than ‖ℒ‖.

PROOF
We have specified in Defs. 13.4 and 13.6 how to extend the language
ℒ to a language ℒ_ω by adding new constants, and how to define a set
Ψ_ω of ℒ_ω-formulas such that Φ ⊆ Ψ_ω; and we have shown in Thm.
13.8 that Ψ_ω is a Henkin set in ℒ_ω.
By Rem. 13.3, Ψ_ω (and hence also its subset Φ) is satisfied by
some ℒ_ω-valuation, say σ_ω, as obtained in §7, whose universe has
cardinality not greater than ‖ℒ_ω‖.
Let σ be the ℒ-valuation that agrees with σ_ω on all the variables, as
well as on all the extralogical symbols of ℒ. (The only difference
between σ_ω and σ is that the former assigns interpretations to the new
constants, which are not in ℒ, while σ ignores them.) Then clearly σ is
an ℒ-valuation that satisfies Φ.
The universe of σ is the same as that of σ_ω; so we shall complete the
proof by showing that ‖ℒ_ω‖ = ‖ℒ‖. For brevity, we put λ = ‖ℒ‖. Of
course, λ is an infinite cardinal, because the set of variables is infinite;
in fact, its cardinality is ℵ₀.
The set of all ℒ-formulas is included in the set of all ℒ-strings, hence
by Thm. 6.3.9 the cardinality of the former set is ≤ λ. (In fact, it is
quite easy to show that its cardinality is exactly λ, but we shall not need
this.) Recall that ℒ₀ is ℒ itself; so by Def. 13.4 C₀ is equipollent to the
set of ℒ-formulas, hence |C₀| ≤ λ. By Def. 13.4 and Thm. 6.3.6 we
have ‖ℒ₁‖ = λ. The same argument shows, by induction on n, that
‖ℒₙ‖ = λ and |Cₙ| ≤ λ for all n.
It now follows that |⋃{Cₙ : n < ω}| ≤ ℵ₀·λ, which by Thm. 6.3.5 is
exactly λ. Using Thm. 6.3.6 as before, we see that ‖ℒ_ω‖ = λ. ∎

We can now prove

13.10. Theorem (Strong semantic completeness of Fopcal)

For any set Φ of formulas and any formula α, if Φ ⊨ α then Φ ⊢ α.

PROOF
Similar to that of Thm. 7.13.2. ∎

13.11. Remarks

(i) Conjoining Thms. 9.14 and 13.10 we have

Φ ⊨ α ⇔ Φ ⊢ α.

Similarly, from Thms. 11.1 and 13.9 we get

Φ⊨ ⇔ Φ⊢.

(ii) As pointed out in Rem. 4.14, the notions of logical consequence


and (un)satisfiability are essentially set-theoretic and thus pre-
suppose a fairly strong ambient theory. In contrast, as pointed
out in Rem. 11.3, the notions of deducibility and (in)consistency
in Fopcal are relatively elementary and do not require an ambient
theory that treats infinite pluralities as objects. It is therefore
highly remarkable that logical consequence and unsatisfiability
turn out to be equivalent to deducibility and inconsistency,
respectively. Of course, the proof of this equivalence required
rather powerful set theory.
(iii) Note however that if the primitive symbols of ℒ are given by
explicit enumeration, the proof can be made more elementary: in
Def. 13.4, instead of invoking the TT Lemma we can obtain the
maximal consistent sets Ψₙ as outlined in Rem. 7.13.3(i).

We conclude this chapter with two very important results.

13.12. Theorem (Compactness theorem for first-order logic)


If Φ is a set of formulas such that every finite subset of Φ is satisfiable,
then so is Φ itself.

PROOF
Similar to that of Thm. 7.13.4. ∎

13.13. Theorem (Löwenheim–Skolem)

Let Φ be a satisfiable set of ℒ-formulas. Then there exists a valuation σ
such that σ ⊨ Φ and such that the universe of σ has cardinality not
greater than ‖ℒ‖.

PROOF

By Thm. 11.1, Φ is consistent. Now apply Thm. 13.9. ∎


9
Facts from recursion theory

§1. Preliminaries
1.1. Preview
In this chapter we put formal languages on one side and present some
concepts and results from recursion theory that will be needed in the
sequel.
Recursion theory was created in the 1930s by logicians (Alonzo
Church, Kurt Gödel, Stephen Kleene, Emil Post, Alan Turing and
others) mainly for the sake of its applications to logic. But the theory
itself belongs to the abstract part of computing science. It is concerned
with computability — roughly speaking, the property of being mechan-
ically computable in principle (ignoring practical limitations of time
and memory storage space).
Our exposition will be neither rigorous nor self-contained. For some
of the key concepts, we shall provide intuitive explanations rather than
precise definitions. Instead of proving all theorems rigorously, we shall
in most cases present intuitive arguments. One major result, the
MRDP Theorem, will be stated without proof.
For a rigorous coverage of all this material, see Ch. 6 of B&M.
Alternative presentations of recursion theory can be found in books
wholly devoted to this subject, as well as in books that combine it with
logic. A classic of the first kind is

Hartley Rogers, Theory of recursive functions and effective
computability.

A fairly recent example of the second kind of book is

Daniel E. Cohen, Computability and logic.

1.2. Conventions

(i) In this chapter, by n-ary relation we mean n-ary relation on the
set N of natural numbers; that is, a subset of N^n. In particular,
a property is a subset of N. By relation we mean n-ary relation
for some n ≥ 1.
(ii) By n-ary function we mean an n-ary operation on N (see Defs.
8.3.2 and 8.3.4). In particular, a 0-ary function is just a natural
number. By function we mean n-ary function for some n ≥ 0.
(iii) We use small italic letters — especially ‘a’, ‘b’, ‘c’, ‘x’, ‘y’ and ‘z’,
with or without subscripts — as informal variables ranging over
natural numbers; that is, the values of these variables are always
assumed to be natural numbers.
(iv) We use small German letters as informal variables ranging over
n-tuples of natural numbers. For the i-th component of such an
n-tuple we use the corresponding italic letter with subscript ‘i’.
For example, 𝔞 = (a_1, a_2, ..., a_n) and 𝔵 = (x_1, x_2, ..., x_n).
(v) If P is an n-ary relation, we often write ‘Pa’ instead of ‘a ∈ P’.

1.3. Definition

(i) We define propositional (a.k.a. Boolean) operations on relations
as follows. If P is an n-ary relation, then its negation ¬P is
defined by stipulating, for all x ∈ N^n:

¬Px ⇔ Px does not hold.

If P and Q are n-ary relations, we define their disjunction P ∨ Q
by stipulating, for all x ∈ N^n:

(P ∨ Q)x ⇔ Px or Qx.

Other propositional operations, such as conjunction and implication,
can be defined in the obvious way, either directly or from
negation and disjunction. We shall usually write, e.g., ‘Px ∨ Qx’
instead of ‘(P ∨ Q)x’.
(ii) If Q is an (n + 1)-ary relation, we can obtain an n-ary relation P
by stipulating, for all x ∈ N^n:

Px ⇔ Q(x, y) holds for some y.

We shall write, more briefly, Px ⇔ ∃y Q(x, y), and say that P is
obtained from Q by existential quantification.
The operation of universal quantification is defined in the
obvious way, directly or in terms of negation and existential
quantification.
(iii) The propositional operations as well as the two quantifications
are called logical operations.

1.4. Warning
Take care not to confuse the light-face ‘¬’, ‘∨’, etc. with their bold-face
counterparts. The former denote operations on relations; the
latter denote symbols in a formal language (which we are not studying
in this chapter). The typographical similarity between the two sets of
symbols is an intended pun and a mnemonic device, as will become
clearer in the next chapter.

§2. Computers
We shall define the central concepts of recursion theory in terms of the
notion of computer. The computers we have in mind are like real-life
programmable digital computers, but idealized in one crucial respect
(see Assumption 2.6 below). To help clarify this notion, we state in
informal intuitive terms the most essential assumptions we will make
about computers and the way they operate.

2.1. Assumption
A computer is a digital calculating machine: its states differ from each
other in a discrete manner. (This rules out analogue calculating devices
such as the slide-rule, whose states [are supposed to] vary continu-
ously.)

2.2. Assumption
A computer is a deterministic mechanism: it operates by rigidly and
deterministically following instructions stored in it in advance. (This
rules out resort to chance or random devices.)

2.3. Assumption
A computer operates in a serial discrete step-wise manner.

2.4. Assumption
A computer has a memory capable of storing finitely many [represen-
tations of] natural numbers — which may be part of the input or the
output or an intermediate stage of a computation — and instructions.
(Without loss of generality, we may assume that instructions are coded
by natural numbers, as is in fact the case in present-day programmable
computers; so the content of the memory is always a finite sequence of
numbers.)

2.5. Assumption
A computer operates according to a program, a finite list of instruc-
tions, stored in it in advance (see Assumptions 2.2 and 2.4). Each
instruction requires the computer to execute a simple step such as to
erase a number stored in a specified location in the memory, or
increase by 1 the number stored in a specified location, or print out as
output the number stored in a specified location, or simply to stop.
After each step, the next instruction to be obeyed is determined by the
content of the memory (including the program itself).

2.6. Assumption
The computer’s memory has an unlimited storage capacity: it is able to
store an arbitrarily long finite sequence of natural numbers, each of
which can be arbitrarily large. (Thus, although the amount of informa-
tion stored in the memory is always finite, we assume that this amount
has no upper bound.)

2.7. Remarks
(i) Assumptions 2.1-2.5 are perfectly realistic: they are in fact
satisfied by many existing machines, from giant super-computers
down to modest programmable pocket calculators. Assumption
2.6, in contrast, is a far-reaching idealization: a real-life machine
can only store a limited amount of information. While the storage
capacity of many real machines can be enhanced by adding on
peripheral devices such as magnetic tapes or disks, this cannot be
done without limit.
(ii) In connection with Assumption 2.5 it is interesting to note that
the repertory of commands that a computer is able to obey (that
is, the range of elementary steps it is able to perform) need not
be at all impressive: in this respect the powers of a modest
programmable pocket calculator are more than adequate. Real-
life computing machines vary enormously in memory size and
speed of operation. But if we assume that restrictions of memory
size are removed, then the only significant difference is that of
speed. Provided it had access to unlimited storage capacity, a
machine with fairly rudimentary powers could simulate (if only at
much reduced speed) the operation of any computer that has so
far been constructed or described.
(iii) Several computers can be combined to form a more complex
system, which can itself be regarded as a computer.

§3. Recursiveness
3.1. Definition
Let P be an n-ary relation. By a decide-P machine we mean a
computer with an input port and an output port, which is programmed
so that if any n-tuple x ∈ N^n is fed into the input port then after a
finite number of steps the computer prints out an output — say 1 for yes
and 0 for no — indicating whether Px holds or not.
A relation P is recursive (or computable) if a decide-P machine can
be constructed (that is, if a computer can be programmed to act as a
decide-P machine).

3.2. Remarks

(i) Naturally, the length of the computation, the number of steps


required by the machine to produce an output, will in general
depend on the input n-tuple x. We impose no bound on the
length of the computation but merely require it to be finite. Thus
we ignore real-life limitations of time: in practice a computation
that may take a million years is useless.
(ii) To be precise we should have said that the inputs fed into the
computer are not n-tuples of numbers (which are abstract enti-
ties) but representations of such n-tuples. Similarly what the
computer prints out is not a number, 0 or 1, but a representation
of a number. Similar — quite harmless — lapses will be committed
throughout this chapter.
(iii) Any relation you are likely to think of, off-hand, is certain to be
recursive — unless you are already familiar with some of the tricks
of recursion theory or are exceptionally ingenious. (We shall
meet examples of non-recursive relations in the next chapter.)
(iv) Nevertheless, set-theoretically speaking, the overwhelming majority
of relations are non-recursive. (Here is an outline of a
proof. Working within ZF set theory, we identify N with the set
of finite cardinals. Using Thm. 6.3.7 and Cantor’s Thm. 3.6.8, it
is easy to show that for each n ≥ 1 the set of all n-ary relations
has cardinality > ℵ_0. On the other hand, a computer program is a
finite string of instructions, each of which is a finite string of
symbols in some programming language with a countable set of
primitive symbols. Hence by Thm. 6.3.9 the set of all programs is
countable. It follows that the set of all recursive relations must
also be countable.)

3.3. Definition
Let P be an n-ary relation. By an enumerate-P machine we mean a
computer with an output port and programmed so that it prints out,
one by one, all the n-tuples x ∈ N^n for which Px holds, and no others.
A relation P is said to be recursively enumerable (briefly, r.e.) if
an enumerate-P machine can be constructed (that is, if a computer can be
programmed to act as an enumerate-P machine).

3.4. Remarks

(i) If P is infinite (that is, holds for infinitely many n-tuples) then an
enumerate-P machine, once switched on, will never stop unless it
is switched off. We impose no bound on the number of computa-
tion steps the machine may make between printing out two
successive n-tuples; we only require it to be finite.
(ii) An r.e. relation is sometimes said to be semi-recursive. The
reason for this will soon become clear.

3.5. Lemma
The n-ary relation N^n (the set of all n-tuples of natural numbers) is r.e.

PROOF

All n-tuples can be arranged in some systematic order. For example,
we may order them according to the following two rules:

1. If the maximal component of a is smaller than that of b, then a
will precede b.
2. All n-tuples with the same maximal component will be ordered
lexicographically.

(The maximal component of an n-tuple x is the greatest among the
numbers x_1, x_2, ..., x_n. Lexicographic order is the order in which
words are listed in a dictionary. Here we regard an n-tuple x as a
‘word’ with x_1 as its first letter, x_2 as its second, and so on.) As an
illustration, take n = 2. The pairs of natural numbers will be ordered
as follows (cf. proof of Thm. 6.3.2):

(0,0),
(0,1), (1,0), (1,1),
(0,2), (1,2), (2,0), (2,1), (2,2),
(0,3), (1,3), (2,3), (3,0), (3,1), (3,2), (3,3), ...

Clearly, this procedure can be mechanized: a computer can be
programmed to spew out all n-tuples of natural numbers in this order. ∎
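
The ordering used in this proof can be sketched in Python. This is only an illustration and not part of the formal development: the name all_tuples is our own, a Python generator stands in for the idealized enumerate-N^n machine, and we assume n ≥ 1.

```python
from itertools import count, islice, product

def all_tuples(n):
    """Yield every n-tuple of natural numbers (n >= 1), ordered first by
    maximal component and then lexicographically."""
    for m in count(0):  # m is the maximal component of the current batch
        # product() yields tuples over {0, ..., m} in lexicographic order;
        # we keep only those whose maximal component is exactly m.
        for t in product(range(m + 1), repeat=n):
            if max(t) == m:
                yield t

# The first nine pairs, i.e. the first three rows of the illustration.
first_nine_pairs = list(islice(all_tuples(2), 9))
```

Every n-tuple with maximal component m appears after finitely many steps, because only finitely many tuples have a maximal component not exceeding m.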

3.6. Theorem
Let P be an n-ary relation. Then P is recursive iff both P and ¬P
are r.e.

PROOF

(⇒). Suppose P is recursive. Then we can construct a decide-P
machine D. As we have just seen, we can also construct an enumerate-N^n
machine E. We set E to work, and compile a final output by
modifying the output of E as follows. We feed a copy of each n-tuple a
that E prints out into D. If the latter says that Pa holds, a is left in the
final output; but if D says that Pa does not hold, then a is eliminated
from the final output. This procedure can be mechanized, yielding an
enumerate-P machine. An enumerate-¬P machine can be constructed
in a similar way.
(⇐). Now suppose both P and ¬P are r.e. Then we have at
our disposal both an enumerate-P machine and an enumerate-¬P
machine. These can be used to construct a (rather inefficient but quite
legitimate) decide-P machine, as follows. We set both enumerating
machines to work. Given any n-tuple a ∈ N^n, the outputs of both
machines are monitored, until a emerges from one of them (this is
bound to happen sooner or later!) and then it is noted from which of
our two machines a has emerged. (All this monitoring and noting can
of course be done automatically.) If a has come out of the enumerate-P
machine, then Pa holds; whereas if a has come out of the other
machine, Pa does not hold. ∎
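
Both halves of this proof can be sketched in Python, with generators playing the part of enumerating machines. The function names are our own, and the example relation (evenness, a unary relation, with numbers standing for 1-tuples) is chosen purely for illustration.

```python
from itertools import count, islice

def enumerate_from_decider(decide, candidates):
    """(=>) Keep exactly those candidates that the decide-P procedure accepts."""
    for a in candidates:
        if decide(a):
            yield a

def decide_from_enumerators(enum_p, enum_not_p, a):
    """(<=) Advance the two enumerations alternately until a emerges."""
    ep, en = iter(enum_p), iter(enum_not_p)
    while True:
        # next(..., None) keeps an exhausted (finite) enumeration from
        # raising an exception; numbers and tuples are never None.
        if next(ep, None) == a:
            return True
        if next(en, None) == a:
            return False

def evens():
    """Illustrative enumerate-P machine, where P is the property of being even."""
    return (x for x in count(0) if x % 2 == 0)

def odds():
    """Illustrative enumerate-(not P) machine for the same P."""
    return (x for x in count(0) if x % 2 == 1)

first_evens = list(islice(enumerate_from_decider(lambda x: x % 2 == 0, count(0)), 4))
```

The loop in decide_from_enumerators terminates because every a belongs to exactly one of P and its negation, so it must eventually emerge from one of the two enumerations.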

3.7. Remarks
(i) Note that in the second half of this proof we needed both
enumerating machines. If we only had an enumerate-P machine,
and we tried to use it for testing whether Pa holds, then if the
answer happened to be negative we would never find that out.
(ii) By Thm. 3.6, every recursive relation is r.e. We shall see in the
next chapter that the converse of this is false.

3.8. Theorem
If P is obtained from Q by existential quantification and Q is r.e., then
P is r.e. as well.

PROOF

Suppose Px ⇔ ∃y Q(x, y). Since Q is r.e., we can construct an
enumerate-Q machine. Set this machine to work, and let its output be
modified as follows. Whenever an (n + 1)-tuple (a, b) pops out, the
last component b is erased, leaving the n-tuple a. (This modification
can of course be done automatically.) It is easy to see that we now
have an enumerate-P machine. ∎
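
As a small illustration of this proof, the following Python sketch erases the last component of each tuple produced by an enumerator. The relation Q used here (x is the square of y) and both function names are our own choices, made only for the example.

```python
from itertools import count, islice

def enumerate_q():
    """Hypothetical enumerate-Q machine for Q(x, y) <=> x = y * y."""
    for y in count(0):
        yield (y * y, y)

def project(enum_q):
    """Erase the last component of every tuple that pops out."""
    for t in enum_q:
        yield t[:-1]

# Here Px <=> x is a perfect square; each x is printed as a 1-tuple.
first_squares = list(islice(project(enumerate_q()), 4))
```

In general the projected enumeration may print the same n-tuple more than once; this is harmless, since Def. 3.3 does not forbid repetitions.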

3.9. Definition
Let f be an n-ary function. By a compute-f machine we mean a
computer with an input port and an output port, and programmed so
that if any x ∈ N^n is fed into the input port, then after a finite number
of steps the computer prints out as output the value fx.
We say that f is a recursive (or computable) function if a compute-f
machine can be constructed.

Recall that the graph of an n-ary function f is the (n + 1)-ary relation
P such that

P(x, y) ⇔ (fx = y)

for all x ∈ N^n and all y ∈ N. (As a matter of fact, if n ≥ 1 then from
Convention 1.2(ii), Def. 8.3.2 and Prob. 2.3.3 it follows that the graph
of f is f itself; but this is not important just now.)

3.10. Theorem
For any function f, the following three conditions are equivalent:

(i) f is a recursive function (in the sense of Def. 3.9);
(ii) the graph of f is recursive (in the sense of Def. 3.1);
(iii) the graph of f is r.e.

PROOF

Let f be an n-ary function, and let P be its graph.
(i) ⇒ (ii). Assuming that f is recursive, we can construct a compute-f
machine C. We can employ C to find out, for any (n + 1)-tuple
(a, b) ∈ N^(n+1), whether P(a, b) holds or not, as follows.
Given any (n + 1)-tuple (a, b), we split it into the n-tuple a and the
number b. We make a record of the latter, and feed the former into C.
When C prints out the value fa, we compare it with our record of b
and see whether they are equal. P(a, b) holds iff fa = b.
The procedure described in the previous paragraph can obviously be
automated, yielding a decide-P machine.
(ii) ⇒ (iii) is immediate from Thm. 3.6.
(iii) ⇒ (i). Assuming that P is r.e., we can construct an enumerate-P
machine E. We can use E in the following way to calculate fa for any
a ∈ N^n.
Upon receiving a, we set E to work and monitor its output, checking
each (n + 1)-tuple as it is printed out, to see whether it is of the form
(a, b), having a as its first n components. Sooner or later, such an
(n + 1)-tuple is bound to turn up. When it does, we know that its last
component, b, is the value fa.
The procedure described in the previous paragraph can obviously be
automated, yielding a compute-f machine. (No prizes for efficiency,
but it is perfectly legitimate.) ∎
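
The two non-trivial implications can be sketched in Python as follows. The function names are our own, and the graph being enumerated (that of binary addition) is merely an illustrative choice.

```python
from itertools import count

def decide_graph(f, a, b):
    """(i) => (ii): compute fa and compare it with the recorded number b."""
    return f(*a) == b

def enumerate_graph_of_sum():
    """Hypothetical enumerate-P machine for the graph of f(x1, x2) = x1 + x2."""
    for total in count(0):
        for x1 in range(total + 1):
            yield (x1, total - x1, total)

def compute_from_graph(enum_graph, a):
    """(iii) => (i): wait for a tuple of the form (a, b), then return b."""
    a = tuple(a)
    for t in enum_graph:
        if t[:-1] == a:
            return t[-1]

value = compute_from_graph(enumerate_graph_of_sum(), (2, 3))
```

The search in compute_from_graph halts only because f is defined for every n-tuple; for the partial functions mentioned in the remarks that follow, it would run for ever on an input outside the domain.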

3.11. Remarks

(i) Recursion theory studies functions of a more general kind: an
n-ary function is allowed to have any subset of N^n as its domain
(instead of the whole of N^n, as we insist here). The definition of
a compute-f machine must then be modified by stipulating that
the machine prints out the correct value fa for any input a ∈
dom f; but for an input a ∉ dom f it goes on computing for ever,
without producing any output. For these more general functions
it is not difficult to show that conditions (i) and (iii) of Thm. 3.10
are still equivalent to each other; but they do not imply condition
(ii).
(ii) The first rigorous description of a computer satisfying Assumptions
2.1-2.6, devised expressly for the purpose of explicating the
intuitive notion of computability, was published by Turing in
1936. Since then many alternative machines satisfying Assump-
tions 2.1-2.6 have been invented. (For a description of Turing
machines see the books by Rogers and D. E. Cohen cited in § 1;
the latter contains also descriptions of several other alternatives.)
In each case it was easy to prove that the operation of the
alternative machine can be simulated by a Turing machine; the
converse also holds, provided the alternative machine satisfies
some modest requirements.
This and other evidence lends overwhelming support to the
claim — known as Church’s Thesis — that any function that is
mechanically computable in the intuitive sense is computable by a
Turing machine (or, for that matter, by one of its equivalent
alternatives). Church’s Thesis is equivalent to the claim that any
relation that is mechanically decidable (or enumerable) in the
intuitive sense can be decided (or enumerated, respectively) by a
Turing machine.
(iii) Although a recursive or r.e. relation may well be infinite in
extension, and a recursive function is necessarily infinite in
extension, each such entity is completely determined by a com-
puter program, which is a finite object. For this reason, recursion
theory does not on the whole require powerful set-theoretic
presuppositions. Even without such presuppositions it is possible
to treat recursive and r.e. relations and recursive functions as
objects: if need be, programs can play this role vicariously,
standing in for the more abstract entities they characterize.

§ 4. Closure results
4.1. Theorem
The class of recursive relations is closed under all propositional opera-
tions.

PROOF
Let P and Q be n-ary recursive relations. Thus we can construct a
decide-P machine D_P and a decide-Q machine D_Q. Then D_P can be
turned into a decide-¬P machine, simply by reversing its outputs.
Therefore ¬P is recursive.
To construct a decide-(P ∨ Q) machine, let D_P and D_Q operate
alongside each other. Given any n-tuple a ∈ N^n, a copy of it is fed into
each of these two machines. Their two outputs are channelled into a
collating unit. This unit checks the two outputs, and if at least one of
them is ‘yes’ it gives out a final output ‘yes’; but if both D_P and D_Q say
‘no’, then the collating unit gives out a final output ‘no’. We have now
got a decide-(P ∨ Q) machine, showing that P ∨ Q is recursive. The
other Boolean operations can be reduced to negation and disjunction. ∎
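
A minimal Python sketch of these constructions, treating a decide-P machine as a function that returns True or False. All names below are our own, and the two sample relations are purely illustrative.

```python
def negation(decide_p):
    """Reverse the outputs of a decide-P procedure."""
    return lambda *a: not decide_p(*a)

def disjunction(decide_p, decide_q):
    """Collating unit: say 'yes' iff at least one component says 'yes'."""
    return lambda *a: decide_p(*a) or decide_q(*a)

def conjunction(decide_p, decide_q):
    """Conjunction reduced to negation and disjunction."""
    return negation(disjunction(negation(decide_p), negation(decide_q)))

def is_even(x):
    return x % 2 == 0

def is_small(x):
    return x < 3

even_or_small = disjunction(is_even, is_small)
even_and_small = conjunction(is_even, is_small)
```

Python's `or` consults the second procedure only when the first says 'no'. This differs from the side-by-side arrangement of the proof but gives the same answers, since both procedures always halt.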

4.2. Remark
According to Assumption 2.3, a computer is supposed to operate in
a serial manner. This seems to be violated by the decide-(P ∨ Q)
machine just described, which has D_P and D_Q as two components
working in parallel. The apparent difficulty can be resolved by assum-
ing that the two components operate alternately, as in bipedal walking:
each one pausing while the other performs a step.

4.3. Theorem
The class of r.e. relations is closed under disjunction, conjunction and
existential quantification.

PROOF

Let P and Q be n-ary r.e. relations. So, we can construct an
enumerate-P machine E_P and an enumerate-Q machine E_Q. We set
these two machines to operate alongside each other (see Rem. 4.2).
To get an enumerate-(P ∨ Q) machine, we channel the outputs of
E_P and E_Q into a collating unit that combines these two outputs into a
single list. The combined list is the output of an enumerate-(P ∨ Q)
machine. Hence P ∨ Q is r.e.
To get an enumerate-(P ∧ Q) machine, we need, in addition to a
collating unit, two waiting lists or buffers in which information can be
accumulated, one each for P and Q. Initially both buffers are empty.
The collating unit examines in turn each fresh n-tuple that pops out of
E_P or E_Q. The two buffers as well as a final list are compiled according
to the following rules. Each time a fresh n-tuple a comes out of E_P, the
collating unit checks whether an identical n-tuple is already stored in
the Q-buffer. If a is found to be in the Q-buffer, then it is put onto the
final list; but if a is not in the Q-buffer then the collating unit adds it to
the P-buffer. Similarly, each time a fresh n-tuple b comes out of E_Q,
the collating unit checks whether b is stored in the P-buffer. If b is
found to be stored there, then it is put onto the final list; otherwise, it
is added to the Q-buffer. It is easy to see that the final list is the output
of an enumerate-(P ∧ Q) machine, showing that P ∧ Q is r.e.
As for closure under existential quantification, this has already
been proved (see Thm. 3.8). ∎
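
The collating units for disjunction and conjunction can be sketched in Python. The helper alternate realizes the alternating mode of operation described in Rem. 4.2; all function names are our own, and the example relations (multiples of 2 and of 3, with numbers standing for 1-tuples) are chosen only for illustration.

```python
from itertools import count, islice

def alternate(enum_p, enum_q):
    """Let the two enumerators take turns; tag each output with its source."""
    sources = [("P", iter(enum_p)), ("Q", iter(enum_q))]
    while sources:
        for tag, it in list(sources):
            try:
                yield tag, next(it)
            except StopIteration:      # a finite relation: this machine is done
                sources.remove((tag, it))

def enum_or(enum_p, enum_q):
    """Combine the two outputs into a single list."""
    for _, a in alternate(enum_p, enum_q):
        yield a

def enum_and(enum_p, enum_q):
    """Use a P-buffer and a Q-buffer, following the rules of the proof."""
    buffers = {"P": set(), "Q": set()}
    other = {"P": "Q", "Q": "P"}
    for tag, a in alternate(enum_p, enum_q):
        if a in buffers[other[tag]]:
            yield a                    # a has now come out of both machines
        else:
            buffers[tag].add(a)

def multiples(k):
    """Illustrative enumerator for the multiples of k."""
    return (k * i for i in count(0))

first_of_or = list(islice(enum_or(multiples(2), multiples(3)), 6))
first_of_and = list(islice(enum_and(multiples(2), multiples(3)), 4))
```

With these inputs the conjunction enumerates the multiples of 6, as one would expect.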

Next, we show that the class of r.e. relations is closed under the
operation of adding a redundant variable.

4.4. Theorem
Let P be an n-ary relation. Let Q be the (n + 1)-ary relation such that,
for all x ∈ N^n and all y ∈ N,

Q(x, y) ⇔ Px.

If P is r.e., then Q is r.e. as well.

PROOF

By hypothesis we can construct an enumerate-P machine E_P. Also, by
Lemma 3.5 we can construct an enumerate-N^(n+1) machine E.
To get an enumeration of Q, we set both E_P and E to work. As in
the proof of the ∧ part of Thm. 4.3, we compile a final list as well as
two buffers, one each for P and N^(n+1). When an n-tuple a pops out of
E_P, it is added to the P buffer; and every (n + 1)-tuple of the form
(a, b) that is already stored in the N^(n+1) buffer is added to the final
list.
When any (n + 1)-tuple (a, b) pops out of E, we check whether a is
present in the P buffer; if it is, (a, b) goes on the final list; if not, it
goes to the N^(n+1) buffer. ∎

4.5. Remarks
(i) Results similar to Thm. 4.4 hold also for the class of recursive
relations and the class of recursive functions; but they are too
obvious to be stated as theorems.
(ii) Using these facts, we can deal with disjunctions and conjunctions
of r.e. or recursive relations that are not of the same n-arity. For
example, if P and Q are binary, we can form a quaternary
relation R by stipulating that for all w, x, y and z,

R(w, x, y, z) ⇔ P(w, x) ∧ Q(y, z).

By adding y and z to P and w and x to Q as redundant variables,
we can see that if P and Q are r.e. (or recursive) then so is R.

For the final theorem of this section, we let f_1, f_2, ..., f_k be n-ary
functions. Let g be a k-ary function and let the function h be obtained
by composing g with f_1, f_2, ..., f_k; in other words, for all x ∈ N^n,

hx = g(f_1 x, f_2 x, ..., f_k x).

Let P be a k-ary relation and let the relation Q be obtained by
composing P with f_1, f_2, ..., f_k; in other words, for all x ∈ N^n,

Qx ⇔ P(f_1 x, f_2 x, ..., f_k x).

4.6. Theorem
Let f_1, f_2, ..., f_k be recursive functions.

(i) If g is a recursive function as well, then so is h.
(ii) If P is a recursive relation, then so is Q.
(iii) If P is r.e., then so is Q.

PROOF
(i) By hypothesis, we can construct machines F_1, F_2, ..., F_k that
compute f_1, f_2, ..., f_k respectively; also, we can construct a compute-g
machine, G. To compute h, we proceed as follows.
Given any n-tuple a ∈ N^n, copies of it are fed into the input ports of
F_1, F_2, ..., F_k. When these k machines have produced their outputs,
b_1, b_2, ..., b_k, they are put together as a k-tuple (b_1, b_2, ..., b_k),
which is fed into the input port of G. The output produced by the latter
is the required value ha.
This procedure can be mechanized, yielding a compute-h machine.
The proof of (ii) is similar. To prove (iii), we note that

Qx ⇔ ∃y_1 ∃y_2 ... ∃y_k [(f_1 x = y_1) ∧ (f_2 x = y_2) ∧ ... ∧ (f_k x = y_k)
∧ P(y_1, y_2, ..., y_k)].

By Thm. 3.10, the graphs of f_1, f_2, ..., f_k are r.e., and P is r.e. by
hypothesis. Hence Q is r.e. by Thm. 4.3 and Rem. 4.5(ii). ∎
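
Parts (i) and (ii) amount to ordinary composition of procedures, as the following Python sketch suggests. The name compose and the sample functions are our own; part (iii), which goes through the displayed existential formula, is not covered by this sketch.

```python
def compose(g, *fs):
    """Return h such that h(x) = g(f_1(x), ..., f_k(x)).

    If g returns True or False, the same construction yields the
    relation Q obtained by composing a relation P with f_1, ..., f_k."""
    def h(*x):
        # Feed a copy of the input to every f_i, then pass the k outputs to g.
        return g(*(f(*x) for f in fs))
    return h

def add(x, y):
    return x + y

def mul(x, y):
    return x * y

# h(x, y) = (x + y)^2 + x*y, an illustrative composite function.
h = compose(lambda u, v: u * u + v, add, mul)
# Q(x, y) <=> x + y < x*y, an illustrative composite relation.
q = compose(lambda u, v: u < v, add, mul)
```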

§5. The MRDP Theorem


5.1. Preview
In 1970, Yuri Matiyasevič, building upon work done during the
preceding two decades by Julia Robinson, Martin Davis and Hilary
Putnam, completed the proof of a remarkable theorem that characterizes
r.e. relations in extremely elementary terms. We refer to this
result by the acronym ‘MRDP’, for the four names just mentioned.
In view of Thms. 3.6 and 3.10, the MRDP Thm. also provides
elementary characterizations of the other two central concepts of
recursion theory: recursive relations and recursive functions. These
characterizations simplify the application of recursion theory to logic.
We shall present the MRDP Thm. without proof, which is too long
to be included here.

5.2. Definition
(i) An n-ary function f is a monomial if for some natural number a
(called the coefficient) and natural numbers k_1, k_2, ..., k_n
(called the exponents) the equality

fx = a x_1^{k_1} x_2^{k_2} ... x_n^{k_n}

holds for all x ∈ N^n.
(ii) An n-ary function f is a polynomial if it is a sum of monomials;
that is, for some monomials f_1, f_2, ..., f_m, the equality

fx = f_1 x + f_2 x + ... + f_m x

holds for all x ∈ N^n.


5.3. Definition
(i) An n-ary relation P is elementary if there are n-ary polynomials
f and g such that, for all x ∈ N^n,

Px ⇔ (fx = gx).

(ii) An n-ary relation P is said to be diophantine if it can be obtained
by a finite number of existential quantifications from an elementary
relation; in other words, there are (n + m)-ary polynomials
f and g such that, for all x ∈ N^n,

Px ⇔ ∃y_1 ∃y_2 ... ∃y_m [f(x, y_1, y_2, ..., y_m) = g(x, y_1, y_2, ..., y_m)].

(Here m may be 0, so every elementary relation is a fortiori
diophantine.)

5.4. Theorem (MRDP)


A relation is r.e. iff it is diophantine. ∎

5.5. Remarks
(i) The ⇐ part of the theorem is simple to prove. First, let P be an
n-ary elementary relation, and let f and g be polynomials
satisfying the condition of Def. 5.3(i). For any given x ∈ N^n we
can calculate the values fx and gx — this involves a finite number
of additions and multiplications of natural numbers. Then the two
values can be compared to see whether Px holds or not. This
procedure can clearly be mechanized, yielding a decide-P
machine. Thus every elementary relation is recursive, and hence
r.e. by Thm. 3.6. Now, by Def. 5.3(ii), any diophantine relation
is obtainable from an elementary relation by a finite number of
existential quantifications; so it is r.e. by Thm. 3.8.
(ii) The ⇒ part of the MRDP Thm. is far harder to prove. The
original proof (including Robinson’s early results and her joint
work with Davis and Putnam) is reproduced in B&M, pp.
284-311. A shorter and more direct version of the proof is
presented in pp. 111-123 of Cohen’s book cited in § 1.
(iii) The proof of the MRDP Thm. is effective: it provides us with a
method whereby from a given description (program) of an
enumerate-P machine it is possible in principle (granted enough
time and patience) to obtain polynomials f and g in terms of
which P can be presented as prescribed in Def. 5.3 (ii). Con-
versely, given such a presentation, it is easy to construct a
program under which a computer will operate as an enumerate-P
machine.
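
The easy direction described in Rem. 5.5(i) can be illustrated in Python. In this sketch a polynomial is represented as a list of monomials, each a pair (coefficient, exponents); this representation, the function names, the optional search bound max_bound and the example relation ("x is composite") are all our own choices, not taken from the text.

```python
from itertools import count, product
from math import prod

def eval_poly(poly, x):
    """Value at x of a sum of monomials (a, (k_1, ..., k_n))."""
    return sum(a * prod(xi ** ki for xi, ki in zip(x, ks)) for a, ks in poly)

def find_witness(f, g, m, x, max_bound=None):
    """Search for y_1, ..., y_m with f(x, y) = g(x, y).

    Without max_bound the search halts iff Px holds, which is all that
    is needed for P to be r.e.; max_bound only makes experiments finite."""
    for bound in count(0):
        if max_bound is not None and bound > max_bound:
            return None
        for ys in product(range(bound + 1), repeat=m):
            if eval_poly(f, x + ys) == eval_poly(g, x + ys):
                return ys

# Px <=> there are y, z with x = (y + 2)(z + 2), i.e. x is composite.
# Variables are ordered (x, y, z).
F = [(1, (1, 0, 0))]                                  # x
G = [(1, (0, 1, 1)), (2, (0, 1, 0)),
     (2, (0, 0, 1)), (4, (0, 0, 0))]                  # yz + 2y + 2z + 4

witness = find_witness(F, G, 2, (15,))
```

Here witness is a pair (y, z) with (y + 2)(z + 2) = 15, whereas for a prime such as 7 the unbounded search would never halt.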
10
Limitative results

§1. Preliminaries
1.1. Preview
The main results in this chapter reveal the inherent limitations of
formalism and the formalist approach to mathematics. For the sake of
simplicity we confine ourselves to a very basic part of mathematics:
elementary arithmetic (a.k.a. elementary number theory), whose sub-
ject-matter is the elementary structure of natural numbers (see Ex.
8.3.6). However, these results can be generalized without much dif-
ficulty to richer and more elaborate mathematical contexts.

1.2. Convention
We shall often write ‘number’ as short for ‘natural number’. Unless
stated otherwise, we shall follow the notation and terminology of Ch. 9
(see Conv. 9.1.2). Also, we use ‘k’, ‘m’, ‘n’ and ‘p’ as informal
variables ranging over numbers.

1.3. Specification
From now on, unless stated otherwise, our formal object language ℒ
will be the first-order language of arithmetic; namely, the first-order
language with equality =, whose extralogical symbols are:

(i) One individual constant, 0;
(ii) One unary function symbol, s;
(iii) Two binary function symbols, + and ×.

1.4. Remarks

(i) Note that ℒ has no extralogical predicate symbols, so its only
atomic formulas are equations.
(ii) Since ‘s’ is now used as a syntactic constant denoting the unary
function-symbol of ℒ, we cannot use it any longer as a syntactic
variable ranging over ℒ-terms. For this purpose we shall use ‘q’,
‘r’ and ‘t’, with or without subscripts.
(iii) The terms of ℒ evidently fall into the following five mutually
exclusive categories:
(1) Terms of the form x, consisting of a single occurrence of a
variable;
(2) The single term 0;
(3) Terms of the form st, where t is any term;
(4) Terms of the form +rt, where r and t are any terms;
(5) Terms of the form ×rt, where r and t are any terms.
Terms of the last three categories will be referred to as ‘s-terms’,
‘+-terms’ and ‘×-terms’ respectively.

1.5. Definition
In addition to Def. 8.2.2, which remains in force here (and for similar
reasons), we put, for any terms r and t:
(i) (r+t) =df +rt,
(ii) (r×t) =df ×rt.
In using this metalinguistic notation, brackets are required. To prevent
proliferation of brackets, which would impair legibility, we omit brackets
subject to three simple conventions. First, the Greek cross ‘+’ is
deemed to separate more strongly than the St Andrew cross ‘×’.
Second, of any two occurrences of ‘+’ (or of ‘×’) enclosed within the
same pairs of brackets, the one further to the left is deemed to
separate more strongly. Third, we do not omit any pair of brackets
whose left member comes immediately after an occurrence of ‘s’;
hence, when restoring brackets, no new left bracket should be placed
immediately after an ‘s’. For example,
s0+ss0×s0×sss0+0 = s0+ss0×(s0×sss0)+0
= s0+[ss0×(s0×sss0)]+0
= s0+{[ss0×(s0×sss0)]+0} = {s0+{[ss0×(s0×sss0)]+0}}.
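
The three conventions can be checked mechanically. The following Python sketch, whose function names are our own, reads a variable-free expression in the abbreviated infix notation and prints the official term in prefix form. It assumes that the only atomic term is 0 and that all three kinds of brackets are interchangeable.

```python
def to_prefix(expr):
    """Translate abbreviated infix notation into the official prefix term."""
    tokens = [c for c in expr if not c.isspace()]
    closers = {"(": ")", "[": "]", "{": "}"}
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_sum():
        # The leftmost '+' separates most strongly: r + (everything else).
        left = parse_product()
        if peek() == "+":
            take()
            return "+" + left + parse_sum()
        return left

    def parse_product():
        # '×' separates less strongly than '+'; again the leftmost one wins.
        left = parse_unary()
        if peek() == "×":
            take()
            return "×" + left + parse_product()
        return left

    def parse_unary():
        tok = take()
        if tok == "s":        # no new bracket may be opened right after 's'
            return "s" + parse_unary()
        if tok == "0":
            return "0"
        if tok in closers:
            inner = parse_sum()
            if take() != closers[tok]:
                raise ValueError("mismatched brackets")
            return inner
        raise ValueError("unexpected symbol: " + tok)

    result = parse_sum()
    if pos != len(tokens):
        raise ValueError("unexpected symbol: " + tokens[pos])
    return result

abbreviated = to_prefix("s0+ss0×s0×sss0+0")
fully_bracketed = to_prefix("{s0+{[ss0×(s0×sss0)]+0}}")
```

Both calls yield the same prefix term, in agreement with the chain of equalities in the example.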

1.6. Definition
Proceeding by induction, we define, for each natural number k, an
ℒ-term s_k, called the k-th ℒ-numeral:

s_0 = 0, s_{k+1} = s s_k.

Thus s_k is the ℒ-term consisting of a single occurrence of 0 preceded
by k occurrences of s.

1.7. Recapitulation
Applying Def. 8.4.2 to our present language ℒ, we see that an
ℒ-interpretation (a.k.a. ℒ-structure) 𝔘 is completely determined by
the following ingredients.

(i) A non-empty set U, the domain of 𝔘.
(ii) An individual 0^𝔘 ∈ U, the individual denoted by 0 under the
interpretation 𝔘.
(iii) A unary operation s^𝔘 on U, the operation that interprets s under
𝔘.
(iv) Two binary operations +^𝔘 and ×^𝔘 on U, the operations that
interpret + and × respectively under 𝔘.

Apart from the conditions we have just specified, these ingredients of
an ℒ-interpretation can be quite arbitrary. Thus U can be a set of any
cardinality whatsoever, so long as it is non-empty; the nature of the
individuals (members of U) is immaterial; and 0^𝔘 can be any member
of U. Similarly, s^𝔘 can be an arbitrary unary operation on U; and +^𝔘
and ×^𝔘 can be arbitrary binary operations on U.
However, of the huge variety of possible ℒ-interpretations we single
out one, for which the language ℒ was designed in the first place.

1.8. Definition
The intended or standard ℒ-interpretation 𝔑 is characterized as follows:

(i) 𝔑 has as its domain the set N of natural numbers.
(ii) 0^𝔑 = 0 (the number zero).
(iii) s^𝔑 = s, the successor function (that is, sx = x + 1 for each
number x).
(iv) +^𝔑 = + and ×^𝔑 = × (the operations of natural-number addition
and multiplication, respectively).
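
The ingredients of an interpretation, and the standard one in particular, can be sketched in Python as follows. This is an illustration under stated assumptions: the class name Structure, the tuple encoding of terms and the function value are all hypothetical, and the domain is left implicit rather than represented as a set.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class Structure:
    """The four ingredients that interpret 0, s, + and the cross."""
    zero: Any
    succ: Callable[[Any], Any]
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# The standard interpretation: numbers with successor, addition, multiplication.
STANDARD = Structure(
    zero=0,
    succ=lambda x: x + 1,
    add=lambda x, y: x + y,
    mul=lambda x, y: x * y,
)


def value(term, structure: Structure, valuation: Optional[Dict[str, Any]] = None):
    """Value of a term under a valuation based on the given structure.

    Assumed term encoding: "0", a variable name, ("s", t),
    ("+", r, t) or ("*", r, t).
    """
    valuation = valuation or {}
    if term == "0":
        return structure.zero
    if isinstance(term, str):
        return valuation[term]          # a variable
    op = term[0]
    if op == "s":
        return structure.succ(value(term[1], structure, valuation))
    left = value(term[1], structure, valuation)
    right = value(term[2], structure, valuation)
    if op == "+":
        return structure.add(left, right)
    if op == "*":
        return structure.mul(left, right)
    raise ValueError(f"unknown operation symbol: {op!r}")


two = ("s", ("s", "0"))
print(value(("+", ("s", "0"), two), STANDARD))
```

Passing a different Structure to value changes what the same term denotes, which is the point of separating the term from its interpretation.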

1.9. Definition

(i) If t is a closed ℒ-term, we call t^𝔑 the numerical value of t (cf.
Def. 8.5.6).
(ii) We say that an ℒ-sentence φ is true or false according as 𝔑 ⊨ φ or
𝔑 ⊭ φ (cf. Def. 8.5.10).

1.10. Remarks

(i) We have chosen the syntactic constants ‘0’, ‘s’, ‘+’ and ‘×’
advisedly, so as to serve a mnemonic purpose: each of these
symbols graphically suggests the standard interpretation of the
ℒ-symbol that it denotes. This punning mnemonic role of the
four syntactic constants is made manifest in clauses (ii), (iii) and
(iv) of Def. 1.8. For example, ‘0’ has been chosen as the name (in
our metalanguage) for the individual constant of ℒ. The shape (if
any!) of the latter constant is left unspecified, but under the
standard interpretation of ℒ it is treated as a name of the number
zero, that number which is conventionally denoted by the numeral
‘0’. Since ‘0’ was chosen for its present role precisely because
it looks like ‘0’, we have a mnemonically useful pun: 0^𝔑 = 0.
A similar mnemonic purpose is served by the choice of ‘=’ as
the syntactic constant denoting the equality symbol of ℒ, except
that in this case the pun is not confined to the standard
interpretation. Indeed, by Def. 8.4.2(iii), under any ℒ-interpretation 𝔘
the equality symbol of ℒ is interpreted as denoting the identity
relation on the domain U of 𝔘. As a result, we have (as part of
clause F1 of the BSD) the mnemonically useful pun:

(r=t)^σ = ⊤ iff r^σ = t^σ,

for any ℒ-valuation σ and any ℒ-terms r and t.
(ii) A practical advantage of the choice of ‘0’, ‘s’, ‘+’ and ‘×’ is that
when we refer to an ℒ-term by means of this metalinguistic
notation, it is often quite easy to work out by inspection the value
of that term under any valuation based on 𝔑. (This value must be
a number, because the domain of 𝔑 is the set N of numbers.)
For example, consider the term x×x + ss0×x×y + y×y, where
x and y are variables. If σ is a valuation based on 𝔑, it is easy to
see that (x×x + ss0×x×y + y×y)^σ = x² + 2xy + y², where x and
y are the numbers x^σ and y^σ respectively.
In particular, if t is a closed term, it is a simple matter to work
out the numerical value t^𝔑 of t.
Similarly, when we refer to an ℒ-formula by means of our
metalinguistic notation, it is often quite easy to work out by
inspection the truth value of that formula under any valuation
based on 𝔑. In particular, if φ is an ℒ-sentence it may be quite
easy to work out by inspection whether 𝔑 ⊨ φ, that is, whether φ
is true. For example, it is not difficult to verify that

𝔑 ⊨ ∀x∀y[(x+y)×(x+y)=x×x + ss0×x×y + y×y].
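
The ‘deformalized’ identity can be spot-checked numerically. The Python sketch below is only a sanity check on a finite grid, not a proof of the universally quantified sentence; the function names are hypothetical.

```python
def succ(n: int) -> int:
    """Standard interpretation of s."""
    return n + 1


def lhs(x: int, y: int) -> int:
    """Value of (x+y) times (x+y) under the standard interpretation."""
    return (x + y) * (x + y)


def rhs(x: int, y: int) -> int:
    """Value of x*x + ss0*x*y + y*y; the closed term ss0 has value 2."""
    return x * x + succ(succ(0)) * x * y + y * y


def holds_up_to(bound: int) -> bool:
    """Check the equation for all x, y below the bound."""
    return all(lhs(x, y) == rhs(x, y)
               for x in range(bound) for y in range(bound))


print(holds_up_to(25))
```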

1.11. Warning
Beware, however, of being deceived by this suggestive notation: Rem.
1.10(ii) works for the standard interpretation, but not necessarily for
other interpretations. Thus, for example, you must not assume that 0
always denotes the number 0. Rather, under an arbitrary
ℒ-interpretation 𝔘, the object 0^𝔘 denoted by 0 need not be a number at all, let
alone the number 0; in fact, it can be any object whatsoever.
Or, to take another simple example, you must not assume that the
sentence 0+0=0 is true under an arbitrary ℒ-interpretation. Of
course, this sentence is easily seen to be true in the sense of Def.
1.9(ii). It is clearly satisfied in the standard structure 𝔑. But it is not
logically true: if σ is a valuation based on an arbitrary interpretation
𝔘, then we find (using the BSD) that (0+0=0)^σ = ⊤ iff f(a, a) = a,
where f = +^𝔘 and a = 0^𝔘 (that is, f and a are the binary operation and
individual named by + and 0 respectively under 𝔘). It is quite possible
that f(a, a) ≠ a, in which case 𝔘 ⊭ 0+0=0.
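
To make the warning concrete, here is a small Python sketch of one interpretation in which 0+0=0 fails. The two-element domain {"a", "b"} and the constant operation are hypothetical choices made purely for illustration.

```python
def zero_plus_zero_is_zero(zero, add) -> bool:
    """Truth value of the sentence 0+0=0, given the interpretations of 0 and +."""
    return add(zero, zero) == zero


# Standard interpretation: 0 denotes the number zero, + denotes addition.
in_standard = zero_plus_zero_is_zero(0, lambda x, y: x + y)

# A hypothetical interpretation with domain {"a", "b"}: 0 denotes "a",
# and + denotes the operation that always returns "b".
in_odd_structure = zero_plus_zero_is_zero("a", lambda x, y: "b")

print(in_standard, in_odd_structure)
```

Here f(a, a) = "b" ≠ "a", so the sentence is false in the second interpretation although it is true in the standard one.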

1.12. Problem
Show that s_k^𝔑 = k (see Def. 1.6).

1.13. Problem
Let x, y and z be distinct variables. Let σ be a valuation based on 𝔑
and let x and y be the numbers x^σ and y^σ respectively. For each of the
following five formulas state a condition involving x and y, which is
necessary and sufficient in order that σ satisfy the formula in question.

(i) ∃z(x+z=y),
(ii) ∃z(x+sz=y),
(iii) ∀y(x≠sy),
(iv) ∃y(x=s_2×y),
(v) ∃z(x=y×z).

§2. Theories
2.1. Definition
For any number n, we let Φ_n be the set of all ℒ-formulas whose free
variables are among v_1, v_2, ..., v_n, the first n variables of ℒ in
alphabetic order (cf. Spec. 8.1.1(i)). In particular, Φ_0 is the set of all
ℒ-sentences.

2.2. Remark
If φ ∈ Φ_n, it does not follow that all the variables v_1, v_2, ..., v_n must
be free in φ; but only that no other variables are free in φ. Hence
Φ_n ⊆ Φ_{n+1} for all n.

2.3. Definition

(i) If Γ is any set of sentences (that is, Γ ⊆ Φ_0) we put

DcΓ =df {φ ∈ Φ_0 : Γ ⊢ φ}.

DcΓ is called the deductive closure of Γ.
(ii) We put Λ =df Dc∅.

2.4. Remarks
By definition, DcΓ is the set of all sentences that can be deduced from
Γ in Fopcal. However, by the soundness and completeness of Fopcal
(Thms. 8.9.14 and 8.13.10), DcΓ is also the set of all sentences that are
logical consequences of Γ; in particular Λ is the set of all logically true
sentences (cf. Def. 8.4.10). ‘Λ’ is mnemonic for ‘logic’.

2.5. Definition
An ℒ-theory is a set Σ ⊆ Φ_0 such that Σ = DcΣ; in other words, it is a
set of ℒ-sentences closed (or saturated) under deducibility of
ℒ-sentences.

2.6. Problem
If Γ is any set of sentences, show that DcΓ is a theory that includes
Γ itself. Moreover, DcΓ is the smallest such theory: if Σ is any theory
that includes Γ, then DcΓ ⊆ Σ.

2.7. Definition
If Σ is a theory, then a postulate set for Σ is any set Γ of sentences such
that Σ = DcΓ.

2.8. Remark
The ideas we have just introduced may be applied in two mutually
converse ways. In some cases we start with a given set Γ of sentences
as postulates, and wish to investigate the resulting theory DcΓ. In
other cases we start with a given theory Σ and wish to find a set of
postulates for it that has some desirable property. (Of course, by Defs.
2.5 and 2.7 every theory is a postulate set for itself; but the point is to
find a simpler set.)

2.9. Examples

(i) Consider Λ = Dc∅. By Prob. 2.6, Λ is a theory; moreover, it is
the smallest theory, in the sense that it is included in every
theory.
(ii) The set Φ_0 of all sentences is evidently a theory. Moreover, it is
the largest theory, in the sense that it includes every theory.
Clearly, Φ_0 is inconsistent. Moreover, it is the only inconsistent
theory. Indeed, if Σ is an inconsistent theory, then for every
sentence φ we have Σ ⊢ φ by IE, hence φ ∈ Σ because Σ is a
theory. So Σ must be Φ_0.

2.10. Definition
For any ℒ-structure 𝔘 we put
Th𝔘 =df {φ ∈ Φ_0 : 𝔘 ⊨ φ}.

Th𝔘 is called the theory of 𝔘; it is the set of all sentences that hold in
𝔘.

2.11. Remark

It is easy to see that Th𝔘 is indeed a theory in the sense of Def. 2.5: if
ψ is a sentence such that Th𝔘 ⊢ ψ then, by the soundness of Fopcal,
𝔘 ⊨ ψ; therefore ψ ∈ Th𝔘.

2.12. Definition
A theory Σ is complete if it is consistent, and for any sentence φ either
φ ∈ Σ or ¬φ ∈ Σ.

2.13. Problem

(i) Show that a consistent theory Σ is complete iff it is maximal
among consistent theories, that is, it is not included in any other
consistent theory.
(ii) Show that, for any ℒ-structure 𝔘, Th𝔘 is a complete theory.
(iii) Show that any consistent theory is included in a complete theory.
(iv) Show that any complete theory is of the form Th𝔘 for some 𝔘.

2.14. Definition
(i) We put
Ω =df Th𝔑.

The theory Ω, consisting of all true sentences (in the sense of
Def. 1.9(ii)) is called complete first-order arithmetic.
(ii) A set of sentences (and, in particular, a theory) is said to be
sound if it is included in Ω; in other words, if all the sentences
belonging to it are true.

2.15. Remarks
(i) By Prob. 2.13(ii), Ω is indeed a complete theory. By Def. 2.14,
Ω is a sound theory. In fact, Ω is the only complete sound
theory. Indeed, if Σ is sound, then Σ ⊆ Ω; but if Σ is also a
complete theory then by Prob. 2.13(i) it cannot be included in
another consistent theory, so Σ must coincide with Ω.
(ii) Ω can be regarded as the whole truth about 𝔑 in ℒ, in the sense
that it consists of all ℒ-sentences that are true in 𝔑. But is it
really the whole truth about 𝔑? We shall address this question in
the next section.

§3. Skolem’s Theorem


3.1. Preview
In this section we show that 𝔑 cannot be uniquely characterized in ℒ:
even Ω, the whole truth about 𝔑 in ℒ, is not sufficient to single out
𝔑, because Ω has, apart from 𝔑 itself, other models that are not
isomorphic to 𝔑.

3.2. Convention
We shall often wish to consider the standard structure 𝔑 alongside
some ℒ-structure, which may or may not be the standard one. In such
cases it will be convenient to denote the latter structure by ‘*𝔑’.
Whenever we use this notation, we shall take it for granted that
(i) *N is the domain of *𝔑,
(ii) *0 is 0^{*𝔑} (the designated individual of *𝔑),
(iii) *s is s^{*𝔑} (the basic unary operation of *𝔑),
(iv) *+ and *× are +^{*𝔑} and ×^{*𝔑} respectively (the basic binary
operations of *𝔑).
The prefix ‘*’ is pronounced as ‘pseudo’.

3.3. Remark

The purpose of this convention is to stress both the similarities and
dissimilarities (if any) between 𝔑 and *𝔑.

3.4. Definition
(i) An embedding of the structure 𝔑 in the structure *𝔑 is an
injection f from N to *N (that is, a 1-1 mapping from N into *N)
such that

(*)  f0 = *0,  f(m + 1) = *s(fm),
     f(m + n) = fm *+ fn,  f(mn) = fm *× fn,

for all numbers m and n.
(ii) If, in addition, f is a surjection from N to *N (that is, f maps N
onto *N) then f is called an isomorphism between 𝔑 and *𝔑, and
the two structures are said to be isomorphic to each other.

3.5. Remarks
(i) If f is an isomorphism between 𝔑 and *𝔑, then *𝔑 is an exact
replica of 𝔑: each number n has a unique counterpart fn and
each individual of *𝔑 is the counterpart of a unique number; and,
moreover, by (*) the basic operations on numbers are exactly
mimicked by the corresponding basic operations on their
counterparts. The two structures are structurally indistinguishable.
For this reason we shall from now on refer not just to 𝔑 itself
but also to any ℒ-structure isomorphic to it as the standard
structure.
(ii) If f is merely an embedding of 𝔑 in *𝔑, then this means that *𝔑
has a substructure isomorphic to 𝔑.

3.6. Problem
Let f be an embedding of 𝔑 in *𝔑. For any valuation σ based on
𝔑, we define fσ as the valuation based on *𝔑 such that, for each
variable y,

y^{fσ} = f(y^σ).
(i) Show that t^{fσ} = f(t^σ) for any term t. Hence, in particular, if t is a
closed term it follows that t^{*𝔑} = f(t^𝔑). (Use induction on deg t,
distinguishing the five cases mentioned in Rem. 1.4(iii). Note that
the fact that f is injective need not be used in the proof.)
(ii) Show that f[σ(x/n)] = (fσ)(x/fn), where x is any variable and n
is any number.
(iii) Show that if f is an isomorphism between 𝔑 and *𝔑 then
α^{fσ} = α^σ for any formula α. In particular, *𝔑 ⊨ φ iff 𝔑 ⊨ φ for any
sentence φ.

3.7. Remark
By Def. 2.14, 𝔑 ⊨ φ iff φ ∈ Ω; thus 𝔑 is a model for Ω (see Def.
8.5.10). From Prob. 3.6(iii) it follows that any structure *𝔑 isomorphic
to 𝔑 is likewise a model for Ω. This is hardly surprising, since such *𝔑
is a carbon copy of 𝔑. The surprising fact, which will be proved next, is
that not all models for Ω are standard.

3.8. Theorem (Skolem, 1934)


There exists a nonstandard model for Ω, that is, a model for Ω that is
not isomorphic to 𝔑. Moreover, there is such a model whose domain is
denumerable.

PROOF

Choose any variable x, and for each number n let φ_n be the formula
x≠s_n. Now consider the following set of formulas:
Φ = Ω ∪ {φ_n : n ∈ N}.

We claim that Φ is satisfiable. By the Compactness Thm. 8.13.12, this
claim will be proved if we show that every finite subset of Φ is
satisfiable.
So let Φ′ be any finite subset of Φ. Clearly, Φ′ can only contain a
finite number of formulas φ_n; hence Φ′ is included in the set
Ω ∪ {φ_n : n < p}, provided p is sufficiently large. So in order to show
that Φ′ is satisfiable, we need only show that Ω ∪ {φ_n : n < p} is
satisfiable. However, the latter set is satisfied by any valuation σ based
on 𝔑, provided x^σ = p. Indeed, since σ is based on 𝔑, it satisfies Ω.
Furthermore, s_n^σ = n (see Prob. 1.12); hence if x^σ = p then σ also
satisfies the formulas φ_n (that is, x≠s_n) for every n < p.
We have thus proved our claim that Φ is satisfiable. Let τ be a
valuation that satisfies Φ and let *𝔑 be its underlying structure. *𝔑 is a
model for Ω, because τ satisfies Φ, which includes Ω.
As the language ℒ is denumerable, it follows from the
Löwenheim-Skolem Thm. 8.13.13 that we may take the domain *N of *𝔑 to be
countable (that is, finite or denumerable). However, *N cannot be
finite, because Ω contains the sentences s_m≠s_n for all pairs of distinct
numbers m and n, and therefore all these sentences must be satisfied
in *𝔑, which can only happen if *N is infinite. Thus *N is
denumerable.
It remains to show that *𝔑 is nonstandard; in other words, that it is
not isomorphic to 𝔑. Suppose f is an embedding of 𝔑 in *𝔑. We shall
prove that f cannot be surjective (that is, cannot map N onto *N).
Indeed, for each number n our valuation τ satisfies the formula φ_n,
that is, x≠s_n. Hence (by the BSD) we must have

x^τ ≠ s_n^τ for every number n.

However, by Probs. 3.6(i) and 1.12 we have

s_n^τ = s_n^{*𝔑} = f(s_n^𝔑) = fn.

Thus x^τ, which must belong to *N, the universe of τ, cannot be fn
for any number n. This shows that f is not surjective. ∎
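
The finite-satisfiability step of the proof can be illustrated numerically. The Python sketch below (function names hypothetical) works only in the standard structure, where the numeral s_n has value n: any finite set of the formulas x≠s_n is satisfied by a sufficiently large value of x, whereas no single number satisfies all of them at once. It illustrates why compactness is needed; it does not construct a nonstandard model.

```python
def witness(indices) -> int:
    """Return a number p that differs from every n in a finite set of indices."""
    return max(indices, default=-1) + 1


def satisfies_all(p: int, indices) -> bool:
    """Does the assignment x = p satisfy x != s_n for every n in indices?

    In the standard structure the numeral s_n has value n.
    """
    return all(p != n for n in indices)


finite_subset = {0, 3, 17}
p = witness(finite_subset)
print(p, satisfies_all(p, finite_subset))

# No number p satisfies x != s_n for ALL n: the formula with n = p fails.
print(any(satisfies_all(q, range(q + 1)) for q in range(100)))
```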

3.9. Problem
Let *𝔑 be any model for Ω. Let f be the mapping from N to *N
defined by:

fn = s_n^{*𝔑} for all n.

(i) Show that f is injective. (If m≠n then s_m≠s_n is in Ω and so
must hold in *𝔑.) Prove:
(ii) f is an embedding of 𝔑 in *𝔑.
(iii) f is the only embedding of 𝔑 in *𝔑. (Use Prob. 3.6(i).)
(iv) Hence *𝔑 is a standard model of Ω iff *N = {s_n^{*𝔑} : n ∈ N}.

3.10. Remark
Skolem’s Theorem means that the whole truth about 𝔑 cannot be
expressed in ℒ. As we have noted, Ω is all that can be said in ℒ about
𝔑; but Ω fails to pin 𝔑 down uniquely (even up to isomorphism). At
first sight it may seem that this is perhaps due to some accidental defect of
ℒ. Can ℒ perhaps be enriched (and 𝔑 correspondingly elaborated) so
that in the richer formal language the correspondingly more elaborate
structure of natural numbers may be characterized uniquely up to
isomorphism? For a discussion of this question, and a pessimistic
answer, see B&M, pp. 320-324. We shall return to this issue in the
Appendix.

§4. Representability
4.1. Preview
This section is devoted to defining new concepts rather than to proving
major results. We shall introduce two ways in which a relation on N
may be formally expressed or represented in a theory Σ.

4.2. Reminder
We recall some of the conventions introduced in Ch. 9. Lower-case
German letters ‘𝔞’, ‘𝔟’, ‘𝔵’ and ‘𝔶’ are used as informal variables ranging
over the set N^n of all n-tuples of numbers. Where a German letter is
used for an n-tuple, the corresponding italic letter is used for the
components of that n-tuple. Thus, for example, 𝔞 = (a_1, a_2, ..., a_n)
and 𝔵 = (x_1, x_2, ..., x_n).
Note that the number of components of a tuple denoted by a
German letter is always assumed to be n (rather than k or m etc.).
Recall that by relation we mean relation on N. If P is an n-ary
relation, we usually write, for example, ‘P𝔞’ as short for ‘𝔞 ∈ P’.

4.3. Remark
The symbols ‘𝔞’ and ‘𝔵’ do not refer to, or have anything to do with,
the formal language ℒ; they are ordinary mathematical symbols used
as variables in our own language.

4.4. Definition (abbreviated notation for substitution)


For any terms r, t_1, t_2, ..., t_n and any formula α we put

(i) r(t_1, t_2, ..., t_n) =df r(v_1/t_1, v_2/t_2, ..., v_n/t_n),
(ii) α(t_1, t_2, ..., t_n) =df α(v_1/t_1, v_2/t_2, ..., v_n/t_n).

4.5. Remarks

(i) Here the terms t_1, t_2, ..., t_n are substituted simultaneously for
all free occurrences of v_1, v_2, ..., v_n respectively (the first n
variables in alphabetic order; cf. Spec. 8.1.1(i)). So, for example,
‘α(t)’ is short for ‘α(v_1/t)’. If t is to be substituted for a variable x
other than v_1, we cannot use the abbreviated notation but have to
write ‘α(x/t)’ in full.
(ii) When substituting several terms in a formula, as in Def. 4.4(ii),
alphabetic changes of bound variables may be necessary in order
to prevent capture. Also, it is important that the terms are
substituted simultaneously rather than successively. (For a
detailed precise treatment of the technicalities involved in
simultaneous substitution, see B&M, pp. 65-67.) However, in many
cases when the abbreviated notation is used below, the terms that
are substituted will be closed terms; so no changes of bound
variables will be required. In such cases it is also unimportant
whether the substitution is made simultaneously or successively.

Next, for the case where the terms to be substituted for the variables
v_1, v_2, ..., v_n are numerals, we introduce a further useful
abbreviation which slightly stretches the use of lower-case German letters.

4.6. Definition
For any term r, any formula α and any 𝔞 ∈ N^n, we put

(i) r(s_𝔞) =df r(s_{a_1}, s_{a_2}, ..., s_{a_n}),

(ii) α(s_𝔞) =df α(s_{a_1}, s_{a_2}, ..., s_{a_n}).

Thus, α(s_𝔞) is obtained from α by substituting the a_i-th numeral for all
free occurrences of v_i, where i = 1, 2, ..., n.

If α ∈ Φ_n then, for any 𝔞 ∈ N^n, α(s_𝔞) is a sentence. If Σ is a theory,
it makes sense to enquire whether the sentence α(s_𝔞) belongs to Σ;
similarly, we may enquire whether its negation, the sentence ¬α(s_𝔞),
belongs to that theory. This gives rise to the following important
definition.

4.7. Definition
Let P be any n-ary relation and let Σ be a theory.

(i) A formula α ∈ Φ_n represents P weakly in Σ if, for all 𝔵 ∈ N^n,

P𝔵 ⇔ α(s_𝔵) ∈ Σ.

P is weakly representable in Σ if it is weakly represented in Σ by
some α ∈ Φ_n.
(ii) A formula α ∈ Φ_n represents P strongly in Σ if, for all 𝔵 ∈ N^n,

P𝔵 ⇒ α(s_𝔵) ∈ Σ,  ¬P𝔵 ⇒ ¬α(s_𝔵) ∈ Σ.

P is strongly representable in Σ if it is strongly represented in Σ by
some α ∈ Φ_n.

4.8. Remarks

(i) Recall that ¬ is the (informal) negation operation on relations;
thus ¬P𝔵 holds iff P𝔵 does not.
(ii) Use of the adverbs ‘weakly’ and ‘strongly’ is justified because, for
a consistent theory, weak representation follows from strong
representation: if α represents P strongly in Σ and P𝔵 does not
hold, then ¬α(s_𝔵) ∈ Σ; and, provided Σ is consistent, it follows
that α(s_𝔵) ∉ Σ. Thus α also represents P weakly in Σ.
If Σ is the inconsistent theory, the above argument fails. Weak
and strong representability in this theory are, however, trivial
notions. (See Prob. 4.9.)
(iii) For any α ∈ Φ_n and any theory Σ, there is always a unique n-ary
relation P that is weakly represented by α in Σ, because Def.
4.7(i) determines such P uniquely.
On the other hand, α may not represent any relation strongly
in Σ, because for some 𝔵 it may happen that neither α(s_𝔵) ∈ Σ nor
¬α(s_𝔵) ∈ Σ.
(iv) However, if Σ is a complete theory (cf. Def. 2.12) then
¬α(s_𝔵) ∈ Σ iff α(s_𝔵) ∉ Σ; so in this case strong representation is
equivalent to weak representation. In other words, in a complete
theory any α ∈ Φ_n represents a unique n-ary relation both
weakly and strongly. In connection with a complete theory we
shall therefore omit these qualifications and say simply that a
given formula represents the relation.

4.9. Problem

Let α ∈ Φ_n, where n > 0. Determine the n-ary relations that α
represents weakly/strongly in the inconsistent theory.

§5. Arithmeticity
5.1. Preview
In this section we investigate an important class of relations: those
representable in complete first-order arithmetic, Ω. In view of Rem.
4.8(iv), in the present context we need not distinguish between weak
and strong representation, so we say simply that a given formula
represents a relation in Ω.

5.2. Definition
A relation is arithmetical if it is representable in Ω.

5.3. Remark

Thus by Def. 4.7, an n-ary relation P is arithmetical iff there is a
representing formula α ∈ Φ_n such that

(*) P𝔵 ⇔ α(s_𝔵) ∈ Ω

for all 𝔵 ∈ N^n.
And since by Def. 2.14(i) Ω = Th𝔑, condition (*) is tantamount to:
(**) P𝔵 ⇔ 𝔑 ⊨ α(s_𝔵).

5.4. Definition
Let α ∈ Φ_n and 𝔞 ∈ N^n. If α is satisfied by some valuation σ based on
𝔑 such that v_i^σ = a_i for i = 1, 2, ..., n, we write:

‘𝔑 ⊨ α[𝔞]’.

5.5. Remarks
(i) If 𝔑 ⊨ α[𝔞], then by Thm. 8.5.8 α is satisfied by every valuation σ
based on 𝔑 such that v_i^σ = a_i for i = 1, 2, ..., n.
(ii) Def. 5.4 is a contextual definition: it defines the whole expression
‘𝔑 ⊨ α[𝔞]’ as a package. The part ‘α[𝔞]’ of this package has no
meaning on its own: it does not denote anything whatsoever. In
particular, ‘α[𝔞]’ must not be confused with ‘α(s_𝔞)’, which does
have meaning on its own: it denotes the ℒ-sentence obtained
from α by substituting the n-tuple of numerals s_𝔞 for the first n
variables of ℒ. However:

5.6. Lemma
Let α ∈ Φ_n and 𝔞 ∈ N^n. Then 𝔑 ⊨ α(s_𝔞) iff 𝔑 ⊨ α[𝔞].

PROOF

We consider in detail the case n = 1. In this case α has no free variable
other than v_1, and we must show, for any number a, that 𝔑 ⊨ α(s_a) iff
𝔑 ⊨ α[a]. Here goes:

𝔑 ⊨ α(s_a) ⇔ α(s_a)^σ = ⊤ for some valuation σ based on 𝔑
by Def. 8.5.10,
⇔ α^{σ(v_1/a)} = ⊤ by Prob. 8.6.16, since s_a^σ = a,
⇔ 𝔑 ⊨ α[a] by Def. 5.4.

The general case, for arbitrary n, is treated similarly. Of course, it
utilizes the generalization of Eqn. 8.6.6 to the case of simultaneous
substitution of n terms. (See B&M, p. 65.) ∎

5.7. Remark
From this lemma it now follows that conditions (*) and (**) of Rem.
5.3 are equivalent to

(***) P𝔵 ⇔ 𝔑 ⊨ α[𝔵].

5.8. Examples
Because condition (***) refers to the standard interpretation, it is
always straightforward to work out the n-ary relation represented in Ω
by a given α ∈ Φ_n. All that we need to do is to ‘deformalize’ α by
‘translating’ it from ℒ into the metalanguage (see Rem. 1.10).

(i) Consider the formula v_1+v_3=v_2. It belongs to Φ_3 and hence
represents in Ω a ternary relation P. Moreover, P is evidently
the relation determined by

P(x_1, x_2, x_3) ⇔ x_1 + x_3 = x_2.

Equivalently, P = {(x_1, x_2, x_3) ∈ N³ : x_1 + x_3 = x_2}.
Note that our formula also belongs to Φ_4 (as well as to Φ_n for
any n ≥ 3). So it represents in Ω a quaternary relation Q, which
is given by

Q(x_1, x_2, x_3, x_4) ⇔ x_1 + x_3 = x_2,

or

Q = {(x_1, x_2, x_3, x_4) ∈ N⁴ : x_1 + x_3 = x_2}.

Of course, Q does not depend on its fourth argument; but it is
nevertheless a quaternary relation!
(ii) Next, consider the formula ∃v_3(v_1+v_3=v_2). It belongs to Φ_2 and
therefore represents in Ω a binary relation R. By direct
‘deformalization’ we see at once that R is given by

R(x_1, x_2) ⇔ ∃x_3(x_1 + x_3 = x_2).

It does not require much knowledge of arithmetic to realize that
R is the relation ≤; more explicitly:

R(x_1, x_2) ⇔ x_1 ≤ x_2, or R = {(x_1, x_2) ∈ N² : x_1 ≤ x_2}.

This example should look familiar; it is of course Prob. 1.13(i) in
a slightly different guise.
(iii) Now consider the formula ∀v_2(v_1≠sv_2). This belongs to Φ_1 and
therefore represents in Ω a property S. By direct
‘deformalization’ we see:

Sx_1 ⇔ ∀x_2(x_1 ≠ x_2 + 1),

and, using a tiny bit of knowledge of arithmetic, we realize that
Sx_1 ⇔ x_1 = 0, so that S = {0}. Of course, S is also represented in
Ω by other formulas, for example v_1=0.
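
The ‘deformalized’ conditions in (ii) and (iii) can be checked mechanically on an initial segment of N. In the Python sketch below (function names hypothetical) the quantifiers are replaced by bounded searches; this is an assumption that happens to be harmless here, because a witness x_3 with x_1 + x_3 = x_2 can never exceed x_2, and x_2 + 1 = x_1 forces x_2 < x_1.

```python
def R(x1: int, x2: int) -> bool:
    """Deformalization of (ii): there is x3 with x1 + x3 = x2.

    Searching x3 up to x2 suffices, since any witness satisfies x3 <= x2.
    """
    return any(x1 + x3 == x2 for x3 in range(x2 + 1))


def S(x1: int) -> bool:
    """Deformalization of (iii): x1 != x2 + 1 for every x2.

    Only x2 < x1 can violate the condition, so a bounded check suffices.
    """
    return all(x1 != x2 + 1 for x2 in range(x1))


print(all(R(a, b) == (a <= b) for a in range(20) for b in range(20)))
print([n for n in range(10) if S(n)])
```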

5.9. Lemma
If the equation r=t belongs to Φ_n, then it represents in Ω an elementary
n-ary relation. Conversely, every elementary relation is represented in Ω
by an equation.

PROOF

First, suppose that r=t belongs to Φ_n. This simply means that every
variable occurring in r or t is among v_1, v_2, ..., v_n. In addition to
variables, r and t may contain occurrences of 0, s, + and ×.
Let P be the n-ary relation represented by this equation in Ω. To
determine P we use the process of ‘deformalization’ illustrated in Ex.
5.8. We get, for all 𝔵 ∈ N^n:
(*) P𝔵 ⇔ f𝔵 = g𝔵,

where f𝔵 and g𝔵 are obtained from r and t respectively in the obvious
way: each v_i is ‘translated’ as ‘x_i’, 0 is ‘translated’ as ‘0’, and so on.
Thus f𝔵 and g𝔵 are given by expressions (in our metalanguage) made
up of variables ‘x_1’, ‘x_2’, ..., ‘x_n’, numerals ‘0’ and ‘1’ (the latter comes
from translating the symbol s of ℒ) and operation symbols ‘+’ and ‘×’.
Simplifying these expressions by the rules of elementary algebra, we
see that f𝔵 and g𝔵 are polynomials; hence P is elementary (cf. Defs.
9.5.2 and 9.5.3).
Conversely, suppose that P is an n-ary elementary relation. Then P
satisfies an equivalence of the form (*), where f𝔵 and g𝔵 are
polynomials. To obtain an ℒ-formula that represents P in Ω, all we have to
do is to formalize the equation f𝔵 = g𝔵, translating it in the obvious
way into ℒ. We get an equation r=t that represents P in Ω. ∎

5.10. Warning
Not every formula that represents in Ω an elementary relation is an
equation. What we have shown is that among the (infinitely many)
formulas representing in Ω a given elementary relation there must be
an equation.

5.11. Theorem
The following two conditions are equivalent:

(i) P is an arithmetical relation;


(ii) P can be obtained from elementary relations by a finite number of
applications of logical operations.

PROOF

(i) ⇒ (ii). Let P be an n-ary arithmetical relation. Then P is
represented in Ω by some formula α ∈ Φ_n. We shall show by induction on
deg α that (ii) holds.

Case 1: α is an equation. Then by Lemma 5.9 P is itself elementary, so
(ii) clearly holds.

Case 2: α = ¬β. Let Q be the n-ary relation represented in Ω by β.
Then it is easy to see that P = ¬Q. By the induction hypothesis, Q is
obtainable from elementary relations by a finite number of applications
of logical operations. Since P is obtained from Q by an application of
¬, it is clear that (ii) holds.

Case 3: α = β→γ. Let Q and R be the n-ary relations represented in
Ω by β and γ respectively. Then it is easy to see that P = Q → R =
¬Q ∨ R. By the induction hypothesis, both Q and R are obtainable
from elementary relations by a finite number of applications of logical
operations. Hence the same holds for P.

Case 4: α = ∀yβ. Without loss of generality, we may assume that y is
v_{n+1} (otherwise, by appropriate alphabetic changes, we can obtain
from α a variant ∀v_{n+1}β′, which is logically equivalent to α, has the
same degree as α and, like α, represents P in Ω). Therefore β ∈ Φ_{n+1},
so β represents in Ω an (n + 1)-ary relation Q. Then clearly P is
obtained from Q by (informal) universal quantification: P𝔵 ⇔
∀yQ(𝔵, y). By the induction hypothesis, Q is obtainable from
elementary relations by a finite number of applications of logical operations.
Hence the same holds for P.
(ii) ⇒ (i). Assume (ii). Then P is obtainable from elementary
relations by a finite number, say k, of applications of the three logical
operations: negation, implication and universal quantification. (The
other logical operations can be reduced to these.) We proceed by
induction on k.

Case 1: P itself is elementary. Then P is arithmetical by Lemma 5.9.

Case 2: P = ¬Q, where Q is obtainable from elementary relations by
k − 1 applications of the three logical operations. By the induction
hypothesis, Q is arithmetical, hence it is represented in Ω by some
formula β. Then P is represented in Ω by the formula ¬β, and is
therefore arithmetical.

Case 3: P = Q → R, where Q and R are each obtainable from
elementary relations by fewer than k applications of the three logical
operations. By the induction hypothesis, Q and R are arithmetical,
hence represented in Ω by formulas β and γ respectively. Then P is
represented in Ω by the formula β→γ, and is therefore arithmetical.

Case 4: P is obtained by universal quantification from an (n + 1)-ary
relation Q:
P𝔵 ⇔ ∀x_{n+1}Q(𝔵, x_{n+1}),
where Q is obtainable from elementary relations by k − 1 applications
of the three logical operations. By the induction hypothesis, Q is
arithmetical, hence represented in Ω by some β ∈ Φ_{n+1}. Then it is easy
to see that P is represented in Ω by the formula ∀v_{n+1}β, and is
therefore arithmetical. ∎

5.12. Remarks
(i) Thm. 5.11 means that the class of arithmetical relations is the
smallest class that contains all elementary relations and is closed
under the logical operations.
(ii) That the proof of Thm. 5.11 was so easy is due in part to the
notation we are using (cf. Warning 9.1.4).

The following corollary is extremely useful.

5.13. Corollary
If P is an n-ary r.e. relation, then it is arithmetical. Moreover, it is
represented in Ω by a formula of the form

∃v_{n+1}∃v_{n+2} ... ∃v_{n+m}(r=t),

where m ≥ 0.

PROOF
By the MRDP Thm. 9.5.4, P is diophantine. This means that P is
obtained from an elementary relation by a finite number of (informal)
existential quantifications. The second half of the proof of Thm. 5.11
shows that P is represented in Ω by a formula having the required
form. ∎

5.14. Remark
Since the formula in Cor. 5.13 must be in Φ_n, all the variables
occurring in r or t must be among v_1, v_2, ..., v_{n+m}.

5.15. Corollary
Every recursive relation is arithmetical.

PROOF

A recursive relation is r.e. by Thm. 9.3.6, hence it is arithmetical by
Cor. 5.13. ∎

5.16. Remark
Since every elementary relation is recursive (see Rem. 9.5.5(i)), it
follows from Rem. 5.12(i) and Cor. 5.15 that the class of arithmetical
relations is the smallest class that contains all recursive relations and is
closed under the logical operations.

5.17. Reminder
In what follows we use the terms function and graph in the same sense
as in Ch. 9: an n-ary function f is an n-ary operation on N; and its graph
is the (n + 1)-ary relation P such that, for all 𝔵 ∈ N^n and all y ∈ N,

P(𝔵, y) ⇔ f𝔵 = y.

5.18. Definition
An arithmetical function is a function whose graph is an arithmetical
relation.

5.19. Theorem
Every recursive function is arithmetical.

PROOF

If f is a recursive function then by Thm. 9.3.10 its graph is r.e., hence
by Cor. 5.13 it is arithmetical. ∎

5.20. Problem
Let P be a k-ary arithmetical relation and let f_1, f_2, ..., f_k be n-ary
arithmetical functions. Let the n-ary relation Q be defined, for all
𝔵 ∈ N^n, by the equivalence
Q𝔵 ⇔ P(f_1𝔵, f_2𝔵, ..., f_k𝔵).

Prove that Q is arithmetical. (Argue as in the proof of Thm. 9.4.6.)

§6. Coding
6.1. Preview
In a natural language we can talk of many things: of shoes and ships
and sealing wax, of cabbages and kings, and of that very language
itself. Can the same thing be done in ℒ, under its standard
interpretation? Can ℒ be used to ‘talk’ of its own expressions, of their
properties, of relations among them and of operations upon them? At first
glance this seems absurd: under its standard interpretation ℒ ‘talks’ of
numbers, numerical properties, relations and operations. However, we
can make this idea work by using the device of coding: to each symbol
and expression of ℒ we assign a code-number (a.k.a. Gödel number)
and then we can refer to expressions obliquely, via their
code-numbers. Because ℒ, under its standard interpretation, ‘talks’ of numbers,
it can be construed as referring obliquely to its own expressions, via
their code-numbers.
The particular method of coding is of little importance; the only
essential condition is that coding and decoding (encryption and
decryption) must be algorithmic operations, of the kind that a computer can
be programmed to do. Thus, it should be possible to program a
computer so that, whenever an ℒ-expression is fed into it, the
computer, after a finite number of computation steps, will output the
code-number of the expression. Likewise, it should be possible to
program a computer so that, whenever a number is fed into it, the
computer, after a finite number of computation steps, will output a
signal indicating whether that number is the code-number of an
ℒ-expression; and, if so, also output that expression. (Here we have
used the term computer in the sense explained in §2 of Ch. 9. Note
that, strictly speaking, computer inputs and outputs are not numbers
and ℒ-expressions as such, but suitable representations of them in a
notation that the computer can handle.)
The: coding we shall introduce here is different from that used in
B&M (p. 327f). It will employ the binary (‘base-2’) representation of
numbers.

6.2. Definition
(i) To distinguish between the ordinary decimal and the binary
notation we shall use italic (slanted) digits ‘0’ and ‘1’ for the
latter. Thus 0 = 0, 1 = 1, 10 = 2, 11 = 3, 100 = 4, etc., the
numeral on the left of each equation being binary.
(ii) If k ≥ 1 and a_1, a_2, ..., a_k are any numbers, with a_1 > 0, we
define their binary concatenation

a_1⌢a_2⌢...⌢a_k

to be the number whose binary representation is obtained by
concatenating the binary representations of a_1, a_2, ..., a_k in this
order. Thus, for example,

3⌢0⌢6 = 11⌢0⌢110 = 110110 = 32 + 16 + 4 + 2 = 54.
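Binary concatenation is easy to mechanize. The following is a minimal Python sketch of Def. 6.2(ii); the function name bconcat is our own choice and does not come from the text.

```python
def bconcat(first, *rest):
    """Binary concatenation of Def. 6.2(ii); `first` must be positive."""
    if first <= 0 or any(a < 0 for a in rest):
        raise ValueError("need a_1 > 0 and all other a_i >= 0")
    # Write every number in binary and glue the digit strings together.
    bits = "".join(format(a, "b") for a in (first, *rest))
    return int(bits, 2)


print(bconcat(3, 0, 6))  # 11, 0, 110 -> 110110, i.e. 54
```

The requirement a_1 > 0 matters: a leading 0 would simply disappear from the binary representation of the result.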

6.3. Definition

(i) To each primitive symbol p of ℒ we assign a code-number #p, as
follows:

#0 = 2 = 10,
#s = 4 = 2^2 = 100,
#+ = 8 = 2^3 = 1000,
#× = 16 = 2^4 = 10000,
#= = 32 = 2^5 = 100,000,
#¬ = 64 = 2^6 = 1,000,000,
#→ = 128 = 2^7 = 10,000,000,
#∀ = 256 = 2^8 = 100,000,000,
#v_i = 2^{8+i} for i = 1, 2, ....

(ii) If k ≥ 1 and p_1, p_2, ..., p_k are primitive symbols of ℒ then we
assign to the ℒ-string p_1p_2...p_k the code-number

#(p_1p_2...p_k) = #p_1⌢#p_2⌢...⌢#p_k.

6.4. Remarks

(i) It is easy to see that a number is the code-number of a string iff
its binary representation consists of one or more blocks, each of
which consists of a single ‘1’ followed by one or more ‘0’s. For
example, 0, 3 (= 11) and 5 (= 101) are not code-numbers of any
string.
(ii) Since ℒ-expressions (terms and formulas) are in particular
ℒ-strings, Def. 6.3 assigns a code-number #t to each term t and a
code-number #α to each formula α. Note that in computing the
code-number of an expression, the symbols of the latter must be
taken in the order in which they occur in the original ‘Polish’
notation of ℒ. For example, the (false) equation s_0=s_1 is the
string =0s0. Hence its code-number is

#(=0s0) = 32⌢2⌢4⌢2 = 100,000⌢10⌢100⌢10

= 1,000,001,010,010 = 4,096 + 64 + 16 + 2 = 4,178.
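Both directions of the coding can be sketched in a few lines of Python. This is an illustration only: the names SYMBOLS, symbol_code, string_code and decode are ours, and writing the variable v_i as the string 'v1', 'v2', ... is an assumption of the sketch.

```python
import re

# Primitive symbols in the order of Def. 6.3(i): the k-th one has code 2^k.
SYMBOLS = ["0", "s", "+", "×", "=", "¬", "→", "∀"]


def symbol_code(p):
    """Code-number #p of a primitive symbol ('v1', 'v2', ... are variables)."""
    if p in SYMBOLS:
        return 2 ** (SYMBOLS.index(p) + 1)
    if re.fullmatch(r"v[1-9][0-9]*", p):
        return 2 ** (8 + int(p[1:]))
    raise ValueError(f"not a primitive symbol: {p!r}")


def string_code(symbols):
    """Code-number of a non-empty string of primitive symbols (Def. 6.3(ii))."""
    if not symbols:
        raise ValueError("the string must be non-empty")
    return int("".join(format(symbol_code(p), "b") for p in symbols), 2)


def decode(x):
    """Return the list of symbols coded by x, or None if x codes no string."""
    bits = format(x, "b")
    if not re.fullmatch(r"(10+)+", bits):   # the criterion of Rem. 6.4(i)
        return None
    symbols = []
    for block in re.findall(r"10+", bits):
        k = len(block) - 1                  # the block is 2^k in binary
        symbols.append(SYMBOLS[k - 1] if k <= 8 else f"v{k - 8}")
    return symbols


print(string_code(["=", "0", "s", "0"]))    # 4178, as in Rem. 6.4(ii)
print(decode(4178))                         # ['=', '0', 's', '0']
print([decode(n) for n in (0, 3, 5)])       # [None, None, None]
```

Decoding needs no search: the blocks of Rem. 6.4(i) can be read off from left to right, which is why coding and decoding are both algorithmic.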



6.5. Convention
When a noun or nominal phrase referring to ℒ-expressions appears in
small capitals, it should be read with the words ‘code-number of’ or
‘code-number of a’ prefixed to it. Thus, for example, ‘TERM’ is short
for ‘code-number of a term’.

Many relations and functions connected with the syntax of ℒ can easily
be seen to be recursive.

6.6. Examples
(i) Consider the property Tm defined by

Tm(x) ⇔_df x is a TERM.

It is clear that a computer can be programmed to check whether
any number x fed into it is a TERM or not. (According to standard
practice, the computer will first represent x in binary notation.
The results of Prob. 8.2.1 can then be used to ‘parse’ this binary
representation and check whether x is a TERM.) Thus Tm is a
recursive property.
(ii) The property Fla, defined by

Fla(x) ⇔_df x is a FORMULA,

is similarly seen to be recursive.
(iii) Consider the relation Frm, defined by

Frm(x, y) ⇔_df x is a FORMULA belonging to Φ_y.

In other words, Frm(x, y) holds iff x = #α for some formula α
such that all the free variables of α are among v_1, v_2, ..., v_y.
Frm is clearly recursive.

The following example introduces a recursive function that will play an
important role in the sequel.

6.7. Example
The diagonal function is the unary function d defined as follows:

d(x) = #[α(s_x)]   if x is a FORMULA #α,
d(x) = x           if x is not a FORMULA.

How can d(x) be calculated? First, we check whether x is a FORMULA.
If it isn’t, there is nothing further to do: d(x) is x itself.
Now suppose x is a FORMULA. We have to take that formula α of
which x is the code-number and substitute s_x in it for v_1 (cf. Def. 4.4);
and d(x) is then the code-number of the resulting formula, α(s_x). This
calculation is quite easy to do if x is represented in binary notation.
Each occurrence of v_1 appears in this representation as a block of the
form ‘1000000000’. We have to locate all blocks of this form that
correspond to free occurrences of v_1 in α, and replace each of them by
the binary representation of s_x, which consists of x successive blocks of
the form ‘100’ (corresponding to x successive occurrences of s) followed
by a single block ‘10’ (corresponding to 0). When these replacements
are made, we have got the binary representation of d(x).
Clearly, a computer can be programmed to perform this procedure.
Thus we have:

6.8. Theorem
The function d is recursive. For any formula α,

d(#α) = #[α(s_{#α})].

PROOF

For the recursiveness claim, see above. The equality follows directly
from the definition of d. ∎
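The procedure described in Ex. 6.7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the book's construction verbatim: the names encode, decode, term, formula and d are ours, variables are written 'v1', 'v2', ..., and the grammar of terms and formulas (Polish notation with 0, s, +, ×, =, ¬, →, ∀) is reconstructed from the symbol list of Def. 6.3. The sketch works on lists of symbols rather than directly on binary blocks, and very deeply nested inputs may exceed Python's recursion limit.

```python
import re

SYMBOLS = ["0", "s", "+", "×", "=", "¬", "→", "∀"]   # codes 2^1, ..., 2^8


def encode(tokens):
    """Code-number of a string of primitive symbols (Def. 6.3)."""
    blocks = []
    for t in tokens:
        k = SYMBOLS.index(t) + 1 if t in SYMBOLS else 8 + int(t[1:])
        blocks.append("1" + "0" * k)
    return int("".join(blocks), 2)


def decode(x):
    """Symbols coded by x, or None if x codes no string (Rem. 6.4(i))."""
    bits = format(x, "b")
    if not re.fullmatch(r"(10+)+", bits):
        return None
    lengths = [len(b) - 1 for b in re.findall(r"10+", bits)]
    return [SYMBOLS[k - 1] if k <= 8 else f"v{k - 8}" for k in lengths]


def term(toks, i, n, bound):
    """Copy the term starting at toks[i], putting s_n for v1 unless bound."""
    t = toks[i]
    if t == "0":
        return ["0"], i + 1
    if t.startswith("v"):
        free_v1 = t == "v1" and not bound
        return (["s"] * n + ["0"] if free_v1 else [t]), i + 1
    if t == "s":
        a, j = term(toks, i + 1, n, bound)
        return ["s"] + a, j
    if t in ("+", "×"):
        a, j = term(toks, i + 1, n, bound)
        b, k = term(toks, j, n, bound)
        return [t] + a + b, k
    raise ValueError("not a term")


def formula(toks, i, n, bound=False):
    """Copy the formula starting at toks[i], substituting s_n for free v1."""
    t = toks[i]
    if t == "=":
        a, j = term(toks, i + 1, n, bound)
        b, k = term(toks, j, n, bound)
        return ["="] + a + b, k
    if t == "¬":
        a, j = formula(toks, i + 1, n, bound)
        return ["¬"] + a, j
    if t == "→":
        a, j = formula(toks, i + 1, n, bound)
        b, k = formula(toks, j, n, bound)
        return ["→"] + a + b, k
    if t == "∀" and toks[i + 1].startswith("v"):
        v = toks[i + 1]
        a, j = formula(toks, i + 2, n, bound or v == "v1")
        return ["∀", v] + a, j
    raise ValueError("not a formula")


def d(x):
    """Diagonal function: d(x) = #[α(s_x)] if x = #α, and d(x) = x otherwise."""
    toks = decode(x)
    if toks is None:
        return x
    try:
        out, end = formula(toks, 0, x)
    except (ValueError, IndexError):
        return x                      # x codes a string that is no formula
    return encode(out) if end == len(toks) else x


x = encode(["=", "v1", "0"])          # the formula v_1=0
print(x)                              # 133122
print(d(x) == encode(["="] + ["s"] * x + ["0", "0"]))   # True
print(d(5))                           # 5, since 5 is not a FORMULA
```

Even for this tiny formula d(x) is an enormous number, because the numeral s_x contains x occurrences of s. The function is computable, but not cheaply so.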

§7. Tarski’s Theorem


7.1. Preview
We have seen that various relations connected with the syntax of ℒ are
recursive. By Cor. 5.15, these relations are representable in Ω; thus
they are expressible in ℒ under its standard interpretation.
For example, we have seen that the property Tm of being a TERM is
recursive; hence it is arithmetical. So (cf. Rems. 5.3 and 5.7) there is a
formula α ∈ Φ_1 such that, for any number x,

Tm(x) ⇔ 𝔑 ⊨ α[x] ⇔ 𝔑 ⊨ α(s_x).

In this sense the formula α expresses the property of being a TERM and
the sentence α(s_x) ‘says’ that x is a TERM. Thus ℒ, under its standard
interpretation 𝔑, is able to discourse of various aspects of its own
syntax, albeit obliquely, by referring to its own expressions via their
code-numbers.
Can the standard semantics of ℒ likewise be discussed in ℒ? We
shall show that it cannot.

7.2. Definition
For any set Σ of sentences, the property T_Σ is defined by

T_Σ(x) ⇔_df x is a SENTENCE belonging to Σ.

7.3. Remarks
(i) Equivalently, T_Σ is the set #[Σ] of all SENTENCES of Σ.
(ii) In particular, T_Ω is the property of being a SENTENCE of Ω. In
other words, T_Ω(x) holds iff x is a TRUE SENTENCE (see Def.
1.9(ii)).

7.4. Theorem (Tarski, 1933)


T_Ω is not arithmetical.

PROOF

By Thm. 6.8, the diagonal function d is recursive; hence by Thm. 5.19
it is arithmetical. Now, let P be the property obtained by composing
T_Ω with d and then applying ¬; thus

(*) Px ⇔_df ¬T_Ω(d(x)).

If T_Ω were arithmetical, then by Prob. 5.20 and Thm. 5.11 it would
follow that P is arithmetical as well. This would mean that there is
some formula α ∈ Φ_1 such that, for any number x,

(**) Px ⇔ α(s_x) ∈ Ω.

Taking x to be #α, we would therefore have:

α(s_{#α}) ∈ Ω ⇔ P(#α)                  by (**),
            ⇔ ¬T_Ω(d(#α))            by (*),
            ⇔ ¬T_Ω(#[α(s_{#α})])     by Thm. 6.8,
            ⇔ α(s_{#α}) ∉ Ω          by Def. 7.2.

This contradiction proves that T_Ω cannot be arithmetical. ∎

7.5. Remarks
(i) Let us paraphrase the proof just given. If the property P were
arithmetical then it would be expressed (that is, represented in
Ω) by some formula α ∈ Φ_1. For any number x, the sentence
α(s_x) ‘says’ that Px holds. By (*), this is the same as ‘saying’ that
d(x) is not a TRUE SENTENCE.
Now, taking x to be the FORMULA α itself, we find that the
sentence α(s_{#α}) ‘says’ that d(#α) is not a TRUE SENTENCE. By
Thm. 6.8, this means that #[α(s_{#α})] is not a TRUE SENTENCE; in
other words, that the sentence α(s_{#α}) itself is untrue.
Thus, α(s_{#α}) would be ‘saying’ something like ‘I am false’!
Clearly, this is closely related to the well-known Liar Paradox.
Except that here there is no paradox: the argument in the proof
shows that a formula representing P in Ω cannot exist; hence P,
and therefore also T_Ω, cannot be arithmetical.
(ii) Tarski’s Theorem applies not only to the language ℒ and its
standard interpretation; indeed, it was originally proved in a far
wider context. The argument used here can be adapted to show,
roughly speaking, that any sufficiently powerful formal
language-cum-interpretation (powerful enough to express certain key
concepts regarding its own syntax) cannot adequately express
the most basic notions of its own semantics. Hence it cannot
adequately serve as its own metalanguage.

The rest of this section contains an outline of a somewhat stronger
version of Tarski’s Theorem.

7.6. Definition
Let f be an n-ary function and let α ∈ Φ_{n+1}. We say that α represents f
numeralwise in a theory Σ if, for any a ∈ N^n, the sentence

(***) ∀v_{n+1}[α(s_a)↔v_{n+1}=s_{fa}]

belongs to Σ.

7.7. Problem
Let α represent the n-ary function f numeralwise in the theory Σ. For
any formula β in Φ_1, define β′ as the formula

∃v_{n+1}[β(v_{n+1})∧α].

Prove that, for any a ∈ N^n, the sentence β(s_{fa})↔β′(s_a) belongs to Σ.
(It is enough to show that this sentence is deducible from (***) in
Fopcal.)

7.8. Definition
A formula γ ∈ Φ_1 is called a truth definition inside a theory Σ if, for
each sentence φ, Σ contains the sentence

γ(s_{#φ})↔φ.

7.9. Problem
(i) Prove that if the diagonal function d is representable
numeralwise in a consistent theory Σ, then there cannot exist a truth
definition inside Σ. (Given any γ ∈ Φ_1, use Prob. 7.7 to find a
formula δ ∈ Φ_1 such that for every number a the sentence
¬γ(s_{d(a)})↔δ(s_a) is in Σ; then take φ as δ(s_{#δ}).)
(ii) Prove that d is representable numeralwise in Ω; hence deduce
that there is no truth definition inside Ω. (Since d is arithmetical,
there is a formula α ∈ Φ_2 that represents the graph of d in Ω.
Show that the same α also represents d numeralwise in Ω.)
(iii) Using (ii), give a new proof of Thm. 7.4. (Show that if T_Ω were
represented in Ω by a formula γ, then γ would be a truth
definition inside Ω.)
(iv) Prove that if Σ is a sound theory (see Def. 2.14) there is no truth
definition inside it.

§8. Axiomatizability
Recall (Def. 2.7) that a set of postulates (a.k.a. extralogical axioms) for
a theory Σ is a set of sentences Γ such that Σ = DcΓ. Having a set of
postulates is no big deal: every theory Σ has one, because (by Def. 2.5)
Σ = DcΣ. In order to qualify as an axiomatic theory, Σ must be
presented by means of a postulate set Γ specified by a finite recipe.
This does not mean that Γ itself must be finite. (Of course, if Γ is finite
then so much the better, for then its sentences can be specified directly
by means of a finite laundry list.) Rather, it means that we are
provided with an algorithm (a finite set of instructions) whereby the
sentences of Γ can be generated mechanically, one after the other. By
Church’s Thesis, this is equivalent to saying that T_Γ must be given as
an r.e. property.

8.1. Conventions

(i) When we say that a set Γ of sentences is recursive (or r.e.), we
mean that T_Γ is a recursive (or r.e.) property.
(ii) When we say that Γ is given as a recursive (or r.e.) set, we mean
that it is given in such a way as to enable us to program a
computer to operate as a decide-T_Γ (or enumerate-T_Γ) machine.
Similarly, when we say that we can find a recursive (or r.e.) set of
sentences Γ, we mean that we can describe Γ in such a way as to
indicate how a computer can be programmed to operate as a
decide-T_Γ (or enumerate-T_Γ) machine.

8.2. Definition
(i) A theory Σ is axiomatic if it is presented by means of a set of
postulates Γ, which is given as an r.e. set.
(ii) A theory Σ is axiomatizable if there exists an r.e. set Γ of
postulates for Σ.

8.3. Remark
Note that being axiomatic is an intensional attribute: it is not a
property of a theory as such, in a Platonic sense, but describes the way
in which a theory is presented. On the other hand, axiomatizability is
an extensional attribute of a theory as such, irrespective of how it is
presented.

8.4. Theorem
If Σ is an axiomatizable theory then there exists a recursive set of
postulates for Σ.

PROOF

By assumption, Σ = DcΓ, where Γ is an r.e. set of sentences.
Without loss of generality we may assume that Γ is infinite. (Otherwise,
we can add to Γ an infinite r.e. set of Fopcal axioms, for
example: Θ = {s_n=s_n : n ∈ N}. The set Γ ∪ Θ is clearly an infinite r.e.
set of postulates for our theory Σ.)
By assumption, there exists an enumerate-T_Γ machine. Let

#γ_0, #γ_1, #γ_2, ...

be the order in which it enumerates the SENTENCES of Γ. We define
sentences δ_n by induction on n as follows:

δ_0 = γ_0,   δ_{n+1} = γ_{n+1}∧δ_n for all n.

Thus, δ_n = γ_n∧γ_{n-1}∧...∧γ_0 for all n. We put Δ = {δ_n : n ∈ N}.
It is easy to see that Δ is a set of postulates for Σ. Indeed, it is
evident that for each n we have Γ ⊢ δ_n as well as Δ ⊢ γ_n. Hence
DcΔ = DcΓ = Σ.
Clearly, using the enumerate-T_Γ machine we can construct a
machine that enumerates the SENTENCES of Δ,

(*) #δ_0, #δ_1, #δ_2, ...

in this order. (The output of the enumerate-T_Γ machine can be
converted by a simple further computation to yield this enumeration.)
Note, moreover, that in the enumeration (*) the SENTENCES of Δ are
produced in increasing order: it is easy to see that

#δ_{n+1} = #(γ_{n+1}∧δ_n) > #δ_n for all n.

This enables us to construct a decide-T_Δ machine, as follows. Given
any number x, monitor the enumeration (*) until a number greater
than x turns up (which is bound to happen, sooner or later, because
the numbers in (*) keep increasing). Then T_Δ(x) holds iff by this time x
itself has turned up in the enumeration (*).
This procedure is clearly mechanizable; hence Δ is a recursive set of
postulates for Σ. ∎
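The last step of the proof, turning an increasing enumeration into a decision procedure, can be sketched in Python. The function name decide and the toy enumeration of perfect squares are our own illustrations; in the proof the enumeration would be (*).

```python
from itertools import count


def decide(enumeration, x):
    """Decide whether x occurs in a strictly increasing infinite enumeration."""
    for y in enumeration:
        if y == x:
            return True      # x itself has turned up
        if y > x:
            return False     # everything still to come is larger than x
    return False             # only reached if the enumeration is finite


def squares():
    """Toy stand-in for (*): 0, 1, 4, 9, ... in increasing order."""
    return (n * n for n in count())


print(decide(squares(), 49))   # True
print(decide(squares(), 50))   # False
```

Without the monotonicity of (*) the second test would be unsound, and the search for a non-member would never terminate. That is exactly the gap between an r.e. set and a recursive one.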

8.5. Remark
The proof of Thm. 8.4 shows that if Σ is not merely axiomatizable but
an axiomatic theory, then we can actually find a recursive set of
postulates for it.

To proceed, we shall need to assign a code-number to each non-empty
finite sequence of formulas.

8.6. Definition
For any formulas φ_1, φ_2, ..., φ_n, where n ≥ 1, we put

#(φ_1, φ_2, ..., φ_n) = #φ_1⌢1⌢#φ_2⌢1⌢...⌢1⌢#φ_n.

8.7. Remark
Thus, the binary representation of #(φ_1, φ_2, ..., φ_n) is obtained by
stringing together the binary representations of the code-numbers #φ_1,
#φ_2, ..., #φ_n, in this order, but inserting a digit ‘1’ between each one
and the next. These additional ‘1’s serve as separators (like commas)
showing where the binary representation of the code-number of one
formula ends, and the next one begins. These separators are easily
detected: they are always the first of two successive occurrences of ‘1’.
(The second ‘1’ belongs to the binary representation of the next
formula.)
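The separator trick can be checked mechanically. The following Python sketch is ours (the names seq_code and split_seq are not from the text); it takes the code-numbers of the individual formulas as given and does not check that the pieces really code formulas, only that they code strings.

```python
import re


def seq_code(codes):
    """Code-number of a sequence, given the code-numbers of its members."""
    # Join the binary representations, with a separator '1' in between.
    return int("1".join(format(c, "b") for c in codes), 2)


def split_seq(y):
    """Recover the member code-numbers from y, or None if y has the wrong shape."""
    # A separator is a '1' immediately followed by another '1'.
    parts = re.split(r"1(?=1)", format(y, "b"))
    if not all(re.fullmatch(r"(10+)+", p) for p in parts):
        return None
    return [int(p, 2) for p in parts]


y = seq_code([522, 4178])     # 522 = #(=00), 4178 = #(=0s0)
print(format(y, "b"))         # 100000101011000001010010
print(split_seq(y))           # [522, 4178]
```

The decoding is unambiguous because the binary representation of a code-number of a string never contains two adjacent ‘1’s (Rem. 6.4(i)).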

8.8. Definition
For any set of sentences Γ we define a binary relation Ded_Γ by:

Ded_Γ(x, y) ⇔_df x is a SENTENCE and y is a SEQUENCE-OF-FORMULAS
that constitutes a deduction of that sentence from Γ.

8.9. Lemma
If Γ is a recursive set of sentences then the relation Ded_Γ is recursive.

PROOF

It is easy to see that the property of being an AXIOM of Fopcal in ℒ is
recursive: from the description of the axioms (Ax. 8.9.1-Ax. 8.9.8) it
is clear that a computer can be programmed to decide whether any
given number is an AXIOM. By assumption, the property T_Γ is recursive
as well.
In order to determine whether Ded_Γ(x, y) holds for a given x and y,
the following checks must be made.

(1) It must be verified that y is the code-number of a finite sequence
of formulas.
(2) If it is, this sequence must next be scanned to verify that it is a
deduction from Γ; that is, that each formula in it is an axiom, or a
member of Γ, or obtainable by modus ponens from two formulas
that occur earlier in the sequence.
(3) If this turns out to be so, then finally the last formula of the
sequence must be checked to verify that it is a sentence and that
its code-number is x.

Clearly, a computer can be programmed to perform the checks in (1)
and (3). Since the property of being an axiom and the property T_Γ are
recursive, it follows that the checks required in (2) can likewise be
done by a suitably programmed computer. This shows that the relation
Ded_Γ(x, y) is recursive. ∎
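The three checks can be sketched as a short Python routine. This is an illustration under simplifying assumptions: it works on decoded data (formulas as tuples of symbols in Polish notation) instead of code-numbers, and the deciders is_axiom, in_gamma and is_sentence are passed in as parameters, since the lemma only assumes that such deciders exist. All names are ours.

```python
def is_deduction(x, seq, is_axiom, in_gamma, is_sentence):
    """Checks (1)-(3) of the proof, on decoded data.

    Formulas are tuples of symbols in Polish notation, so the implication
    with antecedent a and consequent b is the tuple ("→",) + a + b.
    """
    if not seq:                                   # (1) non-empty finite sequence
        return False
    for i, phi in enumerate(seq):                 # (2) every step is justified
        earlier = seq[:i]
        by_mp = any(("→",) + a + phi in earlier for a in earlier)
        if not (is_axiom(phi) or in_gamma(phi) or by_mp):
            return False
    return is_sentence(seq[-1]) and seq[-1] == x  # (3) the last formula is x


# Toy run: Γ = {0=0, 0=0 → s0=s0}, and no logical axioms are needed.
p = ("=", "0", "0")
q = ("=", "s", "0", "s", "0")
gamma = {p, ("→",) + p + q}
no_axioms = lambda f: False
anything = lambda f: True

print(is_deduction(q, [p, ("→",) + p + q, q], no_axioms, gamma.__contains__, anything))  # True
print(is_deduction(q, [q], no_axioms, gamma.__contains__, anything))                     # False
```

Every loop here is bounded by the length of the given sequence, so the routine always halts. That boundedness is what makes Ded_Γ recursive rather than merely r.e.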

8.10. Theorem
A theory is axiomatizable iff it is an r.e. set of sentences.

PROOF
If Σ is axiomatizable then by Thm. 8.4 there is a recursive set of
sentences Γ such that Σ = DcΓ; that is, Σ is the set of sentences
deducible in Fopcal from Γ. Thus, for all x,

T_Σ(x) ⇔ ∃y Ded_Γ(x, y).

By Lemma 8.9, Ded_Γ is recursive, hence r.e. (by Thm. 9.3.6).
Therefore (by Thm. 9.3.8) T_Σ is an r.e. property.
Conversely, if the theory Σ is r.e., then Σ has an r.e. set of
postulates: Σ itself, because Σ = DcΣ. ∎
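The passage from a recursive relation Ded_Γ(x, y) to an enumeration of all x for which some y exists can be sketched by a dovetailed search over pairs. The Python below is a generic illustration: enumerate_projection is our name, and the toy relation y·y = x merely stands in for Ded_Γ, which would be far too costly to run here.

```python
from itertools import count, islice


def enumerate_projection(rel):
    """Enumerate every x such that rel(x, y) holds for some y.

    Pairs (x, y) are visited in order of increasing x + y, so each pair is
    reached after finitely many steps. An x with several suitable y is
    output several times, which is harmless for an enumeration.
    """
    for total in count():
        for x in range(total + 1):
            if rel(x, total - x):
                yield x


toy = lambda x, y: y * y == x          # stand-in for a decidable relation
print(list(islice(enumerate_projection(toy), 4)))   # [0, 1, 4, 9]
```

The enumeration never announces that a given x will not appear. This is why the argument yields only that T_Σ is r.e., not that it is recursive.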

8.11. Remarks
(i) The proof of Thm. 8.10 (including the proofs of Thm. 8.4 and
Lemma 8.9) shows that if Σ is not merely axiomatizable but an
axiomatic theory, then a program can actually be produced for
making a computer operate as an enumerate-T_Σ machine. Hence
Σ can be given as an r.e. set in the sense of Conv. 8.1(ii).
(ii) The theorem means that a theory is axiomatizable iff there exists
a finite presentation of it, by means of a program for generating
one by one all the SENTENCES of the theory.

8.12. Theorem
Ω is not axiomatizable.

PROOF

By Tarski’s Thm. 7.4, T_Ω is not arithmetical; hence by Cor. 5.13 it is
not an r.e. property. By Thm. 8.10, Ω is therefore not axiomatizable. ∎

8.13. Theorem
If P is weakly representable in an axiomatizable theory then P is an r.e.
relation.

PROOF

Let P be an n-ary relation and let α be a formula in Φ_n that represents
P weakly in an axiomatizable theory Σ. By Def. 4.7(i) we have, for all
x ∈ N^n,

Px ⇔ α(s_x) ∈ Σ.

This means that, for all x ∈ N^n,

Px ⇔ T_Σ(#[α(s_x)]).

The n-ary function f defined by the identity fx = #[α(s_x)] is clearly
recursive. (To compute fx the n numerals s_x must be substituted for
the variables v_1, v_2, ..., v_n in α; the code-number of the resulting
sentence is fx. This computation can evidently be performed by a
suitably programmed computer.)
By Thm. 8.10 T_Σ is r.e.; therefore by Thm. 9.4.6(iii) P is r.e. as
well. ∎

8.14. Problem
Prove that if P is strongly representable in a consistent axiomatizable
theory, then P is a recursive relation. (First show that if α represents P
strongly in a theory, then ¬α represents ¬P strongly in that theory.)

§9. Baby arithmetic


9.1. Preview
In this section we introduce a sound axiomatic theory Π_0, which we
call ‘baby arithmetic’ because it formalizes only a very rudimentary
corpus of arithmetic facts: it ‘knows’ the true addition table and
multiplication table for numerals, and of course everything that can be
deduced from them logically, but nothing more. Despite its weakness,
it is sufficient for a very simple weak representation of all r.e.
relations.

Π_0 is based on the following four postulate schemes:

9.2. Postulate scheme 1

s_m+s_0=s_m.

9.3. Postulate scheme 2

s_m+s_{n+1}=s(s_m+s_n).

9.4. Postulate scheme 3

s_m×s_0=s_0.

9.5. Postulate scheme 4

s_m×s_{n+1}=s_m×s_n+s_m.

Here m and n are any numbers.

9.6. Remark
Evidently, all these postulates are true; hence Π_0 is sound. Also, this
theory is axiomatic, as the set of postulates 9.2-9.5 is evidently
recursive.
From the postulates of Π_0 we can deduce in Fopcal formal versions
of the addition and multiplication tables.

9.7. Example
Let us show that s_1+s_1=s_2 ∈ Π_0. First, note that the equation

(1) s_1+s_1=s(s_1+s_0)

is an instance of Post. 2, and so belongs to Π_0. Also, the equation

(2) s_1+s_0=s_1

is an instance of Post. 1, and hence belongs to Π_0. Using Ax. 6 of
Fopcal, we deduce from (2) the equation s(s_1+s_0)=ss_1, which (in view
of Def. 1.6) is

(3) s(s_1+s_0)=s_2.

Finally, using Ax. 5 and Ax. 7 of Fopcal, we deduce from (1) and (3)
the equation

s_1+s_1=s_2,

which must therefore belong to Π_0, as claimed.
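The pattern of this example repeats for every entry of the addition table. As an informal illustration (not a formal Fopcal deduction), the following Python sketch lists, for given m and n, the equations one passes through on the way to s_m+s_n=s_{m+n}. The function names are ours; numerals are written out as strings of s followed by 0, and the equations are shown in ordinary infix notation rather than Polish notation.

```python
def numeral(k):
    """The numeral s_k written out: k occurrences of s followed by 0."""
    return "s" * k + "0"


def addition_table_steps(m, n):
    """Equations leading from Post. 1 and Post. 2 to s_m+s_n=s_{m+n}."""
    sm = numeral(m)
    steps = [(f"{sm}+{numeral(0)}={sm}", "Post. 1")]
    for j in range(n):
        # Post. 2 reduces the sum with s_{j+1} to the successor of the sum with s_j.
        steps.append((f"{sm}+{numeral(j + 1)}=s({sm}+{numeral(j)})", "Post. 2"))
        # Ax. 6 applied to the previous table entry, then Ax. 5 and Ax. 7.
        steps.append((f"{sm}+{numeral(j + 1)}={numeral(m + j + 1)}", "Ax. 5-7"))
    return steps


for equation, reason in addition_table_steps(1, 1):
    print(f"{equation:16} {reason}")
```

For m = n = 1 the three lines printed are the equations (2), (1) and the conclusion of the example, in that order.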



9.8. Problem
Prove that Π_0 contains the sentences:

(i) s_m+s_n=s_{m+n} (the formal addition table),
(ii) s_m×s_n=s_{mn} (the formal multiplication table),

for all m, n ∈ N. (Use weak induction on n.)

9.9. Lemma

If t is a closed term and t^𝔑 = n, then t=s_n ∈ Π_0.

PROOF

We proceed by induction on deg t, considering the five cases
mentioned in Rem. 1.4(iii). In each case it is enough to show that the
equation t=s_n is deducible in Fopcal from sentences known to belong
to Π_0.

Case 1: t is a variable. Inapplicable here, as t is assumed closed.

Case 2: t is 0. Then n = 0 and s_n = s_0 = 0 by Def. 1.6. So the equation
t=s_n is 0=0, which is an instance of Ax. 5 of Fopcal, and hence
belongs to Π_0.

Case 3: t is sr, where r is a closed term. Let r^𝔑 = m. Then n = m + 1.
By the induction hypothesis, the equation r=s_m is in Π_0. From this
equation we deduce (using Ax. 6 of Fopcal) the equation sr=ss_m,
which is in fact t=s_n.

Case 4: t is q+r, where q and r are closed terms. Let q^𝔑 = k and
r^𝔑 = m. Then n = k + m. By the induction hypothesis, the sentences
q=s_k and r=s_m are in Π_0. From these two sentences we deduce (again
using Ax. 6 of Fopcal) q+r=s_k+s_m, which is in fact

t=s_k+s_m.

By Prob. 9.8(i), the equation

s_k+s_m=s_n

also belongs to Π_0. From these two equations we deduce (using Ax. 5
and Ax. 7 of Fopcal) the equation t=s_n.

Case 5: t is q×r, where q and r are closed terms. This is similar to
Case 4. ∎
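The induction in this proof follows the recursive computation of the numerical value of a closed term. A minimal Python sketch of that computation is given below; the name value and the representation of a term as a list of symbols in Polish notation are our own choices.

```python
def value(term):
    """Numerical value of a closed term, given as symbols in Polish notation."""
    def ev(i):
        if i >= len(term):
            raise ValueError("unexpected end of term")
        t = term[i]
        if t == "0":                       # Case 2
            return 0, i + 1
        if t == "s":                       # Case 3
            v, j = ev(i + 1)
            return v + 1, j
        if t in ("+", "×"):                # Cases 4 and 5
            a, j = ev(i + 1)
            b, k = ev(j)
            return (a + b if t == "+" else a * b), k
        raise ValueError(f"not part of a closed term: {t!r}")   # Case 1

    v, end = ev(0)
    if end != len(term):
        raise ValueError("trailing symbols after the term")
    return v


print(value(list("+s0s0")))         # s_1+s_1 has value 2
print(value(list("×ss0+s0ss0")))    # s_2×(s_1+s_2) has value 6
```

Each branch of the evaluator corresponds to one case of the proof: where the program performs an addition or multiplication, the proof appeals to the formal addition or multiplication table of Prob. 9.8.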

9.10. Definition
A formula (or sentence) of the form

∃x_1∃x_2 ... ∃x_m(r=t),

where m ≥ 0, is called a simple existential formula (or sentence).

9.11. Lemma
Π_0 contains all true simple existential sentences.

PROOF

Let φ be a true simple existential sentence. We proceed by induction
on the number m of quantifiers in φ.
First, let m = 0. Then φ is an equation r=t, where r and t are closed
terms. Since φ is true, it follows that r^𝔑 = t^𝔑: that is, r and t have the
same numerical value. Let n be this common numerical value. Then by
Lemma 9.9 the equations r=s_n and t=s_n belong to Π_0. Using the
equality axioms of Fopcal, we can deduce from these two equations the
equation r=t, which must therefore belong to Π_0 as well.
For the induction step, let φ have m + 1 quantifiers. Then φ has the
form ∃xψ, where ψ is a simple existential formula with m quantifiers,
and with no free variable other than x.
Since φ is true, it is easy to see (cf. Lemma 5.6) that ψ(x/s_n) must be
true for some number n. But ψ(x/s_n) is a simple existential sentence
with m quantifiers; hence, by the induction hypothesis, it belongs to
Π_0. By EG (Rem. 8.10.2(iv)), ψ(x/s_n) ⊢ ∃xψ. Thus φ must be in Π_0. ∎

9.12. Theorem

For any given n-ary r.e. relation P, we can find a formula of the form

∃v_{n+1}∃v_{n+2} ... ∃v_{n+m}(r=t),

that belongs to Φ_n and represents P weakly in every sound theory that
includes Π_0.

PROOF

By Cor. 5.13, we can find a formula α of this form that represents P in
Ω. Thus, for every x ∈ N^n,

Px ⇔ α(s_x) ∈ Ω.

But α(s_x) is a simple existential sentence. Hence, if Σ is a theory such
that Π_0 ⊆ Σ ⊆ Ω, it follows from Lemma 9.11 that

α(s_x) ∈ Ω ⇔ α(s_x) ∈ Σ.

Hence, for every x ∈ N^n,

Px ⇔ α(s_x) ∈ Σ. ∎

9.13. Remarks
(i) By Thm. 8.13, only r.e. relations can be weakly represented in an
axiomatizable theory. We have just shown that every r.e. relation
is in fact weakly representable in Π_0. Thus Π_0 achieves as much
as is possible for any axiomatizable theory as regards weak
representation.
(ii) As we shall see (Thm. 11.13), there are even weaker axiomatic
theories in which every r.e. relation is weakly representable.
However, the postulates of Π_0 have been devised so as to make
this theory just strong enough for Lemma 9.11 to hold; hence r.e.
relations are weakly represented in Π_0 by formulas of a
particularly simple form.

9.14. Problem
Let 𝔘 be an ℒ-structure whose domain U is a singleton {u}.

(i) Show that all the sentences of Π_0 are satisfied in 𝔘.
(ii) Show that the sentence s_0≠s_1 is not in Π_0.
(iii) Show that if the n-ary relation P is strongly representable in Π_0,
then P is a trivial relation: Px holds either for all x ∈ N^n or for
none. In other words, P is N^n or ∅. (First show that if α ∈ Φ_n
and 𝔘 ⊨ α(s_a) for some a ∈ N^n, then 𝔘 ⊨ α(s_x) for every x ∈ N^n.)

We return to our discussion of Thm. 9.12. Let α be the formula

∃v_{n+1}∃v_{n+2} ... ∃v_{n+m}φ,

where φ is any formula belonging to Φ_{n+m}; thus α ∈ Φ_n. Let P be the
n-ary relation represented by α in Ω and let a ∈ N^n. Thus we have
Pa ⇔ 𝔑 ⊨ α[a] (see Rem. 5.7). Now, due to the particular form of α, it
is easy to see that

𝔑 ⊨ α[a] ⇔ there are numbers b_1, b_2, ..., b_m
such that 𝔑 ⊨ φ[a, b_1, b_2, ..., b_m].

Therefore

(9.15) Pa ⇔ there are numbers b_1, b_2, ..., b_m
such that 𝔑 ⊨ φ[a, b_1, b_2, ..., b_m].

This justifies the following

9.16. Definition
Let φ be a formula belonging to Φ_{n+m}, and let α be the formula

∃v_{n+1}∃v_{n+2} ... ∃v_{n+m}φ.

Let P be the n-ary relation represented by α in Ω. Let a ∈ N^n. Then
by an α-witness that Pa we mean any m-tuple of numbers
(b_1, b_2, ..., b_m) such that

𝔑 ⊨ φ[a, b_1, b_2, ..., b_m].

9.17. Remarks
(i) Thus (9.15) means that, under the assumptions made in Def.
9.16, Pa holds iff there exists an α-witness that it does.
Moreover, the sentence α(s_a) may be regarded as ‘saying’ that
there exists an α-witness that Pa. Indeed, it is clear that α(s_a) is
true (that is, 𝔑 ⊨ α(s_a)) iff such a witness exists.
(ii) In the special situation covered by Thm. 9.12, P is an r.e.
relation, α is a simple existential formula of a particularly neat
form and φ is an equation r=t. In this case an α-witness that Pa
is an m-tuple (b_1, b_2, ..., b_m) such that

(*) 𝔑 ⊨ (r=t)[a, b_1, b_2, ..., b_m].

What does it take to show that such a witness exists? We may
search systematically through the set N^m of all m-tuples of
numbers. For each m-tuple (b_1, b_2, ..., b_m), we can test
whether it is a witness of the kind we are looking for. This
involves performing a finite number of additions and
multiplications, to see whether (*) holds; in other words, whether the
equation r=t is satisfied by a valuation (based on 𝔑) that assigns
the values a, b_1, b_2, ..., b_m to the variables v_1, v_2, ..., v_{n+m}.
Of course, if Pa does not hold, then we can never find a witness
that it does. But if Pa does hold, then a witness exists, and in
order to recognize one we only need to be able to do the
following things:
1. Add and multiply, to calculate the values of terms r and t
under a given assignment of numerical values to their variables.
2. If both terms have the same value, recognize that this is so.
Now, these operations are so simple, that even the very modest
power of the theory Π_0 is sufficient for performing them
formally, within this theory. In other words, if the sentence α(s_a),
which ‘says’ formally that a witness of the required kind exists, is
true, then it can be deduced in Fopcal from Post. 9.2-9.5.
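The systematic search described in (ii) can be sketched in Python. In this illustration the terms r and t are modelled as ordinary Python functions of the n + m arguments a, b_1, ..., b_m; the name find_witness and the example relation (‘a is composite’, witnessed by the equation a = (b_1+2)×(b_2+2)) are our own and do not come from the text.

```python
from itertools import count, product


def find_witness(r, t, a, m):
    """Search N^m for an m-tuple b with r(a, b) == t(a, b).

    Tuples are tried in order of their largest entry, so every tuple is
    reached eventually. If no witness exists the search never ends.
    """
    for bound in count():
        for bs in product(range(bound + 1), repeat=m):
            if max(bs, default=0) == bound and r(*a, *bs) == t(*a, *bs):
                return bs


# Hypothetical example: Pa holds iff a is composite.
r = lambda a, b1, b2: a
t = lambda a, b1, b2: (b1 + 2) * (b2 + 2)

print(find_witness(r, t, (15,), 2))   # (1, 3), since 15 = 3 × 5
```

Testing one candidate tuple takes only additions, multiplications and a comparison of two numbers, which are the operations that Π_0 can carry out formally. Calling the function for a prime a would loop for ever, in keeping with the fact that P is only claimed to be r.e.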

§10. Junior arithmetic


10.1. Preview
By adding to the postulates of baby arithmetic three schemes dealing
with inequalities, we obtain a somewhat more powerful axiomatic
theory, Π_1 (a.k.a. junior arithmetic), in which all recursive relations
are strongly represented by relatively simple formulas. This will follow
from a major result, the Main Lemma, which will also play an
important role later on.

10.2. Definition
For any terms r and t, we put

r≤t =_df ∃z(r+z=t),

where z is the first variable in alphabetic order that occurs neither in r
nor in t.

10.3. Remark
This is yet another mnemonic pun: by Ex. 5.8(ii), the formula v_1≤v_2
represents in Ω the relation ≤.
As postulates for Π_1 we take Post. 1-4 (9.2-9.5), as well as the
following three schemes:

10.4. Postulate scheme 5

s_m≠s_n,

10.5. Postulate scheme 6

∀v_1(v_1≤s_n↔v_1=s_0∨v_1=s_1∨...∨v_1=s_n),

10.6. Postulate scheme 7

∀v_1(s_n≤v_1∨v_1≤s_n);

where n is any number, and (in Post. 5) m is any number such that
m ≠ n.

10.7. Remarks
(i) Evidently, Π_1 is a sound axiomatic theory.
(ii) Π_1 is a proper extension of Π_0 because, for example, no instance
of Post. 5 belongs to Π_0 (cf. Prob. 9.14(ii)).

10.8. Problem
Show that the results of Prob. 3.9(i), (ii) and (iii) hold with ‘Ω’
replaced by ‘Π_1’.

10.9. Problem
(i) Let *𝔑 be the ℒ-structure such that:
1. *N = N ∪ {∞}, where ∞ is an object that is not a natural
number;
2. *0 = 0;
3. *s is the extension of the ordinary successor function such that
*s(∞) = 0;
4. *+ is the extension of ordinary addition such that if a = ∞ or
b = ∞ then a *+ b = ∞;
5. *× is the extension of ordinary multiplication such that if
a = ∞ or b = ∞ then a *× b = 0.
Show that *𝔑 is a model for Π_1.
(ii) Prove that the sentence ∀v_1(sv_1≠s_0) is not in Π_1.

10.10. Definition
For any variable x, term r and formula α we put

(i) ∃x≤rα =_df ∃x(x≤r∧α),
(ii) ∀x≤rα =_df ∀x(x≤r→α).

10.11. Preliminaries

Let two n-ary r.e. relations P and P′ be given. By Cor. 5.13, we
obtain two formulas

α = ∃v_{n+1}∃v_{n+2} ... ∃v_{n+m}(r=t),

α′ = ∃v_{n+1}∃v_{n+2} ... ∃v_{n+m′}(r′=t′),

that belong to Φ_n and represent P and P′ respectively in Ω. Without
loss of generality we may assume that m′ = m. Indeed, if m′ < m,
then by Prob. 8.5.12 we may insert a string of m − m′ additional
(‘vacuous’) quantifiers ∃v_{n+m′+1} ... ∃v_{n+m} in α′ and obtain a formula
that is logically equivalent to α′ and, like it, represents P′ in Ω.
Similarly, if m < m′, we can insert additional quantifiers into α.
Therefore we shall assume

α = ∃v_{n+1}∃v_{n+2} ... ∃v_{n+m}(r=t),

α′ = ∃v_{n+1}∃v_{n+2} ... ∃v_{n+m}(r′=t′).

From these two formulas we construct two new ones:

β = ∃v_{n+1}≤y∃v_{n+2}≤y ... ∃v_{n+m}≤y(r=t),

β′ = ∃v_{n+1}≤y∃v_{n+2}≤y ... ∃v_{n+m}≤y(r′=t′),

where y is v_{n+m+1}. Finally, we construct a fifth formula:

γ = ∃y(β∧¬β′).

Note that the free variables of β and β′ are among v_1, v_2, ..., v_n and
y, and therefore γ is in Φ_n.

10.12. Main Lemma


Given any two n-ary r.e. relations P and P′, let γ be the formula
constructed above. Then for every a ∈ N^n we have

Pa ∧ ¬P′a ⇒ γ(s_a) ∈ Π_1,

¬Pa ∧ P′a ⇒ ¬γ(s_a) ∈ Π_1.

PROOF
For the simple but somewhat lengthy proof, see B&M, pp. 337-340.
(The Main Lemma appears there as Lemma 7.9, but its proof requires
two earlier results, Lemmas 7.7 and 7.8.) ∎

10.13. Analysis
Let φ and φ′ be the equations r=t and r′=t′ that occur in the formulas
α and α′ respectively.
We take up the discussion begun in Rem. 9.17. Recall that Pa holds
iff there exists an α-witness that it does. By Def. 9.16, such a witness is
an m-tuple (b_1, b_2, ..., b_m) for which

𝔑 ⊨ φ[a, b_1, b_2, ..., b_m].

Moreover, α(s_a) ‘says’ that such a witness exists.
Now let us find out what is ‘said’ by a sentence obtained from β by
substituting numerals for its free variables. It is easy to see that

β(s_a, y/s_b) ∈ Ω ⇔ 𝔑 ⊨ β[a, y/b]
⇔ there are b_1, b_2, ..., b_m ≤ b such that
𝔑 ⊨ φ[a, b_1, b_2, ..., b_m].

Thus β(s_a, y/s_b) ‘says’: There is an α-witness that Pa, and this witness is
bounded by the number b. In other words: Among the numbers ≤ b
there can be found an α-witness that Pa. Exactly the same analysis
applies to P′, α′ and β′.
What does the sentence γ(s_a) ‘say’? Recalling that γ = ∃y(β∧¬β′),
we see that

γ(s_a) ∈ Ω ⇔ 𝔑 ⊨ γ[a]
⇔ there is a number b such that
𝔑 ⊨ β[a, y/b] but 𝔑 ⊭ β′[a, y/b].

Thus γ(s_a) is true iff for some number b there is an α-witness, bounded
by b, that Pa, but there is no α′-witness bounded by b that P′a.
Putting this a bit less accurately but more suggestively, γ(s_a) ‘says’:

An α-witness that Pa is found before an α′-witness that P′a.

Or, even more simply:

Pa is α-witnessed before P′a is α′-witnessed.



The whole of N^n can be divided into four mutually exclusive regions,
as follows (see Fig. 5):

Region I = P ∧ ¬P′,
Region II = ¬P ∧ P′,
Region III = P ∧ P′,
Region IV = ¬P ∧ ¬P′.

Let us consider the truth value of γ(s_a) in each of these regions (that is,
for a belonging to each region).
For a in Region I, Pa holds, and hence is α-witnessed by some
m-tuple (b_1, b_2, ..., b_m). If we choose b large enough (say as the
largest among these b_i) then Pa has an α-witness bounded by b. But in
this region P′a does not hold, hence has no α′-witness, let alone a
witness bounded by our b. Thus Pa is α-witnessed before P′a is
α′-witnessed, simply because the former witness exists and the latter
does not. So γ(s_a) is true throughout Region I.
In Region II, the position is reversed. Here P′a holds, and is
therefore α′-witnessed; but Pa is not α-witnessed at all, let alone
before P′a is α′-witnessed. Hence γ(s_a) is false throughout Region II.
In Region III, both Pa and P′a hold, and are therefore witnessed,
but for some a in this region Pa may be α-witnessed before P′a is
α′-witnessed, while for other a in the same region this may not be the
case. So there is no general uniform answer for this region: γ(s_a) may
be true for some a and false for others.
In Region IV, neither Pa nor P′a holds, and hence neither is
witnessed. So, Pa is not α-witnessed at all, let alone before P′a is
α′-witnessed. Hence the sentence γ(s_a) is false in this region.

Our Lemma says that for $\mathbf{a}$ in Region I the sentence $\gamma(s_{\mathbf{a}})$ is not only true, but even deducible from the postulates of $\Pi_1$; and that for $\mathbf{a}$ in Region II the sentence is not only false, but even refutable (that is, its negation is deducible) from these postulates.
The Lemma says nothing about the provability or refutability of $\gamma(s_{\mathbf{a}})$ in the other two regions. As far as Region III is concerned, the reason is obvious: as we have seen, the sentence may not have a uniform truth value in this region, so we cannot expect any uniform result concerning its provability or refutability. But in Region IV the position is quite different, because our sentence is false throughout this region, just as in Region II. Why does the Lemma tell us nothing about this fourth region?
To understand the reason for this discrepancy, we must examine what kind of evidence is available for the truth or falsehood of $\gamma(s_{\mathbf{a}})$ when $\mathbf{a}$ is in Regions I, II, and IV.
In order to decide whether a given $m$-tuple $(b_1, b_2, \ldots, b_m)$ of numbers is an $\alpha$-witness that $P\mathbf{a}$, we must be able to tell whether $\mathfrak{N} \models \varphi[\mathbf{a}, b_1, b_2, \ldots, b_m]$, where $\varphi$ is the equation $r = t$.
As we saw in Rem. 9.17, if $(b_1, b_2, \ldots, b_m)$ is indeed an $\alpha$-witness that $P\mathbf{a}$, then the operations required to recognize this fact can be performed formally within $\Pi_0$, and a fortiori within $\Pi_1$.
Now, if $(b_1, b_2, \ldots, b_m)$ is not an $\alpha$-witness that $P\mathbf{a}$, then the operations required to recognize this fact involve not only adding and multiplying to compute the relevant values of $r$ and $t$, but also the ability to tell that these two values are unequal. Thanks to Post. 5, all this can be performed formally within $\Pi_1$.
Thus, in $\Pi_1$ it is possible to carry out formally all the operations required to tell whether or not any given $m$-tuple $(b_1, b_2, \ldots, b_m)$ is an $\alpha$-witness that $P\mathbf{a}$.
In order to decide whether a given $\alpha$-witness that $P\mathbf{a}$ is bounded by a number $b$, we need to check whether each of the $m$ components of the witness is $\le b$.
Now, if $\mathbf{a}$ is in Region I, then in order to verify that $\gamma(s_{\mathbf{a}})$ is true we need only to check that a given $m$-tuple of numbers is an $\alpha$-witness that $P\mathbf{a}$, and is bounded by some given number $b$; and then to verify that each of the $m$-tuples bounded by $b$ fails to be an $\alpha'$-witness that $P'\mathbf{a}$. Since there are only finitely many such $m$-tuples, all this requires a finite number of simple steps.
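
The finiteness claim can be made concrete: there are exactly $(b+1)^m$ tuples of length $m$ with all components at most $b$. The following small sketch enumerates them; the function name `bounded_tuples` is a hypothetical helper introduced only for this illustration.

```python
from itertools import product

def bounded_tuples(m, b):
    """All m-tuples of natural numbers whose components are all <= b."""
    return list(product(range(b + 1), repeat=m))

tuples = bounded_tuples(m=2, b=3)
# The enumeration is finite, and its size agrees with (b + 1) ** m.
print(len(tuples), (3 + 1) ** 2)
```

Each of these finitely many tuples can then be tested in turn, which is what keeps the verification in Region I within a finite number of steps.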
In order to obtain a formal deduction of $\gamma(s_{\mathbf{a}})$, we need to formalize the process just described; and for this we need to have at our disposal a fairly modest set of postulates dealing formally with addition, multiplication and inequalities of both kinds (that is, $\ne$ and $\le$). The postulates of $\Pi_1$ are adequate for this.
In Region II the situation is broadly similar. If $\mathbf{a}$ is in this region, then in order to verify that $\gamma(s_{\mathbf{a}})$ is false, we need to check that a given $m$-tuple is an $\alpha'$-witness that $P'\mathbf{a}$ and is bounded by a given number $b$; then we need to check, for each $m$-tuple bounded by $b$, that it fails to be an $\alpha$-witness that $P\mathbf{a}$. Finally, from these facts (namely, that $P'\mathbf{a}$ has an $\alpha'$-witness bounded by $b$, but $P\mathbf{a}$ has no such $\alpha$-witness) we need to infer that $P\mathbf{a}$ cannot be $\alpha$-witnessed before $P'\mathbf{a}$ is $\alpha'$-witnessed. Again, all this amounts to a finite number of additions and multiplications, together with some very elementary inferences about inequalities.
To obtain a formal refutation of $\gamma(s_{\mathbf{a}})$, we need to formalize this procedure. Again, the postulates of $\Pi_1$ are adequate for this.
But in Region IV the situation is quite different. If $\mathbf{a}$ is in this region, then in general there is no finite procedure of the kind described above (that is, consisting of additions, multiplications and simple inferences with inequalities) that would provide sufficient evidence that $\gamma(s_{\mathbf{a}})$ is false. Of course, the sentence is in fact false, but in general the only way to verify this would be to check that none of the infinitely many $m$-tuples of numbers is an $\alpha$-witness that $P\mathbf{a}$. This requires an infinite amount of calculation, and we cannot expect such an infinite procedure to be formalizable within an axiomatic theory such as $\Pi_1$.
One final remark. There is nothing magical about the particular set of postulates of $\Pi_1$. It is not these postulates that are of prime importance, but the Main Lemma. What we need is a sound axiomatic theory, preferably quite weak, for which the lemma can be proved. The theory $\Pi_1$ was invented for the sake of the lemma. The postulates of the theory were selected by working back from the lemma, and discovering what postulates were needed to make the proof of the lemma work without too much difficulty. This is of course the kind of process described by Imre Lakatos in Proofs and Refutations.

10.14. Theorem
Given a recursive relation $R$, we can find a formula $\gamma$, of the form specified in Prel. 10.11, that represents $R$ strongly in any theory that includes $\Pi_1$.

PROOF

In the Main Lemma, take $P$ and $P'$ as $R$ and $\neg R$, which are r.e. by Thm. 9.3.6. Then the lemma shows that $\gamma$ represents $R$ strongly in $\Pi_1$, and hence also in any theory that includes $\Pi_1$. ∎

10.15. Problem
Let $\Sigma$ be a theory that includes $\Pi_1$. Show that every recursive function is representable numeralwise in $\Sigma$. (If $\varphi$ strongly represents the graph of the $n$-ary function $f$ in $\Pi_1$, prove that the formula

$$\forall y \le v_{n+1}\,[\varphi(v_{n+1}/y) \leftrightarrow y = v_{n+1}],$$

where $y$ is $v_{n+2}$, represents $f$ numeralwise in $\Pi_1$.)

Hence show that if $\Sigma$ is consistent there cannot exist a truth definition inside it. (See Def. 7.8 and Prob. 7.9.)

10.16. Remark
The results of this section, particularly the Main Lemma, in a somewhat weaker form, are essentially due to Barkley Rosser.¹ The present stronger version is made possible by the MRDP Thm., which allows us to take $\alpha$ and $\alpha'$ as simple existential formulas.

§11. A finitely axiomatized theory


Whereas $\Pi_0$ and $\Pi_1$ were based on infinitely many postulates, our next theory, $\Pi_2$, is based on the following nine.

11.1. Postulate I

$$\forall v_1(sv_1 \ne s_0).$$

11.2. Postulate II

$$\forall v_1 \forall v_2(sv_1 = sv_2 \to v_1 = v_2).$$

¹ His 1936 paper, 'Extensions of some theorems of Gödel and Church', is reprinted in M. Davis, The Undecidable.

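11.3. Postulate III

$$\forall v_1(v_1 + s_0 = v_1).$$

11.4. Postulate IV

$$\forall v_1 \forall v_2[v_1 + sv_2 = s(v_1 + v_2)].$$

11.5. Postulate V

$$\forall v_1(v_1 \times s_0 = s_0).$$

11.6. Postulate VI

$$\forall v_1 \forall v_2(v_1 \times sv_2 = v_1 \times v_2 + v_1).$$

11.7. Postulate VII

$$\forall v_1(v_1 \le s_0 \to v_1 = s_0).$$

11.8. Postulate VIII

$$\forall v_1 \forall v_2(v_1 \le sv_2 \to v_1 \le v_2 \vee v_1 = sv_2).$$

11.9. Postulate IX

$$\forall v_1 \forall v_2(v_1 \le v_2 \vee v_2 \le v_1).$$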

11.10. Remarks

(i) The theory $\Pi_2$ is clearly sound and axiomatic.
(ii) Instead of adopting these nine separate postulates, we could have taken their conjunction as a single postulate for $\Pi_2$. Indeed, we shall make use of this option in the sequel. However, here we have preferred to present shorter separate postulates, for the sake of clarity.
(iii) $\Pi_2$ is a modification of a finitely axiomatized theory proposed by Raphael Robinson in 1950.

11.11. Theorem
$\Pi_1 \subseteq \Pi_2$.

PROOF

It is quite easy to show that all the postulates of $\Pi_1$ (Post. 1-7) can be deduced from Post. I-IX. (DIY, or see the details in B&M, pp. 341-342.) ∎

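11.12. Problem
(i) Let ${}^*\mathfrak{N}$ be the $\mathcal{L}$-structure such that:
1. ${}^*N = N \cup \{\infty\}$, where $\infty$ is an object that is not a natural number;
2. ${}^*0 = 0$;
3. ${}^*s$ is the extension of the ordinary successor function such that ${}^*s\infty = \infty$;
4. ${}^*{+}$ is the extension of ordinary addition such that if $a = \infty$ or $b = \infty$ then $a \mathbin{{}^*{+}} b = \infty$;
5. ${}^*{\times}$ is the extension of ordinary multiplication such that if $b \ne 0$ then $\infty \mathbin{{}^*{\times}} b = \infty$; $\infty \mathbin{{}^*{\times}} 0 = 0$; and $a \mathbin{{}^*{\times}} \infty = \infty$ for all $a$.
Show that ${}^*\mathfrak{N}$ is a model for $\Pi_2$.
(ii) Prove that the sentence $\forall v_1(sv_1 \ne v_1)$ is not in $\Pi_2$.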
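
A finite spot-check of such a structure can be sketched as follows. It is not a proof and does not replace the exercise. Several assumptions are made and should be read as such: the extra object is called `INF`; the operations follow one reading of the clauses of the problem (successor fixes `INF`; a sum is `INF` whenever a summand is; `INF` times 0 is 0, `INF` times a nonzero number is `INF`, and anything times `INF` is `INF`); and only Postulates I-VI are tested, because the interpretation of the order symbol is not spelled out in this section.

```python
from itertools import product

INF = "inf"  # stands for the extra object, which is not a natural number

def succ(a):
    return INF if a == INF else a + 1

def add(a, b):
    return INF if INF in (a, b) else a + b

def mul(a, b):
    if b == INF:
        return INF                      # a * INF = INF for all a
    if a == INF:
        return 0 if b == 0 else INF     # INF * 0 = 0, INF * b = INF for b != 0
    return a * b

sample = list(range(6)) + [INF]

checks = {
    "I":   all(succ(a) != 0 for a in sample),
    "II":  all(a == b for a, b in product(sample, repeat=2) if succ(a) == succ(b)),
    "III": all(add(a, 0) == a for a in sample),
    "IV":  all(add(a, succ(b)) == succ(add(a, b)) for a, b in product(sample, repeat=2)),
    "V":   all(mul(a, 0) == 0 for a in sample),
    "VI":  all(mul(a, succ(b)) == add(mul(a, b), a) for a, b in product(sample, repeat=2)),
}
print(checks)

# The extra object is its own successor, so "every x differs from its successor"
# fails in this structure.
print(succ(INF) == INF)
```

The last line hints at part (ii): a sentence that fails in a model of a theory cannot belong to that theory.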

11.13. Theorem
(i) Given an r.e. relation $P$, we can find a formula that represents $P$ weakly in any sound theory.
(ii) Given a recursive relation $R$, we can find a formula that represents $R$ weakly in any theory $\Sigma$ such that $\Sigma \cup \Pi_2$ is consistent.

PROOF

(i) Let $P$ be a given $n$-ary r.e. relation. Take $\alpha$ as the formula provided by Cor. 5.13 and Thm. 9.12. Let $\pi$ be the conjunction of Posts. I-IX. We shall show that $\pi \to \alpha$ does the job.
$\Pi_2$ is a sound theory, and by Thm. 11.11 it includes $\Pi_1$, hence also $\Pi_0$. Therefore by Thm. 9.12 $\alpha$ represents $P$ weakly in $\Pi_2$.
Let $\mathbf{a}$ be an $n$-tuple such that $P\mathbf{a}$. Then $\alpha(s_{\mathbf{a}}) \in \Pi_2$. Since all the sentences of $\Pi_2$ are deducible in Fopcal from $\pi$, we have $\pi \vdash \alpha(s_{\mathbf{a}})$; hence by DT $\vdash \pi \to \alpha(s_{\mathbf{a}})$. Thus the sentence $\pi \to \alpha(s_{\mathbf{a}})$ belongs to every theory, and in particular to every sound one.
Now let $\mathbf{a}$ be such that $\neg P\mathbf{a}$. Since $\alpha$ represents $P$ in $\Omega$, we have $\alpha(s_{\mathbf{a}}) \notin \Omega$; in other words, $\alpha(s_{\mathbf{a}})$ is false. But $\pi$ is a true sentence, so $\pi \to \alpha(s_{\mathbf{a}})$ is false, and hence cannot belong to any sound theory. Thus we have shown that, for any sound theory $\Sigma$ and any $\mathbf{a} \in N^n$,

$$P\mathbf{a} \iff \pi \to \alpha(s_{\mathbf{a}}) \in \Sigma.$$

(ii) Let $R$ be a given $n$-ary recursive relation. Take $\gamma$ as the formula of Thm. 10.14. Then $\gamma$ represents $R$ strongly in $\Pi_2$.
Let $\mathbf{a}$ be an $n$-tuple such that $R\mathbf{a}$. Then by an argument like the one used in the proof of (i) it follows that the sentence $\pi \to \gamma(s_{\mathbf{a}})$ belongs to every theory.
Now let $\mathbf{a}$ be such that $\neg R\mathbf{a}$. Then $\neg\gamma(s_{\mathbf{a}}) \in \Pi_2$, hence $\pi \vdash \neg\gamma(s_{\mathbf{a}})$. If $\Sigma$ is a theory such that $\pi \to \gamma(s_{\mathbf{a}}) \in \Sigma$, then from $\Sigma \cup \{\pi\}$ we can deduce both $\gamma(s_{\mathbf{a}})$ and $\neg\gamma(s_{\mathbf{a}})$, so $\Sigma \cup \Pi_2$ is inconsistent. In other words, if $\Sigma \cup \Pi_2$ is consistent then $\pi \to \gamma(s_{\mathbf{a}}) \notin \Sigma$.
Thus we have shown that if $\Sigma$ is a theory such that $\Sigma \cup \Pi_2$ is consistent then, for any $\mathbf{a} \in N^n$,

$$R\mathbf{a} \iff \pi \to \gamma(s_{\mathbf{a}}) \in \Sigma.$$ ∎

§12. Undecidability
Let $\Sigma$ be a set of sentences. The decision problem for $\Sigma$ is the problem of finding an algorithm (a deterministic mechanical procedure) whereby, for any sentence $\varphi$, it can be determined whether or not $\varphi \in \Sigma$. This is clearly equivalent to the problem of finding an algorithm whereby, for any number $x$, it can be determined whether or not $T_\Sigma(x)$ holds (that is, whether or not $x$ is a SENTENCE of $\Sigma$). If such an algorithm is found, then this constitutes a positive solution to the decision problem for $\Sigma$, and $\Sigma$ is said to be decidable. If it is proved that such an algorithm cannot exist, this constitutes a negative solution to that decision problem, and $\Sigma$ is said to be undecidable.
Note that if $\Sigma$ is undecidable, it does not follow that there is some sentence for which it is impossible to decide whether or not it belongs to $\Sigma$. Each such individual problem may well be solvable by some means or other. The undecidability of $\Sigma$ only means that no algorithm will work for all sentences.
In order to make rigorous reasoning about decidability possible, this intuitive notion must be given a precise mathematical explication. Church's Thesis (a.k.a. the Church-Turing Thesis) states that such explication is provided by the notion of recursiveness. As mentioned in Rem. 9.3.11(ii), this thesis is supported by very weighty arguments, and has won virtually universal acceptance. Nevertheless, we shall keep our terminology free from commitment to Church's Thesis, by using the adverb 'recursively' where the thesis is needed to justify its omission.

12.1. Definition
If $\Sigma$ is a set of sentences such that the property $T_\Sigma$ is not recursive, we say that $\Sigma$ is recursively undecidable and that the decision problem for $\Sigma$ is recursively unsolvable.

From Tarski's Theorem 7.4 and Cor. 5.15 it follows at once that $\Omega$ is recursively undecidable. This, as well as many other undecidability results, also follows from

12.2. Theorem
If $\Sigma$ is a theory in which every recursive property is weakly representable, then $\Sigma$ is recursively undecidable.

PROOF

Suppose $T_\Sigma$ were recursive. Let the property $P$ be defined by

(*) $Px \iff_{\mathrm{df}} \neg T_\Sigma(d(x))$.

Since by Thm. 6.8 the function $d$ is recursive, $P$ would also be recursive by Thms. 9.4.6(ii) and 9.4.1. Therefore $P$ would be weakly represented in $\Sigma$ by some formula $\alpha \in \Phi_1$. Thus, for all $x \in N$,

(**) $Px \iff \alpha(s_x) \in \Sigma$.

Taking $x$ to be the number $\#\alpha$, we get, exactly as in the proof of Thm. 7.4:

$\alpha(s_{\#\alpha}) \in \Sigma \iff P(\#\alpha)$ by (**),
$\iff \neg T_\Sigma(d(\#\alpha))$ by (*),
$\iff \neg T_\Sigma(\#[\alpha(s_{\#\alpha})])$ by Thm. 6.8,
$\iff \alpha(s_{\#\alpha}) \notin \Sigma$ by Def. 7.2.

This contradiction proves that $T_\Sigma$ cannot be recursive. ∎

12.3. Corollary
Any sound theory is recursively undecidable.

PROOF
Immediate, by Thms. 9.3.6 and 11.13(i). ∎

12.4. Corollary
Any consistent theory in which every recursive property is strongly representable is recursively undecidable.

PROOF
Immediate, by Rem. 4.8(ii). ∎

12.5. Corollary
Any consistent theory that includes $\Pi_1$ is recursively undecidable.

PROOF
Immediate, by Cor. 12.4 and Thm. 10.14. ∎

12.6. Corollary
If $\Sigma$ is a theory such that $\Sigma \cup \Pi_2$ is consistent, then $\Sigma$ is recursively undecidable.

PROOF
Immediate, by Thm. 11.13(ii). ∎

12.7. Corollary (Church's Theorem)
$\Lambda$ is recursively undecidable.

PROOF
Immediate from Cor. 12.6, since $\Lambda \cup \Pi_2 = \Pi_2$ is clearly consistent. ∎

12.8. Remarks

(i) The consistency of $\Pi_2$ follows of course from its soundness; but it can also be proved by more elementary arguments, without invoking semantic notions.
(ii) If $\Sigma$ is an axiomatizable theory that satisfies the condition of Thm. 12.2, then $T_\Sigma$ is r.e. by Thm. 8.10, but not recursive. This applies, in particular, to $\Lambda$, $\Pi_0$, $\Pi_1$ and $\Pi_2$. These provide us with examples of r.e. properties that are not recursive.

12.9. Problem
Using Rem. 12.8(ii) and Prob. 8.14, obtain an alternative proof of
Thm. 8.12, not using Tarski’s Theorem.

12.10. Problem
Deduce Cor. 12.3 from Cor. 12.6.

12.11. Remarks

(i) Cor. 12.6 can be deduced from Cor. 12.5, as follows. Assume that $\Sigma$ is a theory such that $\Sigma \cup \Pi_2$ is consistent.
In general, $\Sigma \cup \Pi_2$ is not a theory; but $\Delta = \mathrm{Dc}(\Sigma \cup \Pi_2)$ is clearly a consistent theory that includes $\Pi_2$, and hence also $\Pi_1$. Therefore by Cor. 12.5 $\Delta$ is recursively undecidable.
Let $\pi$ be the conjunction of the nine postulates of $\Pi_2$. Then, it is easy to show (DIY!) that, for any sentence $\varphi$,

$$\varphi \in \Delta \iff \pi \to \varphi \in \Sigma.$$

Recall that $\#(\pi \to \varphi) = 128^\frown \#\pi^\frown \#\varphi$. Therefore, for all $x$,

$$T_\Delta(x) \iff T_\Sigma(fx), \quad \text{where } fx = 128^\frown \#\pi^\frown x.$$

Clearly, $f$ is a recursive function. If $T_\Sigma$ were recursive then $T_\Delta$ would likewise be recursive, which is impossible because $\Delta$ is recursively undecidable. Therefore $T_\Sigma$ cannot be recursive, so $\Sigma$ is recursively undecidable.
(ii) This illustrates the method of reduction. If $\Sigma_1$ and $\Sigma_2$ are theories such that for all $x$

$$T_{\Sigma_1}(x) \iff T_{\Sigma_2}(fx),$$

where $f$ is a recursive function, then $f$ is said to be a reduction of $\Sigma_1$ to $\Sigma_2$. If $\Sigma_1$ is known to be recursively undecidable, then it follows that $\Sigma_2$ must also be recursively undecidable.
Starting from the results we have proved here, the method of reduction is used to obtain many other undecidability results, not only for theories in the present language $\mathcal{L}$, but in other languages as well. It turns out that almost every interesting mathematical theory is recursively undecidable. Which is just as well, for otherwise mathematicians could be made redundant and replaced by computers.
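
The logic of a reduction can be sketched with a toy example in Python. The sets involved are decidable, so the example only shows the direction in which decision procedures transfer; all names (`decider_via_reduction`, `double`, `is_multiple_of_4`) are hypothetical and chosen for this illustration.

```python
def decider_via_reduction(f, decide_target):
    """Given a reduction f of a set A to a set B (x in A iff f(x) in B)
    and a decision procedure for B, return a decision procedure for A."""
    return lambda x: decide_target(f(x))

# Toy instance: A = even numbers, B = multiples of 4, f(x) = 2x.
# Indeed x is even iff 2x is a multiple of 4.
is_multiple_of_4 = lambda y: y % 4 == 0
double = lambda x: 2 * x

is_even = decider_via_reduction(double, is_multiple_of_4)
print([x for x in range(10) if is_even(x)])
```

Read contrapositively, this is the argument of the remark: if no decision procedure for the first set can exist, then none can exist for the second either, since composing it with the reduction would yield one for the first.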

§13. First-order Peano arithmetic

The theory $\Pi$, generally known as first-order Peano arithmetic (FOPA), is based on the set of postulates comprising the first six postulates of $\Pi_2$ and the following scheme:

13.1. Postulate scheme of induction

$$\forall v_2 \forall v_3 \ldots \forall v_n[\alpha(s_0) \to \forall v_1\{\alpha \to \alpha(sv_1)\} \to \forall v_1 \alpha],$$

for every number $n \ge 1$ and any formula $\alpha \in \Phi_n$.
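
As an illustration that is not taken from the text, consider the case $n = 1$ with $\alpha$ chosen as the formula $s_0 + v_1 = v_1$. Assuming that implication associates to the right, and noting that there are no outer quantifiers when $n = 1$, the corresponding instance of the scheme would read:

```latex
\[
  s_0 + s_0 = s_0 \;\to\;
  \forall v_1 \{\, s_0 + v_1 = v_1 \to s_0 + s v_1 = s v_1 \,\} \;\to\;
  \forall v_1 ( s_0 + v_1 = v_1 )
\]
```

Here $\alpha(s_0)$ is obtained by substituting $s_0$ for $v_1$ in $\alpha$, and $\alpha(sv_1)$ by substituting the term $sv_1$ for $v_1$.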

13.2. Remark

It is clear that $\Pi$ is axiomatic. We shall soon see that it is also sound.

To explain the meaning of these new postulates, we need the following two definitions, the first of which extends the notation introduced in Def. 5.4 to arbitrary $\mathcal{L}$-structures.

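13.3. Definition
(i) Let $\alpha \in \Phi_n$, let ${}^*\mathfrak{N}$ be an $\mathcal{L}$-structure and let $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ be an $n$-tuple of individuals in the domain ${}^*N$. If $\alpha$ is satisfied by some (and hence every) valuation $\sigma$ based on ${}^*\mathfrak{N}$ such that $v_i^\sigma = a_i$ for $i = 1, 2, \ldots, n$, we write:

$${}^*\mathfrak{N} \models \alpha[\mathbf{a}].$$

(ii) For any $\mathcal{L}$-structure ${}^*\mathfrak{N}$, any formula $\alpha \in \Phi_n$ (with $n \ge 1$) and any $a_2, a_3, \ldots, a_n \in {}^*N$, we put

$$M({}^*\mathfrak{N}, \alpha; a_2, a_3, \ldots, a_n) =_{\mathrm{df}} \{a_1 \in {}^*N : {}^*\mathfrak{N} \models \alpha[\mathbf{a}]\}.$$

(iii) The set $M({}^*\mathfrak{N}, \alpha; a_2, a_3, \ldots, a_n)$ is said to be defined in ${}^*\mathfrak{N}$ by $\alpha$, with parameter values $a_2, a_3, \ldots, a_n$. Sets of this form are said to be parametrically definable in ${}^*\mathfrak{N}$.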

13.4. Definition
If ${}^*\mathfrak{N}$ is an $\mathcal{L}$-structure and $X$ is any subset of ${}^*N$, we say that $X$ is inductive in ${}^*\mathfrak{N}$ if it satisfies the condition:

If ${}^*0 \in X$, and for every $x \in X$ also ${}^*s(x) \in X$, then $X = {}^*N$.

13.5. Remarks
(i) A straightforward application of the BSD shows that

$${}^*\mathfrak{N} \models \forall v_2 \forall v_3 \ldots \forall v_n[\alpha(s_0) \to \forall v_1\{\alpha \to \alpha(sv_1)\} \to \forall v_1 \alpha]$$

is equivalent to the condition that for all $a_2, a_3, \ldots, a_n \in {}^*N$ the set $M({}^*\mathfrak{N}, \alpha; a_2, a_3, \ldots, a_n)$ is inductive in ${}^*\mathfrak{N}$.
Thus, all instances of the induction postulates 13.1 hold in ${}^*\mathfrak{N}$ iff all sets that are parametrically definable in ${}^*\mathfrak{N}$ are inductive in ${}^*\mathfrak{N}$.
(ii) The Principle of Induction says that every subset of $N$ is inductive in $\mathfrak{N}$. It follows that all instances of 13.1 are true (that is, they hold in $\mathfrak{N}$) and hence $\Pi$ is sound.
(iii) However, the present first-order induction scheme 13.1 falls far, far short of expressing (under the standard interpretation) the full power of the Principle of Induction. The latter states that all subsets of $N$ are inductive (in $\mathfrak{N}$). It is a second-order principle, and was stated as such in Peano's 1889 axiomatization of arithmetic (cf. Rem. 6.1.8). Note that by Cantor's Thm. 3.6.8 there are uncountably many subsets of $N$.
On the other hand, our first-order induction postulates only manage to state (under the standard interpretation) the inductiveness of subsets of $N$ that are parametrically definable in $\mathfrak{N}$, that is, sets of the form $M(\mathfrak{N}, \alpha; a_2, a_3, \ldots, a_n)$. However, it is easy to see (by an argument similar to that used in proving Thm. 6.3.9) that there are only denumerably many such subsets of $N$. FOPA is in this sense merely a pale first-order shadow of the theory outlined by Peano.
(iv) Nevertheless, $\Pi$ is an extremely strong theory. Although by Thm. 8.12 we know that $\Pi$ must be a proper subtheory of $\Omega$, and there must therefore exist true sentences that are not in $\Pi$, it requires very great ingenuity to discover such sentences.
The first examples of true sentences that do not belong to $\Pi$ were given by Gödel in 1931. (We shall present his results in the next two sections.) However, his sentences state interesting facts only when read obliquely, as referring to $\mathcal{L}$-expressions via their code-numbers; and these facts are then of purely logical (rather than general mathematical) interest.
It was only in 1977 that J. Paris and L. Harrington invented a method for producing true sentences that do not belong to $\Pi$ and, when read directly rather than obliquely, express reasonably interesting mathematical facts, of the kind that can be of interest to an honest mathematician, not just to a logician.

13.6. Theorem
$\Pi_2 \subseteq \Pi$.

PROOF

It is enough to show that the last three postulates of $\Pi_2$ (Post. VII-IX) belong to $\Pi$. This is not difficult. (DIY or see B&M, p. 343f.) ∎

13.7. Problem
Prove that $\forall v_1(sv_1 \ne v_1) \in \Pi$. Hence by Prob. 11.12(ii) $\Pi$ is a proper extension of $\Pi_2$.

13.8. Remarks
(i) Let ${}^*\mathfrak{N}$ be a model of $\Pi$. Then ${}^*\mathfrak{N}$ is, in particular, also a model of $\Pi_1$; hence by Prob. 10.8 there is a unique embedding $f$ of $\mathfrak{N}$ in ${}^*\mathfrak{N}$. Without loss of generality, we can assume that ${}^*\mathfrak{N}$ is actually an extension of $\mathfrak{N}$. This amounts to assuming that $N \subseteq {}^*N$ and that $fn = n$ for every number $n$. Thus by Def. 3.4 we have:

$${}^*0 = 0, \quad {}^*s(m) = s(m), \quad m \mathbin{{}^*{+}} n = m + n, \quad m \mathbin{{}^*{\times}} n = mn,$$

for all numbers $m$ and $n$.
(ii) For some structural information about nonstandard models of $\Pi$, see B&M, p. 345 (Prob. 9.14 there). The same information applies, in particular, to nonstandard models of $\Omega$.

13.9. Problem
Let ${}^*\mathfrak{N}$ be a nonstandard model of $\Pi$. Without loss of generality, assume that ${}^*\mathfrak{N}$ is an extension of $\mathfrak{N}$.
(i) Show that $N$ is not parametrically definable in ${}^*\mathfrak{N}$. (See Def. 13.3(iii).)
(ii) Hence prove, more generally, that no infinite subset of $N$ is parametrically definable in ${}^*\mathfrak{N}$.

§14. The First Incompleteness Theorem

14.1. Preview
In an epoch-making paper published in 1931, Gödel presented two main results, known as the First and Second Incompleteness Theorems.¹
Actually, the First Incompleteness Theorem came in two versions. One version, which applies to sound theories (and therefore depends on semantic notions), is explained in the introduction to the paper. Thanks to the MRDP Thm. 9.5.4, proved in 1970, it is now possible to obtain a somewhat stronger form of this semantic version: we shall prove it as Thm. 14.2 below.
In the main body of the paper, Gödel proves another version of the First Theorem, which does not depend on semantic notions. It applies to theories that are $\omega$-consistent. (A theory $\Sigma$ is $\omega$-inconsistent if for some formula $\alpha \in \Phi_1$ it contains the sentences $\alpha(s_n)$ for all $n$ as well as the sentence $\neg\forall v_1 \alpha$. The inconsistent theory is clearly $\omega$-inconsistent, but the converse is not true.) In 1936 Rosser showed that this version of the First Theorem can be extended to theories that are just consistent, but not necessarily $\omega$-consistent. His proof employed a result which is the prototype of our Main Lemma 10.12.
Using the MRDP Thm., the Gödel-Rosser Theorem can also be strengthened. This stronger form is proved below as Thm. 14.6.
The Second Incompleteness Theorem is stated by Gödel, but its proof is only briefly outlined. In the next section we shall give a mere outline of the proof.

By Thm. 8.12, $\Omega$ is not axiomatizable. It follows at once that every sound axiomatizable theory $\Sigma$ must be a proper sub-theory of $\Omega$, and hence incomplete. Thus there must exist a true sentence that does not belong to $\Sigma$. The following theorem shows that, given a sound axiomatic theory $\Sigma$, we can find such a sentence, of a particularly simple form.

¹ A translation of his paper, 'On formally undecidable propositions of Principia mathematica and related systems I', is printed in van Heijenoort, From Frege to Gödel.

14.2. Theorem (Semantic version of First Incompleteness Theorem)
Given a sound axiomatic theory $\Sigma$, we can find a true sentence $\varphi$ of the form $\forall x_1 \forall x_2 \ldots \forall x_m(p \ne q)$ that does not belong to $\Sigma$.

PROOF

By Thm. 8.10 (cf. also Rem. 8.11(i)), we obtain $T_\Sigma$ as an r.e. property. We put

$$Px \iff_{\mathrm{df}} T_\Sigma(d(x)).$$

Then by Thm. 9.4.6(iii) $P$ is r.e. as well. Hence, by Cor. 5.13, we can find a formula $\alpha \in \Phi_1$ of the form $\exists v_2 \exists v_3 \ldots \exists v_{m+1}(r = t)$ that represents $P$ in $\Omega$. Let

$$\beta =_{\mathrm{df}} \forall v_2 \forall v_3 \ldots \forall v_{m+1}(r \ne t).$$

Clearly, $\beta$ is logically equivalent to $\neg\alpha$, and represents $\neg P$ in $\Omega$. Thus, for any number $x$,

$$\beta(s_x) \in \Omega \iff \neg T_\Sigma(d(x)).$$

Taking $x = \#\beta$, we have:

$\beta(s_{\#\beta}) \in \Omega \iff \neg T_\Sigma(d(\#\beta))$
$\iff \neg T_\Sigma(\#[\beta(s_{\#\beta})])$ by Thm. 6.8,
$\iff \beta(s_{\#\beta}) \notin \Sigma$ by Def. 7.2.

Let $\varphi$ be $\beta(s_{\#\beta})$. Then $\varphi$ is indeed of the form $\forall x_1 \forall x_2 \ldots \forall x_m(p \ne q)$. (Here $x_1, x_2, \ldots, x_m$ are $v_2, v_3, \ldots, v_{m+1}$ respectively; and the terms $p$ and $q$ are obtained from $r$ and $t$ respectively by substituting the numeral $s_{\#\beta}$ for the variable $v_1$.) Also, we have just shown that

$$\varphi \in \Omega \iff \varphi \notin \Sigma.$$

This means that either

(*) $\varphi \in \Omega$ and $\varphi \notin \Sigma$,

or

(**) $\varphi \notin \Omega$ and $\varphi \in \Sigma$.

However, (**) is impossible by the soundness of $\Sigma$; so (*) must be the case. ∎

14.3. Remarks
(i) If $\Sigma$, instead of being axiomatic, is assumed to be merely axiomatizable, then the proof shows that there exists a sentence $\varphi$ with the properties stated in the theorem, without telling us how to obtain it.
(ii) In the proof of Thm. 14.2 we established not only that $\varphi \notin \Sigma$ but also that $\varphi \in \Omega$; hence $\neg\varphi \notin \Omega$. Since $\Sigma$ is assumed to be sound, it follows that $\neg\varphi \notin \Sigma$ as well. Thus neither $\varphi$ nor its negation is in $\Sigma$, showing $\Sigma$ to be incomplete. For this reason Thm. 14.2 is an incompleteness theorem.
(iii) Gödel says of $\varphi$ that it is [formally] undecidable in $\Sigma$. We prefer to say that $\varphi$ is undecided by $\Sigma$, so as to avoid confusion with the term undecidable explained in §12.

14.4. Analysis
We know that $T_\Sigma$ is the property of being a SENTENCE of $\Sigma$. Moreover, tracing through the proof of Thm. 8.10, we see that, for an axiomatic theory $\Sigma$, $T_\Sigma$ was obtained as an r.e. property by noting that, for any $x$,

$T_\Sigma(x) \iff x$ is a SENTENCE deducible from the postulates of $\Sigma$.

(The postulates referred to here are an r.e. set of postulates in terms of which $\Sigma$ is presented.) Since $\beta$ represents $\neg P$ in $\Omega$, the sentence $\beta(s_x)$ can be taken to 'say' (under the standard interpretation): $d(x)$ is not a SENTENCE deducible from the postulates of $\Sigma$.
In particular, when we take $x$ to be $\#\beta$, the sentence $\beta(s_x)$ is our $\varphi$ and $d(x)$ is $\#\varphi$. Thus $\varphi$ 'says': $\#\varphi$ is not a SENTENCE deducible from the postulates of $\Sigma$. Or, briefly, $\varphi$ 'says':

I am not deducible from the postulates of $\Sigma$.

Compare this with the proof of Tarski's Theorem, analysed in Rem. 7.5(i). There we saw that if $T_\Omega$ were arithmetical, there would exist a sentence that 'says' I am untrue. This would reproduce the Liar Paradox in $\mathcal{L}$. But in fact there was no paradox, since such a sentence cannot exist; and this only showed that $T_\Omega$ is not arithmetical.
The Gödel sentence $\varphi$ in the proof of Thm. 14.2 certainly does exist: we have in fact shown how to obtain it. Nor does it assert its own falsity; rather, it asserts its own undeducibility from the postulates of $\Sigma$. Since $\Sigma$ is sound, the postulates of $\Sigma$ are all true. It follows that $\varphi$ cannot lie; for if it lied, it would be deducible from these true postulates, and hence it would be true! Thus $\varphi$ is true and just because of this it is undeducible from the postulates of $\Sigma$. Or, if you like, it is true because it is undeducible from these postulates.
Here too there is no paradox: the Liar Paradox is merely skirted.
So far, we have subjected $\varphi$ to the oblique version of the standard interpretation, the reading that takes $\varphi$ to refer to expressions of $\mathcal{L}$ via their code-numbers. It transpires that the $\mathcal{L}$-expression to which it refers is $\varphi$ itself. Read in this way, from a logical point of view, $\varphi$ is a very interesting sentence.
Now let us read $\varphi$ directly. Deformalizing $\varphi$ (cf. Ex. 5.8) we see that under the standard interpretation it expresses a fact of the form

$$\forall x_1 \forall x_2 \ldots \forall x_n(f\mathbf{x} \ne g\mathbf{x}),$$

where $f$ and $g$ are $n$-ary polynomials in the sense of Def. 9.5.2(ii). An equation $f\mathbf{x} = g\mathbf{x}$, where $f$ and $g$ are two such polynomials, is called diophantine, after Diophantus, the third-century(?) author of a book on arithmetic. By a solution of the equation we mean an $n$-tuple $\mathbf{a}$ of natural numbers such that $f\mathbf{a} = g\mathbf{a}$.
So $\varphi$ asserts the unsolvability of the diophantine equation $f\mathbf{x} = g\mathbf{x}$, and the proof of Thm. 14.2 produces, for any given sound axiomatic theory $\Sigma$, a particular diophantine equation that is really unsolvable, but whose unsolvability cannot be deduced from the postulates of $\Sigma$. However, from a mathematical (rather than purely logical) point of view, there is in general no reason why the equation $f\mathbf{x} = g\mathbf{x}$, or the fact that it is unsolvable, should be of any particular interest.
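
The asymmetry between finding a solution and establishing unsolvability can be sketched in Python. The two equations below are hypothetical toy examples, not the equation produced by the proof, and `find_solution` is a name introduced only for this illustration; the search is cut off at an arbitrary bound so that it terminates.

```python
from itertools import product

def find_solution(f, g, n, bound):
    """Return the first n-tuple of natural numbers with all components <= bound
    such that f(*x) == g(*x), or None if there is none within the bound."""
    for x in product(range(bound + 1), repeat=n):
        if f(*x) == g(*x):
            return x
    return None

# x^2 = 2*y^2 + 1 has solutions in natural numbers.
print(find_solution(lambda x, y: x * x, lambda x, y: 2 * y * y + 1, 2, 20))

# 2x = 2y + 1 has none (the two sides differ in parity), but a bounded
# search can only report that nothing was found up to the bound.
print(find_solution(lambda x, y: 2 * x, lambda x, y: 2 * y + 1, 2, 20))
```

A solution, once found, is verified by a finite calculation. A result of `None` is not by itself evidence of unsolvability: that requires an argument covering all tuples, such as the parity argument for the second equation.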

From now on we shall consider the issue of completeness with regard to axiomatizable theories that are consistent, but need not be sound.

14.5. Theorem
Every axiomatizable complete theory is recursively decidable.

PROOF
Let $\Sigma$ be an axiomatizable complete theory. Then by Thm. 8.10 $T_\Sigma$ is an r.e. property.
Also, if $x$ is any number then, by the completeness of $\Sigma$: $\neg T_\Sigma(x)$ iff $x$ is not a SENTENCE, or $x$ is a SENTENCE whose negation belongs to $\Sigma$. Thus

$$\neg T_\Sigma(x) \iff \neg\mathrm{Frm}(x, 0) \vee T_\Sigma(64^\frown x).$$

Here Frm is the recursive relation defined in Ex. 6.6(iii). Note that $\mathrm{Frm}(x, 0)$ holds iff $x$ is a SENTENCE. Note also that by Def. 6.3 $T_\Sigma(64^\frown x)$ holds iff $x$ is a SENTENCE whose negation belongs to $\Sigma$.
Clearly, $64^\frown x$ is a recursive function of $x$. Also, by Thm. 9.3.6 $\neg\mathrm{Frm}$ is r.e. since Frm is recursive. Hence by Thms. 9.4.3 and 9.4.6(iii) it follows that $\neg T_\Sigma$ is an r.e. property.
Therefore by Thm. 9.3.6 $T_\Sigma$ is recursive. ∎
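
The idea behind the proof can be sketched as a search through an enumeration of the theory: for a complete consistent theory, either the sentence or its negation must eventually turn up. The toy 'theory' below is an assumption made purely for illustration; its sentences are only the strings `even(k)` and `~even(k)`, and the names `negate`, `enumerate_theorems` and `decide` are hypothetical.

```python
from itertools import count

def negate(sentence):
    """Toy negation: toggle a leading '~'."""
    return sentence[1:] if sentence.startswith("~") else "~" + sentence

def enumerate_theorems():
    """Stand-in for an effective enumeration of a complete consistent theory:
    for each n it lists whichever of 'even(n)' and '~even(n)' is true."""
    for n in count():
        yield f"even({n})" if n % 2 == 0 else f"~even({n})"

def decide(sentence):
    """Run through the enumeration until the sentence or its negation appears.
    Terminates for every sentence of the toy language because the theory
    is complete."""
    negation = negate(sentence)
    for theorem in enumerate_theorems():
        if theorem == sentence:
            return True
        if theorem == negation:
            return False

print(decide("even(4)"), decide("even(7)"), decide("~even(7)"))
```

For an incomplete theory the same loop may run forever on a sentence that the theory leaves undecided, which is why completeness is essential to the argument.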

By Cor. 12.5 it now follows that every consistent axiomatizable theory $\Sigma$ that includes $\Pi_1$ must be incomplete; so there must exist a sentence $\varphi$ such that neither $\varphi \in \Sigma$ nor $\neg\varphi \in \Sigma$. The following theorem shows that, given a consistent axiomatic extension of $\Pi_1$, we can find such a sentence whose form is relatively simple.

14.6. Theorem (Strengthened version of Gödel-Rosser First Incompleteness Theorem)
Given any axiomatic theory $\Sigma$ that includes $\Pi_1$, we can find a formula $\gamma \in \Phi_1$, of the form described in Prel. 10.11 with $n = 1$, such that if either of the sentences $\gamma(s_{\#\gamma})$, $\neg\gamma(s_{\#\gamma})$ belongs to $\Sigma$ then so does the other, and hence $\Sigma$ is inconsistent.

PROOF

As in the proof of Thm. 14.2, we obtain $T_\Sigma$ as an r.e. property. We now put, for any number $x$,

$$Px \iff_{\mathrm{df}} T_\Sigma(64^\frown d(x)), \qquad P'x \iff_{\mathrm{df}} T_\Sigma(d(x)).$$

Clearly, $P$ and $P'$ are r.e. properties. So we can construct the formulas $\alpha$, $\alpha'$, $\beta$, $\beta'$ and $\gamma$ as described in Prel. 10.11, with $n = 1$.
Note that, by Def. 6.3 and Thm. 6.8, it follows from the definitions of $P$ and $P'$ that

$$P(\#\gamma) \iff \neg\gamma(s_{\#\gamma}) \in \Sigma, \qquad P'(\#\gamma) \iff \gamma(s_{\#\gamma}) \in \Sigma.$$

Now assume $\gamma(s_{\#\gamma}) \in \Sigma$. Then $P'(\#\gamma)$. If it were the case that $\neg\gamma(s_{\#\gamma}) \notin \Sigma$ then $\neg P(\#\gamma)$ would also hold; therefore we would have $\neg P(\#\gamma) \wedge P'(\#\gamma)$. So by the Main Lemma 10.12 we would have $\neg\gamma(s_{\#\gamma}) \in \Pi_1 \subseteq \Sigma$. Thus $\neg\gamma(s_{\#\gamma}) \in \Sigma$ after all, and hence $\Sigma$ is inconsistent in this case.
Similarly, suppose that $\neg\gamma(s_{\#\gamma}) \in \Sigma$. Then $P(\#\gamma)$ holds. If it were the case that $\gamma(s_{\#\gamma}) \notin \Sigma$, then $\neg P'(\#\gamma)$ would also hold, and we would have $P(\#\gamma) \wedge \neg P'(\#\gamma)$. So by the Main Lemma we would have $\gamma(s_{\#\gamma}) \in \Pi_1 \subseteq \Sigma$. Thus $\gamma(s_{\#\gamma}) \in \Sigma$ after all, and $\Sigma$ is inconsistent in this case as well. ∎

14.7. Remark

If $\Sigma$ is not assumed to be axiomatic but merely axiomatizable, then the proof shows that there exists a formula $\gamma$ with the stated properties, without telling us how to obtain it.

14.8. Analysis
Consider the properties $P$ and $P'$ defined in the proof of Thm. 14.6. By definition, $P'x$ holds iff $d(x)$ is a SENTENCE belonging to $\Sigma$, and $Px$ holds iff $d(x)$ is a SENTENCE whose negation is in $\Sigma$.
Thus, if $\Sigma$ is consistent $Px$ and $P'x$ are incompatible. Referring back to the definition of the four regions in Analysis 10.13, this means that, for a consistent $\Sigma$, Region III is empty. (The two discs in Fig. 5 do not overlap.)
On the other hand, if $\Sigma$ is the inconsistent theory, then $Px$ and $P'x$ hold for exactly the same numbers $x$, namely for any $x$ such that $d(x)$ is a SENTENCE. Thus in this case Regions I and II are empty. (The two discs in Fig. 5 coincide.)
Also, from Analysis 10.13 we find that (under the standard interpretation) the Gödel-Rosser sentence $\gamma(s_{\#\gamma})$ 'says':

An $\alpha$-witness that $P(\#\gamma)$ is found before an $\alpha'$-witness that $P'(\#\gamma)$.

However, as we observed in the proof of Thm. 14.6, $P(\#\gamma)$ means, by definition, that the sentence $\neg\gamma(s_{\#\gamma})$ is deducible from the given postulates of $\Sigma$; or, in other words, that $\gamma(s_{\#\gamma})$ itself is refutable from these postulates. Also, $P'(\#\gamma)$ means that $\gamma(s_{\#\gamma})$ is deducible from the postulates of $\Sigma$.
Thus $\gamma(s_{\#\gamma})$ 'says':

(*) An $\alpha$-witness that I am refutable from the postulates of $\Sigma$ is found before an $\alpha'$-witness that I am deducible from these postulates.

The proof of Thm. 14.6 shows that $\#\gamma$ cannot belong to either of the Regions I and II. Let us see why this is so.
Suppose $\#\gamma$ were in Region I. Then, as we saw in Analysis 10.13, $\gamma(s_{\#\gamma})$ must be true. Therefore (*) is a true statement. This implies that $\neg\gamma(s_{\#\gamma})$ is in $\Sigma$. On the other hand, the Main Lemma tells us that if $\#\gamma$ were in Region I then $\gamma(s_{\#\gamma})$ would be in $\Pi_1$, and hence in $\Sigma$, making $\Sigma$ inconsistent, in which case Region I is empty! So $\#\gamma$ cannot be in Region I.
Now suppose $\#\gamma$ were in Region II. Then the Main Lemma tells us that $\gamma(s_{\#\gamma})$ is refutable from the postulates of $\Pi_1$, hence also from those of $\Sigma$. Therefore there is an $\alpha$-witness that $\gamma(s_{\#\gamma})$ is refutable from the latter postulates. But since $\#\gamma$ is in Region II, we know from Analysis 10.13 that $\gamma(s_{\#\gamma})$ is false, so (*) is a false statement. This implies that although an $\alpha$-witness for the refutability of $\gamma(s_{\#\gamma})$ in $\Sigma$ can indeed be found, this does not happen before an $\alpha'$-witness for the provability of $\gamma(s_{\#\gamma})$ in $\Sigma$ is also found. This means that $\gamma(s_{\#\gamma})$ is both refutable and provable from the postulates of $\Sigma$, again making $\Sigma$ inconsistent, in which case Region II is empty. So $\#\gamma$ cannot be there either.
So $\#\gamma$ must be in Region III or in Region IV. The former happens if $\Sigma$ is the inconsistent theory. In this case $\gamma(s_{\#\gamma})$ may be true or false, depending on the precise form of $\alpha$ and $\alpha'$, and in particular on the (inconsistent) set of postulates by means of which $\Sigma$ is given.
If $\Sigma$ is a consistent theory, then Region III is empty, so $\#\gamma$ belongs to Region IV. From Analysis 10.13 we know that in this case $\gamma(s_{\#\gamma})$ is a false sentence. This can also be seen from the proof of Thm. 14.6, which shows that if $\Sigma$ is consistent then $\gamma(s_{\#\gamma})$ is neither provable nor refutable from the postulates of $\Sigma$. Therefore (*) is an untrue statement, and $\gamma(s_{\#\gamma})$ is a false sentence.

§ 15. The Second Incompleteness Theorem


We take Thm. 14.6 as our point of departure. So let Σ be an axiomatic
theory that includes Π₁. We let P, P′, α, α′, β, β′ and γ be as specified
in the proof of that theorem.
Part of what the theorem establishes is that

(1) If Σ is consistent then ¬γ(s_{#γ}) ∉ Σ.

We now look for a formalization of (1); in other words, we wish to
find an ℒ-sentence that, under the standard interpretation, ‘states’ (1).
This is in fact quite easy.
First, the words ‘if ... then’ are obviously formalized by the
implication symbol →.
Next, let us look at the clause ‘¬γ(s_{#γ}) ∉ Σ’. It states that the
sentence ¬γ(s_{#γ}), whose code-number is 64°d(#γ), is not in Σ.
Referring to the definition of P in the proof of Thm. 14.6, we see that
this amounts to saying that ¬P(#γ). But P is represented in Ω by the
formula α. Thus the statement that ¬P(#γ) is expressed formally by the
sentence ¬α(s_{#γ}), which ‘says’: P(#γ) does not hold. As we have just
seen, this means that ¬γ(s_{#γ}) ∉ Σ.
Now let us look at the clause ‘Σ is consistent’. This is equivalent to
saying that the sentence 0≠0 (the negation of the simplest logical
axiom) is not in Σ. An easy calculation, using Def. 6.3, shows that
#(0=0) = 32°2°2 = 522. Since 0=0 is a sentence, substituting any
term for v₁ in it leaves it unchanged, so by Thm. 6.8 we get
d(522) = #(0=0) = 522. Therefore #(0≠0) = 64°d(522). So, by the
definition of P, to say that 0≠0 ∉ Σ amounts to saying that ¬P(522).
This statement is expressed formally by the sentence ¬α(s₅₂₂), which
‘says’: P(522) does not hold. As this amounts to saying that Σ is
consistent, we put

Consis_Σ =df ¬α(s₅₂₂).

We have now got an ℒ-sentence that expresses (1) formally; it is

(2) Consis_Σ → ¬α(s_{#γ}).

Moreover, since (1) is a true statement (we have proved it!) it
follows that (2) is a true sentence; in other words, it belongs to Ω.
In fact, (2) belongs not only to Ω but even to FOPA. This can be
proved by examining the whole chain of (informal) reasoning that was
used to establish (1), and showing that it can be formalized:
reproduced step by step as a formal deduction in Fopcal from the
postulates of FOPA.
This process is rather tedious, as the chain of reasoning that estab-
lished (1) was very long: it includes the proofs of Thm. 14.6 itself as
well as of the theorems on which it depended. But each step is quite
easy. What makes the whole thing possible is the great strength of the
postulates of FOPA. We shall not present the proof here, but ask you
to accept the fact that
(3) Consis_Σ → ¬α(s_{#γ}) ∈ Π.

Referring to Prel. 10.11 (with n=1), it is easy to see that for any
number k we have both γ(s_k) ⊢ ∃yβ(s_k) and ∃yβ(s_k) ⊢ α(s_k). Hence
⊢ ¬α(s_k) → ¬γ(s_k). Using this fact for k = #γ, it follows from (3) that

(4) Consis_Σ → ¬γ(s_{#γ}) ∈ Π.

So far, we have assumed Σ to be an axiomatic theory that includes Π₁.
Now let Σ be an axiomatic theory that includes Π; then it certainly
includes Π₁, so (4) holds. Moreover, since Π ⊆ Σ, we have

(5) Consis_Σ → ¬γ(s_{#γ}) ∈ Σ.

15.1. Theorem (Second Incompleteness Theorem)

Let Σ be an axiomatic theory that includes FOPA. If Σ is consistent,
then the sentence Consis_Σ, which expresses this fact formally, is not
in Σ.

PROOF

If Consis_Σ ∈ Σ then by (5) also ¬γ(s_{#γ}) ∈ Σ. But then by Thm. 14.6 it
follows that Σ is inconsistent. ■

15.2. Remarks
(i) The Second Incompleteness Theorem can be extended to all
sufficiently strong formal theories, in ℒ and other languages. All
that is required is that the theory in question is axiomatic, and
includes an appropriate ‘translation’ of Π. For example, this
result applies to all the usual formalizations of set theory, such as
ZF.
(ii) The result means that the consistency of any sufficiently strong
consistent axiomatic theory cannot be proved by means of argu-
ments that are wholly formalizable within that theory.
(iii) This poses a grave difficulty for the formalist view of mathema-
tics. For a brief discussion of this, see B&M, p. 358f.
(iv) In particular, if ZF is consistent, a proof of this fact cannot be
carried out within ZF itself. For this reason, it is extremely
unlikely that an intuitively convincing consistency proof for ZF
can ever be found.

Gödel’s two Incompleteness Theorems have had a profound and
far-reaching effect on the subsequent development of logic and
philosophy, particularly the philosophy of mathematics.
Appendix: Skolem’s Paradox

§1. Set-theoretic reductionism


Zermelo’s 1908 paper,¹ in which he proposed his axioms for set theory,
begins with the words:

‘Set theory is that branch of mathematics whose task is to investigate
mathematically the fundamental notions “number”, “order”, and
“function”, taking them in their pristine, simple form, and to develop
thereby the logical foundation of all arithmetic and analysis; thus it
constitutes an indispensable component of the science of mathematics.’

This comes close to saying—but does not quite say—that set theory is
the sole foundation of the whole of mathematics. But soon such radical
claims were voiced. In 1910 Hermann Weyl² put forward the view that
the whole of mathematics ought to be reduced to axiomatic set theory.
Each notion in the other branches of mathematics must be defined
explicitly in terms of previously defined notions. This regress stops
with set theory; ultimately all mathematical notions are to be defined
in set-theoretic terms.

‘So set theory appears to us today, in logical respects, as the proper
foundation of mathematical science, and we will have to make a halt with
set theory if we wish to formulate principles of definition which are not
only sufficient for elementary geometry, but also for the whole of
mathematics.’

The basic set-theoretic notions (set and membership) cannot be
defined explicitly, for this would lead to infinite regress. They, alone
of all mathematical notions, have to be characterized implicitly by
means of an axiom system. Thus axiomatic set theory (more or less along
the lines proposed by Zermelo) becomes the ultimate framework for the
whole of mathematics.

¹ Cited in §2 of Ch. 1.
² The paper, ‘Über die Definitionen der mathematischen Grundbegriffe’, is
reprinted in his Gesammelte Abhandlungen (1968). In this paper Weyl
outlines a characterization of the notion definite property, which he
was to make more precise eight years later in Das Kontinuum (cited in §2
of Ch. 1). The lines quoted here were translated by Michael Hallett.
Although Weyl was to change his mind, the reductionist view he had
expressed in 1910 was rapidly becoming very widespread among
mathematicians.
It was this reductionism that Skolem set out to criticize in 1922. His
short paper¹ (the text of an address delivered at a congress of
Scandinavian mathematicians) contains a lucid presentation of an
astonishing wealth of logical and set-theoretic ideas and insights.² But
in Skolem’s own view the most important result in his paper is what came
to be known as Skolem’s Paradox. It is the first of the fundamental
limitative results in logic. In a Concluding Remark he comments on it:

‘I had already communicated it orally to F. Bernstein in Göttingen in the
winter of 1915-16. There are two reasons why I have not published
anything about it until now: first, I have in the meantime been occupied
with other problems; second, I believed that it was so clear that
axiomatization in terms of sets was not a satisfactory ultimate
foundation of mathematics that mathematicians would, for the most part,
not be very much concerned with it. But in recent times I have seen to
my surprise that so many mathematicians think that these axioms of set
theory provide the ideal foundation for mathematics; therefore it seemed
to me that the time has come to publish a critique.’

§2. Hugh’s world


In what follows we shall deal with ZF set theory; and for the sake of
simplicity we shall exclude individuals, so that all objects are assumed
to be sets. But a similar treatment, with very few minor modifications,
can be applied to the other axiomatizations of set theory, with or
without individuals.
As mentioned in §2 of Ch. 1, in order to make axiomatic set theory
conform to the highest standard of rigour and to bar the linguistic as
well as the logical antinomies, the theory must be formalized.
We shall assume that ZF is formalized in a first-order language ℒ
with equality, whose only extralogical symbol is a binary predicate
symbol ∈. In the intended interpretation of ℒ, the variables range over
all sets and the symbol ∈ is interpreted as denoting the relation ∈ of
membership between sets. We shall write, for example, ‘x∈y’ rather than
‘∈xy’.

¹ Cited in §2 of Ch. 1.
² Including the conjectures that it would ‘no doubt be very difficult’ to
prove the consistency of Zermelo’s axioms; and that the Continuum
Hypothesis is ‘quite probably’ undecided by them. These conjectures have
indeed been vindicated: the former in 1931 by Gödel’s Second
Incompleteness Theorem (see §15 of Ch. 10); and the latter in 1963 by
P. J. Cohen’s result (cf. Rem. 6.2.14).

Let 𝐙𝐅 be the formalized version of ZF. The postulates and
theorems of ZF are expressed in 𝐙𝐅 by ℒ-sentences. For example, the
Principle of Extensionality (for sets) is expressed by

(PX) ∀x∀y{∀z[z∈x ↔ z∈y] → x=y},

where x, y and z are distinct variables. (In 𝐙𝐅 there is no need for
classes; instead, one can use properties, expressed by ℒ-formulas.)
From the formal postulates of 𝐙𝐅, formal versions of the theorems of
set theory can be deduced in Fopcal.
In particular, from the postulates of ZF we can deduce a formal
version of the theorem that there exists an uncountable set. This
theorem follows logically from the existence of a denumerable set (for
example, ω; see Thm. 4.3.4 and Def. 4.5.13) and Cantor’s Thm. 3.6.8.
Let us assume that ZF is consistent. If it isn’t — which in any case is
highly unlikely — then the very idea of reducing to it the whole of
mathematics is quite pointless.
Since the language ℒ is denumerable, it follows from Thm. 8.13.9
that ZF has a model U (an ℒ-structure, or ℒ-interpretation, under
which all the sentences of ZF are true) whose universe U is countable
(cf. Def. 4.5.13).¹
It is easy to show that U cannot be finite. This can be done even
without invoking the Axiom of Infinity. Instead, it is enough to point
out that the formal version of Prob. 3.3.3 must hold in U. So we may
assume that U is denumerable.
Note that we are not saying that every model of ZF has a denumer-
able universe; only that among the models of this theory (assuming it is
consistent) there is a model U whose universe is denumerable.
What does the model U consist of? First, there is the universe U,
which serves as the range of values for the variables of ℒ. In other
words, the members of U (that is, the individuals of the structure U)
are what the structure U interprets as ‘sets’. We shall say that the
members of U are U-sets.
Second, there is the binary relation ∈^U. For brevity, let us put
E = ∈^U. E is a binary relation on U, that is, a binary relation among
U-sets; it serves as the interpretation of ∈ in the structure U. We shall
say that E is the relation of U-membership. We shall write, for
example, ‘aEb’ when we wish to say that the U-set a bears the relation
E to the U-set b.

¹ In 1922 Fopcal had not been finalized (this was done in 1928 by David
Hilbert and Wilhelm Ackermann). When Skolem assumes ZF to be
‘consistent’, he means that it is satisfiable. He then invokes the
Löwenheim–Skolem Theorem (which he proves directly, using relatively
elementary means) to obtain a denumerable model for ZF.
The U-sets are not necessarily sets in the usual intuitive sense, and
the relation E is not necessarily a relation of membership in the usual
intuitive sense. Rather, U-sets are sets in the sense of the model U, and
the relation E of U-membership is the relation of membership in the
sense of U. Nevertheless, since U is a model of ZF, all the postulates of
ZF are true in U; in other words, they hold for U-sets and U-member-
ship just as they presumably hold for ‘true’ sets and ‘true’ membership.
The same applies of course to all the theorems of ZF, that is, to all
ℒ-sentences deducible from the postulates.
Let us imagine an internal observer, called Hugh, who ‘lives’ in the
structure U. Hugh can observe the U-sets; they are the objects of his
world. He can also observe whether or not aEb holds for any such
objects a and b. Let us also imagine that we can communicate with
Hugh and transmit to him ℒ-formulas, and in particular the postulates
of ZF. He can then check and confirm that, as far as his observations
go, these postulates (and indeed all ℒ-sentences deduced from them
using Fopcal) are true under the interpretation U, in which the
variables are regarded as ranging over U and the predicate symbol ∈ is
interpreted as denoting the relation E.
Hugh has heard that ZF is ‘axiomatic set theory’. He therefore
comes to the conclusion that the theory is really about the objects of
his world and the relation E. He comes to believe that the ‘sets’ and
the ‘membership relation’ about which the theory speaks are these
objects and the relation E (which for us are merely U-sets and
U-membership). We try to tell him that the theory is intended to be
about real sets and the real membership relation ∈. But he has no
reason to believe us. For one thing, he has no notion of what we call
‘real’ sets and ‘real’ membership — they are not real to him. Moreover,
since his observations confirm that the postulates of ZF are true under
his interpretation, why should he believe us that the theory is ‘really’
about some other reality?
Note that the whole idea of an axiomatic theory is that nothing must
be assumed concerning the objects and relations about which the
theory speaks, except what is stipulated by the postulates of the
theory. An axiomatic theory cannot say more than what can be
logically deduced from its postulates. The postulates, and they alone,
must determine whether or not a given interpretation of the extra-
logical symbols of the theory is legitimate: an interpretation is legiti-
mate iff it satisfies the postulates.

Hugh, whose outlook is confined to his small provincial world,
cannot understand our talk of ‘real’ sets and ‘real’ membership. But we,
broad-minded people living in the big world, can understand his talk
of ‘sets’ and ‘membership’. We only have to remember that by ‘set’ he
means what we think of as a U-set, and by ‘membership’ he means the
relation E.
Actually, we can even translate his talk of [what are in reality] U-sets
and the relation E to talk about genuine sets and membership. This is
done as follows. For each U-set a, let us define:

(1) â = {x : xEa}.

We call â the E-extension of a. Clearly, â is a genuine set, in fact a
subset of U; and we have, for all x,

(2) x ∈ â ⇔ xEa.

Moreover, the correspondence between U-sets and their respective
E-extensions is one-to-one. This follows from the fact that U, being a
model of ZF, must satisfy the postulate PX. If a and b are two U-sets
such that the sets â and b̂ are equal, then it follows from (2) that a and
b have exactly the same U-members. But the postulate PX, as inter-
preted in U, says that any two U-sets that have exactly the same
U-members are equal. Hence a and b are equal.
Any statement about U-sets and the relation E can be rephrased in
terms of E-extensions (which are real sets) and real membership.
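
The passage from U-sets to their E-extensions can be tried out on a small finite structure. This is only an illustrative sketch: the names `e_extension` and `satisfies_extensionality` and the sample data are hypothetical, and a finite structure is of course not a model of ZF. It merely shows that when extensionality holds, distinct objects get distinct E-extensions.

```python
def e_extension(a, E):
    """Return the E-extension of a: the genuine set {x : x E a}."""
    return frozenset(x for (x, y) in E if y == a)


def satisfies_extensionality(U, E):
    """True iff no two distinct members of U have the same E-extension."""
    extensions = [e_extension(a, E) for a in U]
    return len(set(extensions)) == len(extensions)


# A toy universe with a toy 'membership' relation E (pairs (x, a) mean x E a).
U = {0, 1, 2, 3}
E = {(0, 1), (0, 2), (1, 2), (2, 3)}

# A relation that violates extensionality: 1 and 2 have the same 'members'.
E_bad = {(0, 1), (0, 2)}
```

Here `satisfies_extensionality(U, E)` holds, so the map sending each object to its E-extension is one-to-one, whereas `E_bad` gives the objects 1 and 2 the same E-extension.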

§3. The paradox and its resolution


We have already observed that all the theorems of ZF must be true in
U. Among these theorems there is, as we have noted, a sentence that
says ‘there exists an uncountable set’. In fact, Hugh — who is a
competent logician and has been able to deduce this theorem — can
point at a particular U-set c that instantiates the theorem: he can show
that c has ‘uncountably many members’. Naturally, we know that what
Hugh regards as ‘members’ of c are really just U-members of c; in
other words, they are U-sets that bear the relation E to c. But how can
this be? The whole universe U of U contains only denumerably many
objects; therefore for any a there can only be countably many objects
bearing the relation E to a. So how can there be uncountably many
objects bearing the relation E to c?
This seeming contradiction is Skolem’s Paradox.
In fact, the contradiction is only apparent. The resolution of the
paradox depends on the fact that many important set-theoretical
notions, such as countability, are relative. Thus, a U-set c may be
uncountable in the sense of the structure U, although when viewed from
the outside c has only countably many U-members.
Let us explain how this comes about. First, let us recall what it
means for a set to be countable. By Prob. 4.5.14, a set C is countable
iff there exists an injective function from C to the set ω of finite
ordinals (which in set theory play the role of natural numbers). Recall
that such a function is itself a set. To say that f is an injective function
from C to ω means that f is a set of ordered pairs of the form ⟨x, ξ⟩
with x ∈ C and ξ ∈ ω, such that for each x ∈ C there is exactly one
ξ ∈ ω for which ⟨x, ξ⟩ ∈ f, and for each ξ ∈ ω there is at most one
x ∈ C for which ⟨x, ξ⟩ ∈ f.
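
The three conditions just listed can be checked mechanically when everything is finite. The sketch below is an illustration only: the function name `is_injection` is hypothetical, and a finite initial segment of the natural numbers stands in for ω, which is an assumption made so that the check terminates.

```python
def is_injection(f, C, codomain):
    """Check that the set of pairs f is an injective function from C to codomain."""
    # Every pair has the form (x, xi) with x in C and xi in the codomain.
    pairs_ok = all(x in C and xi in codomain for (x, xi) in f)
    # For each x in C there is exactly one xi with (x, xi) in f.
    functional = all(sum(1 for (x, _) in f if x == c) == 1 for c in C)
    # For each xi there is at most one x with (x, xi) in f.
    one_to_one = all(sum(1 for (_, xi) in f if xi == k) <= 1 for k in codomain)
    return pairs_ok and functional and one_to_one


C = {"a", "b", "c"}
omega_segment = range(10)  # finite stand-in for omega (assumption)

f = {("a", 0), ("b", 2), ("c", 5)}            # an injection
g = {("a", 0), ("b", 0), ("c", 1)}            # not one-to-one
h = {("a", 0), ("a", 1), ("b", 2), ("c", 3)}  # not a function
```

The point of the surrounding discussion is that such an f is itself a set, so whether one ‘exists’ depends on which sets are available to the observer.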
So, to say that C is countable means that there exists a set f having
the properties just mentioned. But we must realize that existence of
such-and-such a set may mean quite different things, depending on
whether we interpret this phrase inside the structure U or in the outside
‘real’ world.
We have seen above that to each U-set a there corresponds the real
set â, which is a subset of U. Now, it is easy to see that the converse is
not generally true: if A is an arbitrary subset of U, there may not exist
any U-set a such that â = A. Indeed, the mapping that maps each U-set
a to its E-extension â is an injection from the set U to its own power
set; so by Cantor’s Theorem it cannot be surjective.¹
Let A be a subset of U, that is, a set of U-sets. Then A is an object
in our world, the world of external observers. But if A is not â for any
U-set a, then there is no object in the world U of the internal observer
Hugh that corresponds to A. The set A is then purely external; it
corresponds to nothing in Hugh’s ontology.

¹ Note the ironic double role played by Cantor’s Theorem. On the one
hand, the fact that Cantor’s Theorem holds inside U (that is, under the
interpretation U) gave rise to the paradox in the first place, because it
was used to give us an uncountable set (in the sense of U). Now we are
using the fact that Cantor’s Theorem holds ‘in the real world’ in order
to resolve the paradox.
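
The appeal to Cantor’s Theorem can be made concrete by the usual diagonal construction: the set of all objects that do not bear E to themselves is never the E-extension of any object. The sketch below runs this on a small finite structure; the function names and the sample data are hypothetical, and the finite setting is only a stand-in for the denumerable universe discussed in the text.

```python
def e_extension(a, E):
    """Return the E-extension of a: the genuine set {x : x E a}."""
    return frozenset(x for (x, y) in E if y == a)


def diagonal_subset(U, E):
    """Return {x in U : not x E x}, the diagonal set of Cantor's argument."""
    return frozenset(x for x in U if (x, x) not in E)


def is_purely_external(A, U, E):
    """True iff the subset A of U is not the E-extension of any member of U."""
    return all(e_extension(a, E) != A for a in U)


# Toy structure; pairs (x, a) mean x E a, and E is a relation on U.
U = {0, 1, 2, 3}
E = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 3)}

D = diagonal_subset(U, E)
```

Whatever relation E on U is chosen, `is_purely_external(D, U, E)` holds: if D were the E-extension of some a, then a would belong to D exactly when it does not.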

Let us see how these observations help to resolve the paradox. In his
universe, Hugh finds an object ω^U that is ‘the set of finite ordinals’ in
his sense (ω^U satisfies, in the interpretation U, the formal set-theoretic
definition of the set of finite ordinals). Of course, ω^U may not ‘really’
be the set of finite ordinals; but it is quite easy to see that its
E-extension is in fact denumerable. Now, Hugh has found another
object (U-set) c, which serves as the U-power-set of ω^U, and he can
prove that c is uncountable. We, on the other hand, can prove that c
has only countably many U-members. Who is right?
In fact, both he and we are right. He is right because there does not
exist any U-set that constitutes an injection from c to ω^U in the sense
of the interpretation U. We, on the other hand, are right because the set
ĉ (the E-extension of c) is countable in the sense of our external
world. In fact, we can prove that there exists an injection f from ĉ to
the E-extension of ω^U. However, this f is purely external; it exists in
the outside world, but it cannot be the E-extension of any U-set.
Indeed, if f were not purely external then it would be quite easy to
show that c is countable in the sense of U.
So the paradox is resolved — but not very happily. It is disappointing
to find that axiomatic set theory, if consistent, has such perverse
models, in which an object that is really quite modest in size can seem
huge.
As Skolem himself pointed out, countability is by no means the only
important set-theoretic notion that is relative in this sense. For exam-
ple, the notion of finiteness is also relative: we can have a model U
(even a denumerable one) in which a U-set a may be finite in the
internal sense of U, while in fact a has infinitely many U-members.
Indeed, by an argument like that used in the proof of Skolem’s Thm.
10.3.8 we can show that ZF has a model U (with denumerable universe)
such that the object ω^U, the U-set-of-finite-ordinals, is nonstandard.
This means that, in addition to U-members of the form n^U for each
natural number n (that is, U-cardinals corresponding to the natural
numbers), ω^U also has U-members that do not correspond to any
natural number. If a is such a nonstandard U-member of ω^U then a is
a U-finite-ordinal: it satisfies in U the formal definition of the notion
finite ordinal (the formalization of the first part of Def. 4.3.1). In
particular, a is U-finite. But, as seen from outside U, a actually has
infinitely many U-members, and so its E-extension â is really (really?)
an infinite set! (Cf. Warning 6.1.9.)
This has an important bearing on the issue raised in Rem. 10.3.10 in
connection with Skolem’s Theorem. The theorem says that the
structure 𝔑 of natural numbers cannot be characterized uniquely (up to
isomorphism) in the first-order language of arithmetic.
Now, Dedekind showed that the system of natural numbers can be
characterized uniquely in set-theoretic terms (cf. Rem. 4.3.8(i)). Fol-
lowing him, Peano also formulated his axiomatization of that system
using variables ranging over all sets of natural numbers (cf. Rem.
10.13.5(iii)). These, then, are characterizations of the system of natural
numbers within an ambient set theory. And they seem to work, in the
sense that in a sufficiently strong set theory it can be shown that
Peano’s axioms have (up to isomorphism) a unique model (cf. Rem.
Osis)
However, these set-theoretic characterizations are all relative: they
merely pass the buck to set theory. And now we see that set theory
itself has strange (nonstandard) models. Hugh may be very pleased to
find that in his world there is (essentially) just one ‘system of natural
numbers’ satisfying Peano’s second-order postulates. But we, from our
external vantage point, can see that this U-system-of-natural-numbers
is in fact (in fact?) nonstandard, containing infinite unnatural numbers,
which merely seem finite to Hugh.

It turns out that axiomatic set theory is unable to characterize some of
the most basic notions of mathematics, including intuitive set-theoretic
notions, except in a merely verbal sense. If mathematics, and in
particular the arithmetic of natural numbers, is more than mere verbal
discourse, then its reduction to axiomatic set theory somehow fails to
do it full justice.
General index

References are given to the places where a term is defined, re-defined or explained.
A reference of the form x.y is to Section y of Chapter x. A reference of the form x.y.z is
to item z in Section y of Chapter x.

A2, see Pairing, Axiom of BSD, see Basic Semantic Definition


AC, see Choice, Axiom of Burali—Forti Paradox, 1.2, 4.2.19
Affirmation of the Consequent, Law of,
Teel Cantor’s Theorem, 3.6.8
Agreement of valuations, 8.5.2 Cardinal) 3:1-3;621:23.6:24
Al, see Infinity, Axiom of Cardinality, 3.1.3, 6.1.2, 6.2.1
Aleph, 6.2.11 of language, 8.7.16
Alphabetic change of variable, 8.6.10 Cartesian power, 2.1.12
Alphabetic order, 8.1.1 Cartesian product, 2.1.12
Antecedent, 7.1.4 Chain, 5.2.9
Anti-symmetry, 2.3.7 Choice, Axiom of, 5.1.2
Anti-symmetry, weak, 2.3.7 Choice function, 5.1.1
AP, see Power set, Axiom of Church’s Theorem, 10.12.7
AR, see Replacement, Axiom of Church’s (Church—Turing) Thesis, 9.3.11,
Argument in atomic formula, 8.1.7 10.12
Argument in term, 8.1.5 Clavius’ Law, 7.5.7
Arithmetical function, 10.5.17 Closed term, 8.5.6
Arithmetical relation, 10.5.2 Code number, 10.6.3, 10.8.6
Arithmos, 0.4.3 Coherence condition, 2.4.8
AS, see Subsets, Axiom of Combination, see Propositional
Associative Law of Conjunction, 7.5.13 combination
Associative Law of Disjunction, 7.5.13 Compactness Theorem (first-order),
Atomic formula, 8.1.7 8.13.12
AU, see Union set, Axiom of Compactness Theorem (propositional),
Axiom, first-order, 8.9.1-8.9.8 7.13.4
Axiom, propositional, 7.6.3—7.6.7 Complement, 1.4.3
Axiomatic theory, 10.8.2 Complete first-order arithmetic, 10.2.14
Axiomatizable theory, 10.8.2 Complete theory, 10.2.12
Comprehension Principle, 1.2
Composite arithmos, 0.4.3
Baby arithmetic, 10.9 Composition of functions, 2.4.1
Basic operation of structure, 8.3.5, 8.4.2 Compute machine, 9.3.9
Basic Semantic Definition, 8.4.6 Computable function, 9.3.9
Basic relation of structure, 8.3.5, 8.4.2 Computable relation, 9.3.1
Basis of inductive proof, 0.2 Computer, 9.2
Bi-implication, 7.3.1 Conclusion of modus ponens, 7.6.1
Bijection, 2.2.4 Conjunct, 7.2.5
Boolean operation, 9.1.3 Conjunction operation, 9.1.3
Bound occurrence of variable, 8.5.7 Conjunction formula, 7.2.5

284
General index 285

Connective, 7.1.1, 8.1.1 Existential Instantiation, Rule of, 8.11.6


Consequent, 7.1.4 Existential quantification, 9.1.3
Consistency of Propcal, 7.8.5 Exponentiation, see Power of cardinals
Consistency, 7.8.1, 8.9.10 Expression (in first-order language), 8.1.9
Constant, individual, 8.1.1 Extension of property, 0.1, 1.1.5
Contain, 1.1.4 Extensionalism, 1.1.7
Continuum, 6.2.14 Extensionality, Principle of, 1.1.6
Continuum Hypothesis, 6.2.14 Extralogical axiom, see Postulate
Contradictory pair, 7.8.1 Extralogical symbol, 8.1.1
Contraposition, Law of, 7.5.13
Countable set, 4.5.13 False sentence (in first-order language of
Cut [Rule], 7.6.13 arithmetic), 10.1.9
Finite character, 5.2.7
De Morgan’s Laws, 7.5.13 Finite ordinal, 4.3.1
Decidability, 10.12 Finite set, 4.3.5
Decide machine, 9.3.1 First-order language, 8.1.1
Decision problem, 10.12 of arithmetic, 10.1.3
Deducibility, 7.6.9, 8.9.10 First-order Peano arithmetic, 10.13
Deduction, 7.6.8, 8.9.10 First-order predicate calculus, 8.9.10
Deduction Theorem, 7.7.2 FOPA, see First-order Peano arithmetic
Deductive closure, 10.2.3 Fopcal, see First-order predicate calculus
Degree [of complexity] Formula, 7.1.4, 8.1.7
of term, 8.1.6 Foundation, Axiom of, 4.2.19
of formula, 7.1.7, 8.1.8 Free occurrence of variable, 8.5.7
Denial of the Antecedent, Law of, 7.5.7 Free variable, 8.5.7, 8.10.3
Denumerable set, 4.5.13 Freedom for substitution, 8.6.7
Designated individual of structure, 8.3.5, Function.2.2 anos251 Oma,
8.4.2 Function symbol, 8.1.1
Diagonal, 2.3.3 Functionality condition, 2.2.1
Diagonal function, 10.6.7
Difference, 1.4.4 Gédel number, see Code number
Diophantine equation, 10.14.4 Gédel’s Incompleteness Theorems, see
Diophantine relation, 9.5.3 Incompleteness Theorem
Direct product 3.5.9 Graph of function, 2.2, 9.3.9, 10.5.17
Direct sum, 3.4.12
Disjoint, 3.4.1 Hausdorff Maximality Principle, 5.2.11
Disjunct, 7.2.5 Henkin set, 8.13.2
Disjunction formula, 7.2.5 Hintikka set (first-order), 8.7.1
Disjunction operation, 9.1.3 Hintikka set (propositional), 7.10.2
Domain of function, 2.2.2 HMP, see Hausdorff Maximality Principle
Domain of structure, 8.3.5, 8.4.2 Hypothesis (of deduction), 7.6.8
DT, see Deduction Theorem
Identity, 2.3.3
EG, see Existential Generalization, Rule IE, see Inconsistency Effect
of Image (of class under map), 2.4.6
EIC, see Existential Instantiation, Rule of Immediate successor (number), 0.1.3
Element, 1.1.4 Immediate successor (ordinal), 4.2.26
Elementary relation, 9.5.3 Implication operation, 9.1.3
Embedding of structures, 10.3.4 Implication formula, 7.1.4
Empty class, 1.3.1 Implication symbol, 7.1.1, 8.1.1
Enumerate machine, 9.3.3 Impredicativity, 1.2
Equality symbol, 8.1.1 Incompleteness Theorem
Equation, 8.1.7 First (semantic version), 10.14.2
Equipollence, 3.1.1 First (Gédel—Rosser), 10.14.6
Equivalence class, 2.3.4 Second, 10.15.1
Equivalence relation, 2.3.2 Inconsistency, 7.8.1, 8.9.10
Existential Generalization, Rule of, 8.10.2 Inconsistency Effect, 7.8.6
286 General index

Indexed family, 3.4.9 Modus ponens, 7.6.1


Individual (in set theory), 1.1.3 Monomial, 9.5.2
Individual constant, see Constant, MRDP Theorem, 9.5.4
individual
Individual of structure, 8.3.5, 8.4.2
Negation formula, 7.1.4
Induction hypothesis, 0.2
Induction, Principle of Negation operation, 9.1.3
Negation symbol, 7.1.1, 8.1.1
Strong (on numbers), 0.3
Numerical value, 10.1.9
Strong (on ordinals), 4.4.2, 4.4.5
Weak (on finite ordinals), 4.3.7
Weak (on numbers), 0.2, 6.1.8 Object language, 7.1, 8.1, 10.1.3
Weak (on ordinals), 4.4.4, 4.4.6 One-to-one (map), 2.2.4
Inductive set, 10.13.4 Onto (map), 2.2.4
Infinite ordinal, 4.3.1 Operation, 8.3.2, 8.3.4
Infinite set, 4.3.5 Ordered pair, 2.1.2, 2.1.4
Infinity, Axiom of, 1.3.21 Ordered tuple, 2.1.7, 2.1.9, 2.1.10
Injection, 2.2.4 Ordinal, 4.2.12
Intended interpretation, see Standard Ordinal recursion equation, 4.6.3
interpretation
Interpretation (of first-order language),
8.4.2 Pair (unordered), 1.3.3
Intersection, 1.4.1, 1.4.2 see also Ordered pair
Into (map), 2.2.4 Pairing, Axiom of, 1.3.2
Inverse of function, 2.4.3 Parametrically definable set, 10.13.3
Isomorphism of posets, 4.5.4 Partial order, blunt, 2.3.7
Isomorphism of structures, 10.3.4 Partial order, sharp, 2.3.7
Partially ordered set, 4.5.2
Peano’s postulates, 6.1.8
Junior arithmetic, 10.10
Peirce’s Law, 7.5.7
PIP, see Principle of Indirect Proof
Least member, 0.4, 4.2.1 Polish notation, 7.2
Least Number Principle, 0.4 Polynomial, 9.5.2
Least Ordinal Principle, 4.4.1, 4.4.5 Poset, see Partially ordered set
Least upper bound, 4.2.23 Postulate, 7.6.10, 10.2.7
Liar Paradox, 10.7.5 Power class, 1.3.17
Limit ordinal, 4.2.28 Power of cardinals, 3.6.3
Linear calculus, 7.6.10 Power set, Axiom of, 1.3.18
LNP, see Least Number Principle Predicate symbol, 8.1.1
Logical consequence, 8.4.10 Prenex formula, 8.8.1
Logical equivalence, 8.4.10 Prenex normal form[ula], 8.8.1
Logical operation, 9.1.3 Prime arithmos, 0.4.3
Logical symbol, 8.1.1 Prime component, 7.1.6, 8.2.6
Logically true formula, 8.4.10 Prime formula, 7.1.4, 8.2.5
Logically valid formula, 8.4.10 Primitive symbol, 7.1.1, 8.1.1
Löwenheim-Skolem Theorem, 8.13.13 Principle of Indirect Proof, 7.8.15
Lub, see Least upper bound Product of cardinals, 3.5.1, 3.5.11
Proof, propositional, 7.6.8
Major premiss of modus ponens, 7.6.1 Propcal, see Propositional calculus
Map, 2.2.1 Proper class, 1.3
Mapping, 2.2.1 Proper inclusion, 1.3.4
Mathematical induction, see Induction, Proper subclass, 1.3.4
Principle of (on numbers) Property, 2.1.14, 9.1.2
Maximal consistency, 7.12.1, 8.9.10 Propositional calculus, 7.6, 7.6.10
Maximal member, 5.2.3 Propositional combination, 7.3.1
Member, 1.1.4 Propositional operation, 9.1.3
Metalanguage, 7.1 Propositional symbol, 7.1.1
Minor premiss of modus ponens, 7.6.1 Provability, 7.6.9, 8.9.10
Model, 8.5.10 PX, see Extensionality, Principle of

Range of function, 2.2.2 Structuralism, 1.1.7, 8.3.8


R.e., see Recursively enumerable Structure (for first-order language), 8.4.2
Recursive decidability, 10.12.1 Structure, mathematical, 8.3.5
Recursive function, 9.3.9 Subclass, 1.3.4
Recursive relation, 9.3.1 Subformula, 8.4.13
Recursive set of sentences, 10.8.1 Subsets, Axiom of, 1.3.6
Recursive undecidability, 10.12.1 Substitution, 8.6.1, 8.6.7, 8.6.15
Recursive unsolvability, 10.12.1 Successor ordinal, 4.2.28
Recursively enumerable relation, 9.3.3 Sum of cardinals, 3.4.4, 3.4.11
Recursively enumerable set of sentences, Surjection, 2.2.4
10.8.1 Symmetry, 2.3.2
Reductio [ad absurdum], 7.8.9
Reduction of theory, 10.12.11
Tarski’s Theorem, 10.7.4
Reflexivity, 2.3.2
Tautology, 7.4.4
Relation, 2.1.14, 9.1.2, 10.4.2
Tautological consequence, 7.4.4
Replacement, Axiom of, 2.2.9
Tautological equivalence, 7.5.11
Representation, numeralwise, 10.7.6 erm S.l.o
Representation, strong/weak, 10.4.7 Theorem (formal), 7.6.9, 8.9.10
Representability, strong/weak, 10.4.7 Theory, 10.2.5, 10.2.10
Restriction of function, 2.4.6 Total order, blunt, 2.3.11
Restriction of ∈, 4.2.5 Total order, sharp, 2.3.11
Restriction of C, 2.3.8
Totally ordered set, 4.5.2
Restriction of C, 2.3.8, 5.2.5 Transfinite induction, see Induction,
Restriction of well-ordering, 4.5.8 Principle of (on ordinals)
Revaluing, 8.4.5 Transfinite recursion, see Recursion,
Russell’s Paradox, 1.2 transfinite
Transitivity of class, 4.2.10
Satisfaction, 7.4.4, 8.4.9, 8.5.10 Transitivity of relation, 2.3.2
Satisfiability, 7.4.4, 8.4.10 Trichotomy, 0.1.2, 2.3.11
Schröder-Bernstein Theorem, 3.2.7 True sentence (in first-order language of
Scope of quantifier, 8.1.7 arithmetic), 10.1.9
Segment of well-ordered set, 4.5.8 Truth definition (inside theory), 10.7.8
Self-distributive Law of Implication, 7.5.7 Truth table, 7.5.1
Semantic completeness Truth value, 7.4.1
of truth tables, 7.5.9 Truth value of formula, 7.4.2, 8.4.6
strong, of Fopcal, 8.13.10 Truth valuation, 7.4.2
strong, of Propcal, 7.13.2 TT Lemma, see Tukey-Teichmüller
weak, of Propcal, 7.9.4 Lemma
Semantic soundness Tukey-Teichmüller Lemma, 5.2.8
of Fopcal, 8.9.14 Type theory, 1.2
of modus ponens, 7.6.2
of Propcal, 7.6.12
of truth tables, 7.5.6 UGC, see Universal Generalization, Rules
Sentence, 8.5.10 of
Sheffer’s stroke, 7.5.16 UGV, see Universal Generalization, Rules
Similarity (map), 4.5.4 of
Simple existential formula/sentence, UI, see Universal Instantiation, Rule of
10.9.10 Undecidability, 10.12
Singleton, 1.3.3 Underlying structure of valuation, 8.4.4
Skolem’s Paradox, Appendix §3 Union, 1.3.11, 1.3.14
Skolem’s Theorem, 10.3.8 Union set, Axiom of, 1.3.12
Sound set of sentences, 10.2.14 Universal class, see Universe of discourse
Standard interpretation (set theory)
(of first-order language of arithmetic), Universal formula, 8.1.7
10.1.8 Universal Generalization, Rules of,
Standard structure, 10.3.5 8.10.5, 8.10.10
see also Standard interpretation Universal Instantiation, Rule of, 8.10.1
String, 6.3.8, 7.1.3, 8.1.4 Universal quantification, 9.1.3

Universal quantifier, 8.1.1 Variable of quantification, 8.1.7


Universe of discourse (set theory), 1.3, Variant, 8.6.13
8.3.7
Universe of discourse (first-order logic), Weight, 7.1.9, 8.2.1
8.4.2 Well-ordered set, 4.5.2
Universe of mathematical structure, 8.3.5 Well-ordering, 4.2.3
Universe of valuation, 8.4.4 ∈-well-ordering, 4.2.7
Upper bound, 4.2.23 Well-Ordering Theorem, 5.1.5, 5.1.6
Wirelementayie tno Witness m-tuple of numbers, 10.9.16
Witness term, 8.11
Witnessing formula, 8.11
Valuation of first-order language, 8.4.4 WOT, see Well-Ordering Theorem
Value of function, 2.2.2
Value of term under valuation, 8.4.6 Zermelo—Fraenkel set theory, 1.2
Value of variable under valuation, 8.4.4 ZF, see Zermelo—Fraenkel set theory
Variable, 8.1.1 Zorn’s Lemma, 5.2.12
This is an introduction to set theory and logic that starts completely
from scratch. The text is accompanied by many methodological
remarks and explanations. A rigorous axiomatic presentation of
Zermelo-Fraenkel set theory is given, demonstrating how the basic
concepts of mathematics have apparently been reduced to set
theory. This is followed by a presentation of propositional and first-
order logic. Concepts and results of recursion theory are explained in
intuitive terms, and the author proves and explains the limitative
results of Skolem, Tarski, Church and Gödel (the celebrated
incompleteness theorems).
For students of mathematics or philosophy this book provides an
excellent introduction to logic and set theory.

Cover design by Chris McLeod

ISBN 0-521-47493-0
CAMBRIDGE
UNIVERSITY PRESS

