
Chapter 21

Approximation

21.1 Taylor’s polynomial approximation


21.1.1 Polynomial expansions
Thanks to Theorem 771, a function $f : (a,b) \to \mathbb{R}$ with a derivative at $x_0 \in (a,b)$ admits, locally at that point, the linear approximation
$$f(x_0 + h) = f(x_0) + f'(x_0) h + o(h) \quad \text{as } h \to 0$$
This approximation has two fundamental properties:

(i) the simplicity of the approximating function: in this case the linear function $df(x_0)(h) = f'(x_0) h$;

(ii) the quality of the approximation, given by the error term $o(h)$.

Intuitively, there is a tension between these two properties: the simpler the approximating function, the worse the quality of the approximation. In other terms, the simpler we want the approximating function to be, the larger the error we may incur. In this section we study in detail the relation between these two key properties. In particular, suppose we weaken property (i), being satisfied with an approximating function that is a polynomial of degree $n$, not necessarily with $n = 1$ as in the case of a straight line. The desideratum that we posit is a corresponding improvement in the error term, which should become of magnitude $o(h^n)$. In other words, when the degree $n$ of the approximating polynomial increases, and with it the complexity of the approximating function, we want the error term to improve in a parallel way: an increase in the complexity of the approximating function should be offset by a greater goodness of the approximation.

To formalize these ideas, we introduce polynomial expansions. Recall that a polynomial $p_n : \mathbb{R} \to \mathbb{R}$ that is at most of degree $n$ has the form $p_n(h) = \alpha_0 + \alpha_1 h + \alpha_2 h^2 + \cdots + \alpha_n h^n$.

Definition 867 A function $f : (a,b) \to \mathbb{R}$ admits a polynomial expansion of degree $n$ at $x_0 \in (a,b)$ if there exists a polynomial $p_n : \mathbb{R} \to \mathbb{R}$, at most of degree $n$, such that
$$f(x_0 + h) = p_n(h) + o(h^n) \quad \text{as } h \to 0 \tag{21.1}$$
for every $h$ such that $x_0 + h \in (a,b)$, that is, for every $h \in (a - x_0, b - x_0)$.


For $n = 1$, the polynomial $p_n$ reduces to the affine function $r(h) = \alpha_0 + \alpha_1 h$ of Section 18.11.1, and so the approximation (21.1) reduces to (18.24). Therefore, for $n = 1$ the expansion of $f$ at $x_0$ is equal, apart from the constant term $\alpha_0$, to the differential of $f$ at $x_0$.
For $n \geq 2$ the notion of polynomial expansion goes beyond that of differential. In particular, $f$ has a polynomial expansion of degree $n$ at $x_0 \in (a,b)$ if there exists a polynomial $p_n : \mathbb{R} \to \mathbb{R}$ that approximates $f(x_0 + h)$ with an error which is $o(h^n)$, i.e., which, as $h \to 0$, goes to zero faster than $h^n$. To a polynomial approximation of degree $n$ there corresponds, therefore, an error term of magnitude $o(h^n)$, thus formalizing the tension previously mentioned between the complexity of the approximating function and the goodness of the approximation.
For example, if $n = 2$ we have the so-called quadratic approximation:
$$f(x_0 + h) = \alpha_0 + \alpha_1 h + \alpha_2 h^2 + o(h^2) \quad \text{as } h \to 0$$
Relative to the linear approximation
$$f(x_0 + h) = \alpha_0 + \alpha_1 h + o(h) \quad \text{as } h \to 0$$
the approximating function is now more complicated: instead of a straight line (the first-degree polynomial $\alpha_0 + \alpha_1 h$) we have a parabola (the second-degree polynomial $\alpha_0 + \alpha_1 h + \alpha_2 h^2$). But, on the other hand, the error term is now better: instead of $o(h)$, we have $o(h^2)$.

N.B. By setting $x = x_0 + h$, the polynomial expansion can be equivalently written in the form
$$f(x) = \sum_{k=0}^{n} \alpha_k (x - x_0)^k + o((x - x_0)^n) \quad \text{as } x \to x_0 \tag{21.2}$$
for every $x \in (a,b)$, which is often used. O

An important property of polynomial expansions is that, when they exist, they are unique.

Lemma 868 A function $f : (a,b) \to \mathbb{R}$ has at most one polynomial expansion of degree $n$ at every point $x_0 \in (a,b)$.

Proof Suppose that, for every $h \in (a - x_0, b - x_0)$, there are two different expansions
$$\alpha_0 + \alpha_1 h + \alpha_2 h^2 + \cdots + \alpha_n h^n + o(h^n) = \beta_0 + \beta_1 h + \beta_2 h^2 + \cdots + \beta_n h^n + o(h^n) \tag{21.3}$$
Then
$$\alpha_0 = \lim_{h \to 0} \left( \alpha_0 + \alpha_1 h + \alpha_2 h^2 + \cdots + \alpha_n h^n + o(h^n) \right) = \lim_{h \to 0} \left( \beta_0 + \beta_1 h + \beta_2 h^2 + \cdots + \beta_n h^n + o(h^n) \right) = \beta_0$$
and (21.3) becomes
$$\alpha_1 h + \alpha_2 h^2 + \cdots + \alpha_n h^n + o(h^n) = \beta_1 h + \beta_2 h^2 + \cdots + \beta_n h^n + o(h^n) \tag{21.4}$$
By dividing both sides by $h$, we then get
$$\alpha_1 + \alpha_2 h + \cdots + \alpha_n h^{n-1} + o(h^{n-1}) = \beta_1 + \beta_2 h + \cdots + \beta_n h^{n-1} + o(h^{n-1})$$
Hence,
$$\alpha_1 = \lim_{h \to 0} \left( \alpha_1 + \alpha_2 h + \cdots + \alpha_n h^{n-1} + o(h^{n-1}) \right) = \lim_{h \to 0} \left( \beta_1 + \beta_2 h + \cdots + \beta_n h^{n-1} + o(h^{n-1}) \right) = \beta_1$$
and (21.4) becomes
$$\alpha_2 h^2 + \cdots + \alpha_n h^n + o(h^n) = \beta_2 h^2 + \cdots + \beta_n h^n + o(h^n)$$
By iterating what we have done above, we can show that $\alpha_2 = \beta_2$, and so on until we show that $\alpha_n = \beta_n$. This proves that at most one polynomial $p_n(h)$ can satisfy approximation (21.1). $\blacksquare$

21.1.2 Taylor's Theorem

Definition 869 Let $f : (a,b) \to \mathbb{R}$ be a function $n$ times differentiable at a point $x_0 \in (a,b)$. The polynomial $T_n : \mathbb{R} \to \mathbb{R}$ of degree at most $n$ given by
$$T_n(h) = f(x_0) + f'(x_0) h + \frac{1}{2} f''(x_0) h^2 + \cdots + \frac{1}{n!} f^{(n)}(x_0) h^n = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} h^k$$
is called Taylor's polynomial of degree $n$ of $f$ at $x_0$.

For convenience of notation we have put $f^{(0)} = f$. This polynomial has as coefficients the derivatives of $f$ at the point $x_0$ up to order $n$. In particular, if $x_0 = 0$, Taylor's polynomial is sometimes called MacLaurin's polynomial.

The next result, which is fundamental and of great elegance, shows that if $f$ has a suitable number of derivatives at $x_0$, the unique polynomial expansion is given precisely by Taylor's polynomial.

Theorem 870 (Taylor) Let $f : (a,b) \to \mathbb{R}$ have $n - 1$ derivatives on $(a,b)$. If $f$ is $n$ times differentiable at $x_0 \in (a,b)$, then $f$ has at $x_0$ one and only one polynomial expansion $p_n$ of degree $n$, given by
$$p_n(h) = T_n(h) \tag{21.5}$$

Under simple hypotheses of differentiability at $x_0$, we thus have the fundamental polynomial approximation
$$f(x_0 + h) = T_n(h) + o(h^n) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} h^k + o(h^n) \tag{21.6}$$
where $T_n$ is the unique polynomial, at most of degree $n$, that satisfies Definition 867, i.e., that approximates $f(x_0 + h)$ with error $o(h^n)$.
The approximation (21.6) is called Taylor's expansion (or formula) of order $n$ of $f$ at $x_0$. The important special case $x_0 = 0$ takes the name of MacLaurin's expansion (or formula) of order $n$ of $f$.
Note that for $n = 1$ Taylor's Theorem coincides with the "if" direction of Theorem 771. Indeed, since we set $f^{(0)} = f$, saying that $f$ has a derivative $0$ times on $(a,b)$ simply means that $f$ is defined on $(a,b)$. Hence, for $n = 1$, Taylor's Theorem states that, if $f : (a,b) \to \mathbb{R}$ has a derivative at $x_0 \in (a,b)$, then
$$f(x_0 + h) = T_1(h) + o(h) = f(x_0) + f'(x_0) h + o(h) \quad \text{as } h \to 0$$
that is, $f$ is differentiable at $x_0$.

For $n = 1$, the polynomial approximation (21.6) therefore reduces to the linear approximation (18.29), that is, to
$$f(x_0 + h) = f(x_0) + f'(x_0) h + o(h) \quad \text{as } h \to 0$$

If $n = 2$, (21.6) becomes the quadratic approximation
$$f(x_0 + h) = f(x_0) + f'(x_0) h + \frac{1}{2} f''(x_0) h^2 + o(h^2) \quad \text{as } h \to 0 \tag{21.7}$$
and so on for higher orders.
The approximation (21.6) is very important in applications, and it is the concrete form taken by the tension, discussed above, between the complexity of the approximating polynomial and the goodness of the approximation. The trade-off must be resolved case by case, according to the relative importance that the two properties have in the particular application at hand. In many cases, however, the quadratic approximation (21.7) is a good compromise: among all the possible degrees of approximation, the quadratic one has a particular importance.
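As an illustration not in the original text, the following minimal Python sketch makes the trade-off concrete: the error of the MacLaurin polynomial of the exponential function shrinks rapidly as the degree $n$ grows (the function, the point, and the helper name are our illustrative choices).

```python
# Sketch: error of the MacLaurin polynomial of exp at h = 0.1
# shrinks roughly like h^(n+1) as the degree n grows.
import math

def taylor_exp(h, n):
    # T_n(h) = sum_{k=0}^n h^k / k!  (all derivatives of exp at 0 equal 1)
    return sum(h**k / math.factorial(k) for k in range(n + 1))

h = 0.1
for n in range(1, 6):
    err = abs(math.exp(h) - taylor_exp(h, n))
    print(f"n = {n}: error = {err:.3e}")
```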

O.R. Graphically, the quadratic approximation (also called the second-order approximation) is a parabola. The linear approximation, as we know, is, graphically, the straight line tangent to the graph of the function; the quadratic approximation is the so-called osculating parabola,¹ that is, the parabola that shares at $x_0$ the same value of the function, the same slope (first derivative) and the same curvature (second derivative). H

¹ From os, mouth: the "kissing" parabola.

Proof In the light of Lemma 868, it is sufficient to show that Taylor's polynomial satisfies (21.1). Let us start by observing preliminarily that, since $f$ has $n - 1$ derivatives on $(a,b)$, we have $f^{(k)} : (a,b) \to \mathbb{R}$ for every $1 \leq k \leq n - 1$. Moreover, thanks to Proposition 772, $f^{(k)}$ is continuous at $x_0$ for $1 \leq k \leq n - 1$. Let $\varphi : (x_0 - a, b - x_0) \to \mathbb{R}$ and $\psi : \mathbb{R} \to \mathbb{R}$ be the auxiliary functions given by
$$\varphi(h) = f(x_0 + h) - \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} h^k \quad \text{and} \quad \psi(h) = h^n$$
We have to prove that
$$\lim_{h \to 0} \frac{\varphi(h)}{\psi(h)} = 0 \tag{21.8}$$
We have
$$\lim_{h \to 0} \psi^{(k)}(h) = \psi^{(k)}(0) = 0 \tag{21.9}$$
for every $0 \leq k \leq n - 1$. Moreover, since $f^{(k)}$ is continuous at $x_0$ for $0 \leq k \leq n - 1$, we have
$$\varphi^{(k)}(h) = f^{(k)}(x_0 + h) - f^{(k)}(x_0) - \sum_{j=1}^{n-k} \frac{f^{(k+j)}(x_0)}{j!} h^j \tag{21.10}$$
so that
$$\lim_{h \to 0} \varphi^{(k)}(h) = \varphi^{(k)}(0) = 0 \tag{21.11}$$
Thanks to (21.9) and (21.11), we can apply de l'Hospital's rule $n - 1$ times, in order to have
$$\lim_{h \to 0} \frac{\varphi^{(n-1)}(h)}{\psi^{(n-1)}(h)} = L \implies \lim_{h \to 0} \frac{\varphi^{(n-2)}(h)}{\psi^{(n-2)}(h)} = L \implies \cdots \implies \lim_{h \to 0} \frac{\varphi^{(0)}(h)}{\psi^{(0)}(h)} = L \tag{21.12}$$
with $L \in \mathbb{R}$. Simple calculations show that $\psi^{(n-1)}(h) = n! \, h$. Hence, since $f$ is $n$ times differentiable at $x_0$, expression (21.10) with $k = n - 1$ implies
$$\lim_{h \to 0} \frac{\varphi^{(n-1)}(h)}{\psi^{(n-1)}(h)} = \frac{1}{n!} \lim_{h \to 0} \frac{f^{(n-1)}(x_0 + h) - f^{(n-1)}(x_0) - h f^{(n)}(x_0)}{h} = \frac{1}{n!} \left( \lim_{h \to 0} \frac{f^{(n-1)}(x_0 + h) - f^{(n-1)}(x_0)}{h} - f^{(n)}(x_0) \right) = 0$$
Thanks to (21.12), we can therefore conclude that (21.8) holds, as desired. $\blacksquare$

As seen for (21.2), by setting $x = x_0 + h$ the polynomial approximation (21.6) can be rewritten as
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k + o((x - x_0)^n) \tag{21.13}$$
It is the form in which the approximation is often stated.

We now illustrate Taylor's (or MacLaurin's) expansions with some examples.

Example 871 Let us start with polynomials, whose polynomial approximation is trivial. Indeed, if $f : \mathbb{R} \to \mathbb{R}$ is itself a polynomial, $f(x) = \sum_{k=0}^{n} \alpha_k x^k$, we obtain the identity
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!} x^k \quad \forall x \in \mathbb{R}$$
since, as the reader can verify, we have
$$\alpha_k = \frac{f^{(k)}(0)}{k!} \quad \forall 1 \leq k \leq n$$
Each polynomial can therefore be equivalently rewritten in the form of a MacLaurin expansion. For example, if $f(x) = x^4 - 3x^3$, we have $f'(x) = 4x^3 - 9x^2$, $f''(x) = 12x^2 - 18x$, $f'''(x) = 24x - 18$ and $f^{(iv)}(x) = 24$, and hence
$$\alpha_0 = f(0) = 0; \quad \alpha_1 = f'(0) = 0; \quad \alpha_2 = \frac{f''(0)}{2!} = 0$$
$$\alpha_3 = \frac{f'''(0)}{3!} = -\frac{18}{6} = -3; \quad \alpha_4 = \frac{f^{(iv)}(0)}{4!} = \frac{24}{24} = 1$$
N

Example 872 Let $f : \mathbb{R}_{++} \to \mathbb{R}$ be given by $f(x) = \log(1 + x)$. It is $n$ times differentiable at each point of its domain, with
$$f^{(n)}(x) = (-1)^{n+1} \frac{(n-1)!}{(1+x)^n} \quad \forall n \geq 1$$
and therefore Taylor's expansion of order $n$ of $f$ at $x_0 \in \mathbb{R}_{++}$ is
$$\log(1 + x_0 + h) = \log(1 + x_0) + \frac{h}{1 + x_0} - \frac{h^2}{2(1 + x_0)^2} + \frac{h^3}{3(1 + x_0)^3} - \cdots + (-1)^{n+1} \frac{h^n}{n(1 + x_0)^n} + o(h^n) = \log(1 + x_0) + \sum_{k=1}^{n} (-1)^{k+1} \frac{h^k}{k(1 + x_0)^k} + o(h^n)$$
or equivalently, using (21.13),
$$\log(1 + x) = \log(1 + x_0) + \sum_{k=1}^{n} (-1)^{k+1} \frac{(x - x_0)^k}{k(1 + x_0)^k} + o((x - x_0)^n)$$
Note how a simple polynomial approximates the logarithmic function (and as well as we want, since $o((x - x_0)^n)$ can be made arbitrarily small). In particular, MacLaurin's expansion of order $n$ of $f$ is
$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n+1} \frac{x^n}{n} + o(x^n) = \sum_{k=1}^{n} (-1)^{k+1} \frac{x^k}{k} + o(x^n) \tag{21.14}$$
N

Example 873 In an analogous way the reader can verify the MacLaurin expansions of order $n$ of the following elementary functions:
$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + o(x^n) = \sum_{k=0}^{n} \frac{x^k}{k!} + o(x^n)$$
$$\sin x = x - \frac{1}{3!} x^3 + \frac{1}{5!} x^5 - \cdots + \frac{(-1)^n}{(2n+1)!} x^{2n+1} + o(x^{2n+1}) = \sum_{k=0}^{n} \frac{(-1)^k}{(2k+1)!} x^{2k+1} + o(x^{2n+1})$$
$$\cos x = 1 - \frac{1}{2} x^2 + \frac{1}{4!} x^4 - \cdots + \frac{(-1)^n}{(2n)!} x^{2n} + o(x^{2n}) = \sum_{k=0}^{n} \frac{(-1)^k}{(2k)!} x^{2k} + o(x^{2n})$$
Also here it is important to observe how such functions can be (well) approximated by simple polynomials. N
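As a quick numerical check of these expansions, one can compare a truncated MacLaurin sum with the library value of the function. A minimal sketch (ours, not part of the text) for the sine expansion:

```python
# Sketch: MacLaurin expansion of sin truncated at degree 2n+1,
# compared with math.sin at a few points near 0.
import math

def maclaurin_sin(x, n):
    # sum_{k=0}^n (-1)^k x^(2k+1) / (2k+1)!
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(n + 1))

for x in (0.1, 0.5, 1.0):
    print(x, abs(math.sin(x) - maclaurin_sin(x, n=3)))
```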

Example 874 Let $f : (-1, +\infty) \to \mathbb{R}$ be given by $f(x) = \log(1 + x^3) - 3\sin^2 x$. The function has as many derivatives as we want at each point of its domain. Let us calculate its second-order MacLaurin expansion. We have
$$f'(x) = \frac{3x^2}{1 + x^3} - 6 \cos x \sin x \; ; \quad f''(x) = \frac{-3x^4 + 6x}{(1 + x^3)^2} - 6(\cos^2 x - \sin^2 x)$$
and therefore
$$f(x) = f(0) + f'(0) x + \frac{1}{2} f''(0) x^2 + o(x^2) = -3x^2 + o(x^2) \tag{21.15}$$
N

Example 875 Let $f : (-1, +\infty) \to \mathbb{R}$ be given by $f(x) = e^{-x} (\log(1 + x) - 1) + 1$. The function has infinitely many derivatives at each point of its domain. We leave it to the reader to verify that the third-order Taylor expansion at $x_0 = 3$ is given by
$$f(x) = \frac{\log 4 - 1}{e^3} + 1 + \frac{5 - 4\log 4}{4e^3} (x - 3) + \frac{16\log 4 - 25}{32e^3} (x - 3)^2 + \frac{63 - 32\log 4}{192e^3} (x - 3)^3 + o\left( (x - 3)^3 \right)$$
N

O.R. With $n$ fixed, the approximation given by Taylor's polynomial is good only in a neighborhood (possibly very small) of the point $x_0$. On the other hand, as $n$ increases the approximation improves. We conclude that, for fixed $n$, the approximation is good (better than a prearranged error threshold) only in a neighborhood of $x_0$, while, for a fixed interval, there exists a value of $n$ such that the approximation on that interval is good (better than a prearranged error threshold), provided of course the function has derivatives up to that order.
If we fix simultaneously the degree $n$ and an interval, the approximation in general cannot be controlled: it can be very bad. H

O.R. It is possible to prove that, if $f : (a,b) \to \mathbb{R}$ has $n + 1$ derivatives on $(a,b)$, for $x_0 \in (a,b)$ one can also write
$$f(x_0 + h) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} h^k + \frac{f^{(n+1)}(x_0 + \vartheta h)}{(n+1)!} h^{n+1}$$
with $0 \leq \vartheta \leq 1$. In other words, the term $o(h^n)$ can always be taken equal to the $(n+1)$-th term of the expansion, with the derivative calculated at an intermediate point between $x_0$ and $x_0 + h$. This expression allows us to control the approximation error: if $|f^{(n+1)}(x)| \leq k$ for every $x \in [x_0, x_0 + h]$, it is possible to conclude that the approximation error does not exceed $k h^{n+1} / (n+1)!$ and therefore that
$$\sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} h^k - \frac{k h^{n+1}}{(n+1)!} \leq f(x_0 + h) \leq \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} h^k + \frac{k h^{n+1}}{(n+1)!}$$
The error term $f^{(n+1)}(x_0 + \vartheta h) h^{n+1} / (n+1)!$ is called Lagrange's remainder, while $o(h^n)$ is called Peano's remainder. H
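The Lagrange remainder turns the qualitative $o(h^n)$ into a computable bound. A small sketch of this idea (our illustration, not from the text): for the sine function every derivative is bounded by $1$, so $k = 1$ works.

```python
# Sketch: Lagrange bound k*|h|^(n+1)/(n+1)! for sin at x0 = 0,
# with k = 1 since every derivative of sin is bounded by 1.
import math

def maclaurin_sin(h, n):
    # MacLaurin polynomial of sin up to degree n (odd terms only)
    return sum((-1)**k * h**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(n // 2 + 1))

h, n = 0.3, 5
bound = abs(h)**(n + 1) / math.factorial(n + 1)   # k = 1
actual = abs(math.sin(h) - maclaurin_sin(h, n))
print(actual <= bound, actual, bound)
```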

21.1.3 Taylor and limits

Taylor expansions also reveal themselves to be very useful in the calculation of limits. Indeed, by expanding $f$ at $x_0$ in an appropriate way, we reduce to a simple limit of polynomials. We illustrate such a use of Taylor expansions with a couple of limits.

Example 876 (i) Consider the limit
$$\lim_{x \to 0} \frac{\log(1 + x^3) - 3\sin^2 x}{\log(1 + x)}$$
Since the limit is as $x \to 0$, we can use the second-order MacLaurin expansion (21.15) and (21.14) to approximate the numerator and the denominator. Thanks to Lemma 439 and by using the algebra of little-o, we have
$$\lim_{x \to 0} \frac{\log(1 + x^3) - 3\sin^2 x}{\log(1 + x)} = \lim_{x \to 0} \frac{-3x^2 + o(x^2)}{x + o(x)} = \lim_{x \to 0} \frac{-3x^2}{x} = 0$$
The calculation of the limit has therefore been considerably simplified thanks to the combined use of MacLaurin expansions and the comparison of infinitesimals seen in Lemma 439.
(ii) Consider the limit
$$\lim_{x \to 0} \frac{x \sin x}{\log^2(1 + x)}$$
Also this limit can be solved by combining expansions and comparisons of infinitesimals in a suitable way:
$$\lim_{x \to 0} \frac{x \sin x}{\log^2(1 + x)} = \lim_{x \to 0} \frac{x(x + o(x))}{(x + o(x))^2} = \lim_{x \to 0} \frac{x^2 + o(x^2)}{x^2 + o(x^2)} = \lim_{x \to 0} \frac{x^2}{x^2} = 1$$
N
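Limits of this kind can be double-checked symbolically; a minimal sketch using sympy (our choice of tool, not the book's):

```python
# Sketch: verifying the two limits of Example 876 with sympy.
import sympy as sp

x = sp.symbols('x')
lim1 = sp.limit((sp.log(1 + x**3) - 3*sp.sin(x)**2) / sp.log(1 + x), x, 0)
lim2 = sp.limit(x*sp.sin(x) / sp.log(1 + x)**2, x, 0)
print(lim1, lim2)  # expected: 0 and 1
```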

21.2 Omnibus proposition for local extremal points


Although for simplicity we have stated Taylor's Theorem for functions defined on intervals $(a,b)$, it holds at the interior points $x_0$ of any set $A$ where $f$ is $n$ times differentiable, provided there is a neighborhood $(a,b) \subseteq A$ of $x_0$ where $f$ is $n - 1$ times differentiable.
This version of Taylor's approximation allows us to state an "omnibus" proposition for local extremal points which includes and extends both the necessary condition $f'(x_0) = 0$ of Fermat's Theorem and the sufficient condition $f'(x_0) = 0$ and $f''(x_0) < 0$ of Corollary 844 (see also point (ii) of Corollary 846).

Proposition 877 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ and $C \subseteq A$. Let $x_0$ be an interior point of $C$ for which there exists a neighborhood $(a,b)$ such that $f$ is $n - 1$ times differentiable on $(a,b)$ and $n$ times differentiable at $x_0$.
If $f^{(k)}(x_0) = 0$ for every $1 \leq k \leq n - 1$ and $f^{(n)}(x_0) \neq 0$, then:

(i) if $n$ is even and $f^{(n)}(x_0) < 0$, the point $x_0$ is a strong local maximizer;

(ii) if $n$ is even and $f^{(n)}(x_0) > 0$, the point $x_0$ is a strong local minimizer;

(iii) if $n$ is odd, the point $x_0$ is not a local extremal point and, moreover, $f$ is increasing or decreasing at $x_0$ according to whether $f^{(n)}(x_0) > 0$ or $f^{(n)}(x_0) < 0$.

For $n = 1$, point (iii) is nothing but the fundamental first-order necessary condition $f'(x_0) = 0$. Indeed, for $n = 1$, point (iii) states that if $f'(x_0) \neq 0$, then $x_0$ is not a local extremal point (that is, it is neither a local maximizer nor a local minimizer). By contraposition, this is equivalent to saying that if $x_0$ is a local extremal point, then $f'(x_0) = 0$. Point (iii) therefore extends the first-order necessary condition to higher derivatives.
Point (i) instead, together with the hypothesis $f^{(k)}(x_0) = 0$ for every $1 \leq k \leq n - 1$, extends to higher derivatives the second-order sufficient condition $f''(x_0) < 0$ for strong local maximizers. Indeed, for $n = 2$ point (i) is exactly the condition $f''(x_0) < 0$. Analogously, point (ii) extends the analogous condition $f''(x_0) > 0$ for minimizers.²

² Observe that, in light of what has been proved about Taylor's approximation, the case $n = 2$ presents an interesting improvement with respect to Corollary 844: the function $f$ is required to be twice differentiable, but not necessarily with continuity.

N.B. In this and in the next section we will concentrate on local extremal points and therefore on the generalization of point (ii), of sufficiency, of Corollary 846. It is possible to generalize in an analogous way point (i), of necessity, of the aforementioned corollary. We leave the details to the reader. O

Proof Let us prove point (i). Let $n$ be even and let $f^{(n)}(x_0) < 0$. Thanks to Taylor's Theorem, from the hypothesis $f^{(k)}(x_0) = 0$ for every $1 \leq k \leq n - 1$ and $f^{(n)}(x_0) \neq 0$ it follows that
$$f(x_0 + h) - f(x_0) = \frac{f^{(n)}(x_0)}{n!} h^n + o(h^n) = \frac{f^{(n)}(x_0)}{n!} h^n \left( 1 + \frac{o(h^n)}{h^n} \right)$$
Since $\lim_{h \to 0} o(h^n)/h^n = 0$, there exists $\delta > 0$ such that $|h| < \delta$ implies $|o(h^n)/h^n| < 1$. Hence
$$h \in (-\delta, \delta) \implies 1 + \frac{o(h^n)}{h^n} > 0$$
Since $f^{(n)}(x_0) < 0$ and, $n$ being even, $h^n > 0$ for $h \neq 0$, we therefore have
$$h \in (-\delta, \delta), \; h \neq 0 \implies \frac{f^{(n)}(x_0)}{n!} h^n \left( 1 + \frac{o(h^n)}{h^n} \right) < 0 \implies f(x_0 + h) - f(x_0) < 0$$
that is, setting $x = x_0 + h$,
$$x \in (x_0 - \delta, x_0 + \delta), \; x \neq x_0 \implies f(x) < f(x_0)$$
and hence $x_0$ is a strong local maximizer.

In an analogous way we prove point (ii). Finally, point (iii) can be proved by adapting the proof of Fermat's Theorem in a suitable way. $\blacksquare$

Example 878 Let us consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = -x^4$. We saw in Example 845 that, for its maximizer $x_0 = 0$, it was not possible to apply the sufficient condition $f'(x_0) = 0$ and $f''(x_0) < 0$. We have however
$$f'(0) = f''(0) = f'''(0) = 0 \quad \text{and} \quad f^{(iv)}(0) < 0$$
Since $n = 4$ is even, by point (i) of Proposition 877 we can conclude that $x_0 = 0$ is a local maximizer (actually, it is a global maximizer, but Proposition 877 alone is not enough to conclude this). N

Example 879 Let us consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = -x^3$. At $x_0 = 0$ we have
$$f'(0) = f''(0) = 0 \quad \text{and} \quad f'''(0) < 0$$
Since $n = 3$ is odd, by point (iii) of Proposition 877 we have that $x_0 = 0$ is not a local extremal point (rather, at $x_0$ the function is strictly decreasing). N

O.R. Proposition 877 states that, if the first $k - 1$ derivatives of $f$ are all zero at $x_0$ and $f^{(k)}(x_0) \neq 0$, then, if $k$ is even, $f^{(k)}$ gives the same information as $f''$ (either local maximizer or minimizer), while, if $k$ is odd, it gives the same information as $f'$ (increasing or decreasing). In short, it is as if all the $k - 1$ derivatives (which are equal to zero) were not present at all. H

Example 880 The function defined by $f(x) = x^6$ clearly attains its minimum value at $x_0 = 0$. Indeed, we have $f'(0) = f''(0) = \cdots = f^{(v)}(0) = 0$ and $f^{(vi)}(0) = 6! > 0$. The function defined by $f(x) = x^5$ is clearly increasing at $x_0 = 0$. We have $f'(0) = f''(0) = f'''(0) = f^{(iv)}(0) = 0$ and $f^{(v)}(0) = 5! = 120 > 0$. N

Proposition 877 is very powerful, but it also has important limitations. Like Corollary 844, it can only evaluate interior points, and it is powerless in front of non-strong local extremal points, for which in general the derivatives of every order are zero. The classical case is that of constant functions, whose points are all, trivially, maximizers and minimizers, and about which Proposition 877 (like Corollary 844 before it) is not able to give us any indication.
Moreover, to apply Proposition 877 it is necessary that the function have a sufficient number of derivatives at a stationary point, which is not always the case, as the next example shows.

Example 881 Let us consider the function $f : \mathbb{R} \to \mathbb{R}$ defined by:
$$f(x) = \begin{cases} x^2 \sin \dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$
It is continuous at $x = 0$; indeed, since $|\sin(1/h)| \leq 1$, by applying the comparison criterion it follows that
$$\lim_{h \to 0} f(0 + h) = \lim_{h \to 0} h^2 \sin \frac{1}{h} = 0$$
It has a derivative at $x = 0$; indeed
$$\lim_{h \to 0} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0} \frac{h^2 \sin \frac{1}{h} - 0}{h} = \lim_{h \to 0} h \sin \frac{1}{h} = 0$$
The point $x = 0$ is stationary for $f$, but the function does not admit a second derivative at $0$. Indeed, we have
$$f'(x) = \begin{cases} 2x \sin \dfrac{1}{x} - \cos \dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$
and therefore
$$\lim_{h \to 0} \frac{f'(0 + h) - f'(0)}{h} = \lim_{h \to 0} \frac{2h \sin \frac{1}{h} - \cos \frac{1}{h} - 0}{h} = \lim_{h \to 0} \left( 2 \sin \frac{1}{h} - \frac{1}{h} \cos \frac{1}{h} \right)$$
does not exist. Proposition 877 cannot therefore be applied, and hence it is not able to say anything about the nature of the stationary point $x = 0$. Nevertheless, the graph of $f$ shows that such a point is not a local extremal one, since $f$ has infinitely many oscillations in any neighborhood of zero. N

Example 882 The general version of the previous example considers $f : \mathbb{R} \to \mathbb{R}$ defined as:
$$f(x) = \begin{cases} x^n \sin \dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$
with $n \geq 1$, and shows that such a function does not have a derivative of order $n$ at the origin (in the case $n = 1$ this means that the first derivative does not exist). We leave the development of this example to the reader. N

For the convenience of the reader, we also report the following corollary of Proposition 877. It states only the "sufficient condition" component of the aforementioned proposition.

Corollary 883 (Second sufficient condition for local extremal points) Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ and $C \subseteq A$. Let $n \in \mathbb{N}$, with $n \geq 2$. Let $x_0$ be an interior point of $C$ for which there exists a neighborhood $(a,b)$ such that $f$ is $n - 1$ times differentiable on $(a,b)$ and $n$ times differentiable at $x_0$. Let $f'(x_0) = 0$.
Let $f^{(k)}(x_0) = 0$ for every $k \in \mathbb{N}$ such that $2 \leq k \leq n - 1$ and $f^{(n)}(x_0) \neq 0$. Then:

(i) if $n$ is even and $f^{(n)}(x_0) < 0$, the point $x_0$ is a strong local maximizer;

(ii) if $n$ is even and $f^{(n)}(x_0) > 0$, the point $x_0$ is a strong local minimizer;

(iii) if $n$ is odd, the point $x_0$ is not a local extremal point and, moreover, $f$ is increasing or decreasing at $x_0$ according to whether $f^{(n)}(x_0) > 0$ or $f^{(n)}(x_0) < 0$.

21.3 Omnibus procedure for the search of local extremal points

Thanks to Proposition 877, we can refine the procedure seen in Section 20.5.2 for the search of local extremal points of a function $f : A \subseteq \mathbb{R} \to \mathbb{R}$ on a set $C$. To fix ideas, let us first study two important particular cases.

21.3.1 Twice differentiable functions

Let us suppose that $f$ is twice differentiable at the interior points of $C$, that is, on $\operatorname{int} C$. The omnibus procedure consists of the following stages:

1. We determine the set $S$ of stationary points, solving the first-order condition $f'(x) = 0$. If $S = \emptyset$ the procedure ends (and we can conclude that, since there are no stationary points, there are no extremal ones); otherwise we move to the next step.

2. We calculate $f''$ at each of the stationary points $x \in S$: the point $x$ is a strong local maximizer if $f''(x) < 0$; it is a strong local minimizer if $f''(x) > 0$; if $f''(x) = 0$ the procedure is not able to determine the nature of $x$.

This is the classical procedure to find local extremal points based on the first-order and second-order conditions of Section 20.5.2. The version just presented improves on what we saw in that section because, recalling what we observed in a previous footnote, it requires only that the function be twice differentiable on $\operatorname{int} C$, not necessarily with continuity. However, we are left with the other limitations discussed in Section 20.5.2.

21.3.2 Infinitely differentiable functions

Let us suppose that $f$ is infinitely differentiable on $\operatorname{int} C$. The omnibus procedure consists of the following stages:

1. We determine the set $S$ of stationary points, solving the equation $f'(x) = 0$. If $S = \emptyset$ the procedure ends; otherwise we move to the next step.

2. We compute $f''$ at each of the stationary points $x \in S$: the point $x$ is a strong local maximizer if $f''(x) < 0$; it is a strong local minimizer if $f''(x) > 0$. Call $S^{(2)}$ the subset of $S$ of the points such that $f''(x) = 0$. If $S^{(2)} = \emptyset$ the procedure ends; otherwise we move to the next step.

3. We compute $f'''$ at each point of $S^{(2)}$: if $f'''(x) \neq 0$, the point $x$ is not an extremal one. Call $S^{(3)}$ the subset of $S^{(2)}$ in which $f'''(x) = 0$. If $S^{(3)} = \emptyset$ the procedure ends; otherwise we move to the next step.

4. We compute $f^{(iv)}$ at each point of $S^{(3)}$: the point $x$ is a strong local maximizer if $f^{(iv)}(x) < 0$; a strong local minimizer if $f^{(iv)}(x) > 0$. Call $S^{(4)}$ the subset of $S^{(3)}$ in which $f^{(iv)}(x) = 0$. If $S^{(4)} = \emptyset$ the procedure ends; otherwise we move to the next step.

5. We iterate the procedure until $S^{(n)} = \emptyset$.

The procedure thus ends if there exists $n$ such that $S^{(n)} = \emptyset$. In the opposite case the procedure iterates ad libitum.
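A minimal computational sketch of this procedure (ours, not the book's; it uses sympy for symbolic derivatives and assumes the stationary point is known exactly):

```python
# Sketch: omnibus test at a stationary point x0 -- find the first
# nonzero derivative of order >= 2 and read off the nature of the point.
import sympy as sp

def omnibus(f, x, x0, max_order=10):
    for n in range(2, max_order + 1):
        d = sp.diff(f, x, n).subs(x, x0)
        if d != 0:
            if n % 2 == 1:
                return "not extremal (increasing)" if d > 0 else "not extremal (decreasing)"
            return "strong local maximizer" if d < 0 else "strong local minimizer"
    return "inconclusive up to order %d" % max_order

x = sp.symbols('x')
print(omnibus(-x**4, x, 0))   # strong local maximizer (Example 878)
print(omnibus(-x**3, x, 0))   # not extremal, decreasing (Example 879)
print(omnibus(x**6, x, 0))    # strong local minimizer (Example 880)
```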

Example 884 Let us take again the function $f(x) = -x^4$, with $C = \mathbb{R}$. We saw in Example 845 that, for its maximizer $x_0 = 0$, it was not possible to apply the sufficient condition $f'(x_0) = 0$ and $f''(x_0) < 0$. We have however
$$f'(0) = f''(0) = f'''(0) = 0 \quad \text{and} \quad f^{(iv)}(0) < 0$$
so that
$$S = S^{(2)} = S^{(3)} = \{0\} \quad \text{and} \quad S^{(4)} = \emptyset$$
Stage 1 identifies the set $S = \{0\}$, about which stage 2, however, has nothing to say since $f''(0) = 0$. Stage 3 does not add any extra information either, since $f'''(0) = 0$. Stage 4 instead is conclusive: since $f^{(iv)}(0) < 0$, we can assert that $x = 0$ is a strong local maximizer (actually, it is a global maximizer, but this procedure does not allow us to say this). N

Naturally, the procedure is of practical interest when it ends with a sufficiently small value of $n$.

21.4 Taylor's expansion: vector functions

In this section we study a version of the fundamental Taylor expansion for functions of several variables. To do this, it is necessary to introduce quadratic forms.

21.4.1 Quadratic forms

A function $f : \mathbb{R}^n \to \mathbb{R}$ of the form
$$f(x_1, \dots, x_n) = k \left( x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n} \right)$$
with $k \in \mathbb{R}$ and $\alpha_i \in \mathbb{N}$, is called a monomial of degree $m \in \mathbb{N}$ when $\sum_{i=1}^{n} \alpha_i = m$. For example, $f(x_1, x_2) = 2x_1 x_2$ is a monomial of second degree, while $f(x_1, x_2, x_3) = 5x_1 x_2^3 x_3^4$ is a monomial of eighth degree.

Definition 885 A function $f : \mathbb{R}^n \to \mathbb{R}$ is a quadratic form if it is a sum of monomials of second degree.

For example, $f(x_1, x_2, x_3) = 3x_1 x_3 - x_2 x_3$ is a quadratic form because it is the sum of the monomials of second degree $3x_1 x_3$ and $-x_2 x_3$. It is easy to see that the following functions are quadratic forms:
$$f(x) = x^2$$
$$f(x_1, x_2) = x_1^2 + x_2^2 - 4x_1 x_2$$
$$f(x_1, x_2, x_3) = x_1 x_3 + 5x_2 x_3 + x_3^2$$
$$f(x_1, x_2, x_3, x_4) = x_1 x_4 - 2x_1^2 + 3x_2 x_3$$
There is a one-to-one correspondence between quadratic forms and symmetric matrices, as the next result, whose proof we omit, shows.

Proposition 886 There exists a one-to-one correspondence between quadratic forms $f : \mathbb{R}^n \to \mathbb{R}$ and symmetric matrices $A$ of order $n$, determined by:³
$$f(x) = x \cdot Ax = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j \quad \text{for every } x \in \mathbb{R}^n \tag{21.16}$$

Given a symmetric matrix $A$ there exists therefore a unique quadratic form $f : \mathbb{R}^n \to \mathbb{R}$ for which (21.16) holds; vice versa, given a quadratic form $f : \mathbb{R}^n \to \mathbb{R}$ there exists a unique symmetric matrix $A$ for which (21.16) holds.

The matrix $A$ is called the matrix associated to the quadratic form $f$. Given the matrix $A = (a_{ij})$, expression (21.16) can be written in extended form as
$$f(x) = a_{11} x_1^2 + a_{22} x_2^2 + a_{33} x_3^2 + \cdots + a_{nn} x_n^2 + 2a_{12} x_1 x_2 + 2a_{13} x_1 x_3 + \cdots + 2a_{1n} x_1 x_n + 2a_{23} x_2 x_3 + \cdots + 2a_{2n} x_2 x_n + \cdots + 2a_{n-1,n} x_{n-1} x_n$$
The coefficients of the squares $x_1^2, x_2^2, \dots, x_n^2$ are therefore the elements on the diagonal of $A$, that is, $(a_{11}, a_{22}, \dots, a_{nn})$, while for every $i, j = 1, 2, \dots, n$ with $i \neq j$ the coefficient of the monomial $x_i x_j$ is $2a_{ij}$. It is therefore very simple to pass from the matrix to the quadratic form and vice versa. Let us see some examples.

³ In accordance with what was established in Section 13.2.2, for simplicity of notation we write $x \cdot Ax$ instead of the more precise $x \cdot Ax^T$.

Example 887 The matrix associated to the quadratic form $f(x_1, x_2, x_3) = 3x_1 x_3 - x_2 x_3$ is given by:
$$A = \begin{bmatrix} 0 & 0 & \frac{3}{2} \\ 0 & 0 & -\frac{1}{2} \\ \frac{3}{2} & -\frac{1}{2} & 0 \end{bmatrix}$$
Indeed, for every $x \in \mathbb{R}^3$ we have:
$$x \cdot Ax = (x_1, x_2, x_3) \cdot \begin{bmatrix} 0 & 0 & \frac{3}{2} \\ 0 & 0 & -\frac{1}{2} \\ \frac{3}{2} & -\frac{1}{2} & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = (x_1, x_2, x_3) \cdot \left( \frac{3}{2} x_3, \; -\frac{1}{2} x_3, \; \frac{3}{2} x_1 - \frac{1}{2} x_2 \right) = \frac{3}{2} x_1 x_3 - \frac{1}{2} x_2 x_3 + \frac{3}{2} x_1 x_3 - \frac{1}{2} x_2 x_3 = 3x_1 x_3 - x_2 x_3$$
Note that also the matrices
$$A = \begin{bmatrix} 0 & 0 & 3 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 3 & -1 & 0 \end{bmatrix} \tag{21.17}$$
are such that $f(x) = x \cdot Ax$, although they are not symmetric. What we lose without symmetry is the one-to-one correspondence between quadratic forms and matrices. Indeed, while given the quadratic form $f(x_1, x_2, x_3) = 3x_1 x_3 - x_2 x_3$ there exists a unique symmetric matrix for which (21.16) holds, this is no longer true if we do not require the symmetry of the matrix, as the two matrices in (21.17) show, for both of which (21.16) holds. N
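The identity $f(x) = x \cdot Ax$ is easy to check numerically. A minimal numpy sketch (our illustration, not from the text) for the quadratic form of Example 887:

```python
# Sketch: x . A x for the symmetric matrix of Example 887 and for one
# of the non-symmetric matrices in (21.17): both give 3*x1*x3 - x2*x3.
import numpy as np

A_sym = np.array([[0, 0, 1.5], [0, 0, -0.5], [1.5, -0.5, 0]])
A_non = np.array([[0, 0, 3.0], [0, 0, -1.0], [0, 0, 0.0]])

x = np.array([1.0, 2.0, 3.0])
f = 3*x[0]*x[2] - x[1]*x[2]
print(x @ A_sym @ x, x @ A_non @ x, f)  # all three values coincide
```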

Example 888 As regards the quadratic form $f(x_1, x_2) = x_1^2 + x_2^2 - 4x_1 x_2$, we have:
$$A = \begin{bmatrix} 1 & -2 \\ -2 & 1 \end{bmatrix}$$
Indeed, for every $x \in \mathbb{R}^2$ we have:
$$x \cdot Ax = (x_1, x_2) \cdot \begin{bmatrix} 1 & -2 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = (x_1, x_2) \cdot (x_1 - 2x_2, \; -2x_1 + x_2) = x_1^2 - 2x_1 x_2 - 2x_1 x_2 + x_2^2 = x_1^2 + x_2^2 - 4x_1 x_2$$
N
Example 889 Let $f : \mathbb{R}^n \to \mathbb{R}$ be defined as $f(x) = \|x\|^2 = \sum_{i=1}^{n} x_i^2$ for every $x \in \mathbb{R}^n$. The symmetric matrix associated to this quadratic form is the identity matrix $I$. Indeed, $x \cdot Ix = x \cdot x = \sum_{i=1}^{n} x_i^2$. More generally, let $f(x) = \sum_{i=1}^{n} \lambda_i x_i^2$ with $\lambda_i \in \mathbb{R}$ for every $i = 1, \dots, n$. It is easy to see that the matrix associated to $f$ is the diagonal matrix
$$\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$
N

Observe that if $f : \mathbb{R}^n \to \mathbb{R}$ is a quadratic form, we have $f(0) = 0$. According to the sign of $f(x)$ for the other vectors of $\mathbb{R}^n$, it is possible (and indeed important) to classify quadratic forms as follows.

Definition 890 A quadratic form $f : \mathbb{R}^n \to \mathbb{R}$ is said to be:

(i) positive (negative) semi-definite if $f(x) \geq 0$ ($\leq 0$) for every $x \in \mathbb{R}^n$;
(ii) positive (negative) definite if $f(x) > 0$ ($< 0$) for every $x \in \mathbb{R}^n$ with $x \neq 0$;
(iii) indefinite if there exist $x, x' \in \mathbb{R}^n$ such that $f(x) < 0$ and $f(x') > 0$.

In the light of Proposition 886, we have a parallel classification for symmetric matrices, where the matrix is said to be positive semi-definite if the corresponding quadratic form is so, and so on.

In some cases it is easy to verify the sign of a quadratic form. For example, it is immediate to see that the quadratic form $f(x) = \sum_{i=1}^{n} \lambda_i x_i^2$ is positive semi-definite if and only if $\lambda_i \geq 0$ for every $i$, while it is positive definite if and only if $\lambda_i > 0$ for every $i$. In general, however, it is not simple to establish directly the sign of a quadratic form, and therefore some methods that help in this task have been developed. Among them, we present as an example the Sylvester-Jacobi criterion.
Given a symmetric matrix $A$, let us build the following square submatrices $A_1, A_2, \dots, A_n$:
$$A_1 = [a_{11}] \; ; \quad A_2 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \; ; \quad A_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \; ; \; \dots \; ; \quad A_n = A$$
and let us consider their determinants $\det A_1$, $\det A_2$, $\det A_3$, ..., $\det A_n = \det A$ (these are exactly the North-West principal minors of the matrix $A$ introduced in Section 13.6.5, considered from the smallest to the largest).

Proposition 891 (Sylvester-Jacobi criterion) A symmetric matrix $A$ is:

(i) positive definite if and only if $\det A_i > 0$ for every $i = 1, \dots, n$;
(ii) negative definite if and only if the $\det A_i$ alternate in sign starting with a negative sign (that is, $\det A_1 < 0$, $\det A_2 > 0$, $\det A_3 < 0$ and so on);
(iii) indefinite if the determinants $\det A_i$ are nonzero and the sequence of their signs respects neither (i) nor (ii).
Example 892 Let $f(x_1, x_2, x_3) = x_1^2 + 2x_2^2 + x_3^2 + (x_1 + x_3) x_2$. The matrix associated to $f$ is:
$$A = \begin{bmatrix} 1 & \frac{1}{2} & 0 \\ \frac{1}{2} & 2 & \frac{1}{2} \\ 0 & \frac{1}{2} & 1 \end{bmatrix}$$
Indeed, we have
$$x \cdot Ax = (x_1, x_2, x_3) \cdot \begin{bmatrix} 1 & \frac{1}{2} & 0 \\ \frac{1}{2} & 2 & \frac{1}{2} \\ 0 & \frac{1}{2} & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = (x_1, x_2, x_3) \cdot \left( x_1 + \frac{1}{2} x_2, \; \frac{1}{2} x_1 + 2x_2 + \frac{1}{2} x_3, \; \frac{1}{2} x_2 + x_3 \right) = x_1^2 + 2x_2^2 + x_3^2 + (x_1 + x_3) x_2$$
Let us study the sign of the quadratic form with the Sylvester-Jacobi criterion. We have:
$$\det A_1 = 1 > 0$$
$$\det A_2 = \det \begin{bmatrix} 1 & \frac{1}{2} \\ \frac{1}{2} & 2 \end{bmatrix} = \frac{7}{4} > 0$$
$$\det A_3 = \det A = \frac{3}{2} > 0$$
By the Sylvester-Jacobi criterion we can therefore conclude that the quadratic form is positive definite. N
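The North-West principal minors are immediate to compute numerically. A minimal sketch of the Sylvester-Jacobi test (ours; the helper name is illustrative, and the exact-zero check is only adequate for well-conditioned matrices):

```python
# Sketch: Sylvester-Jacobi test via North-West principal minors.
import numpy as np

def sylvester_jacobi(A):
    minors = [np.linalg.det(A[:i, :i]) for i in range(1, A.shape[0] + 1)]
    if all(m > 0 for m in minors):
        return minors, "positive definite"
    # alternating signs starting negative: -, +, -, ...
    if all((m < 0) if i % 2 == 0 else (m > 0) for i, m in enumerate(minors)):
        return minors, "negative definite"
    if all(m != 0 for m in minors):
        return minors, "indefinite"
    return minors, "criterion not applicable"

A = np.array([[1, 0.5, 0], [0.5, 2, 0.5], [0, 0.5, 1]])
print(sylvester_jacobi(A))  # minors 1, 1.75, 1.5 -> positive definite
```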

There exist versions of the Sylvester-Jacobi criterion to determine whether a symmetric matrix is positive semi-definite, negative semi-definite, or instead indefinite. We omit the details, however, and move instead to Taylor's expansion.

21.4.2 Taylor's expansion

Thanks to Theorem 790, a function $f : U \to \mathbb{R}$ defined on an open set $U$ of $\mathbb{R}^n$ with continuous partial derivatives is differentiable at every $x \in U$, that is, it can be linearly approximated:
$$f(x + h) = f(x) + df(x)(h) + o(\|h\|) = f(x) + \nabla f(x) \cdot h + o(\|h\|) \tag{21.18}$$
for every $h \in \mathbb{R}^n$ such that $x + h \in U$. As already seen in Section 19.2, if, with a small change of notation, we denote by $x_0$ the point at which $f$ is differentiable and we set $h = x - x_0$, expression (21.18) assumes the following equivalent, but more expressive, form:
$$f(x) = f(x_0) + df(x_0)(x - x_0) + o(\|x - x_0\|) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + o(\|x - x_0\|) \tag{21.19}$$
for every $x \in U$.
We can now present Taylor's expansion for functions of several variables; as in the scalar case, also in the general case of several variables Taylor's expansion refines approximation (21.19). In stating it, we limit ourselves to an approximation up to the second order, which suffices for our purposes. We postpone the study of higher-order approximations to more advanced courses.

Theorem 893 Let $f : U \to \mathbb{R}$ be twice continuously differentiable on $U$, that is, $f \in C^2(U)$. Then, at each $x_0 \in U$ we have:
$$f(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + \frac{1}{2} (x - x_0) \cdot \nabla^2 f(x_0) (x - x_0) + o\left( \|x - x_0\|^2 \right) \tag{21.20}$$
for every $x \in U$.

Expression (21.20) is called Taylor's expansion (or Taylor's formula) up to the second order. The polynomial in the variable $x$
$$f(x_0) + \nabla f(x_0) \cdot (x - x_0) + \frac{1}{2} (x - x_0) \cdot \nabla^2 f(x_0) (x - x_0)$$
is called Taylor's polynomial of second degree at the point $x_0$. The second-degree term is a quadratic form, whose associated matrix, the Hessian $\nabla^2 f(x_0)$, is symmetric thanks to Theorem 798 (of Schwarz). Naturally, if stopped at the first order, Taylor's expansion reduces to (21.19). Moreover, observe that in the scalar case Taylor's polynomial assumes the well-known form:
$$f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2$$
Indeed, in such a case we have $\nabla^2 f(x_0) = f''(x_0)$ and therefore
$$(x - x_0) \cdot \nabla^2 f(x_0)(x - x_0) = f''(x_0)(x - x_0)^2 \tag{21.21}$$

As in the scalar case, also here we have a trade-off between the simplicity of the approximation and its accuracy. Indeed, the approximation up to the first order (21.19) has the merit of simplicity with respect to that up to the second order: we approximate with a linear function rather than with a second-degree polynomial, but to the detriment of the degree of accuracy of the approximation, given by $o(\|x - x_0\|)$ instead of the better $o(\|x - x_0\|^2)$.
The choice of the order at which to stop Taylor's expansion therefore depends on the particular use we are interested in, according to which aspect of the approximation is more important, simplicity or accuracy.
Example 894 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined as $f(x_1, x_2) = 3x_1^2 e^{x_2^2}$. We have:
$$\nabla f(x) = \left( 6x_1 e^{x_2^2}, \; 6x_1^2 x_2 e^{x_2^2} \right)$$
and
$$\nabla^2 f(x) = \begin{bmatrix} 6e^{x_2^2} & 12x_1 x_2 e^{x_2^2} \\ 12x_1 x_2 e^{x_2^2} & 6x_1^2 e^{x_2^2} \left( 1 + 2x_2^2 \right) \end{bmatrix}$$
By Theorem 893, Taylor's expansion at $x_0 = (1, 1)$ is
$$f(x) = f(1,1) + \nabla f(1,1) \cdot (x_1 - 1, x_2 - 1) + \frac{1}{2} (x_1 - 1, x_2 - 1) \cdot \nabla^2 f(1,1) (x_1 - 1, x_2 - 1) + o\left( \|(x_1 - 1, x_2 - 1)\|^2 \right)$$
$$= 3e + (6e, 6e) \cdot (x_1 - 1, x_2 - 1) + \frac{1}{2} (x_1 - 1, x_2 - 1) \cdot \begin{bmatrix} 6e & 12e \\ 12e & 18e \end{bmatrix} \begin{bmatrix} x_1 - 1 \\ x_2 - 1 \end{bmatrix} + o\left( (x_1 - 1)^2 + (x_2 - 1)^2 \right)$$
$$= 3e \left( x_1^2 - 4x_1 + 5 - 8x_2 + 4x_1 x_2 + 3x_2^2 \right) + o\left( (x_1 - 1)^2 + (x_2 - 1)^2 \right)$$
The function $f(x_1, x_2) = 3x_1^2 e^{x_2^2}$ is therefore approximated at the point $(1, 1)$ by the second-degree Taylor polynomial
$$3e \left( x_1^2 - 4x_1 + 5 - 8x_2 + 4x_1 x_2 + 3x_2^2 \right)$$
with level of accuracy given by $o\left( (x_1 - 1)^2 + (x_2 - 1)^2 \right)$. N
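A numerical sanity check of this expansion (our sketch, not in the text): approaching $(1,1)$ along the diagonal, the gap between $f$ and its second-degree Taylor polynomial vanishes faster than the squared distance.

```python
# Sketch: f(x1,x2) = 3*x1^2*exp(x2^2) vs. its second-degree Taylor
# polynomial at (1,1); the error is o of the squared distance.
import math

def f(x1, x2):
    return 3 * x1**2 * math.exp(x2**2)

def taylor2(x1, x2):
    e = math.e
    return 3*e*(x1**2 - 4*x1 + 5 - 8*x2 + 4*x1*x2 + 3*x2**2)

for t in (0.1, 0.01, 0.001):
    x1, x2 = 1 + t, 1 + t
    err = abs(f(x1, x2) - taylor2(x1, x2))
    print(t, err / (2 * t**2))  # ratio error / ||x - x0||^2 -> 0
```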



21.4.3 Second-order conditions

Thanks to Taylor's expansion (21.20) we can state a second-order condition for local extremal points. Indeed, such an expansion allows us to approximate locally a function $f : U \to \mathbb{R}$ at a point $x_0 \in U$ with a second-degree polynomial in the following way:
$$f(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + \frac{1}{2} (x - x_0) \cdot \nabla^2 f(x_0)(x - x_0) + o\left( \|x - x_0\|^2 \right)$$
If $\hat{x}$ is a local extremal point (maximizer or minimizer), by Fermat's Theorem we have $\nabla f(\hat{x}) = 0$ and therefore the approximation becomes:
$$f(x) = f(\hat{x}) + \frac{1}{2} (x - \hat{x}) \cdot \nabla^2 f(\hat{x})(x - \hat{x}) + o\left( \|x - \hat{x}\|^2 \right) \tag{21.22}$$
that is,
$$f(\hat{x} + h) = f(\hat{x}) + \frac{1}{2} h \cdot \nabla^2 f(\hat{x}) h + o\left( \|h\|^2 \right)$$
By working on this simple observation, we obtain the following second-order conditions, which are based on the sign of the quadratic form $h \cdot \nabla^2 f(\hat{x}) h$.

Theorem 895 Let $f : U \to \mathbb{R}$ be twice continuously differentiable on $U$, that is, $f \in C^2(U)$. Let $\hat{x} \in U$ be a stationary point.⁴

(i) If $\hat{x}$ is a local maximizer (minimizer) on $U$, the quadratic form $h \cdot \nabla^2 f(\hat{x}) h$ is negative (positive) semi-definite.

(ii) If the quadratic form $h \cdot \nabla^2 f(\hat{x}) h$ is negative (positive) definite, then $\hat{x}$ is a strong local maximizer (minimizer).

Note that from point (i) it follows that if the quadratic form $h \cdot \nabla^2 f(\hat{x}) h$ is indefinite, the point $\hat{x}$ is neither a local maximizer nor a local minimizer on $U$. The theorem is the analogue, for functions of several variables, of Corollary 846 for scalar functions. In the proof we will reduce the problem from functions of several variables to scalar functions, and we will use this corollary. We will prove only point (i), leaving point (ii) to the reader.

⁴ For simplicity we continue to consider a function defined on a neighborhood. The reader can extend the results to functions $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and to interior points $\hat{x}$ that belong to a choice set $C \subseteq A$.

Proof (i) Let $\hat{x}$ be a local maximizer on $U$. We want to prove that the quadratic form $h \cdot \nabla^2 f(\hat{x}) h$ is negative semi-definite. For simplicity, let us suppose that $\hat{x}$ is the origin $0 = (0, \dots, 0)$ (leaving to the reader the case of a generic $\hat{x}$). First of all let us prove that $v \cdot \nabla^2 f(0) v \leq 0$ for every versor $v$ of $\mathbb{R}^n$. Afterwards we will prove that $h \cdot \nabla^2 f(0) h \leq 0$ for every $h \in \mathbb{R}^n$.
Since $0$ is a local maximizer, there exists a neighborhood $B(0)$ of $0$ such that $f(0) \geq f(x)$ for every $x \in B(0) \cap U$, and there exists a spherical neighborhood of $0$, of sufficiently small radius, contained in $B(0) \cap U$; that is, there exists $\varepsilon > 0$ such that $B_\varepsilon(0) \subseteq B(0) \cap U$. Let us observe that every vector $x \in B_\varepsilon(0)$ can be written as $x = tv$, where $v$ is a versor of $\mathbb{R}^n$, that is, $v \in \mathbb{R}^n$ with $\|v\| = 1$, and $t \in \mathbb{R}$.⁵ Clearly, $tv \in B_\varepsilon(0)$ if and only if $|t| < \varepsilon$. Having fixed an arbitrary versor $v$ of $\mathbb{R}^n$, let us define the function $\phi_v : (-\varepsilon, \varepsilon) \to \mathbb{R}$ as $\phi_v(t) = f(tv)$. Since $tv \in B_\varepsilon(0)$ for $|t| < \varepsilon$, we have
$$\phi_v(0) = f(0) \geq f(tv) = \phi_v(t)$$
for every $t \in (-\varepsilon, \varepsilon)$. It follows that $t = 0$ is a local maximizer for the function $\phi_v$ and hence, $\phi_v$ being differentiable and $t = 0$ being an interior point of the domain of $\phi_v$, by applying Corollary 846 we get $\phi_v'(0) = 0$ and $\phi_v''(0) \leq 0$. Applying the chain rule to the function
$$\phi_v(t) = f(t v_1, t v_2, \dots, t v_n)$$
we get $\phi_v'(t) = \nabla f(tv) \cdot v$ and $\phi_v''(t) = v \cdot \nabla^2 f(tv) v$. The first-order and second-order conditions therefore become
$$\phi_v'(0) = \nabla f(0) \cdot v = 0 \quad \text{and} \quad \phi_v''(0) = v \cdot \nabla^2 f(0) v \leq 0$$
Since the versor $v$ of $\mathbb{R}^n$ is arbitrary, this last inequality holds for every $v \in \mathbb{R}^n$ with $\|v\| = 1$.
Let now $h \in \mathbb{R}^n$. Analogously to before, let us observe that $h = t_h v$ for some versor $v$ of $\mathbb{R}^n$ and some $t_h \in \mathbb{R}$ such that $|t_h| = \|h\|$.

[Figure: the vector $h = t_h v$ as a multiple of the versor $v$]

Then
$$h \cdot \nabla^2 f(0) h = t_h v \cdot \nabla^2 f(0) \, t_h v = t_h^2 \; v \cdot \nabla^2 f(0) v$$
Since $v \cdot \nabla^2 f(0) v \leq 0$, we also have $h \cdot \nabla^2 f(0) h \leq 0$, and since this holds for every $h \in \mathbb{R}^n$, the quadratic form $h \cdot \nabla^2 f(0) h$ is negative semi-definite. $\blacksquare$

⁵ Intuitively, $v$ gives the direction of $x$, and $t$ gives its norm (indeed, $\|x\| = |t|$).

In the scalar case we find again the usual second-order conditions, based on the sign of the second derivative $f''(\hat{x})$. Indeed, we already observed in (21.21) that in the scalar case
$$x \cdot \nabla^2 f(\hat{x}) x = f''(\hat{x}) x^2$$
thus, in this case, the sign of the quadratic form depends only on the sign of $f''(\hat{x})$; that is, it is negative (positive) definite if and only if $f''(\hat{x}) < 0$ ($> 0$) and it is negative (positive) semi-definite if and only if $f''(\hat{x}) \leq 0$ ($\geq 0$).

Naturally, as in the scalar case, also in this more general framework condition (i) is only necessary for $\hat{x}$ to be a local maximizer. Indeed, let us consider the function $f(x_1, x_2) = x_1^2 x_2$. At $\hat{x} = 0$ we have $\nabla^2 f(0) = O$. The corresponding quadratic form $x \cdot \nabla^2 f(0) x$ is identically zero and it is therefore both negative semi-definite and positive semi-definite. Nevertheless, $\hat{x} = 0$ is neither a local maximizer nor a local minimizer. Indeed, given a generic neighborhood $B_\varepsilon(0)$, let $x = (x_1, x_2) \in B_\varepsilon(\hat{x})$ be such that $x_1 = x_2$. Let $t$ be this common value, so that
$$(t, t) \in B_\varepsilon(0) \iff \|(t,t)\| = \sqrt{t^2 + t^2} = |t| \sqrt{2} < \varepsilon \iff |t| < \frac{\varepsilon}{\sqrt{2}}$$
Since $f(t, t) = t^3$, for every $(t, t) \in B_\varepsilon(0)$ we have $f(t, t) < f(0)$ if $t < 0$ and $f(0) < f(t, t)$ if $t > 0$, which shows that $\hat{x} = 0$ is neither a local maximizer nor a local minimizer.⁶
Similarly, condition (ii) is only sufficient for $\hat{x}$ to be a local maximizer. Consider the function $f(x) = -x_1^2 x_2^2$. The point $\hat{x} = 0$ is clearly a maximizer (even a global one) for the function $f$. But $\nabla^2 f(0) = O$, and therefore the corresponding quadratic form $x \cdot \nabla^2 f(0) x$ is not negative definite.

⁶ Alternatively, it suffices to observe that each point of the I and II quadrants, off the axes, is such that $f(x_1, x_2) > 0$, and that each point of the III and IV quadrants, off the axes, is such that $f(x_1, x_2) < 0$. Every neighborhood of the origin necessarily contains both points of the I and II quadrants (off the axes), for which $f(x_1, x_2) > 0 = f(0)$, and points of the III and IV quadrants (off the axes), for which $f(x_1, x_2) < 0 = f(0)$. Hence $0$ is neither a local maximizer nor a local minimizer.

The Hessian $\nabla^2 f(\hat{x})$ is the symmetric matrix associated to the quadratic form $x \cdot \nabla^2 f(\hat{x}) x$; we can therefore equivalently state Theorem 895 in the following way:

- a necessary condition for $\hat{x}$ to be a maximizer (minimizer) is that the Hessian matrix $\nabla^2 f(\hat{x})$ be negative (positive) semi-definite;

- a sufficient condition for $\hat{x}$ to be a strong maximizer (minimizer) is that this matrix be negative (positive) definite.

This is an important observation from the practical point of view, because there exist criteria, such as that of Sylvester-Jacobi, to determine whether a symmetric matrix is positive/negative definite or semi-definite.

To illustrate Theorem 895, let us consider the case of a function of two variables $f : \mathbb{R}^2 \to \mathbb{R}$ that is twice continuously differentiable. Let $x_0 \in \mathbb{R}^2$ be a stationary point, $\nabla f(x_0) = (0, 0)$, and let
$$\nabla^2 f(x_0) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2}(x_0) & \dfrac{\partial^2 f}{\partial x_1 \partial x_2}(x_0) \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1}(x_0) & \dfrac{\partial^2 f}{\partial x_2^2}(x_0) \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \tag{21.23}$$
be the Hessian matrix computed at the point $x_0$. Since the gradient at $x_0$ is zero, the point is a candidate to be a maximizer or minimizer of $f$. To evaluate its exact nature it is necessary to proceed to the analysis of the Hessian matrix at the point. By Theorem 895, $x_0$ is a maximizer if the Hessian is negative definite, a minimizer if it is positive definite, and it is neither a maximizer nor a minimizer if it is indefinite. If the Hessian is only semi-definite, positive or negative, it is not possible to draw conclusions on the nature of $x_0$. Applying the Sylvester-Jacobi criterion to the matrix (21.23), we have that:

(i) if $a > 0$ and $ad - bc > 0$, the Hessian is positive definite, and therefore $x_0$ is a strong local minimizer;

(ii) if $a < 0$ and $ad - bc > 0$, the Hessian is negative definite, and therefore $x_0$ is a strong local maximizer;

(iii) if $ad - bc < 0$, the Hessian is indefinite, and therefore $x_0$ is neither a local maximizer nor a local minimizer.

In all the other cases it is not possible to draw conclusions on the nature of the point $x_0$.
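These three cases translate directly into a small classifier. A sketch (our illustration, with $b = c$ by Schwarz's theorem; the function name is ours):

```python
# Sketch: classifying a stationary point of a function of two variables
# from its Hessian [[a, b], [b, d]] via the Sylvester-Jacobi criterion.
def classify_2d(a, b, d):
    det = a*d - b*b
    if det > 0 and a > 0:
        return "strong local minimizer"
    if det > 0 and a < 0:
        return "strong local maximizer"
    if det < 0:
        return "neither (indefinite Hessian)"
    return "inconclusive (semi-definite Hessian)"

print(classify_2d(6, 0, 2))   # minimizer, as in Example 896 below
```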

We conclude with two examples.

Example 896 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be the function defined as $f(x_1, x_2) = 3x_1^2 + x_2^2 + 6x_1$. The gradient of $f$ is $\nabla f(x) = (6x_1 + 6, 2x_2)$. Its Hessian matrix is
$$\nabla^2 f(x) = \begin{bmatrix} 6 & 0 \\ 0 & 2 \end{bmatrix}$$
It is easy to see that the unique point where the gradient vanishes is the point $x_0 = (-1, 0) \in \mathbb{R}^2$, that is, $\nabla f(-1, 0) = (0, 0)$. Moreover, using what we have just seen, since $a > 0$ and $ad - bc > 0$, the point $x_0 = (-1, 0)$ is a strong local minimizer of $f$. N

Example 897 Let $f : \mathbb{R}^3 \to \mathbb{R}$ be defined as $f(x_1, x_2, x_3) = x_1^3 + x_2^3 + 3x_3^2 - 2x_3 + x_1^2 x_2^2$. We have
$$\nabla f(x) = \left( 3x_1^2 + 2x_1 x_2^2, \; 3x_2^2 + 2x_1^2 x_2, \; 6x_3 - 2 \right)$$
and therefore
$$\nabla^2 f(x) = \begin{bmatrix} 6x_1 + 2x_2^2 & 4x_1 x_2 & 0 \\ 4x_1 x_2 & 6x_2 + 2x_1^2 & 0 \\ 0 & 0 & 6 \end{bmatrix}$$
The stationary points are $x' = (-3/2, -3/2, 1/3)$ and $x'' = (0, 0, 1/3)$. At $x'$, we have
$$\nabla^2 f(x') = \begin{bmatrix} -\frac{9}{2} & 9 & 0 \\ 9 & -\frac{9}{2} & 0 \\ 0 & 0 & 6 \end{bmatrix}$$
and therefore
$$\det \left[ -\frac{9}{2} \right] < 0 \; ; \quad \det \begin{bmatrix} -\frac{9}{2} & 9 \\ 9 & -\frac{9}{2} \end{bmatrix} < 0 \; ; \quad \det \nabla^2 f(x') < 0$$
By the Sylvester-Jacobi criterion the Hessian matrix is indefinite. By Theorem 895, the point $x' = (-3/2, -3/2, 1/3)$ is neither a local minimizer nor a local maximizer. For the point $x'' = (0, 0, 1/3)$ we have
$$\nabla^2 f(x'') = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 6 \end{bmatrix}$$
which is positive semi-definite since $x \cdot \nabla^2 f(x'') x = 6x_3^2$ (note that it is not positive definite: for example, $(1, 1, 0) \cdot \nabla^2 f(x'')(1, 1, 0) = 0$). N

21.4.4 Unconstrained optima: vector functions

Lastly, we can generalize to the vector case the partial procedure for the solution of unconstrained optimization problems discussed in Section 20.5.3. Let us consider the unconstrained optimization problem
$$\max_x f(x) \quad \text{sub } x \in C$$
where $C$ is an open set of $\mathbb{R}^n$. Let us assume that $f \in C^2(C)$. Thanks to point (i) of Theorem 895, the procedure of Section 20.5.3 assumes the following form:

1. We determine the set $S \subseteq C$ of the stationary interior points of $f$, solving the first-order condition $\nabla f(x) = 0$ (Section 20.1.3).

2. We compute the Hessian $\nabla^2 f$ at each of the stationary points $x \in S$ and we determine the set
$$S_2 = \left\{ x \in S : \nabla^2 f(x) \text{ is negative semi-definite} \right\}$$

3. We determine the set
$$S_3 = \left\{ x \in S_2 : f(x) \geq f(x') \text{ for every } x' \in S_2 \right\}$$
which constitutes the set of the points of $C$ that are candidates to be solutions of the optimization problem.

Also here the procedure is not conclusive, because nothing ensures the existence of a solution. Later in the book we will discuss this crucial problem by combining, in the method of elimination, existence theorems with differential methods.
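A minimal sketch of stages 1-2 (ours, not from the text), run on the function of Example 897; it classifies each stationary point through the eigenvalues of the Hessian, which for symmetric matrices is equivalent to the definiteness test used above.

```python
# Sketch: Hessian classification at the stationary points of
# f = x1^3 + x2^3 + 3*x3^2 - 2*x3 + x1^2*x2^2 (Example 897).
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', real=True)
f = x1**3 + x2**3 + 3*x3**2 - 2*x3 + x1**2*x2**2
X = [x1, x2, x3]
grad = [sp.diff(f, v) for v in X]
H = sp.hessian(f, X)

# The two stationary points found in Example 897.
points = [{x1: sp.Rational(-3, 2), x2: sp.Rational(-3, 2), x3: sp.Rational(1, 3)},
          {x1: 0, x2: 0, x3: sp.Rational(1, 3)}]

for pt in points:
    assert all(g.subs(pt) == 0 for g in grad)          # stationarity check
    eigs = [float(sp.N(e)) for e in H.subs(pt).eigenvals()]
    if all(e > 0 for e in eigs):
        verdict = "strong local minimizer"
    elif all(e < 0 for e in eigs):
        verdict = "strong local maximizer"
    elif any(e > 0 for e in eigs) and any(e < 0 for e in eigs):
        verdict = "neither (indefinite Hessian)"
    else:
        verdict = "inconclusive (semi-definite Hessian)"
    print(pt, verdict)
```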

Example 898 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined as $f(x_1, x_2) = -2x_1^2 - x_2^2 + 3(x_1 + x_2) - x_1 x_2 + 3$. Let us study the unconstrained optimization problem
$$\max_x f(x) \quad \text{sub } x \in \mathbb{R}^2_{++}$$
Here $C = \mathbb{R}^2_{++}$ is the first quadrant of the plane without the axes (and it is therefore an open set). We have:
$$\nabla f(x) = (-4x_1 + 3 - x_2, \; -2x_2 + 3 - x_1)$$
and therefore from the first-order condition $\nabla f(x) = 0$ it follows that the unique stationary point is $x = (3/7, 9/7)$, that is, $S = \{(3/7, 9/7)\}$. We have
$$\nabla^2 f(x) = \begin{bmatrix} -4 & -1 \\ -1 & -2 \end{bmatrix}$$
By the Sylvester-Jacobi criterion, the Hessian matrix $\nabla^2 f(x)$ is negative definite.⁷ Hence, $S_2 = \{(3/7, 9/7)\}$. Since $S_2$ is a singleton, we trivially have $S_3 = S_2$. In conclusion, the point $x = (3/7, 9/7)$ is the unique candidate to be a solution of the unconstrained optimization problem. It is possible to show that this point is indeed the solution of the problem. For the moment we can only say that it is a local maximizer (Theorem 895-(ii)). N

⁷ Since $\nabla^2 f(x)$ is negative definite for all $x \in \mathbb{R}^2_{++}$, this also proves that $f$ is concave.

21.5 Asymptotic expansions

21.5.1 Asymptotic scales and expansions

Up to now we have considered polynomial expansions. Although they are the most relevant, it is useful to mention more general expansions (the study of which was pioneered by Henri Poincaré in 1886), so as to better contextualize the polynomial case itself.
Let us take any open interval $(a, b)$, bounded or unbounded; in other words, $a, b \in \overline{\mathbb{R}}$. A family of scalar functions $\Phi = \{\varphi_n\}_{n=0}^{\infty}$ defined on $(a, b)$ is said to be an asymptotic scale at $x_0 \in [a, b]$ if,⁸ for every $n \geq 0$, we have
$$\varphi_{n+1} = o(\varphi_n) \quad \text{as } x \to x_0$$

Example 899 (i) Power functions $\varphi_n(x) = (x - x_0)^n$ are an asymptotic scale at $x_0 \in (a, b)$. (ii) Negative power functions $\varphi_n(x) = x^{-n}$ are an asymptotic scale at $x_0 = +\infty$.⁹ More generally, powers $\varphi_n(x) = x^{-\alpha_n}$ form an asymptotic scale at $x_0 = +\infty$ as long as $\alpha_{n+1} > \alpha_n$ for every $n \geq 1$. (iii) The trigonometric functions $\varphi_n(x) = \sin^n(x - x_0)$ form an asymptotic scale at $x_0 \in (a, b)$. (iv) Logarithms $\varphi_n(x) = \log^{-n} x$ form an asymptotic scale at $x_0 = +\infty$. N

Let us now give a general definition of expansion.

Definition 900 A function $f : (a, b) \to \mathbb{R}$ admits an expansion of order $n$ with respect to the scale $\Phi$ at $x_0 \in [a, b]$ if there exist scalars $\{\alpha_k\}_{k=0}^{n}$ such that
$$f(x) = \sum_{k=0}^{n} \alpha_k \varphi_k(x) + o(\varphi_n) \quad \text{as } x \to x_0 \tag{21.24}$$
for every $x \in (a, b)$.

⁸ The expression $x_0 \in [a, b]$ entails that $x_0$ is an accumulation point of $(a, b)$. For example, if $(a, b)$ is the whole real line, the point $x_0$ belongs to the extended real line; in symbols, if $(a, b) = (-\infty, +\infty)$ we have $x_0 \in [-\infty, +\infty]$.
⁹ Whenever, as in this example, $x_0 = +\infty$, the interval $(a, b)$ is unbounded above, $b = +\infty$. (The negative power function scale example was made by Poincaré himself.)

Polynomial expansions, in the form (21.2), are a special case of (21.24) where the asymptotic scale is given by power functions. Furthermore, contrary to the polynomial case where $x_0$ had to be a scalar, now we can take $x_0 = \pm\infty$. General expansions are relevant because, with respect to polynomial expansions, they allow us to approximate a function for large values of the argument, that is to say, asymptotically. In symbols, condition (21.24) can be expressed as
$$f(x) \sim \sum_{k=0}^{n} \alpha_k \varphi_k(x) \quad \text{as } x \to x_0$$
For example, for $n = 2$ we get the quadratic approximation:
$$f(x) \sim \alpha_0 \varphi_0(x) + \alpha_1 \varphi_1(x) + \alpha_2 \varphi_2(x) \quad \text{as } x \to x_0$$
By using the scale of power functions, we end up with the well-known quadratic approximation
$$f(x) \sim \alpha_0 + \alpha_1 x + \alpha_2 x^2 \quad \text{as } x \to 0$$
If instead we use the scale of negative power functions, we get:
$$f(x) \sim \alpha_0 + \frac{\alpha_1}{x} + \frac{\alpha_2}{x^2} \quad \text{as } x \to +\infty$$
In such a case, as $x_0 = +\infty$, we are dealing with a quadratic asymptotic approximation.

Example 901 It holds that:
$$\frac{1}{x - 1} \sim \frac{1}{x} + \frac{1}{x^2} \quad \text{as } x \to +\infty \tag{21.25}$$
Indeed,
$$\frac{1}{x - 1} - \frac{1}{x} - \frac{1}{x^2} = \frac{1}{x^2 (x - 1)} = o\left( \frac{1}{x^2} \right) \quad \text{as } x \to +\infty$$
Approximation (21.25) is asymptotic. For values close to $0$, we consider instead the quadratic polynomial approximation:
$$\frac{1}{x - 1} \sim -1 - x - x^2 \quad \text{as } x \to 0$$
N
The crucial property regarding the uniqueness of polynomial expansions (Lemma 868) still holds in the general case.

Lemma 902 A function $f : (a, b) \to \mathbb{R}$ has at most one expansion of order $n$ with respect to the scale $\Phi$ at every point $x_0 \in [a, b]$.

Proof Let us consider the expansion $\sum_{k=0}^{n} \alpha_k \varphi_k(x) + o(\varphi_n)$ at $x_0 \in [a, b]$. We have that
$$\lim_{x \to x_0} \frac{f(x)}{\varphi_0(x)} = \lim_{x \to x_0} \frac{\sum_{k=0}^{n} \alpha_k \varphi_k(x) + o(\varphi_n)}{\varphi_0(x)} = \alpha_0 \tag{21.26}$$
$$\lim_{x \to x_0} \frac{f(x) - \alpha_0 \varphi_0(x)}{\varphi_1(x)} = \lim_{x \to x_0} \frac{\sum_{k=1}^{n} \alpha_k \varphi_k(x) + o(\varphi_n)}{\varphi_1(x)} = \alpha_1 \tag{21.27}$$
$$\vdots$$
$$\lim_{x \to x_0} \frac{f(x) - \sum_{k=0}^{n-1} \alpha_k \varphi_k(x)}{\varphi_n(x)} = \alpha_n \tag{21.28}$$
Suppose that, for every $x \in (a, b)$, there are two different expansions
$$\sum_{k=0}^{n} \alpha_k \varphi_k(x) + o(\varphi_n) = \sum_{k=0}^{n} \beta_k \varphi_k(x) + o(\varphi_n) \tag{21.29}$$
Equalities (21.26)-(21.28) must hold for both expansions. Hence, by (21.26) we have $\alpha_0 = \beta_0$. Iterating such a procedure, from equality (21.27) we get $\alpha_1 = \beta_1$, and so on until $\alpha_n = \beta_n$. $\blacksquare$

Limits (21.26)-(21.28) are crucial: it is rather easy to see that expansion (21.24) holds if and only if these limits exist (and are finite). Such limits in turn determine the expansion's coefficients $\{\alpha_k\}_{k=0}^{n}$.¹⁰

¹⁰ The "only if" part is shown in the previous proof; the reader can verify the converse.

Example 903 Let us determine the quadratic asymptotic approximation, with respect to the scale of negative power functions, of the function $f : (-1, +\infty) \to \mathbb{R}$ defined as $f(x) = 1/(1 + x)$. Thanks to equalities (21.26)-(21.28), we have
$$\alpha_0 = \lim_{x \to +\infty} \frac{f(x)}{\varphi_0(x)} = \lim_{x \to +\infty} \frac{\frac{1}{1+x}}{1} = \lim_{x \to +\infty} \frac{1}{1 + x} = 0$$
$$\alpha_1 = \lim_{x \to +\infty} \frac{f(x) - \alpha_0 \varphi_0(x)}{\varphi_1(x)} = \lim_{x \to +\infty} \frac{\frac{1}{1+x}}{\frac{1}{x}} = \lim_{x \to +\infty} \frac{x}{1 + x} = 1$$
$$\alpha_2 = \lim_{x \to +\infty} \frac{f(x) - \alpha_0 \varphi_0(x) - \alpha_1 \varphi_1(x)}{\varphi_2(x)} = \lim_{x \to +\infty} \frac{\frac{1}{1+x} - \frac{1}{x}}{\frac{1}{x^2}} = \lim_{x \to +\infty} \frac{-x}{1 + x} = -1$$
Hence the desired approximation is
$$\frac{1}{1 + x} \sim \frac{1}{x} - \left( \frac{1}{x} \right)^2 \quad \text{as } x \to +\infty$$
By the previous lemma, it is the only quadratic asymptotic approximation with respect to the scale of negative power functions. N
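The limits (21.26)-(21.28) can also be approximated numerically by evaluating at a large argument. A rough sketch (ours, not in the text):

```python
# Sketch: numerically estimating the coefficients of Example 903
# against the scale 1, 1/x, 1/x^2 by evaluating at a large x.
f = lambda x: 1.0 / (1.0 + x)

x = 1e6
a0 = f(x) / 1.0              # -> 0
a1 = (f(x) - 0.0) * x        # -> 1
a2 = (f(x) - 1.0/x) * x**2   # -> -1
print(a0, a1, a2)
```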

By changing scale, the expansion changes as well. For example, approximation (21.25) is a quadratic approximation of $1/(x - 1)$ with respect to the scale of negative power functions; by changing scale, however, one obtains a different quadratic approximation. Indeed, if for example at $x_0 = +\infty$ we consider the asymptotic scale $\varphi_n(x) = (x + 1)/x^{2n}$, we obtain the quadratic asymptotic approximation
$$\frac{1}{x - 1} \sim \frac{x + 1}{x^2} + \frac{x + 1}{x^4} \quad \text{as } x \to +\infty$$
In fact,
$$\frac{1}{x - 1} - \frac{x + 1}{x^2} - \frac{x + 1}{x^4} = \frac{1}{(x - 1) x^4} = o\left( \frac{x + 1}{x^4} \right) \quad \text{as } x \to +\infty$$
In conclusion, different asymptotic scales lead to different, although unique, approximations (as long as they exist). But then, different functions can have the same expansion, as the next example shows.

Example 904 Both
$$\frac{1}{1 + x} \sim \frac{1}{x} - \left( \frac{1}{x} \right)^2 \quad \text{as } x \to +\infty$$
and
$$\frac{1 + e^{-x}}{1 + x} \sim \frac{1}{x} - \left( \frac{1}{x} \right)^2 \quad \text{as } x \to +\infty$$
hold. Indeed,
$$\frac{1 + e^{-x}}{1 + x} - \left( \frac{1}{x} - \frac{1}{x^2} \right) = \frac{1 + x^2 e^{-x}}{(1 + x) x^2} = o\left( \frac{1}{x^2} \right) \quad \text{as } x \to +\infty$$
Therefore $1/x - 1/x^2$ is the quadratic asymptotic approximation of both $1/(1 + x)$ and $(1 + e^{-x})/(1 + x)$. N

The reader might recall that we considered the two following formulations of the De
Moivre-Stirling formula

log n! = n log n n + o (n)


1 p
= n log n n + log n + log 2 + o (1)
2
the …rst one being slightly less precise but easier to derive. Although the deal with discrete
variables, these formulas can be thought of as two expansions for n ! +1 of function
log n!. In particular, the …rst one is a quadratic asymptotic approximation with respect to
a scale whose …rst two terms are fn log n; ng, for example n log n; n; 1; 1=n; 1=n2 ; ::: ; the
second one is an expansion of order 4 with respect to a scale whose …rst four terms are
fn log n; n; log n; 1g, for example fn log n; n; log n; 1; 1=n; :::g
To make this discussion more precise, let us consider the function Γ : (0, +∞) → R, called the gamma function, given by

  Γ(x) = ∫_0^{+∞} t^{x−1} e^{−t} dt

where the integral is an improper one (Section 30.10.1).

Lemma 905 It holds that Γ(x + 1) = xΓ(x) for every x > 0.

Proof By integrating by parts, one obtains that for every 0 < a < b

  ∫_a^b t^x e^{−t} dt = [−e^{−t} t^x]_a^b + x ∫_a^b t^{x−1} e^{−t} dt = −e^{−b} b^x + e^{−a} a^x + x ∫_a^b t^{x−1} e^{−t} dt

If a ↓ 0 we have e^{−a} a^x → 0 and if b ↑ +∞ we have e^{−b} b^x → 0,^11 thus implying the desired result. □

By iterating, we thus have that, for every n ≥ 1,

  Γ(n + 1) = nΓ(n) = n(n − 1)Γ(n − 1) = ⋯ = n! Γ(1) = n!

^11 Since x > 0, we have lim_{a→0} a^x = 0 because lim_{a→0} log a^x = x lim_{a→0} log a = −∞.

as (1) = 1. The Gamma function can be therefore thought of as the extension on the
real line of the factorial function, which is de…ned on the natural numbers.12 It is a very
important function: the next remarkable result makes its interpretation in terms of expansion
of the two versions of the De Moivre-Stirling formula more rigorous.

Theorem 906 We have, for x → +∞,

  log Γ(x) = x log x − x + o(x)
          = x log x − x − (1/2) log x + log √(2π) + o(1)

In the expansion notation, we can thus write that, for x → +∞,

  log Γ(x) ≈ x log x − x
  log Γ(x) ≈ x log x − x − (1/2) log x + log √(2π)
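Both expansions are easy to probe numerically. The following rough check is ours, not the book's; it assumes only the Python standard library, whose math.lgamma evaluates log Γ directly:

    import math

    print(math.gamma(6))   # 120.0 = 5!, i.e. Gamma(n + 1) = n!

    for x in [10.0, 100.0, 1000.0]:
        exact = math.lgamma(x)                 # log Gamma(x)
        order2 = x * math.log(x) - x
        order4 = order2 - 0.5 * math.log(x) + 0.5 * math.log(2 * math.pi)
        print(x, exact - order2, exact - order4)
    # exact - order2 grows in magnitude like (1/2) log x, hence is o(x);
    # exact - order4 shrinks to 0, as the o(1) term requires.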

21.5.2 Asymptotic expansions and analytic functions

If a sequence of coefficients {α_k}_{k=0}^∞ is such that (21.24) holds for every n, we write

  f(x) ≈ Σ_{k=0}^∞ α_k φ_k(x)   as x → x0

for every x ∈ (a, b). The expression Σ_{k=0}^∞ α_k φ_k(x) is called the asymptotic expansion of f at x0. Having set a value for the argument x, the asymptotic expansion is a series. In general, such a series does not necessarily converge to the value f(x); it might even not converge at all. In fact, an asymptotic expansion is required to be an approximation with a certain degree of accuracy, and nothing more. The next example lists the different (fortunate or less fortunate) cases one can encounter.

Example 907 (i) The function f : (1, +∞) → R defined by f(x) = 1/(x − 1) has, with respect to the scale of negative power functions, the asymptotic expansion

  f(x) ≈ Σ_{k=1}^∞ 1/x^k   as x → +∞    (21.30)

The asymptotic expansion is, for every given x, a geometric series; therefore it converges for every x > 1, that is, for every x in the domain of f, with

  f(x) = Σ_{k=1}^∞ 1/x^k

^12 Instead of Γ(n + 1) = n! we would have exactly Γ(n) = n! if in the gamma function the exponent were x instead of x − 1, which is the standard notation. This detail also explains the opposite sign of the logarithmic term in the approximations of n! and of Γ(x). The properties of the gamma function, including the next theorem and its proof, can be found in E. Artin, "The gamma function", Holt, Rinehart and Winston, 1964.

In this (fortunate) case the asymptotic expansion is actually exact: the series determined by the asymptotic expansion converges to f(x) for every x in the domain of f.

(ii) Also the function f : (1, +∞) → R defined by f(x) = (1 + e^{−x})/(x − 1) has, with respect to the scale of negative power functions, the asymptotic expansion (21.30) for x → +∞. However, in this case, for every x > 1 we have

  f(x) ≠ Σ_{k=1}^∞ 1/x^k

In this instance the asymptotic expansion is merely an approximation, with degree of accuracy x^{−n} for every n.
(iii) Consider the function f : (1, +∞) → R defined by^13

  f(x) = e^{−x} ∫_1^x (e^t/t) dt

By integrating repeatedly by parts, we get

  ∫_1^x (e^t/t) dt = [e^t/t]_1^x + ∫_1^x (e^t/t²) dt = [e^t/t + e^t/t²]_1^x + 2 ∫_1^x (e^t/t³) dt
                  = [e^t (1/t + 1/t² + 2!/t³)]_1^x + 3! ∫_1^x (e^t/t⁴) dt
                  = ⋯ = [e^t (1/t + 1/t² + 2!/t³ + ⋯ + (n − 1)!/t^n)]_1^x + n! ∫_1^x (e^t/t^{n+1}) dt

Since, splitting the integral at x/2,

  0 ≤ [∫_1^x (e^t/t^{n+1}) dt]/(e^x/x^n) ≤ [(x/2) e^{x/2} + 2^{n+1} e^x/x^{n+1}]/(e^x/x^n) = x^{n+1} e^{−x/2}/2 + 2^{n+1}/x → 0

we have that

  ∫_1^x (e^t/t^{n+1}) dt = o(e^x/x^n)   as x → +∞

Hence, the boundary terms at t = 1 being of order e^{−x} after multiplication by e^{−x},

  f(x) = 1/x + 1/x² + 2!/x³ + 3!/x⁴ + ⋯ + (n − 1)!/x^n + o(1/x^n)   as x → +∞

and

  f(x) ≈ Σ_{k=1}^∞ (k − 1)!/x^k   as x → +∞

For any given x > 1, the ratio criterion implies that Σ_{k=1}^∞ (k − 1)!/x^k = +∞. The asymptotic expansion thus determines a divergent series. In this (very unfortunate) case not only does the series fail to converge to f(x), it even diverges. N
^13 This example is taken from N. G. de Bruijn, "Asymptotic methods in analysis", North-Holland, 1961.
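The divergence is compatible with excellent finite-order accuracy, and a quick numerical experiment shows both at once. The sketch below is ours; it assumes that a plain trapezoidal quadrature of the integral is accurate enough at x = 10:

    import math

    def f(x, steps=200_000):
        # f(x) = e^(-x) * integral_1^x e^t/t dt; integrating e^(t-x)/t avoids overflow
        h = (x - 1.0) / steps
        vals = [math.exp(1.0 + i * h - x) / (1.0 + i * h) for i in range(steps + 1)]
        return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    x = 10.0
    target, partial, fact = f(x), 0.0, 1.0   # fact holds (k-1)!
    for n in range(1, 25):
        partial += fact / x ** n
        fact *= n
        print(n, abs(target - partial))
    # the error falls until n is near x = 10, then grows without bound:
    # each truncation is an o(x^(-n)) approximation, yet the full series diverges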

Let us go back to the polynomial case, in which the asymptotic expansion of f : (a, b) → R at x0 ∈ (a, b) has the form

  f(x) ≈ Σ_{k=0}^∞ α_k (x − x0)^k   as x → x0

When f is infinitely differentiable at x0, by Taylor's Theorem the asymptotic expansion becomes

  f(x) ≈ Σ_{k=0}^∞ [f^{(k)}(x0)/k!] (x − x0)^k   as x → x0

The right-hand side of the expansion is the Taylor series (the MacLaurin series if x0 = 0).

The function f is said to be analytic when its polynomial asymptotic expansion is no longer a mere approximation but coincides exactly with the original function: for every x0 ∈ (a, b) it holds that

  f(x) = Σ_{k=0}^∞ α_k (x − x0)^k   ∀x ∈ (a, b)

Analytic functions are thus expandable as series of power functions.
Analytic functions are thus expandable as series of power functions.
Proposition 908 A function f : (a, b) → R is analytic if and only if, for every x0 ∈ (a, b),

  f(x) = Σ_{k=0}^∞ [f^{(k)}(x0)/k!] (x − x0)^k   ∀x ∈ (a, b)    (21.31)

Proof The converse being trivial, let us consider the "only if" side. Let f be analytic. Since, by hypothesis, the series Σ_{k=0}^∞ α_k (x − x0)^k converges for every x ∈ (a, b), with sum f(x), one can show that f is infinitely differentiable at every x ∈ (a, b). Let n ≥ 1. By Taylor's Theorem, we have

  f(x) ≈ Σ_{k=0}^n [f^{(k)}(x0)/k!] (x − x0)^k   as x → x0

Lemma 902 implies that α_k = f^{(k)}(x0)/k! for every 0 ≤ k ≤ n. Since n was arbitrarily chosen, the desired result follows. □

The following result shows that some classic elementary functions are indeed analytic.

Proposition 909 (i) The exponential and logarithmic functions are analytic; in particular

  e^x = Σ_{k=0}^∞ x^k/k!   ∀x ∈ R

  log(1 + x) = Σ_{k=1}^∞ (−1)^{k+1} x^k/k   ∀x ∈ (−1, 1]

(ii) The trigonometric functions sine and cosine are analytic; in particular

  sin x = Σ_{k=0}^∞ [(−1)^k/(2k + 1)!] x^{2k+1}   and   cos x = Σ_{k=0}^∞ [(−1)^k/(2k)!] x^{2k}   ∀x ∈ R

Proof Let us only consider the exponential function. By Theorem 355, at x0 = 0 we have e^x = Σ_{k=0}^∞ x^k/k! for every x ∈ R. By substitution, for every x0 ∈ R it holds that e^x = e^{x0} + e^{x0} Σ_{k=1}^∞ (x − x0)^k/k! for every x ∈ R. The exponential function is thus analytic on the real line. □
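A quick numerical check of the analyticity of the exponential (ours; standard library only): the MacLaurin partial sums converge to the function value itself, not merely to an approximation of prescribed order.

    import math

    def exp_partial_sum(x, n):
        total, term = 0.0, 1.0   # term holds x^k / k!
        for k in range(n + 1):
            total += term
            term *= x / (k + 1)
        return total

    x = 2.0
    for n in [2, 5, 10, 20]:
        print(n, abs(math.exp(x) - exp_partial_sum(x, n)))
    # the error falls to (near) machine precision as n grows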

In the previous proof we saw that being infinitely differentiable is a necessary condition for a function to be analytic. However, the following example shows that such a condition is not sufficient.

Example 910 The function f : R → R given by

  f(x) = e^{−1/x²} if x ≠ 0,  f(x) = 0 if x = 0

is infinitely differentiable at every point of the real line, hence at the origin, so that

  f(x) ≈ Σ_{k=0}^∞ [f^{(k)}(0)/k!] x^k   as x → 0

However, it holds that f^{(n)}(0) = 0 for every n ≥ 1, and so

  f(x) ≠ 0 = Σ_{k=0}^n [f^{(k)}(0)/k!] x^k   for every x ≠ 0

The function f is not analytic although it is infinitely differentiable. N

In conclusion, analytic functions f : (a, b) → R are a relevant subclass of the infinitely differentiable functions on (a, b). Thanks to their asymptotic expansion, which is both polynomial and exact (what more could one want?), they are the nicest of all functions from the standpoint of analytical tractability. This makes them perfect for applications, which can hardly do without them.

21.5.3 Hille's formula

We can now state a beautiful version of Taylor's formula, due to Einar Hille, for continuous functions (we omit its non-trivial proof).

Theorem 911 (Hille) Let f : (0, ∞) → R be a bounded continuous function and x0 > 0. Then, for each h > 0,

  f(x0 + h) = lim_{δ→0+} Σ_{k=0}^∞ [Δ_δ^k f(x0)/k!] h^k    (21.32)

We call the limit (21.32) Hille's formula. When f is infinitely differentiable, Hille's formula intuitively should approach the series expansion (21.31), i.e.,

  f(x0 + h) = Σ_{k=0}^∞ [f^{(k)}(x0)/k!] h^k

because lim_{δ→0+} Δ_δ^k f(x0) = f^{(k)}(x0) for every k ≥ 1 (Proposition 777). This is actually true when f is analytic, since in this case (21.31) and (21.32) together imply

  lim_{δ→0+} Σ_{k=0}^∞ [Δ_δ^k f(x0)/k!] h^k = Σ_{k=0}^∞ [f^{(k)}(x0)/k!] h^k

Hille's formula, however, holds when f is just bounded and continuous, thus providing a remarkable generalization of the Taylor expansion of analytic functions.
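Hille's formula can also be explored numerically. The sketch below is ours, and it rests on one assumption worth flagging: instead of summing high-order difference quotients, which is numerically unstable, it uses the algebraically equivalent Poisson-average rearrangement e^{−h/δ} Σ_j [(h/δ)^j/j!] f(x0 + jδ) of the series in (21.32):

    import math

    def hille_sum(f, x0, h, d, terms=4000):
        lam = h / d
        log_w = -lam            # log of the Poisson weight e^(-lam) * lam^j / j!
        total = 0.0
        for j in range(terms):
            total += math.exp(log_w) * f(x0 + j * d)
            log_w += math.log(lam) - math.log(j + 1)
        return total

    f, x0, h = math.sin, 1.0, 0.5   # sin is bounded and continuous
    for d in [0.1, 0.01, 0.001]:
        print(d, abs(f(x0 + h) - hille_sum(f, x0, h, d)))
    # the gap shrinks as d -> 0+, as Theorem 911 predicts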
Chapter 22

Concavity and differentiability

Concave functions have remarkable differential properties that confirm the great tractability of these widely used functions. The study of these properties is the subject matter of this chapter. We begin with scalar functions and then move to vector ones. Throughout the chapter C always denotes a convex set (so an interval in the scalar case). For brevity, we focus on concave functions, leaving to the reader the dual results that hold for convex functions.

22.1 Scalar functions

22.1.1 Decreasing marginal effects

The differential properties of a scalar concave function f : C ⊆ R → R follow from a simple geometric observation. Given two points x and y in the domain of a function f, the chord that joins the points (x, f(x)) and (y, f(y)) of the graph has slope

  [f(y) − f(x)]/(y − x)

as is easy to check with a simple modification of what was seen for (18.6). Graphically:

[Figure: the chord through (x, f(x)) and (y, f(y)) rises by f(y) − f(x) over the run y − x]

If a function is concave, the slope of the chord decreases as we move the chord rightward. This basic geometric property characterizes concavity, as the next lemma shows.

Lemma 912 A function f : C ⊆ R → R is concave if and only if, for any four points x, w, y, z ∈ C with x ≤ w < y ≤ z, we have

  [f(y) − f(x)]/(y − x) ≥ [f(z) − f(w)]/(z − w)    (22.1)
Graphically:

5 D
C
4

3
B
2

1 A

0
O x w y z
-1
-1 0 1 2 3 4 5 6

Note that a strict inequality in (22.1) characterizes strict concavity.

Proof "Only if". Let f be concave. The proof is divided in two steps: first we show that the chord AC has a greater slope than the chord BC:

[Figure: chords AC and BC over the points x ≤ w < y]

Then, we show that the chord BC has a greater slope than the chord BD:

[Figure: chords BC and BD over the points w < y ≤ z]

The first step amounts to proving (22.1) for z = y. Since x ≤ w < y, there exists λ ∈ [0, 1] such that w = λx + (1 − λ)y. Since f is concave we have f(w) ≥ λf(x) + (1 − λ)f(y), so that

  [f(y) − f(w)]/(y − w) ≤ [f(y) − λf(x) − (1 − λ)f(y)]/[y − λx − (1 − λ)y] = [f(y) − f(x)]/(y − x)    (22.2)

This completes the first step. We now move to the second step, which amounts to proving (22.1) for x = w. Since w < y ≤ z, there exists λ ∈ [0, 1] such that y = λw + (1 − λ)z. Since f is concave we have f(y) ≥ λf(w) + (1 − λ)f(z), so that

  [f(y) − f(w)]/(y − w) ≥ [λf(w) + (1 − λ)f(z) − f(w)]/[λw + (1 − λ)z − w] = [f(z) − f(w)]/(z − w)    (22.3)

From (22.2) and (22.3) it follows that

  [f(z) − f(w)]/(z − w) ≤ [f(y) − f(w)]/(y − w) ≤ [f(y) − f(x)]/(y − x)

as desired.

"If". Assume (22.1). Let x, z ∈ C, with x < z, and λ ∈ [0, 1]. Set y = λx + (1 − λ)z. If in (22.1) we set w = x, we have

  [f(λx + (1 − λ)z) − f(x)]/[λx + (1 − λ)z − x] ≥ [f(z) − f(x)]/(z − x)

Since λx + (1 − λ)z − x = (1 − λ)(z − x), we then have

  [f(λx + (1 − λ)z) − f(x)]/[(1 − λ)(z − x)] ≥ [f(z) − f(x)]/(z − x)

that is, f(λx + (1 − λ)z) − f(x) ≥ (1 − λ)[f(z) − f(x)]. In turn, this implies that f is concave, as desired. □

The geometric property (22.1) has the following analytical counterpart, of great economic significance.

Proposition 913 If f : C ⊆ R → R is concave, then it has decreasing increments (or differences), i.e.,

  f(x + h) − f(x) ≥ f(y + h) − f(y)    (22.4)

for all x, y ∈ C and h ≥ 0 with x ≤ y and y + h ∈ C. The converse is true if f is continuous.

Proof Let x ≤ y and h ≥ 0. Then the points y and x + h belong to the interval [x, y + h]. Under the change of variable z = y + h, we have x + h, z − h ∈ [x, z]. Hence there is a λ ∈ [0, 1] for which x + h = λx + (1 − λ)z. It is immediate to check that z − h = (1 − λ)x + λz. By the concavity of f, we then have f(x + h) ≥ λf(x) + (1 − λ)f(z) and f(z − h) ≥ (1 − λ)f(x) + λf(z). Adding the two inequalities, we have

  f(x + h) + f(z − h) ≥ f(x) + f(z)

so that f(x + h) − f(x) ≥ f(z) − f(z − h) = f(y + h) − f(y), as desired. We omit the proof of the converse. □
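A tiny numerical illustration (ours) of decreasing increments for the concave function log:

    import math

    h = 0.5
    for x, y in [(1.0, 2.0), (2.0, 5.0), (5.0, 50.0)]:
        left = math.log(x + h) - math.log(x)     # increment at the smaller point
        right = math.log(y + h) - math.log(y)    # increment at the larger point
        print(x, y, left >= right)               # always True, as (22.4) requires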


The inequality (22.4) does not change if we divide both sides by a h > 0. Hence,

f (x + h) f (x) f (y + h) f (y)
f+0 (x) = lim lim = f+0 (y)
h!0+ h h!0+ h

provided the limits exist. Similarly f 0 (x) f 0 (y), and so f 0 (x) f 0 (y) when the (bilat-
eral) derivative exists. Concave function f thus feature decreasing marginal e¤ects as their
argument increases, and so embody a fundamental economic principle: additional units have
a lower and lower marginal impact on levels (of utility, of production, and so on; we then
talk of decreasing marginal utility, decreasing marginal returns, and so on). It is through
this principle that forms of concavity …rst entered economics.1
The next lemma establishes this property rigorously by showing that unilateral derivatives
exist and are decreasing.2

Proposition 914 Let f : C ⊆ R → R be concave. Then:

(i) the right derivative f'_+(x) and the left derivative f'_-(x) exist at each x ∈ int C;

(ii) the right derivative f'_+ and the left derivative f'_- are both decreasing on int C;

(iii) f'_+(x) ≤ f'_-(x) for each x ∈ int C.

A concave function therefore has remarkable regularity properties: at each interior point of its domain, it is automatically continuous (Theorem 609) and has decreasing unilateral derivatives.
^1 In his famous 1738 essay "Specimen theoriae novae de mensura sortis", Daniel Bernoulli wrote: "Now it is highly probable that any increase in wealth, no matter how insignificant, will always result in an increase in utility which is inversely proportionate to the quantity of goods already possessed." This is where the principle first appeared, and through it Bernoulli justified the use of a logarithmic (so concave) utility function. This magnificent insight of Bernoulli was way ahead of his time (see, for instance, the work of Stigler mentioned in Section 6.2.1).

^2 The interior, int C, of an interval C is an open interval: whether C is [a, b] or [a, b) or (a, b], we always have int C = (a, b).

Proof Since x0 is an interior point, there exists a neighborhood (x0 − ε, x0 + ε) of this point included in C, that is, (x0 − ε, x0 + ε) ⊆ C. Let 0 < a < ε, so that we have [x0 − a, x0 + a] ⊆ C. Let φ be defined by

  φ(h) = [f(x0 + h) − f(x0)]/h   for every h ∈ [−a, a], h ≠ 0

Property (22.1) implies that φ is decreasing, that is,

  h' ≤ h''  ⟹  [f(x0 + h') − f(x0)]/(x0 + h' − x0) ≥ [f(x0 + h'') − f(x0)]/(x0 + h'' − x0)    (22.5)

Indeed, if h' < 0 < h'' it is sufficient to apply (22.1) with w = y = x0, x = x0 + h', and z = x0 + h''. If h' ≤ h'' < 0, it is sufficient to apply (22.2) with y = x0, x = x0 + h', and w = x0 + h''. If 0 < h' ≤ h'', it is sufficient to apply (22.3) with w = x0, y = x0 + h', and z = x0 + h''.

Since φ is decreasing we have φ(a) ≤ φ(h) ≤ φ(−a) for every h, that is, φ is bounded. Therefore, φ is both decreasing and bounded, which implies that the right-hand limit and the left-hand limit of φ at 0 exist and are finite. This proves the existence of the unilateral derivatives. Moreover, the decreasing monotonicity of φ implies φ(h') ≥ φ(h'') for every h' < 0 < h'', so that

  f'_+(x0) = lim_{h→0+} φ(h) ≤ lim_{h→0−} φ(h) = f'_-(x0)

To show the monotonicity, let us consider x, y ∈ int C such that x < y. By (22.4),

  [f(x + h) − f(x)]/h ≥ [f(y + h) − f(y)]/h   for every admissible h > 0

Hence,

  f'_+(x) = lim_{h→0+} [f(x + h) − f(x)]/h ≥ lim_{h→0+} [f(y + h) − f(y)]/h = f'_+(y)

which implies that the right derivative is decreasing. A similar argument holds for the left derivative. □
Clearly, if in addition f is differentiable at x, then f'(x) = f'_+(x) = f'_-(x). In particular:

Corollary 915 If a concave function f : C ⊆ R → R is differentiable on int C, then its derivative function f' is decreasing on int C.

Example 916 (i) The concave function f(x) = −|x| has no derivative at x = 0. Nevertheless, the unilateral derivatives exist at each point of the domain, with

  f'_+(x) = 1 if x < 0,  −1 if x ≥ 0

and

  f'_-(x) = 1 if x ≤ 0,  −1 if x > 0

Therefore f'_+(x) ≤ f'_-(x) for every x ∈ R and both unilateral derivatives are decreasing.

(ii) The concave function

  f(x) = x + 1 if x ≤ −1,  0 if −1 < x < 1,  1 − x if x ≥ 1

has no derivative at x = −1 and at x = 1. Nevertheless, the unilateral derivatives exist at each point of the domain, with

  f'_+(x) = 1 if x < −1,  0 if −1 ≤ x < 1,  −1 if x ≥ 1

and

  f'_-(x) = 1 if x ≤ −1,  0 if −1 < x ≤ 1,  −1 if x > 1

Therefore f'_+(x) ≤ f'_-(x) for every x ∈ R and both unilateral derivatives are decreasing.

(iii) The concave function f(x) = 1 − x² is differentiable on R with f'(x) = −2x. The derivative function is decreasing. N

Proposition 914 says, inter alia, that at interior points x we have f'_+(x) ≤ f'_-(x). The next result, whose proof we omit, says that we actually have f'_+(x) = f'_-(x), and so f is differentiable at x, at all points of C except an at most countable subset (in the previous tripartite example, the set D of points of non-differentiability is, respectively, D = {0}, D = {−1, 1}, and D = ∅).

Theorem 917 A concave function f : C ⊆ R → R is differentiable at all points of C with the exception of an at most countable subset.

22.1.2 Tests of concavity

An important property established in Proposition 914 is the decreasing monotonicity of the unilateral derivatives of concave functions. The next important result shows that for continuous functions this property characterizes concavity.

Theorem 918 Let f : C ⊆ R → R be continuous. Then:

(i) f is concave if and only if the right derivative f'_+ exists and is decreasing on int C;

(ii) f is strictly concave if and only if the right derivative f'_+ exists and is strictly decreasing on int C.

Proof (i) We only prove the "if", since the converse follows from Proposition 914. For simplicity, assume that f is differentiable on the open interval int C. By hypothesis, f' is decreasing on int C. Let x, y ∈ int C, with x < y, and λ ∈ (0, 1). Set z = λx + (1 − λ)y, so that x < z < y. By the Mean Value Theorem, there exist ξ_x ∈ (x, z) and ξ_y ∈ (z, y) such that

  f'(ξ_x) = [f(z) − f(x)]/(z − x),   f'(ξ_y) = [f(y) − f(z)]/(y − z)

Since f' is decreasing, f'(ξ_x) ≥ f'(ξ_y). Hence,

  [f(λx + (1 − λ)y) − f(x)]/[λx + (1 − λ)y − x] ≥ [f(y) − f(λx + (1 − λ)y)]/[y − λx − (1 − λ)y]

Being λx + (1 − λ)y − x = (1 − λ)(y − x) and y − λx − (1 − λ)y = λ(y − x), we then have

  [f(λx + (1 − λ)y) − f(x)]/[(1 − λ)(y − x)] ≥ [f(y) − f(λx + (1 − λ)y)]/[λ(y − x)]

In turn, this easily implies f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y), as desired.^3 (ii) This part is left to the reader. □

A similar result, left to the reader, holds for the other unilateral derivative f'_-. The theorem thus establishes a differential characterization of concavity by showing that it is equivalent to the decreasing monotonicity of unilateral derivatives.

Example 919 Let f : R → R be the continuous function given by f(x) = x − |x³|, that is,

  f(x) = x + x³ if x < 0,  x − x³ if x ≥ 0

The function has unilateral derivatives at each point of the domain, with

  f'_+(x) = 1 + 3x² if x < 0,  1 − 3x² if x ≥ 0

and

  f'_-(x) = 1 + 3x² if x ≤ 0,  1 − 3x² if x > 0

To see that this is the case, let us consider the origin, which is the most delicate point. We have

  f'_+(0) = lim_{h→0+} [f(h) − f(0)]/h = lim_{h→0+} (h − h³)/h = lim_{h→0+} (1 − h²) = 1

and

  f'_-(0) = lim_{h→0−} [f(h) − f(0)]/h = lim_{h→0−} (h + h³)/h = lim_{h→0−} (1 + h²) = 1

Therefore f'_+(x) ≤ f'_-(x) for every x ∈ R and both derivatives are decreasing. By Theorem 918 the function is concave. N
^3 A version of the Mean Value Theorem for unilateral derivatives would prove the result without any differentiability assumption on f.

Unilateral derivatives are key in the previous theorem because concavity per se only ensures their existence, not that of the bilateral derivative. Unilateral derivatives are, however, less easy to handle than the bilateral derivative. In applications differentiability is often assumed. In this case we have the following simple consequence of the previous theorem that, under differentiability, provides a useful test of concavity.

Corollary 920 Let f : C ⊆ R → R be differentiable on int C and continuous on C. Then:

(i) f is concave if and only if the derivative function f' is decreasing on int C;

(ii) f is strictly concave if and only if the derivative function f' is strictly decreasing on int C.

Under differentiability, a necessary and sufficient condition for a function to be (strictly) concave is, thus, that its first derivative be (strictly) decreasing.^4

Proof We prove (i), as (ii) is similar. Let f : C ⊆ R → R be differentiable on int C and continuous on C. If f is concave, Theorem 918 implies that f' = f'_+ is decreasing. Vice versa, if f' = f'_+ is decreasing, then Theorem 918 implies that f is concave. □

Example 921 The functions f, g : R → R given by f(x) = −|x³| and g(x) = −e^{−x} are differentiable on their domain, with

  f'(x) = 3x² if x ≤ 0,  −3x² if x > 0   and   g'(x) = e^{−x}

[Figure: graph of f(x) = −|x³|]

^4 When C is open, the continuity hypothesis becomes superfluous (a similar observation applies to Corollary 922 below).

[Figure: graph of g(x) = −e^{−x}]

The derivatives are strictly decreasing and therefore f and g are strictly concave thanks to Corollary 920. N

This corollary provides a simple differential criterion of concavity that reduces the test of concavity to the often operationally simple test of a property of first derivatives. It is actually possible to do even better by recalling the differential characterization of monotonicity seen in Section 20.4.

Corollary 922 Let f : C ⊆ R → R be twice differentiable on int C and continuous on C. Then:

(i) f is concave if and only if f'' ≤ 0 on int C;

(ii) f is strictly concave if f'' < 0 on int C.

Proof (i) It is sufficient to observe that, thanks to the "decreasing" version of Proposition 835, the first derivative f' is decreasing on int C if and only if f''(x) ≤ 0 for every x ∈ int C. (ii) It follows from the "strictly decreasing" version of Proposition 837. □

Under the further hypothesis that f is twice differentiable on int C, concavity thus becomes equivalent to the negativity of the second derivative, a condition often easier to check than the decreasing monotonicity of the first derivative. In any case, thanks to the last two corollaries we now have powerful differential tests of concavity.^5

Note the asymmetry between points (i) and (ii): while in (i) decreasing monotonicity is a necessary and sufficient condition for concavity, in (ii) strictly decreasing monotonicity is only a sufficient condition for strict concavity. This follows from the analogous asymmetry for monotonicity between Propositions 835 and 837.

^5 Dual results hold for convex functions, with increasing instead of decreasing monotonicity (and f'' ≥ 0 instead of f'' ≤ 0).

Example 923 (i) The functions f(x) = √x and g(x) = log x have, respectively, derivatives f'(x) = 1/(2√x) and g'(x) = 1/x that are strictly decreasing. Therefore, they are strictly concave. The second derivatives f''(x) = −1/(4x^{3/2}) < 0 and g''(x) = −1/x² < 0 confirm this conclusion.

(ii) The function f(x) = x² has derivative f'(x) = 2x that is strictly increasing. Therefore, it is strictly convex. Indeed, f''(x) = 2 > 0.

(iii) The function f(x) = x³ has derivative f'(x) = 3x² that is strictly decreasing on (−∞, 0] and strictly increasing on [0, +∞). Indeed, the second derivative f''(x) = 6x is ≤ 0 on (−∞, 0] and ≥ 0 on [0, +∞). N
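These second-derivative computations can be delegated to a computer algebra system. A small sketch (ours; it assumes sympy is available, though any CAS would do):

    import sympy as sp

    x = sp.symbols('x', positive=True)
    for f in [sp.sqrt(x), sp.log(x), x**2]:
        print(f, sp.diff(f, x, 2))
    # sqrt(x) -> -1/(4*x**(3/2)) < 0 : strictly concave
    # log(x)  -> -1/x**2         < 0 : strictly concave
    # x**2    -> 2               > 0 : strictly convex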

22.1.3 Chords and tangents

Theorem 924 Let f : (a, b) → R be differentiable at x ∈ (a, b). If f is concave, then

  f(y) ≤ f(x) + f'(x)(y − x)   ∀y ∈ (a, b)    (22.6)

Proof Let f be concave and let x and y be two distinct points of (a, b). If λ ∈ (0, 1) we have:

  f(x + (1 − λ)(y − x)) = f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y) = f(x) + (1 − λ)[f(y) − f(x)]

Therefore,

  [f(x + (1 − λ)(y − x)) − f(x)]/(1 − λ) ≥ f(y) − f(x)

Dividing and multiplying the left-hand side by y − x, we get

  {[f(x + (1 − λ)(y − x)) − f(x)]/[(1 − λ)(y − x)]} (y − x) ≥ f(y) − f(x)

This inequality holds for every λ ∈ (0, 1). Hence, thanks to the differentiability of f at x, we have

  lim_{λ→1} {[f(x + (1 − λ)(y − x)) − f(x)]/[(1 − λ)(y − x)]} (y − x) = f'(x)(y − x)

Therefore, f'(x)(y − x) ≥ f(y) − f(x), as desired. □

The right-hand side of inequality (22.6) is the tangent line of f at x, that is, the linear approximation of f that holds, locally, at x. By Theorem 924, this line always lies above the graph of the function: the approximation is in "excess".

Geometrically, this remarkable property is clear: the definition of concavity requires that the straight line through the two points (x, f(x)) and (y, f(y)) lie below the graph of f in the interval between x and y, and therefore lie above it outside that interval.^6 Letting y tend to x, the straight line becomes tangent and lies entirely above the curve.

^6 For completeness, let us prove it. Let z lie outside the interval [x, y]; suppose z > y. We can then write y = λx + (1 − λ)z with λ ∈ (0, 1) and, by the concavity of f, we have f(y) ≥ λf(x) + (1 − λ)f(z), that is, f(z) ≤ (1 − λ)^{−1} f(y) − λ(1 − λ)^{−1} f(x). But, setting μ = 1/(1 − λ) > 1, so that 1 − μ = −λ/(1 − λ) < 0, we have f(z) = f(μy + (1 − μ)x) ≤ μf(y) + (1 − μ)f(x) for every μ > 1. If z < x we reason in a dual way.

[Figure: the tangent line f(x) + f'(x)(y − x) at x lies above the graph of the concave function f]

In the previous theorem we assumed differentiability at a given point x. If we assume it on the entire interval (a, b), inequality (22.6) characterizes concavity.

Theorem 925 Let f : (a, b) → R be differentiable on (a, b). Then, f is concave if and only if

  f(y) ≤ f(x) + f'(x)(y − x)   ∀x, y ∈ (a, b)    (22.7)

For a function f differentiable on (a, b), a necessary and sufficient condition for the concavity of f is thus that the tangent lines at the various points of its domain all lie above its graph.

Proof The "only if" follows from the previous theorem. We prove the "if". Suppose that inequality (22.7) holds and consider the point z = λx + (1 − λ)y. Let us apply (22.7) twice: first with the points x and z, and then with the points y and z:

  f(x) − f(λx + (1 − λ)y) ≤ f'(z)(1 − λ)(x − y)
  f(y) − f(λx + (1 − λ)y) ≤ f'(z)λ(y − x)

Let us multiply the first inequality by λ, and the second one by (1 − λ). Adding up:

  λf(x) + (1 − λ)f(y) − f(λx + (1 − λ)y) ≤ 0

Given the arbitrariness of x and y, we conclude that f is concave. □

22.2 Vector functions

We now present a few differential results for concave functions of several variables. We omit their non-trivial proofs, which the reader will learn in more advanced courses. To ease matters, throughout the section C is a convex and open set in R^n.

A first remarkable differential property of concave functions of several variables is that for them derivability and differentiability are equivalent notions, as in the scalar case.

Proposition 926 Let f : C → R be concave. Given x ∈ C, the function f is differentiable at x if and only if it has partial derivatives at x.

Relative to Theorem 790, here the continuity of the partial derivatives is not required.

A key question in the vector case is: what is the vector counterpart of the decreasing monotonicity of the first derivative? Recall that in the scalar case this property characterizes concavity, as Corollary 920 showed. For vector functions, the derivative function f' becomes the derivative operator ∇f : C → R^n (Section 19.1.1). In the Appendix we present a notion of monotonicity for operators. By applying this notion to the derivative operator, we next extend Corollary 920 to vector functions.

Theorem 927 Suppose the function f : C → R has a derivative on C. Then, f is concave if and only if the derivative operator ∇f : C → R^n is monotone, i.e.,

  (∇f(y) − ∇f(x)) · (y − x) ≤ 0   ∀x, y ∈ C    (22.8)

Moreover, f is strictly concave if the inequality is strict whenever x ≠ y.

A dual result, with the opposite inequality, characterizes convex functions. The next result makes this characterization operational via a negativity condition on the Hessian matrix ∇²f(x) of f (that is, the matrix of second partial derivatives of f) that generalizes the condition f''(x) ≤ 0 of Corollary 922. In other words, the role of the second derivative is played in the general case by the Hessian matrix.

Proposition 928 Let f : C → R be a twice continuously differentiable function on C. Then:

(i) f is concave if and only if ∇²f(x) is negative semi-definite for every x ∈ C;

(ii) f is strictly concave if ∇²f(x) is negative definite for every x ∈ C.

This is the most useful criterion to determine whether a function is concave. Naturally, dual results hold for convex functions, which are characterized by positive semi-definite Hessian matrices.

Example 929 In Example 892 we considered the function f : R³ → R defined by

  f(x1, x2, x3) = x1² + 2x2² + x3² + (x1 + x3)x2

and we saw that its Hessian matrix is positive definite. By (the dual version of) Proposition 928, f is strictly convex. N
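A numerical sketch of this check (ours; numpy assumed available). The Hessian of the quadratic f above is constant, and its eigenvalues settle the definiteness question of Proposition 928:

    import numpy as np

    # Hessian of f(x1,x2,x3) = x1^2 + 2*x2^2 + x3^2 + (x1 + x3)*x2
    H = np.array([[2.0, 1.0, 0.0],
                  [1.0, 4.0, 1.0],
                  [0.0, 1.0, 2.0]])
    print(np.linalg.eigvalsh(H))   # approx [1.27, 2.00, 4.73]: all positive,
    # so H is positive definite and f is strictly convex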

The next result extends Theorems 924 and 925 to vector functions.

Theorem 930 (i) Let f : C → R be differentiable at x ∈ C. If f is concave, then

  f(y) ≤ f(x) + ∇f(x) · (y − x)   ∀y ∈ C    (22.9)

(ii) If f is differentiable on C, then f is concave if and only if (22.9) holds for every x, y ∈ C.

It is easy to see that, for strictly concave functions, the inequality in (22.9) is strict. The right-hand side of (22.9) is the linear approximation of f at x; geometrically, it is the hyperplane tangent to f at x, that is, the vector version of the tangent line. By this theorem, the approximation is from above: the tangent hyperplane always lies above the graph of a concave function. The differential characterizations of concavity discussed in the previous section for scalar functions thus extend nicely to vector functions.

22.3 Sufficiency of the first order condition

Though the first order condition is in general only necessary, in Section 16.5 we saw that the maximizers of concave functions are necessarily global (Theorem 703). We may then expect that for concave functions the first order condition plays a decisive role. Indeed, the results studied in this chapter allow us to show that for concave functions the first order condition is also sufficient. In other words, a stationary point of a concave function is, necessarily, a global maximizer. This is a truly remarkable property of concave functions, and a main reason behind their popularity.

To ease matters, we start by considering a scalar concave function f : (a, b) → R that is differentiable. The inequality (22.7), that is,

  f(y) ≤ f(x) + f'(x)(y − x)   ∀x, y ∈ (a, b)

implies that a point x̂ ∈ (a, b) is a global maximizer if f'(x̂) = 0. Indeed, if x̂ ∈ (a, b) is such that f'(x̂) = 0, the inequality implies

  f(y) ≤ f(x̂) + f'(x̂)(y − x̂) = f(x̂)   ∀y ∈ (a, b)

On the other hand, if x̂ ∈ (a, b) is a maximizer, it follows that f'(x̂) = 0 by Fermat's Theorem. Therefore:

Proposition 931 Let f : (a, b) → R be a concave and differentiable function. A point x̂ ∈ (a, b) is a global maximizer of f on (a, b) if and only if f'(x̂) = 0.

Example 932 (i) Consider the function f : R → R defined by f(x) = −(x + 1)⁴ + 2. We have f''(x) = −12(x + 1)² ≤ 0. The function is concave on R and it is therefore sufficient to find a point where its first derivative is zero in order to find a maximizer. We have f'(x) = −4(x + 1)³, which is zero only at x̂ = −1. The point x̂ = −1 is the unique global maximizer, and the maximum value of f on R is f(−1) = 2.

(ii) Consider the function f : R → R defined by f(x) = x(1 − x). Because f'(1/2) = 0 and f''(x) = −2 < 0, the point x̂ = 1/2 is the unique global maximizer of f on R. N

The result easily extends to functions f : A ⊆ R^n → R of several variables using the vector version (22.9) of the inequality. We have, therefore, the following general result.

Theorem 933 Let f : A ⊆ R^n → R be a concave function defined on a convex set A in R^n and let C be an open and convex subset of A where f has a derivative. A point x̂ of C is a global maximizer of f on C if and only if ∇f(x̂) = 0.

Example 934 Consider the function f : R² → R defined by f(x1, x2) = −(x1 − 1)² − (x2 + 3)² − 6. We have

  ∇²f(x1, x2) = [[−2, 0], [0, −2]]

Since −2 < 0 and det ∇²f(x1, x2) = 4 > 0, the Hessian matrix is negative definite for every (x1, x2) ∈ R² and hence f is strictly concave. We have

  ∇f(x1, x2) = (−2(x1 − 1), −2(x2 + 3))

The unique point where the gradient is zero is (1, −3), which is, therefore, the unique global maximizer. The maximum value of f on R² is f(1, −3) = −6. N
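For such concave problems a generic numerical optimizer must land on the stationary point. A sketch of this check (ours; it assumes scipy is available) on the function of Example 934:

    import numpy as np
    from scipy.optimize import minimize

    # maximize f by minimizing -f(x1,x2) = (x1-1)^2 + (x2+3)^2 + 6
    neg_f = lambda v: (v[0] - 1.0) ** 2 + (v[1] + 3.0) ** 2 + 6.0
    res = minimize(neg_f, x0=np.zeros(2))
    print(res.x, -res.fun)   # approx [1, -3] and -6: the unique global maximum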

22.4 Superdifferentials

Theorem 930 showed that differentiable concave functions satisfy the important inequality^7

  f(y) ≤ f(x) + ∇f(x) · (y − x)   ∀y ∈ C    (22.10)

This inequality has a natural geometric interpretation: the tangent hyperplane (line, in the scalar case) lies above the graph of f, which it touches only at (x, f(x)). Next we show, without proof, that this property actually characterizes the differentiability of concave functions.^8 In other words, this geometric property is peculiar to the tangent hyperplanes of concave functions.

Theorem 935 A concave function f : C → R is differentiable at x ∈ C if and only if there exists a unique vector β ∈ R^n such that

  f(y) ≤ f(x) + β · (y − x)   ∀y ∈ C

In this case, β = ∇f(x).

For concave functions, differentiability is thus equivalent to the existence of a unique vector, the gradient, for which the basic inequality (22.10) holds; equivalently, to the existence of a unique linear functional l : R^n → R such that f(y) ≤ f(x) + l(y − x) for all y ∈ C. Consequently, non-differentiability is equivalent either to the existence of more than one vector for which (22.10) holds or to the non-existence of any such vector. This observation motivates the next definition, where C is any convex (possibly not open) set.

Definition 936 A function f : C → R is superdifferentiable at a point x ∈ C if the set ∂f(x) formed by the vectors β ∈ R^n such that

  f(y) ≤ f(x) + β · (y − x)   ∀y ∈ C    (22.11)

is non-empty. The set ∂f(x) is called the superdifferential of f at x.

^7 Unless otherwise stated, throughout the section C denotes an open and convex set in R^n.

^8 We omit the non-trivial proofs of most of the results of this section, leaving them to more advanced courses.

The superdifferential thus consists of all vectors (and so all linear functions) for which (22.10) holds. There may exist no such vector (Example 943 below); in this case the superdifferential is empty and the function is not superdifferentiable at that point.

To visualize the superdifferential, given a point x ∈ C consider the affine function r : R^n → R defined by:

  r(y) = f(x) + β · (y − x)

with β ∈ ∂f(x). The affine function r is, therefore, such that

  r(x) = f(x)    (22.12)
  r(y) ≥ f(y)   ∀y ∈ C    (22.13)

In words, r is equal to f at the point x and dominates f elsewhere. It follows that ∂f(x) identifies the set of all affine functions that touch the graph of f at x and lie above this graph at all other points of the domain. In the scalar case, affine functions are straight lines:

[Figure: three straight lines r, r', r'' touching the graph of a concave scalar f at x and lying above it, all belonging to ∂f(x)]

It is easy to see that, at points where the function is differentiable, the only straight line satisfying conditions (22.12)-(22.13) is the tangent line f(x) + f'(x)(y − x). But, at points where the function is not differentiable, we may well have several straight lines r : R → R satisfying those conditions, that is, touching the graph of the function at the point x and lying above the graph elsewhere. The superdifferential, being the collection of these straight lines, can thus be viewed as a surrogate of the tangent line, i.e., of the differential. This is the idea behind the superdifferential: it is a surrogate of the differential when the latter does not exist. The next result confirms this intuition.

Proposition 937 A concave function f : C → R is differentiable at x ∈ C if and only if ∂f(x) is a singleton. In this case, ∂f(x) = {∇f(x)}.

In the following example we determine the superdifferential of a simple scalar function.



Example 938 Consider f : R → R defined by f(x) = 1 − |x|. The only point where f is not differentiable is x = 0. By Proposition 937, ∂f(x) = {f'(x)} for each x ≠ 0. It remains to determine ∂f(0). This amounts to finding the scalars β that satisfy the inequality

  1 − |y| ≤ 1 − |0| + β(y − 0)   ∀y ∈ R

i.e., the scalars β such that −|y| ≤ βy for each y ∈ R. If y = 0, this inequality trivially holds for all β. If y ≠ 0, we have

  β (y/|y|) ≥ −1    (22.14)

Since y/|y| = 1 if y > 0 and y/|y| = −1 if y < 0, from (22.14) it follows both β ≥ −1 and −β ≥ −1. That is, β ∈ [−1, 1]. We conclude that ∂f(0) = [−1, 1]. Thus:

  ∂f(x) = {−1} if x > 0,  [−1, 1] if x = 0,  {1} if x < 0

N
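The interval ∂f(0) = [−1, 1] can also be probed by brute force. A sketch (ours) that tests the defining inequality (22.11) on a grid of points y:

    def in_superdifferential(beta, ys):
        # beta is in the superdifferential of f(x) = 1 - |x| at 0 iff
        # f(y) <= f(0) + beta*y, i.e. 1 - |y| <= 1 + beta*y, for all y
        return all(1 - abs(y) <= 1 + beta * y for y in ys)

    ys = [k / 10.0 for k in range(-50, 51)]
    for beta in [-1.5, -1.0, 0.0, 0.3, 1.0, 1.5]:
        print(beta, in_superdifferential(beta, ys))
    # True exactly for beta in [-1, 1], matching Example 938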

We can recast what we found in the example as

  ∂f(x) = {f'(x)} if x ≠ 0,  [f'_+(0), f'_-(0)] if x = 0

Next we show that this is always the case for scalar functions.

Proposition 939 Let f : (a, b) → R be a concave function defined on a, possibly unbounded, interval of the real line. Then,

  ∂f(x) = [f'_+(x), f'_-(x)]   ∀x ∈ (a, b)    (22.15)

In words, the superdifferential of a scalar concave function consists of all coefficients that lie between the right and left derivatives. This makes precise the geometric intuition we gave above for scalar functions.

Proof We only prove that ∂f(x) ⊆ [f'_+(x), f'_-(x)]. Let β ∈ ∂f(x). Given any h ≠ 0, by definition we have f(x + h) ≤ f(x) + βh. If h > 0, we then have

  [f(x + h) − f(x)]/h ≤ [f(x) + βh − f(x)]/h = β

and so f'_+(x) ≤ β. If h < 0, then

  [f(x + h) − f(x)]/h ≥ [f(x) + βh − f(x)]/h = β

and so β ≤ f'_-(x). We conclude that β ∈ [f'_+(x), f'_-(x)], as desired. □

Next we compute the superdifferential of an important vector function.



Example 940 Consider the function f : R^n → R given by f(x) = min_{i=1,...,n} x_i. Let us find ∂f(0), that is, the vectors β ∈ R^n such that β · x ≥ f(x) for all x ∈ R^n. Let β ∈ ∂f(0). From:

  β_i = β · e^i ≥ f(e^i) = 0   ∀i = 1, ..., n
  Σ_{i=1}^n β_i = β · (1, ..., 1) ≥ f(1, ..., 1) = 1
  −Σ_{i=1}^n β_i = β · (−1, ..., −1) ≥ f(−1, ..., −1) = −1

we conclude that Σ_{i=1}^n β_i = 1 and β_i ≥ 0 for each i = 1, ..., n. That is, β belongs to the simplex Δ_{n−1}. Thus, ∂f(0) ⊆ Δ_{n−1}. On the other hand, if β ∈ Δ_{n−1}, then

  β · x ≥ β · (min_{i=1,...,n} x_i, ..., min_{i=1,...,n} x_i) = min_{i=1,...,n} x_i   ∀x ∈ R^n

and so β ∈ ∂f(0). We conclude that ∂f(0) = Δ_{n−1}: the superdifferential at the origin is the simplex. The reader can check that, for every x ∈ R^n,

  ∂f(x) = {β ∈ Δ_{n−1} : β · x = f(x)}

i.e., ∂f(x) consists of the vectors β of the simplex such that β · x = f(x). N

We argued above that the superdifferential is a surrogate of the differential. To be a useful surrogate, however, it must exist often, otherwise it would be of little help.

Proposition 941 If f : C → R is concave, then ∂f(x) is a non-empty compact set for every x ∈ C.

If f is differentiable, this result reduces to point (i) of Theorem 930. The next result generalizes point (ii) of that theorem by showing that concave functions are everywhere superdifferentiable and that, moreover, this property exactly characterizes concave functions (another proof of the tight connection between superdifferentiability and concavity).

Proposition 942 A function f : C → R is concave if and only if ∂f(x) is non-empty for all x ∈ C.

Proof We only prove the "if" part. Suppose ∂f(x) ≠ ∅ at all x ∈ C. Let x1, x2 ∈ C and t ∈ [0, 1]. Let β ∈ ∂f(tx1 + (1 − t)x2). By (22.11),

  f(x1) ≤ f(tx1 + (1 − t)x2) + β · (x1 − (tx1 + (1 − t)x2))
  f(x2) ≤ f(tx1 + (1 − t)x2) + β · (x2 − (tx1 + (1 − t)x2))

that is,

  f(x1) − (1 − t) β · (x1 − x2) ≤ f(tx1 + (1 − t)x2)
  f(x2) − t β · (x2 − x1) ≤ f(tx1 + (1 − t)x2)

Hence,

  f(tx1 + (1 − t)x2) ≥ tf(x1) − t(1 − t) β · (x1 − x2) + (1 − t)f(x2) − (1 − t)t β · (x2 − x1) = tf(x1) + (1 − t)f(x2)

as desired. □

The maintained hypothesis that C is open is key for the last two propositions, as the next example shows.

Example 943 Consider f : [0, +∞) → R defined by f(x) = √x. The only point of the (closed) domain at which the function is not differentiable is the boundary point x = 0. The superdifferential ∂f(0) is given by the scalars β such that

  √y ≤ √0 + β(y − 0)   ∀y ≥ 0    (22.16)

i.e., such that √y ≤ βy for each y ≥ 0. If y = 0, this inequality holds for all β. If y > 0, the inequality is equivalent to β ≥ √y/y = 1/√y. But, letting y tend to 0, this would require β ≥ lim_{y→0+} 1/√y = +∞. Therefore, there is no scalar β for which (22.16) holds. It follows that ∂f(0) = ∅. We conclude that f is not superdifferentiable at the boundary point 0. N

N.B. We focused on open convex sets C to ease matters, but this example shows that non-open domains may be important. Fortunately, the results of this section can be easily extended to such domains. For instance, Proposition 942 can be stated for any convex set C (possibly not open) by saying that a continuous function f : C → R is concave on int C if and only if ∂f(x) is non-empty at all x ∈ int C, i.e., at all interior points x of C.^9 The concave function f(x) = √x is indeed differentiable (and so superdifferentiable, with ∂f(x) = {f'(x)}) at all x ∈ (0, ∞), that is, at all interior points of the function's domain R₊. O

Superdifferentials permit a neat characterization of the (global) maximizers of any function, not necessarily concave.

Theorem 944 Let f : C → R be defined on a convex set C. Then, x̂ ∈ C is a maximizer if and only if f is superdifferentiable at x̂ and 0 ∈ ∂f(x̂).

Proof Let x̂ ∈ C be a maximizer. We have f(x) ≤ f(x̂) + 0 · (x − x̂) for every x ∈ C, and so 0 ∈ ∂f(x̂). Vice versa, let 0 ∈ ∂f(x̂). We have f(x) ≤ f(x̂) + 0 · (x − x̂) for every x ∈ C, that is, f(x) ≤ f(x̂) for each x ∈ C, which implies that x̂ is a maximizer. □

For concave functions this theorem gives the most general version of the first order condition. Indeed, in view of Proposition 937, Theorem 933 is a special case of this result.

^9 If the domain C is not assumed to be open, we need to require continuity (which is otherwise automatically satisfied by Theorem 609).

Corollary 945 Let f : C → R be concave. Then, x̂ ∈ C is a maximizer if and only if 0 ∈ ∂f(x̂).

The next example shows how this corollary makes it possible to find maximizers even when Fermat's Theorem does not apply because there are points where the function is not differentiable.

Example 946 For the function f : R → R defined by f(x) = 1 − |x| we have (Example 938):

  ∂f(x) = {−1} if x > 0,  [−1, 1] if x = 0,  {1} if x < 0

By Corollary 945, x̂ = 0 is a maximizer since 0 ∈ ∂f(0). N

22.5 Appendix: monotonicity of operators

An operator g : C → R^n is said to be monotone (decreasing) if

  (g(x) − g(y)) · (x − y) ≤ 0   ∀x, y ∈ C    (22.17)

and strictly monotone (decreasing) if the inequality (22.17) is strict whenever x ≠ y. The reader can verify that for n = 1 we obtain again the usual notions of monotonicity.

Example 947 Consider an affine function f : R^n → R^n given by f(x) = Ax + b, where A is a symmetric n × n matrix and b ∈ R^n. Then, f is monotone if and only if A is negative semi-definite, and f is strictly monotone if and only if A is negative definite (why?). N
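A numerical illustration of Example 947 (ours; numpy assumed). With A = −I, the inner product in (22.17) equals −‖x − y‖², so it is always non-positive:

    import numpy as np

    A = -np.eye(2)                 # symmetric and negative definite
    b = np.array([1.0, -2.0])
    f = lambda x: A @ x + b

    rng = np.random.default_rng(0)
    for _ in range(5):
        x, y = rng.normal(size=2), rng.normal(size=2)
        print((f(x) - f(y)) @ (x - y) <= 0)   # always True: f is monotone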
Chapter 23

Implicit functions

23.1 The problem

So far we have studied scalar functions f : A ⊆ R → R by writing them in explicit form:

  y = f(x)

This is the usual form that, by keeping the independent variable x separate from the dependent one y, permits determining the values of the latter from those of the former. The same function can be rewritten in implicit form, that is, through an equation that keeps all the variables on the same side of the equality sign:

  g(x, f(x)) = 0

where g is the function of two variables defined by^1

  g(x, y) = f(x) − y

Example 948 (i) The function f(x) = x² + x − 3 can be written in implicit form as g(x, f(x)) = 0 with g(x, y) = x² + x − 3 − y. (ii) The function f(x) = 1 + log x can be written in implicit form as g(x, f(x)) = 0 with g(x, y) = 1 + log x − y. N

Note how

  g^{−1}(0) ∩ (A × Im f) = Gr f

The graph of the function f thus coincides with the level curve g^{−1}(0) at 0 of the function g of two variables.^2

^1 In this section, to ease exposition we denote a function g of two variables by g(x, y) and not by g(x1, x2), as in the rest of the text.

^2 The rectangle A × Im f has as its factors the domain and the image of f. Clearly, Gr f ⊆ A × Im f. For example, for the function f(x) = √x this rectangle is the first orthant R²₊ of the plane, while for the function f(x) = √(x − x²) it is the unit square [0, 1] × [0, 1] of the plane.

The implicit rewriting of a scalar function f whose explicit form is known is little more than a curiosity, because the explicit form contains all the relevant information on f, in particular about the kind of dependence existing between the independent variable x and


the dependent one y. Unfortunately, in applications we often find important scalar functions that are not defined in explicit form, "ready to use", but only in implicit form through equations g(x, y) = 0. For this reason, it is important to consider the inverse problem: does an equation of the type g(x, y) = 0 implicitly define a scalar function f? In other words, does there exist an f such that g(x, f(x)) = 0? This chapter addresses this question by showing that, under suitable regularity conditions, such a function f exists and is unique (locally or globally, as will become clear).

An important preliminary observation: there is a close connection between implicit functions and level curves that permits expressing the properties of level curves in functional terms, a most useful way to describe such properties, widely used in applications (cf. Section 23.2.2 below). Because of its importance, the next lemma makes this connection rigorous. Note that the role the sets A and B play in the lemma is to be, respectively, the domain and codomain of the scalar functions considered. In other words, the lemma considers functions f : A → B that belong to a given space B^A (cf. Section 6.3.2).

Lemma 949 Let A and B be any two sets in R and let g : C ⊆ R² → R with A × B ⊆ C. The scalar function f : A → B is the unique function in B^A with the property

  g(x, f(x)) = 0   ∀x ∈ A    (23.1)

if and only if it is such that

  g^{−1}(0) ∩ (A × B) = Gr f    (23.2)

Note that (23.2) amounts to saying that

  g(x, y) = 0  ⟺  y = f(x)   ∀(x, y) ∈ A × B

Moreover, if C = A × B, then (23.2) simplifies to g^{−1}(0) = Gr f.^3

Proof "Only if". Let (x, y) ∈ Gr f. By definition, (x, y) ∈ A × B and y = f(x), thus g(x, y) = g(x, f(x)) = 0. This implies (x, y) ∈ g^{−1}(0) ∩ (A × B), and so Gr f ⊆ g^{−1}(0) ∩ (A × B). As to the converse inclusion, let (x, y) ∈ g^{−1}(0) ∩ (A × B). We want to show that y = f(x). Suppose not, i.e., y ≠ f(x). Define f̃ : A → R by f̃(x') = f(x') if x' ≠ x and f̃(x) = y. Since g(x, y) = 0, then g(x', f̃(x')) = 0 for every x' ∈ A. Since (x, y) ∈ A × B, we have f̃ ∈ B^A. Being f̃ ≠ f, this contradicts the uniqueness of f. We conclude that (23.2) holds, as desired.

"If". By definition, (x, f(x)) ∈ Gr f for each x ∈ A. By (23.2), we have (x, f(x)) ∈ g^{−1}(0), and so g(x, f(x)) = 0 for each x ∈ A. It remains to prove the uniqueness of f. Let h ∈ B^A satisfy (23.1). By arguing as in the first inclusion of the "only if" part of the proof, we can prove that Gr h ⊆ g^{−1}(0) ∩ (A × B). By (23.2), this yields Gr h ⊆ Gr f. If we consider x ∈ A, then (x, h(x)) ∈ Gr h ⊆ Gr f. Since (x, h(x)) ∈ Gr f, then (x, h(x)) = (x', f(x')) for some x' ∈ A. This implies x = x' and h(x) = f(x') = f(x). Since x was arbitrarily chosen, we conclude that f = h, as desired. □

^3 In this case g^{−1}(0) = {(x, y) ∈ A × B : g(x, y) = 0} and so g^{−1}(0) ∩ (A × B) = g^{−1}(0).

23.2 A local perspective

23.2.1 Implicit Function Theorem

We begin by addressing the question just posed from a local point of view. We focus on a point (x0, y0) that solves the equation g(x, y) = 0, i.e., such that g(x0, y0) = 0 or, equivalently, such that (x0, y0) ∈ g^{−1}(0).

Definition 950 Given g : A ⊆ R² → R, we say that the equation g(x, y) = 0 implicitly defines a scalar function at the point (x0, y0) ∈ g^{−1}(0) if there exist neighborhoods B(x0) and V(y0) for which there is a unique scalar function f : B(x0) → V(y0) such that

  g(x, f(x)) = 0   ∀x ∈ B(x0)    (23.3)

The function f : B(x0) → V(y0) is called implicit and is defined "locally" at the point (x0, y0). The local point of view is particularly suited for differential calculus, as the next famous result, the Implicit Function Theorem, shows.^4 It is the most important result in the study of implicit functions and is widely used in applications.

Theorem 951 (Implicit Function Theorem) Let g : U → R be defined (at least) on an open set U of R² and let g(x0, y0) = 0. If g is continuously differentiable on a neighborhood of (x0, y0), with

  (∂g/∂y)(x0, y0) ≠ 0    (23.4)

then there exist neighborhoods B(x0) and V(y0) and a unique function f : B(x0) → V(y0) such that

  g(x, f(x)) = 0   ∀x ∈ B(x0)    (23.5)

The function f is surjective and continuously differentiable on B(x0), with

  f'(x) = −(∂g/∂x)(x, y) / (∂g/∂y)(x, y)    (23.6)

for every (x, y) ∈ g^{−1}(0) ∩ (B(x0) × V(y0)).

The function f : B(x0) → V(y0) is, therefore, defined implicitly by the equation g(x, y) = 0. Since f is unique and surjective, in view of Lemma 949 the relation (23.5) is equivalent to

  g(x, y) = 0  ⟺  y = f(x)   ∀(x, y) ∈ B(x0) × V(y0)    (23.7)

that is, to

  g^{−1}(0) ∩ (B(x0) × V(y0)) = Gr f    (23.8)

Thus, the level curve g^{−1}(0) can be represented locally by the graph of the implicit function. This is, in the final analysis, the reason why the theorem is so important in applications (as we will see shortly in Section 23.2.2).

^4 This theorem first appeared in lecture notes that Ulisse Dini prepared in the 1870s. For this reason, it is sometimes named after Dini.

Formula (23.6) permits the computation of the first derivative of the implicit function even without knowing its explicit form. Since the first derivative is often what really matters about such a function (because, for example, we are interested in solving a first order condition), this is a most useful feature of the Implicit Function Theorem.

At the point (x0, y0) formula (23.6) takes the form

  f'(x0) = −(∂g/∂x)(x0, y0) / (∂g/∂y)(x0, y0)

Note that the use of formula (23.6) is based on the clause "(x, y) ∈ g^{−1}(0) ∩ (B(x0) × V(y0))", which requires fixing both variables x and y. This is the price to pay in implicit differentiation (in contrast, in explicit differentiation it is sufficient to fix the variable x in order to compute f'(x)). On the other hand, we can rewrite (23.6) as

  f'(x) = −(∂g/∂x)(x, f(x)) / (∂g/∂y)(x, f(x))    (23.9)

for each x ∈ B(x0), thus emphasizing the role played by the implicit function. Formulations (23.6) and (23.9) are both useful, for different reasons; it is better to keep both of them in mind. As we remarked, formulation (23.6) permits computing the first derivative of f even without knowing f itself, thus establishing a useful first order local approximation of f. For this reason, in the examples we will always use (23.6), because the closed form of f will not be available.

The proof of the Implicit Function Theorem is in the Appendix. We can, however, derive formula (23.6) heuristically through the total differential

  dg = (∂g/∂x) dx + (∂g/∂y) dy

of the function g. We have dg = 0 for variations (dx, dy) that keep us along the level curve g^{−1}(0). Therefore,

  (∂g/∂x) dx = −(∂g/∂y) dy

which "implies" (the power of heuristics!):

  dy/dx = −(∂g/∂x)/(∂g/∂y)

It is a rough (and incorrect), but certainly useful, argument in order to remember (23.6).

Example 952 In the trivial case of a linear function g(x, y) = ax + by − k, the equation g(x, y) = 0 becomes ax + by − k = 0. From it we immediately get

  y = f(x) = −(a/b)x + k/b

provided b ≠ 0. Even in this very simple case, the existence of an implicit function requires the condition b = ∂g(x, y)/∂y ≠ 0. N

Example 953 Let g : R² → R be given by g(x, y) = x² − xy³ + y⁵ − 16. Let us check whether the equation g(x, y) = 0 implicitly defines a function at the point (x0, y0) = (4, −2) ∈ g^{−1}(0). The function g is continuously differentiable on R², we have ∂g(x, y)/∂y = −3xy² + 5y⁴ and, therefore,

  (∂g/∂y)(4, −2) = 32 ≠ 0

By the Implicit Function Theorem, there exists a unique continuously differentiable f : B(4) → V(−2) such that

  x² − xf³(x) + f⁵(x) = 16   ∀x ∈ B(4)

Moreover, since ∂g(x, y)/∂x = 2x − y³,

  f'(4) = −(∂g/∂x)(4, −2)/(∂g/∂y)(4, −2) = −[2·4 − (−2)³]/[−3·4·(−2)² + 5·(−2)⁴] = −16/32 = −1/2

In general, at every point (x, y) ∈ g^{−1}(0) ∩ (B(x0) × V(y0)) at which ∂g(x, y)/∂y ≠ 0, we have

  f'(x) = −(2x − y³)/(−3xy² + 5y⁴) = (y³ − 2x)/(5y⁴ − 3xy²)

In particular, the first order local approximation in a neighborhood of x0 is

  f(x) = f(x0) + f'(x0)(x − x0) + o(x − x0) = y0 + [(y0³ − 2x0)/(5y0⁴ − 3x0y0²)](x − x0) + o(x − x0)

for every x ∈ B(x0).^5 N
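Implicit differentiation of this kind is mechanical enough to hand to a computer algebra system. A sketch (ours; it assumes sympy is available, whose idiff performs implicit differentiation of g(x, y) = 0):

    import sympy as sp

    x, y = sp.symbols('x y')
    g = x**2 - x*y**3 + y**5 - 16

    dy_dx = sp.idiff(g, y, x)          # equals -g_x/g_y, as in formula (23.6)
    print(sp.simplify(dy_dx))          # equivalent to (y**3 - 2*x)/(5*y**4 - 3*x*y**2)
    print(dy_dx.subs({x: 4, y: -2}))   # -1/2, the value f'(4) found above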

Sometimes it is possible to find stationary points of the implicit function without knowing its explicit form. When this happens, it is a remarkable application of the Implicit Function Theorem. For instance, consider in the previous example the point (4, 2) ∈ g^{−1}(0). We have (∂g/∂y)(4, 2) = 32 ≠ 0. Let f : B(4) → V(2) be the unique function then defined implicitly at the point (4, 2).^6 We get:

  f'(4) = −(∂g/∂x)(4, 2)/(∂g/∂y)(4, 2) = −0/32 = 0

Therefore, the point x0 = 4 is a stationary point of the implicit function f. It is possible to check that it is actually a local maximizer.
^5 The reader can verify that also (−12, −2) ∈ g^{−1}(0) with (∂g/∂y)(−12, −2) ≠ 0, and calculate f'(−12) for the implicit function defined at (−12, −2).

^6 This function is different from the previous implicit function defined at the other point (4, −2).

Example 954 (i) Consider the function g : R² → R given by g(x, y) = 7x² + 2y − e^y. The function g is continuously differentiable on R², with ∂g(x, y)/∂y = 2 − e^y, which is non-zero whenever y ≠ log 2. Thus, at every point (x0, y0) ∈ g^{−1}(0) with y0 ≠ log 2, the equation g(x, y) = 0 implicitly defines a scalar continuously differentiable function f : B(x0) → V(y0) with

  f'(x) = −(∂g/∂x)/(∂g/∂y) = −14x/(2 − e^y)    (23.10)

for every (x, y) ∈ g^{−1}(0) ∩ (B(x0) × V(y0)). Even if we do not know the form of f, we have been able to find its derivative function f'. The first order local approximation is

  f(x) = f(x0) + f'(x0)(x − x0) + o(x − x0) = y0 − [14x0/(2 − e^{y0})](x − x0) + o(x − x0)

at (x0, y0). For example, at the point (1/√7, 0) ∈ g^{−1}(0) we have, as x → 1/√7,

  f(x) = −2√7 (x − 1/√7) + o(x − 1/√7)

(ii) Let g : R² → R be given by g(x, y) = x³ + 4ye^x + y² + xe^y. If g(x0, y0) = 0 and ∂g(x0, y0)/∂y ≠ 0, thanks to the Implicit Function Theorem the equation g(x, y) = 0 defines at (x0, y0) a unique scalar continuously differentiable function f : B(x0) → V(y0) with

  f'(x) = −(∂g/∂x)/(∂g/∂y) = −(3x² + 4ye^x + e^y)/(4e^x + 2y + xe^y)

for every (x, y) ∈ g^{−1}(0) ∩ (B(x0) × V(y0)). The first order local approximation is

  f(x) = f(x0) + f'(x0)(x − x0) + o(x − x0) = y0 − [(3x0² + 4y0e^{x0} + e^{y0})/(4e^{x0} + 2y0 + x0e^{y0})](x − x0) + o(x − x0)

at (x0, y0). For example, if (x0, y0) = (0, 0) we have ∂g(0, 0)/∂y = 4 ≠ 0, so

  f'(0) = −(∂g/∂x)(0, 0)/(∂g/∂y)(0, 0) = −1/4

and, as x → 0,

  f(x) = y0 + f'(0)x + o(x) = −x/4 + o(x)

N

By exchanging the variables in the Implicit Function Theorem, we can say that the continuity of the partial derivatives of g in a neighborhood of (x0, y0) and the condition ∂g(x0, y0)/∂x ≠ 0 ensure the existence of a (unique) implicit function x = φ(y) such that, locally, we have g(φ(y), y) = 0. It follows that, if at least one of the two partial derivatives

@g (x0 ; y0 ) =@x and @g (x0 ; y0 ) =@y is not zero, there is locally a univocal tie between the two
variables. As a result, the Implicit Function Theorem cannot be applied only when both the
partial derivatives @g (x0 ; y0 ) =@y and @g (x0 ; y0 ) =@x are zero.
For example, if g (x; y) = x2 + y 2 1, then for every point (x0 ; y0 ) that satis…es the
equation g (x; y) = 0 we have @g (x0 ; y0 ) =@y = 2y0 , which is zero only for y0 = 0 (and hence
x0 = 1). At the two points (1; 0) and ( 1; 0) the equation does not de…ne, indeed, any
implicit function of the type y = f (x). But, @g ( 1; 0) =@x = 2 6= 0 and, therefore, in such
points the equation de…nes an implicit function of the type x = ' (y). Symmetrically, at the
two points (0; 1) and (0; 1) the equation de…nes an implicit function of the type y = f (x),
but not one of the type x = ' (y).

This last remark suggests a final important observation on the Implicit Function Theorem. Suppose that, as at the beginning of the chapter, $\varphi$ is a standard function defined in explicit form, which can be written in implicit form as
$$g(x,y) = \varphi(x) - y \qquad (23.11)$$
Given $(x_0,y_0) \in g^{-1}(0)$, suppose $\partial g(x_0,y_0)/\partial x \neq 0$. The Implicit Function Theorem (in "exchanged" form) then ensures the existence of neighborhoods $B(y_0)$ and $V(x_0)$ and of a unique function $f : B(y_0) \to V(x_0)$ such that
$$g(f(y), y) = 0 \qquad \forall y \in B(y_0)$$
that is, by recalling (23.11),
$$\varphi(f(y)) = y \qquad \forall y \in B(y_0)$$
The function $f$ is, therefore, the inverse of $\varphi$ on the neighborhood $B(y_0)$. The Implicit Function Theorem thus implies the existence, locally around the point $y_0$, of the inverse of $\varphi$. In particular, formula (23.6) here becomes
$$f'(y_0) = -\frac{\dfrac{\partial g}{\partial y}(x_0,y_0)}{\dfrac{\partial g}{\partial x}(x_0,y_0)} = \frac{1}{\varphi'(x_0)}$$
which is the classical formula (18.20) for the derivative of the inverse function. In sum, there is a close connection between implicit and inverse functions, which the reader will explore in more advanced courses.
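For a concrete illustration of this connection (a standard instance that we add here, not in the original text), take $\varphi(x) = e^x$, so that $g(x,y) = e^x - y$ and $\partial g(x,y)/\partial x = e^x \neq 0$ everywhere. The implicit function $x = f(y)$ is the logarithm, and the formula above gives
$$f'(y_0) = \frac{1}{\varphi'(x_0)} = \frac{1}{e^{x_0}} = \frac{1}{y_0}$$
which is indeed the derivative of $\log y$ at $y_0$.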

23.2.2 Level curves and marginal rates

Given a function $g : U \subseteq \mathbb{R}^2 \to \mathbb{R}$ and a scalar $k$, the simple transformation $g_k(x,y) = g(x,y) - k$ allows us to bring back the study of the level curve of level $k$
$$g^{-1}(k) = \{(x,y) \in U : g(x,y) = k\}$$
to the curve of level $0$
$$g_k^{-1}(0) = \{(x,y) \in U : g_k(x,y) = 0\}$$

since $g^{-1}(k) = g_k^{-1}(0)$. The Implicit Function Theorem enables us to study locally $g_k^{-1}(0)$, and so $g^{-1}(k)$. In particular, the implicit function $f : B(x_0) \to V(y_0)$ makes it possible to establish a functional representation of the level curve $g^{-1}(k)$ through the fundamental relation
$$g^{-1}(k) \cap (B(x_0) \times V(y_0)) = \operatorname{Gr} f \qquad (23.12)$$
which is the general form of (23.7) for any $k \in \mathbb{R}$. Implicit functions thus describe the link between the variables $x$ and $y$ that belong to the same level curve, making it possible to formulate through them some key properties of these curves. The great effectiveness of this formulation explains the importance of implicit functions, as mentioned right after (23.7).

For example, the isoquant $g^{-1}(k)$ is a level curve of the production function $g : \mathbb{R}^2_+ \to \mathbb{R}$, which features two inputs, $x$ and $y$, and one output. The points $(x,y)$ that belong to the isoquant are all the input combinations that keep constant the quantity of output produced. The implicit function $y = f(x)$ tells us, locally, how the quantity $y$ has to change, when $x$ varies, in order to keep the overall production constant. Therefore, the properties of the function $f : B(x_0) \to V(y_0)$ characterize, locally, the relations between the inputs that guarantee the level $k$ of output. We usually assume that $f$ is:

(i) decreasing, that is, $f'(x) \leq 0$ for every $x \in B(x_0)$: the two inputs are partially substitutable and, to keep the quantity produced unchanged at the level $k$, lower quantities of the input $x$ must be matched by larger quantities of the input $y$ (and vice versa);

(ii) convex, that is, $f''(x) \geq 0$ for every $x \in B(x_0)$: at greater levels of $x$, larger and larger quantities of $y$ are needed to compensate (negative) infinitesimal variations of $x$ and keep production at level $k$.

The absolute value $|f'|$ of the derivative of the implicit function is called marginal rate of transformation because, for infinitesimal variations of the inputs, it describes their degree of substitutability, that is, the variation of $y$ that balances an increase in $x$. Thanks to the functional representation (23.12) of the isoquant, geometrically the marginal rate of transformation can be interpreted as the slope of the isoquant at $(x,y)$. This is the classical interpretation of the rate, which follows from (23.12).

The Implicit Function Theorem implies the classical formula
$$MRT_{x,y} = -f'(x) = \frac{\dfrac{\partial g}{\partial x}(x,y)}{\dfrac{\partial g}{\partial y}(x,y)} \qquad (23.13)$$
This is the usual form in which the notion of marginal rate of transformation $MRT_{x,y}$ appears.$^7$

Example 955 Let $g : \mathbb{R}^2_+ \to \mathbb{R}$ be the Cobb-Douglas production function $g(x,y) = x^\alpha y^{1-\alpha}$, with $0 < \alpha < 1$. The marginal rate of transformation is
$$MRT_{x,y} = \frac{\dfrac{\partial g}{\partial x}(x,y)}{\dfrac{\partial g}{\partial y}(x,y)} = \frac{\alpha x^{\alpha-1} y^{1-\alpha}}{(1-\alpha) x^\alpha y^{-\alpha}} = \frac{\alpha}{1-\alpha} \cdot \frac{y}{x}$$
$^7$ In (23.13) the partial derivatives of $g$ appear directly: they are equal to those of its transformation $g_k$.

For example, at a point at which we use equal quantities of the two inputs, that is, $x = y$, if we increase the first input by one unit, the second one must decrease by $\alpha/(1-\alpha)$ units to leave the quantity of output produced unchanged: in particular, when $\alpha = 1/2$, the decrease of the second one must be of one unit. At a point at which we use a quantity of the second input five times bigger than that of the first one, that is, $y = 5x$, an increase of one unit of the first input is compensated by a decrease of $5\alpha/(1-\alpha)$ units of the second one. N
Similar considerations hold for the level curves of a utility function $u : \mathbb{R}^2_+ \to \mathbb{R}$, that is, for its indifference curves $u^{-1}(k)$. The implicit functions tell us, locally, how the quantity $y$ has to vary when $x$ varies in order to keep the overall utility level constant. For them we assume properties of monotonicity and convexity similar to those assumed for the implicit functions defined by isoquants. The monotonicity of the implicit function reflects the partial substitutability of the two goods: it is possible to consume a bit less of one good and a bit more of the other one and keep the overall level of utility unchanged. The convexity of the implicit function models the classical hypothesis of decreasing rates of substitution: when the quantity of a good, for example $x$, increases, we then need greater and greater "compensative" variations of the other good $y$ in order to remain on the same indifference curve, i.e., in order to have $u(x,y) = u(x + \Delta x, y + \Delta y)$.

The absolute value $|f'|$ of the derivative of the implicit function is called marginal rate of substitution: it measures the (negative) variation in $y$ that balances marginally an increase in $x$. Geometrically, it is the slope of the indifference curve at $(x,y)$. Thanks to the Implicit Function Theorem, we have
$$MRS_{x,y} = -f'(x) = \frac{\dfrac{\partial u}{\partial x}(x,y)}{\dfrac{\partial u}{\partial y}(x,y)}$$
which is the classical form of the marginal rate of substitution.

Let $h$ be a scalar function with a strictly positive derivative, so that it is strictly increasing and $h \circ u$ is then a utility function equivalent to $u$. By the chain rule,
$$\frac{\dfrac{\partial (h \circ u)}{\partial x}(x,y)}{\dfrac{\partial (h \circ u)}{\partial y}(x,y)} = \frac{h'(u(x,y)) \dfrac{\partial u}{\partial x}(x,y)}{h'(u(x,y)) \dfrac{\partial u}{\partial y}(x,y)} = \frac{\dfrac{\partial u}{\partial x}(x,y)}{\dfrac{\partial u}{\partial y}(x,y)} \qquad (23.14)$$
Since we can drop the derivative $h'(u(x,y))$, the marginal rate of substitution is the same for $u$ and for all its increasing transformations $h \circ u$. Thus, the marginal rate of substitution is an ordinal notion, invariant under strictly increasing transformations: it does not depend on which equivalent utility function, $u$ or $h \circ u$, is considered. This explains the centrality of this ordinal notion in consumer theory, where it replaced the notion of marginal utility (which is instead, as already observed, a cardinal notion).
Example 956 To illustrate (23.14), consider on $\mathbb{R}^2_{++}$ the equivalent Cobb-Douglas utility function $u(x,y) = x^a y^{1-a}$ and log-linear utility function $\log u(x,y) = a \log x + (1-a)\log y$. We have
$$MRS_{x,y} = \frac{\dfrac{\partial u}{\partial x}(x,y)}{\dfrac{\partial u}{\partial y}(x,y)} = \frac{a x^{a-1} y^{1-a}}{(1-a) x^a y^{-a}} = \frac{a}{1-a} \cdot \frac{y}{x} = \frac{\dfrac{\partial \log u}{\partial x}(x,y)}{\dfrac{\partial \log u}{\partial y}(x,y)}$$
The two utility functions have the same marginal rate of substitution. N

Finally, let us consider a consumer who consumes in two periods, today and tomorrow, with intertemporal utility function $U : \mathbb{R}^2_+ \to \mathbb{R}$ given by
$$U(c_1, c_2) = u(c_1) + u(c_2)$$
where we assume the same instantaneous utility function $u$ in the two periods. Given a utility level $k$, let
$$U^{-1}(k) = \{(c_1, c_2) \in \mathbb{R}^2_+ : U(c_1, c_2) = k\}$$
be the intertemporal indifference curve and let $(\bar{c}_1, \bar{c}_2)$ be a point of it. When the hypotheses of the Implicit Function Theorem (with the variables exchanged) are satisfied at such a point, there exists an implicit function $f : B(\bar{c}_2) \to V(\bar{c}_1)$ such that
$$U(f(c_2), c_2) = k \qquad \forall c_2 \in B(\bar{c}_2)$$
The scalar function $c_1 = f(c_2)$ tells us how much consumption today $c_1$ has to vary when consumption tomorrow $c_2$ varies, in order to keep the overall utility $U$ constant. We have:
$$f'(c_2) = -\frac{\dfrac{\partial U}{\partial c_2}(c_1, c_2)}{\dfrac{\partial U}{\partial c_1}(c_1, c_2)} = -\frac{u'(c_2)}{u'(c_1)}$$
When it exists,
$$IMRS_{c_1,c_2} = -f'(c_2) = \frac{u'(c_2)}{u'(c_1)} \qquad (23.15)$$
is called intertemporal marginal rate of substitution: it measures the (negative) variation in $c_1$ that balances an increase in $c_2$.

Example 957 Consider the power utility function $u(c) = c^\gamma/\gamma$ for $\gamma > 0$. We have
$$U(c_1, c_2) = \frac{c_1^\gamma}{\gamma} + \frac{c_2^\gamma}{\gamma}$$
so that the intertemporal marginal rate of substitution is $(c_2/c_1)^{\gamma - 1}$. N
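To spell out the computation behind the last claim (a step we make explicit for the reader's convenience): with $u(c) = c^\gamma/\gamma$ we have $u'(c) = c^{\gamma-1}$, so (23.15) gives
$$IMRS_{c_1,c_2} = \frac{u'(c_2)}{u'(c_1)} = \frac{c_2^{\gamma-1}}{c_1^{\gamma-1}} = \left(\frac{c_2}{c_1}\right)^{\gamma-1}$$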

23.2.3 Quadratic expansions

The Implicit Function Theorem says, inter alia, that if the function $g$ is continuously differentiable, then the implicit function $f$ is continuously differentiable as well. The next result shows that this important property holds much more generally.

Theorem 958 If in the Implicit Function Theorem the function $g$ is $n$ times continuously differentiable, then the implicit function $f$ is also $n$ times continuously differentiable. In particular, for $n = 2$ we have
$$f''(x) = -\frac{\dfrac{\partial^2 g}{\partial x^2}\left(\dfrac{\partial g}{\partial y}\right)^2 - 2\dfrac{\partial^2 g}{\partial x \partial y}\dfrac{\partial g}{\partial x}\dfrac{\partial g}{\partial y} + \dfrac{\partial^2 g}{\partial y^2}\left(\dfrac{\partial g}{\partial x}\right)^2}{\left(\dfrac{\partial g}{\partial y}\right)^3} \qquad (23.16)$$
where all the derivatives of $g$ are evaluated at $(x,y) = (x, f(x))$, for every $x \in B(x_0)$.



This expression can be written in a compact way as
$$f''(x) = -\frac{g''_{xx}\, g'^2_y - 2 g''_{xy}\, g'_x g'_y + g''_{yy}\, g'^2_x}{g'^3_y}$$
The numerator is reminiscent of the expansion of a square, which makes it easier to remember.
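A quick numerical cross-check of (23.16) can be reassuring. The Python sketch below is our own addition: it assumes the concrete function $g(x,y) = x^2 + 3xy + y^2 - 1$, whose zero set passes through $(0,1)$, recovers the implicit branch through that point by Newton's method, and compares a finite-difference second derivative with the value predicted by (23.16).

```python
# Numerical sanity check of formula (23.16), assuming the concrete
# function g(x, y) = x^2 + 3xy + y^2 - 1, with g(0, 1) = 0.
def g(x, y):   return x**2 + 3*x*y + y**2 - 1
def gy(x, y):  return 3*x + 2*y          # dg/dy

def f(x, y=1.0):
    # Newton's method in y: recovers the implicit branch through (0, 1)
    for _ in range(60):
        y -= g(x, y) / gy(x, y)
    return y

h = 1e-4
second_diff = (f(h) - 2*f(0.0) + f(-h)) / h**2

# Formula (23.16) at (x, y) = (0, 1): gx = 3, gy = 2, gxx = gyy = 2, gxy = 3
formula = -(2*2**2 - 2*3*3*2 + 2*3**2) / 2**3    # = 10/8 = 1.25

print(second_diff, formula)   # both approximately 1.25
```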

Proof We shall omit the proof of the first part of the statement. Suppose $f$ is twice differentiable and let us apply the chain rule to (23.6), that is, to
$$f'(x) = -\frac{g'_x(x, f(x))}{g'_y(x, f(x))}$$
For the sake of brevity we do not make the dependence of the derivatives of $g$ on $(x, f(x))$ explicit, so we can write
$$f''(x) = -\frac{\left(g''_{xx} + g''_{xy} f'(x)\right) g'_y - g'_x \left(g''_{yx} + g''_{yy} f'(x)\right)}{g'^2_y}$$
Substituting $f'(x) = -g'_x/g'_y$ and using $g''_{xy} = g''_{yx}$, we get
$$f''(x) = -\frac{g''_{xx}\, g'_y - g''_{xy}\, g'_x - g''_{yx}\, g'_x + g''_{yy}\, \dfrac{g'^2_x}{g'_y}}{g'^2_y} = -\frac{g''_{xx}\, g'^2_y - 2 g''_{xy}\, g'_x g'_y + g''_{yy}\, g'^2_x}{g'^3_y}$$
as desired.

What we have seen in the two previous theorems allows us to give local approximations for an implicitly defined function. As we know, one is rarely able to write the explicit formulation of a function implicitly defined by an equation: being able to give approximations is hence of great importance.

If $g$ is of class $C^1$ on an open set $U$, the first order approximation of the implicitly defined function at a point $(x_0, y_0) \in U$ such that $g(x_0, y_0) = 0$ is
$$f(x) = y_0 - \frac{\dfrac{\partial g}{\partial x}(x_0, y_0)}{\dfrac{\partial g}{\partial y}(x_0, y_0)}(x - x_0) + o(x - x_0)$$
as $x \to x_0$.

If $g$ is of class $C^2$ on an open set $U$, the second order approximation (often referred to as quadratic) of the implicit function at a point $(x_0, y_0) \in U$ such that $g(x_0, y_0) = 0$ is, as $x \to x_0$,
$$f(x) = y_0 - \frac{g'_x}{g'_y}(x - x_0) - \frac{g''_{xx}\, g'^2_y - 2 g''_{xy}\, g'_x g'_y + g''_{yy}\, g'^2_x}{2\, g'^3_y}(x - x_0)^2 + o\left((x - x_0)^2\right)$$
where we omitted the dependence of the derivatives on the point $(x_0, y_0)$, and where the quadratic coefficient is $f''(x_0)/2$, as prescribed by Taylor's formula.

Example 959 Given the function in Example 953 we have that
$$f''(x_0) = -\frac{2(3x_0 + 2y_0)^2 - 6(2x_0 + 3y_0)(3x_0 + 2y_0) + 2(2x_0 + 3y_0)^2}{(3x_0 + 2y_0)^3}$$
so that the quadratic approximation of $f$ at a generic point $(x_0, y_0) \in g^{-1}(0)$ is, as $x \to x_0$,
$$f(x) = y_0 - \frac{2x_0 + 3y_0}{3x_0 + 2y_0}(x - x_0) - \frac{2(3x_0 + 2y_0)^2 - 6(2x_0 + 3y_0)(3x_0 + 2y_0) + 2(2x_0 + 3y_0)^2}{2(3x_0 + 2y_0)^3}(x - x_0)^2 + o\left((x - x_0)^2\right)$$
For example, at $(x_0, y_0) = (0, 1) \in g^{-1}(0)$ we have, as $x \to 0$,
$$f(x) = 1 - \frac{3}{2}x + \frac{5}{8}x^2 + o(x^2)$$
Furthermore, knowing the second derivative allows us to complete the analysis of the critical point $(x_0, y_0) = (1/2, 1)$. We have that
$$f''(x_0) = \frac{316}{1331} > 0$$
and so the point is a local minimizer. N

23.2.4 Implicit vector functions

From a mathematical perspective, the variables $x$ and $y$ are symmetric in equation $g(x,y) = 0$: we can try to express $y$ in terms of $x$, so as to have $g(x, f(x)) = 0$, or $x$ in terms of $y$, so as to have $g(f(y), y) = 0$. We have concentrated on the first case for convenience; however, as we often noted, all notions and results carry over symmetrically to the second case.

In this section we shall extend the analysis of implicit functions to the case
$$g(x_1, \ldots, x_n, y) = 0$$
in which the independent variable can be a vector, while the dependent one is still a scalar. In this case $g : A \subseteq \mathbb{R}^{n+1} \to \mathbb{R}$ and the function implicitly defined by equation $g(x,y) = 0$ is a function $f$ of $n$ variables.

Fortunately, the results on implicit functions we outlined for scalar $x$ can be easily extended to the case in which $x$ is a vector. Let us have a look at the vector version of Dini's Theorem. Since $f$ is a function of several variables, the partial derivatives $\partial f(x)/\partial x_k$ replace the derivative $f'(x)$ of the scalar case.

Theorem 960 Let $g : U \to \mathbb{R}$ be defined (at least) on an open set $U$ of $\mathbb{R}^{n+1}$ and let $g(x_0, y_0) = 0$. If $g$ is continuously differentiable on a neighborhood of $(x_0, y_0)$, with
$$\frac{\partial g}{\partial y}(x_0, y_0) \neq 0$$
then there exist neighborhoods $B(x_0) \subseteq \mathbb{R}^n$ and $V(y_0) \subseteq \mathbb{R}$ and a unique function $f : B(x_0) \to V(y_0)$ such that
$$g(x, f(x)) = 0 \qquad \forall x \in B(x_0) \qquad (23.17)$$
The function $f$ is surjective and continuously differentiable on $B(x_0)$, with
$$\frac{\partial f}{\partial x_k}(x) = -\frac{\dfrac{\partial g}{\partial x_k}(x,y)}{\dfrac{\partial g}{\partial y}(x,y)} \qquad (23.18)$$
for every $(x,y) \in g^{-1}(0) \cap (B(x_0) \times V(y_0))$ and every $k = 1, \ldots, n$.

By using the gradient, (23.18) can be written as
$$\nabla f(x) = -\frac{\nabla_x g(x,y)}{\dfrac{\partial g}{\partial y}(x,y)}$$
where $\nabla_x g$ denotes the partial gradient of $g$ with respect to $x_1, x_2, \ldots, x_n$ only. Moreover, $f$ being unique and surjective, also in this more general case (23.17) is equivalent to (23.7) and (23.8).

Example 961 Let $g : \mathbb{R}^3 \to \mathbb{R}$ be defined as $g(x_1, x_2, y) = x_1^2 - x_2^2 + y^3$ and let $(x_1, x_2, y_0) = (6, 3, -3)$. We have $g \in C^1(\mathbb{R}^3)$ and $(\partial g/\partial y)(x,y) = 3y^2$, therefore
$$\frac{\partial g}{\partial y}(6, 3, -3) = 27 \neq 0$$
By Dini's Theorem there exists a unique $y = f(x_1, x_2)$ defined in a neighborhood $B(6,3)$, which is differentiable therein and takes values in a neighborhood $V(-3)$. Since
$$\frac{\partial g}{\partial x_1}(x,y) = 2x_1 \quad \text{and} \quad \frac{\partial g}{\partial x_2}(x,y) = -2x_2$$
we have that
$$\frac{\partial f}{\partial x_1}(x) = -\frac{2x_1}{3y^2} \quad \text{and} \quad \frac{\partial f}{\partial x_2}(x) = \frac{2x_2}{3y^2}$$
In particular
$$\nabla f(6,3) = \left(-\frac{12}{27}, \frac{6}{27}\right) = \left(-\frac{4}{9}, \frac{2}{9}\right)$$
The reader can check that a global implicit function $f : \mathbb{R}^2 \to \mathbb{R}$ exists and, after having recovered its explicit expression (which exists because of the simplicity of $g$), can verify that Dini's formula for $\nabla f(x)$ is correct. N
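Indeed, solving $x_1^2 - x_2^2 + y^3 = 0$ for $y$ gives the explicit global form $f(x_1, x_2) = (x_2^2 - x_1^2)^{1/3}$; where $x_2^2 \neq x_1^2$, writing $y = (x_2^2 - x_1^2)^{1/3}$, its partial derivative with respect to $x_1$ is
$$\frac{\partial f}{\partial x_1}(x) = -\frac{2x_1}{3}\left(x_2^2 - x_1^2\right)^{-2/3} = -\frac{2x_1}{3y^2}$$
in agreement with Dini's formula; the check for $x_2$ is analogous.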

If in the previous theorems we assume that $g$ is of class $C^n$ instead of class $C^1$, the implicitly defined function $f$ is also of class $C^n$. This allows us to recover formulas analogous to (23.16) to compute further derivatives of the implicit function $f$, up to order $n$ included. We omit the details for the sake of brevity.

23.2.5 Implicit operators

A more general case is
$$g(x_1, \ldots, x_n, y_1, \ldots, y_m) = 0$$
in which both the dependent and the independent variables are vectors. Here $g : A \subseteq \mathbb{R}^{n+m} \to \mathbb{R}^m$ is an operator and the equation implicitly defines an operator $f = (f_1, \ldots, f_m)$ between $\mathbb{R}^n$ and $\mathbb{R}^m$ such that
$$g(x_1, \ldots, x_n, f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n)) = 0$$
Written componentwise, with $g = (g_1, \ldots, g_m) : A \subseteq \mathbb{R}^{n+m} \to \mathbb{R}^m$, the equation is the nonlinear system
$$\begin{cases} g_1(x_1, \ldots, x_n, y_1, \ldots, y_m) = 0 \\ g_2(x_1, \ldots, x_n, y_1, \ldots, y_m) = 0 \\ \vdots \\ g_m(x_1, \ldots, x_n, y_1, \ldots, y_m) = 0 \end{cases}$$
and the implicit operator $f = (f_1, \ldots, f_m)$ between $\mathbb{R}^n$ and $\mathbb{R}^m$ satisfies
$$\begin{cases} g_1(x_1, \ldots, x_n, f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n)) = 0 \\ g_2(x_1, \ldots, x_n, f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n)) = 0 \\ \vdots \\ g_m(x_1, \ldots, x_n, f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n)) = 0 \end{cases} \qquad (23.19)$$
In this general case the following square submatrix of the Jacobian matrix of the operator $g$ plays a key role:
$$D_y g(x,y) = \begin{bmatrix} \dfrac{\partial g_1}{\partial y_1}(x,y) & \dfrac{\partial g_1}{\partial y_2}(x,y) & \cdots & \dfrac{\partial g_1}{\partial y_m}(x,y) \\ \dfrac{\partial g_2}{\partial y_1}(x,y) & \dfrac{\partial g_2}{\partial y_2}(x,y) & \cdots & \dfrac{\partial g_2}{\partial y_m}(x,y) \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial g_m}{\partial y_1}(x,y) & \dfrac{\partial g_m}{\partial y_2}(x,y) & \cdots & \dfrac{\partial g_m}{\partial y_m}(x,y) \end{bmatrix}$$

We can now state, without proof, the operator version of the Implicit Function Theorem, which is the most general form that we consider.

Theorem 962 Let $g : U \to \mathbb{R}^m$ be defined (at least) on an open set $U$ of $\mathbb{R}^{n+m}$ and let $g(x_0, y_0) = 0$. If $g$ is continuously differentiable on a neighborhood of $(x_0, y_0)$, with
$$\det D_y g(x_0, y_0) \neq 0 \qquad (23.20)$$
then there exist neighborhoods $B(x_0) \subseteq \mathbb{R}^n$ and $V(y_0) \subseteq \mathbb{R}^m$ and a unique operator $f = (f_1, \ldots, f_m) : B(x_0) \to V(y_0)$ such that (23.19) holds for every $x \in B(x_0)$. The operator $f$ is surjective and continuously differentiable on $B(x_0)$, with
$$Df(x) = -(D_y g(x,y))^{-1} D_x g(x,y) \qquad (23.21)$$
for every $(x,y) \in g^{-1}(0) \cap (B(x_0) \times V(y_0))$.

The Jacobian of the implicit operator is thus pinned down by formula (23.21). To better understand this formula, it is convenient to write it as an equality
$$\underbrace{D_y g(x,y)}_{m \times m}\, \underbrace{Df(x)}_{m \times n} = -\underbrace{D_x g(x,y)}_{m \times n}$$
of two $m \times n$ matrices. In terms of the $(i,j) \in \{1, \ldots, m\} \times \{1, \ldots, n\}$ component of each such matrix, the equality is
$$\sum_{k=1}^m \frac{\partial g_i}{\partial y_k}(x,y)\, \frac{\partial f_k}{\partial x_j}(x) = -\frac{\partial g_i}{\partial x_j}(x,y)$$
For each independent variable $x_j$, we can determine the sought-after $m$-dimensional vector
$$\left(\frac{\partial f_1}{\partial x_j}(x), \ldots, \frac{\partial f_m}{\partial x_j}(x)\right)$$
by solving the following linear system of $m$ equations:
$$\begin{cases} \sum_{k=1}^m \dfrac{\partial g_1}{\partial y_k}(x,y)\, \dfrac{\partial f_k}{\partial x_j}(x) = -\dfrac{\partial g_1}{\partial x_j}(x,y) \\ \sum_{k=1}^m \dfrac{\partial g_2}{\partial y_k}(x,y)\, \dfrac{\partial f_k}{\partial x_j}(x) = -\dfrac{\partial g_2}{\partial x_j}(x,y) \\ \vdots \\ \sum_{k=1}^m \dfrac{\partial g_m}{\partial y_k}(x,y)\, \dfrac{\partial f_k}{\partial x_j}(x) = -\dfrac{\partial g_m}{\partial x_j}(x,y) \end{cases}$$
By doing this for each $j$, we can finally determine the Jacobian $Df(x)$ of the implicit operator.
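As a numerical illustration (our own, built on a hypothetical two-equation system chosen only for this purpose), the Python sketch below forms the two Jacobian blocks of
$g_1(x,y) = x_1 + x_2 + y_1^3 + y_2$ and $g_2(x,y) = x_1 x_2 + y_1 + y_2^3$
at the solution point $x = (0,0)$, $y = (0,0)$, and solves the linear system above column by column instead of inverting $D_y g$.

```python
# Sketch: computing Df(x) from formula (23.21) for the hypothetical system
#   g1(x, y) = x1 + x2 + y1^3 + y2 = 0
#   g2(x, y) = x1*x2 + y1 + y2^3 = 0
# at the solution point x = (0, 0), y = (0, 0).
import numpy as np

x1, x2, y1, y2 = 0.0, 0.0, 0.0, 0.0

# Jacobian blocks evaluated at (x, y)
Dyg = np.array([[3*y1**2, 1.0],
                [1.0, 3*y2**2]])   # m x m block, here invertible
Dxg = np.array([[1.0, 1.0],
                [x2, x1]])         # m x n block

# Solve Dyg @ Df = -Dxg, one column of Df per independent variable x_j
Df = np.linalg.solve(Dyg, -Dxg)
print(Df)                          # approximately [[0, 0], [-1, -1]]
```

Solving the system directly, as done here, is standard numerical practice: forming the inverse $(D_y g)^{-1}$ explicitly is both slower and less accurate.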
Our previous discussion implies, inter alia, that in the special case $m = 1$ formula (23.21) reduces to
$$\frac{\partial g}{\partial y}(x,y)\, \frac{\partial f}{\partial x_j}(x) = -\frac{\partial g}{\partial x_j}(x,y)$$
which is formula (23.18) of the vector version of the Implicit Function Theorem. Since condition (23.20) reduces to $(\partial g/\partial y)(x_0, y_0) \neq 0$, we conclude that the vector version is, indeed, the special case $m = 1$.

23.3 A global perspective

So far we have addressed the motivating question posed in the first section from a local perspective. This local approach, via the Implicit Function Theorem, could take advantage of differential analysis. We now take a global perspective, in which projections become key (Appendix 23.5.1).

Note that
$$g^{-1}(0) \subseteq \pi_1(g^{-1}(0)) \times \pi_2(g^{-1}(0)) \qquad (23.22)$$
So, for $g(x, f(x)) = 0$ to be well posed we need
$$x \in \pi_1(g^{-1}(0)) \quad \text{and} \quad f(x) \in \pi_2(g^{-1}(0))$$
If such an implicit function exists, its domain will be included in $\pi_1(g^{-1}(0))$ and its codomain will be included in $\pi_2(g^{-1}(0))$. This leads us to the following definition.

Definition 963 The equation $g(x,y) = 0$, with $g : A \subseteq \mathbb{R}^2 \to \mathbb{R}$, implicitly defines on the rectangle
$$E = E_1 \times E_2 \subseteq \pi_1(g^{-1}(0)) \times \pi_2(g^{-1}(0)) \qquad (23.23)$$
a scalar function $f : E_1 \to E_2$ if
$$g(x, f(x)) = 0 \qquad \forall x \in E_1$$
If such an $f$ is unique, equation $g(x,y) = 0$ is said to be explicitable on $E$.

The uniqueness of the implicit function $f$ is crucial in applications as it guarantees a univocal relationship between the variables $x$ and $y$. For such a reason, most of the results we shall see deal with equations $g(x,y) = 0$ which implicitly define a unique function $f$.

Surjectivity, that is $\operatorname{Im} f = E_2$, is another relevant property of $f$. Indeed, in light of Lemma 949, in such a case we have that
$$g^{-1}(0) \cap E = \operatorname{Gr} f \qquad (23.24)$$
that is,
$$g(x,y) = 0 \iff y = f(x) \qquad \forall (x,y) \in E$$
In such a significant case, the implicit function $f$ allows us to represent the level curve $g^{-1}(0)$ on $E$ by means of its graph $\operatorname{Gr} f$. In other words, the level curve admits a functional representation. In particular, when $E$ is the rectangle $\pi_1(g^{-1}(0)) \times \pi_2(g^{-1}(0))$, it follows from inclusion (23.22) that (23.24) takes the form
$$g^{-1}(0) = \operatorname{Gr} f$$
The following example illustrates these ideas.

Example 964 Let $g : \mathbb{R}^2 \to \mathbb{R}$ be given by $g(x,y) = x^2 + y^2 - 1$. The level curve
$$g^{-1}(0) = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$$
is the unit circle. Since
$$\pi_1(g^{-1}(0)) \times \pi_2(g^{-1}(0)) = [-1,1] \times [-1,1]$$
we have that
$$E \subseteq [-1,1] \times [-1,1]$$
that is, the possible implicit function takes the form $f : E_1 \to E_2$ with $E_1 \subseteq [-1,1]$ and $E_2 \subseteq [-1,1]$. Let us fix $x \in [-1,1]$ so as to analyze the set
$$S(x) = \{y \in [-1,1] : x^2 + y^2 = 1\}$$
of solutions $y$ to the equation $x^2 + y^2 = 1$. We have that
$$S(x) = \begin{cases} \{0\} & \text{if } x = -1 \\ \left\{-\sqrt{1-x^2}, \sqrt{1-x^2}\right\} & \text{if } -1 < x < 1 \\ \{0\} & \text{if } x = 1 \end{cases}$$

The set has two elements, except for $x = \pm 1$. In other words, for every $-1 < x < 1$ there are two values $y$ for which $g(x,y) = 0$. Let us consider the rectangle made up by the projections, that is,
$$E = [-1,1] \times [-1,1]$$
Any function $f : [-1,1] \to [-1,1]$ such that
$$f(x) \in S(x) \qquad \forall x \in [-1,1]$$
entails that
$$g(x, f(x)) = 0 \qquad \forall x \in [-1,1]$$
and is thus implicitly defined by $g$ on $E$. Such functions are infinitely many; for example, this is the case for the function
$$f(x) = \begin{cases} \sqrt{1-x^2} & \text{if } x \in \mathbb{Q} \cap [-1,1] \\ -\sqrt{1-x^2} & \text{otherwise} \end{cases}$$
as well as for the functions
$$f(x) = \sqrt{1-x^2} \quad \text{and} \quad f(x) = -\sqrt{1-x^2} \qquad \forall x \in [-1,1] \qquad (23.25)$$
Therefore, there are infinitely many functions implicitly defined by $g$ on the rectangle $E = [-1,1] \times [-1,1]$.$^8$ The equation $g(x,y) = 0$ is therefore not explicitable on the rectangle $[-1,1] \times [-1,1]$, which makes this case hardly interesting. Let us consider instead the less ambitious rectangle
$$\tilde{E} = [-1,1] \times [0,1]$$
The function $f : [-1,1] \to [0,1]$ defined as $f(x) = \sqrt{1-x^2}$ is the only function such that
$$g(x, f(x)) = g\left(x, \sqrt{1-x^2}\right) = 0 \qquad \forall x \in [-1,1]$$
The function $f$ is thus the only function implicitly defined by $g$ on the rectangle $\tilde{E}$, and so equation $g(x,y) = 0$ is explicitable on $\tilde{E}$. Moreover, $f$ is surjective, that is $f([-1,1]) = [0,1]$, which implies that
$$g^{-1}(0) \cap \tilde{E} = \operatorname{Gr} f$$
The level curve $g^{-1}(0)$ can be represented on $\tilde{E}$ by means of the graph of $f$.
$^8$ Note that most of them are somewhat irregular; the only continuous ones among them are the two in (23.25).

[Figure: graph of $f(x) = \sqrt{1-x^2}$ for $x \in [-1,1]$.]

Example 965 In a similar fashion, consider the rectangle $\bar{E} = [-1,1] \times [-1,0]$ and define $h : [-1,1] \to [-1,0]$ as $h(x) = -\sqrt{1-x^2}$. We have that
$$g(x, h(x)) = g\left(x, -\sqrt{1-x^2}\right) = 0 \qquad \forall x \in [-1,1]$$
and also, since $h$ is surjective as well, that
$$g^{-1}(0) \cap \bar{E} = \operatorname{Gr} h$$
The scalar function $h$ is the only function implicitly defined by $g$ on the rectangle $\bar{E}$, and the level curve $g^{-1}(0)$ can be represented on $\bar{E}$ by means of its graph. The equation $g(x,y) = 0$ is explicitable on $\bar{E}$.

[Figure: graph of $h(x) = -\sqrt{1-x^2}$ for $x \in [-1,1]$.]

Example 966 To sum up, there are infinitely many implicit functions on the projections rectangle $E$, while uniqueness (and surjectivity) can be obtained when we restrict ourselves to the smaller rectangles $\tilde{E}$ and $\bar{E}$. The study of implicit functions is of interest on these two rectangles, as the unique implicit function defined thereon describes a univocal relationship between the variables $x$ and $y$ which equation $g(x,y) = 0$ implicitly determines. N

O.R. If we draw the graph of the level curve $g^{-1}(0)$, that is, the locus of points satisfying equation $g(x,y) = 0$, one can notice how the rectangle $E$ can be thought of as a sort of "frame" on this graph, isolating a part of it. In some framings the graph is explicitable; in other, less fortunate, ones it is not. By changing the framing we can tell apart different parts of the graph according to their explicitability. H

The last example showed how important it is to study, for each $x \in \pi_1(g^{-1}(0))$, the set of solutions
$$S(x) = \{y \in \pi_2(g^{-1}(0)) : g(x,y) = 0\}$$
The scalar functions $f$ such that $f(x) \in S(x)$ for every $x$ in their domain are the possible implicit functions. In particular, when the rectangle $E$ is such that $S(x) \cap E_2$ is a singleton for each $x \in E_1$, we have a unique implicit function. In other words, this is the case when $E$ is such that, for any fixed $x \in E_1$, there is a unique solution $y$ to equation $g(x,y) = 0$.

Let us see another simple example, warning the reader that these cases, however useful to fix ideas, are very fortunate ones: usually constructing $S(x)$ is far from easy (though local, the Implicit Function Theorem is key in this regard).
Example 967 Let $g : \mathbb{R}^2_+ \to \mathbb{R}$ be given by $g(x,y) = xy - 1$. We have that
$$g^{-1}(0) = \{(x,y) \in \mathbb{R}^2_+ : xy = 1\}$$
Since $\pi_1(g^{-1}(0)) = \pi_2(g^{-1}(0)) = (0, +\infty)$, we have
$$E \subseteq (0, +\infty) \times (0, +\infty) = \mathbb{R}^2_{++}$$
Let us fix $x \in (0, +\infty)$ and let us analyze the set
$$S(x) = \{y \in (0, +\infty) : xy = 1\}$$
We have that
$$S(x) = \left\{\frac{1}{x}\right\} \qquad \forall x \in (0, +\infty)$$
which leads us to consider $E = \mathbb{R}^2_{++}$ and $f : (0, +\infty) \to (0, +\infty)$ given by $f(x) = 1/x$. We have that
$$g(x, f(x)) = g\left(x, \frac{1}{x}\right) = 0 \qquad \forall x \in (0, +\infty)$$
and $f$ is the only function implicitly defined by $g$ on $\mathbb{R}^2_{++}$. Moreover, since $f$ is surjective, we have that
$$g^{-1}(0) \cap \mathbb{R}^2_{++} = \operatorname{Gr} f$$
The level curve $g^{-1}(0)$ can be represented on $\mathbb{R}^2_{++}$ as the graph of $f$. N

Example 968 Let $g : (\mathbb{R} \setminus \{0\}) \times \mathbb{R} \to \mathbb{R}$ be defined for each $x \neq 0$ as
$$g(x,y) = \begin{cases} \dfrac{y}{x} - 1 & \text{if } x, y \in \mathbb{Q} \\ -\dfrac{y}{x} - 1 & \text{otherwise} \end{cases}$$
There is a unique implicit function $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ on $(\mathbb{R} \setminus \{0\}) \times \mathbb{R}$, given by
$$f(x) = \begin{cases} x & \text{if } 0 \neq x \in \mathbb{Q} \\ -x & \text{if } x \notin \mathbb{Q} \end{cases}$$
as the reader can check. N
When writing $g(x,y) = 0$, the variables $x$ and $y$ have symmetric roles, so that we can think of a relationship of type $y = f(x)$ or of type $x = \varphi(y)$ indifferently. In what follows, we will always consider a function $y = f(x)$, as the case $x = \varphi(y)$ can be easily recovered by conducting an analysis parallel to the one we conduct here.

We shall soon present the main results regarding existence and uniqueness of an implicit function. These results are vastly used in economic theory, which often deals with equations such as $g(x,y) = 0$ for which the possibility of the existence of a univocal relationship (and hence the nature of such relationship) between the variables is of paramount interest.

The reader should be aware of the fact that an explicit form can rarely be found for implicitly defined functions. This is possible in the simplest cases only, for example whenever $g$ is linear; normally one can guarantee the existence of a function implicitly defined by an equation without being able to find its explicit formulation. We shall see that, even when the explicit form is not available, one can compute, for example, its derivative. This allows us to use Taylor's formula in order to give a local approximation of the implicit function, even when its analytical expression cannot be given.

23.3.1 Implicit functions and comparative statics

The analysis of functions implicitly defined by equations of the form
$$g(x,y) = 0 \qquad (23.26)$$
occurs in economics in at least two settings:

(i) equilibrium analysis, where equation (23.26) derives from an equilibrium condition, in which $y$ is an equilibrium (endogenous) variable and $x$ is an (exogenous) parameter;

(ii) optimization problems, where equation (23.26) comes from a first order condition, in which $y$ is a choice variable and $x$ is a parameter.

The analysis of the relationship between $x$ and $y$, that is, between the values of the parameter and the resulting choice or equilibrium variable, is referred to as comparative statics and consists in studying the function $f$ implicitly defined by (23.26). The uniqueness of such an implicit function, and hence the explicitability of equation (23.26), is essential in order to conduct comparative statics.

The following two subsections present the aforementioned comparative statics problems.

Equilibrium comparative statics Consider the market of a given good, as seen in Chapter 12. Let $D : [0,b] \to \mathbb{R}$ and $S : [0,b] \to \mathbb{R}$ be the demand and supply functions, respectively. A pair $(p,q) \in \mathbb{R}^2_+$ of prices and quantities is said to be a market equilibrium if
$$q = D(p) = S(p) \qquad (23.27)$$
In particular, having found the equilibrium price $\hat{p}$ by solving the equation $D(p) = S(p)$, the equilibrium quantity is $\hat{q} = D(\hat{p}) = S(\hat{p})$.

Suppose that the demand for the good (also) depends on an exogenous variable $\tau \geq 0$. For example, $\tau$ may be the level of indirect taxation which influences the demanded quantity. The demand thus takes the form $D(\tau, p)$ and it is a function $D : \mathbb{R}_+ \times [0,b] \to \mathbb{R}$, that is, it depends on both the market price $p$ and the value $\tau$ of the exogenous variable. The equilibrium condition (23.27) now becomes
$$q = D(\tau, p) = S(p) \qquad (23.28)$$
and the equilibrium price $\hat{p}$ varies as $\tau$ changes. What is the relationship between taxation levels and equilibrium prices? Which properties does such a relationship have?

Answering these questions, which are simple but crucial from an economic perspective, is equivalent to asking: (i) whether there exists a (unique) function $p = f(\tau)$ which connects taxation and equilibrium prices, that is, the exogenous and the endogenous variable of this simple market model, and (ii) which properties such a function has.

In order to deal with this problem, we introduce the function $g : \mathbb{R}_+ \times [0,b] \to \mathbb{R}$ given by $g(\tau, p) = S(p) - D(\tau, p)$, so that the equilibrium condition (23.28) can be written as
$$g(\tau, p) = 0$$
In particular,
$$g^{-1}(0) = \{(\tau, p) \in \mathbb{R}_+ \times [0,b] : g(\tau, p) = 0\}$$
is the set of all pairs of taxation levels/equilibrium prices, that is, of exogenous/endogenous variables.

The two questions asked above are now equivalent to asking whether:

(i) a (unique) function $f$ such that $g(\tau, f(\tau)) = 0$ exists;

(ii) if so, which are the properties of such a function $f$: for example, whether it is decreasing, so that higher indirect taxes correspond to lower equilibrium prices.

Problems like these, where the relationship between endogenous and exogenous variables is studied, and in particular how changes in the latter impact the former, are of central importance in economic theory and in its empirical tests.

To fix ideas, let us examine the simple linear case, where everything is straightforward.

Example 969 Consider the linear demand and supply functions
$$D(\tau, p) = \alpha - \beta(p + \tau), \qquad S(p) = a + bp$$
where $\beta > 0$ and $b > 0$. We have that
$$g(\tau, p) = a + bp - \alpha + \beta(p + \tau)$$
so that the function $f : \mathbb{R}_+ \to \mathbb{R}$ given by
$$f(\tau) = \frac{\alpha - a}{b + \beta} - \frac{\beta}{\beta + b}\tau \qquad (23.29)$$
clearly satisfies (23.28). The equation $g(\tau, p) = 0$ thus implicitly defines (and in this case also explicitly) the function $f$ given by (23.29). Its properties are obvious: for example, it is strictly decreasing, so that changes in the taxation level bring about opposite changes in equilibrium prices.

Regarding the equilibrium quantity $\hat{q}$, for every $\tau$ it is
$$\hat{q} = D(\tau, f(\tau)) = S(f(\tau))$$
In other words, we have a function $\psi : \mathbb{R}_+ \to \mathbb{R}$, equivalently defined by $\psi(\tau) = D(\tau, f(\tau))$ or by $\psi(\tau) = S(f(\tau))$, such that $\psi(\tau)$ is the equilibrium quantity corresponding to the taxation level $\tau$. By using $\psi(\tau) = S(f(\tau))$ for the sake of convenience, from (23.29) we get that
$$\psi(\tau) = a + \frac{b(\alpha - a)}{b + \beta} - \frac{b\beta}{\beta + b}\tau$$
It is a strictly decreasing function, so that changes in the taxation level bring about opposite changes in the equilibrium quantities as well. N
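For the record, (23.29) follows by simply solving the linear equilibrium condition for $p$, a step we make explicit here:
$$a + bp - \alpha + \beta(p + \tau) = 0 \iff p(b + \beta) = \alpha - a - \beta\tau \iff p = \frac{\alpha - a}{b + \beta} - \frac{\beta}{b + \beta}\tau$$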

Optimum comparative statics Let us consider the profit function
$$\pi(y) = py - c(y) \qquad \forall y \in [0, +\infty)$$
of a firm in perfect competition with cost function $c : [0, +\infty) \to \mathbb{R}$, which we suppose to be differentiable. As seen in Section 16.1.3, the firm's optimization problem is
$$\max_y \pi(y) \quad \text{sub} \quad y \in \mathbb{R}_+ \qquad (23.30)$$
If, as one would expect, we assume there to be at least one production level $y > 0$ such that $\pi(y) > 0$, the level $y = 0$ is not optimal, so that problem (23.30) becomes
$$\max_y \pi(y) \quad \text{sub} \quad y \in (0, +\infty) \qquad (23.31)$$
Since the set $(0, +\infty)$ is open, by Fermat's Theorem a necessary condition for $y > 0$ to be optimal is that it satisfies the first order condition
$$\pi'(y) = p - c'(y) = 0 \qquad (23.32)$$
The most crucial aspect of the producer's problem is to assess how the optimal production varies as the market price changes, as this determines the producer's behavior in the market for good $y$. Such a relevant relationship between prices and quantities is expressed by the scalar function $f$ such that
$$p - c'(f(p)) = 0 \qquad \forall p \in [0, +\infty)$$
that is, by the function implicitly defined by the first order condition (23.32). The function $f$ is referred to as the producer's (individual) supply function and, for each price level $p$, it gives the optimal quantity $y = f(p)$. Its existence and properties (for example, whether it is increasing, that is, whether higher prices lead to larger produced quantities, hence larger supplied quantities in the market) are of central importance in studying the market for good $y$. In particular, the sum of the supply functions of all producers of the good who are present in the market constitutes the market supply function $S(p)$ which we saw in Chapter 12.

In order to formalize the derivation of the supply function from the optimization problem (23.31), we define the function $g : [0, +\infty) \times (0, +\infty) \to \mathbb{R}$ given by
$$g(p,y) = p - c'(y)$$
The first order condition (23.32) can be rewritten as
$$g(p,y) = 0$$
which describes the producer's optimal price/quantity pairs. If there exists a function $y = f(p)$ such that $g(p, f(p)) = 0$, it is nothing but the supply function itself. Its properties (monotonicity in particular) are essential for studying the good's market. Let us see a simple example where the function $f$ and its properties can be recovered with simple computations.

Example 970 Let us consider quadratic costs: $c(y) = y^2$ for $y \geq 0$. In such a case $g(p,y) = p - 2y$, so that the only function $f : [0, +\infty) \to [0, +\infty)$ implicitly defined by $g$ on $\mathbb{R}^2_+$ is $f(p) = p/2$. In particular, $f$ is strictly increasing, so that higher prices entail a higher production, and hence a larger supply. N

23.3.2 Existence and uniqueness

The first important problem one faces when analyzing implicit functions is that of determining which conditions on the function $g$ guarantee that equation $g(x,y) = 0$ is solvable, that is, that it defines a unique implicit function. For the problem to be well posed it is necessary that
$$0 \in \operatorname{Im} g \qquad (23.33)$$
that is, that at least one solution $(x_0, y_0)$ to equation $g(x,y) = 0$ exists. If it were not so, the problem would be meaningless and, for this reason, we shall assume that the non-triviality condition (23.33) holds. A very powerful tool to check it is Bolzano's Theorem: if $g$ is continuous and there exist points $(x', y')$ and $(x'', y'')$ such that $g(x'', y'') < 0 < g(x', y')$, we can conclude that there exists $(x_0, y_0)$ such that $g(x_0, y_0) = 0$, that is, $0 \in \operatorname{Im} g$.

Having said this, let us focus on the following important result. It shows that strict monotonicity in $y$ is a sufficient condition for $g(x,y) = 0$ to define a unique implicit function.$^9$
$^9$ A function is strictly monotone if it is strictly increasing or strictly decreasing.

Theorem 971 Let $g : A \subseteq \mathbb{R}^2 \to \mathbb{R}$ be such that $0 \in \operatorname{Im} g$. If $g$ is strictly monotone in $y$, equation $g(x,y) = 0$ defines one and only one implicit function $f : \pi_1(g^{-1}(0)) \to \pi_2(g^{-1}(0))$ on the rectangle $\pi_1(g^{-1}(0)) \times \pi_2(g^{-1}(0))$.

Proof It is enough to show that, for every $(x,y), (x,y') \in A$,
$$g(x,y) = g(x,y') \implies y = y' \qquad (23.34)$$
In such a case, for every $x \in \pi_1(g^{-1}(0))$ there is necessarily a unique value of $y$ with $g(x,y) = 0$.

Let $g$ be strictly monotone in $y$, for example strictly increasing (the decreasing case is analogous), and let $(x,y), (x,y') \in A$ be such that $g(x,y) = g(x,y')$. Suppose that $y \neq y'$, for example $y > y'$. Then strict monotonicity in $y$ implies that $g(x,y) > g(x,y')$, which contradicts $g(x,y) = g(x,y')$. Hence, $y = y'$ and (23.34) holds.

In the case of differentiable functions, monotonicity can be easily checked by looking at the sign of the derivative. Indeed, condition
$$\frac{\partial g(x,y)}{\partial y} \cdot \frac{\partial g(x',y')}{\partial y} > 0 \qquad \forall (x,y), (x',y') \in A \qquad (23.35)$$
implies the strict monotonicity required by the theorem.$^{10}$

Example 972 The equation
$$g(x,y) = 7x^2 - 2y - e^y = 0$$
implicitly defines a function on the whole $\mathbb{R}^2$. Indeed, $g$ is differentiable with
$$\frac{\partial g(x,y)}{\partial y} = -2 - e^y < 0 \qquad \forall y \in \mathbb{R}$$
Therefore, $g$ is strictly decreasing with respect to $y$. Moreover,
$$\lim_{y \to -\infty} g(x,y) = +\infty, \qquad \lim_{y \to +\infty} g(x,y) = -\infty \qquad \forall x \in \mathbb{R}$$
and so, by Bolzano's Theorem, $0 \in \operatorname{Im} g$. By Theorem 971, there is one and only one implicit function $f : \mathbb{R} \to \mathbb{R}$ such that, for every $x \in \mathbb{R}$,
$$g(x, f(x)) = 7x^2 - 2f(x) - e^{f(x)} = 0$$
Note that we are not able to effectively write $y$ as an explicit function of $x$, that is, we are not able to provide the explicit form of $f$. N
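Even so, $f$ can be evaluated pointwise to any desired accuracy. The Python sketch below is our own addition: it solves $7x^2 - 2y - e^y = 0$ for $y$ by Newton's method (the derivative $-2 - e^y$ is never zero, so the iteration is well defined) and compares the finite-difference slope of the computed $f$ with the value $f'(x) = 14x/(2 + e^{f(x)})$ that the Implicit Function Theorem prescribes.

```python
# A numerical sketch: although f has no explicit form, we can evaluate it
# pointwise by solving 7x^2 - 2y - e^y = 0 for y, then compare the
# finite-difference slope with Dini's formula f'(x) = 14x / (2 + e^f(x)).
import math

def f(x, y=0.0):
    for _ in range(60):                 # Newton's method in y
        gy = -2.0 - math.exp(y)         # dg/dy, always nonzero
        y -= (7*x**2 - 2*y - math.exp(y)) / gy
    return y

x = 1.0
h = 1e-6
dini = 14*x / (2.0 + math.exp(f(x)))
finite_diff = (f(x + h) - f(x - h)) / (2*h)
print(dini, finite_diff)                # essentially equal
```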

If $g$ is strictly monotone in $y$ only on a subset $D$ of $A$, we can consider the restriction of $g$ to $D$ and apply Theorem 971 to it.
$^{10}$ This condition is equivalent to having $\partial g(x,y)/\partial y > 0$ for every $(x,y) \in A$ or $\partial g(x,y)/\partial y < 0$ for every $(x,y) \in A$: the sign of the partial derivative of $g$ in $y$ must be constant, be it positive or negative.

Example 973 Consider the equation $g(x,y) = 0$ with $g : \mathbb{R}^2 \to \mathbb{R}$ given by $g(x,y) = x^2 + 3xy + y^2$. We have $\partial g/\partial y = 3x + 2y$, and so $\partial g/\partial y > 0$ if and only if $y > -(3/2)x$. By setting
$$D = \left\{(x,y) \in \mathbb{R}^2 : y > -\frac{3}{2}x\right\}$$
let $\tilde{g}$ be the restriction of $g$ to $D$. It is strictly increasing in $y$ and, as the reader can check, $0 \in \operatorname{Im}\tilde{g}$ and
$$\pi_1(\tilde{g}^{-1}(0)) = \pi_2(\tilde{g}^{-1}(0)) = \mathbb{R} \setminus \{0\}$$
By Theorem 971, $\tilde{g}$ defines a unique implicit function $f : \mathbb{R} \setminus \{0\} \to \mathbb{R} \setminus \{0\}$ on the rectangle $(\mathbb{R} \setminus \{0\}) \times (\mathbb{R} \setminus \{0\})$. Since, for every $x \in \mathbb{R} \setminus \{0\}$,
$$\tilde{S}(x) = \{y \in \mathbb{R} \setminus \{0\} : \tilde{g}(x,y) = 0\} = \begin{cases} \left\{\dfrac{\sqrt{5} - 3}{2}\,x\right\} & \text{if } x > 0 \\ \left\{-\dfrac{\sqrt{5} + 3}{2}\,x\right\} & \text{if } x < 0 \end{cases}$$
such a function is
$$f(x) = \begin{cases} \dfrac{\sqrt{5} - 3}{2}\,x & \text{if } x > 0 \\ -\dfrac{\sqrt{5} + 3}{2}\,x & \text{if } x < 0 \end{cases}$$
In a similar fashion, since $\partial g/\partial y < 0$ if and only if $y < -(3/2)x$, we can show that also the restriction of $g$ to $\{(x,y) \in \mathbb{R}^2 : y < -(3/2)x\}$ defines a unique implicit function (we leave the details to the reader). N
Example 974 Let us consider the function $g : \mathbb{R}^2 \to \mathbb{R}$ given by $g(x,y) = y^2 - x^2 - 1$. The function $g$ is strictly increasing in $y \geq 0$, that is, on $D = \mathbb{R} \times \mathbb{R}_+$. Consider the restriction $\tilde{g}$ of $g$ to $D$. We have $0 \in \operatorname{Im}\tilde{g}$, as well as
$$\pi_1(\tilde{g}^{-1}(0)) = \mathbb{R} \quad \text{and} \quad \pi_2(\tilde{g}^{-1}(0)) = [1, +\infty)$$
By Theorem 971, $\tilde{g}$ defines a unique implicit function $f : \mathbb{R} \to [1, +\infty)$ on the rectangle $\mathbb{R} \times [1, +\infty)$. In particular, it can be easily seen that such a function is given by $f(x) = \sqrt{x^2 + 1}$. Since $f$ is surjective, we have that $\tilde{g}^{-1}(0) = \operatorname{Gr} f$, that is,
$$y^2 - x^2 - 1 = 0 \iff y = \sqrt{1 + x^2} \qquad \forall (x,y) \in D$$
Finally, notice that $g$ is strictly decreasing in $y$ on $\mathbb{R} \times \mathbb{R}_-$. By setting $D' = \mathbb{R} \times \mathbb{R}_-$, the restriction of $g$ to $D'$ yields a (different) implicit function, whose explicit expression $y = -\sqrt{1 + x^2}$ can be given. N
Let us conclude by observing that the strict monotonicity assumption of Theorem 971 is a sufficient, yet not necessary, condition for the existence and uniqueness of the implicit function. In Example 968 the function $g$ is not monotone with respect to $y$; nevertheless, equation $g(x,y) = 0$ defines a unique implicit function.

After all, the careful reader might have noticed that in the proof of Theorem 971 we only used the injectivity of $g$ with respect to $y$ (which is obviously guaranteed by strict monotonicity). It can be easily seen that the injectivity of $g$ in $y$ (for every $x$) is the necessary and sufficient condition for the uniqueness of the implicit function defined by the equation $g(x,y) = 0$. Strict monotonicity is the simplest and most convenient sufficient condition for injectivity.

23.3.3 Properties of implicit functions

The following result lists some notable properties of implicit functions: in short, the monotonicity and convexity of $g$ are passed on, although reversed, to the implicit function $y = f(x)$ defined by equation $g(x,y) = 0$.

Proposition 975 Let $g : A \subseteq \mathbb{R}^2 \to \mathbb{R}$ be strictly increasing in $y$, with $0 \in \operatorname{Im} g$. The function $f : \pi_1(g^{-1}(0)) \to \pi_2(g^{-1}(0))$ defined implicitly by $g(x,y) = 0$ on the rectangle $\pi_1(g^{-1}(0)) \times \pi_2(g^{-1}(0))$ is:

(i) strictly decreasing if $g$ is strictly increasing in $x$;

(ii) convex if $g$ is quasi concave;

(iii) strictly convex if $g$ is strictly quasi concave;

(iv) continuous if $g$ is continuous and $A$ is open.

This result also holds when in point (i) "decreasing" and "increasing" are reversed, and also when, in points (ii) and (iii), the roles of concavity and convexity are reversed.$^{11}$

$^{11}$ That is, $f$ is strictly increasing if $g$ is strictly decreasing in $x$, and is (strictly) concave if $g$ is (strictly) quasi convex. In this regard, note that in points (ii) and (iii) we tacitly assumed that the domain $A$ and the projection $\pi_1(g^{-1}(0))$ are convex sets, otherwise speaking of the quasi concavity of $g$ and of the convexity of $f$ would be meaningless.

The following lemma shows that assuming in point (i) that $g$ is strictly increasing in $x$ as well as in $y$ is equivalent to directly assuming that $g$ is strictly increasing on $A$.

Lemma 976 A function $g : A \subseteq \mathbb{R}^2 \to \mathbb{R}$ is strictly increasing if and only if it is strictly increasing in both $x$ and $y$.

Proof Let us only show the "if" part, as the converse is trivial. Hence, let $g : A \subseteq \mathbb{R}^2 \to \mathbb{R}$ be strictly increasing both in $x$ and in $y$, and let $(x,y) > (x',y')$. Our aim is to show that $g(x,y) > g(x',y')$. If $x = x'$ or $y = y'$, the result follows from strict monotonicity in the single remaining variable. Hence, let $x > x'$ and $y > y'$. We have that
$$(x,y) > (x',y) > (x',y')$$
and so $g(x,y) > g(x',y) > g(x',y')$, which implies that $g(x,y) > g(x',y')$.

Proof of Proposition 975 By Theorem 971 there exists a unique implicit function $f : \pi_1(g^{-1}(0)) \to \pi_2(g^{-1}(0))$.

(i) Since $g$ is strictly increasing both in $x$ and in $y$, by Lemma 976 it is strictly increasing. Let us show that $f$ is strictly decreasing. Take $x, x' \in \pi_1(g^{-1}(0))$ with $x > x'$. Suppose, by contradiction, that $f(x) \geq f(x')$. This implies that $(x, f(x)) > (x', f(x'))$ and so $g(x, f(x)) > g(x', f(x'))$, which contradicts $g(x, f(x)) = g(x', f(x')) = 0$.

(ii) Let $g$ be quasi concave. Let us show that $f$ is convex. Let $x, x' \in \pi_1(g^{-1}(0))$ and $\lambda \in [0,1]$. From $g(x, f(x)) = g(x', f(x')) = 0$ it follows that
$$g(\lambda x + (1-\lambda)x', \lambda f(x) + (1-\lambda)f(x')) \geq g(x, f(x)) = g(\lambda x + (1-\lambda)x', f(\lambda x + (1-\lambda)x'))$$
Hence, $\lambda f(x) + (1-\lambda)f(x') \geq f(\lambda x + (1-\lambda)x')$ as $g$ is strictly increasing in $y$. A similar reasoning can be used to show (iii).
(iv) In order to show continuity, suppose that $g$ is strictly increasing with respect to $y$; if it were strictly decreasing, an analogous line of reasoning would apply. Consider a point $x_0$ and the corresponding value $y_0 = f(x_0)$. Since $A$ is open, the point $(x_0, y_0)$ is an interior point. Hence, there exists $\varepsilon > 0$ such that $B_\varepsilon(x_0, y_0) \subseteq A$. Take $0 < \tilde{\varepsilon} < \varepsilon$. Since $g(x_0, y_0) = 0$ and $g$ is strictly increasing in $y$, it must hold that $g(x_0, y_0 - \tilde{\varepsilon}) < 0 < g(x_0, y_0 + \tilde{\varepsilon})$. By the continuity of $g$ there are two neighborhoods $U'(x_0)$ and $U''(x_0)$ of $x_0$ on which the sign of $g$ does not change as $x$ changes:
$$g(x, y_0 - \tilde{\varepsilon}) < 0 \text{ for each } x \in U'(x_0) \quad \text{and} \quad g(x, y_0 + \tilde{\varepsilon}) > 0 \text{ for each } x \in U''(x_0)$$
On the intersection $U = U' \cap U''$ of the two neighborhoods, both inequalities hold:
$$g(x, y_0 - \tilde{\varepsilon}) < 0 < g(x, y_0 + \tilde{\varepsilon}) \qquad \forall x \in U(x_0)$$
Since $g$ is strictly increasing in $y$, for every $x \in U(x_0)$ the only value $y$ such that $g(x,y) = 0$ thus lies between $y_0 - \tilde{\varepsilon}$ and $y_0 + \tilde{\varepsilon}$:
$$y_0 - \tilde{\varepsilon} < y < y_0 + \tilde{\varepsilon}$$
Therefore, for the implicit function $f$ we have: for every $\tilde{\varepsilon} > 0$ there exists a neighborhood $U(x_0)$ such that, for every $x$ in such a neighborhood,
$$f(x_0) - \tilde{\varepsilon} < f(x) < f(x_0) + \tilde{\varepsilon}$$
This guarantees that $f$ is continuous at $x_0$. In fact, having fixed $\tilde{\varepsilon} > 0$, let $x_n \to x_0$. There is $n_{\tilde{\varepsilon}}$ such that $x_n \in U(x_0)$ for every $n \geq n_{\tilde{\varepsilon}}$, so that
$$f(x_0) - \tilde{\varepsilon} < f(x_n) < f(x_0) + \tilde{\varepsilon} \qquad \forall n \geq n_{\tilde{\varepsilon}}$$
Since this holds for any $\tilde{\varepsilon} > 0$, we have $\lim_n f(x_n) = f(x_0)$. Since $x_0$ was arbitrarily chosen, $f$ is continuous everywhere.

Example 977 The Cobb-Douglas function $u(x,y) = x^\alpha y^{1-\alpha}$, with $0 < \alpha < 1$, is continuous, strictly increasing and strictly quasi concave on $\mathbb{R}^2_{++}$. Given $k > 0$, by Proposition 975 the equation $u(x,y) - k = 0$ defines on $\mathbb{R}^2_{++}$ a unique implicit function $f_k : (0, +\infty) \to (0, +\infty)$ which is strictly decreasing and strictly convex. N
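In this case the implicit function can also be written explicitly, which confirms these properties: solving $x^\alpha y^{1-\alpha} = k$ for $y$ gives
$$f_k(x) = k^{\frac{1}{1-\alpha}}\, x^{-\frac{\alpha}{1-\alpha}}$$
a strictly decreasing and strictly convex power function on $(0, +\infty)$.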

Equilibrium comparative statics: properties Let us use the results proved above for the comparative statics problems of Section 23.3.1.

Let us examine the first problem, with indirect taxation $\tau$. Suppose that:

(i) $D : \mathbb{R}_+ \times [0,b] \to \mathbb{R}$ and $S : [0,b] \to \mathbb{R}$ are continuous and such that $D(\tau, 0) \geq S(0)$ and $D(\tau, b) \leq S(b)$ for every $\tau$;

(ii) $D$ is strictly decreasing in $p$ and $S$ is strictly increasing.

The function $g : \mathbb{R}_+ \times [0,b] \to \mathbb{R}$ given by $g(\tau, p) = S(p) - D(\tau, p)$ is therefore strictly increasing in $p$. By Theorem 971, the equation $g(\tau, p) = 0$ defines an implicit function $p = f(\tau)$, since $0 \in \operatorname{Im} g$ as Bolzano's Theorem guarantees.$^{12}$ The implicit function is such that
$$g(\tau, f(\tau)) = 0$$
and, by Proposition 975, it is:

(i) continuous, as $D$ and $S$ are continuous;

(ii) strictly decreasing, if $D$ is strictly decreasing in $\tau$;

(iii) (strictly) convex, if $S$ is (strictly) quasi concave and $D$ is (strictly) quasi convex.

Property (ii) is especially interesting. Under the natural hypothesis that $D$ is strictly decreasing in $\tau$, we have that $f$ is strictly decreasing, that is, changes in taxation bring about opposite changes in equilibrium prices (increases in $\tau$ entail decreases in $p$, and decreases in $\tau$ determine increases in $p$).

In the linear case of Example 969 the existence and properties of $f$ followed from simple computations. The results in this section allow us to extend the same conclusions to much more general demand and supply functions.

Optimum comparative statics: properties Let us consider the optimization problem
$$\max_y F(x,y) \quad \text{sub} \quad y \in (0, +\infty)$$
with $F : [0, +\infty) \times (0, +\infty) \to \mathbb{R}$ differentiable.

When the partial derivative $\partial F/\partial y : [0, +\infty) \times (0, +\infty) \to \mathbb{R}$ is strictly increasing in $y$ (for example, when $\partial^2 F/\partial y^2 > 0$ if $F$ is twice differentiable) and $0$ belongs to its image, then, by Theorem 971, the equation $g(x,y) = \partial F(x,y)/\partial y = 0$ implicitly defines a unique function $y = f(x)$. By Proposition 975, the function $f$ is:

(i) continuous if $\partial F/\partial y$ is continuous;

(ii) strictly decreasing if $\partial F/\partial y$ is strictly increasing in $x$;

(iii) (strictly) convex if $\partial F/\partial y$ is (strictly) quasi concave.

In the special case of the producer's problem, we have $F(p,y) = py - c(y)$ and so
$$g(p,y) = \frac{\partial F(p,y)}{\partial y} = p - c'(y)$$
The strict monotonicity of $g$ in $y$ is equivalent to the strict monotonicity of the derivative function $c'$ (that is, to the strict convexity or concavity of $c$). In particular, if $c'$ is strictly increasing (and so $c$ is strictly convex), then $g$ is strictly decreasing in $y$ and strictly increasing in $p$, so that the supply function $y = f(p)$ is strictly increasing in $p$: higher prices lead to a larger supply. If, in addition, $c'$ is concave, then $g$ is quasi convex, which implies that the supply function $y = f(p)$ is convex.
$^{12}$ Indeed, $D$ and $S$ are continuous and, furthermore, $D(\tau, 0) \geq S(0)$ and $D(\tau, b) \leq S(b)$ for every $\tau$.

23.4 A glocal perspective

The following result combines the global perspective of Theorem 971 with the local one of Dini's Theorem. In so doing, we complete Proposition 975 by establishing the differentiability properties of the implicit function whose existence and uniqueness follow from the strict monotonicity of $g$ in $y$.

Theorem 978 (Global Implicit Function Theorem) Let $g : U \subseteq \mathbb{R}^2 \to \mathbb{R}$ be defined on an open set $U$, with $0 \in \operatorname{Im} g$. If $g$ is continuously differentiable on $U$ and condition (23.35) holds, then equation $g(x,y) = 0$ defines a unique implicit function $f : \pi_1(g^{-1}(0)) \to \pi_2(g^{-1}(0))$ on the rectangle $\pi_1(g^{-1}(0)) \times \pi_2(g^{-1}(0))$. The function $f$ is continuously differentiable, with
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x,y)}{\dfrac{\partial g}{\partial y}(x,y)} = -\frac{\dfrac{\partial g}{\partial x}(x, f(x))}{\dfrac{\partial g}{\partial y}(x, f(x))} \qquad \forall (x,y) \in g^{-1}(0)$$

Proof It is enough to notice that the hypotheses of the Implicit Function Theorem are satisfied at every $(x,y) \in g^{-1}(0)$.

Example 979 Let $g : \mathbb{R}^2 \to \mathbb{R}$ be given by $g(x,y) = x^3 - 3x^2 + y^3$. The function $g$ is continuously differentiable on $\mathbb{R}^2$, with $(\partial g/\partial y)(x,y) = 3y^2$, so that $(\partial g/\partial y)(x,y) > 0$ for every $(x,y) \in \mathbb{R}^2$ with $y \neq 0$. Since $\pi_1(g^{-1}(0)) = \pi_2(g^{-1}(0)) = \mathbb{R}$ and $g$ is strictly increasing in $y$, the equation $g(x,y) = 0$ defines a unique implicit function $f : \mathbb{R} \to \mathbb{R}$ on $\mathbb{R}^2$, with
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x,y)}{\dfrac{\partial g}{\partial y}(x,y)} = \frac{2x - x^2}{y^2} \qquad \forall (x,y) \in g^{-1}(0) \text{ with } y \neq 0$$
N

When condition (23.35) does not hold on the whole of $U$, but only on one of its open subsets $D$, the result can be applied to the restriction $\tilde{g}$ of $g$ to $D$. This observation allows us to use the result in many more settings, as the following variation of the previous example shows.

Example 980 Let $g : \mathbb{R}^2 \to \mathbb{R}$ be given by $g(x,y) = x^3 - 3x^2 + y^3 + 3y^2$. The function $g$ is continuously differentiable on $\mathbb{R}^2$, with $(\partial g/\partial y)(x,y) = 3y(y+2)$, so that
$$\frac{\partial g}{\partial y}(x,y) > 0 \iff y \in (-\infty, -2) \cup (0, +\infty) \quad \text{and} \quad \frac{\partial g}{\partial y}(x,y) < 0 \iff y \in (-2, 0)$$
Take $D = \mathbb{R} \times ((-\infty, -2) \cup (0, +\infty))$ and $D' = \mathbb{R} \times (-2, 0)$. Let $\tilde{g}$ be the restriction of $g$ to $D$. We have that
$$\{y^3 + 3y^2 : y \in (-\infty, -2) \cup (0, +\infty)\} = \{3x^2 - x^3 : x \in \mathbb{R}\} = \mathbb{R}$$
and so $\pi_1(\tilde{g}^{-1}(0)) = \mathbb{R}$. The equation $\tilde{g}(x,y) = 0$ then defines an implicit function $f : \mathbb{R} \to \mathbb{R}$, with
$$f'(x) = -\frac{\dfrac{\partial g}{\partial x}(x,y)}{\dfrac{\partial g}{\partial y}(x,y)} = \frac{2x - x^2}{y(y+2)} \qquad \forall (x,y) \in \tilde{g}^{-1}(0)$$
We leave the study of the restriction of $g$ to $D'$ to the reader. N

23.5 Appendix

23.5.1 Projections and shadows

Let $A$ be a subset of the plane $\mathbb{R}^2$, whose points we denote by $(x,y)$. Its projection
$$\pi_1(A) = \{x \in \mathbb{R} : \exists y \in \mathbb{R} \text{ such that } (x,y) \in A\}$$
on the $x$-axis is the set of points $x$ on the $x$-axis for which there exists a point $y$ on the $y$-axis such that the pair $(x,y)$ belongs to $A$.$^{13}$ Likewise, the projection
$$\pi_2(A) = \{y \in \mathbb{R} : \exists x \in \mathbb{R} \text{ such that } (x,y) \in A\}$$
on the $y$-axis is the set of points $y$ on the $y$-axis for which there exists (at least) one point $x$ on the $x$-axis such that $(x,y)$ belongs to $A$.

The projections $\pi_1(A)$ and $\pi_2(A)$ are nothing but the "shadows" of the set $A \subseteq \mathbb{R}^2$ on the two axes.

[Figure: projections of the set $A$ on the two axes.]

$^{13}$ The notion of projection is not to be confused with the different one seen in Section 19.1.

The next examples illustrate this important notion.

Example 981 (i) Let $A = [a,b] \times [c,d]$. In this case,
$$\pi_1(A) = [a,b] \quad \text{and} \quad \pi_2(A) = [c,d]$$
More generally, if $A = A_1 \times A_2$, one has
$$\pi_1(A) = A_1 \quad \text{and} \quad \pi_2(A) = A_2$$
The projections of a product set are its own factors.

(ii) Let $A = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$ and $B = [-1,1] \times [-1,1]$. Even though $A \neq B$, we obtain
$$\pi_1(A) = \pi_2(A) = [-1,1] = \pi_1(B) = \pi_2(B)$$
Different sets may share the same projections.

(iii) Let $B_\varepsilon(\bar{x}, \bar{y}) = \left\{(x,y) \in \mathbb{R}^2 : \sqrt{(x-\bar{x})^2 + (y-\bar{y})^2} < \varepsilon\right\}$ be a neighborhood of a point $(\bar{x}, \bar{y}) \in \mathbb{R}^2$. One has
$$\pi_1(B_\varepsilon(\bar{x}, \bar{y})) = B_\varepsilon(\bar{x}) = (\bar{x} - \varepsilon, \bar{x} + \varepsilon)$$
and
$$\pi_2(B_\varepsilon(\bar{x}, \bar{y})) = B_\varepsilon(\bar{y}) = (\bar{y} - \varepsilon, \bar{y} + \varepsilon)$$
We can conclude that the projections of a neighborhood of $(\bar{x}, \bar{y})$ in $\mathbb{R}^2$ are neighborhoods of equal radius of $\bar{x}$ and $\bar{y}$ in $\mathbb{R}$.

(iv) Given $f(x) = 1/|x|$ defined on $\mathbb{R} \setminus \{0\}$, one has
$$\pi_1(\operatorname{Gr} f) = \mathbb{R} \setminus \{0\} \quad \text{and} \quad \pi_2(\operatorname{Gr} f) = (0, +\infty)$$
In particular, $\pi_1(\operatorname{Gr} f)$ is the domain of $f$ and $\pi_2(\operatorname{Gr} f)$ is the image $\operatorname{Im} f$. This holds in general: if $f : A \subseteq \mathbb{R} \to \mathbb{R}$, one has $\pi_1(\operatorname{Gr} f) = A$ and $\pi_2(\operatorname{Gr} f) = \operatorname{Im} f$. N

23.5.2 Proof of the Implicit Function Theorem

Suppose, without loss of generality, that (23.4) takes the form
$$\frac{\partial g}{\partial y}(x_0, y_0) > 0 \qquad (23.36)$$
Since $g$ is continuously differentiable on $B(x_0, y_0)$, by the Theorem on the permanence of sign, (23.36) implies the existence of a neighborhood $\tilde{B}(x_0, y_0) \subseteq B(x_0, y_0)$ on which
$$\frac{\partial g}{\partial y}(x,y) > 0 \qquad \forall (x,y) \in \tilde{B}(x_0, y_0)$$
Let $\varepsilon > 0$ be small enough so that
$$(x_0 - \varepsilon, x_0 + \varepsilon) \times (y_0 - \varepsilon, y_0 + \varepsilon) \subseteq \tilde{B}(x_0, y_0)$$
and let $g_\varepsilon$ be the restriction of $g$ to this rectangle. Clearly, $\partial g_\varepsilon(x,y)/\partial y > 0$ for every $(x,y) \in (x_0 - \varepsilon, x_0 + \varepsilon) \times (y_0 - \varepsilon, y_0 + \varepsilon)$. Furthermore, the projections $\pi_1(g_\varepsilon^{-1}(0))$ and $\pi_2(g_\varepsilon^{-1}(0))$ are open intervals (why?). By setting $U(x_0) = \pi_1(g_\varepsilon^{-1}(0))$ and $V(y_0) = \pi_2(g_\varepsilon^{-1}(0))$, Theorem 971 applied to $g_\varepsilon$ guarantees the existence of a unique implicit function $f : U(x_0) \to V(y_0)$ on the rectangle $U(x_0) \times V(y_0)$ such that
$$g(x, f(x)) = 0 \qquad \forall x \in U(x_0)$$
The function $f$ is surjective (why?).
In order to show that $f$ is continuously differentiable, let us consider two points $x$ and $x + \Delta x$ in $U(x_0)$, whose images are respectively
$$y = f(x) \quad \text{and} \quad y + \Delta y = f(x + \Delta x)$$
It must hold that $g_\varepsilon(x,y) = g_\varepsilon(x + \Delta x, y + \Delta y) = 0$, and hence $g_\varepsilon(x + \Delta x, y + \Delta y) - g_\varepsilon(x,y) = 0$. Since $g_\varepsilon$ is continuously differentiable on $U(x_0) \times V(y_0)$, we can write the linear approximation
$$g_\varepsilon(x + \Delta x, y + \Delta y) - g_\varepsilon(x,y) = \frac{\partial g_\varepsilon}{\partial x}(x,y)\Delta x + \frac{\partial g_\varepsilon}{\partial y}(x,y)\Delta y + o\left(\sqrt{\Delta x^2 + \Delta y^2}\right)$$
and so it must hold that
$$\frac{\partial g_\varepsilon}{\partial x}(x,y)\Delta x + \frac{\partial g_\varepsilon}{\partial y}(x,y)\Delta y + o\left(\sqrt{\Delta x^2 + \Delta y^2}\right) = 0$$
Since $(\partial g_\varepsilon/\partial y)(x,y) \neq 0$ in a neighborhood of $(x_0, y_0)$, dividing both sides of the previous equality by
$$\frac{\partial g_\varepsilon}{\partial y}(x,y)\,\Delta x$$
we get that
$$\frac{\dfrac{\partial g_\varepsilon}{\partial x}(x,y)}{\dfrac{\partial g_\varepsilon}{\partial y}(x,y)} + \frac{\Delta y}{\Delta x} + \frac{o\left(\sqrt{\Delta x^2 + \Delta y^2}\right)}{\dfrac{\partial g_\varepsilon}{\partial y}(x,y)\,\Delta x} = 0$$
Since $y = f(x)$ is continuous, if $\Delta x \to 0$ then also $\Delta y \to 0$, and so
$$\lim_{\Delta x \to 0}\left[\frac{\dfrac{\partial g_\varepsilon}{\partial x}(x,y)}{\dfrac{\partial g_\varepsilon}{\partial y}(x,y)} + \frac{\Delta y}{\Delta x} + \frac{o\left(\sqrt{\Delta x^2 + \Delta y^2}\right)}{\dfrac{\partial g_\varepsilon}{\partial y}(x,y)\,\Delta x}\right] = \frac{\dfrac{\partial g_\varepsilon}{\partial x}(x,y)}{\dfrac{\partial g_\varepsilon}{\partial y}(x,y)} + \lim_{\Delta x \to 0}\frac{\Delta y}{\Delta x} = 0$$
and so
$$f'(x) = \lim_{\Delta x \to 0}\frac{\Delta y}{\Delta x} = -\frac{\dfrac{\partial g_\varepsilon}{\partial x}(x,y)}{\dfrac{\partial g_\varepsilon}{\partial y}(x,y)}$$
Finally, the continuity of $f'$ is a direct consequence of the continuity of $\partial g_\varepsilon/\partial x$ and of $\partial g_\varepsilon/\partial y$.
Chapter 24

Study of functions

24.1 Inflection points

Definition 982 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ and $x_0 \in A \cap A'$. The function $f$ is said to be

(i) concave at the point $x_0$ if there exists a neighborhood of this point (possibly only a right-neighborhood or a left-neighborhood when $x_0$ is a boundary point) on which it is concave;

(ii) strictly concave at the point $x_0$ if there exists a neighborhood of this point (possibly only a right-neighborhood or a left-neighborhood) on which it is strictly concave.

Analogous definitions hold for (strict) convexity at a point.

Corollary 922 immediately allows us to state the following result.

Proposition 983 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be twice differentiable at $x_0 \in A$. If $f$ is concave at $x_0$, then $f''(x_0) \leq 0$ (possibly interpreting the derivative in the unilateral sense). If $f''(x_0) < 0$, then $f$ is strictly concave at $x_0$.

Briefly:
$$f \text{ concave at } x_0 \implies f''(x_0) \leq 0$$
and
$$f''(x_0) < 0 \implies f \text{ strictly concave at } x_0$$
An analogous characterization holds for (strict) convexity.

Example 984 (i) The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = 2x^2 - 3$ is strictly convex at every point because $f''(x) = 4 > 0$ at every $x$.

(ii) The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3$ is strictly convex at $x_0 = 5$, being $f''(5) = 30 > 0$, and strictly concave at $x_0 = -1$, being $f''(-1) = -6 < 0$. N


Geometrically, as we know, for differentiable functions concavity (convexity) means that the tangent line always lies above (below) the graph of the function. Concavity (convexity) at a point means, therefore, that the line tangent at that point lies locally, that is, at least in a neighborhood of the point, above (below) the graph of the function.

[Figures: a function concave at $x_0$, with the tangent line at $(x_0, f(x_0))$ locally above the graph, and a function convex at $x_0$, with the tangent line locally below the graph.]

O.R. Just as the first derivative of a function at a point gives information on its increase or decrease, the second derivative gives information on concavity or convexity at a point. The greater $|f''(x_0)|$, the more pronounced the curvature (the "stomach") of $f$ at $x_0$ (and the "stomach" is upward if $f''(x_0) < 0$, first figure, and downward if $f''(x_0) > 0$, second figure).

To avoid the influence of the unit of measure of $f(x)$, especially in economics, one considers
$$\frac{f''(x_0)}{f'(x_0)}$$
(or its absolute value), which does not depend on it.$^1$ Observe incidentally that $f''(x_0)/f'(x_0)$ is the value at $x_0$ of the derivative of $\log f'$. H

Definition 985 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ and let $x_0$ be an accumulation point of $A$. The point $x_0$ is said to be an inflection point for $f$ if there exists a neighborhood of this point, relative to which $f$ is concave to the right and convex to the left of $x_0$, or vice versa.

In short, at an inflection point the direction of the concavity of the function changes. Proposition 983 immediately allows us to conclude that:

Proposition 986 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be twice differentiable at $x_0$. If $x_0$ is an inflection point for $f$, then $f''(x_0) = 0$.$^2$
$^1$ Indeed, if $T$ and $S$ are respectively the units of measure of the dependent and of the independent variable, the units of measure of $f'$ and of $f''$ are $T/S$ and $T/S^2$, so that the unit of measure of $f''/f'$ is $(T/S^2)/(T/S) = 1/S$.
$^2$ Moreover, as it is easy to see, if $f'''(x_0) < 0$ the second derivative is decreasing and therefore it passes from positive values to negative values, hence $f$ passes from convexity to concavity. Vice versa if $f'''(x_0) > 0$.

Example 987 Let f : R → R be the Gaussian function f(x) = e^{−x²}. Since f'(x) =
−2x e^{−x²}, we have f''(x) = (4x² − 2) e^{−x²}; the function is concave for

−1/√2 < x < 1/√2

and convex for |x| > 1/√2. The two points ±1/√2 are therefore inflection points and indeed
f''(±1/√2) = 0. Note that the point x = 0 is a local maximizer (actually, it is a global
maximizer, as the reader can easily verify). N
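The computation can be double-checked mechanically; a minimal sympy sketch, solving f''(x) = 0 and testing the sign change of f'':

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = sp.exp(-x**2)                  # the Gaussian function
f2 = sp.diff(f, x, 2)              # second derivative: (4x^2 - 2) e^{-x^2}

candidates = sp.solve(sp.Eq(f2, 0), x)
print(candidates)                  # [-sqrt(2)/2, sqrt(2)/2], i.e. the points ±1/√2

for c in candidates:
    left = f2.subs(x, c - sp.Rational(1, 10)).evalf()
    right = f2.subs(x, c + sp.Rational(1, 10)).evalf()
    print(c, left * right < 0)     # True: f'' changes sign, so c is an inflection point
```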

Geometrically, for functions with a derivative, at an inflection point the tangent line
cuts the graph: it cannot lie (locally) either above or below it.
If at an inflection point we have f'(x0) = f''(x0) = 0, the tangent line is horizontal
and cuts the graph of the function: we then speak of an inflection point with horizontal tangent.

Example 988 For the function f : R → R defined by f(x) = x³, the point x0 = 0 is an
inflection point with horizontal tangent. More generally, this holds for the functions f(x) = xⁿ
with n odd. N

Definition 985 finally allows us to easily prove the following sufficient condition for a point
x0 to be an inflection point for a function f.

Proposition 989 A function f : A ⊆ R → R twice differentiable at x0 has an inflection
point at x0 if f''(x0) = 0 and there exists ε > 0 such that

f''(x) f''(y) < 0

for every x0 − ε < x < x0 < y < x0 + ε.

24.2 Asymptotes

Intuitively, an asymptote is a straight line that the graph of a function approaches indef-
initely. Such straight lines can be vertical, horizontal or oblique.

(i) When at least one of the two following conditions holds:

lim_{x→x0⁺} f(x) = +∞ or −∞

lim_{x→x0⁻} f(x) = +∞ or −∞

the straight line of equation x = x0 is called a vertical asymptote for f.

(ii) When

lim_{x→+∞} f(x) = L   (or lim_{x→−∞} f(x) = L)

with L ∈ R, the straight line of equation y = L is called a horizontal asymptote for f at
+∞ (or: at −∞).

(iii) When

lim_{x→+∞} (f(x) − ax − b) = 0   (or lim_{x→−∞} (f(x) − ax − b) = 0)

that is, when the distance between the function and the straight line y = ax + b (a ≠ 0)
tends to 0 as x → +∞ (or: as x → −∞), the straight line of equation y = ax + b (a ≠ 0) is
an oblique asymptote for f at +∞ (or: at −∞).

Horizontal asymptotes are actually the special case of oblique asymptotes with a = 0.
Moreover, it is evident that there can be at most one horizontal or oblique asymptote
as x → −∞ and at most one as x → +∞. It is instead possible for f to have several
vertical asymptotes.

Example 990 Consider the function

f(x) = 3 − 7/(x² + 1)

whose graph is

[Figure: graph of f, approaching the horizontal asymptote y = 3 as x → ±∞.]

Since lim_{x→+∞} f(x) = 3 and lim_{x→−∞} f(x) = 3, the straight line y = 3 is a horizontal
asymptote for f both at +∞ and at −∞. N

Example 991 Consider the function

f(x) = 1/(x² + x − 2)

whose graph is

[Figure: graph of f, with vertical asymptotes at x = −2 and x = 1.]

Since lim_{x→1⁺} f(x) = +∞ and lim_{x→1⁻} f(x) = −∞, the straight line x = 1 is a vertical
asymptote for f. Moreover, since lim_{x→−2⁺} f(x) = −∞ and lim_{x→−2⁻} f(x) = +∞, the
straight line x = −2 is also a vertical asymptote for f. N

Example 992 Consider the function

f(x) = 2x²/(x + 1)

whose graph is

[Figure: graph of f, with vertical asymptote x = −1 and oblique asymptote y = 2x − 2.]

Since lim_{x→+∞} (f(x) − 2x + 2) = 0 and lim_{x→−∞} (f(x) − 2x + 2) = 0, the straight line
y = 2x − 2 is an oblique asymptote for f both at +∞ and at −∞. N

There is no difficulty in identifying vertical and horizontal asymptotes. We thus shift our
attention to oblique asymptotes. To this end, we provide two simple results.

Proposition 993 The straight line of equation y = ax + b is an oblique asymptote of f as
x → ±∞ if and only if lim_{x→±∞} f(x)/x = a and lim_{x→±∞} [f(x) − ax] = b.

Proof "If". When f(x)/x → a, consider the difference f(x) − ax. If it tends to a finite
limit b, then (and only then) f(x) − ax − b → 0. "Only if". From f(x) − ax − b → 0 it
follows that f(x) − ax → b and, dividing by x, that f(x)/x − a → 0.

The next result follows from de l'Hospital's rule.

Proposition 994 Let f have a derivative and f(x) → ±∞ as x → ±∞. Then y = ax + b
is an oblique asymptote of f as x → ±∞ if lim_{x→±∞} f'(x) = a and lim_{x→±∞} [f(x) − ax] = b.

Proposition 993 gives a necessary and sufficient condition for the search of oblique asymp-
totes, while Proposition 994 only provides a sufficient condition. In order to use the latter,
the limits involved must exist. In this regard, consider the following example.

Example 995 For the function f : R → R given by

f(x) = x + cos(x²)/x

as x → ±∞ we have

f(x)/x = 1 + cos(x²)/x² → 1

and

f(x) − x = cos(x²)/x → 0

Therefore y = x is an oblique asymptote of f as x → ±∞. Nevertheless, the first derivative
of f is

f'(x) = 1 + (−2x² sin(x²) − cos(x²))/x² = 1 − 2 sin(x²) − cos(x²)/x²

and it is immediate to verify that the limit of f'(x) as x → ±∞ does not exist. N
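The limits of Proposition 993 are easy to inspect numerically; a minimal sympy sketch for the function of this example:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = x + sp.cos(x**2) / x

# Proposition 993: f(x)/x should approach a = 1 and f(x) - x should approach b = 0
for t in (10.0, 100.0, 1000.0):
    print(t, float((f / x).subs(x, t)), float((f - x).subs(x, t)))

# the derivative, instead, keeps oscillating because of the -2 sin(x^2) term,
# so the sufficient condition of Proposition 994 cannot be applied
print(sp.simplify(sp.diff(f, x)))
```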

In the following examples we determine the asymptotes of some functions.

Example 996 For the function f : R → R given by f(x) = 5x + 2e⁻ˣ, as x → +∞, we
have that

f(x)/x = 5 + 2/(x eˣ) → 5

and that

f(x) − 5x = 2e⁻ˣ → 0

Therefore y = 5x is an oblique asymptote of f as x → +∞. As x → −∞ the function has
neither horizontal nor oblique asymptotes. N
Example 997 For the function f : [1, +∞) → R given by f(x) = √(x² − x), as x → +∞,
we have

f(x)/x = √(x² − x)/x = √(1 − 1/x) → 1

and, again as x → +∞,

f(x) − x = √(x² − x) − x = x (√(1 − 1/x) − 1) = x ((1 − 1/x)^{1/2} − 1)
         = ((1 − 1/x)^{1/2} − 1) / (1/x) → −1/2

Therefore

y = x − 1/2

is an oblique asymptote of f as x → +∞. N

It is quite simple to realize that:

(i) If f(x) = g(x) + h(x) and h(x) → 0 as x → ±∞, then f and g have in common their
possible oblique asymptotes.

(ii) If pₙ(x) = a₀xⁿ + a₁xⁿ⁻¹ + ··· + aₙ is a polynomial of degree n in x with a₀ > 0
and n odd, then the function defined by f(x) = ⁿ√(pₙ(x)) has, as x → ±∞, the oblique
asymptote

y = ⁿ√a₀ (x + (1/n)(a₁/a₀))

If pₙ(x) = a₀xⁿ + a₁xⁿ⁻¹ + ··· + aₙ is a polynomial of degree n in x with a₀ > 0 and n
even, then the function defined by f(x) = ⁿ√(pₙ(x)) has, as x → +∞, the oblique asymptote

y = ⁿ√a₀ (x + (1/n)(a₁/a₀))

and, as x → −∞, the oblique asymptote

y = −ⁿ√a₀ (x + (1/n)(a₁/a₀))

Let us verify only (ii) for n odd (for n even the calculations are analogous). If n is odd,
as x → ±∞ we have

f(x)/x = ⁿ√(a₀xⁿ (1 + a₁/(a₀x) + ··· + aₙ/(a₀xⁿ))) / x → ⁿ√a₀

therefore the slope of the oblique asymptote is ⁿ√a₀. Moreover, setting for brevity
u = (a₁xⁿ⁻¹ + ··· + aₙ)/(a₀xⁿ),

f(x) − ⁿ√a₀ x = ⁿ√a₀ x [(1 + u)^{1/n} − 1] = ⁿ√a₀ x u · [(1 + u)^{1/n} − 1]/u

Since, as x → ±∞,

[(1 + u)^{1/n} − 1]/u → 1/n  and  ⁿ√a₀ x u → ⁿ√a₀ (a₁/a₀)

we have, as x → ±∞,

f(x) − ⁿ√a₀ x → ⁿ√a₀ (a₁/a₀)(1/n)

In the previous example we had n = 2, a₀ = 1 and a₁ = −1; indeed, as x → +∞, the
asymptote had equation

y = √1 (x + (1/2)(−1/1)) = x − 1/2
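As a sanity check of the formula in (ii), a minimal sympy sketch, using the hypothetical cubic 8x³ + 4x² + x + 5, compares the asymptote obtained via Proposition 993 with the closed-form prediction:

```python
import sympy as sp

x = sp.symbols('x', real=True)
a0, a1, n = 8, 4, 3
f = sp.cbrt(a0*x**3 + a1*x**2 + x + 5)

# direct computation via Proposition 993
a = sp.limit(f / x, x, sp.oo)
b = sp.limit(f - a*x, x, sp.oo)
print(a, b)                                    # 2 and 1/3

# closed-form prediction of remark (ii): slope a0^(1/n), intercept a0^(1/n) * a1/(n*a0)
print(sp.root(a0, n), sp.root(a0, n) * sp.Rational(a1, n*a0))   # 2 and 1/3
```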

24.3 Study of functions

The results obtained in these chapters on differential calculus allow for the qualitative study
of a function. Such a study consists in finding the possible local maximizers and minimizers,
the inflection points, and the asymptotic and boundary behavior of the function.
Let us consider a function f : A ⊆ R → R defined on a set A. To apply the results of the
chapter, let us suppose moreover that f is at least twice differentiable at each interior point
of A.

(i) First of all it is appropriate to calculate the limits of f at the boundary points of the
domain, as well as, possibly, as x → ±∞ when A is unbounded.

(ii) It can be interesting to establish the sets on which the function is positive, f(x) ≥ 0,
increasing, f'(x) ≥ 0, and concave/convex, f''(x) ≷ 0. Once the intersections of the
graph with the axes are determined (the value f(0) on the vertical axis and the set
f⁻¹(0) on the horizontal axis), we have a first idea of its graph.

(iii) To find the local extremal points (provided they exist), it is possible to use the omnibus
procedure seen in Section 21.3.

(iv) The points at which f''(x) = 0 are candidates to be inflection points; they certainly are
if at these points f''' ≠ 0.

(v) Finally, it is useful to look for possible oblique asymptotes of f; a computational sketch
of steps (ii)-(iv) follows below.
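Steps (ii)-(iv) lend themselves to automation; a minimal sympy sketch, applied to the polynomial of Example 998 below:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = x**3 - 7*x**2 + 12*x                    # the function of Example 998

f1, f2, f3 = (sp.diff(f, x, k) for k in (1, 2, 3))

print(sp.solve(sp.Eq(f, 0), x))             # zeros of f: 0, 3, 4
for s in sp.solve(sp.Eq(f1, 0), x):         # stationary points: (7 ± √13)/3
    print(s, sp.sign(f2.subs(x, s)))        # -1: local maximizer, +1: local minimizer
for c in sp.solve(sp.Eq(f2, 0), x):         # candidate inflection points: 7/3
    print(c, f3.subs(x, c) != 0)            # f''' ≠ 0 confirms the inflection
```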

Example 998 Let f : R → R be given by f(x) = x³ − 7x² + 12x. We have

lim_{x→−∞} f(x) = −∞ ;  lim_{x→+∞} f(x) = +∞

and therefore there are no asymptotes. Then we have:

(i) f(0) = 0 and f(x) = 0, that is, x(x² − 7x + 12) = 0, for x = 0 and for
x = (7 ± √(49 − 48))/2, that is, 3 and 4. Given that it is possible to write f(x) = x(x − 3)(x − 4),
the function is ≥ 0 when x ∈ [0, 3] ∪ [4, +∞).

(ii) Since f'(x) = 3x² − 14x + 12, the derivative is zero for

x = (14 ± √(196 − 144))/6 = (14 ± √52)/6 = (7 ± √13)/3

The derivative is ≥ 0 when x ∈ (−∞, (7 − √13)/3] ∪ [(7 + √13)/3, +∞).

(iii) Since f''(x) = 6x − 14, it is zero for x = 7/3. The second derivative is ≥ 0 when
x ≥ 7/3.

(iv) Since f''((7 − √13)/3) < 0, the point is a local maximizer; since instead f''((7 + √13)/3) > 0,
the point is a local minimizer. Finally, the point 7/3 is an inflection point.

The graph of the function is therefore:

[Figure: graph of the cubic f, with a local maximum at (7 − √13)/3, a local minimum at (7 + √13)/3, and an inflection point at 7/3.]

Example 999 Let f : R → R be the function defined by f(x) = e^{−x²}. It is called the
Gaussian function. Both limits, as x → ±∞, are 0, and the horizontal axis is therefore a
horizontal asymptote. The function is always strictly positive and f(0) = 1. Next, we look
for possible local extremal points. The first order condition f'(x) = 0 has the form
−2x e^{−x²} = 0 and so x = 0 is the unique critical point. The second derivative is

f''(x) = −2e^{−x²} + (−2x) e^{−x²} (−2x) = 2e^{−x²} (2x² − 1)

Therefore, f''(0) = −2: x = 0 is a local maximizer. The graph of the function is the famous
Gaussian bell:

[Figure: the Gaussian bell, with maximum 1 at x = 0 and inflection points at ±1/√2.]

which is the most classical among the graphs of functions. N

Example 1000 Let f : R → R be given by f(x) = x⁶ − 3x² + 1. Next, we look for possible
local extremal points. The first order condition f'(x) = 0 has the form

6x⁵ − 6x = 0

and therefore x = 0 and x = ±1 are the unique critical points. We have f''(0) = −6,
f''(−1) = 24 and f''(1) = 24. Hence, x = 0 is a local maximizer, while x = −1 and x = 1
are local minimizers. From lim_{x→+∞} f(x) = lim_{x→−∞} f(x) = +∞ it follows that the
graph of this function is:

[Figure: graph of f, with local minima at x = ±1 and a local maximum at x = 0.]

Example 1001 Let f : R → R be given by f(x) = x eˣ. Its limits are lim_{x→−∞} x eˣ = 0
and lim_{x→+∞} x eˣ = +∞. We then have:

(i) f(x) ≥ 0 ⟺ x ≥ 0.

(ii) f'(x) = (x + 1) eˣ ≥ 0 ⟺ x ≥ −1.

(iii) f''(x) = (x + 2) eˣ ≥ 0 ⟺ x ≥ −2.

(iv) f(0) = 0: the origin is the unique point of intersection with the axes.

Since f'(x) = 0 for x = −1 and f''(−1) = e⁻¹ > 0, the unique minimizer is x = −1.
Given that f''(x) = 0 for x = −2, it is an inflection point.

[Figure: graph of f, with global minimum at x = −1 and inflection point at x = −2.]

Example 1002 Let f : R → R be given by f(x) = x² eˣ. Its limits are

lim_{x→−∞} x² eˣ = 0⁺ ;  lim_{x→+∞} x² eˣ = +∞

We then have that:

(i) f(x) is always ≥ 0 and f(0) = 0: x = 0 is therefore a minimizer.

(ii) f'(x) = x(x + 2) eˣ ≥ 0 ⟺ x ∈ (−∞, −2] ∪ [0, +∞).

(iii) f''(x) = (x² + 4x + 2) eˣ ≥ 0 ⟺ x ∈ (−∞, −2 − √2] ∪ [−2 + √2, +∞).

(iv) x = −2 and x = 0 are the unique stationary points. Since f''(−2) = −2e⁻² < 0,
x = −2 is a local maximizer. Given that f''(0) = 2e⁰ > 0, it is confirmed that x = 0 is
a minimizer.

(v) The two points of abscissae −2 ± √2 are inflection points.

[Figure: graph of f, with local maximum at x = −2, global minimum at x = 0, and inflection points at −2 ± √2.]

N
Example 1003 Let f : R → R be given by f(x) = x³ eˣ. Its limits are

lim_{x→−∞} x³ eˣ = 0⁻ ;  lim_{x→+∞} x³ eˣ = +∞

We then have that:

(i) f(x) ≥ 0 ⟺ x ≥ 0; f(0) = 0.

(ii) f'(x) = x²(x + 3) eˣ ≥ 0 ⟺ x ≥ −3; note that f'(0) = 0 as well as f' > 0 close to
x = 0: the function is therefore increasing at the origin.

(iii) f''(x) = (x³ + 6x² + 6x) eˣ ≥ 0 ⟺ x ∈ [−3 − √3, −3 + √3] ∪ [0, +∞).

(iv) x = −3 and x = 0 are the unique stationary points. Since f''(−3) = 9e⁻³ > 0, x = −3
is a local minimizer. We have f''(0) = 0 and we already know that the function is
increasing at x = 0.

(v) The three points of abscissae −3 ± √3 and 0 are inflection points.

[Figure: graph of f, with global minimum at x = −3 and inflection points at −3 ± √3 and 0.]
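Since f''(0) = 0, the sign change of f'' (Proposition 989) is the reliable test at the origin; a minimal sympy sketch confirming all three inflection points:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = x**3 * sp.exp(x)
f2 = sp.diff(f, x, 2)                  # (x^3 + 6x^2 + 6x) e^x

for c in sp.solve(sp.Eq(f2, 0), x):    # -3 - √3, -3 + √3 and 0
    left = f2.subs(x, c - sp.Rational(1, 10)).evalf()
    right = f2.subs(x, c + sp.Rational(1, 10)).evalf()
    print(c, left * right < 0)         # True: sign change, hence inflection point
```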

Example 1004 Let f be given by f(x) = 2x + 3 + 1/(x − 2). It is not defined for x = 2.
We have:

lim_{x→−∞} f(x) = −∞ ;  lim_{x→2⁻} f(x) = −∞ ;  lim_{x→2⁺} f(x) = +∞ ;  lim_{x→+∞} f(x) = +∞

(i) f(0) = 3 − 1/2 = 5/2; we have f(x) = 0 when (2x + 3)(x − 2) = −1, that is, when
2x² − x − 5 = 0, i.e., for

x = (1 ± √41)/4, that is, x ≈ 1.85 and x ≈ −1.35

(ii) We have

f'(x) = 2 − 1/(x − 2)²

which is zero if (x − 2)² = 1/2, i.e., if x = 2 ± 1/√2.

(iii) Given that

f''(x) = 2/(x − 2)³

is positive for every x > 2 and negative for every x < 2, the two stationary points
2 + 1/√2 and 2 − 1/√2 are, respectively, a local minimizer and a local maximizer.

(iv) Given that f'(x) → 2 as x → ±∞, the function presents an oblique asymptote. Since

lim_{x→±∞} [f(x) − 2x] = lim_{x→±∞} [3 + 1/(x − 2)] = 3

the oblique asymptote has equation y = 2x + 3. Clearly, there is also a vertical
asymptote of equation x = 2.

[Figure: graph of f, with vertical asymptote x = 2 and oblique asymptote y = 2x + 3.]

Note that

f(x) ∼ 1/(x − 2)

as x → 2 (in the proximity of 2 it behaves as 1/(x − 2), i.e., it diverges) and that
f(x) ∼ 2x + 3 as x → ±∞ (for |x| sufficiently large it behaves as y = 2x + 3). N
Part VII

Differential optimization
Chapter 25

Unconstrained optimization

25.1 Unconstrained problems


In the last part of the book we learned how di¤erential calculus provides remarkable tools for
the study of local solutions of the optimization problems introduced in Chapter 16, problems
that are at heart of economics (and of our book). In the next few chapters on optimization
theory we will show how these tools can be used to …nd global solutions of such problems,
which are the real object of interest in applications –as we already stressed several times. In
other words, we will learn how the study of local solutions can be instrumental for the study
of global ones. To this end, we will study two main classes of problems: (i) problems with
coercive objective functions, in which we can combine local di¤erential results a la Fermat
with global existence results a la Weierstrass and Tonelli; (ii) problems with concave objective
functions that can rely on the fundamental optimality properties of concave functions studied
in Chapter 16.
As in Chapter 16, we consider an optimization problem

max f (x) sub x 2 C (25.1)


x

with objective function f : A Rn ! R and choice set C A. A point x ^ 2 C is a (global)


solution of the optimization problem (27.4) if f (^x) f (x) for each x 2 C, while x^ 2 C
is a local solution of such a problem if there exists a neighborhood Bx0 (") of x^ such that
x) f (x) for each x 2 Bx0 (") \ C.1
f (^
If C is open and f has a derivative on C, we have an unconstrained di¤ erential optim-
ization problem. In the rest of the chapter we will focus on this basic class of problems
and through them we will illustrate a few optimization themes (Sections 25.4-25.6). In the
next two chapters we will consider two fundamental classes of constrained problems, that is,
problems that feature choice sets that are not open.

25.2 Coercive problems

The unconstrained differential optimization problem (25.1) is said to be coercive if the
objective function f is coercive on C. As the continuity of f on C is guaranteed by
differentiability, Tonelli's Theorem can be used for this class of problems and, along with
Fermat's Theorem, it gives rise to the so-called elimination method for solving optimization
problems, which in this chapter will be used in dealing with unconstrained differential
optimization problems.

¹ As in Chapter 16, solutions are thus understood to be global even when not stated explicitly.

The elimination method consists of the following steps:

1. identify the set S of interior critical points of f on C, that is,

S = {x ∈ C : ∇f(x) = 0}

2. construct the set f(S) = {f(x) : x ∈ S}; if x̂ ∈ S is such that

f(x̂) ≥ f(x)   ∀x ∈ S   (25.2)

then x̂ is a solution of the optimization problem (25.1).

In other words, once the conditions for Tonelli's Theorem to be applied are verified, one
constructs the set of critical points; the point (or points) at which f achieves the maximum
value over this set is the solution to the optimization problem.

N.B. If f ∈ C²(C), in phase 1, instead of S, one can consider its subset S₂ ⊆ S,
which is made up of the critical points that satisfy the second order necessary condition
(Sections 20.5.3 and 21.4.4). O

In order to better understand the elimination method, the reader should note that, thanks
to Fermat's Theorem, the set S consists of all points in C which are candidate local solutions
of optimization problem (25.1). On the other hand, if f is continuous and coercive on C, by
Tonelli's Theorem there is at least one solution of the optimization problem. Such a solution
must belong to the set S (as long as it is non-empty), because a solution of the optimization
problem is, a fortiori, a local solution. Hence the solutions of the "restricted" optimization
problem

max_x f(x)  sub x ∈ S   (25.3)

are also solutions of optimization problem (25.1). However, the solutions of problem (25.3)
are the points x̂ ∈ S for which condition (25.2) holds. Hence they are the solutions of
optimization problem (25.1), as the last phase of the elimination method states.

As the following examples show, the elimination method elegantly and effectively com-
bines Tonelli's global result with Fermat's, which has a more local nature. Note how
Tonelli's Theorem is crucial, as the set C is open, thus making Weierstrass' Theorem inap-
plicable (it requires C to be compact).
The smaller the set S of critical points, the better the method works, as the last phase
requires a direct comparison of f at all points of S. For this reason the method is particularly
effective when, in the scalar case, one can consider, instead of S, its subset S₂, which is made
up of all critical points satisfying the second order necessary condition.
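A minimal sympy sketch of the two phases for a scalar coercive problem, using the hypothetical objective −x⁴ + x²:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = -x**4 + x**2                # hypothetical coercive objective: f → -∞ as |x| → ∞

# Phase 1: the set S of critical points
S = sp.solve(sp.Eq(sp.diff(f, x), 0), x)        # [-sqrt(2)/2, 0, sqrt(2)/2]

# Phase 2: compare the values f(S); the maximizers over S solve the problem
values = {s: f.subs(x, s) for s in S}
best = max(values.values())
print([s for s, v in values.items() if v == best])   # ±sqrt(2)/2, both are solutions
```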

Example 1005 Let f : Rⁿ → R be given by f(x) = (1 − ‖x‖²) e^{‖x‖²} and let C = Rⁿ. The
function f is coercive on Rⁿ. Indeed, it is supercoercive: by taking tₙ = ‖xₙ‖, it follows that

f(xₙ) = (1 − ‖xₙ‖²) e^{‖xₙ‖²} = (1 − tₙ²) e^{tₙ²} → −∞

for any sequence {xₙ} ⊆ Rⁿ such that tₙ = ‖xₙ‖ → +∞. Since it is continuous, f is coercive
on Rⁿ by Proposition 698. The unconstrained differential optimization problem

max_x (1 − ‖x‖²) e^{‖x‖²}  sub x ∈ Rⁿ   (25.4)

is thus coercive. Let us solve it by using the elimination method.

Phase 1: It is easy to see that

∇f(x) = 0 ⟺ x = 0

so that S = {0} and x = 0 is the unique critical point, thus completing phase 1.

Phase 2: Since S is a singleton, the condition in this phase trivially holds and so x̂ = 0 is a
solution of optimization problem (25.4). N

Example 1006 Let f : R → R be given by f(x) = −x⁶ + 3x² − 1 and let C = R. By
Proposition 698, f is coercive on R since lim_{x→±∞} f(x) = lim_{x→±∞} (−x⁶ + 3x² − 1) = −∞.
The unconstrained differential optimization problem

max_x −x⁶ + 3x² − 1  sub x ∈ R   (25.5)

is thus coercive. Let us solve it with the elimination method.

Phase 1: The first order condition f'(x) = 0 takes the form −6x⁵ + 6x = 0 and so x = 0 and
x = ±1 are the only critical points, that is, S = {−1, 0, 1}. We have f''(0) = 6,
f''(−1) = −24 and f''(1) = −24, and so S₂ = {−1, 1}.

Phase 2: We have f(−1) = f(1) = 1 and so both points x̂ = ±1 are solutions of the
optimization problem (25.5). N
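A quick sympy check of this example:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = -x**6 + 3*x**2 - 1

S = sp.solve(sp.Eq(sp.diff(f, x), 0), x)                   # [-1, 0, 1]
S2 = [s for s in S if sp.diff(f, x, 2).subs(x, s) <= 0]    # second order test: [-1, 1]
print(S2, [f.subs(x, s) for s in S2])                      # both attain the value 1
```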

Example 1007 Let us consider Example 851 again, which dealt with the unconstrained
optimization problem

max_x e^{−x⁴+x²}  sub x ∈ R

with differential methods. The problem is differential. Let us verify its coercivity. By setting
g(x) = eˣ and h(x) = −x⁴ + x², it follows that f = g ∘ h. We have lim_{x→±∞} h(x) =
lim_{x→±∞} (−x⁴ + x²) = −∞ and so, by Proposition 698, the function h is coercive on R.
Since g is strictly increasing, the function f is a strictly increasing transformation of a
coercive function. By Proposition 684, f is coercive.
This unconstrained differential optimization problem is thus coercive and can be solved
with the elimination method.

Phase 1: From Example 851 we know that S₂ = {−1/√2, 1/√2}.

Phase 2: We have f(−1/√2) = f(1/√2) and so both points x̂ = ±1/√2 are solutions
of the unconstrained optimization problem. The elimination method allowed us to identify
the nature of such points, which would not have been possible by using solely differential
methods as in Example 851. N

Example 1008 Example 898 dealt with the optimization problem

max_x f(x)  sub x ∈ R²₊₊

where f : R² → R is defined as f(x₁, x₂) = −2x₁² − x₂² + 3(x₁ + x₂) − x₁x₂ + 3. The function
f is supercoercive: indeed, it is easily seen that

f(x₁ₙ, x₂ₙ) = −2x₁ₙ² − x₂ₙ² + 3(x₁ₙ + x₂ₙ) − x₁ₙx₂ₙ + 3 → −∞

for any "exploding" sequence {xₙ = (x₁ₙ, x₂ₙ)} ⊆ R²₊₊, that is, such that
‖xₙ‖ = √(x₁ₙ² + x₂ₙ²) → +∞. As f is continuous, it is coercive on R²₊₊ by Proposition 698.
This unconstrained differential optimization problem is coercive as well, so it can be
solved with the elimination method.

Phase 1: By Example 898, S₂ = {(3/7, 9/7)}.

Phase 2: As S₂ is a singleton, the condition in this phase trivially holds and so x̂ = (3/7, 9/7)
is a solution of the optimization problem. The elimination method has allowed us
to identify the nature of such a point, thus making it possible to conclude the study of the
optimization problem from Example 898. N

25.3 Concave problems

Optimization problems with concave objective functions are of central importance in eco-
nomic applications. This is due to the remarkable optimality properties of concave functions.
In particular, the unconstrained differential optimization problem (25.1), that is,

max_x f(x)  sub x ∈ C   (25.6)

is said to be concave if the open set C ⊆ A is convex and if the function f : A ⊆ Rⁿ → R is
concave on C.

By Theorem 930, for every x ∈ C we have the inequality

f(y) ≤ f(x) + ∇f(x) · (y − x)   ∀y ∈ C

This implies that a point x̂ of C is a solution of the concave problem (25.6) if and only if
∇f(x̂) = 0. Indeed, if x̂ ∈ C is such that ∇f(x̂) = 0, the inequality yields

f(y) ≤ f(x̂) + ∇f(x̂) · (y − x̂) = f(x̂)   ∀y ∈ C

so that x̂ is a solution of problem (25.6). On the other hand, if x̂ ∈ C is a solution of the
problem, we have ∇f(x̂) = 0 thanks to Fermat's Theorem.

In sum, in a concave problem the first order condition ∇f(x̂) = 0 becomes necessary
and sufficient for a point x̂ ∈ C to be a solution. This most remarkable property, studied in
Section 22.3, explains the importance of concavity in optimization problems.
But more is true: by Theorem 706, such a solution is unique if f is strictly quasi-concave.
Besides existence, the study of the uniqueness of solutions (key for comparative statics
exercises) is thus also best carried out under concavity.

The status of necessary and sufficient condition of ∇f(x̂) = 0 leads to the concave
(elimination) method for solving the concave problem (25.6); it consists of a single phase:

1. find the set S = {x ∈ C : ∇f(x) = 0} of the stationary points of f on C; all, and only,
the points x̂ ∈ S solve the optimization problem.

In particular, when f is strictly quasi-concave, the set S is a singleton that consists of
the unique solution. This is the case in which the concave method is most powerful. In
general, this method is, at the same time, simpler and more powerful than the elimination
method. It requires the concavity of the objective function, a demanding condition that,
however, is often assumed in economic applications (actually, in these applications strict
concavity is often assumed in order to have unique solutions).

Example 1009 Let f : (0, +∞) → R be given by f(x) = −x log x and let C = (0, +∞). The
function f is strictly concave since f'(x) = −1 − log x is strictly decreasing (Corollary 920).
Let us solve the concave problem

max_x −x log x  sub x > 0   (25.7)

We have

f'(x) = 0 ⟺ log x = −1 ⟺ e^{log x} = e⁻¹ ⟺ x = 1/e

By the concave method, x̂ = e⁻¹ is the unique solution of problem (25.7). N

Example 1010 Let f : R² → R be given by f(x, y) = −2x² − 3xy − 6y² and let C = R². The
function f is strictly concave since the Hessian

[ −4   −3 ]
[ −3  −12 ]

is negative definite (Proposition 928). Let us solve the concave problem

max_{x,y} −2x² − 3xy − 6y²  sub (x, y) ∈ R²   (25.8)

We have

∇f(x, y) = 0 ⟺ { −4x − 3y = 0 ; −3x − 12y = 0 } ⟺ (x, y) = (0, 0)

By the concave method, the origin x̂ = (0, 0) is the unique solution of problem (25.8). N

25.4 Relationship among problems

In this preview we introduced the two relevant classes of unconstrained differential optimiz-
ation problems: coercive and concave ones. A few observations are in order:

1. The two classes are not exhaustive: there are unconstrained differential optimization
problems which are neither coercive nor concave. For example, consider the uncon-
strained differential optimization problem

max_x cos x  sub x ∈ R

It is neither coercive nor concave: the cosine function is neither coercive on the real
line (see Example 683) nor concave. Nonetheless, the problem is trivial: as one can
easily infer from the graph of the cosine function, its solutions are the points x = 2kπ
with k ∈ Z. As usual, common sense gives the best guidance in solving any problem (in
particular, optimization ones), more so than any classification.

2. The two classes are not disjoint: there are unconstrained differential optimization prob-
lems which are both coercive and concave. For example, the unconstrained differential
optimization problem

max_x 1 − x²  sub x ∈ R

is both coercive and concave: the function 1 − x² is indeed both coercive (see Example
689) and concave on the real line. In cases such as this one we use the more powerful
concave method.

3. The two classes are distinct: there are unconstrained differential optimization problems
which are coercive but not concave, and vice versa.

(a) Let f : R → R be given by

f(x) = 1 − x²  if x ≤ 0
f(x) = 1       if x > 0

Since f is differentiable (Example 750), the problem

max_x f(x)  sub x ∈ R

is an unconstrained differential optimization problem. The graph of the function f

[Figure: graph of f, a downward parabola for x ≤ 0 glued to the constant value 1 for x > 0.]

shows that it is concave, but not coercive. The optimization problem is thus
concave, but not coercive.
(b) The unconstrained differential optimization problem

max_x e^{−x²}  sub x ∈ R

is coercive, but not concave: the Gaussian function e^{−x²} is indeed coercive (Ex-
ample 685), but not concave, as its well-known graph shows.

[Figure: the Gaussian bell, coercive but not concave (it is convex in the tails).]

25.5 Weakening

An optimization problem

max_x f(x)  sub x ∈ C

with objective function f : A ⊆ Rⁿ → R may be solved by weakening, that is, by considering
an ancillary optimization problem

max_x f(x)  sub x ∈ B

which is characterized by a larger choice set C ⊆ B ⊆ A which is analytically more convenient
(for example, it may be convex), so that the relaxed problem becomes coercive or concave. If
a solution of the relaxed problem belongs to the initial choice set C, it automatically is a
solution of the initial problem as well. The following examples should clarify this simple
yet fundamental idea, which can allow us to solve optimization problems which are neither
coercive nor concave.

Exercise 1011 (i) Let us consider the optimization problem

max_x (1 − ‖x‖²) e^{‖x‖²}  sub x ∈ Qⁿ₊   (25.9)

where Qⁿ₊ is the set of vectors of Rⁿ whose coordinates are rational and positive. An intuitive
weakening of the problem is

max_x (1 − ‖x‖²) e^{‖x‖²}  sub x ∈ Rⁿ

whose choice set is larger yet analytically more convenient. Indeed, the relaxed problem is
coercive and a simple application of the elimination method shows that its solution is x̂ = 0
(Example 1005). Since it belongs to Qⁿ₊, we can conclude that x̂ = 0 is also the unique
solution of problem (25.9). It would have been far more complex to reach such a conclusion
by studying the initial problem directly.
(ii) Let us consider the consumer problem with log-linear utility

max_x Σᵢ₌₁ⁿ aᵢ log xᵢ  sub x ∈ C   (25.10)

where C = B(p, I) ∩ Qⁿ is the set of bundles with rational components. Let us consider the
relaxed version

max_x Σᵢ₌₁ⁿ aᵢ log xᵢ  sub x ∈ B(p, I)

with a larger yet convex (thus analytically more convenient) choice set. Indeed, convexity
itself allowed us to conclude in Section 16.5 that the unique solution of the problem is the
bundle x̂ such that x̂ᵢ = aᵢI/pᵢ for every good i = 1, ..., n. If aᵢ, pᵢ, I ∈ Q for every i, the
bundle x̂ belongs to C and is thus the unique solution of problem (25.10). It would have been
far more complex to reach such a conclusion by studying problem (25.10) directly. N

In conclusion, it is sometimes more convenient to ignore some of the constraints of the


choice set when doing so makes the choice set larger yet more analytically tractable, in the
hope that one of the solutions to the relaxed problem belongs to the original choice set.

25.6 No illusions

Solving optimization problems is generally a quite complex endeavor, even when a limited
number of variables is involved. In this section we present an example of an optimization
problem whose solution is as complicated as proving Fermat's Last Theorem.² The latter,
which was finally proven after three centuries of unfruitful efforts, states that, for n ≥ 3,
there do not exist three positive integers x, y and z such that xⁿ + yⁿ = zⁿ (Section 1.3.2).
Let us consider the optimization problem

min_{x,y,z,n} f(x, y, z, n)  sub (x, y, z, n) ∈ C

where the objective function f : R³ × N → R is given by

f(x, y, z, n) = (xⁿ + yⁿ − zⁿ)² + (1 − cos 2πx)² + (1 − cos 2πy)² + (1 − cos 2πz)²

and the choice set is C = {(x, y, z, n) ∈ R³ × N : x, y, z ≥ 1, n ≥ 3}.

It is an optimization problem in four variables, one of which, n, is discrete, thus making
it impossible to use differential and convex methods. At first sight this might seem a
serious problem, but not an intractable one. Let us try to analyze it. We have f ≥ 0
since f is a sum of squares; in particular,

inf_{(x,y,z,n)∈C} f(x, y, z, n) = 0

since lim_{n→∞} f(1, 1, ⁿ√2, n) = lim_{n→∞} (1 − cos 2π ⁿ√2)² = 0. In fact, lim_{n→∞} ⁿ√2 = 1
(Proposition 310).
The infimum is thus zero. The question is whether there is a solution of the
problem, that is, a vector (x̂, ŷ, ẑ, n̂) ∈ C such that f(x̂, ŷ, ẑ, n̂) = 0. Since f is a sum of
squares, this requires that at such a vector they all be null, that is,

x̂^n̂ + ŷ^n̂ − ẑ^n̂ = 1 − cos 2πx̂ = 1 − cos 2πŷ = 1 − cos 2πẑ = 0

The last three equalities imply that the points x̂, ŷ and ẑ are integers.³ In order to belong
to the set C, they must be positive. Therefore, the vector (x̂, ŷ, ẑ, n̂) ∈ C must be made
up of three positive integers such that x̂^n̂ + ŷ^n̂ = ẑ^n̂ for n̂ ≥ 3. This is possible if and only
if Fermat's Last Theorem is false. Now that we know it to be true, we can conclude that
this optimization problem has no solution. We could not have made such a statement before
1994, when it would have been unclear whether this optimization problem had a solution.
Be that as it may, solving this optimization problem, which only has four variables, is
equivalent to solving one of the most well-known problems in mathematics.

² Based on K. G. Murty and S. N. Kabadi, "Some NP-complete problems in quadratic and nonlinear
programming", Mathematical Programming, 39, 117-129, 1987.
³ Let the reader be reminded that cos 2πx = 1 if and only if x is an integer.
Chapter 26

Equality constraints

26.1 Introduction

The classical necessary condition for local extremal points of Fermat's Theorem considers
interior points of the choice set C, something that greatly limits its use in the optimization
problems coming from economics. Indeed, in many of them the monotonicity hypotheses
of Proposition 666 hold and, therefore, the possible solutions are boundary points, not
interior ones. A classical example is the consumer problem

max_x u(x)  sub x ∈ B(p, I)   (26.1)

Under the standard hypothesis of monotonicity, by Walras' Law the problem can be rewritten
as

max_x u(x)  sub x ∈ Γ(p, I)

where Γ(p, I) = {x ∈ A : p · x = I} ⊆ ∂B(p, I) is determined by an equality constraint (the
consumer allocates all his income to the purchase of the optimal bundle). The set Γ(p, I)
has no interior points, that is,

int Γ(p, I) = ∅

As Fermat's Theorem considers interior points, it is useless for finding the local solutions of
the consumer problem. The equality constraint, with its drastic topological consequences,
deprives us of this fundamental result in the study of the consumer problem. Fortunately,
there is an equally important result of Lagrange that rescues us, as the chapter will show.

26.2 The problem

The general form of an optimization problem with equality constraints is

max_x f(x)   (26.2)
sub g₁(x) = b₁, g₂(x) = b₂, ..., g_m(x) = b_m

where f : A ⊆ Rⁿ → R is the objective function, while the functions gᵢ : A ⊆ Rⁿ → R and
the scalars bᵢ represent the m equality constraints. Throughout the chapter we assume that
the functions f and gᵢ are continuously differentiable on a non-empty and open subset D of
their domain A, that is, ∅ ≠ D ⊆ int A.
The set

C = {x ∈ A : gᵢ(x) = bᵢ  ∀i = 1, ..., m}   (26.3)

is the subset of A identified by the constraints; therefore, the optimization problem (26.2)
can be equivalently formulated in canonical form as:

max_x f(x)  sub x ∈ C

Nevertheless, for this special class of optimization problems we will often use the more
evocative formulation (26.2).

In what follows we will first study the important special case of a single constraint, which
we will then generalize to the case of several constraints.

26.3 One constraint

26.3.1 A key lemma

With a single constraint, the optimization problem (26.2) becomes:

max_x f(x)  sub g(x) = b   (26.4)

where f : A ⊆ Rⁿ → R is the objective function, while the function g : A ⊆ Rⁿ → R and the
scalar b define the equality constraint.

The next fundamental lemma gives the key to finding the solutions of problem (26.4). The
hypothesis x̂ ∈ C ∩ D requires that x̂ be a point of the choice set at which f and g are
continuously differentiable. Moreover, we require that ∇g(x̂) ≠ 0; in this regard, note that a
point x ∈ A is said to be singular if ∇g(x) = 0, and regular otherwise. According to this
terminology, the condition ∇g(x̂) ≠ 0 amounts to requiring that x̂ be regular.

Lemma 1012 Let x̂ ∈ C ∩ D be a local solution of the optimization problem (26.4). If
∇g(x̂) ≠ 0, then there exists a scalar λ̂ ∈ R such that

∇f(x̂) = λ̂ ∇g(x̂)   (26.5)

Writing the gradients out component by component, the condition can be equivalently
expressed as

∂f/∂xₖ (x̂) = λ̂ ∂g/∂xₖ (x̂)   ∀k = 1, ..., n
We give a proof based on the Implicit Function Theorem.

Proof We prove the lemma for n = 2 (the extension to any n is routine by considering
a suitable extension of the Implicit Function Theorem for functions of n variables). Since
∇g(x̂) ≠ 0, at least one of the two partial derivatives ∂g/∂x₁ or ∂g/∂x₂ is different from 0
at x̂. Let, for example, (∂g/∂x₂)(x̂) ≠ 0 (if it were (∂g/∂x₁)(x̂) ≠ 0 the proof would be
symmetric). As seen in Section 23.2.2, the Implicit Function Theorem can be applied also to
study locally points belonging to the level curves g⁻¹(b) with b ∈ R. Since x̂ = (x̂₁, x̂₂) ∈
g⁻¹(b), thanks to such a theorem there exist neighborhoods U(x̂₁) and V(x̂₂) and a unique
function with a derivative h : U(x̂₁) → V(x̂₂) such that x̂₂ = h(x̂₁) and g(x₁, h(x₁)) = b for
each x₁ ∈ U(x̂₁), with

h'(x₁) = − (∂g/∂x₁)(x₁, x₂) / (∂g/∂x₂)(x₁, x₂)   ∀(x₁, x₂) ∈ g⁻¹(b) ∩ (U(x̂₁) × V(x̂₂))

Consider the auxiliary function φ : U(x̂₁) → R defined by φ(x₁) = f(x₁, h(x₁)). By the
chain rule, the derivative of φ is

φ'(x₁) = (∂f/∂x₁)(x₁, h(x₁)) + (∂f/∂x₂)(x₁, h(x₁)) h'(x₁)

Since x̂ is a local solution of the optimization problem (26.4), there exists a neighborhood
B_ε(x̂) of x̂ such that

f(x̂) ≥ f(x)   ∀x ∈ g⁻¹(b) ∩ B_ε(x̂)   (26.6)

Without loss of generality, suppose that ε is sufficiently small so that

(x̂₁ − ε, x̂₁ + ε) ⊆ U(x̂₁)  and  (x̂₂ − ε, x̂₂ + ε) ⊆ V(x̂₂)

Hence, B_ε(x̂) ⊆ U(x̂₁) × V(x̂₂). This permits us to rewrite (26.6) as

f(x̂₁, h(x̂₁)) ≥ f(x₁, h(x₁))   ∀x₁ ∈ (x̂₁ − ε, x̂₁ + ε)

that is, φ(x̂₁) ≥ φ(x₁) for every x₁ ∈ (x̂₁ − ε, x̂₁ + ε). The point x̂₁ is, therefore, a local
maximizer for φ. The first order condition is:

φ'(x̂₁) = (∂f/∂x₁)(x̂₁, x̂₂) − (∂f/∂x₂)(x̂₁, x̂₂) · (∂g/∂x₁)(x̂₁, x̂₂) / (∂g/∂x₂)(x̂₁, x̂₂) = 0   (26.7)

If (∂g/∂x₁)(x̂₁, x̂₂) ≠ 0, we have

(∂f/∂x₁)(x̂₁, x̂₂) / (∂g/∂x₁)(x̂₁, x̂₂) = (∂f/∂x₂)(x̂₁, x̂₂) / (∂g/∂x₂)(x̂₁, x̂₂)

By setting

λ̂ = (∂f/∂x₁)(x̂₁, x̂₂) / (∂g/∂x₁)(x̂₁, x̂₂) = (∂f/∂x₂)(x̂₁, x̂₂) / (∂g/∂x₂)(x̂₁, x̂₂)

we get

(∂f/∂x₁)(x̂₁, x̂₂) = λ̂ (∂g/∂x₁)(x̂₁, x̂₂)
(∂f/∂x₂)(x̂₁, x̂₂) = λ̂ (∂g/∂x₂)(x̂₁, x̂₂)

or, equivalently, ∇f(x̂₁, x̂₂) = λ̂ ∇g(x̂₁, x̂₂), that is, (26.5).

If (∂g/∂x₁)(x̂₁, x̂₂) = 0, from (26.7) we also have

(∂f/∂x₁)(x̂₁, x̂₂) = 0

so that the equality

(∂f/∂x₁)(x̂₁, x̂₂) = λ̂ (∂g/∂x₁)(x̂₁, x̂₂)

is trivially verified for every scalar λ̂. Setting

λ̂ = (∂f/∂x₂)(x̂₁, x̂₂) / (∂g/∂x₂)(x̂₁, x̂₂)

we therefore again have ∇f(x̂₁, x̂₂) = λ̂ ∇g(x̂₁, x̂₂), that is, (26.5).

Equality (26.5) tells us that a necessary condition for x̂ to be a local solution of the
optimization problem (26.4) is that the gradients of the functions f and g be proportional.
The "hat" above λ reminds us that this scalar depends on the point x̂ considered.
The next example shows that condition (26.5) is necessary, but not sufficient.

Example 1013 The optimization problem:

max_{x₁,x₂} (x₁³ + x₂³)/2  sub x₁ − x₂ = 0   (26.8)

is of the form (26.4), where f, g : R² → R are given by f(x) = (x₁³ + x₂³)/2 and g(x) =
x₁ − x₂, while b = 0. We have ∇f(0, 0) = (0, 0) and ∇g(0, 0) = (1, −1), and so λ̂ = 0 is
such that ∇f(0, 0) = λ̂ ∇g(0, 0). The point (0, 0) thus satisfies condition (26.5) with λ̂ = 0,
but it is not a solution of problem (26.8). Indeed,

f(t, t) = t³ > 0 = f(0, 0)   ∀t > 0   (26.9)

Note that (0, 0) is not even a constrained (global) minimizer since f(t, t) = t³ < 0 for every
t < 0. N

To understand condition (26.5) intuitively, assume that f and g are defined on R², so
that (26.5) takes the form:

((∂f/∂x₁)(x̂), (∂f/∂x₂)(x̂)) = λ̂ ((∂g/∂x₁)(x̂), (∂g/∂x₂)(x̂))

that is,

(∂f/∂x₁)(x̂) = λ̂ (∂g/∂x₁)(x̂)  and  (∂f/∂x₂)(x̂) = λ̂ (∂g/∂x₂)(x̂)   (26.10)

The condition ∇g(x̂) ≠ 0 requires that at least one of the partial derivatives (∂g/∂xᵢ)(x̂)
be different from zero. If, for convenience, we suppose that both are so and that λ̂ ≠ 0, then
(26.10) is equivalent to

(∂f/∂x₁)(x̂) / (∂g/∂x₁)(x̂) = (∂f/∂x₂)(x̂) / (∂g/∂x₂)(x̂)   (26.11)
Let us now try to understand intuitively why (26.11) is necessary for x̂ to be a solution
of the optimization problem (26.4). The differentials of f and g at x̂ are given by

df(x̂)(h) = ∇f(x̂) · h = (∂f/∂x₁)(x̂) h₁ + (∂f/∂x₂)(x̂) h₂   ∀h ∈ R²
dg(x̂)(h) = ∇g(x̂) · h = (∂g/∂x₁)(x̂) h₁ + (∂g/∂x₂)(x̂) h₂   ∀h ∈ R²

They linearly approximate the differences f(x̂ + h) − f(x̂) and g(x̂ + h) − g(x̂), that is, the
effect on f and g determined by moving from x̂ to x̂ + h. As we know by now very well, such
an approximation is the better the smaller h is. Suppose, ideally, that h is infinitesimal and
that the approximation is exact, so that f(x̂ + h) − f(x̂) = df(x̂)(h) and g(x̂ + h) − g(x̂) =
dg(x̂)(h). This is clearly incorrect formally, but here we are proceeding heuristically, trying
to understand intuitively expression (26.11).
Continuing in our heuristic reasoning, let us start now from the point x̂ and let us
consider variations x̂ + h with h infinitesimal. The first issue to worry about is whether they
are legitimate, i.e., whether they satisfy the equality constraint g(x̂ + h) = b. This means
that g(x̂ + h) = g(x̂), and therefore h must be such that dg(x̂)(h) = 0. It follows that:

(∂g/∂x₁)(x̂) h₁ + (∂g/∂x₂)(x̂) h₂ = 0

and so

h₁ = − (∂g/∂x₂)(x̂) / (∂g/∂x₁)(x̂) · h₂   (26.12)

The effect on the objective function f of moving from x̂ to x̂ + h is given by df(x̂)(h). When
h is legitimate, by (26.12) such an effect is given by:

df(x̂)(h) = − (∂f/∂x₁)(x̂) · (∂g/∂x₂)(x̂) / (∂g/∂x₁)(x̂) · h₂ + (∂f/∂x₂)(x̂) h₂   (26.13)

If x̂ is a solution of the optimization problem, we must necessarily have df(x̂)(h) = 0 for
every legitimate variation h. Otherwise, if it were df(x̂)(h) > 0, we would get a point x̂ + h
that satisfies the equality constraint, but such that f(x̂ + h) > f(x̂). On the other hand, if
it were df(x̂)(h) < 0, the same observation could be made this time for −h, which is
obviously a legitimate variation and would lead to the point x̂ − h with f(x̂ − h) > f(x̂).
The necessary condition df(x̂)(h) = 0 together with (26.13) gives:

− (∂f/∂x₁)(x̂) · (∂g/∂x₂)(x̂) / (∂g/∂x₁)(x̂) · h₂ + (∂f/∂x₂)(x̂) h₂ = 0

If, as is natural, we assume h₂ ≠ 0, we have

− (∂f/∂x₁)(x̂) · (∂g/∂x₂)(x̂) / (∂g/∂x₁)(x̂) + (∂f/∂x₂)(x̂) = 0

that is, precisely expression (26.11). At an intuitive level, all this explains why (26.5) is
necessary for x̂ to be a solution of the problem.

26.3.2 Lagrange's Theorem

Lemma 1012 gives a necessary condition for optimality, with a quite intuitive meaning. This
condition can be equivalently written as

∇f(x̂) − λ̂ ∇g(x̂) = 0

Recalling the algebra of gradients, the expression ∇f(x) − λ∇g(x) makes it natural
to think of the function L : A × R ⊆ Rⁿ × R → R defined as

L(x, λ) = f(x) + λ (b − g(x))   ∀(x, λ) ∈ A × R   (26.14)

This function, called the Lagrangian, plays a key role in optimization problems. Its gradient
is

∇L(x, λ) = ((∂L/∂x₁)(x, λ), ..., (∂L/∂xₙ)(x, λ), (∂L/∂λ)(x, λ)) ∈ Rⁿ⁺¹

It is important to distinguish in it the two parts ∇ₓL and ∇_λL given by:

∇ₓL(x, λ) = ((∂L/∂x₁)(x, λ), ..., (∂L/∂xₙ)(x, λ)) ∈ Rⁿ

and

∇_λL(x, λ) = (∂L/∂λ)(x, λ) ∈ R

Using such notation, we have

∇ₓL(x, λ) = ∇f(x) − λ∇g(x)   (26.15)

and

∇_λL(x, λ) = b − g(x)   (26.16)

which leads to the following fundamental formulation, in terms of the Lagrangian function,
of the necessary optimality condition of Lemma 1012.

Theorem 1014 (Lagrange) Let x̂ ∈ C ∩ D be a local solution of the optimization problem
(26.4). If ∇g(x̂) ≠ 0, then there exists a scalar λ̂ ∈ R, called Lagrange multiplier, such that
the pair (x̂, λ̂) ∈ Rⁿ⁺¹ is a stationary point of the Lagrangian function.

Proof Let x̂ be a local solution of the optimization problem (26.4). By Lemma 1012 there
exists λ̂ ∈ R such that

∇f(x̂) − λ̂ ∇g(x̂) = 0

By (26.15), the condition is equivalent to

∇ₓL(x̂, λ̂) = 0

On the other hand, by (26.16) we have ∇_λL(x, λ) = b − g(x), and therefore we also have
∇_λL(x̂, λ̂) = 0 since b − g(x̂) = 0. It follows that (x̂, λ̂) is a stationary point of L.

Thanks to Lagrange's Theorem, the search for the local solutions of the constrained
optimization problem (26.4) reduces to the search for the stationary points of a suitable
function of several variables, the Lagrangian function. It is a more complicated function than
the original function f because of the new variable λ, but through it the search for the
solutions of the optimization problem can be carried out by solving a standard first order
condition, similar to the ones seen for unconstrained optimization problems.
Needless to say, we are discussing a condition that is only necessary: there is no guarantee
that the stationary points are actually solutions of the problem. It is already a remarkable
achievement, however, to have the simple (first order) condition

∇L(x, λ) = 0   (26.17)

for the search of the possible candidate solutions of the constrained optimization problem
(26.4). In the next section we will see that this condition plays a fundamental role in the
search for the local solutions of problem (26.4) with Lagrange's method, which in turn may
lead to the global solutions with a version of the elimination method.

We close with two important remarks. First, observe that in general the pair (x̂, λ̂) is
not a maximizer of the Lagrangian function, even when x̂ turns out to solve the optimization
problem. The pair (x̂, λ̂) is just a stationary point of the Lagrangian function, nothing more.
Therefore, to say that the search for the solutions of the constrained optimization problem
reduces to the search for the maximizers of the Lagrangian function is a serious mistake.
Second, note that problem (26.4) has a symmetric version

min_x f(x)  sub g(x) = b

in which, instead of looking for maximizers, we look for minimizers. Condition (26.5) is
necessary also for this version of problem (26.4) and, therefore, the stationary points of the
Lagrangian function could be minimizers instead of maximizers. At the same time, it may
happen that they are neither maximizers nor minimizers. It is the usual ambiguity of first
order conditions, which we already encountered in unconstrained optimization: it reflects
their status as necessary conditions.

26.4 The method of elimination

Lagrange's Theorem suggests the following procedure, which we may call Lagrange's method,
for the search of the local solutions of the optimization problem (26.4):

1. determine the set D where the functions f and g have a continuous derivative;

2. determine the set C − D of the points of the constraint at which the functions f and g
fail to have a continuous derivative;

3. setting D₀ = {x ∈ D : ∇g(x) = 0}, determine the set C ∩ D₀ of the singular points
that satisfy the constraint;

4. determine the set S of the regular points

x ∈ C ∩ (D − D₀)

for which there exists a Lagrange multiplier λ ∈ R such that the pair (x, λ) ∈ Rⁿ⁺¹ is a
stationary point of the Lagrangian function, that is, it satisfies the first order condition
(26.17);

5. the local solutions of the optimization problem (26.4), if they exist, belong to the set

S ∪ (C ∩ D₀) ∪ (C − D)   (26.18)

According to Lagrange's method, therefore, the possible local solutions of the optimiza-
tion problem (26.4) must be searched for among the points of the subset (26.18) of C. Indeed,
a local solution that is a regular point will belong to the set S thanks to Lagrange's Theorem.
Instead, this theorem does not say anything about possible local solutions that are singular
points (and so belong to the set C ∩ D₀), nor about possible local solutions at which the
functions fail to have a continuous derivative (and so belong to the set C − D).
In conclusion, a necessary condition for a point x ∈ C to be a local solution of the
optimization problem (26.4) is that it belong to the subset S ∪ (C ∩ D₀) ∪ (C − D) ⊆ C.
This is what this procedure, a key dividend of Lagrange's Theorem, establishes. Clearly, the
smaller such a set is, the more effective the application of the theorem is: the search for local
solutions can then be restricted to a significantly smaller set than the original set C.

That said, what about global solutions? If the objective function f is coercive and
continuous on C, the five phases of Lagrange's method plus the following extra sixth phase
provide a version of the elimination method to find global solutions.

6. Compute the set {f(x) : x ∈ S ∪ (C ∩ D₀) ∪ (C − D)}; if a point x̂ ∈ S ∪ (C ∩ D₀) ∪
(C − D) is such that

f(x̂) ≥ f(x)   ∀x ∈ S ∪ (C ∩ D₀) ∪ (C − D)   (26.19)

then x̂ is a (global) solution of the optimization problem (26.4).

In other words, the points of the set (26.18) at which f attains its maximum value are the
solutions of the optimization problem. Indeed, by Lagrange's method this is the set of the
possible local solutions; the global solution, whose existence is ensured by Tonelli's Theorem,
must then belong to such a set. Hence, the solutions of the "restricted" optimization problem

max_x f(x)  sub x ∈ S ∪ (C ∩ D₀) ∪ (C − D)   (26.20)

are also the solutions of the optimization problem (26.4). Phase 6 is based on this remark-
able fact. As with Lagrange's method, the smaller the set (26.18) is, the more effective
the application of the elimination method is. In particular, in the lucky case when it is a
singleton, the elimination method determines the unique solution of the optimization prob-
lem, a remarkable achievement.

In sum, the elimination method is an elegant combination of a global existence result,
Tonelli's Theorem, and a local differential result, Lagrange's Theorem. In the rest of the
section we illustrate the procedure with some analytical examples. In the next section we
will consider the classical consumer problem.

Example 1015 The optimization problem:

max_x e^{−‖x‖²}  sub Σᵢ₌₁ⁿ xᵢ = 1   (26.21)

is of the form (26.4), where f : Rⁿ → R and g : Rⁿ → R are given by f(x) = e^{−‖x‖²} and
g(x) = Σᵢ₌₁ⁿ xᵢ, while b = 1. The functions are continuously differentiable on Rⁿ, that is,
D = Rⁿ. Hence C ∩ D = C and C − D = ∅: at all the points of the constraint the functions
f and g have a continuous derivative. We have therefore completed phases 1 and 2 of
Lagrange's method.
Since ∇g(x) = (1, 1, ..., 1), there are no singular points, that is, D₀ = ∅. This completes
phase 3 of Lagrange's method.
The Lagrangian function L : Rⁿ⁺¹ → R is given by

L(x, λ) = e^{−‖x‖²} + λ (1 − Σᵢ₌₁ⁿ xᵢ)

To find the set of its stationary points it is necessary to solve the first order condition
(26.17), given by the following (nonlinear) system of n + 1 equations

∂L/∂xᵢ = −2xᵢ e^{−‖x‖²} − λ = 0   ∀i = 1, ..., n
∂L/∂λ = 1 − Σᵢ₌₁ⁿ xᵢ = 0

We observe that no solution can have λ = 0. Indeed, if it were so, the first n equations
would imply xᵢ = 0, which contradicts the last equation. It follows that in every solution we
have λ ≠ 0. The first n equations imply

xᵢ = −(λ/2) e^{‖x‖²}

and, by substituting these values into the last equation, we find

1 + n (λ/2) e^{‖x‖²} = 0

that is,

λ = −(2/n) e^{−‖x‖²}

Substituting this value of λ into any of the first n equations we find xᵢ = 1/n, and therefore
the unique point (x, λ) ∈ Rⁿ⁺¹ that satisfies the first order condition (26.17) is

(1/n, 1/n, ..., 1/n, −(2/n) e^{−1/n})

so that S is the singleton

S = {(1/n, 1/n, ..., 1/n)}

This completes phase 4 of Lagrange's method. Since

S = S ∪ (C ∩ D₀) ∪ (C − D)   (26.22)

in this example the first order condition (26.17) turns out to be necessary for any local
solution of the optimization problem (26.21). The unique element of S is therefore the only
candidate to be a local solution of the problem.
Turn now to the elimination method, which we can use since the continuous function f
is coercive on the set C = {x = (x₁, ..., xₙ) ∈ Rⁿ : Σᵢ₌₁ⁿ xᵢ = 1}, which is not compact
(it is closed, but not bounded); indeed:

(f ≥ t) = Rⁿ  if t ≤ 0
(f ≥ t) = {x ∈ Rⁿ : ‖x‖ ≤ √(−log t)}  if t ∈ (0, 1]
(f ≥ t) = ∅  if t > 1

and so the set (f ≥ t) is compact and non-empty for each t ∈ (0, 1]. Since the set in (26.22)
is a singleton, the elimination method allows us to conclude that (1/n, ..., 1/n) is the unique
solution of the optimization problem (26.21). N
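The stationary point found in phase 4 can be verified by direct substitution into the gradient of the Lagrangian; a minimal sympy sketch for n = 3:

```python
import sympy as sp

n = 3
xs = sp.symbols(f'x1:{n+1}', real=True)
lam = sp.symbols('lam', real=True)

L = sp.exp(-sum(v**2 for v in xs)) + lam * (1 - sum(xs))   # the Lagrangian
foc = [sp.diff(L, v) for v in (*xs, lam)]                  # first order condition (26.17)

# the candidate (1/n, ..., 1/n, -(2/n) e^{-1/n}) annihilates the whole gradient
point = {v: sp.Rational(1, n) for v in xs}
point[lam] = -sp.Rational(2, n) * sp.exp(-sp.Rational(1, n))
print([sp.simplify(eq.subs(point)) for eq in foc])         # [0, 0, 0, 0]
```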

Example 1016 Given p = (p₁, ..., pₙ) ∈ Rⁿ₊₊ ∪ Rⁿ₋₋,¹ the optimization problem:

max_{x₁,...,xₙ} Σᵢ₌₁ⁿ pᵢ log xᵢ  sub Σᵢ₌₁ⁿ xᵢ = 1   (26.23)

is of the form (26.4), where f, g : Rⁿ₊₊ → R are given by f(x) = Σᵢ₌₁ⁿ pᵢ log xᵢ and
g(x) = Σᵢ₌₁ⁿ xᵢ, while b = 1. The functions f and g are continuously differentiable at all
points of the constraint, so that C = C ∩ D, and there are no singular points, i.e., D₀ = ∅.
This completes the first three phases of Lagrange's method.
The Lagrangian function L : Rⁿ₊₊ × R → R is given by

L(x, λ) = Σᵢ₌₁ⁿ pᵢ log xᵢ + λ (1 − Σᵢ₌₁ⁿ xᵢ)

To find the set of its stationary points we need to solve the first order condition (26.17),
given by the following (nonlinear) system of n + 1 equations

∂L/∂xᵢ = pᵢ/xᵢ − λ = 0   ∀i = 1, ..., n
∂L/∂λ = 1 − Σᵢ₌₁ⁿ xᵢ = 0

Because the coordinates of the vector p are all different from zero, no solution can have
λ = 0. It follows that in each solution we have λ ≠ 0. Because x ∈ Rⁿ₊₊, the first n
equations imply pᵢ = λxᵢ and, replacing these values in the last equation, we find
Σᵢ₌₁ⁿ pᵢ = λ. By replacing this value of λ in each of the first n equations we find
xᵢ = pᵢ / Σᵢ₌₁ⁿ pᵢ. Thus, the unique point (x, λ) ∈ Rⁿ⁺¹ that satisfies the first order
condition (26.17) is

(p₁/Σᵢ₌₁ⁿ pᵢ, p₂/Σᵢ₌₁ⁿ pᵢ, ..., pₙ/Σᵢ₌₁ⁿ pᵢ, Σᵢ₌₁ⁿ pᵢ)

so that S is the singleton

S = {(p₁/Σᵢ₌₁ⁿ pᵢ, p₂/Σᵢ₌₁ⁿ pᵢ, ..., pₙ/Σᵢ₌₁ⁿ pᵢ)}

This completes phase 4 of Lagrange's method. Since

S = S ∪ (C ∩ D₀) ∪ (C − D)   (26.24)

also in this example the first order condition (26.17) is necessary for each local solution of
the optimization problem (26.23). Again, the unique element of S is the only candidate to
be a local solution of the optimization problem (26.23).
We can apply the elimination method because the continuous function f is, by Lemma
712, also coercive on the set C = {x ∈ Rⁿ₊₊ : Σᵢ₌₁ⁿ xᵢ = 1}, which is not compact because
it is not closed. In view of (26.24), the elimination method implies that
(p₁/Σᵢ₌₁ⁿ pᵢ, ..., pₙ/Σᵢ₌₁ⁿ pᵢ) is the unique solution of the optimization problem (26.23). N

¹ All coordinates of p are either strictly positive or strictly negative.

When the elimination method is based on Weierstrass' Theorem, rather than on the
weaker (but more widely applicable) Tonelli's Theorem, as a "by-product" we can also find
the global minimizers, that is, the points x ∈ C that solve the problem min_x f(x) sub x ∈ C.
Indeed, it is easy to see that these are the points that minimize f over S ∪ (C ∩ D₀) ∪
(C − D). Clearly, this is no longer true with Tonelli's Theorem, because it only ensures the
existence of maximizers and remains silent on possible minimizers.

Example 1017 The optimization problem:

max_{x₁,x₂} −2x₁² + 5x₂²  sub x₁² + x₂² = 1   (26.25)

is of the form (26.4), where f, g : R² → R are given by f(x₁, x₂) = −2x₁² + 5x₂² and
g(x₁, x₂) = x₁² + x₂², while b = 1. The functions are continuously differentiable on R², that
is, D = R². Hence, C ∩ D = C, so that C − D = ∅: at all the points of the constraint the
functions f and g have a continuous derivative. This completes phases 1 and 2 of Lagrange's
method.
We have ∇g(x) = (2x₁, 2x₂), and so (0, 0) is the unique singular point, that is, D₀ =
{(0, 0)}. The unique singular point does not satisfy the constraint, so that C ∩ D₀ = ∅. We
have therefore completed phase 3 of Lagrange's method.
The Lagrangian function L : R³ → R is given by

L(x₁, x₂, λ) = −2x₁² + 5x₂² + λ (1 − x₁² − x₂²)

To find the set of its stationary points it is necessary to solve the first order condition (26.17):

∂L/∂x₁ = 0 ; ∂L/∂x₂ = 0 ; ∂L/∂λ = 0

that is, the following (nonlinear) system of three equations

−4x₁ − 2λx₁ = 0
10x₂ − 2λx₂ = 0
1 − x₁² − x₂² = 0

in the three unknowns x₁, x₂ and λ. We verify immediately that x₁ = x₂ = 0 satisfy the
first two equations for every value of λ, but they do not satisfy the third equation. Now,
x₁ = 0 and λ = 5 imply x₂ = ±1. Moreover, x₂ = 0 and λ = −2 imply x₁ = ±1. In
conclusion, the triples (x₁, x₂, λ) that satisfy the first order condition (26.17) are

{(0, 1, 5), (0, −1, 5), (1, 0, −2), (−1, 0, −2)}

so that

S = {(0, 1), (0, −1), (1, 0), (−1, 0)}

which completes phase 4 of Lagrange's method.² In conclusion,

S = S ∪ (C ∩ D₀) ∪ (C − D)   (26.26)

and, as in the last two examples, the first order condition is necessary for any local solution
of the optimization problem (26.25).
Having completed Lagrange's method, let us turn to the elimination method to find the
global solutions. Since the set C = {(x₁, x₂) ∈ R² : x₁² + x₂² = 1} is compact and the
function f is continuous, we can use this method through Weierstrass' Theorem. In view of
(26.26), in phase 6 we have:

f(0, 1) = f(0, −1) = 5 > f(1, 0) = f(−1, 0) = −2

The points (0, 1) and (0, −1) are thus the (global) solutions of the optimization problem
(26.25), while the reliance here of the elimination method on Weierstrass' Theorem makes it
possible to say that the points (1, 0) and (−1, 0) are global minimizers. N

² Note that there are no other points that satisfy ∇L = 0. Indeed, suppose that ∇L(x̂₁, x̂₂, λ̂) = 0,
with x̂₁ ≠ 0 and x̂₂ ≠ 0. Then, from ∂L/∂x₁ = 0 we deduce λ̂ = −2 and from ∂L/∂x₂ = 0 we deduce λ̂ = 5,
a contradiction.
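A minimal sympy sketch that mechanically reproduces phases 4 and 6 of this example:

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = -2*x1**2 + 5*x2**2
L = f + lam * (1 - x1**2 - x2**2)               # the Lagrangian (26.14)

# phase 4: stationary points of L, i.e., the first order condition (26.17)
stationary = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)],
                      [x1, x2, lam], dict=True)

# phase 6: compare the values of f at the candidates
for s in stationary:
    print((s[x1], s[x2]), f.subs(s))            # maximum value 5 at (0, ±1)
```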

The next example illustrates the importance of singular points.

Example 1018 The optimization problem:


x1
max e sub x31 x22 = 0 (26.27)
x1 ;x2

is of the form (26.4), where f : R2 ! R and g : R2 ! R are given by f (x) = e x1 and


g (x) = x31 x22 , while b = 0. We have D = R2 , and hence C \ D = C and C D = ;. Steps
1 and 2 of Lagrange’s method have been completed.
We have
rg (x) = 3x21 ; 2x2

and therefore (0; 0) is the unique singular point and it satis…es the constraint: D0 = C \D0 =
f(0; 0)g. Also phase 3 of Lagrange’s method has been completed.
The Lagrangian function L : R3 ! R is given by
x1
L (x1 ; x2 ; ) = e + x22 x31
2
Note that there are no other points that satisfy rL = 0: Indeed, let us suppose that rL(b b2 ; b) = 0,
x1 ; x
b1 6= 0 and x
with x b2 6= 0. Then, from @L=@x1 = 0 we deduce = 2 and from @L=@x2 = 0 we deduce = 5.
26.5. THE CONSUMER PROBLEM 721

To find the set of its stationary points it is necessary to solve the first order condition (26.17), given by the following (nonlinear) system of three equations
$$\begin{cases} \partial L / \partial x_1 = -e^{-x_1} - 3\lambda x_1^2 = 0 \\ \partial L / \partial x_2 = 2\lambda x_2 = 0 \\ \partial L / \partial \lambda = x_2^2 - x_1^3 = 0 \end{cases}$$
We observe that no solution can have $\lambda = 0$. Indeed, if it were $\lambda = 0$ the first equation would become $-e^{-x_1} = 0$, which has no solution. Let us suppose therefore $\lambda \neq 0$. The second equation implies $x_2 = 0$, and therefore from the third one it follows that $x_1 = 0$. The first equation then becomes $-1 = 0$, and the contradiction shows that the system has no solutions. Therefore there are no points that satisfy the first order condition (26.17), so that $S = \emptyset$. Phase 4 of Lagrange's method shows that
$$S \cup (C \cap D_0) \cup (C - D) = C \cap D_0 = \{(0, 0)\} \tag{26.28}$$
By Lagrange's method, the unique possible local solution of the optimization problem (26.27) is the point $(0, 0)$.
Turn now to the elimination method. To use it we need to show that the continuous $f$ is coercive on the (non-compact: closed, but not bounded) set $C = \left\{x = (x_1, x_2) \in \mathbb{R}^2 : x_1^3 = x_2^2\right\}$. Note that:
$$(f \geq t) = \begin{cases} \mathbb{R}^2 & \text{if } t \leq 0 \\ (-\infty, -\lg t] \times \mathbb{R} & \text{if } t \in (0, 1] \\ \emptyset & \text{if } t > 1 \end{cases}$$
Thus, $f$ is not coercive on the entire space $\mathbb{R}^2$, but it is coercive on $C$, which is all that matters here. Indeed, note that $x_1$ can satisfy the constraint $x_1^3 = x_2^2$ only if $x_1 \geq 0$, so that $C \subseteq \mathbb{R}_+ \times \mathbb{R}$ and
$$(f \geq t) \cap C \subseteq \left((-\infty, -\lg t] \times \mathbb{R}\right) \cap \left(\mathbb{R}_+ \times \mathbb{R}\right) = [0, -\lg t] \times \mathbb{R}, \quad \forall t \in (0, 1]$$
If $x_1 \in [0, -\lg t]$, the constraint implies $x_2^2 \in \left[0, -\lg^3 t\right]$, i.e., $x_2 \in \left[-\sqrt{-\lg^3 t}, \sqrt{-\lg^3 t}\right]$. It follows that
$$(f \geq t) \cap C \subseteq [0, -\lg t] \times \left[-\sqrt{-\lg^3 t}, \sqrt{-\lg^3 t}\right] \quad \forall t \in (0, 1]$$
and so $(f \geq t) \cap C$ is compact because it is a closed subset of a compact set. We conclude that $f$ is both continuous and coercive on $C$. We can thus use the elimination method. In view of (26.28), it implies that $(0, 0)$ is the only solution of the optimization problem (26.27). N
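Since the coercivity argument above is somewhat delicate, a crude numerical check can be reassuring. The sketch below (ours, assuming NumPy is available) parametrizes the constraint set by $x_1 = t^2$, $x_2 = t^3$ and evaluates $f$ along it.

import numpy as np

t = np.linspace(-10, 10, 200001)
x1 = t**2          # satisfies x1^3 = (t^2)^3 = (t^3)^2 = x2^2
f = np.exp(-x1)

i = np.argmax(f)
print(t[i], x1[i], f[i])   # maximum f = 1 attained at t = 0, i.e., at (0, 0)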

26.5 The consumer problem

Consider a consumer problem for which Walras' Law holds, that is,
$$\max_x u(x) \quad \text{sub} \quad x \in C(p, I)$$
where $C(p, I) = \{x \in A : p \cdot x = I\}$, with $p \gg 0$ (strictly positive prices), and the utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ is strictly increasing on $A$ and continuously differentiable on $\operatorname{int} A$.³ For example, the log-linear utility function $u : \mathbb{R}^n_{++} \to \mathbb{R}$ defined by $u(x) = \sum_{i=1}^n a_i \log x_i$ satisfies these hypotheses, with $A = \operatorname{int} A = \mathbb{R}^n_{++}$, while the separable utility function $u : \mathbb{R}^n_+ \to \mathbb{R}$ defined by $u(x) = \sum_{i=1}^n x_i$ satisfies them with $\operatorname{int} A = \mathbb{R}^n_{++} \subseteq A = \mathbb{R}^n_+$.
³ Note that $A \subseteq \mathbb{R}^n_+$ implies $\operatorname{int} A \subseteq \mathbb{R}^n_{++}$, i.e., the interior points of $A$ always have strictly positive coordinates.
Let us first find the local solutions through Lagrange's method. The function $g(x) = p \cdot x$ expresses the constraint, so $D = \mathbb{R}^n_+ \cap \operatorname{int} A$ and $C - D = (A - \operatorname{int} A) \cap C$. The set $C - D$ is, therefore, formed by the boundary points of $A$ that satisfy the constraint and that belong to $A$. Note that when $A = \operatorname{int} A$, as in the log-linear case, we have $C - D = \emptyset$.
From
$$\nabla g(x) = p \quad \forall x \in \mathbb{R}^n$$
it follows that there are no singular points, that is, $D_0 = \emptyset$; hence, $C \cap D_0 = \emptyset$. All this completes phases 1-3 of Lagrange's method.
The Lagrangian function $L : A \times \mathbb{R} \to \mathbb{R}$ is given by
$$L(x, \lambda) = u(x) + \lambda(I - p \cdot x)$$
and, to find the set of its stationary points, it is necessary to solve the first order condition:
$$\begin{cases} \partial L / \partial x_1 (x, \lambda) = 0 \\ \quad\vdots \\ \partial L / \partial x_n (x, \lambda) = 0 \\ \partial L / \partial \lambda (x, \lambda) = 0 \end{cases}$$
that is
$$\begin{cases} \dfrac{\partial u(x)}{\partial x_1} - \lambda p_1 = 0 \\ \quad\vdots \\ \dfrac{\partial u(x)}{\partial x_n} - \lambda p_n = 0 \\ I - p \cdot x = 0 \end{cases}$$
In a more compact way, we write
$$\frac{\partial u(x)}{\partial x_i} = \lambda p_i \quad \forall i = 1, \ldots, n \tag{26.29}$$
$$p \cdot x = I \tag{26.30}$$
The fundamental condition (26.29) is read in a different way according to the interpretation, cardinalist or ordinalist, of the utility function. Let us suppose, for simplicity, that $\lambda \neq 0$. According to the cardinalist reading, the condition is read in the equivalent form
$$\frac{\partial u(x) / \partial x_1}{p_1} = \cdots = \frac{\partial u(x) / \partial x_n}{p_n}$$
which says that in a bundle $x$ that is a (local) solution of the consumer problem the marginal utilities of the income spent on the various goods, measured by the ratios
$$\frac{\partial u(x) / \partial x_i}{p_i}$$
are all equal. Note that $1/p_i$ is the quantity of good $i$ that can be purchased with one unit of income.
In an ordinalist perspective, where the notion of marginal utility becomes meaningless, condition (26.29) is rewritten as
$$\frac{\partial u(x) / \partial x_i}{\partial u(x) / \partial x_j} = \frac{p_i}{p_j}$$
for every pair of goods $i$ and $j$ at the solution bundle $x$. In such a bundle, therefore, the marginal rate of substitution between each pair of goods must be equal to the ratio between their prices, that is, $MRS_{x_i, x_j} = p_i / p_j$. For $n = 2$ we have the classical geometric interpretation of the optimality condition at a bundle $(x_1, x_2)$ as equality between the slope of the indifference curve (in the sense of Section 23.2.2) and that of the straight line of the budget constraint.

[Figure: an indifference curve tangent to the budget line at the optimal bundle, in the $(x_1, x_2)$ plane]

The ordinalist interpretation does not require the cardinalist notion of marginal utility, a notion that, by Occam's razor, is therefore superfluous for the study of the consumer problem. The observation dates back to Vilfredo Pareto and represented a turning point in the history of utility theory, so much so that we talk of an "ordinalist revolution".⁴
⁴ See his "Sunto di alcuni capitoli di un nuovo trattato di economia pura del prof. Pareto", which appeared in the Giornale degli Economisti in 1900 (translated in Giornale degli Economisti, 2008).
In any case, expressions (26.29) and (26.30) are first order conditions of the consumer problem and their resolution determines the set $S$ of the stationary points. In conclusion,

Lagrange's method implies that the local solutions of the consumer problem must be looked for among the points of
$$S \cup ((A - \operatorname{int} A) \cap C) \tag{26.31}$$
Besides the points that satisfy the first order conditions (26.29) and (26.30), local solutions can therefore be boundary points $A - \operatorname{int} A$ of the set $A$ that satisfy the constraint (such solutions are called boundary solutions⁵).
⁵ In the case $n = 2$ and $A = \mathbb{R}^2_+$ such solutions can be $(0, I/p_2)$ and $(I/p_1, 0)$.
When $u$ is coercive on $C(p, I)$ we can apply the elimination method to find the (global) solutions of the consumer problem, that is, the optimal bundles (which are the economically meaningful notion: consumers do not care about bundles that are just locally optimal). In view of (26.31), the solutions are the bundles $\hat{x} \in S \cup ((A - \operatorname{int} A) \cap C)$ such that
$$u(\hat{x}) \geq u(x) \quad \forall x \in S \cup ((A - \operatorname{int} A) \cap C)$$
In other words, we have to compare the utility levels attained by the stationary points in $S$ and by the boundary points that satisfy the constraint in $(A - \operatorname{int} A) \cap C$. As the comparison requires the computation of all these utility levels, the smaller the set $S \cup ((A - \operatorname{int} A) \cap C)$ the more effective the elimination method.

Example 1019 Consider the log-linear utility function in the case $n = 2$, i.e.,
$$u(x_1, x_2) = a \log x_1 + (1 - a) \log x_2$$
with $a \in (0, 1)$. The first order condition at every $(x_1, x_2) \in \mathbb{R}^2_{++}$ takes the form
$$\frac{a}{x_1} = \lambda p_1, \qquad \frac{1 - a}{x_2} = \lambda p_2 \tag{26.32}$$
$$p_1 x_1 + p_2 x_2 = I \tag{26.33}$$
Expression (26.32) implies
$$\lambda = \frac{a}{p_1 x_1} = \frac{1 - a}{p_2 x_2}$$
Substituting in (26.33), we have
$$p_1 x_1 + \frac{1 - a}{a} p_1 x_1 = I$$
and hence
$$x_1 = \frac{aI}{p_1}, \qquad x_2 = \frac{(1 - a)I}{p_2}$$
In conclusion,
$$S = \left\{\left(\frac{aI}{p_1}, \frac{(1 - a)I}{p_2}\right)\right\} \tag{26.34}$$
Since $(A - \operatorname{int} A) \cap C = \emptyset$, the unique possible local solution of the consumer problem is the bundle
$$\left(\frac{aI}{p_1}, \frac{(1 - a)I}{p_2}\right) \tag{26.35}$$
We turn now to the elimination method, which we can use since the continuous function $u$ is, by Lemma 712, coercive on the set $C(p, I) = \left\{x \in \mathbb{R}^2_{++} : p_1 x_1 + p_2 x_2 = I\right\}$, which is not compact since it is not closed. In view of (26.34), the elimination method implies that the bundle (26.35) is the unique solution of the log-linear consumer problem, that is, the unique optimal bundle. This confirms what we had already proved and discussed, in a more general and elegant way through Jensen's inequality, in Section 16.6. N
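As a hedged computational sketch (ours, assuming SymPy), the log-linear demand above can be reproduced symbolically in a few lines:

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', positive=True)
a, p1, p2, inc = sp.symbols('a p1 p2 I', positive=True)

u = a*sp.log(x1) + (1 - a)*sp.log(x2)
L = u + lam*(inc - p1*x1 - p2*x2)

sol = sp.solve([sp.diff(L, x1), sp.diff(L, x2), sp.diff(L, lam)],
               (x1, x2, lam), dict=True)[0]
print(sp.simplify(sol[x1]), sp.simplify(sol[x2]))   # a*I/p1 and (1-a)*I/p2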

26.6 Cogito ergo solvo

The last example shows the power of the elimination method: Lagrange's method allowed us to find the unique candidate in $\mathbb{R}^2_{++}$ to be a local solution of the consumer problem, but it could tell us nothing either about its nature (whether a maximizer, a minimizer or something else) or about its uniqueness, a fundamental feature for an optimal bundle (in that it permits comparative statics exercises). The elimination method answers all these key questions by showing that the unique local candidate is, indeed, the unique solution.
That said, the last example also shows the limitations of differential methods. Indeed, as we remarked, in Section 16.6 we reached a more general result without using such methods. The next example will show that differential methods can actually turn out to be silly. They are not a Deus ex machina that one should always try automatically, without first thinking about the specific optimization problem at hand, with its peculiar features that may make it possible to address it with a direct argument.
Example 1020 Consider the separable utility function $u : \mathbb{R}^2_+ \to \mathbb{R}$ given by $u(x) = x_1 + x_2$. Suppose $p_1 \neq p_2$ (as is usually the case). First, observe that $C - D = \{(0, I/p_2), (I/p_1, 0)\}$. The first order condition at every $(x_1, x_2) \in \mathbb{R}^2_{++}$ becomes
$$1 = \lambda p_1, \qquad 1 = \lambda p_2$$
$$p_1 x_1 + p_2 x_2 = I$$
which has no solutions since $p_1 \neq p_2$. Hence, $S = \emptyset$, so that
$$S \cup (C - D) = C - D = \left\{\left(0, \frac{I}{p_2}\right), \left(\frac{I}{p_1}, 0\right)\right\}$$
The unique possible local solutions of the consumer problem are therefore the boundary bundles $(0, I/p_2)$ and $(I/p_1, 0)$. Since $u$ is continuous on the compact set $C(p, I) = \left\{x \in \mathbb{R}^2_+ : p_1 x_1 + p_2 x_2 = I\right\}$, we can apply the elimination method through Weierstrass' Theorem and conclude that $(0, I/p_2)$ is the optimal bundle when $p_2 < p_1$ and $(I/p_1, 0)$ is the optimal bundle when $p_2 > p_1$.
The same result can be achieved, however, in a straightforward manner without any differential machinery. Indeed, if we substitute the constraint into the objective function, the optimal $x_1$ (and so the optimal $x_2$ via the budget constraint) can be found by solving the elementary optimization problem
$$\max_{x_1} (p_2 - p_1) x_1 \quad \text{sub} \quad x_1 \in [0, I/p_1]$$
It is immediate to check that there are two boundary solutions, $\hat{x}_1 = 0$ and $\hat{x}_1 = I/p_1$, if, respectively, $p_1 > p_2$ and $p_1 < p_2$. This shows how silly a mechanical use of differential arguments can be. N
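The direct argument can be mirrored in elementary code; the helper name optimal_bundle below is ours, introduced only for illustration of the corner comparison.

def optimal_bundle(p1, p2, I):
    # u is linear, so the solution sits at a corner of the budget line
    corners = [(0.0, I / p2), (I / p1, 0.0)]
    return max(corners, key=lambda x: x[0] + x[1])

print(optimal_bundle(2.0, 1.0, 10.0))   # (0, 10): buy the cheaper good 2
print(optimal_bundle(1.0, 2.0, 10.0))   # (10, 0): buy the cheaper good 1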

26.7 Several constraints

Let us now consider the general optimization problem (26.2), in which there can be many equality constraints. Lemma 1012 and Theorem 1014 can be effortlessly generalized to the case with multiple constraints. Let us write problem (26.2) as
$$\max_x f(x) \quad \text{sub} \quad g(x) = b \tag{26.36}$$
where $g = (g_1, \ldots, g_m) : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$ and $b = (b_1, \ldots, b_m) \in \mathbb{R}^m$. The Jacobian matrix $Dg(x)$ is given by
$$Dg(x) = \begin{bmatrix} \nabla g_1(x) \\ \nabla g_2(x) \\ \vdots \\ \nabla g_m(x) \end{bmatrix}$$
and the points $x$ where $Dg(x)$ has full rank are said to be regular, while those for which such a condition does not hold are said to be singular.
The Jacobian $Dg(\hat{x})$ has full rank if, for example, the gradients $\nabla g_1(\hat{x})$, $\nabla g_2(\hat{x})$, ..., $\nabla g_m(\hat{x})$ are linearly independent vectors of $\mathbb{R}^n$. In such a case, the full rank condition requires $m \leq n$, that is, that the number $m$ of constraints be smaller than the dimension $n$ of the space.
Two observations regarding regularity are in order: (i) when $m = n$ the Jacobian has full rank if and only if it is not singular, that is, $\det Dg(x) \neq 0$; (ii) when $m = 1$, we have that $Dg(x) = \nabla g(x)$ and the full rank condition is equivalent to requiring that $\nabla g(x) \neq 0$, which brings us back to the notions of regular and singular points we have seen above.
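In computational practice the full rank condition is checked numerically. The following is a minimal NumPy sketch (the helper is_regular and the sample Jacobian are ours, introduced only for illustration):

import numpy as np

def is_regular(Dg):
    """Dg: the m x n Jacobian of g = (g_1, ..., g_m) at a given point."""
    m, n = Dg.shape
    return np.linalg.matrix_rank(Dg) == min(m, n)

# Jacobian of g(x) = (x1^2 + x2^2, x1 + x2 - x3) at the point (1, 0, 0)
Dg = np.array([[2.0, 0.0, 0.0],
               [1.0, 1.0, -1.0]])
print(is_regular(Dg))   # True: the two gradients are linearly independent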

The following result extends Lemma 1012 to the case with multiple constraints and shows that the regularity condition $\nabla g(\hat{x}) \neq 0$ from such a lemma can be generalized by requiring that the Jacobian $Dg(\hat{x})$ have full rank.⁶ In other words, $\hat{x}$ must not be a singular point here either.

Lemma 1021 Let $\hat{x} \in C \cap D$ be a local solution of the optimization problem (26.36). If $Dg(\hat{x})$ has full rank, then there is a vector $\hat{\lambda} \in \mathbb{R}^m$ such that
$$\nabla f(\hat{x}) = \sum_{i=1}^m \hat{\lambda}_i \nabla g_i(\hat{x}) \tag{26.37}$$

The Lagrangian is now the function $L : A \times \mathbb{R}^m \subseteq \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ defined as:
$$L(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i (b_i - g_i(x)) = f(x) + \lambda \cdot (b - g(x)) \tag{26.38}$$
for every $(x, \lambda) \in A \times \mathbb{R}^m$, and Theorem 1014 can be generalized in the following way (we omit the proof as it is analogous to that of the cited result).
⁶ We shall omit the proof, which generalizes that of Lemma 1012 by means of an adequate version of the Implicit Function Theorem.

Theorem 1022 Let $\hat{x} \in C \cap D$ be a solution of the optimization problem (26.36). If $Dg(\hat{x})$ has full rank, there is a vector $\hat{\lambda} \in \mathbb{R}^m$ such that the pair $(\hat{x}, \hat{\lambda}) \in \mathbb{R}^{n+m}$ is a critical point of the Lagrangian.

The components $\hat{\lambda}_i$ of the vector $\hat{\lambda} \in \mathbb{R}^m$ are called Lagrange multipliers. Such a vector $\hat{\lambda}$ is unique whenever the vectors $\{\nabla g_i(\hat{x})\}_{i=1}^m$ are linearly independent, as in such a case there is a unique representation $\nabla f(\hat{x}) = \sum_{i=1}^m \hat{\lambda}_i \nabla g_i(\hat{x})$.

The considerations we made for Theorem 1014 also hold in this more general case. In particular, the search for local solution candidates for the constrained problem must still be conducted following Lagrange's method, which displays some conceptual novelties in the multiple constraints case. The elimination method can still be used, again without any conceptual novelty, to check whether such local candidates actually solve the optimum problem. The examples will momentarily illustrate all this.
From an operational standpoint note, however, that the first order condition (26.17)
$$\nabla L(x, \lambda) = 0$$
is based on the Lagrangian $L$, which has the more complex form (26.38). Also the form of the set of singular points $D_0$ is more complex now. In particular, the study of the Jacobian's determinant may be complex, thus making the search for singular points quite hard. The best thing often is to directly look for the singular points which satisfy the constraints, that is, for the set $C \cap D_0$, instead of trying to determine the set $D_0$ first and the intersection $C \cap D_0$ afterwards (as we did in the case with one constraint). The points $x \in C \cap D_0$ are such that $g_i(x) = b_i$ and the gradients $\nabla g_i(x)$ are linearly dependent. We must now therefore verify whether the system
$$\begin{cases} \sum_{i=1}^m \lambda_i \nabla g_i(x) = 0 \\ g_1(x) = b_1 \\ \quad\vdots \\ g_m(x) = b_m \end{cases}$$
admits solutions $(x, \lambda) \in \mathbb{R}^n \times \mathbb{R}^m$ with $\lambda = (\lambda_1, \ldots, \lambda_m) \neq 0$, that is, with $\lambda_i$ that are not all null. Such possible solutions identify the singular points which satisfy the constraints. Note that the system can be written as
$$\begin{cases} \sum_{i=1}^m \lambda_i \dfrac{\partial g_i(x)}{\partial x_1} = 0 \\ \quad\vdots \\ \sum_{i=1}^m \lambda_i \dfrac{\partial g_i(x)}{\partial x_n} = 0 \\ g_1(x) = b_1 \\ \quad\vdots \\ g_m(x) = b_m \end{cases} \tag{26.39}$$
which makes computations more convenient.

Example 1023 Let us consider the optimization problem:
$$\max_{x_1, x_2, x_3} 7x_1 - 3x_3 \quad \text{sub} \quad x_1^2 + x_2^2 = 1 \text{ and } x_1 + x_2 - x_3 = 1 \tag{26.40}$$
It is in form (26.36), where $f : \mathbb{R}^3 \to \mathbb{R}$ and $g = (g_1, g_2) : \mathbb{R}^3 \to \mathbb{R}^2$ are given by $f(x_1, x_2, x_3) = 7x_1 - 3x_3$, $g_1(x_1, x_2, x_3) = x_1^2 + x_2^2$ and $g_2(x_1, x_2, x_3) = x_1 + x_2 - x_3$, while $b = (1, 1) \in \mathbb{R}^2$.
Such functions are all continuously differentiable on $\mathbb{R}^3$, that is, $D = \mathbb{R}^3$. Hence, $C \cap D = C$, so that $C - D = \emptyset$: at all points of the constraint both the function $f$ and the functions $g_i$, with $i = 1, 2$, are continuously differentiable. This completes phases 1 and 2 of Lagrange's method.
Let us find the potential singular points which satisfy the constraints, that is, the set $C \cap D_0$. The system (26.39) becomes
$$\begin{cases} 2\lambda_1 x_1 + \lambda_2 = 0 \\ 2\lambda_1 x_2 + \lambda_2 = 0 \\ -\lambda_2 = 0 \\ x_1^2 + x_2^2 = 1 \\ x_1 + x_2 - x_3 = 1 \end{cases}$$
Since $\lambda_2 = 0$, $\lambda_1$ must be different from $0$. This implies that $x_1 = x_2 = 0$, contradicting the fourth equation. Therefore, there are no singular points which satisfy the constraints, that is, $C \cap D_0 = \emptyset$. Phase 3 of Lagrange's method is thus completed.
The Lagrangian $L : \mathbb{R}^5 \to \mathbb{R}$ is
$$L(x_1, x_2, x_3, \lambda_1, \lambda_2) = 7x_1 - 3x_3 + \lambda_1\left(1 - x_1^2 - x_2^2\right) + \lambda_2(1 - x_1 - x_2 + x_3)$$
In order to find the set of its critical points we must solve the first order condition (26.17), which is given by the following (nonlinear) system of five equations
$$\begin{cases} \partial L / \partial x_1 = 7 - 2\lambda_1 x_1 - \lambda_2 = 0 \\ \partial L / \partial x_2 = -2\lambda_1 x_2 - \lambda_2 = 0 \\ \partial L / \partial x_3 = -3 + \lambda_2 = 0 \\ \partial L / \partial \lambda_1 = 1 - x_1^2 - x_2^2 = 0 \\ \partial L / \partial \lambda_2 = 1 - x_1 - x_2 + x_3 = 0 \end{cases}$$
in the five unknowns $x_1$, $x_2$, $x_3$, $\lambda_1$ and $\lambda_2$. The third equation implies $\lambda_2 = 3$, so that the system becomes:
$$\begin{cases} -2\lambda_1 x_1 + 4 = 0 \\ -2\lambda_1 x_2 - 3 = 0 \\ 1 - x_1^2 - x_2^2 = 0 \\ 1 - x_1 - x_2 + x_3 = 0 \end{cases}$$
The first equation implies that $\lambda_1 \neq 0$. Therefore, from the first two equations it follows that $x_1 = 2/\lambda_1$ and $x_2 = -3/(2\lambda_1)$. By substituting into the third equation we get that $\lambda_1 = \pm 5/2$. If $\lambda_1 = 5/2$, we have that $x_1 = 4/5$, $x_2 = -3/5$, $x_3 = -4/5$. If $\lambda_1 = -5/2$, we have that $x_1 = -4/5$, $x_2 = 3/5$, and $x_3 = -6/5$. We have thus found the two critical points of the Lagrangian
$$\left(\frac{4}{5}, -\frac{3}{5}, -\frac{4}{5}, \frac{5}{2}, 3\right), \qquad \left(-\frac{4}{5}, \frac{3}{5}, -\frac{6}{5}, -\frac{5}{2}, 3\right)$$

so that
$$S = \left\{\left(\frac{4}{5}, -\frac{3}{5}, -\frac{4}{5}\right), \left(-\frac{4}{5}, \frac{3}{5}, -\frac{6}{5}\right)\right\}$$
thus completing all phases of Lagrange's method. In conclusion, we have that
$$S \cup (C \cap D_0) \cup (C - D) = S = \left\{\left(\frac{4}{5}, -\frac{3}{5}, -\frac{4}{5}\right), \left(-\frac{4}{5}, \frac{3}{5}, -\frac{6}{5}\right)\right\} \tag{26.41}$$
thus proving that in the example the first order condition (26.17) is necessary for any local solution of the optimization problem (26.40).
We now turn to the elimination method. Clearly, the set
$$C = \left\{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1^2 + x_2^2 = 1 \text{ and } x_1 + x_2 - x_3 = 1\right\}$$
is closed. It is also bounded (and so compact). For the $x_1$ and $x_2$ such that $x_1^2 + x_2^2 = 1$ we have $x_1, x_2 \in [-1, 1]$, while for the $x_3$ such that $x_3 = x_1 + x_2 - 1$ and $x_1, x_2 \in [-1, 1]$ we have $x_3 \in [-3, 1]$. It follows that $C \subseteq [-1, 1] \times [-1, 1] \times [-3, 1]$, and so $C$ is bounded. Since $f$ is continuous, we can thus use the elimination method through Weierstrass' Theorem. In view of (26.41), in the last phase of the elimination method we have
$$f\left(\frac{4}{5}, -\frac{3}{5}, -\frac{4}{5}\right) = 8 \quad \text{and} \quad f\left(-\frac{4}{5}, \frac{3}{5}, -\frac{6}{5}\right) = -2$$
Hence, $(4/5, -3/5, -4/5)$ solves the optimum problem (26.40), while $(-4/5, 3/5, -6/5)$ is a minimizer. N
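As before, the five-equation system can be verified symbolically; a minimal SymPy sketch (ours), which also confirms the two objective values $8$ and $-2$ reported above:

import sympy as sp

x1, x2, x3, l1, l2 = sp.symbols('x1 x2 x3 l1 l2', real=True)
L = 7*x1 - 3*x3 + l1*(1 - x1**2 - x2**2) + l2*(1 - x1 - x2 + x3)

foc = [sp.diff(L, v) for v in (x1, x2, x3, l1, l2)]
for s in sp.solve(foc, (x1, x2, x3, l1, l2), dict=True):
    print(s, (7*x1 - 3*x3).subs(s))
# (4/5, -3/5, -4/5) with value 8 and (-4/5, 3/5, -6/5) with value -2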

Example 1024 Let us consider the optimization problem:
$$\max_{x_1, x_2, x_3} x_1 \quad \text{sub} \quad x_1^2 - x_2^3 = 0 \text{ and } x_3^2 + x_2^2 - 2x_2 = 0 \tag{26.42}$$
It is also in form (26.36), where $f : \mathbb{R}^3 \to \mathbb{R}$ and $g = (g_1, g_2) : \mathbb{R}^3 \to \mathbb{R}^2$ are given by $f(x_1, x_2, x_3) = x_1$, $g_1(x_1, x_2, x_3) = x_1^2 - x_2^3$ and $g_2(x_1, x_2, x_3) = x_3^2 + x_2^2 - 2x_2$, while $b = (0, 0) \in \mathbb{R}^2$.
As before, the functions are all continuously differentiable on $\mathbb{R}^3$, that is, $D = \mathbb{R}^3$. Therefore, $C \cap D = C$, so that $C - D = \emptyset$: at all points of the constraint both the function $f$ and the functions $g_i$ are continuously differentiable. This completes phases 1 and 2 of Lagrange's method.
Let us find the possible singular points which satisfy the constraints, that is, the set $C \cap D_0$. The system (26.39) becomes
$$\begin{cases} 2\lambda_1 x_1 = 0 \\ -3\lambda_1 x_2^2 + \lambda_2(2x_2 - 2) = 0 \\ 2\lambda_2 x_3 = 0 \\ x_1^2 - x_2^3 = 0 \\ x_3^2 + x_2^2 - 2x_2 = 0 \end{cases}$$
In light of the first and the third equations, we must consider three cases:
(i) $\lambda_1 = 0$, $x_3 = 0$ and $\lambda_2 \neq 0$: in this case the second equation implies $x_2 = 1$, which contradicts the last equation.
(ii) $\lambda_2 = 0$, $x_1 = 0$ and $\lambda_1 \neq 0$: in this case we obtain the solution $x_1 = x_2 = x_3 = 0$.
(iii) $x_1 = x_3 = 0$: here as well we obtain the solution $x_1 = x_2 = x_3 = 0$.
In conclusion, $(0, 0, 0)$ is the unique singular point which satisfies the constraints, that is, $C \cap D_0 = \{(0, 0, 0)\}$. This completes phase 3 of Lagrange's method.
The Lagrangian $L : \mathbb{R}^5 \to \mathbb{R}$ is given by
$$L(x_1, x_2, x_3, \lambda_1, \lambda_2) = x_1 + \lambda_1\left(x_2^3 - x_1^2\right) + \lambda_2\left(2x_2 - x_2^2 - x_3^2\right)$$
The first order condition (26.17) is given by the following (nonlinear) system of five equations
$$\begin{cases} \partial L / \partial x_1 = 1 - 2\lambda_1 x_1 = 0 \\ \partial L / \partial x_2 = 3\lambda_1 x_2^2 - 2\lambda_2(x_2 - 1) = 0 \\ \partial L / \partial x_3 = -2\lambda_2 x_3 = 0 \\ \partial L / \partial \lambda_1 = x_2^3 - x_1^2 = 0 \\ \partial L / \partial \lambda_2 = 2x_2 - x_2^2 - x_3^2 = 0 \end{cases}$$
in the five unknowns $x_1$, $x_2$, $x_3$, $\lambda_1$ and $\lambda_2$. The first equation implies that $\lambda_1 \neq 0$ and $x_1 \neq 0$. From the fourth equation it follows that $x_2 \neq 0$ and so, from the second equation, we have that $\lambda_2 \neq 0$.
Since $\lambda_2 \neq 0$, from the third equation we have that $x_3 = 0$, so that the fifth equation implies that $x_2 = 0$ or $x_2 = 2$. Since $x_2 = 0$ contradicts what we have just established, let us take $x_2 = 2$. The fourth equation then implies $x_1 = \pm\sqrt{8}$, the first equation implies $\lambda_1 = 1/(2x_1) = \pm 1/(4\sqrt{2})$, and the second equation gives $\lambda_2 = 6\lambda_1 = \pm 3/(2\sqrt{2})$. In conclusion, the critical points of the Lagrangian are
$$\left(\sqrt{8}, 2, 0, \frac{1}{4\sqrt{2}}, \frac{3}{2\sqrt{2}}\right), \qquad \left(-\sqrt{8}, 2, 0, -\frac{1}{4\sqrt{2}}, -\frac{3}{2\sqrt{2}}\right)$$
and so
$$S = \left\{\left(\sqrt{8}, 2, 0\right), \left(-\sqrt{8}, 2, 0\right)\right\}$$
which completes all phases of Lagrange's method. In conclusion, we have that
$$S \cup (C \cap D_0) \cup (C - D) = S \cup (C \cap D_0) = \left\{\left(\sqrt{8}, 2, 0\right), \left(-\sqrt{8}, 2, 0\right), (0, 0, 0)\right\} \tag{26.43}$$
and among such three points one must search for the possible local solutions of the optimization problem (26.42).
As to the elimination method, also here the set
$$C = \left\{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1^2 = x_2^3 \text{ and } x_3^2 + x_2^2 = 2x_2\right\}$$
is clearly closed. It is also bounded (and so it is compact). In fact, the second constraint can be written as $x_3^2 + (x_2 - 1)^2 = 1$, and so the $x_2$ and $x_3$ that satisfy it are such that $x_2 \in [0, 2]$ and $x_3 \in [-1, 1]$. Now, the constraint $x_1^2 = x_2^3$ implies $x_1^2 \in [0, 8]$, and so $x_1 \in \left[-\sqrt{8}, \sqrt{8}\right]$. We conclude that $C \subseteq \left[-\sqrt{8}, \sqrt{8}\right] \times [0, 2] \times [-1, 1]$, and so $C$ is bounded. As in the previous example, we can use the elimination method through Weierstrass' Theorem. In view of (26.43), in the last phase of the elimination method we have
$$f\left(\sqrt{8}, 2, 0\right) = \sqrt{8}, \qquad f\left(-\sqrt{8}, 2, 0\right) = -\sqrt{8}, \qquad f(0, 0, 0) = 0$$
Hence, $\left(\sqrt{8}, 2, 0\right)$ solves the optimum problem (26.42), while $\left(-\sqrt{8}, 2, 0\right)$ is a minimizer. The singular point $(0, 0, 0)$, unlike the one in Example 1018, is instead neither a maximizer nor a minimizer. N
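Given the delicate sign bookkeeping in this example, a symbolic verification is worthwhile; the following SymPy sketch (ours) recovers the two critical points and supports the comparison of $f$ on the three candidates:

import sympy as sp

x1, x2, x3, l1, l2 = sp.symbols('x1 x2 x3 l1 l2', real=True)
L = x1 + l1*(x2**3 - x1**2) + l2*(2*x2 - x2**2 - x3**2)

foc = [sp.diff(L, v) for v in (x1, x2, x3, l1, l2)]
sols = sp.solve(foc, (x1, x2, x3, l1, l2), dict=True)
for s in sols:
    print(s[x1], s[x2], s[x3])   # the two points (±2*sqrt(2), 2, 0)
# f(2*sqrt(2), 2, 0) = 2*sqrt(2) > f(0, 0, 0) = 0 > f(-2*sqrt(2), 2, 0)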

Example 1025 Let us consider the optimization problem:
$$\max_{x_1, x_2, x_3} -\left(x_1^2 + x_2^2 + x_3^2\right) \quad \text{sub} \quad x_1^2 - x_2 = 1 \text{ and } x_1 + x_3 = 1 \tag{26.44}$$
This problem is of the form (26.36), where $f : \mathbb{R}^3 \to \mathbb{R}$ and $g = (g_1, g_2) : \mathbb{R}^3 \to \mathbb{R}^2$ are given by $f(x_1, x_2, x_3) = -\left(x_1^2 + x_2^2 + x_3^2\right)$, $g_1(x_1, x_2, x_3) = x_1^2 - x_2$ and $g_2(x_1, x_2, x_3) = x_1 + x_3$, while $b = (1, 1) \in \mathbb{R}^2$. As in the previous examples, all functions are continuously differentiable on $\mathbb{R}^3$, that is, $D = \mathbb{R}^3$. Therefore, $C \cap D = C$, so that $C - D = \emptyset$, which completes phases 1 and 2 of Lagrange's method.
In this case we shall directly study the rank of the Jacobian:
$$Dg(x) = \begin{bmatrix} 2x_1 & -1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$
It is easy to see that for no value of $x_1$ the two row vectors, that is, the two gradients $\nabla g_1(x)$ and $\nabla g_2(x)$, are linearly dependent (at a "mechanical" level, one can easily verify that no value of $x_1$ can be such that the matrix $Dg(x)$ does not have full rank). Therefore, there are no singular points, that is, $D_0 = \emptyset$. It follows that $C \cap D_0 = \emptyset$, and so we have concluded phase 3 of Lagrange's method.
Let us now move to the search for the critical points of the Lagrangian $L : \mathbb{R}^5 \to \mathbb{R}$, which is given by
$$L(x_1, x_2, x_3, \lambda_1, \lambda_2) = -\left(x_1^2 + x_2^2 + x_3^2\right) + \lambda_1\left(1 - x_1^2 + x_2\right) + \lambda_2(1 - x_1 - x_3)$$
In order to find such points we must solve the following (nonlinear) system of five equations
$$\begin{cases} \partial L / \partial x_1 = -2x_1 - 2\lambda_1 x_1 - \lambda_2 = 0 \\ \partial L / \partial x_2 = -2x_2 + \lambda_1 = 0 \\ \partial L / \partial x_3 = -2x_3 - \lambda_2 = 0 \\ \partial L / \partial \lambda_1 = 1 - x_1^2 + x_2 = 0 \\ \partial L / \partial \lambda_2 = 1 - x_1 - x_3 = 0 \end{cases}$$
We have that $\lambda_1 = 2x_2$ and $\lambda_2 = -2x_3$, which, if substituted in the first equation, lead to the following nonlinear system of three equations:
$$\begin{cases} x_1 + 2x_1 x_2 - x_3 = 0 \\ 1 - x_1^2 + x_2 = 0 \\ 1 - x_1 - x_3 = 0 \end{cases}$$
From the last two equations it follows that $x_2 = x_1^2 - 1$ and $x_3 = 1 - x_1$, which, if substituted in the first equation, imply that $2x_1^3 - 1 = 0$, from which $x_1 = 1/\sqrt[3]{2}$ follows, and so
$$x_2 = \frac{1}{\sqrt[3]{4}} - 1 \quad \text{and} \quad x_3 = 1 - \frac{1}{\sqrt[3]{2}}$$
Therefore there is a unique critical point
$$\left(\frac{1}{\sqrt[3]{2}}, \frac{1}{\sqrt[3]{4}} - 1, 1 - \frac{1}{\sqrt[3]{2}}, \frac{2}{\sqrt[3]{4}} - 2, -2 + \frac{2}{\sqrt[3]{2}}\right)$$
so that
$$S = \left\{\left(\frac{1}{\sqrt[3]{2}}, \frac{1}{\sqrt[3]{4}} - 1, 1 - \frac{1}{\sqrt[3]{2}}\right)\right\}$$
thus completing all phases of Lagrange's method. In conclusion, we have that
$$S \cup (C \cap D_0) \cup (C - D) = S = \left\{\left(\frac{1}{\sqrt[3]{2}}, \frac{1}{\sqrt[3]{4}} - 1, 1 - \frac{1}{\sqrt[3]{2}}\right)\right\} \tag{26.45}$$
is the only candidate local solution of the optimization problem (26.44).
Let us consider the elimination method. The set
$$C = \left\{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1^2 - x_2 = 1 \text{ and } x_1 + x_3 = 1\right\}$$
is closed but not bounded (and so it is not compact). In fact, consider the sequence $\{x^n\}$ given by $x^n = \left(\sqrt{1 + n}, n, 1 - \sqrt{1 + n}\right)$. The sequence belongs to $C$, but $\|x^n\| \to +\infty$, and so there is no neighborhood in $\mathbb{R}^3$ that can contain it. On the other hand, by Proposition 698 the function $f$ is coercive and continuous on $C$. As in the last two examples, we can thus use the elimination method, this time through Tonelli's Theorem. In view of (26.45), the elimination method implies that the point
$$\left(\frac{1}{\sqrt[3]{2}}, \frac{1}{\sqrt[3]{4}} - 1, 1 - \frac{1}{\sqrt[3]{2}}\right)$$
is the solution of the optimization problem (26.44). In this case the elimination method is silent about possible minimizers because it relies on Tonelli's Theorem and not on Weierstrass' Theorem. N
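A numerical cross-check with SciPy (ours, assuming the library is available): since SciPy minimizes, we minimize $x_1^2 + x_2^2 + x_3^2$ under the two equality constraints, which is equivalent to maximizing $f$.

import numpy as np
from scipy.optimize import minimize

cons = [{'type': 'eq', 'fun': lambda x: x[0]**2 - x[1] - 1},
        {'type': 'eq', 'fun': lambda x: x[0] + x[2] - 1}]

res = minimize(lambda x: x[0]**2 + x[1]**2 + x[2]**2,
               x0=np.array([1.0, 0.0, 0.0]), constraints=cons)
print(res.x)                                          # approx (0.794, -0.370, 0.206)
print(1/2**(1/3), 1/2**(2/3) - 1, 1 - 1/2**(1/3))     # the exact candidate above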
Chapter 27

Inequality constraints

27.1 Introduction

Let us go back to the consumer problem seen at the beginning of the previous chapter, in which we considered a consumer with utility function $u : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and income $b \in \mathbb{R}$. Given the vector $p \in \mathbb{R}^n_+$ of the prices of the goods, we wrote his budget constraint as
$$C(p, b) = \{x \in A : p \cdot x = b\}$$
and his optimization problem as:
$$\max_x u(x) \quad \text{sub} \quad x \in C(p, b) \tag{27.1}$$
In this formulation we assumed that the consumer exhausts his budget (hence the equality in the budget constraint) and we did not impose other constraints on the bundle $x$ except that of satisfying the budget constraint. As to the income, the hypothesis that it is entirely spent can be too strong. Think, for example, of intertemporal problems, where it can be crucial to leave to the consumer the possibility of saving in some periods, something that is impossible if we require that the budget constraint be satisfied with equality at each period. It becomes therefore natural to ask what happens to the consumer optimization problem if we weaken the constraint to $p \cdot x \leq b$, that is, if the constraint is given by an inequality and not anymore by an equality.
As to the bundles of goods $x$, in many cases it is meaningless to talk of negative quantities. Think, for example, of the purchase of physical goods, maybe fruit or vegetables in an open air market, in which the quantity purchased has to be positive. This suggests imposing the constraint $x \in \mathbb{R}^n_+$ in the optimization problem.
Keeping in mind these observations, the consumer problem becomes:
$$\max_x u(x) \tag{27.2}$$
$$\text{sub } p \cdot x \leq b \text{ and } x \in \mathbb{R}^n_+$$
with constraints now given by inequalities. If we write the budget constraint as
$$C(p, b) = \left\{x \in A : x \in \mathbb{R}^n_+ \text{ and } p \cdot x \leq b\right\} \tag{27.3}$$

the optimization problem still takes the form (27.1), but the set $C(p, b)$ is now different.
The general form of an optimization problem with both equality and inequality constraints is:
$$\max_x f(x) \tag{27.4}$$
$$\text{sub } g_i(x) = b_i \quad \forall i \in I$$
$$\quad\;\; h_j(x) \leq c_j \quad \forall j \in J$$
where $I$ and $J$ are finite sets of indices (possibly empty), $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is the objective function, the functions $g_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and the associated scalars $b_i$ characterize $|I|$ equality constraints, while the functions $h_j : A \subseteq \mathbb{R}^n \to \mathbb{R}$ with the associated scalars $c_j$ induce $|J|$ inequality constraints. We continue to assume, as in the previous chapter, that the functions $f$ and $g_i$ are continuously differentiable on a non-empty and open subset $D$ of their domain $A$.
The optimization problem (27.4) can be equivalently formulated in canonical form as
$$\max_x f(x) \quad \text{sub} \quad x \in C$$
where the choice set is
$$C = \{x \in A : g_i(x) = b_i \text{ and } h_j(x) \leq c_j, \; \forall i \in I, \forall j \in J\} \tag{27.5}$$
The formulation (27.4) is extremely flexible. It encompasses the optimization problem with only equality constraints, which is the special case $I \neq \emptyset$ and $J = \emptyset$. It reduces to an unconstrained optimization problem when $I = J = \emptyset$ and $A$ is open. Moreover, observe that:
(i) A constraint of the form $h(x) \geq c$ can be included in the formulation (27.4) by considering $-h(x) \leq -c$. In particular, the constraint $x \geq 0$ can be included in the formulation (27.4) by considering $-x \leq 0$.
(ii) A constrained minimization problem for $f$ can be written in the formulation (27.4) by considering $-f$.
The two observations show the scope and flexibility of formulation (27.4). In particular, in light of (i) it should be clear that also the choice of the $\leq$ sign in expressing the inequality constraints is simply a convention. That said, next we give some discipline to this formulation.

Definition 1026 The problem (27.4) is said to be well posed if, for each $j \in J$, there exists $x \in C$ such that $h_j(x) < c_j$.

To understand this definition, observe that an equality constraint $g(x) = b$ can be written in the form of inequality constraints as $g(x) \leq b$ and $-g(x) \leq -b$. This blurs the distinction between equality constraints and inequality constraints in (27.4). To avoid this, and so to have a clear distinction between the two types of constraints, in what follows we will always consider optimization problems (27.4) that are well posed, so that it is not possible to express equality constraints in the form of inequality constraints. Indeed, there cannot exist any $x \in C$ for which we have both $g(x) = b$ and $g(x) < b$. Naturally, if $J = \emptyset$, Definition 1026 is automatically satisfied and there is nothing to worry about.
J = ;, De…nition 1026 is automatically satis…ed and there is nothing to worry about.

Example 1027 (i) The optimization problem:
$$\max_{x_1, x_2, x_3} x_1^2 + x_2^2 + x_3^3$$
$$\text{sub } x_1 + x_2 - x_3 = 1 \text{ and } x_1^2 + x_2^2 \leq 1$$
is of the form (27.4) with $|I| = |J| = 1$, $f(x) = x_1^2 + x_2^2 + x_3^3$, $g(x) = x_1 + x_2 - x_3$, $h(x) = x_1^2 + x_2^2$ and $b = c = 1$.¹
(ii) The optimization problem:
$$\max_{x_1, x_2, x_3} x_1$$
$$\text{sub } x_1^2 - x_2^3 = 0 \text{ and } x_3^2 + x_2^2 - 2x_2 = 0$$
is of the form (27.4) with $I = \{1, 2\}$, $J = \emptyset$, $f(x) = x_1$, $g_1(x) = x_1^2 - x_2^3$, $g_2(x) = x_3^2 + x_2^2 - 2x_2$ and $b_1 = b_2 = 0$.
(iii) The optimization problem:
$$\max_{x_1, x_2, x_3} e^{x_1 + x_2 + x_3}$$
$$\text{sub } x_1 + x_2 + x_3 = 1, \; x_1^2 + x_2^2 + x_3^2 = \frac{1}{2}, \; x_1 \leq 0 \text{ and } x_2 \leq \frac{1}{10}$$
is of the form (27.4) with $I = J = \{1, 2\}$, $f(x) = e^{x_1 + x_2 + x_3}$, $g_1(x) = x_1 + x_2 + x_3$, $g_2(x) = x_1^2 + x_2^2 + x_3^2$, $h_1(x) = x_1$, $h_2(x) = x_2$, $b_1 = 1$, $b_2 = 1/2$, $c_1 = 0$ and $c_2 = 1/10$.
(iv) The optimization problem:
$$\max_{x_1, x_2} x_1^3 - x_2^3$$
$$\text{sub } x_1 + x_2 \leq 1 \text{ and } x_1 - x_2 \leq 1$$
is of the form (27.4) with $I = \emptyset$, $J = \{1, 2\}$, $f(x) = x_1^3 - x_2^3$, $h_1(x) = x_1 + x_2$, $h_2(x) = -x_2 + x_1$ and $c_1 = c_2 = 1$.
(v) The minimum problem:
$$\min_{x_1, x_2, x_3} x_1 + x_2 + x_3$$
$$\text{sub } x_1 + x_2 = 1 \text{ and } x_2^2 + x_3^2 \geq \frac{1}{2}$$
can be written in the form (27.4) as
$$\max_{x_1, x_2, x_3} -(x_1 + x_2 + x_3)$$
$$\text{sub } x_1 + x_2 = 1 \text{ and } -x_2^2 - x_3^2 \leq -\frac{1}{2}$$
N
¹ To be pedantic, here we should have set $I = J = \{1\}$, $g_1(x) = x_1 + x_2 - x_3$, $h_1(x) = x_1^2 + x_2^2$ and $b_1 = c_1 = 1$. But, in this case, in which we have only one equality constraint and only one inequality constraint, subscripts make the notation heavy without utility.

O.R. An optimization problem with inequality constraints is often written as
$$\max_x f(x) \tag{27.6}$$
$$\text{sub } g_1(x) \leq b_1, \; g_2(x) \leq b_2, \; \ldots, \; g_m(x) \leq b_m$$
where $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is our objective function, while the functions $g_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and the scalars $b_i \in \mathbb{R}$ induce $m$ inequality constraints. This formulation may include equality constraints through the usual trick of expressing the equality constraint $g(x) = b$ via the two inequality constraints $g(x) \leq b$ and $-g(x) \leq -b$. Note, however, that this formulation requires the presence of at least one constraint (it is the case $m \geq 1$) and hence it is less general than (27.4). Moreover, the indirect way in which (27.6) encompasses the equality constraints may make the formulation of the results less transparent. This is a further reason why we chose the formulation (27.4), in which the equality constraints are fully specified. H

27.2 Resolution of the problem

In this section we extend to the optimization problem (27.4) the solution methods studied in the previous chapter for the special case of an optimization problem with only equality constraints (26.2). In order to do this, we first need to find the general version of Lemma 1021 that also holds for problem (27.4). To this end, for a given point $x \in A$, set
$$A(x) = I \cup \{j \in J : h_j(x) = c_j\} \tag{27.7}$$
In other words, $A(x)$ is the set of the so-called binding constraints at $x$, that is, of the constraints that hold as equalities at the given point $x$. For example, in the problem
$$\max_{x_1, x_2, x_3} f(x_1, x_2, x_3)$$
$$\text{sub } x_1 + x_2 - x_3 = 1 \text{ and } x_1^2 + x_2^2 \leq 1$$
the first constraint is binding at all the points of $C$, while the second constraint is, for instance, binding at the point $\left(1/\sqrt{2}, 1/\sqrt{2}, \sqrt{2} - 1\right)$ and is not binding at the point $(1/2, 1/2, 0)$.

Definition 1028 The problem (27.4) has regular constraints at a point $x \in A$ if the gradients $\nabla g_i(x)$ and the gradients $\nabla h_j(x)$, with $j \in A(x)$, are linearly independent.

In other words, the constraints are regular at a point $x$ if the gradients of the functions that induce constraints binding at such a point are linearly independent. This condition is the generalization to the problem (27.4) of the condition of linear independence upon which Lemma 1021 was based; indeed, it implies that $x$ is a regular point for the function $g : A \subseteq \mathbb{R}^n \to \mathbb{R}^{|I|}$.
In particular, if we form the matrix whose rows consist of the gradients of the functions that induce binding constraints at the point considered, the regularity condition of the constraints is equivalent to requiring that such a matrix have maximum rank.
Finally, observe that in view of Corollary 88-(ii) the regularity condition of the constraints can be satisfied at a point $x$ only if $|A(x)| \leq n$, that is, only if the number of the binding constraints at $x$ does not exceed the dimension of the space on which the optimization problem is defined.

We can now state the generalization of Lemma 1021 for problem (27.4). In reading it, note how the vector $\hat{\mu}$ associated with the inequality constraints has positive sign, while there is no restriction on the sign of the vector $\hat{\lambda}$ associated with the equality constraints.

Lemma 1029 Let $\hat{x} \in C \cap D$ be a solution of the optimization problem (27.4). If the constraints are regular at $\hat{x}$, then there exist a vector $\hat{\lambda} \in \mathbb{R}^{|I|}$ and a vector $\hat{\mu} \in \mathbb{R}^{|J|}_+$ such that
$$\nabla f(\hat{x}) = \sum_{i \in I} \hat{\lambda}_i \nabla g_i(\hat{x}) + \sum_{j \in J} \hat{\mu}_j \nabla h_j(\hat{x}) \tag{27.8}$$
$$\hat{\mu}_j (c_j - h_j(\hat{x})) = 0 \quad \forall j \in J \tag{27.9}$$

By unzipping gradients, condition (27.8) can be equivalently written as
$$\frac{\partial f}{\partial x_k}(\hat{x}) = \sum_{i \in I} \hat{\lambda}_i \frac{\partial g_i}{\partial x_k}(\hat{x}) + \sum_{j \in J} \hat{\mu}_j \frac{\partial h_j}{\partial x_k}(\hat{x}) \quad \forall k = 1, \ldots, n$$
This lemma generalizes Fermat's Theorem and Lemma 1021. Indeed, if $I = J = \emptyset$ then condition (27.8) reduces to the condition $\nabla f(\hat{x}) = 0$ of Fermat's Theorem, while if $I \neq \emptyset$ and $J = \emptyset$, it reduces to the condition $\nabla f(\hat{x}) = \sum_{i \in I} \hat{\lambda}_i \nabla g_i(\hat{x})$ of Lemma 1021. Relative to these previous results, the novelty of Lemma 1029 is, besides the positivity of the vector $\hat{\mu}$ associated with the inequality constraints, the condition (27.9). To understand the role of this condition, the following characterization is useful.

Lemma 1030 Condition (27.9) holds if and only if $\hat{\mu}_j = 0$ for each $j$ such that $h_j(\hat{x}) < c_j$, that is, for each $j \notin A(\hat{x})$.

Proof Assume (27.9). If $h_j(\hat{x}) < c_j$, then $c_j - h_j(\hat{x}) > 0$ and (27.9) forces $\hat{\mu}_j = 0$. Conversely, suppose $\hat{\mu}_j = 0$ for each $j$ such that $h_j(\hat{x}) < c_j$. Since $h_j(\hat{x}) \leq c_j$ for each $j \in J$, for every $j$ we have either $h_j(\hat{x}) = c_j$ or $h_j(\hat{x}) < c_j$; in both cases
$$\hat{\mu}_j (c_j - h_j(\hat{x})) = 0, \quad \forall j \in J \tag{27.10}$$
which is (27.9).

In other words, (27.9) is equivalent to requiring the nullity of each $\hat{\mu}_j$ associated with a non-binding constraint. Hence, we can have $\hat{\mu}_j > 0$ only if the constraint $j$ is binding at the solution $\hat{x}$.
For example, if $\hat{x}$ is such that $h_j(\hat{x}) < c_j$ for each $j \in J$, i.e., if at $\hat{x}$ all the inequality constraints are not binding, then we have $\hat{\mu}_j = 0$ for each $j \in J$ and the vector $\hat{\mu}$ does not play any role in the determination of $\hat{x}$. Naturally, this reflects the fact that for the solution $\hat{x}$ the inequality constraints do not play any role.
The next example shows that conditions (27.8) and (27.9) are necessary, but not sufficient (something not surprising, being similar to what we saw for Fermat's Theorem and Lemma 1021).

Example 1031 Consider the optimization problem:
$$\max_{x_1, x_2} \frac{x_1^3 + x_2^3}{2} \tag{27.11}$$
$$\text{sub } x_1 - x_2 \leq 0$$
It is a simple modification of Example 1013, and it is of the form (27.4) with $f : \mathbb{R}^2 \to \mathbb{R}$ and $h : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x) = \frac{1}{2}\left(x_1^3 + x_2^3\right)$ and $h(x) = x_1 - x_2$, while $c = 0$. We have:
$$\nabla f(0, 0) = (0, 0) \quad \text{and} \quad \nabla h(0, 0) = (1, -1)$$
and
$$\nabla f(0, 0) = \mu \nabla h(0, 0)$$
$$\mu(0 - 0) = 0$$
The point $(0, 0)$ satisfies, with $\mu = 0$, the conditions (27.8) and (27.9), but $(0, 0)$ is not a solution of the optimization problem (27.11), as (26.9) shows. N

We defer the proof of Lemma 1029 to the appendix.² It is possible, however, to give a heuristic proof of this lemma by reducing problem (27.4) to a problem with only equality constraints, and then by exploiting the results seen in the previous chapter. For simplicity, we give this argument for the special case
$$\max_x f(x) \tag{27.12}$$
$$\text{sub } g(x) = b \text{ and } h(x) \leq c$$
where $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is the objective function, and $g, h : A \subseteq \mathbb{R}^n \to \mathbb{R}$ induce one equality and one inequality constraint.
Define $H : A \times \mathbb{R} \subseteq \mathbb{R}^{n+1} \to \mathbb{R}$ as $H(x, z) = h(x) + z^2$ for each $x \in A$ and each $z \in \mathbb{R}$. Given $x \in A$, we have $h(x) \leq c$ if and only if there exists $z \in \mathbb{R}$ such that $h(x) + z^2 = c$, i.e., if and only if $H(x, z) = c$.³
Define $F : A \times \mathbb{R} \subseteq \mathbb{R}^{n+1} \to \mathbb{R}$ and $G : A \times \mathbb{R} \subseteq \mathbb{R}^{n+1} \to \mathbb{R}$ by $F(x, z) = f(x)$ and $G(x, z) = g(x)$ for each $x \in A$ and each $z \in \mathbb{R}$. The dependence of $F$ and $G$ on $z$ is only fictitious, but it allows us to formulate the following classical optimization problem:
$$\max_{x, z} F(x, z) \tag{27.13}$$
$$\text{sub } G(x, z) = b \text{ and } H(x, z) = c$$
Problems (27.12) and (27.13) are equivalent: $\hat{x}$ is a solution of problem (27.12) if and only if there exists $\hat{z} \in \mathbb{R}$ such that $(\hat{x}, \hat{z})$ is a solution of problem (27.13).
² A noteworthy feature of this proof is that it does not rely on the Implicit Function Theorem, unlike the proof that we gave for Lemma 1012 (the special case of Lemma 1021 that we proved).
³ Note that the positivity of the square $z^2$ preserves the inequality $h(x) \leq c$. The auxiliary variable $z$ is often called a slack variable.

We have, therefore, reduced problem (27.12) to a problem with only equality constraints. By Lemma 1021, $(\hat{x}, \hat{z})$ is a solution of such a problem only if there exists a vector $(\hat{\lambda}, \hat{\mu}) \in \mathbb{R}^2$ such that:
$$\nabla F(\hat{x}, \hat{z}) = \hat{\lambda} \nabla G(\hat{x}, \hat{z}) + \hat{\mu} \nabla H(\hat{x}, \hat{z})$$
that is, only if
$$\frac{\partial F}{\partial x_i}(\hat{x}, \hat{z}) = \hat{\lambda} \frac{\partial G}{\partial x_i}(\hat{x}, \hat{z}) + \hat{\mu} \frac{\partial H}{\partial x_i}(\hat{x}, \hat{z}) \quad \forall i = 1, \ldots, n$$
$$\frac{\partial F}{\partial z}(\hat{x}, \hat{z}) = \hat{\lambda} \frac{\partial G}{\partial z}(\hat{x}, \hat{z}) + \hat{\mu} \frac{\partial H}{\partial z}(\hat{x}, \hat{z})$$
which is equivalent to:
$$\nabla f(\hat{x}) = \hat{\lambda} \nabla g(\hat{x}) + \hat{\mu} \nabla h(\hat{x})$$
$$2\hat{\mu}\hat{z} = 0$$
On the other hand, we have $2\hat{\mu}\hat{z} = 0$ if and only if $\hat{\mu}\hat{z}^2 = 0$, and since $\hat{z}^2 = c - h(\hat{x})$ this amounts to $\hat{\mu}(c - h(\hat{x})) = 0$. Recalling the equivalence between problems (27.12) and (27.13), we can therefore conclude that $\hat{x}$ is a solution of problem (27.12) only if there exists a vector $(\hat{\lambda}, \hat{\mu}) \in \mathbb{R}^2$ such that:
$$\nabla f(\hat{x}) = \hat{\lambda} \nabla g(\hat{x}) + \hat{\mu} \nabla h(\hat{x})$$
$$\hat{\mu}(c - h(\hat{x})) = 0$$
We therefore have conditions (27.8) and (27.9) of Lemma 1029. What we have not been able to prove is the positivity of the multiplier $\hat{\mu}$, and for this reason the proof just seen is incomplete.⁴
⁴ Since it is, in any case, an incomplete argument, for simplicity we did not check the rank condition required by Lemma 1021.

27.2.1 Kuhn-Tucker's Theorem

In view of Lemma 1029, the Lagrangian function associated with the optimization problem (27.4) is the function
$$L : A \times \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}_+ \subseteq \mathbb{R}^{n + |I| + |J|} \to \mathbb{R}$$
defined by:⁵
$$L(x, \lambda, \mu) = f(x) + \sum_{i \in I} \lambda_i (b_i - g_i(x)) + \sum_{j \in J} \mu_j (c_j - h_j(x)) \tag{27.14}$$
$$\qquad\quad = f(x) + \lambda \cdot (b - g(x)) + \mu \cdot (c - h(x))$$
for each $(x, \lambda, \mu) \in A \times \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}_+$. Note that in this case $\mu$ is required to be a positive vector.
We can now generalize Theorem 1022 to the optimization problem (27.4). As we did for Theorem 1022, also here we omit the proof because it is analogous to the one of Lagrange's Theorem.
⁵ The notation $(x, \lambda, \mu)$ underlines the different status of $x$ with respect to $\lambda$ and $\mu$.

Theorem 1032 (Kuhn-Tucker) Let $\hat{x} \in C \cap D$ be a solution of the optimization problem (27.4). If the constraints are regular at $\hat{x}$, then there exists a pair of vectors $(\hat{\lambda}, \hat{\mu}) \in \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}_+$ such that the triple $(\hat{x}, \hat{\lambda}, \hat{\mu})$ satisfies the conditions:
$$\nabla_x L(\hat{x}, \hat{\lambda}, \hat{\mu}) = 0 \tag{27.15}$$
$$\hat{\mu}_j \frac{\partial L}{\partial \mu_j}(\hat{x}, \hat{\lambda}, \hat{\mu}) = 0 \quad \forall j \in J \tag{27.16}$$
$$\nabla_\lambda L(\hat{x}, \hat{\lambda}, \hat{\mu}) = 0 \tag{27.17}$$
$$\nabla_\mu L(\hat{x}, \hat{\lambda}, \hat{\mu}) \geq 0 \tag{27.18}$$

The components $\hat{\lambda}_i$ and $\hat{\mu}_j$ of the vectors $\hat{\lambda}$ and $\hat{\mu}$ are called Lagrange multipliers, while (27.15)-(27.18) are called Kuhn-Tucker conditions. The points $x \in A$ for which there exists a pair $(\lambda, \mu) \in \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}_+$ such that the triple $(x, \lambda, \mu)$ satisfies the conditions (27.15)-(27.18) are called Kuhn-Tucker points.
The Kuhn-Tucker points are, therefore, the solutions of the (typically nonlinear) system of equations and inequalities given by the Kuhn-Tucker conditions. By Kuhn-Tucker's Theorem, a necessary condition for a point $x$ at which the constraints are regular to be a solution of the optimization problem (27.4) is that it be a Kuhn-Tucker point.⁶ Observe, however, that a Kuhn-Tucker point $(x, \lambda, \mu)$ is not necessarily a stationary point of the Lagrangian function: the condition (27.18) only requires $\nabla_\mu L(x, \lambda, \mu) \in \mathbb{R}^{|J|}_+$, not the stronger property $\nabla_\mu L(x, \lambda, \mu) = 0$.
⁶ Note the caveat "at which the constraints are regular". Indeed, a Kuhn-Tucker point at which the constraints are not regular is outside the scope of Kuhn-Tucker's Theorem.

Let $(x, \lambda, \mu)$ be a Kuhn-Tucker point. By Lemma 1030, expression (27.16) is equivalent to requiring $\mu_j = 0$ for each $j$ such that $h_j(x) < c_j$. Hence, $\mu_j > 0$ implies that the corresponding constraint is binding at the point $x$, that is, $h_j(x) = c_j$. Because of its importance, we state this observation formally.

Proposition 1033 At a Kuhn-Tucker point $(x, \lambda, \mu)$ we have $\mu_j > 0$ only if $h_j(x) = c_j$.

27.2.2 The method of elimination

Like Lagrange's Theorem, also Kuhn-Tucker's Theorem suggests a procedure to find the local solutions of the optimization problem (27.4) that generalizes Lagrange's method, as well as a generalization of the method of elimination to find its global solutions. For brevity, we directly consider this latter generalization.
Let $D_0$ be the set of the points $x \in A$ where the regularity condition of the constraints does not hold, and let $D_1$ be, instead, the set of the points $x \in A$ where this condition holds. The method of elimination consists of four phases:

1. Determine whether Tonelli's Theorem can be applied, that is, whether $f$ is continuous and coercive on $C$.

2. Find the set $C \cap D_0$.

3. Find the set $S$ of the Kuhn-Tucker points that belong to $D_1$, i.e., the set of the points $x \in D_1$ for which there exists $(\lambda, \mu) \in \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}_+$ such that the triple $(x, \lambda, \mu)$ satisfies the Kuhn-Tucker conditions (27.15)-(27.18).⁷

4. Compute the set $\{f(x) : x \in S \cup (C \cap D_0)\}$; if $\hat{x} \in S \cup (C \cap D_0)$ is such that
$$f(\hat{x}) \geq f(x) \quad \forall x \in S \cup (C \cap D_0)$$
then such $\hat{x}$ is a solution of the optimization problem (27.4).

The first phase of the method of elimination is the same as in the previous chapter, while the other phases are the obvious extension of the method to the case of the problem (27.4).
⁷ Observe that these points $x$ satisfy the constraints for sure, and hence we always have $S \subseteq D_1 \cap C$; it is therefore not necessary to check whether a point $x \in S$ also belongs to $C$. A similar observation was made in the previous chapter.

Example 1034 Consider the optimization problem:
$$\max_{x_1, x_2} -x_1 - 2x_2^2 \tag{27.19}$$
$$\text{sub } x_1^2 + x_2^2 \leq 1$$
This problem is of the form (27.4), where $f, h : \mathbb{R}^2 \to \mathbb{R}$ are given by $f(x_1, x_2) = -x_1 - 2x_2^2$ and $h(x_1, x_2) = x_1^2 + x_2^2$, while $c = 1$. Since $C$ is compact, the first phase is completed through Weierstrass' Theorem.
We have $\nabla h(x) = (2x_1, 2x_2)$, and so the constraint is regular at each point $x \in C$, that is, $C \cap D_0 = \emptyset$.
The Lagrangian function $L : \mathbb{R}^3 \to \mathbb{R}$ is given by
$$L(x_1, x_2, \mu) = -x_1 - 2x_2^2 + \mu\left(1 - x_1^2 - x_2^2\right)$$
and to find the set $S$ of its Kuhn-Tucker points it is necessary to solve the system
$$\begin{cases} \partial L / \partial x_1 = -1 - 2\mu x_1 = 0 \\ \partial L / \partial x_2 = -4x_2 - 2\mu x_2 = 0 \\ \mu \, \partial L / \partial \mu = \mu\left(1 - x_1^2 - x_2^2\right) = 0 \\ \partial L / \partial \mu = 1 - x_1^2 - x_2^2 \geq 0 \\ \mu \geq 0 \end{cases}$$
We start by observing that $\mu \neq 0$, that is, $\mu > 0$. Indeed, if $\mu = 0$ the first equation becomes $-1 = 0$, a contradiction. We therefore assume that $\mu > 0$. The second equation implies $x_2 = 0$, and in turn the third equation implies $x_1 = \pm 1$. From the first equation it follows that $\mu = -1/(2x_1) \geq 0$, so that $x_1 = -1$ and $\mu = 1/2$; hence the only solution of the system is $(-1, 0, 1/2)$. The only Kuhn-Tucker point is therefore $(-1, 0)$, i.e., $S = \{(-1, 0)\}$.
In sum, $S \cup (C \cap D_0) = \{(-1, 0)\}$ and the method of elimination allows us to conclude that $(-1, 0)$ is the only solution of the optimization problem (27.19). Note that at this solution the constraint is binding (i.e., it is satisfied with equality); indeed $\mu = 1/2 > 0$, as required by Proposition 1033. N
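A numerical cross-check with SciPy (ours): maximizing $-x_1 - 2x_2^2$ on the unit disk is the same as minimizing $x_1 + 2x_2^2$ there. Note that 'ineq' constraints in SciPy are of the form fun(x) ≥ 0.

import numpy as np
from scipy.optimize import minimize

res = minimize(lambda x: x[0] + 2*x[1]**2,
               x0=np.array([0.0, 0.5]),
               constraints=[{'type': 'ineq',
                             'fun': lambda x: 1 - x[0]**2 - x[1]**2}])
print(res.x)   # approx (-1, 0), the Kuhn-Tucker point found above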

Example 1035 Consider the optimization problem:
$$\max_{x_1, \ldots, x_n} -\sum_{i=1}^n x_i^2 \tag{27.20}$$
$$\text{sub } \sum_{i=1}^n x_i = 1, \; x_1 \geq 0, \ldots, x_n \geq 0$$
This problem is of the form (27.4), where $f, g : \mathbb{R}^n \to \mathbb{R}$ are given by $f(x) = -\sum_{i=1}^n x_i^2$ and $g(x) = \sum_{i=1}^n x_i$, the functions $h_j : \mathbb{R}^n \to \mathbb{R}$ are given by $h_j(x) = -x_j$ for $j = 1, \ldots, n$, while $b = 1$ and $c_j = 0$ for $j = 1, \ldots, n$. The set $C = \left\{x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i = 1\right\}$ is compact and so also in this case the first phase is completed thanks to Weierstrass' Theorem.
For each $x \in \mathbb{R}^n$ we have $\nabla g(x) = (1, \ldots, 1)$ and $\nabla h_j(x) = -e^j$. Therefore, the value of these gradients does not depend on the point $x$ considered. To verify the regularity of the constraints, we consider the collection $(1, \ldots, 1), -e^1, \ldots, -e^n$ of these gradients. This collection has $n + 1$ elements and it is obviously linearly dependent (the fundamental versors $e^1, \ldots, e^n$ are the most classic basis of $\mathbb{R}^n$).
On the other hand, it is immediate to see that any subcollection with at most $n$ elements is, instead, linearly independent. Hence, the only way to violate the regularity of the constraints is that they are all binding, so that the whole collection of $n + 1$ elements has to be considered. Fortunately, however, there does not exist any point $x \in \mathbb{R}^n$ where all constraints are binding. Indeed, the only point that satisfies with equality all the constraints $x_j \geq 0$ is the origin $0$, which nevertheless does not satisfy the equality constraint $\sum_{i=1}^n x_i = 1$.
We can conclude that the constraints are regular at all the points $x \in \mathbb{R}^n$, i.e., $D_0 = \emptyset$. Hence, $C \cap D_0 = \emptyset$ and also the second phase of the method of elimination is complete.
The Lagrangian function $L : \mathbb{R}^{2n+1} \to \mathbb{R}$ is given by
$$L(x, \lambda, \mu) = -\sum_{i=1}^n x_i^2 + \lambda\left(1 - \sum_{i=1}^n x_i\right) + \sum_{i=1}^n \mu_i x_i \quad \forall (x, \lambda, \mu) \in \mathbb{R}^{2n+1}$$
To find the set $S$ of its Kuhn-Tucker points it is necessary to solve the system
$$\begin{cases} \partial L / \partial x_i = -2x_i - \lambda + \mu_i = 0, & \forall i = 1, \ldots, n \\ \partial L / \partial \lambda = 1 - \sum_{i=1}^n x_i = 0 \\ \mu_i \, \partial L / \partial \mu_i = \mu_i x_i = 0, & \forall i = 1, \ldots, n \\ \partial L / \partial \mu_i = x_i \geq 0, & \forall i = 1, \ldots, n \\ \mu_i \geq 0, & \forall i = 1, \ldots, n \end{cases}$$
If we multiply by $x_i$ the first $n$ equations, we get
$$-2x_i^2 - \lambda x_i + \mu_i x_i = 0, \quad \forall i = 1, \ldots, n$$
Adding up these new equations, and using $\mu_i x_i = 0$ and $\sum_{i=1}^n x_i = 1$, we have
$$-2 \sum_{i=1}^n x_i^2 - \lambda = 0$$
that is, $\lambda = -2 \sum_{i=1}^n x_i^2$. We conclude that $\lambda \leq 0$.
Suppose that $x_i = 0$ for some $i$. From the condition $\partial L / \partial x_i = 0$ it follows that $\mu_i = \lambda$. Since $\mu_i \geq 0$ and $\lambda \leq 0$, it follows that $\mu_i = \lambda = 0$. In turn, $\lambda = 0$ implies $\sum_{i=1}^n x_i^2 = 0$, that is, $x_i = 0$ for each $i = 1, \ldots, n$. But this contradicts the condition $1 - \sum_{i=1}^n x_i = 0$, and we can therefore conclude that $x_i \neq 0$, that is, $x_i > 0$ for each $i = 1, \ldots, n$.
From the condition $\mu_i x_i = 0$ it then follows that $\mu_i = 0$ for each $i = 1, \ldots, n$, and the first $n$ equations become:
$$-2x_i - \lambda = 0 \quad \forall i = 1, \ldots, n$$
that is, $x_i = -\lambda/2$ for each $i = 1, \ldots, n$. The $x_i$ are therefore all equal; from $\sum_{i=1}^n x_i = 1$ it follows that
$$x_i = \frac{1}{n} \quad \forall i = 1, \ldots, n$$
In conclusion,
$$S = \left\{\left(\frac{1}{n}, \ldots, \frac{1}{n}\right)\right\}$$
Since $D_0 = \emptyset$, we have $S \cup (C \cap D_0) = \{(1/n, \ldots, 1/n)\}$, and the method of elimination allows us to conclude that the point $(1/n, \ldots, 1/n)$ is the solution of the optimization problem (27.20). N
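A numerical cross-check for $n = 4$ (ours, assuming SciPy is available): maximizing $-\sum x_i^2$ on the simplex is minimizing $\sum x_i^2$ there.

import numpy as np
from scipy.optimize import minimize

n = 4
res = minimize(lambda x: np.sum(x**2),
               x0=np.random.dirichlet(np.ones(n)),
               bounds=[(0, None)] * n,
               constraints=[{'type': 'eq', 'fun': lambda x: np.sum(x) - 1}])
print(res.x)   # approx (1/4, 1/4, 1/4, 1/4)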

27.3 Cogito et solvo

The result of this example, i.e., that $(1/n, \ldots, 1/n)$ is the optimal point, is not surprising. Indeed, it holds in a much more general form that can be proved with a simple application of Jensen's inequality, without differential methods. Yet another proof that differential methods might not be "optimal" (cf. the discussion after Example 1019 in the previous chapter).

Proposition 1036 Let $h : [0, 1] \to \mathbb{R}$ be a concave function. The optimization problem
$$\max_{x_1, \ldots, x_n} \sum_{i=1}^n h(x_i) \tag{27.21}$$
$$\text{sub } \sum_{i=1}^n x_i = 1, \; x_1 \geq 0, \ldots, x_n \geq 0$$
has solution $(1/n, \ldots, 1/n)$. It is the unique solution if $h$ is strictly concave.

If $h(x_i) = -x_i \log x_i$, the function $\sum_{i=1}^n h(x_i)$ is the entropy (Examples 212 and 1009).

Proof Let $x_1, x_2, \ldots, x_n \in [0, 1]$ with the constraint $\sum_{i=1}^n x_i = 1$. By Jensen's inequality applied to the function $h$, we can write
$$\frac{1}{n} \sum_{i=1}^n h(x_i) \leq h\left(\frac{1}{n} \sum_{i=1}^n x_i\right) = h\left(\frac{1}{n}\right)$$
Namely,
$$\sum_{i=1}^n h(x_i) \leq n \, h\left(\frac{1}{n}\right) = h\left(\frac{1}{n}\right) + \cdots + h\left(\frac{1}{n}\right)$$
This shows that $(1/n, \ldots, 1/n)$ is optimal. Clearly, $\sum_{i=1}^n h(x_i)$ is strictly concave if $h$ is. Hence, the uniqueness is ensured by Theorem 706.
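To illustrate the proposition in the (strictly concave) entropy case $h(t) = -t \log t$, the following NumPy sketch (ours; with the usual convention $0 \log 0 = 0$) compares the uniform vector with random points of the simplex:

import numpy as np

def H(x):
    # entropy sum h(x_i), with the convention 0*log(0) = 0
    x = np.asarray(x, dtype=float)
    return -np.sum(np.where(x > 0, x * np.log(np.where(x > 0, x, 1.0)), 0.0))

n = 5
uniform = np.full(n, 1/n)
print(H(uniform))   # log(5), the maximal value
for _ in range(3):
    x = np.random.dirichlet(np.ones(n))
    print(H(x) <= H(uniform) + 1e-12)   # True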

27.4 Concave optimization

The remarkable optimality properties of concave functions make them of particular interest when dealing with the optimization problem (27.4). We start with a simple but important result.

Proposition 1037 Let $A$ be convex. If the functions $g_i$ are affine for each $i \in I$ and the functions $h_j$ are convex for each $j \in J$, then the choice set $C$ defined in (27.5) is convex.

Proof Set $C_i = \{x \in A : g_i(x) = b_i\}$ for each $i \in I$ and $C_j = \{x \in A : h_j(x) \leq c_j\}$ for each $j \in J$. Clearly, $C_j$ is convex as the sublevel set of a convex function (see Proposition 613). A similar argument shows that also each $C_i$ is convex, and this implies the convexity of the set $C$ defined in (27.5) since $C = \left(\bigcap_{i \in I} C_i\right) \cap \left(\bigcap_{j \in J} C_j\right)$.

It is easy to give examples where $C$ is no longer convex when the conditions of convexity and affinity used in this result are not satisfied. Note that the convexity condition on the $h_j$ is much weaker than that of affinity on the $g_i$. This shows that the convexity of the choice set is more natural for inequality constraints than for equality ones. This is a crucial "structural" difference between the two types of constraints (which are more different than they may appear prima facie).
Motivated by Proposition 1037, we give the following definition.

Definition 1038 The optimization problem (27.4) is called concave if the objective function $f$ is concave, the functions $g_i$ are affine and the $h_j$ are convex over the open and convex set $A$.

A concave optimization problem has therefore the form
$$\max_x f(x) \tag{27.22}$$
$$\text{sub } g_i(x) = b_i \quad \forall i \in I$$
$$\quad\;\; h_j(x) \leq c_j \quad \forall j \in J$$
where $I$ and $J$ are finite sets of indices (possibly empty), $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is a concave objective function, the affine functions $g_i : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and the associated scalars $b_i$ characterize $|I|$ equality constraints, while the convex functions $h_j : A \subseteq \mathbb{R}^n \to \mathbb{R}$ with the associated scalars $c_j$ induce $|J|$ inequality constraints. The convex domain $A$ is assumed to be open in order to best exploit the properties of concave functions.
If the $g_i$ are defined on the entire $\mathbb{R}^n$, we can write $g_i(x) = \alpha_i \cdot x + q_i$ (Section 14.2). Hence, if $\Gamma$ is the $|I| \times n$ matrix that has the vectors $\alpha_i \in \mathbb{R}^n$ as its rows, we can write the equality constraints in the matrix form
$$\Gamma x + q = b \tag{27.23}$$

where $b = (b_1, \ldots, b_{|I|}) \in \mathbb{R}^{|I|}$. Often $q = 0$, so the equality constraints are represented in the simpler form $\Gamma x = b$.
Recall from Section 25.3 that the search for the solutions of an unconstrained optimization problem for concave functions was based on a remarkable property: the first order condition, necessary for the existence of a local maximum, becomes sufficient for the existence of a global maximum in the case of concave functions.
The next fundamental result is the "constrained" version of this property. Note that the regularity of the constraints does not play any role in this result.

Theorem 1039 In a concave optimization problem in which the functions $f$, $\{g_i\}_{i \in I}$ and $\{h_j\}_{j \in J}$ are differentiable on $A$, the Kuhn-Tucker points are solutions of the problem.

Proof Let $(x^*, \lambda^*, \mu^*)$ be a Kuhn-Tucker point for the optimization problem (27.4), that is, $(x^*, \lambda^*, \mu^*)$ satisfies the conditions (27.15)-(27.18). In particular, this means that
$$\nabla f(x^*) = \sum_{i \in I} \lambda_i^* \nabla g_i(x^*) + \sum_{j \in A(x^*) \cap J} \mu_j^* \nabla h_j(x^*) \tag{27.24}$$
Since each $g_i$ is affine and each $h_j$ is convex, by (22.9) it follows that:
$$h_j(x) \geq h_j(x^*) + \nabla h_j(x^*) \cdot (x - x^*), \quad \forall j \in J, \; \forall x \in A \tag{27.25}$$
$$g_i(x) = g_i(x^*) + \nabla g_i(x^*) \cdot (x - x^*), \quad \forall i \in I, \; \forall x \in A \tag{27.26}$$
For each $j \in A(x^*) \cap J$ we have $h_j(x^*) = c_j$, and hence $h_j(x) \leq h_j(x^*)$ for each $x \in C$ and each $j \in A(x^*) \cap J$. Moreover, $g_i(x^*) = g_i(x)$ for each $i \in I$ and each $x \in C$. By (27.25) and (27.26) it follows that
$$\nabla h_j(x^*) \cdot (x - x^*) \leq 0, \quad \forall j \in A(x^*) \cap J, \; \forall x \in C$$
$$\nabla g_i(x^*) \cdot (x - x^*) = 0, \quad \forall i \in I, \; \forall x \in C$$
Together with (27.24) and the positivity of the $\mu_j^*$, we therefore have:
$$\nabla f(x^*) \cdot (x - x^*) = \sum_{i \in I} \lambda_i^* \nabla g_i(x^*) \cdot (x - x^*) + \sum_{j \in A(x^*) \cap J} \mu_j^* \nabla h_j(x^*) \cdot (x - x^*) \leq 0$$
for each $x \in C$. On the other hand, by (22.9) we have:
$$f(x) \leq f(x^*) + \nabla f(x^*) \cdot (x - x^*), \quad \forall x \in A$$
and we conclude that $f(x) \leq f(x^*)$ for each $x \in C$, as desired.

Theorem 1039 gives us a sufficient condition for optimality: if a point is a Kuhn-Tucker point, then it is a solution of the optimization problem. The condition is, however, not necessary: there can be solutions of a concave optimization problem that are not Kuhn-Tucker points. In view of Kuhn-Tucker's Theorem, this can happen only if the solution is a point at which the constraints are not regular. The next example illustrates this situation.

Example 1040 Consider the optimization problem:
$$\max_{x_1, x_2, x_3} x_1 - x_2 - x_3^2 \tag{27.27}$$
$$\text{sub } x_1^2 + x_2^2 - 2x_1 \leq 0 \text{ and } x_1^2 + x_2^2 + 2x_1 \leq 0$$
This problem is of the form (27.4), where $f : \mathbb{R}^3 \to \mathbb{R}$, $h_1 : \mathbb{R}^3 \to \mathbb{R}$ and $h_2 : \mathbb{R}^3 \to \mathbb{R}$ are given by $f(x_1, x_2, x_3) = x_1 - x_2 - x_3^2$, $h_1(x_1, x_2, x_3) = x_1^2 + x_2^2 - 2x_1$, $h_2(x_1, x_2, x_3) = x_1^2 + x_2^2 + 2x_1$, while $c_1 = c_2 = 0$. Clearly $f$ is concave as a sum of concave functions. Likewise, $h_1$ and $h_2$ are convex, so that (27.27) is a concave optimization problem.
The system of inequalities
$$x_1^2 + x_2^2 - 2x_1 \leq 0$$
$$x_1^2 + x_2^2 + 2x_1 \leq 0$$
has the point $(0, 0)$ as its unique solution. Hence, $C = \left\{x \in \mathbb{R}^3 : x_1 = x_2 = 0\right\}$ is a straight line in $\mathbb{R}^3$ and the unique solution of the problem (27.27) is the point $(0, 0, 0)$. On the other hand,
$$\nabla h_1(0, 0, 0) = (-2, 0, 0) \quad \text{and} \quad \nabla h_2(0, 0, 0) = (2, 0, 0)$$
and hence the constraints are not regular at $(0, 0, 0)$. Since
$$\nabla f(0, 0, 0) = (1, -1, 0)$$
there does not exist any pair $(\mu_1, \mu_2) \in \mathbb{R}^2_+$ such that:
$$\nabla f(0, 0, 0) = \mu_1 \nabla h_1(0, 0, 0) + \mu_2 \nabla h_2(0, 0, 0)$$
and therefore the solution $(0, 0, 0)$ is not a Kuhn-Tucker point. N

By combining Kuhn-Tucker's Theorem and Theorem 1039 we get the following necessary and sufficient optimality condition.

Theorem 1041 Consider a concave optimization problem in which the functions $f$, $\{g_i\}_{i \in I}$ and $\{h_j\}_{j \in J}$ are of class $C^1$ on $A$. A point $x \in A$ at which the constraints are regular is a solution of such a problem if and only if it is a Kuhn-Tucker point.

Theorem 1041 is a refinement of Kuhn-Tucker's Theorem, and as such it allows us to refine the method of elimination, into what we will call the convex method (of elimination). Such a method is based on the following phases:

1. Determine whether the problem is concave, that is, whether the function $f$ is concave, the functions $g_i$ are affine and the functions $h_j$ are convex.

2. Find the set $C \cap D_0$.

3. Find the set $T$ of the Kuhn-Tucker points,⁸ i.e., the set of the points $x \in A$ for which there exists $(\lambda, \mu) \in \mathbb{R}^{|I|} \times \mathbb{R}^{|J|}_+$ such that the triple $(x, \lambda, \mu)$ satisfies the Kuhn-Tucker conditions (27.15)-(27.18).⁹

⁸ The set $T$ considered here is therefore slightly different from the set $S$ seen in the previous versions of the method of elimination.
⁹ These points $x$ surely satisfy the constraints and hence we always have $T \subseteq D_1 \cap C$; it is therefore not necessary to verify whether a point $x \in T$ also belongs to $C$. A similar observation was made in Chapter 9.

4. If T 6= ;, then taken any x 2 T , construct the set ff (x) : fx g [ (C \ D0 )g; all the
points of T are solutions of the problem,10 and a point x 2 C \ D0 is itself solution if
and only if f (x) = f (x ).
5. If T = ;, check if Tonelli’s Theorem can be applied (i.e., if f is continuous and coercive
on C); if this is the case, the maximizers of f on C \D0 are solutions of the optimization
problem (27.4).

Since either phase 4 or 5 applies, depending on whether or not T is empty, the actual
phases of the convex method are four.
The convex method works thanks to Theorems 1039 and 1041. Indeed, if T 6= ;, then by
Theorem 1039 all points of T are solutions of the problem. In this case, a point x 2 C \ D0
that does not belong to T can in turn be a solution only if its value f (x) is equal to that of
any point in T .
When, instead, we have T = ;, then Theorem 1041 guarantees that no point in D1 is
solution of the problem. At this stage, if Tonelli’s Theorem ensures the existence of at least
a solution, we can restrict the search to the set C \ D0 . In other words, it is su¢ cient to …nd
the maximizers of f on C \ D0 : they are also solutions of problem (27.4), and vice versa.11

Clearly, the convex method becomes especially powerful when T ≠ ∅ because in such a case there is no need to verify the validity of global existence theorems à la Weierstrass and Tonelli: it suffices to find the Kuhn-Tucker points.
If we are satisfied with the solutions that are Kuhn-Tucker points, without worrying about the possible existence of solutions that are not, we can give a short version of the convex method, based only on Theorem 1039. We call it the short convex method. It consists of only two phases:

1. Determine whether the optimization problem (27.4) is concave, i.e., whether the function f is concave, the functions gᵢ are affine, and the functions hⱼ are convex.

2. Find the set T of the Kuhn-Tucker points.

By Theorem 1039, all the points of T are solutions of the problem. The short convex method is simpler than the convex method: it requires neither the use of global existence theorems nor the study of the regularity of the constraints. The price of this simplification is the possible inaccuracy of the method: being based on sufficient conditions, it cannot find the solutions at which these conditions are not satisfied (by Theorem 1041, such solutions would be points where the constraints are not regular). Furthermore, the short method cannot be applied when T = ∅; in that case, it is necessary to apply the complete convex method.

The short convex method is especially powerful when the objective function f is strictly concave. Indeed, in that case a solution found with the short method is necessarily also the unique solution of the concave optimization problem. Therefore, in this case the short method is as effective as the complete convex method.
¹⁰ The set T is at most a singleton when f is strictly concave because in that case there is at most one solution of the problem (Theorem 706).
¹¹ Observe that such maximizers exist. Indeed, if arg max_{x∈C} f(x) ≠ ∅ and none of its elements belongs to D₁ ∩ C, it follows that arg max_{x∈C} f(x) = arg max_{x∈D₀∩C} f(x).

Example 1042 Consider the optimization problem:
\[ \max_{x_1,x_2,x_3} -(x_1^2 + x_2^2 + x_3^2) \tag{27.28} \]
\[ \text{sub } 3x_1 + x_2 + 2x_3 \ge 1 \;\text{ and }\; x_1 \ge 0 \]
This problem is of the form (27.4), where f : R³ → R is given by f(x) = −(x1² + x2² + x3²), h1 : R³ → R is given by h1(x) = −(3x1 + x2 + 2x3), and h2 : R³ → R is given by h2(x) = −x1, while c1 = −1 and c2 = 0.
Using Theorem 928 it is easy to verify that f is strictly concave, while it is immediate to verify that h1 and h2 are convex. Therefore, (27.28) is a concave optimization problem. Since f is strictly concave, we apply without hesitation the short convex method. To do so we must find the set T of the Kuhn-Tucker points.
The Lagrangian function L : R⁵ → R is given by
\[ L(x_1, x_2, x_3, \lambda_1, \lambda_2) = -(x_1^2 + x_2^2 + x_3^2) + \lambda_1(-1 + 3x_1 + x_2 + 2x_3) + \lambda_2 x_1 \]

To find the set T of its Kuhn-Tucker points it is necessary to solve the system of equalities and inequalities:
\[
\begin{cases}
\dfrac{\partial L}{\partial x_1} = -2x_1 + 3\lambda_1 + \lambda_2 = 0\\[2pt]
\dfrac{\partial L}{\partial x_2} = -2x_2 + \lambda_1 = 0\\[2pt]
\dfrac{\partial L}{\partial x_3} = -2x_3 + 2\lambda_1 = 0\\[2pt]
\lambda_1 \dfrac{\partial L}{\partial \lambda_1} = \lambda_1(-1 + 3x_1 + x_2 + 2x_3) = 0\\[2pt]
\lambda_2 \dfrac{\partial L}{\partial \lambda_2} = \lambda_2 x_1 = 0\\[2pt]
\dfrac{\partial L}{\partial \lambda_1} = -1 + 3x_1 + x_2 + 2x_3 \ge 0\\[2pt]
\dfrac{\partial L}{\partial \lambda_2} = x_1 \ge 0\\[2pt]
\lambda_1 \ge 0, \quad \lambda_2 \ge 0
\end{cases}
\tag{27.29}
\]
We consider four cases, according to whether the multipliers λ1 and λ2 are zero or not.

Case 1: λ1 > 0 and λ2 > 0. The conditions λ2 ∂L/∂λ2 = ∂L/∂x1 = 0 imply x1 = 0 and 3λ1 + λ2 = 0. This last equation has no strictly positive solutions λ1 and λ2, and hence we conclude that we cannot have λ1 > 0 and λ2 > 0.

Case 2: λ1 = 0 and λ2 > 0. The conditions λ2 ∂L/∂λ2 = ∂L/∂x1 = 0 imply x1 = 0 and 3λ1 + λ2 = 0; since λ1 = 0, this gives λ2 = 0. This contradiction shows that we cannot have λ1 = 0 and λ2 > 0.

Case 3: λ1 > 0 and λ2 = 0. The conditions λ1 ∂L/∂λ1 = ∂L/∂x1 = ∂L/∂x2 = ∂L/∂x3 = 0 imply:
\[
\begin{cases}
-2x_1 + 3\lambda_1 = 0\\
-2x_2 + \lambda_1 = 0\\
-2x_3 + 2\lambda_1 = 0\\
3x_1 + x_2 + 2x_3 = 1
\end{cases}
\]
Solving for λ1, we get λ1 = 1/7, and hence x1 = 3/14, x2 = 1/14 and x3 = 1/7. The quintuple (3/14, 1/14, 1/7, 1/7, 0) solves the system (27.29), and hence (3/14, 1/14, 1/7) is a Kuhn-Tucker point.

Case 4: λ1 = λ2 = 0. The condition ∂L/∂x1 = 0 implies x1 = 0, while the conditions ∂L/∂x2 = ∂L/∂x3 = 0 imply x2 = x3 = 0. But then the condition ∂L/∂λ1 ≥ 0 implies −1 ≥ 0, and this contradiction shows that we cannot have λ1 = λ2 = 0.

In conclusion,
\[ T = \{(3/14, 1/14, 1/7)\} \]
and since f is strictly concave the short convex method allows us to conclude that (3/14, 1/14, 1/7) is the unique solution of the optimization problem (27.28). ▲
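Although the solution here was found analytically, a numerical cross-check may be reassuring. The following sketch (our illustration, not part of the text; it assumes Python with NumPy and SciPy) minimizes −f(x) = x1² + x2² + x3² under the two inequality constraints and compares the output with the Kuhn-Tucker point just found.

# Numerical cross-check of Example 1042 (illustrative sketch).
# SLSQP minimizes x1^2 + x2^2 + x3^2 subject to
# 3*x1 + x2 + 2*x3 >= 1 and x1 >= 0.
import numpy as np
from scipy.optimize import minimize

objective = lambda x: x @ x
constraints = [
    {"type": "ineq", "fun": lambda x: 3 * x[0] + x[1] + 2 * x[2] - 1},
    {"type": "ineq", "fun": lambda x: x[0]},
]
res = minimize(objective, x0=np.ones(3), method="SLSQP", constraints=constraints)

print(res.x)                     # ~ [0.2143, 0.0714, 0.1429]
print(np.array([3, 1, 2]) / 14)  # the exact solution (3/14, 1/14, 1/7)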

We conclude with a final important observation. The solution methods seen in this chapter are based on the search for Kuhn-Tucker points, and therefore require the resolution of systems of nonlinear equations. In general, these systems are not easy to solve, and this limits the computational usefulness of these methods, whose importance is mostly theoretical. At the numerical level, other methods are used, which the interested reader can find in books on numerical analysis.

27.5 Appendix: proof of a key lemma


We begin with a calculus delight.

Lemma 1043 (i) The function y = x|x| is C¹ on R and D(x|x|) = 2|x|. (ii) The square (x⁺)² of the function x⁺ = max{x, 0} is C¹ on R, and D(x⁺)² = 2x⁺.

Proof (i) Observe that x|x| is infinitely differentiable for x ≠ 0 and its first derivative is, by the product rule for differentiation,
\[ D(x|x|) = x\,D|x| + |x|\,Dx = x\,\frac{|x|}{x} + |x| = 2|x| \]
This is true for x ≠ 0. Now it suffices to invoke a classical result that asserts: let f : I → R be continuous on a real interval and differentiable on I \ {x₀}; if lim_{x→x₀} Df(x) = ℓ, then f is differentiable at x₀ and Df(x₀) = ℓ. As an immediate consequence, D(x|x|) = 2|x| also at x = 0. (ii) We have x⁺ = 2⁻¹(x + |x|). Therefore
\[ (x^+)^2 = \frac{1}{4}(x + |x|)^2 = \frac{1}{2}x^2 + \frac{1}{2}x|x| \]
It follows that (x⁺)² is C¹ and D(x⁺)² = x + |x| = 2x⁺.
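A quick finite-difference check of part (i) may also be illuminating; the snippet below (ours, assuming Python with NumPy) compares a central difference quotient of x|x| with 2|x|, including at the glue point x = 0 where the product rule alone does not apply.

# Finite-difference sanity check (illustrative) that D(x|x|) = 2|x|.
import numpy as np

f = lambda x: x * np.abs(x)
h = 1e-6
for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    numeric = (f(x + h) - f(x - h)) / (2 * h)   # central difference
    print(x, numeric, 2 * abs(x))               # the two columns agree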

Proof of Lemma 1029 Let ‖·‖ be the Euclidean norm. We have hⱼ(x̂) < cⱼ for each j ∉ A(x̂). Since A is open, there exists ε̃ > 0 sufficiently small such that B_ε̃(x̂) = {x ∈ A : ‖x − x̂‖ ≤ ε̃} ⊆ A. Moreover, since each hⱼ is continuous, for each j ∉ A(x̂) there exists εⱼ sufficiently small such that hⱼ(x) < cⱼ for each x ∈ B_{εⱼ}(x̂) = {x ∈ A : ‖x − x̂‖ ≤ εⱼ}. Let ε₀ = min_{j∉A(x̂)} εⱼ and ε̂ = min{ε̃, ε₀}; in other words, ε̂ is the minimum between ε̃ and the εⱼ. In this way we have B_ε̂(x̂) = {x ∈ A : ‖x − x̂‖ ≤ ε̂} ⊆ A and hⱼ(x) < cⱼ for each x ∈ B_ε̂(x̂) and each j ∉ A(x̂).

Given ε ∈ (0, ε̂], the set S_ε(x̂) = {x ∈ A : ‖x − x̂‖ = ε} is compact. Moreover, by what we have just seen, hⱼ(x) < cⱼ for each x ∈ S_ε(x̂) and each j ∉ A(x̂); that is, on S_ε(x̂) all the non-binding constraints are always satisfied.

For each j ∈ J, let h̃ⱼ : A ⊆ Rⁿ → R be defined as
\[ \tilde h_j(x) = \max\{h_j(x) - c_j, 0\} = (h_j(x) - c_j)^+ \]

for each x ∈ A. By Lemma 1043, h̃ⱼ² ∈ C¹(A) and
\[ \frac{\partial \tilde h_j^2(x)}{\partial x_p} = 2\,\tilde h_j(x)\,\frac{\partial h_j(x)}{\partial x_p} \qquad \forall p = 1, \dots, n \tag{27.30} \]
We first prove a property that we will use later.

Fact 1. For each ε ∈ (0, ε̂], there exists N > 0 such that
\[
f(x) - f(\hat x) - \|x - \hat x\|^2 - N\Bigl(\sum_{i\in I}(g_i(x) - g_i(\hat x))^2 + \sum_{j\in J\cap A(\hat x)}\bigl(\tilde h_j(x) - \tilde h_j(\hat x)\bigr)^2\Bigr) < 0
\tag{27.31}
\]
for each x ∈ S_ε(x̂).

Proof of Fact 1 We proceed by contradiction and assume that there exists ε ∈ (0, ε̂] for which there is no N > 0 such that (27.31) holds. Take an increasing sequence {Nₙ} with Nₙ ↑ +∞, and for each Nₙ take xₙ ∈ S_ε(x̂) for which (27.31) does not hold, that is, xₙ such that:
\[
f(x_n) - f(\hat x) - \|x_n - \hat x\|^2 - N_n\Bigl(\sum_{i\in I}(g_i(x_n) - g_i(\hat x))^2 + \sum_{j\in J\cap A(\hat x)}(\tilde h_j(x_n) - \tilde h_j(\hat x))^2\Bigr) \ge 0
\]
Hence, for each n ≥ 1 we have:
\[
\frac{f(x_n) - f(\hat x) - \|x_n - \hat x\|^2}{N_n} \ge \sum_{i\in I}(g_i(x_n) - g_i(\hat x))^2 + \sum_{j\in J\cap A(\hat x)}(\tilde h_j(x_n) - \tilde h_j(\hat x))^2
\tag{27.32}
\]
Since the sequence {xₙ} just constructed is contained in the compact set S_ε(x̂), by the Bolzano-Weierstrass Theorem there exists a subsequence {x_{n_k}} convergent in S_ε(x̂), i.e., there exists x* ∈ S_ε(x̂) such that x_{n_k} → x*. Inequality (27.32) implies that, for each k ≥ 1, we have:
\[
\frac{f(x_{n_k}) - f(\hat x) - \|x_{n_k} - \hat x\|^2}{N_{n_k}} \ge \sum_{i\in I}(g_i(x_{n_k}) - g_i(\hat x))^2 + \sum_{j\in J\cap A(\hat x)}(\tilde h_j(x_{n_k}) - \tilde h_j(\hat x))^2
\tag{27.33}
\]
Since f is continuous, we have lim_k f(x_{n_k}) = f(x*). Moreover, lim_k ‖x_{n_k} − x̂‖ = ‖x* − x̂‖. Since lim_k N_{n_k} = +∞, we have
\[
\lim_k \frac{f(x_{n_k}) - f(\hat x) - \|x_{n_k} - \hat x\|^2}{N_{n_k}} = 0
\]

and hence (27.33) implies, thanks to the continuity of the functions gᵢ and h̃ⱼ,
\[
\sum_{i\in I}(g_i(x^*) - g_i(\hat x))^2 + \sum_{j\in J\cap A(\hat x)}(\tilde h_j(x^*) - \tilde h_j(\hat x))^2
= \lim_k \Bigl(\sum_{i\in I}(g_i(x_{n_k}) - g_i(\hat x))^2 + \sum_{j\in J\cap A(\hat x)}(\tilde h_j(x_{n_k}) - \tilde h_j(\hat x))^2\Bigr) = 0
\]
It follows that (gᵢ(x*) − gᵢ(x̂))² = (h̃ⱼ(x*) − h̃ⱼ(x̂))² = 0 for each i ∈ I and for each j ∈ J ∩ A(x̂), from which gᵢ(x*) = gᵢ(x̂) = bᵢ for each i ∈ I and h̃ⱼ(x*) = h̃ⱼ(x̂) = 0, that is, hⱼ(x*) ≤ cⱼ, for each j ∈ J ∩ A(x̂).

Since on S_ε(x̂) the non-binding constraints are always satisfied, i.e., hⱼ(x) < cⱼ for each x ∈ S_ε(x̂) and each j ∉ A(x̂), we can conclude that x* satisfies all the constraints. We therefore have f(x̂) ≥ f(x*), given that x̂ solves the optimization problem.

On the other hand, since x_{n_k} ∈ S_ε(x̂) for each k ≥ 1, (27.33) implies
\[
f(x_{n_k}) - f(\hat x) \ge \|x_{n_k} - \hat x\|^2 + N_{n_k}\Bigl(\sum_{i\in I}(g_i(x_{n_k}) - g_i(\hat x))^2 + \sum_{j\in J\cap A(\hat x)}(\tilde h_j(x_{n_k}) - \tilde h_j(\hat x))^2\Bigr) \ge \varepsilon^2
\]
for each k ≥ 1, and hence f(x_{n_k}) ≥ f(x̂) + ε² for each k ≥ 1. Thanks to the continuity of f, this leads to
\[ f(x^*) = \lim_k f(x_{n_k}) \ge f(\hat x) + \varepsilon^2 > f(\hat x) \]
which contradicts f(x̂) ≥ f(x*). This contradiction proves Fact 1. △

Using Fact 1, we now prove a second property that we will need. Here we set S = S_{R^{|I|+|J|+1}} = {x ∈ R^{|I|+|J|+1} : ‖x‖ = 1}.

Fact 2. For each ε ∈ (0, ε̂], there exist x_ε ∈ B_ε(x̂) and a vector
\[ (\lambda_0^\varepsilon, \lambda_1^\varepsilon, \dots, \lambda_{|I|}^\varepsilon, \mu_1^\varepsilon, \dots, \mu_{|J|}^\varepsilon) \in S \]
with μⱼᵋ ≥ 0 for each j ∈ J, such that
\[
\lambda_0^\varepsilon\Bigl(\frac{\partial f}{\partial x_z}(x_\varepsilon) - 2(x_z^\varepsilon - \hat x_z)\Bigr) - \sum_{i\in I}\lambda_i^\varepsilon\frac{\partial g_i}{\partial x_z}(x_\varepsilon) - \sum_{j\in J\cap A(\hat x)}\mu_j^\varepsilon\frac{\partial h_j}{\partial x_z}(x_\varepsilon) = 0
\tag{27.34}
\]
for each z = 1, ..., n.

Proof of Fact 2 Given ε ∈ (0, ε̂], let N_ε > 0 be the positive constant whose existence is guaranteed by Fact 1. Define the function φ_ε : A ⊆ Rⁿ → R by
\[
\varphi_\varepsilon(x) = f(x) - f(\hat x) - \|x - \hat x\|^2 - N_\varepsilon\Bigl(\sum_{i\in I}(g_i(x) - g_i(\hat x))^2 + \sum_{j\in J\cap A(\hat x)}(\tilde h_j(x) - \tilde h_j(\hat x))^2\Bigr)
\]
for each x ∈ A. We have φ_ε(x̂) = 0 and, given how N_ε has been chosen,
\[ \varphi_\varepsilon(x) < 0 \qquad \forall x \in S_\varepsilon(\hat x) \tag{27.35} \]
The function φ_ε is continuous on the compact set B_ε(x̂) = {x ∈ A : ‖x − x̂‖ ≤ ε} and, by Weierstrass' Theorem, there exists x_ε ∈ B_ε(x̂) such that φ_ε(x_ε) ≥ φ_ε(x) for each x ∈ B_ε(x̂). In particular, φ_ε(x_ε) ≥ φ_ε(x̂) = 0, and hence (27.35) implies that ‖x_ε − x̂‖ < ε, that is, x_ε is interior to B_ε(x̂). The point x_ε is therefore a maximizer on an open set and by Fermat's Theorem we have ∇φ_ε(x_ε) = 0. Therefore, by (27.30), we have:
\[
\frac{\partial f}{\partial x_z}(x_\varepsilon) - 2(x_z^\varepsilon - \hat x_z) - 2N_\varepsilon\Bigl(\sum_{i\in I}(g_i(x_\varepsilon) - b_i)\frac{\partial g_i}{\partial x_z}(x_\varepsilon) + \sum_{j\in J\cap A(\hat x)}\tilde h_j(x_\varepsilon)\frac{\partial h_j}{\partial x_z}(x_\varepsilon)\Bigr) = 0 \tag{27.36}
\]
for each z = 1, ..., n, where we used gᵢ(x̂) = bᵢ and h̃ⱼ(x̂) = 0 for j ∈ A(x̂). Set:
\[
c_\varepsilon = \Bigl(1 + \sum_{i\in I}\bigl(2N_\varepsilon(g_i(x_\varepsilon) - b_i)\bigr)^2 + \sum_{j\in J\cap A(\hat x)}\bigl(2N_\varepsilon\tilde h_j(x_\varepsilon)\bigr)^2\Bigr)^{1/2}, \qquad \lambda_0^\varepsilon = \frac{1}{c_\varepsilon}
\]
\[
\lambda_i^\varepsilon = \frac{2N_\varepsilon(g_i(x_\varepsilon) - b_i)}{c_\varepsilon} \;\;\forall i \in I, \qquad
\mu_j^\varepsilon = \frac{2N_\varepsilon\tilde h_j(x_\varepsilon)}{c_\varepsilon} \;\;\forall j \in J\cap A(\hat x), \qquad
\mu_j^\varepsilon = 0 \;\;\forall j \notin A(\hat x)
\]
so that (27.34) is obtained by dividing (27.36) by c_ε. Observe that μⱼᵋ ≥ 0 for each j ∈ J and that (λ₀ᵋ)² + Σ_{i∈I}(λᵢᵋ)² + Σ_{j∈J}(μⱼᵋ)² = 1, i.e., (λ₀ᵋ, λ₁ᵋ, ..., λ_{|I|}ᵋ, μ₁ᵋ, ..., μ_{|J|}ᵋ) ∈ S. △

Using Fact 2, we can now complete the proof. Take a decreasing sequence {εₙ} ⊆ (0, ε̂] with εₙ ↓ 0, and consider the associated sequence {(λ₀ⁿ, λ₁ⁿ, ..., λ_{|I|}ⁿ, μ₁ⁿ, ..., μ_{|J|}ⁿ)}ₙ ⊆ S whose existence is guaranteed by Fact 2.
Since this sequence is contained in the compact set S, by the Bolzano-Weierstrass Theorem there exists a subsequence convergent in S, that is, there exists (λ₀, λ₁, ..., λ_{|I|}, μ₁, ..., μ_{|J|}) ∈ S such that
\[
(\lambda_0^{n_k}, \lambda_1^{n_k}, \dots, \lambda_{|I|}^{n_k}, \mu_1^{n_k}, \dots, \mu_{|J|}^{n_k}) \to (\lambda_0, \lambda_1, \dots, \lambda_{|I|}, \mu_1, \dots, \mu_{|J|})
\]
By Fact 2, for each ε_{n_k} there exists x_{n_k} ∈ B_{ε_{n_k}}(x̂) for which (27.34) holds, i.e.,
\[
\lambda_0^{n_k}\Bigl(\frac{\partial f}{\partial x_z}(x_{n_k}) - 2(x_z^{n_k} - \hat x_z)\Bigr) - \sum_{i\in I}\lambda_i^{n_k}\frac{\partial g_i}{\partial x_z}(x_{n_k}) - \sum_{j\in J\cap A(\hat x)}\mu_j^{n_k}\frac{\partial h_j}{\partial x_z}(x_{n_k}) = 0
\]
for each z = 1, ..., n. Consider the sequence {x_{n_k}} so constructed. From x_{n_k} ∈ B_{ε_{n_k}}(x̂) it follows that ‖x_{n_k} − x̂‖ < ε_{n_k} → 0 and hence, for each z = 1, ..., n,
\[
\lambda_0\frac{\partial f}{\partial x_z}(\hat x) - \sum_{i\in I}\lambda_i\frac{\partial g_i}{\partial x_z}(\hat x) - \sum_{j\in J\cap A(\hat x)}\mu_j\frac{\partial h_j}{\partial x_z}(\hat x)
= \lim_k\Bigl(\lambda_0^{n_k}\Bigl(\frac{\partial f}{\partial x_z}(x_{n_k}) - 2(x_z^{n_k} - \hat x_z)\Bigr) - \sum_{i\in I}\lambda_i^{n_k}\frac{\partial g_i}{\partial x_z}(x_{n_k}) - \sum_{j\in J\cap A(\hat x)}\mu_j^{n_k}\frac{\partial h_j}{\partial x_z}(x_{n_k})\Bigr) = 0
\tag{27.37}
\]
On the other hand, λ₀ ≠ 0. Indeed, if it were λ₀ = 0, then by (27.37) it would follow that
\[
\sum_{i\in I}\lambda_i\frac{\partial g_i}{\partial x_z}(\hat x) + \sum_{j\in J\cap A(\hat x)}\mu_j\frac{\partial h_j}{\partial x_z}(\hat x) = 0 \qquad \forall z = 1, \dots, n
\]
The linear independence of the gradients associated with the constraints, which holds by the hypothesis of regularity of the constraints, implies λᵢ = 0 for each i ∈ I and μⱼ = 0 for each j ∈ J ∩ A(x̂); since also μⱼ = 0 for j ∉ A(x̂), this contradicts (λ₀, λ₁, ..., λ_{|I|}, μ₁, ..., μ_{|J|}) ∈ S.
In conclusion, if we set λ̂ᵢ = λᵢ/λ₀ for each i ∈ I and μ̂ⱼ = μⱼ/λ₀ for each j ∈ J, (27.37) implies (27.8).
Chapter 28

General constraints

28.1 A general concave problem


The choice set of the optimization problem (27.4) of the previous chapter is identified by a finite number of equality and inequality constraints expressed through suitable functions g and h. In general, however, we may also require solutions to belong to a set X that is not necessarily identified through a finite number of functional constraints.¹ We thus have the following optimization problem:
\[ \max_x f(x) \tag{28.1} \]
\[ \text{sub } g_i(x) = b_i \;\;\forall i \in I, \qquad h_j(x) \le c_j \;\;\forall j \in J, \qquad x \in X \]
where X is a subset of A and the other elements are as in the optimization problem (27.4). This problem includes as special cases the optimization problems that we have seen so far: we get back to the optimization problem (27.4) when X = A, and to an unconstrained optimization problem when I = J = ∅ and C = X is open.
Formulation (28.1) may also be useful when there are conditions on the sign or on the value of the choice variables xᵢ. The classic example is the non-negativity condition on the xᵢ, which is best expressed as a constraint x ∈ Rⁿ₊ rather than through n inequalities xᵢ ≥ 0. Here a constraint of the form x ∈ X simplifies the exposition.

In this chapter we address the general optimization problem (28.1). If X is open, the solution techniques of Section 27.2 can be easily adapted by restricting the analysis to X itself (which can play the role of the set A). Matters are more interesting when X is not open. Here we focus on the concave case of Section 27.4, widely used in applications. Consequently, throughout the chapter X denotes a closed and convex subset of an open convex set A, f : A ⊆ Rⁿ → R is a concave differentiable objective function, the gᵢ : Rⁿ → R are affine functions, and the hⱼ : Rⁿ → R are convex differentiable functions.²

¹ Sometimes this distinction is made by talking of implicit and explicit constraints. Different authors, however, may give an opposite meaning to this terminology (which, in any case, we do not adopt).
² To ease matters, we define the functions gᵢ and hⱼ on the entire space Rⁿ. In particular, this means that the equality constraints can be represented in the matrix form (27.23).


28.2 Analysis of the black box

In canonical form, the optimization problem (28.1) has the form
\[ \max_x f(x) \quad \text{sub } x \in C \]
where the choice set is
\[ C = \{x \in X : g_i(x) = b_i \text{ and } h_j(x) \le c_j \;\;\forall i \in I, \forall j \in J\} \tag{28.2} \]
The set C is closed and convex. As is often the case, the best way to proceed is to abstract from the specific problem at hand, with its potentially distracting details. For this reason, we will consider the following optimization problem:
\[ \max_x f(x) \quad \text{sub } x \in C \tag{28.3} \]
where C is a generic closed and convex choice set that, for the moment, we treat as a black box. Throughout this section we assume that f is continuously differentiable on an open convex set that contains C.

28.2.1 Variational inequalities

We begin the analysis of the black box problem (28.3) with the simple scalar case
\[ \max_x f(x) \quad \text{sub } x \in [a, b] \tag{28.4} \]
where a, b ∈ R. Suppose that x̂ ∈ [a, b] is a solution. It is easy to see that there are two possible cases:

(i) x̂ is an interior point, i.e., x̂ ∈ (a, b); in this case, f′(x̂) = 0.

(ii) x̂ is a boundary point, i.e., x̂ ∈ {a, b}; in this case, f′(x̂) ≤ 0 if x̂ = a, while f′(x̂) ≥ 0 if x̂ = b.

The next lemma gives a simple and elegant way to unify these two cases.

Proposition 1044 If x̂ ∈ [a, b] is a solution of the optimization problem (28.4), then
\[ f'(\hat x)(x - \hat x) \le 0 \qquad \forall x \in [a, b] \tag{28.5} \]
The converse holds if f is concave.

The proof of this result rests on the following lemma.

Lemma 1045 Expression (28.5) is equivalent to f′(x̂) = 0 if x̂ ∈ (a, b), to f′(x̂) ≤ 0 if x̂ = a, and to f′(x̂) ≥ 0 if x̂ = b.

Proof We divide the proof in three parts, one for each of the equivalences to prove.
(i) Let x̂ ∈ (a, b). We prove that (28.5) is equivalent to f′(x̂) = 0. If f′(x̂) = 0 holds, then f′(x̂)(x − x̂) = 0 for each x ∈ [a, b], and hence (28.5) holds. Vice versa, suppose that (28.5) holds. Setting x = a, we have (a − x̂) < 0 and so (28.5) implies f′(x̂) ≥ 0. On the other hand, setting x = b, we have (b − x̂) > 0 and so (28.5) implies f′(x̂) ≤ 0. In conclusion, x̂ ∈ (a, b) implies f′(x̂) = 0.

(ii) Let x̂ = a. We prove that (28.5) is equivalent to f′(a) ≤ 0. Let f′(a) ≤ 0. Since (x − a) ≥ 0 for each x ∈ [a, b], it follows that f′(a)(x − a) ≤ 0 for each x ∈ [a, b], and hence (28.5) holds. Vice versa, suppose that (28.5) holds. Taking x ∈ (a, b], we have (x − a) > 0 and so (28.5) implies f′(a) ≤ 0.

(iii) Let x̂ = b. We prove that (28.5) is equivalent to f′(b) ≥ 0. Let f′(b) ≥ 0. Since (x − b) ≤ 0 for each x ∈ [a, b], we have f′(b)(x − b) ≤ 0 for each x ∈ [a, b] and (28.5) holds. Vice versa, suppose that (28.5) holds. Taking x ∈ [a, b), we have (x − b) < 0 and so (28.5) implies f′(b) ≥ 0.

Proof of Proposition 1044 In view of Lemma 1045, it only remains to prove that (28.5) becomes a sufficient condition when f is concave. Suppose therefore that f is concave and that x̂ ∈ [a, b] is such that (28.5) holds. We prove that this implies that x̂ is a solution of problem (28.4). Indeed, by (22.7) we have f(x) ≤ f(x̂) + f′(x̂)(x − x̂) for each x ∈ [a, b], which implies f(x) − f(x̂) ≤ f′(x̂)(x − x̂) for each x ∈ [a, b]. Thus, (28.5) implies that f(x) − f(x̂) ≤ 0, that is, f(x) ≤ f(x̂) for each x ∈ [a, b]. Hence, x̂ solves the optimization problem (28.4).

The inequality (28.5) that x̂ satisfies is an example of a variational inequality. Beyond unifying the two cases, this variational inequality is interesting because when f is concave it provides a necessary and sufficient condition for a point to be a solution of the optimization problem. Even more interesting is the fact that this characterization extends naturally to functions of several variables.

Theorem 1046 (Stampacchia) If x̂ ∈ C is a solution of the optimization problem (28.3), then it satisfies the variational inequality
\[ \nabla f(\hat x)\cdot(x - \hat x) \le 0 \qquad \forall x \in C \tag{28.6} \]
The converse holds if f is concave.

As in the scalar case, the variational inequality unifies the necessary optimality conditions for interior and boundary points. Indeed, it is easy to check that, when x̂ is an interior point of C, (28.6) reduces to the classic condition ∇f(x̂) = 0 of Fermat's Theorem.

Proof Let x̂ ∈ C be a solution of the optimization problem (28.3), i.e., f(x̂) ≥ f(x) for each x ∈ C. Given x ∈ C, set z_t = x̂ + t(x − x̂) for t ∈ [0, 1]. Since C is convex, z_t ∈ C for each

t ∈ [0, 1]. Define ϕ : [0, 1] → R by ϕ(t) = f(z_t). Since f is differentiable at x̂, we have
\[
\varphi'_+(0) = \lim_{t\to 0^+}\frac{\varphi(t) - \varphi(0)}{t} = \lim_{t\to 0^+}\frac{f(\hat x + t(x - \hat x)) - f(\hat x)}{t}
= \lim_{t\to 0^+}\frac{df(\hat x)(t(x - \hat x)) + o(\|t(x - \hat x)\|)}{t}
\]
\[
= df(\hat x)(x - \hat x) + \lim_{t\to 0^+}\frac{o(t\|x - \hat x\|)}{t} = df(\hat x)(x - \hat x) = \nabla f(\hat x)\cdot(x - \hat x)
\]
For each t ∈ [0, 1] we have ϕ(0) = f(x̂) ≥ f(z_t) = ϕ(t), and so ϕ : [0, 1] → R has a (global) maximizer at t = 0. It follows that ϕ′₊(0) ≤ 0, which implies ∇f(x̂)·(x − x̂) ≤ 0, as desired.
As to the converse, assume that f is concave. By (22.9), f(x) ≤ f(x̂) + ∇f(x̂)·(x − x̂) for each x ∈ C, and therefore (28.6) implies f(x) ≤ f(x̂) for each x ∈ C.

For the dual minimum problems, the variational inequality is easily seen to take the dual form ∇f(x̂)·(x − x̂) ≥ 0. For interior solutions, instead, the condition ∇f(x̂) = 0 is the same in both maximum and minimum problems.³
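To make the variational inequality concrete, here is a small numerical illustration (ours, assuming Python with NumPy): we maximize the concave f(x) = −‖x − p‖² on the closed unit ball C, whose solution is the projection x̂ = p/‖p‖ of p onto C, and verify (28.6) at sampled points of C.

# Illustrative check of Stampacchia's variational inequality (28.6).
import numpy as np

rng = np.random.default_rng(0)
p = np.array([2.0, 1.0])
xhat = p / np.linalg.norm(p)          # projection of p onto the unit ball
grad = -2 * (xhat - p)                # gradient of f(x) = -||x - p||^2 at xhat

for _ in range(5):
    x = rng.normal(size=2)
    x /= max(1.0, np.linalg.norm(x))  # force x into C
    print(grad @ (x - xhat) <= 1e-12) # True: the inequality holds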

28.2.2 A general first order condition

The normal cone N_C(x̄) of a convex set C with respect to a point x̄ ∈ C is given by
\[ N_C(\bar x) = \{y \in \mathbb{R}^n : y\cdot(x - \bar x) \le 0 \;\;\forall x \in C\} \]
Next we provide a couple of important properties of N_C(x̄). In particular, (ii) shows that N_C(x̄) is nontrivial only if x̄ is a boundary point.

Lemma 1047 (i) N_C(x̄) is a closed and convex cone; (ii) N_C(x̄) = {0} if and only if x̄ is an interior point of C.

Proof (i) The set N_C(x̄) is clearly closed. Moreover, given y, z ∈ N_C(x̄) and α, β ≥ 0, we have
\[ (\alpha y + \beta z)\cdot(x - \bar x) = \alpha\,y\cdot(x - \bar x) + \beta\,z\cdot(x - \bar x) \le 0 \qquad \forall x \in C \]
and so αy + βz ∈ N_C(x̄). By Proposition 634, N_C(x̄) is a convex cone. (ii) We only prove the "if" part. Let x̄ be an interior point of C. Suppose, by contradiction, that there is a vector y ≠ 0 in N_C(x̄). As x̄ is interior, we have that x̄ + ty ∈ C for t > 0 sufficiently small. Hence we would have y·(x̄ + ty − x̄) = t\,y·y = t‖y‖² ≤ 0. This implies y = 0, a contradiction. Hence N_C(x̄) = {0}.

To see the importance of normal cones, note that condition (28.6) can be written as:
\[ \nabla f(\hat x) \in N_C(\hat x) \tag{28.7} \]

³ The unifying power of variational inequalities in optimization is the outcome of a few works of Guido Stampacchia in the early 1960s. For an overview, see D. Kinderlehrer and G. Stampacchia, An Introduction to Variational Inequalities and Their Applications, Academic Press, 1980.

Therefore, x̂ solves the optimization problem (28.3) only if the gradient ∇f(x̂) belongs to the normal cone of C with respect to x̂. This way of writing condition (28.6) is useful because, given a set C, if we can describe the form that the normal cone takes (something that does not require any knowledge of the objective function f), we can then get a sense of the form that the "first order condition" takes for the optimization problems that have C as a choice set.
In other words, (28.7) can be seen as a general first order condition in which two parts can be distinguished: the part N_C(x̂), determined by the constraint C, and the part ∇f(x̂), determined by the objective function. This distinction between the roles of the objective function and of the constraint is illuminating.⁴

The next result characterizes the normal cone for convex cones.

Proposition 1048 If C is a convex cone and x̄ ∈ C, then
\[ N_C(\bar x) = \{y \in \mathbb{R}^n : y\cdot\bar x = 0 \text{ and } y\cdot x \le 0 \;\;\forall x \in C\} \]
If, in addition, C is a vector subspace, then N_C(x̄) = C^⊥ for every x̄ ∈ C.

Proof Let y ∈ N_C(x̄). Then y·(x − x̄) ≤ 0 for all x ∈ C. As 0 ∈ C, we have y·(0 − x̄) ≤ 0, hence y·x̄ ≥ 0. On the other hand, we can write y·x̄ = y·(2x̄ − x̄) ≤ 0. It follows that y·x̄ = 0. In turn, y·x = y·(x − x̄) ≤ 0 for each x ∈ C. Conversely, if y satisfies the two conditions y·x̄ = 0 and y·x ≤ 0 for each x ∈ C, then y·(x − x̄) = y·x − y·x̄ ≤ 0, and so y ∈ N_C(x̄). Suppose now, in addition, that C is a vector subspace. A subspace C is a cone such that x ∈ C implies −x ∈ C. Hence, the first part of the proof yields N_C(x̄) = {y ∈ Rⁿ : y·x̄ = 0 and y·x = 0 ∀x ∈ C}. Since x̄ ∈ C, we then have N_C(x̄) = {y ∈ Rⁿ : y·x = 0 ∀x ∈ C} = C^⊥.

Example 1049 If C = Rⁿ₊ and x̄ ∈ C, we have:
\[ N_C(\bar x) = \{y \in \mathbb{R}^n : y_i\bar x_i = 0 \text{ and } y_i \le 0 \;\;\forall i = 1, \dots, n\} \tag{28.8} \]
Indeed, by Proposition 1048 we have yᵢ ≤ 0 for each i since yᵢ = y·eⁱ ≤ 0 (note that eⁱ ∈ C). Hence, yᵢx̄ᵢ ≤ 0 for each i, which in turn implies yᵢx̄ᵢ = 0 for each i because y·x̄ = 0. ▲
This result implies that, given a closed and convex cone C, a point x̂ satisfies the first order condition (28.7) when
\[ \nabla f(\hat x)\cdot\hat x = 0 \tag{28.9} \]
\[ \nabla f(\hat x)\cdot x \le 0 \qquad \forall x \in C \tag{28.10} \]
The first order condition is thus easier to check on cones. Even more so in the important special case C = Rⁿ₊, when from (28.8) it follows that conditions (28.9) and (28.10) reduce to the following n equalities and n inequalities:
\[ \hat x_i\,\frac{\partial f}{\partial x_i}(\hat x) = 0 \tag{28.11} \]
\[ \frac{\partial f}{\partial x_i}(\hat x) \le 0 \tag{28.12} \]
for each i = 1, ..., n.

⁴ For an authoritative presentation of this viewpoint, we refer readers to R. T. Rockafellar, "Lagrange multipliers and optimality", SIAM Review, 35, 183-238, 1993.
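For instance (our illustration, assuming Python with NumPy), take the concave function f(x) = −(x1 − 1)² − (x2 + 1)² on C = R²₊, whose maximizer is x̂ = (1, 0); conditions (28.11) and (28.12) can then be checked directly.

# Illustrative check of the first order conditions (28.11)-(28.12) on R^n_+.
import numpy as np

xhat = np.array([1.0, 0.0])
grad = np.array([-2 * (xhat[0] - 1), -2 * (xhat[1] + 1)])  # = (0, -2)

print(np.allclose(xhat * grad, 0))   # (28.11): complementary slackness
print(np.all(grad <= 0))             # (28.12): gradient <= 0 componentwise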

We can also characterize the normal cones of the simplex Δ_{n−1} = {x ∈ Rⁿ₊ : Σ_{k=1}^n x_k = 1}, another all-important class of closed and convex sets. To this end, given x̄ ∈ Δ_{n−1} set
\[ I(\bar x) = \{y \in \mathbb{R}^n : y_i = 1 \;\;\forall i \in A(\bar x) \text{ and } y_i \le 1 \;\;\forall i \notin A(\bar x)\} \]
where A(x̄) = {i : x̄ᵢ > 0}.

Proposition 1050 We have N_{Δ_{n−1}}(x̄) = {λy ∈ Rⁿ : y ∈ I(x̄) and λ ≥ 0}.

The set {λy ∈ Rⁿ : y ∈ I(x̄) and λ ≥ 0} is easily seen to be the smallest convex cone that contains I(x̄). The normal cone is thus exactly this set.

Example 1051 If x̄ = (1/3, 0, 2/3) ∈ Δ₂, we have I(x̄) = {(1, y₂, 1) : y₂ ≤ 1} and N_{Δ₂}(x̄) = {(λ, y₂, λ) : y₂ ≤ λ and λ ≥ 0}. ▲

In view of this characterization, a point x̂ ∈ Δ_{n−1} satisfies the first order condition (28.7) if and only if there is a scalar λ̂ ≥ 0 such that
\[ \frac{\partial f}{\partial x_i}(\hat x) = \hat\lambda \;\text{ if } \hat x_i > 0, \qquad \frac{\partial f}{\partial x_i}(\hat x) \le \hat\lambda \;\text{ if } \hat x_i = 0 \]
that is, when
\[ \frac{\partial f}{\partial x_k}(\hat x) \le \hat\lambda \qquad \forall k = 1, \dots, n \tag{28.13} \]
\[ \Bigl(\frac{\partial f}{\partial x_k}(\hat x) - \hat\lambda\Bigr)\hat x_k = 0 \qquad \forall k = 1, \dots, n \tag{28.14} \]

Proof Suppose that A(x̄) is not a singleton and let i, j ∈ A(x̄). Clearly, 0 < x̄ᵢ, x̄ⱼ < 1. Consider the points x^ε ∈ Rⁿ having coordinates x^ε_i = x̄ᵢ + ε, x^ε_j = x̄ⱼ − ε, and x^ε_k = x̄ₖ for all k ≠ i, j, where the parameter ε runs over [−ε₀, ε₀] with ε₀ > 0 sufficiently small so that x^ε ≥ 0 for ε ∈ [−ε₀, ε₀]. Note that Σ_{k=1}^n x^ε_k = 1 and so x^ε ∈ Δ_{n−1}. Let y ∈ N_{Δ_{n−1}}(x̄). By definition, y·(x^ε − x̄) ≤ 0 for every ε ∈ [−ε₀, ε₀]. Namely, εyᵢ − εyⱼ = ε(yᵢ − yⱼ) ≤ 0 for every such ε, which implies yᵢ = yⱼ. Hence, it must hold that yᵢ = λ for all i ∈ A(x̄): the values of y are constant on A(x̄). This is trivially true when A(x̄) is a singleton. Let now j ∉ A(x̄). Consider the vector xʲ ∈ Rⁿ, where xʲⱼ = 1 and xʲₖ = 0 for each k ≠ j. If y ∈ N_{Δ_{n−1}}(x̄), then y·(xʲ − x̄) ≤ 0. That is,
\[ y_j - \sum_{k\ne j} y_k\bar x_k = y_j - \sum_{k\in A(\bar x)} y_k\bar x_k = y_j - \lambda\sum_{k\in A(\bar x)}\bar x_k = y_j - \lambda \le 0 \]
Therefore, N_{Δ_{n−1}}(x̄) ⊆ {λy ∈ Rⁿ : y ∈ I(x̄) and λ ≥ 0}. We now show the converse inclusion. Let y ∈ Rⁿ be such that, for some λ ≥ 0, we have yᵢ = λ for all i ∈ A(x̄) and yₖ ≤ λ for each k ∉ A(x̄). If x ∈ Δ_{n−1}, then
\[
y\cdot(x - \bar x) = \sum_{i=1}^n y_i(x_i - \bar x_i) = \sum_{i\in A(\bar x)} y_i(x_i - \bar x_i) + \sum_{i\notin A(\bar x)} y_i x_i
= \lambda\Bigl(\sum_{i\in A(\bar x)} x_i - 1\Bigr) + \sum_{i\notin A(\bar x)} y_i x_i
\]
\[
\le \lambda\Bigl(\sum_{i\in A(\bar x)} x_i - 1\Bigr) + \lambda\sum_{i\notin A(\bar x)} x_i = \lambda\Bigl(\sum_{i=1}^n x_i - 1\Bigr) = 0
\]
Hence y ∈ N_{Δ_{n−1}}(x̄).
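The first order conditions (28.13)-(28.14) can also be illustrated numerically (our sketch, assuming Python with NumPy): for f(x) = −‖x − q‖² on Δ₂, the maximizer is the Euclidean projection of q onto the simplex, here x̂ = (0.6, 0.4, 0), and the common value λ̂ emerges on the support of x̂.

# Illustrative check of the simplex first order conditions (28.13)-(28.14).
import numpy as np

q = np.array([0.7, 0.5, -0.2])
xhat = np.array([0.6, 0.4, 0.0])            # projection of q onto the simplex
grad = -2 * (xhat - q)                      # = (0.2, 0.2, -0.4)

lam = grad[xhat > 0][0]                     # common value on the support
print(np.all(grad <= lam + 1e-12))          # (28.13)
print(np.allclose((grad - lam) * xhat, 0))  # (28.14)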

28.2.3 Divide et impera

Often the choice set C may be written as an intersection C = C₁ ∩ ⋯ ∩ Cₙ. A natural question is whether the n relaxed optimization problems that correspond to the larger choice sets Cᵢ can then be combined to shed light on the original optimization problem. The next result is key, as it provides a condition under which an "intersection rule" holds for normal cones.

Proposition 1052 Let C = C₁ ∩ ⋯ ∩ Cₙ, with each Cᵢ closed and convex. Then, for all x̄ ∈ C,
\[ \Bigl\{\sum_{i=1}^n y_i : y_i \in N_{C_i}(\bar x) \;\;\forall i = 1, \dots, n\Bigr\} \subseteq N_C(\bar x) \]
Equality holds if C satisfies Slater's condition, i.e., int C₁ ∩ ⋯ ∩ int Cₙ ≠ ∅, where the set Cᵢ itself can replace its interior int Cᵢ if it is affine.

Proof Let x̄ ∈ C. Suppose y = Σᵢ₌₁ⁿ yᵢ, with yᵢ ∈ N_{Cᵢ}(x̄) for every i = 1, ..., n. Then y·(x − x̄) = Σᵢ₌₁ⁿ yᵢ·(x − x̄) ≤ 0 for every x ∈ C, and so y ∈ N_C(x̄). This proves the inclusion. We omit the proof that Slater's condition implies equality.

In words, under Slater's condition the normal cone of an intersection of sets is the sum of their normal cones. Hence, a point x̂ satisfies the first order condition (28.7) if and only if there is a vector ŷ = (ŷ₁, ..., ŷₙ) such that
\[ \nabla f(\hat x) = \sum_{i=1}^n \hat y_i, \qquad \hat y_i \in N_{C_i}(\hat x) \;\;\forall i = 1, \dots, n \]
A familiar "multipliers" format emerges. The next section shows how Kuhn-Tucker's Theorem fits into this general framework.

28.3 Resolution of the general concave problem

We can now get out of the black box and extend Kuhn-Tucker's Theorem to the general concave optimization problem (28.1). Its choice set (28.2) is
\[ C = X \cap \bigcap_{i\in I} C_i \cap \bigcap_{j\in J} C_j \]
where Cᵢ = (gᵢ = bᵢ) and Cⱼ = (hⱼ ≤ cⱼ).

Lemma 1053 The set C satisfies Slater's condition if there is x̄ ∈ int X such that gᵢ(x̄) = bᵢ for all i ∈ I and hⱼ(x̄) < cⱼ for all j ∈ J.

Proof The level sets Cᵢ are affine (Proposition 603). Since x̄ ∈ int X ∩ ⋂_{i∈I} Cᵢ ∩ ⋂_{j∈J} int Cⱼ, such intersection is non-empty and so C satisfies Slater's condition.

In what follows we thus assume the existence of such an x̄.⁵ In view of Proposition 1052, it now becomes key to characterize the normal cones of the sets Cᵢ and Cⱼ.

Lemma 1054 (i) For each x̄ ∈ Cᵢ, we have N_{Cᵢ}(x̄) = {λ∇gᵢ(x̄) : λ ∈ R}; (ii) for each x̄, we have
\[
N_{C_j}(\bar x) = \begin{cases} \{\mu\nabla h_j(\bar x) : \mu \ge 0\} & \text{if } h_j(\bar x) = c_j\\ \{0\} & \text{if } h_j(\bar x) < c_j\\ \emptyset & \text{if } h_j(\bar x) > c_j \end{cases}
\]

Proof We only prove (ii) when hⱼ(x̄) = cⱼ. Assume cⱼ = 0 (otherwise, it is enough to consider the convex function hⱼ − cⱼ). Let hⱼ(x̄) = 0. We show that {μ∇hⱼ(x̄) : μ ≥ 0} = N_{Cⱼ}(x̄). Let y ∈ N_{Cⱼ}(x̄). Since hⱼ(x̄) = 0, we have hⱼ(x) ≥ hⱼ(x̄) + y·(x − x̄) for all x ∈ Cⱼ, and so y = μ∇hⱼ(x̄) for some μ ≥ 0 since hⱼ is differentiable at x̄ (cf. Theorem 935). Conversely, if y = μ∇hⱼ(x̄) for some μ ≥ 0, then 0 ≥ μhⱼ(x) ≥ μ(hⱼ(x̄) + ∇hⱼ(x̄)·(x − x̄)) = y·(x − x̄) since hⱼ(x̄) = 0 and x ∈ Cⱼ. Hence, μ∇hⱼ(x̄) ∈ N_{Cⱼ}(x̄). We omit the cases hⱼ(x̄) < 0 and hⱼ(x̄) > 0.

Along with Proposition 1052, this lemma implies
\[
N_C(x) = \Bigl\{\theta + \sum_{i\in I}\lambda_i\nabla g_i(x) + \sum_{j\in A(x)}\mu_j\nabla h_j(x) : \theta \in N_X(x),\; \lambda_i \in \mathbb{R} \;\forall i \in I,\; \mu_j \ge 0 \;\forall j \in A(x)\Bigr\}
\]
where A(x) is the collection of the binding inequality constraints defined in (27.7). Since here the first order condition (28.7) is a necessary and sufficient optimality condition, we can say that x̂ ∈ C solves the optimization problem (28.1) if and only if there exists a triple of vectors (λ̂, μ̂, θ̂) ∈ R^|I| × R₊^|J| × Rⁿ, with θ̂ ∈ N_X(x̂), such that
\[ \nabla f(\hat x) = \hat\theta + \sum_{i\in I}\hat\lambda_i\nabla g_i(\hat x) + \sum_{j\in J}\hat\mu_j\nabla h_j(\hat x) \tag{28.15} \]
\[ \hat\mu_j(c_j - h_j(\hat x)) = 0 \qquad \forall j \in J \tag{28.16} \]
Indeed, as we noted in Lemma 1030, condition (28.16) amounts to requiring μ̂ⱼ = 0 for each j ∉ A(x̂).
To sum up, under Slater's condition we recover the Kuhn-Tucker conditions (27.8) and (27.9), suitably modified to cope with the new constraint x ∈ X. We leave to the reader the Lagrangian formulation of these conditions.

⁵ This also ensures that the problem is well posed in the sense of Definition 1026.

Example 1055 (i) Let X = Rⁿ₊. By (28.8), θ̂ₖx̂ₖ = 0 and θ̂ₖ ≤ 0 for each k = 1, ..., n. By (28.15), we have
\[ \hat\theta = \nabla f(\hat x) - \sum_{i\in I}\hat\lambda_i\nabla g_i(\hat x) - \sum_{j\in J}\hat\mu_j\nabla h_j(\hat x) \tag{28.17} \]
So, conditions (28.15) and (28.16) can be equivalently written (with gradients unzipped) as:
\[ \frac{\partial f}{\partial x_k}(\hat x) \le \sum_{i\in I}\hat\lambda_i\frac{\partial g_i}{\partial x_k}(\hat x) + \sum_{j\in J}\hat\mu_j\frac{\partial h_j}{\partial x_k}(\hat x) \qquad \forall k = 1, \dots, n \]
\[ \hat\mu_j(c_j - h_j(\hat x)) = 0 \qquad \forall j \in J \]
\[ \Bigl(\frac{\partial f}{\partial x_k}(\hat x) - \sum_{i\in I}\hat\lambda_i\frac{\partial g_i}{\partial x_k}(\hat x) - \sum_{j\in J}\hat\mu_j\frac{\partial h_j}{\partial x_k}(\hat x)\Bigr)\hat x_k = 0 \qquad \forall k = 1, \dots, n \]
In this formulation, we can omit θ̂.

(ii) Let X = Δ_{n−1}. By (28.13) and (28.14), θ̂ ∈ N_X(x̂) if and only if there is some λ̂ ≥ 0 such that θ̂ₖ ≤ λ̂ and (θ̂ₖ − λ̂)x̂ₖ = 0 for every k = 1, ..., n. In view of (28.17), we can say that x̂ ∈ C solves the optimization problem (28.1) if and only if there exists a triple (λ̂, μ̂, λ̂) ∈ R^|I| × R₊^|J| × R₊ such that
\[ \frac{\partial f}{\partial x_k}(\hat x) - \sum_{i\in I}\hat\lambda_i\frac{\partial g_i}{\partial x_k}(\hat x) - \sum_{j\in J}\hat\mu_j\frac{\partial h_j}{\partial x_k}(\hat x) \le \hat\lambda \qquad \forall k = 1, \dots, n \]
\[ \hat\mu_j(c_j - h_j(\hat x)) = 0 \qquad \forall j \in J \]
\[ \Bigl(\frac{\partial f}{\partial x_k}(\hat x) - \sum_{i\in I}\hat\lambda_i\frac{\partial g_i}{\partial x_k}(\hat x) - \sum_{j\in J}\hat\mu_j\frac{\partial h_j}{\partial x_k}(\hat x) - \hat\lambda\Bigr)\hat x_k = 0 \qquad \forall k = 1, \dots, n \]
In this formulation, we replace the vector θ̂ with the scalar λ̂. ▲

Finally, note that variational inequalities provide a third approach to theorems à la Lagrange/Kuhn-Tucker. Indeed, Lagrange's Theorem was proved using the Implicit Function Theorem (Lemma 1012) and the local version of Kuhn-Tucker's Theorem using a penalization technique (Lemma 1029). Different techniques may require different regularity conditions. For instance, Slater's condition comes up when using variational inequalities, while a linear independence condition was used in the previous chapter (Definition 1028). In general, they provide different angles on the multipliers format.
Chapter 29

Parametric optimization problems

29.1 Preamble: correspondences


29.1.1 Definition
Given any two sets X and Y, a correspondence φ : A ⊆ X ⇒ Y is a rule that, to each element x ∈ A, associates a non-empty subset φ(x) of Y (the image of x under φ). The set A is the domain of φ and Y is the codomain.
When φ(x) is a singleton for all x ∈ A, the correspondence reduces to a function φ : A ⊆ X → Y.

Example 1056 (i) Let X = Y = R and consider the correspondence φ : R ⇒ R given by φ(x) = [−|x|, |x|]. For instance, φ(1) = φ(−1) = [−1, 1] and φ(0) = {0}. (ii) The budget correspondence B : Rⁿ⁺¹₊ ⇒ Rⁿ₊ is defined by B(p, w) = {x ∈ Rⁿ₊ : p·x ≤ w}.¹ Note that B(p, w) ≠ ∅ for all (p, w) ∈ Rⁿ⁺¹₊ since 0 ∈ B(p, w) for all (p, w) ∈ Rⁿ⁺¹₊. ▲

Unless otherwise stated, from now on we assume that X is a subset of Rⁿ and that Y is a subset of Rᵐ. We say that φ is:

(i) closed-valued if φ(x) is a closed subset of Y for every x ∈ A;

(ii) compact-valued if φ(x) is a compact subset of Y for every x ∈ A;

(iii) convex-valued if φ(x) is a convex subset of Y for every x ∈ A.

Example 1057 (i) A function f : X → R is trivially both compact-valued and convex-valued. (ii) The budget correspondence is convex-valued. Since the budget set is compact if p ≫ 0 (and the consumption set is closed), the budget correspondence is compact-valued only when restricted to Rⁿ₊₊ × R₊. (iii) Let f : X → Y be a function between any two sets X and Y. The inverse correspondence f⁻¹ : Im f ⇒ X is defined by f⁻¹(y) = {x ∈ X : f(x) = y}. If f is injective, we get back to the inverse function f⁻¹ : Im f → X. For instance, if f : R → R is the parabola f(x) = x², then f⁻¹(y) = {−√y, √y} for all y ∈ Im f, i.e., for all y ≥ 0. ▲
¹ To ease matters, in this chapter we drop the set A from the definition of the budget set (cf. Section 16.1.3).


29.1.2 Graph

The graph Gr φ of a correspondence φ : A ⊆ X ⇒ Y is the set
\[ \operatorname{Gr}\varphi = \{(x, y) \in A \times Y : y \in \varphi(x)\} \]
Like the graph of a function, the graph of a correspondence is a subset of X × Y.

Example 1058 (i) The graph of the correspondence φ : R ⇒ R given by φ(x) = [−|x|, |x|] is Gr φ = {(x, y) ∈ R² : −|x| ≤ y ≤ |x|}, the "bowtie" region enclosed by the lines y = x and y = −x.

[figure omitted]

(ii) The graph of the budget correspondence B : Rⁿ⁺¹₊ ⇒ Rⁿ₊ is
\[ \operatorname{Gr} B = \{(p, w, x) \in \mathbb{R}^n_+ \times \mathbb{R}_+ \times \mathbb{R}^n_+ : x \in B(p, w)\} \]

It is easy to see that φ is:

(i) closed-valued when its graph Gr φ is a closed subset of X × Y;

(ii) convex-valued when its graph Gr φ is a convex subset of X × Y.

The converse implications are false: closedness and convexity of the graph of φ are significantly stronger assumptions than the closedness and convexity of the images φ(x). This is best seen by considering scalar functions, as the next examples show.

Example 1059 (i) Consider f : R → R given by
\[ f(x) = \begin{cases} x & \text{if } x < 0\\ 1 & \text{if } x \ge 0 \end{cases} \]
Since f is a function, it is both closed-valued and convex-valued. However,
\[ \operatorname{Gr} f = \{(x, x) : x < 0\} \cup \{(x, 1) : x \ge 0\} \]
is neither closed nor convex.

[figure omitted]

The lack of convexity is obvious. To see that Gr f is not closed, observe that the origin (0, 0) is a boundary point that does not belong to Gr f. (ii) A continuous scalar function f : R → R has convex graph if and only if it is affine. The "if" is obvious. As to the "only if," suppose that Gr f is convex. Given any x, y ∈ R and any α ∈ [0, 1], then (αx + (1 − α)y, αf(x) + (1 − α)f(y)) ∈ Gr f, that is, f(αx + (1 − α)y) = αf(x) + (1 − α)f(y). By standard results on the Cauchy functional equation, this implies that there exist m, q ∈ R such that f(x) = mx + q. ▲

29.2 Parametric optimization problems

Given a set Θ ⊆ Rᵏ of parameters and an all-inclusive choice space A ⊆ Rⁿ, suppose that each value of the parameter vector θ determines a choice (or feasible) set φ(θ) ⊆ A. That is, choice sets are identified, as the parameter varies, by a feasibility correspondence φ : Θ ⇒ A.
Consider an objective function f defined over the graph of the correspondence φ, i.e., f : A × Θ → R. This objective function has to be optimized on the feasible sets determined by the correspondence φ : Θ ⇒ A. Jointly, φ and f determine an optimization problem in parametric form:
\[ \max_x f(x, \theta) \quad \text{sub } x \in \varphi(\theta) \tag{29.1} \]
When f is concave (quasi-concave) in x and φ is convex-valued, this problem is called concave (quasi-concave).

A point x̂ ∈ φ(θ) is a solution (or optimal choice) for θ ∈ Θ if it is an optimal choice given θ, that is, f(x̂, θ) ≥ f(x, θ) for each x ∈ φ(θ). The solution (or optimal choice) correspondence σ : S ⇒ A of the parametric optimization problem is defined by
\[ \sigma(\theta) = \arg\max_{x\in\varphi(\theta)} f(x, \theta) \]
That is, the correspondence σ collects all solutions of problem (29.1). Its domain S is the solution domain, that is, the collection of all θ ∈ Θ for which problem (29.1) admits a solution.

If the solution is unique at every θ ∈ S, then σ is single-valued, that is, it is a function. In this case we say that σ is the solution function of problem (29.1).

The (optimal) value function v : S → R of the parametric optimization problem is defined by
\[ v(\theta) = \max\{f(x, \theta) : x \in \varphi(\theta)\} \tag{29.2} \]
for each θ ∈ S, that is, v(θ) = f(x̂, θ) for every x̂ ∈ σ(θ). The value function gives, for each θ, the maximum value of the objective function on the set φ(θ). Since this value is attained at the solutions x̂, the value function is well defined only on the solution domain S.

Example 1060 (i) The parametric optimization problem with equality and inequality constraints has the form
\[ \max_x f(x, \theta) \tag{29.3} \]
\[ \text{sub } \psi_i(x, \theta) = 0 \;\;\forall i \in I, \qquad \chi_j(x, \theta) \le 0 \;\;\forall j \in J \]
where ψᵢ : A × Θ ⊆ Rⁿ × Rᵏ → R for every i ∈ I, χⱼ : A × Θ ⊆ Rⁿ × Rᵏ → R for every j ∈ J, and θ = (θ₁, ..., θₖ) ∈ Θ ⊆ Rᵏ. Here φ(θ) = {x ∈ A : ψᵢ(x, θ) = 0 ∀i ∈ I, χⱼ(x, θ) ≤ 0 ∀j ∈ J}.
If f does not depend on the parameter, and if ψᵢ(x, θ) = gᵢ(x) − bᵢ for every i ∈ I and χⱼ(x, θ) = hⱼ(x) − cⱼ for every j ∈ J (so that k = |I| + |J|), we get back to the familiar problem (27.4) studied in Chapter 27, that is,
\[ \max_x f(x) \quad \text{sub } g_i(x) = b_i \;\;\forall i \in I, \quad h_j(x) \le c_j \;\;\forall j \in J \]
In this case, if we set b = (b₁, ..., b_{|I|}) ∈ R^|I| and c = (c₁, ..., c_{|J|}) ∈ R^|J|, the parameter set Θ consists of all θ = (b, c) ∈ R^|I| × R^|J|.
(ii) The consumer problem (Section 16.1.3) is a parametric optimization problem. The set A is Rⁿ₊. The space Rⁿ⁺¹₊ of all price and income pairs is the parameter set Θ, with elements θ = (p, I). The budget correspondence B : Rⁿ⁺¹₊ ⇒ Rⁿ₊ is the feasibility correspondence and the utility function u is the objective function (which does not depend on the parameter). Let S ⊆ Θ be the set of all parameters (p, I) for which the consumer problem has a solution (i.e., an optimal bundle). The demand correspondence D : S ⇒ Rⁿ₊ is the solution correspondence, which becomes a demand function D : S → Rⁿ₊ when optimal bundles are unique. Finally, the indirect utility function v : S → R is the value function.
(iii) Consider a profit-maximizing firm producing a single output with price p, using an input vector x ∈ Rⁿ₊ with prices w ∈ Rⁿ₊, according to a production function y = f(x). The profit function is π(p, w) = sup_{x≥0} pf(x) − w·x. In this case the choice set A is Rⁿ₊ and the parameter set Θ is R₊ × Rⁿ₊. Note that in this case φ(θ) = A for every θ = (p, w). ▲
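To fix ideas with case (ii), consider the classic Cobb-Douglas specification (our illustration, not part of the text; it assumes Python with NumPy): with u(x) = a log x1 + (1 − a) log x2, the demand function is well known to be D(p, I) = (aI/p1, (1 − a)I/p2), which we cross-check with a crude grid search along the budget line.

# Illustrative parametric problem: Cobb-Douglas consumer demand.
import numpy as np

a, p, I = 0.3, np.array([2.0, 5.0]), 10.0
demand = np.array([a * I / p[0], (1 - a) * I / p[1]])

# grid search over the budget line p.x = I (the optimum spends all income)
x1 = np.linspace(1e-3, I / p[0] - 1e-3, 100000)
x2 = (I - p[0] * x1) / p[1]
u = a * np.log(x1) + (1 - a) * np.log(x2)
print(demand, (x1[u.argmax()], x2[u.argmax()]))  # both ~ (1.5, 1.4)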

Parametric optimization problems are important in economics because they make it possible to carry out the all-important comparative statics exercises that study how, within a given optimization problem, changes in the parameters affect optimal choices and their values. The solution correspondence and the value function are key for these exercises because they describe how optimal choices and their values vary as parameters vary. For instance, in the consumer problem the demand correspondence and the indirect utility function describe, respectively, how the optimal bundles and their values are affected by changes in prices and income.

29.3 Basic properties

The existence theorems of Weierstrass and Tonelli ensure the existence of solutions. For instance, the next result is a straightforward consequence of Weierstrass's Theorem.

Proposition 1061 We have θ₀ ∈ S if φ(θ₀) is compact and f(·, θ₀) : A → R is continuous. In particular, if f is continuous on A × Θ and φ is compact-valued, then S = Θ.

Proposition 1062 The solution correspondence is convex-valued if f is quasi-concave in x and φ is convex-valued.

Proof Given any θ ∈ Θ, let us show that σ(θ) is convex. Let x̂₁, x̂₂ ∈ σ(θ) and α ∈ [0, 1]. Since f is quasi-concave in x,
\[ f(\alpha\hat x_1 + (1-\alpha)\hat x_2, \theta) \ge \min\{f(\hat x_1, \theta), f(\hat x_2, \theta)\} = f(\hat x_1, \theta) = f(\hat x_2, \theta) = v(\theta) \]
and so f(αx̂₁ + (1 − α)x̂₂, θ) = v(θ), i.e., αx̂₁ + (1 − α)x̂₂ ∈ σ(θ).

The convexity of the solution set means inter alia that, when non-empty, such a set is either a singleton or an infinite set. That is, either the solution is unique or there are infinitely many. Next we give the most important sufficient condition ensuring uniqueness.

Proposition 1063 The solution correspondence is single-valued if f is strictly quasi-concave in x and φ is convex-valued.

Proof Let us prove that σ is single-valued. Let θ ∈ S and x̂₁, x̂₂ ∈ σ(θ). We want to show that x̂₁ = x̂₂. Suppose, per contra, that x̂₁ ≠ x̂₂. Since φ is convex-valued, x̂₁/2 + x̂₂/2 ∈ φ(θ), and by the strict quasi-concavity of f in x,
\[ f\Bigl(\frac{1}{2}\hat x_1 + \frac{1}{2}\hat x_2, \theta\Bigr) > \min\{f(\hat x_1, \theta), f(\hat x_2, \theta)\} = f(\hat x_1, \theta) = f(\hat x_2, \theta) = v(\theta) \]
a contradiction. Hence, x̂₁ = x̂₂, as desired.

By strengthening the hypothesis of Proposition 1062 from quasi-concavity to strict quasi-concavity, the set of solutions becomes a singleton. In this case we have a solution function and not just a solution correspondence. This greatly simplifies the comparative statics exercises that study how solutions change as the values of the parameters vary. For this reason, in applications strict concavity (and so strict quasi-concavity) is often assumed, typically by requiring that the first derivative be decreasing (Corollary 920).

We turn now to value functions. In the following result we assume the convexity of the graph of φ. As we already remarked, this is a substantially stronger assumption than the convexity of the images φ(θ).

Proposition 1064 The value function v : S → R is quasi-concave (resp., concave) if f is quasi-concave (resp., concave) and the graph of φ is convex.

Proof Let θ₁, θ₂ ∈ S and α ∈ [0, 1]. Let x̂₁ ∈ σ(θ₁) and x̂₂ ∈ σ(θ₂). Since φ has convex graph, αx̂₁ + (1 − α)x̂₂ ∈ φ(αθ₁ + (1 − α)θ₂). Hence, the quasi-concavity of f implies:
\[
v(\alpha\theta_1 + (1-\alpha)\theta_2) \ge f(\alpha\hat x_1 + (1-\alpha)\hat x_2,\, \alpha\theta_1 + (1-\alpha)\theta_2) = f(\alpha(\hat x_1, \theta_1) + (1-\alpha)(\hat x_2, \theta_2))
\]
\[
\ge \min\{f(\hat x_1, \theta_1), f(\hat x_2, \theta_2)\} = \min\{v(\theta_1), v(\theta_2)\}
\]
and so v is quasi-concave. If f is concave, we have:
\[
v(\alpha\theta_1 + (1-\alpha)\theta_2) \ge f(\alpha(\hat x_1, \theta_1) + (1-\alpha)(\hat x_2, \theta_2))
\ge \alpha f(\hat x_1, \theta_1) + (1-\alpha)f(\hat x_2, \theta_2) = \alpha v(\theta_1) + (1-\alpha)v(\theta_2)
\]
and so v is concave.

A similar argument shows that v is strictly quasi-concave (resp., strictly concave) if f is strictly quasi-concave (resp., strictly concave).

Example 1065 In the consumer problem, the graph of the budget correspondence is clearly convex. Therefore, Proposition 1064 implies that the indirect utility v is quasi-concave (concave) provided the utility is quasi-concave (concave). Since in Proposition ?? we proved that v is quasi-convex regardless of the behavior of u, we conclude that v is quasi-affine. ▲

29.4 Envelope theorems I: fixed constraint

How do value functions react to changes in parameters? In other words, how do the optimal levels of the objective function change when parameters change? The answer to this basic comparative statics exercise depends, clearly, on how solutions react to such changes, as optimal levels are attained at the solutions. Mathematically, under differentiability it amounts to studying the gradient ∇v(θ) of the value function. This is the subject matter of the envelope theorems.
We begin by considering in this section the special case
\[ \max_x f(x, \theta) \quad \text{sub } x \in C \tag{29.4} \]
where the feasibility correspondence is constant, with φ(θ) = C ⊆ A for all θ ∈ Θ. The parameter only affects the objective function. To ease matters, throughout the section we also assume that S = Θ.
We first approach the issue heuristically. To this end, suppose that n = k = 1, so that both the parameter θ and the choice variable x are scalars. Moreover, assume that there is a unique solution for each θ, so that σ : Θ → R is the solution function. Then v(θ) = f(σ(θ), θ) for every θ ∈ Θ. A heuristic application of the chain rule (a "back of the envelope calculation") then suggests that, if it exists, the derivative of v at θ₀ is:
\[ v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial x}\,\sigma'(\theta_0) + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial\theta} \]

Remarkably, the …rst term is null because by Fermat’s Theorem (@f =@x) ( ( 0 ) ; 0) = 0
(provided the solution is interior). Thus,
@f ( ( 0 ) ; 0)
v0 ( 0) = (29.5)
@
Next we make general and rigorous this important …nding.

Theorem 1066 Suppose f (x; ) is, for every x 2 C, di¤ erentiable at 0 2 int . If v is
di¤ erentiable at 0 , then for every x
^ 2 ( 0 ) we have rv ( 0 ) = r f (^
x; 0 ), that is,
@v ( 0 ) @f (^
x; 0 )
= 8i = 1; :::; k (29.6)
@ i @ i
If f is strictly quasi-concave in x and ' is convex-valued, then is a function (Proposition
1063). So, (29.6) can be written as
@v ( 0 ) @f ( ( 0 ) ; 0)
= 8i = 1; :::; k
@ i @ i
which is the general form of the heuristic formula (29.5).

Proof Let θ₀ ∈ int Θ. Let x(θ₀) ∈ σ(θ₀) be an optimal solution at θ₀, so that v(θ₀) = f(x(θ₀), θ₀). Define w : Θ → R by w(θ) = f(x(θ₀), θ). We have v(θ₀) = w(θ₀) and, for all θ ∈ Θ,
\[ w(\theta) = f(x(\theta_0), \theta) \le \max_{x\in C} f(x, \theta) = v(\theta) \tag{29.7} \]
We thus have
\[ \frac{w(\theta_0 + tu) - w(\theta_0)}{t} \le \frac{v(\theta_0 + tu) - v(\theta_0)}{t} \]
for all u ∈ Rᵏ and t > 0 sufficiently small. Hence,
\[
\frac{\partial f(x(\theta_0), \theta_0)}{\partial\theta_i} = \lim_{h\to 0^+}\frac{f(x(\theta_0), \theta_0 + he^i) - f(x(\theta_0), \theta_0)}{h} = \lim_{h\to 0^+}\frac{w(\theta_0 + he^i) - w(\theta_0)}{h}
\le \lim_{h\to 0^+}\frac{v(\theta_0 + he^i) - v(\theta_0)}{h} = \frac{\partial v(\theta_0)}{\partial\theta_i}
\]
On the other hand,
\[ \frac{w(\theta_0 + tu) - w(\theta_0)}{t} \ge \frac{v(\theta_0 + tu) - v(\theta_0)}{t} \]
for all u ∈ Rᵏ and t < 0 sufficiently small. Proceeding as before, we then have
\[ \frac{\partial f(x(\theta_0), \theta_0)}{\partial\theta_i} \ge \frac{\partial v(\theta_0)}{\partial\theta_i} \]
This proves (29.6).
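A small numerical experiment (ours, assuming Python with NumPy) illustrates formula (29.6): for f(x, θ) = −(x − θ)² + θ on C = [0, 2], the solution is σ(θ) = θ for θ ∈ (0, 2), so v(θ) = θ and v′(θ₀) = 1, which is exactly ∂f/∂θ evaluated at the optimum.

# Numerical illustration (sketch) of the envelope formula (29.6).
import numpy as np

f = lambda x, t: -(x - t)**2 + t
xgrid = np.linspace(0.0, 2.0, 200001)
v = lambda t: np.max(f(xgrid, t))             # value function by grid search

t0, h = 0.7, 1e-4
v_prime = (v(t0 + h) - v(t0 - h)) / (2 * h)   # numerical dv/dtheta
print(v_prime)                                # ~ 1.0
# df/dtheta(x, t) = 2(x - t) + 1, evaluated at the solution x = t0:
print(2 * (t0 - t0) + 1)                      # = 1.0, matching (29.6)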

The hypothesis that v is differentiable is not that appealing because it is not stated in terms of the primitive elements f and C of problem (29.4): to check it we need to know the value function. Remarkably, in concave problems this differentiability hypothesis follows from hypotheses placed directly on the objective function.

Theorem 1067 Let C and Θ be convex. Suppose f(x, ·) is, for every x ∈ C, differentiable at θ₀ ∈ int Θ. If f is concave on C × Θ, then v is differentiable at θ₀.

Thus, if f is differentiable on Θ and concave, then ∇v(θ₀) = ∇_θ f(x̂, θ₀) for all x̂ ∈ σ(θ₀). If, in addition, f is strictly concave in x, then we can directly write ∇v(θ₀) = ∇_θ f(σ(θ₀), θ₀) because σ is a function and σ(θ₀) is the unique solution at θ₀.

Proof By Proposition 1064, v is concave. We begin by proving that ∂v(θ₀) ⊆ ⋂_{x∈σ(θ₀)} ∂_θ f(x, θ₀). Let ξ ∈ ∂v(θ₀), so that v(θ) ≤ v(θ₀) + ξ·(θ − θ₀) for all θ ∈ Θ. Being v(θ₀) = w(θ₀), by (29.7) we have, for all θ ∈ Θ,
\[ w(\theta) \le v(\theta) \le v(\theta_0) + \xi\cdot(\theta - \theta_0) = w(\theta_0) + \xi\cdot(\theta - \theta_0) \]
Hence, ξ ∈ ∂w(θ₀) = ∂_θ f(x, θ₀) for all x ∈ σ(θ₀). Since v is concave and θ₀ ∈ int Θ, by Proposition 941 we have ∂v(θ₀) ≠ ∅. Since f(x, ·) is, for every x ∈ σ(θ₀), differentiable at θ₀, we have ∂_θ f(x, θ₀) = {∇_θ f(x, θ₀)} by Proposition 937. We conclude that ∂v(θ₀) = {∇_θ f(x, θ₀)}. By Proposition 937, v is differentiable at θ₀.

29.5 Envelope theorems II: variable constraint

Matters are less clean when the feasibility correspondence is not constant. We consider a parametric optimization problem with equality constraints
\[ \max_x f(x, \theta) \quad \text{sub } \psi_i(x, \theta) = 0 \;\;\forall i = 1, \dots, m \tag{29.8} \]
where ψ = (ψ₁, ..., ψₘ) : A × Θ ⊆ Rⁿ × Rᵏ → Rᵐ and θ = (θ₁, ..., θₖ) ∈ Θ ⊆ Rᵏ.
Here φ(θ) = {x ∈ A : ψᵢ(x, θ) = 0 ∀i = 1, ..., m}, so the constraint varies with the parameter θ. For instance, if f does not depend on θ and ψᵢ(x, θ) = gᵢ(x) − θᵢ for i = 1, ..., m (so that k = m), we get back to the familiar problem (26.36) of Chapter 26, that is,
\[ \max_x f(x) \quad \text{sub } g_i(x) = b_i \;\;\forall i = 1, \dots, m \]

Here we just present a heuristic argument. Assume that n = k = m = 1, so that there is a single constraint and both the parameter θ and the choice variable x are scalars. Moreover, assume that there is a unique solution for each θ, so that σ : Θ → R is the solution function and σ(θ) is the unique solution corresponding to θ. A heuristic application of the chain rule suggests that, if it exists, the derivative of v at θ₀ is
\[ v'(\theta_0) = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial\theta} - \hat\lambda(\theta_0)\,\frac{\partial\psi(\sigma(\theta_0), \theta_0)}{\partial\theta} \]
where λ̂(θ₀) is the Lagrange multiplier corresponding to the unique solution σ(θ₀). Indeed, since ψ(σ(θ), θ) = 0 for every θ ∈ Θ, a heuristic application of the chain rule gives
\[ \frac{\partial\psi(\sigma(\theta_0), \theta_0)}{\partial x}\,\sigma'(\theta_0) + \frac{\partial\psi(\sigma(\theta_0), \theta_0)}{\partial\theta} = 0 \]

On the other hand, since v(θ) = f(σ(θ), θ) for every θ ∈ Θ, again by a heuristic application of the chain rule we have
\[
\begin{aligned}
v'(\theta_0) &= \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial x}\,\sigma'(\theta_0) + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial\theta}\\
&= \underbrace{\Bigl(\frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial x} - \hat\lambda(\theta_0)\frac{\partial\psi(\sigma(\theta_0), \theta_0)}{\partial x}\Bigr)}_{=0}\sigma'(\theta_0) + \hat\lambda(\theta_0)\frac{\partial\psi(\sigma(\theta_0), \theta_0)}{\partial x}\,\sigma'(\theta_0) + \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial\theta}\\
&= \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial\theta} - \hat\lambda(\theta_0)\,\frac{\partial\psi(\sigma(\theta_0), \theta_0)}{\partial\theta}
\end{aligned}
\]
as desired: the first-order condition of Lagrange's Theorem makes the term in parentheses vanish, while the constraint identity turns (∂ψ/∂x)σ′(θ₀) into −∂ψ/∂θ.
If f is strictly quasi-concave in x and φ is convex-valued, then σ is a function and (??) can be written as
\[ \frac{\partial v(\theta_0)}{\partial\theta_s} = \frac{\partial f(\sigma(\theta_0), \theta_0)}{\partial\theta_s} - \sum_{i=1}^m \hat\lambda_i(\theta_0)\,\frac{\partial\psi_i(\sigma(\theta_0), \theta_0)}{\partial\theta_s} \qquad \forall s = 1, \dots, k \]
which is the version that we derived heuristically.

29.6 Marginal interpretation of multipliers

Formula (??) continues to hold for the parametric optimization problem with both equality and inequality constraints (29.3), where it takes the form
\[
\frac{\partial v(\theta_0)}{\partial\theta_s} = \frac{\partial f(\hat x, \theta_0)}{\partial\theta_s} - \sum_{i\in I}\hat\lambda_i(\theta_0)\,\frac{\partial\psi_i(\sigma(\theta_0), \theta_0)}{\partial\theta_s} - \sum_{j\in J}\hat\mu_j(\theta_0)\,\frac{\partial\chi_j(\sigma(\theta_0), \theta_0)}{\partial\theta_s}
\tag{29.9}
\]
for every s = 1, ..., k. Here (λ̂(θ₀), μ̂(θ₀)) ∈ R^|I| × R₊^|J| are the Lagrange multipliers associated with the solution σ(θ₀), here assumed to be unique (for simplicity).
We can derive this formula heuristically with the argument just used in the equality case. Indeed, if we denote by A(σ(θ₀)) the set of the binding constraints at θ₀, by Lemma 1030 we have μ̂ⱼ = 0 for each j ∉ A(σ(θ₀)). So, the non-binding constraints at θ₀ do not affect the derivation because their multipliers are null.
That said, let us consider the standard problem (27.4) in which the objective function does not depend on the parameter, ψᵢ(x, θ) = gᵢ(x) − bᵢ for every i ∈ I, and χⱼ(x, θ) = hⱼ(x) − cⱼ for every j ∈ J (Example 1060). Formula (29.9) then implies
\[ \frac{\partial v(b, c)}{\partial b_i} = \hat\lambda_i(b, c) \;\;\forall i \in I, \qquad \frac{\partial v(b, c)}{\partial c_j} = \hat\mu_j(b, c) \;\;\forall j \in J \]
Interestingly, the multipliers describe the marginal effect on the value function of relaxing the constraints, that is, how valuable it is to relax them. In particular, we have ∂v(b, c)/∂cⱼ = μ̂ⱼ(b, c) ≥ 0 because it is always beneficial to relax an inequality constraint: more alternatives become available. In contrast, this might not be the case for an equality constraint, so the sign of ∂v(b, c)/∂bᵢ = λ̂ᵢ(b, c) is ambiguous.
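A one-dimensional sketch (ours, assuming Python) makes this marginal interpretation tangible: for max −(x − 2)² sub x ≤ c with c < 2, the solution is x̂ = c with multiplier μ̂ = 2(2 − c), and a finite difference confirms ∂v/∂c = μ̂.

# Illustrative check that the multiplier is the marginal value of the constraint.
v = lambda c: -(c - 2)**2          # value function for c < 2 (since xhat = c)

c0, h = 1.0, 1e-6
dv_dc = (v(c0 + h) - v(c0 - h)) / (2 * h)
mu = 2 * (2 - c0)                  # from the stationarity condition f'(xhat) = mu
print(dv_dc, mu)                   # both ~ 2.0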
Part VIII

Integration

Chapter 30

Riemann’s integral

Let us consider a positive function f (i.e., taking values ≥ 0) defined on a closed interval [a, b]. Intuitively, the integral of f on [a, b] is the measure, called area, of the portion of the plane
\[ A(f_{[a,b]}) = \{(x, y) \in [a, b] \times \mathbb{R}_+ : 0 \le y \le f(x)\} \tag{30.1} \]
under the graph of the function f on the interval.

[figure omitted: the region under the graph of f between a and b]

The problem is how to make this natural intuition rigorous. We follow the classical procedure known as the method of exhaustion. It consists of approximating the measure of A(f_{[a,b]}) through the areas of very simple polygons, the so-called "plurirectangles", whose measure is calculated in an elementary way. Thanks to these simple polygons, we try to obtain an approximation, as precise as possible, in order to capture, at the limit (if it exists), the value of A(f_{[a,b]}). This value will be taken as the integral of f on [a, b]. The idea of the method of exhaustion was born in Greek mathematics, where it found brilliant applications in the works of Eudoxus of Cnidus and Archimedes of Syracuse.


30.1 Plurirectangles

We know how to calculate the areas of elementary geometric shapes. Among them, the simplest are rectangles, whose area is given by the product of the lengths of their base and their corresponding height. A simple, but crucial, generalization of rectangles is represented by the so-called plurirectangles, that is, polygons formed by contiguous rectangles.

[figure omitted: a plurirectangle]

The area of a plurirectangle is nothing but the sum of the areas of the single rectangles that compose it.
Let us now go back to the set A(f_{[a,b]}) under the function f on [a, b]. It is easy to see how it can be squeezed between inscribed plurirectangles and plurirectangles that circumscribe it: the first figure below shows an inscribed plurirectangle, the second a circumscribed one.

[figures omitted: an inscribed plurirectangle and a circumscribed plurirectangle]

Naturally, the area of A(f_{[a,b]}) is larger than that of every inscribed plurirectangle and smaller than that of every circumscribed plurirectangle. The area of A(f_{[a,b]}) is therefore included between the areas of the inscribed and the circumscribed plurirectangles.
Hence, the first important observation is that the area of A(f_{[a,b]}) can always be "sandwiched" between the areas of plurirectangles. This yields simple lower approximations (the areas of the inscribed plurirectangles) and upper approximations (the areas of the circumscribed plurirectangles) of the value of A(f_{[a,b]}).
The second crucial observation is that such a sandwich, and consequently the corresponding approximations, can be made better and better by considering finer and finer plurirectangles, obtained by subdividing their bases more and more:

[figures omitted: inscribed and circumscribed plurirectangles on a finer subdivision]

Indeed, by subdividing the bases more and more, the area of the inscribed plurirectangles becomes larger and larger, even if it always remains smaller than the area of A(f_{[a,b]}). On the other hand, the area of the circumscribed plurirectangles becomes smaller and smaller, even if it always remains larger than the area of A(f_{[a,b]}). In other words, the two slices of the sandwich that enclose the set A(f_{[a,b]}) (i.e., the lower and the upper approximations) take values that become progressively closer to each other.
If, by considering finer plurirectangles corresponding to more and more subdivided bases, at the limit the lower approximation coincides with the upper approximation, this limit value can rightfully be considered as the area of A(f_{[a,b]}). Intuitively, this corresponds, at the limit, to the two slices of the sandwich meeting.
In other words, we start from approximating objects that are very simple to measure, the areas of plurirectangles; by working with more and more precise approximations, we are able to measure an object which in general is much more complex: the area of the portion of the plane A(f_{[a,b]}) under the function f.

30.2 Definition

We now formalize the method of exhaustion. We first consider positive and bounded functions f : [a, b] → R₊. In the next section, we will consider functions taking any real value.

30.2.1 Positive functions

Definition 1068 A set π = {xᵢ}ⁿᵢ₌₀ of points is a subdivision of an interval [a, b] if
\[ a = x_0 < x_1 < \cdots < x_n = b \]
The set of all the possible subdivisions of an interval [a, b] will be denoted by Π.

Given a bounded function f : [a, b] → R₊, consider the contiguous bases generated by the points of the subdivision π:
\[ [x_0, x_1],\; [x_1, x_2],\; \dots,\; [x_{n-1}, x_n] \tag{30.2} \]
On them let us build the largest plurirectangle inscribed in the set under f. In particular, for the i-th base, the maximum height mᵢ of the rectangle with base [xᵢ₋₁, xᵢ] that can be inscribed in the set under f is
\[ m_i = \inf_{x\in[x_{i-1},x_i]} f(x) \]
Since we have assumed that f is bounded, by the Least Upper Bound Principle this infimum exists and is finite, that is, mᵢ ∈ R. Since the length Δxᵢ of each base [xᵢ₋₁, xᵢ] is
\[ \Delta x_i = x_i - x_{i-1} \]
the area I(f, π) of this largest inscribed plurirectangle is given by
\[ I(f, \pi) = \sum_{i=1}^n m_i\,\Delta x_i \tag{30.3} \]

In an analogous way, let us build on the contiguous bases (30.2), determined by the subdivision π, the smallest plurirectangle that circumscribes the set under f. For the i-th base, the minimum height Mᵢ of the rectangle with base [xᵢ₋₁, xᵢ] that circumscribes the set under f is given by
\[ M_i = \sup_{x\in[x_{i-1},x_i]} f(x) \]

[figure omitted: the heights mᵢ and Mᵢ on the base [xᵢ₋₁, xᵢ]]

As before, given that f is bounded, by the Least Upper Bound Principle the supremum exists and is finite, that is, Mᵢ ∈ R. Therefore, the area S(f, π) of the smallest circumscribed plurirectangle is
\[ S(f, \pi) = \sum_{i=1}^n M_i\,\Delta x_i \tag{30.4} \]
Since mi Mi for every i, we have
I (f; ) S (f; ) 8 2 (30.5)
In particular, the area of the set under f lies between these two values. Hence, I (f; ) gives
a lower approximation of this area, while S (f; ) gives an upper approximation of it. The
sum I(f; ) is called lower integral sum of f with respect to , and the sum S(f; ) is called
upper integral sum of f with respect to .

Definition 1069 Given two subdivisions σ and σ′ of [a; b], we say that σ′ refines σ if σ ⊆ σ′, that is, if all the points of σ are also points of σ′.

In other words, the finer subdivision σ′ is obtained by adding further points to σ. For example, if we consider [a; b] = [0; 1], the subdivision

σ′ = {0, 1/4, 1/2, 3/4, 1}

refines the subdivision σ = {0, 1/2, 1}.
It is easy to see that if σ′ refines σ, then

I(f; σ) ≤ I(f; σ′) ≤ S(f; σ′) ≤ S(f; σ)   (30.6)

In other words, a finer subdivision σ′ yields better approximations, both lower and upper, of the area under f.¹ Starting from any subdivision, we can always refine it, thus improving (or at least not worsening) the approximations given by the respective plurirectangles.
¹ For the sake of brevity, we write "area under f" instead of the more precise expression "area of the portion of plane that lies under f".

The same can be done starting from any two subdivisions σ and σ′, where neither is necessarily finer than the other. Indeed, the subdivision σ″ = σ ∪ σ′ is formed by all the points that belong to at least one of the two subdivisions σ and σ′, and it refines both σ and σ′. In other words, σ″ is a common refinement of σ and σ′.

Example 1070 Let us consider the subdivisions of [0; 1]

σ = {0, 1/3, 1/2, 2/3, 1} and σ′ = {0, 1/4, 1/2, 3/4, 1}

They are two different subdivisions: neither does σ refine σ′, nor does σ′ refine σ. The subdivision

σ″ = σ ∪ σ′ = {0, 1/4, 1/3, 1/2, 2/3, 3/4, 1}

is a refinement common to σ and σ′. N

Thanks to the inequality (30.6) we have

I(f; σ) ≤ I(f; σ″) ≤ S(f; σ″) ≤ S(f; σ)   (30.7)

and

I(f; σ′) ≤ I(f; σ″) ≤ S(f; σ″) ≤ S(f; σ′)   (30.8)

A common refinement σ″ gives better approximations, both lower and upper, with respect to σ and σ′, of the area under f.
All this motivates the next definition.

Definition 1071 Let f : [a; b] → R₊ be a bounded function. The value

∫̲_a^b f(x) dx = sup_{σ∈Σ} I(f; σ)   (30.9)

is called the lower integral of f on [a; b], while the value

∫̄_a^b f(x) dx = inf_{σ∈Σ} S(f; σ)   (30.10)

is called the upper integral of f on [a; b].


Therefore, ∫̲_a^b f(x) dx is the supremum of the areas I(f; σ) of the inscribed plurirectangles obtained by considering all the possible subdivisions σ of [a; b]. If we start from the inscribed plurirectangles, this is the best possible lower approximation for the area under f on [a; b]. In an analogous way, ∫̄_a^b f(x) dx is the infimum of the areas S(f; σ) of the circumscribed plurirectangles obtained by considering all the possible subdivisions σ of [a; b]. If we start from the circumscribed plurirectangles, this is the best possible upper approximation for the area under f on [a; b].

One of the first questions that arises is whether the lower and upper integrals of a bounded function exist or not.

Lemma 1072 If f : [a; b] → R₊ is a bounded function, then both the lower integral and the upper integral exist and are finite. Moreover, we have

∫̲_a^b f(x) dx ≤ ∫̄_a^b f(x) dx   (30.11)

Proof Since f is positive and bounded, there exists M ≥ 0 such that 0 ≤ f(x) ≤ M for every x ∈ [a; b]. Therefore, for every subdivision σ = {x_i}_{i=0}^n we have

0 ≤ inf_{x∈[x_{i−1};x_i]} f(x) ≤ sup_{x∈[x_{i−1};x_i]} f(x) ≤ M   ∀i = 1, 2, …, n

and hence

0 ≤ I(f; σ) ≤ S(f; σ) ≤ M(b − a)   ∀σ ∈ Σ

The Least Upper Bound Principle implies that the supremum in (30.9) and the infimum in (30.10) exist and are finite and positive, that is, ∫̲_a^b f(x) dx ∈ R₊ and ∫̄_a^b f(x) dx ∈ R₊.
We still need to prove the inequality (30.11). Let us suppose, by contradiction, that

∫̲_a^b f(x) dx − ∫̄_a^b f(x) dx = ε > 0

Thanks to Proposition 119, there exists a subdivision σ′ such that

I(f; σ′) > ∫̲_a^b f(x) dx − ε/2

and a subdivision σ″ such that

S(f; σ″) < ∫̄_a^b f(x) dx + ε/2

These two inequalities yield the following:

I(f; σ′) − S(f; σ″) > (∫̲_a^b f(x) dx − ε/2) − (∫̄_a^b f(x) dx + ε/2) = ε − ε = 0

If we take the subdivision σ = σ′ ∪ σ″, then we have I(f; σ) ≥ I(f; σ′) and S(f; σ) ≤ S(f; σ″). We can conclude that

I(f; σ) − S(f; σ) ≥ I(f; σ′) − S(f; σ″) > 0

that is, I(f; σ) > S(f; σ), which contradicts (30.5).

By the previous lemma, every bounded function f : [a; b] → R₊ has both the lower integral and the upper integral, and

∫̲_a^b f(x) dx ≤ ∫̄_a^b f(x) dx

The area under f lies between these two values.² The last inequality is the most refined version of (30.6). The lower and upper integrals are respectively the best lower and upper approximations of the area under f that can be obtained starting from plurirectangles. In particular, when ∫̲_a^b f(x) dx = ∫̄_a^b f(x) dx, the area under f will be taken to be equal to such common value. This motivates the next fundamental definition.

Definition 1073 A bounded function f : [a; b] → R₊ is said to be integrable according to Riemann (or Riemann integrable) if

∫̲_a^b f(x) dx = ∫̄_a^b f(x) dx

The common value is denoted by ∫_a^b f(x) dx and it is called the integral according to Riemann (or Riemann integral) of f on [a; b].
The notation ∫_a^b f(x) dx reminds us that the integral is obtained as the limit of sums of the type Σ_{i=1}^n α_i Δx_i: the symbol Σ is replaced by the "long letter s" ∫, Δx_i by dx, and the values α_i of the function by f(x).
For the sake of brevity, in the rest of the chapter we will often talk about integrals and integrable functions, omitting the qualification "according to Riemann". Since there are other notions of integral, it is however important to always keep such a distinction in mind. In addition, note that the definition applies only to bounded functions. When, in the sequel, we consider integrable functions, they will be, a fortiori, bounded.
Let us illustrate the definition of integral with two examples. The first example involves an integrable function while the second one deals with a non-integrable function.

Example 1074 Let f : [a; b] → R be defined as f(x) = x. For any subdivision σ = {x_i}_{i=0}^n we have

I(f; σ) = x₀Δx₁ + x₁Δx₂ + ⋯ + x_{n−1}Δxₙ = Σ_{i=1}^n x_{i−1}Δx_i

S(f; σ) = x₁Δx₁ + x₂Δx₂ + ⋯ + xₙΔxₙ = Σ_{i=1}^n x_iΔx_i

and therefore

S(f; σ) − I(f; σ) = (x₁ − x₀)Δx₁ + (x₂ − x₁)Δx₂ + ⋯ + (xₙ − x_{n−1})Δxₙ = Σ_{i=1}^n (Δx_i)²

By taking progressively finer subdivisions, we obtain Δx_i → 0, and hence

0 ≤ ∫̄_a^b f(x) dx − ∫̲_a^b f(x) dx ≤ Σ_{i=1}^n (Δx_i)² → 0

that is, ∫̲_a^b f(x) dx = ∫̄_a^b f(x) dx. It follows that f(x) = x is integrable. N
² Recall that the area may or may not exist. If it exists, it is the measure of the set under f.

Example 1075 Let f : [a; b] → R be the Dirichlet function

f(x) = 1 if x ∈ Q ∩ [a; b], 0 if x ∉ Q ∩ [a; b]   (30.12)

restricted to [a; b]. By Proposition 39, on the density of rational numbers, for every a ≤ x < y ≤ b there exists a rational number q such that x < q < y. It is also true that, for every a ≤ x < y ≤ b, there exists an irrational number r such that x < r < y. Given any subdivision σ = {x_i}_{i=0}^n of [a; b], we therefore have

m_i = 0 and M_i = 1 for every i = 1, 2, …, n

Therefore

I(f; σ) = 0·Δx₁ + 0·Δx₂ + ⋯ + 0·Δxₙ = 0

and

S(f; σ) = 1·Δx₁ + 1·Δx₂ + ⋯ + 1·Δxₙ = Σ_{i=1}^n Δx_i = b − a

which implies ∫̲_a^b f(x) dx = 0 < b − a = ∫̄_a^b f(x) dx. The Dirichlet function is not integrable in the sense of Riemann.³ N

Finally, let us introduce a useful quantity that characterizes the "thickness" of a subdivision of [a; b].

Definition 1076 Given a subdivision σ of [a; b], we define the mesh of σ, denoted by |σ|, as the positive quantity

|σ| = max_{i=1,2,…,n} Δx_i
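To make these notions concrete, the following small sketch (in Python, a language choice of ours, not of the text) computes the lower and upper integral sums and the mesh for a given subdivision; the infimum and supremum on each base are approximated by sampling, which is adequate for the continuous examples below but only approximate in general.

def lower_upper_sums(f, sigma, samples=1000):
    # sigma is a subdivision a = x_0 < x_1 < ... < x_n = b, given as a list.
    # For well-behaved f the sampled min/max approach inf/sup as samples grows.
    I, S = 0.0, 0.0
    for x_prev, x_next in zip(sigma, sigma[1:]):
        dx = x_next - x_prev
        ys = [f(x_prev + j * dx / samples) for j in range(samples + 1)]
        I += min(ys) * dx  # m_i * Delta x_i
        S += max(ys) * dx  # M_i * Delta x_i
    return I, S

def mesh(sigma):
    # |sigma| = max_i Delta x_i
    return max(x2 - x1 for x1, x2 in zip(sigma, sigma[1:]))

# Example: f(x) = x^2 on [0; 1] with a uniform subdivision of n bases;
# both sums approach the area 1/3 as n grows.
n = 100
sigma = [i / n for i in range(n + 1)]
print(lower_upper_sums(lambda x: x * x, sigma), mesh(sigma))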

30.2.2 General functions

We now extend the notion of integral to any bounded function f : [a; b] → R, not necessarily positive. For a function f : [a; b] → R that can assume both negative and positive values,

³ Therefore, it has no meaning (at least in the sense of Riemann) to talk about the "area" of the set under such a function.

the set under f on [a; b] has in general a positive part and a negative part.

[Figure: the set under a function on [a; b], with the part above the horizontal axis marked "+" and the part below marked "−".]

The integral is then the difference between the areas of the positive part and of the negative part. If they have equal value, the integral is zero: this is the case, for example, of the function f(x) = sin x on the interval [0; 2π].
To make this idea rigorous, it is useful to decompose a function into its positive and negative parts.

Definition 1077 Let f : A ⊆ R → R. The function f⁺ : A ⊆ R → R₊ is defined as

f⁺(x) = max{f(x), 0}   ∀x ∈ A

while the function f⁻ : A ⊆ R → R₊ is defined as

f⁻(x) = −min{f(x), 0}   ∀x ∈ A

The function f⁺ is called the positive part of f, while f⁻ is called the negative part.

Both functions f⁺ and f⁻ are positive.
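As a quick computational sketch (plain Python, our own illustration), the two parts can be obtained directly from the definition:

import math

def f_plus(f):
    # Positive part: f+(x) = max{f(x), 0}
    return lambda x: max(f(x), 0.0)

def f_minus(f):
    # Negative part: f-(x) = -min{f(x), 0}; note that f-(x) >= 0
    return lambda x: -min(f(x), 0.0)

fp, fm = f_plus(math.sin), f_minus(math.sin)
x = 4.0                      # sin(4) < 0, so f+(4) = 0 and f-(4) = -sin(4) > 0
print(fp(x), fm(x))
print(fp(x) - fm(x), math.sin(x))   # check: f(x) = f+(x) - f-(x)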

Example 1078 Let f : R → R be given by f(x) = x. We have

f⁺(x) = 0 if x < 0, x if x ≥ 0   and   f⁻(x) = −x if x < 0, 0 if x ≥ 0

Graphically:

[Figure: graph of f⁺ (left) and graph of f⁻ (right).]

Example 1079 Let f : R → R be given by f(x) = sin x. We have

f⁺(x) = sin x if x ∈ ⋃_{n∈Z} [2nπ; (2n+1)π], 0 otherwise

and

f⁻(x) = 0 if x ∈ ⋃_{n∈Z} [2nπ; (2n+1)π], −sin x otherwise
Graphically:

[Figure: graph of f⁺ (left) and graph of f⁻ (right) for f = sin.]

N

Since for every real number a ∈ R we trivially have

a = max{a, 0} + min{a, 0}

it follows that, for every x ∈ A,

f(x) = max{f(x), 0} + min{f(x), 0} = max{f(x), 0} − (−min{f(x), 0}) = f⁺(x) − f⁻(x)

Every function f : A ⊆ R → R can therefore be decomposed as the difference

f = f⁺ − f⁻   (30.13)

of its positive part and its negative part. Such a decomposition allows us to extend in a natural way the notion of integral to any function, not necessarily positive. Indeed, since both f⁺ and f⁻ are positive, the notion of Riemann integral applies to the areas under them. The difference between their integrals, that is, between such areas, is exactly the integral we are looking for. Indeed, ∫_a^b f⁺(x) dx is the area under the positive part of f and ∫_a^b f⁻(x) dx is the area under the negative part of f. Their difference

∫_a^b f⁺(x) dx − ∫_a^b f⁻(x) dx

amounts to taking the difference between the areas under the positive part and under the negative part.
All of this motivates the following definition of the Riemann integral for bounded functions which are not necessarily positive.

Definition 1080 A bounded function f : [a; b] → R is said to be integrable according to Riemann if the functions f⁺ and f⁻ are integrable. In such a case, the Riemann integral of f on [a; b] is defined as

∫_a^b f(x) dx = ∫_a^b f⁺(x) dx − ∫_a^b f⁻(x) dx

Such a definition makes rigorous and transparent the idea of counting with different signs the areas that lie, respectively, above and below the horizontal axis, that is, ∫_a^b f⁺(x) dx and ∫_a^b f⁻(x) dx.

30.2.3 Everything holds together

The notion of the Riemann integral for functions which are not necessarily positive can be expressed in an equivalent way by using, also for these functions, the approximations I(f; σ) and S(f; σ) of the definition of the integral of positive functions.
To this end, first of all let us observe that, given a subdivision σ = {x_i}_{i=0}^n, we can define, also for any bounded function f : [a; b] → R, the sums I(f; σ) and S(f; σ) as in (30.3) and (30.4), that is,

I(f; σ) = Σ_{i=1}^n m_i Δx_i and S(f; σ) = Σ_{i=1}^n M_i Δx_i

Also for general functions, the sum I(f; σ) is called the lower integral sum of f with respect to the subdivision σ, and the sum S(f; σ) is called the upper integral sum of f with respect to the subdivision σ. The reader can easily verify that, for these sums, properties (30.5), (30.6), (30.7) and (30.8) continue to hold. In particular,

sup_{σ∈Σ} I(f; σ) ≤ inf_{σ∈Σ} S(f; σ)

Also for any bounded function f : [a; b] → R (not necessarily positive) we can define the lower and upper integrals

∫̲_a^b f(x) dx = sup_{σ∈Σ} I(f; σ) and ∫̄_a^b f(x) dx = inf_{σ∈Σ} S(f; σ)   (30.14)

in perfect analogy with what has been done for positive functions. The next result shows that everything holds together, that is, the notion of the Riemann integral obtained through the decomposition (30.13) into positive and negative parts coincides with the equality between the upper and lower integrals of (30.14).

Proposition 1081 A bounded function f : [a; b] → R is integrable if and only if ∫̲_a^b f(x) dx = ∫̄_a^b f(x) dx. In such a case,

∫_a^b f(x) dx = ∫̲_a^b f(x) dx = ∫̄_a^b f(x) dx

The proof is based on the next three lemmas. The first one establishes a general property of the suprema and infima of sums of functions; the second one also has a theoretical interest for the theory of integration (as we will observe at the end of the section); the last one is of a more technical nature. The proofs of the second and third lemmas, as well as that of Proposition 1081, are omitted.

Lemma 1082 For two bounded functions g, h : A → R, we have sup_A(g + h) ≤ sup_A g + sup_A h and inf_A(g + h) ≥ inf_A g + inf_A h.

Proof By contradiction, let us suppose that sup_A(g + h) > sup_A g + sup_A h. Let ε = sup_A(g + h) − (sup_A g + sup_A h) > 0. By the property of the sup of a set, there exists x₀ ∈ A such that (g + h)(x₀) > sup_A(g + h) − ε = sup_A g + sup_A h.⁴ At the same time, by definition of the sup of a function, we have g(x) ≤ sup_A g and h(x) ≤ sup_A h for every x ∈ A, from which it follows that g(x) + h(x) ≤ sup_A g + sup_A h for every x ∈ A. In particular, (g + h)(x₀) ≤ sup_A g + sup_A h, a contradiction. The reader can prove, in a similar way, that inf_A(g + h) ≥ inf_A g + inf_A h.

Lemma 1083 Let f : [a; b] → R be a bounded function. Then, for every subdivision σ = {x_i}_{i=0}^n of [a; b], we have

S(f; σ) = S(f⁺; σ) − I(f⁻; σ)   (30.15)

and

I(f; σ) = I(f⁺; σ) − S(f⁻; σ)   (30.16)

⁴ Note that sup_A(g + h) = sup Im(g + h) = sup(g + h)(A).

Lemma 1084 Let f : [a; b] → R be a bounded function. Then

sup_{σ∈Σ} I(f; σ) ≥ sup_{σ∈Σ} I(f⁺; σ) − inf_{σ∈Σ} S(f⁻; σ)   (30.17)

and

inf_{σ∈Σ} S(f⁺; σ) − sup_{σ∈Σ} I(f⁻; σ) ≥ inf_{σ∈Σ} S(f; σ)

N.B. Often the Riemann integral is defined directly for general functions, not necessarily positive, through the upper and lower sums. What is lost in defining these sums for not necessarily positive functions is the geometric intuition: while for positive functions I(f; σ) is the area of the inscribed plurirectangle and S(f; σ) is the area of the circumscribed one, this is no longer true for a generic function that takes positive and negative values, as (30.15) and (30.16) show. The formulation we adopt with Definition 1080 is suggested by pedagogical motivations and is equivalent to the usual formulation, as Proposition 1081 shows. O

30.3 Criteria of integrability

In the next section we will study some important classes of integrable functions. To this end, we establish here some important criteria of integrability.
Let us see a first simple, but useful, criterion:

Proposition 1085 A bounded function f : [a; b] → R is integrable according to Riemann if and only if for every ε > 0 there exists a subdivision σ such that S(f; σ) − I(f; σ) < ε.

Proof "If". Let us suppose that, for every ε > 0, there exists a subdivision σ such that S(f; σ) − I(f; σ) < ε. Then

0 ≤ ∫̄_a^b f(x) dx − ∫̲_a^b f(x) dx ≤ S(f; σ) − I(f; σ) < ε

and therefore, ε > 0 being arbitrary, we have ∫̄_a^b f(x) dx = ∫̲_a^b f(x) dx.
"Only if". Let us suppose that ∫̄_a^b f(x) dx = ∫̲_a^b f(x) dx. Thanks to Proposition 119, for every ε > 0 there exist a subdivision σ′ such that S(f; σ′) − ∫̄_a^b f(x) dx < ε and a subdivision σ″ such that ∫̲_a^b f(x) dx − I(f; σ″) < ε. Let σ be a subdivision that refines both σ′ and σ″. Thanks to (30.6) we have I(f; σ″) ≤ I(f; σ) ≤ S(f; σ) ≤ S(f; σ′), and therefore

S(f; σ) − I(f; σ) ≤ S(f; σ′) − I(f; σ″) < (∫̄_a^b f(x) dx + ε) − (∫̲_a^b f(x) dx − ε) = 2ε

as desired.
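The criterion can be observed numerically: for an integrable function, refining the subdivision drives S(f; σ) − I(f; σ) below any ε. A minimal self-contained sketch (Python, uniform subdivisions, with inf/sup approximated by sampling):

def gap(f, n, a=0.0, b=1.0, samples=200):
    # S(f, sigma_n) - I(f, sigma_n) for the uniform subdivision with n bases
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        ys = [f(a + i * dx + j * dx / samples) for j in range(samples + 1)]
        total += (max(ys) - min(ys)) * dx  # (M_i - m_i) * Delta x_i
    return total

# For f(x) = x^2 the gap shrinks roughly like 1/n,
# so it falls below any given epsilon for n large enough.
for n in (10, 100, 1000):
    print(n, gap(lambda x: x * x, n))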

The next result shows that, if two functions are equal except at a finite number of points, then their integrals, if they exist, are equal. It is an important stability property of the integral, whose value does not change if we modify a function f : [a; b] → R at a finite number of points.

Proposition 1086 Let f : [a; b] → R be an integrable function. If g : [a; b] → R is equal to f except at most at a finite number of points, then g is also integrable and ∫_a^b f(x) dx = ∫_a^b g(x) dx.

Proof It is sufficient to prove the statement for the case in which g differs from f at only one point x̂ ∈ [a; b]. The case in which g differs from f at n points is then proved simply by finite induction, adding one point at a time.
Let us suppose therefore that f(x̂) ≠ g(x̂) with x̂ ∈ [a; b]. Without loss of generality, let us suppose that f(x̂) > g(x̂). Setting k = f(x̂) − g(x̂) > 0, let h : [a; b] → R be the function defined by h = f − g. We have therefore

h(x) = 0 if x ≠ x̂, k if x = x̂

Let us prove that h is integrable and that ∫_a^b h(x) dx = 0. Let ε > 0. Let us consider any subdivision σ = {x₀, x₁, …, xₙ} of [a; b] such that |σ| < ε/(2k). Since x̂ ∈ [a; b], there are two possibilities: in the first case x̂ does not coincide with an interior point of the subdivision, that is, we have either x̂ ∈ (x_{i−1}; x_i) for some i = 1, …, n or x̂ ∈ {x₀, xₙ}; in the second case x̂ is a point of the subdivision other than the endpoints, that is, x̂ = x_i for some i = 1, …, n − 1. Since h(x) = 0 for every x ≠ x̂, we have

I(h; σ) = 0

If x̂ ∈ (x_{i−1}; x_i) for some i = 1, …, n, or x̂ ∈ {x₀, xₙ}, we have⁵

S(h; σ) = kΔx_i < k · ε/(2k) = ε/2 < ε

If x̂ = x_i for some i = 1, …, n − 1, we have

S(h; σ) = k(Δx_i + Δx_{i+1}) < 2k · ε/(2k) = ε
Therefore, in any case we have S(h; σ) − I(h; σ) < ε. Since ε > 0 is arbitrary, thanks to Proposition 1085, h is integrable on [a; b]. Hence

∫_a^b h(x) dx = sup_{σ∈Σ} I(h; σ) = inf_{σ∈Σ} S(h; σ)   (30.18)

But, h(x) being 0 for every x ≠ x̂, we have I(h; σ) = 0 for every subdivision σ ∈ Σ. Hence

sup_{σ∈Σ} I(h; σ) = 0

and, thanks to (30.18), we can conclude that

∫_a^b h(x) dx = sup_{σ∈Σ} I(h; σ) = 0

⁵ If x̂ = x₀, we have S(h; σ) = kΔx₁; if x̂ = xₙ, we have S(h; σ) = kΔxₙ. In both cases we have S(h; σ) < ε.

Applying the linearity of the integral (Theorem 1095), we have that g = f − h is integrable because f and h are so, with

∫_a^b g(x) dx = ∫_a^b f(x) dx − ∫_a^b h(x) dx = ∫_a^b f(x) dx

as desired.

O.R. Even if a function f is not defined at a finite number of points of the interval [a; b], it is still possible to talk of its integral: it coincides with that of any function defined also at the missing points and equal to f at the points at which the latter is defined. In particular, the integrals of f on [a; b], (a; b], [a; b) and (a; b) always coincide: this makes the notation ∫_a^b f(x) dx unambiguous. H

Finally, let us show that integrability is preserved under continuous transformations.

Proposition 1087 Let f : [a; b] → R be an integrable and bounded function, with m ≤ f ≤ M. If g : [m; M] → R is continuous, then the composite function g ∘ f : [a; b] → R is integrable.

 0. Since g">
Proof Let ε > 0. Since g is continuous on [m; M], thanks to Theorem 473 the function g is uniformly continuous on [m; M], that is, there exists δ_ε > 0 such that

|x − y| < δ_ε ⟹ |g(x) − g(y)| < ε   ∀x, y ∈ [m; M]   (30.19)

Without loss of generality, we can assume that δ_ε < ε.


Since f is integrable, by Proposition 1085 there exists a subdivision σ = {x_i}_{i=0}^n of [a; b] such that S(f; σ) − I(f; σ) < δ_ε². Let I ⊆ {1, 2, …, n} be the set of the indices i of the subdivision such that

sup_{x∈[x_{i−1};x_i]} f(x) − inf_{x∈[x_{i−1};x_i]} f(x) < δ_ε

so that, for i ∈ I,

|f(x) − f(x′)| < δ_ε   ∀x, x′ ∈ [x_{i−1}; x_i]

From (30.19) it follows that, for every i ∈ I,

|(g ∘ f)(x) − (g ∘ f)(x′)| < ε   ∀x, x′ ∈ [x_{i−1}; x_i]

and therefore

sup_{x∈[x_{i−1};x_i]} (g ∘ f)(x) − inf_{x∈[x_{i−1};x_i]} (g ∘ f)(x) ≤ ε   ∀i ∈ I

On the other hand,⁶

δ_ε Σ_{i∉I} Δx_i ≤ Σ_{i∉I} [sup_{x∈[x_{i−1};x_i]} f(x) − inf_{x∈[x_{i−1};x_i]} f(x)] Δx_i < δ_ε²

⁶ Here i ∉ I stands for i ∈ {1, 2, …, n} \ I.

and therefore Σ_{i∉I} Δx_i < δ_ε < ε. Hence

S(g ∘ f; σ) − I(g ∘ f; σ) = Σ_{i=1}^n [sup_{x∈[x_{i−1};x_i]} (g ∘ f)(x) − inf_{x∈[x_{i−1};x_i]} (g ∘ f)(x)] Δx_i
= Σ_{i∈I} [sup (g ∘ f) − inf (g ∘ f)] Δx_i + Σ_{i∉I} [sup (g ∘ f) − inf (g ∘ f)] Δx_i
≤ ε Σ_{i∈I} Δx_i + 2 max_{y∈[m;M]} |g(y)| Σ_{i∉I} Δx_i
< ε(b − a) + 2 max_{y∈[m;M]} |g(y)| ε
= (b − a + 2 max_{y∈[m;M]} |g(y)|) ε

Thanks to Proposition 1085, g ∘ f is integrable.

Since the function g(x) = |x| is continuous, a simple but important consequence of Proposition 1087 is that the integrability of a bounded function f : [a; b] → R implies the integrability of the absolute value function |f| : [a; b] → R. Note that the converse is false: the function

f(x) = 1 if x ∈ Q ∩ [0; 1], −1 if x ∉ Q ∩ [0; 1]   (30.20)

is a simple modification of the Dirichlet function and hence is not integrable, contrary to its absolute value |f|, which is the function constantly equal to 1 on the interval [0; 1].

Finally, observe that the first among the integrability criteria of the section, Proposition 1085, allows an interesting perspective on the Riemann integral. Given any subdivision σ = {x_i}_{i=0}^n, by definition we have m_i ≤ f(x′_i) ≤ M_i for every x′_i ∈ [x_{i−1}; x_i], so that

I(f; σ) ≤ Σ_{i=1}^n f(x′_i) Δx_i ≤ S(f; σ)

Hence, since

I(f; σ) ≤ ∫_a^b f(x) dx ≤ S(f; σ)

we have

I(f; σ) − S(f; σ) ≤ Σ_{i=1}^n f(x′_i) Δx_i − ∫_a^b f(x) dx ≤ S(f; σ) − I(f; σ)

which is equivalent to

|Σ_{i=1}^n f(x′_i) Δx_i − ∫_a^b f(x) dx| ≤ S(f; σ) − I(f; σ)

Thanks to Proposition 1085, for every ε > 0 there exists a subdivision σ sufficiently fine for which

|Σ_{i=1}^n f(x′_i) Δx_i − ∫_a^b f(x) dx| ≤ S(f; σ) − I(f; σ) < ε

In a suggestive way we can therefore write

lim_{|σ|→0} Σ_{i=1}^n f(x′_i) Δx_i = ∫_a^b f(x) dx   (30.21)

that is, the Riemann integral ∫_a^b f(x) dx can be seen as the limit, for smaller and smaller meshes |σ| of the subdivisions σ, of the sums Σ_{i=1}^n f(x′_i) Δx_i.⁷ It is an equivalent way to see the Riemann integral, which is sometimes defined directly in these terms through (30.21).
Even if evocative, the limit lim_{|σ|→0} is not among the notions of limit (for sequences or functions) discussed in the book (indeed, it requires a more subtle notion of limit); moreover, the definition we have adopted is particularly suited for generalizations of the Riemann integral, as the reader will see in more advanced courses on integration.
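The limit (30.21) is easy to visualize numerically: whatever points x′_i are picked inside the bases, the Riemann sums stabilize on the same value as the mesh shrinks. A small sketch (Python; the function and interval are our own choice):

import random

def riemann_sum(f, sigma):
    # sum of f(x'_i) * Delta x_i with x'_i drawn arbitrarily in each base
    total = 0.0
    for x_prev, x_next in zip(sigma, sigma[1:]):
        xi = random.uniform(x_prev, x_next)  # any choice of x'_i is allowed
        total += f(xi) * (x_next - x_prev)
    return total

# As |sigma| -> 0 the sums approach the integral of x^2 on [0; 1],
# namely 1/3, regardless of how the points x'_i are chosen.
for n in (10, 100, 1000, 10000):
    sigma = [i / n for i in range(n + 1)]
    print(n, riemann_sum(lambda x: x * x, sigma))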

30.4 Classes of integrable functions

Reinforced by the integrability criteria seen in the previous section, we now study some important classes of integrable functions.

30.4.1 Step functions

There is a class of functions strictly linked to the plurirectangles that plays a central role in the theory of integration.

Definition 1088 A function f : [a; b] → R is said to be a step function if there exist a subdivision σ = {x_i}_{i=0}^n and a set {c_i}_{i=1}^n of constants such that

f(x) = c_i   ∀x ∈ (x_{i−1}; x_i)   (30.22)

For example, the functions f, g : [a; b] → R given by

f(x) = Σ_{i=1}^{n−1} c_i 1_{[x_{i−1};x_i)}(x) + c_n 1_{[x_{n−1};xₙ]}(x)   (30.23)

and

g(x) = c₁ 1_{[x₀;x₁]}(x) + Σ_{i=2}^n c_i 1_{(x_{i−1};x_i]}(x)   (30.24)

are step functions where, for every set A ⊆ R, we have denoted by 1_A : R → R the indicator function

1_A(x) = 1 if x ∈ A, 0 if x ∉ A   (30.25)
⁷ Often called Riemann sums (or, sometimes, Cauchy sums).

The two following figures report, for n = 4, examples of the functions f and g described by (30.23) and (30.24). Note that f and g are, respectively, continuous from the right and from the left, that is, lim_{x→x₀⁺} f(x) = f(x₀) and lim_{x→x₀⁻} g(x) = g(x₀).

[Figure: graphs of the step functions f (left) and g (right), taking the values c₁, …, c₄ on the four bases determined by x₀ < x₁ < x₂ < x₃ < x₄.]

On the intervals

[x₀; x₁) ∪ (x₁; x₂) ∪ (x₂; x₃) ∪ (x₃; x₄]

both step functions generate the same plurirectangle

[Figure: the common plurirectangle determined by the subdivision {x_i}_{i=0}^4 and the constants {c_i}_{i=1}^4.]

determined by the subdivision {x_i}_{i=0}^4 and by the constants {c_i}_{i=1}^4. Nevertheless, at the points x₁ < x₂ < x₃ the functions f and g differ, and it is easy to verify that on the whole interval [x₀; x₄] they do not generate this plurirectangle, as the next figure shows. Indeed, the dashed segment at x₂ is not under f and the dashed segments at x₁ and x₃ are not under g.

[Figure: graphs of f and g with dashed vertical segments at x₁, x₂, x₃ marking the points where each function fails to reach the plurirectangle.]

But, thanks to Proposition 1086, such a discrepancy at a finite number of points is irrelevant for the integral, and the next result shows that the area under the step functions f and g is, actually, equal to that of the corresponding plurirectangle (independently of the values of the function at the points x₁ < x₂ < x₃).

Proposition 1089 A step function f : [a; b] → R, determined by the subdivision {x_i}_{i=0}^n and by the constants {c_i}_{i=1}^n according to (30.22), is integrable and we have

∫_a^b f(x) dx = Σ_{i=1}^n c_i Δx_i   (30.26)

All the step functions that are determined by a subdivision {x_i}_{i=0}^n and by a set of constants {c_i}_{i=1}^n according to (30.22) share, therefore, the same integral (30.26). In particular, this holds for the step functions (30.23) and (30.24).
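Formula (30.26) translates directly into code; the following sketch (Python, our own illustration) computes the integral of a step function from its subdivision and constants:

def step_integral(xs, cs):
    # Integral of the step function with subdivision xs = [x_0, ..., x_n]
    # and values cs = [c_1, ..., c_n] on the open bases, via (30.26):
    # sum of c_i * Delta x_i
    return sum(c * (x_next - x_prev)
               for c, x_prev, x_next in zip(cs, xs, xs[1:]))

# A step function on [0; 4] taking the values 1, 3, 2, 4 on four unit bases:
print(step_integral([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))  # 10.0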

Proof Since f is bounded, thanks to Lemma 1072 we have ∫̲_a^b f(x) dx, ∫̄_a^b f(x) dx ∈ R. Let m = inf_{x∈[a;b]} f(x) and M = sup_{x∈[a;b]} f(x). Having fixed ε > 0 sufficiently small, let us consider the subdivision σ_ε given by

x₀ < x₁ − ε < x₁ + ε < x₂ − ε < x₂ + ε < ⋯ < x_{n−1} − ε < x_{n−1} + ε < xₙ

We have

I(f; σ_ε) = c₁(x₁ − ε − x₀) + 2ε inf_{x∈[x₁−ε; x₁+ε]} f(x) + c₂(x₂ − ε − (x₁ + ε)) + 2ε inf_{x∈[x₂−ε; x₂+ε]} f(x) + ⋯ + 2ε inf_{x∈[x_{n−1}−ε; x_{n−1}+ε]} f(x) + cₙ(xₙ − (x_{n−1} + ε))

= c₁(Δx₁ − ε) + Σ_{i=1}^{n−1} 2ε inf_{x∈[x_i−ε; x_i+ε]} f(x) + Σ_{i=2}^{n−1} c_i(Δx_i − 2ε) + cₙ(Δxₙ − ε)

= Σ_{i=1}^n c_i Δx_i − ε(c₁ + cₙ) + 2ε Σ_{i=1}^{n−1} inf_{x∈[x_i−ε; x_i+ε]} f(x) − 2ε Σ_{i=2}^{n−1} c_i

≥ Σ_{i=1}^n c_i Δx_i − 2εM + 2ε(n − 1)m − 2εM(n − 2)

= Σ_{i=1}^n c_i Δx_i − 2ε(n − 1)(M − m)

In an analogous way we show that

S(f; σ_ε) ≤ Σ_{i=1}^n c_i Δx_i + 2ε(n − 1)(M − m)

and therefore, setting K = 2(n − 1)(M − m) > 0, we have

S(f; σ_ε) − I(f; σ_ε) ≤ 2Kε

Given the arbitrariness of ε > 0, thanks to Proposition 1085, f is integrable. Moreover, since

I(f; σ_ε) ≤ ∫_a^b f(x) dx ≤ S(f; σ_ε)

we have

Σ_{i=1}^n c_i Δx_i − Kε ≤ ∫_a^b f(x) dx ≤ Σ_{i=1}^n c_i Δx_i + Kε

which, given the arbitrariness of ε > 0, guarantees that ∫_a^b f(x) dx = Σ_{i=1}^n c_i Δx_i.

30.4.2 Analytic approach and geometric approach

Step functions can be seen as the functional version of plurirectangles. They are, therefore, the simplest functions to which integration applies. In particular, thanks to (30.26), that is,

∫_a^b f(x) dx = Σ_{i=1}^n c_i Δx_i

the lower and upper integrals can be expressed in terms of integrals of step functions. Let S([a; b]) be the set of all the step functions defined on [a; b].

Proposition 1090 Given a bounded function f : [a; b] → R, we have

∫̲_a^b f(x) dx = sup { ∫_a^b h(x) dx : h ≤ f and h ∈ S([a; b]) }   (30.27)

and

∫̄_a^b f(x) dx = inf { ∫_a^b h(x) dx : h ≥ f and h ∈ S([a; b]) }   (30.28)

Thanks to (30.27) and (30.28), a bounded function f : [a; b] → R is integrable according to Riemann if and only if

sup { ∫_a^b h(x) dx : h ≤ f and h ∈ S([a; b]) } = inf { ∫_a^b h(x) dx : f ≤ h and h ∈ S([a; b]) }

that is, if and only if the lower approximation given by the integrals of step functions smaller than f coincides, at the limit, with the upper approximation given by the integrals of step functions larger than f. In this case the exhaustion assumes a more analytic and less geometric aspect,⁸ the approximation through elementary polygons (the plurirectangles) having been replaced by one given by elementary functions (the step functions).
This suggests a different approach to the Riemann integral, more analytic and less geometric. In it, we first define the integrals of step functions (that is, the areas under them), which can be determined on the basis of elementary geometric considerations based on plurirectangles. We then use these "elementary" integrals to suitably approximate the areas under more complicated functions. In particular, we define the lower integral of a bounded function f : [a; b] → R as the best approximation "from below" obtained thanks to step functions h ≤ f and, analogously, the upper integral of a bounded function f : [a; b] → R as the best approximation "from above" obtained with step functions h ≥ f.
Thanks to (30.27) and (30.28), this more analytic interpretation of the method of exhaustion is equivalent to the geometric one previously adopted. The analytic approach is very fruitful for some subsequent developments.

30.4.3 Continuous functions and monotonic functions

Let us introduce two important classes of integrable functions, the continuous ones and the monotonic ones.

Proposition 1091 Every continuous function f : [a; b] → R is integrable.

Proof Since f is continuous on [a; b], thanks to Weierstrass' Theorem, f is bounded. Let ε > 0. By Theorem 473, f is uniformly continuous, that is, there exists δ_ε > 0 such that

|x − y| < δ_ε ⟹ |f(x) − f(y)| < ε   ∀x, y ∈ [a; b]   (30.29)

Let σ = {x_i}_{i=0}^n be a subdivision of [a; b] such that |σ| < δ_ε. Thanks to (30.29), for every i = 1, 2, …, n we therefore have

max_{x∈[x_{i−1};x_i]} f(x) − min_{x∈[x_{i−1};x_i]} f(x) < ε

⁸ That is, based also on the use of notions of analysis, such as functions, and not only on that of geometric figures, such as plurirectangles.

where the max and min exist thanks to Weierstrass' Theorem. It follows that

S(f; σ) − I(f; σ) = Σ_{i=1}^n max_{x∈[x_{i−1};x_i]} f(x) Δx_i − Σ_{i=1}^n min_{x∈[x_{i−1};x_i]} f(x) Δx_i
= Σ_{i=1}^n [max_{x∈[x_{i−1};x_i]} f(x) − min_{x∈[x_{i−1};x_i]} f(x)] Δx_i
< ε Σ_{i=1}^n Δx_i = ε(b − a)

Thanks to Proposition 1085, f is integrable.

By the stability of the integral seen in Proposition 1086, we have the following immediate generalization of Proposition 1091: every bounded function f : [a; b] → R that has at most a finite number of removable discontinuities is integrable. Indeed, recalling (12.7) of Chapter 12, if S = {x_i}_{i=1}^n is the set of points where f has removable discontinuities, the function

f̃(x) = f(x) if x ∉ S, lim_{y→x} f(y) if x ∈ S

is continuous (and therefore integrable) and is equal to f except at the points of S.
The hypothesis that the discontinuities are removable is actually superfluous. Moreover, we can allow for countably many points of discontinuity (but not more than that).

Theorem 1092 Every bounded function f : [a; b] → R with at most countably many discontinuities is integrable.

Therefore, a function is integrable if its points of discontinuity form a finite or countable set. We omit the proof (which is less immediate than that of the special case just seen with only removable discontinuities).
This important integrability result generalizes both Proposition 1091 and Proposition 1089 on the integrability of step functions (which, obviously, are continuous except at the points of the subdivisions that define them).⁹ Let us see a pair of examples.

Example 1093 (i) The function f : [0; 1] → R defined by

f(x) = x if x ∈ (0; 1), 1/2 if x ∈ {0, 1}

is continuous at all the points of [0; 1], except at the two extreme points 0 and 1. By Theorem 1092, the function f is integrable.
(ii) Consider the countable set

E = { 1/n : n ≥ 1 } ⊆ [0; 1]

⁹ In more advanced courses, the reader will study more general versions of the already remarkable Theorem 1092.

The function f : [0; 1] → R defined by

f(x) = x² if x ∉ E, 0 if x ∈ E

is continuous at all the points of [0; 1], except at the points of E.¹⁰ Since E is a countable set, by Theorem 1092 the function f is integrable. N

Note that the Dirichlet function f : [0; 1] → R

f(x) = 1 if x ∈ Q ∩ [0; 1], 0 if x ∉ Q ∩ [0; 1]

which we know is not integrable, does not satisfy the hypotheses of Theorem 1092. Indeed, even if it is bounded, it is discontinuous at each point of [0; 1] (not only at the points x ∈ Q ∩ [0; 1], which form a countable set).

Let us now consider the monotonic functions.

Proposition 1094 Every monotonic function f : [a; b] → R is integrable.

The result follows immediately from Theorem 1092 because monotonic functions have at most countably many points of discontinuity (Proposition 455). That said, we give a simple direct proof of the result.

 0. Let">
Proof Let ε > 0. Let σ = {x_i}_{i=0}^n be a subdivision of [a; b] such that |σ| < ε. Let us suppose that f is increasing (the argument for f decreasing is analogous). We have

inf_{x∈[x_{i−1};x_i]} f(x) = f(x_{i−1}) ≥ f(a) and sup_{x∈[x_{i−1};x_i]} f(x) = f(x_i) ≤ f(b)

and therefore

S(f; σ) − I(f; σ) = Σ_{i=1}^n sup_{x∈[x_{i−1};x_i]} f(x) Δx_i − Σ_{i=1}^n inf_{x∈[x_{i−1};x_i]} f(x) Δx_i
= Σ_{i=1}^n f(x_i) Δx_i − Σ_{i=1}^n f(x_{i−1}) Δx_i = Σ_{i=1}^n (f(x_i) − f(x_{i−1})) Δx_i
≤ |σ| Σ_{i=1}^n (f(x_i) − f(x_{i−1})) < ε (f(b) − f(a))

Thanks to Proposition 1085, the function f is integrable.
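The key bound in the proof, S(f; σ) − I(f; σ) ≤ |σ|(f(b) − f(a)), can be checked numerically; in the sketch below (Python, uniform subdivisions, an increasing f of our choosing) the gap in fact telescopes exactly to the bound:

import math

def gap_monotone(f, a, b, n):
    # S - I for an increasing f on the uniform subdivision with n bases;
    # on each base the inf and sup are attained at the endpoints.
    dx = (b - a) / n
    return sum((f(a + i * dx) - f(a + (i - 1) * dx)) * dx
               for i in range(1, n + 1))

a, b = 0.0, 1.0
for n in (10, 100, 1000):
    g = gap_monotone(math.exp, a, b, n)
    bound = ((b - a) / n) * (math.exp(b) - math.exp(a))  # |sigma|(f(b) - f(a))
    print(n, g, bound)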


¹⁰ Note that f is continuous at the origin, as the reader can verify.

30.5 Properties of the integral

The first important property of the integral is its linearity: the integral of a linear combination of functions is equal to the linear combination of their integrals.

Theorem 1095 Let f, g : [a; b] → R be two bounded and integrable functions. Then, for every α, β ∈ R, the function αf + βg : [a; b] → R is integrable and we have

∫_a^b (αf + βg)(x) dx = α ∫_a^b f(x) dx + β ∫_a^b g(x) dx   (30.30)

Proof The proof is divided in two parts. First we will prove homogeneity, that is,

∫_a^b αf(x) dx = α ∫_a^b f(x) dx   ∀α ∈ R   (30.31)

Then we will prove additivity, that is,

∫_a^b (f + g)(x) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx   (30.32)

for f and g integrable. Together, expressions (30.31) and (30.32) are equivalent to (30.30).

(i) Homogeneity. Let σ = {x_i}_{i=0}^n be a subdivision of [a; b]. If α ≥ 0 we have I(αf; σ) = αI(f; σ) and S(αf; σ) = αS(f; σ). Therefore, αf is integrable, with

∫_a^b αf(x) dx = α ∫_a^b f(x) dx   (30.33)

Let now α < 0. Let us start by considering the case α = −1. We have

I(−f; σ) = Σ_{i=1}^n inf_{x∈[x_{i−1};x_i]} (−f)(x) Δx_i = Σ_{i=1}^n (−sup_{x∈[x_{i−1};x_i]} f(x)) Δx_i
= −Σ_{i=1}^n sup_{x∈[x_{i−1};x_i]} f(x) Δx_i = −S(f; σ)

In an analogous way, we have S(−f; σ) = −I(f; σ). Let ε > 0. Since f is integrable, by Proposition 1085, there exists σ such that S(f; σ) − I(f; σ) < ε. Therefore, S(−f; σ) − I(−f; σ) = S(f; σ) − I(f; σ) < ε, which implies, by Proposition 1085, that −f is integrable. Moreover,

∫_a^b (−f)(x) dx = sup_{σ∈Σ} I(−f; σ) = sup_{σ∈Σ} (−S(f; σ)) = −inf_{σ∈Σ} S(f; σ) = −∫_a^b f(x) dx

Let now α < 0. We have αf = (−α)(−f) with −α > 0. Then, applying (30.33), we have

∫_a^b αf(x) dx = ∫_a^b (−α)(−f)(x) dx = (−α) ∫_a^b (−f)(x) dx = (−α)(−∫_a^b f(x) dx) = α ∫_a^b f(x) dx

In conclusion,

∫_a^b αf(x) dx = α ∫_a^b f(x) dx   ∀α ∈ R   (30.34)

that is, (30.31).

In conclusion,
Z b Z b
f (x) dx = f (x) dx 8 2R (30.34)
a a
that is (30.31).
(ii) Additivity. Let us prove (30.32). Let " > 0. Since f and g are integrable, by
Proposition 1085 there exists a subdivision of [a; b] such that S (f; ) I (f; ) < " and
there exists 0 such that S (g; 0 ) I (g; 0 ) < ". Let 00 be a subdivision of [a; b] that re…nes
both and 0 . Thanks to (30.6), we have S (f; 00 ) I (f; 00 ) < " and S (g; 00 ) I (g; 00 ) < ".
Moreover, applying the inequalities of Lemma 1082,
00 00 00 00 00 00
I f; + I g; I f + g; S f + g; S f; + S g; (30.35)

and therefore
00 00 00 00 00 00
S f + g; I f + g; S f; I f; + S g; I g; < 2"

By Proposition 1085, f + g is integrable. Hence, (30.35) becomes

I(f; σ) + I(g; σ) ≤ ∫_a^b (f + g)(x) dx ≤ S(f; σ) + S(g; σ)

for every subdivision σ ∈ Σ. Subtracting ∫_a^b f(x) dx + ∫_a^b g(x) dx from all three members of the inequality, we obtain

I(f; σ) + I(g; σ) − (∫_a^b f(x) dx + ∫_a^b g(x) dx) ≤ ∫_a^b (f + g)(x) dx − (∫_a^b f(x) dx + ∫_a^b g(x) dx) ≤ S(f; σ) + S(g; σ) − (∫_a^b f(x) dx + ∫_a^b g(x) dx)

that is,

(I(f; σ) − ∫_a^b f(x) dx) + (I(g; σ) − ∫_a^b g(x) dx) ≤ ∫_a^b (f + g)(x) dx − (∫_a^b f(x) dx + ∫_a^b g(x) dx) ≤ (S(f; σ) − ∫_a^b f(x) dx) + (S(g; σ) − ∫_a^b g(x) dx)

Since f and g are integrable, given ε > 0 it is possible to find a subdivision σ_ε such that, for h = f, g,

I(h; σ_ε) − ∫_a^b h(x) dx > −ε/2 and S(h; σ_ε) − ∫_a^b h(x) dx < ε/2

so that

−ε < ∫_a^b (f + g)(x) dx − (∫_a^b f(x) dx + ∫_a^b g(x) dx) < ε

and, given the arbitrariness of ε > 0, necessarily

∫_a^b (f + g)(x) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx   (30.36)

that is, (30.32).

An important consequence of the linearity of the integral is that the product of two integrable functions is integrable.

Corollary 1096 If f, g : [a; b] → R are two bounded and integrable functions, then their product fg : [a; b] → R is integrable.

Proof If f = g, the integrability of f² follows from Proposition 1087 by considering the continuous function g(x) = x². If f ≠ g, then fg can be rewritten in the following way:

fg = (1/4)[(f + g)² − (f − g)²]

By Theorem 1095, f + g and f − g are integrable. By what has just been proved, their squares are also integrable; applying Theorem 1095 again, we have that fg is integrable.

O.R. Thanks to the linearity of the integral, the knowledge of the integrals of f and g allows us to calculate the integral of f + g. It is not so for the product or for the composition of integrable functions: the integrability of f guarantees the integrability of f², but the knowledge of the integral of f does not help in the calculation of the integral of f². More generally, knowing that g ∘ f is integrable does not give any useful indication for the computation of the integral of such a composite function. H

Finally, the linearity of the integral implies that it is possible to subdivide freely the domain of integration [a; b] into subintervals.

Corollary 1097 Let f : [a; b] → R be a bounded and integrable function. If a < c < b, we have

∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx   (30.37)

Vice versa, if f₁ : [a; c] → R and f₂ : [c; b] → R are bounded and integrable, then the function f : [a; b] → R defined by

f(x) = f₁(x) if x ∈ [a; c], f₂(x) if x ∈ (c; b]

is itself bounded and integrable, with

∫_a^b f(x) dx = ∫_a^c f₁(x) dx + ∫_c^b f₂(x) dx

Proof Let us prove the first part. Since (recall the definition (30.25) of the indicator function)

f = 1_{[a;c]} f + 1_{(c;b]} f

the linearity of the integral implies

∫_a^b f(x) dx = ∫_a^b (1_{[a;c]} f + 1_{(c;b]} f)(x) dx = ∫_a^b (1_{[a;c]} f)(x) dx + ∫_a^b (1_{(c;b]} f)(x) dx

Let us show that

∫_a^b (1_{[a;c]} f)(x) dx = ∫_a^c f(x) dx

where

(1_{[a;c]} f)(x) = f(x) if x ∈ [a; c], 0 if x ∈ (c; b]

 0. Since 1[a;c] f (x) is integrable">
Let ε > 0. Since 1_{[a;c]} f is integrable (being the product of integrable functions),¹¹ by Proposition 1085 there exists a subdivision σ of [a; b] such that

S(1_{[a;c]} f; σ) − I(1_{[a;c]} f; σ) < ε

Let σ′ = {x_i}_{i=0,1,…,n} be a refinement of σ that has c as a point of subdivision, say c = x_j. Then we have

S(1_{[a;c]} f; σ′) − I(1_{[a;c]} f; σ′) < ε

Let σ″ = σ′ ∩ [a; c]. In other words, σ″ = {x₀, x₁, …, x_j} is the restriction of the subdivision σ′ to the interval [a; c]. Using the usual notation m_i and M_i for every i = 1, 2, …, n, and since m_i = M_i = 0 for i > j, we have

I(1_{[a;c]} f; σ′) = Σ_{i=1}^n m_i Δx_i = Σ_{i≤j} m_i Δx_i = I(f|_{[a;c]}; σ″)   (30.38)

and

S(1_{[a;c]} f; σ′) = Σ_{i=1}^n M_i Δx_i = Σ_{i≤j} M_i Δx_i = S(f|_{[a;c]}; σ″)   (30.39)

Therefore,

S(f|_{[a;c]}; σ″) − I(f|_{[a;c]}; σ″) < ε

and by Proposition 1085 we can conclude that f|_{[a;c]} : [a; c] → R is integrable. Moreover, from (30.38) and (30.39) we deduce that

∫_a^b (1_{[a;c]} f)(x) dx = ∫_a^c f|_{[a;c]}(x) dx = ∫_a^c f(x) dx

¹¹ The indicator function, being a step function, is integrable.

In an analogous way we prove that

∫_a^b (1_{(c;b]} f)(x) dx = ∫_c^b f(x) dx

and therefore (30.37) follows.

Let us prove the second part. Let ε > 0. Since f₁ is integrable, there exists a subdivision σ′ of [a; c] such that

S(f₁; σ′) − I(f₁; σ′) < ε

Since f₂ is integrable, there exists a subdivision σ″ of [c; b] such that

S(f₂; σ″) − I(f₂; σ″) < ε

Therefore, taking the subdivision of [a; b] given by σ = σ′ ∪ σ″, we get

S(f; σ) − I(f; σ) = [S(f₁; σ′) − I(f₁; σ′)] + [S(f₂; σ″) − I(f₂; σ″)] < 2ε

which shows that f is integrable. Moreover, we have

f|_{[a;c]} = f₁ and f|_{(c;b]} = f₂

therefore f = 1_{[a;c]} f₁ + 1_{(c;b]} f₂ and, by the linearity of the integral and the first part of the proof, we have

∫_a^b f(x) dx = ∫_a^b (1_{[a;c]} f)(x) dx + ∫_a^b (1_{(c;b]} f)(x) dx = ∫_a^c f₁(x) dx + ∫_c^b f₂(x) dx

as desired.

The next property, monotonicity of the integral, shows that to larger functions there correspond larger integrals. The writing f ≤ g means f(x) ≤ g(x) for every x ∈ [a; b].

Theorem 1098 Let f, g : [a; b] → R be two bounded and integrable functions. If f ≤ g, then ∫_a^b f(x) dx ≤ ∫_a^b g(x) dx.

Proof From f ≤ g it follows that

I(f; σ) ≤ I(g; σ) and S(f; σ) ≤ S(g; σ)   ∀σ ∈ Σ

which, in turn, implies ∫_a^b f(x) dx ≤ ∫_a^b g(x) dx.

From the monotonicity of the integral there follows an important inequality between absolute values of integrals and integrals of absolute values. In this regard, recall that after Proposition 1087 we observed that the integrability of |f| follows from the integrability of f.

Corollary 1099 Let f : [a; b] → R be a bounded and integrable function. We have

|∫_a^b f(x) dx| ≤ ∫_a^b |f(x)| dx   (30.40)

Proof Since f ≤ |f| and −f ≤ |f|, from Theorem 1098 it follows that ∫_a^b f(x) dx ≤ ∫_a^b |f(x)| dx and −∫_a^b f(x) dx ≤ ∫_a^b |f(x)| dx, that is, |∫_a^b f(x) dx| ≤ ∫_a^b |f(x)| dx.

The monotonicity of the integral allows us to establish an interesting sandwich for integrals.

Proposition 1100 Let f : [a; b] → R be a bounded and integrable function. Then, setting m = inf_{[a;b]} f(x) and M = sup_{[a;b]} f(x), we have

m(b − a) ≤ ∫_a^b f(x) dx ≤ M(b − a)   (30.41)

Proof We have

m ≤ f(x) ≤ M   ∀x ∈ [a; b]

and therefore, by the monotonicity of the integral,

∫_a^b m dx ≤ ∫_a^b f(x) dx ≤ ∫_a^b M dx

We obviously have ∫_a^b m dx = m(b − a) (it is the area of a rectangle of base b − a and height m) and ∫_a^b M dx = M(b − a). Therefore

m(b − a) ≤ ∫_a^b f(x) dx ≤ M(b − a)

as we wanted to prove.

We end with the classical Theorem of the integral mean (also called the Mean Value Theorem of integral calculus), which is a consequence of the sandwich (30.41).

Theorem 1101 (of the integral mean) Let f : [a; b] → R be a bounded and integrable function. Then, setting m = inf_{[a;b]} f(x) and M = sup_{[a;b]} f(x), there exists a scalar μ ∈ [m; M] such that

∫_a^b f(x) dx = μ(b − a)   (30.42)

In particular, if f is continuous, there exists c ∈ [a; b] such that f(c) = μ, that is,

∫_a^b f(x) dx = f(c)(b − a)

Expression (30.42) can be rewritten as

μ = (1/(b − a)) ∫_a^b f(x) dx

For this reason, μ is called the mean value (of the ordinates) of f: the value of the integral does not change if we substitute the constant value μ for all the ordinates of the function.

Proof By (30.41), we have

m ≤ (∫_a^b f(x) dx) / (b − a) ≤ M

By setting

μ = (∫_a^b f(x) dx) / (b − a)

we obtain the first part of the statement.
To prove the second part, assume that f is continuous. By Darboux's Theorem, it assumes all the values included between its minimum m and its maximum M. Therefore, there exists c ∈ [a; b] such that f(c) = μ, which completes the proof.

O.R. The Theorem of the integral mean is very intuitive: there exists a rectangle with base [a; b] and height μ whose area is equal to the one under f on [a; b]:

25

y
20

15

10

0
O a b x

-2 0 2 4 6 8

If, moreover, the function f is continuous, the height μ of such a rectangle coincides with one of the ordinates of f. H

N.B. We close the section with an important specification. Given a function f : [a; b] → R, until now we have considered the definite integral of f from a to b, that is, ∫_a^b f(x) dx. Sometimes it is useful to consider the integral of f from b to a, that is, ∫_b^a f(x) dx,¹² as well as the integral of f from a to a, that is, ∫_a^a f(x) dx. What do we mean by such expressions? By convention, we set, for a < b,

∫_b^a f(x) dx = −∫_a^b f(x) dx   (30.43)

and

∫_a^a f(x) dx = 0   (30.44)

¹² This happens, for example, if f is integrable on an interval [a; b] and we take two generic points x, y ∈ [a; b], without specifying whether x < y or x ≥ y, and then we consider the integral of f between x and y.

Thanks to such conventions it is no longer essential that in ∫_a^b we have a < b: in the case in which a ≥ b the integral assumes the meaning given to it by (30.43) and (30.44). Moreover, it is possible to prove that the properties proved for the integral ∫_a^b f(x) dx also hold in the case a ≥ b. O

30.6 Fundamental theorems of integral calculus

After having introduced the Riemann integral and studied its main properties, we turn our attention to the effective calculation of such integrals, for which the definition is of little help (even if it is, obviously, essential to understand its nature).
In this section we study the central result of integral calculus, whose "fundamental" name emphasizes its importance. Inter alia, it will show how integration can be seen as the inverse operation of differentiation, something that greatly simplifies the computation of integrals.

In the study of differentiability, we have considered functions differentiable on an open interval (a; b), or at least at the interior points of their domain. In this section we will consider functions f : [a; b] → R that are differentiable on [a; b], where the derivatives at the endpoints a and b are in the unilateral sense. In a similar way we talk of differentiability on the half-open intervals (a; b] and [a; b).

30.6.1 Primitive functions

Even if we will be mainly interested in functions defined on closed and bounded intervals [a; b], in this section we will consider, more generally, any interval I, be it open, closed, or half-open, bounded or unbounded (for example, I can be the entire real line R).

Definition 1102 Let f : I → R. A function P : I → R is called a primitive of f on I if it has a derivative on I and if it satisfies the equation

P′(x) = f(x)   ∀x ∈ I

In other words, passing from the function f to its primitive P can be seen as the inverse procedure with respect to passing from P to f through differentiation. In this sense, the primitive function is the inverse of the derivative function (so that sometimes it is called an antiderivative).
Let us now see a pair of examples. In this regard it is important to observe that, as Example 1108 will show, there exist functions that do not have primitives: the search for the primitive of a given function can be vain. In any case, a necessary condition for a function f to have a primitive is that it has no removable or jump discontinuities.

Example 1103 Let f : [0; 1] → R be given by f(x) = x. The function P : [0; 1] → R given by P(x) = x²/2 is a primitive of f. Indeed, P′(x) = 2x/2 = x. N

Example 1104 Let f : R → R be given by f(x) = x/(1 + x²). The function P : R → R given by

P(x) = (1/2) log(1 + x²)

is a primitive of f. Indeed,

P′(x) = (1/2) · (1/(1 + x²)) · 2x = f(x)

for every x ∈ R. N

Let us make a simple, but useful, observation: if I₁ and I₂ are two intervals such that I₁ ⊆ I₂, then, if P is a primitive of f on I₂, it is so also on I₁. For example, if we consider the restriction of f(x) = x/(1 + x²) to [0; 1], that is, the function f̃ : [0; 1] → R given by f̃(x) = x/(1 + x²), then the primitive on [0; 1] remains P(x) = (1/2) log(1 + x²).

Note that, if P is a primitive of f, then the function P + k, obtained by summing a constant to P, is also itself a primitive of f. Indeed, (P + k)′(x) = P′(x) = f(x) for every x ∈ [a; b]. The next result shows that, up to such translations, the primitive function is unique.

Proposition 1105 Let f : I → R and let P₁ : I → R be a primitive function of f. A function P₂ : I → R is a primitive of f on I if and only if there exists a constant k ∈ R such that

P₁ = P₂ + k

Proof The "if" is obvious. Let us prove the "only if". Let I = [a; b] and let P₁ : [a; b] → R and P₂ : [a; b] → R be two primitive functions of f on [a; b]. Since P₁′(x) = f(x) and P₂′(x) = f(x) for every x ∈ [a; b], we have

(P₁ − P₂)′(x) = P₁′(x) − P₂′(x) = 0   ∀x ∈ [a; b]

and therefore the function P₁ − P₂ has zero derivative on [a; b]. By Corollary 827, the function P₁ − P₂ is constant, that is, there exists k ∈ R such that P₁ = P₂ + k.
Let now I be an open and bounded interval (a; b). Let ε > 0 be sufficiently small so that a + ε < b − ε. We have

(a; b) = ⋃_{n=1}^∞ [a + ε/n; b − ε/n]

By what has just been proved, for every n ≥ 1 there exists a constant kₙ ∈ R such that

P₁(x) = P₂(x) + kₙ   ∀x ∈ [a + ε/n; b − ε/n]   (30.45)

Let x₀ ∈ (a; b) be such that a + ε < x₀ < b − ε, so that x₀ ∈ [a + ε/n; b − ε/n] for every n ≥ 1. From (30.45) it follows that P₁(x₀) = P₂(x₀) + kₙ for every n ≥ 1. Therefore, kₙ = P₁(x₀) − P₂(x₀) for every n ≥ 1, that is, k₁ = k₂ = ⋯. There exists therefore k ∈ R such that P₁(x) = P₂(x) + k for every x ∈ (a; b).
In a similar way it is possible to prove the result when I is a half-open and bounded interval (a; b] or [a; b). If I = R, we proceed as in the case (a; b), observing that R = ⋃_{n=1}^∞ [−n; n]. A similar argument, which we leave to the reader, holds also for unbounded intervals.

Proposition 1105 is another important application of the Mean Value Theorem (of differential calculus). Thanks to it, once a primitive P of a function f has been identified, we can write the family of all the primitives as {P + k}_{k∈R}. Such an important family has a name.

Definition 1106 Given a function f : I → R, the family of all its primitives is called the indefinite integral and is denoted by

∫ f(x) dx

Let us go back to Examples 1103 and 1104.

Example 1107 For the function f : [0; 1] → R given by f(x) = x, we have

∫ f(x) dx = x²/2 + k

For the function f : R → R given by f(x) = x/(1 + x²) we have

∫ f(x) dx = (1/2) log(1 + x²) + k

We close the section by showing that not all functions admit a primitive, and therefore an indefinite integral.

Example 1108 The signum function sgn : R → R given by

sgn(x) = 1 if x > 0, 0 if x = 0, −1 if x < 0

does not admit a primitive. Let us suppose, by contradiction, that there exists a primitive P : R → R, that is, a function with a derivative such that P′(x) = sgn x. By Proposition 1105 there exists k ∈ R such that

P(x) = x + k if x > 0, −x + k if x < 0

Since P has a derivative, by continuity we moreover have P(0) = k. Therefore, P(x) = |x| + k for every x ∈ R, but such a function does not have a derivative at the origin, which contradicts what has been assumed on P. Note that the signum function is a step function and therefore is integrable thanks to Proposition 1089. N

O.R. The Riemann integral ∫_a^b f(x) dx is often called the definite integral, distinguishing it in such a way from the indefinite integral just introduced. H

30.6.2 Formulary

The next table, obtained by reversing the analogous table of the fundamental derivatives, reports some fundamental indefinite integrals.

f — ∫ f(x) dx
x^a — x^{a+1}/(a + 1) + k, for −1 ≠ a ∈ R and x > 0
x^n — x^{n+1}/(n + 1) + k, for x ∈ R
1/x — log x + k, for x > 0
1/x — log(−x) + k, for x < 0
cos x — sin x + k, for x ∈ R
sin x — −cos x + k, for x ∈ R
e^x — e^x + k, for x ∈ R
α^x — α^x/log α + k, for α > 0, α ≠ 1, and x ∈ R
1/√(1 − x²) — arcsin x + k, for x ∈ (−1; 1)
1/(1 + x²) — arctan x + k, for x ∈ R
1/(cos x)² — tan x + k, for x ≠ π/2 + nπ, n ∈ Z

We make three observations:

(i) For powers we have

∫ x^a dx = x^{a+1}/(a + 1) + k   ∀a ≠ −1

on all of R when a is such that the power function x^a has R as its domain: for example, if a = n is a natural number. If instead a ∈ R (for example a = 1/2), it must be x > 0.

(ii) The case a = −1 for powers is covered by f(x) = 1/x.

(iii) Note that

∫ f(x) dx = log x + k if x > 0, log(−x) + k if x < 0

summarizes the cases x < 0 and x > 0 for f(x) = 1/x. In this regard, observe that for x < 0 and g(x) = log(−x) we have

g′(x) = (1/(−x)) · (−1) = 1/x
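Each row of the table can be checked by differentiating the proposed primitive; a sketch for a few rows using the sympy library (assuming it is available; this is our verification, not part of the text):

import sympy as sp

x = sp.symbols('x', positive=True)  # x > 0, as some rows require
a = sp.symbols('a')

# Differentiating the primitive must give back the integrand.
rows = [
    (x**a, x**(a + 1) / (a + 1)),   # a != -1
    (1 / x, sp.log(x)),
    (sp.cos(x), sp.sin(x)),
    (sp.sin(x), -sp.cos(x)),
    (1 / (1 + x**2), sp.atan(x)),
]
for integrand, primitive in rows:
    assert sp.simplify(sp.diff(primitive, x) - integrand) == 0
print('all rows check')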

30.6.3 The First Fundamental Theorem of Calculus

The next theorem, called the First Fundamental Theorem of Calculus, is a central result in the theory of integration. From the conceptual point of view, it shows how integration can be seen as the inverse operation of differentiation. This, in turn, offers a powerful method for the computation of integrals, based on the use of primitive functions.

Theorem 1109 (First Fundamental Theorem of Calculus) Let f : [a; b] → R be a bounded function and P : [a; b] → R any primitive function of f. If f is integrable, then

∫_a^b f(x) dx = P(b) − P(a)   (30.46)

Thanks to (30.46), the calculation of the Riemann integral ∫_a^b f(x) dx reduces to the calculation of a primitive P of f, that is, to the calculation of the indefinite integral. As we have seen in the previous section, this can be carried out by using in a suitable way the rules of differentiation studied in Chapter 18. In a certain sense, (30.46) reduces integral calculus to differential calculus.

Proof Let σ = {x_i}_{i=0}^n be a subdivision of [a; b]. If we add and subtract P(x_i) for every i = 1, 2, …, n − 1, we have

P(b) − P(a) = P(xₙ) − P(x_{n−1}) + P(x_{n−1}) − ⋯ − P(x₁) + P(x₁) − P(x₀) = Σ_{i=1}^n (P(x_i) − P(x_{i−1}))

Let us consider P on [x_{i−1}; x_i]. Since P has a derivative on (a; b) and is continuous on [a; b], by the Mean Value Theorem (of differential calculus) there exists x̂_i ∈ (x_{i−1}; x_i) such that

P′(x̂_i) = (P(x_i) − P(x_{i−1})) / (x_i − x_{i−1})

P being a primitive, we therefore have

f(x̂_i) = P′(x̂_i) = (P(x_i) − P(x_{i−1})) / (x_i − x_{i−1})

and hence

P(b) − P(a) = Σ_{i=1}^n (P(x_i) − P(x_{i−1})) = Σ_{i=1}^n f(x̂_i)(x_i − x_{i−1}) = Σ_{i=1}^n f(x̂_i) Δx_i

which implies

I(f; σ) ≤ P(b) − P(a) ≤ S(f; σ)   (30.47)

Since σ is any subdivision, (30.47) holds for every σ ∈ Σ and therefore

sup_{σ∈Σ} I(f; σ) ≤ P(b) − P(a) ≤ inf_{σ∈Σ} S(f; σ)

from which, f being integrable, (30.46) follows.

Let us illustrate the theorem with some examples, which use again the primitives calculated in Examples 1103 and 1104.

Example 1110 Let f : R → R be given by f(x) = x. We have P(x) = x²/2 and therefore, thanks to (30.46),

∫_a^b x dx = b²/2 − a²/2

For example, ∫₀¹ x dx = 1/2. N

Example 1111 Let f : R → R be given by f(x) = x/(1 + x²). As we have seen in Example 1104, the primitive function P : R → R is given by P(x) = (1/2) log(1 + x²). Therefore, thanks to (30.46),

∫_a^b x/(1 + x²) dx = (1/2) log(1 + b²) − (1/2) log(1 + a²)

For example,

∫₀¹ x/(1 + x²) dx = (1/2) log 2 − 0 = (log 2)/2

N
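Formula (30.46) can be verified numerically by comparing a fine Riemann sum with P(b) − P(a); a sketch for Example 1111 (Python, our own illustration):

import math

def riemann(f, a, b, n=200000):
    # Midpoint Riemann sum approximating the integral of f on [a; b]
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

f = lambda x: x / (1 + x * x)
P = lambda x: 0.5 * math.log(1 + x * x)
# Both values are approximately (log 2)/2 = 0.34657...
print(riemann(f, 0.0, 1.0), P(1.0) - P(0.0))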

For integrable functions without primitives, such as the function sgn x, Theorem 1109 cannot be applied, and the calculation of integrals cannot be done through formula (30.46). In some simple cases it is however possible to calculate the integral by using the definition directly. For example, the signum function is a step function and therefore we can apply Proposition 1089, in which, using the definition of the integral, we determined the value of the integral for this class of functions. In particular, we have

∫_a^b sgn x dx = b − a if a ≥ 0, a + b if a < 0 < b, a − b if b ≤ 0

The cases a ≥ 0 and b ≤ 0, using (30.26), are obvious. Let us consider the case a < 0 < b. Using (30.37) and (30.26), we have

∫_a^b sgn x dx = ∫_a^0 sgn x dx + ∫_0^b sgn x dx = ∫_a^0 (−1) dx + ∫_0^b 1 dx = (−1)(0 − a) + (1)(b − 0) = a + b

30.6.4 The Second Fundamental Theorem of Calculus

In the light of the First Fundamental Theorem, it becomes crucial to identify conditions that guarantee that an integrable function f : [a; b] → R has a primitive. Indeed, there are integrable functions that, like the function signum (Example 1108), do not have primitives and for which Theorem 1109 cannot therefore be applied.
To this end, we introduce an important notion.

Definition 1112 Let f : [a; b] → R be an integrable function. The function F : [a; b] → R given by

F(x) = ∫_a^x f(t) dt

is called the integral function of f.

In other words, the value F(x) of the integral function is the (signed) area under f on the interval [a; x], as x varies.¹³

N.B. The integral function F(x) = ∫_a^x f(t) dt is a function F : [a; b] → R that has as its variable the upper extreme of integration x, which as it varies determines a different Riemann integral ∫_a^x f(t) dt. The value of such an integral (which is a number) is the image F(x) of the integral function. In this regard, note that F is defined on [a; b] since f, being integrable on such an interval, is integrable on all the subintervals [a; x] ⊆ [a; b]. O

Let us see a first property of integral functions.

Proposition 1113 The integral function F : [a; b] → R of an integrable bounded function f : [a; b] → R is (uniformly) continuous.

Proof Since f is bounded, there exists M > 0 such that |f(x)| ≤ M for every x ∈ [a; b]. Let x, y ∈ [a; b]. By the definition of the integral function, we have F(x) − F(y) = ∫_y^x f(t) dt. Thanks to (30.40), we have

|F(x) − F(y)| = |∫_y^x f(t) dt| ≤ |∫_y^x |f(t)| dt| ≤ M|x − y|

and therefore, for every ε > 0, setting δ_ε = ε/M,

|x − y| < δ_ε ⟹ |F(x) − F(y)| < ε   ∀x, y ∈ [a; b]

By Theorem 473, F is (uniformly) continuous on [a; b].

Fortified with the notion of integral function, we can now return to the problem that opened the section, that is, the identification of criteria that ensure the existence of primitives of integrable functions. The next, very important, result, the Second Fundamental Theorem of Calculus, shows that if $f$ is continuous, then $F'(x) = f(x)$ for every $x \in [a,b]$, that is, the integral function is precisely a primitive of $f$. Continuity is therefore a simple and fundamental condition that guarantees the existence of primitives of a function.

Theorem 1114 (Second Fundamental Theorem of Calculus) Let $f : [a,b] \to \mathbb{R}$ be a continuous (and hence integrable) function. Its integral function $F : [a,b] \to \mathbb{R}$ is a primitive of $f$, that is, it is differentiable at every $x \in [a,b]$, and we have
$$F'(x) = f(x) \qquad \forall x \in [a,b] \tag{30.48}$$

Proof Let $x_0 \in (a,b)$. First of all, let us see what form the difference quotient of $F$ at $x_0$ takes. Consider $h > 0$ such that $x_0 + h \in [a,b]$. Thanks to Corollary 1097 we have
$$F(x_0+h) - F(x_0) = \int_a^{x_0+h} f(t)\,dt - \int_a^{x_0} f(t)\,dt = \int_a^{x_0} f(t)\,dt + \int_{x_0}^{x_0+h} f(t)\,dt - \int_a^{x_0} f(t)\,dt = \int_{x_0}^{x_0+h} f(t)\,dt$$

13 Note that in the definition of the integral function the (dummy) variable of integration is no longer $x$ but another letter (here we have chosen $t$, but it could have been $z$, $u$, or any other letter different from $x$). This choice is dictated by the need to avoid any confusion in the use of the variable $x$, which this time is the independent variable of the integral function.

and therefore, thanks to the Mean Value Theorem (of integral calculus), denoting by $x_0 + \vartheta h$, with $0 \le \vartheta \le 1$, a point of the interval $[x_0, x_0+h]$:
$$\frac{F(x_0+h) - F(x_0)}{h} - f(x_0) = \frac{\int_{x_0}^{x_0+h} f(t)\,dt}{h} - f(x_0) = \frac{h f(x_0+\vartheta h) - h f(x_0)}{h} = f(x_0+\vartheta h) - f(x_0) \to 0$$
by the continuity of $f$.
An analogous argument holds if $h < 0$.14 Therefore,
$$F'(x_0) = \lim_{h \to 0} \frac{F(x_0+h) - F(x_0)}{h} = f(x_0)$$
completing the proof when $x_0 \in (a,b)$. The cases $x_0 = a$ and $x_0 = b$ are proved similarly, as the reader can verify. We conclude that $F'(x_0)$ exists and equals $f(x_0)$.

The Second Fundamental Theorem gives a sufficient condition (continuity) for an integrable function to have a primitive. Moreover, thanks to (30.46) of the First Fundamental Theorem, we have
$$\int_a^b f(x)\,dx = F(b) - F(a) \tag{30.49}$$
that is, the Riemann integral $\int_a^b f(x)\,dx$ of a continuous function $f$ equals the difference $F(b) - F(a)$ computed with the integral function. Together, the two fundamental theorems form the backbone of integral calculus and make it operational.
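Both theorems are easy to visualize numerically. The following sketch (NumPy-based; the test function $f = \cos$ and the grid size are our own choices) builds the integral function by cumulative trapezoids and checks that its derivative recovers $f$ and that $F(b) - F(a)$ matches the primitive $\sin$:

```python
import numpy as np

a, b, n = 0.0, 2.0, 100_000
x = np.linspace(a, b, n + 1)
f = np.cos(x)

# Integral function F(x) = int_a^x f(t) dt via the trapezoidal rule
F = np.concatenate(([0.0], np.cumsum((f[1:] + f[:-1]) / 2 * np.diff(x))))

# Second Fundamental Theorem: F'(x) ~ f(x)
print(np.max(np.abs(np.gradient(F, x) - f)))  # tiny discretization error

# First Fundamental Theorem: F(b) - F(a) = sin(b) - sin(a)
print(F[-1], np.sin(b) - np.sin(a))           # both ~ 0.9093
```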

The next example shows that continuity is only a sufficient, not a necessary, condition for an integrable function to admit a primitive. Indeed, it shows that there exist discontinuous integrable functions that have primitives (and to which the First Fundamental Theorem can therefore be applied).

Example 1115 Let $f : \mathbb{R} \to \mathbb{R}$ be given by
$$f(x) = \begin{cases} 2x\sin\dfrac{1}{x} - \cos\dfrac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$
It is discontinuous at 0. Nevertheless, a primitive $P : \mathbb{R} \to \mathbb{R}$ of this function is
$$P(x) = \begin{cases} x^2\sin\dfrac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$

14 Observe that in this case we have
$$\int_a^{x_0+h} f(t)\,dt - \int_a^{x_0} f(t)\,dt = \int_a^{x_0+h} f(t)\,dt - \int_a^{x_0+h} f(t)\,dt - \int_{x_0+h}^{x_0} f(t)\,dt = -\int_{x_0+h}^{x_0} f(t)\,dt$$

Indeed, for $x \ne 0$, this can be verified by differentiating $x^2\sin(1/x)$, while for $x = 0$ one observes that
$$P'(0) = \lim_{h \to 0}\frac{P(h) - P(0)}{h} = \lim_{h \to 0}\frac{h^2\sin\frac{1}{h}}{h} = \lim_{h \to 0} h\sin\frac{1}{h} = 0 = f(0)$$
N

Next we show that when f is not continuous the theorem may fail.

Example 1116 Define $f : [0,1] \to \mathbb{R}$ by
$$f(x) = \begin{cases} \dfrac{1}{n} & \text{if } x = \dfrac{m}{n} \text{ (in lowest terms)} \\ 0 & \text{otherwise} \end{cases}$$
The function $f$, a well-behaved modification of the Dirichlet function, is continuous at every irrational point and discontinuous at every rational point of the unit interval. By Theorem 1092, $f$ is integrable. In particular, $\int_0^1 f(t)\,dt = 0$. It is a useful (non-trivial) exercise to check all this.
That said, if $F(x) = \int_0^x f(t)\,dt$ for every $x \in [0,1]$, then $F(x) = 0$ for every $x \in [0,1]$. Hence $F$ is trivially differentiable, with $F'(x) = 0$ for every $x \in [0,1]$, but $F' \ne f$ because $F'(x) = f(x)$ if and only if $x$ is irrational. We conclude that (30.48) does not hold, and so the theorem fails. Nevertheless, we have $F(x) = \int_0^x F'(t)\,dt$ for every $x \in [0,1]$. N

O.R. The operation of integration regularizes a function: the integral function $F$ of $f$ is always continuous and, if $f$ is continuous, $F$ is differentiable. In contrast, the operation of differentiation makes a function more irregular. More specifically, integration raises the degree of regularity by one: $F$ is always continuous; if $f$ is continuous, $F$ is differentiable and, continuing in this way, if $f$ is differentiable, $F$ is twice differentiable, and so on. Differentiation, instead, lowers the regularity of a function. H

30.7 Properties of the indefinite integral

The First Fundamental Theorem of Calculus gives, through formula (30.46), a very powerful method for the calculation of Riemann integrals. It relies on the calculation of primitives, that is, of the indefinite integral. Indeed, to calculate the Riemann integral $\int_a^b f(x)\,dx$ of a function $f : [a,b] \to \mathbb{R}$ that has a primitive, we proceed in two steps:

(i) we calculate a primitive $P : [a,b] \to \mathbb{R}$ of $f$, that is, the indefinite integral $\int f(x)\,dx$;

(ii) we calculate the difference $P(b) - P(a)$: this difference is often denoted by $P(x)|_a^b$ or $[P(x)]_a^b$.

We now present some properties of the indefinite integral that simplify its calculation. First, we observe that the linearity of differentiation, established in (18.12), implies the linearity of the indefinite integral. As in Section 30.6.1, we denote by $I$ a generic interval, bounded or unbounded.

Proposition 1117 Let $f, g : I \to \mathbb{R}$ be two functions that admit primitives. For every $\alpha, \beta \in \mathbb{R}$, the function $\alpha f + \beta g : I \to \mathbb{R}$ admits a primitive and we have
$$\int (\alpha f + \beta g)(x)\,dx = \alpha\int f(x)\,dx + \beta\int g(x)\,dx \tag{30.50}$$

Proof Let $P_f : I \to \mathbb{R}$ and $P_g : I \to \mathbb{R}$ be primitives of $f$ and $g$. Thanks to (18.12) we have
$$(\alpha P_f + \beta P_g)'(x) = \alpha P_f'(x) + \beta P_g'(x) = \alpha f(x) + \beta g(x) \qquad \forall x \in I$$
and therefore $\alpha P_f + \beta P_g$ is a primitive of $\alpha f + \beta g$, which implies (30.50).

A simple application of this result is the calculation of the indefinite integral of a polynomial. Indeed, given a polynomial $f(x) = \alpha_0 + \alpha_1 x + \dots + \alpha_n x^n$, from (30.50) it follows that
$$\int f(x)\,dx = \int\left(\sum_{i=0}^n \alpha_i x^i\right)dx = \sum_{i=0}^n \alpha_i \int x^i\,dx = \sum_{i=0}^n \alpha_i\,\frac{x^{i+1}}{i+1} + k$$
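For instance, the term-by-term rule can be checked with the SymPy library (a tool of our choosing; note that SymPy omits the additive constant $k$):

```python
import sympy as sp

x = sp.symbols('x')
# Antiderivative of a polynomial, computed term by term
print(sp.integrate(3 + 2*x + 5*x**2, x))  # 5*x**3/3 + x**2 + 3*x
```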
Rule (18.13) for the derivative of a product of functions leads to an important rule for the calculation of indefinite integrals, called integration by parts.

Proposition 1118 (Integration by parts) Let $f, g : I \to \mathbb{R}$ be two functions with a derivative. Then
$$\int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx = f(x)g(x) + k \tag{30.51}$$

Proof Expression (18.13) implies that $(fg)' = f'g + fg'$. Hence $fg$ is a primitive of $f'g + fg'$, and thanks to (30.50) we have
$$f(x)g(x) + k = \int\left(f'(x)g(x) + f(x)g'(x)\right)dx = \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx$$
as desired.

Rule (30.51) is useful because sometimes there is a strong asymmetry in the calculability of the indefinite integrals $\int f'(x)g(x)\,dx$ and $\int f(x)g'(x)\,dx$: one of the two may be much simpler to calculate than the other. Exploiting this asymmetry, thanks to (30.51) we can obtain the more complicated integral as the difference between $f(x)g(x)$ and the simpler one.
R
Example 1119 Let us calculate the indefinite integral $\int \log x\,dx$. Let $f, g : (0,+\infty) \to \mathbb{R}$ be defined by $f(x) = \log x$ and $g(x) = x$, so that $\int \log x\,dx$ can be rewritten as $\int \log x \cdot g'(x)\,dx$. Thanks to (30.51) we have
$$\int x f'(x)\,dx + \int \log x\,dx = x\log x + k$$
that is,
$$\int x\,\frac{1}{x}\,dx + \int \log x\,dx = x\log x + k$$
which implies
$$\int \log x\,dx = x(\log x - 1) + k$$
N

R
Example 1120 Let us calculate the inde…nite integral Rx sin x dx. Let f; g : (0; 1) !
RR be given by f (x) = x and g (x) = cos x, so that x sin x dx can be rewritten as
f (x) g 0 (x) dx. Thanks to (30.51) we have
Z Z
0
f (x) g (x) dx + x sin x dx = x cos x + k

that is Z Z
x sin x dx = cos xdx x cos x + k = sin x x cos x + k

Observe that in the last example, if instead we set $f(x) = \sin x$ and $g(x) = x^2/2$, rule (30.51) would have proved useless. With this choice of $f$ and $g$ it is still possible to rewrite $\int x\sin x\,dx$ as $\int f(x)g'(x)\,dx$, but now (30.51) implies
$$\int f'(x)g(x)\,dx + \int x\sin x\,dx = \frac{x^2}{2}\sin x + k$$
that is,
$$\int x\sin x\,dx = \frac{x^2}{2}\sin x - \frac{1}{2}\int x^2\cos x\,dx + k$$
which has actually complicated things, because the integral $\int x^2\cos x\,dx$ is more difficult to compute than the original integral $\int x\sin x\,dx$. This shows that integration by parts cannot proceed mechanically: it requires a bit of imagination and experience.
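Both computations can be double-checked symbolically; here is a sketch with SymPy (our choice of tool; the constant $k$ is omitted):

```python
import sympy as sp

x = sp.symbols('x')
print(sp.integrate(sp.log(x), x))    # x*log(x) - x, i.e., x*(log(x) - 1)
print(sp.integrate(x*sp.sin(x), x))  # -x*cos(x) + sin(x)
```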

O.R. Example 1120 shows that, to calculate an integral
$$\int x^n h(x)\,dx$$
where $h$ is a function whose primitive is of similar "complexity" (e.g., $h$ is $\sin x$, $\cos x$ or $e^x$), a good choice is to set $f(x) = x^n$ and $g'(x) = h(x)$. Indeed, after differentiating $f(x)$ $n$ times, the polynomial factor disappears and only $g(x)$ or $g'(x)$ remains, which is immediately integrable. This choice was made in Example 1120. H

The integration by parts formula is usually written as
$$\int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx + k$$
The two factors of the product $f(x)g'(x)\,dx$ are called, respectively, the finite factor, $f(x)$, and the differential factor, $g'(x)\,dx$, so that the formula can be remembered as: "the integral of the product of a finite factor and a differential factor equals the product of the finite factor and the integral of the differential factor, minus the integral of the product of the derivative of the finite factor and the integral just found". We repeat that it is important to choose carefully which of the two factors to take as the finite factor and which as the differential factor.

For Riemann integrals, the formula obviously becomes
$$\int_a^b f(x)g'(x)\,dx = f(x)g(x)\big|_a^b - \int_a^b f'(x)g(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f'(x)g(x)\,dx \tag{30.52}$$

30.8 Change of variable

The next result shows how the integral of a function $f$ changes when we compose it with another function $\varphi$.

Theorem 1121 Let $\varphi : [c,d] \to [a,b]$ be a differentiable and strictly increasing function such that $\varphi' : [c,d] \to \mathbb{R}$ is integrable. If $f : [a,b] \to \mathbb{R}$ is continuous, then the function $(f \circ \varphi)\varphi' : [c,d] \to \mathbb{R}$ is integrable and
$$\int_c^d f(\varphi(t))\varphi'(t)\,dt = \int_{\varphi(c)}^{\varphi(d)} f(x)\,dx \tag{30.53}$$

If $\varphi$ is surjective, we have $a = \varphi(c)$ and $b = \varphi(d)$. Expression (30.53) can therefore be rewritten as
$$\int_a^b f(x)\,dx = \int_c^d f(\varphi(t))\varphi'(t)\,dt \tag{30.54}$$

Heuristically, (30.53) can be seen as the result of the change of variable $x = \varphi(t)$ and of the corresponding change
$$dx = \varphi'(t)\,dt = d\varphi(t) \tag{30.55}$$
in $dx$. As a mnemonic and computational device this observation can be useful, even if (30.55) is in itself meaningless.

Proof Since $f$ is continuous, thanks to (30.49) we have
$$\int_{\varphi(c)}^{\varphi(d)} f(x)\,dx = F(\varphi(d)) - F(\varphi(c)) \tag{30.56}$$
Moreover, the chain rule implies
$$(F \circ \varphi)'(t) = F'(\varphi(t))\,\varphi'(t) = (f \circ \varphi)(t)\,\varphi'(t)$$
that is, $F \circ \varphi$ is a primitive of $(f \circ \varphi)\varphi' : [c,d] \to \mathbb{R}$. Thanks to Proposition 1087, the composite function $f \circ \varphi : [c,d] \to \mathbb{R}$ is integrable. Since, by hypothesis, $\varphi' : [c,d] \to \mathbb{R}$ is integrable, the product function $(f \circ \varphi)\varphi' : [c,d] \to \mathbb{R}$ is integrable as well (recall what we saw at the end of Section 30.5). By the First Fundamental Theorem we have
$$\int_c^d (f \circ \varphi)(t)\,\varphi'(t)\,dt = (F \circ \varphi)(d) - (F \circ \varphi)(c) \tag{30.57}$$

Since $\varphi$ is bijective (being strictly increasing) we have $\varphi(c) = a$ and $\varphi(d) = b$. Therefore, (30.57) and (30.56) imply
$$\int_c^d (f \circ \varphi)(t)\,\varphi'(t)\,dt = F(\varphi(d)) - F(\varphi(c)) = \int_a^b f(x)\,dx$$
as desired.

Theorem 1121, besides its theoretical interest, can be very useful in the calculation of integrals. Formula (30.53), and its rewriting (30.54), can be used both from "right to left" and from "left to right". In the first case, from right to left, the objective is to calculate the integral $\int_a^b f(x)\,dx$ by finding a suitable change of variable $x = \varphi(t)$ that leads to an integral $\int_{\varphi^{-1}(a)}^{\varphi^{-1}(b)} f(\varphi(t))\varphi'(t)\,dt$ that is simpler to calculate. The difficulty lies in finding a suitable change of variable $x = \varphi(t)$: indeed, nothing guarantees that a "simplifying" change exists and, even if it exists, it might not be obvious how to find it.

The use of formula (30.53) from left to right is useful to calculate an integral that can be written as $\int_c^d f(\varphi(t))\varphi'(t)\,dt$ for some function $f$ whose primitive $F$ we know. In this case, the corresponding integral $\int_{\varphi(c)}^{\varphi(d)} f(x)\,dx$, obtained by setting $x = \varphi(t)$, is easier to solve since
$$\int f(\varphi(x))\varphi'(x)\,dx = F(\varphi(x)) + k$$
Here the difficulty lies in recognizing the composite form $\int_c^d f(\varphi(t))\varphi'(t)\,dt$ in the integral we want to calculate. Again, nothing guarantees that it can be rewritten in this form, nor, even when this is possible, that the form is easy to recognize. Only experience (and exercise) can help. The next example presents some classical integrals that can be solved with this technique.

Example 1122 (i) If $a \ne -1$, we have
$$\int \varphi(x)^a \varphi'(x)\,dx = \frac{\varphi(x)^{a+1}}{a+1} + k$$
For example,
$$\int \sin^4 x \cos x\,dx = \frac{1}{5}\sin^5 x + k$$

(ii) We have
$$\int \frac{\varphi'(x)}{\varphi(x)}\,dx = \log|\varphi(x)| + k$$
For example,
$$\int \tan x\,dx = \int \frac{\sin x}{\cos x}\,dx = -\int \frac{-\sin x}{\cos x}\,dx = -\log|\cos x| + k$$

(iii) We have
$$\int \sin(\varphi(x))\varphi'(x)\,dx = -\cos\varphi(x) + k \qquad\text{and}\qquad \int \cos(\varphi(x))\varphi'(x)\,dx = \sin\varphi(x) + k$$
For example,
$$\int \sin(3x^3 - 2x^2)\,(9x^2 - 4x)\,dx = -\cos(3x^3 - 2x^2) + k$$

(iv) We have
$$\int e^{\varphi(x)}\varphi'(x)\,dx = e^{\varphi(x)} + k$$
For example,
$$\int xe^{x^2}\,dx = \frac{1}{2}\int 2xe^{x^2}\,dx = \frac{1}{2}e^{x^2} + k$$
N

We now present three examples that illustrate the two possible applications of formula (30.53). The first example considers the right-to-left case, the second can be solved both right to left and left to right, while the last considers the left-to-right case. For simplicity we use the variables $x$ and $t$ as they appear in (30.53), even if this is obviously a pure convention, without substantial value.

Example 1123 Let us consider the integral
$$\int_a^b \sin\sqrt{x}\,dx$$
with $[a,b] \subseteq [0,+\infty)$. Let us set $t = \sqrt{x}$, so that $x = t^2$. We therefore have $\varphi(t) = t^2$ and, thanks to (30.54),
$$\int_a^b \sin\sqrt{x}\,dx = \int_{\sqrt{a}}^{\sqrt{b}} 2t\sin t\,dt = 2\int_{\sqrt{a}}^{\sqrt{b}} t\sin t\,dt$$
In Example 1120 we solved the indefinite integral $\int t\sin t\,dt$ by parts. In the light of that example, we have
$$\int_{\sqrt{a}}^{\sqrt{b}} t\sin t\,dt = \left[\sin t - t\cos t\right]_{\sqrt{a}}^{\sqrt{b}} = \sin\sqrt{b} - \sin\sqrt{a} + \sqrt{a}\cos\sqrt{a} - \sqrt{b}\cos\sqrt{b}$$
and therefore
$$\int_a^b \sin\sqrt{x}\,dx = 2\left(\sin\sqrt{b} - \sin\sqrt{a} + \sqrt{a}\cos\sqrt{a} - \sqrt{b}\cos\sqrt{b}\right)$$
Note how the starting point was to set $t = \sqrt{x}$, that is, to specify the inverse function $t = \varphi^{-1}(x) = \sqrt{x}$. This is often the case, because it is simpler to think of which transformation of $x$ may simplify the integration. N
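A numerical spot check of this identity (a sketch with SciPy, our choice of tool and of endpoints):

```python
import numpy as np
from scipy.integrate import quad

a, b = 1.0, 4.0
lhs, _ = quad(lambda x: np.sin(np.sqrt(x)), a, b)
sa, sb = np.sqrt(a), np.sqrt(b)
rhs = 2 * (np.sin(sb) - np.sin(sa) + sa*np.cos(sa) - sb*np.cos(sb))
print(lhs, rhs)  # the two values coincide (~ 2.88)
```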

Example 1124 Let us consider the integral
$$\int_0^{\pi/2} \frac{\cos x}{(1+\sin x)^3}\,dx$$

"Right to left". Let us set $t = \sin x$, so that $\varphi(t) = \sin^{-1} t$ on $[0,\pi/2]$. From (18.20) it follows that
$$\varphi'(t) = \frac{1}{\cos\left(\sin^{-1} t\right)}$$
Thanks to (30.54), we have
$$\int_0^{\pi/2} \frac{\cos x}{(1+\sin x)^3}\,dx = \int_0^1 \frac{\cos\left(\sin^{-1} t\right)}{(1+t)^3}\,\frac{1}{\cos\left(\sin^{-1} t\right)}\,dt = \int_0^1 \frac{dt}{(1+t)^3} = \left[-\frac{1}{2(1+t)^2}\right]_0^1 = \frac{3}{8}$$

"Left to right". In the integrand we recognize a form of type (i) of Example 1122, that is, an integral of the type
$$\int \varphi(x)^a \varphi'(x)\,dx$$
with $\varphi(x) = 1+\sin x$ and $a = -3$. Since $\int \varphi(x)^a \varphi'(x)\,dx = \frac{\varphi(x)^{a+1}}{a+1} + k$, we have
$$\int_0^{\pi/2} \frac{\cos x}{(1+\sin x)^3}\,dx = \left[-\frac{1}{2(1+\sin x)^2}\right]_0^{\pi/2} = -\frac{1}{8} + \frac{1}{2} = \frac{3}{8}$$
N

Example 1125 Let us consider the integral
$$\int_c^d \frac{\log t}{t}\,dt \tag{30.58}$$
with $[c,d] \subseteq (0,+\infty)$. In the integrand we again recognize a form of type (i) of Example 1122, that is, an integral of the type
$$\int \varphi(t)^a \varphi'(t)\,dt$$
with $\varphi(t) = \log t$ and $a = 1$. Since $\int \varphi(t)^a \varphi'(t)\,dt = \frac{\varphi(t)^{a+1}}{a+1} + k$, we have
$$\int_c^d \frac{\log t}{t}\,dt = \left[\frac{\log^2 t}{2}\right]_c^d = \frac{1}{2}\left(\log^2 d - \log^2 c\right)$$
N

30.9 Functions integrable in closed form

For both theoretical and practical purposes, it is important to know when the primitive of an elementary function is itself an elementary function. To this end, it is first of all necessary to make rigorous the notion of elementary function, introduced in Section 6.5 of Chapter 6. To do this, we introduce two important classes of functions, the rational and the algebraic ones. A function $f : A \subseteq \mathbb{R} \to \mathbb{R}$ is called:

(i) rational if it is defined through finite combinations of the four elementary operations (addition, subtraction, multiplication, and division) on the variable $x$; it is easy to verify that a rational function can be expressed as a ratio of polynomials
$$f(x) = \frac{a_0 + a_1 x + \dots + a_n x^n}{b_0 + b_1 x + \dots + b_m x^m} \tag{30.59}$$

(ii) algebraic if it is defined through finite combinations of the four elementary operations and of the operation of root extraction.

Example 1126 The functions
$$f(x) = \frac{\sqrt{x} - \sqrt[3]{1-x}}{x - \sqrt{2}\,e} \qquad\text{and}\qquad g(x) = \sqrt{1 + \sqrt[5]{1 - \sqrt{x}}} - 1$$
are algebraic. N

We can now define the elementary functions.

Definition 1127 A function $f : A \subseteq \mathbb{R} \to \mathbb{R}$ is called elementary if it belongs to one of the following classes:

(i) rational functions,

(ii) algebraic functions,

(iii) exponential functions,

(iv) logarithmic functions,

(v) trigonometric functions,15

(vi) the functions obtained through both finite combinations and finite compositions of functions belonging to the previous classes.

The elementary functions that are neither rational nor algebraic are called transcendental. Such, for example, are the exponential, the logarithmic, and the trigonometric functions.

15 It is possible to show that the use of complex numbers actually allows us to reduce the trigonometric functions to linear combinations of exponential functions. The reader will encounter this type of result in more advanced courses.

The elementary functions can be written in finite terms (that is, in closed form), which gives them simplicity and tractability. The question relevant for integral calculus, however, is whether their primitive is itself an elementary function, and therefore preserves the tractability of the original function. This motivates the following definition:

Definition 1128 An elementary function is said to be integrable in finite terms if its primitive is itself an elementary function.

In this case, we also say that $f$ is explicitly integrable or integrable in closed form. For example, $f(x) = 2x$ is explicitly integrable since its primitive $F(x) = x^2$ is an elementary function. The functions $f(x) = \sin x$ and $f(x) = \cos x$, as well as all polynomials and the exponential functions of the type $f(x) = e^{kx}$ with $k \in \mathbb{R}$, are also explicitly integrable.

Nevertheless, and this is what makes the topic of this section interesting, not all elementary functions are explicitly integrable. The next result reports the remarkable example of the Gaussian function.

Proposition 1129 The elementary functions $e^{-x^2}$ and $e^x/x$ are not integrable in finite terms.

The proof of the proposition is based on results of complex analysis. The non-integrability in finite terms of these functions implies that of other important functions.

Example 1130 The function $1/\log x$ is not integrable in finite terms. Indeed, with the change of variable $x = e^t$, we get $dx = e^t\,dt$ and therefore, by substitution,
$$\int \frac{1}{\log x}\,dx = \int \frac{e^t}{t}\,dt$$
Since $e^x/x$ is not integrable in finite terms, it follows that $1/\log x$ is not either. In particular, the famous integral function
$$\operatorname{Li}(x) = \int_2^x \frac{1}{\log t}\,dt$$
which is very important in the study of prime numbers, is not an elementary function. N
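Computer algebra systems make the phenomenon tangible: asked for these primitives, SymPy (a tool of our choosing) can only answer in terms of the non-elementary special functions erf and li:

```python
import sympy as sp

x = sp.symbols('x')
print(sp.integrate(sp.exp(-x**2), x))  # sqrt(pi)*erf(x)/2
print(sp.integrate(1/sp.log(x), x))    # li(x)
```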

In the light of these examples of elementary functions that are not explicitly integrable, it becomes important to have criteria that guarantee the integrability, or non-integrability, in finite terms of a given elementary function. For rational functions everything is simple:

Proposition 1131 Rational functions are integrable in finite terms. In particular, the primitive of a rational function $f(x)$ is an elementary function given by a linear combination of the following functions:
$$\log(ax^2+bx+c), \qquad \arctan(dx+k) \qquad\text{and}\qquad r(x)$$
where $a, b, c, d, k \in \mathbb{R}$ and $r(x)$ is a rational function.



Example 1132 Let us calculate the integral
$$\int \frac{x-1}{x^2+3x+2}\,dx$$
Since the denominator is $(x+1)(x+2)$, it is necessary to look for $A$ and $B$ such that
$$\frac{A}{x+1} + \frac{B}{x+2} = \frac{x-1}{x^2+3x+2} \tag{30.60}$$
The first member of (30.60) is equal to
$$\frac{A(x+2) + B(x+1)}{(x+1)(x+2)} = \frac{x(A+B) + (2A+B)}{(x+1)(x+2)} \tag{30.61}$$
Expressions (30.60) and (30.61) are equal if and only if $A$ and $B$ satisfy the system
$$\begin{cases} A+B = 1 \\ 2A+B = -1 \end{cases}$$
Therefore $A = -2$, $B = 3$ and
$$\int \frac{x-1}{x^2+3x+2}\,dx = \int \left(\frac{-2}{x+1} + \frac{3}{x+2}\right)dx = -2\log|x+1| + 3\log|x+2| + k$$
N
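The partial-fraction decomposition, the only laborious step, can be automated; here is a sketch with SymPy (our choice of tool):

```python
import sympy as sp

x = sp.symbols('x')
print(sp.apart((x - 1)/(x**2 + 3*x + 2)))         # 3/(x + 2) - 2/(x + 1)
print(sp.integrate((x - 1)/(x**2 + 3*x + 2), x))  # -2*log(x + 1) + 3*log(x + 2)
```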

Example 1133 Let us calculate the integral
$$\int \frac{dx}{x^2-6x+13}$$
Let us note that
$$\frac{1}{x^2-6x+13} = \frac{1}{(x-3)^2+4} = \frac{1}{4}\cdot\frac{1}{\left(\frac{x-3}{2}\right)^2+1}$$
Let us operate the change of variable
$$u = \frac{x-3}{2}$$
which leads to
$$du = \frac{dx}{2}$$
Therefore
$$\int \frac{dx}{x^2-6x+13} = \frac{1}{4}\int \frac{2\,du}{u^2+1} = \frac{1}{2}\int \frac{du}{u^2+1} = \frac{1}{2}\arctan u + k = \frac{1}{2}\arctan\frac{x-3}{2} + k$$
N

Things are more complicated for algebraic and transcendental functions: some of them are integrable in finite terms, others are not. A full analysis of the topic is well beyond the scope of this book.16 We just mention an important result of Liouville that establishes a necessary and sufficient condition for the integrability in finite terms of functions of the form $f(x)e^{g(x)}$. Inter alia, Liouville's result makes it possible to prove Proposition 1129, that is, the non-integrability in finite terms of the functions $e^{-x^2}$ and $e^x/x$.

This said, in some (lucky) cases the integrability in finite terms of non-rational elementary functions can be reduced, through suitable substitutions, to that of rational functions. This is the case, for example, for functions of the type $r(e^x)$, where $r(\cdot)$ is a rational function. Indeed, by setting $x = \log t$ and recalling what we saw in Section 30.8 on integration by substitution, we get
$$\int r(e^x)\,dx = \int \frac{r(t)}{t}\,dt$$
Thanks to Proposition 1131, the rational function $r(t)/t$ is integrable in finite terms.
Another example is the transcendental function
$$f(x) = \frac{a\sin^\alpha x + b\cos^\beta x}{c\sin^\gamma x + d\cos^\delta x}$$
with $a, b, c, d \in \mathbb{R}$ and $\alpha, \beta, \gamma, \delta \in \mathbb{Z}$. By setting $x = 2\arctan t$, that is,
$$\tan\frac{x}{2} = t$$
simple trigonometric arguments give:17
$$\sin x = \frac{2t}{1+t^2} \qquad\text{and}\qquad \cos x = \frac{1-t^2}{1+t^2} \tag{30.62}$$
With this substitution we transform $f(x)$ into the rational function
$$\frac{a\left(\frac{2t}{1+t^2}\right)^\alpha + b\left(\frac{1-t^2}{1+t^2}\right)^\beta}{c\left(\frac{2t}{1+t^2}\right)^\gamma + d\left(\frac{1-t^2}{1+t^2}\right)^\delta}$$
and we proceed to the explicit integration (always proceeding by substitution).


16 See G. H. Hardy, The integration of functions of a single variable, Cambridge University Press, 1916; J. F. Ritt, Integration in finite terms: Liouville's theory of elementary methods, Columbia University Press, 1948. See also the comprehensive Table of integrals, series and products of I. S. Gradshteyn and I. M. Ryzhik, Academic Press, 2000.

17 Indeed, we have $\sin x = 2\sin\frac{x}{2}\cos\frac{x}{2}$ and $\cos x = \cos^2\frac{x}{2} - \sin^2\frac{x}{2}$. Observe that $1 + \tan^2\frac{x}{2} = \frac{1}{\cos^2\frac{x}{2}}$ and therefore $\cos\frac{x}{2} = \frac{1}{\sqrt{1+\tan^2\frac{x}{2}}}$. Moreover,
$$\sin\frac{x}{2} = \tan\frac{x}{2}\cos\frac{x}{2} = \frac{\tan\frac{x}{2}}{\sqrt{1+\tan^2\frac{x}{2}}}$$
By substituting $\sin\frac{x}{2}$ and $\cos\frac{x}{2}$ in $\sin x$ and $\cos x$ we get (30.62).

O.R. The question of determining whether the indefinite integral of a function belongs to a given class of functions was already tackled by Newton and Leibniz. While the former, in order to avoid resorting to transcendental functions, preferred to express primitives through algebraic functions (also through infinite series of algebraic functions), the latter gave priority to formulations in finite terms and considered non-algebraic primitives acceptable as well. Leibniz's vision prevailed, and in the nineteenth century the problem of integrability in finite terms became an important research area, with major contributions by Joseph Liouville in the 1830s. H

30.10 Improper integrals

We speak of improper integrals in two cases: when the interval of integration is unbounded, or when the interval of integration is bounded but the function is unbounded in the proximity of some of its points.

30.10.1 Unbounded intervals of integration: generalities

Until now we have considered integrals on closed and bounded intervals $[a,b]$. In applications, however, integrals on unbounded intervals are also very important. A very famous example is the Gaussian bell

[Figure: the Gaussian bell, the graph of $e^{-x^2}$ centered at the origin]

centered at the origin, seen in Example 999, whose area is given by an integral of the form
$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx \tag{30.63}$$
called Gauss's integral. In this case the domain of integration is the whole real line $(-\infty,+\infty)$.

Let us begin with domains of integration of the form $[a,+\infty)$. Given a function $f : [a,+\infty) \to \mathbb{R}$, consider the integral function $F : [a,+\infty) \to \mathbb{R}$ given by
$$F(x) = \int_a^x f(t)\,dt$$

The definition of the improper integral $\int_a^{+\infty} f(x)\,dx$ is based on the limit $\lim_{x\to+\infty} F(x)$, that is, on the asymptotic behavior of the integral function. Three cases are possible:

(i) $\lim_{x\to+\infty} F(x) = L \in \mathbb{R}$,

(ii) $\lim_{x\to+\infty} F(x) = \pm\infty$,

(iii) $\lim_{x\to+\infty} F(x)$ does not exist.

Cases (i) and (ii) are considered in the next definition.

Definition 1134 Let $f : [a,+\infty) \to \mathbb{R}$ be a function integrable on every interval $[a,b] \subseteq [a,+\infty)$, with integral function $F$. If $\lim_{x\to+\infty} F(x)$ exists in $\overline{\mathbb{R}}$, we set
$$\int_a^{+\infty} f(x)\,dx = \lim_{x\to+\infty} F(x)$$
and the function $f$ is said to be integrable in an improper sense on $[a,+\infty)$. The value $\int_a^{+\infty} f(x)\,dx$ is called improper (or generalized) Riemann integral.

For brevity, in the sequel we will say that a function $f$ is integrable on $[a,+\infty)$, omitting "in an improper sense". We have the following terminology:

(i) the integral $\int_a^{+\infty} f(x)\,dx$ converges if $\lim_{x\to+\infty} F(x) \in \mathbb{R}$;

(ii) the integral $\int_a^{+\infty} f(x)\,dx$ diverges positively (negatively) if $\lim_{x\to+\infty} F(x) = +\infty$ ($-\infty$);

(iii) finally, if $\lim_{x\to+\infty} F(x)$ does not exist, we say that the integral $\int_a^{+\infty} f(x)\,dx$ does not exist or that it is oscillating.

Example 1135 Fixed $\alpha > 0$, let $f : [1,+\infty) \to \mathbb{R}$ be given by $f(x) = x^{-\alpha}$. The integral function $F : [1,+\infty) \to \mathbb{R}$ is
$$F(x) = \int_1^x t^{-\alpha}\,dt = \begin{cases} \dfrac{1}{1-\alpha}\left(x^{1-\alpha} - 1\right) & \text{if } \alpha \ne 1 \\ \log x & \text{if } \alpha = 1 \end{cases}$$
so that
$$\lim_{x\to+\infty} F(x) = \begin{cases} +\infty & \text{if } \alpha \le 1 \\ \dfrac{1}{\alpha-1} & \text{if } \alpha > 1 \end{cases}$$
It follows that the improper integral
$$\int_1^{+\infty} \frac{1}{x^\alpha}\,dx$$
exists for every $\alpha > 0$: it converges if $\alpha > 1$ and it diverges positively if $\alpha \le 1$. N
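The dichotomy at $\alpha = 1$ is easy to observe numerically; here is a sketch with SciPy (our choice; `quad` accepts the infinite endpoint directly):

```python
import numpy as np
from scipy.integrate import quad

for alpha in (2.0, 1.5, 1.1):
    val, _ = quad(lambda x, a=alpha: x**(-a), 1, np.inf)
    print(alpha, val, 1/(alpha - 1))  # quad matches 1/(alpha - 1)
```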
The integral $\int_{-\infty}^a f(x)\,dx$ on the domain of integration $(-\infty,a]$ is defined analogously to $\int_a^{+\infty} f(x)\,dx$, by considering the limit $\lim_{x\to-\infty}\int_x^a f(t)\,dt$, that is, the limit $\lim_{x\to-\infty}(-F(x))$, where $F(x) = \int_a^x f(t)\,dt$.

Example 1136 Let $f : (-\infty,0] \to \mathbb{R}$ be given by $f(x) = xe^{-x^2}$. We have
$$\int_{-\infty}^0 f(x)\,dx = \lim_{x\to-\infty}\int_x^0 te^{-t^2}\,dt = \lim_{x\to-\infty}\frac{1}{2}\left(e^{-x^2} - 1\right) = -\frac{1}{2}$$
and therefore the improper integral
$$\int_{-\infty}^0 xe^{-x^2}\,dx$$
exists and converges. N

Let us now consider the improper integral on the domain of integration $(-\infty,+\infty)$.

Definition 1137 Let $f : \mathbb{R} \to \mathbb{R}$ be a function integrable on every interval $[a,b]$. If the integrals $\int_a^{+\infty} f(x)\,dx$ and $\int_{-\infty}^a f(x)\,dx$ exist, the function $f$ is said to be integrable (in an improper sense) on $\mathbb{R}$ and we set
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_a^{+\infty} f(x)\,dx + \int_{-\infty}^a f(x)\,dx \tag{30.64}$$
provided we do not have an indeterminate form $\infty - \infty$. The value $\int_{-\infty}^{+\infty} f(x)\,dx$ is called the improper (or generalized) Riemann integral of $f$ on $\mathbb{R}$.

It is easy to see that this definition does not depend on the choice of the point $a \in \mathbb{R}$. Often, for convenience, we take $a = 0$. The improper integral $\int_{-\infty}^{+\infty} f(x)\,dx$ is also called convergent or divergent according to whether its value is finite or equal to $\pm\infty$.

Let us now illustrate the notion with a pair of examples. Note how it is necessary to calculate separately the two integrals $\int_a^{+\infty} f(x)\,dx$ and $\int_{-\infty}^a f(x)\,dx$, whose values must then be summed (when they do not give rise to an indeterminate form).

Example 1138 Let $f : \mathbb{R} \to \mathbb{R}$ be the constant function $f(x) = k$. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^0 f(x)\,dx = \lim_{x\to+\infty} kx + \lim_{x\to-\infty}(-kx) = \begin{cases} +\infty & \text{if } k > 0 \\ 0 & \text{if } k = 0 \\ -\infty & \text{if } k < 0 \end{cases}$$
In other words, $\int_{-\infty}^{+\infty} k\,dx = k\cdot\infty$ unless $k$ is zero. N

The value of the integral in the previous example is consistent with the geometric interpretation of the integral as the signed area of the region under $f$. Indeed, this figure is a big rectangle with infinite base and height $k$. Its area is $+\infty$ if $k > 0$, zero if $k = 0$, and $-\infty$ if $k < 0$.

Example 1139 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = xe^{-x^2}$. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^0 f(x)\,dx = \lim_{x\to+\infty}\int_0^x te^{-t^2}\,dt + \lim_{x\to-\infty}\int_x^0 te^{-t^2}\,dt$$
$$= \lim_{x\to+\infty}\frac{1}{2}\left(1 - e^{-x^2}\right) + \lim_{x\to-\infty}\frac{1}{2}\left(e^{-x^2} - 1\right) = \frac{1}{2} - \frac{1}{2} = 0$$
and therefore the improper integral
$$\int_{-\infty}^{+\infty} xe^{-x^2}\,dx$$
exists and is equal to 0. N

Example 1140 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x$. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^0 f(x)\,dx = \lim_{x\to+\infty}\int_0^x t\,dt + \lim_{x\to-\infty}\int_x^0 t\,dt = \lim_{x\to+\infty}\frac{x^2}{2} + \lim_{x\to-\infty}\left(-\frac{x^2}{2}\right) = \infty - \infty$$
and therefore the improper integral
$$\int_{-\infty}^{+\infty} x\,dx$$
does not exist because we have the indeterminate form $\infty - \infty$. N

Differently from Example 1138, the value of the integral of this last example is not consistent with the geometric interpretation of the integral. To see this, observe the following picture:

[Figure: graph of $f(x) = x$; the region above the axis for $x > 0$ is marked (+), the region below the axis for $x < 0$ is marked (−)]

The areas of the two regions under $f$ for $x < 0$ and $x > 0$ are two "big triangles" of infinite base and height. They are intuitively equal, being perfectly symmetric with respect to the vertical axis, but of opposite sign (as indicated by the signs (+) and (−) in the figure), and it is natural to think that they compensate, giving rise to an integral equal to 0. Nevertheless, the definition requires the separate calculation of the two integrals as $x \to +\infty$ and as $x \to -\infty$, which in this case generates the indeterminate form $+\infty - \infty$.

To try to reconcile the definition of the integral on $(-\infty,+\infty)$ with the geometric intuition, we can follow an alternative route, considering the single limit
$$\lim_{k\to+\infty}\int_{-k}^k f(x)\,dx$$
instead of the two separate limits in (30.64). This motivates the next definition.

Definition 1141 Let $f : \mathbb{R} \to \mathbb{R}$ be a function integrable on each interval $[a,b]$. The Cauchy principal value of the integral $\int_{-\infty}^{+\infty} f(x)\,dx$, denoted by $\operatorname{PV}\int_{-\infty}^{+\infty} f(x)\,dx$, is given by
$$\operatorname{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^k f(x)\,dx$$
when the limit exists in $\overline{\mathbb{R}}$.

In place of the two limits on which the definition of the improper integral is based, the principal value considers only the limit of $\int_{-k}^k f(x)\,dx$. We will see in the examples that, with this definition, the geometric intuition of the integral as the signed area of the region under $f$ is preserved. It is, however, a weaker notion than the improper integral; indeed:

(i) when the improper integral exists, the principal value also exists and $\operatorname{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \int_{-\infty}^{+\infty} f(x)\,dx$ since, by Proposition 428-(i), we have
$$\operatorname{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^k f(x)\,dx = \lim_{k\to+\infty}\left(\int_0^k f(x)\,dx + \int_{-k}^0 f(x)\,dx\right)$$
$$= \lim_{k\to+\infty}\int_0^k f(x)\,dx + \lim_{k\to+\infty}\int_{-k}^0 f(x)\,dx = \lim_{k\to+\infty}\int_0^k f(x)\,dx + \lim_{k\to-\infty}\int_k^0 f(x)\,dx = \int_{-\infty}^{+\infty} f(x)\,dx$$

(ii) nevertheless, the principal value can exist even when the improper integral does not exist: in the last example the improper integral $\int_{-\infty}^{+\infty} x\,dx$ does not exist; nevertheless
$$\operatorname{PV}\int_{-\infty}^{+\infty} x\,dx = \lim_{k\to+\infty}\int_{-k}^k x\,dx = 0$$
and therefore $\operatorname{PV}\int_{-\infty}^{+\infty} x\,dx$ exists and is finite.

The principal value can therefore exist even when the improper integral does not. To better illustrate the relation between the two notions of integral on $(-\infty,+\infty)$, let us consider a more general version of Example 1140.

Example 1142 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x + \beta$, with $\beta \in \mathbb{R}$. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^0 f(x)\,dx = \lim_{x\to+\infty}\int_0^x (t+\beta)\,dt + \lim_{x\to-\infty}\int_x^0 (t+\beta)\,dt$$
$$= \lim_{x\to+\infty}\left(\frac{x^2}{2} + \beta x\right) + \lim_{x\to-\infty}\left(-\frac{x^2}{2} - \beta x\right) = \infty - \infty$$
and therefore the improper integral
$$\int_{-\infty}^{+\infty} (x+\beta)\,dx$$
does not exist because we have the indeterminate form $\infty - \infty$. Concerning the principal value, we have
$$\operatorname{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^k (x+\beta)\,dx = \lim_{k\to+\infty}\left(\int_{-k}^k x\,dx + 2\beta k\right) = 2\beta\lim_{k\to+\infty} k = \begin{cases} +\infty & \text{if } \beta > 0 \\ 0 & \text{if } \beta = 0 \\ -\infty & \text{if } \beta < 0 \end{cases}$$
and therefore the principal value exists: $\operatorname{PV}\int_{-\infty}^{+\infty} (x+\beta)\,dx = \beta\cdot\infty$, unless $\beta$ is zero. N
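Numerically, the symmetric truncation used by the principal value is what tames the cancellation, while asymmetric truncations drift; a sketch with SciPy (our choice, taking $\beta = 2$):

```python
from scipy.integrate import quad

f = lambda x: x + 2.0  # beta = 2: the improper integral does not exist

# Symmetric truncations: int_{-k}^{k} (x + 2) dx = 4k -> PV = +infinity
for k in (10, 100, 1000):
    print(k, quad(f, -k, k)[0])   # 40.0, 400.0, 4000.0

# An asymmetric truncation behaves erratically: the form infinity - infinity
print(quad(f, -10, 1000)[0])      # dominated by the right tail
```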

In this example the principal value agrees with the geometric intuition of the integral as signed area. Indeed, when $\beta = 0$ the intuition is obvious (see the figure and the comment after Example 1140). For the case $\beta > 0$, observe the figure:

[Figure: graph of $f(x) = x + \beta$ with $\beta > 0$; the region below the axis is marked (−), the region above is marked (+)]

The negative area of the "big triangle" indicated by (−) on the negative part of the abscissae is equal and opposite to the positive area of the big triangle indicated by (+) on the positive part of the abscissae. If we imagine that these areas cancel each other, what "is left" is the area of the dotted figure, which is clearly infinite and with + sign (being above the horizontal axis). For $\beta < 0$ similar considerations hold:

[Figure: graph of $f(x) = x + \beta$ with $\beta < 0$; regions marked (+) and (−) as before]

Here, once the two opposite triangular areas cancel each other out, what "is left" is again the area of the dotted figure, which is clearly infinite and with negative sign (being below the horizontal axis).

Example 1143 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x/(1+x^2)$. We have
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_0^{+\infty} f(x)\,dx + \int_{-\infty}^0 f(x)\,dx = \lim_{x\to+\infty}\int_0^x \frac{t}{1+t^2}\,dt + \lim_{x\to-\infty}\int_x^0 \frac{t}{1+t^2}\,dt$$
$$= \lim_{x\to+\infty}\frac{1}{2}\log\left(1+x^2\right) + \lim_{x\to-\infty}\left(-\frac{1}{2}\log\left(1+x^2\right)\right) = \infty - \infty$$
and therefore the improper integral does not exist, since we have the indeterminate form $\infty - \infty$. Regarding the principal value, we have instead
$$\operatorname{PV}\int_{-\infty}^{+\infty} f(x)\,dx = \lim_{k\to+\infty}\int_{-k}^k \frac{x}{1+x^2}\,dx = \lim_{k\to+\infty}\left(\frac{1}{2}\log\left(1+k^2\right) - \frac{1}{2}\log\left(1+k^2\right)\right) = 0$$
and therefore the principal value is
$$\operatorname{PV}\int_{-\infty}^{+\infty} \frac{x}{1+x^2}\,dx = 0$$
N

30.10.2 Unbounded intervals of integration: properties and criteria

We now give some properties of improper integrals, as well as some criteria of improper integrability, that is, sufficient conditions for a function $f$ defined on an unbounded domain to have an improper integral. For simplicity, we limit ourselves to the domain $[a,+\infty)$, leaving to the reader the analogous versions of these criteria for the domains $(-\infty,a]$ and $(-\infty,+\infty)$.

Properties

Being defined as limits, the properties of improper integrals follow from the properties of limits of functions seen in Section 11.4. In particular, the improper integral preserves the linearity and the monotonicity of the Riemann integral.
Let us begin with linearity, which follows from the algebra of limits seen in Proposition 428.

Proposition 1144 Let $f, g : [a,+\infty) \to \mathbb{R}$ be two functions integrable on $[a,+\infty)$. Then, for every $\alpha, \beta \in \mathbb{R}$, the function $\alpha f + \beta g : [a,+\infty) \to \mathbb{R}$ is integrable on $[a,+\infty)$ and we have
$$\int_a^{+\infty} (\alpha f + \beta g)(x)\,dx = \alpha\int_a^{+\infty} f(x)\,dx + \beta\int_a^{+\infty} g(x)\,dx \tag{30.65}$$
provided the right-hand side is not an indeterminate form $\infty - \infty$.

Proof Thanks to the linearity of the Riemann integral, and to points (i) and (ii) of Proposition 428, we have
$$\lim_{x\to+\infty}\int_a^x (\alpha f + \beta g)(t)\,dt = \lim_{x\to+\infty}\left(\alpha F(x) + \beta G(x)\right) = \alpha\lim_{x\to+\infty} F(x) + \beta\lim_{x\to+\infty} G(x) = \alpha\int_a^{+\infty} f(x)\,dx + \beta\int_a^{+\infty} g(x)\,dx$$
which implies the improper integrability of the function $\alpha f + \beta g$ and (30.65).

The monotonicity property of limits of functions (see Proposition 427 and its scalar variants) implies the monotonicity property of the improper integral.

Proposition 1145 Let $f, g : [a,+\infty) \to \mathbb{R}$ be two functions integrable on $[a,+\infty)$. If $f \le g$, then $\int_a^{+\infty} f(x)\,dx \le \int_a^{+\infty} g(x)\,dx$.

Proof Thanks to the monotonicity of the Riemann integral, we have $F(x) \le G(x)$ for every $x \in [a,+\infty)$. By the monotonicity of limits of functions, we therefore have $\lim_{x\to+\infty} F(x) \le \lim_{x\to+\infty} G(x)$.

As we saw in Example 1138, we have $\int_a^{+\infty} 0\,dx = 0$. A simple consequence of Proposition 1145 is therefore that $\int_a^{+\infty} f(x)\,dx \ge 0$ when $f$ is positive and integrable on $[a,+\infty)$.

Proposition 1145 also gives a simple comparison criterion for divergence: given two functions $f, g : [a,+\infty) \to \mathbb{R}$ integrable on $[a,+\infty)$, with $f \le g$, we have
$$\int_a^{+\infty} f(x)\,dx = +\infty \implies \int_a^{+\infty} g(x)\,dx = +\infty \tag{30.66}$$
and
$$\int_a^{+\infty} g(x)\,dx = -\infty \implies \int_a^{+\infty} f(x)\,dx = -\infty \tag{30.67}$$

Criteria of integrability

We now give some criteria of integrability, limiting ourselves for simplicity to positive functions $f : [a,+\infty) \to \mathbb{R}$. In this case, the integral function $F : [a,+\infty) \to \mathbb{R}$ is increasing. Indeed, for every $x_2 \ge x_1 \ge a$,
$$F(x_2) = \int_a^{x_2} f(t)\,dt = \int_a^{x_1} f(t)\,dt + \int_{x_1}^{x_2} f(t)\,dt \ge \int_a^{x_1} f(t)\,dt = F(x_1)$$
since $\int_{x_1}^{x_2} f(t)\,dt \ge 0$. Thanks to the monotonicity of the integral function, we have the following characterization of improper integrals of positive functions:

Proposition 1146 Let $f : [a,+\infty) \to \mathbb{R}$ be a positive function integrable on every $[a,b] \subseteq [a,+\infty)$. Then it is integrable on $[a,+\infty)$ and
$$\int_a^{+\infty} f(t)\,dt = \sup_{x\in[a,+\infty)} F(x) \tag{30.68}$$
In particular, $\int_a^{+\infty} f(t)\,dt$ converges only if $\lim_{x\to+\infty} f(x) = 0$ (provided this limit exists).

Positive functions $f : [a,+\infty) \to \mathbb{R}$ are therefore always integrable in an improper sense, that is, $\int_a^{+\infty} f(t)\,dt \in [0,+\infty]$. In particular, their integral $\int_a^{+\infty} f(t)\,dt$ either converges or diverges positively: tertium non datur. We have convergence if and only if $\sup_{x\in[a,+\infty)} F(x) < +\infty$, and only if $f$ is infinitesimal as $x \to +\infty$ (provided the limit $\lim_{x\to+\infty} f(x)$ exists). Otherwise, $\int_a^{+\infty} f(t)\,dt$ diverges positively.

The condition $\lim_{x\to+\infty} f(x) = 0$ is only necessary for convergence, as Example 1135 with $0 < \alpha \le 1$ shows. For example, if $\alpha = 1$ we have $\lim_{x\to+\infty} 1/x = 0$, but for every $a > 0$ we have
$$\int_a^{+\infty} \frac{1}{t}\,dt = \lim_{x\to+\infty}\int_a^x \frac{1}{t}\,dt = \lim_{x\to+\infty}\log\frac{x}{a} = +\infty$$
and therefore $\int_a^{+\infty} (1/t)\,dt$ diverges positively.

In stating the necessary condition $\lim_{x\to+\infty} f(x) = 0$ we added the clause "provided this limit exists". The next simple example shows that the clause matters, because the limit may fail to exist even when the integral $\int_a^{+\infty} f(t)\,dt$ converges.

Example 1147 Let $f : [0,+\infty) \to \mathbb{R}$ be given by
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{N} \\ 0 & \text{otherwise} \end{cases}$$
Thanks to Proposition 1086, it is easy to see that $\int_0^x f(t)\,dt = 0$ for every $x > 0$ and, therefore, $\int_0^{+\infty} f(x)\,dx = 0$. Nevertheless, the limit $\lim_{x\to+\infty} f(x)$ does not exist. N

Proposition 1146 rests on the following simple property of limits of monotonic functions, which is the version for functions of Theorem 285 for monotonic sequences.

Lemma 1148 Let $\varphi : [a,+\infty) \to \mathbb{R}$ be an increasing function. Then $\lim_{x\to+\infty} \varphi(x) = \sup_{x\in[a,+\infty)} \varphi(x)$.

Proof Let us first consider the case $\sup_{x\in[a,+\infty)} \varphi(x) \in \mathbb{R}$. Let $\varepsilon > 0$. Since $\sup_{x\in[a,+\infty)} \varphi(x) = \sup \varphi([a,+\infty))$, thanks to Proposition 119 there exists $x_\varepsilon \in [a,+\infty)$ such that $\varphi(x_\varepsilon) > \sup_{x\in[a,+\infty)} \varphi(x) - \varepsilon$. Since $\varphi$ is increasing, we have
$$\sup_{x\in[a,+\infty)} \varphi(x) - \varepsilon < \varphi(x_\varepsilon) \le \varphi(x) \le \sup_{x\in[a,+\infty)} \varphi(x) \qquad \forall x \ge x_\varepsilon$$
and hence $\lim_{x\to+\infty} \varphi(x) = \sup_{x\in[a,+\infty)} \varphi(x)$.
Suppose now that $\sup_{x\in[a,+\infty)} \varphi(x) = +\infty$. For every $M > 0$ there exists $x_M \in [a,+\infty)$ such that $\varphi(x_M) \ge M$. Increasing monotonicity implies $\varphi(x) \ge \varphi(x_M) \ge M$ for every $x \ge x_M$, and therefore $\lim_{x\to+\infty} \varphi(x) = +\infty$.

Proof of Proposition 1146 Since $f$ is positive, its integral function $F : [a,+\infty) \to \mathbb{R}$ is increasing and therefore, thanks to Lemma 1148,
$$\lim_{x\to+\infty} F(x) = \sup_{x\in[a,+\infty)} F(x)$$
Suppose that $\lim_{x\to+\infty} f(x)$ exists. Let us show that the integral converges only if $\lim_{x\to+\infty} f(x) = 0$. Suppose, by contradiction, that $\lim_{x\to+\infty} f(x) = L \in (0,+\infty]$. Given $0 < \varepsilon < L$, there exists $x_\varepsilon > a$ such that $f(x) \ge L - \varepsilon > 0$ for every $x \ge x_\varepsilon$. Therefore
$$\int_a^{+\infty} f(t)\,dt = \int_a^{x_\varepsilon} f(t)\,dt + \int_{x_\varepsilon}^{+\infty} f(t)\,dt \ge \int_{x_\varepsilon}^{+\infty} f(t)\,dt = \lim_{x\to+\infty}\int_{x_\varepsilon}^x f(t)\,dt \ge \lim_{x\to+\infty}\int_{x_\varepsilon}^x (L-\varepsilon)\,dt = (L-\varepsilon)\lim_{x\to+\infty}(x - x_\varepsilon) = +\infty$$
which shows that $\int_a^{+\infty} f(t)\,dt$ diverges positively.

The next result is a simple comparison criterion to determine whether the improper integral of a positive function converges or diverges.

Corollary 1149 Let $f, g : [a,+\infty) \to \mathbb{R}$ be two positive functions integrable on every $[a,b] \subseteq [a,+\infty)$, with $f \le g$. Then
$$\int_a^{+\infty} g(x)\,dx \in [0,+\infty) \implies \int_a^{+\infty} f(x)\,dx \in [0,+\infty) \tag{30.69}$$
and
$$\int_a^{+\infty} f(x)\,dx = +\infty \implies \int_a^{+\infty} g(x)\,dx = +\infty \tag{30.70}$$

Proof By Proposition 1145, we have $\int_a^{+\infty} f(x)\,dx \le \int_a^{+\infty} g(x)\,dx$, while, thanks to Proposition 1146, we have $\int_a^{+\infty} f(x)\,dx \in [0,+\infty]$ and $\int_a^{+\infty} g(x)\,dx \in [0,+\infty]$. Therefore, $\int_a^{+\infty} f(x)\,dx$ converges if $\int_a^{+\infty} g(x)\,dx$ converges, while $\int_a^{+\infty} g(x)\,dx$ diverges positively if $\int_a^{+\infty} f(x)\,dx$ diverges positively.

The study of the integral (30.63) of the Gaussian function $f(x) = e^{-x^2}$, to which we devote the next section, is a very remarkable application of this corollary.

We finally point out an important asymptotic criterion of integrability, based on the asymptotic nature of the improper integral. We omit the proof.

Proposition 1150 Let $f, g : [a,+\infty) \to \mathbb{R}$ be positive functions integrable on every $[a,b] \subseteq [a,+\infty)$.

(i) If $f \sim g$ as $x \to +\infty$, then $\int_a^{+\infty} g(x)\,dx$ converges (diverges positively) if and only if $\int_a^{+\infty} f(x)\,dx$ converges (diverges positively).

(ii) If $f = o(g)$ as $x \to +\infty$ and $\int_a^{+\infty} g(x)\,dx$ converges, then $\int_a^{+\infty} f(x)\,dx$ also converges.

(iii) If $f = o(g)$ as $x \to +\infty$ and $\int_a^{+\infty} f(x)\,dx$ diverges positively, then $\int_a^{+\infty} g(x)\,dx$ also diverges positively.
In the light of Example 1135, Proposition 1150 implies that $\int_a^{+\infty} f(x)\,dx$ converges if there exists $\alpha > 1$ such that
$$f \sim \frac{1}{x^\alpha} \qquad\text{or}\qquad f = o\left(\frac{1}{x^\alpha}\right) \qquad\text{as } x \to +\infty$$
The comparison with powers $x^{-\alpha}$ is an important convergence criterion for improper integrals, as the next two examples show.

Example 1151 Let $f : [0,+\infty) \to \mathbb{R}$ be the positive function given by
$$f(x) = \frac{\sin^3\frac{1}{x} + \frac{1}{x^2}}{\frac{1}{x} + \frac{1}{x^3}}$$
Since, as $x \to +\infty$,
$$f \sim \frac{1}{x}$$
Proposition 1150 implies $\int_0^{+\infty} f(x)\,dx = +\infty$. N

Example 1152 Let $f : [0,+\infty) \to \mathbb{R}$ be the positive function given by
$$f(x) = x^\alpha \sin\frac{1}{x}$$
with $\alpha < 0$. Since
$$f \sim \frac{1}{x^{1-\alpha}}$$
Proposition 1150 implies $\int_0^{+\infty} f(x)\,dx \in [0,+\infty)$, that is, the integral converges. N

We close by observing that, as the reader can verify, what has been proved for positive functions extends easily to all functions $f : [a,+\infty) \to \mathbb{R}$ that are eventually positive, that is, such that there exists $c > a$ for which $f(x) \ge 0$ for every $x \ge c$.

30.10.3 Gauss's integral

Let us consider the Gaussian function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = e^{-x^2}$. Since it is positive, Proposition 1146 guarantees that the improper integral $\int_a^{+\infty} f(x)\,dx$ exists for every $a \in \mathbb{R}$. Let us show that it converges. Let $g : \mathbb{R} \to \mathbb{R}$ be given by
$$g(x) = e^{-x}$$
If $x > 0$, we have
$$g(x) \ge f(x) \iff e^{-x} \ge e^{-x^2} \iff -x \ge -x^2 \iff x^2 \ge x \iff x \ge 1$$
By (30.69) of Corollary 1149, if $\int_1^{+\infty} g(x)\,dx$ converges, then $\int_1^{+\infty} f(x)\,dx$ also converges. In turn, this implies that $\int_a^{+\infty} f(x)\,dx$ converges for every $a \in \mathbb{R}$. This is obvious if $a \ge 1$. If $a < 1$, we have
$$\int_a^{+\infty} f(x)\,dx = \int_a^1 f(x)\,dx + \int_1^{+\infty} f(x)\,dx$$
and, since $\int_a^1 f(x)\,dx$ exists thanks to the continuity of $f$ on $[a,1]$, the convergence of $\int_1^{+\infty} f(x)\,dx$ implies that of $\int_a^{+\infty} f(x)\,dx$.
Thus, it remains to show that $\int_1^{+\infty} g(x)\,dx$ converges. We have
$$G(x) = \int_1^x g(t)\,dt = e^{-1} - e^{-x}$$
and hence (30.68) implies
$$\int_1^{+\infty} g(x)\,dx = \sup_{x\in[1,+\infty)} G(x) = e^{-1} < +\infty$$
It follows that $\int_1^{+\infty} f(x)\,dx$ converges, as desired.
In conclusion, the integral
$$\int_a^{+\infty} e^{-x^2}\,dx$$
is convergent for every $a \in \mathbb{R}$. By Proposition 1129, this integral cannot be computed in closed form. Indeed, its computation is not simple at all and, although we omit the proof, we report the beautiful result one obtains for $a = 0$, which is due to Gauss (here, more than ever, princeps mathematicorum).

Theorem 1153 (Gauss) It holds that
$$\int_0^{+\infty} e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2} \tag{30.71}$$

It is possible to prove in a similar way that
$$\int_{-\infty}^0 e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2} \tag{30.72}$$

On the other hand, the equality of the integrals (30.71) and (30.72) is quite intuitive in the light of the symmetry of the Gaussian bell with respect to the vertical axis.
Thanks to Definition 1137, the value of the integral of the Gaussian function, the so-called Gauss integral, is therefore
$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx = \int_0^{+\infty} e^{-x^2}\,dx + \int_{-\infty}^0 e^{-x^2}\,dx = \sqrt{\pi} \tag{30.73}$$
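A numerical confirmation of (30.73), as a sketch with SciPy (our choice of tool):

```python
import numpy as np
from scipy.integrate import quad

val, err = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(val, np.sqrt(np.pi))  # both ~ 1.7724538509
```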

Gauss's integral is central in probability theory, where it is usually presented in the form
$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx$$
Proceeding by substitution, it is easy to verify that for every pair of scalars $a, b \in \mathbb{R}$ we have
$$\int_{-\infty}^{+\infty} e^{-\frac{(x+a)^2}{b^2}}\,dx = |b|\sqrt{\pi} \tag{30.74}$$
which implies, setting $b = \sqrt{2}$ and $a = 0$,
$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = 1$$
The improper integral on $\mathbb{R}$ of the function
$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}$$
therefore has unit value and is thus a density function, as the reader will see in statistics courses. This explains the importance of this particular form of the Gaussian function.

30.10.4 Unbounded functions

Another case of improper integral is that in which the function is continuous on a bounded interval $[a,b]$ except at some points in whose neighborhoods it is unbounded (that is, the limit of the function at such points is $\pm\infty$). It suffices to consider the case of a single such point since, if there were more than one, it would be enough to examine them one by one.
Let us start with the case in which the point near which the function is unbounded is the supremum $b$ of the interval.

Definition 1154 Let $f : [a,b) \to \mathbb{R}$ be a continuous function such that $\lim_{x\to b^-} f(x) = \pm\infty$. If
$$\lim_{z\to b^-}\int_a^z f(x)\,dx = \lim_{z\to b^-}\left[F(z) - F(a)\right]$$
exists (finite or infinite), the function $f$ is said to be integrable in an improper sense on $[a,b]$ and this limit is taken as $\int_a^b f(x)\,dx$. The value $\int_a^b f(x)\,dx$ is called improper (or generalized) Riemann integral.

If the unboundedness of the function concerned the point $a$, or both endpoints, we would give a completely analogous definition. If the unboundedness concerned a point $c \in (a,b)$, it would be sufficient to consider separately the two intervals $[a,c]$ and $[c,b]$.
Example 1155 Let $f : [a,b) \to \mathbb{R}$ be given by
$$f(x) = (b-x)^{-\alpha} \qquad\text{with } \alpha > 0$$
Given that a primitive is
$$F(x) = \begin{cases} \dfrac{(b-x)^{1-\alpha}}{\alpha-1} & \text{for } 0 < \alpha \ne 1 \\ -\log|b-x| & \text{for } \alpha = 1 \end{cases}$$
we have
$$\lim_{x\to b^-} F(x) = \begin{cases} 0 & \text{if } \alpha < 1 \\ +\infty & \text{if } \alpha \ge 1 \end{cases}$$
It follows that the improper integral
$$\int_a^b \frac{1}{(b-x)^\alpha}\,dx$$
exists for every $\alpha > 0$: it converges if $0 < \alpha < 1$ and it diverges positively if $\alpha \ge 1$. N
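A numerical check of the convergent case, as a sketch with SciPy (our choice; `quad` copes with the integrable endpoint singularity):

```python
from scipy.integrate import quad

a, b, alpha = 0.0, 1.0, 0.5          # alpha < 1: convergent
val, _ = quad(lambda x: (b - x)**(-alpha), a, b)
print(val)                           # ~ 2.0 = (b - a)**(1 - alpha) / (1 - alpha)
```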
A version of Proposition 1150 could also be proved for these improper integrals. In this case, it allows one to state that $\int_a^b f(x)\,dx$ converges if there exists $\alpha < 1$ such that
$$f \sim \frac{1}{(b-x)^\alpha} \qquad\text{or}\qquad f = o\left(\frac{1}{(b-x)^\alpha}\right) \qquad\text{as } x \to b^-$$
The comparison with $(b-x)^{-\alpha}$ is an important convergence criterion for these improper integrals.

O.R. When the interval is unbounded, for the improper integral to converge the function must tend to zero quite rapidly (as $x^{-\alpha}$ with $\alpha > 1$). When the function is unbounded, for the improper integral to converge the function must tend to infinity fairly slowly, as $(b-x)^{-\alpha}$ with $\alpha < 1$. Both things are quite intuitive: for the area of an unbounded surface to be finite, its portion "that escapes to infinity" must be very thin. For example, the function $f : \mathbb{R}_+ \to \mathbb{R}_+$ defined by $f(x) = 1/x$ is integrable neither on intervals of the type $[a,+\infty)$, $a > 0$, nor on intervals of the type $[0,a]$: indeed, the integral function of $f$ is $F(x) = \log x$, which diverges both as $x \to +\infty$ and as $x \to 0^+$. The functions (asymptotic to) $1/x^{1+\varepsilon}$, with $\varepsilon > 0$, are integrable on the intervals of the type $[b,+\infty)$, $b > 0$, while the functions (asymptotic to) $1/x^{1-\varepsilon}$ are integrable on the intervals of the type $[0,b]$. H
Chapter 31

Parameter-dependent integrals

Let us consider a function of two variables
$$f : [a,b] \times [c,d] \to \mathbb{R}$$
defined on the rectangle $[a,b] \times [c,d]$ in $\mathbb{R}^2$. If for every $y \in [c,d]$ the scalar function $f(\cdot,y) : [a,b] \to \mathbb{R}$ is integrable on $[a,b]$, then to every such $y$ the real number
$$\int_a^b f(x,y)\,dx \tag{31.1}$$
can be associated. Unlike the integrals we have seen so far, the value of the definite integral (31.1) depends on the value of the variable $y$, which is usually interpreted as a parameter. Such an integral, referred to as a parameter-dependent integral, therefore defines a scalar function $F : [c,d] \to \mathbb{R}$ in the following way:
$$F(y) = \int_a^b f(x,y)\,dx \tag{31.2}$$
Note that, although the function $f$ is of two variables, the function $F$ defined above is scalar. Indeed, it does not depend in any way on the variable $x$, which in this setting plays the same role as a dummy variable of integration.
Functions of type (31.2) appear in applications more frequently than one might initially think. Having the appropriate instruments to study such objects is therefore crucial.

31.1 Properties

We shall study two properties of the function $F$: continuity and differentiability. Let us start with continuity.

Proposition 1156 If $f : [a,b] \times [c,d] \to \mathbb{R}$ is continuous, then the function $F : [c,d] \to \mathbb{R}$ is continuous, that is,
$$\lim_{y\to y_0} F(y) = \int_a^b \lim_{y\to y_0} f(x,y)\,dx \qquad \forall y_0 \in [c,d] \tag{31.3}$$


Formula (31.3) is referred to as “passage of the limit under the integral sign”.

Proof Take $\varepsilon > 0$. We must show that there exists a $\delta > 0$ such that
$$y \in [c,d] \cap (y_0-\delta, y_0+\delta) \implies |F(y) - F(y_0)| < \varepsilon$$
Using the properties of integrals, we have
$$|F(y) - F(y_0)| = \left|\int_a^b \left(f(x,y) - f(x,y_0)\right)dx\right| \le \int_a^b |f(x,y) - f(x,y_0)|\,dx$$
By hypothesis, $f$ is continuous on the compact set $[a,b] \times [c,d]$. By Theorem 473, it is therefore uniformly continuous on $[a,b] \times [c,d]$, so that there is a $\delta > 0$ such that
$$\|(x,y) - (x_0,y_0)\| < \delta \implies |f(x,y) - f(x_0,y_0)| < \frac{\varepsilon}{b-a} \tag{31.4}$$
for every $(x,y), (x_0,y_0) \in [a,b] \times [c,d]$. Therefore, for every $y \in [c,d] \cap (y_0-\delta, y_0+\delta)$ we have
$$\|(x,y) - (x,y_0)\| = |y - y_0| < \delta$$
which, thanks to (31.4), implies
$$|F(y) - F(y_0)| \le \int_a^b |f(x,y) - f(x,y_0)|\,dx < \frac{\varepsilon}{b-a}(b-a) = \varepsilon$$
as desired.

The second result analyzes the differentiability of the function $F$.

Proposition 1157 Suppose that $f : [a,b] \times [c,d] \to \mathbb{R}$ and its partial derivative $\partial f/\partial y$ are both continuous on $[a,b] \times [c,d]$. Then the function $F : [c,d] \to \mathbb{R}$ is differentiable on $(c,d)$ and we have
$$F'(y) = \int_a^b \frac{\partial}{\partial y} f(x,y)\,dx \tag{31.5}$$

Formula (31.5) is referred to as "differentiation under the integral sign". Since
$$F'(y) = \lim_{h\to 0}\frac{F(y+h) - F(y)}{h} = \lim_{h\to 0}\int_a^b \frac{f(x,y+h) - f(x,y)}{h}\,dx$$
and
$$\int_a^b \frac{\partial}{\partial y} f(x,y)\,dx = \int_a^b \lim_{h\to 0}\frac{f(x,y+h) - f(x,y)}{h}\,dx$$
equality (31.5) is equivalent to
$$\lim_{h\to 0}\int_a^b \frac{f(x,y+h) - f(x,y)}{h}\,dx = \int_a^b \lim_{h\to 0}\frac{f(x,y+h) - f(x,y)}{h}\,dx$$

that is to exchange the order of limits and integrals.

Proof Fix $y_0 \in (c,d)$. For every $x \in [a,b]$ the function $f(x,\cdot) : [c,d] \to \mathbb{R}$ is, by hypothesis, differentiable, so that, by Lagrange's Theorem, there exists $\vartheta_x \in [0,1]$ such that
$$\frac{f(x,y_0+h) - f(x,y_0)}{h} = \frac{\partial f}{\partial y}(x, y_0 + \vartheta_x h)$$
Note that $\vartheta_x$ depends on $x$. Let us write the difference quotient of the function $F$ at $y_0 \in (c,d)$:
$$\left|\frac{F(y_0+h) - F(y_0)}{h} - \int_a^b \frac{\partial f}{\partial y}(x,y_0)\,dx\right| = \left|\int_a^b \frac{f(x,y_0+h) - f(x,y_0)}{h}\,dx - \int_a^b \frac{\partial f}{\partial y}(x,y_0)\,dx\right| \tag{31.6}$$
$$= \left|\int_a^b \left(\frac{\partial f}{\partial y}(x, y_0+\vartheta_x h) - \frac{\partial f}{\partial y}(x,y_0)\right)dx\right| \le \int_a^b \left|\frac{\partial f}{\partial y}(x, y_0+\vartheta_x h) - \frac{\partial f}{\partial y}(x,y_0)\right|dx$$
The partial derivative $\partial f/\partial y$ is continuous on the compact set $[a,b] \times [c,d]$, so it is also uniformly continuous. Thus, given any $\varepsilon > 0$, there exists a $\delta > 0$ such that
$$\|(x,y) - (x,y_0)\| < \delta \implies \left|\frac{\partial f}{\partial y}(x,y) - \frac{\partial f}{\partial y}(x,y_0)\right| < \frac{\varepsilon}{b-a} \tag{31.7}$$
for every $y \in [c,d]$. Therefore, for $|h| < \delta$ we have
$$\|(x, y_0+\vartheta_x h) - (x,y_0)\| = \vartheta_x|h| \le |h| < \delta \qquad \forall x \in [a,b]$$
Thanks to (31.6) and (31.7), this implies
$$\left|\frac{F(y_0+h) - F(y_0)}{h} - \int_a^b \frac{\partial f}{\partial y}(x,y_0)\,dx\right| < \varepsilon \qquad \forall\,|h| < \delta$$
that is,
$$\int_a^b \frac{\partial f}{\partial y}(x,y_0)\,dx - \varepsilon < \frac{F(y_0+h) - F(y_0)}{h} < \int_a^b \frac{\partial f}{\partial y}(x,y_0)\,dx + \varepsilon \qquad \forall\,|h| < \delta$$
In particular, it holds that
$$\int_a^b \frac{\partial f}{\partial y}(x,y_0)\,dx - \varepsilon \le \lim_{h\to 0}\frac{F(y_0+h) - F(y_0)}{h} \le \int_a^b \frac{\partial f}{\partial y}(x,y_0)\,dx + \varepsilon$$

Since the above holds for every $\varepsilon > 0$, it follows that
$$\int_a^b \frac{\partial f}{\partial y}(x,y_0)\,dx \le \lim_{h\to 0}\frac{F(y_0+h) - F(y_0)}{h} \le \int_a^b \frac{\partial f}{\partial y}(x,y_0)\,dx$$
that is,
$$\lim_{h\to 0}\frac{F(y_0+h) - F(y_0)}{h} = \int_a^b \frac{\partial f}{\partial y}(x,y_0)\,dx$$
as desired.

Example 1158 Set $f(x,y) = x^2 + xy^2$ and
$$F(y) = \int_a^b \left(x^2 + xy^2\right)dx$$
As the hypotheses of Proposition 1157 are satisfied, we get
$$F'(y) = 2y\int_a^b x\,dx = 2y\,\frac{b^2 - a^2}{2} = y\left(b^2 - a^2\right)$$
N
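The same conclusion can be checked numerically, comparing a difference quotient of $F$ with the closed form; a sketch with SciPy (our choice of tool and of values):

```python
import numpy as np
from scipy.integrate import quad

a, b = 0.0, 1.0
F = lambda y: quad(lambda x: x**2 + x*y**2, a, b)[0]

y, h = 3.0, 1e-6
print((F(y + h) - F(y - h)) / (2*h))  # central difference ~ 3.0
print(y * (b**2 - a**2))              # closed form      = 3.0
```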

31.2 Variability: Leibniz's rule

Let us consider the general case in which the limits of integration are also functions of the variable $y$. Let
$$\alpha, \beta : [c,d] \to [a,b]$$
be two functions defined on $[c,d]$ with values in $[a,b]$. Given $f : [a,b] \times [c,d] \subseteq \mathbb{R}^2 \to \mathbb{R}$, let us consider $G : [c,d] \to \mathbb{R}$ defined as
$$G(y) = \int_{\alpha(y)}^{\beta(y)} f(x,y)\,dx \tag{31.8}$$

The following result extends Proposition 1157. Formula (31.9) is referred to as Leibniz's rule.

Proposition 1159 Suppose that $f : [a,b] \times [c,d] \subseteq \mathbb{R}^2 \to \mathbb{R}$ and its partial derivative $\partial f/\partial y$ are both continuous on $[a,b] \times [c,d]$. If $\alpha, \beta : [c,d] \to [a,b]$ are differentiable, then the function $G : [c,d] \to \mathbb{R}$ is differentiable on $(c,d)$ and we have
$$G'(y) = \int_{\alpha(y)}^{\beta(y)} \frac{\partial f}{\partial y}(x,y)\,dx + \beta'(y)\,f(\beta(y),y) - \alpha'(y)\,f(\alpha(y),y) \tag{31.9}$$

Proof Let $H : [a,b] \times [a,b] \times [c,d] \to \mathbb{R}$ be given by
$$H(v,z,y) = \int_v^z f(x,y)\,dx$$
Since
$$G(y) = H(\alpha(y), \beta(y), y)$$
the derivative of $G$ with respect to $y$ at a point $y_0 \in (c,d)$ can be calculated by using the chain rule:
$$G'(y_0) = \frac{\partial H}{\partial v}(a_0,b_0,y_0)\,\alpha'(y_0) + \frac{\partial H}{\partial z}(a_0,b_0,y_0)\,\beta'(y_0) + \frac{\partial H}{\partial y}(a_0,b_0,y_0) \tag{31.10}$$
where $a_0 = \alpha(y_0)$ and $b_0 = \beta(y_0)$. By Proposition 1157 we have
$$\frac{\partial H}{\partial y}(a_0,b_0,y_0) = \int_{a_0}^{b_0} \frac{\partial}{\partial y} f(x,y_0)\,dx \tag{31.11}$$
and, by the Second Fundamental Theorem of Calculus, we have
$$\frac{\partial H}{\partial z}(a_0,b_0,y_0) = f(b_0,y_0) \qquad\text{and}\qquad \frac{\partial H}{\partial v}(a_0,b_0,y_0) = -f(a_0,y_0) \tag{31.12}$$
In conclusion,
$$G'(y_0) = -f(a_0,y_0)\,\alpha'(y_0) + f(b_0,y_0)\,\beta'(y_0) + \int_{a_0}^{b_0} \frac{\partial}{\partial y} f(x,y_0)\,dx = \int_{\alpha(y_0)}^{\beta(y_0)} \frac{\partial}{\partial y} f(x,y_0)\,dx + \beta'(y_0)\,f(\beta(y_0),y_0) - \alpha'(y_0)\,f(\alpha(y_0),y_0)$$
as desired.

Example 1160 Let $f(x,y) = x^2 + y^2$, $\alpha(y) = \sin y$ and $\beta(y) = \cos y$. Setting
$$G(y) = \int_{\sin y}^{\cos y} \left(x^2 + y^2\right)dx$$
Leibniz's rule applies, since the hypotheses of Proposition 1159 are satisfied:
$$G'(y) = \int_{\sin y}^{\cos y} 2y\,dx - \sin y\left(\cos^2 y + y^2\right) - \cos y\left(\sin^2 y + y^2\right)$$
$$= 2y(\cos y - \sin y) - \sin y\left(\cos^2 y + y^2\right) - \cos y\left(\sin^2 y + y^2\right)$$
N
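A numerical check of Leibniz's rule on this example, as a sketch with SciPy (our choice of tool and of the evaluation point):

```python
import numpy as np
from scipy.integrate import quad

G = lambda y: quad(lambda x: x**2 + y**2, np.sin(y), np.cos(y))[0]

y, h = 0.7, 1e-6
numeric = (G(y + h) - G(y - h)) / (2*h)
exact = (2*y*(np.cos(y) - np.sin(y))
         - np.sin(y)*(np.cos(y)**2 + y**2)
         - np.cos(y)*(np.sin(y)**2 + y**2))
print(numeric, exact)  # the two values coincide
```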

31.3 Improper integrals

In many important applications the parameter-dependent integral (31.1) is improper. Let $f : I \times J \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a function defined on the rectangle $I \times J$ in $\mathbb{R}^2$, whose "sides" $I$ and $J$ are any two closed (bounded or unbounded) intervals of the real line. For example, if $I = \mathbb{R}$ and if the improper integral $\int_{-\infty}^{+\infty} f(x,y)\,dx$ converges for every $y \in J$, then the function $F : J \to \mathbb{R}$ is defined as
$$F(y) = \int_{-\infty}^{+\infty} f(x,y)\,dx \tag{31.13}$$
The extension of Proposition 1157 to the improper case is a delicate task and requires a dominance condition. For the sake of simplicity, in the statement we assume that $I$ is the real line and $J$ a closed and bounded interval. Analogous results, which we omit for brevity, hold when $I$ is a half-line and $J$ an unbounded interval.

Proposition 1161 Let $f : \mathbb{R} \times [c,d] \to \mathbb{R}$ be continuous on $\mathbb{R} \times [c,d]$ and differentiable in $y$ for every $x \in \mathbb{R}$. If there exists a positive function $g : \mathbb{R} \to \mathbb{R}$ such that $\int_{-\infty}^{+\infty} g(x)\,dx < +\infty$ and, for every $y \in J$,
$$|f(x,y)| \le g(x) \qquad \forall x \in \mathbb{R} \tag{31.14}$$
then the function $F : [c,d] \to \mathbb{R}$ is differentiable on $(c,d)$ and we have
$$F'(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y} f(x,y)\,dx \tag{31.15}$$

The proof of this result is not simple, so we omit it. Note that the dominance condition (31.14), which is based on the auxiliary function $g$, guarantees, inter alia, that the integral $\int_{-\infty}^{+\infty} f(x,y)\,dx$ converges, thanks to the Comparison Convergence Criterion stated in Corollary 1149.

Example 1162 Let $F : [c,d] \to \mathbb{R}$ be given by
$$F(y) = \int_{-\infty}^{+\infty} \sin x\; e^{-y^2x^2}\,dx$$
with $c \ge 1$ or $d \le -1$. Let $g$ be the Gaussian function, that is, $g(x) = e^{-x^2}$. For every $y \in [c,d]$ we have
$$\left|\sin x\; e^{-y^2x^2}\right| = |\sin x|\,e^{-y^2x^2} \le e^{-y^2x^2} \le e^{-x^2} = g(x)$$
Furthermore, $\int_{-\infty}^{+\infty} e^{-x^2}\,dx < +\infty$. The hypotheses of Proposition 1161 are thus satisfied, and equation (31.15) takes the form
$$F'(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y}\left(\sin x\; e^{-y^2x^2}\right)dx = -2y\int_{-\infty}^{+\infty} x^2\sin x\; e^{-y^2x^2}\,dx$$
(both sides actually vanish, the integrands being odd functions of $x$). N
Chapter 32

Stieltjes' integral

In many applied sciences, such as probability calculus, statistics, and economics, the Stieltjes integral is widely used, as it is an extension of the Riemann integral. The extension can be thought of as follows: while the Riemann integral is based on summations such as
$$\sum_{k=1}^n m_k(x_k - x_{k-1}) \qquad\text{and}\qquad \sum_{k=1}^n M_k(x_k - x_{k-1}) \tag{32.1}$$
the Stieltjes integral is based on summations such as
$$\sum_{k=1}^n m_k\left(g(x_k) - g(x_{k-1})\right) \qquad\text{and}\qquad \sum_{k=1}^n M_k\left(g(x_k) - g(x_{k-1})\right) \tag{32.2}$$
Clearly (32.1) is a special case of (32.2), with $g(x) = x$. One may ask why one should write and compute summations such as (32.2). Let us recall the meaning of (32.1) itself. In the Riemann integral, every interval $[x_{i-1},x_i]$ obtained by sectioning $[a,b]$ is measured according to its length $\Delta x_i = x_i - x_{i-1}$. Taking its length is clearly the most intuitive way to measure an interval. It is not the only way, however; in many problems it may be more natural to measure an interval differently. For example, if $[x_{i-1},x_i]$ represents production between the levels $x_{i-1}$ and $x_i$, the most appropriate measure for this interval is the additional cost it entails: if $C(x)$ is the total cost of producing $x$, the measure to be assigned to $[x_{i-1},x_i]$ is $C(x_i) - C(x_{i-1})$. If, instead, $[x_{i-1},x_i]$ represents an interval in which a random variable can take values and $F(x)$ represents the probability of the random variable taking a value of at most $x$, the most natural way to measure $[x_{i-1},x_i]$ is $F(x_i) - F(x_{i-1})$. Such scenarios are common in economics and in many applications.

In order for the Stieltjes integral to exist, the function $g$ must satisfy some minimal regularity conditions: in particular, $g$ must be at least monotone. Such care is unnecessary in the case of the Riemann integral, since $g(x) = x$ is a continuous, strictly monotone, and differentiable function.
As in the case of the Riemann integral, existence also requires conditions on the integrand function $f$. Such properties, as we shall see, remain necessary for the Stieltjes integral as well.

32.1 Definition

Let us consider two functions f, g : [a, b] ⊆ ℝ → ℝ, with f bounded and g increasing.¹ For every partition π = {a = x₀, x₁, ..., xₙ = b} of [a, b] and for every interval Iᵢ = [x_{i-1}, x_i] we can define the following quantities

$$m_i = \inf_{x \in I_i} f(x) \quad\text{and}\quad M_i = \sup_{x \in I_i} f(x)$$

Since f is bounded, such quantities are finite. The sum

$$I(\pi, f, g) = \sum_{i=1}^n m_i \left(g(x_i) - g(x_{i-1})\right)$$

is referred to as the lower Stieltjes sum, while

$$S(\pi, f, g) = \sum_{i=1}^n M_i \left(g(x_i) - g(x_{i-1})\right)$$

is referred to as the upper Stieltjes sum. It can be easily shown that, for every partition π, it holds that

$$I(\pi, f, g) \le S(\pi, f, g)$$

When the supremum of the lower sums equals the infimum of the upper sums, we get Stieltjes' integral:

Definition 1163 Let two functions f, g : [a, b] → ℝ be given, with f bounded and g increasing. We say that f is Stieltjes integrable with respect to the function g whenever

$$\sup_{\pi \in \Pi([a,b])} I(\pi, f, g) = \inf_{\pi \in \Pi([a,b])} S(\pi, f, g)$$

The common value is called the Stieltjes integral of f and it is denoted by $\int_a^b f(x)\,dg(x)$.

For g(x) = x we get Riemann's integral. The functions f and g are called, respectively, the integrand function and the integrator function. For the sake of brevity, we shall often write $\int_a^b f\,dg$, thus omitting the arguments of such functions.

N.B. In the remaining part of the chapter we will tacitly assume f and g to be any two scalar functions defined on [a, b], with f bounded and g increasing. O
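To make the definition concrete, here is a small Python sketch (our own illustration; the function names are ours) that computes lower and upper Stieltjes sums on uniform partitions and watches them squeeze a common value:

```python
import numpy as np

def stieltjes_sums(f, g, a, b, n):
    # lower and upper Stieltjes sums of f w.r.t. g on a uniform partition
    # with n intervals; inf and sup of f on each interval are approximated
    # by sampling, which is adequate for continuous f
    xs = np.linspace(a, b, n + 1)
    lower = upper = 0.0
    for x0, x1 in zip(xs[:-1], xs[1:]):
        fs = f(np.linspace(x0, x1, 50))
        dg = g(x1) - g(x0)
        lower += fs.min() * dg
        upper += fs.max() * dg
    return lower, upper

# f(x) = x, g(x) = x^2 on [0, 1]: both sums approach 2/3,
# in accordance with the identity  int x d(x^2) = int 2x^2 dx = 2/3
for n in (10, 100, 1000):
    print(n, stieltjes_sums(lambda x: x, lambda x: x * x, 0.0, 1.0, n))
```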

32.2 Integrability criteria


As we have seen for Riemann's integral, in order to know whether a function f is Stieltjes integrable with respect to a function g, one can rely on a few simple criteria.

The first one extends the criterion of Proposition 1085 for Riemann's integral; the proof is analogous.
¹ If g were decreasing, we could consider h = −g instead, which is clearly increasing.

Proposition 1164 The function f is Stieltjes integrable with respect to g if for every ε > 0 there exists a partition π ∈ Π([a, b]) such that S(π, f, g) − I(π, f, g) < ε.

As for Riemann's integral, it is important to know which are the classes of integrable functions. As one may expect, the answer depends on the regularity of both functions f and g (recall that we assumed g to be increasing).

Proposition 1165 The integral $\int_a^b f\,dg$ exists if at least one of the following two conditions is met:

(i) f is continuous;

(ii) f is monotone and g is continuous.

Note that (i) corresponds to f being continuous for Riemann's integral, while (ii) corresponds to the case in which f is monotone.

Proof (i) The proof relies on the same steps as that of Proposition 1091. Since f is continuous on [a, b], it is also bounded (Weierstrass' Theorem) and uniformly continuous (Theorem 473). Take ε > 0. There exists a δ_ε > 0 such that

$$|x - y| < \delta_\varepsilon \implies |f(x) - f(y)| < \varepsilon \qquad \forall x, y \in [a, b] \tag{32.3}$$

Let π = {xᵢ}ᵢ₌₀ⁿ be a partition of [a, b] such that |π| < δ_ε. Thanks to condition (32.3), for every i = 1, 2, ..., n we have that

$$\max_{x \in [x_{i-1}, x_i]} f(x) - \min_{x \in [x_{i-1}, x_i]} f(x) < \varepsilon$$

where the max and min exist by Weierstrass' Theorem. It follows that

$$\begin{aligned}
S(\pi, f, g) - I(\pi, f, g) &= \sum_{i=1}^n \max_{x \in [x_{i-1}, x_i]} f(x)\,(g(x_i) - g(x_{i-1})) - \sum_{i=1}^n \min_{x \in [x_{i-1}, x_i]} f(x)\,(g(x_i) - g(x_{i-1})) \\
&= \sum_{i=1}^n \left(\max_{x \in [x_{i-1}, x_i]} f(x) - \min_{x \in [x_{i-1}, x_i]} f(x)\right)(g(x_i) - g(x_{i-1})) \\
&< \varepsilon \sum_{i=1}^n (g(x_i) - g(x_{i-1})) = \varepsilon\,(g(b) - g(a))
\end{aligned}$$

By Proposition 1164, f is therefore integrable.

(ii) Since g is continuous on [a, b], it is also bounded and uniformly continuous. Fixed ε > 0, there is a δ_ε > 0 such that

$$|x - y| < \delta_\varepsilon \implies |g(x) - g(y)| < \varepsilon \qquad \forall x, y \in [a, b]$$

Let π = {xᵢ}ᵢ₌₀ⁿ be a partition of [a, b] such that |π| < δ_ε. For every pair of consecutive points of such a partition, we have that g(x_i) − g(x_{i-1}) = |g(x_i) − g(x_{i-1})| < ε. The proof now follows the same steps as that of Proposition 1094. Suppose that f is increasing (if f is decreasing the reasoning is analogous). We have that

$$\inf_{x \in [x_{i-1}, x_i]} f(x) = f(x_{i-1}) \quad\text{and}\quad \sup_{x \in [x_{i-1}, x_i]} f(x) = f(x_i)$$

so that

$$\begin{aligned}
S(\pi, f, g) - I(\pi, f, g) &= \sum_{i=1}^n \sup_{x \in [x_{i-1}, x_i]} f(x)\,(g(x_i) - g(x_{i-1})) - \sum_{i=1}^n \inf_{x \in [x_{i-1}, x_i]} f(x)\,(g(x_i) - g(x_{i-1})) \\
&= \sum_{i=1}^n f(x_i)(g(x_i) - g(x_{i-1})) - \sum_{i=1}^n f(x_{i-1})(g(x_i) - g(x_{i-1})) \\
&= \sum_{i=1}^n (f(x_i) - f(x_{i-1}))(g(x_i) - g(x_{i-1})) \\
&< \varepsilon \sum_{i=1}^n (f(x_i) - f(x_{i-1})) = \varepsilon\,(f(b) - f(a))
\end{aligned}$$

By Proposition 1164, the function f is integrable.

Lastly, we extend Proposition 1092 to Stieltjes' integral, requiring that g not share the possible discontinuities of f.

Proposition 1166 If f has finitely many discontinuities and g is continuous at least at such points,² then f is integrable with respect to g.

We omit the proof of this remarkable result which, inter alia, generalizes point (i) of Proposition 1165. However, note that, while Proposition 1092 allowed for infinitely many discontinuities, in this more general setting we restrict ourselves to finitely many ones.

32.3 Calculus

When g is differentiable, Stieltjes' integral can be written as a Riemann integral.

Proposition 1167 Let g be differentiable and g′ Riemann integrable. Then f is integrable with respect to g if and only if fg′ is Riemann integrable; in such a case we have that

$$\int_a^b f(x)\,dg(x) = \int_a^b f(x)\,g'(x)\,dx \tag{32.4}$$

² In other words, we require the two functions f and g not to be discontinuous at the same point.

Proof Since g′ is Riemann integrable, for any given ε > 0 there exists a partition π such that

$$S(g', \pi) - I(g', \pi) < \varepsilon$$

that is, denoting by Iᵢ = [x_{i-1}, x_i] the generic i-th interval of the partition π,

$$\sum_{i=1}^n \left(\sup_{x \in I_i} g'(x) - \inf_{x \in I_i} g'(x)\right) \Delta x_i < \varepsilon \tag{32.5}$$

From (32.5) we also deduce that, for any pair of points sᵢ, tᵢ ∈ Iᵢ, we have that

$$\sum_{i=1}^n \left|g'(s_i) - g'(t_i)\right| \Delta x_i < \varepsilon \tag{32.6}$$

Still referring to the generic interval Iᵢ of the partition, we can observe that, thanks to the differentiability of g, by the Mean Value Theorem there is a point tᵢ ∈ [x_{i-1}, x_i] such that

$$\Delta g_i = g(x_i) - g(x_{i-1}) = g'(t_i)\,\Delta x_i$$

So $\sum_{i=1}^n f(s_i)\,\Delta g_i = \sum_{i=1}^n f(s_i)\,g'(t_i)\,\Delta x_i$. Denoting M = sup_{[a,b]} |f(x)| and using inequality (32.6), we have that

$$\left|\sum_{i=1}^n f(s_i)\,\Delta g_i - \sum_{i=1}^n f(s_i)\,g'(s_i)\,\Delta x_i\right| = \left|\sum_{i=1}^n f(s_i)\left(g'(t_i) - g'(s_i)\right)\Delta x_i\right| \le M \sum_{i=1}^n \left|g'(s_i) - g'(t_i)\right| \Delta x_i \le M\varepsilon$$

so that we get

$$-M\varepsilon \le \sum_{i=1}^n f(s_i)\,\Delta g_i - \sum_{i=1}^n f(s_i)\,g'(s_i)\,\Delta x_i \le M\varepsilon$$

Note that $S(fg', \pi) \ge \sum_{i=1}^n f(s_i)\,g'(s_i)\,\Delta x_i$, from which $\sum_{i=1}^n f(s_i)\,\Delta g_i \le S(fg', \pi) + M\varepsilon$, and so also

$$S(\pi, f, g) \le S(fg', \pi) + M\varepsilon \tag{32.7}$$

One can symmetrically prove that

$$S(fg', \pi) \le S(\pi, f, g) + M\varepsilon \tag{32.8}$$

so combining (32.7) and (32.8) we get that

$$\left|S(\pi, f, g) - S(fg', \pi)\right| \le M\varepsilon \tag{32.9}$$

Inequality (32.9) holds for any such partition of the interval [a, b] and for every ε > 0. So

$$\overline{\int_a^b} f(x)\,dg(x) = \overline{\int_a^b} f(x)\,g'(x)\,dx \tag{32.10}$$

One can analogously show that

$$\underline{\int_a^b} f(x)\,dg(x) = \underline{\int_a^b} f(x)\,g'(x)\,dx \tag{32.11}$$

From (32.10) and (32.11) one sees that fg′ is Riemann integrable if and only if f is Stieltjes integrable with respect to g, in which case we get (32.4).

When f is continuous and g is differentiable, thanks to equation (32.4) a Stieltjes integral can be transformed into a Riemann integral with integrand function³

$$h(x) = f(x)\,g'(x)$$

This makes computations easier, as the techniques for solving Riemann integrals can also be used for Stieltjes integrals: in particular, integration by substitution and by parts can be used; furthermore, it is not hard to define the generalized Stieltjes integral by following the same steps as for the generalized Riemann integral.

From a theoretical standpoint, Stieltjes' integral substantially extends the reach of Riemann's integral, while keeping, also thanks to (32.4), its remarkable analytical properties. Such an extraordinary balance between greater generality and analytical tractability is what makes Stieltjes' integral so important.

Let us conclude with a useful variation on this theme (which we won't prove).

Proposition 1168 Let g be the integral function of a Riemann integrable function φ, that is, $g(x) = \int_a^x \varphi(t)\,dt$ for every x ∈ [a, b]. If f is continuous, we have that

$$\int_a^b f(x)\,dg(x) = \int_a^b f(x)\,\varphi(x)\,dx$$

If φ is continuous (hence also Riemann integrable), this proposition follows from the previous one since, thanks to the Second Fundamental Theorem of Calculus, g is differentiable with g′ = φ.

32.4 Properties

Properties similar to those of Riemann's integral hold for Stieltjes'. The only substantial novelty lies in a linearity property that holds not only with respect to the integrand function f, but also with respect to the integrator function g. Let us list the properties without proving them, as the proofs are analogous to those of Section 30.5.

(i) Linearity with respect to the integrand function:

$$\int_a^b (\alpha f_1 + \beta f_2)\,dg = \alpha \int_a^b f_1\,dg + \beta \int_a^b f_2\,dg \qquad \forall \alpha, \beta \in \mathbb{R}$$

³ Riemann's integral is the simplest example of (32.4), with g′(x) = 1.

(ii) Positive linearity⁴ with respect to the integrator function:

$$\int_a^b f\,d(\alpha g_1 + \beta g_2) = \alpha \int_a^b f\,dg_1 + \beta \int_a^b f\,dg_2 \qquad \forall \alpha, \beta \ge 0$$

(iii) Additivity with respect to the integration interval:

$$\int_a^b f\,dg = \int_a^c f\,dg + \int_c^b f\,dg \tag{32.12}$$

(iv) Monotonicity:

$$f_1 \le f_2 \implies \int_a^b f_1\,dg \le \int_a^b f_2\,dg$$

(v) Absolute value:

$$\left|\int_a^b f\,dg\right| \le \int_a^b |f|\,dg$$

32.5 Step integrators

Riemann's integral is a special case of Stieltjes' integral in which the integrator function is the identity, that is, g(x) = x. Stieltjes' integral's great flexibility becomes clear when we consider integrator functions that are substantially different from the identity, for example step functions.

For simplicity, in the next statement we shall denote the unilateral, right and left, limits of the integrator g : [a, b] → ℝ at a point x₀ by g(x₀⁺) and g(x₀⁻), that is,

$$g(x_0^+) = \lim_{x \to x_0^+} g(x) \quad\text{and}\quad g(x_0^-) = \lim_{x \to x_0^-} g(x)$$

setting g(a⁻) = g(a) and g(b⁺) = g(b). The difference

$$g(x_0^+) - g(x_0^-)$$

is therefore the potential jump of g at x₀.

Proposition 1169 Let f : [a, b] → ℝ be continuous and g : [a, b] → ℝ be a monotone step function, with discontinuities at the points {c₁, ..., cₙ} of the interval [a, b]. It holds that

$$\int_a^b f\,dg = \sum_{i=1}^n f(c_i)\left[g(c_i^+) - g(c_i^-)\right] \tag{32.13}$$

In other words, Stieltjes' integral is the sum of all the jumps of the integrator at the points of discontinuity, each multiplied by the value of the integrand at such points. Note that, as the integrator is monotone, the jumps are either all positive (increasing monotonicity) or all negative (decreasing monotonicity).

⁴ The positivity of α and β is required in order to ensure that the integrator function αg₁ + βg₂ is increasing.

Proof By Proposition 1165, the integral $\int_a^b f\,dg$ exists. We must show that its value is (32.13). Let us consider a partition of [a, b] that is fine-grained enough so that every interval Iᵢ = [x_{i-1}, x_i] contains at most one point of discontinuity c_j, j = 1, 2, ..., n (otherwise, it would be enough to add at most n points to obtain the desired partition). We thus have π = {x₀, x₁, ..., x_m} with m ≥ n. For such a partition it holds that

$$I(\pi, f, g) = \sum_{i=1}^m m_i \left(g(x_i) - g(x_{i-1})\right) \tag{32.14}$$

where mᵢ = inf_{Iᵢ} f(x). Let us consider the generic i-th term of the summation in (32.14), which refers to the interval Iᵢ. There are two cases:

1. There exists j ∈ {1, 2, ..., n} such that c_j ∈ Iᵢ. In such a case, since Iᵢ does not contain any other point of discontinuity of g besides c_j, we have that

$$g(x_{i-1}) = g(c_j^-) \quad\text{and}\quad g(x_i) = g(c_j^+)$$

and furthermore

$$f(c_j) \ge \inf_{I_i} f(x) = m_i$$

So, in this case it holds that

$$m_i \left(g(x_i) - g(x_{i-1})\right) \le f(c_j)\left[g(c_j^+) - g(c_j^-)\right] \tag{32.15}$$

Let us denote by J the set of indexes i ∈ {1, 2, ..., m} such that c_j ∈ Iᵢ for some j ∈ {1, 2, ..., n}. Clearly, |J| = n.

2. Iᵢ does not contain any c_j. In such a case, g(x_i) = g(x_{i-1}) and so

$$m_i \left(g(x_i) - g(x_{i-1})\right) = 0 \tag{32.16}$$

Let us denote by Jᶜ the set of indexes i ∈ {1, 2, ..., m} such that c_j ∉ Iᵢ for every j = 1, 2, ..., n. Clearly, |Jᶜ| = m − n.

Obviously we have that J ∪ Jᶜ = {1, 2, ..., m}. Hence

$$I(\pi, f, g) = \sum_{i=1}^m m_i(g(x_i) - g(x_{i-1})) = \sum_{i \in J} m_i(g(x_i) - g(x_{i-1})) + \sum_{i \in J^c} m_i(g(x_i) - g(x_{i-1}))$$

By using (32.15) and (32.16) it is now evident that

$$I(\pi, f, g) = \sum_{i \in J} m_i(g(x_i) - g(x_{i-1})) \le \sum_{j=1}^n f(c_j)\left[g(c_j^+) - g(c_j^-)\right]$$

We can analogously show that

$$S(\pi, f, g) \ge \sum_{j=1}^n f(c_j)\left[g(c_j^+) - g(c_j^-)\right]$$

So

$$I(\pi, f, g) \le \sum_{i=1}^n f(c_i)\left[g(c_i^+) - g(c_i^-)\right] \le S(\pi, f, g)$$

Since these inequalities continue to hold for partitions finer than the one considered, we have that

$$\sup_{\pi \in \Pi} I(\pi, f, g) \le \sum_{i=1}^n f(c_i)\left[g(c_i^+) - g(c_i^-)\right] \le \inf_{\pi \in \Pi} S(\pi, f, g)$$

which implies, since the integral $\int_a^b f\,dg$ exists, that

$$\int_a^b f\,dg = \sup_{\pi \in \Pi} I(\pi, f, g) = \inf_{\pi \in \Pi} S(\pi, f, g) = \sum_{i=1}^n f(c_i)\left[g(c_i^+) - g(c_i^-)\right]$$

thus proving the desired result.

Example 1170 Let f, g : [0, 1] → ℝ be given by f(x) = x² and

$$g(x) = \begin{cases} 0 & \text{if } 0 \le x < \frac{1}{2} \\ \frac{3}{4} & \text{if } \frac{1}{2} \le x < \frac{2}{3} \\ 1 & \text{if } \frac{2}{3} \le x \le 1 \end{cases}$$

The discontinuities are at 1/2 and 2/3, where we have that

$$g\left(\tfrac{1}{2}^{+}\right) = \tfrac{3}{4}\,, \quad g\left(\tfrac{1}{2}^{-}\right) = 0\,, \quad g\left(\tfrac{2}{3}^{+}\right) = 1\,, \quad g\left(\tfrac{2}{3}^{-}\right) = \tfrac{3}{4}$$

Equality (32.13) thus becomes

$$\int_0^1 f\,dg = f\left(\tfrac{1}{2}\right)\left[g\left(\tfrac{1}{2}^{+}\right) - g\left(\tfrac{1}{2}^{-}\right)\right] + f\left(\tfrac{2}{3}\right)\left[g\left(\tfrac{2}{3}^{+}\right) - g\left(\tfrac{2}{3}^{-}\right)\right] = \frac{1}{4}\cdot\frac{3}{4} + \frac{4}{9}\left(1 - \frac{3}{4}\right) = \frac{3}{16} + \frac{1}{9} = \frac{43}{144}$$

N
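Formula (32.13) turns the integral into finite arithmetic. The following sketch (ours) redoes the computation with exact fractions:

```python
from fractions import Fraction as F

f = lambda x: x * x
# jumps of the integrator of Example 1170: (point c_i, g(c_i+) - g(c_i-))
jumps = [(F(1, 2), F(3, 4) - 0), (F(2, 3), 1 - F(3, 4))]
print(sum(f(c) * j for c, j in jumps))   # 43/144
```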

If we consider a step integrator with unitary jumps, that is,

$$g(c_i^+) - g(c_i^-) = 1 \qquad \forall i$$

equation (32.13) becomes

$$\int_a^b f\,dg = \sum_{i=1}^n f(c_i)$$

In particular, if f is the identity we get that

$$\int_a^b f\,dg = \sum_{i=1}^n c_i$$

Stieltjes' integral thus includes addition as a particular case. More generally, we shall soon see that the mean value of a random variable can be seen as a Stieltjes integral.

32.6 Integration by parts

For Stieltjes' integral, the integration by parts formula takes the elegant form of a role reversal between f and g.

Proposition 1171 Given two functions f, g : [a, b] → ℝ which are both increasing, it holds that

$$\int_a^b f\,dg + \int_a^b g\,df = f(b)\,g(b) - f(a)\,g(a) \tag{32.17}$$

Proof For every ε > 0 there are two partitions, π = {xᵢ}ᵢ₌₀ⁿ and π′ = {yᵢ}ᵢ₌₀ⁿ, of [a, b] such that

$$\left|\int_a^b f\,dg - \sum_{i=1}^n f(x_{i-1})\left(g(x_i) - g(x_{i-1})\right)\right| < \frac{\varepsilon}{2}$$

and

$$\left|\int_a^b g\,df - \sum_{i=1}^n g(y_i)\left(f(y_i) - f(y_{i-1})\right)\right| < \frac{\varepsilon}{2}$$

Let π″ = {zᵢ}ᵢ₌₀ⁿ be the partition π″ = π ∪ π′. The two inequalities still hold for the partition π″. Furthermore, note that

$$\sum_{i=1}^n f(z_{i-1})\left(g(z_i) - g(z_{i-1})\right) + \sum_{i=1}^n g(z_i)\left(f(z_i) - f(z_{i-1})\right) = f(b)\,g(b) - f(a)\,g(a)$$

since the sum telescopes, which implies

$$\left|\int_a^b f\,dg + \int_a^b g\,df - f(b)\,g(b) + f(a)\,g(a)\right| < \varepsilon$$

Since ε was arbitrarily chosen, we reach the desired conclusion.

Thanks to Proposition 1167, whenever f and g are differentiable we get that

$$\int_a^b f g'\,dx + \int_a^b g f'\,dx = f(b)\,g(b) - f(a)\,g(a)$$

thus obtaining the integration by parts formula (30.52) for Riemann's integral.
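As a numerical illustration of (32.17) (our own check, carried out via (32.4) under the differentiability assumption), take f(x) = x and g(x) = x² on [0, 1], so that both sides equal 1:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100001)
f, g = x, x**2
lhs = np.trapz(f * 2 * x, x) + np.trapz(g * np.ones_like(x), x)  # int f dg + int g df
rhs = f[-1] * g[-1] - f[0] * g[0]                                # f(b)g(b) - f(a)g(a)
print(lhs, rhs)   # both approximately 1
```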

32.7 Change of variable

The next theorem, whose simple yet tedious proof we omit, gives the change of variable formula for the Stieltjes integral.

Theorem 1172 Let f be continuous and g increasing. If φ : [c, d] → [a, b] is a strictly increasing function, then f ∘ φ is integrable with respect to g ∘ φ, with

$$\int_c^d f(\varphi(t))\,d(g \circ \varphi)(t) = \int_{\varphi(c)}^{\varphi(d)} f(x)\,dg(x) \tag{32.18}$$

If φ is surjective, we can just write

$$\int_c^d f(\varphi(t))\,d(g \circ \varphi)(t) = \int_a^b f(x)\,dg(x)$$

If both φ and g are differentiable, by Proposition 1167 we then have

$$\int_c^d f(\varphi(t))\,g'(\varphi(t))\,\varphi'(t)\,dt = \int_a^b f(x)\,dg(x)$$

In particular, if g(x) = x we get back the Riemann formula (30.54), that is, $\int_c^d f(\varphi(t))\,\varphi'(t)\,dt = \int_a^b f(x)\,dx$. The Stieltjes integral thus clarifies the nature of this earlier formula. After integration by parts, the change of variable formula is thus another result that is best stated in terms of the Stieltjes integral.

If g is strictly increasing (and so invertible), by setting φ = g⁻¹ in (32.18) we get the interesting formula

$$\int_{g(a)}^{g(b)} f\left(g^{-1}(t)\right) dt = \int_a^b f(x)\,dg(x)$$

When g is strictly increasing, the Stieltjes integral can thus be computed via a Riemann integral. This result complements Proposition 1167, which showed that the same is true, but with a different formula, when g is differentiable.
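This last formula is easy to test numerically. In the sketch below (ours), g(x) = x² and f(x) = x on [0, 1], so both sides equal 2/3:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100001)
lhs = np.trapz(np.sqrt(t), t)    # int f(g^{-1}(t)) dt with f(x) = x, g(x) = x^2
x = np.linspace(0.0, 1.0, 100001)
rhs = np.trapz(x * 2 * x, x)     # int f dg computed as int f g' dx, by (32.4)
print(lhs, rhs)                  # both approximately 2/3
```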
Chapter 33

Moments

In this chapter we outline a study of moments, a notion that plays a fundamental role in
probability theory and, through it, in a number of applications. For us, it is also a way to
illustrate what we learned in the last two chapters.

33.1 Densities

We say that an increasing function g : ℝ → ℝ is a probability integrator if:

(i) lim_{x→−∞} g(x) = 0 and lim_{x→+∞} g(x) = 1;

(ii) g is right continuous, i.e., g(x₀) = g(x₀⁺) for all x₀ ∈ ℝ.

This class of integrators is pervasive in probability theory (as cumulative distribution functions of random variables), and this justifies their name. If g takes on value 0 outside a bounded interval, say the unit interval [0, 1] for concreteness, condition (i) reduces to g(0) = 0 and g(1) = 1.

If g is the integral function of a positive function φ : ℝ → ℝ, that is,

$$g(x) = \int_{-\infty}^x \varphi(t)\,dt \qquad \forall x \in \mathbb{R}$$

we say that φ is a probability density of g. By condition (i), $\int_{-\infty}^{+\infty} \varphi(x)\,dx = 1$. When g is continuously differentiable, the Second Fundamental Theorem of Calculus implies g′ = φ.

Example 1173 (i) Given any two scalars a < b, consider the probability integrator

$$g(x) = \begin{cases} 0 & \text{if } x < a \\ \frac{x-a}{b-a} & \text{if } a \le x \le b \\ 1 & \text{if } x > b \end{cases}$$

Its probability density, called uniform, is

$$\varphi(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{else} \end{cases}$$


because

$$\int_{-\infty}^x \varphi(t)\,dt = \int_a^x \frac{1}{b-a}\,dt = g(x) \qquad \forall x \in [a, b]$$

and $\int_{-\infty}^{+\infty} \varphi(x)\,dx = 1$.

(ii) The Gaussian integrator is

$$g(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\,e^{-\frac{t^2}{2}}\,dt$$

The Gaussian probability density is

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}$$

because $\int_{-\infty}^{+\infty} \varphi(t)\,dt = 1$ (see Section 30.10.3). N

33.2 Moments

The improper Stieltjes integral, denoted $\int_{-\infty}^{+\infty} f(x)\,dg(x)$, can be defined in a way similar to the improper Riemann integral. For it, properties (i)-(v) of Section 32.4 continue to hold. The next important definition rests upon this notion.

Definition 1174 The n-th moment of an integrator function g is given by the Stieltjes integral

$$\mu_n = \int_{-\infty}^{+\infty} x^n\,dg(x) \tag{33.1}$$

For instance, μ₁ is the first moment (often called average or mean) of g, μ₂ is its second moment, μ₃ is its third moment, and so on.

Proposition 1175 If the moment μₙ exists, then all lower moments μₖ, with k ≤ n, exist.

To assume the existence of higher and higher moments is, therefore, a more and more demanding requirement. For instance, to assume the existence of the second moment is a stronger hypothesis than to assume that of the first one.

Proof To ease matters, assume there is a scalar a such that g(a) = 0, so that $\mu_n = \int_a^{+\infty} x^n\,dg(x)$. Since xᵏ = o(xⁿ) if k < n, the version for improper Stieltjes integrals of Proposition 1150-(ii) ensures the convergence of $\int_a^{+\infty} x^k\,dg(x)$, that is, the existence of μₖ.

If g has a probability density φ, by Proposition 1168 we have

$$\int_{-\infty}^{+\infty} x^n\,dg = \int_{-\infty}^{+\infty} x^n\,\varphi(x)\,dx \tag{33.2}$$

In this case, we are back to Riemann integration and we directly say that μₙ is the n-th moment of the density φ.
moment of the density .

Example 1176 (i) For the uniform density we have

$$\mu_1 = \int_{-\infty}^{+\infty} x\,\varphi(x)\,dx = \int_a^b x\,\frac{1}{b-a}\,dx = \frac{1}{b-a}\,\frac{b^2 - a^2}{2} = \frac{a+b}{2}$$

$$\mu_2 = \int_{-\infty}^{+\infty} x^2\,\varphi(x)\,dx = \int_a^b x^2\,\frac{1}{b-a}\,dx = \frac{1}{b-a}\,\frac{b^3 - a^3}{3} = \frac{1}{3}\left(a^2 + ab + b^2\right)$$

(ii) For the Gaussian density we have:

$$\begin{aligned}
\mu_1 &= \int_{-\infty}^{+\infty} x\,\varphi(x)\,dx = \int_{-\infty}^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = \int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx + \int_{-\infty}^0 x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx \\
&= \int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx - \int_{-\infty}^0 (-x)\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx \\
&= \int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx - \int_0^{+\infty} x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = 0
\end{aligned}$$

By integrating by parts,

$$\begin{aligned}
\mu_2 &= \int_{-\infty}^{+\infty} x^2\,\varphi(x)\,dx = \int_{-\infty}^{+\infty} x^2\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = \int_{-\infty}^{+\infty} \frac{x}{\sqrt{2\pi}}\cdot x\,e^{-\frac{x^2}{2}}\,dx \\
&= \left[-\frac{1}{\sqrt{2\pi}}\,x\,e^{-\frac{x^2}{2}}\right]_{-\infty}^{+\infty} + \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = 0 + 1 = 1
\end{aligned}$$

where we adapted (30.52) to the improper case, with $g(x) = x/\sqrt{2\pi}$ and $f'(x) = x\,e^{-\frac{x^2}{2}}$, so that $g'(x) = 1/\sqrt{2\pi}$ and $f(x) = -e^{-\frac{x^2}{2}}$. N
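These closed forms are easy to confirm numerically, as in the following sketch (our illustration):

```python
import numpy as np

a, b = 2.0, 5.0
x = np.linspace(a, b, 200001)
print(np.trapz(x / (b - a), x), (a + b) / 2)               # uniform: first moment
print(np.trapz(x**2 / (b - a), x), (a*a + a*b + b*b) / 3)  # uniform: second moment

x = np.linspace(-12.0, 12.0, 400001)                       # Gaussian tails negligible
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
print(np.trapz(x * phi, x), np.trapz(x**2 * phi, x))       # approx. 0 and approx. 1
```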

33.3 The problem of moments

Consider a probability integrator g that takes on value 0 outside the unit interval. The n-th moment of g is then

$$\mu_n = \int_0^1 x^n\,dg \tag{33.3}$$

If all moments exist, they form a sequence {μₙ} of scalars in [0, 1]. For instance, if g(x) = x we have μₙ = 1/(n + 1).

In this unit interval setting, the problem of moments takes the following form:

Given a sequence {aₙ} of scalars in [0, 1], is there an integrator g such that, for each n, the term aₙ is exactly its n-th moment μₙ?

The question amounts to asking whether sequences of moments have a characterizing property, which {aₙ} should then satisfy in order to be a moment sequence. This question was first posed by Stieltjes himself in the same 1894-95 articles where he developed his notion of integral. Indeed, providing a setting where the problem of moments could be properly addressed was a main motivation for his integral (which, as we just remarked, is indeed the natural setting where moments are defined).

Next we present a most beautiful answer, given by Felix Hausdorff in the early 1920s. To state it, we need to go back to the finite differences of Chapter 10.

Definition 1177 A sequence {xₙ}ₙ₌₀^∞ is totally monotone if, for every n ≥ 0, we have (−1)ᵏ Δᵏxₙ ≥ 0 for every k ≥ 0.

In words, a sequence is totally monotone if its finite differences keep alternating sign across their orders. A totally monotone sequence is positive because Δ⁰xₙ = xₙ, as well as decreasing because Δxₙ ≤ 0 (Lemma 372).

We can now answer the question we posed.

Theorem 1178 (Hausdorff) A sequence {aₙ} ⊆ [0, 1] is such that $a_n = \int_0^1 x^n\,dg$ for a probability integrator g if and only if it is totally monotone.

Proof We prove the "only if" part, the converse being significantly more complicated. So, let {μₙ} be a sequence of moments (33.3). It suffices to show that

$$(-1)^k \Delta^k \mu_n = \int_0^1 t^n (1-t)^k\,dg(t) \ge 0$$

We proceed by induction on k. For k = 0 we trivially have $(-1)^0 \Delta^0 \mu_n = \mu_n = \int_0^1 t^n\,dg(t)$ for all n. Assume $(-1)^{k-1} \Delta^{k-1} \mu_n = \int_0^1 t^n (1-t)^{k-1}\,dg(t)$ for all n. Then,

$$\begin{aligned}
\Delta^k \mu_n &= \Delta\left(\Delta^{k-1} \mu_n\right) = \Delta^{k-1} \mu_{n+1} - \Delta^{k-1} \mu_n \\
&= (-1)^{k-1}\left[\int_0^1 t^{n+1}(1-t)^{k-1}\,dg(t) - \int_0^1 t^n (1-t)^{k-1}\,dg(t)\right] \\
&= -(-1)^{k-1} \int_0^1 t^n (1-t)^{k-1}(1-t)\,dg(t) = (-1)^k \int_0^1 t^n (1-t)^k\,dg(t)
\end{aligned}$$

as desired.

The characterizing property of moment sequences is, thus, total monotonicity. It is truly remarkable that a property of finite differences is able to pin down moment sequences. Note that for this result the Stieltjes integral is required: in the "if" part the integrator, whose moments turn out to be the terms of the given totally monotone sequence, might well be non-differentiable (and so the Riemann version (33.2) might not hold).
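The "only if" direction can be probed computationally on the sequence μₙ = 1/(n + 1), the moments of g(x) = x. The sketch below (ours) checks the sign condition using the explicit formula Δᵏxₙ = Σⱼ (−1)^(k−j) C(k, j) x_{n+j} for forward differences, with exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

def delta_k(seq, n, k):
    # k-th forward difference of the sequence at index n
    return sum((-1)**(k - j) * comb(k, j) * seq(n + j) for j in range(k + 1))

mu = lambda n: Fraction(1, n + 1)   # moments of the integrator g(x) = x on [0, 1]
print(all((-1)**k * delta_k(mu, n, k) >= 0
          for n in range(12) for k in range(12)))   # True: totally monotone
```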

33.4 Moment generating function

Definition 1179 Let g be a probability integrator for which there exists ε > 0 such that

$$\int_{-\infty}^{+\infty} e^{yx}\,dg(x) < +\infty \qquad \forall y \in (-\varepsilon, \varepsilon) \tag{33.4}$$

The function F : (−ε, ε) → ℝ defined by

$$F(y) = \int_{-\infty}^{+\infty} e^{yx}\,dg(x)$$

is called the moment generating function of g.



Assume that g has a probability density φ, so that

$$F(y) = \int_{-\infty}^{+\infty} e^{yx}\,\varphi(x)\,dx$$

In this case, the function F is of the form (31.13), with

$$f(x, y) = e^{yx}\,\varphi(x)$$

We can then use Proposition 1161 to establish the existence and differentiability of the moment generating function. In particular, if there exist ε > 0 and a positive function g : ℝ → ℝ such that $\int_{-\infty}^{+\infty} g(x)\,dx < +\infty$ and, for every y ∈ [−ε, ε],

$$e^{yx}\,\varphi(x) \le g(x) \qquad \forall x \in \mathbb{R}$$

then F : (−ε, ε) → ℝ is differentiable, with

$$F'(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y}\,e^{yx}\,\varphi(x)\,dx = \int_{-\infty}^{+\infty} x\,e^{yx}\,\varphi(x)\,dx$$

At y = 0 we get

$$F'(0) = \mu_1$$

The derivative at 0 of the moment generating function is, thus, the first moment of the density.

If there exists a positive function h : ℝ → ℝ such that $\int_{-\infty}^{+\infty} h(x)\,dx < +\infty$ and, for every y ∈ [−ε, ε],

$$\left|x\,e^{yx}\,\varphi(x)\right| = |x|\,e^{yx}\,\varphi(x) \le h(x) \qquad \forall x \in \mathbb{R}$$

then, by Proposition 1161, F : (−ε, ε) → ℝ is twice differentiable, with

$$F''(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y}\,x\,e^{yx}\,\varphi(x)\,dx = \int_{-\infty}^{+\infty} x^2\,e^{yx}\,\varphi(x)\,dx$$

At y = 0 we get

$$F''(0) = \mu_2$$

By proceeding in this way (if possible), with higher order derivatives we get:

$$F'''(0) = \mu_3\,, \quad F^{(iv)}(0) = \mu_4\,, \quad \ldots\,, \quad F^{(n)}(0) = \mu_n$$

The derivative of order n at 0 of the moment generating function is, thus, the n-th moment of the density. This fundamental property justifies the name of this function.

Example 1180 For the Gaussian density $\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}$ we have

$$\begin{aligned}
F(y) &= \int_{-\infty}^{+\infty} e^{yx}\,\varphi(x)\,dx = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{yx}\,e^{-\frac{x^2}{2}}\,dx = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(x^2 - 2yx\right)}\,dx \\
&= \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(x^2 - 2yx + y^2 - y^2\right)}\,dx = e^{\frac{y^2}{2}} \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(x-y)^2}\,dx
\end{aligned}$$

where in the fourth equality we have added and subtracted y². But (30.74) of Chapter 30 implies $\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(x-y)^2}\,dx = 1$, so $F(y) = e^{\frac{y^2}{2}}$. We have $F'(y) = y\,e^{\frac{y^2}{2}}$ and $F''(y) = e^{\frac{y^2}{2}}\left(1 + y^2\right)$, so μ₁ = F′(0) = 0 and μ₂ = F″(0) = 1. N
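A numerical cross-check of these computations (our illustration):

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 400001)
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def F(y):
    return np.trapz(np.exp(y * x) * phi, x)

print(F(0.5), np.exp(0.5**2 / 2))          # the two values should agree
h = 1e-3
print((F(h) - F(-h)) / (2 * h))            # approx. mu_1 = 0 (central first difference)
print((F(h) - 2 * F(0) + F(-h)) / h**2)    # approx. mu_2 = 1 (central second difference)
```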

The next example shows that not all densities have a moment generating function; in this case there is no ε > 0 such that the integral (33.4) is finite.

Example 1181 Let

$$\varphi(x) = \begin{cases} \frac{1}{x^2} & \text{if } x > 1 \\ 0 & \text{else} \end{cases}$$

This is the Pareto probability density (recall from Example 1135 that $\int_1^{+\infty} x^{-2}\,dx = 1$). For every y > 0 we have

$$\int_{-\infty}^{+\infty} e^{yx}\,\varphi(x)\,dx = \int_1^{+\infty} \frac{e^{yx}}{x^2}\,dx = +\infty$$

Therefore, the moment generating function does not exist. Since

$$\mu_1 = \int_1^{+\infty} x\,\frac{1}{x^2}\,dx = \int_1^{+\infty} \frac{1}{x}\,dx = +\infty$$

the first moment does not exist either. By the comparison criterion for improper Riemann integrals, this implies μₙ = +∞ for every n ≥ 1. This density has no moments of any order. N

Suppose that the moment generating function has derivatives of all orders. By Theorem 355,

$$e^{yx} = 1 + yx + \frac{y^2 x^2}{2} + \frac{y^3 x^3}{3!} + \cdots + \frac{y^n x^n}{n!} + \cdots = \sum_{n=0}^{\infty} \frac{y^n x^n}{n!}$$

So, it is tempting to write:

$$F(y) = \int_{-\infty}^{+\infty} e^{yx}\,\varphi(x)\,dx = \int_{-\infty}^{+\infty} \sum_{n=0}^{\infty} \frac{y^n x^n}{n!}\,\varphi(x)\,dx = \sum_{n=0}^{\infty} \int_{-\infty}^{+\infty} \frac{y^n x^n}{n!}\,\varphi(x)\,dx = \sum_{n=0}^{\infty} \frac{y^n}{n!}\,\mu_n$$

Under suitable hypotheses, spelled out in more advanced courses, it is legitimate to give in to this temptation. Moment generating functions can then be expressed as a series of the moments of the density.
Part IX

Appendices

Appendix A

Permutations

A.1 Generalities

Combinatorics is an important area of discrete mathematics, useful in many applications. Here we focus on permutations, a fundamental combinatorial topic that is important for understanding some of the topics of the book.

We start with a simple problem. We have at our disposal three pairs of pants and five T-shirts. If there are no chromatic pairings that hurt our aesthetic sense, in how many possible ways can we dress? The answer is very simple: in 3 · 5 = 15 ways. Indeed, let us call a, b, c the pairs of pants and 1, 2, 3, 4, 5 the T-shirts: since the choice of a certain T-shirt does not impose any (aesthetic) restriction on the choice of the pants, the possible pairings are

a1, a2, a3, a4, a5
b1, b2, b3, b4, b5
c1, c2, c3, c4, c5

We can therefore conclude that, if we have to make two independent choices, one among n different alternatives and the other among m different alternatives, then the total number of choices is n · m. In particular, if A and B are two sets with n and m elements, respectively, the Cartesian product A × B, which is the set of the ordered pairs (a, b) with a ∈ A and b ∈ B, has n · m elements.

What has been said can be easily extended to the case of more than two choices: if we have to make multiple choices, none of which restricts the others, the total number of choices is the product of the numbers of alternatives of each choice.

Example 1182 (i) How many possible Italian licence plates are there? They have the form AA 000 AA, with two letters, three digits, and again two letters. The letters that can be used are 22 and the digits are, obviously, 10. The number of (different) plates is, therefore, 22 · 22 · 10 · 10 · 10 · 22 · 22 = 234,256,000. (ii) In a multiple choice test, in each question students have to select one of three possible answers. If there are 13 questions, then the overall number of possible selections is 3 · 3 · · · 3 = 3¹³ = 1,594,323. N


A.2 Permutations

Intuitively, a permutation of n distinct objects is a possible arrangement of these objects. For instance, with three objects a, b, c there are 6 permutations:

abc, acb, bac, bca, cab, cba (A.1)

We can formalize this notion through bijective functions.

Definition 1183 Let X be any collection. A permutation on X is a bijective function f : X → X.

Permutations are nothing but the bijective functions f : X → X. Though combinatorics typically considers finite sets X, the definition is fully general.

For instance, if X = {a, b, c} the permutations f : {a, b, c} → {a, b, c} that correspond to the arrangements (A.1) are:

(i) abc corresponds to the permutation f(x) = x for all x ∈ X;

(ii) acb corresponds to the permutation f(a) = a, f(b) = c and f(c) = b;

(iii) bac corresponds to the permutation f(a) = b, f(b) = a and f(c) = c;

(iv) bca corresponds to the permutation f(a) = b, f(b) = c and f(c) = a;

(v) cab corresponds to the permutation f(a) = c, f(b) = a and f(c) = b;

(vi) cba corresponds to the permutation f(a) = c, f(b) = b and f(c) = a.

Proposition 1184 The number of permutations on a set with n elements is n! = 1 · 2 · · · n.

The number n! is called the factorial of n. We conventionally set 0! = 1.

To understand the result heuristically, consider any arrangement of the n elements. In the first place we can put any element: the first place can therefore be occupied in n different ways. In the second place we can put any of the remaining elements: the second place can be occupied in n − 1 different ways. Proceeding in this way, we see that the third position can be occupied in n − 2 different ways, and so on, down to 1, since at the end of the process only one element is left. The number of permutations is, therefore, n(n − 1)(n − 2) · · · 2 · 1 = n!.

Example 1185 (i) A deck of 52 cards can be shuffled in 52! different ways. (ii) Six passengers can occupy a six-passenger car in 6! = 720 different ways. N

The recursive formula n! = n(n − 1)! permits defining the sequence of factorials xₙ = n! also by recurrence as xₙ = n·xₙ₋₁, with first term x₁ = 1. The rate of growth of this sequence is impressive, as the following table shows:

n   | 0 | 1 | 2 | 3 | 4  | 5   | 6   | 7     | 8      | 9       | 10
n!  | 1 | 1 | 2 | 6 | 24 | 120 | 720 | 5,040 | 40,320 | 362,880 | 3,628,800

Indeed, Lemma 329 showed that αⁿ = o(n!). The already very fast exponentials are actually slower than factorials, which definitely deserve their exclamation mark.
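A quick brute-force confirmation (ours) with Python's standard library:

```python
from itertools import permutations
from math import factorial

print([''.join(p) for p in permutations('abc')])   # the 6 arrangements in (A.1)
# the enumeration count matches n! for small n
print(all(sum(1 for _ in permutations(range(n))) == factorial(n)
          for n in range(8)))
```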

A.3 Anagrams

We now drop the requirement that the objects be distinct and allow for repetitions. Specifically, in this section we consider n objects of h ≤ n different types, each type i with multiplicity kᵢ,¹ with i = 1, ..., h, and $\sum_{i=1}^h k_i = n$. For instance, consider the 6 objects

a, a, b, b, b, c

There are 3 types a, b, and c with multiplicities 2, 3, and 1, respectively. Indeed, 2 + 3 + 1 = 6.

How many distinguishable arrangements are there? If in this example we distinguished all the objects by attaching a different index to identical objects, a₁, a₂, b₁, b₂, b₃, c, there would be 6! = 720 permutations. If we now remove the distinctive index from the three letters b, they can be permuted in 3! different ways within the triple of places they occupy. Such 3! different permutations (when we write b₁, b₂, b₃) are no longer distinguishable (when writing b, b, b). Therefore, the distinct permutations of a₁, a₂, b, b, b, c are 6!/3!. A similar argument shows that, by removing the distinctive index from the two letters a, the distinguishable permutations reduce to 6!/(3! 2!) = 60.

In general, one can prove the following result.

Proposition 1186 The number of distinct arrangements, called permutations with repetitions (or anagrams), is

$$\frac{n!}{k_1!\,k_2! \cdots k_h!} \tag{A.2}$$

The integers (A.2) are called multinomial coefficients.

Example 1187 (i) The possible anagrams of the word ABA are 3!/(2! 1!) = 3. They are ABA, AAB, BAA. (ii) The possible anagrams of the word MAMMA are 5!/(3! 2!) = 120/(6 · 2) = 10. N
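The count (A.2) can be checked by brute force for short words, as in this sketch (ours):

```python
from math import factorial
from itertools import permutations

def anagrams(word):
    # multinomial coefficient n! / (k_1! k_2! ... k_h!)
    n = factorial(len(word))
    for ch in set(word):
        n //= factorial(word.count(ch))
    return n

print(anagrams('ABA'), anagrams('MAMMA'))    # 3 and 10
print(len(set(permutations('MAMMA'))))       # 10 again, by direct enumeration
```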

In the important two-type case, h = 2, we have k objects of one type and n − k of the other type. By (A.2), the number of distinct arrangements is

$$\frac{n!}{k!\,(n-k)!} \tag{A.3}$$

This expression is usually denoted by

$$\binom{n}{k}$$

It is called a binomial coefficient and read "n over k". In particular,

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k!}$$

with

$$\binom{n}{0} = \frac{n!}{0!\,n!} = 1$$

¹ Note that, because of repetitions, these n objects do not form a set X. The notion of "multiset" is sometimes used for collections in which repetitions are permitted.

The following identity is easily proved, for 0 ≤ k ≤ n:

$$\binom{n}{k} = \binom{n}{n-k}$$

It captures a natural symmetry: the number of distinct arrangements remains the same, regardless of which of the two types we focus on.

Example 1188 (i) In a parking lot, spots can be either free or busy. Suppose that 15 out of the 20 available spots are busy. The possible arrangements of the 5 free spots (or, symmetrically, of the 15 busy spots) are:

$$\binom{20}{5} = \binom{20}{15} = 15{,}504$$

(ii) We repeat an experiment 100 times: each time we record either a "success" or a "failure", so a string of 100 outcomes like FSFF...S results. Suppose that we have recorded 92 "successes" and 8 "failures". The number of the different strings that may have resulted is:

$$\binom{100}{92} = \binom{100}{8} = 186{,}087{,}894{,}300$$

N

A.4 Newton's binomial formula

From high school we know that

$$(a+b)^1 = a + b$$
$$(a+b)^2 = a^2 + 2ab + b^2$$
$$(a+b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3$$

More generally, we have the following result.

Theorem 1189 (Tartaglia-Newton) We have

$$(a+b)^n = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + \binom{n}{n-1} a\,b^{n-1} + b^n = \sum_{k=0}^n \binom{n}{k} a^{n-k} b^k \tag{A.4}$$

Proof The product

$$\underbrace{(a+b)(a+b)\cdots(a+b)}_{n \text{ times}}$$

can be calculated by choosing one of the two terms (a or b) in each of the n factors and taking the product of the terms so chosen; we then sum all the products obtained in such a way. The product a^{n−k} bᵏ is obtained by choosing n − k times the first term a and the remaining k times the second term b. This can be done in $\binom{n}{n-k} = \binom{n}{k}$ different ways: the factor a^{n−k} bᵏ is, therefore, obtained in $\binom{n}{k}$ different ways. This proves the result.

Formula (A.4) is called Newton's binomial formula. It motivates the name of binomial coefficients for the integers $\binom{n}{k}$. In particular,

$$(1+x)^n = \sum_{k=0}^n \binom{n}{k} x^k$$

If we take x = 1 we obtain the remarkable relation

$$\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n$$

Example 1190 A set of n elements has 2ⁿ subsets. Indeed, there is only one, 1 = $\binom{n}{0}$, subset with 0 elements (the empty set), n = $\binom{n}{1}$ subsets with only one element, $\binom{n}{2}$ subsets with two elements, ..., and finally only one, 1 = $\binom{n}{n}$, subset (the set itself) with all the n elements. N

More generally, one can prove the multinomial formula:

$$(a_1 + a_2 + \cdots + a_h)^n = \sum \frac{n!}{k_1!\,k_2! \cdots k_h!}\,a_1^{k_1} a_2^{k_2} \cdots a_h^{k_h}$$

where the sum is over all the choices of natural numbers k₁, k₂, ..., k_h such that $\sum_{i=1}^h k_i = n$. This formula motivates the name of multinomial coefficients for the integers (A.2).
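Both binomial identities are easy to verify mechanically, as in the following sketch (ours):

```python
from math import comb

n, a, b = 7, 3.0, 2.0
lhs = (a + b)**n
rhs = sum(comb(n, k) * a**(n - k) * b**k for k in range(n + 1))
print(lhs, rhs)                                      # Newton's binomial formula
print(sum(comb(n, k) for k in range(n + 1)), 2**n)   # the row sum equals 2^n
```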
Appendix B

Notions of trigonometry

B.1 Generalities

We usually call trigonometric circumference the circumference with center at the origin and radius 1, oriented counterclockwise and on which we move starting from the point of coordinates (1, 0).

[Figure: the trigonometric circumference]

Clearly, each point on the circumference determines an angle between the positive horizontal axis and the straight line joining the point with the origin; vice versa, each angle determines a point on the circumference. This correspondence between points and angles

can be, equivalently, viewed as a correspondence between points and arcs of circumference.

y
1.5
P
P
2
1

α'
0.5

α
0
O P 1 x
1
-0.5

-1

-1.5

-2
-2 -1 0 1 2

The point P generates the angle and the arc 0

Angles are usually measured either in degrees or in radians. A degree is the 360th part of a round angle (corresponding to a complete round of the circumference); a radian is an (apparently strange) unit of measure that assigns measure 2π to a round angle; a radian is therefore its 2π-th part. We will use the radian as the unit of measure of angles because it presents some advantages over the degree. In any case, the next table reports some equivalences between degrees and radians.

degrees | 0 | 30  | 45  | 60  | 90  | 180 | 270  | 360
radians | 0 | π/6 | π/4 | π/3 | π/2 | π   | 3π/2 | 2π

Angles that differ by one or more complete rounds of the circumference are identical: writing α or α + 2kπ, with k ∈ ℤ, is the same. We will therefore always take 0 ≤ α < 2π.

Fix a point P = (P₁, P₂) on the trigonometric circumference. The sine of the angle (or of the arc) determined by it is the ordinate P₂ of the point P; the cosine is, instead, the abscissa P₁ of the point P.

The sine and the cosine of the angle (or the arc) α are denoted, respectively, by sin α and cos α. The sine is positive in the I and in the II quadrant, and negative in the III and in the IV quadrant; the cosine is positive in the I and in the IV quadrant, and negative in the II and in the III quadrant. For example,

α     | 0 | π/4  | π/2 | π  | 3π/2 | 2π
sin α | 0 | √2/2 | 1   | 0  | −1   | 0
cos α | 1 | √2/2 | 0   | −1 | 0    | 1

In view of the previous discussion, for every k ∈ ℤ we have

$$\sin(\alpha + 2k\pi) = \sin \alpha \quad\text{and}\quad \cos(\alpha + 2k\pi) = \cos \alpha \tag{B.1}$$

Note that Pythagoras' Theorem guarantees that, for every α ∈ ℝ,

$$\sin^2 \alpha + \cos^2 \alpha = 1 \tag{B.2}$$

This classical identity is sometimes called the Pythagorean trigonometric identity.

Fixed again a point on the circumference, the tangent of the angle (or of the arc) determined by it is the ratio between its ordinate and its abscissa, and the cotangent is the inverse ratio. In the previous figure, the tangent of the angle determined by P is the ratio PB/OB and the cotangent is the ratio OB/PB.

The tangent and cotangent of the angle α are denoted, respectively, by tan α and cotan α. It is not necessary to dwell on the cotangent since it is simply the reciprocal of the tangent.¹ The tangent is positive in the I and in the III quadrant, and negative in the II and in the IV quadrant. For example,

α     | 0 | π/4 | π/2  | π | 3π/2 | 2π
tan α | 0 | 1   | → ∞  | 0 | → ∞  | 0

Again, for every k ∈ ℤ,

$$\tan(\alpha + k\pi) = \tan \alpha \tag{B.3}$$

We trivially have

$$\tan \alpha = \frac{\sin \alpha}{\cos \alpha}$$

and so, by the Pythagorean trigonometric identity,

$$\sin^2 \alpha = \frac{\tan^2 \alpha}{1 + \tan^2 \alpha}$$

B.2 Concerto d'archi (string concert)

We list, just for sine and cosine, some simple relations between angles (arcs).

(i) Angles α and −α:

$$\sin(-\alpha) = -\sin \alpha\,, \qquad \cos(-\alpha) = \cos \alpha$$

(ii) Angles α and π/2 − α:

$$\sin\left(\frac{\pi}{2} - \alpha\right) = \cos \alpha\,, \qquad \cos\left(\frac{\pi}{2} - \alpha\right) = \sin \alpha$$

(iii) Angles α and π/2 + α:

$$\sin\left(\frac{\pi}{2} + \alpha\right) = \cos \alpha\,, \qquad \cos\left(\frac{\pi}{2} + \alpha\right) = -\sin \alpha$$

(iv) Angles α and π − α:

$$\sin(\pi - \alpha) = \sin \alpha\,, \qquad \cos(\pi - \alpha) = -\cos \alpha$$

¹ Some pedants also call secant and cosecant, respectively, the reciprocals of cosine and sine.

(v) Angles α and π + α:

$$\sin(\pi + \alpha) = -\sin \alpha\,, \qquad \cos(\pi + \alpha) = -\cos \alpha$$

Next we list some formulae that we do not prove (it would be sufficient to prove the first two because the others follow from them).

Addition and subtraction formulae:

$$\sin(\alpha + \beta) = \sin \alpha \cos \beta + \sin \beta \cos \alpha\,, \qquad \cos(\alpha + \beta) = \cos \alpha \cos \beta - \sin \alpha \sin \beta$$

and

$$\sin(\alpha - \beta) = \sin \alpha \cos \beta - \sin \beta \cos \alpha\,, \qquad \cos(\alpha - \beta) = \cos \alpha \cos \beta + \sin \alpha \sin \beta \tag{B.4}$$

Duplication and bisection formulae:

$$\sin 2\alpha = 2 \sin \alpha \cos \alpha\,, \qquad \cos 2\alpha = \cos^2 \alpha - \sin^2 \alpha$$

and

$$\sin \alpha = \pm\sqrt{\frac{1 - \cos 2\alpha}{2}}\,, \qquad \cos \alpha = \pm\sqrt{\frac{1 + \cos 2\alpha}{2}}$$

Prostaferesis formulae:

$$\sin(\alpha + \beta) + \sin(\alpha - \beta) = 2 \sin \alpha \cos \beta\,, \qquad \sin(\alpha + \beta) - \sin(\alpha - \beta) = 2 \cos \alpha \sin \beta$$

and

$$\cos(\alpha + \beta) + \cos(\alpha - \beta) = 2 \cos \alpha \cos \beta\,, \qquad \cos(\alpha + \beta) - \cos(\alpha - \beta) = -2 \sin \alpha \sin \beta$$

We close with a few classical theorems that show how trigonometry is intimately linked to the study of triangles. In these theorems a, b, c denote the lengths of the three sides of a triangle and α, β, γ the angles opposite to them.

Theorem 1191 Sides are proportional to the sines of their opposite angles, that is,

$$\frac{a}{\sin \alpha} = \frac{b}{\sin \beta} = \frac{c}{\sin \gamma}$$

The next result generalizes Pythagoras' Theorem, which is the special case when the triangle is right and side a is the hypotenuse (indeed, cos α = cos π/2 = 0).

Theorem 1192 (Carnot) We have a² = b² + c² − 2bc cos α.

B.3 Perpendicularity

The trigonometric circumference consists of the points x ∈ ℝ² of unitary norm, that is, ‖x‖ = 1. Hence, any point x = (x₁, x₂) ∈ ℝ² can be brought back onto the circumference by dividing it by its norm ‖x‖, since

$$\left\|\frac{x}{\|x\|}\right\| = 1$$

[Figure: a vector x and its normalization x/‖x‖ on the trigonometric circumference]

It follows that

$$\sin \alpha = \frac{x_2}{\|x\|} \quad\text{and}\quad \cos \alpha = \frac{x_1}{\|x\|} \tag{B.5}$$

that is,

$$x = \left(\|x\| \cos \alpha, \|x\| \sin \alpha\right)$$

Such a trigonometric representation of the vector x is called polar. The components ‖x‖ cos α and ‖x‖ sin α are called polar coordinates.

The angle α can be expressed through the inverse trigonometric functions arcsin x, arccos x and arctan x. To this end, observe that

$$\tan \alpha = \frac{\sin \alpha}{\cos \alpha} = \frac{x_2 / \|x\|}{x_1 / \|x\|} = \frac{x_2}{x_1}$$

Together with (B.5), this implies

$$\alpha = \arctan \frac{x_2}{x_1} = \arccos \frac{x_1}{\|x\|} = \arcsin \frac{x_2}{\|x\|}$$

The equality α = arctan(x₂/x₁) is especially important because it permits expressing the angle α as a function of the coordinates of the point x = (x₁, x₂).

Let x and y be two vectors of the plane ℝ² that determine the angles α and β. By (B.4), we have:

$$x \cdot y = \left(\|x\| \cos \alpha, \|x\| \sin \alpha\right) \cdot \left(\|y\| \cos \beta, \|y\| \sin \beta\right) = \|x\| \|y\| \left(\cos \alpha \cos \beta + \sin \alpha \sin \beta\right) = \|x\| \|y\| \cos(\alpha - \beta)$$

that is,

$$\frac{x \cdot y}{\|x\| \|y\|} = \cos(\alpha - \beta)$$

where α − β is the angle that is the difference of the angles determined by the two points.

Such an angle is right, i.e., the vectors x and y are "perpendicular", when

$$\frac{x \cdot y}{\|x\| \|y\|} = \cos \frac{\pi}{2} = 0$$

that is, if and only if x · y = 0. In other words, two vectors of the plane ℝ² are perpendicular when their inner product is zero.
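The following sketch (ours; it assumes NumPy) illustrates polar angles and the perpendicularity criterion:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([-4.0, 3.0])
cos_angle = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_angle)                             # 0: x and y are perpendicular
print(np.degrees(np.arctan2(x[1], x[0])))    # polar angle of x, about 53.13 degrees
```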
Appendix C

Elements of intuitive logic

C.1 Propositions

In this chapter we will introduce some basic notions of logic. Though, "logically", these notions should actually be at the beginning of the textbook, they can be best appreciated after having learned some mathematics (even if in a logically disordered way). This is why this chapter is in the Appendix, leaving it to the reader to judge when it is best to read it.

We will call proposition a statement that can be either true or false. For example, "the ravens are black" and "in the year 1965 it rained in Milan" are propositions. On the contrary, the statement "in the year 1965 it has been cold in Milan" is not a proposition, unless we specify the meaning of cold, for example with the proposition "in the year 1965 the temperature went below zero in Milan".

We will denote propositions by letters such as p, q, .... Moreover, for the sake of brevity, we will denote by 1 and 0, respectively, the truth and the falsity of a proposition: these are called truth values.

C.2 Operations

The operations among propositions are:

(i) Negation. Let p be a proposition; its negation, denoted by ¬p, is the proposition that is true when p is false and false when p is true. We can summarize the definition in the following truth table

p | ¬p
1 | 0
0 | 1

which reports the truth values of p and ¬p. For instance, if p is "in the year 1965 it rained in Milan", then ¬p is "in the year 1965 it did not rain in Milan".

(ii) Conjunction. Let p and q be two propositions; the conjunction of p and q, denoted by p ∧ q, is the proposition that is true when p and q are both true and is false when at

least one of the two is false. The truth table is:

p | q | p ∧ q
1 | 1 | 1
1 | 0 | 0
0 | 1 | 0
0 | 0 | 0

For instance, if p is "in the year 1965 it rained in Milan" and q is "in the year 1965 the temperature went below zero in Milan", then p ∧ q is "in the year 1965 it rained in Milan and the temperature went below zero".

(iii) Disjunction. Let p and q be two propositions; the disjunction of p and q, denoted by p ∨ q, is the proposition that is true when at least one between p and q is true and is false when both of them are false.¹ The truth table is:

p | q | p ∨ q
1 | 1 | 1
1 | 0 | 1
0 | 1 | 1
0 | 0 | 0

For instance, with the previous examples of p and q, p ∨ q is "in the year 1965 it rained in Milan or the temperature went below zero".

(iv) Conditional. Let p and q be two propositions; the conditional, denoted by p ⟹ q, is the proposition with truth table:

p | q | p ⟹ q
1 | 1 | 1
1 | 0 | 0
0 | 1 | 1
0 | 0 | 1 (C.1)

The conditional is therefore true if, when p is true, q is also true, or if p is false (in which case the truth value of q is irrelevant). The proposition p is called the antecedent and q the consequent. For instance, suppose the antecedent p is "I go on vacation" and the consequent q is "I go to the sea"; the conditional p ⟹ q is "If I go on vacation, then I go to the sea".

(v) Biconditional. Let p and q be two propositions; the biconditional, denoted by p ⟺ q, is the proposition (p ⟹ q) ∧ (q ⟹ p) that involves the implication p ⟹ q and

¹ Like the union symbol ∪, the symbol ∨ recalls the Latin "vel", an inclusive "or", as opposed to the exclusive "aut".

its converse q ⟹ p, with truth table:

p | q | p ⟹ q | q ⟹ p | p ⟺ q
1 | 1 | 1 | 1 | 1
1 | 0 | 0 | 1 | 0
0 | 1 | 1 | 0 | 0
0 | 0 | 1 | 1 | 1

The biconditional is, therefore, true when p and q are both true or both false. With the last example of p and q, the biconditional p ⟺ q is "I go on vacation if and only if I go to the sea".

These five logical operations allow us to build new propositions from old ones. Starting from the three propositions p, q and r, through negation, disjunction and the conditional we can build, for example, the proposition

¬((p ∨ ¬q) ⟹ r)

Its truth table is:

p | q | r | ¬q | p ∨ ¬q | (p ∨ ¬q) ⟹ r | ¬((p ∨ ¬q) ⟹ r)
1 | 1 | 1 | 0 | 1 | 1 | 0
0 | 1 | 1 | 0 | 0 | 1 | 0
1 | 0 | 1 | 1 | 1 | 1 | 0
0 | 0 | 1 | 1 | 1 | 1 | 0
1 | 1 | 0 | 0 | 1 | 0 | 1
0 | 1 | 0 | 0 | 0 | 1 | 0
1 | 0 | 0 | 1 | 1 | 0 | 1
0 | 0 | 0 | 1 | 1 | 0 | 1
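Truth tables of this kind are mechanical enough to be generated by a short program, as in this sketch (ours; the row order differs from the table above):

```python
from itertools import product

impl = lambda p, q: (not p) or q   # the conditional p => q

for p, q, r in product((True, False), repeat=3):
    value = not impl(p or (not q), r)        # the proposition built above
    print(int(p), int(q), int(r), int(value))
```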

O.R. The true-false dichotomy originates in the Eleatic school, which based its dialectics upon it (Section 1.8). Apparently, it first appears as "[a thing] is or it is not" in the poem of Parmenides (trans. Raven). A serious challenge to the universal validity of the true-false dichotomy has been posed by some paradoxes, old and new. We already encountered the set-theoretic paradox of Russell (Section 1.1.4). A simpler, much older, paradox is that of the liar: consider the self-referential proposition "this proposition is false". Is it true or false? Maybe it is both.² Be that as it may, for all relevant matters (in mathematics, let alone in the empirical sciences) the dichotomy can be safely assumed.

C.3 Logical equivalence

Two classes of propositions are central: contradictions and tautologies. A proposition is called a contradiction if it is always false, while it is called a tautology if it is always true. Obviously, contradictions and tautologies have truth tables with, respectively, only values 0 and only values 1. For this reason, we write p ≡ 0 if p is a contradiction and p ≡ 1 if p is a tautology. In other words, the symbol 0 denotes a generic contradiction and the symbol 1 a generic tautology.

² A proposition such that both it and its negation are true has been called a dialetheia (see G. Priest, "In contradiction", Oxford University Press, 2006).

Two propositions p and q are (logically) equivalent, written p ≡ q, when they have the same truth values, i.e., they are always both true or both false. In other words, two propositions p and q are equivalent when the biconditional p ⟺ q is a tautology, i.e., it is always true. The relation ≡ is called logical equivalence.

The following properties are evident:

(i) p ∧ p ≡ p and p ∨ p ≡ p (idempotence);

(ii) ¬(¬p) ≡ p (double negation);

(iii) p ∧ q ≡ q ∧ p and p ∨ q ≡ q ∨ p (commutativity);

(iv) (p ∧ q) ∧ r ≡ p ∧ (q ∧ r) and (p ∨ q) ∨ r ≡ p ∨ (q ∨ r) (associativity).

Moreover, we have:

(v) p ∧ ¬p ≡ 0 (law of non-contradiction);

(vi) p ∨ ¬p ≡ 1 (law of excluded middle).

In words, the proposition p ∧ ¬p is a contradiction: a proposition and its negation cannot both be true. In contrast, the proposition p ∨ ¬p is a tautology: a proposition is either true or false, tertium non datur. Indeed:

p | ¬p | p ∧ ¬p | p ∨ ¬p
1 | 0 | 0 | 1
0 | 1 | 0 | 1

If p is the proposition "all ravens are black", the contradiction p ∧ ¬p is "all ravens are both black and non-black" and the tautology p ∨ ¬p is "all ravens are either black or non-black".

De Morgan's laws are:

¬(p ∧ q) ≡ ¬p ∨ ¬q and ¬(p ∨ q) ≡ ¬p ∧ ¬q

They can be proved through truth tables; we limit ourselves to the first law:

p | q | p ∧ q | ¬(p ∧ q) | ¬p | ¬q | ¬p ∨ ¬q
1 | 1 | 1 | 0 | 0 | 0 | 0
1 | 0 | 0 | 1 | 0 | 1 | 1
0 | 1 | 0 | 1 | 1 | 0 | 1
0 | 0 | 0 | 1 | 1 | 1 | 1

The table shows that the truth values of ¬(p ∧ q) and of ¬p ∨ ¬q are identical, as desired. Note an interesting duality: the laws of non-contradiction and of the excluded middle can be derived from one another via De Morgan's laws.
be derived one from the other via de Morgan’s laws.

It is easy to see that p ⟹ q is equivalent to ¬q ⟹ ¬p, that is,

(p ⟹ q) ≡ (¬q ⟹ ¬p) (C.2)

Indeed:

p | q | p ⟹ q | ¬p | ¬q | ¬q ⟹ ¬p
1 | 1 | 1 | 0 | 0 | 1
1 | 0 | 0 | 0 | 1 | 0
0 | 1 | 1 | 1 | 0 | 1
0 | 0 | 1 | 1 | 1 | 1

The proposition ¬q ⟹ ¬p is called the contrapositive of p ⟹ q. Each conditional is, therefore, equivalent to its contrapositive.

Finally, another remarkable equivalence for the conditional is

¬(p ⟹ q) ≡ (p ∧ ¬q) (C.3)

That is, the negation of a conditional p ⟹ q is equivalent to the conjunction of p and the negation of q. Indeed:

p | q | p ⟹ q | ¬(p ⟹ q) | p ∧ ¬q
1 | 1 | 1 | 0 | 0
1 | 0 | 0 | 1 | 1
0 | 1 | 1 | 0 | 0
0 | 0 | 1 | 0 | 0
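Equivalences such as (C.2) and (C.3) can be verified exhaustively over all truth assignments, as in this sketch (ours):

```python
from itertools import product

impl = lambda p, q: (not p) or q

print(all(impl(p, q) == impl(not q, not p)          # (C.2): contrapositive
          and (not impl(p, q)) == (p and not q)     # (C.3): negated conditional
          for p, q in product((True, False), repeat=2)))   # True
```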

N.B. Given two equivalent propositions, one of them is a tautology if and only if the other one is too. O

C.4 Deduction

An equivalence is a biconditional which is a tautology, i.e., which is always true. In a similar way, we call implication a conditional which is a tautology, that is, (p ⟹ q) ≡ 1. In this case, if p is true then q is also true.³ We say that q is a logical consequence of p, written p ⊨ q.

The antecedent p is now called the hypothesis and the consequent q the thesis. Naturally, we have p ≡ q when we have simultaneously p ⊨ q and q ⊨ p.

In our naive setup, a theorem is a proposition of the form p ⊨ q, that is, an implication. The proof is an argument showing that the conditional p ⟹ q is, actually, an implication.⁴ In order to do this, it is necessary to prove that, if the hypothesis p is true, then the thesis q is also true. Usually one chooses among the following three different types of proof:

³ When p is false the conditional is automatically true, as the truth table (C.1) shows.

⁴ In these introductory notes we remain vague about what a logical "argument" is, leaving a more detailed analysis to more advanced courses. We expect, however, that readers can (intuitively) recognize, and elaborate, such arguments.

(a) direct proof: p ⊨ q, i.e., to prove directly that, if p is true, so is q;

(b) proof by contraposition: ¬q ⊨ ¬p, i.e., to prove that the contrapositive ¬q ⟹ ¬p is a tautology (i.e., that if q is false, so is p);

(c) proof by contradiction (reductio ad absurdum): p ∧ ¬q ⊨ r ∧ ¬r, i.e., to prove that the conditional p ∧ ¬q ⟹ r ∧ ¬r is a tautology (i.e., that, if p is true and q is false, we reach a contradiction r ∧ ¬r).

The validity of (b) follows from the equivalence (C.2). The proof by contraposition is, basically, an upside-down direct proof (momentarily, Theorem 1199 will be proved by contraposition). Let us then focus on the two main types of proofs, direct and by contradiction.

N.B. (i) When both p ⊨ q and q ⊨ p hold, the theorem takes the form of an equivalence p ≡ q. The implications p ⊨ q and q ⊨ p are independent and each of them requires its own proof (this is why in the book we studied separately the "if" and the "only if"). (ii) When, as is often the case, the hypothesis is the conjunction of several propositions, we write p₁ ∧ ⋯ ∧ pₙ ⊨ q. If Γ = {p₁, ..., pₙ}, we say that q is a logical consequence of Γ, written Γ ⊨ q. O

C.4.1 Direct

Sometimes p ⊨ q can be proved with a straight argument.

Theorem 1193 If n is odd, then n² is odd.

Proof Since n is odd, there is a natural number k such that n = 2k + 1. Then, n² = (2k + 1)² = 2(2k² + 2k) + 1, so n² is odd.

Direct proofs are, however, often articulated in several steps, in a divide et impera spirit. The next lemma is key.

Lemma 1194 ⊨ is transitive.

Proof Assume p ⊨ r and r ⊨ q. We have to show that p ⟹ q is a tautology, that is, that if p is true, then q is true. Assume that p is true. Then, r is true because p ⊨ r. In turn, this implies that q is true because r ⊨ q.

By iterating transitivity, we then get the following deduction scheme: p ⊨ q if

p ⊨ r₁
r₁ ⊨ r₂
⋮
rₙ ⊨ q (C.4)

The n auxiliary propositions rᵢ break up the direct argument into n steps, thus forming a chain of reasoning. We can write the scheme horizontally as:

p ⊨ r₁ ⊨ r₂ ⊨ ⋯ ⊨ rₙ ⊨ q

Example 1195 (i) Assume that p is "n² + 1 is odd" and q is "n is even". To prove p ⊨ q, let us consider the auxiliary proposition r: "n² is even". The implication p ⊨ r is obvious, while the implication r ⊨ q will be proved momentarily (Theorem 1198). Jointly, these two implications provide a direct proof p ⊨ r ⊨ q of p ⊨ q, that is, of the proposition "if n² + 1 is odd, then n is even". (ii) Assume that p is "the scalar function f is differentiable" and q is "the scalar function f is integrable". To prove p ⊨ q it is natural to consider the auxiliary proposition r: "the scalar function f is continuous". The implications p ⊨ r and r ⊨ q are basic calculus results that, jointly, provide a direct proof p ⊨ r ⊨ q of p ⊨ q, that is, of the proposition "if the scalar function f is differentiable, then it is integrable". N

When p ≡ p₁ ∨ ⋯ ∨ pₙ, we have the (easily checked) equivalence

((p₁ ∨ ⋯ ∨ pₙ) ⟹ q) ≡ ((p₁ ⟹ q) ∧ ⋯ ∧ (pₙ ⟹ q))

Consequently, to establish pᵢ ⊨ q for each i = 1, ..., n amounts to establishing p ⊨ q. This is the so-called proof by cases, where each pᵢ ⊨ q is a case. Needless to say, the proof of each case may require its own deduction scheme (C.4).

Theorem 1196 If n is any natural number, then n² + n is even.

Proof Assume that p is "n is any natural number", p₁ is "n is an odd number", p₂ is "n is an even number", and q is "n² + n is even". Since p ≡ p₁ ∨ p₂, we prove the two cases p₁ ⊨ q and p₂ ⊨ q.

Case 1: p₁ ⊨ q. We have n = 2k + 1 for some natural number k, so n² + n = (2k + 1)² + 2k + 1 = 2(2k² + 3k + 1), which is even.

Case 2: p₂ ⊨ q. We have n = 2k for some natural number k, so n² + n = (2k)² + 2k = 2(2k² + k), which is even.

C.4.2 Reductio ad absurdum

To understand the rationale of the proof by contradiction, note that the truth table:

p | q | p ∧ ¬q | r ∧ ¬r | p ⟹ q | p ∧ ¬q ⟹ r ∧ ¬r
1 | 1 | 0 | 0 | 1 | 1
1 | 0 | 1 | 0 | 0 | 0
0 | 1 | 0 | 0 | 1 | 1
0 | 0 | 0 | 0 | 1 | 1

proves the logical equivalence

(p ⟹ q) ≡ (p ∧ ¬q ⟹ r ∧ ¬r) (C.5)

Hence, p ⟹ q is true if and only if p ∧ ¬q ⟹ r ∧ ¬r is true. Consequently, to establish p ∧ ¬q ⊨ r ∧ ¬r amounts to establishing p ⊨ q.

It does not matter which proposition r is because, in any case, r ∧ ¬r is a contradiction. More compactly, we can rewrite the previous equivalence as (p ⟹ q) ≡ (p ∧ ¬q ⟹ 0).

The proof by contradiction is the most intriguing (recall Section 1.8 on the birth of the deductive method). We illustrate it with one of the gems of Greek mathematics that we saw in the first chapter. For brevity, we do not repeat the proof of the first chapter and just present its logical analysis.

Theorem 1197 √2 ∉ ℚ.

Logical analysis In this, as in other theorems, it might seem that there is no hypothesis, but it is not so: the hypothesis is simply concealed. For example, here the concealed hypothesis is "the rules of elementary algebra apply". Let p be such a concealed hypothesis, let q be the thesis "√2 ∉ ℚ" and let r be the proposition "m/n is reduced to its lowest terms". The scheme of the proof is p ∧ ¬q ⊨ r ∧ ¬r, i.e., if the rules of elementary algebra apply, the negation of the thesis leads to a contradiction.

An important special case of the equivalence (C.5) is when the role of r is played by the hypothesis p itself. In this case, (C.5) becomes

(p ⟹ q) ≡ (p ∧ ¬q ⟹ p ∧ ¬p)

The following truth table

p | q | p ⟹ q | p ∧ ¬q | ¬p | p ∧ ¬q ⟹ ¬p | p ∧ ¬q ⟹ p ∧ ¬p
1 | 1 | 1 | 0 | 0 | 1 | 1
1 | 0 | 0 | 1 | 0 | 0 | 0
0 | 1 | 1 | 0 | 1 | 1 | 1
0 | 0 | 1 | 0 | 1 | 1 | 1

proves the equivalence (p ∧ ¬q ⟹ p ∧ ¬p) ≡ (p ∧ ¬q ⟹ ¬p). In the special case r = p the reductio ad absurdum is, therefore, based on the equivalence

(p ⟹ q) ≡ (p ∧ ¬q ⟹ ¬p)

In words, it is necessary to show that the hypothesis and the negation of the thesis jointly imply the negation of the hypothesis. Let us see an example.

Theorem 1198 If n2 is even, then n is even.

Proof Let us assume, by contradiction, that n is odd. In such a case n2 is odd, which
contradicts the hypothesis.

Logical analysis. Let p be the hypothesis “n2 is even” and q the thesis “n is even”. The
scheme of the proof is p ^ :q j= :p.
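The contrapositive step underlying the proof can again be checked on small instances (a sanity check, not a proof):

```python
# Check of the step used in Theorem 1198: for odd n, n^2 is odd,
# so an even square can only come from an even n.
for n in range(1, 200):
    if n % 2 == 1:
        assert (n * n) % 2 == 1
print("odd n gives odd n^2 for n = 1, ..., 199")
```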

C.4.3 Summing up
Proofs require, in general, some inspiration: there are no recipes or mechanical rules that can help us find, in a proof by contradiction, an auxiliary proposition r that produces the contradiction or, in a direct proof, the auxiliary propositions ri that articulate the chain of reasoning.
As to terminology, the implication p ⊨ q can be read in different, but equivalent, ways:
(i) p implies q;

(ii) if p, then q;

(iii) p only if q;

(iv) q if p;

(v) p is sufficient (condition) for q;

(vi) q is necessary (condition) for p.

The choice among these ways is a matter of expositional convenience. In a similar way, the equivalence p ⟺ q can be read as:

(i) p if and only if q;

(ii) p is necessary and sufficient (condition) for q.

For example, the next simple result shows that the implication “a > 1 ⊨ a² > 1” is true, i.e., that “a > 1 is a sufficient condition for a² > 1”, i.e., that “a² > 1 is a necessary condition for a > 1”.

Theorem 1199 If a > 1, then a² > 1.

Proof Let us proceed by contraposition. Let a² ≤ 1. We want to show that a ≤ 1. This follows by observing that a ≤ |a| = √(a²) ≤ 1.

C.5 The logic of scientific inquiries

Using the few notions of propositional logic that we learned, we can outline a description of a deductive scientific inquiry based on the approach developed in the 1930s by Alfred Tarski.⁵ Let A = {a, b, ...} be a primitive collection of propositions, often called atomic. As we remarked at the end of Section C.2, through a finite number of applications of the logical operations ∨, ∧, ¬, ⟹, and ⟺ we can form new propositions, for example ¬((a ∨ ¬b) ⟹ c). Denote by P the collection of all such propositions. By definition, P is closed with respect to the logical operations. We call L = (A, P, ∨, ∧, ¬, ⟹, ⟺) a propositional language.
A function v : A → {0, 1} assigns a truth value to each primitive proposition, and so to all propositions in P via the logical operations. Indeed, in what follows we directly assume that v is defined on the entire collection P. Each truth assignment v corresponds to a possible configuration of the empirical reality in which the propositions in P become meaningful and are either true or false.⁶ Each truth assignment is, thus, a possible interpretation of P.

⁵ See his “Introduction to logic and to the methodology of the deductive sciences”, Oxford University Press, 1994.
⁶ Of course, behind this sentence there are a number of highly non-trivial conceptual issues about meaning, truth, reality, and so on (an early classical analysis of these issues can be found in R. Carnap, “Testability and meaning”, Philosophy of Science, 3, 419–471, 1936).
Let V be the collection of all truth assignments. A proposition p ∈ P is a tautology if v(p) = 1 for all v ∈ V and is a contradiction if v(p) = 0 for all v ∈ V. In words, a tautology is a proposition that is true under all interpretations, while a contradiction is a proposition that is false under all of them. The truth value of tautologies and contradictions thus depends only on their form, regardless of any interpretation that they can take.⁷

Lemma 1200 p ⊨ q if and only if v(p) ≤ v(q) for all v ∈ V.

Proof Let p ⊨ q. If p is true, then q is also true (both values equal 1); if p is false (value 0), q can be true or false (value either 0 or 1). Thus, v(p) ≤ v(q) for all v ∈ V. The converse is easily checked.
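The lemma can be illustrated on a concrete entailment, say (p ∧ q) ⊨ p, by enumerating the finitely many truth assignments; a minimal Python sketch:

```python
from itertools import product

# Lemma 1200 on the entailment (p and q) |= p: with truth values in {0, 1},
# v(p and q) <= v(p) holds for every truth assignment v.
for vp, vq in product([0, 1], repeat=2):
    v_premise = vp * vq       # truth value of p and q under v
    v_conclusion = vp         # truth value of p under v
    assert v_premise <= v_conclusion
print("v(p and q) <= v(p) for all four assignments, so (p and q) |= p")
```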

Let Γ = {p1, ..., pn} be a (finite, for simplicity) collection of propositions in P. A proposition q ∈ P is a (logical) consequence of Γ if Γ ⊨ q. Logical consequences are established via deductive reasoning. Such reasoning might well be sequential, according for example to the deduction scheme (C.4).

If all propositions in Γ are true, so are their logical consequences. We say that Γ is:

(i) (logically) complete if, for all q ∈ P, either Γ ⊨ q or Γ ⊨ ¬q;

(ii) (logically) consistent if there is no q ∈ P such that both Γ ⊨ q and Γ ⊨ ¬q.

In words, Γ is complete if each proposition, or its negation, is a consequence of Γ; it is consistent if it does not have as a consequence both a proposition and its negation.
Let v ∈ V. The elements of Γ = {p1, ..., pn} are called axioms for v if v(pi) = 1 for each i = 1, ..., n. All these propositions are true in the configuration of the empirical reality that underlies v. If Γ is consistent, the axioms are consistent. If Γ is complete, from the axioms we can deduce whether each proposition in P is true or false (under v).

A scientific inquiry starts with a language L that describes the empirical phenomenon under investigation. Let v* be the true configuration of the phenomenon. A scientific theory is a consistent set Γ ⊆ P whose elements are assumed to be axioms, that is, to be true under the (unknown) true configuration v*. All logical consequences of Γ, established via theorems, are then true under this assumption. If the set of axioms is complete, the truth value of all propositions in P can be, in principle, decided. So, the function v* is identified.
To decide whether a scientific theory is true we have to check whether v*(pi) = 1 for each i = 1, ..., n.⁸ If n is large, this is operationally complicated (and infeasible if Γ is infinite). In contrast, to falsify the theory it is enough to exhibit, directly, a proposition of Γ that is false or, indirectly, a consequence of Γ that is false. This operational asymmetry between verification and falsification (emphasized by Karl Popper in the 1930s) is an important methodological aspect. Indirect falsification is, in general, the kind of falsification that one might hope for. It is the so-called testing of the implications of a scientific theory. In this indirect case, however, it is unclear which of the posited axioms actually fails: in fact, ¬(p1 ∧ ⋯ ∧ pn) ≡ ¬p1 ∨ ⋯ ∨ ¬pn. If not all the posited axioms have the same status, only some of them being “core” axioms (as opposed to auxiliary ones), it is then unclear how serious the falsification is. Indeed, falsification is often a chimera (especially in the social sciences), as even the highly stylized setup of this section should suggest.

⁷ The importance of propositions whose truth value is independent of any interpretation was pointed out by Ludwig Wittgenstein in his famous Tractatus (the term tautology is due to him).
⁸ For instance, special relativity is based on two axioms: p = “invariance of the laws of physics in all inertial frames of reference” and q = “the velocity of light in vacuum is the same in all inertial frames of reference”. If v* is the true physical configuration, the theory is true if v*(p) = v*(q) = 1.

C.6 Predicates and quantifiers


C.6.1 Generalities
The symbols ∀ and ∃ mean, respectively, “for every” and “there exists (at least one)”, and are called the universal quantifier and the existential quantifier. Their role in mathematics is fundamental. For example, the statement x² = 1 is, per se, meaningless. By completing it by writing

∀x ∈ R, x² = 1          (C.6)

we would make a big mistake; by writing, instead,

∃x ∈ R, x² = 1          (C.7)

we would assert a (simple) truth: there is some real number (there are actually two of them: x = ±1) whose square is 1.

To understand the role of quantifiers, we consider expressions – called (logical) predicates and denoted by p(x) – that contain an argument x that varies in a given set X, the domain. For example, the predicate p(x) can be “x² = 1” or “in the year x it rained in Milan”. Once a specific value x of the domain is considered, we have a proposition p(x) that may be either true or false. For instance, if X is the real line and x = 3, the proposition “x² = 1” is false; it becomes true if and only if x = ±1.
The propositions

∃x ∈ X, p(x)          (C.8)

and

∀x ∈ X, p(x)          (C.9)

mean that p(x) is true for at least some x in the domain and that p(x) is true for every such x, respectively. For example, when p(x) is “x² = 1”, propositions (C.8) and (C.9) reduce, respectively, to propositions (C.7) and (C.6), while for the weather predicate they become the propositions “there exists a year in which it rained in Milan” and “every year it rained in Milan”. Note that when the domain is finite, say X = {x1, ..., xn}, the propositions (C.8) and (C.9) can be written as p(x1) ∨ ⋯ ∨ p(xn) and p(x1) ∧ ⋯ ∧ p(xn), respectively.
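For a finite domain this reduction is operational: in Python, for instance, the built-ins any and all compute exactly these finite disjunctions and conjunctions (the domain and the predicate below are illustrative choices):

```python
# On a finite domain, (C.8) and (C.9) reduce to p(x1) or ... or p(xn)
# and p(x1) and ... and p(xn): precisely what any() and all() compute.
def p(x):
    return x * x == 1            # the predicate "x^2 = 1"

X = range(-3, 4)                 # a small finite domain

exists = any(p(x) for x in X)    # (C.8): p(x) is true for at least one x
forall = all(p(x) for x in X)    # (C.9): p(x) is true for every x
print(exists, forall)            # True False
```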
Quantifiers transform, therefore, predicates into propositions, that is, into statements that are either true or false. That said, if X is infinite, verifying that proposition (C.9) is true requires an infinite number of checks, namely whether p(x) is true for each x ∈ X. Operationally, this truth value cannot be determined. In contrast, to verify that (C.9) is false it is enough to exhibit one x ∈ X such that p(x) is false. There is, therefore, a clear asymmetry between the operational content of the two truth values of (C.9). A large X reinforces the asymmetry between verification and falsification that a large n already causes, as we remarked before (a proposition “∀x ∈ X, p1(x) ∧ ⋯ ∧ pn(x)” would combine, and so magnify, these two sources of asymmetry).
In contrast, the existential proposition (C.8) can be verified via an element x ∈ X such that p(x) is true. Of course, if X is large (let alone infinite), it may not be operationally obvious how to find such an element. Be that as it may, falsification is in much bigger trouble: to verify that proposition (C.8) is false we should check that, for all x ∈ X, the proposition p(x) is false. Operationally, existential propositions are typically not falsifiable.

N.B. (i) In the book we will often write “p(x) for every x ∈ X” in the form

p(x)     ∀x ∈ X

instead of ∀x ∈ X, p(x). It is a common way to handle universal quantifiers. (ii) If X = X1 × ⋯ × Xn is a Cartesian product, the predicate takes the form p(x1, ..., xn) because x = (x1, ..., xn). O

C.6.2 Algebra
In a sense, ∀ and ∃ are each the negation of the other. So⁹

¬(∃x, p(x)) ≡ ∀x, ¬p(x)

and, symmetrically,

¬(∀x, p(x)) ≡ ∃x, ¬p(x)

In the example where p(x) is “x² = 1”, we can indifferently write

¬(∀x, x² = 1)   or   ∃x, x² ≠ 1

(respectively: it is not true that x² = 1 for every x, and it is true that for some x we have x² ≠ 1).
More generally,

¬(∀x, ∃y, p(x, y)) ≡ ∃x, ∀y, ¬p(x, y)

For example, let p(x, y) be the predicate “x + y² = 0”. We can equally assert that

¬(∀x, ∃y, x + y² = 0)

(it is not true that, for every x ∈ R, we can find a value of y ∈ R such that the sum x + y² is zero: it is sufficient to take x = 5) or

∃x, ∀y, x + y² ≠ 0

(it is true that there exists some value of x ∈ R such that x + y² ≠ 0 for every choice of y ∈ R: again, it is sufficient to take x = 5).

⁹ To ease notation, in the quantifiers we omit the clause “∈ X”.
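On an infinite domain such as R no exhaustive check of these equivalences is possible, but they can be illustrated on a finite grid; a Python sketch for the predicate of the example:

```python
# Finite-grid illustration of the exchange rule
#   not(forall x, exists y, p(x, y))  <=>  exists x, forall y, not p(x, y)
# for the predicate p(x, y): "x + y^2 = 0".
X = Y = range(-5, 6)

def p(x, y):
    return x + y * y == 0

lhs = not all(any(p(x, y) for y in Y) for x in X)
rhs = any(all(not p(x, y) for y in Y) for x in X)
assert lhs == rhs
print(lhs, rhs)    # True True: x = 5 falsifies the universal proposition
```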
C.6.3 Example: linear dependence and independence

In Chapter 3 a finite set of vectors {x^1, ..., x^m} of R^n has been called linearly independent if, for every set {α1, ..., αm} of real numbers,

α1 x^1 + α2 x^2 + ⋯ + αm x^m = 0 ⟹ α1 = α2 = ⋯ = αm = 0

The set {x^1, ..., x^m} has been, instead, called linearly dependent if it is not linearly independent, i.e., if there exists a set {α1, ..., αm} of real numbers, not all equal to zero, such that α1 x^1 + α2 x^2 + ⋯ + αm x^m = 0.
We can write these notions by making explicit the role of predicates. Let p(α1, ..., αm) and q(α1, ..., αm) be the predicates “α1 x^1 + α2 x^2 + ⋯ + αm x^m = 0” and “α1 = α2 = ⋯ = αm = 0”, respectively. The set {x^1, ..., x^m} is linearly independent when

∀(α1, ..., αm), p(α1, ..., αm) ⟹ q(α1, ..., αm)

In words: for every set {α1, ..., αm} of real numbers, if α1 x^1 + α2 x^2 + ⋯ + αm x^m = 0, then α1 = α2 = ⋯ = αm = 0.
The negation is

∃(α1, ..., αm), ¬(p(α1, ..., αm) ⟹ q(α1, ..., αm))

that is, thanks to equivalence (C.3),

∃(α1, ..., αm), p(α1, ..., αm) ∧ ¬q(α1, ..., αm)

In words: there exists a set {α1, ..., αm} of real numbers which are not all equal to zero and such that α1 x^1 + α2 x^2 + ⋯ + αm x^m = 0.
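Computationally, linear independence is usually tested through the rank of the matrix whose columns are the given vectors: the m vectors are independent exactly when this rank equals m. A sketch with NumPy (the vectors are illustrative, and numerical rank computations are subject to rounding):

```python
import numpy as np

# Rank test for linear independence: {x^1, ..., x^m} is independent
# if and only if the matrix with columns x^1, ..., x^m has rank m.
x1 = np.array([1.0, 0.0, 0.0])
x2 = np.array([0.0, 1.0, 0.0])
x3 = np.array([1.0, 1.0, 0.0])   # x3 = x1 + x2, so {x1, x2, x3} is dependent

A = np.column_stack([x1, x2])
print(np.linalg.matrix_rank(A) == 2)   # True: {x1, x2} independent

B = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(B) == 3)   # False: rank is 2, not 3
```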
Appendix D

Mathematical induction

D.1 Generalities
Suppose that we want to prove that a proposition p(n), formulated for every natural number n, is true for every such number. Intuitively, it is sufficient to show that the “initial” proposition p(1) is true and that the truth of each proposition p(n) implies that of the “subsequent” one p(n + 1). Next we formalize this domino argument:¹

Theorem 1201 (Induction principle) Let p(n) be a proposition stated in terms of each natural number n. Suppose that:

(i) p(1) is true;

(ii) for each n, if p(n) is true, then p(n + 1) is true.

Then, proposition p(n) is true for each n.

Proof Suppose, by contradiction, that proposition p(n) is false for some n. Denote by n0 the smallest such n, which exists since every non-empty collection of natural numbers has a smallest element.² By (i), n0 > 1. Moreover, by the definition of n0, the proposition p(n0 − 1) is true. By (ii), p(n0) is then true, a contradiction.

A proof by induction thus consists of two steps:

(i) Initial step: prove that the proposition p(1) is true.

(ii) Induction step: prove that, for each n, if p(n) is true then p(n + 1) is true.

We illustrate this important type of proof by determining some important finite sums.

¹ There are many soldiers standing one next to the other. The first one has the “right scarlet fever”, a rare form of scarlet fever that instantaneously infects whoever stands to the right of the sick person. All the soldiers catch it because the first one infects the second one, the second one infects the third one, and so on.
² In set-theoretic jargon, we say that N is a well-ordered set.


(i) Let

1 + 2 + ⋯ + n = Σ_{s=1}^{n} s = n(n + 1)/2

Initial step. For n = 1 the property is trivially true:

1 = 1(1 + 1)/2

Induction step. Assume it is true for n = k, that is,

Σ_{s=1}^{k} s = k(k + 1)/2

We must prove that it is true also for n = k + 1, i.e., that

Σ_{s=1}^{k+1} s = (k + 1)(k + 2)/2

Indeed,³

Σ_{s=1}^{k+1} s = Σ_{s=1}^{k} s + (k + 1) = k(k + 1)/2 + k + 1 = (k + 1)(k + 2)/2

In particular, the sum of the first n odd numbers is n²:

Σ_{s=1}^{n} (2s − 1) = 2 Σ_{s=1}^{n} s − Σ_{s=1}^{n} 1 = 2 · n(n + 1)/2 − n = n²

(ii) Let

1² + 2² + ⋯ + n² = Σ_{s=1}^{n} s² = n(n + 1)(2n + 1)/6

Initial step. For n = 1 the property is trivially true:

1² = 1(1 + 1)(2 + 1)/6

Induction step. By proceeding as above we have:

Σ_{s=1}^{k+1} s² = Σ_{s=1}^{k} s² + (k + 1)² = k(k + 1)(2k + 1)/6 + (k + 1)²
= (k + 1)[k(2k + 1) + 6(k + 1)]/6 = (k + 1)(2k² + 7k + 6)/6
= (k + 1)(k + 2)(2k + 3)/6

as desired.

(iii) Let

1³ + 2³ + ⋯ + n³ = Σ_{s=1}^{n} s³ = (Σ_{s=1}^{n} s)² = n²(n + 1)²/4

Initial step. For n = 1 the property is trivially true:

1³ = 1²(1 + 1)²/4

Induction step. By proceeding as above we have:

Σ_{s=1}^{k+1} s³ = Σ_{s=1}^{k} s³ + (k + 1)³ = k²(k + 1)²/4 + (k + 1)³
= (k + 1)²[k² + 4(k + 1)]/4 = (k + 1)²(k + 2)²/4

(iv) Consider the sum

a + aq + aq² + ⋯ + aq^{n−1} = Σ_{s=1}^{n} aq^{s−1} = a (1 − q^n)/(1 − q)

of n addends in geometric progression with first term a and common ratio q ≠ 1.

Initial step. For n = 1 the property is trivially true:

a = a (1 − q)/(1 − q)

Induction step. By proceeding as above we have:

Σ_{s=1}^{k+1} aq^{s−1} = Σ_{s=1}^{k} aq^{s−1} + aq^k = a (1 − q^k)/(1 − q) + aq^k
= a [1 − q^k + (1 − q)q^k]/(1 − q) = a (1 − q^{k+1})/(1 − q)

as desired.

³ Alternatively, this sum can be derived by observing that the sum of the first and the last addend is n + 1, the sum of the second and the second-to-last is again n + 1, etc. There are n/2 such pairs and therefore the sum is (n + 1)n/2.
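The four closed forms can be checked numerically on small cases; the following Python sketch is a mere sanity check and not a substitute for the induction arguments:

```python
# Direct summation against the four closed forms proved above.
for n in range(1, 50):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
    assert sum(s ** 2 for s in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
    assert sum(s ** 3 for s in range(1, n + 1)) == (n * (n + 1) // 2) ** 2

a, q = 3.0, 0.5    # illustrative first term and common ratio
for n in range(1, 20):
    direct = sum(a * q ** (s - 1) for s in range(1, n + 1))
    assert abs(direct - a * (1 - q ** n) / (1 - q)) < 1e-12
print("all four formulas agree with direct summation")
```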

D.2 The harmonic Mengoli


As a last illustration of the induction principle, we report a modern version of the classical proof by Pietro Mengoli of the divergence of the harmonic series (presented in his 1650 essay Novae quadraturae arithmeticae seu de additione fractionum).

Theorem 1202 The harmonic series is divergent.

The proof is based on a couple of lemmas, the second of which is proved by induction.

Lemma 1203 We have, for every k ≥ 2,

1/(k − 1) + 1/k + 1/(k + 1) ≥ 3/k

Proof Consider the convex function f : (0, ∞) → (0, ∞) defined by f(x) = 1/x. Because

k = (1/3)(k − 1) + (1/3)k + (1/3)(k + 1)

Jensen's inequality implies

1/k = f(k) = f((1/3)(k − 1) + (1/3)k + (1/3)(k + 1)) ≤ (1/3)[f(k − 1) + f(k) + f(k + 1)]
= (1/3)[1/(k − 1) + 1/k + 1/(k + 1)]

as desired.
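The inequality of Lemma 1203 can also be checked with exact rational arithmetic; a quick Python sketch:

```python
from fractions import Fraction

# Check of Lemma 1203 on the first few k:
# 1/(k-1) + 1/k + 1/(k+1) >= 3/k, with exact rationals.
for k in range(2, 1000):
    lhs = Fraction(1, k - 1) + Fraction(1, k) + Fraction(1, k + 1)
    assert lhs >= Fraction(3, k)
print("the inequality holds for k = 2, ..., 999")
```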
Let s_n = Σ_{k=1}^{n} x_k be the partial sum of the harmonic series, with x_k = 1/k.

Lemma 1204 s_{3n+1} ≥ s_n + 1 for every n ≥ 1.

Proof Let us proceed by induction. Initial step: n = 1. We apply the previous lemma with k = 3:

s_{3·1+1} = s_4 = 1 + 1/2 + 1/3 + 1/4 ≥ 1 + 3/3 = 1 + s_1

Induction step: let us assume that the statement holds for n ≥ 1. We prove that it holds for n + 1. We apply the previous lemma with k = 3n + 3:

s_{3(n+1)+1} = s_{3n+4} = s_{3n+1} + 1/(3n + 2) + 1/(3n + 3) + 1/(3n + 4)
≥ s_n + 1 + 1/(3n + 2) + 1/(3n + 3) + 1/(3n + 4)
≥ s_n + 1 + 3/(3n + 3) = s_n + 1 + 1/(n + 1) = s_{n+1} + 1

which proves the induction step. In conclusion, the result holds thanks to the induction principle.

Proof of the theorem Since the harmonic series has positive terms, the sequence {s_n} of its partial sums is monotonically increasing. Therefore, it either converges or diverges. By contradiction, let us assume that it converges, i.e., s_n ↑ L < ∞. From the last lemma it follows that

L = lim_n s_{3n+1} ≥ lim_n (1 + s_n) = 1 + lim_n s_n = 1 + L

which is a contradiction.
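Lemma 1204, the engine of the divergence, can likewise be illustrated numerically with exact rationals:

```python
from fractions import Fraction

# Illustration of Lemma 1204: the partial sums of the harmonic series
# satisfy s_{3n+1} >= s_n + 1, the bound that forces divergence.
def s(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))

for n in range(1, 30):
    assert s(3 * n + 1) >= s(n) + 1
print("s_{3n+1} >= s_n + 1 for n = 1, ..., 29")
```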
Appendix E

Cast of characters

Archimedes (Syracuse 287 BC ca. –212 BC), mathematician.


Aristotle (Stagira 384 BC –Euboea 322 BC), philosopher and physicist.
Stefan Banach (Kraków 1892 –Lviv 1945), mathematician.
Jeremy Bentham (London 1748 –1832), philosopher.
Daniel Bernoulli (Groningen 1700 –Basel 1782), mathematician.
Jakob Bernoulli (Basel 1654 –1705), mathematician.
Bernhard Bolzano (Prague 1781 –1848), mathematician and philosopher.
Cesare Burali-Forti (Arezzo 1861 –Turin 1931), mathematician.
Georg Cantor (Saint Petersburg 1845 –Halle 1918), mathematician.
Gerolamo Cardano (Pavia 1501 –Rome 1576), mathematician.
Augustin-Louis Cauchy (Paris 1789 –Sceaux 1857), mathematician.
Ernesto Cesàro (Naples 1859 –Torre Annunziata 1906), mathematician.
Jean Darboux (Nîmes 1842 – Paris 1917), mathematician.
Richard Dedekind (Braunschweig 1831 –1916), mathematician.
Democritus (Abdera 460 BC ca. –370 BC ca.), philosopher.
René Descartes (Cartesius) (La Haye 1596 – Stockholm 1650), mathematician and philosopher.
Diophantus (Alexandria, III century AD), mathematician.
Ulisse Dini (Pisa 1845 –1918), mathematician.
Peter Lejeune Dirichlet (Düren 1805 –Göttingen 1859), mathematician.
Francis Edgeworth (Edgeworthstown 1845 –Oxford 1926), economist.
Epicurus (Samos 341 BC –Athens 270 BC), philosopher.
Euclid (Alexandria, IV - III century BC), mathematician.
Eudoxus (Cnidus, IV century BC), mathematician.
Leonhard Euler (Basel 1707 –Saint Petersburg 1783), mathematician.
Leonardo da Pisa (Fibonacci) (Pisa ca. 1170 - ca. 1240), mathematician.
Pierre de Fermat (Beaumont-de-Lomagne 1601 – Castres 1665), lawyer and mathematician.


Bruno de Finetti (Innsbruck 1906 –Rome 1985), mathematician.


Nicolò Fontana (Tartaglia) (Brescia 1499 –Venice 1557), mathematician.
Ferdinand Frobenius (Charlottenburg 1849 –Berlin 1917), mathematician.
Galileo Galilei (Pisa 1564 –Arcetri 1642), astronomer and physicist.
Carl Gauss (Brunswick 1777 – Göttingen 1855), mathematician.
Guido Grandi (Cremona 1671 –Pisa 1742), mathematician.
Jacques Hadamard (Versailles 1865 –Paris 1963), mathematician.
Felix Hausdor¤ (Breslau 1868 –Bonn 1942), mathematician.
Heinrich Heine (Berlin 1821 –Halle 1881), mathematician.
Hero (Alexandria I century AD), mathematician.
John Hicks (Warwick 1904 –Blockley 1989), economist.
David Hilbert (Königsberg 1862 – Göttingen 1943), mathematician.
Einar Hille (New York 1894 –La Jolla 1980), mathematician.
Guillaume de l’Hôpital (Paris 1661 –1704), mathematician.
Hippocrates (Chios, V century BC), mathematician.
Johan Jensen (Nakskov 1859 –Copenhagen 1925), mathematician.
William Jevons (Liverpool 1835 –Bexill 1882), economist and philosopher.
Harold Kuhn (Santa Monica 1925 - New York 2014), mathematician.
Muḥammad ibn Mūsā al-Khwārizmī (ca. 750 – Baghdad ca. 850), astronomer and mathematician.
Giuseppe Lagrange (Turin 1736 –Paris 1813), mathematician.
Gabriel Lamé (Tours 1795 –Paris 1870), mathematician.
Edmund Landau (Berlin 1877 –1938), mathematician.
Pierre-Simon de Laplace (Beaumont-en-Auge 1749 – Paris 1827), mathematician and
physicist.
Adrien-Marie Legendre (Paris 1752 –1833), mathematician.
Gottfried Leibniz (Leipzig 1646 – Hannover 1716), mathematician and philosopher.
Wassily Leontief (Saint Petersburg 1905 –New York 1999), economist.
Joseph Liouville (Saint-Omer 1809 –Paris 1882), mathematician.
John Littlewood (Rochester 1885 –Cambridge 1977), mathematician.
Colin MacLaurin (Kilmodan 1698 –Edinburgh 1746), mathematician.
Melissus (Samos V century BC), philosopher.
Carl Menger (Nowy Sącz 1840 – Vienna 1921), economist.
Pietro Mengoli (Bologna 1626 –1686), mathematician.
Marin Mersenne (Oizé 1588 –Paris 1648), mathematician and physicist.
Hermann Minkowski (Aleksotas 1864 – Göttingen 1909), mathematician.
Abraham de Moivre (Vitry-le-François 1667 –London 1754), mathematician.
John Napier (Merchiston 1550 – 1631), mathematician.

Isaac Newton (Woolsthorpe 1642 –London 1727), mathematician and physicist.


Vilfredo Pareto (Paris 1848 –Céligny 1923), economist and sociologist.
Parmenides (Elea VI century BC), philosopher.
Giuseppe Peano (Spinetta di Cuneo 1858 – Turin 1932), mathematician.
Plato (Athens 428 BC ca. – 348 BC ca.), philosopher.
Pythagoras (Samos 570 BC ca. – Metapontum 495 BC ca.), mathematician and philosopher.
Henri Poincaré (Nancy 1854 –Paris 1912), mathematician.
Hudalricus Regius (Ulrich Rieger) (XVI century), mathematician.
Bernhard Riemann (Breselenz 1826 –Selasca 1866), mathematician.
Michel Rolle (Ambert 1652 –Paris 1719), mathematician.
Bertrand Russell (Trellech 1872 – Penrhyndeudraeth 1970), mathematician and philosopher.
Eugen Slutsky (Yaroslavl 1880 – Moscow 1948), economist and mathematician.
Guido Stampacchia (Naples, 1922 –Paris, 1978), mathematician.
James Stirling (Garden 1692 –Edinburgh 1770), mathematician.
Thomas Stieltjes (Zwolle 1856 –Toulouse 1894), mathematician.
Alfred Tarski (Warsaw 1902 –Berkeley 1983), mathematician.
Brook Taylor (Edmonton 1685 –London 1731), mathematician.
Leonida Tonelli (Gallipoli 1885 –Pisa 1946), mathematician.
Albert Tucker (Oshawa 1905 –Hightstown 1995), mathematician.
Charles-Jean de la Vallée Poussin (Leuven 1866 – 1962), mathematician.
Léon Walras (Évreux 1834 – Clarens-Montreux 1910), economist.
Karl Weierstrass (Ostenfelde 1815 –Berlin 1897), mathematician.
Zeno (Elea V century BC), philosopher.
Index

Absolute value, 75 Closure


Addition of set, 97
among matrices, 333 Codomain, 107
Algorithm Coe¢ cient
notion, 18 binomial, 869
of Euclid, 17 multinomial, 869
of Gauss, 354 Cofactor, 369
of Hero, 188 Combination
of Kronecker, 377 a¢ ne, 408
Approximation convex, 393, 396
linear, 523, 599 Complement
polynomial, 599 algebraic, 369
quadratic, 600 Completeness
Arbitrage, 494 of the order, 22
Archimedean property, 28 Components
Argmax, 441 of a matrix, 332
Asymptote, 685 of a vector, 43
horizontal, 281, 685 Compound factor, 422
oblique, 686 Condition
vertical, 281, 685 …rst-order, 573, 576
Axis second-order, 589
horizontal/abscissae, 42 Conditional, 880
vertical/ordinates, 42 Cone, 427
Constraint, 715
Basis, 68, 72
Constraints
orthonormal, 82
equality, 709
Biconditional, 880
inequality, 734
Bits, 34
Continuity, 303
Border, 91
uniform, 322
C(E), 305 Contrapositive, 883
C^1(E), 527, 555 Convergence
C^n(E), 528, 555 absolute (for series), 242
Cardinality, 160 in mean (Cesàro), 254
of the continuum, 163 of improper integrals, 828
Cauchy condition, 207 of sequences, 179, 187, 202
Change of variable of series, 230, 234, 239
Riemann, 819 Converse, 881
Stieltjes, 856 Convex hull, 423


Correspondence Diagonal
budget, 765 principal, 335
demand, 768 Di¤erence, 7
feasibility, 767 Di¤erence quotient, 499, 501
inverse, 765 Di¤erentiability with continuity, 527
of the envelope, 765 Di¤erential, 524
solution, 767 total, 547
Cosecant, 875 Di¤erentiation under the integral sign, 842
Cosine, 874 Direct sum, 485
Cost Discontinuity
marginal, 501 essential, 309
Cotangent, 875 jump, 309
Countable, 160 non-removable, 309
Cramer’s rule, 380 removable, 309
Criterion Distance (Euclidean), 87
comparison, 202 Divergence
general of convergence, 208 of improper integrals, 828
of comparison for series, 235 of sequences, 182
of Sylvester-Jacobi, 614 of series, 230
of the ratio for sequences, 203 Domain, 107
of the root for sequences, 205 natural, 151
of the root for series, 258 of derivability, 506, 539
ratio, 239, 257 Dual space, 329
Curve, 108
indi¤erence, 122, 156 Edgeworth box, see Pareto optimum
level, 118 Element
Cusp, 508 of a sequence, see Term of a sequence
of a vector, see Component of a vector
De Morgan’s laws, 10, 882 Equivalence, 882
Decay Expansion
exponential, 221 asymptotic, 626
Density, 29 polynomial, 599
Derivative, 501 polynomial of MacLaurin, 602
higher order, 527 polynomial of Taylor, 602
left, 507 Extended real line, 36, 183
of compounded function, 517
of the inverse function, 519 Factorial, 868
of the product, 514 Forms of indetermination, 37, 197
of the quotient, 515 Formula
of the sum, 514 binomial of Newton, 871
partial, 534, 537 multinomial, 871
right, 507 of Euler, 567
second, 527 of Hille, 629
third, 527 of Mac Laurin, 602
unilateral, 507 of Taylor, 602
Determinant, 360 Frontier, 91

Function, 105 integrand, 848


absolute value, 110 integrator, 848
a¢ ne, 402 intertemporal utility, 117
algebraic, 823 inverse, 128, 519
analytic, 628 invertible, 128
arccosin, 147 Lagrangian, 714
arcsin, 146 linear, 327
arctan, 147 locally decreasing, 582
asymptotic to another, 296 locally increasing, 582
bijective, 127 locally strictly decreasing, 582
bounded, 131 locally strictly increasing, 582
bounded from above, 131 logarithmic, 143
bounded from below, 131 mantissa, 148
Cobb-Douglas, 112 modulus, 110
coercive, 457 moment generating, 862
comparable with another, 296 monotonic, 133
composite, 517 n-times continuously di¤erentiable, 528,
composite (compoud), 125 555
concave, 138, 399 negligible with respect to another, 295
concave at a point, 683 objective, 440
constant, 133 of a single variable, 108
continuous , 305 of Dirichlet, 272
continuous at a point, 303 of Leontief, 136
continuously di¤erentiable, 527 of n variables, 111
convex, 139, 399 of several variables, 108
convex at a point, 683 of vector, see Function of n variables
cosine, 145 one-to-one, see Function injective
decreasing, 133, 135 periodic, 148
demand, 472 polynomial, 141
derivative, 506 positive homogeneous, 428
di¤erentiable, 501, 524, 525, 540 primitive, 808
discontinuous, 308 production, 116
elementary, 141, 823 quasi-a¢ ne, 412
exponential, 142 quasi-concave, 411
gamma, 625 quasi-convex, 411
Gaussian, 459, 691, 827 rational, 823
implicit, 653, 666 Riemann integrable, 784, 788
increasing, 133, 135 scalar, see Function of one variable
indicator, 794 separable, 139
in…mum of, 132 signum, 319, 810
in…nite, 301 sine, 144
in…nitesimal, 301 solution, 768
injective, 126 square root, 109
instantaneous utility, 117, 176 step, 794
integrable in an improper sense, 828 strictly concave, 401
integral, 813 strictly concave at a point, 683

strictly convex, 401 Inequality


strictly convex at a point, 683 Jensen, 409
strictly decreasing, 133 of Cauchy-Schwarz, 77
strictly increasing, 133, 135 triangle, 76, 78, 87
strongly increasing, 135 In…mum, 27, 90, 92
supremum of, 132 In…nite, 301
surjective, 126 actual, 159
tangent, 145 potential, 159, 233
trascendental, 823 In…nitesimal, 301
trigonometric, 823 Integrability, 784, 789
unbounded, 301 in …nite terms, 824
uniformly continuous, 321 of continuous functions, 798
utility, 115, 136, 155 of monotonic functions, 800
value, 768 of rational functions, 824
vector, 108 Integral
Functional de…nite, 810
linear, 327 Gauss’s, 827
Functional equation generalized, see Improper integral
Cauchy, 418 improper, 828, 829, 840
for the exponential, 420 inde…nite, 810
for the logarithm, 420 lower, 782
for the power, 421 of Stieltjes, 848
Fundamental limits, 196, 211 Riemann, 784
upper, 782
Goods
Integral sum
perfect complementary, 138
lower, 781, 789
perfect substitutes, 141
upper, 781, 789
Gradient, 537
Integration
Graph
by change of variable, 819
of a correspondence, 766
by parts (Riemann), 817
of a function, 113
by parts (Stieltjes), 856
Hypograph, 404 by trigonometric substitution, 826
Interior
Image, 107 of set, 91
of function, 107 Intersection, 5, 879
of operator, 346 Interval, 23
of sequence, 176 bounded, 23, 49
Implication, 883 closed, 23, 49
Important limits, 294 half-closed, 23, 49
Indeterminacies, 291 half-open, see Interval half-closed
Indi¤erence open, 23, 49
class, 154 unbounded, 24, 49
curve, 122, 156 Isocosts, 123
map, 154 Isoquants, 123
relation, 153
Induction, see Principle by induction Kernel, 345

L(R^n), 342 full rank, 351


L(R^n,R^m), 341 Gram, 352
Least Upper Bound Principle, 27 Hessian, 551, 616, 642
Limit, 272 identity, 333
from above, 181 inverse, 359
from below, 181 invertible, 359, 374
inferior, 246 Jacobian, 560, 726
left, 278 lower triangular, 335
of function, 267 maximum rank, 351
of operators, 320 non-singular, 374
of scalar function, 272, 273 null, 333
of sequence, 179 of algebraic complements, 369
one-sided, 277 rectangular, 332
right, 278 symmetric, 335
superior, 246 singular, 374
unilateral, 277 square, 332
vector function, 282 transpose, 335
Linear combination, 46, 64 upper triangular, 335
convex, 393 Maximal of a set, see Pareto optimum
Linear system Maximizer
determined, 382 global, 149, 440
homogeneous, 380 local, 465
rectangular, 382 strong global, 442
solvability, 385 strong local, 465
solvable, 382 Maximum of a function, 149
square, 378 global, 149, 440
undetermined, 383 global maximum value, 440
unsolvable, 382 local maximizer, 465
little-o of, 214, 296 local maximum value, 465
Lower bound, 24 maximizer, 149, 440
maximum value, 149
M(m,n), 333 strong global, 442
M(n), 360 strong maximizer, 442
Marginal rate Maximum of a set
of intertemporal substitution, 660 in R, 25
of substitution, 659 in R^n, 51
of transformation, 658 Mersenne, 174
Matrix Mesh of a subdivision, 785
adjoint, 369 Method
augmented, 383 elimination, 700
cofactor, see Matrix of algebraic compon- Gaussian elimination, 354
ents Lagrange’s, 715
complete, 383 least squares, 475
diagonal, 335 Minimal of a set, see Pareto optimum
echelon, 354 Minimizer
elementary, 355 global, 442

local, 465 global, 479


strong local, 465 Ordered pairs, 41
Minimum Orthogonal
local of a function, 465 subspace, 484
Minor vectors, 80
principal, 376
principal of NW, 376 Paradox
Moments, 860 of Burali Forti, 10
Multiplier of Russell, 10
marginal interpretation, 773 of the liar, 881
Multiplier of Lagrange, 714, 727, 740 Pareto optimum, 51
Part
Napier’s constant, 209 integer, 29
Negation, 879 negative, 786
Neighbourhood, 88 positive, 786
left, 90 Partial sums, 230
of in…nite, 183 Partition, 9
right, 90 Permutation
with hole, 279 simple, 868
Norm, 76 with repetitions, 869
Nullity, 346 Plurirectangle, 778
Number Point
e, 15, 209, 241 accumulation, 93
in…nite cardinal, 165 boundary, 91
of Nepero, 15, 241 corner, 508
Numbers critical, 576
algebraic, 209 cuspidal, 508
irrational, 15 exterior, 91
natural, 11 extremal, 443
prime, 19 interior, 90
rational, 12 isolated, 92
real, 14 limit, 247
relative integer, 11 of in‡ection, 684
transcendental, 209 of in‡ection with horizontal tangent, 685
numeraire, 474 of Kuhn-Tucker, 740
saddle, 575
Operations singular, 710
elementary (by row), 355 stationary, 576
Operator, 108, 112, 340 Polynomial, 141
continuous, 320 of MacLaurin, 601
derivative, 539 of Taylor, 601
identity, 340 Polytope, 396
invertible, 357 Positive orthant, 42
linear, 340 Postulate of continuity of the real line, 14
null, 340 Power
Optimizer of set, 160

set, 165 Pythagorean trigonometric identity, 875


Predicate, 889
Preference Quadratic form, 611
complete, 154 inde…nite, 614
de…nition, 115 negative de…nite, 614
lexicographic, 157 negative semi-de…nite, 614
monotonic, 155 positive de…nite, 614
re‡exive, 153 positive semi-de…nite, 614
strict, 153 Quanti…er
strictly monotonic, 155 existential, 889
strongly monotonic, 155 universal, 889
transitive, 153
Rank, 346, 348
Preimage, 117
full, 351
Primitive, 808
maximum, 351
Principle by induction, 893
Remainder
Problem
Lagrange’s, 606
constrained optimization, 443 Peano’s, 606
consumer, 448 Representation
maximum, 443 of linear function, 329
minimum, 443 of linear operator, 342
optimization, 443 Restriction, 151
parametric optimization, 768 Root
unconstrained di¤erential optimization, 699 algebraic, 30
unconstrained optimization, 443 arithmetical, 30, 76
vector maximum, 479 Rule
with equality constraints, 709 chain, 517, 548, 563
with inequality constraints, 734 of Cramer, 380
Procedure of de l’Hospital, 593
Gaussian elimination, 354 of Leibniz, 844
Product
Cartesian, 41, 43 Scalar, 36
inner, 46, 75 Scalar multiplication, 45, 334
of matrices, 336 Secant, 875
Projection, 485 Semicone, 433
Projections, 534 Separating element, 23
Proof Sequence, 171
by contradiction, 884 arithmetic, 173
by contraposition, 884 asymptotic to another, 213
direct, 884 bounded, 177
Property bounded from above, 177
archimedean, 28 bounded from below, 177
associative, 8, 45, 59, 124, 334 Cauchy, 207
commutative, 8, 45, 46, 59, 124, 334 comparable with another, 213
distributive, 9, 45, 46, 59, 334 constant, 177
satis…ed eventually, 177 convergent, 179
Proposition, 879 decreasing, 177

divergent, 182 countable, 160


Fibonacci, 172 derived, 93
geometric, 173 empty, 5
harmonic, 173 …nite, 160
increasing, 177 image, 107
in…nitesimal, 180 linearly dependent, 62
irregular, 179 linearly independent, 62
monotonic, 177 maximum, 25, 51
negligible with respect to another, 213 minimum, 25, 51
null, see In…nitesimal sequence open, 95
of di¤erences, 248 orthogonal, 80
of second di¤erences, 249 orthonormal, 81
of the partial sums of a series, 230 power of, 160
of the same order of another, 213 unbounded, 25
oscillating, see Irregular sequence universal, 8
regular, 179 Sets
totally monotone, 862 disjoint, 5
unbounded, 177 lower contour, 406
Series, 230 upper contour, 405
absolutely convergent, 242 Sine, 874
alternating harmonic series, 244 Singleton, 4
convergent, 230 Solution
generalized harmonic, 236 of an optimization problem, 443
geometric, 232 Space, 8
harmonic, 231, 895 column, 350
irregular, 230 complete, 209
MacLaurin, 628 dual, 329
Mengoli, 231 Euclidean, 44
negatively divergent, 230 incomplete, 209
of Grandi, 255 R^n, 44
oscillating, see Irregular series row, 350
positively divergent, 230 vector, 59
Taylor, 628 Span of a set, 66
with positive terms, 234 Subdivision, 780
with terms of any sign, 242 Submatrix, 360
Set, 3 Subsequence, 190
bounded, 25, 102 Subset, 3
bounded from above, 25 proper, 4
bounded from below, 25 Superdi¤erential, 644
budget, 448 Supremum, 27, 90, 92
choice, 153, 440
closed, 97 Tangent
compact, 102 (trigonometric), 875
complement, 8 Tangent line, 503
consumption, 448 Tangent plane, 543
convex, 393 Term of a sequence, 171

Terminal value, 421 of Stampacchia, 757


Test of concavity of Tartaglia-Newton, 870
for di¤erentiable functions, 638 of Taylor, 601
for twice di¤erentiable functions, 639 of the comparison, 202, 288
Test of monotonicity, 584 of the envelope, 771
strict, 584 of the implicit function, 653, 662, 664
Theorem of the integral mean, 806
Darboux, see Intermediate Value The- of the mean value of integral calculus, 806
orem of Tonelli, 461
fundamental of arithmetic, 20 of uniqueness of the limit, 184, 286
fundamental of …nance, 496 of Weierstrass, 315, 453
fundamental of integral calculus (…rst), on zeros, 312
812 projection, 484
fundamental of integral calculus (second), Riesz, 329
814 Truth
Intermediate Value, 318 table of, 879
mean value, 578 value, 879
of Binet, 367
of Bolzano, see Theorem on zeros Union, 6, 880
of Bolzano-Weierstrass, 192 Upper bound, 24
of Cantor, 164
of Carnot, 876 Value
of Cauchy, 207, 419 absolute, 75
of Cesàro, 251 principal, according to Cauchy, 831
of de l’Hospital, 593 Variable
of De Moivre-Stirling, 223 dependent, 107
of Euclid, 17, 22 independent, 107
of Fermat, 572 of choice, 443
of Frobenius-Littlewood, 263 Vector, 42, 44
of Hahn-Banach, 389 zero, 45
of Hausdor¤, 862 Vector subspace, 60
of Kronecker, 377 generated, 66
of Kuhn-Tucker, 740 Vectors
of Lagrange (mean value), see Mean Value addition, 44
Theorem collinear, 63
of Lagrange (optimization), 714 column, 332
of Landau, 255 linearly dependent, 63
of Laplace, 373 linearly independent, 63
(…rst), 370 orthogonal, 80
(second) , 372 product, 44
of Minkowski, 426 row, 332
of permanence of sign, 186, 287 scalar multiplication, 45
of Pythagoras, 80, 876 sum, 44
of Rolle, 577 Venn diagrams, 4
of Rouché-Capelli, 383 Versors, 79
of Schwartz, 552 fundamental of R^n, 62, 79

Walras’ Law, 451
