CMV Springer Draft 2016 - Chapters 21 33
Approximation
(i) the simplicity of the approximating function: in this case the linear function $df(x_0)(h) = f'(x_0)h$;
(ii) the quality of the approximation, given by the error term $o(h)$.
Intuitively, there is a tension between these two properties: the simpler the approximating function, the worse the quality of the approximation. In other terms, the simpler we want the approximating function to be, the higher the error we may incur. In this section we study in detail the relation between these two key properties. In particular, suppose we weaken property (i), being satisfied with an approximating function that is a polynomial of degree $n$, not necessarily with $n = 1$ as in the case of a straight line. The desideratum that we posit is that there is a corresponding improvement in the error term, which should become of magnitude $o(h^n)$. In other words, when the degree $n$ of the approximating polynomial increases, and with it the complexity of the approximating function, we want the error term to improve in a parallel way: an increase in the complexity of the approximating function should be offset by a greater goodness of the approximation.
$$f(x_0 + h) = \alpha_0 + \alpha_1 h + \alpha_2 h^2 + o(h^2) \quad \text{as } h \to 0$$

the approximating function is now more complicated: instead of a straight line (the polynomial of first degree $\alpha_0 + \alpha_1 h$) we have a parabola (the polynomial of second degree $\alpha_0 + \alpha_1 h + \alpha_2 h^2$). But, on the other hand, the error term is now better: instead of $o(h)$, we have $o(h^2)$.
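A quick numerical sketch of this trade-off (the choice of $f(x) = e^x$ at $x_0 = 0$ is our own, not from the text): the degree-1 polynomial $1 + h$ has error of order $h^2$, while the degree-2 polynomial $1 + h + h^2/2$ has error of order $h^3$.

```python
import math

# Compare the errors of the degree-1 and degree-2 expansions of e^h at 0:
# e^h = 1 + h + o(h) and e^h = 1 + h + h^2/2 + o(h^2).
for h in (0.1, 0.01, 0.001):
    err1 = abs(math.exp(h) - (1 + h))             # shrinks roughly like h^2/2
    err2 = abs(math.exp(h) - (1 + h + h**2 / 2))  # shrinks roughly like h^3/6
    print(f"h={h}: linear error={err1:.2e}, quadratic error={err2:.2e}")
```

Shrinking $h$ by a factor of 10 divides the linear error by about 100 and the quadratic error by about 1000, as the orders $o(h)$ and $o(h^2)$ predict.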
An important property of polynomial expansions is that, when they exist, they are unique.
Lemma 868 A function $f: (a,b) \to \mathbb{R}$ has at most one polynomial expansion of degree $n$ at every point $x_0 \in (a,b)$.
Proof Suppose that, for every $h \in (a - x_0, b - x_0)$, there are two different expansions

$$\alpha_0 + \alpha_1 h + \alpha_2 h^2 + \cdots + \alpha_n h^n + o(h^n) = \beta_0 + \beta_1 h + \beta_2 h^2 + \cdots + \beta_n h^n + o(h^n) \quad (21.3)$$

Then

$$\alpha_0 = \lim_{h \to 0} \left( \alpha_0 + \alpha_1 h + \cdots + \alpha_n h^n + o(h^n) \right) = \lim_{h \to 0} \left( \beta_0 + \beta_1 h + \cdots + \beta_n h^n + o(h^n) \right) = \beta_0$$

Hence,

$$\alpha_1 = \lim_{h \to 0} \left( \alpha_1 + \alpha_2 h + \cdots + \alpha_n h^{n-1} + o\left(h^{n-1}\right) \right) = \lim_{h \to 0} \left( \beta_1 + \beta_2 h + \cdots + \beta_n h^{n-1} + o\left(h^{n-1}\right) \right) = \beta_1$$

By iterating what we have done above, we can show that $\alpha_2 = \beta_2$, and so on until we show that $\alpha_n = \beta_n$. This proves that at most one polynomial $p(h)$ can satisfy approximation (21.1).
For convenience of notation we have put $f^{(0)} = f$. Such a polynomial has as coefficients the derivatives of $f$ at the point $x_0$ up to order $n$. In particular, if $x_0 = 0$, Taylor's polynomial is sometimes called MacLaurin's polynomial.
The next result, which is fundamental and of great elegance, shows that if f has a suitable
number of derivatives at x0 , the unique polynomial expansion is given precisely by Taylor’s
polynomial.
where $T_n$ is the unique polynomial, of degree at most $n$, that satisfies Definition 867, i.e., that is able to approximate $f(x_0 + h)$ with error $o(h^n)$.
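As a sketch (the helper name is our own), Taylor's polynomial $T_n$ can be evaluated directly from the derivative values $f^{(k)}(x_0)$:

```python
import math

def taylor_poly(derivs, x0, x):
    """Evaluate T_n(x) = sum_k f^(k)(x0)/k! * (x - x0)^k,
    where derivs = [f(x0), f'(x0), f''(x0), ...]."""
    h = x - x0
    return sum(d * h**k / math.factorial(k) for k, d in enumerate(derivs))

# Derivatives of sin at x0 = 0 cycle through 0, 1, 0, -1:
derivs = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]  # orders 0..5
print(taylor_poly(derivs, 0.0, 0.3), math.sin(0.3))  # agree to about 1e-7
```

The error at $x = 0.3$ is of the order of the first omitted term, consistent with the $o(h^n)$ guarantee.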
The approximation (21.6) is called Taylor’s expansion (or formula) of order n of f at
x0 . The important special case x0 = 0 takes the name of MacLaurin’s expansion (or formula)
of order n of f .
Note that for $n = 1$ Taylor's Theorem coincides with the "if" direction of Theorem 771. Indeed, since we set $f^{(0)} = f$, saying that $f$ has a derivative $0$ times on $(a,b)$ simply means that $f$ is defined on $(a,b)$. Hence, for $n = 1$, Taylor's Theorem states that, if $f: (a,b) \to \mathbb{R}$ has a derivative at $x_0 \in (a,b)$, then

$$f(x_0 + h) = f(x_0) + f'(x_0) h + o(h)$$

that is, $f$ is differentiable at $x_0$.
For $n = 1$, the polynomial approximation (21.6) reduces, therefore, to the linear approximation (18.29).
O.R. Graphically, the quadratic approximation (also called approximation of the second order) is a parabola. The linear approximation, as we know, is, graphically, the straight line tangent to the graph of the function; the quadratic approximation is the so-called osculating parabola,¹ that is, the parabola that shares at $x_0$ the same value of the function, the same slope (first derivative) and the same curvature (second derivative). H
Proof In the light of Lemma 868, it is sufficient to show that Taylor's polynomial satisfies (21.1). Let us start by observing preliminarily that, since $f$ has a derivative $n-1$ times on $(a,b)$, we have $f^{(k)}: (a,b) \to \mathbb{R}$ for every $1 \le k \le n-1$. Moreover, thanks to Proposition 772, $f^{(k)}$ is continuous at $x_0$ for $1 \le k \le n-1$. Let $\varphi: (a - x_0, b - x_0) \to \mathbb{R}$ and $\psi: \mathbb{R} \to \mathbb{R}$ be the auxiliary functions given by

$$\varphi(h) = f(x_0 + h) - \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} h^k \quad \text{and} \quad \psi(h) = h^n$$

¹ From os, mouth: that is, the "kissing" parabola.
21.1. TAYLOR'S POLYNOMIAL APPROXIMATION
We have

$$\lim_{h \to 0} f^{(k)}(x_0 + h) = f^{(k)}(x_0) \quad (21.9)$$

and, for every $0 \le k \le n-1$,

$$\varphi^{(k)}(h) = f^{(k)}(x_0 + h) - \sum_{j=k}^{n} \frac{f^{(j)}(x_0)}{(j-k)!} h^{j-k} \quad (21.10)$$

so that

$$\lim_{h \to 0} \varphi^{(k)}(h) = \varphi^{(k)}(0) = 0 \quad (21.11)$$

Thanks to (21.9) and (21.11), we can apply de l'Hospital's rule $n-1$ times, in order to have

$$\lim_{h \to 0} \frac{\varphi(h)}{\psi(h)} = \lim_{h \to 0} \frac{\varphi^{(n-1)}(h)}{\psi^{(n-1)}(h)} = L$$

with $L \in \mathbb{R}$. Simple calculations show that $\psi^{(n-1)}(h) = n!\, h$. Hence, since $f$ has a derivative $n$ times at $x_0$, expression (21.10) with $k = n-1$ implies

$$\lim_{h \to 0} \frac{\varphi^{(n-1)}(h)}{\psi^{(n-1)}(h)} = \frac{1}{n!} \lim_{h \to 0} \frac{f^{(n-1)}(x_0 + h) - f^{(n-1)}(x_0) - h f^{(n)}(x_0)}{h} = \frac{1}{n!} \left( \lim_{h \to 0} \frac{f^{(n-1)}(x_0 + h) - f^{(n-1)}(x_0)}{h} - f^{(n)}(x_0) \right) = 0$$

Hence $\varphi(h) = o(h^n)$, as desired.
Example 871 Let us start with polynomials, whose polynomial approximation is trivial. Indeed, if $f: \mathbb{R} \to \mathbb{R}$ is itself a polynomial, $f(x) = \sum_{k=0}^{n} \alpha_k x^k$, we obtain the identity

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!} x^k \quad \forall x \in \mathbb{R}$$

that is,

$$\alpha_k = \frac{f^{(k)}(0)}{k!} \quad \forall\, 1 \le k \le n$$
Each polynomial can therefore be equivalently rewritten in the form of a MacLaurin expansion. For example, if $f(x) = x^4 - 3x^3$, we have $f'(x) = 4x^3 - 9x^2$, $f''(x) = 12x^2 - 18x$, $f'''(x) = 24x - 18$ and $f^{(iv)}(x) = 24$, and hence

$$\alpha_0 = f(0) = 0; \quad \alpha_1 = f'(0) = 0; \quad \alpha_2 = \frac{f''(0)}{2!} = 0$$
$$\alpha_3 = \frac{f'''(0)}{3!} = \frac{-18}{6} = -3; \quad \alpha_4 = \frac{f^{(iv)}(0)}{4!} = \frac{24}{24} = 1$$
N
Example 872 Let $f: \mathbb{R}_{++} \to \mathbb{R}$ be given by $f(x) = \log(1+x)$. It is $n$ times differentiable at each point of its domain, with

$$f^{(n)}(x) = (-1)^{n+1} \frac{(n-1)!}{(1+x)^n} \quad \forall n \ge 1$$

Taylor's expansion of order $n$ at $x_0$ is therefore

$$\log(1 + x_0 + h) = \log(1+x_0) + \frac{h}{1+x_0} - \frac{h^2}{2(1+x_0)^2} + \frac{h^3}{3(1+x_0)^3} - \cdots + (-1)^{n+1} \frac{h^n}{n(1+x_0)^n} + o(h^n)$$
$$= \log(1+x_0) + \sum_{k=1}^{n} (-1)^{k+1} \frac{h^k}{k(1+x_0)^k} + o(h^n)$$

that is, setting $x = x_0 + h$,

$$\log(1+x) = \log(1+x_0) + \sum_{k=1}^{n} (-1)^{k+1} \frac{(x-x_0)^k}{k(1+x_0)^k} + o\left((x-x_0)^n\right)$$
Note how a simple polynomial approximates the logarithmic function (and as closely as we want, because $o((x-x_0)^n)$ can be made arbitrarily small). In particular, the MacLaurin expansion of order $n$ of $f$ is

$$\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n+1} \frac{x^n}{n} + o(x^n) \quad (21.14)$$
$$= \sum_{k=1}^{n} (-1)^{k+1} \frac{x^k}{k} + o(x^n)$$
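A numerical sketch of expansion (21.14): truncating at higher and higher order $n$ makes the error at a fixed small $x$ shrink.

```python
import math

def maclaurin_log1p(x, n):
    # Partial sum of (21.14): sum_{k=1}^n (-1)^(k+1) x^k / k
    return sum((-1) ** (k + 1) * x**k / k for k in range(1, n + 1))

x = 0.1
for n in (1, 2, 4, 8):
    err = abs(math.log(1 + x) - maclaurin_log1p(x, n))
    print(f"n={n}: error={err:.3e}")  # error shrinks roughly like x^(n+1)
```

At $x = 0.1$ each extra order gains roughly one decimal digit of accuracy, in line with the $o(x^n)$ error term.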
Example 873 In an analogous way the reader can verify the MacLaurin expansions of order $n$ of the other elementary functions. Also here it is important to observe how such functions can be (well) approximated by simple polynomials. N
$$f'(x) = \frac{3x^2}{1+x^3} - 6\cos x \sin x \; ; \quad f''(x) = \frac{-3x^4 + 6x}{(1+x^3)^2} - 6\left(\cos^2 x - \sin^2 x\right)$$

and therefore

$$f(x) = f(0) + f'(0)x + \frac{1}{2} f''(0) x^2 + o(x^2) = -3x^2 + o(x^2) \quad (21.15)$$
N
O.R. With $n$ fixed, the approximation given by Taylor's polynomial is good only in a neighborhood (which can be very small) of the point $x_0$. On the other hand, as $n$ increases the approximation improves. We conclude that, for fixed $n$, the approximation is good (better than a prearranged error threshold) only in a neighborhood of $x_0$, while, for a fixed interval, there exists a value of $n$ such that the approximation on such interval is good (better than a prearranged error threshold): obviously, provided the function has derivatives up to such order.
If we fix simultaneously the degree $n$ and an interval, in general the approximation cannot be controlled: it can be very bad. H
O.R. It is possible to prove that, if $f: (a,b) \to \mathbb{R}$ has $n+1$ derivatives on $(a,b)$, it is also possible to write, for $x_0 \in (a,b)$,

$$f(x_0 + h) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} h^k + \frac{f^{(n+1)}(x_0 + \vartheta h)}{(n+1)!} h^{n+1}$$

with $0 \le \vartheta \le 1$. In other words, the addend $o(h^n)$ can always be taken equal to the $(n+1)$-th order term computed at an intermediate point between $x_0$ and $x_0 + h$. The expression indicated allows us to control the approximation error: if $|f^{(n+1)}(x)| \le k$ for every $x \in [x_0, x_0 + h]$, it is possible to conclude that the approximation error does not exceed $k|h|^{n+1}/(n+1)!$ and therefore that

$$\sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} h^k - \frac{k|h|^{n+1}}{(n+1)!} \le f(x_0 + h) \le \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} h^k + \frac{k|h|^{n+1}}{(n+1)!}$$

The error term $f^{(n+1)}(x_0 + \vartheta h)\, h^{n+1}/(n+1)!$ is called Lagrange's remainder, while $o(h^n)$ is called Peano's remainder. H
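A sketch of how the Lagrange remainder controls the error (the choice $f = e^x$ is our own): on $[0, h]$ we can take $k = e^h$, so the error of the degree-$n$ MacLaurin polynomial is at most $k\,h^{n+1}/(n+1)!$.

```python
import math

def exp_taylor(h, n):
    # Degree-n MacLaurin polynomial of e^h
    return sum(h**k / math.factorial(k) for k in range(n + 1))

h, n = 0.5, 4
actual_error = abs(math.exp(h) - exp_taylor(h, n))
bound = math.exp(h) * h ** (n + 1) / math.factorial(n + 1)  # k * h^(n+1) / (n+1)!
print(actual_error <= bound)  # the Lagrange bound holds
```

Here the bound is roughly $4.3 \times 10^{-4}$, only slightly above the actual error, so the estimate is quite tight for this function.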
(i) Consider the limit

$$\lim_{x \to 0} \frac{\log\left(1 + x^3\right) - 3\sin^2 x}{\log(1+x)}$$

Since the limit is as $x \to 0$, we can use the second order MacLaurin expansions (21.15) and (21.14) to approximate the numerator and the denominator. Thanks to Lemma 439 and by using the algebra of little-o, we have

$$\lim_{x \to 0} \frac{\log\left(1 + x^3\right) - 3\sin^2 x}{\log(1+x)} = \lim_{x \to 0} \frac{-3x^2 + o(x^2)}{x + o(x)} = \lim_{x \to 0} \frac{-3x^2}{x} = 0$$

The calculation of the limit has therefore been considerably simplified thanks to the combined use of MacLaurin expansions and of the comparison of infinitesimals seen in Lemma 439.
(ii) Consider the limit

$$\lim_{x \to 0} \frac{x \sin x}{\log^2(1+x)}$$

Also this limit can be solved by combining in a suitable way expansions and comparisons of infinitesimals:

$$\lim_{x \to 0} \frac{x \sin x}{\log^2(1+x)} = \lim_{x \to 0} \frac{x\,(x + o(x))}{(x + o(x))^2} = \lim_{x \to 0} \frac{x^2 + o(x^2)}{x^2 + o(x^2)} = \lim_{x \to 0} \frac{x^2}{x^2} = 1$$

N
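A numerical sanity check (a sketch, not part of the text) of the two limits just computed:

```python
import math

def f1(x):  # first limit: tends to 0 (it behaves like -3x near 0)
    return (math.log(1 + x**3) - 3 * math.sin(x) ** 2) / math.log(1 + x)

def f2(x):  # second limit: tends to 1
    return (x * math.sin(x)) / math.log(1 + x) ** 2

for x in (0.1, 0.01, 0.001):
    print(f"x={x}: f1={f1(x):+.5f}, f2={f2(x):.5f}")
```

As $x$ shrinks by a factor of 10, `f1` shrinks by roughly the same factor (consistent with $-3x$), while `f2` settles on 1.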
21.2. OMNIBUS PROPOSITION FOR LOCAL EXTREMAL POINTS
(i) If $n$ is even and $f^{(n)}(x_0) < 0$, the point $x_0$ is a strong local maximizer;
(ii) If $n$ is even and $f^{(n)}(x_0) > 0$, the point $x_0$ is a strong local minimizer;
(iii) If $n$ is odd, the point $x_0$ is not a local extremal point and, moreover, $f$ is increasing or decreasing at $x_0$ according to whether $f^{(n)}(x_0) > 0$ or $f^{(n)}(x_0) < 0$.
For $n = 1$, point (iii) is nothing but the fundamental first order necessary condition $f'(x_0) = 0$. Indeed, for $n = 1$, point (iii) states that if $f'(x_0) \neq 0$, then $x_0$ is not a local extremal point (that is, it is neither a local maximizer nor a local minimizer). By contraposition, this is equivalent to saying that if $x_0$ is a local extremal point, then $f'(x_0) = 0$. Point (iii) therefore extends the first order necessary condition to subsequent derivatives.
Point (i) instead, together with the hypothesis $f^{(k)}(x_0) = 0$ for every $1 \le k \le n-1$, extends to subsequent derivatives the second order sufficient condition $f''(x_0) < 0$ for strong local maximizers. Indeed, for $n = 2$ point (i) is exactly the condition $f''(x_0) < 0$. Analogously, point (ii) extends the analogous condition $f''(x_0) > 0$ for minimizers.²
N.B. In this and in the next section we will concentrate on local extremal points and therefore on the generalization of point (ii), of sufficiency, of Corollary 846. It is possible to generalize in an analogous way point (i), of necessity, of the aforementioned corollary. We leave the details to the reader. O
Proof Let us prove point (i). Let $n$ be even and let $f^{(n)}(x_0) < 0$. Thanks to Taylor's Theorem, from the hypothesis $f^{(k)}(x_0) = 0$ for every $1 \le k \le n-1$ and $f^{(n)}(x_0) \neq 0$ it follows that

$$f(x_0 + h) - f(x_0) = \frac{f^{(n)}(x_0)}{n!} h^n \left( 1 + \frac{o(h^n)}{h^n} \right)$$

Since $\lim_{h \to 0} o(h^n)/h^n = 0$, there exists $\delta > 0$ such that $|h| < \delta$ implies $|o(h^n)/h^n| < 1$. Hence

$$h \in (-\delta, \delta) \implies 1 + \frac{o(h^n)}{h^n} > 0$$

Since $f^{(n)}(x_0) < 0$, and $h^n > 0$ because $n$ is even, we therefore have $f(x_0 + h) < f(x_0)$ for every $h \in (-\delta, \delta)$ with $h \neq 0$, that is, $x_0$ is a strong local maximizer.
Since $n = 4$ is even, by point (i) of Proposition 877 we can conclude that $x_0 = 0$ is a local maximizer (actually, it is a global maximizer, but Proposition 877 alone is not enough to conclude this). N
O.R. Proposition 877 states that, if the first $k-1$ derivatives of $f$ are all zero at $x_0$ and $f^{(k)}(x_0) \neq 0$, then, if $k$ is even, $f^{(k)}$ gives the same information as $f''$ (either local maximizer or minimizer), while, if $k$ is odd, it gives the same information as $f'$ (increasing or decreasing). In short, it is as if the first $k-1$ derivatives (which are equal to zero) were not present at all. H
Example 880 The function defined by $f(x) = x^6$ clearly attains its minimum value at $x_0 = 0$. Indeed, we have $f'(0) = f''(0) = \cdots = f^{(v)}(0) = 0$ and $f^{(vi)}(0) = 6! > 0$. The function defined by $f(x) = x^5$ is clearly increasing at $x_0 = 0$. We have $f'(0) = f''(0) = \cdots = f^{(iv)}(0) = 0$ and $f^{(v)}(0) = 5! > 0$. N
Proposition 877 is very powerful, but it also has important limitations. Like Corollary 844, it can only evaluate interior points, and it is powerless in the face of non-strong local extremal points, for which in general the derivatives of every order are zero. The classical case is that
of constant functions, whose points are all trivially maximizers and minimizers, and on which Proposition 877 (like Corollary 844 before it) is not able to give us any indication.
Moreover, to apply Proposition 877 it is necessary that the function has a sufficient number of derivatives at a stationary point, which is not always the case, as the next example shows.
Example 881 Let $f: \mathbb{R} \to \mathbb{R}$ be defined as $f(x) = x^2 \sin(1/x)$ if $x \neq 0$ and $f(0) = 0$. We have

$$\lim_{h \to 0} \frac{f(0+h) - f(0)}{h} = \lim_{h \to 0} \frac{h^2 \sin \frac{1}{h} - 0}{h} = \lim_{h \to 0} h \sin \frac{1}{h} = 0$$

The point $x = 0$ is stationary for $f$, but the function does not admit a second derivative at $0$. Indeed, we have

$$f'(x) = \begin{cases} 2x \sin \dfrac{1}{x} - \cos \dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$

and therefore

$$\lim_{h \to 0} \frac{f'(h) - f'(0)}{h} = \lim_{h \to 0} \left( 2 \sin \frac{1}{h} - \frac{1}{h} \cos \frac{1}{h} \right)$$

does not exist. Proposition 877 cannot therefore be applied, and hence it is not able to say anything on the nature of the stationary point $x = 0$. Nevertheless, the graph of $f$ shows that such a point is not a local extremal one, since $f$ has infinitely many oscillations in any neighborhood of zero. N
Example 882 The general version of the previous example considers $f: \mathbb{R} \to \mathbb{R}$ defined as

$$f(x) = \begin{cases} x^n \sin \dfrac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}$$

with $n \ge 1$, and shows that such a function does not have a derivative of order $n$ at the origin (in the case $n = 1$ this means that the first derivative does not exist). We leave the development of this example to the reader. N
For the convenience of the reader, we also report the following corollary of Proposition 877. It only states the "sufficient condition" component of the aforementioned proposition.
Corollary 883 (Second sufficient condition for local extremal points) Let $f: A \subseteq \mathbb{R} \to \mathbb{R}$ and $C \subseteq A$. Let $n \in \mathbb{N}$, with $n \ge 2$. Let $x_0$ be an interior point of $C$ for which there exists a neighborhood $(a,b)$ such that $f$ has a derivative $n-1$ times on $(a,b)$ and has a derivative $n$ times at $x_0$. Let $f'(x_0) = 0$.
Let $f^{(k)}(x_0) = 0$ for every $k \in \mathbb{N}$ such that $2 \le k \le n-1$ and $f^{(n)}(x_0) \neq 0$. Then:
(i) If $n$ is even and $f^{(n)}(x_0) < 0$, the point $x_0$ is a strong local maximizer;
(ii) If $n$ is even and $f^{(n)}(x_0) > 0$, the point $x_0$ is a strong local minimizer;
(iii) If $n$ is odd, the point $x_0$ is not a local extremal point and, moreover, $f$ is increasing or decreasing at $x_0$ according to whether $f^{(n)}(x_0) > 0$ or $f^{(n)}(x_0) < 0$.
1. We determine the set $S$ of stationary points, solving the first order condition $f'(x) = 0$. If $S = \emptyset$ the procedure ends (and we can conclude that, since there are no stationary points, there are no extremal ones); otherwise we move to the next step.
This is the classical procedure to find local extremal points based on the first order and second order conditions of Section 20.5.2. The version just presented improves what we have seen in that section because, recalling what we observed in a previous footnote, it requires only that the function has two derivatives on $\operatorname{int} C$, not necessarily with continuity. However, we are left with the other limitations discussed in Section 20.5.2.
1. We determine the set $S$ of the stationary points, solving the equation $f'(x) = 0$. If $S = \emptyset$ the procedure ends; otherwise we move to the next step.
21.4. TAYLOR'S EXPANSION: VECTOR FUNCTIONS
3. We compute $f'''$ at each point of $S^{(2)}$: if $f'''(x) \neq 0$, the point $x$ is not an extremal one. Call $S^{(3)}$ the subset of $S^{(2)}$ in which $f'''(x) = 0$. If $S^{(3)} = \emptyset$ the procedure ends; otherwise we move to the next step.
4. We compute $f^{(iv)}$ at each point of $S^{(3)}$: the point $x$ is a strong local maximizer if $f^{(iv)}(x) < 0$, a strong local minimizer if $f^{(iv)}(x) > 0$. Call $S^{(4)}$ the subset of $S^{(3)}$ in which $f^{(iv)}(x) = 0$. If $S^{(4)} = \emptyset$ the procedure ends; otherwise we move to the next step.
The procedure thus ends if there exists $n$ such that $S^{(n)} = \emptyset$. In the opposite case the procedure iterates ad libitum.
Example 884 Let us take again the function $f(x) = -x^4$, with $C = \mathbb{R}$. We saw in Example 845 that, for its maximizer $x_0 = 0$, it was not possible to apply the sufficient condition $f'(x_0) = 0$ and $f''(x_0) < 0$. We have however

$$f'(x) = -4x^3; \quad f''(x) = -12x^2; \quad f'''(x) = -24x; \quad f^{(iv)}(x) = -24$$

so that

$$S = S^{(2)} = S^{(3)} = \{0\} \quad \text{and} \quad S^{(4)} = \emptyset$$

Stage 1 identifies the set $S = \{0\}$, on which, however, stage 2 has nothing to say since $f''(0) = 0$. Also stage 3 does not add any extra information since $f'''(0) = 0$. Stage 4 instead is conclusive: since $f^{(iv)}(0) < 0$, we can assert that $x = 0$ is a strong local maximizer (actually, it is a global maximizer, but this procedure does not allow us to say this). N
Naturally, the procedure is of practical interest when it ends with a sufficiently small value of $n$.
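For polynomials the whole procedure can be run exactly, since the derivative of a polynomial is again a polynomial. A sketch (function names our own):

```python
def poly_deriv(coeffs):
    """Derivative of a polynomial given as [a0, a1, ..., an] (a_k multiplies x^k)."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0.0]

def eval_poly(coeffs, x):
    return sum(c * x**k for k, c in enumerate(coeffs))

def classify_stationary_point(coeffs, x0, max_order=20):
    """The first n >= 2 with f^(n)(x0) != 0 decides, as in Proposition 877."""
    d = poly_deriv(coeffs)
    assert abs(eval_poly(d, x0)) < 1e-12, "x0 must be stationary"
    for n in range(2, max_order + 1):
        d = poly_deriv(d)
        v = eval_poly(d, x0)
        if abs(v) > 1e-12:
            if n % 2 == 1:
                return "not an extremal point"
            return "strong local maximizer" if v < 0 else "strong local minimizer"
    return "inconclusive"

print(classify_stationary_point([0, 0, 0, 0, -1], 0.0))       # f(x) = -x^4
print(classify_stationary_point([0, 0, 0, 0, 0, 0, 1], 0.0))  # f(x) = x^6
print(classify_stationary_point([0, 0, 0, 0, 0, 1], 0.0))     # f(x) = x^5
```

The three calls reproduce Examples 884 and 880: $-x^4$ has a strong local maximizer, $x^6$ a strong local minimizer, and $x^5$ no extremal point at the origin.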
For example, $f(x_1, x_2, x_3) = 3x_1 x_3 - x_2 x_3$ is a quadratic form because it is the sum of the monomials of second degree $3x_1 x_3$ and $-x_2 x_3$. It is easy to see that the following functions are quadratic forms:

$$f(x) = x^2$$
$$f(x_1, x_2) = x_1^2 + x_2^2 - 4x_1 x_2$$
$$f(x_1, x_2, x_3) = x_1 x_3 + 5x_2 x_3 + x_3^2$$
$$f(x_1, x_2, x_3, x_4) = x_1 x_4 - 2x_1^2 + 3x_2 x_3$$
The matrix $A$ is called the matrix associated to the quadratic form $f$. Given the matrix $A = (a_{ij})$, expression (21.16) can be written in extended way as

$$f(x) = a_{11} x_1^2 + a_{22} x_2^2 + \cdots + a_{nn} x_n^2 + \sum_{i \neq j} a_{ij} x_i x_j$$

The coefficients of the squares $x_1^2, x_2^2, \ldots, x_n^2$ are therefore the elements on the diagonal of $A$, that is, $(a_{11}, a_{22}, \ldots, a_{nn})$, while, since $A$ is symmetric, for every $i \neq j$ the coefficient of the monomial $x_i x_j$ is $2a_{ij}$. It is therefore very simple to pass from the matrix to the quadratic form and vice versa. Let us see some examples.
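A sketch of the correspondence in code (helper name our own): evaluating $x \cdot Ax$ for the symmetric matrix reproduces the quadratic form.

```python
def quad_form(A, x):
    """Evaluate x . Ax for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Symmetric matrix of f(x1, x2, x3) = 3*x1*x3 - x2*x3: zero diagonal (no squares),
# each off-diagonal entry is half the coefficient of the corresponding mixed monomial.
A = [[0.0, 0.0, 1.5],
     [0.0, 0.0, -0.5],
     [1.5, -0.5, 0.0]]

x = (1.0, 2.0, 3.0)
print(quad_form(A, x), 3 * x[0] * x[2] - x[1] * x[2])  # both give 3.0
```

Each mixed term appears twice in the double sum (once as $a_{ij}x_ix_j$ and once as $a_{ji}x_jx_i$), which is why the matrix entry is half the coefficient.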
Example 887 The matrix associated to the quadratic form $f(x_1, x_2, x_3) = 3x_1 x_3 - x_2 x_3$ is given by:

$$A = \begin{bmatrix} 0 & 0 & \frac{3}{2} \\ 0 & 0 & -\frac{1}{2} \\ \frac{3}{2} & -\frac{1}{2} & 0 \end{bmatrix}$$
³ In accordance with what was established in Section 13.2.2, for simplicity of notation we write $x \cdot Ax$ instead of the more precise $x A x^T$.
21.4. TAYLOR’S EXPANSION: VECTOR FUNCTIONS 613
are such that $f(x) = x \cdot Ax$, although they are not symmetric. What we lose without symmetry is the one-to-one correspondence between quadratic forms and matrices. Indeed, while given the quadratic form $f(x_1, x_2, x_3) = 3x_1 x_3 - x_2 x_3$ there exists a unique symmetric matrix for which (21.16) holds, this is no longer true if we do not require the symmetry of the matrix, as the two matrices in (21.17) show, for both of which (21.16) holds. N
Example 888 As regards the quadratic form $f(x_1, x_2) = x_1^2 + x_2^2 - 4x_1 x_2$, we have:

$$A = \begin{bmatrix} 1 & -2 \\ -2 & 1 \end{bmatrix}$$

N
Example 889 Let $f: \mathbb{R}^n \to \mathbb{R}$ be defined as $f(x) = \|x\|^2 = \sum_{i=1}^n x_i^2$ for every $x \in \mathbb{R}^n$. The symmetric matrix associated to this quadratic form is the identity matrix $I$. Indeed, $x \cdot Ix = x \cdot x = \sum_{i=1}^n x_i^2$. More generally, let $f(x) = \sum_{i=1}^n \lambda_i x_i^2$ with $\lambda_i \in \mathbb{R}$ for every $i = 1, \ldots, n$. It is easy to see that the matrix associated to $f$ is the diagonal matrix

$$\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$
In some cases it is easy to verify the sign of a quadratic form. For example, it is immediate to see that the quadratic form $f(x) = \sum_{i=1}^n \lambda_i x_i^2$ is positive semi-definite if and only if $\lambda_i \ge 0$ for every $i$, while it is positive definite if and only if $\lambda_i > 0$ for every $i$. In general, nevertheless, it is not simple to establish directly the sign of a quadratic form, and therefore some methods that help in this task have been elaborated. Among them, we see as an example the criterion of Sylvester-Jacobi.
Given a symmetric matrix $A$, let us build the following square submatrices $A_1, A_2, \ldots, A_n$:

$$A_1 = [a_{11}]; \quad A_2 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}; \quad A_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}; \quad \ldots; \quad A_n = A$$

and let us consider their determinants $\det A_1$, $\det A_2$, $\det A_3$, ..., $\det A_n = \det A$ (these are exactly the North-West principal minors of the matrix $A$ introduced in Section 13.6.5, considered from the smallest one to the largest one).
Proposition 891 (Criterion of Sylvester-Jacobi) A symmetric matrix $A$ is:
(i) positive definite if and only if $\det A_i > 0$ for every $i = 1, \ldots, n$;
(ii) negative definite if and only if the determinants $\det A_i$ alternate in sign starting with a negative one (that is, $\det A_1 < 0$, $\det A_2 > 0$, $\det A_3 < 0$, and so on);
(iii) indefinite if the determinants $\det A_i$ are not zero and the sequence of their signs respects neither (i) nor (ii).
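A sketch of the criterion in code (function names our own; determinants computed by Laplace expansion, which is fine for small matrices):

```python
def leading_minors(A):
    """Determinants of the North-West principal submatrices A1, ..., An."""
    def det(M):
        if len(M) == 1:
            return M[0][0]
        return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
                   for j in range(len(M)))
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def sylvester_jacobi(A):
    m = leading_minors(A)
    if all(d > 0 for d in m):
        return "positive definite"
    if all((d < 0 if k % 2 == 0 else d > 0) for k, d in enumerate(m)):
        return "negative definite"   # signs alternate starting negative
    if all(d != 0 for d in m):
        return "indefinite"
    return "criterion not applicable"

# The matrix of Example 892: its minors are 1, 7/4, 3/2, all positive.
A = [[1, 0.5, 0], [0.5, 2, 0.5], [0, 0.5, 1]]
print(sylvester_jacobi(A))  # positive definite
```

Note that, as in point (iii), a zero minor leaves the criterion silent: semi-definiteness cannot be decided this way.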
Example 892 Let $f(x_1, x_2, x_3) = x_1^2 + 2x_2^2 + x_3^2 + (x_1 + x_3) x_2$. The matrix associated to $f$ is:

$$A = \begin{bmatrix} 1 & \frac{1}{2} & 0 \\ \frac{1}{2} & 2 & \frac{1}{2} \\ 0 & \frac{1}{2} & 1 \end{bmatrix}$$

Indeed, we have

$$x \cdot Ax = (x_1, x_2, x_3) \begin{bmatrix} 1 & \frac{1}{2} & 0 \\ \frac{1}{2} & 2 & \frac{1}{2} \\ 0 & \frac{1}{2} & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = (x_1, x_2, x_3) \cdot \left( x_1 + \frac{1}{2} x_2, \; \frac{1}{2} x_1 + 2x_2 + \frac{1}{2} x_3, \; \frac{1}{2} x_2 + x_3 \right) = x_1^2 + 2x_2^2 + x_3^2 + (x_1 + x_3) x_2$$
Let us try to study the sign of the quadratic form with the criterion of Sylvester-Jacobi. We have:

$$\det A_1 = 1 > 0$$
$$\det A_2 = \det \begin{bmatrix} 1 & \frac{1}{2} \\ \frac{1}{2} & 2 \end{bmatrix} = \frac{7}{4} > 0$$
$$\det A_3 = \det A = \frac{3}{2} > 0$$

By the criterion of Sylvester-Jacobi we can therefore conclude that the quadratic form is positive definite. N
for every $h \in \mathbb{R}^n$ such that $x + h \in U$. As already seen in Section 19.2, if, with a small change of notation, we denote by $x_0$ the point at which $f$ is differentiable and we set $h = x - x_0$, expression (21.18) assumes the following equivalent, but more expressive, form:

$$f(x) = f(x_0) + \nabla f(x_0) \cdot (x - x_0) + o(\|x - x_0\|) \quad (21.19)$$

for every $x \in U$.
We can now present Taylor's expansion for functions of several variables; as in the scalar case, also in the general case with several variables Taylor's expansion refines approximation (21.19). In stating it, we limit ourselves to an approximation up to the second order, which suffices for our purposes. We postpone the study of approximations of higher order to more advanced courses.
Expression (21.20) is called Taylor's expansion (or Taylor's formula) up to the second order. The polynomial in the variable $x$

$$f(x_0) + \nabla f(x_0) \cdot (x - x_0) + \frac{1}{2} (x - x_0) \cdot \nabla^2 f(x_0) (x - x_0)$$
is called Taylor's polynomial of second degree at the point $x_0$. The second-degree term is a quadratic form, whose associated matrix, the Hessian $\nabla^2 f(x_0)$, is symmetric thanks to Theorem 798 (of Schwarz). Naturally, if arrested at the first order, Taylor's expansion reduces to (21.19). Moreover, observe that in the scalar case Taylor's polynomial assumes the well-known form:

$$f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2$$

Indeed, in such a case we have $\nabla^2 f(x_0) = f''(x_0)$ and therefore

$$(x - x_0) \cdot \nabla^2 f(x_0)(x - x_0) = f''(x_0)(x - x_0)^2 \quad (21.21)$$
As in the scalar case, also here we have a trade-off between the simplicity of the approximation and its accuracy. Indeed, the approximation up to the first order (21.19) has the merit of simplicity with respect to that up to the second order: we approximate with a linear function rather than with a second-degree polynomial, but to the detriment of the degree of accuracy of the approximation, given by $o(\|x - x_0\|)$ instead of the better $o(\|x - x_0\|^2)$.
The choice of the order at which to arrest Taylor's expansion depends therefore on the particular use we are interested in, that is, on which aspect of the approximation is more important, simplicity or accuracy.
Example 894 Let $f: \mathbb{R}^2 \to \mathbb{R}$ be defined as $f(x_1, x_2) = 3x_1^2 e^{x_2^2}$. We have:

$$\nabla f(x) = \left( 6x_1 e^{x_2^2}, \; 6x_1^2 x_2 e^{x_2^2} \right)$$

and

$$\nabla^2 f(x) = \begin{bmatrix} 6 e^{x_2^2} & 12 x_1 x_2 e^{x_2^2} \\ 12 x_1 x_2 e^{x_2^2} & 6 x_1^2 e^{x_2^2} \left(1 + 2x_2^2\right) \end{bmatrix}$$

By Theorem 893, the function $f(x_1, x_2) = 3x_1^2 e^{x_2^2}$ is therefore approximated at the point $x_0 = (1,1)$ by the second-degree Taylor polynomial

$$3e + 6e(x_1 - 1) + 6e(x_2 - 1) + 3e(x_1 - 1)^2 + 12e(x_1 - 1)(x_2 - 1) + 9e(x_2 - 1)^2$$
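A finite-difference sanity check of the gradient above (a numerical sketch, not part of the text):

```python
import math

def f(x1, x2):
    return 3 * x1**2 * math.exp(x2**2)

def grad(x1, x2):
    # Analytic gradient from the example
    e = math.exp(x2**2)
    return (6 * x1 * e, 6 * x1**2 * x2 * e)

eps = 1e-6
num_g1 = (f(1 + eps, 1) - f(1 - eps, 1)) / (2 * eps)  # central difference in x1
num_g2 = (f(1, 1 + eps) - f(1, 1 - eps)) / (2 * eps)  # central difference in x2
g = grad(1, 1)
print(abs(num_g1 - g[0]) < 1e-4, abs(num_g2 - g[1]) < 1e-4)
```

The same central-difference idea applied to `grad` would check the Hessian entries as well.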
$$f(x) = f(\hat{x}) + \frac{1}{2} (x - \hat{x}) \cdot \nabla^2 f(\hat{x})(x - \hat{x}) + o\left(\|x - \hat{x}\|^2\right) \quad (21.22)$$

that is,

$$f(\hat{x} + h) = f(\hat{x}) + \frac{1}{2}\, h \cdot \nabla^2 f(\hat{x})\, h + o\left(\|h\|^2\right)$$
By working on this simple observation, we obtain the following second order conditions, which are based on the sign of the quadratic form $h \cdot \nabla^2 f(x_0) h$.
Note that from point (i) it follows that if the quadratic form $h \cdot \nabla^2 f(\hat{x}) h$ is indefinite, the point $\hat{x}$ is neither a local maximizer nor a local minimizer on $U$. The theorem is the analogue for functions of several variables of Corollary 846 for scalar functions. In the proof we will reduce the problem from functions of several variables to scalar functions, and we will use this corollary. We will prove only point (i), leaving point (ii) to the reader.
Proof (i) Let $\hat{x}$ be a local maximizer on $U$. We want to prove that the quadratic form $h \cdot \nabla^2 f(\hat{x}) h$ is negative semi-definite. For simplicity, let us suppose that $\hat{x}$ is the origin $0 = (0, \ldots, 0)$ (leaving to the reader the case of a generic $\hat{x}$). First of all let us prove that $v \cdot \nabla^2 f(0) v \le 0$ for every versor $v$ of $\mathbb{R}^n$. Afterwards we will prove that $h \cdot \nabla^2 f(0) h \le 0$ for every $h \in \mathbb{R}^n$.
Since $0$ is a local maximizer, there exists a neighborhood $B(0)$ of $0$ such that $f(0) \ge f(x)$ for every $x \in B(0) \cap U$, and there exists a spherical neighborhood of $0$, of sufficiently small radius, contained in $B(0) \cap U$; that is, there exists $\varepsilon > 0$ such that $B_\varepsilon(0) \subseteq B(0) \cap U$. Let us observe that every vector $x \in B_\varepsilon(0)$ can be written as $x = tv$, where $v$ is a versor of $\mathbb{R}^n$, that is, $v \in \mathbb{R}^n$ with $\|v\| = 1$, and $t \in \mathbb{R}$.⁵ Clearly, $tv \in B_\varepsilon(0)$ if and only if
⁴ For simplicity we continue to consider a function defined on a neighborhood. The reader can extend the results to functions $f: A \subseteq \mathbb{R}^n \to \mathbb{R}$ and to interior points $\hat{x}$ that belong to a choice set $C \subseteq A$.
⁵ Intuitively, $v$ gives the direction of $x$, and $t$ gives its norm (indeed, $\|x\| = |t|$).
$|t| < \varepsilon$. Fixed now an arbitrary versor $v$ of $\mathbb{R}^n$, let us define the function $\varphi_v: (-\varepsilon, \varepsilon) \to \mathbb{R}$ as $\varphi_v(t) = f(tv)$. Since $tv \in B_\varepsilon(0)$ for $|t| < \varepsilon$, we have

$$\varphi_v(t) = f(tv) \le f(0) = \varphi_v(0)$$

for every $t \in (-\varepsilon, \varepsilon)$. It follows that $t = 0$ is a local maximizer for the function $\varphi_v$ and hence, $\varphi_v$ being differentiable and $t = 0$ being an interior point of the domain of $\varphi_v$, applying Corollary 846 we get $\varphi_v'(0) = 0$ and $\varphi_v''(0) \le 0$. Applying the chain rule to the function $\varphi_v(t) = f(tv)$, we get $\varphi_v''(0) = v \cdot \nabla^2 f(0) v$, so that

$$v \cdot \nabla^2 f(0) v \le 0$$

Since the versor $v$ of $\mathbb{R}^n$ is arbitrary, this last inequality holds for every $v \in \mathbb{R}^n$ with $\|v\| = 1$.
Let now $h \in \mathbb{R}^n$. Analogously as before, let us observe that $h = t_h v$ for some versor $v$ of $\mathbb{R}^n$ and $t_h \in \mathbb{R}$ such that $|t_h| = \|h\|$.
[Figure: the vector $h = t_h v$ represented as a multiple of the versor $v$]
Then

$$h \cdot \nabla^2 f(0) h = t_h v \cdot \nabla^2 f(0) \, t_h v = t_h^2 \; v \cdot \nabla^2 f(0) v$$

Since $v \cdot \nabla^2 f(0) v \le 0$, we have also $h \cdot \nabla^2 f(0) h \le 0$, and since this holds for every $h \in \mathbb{R}^n$, the quadratic form $h \cdot \nabla^2 f(0) h$ is negative semi-definite.
In the scalar case we find again the usual second order conditions, based on the sign of the second derivative $f''(\hat{x})$. Indeed, we already observed in (21.21) that in the scalar case

$$x \cdot \nabla^2 f(\hat{x}) x = f''(\hat{x}) x^2$$

thus, in this case, the sign of the quadratic form depends only on the sign of $f''(\hat{x})$; that is, it is negative (positive) definite if and only if $f''(\hat{x}) < 0$ ($> 0$) and it is negative (positive) semi-definite if and only if $f''(\hat{x}) \le 0$ ($\ge 0$).
Naturally, as in the scalar case, also in this more general framework condition (i) is only necessary for $\hat{x}$ to be a local maximizer. Indeed, let us consider the function $f(x_1, x_2) = x_1^2 x_2$. At $\hat{x} = 0$ we have $\nabla^2 f(0) = O$. The corresponding quadratic form $x \cdot \nabla^2 f(0) x$ is identically zero and it is therefore both negative semi-definite and positive semi-definite. Nevertheless, $\hat{x} = 0$ is neither a local maximizer nor a local minimizer. Indeed, given a generic neighborhood $B_\varepsilon(0)$, let $x = (x_1, x_2) \in B_\varepsilon(0)$ be such that $x_1 = x_2$. Let $t$ be such a common value, so that

$$(t,t) \in B_\varepsilon(0) \iff \|(t,t)\| = \sqrt{t^2 + t^2} = |t|\sqrt{2} < \varepsilon \iff |t| < \frac{\varepsilon}{\sqrt{2}}$$

Since $f(t,t) = t^3$, for every $(t,t) \in B_\varepsilon(0)$ we have $f(t,t) < f(0)$ if $t < 0$ and $f(0) < f(t,t)$ if $t > 0$, which shows that $\hat{x} = 0$ is neither a local maximizer nor a local minimizer.⁶
Similarly, condition (ii) is only sufficient for $\hat{x}$ to be a local maximizer. Consider the function $f(x) = -x_1^2 x_2^2$. The point $\hat{x} = 0$ is clearly a maximizer (even an absolute one) for the function $f$. But $\nabla^2 f(0) = O$, and therefore the corresponding quadratic form $x \cdot \nabla^2 f(0) x$ is not negative definite.
This is an important observation from the practical point of view, because there exist criteria, such as that of Sylvester-Jacobi, to determine whether a symmetric matrix is positive/negative definite or semi-definite.
To illustrate Theorem 895, let us consider the case of a function of two variables $f: \mathbb{R}^2 \to \mathbb{R}$ that has a derivative twice with continuity. Let $x_0 \in \mathbb{R}^2$ be a stationary point, $\nabla f(x_0) = (0,0)$, and let

$$\nabla^2 f(x_0) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2}(x_0) & \dfrac{\partial^2 f}{\partial x_1 \partial x_2}(x_0) \\[4pt] \dfrac{\partial^2 f}{\partial x_2 \partial x_1}(x_0) & \dfrac{\partial^2 f}{\partial x_2^2}(x_0) \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad (21.23)$$
be the Hessian matrix computed at the point $x_0$. Since the gradient at $x_0$ is zero, the point is a candidate to be a maximizer or minimizer of $f$. To evaluate its exact nature it is necessary to proceed to the analysis of the Hessian matrix at the point. By Theorem 895, $x_0$ is a maximizer if the Hessian is negative definite, a minimizer if it is positive definite.
⁶ In an alternative way, it is sufficient to observe that each point of the I and II quadrants, except the axes, is such that $f(x_1, x_2) > 0$ and that each point of the III and IV quadrants, except the axes, is such that $f(x_1, x_2) < 0$. Every neighborhood of the origin necessarily contains both points of the I and II quadrants (except the axes), for which we have $f(x_1, x_2) > 0 = f(0)$, and points of the III and IV quadrants (except the axes), for which we have $f(x_1, x_2) < 0 = f(0)$. Hence $0$ is neither a local maximizer nor a local minimizer.
(i) if $a > 0$ and $ad - bc > 0$, the Hessian is positive definite, and therefore $x_0$ is a strong local minimizer;
(ii) if $a < 0$ and $ad - bc > 0$, the Hessian is negative definite, and therefore $x_0$ is a strong local maximizer;
(iii) if $ad - bc < 0$, the Hessian is indefinite, and therefore $x_0$ is neither a local maximizer nor a local minimizer.
In all the other cases it is not possible to draw conclusions on the nature of the point x0 .
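The three cases can be packaged into a small helper (a sketch; the function name is our own):

```python
def classify_2x2_hessian(a, b, c, d):
    """Classify a stationary point of f: R^2 -> R from its Hessian [[a, b], [c, d]]
    (with b = c by Schwarz's theorem)."""
    det = a * d - b * c
    if det > 0 and a > 0:
        return "strong local minimizer"
    if det > 0 and a < 0:
        return "strong local maximizer"
    if det < 0:
        return "not a local extremal point"
    return "inconclusive"

print(classify_2x2_hessian(6, 0, 0, 2))      # positive definite Hessian
print(classify_2x2_hessian(-4, -1, -1, -2))  # negative definite Hessian
```

When $ad - bc = 0$ the function correctly reports "inconclusive", matching the caveat that no conclusion can be drawn in the remaining cases.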
Example 896 Let $f: \mathbb{R}^2 \to \mathbb{R}$ be the function defined as $f(x_1, x_2) = 3x_1^2 + x_2^2 + 6x_1$. The gradient of $f$ is $\nabla f(x) = (6x_1 + 6, 2x_2)$. Its Hessian matrix is

$$\nabla^2 f(x) = \begin{bmatrix} 6 & 0 \\ 0 & 2 \end{bmatrix}$$

It is easy to see that the unique point where the gradient vanishes is the point $x_0 = (-1, 0) \in \mathbb{R}^2$, that is, $\nabla f(-1, 0) = (0,0)$. Moreover, using what we have just seen, since $a > 0$ and $ad - bc > 0$, the point $x_0 = (-1, 0)$ is a strong local minimizer of $f$. N
Example 897 Let $f: \mathbb{R}^3 \to \mathbb{R}$ be defined as $f(x_1, x_2, x_3) = x_1^3 + x_2^3 + 3x_3^2 - 2x_3 + x_1^2 x_2^2$. We have

$$\nabla f(x) = \left( 3x_1^2 + 2x_1 x_2^2, \; 3x_2^2 + 2x_1^2 x_2, \; 6x_3 - 2 \right)$$

and therefore

$$\nabla^2 f(x) = \begin{bmatrix} 6x_1 + 2x_2^2 & 4x_1 x_2 & 0 \\ 4x_1 x_2 & 6x_2 + 2x_1^2 & 0 \\ 0 & 0 & 6 \end{bmatrix}$$

The stationary points are $x' = (-3/2, -3/2, 1/3)$ and $x'' = (0, 0, 1/3)$. At $x'$, we have

$$\nabla^2 f(x') = \begin{bmatrix} -\frac{9}{2} & 9 & 0 \\ 9 & -\frac{9}{2} & 0 \\ 0 & 0 & 6 \end{bmatrix}$$

and therefore

$$\det \left[ -\frac{9}{2} \right] < 0; \quad \det \begin{bmatrix} -\frac{9}{2} & 9 \\ 9 & -\frac{9}{2} \end{bmatrix} < 0; \quad \det \nabla^2 f(x') < 0$$

By the criterion of Sylvester-Jacobi the Hessian matrix is indefinite. By Theorem 895, the point $x' = (-3/2, -3/2, 1/3)$ is neither a local minimizer nor a local maximizer. For the point $x'' = (0, 0, 1/3)$ we have

$$\nabla^2 f(x'') = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 6 \end{bmatrix}$$

which is positive semi-definite since $x \cdot \nabla^2 f(x'') x = 6x_3^2$ (note that it is not positive definite: for example, $(1,1,0) \cdot \nabla^2 f(x'')(1,1,0) = 0$). N
where $C$ is an open set of $\mathbb{R}^n$. Let us assume that $f \in C^2(C)$. Thanks to point (i) of Theorem 895, the procedure of Section 20.5.3 assumes the following form:

1. We determine the set $S \subseteq C$ of the stationary interior points of $f$, solving the first order condition $\nabla f(x) = 0$ (Section 20.1.3).

2. We determine the set

$$S_2 = \left\{ x \in S : \nabla^2 f(x) \text{ is negative semi-definite} \right\}$$

which constitutes the set of the points of $C$ that are candidates to be possible solutions of the optimization problem.
Also here the procedure is not conclusive, because nothing ensures the existence of a solution. Later in the book we will discuss this crucial problem by combining, in the method of elimination, such existence theorems with the differential methods.
Here $C = \mathbb{R}^2_{++}$ is the first quadrant of the plane without the axes (and it is therefore an open set). We have:

$$\nabla f(x) = \left( -4x_1 + 3 - x_2, \; -2x_2 + 3 - x_1 \right)$$
and therefore from the first-order condition ∇f(x) = 0 it follows that the unique stationary point is x = (3/7, 9/7), that is, S = {(3/7, 9/7)}. We have

$$\nabla^2 f(x) = \begin{bmatrix} -4 & -1 \\ -1 & -2 \end{bmatrix}$$

By the Sylvester-Jacobi criterion, the Hessian matrix ∇²f(x) is negative definite.⁷ Hence, S₂ = {(3/7, 9/7)}. Since S₂ is a singleton, we trivially have S₃ = S₂. In conclusion, the point x = (3/7, 9/7) is the unique candidate to be a solution of the unconstrained optimization problem. It is possible to show that this point is indeed the solution of the problem. For the moment we can only say that it is a local maximizer (Theorem 895-(ii)). ∎
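A quick numerical sanity check of the example above. The gradient and Hessian below are the ones reconstructed in the text; treat them as assumptions about the original objective function.

```python
# Check that (3/7, 9/7) is stationary and that the (constant) Hessian is
# negative definite via its leading principal minors.

def grad(x1, x2):
    # assumed gradient: ∇f(x) = (-4x1 + 3 - x2, -2x2 + 3 - x1)
    return (-4 * x1 + 3 - x2, -2 * x2 + 3 - x1)

def leading_minors():
    H = [[-4.0, -1.0], [-1.0, -2.0]]  # assumed Hessian, constant in x
    d1 = H[0][0]                                  # first leading minor
    d2 = H[0][0] * H[1][1] - H[0][1] * H[1][0]    # determinant
    return d1, d2

print(grad(3 / 7, 9 / 7))  # approximately (0, 0): the point is stationary
d1, d2 = leading_minors()
print(d1 < 0 and d2 > 0)   # True: alternating minors, negative definite
```

The alternating sign pattern of the minors is exactly the Sylvester-Jacobi test used in the text.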
$$\varphi_{n+1} = o(\varphi_n) \qquad \text{as } x \to x_0$$

Example 899 (i) The power functions φₙ(x) = (x − x₀)ⁿ are an asymptotic scale at x₀ ∈ (a, b). (ii) The negative power functions φₙ(x) = x⁻ⁿ are an asymptotic scale at x₀ = +∞.⁹ More generally, the powers φₙ(x) = x^(−αₙ) form an asymptotic scale at x₀ = +∞ as long as αₙ₊₁ > αₙ for every n ≥ 1. (iii) The trigonometric functions φₙ(x) = sinⁿ(x − x₀) form an asymptotic scale at x₀ ∈ (a, b). (iv) The logarithms φₙ(x) = 1/logⁿ x form an asymptotic scale at x₀ = +∞. ∎
Polynomial expansions, in the form (21.2), are the special case of (21.24) in which the asymptotic scale is given by power functions. Furthermore, contrary to the polynomial case, where x₀ had to be a scalar, we can now take x₀ = +∞. General expansions are relevant because, with respect to polynomial expansions, they allow us to approximate a function for large values of the argument, that is to say asymptotically. In symbols, condition (21.24) can be expressed as

$$f(x) \sim \sum_{k=0}^{n} \alpha_k \varphi_k(x) \qquad \text{as } x \to x_0$$

For example, for n = 2 we get the quadratic approximation

$$f(x) \approx \alpha_0 \varphi_0(x) + \alpha_1 \varphi_1(x) + \alpha_2 \varphi_2(x) \qquad \text{as } x \to x_0$$

By using the scale of power functions, we end up with the well-known quadratic approximation

$$f(x) \approx \alpha_0 + \alpha_1 x + \alpha_2 x^2 \qquad \text{as } x \to 0$$

If, instead, we use the scale of negative power functions, we get

$$f(x) \approx \alpha_0 + \frac{\alpha_1}{x} + \frac{\alpha_2}{x^2} \qquad \text{as } x \to +\infty$$

In this case, as x₀ = +∞, we are dealing with a quadratic asymptotic approximation.
Example 901 It holds that

$$\frac{1}{x-1} \approx \frac{1}{x} + \frac{1}{x^2} \qquad \text{as } x \to +\infty \qquad (21.25)$$

Indeed,

$$\frac{1}{x-1} - \frac{1}{x} - \frac{1}{x^2} = \frac{1}{(x-1)x^2} = o\left(\frac{1}{x^2}\right) \qquad \text{as } x \to +\infty$$

Approximation (21.25) is asymptotic. For values close to 0, we consider the quadratic polynomial approximation instead:

$$\frac{1}{x-1} \approx -1 - x - x^2 \qquad \text{as } x \to 0$$

∎
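The o(1/x²) claim in Example 901 can be illustrated numerically: the error of the approximation, rescaled by x², still vanishes as x grows. A minimal sketch:

```python
# For 1/(x-1) ≈ 1/x + 1/x², the exact error is 1/((x-1)x²), so the
# rescaled error x²·|error| equals 1/(x-1) and shrinks as x grows.

def error_times_x2(x):
    approx = 1 / x + 1 / x ** 2
    return x ** 2 * abs(1 / (x - 1) - approx)

for x in (10.0, 100.0, 1000.0):
    print(x, error_times_x2(x))  # decreases roughly like 1/x
```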
The crucial property of uniqueness of polynomial expansions (Lemma 868) still holds in the general case.

Lemma 902 A function f : (a, b) → R has at most a unique expansion of order n with respect to a given asymptotic scale at every point x₀ ∈ [a, b].
Proof. Let us consider the expansion $\sum_{k=0}^{n} \alpha_k \varphi_k(x) + o(\varphi_n)$ at x₀ ∈ [a, b]. We have that

$$\lim_{x \to x_0} \frac{f(x)}{\varphi_0(x)} = \lim_{x \to x_0} \frac{\sum_{k=0}^{n} \alpha_k \varphi_k(x) + o(\varphi_n)}{\varphi_0(x)} = \alpha_0 \qquad (21.26)$$

$$\lim_{x \to x_0} \frac{f(x) - \alpha_0 \varphi_0(x)}{\varphi_1(x)} = \lim_{x \to x_0} \frac{\sum_{k=1}^{n} \alpha_k \varphi_k(x) + o(\varphi_n)}{\varphi_1(x)} = \alpha_1 \qquad (21.27)$$

and so on, until

$$\lim_{x \to x_0} \frac{f(x) - \sum_{k=0}^{n-1} \alpha_k \varphi_k(x)}{\varphi_n(x)} = \alpha_n \qquad (21.28)$$
Suppose that, for every x ∈ (a, b), there are two different expansions

$$\sum_{k=0}^{n} \alpha_k \varphi_k(x) + o(\varphi_n) = \sum_{k=0}^{n} \beta_k \varphi_k(x) + o(\varphi_n) \qquad (21.29)$$

Equalities (21.26)-(21.28) must hold for both expansions. Hence, by (21.26) we have α₀ = β₀. Iterating this procedure, from equality (21.27) we get α₁ = β₁, and so on until αₙ = βₙ. ∎
Limits (21.26)-(21.28) are crucial: it is rather easy to see that expansion (21.24) holds if and only if these limits exist (and are finite). Such limits in turn determine the expansion's coefficients {αₖ}ⁿₖ₌₀.¹⁰
Example 903 Let us determine the quadratic asymptotic approximation, with respect to the scale of negative power functions, of the function f : (−1, +∞) → R defined by f(x) = 1/(1 + x). Thanks to equalities (21.26)-(21.28), it holds that

$$\alpha_0 = \lim_{x \to +\infty} \frac{f(x)}{\varphi_0(x)} = \lim_{x \to +\infty} \frac{\frac{1}{1+x}}{1} = \lim_{x \to +\infty} \frac{1}{1+x} = 0$$

$$\alpha_1 = \lim_{x \to +\infty} \frac{f(x) - \alpha_0 \varphi_0(x)}{\varphi_1(x)} = \lim_{x \to +\infty} \frac{\frac{1}{1+x}}{\frac{1}{x}} = \lim_{x \to +\infty} \frac{x}{1+x} = 1$$

$$\alpha_2 = \lim_{x \to +\infty} \frac{f(x) - \alpha_0 \varphi_0(x) - \alpha_1 \varphi_1(x)}{\varphi_2(x)} = \lim_{x \to +\infty} \frac{\frac{1}{1+x} - \frac{1}{x}}{\frac{1}{x^2}} = \lim_{x \to +\infty} \frac{-x}{1+x} = -1$$

Hence the desired approximation is

$$\frac{1}{1+x} \approx \frac{1}{x} - \frac{1}{x^2} \qquad \text{as } x \to +\infty$$

By the previous lemma, it is the only quadratic asymptotic approximation with respect to the scale of negative power functions. ∎
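The coefficient formulas (21.26)-(21.28) also suggest a simple numerical recipe: estimate each limit by evaluating the corresponding quotient at one large argument, then use the (rounded) value in the next quotient. A sketch for f(x) = 1/(1 + x) on the scale 1, 1/x, 1/x²; the cutoff X is an arbitrary illustrative choice:

```python
# Sequential extraction of the coefficients α₀, α₁, α₂ of Example 903.
# Each estimated limit is snapped to the nearest integer before being
# used in the next step (the exact limits are 0, 1 and -1).

def f(x):
    return 1 / (1 + x)

X = 1e6  # large stand-in for x → +∞
a0 = round(f(X) / 1)                        # quotient by φ₀(x) = 1
a1 = (f(X) - a0) * X                        # quotient by φ₁(x) = 1/x
a2 = (f(X) - a0 - round(a1) / X) * X ** 2   # quotient by φ₂(x) = 1/x²
print(a0, a1, a2)  # approximately 0, 1, -1
```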
By changing scale, the expansion changes as well. For example, approximation (21.25) is a quadratic approximation of 1/(x − 1) with respect to the scale of negative power functions; by changing scale, however, one obtains a different quadratic approximation. Indeed, if at x₀ = +∞ we consider, for example, the asymptotic scale φₙ(x) = (x + 1)/x²ⁿ, we obtain the quadratic asymptotic approximation

$$\frac{1}{x-1} \approx \frac{x+1}{x^2} + \frac{x+1}{x^4} \qquad \text{as } x \to +\infty$$

In fact,

$$\frac{1}{x-1} - \frac{x+1}{x^2} - \frac{x+1}{x^4} = \frac{1}{(x-1)x^4} = o\left(\frac{x+1}{x^4}\right) \qquad \text{as } x \to +\infty$$
In conclusion, different asymptotic scales lead to different, although unique, approximations (as long as they exist). Moreover, different functions can have the same expansion, as the next example shows.
¹⁰The "only if" part is shown in the previous proof; the reader can verify the converse.
21.5. ASYMPTOTIC EXPANSIONS 625
The reader might recall that we considered the two following formulations of the De Moivre-Stirling formula.

Proof By integrating by parts, one obtains that for every 0 < a < b

$$\int_a^b t^x e^{-t}\,dt = \left[-e^{-t} t^x\right]_a^b + x \int_a^b t^{x-1} e^{-t}\,dt = -e^{-b} b^x + e^{-a} a^x + x \int_a^b t^{x-1} e^{-t}\,dt$$

$$\Gamma(n+1) = n\,\Gamma(n) = n(n-1)\,\Gamma(n-1) = \cdots = n!\,\Gamma(1) = n!$$

¹¹Since x > 0, we have limₐ→₀ aˣ = 0 because −∞ = x limₐ→₀ log a = limₐ→₀ log aˣ.
as Γ(1) = 1. The Gamma function can therefore be thought of as the extension to the real line of the factorial function, which is defined on the natural numbers.¹² It is a very important function: the next remarkable result makes its interpretation in terms of expansions of the two versions of the De Moivre-Stirling formula more rigorous.
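The identity Γ(n + 1) = n! derived above can be checked numerically. A minimal sketch, using the standard integral ∫₀^∞ tⁿe⁻ᵗ dt with a plain Simpson rule; the cutoff T and the number of steps are illustrative choices:

```python
# Numerical check of Γ(n+1) = n!: truncate the integral at T, where the
# tail tⁿe⁻ᵗ is negligible for small n, and integrate with Simpson's rule.
import math

def gamma_n_plus_1(n, T=60.0, steps=60000):
    h = T / steps
    s = 0.0
    for i in range(steps + 1):
        t = i * h
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        s += w * t ** n * math.exp(-t)
    return s * h / 3

print(gamma_n_plus_1(5))  # ≈ 120 = 5!
print(math.gamma(6))      # the library value, for comparison
```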
Example 907 (i) The function f : (1, +∞) → R defined by f(x) = 1/(x − 1) has, with respect to the scale of negative power functions, the asymptotic expansion

$$f(x) \sim \sum_{k=1}^{\infty} \frac{1}{x^k} \qquad \text{as } x \to +\infty \qquad (21.30)$$

The asymptotic expansion is, for every given x, a geometric series; it therefore converges for every x > 1, that is, for every x in the domain of f, with

$$f(x) = \sum_{k=1}^{\infty} \frac{1}{x^k}$$
¹²Instead of Γ(n + 1) = n! we would have exactly Γ(n) = n! if in the Gamma function the exponent were x instead of x − 1, which is the standard notation. This detail also explains the opposite sign of the logarithmic term in the approximations of n! and of Γ(x). The properties of the Gamma function, including the next theorem and its proof, can be found in E. Artin, The Gamma Function, Holt, Rinehart and Winston, 1964.
In this (fortunate) case the asymptotic expansion is actually exact: the series determined by the asymptotic expansion converges to f(x) for every x in the domain of f.

(ii) The function f : (1, +∞) → R defined by f(x) = (1 + e⁻ˣ)/(x − 1) also has, with respect to the scale of negative power functions, the asymptotic expansion (21.30) as x → +∞. However, in this case, for every x > 1 we have that

$$f(x) \neq \sum_{k=1}^{\infty} \frac{1}{x^k}$$

In this instance the asymptotic expansion is merely an approximation, with degree of accuracy x⁻ⁿ for every n.

(iii) Consider the function f : (1, +∞) → R defined by¹³

$$f(x) = e^{-x} \int_1^x \frac{e^t}{t}\,dt$$
Since

$$\int_1^{x/2} \frac{e^t}{t^{n+1}}\,dt \le \frac{x}{2}\,e^{x/2} \qquad \text{and} \qquad \int_{x/2}^x \frac{e^t}{t^{n+1}}\,dt \le \frac{e^x}{(x/2)^{n+1}} = \frac{2^{n+1} e^x}{x^{n+1}}$$

we have

$$0 \le \lim_{x \to +\infty} \frac{\int_1^x \frac{e^t}{t^{n+1}}\,dt}{\frac{e^x}{x^n}} \le \lim_{x \to +\infty} \left( \frac{x^{n+1}}{2 e^{x/2}} + \frac{2^{n+1}}{x} \right) = 0$$

We thus have

$$\int_1^x \frac{e^t}{t^{n+1}}\,dt = o\left(\frac{e^x}{x^n}\right) \qquad \text{as } x \to +\infty$$

Hence,

$$f(x) = e^{-x} \int_1^x \frac{e^t}{t}\,dt = \frac{1}{x} + \frac{1}{x^2} + \frac{2!}{x^3} + \frac{3!}{x^4} + \cdots + \frac{(n-1)!}{x^n} + o\left(\frac{1}{x^n}\right) \qquad \text{as } x \to +\infty$$
and

$$f(x) \sim \sum_{k=1}^{\infty} \frac{(k-1)!}{x^k} \qquad \text{as } x \to +\infty$$

For any given x > 1, the ratio criterion implies that $\sum_{k=1}^{\infty} (k-1)!/x^k = \sum_{k=1}^{\infty} k!/(k x^k) = +\infty$. The asymptotic expansion thus determines a divergent series. In this (very unfortunate) case not only does the series not converge to f(x): it actually diverges. ∎
¹³This example is taken from N. G. de Bruijn, Asymptotic Methods in Analysis, North-Holland, 1961.
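The behavior in case (iii) can be seen numerically: low-order partial sums of the divergent series approximate the function well, while high-order ones blow up. A sketch at x = 20, with the integral evaluated by a plain Simpson rule (the truncation orders are illustrative choices):

```python
# f(x) = e^(-x) ∫₁ˣ e^t/t dt versus partial sums of Σ (k-1)!/x^k.
import math

def f(x, steps=40000):
    h = (x - 1) / steps
    s = 0.0
    for i in range(steps + 1):
        t = 1 + i * h
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        s += w * math.exp(t) / t
    return math.exp(-x) * s * h / 3

def partial_sum(x, n):
    return sum(math.factorial(k - 1) / x ** k for k in range(1, n + 1))

x = 20.0
print(abs(f(x) - partial_sum(x, 5)))   # small: the expansion approximates f
print(abs(f(x) - partial_sum(x, 60)))  # large: the series itself diverges
```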
Let us go back to the polynomial case, in which the asymptotic expansion of f : (a, b) → R at x₀ ∈ (a, b) has the form

$$f(x) \sim \sum_{k=0}^{\infty} \alpha_k (x - x_0)^k \qquad \text{as } x \to x_0$$

$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k \qquad \forall x \in (a, b) \qquad (21.31)$$
Proof The converse being trivial, let us consider the "only if" side. Let f be analytic. Since, by hypothesis, the series $\sum_{k=0}^{\infty} \alpha_k (x - x_0)^k$ converges for every x ∈ (a, b), with sum f(x), one can show that f is infinitely differentiable at every x ∈ (a, b). Let n ≥ 1. By Taylor's Theorem, we have that

$$f(x) \sim \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k \qquad \text{as } x \to x_0$$

Lemma 902 implies that αₖ = f⁽ᵏ⁾(x₀)/k! for every 0 ≤ k ≤ n. Since n was arbitrarily chosen, the desired result follows. ∎
The following result shows that some classic elementary functions are indeed analytic.

Proposition 909 (i) The exponential and logarithmic functions are analytic; in particular,

$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} \qquad \forall x \in \mathbb{R}$$

$$\log(1 + x) = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{x^k}{k} \qquad \forall x \in (-1, 1]$$

(ii) The trigonometric functions sine and cosine are analytic; in particular,

$$\sin x = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!} x^{2k+1} \qquad \text{and} \qquad \cos x = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k)!} x^{2k} \qquad \forall x \in \mathbb{R}$$
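The series of Proposition 909 can be checked numerically at sample points; the truncation orders below are illustrative choices, large enough to make the remainder negligible:

```python
# Truncated power series for exp, sin and log(1+x), compared with the
# library functions.
import math

def exp_series(x, n=30):
    return sum(x ** k / math.factorial(k) for k in range(n))

def sin_series(x, n=15):
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n))

def log1p_series(x, n=200):
    # valid only on (-1, 1]
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, n + 1))

print(abs(exp_series(2.0) - math.exp(2.0)))    # ~0
print(abs(sin_series(2.0) - math.sin(2.0)))    # ~0
print(abs(log1p_series(0.5) - math.log(1.5)))  # ~0
```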
In the previous proof we saw that being infinitely differentiable is a necessary condition for a function to be analytic. However, the following example shows that this condition is not sufficient. The function f : R → R defined by f(x) = e^(−1/x²) if x ≠ 0 and f(0) = 0 is infinitely differentiable at every point of the real line, hence at the origin, so that

$$f(x) \sim \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k \qquad \text{as } x \to 0$$
Theorem 911 (Hille) Let f : (0, ∞) → R be a bounded continuous function and x₀ > 0. Then, for each h > 0,

$$f(x_0 + h) = \lim_{\delta \to 0^+} \sum_{k=0}^{\infty} \frac{\Delta_\delta^k f(x_0)}{k!}\, h^k \qquad (21.32)$$

We call the limit (21.32) Hille's formula. When f is infinitely differentiable, Hille's formula should intuitively approach the series expansion (21.31), i.e.,

$$f(x_0 + h) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!}\, h^k$$
because lim_{δ→0⁺} Δᵏ_δ f(x₀) = f⁽ᵏ⁾(x₀) for every k ≥ 1 (Proposition 777). This is actually true when f is analytic, since in this case (21.31) and (21.32) together imply

$$\lim_{\delta \to 0^+} \sum_{k=0}^{\infty} \frac{\Delta_\delta^k f(x_0)}{k!}\, h^k = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!}\, h^k$$

Hille's formula, however, holds when f is just bounded and continuous, thus providing a remarkable generalization of the Taylor expansion of analytic functions.
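A small numerical sketch of Hille's formula. Here Δᵏ_δ f(x₀) is read as the k-th forward difference quotient (an assumption about the notation in the garbled source); the truncation order K and the step δ are illustrative choices, and the approximation improves as δ → 0⁺:

```python
# Truncated Hille sum Σ Δᵏ_δ f(x₀) hᵏ/k! for f = sin.
import math

def diff_quotient(f, x0, k, d):
    # k-th forward difference of f at x0 with step d, divided by d**k
    s = sum((-1) ** (k - j) * math.comb(k, j) * f(x0 + j * d)
            for j in range(k + 1))
    return s / d ** k

def hille_sum(f, x0, h, d, K):
    return sum(diff_quotient(f, x0, k, d) * h ** k / math.factorial(k)
               for k in range(K + 1))

approx = hille_sum(math.sin, 0.3, 0.1, 0.05, 8)
print(abs(approx - math.sin(0.4)))  # small, and shrinks with δ
```

The step δ cannot be taken too small in floating point: high-order differences suffer catastrophic cancellation, which is why δ = 0.05 is used here.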
Chapter 22

Concavity and differentiability

Concave functions have remarkable differential properties that confirm the great tractability of these widely used functions. The study of these properties is the subject matter of this chapter. We begin with scalar functions and then move to vector ones. Throughout the chapter, C always denotes a convex set (so an interval in the scalar case). For brevity, we focus on concave functions, leaving to the reader the dual results that hold for convex functions.
$$\frac{f(y) - f(x)}{y - x}$$

as is easy to check with a simple modification of what was seen for (18.6). Graphically:
[Figure: the chord through the points (x, f(x)) and (y, f(y)); its slope is the ratio of the increments f(y) − f(x) and y − x.]
632 CHAPTER 22. CONCAVITY AND DIFFERENTIABILITY
If a function is concave, the slope of the chord decreases when we move the chord rightward. This basic geometric property characterizes concavity, as the next lemma shows.

Lemma 912 A function f : C ⊆ R → R is concave if and only if, for all four points x, w, y, z ∈ C with x ≤ w < y ≤ z, we have

$$\frac{f(y) - f(x)}{y - x} \ge \frac{f(z) - f(w)}{z - w} \qquad (22.1)$$

In other words, by moving rightward from [x, y] to [w, z], the slope of the chords decreases. Graphically:
[Figure: points A, B, C, D above x, w, y, z; the chord AC over [x, y] is steeper than the chord BD over [w, z].]
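The chord-slope monotonicity can be illustrated numerically with the concave function log; the quadruples below are arbitrary choices satisfying x ≤ w < y ≤ z:

```python
# For concave log, the chord slope over [x, y] dominates the one over [w, z].
import math

def slope(f, a, b):
    return (f(b) - f(a)) / (b - a)

quadruples = [(1, 1.5, 2, 3), (0.5, 0.5, 4, 4), (1, 2, 2.5, 10)]
for x, w, y, z in quadruples:
    print(slope(math.log, x, y) >= slope(math.log, w, z))  # True each time
```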
Proof "Only if". Let f be concave. The proof is divided in two steps: first we show that the chord AC has a greater slope than the chord BC:

[Figure: the chords AC and BC, with A, B, C above x, w, y.]
22.1. SCALAR FUNCTIONS 633
Then, we show that the chord BC has a greater slope than the chord BD:

[Figure: the chords BC and BD, with B, C, D above w, y, z.]
The first step amounts to proving (22.1) for z = y. Since x ≤ w < y, there exists λ ∈ [0, 1] such that w = λx + (1 − λ)y. Since f is concave, we have f(w) ≥ λf(x) + (1 − λ)f(y), so that

$$\frac{f(y) - f(w)}{y - w} \le \frac{f(y) - \lambda f(x) - (1 - \lambda) f(y)}{y - \lambda x - (1 - \lambda) y} = \frac{f(y) - f(x)}{y - x} \qquad (22.2)$$

This completes the first step. We now move to the second step, which amounts to proving (22.1) for x = w. Since w < y ≤ z, there exists λ ∈ [0, 1] such that y = λw + (1 − λ)z. Since f is concave, we have f(y) ≥ λf(w) + (1 − λ)f(z), so that

$$\frac{f(y) - f(w)}{y - w} \ge \frac{\lambda f(w) + (1 - \lambda) f(z) - f(w)}{\lambda w + (1 - \lambda) z - w} = \frac{f(z) - f(w)}{z - w} \qquad (22.3)$$

∎
The geometric property (22.1) has the following analytical counterpart, of great economic significance.
Proof Let x ≤ y and h ≥ 0. The points y and x + h then belong to the interval [x, y + h]. Under the change of variable z = y + h, we have x + h, z − h ∈ [x, z]. Hence there is λ ∈ [0, 1] for which x + h = λx + (1 − λ)z. It is immediate to check that z − h = (1 − λ)x + λz. By the concavity of f, we then have f(x + h) ≥ λf(x) + (1 − λ)f(z) and f(z − h) ≥ (1 − λ)f(x) + λf(z). Adding the two inequalities, we have

$$f(x + h) + f(z - h) \ge f(x) + f(z)$$

that is,

$$f(x + h) - f(x) \ge f(z) - f(z - h) = f(y + h) - f(y) \qquad \blacksquare$$

Dividing by h > 0 and letting h → 0⁺, we get

$$f'_+(x) = \lim_{h \to 0^+} \frac{f(x + h) - f(x)}{h} \ge \lim_{h \to 0^+} \frac{f(y + h) - f(y)}{h} = f'_+(y)$$
provided the limits exist. Similarly f′₋(x) ≥ f′₋(y), and so f′(x) ≥ f′(y) when the (bilateral) derivative exists. Concave functions thus feature decreasing marginal effects as their argument increases, and so embody a fundamental economic principle: additional units have a lower and lower marginal impact on levels (of utility, of production, and so on; we then talk of decreasing marginal utility, decreasing marginal returns, and so on). It is through this principle that forms of concavity first entered economics.¹

The next lemma establishes this property rigorously by showing that unilateral derivatives exist and are decreasing.²

(i) the right f′₊(x) and left f′₋(x) derivatives exist at each x ∈ int C;

(ii) the right f′₊(x) and left f′₋(x) derivatives are both decreasing on int C;
Proof Since x₀ is an interior point, there exists a neighborhood (x₀ − ε, x₀ + ε) of this point included in C, that is, (x₀ − ε, x₀ + ε) ⊆ C. Let 0 < a < ε, so that [x₀ − a, x₀ + a] ⊆ C. Let φ : [−a, a] \ {0} → R be defined by

$$\varphi(h) = \frac{f(x_0 + h) - f(x_0)}{h} \qquad \forall h \in [-a, a], \; h \neq 0$$

Property (22.1) implies that φ is decreasing. Indeed, if h′ < 0 < h″ it is sufficient to apply (22.1) with w = y = x₀, x = x₀ + h′ and z = x₀ + h″. If h′ ≤ h″ < 0, it is sufficient to apply (22.2) with y = x₀, x = x₀ + h′ and w = x₀ + h″. If 0 < h′ ≤ h″, it is sufficient to apply (22.3) with w = x₀, y = x₀ + h′ and z = x₀ + h″.

Since φ is decreasing on [−a, a] we have φ(a) ≤ φ(h) ≤ φ(−a) for every h ∈ [−a, a], that is, φ is bounded. Therefore, φ is both decreasing and bounded, which implies that its right-hand and left-hand limits at 0 exist and are finite. This proves the existence of the unilateral derivatives. Moreover, the decreasing monotonicity of φ implies φ(h′) ≥ φ(h″) for every h′ < 0 < h″, so that f′₋(x₀) ≥ f′₊(x₀).
To show the monotonicity, let us consider x, y ∈ int C such that x < y. By (22.4),

$$\frac{f(x + h) - f(x)}{h} \ge \frac{f(y + h) - f(y)}{h} \qquad \forall h \in (0, a]$$

Hence,

$$f'_+(x) = \lim_{h \to 0^+} \frac{f(x + h) - f(x)}{h} \ge \lim_{h \to 0^+} \frac{f(y + h) - f(y)}{h} = f'_+(y)$$

which implies that the right derivative is decreasing. A similar argument holds for the left derivative. ∎
Clearly, if in addition f is differentiable at x, then f′(x) = f′₊(x) = f′₋(x). In particular:
Example 916 (i) The concave function f(x) = −|x| does not have a derivative at x = 0. Nevertheless, the unilateral derivatives exist at each point of the domain, with

$$f'_+(x) = \begin{cases} 1 & \text{if } x < 0 \\ -1 & \text{if } x \ge 0 \end{cases}$$

and

$$f'_-(x) = \begin{cases} 1 & \text{if } x \le 0 \\ -1 & \text{if } x > 0 \end{cases}$$

Therefore f′₊(x) ≤ f′₋(x) for every x ∈ R and both unilateral derivatives are decreasing.
(ii) The concave function

$$f(x) = \begin{cases} x + 1 & \text{if } x \le -1 \\ 0 & \text{if } -1 < x < 1 \\ 1 - x & \text{if } x \ge 1 \end{cases}$$

has unilateral derivatives

$$f'_+(x) = \begin{cases} 1 & \text{if } x < -1 \\ 0 & \text{if } -1 \le x < 1 \\ -1 & \text{if } x \ge 1 \end{cases}$$

and

$$f'_-(x) = \begin{cases} 1 & \text{if } x \le -1 \\ 0 & \text{if } -1 < x \le 1 \\ -1 & \text{if } x > 1 \end{cases}$$

Therefore f′₊(x) ≤ f′₋(x) for every x ∈ R and both unilateral derivatives are decreasing.

(iii) The concave function f(x) = 1 − x² is differentiable on R with f′(x) = −2x. The derivative function is decreasing. ∎
Proposition 914 says, inter alia, that at interior points x we have f′₊(x) ≤ f′₋(x). The next result, whose proof we omit, says that we actually have f′₊(x) = f′₋(x), and so that f is differentiable at x, at all points of C except at an at most countable subset of it (in the previous tripartite example, such a set of non-differentiability D is, respectively, D = {0}, D = {−1, 1} and D = ∅).
(i) f is concave if and only if the right derivative f′₊ exists and is decreasing on int C;

(ii) f is strictly concave if and only if the right derivative f′₊ exists and is strictly decreasing on int C.
Proof (i) We only prove the "if", since the converse follows from Proposition 914. For simplicity, assume that f is differentiable on the open interval int C. By hypothesis, f′ is decreasing on int C. Let x, y ∈ int C, with x < y, and λ ∈ (0, 1). Set z = λx + (1 − λ)y, so that x < z < y. By the Mean Value Theorem, there exist ξₓ ∈ (x, z) and ξ_y ∈ (z, y) such that

$$f'(\xi_x) = \frac{f(z) - f(x)}{z - x}, \qquad f'(\xi_y) = \frac{f(y) - f(z)}{y - z}$$

Since f′ is decreasing, f′(ξₓ) ≥ f′(ξ_y). Hence,

$$\frac{f(\lambda x + (1 - \lambda) y) - f(x)}{\lambda x + (1 - \lambda) y - x} \ge \frac{f(y) - f(\lambda x + (1 - \lambda) y)}{y - \lambda x - (1 - \lambda) y}$$

Being λx + (1 − λ)y − x = (1 − λ)(y − x) and y − λx − (1 − λ)y = λ(y − x), we then have

$$\frac{f(\lambda x + (1 - \lambda) y) - f(x)}{(1 - \lambda)(y - x)} \ge \frac{f(y) - f(\lambda x + (1 - \lambda) y)}{\lambda (y - x)}$$

In turn, this easily implies f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y), as desired.³ (ii) This part is left to the reader. ∎
A similar result, left to the reader, holds for the other unilateral derivative f′₋. This theorem thus establishes a differential characterization of concavity by showing that it is equivalent to the decreasing monotonicity of the unilateral derivatives.
The function has unilateral derivatives at each point of the domain, with

$$f'_+(x) = \begin{cases} 1 + 3x^2 & \text{if } x < 0 \\ -1 - 3x^2 & \text{if } x \ge 0 \end{cases}$$

and

$$f'_-(x) = \begin{cases} 1 + 3x^2 & \text{if } x \le 0 \\ -1 - 3x^2 & \text{if } x > 0 \end{cases}$$

To see that this is the case, let us consider the origin, which is the most delicate point. We have

$$f'_+(0) = \lim_{h \to 0^+} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^+} \frac{-h - h^3}{h} = \lim_{h \to 0^+} -(1 + h^2) = -1$$

and

$$f'_-(0) = \lim_{h \to 0^-} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^-} \frac{h + h^3}{h} = \lim_{h \to 0^-} (1 + h^2) = 1$$

Therefore f′₊(x) ≤ f′₋(x) for every x ∈ R and both derivatives are decreasing. By Theorem 918 the function is concave. ∎
³A version of the Mean Value Theorem for unilateral derivatives would prove the result without any differentiability assumption on f.
Unilateral derivatives are key in the previous theorem because concavity per se ensures only their existence, not that of the bilateral derivative. Unilateral derivatives are, however, less easy to handle than the bilateral derivative. In applications, differentiability is often assumed. In this case we have the following simple consequence of the previous theorem that, under differentiability, provides a useful test of concavity.

Corollary 920 Let f : C ⊆ R → R be differentiable on int C and continuous on C. Then:

(i) f is concave if and only if the derivative function f′ is decreasing on int C;

(ii) f is strictly concave if and only if the derivative function f′ is strictly decreasing on int C.
Under differentiability, a necessary and sufficient condition for a function to be (strictly) concave is thus that its first derivative be (strictly) decreasing.⁴
[Figure: graph of f(x) = −|x³|]
⁴When C is open, the continuity hypothesis becomes superfluous (a similar observation applies to Corollary 922 below).
[Figure: graph of g(x) = −eˣ]
The derivatives are strictly decreasing and therefore f and g are strictly concave thanks to Corollary 920. ∎

This corollary provides a simple differential criterion of concavity, reducing the test of concavity to the often operationally simple test of a property of first derivatives. It is actually possible to do even better by recalling the differential characterization of monotonicity seen in Section 20.4.
Corollary 922 Let f : C ⊆ R → R be twice differentiable on int C and continuous on C. Then:

(i) f is concave if and only if f″(x) ≤ 0 for every x ∈ int C;

(ii) f is strictly concave if f″(x) < 0 for every x ∈ int C.

Proof (i) It is sufficient to observe that, thanks to the "decreasing" version of Proposition 835, the first derivative f′ is decreasing on int C if and only if f″(x) ≤ 0 for every x ∈ int C. (ii) It follows from the "strictly decreasing" version of Proposition 837. ∎
Under the further hypothesis that f is twice differentiable on int C, concavity thus becomes equivalent to the negativity of the second derivative, a condition often easier to check than the decreasing monotonicity of the first derivative. In any case, thanks to the last two corollaries we now have powerful differential tests of concavity.⁵

Note the asymmetry between points (i) and (ii): while in (i) decreasing monotonicity is a necessary and sufficient condition for concavity, in (ii) strictly decreasing monotonicity is only a sufficient condition for strict concavity. This follows from the analogous asymmetry for monotonicity between Propositions 835 and 837.
⁵Dual results hold for convex functions, with increasing instead of decreasing monotonicity (and f″ ≥ 0 instead of f″ ≤ 0).
Example 923 (i) The functions f(x) = √x and g(x) = log x have, respectively, derivatives f′(x) = 1/(2√x) and g′(x) = 1/x, which are strictly decreasing. Therefore, the functions are strictly concave. The second derivatives f″(x) = −1/(4x^(3/2)) < 0 and g″(x) = −1/x² < 0 confirm this conclusion.

(ii) The function f(x) = x² has derivative f′(x) = 2x, which is strictly increasing. Therefore, it is strictly convex. Indeed, f″(x) = 2 > 0.

(iii) The function f(x) = x³ has derivative f′(x) = 3x², which is strictly decreasing on (−∞, 0] and strictly increasing on [0, +∞): the function is thus concave on (−∞, 0] and convex on [0, +∞). Indeed, the second derivative f″(x) = 6x is ≤ 0 on (−∞, 0] and ≥ 0 on [0, +∞). ∎
Proof Let f be concave and let x and y be two distinct points of (a, b). If λ ∈ (0, 1) we have

$$f(x + (1 - \lambda)(y - x)) = f(\lambda x + (1 - \lambda) y) \ge \lambda f(x) + (1 - \lambda) f(y)$$

Therefore,

$$\frac{f(x + (1 - \lambda)(y - x)) - f(x)}{1 - \lambda} \ge f(y) - f(x)$$

Dividing and multiplying the left-hand side by y − x, we get

$$\frac{f(x + (1 - \lambda)(y - x)) - f(x)}{(1 - \lambda)(y - x)} (y - x) \ge f(y) - f(x)$$

This inequality holds for every λ ∈ (0, 1). Hence, thanks to the differentiability of f at x, we have

$$\lim_{\lambda \to 1} \frac{f(x + (1 - \lambda)(y - x)) - f(x)}{(1 - \lambda)(y - x)} (y - x) = f'(x)(y - x)$$

Therefore, f′(x)(y − x) ≥ f(y) − f(x), as desired. ∎
The right-hand side of inequality (22.6) is the tangent line of f at x, that is, the linear approximation of f that holds, locally, at x. By Theorem 924, such a line always lies above the graph of the function: the approximation is "in excess".

Geometrically, this remarkable property is clear: the definition of concavity requires that the straight line through the two points (x, f(x)) and (y, f(y)) lie below the graph of f in the interval between x and y, and therefore that it lie above it outside that interval.⁶ Letting y tend to x, the straight line becomes tangent and lies entirely above the curve.
⁶For completeness, let us prove it. Let z be exterior to the interval [x, y]; suppose z > y. We can then write y = λx + (1 − λ)z with λ ∈ (0, 1) and, by the concavity of f, we have f(y) ≥ λf(x) + (1 − λ)f(z), that is, f(z) ≤ (1 − λ)⁻¹f(y) − λ(1 − λ)⁻¹f(x). Being μ = 1/(1 − λ) > 1 and −λ/(1 − λ) = 1 − 1/(1 − λ) = 1 − μ < 0, we have f(z) = f(μy + (1 − μ)x) ≤ μf(y) + (1 − μ)f(x) for every μ > 1. If z < x we reason in a dual way.
22.2. VECTOR FUNCTIONS 641
[Figure: the tangent line f(x) + f′(x)(y − x) lies above the graph of f, touching it at (x, f(x)).]
Theorem 925 Let f : (a, b) → R be differentiable on (a, b). Then, f is concave if and only if

$$f(y) \le f(x) + f'(x)(y - x) \qquad \forall x, y \in (a, b) \qquad (22.7)$$

For a function f with a derivative on (a, b), a necessary and sufficient condition for concavity is thus that the tangent lines at the various points of its domain all lie above its graph.
Proof The "only if" follows from the previous theorem. We prove the "if". Suppose that inequality (22.7) holds and consider the point z = λx + (1 − λ)y. Let us apply (22.7) twice: first with the points z and x, and then with the points z and y:

$$f(x) - f(\lambda x + (1 - \lambda) y) \le f'(z)(1 - \lambda)(x - y)$$

$$f(y) - f(\lambda x + (1 - \lambda) y) \le f'(z)\,\lambda(y - x)$$

Let us multiply the first inequality by λ and the second one by 1 − λ. Adding up:

$$\lambda f(x) + (1 - \lambda) f(y) - f(\lambda x + (1 - \lambda) y) \le 0$$

∎
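The tangent-line characterization can be checked numerically for a concrete concave function; a sketch with f = log on an arbitrary grid of sample points:

```python
# For concave log, the tangent line at x lies weakly above the graph:
# log(x) + (1/x)(y - x) - log(y) >= 0 for all x, y > 0.
import math

def tangent_gap(x, y):
    return math.log(x) + (1 / x) * (y - x) - math.log(y)

points = [0.5, 1.0, 2.0, 5.0]
gaps = [tangent_gap(x, y) for x in points for y in points]
print(all(g >= 0 for g in gaps))  # True: tangent lines overestimate log
```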
Relative to Theorem 790, here the continuity of the partial derivatives is not required.

A key question in the vector case is: what is the vector counterpart of the decreasing monotonicity of the first derivative? Recall that in the scalar case this property characterizes concavity, as Corollary 920 showed. For vector functions, the derivative function f′ becomes the derivative operator ∇f : C → Rⁿ (Section 19.1.1). In the Appendix we present a notion of monotonicity for operators. By applying this notion to the derivative operator, we next extend Corollary 920 to vector functions.
A dual result, with the opposite inequality, characterizes convex functions. The next result makes this characterization operational via a negativity condition on the Hessian matrix ∇²f(x) of f (that is, the matrix of the second partial derivatives of f) that generalizes the condition f″(x) ≤ 0 of Corollary 922. In other words, the role of the second derivative is played in the general case by the Hessian matrix.

This is the most useful criterion to determine whether a function is concave. Naturally, dual results hold for convex functions, which are characterized by positive semi-definite Hessian matrices.
and we saw that its Hessian matrix is positive definite. By Theorem 928, f is strictly convex. ∎

The next result extends Theorems 924 and 925 to vector functions.

Theorem 930 Let f : C ⊆ Rⁿ → R be defined on an open convex set C.

(i) If f is concave and differentiable at x ∈ C, then

$$f(y) \le f(x) + \nabla f(x) \cdot (y - x) \qquad \forall y \in C \qquad (22.9)$$

(ii) If f is differentiable on C, then f is concave if and only if (22.9) holds for every x, y ∈ C.
22.3. SUFFICIENCY OF THE FIRST ORDER CONDITION 643
It is easy to see that, for strictly concave functions, inequality (22.9) is strict. The right-hand side of (22.9) is the linear approximation of f at x; geometrically, it is the hyperplane tangent to f at x, that is, the vector version of the tangent line. By this theorem, the approximation is from above: the tangent hyperplane always lies above the graph of a concave function. The differential characterizations of concavity discussed in the previous section for scalar functions thus extend nicely to vector functions.
$$f(y) \le f(\hat{x}) + f'(\hat{x})(y - \hat{x}) = f(\hat{x}) \qquad \forall y \in (a, b)$$

Proposition 931 Let f : (a, b) → R be a concave and differentiable function. A point x̂ ∈ (a, b) is a global maximizer of f on (a, b) if and only if f′(x̂) = 0.
Example 934 Consider the function f : R² → R defined by f(x₁, x₂) = −(x₁ − 1)² − (x₂ + 3)² − 6. We have

$$\nabla^2 f(x_1, x_2) = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}$$

Since −2 < 0 and det ∇²f(x₁, x₂) = 4 > 0, the Hessian matrix is negative definite for every (x₁, x₂) ∈ R² and hence f is strictly concave. We have

$$\nabla f(x_1, x_2) = (-2(x_1 - 1), \; -2(x_2 + 3))$$

The unique point where the gradient vanishes is (1, −3), which is, therefore, the unique global maximizer. The maximum value of f on R² is f(1, −3) = −6. ∎
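A direct numerical confirmation of Example 934; the sample points are arbitrary:

```python
# f(x1, x2) = -(x1-1)² - (x2+3)² - 6 has its maximum -6 at (1, -3).

def f(x1, x2):
    return -(x1 - 1) ** 2 - (x2 + 3) ** 2 - 6

def grad(x1, x2):
    return (-2 * (x1 - 1), -2 * (x2 + 3))

print(grad(1, -3))  # (0, 0): the point is stationary
print(f(1, -3))     # -6, the maximum value
samples = [(-2, 0), (0, 0), (1, -2.9), (3, -3)]
print(all(f(a, b) < f(1, -3) for a, b in samples))  # True
```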
22.4 Superdifferentials

Theorem 930 showed that differentiable concave functions satisfy the important inequality⁷

$$f(y) \le f(x) + \nabla f(x) \cdot (y - x) \qquad \forall y \in C \qquad (22.10)$$

This inequality has a natural geometric interpretation: the tangent hyperplane (line, in the scalar case) lies above the graph of f, which it touches only at (x, f(x)). Next we show, without proof, that this property actually characterizes the differentiability of concave functions:⁸ f is differentiable at x if and only if there is a unique vector p ∈ Rⁿ such that

$$f(y) \le f(x) + p \cdot (y - x) \qquad \forall y \in C$$

In other words, this geometric property is peculiar to the tangent hyperplanes of concave functions.
For concave functions, differentiability is thus equivalent to the existence of a unique vector, the gradient, for which the basic inequality (22.10) holds; equivalently, to the existence of a unique linear functional l : Rⁿ → R such that f(y) ≤ f(x) + l(y − x) for all y ∈ C. Consequently, non-differentiability is equivalent either to the existence of more than one vector for which (22.10) holds or to the non-existence of any such vector. This observation motivates the next definition, where C is any convex (possibly not open) set.

The superdifferential thus consists of all vectors (and so of all linear functions) for which (22.10) holds. No such vector may exist (Example 943 below); in that case the superdifferential is empty and the function is not superdifferentiable at the basepoint.
In words, r is equal to f at the basepoint x and dominates f elsewhere. It follows that ∂f(x) identifies the set of all affine functions that touch the graph of f at x and lie above this graph at all other points of the domain. In the scalar case, affine functions are straight lines. So, in the next figure the straight lines r, r′, and r″ belong to the superdifferential ∂f(x) of a concave scalar function.

It is easy to see that, at the points where the function is differentiable, the only straight line that satisfies conditions (22.12)-(22.13) is the tangent line f(x) + f′(x)(y − x). But, at the points where the function is not differentiable, we may well have several straight lines r : R → R that satisfy such conditions, that is, that touch the graph of the function at the basepoint x and lie above the graph elsewhere. The superdifferential, being the collection of these straight lines, can thus be viewed as a surrogate of the tangent line, i.e., of the differential. This is the idea behind the superdifferential: it is a surrogate of the differential when the latter does not exist. The next result confirms this intuition.
Example 938 Consider f : R → R defined by f(x) = 1 − |x|. The only point where f is not differentiable is x = 0. By Proposition 937, ∂f(x) = {f′(x)} for each x ≠ 0. It remains to determine ∂f(0). This amounts to finding the scalars p that satisfy the inequality

$$1 - |y| \le 1 - |0| + p(y - 0) \qquad \forall y \in \mathbb{R}$$

i.e., the scalars p such that −|y| ≤ py for each y ∈ R. If y = 0, this inequality trivially holds for all p. If y ≠ 0, we have

$$p \, \frac{y}{|y|} \ge -1 \qquad (22.14)$$

Since

$$\frac{y}{|y|} = \begin{cases} 1 & \text{if } y > 0 \\ -1 & \text{if } y < 0 \end{cases}$$

from (22.14) it follows both p ≥ −1 and −p ≥ −1. That is, p ∈ [−1, 1]. We conclude that ∂f(0) = [−1, 1]. Thus:

$$\partial f(x) = \begin{cases} \{-1\} & \text{if } x > 0 \\ [-1, 1] & \text{if } x = 0 \\ \{1\} & \text{if } x < 0 \end{cases}$$

∎
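The conclusion ∂f(0) = [−1, 1] can be illustrated by brute force: on a grid of points, the supergradient inequality f(y) ≤ f(0) + py holds for p inside [−1, 1] and fails outside it. The grid and tolerance are illustrative choices:

```python
# Supergradient test for f(x) = 1 - |x| at the basepoint 0.

def is_supergradient(p, grid):
    return all(1 - abs(y) <= 1 + p * y + 1e-12 for y in grid)

grid = [i / 10 for i in range(-30, 31)]  # points in [-3, 3]
print(is_supergradient(0.5, grid))   # True: 0.5 ∈ ∂f(0)
print(is_supergradient(-1.0, grid))  # True: a boundary value works too
print(is_supergradient(1.5, grid))   # False: fails for y < 0
```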
In words, the superdifferential of a scalar function consists of all coefficients that lie between the right and left derivatives. This makes precise the geometric intuition we gave above for scalar functions.

Proof We only prove that ∂f(x) ⊆ [f′₊(x), f′₋(x)]. Let p ∈ ∂f(x). Given any h ≠ 0, by definition we have f(x + h) ≤ f(x) + ph. If h > 0, then

$$\frac{f(x + h) - f(x)}{h} \le \frac{f(x) + ph - f(x)}{h} = p$$

and so f′₊(x) ≤ p. If h < 0, then

$$\frac{f(x + h) - f(x)}{h} \ge \frac{f(x) + ph - f(x)}{h} = p$$

and so p ≤ f′₋(x). We conclude that p ∈ [f′₊(x), f′₋(x)], as desired. ∎
$$p_i = p \cdot e^i \ge f(e^i) = 0 \qquad \forall i = 1, \dots, n$$

$$\sum_{i=1}^{n} p_i = p \cdot (1, \dots, 1) \ge f(1, \dots, 1) = 1$$

$$-\sum_{i=1}^{n} p_i = p \cdot (-1, \dots, -1) \ge f(-1, \dots, -1) = -1$$

we conclude that $\sum_{i=1}^{n} p_i = 1$ and pᵢ ≥ 0 for each i = 1, ..., n. That is, p belongs to the simplex Δⁿ⁻¹. Thus, ∂f(0) ⊆ Δⁿ⁻¹. On the other hand, if p ∈ Δⁿ⁻¹, then

$$p \cdot y = \sum_{i=1}^{n} p_i y_i \ge \min_{1 \le i \le n} y_i = f(y) \qquad \forall y \in \mathbb{R}^n$$

and so p ∈ ∂f(0). We conclude that ∂f(0) = Δⁿ⁻¹, that is, the superdifferential at the origin is the simplex. The reader can check that, for every x ∈ Rⁿ,

$$\partial f(x) = \{ p \in \Delta^{n-1} : p \cdot x = f(x) \}$$

i.e., ∂f(x) consists of the vectors p of the simplex such that p · x = f(x). ∎
If f is differentiable, this result reduces to point (i) of Theorem 930. The next result generalizes point (ii) of that theorem by showing that concave functions are everywhere superdifferentiable and that, moreover, this property exactly characterizes concave functions (further proof of the tight connection between superdifferentiability and concavity).
Proof We only prove the "if" part. Suppose ∂f(x) ≠ ∅ at all x ∈ C. Let x₁, x₂ ∈ C and t ∈ [0, 1]. Let p ∈ ∂f(tx₁ + (1 − t)x₂). By (22.11),

$$f(x_1) \le f(tx_1 + (1 - t)x_2) + (1 - t)\, p \cdot (x_1 - x_2)$$

$$f(x_2) \le f(tx_1 + (1 - t)x_2) + t\, p \cdot (x_2 - x_1)$$

Hence,

$$f(tx_1 + (1 - t)x_2) \ge t\,[f(x_1) - (1 - t)\, p \cdot (x_1 - x_2)] + (1 - t)\,[f(x_2) - t\, p \cdot (x_2 - x_1)] = t f(x_1) + (1 - t) f(x_2)$$

as desired. ∎
The maintained hypothesis that C is open is key for the last two propositions, as the
next example shows.
Example 943 Consider f : [0, +∞) → R defined by f(x) = √x. The only point of the (closed) domain at which the function is not differentiable is the boundary point x = 0. The superdifferential ∂f(0) is given by the scalars p such that

$$\sqrt{y} \le \sqrt{0} + p(y - 0) \qquad \forall y \ge 0 \qquad (22.16)$$

i.e., such that √y ≤ py for each y ≥ 0. If y = 0, this inequality holds for all p. If y > 0, the inequality is equivalent to p ≥ √y / y = 1/√y. But, letting y tend to 0, this requires p ≥ lim_{y→0⁺} 1/√y = +∞. Therefore, there is no scalar p for which (22.16) holds. It follows that ∂f(0) = ∅. We conclude that f is not superdifferentiable at the boundary point 0. ∎
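The emptiness of ∂f(0) in Example 943 can be seen numerically: whatever candidate slope p one tries, a small enough y > 0 violates √y ≤ py, since the inequality forces p ≥ 1/√y → +∞. The candidate slopes below are arbitrary:

```python
# For f(x) = √x on [0, ∞), no scalar p works at the boundary point 0.
import math

def violates(p):
    # search for some y > 0 with sqrt(y) > p*y
    ys = [10.0 ** (-k) for k in range(0, 12)]
    return any(math.sqrt(y) > p * y for y in ys)

print(all(violates(p) for p in (0.0, 1.0, 100.0, 10000.0)))  # True: ∂f(0) = ∅
```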
N.B. We focused on open convex sets C to ease matters, but this example shows that non-open domains may be important. Fortunately, the results of this section can be easily extended to such domains. For instance, Proposition 942 can be stated for any convex set C (possibly not open) by saying that a continuous function f : C → R is concave on int C if and only if ∂f(x) is non-empty at all x ∈ int C, i.e., at all interior points x of C.⁹ The concave function f(x) = √x is indeed differentiable (and so superdifferentiable, with ∂f(x) = {f′(x)}) at all x ∈ (0, ∞), that is, at all interior points of the function's domain R₊. O
For concave functions, this theorem gives the most general version of the first-order condition. Indeed, in view of Corollary 937, Theorem 933 is a special case of this result.
⁹ If the domain C is not assumed to be open, we need to require continuity (which is otherwise automatically satisfied by Theorem 609).
22.5. APPENDIX: MONOTONICITY OF OPERATORS 649
The next example shows how this corollary makes it possible to find maximizers even when Fermat’s Theorem does not apply because there are points where the function is not differentiable.

Example 946 For the function f : R → R defined by f(x) = 1 − |x| we have (Example 938):

∂f(x) = {−1} if x > 0;  [−1, 1] if x = 0;  {1} if x < 0

By Corollary 945, x̂ = 0 is a maximizer since 0 ∈ ∂f(0). N
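The superdifferential criterion can be checked numerically. The sketch below (our own illustration, not part of the text) verifies on a grid that ξ = 0 satisfies the superdifferential inequality at x = 0 for f(x) = 1 − |x|, which certifies 0 as a maximizer, while a scalar such as ξ = 2 does not belong to ∂f(0).

```python
# Numerical sketch: for f(x) = 1 - |x|, check whether a scalar xi satisfies the
# superdifferential inequality f(y) <= f(x0) + xi*(y - x0) on a grid of points.
def f(x):
    return 1 - abs(x)

def in_superdifferential(xi, x0, grid):
    """Check the superdifferential inequality on a grid (with a tiny tolerance)."""
    return all(f(y) <= f(x0) + xi * (y - x0) + 1e-12 for y in grid)

grid = [k / 100 for k in range(-300, 301)]
assert in_superdifferential(0.0, 0.0, grid)       # 0 is in ∂f(0): 0 is a maximizer
assert in_superdifferential(0.5, 0.0, grid)       # any xi in [-1, 1] also works
assert not in_superdifferential(2.0, 0.0, grid)   # 2 is not in ∂f(0)
```

A grid check of this kind is only a sanity test, not a proof: the inequality must hold for all y, not just the sampled ones.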
and strictly monotone (decreasing) if inequality (22.17) is strict whenever x ≠ y. The reader can verify that for n = 1 we obtain again the usual notions of monotonicity.
Implicit functions

Consider a scalar function written as

y = f(x)

This is the usual explicit form which, by keeping the independent variable x separate from the dependent one y, permits us to determine the values of the latter from those of the former. The same function can be rewritten in implicit form, that is, through an equation that keeps all the variables on the same side of the equality sign:

g(x, f(x)) = 0

where, for instance, g(x, y) = f(x) − y.
Example 948 (i) The function f(x) = x² + x − 3 can be written in implicit form as g(x, f(x)) = 0 with g(x, y) = x² + x − 3 − y. (ii) The function f(x) = 1 + log x can be written in implicit form as g(x, f(x)) = 0 with g(x, y) = 1 + log x − y. N
Note how

g⁻¹(0) ∩ (A × Im f) = Gr f

The graph of the function f thus coincides with the level curve g⁻¹(0) at 0 of the two-variable function g.²
The implicit rewriting of a scalar function f whose explicit form is known is little more than a curiosity, because the explicit form contains all the relevant information on f, in particular about the kind of dependence existing between the independent variable x and
¹ In this section, to ease exposition, we denote a function g of two variables by g(x, y) rather than by g(x₁, x₂), as in the rest of the text.
² The rectangle A × Im f has as its factors the domain and the image of f. Clearly, Gr f ⊆ A × Im f. For example, for the function f(x) = √x this rectangle is the first orthant R²₊ of the plane, while for the function f(x) = √(x − x²) it is the rectangle [0, 1] × [0, 1/2] of the plane.
652 CHAPTER 23. IMPLICIT FUNCTIONS
the dependent one y. Unfortunately, in applications we often find important scalar functions that are not defined in explicit form, “ready to use”, but only in implicit form through equations g(x, y) = 0. For this reason, it is important to consider the inverse problem: does an equation of the type g(x, y) = 0 implicitly define a scalar function f? In other words, does there exist f such that g(x, f(x)) = 0? This chapter addresses this question by showing that, under suitable regularity conditions, this function f exists and is unique (locally or globally, as will become clear).
Lemma 949 Let A and B be any two sets in R and let g : C ⊆ R² → R with A × B ⊆ C. The scalar function f : A → B is the unique function in B^A with the property

g(x, f(x)) = 0   ∀x ∈ A   (23.1)

if and only if

g⁻¹(0) ∩ (A × B) = Gr f   (23.2)

Proof “Only if”. Let (x, y) ∈ Gr f. By definition, (x, y) ∈ A × B and y = f(x), thus g(x, y) = g(x, f(x)) = 0. This implies (x, y) ∈ g⁻¹(0) ∩ (A × B), and so Gr f ⊆ g⁻¹(0) ∩ (A × B). As to the converse inclusion, let (x, y) ∈ g⁻¹(0) ∩ (A × B). We want to show that y = f(x). Suppose not, i.e., y ≠ f(x). Define f̃ : A → R by f̃(x′) = f(x′) if x′ ≠ x and f̃(x) = y. Since g(x, y) = 0, we have g(x′, f̃(x′)) = 0 for every x′ ∈ A. Since (x, y) ∈ A × B, we have f̃ ∈ B^A. Being f̃ ≠ f, this contradicts the uniqueness of f. We conclude that (23.2) holds, as desired. “If”. By definition, (x, f(x)) ∈ Gr f for each x ∈ A. By (23.2), we have (x, f(x)) ∈ g⁻¹(0), and so g(x, f(x)) = 0 for each x ∈ A. It remains to prove the uniqueness of f. Let h ∈ B^A satisfy (23.1). By arguing as in the first inclusion of the “only if” part of the proof, we can prove that Gr h ⊆ g⁻¹(0) ∩ (A × B). By (23.2), this yields Gr h ⊆ Gr f. If we consider x ∈ A, then (x, h(x)) ∈ Gr h ⊆ Gr f. Since (x, h(x)) ∈ Gr f, then (x, h(x)) = (x′, f(x′)) for some x′ ∈ A. This implies x = x′ and h(x) = f(x′), and so h(x) = f(x). Since x was arbitrarily chosen, we conclude that f = h, as desired.
³ In this case g⁻¹(0) = {(x, y) ∈ A × B : g(x, y) = 0}, and so g⁻¹(0) ∩ (A × B) = g⁻¹(0).
23.2. A LOCAL PERSPECTIVE 653
The function f : B(x₀) → V(y₀) is called implicit and is defined “locally” at the point (x₀, y₀). The local point of view is particularly suited for differential calculus, as the next famous result, the Implicit Function Theorem, shows.⁴ It is the most important result in the study of implicit functions and is widely used in applications.
The function f : B(x₀) → V(y₀) is, therefore, defined implicitly by the equation g(x, y) = 0. Since f is unique and surjective, in view of Lemma 949 the relation (23.5) is equivalent to (23.7), that is, to

g⁻¹(0) ∩ (B(x₀) × V(y₀)) = Gr f   (23.8)

Thus, the level curve g⁻¹(0) can be represented locally by the graph of the implicit function. This is, in the final analysis, the reason why the theorem is so important in applications (as we will see shortly in Section 23.2.2).
⁴ This theorem first appeared in lecture notes that Ulisse Dini prepared in the 1870s. For this reason, it is sometimes named after Dini.
Formula (23.6) permits the computation of the first derivative of the implicit function even without knowing its explicit form. Since the first derivative is often what really matters about such a function (because, for example, we are interested in solving a first-order condition), this is a most useful feature of the Implicit Function Theorem.
The proof of the Implicit Function Theorem is in the Appendix. We can, however, derive formula (23.6) heuristically through the total differential

dg = (∂g/∂x) dx + (∂g/∂y) dy

of the function g. We have dg = 0 for variations (dx, dy) that keep us along the level curve g⁻¹(0). Therefore,

(∂g/∂x) dx = −(∂g/∂y) dy

which “implies” (the power of heuristics!):

dy/dx = −(∂g/∂x)/(∂g/∂y)

It is a rough (and incorrect), but certainly useful, argument to remember (23.6).
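The heuristic can be sanity-checked numerically. The sketch below (our own illustration; the function g is chosen so that the level curve has a known explicit form) compares the finite-difference slope of the level curve of g(x, y) = x³ + y³ − 2 at (1, 1) with −(∂g/∂x)/(∂g/∂y) = −x²/y² = −1 there.

```python
# Numeric check of dy/dx = -(∂g/∂x)/(∂g/∂y) on g(x, y) = x^3 + y^3 - 2,
# whose level curve g = 0 is explicitly y = (2 - x^3)^(1/3).
def y_on_curve(x):
    return (2 - x**3) ** (1 / 3)

h = 1e-6
slope_numeric = (y_on_curve(1 + h) - y_on_curve(1 - h)) / (2 * h)  # central difference
slope_formula = -(3 * 1**2) / (3 * 1**2)                           # = -1 at (1, 1)
assert abs(slope_numeric - slope_formula) < 1e-6
```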
Example 952 In the trivial case of a linear function g(x, y) = ax + by − k, the equation g(x, y) = 0 becomes ax + by − k = 0. From it we immediately get

y = f(x) = −(a/b) x + k/b

provided b ≠ 0. Even in this very simple case, the existence of an implicit function requires the condition b = ∂g(x, y)/∂y ≠ 0. N
Sometimes it is possible to find stationary points of the implicit function without knowing its explicit form. When this happens, it is a remarkable application of the Implicit Function Theorem. For instance, consider in the previous example the point (4, 2) ∈ g⁻¹(0). We have (∂g/∂y)(4, 2) = 32 ≠ 0. Let f : B(4) → V(2) be the unique function then defined implicitly at the point (4, 2).⁶ We get:

f′(4) = −(∂g/∂x)(4, 2) / (∂g/∂y)(4, 2) = −0/32 = 0

Therefore, the point x₀ = 4 is a stationary point of the implicit function f. It is possible to check that it is actually a local maximizer.
⁵ The reader can verify that also (−12, 2) ∈ g⁻¹(0) and (∂g/∂y)(−12, 2) ≠ 0, and calculate f′(−12) for the implicit function defined at (−12, 2).
⁶ This function is different from the previous implicit function defined at the other point (−4, 2).
Example 954 (i) Consider the function g : R² → R given by g(x, y) = 7x² + 2y − e^y. The hypotheses of the Implicit Function Theorem are satisfied at every point (x₀, y₀) ∈ R² with e^{y₀} ≠ 2. Thus, equation g(x, y) = 0 implicitly defines at such a point (x₀, y₀) ∈ g⁻¹(0) a continuously differentiable scalar function f : B(x₀) → V(y₀) with

f′(x) = −(∂g(x, y)/∂x) / (∂g(x, y)/∂y) = −14x/(2 − e^y)   (23.10)

for every (x, y) ∈ g⁻¹(0) ∩ (B(x₀) × V(y₀)). Even if we do not know the form of f, we have been able to find its derivative function f′. For instance, at the point (1/√7, 0) ∈ g⁻¹(0) the first-order local approximation is

f(x) = −2√7 (x − 1/√7) + o(x − 1/√7)

as x → 1/√7. (ii) Similarly, at a point (x₀, y₀) with x₀ = 0 the first-order local approximation is, as x → 0,

f(x) = y₀ + f′(0) x + o(x) = (1/4) x + o(x)

N
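Even without an explicit f, the implicit function can be traced numerically. The sketch below (our own illustration) solves g(x, y) = 7x² + 2y − e^y = 0 for y by bisection near (1/√7, 0), where g is increasing in y, and checks that the finite-difference slope of the resulting implicit function matches −14x/(2 − e^y).

```python
import math

# Numeric verification of the implicit-derivative formula for
# g(x, y) = 7x^2 + 2y - e^y near the point (1/sqrt(7), 0).
def g(x, y):
    return 7 * x**2 + 2 * y - math.exp(y)

def implicit_y(x, lo=-1.0, hi=0.5):
    """Bisection on y -> g(x, y), which is increasing for y < ln 2."""
    for _ in range(80):
        mid = (lo + hi) / 2
        if g(x, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x0 = 1 / math.sqrt(7)
h = 1e-5
slope_numeric = (implicit_y(x0 + h) - implicit_y(x0 - h)) / (2 * h)
slope_formula = -14 * x0 / (2 - math.exp(implicit_y(x0)))   # equals -2*sqrt(7) here
assert abs(slope_numeric - slope_formula) < 1e-4
assert abs(slope_formula + 2 * math.sqrt(7)) < 1e-6
```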
By exchanging the variables in the Implicit Function Theorem, we can say that the continuity of the partial derivatives of g in a neighborhood of (x₀, y₀) and the condition ∂g(x₀, y₀)/∂x ≠ 0 ensure the existence of a (unique) implicit function x = φ(y) such that, locally, we have g(φ(y), y) = 0. It follows that, if at least one of the two partial derivatives ∂g(x₀, y₀)/∂x and ∂g(x₀, y₀)/∂y is not zero, there is locally a univocal tie between the two variables. As a result, the Implicit Function Theorem fails to apply only when both partial derivatives ∂g(x₀, y₀)/∂y and ∂g(x₀, y₀)/∂x are zero.
For example, if g(x, y) = x² + y² − 1, then for every point (x₀, y₀) that satisfies the equation g(x, y) = 0 we have ∂g(x₀, y₀)/∂y = 2y₀, which is zero only for y₀ = 0 (and hence x₀ = ±1). At the two points (1, 0) and (−1, 0) the equation does not, indeed, define any implicit function of the type y = f(x). But ∂g(±1, 0)/∂x = ±2 ≠ 0 and, therefore, at these points the equation defines an implicit function of the type x = φ(y). Symmetrically, at the two points (0, 1) and (0, −1) the equation defines an implicit function of the type y = f(x), but not one of the type x = φ(y).
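The partial-derivative criterion on the circle can be tabulated in a few lines (our own illustrative check): at each of the four critical points, the nonvanishing partial tells us which variable can be solved for locally.

```python
# For g(x, y) = x^2 + y^2 - 1: by the Implicit Function Theorem,
# ∂g/∂y != 0 suffices for a local y = f(x), and ∂g/∂x != 0 for a local x = φ(y).
def partials(x, y):
    return 2 * x, 2 * y   # (∂g/∂x, ∂g/∂y)

for (x0, y0) in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
    gx, gy = partials(x0, y0)
    assert gx != 0 or gy != 0   # the two partials never vanish together on the circle

assert partials(1, 0)[1] == 0    # no local y = f(x) at (1, 0)
assert partials(0, 1)[0] == 0    # no local x = φ(y) at (0, 1)
```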
This last remark suggests a final important observation on the Implicit Function Theorem. Suppose that, as at the beginning of the chapter, φ is a standard function defined in explicit form, which can be written in implicit form through

g(x, y) = φ(x) − y

Given (x₀, y₀) ∈ g⁻¹(0), suppose ∂g(x₀, y₀)/∂x ≠ 0. The Implicit Function Theorem (in “exchanged” form) then ensures the existence of neighborhoods B(y₀) and V(x₀) and of a unique function f : B(y₀) → V(x₀) such that

g(f(y), y) = 0   ∀y ∈ B(y₀)

The function f is, therefore, the inverse of φ on the neighborhood B(y₀). The Implicit Function Theorem thus implies the existence – locally, around the point y₀ – of the inverse of φ. In particular, formula (23.6) here becomes

f′(y₀) = −(∂g/∂y)(x₀, y₀) / (∂g/∂x)(x₀, y₀) = 1/φ′(x₀)

which is the classical formula (18.20) for the derivative of the inverse function. In sum, there is a close connection between implicit and inverse functions, which the reader will explore in more advanced courses.
since g⁻¹(k) = g_k⁻¹(0). The Implicit Function Theorem enables us to study g_k⁻¹(0) locally, and so g⁻¹(k). In particular, the implicit function f : B(x₀) → V(y₀) permits us to establish a functional representation of the level curve g⁻¹(k) through the fundamental relation

g⁻¹(k) ∩ (B(x₀) × V(y₀)) = Gr f   (23.12)

which is the general form of (23.7) for any k ∈ R. Implicit functions thus describe the link between the variables x and y that belong to the same level curve, making it possible to formulate through them some key properties of these curves. The great effectiveness of this formulation explains the importance of implicit functions, as mentioned right after (23.7).
For example, the isoquant g⁻¹(k) is a level curve of the production function g : R²₊ → R, which features two inputs, x and y, and one output. The points (x, y) that belong to the isoquant are all the input combinations that keep constant the quantity of output produced. The implicit function y = f(x) tells us, locally, how the quantity y has to change, when x varies, in order to keep the overall production constant. Therefore, the properties of the function f : B(x₀) → V(y₀) characterize, locally, the relations between the inputs that guarantee the level k of output. We usually assume that f is:

(i) decreasing, that is, f′(x) ≤ 0 for every x ∈ B(x₀): the two inputs are partially substitutable and, to keep the quantity produced unchanged at the level k, lower quantities of the input x must correspond to larger quantities of the input y (and vice versa);

(ii) convex, that is, f″(x) ≥ 0 for every x ∈ B(x₀): at greater levels of x, larger and larger quantities of y are needed to compensate (negative) infinitesimal variations of x in order to keep production at level k.
The absolute value |f′| of the derivative of the implicit function is called marginal rate of transformation because, for infinitesimal variations of the inputs, it describes their degree of substitutability – that is, the variation of y that balances an increase in x. Thanks to the functional representation (23.12) of the isoquant, geometrically the marginal rate of transformation can be interpreted as the slope of the isoquant at (x, y). This is the classical interpretation of the rate, which follows from (23.12).

The Implicit Function Theorem implies the classical formula

MRT_{x,y} = −f′(x) = (∂g/∂x)(x, y) / (∂g/∂y)(x, y)   (23.13)

This is the usual form in which the notion of marginal rate of transformation MRT_{x,y} appears.⁷
For example, at a point at which we use equal quantities of the two inputs – that is, x = y – if we increase the first input by one unit, the second one must decrease by α/(1 − α) units to leave the quantity of output produced unchanged: in particular, when α = 1/2, the decrease of the second one must be of one unit. At a point at which we use a quantity of the second input five times bigger than that of the first one – that is, y = 5x – an increase of one unit of the first input is compensated by a decrease of 5α/(1 − α) of the second one. N
Similar considerations hold for the level curves of a utility function u : R²₊ → R, that is, for its indifference curves u⁻¹(k). The implicit functions tell us, locally, how the quantity y has to vary when x varies in order to keep the overall utility level constant. For them we assume properties of monotonicity and convexity similar to those assumed for the implicit functions defined by isoquants. The monotonicity of the implicit function reflects the partial substitutability of the two goods: it is possible to consume a bit less of one good and a bit more of the other and keep the overall level of utility unchanged. The convexity of the implicit function models the classical hypothesis of decreasing rates of substitution: when the quantity of a good, for example x, increases, we then need greater and greater “compensative” variations of the other good y in order to remain on the same indifference curve, i.e., in order to have u(x, y) = u(x + Δx, y + Δy).
The absolute value |f′| of the derivative of the implicit function is called marginal rate of substitution: it measures the (negative) variation in y that balances marginally an increase in x. Geometrically, it is the slope of the indifference curve at (x, y). Thanks to the Implicit Function Theorem, we have

MRS_{x,y} = −f′(x) = (∂u/∂x)(x, y) / (∂u/∂y)(x, y)

which is the classical form of the marginal rate of substitution.
Let h be a scalar function with a strictly positive derivative, so that it is strictly increasing and h ∘ u is then a utility function equivalent to u. By the chain rule,

(∂(h ∘ u)/∂x)(x, y) / (∂(h ∘ u)/∂y)(x, y) = [h′(u(x, y)) (∂u/∂x)(x, y)] / [h′(u(x, y)) (∂u/∂y)(x, y)] = (∂u/∂x)(x, y) / (∂u/∂y)(x, y)   (23.14)
Since we can cancel the derivative h′(u(x, y)), the marginal rate of substitution is the same for u and for all its increasing transformations h ∘ u. Thus, the marginal rate of substitution is an ordinal notion, invariant under strictly increasing transformations: it does not depend on which equivalent utility function, u or h ∘ u, is considered. This explains the centrality of this ordinal notion in consumer theory, where it replaced the notion of marginal utility (which is instead, as already observed, a cardinal notion).
Example 956 To illustrate (23.14), consider on R²₊₊ the equivalent Cobb-Douglas utility function u(x, y) = x^a y^{1−a} and log-linear utility function log u(x, y) = a log x + (1 − a) log y. We have

MRS_{x,y} = (∂u/∂x)(x, y) / (∂u/∂y)(x, y) = [a x^{a−1} y^{1−a}] / [(1 − a) x^a y^{−a}] = (a/(1 − a)) (y/x) = (∂ log u/∂x)(x, y) / (∂ log u/∂y)(x, y)

The two utility functions have the same marginal rate of substitution. N
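The ordinality of the MRS is easy to confirm numerically. The sketch below (our own check; the parameter a = 0.3 and the point (2, 3) are assumptions chosen for illustration) computes the MRS of the Cobb-Douglas utility and of its log transformation by finite differences and verifies that they coincide with the closed form a y / ((1 − a) x).

```python
import math

# Finite-difference check that u(x, y) = x^a * y^(1-a) and log u yield the same MRS.
a, x, y, h = 0.3, 2.0, 3.0, 1e-6

def mrs(v):
    vx = (v(x + h, y) - v(x - h, y)) / (2 * h)   # ∂v/∂x by central difference
    vy = (v(x, y + h) - v(x, y - h)) / (2 * h)   # ∂v/∂y by central difference
    return vx / vy

u = lambda x, y: x**a * y**(1 - a)
log_u = lambda x, y: a * math.log(x) + (1 - a) * math.log(y)

assert abs(mrs(u) - mrs(log_u)) < 1e-6
assert abs(mrs(u) - (a / (1 - a)) * (y / x)) < 1e-6   # closed form a*y/((1-a)*x)
```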
Finally, let us consider a consumer that consumes in two periods, today and tomorrow, with intertemporal utility function U : R²₊ → R given by

U(c₁, c₂) = u(c₁) + βu(c₂)

where we assume the same instantaneous utility function u in the two periods. Given a utility level k, let

U⁻¹(k) = {(c₁, c₂) ∈ R²₊ : U(c₁, c₂) = k}
be the intertemporal indifference curve and let (c₁, c₂) be a point of it. When the hypotheses of the Implicit Function Theorem (with the variables exchanged) are satisfied at such a point, there exists an implicit function f : B(c₂) → V(c₁) such that, locally,

U(f(c₂), c₂) = k

The scalar function c₁ = f(c₂) tells us how much consumption today, c₁, has to vary when consumption tomorrow, c₂, varies, in order to keep the overall utility U constant. We have:

f′(c₂) = −(∂U/∂c₂)(c₁, c₂) / (∂U/∂c₁)(c₁, c₂) = −βu′(c₂)/u′(c₁)

When it exists,

IMRS_{c₁,c₂} = −f′(c₂) = βu′(c₂)/u′(c₁)   (23.15)

is called intertemporal marginal rate of substitution: it measures the (negative) variation in c₁ that balances an increase in c₂.
Example 957 Consider the power utility function u(c) = c^γ/γ for γ > 0. We have

U(c₁, c₂) = c₁^γ/γ + β c₂^γ/γ

so that IMRS_{c₁,c₂} = β(c₂/c₁)^{γ−1}.
Theorem 958 If in the Implicit Function Theorem the function g is n times continuously differentiable, then the implicit function f is also n times continuously differentiable. In particular, for n = 2 we have

f″(x) = −[ g″_xx (g′_y)² − 2g″_xy g′_x g′_y + g″_yy (g′_x)² ] / (g′_y)³   (23.16)

where, in brief, g′_x denotes ∂g(x, y)/∂x, g″_xy denotes ∂²g(x, y)/∂x∂y, and so on.
Proof We shall omit the proof of the first part of the statement. Suppose f is twice differentiable and let us apply the chain rule to (23.6), that is, to

f′(x) = −(∂g(x, f(x))/∂x) / (∂g(x, f(x))/∂y) = −g′_x(x, f(x)) / g′_y(x, f(x))

For the sake of brevity we do not make the dependence of the derivatives of g on (x, f(x)) explicit, so we can write

f″(x) = −[ (g″_xx + g″_xy f′(x)) g′_y − g′_x (g″_yx + g″_yy f′(x)) ] / (g′_y)²

Substituting f′(x) = −g′_x/g′_y,

f″(x) = −[ (g″_xx − g″_xy (g′_x/g′_y)) g′_y − g′_x (g″_yx − g″_yy (g′_x/g′_y)) ] / (g′_y)²
      = −[ g″_xx (g′_y)² − 2g″_xy g′_x g′_y + g″_yy (g′_x)² ] / (g′_y)³

as desired.
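Formula (23.16) can be tested on a case where the implicit function is known explicitly. The sketch below (our own check) uses g(x, y) = x² + y² − 1 with f(x) = √(1 − x²): the formula gives f″ = −1/y³, which agrees with differentiating f directly.

```python
import math

# Check of the second-derivative formula on g(x, y) = x^2 + y^2 - 1,
# whose implicit function on the upper semicircle is f(x) = sqrt(1 - x^2).
x = 0.6
y = math.sqrt(1 - x * x)             # point (x, y) on the upper unit circle
gx, gy = 2 * x, 2 * y                # first partials of g
gxx, gxy, gyy = 2.0, 0.0, 2.0        # second partials of g

f2_formula = -(gxx * gy**2 - 2 * gxy * gx * gy + gyy * gx**2) / gy**3
f2_direct = -(1 - x * x) ** (-1.5)   # f''(x) from the explicit f; equals -1/y^3
assert abs(f2_formula - f2_direct) < 1e-12
```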
What we have seen in the two previous theorems allows us to give local approximations of an implicitly defined function. As we know, one is rarely able to write the explicit formulation of a function implicitly defined by an equation: being able to give approximations is hence of great importance.
If g is of class C¹ on an open set U, the first-order approximation of the implicitly defined function at a point (x₀, y₀) ∈ U such that g(x₀, y₀) = 0 is

f(x) = y₀ − [ (∂g/∂x)(x₀, y₀) / (∂g/∂y)(x₀, y₀) ] (x − x₀) + o(x − x₀)

for x → x₀.
If g is of class C² on an open set U, the second-order approximation (often referred to as quadratic) of the implicit function at a point (x₀, y₀) ∈ U such that g(x₀, y₀) = 0 is, for x → x₀,

f(x) = y₀ − (g′_x/g′_y)(x − x₀) − [ (g″_xx (g′_y)² − 2g″_xy g′_x g′_y + g″_yy (g′_x)²) / (2 (g′_y)³) ] (x − x₀)² + o((x − x₀)²)

where we omitted the dependence of the derivatives on the point (x₀, y₀).
f″(x₀) = 316/1331 > 0

and so the point is a local minimizer. N
g(x₁, ..., xₙ, y) = 0

in which the independent variable can be a vector, while the dependent one is still a scalar. In this case we have g : A ⊆ Rⁿ⁺¹ → R, and the function implicitly defined by equation g(x, y) = 0 is a function f of n variables.

Fortunately, the results on implicit functions we outlined for scalar x can easily be extended to the case in which x is a vector. Let us have a look at the vector version of Dini’s Theorem. Since f is a function of several variables, the partial derivatives ∂f(x)/∂x_k substitute for the derivative f′(x) of the scalar case.
Theorem 960 Let g : U → R be defined (at least) on an open set U of Rⁿ⁺¹ and let g(x₀, y₀) = 0. If g is continuously differentiable on a neighborhood of (x₀, y₀), with

(∂g/∂y)(x₀, y₀) ≠ 0

then there exist neighborhoods B(x₀) ⊆ Rⁿ and V(y₀) ⊆ R and a unique function f : B(x₀) → V(y₀) such that

g(x, f(x)) = 0   ∀x ∈ B(x₀)   (23.17)

The function f is surjective and continuously differentiable on B(x₀), with

(∂f/∂x_k)(x) = −(∂g/∂x_k)(x, y) / (∂g/∂y)(x, y)   (23.18)

for every (x, y) ∈ g⁻¹(0) ∩ (B(x₀) × V(y₀)) and every k = 1, ..., n.
Example 961 Let g : R³ → R be defined by g(x₁, x₂, y) = x₁² − x₂² + y³ and let (x₁, x₂, y₀) = (6, 3, −3). We have g ∈ C¹(R³) and (∂g/∂y)(x, y) = 3y², therefore

(∂g/∂y)(6, 3, −3) = 27 ≠ 0

By Dini’s Theorem there exists a unique y = f(x₁, x₂) defined in a neighborhood B(6, 3), which is differentiable therein and takes values in a neighborhood V(−3). Since

(∂g/∂x₁)(x, y) = 2x₁  and  (∂g/∂x₂)(x, y) = −2x₂

we have

(∂f/∂x₁)(x) = −2x₁/(3y²)  and  (∂f/∂x₂)(x) = 2x₂/(3y²)

In particular,

∇f(6, 3) = (−4/9, 2/9)

The reader can check that a global implicit function f : R² → R exists and, after having recovered its explicit expression (which exists because of the simplicity of g), can verify that Dini’s formula for ∇f(x) is correct. N
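The check suggested to the reader can be carried out in a few lines (our own sketch): here the global explicit solution is y = ∛(x₂² − x₁²), so the gradient prescribed by (23.18) can be compared with finite differences of the explicit f.

```python
import math

# Verify Dini's formula for g(x1, x2, y) = x1^2 - x2^2 + y^3 at (6, 3, -3),
# using the explicit global solution y = cbrt(x2^2 - x1^2).
def f(x1, x2):
    v = x2**2 - x1**2
    return math.copysign(abs(v) ** (1 / 3), v)   # real cube root, also for v < 0

x1, x2 = 6.0, 3.0
y = f(x1, x2)                                     # should equal -3
h = 1e-6
df1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
df2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)

assert abs(y + 3) < 1e-12
assert abs(df1 - (-2 * x1 / (3 * y**2))) < 1e-6   # -12/27 = -4/9
assert abs(df2 - (2 * x2 / (3 * y**2))) < 1e-6    # 6/27 = 2/9
```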
We can now state, without proof, the operator version of the Implicit Function Theorem, which is the most general form that we consider.

Theorem 962 Let g : U → R^m be defined (at least) on an open set U of R^{n+m} and let g(x₀, y₀) = 0. If g is continuously differentiable on a neighborhood of (x₀, y₀), with

det D_y g(x₀, y₀) ≠ 0   (23.20)

then there exist neighborhoods B(x₀) ⊆ Rⁿ and V(y₀) ⊆ R^m and a unique operator f = (f₁, ..., f_m) : B(x₀) → V(y₀) such that (23.19) holds for every x ∈ B(x₀). The operator f is surjective and continuously differentiable on B(x₀), with

Df(x) = −(D_y g(x, y))⁻¹ D_x g(x, y)   (23.21)

for every (x, y) ∈ g⁻¹(0) ∩ (B(x₀) × V(y₀)).
23.3. A GLOBAL PERSPECTIVE 665
The Jacobian of the implicit operator is thus pinned down by formula (23.21). To better understand this formula, it is convenient to write it as an equality

D_y g(x, y) Df(x) = −D_x g(x, y)

in which D_y g(x, y) is m × m while Df(x) and D_x g(x, y) are m × n, so that both sides are m × n matrices. In terms of the (i, j) ∈ {1, ..., m} × {1, ..., n} component of each such matrix, the equality is

Σ_{k=1}^{m} (∂g_i/∂y_k)(x, y) (∂f_k/∂x_j)(x) = −(∂g_i/∂x_j)(x, y)
For each independent variable x_j, we can determine the sought-after m-dimensional vector

( (∂f₁/∂x_j)(x), ..., (∂f_m/∂x_j)(x) )

by solving the following linear system of m equations:

Σ_{k=1}^{m} (∂g₁/∂y_k)(x, y) (∂f_k/∂x_j)(x) = −(∂g₁/∂x_j)(x, y)
Σ_{k=1}^{m} (∂g₂/∂y_k)(x, y) (∂f_k/∂x_j)(x) = −(∂g₂/∂x_j)(x, y)
  ...
Σ_{k=1}^{m} (∂g_m/∂y_k)(x, y) (∂f_k/∂x_j)(x) = −(∂g_m/∂x_j)(x, y)

By doing this for each j, we can finally determine the Jacobian Df(x) of the implicit operator.
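A concrete instance of the column-by-column solve may help (our own example, with m = 2 equations and n = 1 independent variable): for g₁(x, y₁, y₂) = y₁ + 2y₂ − x and g₂(x, y₁, y₂) = 3y₁ + 4y₂ − x², the matrix D_y g = [[1, 2], [3, 4]] is constant and the x-column of D_x g is (−1, −2x), so each column of Df solves a 2 × 2 linear system.

```python
# Solve Dy g * (df1/dx, df2/dx) = -Dx g by Cramer's rule for the system
# g1 = y1 + 2*y2 - x = 0, g2 = 3*y1 + 4*y2 - x^2 = 0.
def jacobian_column(x):
    a, b, c, d = 1.0, 2.0, 3.0, 4.0          # entries of Dy g
    r1, r2 = 1.0, 2.0 * x                    # right-hand side: -(-1, -2x)
    det = a * d - b * c                      # = -2, nonzero: (23.20) holds
    return ((r1 * d - b * r2) / det, (a * r2 - r1 * c) / det)

# The explicit solution is y1 = x^2 - 2x and y2 = 1.5x - 0.5x^2,
# so at x = 1 the derivatives are (2x - 2, 1.5 - x) = (0, 0.5).
df1, df2 = jacobian_column(1.0)
assert abs(df1 - 0.0) < 1e-12
assert abs(df2 - 0.5) < 1e-12
```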
Our previous discussion implies, inter alia, that in the special case m = 1 formula (23.21) reduces to

(∂g/∂y)(x, y) (∂f/∂x_j)(x) = −(∂g/∂x_j)(x, y)

which is formula (23.18) of the vector version of the Implicit Function Theorem. Since condition (23.20) reduces to (∂g/∂y)(x₀, y₀) ≠ 0, we conclude that the vector version is, indeed, the special case m = 1.
g(x, f(x)) = 0   ∀x ∈ E₁

that is,

g(x, y) = 0 ⟺ y = f(x)   ∀(x, y) ∈ E

In such a significant case, the implicit function f allows us to represent the level curve g⁻¹(0) on E by means of its graph Gr f. In other words, the level curve admits a functional representation, in particular when E is the rectangle π₁(g⁻¹(0)) × π₂(g⁻¹(0)) of the projections.

Consider, for instance, g(x, y) = x² + y² − 1. Since both projections of g⁻¹(0) equal [−1, 1], we have

E ⊆ [−1, 1] × [−1, 1]

that is, a possible implicit function takes the form f : E₁ → E₂ with E₁ ⊆ [−1, 1] and E₂ ⊆ [−1, 1]. Let us fix x ∈ [−1, 1] so as to analyze the set

S(x) = {y ∈ [−1, 1] : x² + y² = 1}

The set has two elements, except for x = ±1. In other words, for every −1 < x < 1 there are two values y for which g(x, y) = 0. Let us consider the rectangle made up of the projections, that is,

E = [−1, 1] × [−1, 1]

Any function f : [−1, 1] → [−1, 1] such that

f(x) ∈ S(x)   ∀x ∈ [−1, 1]

satisfies

g(x, f(x)) = 0   ∀x ∈ [−1, 1]

and is thus implicitly defined by g on E. Such functions are infinitely many; for example, this is the case for the function

f(x) = √(1 − x²) if x ∈ Q ∩ [−1, 1];  f(x) = −√(1 − x²) otherwise

Therefore, there are infinitely many functions implicitly defined by g on the rectangle E = [−1, 1] × [−1, 1].⁸ The equation g(x, y) = 0 is therefore not explicitable on this rectangle, which makes this case hardly interesting. Let us consider instead the less ambitious rectangle

Ẽ = [−1, 1] × [0, 1]

The function f : [−1, 1] → [0, 1] defined by f(x) = √(1 − x²) is the only function such that

g(x, f(x)) = g(x, √(1 − x²)) = 0   ∀x ∈ [−1, 1]

The function f is thus the only function implicitly defined by g on the rectangle Ẽ, and so the equation g(x, y) = 0 is explicitable on Ẽ. Moreover, f is surjective, that is, f([−1, 1]) = [0, 1], which implies that

g⁻¹(0) ∩ Ẽ = Gr f
⁸ Note that most of them are somewhat irregular; the only continuous ones among them are the two in (23.25).
[Figure: graph of f(x) = √(1 − x²) for x ∈ [−1, 1]]

[Figure: graph of h(x) = −√(1 − x²) for x ∈ [−1, 1]]
Example 966 To sum up, there are infinitely many implicit functions on the rectangle E of the projections, while uniqueness (and surjectivity) is obtained when we restrict ourselves to the smaller rectangles Ẽ and Ê. The study of implicit functions is of interest on these two rectangles, as the unique implicit function f defined thereon describes a univocal relationship between the variables x and y which the equation g(x, y) = 0 implicitly determines. N
O.R. If we draw the graph of the level curve g⁻¹(0), that is, the locus of points satisfying equation g(x, y) = 0, one notices how the rectangle E can be thought of as a sort of “frame” on this graph, isolating a part of it. In some framings the graph is explicitable; in other, less fortunate ones, it is not. By changing the framing we can tell apart different parts of the graph according to their explicitability. H
The last example showed how important it is to study, for each x ∈ π₁(g⁻¹(0)), the set of solutions

S(x) = {y ∈ π₂(g⁻¹(0)) : g(x, y) = 0}

The scalar functions f such that f(x) ∈ S(x) for every x in their domain are the possible implicit functions. In particular, when the rectangle E is such that S(x) ∩ E₂ is a singleton for each x ∈ E₁, we have a unique implicit function. In other words, this is the case when E is such that, for any fixed x ∈ E₁, there is a unique solution y to the equation g(x, y) = 0.

Let us see another simple example, warning the reader that these – however useful to fix ideas – are very fortunate cases: usually constructing S(x) is far from easy (though local, the Implicit Function Theorem is key in this regard).
Example 967 Let g : R²₊ → R be given by g(x, y) = √(xy) − 1. We have

g⁻¹(0) = {(x, y) ∈ R²₊ : xy = 1}

and

S(x) = {1/x}   ∀x ∈ (0, +∞)

which leads us to consider E = R²₊₊ and f : (0, +∞) → (0, +∞) given by f(x) = 1/x. We have

g(x, f(x)) = g(x, 1/x) = 0   ∀x ∈ (0, +∞)

and f is the only function implicitly defined by g on R²₊₊. Moreover, since f is surjective, we have

g⁻¹(0) ∩ R²₊₊ = Gr f

The level curve g⁻¹(0) can thus be represented on R²₊₊ as the graph of f. N
We shall soon present the main results regarding existence and uniqueness of an implicit function. These results are vastly used in economic theory, which often deals with equations g(x, y) = 0 for which the possible existence of a univocal relationship (and hence the nature of such relationship) between the variables is of paramount interest.

The reader should be aware that an explicit form can rarely be found for implicitly defined functions. This is possible only in the simplest cases, for example whenever g is linear; normally one can guarantee the existence of a function implicitly defined by an equation without being able to find its explicit formulation. We shall see that, even when the explicit form is not available, one can compute, for example, its derivative. This will allow us to use Taylor’s formula in order to give a local approximation of the implicit function, even when its analytical expression cannot be given.
In particular, having found the equilibrium price p̂ by solving the equation D(p) = S(p), the equilibrium quantity is q̂ = D(p̂) = S(p̂).

Suppose that the demand for the good (also) depends on an exogenous variable θ ≥ 0. For example, θ may be the level of indirect taxation which influences the demanded quantity. The demand thus takes the form D(θ, p) and is a function D : [0, b] × R₊ → R, that is, it depends on both the market price p and the value θ of the exogenous variable. The equilibrium condition (23.27) now becomes

q = D(θ, p) = S(p)   (23.28)

and the equilibrium price p̂ varies as θ changes. What is the relationship between taxation level and equilibrium prices? Which properties does such a relationship have?
Answering these questions, which are simple but crucial from an economic perspective, is equivalent to asking: (i) whether there exists a (unique) function p = f(θ) which connects taxation and equilibrium prices, that is, the exogenous and endogenous variables of this simple market model, and (ii) which properties such a function has.

In order to deal with this problem, we introduce the function g : [0, b] × R₊ → R given by g(θ, p) = S(p) − D(θ, p), so that the equilibrium condition (23.28) can be written as

g(θ, p) = 0

In particular,

g⁻¹(0) = {(θ, p) ∈ [0, b] × R₊ : g(θ, p) = 0}

is the set of all pairs of taxation levels/equilibrium prices, that is, of exogenous/endogenous variables.

The two questions asked above are now equivalent to asking whether:

(i) there exists a function p = f(θ) implicitly defined by the equation g(θ, p) = 0;

(ii) if so, which are the properties of such a function f: for example, whether it is decreasing, so that higher indirect taxes correspond to lower equilibrium prices.

Problems such as these, where the relationship between endogenous and exogenous variables is studied and, in particular, how changes in the latter impact the former, are of central importance in economic theory and in its empirical tests.
To fix ideas, let us examine the simple linear case where everything is straightforward. Let

D(θ, p) = α − β(p + θ)
S(p) = a + bp

with α, β, b > 0, so that

g(θ, p) = a + bp − α + β(p + θ)

The function

f(θ) = (α − a)/(b + β) − (β/(β + b)) θ   (23.29)

clearly satisfies (23.28). The equation g(θ, p) = 0 thus implicitly defines (and in this case also explicitly) the function f given by (23.29). Its properties are obvious: for example, it is strictly decreasing, so that changes in the taxation level bring about opposite changes in equilibrium prices.

Regarding the equilibrium quantity q̂, for every θ it is

q̂ = D(θ, f(θ)) = S(f(θ))
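The linear case can be put in numbers (a sketch with parameter values of our own choosing; α, β denote the demand intercept and slope, a, b the supply intercept and slope): the code checks that the candidate equilibrium price clears the market for several taxation levels and that it is decreasing in the tax.

```python
# Linear market: D(theta, p) = alpha - beta*(p + theta), S(p) = a + b*p.
# The equilibrium price is f(theta) = (alpha - a)/(b + beta) - beta*theta/(b + beta).
alpha, beta, a, b = 10.0, 0.5, 1.0, 2.0   # assumed illustrative values

def f(theta):
    return (alpha - a) / (b + beta) - beta * theta / (b + beta)

for theta in [0.0, 0.5, 1.0]:
    p = f(theta)
    # market clearing: S(p) = D(theta, p)
    assert abs((a + b * p) - (alpha - beta * (p + theta))) < 1e-12
assert f(1.0) < f(0.0)   # strictly decreasing: higher taxes, lower prices
```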
Consider now the profit function π : [0, +∞) → R given by π(y) = py − c(y) of a firm in perfect competition with cost function c : [0, +∞) → R, which we suppose to be differentiable. As seen in Section 16.1.3, the firm’s optimization problem is

max_y π(y)   sub y ≥ 0   (23.30)

If, as one would expect, we assume there to be at least one production level y > 0 such that π(y) > 0, the level y = 0 is not optimal, so that problem (23.30) becomes

max_y π(y)   sub y > 0   (23.31)

Since the set (0, +∞) is open, by Fermat’s Theorem a necessary condition for y > 0 to be optimal is that it satisfies the first-order condition

π′(y) = p − c′(y) = 0   (23.32)

The most crucial aspect of the producer’s problem is to assess how the optimal production varies as the market price changes, as this determines the producer’s behavior in the market for good y. Such a relevant relationship between prices and quantities is expressed by the scalar function f such that

p − c′(f(p)) = 0

that is, by the function implicitly defined by the first-order condition (23.32). The function f is referred to as the producer’s (individual) supply function and, for each price level p, it gives the optimal quantity y = f(p). Its existence and properties (for example, whether it is increasing, that is, whether higher prices lead to larger produced quantities, hence larger supplied quantities in the market) are of central importance in studying the market for good y. In particular, the sum of the supply functions of all producers of the good who are present in the market constitutes the market supply function S(p) which we saw in Chapter 12.
In order to formalize the derivation of the supply function from the optimization problem (23.31), we define the function g : [0, +∞) × (0, +∞) → R given by

g(p, y) = p − c′(y)

and consider the equation

g(p, y) = 0
which describes the producer’s optimal price/quantity pair. If there exists a function y =
f (p) such that g (p; f (p)) = 0, it is nothing but the supply function itself. Its properties
(monotonicity in particular) are essential for studying the good’s market. Let us see a simple
example where the function f and its properties can be recovered with simple computations.
Example 970 Let us consider quadratic costs: c(y) = y² for y ≥ 0. In such a case g(p, y) = p − 2y, so that the only function f : [0, +∞) → [0, +∞) implicitly defined by g on R²₊ is f(p) = p/2. In particular, f is strictly increasing, so that higher prices entail a higher production, and hence a larger supply. N
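As a numerical companion (not part of the text), one can recover the implicit supply function by root-finding on the first-order condition g(p, y) = p − c′(y) = 0 and check it against the closed form f(p) = p/2 of Example 970; the function names below are illustrative.

```python
def c_prime(y):
    """Marginal cost of the quadratic cost c(y) = y^2."""
    return 2.0 * y

def supply(p, lo=0.0, hi=1.0e6):
    """Recover f(p) by bisection on g(p, y) = p - c_prime(y),
    which is strictly decreasing in y."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if p - c_prime(mid) > 0:
            lo = mid          # g still positive: the zero lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For instance, supply(3.0) agrees with 3/2 up to the bisection tolerance.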
Having said this, let us focus on the following important result. It shows how strict monotonicity in y is a sufficient condition for g(x, y) = 0 to define a unique implicit function.⁹

⁹A function is strictly monotone if it is strictly increasing or strictly decreasing.
g(x, y) = g(x, y′) ⟹ y = y′   (23.34)

(∂g(x, y)/∂y) · (∂g(x′, y′)/∂y) > 0   ∀(x, y), (x′, y′) ∈ A   (23.35)

∂g(x, y)/∂y = −2 − e^y < 0   ∀y ∈ R
and so, by Bolzano's Theorem, 0 ∈ Im g. By Theorem 971, there is one and only one implicit function f : R → R such that, for every x ∈ R,

g(x, f(x)) = 0

Note that we are not able to effectively write y as an explicit function of x, that is, we are not able to provide the explicit form of f. N
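Although f has no explicit form, it is easy to evaluate numerically. The example's g is only partially legible in this copy, so the sketch below assumes, purely for illustration, g(x, y) = x − 2y − e^y, which indeed has ∂g/∂y = −2 − e^y < 0 for every y.

```python
import math

def g(x, y):
    # assumed illustrative g with dg/dy = -2 - e^y < 0 everywhere
    return x - 2.0 * y - math.exp(y)

def f(x, lo=-60.0, hi=60.0):
    """g(x, .) is strictly decreasing, so its unique zero is found by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if g(x, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Even without a closed form, the implicit function can thus be tabulated to any accuracy.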
By Theorem 971, g̃ defines a unique implicit function f : R∖{0} → R∖{0} on R². Since for every x ∈ R∖{0} we have that

S̃(x) = {y ∈ R∖{0} : g̃(x, y) = 0} = { ((√5 − 3)/2) x }  if x > 0,   { (−(√5 + 3)/2) x }  if x < 0
This result also holds when in point (i) "decreasing" and "increasing" are reversed, and also when, in points (ii) and (iii), the roles of concavity and convexity are reversed.¹¹
The following lemma shows that assuming that g is strictly increasing in x as well as in y in point (i) is equivalent to directly assuming that g is strictly increasing on A.
Proof Let us only show the "if" part, as the converse is trivial. Hence, let g : A ⊆ R² → R be strictly increasing both in x and in y. Let (x, y) > (x′, y′). Our aim is to show that g(x, y) > g(x′, y′). If x = x′ or y = y′, the result is trivial. Hence, let x > x′ and y > y′. Monotonicity in each variable gives g(x, y) > g(x′, y) > g(x′, y′), and so g(x, y) > g(x′, y′).
Proof of Proposition 975 By Proposition 971 there exists an implicit function f : π₁(g⁻¹(0)) → π₂(g⁻¹(0)).
(i) Since g is strictly increasing both in x and in y, by Lemma 976 it is strictly increasing on A. Let us show that f is strictly decreasing. Take x, x′ ∈ π₁(g⁻¹(0)) with x > x′. Suppose, by contradiction, that f(x) ≥ f(x′). This implies that (x, f(x)) > (x′, f(x′)) and so g(x, f(x)) > g(x′, f(x′)), which contradicts g(x, f(x)) = g(x′, f(x′)) = 0.
(ii) Let g be quasi-concave. Let us show that f is convex. Let x, x′ ∈ π₁(g⁻¹(0)) and λ ∈ [0, 1]. From g(x, f(x)) = g(x′, f(x′)) it follows that
¹¹That is, f is strictly increasing if g is strictly increasing in x, and is (strictly) concave if g is (strictly) quasi-convex. In this regard, note that in points (ii) and (iii) we tacitly assumed that the domain A and the projection π₁(g⁻¹(0)) are convex sets, otherwise speaking of the concavity of g and the convexity of f would be meaningless.
g(λx + (1 − λ)x′, λf(x) + (1 − λ)f(x′)) ≥ g(x, f(x)) = g(λx + (1 − λ)x′, f(λx + (1 − λ)x′))

where the inequality follows from quasi-concavity. Since g is strictly increasing in y, this yields λf(x) + (1 − λ)f(x′) ≥ f(λx + (1 − λ)x′), that is, f is convex.
g(x, y₀ − ε) < 0 for each x ∈ U′(x₀) and g(x, y₀ + ε) > 0 for each x ∈ U″(x₀)

Since g is strictly increasing, for every x ∈ U(x₀) the only value yₓ such that g(x, yₓ) = 0 thus lies between y₀ − ε and y₀ + ε.
Therefore, for the implicit function f, for every ε > 0 there exists a neighborhood U(x₀) such that for every x in such a neighborhood

|f(x) − f(x₀)| < ε

This guarantees that f is continuous at x₀. In fact, having fixed ε > 0, let xₙ → x₀. There is an n_ε such that xₙ ∈ U(x₀) for every n ≥ n_ε, so that |f(xₙ) − f(x₀)| < ε. Since this holds for any ε > 0, we have that limₙ f(xₙ) = f(x₀). Since x₀ was arbitrarily chosen, f is continuous everywhere.
Example 977 The Cobb-Douglas function u(x, y) = x^α y^{1−α}, with 0 < α < 1, is continuous, strictly increasing and strictly concave on R²₊₊. Having fixed k > 0, by Proposition 975 the equation u(x, y) − k = 0 defines on R²₊₊ a unique implicit function f_k : (0, ∞) → R, which is strictly decreasing and convex. N
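Here f_k can in fact be solved for explicitly: from x^α y^{1−α} = k one gets f_k(x) = (k x^{−α})^{1/(1−α)}. The quick numerical check below (an illustration, not from the text, with arbitrary parameter values) confirms the monotonicity and midpoint convexity that Proposition 975 predicts.

```python
ALPHA, K = 0.3, 2.0   # illustrative parameter values

def f_k(x):
    """Indifference-curve height: solves x^a * y^(1-a) = k for y."""
    return (K * x ** (-ALPHA)) ** (1.0 / (1.0 - ALPHA))

xs = [0.5 + 0.1 * i for i in range(31)]
decreasing = all(f_k(b) < f_k(a) for a, b in zip(xs, xs[1:]))
midpoint_convex = all(
    f_k((a + b) / 2) <= (f_k(a) + f_k(b)) / 2 for a, b in zip(xs, xs[1:])
)
```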
Equilibrium comparative statics: properties Let us use the results we proved above for the comparative statics problems we saw in Section 23.3.1.
Let us examine the first problem, with indirect taxation τ. Suppose that:
(i) D : [0, b] × R → R and S : [0, b] → R are continuous and such that D(0, τ) ≥ S(0) and D(b, τ) ≤ S(b) for every τ.
Property (i) is especially interesting. Under the natural hypothesis that D is strictly decreasing in τ, we have that f is strictly decreasing, that is, changes in taxation bring about opposite changes in equilibrium prices (increases in τ entail decreases in p, and decreases in τ determine increases in p).
In the linear case of Example 969 the existence and properties of f followed from simple computations. The results in this section allow us to extend the same conclusions to much more general demand and supply functions.
In the special case of the producer's problem, we have that F(p, y) = py − c(y) and so

g(p, y) = ∂F(p, y)/∂y = p − c′(y)
The strict monotonicity of g in y is equivalent to the strict monotonicity of the derivative function c′ (and thus to the strict convexity or concavity of c). In particular, if c′ is strictly increasing and convex (so that c is strictly convex), the function g is concave, which implies that the supply function y = f(p) is convex. In such a case, since g is strictly increasing in p, the supply function is strictly increasing in p.
¹²Indeed D and S are continuous and, furthermore, D(0, τ) ≥ S(0) and D(b, τ) ≤ S(b) for every τ.
23.4. A GLOCAL PERSPECTIVE 679
f′(x) = − (∂g/∂x)(x, y) / (∂g/∂y)(x, y) = − (∂g/∂x)(x, f(x)) / (∂g/∂y)(x, f(x))   ∀(x, y) ∈ g⁻¹(0)
Proof It is enough to notice that the hypotheses of the Implicit Function Theorem are satisfied at every (x, y) ∈ g⁻¹(0).
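The formula can be checked numerically on a case where the implicit function is known. Taking g(x, y) = x² + y² − 1 on the open upper half-plane (an illustration, not an example from the text), the implicit function is f(x) = √(1 − x²), and −g_x/g_y matches its finite-difference derivative:

```python
import math

def f(x):
    # explicit implicit function for g(x, y) = x^2 + y^2 - 1 with y > 0
    return math.sqrt(1.0 - x * x)

def f_prime_implicit(x):
    gx = 2.0 * x          # dg/dx at (x, f(x))
    gy = 2.0 * f(x)       # dg/dy at (x, f(x)), positive on the upper branch
    return -gx / gy

def f_prime_numeric(x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)
```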
When condition (23.35) does not hold on the whole U, but only on one of its open subsets D, the result can be used for the restriction g̃ of g to D. This observation allows us to use the result in many more settings, as the following variation of the previous example shows.
∂g(x, y)/∂y > 0 ⟺ y ∈ (−∞, −2) ∪ (0, +∞)   and   ∂g(x, y)/∂y < 0 ⟺ y ∈ (−2, 0)
23.5 Appendix
23.5.1 Projections and shadows
Let A be a subset of the plane R², whose points we denote by (x, y). Its projection on the x-axis

π₁(A) = {x ∈ R : (x, y) ∈ A for some y ∈ R}

is the set of points x on the x-axis such that there exists a point y on the y-axis such that the pair (x, y) belongs to A.¹³
Likewise, we define the projection on the y-axis

π₂(A) = {y ∈ R : (x, y) ∈ A for some x ∈ R}

that is, the set of points y on the y-axis such that there exists (at least) one point x on the x-axis such that (x, y) belongs to A.
The projections π₁(A) and π₂(A) are nothing but the "shadows" of the set A ⊆ R² on the two axes.
[Figure: a set A in the plane with its shadows π₁(A) on the x-axis and π₂(A) on the y-axis]
¹³The notion of projection is not to be confused with the different one seen in Section 19.1.
23.5. APPENDIX 681
In particular, π₁(Gr f) is the domain of f and π₂(Gr f) is the image Im f. This holds in general: if f : A ⊆ R → R, one has π₁(Gr f) = A and π₂(Gr f) = Im f. N
∂g(x₀, y₀)/∂y > 0   (23.36)

∂g(x, y)/∂y > 0   ∀(x, y) ∈ B̃(x₀, y₀)
Let ε > 0 be small enough so that

(x₀ − ε, x₀ + ε) × (y₀ − ε, y₀ + ε) ⊆ B̃(x₀, y₀)

and let g_ε be the restriction of g to this rectangle. Clearly, ∂g_ε(x, y)/∂y > 0 for every (x, y) ∈ (x₀ − ε, x₀ + ε) × (y₀ − ε, y₀ + ε). Furthermore, the projections π₁(g_ε⁻¹(0)) and
π₂(g_ε⁻¹(0)) are open intervals (why?). By setting U(x₀) = π₁(g_ε⁻¹(0)) and V(y₀) = π₂(g_ε⁻¹(0)), Theorem 971 applied to g_ε guarantees the existence of a unique implicit function f : U(x₀) → V(y₀) on the rectangle U(x₀) × V(y₀) such that

g(x, f(x)) = 0   ∀x ∈ U(x₀)

The function f is surjective (why?).
In order to show that f is continuously differentiable, let us consider two points x and x + Δx in U(x₀), whose images are respectively y = f(x) and y + Δy = f(x + Δx). It must hold that

g_ε(x, y) = g_ε(x + Δx, y + Δy) = 0, and hence g_ε(x + Δx, y + Δy) − g_ε(x, y) = 0.
Since g_ε is continuously differentiable on U(x₀) × V(y₀), we can write the linear approximation

g_ε(x + Δx, y + Δy) − g_ε(x, y) = (∂g_ε/∂x)(x, y) Δx + (∂g_ε/∂y)(x, y) Δy + o(√(Δx² + Δy²))

and so it must hold that

(∂g_ε/∂x)(x, y) Δx + (∂g_ε/∂y)(x, y) Δy + o(√(Δx² + Δy²)) = 0.
Since ∂g_ε(x, y)/∂y ≠ 0 in a neighborhood of (x₀, y₀), dividing both sides of the previous equality by (∂g_ε/∂y)(x, y) Δx, we get

(∂g_ε/∂x)(x, y) / (∂g_ε/∂y)(x, y) + Δy/Δx + o(√(Δx² + Δy²))/Δx = 0.
Since y = f(x) is continuous, if Δx → 0 then also Δy → 0, and so

lim_{Δx→0} [ (∂g_ε/∂x)(x, y)/(∂g_ε/∂y)(x, y) + Δy/Δx + o(√(Δx² + Δy²))/Δx ] = (∂g_ε/∂x)(x, y)/(∂g_ε/∂y)(x, y) + lim_{Δx→0} Δy/Δx = 0

and so

f′(x) = lim_{Δx→0} Δy/Δx = − (∂g_ε/∂x)(x, y) / (∂g_ε/∂y)(x, y).
Finally, the continuity of f′ is a direct consequence of the continuity of ∂g_ε/∂x and of ∂g_ε/∂y.
Chapter 24
Study of functions
(i) concave at the point x₀ if there exists a neighborhood of this point (possibly only a right-neighborhood or a left-neighborhood when x₀ is a boundary point) in which it is concave;
(ii) strictly concave at the point x₀ if there exists a neighborhood of this point (possibly only a right-neighborhood or a left-neighborhood) in which it is strictly concave.
Briefly:

f concave at x₀ ⟹ f″(x₀) ≤ 0

and

f″(x₀) < 0 ⟹ f strictly concave at x₀

An analogous characterization holds for (strict) convexity.
Example 984 (i) The function f : R → R defined by f(x) = 2x² − 3 is strictly convex at every point because f″(x) = 4 > 0 at every x.
684 CHAPTER 24. STUDY OF FUNCTIONS
[Figure: two graphs showing f concave at x₀ (first figure) and convex at x₀ (second figure)]
O.R. Just as the first derivative of a function at a point gives information on its increase or decrease, the second derivative gives information on concavity or convexity at a point. The greater |f″(x₀)|, the more pronounced the curvature (the "stomach") of f at x₀ (and the "stomach" is upward if f″(x₀) < 0, first figure, and downward if f″(x₀) > 0, second figure).
To avoid the influence of the unit of measure of f(x), especially in economics, we consider

f″(x₀)/f′(x₀)

(or its absolute value), which does not depend on it.¹ Observe incidentally that f″(x₀)/f′(x₀) is the derivative of log f′ at x₀. H
In short, at an inflection point the direction of the concavity of the function changes. The previous Proposition 983 allows us to conclude immediately that:
Example 987 Let f : R → R be the Gaussian function f(x) = e^{−x²}. Since f′(x) = −2x e^{−x²}, we have f″(x) = (4x² − 2) e^{−x²}; the function is concave for

−1/√2 < x < 1/√2

and convex for |x| > 1/√2. The two points ±1/√2 are therefore inflection points and indeed f″(±1/√2) = 0. Note that the point x = 0 is a local maximizer (actually, it is a global maximizer, as the reader can easily verify). N
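The sign pattern of f″ in Example 987 can be checked in a few lines (an illustration):

```python
import math

def f2(x):
    """Second derivative of the Gaussian e^{-x^2}: (4x^2 - 2) e^{-x^2}."""
    return (4.0 * x * x - 2.0) * math.exp(-x * x)

c = 1.0 / math.sqrt(2.0)   # the two inflection points are x = ±c
```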
Geometrically, for differentiable functions, at an inflection point the tangent line cuts the graph: it cannot lie (locally) either above or below it.
If at an inflection point it happens that f′(x₀) = f″(x₀) = 0, the tangent line is horizontal and cuts the graph of the function: we speak of an inflection point with horizontal tangent.
Definition 985 finally allows us to easily prove the following sufficient condition for a point x₀ to be an inflection point of a function f.
24.2 Asymptotes
Intuitively, an asymptote is a straight line to which the graph of a function gets indefinitely near. Such straight lines can be vertical, horizontal, or oblique.
(i) When

lim_{x→x₀⁺} f(x) = +∞ or −∞   or   lim_{x→x₀⁻} f(x) = +∞ or −∞

the straight line x = x₀ is a vertical asymptote for f.
(ii) When

lim_{x→+∞} f(x) = L   (or lim_{x→−∞} f(x) = L)

the straight line y = L is a horizontal asymptote for f as x → +∞ (or as x → −∞).
(iii) When

lim_{x→+∞} [f(x) − (ax + b)] = 0   (or as x → −∞)

that is, when the distance between the function and the straight line y = ax + b (a ≠ 0) tends to 0 as x → +∞ (or as x → −∞), the straight line of equation y = ax + b (a ≠ 0) is an oblique asymptote for f at +∞ (or at −∞).
Horizontal asymptotes are actually the special case of oblique asymptotes with a = 0. Moreover, it is evident that there can be at most one horizontal or oblique asymptote as x → −∞ and at most one as x → +∞. It is instead possible for f to have several vertical asymptotes.
f(x) = 3 − 7/(x² + 1)

whose graph is

[Figure: graph of f approaching the horizontal asymptote y = 3]
Since lim_{x→+∞} f(x) = 3 and lim_{x→−∞} f(x) = 3, the straight line y = 3 is a horizontal asymptote for f both to the right and to the left. N
f(x) = 1/(x² + x − 2)
24.2. ASYMPTOTES 687
whose graph is

[Figure: graph of f with vertical asymptotes x = −2 and x = 1]
Since lim_{x→1⁺} f(x) = +∞ and lim_{x→1⁻} f(x) = −∞, the straight line x = 1 is a vertical asymptote for f. Moreover, since lim_{x→−2⁺} f(x) = −∞ and lim_{x→−2⁻} f(x) = +∞, the straight line x = −2 is also a vertical asymptote for f. N
f(x) = 2x²/(x + 1)

whose graph is

[Figure: graph of f with vertical asymptote x = −1 and an oblique asymptote]
There is no difficulty in identifying vertical and horizontal asymptotes. We thus shift our attention to oblique asymptotes. To this end, we provide two simple results.
Proof "If". When f(x)/x → a, consider the difference f(x) − ax. If it tends to a finite limit b, then (and only then) f(x) − ax − b → 0. "Only if". From f(x) − ax − b → 0 it follows that f(x) − ax → b and, by dividing by x, that f(x)/x − a → 0.
Proposition 993 gives a necessary and sufficient condition for the search for oblique asymptotes, while Proposition 994 only provides a sufficient condition. In order to use this latter condition, the limits involved must exist. In this regard, consider the following example.
f(x) = x + cos x²/x

As x → ∞ we have

f(x)/x = 1 + cos x²/x² → 1

and

f(x) − x = cos x²/x → 0

Therefore y = x is an oblique asymptote of f as x → ∞. Nevertheless, the first derivative of f is

f′(x) = 1 + (−2x² sin x² − cos x²)/x² = 1 − 2 sin x² − cos x²/x²

and it is immediate to verify that the limit of f′(x) as x → ∞ does not exist. N
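The phenomenon is easy to see numerically (an illustration): the gap to the line y = x shrinks, while sampled values of f′ keep oscillating over a wide range.

```python
import math

def f(x):
    return x + math.cos(x * x) / x

def f_prime(x):
    return 1.0 - 2.0 * math.sin(x * x) - math.cos(x * x) / (x * x)

# gap to the candidate asymptote y = x, and derivative values for large x
gap_small, gap_large = f(10.0) - 10.0, f(1000.0) - 1000.0
deriv_sample = [f_prime(100.0 + 0.01 * i) for i in range(1000)]
```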
and as x → +∞

f(x) − x = √(x² − x) − x = x(√(1 − 1/x) − 1) = ((1 − 1/x)^{1/2} − 1)/(1/x) → −1/2

Therefore

y = x − 1/2

is an oblique asymptote of f as x → +∞. N
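Numerically (an illustration), the gap between f(x) = √(x² − x) and the candidate line y = x − 1/2 indeed vanishes as x grows:

```python
import math

def gap(x):
    """Distance between f(x) = sqrt(x^2 - x) and the line y = x - 1/2."""
    return math.sqrt(x * x - x) - (x - 0.5)
```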
(i) If f(x) = g(x) + h(x) and h(x) → 0 as x → ∞, then f and g have in common their possible oblique asymptotes.

y = ⁿ√a₀ · x + (ⁿ√a₀/n)(a₁/a₀)
Let us verify only (ii) for n odd (for n even the calculations are analogous). If n is odd, as x → ∞ we have

f(x)/x = ⁿ√(a₀xⁿ(1 + a₁/(a₀x) + ... + aₙ/(a₀xⁿ)))/x → ⁿ√a₀

therefore the slope of the oblique asymptote is ⁿ√a₀. Moreover,

f(x) − ⁿ√a₀ x = ⁿ√a₀ x [(1 + (a₁xⁿ⁻¹ + ... + aₙ)/(a₀xⁿ))^{1/n} − 1]
= ⁿ√a₀ x · ((a₁xⁿ⁻¹ + ... + aₙ)/(a₀xⁿ)) · [((1 + (a₁xⁿ⁻¹ + ... + aₙ)/(a₀xⁿ))^{1/n} − 1) / ((a₁xⁿ⁻¹ + ... + aₙ)/(a₀xⁿ))]
Since, as x → ∞,

[(1 + (a₁xⁿ⁻¹ + ... + aₙ)/(a₀xⁿ))^{1/n} − 1] / ((a₁xⁿ⁻¹ + ... + aₙ)/(a₀xⁿ)) → 1/n   and   ⁿ√a₀ x · (a₁xⁿ⁻¹ + ... + aₙ)/(a₀xⁿ) → ⁿ√a₀ · a₁/a₀

we have, as x → ∞,

f(x) − ⁿ√a₀ x → ⁿ√a₀ · (a₁/a₀) · (1/n)

In the previous example we had n = 2, a₀ = 1 and a₁ = −1; indeed, as x → +∞, the asymptote had equation

y = √1 · x + (√1/2)(−1/1) = x − 1/2
(i) First of all it is convenient to calculate the limits of f at the boundary points of the domain, as well as, possibly, as x → ±∞ when A is unbounded.
(ii) It can be interesting to establish the sets on which the function is positive, f(x) ≥ 0, increasing, f′(x) ≥ 0, and concave/convex, f″(x) ⋚ 0. Once the intersections of the graph with the axes are determined (finding the set f(0) on the vertical axis and the set f⁻¹(0) on the horizontal axis), we have a first idea of its graph.
(iii) To find local extremal points (provided they exist), it is possible to use the omnibus procedure seen in Section 21.3.
(iv) The points at which f″(x) = 0 are candidates to be inflection points; they are certainly so if at these points f‴ ≠ 0.
(ii) Since f′(x) = 3x² − 14x + 12, the derivative is zero for

x = (14 ± √(196 − 144))/6 = (14 ± √52)/6 = (7 ± √13)/3

The derivative is ≥ 0 when x ∈ (−∞, (7 − √13)/3] ∪ [(7 + √13)/3, ∞).
(iii) Since f″(x) = 6x − 14, it is zero for x = 7/3. The second derivative is ≥ 0 when x ≥ 7/3.
(iv) Since f″((7 − √13)/3) < 0, the point is a local maximizer; since instead f″((7 + √13)/3) > 0, the point is a local minimizer. Finally, the point 7/3 is an inflection point.
[Figure: graph of the function]
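The constant term of f is not legible in this copy; since it affects neither f′ nor f″, the sketch below (an illustration) checks the critical and inflection points directly on f′(x) = 3x² − 14x + 12 and f″(x) = 6x − 14.

```python
import math

f_prime = lambda x: 3.0 * x * x - 14.0 * x + 12.0
f_second = lambda x: 6.0 * x - 14.0

x_max = (7.0 - math.sqrt(13.0)) / 3.0   # candidate local maximizer
x_min = (7.0 + math.sqrt(13.0)) / 3.0   # candidate local minimizer
x_infl = 7.0 / 3.0                      # candidate inflection point
```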
Example 999 Let f : R → R be the function defined by f(x) = e^{−x²}. It is called the Gaussian function. Both limits, as x → ±∞, are 0, and the horizontal axis is therefore a horizontal asymptote. The function is always strictly positive and f(0) = 1. Next, we look for possible local extremal points. The first-order condition f′(x) = 0 has the form −2x e^{−x²} = 0 and so x = 0 is the unique critical point. The second derivative is

f″(x) = −2e^{−x²} + (−2x) e^{−x²} (−2x) = 2e^{−x²} (2x² − 1)

Therefore, f″(0) = −2 < 0: x = 0 is a local maximizer. The graph of the function is the famous
Gaussian bell:

[Figure: the Gaussian bell curve]
Example 1000 Let f : R → R be given by f(x) = x⁶ − 3x² + 1. Next, we look for possible local extremal points. The first-order condition f′(x) = 0 has the form

6x⁵ − 6x = 0

and therefore x = 0 and x = ±1 are the unique critical points. We have f″(0) = −6, f″(−1) = 24 and f″(1) = 24. Hence, x = 0 is a local maximizer, while x = −1 and x = 1 are local minimizers. From lim_{x→+∞} f(x) = lim_{x→−∞} f(x) = +∞ it follows that the graph of this function is:
[Figure: graph of the function]
Example 1001 Let f : R → R be given by f(x) = xeˣ. Its limits are lim_{x→−∞} xeˣ = 0 and lim_{x→+∞} xeˣ = +∞. We then have:
(i) f(x) ≥ 0 ⟺ x ≥ 0.
24.3. STUDY OF FUNCTIONS 693
(ii) f′(x) = (x + 1)eˣ ≥ 0 ⟺ x ≥ −1.
(iii) f″(x) = (x + 2)eˣ ≥ 0 ⟺ x ≥ −2.
(iv) f(0) = 0: the origin is the unique point of intersection with the axes.
[Figure: graph of f(x) = xeˣ]
lim_{x→−∞} x²eˣ = 0⁺,   lim_{x→+∞} x²eˣ = +∞

(iii) f″(x) = (x² + 4x + 2)eˣ ≥ 0 ⟺ x ∈ (−∞, −2 − √2] ∪ [−2 + √2, +∞).

[Figure: graph of f(x) = x²eˣ]

N
Example 1003 Let f : R → R be given by f(x) = x³eˣ. Its limits are

lim_{x→−∞} x³eˣ = 0⁻,   lim_{x→+∞} x³eˣ = +∞

[Figure: graph of f(x) = x³eˣ]
From the computations that follow, the function under study is f(x) = 2x + 3 + 1/(x − 2).
(i) f(0) = 3 − 1/2 = 5/2; we have f(x) = 0 when (2x + 3)(x − 2) = −1, that is, when 2x² − x − 5 = 0, i.e., for

x = (1 ± √41)/4 ≅ −1.35 and 1.85

(ii) We have

f′(x) = 2 − 1/(x − 2)²

which is zero if (x − 2)² = 1/2, i.e., if x = 2 ± 1/√2.
(iv) Given that f′(x) → 2 as x → ∞, the function presents an oblique asymptote. Since

lim_{x→∞} [f(x) − 2x] = lim_{x→∞} [3 + 1/(x − 2)] = 3

the oblique asymptote is y = 2x + 3.
[Figure: graph of f with vertical asymptote x = 2 and oblique asymptote y = 2x + 3]
Note that

f(x) ≈ 1/(x − 2)

as x → 2 (in proximity of 2 it behaves as 1/(x − 2), i.e., it diverges) and that f(x) ≈ 2x + 3 as x → ∞ (for x sufficiently large it behaves as y = 2x + 3). N
Part VII
Differential optimization
Chapter 25
Unconstrained optimization
700 CHAPTER 25. UNCONSTRAINED OPTIMIZATION
Tonelli’s Theorem can be used for this class of problems and, along with Fermat’s Theorem,
it gives rise to the so-called elimination method for solving optimization problems, which in
this chapter will be used in dealing with unconstrained di¤erential optimization problems.
S = {x ∈ C : ∇f(x) = 0}

If x̂ ∈ S is such that

f(x̂) ≥ f(x)   ∀x ∈ S   (25.2)

then x̂ is a solution of the optimization problem (25.1).
In other words, once the conditions for applying Tonelli's Theorem are verified, one constructs the set of critical points; the point (or points) where f achieves the maximum value over this set is the solution to the optimization problem.
N.B. If the function f ∈ C²(C), in phase 1 one can consider, instead of S, its subset S₂ ⊆ S, which is made up of the critical points that satisfy the second-order necessary condition (Sections 20.5.3 and 21.4.4). O
In order to better understand the elimination method, the reader should note that, thanks to Fermat's Theorem, the set S consists of all points in C which are candidate local solutions of optimization problem (25.1). On the other hand, if f is continuous and coercive on C, by Tonelli's Theorem there is at least one solution of the optimization problem. Such a solution must belong to the set S (as long as it is non-empty), as a solution to the optimization problem is, a fortiori, a local solution. Hence the solutions to the "restricted" optimization problem are also solutions to optimization problem (25.1). In turn, the solutions to problem (25.3) are the points x̂ ∈ S for which condition (25.2) holds. Hence they are the solutions to optimization problem (25.1), as phase 3 of the elimination method states.
As the following examples show, the elimination method elegantly and effectively combines Tonelli's global result and Fermat's more local one. Note how Tonelli's Theorem is crucial since the set C is open, thus making Weierstrass' Theorem inapplicable (as it requires C to be compact).
The smaller the set S of critical points, the better the method works, as phase 3 requires a direct comparison of f at all points of S. For this reason the method is particularly effective when, in the scalar case, one can consider, instead of S, its subset S₂, which is made up of all critical points satisfying the second-order necessary condition.
25.2. COERCIVE PROBLEMS 701
Example 1005 Let f : Rⁿ → R be given by f(x) = (1 − ‖x‖²)e^{‖x‖²} and let C = Rⁿ. The function f is coercive on Rⁿ. Indeed, it is supercoercive: by taking tₙ = ‖xₙ‖, it follows that

f(xₙ) = (1 − ‖xₙ‖²)e^{‖xₙ‖²} = (1 − tₙ²)e^{tₙ²} → −∞

for any sequence {xₙ} ⊆ Rⁿ such that tₙ = ‖xₙ‖ → +∞. Since it is continuous, f is coercive on Rⁿ by Proposition 698. The unconstrained differential optimization problem

max_x (1 − ‖x‖²)e^{‖x‖²}   sub x ∈ Rⁿ   (25.4)

can thus be solved with the elimination method.
Phase 1: The first-order condition ∇f(x) = 0 reads −2‖x‖² e^{‖x‖²} x = 0, whose unique solution is x = 0, so that S = {0}.
Phase 2: Since S is a singleton, the condition in this phase trivially holds and so x̂ = 0 is a solution to optimization problem (25.4). N
Phase 1: The first-order condition f′(x) = 0 takes the form 6x⁵ − 6x = 0 and so x = 0 and x = ±1 are the only critical points, that is, S = {−1, 0, 1}. We have that f″(0) = −6, f″(−1) = 24 and f″(1) = 24, and so S₂ = {0}.
Phase 2: Since S₂ is a singleton, the condition in this phase trivially holds and so x̂ = 0 is a solution to the optimization problem (25.5). N
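The phases of the elimination method can be sketched in a few lines. The problem below, max −x⁴ + 2x² on R, is an illustrative coercive problem (not one of the book's numbered examples):

```python
f = lambda x: -x ** 4 + 2.0 * x ** 2

# Phase 1: f'(x) = -4x^3 + 4x = 0 gives the critical points
S = [-1.0, 0.0, 1.0]

# Phase 3: compare f on S and keep the maximizers
best = max(f(x) for x in S)
solutions = [x for x in S if f(x) == best]
```

Here the method selects x̂ = ±1, discarding the critical point 0.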
Example 1007 Let us consider Example 851 again, which dealt with the unconstrained optimization problem

max_x e^{−x⁴+x²}   sub x ∈ R

with differential methods. The problem is differential. Let us verify its coercivity. By setting g(x) = eˣ and h(x) = −x⁴ + x², it follows that f = g ∘ h. We have that lim_{x→±∞} h(x) = lim_{x→±∞} (−x⁴ + x²) = −∞ and so, by Proposition 698, the function h is coercive on R. Since g is strictly increasing, the function f is a strictly increasing transformation of a coercive function. By Proposition 684, f is coercive.
This unconstrained differential optimization problem is thus coercive and can be solved with the elimination method.
Phase 1: From Example 851 we know that S₂ = {−1/√2, 1/√2}.
Phase 2: We have that f(−1/√2) = f(1/√2) and so both points x̂ = ±1/√2 are solutions to the unconstrained optimization problem. The elimination method allowed us to identify the nature of such points, which would not have been possible by using solely differential methods as in Example 851. N
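A numerical confirmation (an illustration) that the two points found by the elimination method are indeed global maximizers of f(x) = e^{−x⁴+x²}:

```python
import math

f = lambda x: math.exp(-x ** 4 + x ** 2)
c = 1.0 / math.sqrt(2.0)    # the two solutions ±1/sqrt(2)

grid = [-3.0 + 0.001 * i for i in range(6001)]
max_on_grid = max(f(x) for x in grid)
```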
This implies that a point x̂ of C is a solution of the concave problem (25.6) if and only if ∇f(x̂) = 0. Indeed, if x̂ ∈ C is such that ∇f(x̂) = 0, the concavity inequality gives

f(y) ≤ f(x̂) + ∇f(x̂) · (y − x̂) = f(x̂)   ∀y ∈ C

so that x̂ is a solution of problem (25.6). On the other hand, if x̂ ∈ C is a solution of the problem, we have ∇f(x̂) = 0 thanks to Fermat's Theorem.
25.3. CONCAVE PROBLEMS 703
The status of necessary and sufficient condition of ∇f(x̂) = 0 leads to the concave (elimination) method to solve the concave problem (25.6); it consists of a single phase:
1. find the set S = {x ∈ C : ∇f(x) = 0} of the stationary points of f on C; all, and only, the points x̂ ∈ S solve the optimization problem.
Example 1009 Let f : (0, +∞) → R be given by f(x) = −x log x and let C = (0, +∞). The function f is strictly concave since f′(x) = −1 − log x is strictly decreasing (Corollary 920). Let us solve the concave problem

max_x −x log x   sub x ∈ (0, +∞)   (25.7)

We have

f′(x) = 0 ⟺ log x = −1 ⟺ e^{log x} = e^{−1} ⟺ x = 1/e

By the concave method, x̂ = e^{−1} is the unique solution of problem (25.7). N
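A quick check (an illustration) of the concave method's conclusion for f(x) = −x log x:

```python
import math

f = lambda x: -x * math.log(x)
f_prime = lambda x: -1.0 - math.log(x)

x_hat = 1.0 / math.e
grid = [0.001 * i for i in range(1, 5001)]   # sample points of (0, 5]
```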
Example 1010 Let f : R² → R be given by f(x, y) = −2x² − 3xy − 6y² and let C = R². The function f is strictly concave since the Hessian

[ −4  −3 ]
[ −3  −12 ]

is negative definite. We have

∇f(x) = 0 ⟺ { −4x − 3y = 0, −3x − 12y = 0 } ⟺ x = (0, 0)

By the concave method, the origin x̂ = (0, 0) is the unique solution of problem (25.8). N
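The two ingredients of the concave method for a quadratic objective such as f(x, y) = −2x² − 3xy − 6y², checked in code (an illustration):

```python
grad = lambda x, y: (-4.0 * x - 3.0 * y, -3.0 * x - 12.0 * y)

# Hessian of f(x, y) = -2x^2 - 3xy - 6y^2
H = [[-4.0, -3.0], [-3.0, -12.0]]
det_H = H[0][0] * H[1][1] - H[0][1] * H[1][0]
```

A 2×2 symmetric matrix is negative definite when its leading principal minors alternate in sign, starting negative.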
In this preview we introduced the two relevant classes of unconstrained differential optimization problems: coercive and concave ones. A few observations are in order:
1. The two classes are not exhaustive: there are unconstrained differential optimization problems which are neither coercive nor concave. For example, consider the unconstrained differential optimization problem

max_x cos x   sub x ∈ R

It is neither coercive nor concave: the cosine function is neither coercive on the real line (see Example 683) nor concave. Nonetheless, the problem is trivial: as one can easily infer from the graph of the cosine function, its solutions are the points x = 2kπ with k ∈ Z. As usual, common sense gives the best guidance in solving any problem (in particular, optimization ones), more so than any classification.
2. The two classes are not disjoint: there are unconstrained differential optimization problems which are both coercive and concave. For example, the unconstrained differential optimization problem

max_x 1 − x²   sub x ∈ R

is both coercive and concave: the function 1 − x² is indeed both coercive (see Example 689) and concave on the real line. In cases such as this one we use the more powerful concave method.
3. The two classes are distinct: there are unconstrained differential optimization problems which are coercive but not concave, and vice versa.
(a) Consider the function

f(x) = 1 − x²  if x ≤ 0,   1  if x > 0
[Figure: graph of f]
shows how it is concave, but not coercive. The optimization problem is thus concave, but not coercive.
(b) The unconstrained differential optimization problem

max_x e^{−x²}   sub x ∈ R

is coercive, but not concave: the Gaussian function e^{−x²} is indeed coercive (Example 685), but not concave, as its well-known graph shows
[Figure: the Gaussian bell curve]
25.5 Weakening
An optimization problem

max_x f(x)   sub x ∈ C
(i) Consider the problem

max_x (1 − ‖x‖²)e^{‖x‖²}   sub x ∈ Qⁿ₊   (25.9)

where Qⁿ₊ is the set of vectors in Rⁿ whose coordinates are rational and positive. An intuitive weakening of the problem is

max_x (1 − ‖x‖²)e^{‖x‖²}   sub x ∈ Rⁿ

whose choice set is larger yet analytically more convenient. Indeed, the relaxed problem is coercive and a simple application of the elimination method shows that its solution is x̂ = 0 (Example 1005). Since it belongs to Qⁿ₊, we can conclude that x̂ = 0 is also the unique solution to problem (25.9). It would have been far more complex to reach such a conclusion by studying the initial problem directly.
(ii) Let us consider the consumer problem with log-linear utility

max_x Σᵢ₌₁ⁿ aᵢ log xᵢ   sub x ∈ C   (25.10)

where C = B(p, I) ∩ Qⁿ is the set of bundles with rational components. Let us consider the relaxed version

max_x Σᵢ₌₁ⁿ aᵢ log xᵢ   sub x ∈ B(p, I)

with a larger yet convex (thus analytically more convenient) choice set. Indeed, convexity itself allowed us to conclude in Section 16.5 that the unique solution to the problem is the bundle x̂ such that x̂ᵢ = aᵢI/pᵢ for every good i = 1, ..., n. If aᵢ, pᵢ, I ∈ Q for every i, the bundle x̂ belongs to C and is thus the unique solution to problem (25.10). It would have been far more complex to reach such a conclusion by studying problem (25.10) directly. N
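A numerical sanity check (an illustration) of the log-linear solution x̂ᵢ = aᵢI/pᵢ on the relaxed problem, with the coefficients assumed to sum to one: the bundle exhausts the income and no randomly drawn budget-line bundle improves on it.

```python
import math
import random

a = [0.2, 0.3, 0.5]      # assumed log-linear coefficients, summing to 1
p = [1.0, 2.0, 4.0]      # illustrative prices
I = 10.0                 # illustrative income

u = lambda x: sum(ai * math.log(xi) for ai, xi in zip(a, x))
x_hat = [ai * I / pi for ai, pi in zip(a, p)]

random.seed(0)
def random_budget_bundle():
    """A random strictly positive bundle spending exactly the income I."""
    w = [random.random() + 1e-9 for _ in range(len(p))]
    s = sum(wi * pi for wi, pi in zip(w, p))
    return [wi * I / s for wi in w]
```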
25.6 No illusions
Solving optimization problems is generally a quite complex endeavor, even when a limited number of variables is involved. In this section we present an optimization problem whose solution is as complicated as proving Fermat's Last Theorem.² The latter, which was finally proven after three centuries of unfruitful efforts, states that, for n ≥ 3, there do not exist three positive integers x, y and z such that xⁿ + yⁿ = zⁿ (Section 1.3.2).
Let us consider the problem of minimizing, over a suitable set C of vectors (x, y, z, n) with n an integer ≥ 3, the function

f(x, y, z, n) = (xⁿ + yⁿ − zⁿ)² + (1 − cos 2πx)² + (1 − cos 2πy)² + (1 − cos 2πz)²

We have

inf_{(x,y,z,n)∈C} f(x, y, z, n) = 0

since lim_{n→∞} f(1, 1, ⁿ√2, n) = lim_{n→∞} (1 − cos 2πⁿ√2)² = 0. In fact, lim_{n→∞} ⁿ√2 = 1 (Proposition 310).
The infimum is thus zero. The question is whether there is a solution to the problem, that is, a vector (x̂, ŷ, ẑ, n̂) ∈ C such that f(x̂, ŷ, ẑ, n̂) = 0. Since f is a sum of squares, this requires that at such a vector they all be null, that is,

x̂^n̂ + ŷ^n̂ − ẑ^n̂ = 1 − cos 2πx̂ = 1 − cos 2πŷ = 1 − cos 2πẑ = 0

The last three equalities imply that the points x̂, ŷ and ẑ are integers.³ In order to belong to the set C, they must be positive. Therefore, the vector (x̂, ŷ, ẑ, n̂) ∈ C would have to be made up of three positive integers such that x̂^n̂ + ŷ^n̂ = ẑ^n̂ for n̂ ≥ 3. This is possible if and only if Fermat's Last Theorem is false. Now that we know it to be true, we can conclude that this optimization problem has no solution. We could not have made such a statement before 1994, and it would have been unclear whether this optimization problem had a solution. Be that as it may, solving this optimization problem, which only has four variables, is equivalent to settling one of the most well-known problems in mathematics.
²Based on K. G. Murty and S. N. Kabadi, "Some NP-complete problems in quadratic and nonlinear programming", Mathematical Programming, 39, 117-129, 1987.
³Let the reader be reminded that cos 2πx = 1 if and only if x is an integer.
Chapter 26
Equality constraints
26.1 Introduction
The classical necessary condition for local extremal points given by Fermat's Theorem considers interior points of the choice set C, something that greatly limits its use in the optimization problems coming from economics. Indeed, in many of them the monotonicity hypotheses of Proposition 666 hold and, therefore, the possible solutions are boundary, and not interior, points. A classical example is the consumer problem

max_x u(x)   sub x ∈ B(p, I)

Under the standard hypothesis of monotonicity, by Walras' Law the problem can be rewritten as

max_x u(x)   sub x ∈ {x ∈ Rⁿ₊ : p · x = I}

that is, on the budget line.
710 CHAPTER 26. EQUALITY CONSTRAINTS
functions f and gᵢ are continuously differentiable on a non-empty and open subset D of their domain A; that is, ∅ ≠ D ⊆ int A.
The set

C = {x ∈ A : gᵢ(x) = bᵢ  ∀i = 1, ..., m}   (26.3)

is the subset of A identified by the constraints; therefore, the optimization problem (26.2) can be equivalently formulated in canonical form as:

max_x f(x)   sub x ∈ C   (26.4)

Nevertheless, for this special class of optimization problems we will often use the more evocative writing (26.2).
In what follows we will first study the important special case of a single constraint, which we will then generalize to the case of several constraints.
The next fundamental lemma gives the key to finding the solutions of problem (26.4). The hypothesis x̂ ∈ C ∩ D requires that x̂ be a point of the choice set at which f and g are continuously differentiable. Moreover, we require that ∇g(x̂) ≠ 0; in this regard, note that a point x ∈ A is said to be singular if ∇g(x) = 0, and regular otherwise. According to this terminology, the condition ∇g(x̂) ≠ 0 amounts to requiring x̂ to be regular.

Lemma 1012 Let x̂ ∈ C ∩ D be a local solution of the optimization problem (26.4). If ∇g(x̂) ≠ 0, then there exists a scalar λ̂ ∈ R such that

∇f(x̂) = λ̂ ∇g(x̂)   (26.5)
Proof We prove the lemma for n = 2 (the extension to any n is routine by considering a suitable extension of the Implicit Function Theorem for functions of n variables). Since ∇g(x̂) ≠ 0, at least one of the two partial derivatives ∂g/∂x₁ or ∂g/∂x₂ is different from 0 at x̂. Let, for example, (∂g/∂x₂)(x̂) ≠ 0 (if it were (∂g/∂x₁)(x̂) ≠ 0 the proof would be symmetric).
26.3. ONE CONSTRAINT 711
As seen in Section 23.2.2, the Implicit Function Theorem can also be applied to study locally points belonging to the level curves g^{-1}(b) with b ∈ R. Since x̂ = (x̂1, x̂2) ∈ g^{-1}(b), thanks to such a theorem there exist neighborhoods U(x̂1) and V(x̂2) and a unique function with a derivative h : U(x̂1) → V(x̂2) such that x̂2 = h(x̂1) and g(x1, h(x1)) = b for each x1 ∈ U(x̂1), with

h'(x1) = − (∂g/∂x1 (x1, x2)) / (∂g/∂x2 (x1, x2))   ∀(x1, x2) ∈ g^{-1}(b) ∩ (U(x̂1) × V(x̂2))

Define the auxiliary function φ : U(x̂1) → R by φ(x1) = f(x1, h(x1)); by the chain rule,

φ'(x1) = ∂f/∂x1 (x1, h(x1)) + ∂f/∂x2 (x1, h(x1)) h'(x1)
Since x̂ is a local solution of the optimization problem (26.4), there exists a neighborhood B_ε(x̂) of x̂ such that

f(x̂) ≥ f(x)   ∀x ∈ g^{-1}(b) ∩ B_ε(x̂)   (26.6)

Without loss of generality, suppose that ε is sufficiently small so that

(x̂1 − ε, x̂1 + ε) ⊆ U(x̂1)   and   (x̂2 − ε, x̂2 + ε) ⊆ V(x̂2)

Hence, B_ε(x̂) ⊆ U(x̂1) × V(x̂2). This permits us to rewrite (26.6) as

f(x̂1, h(x̂1)) ≥ f(x1, h(x1))   ∀x1 ∈ (x̂1 − ε, x̂1 + ε)
that is, φ(x̂1) ≥ φ(x1) for every x1 ∈ (x̂1 − ε, x̂1 + ε), where φ(x1) = f(x1, h(x1)). The point x̂1 is, therefore, a local maximizer for φ. The first order condition is:

φ'(x̂1) = ∂f/∂x1 (x̂1, x̂2) − ∂f/∂x2 (x̂1, x̂2) · (∂g/∂x1 (x̂1, x̂2)) / (∂g/∂x2 (x̂1, x̂2)) = 0   (26.7)
If ∂g/∂x1 (x̂1, x̂2) ≠ 0, we have

(∂f/∂x1 (x̂1, x̂2)) / (∂g/∂x1 (x̂1, x̂2)) = (∂f/∂x2 (x̂1, x̂2)) / (∂g/∂x2 (x̂1, x̂2))

By setting

λ̂ = (∂f/∂x1 (x̂1, x̂2)) / (∂g/∂x1 (x̂1, x̂2)) = (∂f/∂x2 (x̂1, x̂2)) / (∂g/∂x2 (x̂1, x̂2))

we get

∂f/∂x1 (x̂1, x̂2) = λ̂ ∂g/∂x1 (x̂1, x̂2)
∂f/∂x2 (x̂1, x̂2) = λ̂ ∂g/∂x2 (x̂1, x̂2)
If ∂g/∂x1 (x̂1, x̂2) = 0, from (26.7) we also have

∂f/∂x1 (x̂1, x̂2) = 0

so that the equality

∂f/∂x1 (x̂1, x̂2) = λ̂ ∂g/∂x1 (x̂1, x̂2)

is trivially verified for every scalar λ̂. Setting

λ̂ = (∂f/∂x2 (x̂1, x̂2)) / (∂g/∂x2 (x̂1, x̂2))

we thus obtain (26.5) also in this case.
Equality (26.5) tells us that a necessary condition for x̂ to be a local solution of the optimization problem (26.4) is that the gradients of the functions f and g be proportional. The "hat" on λ̂ reminds us that this scalar depends on the point x̂ considered.

The next example shows that condition (26.5) is necessary, but not sufficient. The problem

max_{x1,x2} (x1³ + x2³)/2   sub   x1 − x2 = 0   (26.8)

is of the form (26.4), where f, g : R² → R are given by f(x) = (x1³ + x2³)/2 and g(x) = x1 − x2, while b = 0. We have ∇f(0, 0) = (0, 0) and ∇g(0, 0) = (1, −1), and so λ̂ = 0 is such that ∇f(0, 0) = λ̂ ∇g(0, 0). The point (0, 0) thus satisfies condition (26.5) with λ̂ = 0, but it is not a solution of problem (26.8). Indeed, f(t, t) = t³ > 0 = f(0, 0) for every t > 0. Note that (0, 0) is not even a constrained (global) minimizer since f(t, t) = t³ < 0 for every t < 0. N
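The failure of sufficiency in this example is easy to confirm symbolically. The following sketch (assuming Python with sympy is available; it is not part of the text) checks that (0, 0) satisfies condition (26.5) with λ̂ = 0, while along the constraint the objective reduces to t³ and so is unbounded in both directions:

```python
import sympy as sp

# f(x) = (x1^3 + x2^3)/2 and g(x) = x1 - x2, as in the example.
x1, x2, t = sp.symbols('x1 x2 t', real=True)
f = (x1**3 + x2**3) / 2
g = x1 - x2

# Gradients evaluated at (0, 0): grad f = (0, 0) = 0 * grad g, i.e. (26.5) holds.
grad_f = [sp.diff(f, v).subs({x1: 0, x2: 0}) for v in (x1, x2)]
grad_g = [sp.diff(g, v).subs({x1: 0, x2: 0}) for v in (x1, x2)]
assert grad_f == [0, 0] and grad_g == [1, -1]

# On the constraint x1 = x2 = t the objective is t^3: unbounded, so (0, 0)
# is neither a constrained maximizer nor a constrained minimizer.
on_constraint = f.subs({x1: t, x2: t})
assert sp.simplify(on_constraint - t**3) == 0
```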
Componentwise, condition (26.5) reads

(∂f/∂x1 (x̂), ∂f/∂x2 (x̂)) = λ̂ (∂g/∂x1 (x̂), ∂g/∂x2 (x̂))

that is,

∂f/∂x1 (x̂) = λ̂ ∂g/∂x1 (x̂)   and   ∂f/∂x2 (x̂) = λ̂ ∂g/∂x2 (x̂)   (26.10)

The condition ∇g(x̂) ≠ 0 requires that at least one of the partial derivatives ∂g/∂xi (x̂) be different from zero. If, for convenience, we suppose that both are so and that λ̂ ≠ 0, then (26.10) is equivalent to

(∂f/∂x1 (x̂)) / (∂g/∂x1 (x̂)) = (∂f/∂x2 (x̂)) / (∂g/∂x2 (x̂))   (26.11)
If x̂ is a solution of the optimization problem, we must necessarily have df(x̂)(h) = 0 for every legitimate variation h. Otherwise, if it were df(x̂)(h) > 0, we would get a point x̂ + h that satisfies the equality constraint but such that f(x̂ + h) > f(x̂). On the other hand, if it were df(x̂)(h) < 0, the same observation could be made this time for −h, which is obviously a legitimate variation and would lead to the point x̂ − h with f(x̂ − h) > f(x̂).

The necessary condition df(x̂)(h) = 0 together with (26.13) gives

− ∂f/∂x1 (x̂) · (∂g/∂x2 (x̂)) / (∂g/∂x1 (x̂)) h2 + ∂f/∂x2 (x̂) h2 = 0

which is precisely expression (26.11). At an intuitive level, all this explains why (26.5) is necessary for x̂ to be a solution of the problem.
∇f(x̂) − λ̂ ∇g(x̂) = 0

By recalling the algebra of gradients, the expression ∇f(x) − λ ∇g(x) makes it natural to consider the function L : A × R ⊆ R^n × R → R defined as

L(x, λ) = f(x) + λ (b − g(x))   (26.14)

This function, called the Lagrangian, plays a key role in optimization problems. Its gradient is

∇L(x, λ) = (∂L/∂x1 (x, λ), ..., ∂L/∂xn (x, λ), ∂L/∂λ (x, λ)) ∈ R^{n+1}

It is important to distinguish in it the two parts ∇_x L and ∇_λ L given by

∇_x L(x, λ) = (∂L/∂x1 (x, λ), ..., ∂L/∂xn (x, λ)) ∈ R^n

and

∇_λ L(x, λ) = ∂L/∂λ (x, λ) ∈ R

Using such notation, we have

∇_x L(x, λ) = ∇f(x) − λ ∇g(x)   (26.15)

and

∇_λ L(x, λ) = b − g(x)   (26.16)

which leads to the following fundamental formulation, in terms of the Lagrangian function, of the necessary optimality condition of Lemma 1012.

Theorem 1014 (Lagrange) Let x̂ ∈ C ∩ D be a local solution of the optimization problem (26.4). If ∇g(x̂) ≠ 0, then there exists λ̂ ∈ R such that (x̂, λ̂) is a stationary point of the Lagrangian, that is, ∇L(x̂, λ̂) = 0.
Proof Let x̂ be a local solution of the optimization problem (26.4). By Lemma 1012 there exists λ̂ ∈ R such that

∇f(x̂) − λ̂ ∇g(x̂) = 0

By (26.15), this condition is equivalent to

∇_x L(x̂, λ̂) = 0

On the other hand, by (26.16) we have ∇_λ L(x, λ) = b − g(x), and therefore also ∇_λ L(x̂, λ̂) = 0 since b − g(x̂) = 0. It follows that (x̂, λ̂) is a stationary point of L.
Thanks to Lagrange's Theorem, the search for the local solutions of the constrained optimization problem (26.4) reduces to the search for the stationary points of a suitable function
26.4. THE METHOD OF ELIMINATION 715
of several variables, the Lagrangian function. It is a more complicated function than the original function f because of the new variable λ, but through it the search for the solutions of the optimization problem can be carried out by solving a standard first order condition, similar to the ones seen for unconstrained optimization problems.

Needless to say, we are discussing a condition that is only necessary: there is no guarantee that the stationary points are actually solutions of the problem. It is already a remarkable achievement, however, to have the simple (first order) condition

∇L(x, λ) = 0   (26.17)

for the search of the possible candidate solutions of the constrained optimization problem (26.4). In the next section we will see that this condition plays a fundamental role in the search for the local solutions of problem (26.4) with Lagrange's method, which in turn may lead to the global solutions via a version of the elimination method.

We close with two important remarks. First, observe that in general the pair (x̂, λ̂) is not a maximizer of the Lagrangian function, even when x̂ turns out to solve the optimization problem. The pair (x̂, λ̂) is just a stationary point of the Lagrangian function, nothing more. Therefore, to say that the search for the solutions of the constrained optimization problem reduces to the search for the maximizers of the Lagrangian function is a serious mistake.
Second, note that problem (26.4) has a symmetric version

min_x f(x)   sub   g(x) = b

in which, instead of looking for maximizers, we look for minimizers. Condition (26.5) is necessary also for this version of problem (26.4) and, therefore, the stationary points of the Lagrangian function could be minimizers instead of maximizers. At the same time, it may happen that they are neither maximizers nor minimizers. It is the usual ambiguity of first order conditions, which we already encountered in unconstrained optimization: it reflects the status of necessary conditions that first order conditions have.
5. the local solutions of the optimization problem (26.4), if they exist, belong to the set

S ∪ (C ∩ D0) ∪ (C \ D)   (26.18)
According to Lagrange's method, therefore, the possible local solutions of the optimization problem (26.4) must be searched for among the points of the subset (26.18) of C. Indeed, a local solution that is a regular point belongs to the set S thanks to Lagrange's Theorem. Instead, this theorem says nothing about possible local solutions that are singular points (and so belong to the set C ∩ D0), or about possible local solutions at which the functions have a discontinuous derivative (and so belong to the set C \ D).

In conclusion, a necessary condition for a point x ∈ C to be a local solution of the optimization problem (26.4) is that it belong to the subset S ∪ (C ∩ D0) ∪ (C \ D) ⊆ C. This is what this procedure, a key dividend of Lagrange's Theorem, establishes. Clearly, the smaller such a set is, the more effective the application of the theorem: the search for local solutions can then be restricted to a significantly smaller set than the original set C.

That said, what about global solutions? If the objective function f is coercive and continuous on C, the five phases of Lagrange's method plus the following extra sixth phase provide a version of the elimination method to find the global solutions.
6. if x̂ ∈ S ∪ (C ∩ D0) ∪ (C \ D) is such that

f(x̂) ≥ f(x)   ∀x ∈ S ∪ (C ∩ D0) ∪ (C \ D)   (26.19)

then x̂ is a (global) solution of the optimization problem (26.4).
In other words, the points of the set (26.18) at which f attains its maximum value are the solutions of the optimization problem. Indeed, by Lagrange's method this is the set of the possible local solutions; the global solution, whose existence is ensured by Tonelli's Theorem, must then belong to such a set. Hence, the solutions of the "restricted" optimization problem are also the solutions of the optimization problem (26.4). Phase 6 is based on this remarkable fact. As for Lagrange's method, the smaller the set (26.18), the more effective the application of the elimination method. In particular, in the lucky case when it is a singleton, the elimination method determines the unique solution of the optimization problem, a remarkable achievement.
The optimization problem

max_x e^{−‖x‖²}   sub   Σ_{i=1}^n x_i = 1   (26.21)

is of the form (26.4), where f : R^n → R and g : R^n → R are given by f(x) = e^{−‖x‖²} and g(x) = Σ_{i=1}^n x_i, while b = 1. The functions are continuously differentiable on R^n, that is, D = R^n. As in the previous example, C ∩ D = C and C \ D = ∅: at all the points of the constraint the functions f and g have a continuous derivative. We have therefore completed phases 1 and 2 of Lagrange's method.
Since ∇g(x) = (1, 1, ..., 1), there are no singular points, that is, D0 = ∅. This completes phase 3 of Lagrange's method.

The Lagrangian function L : R^{n+1} → R is given by

L(x, λ) = e^{−‖x‖²} + λ (1 − Σ_{i=1}^n x_i)

To find the set of its stationary points it is necessary to solve the first order condition (26.17), given by the following (nonlinear) system of n + 1 equations:

∂L/∂x_i = −2x_i e^{−‖x‖²} − λ = 0   ∀i = 1, ..., n
∂L/∂λ = 1 − Σ_{i=1}^n x_i = 0

We observe that in no solution can we have λ = 0. Indeed, if it were so, the first n equations would imply x_i = 0 for all i, which contradicts the last equation. It follows that in every solution we have λ ≠ 0. The first n equations imply

x_i = −(λ/2) e^{‖x‖²}

and, by substituting these values in the last equation, we find

1 + (nλ/2) e^{‖x‖²} = 0

that is,

λ = −(2/n) e^{−‖x‖²}

Substituting such value of λ in any of the first n equations we find x_i = 1/n, and therefore the unique point (x, λ) ∈ R^{n+1} that satisfies the first order condition (26.17) is

(1/n, 1/n, ..., 1/n, −(2/n) e^{−1/n})
Since

S ∪ (C ∩ D0) ∪ (C \ D) = S   (26.22)

in this example the first order condition (26.17) turns out to be necessary for any local solution of the optimization problem (26.21). The unique element of S is therefore the only candidate to be a local solution of the problem.
Turn now to the elimination method, which we can use since the continuous function f is coercive on the (non-compact: closed, but not bounded) set C = {x = (x1, ..., xn) ∈ R^n : Σ_{i=1}^n x_i = 1}. Indeed,

(f ≥ t) = R^n if t ≤ 0;   {x ∈ R^n : ‖x‖ ≤ √(−log t)} if t ∈ (0, 1];   ∅ if t > 1

and so the set (f ≥ t) is compact and non-empty for each t ∈ (0, 1]. Since the set in (26.22) is a singleton, the elimination method allows us to conclude that (1/n, ..., 1/n) is the unique solution of the optimization problem (26.21). N
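A quick numerical sanity check of this conclusion (a sketch assuming Python with numpy; not part of the text): maximizing e^{−‖x‖²} on the hyperplane Σ x_i = 1 is the same as minimizing ‖x‖² there, whose solution is the projection of the origin onto the hyperplane, namely (1/n, ..., 1/n); no randomly generated feasible point should beat it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x_star = np.full(n, 1.0 / n)               # candidate optimum (1/n, ..., 1/n)
f = lambda x: np.exp(-np.dot(x, x))        # objective e^{-||x||^2}

for _ in range(1000):
    z = rng.normal(size=n)
    x = z - (z.sum() - 1.0) / n            # project a random point onto sum(x) = 1
    assert abs(x.sum() - 1.0) < 1e-9       # feasibility
    assert f(x) <= f(x_star) + 1e-12       # never beats the candidate optimum
```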
To find the set of its stationary points we need to solve the first order condition (26.17), given by the following (nonlinear) system of n + 1 equations:

∂L/∂x_i = p_i / x_i − λ = 0   ∀i = 1, ..., n
∂L/∂λ = 1 − Σ_{i=1}^n x_i = 0

Because the coordinates of the vector p are all different from zero,¹ λ = 0 cannot occur in any solution. It follows that in each solution we have λ ≠ 0. Because x ∈ R^n_{++}, the first n equations imply p_i = λ x_i and, replacing these values in the last equation, we find Σ_{i=1}^n p_i = λ. By replacing that value of λ in each of the first n equations we find x_i = p_i / Σ_{j=1}^n p_j. Thus, the unique point (x, λ) ∈ R^{n+1} that satisfies the first order condition (26.17) is

(p_1 / Σ_{i=1}^n p_i, p_2 / Σ_{i=1}^n p_i, ..., p_n / Σ_{i=1}^n p_i, Σ_{i=1}^n p_i)

so that

S = {(p_1 / Σ_{i=1}^n p_i, p_2 / Σ_{i=1}^n p_i, ..., p_n / Σ_{i=1}^n p_i)}

¹ All coordinates of p are either strictly positive or strictly negative.
Since

S ∪ (C ∩ D0) ∪ (C \ D) = S   (26.24)

also in this example the first order condition (26.17) is necessary for each local solution of the optimization problem (26.23). Again, the unique element of S is the only candidate to be a local solution of the optimization problem (26.23).

We can apply the elimination method because, by Lemma 712, the continuous function f is coercive on the set C = {x ∈ R^n_{++} : Σ_{i=1}^n x_i = 1}, which is not compact because it is not closed. In view of (26.24), the elimination method implies that (p_1 / Σ_{i=1}^n p_i, ..., p_n / Σ_{i=1}^n p_i) is the unique solution of the optimization problem (26.23). N
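The stationary point can also be reproduced symbolically; a sketch for n = 3, assuming Python with sympy is available and assuming the objective f(x) = Σ_i p_i log x_i, as the first order system above indicates:

```python
import sympy as sp

# Symbols are illustrative; p_i > 0 is assumed to keep solve() on one branch.
p = sp.symbols('p1 p2 p3', positive=True)
x = sp.symbols('x1 x2 x3', positive=True)
lam = sp.Symbol('lam')

# Lagrangian of  max sum p_i log x_i  sub  sum x_i = 1.
L = sum(pi * sp.log(xi) for pi, xi in zip(p, x)) + lam * (1 - sum(x))
foc = [sp.diff(L, v) for v in (*x, lam)]
sol = sp.solve(foc, [*x, lam], dict=True)[0]

# Stationary point: x_i = p_i / (p1 + p2 + p3) and lam = p1 + p2 + p3.
assert sp.simplify(sol[x[0]] - p[0] / sum(p)) == 0
assert sp.simplify(sol[lam] - sum(p)) == 0
```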
When the elimination method is based on Weierstrass' Theorem, rather than on the weaker (but more widely applicable) Tonelli's Theorem, as a "by-product" we can also find the global minimizers, that is, the points x ∈ C that solve the problem min_x f(x) sub x ∈ C. Indeed, it is easy to see that these are the points x that minimize f over S ∪ (C ∩ D0) ∪ (C \ D). Clearly, this is no longer true with Tonelli's Theorem because it only ensures the existence of maximizers and remains silent on possible minimizers.
The optimization problem

max_{x1,x2} 2x1² + 5x2²   sub   x1² + x2² = 1   (26.25)

is of the form (26.4), where f, g : R² → R are given by f(x1, x2) = 2x1² + 5x2² and g(x1, x2) = x1² + x2², while b = 1. The functions are continuously differentiable on R², that is, D = R². Hence, C ∩ D = C and C \ D = ∅: at all the points of the constraint the functions f and g have a continuous derivative. This completes phases 1 and 2 of Lagrange's method.

We have ∇g(x) = (2x1, 2x2), and so (0, 0) is the unique singular point, that is, D0 = {(0, 0)}. The unique singular point does not satisfy the constraint, so that C ∩ D0 = ∅. We have therefore completed phase 3 of Lagrange's method.
The Lagrangian function L : R³ → R is given by

L(x1, x2, λ) = 2x1² + 5x2² + λ (1 − x1² − x2²)

To find the set of its stationary points it is necessary to solve the first order condition (26.17):

∂L/∂x1 = 4x1 − 2λx1 = 0
∂L/∂x2 = 10x2 − 2λx2 = 0
∂L/∂λ = 1 − x1² − x2² = 0

in the three unknowns x1, x2 and λ. We verify immediately that x1 = x2 = 0 satisfies the first two equations for every value of λ, but does not satisfy the third equation. Next, x1 = 0 and λ = 5 imply x2 = ±1, while x2 = 0 and λ = 2 imply x1 = ±1.² In conclusion, the triples (x1, x2, λ) that satisfy the first order condition (26.17) are

(0, 1, 5),  (0, −1, 5),  (1, 0, 2),  (−1, 0, 2)

so that

S = {(0, 1), (0, −1), (1, 0), (−1, 0)}
Since

S ∪ (C ∩ D0) ∪ (C \ D) = S   (26.26)

as in the last two examples, the first order condition is necessary for any local solution of the optimization problem (26.25).

Having completed Lagrange's method, let us turn to the elimination method to find the global solutions. Since the set C = {(x1, x2) ∈ R² : x1² + x2² = 1} is compact and the function f is continuous, we can use this method through Weierstrass' Theorem. In view of (26.26), in phase 6 we have

f(0, 1) = f(0, −1) = 5   and   f(1, 0) = f(−1, 0) = 2

The points (0, 1) and (0, −1) are thus the (global) solutions of the optimization problem (26.25), while the reliance here of the elimination method on Weierstrass' Theorem makes it possible to say that the points (1, 0) and (−1, 0) are global minimizers. N
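A brute-force check of this example (a sketch assuming Python with numpy; the objective 2x1² + 5x2² is as reconstructed here): parametrizing the unit circle confirms that the maximum value 5 is attained near (0, ±1) and the minimum value 2 near (±1, 0).

```python
import numpy as np

# Parametrize the constraint x1^2 + x2^2 = 1 by the angle theta.
theta = np.linspace(0.0, 2.0 * np.pi, 100001)
x1, x2 = np.cos(theta), np.sin(theta)
f = 2.0 * x1**2 + 5.0 * x2**2          # = 2 + 3 sin^2(theta)

i_max, i_min = np.argmax(f), np.argmin(f)
assert abs(f[i_max] - 5.0) < 1e-6 and abs(abs(x2[i_max]) - 1.0) < 1e-3
assert abs(f[i_min] - 2.0) < 1e-6 and abs(abs(x1[i_min]) - 1.0) < 1e-3
```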
We have ∇g(x) = (3x1², −2x2), and therefore (0, 0) is the unique singular point, and it satisfies the constraint: D0 = C ∩ D0 = {(0, 0)}. Phase 3 of Lagrange's method has thus been completed.

The Lagrangian function L : R³ → R is given by

L(x1, x2, λ) = e^{−x1} + λ (x2² − x1³)

² Note that there are no other points that satisfy ∇L = 0. Indeed, suppose that ∇L(x̂1, x̂2, λ̂) = 0 with x̂1 ≠ 0 and x̂2 ≠ 0. Then from ∂L/∂x1 = 0 we deduce λ = 2 and from ∂L/∂x2 = 0 we deduce λ = 5, a contradiction.
26.5. THE CONSUMER PROBLEM 721
To find the set of its stationary points it is necessary to solve the first order condition (26.17), given by the following (nonlinear) system of three equations:

∂L/∂x1 = −e^{−x1} − 3λx1² = 0
∂L/∂x2 = 2λx2 = 0
∂L/∂λ = x2² − x1³ = 0

We observe that in no solution can we have λ = 0. Indeed, if it were λ = 0 the first equation would become −e^{−x1} = 0, which has no solution. Let us suppose therefore λ ≠ 0. The second equation implies x2 = 0, and therefore from the third one it follows that x1 = 0. The first equation then becomes −1 = 0, and this contradiction shows that the system has no solutions. Therefore there are no points that satisfy the first order condition (26.17), so that S = ∅. Phase 4 of Lagrange's method shows that

S ∪ (C ∩ D0) ∪ (C \ D) = {(0, 0)}

By Lagrange's method, the unique possible local solution of the optimization problem (26.27) is the point (0, 0).
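The emptiness of S can be double-checked by reproducing the case analysis symbolically (a sketch assuming Python with sympy is available):

```python
import sympy as sp

# First order system of the example:
#   -e^{-x1} - 3*lam*x1^2 = 0,  2*lam*x2 = 0,  x2^2 - x1^3 = 0.
x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
eq1 = -sp.exp(-x1) - 3 * lam * x1**2

# Case lam = 0: the first equation becomes -e^{-x1} = 0, which has no solution.
assert sp.solve(eq1.subs(lam, 0), x1) == []

# Case lam != 0: the second equation forces x2 = 0, the third then forces
# x1 = 0, and the first equation evaluates to -1, a contradiction.
assert eq1.subs({x1: 0}) == -1
```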
Turn now to the elimination method. To use it we need to show that the continuous f is coercive on the (non-compact: closed, but not bounded) set C = {x = (x1, x2) ∈ R² : x1³ = x2²}. Note that

(f ≥ t) = R² if t ≤ 0;   (−∞, −log t] × R if t ∈ (0, 1];   ∅ if t > 1

Thus, f is not coercive on the entire space R², but it is coercive on C, which is all that matters here. Indeed, note that x1 can satisfy the constraint x1³ = x2² only if x1 ≥ 0, so that C ⊆ R₊ × R and the set C ∩ (f ≥ t) is closed and bounded, hence compact, and non-empty for each t ∈ (0, 1].
Here the constraint set is the budget frontier {x ∈ A : p · x = I}, with p ≫ 0 (strictly positive prices), and the utility function u : A ⊆ R^n_+ → R is strictly increasing on A and continuously differentiable on int A.³ For example, the log-linear utility function u : R^n_{++} → R defined by u(x) = Σ_{i=1}^n a_i log x_i satisfies these hypotheses, with A = int A = R^n_{++}, while the separable utility function u : R^n_+ → R defined by u(x) = Σ_{i=1}^n x_i satisfies them with int A = R^n_{++} ⊊ A = R^n_+.

Let us first find the local solutions through Lagrange's method. The function g(x) = p · x expresses the constraint, so D = int A and C \ D = (A \ int A) ∩ C. The set C \ D is, therefore, formed by the boundary points of A that satisfy the constraint and belong to A. Note that when A = int A, as in the log-linear case, we have C \ D = ∅.

From

∇g(x) = p   ∀x ∈ R^n

it follows that there are no singular points, that is, D0 = ∅; hence, C ∩ D0 = ∅. All this completes phases 1–4 of Lagrange's method.
The Lagrangian function L : A × R → R is given by

L(x, λ) = u(x) + λ (I − p · x)

and, to find the set of its stationary points, it is necessary to solve the first order condition:

∂L/∂x_i (x, λ) = 0   ∀i = 1, ..., n
∂L/∂λ (x, λ) = 0

that is,

∂u(x)/∂x_i − λ p_i = 0   ∀i = 1, ..., n
I − p · x = 0

In a more compact way, we write

∂u(x)/∂x_i = λ p_i   ∀i = 1, ..., n   (26.29)

p · x = I   (26.30)
The fundamental condition (26.29) is read in different ways according to the interpretation, cardinalist or ordinalist, of the utility function. Let us suppose, for simplicity, that λ ≠ 0. According to the cardinalist reading, the condition takes the equivalent form

(∂u(x)/∂x1) / p_1 = ⋯ = (∂u(x)/∂xn) / p_n
³ Note that A ⊆ R^n_+ implies int A ⊆ R^n_{++}, i.e., the interior points of A always have strictly positive coordinates.
which says that at a bundle x that is a (local) solution of the consumer problem the marginal utilities of the income spent on the various goods, measured by the ratios

(∂u(x)/∂x_i) / p_i

are all equal. Note that 1/p_i is the quantity of good i that can be purchased with one unit of income.

In an ordinalist perspective, where the notion of marginal utility becomes meaningless, condition (26.29) is rewritten as

(∂u(x)/∂x_i) / (∂u(x)/∂x_j) = p_i / p_j

for every pair of goods i and j at the solution bundle x. In such a bundle, therefore, the marginal rate of substitution between each pair of goods must be equal to the ratio of their prices, that is, MRS_{x_i,x_j} = p_i / p_j. For n = 2 we have the classical geometric interpretation of the optimality condition at a bundle (x1, x2) as the equality between the slope of the indifference curve (in the sense of Section 23.2.2) and that of the budget line.
[Figure: the indifference curve through the optimal bundle (x1, x2) is tangent to the budget line, illustrating the equality between the marginal rate of substitution and the price ratio.]
The ordinalist interpretation does not require the cardinalist notion of marginal utility, a notion that – by Occam's razor – is therefore superfluous for the study of the consumer problem. The observation dates back to Vilfredo Pareto and represented a turning point in the history of utility theory, so much so that we speak of an "ordinalist revolution".⁴

In any case, expressions (26.29) and (26.30) are first order conditions of the consumer problem and their resolution determines the set S of the stationary points. In conclusion,

⁴ See his "Sunto di alcuni capitoli di un nuovo trattato di economia pura del prof. Pareto", which appeared in the Giornale degli Economisti in 1900 (translated in Giornale degli Economisti, 2008).
Lagrange’s method implies that the local solutions of the consumer problem must be looked
for among the points of
S [ ((A int A) \ C) (26.31)
Beyond points that satisfy …rst order conditions (26.29) and (26.30), local solutions can
therefore be boundary points A int A of the set A that satisfy the constraint (such solutions
are called boundary 5 ).
When u is coercive on (p; I) we can apply the elimination method to …nd the (global)
solutions of the consumer problem, that is, the optimal bundles (which are the economically
meaningful notions, consumers do not care about bundles that are just locally optimal). In
view of (26.31), the solutions are the bundles x
^ 2 S [ ((A int A) \ C) such that
u (^
x) u (x) 8x 2 S [ ((A int A) \ C)
In other words, we have to compare the utility levels attained by the stationary points in S
and by the boundary points that satisfy the constraint in (A int A)\C. As the comparison
requires the computation of all these utility levels, the smaller the set S [ ((A int A) \ C)
the more e¤ective the elimination method.
Example 1019 Consider the log-linear utility function in the case n = 2, i.e.,

u(x1, x2) = a log x1 + (1 − a) log x2

with a ∈ (0, 1). The first order condition at every (x1, x2) ∈ R²_{++} takes the form

a/x1 = λ p1,   (1 − a)/x2 = λ p2   (26.32)

p1 x1 + p2 x2 = I   (26.33)

We turn now to the elimination method, which we can use since, by Lemma 712, the continuous function u is coercive on the budget frontier {x ∈ R²_{++} : p1 x1 + p2 x2 = I}, which is not compact since it is not closed. In view of (26.34), the elimination method implies that the bundle (26.35) is the unique solution of the log-linear consumer problem, that is, the unique optimal bundle. This confirms what we already proved and discussed in Section 16.6, in a more general and elegant way, through Jensen's inequality. N
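Although the displays (26.34)–(26.35) are not reproduced above, the first order conditions (26.32)–(26.33) pin the optimal bundle down; a symbolic sketch (assuming Python with sympy is available) recovers the familiar log-linear demands x1 = aI/p1 and x2 = (1 − a)I/p2:

```python
import sympy as sp

# Positivity assumptions keep solve() on the economically relevant branch.
a, p1, p2, I_, lam = sp.symbols('a p1 p2 I lam', positive=True)
x1, x2 = sp.symbols('x1 x2', positive=True)

# First order conditions (26.32) and budget equation (26.33).
foc = [a / x1 - lam * p1, (1 - a) / x2 - lam * p2, I_ - p1 * x1 - p2 * x2]
sol = sp.solve(foc, [x1, x2, lam], dict=True)[0]

assert sp.simplify(sol[x1] - a * I_ / p1) == 0
assert sp.simplify(sol[x2] - (1 - a) * I_ / p2) == 0   # lam comes out as 1/I
```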
It is immediate to check that there are two boundary solutions, x̂1 = 0 and x̂1 = I/p1, when, respectively, p1 > p2 and p1 < p2. This shows how misleading a mechanical use of differential arguments can be. N
The following result extends Lemma 1012 to the case of multiple constraints and shows that the regularity condition ∇g(x̂) ≠ 0 of that lemma generalizes to the requirement that the Jacobian matrix Dg(x̂) have full rank.⁶ In other words, x̂ must not be a singular point here either.

Lemma 1021 Let x̂ ∈ C ∩ D be a local solution of the optimization problem (26.36). If Dg(x̂) has full rank, then there is a vector λ̂ ∈ R^m such that

∇f(x̂) = Σ_{i=1}^m λ̂_i ∇g_i(x̂)   (26.37)

In this case the Lagrangian function L : A × R^m → R takes the form

L(x, λ) = f(x) + Σ_{i=1}^m λ_i (b_i − g_i(x))   (26.38)

for every (x, λ) ∈ A × R^m, and Theorem 1014 can be generalized in the following way (we omit the proof as it is analogous to that of the cited result).

⁶ We shall omit the proof, which generalizes that of Lemma 1012 by means of an adequate version of the Implicit Function Theorem.
26.7. SEVERAL CONSTRAINTS 727
The considerations we made for Theorem 1014 also hold in this more general case. In particular, the search for local solution candidates of the constrained problem must still be conducted following Lagrange's method, which displays some conceptual novelties in the multiple constraints case. The elimination method can still be used, again without any conceptual novelty, to check whether such local candidates actually solve the optimum problem. The examples will momentarily illustrate all this.

From an operational standpoint note, however, that the first order condition (26.17)

∇L(x, λ) = 0

is now based on the Lagrangian L in the more complex form (26.38). Also the form of the set of singular points D0 is more complex now. In particular, the study of the rank of the Jacobian may be involved, thus making the search for singular points quite hard. The best strategy often is to look directly for the singular points that satisfy the constraints, that is, for the set C ∩ D0, instead of first determining the set D0 and then the intersection C ∩ D0 (as we did in the case of one constraint). The points x ∈ C ∩ D0 are such that g_i(x) = b_i and the gradients ∇g_i(x) are linearly dependent. We must therefore verify
whether the system

Σ_{i=1}^m λ_i ∇g_i(x) = 0
g_1(x) = b_1
⋮
g_m(x) = b_m

admits solutions (x, λ) ∈ R^n × R^m with λ = (λ_1, ..., λ_m) ≠ 0, that is, with the λ_i not all null. Such solutions identify the singular points that satisfy the constraints.
Note that the system can be written componentwise as

Σ_{i=1}^m λ_i ∂g_i(x)/∂x_1 = 0
⋮
Σ_{i=1}^m λ_i ∂g_i(x)/∂x_n = 0   (26.39)
g_1(x) = b_1
⋮
g_m(x) = b_m

which makes computations more convenient.
To find the set of its critical points we must solve the first order condition (26.17), which is given by the following (nonlinear) system of five equations:

∂L/∂x1 = 7 − 2λ1x1 − λ2 = 0
∂L/∂x2 = −2λ1x2 − λ2 = 0
∂L/∂x3 = −3 + λ2 = 0
∂L/∂λ1 = 1 − x1² − x2² = 0
∂L/∂λ2 = 1 − x1 − x2 + x3 = 0

in the five unknowns x1, x2, x3, λ1 and λ2. The third equation implies λ2 = 3, so that the system becomes:

−2λ1x1 + 4 = 0
−2λ1x2 − 3 = 0
1 − x1² − x2² = 0
1 − x1 − x2 + x3 = 0

The first equation implies that λ1 ≠ 0. Therefore, from the first two equations it follows that x1 = 2/λ1 and x2 = −3/(2λ1). By substituting into the third equation we get λ1 = ±5/2. If λ1 = 5/2, we have x1 = 4/5, x2 = −3/5, x3 = −4/5; if λ1 = −5/2, we have x1 = −4/5, x2 = 3/5, and x3 = −6/5. We have thus found the two critical points of the Lagrangian

(4/5, −3/5, −4/5, 5/2, 3)   and   (−4/5, 3/5, −6/5, −5/2, 3)
so that

S = {(4/5, −3/5, −4/5), (−4/5, 3/5, −6/5)}

thus completing all phases of Lagrange's method. In conclusion, we have that

S ∪ (C ∩ D0) ∪ (C \ D) = S = {(4/5, −3/5, −4/5), (−4/5, 3/5, −6/5)}   (26.41)

thus proving that in this example the first order condition (26.17) is necessary for any local solution of the optimization problem (26.40).

We now turn to the elimination method. Clearly, the set C is closed. It is also bounded (and so compact): for the x1 and x2 such that x1² + x2² = 1 we have x1, x2 ∈ [−1, 1], while for the x3 such that x3 = x1 + x2 − 1 and x1, x2 ∈ [−1, 1] we have x3 ∈ [−3, 1]. It follows that C ⊆ [−1, 1] × [−1, 1] × [−3, 1], and so C is bounded. Since f is continuous, we can thus use the elimination method through Weierstrass' Theorem. In view of (26.41), in the last phase of the elimination method we have

f(4/5, −3/5, −4/5) = 8   and   f(−4/5, 3/5, −6/5) = −2

Hence, (4/5, −3/5, −4/5) solves the optimum problem (26.40), while (−4/5, 3/5, −6/5) is a minimizer. N
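The critical points can be reproduced by solving the reduced system obtained after substituting λ2 = 3 (a sketch assuming Python with sympy is available, with the signs as reconstructed here):

```python
import sympy as sp

x1, x2, x3, l1 = sp.symbols('x1 x2 x3 lam1', real=True)

# Reduced system of the example after lam2 = 3 is substituted in.
eqs = [-2 * l1 * x1 + 4,
       -2 * l1 * x2 - 3,
       1 - x1**2 - x2**2,
       1 - x1 - x2 + x3]
sols = sp.solve(eqs, [x1, x2, x3, l1], dict=True)
pts = {(s[x1], s[x2], s[x3]) for s in sols}

expected = {(sp.Rational(4, 5), sp.Rational(-3, 5), sp.Rational(-4, 5)),
            (sp.Rational(-4, 5), sp.Rational(3, 5), sp.Rational(-6, 5))}
assert pts == expected
```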
In light of the first and the third equations, we must consider three cases:

and among such three points one must search for the possible local solutions of the optimization problem (26.42).
As to the elimination method, also here the set

C = {x = (x1, x2, x3) ∈ R³ : x2³ = x1² and x3² + x2² = 2x2}

is clearly closed. It is also bounded (and so compact). In fact, the second constraint can be written as x3² + (x2 − 1)² = 1, and so the x2 and x3 that satisfy it are such that x2 ∈ [0, 2] and x3 ∈ [−1, 1]. Now, the constraint x1² = x2³ implies x1² ∈ [0, 8], and so x1 ∈ [−√8, √8]. We conclude that C ⊆ [−√8, √8] × [0, 2] × [−1, 1], and so C is bounded. As in the previous example, we can use the elimination method through Weierstrass' Theorem. In view of (26.43), in the last phase of the elimination method we have

f(−√8, 2, 0) = −√8   and   f(0, 0, 0) = 0

Hence, (0, 0, 0) solves the optimum problem (26.42), while (−√8, 2, 0) is a minimizer. N
Dg(x) = ( 2x1  −1  0
           1    0  1 )

It is easy to see that for no value of x1 are the two row vectors, that is, the two gradients ∇g1(x) and ∇g2(x), linearly dependent (at a "mechanical" level, one easily verifies that no value of x1 makes the matrix Dg(x) rank deficient). Therefore, there are no singular points, that is, D0 = ∅. It follows that C ∩ D0 = ∅, and so we have concluded phase 3 of Lagrange's method.
Let us now move to the search for the critical points of the Lagrangian L : R⁵ → R, which is given by

L(x, λ) = −x1² − x2² − x3² + λ1 (1 − x1² + x2) + λ2 (1 − x1 − x3)

To find such points we must solve the following (nonlinear) system of five equations:

∂L/∂x1 = −2x1 − 2λ1x1 − λ2 = 0
∂L/∂x2 = −2x2 + λ1 = 0
∂L/∂x3 = −2x3 − λ2 = 0
∂L/∂λ1 = 1 − x1² + x2 = 0
∂L/∂λ2 = 1 − x1 − x3 = 0
We have that λ1 = 2x2 and λ2 = −2x3, which, substituted in the first equation, lead to the following nonlinear system of three equations:

x1 + 2x1x2 − x3 = 0
1 − x1² + x2 = 0
1 − x1 − x3 = 0

From the last two equations it follows that x2 = x1² − 1 and x3 = 1 − x1, which, substituted in the first equation, imply 2x1³ − 1 = 0, from which x1 = 1/∛2 follows, and so

x2 = 1/∛4 − 1   and   x3 = 1 − 1/∛2

Therefore there is a unique critical point

(1/∛2, 1/∛4 − 1, 1 − 1/∛2, 2/∛4 − 2, −2 + 2/∛2)
so that

S = {(1/∛2, 1/∛4 − 1, 1 − 1/∛2)}

thus completing all phases of Lagrange's method. In conclusion, we have that

S ∪ (C ∩ D0) ∪ (C \ D) = S = {(1/∛2, 1/∛4 − 1, 1 − 1/∛2)}   (26.45)

is the only candidate local solution of the optimization problem (26.44).

Let us consider the elimination method. The set C is closed but not bounded; since f is continuous and coercive on C, Tonelli's Theorem applies and, in view of (26.45), the point

(1/∛2, 1/∛4 − 1, 1 − 1/∛2)

is the solution of the optimization problem (26.44). In this case the elimination method is silent about possible minimizers because it relies on Tonelli's Theorem and not on Weierstrass'. N
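The algebra behind the unique critical point of problem (26.44) is easy to verify symbolically (a sketch assuming Python with sympy is available):

```python
import sympy as sp

x1 = sp.Symbol('x1', real=True)

# Substituting x2 = x1^2 - 1 and x3 = 1 - x1 into x1 + 2 x1 x2 - x3 = 0
# collapses the reduced system to the single cubic 2 x1^3 - 1 = 0.
expr = x1 + 2 * x1 * (x1**2 - 1) - (1 - x1)
assert sp.expand(expr) == 2 * x1**3 - 1

# Its real root x1 = 1/2^(1/3) solves the equation ...
root = sp.Rational(1, 2) ** sp.Rational(1, 3)
assert sp.simplify(expr.subs(x1, root)) == 0

# ... and the resulting point satisfies both constraints
# x1^2 - x2 = 1 and x1 + x3 = 1.
x2, x3 = root**2 - 1, 1 - root
assert sp.simplify(root**2 - x2 - 1) == 0 and sp.simplify(root + x3 - 1) == 0
```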
Chapter 27
Inequality constraints
27.1 Introduction
Let us go back to the consumer problem seen at the beginning of the previous chapter, in which we considered a consumer with utility function u : A ⊆ R^n → R and income b ∈ R. Given the vector p ∈ R^n_+ of the prices of the goods, we wrote his budget constraint as

C(p, b) = {x ∈ A : p · x = b}

In this formulation we assumed that the consumer exhausts his budget (hence the equality in the budget constraint) and we did not impose other constraints on the bundle x except that of satisfying the budget constraint. As to the income, the hypothesis that it is entirely spent can be too strong. Think for example of intertemporal problems, where it can be crucial to leave the consumer the possibility of saving in some periods, something that is impossible if we require that the budget constraint be satisfied with equality in each period. It becomes therefore natural to ask what happens to the consumer optimization problem if we weaken the constraint to p · x ≤ b, that is, if the constraint is given by an inequality and no longer by an equality.

As to the bundles of goods x, in many cases it is meaningless to talk of negative quantities. Think for example of the purchase of physical goods, say fruit or vegetables in an open air market, in which the quantity purchased has to be positive. This suggests imposing the constraint x ∈ R^n_+ in the optimization problem.

Keeping these observations in mind, the consumer problem becomes:
734 CHAPTER 27. INEQUALITY CONSTRAINTS
the optimization problem still takes the form (27.1), but the set C(p, b) is now different.

The general form of an optimization problem with both equality and inequality constraints is:

max_x f(x)   (27.4)
sub g_i(x) = b_i  ∀i ∈ I
    h_j(x) ≤ c_j  ∀j ∈ J

where I and J are finite sets of indices (possibly empty), f : A ⊆ Rⁿ → R is the objective function, the functions g_i : A ⊆ Rⁿ → R and the associated scalars b_i characterize the |I| equality constraints, while the functions h_j : A ⊆ Rⁿ → R with the associated scalars c_j induce the |J| inequality constraints. We continue to assume, as in the previous chapter, that the functions f and g_i are continuously differentiable on a non-empty and open subset D of their domain A.
The optimization problem (27.4) can be equivalently formulated in canonical form as

max_x f(x)  sub x ∈ C

is of the form (27.4) with |I| = |J| = 1, f(x) = x₁² + x₂² + x₃³, g(x) = x₁ + x₂ − x₃, h(x) = x₁² + x₂² and b = c = 1.¹

(ii) The optimization problem:

max_{x₁,x₂,x₃} x₁

is of the form (27.4) with I = {1, 2}, J = ∅, f(x) = x₁, g₁(x) = x₁² + x₂³, g₂(x) = x₃² + x₂² − 2x₂ and b₁ = b₂ = 0.

(iii) The optimization problem:

max_{x₁,x₂} x₁³ − x₂³  sub x₁ + x₂ ≤ 1 and x₁ − x₂ ≤ 1

is of the form (27.4) with I = ∅, J = {1, 2}, f(x) = x₁³ − x₂³, h₁(x) = x₁ + x₂, h₂(x) = x₁ − x₂ and c₁ = c₂ = 1.
(v) The minimum problem:

min_{x₁,x₂,x₃} x₁ + x₂ + x₃
sub x₁ + x₂ = 1 and x₂² + x₃² ≥ 1/2

can be written in the form (27.4) as

max_{x₁,x₂,x₃} −(x₁ + x₂ + x₃)
sub x₁ + x₂ = 1 and −x₂² − x₃² ≤ −1/2

N
¹ To be pedantic, here we should have set I = J = {1}, g₁(x) = x₁ + x₂ − x₃, h₁(x) = x₁² + x₂² and b₁ = c₁ = 1. But in this case, in which we have only one equality constraint and only one inequality constraint, subscripts make the notation heavy without any benefit.
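The min-to-max rewriting above is purely mechanical and can be checked numerically. The following sketch (the function names are ours, purely illustrative) verifies on a feasible point that the canonical maximum problem evaluates to the negative of the original minimum problem over the same constraint set:

```python
# Sketch (our own toy check, not from the text): recasting the minimum
# problem (v) in the canonical form (27.4).
#   min  x1 + x2 + x3    s.t. x1 + x2 = 1,   x2^2 + x3^2 >= 1/2
#   max -(x1 + x2 + x3)  s.t. x1 + x2 = 1,  -x2^2 - x3^2 <= -1/2
def f_min(x):
    return x[0] + x[1] + x[2]

def f_max(x):                      # canonical objective (to maximize)
    return -f_min(x)

def feasible(x, tol=1e-9):
    # equality constraint and canonical inequality constraint
    return abs(x[0] + x[1] - 1) <= tol and -(x[1] ** 2 + x[2] ** 2) <= -0.5 + tol

x = (0.0, 1.0, 0.0)                # feasible: 0 + 1 = 1 and 1 + 0 >= 1/2
assert feasible(x)
# minimizing f_min is the same as maximizing f_max over the same set
assert f_max(x) == -f_min(x)
```

A minimizer of the original problem is thus exactly a maximizer of the canonical problem, which is why (27.4) loses no generality by treating only maxima.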
In other words, A(x) is the set of the so-called binding constraints at x, that is, of the constraints that hold as equalities at the given point x. For example, in the problem

max_{x₁,x₂,x₃} f(x₁, x₂, x₃)

Definition 1028 The problem (27.4) has regular constraints at a point x ∈ A if the gradients ∇g_i(x) and the gradients ∇h_j(x), with j ∈ A(x), are linearly independent.

In other words, the constraints are regular at a point x if the gradients of the functions that induce constraints binding at that point are linearly independent. This condition is the generalization to problem (27.4) of the condition of linear independence upon which Lemma 1021 was based; indeed, it implies that x is a regular point for the function g : A ⊆ Rⁿ → R^|I|.

In particular, if we form the matrix whose rows are the gradients of the functions that induce binding constraints at the point considered, the regularity condition of the constraints is equivalent to requiring that this matrix have maximum rank.

Finally, observe that in view of Corollary 88-(ii) the regularity condition of the constraints can be satisfied at a point x only if |A(x)| ≤ n, that is, only if the number of binding constraints at x does not exceed the dimension of the space on which the optimization problem is defined.
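The rank test just described can be carried out mechanically: stack the gradients of the binding constraints as rows of a matrix and check whether its rank equals the number of rows. A minimal sketch (the `rank` helper and the example data are ours, not from the text):

```python
# Sketch: checking constraint regularity at a point by the rank test.
# Regularity holds iff the matrix whose rows are the gradients of the
# binding constraints has maximum (i.e. full row) rank.
def rank(rows, tol=1e-12):
    """Rank of a small dense matrix via Gaussian elimination."""
    m = [list(r) for r in rows]
    nrows = len(m)
    ncols = len(m[0]) if m else 0
    rk, col = 0, 0
    while rk < nrows and col < ncols:
        piv = max(range(rk, nrows), key=lambda i: abs(m[i][col]))
        if abs(m[piv][col]) <= tol:       # no usable pivot in this column
            col += 1
            continue
        m[rk], m[piv] = m[piv], m[rk]     # move pivot row up
        for i in range(rk + 1, nrows):
            factor = m[i][col] / m[rk][col]
            for j in range(col, ncols):
                m[i][j] -= factor * m[rk][j]
        rk += 1
        col += 1
    return rk

# Toy illustration: the gradient (1,...,1) of an equality constraint
# together with the versors e_1,...,e_n (n+1 vectors in R^n).
n = 3
grads = [[1.0] * n] + [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
assert rank(grads) == n        # n+1 vectors in R^n: linearly dependent
assert rank(grads[:n]) == n    # but these n of them are independent
```

When the rank is smaller than the number of binding constraints, the point belongs to D₀ and must be handled separately by the elimination method.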
27.2 Resolution of the problem
We can now state the generalization of Lemma 1021 for problem (27.4). In reading it, note how the vector μ̂ associated with the inequality constraints has a positive sign, while there is no restriction on the sign of the vector λ̂ associated with the equality constraints.

Lemma 1029 Let x̂ ∈ C ∩ D be a solution of the optimization problem (27.4). If the constraints are regular at x̂, then there exist a vector λ̂ ∈ R^|I| and a vector μ̂ ∈ R₊^|J| such that

∇f(x̂) = Σ_{i∈I} λ̂_i ∇g_i(x̂) + Σ_{j∈J} μ̂_j ∇h_j(x̂)   (27.8)

μ̂_j (c_j − h_j(x̂)) = 0  ∀j ∈ J   (27.9)

In terms of partial derivatives, condition (27.8) reads

∂f/∂x_k (x̂) = Σ_{i∈I} λ̂_i ∂g_i/∂x_k (x̂) + Σ_{j∈J} μ̂_j ∂h_j/∂x_k (x̂)  ∀k = 1, ..., n
This lemma generalizes Fermat's Theorem and Lemma 1021. Indeed, if I = J = ∅ then condition (27.8) reduces to the condition ∇f(x̂) = 0 of Fermat's Theorem, while if I ≠ ∅ and J = ∅ it reduces to the condition ∇f(x̂) = Σ_{i∈I} λ̂_i ∇g_i(x̂) of Lemma 1021. Relative to these previous results, the novelty of Lemma 1029 is, besides the positivity of the vector μ̂ associated with the inequality constraints, condition (27.9). To understand the role of this condition, the following characterization is useful.

Lemma 1030 Condition (27.9) holds if and only if μ̂_j = 0 for each j such that h_j(x̂) < c_j, that is, for each j ∉ A(x̂).
Proof Assume (27.9). Since for each j ∈ J we have h_j(x̂) ≤ c_j, from the positive sign of μ̂ it follows that (27.9) forces μ̂_j = 0 for each j such that h_j(x̂) < c_j. Conversely, if this last property holds we have

μ̂_j (c_j − h_j(x̂)) = 0,  ∀j ∈ J   (27.10)

because, being h_j(x̂) ≤ c_j for each j ∈ J, either h_j(x̂) < c_j (and then μ̂_j = 0) or h_j(x̂) = c_j. Expression (27.10) immediately implies (27.9).
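Condition (27.9), complementary slackness, is easy to verify mechanically at a candidate point: a multiplier may be nonzero only on a binding constraint. A minimal sketch (the numbers are made up, purely illustrative):

```python
# Sketch: complementary slackness (27.9) requires mu_j * (c_j - h_j(x)) = 0
# for every j, i.e. mu_j = 0 on every slack (non-binding) constraint.
def complementary_slackness(mu, h_vals, c, tol=1e-9):
    return all(abs(m * (cj - hj)) <= tol for m, hj, cj in zip(mu, h_vals, c))

# Candidate point: constraint 1 is binding (h1 = c1), constraint 2 is slack.
h_vals, c = [1.0, 0.3], [1.0, 1.0]
assert complementary_slackness([0.5, 0.0], h_vals, c)      # mu2 = 0 on the slack one
assert not complementary_slackness([0.5, 0.2], h_vals, c)  # mu2 > 0 violates (27.9)
```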
The next example shows that conditions (27.8) and (27.9) are necessary, but not sufficient (something not surprising, being similar to what we saw for Fermat's Theorem and Lemma 1021).

Consider the problem

max_{x₁,x₂} x₁³ + x₂³   (27.11)
sub x₁ − x₂ ≤ 0

At the point (0, 0) we have

∇f(0, 0) = μ ∇g(0, 0),   μ (0 − 0) = 0

with μ = 0. The point (0, 0) thus satisfies conditions (27.8) and (27.9) with μ = 0, but (0, 0) is not a solution of the optimization problem (27.11), as (26.9) shows. N
We defer the proof of Lemma 1029 to the appendix.² It is possible, however, to give a heuristic proof of this lemma by reducing problem (27.4) to a problem with only equality constraints, and then exploiting the results seen in the previous chapter. For simplicity, we give this argument for the special case

Problems (27.12) and (27.13) are equivalent: x̂ is a solution of problem (27.12) if and only if there exists ẑ ∈ R such that (x̂, ẑ) is a solution of problem (27.13).³

² A noteworthy feature of this proof is that it does not rely on the Implicit Function Theorem, unlike the proof that we gave for Lemma 1012 (the special case of Lemma 1021 that we proved).
³ Note that the positivity of the square z² preserves the inequality g(x) ≤ b. The auxiliary variable z is often called a slack variable.
We have therefore reduced problem (27.12) to a problem with only equality constraints. By Lemma 1021, (x̂, ẑ) is a solution of such a problem only if there exists a vector (λ̂, μ̂) ∈ R² such that:

∇F(x̂, ẑ) = λ̂ ∇G(x̂, ẑ) + μ̂ ∇H(x̂, ẑ)

that is, only if

∂F/∂x_i (x̂, ẑ) = λ̂ ∂G/∂x_i (x̂, ẑ) + μ̂ ∂H/∂x_i (x̂, ẑ)  ∀i = 1, ..., n
∂F/∂z (x̂, ẑ) = λ̂ ∂G/∂z (x̂, ẑ) + μ̂ ∂H/∂z (x̂, ẑ)

which is equivalent to:

∇f(x̂) = λ̂ ∇g(x̂) + μ̂ ∇h(x̂)
2 μ̂ ẑ = 0

On the other hand, we have 2μ̂ẑ = 0 if and only if μ̂ẑ² = 0. Recalling the equivalence between problems (27.12) and (27.13), we can therefore conclude that x̂ is a solution of problem (27.12) only if there exists a vector (λ̂, μ̂) ∈ R² such that:

∇f(x̂) = λ̂ ∇g(x̂) + μ̂ ∇h(x̂)
μ̂ (c − h(x̂)) = 0

We therefore have conditions (27.8) and (27.9) of Lemma 1029. What we have not been able to prove is the positivity of the multiplier μ̂, and for this reason the proof just seen is incomplete.⁴
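The slack-variable device behind this heuristic argument can be illustrated in a few lines: an inequality h(x) ≤ c holds exactly when some real z satisfies h(x) + z² = c. A minimal sketch (the helper name `slack` is our own):

```python
# Sketch of the slack-variable device: the inequality h(x) <= c is
# equivalent to the equality h(x) + z^2 = c for some real z, because
# z^2 can absorb any nonnegative gap c - h(x) but never a negative one.
import math

def slack(h_x, c):
    """Return z >= 0 with h_x + z^2 = c, or None if the inequality fails."""
    return math.sqrt(c - h_x) if h_x <= c else None

assert slack(0.75, 1.0) == 0.5     # interior point: positive slack
assert slack(1.0, 1.0) == 0.0      # binding constraint: zero slack
assert slack(1.5, 1.0) is None     # infeasible point: no slack exists
```

The case z = 0 is exactly the binding case, which is why the condition 2μ̂ẑ = 0 in the argument above turns into complementary slackness.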
for each (x, λ, μ) ∈ A × R^|I| × R₊^|J|.⁵ Note that in this case μ is required to be a positive vector.

We can now generalize Theorem 1022 to the optimization problem (27.4). As we did for Theorem 1022, also here we omit the proof because it is analogous to that of Lagrange's Theorem.

⁴ Since it is, in any case, an incomplete argument, for simplicity we did not check the rank condition required by Lemma 1021.
⁵ The notation (x, λ, μ) underlines the different status of x with respect to λ and μ.
∇_x L(x̂, λ̂, μ̂) = 0   (27.15)
μ̂_j ∂L/∂μ_j (x̂, λ̂, μ̂) = 0  ∀j ∈ J   (27.16)
∇_λ L(x̂, λ̂, μ̂) = 0   (27.17)
∇_μ L(x̂, λ̂, μ̂) ≥ 0   (27.18)

The components λ̂_i and μ̂_j of the vectors λ̂ and μ̂ are called Lagrange multipliers, while (27.15)-(27.18) are called Kuhn-Tucker conditions. The points x ∈ A for which there exists a pair (λ, μ) ∈ R^|I| × R₊^|J| such that the triple (x, λ, μ) satisfies conditions (27.15)-(27.18) are called Kuhn-Tucker points.

The Kuhn-Tucker points are, therefore, the solutions of the (typically nonlinear) system of equations and inequalities given by the Kuhn-Tucker conditions. By Kuhn-Tucker's Theorem, a necessary condition for a point x at which the constraints are regular to be a solution of the optimization problem (27.4) is that it be a Kuhn-Tucker point.⁶ Observe, however, that a Kuhn-Tucker point (x, λ, μ) is not necessarily a stationary point of the Lagrangian function: condition (27.18) only requires ∇_μ L(x, λ, μ) ∈ R₊^|J|, not the stronger property ∇_μ L(x, λ, μ) = 0.
1. Determine whether Tonelli's Theorem can be applied, that is, whether f is continuous and coercive on C.

3. Find the set S of the Kuhn-Tucker points that belong to D₁, i.e., the set of points x ∈ D₁ for which there exists (λ, μ) ∈ R^|I| × R₊^|J| such that the triple (x, λ, μ) satisfies the Kuhn-Tucker conditions (27.15)-(27.18).⁷

If

f(x̂) ≥ f(x)  ∀x ∈ S ∪ (C ∩ D₀)

then such an x̂ is a solution of the optimization problem (27.4).

The first phase of the method of elimination is the same as in the previous chapter, while the other phases are the obvious extension of the method to problem (27.4).
This problem is of the form (27.4), where f, h : R² → R are given by f(x₁, x₂) = x₁ − 2x₂² and h(x₁, x₂) = x₁² + x₂², while c = 1. Since C is compact, the first phase is completed through Weierstrass' Theorem.

We have ∇h(x) = (2x₁, 2x₂), and so the constraint is regular at each point x ∈ C, that is, C ∩ D₀ = ∅.

The Lagrangian function L : R³ → R is given by

L(x₁, x₂, μ) = x₁ − 2x₂² + μ(1 − x₁² − x₂²)

and to find the set S of its Kuhn-Tucker points it is necessary to solve the system

∂L/∂x₁ = 1 − 2μx₁ = 0
∂L/∂x₂ = −4x₂ − 2μx₂ = 0
μ ∂L/∂μ = μ(1 − x₁² − x₂²) = 0
∂L/∂μ = 1 − x₁² − x₂² ≥ 0
μ ≥ 0

We start by observing that μ ≠ 0, that is, μ > 0. Indeed, if μ = 0 the first equation becomes 1 = 0, a contradiction. We therefore assume μ > 0. The second equation implies x₂ = 0, and in turn the third equation implies x₁² = 1, i.e., x₁ = ±1. From the first equation it follows that μ = 1/(2x₁), and since μ > 0 we must have x₁ = 1, with μ = 1/2. The only solution of the system is therefore (1, 0, 1/2), so the only Kuhn-Tucker point is (1, 0), i.e., S = {(1, 0)}.

In sum, S ∪ (C ∩ D₀) = {(1, 0)} and the method of elimination allows us to conclude that (1, 0) is the only solution of the optimization problem (27.19). Note that at this solution the constraint is binding (i.e., it is satisfied with equality); indeed μ = 1/2 > 0, as required by Proposition 1033. N
⁷ Observe that these points x surely satisfy the constraints, and hence we always have S ⊆ D₁ ∩ C; it is therefore not necessary to check whether a point x ∈ S also belongs to C. A similar observation was made in the previous chapter.
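The Kuhn-Tucker system of this example, with f(x₁, x₂) = x₁ − 2x₂² subject to x₁² + x₂² ≤ 1, can be checked directly at the candidate point; a minimal sketch of the residual computation:

```python
# Sketch: checking that (x1, x2) = (1, 0) with multiplier mu = 1/2 makes
# every residual of the Kuhn-Tucker system of this example vanish.
def kt_residuals(x1, x2, mu):
    return (
        1 - 2 * mu * x1,            # dL/dx1
        -4 * x2 - 2 * mu * x2,      # dL/dx2
        mu * (1 - x1**2 - x2**2),   # complementary slackness
    )

assert kt_residuals(1.0, 0.0, 0.5) == (0.0, 0.0, 0.0)
assert 0.5 >= 0 and 1 - 1.0**2 - 0.0**2 >= 0   # mu >= 0, constraint satisfied
```

Any other candidate (for instance x₁ = −1) fails the sign requirement on the multiplier, which is how the case analysis in the text singles out the unique Kuhn-Tucker point.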
This problem is of the form (27.4), where f, g : Rⁿ → R are given by f(x) = −Σ_{i=1}^n x_i² and g(x) = Σ_{i=1}^n x_i, the functions h_j : Rⁿ → R are given by h_j(x) = −x_j for j = 1, ..., n, while b = 1 and c_j = 0 for j = 1, ..., n. The set C = {x ∈ Rⁿ₊ : Σ_{i=1}^n x_i = 1} is compact, and so also in this case the first phase is completed thanks to Weierstrass' Theorem.

For each x ∈ Rⁿ we have ∇g(x) = (1, ..., 1) and ∇h_j(x) = −e_j. Therefore, the value of these gradients does not depend on the point x considered. To verify the regularity of the constraints, we consider the collection (1, ..., 1), −e₁, ..., −eₙ of these gradients. This collection has n + 1 elements and is obviously linearly dependent (the fundamental versors e₁, ..., eₙ are the most classic basis of Rⁿ).

On the other hand, it is immediate to see that any subcollection with at most n elements is, instead, linearly independent. Hence, the only way to violate the regularity of the constraints is that they are all binding, so that the whole collection of n + 1 elements has to be considered. Fortunately, however, there does not exist any point x ∈ Rⁿ at which all the constraints are binding. Indeed, the only point that satisfies all the constraints x_j ≥ 0 with equality is the origin 0, which nevertheless does not satisfy the equality constraint Σ_{i=1}^n x_i = 1.

We can conclude that the constraints are regular at all points x ∈ Rⁿ, i.e., D₀ = ∅. Hence C ∩ D₀ = ∅ and also the second phase of the method of elimination is complete.

The Lagrangian function L : R^{2n+1} → R is given by

L(x, λ, μ) = −Σ_{i=1}^n x_i² + λ(1 − Σ_{i=1}^n x_i) + Σ_{i=1}^n μ_i x_i  ∀(x, λ, μ) ∈ R^{2n+1}
To find the set S of its Kuhn-Tucker points it is necessary to solve the system

∂L/∂x_i = −2x_i − λ + μ_i = 0,  ∀i = 1, ..., n
λ ∂L/∂λ = λ(1 − Σ_{i=1}^n x_i) = 0
∂L/∂λ = 1 − Σ_{i=1}^n x_i = 0
μ_i ∂L/∂μ_i = μ_i x_i = 0,  ∀i = 1, ..., n
∂L/∂μ_i = x_i ≥ 0,  ∀i = 1, ..., n
μ_i ≥ 0,  ∀i = 1, ..., n
Multiplying the first n equations by x_i, we get

−2x_i² − λx_i + μ_i x_i = 0,  ∀i = 1, ..., n

and therefore, summing over i and using Σ_{i=1}^n x_i = 1 and μ_i x_i = 0,

−2 Σ_{i=1}^n x_i² − λ = 0

that is, λ = −2 Σ_{i=1}^n x_i². We conclude that λ ≤ 0.

If x_i = 0 for some i, from the condition ∂L/∂x_i = 0 it follows that λ = μ_i. Since μ_i ≥ 0 and λ ≤ 0, it follows that μ_i = 0. In turn, this implies λ = 0 and hence, since λ = −2 Σ_{k=1}^n x_k², that x_k = 0 for each k = 1, ..., n. But this contradicts the condition 1 − Σ_{i=1}^n x_i = 0, and we can therefore conclude that x_i ≠ 0, that is, x_i > 0. Since this holds for each i = 1, ..., n, it follows that x_i > 0 for each i = 1, ..., n. From the condition μ_i x_i = 0 it follows that μ_i = 0 for each i = 1, ..., n, and the first n equations become:

−2x_i − λ = 0  ∀i = 1, ..., n

that is, x_i = −λ/2 for each i = 1, ..., n. The x_i are therefore all equal; from Σ_{i=1}^n x_i = 1 it follows that

x_i = 1/n  ∀i = 1, ..., n

In conclusion,

S = {(1/n, ..., 1/n)}

Since D₀ = ∅, we have S ∪ (C ∩ D₀) = {(1/n, ..., 1/n)}, and the method of elimination allows us to conclude that the point (1/n, ..., 1/n) is the solution of the optimization problem (27.20). N
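The solution (1/n, ..., 1/n) can also be sanity-checked by brute force: no sampled point of the simplex gives a higher value of the objective −Σ x_i². A small sketch (the sampling scheme is our own):

```python
# Sketch: the point (1/n,...,1/n) maximizes -sum(x_i^2) on the simplex
# {x >= 0 : sum(x_i) = 1}, i.e. it minimizes sum(x_i^2) there.
import random

def objective(x):
    return -sum(xi * xi for xi in x)

n = 4
star = [1.0 / n] * n
random.seed(0)                        # reproducible sampling
for _ in range(100):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    x = [wi / s for wi in w]          # a random point of the simplex
    assert objective(star) >= objective(x) - 1e-12
```

This is of course no proof; the proof is the Kuhn-Tucker analysis above (or the Jensen argument that follows).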
has solution (1/n, ..., 1/n). It is the unique solution if h is strictly concave.

If h(x_i) = −x_i log x_i, the function Σ_{i=1}^n h(x_i) is the entropy (Examples 212 and 1009).

Proof Let x₁, x₂, ..., xₙ ∈ [0, 1] with the constraint Σ_{i=1}^n x_i = 1. By Jensen's inequality applied to the concave function h, we can write

(1/n) Σ_{i=1}^n h(x_i) ≤ h((1/n) Σ_{i=1}^n x_i) = h(1/n)

Namely,

Σ_{i=1}^n h(x_i) ≤ n h(1/n) = h(1/n) + ⋯ + h(1/n)

This shows that (1/n, ..., 1/n) is optimal. Clearly, Σ_{i=1}^n h(x_i) is strictly concave if h is. Hence, uniqueness is ensured by Theorem 706.
Proposition 1037 Let A be convex. If the functions g_i are affine for each i ∈ I and the functions h_j are convex for each j ∈ J, then the choice set C defined in (27.5) is convex.

It is easy to give examples where C is no longer convex when the conditions of convexity and affinity used in this result are not satisfied. Note that the convexity condition on the h_j is much weaker than the affinity condition on the g_i. This shows that the convexity of the choice set is more natural for inequality constraints than for equality ones. This is a crucial "structural" difference between the two types of constraints (which are more different than they may appear prima facie).

Definition 1038 The optimization problem (27.4) is called concave if the objective function f is concave, the functions g_i are affine and the functions h_j are convex over the open and convex set A.
where b = (b₁, ..., bₙ) ∈ Rⁿ. Often q = 0, so the equality constraints are represented in the simpler form Ax = b.
Recall from Section 25.3 that the search for the solutions of an unconstrained optimization problem for concave functions was based on a remarkable property: the first-order necessary condition for the existence of a local maximum becomes sufficient for the existence of a global maximum in the case of concave functions.

The next fundamental result is the "constrained" version of this property. Note that the regularity of the constraints does not play any role in this result.

Theorem 1039 In a concave optimization problem in which the functions f, {g_i}_{i∈I} and {h_j}_{j∈J} are differentiable on A, the Kuhn-Tucker points are solutions of the problem.
Proof Let (x*, λ*, μ*) be a Kuhn-Tucker point for the optimization problem (27.4), that is, (x*, λ*, μ*) satisfies conditions (27.15)-(27.18). In particular, this means that

∇f(x*) = Σ_{i∈I} λ*_i ∇g_i(x*) + Σ_{j∈A(x*)∩J} μ*_j ∇h_j(x*)   (27.24)

Moreover, by the convexity of the h_j and the affinity of the g_i,

∇h_j(x*) · (x − x*) ≤ 0,  ∀j ∈ A(x*), ∀x ∈ C
∇g_i(x*) · (x − x*) = 0,  ∀i ∈ I, ∀x ∈ C

while the concavity of f gives

f(x) ≤ f(x*) + ∇f(x*) · (x − x*),  ∀x ∈ A

Combining these facts via (27.24), and recalling that μ*_j ≥ 0, for every x ∈ C we get f(x) ≤ f(x*), so x* is a solution.

Theorem 1039 gives us a sufficient condition for optimality: if a point is a Kuhn-Tucker point, then it is a solution of the optimization problem. The condition is, however, not necessary: there can be solutions of a concave optimization problem that are not Kuhn-Tucker points. In view of Kuhn-Tucker's Theorem, this can happen only if the solution is a point at which the constraints are not regular. The next example illustrates this situation.
rf (0; 0; 0) = ( 1; 1; 0)
By combining Kuhn-Tucker's Theorem and Theorem 1039 we get the following necessary and sufficient optimality condition.

Theorem 1041 Consider a concave optimization problem in which the functions f, {g_i}_{i∈I} and {h_j}_{j∈J} are of class C¹ on A. A point x ∈ A at which the constraints are regular is a solution of such a problem if and only if it is a Kuhn-Tucker point.

1. Determine whether the problem is concave, that is, whether the function f is concave, the functions g_i are affine and the functions h_j are convex.

2. Find the set C ∩ D₀.

3. Find the set T of the Kuhn-Tucker points,⁸ i.e., the set of points x ∈ A for which there exists (λ, μ) ∈ R^|I| × R₊^|J| such that the triple (x, λ, μ) satisfies the Kuhn-Tucker conditions (27.15)-(27.18).⁹

⁸ The set T considered here is therefore slightly different from the set T seen in the previous versions of the method of elimination.
⁹ These points x surely satisfy the constraints and hence we always have T ⊆ D₁ ∩ C; it is therefore not necessary to verify whether a point x ∈ T also belongs to C. A similar observation was made in Chapter 9.
27.4 Concave optimization
4. If T ≠ ∅, then, taken any x* ∈ T, compare f(x*) with the values f(x) for x ∈ C ∩ D₀: all the points of T are solutions of the problem,¹⁰ and a point x ∈ C ∩ D₀ is itself a solution if and only if f(x) = f(x*).

5. If T = ∅, check whether Tonelli's Theorem can be applied (i.e., whether f is continuous and coercive on C); if this is the case, the maximizers of f on C ∩ D₀ are solutions of the optimization problem (27.4).
Since either phase 4 or phase 5 applies, depending on whether or not T is empty, the actual phases of the convex method are four.

The convex method works thanks to Theorems 1039 and 1041. Indeed, if T ≠ ∅, then by Theorem 1039 all points of T are solutions of the problem. In this case, a point x ∈ C ∩ D₀ that does not belong to T can in turn be a solution only if its value f(x) is equal to that of any point in T.

When, instead, T = ∅, Theorem 1041 guarantees that no point in D₁ is a solution of the problem. At this stage, if Tonelli's Theorem ensures the existence of at least one solution, we can restrict the search to the set C ∩ D₀. In other words, it is sufficient to find the maximizers of f on C ∩ D₀: they are also solutions of problem (27.4), and vice versa.¹¹

Clearly, the convex method becomes especially powerful when T ≠ ∅ because in such a case there is no need to verify the validity of global existence theorems à la Weierstrass and Tonelli; it is sufficient to find the Kuhn-Tucker points.

If we are satisfied with the solutions that are Kuhn-Tucker points, without worrying about the possible existence of solutions that are not, we can give a short version of the convex method, based only on Theorem 1039. We can call it the short convex method. It consists of only two phases:

1. Determine whether the optimization problem (27.4) is concave, i.e., whether the function f is concave, the functions g_i are affine, and the functions h_j are convex.

2. Find the set T of the Kuhn-Tucker points.

By Theorem 1039, all the points of T are solutions of the problem. The short convex method is simpler than the convex method: it requires neither the use of global existence theorems nor the study of the regularity of the constraints. The price of this simplification is the possible inaccuracy of the method: being based on sufficient conditions, it is not able to find the solutions at which these conditions are not satisfied (by Theorem 1041, such solutions would be points where the constraints are not regular). Furthermore, the short method cannot be applied when T = ∅; in such a case, it is necessary to apply the complete convex method.

The short convex method is especially powerful when the objective function f is strictly concave. Indeed, in such a case a solution found with the short method is necessarily also the unique solution of the concave optimization problem. Therefore, in this case the short method is as effective as the complete convex method.
¹⁰ The set T is at most a singleton when f is strictly concave, because in such a case there is at most one solution of the problem (Theorem 706).
¹¹ Observe that such maximizers exist. Indeed, arg max_{x∈C} f(x) ≠ ∅ and, since none of its elements belongs to D₁, it follows that arg max_{x∈C} f(x) = arg max_{x∈D₀∩C} f(x).
This problem is of the form (27.4), where f : R³ → R is given by f(x) = −(x₁² + x₂² + x₃²), h₁ : R³ → R is given by h₁(x) = −(3x₁ + x₂ + 2x₃) and h₂ : R³ → R is given by h₂(x) = −x₁, while c₁ = −1 and c₂ = 0.

Using Theorem 928 it is easy to verify that f is strictly concave, while it is immediate to verify that h₁ and h₂ are convex. Therefore, (27.28) is a concave optimization problem. Since f is strictly concave, we can confidently apply the short convex method. To do this we have to find the set T of the Kuhn-Tucker points.

The Lagrangian function L : R⁵ → R is given by

L(x, μ₁, μ₂) = −(x₁² + x₂² + x₃²) + μ₁(−1 + 3x₁ + x₂ + 2x₃) + μ₂ x₁
To find the set T of its Kuhn-Tucker points it is necessary to solve the system of equalities and inequalities:

∂L/∂x₁ = −2x₁ + 3μ₁ + μ₂ = 0
∂L/∂x₂ = −2x₂ + μ₁ = 0
∂L/∂x₃ = −2x₃ + 2μ₁ = 0
μ₁ ∂L/∂μ₁ = μ₁(−1 + 3x₁ + x₂ + 2x₃) = 0
μ₂ ∂L/∂μ₂ = μ₂ x₁ = 0
∂L/∂μ₁ = −1 + 3x₁ + x₂ + 2x₃ ≥ 0
∂L/∂μ₂ = x₁ ≥ 0
μ₁ ≥ 0,  μ₂ ≥ 0
(27.29)
We consider four cases, depending on whether the multipliers μ₁ and μ₂ are zero or not.

Case 1: μ₁ > 0 and μ₂ > 0. The conditions μ₂ ∂L/∂μ₂ = ∂L/∂x₁ = 0 imply x₁ = 0 and 3μ₁ + μ₂ = 0. This last equation has no strictly positive solutions μ₁ and μ₂, and hence we conclude that we cannot have μ₁ > 0 and μ₂ > 0.

Case 2: μ₁ = 0 and μ₂ > 0. The conditions μ₂ ∂L/∂μ₂ = ∂L/∂x₁ = 0 imply x₁ = 0 and μ₂ = 0. This contradiction shows that we cannot have μ₁ = 0 and μ₂ > 0.

Case 3: μ₁ > 0 and μ₂ = 0. The conditions μ₁ ∂L/∂μ₁ = ∂L/∂x₁ = ∂L/∂x₂ = ∂L/∂x₃ = 0 imply:

−2x₁ + 3μ₁ = 0
−2x₂ + μ₁ = 0
−2x₃ + 2μ₁ = 0
3x₁ + x₂ + 2x₃ = 1

Solving for μ₁, we get μ₁ = 1/7, and hence x₁ = 3/14, x₂ = 1/14 and x₃ = 1/7. The quintuple (3/14, 1/14, 1/7, 1/7, 0) solves the system (27.29), and hence (3/14, 1/14, 1/7) is a Kuhn-Tucker point.

Case 4: μ₁ = μ₂ = 0. The condition ∂L/∂x₁ = 0 implies x₁ = 0, while the conditions ∂L/∂x₂ = ∂L/∂x₃ = 0 imply x₂ = x₃ = 0. But then the condition ∂L/∂μ₁ ≥ 0 becomes −1 ≥ 0, and this contradiction shows that we cannot have μ₁ = μ₂ = 0.
27.5 Appendix: proof of a key lemma
In conclusion,

T = {(3/14, 1/14, 1/7)}

and since f is strictly concave the short convex method allows us to conclude that (3/14, 1/14, 1/7) is the unique solution of the optimization problem (27.28). N
We conclude with a final important observation. The solution methods seen in this chapter are based on the search for Kuhn-Tucker points, and therefore they require the resolution of systems of nonlinear equations. In general, these systems are not easy to solve, and this limits the computational utility of these methods, whose importance is mostly theoretical. At a numerical level, other methods are used, which the interested reader can find in books on numerical analysis.
Lemma 1043 (i) The function y = x|x| is C¹ on R and D(x|x|) = 2|x|. (ii) The square (x⁺)² of the function x⁺ = max{x, 0} is C¹ on R, and D(x⁺)² = 2x⁺.

Proof (i) Observe that x|x| is infinitely differentiable for x ≠ 0 and its first derivative is, by the product rule for differentiation,

D(x|x|) = x D|x| + |x| Dx = x (|x|/x) + |x| = 2|x|

This is true for x ≠ 0. Now it suffices to invoke a classical result that asserts: let f : I → R be continuous on a real interval and differentiable on I \ {x₀}; if lim_{x→x₀} Df(x) = L, then f is differentiable at x₀ and Df(x₀) = L. As an immediate consequence, D(x|x|) = 2|x| also at x = 0. (ii) We have x⁺ = (x + |x|)/2. Therefore

(x⁺)² = (1/4)(x + |x|)² = (1/2)x² + (1/2)x|x|

It follows that (x⁺)² is C¹ and D(x⁺)² = x + |x| = 2x⁺.
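Lemma 1043-(i) can be illustrated with a central-difference check of the derivative, including at the delicate point x = 0 where |x| alone is not differentiable:

```python
# Sketch: numerical check of Lemma 1043-(i), D(x|x|) = 2|x|, at several
# points including x = 0, where the product x|x| is still C^1.
def f(x):
    return x * abs(x)

def num_deriv(g, x, h=1e-6):
    # symmetric (central) difference quotient
    return (g(x + h) - g(x - h)) / (2 * h)

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(num_deriv(f, x) - 2 * abs(x)) < 1e-5
```

At x = 0 the central difference gives (h² − (−h²))/(2h) = h, which tends to 0 = 2|0|, in line with the limit argument in the proof.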
Proof of Lemma 1029 Let ‖·‖ be the Euclidean norm. We have h_j(x̂) < c_j for each j ∉ A(x̂). Since A is open, there exists ε̃ > 0 sufficiently small such that B_ε̃(x̂) = {x ∈ A : ‖x − x̂‖ ≤ ε̃} ⊆ A. Moreover, since each h_j is continuous, for each j ∉ A(x̂) there exists ε_j sufficiently small such that h_j(x) < c_j for each x ∈ B_{ε_j}(x̂) = {x ∈ A : ‖x − x̂‖ ≤ ε_j}. Let ε₀ = min_{j∉A(x̂)} ε_j and ε̂ = min{ε̃, ε₀}; in other words, ε̂ is the minimum between ε̃ and the ε_j. In this way we have B_ε̂(x̂) = {x ∈ A : ‖x − x̂‖ ≤ ε̂} ⊆ A and h_j(x) < c_j for each x ∈ B_ε̂(x̂) and each j ∉ A(x̂).

Given ε ∈ (0, ε̂], the set S_ε(x̂) = {x ∈ A : ‖x − x̂‖ = ε} is compact. Moreover, by what we have just seen, h_j(x) < c_j for each x ∈ S_ε(x̂) and each j ∉ A(x̂); that is, on S_ε(x̂) all the non-binding constraints are always satisfied.
For each j ∈ J, let h̃_j : A ⊆ Rⁿ → R be defined as

h̃_j(x) = (h_j(x) − c_j)⁺

for each x ∈ A. By Lemma 1043, h̃_j² ∈ C¹(A) and

∂h̃_j²(x)/∂x_p = 2 (h_j(x) − c_j)⁺ ∂h_j(x)/∂x_p,  ∀p = 1, ..., n   (27.30)

We first prove a property that we will use later.
Fact 1 For each ε ∈ (0, ε̂] there exists N > 0 such that, for each x ∈ S_ε(x̂),

f(x) − f(x̂) − ‖x − x̂‖² − N (Σ_{i∈I} (g_i(x) − g_i(x̂))² + Σ_{j∈J∩A(x̂)} (h̃_j(x) − h̃_j(x̂))²) < 0   (27.31)
Proof of Fact 1 We proceed by contradiction, assuming therefore that there exists ε ∈ (0, ε̂] for which there is no N > 0 such that (27.31) holds. Take an increasing sequence {Nₙ} with Nₙ ↑ +∞, and for each of these Nₙ take xₙ ∈ S_ε(x̂) for which (27.31) does not hold, that is, xₙ such that:

f(xₙ) − f(x̂) − ‖xₙ − x̂‖² − Nₙ (Σ_{i∈I} (g_i(xₙ) − g_i(x̂))² + Σ_{j∈J∩A(x̂)} (h̃_j(xₙ) − h̃_j(x̂))²) ≥ 0

that is,

(f(xₙ) − f(x̂) − ‖xₙ − x̂‖²) / Nₙ ≥ Σ_{i∈I} (g_i(xₙ) − g_i(x̂))² + Σ_{j∈J∩A(x̂)} (h̃_j(xₙ) − h̃_j(x̂))²   (27.32)
Since the sequence {xₙ} just constructed is contained in the compact set S_ε(x̂), by the Bolzano-Weierstrass Theorem there exists a subsequence {xₙₖ}ₖ convergent in S_ε(x̂), i.e., there exists x* ∈ S_ε(x̂) such that xₙₖ → x*. Inequality (27.32) implies that, for each k ≥ 1, we have:

(f(xₙₖ) − f(x̂) − ‖xₙₖ − x̂‖²) / Nₙₖ ≥ Σ_{i∈I} (g_i(xₙₖ) − g_i(x̂))² + Σ_{j∈J∩A(x̂)} (h̃_j(xₙₖ) − h̃_j(x̂))²   (27.33)

Moreover,

lim_k (f(xₙₖ) − f(x̂) − ‖xₙₖ − x̂‖²) / Nₙₖ = 0
and hence (27.33) implies, thanks to the continuity of the functions g_i and h̃_j,

Σ_{i∈I} (g_i(x*) − g_i(x̂))² + Σ_{j∈J∩A(x̂)} (h̃_j(x*) − h̃_j(x̂))²
= lim_k (Σ_{i∈I} (g_i(xₙₖ) − g_i(x̂))² + Σ_{j∈J∩A(x̂)} (h̃_j(xₙₖ) − h̃_j(x̂))²) = 0

It follows that (g_i(x*) − g_i(x̂))² = (h̃_j(x*) − h̃_j(x̂))² = 0 for each i ∈ I and each j ∈ J ∩ A(x̂), from which g_i(x*) = g_i(x̂) = b_i for each i ∈ I and h̃_j(x*) = h̃_j(x̂) = 0, i.e., h_j(x*) ≤ c_j, for each j ∈ J ∩ A(x̂).

Since on S_ε(x̂) the non-binding constraints are always satisfied, i.e., h_j(x) < c_j for each x ∈ S_ε(x̂) and each j ∉ A(x̂), we can conclude that x* satisfies all the constraints. We therefore have f(x̂) ≥ f(x*), given that x̂ solves the optimization problem.
On the other hand, since xₙₖ ∈ S_ε(x̂) for each k ≥ 1, (27.33) implies

f(xₙₖ) − f(x̂) ≥ ‖xₙₖ − x̂‖² + Nₙₖ (Σ_{i∈I} (g_i(xₙₖ) − g_i(x̂))² + Σ_{j∈J∩A(x̂)} (h̃_j(xₙₖ) − h̃_j(x̂))²) ≥ ε²

for each k ≥ 1, and hence f(xₙₖ) ≥ f(x̂) + ε² for each k ≥ 1. Thanks to the continuity of f, this leads to

f(x*) = lim_k f(xₙₖ) ≥ f(x̂) + ε² > f(x̂)

which contradicts f(x̂) ≥ f(x*). This contradiction proves Fact 1. △
Using Fact 1, we now prove a second property that we will need. Here we set S = S_{R^{|I|+|J|+1}} = {x ∈ R^{|I|+|J|+1} : ‖x‖ = 1}.

Fact 2 For each ε ∈ (0, ε̂] there exist x^ε ∈ B_ε(x̂) and (λ₀^ε, λ₁^ε, ..., λ_{|I|}^ε, μ₁^ε, ..., μ_{|J|}^ε) ∈ S, with μ_j^ε ≥ 0, such that, for each z = 1, ..., n,

λ₀^ε ∂f/∂x_z (x^ε) − 2(x_z^ε − x̂_z) − Σ_{i∈I} λ_i^ε ∂g_i/∂x_z (x^ε) − Σ_{j∈J∩A(x̂)} μ_j^ε ∂h_j/∂x_z (x^ε) = 0   (27.34)
Proof of Fact 2 Given ε ∈ (0, ε̂], let N_ε > 0 be the positive constant whose existence is guaranteed by Fact 1. Define the function φ_ε : A ⊆ Rⁿ → R as:

φ_ε(x) = f(x) − f(x̂) − ‖x − x̂‖² − N_ε (Σ_{i∈I} (g_i(x) − g_i(x̂))² + Σ_{j∈J∩A(x̂)} (h̃_j(x) − h̃_j(x̂))²)
The function φ_ε is continuous on the compact set B_ε(x̂) = {x ∈ A : ‖x − x̂‖ ≤ ε} and, by Weierstrass' Theorem, there exists x^ε ∈ B_ε(x̂) such that φ_ε(x^ε) ≥ φ_ε(x) for each x ∈ B_ε(x̂). In particular, φ_ε(x^ε) ≥ φ_ε(x̂) = 0, and hence (27.35) implies that ‖x^ε − x̂‖ < ε, that is, x^ε is an interior point of B_ε(x̂). Setting

c_ε = (1 + Σ_{i∈I} (2N_ε (g_i(x^ε) − g_i(x̂)))² + Σ_{j∈J∩A(x̂)} (2N_ε h̃_j(x^ε))²)^{1/2},   λ₀^ε = 1/c_ε

(27.34) is obtained by dividing (27.36) by c_ε. Observe that μ_j^ε ≥ 0 for each j ∈ J and that (λ₀^ε)² + Σ_{i∈I} (λ_i^ε)² + Σ_{j∈J} (μ_j^ε)² = 1, i.e., (λ₀^ε, λ₁^ε, ..., λ_{|I|}^ε, μ₁^ε, ..., μ_{|J|}^ε) ∈ S. △
Using Fact 2, we can now complete the proof. Take a decreasing sequence {εₙ} ⊆ (0, ε̂] with εₙ ↓ 0, and consider the associated sequence {(λ₀ⁿ, λ₁ⁿ, ..., λ_{|I|}ⁿ, μ₁ⁿ, ..., μ_{|J|}ⁿ)} ⊆ S whose existence is guaranteed by Fact 2.

Since the sequence {(λ₀ⁿ, λ₁ⁿ, ..., λ_{|I|}ⁿ, μ₁ⁿ, ..., μ_{|J|}ⁿ)} is contained in the compact set S, by the Bolzano-Weierstrass Theorem there exists a subsequence

{(λ₀^{nₖ}, λ₁^{nₖ}, ..., λ_{|I|}^{nₖ}, μ₁^{nₖ}, ..., μ_{|J|}^{nₖ})}ₖ

convergent in S, that is, there exists (λ₀, λ₁, ..., λ_{|I|}, μ₁, ..., μ_{|J|}) ∈ S such that

(λ₀^{nₖ}, λ₁^{nₖ}, ..., λ_{|I|}^{nₖ}, μ₁^{nₖ}, ..., μ_{|J|}^{nₖ}) → (λ₀, λ₁, ..., λ_{|I|}, μ₁, ..., μ_{|J|})

By (27.34), for each k,

λ₀^{nₖ} ∂f/∂x_z (x^{nₖ}) − 2(x_z^{nₖ} − x̂_z) − Σ_{i∈I} λ_i^{nₖ} ∂g_i/∂x_z (x^{nₖ}) − Σ_{j∈J∩A(x̂)} μ_j^{nₖ} ∂h_j/∂x_z (x^{nₖ}) = 0
for each z = 1, ..., n. Consider the sequence {x^{nₖ}}ₖ so constructed. From x^{nₖ} ∈ B_{εₙₖ}(x̂) it follows that ‖x^{nₖ} − x̂‖ < εₙₖ → 0 and hence, for each z = 1, ..., n,

λ₀ ∂f/∂x_z (x̂) − Σ_{i∈I} λ_i ∂g_i/∂x_z (x̂) − Σ_{j∈J∩A(x̂)} μ_j ∂h_j/∂x_z (x̂)   (27.37)
= lim_k (λ₀^{nₖ} ∂f/∂x_z (x^{nₖ}) − 2(x_z^{nₖ} − x̂_z) − Σ_{i∈I} λ_i^{nₖ} ∂g_i/∂x_z (x^{nₖ}) − Σ_{j∈J∩A(x̂)} μ_j^{nₖ} ∂h_j/∂x_z (x^{nₖ})) = 0.

If λ₀ = 0, then (27.37) reduces to a vanishing linear combination of the gradients associated with the binding constraints; the linear independence of these gradients, which holds by the hypothesis of regularity of the constraints, implies λ_i = 0 for each i ∈ I and μ_j = 0 for each j ∈ J, which contradicts (λ₀, λ₁, ..., λ_{|I|}, μ₁, ..., μ_{|J|}) ∈ S.
Chapter 28

General constraints
where X is a subset of A and the other elements are as in the optimization problem (27.4). This problem includes as special cases the optimization problems that we have seen so far: we get back to the optimization problem (27.4) when X = A, and to an unconstrained optimization problem when I = J = ∅ and C = X is open.

Formulation (28.1) may also be useful when there are conditions on the sign or on the value of the choice variables x_i. The classic example is the non-negativity condition on the x_i, which is best expressed as a constraint x ∈ Rⁿ₊ rather than through n inequalities x_i ≥ 0. Here a constraint of the form x ∈ X simplifies the exposition.¹

In this chapter we want to address the general optimization problem (28.1). If X is open, the solution techniques of Section 27.2 can be easily adapted by restricting the analysis to X itself (which can play the role of the set A). Matters are more interesting when X is not open. Here we focus on the concave case of Section 27.4, widely used in applications. Consequently, throughout the chapter X denotes a closed and convex subset of an open convex set A, f : A ⊆ Rⁿ → R is a concave differentiable objective function, the g_i : Rⁿ → R are affine functions and the h_j : Rⁿ → R are convex differentiable functions.²

¹ Sometimes this distinction is made by talking of implicit and explicit constraints. Different authors, however, may give an opposite meaning to this terminology (which, in any case, we do not adopt).
² To ease matters, we define the functions g_i and h_j on the entire space Rⁿ. In particular, this means that the equality constraints can be represented in the matrix form (27.23).
The set C is closed and convex. As is often the case, the best way to proceed is to abstract
from the specific problem at hand, with its potentially distracting details. For this reason,
we will consider the optimization problem

max_x f(x)   sub   x ∈ C   (28.3)

where C is a generic closed and convex choice set that, for the moment, we treat as a black
box. Throughout this section we assume that f is continuously differentiable on an open
convex set that contains C.
The next lemma gives a simple and elegant way to unify these two cases.
Proposition 1044 If x̂ ∈ [a, b] is a solution of the optimization problem (28.4), then

f′(x̂)(x − x̂) ≤ 0   ∀x ∈ [a, b]   (28.5)
Lemma 1045 Given x̂ ∈ [a, b], condition (28.5) holds if and only if: (i) f′(x̂) = 0 when x̂ ∈ (a, b); (ii) f′(a) ≤ 0 when x̂ = a; (iii) f′(b) ≥ 0 when x̂ = b.

Proof We divide the proof in three parts, one for each of the equivalences to prove.
(i) Let x̂ ∈ (a, b). We prove that (28.5) is equivalent to f′(x̂) = 0. If f′(x̂) = 0 holds,
then f′(x̂)(x − x̂) = 0 for each x ∈ [a, b], and hence (28.5) holds. Vice versa, suppose that
(28.5) holds. Setting x = a, we have a − x̂ < 0 and so (28.5) implies f′(x̂) ≥ 0. On
the other hand, setting x = b, we have b − x̂ > 0 and so (28.5) implies f′(x̂) ≤ 0. In
conclusion, x̂ ∈ (a, b) implies f′(x̂) = 0.
(iii) Let x̂ = b. We prove that (28.5) is equivalent to f′(b) ≥ 0. Let f′(b) ≥ 0. Since
x − b ≤ 0 for each x ∈ [a, b], we have f′(b)(x − b) ≤ 0 for each x ∈ [a, b] and (28.5) holds.
Vice versa, suppose that (28.5) holds. By taking x ∈ [a, b), we have x − b < 0 and so (28.5)
implies f′(b) ≥ 0.
Proof of Proposition 1044 In view of Lemma 1045, it only remains to prove that (28.5)
becomes a sufficient condition when f is concave. Suppose therefore that f is concave and
that x̂ ∈ [a, b] is such that (28.5) holds. We prove that this implies that x̂ is a solution of
problem (28.4). Indeed, by (22.7) we have f(x) ≤ f(x̂) + f′(x̂)(x − x̂) for each x ∈ [a, b],
which implies f(x) − f(x̂) ≤ f′(x̂)(x − x̂) for each x ∈ [a, b]. Thus, (28.5) implies that
f(x) − f(x̂) ≤ 0, that is, f(x) ≤ f(x̂) for each x ∈ [a, b]. Hence, x̂ solves the optimization
problem (28.4).
∇f(x̂) · (x − x̂) ≤ 0   ∀x ∈ C   (28.6)
As in the scalar case, the variational inequality unifies the necessary optimality conditions
for interior and boundary points. Indeed, it is easy to check that, when x̂ is an interior point
of C, (28.6) reduces to the classic condition ∇f(x̂) = 0 of Fermat's Theorem.
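As an illustration, the following sketch checks the variational inequality numerically on a toy instance of our own choosing (not from the text): f(x) = −(x − 2)² on C = [0, 1], where the solution is a boundary point and the derivative does not vanish.

```python
# Numerical check of the variational inequality (28.6) on our own toy instance:
# maximize f(x) = -(x - 2)^2 over C = [0, 1]. The solution is the boundary
# point x_hat = 1, where f'(1) = 2 != 0, yet f'(x_hat)(x - x_hat) <= 0 on C.

def f_prime(x):
    return -2.0 * (x - 2.0)

x_hat = 1.0  # argmax of f over [0, 1] (f is increasing there)
grid = [i / 100.0 for i in range(101)]  # sample points of C = [0, 1]
assert all(f_prime(x_hat) * (x - x_hat) <= 1e-12 for x in grid)

# At an interior maximizer the inequality forces the derivative to vanish:
# on C = [0, 4] the solution is x_hat = 2 and f'(2) = 0.
assert abs(f_prime(2.0)) < 1e-12
```

The first assertion holds precisely because every feasible direction x − x̂ points leftward at the boundary solution.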
Proof Let x̂ ∈ C be a solution of the optimization problem (28.3), i.e., f(x̂) ≥ f(x) for each
x ∈ C. Given x ∈ C, set z_t = x̂ + t(x − x̂) for t ∈ [0, 1]. Since C is convex, z_t ∈ C for each
t. Define φ : [0, 1] → ℝ by φ(t) = f(z_t). Then

φ′₊(0) = lim_{t→0⁺} [φ(t) − φ(0)]/t = lim_{t→0⁺} [f(x̂ + t(x − x̂)) − f(x̂)]/t
      = lim_{t→0⁺} [df(x̂)(t(x − x̂)) + o(‖t(x − x̂)‖)]/t
      = df(x̂)(x − x̂) + lim_{t→0⁺} o(t‖x − x̂‖)/t = df(x̂)(x − x̂) = ∇f(x̂) · (x − x̂)

For each t ∈ [0, 1] we have φ(0) = f(x̂) ≥ f(z_t) = φ(t), and so φ : [0, 1] → ℝ has a (global)
maximizer at t = 0. It follows that φ′₊(0) ≤ 0, which implies ∇f(x̂) · (x − x̂) ≤ 0, as desired.
As to the converse, assume that f is concave. By (22.9), f(x) ≤ f(x̂) + ∇f(x̂) · (x − x̂)
for each x ∈ C, and therefore (28.6) implies f(x) ≤ f(x̂) for each x ∈ C.
For the dual minimum problems, the variational inequality is easily seen to take the dual
form ∇f(x̂) · (x − x̂) ≥ 0. For interior solutions, instead, the condition ∇f(x̂) = 0 is the
same in both maximum and minimum problems.³
N_C(x̄) = {y ∈ ℝⁿ : y · (x − x̄) ≤ 0 ∀x ∈ C}

Next we provide a couple of important properties of N_C(x̄). In particular, (ii) shows that
N_C(x̄) is nontrivial only if x̄ is a boundary point.
Proof (i) The set N_C(x̄) is clearly closed. Moreover, given y, z ∈ N_C(x̄) and α, β ≥ 0, we
have

(αy + βz) · (x − x̄) = α y · (x − x̄) + β z · (x − x̄) ≤ 0   ∀x ∈ C

and so αy + βz ∈ N_C(x̄). By Proposition 634, N_C(x̄) is a convex cone. (ii) We only prove
the “if” part. Let x̄ be an interior point of C. Suppose, by contradiction, that there is a
vector y ≠ 0 in N_C(x̄). As x̄ is interior, we have x̄ + ty ∈ C for t > 0 sufficiently
small. Hence we would have y · (x̄ + ty − x̄) = t y · y = t‖y‖² ≤ 0. This implies y = 0, a
contradiction. Hence N_C(x̄) = {0}.
To see the importance of normal cones, note that condition (28.6) can be written as:

∇f(x̂) ∈ N_C(x̂)   (28.7)
³ The unifying power of variational inequalities in optimization is the outcome of a few works of Guido
Stampacchia in the early 1960s. For an overview, see D. Kinderlehrer and G. Stampacchia, “An introduction
to variational inequalities and their applications”, Academic Press, 1980.
28.2. ANALYSIS OF THE BLACK BOX 759
The next result characterizes the normal cone for convex cones.
Proposition 1048 If C is a convex cone and x̄ ∈ C, then

N_C(x̄) = {y ∈ ℝⁿ : y · x̄ = 0 and y · x ≤ 0 ∀x ∈ C}

If, in addition, C is a vector subspace, then N_C(x̄) = C^⊥ for every x̄ ∈ C.
Proof Let y ∈ N_C(x̄). Then y · (x − x̄) ≤ 0 for all x ∈ C. As 0 ∈ C, we have y · (0 − x̄) ≤ 0.
Hence y · x̄ ≥ 0. On the other hand, we can write y · x̄ = y · (2x̄ − x̄) ≤ 0. It follows that
y · x̄ = 0. In turn, y · x = y · (x − x̄) ≤ 0 for each x ∈ C. Conversely, if y satisfies the
two conditions y · x̄ = 0 and y · x ≤ 0 for each x ∈ C, then y · (x − x̄) = y · x − y · x̄ ≤ 0,
and so y ∈ N_C(x̄). Suppose now, in addition, that C is a vector subspace. A subspace
C is a cone such that x ∈ C implies −x ∈ C. Hence, the first part of the proof yields
N_C(x̄) = {y ∈ ℝⁿ : y · x̄ = 0 and y · x = 0 ∀x ∈ C}. Since x̄ ∈ C, we then have N_C(x̄) =
{y ∈ ℝⁿ : y · x = 0 ∀x ∈ C} = C^⊥.
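To make the characterization concrete, here is a small numerical sketch on an instance of our own choosing (the cone, the point, and the sampling grid are not from the text): for the convex cone C = ℝ²₊ at the boundary point x̄ = (1, 0), Proposition 1048 gives N_C(x̄) = {(0, y₂) : y₂ ≤ 0}.

```python
# Illustration of Proposition 1048 (our own example): C = R^2_+ is a convex
# cone; at x_bar = (1, 0) the proposition gives
# N_C(x_bar) = {y : y . x_bar = 0 and y . x <= 0 for all x in C}
#            = {(0, y2) : y2 <= 0}.

x_bar = (1.0, 0.0)
# finite sample of C = R^2_+ (a grid over [0, 2]^2)
cone_sample = [(a / 5.0, b / 5.0) for a in range(11) for b in range(11)]

def in_normal_cone(y, base, sample):
    # direct definition of the normal cone: y . (x - base) <= 0 on the sample
    return all(y[0] * (x[0] - base[0]) + y[1] * (x[1] - base[1]) <= 1e-12
               for x in sample)

assert in_normal_cone((0.0, -3.0), x_bar, cone_sample)      # y1 = 0, y2 <= 0: in
assert not in_normal_cone((0.5, -1.0), x_bar, cone_sample)  # y . x_bar != 0: out
assert not in_normal_cone((0.0, 1.0), x_bar, cone_sample)   # y2 > 0: out
```

The finite grid only witnesses membership approximately, of course; it is a sanity check, not a proof.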
We can also characterize the normal cones of the simplices Δ_{n−1} = {x ∈ ℝⁿ₊ : Σₖ₌₁ⁿ xₖ = 1},
another all-important class of closed and convex sets. To this end, given x̄ ∈ Δ_{n−1}, let
A(x̄) = {i : x̄ᵢ > 0} be its support. The set {λy ∈ ℝⁿ : y ∈ I(x̄) and λ ≥ 0} is easily seen to be the smallest convex cone
that contains I(x̄). The normal cone is thus such a set.
∂f/∂xᵢ(x̂) = θ̂ if x̂ᵢ > 0 ;   ∂f/∂xᵢ(x̂) ≤ θ̂ if x̂ᵢ = 0

∂f/∂xₖ(x̂) ≤ θ̂   ∀k = 1, ..., n   (28.13)

(∂f/∂xₖ(x̂) − θ̂) x̂ₖ = 0   ∀k = 1, ..., n   (28.14)
Proof Suppose that A(x̄) is not a singleton and let i, j ∈ A(x̄). Clearly, 0 < x̄ᵢ, x̄ⱼ < 1.
Consider the points x^ε ∈ ℝⁿ having coordinates x^ε_i = x̄ᵢ + ε, x^ε_j = x̄ⱼ − ε, and x^ε_k = x̄ₖ
for all k ≠ i and k ≠ j, while the parameter ε runs over [−ε₀, ε₀] with ε₀ > 0 sufficiently
small in order that x^ε ≥ 0 for ε ∈ [−ε₀, ε₀]. Note that Σₖ₌₁ⁿ x^ε_k = 1 and so x^ε ∈ Δ_{n−1}. Let
y ∈ N_{Δ_{n−1}}(x̄). By definition, y · (x^ε − x̄) ≤ 0 for every ε ∈ [−ε₀, ε₀]. Namely, εyᵢ − εyⱼ =
ε(yᵢ − yⱼ) ≤ 0 for every such ε, which implies yᵢ = yⱼ. Hence, it must hold yᵢ = λ for all i ∈ A(x̄). That
is, the values of y must be constant on A(x̄). This is trivially true when A(x̄) is a singleton.
Let now j ∉ A(x̄). Consider the vector xʲ ∈ ℝⁿ, where xʲⱼ = 1 and xʲₖ = 0 for each k ≠ j. If
y ∈ N_{Δ_{n−1}}(x̄), then y · (xʲ − x̄) ≤ 0. That is,

yⱼ − Σ_{k≠j} yₖ x̄ₖ = yⱼ − Σ_{k∈A(x̄)} yₖ x̄ₖ = yⱼ − λ Σ_{k∈A(x̄)} x̄ₖ = yⱼ − λ ≤ 0
Therefore, N_{Δ_{n−1}}(x̄) ⊆ {λy ∈ ℝⁿ : y ∈ I(x̄) and λ ≥ 0}. We now show the converse inclusion.
Let y ∈ ℝⁿ be such that, for some λ ≥ 0, we have yᵢ = λ for all i ∈ A(x̄) and yₖ ≤ λ
for each k ∉ A(x̄). If x ∈ Δ_{n−1}, then

y · (x − x̄) = Σᵢ₌₁ⁿ yᵢ(xᵢ − x̄ᵢ) = Σ_{i∈A(x̄)} yᵢ(xᵢ − x̄ᵢ) + Σ_{i∉A(x̄)} yᵢ(xᵢ − x̄ᵢ)
  = λ Σ_{i∈A(x̄)} (xᵢ − x̄ᵢ) + Σ_{i∉A(x̄)} yᵢ xᵢ = λ(−Σ_{i∉A(x̄)} xᵢ) + Σ_{i∉A(x̄)} yᵢ xᵢ
  ≤ λ(−Σ_{i∉A(x̄)} xᵢ) + λ Σ_{i∉A(x̄)} xᵢ = 0

Hence y ∈ N_{Δ_{n−1}}(x̄).
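A quick Monte-Carlo check of this characterization, on an instance of our own choosing (x̄ = (1/2, 1/2, 0) in Δ₂, whose support consists of the first two coordinates):

```python
# Numerical check of the simplex normal-cone characterization proved above
# (our own instance): in Delta_2, take x_bar = (0.5, 0.5, 0), with support
# given by the first two coordinates. A vector y that is constant (= lam)
# on the support, with the off-support coordinate <= lam, should satisfy
# y . (x - x_bar) <= 0 for every x in the simplex.

import random

random.seed(0)
x_bar = (0.5, 0.5, 0.0)

def random_simplex_point():
    a, b = sorted((random.random(), random.random()))
    return (a, b - a, 1.0 - b)  # nonnegative coordinates summing to 1

points = [random_simplex_point() for _ in range(1000)]
y = (2.0, 2.0, 1.5)  # lam = 2 on the support, last coordinate 1.5 <= lam
assert all(sum(yi * (xi - xbi) for yi, xi, xbi in zip(y, x, x_bar)) <= 1e-12
           for x in points)

# A vector violating the characterization (last coordinate > lam) fails:
y_bad = (2.0, 2.0, 2.5)
assert any(sum(yi * (xi - xbi) for yi, xi, xbi in zip(y_bad, x, x_bar)) > 0
           for x in points)
```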
Proposition 1052 Let C = C₁ ∩ ⋯ ∩ Cₙ, with each Cᵢ closed and convex. Then, for all
x̄ ∈ C,

{ Σᵢ₌₁ⁿ yᵢ : yᵢ ∈ N_{Cᵢ}(x̄) ∀i = 1, ..., n } ⊆ N_C(x̄)

Equality holds if C satisfies Slater's condition, i.e., int C₁ ∩ ⋯ ∩ int Cₙ ≠ ∅, where the set
Cᵢ itself can replace its interior int Cᵢ if it is affine.
Proof Let x̄ ∈ C. Suppose y = Σᵢ₌₁ⁿ yᵢ, with yᵢ ∈ N_{Cᵢ}(x̄) for every i = 1, ..., n. Then,
y · (x − x̄) = Σᵢ₌₁ⁿ yᵢ · (x − x̄) ≤ 0 for every x ∈ C, and so y ∈ N_C(x̄). This proves the
inclusion. We omit the proof that Slater's condition implies the equality.
In words, under Slater's condition the normal cone of an intersection of sets is the sum
of their normal cones. Hence, a point x̂ satisfies the first order condition (28.7) if and only
if there are vectors ŷ₁, ..., ŷₙ such that

∇f(x̂) = Σᵢ₌₁ⁿ ŷᵢ
ŷᵢ ∈ N_{Cᵢ}(x̂)   ∀i = 1, ..., n

A familiar “multipliers” format emerges. The next section will show how the Kuhn-Tucker
Theorem fits in this general framework.
Lemma 1053 The set C satisfies Slater's condition if there is x̄ ∈ int X such that gᵢ(x̄) = bᵢ
for all i ∈ I and hⱼ(x̄) < cⱼ for all j ∈ J.

Proof The level sets Cᵢ are affine (Proposition 603). Since x̄ ∈ int X ∩ ⋂_{i∈I} Cᵢ ∩ ⋂_{j∈J} int Cⱼ,
such intersection is non-empty and so C satisfies Slater's condition.
In what follows we thus assume the existence of such an x̄.⁵ In view of Proposition 1052, it
now becomes key to characterize the normal cones of the sets Cᵢ and Cⱼ.
Lemma 1054 (i) For each x ∈ Cᵢ, we have N_{Cᵢ}(x) = {λ∇gᵢ(x) : λ ∈ ℝ};
(ii) for each x ∈ ℝⁿ, we have

N_{Cⱼ}(x) = {μ∇hⱼ(x) : μ ≥ 0} if hⱼ(x) = cⱼ ;  {0} if hⱼ(x) < cⱼ ;  ∅ if hⱼ(x) > cⱼ
where A(x̂) is the collection of the binding inequality constraints defined in (27.7). Since
here the first order condition (28.7) is a necessary and sufficient optimality condition, we can
say that x̂ ∈ C solves the optimization problem (28.1) if and only if there exists a triple of
vectors (λ̂, μ̂, ν̂) ∈ ℝ^{|I|} × ℝ₊^{|J|} × ℝⁿ, with ν̂ ∈ N_X(x̂), such that

∇f(x̂) = ν̂ + Σ_{i∈I} λ̂ᵢ ∇gᵢ(x̂) + Σ_{j∈J} μ̂ⱼ ∇hⱼ(x̂)   (28.15)

μ̂ⱼ (cⱼ − hⱼ(x̂)) = 0   ∀j ∈ J   (28.16)
⁵ This also ensures that the problem is well posed in the sense of Definition 1026.
28.3. RESOLUTION OF THE GENERAL CONCAVE PROBLEM 763
Indeed, as we noted in Lemma 1030, condition (28.16) amounts to requiring μ̂ⱼ = 0 for each
j ∉ A(x̂).
To sum up, under Slater's condition we got back the Kuhn-Tucker conditions (27.8)
and (27.9), suitably modified to cope with the new constraint x ∈ X. We leave to the reader
the Lagrangian formulation of these conditions.
So, conditions (28.15) and (28.16) can be equivalently written (with gradients unzipped) as:

∂f/∂xₖ(x̂) ≤ Σ_{i∈I} λ̂ᵢ ∂gᵢ/∂xₖ(x̂) + Σ_{j∈J} μ̂ⱼ ∂hⱼ/∂xₖ(x̂)   ∀k = 1, ..., n

μ̂ⱼ (cⱼ − hⱼ(x̂)) = 0   ∀j ∈ J

(∂f/∂xₖ(x̂) − Σ_{i∈I} λ̂ᵢ ∂gᵢ/∂xₖ(x̂) − Σ_{j∈J} μ̂ⱼ ∂hⱼ/∂xₖ(x̂)) x̂ₖ = 0   ∀k = 1, ..., n

and as

∂f/∂xₖ(x̂) − Σ_{i∈I} λ̂ᵢ ∂gᵢ/∂xₖ(x̂) − Σ_{j∈J} μ̂ⱼ ∂hⱼ/∂xₖ(x̂) ≤ θ̂   ∀k = 1, ..., n

μ̂ⱼ (cⱼ − hⱼ(x̂)) = 0   ∀j ∈ J

(∂f/∂xₖ(x̂) − Σ_{i∈I} λ̂ᵢ ∂gᵢ/∂xₖ(x̂) − Σ_{j∈J} μ̂ⱼ ∂hⱼ/∂xₖ(x̂) − θ̂) x̂ₖ = 0   ∀k = 1, ..., n
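As a sanity check, the following sketch verifies the unzipped conditions on a small concave problem of our own (not from the text): X = ℝ²₊, no equality constraints, one inequality constraint, namely max log(1 + x₁) + log(1 + x₂) subject to x₁ + x₂ ≤ 1 and x ≥ 0, whose symmetric solution is x̂ = (1/2, 1/2) with multiplier μ̂ = 2/3.

```python
# Sanity check of the unzipped Kuhn-Tucker conditions for a hypothetical
# concave problem with non-negativity constraints (our own instance):
#   max  f(x) = log(1 + x1) + log(1 + x2)   sub  x1 + x2 <= 1,  x >= 0.
# By symmetry the solution is x_hat = (0.5, 0.5), and the multiplier of the
# inequality constraint is mu_hat = df/dx_k(x_hat) = 1 / 1.5 = 2/3.

x_hat = (0.5, 0.5)
mu_hat = 1.0 / 1.5
c, h = 1.0, sum(x_hat)                   # inequality constraint x1 + x2 <= c

df = [1.0 / (1.0 + xk) for xk in x_hat]  # partial derivatives of f at x_hat
dh = [1.0, 1.0]                          # gradient of h(x) = x1 + x2

tol = 1e-12
for k in range(2):
    assert df[k] <= mu_hat * dh[k] + tol                   # gradient inequality
    assert abs((df[k] - mu_hat * dh[k]) * x_hat[k]) < tol  # slackness in x_k
assert abs(mu_hat * (c - h)) < tol                         # slackness in mu
```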
Example 1056 (i) Let X = Y = ℝ and consider the correspondence φ : ℝ ⇉ ℝ given by
φ(x) = [−|x|, |x|]. For instance, φ(1) = φ(−1) = [−1, 1] and φ(0) = {0}. (ii) The budget
correspondence B : ℝ₊ⁿ⁺¹ ⇉ ℝⁿ₊ is defined by B(p, w) = {x ∈ ℝⁿ₊ : p · x ≤ w}.¹ Note that
B(p, w) ≠ ∅ for all (p, w) ∈ ℝ₊ⁿ⁺¹ since 0 ∈ B(p, w) for all (p, w) ∈ ℝ₊ⁿ⁺¹. N
Unless otherwise stated, from now on we assume that X is a subset of ℝⁿ and that Y is
a subset of ℝᵐ. We say that φ is:
766 CHAPTER 29. PARAMETRIC OPTIMIZATION PROBLEMS
29.1.2 Graph
The graph Gr φ of a correspondence φ : A ⊆ X ⇉ Y is the set

Gr φ = {(x, y) ∈ A × Y : y ∈ φ(x)}

Example 1058 (i) The graph of the correspondence φ : ℝ ⇉ ℝ given by φ(x) = [−|x|, |x|]
is Gr φ = {(x, y) ∈ ℝ² : −|x| ≤ y ≤ |x|}.
The converse implications are false: closedness and convexity of the graph of φ are
significantly stronger assumptions than the closedness and convexity of the images φ(x).
This is best seen by considering scalar functions, as the next examples show.
f(x) = x if x < 0   and   f(x) = 1 if x ≥ 0
The lack of convexity is obvious. To see that Gr φ is not closed, observe that the origin
(0, 0) is a boundary point that does not belong to Gr φ. (ii) A continuous scalar function
f : ℝ → ℝ has convex graph if and only if it is affine. The “if” is obvious. As
to the “only if”, suppose that Gr f is convex. Given any x, y ∈ ℝ and any α ∈ [0, 1],
then (αx + (1 − α)y, αf(x) + (1 − α)f(y)) ∈ Gr f, that is, f(αx + (1 − α)y) = αf(x) +
(1 − α)f(y). By standard results on the Cauchy functional equation, this implies that there
exist m, q ∈ ℝ such that f(x) = mx + q. N
That is, the correspondence collects all solutions of problem (29.1). Its domain S ⊆ Θ is the
solution domain, that is, the collection of all parameters θ for which problem (29.1) admits a solution.
Example 1060 (i) The parametric optimization problem with equality and inequality con-
straints has the form

max_x f(x)
sub  gᵢ(x) = bᵢ   ∀i ∈ I
     hⱼ(x) ≤ cⱼ   ∀j ∈ J

In this case, if we set b = (b₁, ..., b_{|I|}) ∈ ℝ^{|I|} and c = (c₁, ..., c_{|J|}) ∈ ℝ^{|J|}, the parameter set Θ
consists of all θ = (b, c) ∈ ℝ^{|I|} × ℝ^{|J|}.
(ii) The consumer problem (Section 16.1.3) is a parametric optimization problem. The set
A is ℝⁿ₊. The space ℝ₊ⁿ⁺¹ of all price and income pairs is the parameter set Θ, with elements
θ = (p, I). The budget correspondence B : ℝ₊ⁿ⁺¹ ⇉ ℝⁿ₊ is the feasibility correspondence and
the utility function u is the objective function (which does not depend on the parameter). Let
S ⊆ Θ be the set of all parameters (p, I) for which the consumer problem has a solution (i.e.,
an optimal bundle). The demand correspondence D : S ⇉ ℝⁿ₊ is the solution correspondence,
which becomes a demand function D : S → ℝⁿ₊ when optimal bundles are unique. Finally,
the indirect utility function v : S → ℝ is the value function.
(iii) Consider a profit maximizing firm producing a single output with price p, using an
input vector x ∈ ℝⁿ₊ with prices w ∈ ℝⁿ₊, according to a production function y = f(x). The
profit function is π(p, w) = sup_{x≥0} pf(x) − w · x. In this case the choice set A is ℝⁿ₊ and
the parameter set Θ is ℝ₊ × ℝⁿ₊. Note that in this case φ(θ) = A for every θ = (p, w). N
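The consumer-problem objects can be made concrete in a few lines. Below, a Cobb-Douglas specification of our own (u(x) = x₁^a x₂^{1−a}, whose demand function is known in closed form; the prices and income are also our own choices) plays the role of the utility; a brute-force search over the budget correspondence confirms that the closed-form demand attains the indirect utility.

```python
# Miniature of the consumer problem with a Cobb-Douglas utility (our own
# specification, not from the text): u(x) = x1^a * x2^(1-a). The demand
# function and the indirect utility (value function) are known in closed
# form, which lets us check them against a grid search over B(p, I).

def demand(p, income, a=0.5):
    # closed-form Cobb-Douglas demand: spend the share a on good 1
    return (a * income / p[0], (1 - a) * income / p[1])

def u(x, a=0.5):
    return (x[0] ** a) * (x[1] ** (1 - a))

p, income = (2.0, 1.0), 10.0
x_star = demand(p, income)     # optimal bundle (2.5, 5.0)
v = u(x_star)                  # indirect utility v(p, I)

# brute-force check over bundles in B(p, I) = {x >= 0 : p . x <= I}
grid = [(i * 0.05, j * 0.05) for i in range(101) for j in range(201)
        if p[0] * i * 0.05 + p[1] * j * 0.05 <= income]
assert all(u(x) <= v + 1e-9 for x in grid)
assert abs(x_star[0] * p[0] + x_star[1] * p[1] - income) < 1e-9  # budget binds
```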
The solution correspondence and the value function are key for these exercises because they
describe how optimal choices and their value vary as parameters vary. For instance, in the
consumer problem the demand correspondence and the indirect utility function describe,
respectively, how the optimal bundles and their values are affected by changes in prices and
income.
The convexity of the solution set means inter alia that, when non-empty, such a set is
either a singleton or an infinite set. That is, either the solution is unique or there are
infinitely many of them. Next we give the most important sufficient condition that ensures
uniqueness.
We turn now to value functions. In the following result we assume the convexity of the
graph of φ. As we already remarked, this is a substantially stronger assumption than the
convexity of the images φ(x).
Example 1065 In the consumer problem, the graph of the budget correspondence is clearly
convex. Therefore, Proposition 1064 implies that the indirect utility v is quasi-concave
(concave) provided the utility is quasi-concave (concave). Since in Proposition ?? we proved
that v is quasi-convex, regardless of the behavior of u, we conclude that v is quasi-affine. N
where the feasibility correspondence is constant, with φ(θ) = C ⊆ A for all θ ∈ Θ. The
parameter only affects the objective function. To ease matters, throughout the section we
also assume that S = Θ.
We first approach the issue heuristically. To this end, suppose that n = k = 1, so
that both the parameter θ and the choice variable x are scalars. Moreover, assume that
there is a unique solution σ(θ) for each θ, so that σ : Θ → ℝ is the solution function. Then
v(θ) = f(σ(θ), θ) for every θ ∈ Θ. A heuristic application of the chain rule (a “back of the
envelope” calculation) then suggests that, if it exists, the derivative of v at θ₀ is:

v′(θ₀) = (∂f/∂x)(σ(θ₀), θ₀) σ′(θ₀) + (∂f/∂θ)(σ(θ₀), θ₀)
29.4. ENVELOPE THEOREMS I: FIXED CONSTRAINT 771
Remarkably, the first term is null because by Fermat's Theorem (∂f/∂x)(σ(θ₀), θ₀) = 0
(provided the solution is interior). Thus,

v′(θ₀) = (∂f/∂θ)(σ(θ₀), θ₀)   (29.5)

Next we make this important finding general and rigorous.
Theorem 1066 Suppose f(x, ·) is, for every x ∈ C, differentiable at θ₀ ∈ int Θ. If v is
differentiable at θ₀, then for every x̂ ∈ σ(θ₀) we have ∇v(θ₀) = ∇_θ f(x̂, θ₀), that is,

∂v(θ₀)/∂θᵢ = ∂f(x̂, θ₀)/∂θᵢ   ∀i = 1, ..., k   (29.6)

If f is strictly quasi-concave in x and φ is convex-valued, then σ is a function (Proposition
1063). So, (29.6) can be written as

∂v(θ₀)/∂θᵢ = ∂f(σ(θ₀), θ₀)/∂θᵢ   ∀i = 1, ..., k

which is the general form of the heuristic formula (29.5).
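A finite-difference check of (29.6) on a toy instance of our own choosing: f(x, θ) = −x² + θx on C = [0, 2], where σ(θ) = θ/2 and v(θ) = θ²/4 for θ ∈ (0, 4).

```python
# Finite-difference check of the envelope formula (29.6) on our own instance:
# f(x, theta) = -x^2 + theta*x on C = [0, 2]. For theta in (0, 4) the solution
# is sigma(theta) = theta/2, so v(theta) = theta^2/4 and
# v'(theta) = theta/2 = df/dtheta evaluated at (sigma(theta), theta).

def f(x, theta):
    return -x * x + theta * x

def v(theta, n=20000):
    # value function by brute force over a fine grid of C = [0, 2]
    return max(f(2.0 * i / n, theta) for i in range(n + 1))

theta0, h = 1.0, 1e-4
dv = (v(theta0 + h) - v(theta0 - h)) / (2 * h)  # numerical v'(theta0)
sigma0 = theta0 / 2.0                            # solution at theta0
df_dtheta = sigma0                               # df/dtheta = x at the solution
assert abs(dv - df_dtheta) < 1e-3
```

Note that v is computed without knowing σ; the envelope formula then recovers v′ from f alone.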
Proof Let x̂ ∈ σ(θ₀) and define w : Θ → ℝ by w(θ) = f(x̂, θ). Since x̂ is feasible for every θ,
we have w(θ) ≤ v(θ) for all θ ∈ Θ, with w(θ₀) = f(x̂, θ₀) = v(θ₀). Then

∂f(x̂, θ₀)/∂θᵢ = lim_{h→0⁺} [f(x̂, θ₀ + heⁱ) − f(x̂, θ₀)]/h = lim_{h→0⁺} [w(θ₀ + heⁱ) − w(θ₀)]/h
             ≤ lim_{h→0⁺} [v(θ₀ + heⁱ) − v(θ₀)]/h = ∂v(θ₀)/∂θᵢ

On the other hand,

[w(θ₀ + tu) − w(θ₀)]/t ≥ [v(θ₀ + tu) − v(θ₀)]/t

for all u ∈ ℝᵏ and t < 0 sufficiently small. By proceeding as before, we then have

∂f(x̂, θ₀)/∂θᵢ ≥ ∂v(θ₀)/∂θᵢ

This proves (29.6).
The hypothesis that v is differentiable is not that appealing because it is not stated in terms
of the primitive elements f and C of problem (29.4). Indeed, to check it we need to know
the value function. Remarkably, in concave problems this differentiability hypothesis follows
from hypotheses that bear directly on the objective function.
Theorem 1067 Let C and Θ be convex. Suppose f(x, ·) is, for every x ∈ C, differentiable
at θ₀ ∈ int Θ. If f is concave on C × Θ, then v is differentiable at θ₀.
w(θ) ≤ v(θ) ≤ v(θ₀) + ξ · (θ − θ₀) = w(θ₀) + ξ · (θ − θ₀)
v′(θ₀) = ∂f(σ(θ₀), θ₀)/∂θ − λ̂(θ₀) ∂ψ(σ(θ₀), θ₀)/∂θ

where λ̂(θ₀) is the Lagrange multiplier that corresponds to the unique solution σ(θ₀) and
ψ(x, θ) = 0 is the equality constraint. Indeed, being ψ(σ(θ), θ) = 0 for every θ ∈ Θ, by a
heuristic application of the chain rule we have

(∂ψ/∂x)(σ(θ₀), θ₀) σ′(θ₀) + (∂ψ/∂θ)(σ(θ₀), θ₀) = 0
29.6. MARGINAL INTERPRETATION OF MULTIPLIERS 773
for every s = 1, ..., k. Here (λ̂(θ₀), μ̂(θ₀)) ∈ ℝ^{|I|} × ℝ₊^{|J|} are the Lagrange multipliers associated
with the solution σ(θ₀), here assumed to be unique (for simplicity).
We can derive this formula heuristically with the argument that we just used for
the equality case. Indeed, if we denote by A(σ(θ₀)) the set of the binding constraints at
θ₀, by Lemma 1030 we have μ̂ⱼ = 0 for each j ∉ A(σ(θ₀)). So, the non-binding constraints
at θ₀ do not affect the derivation because their multipliers are null.
That said, let us consider the standard problem (27.4) in which the objective function does
not depend on the parameter, the equality constraints take the form gᵢ(x) − bᵢ = 0 for every i ∈ I,
and the inequality constraints take the form hⱼ(x) − cⱼ ≤ 0 for every j ∈ J (Example 1060).
Formula (29.9) then implies

∂v(b, c)/∂bᵢ = λ̂ᵢ(b, c)   ∀i ∈ I

∂v(b, c)/∂cⱼ = μ̂ⱼ(b, c)   ∀j ∈ J
Interestingly, the multipliers describe the marginal effect on the value function of relaxing
the constraints, that is, how valuable it is to relax them. In particular, we have
∂v(b, c)/∂cⱼ = μ̂ⱼ(b, c) ≥ 0 because it is always beneficial to relax an inequality constraint:
more alternatives become available. In contrast, this might not be the case for an equality
constraint, so the sign of ∂v(b, c)/∂bᵢ = λ̂ᵢ(b, c) is ambiguous.
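A one-dimensional sketch of this marginal interpretation, on an example of our own: in max log x sub x ≤ c the solution is x̂ = c, the multiplier is μ̂(c) = 1/c, and a finite difference confirms that v′(c) = μ̂(c).

```python
# Marginal interpretation of the multiplier on a one-dimensional instance
# (ours, not the text's): max log(x) sub x <= c gives x_hat = c, value
# v(c) = log(c), and multiplier mu_hat(c) = 1/c, so that dv/dc = mu_hat(c).

import math

def v(c):
    return math.log(c)   # value of: max log(x) subject to x <= c

c0, h = 2.0, 1e-6
dv = (v(c0 + h) - v(c0 - h)) / (2 * h)  # numerical dv/dc
mu_hat = 1.0 / c0                        # multiplier at the solution x_hat = c0
assert abs(dv - mu_hat) < 1e-8
```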
Part VIII
Integration
Chapter 30
Riemann’s integral
Let us consider a positive function f (i.e., taking values ≥ 0) which is defined on a closed
interval [a, b]. Intuitively, the integral of f on [a, b] is the measure, called area, of the portion
of the plane included between the graph of f, the horizontal axis, and the vertical lines
through a and b, that is, of the set

A(f_{[a,b]}) = {(x, y) ∈ ℝ² : a ≤ x ≤ b and 0 ≤ y ≤ f(x)}
The problem is how to make such a natural intuition rigorous. We follow the classical
procedure known as the method of exhaustion. It consists of approximating the measure
of A(f_{[a,b]}) through the areas of very simple polygons, the so-called “plurirectangles”. Their
measure is calculated in an elementary way. Thanks to these simple polygons, we try to
obtain an approximation, as precise as possible, in order to capture, at the limit (if it exists),
the value of A(f_{[a,b]}). This value will be taken to be the integral of f on [a, b]. The idea
of the method of exhaustion was born in Greek mathematics, where it found brilliant
applications in the works of Eudoxus of Cnidus and Archimedes of Syracuse.
778 CHAPTER 30. RIEMANN’S INTEGRAL
30.1 Plurirectangles
We know how to calculate the areas of elementary geometric shapes. Among them, the
simplest ones are rectangles, whose area is given by the product of the lengths of their
base and their corresponding height. A simple, but crucial, generalization of rectangles is
represented by the so-called plurirectangles, finite unions of rectangles with contiguous bases.
Naturally, the area of A(f_{[a,b]}) is larger than that of every inscribed plurirectangle
and smaller than that of every circumscribed plurirectangle. The area of A(f_{[a,b]}) is
therefore included between the areas of the inscribed and circumscribed plurirectangles.
Hence, the first important observation is that the area of A(f_{[a,b]}) can always be “sand-
wiched” between the areas of plurirectangles. This yields simple lower approximations (the
areas of the inscribed plurirectangles) and upper approximations (the areas of the circum-
scribed plurirectangles) of the value of A(f_{[a,b]}).
The second crucial observation is that such a sandwich, and consequently the relative
approximations, can be made better and better by considering finer and finer plurirectangles,
which are obtained by subdividing their bases more and more.
Indeed, by subdividing the bases more and more, the area of the inscribed plurirectangles
becomes larger and larger, even if it always remains smaller than the area of A(f_{[a,b]}). On
the other hand, the area of the circumscribed plurirectangles becomes smaller and smaller,
even if it always remains larger than the area of A(f_{[a,b]}). In other words, the two slices of
the sandwich that include the set A(f_{[a,b]}) (i.e., the lower and the upper approximations)
take values that become progressively closer to each other.
30.2 Definition
We now formalize the method of exhaustion. We first consider positive and bounded func-
tions f : [a, b] → ℝ₊. In the next section, we will consider functions taking any real value.
A subdivision of [a, b] is a finite set of points π = {xᵢ}ᵢ₌₀ⁿ with a = x₀ < x₁ < ⋯ < xₙ = b.
The set of all the possible subdivisions of an interval [a, b] will be denoted by Π.
Given a bounded function f : [a, b] → ℝ₊, let us consider the contiguous bases generated
by the points of the subdivision π,

[xᵢ₋₁, xᵢ]   i = 1, ..., n   (30.2)

Let us build on them the largest plurirectangle inscribed in the set under f. In particular,
for the i-th base, the maximum height mᵢ of the rectangle with base [xᵢ₋₁, xᵢ] that can be
inscribed in the set under f is

mᵢ = inf_{x∈[xᵢ₋₁,xᵢ]} f(x)

Since we have assumed that f is bounded, by the Least Upper Bound Principle this infimum
exists and is finite, that is, mᵢ ∈ ℝ. Since the length Δxᵢ of each base [xᵢ₋₁, xᵢ] is

Δxᵢ = xᵢ − xᵢ₋₁

the area I(f, π) of this largest inscribed plurirectangle is

I(f, π) = Σᵢ₌₁ⁿ mᵢ Δxᵢ   (30.3)
In an analogous way, let us build on the contiguous bases (30.2), determined by the subdivi-
sion π, the smallest plurirectangle that circumscribes the set under f. For the i-th base, the
minimum height Mᵢ of the rectangle with base [xᵢ₋₁, xᵢ] that circumscribes the set under f
is given by

Mᵢ = sup_{x∈[xᵢ₋₁,xᵢ]} f(x)
As before, given that f is bounded, by the Least Upper Bound Principle the supremum exists
and is finite, that is, Mᵢ ∈ ℝ. Therefore, the area S(f, π) of the smallest circumscribed
plurirectangle is

S(f, π) = Σᵢ₌₁ⁿ Mᵢ Δxᵢ   (30.4)

Since mᵢ ≤ Mᵢ for every i, we have

I(f, π) ≤ S(f, π)   ∀π ∈ Π   (30.5)

In particular, the area of the set under f lies between these two values. Hence, I(f, π) gives
a lower approximation of this area, while S(f, π) gives an upper approximation of it. The
sum I(f, π) is called the lower integral sum of f with respect to π, and the sum S(f, π) is called
the upper integral sum of f with respect to π.
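The two integral sums are immediate to compute. A small sketch with a function and subdivisions of our own choosing (f(x) = x² on [0, 1], uniform subdivisions) shows the sandwich (30.5) around the area 1/3, tightening under refinement as in (30.6):

```python
# Lower and upper integral sums I(f, pi) and S(f, pi) for a sample function,
# f(x) = x^2 on [0, 1], with uniform subdivisions (our choice of example).
# Both sums sandwich the area 1/3, and the sandwich tightens as n grows.

def sums(f, a, b, n):
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    lower = upper = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        # f is increasing on [0, 1]: on each base the inf is f(x0), the sup f(x1)
        lower += f(x0) * (x1 - x0)
        upper += f(x1) * (x1 - x0)
    return lower, upper

f = lambda x: x * x
I4, S4 = sums(f, 0.0, 1.0, 4)
I8, S8 = sums(f, 0.0, 1.0, 8)
assert I4 <= I8 <= 1.0 / 3.0 <= S8 <= S4  # refining improves both bounds
```

The uniform subdivision with 8 bases refines the one with 4 bases, so both approximations improve, as (30.6) predicts.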
Definition 1069 Given two subdivisions π and π′ of [a, b], we say that π′ refines π if π ⊆ π′.

In other words, the finer subdivision π′ is obtained by adding further points to π. For
example, if we consider [a, b] = [0, 1], the subdivision

π′ = {0, 1/4, 1/2, 3/4, 1}

refines the subdivision π = {0, 1/2, 1}.
It is easy to see that if π′ refines π, then

I(f, π) ≤ I(f, π′) ≤ S(f, π′) ≤ S(f, π)   (30.6)

In other words, a finer subdivision π′ yields better approximations, both lower and upper, of
the area under f.¹ By starting from any subdivision, we can always refine it, thus improving
(or at least not worsening) the approximations given by the respective plurirectangles.

¹ For the sake of brevity, we write “area under f” instead of the more precise expression “area of the portion
of plane that lies under f”.
The same can be done by starting from any two subdivisions π and π′, where not neces-
sarily one is taken to be finer than the other. Indeed, the subdivision π″ = π ∪ π′ is formed
by all the points that belong to either of the two subdivisions, and it refines both π and π′. In
other words, π″ is a common refinement of π and π′. For example, let

π = {0, 1/3, 1/2, 2/3, 1}   and   π′ = {0, 1/4, 1/2, 3/4, 1}

They are two different subdivisions: neither π refines π′ nor π′ refines π. The subdivision

π″ = π ∪ π′ = {0, 1/4, 1/3, 1/2, 2/3, 3/4, 1}

refines both, and

I(f, π) ≤ I(f, π″) ≤ S(f, π″) ≤ S(f, π)   (30.7)

and

I(f, π′) ≤ I(f, π″) ≤ S(f, π″) ≤ S(f, π′)   (30.8)

A common refinement π″ gives better approximations, both lower and upper, with respect
to π and π′, of the area under f.
All this motivates the next definition: the lower integral and the upper integral of f on
[a, b] are, respectively,

∫̲ₐᵇ f(x) dx = sup_{π∈Π} I(f, π)   (30.9)   and   ∫̄ₐᵇ f(x) dx = inf_{π∈Π} S(f, π)   (30.10)

One of the first questions that arises is whether the lower and upper integrals of a bounded
function exist or not.
Lemma 1072 If f : [a, b] → ℝ₊ is a bounded function, then both the lower integral and the
upper integral exist and are finite. Moreover, we have

∫̲ₐᵇ f(x) dx ≤ ∫̄ₐᵇ f(x) dx   (30.11)

Proof Since f is positive and bounded, there exists M ≥ 0 such that 0 ≤ f(x) ≤ M for
every x ∈ [a, b]. Therefore, for every subdivision π = {xᵢ}ᵢ₌₀ⁿ we have 0 ≤ mᵢ ≤ Mᵢ ≤ M for
every i, and hence

0 ≤ I(f, π) ≤ S(f, π) ≤ M(b − a)   ∀π ∈ Π

The Least Upper Bound Principle implies that the supremum in (30.9) and the infimum in
(30.10) exist and are finite and positive, that is, ∫̲ₐᵇ f(x) dx ∈ ℝ₊ and ∫̄ₐᵇ f(x) dx ∈ ℝ₊.
We still need to prove inequality (30.11). Let us suppose, by contradiction, that

∫̲ₐᵇ f(x) dx − ∫̄ₐᵇ f(x) dx = ε > 0
By the previous lemma, every bounded function f : [a, b] → ℝ₊ has both a lower
integral and an upper integral, and

∫̲ₐᵇ f(x) dx ≤ ∫̄ₐᵇ f(x) dx
The area under f lies between these two values.² The last inequality is the most refined
version of (30.6). The lower and upper integrals are, respectively, the best lower and upper
approximations of the area under f that can be obtained starting from plurirectangles. In
particular, when ∫̲ₐᵇ f(x) dx = ∫̄ₐᵇ f(x) dx, the area under f will be taken to be equal to such
common value. This motivates the next fundamental definition.
Example 1074 Let f : [a, b] → ℝ be defined as f(x) = x. For any subdivision π = {xᵢ}ᵢ₌₀ⁿ we
have

I(f, π) = x₀ Δx₁ + x₁ Δx₂ + ⋯ + xₙ₋₁ Δxₙ = Σᵢ₌₁ⁿ xᵢ₋₁ Δxᵢ

S(f, π) = x₁ Δx₁ + x₂ Δx₂ + ⋯ + xₙ Δxₙ = Σᵢ₌₁ⁿ xᵢ Δxᵢ

and therefore

S(f, π) − I(f, π) = (x₁ − x₀) Δx₁ + (x₂ − x₁) Δx₂ + ⋯ + (xₙ − xₙ₋₁) Δxₙ = Σᵢ₌₁ⁿ (Δxᵢ)²

By taking, say, equally spaced subdivisions, Σᵢ₌₁ⁿ (Δxᵢ)² = (b − a)²/n can be made arbitrarily
small, so ∫̲ₐᵇ f(x) dx = ∫̄ₐᵇ f(x) dx. It follows that f(x) = x is integrable. N
² Recall that the area may or may not exist. If it exists, it is the measure of the set under f.
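The computation of Example 1074 can be replayed numerically; for a uniform subdivision the gap S(f, π) − I(f, π) equals (b − a)²/n (the interval [1, 3] below is our own choice):

```python
# The computation in Example 1074, done numerically: for f(x) = x and a
# uniform subdivision of [a, b] into n parts, S(f, pi) - I(f, pi) equals
# the sum of the (Delta x_i)^2, i.e. (b - a)^2 / n, which vanishes as n grows.

def gap(a, b, n):
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    lower = sum(x0 * (x1 - x0) for x0, x1 in zip(xs, xs[1:]))  # I(f, pi)
    upper = sum(x1 * (x1 - x0) for x0, x1 in zip(xs, xs[1:]))  # S(f, pi)
    return upper - lower

a, b = 1.0, 3.0
for n in (10, 100, 1000):
    assert abs(gap(a, b, n) - (b - a) ** 2 / n) < 1e-9
```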
Example 1075 Consider the Dirichlet function

f(x) = 1 if x ∈ ℚ ∩ [a, b],  f(x) = 0 if x ∉ ℚ ∩ [a, b]   (30.12)

restricted to [a, b]. By Proposition 39, on the density of the rational numbers, for every a ≤
x < y ≤ b there exists a rational number q such that x < q < y. It is also true that for
every a ≤ x < y ≤ b there exists an irrational number r such that x < r < y. Given any
subdivision π = {xᵢ}ᵢ₌₀ⁿ of [a, b], we thus have mᵢ = 0 and Mᵢ = 1 for every i. Therefore

I(f, π) = 0 · Δx₁ + 0 · Δx₂ + ⋯ + 0 · Δxₙ = 0

and

S(f, π) = 1 · Δx₁ + 1 · Δx₂ + ⋯ + 1 · Δxₙ = Σᵢ₌₁ⁿ Δxᵢ = b − a

which implies ∫̲ₐᵇ f(x) dx = 0 < b − a = ∫̄ₐᵇ f(x) dx. The Dirichlet function is not integrable
in the sense of Riemann.³ N
Finally, let us introduce a useful quantity that characterizes the “thickness” of a subdi-
vision of [a, b].

Definition 1076 Given a subdivision π of [a, b], we define the mesh of π, denoted by
|π|, as the positive quantity

|π| = max_{i=1,2,...,n} Δxᵢ
We now extend the notion of integral to any bounded function f : [a, b] → ℝ, not necessarily
positive. For a function f : [a, b] → ℝ that can assume both negative and positive values,
³ Therefore, it has no meaning (at least in the sense of Riemann) to talk about the “area” of the set under
such a function.
the set under f on [a; b] has in general a positive part and a negative part
and the integral is the difference between the areas of the positive part and of the negative
part. If they have equal value, the integral is zero: it is the case, for example, of the function
f(x) = sin x on the interval [0, 2π].
To make this idea rigorous, it is useful to decompose a function into its positive and
negative parts. For instance, for the identity function f(x) = x we have

f⁺(x) = 0 if x < 0, f⁺(x) = x if x ≥ 0   and   f⁻(x) = −x if x < 0, f⁻(x) = 0 if x ≥ 0
N
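The decomposition f = f⁺ − f⁻ is easy to check numerically. The sketch below (our own choice of f(x) = sin x on [0, 2π] and of a plain left-endpoint Riemann sum) confirms that the positive and negative parts have equal area, so the integral of f is zero, as claimed above.

```python
# Numerical check of the decomposition f = f+ - f- for f(x) = sin(x) on
# [0, 2*pi] (our own example): the areas under f+ and f- are both 2, so the
# integral of f is their difference, namely zero.

import math

def f_plus(f, x):
    return max(f(x), 0.0)

def f_minus(f, x):
    return max(-f(x), 0.0)

def riemann(g, a, b, n=100000):
    # simple left-endpoint Riemann-sum approximation
    dx = (b - a) / n
    return sum(g(a + i * dx) * dx for i in range(n))

a, b = 0.0, 2.0 * math.pi
area_plus = riemann(lambda x: f_plus(math.sin, x), a, b)
area_minus = riemann(lambda x: f_minus(math.sin, x), a, b)
assert abs(area_plus - 2.0) < 1e-3
assert abs(area_minus - 2.0) < 1e-3
assert abs(area_plus - area_minus) < 1e-6
```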
is equivalent to considering the difference between the areas under the positive part and the
negative part.
All of this motivates the following definition of Riemann's integral for bounded functions
which are not necessarily positive.
Such a definition makes rigorous and transparent the idea of counting with different
signs the areas that lie, respectively, above and below the horizontal axis, that is, ∫ₐᵇ f⁺(x) dx
and ∫ₐᵇ f⁻(x) dx.
Also for general functions, the sum I(f, π) is called the lower integral sum of f with respect
to the subdivision π, and the sum S(f, π) is called the upper integral sum of f with respect to
the subdivision π. The reader can easily verify that, for these sums, properties (30.5), (30.6),
(30.7) and (30.8) continue to hold. In particular,
also for any bounded function f : [a, b] → ℝ (not necessarily positive) we can define the
lower and upper integrals

∫̲ₐᵇ f(x) dx = sup_{π∈Π} I(f, π)   and   ∫̄ₐᵇ f(x) dx = inf_{π∈Π} S(f, π)   (30.14)

in perfect analogy with what has been done for positive functions. The next result shows
that everything holds together, that is, the notion of Riemann's integral obtained through
the decomposition (30.13) into positive and negative parts coincides with the equality between
upper and lower integrals in (30.14).
Proposition 1081 A bounded function f : [a, b] → ℝ is integrable if and only if ∫̲ₐᵇ f(x) dx =
∫̄ₐᵇ f(x) dx. In such a case,

∫ₐᵇ f(x) dx = ∫̲ₐᵇ f(x) dx = ∫̄ₐᵇ f(x) dx
The proof is based on the next three lemmas. The first one establishes a general property
of the suprema and infima of sums of functions; the second one also has a theoretical interest
for the theory of integration (as we will observe at the end of the section), while the last one
is of a more technical nature. The proofs of the second and third lemmas, as well as of
Proposition 1081, are omitted.
Proof By contradiction, let us suppose that sup_A(g + h) > sup_A g + sup_A h. Let ε =
sup_A(g + h) − (sup_A g + sup_A h) > 0. By the property of the sup of a set, there exists
x₀ ∈ A such that (g + h)(x₀) > sup_A(g + h) − ε = sup_A g + sup_A h.⁴ At the same time,
by definition of the sup of a function, we have g(x) ≤ sup_A g and h(x) ≤ sup_A h for every
x ∈ A, from which it follows that g(x) + h(x) ≤ sup_A g + sup_A h for every x ∈ A. In particular,
(g + h)(x₀) ≤ sup_A g + sup_A h, a contradiction. The reader can prove, in a similar way, that
inf_A(g + h) ≥ inf_A g + inf_A h.
Lemma 1083 Let f : [a, b] → ℝ be a bounded function. Then, for every subdivision π =
{xᵢ}ᵢ₌₀ⁿ of [a, b], we have

S(f, π) = S(f⁺, π) − I(f⁻, π)   (30.15)

and

I(f, π) = I(f⁺, π) − S(f⁻, π)   (30.16)

⁴ Note that sup_A(g + h) = sup Im(g + h) = sup(g + h)(A).
Lemma 1084 Let f : [a, b] → ℝ be a bounded function. Then

sup_{π∈Π} I(f, π) = sup_{π∈Π} I(f⁺, π) − inf_{π∈Π} S(f⁻, π)   (30.17)
N.B. Often Riemann's integral is defined directly for general functions, which are not ne-
cessarily positive, through the upper sums and the lower sums. What is lost in defining
these sums for not necessarily positive functions is the geometric intuition. While for pos-
itive functions I(f, π) is the area of the inscribed plurirectangles and S(f, π) is the area of the
circumscribed plurirectangles, this is no longer true for a generic function that takes positive
and negative values, as (30.15) and (30.16) show. The formulation we adopt with Definition
1080 is suggested by pedagogical motivations and it is equivalent to the usual formulation,
as Proposition 1081 shows. O
Proof "If". Suppose that, for every $\varepsilon > 0$, there exists a subdivision $\pi$ such that $S(f,\pi) - I(f,\pi) < \varepsilon$. Then
$$0 \le \overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx \le S(f,\pi) - I(f,\pi) < \varepsilon$$
and therefore, $\varepsilon > 0$ being arbitrary, we have $\overline{\int_a^b} f(x)\,dx = \underline{\int_a^b} f(x)\,dx$.

"Only if". Suppose that $\overline{\int_a^b} f(x)\,dx = \underline{\int_a^b} f(x)\,dx$. Thanks to Proposition 119, for every $\varepsilon > 0$ there exist a subdivision $\pi'$ such that $S(f,\pi') - \overline{\int_a^b} f(x)\,dx < \varepsilon$ and a subdivision $\pi''$ such that $\underline{\int_a^b} f(x)\,dx - I(f,\pi'') < \varepsilon$. Let $\pi$ be a subdivision that refines both $\pi'$ and $\pi''$. Thanks to (30.6) we have $I(f,\pi'') \le I(f,\pi) \le S(f,\pi) \le S(f,\pi')$, and therefore
$$S(f,\pi) - I(f,\pi) \le S(f,\pi') - I(f,\pi'') < \left( \overline{\int_a^b} f(x)\,dx + \varepsilon \right) - \left( \underline{\int_a^b} f(x)\,dx - \varepsilon \right) = 2\varepsilon$$
as desired.
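The criterion of Proposition 1085 lends itself to a numerical illustration. The following is a minimal Python sketch (the function names are ours, not the book's; the inf and sup on each subinterval are approximated by dense sampling, which is exact here because $x^2$ is monotone on $[0,1]$ and the sample grid includes the endpoints):

```python
def darboux_sums(f, a, b, n):
    """Lower and upper Darboux sums of f on [a, b] for the uniform
    subdivision with n subintervals; inf/sup on each subinterval are
    estimated by sampling 51 points (exact for monotone f)."""
    dx = (b - a) / n
    lower = upper = 0.0
    for i in range(n):
        samples = [f(a + i * dx + k * dx / 50) for k in range(51)]
        lower += min(samples) * dx
        upper += max(samples) * dx
    return lower, upper

# f(x) = x^2 on [0, 1]: S - I shrinks as the subdivision is refined
I_100, S_100 = darboux_sums(lambda x: x * x, 0.0, 1.0, 100)
```

For $f(x) = x^2$ the gap is exactly $(f(1) - f(0))/n = 1/n$, and both sums bracket the integral $1/3$.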
The next result shows that if two functions are equal except at a finite number of points, then their integrals, if they exist, are equal. It is an important stability property of the integral, whose value does not change if we modify a function $f : [a,b] \to \mathbb{R}$ at a finite number of points.
30.3. CRITERIA OF INTEGRABILITY 791
Proof It is sufficient to prove the statement for the case in which g differs from f at only one point $\hat{x} \in [a,b]$. The case in which g differs from f at n points is proved simply by finite induction, adding one point at a time.

Suppose therefore that $f(\hat{x}) \ne g(\hat{x})$ with $\hat{x} \in [a,b]$. Without loss of generality, suppose that $f(\hat{x}) > g(\hat{x})$. Setting $k = f(\hat{x}) - g(\hat{x}) > 0$, let $h : [a,b] \to \mathbb{R}$ be the function defined by $h = f - g$. We have therefore
$$h(x) = \begin{cases} 0 & x \ne \hat{x} \\ k & x = \hat{x} \end{cases}$$
Let us prove that h is integrable and that $\int_a^b h(x)\,dx = 0$. Let $\varepsilon > 0$. Consider any subdivision $\pi = \{x_0, x_1, \ldots, x_n\}$ of $[a,b]$ such that $|\pi| < \varepsilon/(2k)$. Since $\hat{x} \in [a,b]$, there are two possibilities: in the first case $\hat{x}$ does not coincide with an interior point of the subdivision, that is, either $\hat{x} \in (x_{i-1}, x_i)$ for some $i = 1, \ldots, n$ or $\hat{x} \in \{x_0, x_n\}$; in the second case $\hat{x}$ is a point of the subdivision other than the extremes, that is, $\hat{x} = x_i$ for some $i = 1, \ldots, n-1$. Since $h(x) = 0$ for every $x \ne \hat{x}$, we have
$$I(h, \pi) = 0$$
If $\hat{x} \in (x_{i-1}, x_i)$ for some $i = 1, \ldots, n$, or $\hat{x} \in \{x_0, x_n\}$, we have⁵
$$S(h, \pi) = k\,\Delta x_i < k\,\frac{\varepsilon}{2k} = \frac{\varepsilon}{2} < \varepsilon$$
If $\hat{x} = x_i$ for some $i = 1, \ldots, n-1$, we have
$$S(h, \pi) = k\,(\Delta x_i + \Delta x_{i+1}) < 2k\,\frac{\varepsilon}{2k} = \varepsilon$$
Therefore, in any case we have $S(h,\pi) - I(h,\pi) < \varepsilon$. Since $\varepsilon > 0$ is arbitrary, thanks to Proposition 1085 h is integrable on $[a,b]$. Hence
$$\int_a^b h(x)\,dx = \sup_{\pi \in \Pi} I(h, \pi) = \inf_{\pi \in \Pi} S(h, \pi) = 0 \qquad (30.18)$$
since $I(h,\pi) = 0$ for every $\pi$. Applying the linearity of the integral (Theorem 1095), we have that $g = f - h$ is integrable because f and h are so, with
$$\int_a^b g(x)\,dx = \int_a^b f(x)\,dx - \int_a^b h(x)\,dx = \int_a^b f(x)\,dx$$
as desired.
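The irrelevance of a single point can also be seen numerically. In the following sketch (the names and the spike value are our own choices), we take $f(x) = x$ on $[0,1]$, change its value at $0.5$ to $10$, and compute the upper Darboux sum of the modified function exactly: since the function is increasing away from the spike, the sup on a subinterval is its right endpoint, except on the one or two subintervals containing $0.5$.

```python
def upper_sum_spiked(n, spike_x=0.5, spike_val=10.0):
    """Upper Darboux sum, on the uniform n-subdivision of [0, 1], of
    g(x) = x modified so that g(spike_x) = spike_val."""
    dx = 1.0 / n
    total = 0.0
    for i in range(n):
        l, r = i * dx, (i + 1) * dx
        # sup g on [l, r] is r, unless the spike lies in [l, r]
        total += (spike_val if l <= spike_x <= r else r) * dx
    return total

approx = upper_sum_spiked(100000)   # close to the integral of x, i.e. 1/2
```

As the mesh shrinks, the spike's contribution (at most two subintervals of width $1/n$) vanishes, and the upper sum converges to $\int_0^1 x\,dx = 1/2$.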
O.R. Even if a function f is not defined at a finite number of points of the interval $[a,b]$, it is still possible to speak of its integral: it coincides with that of any function defined also at the missing points and equal to f at the points at which f is defined. In particular, the integrals of f on $[a,b]$, $(a,b]$, $[a,b)$ and $(a,b)$ always coincide: this makes the notation $\int_a^b f(x)\,dx$ unambiguous. H
Proof Let $\varepsilon > 0$. Since g is continuous on $[m, M]$, thanks to Theorem 473 the function g is uniformly continuous on $[m, M]$, that is, there exists $\delta_\varepsilon > 0$ such that
$$|y - y'| < \delta_\varepsilon \implies |g(y) - g(y')| < \varepsilon \qquad \forall y, y' \in [m, M]$$
Since f is integrable, by Proposition 1085 there exists a subdivision $\pi = \{x_i\}_{i=0}^n$ of $[a,b]$ such that $S(f,\pi) - I(f,\pi) < \varepsilon \delta_\varepsilon$. Let I be the set of the indices i for which
$$\sup_{x \in [x_{i-1}, x_i]} f(x) - \inf_{x \in [x_{i-1}, x_i]} f(x) < \delta_\varepsilon$$
so that, for $i \in I$,
$$|f(x) - f(x')| < \delta_\varepsilon \qquad \forall x, x' \in [x_{i-1}, x_i]$$
and therefore
$$\sup_{x \in [x_{i-1}, x_i]} (g \circ f)(x) - \inf_{x \in [x_{i-1}, x_i]} (g \circ f)(x) \le \varepsilon \qquad \forall i \in I$$
For the indices $i \notin I$ we have, instead,
$$\delta_\varepsilon \sum_{i \notin I} \Delta x_i \le \sum_{i \notin I} \left[ \sup_{x \in [x_{i-1}, x_i]} f(x) - \inf_{x \in [x_{i-1}, x_i]} f(x) \right] \Delta x_i \le S(f,\pi) - I(f,\pi) < \varepsilon \delta_\varepsilon$$
and therefore $\sum_{i \notin I} \Delta x_i < \varepsilon$. Hence
$$S(g \circ f, \pi) - I(g \circ f, \pi) = \sum_{i \in I} \left[ \sup_{x \in [x_{i-1}, x_i]} (g \circ f)(x) - \inf_{x \in [x_{i-1}, x_i]} (g \circ f)(x) \right] \Delta x_i$$
$$+ \sum_{i \notin I} \left[ \sup_{x \in [x_{i-1}, x_i]} (g \circ f)(x) - \inf_{x \in [x_{i-1}, x_i]} (g \circ f)(x) \right] \Delta x_i$$
$$\le \varepsilon \sum_{i \in I} \Delta x_i + 2 \max_{y \in [m, M]} |g(y)| \sum_{i \notin I} \Delta x_i < \varepsilon\,(b-a) + 2 \max_{y \in [m, M]} |g(y)|\,\varepsilon$$
Since $\varepsilon > 0$ is arbitrary, by Proposition 1085 the function $g \circ f$ is integrable.
Since the function $g(x) = |x|$ is continuous, a simple but important consequence of Proposition 1087 is that the integrability of a bounded function $f : [a,b] \to \mathbb{R}$ implies the integrability of its absolute value $|f| : [a,b] \to \mathbb{R}$. Note that the converse is false: the function
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \cap [0,1] \\ -1 & \text{if } x \notin \mathbb{Q} \cap [0,1] \end{cases} \qquad (30.20)$$
is a simple modification of the Dirichlet function and hence is not integrable, contrary to its absolute value $|f|$, which is the function constantly equal to 1 on the interval $[0,1]$.
Finally, observe that the first of the integrability criteria of the section, Proposition 1085, allows an interesting perspective on Riemann's integral. Given any subdivision $\pi = \{x_i\}_{i=0}^n$, by definition we have $m_i \le f(x_i') \le M_i$ for every $x_i' \in [x_{i-1}, x_i]$, so that
$$I(f, \pi) \le \sum_{i=1}^n f(x_i')\,\Delta x_i \le S(f, \pi)$$
Hence, since
$$I(f, \pi) \le \int_a^b f(x)\,dx \le S(f, \pi)$$
we have
$$I(f, \pi) - S(f, \pi) \le \sum_{i=1}^n f(x_i')\,\Delta x_i - \int_a^b f(x)\,dx \le S(f, \pi) - I(f, \pi)$$
which is equivalent to
$$\left| \sum_{i=1}^n f(x_i')\,\Delta x_i - \int_a^b f(x)\,dx \right| \le S(f, \pi) - I(f, \pi)$$
Thanks to Proposition 1085, for every $\varepsilon > 0$ there exists a sufficiently fine subdivision $\pi$ for which
$$\left| \sum_{i=1}^n f(x_i')\,\Delta x_i - \int_a^b f(x)\,dx \right| \le S(f, \pi) - I(f, \pi) < \varepsilon$$
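The inequality above says that any "tagged" Riemann sum is trapped between the lower and upper sums, hence within $S - I$ of the integral. A small Python sketch (the names are ours; inf and sup are approximated by dense sampling, and the tags $x_i'$ are drawn at random):

```python
import random

def riemann_data(f, a, b, n, seed=0):
    """Lower sum, a randomly tagged Riemann sum, and upper sum of f on
    [a, b] for the uniform subdivision with n subintervals."""
    rng = random.Random(seed)
    dx = (b - a) / n
    lower = tagged = upper = 0.0
    for i in range(n):
        samples = [f(a + i * dx + k * dx / 50) for k in range(51)]
        lower += min(samples) * dx
        upper += max(samples) * dx
        tagged += f(a + (i + rng.random()) * dx) * dx   # random tag x_i'
    return lower, tagged, upper

lo, tag, up = riemann_data(lambda x: x ** 3, 0.0, 1.0, 200)
```

Here $\int_0^1 x^3\,dx = 1/4$, and the tagged sum differs from it by at most $S - I$, whatever the tags.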
A function $f : [a,b] \to \mathbb{R}$ is a step function when there exist a subdivision $\{x_i\}_{i=0}^n$ of $[a,b]$ and constants $\{c_i\}_{i=1}^n$ such that
$$f(x) = c_i \qquad \forall x \in (x_{i-1}, x_i), \; i = 1, \ldots, n \qquad (30.22)$$
For example, the functions
$$f(x) = \sum_{i=1}^{n-1} c_i \mathbf{1}_{[x_{i-1}, x_i)}(x) + c_n \mathbf{1}_{[x_{n-1}, x_n]}(x) \qquad (30.23)$$
and
$$g(x) = c_1 \mathbf{1}_{[x_0, x_1]}(x) + \sum_{i=2}^n c_i \mathbf{1}_{(x_{i-1}, x_i]}(x) \qquad (30.24)$$
are step functions, where, for every set $A \subseteq \mathbb{R}$, we have denoted by $\mathbf{1}_A : \mathbb{R} \to \mathbb{R}$ the indicator function
$$\mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases} \qquad (30.25)$$

⁷ Often called Riemann sums (or, sometimes, Cauchy sums).
30.4. CLASSES OF INTEGRABLE FUNCTIONS 795
The two following figures report, for n = 4, examples of functions f and g described by (30.23) and (30.24). Note that f and g are, respectively, continuous from the right and from the left, that is, $\lim_{x \to x_0^+} f(x) = f(x_0)$ and $\lim_{x \to x_0^-} g(x) = g(x_0)$.

[Figure: the step functions f(x) and g(x) for n = 4, taking the values $c_1, c_2, c_3, c_4$ on the intervals determined by $x_0 < x_1 < x_2 < x_3 < x_4$.]
On the open intervals determined by the subdivision $\{x_i\}_{i=0}^4$ and by the constants $\{c_i\}_{i=1}^4$, the two functions coincide and generate the same plurirectangle:

[Figure: the plurirectangle generated by f and g on the intervals of the subdivision.]

Nevertheless, at the points $x_1 < x_2 < x_3$ the functions f and g differ, and it is easy to verify that on the whole interval $[x_0, x_4]$ they do not generate this plurirectangle, as the next figure shows. Indeed, the dashed segment at $x_2$ is not under f, and the dashed segments at $x_1$ and $x_3$ are not under g.
[Figure: the functions f(x) and g(x) again, with the dashed segments of the plurirectangle at $x_1$, $x_2$, $x_3$ showing where each function differs from it.]
But, thanks to Proposition 1086, such a discrepancy at a finite number of points is irrelevant for the integral, and the next result shows that the area under the step functions f and g is actually equal to that of the corresponding plurirectangle (independently of the values of the function at the points $x_1 < x_2 < x_3$).
Proposition 1089 A step function $f : [a,b] \to \mathbb{R}$, determined by the subdivision $\{x_i\}_{i=0}^n$ and by the constants $\{c_i\}_{i=1}^n$ according to (30.22), is integrable, and we have
$$\int_a^b f(x)\,dx = \sum_{i=1}^n c_i\,\Delta x_i \qquad (30.26)$$
All the step functions that are determined by a subdivision $\{x_i\}_{i=0}^n$ and by a set of constants $\{c_i\}_{i=1}^n$ according to (30.22) therefore share the same integral (30.26). In particular, this holds for the step functions (30.23) and (30.24).
Proof Since f is bounded, thanks to Lemma 1072 we have $\underline{\int_a^b} f(x)\,dx,\ \overline{\int_a^b} f(x)\,dx \in \mathbb{R}$. Let $m = \inf_{x \in [a,b]} f(x)$ and $M = \sup_{x \in [a,b]} f(x)$. Fixed $\varepsilon > 0$ sufficiently small, consider the subdivision $\pi_\varepsilon$ given by
$$x_0 < x_1 - \varepsilon < x_1 + \varepsilon < x_2 - \varepsilon < x_2 + \varepsilon < \cdots < x_{n-1} - \varepsilon < x_{n-1} + \varepsilon < x_n$$
We have
$$I(f, \pi_\varepsilon) = c_1 (x_1 - \varepsilon - x_0) + \sum_{i=1}^{n-1} 2\varepsilon \inf_{x \in [x_i - \varepsilon,\, x_i + \varepsilon]} f(x) + \sum_{i=2}^{n-1} c_i (\Delta x_i - 2\varepsilon) + c_n (\Delta x_n - \varepsilon)$$
$$= \sum_{i=1}^n c_i\,\Delta x_i - \varepsilon (c_1 + c_n) + 2\varepsilon \sum_{i=1}^{n-1} \inf_{x \in [x_i - \varepsilon,\, x_i + \varepsilon]} f(x) - 2\varepsilon \sum_{i=2}^{n-1} c_i$$
$$\ge \sum_{i=1}^n c_i\,\Delta x_i - 2\varepsilon M + 2\varepsilon (n-1) m - 2\varepsilon M (n-2) = \sum_{i=1}^n c_i\,\Delta x_i - 2\varepsilon (n-1) (M - m)$$
Similarly, $S(f, \pi_\varepsilon) \le \sum_{i=1}^n c_i\,\Delta x_i + 2\varepsilon (n-1)(M - m)$. Since $\varepsilon > 0$ is arbitrary, it follows that $\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx = \sum_{i=1}^n c_i\,\Delta x_i$, as desired.
A consequence of Proposition 1089 is that the lower and upper integrals can be expressed in terms of integrals of step functions. Let $S([a,b])$ be the set of all the step functions defined on $[a,b]$. It can be proved that
$$\underline{\int_a^b} f(x)\,dx = \sup \left\{ \int_a^b h(x)\,dx : h \le f \text{ and } h \in S([a,b]) \right\} \qquad (30.27)$$
and
$$\overline{\int_a^b} f(x)\,dx = \inf \left\{ \int_a^b h(x)\,dx : h \ge f \text{ and } h \in S([a,b]) \right\} \qquad (30.28)$$
Therefore, f is integrable if and only if the lower approximation given by the integrals of step functions lower than f coincides, at the limit, with the upper approximation given by the integrals of step functions larger than f. In this case the exhaustion assumes a more analytic and less geometric aspect,⁸ having substituted the approximation through elementary polygons (the plurirectangles) with one given by elementary functions (the step functions).
This suggests a different approach to Riemann's integral, more analytic and less geometric. In it, we first define the integrals of step functions (that is, the areas under them), which can be determined on the basis of elementary geometric considerations based on plurirectangles. We then use these "elementary" integrals to suitably approximate the areas under more complicated functions. In particular, we define the lower integral of a bounded function $f : [a,b] \to \mathbb{R}$ as the best approximation "from below" obtained thanks to step functions $h \le f$ and, analogously, the upper integral of a bounded function $f : [a,b] \to \mathbb{R}$ as the best approximation "from above" obtained with step functions $h \ge f$.

Thanks to (30.27) and (30.28), this more analytic interpretation of the method of exhaustion is equivalent to the geometric one previously adopted. The analytic approach is very fruitful for some subsequent developments.
Proof Since f is continuous on $[a,b]$, thanks to Weierstrass' Theorem f is bounded. Let $\varepsilon > 0$. By Theorem 473, f is uniformly continuous, that is, there exists $\delta_\varepsilon > 0$ such that
$$|x - y| < \delta_\varepsilon \implies |f(x) - f(y)| < \varepsilon \qquad \forall x, y \in [a,b] \qquad (30.29)$$
Let $\pi = \{x_i\}_{i=0}^n$ be a subdivision of $[a,b]$ such that $|\pi| < \delta_\varepsilon$. Thanks to (30.29), for every $i = 1, 2, \ldots, n$ we therefore have
$$\max_{x \in [x_{i-1}, x_i]} f(x) - \min_{x \in [x_{i-1}, x_i]} f(x) < \varepsilon$$
Hence $S(f,\pi) - I(f,\pi) < \varepsilon (b - a)$ and, $\varepsilon > 0$ being arbitrary, f is integrable by Proposition 1085.

⁸ That is, based also on the use of notions of analysis, such as functions, and not only on that of geometric figures, such as the plurirectangles.
By the stability of the integral seen in Proposition 1086, we have the following immediate generalization of Proposition 1091: every bounded function $f : [a,b] \to \mathbb{R}$ that has at most a finite number of removable discontinuities is integrable. Indeed, recalling (12.7) of Chapter 12, if $S = \{x_i\}_{i=1}^n$ is the set of points where f has removable discontinuities, the function
$$\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \notin S \\ \lim_{y \to x} f(y) & \text{if } x \in S \end{cases}$$
is continuous (and therefore integrable) and is equal to f except at the points of S.

The hypothesis that the discontinuities are removable is actually superfluous. Moreover, we can allow for countably many points of discontinuity (but not more than that).
Theorem 1092 Every bounded function $f : [a,b] \to \mathbb{R}$ with at most countably many discontinuities is integrable.⁹

(i) The function
$$f(x) = \begin{cases} x & \text{if } x \in (0,1) \\ 1/2 & \text{if } x \in \{0,1\} \end{cases}$$
is continuous at all the points of $[0,1]$, except at the two extreme points 0 and 1. By Theorem 1092, the function f is integrable.
(ii) Consider the countable set
$$E = \left\{ \frac{1}{n} : n \ge 1 \right\} \subseteq [0,1]$$

⁹ In more advanced courses, the reader will study more general versions of the already remarkable Theorem 1092.
The function $f : [0,1] \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} x^2 & \text{if } x \notin E \\ 0 & \text{if } x \in E \end{cases}$$
is continuous at all the points of $[0,1]$, except at the points of E.¹⁰ Since E is a countable set, by Theorem 1092 the function f is integrable. N
The Dirichlet function
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \cap [0,1] \\ 0 & \text{if } x \notin \mathbb{Q} \cap [0,1] \end{cases}$$
which we know is not integrable, does not satisfy the hypotheses of Theorem 1092. Indeed, even if it is bounded, it is discontinuous at each point of $[0,1]$ (not only at the points $x \in \mathbb{Q} \cap [0,1]$, which form a countable set).
The result follows immediately from Theorem 1092 because monotonic functions have at most countably many points of discontinuity (Proposition 455). Nevertheless, we also give a simple direct proof of the result.

Proof Let $\varepsilon > 0$. Let $\pi = \{x_i\}_{i=0}^n$ be a subdivision of $[a,b]$ such that $|\pi| < \varepsilon$. Suppose that f is increasing (the argument for f decreasing is analogous). We have
$$\sup_{x \in [x_{i-1}, x_i]} f(x) = f(x_i) \quad \text{and} \quad \inf_{x \in [x_{i-1}, x_i]} f(x) = f(x_{i-1}) \qquad \forall i = 1, \ldots, n$$
and therefore
$$S(f,\pi) - I(f,\pi) = \sum_{i=1}^n \sup_{x \in [x_{i-1}, x_i]} f(x)\,\Delta x_i - \sum_{i=1}^n \inf_{x \in [x_{i-1}, x_i]} f(x)\,\Delta x_i$$
$$= \sum_{i=1}^n f(x_i)\,\Delta x_i - \sum_{i=1}^n f(x_{i-1})\,\Delta x_i = \sum_{i=1}^n (f(x_i) - f(x_{i-1}))\,\Delta x_i$$
$$\le |\pi| \sum_{i=1}^n (f(x_i) - f(x_{i-1})) < \varepsilon\,(f(b) - f(a))$$
Since $\varepsilon > 0$ is arbitrary, f is integrable by Proposition 1085.
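The telescoping bound in this proof can be checked numerically for an increasing function. A minimal sketch (the names are ours), with a uniform subdivision, where $S - I$ equals exactly $|\pi|\,(f(b) - f(a))$ because every subinterval has the same width:

```python
import math

def darboux_gap_increasing(f, a, b, n):
    """For increasing f on a uniform n-subdivision, sup = f(x_i) and
    inf = f(x_{i-1}) on each subinterval, so S - I telescopes to
    dx * (f(b) - f(a)).  Returns (computed gap, telescoped value)."""
    dx = (b - a) / n
    gap = sum((f(a + (i + 1) * dx) - f(a + i * dx)) * dx for i in range(n))
    return gap, dx * (f(b) - f(a))

gap, bound = darboux_gap_increasing(math.exp, 0.0, 1.0, 1000)
```

With $f = e^x$ on $[0,1]$ and $n = 1000$ the gap is $(e-1)/1000$, already well below $10^{-2}$.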
Theorem 1095 Let $f, g : [a,b] \to \mathbb{R}$ be two bounded and integrable functions. Then, for every $\alpha, \beta \in \mathbb{R}$, the function $\alpha f + \beta g : [a,b] \to \mathbb{R}$ is integrable and we have
$$\int_a^b (\alpha f + \beta g)(x)\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx \qquad (30.30)$$
Proof The proof is divided in two parts. First we will prove the homogeneity, that is,
$$\int_a^b (\alpha f)(x)\,dx = \alpha \int_a^b f(x)\,dx \qquad \forall \alpha \in \mathbb{R} \qquad (30.31)$$
and then the additivity, that is,
$$\int_a^b (f + g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx \qquad (30.32)$$
given f and g integrable. Together, expressions (30.31) and (30.32) are equivalent to (30.30).

(i) Homogeneity. For $\alpha > 0$ we have $I(\alpha f, \pi) = \alpha I(f, \pi)$ for every subdivision $\pi$, so that
$$\int_a^b (\alpha f)(x)\,dx = \sup_{\pi \in \Pi} I(\alpha f, \pi) = \alpha \sup_{\pi \in \Pi} I(f, \pi) = \alpha \int_a^b f(x)\,dx \qquad (30.33)$$
For $\alpha = -1$, since $I(-f, \pi) = -S(f, \pi)$,
$$\int_a^b (-f)(x)\,dx = \sup_{\pi \in \Pi} I(-f, \pi) = \sup_{\pi \in \Pi} \left( -S(f, \pi) \right) = -\inf_{\pi \in \Pi} S(f, \pi) = -\int_a^b f(x)\,dx$$
Let now $\alpha < 0$. We have $\alpha f = (-\alpha)(-f)$ with $-\alpha > 0$. Then, applying (30.33), we have
$$\int_a^b (\alpha f)(x)\,dx = \int_a^b ((-\alpha)(-f))(x)\,dx = (-\alpha) \int_a^b (-f)(x)\,dx = (-\alpha)\left( -\int_a^b f(x)\,dx \right) = \alpha \int_a^b f(x)\,dx$$
In conclusion,
$$\int_a^b (\alpha f)(x)\,dx = \alpha \int_a^b f(x)\,dx \qquad \forall \alpha \in \mathbb{R} \qquad (30.34)$$
that is, (30.31).
(ii) Additivity. Let us prove (30.32). Let $\varepsilon > 0$. Since f and g are integrable, by Proposition 1085 there exists a subdivision $\pi$ of $[a,b]$ such that $S(f,\pi) - I(f,\pi) < \varepsilon$ and there exists $\pi'$ such that $S(g,\pi') - I(g,\pi') < \varepsilon$. Let $\pi''$ be a subdivision of $[a,b]$ that refines both $\pi$ and $\pi'$. Thanks to (30.6), we have $S(f,\pi'') - I(f,\pi'') < \varepsilon$ and $S(g,\pi'') - I(g,\pi'') < \varepsilon$. Moreover, applying the inequalities of Lemma 1082,
$$I(f,\pi'') + I(g,\pi'') \le I(f+g,\pi'') \le S(f+g,\pi'') \le S(f,\pi'') + S(g,\pi'') \qquad (30.35)$$
and therefore
$$S(f+g,\pi'') - I(f+g,\pi'') \le \left( S(f,\pi'') - I(f,\pi'') \right) + \left( S(g,\pi'') - I(g,\pi'') \right) < 2\varepsilon$$
so that, $\varepsilon > 0$ being arbitrary, the function $f+g$ is integrable by Proposition 1085. Moreover, from (30.35) and from
$$I(f,\pi) \le \int_a^b f(x)\,dx \le S(f,\pi) \quad \text{and} \quad I(g,\pi) \le \int_a^b g(x)\,dx \le S(g,\pi)$$
it follows that, for every subdivision $\pi$, both $\int_a^b (f+g)(x)\,dx$ and $\int_a^b f(x)\,dx + \int_a^b g(x)\,dx$ belong to the interval $\left[ I(f,\pi) + I(g,\pi),\, S(f,\pi) + S(g,\pi) \right]$. Since f and g are integrable, given $\varepsilon > 0$ it is possible to find a subdivision $\pi_\varepsilon$ such that, for $h = f, g$,
$$I(h,\pi_\varepsilon) > \int_a^b h(x)\,dx - \frac{\varepsilon}{2} \quad \text{and} \quad S(h,\pi_\varepsilon) < \int_a^b h(x)\,dx + \frac{\varepsilon}{2}$$
so that
$$-\varepsilon < \int_a^b (f+g)(x)\,dx - \left( \int_a^b f(x)\,dx + \int_a^b g(x)\,dx \right) < \varepsilon$$
that is (30.32).
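Linearity is easy to sanity-check with midpoint sums. A sketch (the names are ours): any fixed Riemann sum is itself linear in the integrand, so the two sides agree up to floating-point rounding.

```python
def midpoint_sum(f, a, b, n=100000):
    """Midpoint Riemann sum approximating the integral of f on [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

f = lambda x: x * x
g = lambda x: 1.0 / (1.0 + x)
alpha, beta = 2.0, -3.0
combined = midpoint_sum(lambda x: alpha * f(x) + beta * g(x), 0.0, 1.0)
separate = alpha * midpoint_sum(f, 0.0, 1.0) + beta * midpoint_sum(g, 0.0, 1.0)
```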
An important consequence of the linearity of the integral is that the product of two integrable functions is integrable.

Corollary 1096 If $f, g : [a,b] \to \mathbb{R}$ are two bounded and integrable functions, then their product $fg : [a,b] \to \mathbb{R}$ is integrable.

Proof If $f = g$, the integrability of $f^2$ follows from Proposition 1087 by considering the continuous function $g(x) = x^2$. If $f \ne g$, then fg can be rewritten in the following way:
$$fg = \frac{1}{4}\left[ (f+g)^2 - (f-g)^2 \right]$$
By Theorem 1095, $f+g$ and $f-g$ are integrable. By what has just been proved, their squares are also integrable; applying Theorem 1095 again, we have that fg is integrable.
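The algebraic identity behind this proof, the polarization identity $fg = \frac{1}{4}[(f+g)^2 - (f-g)^2]$, is easy to verify pointwise. A quick sketch (the names are ours):

```python
def product_via_polarization(f, g, x):
    """Evaluate (f*g)(x) as ((f+g)^2 - (f-g)^2)/4, the identity used
    in the proof of Corollary 1096."""
    s, d = f(x) + g(x), f(x) - g(x)
    return (s * s - d * d) / 4.0

xs = [t / 10.0 for t in range(-30, 31)]
via_identity = [product_via_polarization(lambda x: x + 1.0,
                                         lambda x: 2.0 * x, x) for x in xs]
direct = [(x + 1.0) * (2.0 * x) for x in xs]
```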
O.R. Thanks to the linearity of the integral, the knowledge of the integrals of f and g allows us to calculate the integral of $\alpha f + \beta g$. It is not so for the product or for the composition of integrable functions: the integrability of f guarantees the integrability of $f^2$, but the knowledge of the integral of f does not help in the calculation of the integral of $f^2$. More generally, knowing that $g \circ f$ is integrable does not give any useful indication for the computation of the integral of such a composite function. H
Finally, the linearity of the integral implies that it is possible to freely subdivide the domain of integration $[a,b]$ into subintervals.

Corollary 1097 Let $f : [a,b] \to \mathbb{R}$ be a bounded and integrable function. If $a < c < b$, we have
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \qquad (30.37)$$
Vice versa, if $f_1 : [a,c] \to \mathbb{R}$ and $f_2 : [c,b] \to \mathbb{R}$ are bounded and integrable, then the function $f : [a,b] \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} f_1(x) & \text{if } x \in [a,c] \\ f_2(x) & \text{if } x \in (c,b] \end{cases}$$
is integrable and satisfies (30.37).
Proof Let us prove the first part. We have (recall Definition (30.25) of the indicator function)
$$f = \mathbf{1}_{[a,c]} f + \mathbf{1}_{(c,b]} f$$
where
$$\left( \mathbf{1}_{[a,c]} f \right)(x) = \begin{cases} f(x) & \text{if } x \in [a,c] \\ 0 & \text{if } x \in (c,b] \end{cases}$$
Let $\varepsilon > 0$. Since $\mathbf{1}_{[a,c]} f$ is integrable (being the product of integrable functions),¹¹ by Proposition 1085 there exists a subdivision $\pi$ of $[a,b]$ such that
$$S(\mathbf{1}_{[a,c]} f, \pi) - I(\mathbf{1}_{[a,c]} f, \pi) < \varepsilon$$
Let $\pi' = \{x_i\}_{i=0,1,\ldots,n}$ be a refinement of $\pi$ that has c as a point of subdivision, say $c = x_j$. Then we have
$$S(\mathbf{1}_{[a,c]} f, \pi') - I(\mathbf{1}_{[a,c]} f, \pi') < \varepsilon$$
Let $\pi'' = \pi' \cap [a,c]$; in other words, $\pi'' = \{x_0, x_1, \ldots, x_j\}$ is the restriction of the subdivision $\pi'$ to the interval $[a,c]$. Using the usual notation $m_i$ and $M_i$ for every $i = 1, 2, \ldots, n$, and since $m_i = M_i = 0$ for $i > j$, we have
$$I(\mathbf{1}_{[a,c]} f, \pi') = \sum_{i=1}^n m_i\,\Delta x_i = \sum_{i \le j} m_i\,\Delta x_i = I(f_{|[a,c]}, \pi'') \qquad (30.38)$$
and
$$S(\mathbf{1}_{[a,c]} f, \pi') = \sum_{i=1}^n M_i\,\Delta x_i = \sum_{i \le j} M_i\,\Delta x_i = S(f_{|[a,c]}, \pi'') \qquad (30.39)$$
Therefore,
$$S(f_{|[a,c]}, \pi'') - I(f_{|[a,c]}, \pi'') < \varepsilon$$
and by Proposition 1085 we can conclude that $f_{|[a,c]} : [a,c] \to \mathbb{R}$ is integrable. Moreover, from (30.38) and (30.39) we deduce that
$$\int_a^b \mathbf{1}_{[a,c]} f(x)\,dx = \int_a^c f_{|[a,c]}(x)\,dx = \int_a^c f(x)\,dx$$

¹¹ The indicator function, being a step function, is integrable.
as desired.
The next property, monotonicity of the integral, shows that to larger functions there correspond larger integrals. The writing $f \le g$ means $f(x) \le g(x)$ for every $x \in [a,b]$.

Theorem 1098 Let $f, g : [a,b] \to \mathbb{R}$ be two bounded and integrable functions. If $f \le g$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.

From the monotonicity of the integral there follows an important inequality between absolute values of integrals and integrals of absolute values. In this regard, recall that after Proposition 1087 we observed that the integrability of $|f|$ follows from the integrability of f.

Proof Since $f \le |f|$ and $-f \le |f|$, from Theorem 1098 it follows that $\int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx$ and $-\int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx$, that is,
$$\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx$$
The monotonicity of the integral allows us to establish an interesting sandwich for integrals.

Proposition 1100 Let $f : [a,b] \to \mathbb{R}$ be a bounded and integrable function. Then, setting $m = \inf_{[a,b]} f(x)$ and $M = \sup_{[a,b]} f(x)$, we have
$$m (b-a) \le \int_a^b f(x)\,dx \le M (b-a) \qquad (30.41)$$
Proof We have
$$m \le f(x) \le M \qquad \forall x \in [a,b]$$
and therefore, by the monotonicity of the integral,
$$\int_a^b m\,dx \le \int_a^b f(x)\,dx \le \int_a^b M\,dx$$
We obviously have $\int_a^b m\,dx = m (b-a)$ (it is the area of a rectangle of base $b-a$ and height m) and $\int_a^b M\,dx = M (b-a)$. Therefore
$$m (b-a) \le \int_a^b f(x)\,dx \le M (b-a)$$
as we wanted to prove.
We end with the classical Theorem of the integral mean (also called the Mean Value Theorem of integral calculus), which is a consequence of the sandwich (30.41).

Theorem 1101 (of the integral mean) Let $f : [a,b] \to \mathbb{R}$ be a bounded and integrable function. Then, setting $m = \inf_{[a,b]} f(x)$ and $M = \sup_{[a,b]} f(x)$, there exists a scalar $\gamma \in [m, M]$ such that
$$\int_a^b f(x)\,dx = \gamma\,(b-a) \qquad (30.42)$$
In particular, if f is continuous, there exists $c \in [a,b]$ such that $f(c) = \gamma$, that is,
$$\int_a^b f(x)\,dx = f(c)\,(b-a)$$
For this reason, $\gamma$ is called the mean value (of the ordinates) of f: the value of the integral does not change if we substitute the constant value $\gamma$ for all the ordinates of the function.
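For continuous f, both the mean value and a point c realizing it can be computed numerically. A sketch under our own naming (midpoint sum for the integral; bisection is legitimate here because sin is continuous and monotone on $[0, \pi/2]$ with values straddling the mean):

```python
import math

def integral_mean(f, a, b, n=200000):
    """Mean value of f on [a, b]: the integral (midpoint sum) over b - a."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx / (b - a)

def find_c(f, a, b, gamma, tol=1e-10):
    """Bisection for c in [a, b] with f(c) = gamma, assuming f is
    continuous and monotone with f(a), f(b) straddling gamma."""
    lo, hi = (a, b) if f(a) <= f(b) else (b, a)
    while abs(hi - lo) > tol:
        m = (lo + hi) / 2.0
        if f(m) < gamma:
            lo = m
        else:
            hi = m
    return (lo + hi) / 2.0

gamma = integral_mean(math.sin, 0.0, math.pi / 2)   # exact mean: 2/pi
c = find_c(math.sin, 0.0, math.pi / 2, gamma)
```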
O.R. The Theorem of the integral mean is very intuitive: there exists a rectangle with base $[a,b]$ and height equal to the mean value of f, with area equal to the one under f on $[a,b]$:

[Figure: the graph of f on $[a,b]$ and a rectangle of base $[a,b]$ with the same area as the region under f.]

If, moreover, the function f is continuous, the height of such a rectangle coincides with one of the ordinates of f. H
N.B. We close the section with an important specification. Given a function $f : [a,b] \to \mathbb{R}$, until now we have considered the definite integral of f from a to b, that is, $\int_a^b f(x)\,dx$. Sometimes it is useful to consider the integral of f from b to a, that is, $\int_b^a f(x)\,dx$,¹² as well as the integral of f from a to a, that is, $\int_a^a f(x)\,dx$. What do we mean by such expressions? By convention, we set, for $a < b$,
$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx \qquad (30.43)$$
and
$$\int_a^a f(x)\,dx = 0 \qquad (30.44)$$
Thanks to these conventions it is no longer essential that in $\int_a^b$ we have $a < b$: in the case $a \ge b$ the integral assumes the meaning given to it by (30.43) and (30.44). Moreover, it is possible to prove that the properties proved for the integral $\int_a^b f(x)\,dx$ hold also in the case $a \ge b$. O

¹² This happens, for example, if f is integrable on an interval $[a,b]$ and we take two generic points $x, y \in [a,b]$, without specifying whether $x < y$ or $x \ge y$, and then we consider the integral of f between x and y.
$$P'(x) = f(x) \qquad \forall x \in I$$
In other words, passing from the function f to its primitive P can be seen as the inverse procedure with respect to passing from P to f through derivation. In this sense, the primitive function is the inverse of the derivative function (so that it is sometimes called antiderivative).

Let us now see a couple of examples. In this regard it is important to observe that, as Example 1108 will show, there exist functions that do not have primitives: the search for the primitive of a given function can be vain. In any case, a necessary condition for a function f to have a primitive is that it have no removable or jump discontinuities.

Example 1103 Let $f : [0,1] \to \mathbb{R}$ be given by $f(x) = x$. The function $P : [0,1] \to \mathbb{R}$ given by $P(x) = x^2/2$ is a primitive of f. Indeed, $P'(x) = 2x/2 = x$. N
is primitive of f . Indeed,
1 1
P 0 (x) = 2x = f (x)
2 1 + x2
for every x 2 R . N
Let us do a simple, but useful, observation: if I1 and I2 are two intervals such that
I1 I2 , then, if P is primitive of f on I2 , it is so also on I1 . For example, if we consider
the restriction of f (x) = x= 1 + x2 on [0; 1], that is, the function fe : [0; 1] ! R given by
fe(x) = x= 1 + x2 , then the primitive on [0; 1] remains P (x) = 21 log 1 + x2 .
Proof The "if" is obvious. Let us prove the "only if". Let $I = [a,b]$ and let $P_1 : [a,b] \to \mathbb{R}$ and $P_2 : [a,b] \to \mathbb{R}$ be two primitives of f on $[a,b]$. Since $P_1'(x) = f(x)$ and $P_2'(x) = f(x)$ for every $x \in [a,b]$, we have
$$(P_1 - P_2)'(x) = P_1'(x) - P_2'(x) = 0 \qquad \forall x \in [a,b]$$
and therefore the function $P_1 - P_2$ has zero derivative on $[a,b]$. By Corollary 827, the function $P_1 - P_2$ is constant, that is, there exists $k \in \mathbb{R}$ such that $P_1 = P_2 + k$.

Let now I be an open and bounded interval $(a,b)$. Let $\varepsilon > 0$ be sufficiently small so that $a + \varepsilon < b - \varepsilon$. We have
$$(a,b) = \bigcup_{n=1}^{\infty} \left[ a + \frac{\varepsilon}{n},\, b - \frac{\varepsilon}{n} \right]$$
By what has just been proved, for every $n \ge 1$ there exists a constant $k_n \in \mathbb{R}$ such that
$$P_1(x) = P_2(x) + k_n \qquad \forall x \in \left[ a + \frac{\varepsilon}{n},\, b - \frac{\varepsilon}{n} \right] \qquad (30.45)$$
Let $x_0 \in (a,b)$ be such that $a + \varepsilon < x_0 < b - \varepsilon$, so that $x_0 \in [a + \varepsilon/n,\, b - \varepsilon/n]$ for every $n \ge 1$. From (30.45) it follows that $P_1(x_0) = P_2(x_0) + k_n$ for every $n \ge 1$. Therefore $k_n = P_1(x_0) - P_2(x_0)$ for every $n \ge 1$, that is, $k_1 = k_2 = \cdots$. There exists therefore $k \in \mathbb{R}$ such that $P_1(x) = P_2(x) + k$ for every $x \in (a,b)$.

In a similar way it is possible to prove the result when I is a half-open and bounded interval $(a,b]$ or $[a,b)$. If $I = \mathbb{R}$, we proceed as in the case $(a,b)$, observing that $\mathbb{R} = \bigcup_{n=1}^{\infty} [-n, n]$. A similar argument, which we leave to the reader, holds also for unbounded intervals.
Proposition 1105 is another important application of the Mean Value Theorem (of differential calculus). Thanks to it, once a primitive P of a function f has been identified, we can write the family of all the primitives as $\{P + k\}_{k \in \mathbb{R}}$. Such an important family has a name.

Definition 1106 Given a function $f : I \to \mathbb{R}$, the family of all its primitives is called the indefinite integral of f and is denoted by
$$\int f(x)\,dx$$
For example, for $f(x) = x$ we have
$$\int f(x)\,dx = \frac{x^2}{2} + k$$
We close the section by showing that not all functions admit a primitive, and therefore an indefinite integral.

Example 1108 The function $\operatorname{sgn} : \mathbb{R} \to \mathbb{R}$ does not admit a primitive. Suppose, by contradiction, that there exists a primitive $P : \mathbb{R} \to \mathbb{R}$, that is, a differentiable function such that $P'(x) = \operatorname{sgn} x$. By Proposition 1105 there exists $k \in \mathbb{R}$ such that
$$P(x) = \begin{cases} x + k & \text{if } x > 0 \\ -x + k & \text{if } x < 0 \end{cases}$$
Since P is differentiable, by continuity we moreover have $P(0) = k$. Therefore $P(x) = |x| + k$ for every $x \in \mathbb{R}$, but such a function is not differentiable at the origin, which contradicts what has been assumed on P. Note that the signum function is a step function and therefore is integrable thanks to Proposition 1089. N
O.R. Riemann's integral $\int_a^b f(x)\,dx$ is often called the definite integral, distinguishing it in this way from the indefinite integral just introduced. H
30.6. FUNDAMENTAL THEOREMS OF INTEGRAL CALCULUS 811
30.6.2 Formulary
The next table, obtained by reversing the analogous table of the fundamental derivatives, reports some fundamental indefinite integrals.

    f               ∫ f(x) dx                  validity
    x^a             x^(a+1)/(a+1) + k          −1 ≠ a ∈ R and x > 0
    x^n             x^(n+1)/(n+1) + k          x ∈ R
    1/x             log x + k                  x > 0
    1/x             log(−x) + k                x < 0
    cos x           sin x + k                  x ∈ R
    sin x           −cos x + k                 x ∈ R
    e^x             e^x + k                    x ∈ R
    α^x             α^x / log α + k            0 < α ≠ 1 and x ∈ R
    1/√(1 − x²)     arcsin x + k               x ∈ (−1, 1)
    1/(1 + x²)      arctan x + k               x ∈ R
    1/(cos x)²      tan x + k                  x ≠ π/2 + nπ, n ∈ Z
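Each row of the table can be spot-checked by differentiating the claimed primitive numerically. A sketch (the helper name, step size and sample points are our own choices, not part of the text):

```python
import math

def num_deriv(P, x, h=1e-6):
    """Central-difference approximation of P'(x)."""
    return (P(x + h) - P(x - h)) / (2.0 * h)

# (integrand f, claimed primitive P, sample point) taken from the table
rows = [
    (lambda x: x ** 3,                       lambda x: x ** 4 / 4.0,  0.7),
    (lambda x: 1.0 / x,                      math.log,                2.0),
    (lambda x: 1.0 / x,                      lambda x: math.log(-x), -2.0),
    (math.cos,                               math.sin,                1.0),
    (math.sin,                               lambda x: -math.cos(x),  1.0),
    (lambda x: 1.0 / (1.0 + x * x),          math.atan,               0.5),
    (lambda x: 1.0 / math.sqrt(1.0 - x * x), math.asin,               0.3),
]
errors = [abs(num_deriv(P, x) - f(x)) for f, P, x in rows]
```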
We make three observations. The first concerns the primitive of $1/x$ for $x < 0$: setting $g(x) = \log(-x)$, by the chain rule
$$g'(x) = \frac{1}{-x}\,(-1) = \frac{1}{x}$$
so that $\log(-x) + k$ is indeed the indefinite integral of $1/x$ on $x < 0$.
Proof Let $\pi = \{x_i\}_{i=0}^n$ be a subdivision of $[a,b]$. Adding and subtracting $P(x_i)$ for every $i = 1, 2, \ldots, n-1$, we have
$$P(b) - P(a) = P(x_n) - P(x_{n-1}) + P(x_{n-1}) - \cdots - P(x_1) + P(x_1) - P(x_0) = \sum_{i=1}^n \left( P(x_i) - P(x_{i-1}) \right)$$
By the Mean Value Theorem, for every $i = 1, \ldots, n$ there exists $\xi_i \in (x_{i-1}, x_i)$ such that $P(x_i) - P(x_{i-1}) = P'(\xi_i)\,\Delta x_i = f(\xi_i)\,\Delta x_i$; since $\inf_{[x_{i-1}, x_i]} f \le f(\xi_i) \le \sup_{[x_{i-1}, x_i]} f$, this implies
$$I(f,\pi) \le P(b) - P(a) \le S(f,\pi) \qquad (30.47)$$
Since $\pi$ is any subdivision, (30.47) holds for every $\pi \in \Pi$ and therefore
$$\sup_{\pi \in \Pi} I(f,\pi) \le P(b) - P(a) \le \inf_{\pi \in \Pi} S(f,\pi)$$
Since f is integrable, both the outer terms equal $\int_a^b f(x)\,dx$, and (30.46) follows.
Let us illustrate the theorem with some examples, which use again the primitives calculated in Examples 1103 and 1104.
For integrable functions without primitives, such as the function $\operatorname{sgn} x$, Theorem 1109 cannot be applied and the calculation of integrals cannot be done through formula (30.46). In some simple cases it is nevertheless possible to calculate the integral using the definition directly. For example, the signum function is a step function, and we can therefore apply Proposition 1089, in which, using the definition of the integral, we determined the value of the integral for this class of functions. In particular, we have
$$\int_a^b \operatorname{sgn} x\,dx = \begin{cases} b - a & \text{if } a \ge 0 \\ a + b & \text{if } a < 0 < b \\ a - b & \text{if } b \le 0 \end{cases}$$
The cases $a \ge 0$ and $b \le 0$, using (30.26), are obvious. Let us consider the case $a < 0 < b$. Using (30.37) and (30.26), we have
$$\int_a^b \operatorname{sgn} x\,dx = \int_a^0 \operatorname{sgn} x\,dx + \int_0^b \operatorname{sgn} x\,dx = \int_a^0 (-1)\,dx + \int_0^b 1\,dx = (-1)(0 - a) + (1)(b - 0) = a + b$$
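The three-case formula can be packaged and cross-checked against a direct Riemann sum. A small sketch (the names are ours):

```python
def sgn_integral(a, b):
    """Definite integral of sgn on [a, b] (a <= b), by the three cases
    in the text."""
    if a >= 0:
        return b - a
    if b <= 0:
        return a - b
    return a + b   # a < 0 < b: (-1)(0 - a) + (1)(b - 0)

def sgn_midpoint_sum(a, b, n=100000):
    """Midpoint Riemann sum of sgn on [a, b], for comparison."""
    dx = (b - a) / n
    sgn = lambda x: (x > 0) - (x < 0)
    return sum(sgn(a + (i + 0.5) * dx) for i in range(n)) * dx

val = sgn_integral(-2.0, 3.0)
```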
In other words, the value $F(x)$ of the integral function is the (signed) area under f on the interval $[a,x]$, as x varies.¹³

N.B. The integral function $F(x) = \int_a^x f(t)\,dt$ is a function $F : [a,b] \to \mathbb{R}$ that has as variable the upper extreme of integration x, which as it varies determines a different Riemann integral $\int_a^x f(t)\,dt$. The value of such an integral (which is a number) is the image $F(x)$ of the integral function. In this regard, note that F is well defined on $[a,b]$ since, f being integrable on that interval, it is integrable on all the subintervals $[a,x] \subseteq [a,b]$. O
Proof Since f is bounded, there exists $M > 0$ such that $|f(x)| \le M$ for every $x \in [a,b]$. Let $x, y \in [a,b]$. By the definition of the integral function, we have $F(x) - F(y) = \int_y^x f(t)\,dt$. Thanks to (30.40), we have
$$|F(x) - F(y)| = \left| \int_y^x f(t)\,dt \right| \le \left| \int_y^x |f(t)|\,dt \right| \le \left| \int_y^x M\,dt \right| = M\,|x - y|$$
Fortified by the notion of integral function, we can now go back to the problem that opened the section, that is, the identification of criteria that ensure the existence of primitives for integrable functions. The next, very important, result, the Second Fundamental Theorem of Calculus, shows that if f is continuous, then $F'(x) = f(x)$ for every $x \in [a,b]$, that is, the integral function is exactly a primitive of f. The continuity of a function is therefore a simple and fundamental condition that guarantees the existence of primitives of the function.

Proof Let $x_0 \in (a,b)$. First of all, let us see which form the difference quotient of F at $x_0$ assumes. Consider $h > 0$ such that $x_0 + h \in [a,b]$. Thanks to Corollary 1097 we have
$$F(x_0 + h) - F(x_0) = \int_a^{x_0+h} f(t)\,dt - \int_a^{x_0} f(t)\,dt = \int_a^{x_0} f(t)\,dt + \int_{x_0}^{x_0+h} f(t)\,dt - \int_a^{x_0} f(t)\,dt = \int_{x_0}^{x_0+h} f(t)\,dt$$

¹³ Note that in the definition of the integral function the (mute) variable of integration is no longer x, but any other letter (here we have chosen t, but it could have been z, u or any other letter different from x). Such a choice is dictated by the necessity of avoiding any kind of confusion on the use of the variable x, which this time becomes the independent variable of the integral function.
and therefore, thanks to the Mean Value Theorem (of integral calculus), having denoted by $x_0 + \vartheta h$, $0 \le \vartheta \le 1$, a point of the interval $[x_0, x_0 + h]$:
$$\frac{F(x_0 + h) - F(x_0)}{h} - f(x_0) = \frac{\int_{x_0}^{x_0+h} f(t)\,dt}{h} - f(x_0) = \frac{h f(x_0 + \vartheta h) - h f(x_0)}{h} = f(x_0 + \vartheta h) - f(x_0) \to 0$$
by the continuity of f.

An analogous argument holds also if $h < 0$.¹⁴ Therefore,
$$F'(x_0) = \lim_{h \to 0} \frac{F(x_0 + h) - F(x_0)}{h} = f(x_0)$$
completing in this way the proof when $x_0 \in (a,b)$. The cases $x_0 = a$ and $x_0 = b$ are proved in a similar way, as the reader can easily verify. We conclude that $F'(x_0)$ exists and is equal to $f(x_0)$.
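The statement $F' = f$ lends itself to a direct numerical check. A sketch (the names, mesh and step size are our own choices): we build $F(x) = \int_0^x \cos t\,dt$ by midpoint sums and differentiate it by a central difference.

```python
import math

def F(x, n=20000):
    """Integral function F(x) = integral of cos from 0 to x, computed
    by a midpoint Riemann sum with n subintervals."""
    dx = x / n
    return sum(math.cos((i + 0.5) * dx) for i in range(n)) * dx

x0, h = 1.0, 1e-4
deriv_F = (F(x0 + h) - F(x0 - h)) / (2.0 * h)   # should approximate cos(x0)
```

Here F coincides, up to the numerical error, with sin x, and the difference quotient recovers $f(x_0) = \cos 1$.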
The Second Fundamental Theorem gives a sufficient condition (continuity) for an integrable function to have a primitive. Moreover, thanks to (30.46) of the First Fundamental Theorem, we have
$$\int_a^b f(x)\,dx = F(b) - F(a) \qquad (30.49)$$
that is, the Riemann integral $\int_a^b f(x)\,dx$ of a continuous function f is equal to the difference $F(b) - F(a)$ calculated with respect to the integral function. Together, the two fundamental theorems form the backbone of integral calculus, making it operational.

The next example shows that continuity is only a sufficient, not a necessary, condition for an integrable function to admit a primitive. Indeed, it shows that there exist non-continuous integrable functions that have primitives (and to which the First Fundamental Theorem can therefore be applied).
Indeed, for $x \ne 0$ this can be verified by differentiating $x^2 \sin (1/x)$, while for $x = 0$ it is possible to observe that
$$\lim_{h \to 0} \frac{h^2 \sin (1/h) - 0}{h} = \lim_{h \to 0} h \sin \frac{1}{h} = 0$$
Next we show that when f is not continuous the theorem may fail. The function f, a well-behaved modification of the Dirichlet function, is continuous at every irrational point and discontinuous at every rational point of the unit interval. By Theorem 1092, f is integrable. In particular, $\int_0^1 f(t)\,dt = 0$. It is a useful (non-trivial) exercise to check all this.

That said, if $F(x) = \int_0^x f(t)\,dt$ for every $x \in [0,1]$, we then have $F(x) = 0$ for every $x \in [0,1]$. Hence, F is trivially differentiable, with $F'(x) = 0$ for every $x \in [0,1]$, but $F' \ne f$ because $F'(x) = f(x)$ if and only if x is irrational. We conclude that (30.48) does not hold, and so the theorem fails. Nevertheless, we have $F(x) = \int_0^x F'(t)\,dt$ for every $x \in [0,1]$. N
(ii) we calculate the difference $P(b) - P(a)$: such a difference is often denoted by $P(x)\big|_a^b$ or $[P(x)]_a^b$.

We present some properties of the indefinite integral that simplify its calculation. First, we observe that the linearity of derivatives, established in (18.12), implies the linearity of the indefinite integral. As in Section 30.6.1, we denote by I a generic interval, bounded or unbounded.
30.7. PROPERTIES OF THE INDEFINITE INTEGRAL 817
Proposition 1117 Let $f, g : I \to \mathbb{R}$ be two functions that admit primitives. For every $\alpha, \beta \in \mathbb{R}$, the function $\alpha f + \beta g : I \to \mathbb{R}$ admits a primitive and we have
$$\int (\alpha f + \beta g)(x)\,dx = \alpha \int f(x)\,dx + \beta \int g(x)\,dx \qquad (30.50)$$
A simple application of the result is the calculation of the indefinite integral of a polynomial. Indeed, given a polynomial $f(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_n x^n$, from (30.50) it follows that
$$\int f(x)\,dx = \int \left( \sum_{i=0}^n \alpha_i x^i \right) dx = \sum_{i=0}^n \alpha_i \int x^i\,dx = \sum_{i=0}^n \alpha_i \frac{x^{i+1}}{i+1} + k$$
Rule (18.13) for the derivation of the product of functions leads to an important rule for the calculation of the indefinite integral, called integration by parts.

Proposition 1118 (Integration by parts) Let $f, g : I \to \mathbb{R}$ be two differentiable functions. Then
$$\int f'(x)\,g(x)\,dx + \int f(x)\,g'(x)\,dx = f(x)\,g(x) + k \qquad (30.51)$$
Proof Expression (18.13) implies that $(fg)' = f'g + fg'$. Hence fg is a primitive of $f'g + fg'$ and, thanks to (30.50), we have
$$f(x)\,g(x) + k = \int \left( f'(x)\,g(x) + f(x)\,g'(x) \right) dx = \int f'(x)\,g(x)\,dx + \int f(x)\,g'(x)\,dx$$
as desired.

Example 1119 Let us calculate the indefinite integral $\int \log x\,dx$. Setting $f(x) = x$ and $g(x) = \log x$, rule (30.51) yields
$$\int \log x\,dx + \int x\,\frac{1}{x}\,dx = x \log x + k$$
which implies
$$\int \log x\,dx = x (\log x - 1) + k$$
N
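The result can be double-checked by differentiating the candidate primitive numerically (a sketch; the step size and sample points are our own choices):

```python
import math

def P(x):
    """Candidate primitive of log from the example: x (log x - 1)."""
    return x * (math.log(x) - 1.0)

h = 1e-6
points = [0.5, 1.0, 2.0, 5.0]
errors = [abs((P(x + h) - P(x - h)) / (2.0 * h) - math.log(x))
          for x in points]
```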
Example 1120 Let us calculate the indefinite integral ∫ x sin x dx. Let f, g : R → R be given by f(x) = x and g(x) = −cos x, so that ∫ x sin x dx can be rewritten as ∫ f(x) g'(x) dx. Thanks to (30.51) we have

∫ f'(x) g(x) dx + ∫ x sin x dx = −x cos x + k

that is

∫ x sin x dx = ∫ cos x dx − x cos x + k = sin x − x cos x + k
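As a sanity check (not part of the original text), the antiderivative sin x − x cos x obtained by parts can be compared against a simple numerical integral; the helper midpoint_integral is a hypothetical utility introduced here for illustration.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    # Midpoint Riemann sum: a crude numerical stand-in for the definite integral.
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Antiderivative found above by integration by parts: F(x) = sin x - x cos x.
F = lambda x: math.sin(x) - x * math.cos(x)

numeric = midpoint_integral(lambda x: x * math.sin(x), 0.0, 2.0)
exact = F(2.0) - F(0.0)
assert abs(numeric - exact) < 1e-6
```

The two values agree to well within the tolerance, consistent with the Fundamental Theorem applied to the primitive found above.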
Observe that in the last example, if instead we had set f(x) = sin x and g(x) = x²/2, rule (30.51) would have proved useless. With this choice of f and g it is still possible to rewrite ∫ x sin x dx as ∫ f(x) g'(x) dx, but here (30.51) implies

∫ f'(x) g(x) dx + ∫ x sin x dx = (x²/2) sin x + k

that is

∫ x sin x dx = (x²/2) sin x − (1/2) ∫ x² cos x dx + k

which has actually complicated things, because the integral ∫ x² cos x dx is more difficult to compute than the original integral ∫ x sin x dx. This shows that integration by parts cannot proceed in a mechanical way: it requires a bit of imagination and experience.
The two factors of the product f(x) g'(x) dx are called, respectively, the finite factor, f(x), and the differential factor, g'(x) dx, so that the formula can be remembered as: "the integral of the product of a finite factor and a differential factor is equal to the product of the finite factor and the integral of the differential factor, minus the integral of the product of the derivative of the finite factor and the integral just found". We repeat that it is important to choose carefully which of the two factors to take as the finite factor and which as the differential factor.
30.8 Change of variable
Theorem 1121 Let φ : [c, d] → [a, b] be a differentiable and strictly increasing function such that φ' : [c, d] → R is integrable. If f : [a, b] → R is continuous, then the function (f ∘ φ)φ' : [c, d] → R is integrable and

∫_c^d f(φ(t)) φ'(t) dt = ∫_{φ(c)}^{φ(d)} f(x) dx   (30.53)

If φ is surjective, we have a = φ(c) and b = φ(d). Expression (30.53) can therefore be rewritten as

∫_a^b f(x) dx = ∫_c^d f(φ(t)) φ'(t) dt   (30.54)

Heuristically, (30.53) can be seen as the result of the change of variable x = φ(t) and of the corresponding change

dx = φ'(t) dt = dφ(t)   (30.55)

in dx. As a mnemonic and computational device this observation can be useful, even though writing (30.55) is in itself meaningless.
(F ∘ φ)'(t) = F'(φ(t)) φ'(t) = (f ∘ φ)(t) φ'(t)

that is, F ∘ φ is a primitive of (f ∘ φ)φ' : [c, d] → R. Thanks to Proposition 1087, the composite function f ∘ φ : [c, d] → R is integrable. Since, by hypothesis, φ' : [c, d] → R is integrable, the product function (f ∘ φ)φ' : [c, d] → R is integrable as well (recall what we saw at the end of Section 30.5). By the First Fundamental Theorem we have

∫_c^d (f ∘ φ)(t) φ'(t) dt = (F ∘ φ)(d) − (F ∘ φ)(c)   (30.57)
Since φ is bijective (being strictly increasing) we have φ(c) = a and φ(d) = b. Therefore, (30.57) and (30.56) imply

∫_c^d (f ∘ φ)(t) φ'(t) dt = F(φ(d)) − F(φ(c)) = ∫_a^b f(x) dx

as desired.
Theorem 1121, besides its theoretical interest, can be very useful in the calculation of integrals. Formula (30.53), and its rewriting (30.54), can be used both from "right to left" and from "left to right". In the first case, from right to left, the objective is to calculate the integral ∫_a^b f(x) dx by finding a suitable change of variable x = φ(t) that leads to an integral ∫_{φ⁻¹(a)}^{φ⁻¹(b)} f(φ(t)) φ'(t) dt that is simpler to compute. The difficulty lies in finding a suitable change of variable x = φ(t): indeed, nothing guarantees that a "simplifying" change exists and, even if it does, it might not be obvious how to find it.

The use of formula (30.53) from left to right is useful for calculating an integral that can be written as ∫_c^d f(φ(t)) φ'(t) dt for some function f of which we know the primitive F. In such a case, the corresponding integral ∫_{φ(c)}^{φ(d)} f(x) dx, obtained by setting x = φ(t), is easier to solve since

∫ f(φ(x)) φ'(x) dx = F(φ(x)) + k

In this case the difficulty lies in recognizing the composite form ∫_c^d f(φ(t)) φ'(t) dt in the integral that we want to calculate. Here too, nothing guarantees that the integral can be rewritten in this form, nor that, even when possible, the form is easy to recognize. Only experience (and exercise) can help. The next example presents some classical integrals that can be solved with this technique.
(i) For a ≠ −1 we have

∫ φ(x)^a φ'(x) dx = φ(x)^{a+1}/(a + 1) + k

For example,

∫ sin⁴x cos x dx = (1/5) sin⁵x + k

(ii) We have

∫ φ'(x)/φ(x) dx = log |φ(x)| + k

For example,

∫ tan x dx = ∫ sin x/cos x dx = −∫ (−sin x)/cos x dx = −log |cos x| + k
(iii) We have

∫ sin(φ(x)) φ'(x) dx = −cos φ(x) + k   and   ∫ cos(φ(x)) φ'(x) dx = sin φ(x) + k

For example,

∫ cos(3x³ − 2x²)(9x² − 4x) dx = sin(3x³ − 2x²) + k

(iv) We have

∫ e^{φ(x)} φ'(x) dx = e^{φ(x)} + k

For example,

∫ x e^{x²} dx = (1/2) ∫ 2x e^{x²} dx = (1/2) e^{x²} + k
We now present three examples that illustrate the two possible applications of formula (30.53). The first example considers the right-to-left case, the second example can be solved both with the right-to-left and with the left-to-right method, while the last example considers the left-to-right case. For simplicity we use the variables x and t as they appear in (30.53), even if this is obviously a pure convenience, without substantial value.

and therefore

∫_a^b sin √x dx = 2 (sin √b − sin √a + √a cos √a − √b cos √b)

Note how the starting point has been to set t = √x, that is, to specify the inverse function t = φ⁻¹(x) = √x. This is often the case, because it is simpler to think of which transformation of x may simplify the integration. N
"Right to left". Let us set t = sin x, so that φ(t) = sin⁻¹ t, with x ∈ [0, π/2]. From (18.20) it follows that

φ'(t) = 1/cos(sin⁻¹ t)
"Left to right". In the integral we recognize a form of type (i) of Example 1122, that is, an integral of the type

∫ φ(x)^a φ'(x) dx

with φ(x) = 1 + sin x and a = −3. Since ∫ φ(x)^a φ'(x) dx = φ(x)^{a+1}/(a + 1) + k, we have

∫_0^{π/2} cos x/(1 + sin x)³ dx = [ −1/(2(1 + sin x)²) ]_0^{π/2} = −1/8 + 1/2 = 3/8
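The value 3/8 can be double-checked numerically (a sketch added for illustration; midpoint_integral is a hypothetical helper, not part of the text).

```python
import math

def midpoint_integral(f, a, b, n=200_000):
    # Midpoint Riemann sum over [a, b] with n subintervals.
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

val = midpoint_integral(lambda x: math.cos(x) / (1 + math.sin(x)) ** 3,
                        0.0, math.pi / 2)
assert abs(val - 3 / 8) < 1e-8
```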
with [c, d] ⊂ (0, +∞). In the integral we recognize again a form of type (i) of Example 1122, that is, an integral of the type

∫ φ(t)^a φ'(t) dt

with φ(t) = log t and a = 1. Since ∫ φ(t)^a φ'(t) dt = φ(t)^{a+1}/(a + 1) + k, we have

∫_c^d (log t)/t dt = [ (log² t)/2 ]_c^d = (1/2)(log² d − log² c)

N
30.9 Functions integrable in closed form
(i) rational if it is defined through finite combinations of the four elementary operations (addition, subtraction, multiplication and division) on the variable x; it is easy to verify that a rational function can be expressed as a ratio of polynomials

f(x) = (a₀ + a₁x + ⋯ + aₙxⁿ)/(b₀ + b₁x + ⋯ + bₘxᵐ)   (30.59)

(ii) algebraic if it is defined through finite combinations of the four elementary operations and of the operation of root extraction.
(vi) the functions obtained through both finite combinations and finite compositions of functions belonging to the previous classes.

The elementary functions that are neither rational nor algebraic are called transcendental. For example, such are the exponential functions, the logarithmic functions and the trigonometric functions.¹⁵

¹⁵ It is possible to show that the use of complex numbers actually allows us to reduce the trigonometric functions to linear combinations of exponential functions. The reader will encounter this type of result in more advanced courses.
The elementary functions can be written in finite terms (that is, in closed form), which gives them simplicity and tractability. The question relevant for integral calculus, however, is whether their primitive is itself an elementary function, and therefore preserves the tractability of the original function. This motivates the following definition:

Definition 1128 An elementary function is said to be integrable in finite terms if its primitive is itself an elementary function.

In such a case, we will also say that f is explicitly integrable or integrable in closed form. For example, f(x) = 2x is explicitly integrable since its primitive F(x) = x² is an elementary function. The functions f(x) = sin x and f(x) = cos x, as well as all the polynomials and the exponential functions of the type f(x) = e^{kx} with k ∈ R, are also explicitly integrable.

Nevertheless, and this is what makes the topic of this section interesting, not all elementary functions are explicitly integrable. The next result reports the remarkable example of the Gaussian function.

Proposition 1129 The elementary functions e^{−x²} and eˣ/x are not integrable in finite terms.
The proof of the proposition is based on results of complex analysis. The non-integrability in finite terms of these functions implies that of other important functions.

Example 1130 The function 1/log x is not integrable in finite terms. Indeed, with the change of variable x = eᵗ, we get dx = eᵗ dt and therefore, by substitution,

∫ 1/log x dx = ∫ eᵗ/t dt

Since eˣ/x is not integrable in finite terms, it follows that 1/log x is not either. In particular, the famous integral function

Li(x) = ∫_2^x 1/log t dt

which is very important in the study of prime numbers, is not an elementary function. N
In the light of these examples of elementary functions that are not explicitly integrable, it becomes important to have criteria that guarantee the integrability, or the non-integrability, in finite terms of a given elementary function. For rational functions everything is simple:

Proposition 1131 The rational functions are integrable in finite terms. In particular, the primitive of a rational function f(x) is an elementary function given by a linear combination of the following functions:
Since the denominator is (x + 1)(x + 2), it is necessary to look for A and B such that

A/(x + 1) + B/(x + 2) = (x − 1)/(x² + 3x + 2)   (30.60)

The left-hand side of (30.60) is equal to

((A + B)x + 2A + B)/(x² + 3x + 2)   (30.61)

Expressions (30.60) and (30.61) are equal if and only if A and B satisfy the system

A + B = 1
2A + B = −1

Therefore A = −2, B = 3 and

∫ (x − 1)/(x² + 3x + 2) dx = ∫ ( −2/(x + 1) + 3/(x + 2) ) dx = −2 log |x + 1| + 3 log |x + 2| + k
N
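The coefficients of the partial-fraction decomposition can be verified numerically at a few sample points (an illustrative check, not part of the original text; note the sign A = −2).

```python
# Check (x - 1)/(x^2 + 3x + 2) = A/(x + 1) + B/(x + 2) with A = -2, B = 3,
# at sample points away from the poles x = -1 and x = -2.
A, B = -2.0, 3.0
for x in (-0.5, 0.0, 1.0, 2.5, 10.0):
    lhs = (x - 1) / (x ** 2 + 3 * x + 2)
    rhs = A / (x + 1) + B / (x + 2)
    assert abs(lhs - rhs) < 1e-12
```

Since two rational functions with the same denominator that agree on more points than the degree of their numerators must coincide, a handful of sample points already pins down the decomposition.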
Things are more complicated for algebraic and transcendental functions: some of them are integrable in finite terms, others are not. A full analysis of the topic is well beyond the scope of this book.¹⁶ We just mention an important result of Liouville that establishes a necessary and sufficient condition for the integrability in finite terms of functions of the form f(x)e^{g(x)}. Inter alia, Liouville's result permits us to prove Proposition 1129, that is, the non-integrability in finite terms of the functions e^{−x²} and eˣ/x.

This said, in some (lucky) cases the integrability in finite terms of non-rational elementary functions can be brought back, through suitable substitutions, to that of rational functions. It is the case, for example, of functions of the type r(eˣ), where r(·) is a rational function. Indeed, by setting x = log t and recalling what we saw in Section 30.8 on integration by substitution, we get

∫ r(eˣ) dx = ∫ r(t)/t dt

Thanks to Proposition 1131, the rational function r(t)/t is integrable in finite terms.
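The substitution x = log t can be illustrated numerically with a sample rational function r(u) = 1/(1 + u) (a hypothetical choice; midpoint_integral is an illustrative helper, not part of the text).

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    # Midpoint Riemann sum over [a, b] with n subintervals.
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# A sample rational function r(u) = 1/(1 + u).
r = lambda u: 1 / (1 + u)

# x = log t maps x in [0, 1] to t in [1, e]; dx = dt/t.
lhs = midpoint_integral(lambda x: r(math.exp(x)), 0.0, 1.0)   # integral of r(e^x) dx
rhs = midpoint_integral(lambda t: r(t) / t, 1.0, math.e)      # integral of r(t)/t dt
assert abs(lhs - rhs) < 1e-8
```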
Another example is the transcendental function

f(x) = (a sin αx + b cos βx)/(c sin γx + d cos δx)

with a, b, c, d ∈ R and α, β, γ, δ ∈ Z. By setting x = 2 arctan t, that is,

tan (x/2) = t

with simple trigonometric arguments we get:¹⁷

sin x = 2t/(1 + t²)   and   cos x = (1 − t²)/(1 + t²)   (30.62)

With such a substitution we transform f(x) into a rational function; for instance, when α = β = γ = δ = 1 we obtain

( a · 2t/(1 + t²) + b · (1 − t²)/(1 + t²) ) / ( c · 2t/(1 + t²) + d · (1 − t²)/(1 + t²) )
¹⁷ Indeed,

sin (x/2) = tan (x/2) cos (x/2) = tan (x/2) / √(1 + tan² (x/2))

By substituting sin (x/2) and cos (x/2) in sin x and cos x we get (30.62).
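The half-angle identities (30.62) can be spot-checked numerically at a few points (an illustrative check added here, not part of the original text).

```python
import math

# With t = tan(x/2): sin x = 2t/(1 + t^2) and cos x = (1 - t^2)/(1 + t^2).
for x in (0.1, 0.5, 1.0, 2.0, -1.3):
    t = math.tan(x / 2)
    assert abs(math.sin(x) - 2 * t / (1 + t * t)) < 1e-12
    assert abs(math.cos(x) - (1 - t * t) / (1 + t * t)) < 1e-12
```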
30.10 Improper integrals
O.R. The question of determining whether the indefinite integral of a function belongs to a given class of functions was tackled already by Newton and Leibniz. While the former, in order to avoid resorting to transcendental functions, preferred to express the primitive through algebraic functions (also through infinite series of algebraic functions), the latter gave priority to formulations in finite terms and considered non-algebraic primitives acceptable as well. The vision of Leibniz prevailed and in the nineteenth century the problem of integrability in finite terms became an important research area, with major contributions by Joseph Liouville in the 1830s. H
[Figure: graph of the Gaussian bell curve]
centered at the origin, seen in Example 999, and whose area is given by an integral of the form

∫_{−∞}^{+∞} e^{−x²} dx   (30.63)

called Gauss's integral. In this case the domain of integration is the whole real line (−∞, +∞).

Let us begin with domains of integration of the form [a, +∞). Given a function f : [a, +∞) → R, consider the integral function F : [a, +∞) → R given by

F(x) = ∫_a^x f(t) dt
The definition of the improper integral ∫_a^{+∞} f(x) dx is based on the limit lim_{x→+∞} F(x), that is, on the asymptotic behavior of the integral function. For it we can have three cases:

Definition 1134 Let f : [a, +∞) → R be a function integrable on every interval [a, b] ⊆ [a, +∞) with integral function F. If lim_{x→+∞} F(x) exists, finite or infinite, we set

∫_a^{+∞} f(x) dx = lim_{x→+∞} F(x)

and the function f is said to be integrable in an improper sense on [a, +∞). The value ∫_a^{+∞} f(x) dx is called the improper (or generalized) Riemann integral.

For brevity, in the sequel we will say that a function f is integrable on [a, +∞), omitting "in an improper sense". We have the following terminology:

(i) the integral ∫_a^{+∞} f(x) dx converges if lim_{x→+∞} F(x) ∈ R;

(ii) the integral ∫_a^{+∞} f(x) dx diverges positively (negatively) if lim_{x→+∞} F(x) = +∞ (−∞);

(iii) finally, if lim_{x→+∞} F(x) does not exist, we say that the integral ∫_a^{+∞} f(x) dx does not exist, or that it is oscillating.
Example 1135 Fixed α > 0, let f : [1, +∞) → R be given by f(x) = x^{−α}. The integral function F : [1, +∞) → R is

F(x) = ∫_1^x t^{−α} dt = (x^{1−α} − 1)/(1 − α) if α ≠ 1,   log x if α = 1

so that

lim_{x→+∞} F(x) = +∞ if α ≤ 1,   1/(α − 1) if α > 1

It follows that the improper integral

∫_1^{+∞} 1/x^α dx

exists for every α > 0: it converges if α > 1 and it diverges positively if α ≤ 1. N
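The dichotomy of Example 1135 can be illustrated numerically by evaluating the integral function at a large truncation point (an illustrative sketch, not part of the original text).

```python
import math

def F(x, alpha):
    # Integral function of t**(-alpha) on [1, x], as computed in Example 1135.
    if alpha == 1:
        return math.log(x)
    return (x ** (1 - alpha) - 1) / (1 - alpha)

# alpha > 1: F(x) approaches the finite limit 1/(alpha - 1);
# alpha <= 1: F(x) grows without bound.
assert abs(F(1e9, 2.0) - 1.0) < 1e-6   # limit 1/(2 - 1) = 1
assert F(1e9, 0.5) > 1e3               # diverges like 2*sqrt(x)
assert F(1e9, 1.0) > 20                # diverges like log x
```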
The integral ∫_{−∞}^a f(x) dx on the domain of integration (−∞, a] is defined analogously to ∫_a^{+∞} f(x) dx by considering the limit lim_{x→−∞} ∫_x^a f(t) dt, that is, the limit lim_{x→−∞} (−F(x)).
Let us now consider the improper integral on the domain of integration (−∞, +∞).

Definition 1137 Let f : R → R be a function integrable on every interval [a, b]. If the integrals ∫_a^{+∞} f(x) dx and ∫_{−∞}^a f(x) dx exist, the function f is said to be integrable (in an improper sense) on R and we set

∫_{−∞}^{+∞} f(x) dx = ∫_a^{+∞} f(x) dx + ∫_{−∞}^a f(x) dx   (30.64)

provided we do not have an indeterminate form ∞ − ∞. The value ∫_{−∞}^{+∞} f(x) dx is called the improper (or generalized) Riemann integral of f on R.

It is easy to see that this definition does not depend on the choice of the point a ∈ R. Often, for convenience, we take a = 0.

The improper integral ∫_{−∞}^{+∞} f(x) dx is likewise called convergent or divergent according to whether its value is finite or equal to ±∞.
The value of the integral in the previous example is coherent with the geometric interpretation of the integral as the area with sign of the region under f. Indeed, such a figure is a big rectangle with infinite base and height k. Its area is +∞ if k > 0, it is zero if k = 0, and it is −∞ if k < 0.
= lim_{x→+∞} x²/2 − lim_{x→−∞} x²/2 = ∞ − ∞

and therefore the improper integral

∫_{−∞}^{+∞} x dx

does not exist, because we have the indeterminate form ∞ − ∞. N
Differently from Example 1138, the value of the integral in this last example is not coherent with the geometric interpretation of the integral. To convince ourselves of this, let us observe the following picture:
[Figure: graph of f(x) = x, with the region under the graph for x > 0 marked (+) and the region for x < 0 marked (−)]
The areas of the two regions under f for x < 0 and x > 0 are two "big triangles" of infinite base and height. They are intuitively equal, being perfectly symmetric with respect to the vertical axis, but of opposite sign (as indicated by the signs (+) and (−) in the figure), and it is natural to think that they compensate, giving rise to an integral equal to 0. Nevertheless, the definition requires the separate calculation of the two integrals as x → +∞ and as x → −∞, which in this case generates the indeterminate form +∞ − ∞.

To try to reconcile the definition of the integral on (−∞, +∞) with the geometric intuition, we can follow an alternative route, considering the single limit

lim_{k→+∞} ∫_{−k}^k f(x) dx

In place of the two limits on which the definition of the improper integral is based, the principal value considers only the limit of ∫_{−k}^k f(x) dx. We will see in the examples that, with this definition, the geometric intuition of the integral as the area with sign of the figure below f is preserved. It is, however, a weaker notion than the improper integral; indeed:

The principal value can therefore exist even when the improper integral does not. To better illustrate the relation between the two notions of integral on (−∞, +∞), let us consider a more general version of Example 1140.
= lim_{x→+∞} (x²/2 + βx) − lim_{x→−∞} (x²/2 + βx) = ∞ − ∞

does not exist, because we have the indeterminate form ∞ − ∞. Concerning the principal value we have

PV ∫_{−∞}^{+∞} f(x) dx = lim_{k→+∞} ∫_{−k}^k (x + β) dx = lim_{k→+∞} ( ∫_{−k}^k x dx + 2βk ) = 2β lim_{k→+∞} k = +∞ if β > 0,   0 if β = 0,   −∞ if β < 0

and therefore the principal value PV ∫_{−∞}^{+∞} (x + β) dx exists: it is ±∞ according to the sign of β, unless β is zero. N

In this example the principal value satisfies the geometric intuition of the integral as area with sign. Indeed, when β = 0 the intuition is obvious (see the figure and the comment after Example 1140). For the case β > 0, observe the figure:
[Figure: the case of a positive intercept: the bounded region below the axis is marked (−), the unbounded region above it is marked (+)]
The negative area of the "big triangle" indicated by (−) in the negative part of the abscissae is equal and opposite to the positive area of the big triangle indicated by (+) in the positive part of the abscissae. If we imagine that these areas cancel each other out, what "is left" is the area of the dotted figure, which is clearly infinite and with + sign (being above the horizontal axis). For the negative-intercept case similar considerations hold:
[Figure: the case of a negative intercept: the unbounded region below the axis is marked (−), the bounded region above it is marked (+)]
The negative area of the "big triangle" indicated by (−) in the negative part of the abscissae is equal and opposite to the positive area of the big triangle indicated by (+) in the positive part of the abscissae. If we imagine that these areas cancel each other out, what "is left" is again the area of the dotted figure, which is clearly infinite and with negative sign (being below the horizontal axis).
and therefore the improper integral does not exist, since we have the indeterminate form ∞ − ∞. Regarding the principal value, we have instead

PV ∫_{−∞}^{+∞} f(x) dx = lim_{k→+∞} ∫_{−k}^k x/(1 + x²) dx = lim_{k→+∞} ( (1/2) log(1 + k²) − (1/2) log(1 + k²) ) = 0

N
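The contrast between the two notions can be seen numerically for f(x) = x/(1 + x²): symmetric truncations vanish, while one-sided truncations grow logarithmically (an illustrative sketch; midpoint_integral is a hypothetical helper, not part of the text).

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    # Midpoint Riemann sum over [a, b] with n subintervals.
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

f = lambda x: x / (1 + x * x)

# Symmetric truncations integrate to (numerically) zero for every k ...
for k in (1.0, 10.0, 100.0):
    assert abs(midpoint_integral(f, -k, k)) < 1e-9
# ... while the one-sided integral over [0, k] grows like (1/2) log(1 + k^2).
assert abs(midpoint_integral(f, 0.0, 100.0) - 0.5 * math.log(1 + 100.0 ** 2)) < 1e-4
```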
Properties

Being defined as limits, the properties of improper integrals follow from the properties of limits of functions seen in Section 11.4. In particular, the improper integral preserves the linearity and monotonicity properties of the Riemann integral.

Let us begin with linearity, which follows from the algebra of limits seen in Proposition 428.

Proposition 1144 Let f, g : [a, +∞) → R be two functions integrable on [a, +∞). Then, for every α, β ∈ R, the function αf + βg : [a, +∞) → R is integrable on [a, +∞) and we have

∫_a^{+∞} (αf + βg)(x) dx = α ∫_a^{+∞} f(x) dx + β ∫_a^{+∞} g(x) dx   (30.65)

Proof Thanks to the linearity of the Riemann integral, and to points (i) and (ii) of Proposition 428, we have

lim_{x→+∞} ∫_a^x (αf + βg)(t) dt = lim_{x→+∞} (αF(x) + βG(x))
The monotonicity property of limits of functions (see Proposition 427 and its scalar variants) implies the monotonicity property of the improper integral.

Proposition 1145 Let f, g : [a, +∞) → R be two functions integrable on [a, +∞). If f ≤ g, then ∫_a^{+∞} f(x) dx ≤ ∫_a^{+∞} g(x) dx.

Proof Thanks to the monotonicity of the Riemann integral, we have F(x) ≤ G(x) for every x ∈ [a, +∞). By the monotonicity of limits of functions, we therefore have lim_{x→+∞} F(x) ≤ lim_{x→+∞} G(x).

As we saw in Example 1138, we have ∫_a^{+∞} 0 dx = 0. A simple consequence of Proposition 1145 is therefore that ∫_a^{+∞} f(x) dx ≥ 0 when f is positive and integrable on [a, +∞).
Proposition 1145 also gives a simple comparison criterion for divergence: given two functions f, g : [a, +∞) → R integrable on [a, +∞), with f ≤ g, we have

∫_a^{+∞} f(x) dx = +∞ ⟹ ∫_a^{+∞} g(x) dx = +∞   (30.66)

and

∫_a^{+∞} g(x) dx = −∞ ⟹ ∫_a^{+∞} f(x) dx = −∞   (30.67)
Criteria of integrability

We now give some criteria of integrability, limiting ourselves for simplicity to positive functions f : [a, +∞) → R. In this case, the integral function F : [a, +∞) → R is increasing. Indeed, for every x₂ ≥ x₁ ≥ a,

F(x₂) = ∫_a^{x₂} f(t) dt = ∫_a^{x₁} f(t) dt + ∫_{x₁}^{x₂} f(t) dt ≥ ∫_a^{x₁} f(t) dt = F(x₁)

since ∫_{x₁}^{x₂} f(t) dt ≥ 0. Thanks to the monotonicity of the integral function, we have the following characterization of improper integrals of positive functions:

Proposition 1146 Let f : [a, +∞) → R be a positive function integrable on every [a, b] ⊆ [a, +∞). Then it is integrable on [a, +∞) and

∫_a^{+∞} f(t) dt = sup_{x∈[a,+∞)} F(x)   (30.68)

In particular, ∫_a^{+∞} f(t) dt converges only if lim_{x→+∞} f(x) = 0 (provided such limit exists).
Positive functions f : [a, +∞) → R are therefore always integrable in an improper sense, that is, ∫_a^{+∞} f(t) dt ∈ [0, +∞]. In particular, their integral ∫_a^{+∞} f(t) dt either converges or diverges positively: tertium non datur. We have convergence if and only if sup_{x∈[a,+∞)} F(x) < +∞, and only if f is infinitesimal as x → +∞ (provided the limit lim_{x→+∞} f(x) exists). Otherwise, ∫_a^{+∞} f(t) dt diverges positively.

The condition lim_{x→+∞} f(x) = 0 is only necessary for convergence, as Example 1135 with 0 < α ≤ 1 shows. For example, if α = 1 we have lim_{x→+∞} 1/x = 0, but for every a > 0 we have

∫_a^{+∞} (1/t) dt = lim_{x→+∞} ∫_a^x (1/t) dt = lim_{x→+∞} log (x/a) = +∞

and therefore ∫_a^{+∞} (1/t) dt diverges positively.

In stating the necessary condition lim_{x→+∞} f(x) = 0 we added the clause "provided such limit exists". The next simple example shows that the clause is important, because the limit can fail to exist even when the integral ∫_a^{+∞} f(t) dt converges.
Proposition 1146 rests on the following simple property of limits of monotone functions, which is the version for functions of Theorem 285 for monotone sequences.

Lemma 1148 Let φ : [a, +∞) → R be an increasing function. Then lim_{x→+∞} φ(x) = sup_{x∈[a,+∞)} φ(x).

Proof Let us first consider the case sup_{x∈[a,+∞)} φ(x) ∈ R. Let ε > 0. Since sup_{x∈[a,+∞)} φ(x) = sup φ([a, +∞)), thanks to Proposition 119 there exists x_ε ∈ [a, +∞) such that φ(x_ε) > sup_{x∈[a,+∞)} φ(x) − ε. Since φ is increasing, we have

sup_{x∈[a,+∞)} φ(x) − ε < φ(x_ε) ≤ φ(x) ≤ sup_{x∈[a,+∞)} φ(x)   ∀x ≥ x_ε
Proof of Proposition 1146 Since f is positive, its integral function F : [a, +∞) → R is increasing and therefore, thanks to Lemma 1148, lim_{x→+∞} F(x) = sup_{x∈[a,+∞)} F(x), which proves (30.68).

Suppose that lim_{x→+∞} f(x) exists. Let us show that the integral converges only if lim_{x→+∞} f(x) = 0. Suppose, by contradiction, that lim_{x→+∞} f(x) = L ∈ (0, +∞]. Given 0 < ε < L, there exists x_ε > a such that f(x) ≥ L − ε > 0 for every x ≥ x_ε. Therefore

∫_a^{+∞} f(t) dt = ∫_a^{x_ε} f(t) dt + ∫_{x_ε}^{+∞} f(t) dt ≥ ∫_{x_ε}^{+∞} f(t) dt = lim_{x→+∞} ∫_{x_ε}^x f(t) dt ≥ lim_{x→+∞} ∫_{x_ε}^x (L − ε) dt = (L − ε) lim_{x→+∞} (x − x_ε) = +∞

which shows that ∫_a^{+∞} f(t) dt diverges positively.
The next result is a simple comparison criterion to determine if the improper integral of
a positive function is convergent or divergent.
Corollary 1149 Let f, g : [a, +∞) → R be two positive functions integrable on every [a, b] ⊆ [a, +∞), with f ≤ g. Then

∫_a^{+∞} g(x) dx ∈ [0, +∞) ⟹ ∫_a^{+∞} f(x) dx ∈ [0, +∞)   (30.69)

and

∫_a^{+∞} f(x) dx = +∞ ⟹ ∫_a^{+∞} g(x) dx = +∞   (30.70)

Proof By Proposition 1145, we have ∫_a^{+∞} f(x) dx ≤ ∫_a^{+∞} g(x) dx, while, thanks to Proposition 1146, we have ∫_a^{+∞} f(x) dx ∈ [0, +∞] and ∫_a^{+∞} g(x) dx ∈ [0, +∞]. Therefore, ∫_a^{+∞} f(x) dx converges if ∫_a^{+∞} g(x) dx converges, while ∫_a^{+∞} g(x) dx diverges positively if ∫_a^{+∞} f(x) dx diverges positively.
The study of the integral (30.63) of the Gaussian function f(x) = e^{−x²}, to which we will devote the next section, is a very remarkable application of this corollary.

Proposition 1150 Let f, g : [a, +∞) → R be positive functions integrable on every [a, b] ⊆ [a, +∞).

(i) If f ~ g as x → +∞, then ∫_a^{+∞} g(x) dx converges (diverges positively) if and only if ∫_a^{+∞} f(x) dx converges (diverges positively).

(ii) If f = o(g) as x → +∞ and ∫_a^{+∞} g(x) dx converges, then ∫_a^{+∞} f(x) dx also converges.

(iii) If f = o(g) as x → +∞ and ∫_a^{+∞} f(x) dx diverges positively, then ∫_a^{+∞} g(x) dx also diverges positively.
In the light of Example 1135, Proposition 1150 implies that ∫_a^{+∞} f(x) dx converges if there exists α > 1 such that

f ~ 1/x^α   or   f = o(1/x^α)   as x → +∞

Since, as x → +∞,

f ~ 1/x

Proposition 1150 implies ∫_0^{+∞} f(x) dx = +∞. N
We close by observing that, as the reader can verify, what has been proved for positive functions extends easily to all functions f : [a, +∞) → R that are eventually positive, that is, such that there exists c > a for which f(x) ≥ 0 for every x ≥ c.
On the other hand, the equality between the integrals (30.71) and (30.72) is quite intuitive in the light of the symmetry of the Gaussian bell with respect to the vertical axis.

Thanks to Definition 1137, the value of the integral of the Gaussian function, the so-called Gauss integral, is therefore

∫_{−∞}^{+∞} e^{−x²} dx = ∫_0^{+∞} e^{−x²} dx + ∫_{−∞}^0 e^{−x²} dx = √π   (30.73)

Gauss's integral is central in probability theory, where it is usually presented in the form

∫_{−∞}^{+∞} (1/√(2π)) e^{−x²/2} dx

By proceeding by substitution, it is easy to verify that for every pair of scalars a, b ∈ R we have

∫_{−∞}^{+∞} e^{−(x+a)²/b²} dx = |b| √π   (30.74)

which implies, setting b = √2 and a = 0,

∫_{−∞}^{+∞} (1/√(2π)) e^{−x²/2} dx = 1

The improper integral on R of the function

f(x) = (1/√(2π)) e^{−x²/2}

therefore has unit value and, thus, f is a density function, as the reader will see in statistics courses. This explains the importance of this particular form of the Gaussian function.
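Both (30.73) and the unit mass of the Gaussian density can be checked numerically; since the tails decay extremely fast, truncating the domain at ±10 loses a negligible amount of area (an illustrative sketch; midpoint_integral is a hypothetical helper, not part of the text).

```python
import math

def midpoint_integral(f, a, b, n=200_000):
    # Midpoint Riemann sum over [a, b] with n subintervals.
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Gauss's integral: integral of e^(-x^2) over R equals sqrt(pi).
gauss = midpoint_integral(lambda x: math.exp(-x * x), -10.0, 10.0)
assert abs(gauss - math.sqrt(math.pi)) < 1e-6

# The standard Gaussian density integrates to 1.
density = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
total = midpoint_integral(density, -10.0, 10.0)
assert abs(total - 1.0) < 1e-7
```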
Definition 1154 Let f : [a, b) → R be a continuous function such that lim_{x→b⁻} f(x) = ±∞. If

lim_{z→b⁻} ∫_a^z f(x) dx = lim_{z→b⁻} [F(z) − F(a)]

exists (finite or infinite), the function f is said to be integrable in an improper sense on [a, b] and this limit is taken as ∫_a^b f(x) dx. The value ∫_a^b f(x) dx is called the improper (or generalized) Riemann integral.

If the unboundedness of the function concerned the point a, or both endpoints, we would give a completely analogous definition. If the unboundedness concerned a point c ∈ (a, b), it would be sufficient to consider separately the two intervals [a, c] and [c, b].
Example 1155 Let f : [a, b) → R be given by

f(x) = (b − x)^{−α}   with α > 0

Given that a primitive of f is

F(x) = (b − x)^{1−α}/(α − 1)   for 0 < α ≠ 1
F(x) = −log (b − x)   for α = 1

we have

lim_{x→b⁻} F(x) = 0 if α < 1,   +∞ if α ≥ 1

It follows that the improper integral

∫_a^b 1/(b − x)^α dx

exists for every α > 0: it converges if α < 1 and it diverges positively if α ≥ 1. N
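The dichotomy of Example 1155 can be illustrated numerically by evaluating the truncated integral near the singular endpoint (an illustrative sketch, with a = 0 and b = 1 chosen for concreteness).

```python
# Integral of (b - x)^(-alpha) on [a, z] as z -> b^-, with a = 0, b = 1:
# the primitive gives F(z) = (1 - (1 - z)^(1 - alpha)) / (1 - alpha) for alpha != 1.
def truncated_integral(z, alpha):
    return (1 - (1 - z) ** (1 - alpha)) / (1 - alpha)

# alpha < 1: finite limit 1/(1 - alpha); alpha > 1: blows up as z -> 1.
assert abs(truncated_integral(1 - 1e-12, 0.5) - 2.0) < 1e-5   # limit 1/(1 - 1/2) = 2
assert truncated_integral(1 - 1e-12, 2.0) > 1e10              # diverges
```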
For these improper integrals too, a version of Proposition 1150 could be proven. In this case, it allows one to state that ∫_a^b f(x) dx converges if there exists α < 1 such that

f ~ 1/(b − x)^α   or   f = o(1/(b − x)^α)   as x → b⁻

The comparison with (b − x)^{−α} is an important convergence criterion for these improper integrals.
O.R. When the interval is unbounded, for the improper integral to converge the function must tend to zero quite rapidly (as x^{−α} with α > 1). When the function is unbounded, for the improper integral to converge the function must tend to infinity fairly slowly, as (b − x)^{−α} with α < 1. Both things are quite intuitive: for the area of an unbounded region to be finite, its portion "that escapes to infinity" must be very thin. For example, the function f : R₊ → R₊ defined by f(x) = 1/x is integrable neither on intervals of the type [a, +∞), a > 0, nor on intervals of the type [0, a]: indeed, the integral function of f is F(x) = log x, which diverges both as x → +∞ and as x → 0⁺. The functions asymptotic to 1/x^{1+ε}, with ε > 0, are instead integrable on the intervals of the type [a, +∞), a > 0, while those asymptotic to 1/x^{1−ε} are integrable on the intervals of the type [0, a]. H
Chapter 31
Parameter-dependent integrals
f : [a, b] × [c, d] → R

defined on the rectangle [a, b] × [c, d] in R². If for every y ∈ [c, d] the scalar function f(·, y) : [a, b] → R is integrable on [a, b], then to every such y the real number

∫_a^b f(x, y) dx   (31.1)

can be associated. Unlike the integrals we have seen up to now, the value of the definite integral (31.1) depends on the value of the variable y, which is usually interpreted as a parameter. Such an integral, referred to as a parameter-dependent integral, therefore defines a scalar function F : [c, d] → R in the following way:

F(y) = ∫_a^b f(x, y) dx   (31.2)

Note that, although f is a function of two variables, the function F defined above is scalar. Indeed, it does not depend in any way on the variable x, which in this setting plays the same role as a dummy variable of integration.

Functions of type (31.2) appear in applications more frequently than one may initially think. Therefore, having the appropriate instruments to study such objects is crucial.
31.1 Properties

We shall study two properties of the function F: continuity and differentiability. Let us start with continuity.
Formula (31.3) is referred to as the "passage of the limit under the integral sign".

Proof Take ε > 0. We must show that there exists a δ > 0 such that |F(y) − F(y₀)| < ε whenever y ∈ [c, d] and |y − y₀| < δ. By hypothesis, f is continuous on the compact set [a, b] × [c, d]. By Theorem 473, it is therefore uniformly continuous on [a, b] × [c, d], so that there is a δ > 0 such that

‖(x, y) − (x₀, y₀)‖ < δ ⟹ |f(x, y) − f(x₀, y₀)| < ε/(b − a)   (31.4)

for every (x, y), (x₀, y₀) ∈ [a, b] × [c, d]. Therefore, for every y ∈ [c, d] ∩ (y₀ − δ, y₀ + δ) we have

|F(y) − F(y₀)| ≤ ∫_a^b |f(x, y) − f(x, y₀)| dx < (ε/(b − a)) (b − a) = ε

as desired.
Proposition 1157 Suppose that f : [a, b] × [c, d] → R and its partial derivative ∂f/∂y are both continuous on [a, b] × [c, d]. Then the function F : [c, d] → R is differentiable on (c, d) and we have

F'(y) = ∫_a^b (∂f/∂y)(x, y) dx   (31.5)

Since

F'(y) = lim_{h→0} (F(y + h) − F(y))/h = lim_{h→0} ∫_a^b (f(x, y + h) − f(x, y))/h dx

and

∫_a^b (∂f/∂y)(x, y) dx = ∫_a^b lim_{h→0} (f(x, y + h) − f(x, y))/h dx

equality (31.5) amounts to

lim_{h→0} ∫_a^b (f(x, y + h) − f(x, y))/h dx = ∫_a^b lim_{h→0} (f(x, y + h) − f(x, y))/h dx
Proof Fix y₀ ∈ (c, d). For every x ∈ [a, b] the function f(x, ·) : [c, d] → R is, by hypothesis, differentiable, so that, by Lagrange's Theorem, there exists θₓ ∈ (0, 1) such that

(f(x, y₀ + h) − f(x, y₀))/h = (∂f/∂y)(x, y₀ + θₓh)

Note that θₓ depends on x. Let us estimate the difference quotient of the function F at y₀ ∈ (c, d):

| (F(y₀ + h) − F(y₀))/h − ∫_a^b (∂f/∂y)(x, y₀) dx |   (31.6)

= | ∫_a^b (f(x, y₀ + h) − f(x, y₀))/h dx − ∫_a^b (∂f/∂y)(x, y₀) dx |

= | ∫_a^b [ (∂f/∂y)(x, y₀ + θₓh) − (∂f/∂y)(x, y₀) ] dx |

≤ ∫_a^b | (∂f/∂y)(x, y₀ + θₓh) − (∂f/∂y)(x, y₀) | dx

The partial derivative ∂f/∂y is continuous on the compact set [a, b] × [c, d], so it is also uniformly continuous. Thus, given any ε > 0, there exists a δ > 0 such that

‖(x, y) − (x, y₀)‖ < δ ⟹ | (∂f/∂y)(x, y) − (∂f/∂y)(x, y₀) | < ε/(b − a)   (31.7)

for every x ∈ [a, b] and every y ∈ [c, d]. Therefore, for |h| < δ we have

| (F(y₀ + h) − F(y₀))/h − ∫_a^b (∂f/∂y)(x, y₀) dx | < ε   ∀ |h| < δ

that is

∫_a^b (∂f/∂y)(x, y₀) dx − ε < (F(y₀ + h) − F(y₀))/h < ∫_a^b (∂f/∂y)(x, y₀) dx + ε   ∀ |h| < δ

Since this holds for every ε > 0, it follows that

lim_{h→0} (F(y₀ + h) − F(y₀))/h = ∫_a^b (∂f/∂y)(x, y₀) dx

as desired.
G(y) = ∫_{α(y)}^{β(y)} f(x, y) dx   (31.8)

The following result extends Proposition 1157. Formula (31.9) is referred to as Leibniz's rule.

Proposition 1159 Suppose that f : [a, b] × [c, d] ⊆ R² → R and its partial derivative ∂f/∂y are both continuous on [a, b] × [c, d]. If α, β : [c, d] → [a, b] are differentiable, then the function G : [c, d] → R is differentiable on (c, d) and we have

G'(y) = ∫_{α(y)}^{β(y)} (∂f/∂y)(x, y) dx + β'(y) f(β(y), y) − α'(y) f(α(y), y)   (31.9)
Since
$$G(y) = H(\alpha(y), \beta(y), y)$$
the derivative of $G$ with respect to $y$ at a point $y_0 \in (c,d)$ can be calculated by using the chain rule:
$$G'(y_0) = \frac{\partial H}{\partial v}(a_0, b_0, y_0)\,\alpha'(y_0) + \frac{\partial H}{\partial z}(a_0, b_0, y_0)\,\beta'(y_0) + \frac{\partial H}{\partial y}(a_0, b_0, y_0) \qquad (31.10)$$
where $a_0 = \alpha(y_0)$ and $b_0 = \beta(y_0)$. By Proposition 1157 we have that
$$\frac{\partial H}{\partial y}(a_0, b_0, y_0) = \int_{a_0}^{b_0} \frac{\partial}{\partial y} f(x,y)\,dx \qquad (31.11)$$
Example 1160 Let $f(x,y) = x^2 + y^2$, $\alpha(y) = \sin y$ and $\beta(y) = \cos y$. Set
$$G(y) = \int_{\sin y}^{\cos y} \left( x^2 + y^2 \right) dx$$
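As a quick sanity check of Leibniz's rule (31.9), the derivative it gives for this $G$ can be compared with a central finite difference of a numerically computed $G$. The sketch below is an illustration only; the trapezoid integrator, mesh size, and evaluation point are arbitrary choices, not part of the text.

```python
import math

def trapezoid(fn, a, b, n=2000):
    # composite trapezoid rule on [a, b]
    h = (b - a) / n
    return h * (0.5 * (fn(a) + fn(b)) + sum(fn(a + i * h) for i in range(1, n)))

def f(x, y):
    return x * x + y * y

def G(y):
    # G(y) = integral of f(x, y) for x from sin(y) to cos(y)
    return trapezoid(lambda x: f(x, y), math.sin(y), math.cos(y))

def G_prime_leibniz(y):
    # Leibniz's rule (31.9) with alpha(y) = sin(y), beta(y) = cos(y)
    integral = trapezoid(lambda x: 2 * y, math.sin(y), math.cos(y))
    return (integral
            + (-math.sin(y)) * f(math.cos(y), y)   # beta'(y) f(beta(y), y)
            - math.cos(y) * f(math.sin(y), y))     # alpha'(y) f(alpha(y), y)

y0, h = 0.7, 1e-5
central = (G(y0 + h) - G(y0 - h)) / (2 * h)
assert abs(G_prime_leibniz(y0) - central) < 1e-4
```

The two values agree up to the discretization error of the quadrature and the finite difference.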
The extension of Proposition 1157 to the improper case is a delicate task that requires a dominance condition. For the sake of simplicity, in the statement we assume that $I$ is the real line and $J$ a closed and bounded interval. Analogous results, which we omit for brevity, hold when $I$ is a half-line and $J$ an unbounded interval.

The proof of the above result is not simple, so we omit it. Note that the dominance condition (31.14), which is based on the auxiliary function $g$, guarantees, inter alia, that the integral $\int_{-\infty}^{+\infty} f(x,y)\,dx$ converges, thanks to the Comparison Convergence Criterion stated in Corollary 1149.
Here $f(x,y) = \sin x \, e^{-y^2 - x^2}$ and
$$\left| \sin x \, e^{-y^2 - x^2} \right| = |\sin x| \, e^{-y^2 - x^2} \le e^{-x^2} \equiv g(x)$$
Furthermore, $\int_{-\infty}^{+\infty} e^{-x^2}\,dx < +\infty$. The hypotheses of Proposition 1161 are satisfied and so equation (31.15) takes the form
$$F'(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y} \left( \sin x \, e^{-y^2 - x^2} \right) dx = -2y \int_{-\infty}^{+\infty} \sin x \, e^{-y^2 - x^2}\,dx = -2y F(y)$$
Chapter 32
Stieltjes' integral

In many applied sciences, such as probability, statistics and economics, Stieltjes' integral is widely used, as it represents an extension of Riemann's integral. Such an extension can be thought of in the following way: while Riemann's integral is based on summations such as
$$\sum_{k=1}^n m_k (x_k - x_{k-1}) \quad \text{and} \quad \sum_{k=1}^n M_k (x_k - x_{k-1}) \qquad (32.1)$$
Stieltjes' integral is based on summations such as
$$\sum_{k=1}^n m_k \left( g(x_k) - g(x_{k-1}) \right) \quad \text{and} \quad \sum_{k=1}^n M_k \left( g(x_k) - g(x_{k-1}) \right) \qquad (32.2)$$
Clearly (32.1) is a special case of (32.2), with $g(x) = x$. One may ask why one should write and compute summations such as (32.2). Let us recall the meaning of (32.1) itself. In Riemann's integral, every interval $[x_{i-1}, x_i]$ obtained by sectioning $[a,b]$ is measured according to its length $\Delta x_i = x_i - x_{i-1}$. Clearly, taking its length is the most intuitive way to measure an interval. However, it is not the only way: in many problems it may be more natural to measure an interval differently. For example, if $[x_{i-1}, x_i]$ represents production between the levels $x_{i-1}$ and $x_i$, the most appropriate measure for such an interval is the additional cost it entails: if $C(x)$ is the total cost of producing $x$, the measure that must be assigned to $[x_{i-1}, x_i]$ is $C(x_i) - C(x_{i-1})$. If, instead, $[x_{i-1}, x_i]$ represents an interval in which a random variable can take values and $F(x)$ is the probability that the random variable takes a value not larger than $x$, the most natural way to measure $[x_{i-1}, x_i]$ is $F(x_i) - F(x_{i-1})$. Such scenarios are common in economics and in many applications.
In order for Stieltjes' integral to exist, the function $g$ must satisfy some minimal regularity conditions: in particular, $g$ must be at least monotone. No such care is needed in the case of Riemann's integral, since $g(x) = x$ is a continuous, strictly monotone and differentiable function.

As in the case of Riemann's integral, existence also requires conditions on the integrand function $f$. Such properties, as we shall see, remain necessary for Stieltjes' integral as well.
32.1 Definition

Let us consider two functions $f, g : [a,b] \subseteq \mathbb{R} \to \mathbb{R}$, with $f$ bounded and $g$ increasing.¹ For every partition $\pi = \{a = x_0, x_1, \ldots, x_n = b\}$ of $[a,b]$ and for every interval $I_i = [x_{i-1}, x_i]$ we can define the quantities $m_i = \inf_{x \in I_i} f(x)$ and $M_i = \sup_{x \in I_i} f(x)$. The sum
$$I(\pi; f, g) = \sum_{i=1}^n m_i \left( g(x_i) - g(x_{i-1}) \right)$$
is referred to as the left Stieltjes sum, while
$$S(\pi; f, g) = \sum_{i=1}^n M_i \left( g(x_i) - g(x_{i-1}) \right)$$
is referred to as the right Stieltjes sum. It can be easily shown that, for every partition $\pi$, it holds that
$$I(\pi; f, g) \le S(\pi; f, g)$$
When the equality $\sup_\pi I = \inf_\pi S$ holds, we get Stieltjes' integral:

Definition 1163 Let two functions $f, g : [a,b] \to \mathbb{R}$ be given, with $f$ bounded and $g$ increasing. We say that $f$ is Stieltjes integrable with respect to the function $g$ whenever
$$\sup_{\pi \in \Pi([a,b])} I(\pi; f, g) = \inf_{\pi \in \Pi([a,b])} S(\pi; f, g)$$
The common value is called the Stieltjes integral of $f$ and is denoted by $\int_a^b f(x)\,dg(x)$.

For $g(x) = x$ we get Riemann's integral. The functions $f$ and $g$ are called, respectively, the integrand function and the integrator function. For the sake of brevity, we shall often write $\int_a^b f\,dg$, thus omitting the arguments of these functions.
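To make the definition concrete, the left and right Stieltjes sums can be computed on uniform partitions. The sketch below is an illustration, not part of the text: it takes $f(x) = x$ and $g(x) = x^2$ on $[0,1]$, for which the integral equals $\int_0^1 2x^2\,dx = 2/3$ (anticipating Section 32.3); since $f$ is increasing, the infimum and supremum on each cell are attained at its endpoints.

```python
def stieltjes_sums(f, g, a, b, n):
    # left sum I(pi; f, g) and right sum S(pi; f, g) on a uniform partition;
    # inf/sup on each cell are taken at the endpoints, exact for monotone f
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    lower = upper = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        dg = g(x1) - g(x0)
        lower += min(f(x0), f(x1)) * dg
        upper += max(f(x0), f(x1)) * dg
    return lower, upper

I_n, S_n = stieltjes_sums(lambda x: x, lambda x: x * x, 0.0, 1.0, 2000)
assert I_n <= 2 / 3 <= S_n          # the two sums bracket the integral
assert S_n - I_n < 1e-2             # and squeeze together as n grows
```

As the partition is refined, the two sums approach the common value $2/3$ from below and above, exactly as Definition 1163 requires.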
N.B. In the remaining part of the chapter we will tacitly assume that $f$ and $g$ are any two scalar functions defined on $[a,b]$, with $f$ bounded and $g$ increasing. O
Proposition 1164 The function $f$ is Stieltjes integrable with respect to $g$ if for every $\varepsilon > 0$ there exists a partition $\pi \in \Pi([a,b])$ such that $S(\pi; f, g) - I(\pi; f, g) < \varepsilon$.
As for Riemann's integral, it is important to know which classes of functions are integrable. As one may expect, the answer depends on the regularity of both functions $f$ and $g$ (recall that we assumed $g$ to be increasing).

Proposition 1165 The integral $\int_a^b f\,dg$ exists if at least one of the following two conditions is met:

(i) $f$ is continuous;

(ii) $f$ is monotone and $g$ is continuous.

Note that (i) corresponds to the continuity condition for Riemann's integral, while (ii) corresponds to the case in which $f$ is monotone.
Proof (i) The proof relies on the same steps as that of Proposition 1091. Since $f$ is continuous on $[a,b]$, it is also bounded (Weierstrass' Theorem) and uniformly continuous (Theorem 473). Take $\varepsilon > 0$. There exists a $\delta_\varepsilon > 0$ such that condition (32.3) holds. Let $\pi = \{x_i\}_{i=0}^n$ be a partition of $[a,b]$ such that $|\pi| < \delta_\varepsilon$. Thanks to condition (32.3), for every $i = 1, 2, \ldots, n$ we obtain the desired bound.

(ii) Since $g$ is continuous on $[a,b]$, it is also bounded and uniformly continuous. Having fixed an $\varepsilon > 0$, there is a $\delta_\varepsilon > 0$ such that the corresponding uniform-continuity condition holds. Let $\pi = \{x_i\}_{i=0}^n$ be a partition of $[a,b]$ such that $|\pi| < \delta_\varepsilon$. For every pair of consecutive points of such a partition, we have that $g(x_i) - g(x_{i-1}) = |g(x_i) - g(x_{i-1})| < \varepsilon$. The proof now follows the same steps as that of Proposition 1094. Suppose that $f$ is increasing (if $f$ is decreasing the reasoning is analogous). We have that $\sup_{x \in I_i} f(x) = f(x_i)$ and $\inf_{x \in I_i} f(x) = f(x_{i-1})$, so that
$$S(\pi; f, g) - I(\pi; f, g) = \sum_{i=1}^n \sup_{x \in [x_{i-1}, x_i]} f(x) \left( g(x_i) - g(x_{i-1}) \right) - \sum_{i=1}^n \inf_{x \in [x_{i-1}, x_i]} f(x) \left( g(x_i) - g(x_{i-1}) \right)$$
$$= \sum_{i=1}^n f(x_i) \left( g(x_i) - g(x_{i-1}) \right) - \sum_{i=1}^n f(x_{i-1}) \left( g(x_i) - g(x_{i-1}) \right)$$
$$= \sum_{i=1}^n \left( f(x_i) - f(x_{i-1}) \right) \left( g(x_i) - g(x_{i-1}) \right)$$
$$< \varepsilon \sum_{i=1}^n \left( f(x_i) - f(x_{i-1}) \right) = \varepsilon \left( f(b) - f(a) \right)$$
Lastly, we extend Proposition 1092 to Stieltjes' integral, requiring that $g$ not share the possible discontinuities of $f$.

We omit the proof of this remarkable result which, inter alia, generalizes point (i) of Proposition 1165. Note, however, that while Proposition 1092 allowed for infinitely many discontinuities, in this more general setting we restrict ourselves to considering finitely many of them.
32.3 Calculus

When $g$ is differentiable, Stieltjes' integral can be written as a Riemann integral.

Proposition 1167 Let $g$ be differentiable with $g'$ Riemann integrable. Then $f$ is integrable with respect to $g$ if and only if $fg'$ is Riemann integrable; in such a case we have that
$$\int_a^b f(x)\,dg(x) = \int_a^b f(x) g'(x)\,dx \qquad (32.4)$$
² In other words, we require the two functions $f$ and $g$ not to be discontinuous at the same point.
Proof Since $g'$ is Riemann integrable, for any given $\varepsilon > 0$ there exists a partition $\pi$ such that
$$S(g'; \pi) - I(g'; \pi) < \varepsilon$$
that is, denoting by $I_i = [x_{i-1}, x_i]$ the generic $i$-th interval of the partition $\pi$,
$$\sum_{i=1}^n \left( \sup_{x \in I_i} g'(x) - \inf_{x \in I_i} g'(x) \right) \Delta x_i < \varepsilon \qquad (32.5)$$
From (32.5) we also deduce that, for any pair of points $s_i, t_i \in I_i$, we have that
$$\sum_{i=1}^n \left| g'(s_i) - g'(t_i) \right| \Delta x_i < \varepsilon \qquad (32.6)$$
Still referring to the generic interval $I_i$ of the partition, we can observe that, thanks to the differentiability of $g$, there is a point $t_i \in [x_{i-1}, x_i]$ such that
$$\Delta g_i = g(x_i) - g(x_{i-1}) = g'(t_i) \Delta x_i$$
Setting $M = \sup_{[a,b]} |f|$, we then get
$$-M\varepsilon \le \sum_{i=1}^n f(s_i) \Delta g_i - \sum_{i=1}^n f(s_i) g'(s_i) \Delta x_i \le M\varepsilon$$
Note that $S(fg'; \pi) \ge \sum_{i=1}^n f(s_i) g'(s_i) \Delta x_i$, from which $\sum_{i=1}^n f(s_i) \Delta g_i \le S(fg'; \pi) + M\varepsilon$, and so also
$$S(\pi; f, g) \le S(fg'; \pi) + M\varepsilon \qquad (32.7)$$
One can symmetrically prove that
$$I(\pi; f, g) \ge I(fg'; \pi) - M\varepsilon \qquad (32.8)$$
Inequality (32.9) holds for any partition of the interval $[a,b]$ and for every $\varepsilon > 0$. So, for the upper integrals,
$$\overline{\int_a^b} f(x)\,dg(x) = \overline{\int_a^b} f(x) g'(x)\,dx \qquad (32.10)$$
From (32.10) and (32.11) one sees that $fg'$ is Riemann integrable if and only if $f$ is Stieltjes integrable with respect to $g$, in which case we get (32.4).
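Formula (32.4) can be checked numerically. The sketch below is illustrative only (the midpoint tags and mesh size are arbitrary choices): it takes $f(x) = x$ and $g(x) = x^3$ on $[0,1]$, where both sides equal $\int_0^1 3x^3\,dx = 3/4$.

```python
def stieltjes_sum(f, g, a, b, n=4000):
    # Riemann-Stieltjes sum with midpoint tags: sum of f(t_i)(g(x_i) - g(x_{i-1}))
    total = 0.0
    for i in range(n):
        x0 = a + (b - a) * i / n
        x1 = a + (b - a) * (i + 1) / n
        total += f((x0 + x1) / 2) * (g(x1) - g(x0))
    return total

def riemann_sum(h, a, b, n=4000):
    # midpoint Riemann sum of h on [a, b]
    return sum(h(a + (b - a) * (i + 0.5) / n) * (b - a) / n for i in range(n))

lhs = stieltjes_sum(lambda x: x, lambda x: x ** 3, 0.0, 1.0)   # integral of x dg
rhs = riemann_sum(lambda x: x * 3 * x ** 2, 0.0, 1.0)          # integral of x g'(x) dx
assert abs(lhs - 0.75) < 1e-3 and abs(lhs - rhs) < 1e-3
```

Both approximations converge to the same value as the mesh is refined, in line with (32.4).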
This makes computations easier, as the techniques for solving Riemann integrals can also be used for Stieltjes integrals: in particular, integration by substitution and by parts can be used; furthermore, it is not hard to define the generalized Stieltjes integral by following the same steps as for the generalized Riemann integral.

From a theoretical standpoint, Stieltjes' integral substantially extends the reach of Riemann's integral while keeping, also thanks to (32.4), its remarkable analytical properties. This balance between greater generality and analytical tractability is what makes Stieltjes' integral so important.
Let us conclude with a useful variation on this theme (which we will not prove). If $\sigma$ is continuous (hence also Riemann integrable), this proposition follows from the previous one since, thanks to the Second Fundamental Theorem of Calculus, $g$ is differentiable with $g' = \sigma$.
32.4 Properties

Properties similar to those of Riemann's integral hold for Stieltjes'. The only substantial novelty is a linearity property that holds not only with respect to the integrand function $f$, but with respect to the integrator function $g$ as well. Let us list the properties without proving them, as the proofs are analogous to those of Section 30.5.

(iv) Monotonicity:
$$f_1 \le f_2 \implies \int_a^b f_1\,dg \le \int_a^b f_2\,dg$$
Recall the one-sided limits
$$g(x_0^+) = \lim_{x \to x_0^+} g(x) \quad \text{and} \quad g(x_0^-) = \lim_{x \to x_0^-} g(x)$$
so that the jump of $g$ at $x_0$ is
$$g(x_0^+) - g(x_0^-)$$
In other words, Stieltjes' integral is the sum of all the jumps of the integrator at its points of discontinuity, each multiplied by the value of the integrand at that point. Note that, as the integrator is monotone, the jumps are either all positive (if it is increasing) or all negative (if it is decreasing).

⁴ The positivity of $\alpha$ and $\beta$ is required in order to ensure that the integrator function $\alpha g_1 + \beta g_2$ is increasing.
Proof By Proposition 1165, the integral $\int_a^b f\,dg$ exists. We must show that its value is (32.13). Consider a partition $\pi$ of $[a,b]$ fine enough that every interval $I_i = [x_{i-1}, x_i]$ contains at most one point of discontinuity $c_j$, $j = 1, 2, \ldots, n$ (otherwise, it is enough to add at most $n$ points to obtain the desired partition). We thus have $\pi = \{x_0, x_1, \ldots, x_m\}$ with $m \ge n$. For such a partition it holds that
$$I(\pi; f, g) = \sum_{i=1}^m m_i \left( g(x_i) - g(x_{i-1}) \right) \qquad (32.14)$$
where $m_i = \inf_{I_i} f(x)$. Consider the generic $i$-th term of the summation in (32.14), which refers to the interval $I_i$. There are two cases:

1. There exists $j \in \{1, 2, \ldots, n\}$ such that $c_j \in I_i$. In this case, since $I_i$ contains no points of discontinuity of $g$ other than $c_j$, we have that
$$g(x_i) - g(x_{i-1}) = g(c_j^+) - g(c_j^-)$$
and furthermore
$$f(c_j) \ge \inf_{I_i} f(x) = m_i$$
2. No $c_j$ belongs to $I_i$; then $g$ is constant on $I_i$ and $g(x_i) - g(x_{i-1}) = 0$.

Let us denote by $J$ the set of indexes $i \in \{1, 2, \ldots, m\}$ such that $c_j \in I_i$ for some $j \in \{1, 2, \ldots, n\}$. Clearly, $|J| = n$. It follows that
$$I(\pi; f, g) = \sum_{i \in J} m_i \left( g(x_i) - g(x_{i-1}) \right) \le \sum_{j=1}^n f(c_j) \left[ g(c_j^+) - g(c_j^-) \right]$$
So
$$I(\pi; f, g) \le \sum_{i=1}^n f(c_i) \left( g(c_i^+) - g(c_i^-) \right) \le S(\pi; f, g)$$
Since these inequalities also hold for partitions finer than the one considered, we have that
$$\sup_{\pi \in \Pi} I(\pi; f, g) \le \sum_{i=1}^n f(c_i) \left( g(c_i^+) - g(c_i^-) \right) \le \inf_{\pi \in \Pi} S(\pi; f, g)$$
which, since the integral $\int_a^b f\,dg$ exists, implies that
$$\int_a^b f\,dg = \sup_{\pi \in \Pi} I(\pi; f, g) = \inf_{\pi \in \Pi} S(\pi; f, g) = \sum_{i=1}^n f(c_i) \left( g(c_i^+) - g(c_i^-) \right)$$
When, in particular,
$$g(c_i^+) - g(c_i^-) = 1 \qquad \forall i$$
the integral reduces to the sum $\sum_{i=1}^n f(c_i)$: Stieltjes' integral thus includes addition as a particular case. More generally, we shall soon see that the mean value of a random variable can be seen as a Stieltjes integral.
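The jump formula is easy to check directly: for a step integrator, a fine Stieltjes sum collapses onto the jumps. A small illustration (the jump points and sizes below are arbitrary choices made for this sketch):

```python
def make_step_integrator(jumps):
    # g(x) = sum of the jumps located at points <= x (an increasing step function)
    def g(x):
        return sum(size for point, size in jumps if point <= x)
    return g

jumps = [(0.25, 1.0), (0.5, 2.0)]          # discontinuity points and jump sizes
f = lambda x: x * x
g = make_step_integrator(jumps)

# value predicted by the jump formula: sum of f(c_i) times the jump at c_i
predicted = sum(f(c) * s for c, s in jumps)     # 0.0625 * 1 + 0.25 * 2

# fine Stieltjes sum on [0, 1] with midpoint tags
n, total = 4000, 0.0
for i in range(n):
    x0, x1 = i / n, (i + 1) / n
    total += f((x0 + x1) / 2) * (g(x1) - g(x0))

assert abs(predicted - 0.5625) < 1e-12
assert abs(total - predicted) < 1e-3
```

All cells on which $g$ is constant contribute nothing, so only the cells containing a discontinuity survive, exactly as in the proof above.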
Proposition 1171 Given two functions $f, g : [a,b] \to \mathbb{R}$ which are both increasing, it holds that
$$\int_a^b f\,dg + \int_a^b g\,df = f(b) g(b) - f(a) g(a) \qquad (32.17)$$
Proof For every $\varepsilon > 0$ there are two partitions, $\pi = \{x_i\}_{i=0}^n$ and $\pi' = \{y_i\}_{i=0}^n$, of $[a,b]$ such that
$$\left| \int_a^b f\,dg - \sum_{i=1}^n f(x_{i-1}) \left( g(x_i) - g(x_{i-1}) \right) \right| < \frac{\varepsilon}{2}$$
and
$$\left| \int_a^b g\,df - \sum_{i=1}^n g(y_i) \left( f(y_i) - f(y_{i-1}) \right) \right| < \frac{\varepsilon}{2}$$
On the common refinement $\{z_i\}$ of the two partitions, the telescoping identity
$$\sum_{i=1}^n f(z_{i-1}) \left( g(z_i) - g(z_{i-1}) \right) + \sum_{i=1}^n g(z_i) \left( f(z_i) - f(z_{i-1}) \right) = f(b) g(b) - f(a) g(a)$$
holds, which implies
$$\left| \int_a^b f\,dg + \int_a^b g\,df - f(b) g(b) + f(a) g(a) \right| < \varepsilon$$
For $g(x) = x$, we thus obtain the integration by parts formula (30.52) for Riemann's integral.
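Formula (32.17) can also be verified numerically. In the sketch below (the choice of functions, interval, and midpoint tags is illustrative), $f(x) = x^2$ and $g(x) = x^3$ on $[1,2]$, so the right-hand side is $f(2)g(2) - f(1)g(1) = 32 - 1 = 31$.

```python
def stieltjes_sum(f, g, a, b, n=4000):
    # Riemann-Stieltjes sum with midpoint tags
    total = 0.0
    for i in range(n):
        x0 = a + (b - a) * i / n
        x1 = a + (b - a) * (i + 1) / n
        total += f((x0 + x1) / 2) * (g(x1) - g(x0))
    return total

f = lambda x: x ** 2
g = lambda x: x ** 3
# left-hand side of (32.17): integral of f dg plus integral of g df
lhs = stieltjes_sum(f, g, 1.0, 2.0) + stieltjes_sum(g, f, 1.0, 2.0)
rhs = f(2.0) * g(2.0) - f(1.0) * g(1.0)   # = 31
assert abs(lhs - rhs) < 1e-2
```

The two Stieltjes sums add up to the boundary term, as the telescoping argument in the proof predicts.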
Theorem 1172 Let $f$ be continuous and $g$ increasing. If $\varphi : [c,d] \to [a,b]$ is a strictly increasing function, then $f \circ \varphi$ is integrable with respect to $g \circ \varphi$, with
$$\int_c^d f(\varphi(t))\,d(g \circ \varphi)(t) = \int_{\varphi(c)}^{\varphi(d)} f(x)\,dg(x) \qquad (32.18)$$
When $g$ is strictly increasing, the Stieltjes integral can be computed via a Riemann integral. This result complements Proposition 1167, which showed that the same is true, via a different formula, when $g$ is differentiable.
Chapter 33
Moments
In this chapter we outline a study of moments, a notion that plays a fundamental role in
probability theory and, through it, in a number of applications. For us, it is also a way to
illustrate what we learned in the last two chapters.
33.1 Densities
We say that an increasing function $g : \mathbb{R} \to \mathbb{R}$ is a probability integrator if
$$\lim_{x \to -\infty} g(x) = 0 \quad \text{and} \quad \lim_{x \to +\infty} g(x) = 1$$
Example 1173 (i) Given any two scalars $a < b$, consider the probability integrator
$$g(x) = \begin{cases} 0 & \text{if } x < a \\[4pt] \dfrac{x-a}{b-a} & \text{if } a \le x \le b \\[4pt] 1 & \text{if } x > b \end{cases}$$
It is induced by the uniform density $\delta(x) = 1/(b-a)$ on $[a,b]$, because
$$\int_{-\infty}^x \delta(t)\,dt = \int_a^x \frac{1}{b-a}\,dt = g(x) \qquad \forall x \in [a,b]$$
and $\int_{-\infty}^{+\infty} \delta(x)\,dx = 1$.

(ii) The Gaussian integrator is
$$g(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-\frac{t^2}{2}}\,dt$$
33.2 Moments

The improper Stieltjes integral, denoted $\int_{-\infty}^{+\infty} f(x)\,dg(x)$, can be defined in a way similar to the improper Riemann integral. Properties (i)-(v) of Section 32.4 continue to hold for it. The next important definition rests upon this notion.

Definition 1174 The $n$-th moment of an integrator function $g$ is given by the Stieltjes integral
$$\mu_n = \int_{-\infty}^{+\infty} x^n\,dg(x) \qquad (33.1)$$

For instance, $\mu_1$ is the first moment (often called average or mean) of $g$, $\mu_2$ is its second moment, $\mu_3$ is its third moment, and so on.
Proposition 1175 If the moment $\mu_n$ exists, then all lower moments $\mu_k$, with $k \le n$, exist.

To assume the existence of higher and higher moments is, therefore, a more and more demanding requirement. For instance, to assume the existence of the second moment is a stronger hypothesis than to assume that of the first one.

Proof To ease matters, assume there is a scalar $a$ such that $g(a) = 0$, so that $\mu_n = \int_a^{+\infty} x^n\,dg(x)$. Since $x^k = o(x^n)$ if $k < n$, the version for improper Stieltjes integrals of Proposition 1150-(ii) ensures the convergence of $\int_a^{+\infty} x^k\,dg(x)$, that is, the existence of $\mu_k$.
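When the integrator has a density, (33.1) reduces to a Riemann integral of $x^n \delta(x)$. For the uniform density on $[0,1]$ the moments are $\mu_n = 1/(n+1)$, which the sketch below (an illustration; the midpoint quadrature and step count are arbitrary) confirms numerically; it also shows the moments shrinking in $n$, consistent with the idea that higher moments are more demanding.

```python
def moment(n, steps=20000):
    # mu_n = integral of x^n over [0, 1] (uniform density, delta(x) = 1),
    # computed with a midpoint Riemann sum
    return sum(((i + 0.5) / steps) ** n for i in range(steps)) / steps

mus = [moment(n) for n in range(6)]
for n, mu in enumerate(mus):
    assert abs(mu - 1 / (n + 1)) < 1e-6   # exact value is 1 / (n + 1)
# on [0, 1] the moment sequence is strictly decreasing
assert all(a > b for a, b in zip(mus, mus[1:]))
```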
In this case, we are back to Riemann integration and we say directly that $\mu_n$ is the $n$-th moment of the density $\delta$.
Given a sequence $\{c_n\}$ of scalars in $[0,1]$, is there an integrator $g$ such that, for each $n$, the term $c_n$ is exactly its $n$-th moment $\mu_n$?

The question amounts to asking whether sequences of moments have a characterizing property, which $\{c_n\}$ should then satisfy in order to have the desired property. This question was first posed by Stieltjes himself in the same 1894-95 articles where he developed his notion of integral. Indeed, to provide a setting in which to properly address the problem of moments was a main motivation for his integral (which, as we just remarked, is indeed the natural setting in which to define moments).

Next we present a most beautiful answer, given by Felix Hausdorff in the early 1920s. To state it, we need to go back to the finite differences of Chapter 10.
In words, a sequence is totally monotone if its finite differences keep alternating in sign across their orders: $(-1)^k \Delta^k x_n \ge 0$ for all $k$ and $n$. A totally monotone sequence is positive because $\Delta^0 x_n = x_n$, as well as decreasing because $\Delta x_n \le 0$ (Lemma 372).
Proof We prove the "only if" part, the converse being significantly more complicated. So, let $\{x_n\}$ be a sequence of moments (33.3). It suffices to show that
$$(-1)^k \Delta^k x_n = \int_0^1 t^n (1-t)^k\,dg(t) \ge 0$$
We proceed by induction on $k$. For $k = 0$ we trivially have $(-1)^0 \Delta^0 x_n = x_n = \int_0^1 t^n\,dg(t)$ for all $n$. Assume $(-1)^{k-1} \Delta^{k-1} x_n = \int_0^1 t^n (1-t)^{k-1}\,dg(t)$ for all $n$. Then,
$$(-1)^k \Delta^k x_n = (-1)^k \left( \Delta^{k-1} x_{n+1} - \Delta^{k-1} x_n \right) = -\left[ (-1)^{k-1} \Delta^{k-1} x_{n+1} - (-1)^{k-1} \Delta^{k-1} x_n \right]$$
$$= -\left[ \int_0^1 t^{n+1} (1-t)^{k-1}\,dg(t) - \int_0^1 t^n (1-t)^{k-1}\,dg(t) \right]$$
$$= \int_0^1 t^n (1-t)^{k-1} (1-t)\,dg(t) = \int_0^1 t^n (1-t)^k\,dg(t)$$
as desired.
The characterizing property of moment sequences is, thus, total monotonicity. It is truly remarkable that a property of finite differences is able to pin down moment sequences. Note that this result requires the Stieltjes integral: in the "if" part, the integrator whose moments turn out to be the terms of the given totally monotone sequence might well be non-differentiable (and so the Riemann version (33.2) might not hold).
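Hausdorff's characterization can be illustrated on the moment sequence $x_n = 1/(n+1)$ of the uniform integrator on $[0,1]$: its finite differences alternate in sign across all orders. A sketch using exact rational arithmetic (the number of terms and orders checked is an arbitrary choice):

```python
from fractions import Fraction

def forward_diff(seq):
    # (Delta x)_n = x_{n+1} - x_n
    return [b - a for a, b in zip(seq, seq[1:])]

# moments of the uniform integrator on [0, 1]: x_n = 1 / (n + 1)
seq = [Fraction(1, n + 1) for n in range(12)]
for k in range(6):
    # total monotonicity: (-1)^k Delta^k x_n >= 0 for every n
    assert all((-1) ** k * x >= 0 for x in seq)
    seq = forward_diff(seq)
```

In this case $(-1)^k \Delta^k x_n = \int_0^1 t^n (1-t)^k\,dt$, which is a positive Beta integral, so the assertions hold at every order.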
We can then use Proposition 1161 to establish the existence and differentiability of the moment generating function. In particular, if there exist $\varepsilon > 0$ and a positive function $g : \mathbb{R} \to \mathbb{R}$ such that $\int_{-\infty}^{+\infty} g(x)\,dx < +\infty$ and, for every $y \in [-\varepsilon, \varepsilon]$, the integrand $e^{yx} \delta(x)$ is dominated by $g(x)$, then $F$ is differentiable on $(-\varepsilon, \varepsilon)$. At $y = 0$ we get
$$F'(0) = \mu_1$$
The derivative at $0$ of the moment generating function is, thus, the first moment of the density.

If there exists a positive function $h : \mathbb{R} \to \mathbb{R}$ such that $\int_{-\infty}^{+\infty} h(x)\,dx < +\infty$ and, for every $y \in [-\varepsilon, \varepsilon]$,
$$\left| x e^{yx} \delta(x) \right| = |x| \, e^{yx} \delta(x) \le h(x) \qquad \forall x \in \mathbb{R}$$
then, by Proposition 1161, $F : (-\varepsilon, \varepsilon) \to \mathbb{R}$ is twice differentiable, with
$$F''(y) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial y} \left( x e^{yx} \delta(x) \right) dx = \int_{-\infty}^{+\infty} x^2 e^{yx} \delta(x)\,dx$$
At $y = 0$ we get
$$F''(0) = \mu_2$$
By proceeding in this way (when possible), with higher order derivatives we get
$$F'''(0) = \mu_3, \qquad F''''(0) = \mu_4$$
and, in general,
$$F^{(n)}(0) = \mu_n$$
The derivative of order $n$ at $0$ of the moment generating function is, thus, the $n$-th moment of the density. This fundamental property justifies the name of this function.
Example 1180 For the Gaussian density $\delta(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$ we have
$$F(y) = \int_{-\infty}^{+\infty} e^{yx} \delta(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{yx} e^{-\frac{x^2}{2}}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}\left( x^2 - 2yx \right)}\,dx$$
$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}\left( x^2 - 2yx + y^2 - y^2 \right)}\,dx = e^{\frac{y^2}{2}} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x-y)^2}\,dx$$
where in the fourth equality we have added and subtracted $y^2$. But (30.74) of Chapter 30 implies $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x-y)^2}\,dx = 1$, so $F(y) = e^{\frac{y^2}{2}}$. We have $F'(y) = y e^{\frac{y^2}{2}}$ and $F''(y) = e^{\frac{y^2}{2}} \left( 1 + y^2 \right)$, so $\mu_1 = F'(0) = 0$ and $\mu_2 = F''(0) = 1$. N
The next example shows that not all densities have a moment generating function; in this case there is no $\varepsilon > 0$ such that the integral (33.4) is finite.

Here the first moment does not exist either. By the comparison criterion for improper Riemann integrals, this implies $\mu_n = +\infty$ for every $n \ge 1$. This density has no moments of any order. N
Suppose that the moment generating function has derivatives of all orders. By Theorem 355,
$$e^{yx} = 1 + yx + \frac{y^2 x^2}{2} + \frac{y^3 x^3}{3!} + \cdots + \frac{y^n x^n}{n!} + \cdots = \sum_{n=0}^{\infty} \frac{y^n x^n}{n!}$$
So, it is tempting to write:
$$F(y) = \int_{-\infty}^{+\infty} e^{yx} \delta(x)\,dx = \int_{-\infty}^{+\infty} \sum_{n=0}^{\infty} \frac{y^n x^n}{n!} \delta(x)\,dx = \sum_{n=0}^{\infty} \int_{-\infty}^{+\infty} \frac{y^n x^n}{n!} \delta(x)\,dx = \sum_{n=0}^{\infty} \frac{y^n}{n!} \mu_n$$
Under suitable hypotheses, spelled out in more advanced courses, it is legitimate to give in to this temptation. Moment generating functions can then be expressed as a series of the moments of the density.
Part IX
Appendices
Appendix A
Permutations
A.1 Generalities
Combinatorics is an important area of discrete mathematics, useful in many applications.
Here we focus on permutations, a fundamental combinatorial topic that is important to
understand some of the topics of the book.
We start with a simple problem. We have at our disposal three pairs of pants and five T-shirts. If there are no color pairings that hurt our aesthetic sense, in how many possible ways can we dress? The answer is very simple: in $3 \cdot 5 = 15$ ways. Indeed, let us call $a$, $b$, $c$ the pairs of pants and $1$, $2$, $3$, $4$, $5$ the T-shirts: since the choice of a certain T-shirt does not impose any (aesthetic) restriction on the choice of the pants, the possible pairings are

a1, a2, a3, a4, a5
b1, b2, b3, b4, b5
c1, c2, c3, c4, c5
We can therefore conclude that if we have to make two independent choices, one among $n$ different alternatives and the other among $m$ different alternatives, then the total number of choices is $n \cdot m$. In particular, if $A$ and $B$ are two sets with $n$ and $m$ elements respectively, the Cartesian product $A \times B$, which is the set of the ordered pairs $(a,b)$ with $a \in A$ and $b \in B$, has $n \cdot m$ elements.
What has been said can be easily extended to the case of more than two choices: if we have to make several choices, none of which restricts the others, the total number of choices is the product of the numbers of alternatives of each choice.
Example 1182 (i) How many possible Italian licence plates are there? They have the form AA 000 AA, with two letters, three digits, and again two letters. There are 22 usable letters and, obviously, 10 digits. The number of (different) plates is, therefore, $22 \cdot 22 \cdot 10 \cdot 10 \cdot 10 \cdot 22 \cdot 22 = 234{,}256{,}000$. (ii) In a multiple choice test, each question asks students to select one of three possible answers. If there are 13 questions, then the overall number of possible selections is $3 \cdot 3 \cdots 3 = 3^{13} = 1{,}594{,}323$. N
A.2 Permutations
Intuitively, a permutation of n distinct objects is a possible arrangement of these objects.
For instance, with three objects a, b, c there are 6 permutations:
abc , acb , bac , bca , cab , cba (A.1)
We can formalize this notion through bijective functions.
Example 1185 (i) A deck of 52 cards can be shuffled in $52!$ different ways. (ii) Six passengers can occupy a six-seat car in $6! = 720$ different ways. N

Indeed, Lemma 329 showed that $a^n = o(n!)$. The already very fast exponentials are actually slower than factorials, which definitely deserve their exclamation mark.
A.3 Anagrams
We now drop the requirement that the objects be distinct and allow for repetitions. Specifically, in this section we consider $n$ objects¹ of $h \le n$ different types, each type $i$ with multiplicity $k_i$, $i = 1, \ldots, h$, and $\sum_{i=1}^h k_i = n$. For instance, consider the 6 objects
$$a, a, b, b, b, c$$

Proposition 1186 The number of distinct arrangements, called permutations with repetitions (or anagrams), is
$$\frac{n!}{k_1! \, k_2! \cdots k_h!} \qquad (A.2)$$

Example 1187 (i) The possible anagrams of the word ABA are $3!/(2! \, 1!) = 3$. They are ABA, AAB, BAA. (ii) The possible anagrams of the word MAMMA are $5!/(3! \, 2!) = 120/(6 \cdot 2) = 10$. N
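Formula (A.2) is easy to check by brute force for short words; the sketch below counts distinct orderings directly and compares them with the multinomial formula.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def anagrams(word):
    # n! / (k_1! k_2! ... k_h!), formula (A.2)
    count = factorial(len(word))
    for k in Counter(word).values():
        count //= factorial(k)
    return count

assert anagrams("ABA") == 3
assert anagrams("MAMMA") == 10
# brute-force check: distinct orderings of the letters of MAMMA
assert len(set(permutations("MAMMA"))) == 10
```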
In the important two-type case, $h = 2$, we have $k$ objects of one type and $n - k$ of the other type. By (A.2), the number of distinct arrangements is
$$\frac{n!}{k! \, (n-k)!} \qquad (A.3)$$
This expression is usually denoted by
$$\binom{n}{k}$$
It is called the binomial coefficient and read "$n$ over $k$". In particular,
$$\binom{n}{k} = \frac{n!}{k! \, (n-k)!} = \frac{n (n-1) \cdots (n-k+1)}{k!}$$
with
$$\binom{n}{0} = \frac{n!}{0! \, n!} = 1$$
¹ Note that, because of repetitions, these $n$ objects do not form a set $X$. The notion of "multiset" is sometimes used for collections in which repetitions are permitted.
A simple but useful identity is
$$\binom{n}{k} = \binom{n}{n-k}$$
It captures a natural symmetry: the number of distinct arrangements remains the same, regardless of which of the two types we focus on.
Example 1188 (i) In a parking lot, spots can be either free or busy. Suppose that 15 out of the 20 available spots are busy. The possible arrangements of the 5 free spots (or, symmetrically, of the 15 busy spots) are
$$\binom{20}{5} = \binom{20}{15} = 15{,}504$$
(ii) We repeat an experiment 100 times: each time we record either a "success" or a "failure", so a string of 100 outcomes like $FSFF \ldots S$ results. Suppose that we have recorded 92 "successes" and 8 "failures". The number of different strings that may produce this record is
$$\binom{100}{92} = \binom{100}{8} = 186{,}087{,}894{,}300$$
N
$$(a+b)^1 = a + b$$
$$(a+b)^2 = a^2 + 2ab + b^2$$
$$(a+b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3$$
In general,
$$(a+b)^n = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + \binom{n}{n-1} a b^{n-1} + b^n = \sum_{k=0}^n \binom{n}{k} a^{n-k} b^k \qquad (A.4)$$
Indeed, each term of the expansion of $(a+b)^n$ can be calculated by choosing one of the two terms ($a$ or $b$) in each of the $n$ factors and taking the product of the terms so chosen; we then sum all the products obtained in this way. The product $a^{n-k} b^k$ is obtained by choosing $n-k$ times the first term $a$ and the remaining $k$ times the second term $b$. This can be done in $\binom{n}{n-k} = \binom{n}{k}$ different ways: the factor $a^{n-k} b^k$ is, therefore, obtained in $\binom{n}{k}$ different ways. This proves the result.

Formula (A.4) is called Newton's binomial formula. It motivates the name of binomial coefficients for the integers $\binom{n}{k}$. In particular,
$$(1+x)^n = \sum_{k=0}^n \binom{n}{k} x^k$$
Example 1190 A set of $n$ elements has $2^n$ subsets. Indeed, there is only one, $1 = \binom{n}{0}$, subset with 0 elements (the empty set), $n = \binom{n}{1}$ subsets with only one element, $\binom{n}{2}$ subsets with two elements, ..., and finally only one, $1 = \binom{n}{n}$, subset, the set itself, with all $n$ elements. N
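Newton's formula (A.4), the symmetry identity, and the subset count of Example 1190 are easy to test mechanically:

```python
from math import comb

def newton_expansion(a, b, n):
    # right-hand side of Newton's binomial formula (A.4)
    return sum(comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))

assert newton_expansion(2, 5, 7) == (2 + 5) ** 7
assert comb(20, 5) == comb(20, 15) == 15504     # Example 1188 (i)
# Example 1190: the binomial coefficients of order n sum to 2^n
assert sum(comb(10, k) for k in range(11)) == 2 ** 10
```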
Appendix B

Notions of trigonometry

B.1 Generalities

We call trigonometric circumference the circumference centered at the origin and of radius 1, oriented counterclockwise and on which we move starting from the point of coordinates $(1,0)$.
[Figure: the trigonometric circumference]
Clearly, each point on the circumference determines an angle between the positive horizontal axis and the straight line joining the point with the origin; vice versa, each angle determines a point on the circumference. This correspondence between points and angles
can be, equivalently, viewed as a correspondence between points and arcs of circumference.
[Figure: points on the trigonometric circumference and the angles α and α' they determine]
Angles are usually measured in either degrees or radians. A degree is the 360th part of a round angle (corresponding to a complete round of the circumference); a radian is an (apparently strange) unit of measure that assigns measure $2\pi$ to a round angle, of which it is therefore the $2\pi$-th part. We will use the radian as the unit of measure of angles because it presents some advantages over the degree. In any case, the next table reports some equivalences between degrees and radians.

degrees: 0, 30, 45, 60, 90, 180, 270, 360
radians: 0, $\pi/6$, $\pi/4$, $\pi/3$, $\pi/2$, $\pi$, $3\pi/2$, $2\pi$

Angles that differ by one or more complete rounds of the circumference are identical: writing $\alpha$ or $\alpha + 2k\pi$, with $k \in \mathbb{Z}$, is the same. We will therefore always take $0 \le \alpha < 2\pi$.
Fix a point $P = (P_1, P_2)$ on the trigonometric circumference. The sine of the angle $\alpha$ (or of the arc) determined by it is the ordinate $P_2$ of the point $P$; its cosine is instead the abscissa $P_1$ of the point $P$.

The sine and the cosine of the angle (or arc) $\alpha$ are denoted, respectively, by $\sin \alpha$ and $\cos \alpha$. The sine is positive in the I and II quadrants, and negative in the III and IV quadrants; the cosine is positive in the I and IV quadrants, and negative in the II and III quadrants. For example,

$\alpha$:&nbsp; 0, $\pi/4$, $\pi/2$, $\pi$, $3\pi/2$, $2\pi$
$\sin \alpha$: 0, $\sqrt{2}/2$, 1, 0, $-1$, 0
$\cos \alpha$: 1, $\sqrt{2}/2$, 0, $-1$, 0, 1
Next we list some formulae that we do not prove (it would suffice to prove the first two, because the others follow from them).

Addition formulae:
$$\sin(\alpha + \beta) = \sin \alpha \cos \beta + \sin \beta \cos \alpha, \qquad \cos(\alpha + \beta) = \cos \alpha \cos \beta - \sin \alpha \sin \beta$$
and
$$\sin(\alpha - \beta) = \sin \alpha \cos \beta - \sin \beta \cos \alpha, \qquad \cos(\alpha - \beta) = \cos \alpha \cos \beta + \sin \alpha \sin \beta \qquad (B.4)$$
Half-angle formulae:
$$\sin \frac{\alpha}{2} = \sqrt{\frac{1 - \cos \alpha}{2}}, \qquad \cos \frac{\alpha}{2} = \sqrt{\frac{1 + \cos \alpha}{2}}$$
Prostaferesis formulae:
$$\sin p + \sin q = 2 \sin \frac{p+q}{2} \cos \frac{p-q}{2}, \qquad \sin p - \sin q = 2 \cos \frac{p+q}{2} \sin \frac{p-q}{2}$$
and
$$\cos p + \cos q = 2 \cos \frac{p+q}{2} \cos \frac{p-q}{2}, \qquad \cos p - \cos q = -2 \sin \frac{p+q}{2} \sin \frac{p-q}{2}$$
We close with a few classical theorems that show how trigonometry is intimately linked to the study of triangles. In these theorems $a$, $b$, $c$ denote the lengths of the three sides of a triangle and $\alpha$, $\beta$, $\gamma$ the angles opposite to them.

Theorem 1191 Sides are proportional to the sines of their opposite angles, that is,
$$\frac{a}{\sin \alpha} = \frac{b}{\sin \beta} = \frac{c}{\sin \gamma}$$
The next result generalizes Pythagoras' Theorem, which is the special case when the triangle is right and side $a$ is the hypotenuse (indeed, $\cos \alpha = \cos \frac{\pi}{2} = 0$).
B.3 Perpendicularity

The trigonometric circumference consists of the points $x \in \mathbb{R}^2$ of unitary norm, that is, $\|x\| = 1$. Hence, any point $x = (x_1, x_2) \in \mathbb{R}^2$ can be moved back onto the circumference by dividing it by its norm $\|x\|$, since
$$\left\| \frac{x}{\|x\|} \right\| = 1$$
The following picture illustrates this.

Denoting by $\alpha$ the angle that $x$ determines, it follows that
$$\sin \alpha = \frac{x_2}{\|x\|} \quad \text{and} \quad \cos \alpha = \frac{x_1}{\|x\|} \qquad (B.5)$$
that is,
$$x = \left( \|x\| \cos \alpha, \, \|x\| \sin \alpha \right)$$
Such a trigonometric representation of the vector $x$ is called polar. The components $\|x\| \cos \alpha$ and $\|x\| \sin \alpha$ are called polar coordinates.
The angle $\alpha$ can be expressed through the inverse trigonometric functions $\arcsin x$, $\arccos x$ and $\arctan x$. To this end, observe that
$$\tan \alpha = \frac{\sin \alpha}{\cos \alpha} = \frac{x_2 / \|x\|}{x_1 / \|x\|} = \frac{x_2}{x_1}$$
so that
$$\alpha = \arctan \frac{x_2}{x_1} = \arccos \frac{x_1}{\|x\|} = \arcsin \frac{x_2}{\|x\|}$$
The equality $\alpha = \arctan(x_2 / x_1)$ is especially important because it allows us to express the angle $\alpha$ as a function of the coordinates of the point $x = (x_1, x_2)$.
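In code, the polar representation is easiest to recover with the two-argument arctangent `atan2`, which handles every quadrant (plain $\arctan(x_2/x_1)$ is only valid for $x_1 > 0$). A short illustration; the sample vectors are arbitrary:

```python
import math

def polar(x1, x2):
    # x = (||x|| cos a, ||x|| sin a), formula (B.5)
    norm = math.hypot(x1, x2)
    alpha = math.atan2(x2, x1)   # correct in every quadrant
    return norm, alpha

norm, alpha = polar(3.0, 4.0)
assert abs(norm - 5.0) < 1e-12
assert abs(norm * math.cos(alpha) - 3.0) < 1e-12
assert abs(norm * math.sin(alpha) - 4.0) < 1e-12

# perpendicularity via the inner product: (3, 4) and (-4, 3)
assert 3.0 * (-4.0) + 4.0 * 3.0 == 0.0
```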
Let $x$ and $y$ be two vectors of the plane $\mathbb{R}^2$ that determine the angles $\alpha$ and $\beta$. By (B.4), we have
$$\cos \alpha \cos \beta + \sin \alpha \sin \beta = \cos(\alpha - \beta)$$
that is, by (B.5),
$$\frac{x \cdot y}{\|x\| \|y\|} = \cos(\alpha - \beta)$$
where $\alpha - \beta$ is the angle that is the difference of the angles determined by the two points. Such an angle is right, i.e., the vectors $x$ and $y$ are "perpendicular", when
$$\frac{x \cdot y}{\|x\| \|y\|} = \cos \frac{\pi}{2} = 0$$
that is, if and only if $x \cdot y = 0$. In other words, two vectors of the plane $\mathbb{R}^2$ are perpendicular when their inner product is zero.
Appendix C

Elements of intuitive logic

C.1 Propositions

In this chapter we introduce some basic notions of logic. Though, "logically", these notions should come at the beginning of the book, they are best appreciated after having learned some mathematics (even if in a logically disordered way). This is why this chapter is in the Appendix, leaving it to the reader to judge when it is best to read it.
We call a proposition a statement that can be either true or false. For example, "ravens are black" and "in the year 1965 it rained in Milan" are propositions. On the contrary, the statement "in the year 1965 it was cold in Milan" is not a proposition, unless we specify the meaning of cold, for example with the proposition "in the year 1965 the temperature went below zero in Milan".

We will denote propositions by letters such as $p, q, \ldots$. Moreover, for the sake of brevity, we will denote by 1 and 0, respectively, the truth and the falsity of a proposition: these are called truth values.
C.2 Operations
The operations among propositions are:
(i) Negation. Let $p$ be a proposition; its negation, denoted by $\neg p$, is the proposition that is true when $p$ is false and false when $p$ is true. We can summarize the definition in the following truth table:

p  ¬p
1  0
0  1

which reports the truth values of $p$ and $\neg p$. For instance, if $p$ is "in the year 1965 it rained in Milan", then $\neg p$ is "in the year 1965 it did not rain in Milan".
(ii) Conjunction. Let $p$ and $q$ be two propositions; the conjunction of $p$ and $q$, denoted by $p \wedge q$, is the proposition that is true when $p$ and $q$ are both true and false when at least one of them is false. The truth table is:

p  q  p∧q
1  1  1
1  0  0
0  1  0
0  0  0

For instance, if $p$ is "in the year 1965 it rained in Milan" and $q$ is "in the year 1965 the temperature went below zero in Milan", then $p \wedge q$ is "in the year 1965 it rained in Milan and the temperature went below zero".
(iii) Disjunction. Let $p$ and $q$ be two propositions; the disjunction of $p$ and $q$, denoted by $p \vee q$, is the proposition that is true when at least one between $p$ and $q$ is true and false when both of them are false.¹ The truth table is:

p  q  p∨q
1  1  1
1  0  1
0  1  1
0  0  0

For instance, with the previous examples of $p$ and $q$, $p \vee q$ is "in the year 1965 it rained in Milan or the temperature went below zero".
(iv) Conditional. Let $p$ and $q$ be two propositions; the conditional, denoted by $p \Rightarrow q$, is the proposition that is false when $p$ is true and $q$ is false, and true otherwise. The truth table is:

p  q  p⇒q
1  1  1
1  0  0
0  1  1
0  0  1     (C.1)

The conditional is therefore true if, when $p$ is true, $q$ is also true, or if $p$ is false (in which case the truth value of $q$ is irrelevant). The proposition $p$ is called the antecedent and $q$ the consequent. For instance, suppose the antecedent $p$ is "I go on vacation" and the consequent $q$ is "I go to the sea"; the conditional $p \Rightarrow q$ is "If I go on vacation, then I go to the sea".
(v) Biconditional. The biconditional of $p$ and $q$, denoted by $p \Leftrightarrow q$, is the conjunction of the two conditionals $p \Rightarrow q$ and $q \Rightarrow p$. The truth table is:

p  q  p⇒q  q⇒p  p⇔q
1  1  1    1    1
1  0  0    1    0
0  1  1    0    0
0  0  1    1    1

The biconditional is, therefore, true when the two involved implications are both true, that is, when $p$ and $q$ have the same truth value. With the last example of $p$ and $q$, the biconditional $p \Leftrightarrow q$ is "I go on vacation if and only if I go to the sea".
These five logical operations allow us to build new propositions from old ones. Starting
from the three propositions p, q and r, through negation, disjunction and conditional we can
build, for example, the proposition

¬((p ∨ ¬q) =⇒ r)
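Compound propositions like this one can be evaluated mechanically over all truth assignments. A minimal sketch in Python (the helper names are ours, not the text's):

```python
from itertools import product

def NOT(p): return 1 - p
def AND(p, q): return p & q
def OR(p, q): return p | q
def IMPLIES(p, q): return OR(NOT(p), q)

def truth_table(formula, n):
    """Truth value of `formula` under every 0/1 assignment to its n variables."""
    return {vals: formula(*vals) for vals in product((1, 0), repeat=n)}

# The compound proposition ¬((p ∨ ¬q) =⇒ r)
table = truth_table(lambda p, q, r: NOT(IMPLIES(OR(p, NOT(q)), r)), 3)
```

Each of the eight rows of the resulting table corresponds to one line of the proposition's truth table.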
O.R. The true-false dichotomy originates in the Eleatic school, which based its dialectics
upon it (Section 1.8). Apparently, it first appears as “[a thing] is or it is not” in the poem
of Parmenides (trans. Raven). A serious challenge to the universal validity of the true-false
dichotomy has been posed by some, old and new, paradoxes. We already encountered the set
theoretic paradox of Russell (Section 1.1.4). A simpler, much older, paradox is that of the
liar: consider the self-referential proposition “this proposition is false”. Is it true or false?
Maybe it is both.2 Be that as it may, for all relevant matters — in mathematics, let alone in
the empirical sciences — the dichotomy can be safely assumed.
Moreover, we have:

p ¬p p ∧ ¬p p ∨ ¬p
1 0 0 1
0 1 0 1

That is, p ∧ ¬p is always false (the law of non-contradiction) and p ∨ ¬p is always true (the
law of excluded middle). If p is the proposition “all ravens are black”, the contradiction p ∧ ¬p
is “all ravens are both black and non-black” and the tautology p ∨ ¬p is “all ravens are either
black or non-black”.
The De Morgan's laws state that

¬(p ∧ q) ≡ ¬p ∨ ¬q and ¬(p ∨ q) ≡ ¬p ∧ ¬q

They can be proved through the truth tables; we limit ourselves to the first law:

p q p ∧ q ¬(p ∧ q) ¬p ¬q ¬p ∨ ¬q
1 1 1 0 0 0 0
1 0 0 1 0 1 1
0 1 0 1 1 0 1
0 0 0 1 1 1 1

The table shows that the truth values of ¬(p ∧ q) and of ¬p ∨ ¬q are identical, as desired.
Note an interesting duality: the laws of non-contradiction and of the excluded middle can
be derived one from the other via De Morgan's laws.
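The truth-table argument is just an exhaustive check over the four assignments, which can be sketched in Python (function name is ours):

```python
from itertools import product

def check_de_morgan():
    """Verify both De Morgan laws on all four truth assignments."""
    for p, q in product((0, 1), repeat=2):
        NOT = lambda x: 1 - x
        first = (NOT(p & q) == (NOT(p) | NOT(q)))   # ¬(p∧q) ≡ ¬p∨¬q
        second = (NOT(p | q) == (NOT(p) & NOT(q)))  # ¬(p∨q) ≡ ¬p∧¬q
        if not (first and second):
            return False
    return True
```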
The contrapositive law states that

(p =⇒ q) ≡ (¬q =⇒ ¬p) (C.2)

Indeed:

p q p =⇒ q ¬p ¬q ¬q =⇒ ¬p
1 1 1 0 0 1
1 0 0 0 1 0
0 1 1 1 0 1
0 0 1 1 1 1

Moreover, the negation of a conditional satisfies

¬(p =⇒ q) ≡ (p ∧ ¬q) (C.3)

Indeed:

p q p =⇒ q ¬(p =⇒ q) p ∧ ¬q
1 1 1 0 0
1 0 0 1 1
0 1 1 0 0
0 0 1 0 0
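Both equivalences can again be confirmed by brute force; a Python sketch (helper names are ours):

```python
from itertools import product

def implies(p, q):
    """Material conditional on 0/1 truth values: false only when p=1, q=0."""
    return (1 - p) | q

def check_contrapositive_and_negation():
    for p, q in product((0, 1), repeat=2):
        # (C.2): p ⇒ q ≡ ¬q ⇒ ¬p
        if implies(p, q) != implies(1 - q, 1 - p):
            return False
        # (C.3): ¬(p ⇒ q) ≡ p ∧ ¬q
        if (1 - implies(p, q)) != (p & (1 - q)):
            return False
    return True
```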
N.B. Given two equivalent propositions, one of them is a tautology if and only if the other
one is so too. O
C.4 Deduction

An equivalence is a biconditional which is a tautology, i.e., which is always true. In a similar
way, we call implication a conditional which is a tautology, that is, (p =⇒ q) ≡ 1. In this
case, if p is true then also q is true.3 We say that q is a logical consequence of p, written
p ⊨ q.

The antecedent p is now called hypothesis and the consequent q thesis. Naturally, we
have p ≡ q when we have simultaneously p ⊨ q and q ⊨ p.

In our naive setup, a theorem is a proposition of the form p ⊨ q, that is, an implication.
The proof is an argument that shows that the conditional p =⇒ q is, actually, an
implication.4 In order to do this it is necessary to prove that, if the hypothesis p is true, then
also the thesis q is true. Usually we choose one among the following three different types of
proof:
3. When p is false the implication is automatically true, as the truth table (C.1) shows.
4. In these introductory notes we remain vague about what a logical “argument” is, leaving a more
detailed analysis to more advanced courses. We expect, however, that readers can (intuitively) recognize, and
elaborate, such arguments.
(a) direct proof: p ⊨ q, i.e., to prove directly that, if p is true, also q is so;

(b) proof by contraposition: ¬q ⊨ ¬p, i.e., to prove directly that, if q is false, also p is so;

(c) proof by contradiction (reductio ad absurdum): p ∧ ¬q ⊨ r ∧ ¬r, i.e., to prove that the
conditional p ∧ ¬q =⇒ r ∧ ¬r is a tautology (i.e., that, if p is true and q is false, we
reach a contradiction r ∧ ¬r).
The validity of (b) follows from the equivalence (C.2). The proof by contraposition is,
basically, an upside-down direct proof (momentarily, Theorem 1199 will be proved by
contraposition). Let us then focus on the two main types of proofs, direct and by contradiction.

N.B. (i) When both p ⊨ q and q ⊨ p hold, the theorem takes the form of an equivalence
p ≡ q. The implications p ⊨ q and q ⊨ p are independent and each of them requires its own
proof (this is why in the book we studied separately the “if” and the “only if”). (ii) When,
as is often the case, the hypothesis is the conjunction of several propositions, we write
p1 ∧ ··· ∧ pn ⊨ q. If Γ = {p1, ..., pn}, we say that q is a logical consequence of Γ, written
Γ ⊨ q. O
C.4.1 Direct

Sometimes p ⊨ q can be proved with a straight argument. Direct proofs are, however, often
articulated in several steps, in a divide et impera spirit. The next lemma is key.

Lemma If p ⊨ r and r ⊨ q, then p ⊨ q.
Proof Assume p ⊨ r and r ⊨ q. We have to show that p =⇒ q is a tautology, that is, that
if p is true, then q is true. Assume that p is true. Then, r is true because p ⊨ r. In turn,
this implies that q is true because r ⊨ q.

By iterating the lemma, a direct proof can thus be articulated through auxiliary propositions
r1, ..., rn according to the scheme

p ⊨ r1
r1 ⊨ r2
⋮ (C.4)
rn ⊨ q

The n auxiliary propositions ri break up the direct argument in n steps, thus forming a chain
of reasoning. We can write the scheme horizontally as:

p ⊨ r1 ⊨ r2 ⊨ ··· ⊨ rn ⊨ q
Example 1195 (i) Assume that p is “n² + 1 is odd” and q is “n is even”. To prove p ⊨ q,
let us consider the auxiliary proposition r: “n² is even”. The implication p ⊨ r is obvious,
while the implication r ⊨ q will be proved momentarily (Theorem 1198). Jointly, these two
implications provide a direct proof p ⊨ r ⊨ q of p ⊨ q, that is, of the proposition “if n² + 1
is odd, then n is even”. (ii) Assume that p is “the scalar function f is differentiable” and q
is “the scalar function f is integrable”. To prove p ⊨ q it is natural to consider the auxiliary
proposition r: “the scalar function f is continuous”. The implications p ⊨ r and r ⊨ q are
basic calculus results that, jointly, provide a direct proof p ⊨ r ⊨ q of p ⊨ q, that is, of the
proposition “if the scalar function f is differentiable, then it is integrable”. N
C.4.2 By contradiction

The proof by contradiction rests on the following equivalence, which can be checked through
a truth table (r being any proposition):

p q p ∧ ¬q r ∧ ¬r p =⇒ q p ∧ ¬q =⇒ r ∧ ¬r
1 1 0 0 1 1
1 0 1 0 0 0
0 1 0 0 1 1
0 0 0 0 1 1

In symbols,

(p =⇒ q) ≡ (p ∧ ¬q =⇒ r ∧ ¬r) (C.5)

The proof by contradiction is the most intriguing (recall Section 1.8 on the birth of the
deductive method). We illustrate it with one of the gems of Greek mathematics that we saw
in the first chapter. For brevity, we do not repeat the proof of the first chapter and just
present its logical analysis.
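Equivalence (C.5) can also be checked mechanically over all eight assignments; a Python sketch (helper names are ours):

```python
from itertools import product

def implies(p, q):
    """Material conditional on 0/1 truth values."""
    return (1 - p) | q

def check_c5():
    """(p ⇒ q) ≡ (p ∧ ¬q ⇒ r ∧ ¬r) for every p, q, r."""
    for p, q, r in product((0, 1), repeat=3):
        contradiction = r & (1 - r)   # always 0
        if implies(p, q) != implies(p & (1 - q), contradiction):
            return False
    return True
```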
Theorem 1197 √2 ∉ Q.

Logical analysis In this, as in other theorems, it might seem that there is no hypothesis, but
it is not so: simply, the hypothesis is concealed. For example, here the concealed hypothesis
is “the rules of elementary algebra apply”. Let p be such concealed hypothesis, let q be the
thesis “√2 ∉ Q” and let r be the proposition “m/n is reduced to its lowest terms”. The
scheme of the proof is p ∧ ¬q ⊨ r ∧ ¬r, i.e., if the rules of elementary algebra apply, the
negation of the thesis leads to a contradiction.
An important special case of the equivalence (C.5) is when the role of r is played by the
hypothesis p itself. In this case, (C.5) becomes

(p =⇒ q) ≡ (p ∧ ¬q =⇒ p ∧ ¬p)

that is,

(p =⇒ q) ≡ (p ∧ ¬q =⇒ ¬p)

In words, it is necessary to show that the hypothesis and the negation of the thesis imply,
jointly, the negation of the hypothesis. Let us see an example.
Theorem 1198 If n² is even, then n is even.

Proof Let us assume, by contradiction, that n is odd. In such a case n² is odd, which
contradicts the hypothesis.

Logical analysis Let p be the hypothesis “n² is even” and q the thesis “n is even”. The
scheme of the proof is p ∧ ¬q ⊨ ¬p.
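The statement can be spot-checked on an initial range of integers; a quick Python sketch:

```python
def check_even_square(limit):
    """Check that, for 1 <= n <= limit, n² even implies n even."""
    return all(n % 2 == 0 for n in range(1, limit + 1) if (n * n) % 2 == 0)
```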
C.4.3 Summing up

Proofs require, in general, some inspiration: there are no recipes or mechanical rules that
can help us in finding, in a proof by contradiction, an auxiliary proposition r that determines
the contradiction or, in a direct proof, the auxiliary propositions ri that permit us to articulate
a chain of reasoning.
As to terminology, the implication p ⊨ q can be read in different, but equivalent, ways:

(i) p implies q;

(ii) if p, then q;

(iii) p only if q;

(iv) q if p;

(v) p is a sufficient condition for q;

(vi) q is a necessary condition for p.

The choice among these ways is a matter of expositional convenience. In a similar way,
the equivalence p ≡ q can be read as “p if and only if q” or as “p is a necessary and sufficient
condition for q”.

For example, the next simple result shows that the implication “a > 1 ⊨ a² > 1” is true,
i.e., that “a > 1 is a sufficient condition for a² > 1”, i.e., that “a² > 1 is a necessary condition
for a > 1”.
C.5 The logic of scientific inquiries

Lemma We have p ⊨ q if and only if v(p) ≤ v(q) for every valuation v ∈ V.

Proof Let p ⊨ q. If p is true, also q is true (both values equal to 1); if p is false (value 0), q
can be true or false (value either 0 or 1). Thus, v(p) ≤ v(q) for all v ∈ V. The converse is
easily checked.
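On the two-element truth-value set, the characterization says that p ⊨ q exactly when no valuation makes p true and q false. A Python sketch over all valuations of two atoms (helper names are ours):

```python
from itertools import product

def is_implication(formula_p, formula_q, n):
    """p ⊨ q iff v(p) <= v(q) under every 0/1 valuation v of n atoms."""
    return all(
        formula_p(*v) <= formula_q(*v)
        for v in product((0, 1), repeat=n)
    )

# p ∧ q ⊨ p holds, while p ⊨ p ∧ q does not.
conj_implies_p = is_implication(lambda p, q: p & q, lambda p, q: p, 2)
p_implies_conj = is_implication(lambda p, q: p, lambda p, q: p & q, 2)
```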
If all propositions in Γ are true, so are their logical consequences. We say that Γ is
consistent if no contradiction is a logical consequence of Γ, and complete if, for every
proposition of the language, either it or its negation is a logical consequence of Γ.

A scientific inquiry starts with a language L that describes the empirical phenomenon
under investigation. Let v* be the true configuration of the phenomenon. A scientific theory
is a consistent set Γ ⊆ P whose elements are assumed to be axioms, that is, to be true under
the (unknown) true configuration v*. All logical consequences of Γ, established via theorems,
are then true under such assumption. If the set of axioms is complete, the truth value of all
propositions in P can be, in principle, decided. So, the function v* is identified.

To decide whether a scientific theory Γ = {p1, ..., pn} is true we have to check whether
v*(pi) = 1 for each i = 1, ..., n.8 If n is large, operationally this might be complicated
(infeasible if Γ is infinite). In contrast, to falsify the theory it is enough to exhibit, directly, a
proposition of Γ that is false or, indirectly, a consequence of Γ that is false. This operational
asymmetry between verification and falsification (emphasized by Karl Popper in the 1930s)
is an important methodological aspect. Indirect falsification is, in general, the kind of
falsification that one might hope for. It is the so-called testing of the implications of a
scientific theory. In this
7. The importance of propositions whose truth value is independent of any interpretation was pointed out
by Ludwig Wittgenstein in his famous Tractatus (the term tautology is due to him).
8. For instance, special relativity is based on two axioms: p = “invariance of the laws of physics in all inertial
frames of reference”, q = “the velocity of light in vacuum is the same in all inertial frames of reference”. If v*
is the true physical configuration, the theory is true if v*(p) = v*(q) = 1.
indirect case, however, it is unclear which one of the posited axioms actually fails: in fact,
¬(p1 ∧ ··· ∧ pn) ≡ ¬p1 ∨ ··· ∨ ¬pn. If not all the posited axioms have the same status, only
some of them being “core” axioms (as opposed to auxiliary ones), it is then unclear how
serious the falsification is. Indeed, falsification is often a chimera (especially in the social
sciences), as even the highly stylized setup of this section should suggest.
C.6 Predicates and quantifiers

By writing

∃x ∈ R, x² = 1 (C.7)

we would assert a (simple) truth: there is some real number (there are actually two of them:
x = ±1) whose square is 1.
proposition “∀x ∈ X, p1(x) ∧ ··· ∧ pn(x)” would combine, and so magnify, these two sources
of asymmetry).

In contrast, the existential proposition (C.8) can be verified via an element x ∈ X such
that p(x) is true. Of course, if X is large (let alone if it is infinite), it may be operationally
not obvious how to find such an element. Be that as it may, falsification is in much bigger
trouble: to verify that proposition (C.8) is false we should check that, for all x ∈ X, the
proposition p(x) is false. Operationally, existential propositions are typically not falsifiable.
N.B. (i) In the book we will often write “p(x) for every x ∈ X” in the form

p(x) ∀x ∈ X
C.6.2 Algebra

In a sense, ∀ and ∃ represent the negation one of the other. So9

¬(∃x, p(x)) ≡ ∀x, ¬p(x)

and, symmetrically,

¬(∀x, p(x)) ≡ ∃x, ¬p(x)

For example, we can equally assert that

¬(∀x, x² = 1) or ∃x, x² ≠ 1

(respectively: it is not true that x² = 1 for every x, and it is true that for some x we have
x² ≠ 1).
More generally,

¬(∀x, ∃y, p(x, y)) ≡ ∃x, ∀y, ¬p(x, y)

For example, let p(x, y) be the proposition “x + y² = 0”. We can equally assert that

¬(∀x, ∃y, x + y² = 0)

(it is not true that, for every x ∈ R, we can find a value of y ∈ R such that the sum x + y²
is zero: it is sufficient to take x = 5) or

∃x, ∀y, x + y² ≠ 0

(it is true that there exists some value of x ∈ R such that, for every choice of y ∈ R, we have
x + y² ≠ 0: it is sufficient to take x = 5, or indeed any x > 0).
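On a finite domain the quantifier duality can be checked exhaustively; a Python sketch over a small integer grid (the grid is our choice, not the text's):

```python
def check_quantifier_duality(domain, p):
    """¬(∀x ∃y, p(x,y)) ≡ ∃x ∀y, ¬p(x,y) on a finite domain."""
    left = not all(any(p(x, y) for y in domain) for x in domain)
    right = any(all(not p(x, y) for y in domain) for x in domain)
    return left == right

# p(x, y): "x + y² = 0" on the integers -3, ..., 3
grid = range(-3, 4)
ok = check_quantifier_duality(grid, lambda x, y: x + y * y == 0)
```

The duality holds for any predicate p, so the check succeeds whatever predicate is supplied.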
9. To ease notation, in the quantifiers we omit the clause “∈ X”.
Recall that a set {x¹, ..., x^m} of vectors has been called linearly independent if

α1 x¹ + α2 x² + ··· + αm x^m = 0 =⇒ α1 = α2 = ··· = αm = 0

The set {x^i}_{i=1}^m has been, instead, called linearly dependent if it is not linearly
independent, i.e., if there exists a set {αi}_{i=1}^m of real numbers, not all equal to zero, such that
α1 x¹ + α2 x² + ··· + αm x^m = 0.

We can write these notions by making explicit the role of predicates. Let p(α1, ..., αm) and
q(α1, ..., αm) be the predicates “α1 x¹ + α2 x² + ··· + αm x^m = 0” and “α1 = α2 = ··· = αm = 0”,
respectively. The set {x^i}_{i=1}^m is linearly independent when

∀{αi}_{i=1}^m, p(α1, ..., αm) =⇒ q(α1, ..., αm)

and linearly dependent when

∃{αi}_{i=1}^m, ¬(p(α1, ..., αm) =⇒ q(α1, ..., αm))

that is, by (C.3), when

∃{αi}_{i=1}^m, p(α1, ..., αm) ∧ ¬q(α1, ..., αm)
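The existential reading of dependence says that a single witness {αi} suffices. A Python sketch with a hypothetical example (the vectors and coefficients below are ours):

```python
def is_zero_combination(alphas, vectors, tol=1e-12):
    """Check p(α1,...,αm): the combination Σ αi·xⁱ is the zero vector."""
    dim = len(vectors[0])
    combo = [sum(a * v[j] for a, v in zip(alphas, vectors)) for j in range(dim)]
    return all(abs(c) < tol for c in combo)

# (1,0), (0,1), (1,1) are dependent: α = (1, 1, -1) is a witness
vectors = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
witness = (1.0, 1.0, -1.0)
dependent = is_zero_combination(witness, vectors) and any(a != 0 for a in witness)
```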
Appendix D

Mathematical induction

D.1 Generalities

Suppose that we want to prove that a proposition p(n), formulated for every natural number
n, is true for every such number n. Intuitively, it is sufficient to show that the “initial”
proposition p(1) is true and that the truth of each proposition p(n) implies that of the
“subsequent” one p(n + 1). Next we formalize this domino argument:1

Theorem 1201 (Induction principle) Let p(n) be a proposition stated in terms of each
natural number n. Suppose that:

(i) p(1) is true;

(ii) for each n, if p(n) is true, then p(n + 1) is true.

Then p(n) is true for every natural number n.

Proof Suppose, by contradiction, that proposition p(n) is false for some n. Denote by n0
the smallest such n, which exists since every non-empty collection of natural numbers has
a smallest element.2 By (i), n0 > 1. Moreover, by the definition of n0, the proposition
p(n0 − 1) is true. By (ii), p(n0) is true, a contradiction.
Thus, a proof by induction is articulated in two steps:

(i) Initial step: prove that p(1) is true;

(ii) Induction step: prove that, for each n, if p(n) is true then p(n + 1) is true.

We illustrate this important type of proof by determining the sum of some important
series.
1. There are many soldiers, one next to the other. The first has the “right scarlet fever”, a rare form of
scarlet fever that instantaneously contaminates whoever is at the right of the sick person. All the soldiers
catch it because the first one infects the second one, the second one infects the third one, and so on and so forth.
2. In the set-theoretic jargon, we say that N is a well-ordered set.
(i) Let us show that

1 + 2 + ··· + n = ∑_{s=1}^{n} s = n(n + 1)/2

Initial step. For n = 1 the property is trivially true:

1 = 1(1 + 1)/2

Induction step. Assume it is true for n = k, that is,

∑_{s=1}^{k} s = k(k + 1)/2

We must prove that it is true also for n = k + 1, i.e., that

∑_{s=1}^{k+1} s = (k + 1)(k + 2)/2

Indeed3

∑_{s=1}^{k+1} s = ∑_{s=1}^{k} s + (k + 1) = k(k + 1)/2 + k + 1 = (k + 1)(k + 2)/2
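The closed form can be cross-checked numerically against the plain sum; a Python sketch:

```python
def gauss_sum(n):
    """Closed form n(n+1)/2 for the sum 1 + 2 + ... + n."""
    return n * (n + 1) // 2
```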
(ii) Let us show that

1² + 2² + ··· + n² = ∑_{s=1}^{n} s² = n(n + 1)(2n + 1)/6

Initial step. For n = 1 the property is trivially true:

1² = 1(1 + 1)(2 + 1)/6

Induction step. By proceeding as above we then have:

∑_{s=1}^{k+1} s² = ∑_{s=1}^{k} s² + (k + 1)² = k(k + 1)(2k + 1)/6 + (k + 1)²
= (k + 1)[k(2k + 1) + 6(k + 1)]/6 = (k + 1)(2k² + 7k + 6)/6
= (k + 1)(k + 2)(2k + 3)/6

as desired.
3. Alternatively, this sum can be derived by observing that the sum of the first and of the last addend is
n + 1, the sum of the second one and of the second-last one is still n + 1, etc. There are n/2 pairs and therefore
the sum is (n + 1)n/2.
(iii) Let us show that

1³ + 2³ + ··· + n³ = ∑_{s=1}^{n} s³ = (∑_{s=1}^{n} s)² = n²(n + 1)²/4

Initial step. For n = 1 the property is trivially true:

1³ = 1²(1 + 1)²/4

Induction step. By proceeding as above we have:

∑_{s=1}^{k+1} s³ = ∑_{s=1}^{k} s³ + (k + 1)³ = k²(k + 1)²/4 + (k + 1)³
= (k + 1)²[k² + 4(k + 1)]/4 = (k + 1)²(k + 2)²/4
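Both closed forms admit the same kind of numerical cross-check; a Python sketch:

```python
def sum_squares(n):
    """Closed form n(n+1)(2n+1)/6 for 1² + ... + n²."""
    return n * (n + 1) * (2 * n + 1) // 6

def sum_cubes(n):
    """Closed form n²(n+1)²/4 for 1³ + ... + n³."""
    return n * n * (n + 1) * (n + 1) // 4
```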
D.2 The harmonic series of Mengoli

Theorem The harmonic series ∑_{k=1}^{∞} 1/k diverges.

The proof is based on a couple of lemmas, the second of which is proven by induction.

Lemma For every integer k ≥ 2,

1/(k − 1) + 1/k + 1/(k + 1) ≥ 3/k

Proof Consider the convex function f : (0, ∞) → (0, ∞) defined by f(x) = 1/x. Because

k = (1/3)(k − 1) + (1/3)k + (1/3)(k + 1)

Jensen's inequality implies

1/k = f(k) = f((1/3)(k − 1) + (1/3)k + (1/3)(k + 1)) ≤ (1/3)(f(k − 1) + f(k) + f(k + 1))
= (1/3)(1/(k − 1) + 1/k + 1/(k + 1))

as desired.
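The inequality of the lemma can be spot-checked numerically; a Python sketch:

```python
def mengoli_inequality_holds(k):
    """Check 1/(k-1) + 1/k + 1/(k+1) >= 3/k for an integer k >= 2."""
    return 1 / (k - 1) + 1 / k + 1 / (k + 1) >= 3 / k
```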
Let s_n = ∑_{k=1}^{n} x_k be the partial sum of the harmonic series x_k = 1/k.

Lemma For every n ≥ 1, s_{3n+1} ≥ 1 + s_n.

Proof Let us proceed by induction. Initial step: n = 1. We apply the previous lemma for
k = 3:

s_{3·1+1} = s_4 = 1 + 1/2 + 1/3 + 1/4 ≥ 1 + 3/3 = 1 + s_1

Induction step: let us assume that the statement holds for n ≥ 1. We prove that it holds
for n + 1. We apply the previous lemma for k = 3n + 3:

s_{3(n+1)+1} = s_{3n+4} = s_{3n+1} + 1/(3n + 2) + 1/(3n + 3) + 1/(3n + 4)
≥ s_n + 1 + 1/(3n + 2) + 1/(3n + 3) + 1/(3n + 4)
≥ s_n + 1 + 3/(3n + 3) = s_n + 1 + 1/(n + 1) = s_{n+1} + 1

that proves the induction step. In conclusion, the result holds thanks to the induction
principle.
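This bound, which drives the divergence argument, can also be spot-checked with exact rational arithmetic; a Python sketch:

```python
from fractions import Fraction

def partial_sum(n):
    """Exact partial sum s_n = 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def lemma_holds(n):
    """Check s_{3n+1} >= 1 + s_n exactly."""
    return partial_sum(3 * n + 1) >= 1 + partial_sum(n)
```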
Proof of the theorem Since the harmonic series has positive terms, the sequence of its
partial sums {s_n} is monotonically increasing. Therefore, it either converges or diverges. By
contradiction, let us assume that it converges, i.e., s_n ↑ L < ∞. From the last lemma it
follows that

L = lim_n s_{3n+1} ≥ lim_n (1 + s_n) = 1 + lim_n s_n = 1 + L

which is a contradiction.
Appendix E

Cast of characters
Index

Correspondence
  budget, 765
  demand, 768
  feasibility, 767
  inverse, 765
  of the envelope, 765
  solution, 767
Cosecant, 875
Cosine, 874
Cost
  marginal, 501
Cotangent, 875
Countable, 160
Cramer's rule, 380
Criterion
  comparison, 202
  general of convergence, 208
  of comparison for series, 235
  of Sylvester-Jacobi, 614
  of the ratio for sequences, 203
  of the root for sequences, 205
  of the root for series, 258
  ratio, 239, 257
Curve, 108
  indifference, 122, 156
  level, 118
Cusp, 508
De Morgan's laws, 10, 882
Decay
  exponential, 221
Density, 29
Derivative, 501
  higher order, 527
  left, 507
  of compounded function, 517
  of the inverse function, 519
  of the product, 514
  of the quotient, 515
  of the sum, 514
  partial, 534, 537
  right, 507
  second, 527
  third, 527
  unilateral, 507
Determinant, 360
Diagonal
  principal, 335
Difference, 7
Difference quotient, 499, 501
Differentiability with continuity, 527
Differential, 524
  total, 547
Differentiation under the integral sign, 842
Direct sum, 485
Discontinuity
  essential, 309
  jump, 309
  non-removable, 309
  removable, 309
Distance (Euclidean), 87
Divergence
  of improper integrals, 828
  of sequences, 182
  of series, 230
Domain, 107
  natural, 151
  of derivability, 506, 539
Dual space, 329
Edgeworth box, see Pareto optimum
Element
  of a sequence, see Term of a sequence
  of a vector, see Component of a vector
Equivalence, 882
Expansion
  asymptotic, 626
  polynomial, 599
  polynomial of MacLaurin, 602
  polynomial of Taylor, 602
Extended real line, 36, 183
Factorial, 868
Forms of indetermination, 37, 197
Formula
  binomial of Newton, 871
  multinomial, 871
  of Euler, 567
  of Hille, 629
  of MacLaurin, 602
  of Taylor, 602
Frontier, 91
Walras' Law, 451