Convex Analysis and Optimization
Chapter 1
Dimitri P. Bertsekas
These notes were developed for the needs of the 6.291 class at
M.I.T. (Spring 2001). They are copyright-protected, but they
may be reproduced freely for noncommercial purposes.
Contents
2. Lagrange Multipliers
2.1. Introduction to Lagrange Multipliers
2.2. Enhanced Fritz John Optimality Conditions
2.3. Informative Lagrange Multipliers
2.4. Pseudonormality and Constraint Qualifications
2.5. Exact Penalty Functions
2.6. Using the Extended Representation
2.7. Extensions to the Nondifferentiable Case
2.8. Notes and Sources
3. Lagrangian Duality
3.1. Geometric Multipliers
3.2. Duality Theory
3.3. Linear and Quadratic Programming Duality
3.4. Strong Duality Theorems
3.4.1. Convex Cost – Linear Constraints
3.4.2. Convex Cost – Convex Constraints
3.5. Notes and Sources
These lecture notes were developed for the needs of a graduate course at
the Electrical Engineering and Computer Science Department at M.I.T.
They focus selectively on a number of fundamental analytical and com-
putational topics in (deterministic) optimization that span a broad range
from continuous to discrete optimization, but are connected through the
recurring theme of convexity, Lagrange multipliers, and duality. These top-
ics include Lagrange multiplier theory, Lagrangian and conjugate duality,
and nondifferentiable optimization. The notes contain substantial portions
that are adapted from my textbook “Nonlinear Programming: 2nd Edi-
tion,” Athena Scientific, 1999. However, the notes are more advanced,
more mathematical, and more research-oriented.
As part of the course I have also decided to develop in detail those
aspects of the theory of convex sets and functions that are essential for an
in-depth coverage of Lagrange multipliers and duality. I have long thought
that convexity, aside from being an eminently useful subject in engineering
and operations research, is also an excellent vehicle for assimilating some
of the basic concepts of analysis within an intuitive geometrical setting.
Unfortunately, the subject’s coverage in mathematics and engineering cur-
ricula is scant and incidental. I believe that at least part of the reason
is that while there are a number of excellent books on convexity, as well
as a true classic (Rockafellar’s 1970 book), none of them is well suited for
teaching nonmathematicians who form the largest part of the potential
audience.
I have therefore tried in these notes to make convex analysis accessible
by limiting somewhat its scope and by emphasizing its geometrical charac-
ter, while at the same time maintaining mathematical rigor. The coverage
of the theory is significantly extended in the exercises, whose detailed so-
lutions are posted on the internet. I have included as many insightful
illustrations as I could come up with, and I have tried to use geometric
visualization as a principal tool for maintaining the students’ interest in
mathematical proofs. To highlight a contrast in style, Rockafellar’s mar-
velous book contains no figures at all!
Dimitri P. Bertsekas
[email protected]
Spring 2001
(c) Convex sets have a nonempty relative interior. In other words, when
viewed within the smallest affine set containing it, a convex set has a
nonempty interior. Thus convex sets avoid the analytical and compu-
tational optimization difficulties associated with “thin” and “curved”
constraint surfaces.
(d) A nonconvex function can be “convexified” while maintaining the opti-
mality of its global minima, by forming the convex hull of the epigraph
of the function.
(e) The existence of a global minimum of a convex function over a convex
set is conveniently characterized in terms of directions of recession
(see Section 1.3).
(f) Polyhedral convex sets (those specified by linear equality and inequal-
ity constraints) are characterized in terms of a finite set of extreme
points and extreme directions. This is the basis for finitely terminat-
ing methods for linear programming, including the celebrated simplex
method (see Section 1.6).
(g) Convex functions are continuous and have nice differentiability prop-
erties. In particular, a real-valued convex function is directionally
differentiable at any point. Furthermore, while a convex function
need not be differentiable, it possesses subgradients, which are nice
and geometrically intuitive substitutes for a gradient (see Section 1.7).
Just like gradients, subgradients figure prominently in optimality con-
ditions and computational algorithms.
(h) Convex functions are central in duality theory. Indeed, the dual prob-
lem of a given optimization problem (discussed in Chapters 3 and 4)
consists of minimization of a convex function over a convex set, even
if the original problem is not convex.
(i) Closed convex cones are self-dual with respect to polarity. In
words, the polar of the polar cone C⊥ (C⊥ is the set of vectors that
form a nonpositive inner product with all vectors in a closed and
convex cone C) is equal to C. This simple and geometrically intuitive
property (discussed in Section 1.5) underlies important aspects of
Lagrange multiplier theory.
(j) Convex, lower semicontinuous functions are self-dual with respect to
conjugacy. It will be seen in Chapter 4 that a certain geometrically
motivated conjugacy operation on a given convex, lower semicontinu-
ous function generates a convex, lower semicontinuous function, and
when applied for a second time regenerates the original function. The
conjugacy operation is central in duality theory, and has a nice inter-
pretation that can be used to visualize and understand some of the
most profound aspects of optimization.
Notation
We denote by <n the set of n-dimensional real vectors. For any x ∈ <n ,
we use xi to indicate its ith coordinate, also called its ith component.
{x1 + x2 | x1 ∈ X1 , x2 ∈ X2 }.
We use a similar notation for the sum of any finite number of subsets. In
the case where one of the subsets consists of a single vector x̄, we simplify
this notation as follows:
x̄ + X = {x̄ + x | x ∈ X}.
of <n1 +···+nm .
X ⊥ = {y | y 0 x = 0, ∀ x ∈ X}.
Matrices
For any matrix A, we use Aij , [A]ij , or aij to denote its ijth element. The
transpose of A, denoted by A0 , is defined by [A0 ]ij = aji . For any two
matrices A and B of compatible dimensions, the transpose of the product
matrix AB satisfies (AB)0 = B 0 A0 .
If X is a subset of <n and A is an m × n matrix, then the image of
X under A is denoted by AX (or A · X if this enhances notational clarity):
AX = {Ax | x ∈ X}.
R(A) = N (A0 )⊥ .
Another way to state this result is that given vectors a1 , . . . , an ∈ <m (the
columns of A) and a vector x ∈ <m , we have x0 y = 0 for all y such that
S1⊥ + S2⊥ = R([B1' B2']) = N([B1; B2])⊥ = (N(B1) ∩ N(B2))⊥ = (S1 ∩ S2)⊥,
where [B1; B2] denotes the matrix with blocks B1 stacked on top of B2.
à n
!1/2
X
kxk = (x0 x)1/2 = |xi |2 .
i=1
The space <n , equipped with this norm, is called a Euclidean space. We
will use the Euclidean norm almost exclusively in this book. In particular,
in the absence of a clear indication to the contrary, k · k will denote the
Euclidean norm. Two important results for the Euclidean norm are:
Two other important norms are the maximum norm ‖·‖∞ (also called
sup-norm or ℓ∞-norm), defined by
‖x‖∞ = max_{1≤i≤n} |xi|,
and the ℓ1-norm ‖·‖1, defined by
‖x‖1 = Σ_{i=1}^n |xi|.
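As a quick numerical illustration of these norms (a sketch in Python with an arbitrarily chosen vector, not an example from the text), one can compute all three and check the standard comparisons ‖x‖∞ ≤ ‖x‖ ≤ ‖x‖1:

    import numpy as np

    x = np.array([3.0, -4.0, 1.0])          # an arbitrary vector in R^3
    norm_2 = np.sqrt(x @ x)                  # Euclidean norm (x'x)^{1/2}
    norm_inf = np.max(np.abs(x))             # maximum norm
    norm_1 = np.sum(np.abs(x))               # l1-norm
    assert norm_inf <= norm_2 <= norm_1      # standard comparison of the three norms
    print(norm_2, norm_inf, norm_1)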
Sequences
(This is somewhat paradoxical, since we have that the sup of a set is less
than its inf, but works well for our analysis.) If sup X is equal to a scalar
x that belongs to the set X, we say that x is the maximum point of X and
we often write
x = sup X = max X.
Similarly, if inf X is equal to a scalar x that belongs to the set X, we often
write
x = inf X = min X.
Thus, when we write max X (or min X) in place of sup X (or inf X, re-
spectively) we do so just for emphasis: we indicate that it is either evident,
or it is known through earlier analysis, or it is about to be shown that the
maximum (or minimum, respectively) of the set X is attained at one of its
points.
Given a scalar sequence {xk }, the supremum of the sequence, denoted
by supk xk , is defined as sup{xk | k = 1, 2, . . .}. The infimum of a sequence
(b) {xk } converges if and only if lim inf k→∞ xk = lim supk→∞ xk
and, in that case, both of these quantities are equal to the limit
of xk .
(c) If xk ≤ yk for all k, then
(d) We have
o(·) Notation
h(x) = o(‖x‖^p)
if
lim_{k→∞} h(xk) / ‖xk‖^p = 0,
Let k · k be a given norm in <n . For any ² > 0 and x∗ ∈ <n , consider
the sets
{x | ‖x − x∗‖ < ε},    {x | ‖x − x∗‖ ≤ ε}.
The first set is open and is called an open sphere centered at x∗ , while the
second set is closed and is called a closed sphere centered at x∗ . Sometimes
the terms open ball and closed ball are used, respectively.
Proposition 1.1.6:
(a) The union of finitely many closed sets is closed.
(b) The intersection of closed sets is closed.
(c) The union of open sets is open.
(d) The intersection of finitely many open sets is open.
(e) A set is open if and only if all of its elements are interior points.
(f) Every subspace of <n is closed.
(g) A set X is compact if and only if every sequence of elements of
X has a subsequence that converges to an element of X.
(h) If {Xk } is a sequence of nonempty and compact sets such that
Xk ⊃ Xk+1 for all k, then the intersection ∩_{k=0}^∞ Xk is nonempty
and compact.
Sequences of Sets
in which case X is said to be the limit of {Xk }. The inner and outer limits
are closed (possibly empty) sets. If each set Xk consists of a single point
xk , lim supk→∞ Xk is the set of limit points of {xk }, while lim inf k→∞ Xk
is just the limit of {xk } if {xk } converges, and otherwise it is empty.
Continuity
Proposition 1.1.9:
(a) The composition of two continuous functions is continuous.
(b) Any vector norm on <n is a continuous function.
(c) Let f : <n 7→ <m be continuous, and let Y ⊂ <m be open
(respectively, closed). Then the inverse image of Y , {x ∈ <n |
f(x) ∈ Y }, is open (respectively, closed).
(d) Let f : <n 7→ <m be continuous, and let X ⊂ <n be compact.
Then the forward image of X, {f(x) | x ∈ X}, is compact.
Matrix Norms
It is easily verified that for any vector norm, the above equation defines a
bona fide matrix norm having all the required properties.
Note that by the Schwarz inequality (Prop. 1.1.2), we have
By reversing the roles of x and y in the above relation and by using the
equality y 0 Ax = x0 A0 y, it follows that kAk = kA0 k.
Proposition 1.1.10:
(a) Let A be an n × n matrix. The following are equivalent:
(i) The matrix A is nonsingular.
(ii) The matrix A0 is nonsingular.
(iii) For every nonzero x ∈ <n , we have Ax ≠ 0.
(iv) For every y ∈ <n , there is a unique x ∈ <n such that
Ax = y.
(v) There is an n × n matrix B such that AB = I = BA.
(vi) The columns of A are linearly independent.
(vii) The rows of A are linearly independent.
(b) Assuming that A is nonsingular, the matrix B of statement (v)
(called the inverse of A and denoted by A−1 ) is unique.
(c) For any two square invertible matrices A and B of the same
dimensions, we have (AB)−1 = B −1 A−1 .
Note that the only use of complex numbers in this book is in relation
to eigenvalues and eigenvectors. All other matrices or vectors are implicitly
assumed to have real components.
‖A−1‖ = 1 / min{ |λ1|, |λn| }.
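Assuming A is symmetric and nonsingular, as in the surrounding discussion, this formula is easy to confirm numerically; the following sketch (with an arbitrary 2 × 2 matrix chosen for illustration) compares ‖A−1‖ against 1/min{|λ1|, |λn|}:

    import numpy as np

    # A symmetric nonsingular matrix chosen only for illustration.
    A = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
    eigvals = np.linalg.eigvalsh(A)                   # real eigenvalues of the symmetric matrix
    norm_A_inv = np.linalg.norm(np.linalg.inv(A), 2)  # norm induced by the Euclidean norm
    assert np.isclose(norm_A_inv, 1.0 / np.min(np.abs(eigvals)))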
Proposition 1.1.16:
(a) The sum of two positive semidefinite matrices is positive semidef-
inite. If one of the two matrices is positive definite, the sum is
positive definite.
(b) If A is a positive semidefinite n × n matrix and T is an m ×
n matrix, then the matrix T AT 0 is positive semidefinite. If A
is positive definite and T is invertible, then T AT 0 is positive
definite.
Proposition 1.1.17:
(a) For any m×n matrix A, the matrix A0 A is symmetric and positive
semidefinite. A0 A is positive definite if and only if A has rank n.
In particular, if m = n, A0 A is positive definite if and only if A
is nonsingular.
(b) A square symmetric matrix is positive semidefinite (respectively,
positive definite) if and only if all of its eigenvalues are nonneg-
ative (respectively, positive).
(c) The inverse of a symmetric positive definite matrix is symmetric
and positive definite.
1.1.4 Derivatives
Let f : <n 7→ < be some function, fix some x ∈ <n , and consider the
expression
lim_{α→0} ( f(x + αei) − f(x) ) / α,
where ei is the ith unit vector (all components are 0 except for the ith
component which is 1). If the above limit exists, it is called the ith partial
derivative of f at the point x and it is denoted by (∂f /∂xi )(x) or ∂f (x)/∂xi
(xi in this section will denote the ith coordinate of the vector x). Assuming
all of these partial derivatives exist, the gradient of f at x is defined as the
column vector
∇f(x) = ( ∂f(x)/∂x1 , . . . , ∂f(x)/∂xn )'.
provided that the limit exists. We note from the definitions that
lim_{y→0} ( f(x + y) − f(x) − ∇f(x)'y ) / ‖y‖ = 0,    ∀ x ∈ U,        (1.1)
We denote by ∇²xx f(x, y), ∇²xy f(x, y), and ∇²yy f(x, y) the matrices with
components
[∇²xx f(x, y)]ij = ∂²f(x, y)/∂xi∂xj ,    [∇²xy f(x, y)]ij = ∂²f(x, y)/∂xi∂yj ,
[∇²yy f(x, y)]ij = ∂²f(x, y)/∂yi∂yj .
If f : <m+n 7→ <r , f = (f1 , f2 , . . . , fr ), we write
∇x f(x, y) = [∇x f1(x, y) · · · ∇x fr(x, y)],
∇y f(x, y) = [∇y f1(x, y) · · · ∇y fr(x, y)].
Let f : <k 7→ <m and g : <m 7→ <n be smooth functions, and let h
be their composition, i.e.,
h(x) = g(f(x)).
Some examples of useful relations that follow from the chain rule are:
∇( f(Ax) ) = A'∇f(Ax),    ∇²( f(Ax) ) = A'∇²f(Ax)A,
where A is a matrix,
∇x ( f(h(x), y) ) = ∇h(x) ∇h f(h(x), y),
∇x ( f(h(x), g(x)) ) = ∇h(x) ∇h f(h(x), g(x)) + ∇g(x) ∇g f(h(x), g(x)).
∇φ(x) = −∇x f(x, φ(x)) ( ∇y f(x, φ(x)) )^{−1},    ∀ x ∈ Sx .
As a final word of caution to the reader, let us mention that one can
easily get confused with gradient notation and its use in various formulas,
such as for example the order of multiplication of various gradients in the
chain rule and the Implicit Function Theorem. Perhaps the safest guideline
to minimize errors is to remember our conventions:
(a) A vector is viewed as a column vector (an n × 1 matrix).
(b) The gradient ∇f of a scalar function f : <n 7→ < is also viewed as a
column vector.
(c) The gradient matrix ∇f of a vector function f : <n 7→ <m with
components f1 , . . . , fm is the n × m matrix whose columns are the
(column) vectors ∇f1 , . . . , ∇fm .
With these rules in mind one can use “dimension matching” as an effective
guide to writing correct formulas quickly.
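To make the conventions concrete, here is a small numerical sketch (in Python; the particular function f and matrix A are arbitrary choices, not from the text) that checks the relation ∇(f(Ax)) = A'∇f(Ax) against finite differences:

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0]])        # a 2 x 3 matrix, so x lives in R^3

    def f(y):                               # a smooth scalar function on R^2
        return np.sin(y[0]) + y[0] * y[1] ** 2

    def grad_f(y):                          # its gradient, viewed as a column vector
        return np.array([np.cos(y[0]) + y[1] ** 2, 2.0 * y[0] * y[1]])

    def F(x):                               # the composition F(x) = f(Ax)
        return f(A @ x)

    x = np.array([0.3, -0.7, 0.5])
    analytic = A.T @ grad_f(A @ x)          # chain rule: grad F(x) = A' grad f(Ax)

    eps = 1e-6                              # central finite differences as a check
    numeric = np.array([(F(x + eps * e) - F(x - eps * e)) / (2 * eps)
                        for e in np.eye(3)])
    assert np.allclose(analytic, numeric, atol=1e-5)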
The notion of a convex set is defined below and is illustrated in Fig. 1.2.1.
Figure 1.2.1. Illustration of the definition of a convex set. For convexity, linear
interpolation between any two points of the set must yield points that lie within
the set.
Proposition 1.2.1:
(a) The intersection ∩i∈I Ci of any collection {Ci | i ∈ I} of convex
sets is convex.
(b) The vector sum C1 + C2 of two convex sets C1 and C2 is convex.
(c) The set x + λC is convex for any convex set C, vector x, and
scalar λ. Furthermore, if C is a convex set and λ1, λ2 are positive
scalars, we have
(λ1 + λ2 )C = λ1 C + λ2 C.
(d) The closure and the interior of a convex set are convex.
(e) The image and the inverse image of a convex set under an affine
function are convex.
Convex Functions
−∞ (but never with functions that can take both values −∞ and ∞).
A function f mapping a convex set C ⊂ <n into (−∞, ∞] is also called
convex if the condition
f( αx + (1 − α)y ) ≤ αf(x) + (1 − α)f(y),    ∀ x, y ∈ C, ∀ α ∈ [0, 1]
is satisfied.
dom(f) = {x ∈ C | f(x) < ∞}.
epi(f) = {(x, w) | x ∈ X, w ∈ <, f(x) ≤ w};
(see Fig. 1.2.3). Note that if we restrict f to its effective domain {x ∈
X | f(x) < ∞}, so that it becomes real-valued, the epigraph remains
unaffected. Epigraphs are useful for our purposes because of the follow-
ing proposition, which shows that questions about convexity and lower
semicontinuity of functions can be reduced to corresponding questions of
convexity and closure of their epigraphs.
[Figure 1.2.3: a convex function and a nonconvex function.]
f (x1 ) ≤ w1 , f(x2 ) ≤ w2 ,
Proposition 1.2.3:
(a) Let f1 , . . . , fm : <n 7→ (−∞, ∞] be given functions, let λ1 , . . . , λm
be positive scalars, and consider the function g : <n 7→ (−∞, ∞]
given by
g(x) = λ1 f1 (x) + · · · + λm fm (x).
If f1 , . . . , fm are convex, then g is also convex, while if f1 , . . . , fm
are closed, then g is also closed.
(b) Let f : <n 7→ (−∞, ∞] be a given function, let A be an m × n
matrix, and consider the function g : <n 7→ (−∞, ∞] given by
g(x) = f(Ax).
where we have used Prop. 1.1.4(d) (the sum of the limit inferiors of se-
quences is less than or equal to the limit inferior of the sum sequence). There-
fore, g is lower semicontinuous at all x ∈ <n , so by Prop. 1.2.2(b), it is
closed.
(b) This is straightforward, along the lines of the proof of part (a).
(c) A pair (x, w) belongs to the epigraph
epi(g) = {(x, w) | g(x) ≤ w}
If the fi are convex, the epigraphs epi(fi ) are convex, so epi(g) is convex,
and g is convex. If the fi are closed, then the epigraphs epi(fi ) are closed,
so epi(g) is closed, and g is closed. Q.E.D.
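As a quick sanity check of these convexity-preserving operations (illustration only; the functions f1, f2 and the matrix A below are arbitrary choices), one can sample the convexity inequality for a positive combination and for a composition with a linear map:

    import numpy as np

    rng = np.random.default_rng(0)
    f1 = lambda x: np.abs(x).sum()                    # convex
    f2 = lambda x: float(x @ x)                       # convex
    A = rng.normal(size=(2, 3))
    g_sum = lambda x: 2.0 * f1(x) + 3.0 * f2(x)       # positive combination
    g_comp = lambda x: f1(A @ x)                      # composition with a linear map

    for g in (g_sum, g_comp):
        for _ in range(1000):                         # sampled convexity inequality
            x, y = rng.normal(size=3), rng.normal(size=3)
            a = rng.uniform()
            assert g(a * x + (1 - a) * y) <= a * g(x) + (1 - a) * g(y) + 1e-9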
Proposition 1.2.4: Let C ⊂ <n be a convex set and let f : <n 7→ <
be differentiable over <n .
(a) f is convex over C if and only if
f(z) ≥ f(x) + (z − x)'∇f(x),    ∀ x, z ∈ C.        (1.4)
Proof: We prove (a) and (b) simultaneously. Assume that the inequality
(1.4) holds. Choose any x, y ∈ C and α ∈ [0, 1], and let z = αx + (1 − α)y.
Using the inequality (1.4) twice, we obtain
We have
f( x + α(z − x) ) ≤ αf(z) + (1 − α)f(x),
or
( f( x + α(z − x) ) − f(x) ) / α ≤ f(z) − f(x),        (1.6)
and the above inequalities are strict if f is strictly convex. Substituting the
definitions (1.5) in Eq. (1.6), we obtain after a straightforward calculation
( f( x + α1(z − x) ) − f(x) ) / α1 ≤ ( f( x + α2(z − x) ) − f(x) ) / α2 ,
or
g(α1) ≤ g(α2),
Proposition 1.2.5: Let C ⊂ <n be a convex set and let f : <n 7→ <
be twice continuously differentiable over <n .
(a) If ∇2 f (x) is positive semidefinite for all x ∈ C, then f is convex
over C.
(b) If ∇2 f (x) is positive definite for all x ∈ C, then f is strictly
convex over C.
(c) If C = <n and f is convex, then ∇2 f(x) is positive semidefinite
for all x.
(c) Suppose that f : <n 7→ < is convex and suppose, to obtain a con-
tradiction, that there exist some x ∈ <n and some z ∈ <n such that
z 0 ∇2f (x)z < 0. Using the continuity of ∇2 f , we see that we can choose the
norm of z to be small enough so that z 0 ∇2 f (x+αz)z < 0 for every α ∈ [0, 1].
Then, using again Prop. 1.1.21(b), we obtain f (x + z) < f (x) + z 0 ∇f (x),
which, in view of Prop. 1.2.4(a), contradicts the convexity of f . Q.E.D.
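As a numerical companion to this second-order test (a sketch; the function log(e^{x1} + e^{x2}) and its closed-form Hessian below are a standard convex example, not taken from the text), one can sample points and confirm that the Hessian is positive semidefinite:

    import numpy as np

    def hessian(x):
        # Hessian of f(x) = log(exp(x1) + exp(x2)).
        p = np.exp(x) / np.exp(x).sum()       # softmax probabilities
        return np.diag(p) - np.outer(p, p)    # known closed form of the Hessian

    rng = np.random.default_rng(1)
    for _ in range(100):
        x = rng.normal(size=2)
        eigvals = np.linalg.eigvalsh(hessian(x))
        assert eigvals.min() >= -1e-12        # positive semidefinite at every sampled point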
Proof: Fix some x, y ∈ <n such that x ≠ y, and define the function
h : < 7→ < by h(t) = f( x + t(y − x) ). Consider some t, t' ∈ < such that
t < t'. Using the chain rule and Eq. (1.7), we have
( (dh/dt)(t') − (dh/dt)(t) ) (t' − t)
  = ( ∇f( x + t'(y − x) ) − ∇f( x + t(y − x) ) )'(y − x)(t' − t)
  ≥ α(t' − t)²‖x − y‖² > 0.
Thus, dh/dt is strictly increasing and for any t ∈ (0, 1), we have
( h(t) − h(0) ) / t = (1/t) ∫_0^t (dh/dτ)(τ) dτ < (1/(1 − t)) ∫_t^1 (dh/dτ)(τ) dτ = ( h(1) − h(t) ) / (1 − t).
Equivalently, th(1) + (1 − t)h(0) > h(t). The definition of h yields tf(y) +
(1 − t)f(x) > f( ty + (1 − t)x ). Since this inequality has been proved for
arbitrary t ∈ (0, 1) and x ≠ y, we conclude that f is strictly convex.
f(x + cy) = f(x) + cy'∇f(x) + (c²/2) y'∇²f(x + tcy)y,
and
f(x) = f(x + cy) − cy'∇f(x + cy) + (c²/2) y'∇²f(x + scy)y,
for some t and s belonging to [0, 1]. Adding these two equations and using
Eq. (1.7), we obtain
(c²/2) y'( ∇²f(x + scy) + ∇²f(x + tcy) )y = ( ∇f(x + cy) − ∇f(x) )'(cy) ≥ αc²‖y‖².
Using the Mean Value Theorem (Prop. 1.1.20), we have ( ∇f(x) − ∇f(y) )'(x −
y) = g(1) − g(0) = (dg/dt)(t) for some t ∈ [0, 1]. The result follows because
(dg/dt)(t) = (x − y)'∇²f( tx + (1 − t)y )(x − y) ≥ α‖x − y‖²,
where the last inequality is a consequence of the positive semidefiniteness
of ∇²f( tx + (1 − t)y ) − αI. Q.E.D.
f (x) = x0 Qx,
Let X be a subset of <n . A convex combination of elements of X is a vector
of the form Σ_{i=1}^m αi xi , where m is a positive integer, x1 , . . . , xm belong to
X, and α1 , . . . , αm are scalars such that
αi ≥ 0,  i = 1, . . . , m,    Σ_{i=1}^m αi = 1.
Note that if X is convex, then the convex combination Σ_{i=1}^m αi xi belongs
to X (this is easily shown by induction; see the exercises), and for any
function f : <n 7→ < that is convex over X, we have
f( Σ_{i=1}^m αi xi ) ≤ Σ_{i=1}^m αi f(xi).        (1.8)
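For a concrete feel for inequality (1.8), the following sketch (with an arbitrary convex function and random weights, for illustration only) samples convex combinations and checks the inequality numerically:

    import numpy as np

    f = lambda x: np.linalg.norm(x) ** 2      # a convex function on R^2
    rng = np.random.default_rng(2)
    for _ in range(200):
        xs = rng.normal(size=(5, 2))          # m = 5 points in R^2
        a = rng.uniform(size=5)
        a /= a.sum()                          # nonnegative weights summing to one
        lhs = f(a @ xs)                       # f of the convex combination
        rhs = sum(ai * f(xi) for ai, xi in zip(a, xs))
        assert lhs <= rhs + 1e-9              # inequality (1.8)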
Σ_{i=1}^m μi xi = 0,    Σ_{i=1}^m μi = 0,
ᾱi = αi − γ̄μi ,   i = 1, . . . , m,
where γ̄ > 0 is the largest scalar γ for which αi − γμi ≥ 0 for all i. Then, since
Σ_{i=1}^m μi xi = 0, we see that x is also represented as Σ_{i=1}^m ᾱi xi . Further-
more, in view of the choice of γ̄ and the fact Σ_{i=1}^m μi = 0, the coefficients
ᾱi are nonnegative, sum to one, and at least one of them is zero. Thus,
x can be represented as a convex combination of fewer than m vectors
of X, contradicting our earlier assumption. It follows that the vectors
x2 − x1 , . . . , xm − x1 must be linearly independent, so that their number
must be at most n. Hence m ≤ n + 1.
(b) Let x be a nonzero vector in cone(X), and let m be the smallest
integer such that x has the form Σ_{i=1}^m αi xi , where αi > 0 and xi ∈ X for
all i = 1, . . . , m. If the vectors xi were linearly dependent, there would
exist scalars λ1 , . . . , λm , with Σ_{i=1}^m λi xi = 0 and at least one of the λi
positive. Consider the linear combination Σ_{i=1}^m (αi − γ̄λi )xi , where γ̄ is
the largest scalar γ such that αi − γλi ≥ 0 for all i. This combination provides a
representation of x as a positive combination of fewer than m vectors of X
– a contradiction. Since any linearly independent set of vectors contains at
most n elements, we must have m ≤ n. Q.E.D.
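The reduction step used in part (a) of this proof can be carried out numerically. The sketch below (illustrative only; the helper caratheodory_reduce is not part of the text) starts from a convex combination of m > n + 1 points in <n and repeatedly removes a point using a null-space direction, exactly as in the argument above:

    import numpy as np

    def caratheodory_reduce(points, alphas):
        """Rewrite sum(alphas[i] * points[i]) using at most n + 1 points, as in part (a)."""
        points, alphas = list(points), list(alphas)
        n = len(points[0])
        while len(points) > n + 1:
            # Find (mu_1, ..., mu_m) with sum(mu_i x_i) = 0 and sum(mu_i) = 0:
            # a null-space vector of the matrix with columns (x_i, 1).
            M = np.vstack([np.array(points).T, np.ones(len(points))])
            _, _, vt = np.linalg.svd(M)
            mu = vt[-1]                       # null-space direction (exists since m > n + 1)
            if mu.max() <= 0:
                mu = -mu                      # make sure some mu_i is positive
            # Largest gamma keeping all coefficients nonnegative; one of them becomes zero.
            gamma = min(a / m for a, m in zip(alphas, mu) if m > 1e-12)
            alphas = [a - gamma * m for a, m in zip(alphas, mu)]
            keep = [i for i, a in enumerate(alphas) if a > 1e-12]
            points = [points[i] for i in keep]
            alphas = [alphas[i] for i in keep]
        return points, alphas

    rng = np.random.default_rng(3)
    pts = rng.normal(size=(7, 2))             # 7 points in R^2
    w = rng.uniform(size=7); w /= w.sum()
    x = w @ pts
    new_pts, new_w = caratheodory_reduce(pts, w)
    assert len(new_pts) <= 3                   # at most n + 1 = 3 points
    assert np.allclose(np.array(new_w) @ np.array(new_pts), x)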
It is not generally true that the convex hull of a closed set is closed
[take for instance the convex hull of the set consisting of the origin and the
subset {(x1 , x2) | x1 x2 = 1, x1 ≥ 0, x2 ≥ 0} of <2 ]. We have, however, the
following.
The following proposition gives some basic facts about relative interior
points.
Proof: (a) In the case where x̄ ∈ C, see Fig. 1.2.5. In the case where
x̄ ∉ C, to show that for any α ∈ (0, 1] we have xα = αx + (1 − α)x̄ ∈ ri(C),
consider a sequence {xk } ⊂ C that converges to x̄, and let xk,α = αx + (1 −
α)xk . Then as in Fig. 1.2.5, we see that {z | ‖z − xk,α‖ < αε} ∩ aff(C) ⊂ C
for all k. Since for large enough k, we have
(see Fig. 1.2.6). This set is open relative to aff(C); that is, for every x ∈ X,
there exists an open set N such that x ∈ N and N ∩ aff(C) ⊂ X. [To see
Figure 1.2.5. Proof of the line segment principle for the case where x̄ ∈ C. Since
x ∈ ri(C), there exists a sphere S = {z | ‖z − x‖ < ε} such that S ∩ aff(C) ⊂ C.
For all α ∈ (0, 1], let xα = αx + (1 − α)x̄ and let Sα = {z | ‖z − xα‖ < αε}. It
can be seen that each point of Sα ∩ aff(C) is a convex combination of x̄ and some
point of S ∩ aff(C). Therefore, Sα ∩ aff(C) ⊂ C, implying that xα ∈ ri(C).
this, note that X is the inverse image of the open set in <m
{ (α1 , . . . , αm ) | Σ_{i=1}^m αi < 1, αi > 0, i = 1, . . . , m }
under the linear transformation from aff(C) to <m that maps Σ_{i=1}^m αi zi
into (α1 , . . . , αm ); openness of the above set follows by continuity of linear
transformation.] Therefore all points of X are relative interior points of
C, and ri(C) is nonempty. Since by construction, aff(X) = aff(C) and
X ⊂ ri(C), it follows that ri(C) and C have the same affine hull.
To show the last assertion of part (b), consider vectors
x0 = α Σ_{i=1}^m zi ,    xi = x0 + αzi ,   i = 1, . . . , m,
where α is a positive scalar such that α(m + 1) < 1. The vectors x0, . . . , xm
are in the set X and in the relative interior of C, since X ⊂ ri(C). Further-
more, because xi − x0 = αzi for all i and vectors z1 , . . . , zm span aff(C),
the vectors x1 − x0 , . . . , xm − x0 also span aff(C).
(c) If x ∈ ri(C) the condition given clearly holds. Conversely, let x satisfy
the given condition. We will show that x ∈ ri(C). By part (b), there exists
a vector x̄ ∈ ri(C). We may assume that x ≠ x̄, since otherwise we are
done. By the given condition, since x̄ is in C, there is a γ > 1 such that
y = x + (γ − 1)(x − x̄) ∈ C. Then we have x = (1 − α)x̄ + αy, where
α = 1/γ ∈ (0, 1), so by part (a), we obtain x ∈ ri(C). Q.E.D.
Figure 1.2.6. Construction of the relatively open set X in the proof of nonempti-
ness of the relative interior of a convex set C that contains the origin. We choose
m linearly independent vectors z1 , . . . , zm ∈ C, where m is the dimension of
aff(C), and let
X = { Σ_{i=1}^m αi zi | Σ_{i=1}^m αi < 1, αi > 0, i = 1, . . . , m }.
In view of Prop. 1.2.9(b), C and ri(C) all have the same dimension.
It can also be shown that C and cl(C) have the same dimension (see the
exercises). The next proposition gives several properties of closures and
relative interiors of convex sets.
Proof: (a) Since ri(C) ⊂ C, we have cl( ri(C) ) ⊂ cl(C). Conversely, let
x̄ ∈ cl(C). We will show that x̄ ∈ cl( ri(C) ). Let x be any point in ri(C)
[there exists such a point by Prop. 1.2.9(b)], and assume that x ≠ x̄
(otherwise we are done). By the line segment principle [Prop. 1.2.9(a)], we
have αx + (1 − α)x̄ ∈ ri(C) for all α ∈ (0, 1]. Thus, x̄ is the limit of the
sequence {(1/k)x + (1 − 1/k)x̄ | k ≥ 1} that lies in ri(C), so x̄ ∈ cl( ri(C) ).
(b) Since C ⊂ cl(C), we must have ri(C) ⊂ ri( cl(C) ). To prove the reverse
inclusion, let z ∈ ri( cl(C) ). We will show that z ∈ ri(C). By Prop. 1.2.9(b),
there exists an x ∈ ri(C). We may assume that x ≠ z (otherwise we are
done). We choose γ > 1, with γ sufficiently close to 1 so that the vector
y = z + (γ − 1)(z − x) belongs to ri( cl(C) ) [cf. Prop. 1.2.9(c)], and hence
also to cl(C). Then we have z = (1 − α)x + αy where α = 1/γ ∈ (0, 1), so
by the line segment principle [Prop. 1.2.9(a)], we obtain z ∈ ri(C).
(c) If ri(C) = ri(C̄), part (a) implies that cl(C) = cl(C̄). Similarly, if
cl(C) = cl(C̄), part (b) implies that ri(C) = ri(C̄). Furthermore, if these
conditions hold, the relation ri(C̄) ⊂ C̄ ⊂ cl(C̄) implies condition (iii).
Finally, assume that condition (iii) holds. Then by taking closures, we
have cl( ri(C) ) ⊂ cl(C̄) ⊂ cl(C), and by using part (a), we obtain cl(C) ⊂
cl(C̄) ⊂ cl(C). Hence C and C̄ have the same closure.
(d) For any set X, we have A · cl(X) ⊂ cl(A · X), since if a sequence
{xk } ⊂ X converges to some x ∈ cl(X) then the sequence {Axk } ⊂ A · X
converges to Ax, implying that Ax ∈ cl(A · X). We use this fact and part
(a) to write
A · ri(C) ⊂ A · C ⊂ A · cl(C) = A · cl( ri(C) ) ⊂ cl( A · ri(C) ).
Thus A·C lies between the set A·ri(C) and the closure of that set, implying
that the relative interiors of the sets A · C and A · ri(C) are equal [part (c)].
Hence ri(A·C) ⊂ A·ri(C). We will show the reverse inclusion by taking any
z ∈ A · ri(C) and showing that z ∈ ri(A · C). Let x be any vector in A · C,
and let z̄ ∈ ri(C) and x̄ ∈ C be such that Az̄ = z and Ax̄ = x. By
Prop. 1.2.9(c), there exists γ > 1 such that the vector y = z̄ + (γ − 1)(z̄ − x̄)
belongs to C. Thus we have Ay ∈ A · C and Ay = z + (γ − 1)(z − x), so by
Prop. 1.2.9(c) it follows that z ∈ ri(A · C).
(e) By the argument given in part (d), we have A · cl(C) ⊂ cl(A · C). To
show the converse, choose any x ∈ cl(A · C). Then, there exists a sequence
{xk } ⊂ C such that Axk → x. Since C is bounded, {xk } has a subsequence
that converges to some x̄ ∈ cl(C), and we must have Ax̄ = x. It follows
that x ∈ A · cl(C). Q.E.D.
Some of the preceding results [Props. 1.2.8, 1.2.10(e)] have illustrated how
boundedness affects the topological properties of sets obtained through
various operations on convex sets. In this section we take a closer look at
this issue.
Given a convex set C, we say that a vector y is a direction of recession
of C if x + αy ∈ C for all x ∈ C and α ≥ 0. In words, y is a direction of
recession of C if starting at any x in C and going indefinitely along y, we
never cross the boundary of C to points outside C. The set of all directions
of recession is a cone containing the origin. It is called the recession cone
of C and it is denoted by RC (see Fig. 1.2.8). This definition implies that
the recession cone of the intersection of any collection of sets Ci , i ∈ I, is
equal to the corresponding intersection of the recession cones:
[Figure 1.2.8: the recession cone RC of a convex set C. Figure 1.2.9: construction of a direction of recession as a limit of the normalized vectors (zk − x)/‖zk − x‖.]
yk = (zk − x) / ‖zk − x‖ ,
and let y be a limit point of {yk } (compare with the construction of Fig.
1.2.9). For any fixed α ≥ 0, the vector x + αyk lies between x and zk in the
line segment connecting x and zk for all k such that kzk − xk ≥ α. Hence
by convexity of C, we have x + αyk ∈ C for all sufficiently large k. Since
x + αy is a limit point of {x + αyk }, and C is closed, we have x + αy ∈ C.
Hence the nonzero vector y is a direction of recession. Q.E.D.
V = {x ∈ C | Ax ∈ W }
is compact if and only if RC ∩ N (A) = {0}. To see this, note that the
recession cone of the set
V = {x ∈ <n | Ax ∈ W }
RC = RC1 + · · · + RCm
Note that if C1 and C2 are both closed and unbounded, the vector
sum C1 + C2 need not be closed. For example consider the closed sets of <2
given by C1 = {(x1 , x2 ) | x1 x2 ≥ 1, x1 ≥ 0, x2 ≥ 0} and C2 = {(x1 , x2 ) |
x1 = 0}. Then C1 + C2 is the open halfspace {(x1 , x2 ) | x1 > 0}.
EXERCISES
1.2.1
(a) Show that a set is convex if and only if it contains all the convex combina-
tions of its elements.
(b) Show that the convex hull of a set coincides with the set of all the convex
combinations of its elements.
1.2.2
Let C be a nonempty set in <n , and let λ1 and λ2 be positive scalars. Show
by example that the sets (λ1 + λ2 )C and λ1 C + λ2 C may differ when C is not
convex [cf. Prop. 1.2.1].
(a) For any collection {Ci | i ∈ I} of cones, the intersection ∩i∈I Ci is a cone.
(b) The vector sum C1 + C2 of two cones C1 and C2 is a cone.
C = {x | a0i x ≤ 0, i ∈ I}
Show that
C = C1 ∩ C2 .
1.2.5
1.2.6
Let {Ci | i ∈ I} be an arbitrary collection of convex sets in <n , and let C be the
convex hull of the union of the collection. Show that
à !
[ X
C= αi Ci ,
i∈I
where the union is taken over all convex combinations such that only finitely
many coefficients αi are nonzero.
1.2.7
1.2.8
is convex over C.
f1 (x1 , . . . , xn ) = −(x1 x2 · · · xn )^{1/n}
1.2.10
Use the Line Segment Principle and the method of proof of Prop. 1.2.5(c) to show
that if C is a convex set with nonempty interior, and f : <n 7→ < is convex and
twice continuously differentiable over C, then ∇2 f (x) is positive semidefinite for
all x ∈ C.
1.2.11
Let C ⊂ <n be a convex set and let f : <n 7→ < be twice continuously differen-
tiable over C. Let S be the subspace that is parallel to the affine hull of C. Show
that f is convex over C if and only if y0 ∇2 f (x)y ≥ 0 for all x ∈ C and y ∈ S.
1.2.12
Let f : <n 7→ < be a differentiable function. Show that f is convex over a convex
set C if and only if
( ∇f(x) − ∇f(y) )'(x − y) ≥ 0,    ∀ x, y ∈ C.
Hint : The condition above says that the function f , restricted to the line segment
connecting x and y, has monotonically nondecreasing gradient; see also the proof
of Prop. 1.2.6.
(b) Use part (a) to show that there are four possibilities as x increases to ∞:
(1) f (x) decreases monotonically to −∞, (2) f (x) decreases monotonically
to a finite value, (3) f (x) reaches and stays at some value, (4) f (x) increases
monotonically to ∞ when x ≥ x for some x ∈ <.
1.2.15
xy ≤ x^p/p + y^q/q,
Σ_{i=1}^n |xi yi | ≤ ( Σ_{i=1}^n |xi |^p )^{1/p} ( Σ_{i=1}^n |yi |^q )^{1/q}.
1.2.16
Let f : <n+m 7→ < be a convex function. Consider the function h : <n 7→ <
given by
h(x) = inf f (x, u),
u∈U
where U is a nonempty and convex subset of <m . Assuming that h(x) > −∞ for
all x ∈ <n , show that h is convex. Hint: There cannot exist α ∈ [0, 1], x1 , x2 ,
u1 ∈ U , u2 ∈ U such that h( αx1 + (1 − α)x2 ) > αf(x1 , u1 ) + (1 − α)f(x2 , u2 ).
1.2.17
Assuming that f (x) > −∞ for all x, show that f is convex over <n .
(c) Let h : <m 7→ < be a convex function and let
1.2.18
(c) Show that the set of global minima of F over conv(X) includes all global
minima of f over X.
Let X1 and X2 be subsets of <n , and let X = conv(X1 ) + cone(X2 ). Show that
every vector x in X can be represented in the form
x = Σ_{i=1}^k αi xi + Σ_{i=k+1}^m αi xi ,
1.2.21
1.2.22
1.2.23
1.2.24
1.2.25
1.2.26
(a) Let C be a convex cone. Show that ri(C) is also a convex cone.
¡ ¢
(b) Let C = cone {x1 , . . . , xm } . Show that
ri(C) = { Σ_{i=1}^m αi xi | αi > 0, i = 1, . . . , m }.
1.2.27
[Compare these relations with those of Prop. 1.2.9(d) and (e), respectively.]
Let f : <n → < be a convex function and X be a bounded set in <n . Then f
has the Lipschitz property over X, i.e., there exists a positive scalar c such that
1.2.29
Let C be a closed convex set and let M be an affine set such that the intersection
C ∩ M is nonempty and bounded. Show that for every affine set M̄ that is parallel
to M , the intersection C ∩ M̄ is bounded when nonempty.
1.2.32
(b) Give an example showing that A · Rcl(C) and RA·cl(C) can differ when
Rcl(C) ∩ N (A) 6= {0}.
Let C be a nonempty convex set in <n . Define the lineality space of C, denoted by
L, to be a subspace of vectors y such that simultaneously y ∈ RC and −y ∈ RC .
(a) Show that for every subspace S ⊂ L
C = (C ∩ S ⊥ ) + S.
(b) Show the following refinement of Prop. 1.2.13 and Exercise 1.2.32: if A is
an m × n matrix and Rcl(C) ∩ N (A) is a subspace of L, then
1.2.34
(b) Show the following extension of part (a) to nonclosed sets: Let C1 , . . . , Cm
be nonempty convex sets in <n such that the equality y1 +· · ·+ym = 0 with
yi ∈ Rcl(Ci ) implies that each yi belongs to the lineality space of cl(Ci ).
Then we have
lim_{k→∞} f(xk ) = ∞
for every sequence {xk } such that ‖xk ‖ → ∞ for some norm ‖ · ‖. Note
that as a consequence of the definition, the level sets {x | f(x) ≤ γ} of a
coercive function f are bounded whenever they are nonempty.
Since X is bounded, this sequence has at least one limit point x∗ [Prop.
1.1.5(a)]. Since f is closed, f is lower semicontinuous at x∗ [cf. Prop.
1.2.2(b)], so that f (x∗ ) ≤ limk→∞ f (xk ) = inf x∈X f(x). Since X is closed,
x∗ belongs to X, so we must have f(x∗ ) = inf x∈X f(x).
Proof: See Fig. 1.3.1 for a proof that a local minimum of f is also global.
Let f be strictly convex, and to obtain a contradiction, assume that two
distinct global minima x and y exist. Then the average (x + y)/2 must
belong to X, since X is convex. Furthermore, the value of f must be
smaller at the average than at x and y by the strict convexity of f . Since
x and y are global minima, we obtain a contradiction. Q.E.D.
Figure 1.3.1. Proof of why local minima of convex functions are also global.
Suppose that f is convex, and assume to arrive at a contradiction, that x∗ is a
local minimum that is not global. Then there must exist an x ∈ X such that
f(x) < f (x∗ ). By convexity, for all α ∈ (0, 1),
f( αx∗ + (1 − α)x ) ≤ αf(x∗ ) + (1 − α)f(x) < f(x∗ ).
Thus, f has strictly lower value than f(x∗ ) at every point on the line segment
connecting x∗ with x, except x∗ . This contradicts the local minimality of x∗ .
(y − z)0 (x − z) ≤ 0, ∀ y ∈ C.
is convex.
ky − xk2 = ky −zk2 +kz −xk2 − 2(y −z)0 (x−z) ≥ kz −xk2 − 2(y −z)0 (x−z).
yα = αy + (1 − α)z. We have
‖x − yα ‖² = ‖(1 − α)(x − z) + α(x − y)‖²
           = (1 − α)²‖x − z‖² + α²‖x − y‖² + 2(1 − α)α(x − z)'(x − y).
Viewing ‖x − yα ‖² as a function of α, we have
∂/∂α { ‖x − yα ‖² } |_{α=0} = −2‖x − z‖² + 2(x − z)'(x − y) = −2(y − z)'(x − z).
and for positive but small enough α, we obtain kx − yα k < kx − zk. This
contradicts the fact z = PC (x) and shows that (y − z)0 (x − z) ≤ 0 for all
y ∈ C.
(c) Let x and y be elements of <n . From part (b), we have ( w − PC (x) )'( x −
PC (x) ) ≤ 0 for all w ∈ C. Since PC (y) ∈ C, we obtain
( PC (y) − PC (x) )'( x − PC (x) ) ≤ 0.
Similarly,
( PC (x) − PC (y) )'( y − PC (y) ) ≤ 0.
Adding these two inequalities, we obtain
( PC (y) − PC (x) )'( x − PC (x) − y + PC (y) ) ≤ 0.
By rearranging and by using the Schwarz inequality, we have
‖PC (y) − PC (x)‖² ≤ ( PC (y) − PC (x) )'(y − x) ≤ ‖PC (y) − PC (x)‖ · ‖y − x‖,
( y − PC (x) )'( x − PC (x) ) ≤ 0.
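As an illustration of the Projection Theorem (a sketch in which the convex set C is taken to be a box, an assumption made here only for simplicity), the following code projects onto C and checks the characterization of part (b) together with the nonexpansiveness property of part (c):

    import numpy as np

    def project_box(x, lo=-1.0, hi=1.0):
        # Projection onto the box C = {z | lo <= z_i <= hi}: coordinatewise clipping.
        return np.clip(x, lo, hi)

    rng = np.random.default_rng(4)
    for _ in range(100):
        x, y = rng.normal(scale=3.0, size=(2, 4))
        px, py = project_box(x), project_box(y)
        # Variational inequality (y - P(x))'(x - P(x)) <= 0 for points of C; here P(y) is in C.
        assert (py - px) @ (x - px) <= 1e-10
        # Nonexpansiveness: ||P(y) - P(x)|| <= ||y - x||.
        assert np.linalg.norm(py - px) <= np.linalg.norm(y - x) + 1e-10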
The recession cone, discussed in Section 1.2.4, is also useful for character-
izing directions along which convex functions asymptotically increase or
decrease. A key idea here is that a function that is convex over <n can
be described in terms of its epigraph, which is a closed and convex set.
The recession cone of the epigraph can be used to obtain the directions
along which the function “slopes downward.” This is the idea underlying
the following proposition.
This proves part (a). Part (b) follows by applying Prop. 1.2.13(c) to the
recession cone of epi(f ). Q.E.D.
of the recession cones of X and {x | f(x) ≤ f ∗ }. This is equivalent to X
and f having no common nonzero direction of recession.
Conversely, let a be a scalar such that the set
Xa = X ∩ {x | f(x) ≤ a}
If the closed convex set X and the convex function f of the above
proposition have a common direction of recession, then either X ∗ is empty
[take for example, X = (−∞, 0] and f (x) = ex ] or else X ∗ is nonempty
and unbounded [take for example, X = (−∞, 0] and f (x) = max{0, x}].
Another interesting question is what happens when X and f have a
common direction of recession, call it y, but f is bounded below over X:
minimize   f(x) = c'x + (1/2) x'Qx
subject to Ax = 0,                                        (1.9)
We thus conclude that for f to be bounded from below along all directions
in N (A) it is necessary and sufficient that c0 x = 0 for all x ∈ N (A) ∩ N (Q).
However, boundedness from below of a convex cost function f along all
directions of recession of a constraint set does not guarantee existence of
an optimal solution, or even boundedness from below over the constraint set
(see the exercises). On the other hand, since the constraint set N (A) is a
subspace, it is possible to use a transformation x = Bz where the columns
of the matrix B are basis vectors for N (A), and view the problem as an
unconstrained minimization over z of the cost function h(z) = f (Bz), which
is positive semidefinite quadratic. We can then argue that boundedness
from below of this function along all directions z is necessary and sufficient
for existence of an optimal solution. This argument indicates that problem
(1.9) has an optimal solution if and only if c0 x = 0 for all x ∈ N (A)∩N (Q).
By using a translation argument, this result can also be extended to the
case where the constraint set is a general affine set of the form {x | Ax = b}
rather than the subspace {x | Ax = 0}.
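The transformation x = Bz mentioned above is easy to carry out numerically. The sketch below (with arbitrary data Q, c, A chosen for illustration) builds a null-space basis B of {x | Ax = 0}, solves the reduced unconstrained quadratic in z, and recovers a minimizer x = Bz when one exists:

    import numpy as np
    from scipy.linalg import null_space

    # Data for (1.9): f(x) = c'x + (1/2) x'Qx minimized subject to Ax = 0.
    Q = np.diag([2.0, 1.0, 0.0])          # positive semidefinite
    c = np.array([1.0, -1.0, 0.0])
    A = np.array([[1.0, 1.0, 1.0]])

    B = null_space(A)                      # columns form a basis of N(A)
    H = B.T @ Q @ B                        # Hessian of the reduced function h(z) = f(Bz)
    g = B.T @ c
    # h attains its minimum iff -g lies in the range of H, which matches the
    # condition c'x = 0 for all x in N(A) ∩ N(Q).
    z, *_ = np.linalg.lstsq(H, -g, rcond=None)
    if np.allclose(H @ z, -g):
        x_star = B @ z
        print("optimal solution:", x_star, "value:", c @ x_star + 0.5 * x_star @ Q @ x_star)
    else:
        print("the quadratic program (1.9) has no optimal solution")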
In part (a) of the following proposition we state the result just de-
scribed (equality constraints only). While we can prove the result by for-
malizing the argument outlined above, we will use instead a more elemen-
tary variant of this argument, whereby the constraints are eliminated via
a penalty function; this will give us the opportunity to introduce a line
of proof that we will frequently employ in other contexts as well. In part
(b) of the proposition, we allow linear inequality constraints, and we show
that a convex quadratic program has an optimal solution if and only if its
optimal value is bounded below. Note that the cost function may be linear,
so the proposition applies to linear programs as well.
f(x) = c'x + (1/2) x'Qx,
f(x + αy) = c'(x + αy) + (1/2)(x + αy)'Q(x + αy) = f(x) + αc'y.
minimize   h(y)
subject to Ay = 0,
where
h(y) = f(x + y) = f(x) + ∇f(x)'y + (1/2) y'Qy.
and
inf_{y∈<n} hk (y) ≤ inf_{y∈<n} hk+1 (y) ≤ inf_{Ay=0} h(y) ≤ h(0) = f(x).        (1.11)
Denote
S = ( N(A) ∩ N(Q) )⊥
and write any y ∈ <n as y = z + w, where
z ∈ S,    w ∈ S⊥ = N(A) ∩ N(Q).
and we will use this relation to show that {yk } is bounded and each of its
limit points minimizes h(y) subject to Ay = 0. Indeed, from Eq. (1.13), the
sequence {hk (yk )} is bounded, so if {yk } were unbounded, then assuming
without loss of generality that yk 6= 0, we would have hk (yk )/kyk k → 0, or
lim_{k→∞} ( (1/‖yk ‖) f(x) + ∇f(x)'ŷk + ‖yk ‖ ( (1/2) ŷk'Qŷk + (k/2) ‖Aŷk ‖² ) ) = 0,
where ŷk = yk /kyk k. For this to be true, all limit points ŷ of the bounded
sequence {ŷk } must be such that ŷ 0 Qŷ = 0 and Aŷ = 0, which is impossible
since kŷk = 1 and ŷ ∈ S. Thus {yk } is bounded and for any one of its limit
points, call it ȳ, we have ȳ ∈ S and
lim sup_{k→∞} hk (yk ) = f(x) + ∇f(x)'ȳ + (1/2) ȳ'Qȳ + lim sup_{k→∞} (k/2) ‖Ayk ‖² ≤ inf_{Ay=0} h(y).
minimize   f(x)
subject to a'j x = bj ,  j ∈ J.                           (1.14)
so that
f(xk + γy) < f (xk ), ∀ γ > 0.
Furthermore, since y ∈ N (A), we have
We must also have a0j y > 0 for at least one j ∈ / A [otherwise (iii) would be
violated], so the line {xk + γy | γ > 0} crosses the boundary of X for some
γ k > 0. The sequence {xk }, where xk = xk + γ k y, satisfies {xk } ⊂ X,
f (xk ) → f ∗ [since f (xk ) ≤ f (xk )], and the active index set J (xk ) strictly
contains J for all k. This contradicts the maximality of J , and shows that
problem (1.14) has an optimal solution, call it x.
Since xk is a feasible solution of problem (1.14), we have
f(x) ≤ f (xk ), ∀ k,
so that
f(x) ≤ f ∗ .
subject to x ∈ X
or
maximize   inf_{x∈X} φ(x, z)
subject to z ∈ Z.
These problems are encountered in at least three major optimization con-
texts:
(1) Worst-case design, whereby we view z as a parameter and we wish
to minimize over x a cost function, assuming the worst possible value
of z. A special case of this is the discrete minimax problem, where
we want to minimize over x ∈ X
max{ f1 (x), . . . , fm (x) },
where the fi are some given functions. Here, Z is the finite set
{1, . . . , m}. Within this context, it is important to provide char-
acterizations of the max function
minimize   f(x)
subject to x ∈ X,  gj (x) ≤ 0,  j = 1, . . . , r                   (1.15)
subject to x ∈ X
and that the inf and sup above are attained. This is a major issue in
duality theory because it connects the primal and the dual problems [cf.
Eqs. (1.15) and (1.16)] through their optimal values and optimal solutions.
In particular, when we discuss duality in Chapter 3, we will see that a
major question is whether there is no duality gap, i.e., whether the optimal
primal and dual values are equal. This is so if and only if
We will prove in this section one major result, the Saddle Point The-
orem, which guarantees the equality (1.17), assuming convexity/concavity
[for every z ∈ Z, write inf x∈X φ(x, z) ≤ inf x∈X supz∈Z φ(x, z) and take the
supremum over z ∈ Z of the left-hand side]. However, special conditions
are required to guarantee equality.
Suppose that x∗ is an optimal solution of the problem
† The Saddle Point Theorem is also central in game theory, as we now briefly
explain. In the simplest type of zero sum game, there are two players: the first
may choose one out of n moves and the second may choose one out of m moves.
If moves i and j are selected by the first and the second player, respectively, the
first player gives a specified amount aij to the second. The objective of the first
player is to minimize the amount given to the other player, and the objective of
the second player is to maximize this amount. The players use mixed strategies,
whereby the first player selects a probability distribution x = (x1 , . . . , xn ) over
his n possible moves and the second player selects a probability distribution
z = (z1 , . . . , zm ) over his m possible moves. Since the probability of selecting i
and j is xi zj , the expected amount to be paid by the first player to the second is
Σ_{i,j} aij xi zj or x'Az, where A is the n × m matrix with elements aij .
If each player adopts a worst case viewpoint, whereby he optimizes his
choice against the worst possible selection by the other player, the first player
must minimize maxz x0 Az and the second player must maximize minx x0 Az. The
main result, a special case of the existence result we will prove shortly, is that
these two optimal values are equal, implying that there is an amount that can
be meaningfully viewed as the value of the game for its participants.
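The value of such a matrix game can be computed by linear programming. A minimal sketch (using scipy's linprog and a small illustrative payoff matrix) solves min_x max_z x'Az by introducing the scalar variable v = max_j (A'x)_j:

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, -1.0],
                  [-2.0, 3.0]])             # payoff a_ij paid by player 1 to player 2
    n, m = A.shape

    # Variables (x_1, ..., x_n, v): minimize v subject to A'x <= v*1, sum(x) = 1, x >= 0.
    c = np.concatenate([np.zeros(n), [1.0]])
    A_ub = np.hstack([A.T, -np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    x_opt, value = res.x[:n], res.x[-1]
    print("mixed strategy for player 1:", x_opt, "game value:", value)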
Then we have
sup_{z∈Z} inf_{x∈X} φ(x, z) = inf_{x∈X} φ(x, z ∗ ) ≤ φ(x∗ , z ∗ ) ≤ sup_{z∈Z} φ(x∗ , z) = inf_{x∈X} sup_{z∈Z} φ(x, z).
(1.22)
If the minimax equality [cf. Eq. (1.17)] holds, then equality holds through-
out above, so that
or equivalently
This, together with the minimax inequality (1.19) guarantee that the min-
imax equality (1.17) holds and from Eq. (1.22), x∗ and z ∗ are optimal
solutions of problems (1.20) and (1.21), respectively.
We summarize the above discussion in the following proposition.
[Figure: a saddle point (x∗ , z ∗ ) of φ(x, z), lying on both the curve of minima φ( x̂(z), z ) and the curve of maxima φ( x, ẑ(x) ).]
or equivalently, if (x∗ , z ∗ ) lies on both curves [x∗ = x̂(z ∗ ) and z ∗ = ẑ(x∗ )]. At
such a pair, we also have
max_{z∈Z} φ( x̂(z), z ) = max_{z∈Z} min_{x∈X} φ(x, z) = φ(x∗ , z ∗ ) = min_{x∈X} max_{z∈Z} φ(x, z) = min_{x∈X} φ( x, ẑ(x) ),
so that
φ( x̂(z), z ) ≤ φ(x∗ , z ∗ ) ≤ φ( x, ẑ(x) ),    ∀ x ∈ X, z ∈ Z
(see Prop. 1.3.7). Visually, the curve of maxima φ( x, ẑ(x) ) must lie “above” the
curve of minima φ( x̂(z), z ) (completely, i.e., for all x ∈ X and z ∈ Z).
Ẑ(x) = { ẑ | ẑ maximizes φ(x, z) over Z }.
The definition implies that the pair (x∗ , z ∗ ) is a saddle point if and only if
it is a “point of intersection” of X̂(·) and Ẑ(·) in the sense that
x∗ ∈ X̂(z ∗ ), z ∗ ∈ Ẑ(x∗ );
Proof: We first prove the result under the assumption that X and Z
are compact [condition (1)], and the additional assumption that φ(·, z) is
strictly concave for each x ∈ X.
By Weierstrass’ Theorem (Prop. 1.3.1), the function
We will show that (x∗ , z ∗ ) is a saddle point of φ, and in view of the above
relation, it will suffice to show that φ(x∗ , z ∗ ) ≤ φ(x, z ∗ ) for all x ∈ X.
Sec. 1.3 Convexity and Optimization 79
where x and z are the points appearing in conditions (2)-(4). For each
k ≥ k̄, we introduce the convex and compact sets
Xk = { x ∈ X | ‖x‖ ≤ k },    Zk = { z ∈ Z | ‖z‖ ≤ k }.
If {xk } were unbounded, the coercivity of φ(·, z) would imply that φ(xk , z) →
∞ and hence φ(x, zk ) → ∞, which violates the coercivity of −φ(x, z).
Hence {xk } must be bounded, and a symmetric argument shows that {zk }
must be bounded. Thus (xk , zk ) must have a limit point (x∗ , z ∗ ). The
result then follows from Eq. (1.27), similar to the case where condition (2)
holds. Q.E.D.
Example 1.3.1
Let
X = { x ∈ <2 | x ≥ 0 },    Z = { z ∈ < | z ≥ 0 },
and let
φ(x, z) = e^{−√(x1 x2)} + zx1 ,
since the expression in braces is nonnegative for x ≥ 0 and can approach zero
by taking x1 → 0 and x1 x2 → ∞. Hence
sup_{z≥0} { e^{−√(x1 x2)} + zx1 } = 1 if x1 = 0,  and  = ∞ if x1 > 0.
Hence
inf sup φ(x, z) = 1,
x≥0 z≥0
so inf x≥0 supz≥0 φ(x, z) > supz≥0 inf x≥0 φ(x, z). The difficulty here is that
the compactness/coercivity assumptions of Prop. 1.3.8 are violated.
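A quick numerical look at the gap in this example (the particular values of x1, x2, z below are illustrative only):

    import numpy as np

    phi = lambda x1, x2, z: np.exp(-np.sqrt(x1 * x2)) + z * x1

    # sup_z inf_x phi: for each fixed z, phi can be made arbitrarily close to 0
    # (take x1 -> 0 with x1 * x2 -> infinity), so sup_z inf_x phi = 0.
    for z in [0.0, 1.0, 10.0]:
        x1, x2 = 1e-8, 1e16                  # x1 small, x1 * x2 large
        print(f"z = {z:5.1f}:  phi ~ {phi(x1, x2, z):.3e}")

    # inf_x sup_z phi: sup_z phi equals infinity if x1 > 0 and equals 1 if x1 = 0,
    # so inf_x sup_z phi = 1, strictly larger than sup_z inf_x phi = 0.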
EXERCISES
1.3.1
Let f : <n 7→ < be a convex function, let X be a closed convex set, and assume
that f and X have no common direction of recession. Let X ∗ be the optimal
solution set (nonempty and compact by Prop. 1.3.5) and let f ∗ = inf x∈X f (x).
Show that:
(a) For every ² > 0 there exists a δ > 0 such that every vector x ∈ X with
f(x) ≤ f ∗ + δ satisfies minx∗ ∈X ∗ kx − x∗ k ≤ ².
(b) Every sequence {xk } ⊂ X satisfying lim k→∞ f (xk ) → f ∗ is bounded and
all its limit points belong to X ∗ .
1.3.2
This exercise deals with an extension of Prop. 1.3.6 to the case where the quadratic
cost may not be convex. Consider a problem of the form
minimize   c'x + (1/2) x'Qx
subject to Ax ≤ b,
Let f : <n 7→ < be a convex function, and consider the problem of minimizing
f over a closed and convex set X. Suppose that f attains a minimum along all
half lines of the form {x + αy | α ≥ 0} where x ∈ X and y is in the recession cone
of X. Show that we may have inf_{x∈X} f(x) = −∞. Hint: Use the case n = 2,
X = <2 , f(x) = min_{z∈C} ‖z − x‖² − x1 , where C = { (x1 , x2 ) | x1² ≤ x2 }.
1.4 HYPERPLANES
{x | a0 x ≥ b}, {x | a0 x ≤ b},
are called the halfspaces associated with the hyperplane (also referred to as
the positive and negative halfspaces, respectively). We have the following
result, which is also illustrated in Fig. 1.4.1. The proof is based on the
Projection Theorem and is illustrated in Fig. 1.4.2.
a'x ≥ a'x̄,    ∀ x ∈ C.                            (1.29)
[Figure 1.4.1: the hyperplane {x | a'x = b} with its positive halfspace {x | a'x ≥ b} and negative halfspace {x | a'x ≤ b}, together with supporting and separating hyperplanes for convex sets C, C1 , C2 . Figure 1.4.2: illustration of the proof, which is based on the Projection Theorem.]
(x̂k − xk )'x ≥ (x̂k − xk )'x̂k = (x̂k − xk )'(x̂k − xk ) + (x̂k − xk )'xk ≥ (x̂k − xk )'xk .
where
ak = (x̂k − xk ) / ‖x̂k − xk ‖ .
We have kak k = 1 for all k, and hence the sequence {ak } has a subsequence
that converges to a nonzero limit a. By considering Eq. (1.30) for all ak
belonging to this subsequence and by taking the limit as k → ∞, we obtain
Eq. (1.29). Q.E.D.
a0 x1 ≤ a0 x2 , ∀ x1 ∈ C1 , x2 ∈ C2 . (1.31)
C = {x | x = x2 − x1 , x1 ∈ C1 , x2 ∈ C2 }.
Since C1 and C2 are disjoint, the origin does not belong to C, so by the
Supporting Hyperplane Theorem there exists a vector a 6= 0 such that
0 ≤ a0 x, ∀ x ∈ C,
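When C1 and C2 are closed and at least one of them is bounded, a separating vector can also be produced numerically by minimizing ‖x2 − x1‖ over the two sets and taking a = x2 − x1 at the minimizing pair. A minimal sketch with two disks (illustrative data only, using alternating projections):

    import numpy as np

    # C1: disk of radius 1 centered at (0, 0);  C2: disk of radius 1 centered at (3, 0).
    def proj_disk(x, center, radius):
        d = x - center
        n = np.linalg.norm(d)
        return x if n <= radius else center + radius * d / n

    x1 = np.zeros(2)
    x2 = np.array([3.0, 0.0])
    for _ in range(200):                      # alternating projections minimize ||x2 - x1||
        x1 = proj_disk(x2, np.zeros(2), 1.0)
        x2 = proj_disk(x1, np.array([3.0, 0.0]), 1.0)

    a = x2 - x1                               # separating direction: a'x1 <= a'x2 on C1, C2
    b = 0.5 * (a @ x1 + a @ x2)               # the hyperplane {x | a'x = b} separates C1 and C2
    print("a =", a, "b =", b)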
Proposition 1.4.4:
(a) A closed convex set C ⊂ <n is the intersection of the halfspaces that
contain C.
(b) The closure of the convex hull of a set C ⊂ <n is the intersection
of the halfspaces that contain C.
EXERCISES
Let C1 and C2 be two nonempty, convex subsets of <n , and let B denote the unit
(iii) The relative interiors ri(C1 ) and ri(C2 ) have no point in common.
C ⊥ = {y | y 0 x ≤ 0, ∀ x ∈ C},
is called the polar cone of C (see Fig. 1.5.1). Clearly, the polar cone C ⊥ ,
being the intersection of a collection of closed halfspaces, is closed and
convex (regardless of whether C is closed and/or convex). We also have
the following basic result.
Proof: (a) Clearly, we have C⊥ ⊃ ( cl(C) )⊥ . Conversely, if y ∈ C⊥ , then
y'xk ≤ 0 for all k and all sequences {xk } ⊂ C, so that y'x ≤ 0 for all limits
x of such sequences. Hence y ∈ ( cl(C) )⊥ and C⊥ ⊂ ( cl(C) )⊥ .
Similarly, we have C⊥ ⊃ ( conv(C) )⊥ . Conversely, if y ∈ C⊥ , then
y'x ≤ 0 for all x ∈ C so that y'z ≤ 0 for all z that are convex combinations
of vectors x ∈ C. Hence y ∈ ( conv(C) )⊥ and C⊥ ⊂ ( conv(C) )⊥ . A nearly
identical argument also shows that C⊥ = ( cone(C) )⊥ .
(b) Figure 1.5.2 shows that if C is closed and convex, then (C⊥ )⊥ = C.
From this it follows that
( ( cl( conv(C) ) )⊥ )⊥ = cl( conv(C) ),
and by using part (a) in the left-hand side above, we obtain (C⊥ )⊥ =
cl( conv(C) ). Q.E.D.
Figure 1.5.2. Proof of the Polar Cone Theorem for the case where C is a closed
and convex cone. If x ∈ C, then for all y ∈ C ⊥ , we have x0 y ≤ 0, which implies
that x ∈ (C ⊥ )⊥ . Hence, C ⊂ (C ⊥ )⊥ . To prove the reverse inclusion, take
z ∈ (C ⊥ )⊥ , and let ẑ be the unique projection of z on C, as shown in the figure.
Since C is closed, the projection exists by the Projection Theorem (Prop. 1.3.3),
which also implies that
(z − ẑ)0 (x − ẑ) ≤ 0, ∀ x ∈ C.
(z − ẑ)0 ẑ = 0.
Combining the last two relations, we obtain (z − ẑ)0 x ≤ 0 for all x ∈ C. Therefore,
(z − ẑ) ∈ C ⊥ , and since z ∈ (C ⊥ )⊥ , we obtain (z − ẑ)0 z ≤ 0, which when added
to (z − ẑ)0 ẑ = 0 yields kz − ẑk2 ≤ 0. Therefore, z = ẑ and z ∈ C. It follows that
(C ⊥ )⊥ ⊂ C.
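The relation (C⊥)⊥ = C can be checked numerically for simple closed convex cones. A small sketch with C equal to the nonnegative orthant of <2 (whose polar is the nonpositive orthant; an illustrative choice only):

    import numpy as np

    rng = np.random.default_rng(5)
    in_C = lambda x: bool(np.all(x >= 0))             # C: the nonnegative orthant
    in_polar = lambda y: bool(np.all(y <= 0))         # C_polar = {y | y'x <= 0 for all x in C}

    for _ in range(1000):
        x = rng.normal(size=2)
        y = rng.normal(size=2)
        if in_C(x) and in_polar(y):
            assert x @ y <= 0                         # definition of the polar cone
        # (C_polar)_polar = C: x has nonpositive inner product with every generator
        # -e_i of C_polar exactly when x lies in the nonnegative orthant.
        assert in_C(x) == all(x @ (-e) <= 0 for e in np.eye(2))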
xk → x,    (xk − x) / ‖xk − x‖ → y / ‖y‖ .
Figure 1.5.4. Examples of the cones FX (x) and TX (x) of a set X at the vector
x = (0, 1). In (a), we have
X = { (x1 , x2 ) | (x1 + 1)² − x2 ≤ 0, (x1 − 1)² − x2 ≤ 0 }.
Here X is convex and the tangent cone TX (x) is equal to the closure of the cone of
feasible directions FX (x) (which is an open set in this example). Note, however,
that the vectors (1, 2) and (−1, 2) (as well as the origin) belong to TX (x) and also
to the closure of FX (x), but are not feasible directions. In (b), we have
X = { (x1 , x2 ) | ( (x1 + 1)² − x2 )( (x1 − 1)² − x2 ) = 0 }.
Here the set X is nonconvex, and TX (x) is closed but not convex. Furthermore,
FX (x) consists of just the zero vector.
Figure 1.5.4 illustrates the cones TX (x) and FX (x) with examples.
The following proposition gives some of the properties of the cones FX (x)
and TX (x).
lim_{i→∞} x_k^i = x,    lim_{i→∞} ( x_k^i − x ) / ‖x_k^i − x‖ = yk / ‖yk ‖ .
∇f (x∗ )0 y ≥ 0, ∀ y ∈ TX (x∗ ).
∇f(x∗ )0 (x − x∗ ) ≥ 0, ∀ x ∈ X,
ξk → 0,    xk → x∗ ,
and
(xk − x∗ ) / ‖xk − x∗ ‖ = y/‖y‖ + ξk .
By the Mean Value Theorem, we have for all k
where x̃k is a vector that lies on the line segment joining xk and x∗ . Com-
bining the last two equations, we obtain
f(xk ) = f(x∗ ) + ( ‖xk − x∗ ‖ / ‖y‖ ) ∇f(x̃k )'yk ,                (1.33)
where
yk = y + ‖y‖ ξk .
If ∇f (x∗ )0 y < 0, since x̃k → x∗ and yk → y, it follows that for all suffi-
ciently large k, ∇f(x̃k )0 yk < 0 and [from Eq. (1.33)] f (xk ) < f (x∗ ). This
contradicts the local optimality of x∗ .
When X is convex, we have cl( FX (x) ) = TX (x) (cf. Prop. 1.5.3).
Thus the condition shown can be written as
∇f(x∗ )'y ≥ 0,    ∀ y ∈ cl( FX (x) ),
∇f(x∗ )0 (x − x∗ ) ≥ 0, ∀ x ∈ X.
for sufficiently small but positive α. Thus Prop. 1.5.4 says that if x∗ is
a local minimum, there is no descent direction within the tangent cone
TX (x∗ ).
Note that the necessary condition of Prop. 1.5.4 can equivalently be
written as
−∇f (x∗ ) ∈ TX (x∗ )⊥
(see Fig. 1.5.5). There is an interesting converse of this result, namely that
given any vector z ∈ TX (x∗ )⊥ , there exists a smooth function f such that
−∇f (x∗ ) = z and x∗ is a local minimum of f over X. We will return to
this result and to the subject of conical approximations when we discuss
Lagrange multipliers in Chapter 2.
In addition to the cone of feasible directions and the tangent cone, there is
one more conical approximation that is of special interest for the optimiza-
tion topics covered in this book. This is the normal cone of X at x, denoted
by NX (x), and obtained from the polar cone TX (x)⊥ by means of a clo-
sure operation. In particular, we have z ∈ NX (x) if there exist sequences
{xk } ⊂ X and {zk } such that xk → x, zk → z, and zk ∈ TX (xk )⊥ for all
k. Equivalently, the graph of NX (·), viewed as a point-to-set mapping, is
the closure of the graph of TX (·)⊥ :
{ (x, z) | x ∈ X, z ∈ NX (x) } = cl( { (x, z) | x ∈ X, z ∈ TX (x)⊥ } ).
[Figure 1.5.5: the necessary optimality condition −∇f(x∗ ) ∈ TX (x∗ )⊥ at a local minimum x∗ of f over the constraint set X.]
Figure 1.5.6. Examples of normal cones. In the case of figure (a), X is the union
of two lines passing through the origin:
For x = 0 we have TX (x) = X, TX (x)⊥ = {0}, while NX (x) is the nonconvex set
consisting of the two lines of vectors that are collinear to either a1 or a2 . Thus
X is not regular at x = 0. At all other vectors x ∈ X, we have regularity with
TX (x)⊥ and NX (x) equal to either the line of vectors that are collinear to a1 or
the line of vectors that are collinear to a2 .
In the case of figure (b), X is regular at all points except at x = 0, where
we have TX (x) = ℜn , TX (x)⊥ = {0}, while NX (x) is equal to the horizontal axis.
EXERCISES
1.5.1
Let C ⊂ ℜn be a closed convex set, and let y and z be given vectors in ℜn such
that the line segment connecting y and z does not intersect with C. Consider
the problem of minimizing the sum of distances ky − xk + kz − xk over x ∈ C.
Derive a necessary and sufficient optimality condition. Does an optimal solution
exist and if so, is it unique? Discuss the case where C is closed but not convex.
1.5.2
Let C1 , C2 , and C3 be three closed subsets of ℜn . Consider the problem of finding
a triangle with minimum perimeter that has one vertex on each of the three sets,
i.e., the problem of minimizing kx1 − x2 k + kx2 − x3 k + kx3 − x1 k subject to
xi ∈ Ci , i = 1, 2, 3, and the additional condition that x1 , x2 , and x3 do not lie on
the same line. Show that if (x∗1 , x∗2 , x∗3 ) defines an optimal triangle, there exists
a vector z ∗ in the triangle such that
1.5.3
Let C ⊂ ℜn be a closed convex cone and let x be a given vector in ℜn . Show
that x̂ is the projection of x on C if and only if
x̂ ∈ C,        x − x̂ ∈ C ⊥ ,        (x − x̂)'x̂ = 0.
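As an added numerical check of these three conditions (using the nonnegative orthant as an illustrative closed convex cone, whose polar is the nonpositive orthant), a short Python fragment:

import numpy as np

# C = nonnegative orthant in R^3 (a closed convex cone); C-perp = nonpositive orthant.
x = np.array([1.0, -2.0, 0.5])
x_hat = np.maximum(x, 0.0)             # projection of x onto C

residual = x - x_hat
print(np.all(x_hat >= 0))              # x_hat in C              -> True
print(np.all(residual <= 0))           # x - x_hat in C-perp     -> True
print(abs(residual @ x_hat) < 1e-12)   # (x - x_hat)' x_hat = 0  -> True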
1.5.4
Let C ⊂ ℜn be a closed convex cone and let a be a given vector in ℜn . Show
that for any positive scalars β and γ, we have
1.5.5 (Quasiregularity)
1.6 POLYHEDRAL CONVEXITY

Proposition 1.6.1:
(a) Let a1 , . . . , ar be vectors of ℜn . Then the finitely generated cone
C = { x | x = µ1 a1 + · · · + µr ar , µj ≥ 0, j = 1, . . . , r }        (1.35)
is closed, and its polar cone is the polyhedral cone
C ⊥ = { y | y'aj ≤ 0, j = 1, . . . , r }.        (1.36)
y'ei = 0, ∀ i = 1, . . . , m,        y'aj ≤ 0, ∀ j = 1, . . . , r,
Proof: (a) We first show that the polar cone of C has the desired form
(1.36). If y satisfies y 0 aj ≤ 0 for all j, then y 0 x ≤ 0 for all x ∈ C, so the
set in the right-hand side of Eq. (1.36) is a subset of C ⊥ . Conversely, if
y ∈ C ⊥ , i.e., if y 0 x ≤ 0 for all x ∈ C, then (since aj belong to C) we have
y 0 aj ≤ 0, for all j. Thus, C ⊥ is a subset of the set in the right-hand side
of Eq. (1.36).
is also closed. Without loss of generality, assume that kaj k = 1 for all j.
There are two cases: (i) The vectors −a1 , . . . , −ar+1 belong to Cr+1 , in
which case Cr+1 is the subspace spanned by a1 , . . . , ar+1 and is therefore
closed, and (ii) The negative of one of the vectors, say −ar+1 , does not
belong to Cr+1 . In this case, consider the cone
Cr = { x | x = µ1 a1 + · · · + µr ar , µj ≥ 0, j = 1, . . . , r },
m = min{ a'r+1 x | x ∈ Cr , ‖x‖ = 1 }.
m > −1,
Let
βj = a'r bj ,        j = 1, . . . , m,
and define the index sets
J + = {j | βj > 0},    J − = {j | βj < 0},    J 0 = {j | βj = 0}.
Let also
bl,k = bl − (βl /βk ) bk ,        ∀ l ∈ J + , k ∈ J − .
x − µr ar ∈ Pr−1 ,
which is equivalent to
γ ≤ µr ≤ δ,
where
γ = max{ 0, max_{j∈J +} b'j x/βj },        δ = min_{j∈J −} b'j x/βj .
Since x ∈ Pr , we have
0 ≤ b'k x/βk ,        ∀ k ∈ J − ,        (1.37)
b'l x/βl ≤ b'k x/βk ,        ∀ l ∈ J + , k ∈ J − .        (1.38)
x ∈ P⊥ ⇐⇒ x ∈ C,
where
P = { y | y'aj ≤ 0, j = 1, . . . , r + 2m },
C = { x | x = µ1 a1 + · · · + µr+2m ar+2m , µj ≥ 0 }.
for some vectors aj and some scalars bj . Consider the polyhedral cone of
ℜn+1
P̂ = { (x, w) | 0 ≤ w, a'j x ≤ bj w, j = 1, . . . , r }
for some vectors vj and scalars dj . Since w ≥ 0 for all vectors (x, w) ∈ P̂ ,
we see that dj ≥ 0 for all j. Let
J + = {j | dj > 0},        J 0 = {j | dj = 0}.
Thus, P is the vector sum of the convex hull of the vectors vj , j ∈ J + , plus
the finitely generated cone
{ Σ_{j∈J 0} µj vj | µj ≥ 0, j ∈ J 0 }.
To prove that the vector sum of the convex hull of a finite set of
points with a finitely generated cone is a polyhedral set, we use a reverse
argument; we pass to a finitely generated cone description, we use the
Minkowski – Weyl Theorem to assert that this cone is polyhedral, and we
finally construct a polyhedral set description. The details are left as an
exercise for the reader. Q.E.D.
An important fact that forms the basis for the simplex method of
linear programming is that if a linear function f attains a minimum over
a polyhedral set C having at least one extreme point, then f attains a
minimum at some extreme point of C (as well as possibly at some other
nonextreme points). We will come to this fact after considering the more
general case where f is concave and C is closed and convex.
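As a small added illustration of this fact, the following Python fragment enumerates the extreme points of the unit square (an illustrative choice of polyhedral set) and checks that a linear cost sampled over the square never improves on its best vertex value.

import numpy as np
from itertools import product

# Extreme points of the unit square [0, 1]^2.
vertices = np.array(list(product([0.0, 1.0], repeat=2)))
c = np.array([2.0, -3.0])              # linear cost f(x) = c'x

best_vertex_value = float(np.min(vertices @ c))

# The minimum over the whole square is attained at an extreme point:
rng = np.random.default_rng(1)
samples = rng.uniform(0.0, 1.0, size=(2000, 2))
print(best_vertex_value)                                    # -3.0, attained at (0, 1)
print(np.min(samples @ c) >= best_vertex_value - 1e-12)     # True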
We say that a set C ⊂ ℜn is bounded from below if there exists a
vector b ∈ ℜn such that x ≥ b for all x ∈ C.
{x∗ + λd | λ1 ≤ λ ≤ λ2 }
for some λ2 > 0 and some λ1 < 0 for which the vector
x = x∗ + λ1 d
It follows that f (x∗ ) > f (x∗ + λ2 d). This contradicts the optimality of x∗ ,
proving that f (x) = f (x∗ ).
We have shown that the minimum of f is attained at some boundary
point x of C. If x is an extreme point of C, we are done. If it is not an
extreme point, consider a hyperplane H passing through x and containing
C in one of its halfspaces. The intersection T1 = C ∩ H is closed, convex,
bounded from below, and lies in an affine set M1 of dimension n − 1.
Furthermore, f attains its minimum over T1 at x. Thus, by the preceding
argument, it also attains its minimum at some boundary point x1 of T1 .
If x1 is an extreme point of T1 , then by Prop. 1.6.3, it is also an extreme
point of C and the result follows. If x1 is not an extreme point of T1 , then
we view M1 as a space of dimension n − 1 and we form T2 , the intersection
Ax ≥ b, ∀ x ∈ C.
of the line segment connecting the distinct vectors x̂ and 2c + x̂. Therefore,
an extreme point of P must belong to P̂ , and since P̂ ⊂ P , it must also be
an extreme point of P̂ . An extreme point of P̂ must be one of the vectors
v1 , . . . , vm , since otherwise this point would be expressible as a convex
combination of vectors of P̂ different from itself, and hence could not be an
extreme point. Thus the set of extreme points of P is either
empty or finite. Using Prop. 1.6.3(b), it follows that the set of extreme
points of P is nonempty and finite if and only if P contains no line.
If P is bounded, then we must have P = P̂ , and it can be shown
that P is equal to the convex hull of its extreme points (not just the con-
vex hull of the vectors v1 , . . . , vm ), as shown in the following proposition.
The proposition also gives another and more specific characterization of
extreme points of polyhedral sets, which is central in the theory of linear
programming.
P = {x | Ax = b, x ≥ 0},
Proof: (a) If the set Av contains fewer than n linearly independent vectors,
then the system of equations
a'j w = 0,        ∀ aj ∈ Av
and apply the result of part (a). We obtain that v is an extreme point if and
only if Ā contains n − k linearly independent rows, which is equivalent to
the n − k nonzero columns of Ā (corresponding to the nonzero coordinates
of v) being linearly independent.
(c) We use induction on the dimension of the space. Suppose that all
bounded polyhedra of (n − 1)-dimensional spaces have a representation of
the form (1.39), but there is a bounded polyhedron P ⊂ <n and a vector
x ∈ P , which is not in the convex hull PE of the extreme points of P . Let
x̂ be the projection of x on PE and let x be a solution of the problem
maximize    (x − x̂)'z
subject to z ∈ P.
The polyhedron
P̂ = P ∩ { z | (x − x̂)'z = (x − x̂)'x }
is equal to the convex hull of its extreme points by the induction hypothesis.
One can verify that PE ∩ P̂ = Ø, while, by Prop. 1.6.3(a), each of the extreme points
of P̂ is also an extreme point of P , arriving at a contradiction.
(d) Since P is polyhedral, it has a representation
P = {x | Ax ≥ b},
for some m × n matrix A and some b ∈ ℜm . If A had rank less than n, then
its nullspace would contain some nonzero vector x, so P would contain
a line parallel to x, contradicting the existence of an extreme point [cf.
Prop. 1.6.3(b)]. Thus A has rank n and hence it must contain n linearly
independent rows that constitute an n × n invertible submatrix Â. If b̂ is
the corresponding subvector of b, we see that every x ∈ P satisfies Âx ≥ b̂.
The result then follows using Prop. 1.6.5. Q.E.D.
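To connect the characterization of extreme points of P = {x | Ax = b, x ≥ 0} with computation, the following Python sketch (an added illustration with a toy instance) enumerates candidate extreme points by selecting sets of linearly independent columns, solving for the corresponding coordinates, and keeping the nonnegative solutions.

import numpy as np
from itertools import combinations

# P = {x | Ax = b, x >= 0} with a tiny example in R^3.
A = np.array([[1.0, 1.0, 1.0]])        # one equality constraint, m = 1
b = np.array([1.0])                    # P is the unit simplex in R^3

n = A.shape[1]
extreme_points = []
for cols in combinations(range(n), A.shape[0]):
    B = A[:, cols]
    if np.linalg.matrix_rank(B) < len(cols):
        continue                       # selected columns are not linearly independent
    x = np.zeros(n)
    x[list(cols)] = np.linalg.solve(B, b)
    if np.all(x >= -1e-12):            # keep only feasible (nonnegative) solutions
        extreme_points.append(x)

print(np.array(extreme_points))        # the three unit vectors: vertices of the simplex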
EXERCISES
1.6.1
1.6.2
Show that the image and the inverse image of a polyhedral set under a linear
transformation is polyhedral.
1.6.3
(a) Let A be an m × n matrix. Then exactly one of the following two conditions
holds:
(i) There is an x ∈ ℜn such that Ax < 0 (all components of Ax are
strictly negative).
(ii) There is a µ ∈ ℜm such that A'µ = 0, µ ≠ 0, and µ ≥ 0.
(b) Show that an alternative and equivalent statement of part (a) is the follow-
ing: a polyhedral cone has nonempty interior if and only if its polar cone
does not contain a line, i.e., a set of the form {αz | α ∈ ℜ}, where z is a
nonzero vector.
1.6.4
Let P be a polyhedral set of the form
P = conv{v1 , . . . , vm } + C,
where v1 , . . . , vm are some vectors and C is a finitely generated cone (cf. Prop.
1.6.2). Show that the recession cone of P is equal to C.
1.7 SUBGRADIENTS
which is illustrated in Fig. 1.7.1. For a formal proof, note that, using the
definition of a convex function [cf. Eq. (1.3)], we obtain
f(y) ≤ ( (y − x)/(z − x) ) f(z) + ( (z − y)/(z − x) ) f(x).
Consider also the difference quotient
s+ (x, α) = ( f(x + α) − f(x) ) / α .
Figure 1.7.1. Illustration of the inequalities (1.40). The rate of change of the
function f is nondecreasing with its argument.
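This monotonicity is easy to check numerically; the following Python fragment (a rough added illustration, with f(t) = t^4 as an arbitrary convex choice) evaluates the difference quotient s+ (x, α) at increasing values of α.

import numpy as np

f = lambda t: t**4                     # a convex scalar function (illustrative choice)
x = -0.5
alphas = np.array([0.1, 0.5, 1.0, 2.0, 4.0])

quotients = (f(x + alphas) - f(x)) / alphas
print(quotients)
print(np.all(np.diff(quotients) >= 0))   # nondecreasing in alpha -> True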
where Fy+ (0) is the right derivative of the convex scalar function
Fy (α) = f (x + αy)
at α = 0. Note that the above calculation also shows that the left derivative
Fy− (0) of Fy is equal to −f'(x; −y) and, by using Prop. 1.7.1(a), we obtain
Fy− (0) ≤ Fy+ (0), or equivalently,
−f'(x; −y) ≤ f'(x; y).
f'(x∗ ; x − x∗ ) ≥ 0,        ∀ x ∈ X.
This follows from the definition (1.41) of directional derivative, and from
the fact that the difference quotient
( f(x∗ + α(x − x∗ )) − f(x∗ ) ) / α
is monotonically nondecreasing in α.
Proof: For any µ > f'(x; y), there exists an ᾱ > 0 such that
( f(x + αy) − f(x) ) / α < µ,        ∀ α ≤ ᾱ.
Hence, for α ≤ ᾱ, we have
Since this is true for all µ > f 0 (x; y), inequality (1.43) follows.
If f is differentiable at all x ∈ <n , then using the continuity of f and
the part of the proposition just proved, we have for every sequence {xk }
converging to x and every y ∈ <n ,
[Figure: the subdifferential ∂f(x), plotted as a function of x, for two examples
of scalar convex functions.]
f 0 (x; y) ≥ y 0 d, ∀ y ∈ <n .
d0 (x − x∗ ) ≥ 0, ∀ x ∈ X.
see Fig. 1.7.3. Using the definition of directional derivative and the convex-
ity of f , it follows that these two sets are nonempty, convex, and disjoint.
By applying the Separating Hyperplane Theorem (Prop. 1.4.2), we see that
there exists a nonzero vector (γ, w) ∈ <n+1 such that
¡ ¢
γµ+w 0 z ≥ γ f(x)+αf 0 (x; y) +w 0 (x+αy), ∀ α ≥ 0, z ∈ <n , µ > f (z).
(1.47)
We cannot have γ < 0 since then the left-hand side above could be made
arbitrarily small by choosing µ sufficiently large. Also if γ = 0, then Eq.
(1.47) implies that w = 0, which is a contradiction. Therefore, γ > 0 and
by dividing with γ in Eq. (1.47), we obtain
Figure 1.7.3. Illustration of the sets C1 and C2 used in the hyperplane separation
argument of the proof of Prop. 1.7.3(b).
Since both {xk } and {yk } are bounded, they must contain convergent sub-
sequences. We assume without loss of generality that xk converges to some
x and yk converges to some y with kyk = 1. By Eq. (1.45), we have
g'z ≤ h'(Ax; z),        ∀ z ∈ ℜm ,
and in particular, for z = Ay,
g'(Ay) ≤ h'(Ax; Ay) = f'(x; y),        ∀ y ∈ ℜn ,
or
(A'g)'y ≤ f'(x; y),        ∀ y ∈ ℜn .
Hence, by part (a), we have A'g ∈ ∂f(x), so that A'∂h(Ax) ⊂ ∂f(x).
To prove the reverse inclusion, suppose, to arrive at a contradiction,
that there exists a d ∈ ∂f(x) such that d ∉ A'∂h(Ax). Since, by part (b),
the set ∂h(Ax) is compact, the set A'∂h(Ax) is also compact [cf. Prop.
1.1.9(d)], and by Prop. 1.4.3, there exists a hyperplane strictly separating
{d} from A'∂h(Ax), i.e., a vector y and a scalar b such that
f 0 (x; y) < y 0 d,
c < 0 ≤ d0 y, ∀ d ∈ W, (1.49)
Thus, using part (b), we have f 0 (x∗ ; y) < 0, while from Eq. (1.49), we see
that y belongs to the polar cone of FX (x∗ )⊥ , which by the Polar Cone
theorem (Prop. 1.5.1), implies that y is in the closure of the set of feasible
directions FX (x∗ ). Hence, for a sequence {yk } of feasible directions converging
to y, we have f'(x∗ ; yk ) < 0 for all sufficiently large k, which contradicts the
optimality of x∗ .
The last statement follows from the convexity of X which implies that
TX (x∗ )⊥ is the set of all z such that z 0 (x − x∗ ) ≤ 0 for all x ∈ X (cf. Props.
1.5.3 and 1.5.5). Q.E.D.
Note that the last part of the above proposition generalizes the opti-
mality condition of Prop. 1.6.2 for the case where f is convex and smooth:
∇f(x∗ )0 (x − x∗ ) ≥ 0, ∀ x ∈ X.
In the special case where X = <n , we obtain a basic necessary and sufficient
condition for unconstrained optimality of x∗ :
0 ∈ ∂f (x∗ ).
This optimality condition is also evident from the subgradient inequality
(1.44).
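For example (an added illustration, with f chosen for simplicity), let f(x) = |x| + (x − 1)² on ℜ. For x > 0 the function is differentiable with ∂f(x) = {1 + 2(x − 1)}, so 0 ∈ ∂f(x∗ ) exactly when x∗ = 1/2, while at the kink we have ∂f(0) = [−1, 1] + {−2} = [−3, −1], which does not contain 0. Thus x∗ = 1/2 is the unique global minimum, consistent with the condition above.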
We close this section with a version of the chain rule for directional
derivatives and subgradients.
Proof: We have
F'(x; y) = lim_{α↓0} ( F(x + αy) − F(x) ) / α = lim_{α↓0} ( g(f(x + αy)) − g(f(x)) ) / α.        (1.52)
From the convexity of f it follows that there are three possibilities: (1)
For some ᾱ > 0, f(x + αy) = f(x) for all α ∈ (0, ᾱ], (2) For some ᾱ > 0,
f(x + αy) > f(x) for all α ∈ (0, ᾱ], (3) For some ᾱ > 0, f(x + αy) < f(x)
for all α ∈ (0, ᾱ].
In case (1), from Eq. (1.52), we have F'(x; y) = f'(x; y) = 0 and the
given formula (1.50) holds. In case (2), from Eq. (1.52), we have for all
α ∈ (0, ᾱ]
F'(x; y) = lim_{α↓0} ( ( f(x + αy) − f(x) ) / α ) · ( ( g(f(x + αy)) − g(f(x)) ) / ( f(x + αy) − f(x) ) ).
where for the first inequality we use the convexity of g, and for the second
inequality we use the convexity of f and the monotonicity of g.
To obtain the formula for the subdifferential of F , we note that by
Prop. 1.7.3(a), d ∈ ∂F(x) if and only if y'd ≤ F'(x; y) for all y ∈ ℜn , or
equivalently (from what has been already shown)
y'd ≤ ∇g(f(x)) f'(x; y),        ∀ y ∈ ℜn .
If ∇g(f(x)) = 0, this relation yields d = 0, so ∂F(x) = {0} and the desired
formula (1.51) holds. If ∇g(f(x)) ≠ 0, we have ∇g(f(x)) > 0 by the
monotonicity of g, so we obtain
y'( d/∇g(f(x)) ) ≤ f'(x; y),        ∀ y ∈ ℜn ,
which, by Prop. 1.7.3(a), is equivalent to d/∇g(f(x)) ∈ ∂f(x). Thus we
have shown that d ∈ ∂F(x) if and only if d/∇g(f(x)) ∈ ∂f(x), which
proves the desired formula (1.51). Q.E.D.
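As an added illustration of formula (1.51), let f(x) = |x| and g(t) = e^t, a smooth convex function that is monotonically nondecreasing, so that F(x) = e^|x|. Then ∇g(f(0)) = e^0 = 1 and ∂f(0) = [−1, 1], so (1.51) gives ∂F(0) = [−1, 1]; for x > 0 it gives ∂F(x) = {e^x}, in agreement with ordinary differentiation of e^x.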
1.7.3 ε-Subgradients
inf_{α>0} ( f(x + αy) − f(x) + ε ) / α = max_{d∈∂ε f(x)} y'd.
(d) If 0 ∉ ∂ε f(x), then the direction y = −d, where
Hence
as y ranges over ℜn . Hence ∂ε f(x) is closed and convex. To show that
∂ε f(x) is also bounded, suppose to arrive at a contradiction that there is a
sequence {dk } ⊂ ∂ε f(x) with ‖dk ‖ → ∞. Let yk = dk /‖dk ‖. Then, from Eq.
(1.54), we have for α = 1
f(x + yk ) ≥ f(x) + ‖dk ‖ − ε,        ∀ k,
so it follows that f(x + yk ) → ∞. This is a contradiction since f is convex
and hence continuous, so it is bounded on any bounded set. Thus ∂ε f(x)
is bounded.
To show that ∂ε f(x) is nonempty and satisfies
inf_{α>0} ( f(x + αy) − f(x) + ε ) / α = max_{d∈∂ε f(x)} y'd,        ∀ y ∈ ℜn ,
−(w/γ)'y ≥ inf_{α>0} ( f(x + αy) − f(x) + ε ) / α.
max_{d∈∂ε f(x)} d'y = inf_{α>0} ( f(x + αy) − f(x) + ε ) / α.
(b) By definition, 0 ∈ ∂ε f(x) if and only if f(z) ≥ f(x) − ε for all z ∈ ℜn ,
which is equivalent to inf_{z∈ℜn} f(z) ≥ f(x) − ε.
(c) Assume that a direction y is such that
max_{d∈∂ε f(x)} y'd < 0,
while inf_{α>0} f(x + αy) ≥ f(x) − ε. Then f(x + αy) − f(x) ≥ −ε for all
α > 0, or equivalently
max_{d∈∂ε f(x)} y'd = inf_{α>0} ( f(x + αy) − f(x) + ε ) / α ≥ 0,
which contradicts the hypothesis.
(d − d̄)'d̄ ≥ 0,        ∀ d ∈ ∂ε f(x).
Hence
d'd̄ ≥ ‖d̄‖² > 0,        ∀ d ∈ ∂ε f(x).
(e) It will suffice to prove the result for the case where f = f1 + f2 . If
d1 ∈ ∂ε f1 (x) and d2 ∈ ∂ε f2 (x), then from Eq. (1.54), we have
f1 (x + y) ≥ f1 (x) + d1 'y − ε,        ∀ y ∈ ℜn ,
f2 (x + y) ≥ f2 (x) + d2 'y − ε,        ∀ y ∈ ℜn ,
so by adding, f(x + y) ≥ f(x) + (d1 + d2 )'y − 2ε for all y ∈ ℜn .
Hence from Eq. (1.54), we have d1 + d2 ∈ ∂2ε f(x), implying that ∂ε f1 (x) +
∂ε f2 (x) ⊂ ∂2ε f(x).
To prove that ∂ε f(x) ⊂ ∂ε f1 (x) + ∂ε f2 (x), we use an argument similar
to the one of the proof of Prop. 1.7.3(d). Suppose, to come to a contradic-
tion, that there exists a d ∈ ∂ε f(x) such that d ∉ ∂ε f1 (x) + ∂ε f2 (x). Since, by
part (a), the sets ∂ε f1 (x) and ∂ε f2 (x) are compact, the set ∂ε f1 (x) + ∂ε f2 (x)
is compact (cf. Prop. 1.2.16), and by Prop. 1.4.3, there exists a hyperplane
strictly separating {d} from ∂ε f1 (x) + ∂ε f2 (x), i.e., a vector y and a scalar
b such that
Σ_{j=1}^{2} ( fj (x + αj y) − fj (x) + ε ) / αj < y'd.        (1.59)
Define
ᾱ = 1 / ( 1/α1 + 1/α2 ).
As a consequence of the convexity of fj , the ratio ( fj (x + αy) − fj (x) ) / α
is monotonically nondecreasing in α. Thus, since αj ≥ ᾱ, we have
Thus, ∂f (x) can be empty and can be unbounded at points x that belong
to the effective domain of f (as in the cases x = 0 and x = 1, respectively,
of the above example). However, it can be shown that ∂f (x) is nonempty
and compact at points x that are interior points of the effective domain of
f , as also illustrated by the above example.
Similarly, a vector d is an ε-subgradient of f at a vector x such that
f(x) < ∞ if
f(z) ≥ f(x) + (z − x)'d − ε,        ∀ z ∈ ℜn .
[Figure: two examples of a scalar convex function f with effective domain D,
illustrating the construction of the ε-subdifferential ∂ε f(x).]
Note that these endpoints can be −∞ (as in the figure on the right) or ∞. For
ε = 0, the above formulas also give the endpoints of the subdifferential ∂f(x).
Note that while ∂f (x) is nonempty for all x in the interior of D, it may be empty
for x at the boundary of D (as in the figure on the right).
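As a concrete scalar illustration (an added example, with f chosen for simplicity), let f(x) = |x| and fix x > 0 and ε with 0 < ε ≤ 2x. A scalar d is an ε-subgradient at x if |z| ≥ x + d(z − x) − ε for all z; since min_z ( |z| − dz ) equals 0 when |d| ≤ 1 and −∞ otherwise, the condition holds exactly when |d| ≤ 1 and 0 ≥ x − dx − ε, that is,
∂ε f(x) = [1 − ε/x, 1].
For instance, ∂ε f(1) = [1/2, 1] for ε = 1/2, and letting ε ↓ 0 recovers ∂f(x) = {1}.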
∂φ(x, z)/∂xi ,        i = 1, . . . , n.
(c) The conclusion of part (a) also holds if, instead of assuming that
Z is compact, we assume that Z(x) is nonempty for all x ∈ ℜn ,
and that φ and Z are such that for every convergent sequence
{xk }, there exists a bounded sequence {zk } with zk ∈ Z(xk ) for
all k.
To prove the reverse inequality and that the supremum in the right-
hand side of the above inequality is attained, consider a sequence {αk } with
αk ↓ 0 and let xk = x + αk y. For each k, let zk be a vector in Z(xk ). Since
{zk } belongs to the compact set Z, it has a subsequence converging to some
z ∈ Z. Without loss of generality, we assume that the entire sequence {zk }
converges to z. We have
φ(xk , zk ) ≥ φ(xk , z), ∀ z ∈ Z,
where the last inequality follows from inequality (1.42). We apply Prop.
1.4.2 to the functions fk defined by fk (·) = φ(·, zk ), and with xk = x + αk y,
to obtain
lim sup_{k→∞} φ'(x + αk y, zk ; y) ≤ φ'(x, z; y).        (1.64)
f'(x; y) = max_{d∈∂f(x)} d'y.
f(y) ≥ φ(y, z)
≥ φ(x, z) + ∇x φ(x, z)'(y − x)
= f(x) + ∇x φ(x, z)'(y − x).
contradicting Prop. 1.7.3. Thus ∂f(x) ⊂ conv{ ∇x φ(x, z) | z ∈ Z(x) } and
the proof is complete.
(c) The proof of this part is nearly identical to the proof of part (a).
Q.E.D.
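A simple added numerical illustration of this result (with illustrative data, not from the text): for f(x) = max_i (a_i'x + b_i), the maximizing indices at x play the role of Z(x), and the directional derivative f'(x; y) should equal the largest inner product of y with the corresponding gradients a_i.

import numpy as np

# f(x) = max_i (a_i' x + b_i); Z(x) = set of maximizing indices i.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, -1.0])
f = lambda x: np.max(A @ x + b)

x = np.array([1.0, 1.0])               # here all three rows attain the maximum value 1
y = np.array([0.3, -0.7])

# Danskin-type formula: f'(x; y) = max over active i of a_i' y.
vals = A @ x + b
active = np.flatnonzero(vals >= vals.max() - 1e-9)
formula = np.max(A[active] @ y)

# Finite-difference estimate of the directional derivative.
alpha = 1e-6
estimate = (f(x + alpha * y) - f(x)) / alpha
print(formula, estimate)               # the two numbers should agree closely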
EXERCISES
1.7.1
Consider the problem of minimizing a convex function f : ℜn ↦ ℜ over the
polyhedral set
X = { x | a'j x ≤ bj , j = 1, . . . , r }.
Show that x∗ is an optimal solution if and only if there exist scalars µ∗1 , . . . , µ∗r
such that
(i) µ∗j ≥ 0 for all j, and µ∗j = 0 for all j such that a'j x∗ < bj .
(ii) 0 ∈ ∂f(x∗ ) + Σ_{j=1}^{r} µ∗j aj .
Hint : Characterize the cone TX (x∗ )⊥ , and use Prop. 1.7.5 and Farkas’ lemma.
1.7.2
Let f : ℜn ↦ ℜ be a convex function and fix a vector x ∈ ℜn . Show that a
vector d is the vector of minimum norm in ∂f(x) if and only if either d = 0 or
else −d/‖d‖ minimizes f'(x; y) over all y with ‖y‖ ≤ 1.
Let f : ℜn ↦ ℜ be a convex function and fix a vector x ∈ ℜn . A vector d ∈ ℜn is
said to be a descent direction of f at x if the corresponding directional derivative
of f satisfies
f'(x; d) < 0.
gk 'wk = min_{g∈∂f(x)} g'wk = f'(x; wk ).
Show that this process terminates in a finite number of steps with a descent
direction. Hint: If wk is not a descent direction, then gi 'wk ≥ ‖wk ‖² ≥ ‖g∗ ‖² > 0
for all i = 1, . . . , k − 1, where g∗ is the subgradient of minimum norm, while at
the same time gk 'wk ≤ 0. Consider a limit point of {(wk , gk )}.
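The following Python sketch is a rough illustration of such a procedure (the test function, the subgradient oracle, and the Frank–Wolfe routine for the minimum-norm point are illustrative choices of my own, not part of the exercise): it maintains subgradients at x, computes an approximate minimum-norm vector wk in their convex hull, and stops when −wk is (numerically) a descent direction.

import numpy as np

def min_norm_in_hull(G, iters=500):
    # Approximate minimum-norm point in the convex hull of the rows of G
    # by a plain Frank-Wolfe iteration on the simplex of weights.
    k = G.shape[0]
    lam = np.full(k, 1.0 / k)
    for _ in range(iters):
        w = G.T @ lam                       # current point Sum_i lam_i g_i
        grad = G @ w                        # gradient of (1/2)||G.T lam||^2 w.r.t. lam
        d = -lam.copy()
        d[np.argmin(grad)] += 1.0           # Frank-Wolfe direction e_i - lam
        Gd = G.T @ d
        if Gd @ Gd < 1e-16:
            break
        lam += np.clip(-(w @ Gd) / (Gd @ Gd), 0.0, 1.0) * d   # exact line search
    return G.T @ lam

# Illustrative test problem: f(x) = max(x1, x2); at x = 0 the subdifferential
# is the convex hull of (1, 0) and (0, 1).
A = np.array([[1.0, 0.0], [0.0, 1.0]])
f = lambda x: np.max(A @ x)

def subgradient_minimizing(x, w):
    # A subgradient at x minimizing the inner product with w:
    # among the active rows of A, pick the one with smallest inner product with w.
    vals = A @ x
    active = np.flatnonzero(vals >= vals.max() - 1e-9)
    return A[active[np.argmin(A[active] @ w)]]

x = np.zeros(2)
subgrads = [A[0]]                           # one subgradient to start
for _ in range(10):
    w = min_norm_in_hull(np.array(subgrads))
    step = 1e-4
    if f(x - step * w) < f(x) - 0.5 * step * (w @ w):   # crude test that -w is a descent direction
        print("descent direction:", -w)
        break
    subgrads.append(subgradient_minimizing(x, w))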
Let f : ℜn ↦ ℜ be a convex function. This exercise shows how the procedure of
Exercise 1.7.4 can be modified so that it generates an ε-descent direction at a given
vector x. At the typical step of this procedure, we have g1 , . . . , gk−1 ∈ ∂ε f(x).
Let wk be the vector of minimum norm in the convex hull of g1 , . . . , gk−1 ,
wk = arg min_{g∈conv{g1 ,...,gk−1}} ‖g‖.
Show that this process will terminate in a finite number of steps with either an
improvement of the value of f by at least ε, or a confirmation that x is an
ε-optimal solution.
and
(xk − x∗ )/‖xk − x∗ ‖ = y/‖y‖ + ξk .
We write this equation as
xk − x∗ = ( ‖xk − x∗ ‖/‖y‖ ) yk ,        (1.65)
where
yk = y + ‖y‖ξk .
By the convexity of f1 , we have
and
f1 (xk ) + f1 '(xk ; x∗ − xk ) ≤ f1 (x∗ ),
and by adding these inequalities, we obtain
where x̃k is a vector on the line segment joining xk and x∗ . By adding the
last two relations,
f(xk ) ≤ f(x∗ ) + ( ‖xk − x∗ ‖/‖y‖ ) ( f1 '(xk ; yk ) + ∇f2 (x̃k )'yk ),        ∀ k.        (1.66)
and [from Eq. (1.66)] f(xk ) < f (x∗ ). This contradicts the local optimality
of x∗ .
We have thus shown that f1 '(x∗ ; y) + ∇f2 (x∗ )'y ≥ 0 for all y ∈ TX (x∗ ),
or equivalently, by Prop. 1.7.3(b),
We can now apply the Saddle Point Theorem (Prop. 1.3.8 – the convex-
ity/concavity and compactness assumptions of that proposition are satis-
fied) to assert that there exists a d ∈ ∂f1 (x∗ ) such that
min_{‖y‖≤1, y∈TX (x∗ )} ( d + ∇f2 (x∗ ) )'y = 0.
Note that in the special case where f1 (x) ≡ 0, we obtain Prop. 1.6.2.
The convexity assumption on TX (x∗ ) is unnecessary in this case, but it is
essential in general. [Consider the subset X = { (x1 , x2 ) | x1 x2 = 0 } of
ℜ2 ; it is easy to construct a convex nondifferentiable function that has a
global minimum at x∗ = 0 without satisfying the necessary condition of
Prop. 1.8.1.]
In the special case where f2 (x) ≡ 0 and X is convex, Prop. 1.8.1 yields
the necessity part of Prop. 1.7.3(e). More generally, when X is convex, an
equivalent statement of Prop. 1.8.1 is that if x∗ is a local minimum of f
over X, there exists a subgradient d ∈ ∂f1 (x∗ ) such that
( d + ∇f2 (x∗ ) )'(x − x∗ ) ≥ 0,        ∀ x ∈ X.
REFERENCES
[EkT76] Ekeland, I., and Temam, R., 1976. Convex Analysis and Varia-
tional Problems, North-Holland Publ., Amsterdam.
[Fen51] Fenchel, W., 1951. “Convex Cones, Sets, and Functions,” Mimeogra-
phed Notes, Princeton Univ.
[HiL93] Hiriart-Urruty, J.-B., and Lemarechal, C., 1993. Convex Analysis
and Minimization Algorithms, Vols. I and II, Springer-Verlag, Berlin and
N. Y.
[HoK71] Hoffman, K., and Kunze, R., 1971. Linear Algebra, Prentice-Hall,
Englewood Cliffs, N. J.
[LaT85] Lancaster, P., and Tismenetsky, M., 1985. The Theory of Matrices,
Academic Press, N. Y.
[Lem74] Lemarechal, C., 1974. “An Algorithm for Minimizing Convex Func-
tions,” in Information Processing ’74, Rosenfeld, J. L., (Ed.), pp. 552-556,
North-Holland, Amsterdam.
[Lem75] Lemarechal, C., 1975. “An Extension of Davidon Methods to Non-
differentiable Problems,” Math. Programming Study 3, Balinski, M., and
Wolfe, P., (Eds.), North-Holland, Amsterdam, pp. 95-109.
[Min11] Minkowski, H., 1911. “Theorie der Konvexen Körper, Insbesondere
Begründung Ihres Oberflächenbegriffs,” Gesammelte Abhandlungen, II,
Teubner, Leipzig.
[Min60] Minty, G. J., 1960. “Monotone Networks,” Proc. Roy. Soc. London,
A, Vol. 257, pp. 194-212.
[Mor76] Mordukhovich, B. S., 1976. “Maximum Principle in the Problem
of Time Optimal Response with Nonsmooth Constraints,” J. of Applied
Mathematics and Mechanics, Vol. 40, pp. 960-969.
[OrR70] Ortega, J. M., and Rheinboldt, W. C., 1970. Iterative Solution of
Nonlinear Equations in Several Variables, Academic Press, N. Y.
[RoW98] Rockafellar, R. T., and Wets, R. J.-B., 1998. Variational Analysis,
Springer-Verlag, Berlin.
[Roc70] Rockafellar, R. T., 1970. Convex Analysis, Princeton Univ. Press,
Princeton, N. J.
[Roc84] Rockafellar, R. T., 1984. Network Flows and Monotropic Opti-
mization, Wiley, N. Y.; republished by Athena Scientific, Belmont, MA,
1998.
[Rud76] Rudin, W., 1976. Principles of Mathematical Analysis, McGraw-
Hill, N. Y.
[Ste13] Steinitz, H., 1913. “Bedingt Konvergente Reihen und Konvexe Sys-
teme, I,” J. of Math., Vol. 143, pp. 128-175.
[Ste14] Steinitz, H., 1914. “Bedingt Konvergente Reihen und Konvexe Sys-
teme, II,” J. of Math., Vol. 144, pp. 1-40.
[Ste16] Steinitz, H., 1916. “Bedingt Konvergente Reihen und Konvexe Sys-
teme, III,” J. of Math., Vol. 146, pp. 1-52.
[Str76] Strang, G., 1976. Linear Algebra and Its Applications, Academic
Press, N. Y.
[Wey35] Weyl, H., 1935. “Elementare Theorie der Konvexen Polyeder,”
Commentarii Mathematici Helvetici, Vol. 7, pp. 290-306.