(9783110426045 - An Introduction To Nonlinear Optimization Theory) 3 The Study of Smooth Optimization Problems
This chapter plays a central role in this monograph. In its first section, we present
smooth optimization problems and deduce existence conditions for minimality. We
take this opportunity to prove and discuss the Ekeland Variational Principle and its
consequences. We then obtain necessary conditions for optimality as well as suffi-
cient optimality conditions of the first and second-order for smooth objective func-
tions under geometric restrictions (i.e., restrictions of the type x ∈ M, where M is
an arbitrary set). The second section is dedicated to the investigation of optimality
conditions under functional restrictions (with equalities and inequalities). The main
aim is to deduce Karush-Kuhn-Tucker conditions and to introduce and compare sev-
eral qualification conditions. Special attention is paid to the case of convex and affine
data. Subsequently, we derive second-order optimality conditions for the case of func-
tional restrictions. The last section of this chapter includes two examples which show
that, for practical problems, the computational challenges posed by the optimality
conditions are sometimes not easy to solve.
Throughout, we consider an open set U ⊂ Rp, a function f : U → R, and a nonempty set M ⊂ U, and we study the problem
(P)  minimize f(x) subject to x ∈ M.
The function f is called the objective function, or cost function, and the set M is called the set of feasible points of the problem (P), or the set of constraints, or the set of restrictions.
We should say from the very beginning that we shall study the minimization of f, but, by virtue of the relation max f = − min(−f), similar results could be obtained for its maximization. Let us start by defining the notion of a solution associated with the problem (P).
Definition 3.1.1. One says that x̄ ∈ M is a local solution (or, simply, solution) of the problem (P), or a local minimum point of the function f on the set M, if there exists a neighborhood V of x̄ such that f(x̄) ≤ f(x) for every x ∈ M ∩ V. If one can take V = Rp, one says that x̄ is a global solution of (P) or a global minimum point of f on M.
Of course, for the maximization problem, the corresponding solution is clear. Let us
mention that in this chapter we will only deal with the case of smooth functions (up
to the order two, i.e., f ∈ C2 ).
Remark 3.1.2. We shall distinguish two main situations for the study of problem (P) : (i)
the case where M = U and (ii) the case where M appears as an intersection between a
closed set of Rp and U. In the former case, we say that the optimization problem (P) has
no constraints (or restrictions), while in the latter, we call this a problem with constraints.
In the case of a problem without constraints, we also use the term "local minimum point of f" instead of local solution. Let us observe that if x̄ ∈ int M is a local solution in the sense of Definition 3.1.1, then x̄ is a local solution of the unconstrained problem as well (it is enough to take a smaller neighborhood V such that V ⊂ M). So, in the case of problems with restrictions, the interesting situation is when x̄ ∈ bd M. If x̄ ∈ int M we say that the restriction is inactive.
In the next sections, the two cases mentioned above will be treated together, but
afterwards the discussion will split.
The basis of Optimization Theory consists of two fundamental results: the Weierstrass
Theorem which ensures the existence of extrema and the Fermat Theorem (on station-
ary, or critical points) which gives a necessary condition for a point to be an extremum
(without constraints) of a function. The theory follows the main trajectory of these fun-
damental results: on one hand, the study of existence conditions, and, on the other
hand, the study of the (necessary and sufficient) optimality conditions.
We now recall these basic results. The classical Weierstrass Theorem, also given in the first chapter, states that a continuous function on a compact interval has a global minimum point on that interval. We have shown already that some conditions can be relaxed. Here is the theorem again, in the relaxed form used below: a lower semicontinuous function on a nonempty compact subset of Rp is lower bounded and attains its global minimum on that set (Theorem 3.1.3).
We now present the Fermat Theorem: if f is differentiable at a point a interior to its domain and a is a local extremum point of f, then ∇f(a) = 0. The particular case of a function of one real variable was presented in Section 1.3. The proof of the theorem will be given later in this chapter.
Remark 3.1.6. 1. The converse of the Fermat Theorem is not true: for instance, the derivative of f : R → R, f(x) = x³ vanishes at 0, but this point is neither a minimum nor a maximum point of f.
2. The interiority condition for a is essential: without this assumption the conclusion does not hold. Take f : [0, 1] → [0, 1], f(x) = x, which has at x = 0 a minimum point where the derivative is not 0. Therefore, in view of Remark 3.1.2, one can say that the Fermat Theorem applies only to unconstrained problems.
3. If S is compact and f is continuous on S and differentiable on int S, it is possible
to have ∇f (x) ≠ 0 for every x ∈ int S, and in such a case the extreme points of f on S,
which surely exist from Weierstrass Theorem, lie on the boundary of S.
We now start our discussion on the existence conditions for the minimum points.
Proof Fix ν > inf_{x∈M} f and consider the level set N_ν f := {x | f(x) ≤ ν}. It is obvious that if f attains its global minimum on M ∩ N_ν f, then it attains its global minimum on M as well, since f > ν outside N_ν f. Since the set M ∩ N_ν f is compact and f is lower semicontinuous, from the Weierstrass Theorem 3.1.3 we infer that f is lower bounded and attains its global minimum on M ∩ N_ν f, whence on M, too.
Proof Let ν > inf_{x∈M} f. If the set N_ν f ∩ M were unbounded, there would exist a sequence (x_k) ⊂ N_ν f ∩ M with ‖x_k‖ → ∞. On the one hand, by our assumption, lim f(x_k) = ∞ and, on the other hand, f(x_k) ≤ ν for every k ∈ N, which is absurd. So N_ν f ∩ M is bounded.
Proposition 3.1.9. Let D, C ⊂ Rp be two sets such that C is compact and D ∩ C ≠ ∅. Let φ : D ∩ C → R be a continuous function. Suppose that the following condition holds: for every sequence (x_k) ⊂ D ∩ C with x_k → x ∈ bd D ∩ C, the sequence (φ(x_k)) is unbounded above. Then φ is lower bounded and attains its minimum on D ∩ C.
Proof If φ is constant, the conclusion is trivial, so suppose it is not, and let x0 ∈ D ∩ C be such that φ(x0) > inf_{x∈D∩C} φ(x). It is enough to show that the level set A := {x ∈ D ∩ C | φ(x) ≤ φ(x0)} is compact (see the proof of Theorem 3.1.7). Obviously, A is bounded (as a subset of C). It remains to show that A is closed. Let (x_k) ⊂ A, x_k → x. Suppose, by contradiction, that x ∉ A. If x belonged to D ∩ C, the continuity of φ would give φ(x) ≤ φ(x0), that is, x ∈ A; hence x ∉ D, and from the closedness of C,
x ∈ (cl D \ D) ∩ C ⊂ bd D ∩ C.
By hypothesis, (φ(x_k)) is then unbounded above, which contradicts φ(x_k) ≤ φ(x0) for every k. Hence A is closed, and the proof is complete.
Proof The implication (i) ⇒ (ii) is obvious. Suppose that (ii) holds, but x̄ is not a global minimum point. Then there exists u ∈ Rp with f(u) < f(x̄). We define φ : [0, 1] → R by φ(t) := f(tx̄ + (1 − t)u). The set S := {t ∈ [0, 1] | φ(t) = f(x̄)} is nonempty (1 ∈ S), closed (from the continuity of f) and bounded, so there is t0 = min S. Clearly, t0 ∈ (0, 1]. From the hypothesis, the equality f(t0x̄ + (1 − t0)u) = f(x̄) tells us that the point t0x̄ + (1 − t0)u is a local minimum of f. Consequently, there is ε > 0 such that for every t ∈ [0, 1] ∩ (t0 − ε, t0 + ε), φ(t) ≥ φ(t0). Since t0 = min S, if one takes t1 ∈ [0, 1] ∩ (t0 − ε, t0), the strict inequality φ(t1) > φ(t0) > φ(0) holds. The function f being continuous, φ has the Darboux property (or intermediate value property), whence there exists t2 ∈ (0, t1) with φ(t2) = φ(t0) = f(x̄), and this contradicts the minimality of t0. Therefore, the assumption made was false, hence the conclusion holds.
Notice that, if f is not continuous, the result does not hold. For this, it is sufficient to analyze the following function f : R → R:
f(x) = −x − 1 for x ∈ (−∞, −1]; x + 1 for x ∈ (−1, 0); −1 for x = 0; −x + 1 for x ∈ (0, 1); x − 1 for x ∈ [1, ∞).
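A direct numerical check of the discontinuous counterexample above (a sketch; taking x̄ = 1 as the reference point is our choice, and x̄ = −1 works equally well): every point where f takes the value f(1) = 0 is a local minimum point, yet f(0) = −1 < 0, so 1 is not a global minimum point.

```python
# Sketch: the discontinuous function from the text.
def f(x):
    if x <= -1:
        return -x - 1
    if -1 < x < 0:
        return x + 1
    if x == 0:
        return -1.0
    if 0 < x < 1:
        return -x + 1
    return x - 1  # x >= 1

def is_local_min(x0, radius=1e-4, n=50):
    """Check f(x0) <= f(x) on a small symmetric grid around x0."""
    pts = [x0 + k * radius / n for k in range(-n, n + 1)]
    return all(f(x0) <= f(x) for x in pts)

# Branch by branch, the only solutions of f(x) = 0 are x = -1 and x = 1,
# and both are local minimum points ...
assert is_local_min(-1.0) and is_local_min(1.0)
# ... yet x = 1 is not a global minimum point, because f(0) = -1 < 0.
assert f(0) < f(1)
```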
Let us observe, by using the function f : R → R, f(x) = x³ and x̄ = 0, that one cannot replace in item (ii) above the local minimality by stationarity.
The above results give sufficient conditions for the existence of minimum points under compactness assumptions on the level sets. Conversely, it is clear that the lower boundedness of the function is a necessary condition for the existence of minimum points, but the boundedness of the level sets is not. For instance, the function f : (0, ∞) → R, f(x) = (x − 1)²e⁻ˣ attains its minimum at x = 1 and the minimal value is 0, but N_ν f is unbounded for every ν > 0 = inf{f(x) | x ∈ (0, ∞)}.
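This claim about f(x) = (x − 1)²e⁻ˣ can be checked directly; a minimal sketch of the computation:

```python
import math

def f(x):
    # the function from the text, considered on (0, infinity)
    return (x - 1) ** 2 * math.exp(-x)

# the global minimum is attained at x = 1, with minimal value 0
assert f(1.0) == 0.0
# yet every level set N_nu f with nu > 0 is unbounded: f tends to 0 at
# infinity, so arbitrarily large points satisfy f(x) <= nu
nu = 1e-6
assert all(f(10.0 ** k) <= nu for k in range(2, 7))
```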
Definition 3.1.11. Let f : Rp → R be a lower bounded function and take ε > 0. A point x_ε is called an ε-minimum of f if
f(x_ε) ≤ inf_{x∈Rp} f(x) + ε.
Since inf_{x∈Rp} f(x) ∈ R, the existence of ε-minima for every positive ε is ensured. We use the generic term approximate minima for ε-minima.
We now present a very important result, the Ekeland Variational Principle, which states that close to an approximate minimum point one can find a genuine minimum point of a suitable perturbation of the initial function. This result was proved by the French mathematician Ivar Ekeland in 1974.
Theorem 3.1.12 (Ekeland Variational Principle). Let f : Rp → R be a lower semicontinuous, lower bounded function, let ε > 0, and let x_ε be an ε-minimum of f. Then for every δ > 0 there exists x̄_ε ∈ Rp such that
f(x̄_ε) ≤ f(x_ε),
‖x̄_ε − x_ε‖ ≤ δ,
f(x̄_ε) ≤ f(x) + εδ⁻¹‖x − x̄_ε‖, ∀x ∈ Rp.
Proof Consider the function g : Rp → R, g(x) := f(x) + εδ⁻¹‖x − x_ε‖, which is lower semicontinuous and has bounded level sets (f is lower bounded and the perturbation term is coercive). From Theorem 3.1.7 and Proposition 3.1.8, the function g has a global minimum point, which we denote by x̄_ε. Consequently,
f(x̄_ε) + εδ⁻¹‖x̄_ε − x_ε‖ ≤ f(x) + εδ⁻¹‖x − x_ε‖, ∀x ∈ Rp. (3.1.1)
For x = x_ε, we get
f(x̄_ε) + εδ⁻¹‖x̄_ε − x_ε‖ ≤ f(x_ε),
whence f(x̄_ε) ≤ f(x_ε). That is the first relation in the conclusion. On the other hand, using this inequality and
f(x_ε) ≤ inf_{x∈Rp} f(x) + ε ≤ f(x̄_ε) + ε,
we obtain εδ⁻¹‖x̄_ε − x_ε‖ ≤ f(x_ε) − f(x̄_ε) ≤ ε, that is,
δ⁻¹‖x̄_ε − x_ε‖ ≤ 1.
This is the second part of the conclusion. Relation (3.1.1) allows us to write, successively, for any x ∈ Rp,
f(x̄_ε) ≤ f(x) + εδ⁻¹(‖x − x_ε‖ − ‖x̄_ε − x_ε‖) ≤ f(x) + εδ⁻¹‖x − x̄_ε‖,
by the triangle inequality, which is the third relation.
Remark 3.1.13. Notice that the Ekeland Variational Principle holds (with minor changes
in the proof) if instead of the whole space Rp one takes a closed subset of it.
Observe that x̄_ε is a global minimum point of the perturbed function
x ↦ f(x) + εδ⁻¹‖x − x̄_ε‖.
If we want the perturbation term to be small (i.e., εδ⁻¹ small), then δ must be large, and x̄_ε may lie far from x_ε. On the other hand, if we want x̄_ε to be close to x_ε (i.e., δ to be small), then the perturbation term εδ⁻¹‖· − x̄_ε‖ is big. A compromise would be to choose δ := √ε, and in this case one gets the next consequence.
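To see the principle at work, take f(x) = eˣ (our choice of test function): f is lower bounded with inf f = 0, the infimum is not attained, and yet the perturbed function does attain its minimum. The sketch below locates that minimum by grid search; for this particular f, the minimizer of the perturbed function turns out to be x_ε itself.

```python
import math

f = math.exp                     # inf f = 0, not attained
eps = 1e-2
delta = math.sqrt(eps)           # the compromise delta = sqrt(eps)
x_eps = math.log(eps / 2)        # f(x_eps) = eps/2 <= inf f + eps: an eps-minimum

# perturbed function g(x) = f(x) + (eps/delta)|x - x_eps|; crude grid search
grid = [x_eps + k * 1e-3 for k in range(-10000, 10001)]
xbar = min(grid, key=lambda x: f(x) + (eps / delta) * abs(x - x_eps))

# the three relations in the conclusion of the principle:
assert f(xbar) <= f(x_eps)
assert abs(xbar - x_eps) <= delta
assert all(f(xbar) <= f(x) + (eps / delta) * abs(x - xbar) for x in grid)
```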
The Ekeland Variational Principle has many applications; some of them refer to the same issue of extreme points. The next application asserts that every lower bounded differentiable function f : Rp → R admits approximate critical points (for which the norm of the differential is arbitrarily small), that is, a sequence (x_n) with
f(x_n) → inf_{x∈Rp} f(x), ∇f(x_n) → 0.
that is,
−∇f(x̄_ε)(u) ≤ εδ⁻¹‖u‖, ∀u ∈ Rp.
Changing u into −u, we get
whence
∇f(x̄_ε)(u) ≤ εδ⁻¹‖u‖, ∀u ∈ Rp,
and this implies ‖∇f(x̄_ε)‖ ≤ εδ⁻¹. For ε := n⁻¹, δ := √(n⁻¹), n ∈ N*, we obtain the second part of the conclusion.
The Ekeland Variational Principle also allows us to prove the equivalence of sev-
eral existence conditions for minimum points.
Proof The implication (i) ⇒ (ii) was already proved in Proposition 3.1.8, while (ii) ⇒ (iii) is a consequence of the fact that every bounded sequence has a convergent subsequence. Let us show that (iii) ⇒ (i). Suppose, by way of contradiction, that there exist c ∈ R and a sequence (x_n) ⊂ Rp with ‖x_n‖ → ∞ and f(x_n) ≤ c for every n ∈ N. Clearly, c ≥ inf_{x∈Rp} f(x). For every n ∈ N* we choose
ε_n := c + n⁻¹ − inf_{x∈Rp} f(x) > 0,
so,
f(x_n) < inf_{x∈Rp} f(x) + ε_n.
Let δ_n := 2⁻¹‖x_n‖ > 0. As in Theorem 3.1.15 (and its proof) there exists x̄_n with
f(x̄_n) ≤ f(x_n) ≤ inf_{x∈Rp} f(x) + ε_n,
‖x̄_n − x_n‖ ≤ δ_n,
‖∇f(x̄_n)‖ ≤ ε_n δ_n⁻¹.
But
‖x̄_n‖ ≥ ‖x_n‖ − ‖x̄_n − x_n‖ ≥ ‖x_n‖ − 2⁻¹‖x_n‖ = 2⁻¹‖x_n‖,
whence ‖x̄_n‖ → ∞. On the other hand,
‖∇f(x̄_n)‖ ≤ (2/‖x_n‖)(c + n⁻¹ − inf_{x∈Rp} f(x)) → 0.
Proof Suppose, to obtain a contradiction, that the conclusion does not hold. Then, for every x ∈ Rp, inf_{y∈Rp} f(y) < f(x), so, by the assumptions made, there exists z_x ∈ Rp \ {x} such that
f(z_x) < f(x) − α‖z_x − x‖.
By the Ekeland Variational Principle applied with ε > 0, δ > 0 such that εδ⁻¹ = α, there is an element u ∈ Rp with
f(u) ≤ f(v) + α‖v − u‖, ∀v ∈ Rp.
Then
f(u) ≤ f(z_u) + α‖z_u − u‖ < f(u),
which is absurd. Hence the conclusion holds.
In the second part of this section, we present necessary optimality conditions and
sufficient optimality conditions. At first, we deduce necessary optimality conditions
that use the ideas developed around the construction and the study of the Bouligand
tangent cone.
f(x̄ + t_n u_n) = f(x̄) + t_n∇f(x̄)(u_n) + t_n‖u_n‖α_n,
whence
∇f(x̄)(u_n) + ‖u_n‖α_n ≥ 0
for all n large enough. Passing to the limit as n → ∞, we get the conclusion.
Remark 3.1.20. Taking into account Proposition 2.1.12, if x̄ ∈ int M (inactive restriction), Theorem 3.1.18 gives ∇f(x̄)(u) ≥ 0 for every u ∈ Rp. The linearity of ∇f(x̄) implies ∇f(x̄) = 0, i.e., the Fermat Theorem on stationary points.
We present now a second-order necessary optimality condition for the problem with-
out restrictions.
Proof Let V ⊂ U be a neighborhood of x̄ such that f(x̄) ≤ f(x) for every x ∈ V and f is of class C² on V. The fact that ∇f(x̄) = 0 follows from the Fermat Theorem. As before, take u ∈ Rp and (t_n) ⊂ (0, ∞) with t_n → 0. The Taylor Theorem 1.3.4 says that for every n ∈ N there exists c_n ∈ (x̄, x̄ + t_n u) such that
f(x̄ + t_n u) − f(x̄) = t_n∇f(x̄)(u) + (1/2)t_n²∇²f(c_n)(u, u) = (1/2)t_n²∇²f(c_n)(u, u).
For n sufficiently large, f(x̄ + t_n u) − f(x̄) ≥ 0, whence
∇²f(c_n)(u, u) ≥ 0.
Since c_n → x̄ and ∇²f is continuous, letting n → ∞ we obtain
∇²f(x̄)(u, u) ≥ 0.
Proof The implication (i) ⇒ (ii) is obvious for every function, and (ii) ⇒ (iii) follows
from Fermat Theorem. Finally, the implication (iii) ⇒ (i) relies on the convexity of f
and follows from Theorem 2.2.10.
Therefore, for convex functions, the first-order necessary optimality condition (in
the unconstrained case) is also sufficient. In this situation, the second order condition
is automatically satisfied (according to Theorem 2.2.10).
Concerning the nature of the extreme points for convex functions, we record here
some important aspects.
Proof Let x̄ be a local minimum point of f on M. Then there exists a convex neighborhood V of x̄ such that for every x ∈ V ∩ M, f(x̄) ≤ f(x). Let x ∈ M. There exists λ ∈ (0, 1) such that y := (1 − λ)x̄ + λx ∈ M ∩ V. Then, by convexity,
f(x̄) ≤ f(y) ≤ (1 − λ)f(x̄) + λf(x),
that is,
λf(x̄) ≤ λf(x),
and the conclusion of the first part follows.
For the second part, there is a convex symmetric neighborhood V of 0 (a ball with
the center 0, for instance) such that for every v ∈ V , f (u + v) ≤ f (u) and f (u − v) ≤ f (u).
Then
f(u) = f((1/2)(u + v) + (1/2)(u − v)) ≤ (1/2)f(u + v) + (1/2)f(u − v) ≤ f(u),
for all v ∈ V . Consequently, f (u + v) = f (u) for every v ∈ V . Therefore, u is a local
(hence global) minimum point of f .
whence ∇f(x̄)(x − x̄) ≥ 0 for every x ∈ M. From these relations and the convexity inequality f(x) ≥ f(x̄) + ∇f(x̄)(x − x̄), we get f(x) ≥ f(x̄) for every x ∈ M.
Coming back to Theorems 3.1.18 and 3.1.21, in order to formulate sufficient opti-
mality conditions, we strengthen the conclusion of these results. The good point is
that we get stronger minimality concepts.
Definition 3.1.26. Let α > 0. One says that x̄ ∈ M is a strict local solution of order α for (P), or a strict local minimum point of order α for f on M, if there exist two constants r, l > 0 such that for every x ∈ M ∩ B(x̄, r),
f(x) ≥ f(x̄) + l‖x − x̄‖^α.
Proof Suppose, by way of contradiction, that x̄ is not a strict local solution of order 1. Then there exists a sequence (x_n) → x̄, (x_n) ⊂ M, such that for every n ∈ N*,
f(x_n) < f(x̄) + n⁻¹‖x_n − x̄‖.
0 ≥ ∇f(x̄)(u),
Notice that, for differentiable functions, the concept of a strict local solution of order 1 is specific to the case of active restrictions (that is, x̄ ∈ M \ int M): if f is differentiable at x̄ ∈ int M, then x̄ cannot be a strict local solution of order 1. Indeed, if x̄ ∈ int M were a strict local solution of order 1 then, on the one hand, ∇f(x̄) = 0 (Fermat Theorem) and, on the other hand, ∇f(x̄) ≠ 0 from the definition of strict solutions.
Concerning second-order optimality conditions, one has the following results.
Proof As before, one supposes, by contradiction, that the conclusion does not hold. Then there exists a sequence (x_n) → x̄, (x_n) ⊂ M \ {x̄}, such that for every n ∈ N*,
f(x_n) < f(x̄) + n⁻¹‖x_n − x̄‖².
From the Taylor Theorem 1.3.4, for every n ∈ N there exists c_n on the segment joining x̄ and x_n such that
f(x_n) − f(x̄) = ∇f(x̄)(x_n − x̄) + (1/2)∇²f(c_n)(x_n − x̄, x_n − x̄) = (1/2)∇²f(c_n)(x_n − x̄, x_n − x̄).
We get
n⁻¹‖x_n − x̄‖² > (1/2)∇²f(c_n)(x_n − x̄, x_n − x̄),
whence, in order to finish the proof, we divide by ‖x_n − x̄‖² and repeat the above arguments.
The restriction of the problem (P) introduced in the previous section is x ∈ M. In practice, this set M of feasible points is often defined by means of functions. Let us consider g : Rp → Rn and h : Rp → Rm, both of class C¹. As usual, g and h can be written componentwise as g = (g1, g2, ..., gn) and h = (h1, h2, ..., hm), respectively, where g_i : Rp → R (i ∈ 1,n) and h_j : Rp → R (j ∈ 1,m) are C¹ real-valued functions.
Let the set of feasible points be defined as:
M := {x ∈ U | g(x) ≤ 0, h(x) = 0} ⊂ Rp .
Let us observe that we have two types of constraints: equalities and inequalities. Let x̄ ∈ M. If for some i ∈ 1,n one has g_i(x̄) < 0, then the continuity of g_i ensures the existence of a neighborhood V of x̄ such that g_i(y) < 0 for all y ∈ V. Therefore,
when one looks for a certificate that x̄ is a local solution of (P), the restriction g_i ≤ 0 does not effectively influence the set of points u where one should compare f(x̄) and f(u). For this reason, one says that the restriction g_i ≤ 0 is inactive at x̄, and such restrictions should be eliminated from the discussion. In the opposite case, when g_i(x̄) = 0, we call this an active (inequality) restriction. For x̄ ∈ M, we denote the set of indexes corresponding to active inequality type restrictions by
A(x̄) := {i ∈ 1,n | g_i(x̄) = 0}.
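The bookkeeping of active and inactive inequality restrictions is easy to implement; the sketch below uses hypothetical constraint functions of our own choosing (not taken from the text):

```python
# hypothetical constraints: unit disk, x1 >= 0, x2 >= 0
g = [
    lambda x: x[0] ** 2 + x[1] ** 2 - 1,   # g1(x) <= 0
    lambda x: -x[0],                        # g2(x) <= 0
    lambda x: -x[1],                        # g3(x) <= 0
]

def active_set(x, tol=1e-9):
    """1-based indices of the active inequality restrictions at a feasible x."""
    assert all(gi(x) <= tol for gi in g), "the point must be feasible"
    return {i + 1 for i, gi in enumerate(g) if abs(gi(x)) <= tol}

# at (1, 0) the restrictions g1 and g3 are active, while g2 is inactive
assert active_set((1.0, 0.0)) == {1, 3}
# at an interior feasible point every inequality restriction is inactive
assert active_set((0.5, 0.25)) == set()
```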
We are now going to present two types of optimality conditions for problem (P)
with functional constraints as described above. These two types of conditions are
formally very close, but their differences are important for the detection of extreme
points. We start with the Fritz John necessary optimality conditions where the objec-
tive function does not play any special role with respect to the functions which define
the restrictions. We shall consider the drawbacks of these conditions, and next we
shall impose supplementary conditions in order to eliminate then. By this procedure,
we get the famous Karush-Kuhn-Tucker necessary optimality conditions which will be
extensively used for solving nonlinear optimization problems.
The result of this subsection refers to necessary optimality conditions for problem (P)
with functional restrictions without any additional assumption to the general frame-
work already described. These conditions were obtained in 1948 by the German math-
ematician Fritz John.
Theorem 3.2.1 (Fritz John). Let x̄ ∈ M be a solution of (P). Then there exist λ0 ∈ R, λ0 ≥ 0, λ = (λ1, λ2, ..., λn) ∈ Rn, µ = (µ1, µ2, ..., µm) ∈ Rm, with λ0 + ‖λ‖ + ‖µ‖ ≠ 0, such that
λ0∇f(x̄) + Σ_{i=1}^n λ_i∇g_i(x̄) + Σ_{j=1}^m µ_j∇h_j(x̄) = 0
and
λ_i ≥ 0, λ_i g_i(x̄) = 0, for every i ∈ 1,n.
Proof Let us take δ > 0 such that D(x̄, δ) ⊂ U and f(x̄) ≤ f(x) for every x ∈ M ∩ D(x̄, δ). For all k ∈ N* we consider the function φ_k : D(x̄, δ) → R given by
φ_k(x) = f(x) + (k/2)Σ_{i=1}^n (g_i⁺(x))² + (k/2)Σ_{j=1}^m (h_j(x))² + (1/2)‖x − x̄‖²,
where g_i⁺(x) = max{g_i(x), 0}. Clearly, φ_k is continuous on the compact set D(x̄, δ), so it attains its minimum there; we denote by x_k such a minimum point. We also observe that
φ_k(x_k) = f(x_k) + (k/2)Σ_{i=1}^n (g_i⁺(x_k))² + (k/2)Σ_{j=1}^m (h_j(x_k))² + (1/2)‖x_k − x̄‖² ≤ φ_k(x̄) = f(x̄).
Since the sequence (x_k) is bounded and f is continuous on D(x̄, δ), we infer that (f(x_k)) is also a bounded sequence. Letting k → ∞ in the above relation, we get
lim_{k→∞} Σ_{i=1}^n (g_i⁺(x_k))² = 0,
lim_{k→∞} Σ_{j=1}^m (h_j(x_k))² = 0.
The boundedness of (x_k) ensures that one can extract a convergent subsequence of it. Without relabeling, we can write x_k → x* ∈ D(x̄, δ), and the previous relations yield x* ∈ M. Consequently, passing to the limit in the inequality above, we have
f(x*) + (1/2)‖x* − x̄‖² ≤ f(x̄).
On the other hand, f(x̄) ≤ f(x*), so ‖x* − x̄‖ = 0, that is, x* = x̄. Therefore x_k → x̄.
An essential remark here is that φ_k is differentiable, since the (nondifferentiable) scalar functions g_i⁺ appear squared, whence ∇((g_i⁺)²)(x) = 2g_i⁺(x)∇g_i(x). Since x_k is a minimum point of φ_k on D(x̄, δ), we deduce that −∇φ_k(x_k) ∈ N(D(x̄, δ), x_k). For k sufficiently large, x_k belongs to the interior of the ball D(x̄, δ), and we conclude that for these numbers k, one has N(D(x̄, δ), x_k) = {0}. The combination of these facts allows us to write
∇f(x_k) + kΣ_{i=1}^n g_i⁺(x_k)∇g_i(x_k) + kΣ_{j=1}^m h_j(x_k)∇h_j(x_k) + x_k − x̄ = 0. (3.2.1)
Set γ_k := (1 + Σ_{i=1}^n (k g_i⁺(x_k))² + Σ_{j=1}^m (k h_j(x_k))²)^{1/2} and define λ_0^k := γ_k⁻¹, λ_i^k := k g_i⁺(x_k)γ_k⁻¹ (i ∈ 1,n), µ_j^k := k h_j(x_k)γ_k⁻¹ (j ∈ 1,m), so that
(λ_0^k)² + Σ_{i=1}^n (λ_i^k)² + Σ_{j=1}^m (µ_j^k)² = 1.
Since the vectors (λ_0^k, λ_1^k, ..., λ_n^k, µ_1^k, ..., µ_m^k) lie on the unit sphere of R^{1+n+m}, passing to a subsequence we may suppose that they converge to some (λ0, λ1, ..., λn, µ1, ..., µm) of norm 1; these numbers cannot be zero simultaneously. The nonnegativity of the terms of the sequences (λ_0^k), (λ_i^k) (i ∈ 1,n) implies the nonnegativity of their limits λ0, λ1, λ2, ..., λn. Now, we divide relation (3.2.1) by γ_k, and we get
λ_0^k∇f(x_k) + Σ_{i=1}^n λ_i^k∇g_i(x_k) + Σ_{j=1}^m µ_j^k∇h_j(x_k) + γ_k⁻¹(x_k − x̄) = 0.
Letting k → ∞ we obtain the first relation in the conclusion. Now we show the second one. Let i ∈ 1,n. If λ_i = 0, there is nothing to prove. Otherwise, if λ_i > 0, from the definition of λ_i we infer that for k sufficiently large, g_i⁺(x_k) > 0, whence g_i⁺(x_k) = g_i(x_k). The relation
0 < g_i(x_k) → g_i(x̄) ≤ 0
leads us to the conclusion g_i(x̄) = 0. So, the second part of the conclusion holds and the theorem is completely proved.
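The quadratic-penalty functions φ_k from this proof can be watched at work on a concrete problem. The sketch below uses a hypothetical one-dimensional instance of our own choosing, minimize f(x) = x subject to h(x) = x² − 1 = 0 (solution x̄ = −1, no inequality restrictions), and replaces the abstract minimization argument by a grid search:

```python
def phi(x, k, xbar=-1.0):
    f = x                        # objective
    h = x * x - 1.0              # equality restriction
    # the penalty function from the proof, specialized to one equality
    return f + 0.5 * k * h * h + 0.5 * (x - xbar) ** 2

def argmin_on_grid(k, lo=-2.0, hi=2.0, steps=40000):
    xs = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return min(xs, key=lambda x: phi(x, k))

# as k grows, the unconstrained minimizers x_k approach xbar = -1,
# exactly as in the proof above
dists = [abs(argmin_on_grid(k) + 1.0) for k in (1.0, 10.0, 100.0)]
assert dists[0] > dists[1] > dists[2]
assert dists[2] < 0.01
```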
The relations in the conclusion of Theorem 3.2.1 are called the Fritz John necessary optimality conditions. The major drawback of this result is that it does not eliminate the possibility that the real number associated with the objective function (i.e., λ0) be zero. This means that there could be too many points where the conditions in the conclusion are satisfied, and in such a case the result would not give useful practical hints on the solutions. For instance, if a feasible point x̄ satisfies ∇g_i(x̄) = 0 for a certain i ∈ A(x̄) or ∇h_j(x̄) = 0 for a certain j ∈ 1,m, then it satisfies the Fritz John conditions (with λ0 = 0), the objective function being then completely eliminated. In the next subsection we shall impose a condition in order to avoid λ0 = 0.
Let us first illustrate the possibilities created by Theorem 3.2.1 through two con-
crete examples.
One can observe graphically that x̄ = (2, 1) is a solution of the problem and A(x̄) = {1, 2}. We want to verify the Fritz John conditions at this point. From the second condition, since 3, 4 ∉ A(x̄), we get λ3 = λ4 = 0. Since ∇f(x̄) = (−2, −2), ∇g1(x̄) = (4, 2), ∇g2(x̄) =
(1, 2), we have to find real numbers λ0, λ1, λ2 ≥ 0, not simultaneously zero, such that
λ0(−2, −2) + λ1(4, 2) + λ2(1, 2) = (0, 0).
We get λ1 = (1/3)λ0 and λ2 = (2/3)λ0, whence, by taking λ0 > 0, the first Fritz John condition is fulfilled.
Let us now have a look at the point x̄ = (0, 0). This time A(x̄) = {3, 4}, whence λ1 = λ2 = 0. We have ∇f(x̄) = (−6, −4), ∇g3(x̄) = (−1, 0), ∇g4(x̄) = (0, −1). A computation shows that the equation
λ0(−6, −4) + λ3(−1, 0) + λ4(0, −1) = (0, 0)
has no nonzero solution (λ0, λ3, λ4) with nonnegative components. Then x̄ does not fulfill the Fritz John conditions, hence it is not a minimum point of the given problem.
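The two linear systems of this example are small enough to check by direct arithmetic; a sketch using the gradients listed in the text:

```python
# At xbar = (2, 1): grad f = (-2, -2), grad g1 = (4, 2), grad g2 = (1, 2).
# With lambda0 = 1, the system lambda1*(4, 2) + lambda2*(1, 2) = (2, 2)
# has the nonnegative solution lambda1 = 1/3, lambda2 = 2/3:
l1, l2 = 1.0 / 3.0, 2.0 / 3.0
assert abs(4 * l1 + 1 * l2 - 2) < 1e-12
assert abs(2 * l1 + 2 * l2 - 2) < 1e-12

# At xbar = (0, 0): grad f = (-6, -4), grad g3 = (-1, 0), grad g4 = (0, -1).
# lambda0*(-6, -4) + lambda3*(-1, 0) + lambda4*(0, -1) = (0, 0) forces
# lambda3 = -6*lambda0 and lambda4 = -4*lambda0, which are negative for
# every lambda0 > 0, so only the zero multipliers solve the system.
for lambda0 in (0.25, 0.5, 1.0):
    l3, l4 = -6 * lambda0, -4 * lambda0
    assert l3 < 0 and l4 < 0
```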
As seen before, it is desirable to have a Fritz John type result, but with λ0 ≠ 0. We could simply impose an extra condition in Theorem 3.2.1 in order to ensure this, but we prefer a direct approach because we aim at working with weak assumptions.
Let us consider the sets
D(x̄) := {u ∈ Rp | ∇g_i(x̄)(u) ≤ 0, ∀i ∈ A(x̄), ∇h_j(x̄)(u) = 0, ∀j ∈ 1,m} ⊂ Rp,
G(x̄) := {Σ_{i∈A(x̄)} λ_i∇g_i(x̄) + Σ_{j=1}^m µ_j∇h_j(x̄) | λ_i ≥ 0, ∀i ∈ A(x̄), µ_j ∈ R, ∀j ∈ 1,m} ⊂ Rp.
Before the main result, we need to shed some light on some important relations
for these sets.
Proof (i) The inclusion G(x̄) ⊂ D(x̄)⁻ is obvious, while the reverse one is a direct consequence of the Farkas Lemma (Theorem 2.1.8).
(ii) Clearly, 0 ∈ D(x̄). Let u ∈ T_B(M, x̄) \ {0}. By the definition of tangent vectors, there exist (t_n) ⊂ (0, ∞), t_n → 0, and (u_n) → u such that for every n,
x̄ + t_n u_n ∈ M.
The next example shows that the reverse inclusion in the item (ii) above is false.
We establish now a generalized form of a classical result known under the name of the Karush-Kuhn-Tucker Theorem, since it was obtained (with stronger assumptions) by the American mathematicians William Karush, Harold William Kuhn and Albert William Tucker. It is interesting to note that William Karush obtained the result in 1939, but the mathematical community became aware of its importance only when Harold William Kuhn and Albert William Tucker derived the result, in a different way, in 1950.
Theorem 3.2.6 (Karush-Kuhn-Tucker). Let x̄ ∈ M be a solution of (P) and suppose that T_B(M, x̄)⁻ = D(x̄)⁻. Then there exist λ = (λ1, λ2, ..., λn) ∈ Rn and µ = (µ1, µ2, ..., µm) ∈ Rm such that
∇f(x̄) + Σ_{i=1}^n λ_i∇g_i(x̄) + Σ_{j=1}^m µ_j∇h_j(x̄) = 0 (3.2.2)
and
λ_i ≥ 0, λ_i g_i(x̄) = 0, for every i ∈ 1,n. (3.2.3)
Proof From Theorem 3.1.18, ∇f(x̄)(u) ≥ 0 for every u ∈ T_B(M, x̄), whence −∇f(x̄) ∈ T_B(M, x̄)⁻. We use now the assumption T_B(M, x̄)⁻ = D(x̄)⁻ to infer that −∇f(x̄) ∈ D(x̄)⁻. From Proposition 3.2.4 (i), we get −∇f(x̄) ∈ G(x̄). Consequently, there exist λ_i ≥ 0, i ∈ A(x̄), and µ_j ∈ R, j ∈ 1,m, such that
−∇f(x̄) = Σ_{i∈A(x̄)} λ_i∇g_i(x̄) + Σ_{j=1}^m µ_j∇h_j(x̄).
Now, for indexes i ∈ 1,n \ A(x̄) we take λ_i = 0, and we obtain the conclusion.
If one compares Theorem 3.2.6 and Theorem 3.2.1, one notices the announced difference concerning the real number associated with the objective function.
The function L : U × Rn+m → R,
L(x, (λ, µ)) := f(x) + Σ_{i=1}^n λ_i g_i(x) + Σ_{j=1}^m µ_j h_j(x),
is called the Lagrangian of (P). Therefore, the conclusion given by relation (3.2.2) can be written as
∇_x L(x̄, (λ, µ)) = 0,
and the elements (λ, µ) ∈ R₊ⁿ × Rᵐ are called Lagrange multipliers. The name comes from the fact that this method was first used to investigate constrained optimization problems in some of Lagrange's works on calculus of variations problems.
The preceding theorem does not ensure the uniqueness of these multipliers. We denote by M(x̄) the set of Lagrange multipliers at x̄, i.e.,
M(x̄) := {(λ, µ) ∈ R₊ⁿ × Rᵐ | relations (3.2.2) and (3.2.3) hold}.
Furthermore, let us notice that if one has only equalities as constraints, taking into account that h(x) = 0 is equivalent to −h(x) = 0, the necessary optimality condition can be written, for both maxima and minima, as
∇f(x̄) + Σ_{j=1}^m µ_j∇h_j(x̄) = 0.
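As an illustration of the equality-only condition, consider a hypothetical problem of our own choosing: minimize f(x) = x1 + x2 subject to h(x) = x1² + x2² − 1 = 0, whose minimum point is x̄ = (−1/√2, −1/√2). A sketch of the verification:

```python
import math

xbar = (-1 / math.sqrt(2), -1 / math.sqrt(2))
grad_f = (1.0, 1.0)
grad_h = (2 * xbar[0], 2 * xbar[1])       # gradient of x1^2 + x2^2 - 1

mu = 1 / math.sqrt(2)                     # solves 1 + mu * (-sqrt(2)) = 0
residual = [gf + mu * gh for gf, gh in zip(grad_f, grad_h)]

# grad f(xbar) + mu * grad h(xbar) = 0, and xbar is feasible
assert all(abs(r) < 1e-12 for r in residual)
assert abs(xbar[0] ** 2 + xbar[1] ** 2 - 1) < 1e-12
```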
Coming back to the main results, let us observe two more things. Firstly, if the problem has no restrictions (for instance, U = M = Rp), then relation (3.2.2) reduces to the first-order necessary optimality condition (Fermat Theorem): ∇f(x̄) = 0. Secondly, the key relation (3.2.2) does not hold without supplementary conditions (here, T_B(M, x̄)⁻ = D(x̄)⁻). To illustrate this, consider the following example.
Every condition which ensures the validity of the Karush-Kuhn-Tucker Theorem is called a qualification condition and, in view of the decisive importance of such requirements, we shall discuss them in detail in the next section.
Before that, let us observe that under certain assumptions, Karush-Kuhn-Tucker
conditions (3.2.2) and (3.2.3) are also sufficient for minimality.
Concerning the structure of the set of Lagrange multipliers, we have the following
result.
Proposition 3.2.9. For data with the structure mentioned in the above theorem, the set M(x̄) of the Lagrange multipliers is the same for all minimum points of f on M.
Proof Clearly, M is a convex set. Let x1 , x2 ∈ M be two minimum points of (P). Accord-
ing to Proposition 3.1.23, one has f (x1 ) = f (x2 ). Let (λ, µ) ∈ M(x1 ). Then
∇f(x1) + Σ_{i=1}^n λ_i∇g_i(x1) + Σ_{j=1}^m µ_j∇h_j(x1) = 0
and
λ_i ≥ 0, λ_i g_i(x1) = 0, for every i ∈ 1,n.
As before,
f(x2) + Σ_{i=1}^n λ_i g_i(x2) ≥ f(x1) = f(x2).
Taking into account the information on the numbers λ_i and g_i(x2), we infer that λ_i g_i(x2) = 0 for every i ∈ 1,n. From λ_i g_i(x2) = 0 and h_j(x2) = 0 for all indexes, we obtain L(x2, (λ, µ)) = f(x2) = f(x1) = L(x1, (λ, µ)), so x2 is a minimum point for the convex function L(·, (λ, µ)) on U. Hence
∇f(x2) + Σ_{i=1}^n λ_i∇g_i(x2) + Σ_{j=1}^m µ_j∇h_j(x2) = 0.
We have that (λ, µ) ∈ M(x2 ). The other inclusion follows by exchanging x1 and x2 in
the above proof.
We now interpret Theorem 3.2.6 by using the concept of saddle point applied to
the Lagrangian function. Firstly, we define the concept.
Definition 3.2.10. Let X and Y be nonempty sets and let F : X × Y → R. One says that (x̄, ȳ) ∈ X × Y is a saddle point of F if
F(x̄, y) ≤ F(x̄, ȳ) ≤ F(x, ȳ), ∀(x, y) ∈ X × Y.
For instance, the point (0, 0) is a saddle point of F : R × R → R, F(x, y) = x² − y² (the figure below).
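For this example, the two defining inequalities can be checked on a grid; a minimal sketch:

```python
def F(x, y):
    return x * x - y * y

grid = [k / 10.0 for k in range(-20, 21)]
# F(0, y) <= F(0, 0): as a function of y alone, F is maximized at ybar = 0 ...
assert all(F(0.0, y) <= F(0.0, 0.0) for y in grid)
# ... and F(0, 0) <= F(x, 0): as a function of x alone, F is minimized at xbar = 0
assert all(F(0.0, 0.0) <= F(x, 0.0) for x in grid)
```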
The following general result is in order.
Proposition 3.2.11. For all saddle points (x, y) of F, the value F(x, y) is constant. If
(x1 , y1 ) and (x2 , y2 ) are saddle points, then (x1 , y2 ) and (x2 , y1 ) are saddle points as
well.
Proof Writing the saddle point inequalities for (x1, y1) and for (x2, y2), we have, for every (x, y) ∈ X × Y,
F(x1, y) ≤ F(x1, y1) ≤ F(x, y1) and F(x2, y) ≤ F(x2, y2) ≤ F(x, y2).
If, in the first one, we take x = x2 and y = y2, and in the second one we put x = x1 and y = y1, we get F(x1, y1) = F(x2, y2) = F(x2, y1) = F(x1, y2). Moreover, we can write, for every (x, y) ∈ X × Y,
F(x1, y) ≤ F(x1, y1) = F(x1, y2) = F(x2, y2) ≤ F(x, y2),
which shows that (x1, y2) is a saddle point; the argument for (x2, y1) is symmetric.
For the general form of problem (P), we consider again the Lagrangian function L : U × (R₊ⁿ × Rᵐ) → R,
L(x, (λ, µ)) = f(x) + Σ_{i=1}^n λ_i g_i(x) + Σ_{j=1}^m µ_j h_j(x).
Theorem 3.2.12. An element (x, (λ, µ)) ∈ U × (R+n × Rm ) is a saddle point for the La-
grangian function L if and only if the following relations hold:
(i) x is a minimum point for L(·, (λ, µ)) on the open set U;
(ii) x ∈ M;
(iii) λ i g i (x) = 0, for every i ∈ 1, n.
Proof Let (x, (λ, µ)) ∈ U × (R₊ⁿ × Rᵐ) be a saddle point for L. Then, according to the definition,
max_{(λ′,µ′)∈R₊ⁿ×Rᵐ} L(x, (λ′, µ′)) = L(x, (λ, µ)) = min_{y∈U} L(y, (λ, µ)).
The second part of this relation is equivalent to (i). It remains to be shown that the first equality is equivalent to the combination of (ii) and (iii); this is based on the fact that L is affine with respect to (λ, µ) and, moreover, the particular form of R₊ⁿ × Rᵐ allows us to easily compute the polar of its Bouligand tangent cone. According to Proposition 3.1.25, (λ, µ) maximizes the affine function L(x, ·) over R₊ⁿ × Rᵐ if and only if
∂L/∂λ_i(x, (λ, µ)) = g_i(x), with g_i(x) = 0 if λ_i > 0 and g_i(x) ≤ 0 if λ_i = 0, for every i ∈ 1,n,
and
∂L/∂µ_j(x, (λ, µ)) = h_j(x) = 0, for every j ∈ 1,m,
which amounts exactly to (ii) and (iii). The proof is complete.
Corollary 3.2.13. If (x̄, (λ, µ)) ∈ U × (R₊ⁿ × Rᵐ) is a saddle point for the Lagrangian function L, then x̄ is a solution of (P).
Proof By Theorem 3.2.12, x̄ ∈ M, λ_i g_i(x̄) = 0 for every i ∈ 1,n, and x̄ minimizes L(·, (λ, µ)) on U, so L(x̄, (λ, µ)) = f(x̄) ≤ L(x, (λ, µ)) for every x ∈ U. Since for x ∈ M,
L(x, (λ, µ)) ≤ f(x),
we get f(x̄) ≤ f(x) for every x ∈ M.
Proof According to Theorem 3.2.12, relation (i) above is equivalent to all three relations
in that result. One applies now Theorem 3.1.22 and the conclusion follows.
The qualification condition T_B(M, x̄)⁻ = D(x̄)⁻ imposed in Theorem 3.2.6 is called the Guignard condition at x̄ (after the French mathematician Monique Guignard, who proposed it back in 1969), and it is one of the weakest qualification conditions. The difficulty with this condition is that the effective calculation of the involved objects can be tricky in certain situations, and for this reason we want to investigate and compare it with other qualification conditions as well. Clearly, the relation T_B(M, x̄) = D(x̄) is in turn a qualification condition (called the quasiregularity condition), since it implies the Guignard condition. As expected, the two conditions are not equivalent, as one can see from the next example (see also Example 2.1.7).
and

T_B(M, x)⁻ = D(x)⁻ = {(u_1, u_2) | u_1 ≤ 0, u_2 ≥ 0}.
The qualification conditions are linked to the reference point (x in our notation). Whenever no confusion concerning the reference point can appear, we avoid, for simplicity, writing it explicitly.
Two of the most important (from a practical point of view) qualification conditions are listed below. The first one is called the linear independence qualification condition (at x): the gradients ∇g_i(x), i ∈ A(x), together with ∇h_j(x), j ∈ 1, m, are linearly independent. The second one is the Mangasarian-Fromovitz qualification condition (at x): the gradients ∇h_j(x), j ∈ 1, m, are linearly independent, and there exists u ∈ Rp such that ∇h_j(x)(u) = 0 for every j ∈ 1, m and ∇g_i(x)(u) < 0 for every i ∈ A(x). (The American mathematicians Olvi Leon Mangasarian and Stanley Fromovitz published the latter condition in 1967.)
We will now establish the relations between these conditions and then show that
they are indeed qualification conditions.
Proof Without loss of generality, we suppose that A(x) = {1, ..., q}. Let T be the matrix of dimensions (q + m) × p whose lines are ∇g_i(x), i ∈ 1, q, and ∇h_j(x), j ∈ 1, m, and let b be the column vector with b_i = −1, i ∈ 1, q, b_j = 0, j ∈ q + 1, q + m. Since the lines of T are linearly independent, the system Td = b has a solution. If one denotes by u such a solution, then

∇g_i(x)(u) = −1 < 0, ∀i ∈ 1, q, and ∇h_j(x)(u) = 0, ∀j ∈ 1, m,

so the Mangasarian-Fromovitz condition is satisfied.
The converse implication fails. For instance, a set of three active inequality-constraint gradients in R2 is not linearly independent, since it consists of three elements of the two-dimensional space R2; on the other hand, it may happen that, for u = (1, 0), ∇g_i(x)(u) < 0 for every i ∈ 1, 3.
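The situation just described can be checked numerically. The gradients below are hypothetical (the book's concrete example is not reproduced here), chosen so that three vectors in R2 fail linear independence while a Mangasarian-Fromovitz direction exists.

```python
import numpy as np

# Hypothetical active-constraint gradients (our choice, for illustration):
# three vectors in R^2 cannot be linearly independent, yet a
# Mangasarian-Fromovitz direction u may still exist.
grads = np.array([[-1.0, 0.0],
                  [-1.0, 1.0],
                  [-1.0, -1.0]])  # rows: grad g_i(x), i = 1, 2, 3

# Linear independence condition: rank must equal the number of gradients.
licq = np.linalg.matrix_rank(grads) == grads.shape[0]

# Mangasarian-Fromovitz condition (no equality constraints here): some u
# with grad g_i(x)(u) < 0 for every active index i.  Try u = (1, 0).
u = np.array([1.0, 0.0])
mfcq = bool(np.all(grads @ u < 0))
```

On this instance `licq` is False (rank 2 < 3) while `mfcq` is True.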
Theorem 3.2.16 tells us that, in order to show that the two conditions above are qualification conditions, it is enough to show this only for the Mangasarian-Fromovitz condition. This becomes obvious if one applies Theorem 3.2.1 and argues by contradiction. Suppose that λ_0 = 0. Then

∑_{i∈A(x)} λ_i ∇g_i(x) + ∑_{j=1}^{m} µ_j ∇h_j(x) = 0.

Applying both sides to the vector u given by the Mangasarian-Fromovitz condition, and taking into account that ∇h_j(x)(u) = 0 for every j, we get ∑_{i∈A(x)} λ_i ∇g_i(x)(u) = 0; since λ_i ≥ 0 and ∇g_i(x)(u) < 0, this forces λ_i = 0 for every i ∈ A(x). Then the linear independence of the gradients {∇h_j(x) | j ∈ 1, m} implies that µ_j = 0 for every j ∈ 1, m. Putting together these remarks, we get a contradiction to |λ_0| + ‖λ‖ + ‖µ‖ ≠ 0. Consequently, λ_0 ≠ 0.
Lemma 3.2.18. Let ε > 0 and γ : (−ε, ε) → Rp be a differentiable function such that γ(0) = x, γ′(0) = u ≠ 0. Then there exists a sequence (x_k) ⊂ Im γ \ {x}, (x_k) → x, such that

(x_k − x)/‖x_k − x‖ → u/‖u‖.

Proof We have

lim_{t→0} (γ(t) − x)/t = lim_{t→0} (γ(t) − γ(0))/t = γ′(0) = u ≠ 0.

In particular, for t ≠ 0 sufficiently small one has γ(t) ≠ x. We consider a sequence (t_k) → 0 of positive numbers and we define x_k = γ(t_k). Then

(x_k − x)/‖x_k − x‖ = ((x_k − x)/t_k) · (t_k/‖x_k − x‖) → u/‖u‖.

This ends the proof.
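A quick numerical sketch of the lemma, for an assumed curve γ(t) = (t, t²) (our choice, purely illustrative): the normalized chords converge to the normalized tangent vector.

```python
import numpy as np

# Lemma 3.2.18 on the curve gamma(t) = (t, t^2) in R^2,
# with x = gamma(0) = (0, 0) and u = gamma'(0) = (1, 0).
gamma = lambda t: np.array([t, t**2])
x = gamma(0.0)
u = np.array([1.0, 0.0])

t_k = 1.0 / 2 ** np.arange(1, 30)  # positive sequence t_k -> 0
ratios = [(gamma(t) - x) / np.linalg.norm(gamma(t) - x) for t in t_k]

# The normalized chords (x_k - x)/||x_k - x|| approach u/||u||.
err = np.linalg.norm(ratios[-1] - u / np.linalg.norm(u))
```

For the last (smallest) t_k the deviation `err` is already of the order of t_k itself.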
Proof As already observed, one inclusion is always true. We show only the opposite one, so we start with an element u ∈ D(x). Denote by ū ∈ Rp the vector given by the Mangasarian-Fromovitz condition. Let λ ∈ (0, 1) and d_λ := (1 − λ)u + λū. We show that d_λ ∈ T_B(M, x) for every λ ∈ (0, 1); then, taking λ → 0 and using the closedness of T_B(M, x), the conclusion will follow. Suppose that d_λ ≠ 0, since otherwise there is nothing to prove.
Let P be the operator defined by the matrix (of dimensions m × p) which has on the lines the vectors ∇h_j(x), j ∈ 1, m, of Rp. These vectors are linearly independent and form a basis of the linear space Im(P). Clearly, from the linear independence of ∇h_j(x), j ∈ 1, m, one deduces that m ≤ p. But p = dim(Im(P)) + dim(Ker(P)), and we complete the above linearly independent set up to a basis of Rp with a set of vectors {v_1, v_2, ..., v_{p−m}}; we denote by Z the matrix (of dimensions (p − m) × p) which has on the lines these vectors (which give a basis of Ker(P)). Then the square matrix obtained by stacking P over Z is nonsingular. We define φ : Rp+1 → Rp as the function φ(y, τ) whose first m components are h_j(y), j ∈ 1, m, and whose last p − m components are the entries of Z(y − x − τd_λ)^t.
By the Implicit Function Theorem, there exist ε > 0 and a differentiable function γ : (−ε, ε) → Rp such that

φ(γ(τ), τ) = 0

for every τ ∈ (−ε, ε), that is,

h_j(γ(τ)) = 0, ∀j ∈ 1, m, and Z(γ(τ) − x − τd_λ)^t = 0, ∀τ ∈ (−ε, ε). (3.2.5)

At the same time, for every τ ∈ (−ε, ε) and every y close enough to x we have

φ(y, τ) = 0 ⇒ y = γ(τ).

Since φ(x, 0) = 0, we infer that γ(0) = x. According to the relations (3.2.5), we get, on one hand (by differentiation),

Pγ′(0) = 0,

and, on the other hand (by dividing by τ ≠ 0 and passing to the limit),

Zγ′(0)^t = Zd_λ^t.
Since Pd_λ^t = 0 as well (indeed, ∇h_j(x)(u) = ∇h_j(x)(ū) = 0 for every j ∈ 1, m, where ū is the vector given by the Mangasarian-Fromovitz condition), the nonsingularity of the matrix obtained by stacking P over Z gives γ′(0) = d_λ. By Lemma 3.2.18, there exists a sequence (x_k) ⊂ Im γ \ {x}, (x_k) → x, such that (x_k − x)/‖x_k − x‖ → d_λ/‖d_λ‖; by (3.2.5), h_j(x_k) = 0 for every j and every k. Let i ∈ A(x). By Taylor's Formula, there exists (α_k) → 0 such that

g_i(x_k) = g_i(x) + ∇g_i(x)(x_k − x) + α_k ‖x_k − x‖.

Therefore,

g_i(x_k)/‖x_k − x‖ = ∇g_i(x)((x_k − x)/‖x_k − x‖) + α_k → ∇g_i(x)(d_λ/‖d_λ‖) < 0

(the limit is negative because ∇g_i(x)(d_λ) = (1 − λ)∇g_i(x)(u) + λ∇g_i(x)(ū) ≤ λ∇g_i(x)(ū) < 0). Then g_i(x_k) < 0 for sufficiently large k. Since there are a finite number of indices i, we obtain the conclusion.
In order to show that all four qualification conditions introduced are different, it
remains to prove that the quasiregularity condition does not imply the Mangasarian-
Fromovitz condition.
Remark 3.2.21. Let us notice that, in particular, Theorem 3.2.19 shows as well that if h : Rp → R is a C1 function, and x ∈ Rp has the property that ∇h(x) ≠ 0, then the Bouligand tangent cone to the level curve {y ∈ Rp | h(y) = h(x)} at x is the hyperplane {u ∈ Rp | ∇h(x)(u) = 0} (or Ker ∇h(x)). Therefore, ∇h(x) is a normal vector to this hyperplane. We recall here that the affine subspace (of Rp+1) tangent to the graph of h at (x, h(x)) has the equation

y = h(x) + ∇h(x)(u − x), (u, y) ∈ Rp × R.

Consider now the set M(x) of Lagrange multipliers at x, that is, the set of pairs (λ, µ) ∈ Rn × Rm such that

∇f(x) + ∑_{i=1}^{n} λ_i ∇g_i(x) + ∑_{j=1}^{m} µ_j ∇h_j(x) = 0

and

λ_i ≥ 0, λ_i g_i(x) = 0, for every i ∈ 1, n.
Therefore, checking the convexity and the closedness of M(x) is straightforward. We
will now show that M(x) is bounded. Let, from the Mangasarian-Fromovitz condition,
u ∈ Rp be such that ∇h_j(x)(u) = 0 for every j ∈ 1, m and ∇g_i(x)(u) < 0 for every i ∈ A(x). Applying the first relation defining M(x) to u, the terms containing h vanish, whence

∑_{i∈A(x)} λ_i (−∇g_i(x)(u)) = ∇f(x)(u),

so

∑_{i∈A(x)} λ_i ≤ ∇f(x)(u) / min_{i∈A(x)} (−∇g_i(x)(u)).
Since the right-hand side is a constant, we deduce that the set of multipliers associated with the inequality constraints is bounded. Suppose, by contradiction, that there exists an unbounded sequence (µ_k)_{k∈N} ⊂ Rm (without loss of generality, we can suppose that ‖µ_k‖ → ∞) and a sequence (λ_k)_{k∈N} ⊂ R+n such that (λ_k, µ_k) ∈ M(x). Then, for every k ∈ N,

∇f(x) + ∑_{i∈A(x)} (λ_i)_k ∇g_i(x) + ∑_{j=1}^{m} (µ_j)_k ∇h_j(x) = 0, (3.2.6)
and, as k → ∞,

‖µ_k‖^{−1} ∇f(x) → 0,

while, by the boundedness of the multipliers (λ_k) established above, also ‖µ_k‖^{−1} ∑_{i∈A(x)} (λ_i)_k ∇g_i(x) → 0. On the other hand, the sequence (‖µ_k‖^{−1} µ_k) is bounded (in Rm), whence, without relabeling, we can suppose that (‖µ_k‖^{−1} µ_k) is convergent towards a limit denoted by µ ∈ Rm \ {0}. Passing to the limit in (3.2.6) divided by ‖µ_k‖, we get

∑_{j=1}^{m} µ_j ∇h_j(x) = 0,

which contradicts the linear independence of the gradients {∇h_j(x) | j ∈ 1, m} required by the Mangasarian-Fromovitz condition.
Theorem 3.2.23. The Slater condition implies T(M, x) = D(x) for every x ∈ M, whence, in particular, it is a qualification condition.

Proof Let x ∈ M. The inclusion T(M, x) ⊂ D(x) is always true. Let v ∈ D(x). By the Slater condition, there exists u ∈ M with g_i(u) < 0 for every i ∈ 1, n; using the convexity of g_i, we deduce (by virtue of Theorem 2.2.10) that

0 > g_i(u) ≥ g_i(x) + ∇g_i(x)(u − x),

whence, for i ∈ A(x), ∇g_i(x)(u − x) < 0. We denote w := u − x, and for λ ∈ (0, 1) we define

w_λ := (1 − λ)v + λw.
We show that w_λ ∈ T(M, x) for every λ ∈ (0, 1). For i ∈ A(x),

∇g_i(x)(w_λ) = (1 − λ)∇g_i(x)(v) + λ∇g_i(x)(w) ≤ λ∇g_i(x)(w),

hence ∇g_i(x)(w_λ) < 0. By Taylor's Formula, there exists t > 0 such that g_i(x + tw_λ) < g_i(x) = 0 for every i ∈ A(x). Let (t_k) ⊂ (0, ∞), t_k → 0. Then

x_k := (1 − t_k)x + t_k(x + tw_λ) = x + t_k t w_λ → x as k → ∞.

In order for the conclusion to follow, we need to show that for k sufficiently large all (x_k) are in M. As usual, for i ∉ A(x) the continuity of g_i ensures this, while for i ∈ A(x), by convexity we have

g_i(x_k) ≤ (1 − t_k)g_i(x) + t_k g_i(x + tw_λ) < 0.

Since h is affine and h_j(x) = h_j(u) = 0, we get ∇h_j(x)(w) = h_j(u) − h_j(x) = 0 and ∇h_j(x)(v) = 0 (as v ∈ D(x)), whence

h_j(x_k) = h_j(x) + t_k t ∇h_j(x)(w_λ) = 0, ∀j ∈ 1, m.

Therefore, h(x_k) = 0 for any k, so, finally, (x_k)_{k≥k_0} ⊂ M, and this means that w_λ ∈ T(M, x). We let now λ → 0; since T(M, x) is closed, we get v ∈ T(M, x), and the proof is complete.
Theorem 3.2.24. In the above conditions and notation, the quasiregularity condition is automatically fulfilled.

Proof As before, it is enough to prove that D(x) ⊂ T(M, x). Without loss of generality, one can suppose that A(x) = 1, n. Let v ∈ D(x). Then Av^t ≤ 0, Bv^t = 0. If v = 0, there is nothing to prove. Otherwise, we define

x_k := x + (1/k) v, ∀k ∈ N*.

The relations

Ax_k^t = Ax^t + (1/k)Av^t ≤ b^t, Bx_k^t = Bx^t + (1/k)Bv^t = c^t, x_k → x

show that (x_k) ⊂ M, whence v ∈ T(M, x).

For affine restrictions, every minimum point of (P) satisfies the conclusions of Theorem 3.2.6.
Remark 3.3.1. Obviously, C(x, (λ, µ)) ⊂ D(x). In particular, under the quasiregularity qualification condition, i.e., T_B(M, x) = D(x), one has the inclusion C(x, (λ, µ)) ⊂ T_B(M, x). Moreover, if one has only equality constraints, then equality holds, since λ does not intervene in such a case.
Theorem 3.3.2. Let x ∈ M be a solution of the problem (P) and (λ, µ) ∈ Rn+m a vector which satisfies the Karush-Kuhn-Tucker conditions. If the linear independence condition holds at x, then

∇²_xx L(x, (λ, µ))(u, u) ≥ 0 for every u ∈ C(x, (λ, µ)).
Proof Without loss of generality, we suppose that all the inequality constraints are active. We split the proof into several steps.
In the first step, we repeat, with some modifications, several arguments from the proof of Theorem 3.2.19 in order to get a sequence of feasible points with special properties. Let d ∈ D(x), and let P be the operator defined by the matrix (of dimensions (n + m) × p) with the lines consisting of the vectors ∇g_i(x), i ∈ 1, n, ∇h_j(x), j ∈ 1, m, in Rp. These vectors are linearly independent and form a basis of the linear subspace Im(P). Let us denote by Z the matrix (of dimensions (p − (n + m)) × p) whose lines are some vectors that form a basis of Ker(P). Then the square matrix obtained by stacking P over Z is nonsingular. We define φ : Rp+1 → Rp as the function φ(y, τ) whose first n + m components are g_i(y) − τ∇g_i(x)(d), i ∈ 1, n, and h_j(y) − τ∇h_j(x)(d), j ∈ 1, m, and whose last p − (n + m) components are the entries of Z(y − x − τd)^t.
As before, φ(x, 0) = 0, and the Implicit Function Theorem provides a differentiable function γ, defined on a neighborhood of 0, with γ(0) = x, φ(γ(τ), τ) = 0 and, for every y close enough to x and every τ small enough, φ(y, τ) = 0 ⇒ y = γ(τ).
Let (t_k) ⊂ (0, ∞), (t_k) → 0. Then, using the fact that φ(γ(t_k), t_k) = 0, there exists, for every k large enough, z_k = γ(t_k) such that

g_i(z_k) = t_k ∇g_i(x)(d) ≤ 0, ∀i ∈ 1, n, (3.3.1)
h_j(z_k) = t_k ∇h_j(x)(d) = 0, ∀j ∈ 1, m,

whence, by Taylor's Formula, there exists a sequence (µ_k) → 0 such that

((z_k − x)/t_k − d)^t = S^{−1}(−t_k^{−1} ‖z_k − x‖ µ_k),

where S denotes the nonsingular square matrix obtained by stacking P over Z; from here, after passing to the limit, one gets the announced relation (z_k − x)/t_k → d.
Let now u ∈ C(x, (λ, µ)) ⊂ D(x). We use the above construction of the sequence (z_k) → x corresponding to u. We have

L(z_k, (λ, µ)) = f(z_k) + ∑_{i=1}^{n} λ_i g_i(z_k) + ∑_{j=1}^{m} µ_j h_j(z_k) = f(z_k) + t_k ∑_{i∈A(x)} λ_i ∇g_i(x)(u) = f(z_k),

since λ_i ∇g_i(x)(u) = 0 for every i ∈ A(x) when u ∈ C(x, (λ, µ)). From the second-order Taylor expansion (recall that ∇_x L(x, (λ, µ)) = 0 and L(x, (λ, µ)) = f(x)), there exists (γ_k) → 0 such that for every k,

L(z_k, (λ, µ)) = f(x) + (1/2)∇²_xx L(x, (λ, µ))(z_k − x, z_k − x) + γ_k ‖z_k − x‖².

Since x is a solution of (P) and (z_k) ⊂ M, (z_k) → x, we have f(z_k) ≥ f(x) for all k large enough; dividing the resulting inequality by t_k² and passing to the limit (recall that (z_k − x)/t_k → u), we obtain ∇²_xx L(x, (λ, µ))(u, u) ≥ 0, which ends the proof.
We formulate now a converse of the previous result. As shown before, the sufficient optimality condition returns a stronger type of solution (i.e., a strict solution).

Theorem 3.3.3. Let x ∈ M and (λ, µ) ∈ Rn+m a vector which satisfies the Karush-Kuhn-Tucker conditions. Suppose that

∇²_xx L(x, (λ, µ))(u, u) > 0

for every u ∈ C(x, (λ, µ)) \ {0}. Then x is a local strict solution of second order of (P).

Proof Since the set C(x, (λ, µ)) ∩ {u ∈ Rp | ‖u‖ = 1} is compact, and C(x, (λ, µ)) is a cone, the relation ∇²_xx L(x, (λ, µ))(u, u) > 0 for every u ∈ C(x, (λ, µ)) \ {0} is equivalent to the existence of a strictly positive number ρ with the property

∇²_xx L(x, (λ, µ))(u, u) ≥ ρ for every u ∈ C(x, (λ, µ)) with ‖u‖ = 1.
Suppose, by contradiction, that x is not a local strict solution of second order. Then there exists a sequence (z_k) ⊂ M \ {x}, (z_k) → x, such that

f(z_k) < f(x) + k^{−1} ‖z_k − x‖²

for every k sufficiently large. Then, without loss of generality, we suppose that ‖z_k − x‖^{−1}(z_k − x) → d ∈ T_B(M, x) \ {0} ⊂ D(x). On the other hand,

L(z_k, (λ, µ)) = f(z_k) + ∑_{i∈A(x)} λ_i g_i(z_k) ≤ f(z_k),

and, by the second-order Taylor expansion, there exists (γ_k) → 0 such that

L(z_k, (λ, µ)) = f(x) + (1/2)∇²_xx L(x, (λ, µ))(z_k − x, z_k − x) + γ_k ‖z_k − x‖². (3.3.2)
Suppose that d ∉ C(x, (λ, µ)). Then there exists i_0 ∈ A(x) with λ_{i_0}∇g_{i_0}(x)(d) < 0. For the other indices i ∈ A(x) we have λ_i∇g_i(x)(d) ≤ 0. Then there exists (τ_k) → 0 such that for every k,

λ_{i_0} g_{i_0}(z_k) = λ_{i_0} g_{i_0}(x) + λ_{i_0}∇g_{i_0}(x)(z_k − x) + τ_k λ_{i_0} ‖z_k − x‖
= ‖z_k − x‖ (λ_{i_0}∇g_{i_0}(x)((z_k − x)/‖z_k − x‖) + τ_k λ_{i_0}).

Hence

L(z_k, (λ, µ)) = f(z_k) + ∑_{i∈A(x)} λ_i g_i(z_k) ≤ f(z_k) + λ_{i_0} g_{i_0}(z_k)
= f(z_k) + ‖z_k − x‖ (λ_{i_0}∇g_{i_0}(x)((z_k − x)/‖z_k − x‖) + τ_k λ_{i_0}).
Furthermore, combining this estimate with (3.3.2) and absorbing the terms of second order, after relabeling one can see that there exists (ν_k) → 0 such that

f(z_k) ≥ f(x) − ‖z_k − x‖ λ_{i_0}∇g_{i_0}(x)((z_k − x)/‖z_k − x‖) + ν_k ‖z_k − x‖,
that is, using the inequality f(z_k) < f(x) + k^{−1}‖z_k − x‖² satisfied by the contradicting sequence and simplifying by ‖z_k − x‖,

k^{−1} ‖z_k − x‖ ≥ −λ_{i_0}∇g_{i_0}(x)((z_k − x)/‖z_k − x‖) + ν_k.

Passing to the limit, we arrive at a contradiction to the relation λ_{i_0}∇g_{i_0}(x)(d) < 0. Consequently, d ∈ C(x, (λ, µ)) \ {0}, whence ∇²_xx L(x, (λ, µ))(d, d) ≥ ρ. Since L(z_k, (λ, µ)) ≤ f(z_k), coming back to (3.3.2), we can write

f(z_k) ≥ f(x) + (1/2)∇²_xx L(x, (λ, µ))(z_k − x, z_k − x) + γ_k ‖z_k − x‖².

But the quadratic form ∇²_xx L(x, (λ, µ)) is continuous, whence for k large enough

(1/2)∇²_xx L(x, (λ, µ))(z_k − x, z_k − x) ≥ (ρ/4) ‖z_k − x‖².

Finally,

f(x) + k^{−1} ‖z_k − x‖² > f(z_k) ≥ f(x) + (ρ/4) ‖z_k − x‖² + γ_k ‖z_k − x‖²,

and a new contradiction occurs (divide by ‖z_k − x‖² and let k → ∞).
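To see the second-order machinery at work on a concrete instance (ours, not the text's): for minimizing x1² + x2² under the affine constraint x1 + x2 = 1, the KKT point and the positivity of the Hessian of the Lagrangian on the critical cone can be verified directly.

```python
import numpy as np

# Hypothetical problem: minimize f(x) = x1^2 + x2^2 subject to
# h(x) = x1 + x2 - 1 = 0.  The KKT point is x = (1/2, 1/2), mu = -1.
x = np.array([0.5, 0.5])
mu = -1.0

grad_f = 2 * x                 # gradient of f at x
grad_h = np.array([1.0, 1.0])  # gradient of the affine constraint h
kkt_residual = np.linalg.norm(grad_f + mu * grad_h)

# Critical cone: directions u with grad h(x)(u) = 0, i.e. u = s*(1, -1).
# The Hessian of the Lagrangian in x is 2*I, so the quadratic form on
# the critical cone is 2*||u||^2 > 0 for u != 0 (Theorem 3.3.3 applies).
hess_L = 2 * np.eye(2)
u = np.array([1.0, -1.0])
quad = u @ hess_L @ u
```

The positive value of `quad` on the generator of the critical cone confirms that x is a strict local solution of second order.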
In this section, we examine the computational limits of the theoretical results from
the previous sections. In some cases, it is not possible to get the exact solution of the
optimization problems. The theory leads us to solve some nonlinear (systems of) equa-
tions, which do not admit analytical expressions for the solutions. This motivates us
to subsequently study numerical algorithms for solving such equations, and this will
be done in Chapter 6.
It should be said that another possible objective function (even more natural to be considered) would be

∑_{i=1}^{N} |v_i − φ(t_i, x)|,

but this construction does not preserve differentiability. For this reason, one prefers the sum of the squares of the residuals, whence the name of the method.
Let us now consider the simplest case, that of a linear dependence. Let us suppose that one has made N measurements at the different moments of time t_1, t_2, ..., t_N > 0 and, correspondingly, one has the values v_1, v_2, ..., v_N. We know that the dependence between these two sets of data is linear, and we are interested in obtaining a line which best fits the collection of observations. Consider a line v = at + b. As above, the residual at the moment t_i is v_i − (at_i + b) and, in order to "measure" these residuals, we consider the function f : R² → R,

f(a, b) = ∑_{i=1}^{N} (v_i − (at_i + b))².

The line we seek is the one for which this sum of squared residuals is smallest. We thus arrive at the problem of minimizing the function f (without restrictions). We compute the partial derivatives of f:
∂f/∂a (a, b) = −2 ∑_{i=1}^{N} t_i (v_i − (at_i + b)),

∂f/∂b (a, b) = −2 ∑_{i=1}^{N} (v_i − (at_i + b)),

and the computation of the critical points reduces to solving the system:

(∑_{i=1}^{N} t_i²) a + (∑_{i=1}^{N} t_i) b = ∑_{i=1}^{N} t_i v_i,
(∑_{i=1}^{N} t_i) a + N b = ∑_{i=1}^{N} v_i.
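The normal equations above can be solved numerically; the data below are invented for illustration, and the result is cross-checked against NumPy's generic least-squares routine.

```python
import numpy as np

# Hypothetical measurements (roughly v = 2t), for illustration only.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
N = len(t)

# Normal equations: [[sum t^2, sum t], [sum t, N]] @ (a, b) = (sum t*v, sum v)
A = np.array([[np.sum(t**2), np.sum(t)],
              [np.sum(t),    N       ]])
rhs = np.array([np.sum(t * v), np.sum(v)])
a, b = np.linalg.solve(A, rhs)

# Cross-check with the generic least-squares solver on the design matrix.
coeffs, *_ = np.linalg.lstsq(np.column_stack([t, np.ones(N)]), v, rcond=None)
```

Both computations return the same line; on these data, a ≈ 1.96 and b ≈ 0.14.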
The determinant of this system is

∆ := N ∑_{i=1}^{N} t_i² − (∑_{i=1}^{N} t_i)² = (∑_{i=1}^{N} 1²)(∑_{i=1}^{N} t_i²) − (∑_{i=1}^{N} t_i)².
From the Hölder inequality, this number is positive (equality would be possible only if all the values t_i were equal, but this is not possible). Then the system admits a unique solution:

( a )   ( ∑_{i=1}^{N} t_i²   ∑_{i=1}^{N} t_i )^{-1} ( ∑_{i=1}^{N} t_i v_i )
( b ) = ( ∑_{i=1}^{N} t_i    N               )       ( ∑_{i=1}^{N} v_i    ).
Another remark is that an important part of the above calculations can be repeated, with obvious changes, if one supposes a dependence of the type v = a · p(t) + b · q(t), where p, q : R → R. One obtains the system

( ∑_{i=1}^{N} p²(t_i)       ∑_{i=1}^{N} p(t_i)q(t_i) ) ( a )   ( ∑_{i=1}^{N} p(t_i)v_i )
( ∑_{i=1}^{N} p(t_i)q(t_i)  ∑_{i=1}^{N} q²(t_i)      ) ( b ) = ( ∑_{i=1}^{N} q(t_i)v_i ).

Again, the Hölder inequality ensures that the associated matrix is invertible if and only if the vectors (p(t_i))_{i=1,N} and (q(t_i))_{i=1,N} are not proportional.
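The same computation for an assumed model v = a · p(t) + b · q(t), with the illustrative choice p = sin, q = cos and noiseless data generated from a = 3, b = −1:

```python
import numpy as np

# Hypothetical model v = a*sin(t) + b*cos(t); data generated with a=3, b=-1.
p, q = np.sin, np.cos
t = np.linspace(0.1, 3.0, 40)
v = 3.0 * p(t) - 1.0 * q(t)

# The 2x2 system from the text; (p(t_i)) and (q(t_i)) are not proportional,
# so the matrix is invertible and the coefficients are recovered exactly.
A = np.array([[np.sum(p(t)**2),     np.sum(p(t) * q(t))],
              [np.sum(p(t) * q(t)), np.sum(q(t)**2)    ]])
rhs = np.array([np.sum(p(t) * v), np.sum(q(t) * v)])
a, b = np.linalg.solve(A, rhs)
```

On noiseless data the system recovers a = 3 and b = −1 up to rounding.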
In general, for more complicated models (nonlinear in x), the method of least squares does not have an easily computable solution, and this will be an impetus for us to study several algorithms in order to obtain good approximations of the solution at a low computational cost.
2. (The projection on a closed convex set) Let a_1, a_2, ..., a_p ∈ (0, ∞) and consider the set (a generalized ellipsoid)

M := { x ∈ Rp | ∑_{i=1}^{p} (x_i/a_i)² ≤ 1 }.
Obviously, this is a convex and compact set. Let v ∉ M. From Theorem 2.1.5, there exists v̄ ∈ M, the projection of v on M. Again, we want to find an expression for this element.
As before, v̄ is the solution of the problem of minimizing f(x) = ‖x − v‖² under the restriction x ∈ M. If there existed a solution x ∈ int M, then ∇f(x) = 0, whence x − v = 0, that is, v ∈ M, which is false. Therefore, the restriction is active at v̄, that is, ∑_{i=1}^{p} (v̄_i/a_i)² = 1. Moreover, the function which defines (with inequality) the constraint x ∈ M, i.e., g : Rp → R,

g(x) := ∑_{i=1}^{p} (x_i/a_i)² − 1,

is convex, and the Slater condition holds. Moreover, f is convex as well, so we can conclude that v̄ is a solution of the problem if and only if there exists λ ≥ 0 such that

v̄ − v + λ∇g(v̄)/2 · 2 = 0, that is, v̄_i = a_i² v_i / (a_i² + λ) for every i ∈ 1, p.
Replacing in the active constraint, we obtain

∑_{i=1}^{p} a_i² v_i² / (a_i² + λ)² = 1,

so finding λ (and then the projection) requires solving the above equation. Let us remark that the equation has a unique solution, since the mapping

λ ∈ [0, ∞) ↦ ∑_{i=1}^{p} a_i² v_i² / (a_i² + λ)²

is strictly decreasing, its value at 0 is strictly greater than 1 (notice that v ∉ M), while its limit at +∞ is 0. So, to get the projection one must solve an algebraic equation of degree 2p and, in general, this cannot be done analytically. We will be interested in approximation methods for the solutions of nonlinear equations, and this will be one of the subjects of Chapter 6.
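As a sketch of such an approximation method (bisection is our choice here, on hypothetical data; Chapter 6 discusses better algorithms), the scalar equation in λ is strictly decreasing, so it can be solved numerically and the projection recovered:

```python
import numpy as np

# Hypothetical ellipsoid semi-axes and external point, for illustration.
a = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 2.0, 2.0])  # v lies outside M

# The scalar equation from the text: phi(lam) = 1, phi strictly decreasing.
phi = lambda lam: np.sum(a**2 * v**2 / (a**2 + lam)**2)
assert phi(0.0) > 1.0          # exactly the condition v not in M

lo, hi = 0.0, 1.0
while phi(hi) > 1.0:           # bracket the unique root
    hi *= 2.0
for _ in range(200):           # plain bisection on the bracket
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if phi(mid) > 1.0 else (lo, mid)
lam = 0.5 * (lo + hi)

proj = a**2 * v / (a**2 + lam)  # the projection of v on M
constraint = np.sum((proj / a)**2)  # should equal (almost) 1
```

The recovered point satisfies the active constraint up to rounding, as the theory predicts.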