
Mathematical Methods of Optimization

Nonlinear Programming
Introduction to Nonlinear Programming
• We already talked about a basic aspect of nonlinear programming (NLP) in the Introduction chapter when we considered unconstrained optimization.
Introduction to Nonlinear Programming
• We optimized one-variable nonlinear functions using the 1st and 2nd derivatives.
• We will use the same concept here, extended to functions with more than one variable.
Multivariable Unconstrained Optimization
• For functions of one variable, we use the 1st and 2nd derivatives.
• For functions of multiple variables, we use the same information in the form of the gradient and the Hessian.
• The gradient is the vector of first derivatives with respect to all variables, whereas the Hessian is the matrix equivalent of the second derivative.
The Gradient
• Review of the gradient (∇):
For a function f of variables x1, x2, …, xn:

$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \cdots & \dfrac{\partial f}{\partial x_n} \end{bmatrix}$$

Example: $f = 15x_1 + 2(x_2)^3 - 3x_1(x_3)^2$

$$\nabla f = \begin{bmatrix} 15 - 3(x_3)^2 & 6(x_2)^2 & -6x_1 x_3 \end{bmatrix}$$
The Hessian
• The Hessian (∇²) of f(x1, x2, …, xn) is:

$$\nabla^2 f = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\[4pt]
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\[4pt]
\vdots & \vdots & \ddots & \vdots \\[4pt]
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}$$
Hessian Example
• Example (from previously):

$f = 15x_1 + 2(x_2)^3 - 3x_1(x_3)^2$

$\nabla f = \begin{bmatrix} 15 - 3(x_3)^2 & 6(x_2)^2 & -6x_1 x_3 \end{bmatrix}$

$$\nabla^2 f = \begin{bmatrix} 0 & 0 & -6x_3 \\ 0 & 12x_2 & 0 \\ -6x_3 & 0 & -6x_1 \end{bmatrix}$$
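The gradient and Hessian above can be checked symbolically. Below is a minimal sketch, assuming SymPy is available; the variable names mirror the slide's example.

```python
# Symbolic gradient and Hessian of the slide's example f = 15*x1 + 2*x2**3 - 3*x1*x3**2.
# A minimal sketch (assumes SymPy is installed).
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = 15*x1 + 2*x2**3 - 3*x1*x3**2

grad = [sp.diff(f, v) for v in (x1, x2, x3)]   # gradient: vector of first partials
H = sp.hessian(f, (x1, x2, x3))                # Hessian: matrix of second partials

print(grad)   # [15 - 3*x3**2, 6*x2**2, -6*x1*x3]
print(H)      # Matrix([[0, 0, -6*x3], [0, 12*x2, 0], [-6*x3, 0, -6*x1]])
```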
Unconstrained Optimization
The optimization procedure for multivariable functions is:
1. Set the gradient of the function equal to zero and solve to obtain candidate points.
2. Obtain the Hessian of the function and evaluate it at each of the candidate points.
• If the result is "positive definite" (defined later), the point is a local minimum.
• If the result is "negative definite" (defined later), the point is a local maximum.
Positive/Negative Definite
• A matrix is "positive definite" if all of the eigenvalues of the matrix are positive (> 0).
• A matrix is "negative definite" if all of the eigenvalues of the matrix are negative (< 0).
Positive/Negative Semi-definite
• A matrix is "positive semi-definite" if all of the eigenvalues are non-negative (≥ 0).
• A matrix is "negative semi-definite" if all of the eigenvalues are non-positive (≤ 0).
Example Matrix
Given the matrix A:

$$A = \begin{bmatrix} 2 & 4 & 5 \\ -5 & -7 & -1 \\ 1 & 1 & 2 \end{bmatrix}$$

The eigenvalues of A are:

$$\lambda_1 = -3.702 \qquad \lambda_2 = -2 \qquad \lambda_3 = 2.702$$

Since the eigenvalues have mixed signs (two negative, one positive), this matrix is neither positive definite nor negative definite: it is indefinite.
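Eigenvalue-based classification like this is easy to check numerically. Below is a minimal NumPy sketch; the classification logic is illustrative, not from the slides.

```python
# Classify a matrix by the signs of its eigenvalues, using the slide's example A.
# A minimal sketch (assumes NumPy is installed). For a symmetric Hessian one would
# normally use np.linalg.eigvalsh instead; eigvals is used here because A is not symmetric.
import numpy as np

A = np.array([[ 2,  4,  5],
              [-5, -7, -1],
              [ 1,  1,  2]], dtype=float)

eigs = np.linalg.eigvals(A)        # for this A all eigenvalues are real
print(eigs)                        # approximately -3.702, -2, 2.702 (in some order)

if np.all(eigs > 0):
    print("positive definite")
elif np.all(eigs < 0):
    print("negative definite")
elif np.all(eigs >= 0):
    print("positive semi-definite")
elif np.all(eigs <= 0):
    print("negative semi-definite")
else:
    print("indefinite (mixed signs)")   # this is the case for A above
```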


Unconstrained NLP Example
Consider the problem:

Minimize $f(x_1, x_2, x_3) = (x_1)^2 + x_1(1 - x_2) + (x_2)^2 - x_2 x_3 + (x_3)^2 + x_3$

First, we find the gradient with respect to each $x_i$:

$$\nabla f = \begin{bmatrix} 2x_1 + 1 - x_2 \\ -x_1 + 2x_2 - x_3 \\ -x_2 + 2x_3 + 1 \end{bmatrix}$$
Unconstrained NLP Example
Next, we set the gradient equal to zero:

$$\nabla f = 0 \;\Rightarrow\; \begin{bmatrix} 2x_1 + 1 - x_2 \\ -x_1 + 2x_2 - x_3 \\ -x_2 + 2x_3 + 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

So we have a system of 3 equations in 3 unknowns. When we solve, we get:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}$$
Unconstrained NLP Example
So we have only one candidate point to check.

Find the Hessian:

$$\nabla^2 f = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$$
Unconstrained NLP Example
The eigenvalues of this matrix are:

$$\lambda_1 = 3.414 \qquad \lambda_2 = 0.586 \qquad \lambda_3 = 2$$

All of the eigenvalues are > 0, so the Hessian is positive definite.

So, the point $\mathbf{x} = \begin{bmatrix} -1 & -1 & -1 \end{bmatrix}^T$ is a minimum.
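As a numerical cross-check of this example, the gradient system can be solved and the Hessian's eigenvalues examined with NumPy. A minimal sketch, assuming NumPy is available; for this quadratic f, the gradient is the linear expression ∇f(x) = Hx + c with constant H and c.

```python
# Numerical check of the worked example: solve grad f = 0 and test the Hessian.
# A minimal sketch (assumes NumPy is installed).
import numpy as np

H = np.array([[ 2, -1,  0],
              [-1,  2, -1],
              [ 0, -1,  2]], dtype=float)   # Hessian of f
c = np.array([1.0, 0.0, 1.0])               # constant part of the gradient

x_star = np.linalg.solve(H, -c)             # candidate point where grad f = 0
eigs = np.linalg.eigvalsh(H)                # eigenvalues of the symmetric Hessian

print(x_star)   # [-1. -1. -1.]
print(eigs)     # approximately [0.586, 2.0, 3.414] -- all positive, so a local minimum
```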
Unconstrained NLP Example
Unlike in linear programming, unless we know the shape of the function being minimized, or can show that it is convex, we cannot tell whether this point is the global minimum or whether the function takes smaller values elsewhere.
Method of Solution
• In the previous example, when we set the gradient equal to zero, we had a system of 3 linear equations in 3 unknowns.
• For other problems, these equations could be nonlinear.
• Thus, the problem can become one of solving a system of nonlinear equations, which can be very difficult.
Method of Solution
• To avoid this difficulty, NLP problems are usually solved numerically.
• We will now look at examples of numerical methods used to find the optimum point for single-variable NLP problems. These and other methods may be found in any numerical methods reference.
Newton's Method
When solving the equation f′(x) = 0 to find a minimum or maximum, one can use the iteration step:

$$x^{k+1} = x^k - \frac{f'(x^k)}{f''(x^k)}$$

where k is the current iteration.
Iteration is continued until $|x^{k+1} - x^k| < \varepsilon$, where $\varepsilon$ is some specified tolerance.
Newton's Method Diagram
[Figure: plot of f′(x) versus x, showing the tangent of f′(x) at x^k crossing the x-axis at x^{k+1}, which lies between the solution x* and x^k.]

Newton's Method approximates f′(x) as a straight line at x^k and obtains a new point x^{k+1}, which is used to approximate the function at the next iteration. This is carried on until the new point is sufficiently close to x*.
Newton's Method Comments
• One must ensure that f(x^{k+1}) < f(x^k) for finding a minimum and f(x^{k+1}) > f(x^k) for finding a maximum.
• Disadvantages:
– Both the first and second derivatives must be calculated
– The initial guess is very important – if it is not close enough to the solution, the method may not converge
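A minimal Python sketch of the Newton iteration described above; the function names and the example derivatives are illustrative, not from the slides.

```python
# One-dimensional Newton iteration for solving f'(x) = 0.
# A minimal sketch; df and d2f are the first and second derivatives of f.
def newton_1d(df, d2f, x0, tol=1e-8, max_iter=100):
    """Iterate x_{k+1} = x_k - df(x_k)/d2f(x_k) until |x_{k+1} - x_k| < tol."""
    x = x0
    for _ in range(max_iter):
        x_new = x - df(x) / d2f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x   # may not have converged within max_iter

# Example (illustrative): minimize f(x) = x**4 - 3*x**2 + 2 starting near x = 1
x_min = newton_1d(df=lambda x: 4*x**3 - 6*x, d2f=lambda x: 12*x**2 - 6, x0=1.0)
print(x_min)   # approximately sqrt(1.5) ~ 1.2247
```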
Regula-Falsi Method
This method requires two points, x_a and x_b, that bracket the solution to the equation f′(x) = 0:

$$x_c = x_b - \frac{f'(x_b)\,(x_b - x_a)}{f'(x_b) - f'(x_a)}$$

where x_c will lie between x_a and x_b. The next interval will be x_c and whichever of x_a or x_b has a value of f′ with sign opposite to that of f′(x_c).
Regula-Falsi Diagram
[Figure: plot of f′(x) versus x, showing the bracketing points x_a and x_b, the interpolated point x_c, and the solution x*.]

The Regula-Falsi method approximates the function f′(x) as a straight line and interpolates to find the root.
Regula-Falsi Comments
• This method requires initial knowledge of two points bounding the solution.
• However, it does not require the calculation of the second derivative.
• The Regula-Falsi Method usually requires more iterations to converge than Newton's Method.
Multivariable Optimization
• Now we will consider unconstrained multivariable optimization.
• Nearly all multivariable optimization methods do the following:
1. Choose a search direction d^k
2. Minimize along that direction to find a new point:

$$\mathbf{x}^{k+1} = \mathbf{x}^k + \alpha^k \mathbf{d}^k$$

where k is the current iteration number and α^k is a positive scalar called the step size.
The Step Size
• The step size, α^k, is calculated in the following way:
• We want to minimize the function f(x^{k+1}) = f(x^k + α^k d^k), where the only variable is α^k because x^k and d^k are known.
• We set

$$\frac{d f(\mathbf{x}^k + \alpha^k \mathbf{d}^k)}{d\alpha^k} = 0$$

and solve for α^k using a single-variable solution method such as the ones shown previously.
Steepest Descent Method
• This method is very simple – it uses the gradient (for maximization) or the negative gradient (for minimization) as the search direction:

$$\mathbf{d}^k = \begin{cases} +\nabla f(\mathbf{x}^k) & \text{for max} \\ -\nabla f(\mathbf{x}^k) & \text{for min} \end{cases}$$

So,

$$\mathbf{x}^{k+1} = \mathbf{x}^k \pm \alpha^k \nabla f(\mathbf{x}^k)$$
Steepest Descent Method
• Because the gradient points in the direction of steepest increase of the function at that point, using the gradient (or negative gradient) as the search direction helps reduce the number of iterations needed.

[Figure: contours of f(x) at values 5, 20, and 25 in the (x1, x2) plane, with the vectors ∇f(x^k) and −∇f(x^k) drawn at the point x^k.]
Steepest Descent Method Steps
So the steps of the Steepest Descent Method are:
1. Choose an initial point x^0
2. Calculate the gradient ∇f(x^k), where k is the iteration number
3. Calculate the search vector: $\mathbf{d}^k = \pm\nabla f(\mathbf{x}^k)$ (− for minimization, + for maximization)
4. Calculate the next x: $\mathbf{x}^{k+1} = \mathbf{x}^k + \alpha^k \mathbf{d}^k$
Use a single-variable optimization method to determine α^k.
Steepest Descent Method Steps
5. To determine convergence, either use some given tolerance ε₁ and evaluate

$$\left| f(\mathbf{x}^{k+1}) - f(\mathbf{x}^k) \right| < \varepsilon_1$$

for convergence, or use another tolerance ε₂ and evaluate

$$\left\| \nabla f(\mathbf{x}^k) \right\| < \varepsilon_2$$

for convergence.
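A minimal Python sketch of steps 1–5, assuming NumPy and SciPy are available; the one-dimensional step-size problem is handed to a numerical scalar minimizer, and the function and parameter names (including the step-size bounds) are illustrative.

```python
# Steepest descent (minimization) following steps 1-5 above, with the step size found
# by a one-dimensional numerical minimization. A sketch; assumes NumPy and SciPy.
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, tol=1e-6, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # convergence test: ||grad f(x_k)|| < tolerance
            break
        d = -g                           # search direction: negative gradient (minimization)
        # line search: minimize f(x + alpha*d) over the scalar alpha
        # (the bounds 0..10 are an arbitrary choice for this sketch)
        alpha = minimize_scalar(lambda a: f(x + a * d), bounds=(0, 10), method='bounded').x
        x = x + alpha * d
    return x
```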
Convergence
• These two criteria can be used for any of the multivariable optimization methods discussed here.

Recall: The norm of a vector, ||x||, is given by:

$$\|\mathbf{x}\| = \sqrt{\mathbf{x}^T \mathbf{x}} = \sqrt{(x_1)^2 + (x_2)^2 + \cdots + (x_n)^2}$$
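For reference, this norm is a one-liner in NumPy (assuming NumPy is available):

```python
# The vector norm used in the convergence tests above, computed with NumPy.
import numpy as np

v = np.array([1.0, 0.0, 1.0])
print(np.linalg.norm(v))    # sqrt(1^2 + 0^2 + 1^2) = 1.4142...
print(np.sqrt(v @ v))       # the same quantity, written as sqrt(v^T v)
```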
Steepest Descent Example
Let's solve the earlier problem with the Steepest Descent Method:

Minimize $f(x_1, x_2, x_3) = (x_1)^2 + x_1(1 - x_2) + (x_2)^2 - x_2 x_3 + (x_3)^2 + x_3$

Let's pick $\mathbf{x}^0 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}$
Steepest Descent Example
$$\nabla f(\mathbf{x}) = \begin{bmatrix} 2x_1 + 1 - x_2 & -x_1 + 2x_2 - x_3 & -x_2 + 2x_3 + 1 \end{bmatrix}$$

$$\mathbf{d}^0 = -\nabla f(\mathbf{x}^0) = -\begin{bmatrix} 2(0) + 1 - 0 & -0 + 0 - 0 & -0 + 0 + 1 \end{bmatrix} = -\begin{bmatrix} 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 & -1 \end{bmatrix}$$

$$\mathbf{x}^1 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} + \alpha^0 \begin{bmatrix} -1 & 0 & -1 \end{bmatrix}$$

Now, we need to determine α^0.


Steepest Descent Example
$$f(\mathbf{x}^1) = (\alpha^0)^2 + (-\alpha^0)(1) + 0 - 0 + (\alpha^0)^2 + (-\alpha^0) = 2(\alpha^0)^2 - 2\alpha^0$$

$$\frac{df(\mathbf{x}^1)}{d\alpha^0} = 4\alpha^0 - 2$$

Now, set equal to zero and solve:

$$4\alpha^0 = 2 \;\Rightarrow\; \alpha^0 = \tfrac{2}{4} = \tfrac{1}{2}$$
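This line-search step can be verified symbolically. A quick sketch, assuming SymPy is available; the variable names are illustrative.

```python
# Symbolic check of the line search above: substitute x1 = x0 + alpha*d0 into f,
# differentiate with respect to alpha, and solve. A minimal sketch using SymPy.
import sympy as sp

alpha = sp.symbols('alpha')
x = sp.Matrix([0, 0, 0]) + alpha * sp.Matrix([-1, 0, -1])   # x0 + alpha*d0
x1, x2, x3 = x

f = x1**2 + x1*(1 - x2) + x2**2 - x2*x3 + x3**2 + x3
phi = sp.expand(f)                            # 2*alpha**2 - 2*alpha, as on the slide
print(phi)
print(sp.solve(sp.diff(phi, alpha), alpha))   # [1/2]
```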
Steepest Descent Example
So,

$$\mathbf{x}^1 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} + \tfrac{1}{2}\begin{bmatrix} -1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} & 0 & -\tfrac{1}{2} \end{bmatrix}$$
Steepest Descent Example
Take the negative gradient to find the next search direction:

$$\mathbf{d}^1 = -\nabla f(\mathbf{x}^1) = -\begin{bmatrix} -1 + 1 + 0 & \tfrac{1}{2} + 0 + \tfrac{1}{2} & 0 - 1 + 1 \end{bmatrix} = -\begin{bmatrix} 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & -1 & 0 \end{bmatrix}$$
Steepest Descent Example
Update the iteration formula:

$$\mathbf{x}^2 = \begin{bmatrix} -\tfrac{1}{2} & 0 & -\tfrac{1}{2} \end{bmatrix} + \alpha^1 \begin{bmatrix} 0 & -1 & 0 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} & -\alpha^1 & -\tfrac{1}{2} \end{bmatrix}$$
Steepest Descent Example
Insert into the original function and take the derivative so that we can find α^1:

$$f(\mathbf{x}^2) = \tfrac{1}{4} + \left(-\tfrac{1}{2}\right)(1 + \alpha^1) + (\alpha^1)^2 - \tfrac{\alpha^1}{2} + \tfrac{1}{4} - \tfrac{1}{2} = (\alpha^1)^2 - \alpha^1 - \tfrac{1}{2}$$

$$\frac{df(\mathbf{x}^2)}{d\alpha^1} = 2\alpha^1 - 1$$
Steepest Descent Example
Now we can set the derivative equal to zero and solve for α^1:

$$2\alpha^1 = 1 \;\Rightarrow\; \alpha^1 = \tfrac{1}{2}$$
Steepest Descent Example
Now, calculate x^2:

$$\mathbf{x}^2 = \begin{bmatrix} -\tfrac{1}{2} & 0 & -\tfrac{1}{2} \end{bmatrix} + \tfrac{1}{2}\begin{bmatrix} 0 & -1 & 0 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \end{bmatrix}$$
Steepest Descent Example
$$\mathbf{d}^2 = -\nabla f(\mathbf{x}^2) = -\begin{bmatrix} -1 + 1 + \tfrac{1}{2} & \tfrac{1}{2} - 1 + \tfrac{1}{2} & \tfrac{1}{2} - 1 + 1 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} & 0 & -\tfrac{1}{2} \end{bmatrix}$$

So,

$$\mathbf{x}^3 = \begin{bmatrix} -\tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \end{bmatrix} + \alpha^2 \begin{bmatrix} -\tfrac{1}{2} & 0 & -\tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2}(\alpha^2 + 1) & -\tfrac{1}{2} & -\tfrac{1}{2}(\alpha^2 + 1) \end{bmatrix}$$
Steepest Descent Example
Find α^2:

$$f(\mathbf{x}^3) = \tfrac{1}{2}(\alpha^2 + 1)^2 - \tfrac{3}{2}(\alpha^2 + 1) + \tfrac{1}{4}$$

$$\frac{df(\mathbf{x}^3)}{d\alpha^2} = (\alpha^2 + 1) - \tfrac{3}{2}$$

Set the derivative equal to zero and solve:

$$(\alpha^2 + 1) = \tfrac{3}{2} \;\Rightarrow\; \alpha^2 = \tfrac{1}{2}$$
Steepest Descent Example
Calculate x^3:

$$\mathbf{x}^3 = \begin{bmatrix} -\tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \end{bmatrix} + \tfrac{1}{2}\begin{bmatrix} -\tfrac{1}{2} & 0 & -\tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} -\tfrac{3}{4} & -\tfrac{1}{2} & -\tfrac{3}{4} \end{bmatrix}$$
Steepest Descent Example
Find the next search direction:

$$\mathbf{d}^3 = -\nabla f(\mathbf{x}^3) = -\begin{bmatrix} 0 & \tfrac{1}{2} & 0 \end{bmatrix} = \begin{bmatrix} 0 & -\tfrac{1}{2} & 0 \end{bmatrix}$$

$$\mathbf{x}^4 = \begin{bmatrix} -\tfrac{3}{4} & -\tfrac{1}{2} & -\tfrac{3}{4} \end{bmatrix} + \alpha^3 \begin{bmatrix} 0 & -\tfrac{1}{2} & 0 \end{bmatrix} = \begin{bmatrix} -\tfrac{3}{4} & -\tfrac{1}{2}(\alpha^3 + 1) & -\tfrac{3}{4} \end{bmatrix}$$
Steepest Descent Example
Find α^3:

$$f(\mathbf{x}^4) = \tfrac{1}{4}(\alpha^3 + 1)^2 - \tfrac{3}{4}(\alpha^3 + 1) - \tfrac{3}{8}$$

$$\frac{df(\mathbf{x}^4)}{d\alpha^3} = \tfrac{1}{2}(\alpha^3 + 1) - \tfrac{3}{4} = 0$$

$$\Rightarrow \alpha^3 = \tfrac{1}{2}$$
Steepest Descent Example
So, x^4 becomes:

$$\mathbf{x}^4 = \begin{bmatrix} -\tfrac{3}{4} & -\tfrac{1}{2} & -\tfrac{3}{4} \end{bmatrix} + \tfrac{1}{2}\begin{bmatrix} 0 & -\tfrac{1}{2} & 0 \end{bmatrix} = \begin{bmatrix} -\tfrac{3}{4} & -\tfrac{3}{4} & -\tfrac{3}{4} \end{bmatrix}$$
Steepest Descent Example
The next search direction:

$$\mathbf{d}^4 = -\nabla f(\mathbf{x}^4) = -\begin{bmatrix} \tfrac{1}{4} & 0 & \tfrac{1}{4} \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{4} & 0 & -\tfrac{1}{4} \end{bmatrix}$$

$$\mathbf{x}^5 = \begin{bmatrix} -\tfrac{3}{4} & -\tfrac{3}{4} & -\tfrac{3}{4} \end{bmatrix} + \alpha^4 \begin{bmatrix} -\tfrac{1}{4} & 0 & -\tfrac{1}{4} \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{4}(\alpha^4 + 3) & -\tfrac{3}{4} & -\tfrac{1}{4}(\alpha^4 + 3) \end{bmatrix}$$
Steepest Descent Example
Find α^4:

$$f(\mathbf{x}^5) = \tfrac{1}{8}(\alpha^4)^2 - \tfrac{1}{8}\alpha^4 - \tfrac{15}{16}$$

$$\frac{df(\mathbf{x}^5)}{d\alpha^4} = \tfrac{1}{4}\alpha^4 - \tfrac{1}{8} = 0$$

$$\Rightarrow \alpha^4 = \tfrac{1}{2}$$
Steepest Descent Example
Update x^5:

$$\mathbf{x}^5 = \begin{bmatrix} -\tfrac{3}{4} & -\tfrac{3}{4} & -\tfrac{3}{4} \end{bmatrix} + \tfrac{1}{2}\begin{bmatrix} -\tfrac{1}{4} & 0 & -\tfrac{1}{4} \end{bmatrix} = \begin{bmatrix} -\tfrac{7}{8} & -\tfrac{3}{4} & -\tfrac{7}{8} \end{bmatrix}$$
Steepest Descent Example
Let's check to see if the convergence criterion is satisfied.
Evaluate ||∇f(x^5)||:

$$\nabla f(\mathbf{x}^5) = \begin{bmatrix} 0 & \tfrac{1}{4} & 0 \end{bmatrix}$$

$$\left\|\nabla f(\mathbf{x}^5)\right\| = \sqrt{0^2 + \left(\tfrac{1}{4}\right)^2 + 0^2} = 0.25$$
Steepest Descent Example
So, ||∇f(x^5)|| = 0.25. This is already much smaller than ||∇f(x^0)|| = 1.414, and the norm keeps shrinking with each iteration, so a few more iterations would bring it below any tolerance we choose.
Notice that the answer of

$$\mathbf{x}^5 = \begin{bmatrix} -\tfrac{7}{8} & -\tfrac{3}{4} & -\tfrac{7}{8} \end{bmatrix}$$

is steadily approaching the value $\mathbf{x}^* = \begin{bmatrix} -1 & -1 & -1 \end{bmatrix}$ that we obtained analytically.
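The whole example can be replayed numerically. The sketch below uses the closed-form exact line-search step α = (gᵀg)/(gᵀHg), which holds for quadratic functions such as this f; it is self-contained, assumes only NumPy, and the variable names are illustrative.

```python
# Exact-line-search steepest descent on the quadratic f from this example.
# For a quadratic, grad f(x) = H x + c and the exact step is alpha = (g.g)/(g.H.g).
import numpy as np

H = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 2.]])   # Hessian of f
c = np.array([1., 0., 1.])                                     # grad f(x) = H x + c

x = np.zeros(3)
for k in range(5):
    g = H @ x + c                       # gradient at the current iterate
    alpha = (g @ g) / (g @ (H @ g))     # exact minimizer of f(x - alpha*g) for a quadratic
    x = x - alpha * g
    print(k + 1, x, np.linalg.norm(H @ x + c))
# the iterates step toward the analytical solution [-1, -1, -1]
```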


Multivariable Newton's Method
We can approximate the gradient of f at a point x^0 by:

$$\nabla f(\mathbf{x}) \approx \nabla f(\mathbf{x}^0) + \nabla^2 f(\mathbf{x}^0)\,(\mathbf{x} - \mathbf{x}^0)$$

We can set the right-hand side equal to zero and rearrange to give:

$$\mathbf{x} = \mathbf{x}^0 - \left[\nabla^2 f(\mathbf{x}^0)\right]^{-1} \nabla f(\mathbf{x}^0)$$
Multivariable Newton's Method
We can generalize this equation to give an iterative expression for Newton's Method:

$$\mathbf{x}^{k+1} = \mathbf{x}^k - \left[\nabla^2 f(\mathbf{x}^k)\right]^{-1} \nabla f(\mathbf{x}^k)$$

where k is the iteration number


Newton's Method Steps
1. Choose a starting point, x^0
2. Calculate ∇f(x^k) and ∇²f(x^k)
3. Calculate the next x using the equation

$$\mathbf{x}^{k+1} = \mathbf{x}^k - \left[\nabla^2 f(\mathbf{x}^k)\right]^{-1} \nabla f(\mathbf{x}^k)$$

4. Use either of the convergence criteria discussed earlier to determine convergence. If it hasn't converged, return to step 2.
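A minimal Python sketch of steps 1–4, assuming NumPy is available; grad and hess are supplied by the caller, and the function names are illustrative. The example call uses the quadratic from the slides and reproduces the one-step convergence worked out on the following slides.

```python
# Multivariable Newton iteration following steps 1-4 above. A minimal sketch.
import numpy as np

def newton_multivariable(grad, hess, x0, tol=1e-8, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # convergence test: ||grad f(x_k)|| < tolerance
            return x
        # x_{k+1} = x_k - [hess(x_k)]^{-1} grad(x_k), computed via a linear solve
        x = x - np.linalg.solve(hess(x), g)
    return x

# Example (illustrative): the quadratic from the slides, starting at x0 = [0, 0, 0]
grad = lambda x: np.array([2*x[0] - x[1] + 1, -x[0] + 2*x[1] - x[2], -x[1] + 2*x[2] + 1])
hess = lambda x: np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 2.]])
print(newton_multivariable(grad, hess, [0.0, 0.0, 0.0]))   # [-1. -1. -1.]
```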
Comments on Newton's Method
• We can see that unlike the previous two methods, Newton's Method uses both the gradient and the Hessian.
• This usually reduces the number of iterations needed, but increases the computation needed for each iteration.
• So, for very complex functions, a simpler method is usually faster.
Newton's Method Example
For an example, we will use the same problem as before:

Minimize $f(x_1, x_2, x_3) = (x_1)^2 + x_1(1 - x_2) + (x_2)^2 - x_2 x_3 + (x_3)^2 + x_3$

$$\nabla f(\mathbf{x}) = \begin{bmatrix} 2x_1 - x_2 + 1 & -x_1 + 2x_2 - x_3 & -x_2 + 2x_3 + 1 \end{bmatrix}$$
Newton's Method Example
The Hessian is:

$$\nabla^2 f(\mathbf{x}) = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$$

And we will need the inverse of the Hessian:

$$\left[\nabla^2 f(\mathbf{x})\right]^{-1} = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}^{-1} = \begin{bmatrix} \tfrac{3}{4} & \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{2} & 1 & \tfrac{1}{2} \\ \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{3}{4} \end{bmatrix}$$
Newton's Method Example
So, pick $\mathbf{x}^0 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}^T$

Calculate the gradient for the 1st iteration:

$$\nabla f(\mathbf{x}^0) = \begin{bmatrix} 0 - 0 + 1 & -0 + 0 - 0 & -0 + 0 + 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}$$
Newton's Method Example
So, the new x is:

$$\mathbf{x}^1 = \mathbf{x}^0 - \left[\nabla^2 f(\mathbf{x}^0)\right]^{-1} \nabla f(\mathbf{x}^0) = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} - \begin{bmatrix} \tfrac{3}{4} & \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{2} & 1 & \tfrac{1}{2} \\ \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{3}{4} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}$$
Newton's Method Example
Now calculate the new gradient:

$$\nabla f(\mathbf{x}^1) = \begin{bmatrix} -2 + 1 + 1 & 1 - 2 + 1 & 1 - 2 + 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}$$

Since the gradient is zero, the method has converged.
Comments on Example
• Because it uses the 2nd derivative, Newton's Method models quadratic functions exactly and can find their optimum point in one iteration.
• If the function had been of higher order, the Hessian would not have been constant, and it would have been much more work to calculate the Hessian and take its inverse at each iteration.
References
Material for this chapter has been taken from:
• Optimization of Chemical Processes, 2nd Ed.; Edgar, Thomas; Himmelblau, David; and Lasdon, Leon.
