
Chapter -6-

Indirect Search (Descent) Methods

6.8 Gradient of a Function


Many functions depend on several variables. For example, the distribution of the
magnetic field 𝑩 in an iron-cored coil is a function of the position (𝑥, 𝑦, 𝑧), the
permeability (𝜇), the frequency (𝑓), and the temperature (𝑇). For an objective function
of this kind, the Gradient Method can be adopted to determine its minimum point.

For a function 𝑓 of 𝑛 independent variables 𝑥₁, 𝑥₂, …, 𝑥ₙ, the gradient is the
𝑛-component vector of partial derivatives:

∇𝑓 = {𝜕𝑓/𝜕𝑥₁, 𝜕𝑓/𝜕𝑥₂, …, 𝜕𝑓/𝜕𝑥ₙ}ᵀ … … (1)

The gradient has a very important property. If we move along the gradient direction
from any point in 𝑛-dimensional space, the function value increases at the fastest rate.
Hence the gradient direction is called the direction of steepest ascent. Unfortunately,
the direction of steepest ascent is a local property, not a global one. This is
illustrated in figure (1), which shows the steepest-ascent path in a 2-D problem.

Since the gradient vector represents the direction of steepest ascent, the negative of the
gradient vector denotes the direction of steepest descent. Thus, any method that makes
use of the gradient vector can be expected to give the minimum point faster than one
that does not make use of the gradient vector. All the descent methods make use of the
gradient vector, either directly or indirectly, in finding the search directions.
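The steepest-descent property can be checked numerically. The following sketch uses the objective function of Example 6.8 later in this chapter, samples unit directions 𝐮(θ) around a point, and finds the direction of largest decrease, which should be −∇𝑓 (at the origin, ∇𝑓 = (1, −1), so −∇𝑓 points at θ = 135°):

```python
import math

# Example function from this chapter: f(x1, x2) = x1 - x2 + 2*x1^2 + 2*x1*x2 + x2^2
def f(x1, x2):
    return x1 - x2 + 2*x1**2 + 2*x1*x2 + x2**2

x = (0.0, 0.0)        # grad f at this point is (1, -1)
eps = 1e-3            # small step length ds

# Sample unit directions u(theta) and record the change df for a step eps*u
best_theta, best_df = 0.0, float("inf")
for k in range(3600):
    theta = 2*math.pi*k/3600
    u = (math.cos(theta), math.sin(theta))
    df = f(x[0] + eps*u[0], x[1] + eps*u[1]) - f(*x)
    if df < best_df:
        best_df, best_theta = df, theta

# The steepest-descent direction is -grad f/|grad f| = (-1, 1)/sqrt(2),
# i.e. theta = 135 degrees; the sampled winner lands essentially there.
print(round(math.degrees(best_theta), 1))
```

The winning angle differs from 135° only by the O(𝜀) bias of the finite step, illustrating that for small 𝑑𝑠 the fastest decrease is along −∇𝑓.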

Before considering the descent methods of minimization, we prove that the gradient
vector represents the direction of steepest ascent.

Figure (1) The steepest-ascent path in a 2-D problem

Theorem: The gradient vector represents the direction of steepest ascent.

Proof: Consider an arbitrary point 𝑿 in the 𝑛-dimensional space. Let 𝑓 denote the value
of the objective function at the point 𝑿. Consider a neighboring point 𝑿 + 𝑑𝑿 with

𝑑𝑿 = {𝑑𝑥₁, 𝑑𝑥₂, …, 𝑑𝑥ₙ}ᵀ … … (2)

where 𝑑𝑥₁, 𝑑𝑥₂, …, 𝑑𝑥ₙ represent the components of the vector 𝑑𝑿.
To find the magnitude of the vector 𝑑𝑿, ds, let us consider the simple case in a 3-D
problem as shown in figure (2).
In matrix form, for the 3-D case,

𝑑𝑿 = {𝑑𝑥₁, 𝑑𝑥₂, 𝑑𝑥₃}ᵀ

∴ 𝑑𝑿ᵀ𝑑𝑿 = (𝑑𝑥₁)² + (𝑑𝑥₂)² + (𝑑𝑥₃)² = (𝑑𝑠)²

Figure (2) The step 𝑑𝑿 of magnitude |𝑑𝑿| = 𝑑𝑠 from 𝑿 to 𝑿 + 𝑑𝑿, with components 𝑑𝑥₁, 𝑑𝑥₂, 𝑑𝑥₃ along the 𝑥₁-, 𝑥₂-, and 𝑥₃-axes and unit vector 𝐮 along 𝑑𝑿

∴ 𝑑𝑿ᵀ𝑑𝑿 = (𝑑𝑠)² = ∑ᵢ₌₁³ (𝑑𝑥ᵢ)²

In general, for an objective function of 𝑛 variables,

∴ 𝑑𝑿ᵀ𝑑𝑿 = (𝑑𝑠)² = ∑ᵢ₌₁ⁿ (𝑑𝑥ᵢ)² … … (3)

Now, if 𝑓 is the value of the objective function at 𝑿, and the change in this
value due to the change 𝑑𝑿 is 𝑑𝑓, then the value of the objective function at the
point (𝑿 + 𝑑𝑿) is (𝑓 + 𝑑𝑓).

By definition,

𝑑𝑓 = ∑ᵢ₌₁ⁿ (𝜕𝑓/𝜕𝑥ᵢ) 𝑑𝑥ᵢ = ∇𝑓ᵀ𝑑𝑿 … … (4)
If 𝐮 denotes the unit vector along the direction 𝑑𝑿 and 𝑑𝑠 the length of 𝑑𝑿, we can
write

𝑑𝑿 = 𝐮 𝑑𝑠  ∴ 𝑑𝑿/𝑑𝑠 = 𝐮 … … (5)

From eq'ns (4) and (5),

𝑑𝑓/𝑑𝑠 = ∑ᵢ₌₁ⁿ (𝜕𝑓/𝜕𝑥ᵢ)(𝑑𝑥ᵢ/𝑑𝑠) = ∇𝑓ᵀ(𝑑𝑿/𝑑𝑠) = ∇𝑓ᵀ𝐮 … … (6)

The value of 𝑑𝑓/𝑑𝑠 will be different for different directions and we are interested in
finding the particular step 𝑑𝑿 along which the value of 𝑑𝑓/𝑑𝑠 will be maximum. This
will give the direction of steepest ascent. By using the definition of the dot product eq'n
(6) can be written as:
𝑑𝑓/𝑑𝑠 = |∇𝑓| |𝐮| cos 𝜃 … … (7)

|∇𝑓| and |𝐮| denote the lengths of the vectors ∇𝑓 and 𝐮, respectively, and 𝜃 indicates
the angle between the vectors ∇𝑓 and 𝐮. It can be seen that 𝑑𝑓/𝑑𝑠 will be maximum
when 𝜃 = 0° and minimum when 𝜃 = 180°. This indicates that the function value
increases at a maximum rate in the direction of the gradient (i.e., when 𝐮 is along ∇𝑓).

Theorem: The maximum rate of change of 𝑓 at any point 𝑿 is equal to the magnitude
of the gradient vector at the same point.

Proof: The rate of change of the function 𝑓 with respect to the step length 𝑠 along a
direction 𝐮 is given by Eq. (7). Since 𝑑𝑓/𝑑𝑠 is of maximum value when 𝜃 = 0° and
𝐮 is a unit vector, Eq. (7) gives
(𝑑𝑓/𝑑𝑠)ₘₐₓ = |∇𝑓|

This proves the theorem.
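As a quick numerical check of this theorem (a sketch, again using the example function of this chapter), the directional derivative along the unit gradient direction reproduces |∇𝑓|:

```python
import math

# f(x1, x2) from this chapter's example; check (df/ds)_max = |grad f|
def f(x1, x2):
    return x1 - x2 + 2*x1**2 + 2*x1*x2 + x2**2

x = (0.5, -0.5)
g = (1 + 4*x[0] + 2*x[1], -1 + 2*x[0] + 2*x[1])   # analytic gradient: (2, -1)
g_len = math.hypot(g[0], g[1])                    # |grad f| = sqrt(5)

# Directional derivative along u = grad f/|grad f| (theta = 0) by central difference
u = (g[0]/g_len, g[1]/g_len)
h = 1e-6
df_ds = (f(x[0] + h*u[0], x[1] + h*u[1]) - f(x[0] - h*u[0], x[1] - h*u[1])) / (2*h)

print(abs(df_ds - g_len) < 1e-6)   # True: the maximum rate of change equals |grad f|
```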

Evaluation of the Gradient

The evaluation of the gradient requires the computation of the partial derivatives
𝜕𝑓/𝜕𝑥ᵢ, 𝑖 = 1, 2, …, 𝑛. There are three situations where the evaluation of the
gradient poses certain problems:

1. The function is differentiable at all the points, but the calculation of the
components of the gradient, 𝜕𝑓/𝜕𝑥𝑖 is either impractical or impossible.
2. The expressions for the partial derivatives 𝜕𝑓/𝜕𝑥𝑖 can be derived, but they
require large computational time for evaluation.
3. The gradient ∇𝑓 is not defined at all the points.

In the first case, we can use the forward finite-difference formula

(𝜕𝑓/𝜕𝑥ᵢ)|𝑿ₘ ≅ [𝑓(𝑿ₘ + ∆𝑥ᵢ𝐮ᵢ) − 𝑓(𝑿ₘ)] / ∆𝑥ᵢ ,  𝑖 = 1, 2, …, 𝑛 … … (8)

Equation (8) is used to approximate the partial derivative 𝜕𝑓/𝜕𝑥ᵢ at 𝑿ₘ. If the function
value at the base point 𝑿ₘ is known, this formula requires one additional function
evaluation to find (𝜕𝑓/𝜕𝑥ᵢ)|𝑿ₘ. Thus it requires 𝑛 additional function evaluations to
evaluate the approximate gradient ∇𝑓|𝑿ₘ. For better results we can use the central finite-
difference formula to find the approximate partial derivative (𝜕𝑓/𝜕𝑥ᵢ)|𝑿ₘ:

(𝜕𝑓/𝜕𝑥ᵢ)|𝑿ₘ ≅ [𝑓(𝑿ₘ + ∆𝑥ᵢ𝐮ᵢ) − 𝑓(𝑿ₘ − ∆𝑥ᵢ𝐮ᵢ)] / (2∆𝑥ᵢ) ,  𝑖 = 1, 2, …, 𝑛 … … (9)

This formula requires two additional function evaluations for each of the partial
derivatives. In eq'ns (8) and (9) ∆𝑥𝑖 is a small scalar quantity and 𝐮𝑖 is a vector of order
𝑛 whose 𝑖th component has a value of 1, and all other components have a value of zero.
In practical computations, the value of ∆𝑥𝑖 has to be chosen with some care. If ∆𝑥𝑖 is
too small, the difference between the values of the function evaluated at (𝑿𝑚 + ∆𝑥𝑖 𝐮𝑖 )
and (𝑿𝑚 − ∆𝑥𝑖 𝐮𝑖 ) may be very small and numerical round-off error may predominate.
On the other hand, if ∆𝑥𝑖 is too large, the truncation error may predominate in the
calculation of the gradient.
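Eq'ns (8) and (9) can be sketched in Python as follows. The function 𝑓 and the evaluation point are taken from Example 6.8 later in this chapter, where the exact gradient at {−1, 1}ᵀ is {−1, −1}ᵀ:

```python
def f(x):
    return x[0] - x[1] + 2*x[0]**2 + 2*x[0]*x[1] + x[1]**2

def grad_forward(f, x, dx=1e-6):
    # eq'n (8): one extra function evaluation per component beyond f(X_m)
    fx = f(x)
    g = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += dx
        g.append((f(xp) - fx) / dx)
    return g

def grad_central(f, x, dx=1e-6):
    # eq'n (9): two extra evaluations per component, O(dx^2) truncation error
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += dx
        xm[i] -= dx
        g.append((f(xp) - f(xm)) / (2*dx))
    return g

x_m = [-1.0, 1.0]   # point X2 of Example 6.8; exact gradient there is (-1, -1)
print(grad_forward(f, x_m))   # both print values very close to [-1, -1]
print(grad_central(f, x_m))
```

With ∆𝑥ᵢ = 10⁻⁶ both estimates agree with the exact gradient to several decimal places; shrinking or enlarging ∆𝑥ᵢ exposes the round-off and truncation trade-off discussed above.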

In the second case also, the use of finite-difference formulas is preferred whenever the
exact gradient evaluation requires more computational time than the one involved in
using Eq. (8) or (9).

In the third case, we cannot use the finite-difference formulas since the gradient is not
defined at all the points. For example, consider the function shown in Fig. (3). If eq'n
(9) is used to evaluate the derivative 𝑑𝑓/𝑑𝑠 at 𝑿𝑚 , we obtain a value of 𝛼1 for a step
size ∆𝑥1 and a value of 𝛼2 for a step size ∆𝑥2 . Since, in reality, the derivative does not
exist at the point 𝑿𝑚 , use of finite-difference formulas might lead to a complete
breakdown of the minimization process. In such cases the minimization can be done
only by one of the direct search techniques discussed earlier.
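A hypothetical one-variable illustration of this step-size dependence: for 𝑓(𝑥) = |𝑥|, whose derivative is undefined at the kink 𝑥 = 0, the central-difference estimate of eq'n (9) taken near the kink changes with the chosen step, just like the values 𝛼₁ and 𝛼₂ in Fig. (3):

```python
def f(x):
    return abs(x)    # derivative undefined at the kink x = 0

x_m = 0.1            # a point close to the kink
alphas = []
for dx in (0.05, 0.2, 0.5):
    alpha = (f(x_m + dx) - f(x_m - dx)) / (2*dx)   # central difference, eq'n (9)
    alphas.append(alpha)
    print(dx, round(alpha, 3))

# Steps that straddle the kink give different "derivatives": roughly 1.0, 0.5, 0.2,
# so a minimization built on these estimates receives inconsistent information.
```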

Figure (3) A function whose derivative is not defined at the point 𝑿ₘ

Rate of Change of a Function along a Direction

Note: this part of the derivation deals with the rate of change of each component 𝑥ⱼ of
the point 𝑿 = 𝑿ᵢ + λ𝑺ᵢ due to a step change λ along the components 𝑠ᵢⱼ of the
direction 𝑺ᵢ.

In most optimization techniques, we are interested in finding the rate of change of a
function with respect to a parameter λ along a specified direction 𝑺ᵢ away from a
point 𝑿ᵢ. Any point in the specified direction away from the given point 𝑿ᵢ can be
expressed as 𝑿 = 𝑿ᵢ + λ𝑺ᵢ. Our interest is to find the rate of change of the function
along the direction 𝑺ᵢ (characterized by the parameter λ), that is,

𝑑𝑓/𝑑λ = ∑ⱼ₌₁ⁿ (𝜕𝑓/𝜕𝑥ⱼ)(𝜕𝑥ⱼ/𝜕λ) … … (10)

𝑥ⱼ represents the 𝑗th component of 𝑿. But

𝜕𝑥ⱼ/𝜕λ = 𝜕(𝑥ᵢⱼ + λ𝑠ᵢⱼ)/𝜕λ = 𝑠ᵢⱼ … … (11)

Note: eq'n (11) corresponds to eq'n (5). The term (𝑥ᵢⱼ + λ𝑠ᵢⱼ) is a function of λ, while
𝑠ᵢⱼ is the value of the derivative 𝜕𝑥ⱼ/𝜕λ.

where 𝑥ᵢⱼ and 𝑠ᵢⱼ are the 𝑗th components of 𝑿ᵢ and 𝑺ᵢ, respectively. Hence

𝑑𝑓/𝑑λ = ∑ⱼ₌₁ⁿ (𝜕𝑓/𝜕𝑥ⱼ) 𝑠ᵢⱼ = ∇𝑓ᵀ𝑺ᵢ … … (12)

Note: This eq'n (12) corresponds to eq'n (6).

If λ* minimizes 𝑓 in the direction 𝑺ᵢ, we have

(𝑑𝑓/𝑑λ)|λ=λ* = ∇𝑓ᵀ|λ* 𝑺ᵢ = 0 … … (13)

at the point 𝑿 = 𝑿ᵢ + λ*𝑺ᵢ.
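Eq'n (13) can be checked numerically; this sketch uses the direction and optimal step length from iteration 1 of Example 6.8 later in the chapter:

```python
def grad(x):
    # gradient of the example function f = x1 - x2 + 2*x1**2 + 2*x1*x2 + x2**2
    return [1 + 4*x[0] + 2*x[1], -1 + 2*x[0] + 2*x[1]]

x_i = [0.0, 0.0]     # X_i
s_i = [-1.0, 1.0]    # direction S_i (iteration 1 of Example 6.8)
lam_star = 1.0       # the optimal step length found there

x = [x_i[0] + lam_star*s_i[0], x_i[1] + lam_star*s_i[1]]   # X = X_i + lambda* S_i
g = grad(x)
print(g[0]*s_i[0] + g[1]*s_i[1])   # 0.0: at lambda*, grad f is orthogonal to S_i
```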

Steepest Descent (Cauchy) Method

The use of the negative of the gradient vector as a direction for minimization was first
made by Cauchy in 1847. In this method we start from an initial trial point 𝑿1 and
iteratively move along the steepest descent directions until the optimum point is found.
The steepest descent method can be summarized by the following steps:
1. Start with an arbitrary initial point 𝑿1 . Set the iteration number as 𝑖 = 1.
2. Find the search direction 𝑺ᵢ as
𝑺ᵢ = −∇𝑓ᵢ = −∇𝑓(𝑿ᵢ)
3. Determine the optimal step length λᵢ* in the direction 𝑺ᵢ and set
𝑿ᵢ₊₁ = 𝑿ᵢ + λᵢ*𝑺ᵢ = 𝑿ᵢ − λᵢ*∇𝑓ᵢ

4. Test the new point, 𝑿𝑖+1 for optimality. If 𝑿𝑖+1 is optimum, stop the process.
Otherwise, go to step 5.
5. Set the new iteration number 𝑖 = 𝑖 + 1 and go to step 2.

The method of steepest descent may appear to be the best unconstrained minimization
technique, since each one-dimensional search starts in the "best" direction. However,
because the steepest descent direction is a local property, the method is not really
effective in most problems.
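The five steps above can be sketched in Python. This is a minimal illustration, not a general-purpose code: it assumes a quadratic objective (the Hessian A and linear term b below are those of the function minimized in Example 6.8), for which the optimal step length of step 3 has the closed form λ* = (∇𝑓ᵀ∇𝑓)/(∇𝑓ᵀA∇𝑓):

```python
# Steepest-descent (Cauchy) sketch for f(x1, x2) = x1 - x2 + 2*x1^2 + 2*x1*x2 + x2^2
A = [[4.0, 2.0], [2.0, 2.0]]   # Hessian of f
b = [1.0, -1.0]                # linear term, so grad f = A x + b

def grad(x):
    return [A[0][0]*x[0] + A[0][1]*x[1] + b[0],
            A[1][0]*x[0] + A[1][1]*x[1] + b[1]]

x = [0.0, 0.0]                       # step 1: arbitrary initial point X1
for i in range(100):
    g = grad(x)
    if g[0]**2 + g[1]**2 < 1e-12:    # step 4: optimality test on |grad f|^2
        break
    s = [-g[0], -g[1]]               # step 2: search direction S_i = -grad f_i
    Ag = [A[0][0]*g[0] + A[0][1]*g[1],
          A[1][0]*g[0] + A[1][1]*g[1]]
    lam = (g[0]**2 + g[1]**2) / (g[0]*Ag[0] + g[1]*Ag[1])   # step 3: exact lambda*
    x = [x[0] + lam*s[0], x[1] + lam*s[1]]                  # X_{i+1} = X_i + lambda* S_i

print([round(v, 4) for v in x])      # converges to [-1.0, 1.5]
```

The first passes reproduce the hand calculation of Example 6.8 below (λ₁* = 1 gives 𝑿₂ = {−1, 1}ᵀ, λ₂* = 1/5 gives 𝑿₃ = {−0.8, 1.2}ᵀ), and the iterates then creep toward the optimum, illustrating why the locally "best" direction can be globally slow.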

Example 6.8: Minimize 𝑓(𝑥₁, 𝑥₂) = 𝑥₁ − 𝑥₂ + 2𝑥₁² + 2𝑥₁𝑥₂ + 𝑥₂² starting from the
point 𝑿₁ = {0, 0}ᵀ.

Iteration 1

𝑿₁ = {𝑥₁, 𝑥₂}ᵀ = {0, 0}ᵀ,  𝑓(𝑿₁) = 0

The gradient of 𝑓 is given by

∇𝑓 = {𝜕𝑓/𝜕𝑥₁, 𝜕𝑓/𝜕𝑥₂}ᵀ = {1 + 4𝑥₁ + 2𝑥₂, −1 + 2𝑥₁ + 2𝑥₂}ᵀ

∇𝑓₁ = ∇𝑓(𝑿₁) = {1, −1}ᵀ

Therefore,

𝑺₁ = −∇𝑓₁ = {−1, 1}ᵀ

To find 𝑿₂, we need the optimal step length λ₁*. For this, we minimize
𝑓(𝑿₁ + λ₁𝑺₁) = 𝑓(−λ₁, λ₁) = λ₁² − 2λ₁ with respect to λ₁. Since 𝑑𝑓/𝑑λ₁ = 0 at λ₁* = 1, we
obtain

𝑿₂ = 𝑿₁ + λ₁*𝑺₁ = {0, 0}ᵀ + 1·{−1, 1}ᵀ = {−1, 1}ᵀ

As ∇𝑓₂ = ∇𝑓(𝑿₂) = {−1, −1}ᵀ ≠ {0, 0}ᵀ, 𝑿₂ is not optimum.
Iteration 2

𝑺₂ = −∇𝑓₂ = {1, 1}ᵀ

To minimize

𝑓(𝑿₂ + λ₂𝑺₂) = 𝑓(−1 + λ₂, 1 + λ₂) = 5λ₂² − 2λ₂ − 1

we set 𝑑𝑓/𝑑λ₂ = 0. This gives λ₂* = 1/5, and hence

𝑿₃ = 𝑿₂ + λ₂*𝑺₂ = {−1, 1}ᵀ + (1/5){1, 1}ᵀ = {−0.8, 1.2}ᵀ

Since the components of the gradient at 𝑿₃, ∇𝑓₃ = {0.2, −0.2}ᵀ, are not zero, we proceed to
the next iteration.

Iteration 3

𝑺₃ = −∇𝑓₃ = {−0.2, 0.2}ᵀ

As

𝑓(𝑿₃ + λ₃𝑺₃) = 𝑓(−0.8 − 0.2λ₃, 1.2 + 0.2λ₃) = 0.04λ₃² − 0.08λ₃ − 1.2,  𝑑𝑓/𝑑λ₃ = 0 at λ₃* = 1.0

Therefore,

𝑿₄ = 𝑿₃ + λ₃*𝑺₃ = {−0.8, 1.2}ᵀ + 1.0·{−0.2, 0.2}ᵀ = {−1.0, 1.4}ᵀ

The gradient at 𝑿₄ is given by ∇𝑓₄ = {−0.2, −0.2}ᵀ. Since ∇𝑓₄ ≠ {0, 0}ᵀ, 𝑿₄ is not optimum
and hence we have to proceed to the next iteration. This process has to be continued until
the optimum point 𝑿* = {−1.0, 1.5}ᵀ is found, where

∇𝑓(𝑿*) = {1 + 4(−1.0) + 2(1.5), −1 + 2(−1.0) + 2(1.5)}ᵀ = {0, 0}ᵀ

Hence 𝑿* is the optimum point.

Convergence Criteria: The following criteria can be used to terminate the iterative
process.

1. When the relative change in function value in two consecutive iterations is small:

   |[𝑓(𝑿ᵢ₊₁) − 𝑓(𝑿ᵢ)] / 𝑓(𝑿ᵢ)| ≤ ε₁

2. When the partial derivatives (components of the gradient) of 𝑓 are small:

   |𝜕𝑓/𝜕𝑥ᵢ| ≤ ε₂ ,  𝑖 = 1, 2, …, 𝑛

3. When the change in the design vector in two consecutive iterations is small:

   |𝑿ᵢ₊₁ − 𝑿ᵢ| ≤ ε₃

6.10 Conjugate Gradient (Fletcher-Reeves) Method

Ex 6.9: Minimize 𝑓(𝑥₁, 𝑥₂) = 𝑥₁ − 𝑥₂ + 2𝑥₁² + 2𝑥₁𝑥₂ + 𝑥₂² starting from the
point 𝑿₁ = {0, 0}ᵀ.

Iteration 1

𝑿₁ = {𝑥₁, 𝑥₂}ᵀ = {0, 0}ᵀ,  𝑓(𝑿₁) = 0

The gradient of 𝑓 is given by

∇𝑓 = {𝜕𝑓/𝜕𝑥₁, 𝜕𝑓/𝜕𝑥₂}ᵀ = {1 + 4𝑥₁ + 2𝑥₂, −1 + 2𝑥₁ + 2𝑥₂}ᵀ

Since 𝑥₁ = 0 and 𝑥₂ = 0 at 𝑿₁,

∇𝑓₁ = ∇𝑓(𝑿₁) = {1 + (4×0) + (2×0), −1 + (2×0) + (2×0)}ᵀ = {1, −1}ᵀ

Therefore,

𝑺₁ = −∇𝑓₁ = {−1, 1}ᵀ
To find the optimal step length λ₁* along 𝑺₁, we minimize 𝑓(𝑿₁ + λ₁𝑺₁) with respect
to λ₁. Since 𝑿₁ = {0, 0}ᵀ,

𝑿₁ + λ₁𝑺₁ = {0 + λ₁×(−1), 0 + λ₁×(+1)}ᵀ = {−λ₁, +λ₁}ᵀ

Substituting these values into the objective function gives

𝑓(𝑿₁ + λ₁𝑺₁) = 𝑓(−λ₁, +λ₁) = −λ₁ − λ₁ + 2(−λ₁)² + 2(−λ₁)(+λ₁) + (+λ₁)² = λ₁² − 2λ₁

Minimize 𝑓(𝑿₁ + λ₁𝑺₁) by setting the derivative of 𝑓 with respect to λ₁ equal to zero
to determine λ₁*:

𝑑𝑓/𝑑λ₁ = 2λ₁ − 2 = 0  ∴ λ₁* = 2/2 = 1
Therefore,
𝑿₂ = 𝑿₁ + λ₁*𝑺₁ = {0, 0}ᵀ + 1·{−1, +1}ᵀ = {−1, +1}ᵀ
Iteration 2

∇𝑓₂ = ∇𝑓(𝑿₂) = {1 + 4𝑥₁ + 2𝑥₂, −1 + 2𝑥₁ + 2𝑥₂}ᵀ

Since 𝑥₁ = −1 and 𝑥₂ = +1,

∇𝑓₂ = {+1 + 4×(−1) + 2×1, −1 + 2×(−1) + 2×1}ᵀ = {−1, −1}ᵀ

Equation (6.81) gives the next search direction as

𝑺₂ = −∇𝑓₂ + (|∇𝑓₂|²/|∇𝑓₁|²) 𝑺₁

where

|∇𝑓₂|² = (−1)² + (−1)² = 2  and  |∇𝑓₁|² = (+1)² + (−1)² = 2

∴ 𝑺₂ = −{−1, −1}ᵀ + (2/2){−1, 1}ᵀ = {0, 2}ᵀ

To find λ₂*, we minimize

𝑓(𝑿₂ + λ₂𝑺₂) = 𝑓(−1, 1 + 2λ₂) = −1 − (1 + 2λ₂) + 2 − 2(1 + 2λ₂) + (1 + 2λ₂)² = 4λ₂² − 2λ₂ − 1

with respect to λ₂. As 𝑑𝑓/𝑑λ₂ = 8λ₂ − 2 = 0 at λ₂* = 1/4, we obtain

𝑿₃ = 𝑿₂ + λ₂*𝑺₂ = {−1, +1}ᵀ + (1/4){0, 2}ᵀ = {−1, 1.5}ᵀ
Thus the optimum point is reached in two iterations. Even if we do not know this point
to be optimum, we will not be able to move from this point in the next iteration. This
can be verified as follows.

Now

∇𝑓₃ = ∇𝑓(𝑿₃) = {0, 0}ᵀ,  |∇𝑓₂|² = 2, and |∇𝑓₃|² = 0

Thus

𝑺₃ = −∇𝑓₃ + (|∇𝑓₃|²/|∇𝑓₂|²)𝑺₂ = −{0, 0}ᵀ + (0/2){0, 2}ᵀ = {0, 0}ᵀ
This shows that there is no search direction to reduce 𝑓 further and hence 𝑿𝟑 is
optimum.
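The two hand iterations above can be sketched in Python. As in the steepest-descent sketch, the step-length formula assumes the quadratic form of this example (with Hessian A), while the direction update 𝑺ᵢ₊₁ = −∇𝑓ᵢ₊₁ + (|∇𝑓ᵢ₊₁|²/|∇𝑓ᵢ|²)𝑺ᵢ is the Fletcher-Reeves rule used in Example 6.9:

```python
# Fletcher-Reeves conjugate-gradient sketch for Example 6.9
def grad(x):
    return [1 + 4*x[0] + 2*x[1], -1 + 2*x[0] + 2*x[1]]

A = [[4.0, 2.0], [2.0, 2.0]]     # Hessian, used only for the exact step length

def step_len(g, s):
    # lambda* minimizing f(X + lambda S) for a quadratic: -(g.s)/(s.A s)
    As = [A[0][0]*s[0] + A[0][1]*s[1], A[1][0]*s[0] + A[1][1]*s[1]]
    return -(g[0]*s[0] + g[1]*s[1]) / (s[0]*As[0] + s[1]*As[1])

x = [0.0, 0.0]
g = grad(x)
s = [-g[0], -g[1]]               # first direction: S1 = -grad f1
for i in range(2):               # a 2-variable quadratic converges in 2 iterations
    lam = step_len(g, s)
    x = [x[0] + lam*s[0], x[1] + lam*s[1]]
    g_new = grad(x)
    beta = (g_new[0]**2 + g_new[1]**2) / (g[0]**2 + g[1]**2)   # |grad f_{i+1}|^2/|grad f_i|^2
    s = [-g_new[0] + beta*s[0], -g_new[1] + beta*s[1]]         # Fletcher-Reeves update
    g = g_new

print([round(v, 4) for v in x], g)   # -> [-1.0, 1.5] [0.0, 0.0] after two iterations
```

The run reproduces the hand calculation exactly: λ₁* = 1, 𝑺₂ = {0, 2}ᵀ, λ₂* = 1/4, and the gradient vanishes at 𝑿₃ = {−1, 1.5}ᵀ.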
