
UW-Madison CS/ISyE/Math/Stat 726 Spring 2024

Lecture 3: Solution Concepts; Taylor’s Theorems

Yudong Chen

Consider the problem


\[
\min_{x \in X} f(x), \tag{P}
\]

where X ⊆ dom(f) ⊆ ℝⁿ is a closed set.

1 A Taxonomy of Solutions to (P)


We will use “solution” and “minimizer” interchangeably.
Definition 1. We say that x* ∈ X ⊆ dom(f) is

1. a local minimizer/solution of (P) if there exists a neighborhood N_{x*} of x* such that for all x ∈ N_{x*} ∩ X we have f(x) ≥ f(x*);

2. a global minimizer of (P) if ∀x ∈ X: f(x) ≥ f(x*);

3. a strict local minimizer of (P) if there exists a neighborhood N_{x*} of x* such that for all x ∈ N_{x*} ∩ X with x ≠ x* we have f(x) > f(x*) (i.e., part 1 holds with a strict inequality);

4. an isolated local minimizer of (P) if there exists a neighborhood N_{x*} such that ∀x ∈ N_{x*} ∩ X: f(x) ≥ f(x*) and N_{x*} contains no other local minimizer;

5. a unique minimizer if it is the only global minimizer.


Example 1. A local minimizer that is not strict: consider a constant function.
Example 2. A local minimizer that is not global: (picture)
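These definitions can be checked numerically on a grid. The sketch below is purely illustrative (the function f(x) = x⁴ − x² + 0.3x is my own choice, not from the notes); it finds two local minimizers, only one of which is global:

```python
# Grid search illustrating local vs. global minimizers for
# f(x) = x^4 - x^2 + 0.3x on [-2, 2] (a hypothetical example).
def f(x):
    return x ** 4 - x ** 2 + 0.3 * x

n = 4000
xs = [-2 + 4 * i / n for i in range(n + 1)]
vals = [f(x) for x in xs]

# indices of interior grid points that are discrete local minima
local_idx = [i for i in range(1, n)
             if vals[i] < vals[i - 1] and vals[i] < vals[i + 1]]
# index of the grid global minimizer
global_idx = min(range(n + 1), key=vals.__getitem__)

print([round(xs[i], 3) for i in local_idx])  # two local minimizers
print(round(xs[global_idx], 3))              # the global one is the negative root
```

Both local minimizers satisfy part 1 of Definition 1, but only the grid argmin satisfies part 2.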

Exercise 1. Prove that every isolated local minimizer is strict.


The converse of the above statement does not hold in general, as demonstrated by the examples below.


Example 3. A strict local minimizer that is not isolated:

• (not continuous)
\[
f_1(x) = \begin{cases} 1 & x \neq 0 \\ 0 & x = 0 \end{cases}, \qquad x^* = 0.
\]

• (continuous)
\[
f_2(x) = \begin{cases} x^2 \left( 1 + \sin^2 \frac{1}{x} \right) & x \neq 0 \\ 0 & x = 0 \end{cases}, \qquad x^* = 0.
\]

Illustration: Left: f₁. Right: f₂.
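The continuous function f₂ can also be probed numerically. This sketch (my own; grid-based, so only suggestive) checks that f₂ is strictly positive away from 0 on a sample grid, yet has many other local minima accumulating toward 0, so x* = 0 is strict but not isolated:

```python
import math

def f2(x):
    # continuous example from Example 3: strict but not isolated minimizer at 0
    return 0.0 if x == 0 else x * x * (1 + math.sin(1 / x) ** 2)

# strictness (on a grid): f2(x) > f2(0) = 0 for all sampled x != 0
assert all(f2(k * 1e-4) > 0 for k in range(1, 2001))

# not isolated: discrete local minima of f2 between 0.005 and 0.1
# (step 1e-5 is fine enough to resolve the sin(1/x) oscillations there)
grid = [0.005 + k * 1e-5 for k in range(9501)]
vals = [f2(x) for x in grid]
local_mins = [grid[i] for i in range(1, len(grid) - 1)
              if vals[i] < vals[i - 1] and vals[i] < vals[i + 1]]
print(len(local_mins))  # dozens of local minimizers near 0
```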

We want to determine whether a particular point is a local or global minimizer. A powerful tool is Taylor’s theorem.

2 Taylor’s Theorem
For this part, and until explicitly stated otherwise, we assume that f is at least once continuously differentiable (i.e., its gradient exists everywhere and is continuous).

Recall Taylor’s Theorem for 1D functions from calculus: Let f : ℝ → ℝ be a k-times continuously differentiable function. Then
\[
\forall x, y \in \mathbb{R}: \quad f(y) = f(x) + \frac{1}{1!} f'(x)(y - x) + \frac{1}{2!} f''(x)(y - x)^2 + \cdots + \frac{1}{k!} f^{(k)}(x)(y - x)^k + \underbrace{R_k(y)}_{\text{remainder}}.
\]

Typical forms of R_k(y) (assume that f is (k+1)-times continuously differentiable):


• Lagrange (mean-value) remainder:
\[
R_k(y) = \frac{1}{(k+1)!} f^{(k+1)}\big(x + \gamma(y - x)\big) (y - x)^{k+1}
\]
for some γ ∈ (0, 1);

• Integral remainder:
\[
R_k(y) = \frac{1}{k!} \int_0^1 (1 - t)^k f^{(k+1)}\big(x + t(y - x)\big) (y - x)^{k+1} \, dt.
\]
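The Lagrange form gives a computable bound on the truncation error. Below is a quick numerical sanity check; the choice f = exp, x = 0, k = 2 is my own illustrative example, not from the notes:

```python
import math

# Degree-2 Taylor polynomial of exp around x = 0.
def taylor2(y):
    return 1 + y + y * y / 2

results = []
for y in [0.5, 0.1, -0.3, 1.0]:
    remainder = math.exp(y) - taylor2(y)
    # Lagrange: R_2(y) = e^{gamma*y} * y^3 / 3! for some gamma in (0, 1),
    # so |R_2(y)| <= max(1, e^y) * |y|^3 / 6
    bound = max(1.0, math.exp(y)) * abs(y) ** 3 / 6
    results.append((remainder, bound))
    print(f"y={y:5}: remainder={remainder:+.2e}, bound={bound:.2e}")
```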

Below is the multivariate version.


Theorem 1 (Taylor’s Theorem; Thm 2.1 in Wright–Recht). Let f : ℝ^d → ℝ̄ be a continuously differentiable function. Then, for all x, y ∈ dom(f) such that {(1 − α)x + αy : α ∈ (0, 1)} ⊆ dom(f), we have

1. f(y) = f(x) + ∫₀¹ ⟨∇f(x + t(y − x)), y − x⟩ dt;

2. f(y) = f(x) + ⟨∇f(x + γ(y − x)), y − x⟩ for some γ ∈ (0, 1) (a.k.a. the Mean Value Theorem).


If f is twice continuously differentiable:

3. ∇f(y) = ∇f(x) + ∫₀¹ ∇²f(x + t(y − x)) (y − x) dt. Here
\[
\nabla^2 f(x) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \right]_{1 \le i, j \le d} \in \mathbb{R}^{d \times d}
\]
denotes the Hessian matrix (“second-order derivative”) of f at x.

4. ∃γ ∈ (0, 1):
\[
\begin{aligned}
f(y) &= f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} \left\langle \nabla^2 f(x + \gamma(y - x)) (y - x),\, y - x \right\rangle \\
&= f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} (y - x)^\top \nabla^2 f(x + \gamma(y - x)) (y - x).
\end{aligned}
\]

Remark 1. A common mistake is to write down the following “Mean Value Theorem” for the gradient:
\[
\exists \gamma \in (0, 1): \quad \nabla f(y) = \nabla f(x) + \nabla^2 f(x + \gamma(y - x)) (y - x)? \quad \longleftarrow \text{This is wrong!}
\]
(The scalar mean value theorem applies to each coordinate of ∇f separately, but each coordinate may require a different γ, so a single γ need not exist when d > 1.)
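Part 1 of Theorem 1 is easy to sanity-check numerically: approximate the line integral with a midpoint rule and compare against f(y) − f(x). The bivariate function below is my own illustrative choice:

```python
import math

def f(x1, x2):
    return math.sin(x1) + x1 * x2 ** 2

def grad(x1, x2):
    return (math.cos(x1) + x2 ** 2, 2 * x1 * x2)

x, y = (0.2, -0.5), (1.1, 0.7)
d = (y[0] - x[0], y[1] - x[1])

# midpoint-rule approximation of  \int_0^1 <grad f(x + t(y - x)), y - x> dt
n = 100000
integral = 0.0
for i in range(n):
    t = (i + 0.5) / n
    g = grad(x[0] + t * d[0], x[1] + t * d[1])
    integral += (g[0] * d[0] + g[1] * d[1]) / n

print(f(*y) - f(*x), integral)  # the two values agree
```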

2.1 Digression: order notation


Two sequences {a_k}_{k≥1}, {b_k}_{k≥1} with a_k, b_k ≥ 0 for all k.

Big-Oh notation: a_k = O(b_k) ⟺
\[
(\exists M > 0)(\exists K < \infty)(\forall k \ge K): \quad a_k \le M b_k.
\]
e.g., $k = O(\tfrac{1}{10} k^2)$, $k = O(\tfrac{1}{10!} k)$.
If a_k = O(b_k) and b_k = O(a_k), we write a_k = Θ(b_k).

Little-oh notation:
\[
a_k = o(b_k) \iff \lim_{k \to \infty} \frac{a_k}{b_k} = 0.
\]
So a_k = o(1) means a_k → 0.

Using the notations above, we can show that for f continuously differentiable at x, we have

\[
f(x + p) = f(x) + \nabla f(x)^\top p + o(\|p\|).
\]

Explicitly, this means
\[
\lim_{\|p\| \to 0} \frac{f(x + p) - f(x) - \nabla f(x)^\top p}{\|p\|} = 0.
\]


Proof. By part 2 of Theorem 1 (Taylor’s), we have
\[
\begin{aligned}
f(x + p) &= f(x) + \nabla f(x + \gamma p)^\top p \\
&= f(x) + \nabla f(x)^\top p + \big( \nabla f(x + \gamma p) - \nabla f(x) \big)^\top p \\
&= f(x) + \nabla f(x)^\top p + O\big( \| \nabla f(x + \gamma p) - \nabla f(x) \|_2 \cdot \| p \|_2 \big) && \text{(Cauchy–Schwarz)} \\
&= f(x) + \nabla f(x)^\top p + o(\| p \|_2),
\end{aligned}
\]
where the last step follows from continuity of ∇f: ‖∇f(x + γp) − ∇f(x)‖₂ → 0 as p → 0.
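The o(‖p‖) statement just proved can also be observed numerically: the error of the first-order expansion, divided by ‖p‖, should shrink as ‖p‖ → 0. The function below is a hypothetical example of mine:

```python
import math

def f(x1, x2):
    return math.exp(x1) * math.cos(x2)

def grad(x1, x2):
    return (math.exp(x1) * math.cos(x2), -math.exp(x1) * math.sin(x2))

x = (0.3, -0.8)
g = grad(*x)

ratios = []
for s in [1e-1, 1e-2, 1e-3, 1e-4]:
    p = (0.6 * s, -0.8 * s)  # unit direction (0.6, -0.8) scaled by s = ||p||
    err = f(x[0] + p[0], x[1] + p[1]) - f(*x) - (g[0] * p[0] + g[1] * p[1])
    ratios.append(abs(err) / s)

print(ratios)  # decreasing toward 0
```

The ratio decays roughly linearly in s, consistent with the error actually being O(‖p‖²) for twice continuously differentiable f.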
