Lecture 3: Taxonomy of Minimizers and Taylor's Theorem
Yudong Chen
3. a strict local minimizer of (P) if there exists a neighborhood N_{x∗} of x∗ such that for all x ∈ N_{x∗} ∩ X with x ≠ x∗ we have f(x) > f(x∗) (i.e., part 1 holds with a strict inequality);
4. an isolated local minimizer of (P) if there exists a neighborhood N_{x∗} such that ∀x ∈ N_{x∗} ∩ X: f(x) ≥ f(x∗), and N_{x∗} contains no other local minimizer.
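To see that a strict local minimizer need not be isolated, a classical example (not from these notes) is f(x) = x⁴ cos(1/x) + 2x⁴ with f(0) = 0: the point x∗ = 0 is a strict local minimizer, yet every neighborhood of 0 contains further local minimizers. The sketch below illustrates the accumulation numerically; the grid and interval are my own illustrative choices.

```python
import numpy as np

# Classical example: f(x) = x^4 cos(1/x) + 2x^4, with f(0) = 0.
# x* = 0 is a strict local minimizer, but local minimizers of f
# accumulate at 0, so x* is not isolated. We locate grid-local
# minima on [0.02, 0.3] to illustrate the accumulation.
def f(x):
    return x**4 * np.cos(1.0 / x) + 2 * x**4

xs = np.linspace(0.02, 0.3, 200001)   # fine grid, bounded away from 0
vals = f(xs)

# Interior grid points strictly below both neighbors.
is_min = (vals[1:-1] < vals[:-2]) & (vals[1:-1] < vals[2:])
minimizers = xs[1:-1][is_min]

print(len(minimizers))        # several distinct local minimizers...
print(minimizers.min())       # ...with locations approaching 0
print(f(minimizers).min())    # all values exceed f(0) = 0
```

Shrinking the left endpoint of the grid reveals ever more minimizers, consistent with 0 being non-isolated.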
UW-Madison CS/ISyE/Math/Stat 726 Spring 2024
2 Taylor’s Theorem
For this part and until explicitly stated otherwise, we will be assuming that f is at least once
continuously differentiable (i.e., gradient exists everywhere and is continuous).
Recall Taylor's Theorem for 1D functions from calculus: let f : R → R be a k-times continuously differentiable function. Then for all x, y ∈ R,

f(y) = f(x) + (1/1!) f′(x)(y − x) + (1/2!) f″(x)(y − x)² + ⋯ + (1/k!) f^(k)(x)(y − x)^k + R_k(y),

where R_k(y) is the remainder term.
• Integral remainder:

  R_k(y) = (1/k!) ∫₀¹ (1 − t)^k f^(k+1)(x + t(y − x)) (y − x)^(k+1) dt.
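As a sanity check, the identity can be verified numerically. The sketch below uses f = exp (whose derivatives of every order are exp itself) with an arbitrary choice of point and order, and approximates the integral remainder with a trapezoidal rule; none of these specifics come from the notes.

```python
import math
import numpy as np

# Verify f(y) = sum_{j=0}^{k} f^(j)(x)(y-x)^j / j! + R_k(y) for f = exp.
# The values of x, y, k are arbitrary illustrative choices.
x, y, k = 0.3, 1.1, 2

# Degree-k Taylor polynomial of exp around x, evaluated at y.
taylor = sum(math.exp(x) * (y - x) ** j / math.factorial(j)
             for j in range(k + 1))

# Integral remainder:
#   R_k(y) = (1/k!) ∫_0^1 (1-t)^k f^(k+1)(x + t(y-x)) (y-x)^(k+1) dt,
# approximated by the composite trapezoidal rule on a fine grid.
t = np.linspace(0.0, 1.0, 100001)
g = (1 - t) ** k * np.exp(x + t * (y - x)) * (y - x) ** (k + 1)
R_k = np.sum((g[:-1] + g[1:]) / 2) * (t[1] - t[0]) / math.factorial(k)

print(abs(taylor + R_k - math.exp(y)))   # small quadrature error only
```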
∇²f(x) = [ ∂²f/(∂x_i ∂x_j)(x) ]_{1≤i,j≤d} ∈ R^(d×d)

denotes the Hessian matrix ("second-order derivative") of f at x.
4. ∃γ ∈ (0, 1):

   f(y) = f(x) + ⟨∇f(x), y − x⟩ + (1/2) ⟨∇²f(x + γ(y − x))(y − x), y − x⟩
        = f(x) + ⟨∇f(x), y − x⟩ + (1/2) (y − x)ᵀ ∇²f(x + γ(y − x))(y − x).
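The γ above is guaranteed to exist but is not given explicitly; for a concrete f with closed-form Hessian one can locate it by bisection. A sketch, where the function f(v) = e^{v₁} + e^{2v₂} and the endpoints are my own illustrative choices:

```python
import numpy as np

# f(v) = exp(v[0]) + exp(2 v[1]), with gradient and Hessian in closed form.
def f(v):
    return np.exp(v[0]) + np.exp(2 * v[1])

def grad(v):
    return np.array([np.exp(v[0]), 2 * np.exp(2 * v[1])])

def hess(v):
    return np.diag([np.exp(v[0]), 4 * np.exp(2 * v[1])])

x = np.array([0.0, 0.0])
y = np.array([0.5, 0.3])
p = y - x

# h(gamma) = second-order mean-value expression minus f(y);
# Taylor's theorem says h vanishes for some gamma in (0, 1).
def h(gamma):
    z = x + gamma * p
    return f(x) + grad(x) @ p + 0.5 * p @ hess(z) @ p - f(y)

# Bisection: here h(0) < 0 < h(1), since the Hessian grows along [x, y].
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if h(mid) < 0 else (lo, mid)

gamma = 0.5 * (lo + hi)
print(gamma, abs(h(gamma)))   # gamma in (0, 1), residual near 0
```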
Remark 1. A common mistake is to write down the following "Mean-Value Thm" for the gradient: ∇f(y) = ∇f(x) + ∇²f(x + γ(y − x))(y − x) for some γ ∈ (0, 1). This need not hold when d > 1: the mean-value theorem applies to each scalar component of ∇f separately, with a possibly different γ for each component.
Little-oh notation:

a_k = o(b_k) ⟺ lim_{k→∞} a_k / b_k = 0.

So a_k = o(1) means a_k → 0.
Using the notation above, we can show that for f continuously differentiable at x, we have

f(x + p) = f(x) + ∇f(x)ᵀp + o(‖p‖).
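This first-order behavior is easy to observe numerically: shrinking p along a fixed direction, the error divided by ‖p‖ tends to 0. A sketch with an arbitrary smooth f and direction of my own choosing:

```python
import numpy as np

# f(v) = sin(v[0]) * exp(v[1]) and its gradient, at a base point x.
def f(v):
    return np.sin(v[0]) * np.exp(v[1])

def grad(v):
    return np.array([np.cos(v[0]) * np.exp(v[1]),
                     np.sin(v[0]) * np.exp(v[1])])

x = np.array([0.7, -0.2])
d = np.array([0.6, 0.8])          # fixed unit direction, ||d|| = 1

# Ratio |f(x+p) - f(x) - grad f(x)^T p| / ||p|| for p = t d, t -> 0.
ratios = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    p = t * d
    err = abs(f(x + p) - f(x) - grad(x) @ p)
    ratios.append(err / np.linalg.norm(p))

print(ratios)   # decreasing toward 0 (here roughly proportional to t)
```

For a twice continuously differentiable f the ratio is in fact O(t), which is what the decay pattern above reflects.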
f(x + p) = f(x) + ∇f(x + γp)ᵀp
         = f(x) + ∇f(x)ᵀp + (∇f(x + γp) − ∇f(x))ᵀp
         = f(x) + ∇f(x)ᵀp + O(‖∇f(x + γp) − ∇f(x)‖₂ · ‖p‖₂)   (Cauchy–Schwarz)
         = f(x) + ∇f(x)ᵀp + o(‖p‖₂),