Chapter 9
Unconstrained minimization
Outline
Descent methods
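▶ descent methods generate iterates as
$x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)}$, with $f(x^{(k+1)}) < f(x^{(k)})$
▶ linear convergence bound for the gradient method on strongly convex $f$:
$f(x^{(k)}) - p^\star \le c^k \, (f(x^{(0)}) - p^\star)$, with $c \in (0, 1)$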
– very slow if γ ≫ 1 or γ ≪ 1
– example for γ = 10 at right
– called zig-zagging
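The zig-zagging behavior can be reproduced with a minimal sketch. The problem that defines γ is not shown in the text above, so the code assumes the two-variable quadratic f(x) = (1/2)(x1² + γ x2²) with exact line search, started at x(0) = (γ, 1). The function names are illustrative only.

```python
def grad_descent_exact(gamma, x0, iters):
    """Gradient descent with exact line search on the ASSUMED test problem
    f(x) = 0.5 * (x1^2 + gamma * x2^2)."""
    x1, x2 = x0
    path = [(x1, x2)]
    for _ in range(iters):
        g1, g2 = x1, gamma * x2              # gradient of f at (x1, x2)
        denom = g1 * g1 + gamma * g2 * g2    # g^T P g with P = diag(1, gamma)
        if denom == 0.0:
            break                            # gradient is zero: at the minimizer
        t = (g1 * g1 + g2 * g2) / denom      # exact minimizer of f(x - t g) over t
        x1, x2 = x1 - t * g1, x2 - t * g2
        path.append((x1, x2))
    return path


def f(gamma, x):
    return 0.5 * (x[0] ** 2 + gamma * x[1] ** 2)


gamma = 10.0
path = grad_descent_exact(gamma, (gamma, 1.0), 20)
for k in range(4):
    # the second coordinate changes sign at every step: the zig-zag
    print(k, path[k], f(gamma, path[k]))
```

For this starting point the iterates follow x1(k) = γ r^k and x2(k) = (−r)^k with r = (γ − 1)/(γ + 1), so the objective shrinks by the factor r² per step. That factor approaches 1 when γ ≫ 1 or γ ≪ 1, which is why the method is slow in those cases.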
Examples
Newton’s method
Another interpretation
Newton decrement
▶ equal to the norm of the Newton step in the quadratic Hessian norm
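The definition of the Newton decrement is not shown in the text above. Assuming the standard definitions of the Newton step $\Delta x_{\mathrm{nt}}$ and decrement $\lambda(x)$, the norm identity can be checked in one line:

```latex
\Delta x_{\mathrm{nt}} = -\nabla^2 f(x)^{-1} \nabla f(x), \qquad
\lambda(x) = \left( \nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x) \right)^{1/2}

\Delta x_{\mathrm{nt}}^T \, \nabla^2 f(x) \, \Delta x_{\mathrm{nt}}
  = \nabla f(x)^T \nabla^2 f(x)^{-1} \nabla^2 f(x) \nabla^2 f(x)^{-1} \nabla f(x)
  = \lambda(x)^2
```

So $\lambda(x) = \| \Delta x_{\mathrm{nt}} \|_{\nabla^2 f(x)}$, where $\|u\|_{\nabla^2 f(x)} = (u^T \nabla^2 f(x) u)^{1/2}$.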
Newton’s method
Example in $\mathbf{R}^2$
Example in $\mathbf{R}^{100}$
(same problem as slide 9.14)
Example in $\mathbf{R}^{10000}$
(with sparse $a_i$)
Self-concordance
shortcomings of classical convergence analysis
▶ depends on unknown constants (m, L, ...)
▶ bound is not affinely invariant, although Newton’s method is
convergence analysis via self-concordance (Nesterov and Nemirovski)
▶ does not depend on any unknown constants
▶ gives affine-invariant bound
▶ applies to special class of convex self-concordant functions
▶ developed to analyze polynomial-time interior-point methods for convex optimization
examples on R
▶ linear and quadratic functions
▶ negative logarithm $f(x) = -\log x$
▶ negative entropy plus negative logarithm: $f(x) = x \log x - \log x$
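The definition of self-concordance is not shown in the text above. Assuming the standard definition on $\mathbf{R}$ (a convex $f$ is self-concordant if $|f'''(x)| \le 2 f''(x)^{3/2}$ for all $x \in \operatorname{dom} f$), the last two examples can be checked directly:

```latex
% negative logarithm, x > 0
f(x) = -\log x: \quad f''(x) = \frac{1}{x^2}, \quad f'''(x) = -\frac{2}{x^3},
\quad |f'''(x)| = \frac{2}{x^3} = 2 \left( \frac{1}{x^2} \right)^{3/2} = 2 f''(x)^{3/2}

% negative entropy plus negative logarithm, x > 0
f(x) = x \log x - \log x: \quad f''(x) = \frac{x + 1}{x^2}, \quad
|f'''(x)| = \frac{x + 2}{x^3}, \quad 2 f''(x)^{3/2} = \frac{2 (x + 1)^{3/2}}{x^3}
```

The first case holds with equality. The second reduces to $x + 2 \le 2 (x + 1)^{3/2}$, which holds for $x > 0$: the two sides agree at $x = 0$, and the right-hand side grows faster (its derivative $3 (x + 1)^{1/2}$ is at least 3, against 1 for the left-hand side).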
Self-concordant calculus
properties
▶ preserved under positive scaling α ≥ 1, and sum
▶ preserved under composition with affine function
▶ if $g$ is convex with $\operatorname{dom} g = \mathbf{R}_{++}$ and $|g'''(x)| \le 3 g''(x)/x$, then
$f(x) = -\log(-g(x)) - \log x$ is self-concordant
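examples: the properties can be used to show that, for instance, the log barrier $f(x) = -\sum_{i=1}^m \log(b_i - a_i^T x)$ is self-concordant (a sum of negative logarithms composed with affine functions)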
Numerical example
▶ 150 randomly generated instances of $f(x) = -\sum_{i=1}^m \log(b_i - a_i^T x)$, $x \in \mathbf{R}^n$
▶ plot markers: ○: m = 100, n = 50; □: m = 1000, n = 500; ◇: m = 1000, n = 50
Implementation
main effort in each iteration: evaluate derivatives and solve Newton system
$H \Delta x = -g$
where $H = \nabla^2 f(x)$, $g = \nabla f(x)$
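One common way to solve the Newton system when H is positive definite is a Cholesky factorization H = LLᵀ followed by two triangular solves. The sketch below is a plain-Python illustration for small dense matrices, not a tuned implementation. It assumes H is the Hessian and g the gradient at the current point, and the function names are illustrative only.

```python
import math


def cholesky(H):
    """Return lower-triangular L with H = L L^T (H symmetric positive definite)."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y


def backward_sub_transpose(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def newton_step(H, g):
    """Return (dx, lam): dx solves H dx = -g, and lam = ||L^{-1} g||_2,
    which equals (g^T H^{-1} g)^{1/2}."""
    L = cholesky(H)
    w = forward_sub(L, g)                              # w = L^{-1} g
    dx = backward_sub_transpose(L, [-wi for wi in w])  # dx = -L^{-T} w
    lam = math.sqrt(sum(wi * wi for wi in w))
    return dx, lam


H = [[4.0, 1.0], [1.0, 3.0]]
g = [1.0, 2.0]
dx, lam = newton_step(H, g)
print(dx, lam)
```

For dense H the factorization costs on the order of n³ operations. When H is sparse or otherwise structured, a solver that exploits that structure would usually be preferred over this dense sketch.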
Example
Quadratic functions
▶ convex quadratic: $f(x) = (1/2) x^T P x + q^T x + r$, $P \succeq 0$
▶ we can solve exactly via linear equations
$\nabla f(x) = Px + q = 0$
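A small worked instance, with data chosen here purely for illustration ($P$, $q$, $r$ are not from the slides):

```latex
P = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \quad
q = \begin{bmatrix} -3 \\ -3 \end{bmatrix}, \quad r = 0

Px + q = 0 \;\Longleftrightarrow\;
\begin{cases} 2 x_1 + x_2 = 3 \\ x_1 + 2 x_2 = 3 \end{cases}
\;\Longrightarrow\; x^\star = (1, 1)

p^\star = f(x^\star) = \tfrac{1}{2} (x^\star)^T P x^\star + q^T x^\star
        = \tfrac{1}{2} \cdot 6 - 6 = -3
```

Here $P \succ 0$, so the solution is unique. If $P$ is only positive semidefinite, the system $Px = -q$ may have many solutions or none. In the latter case the quadratic is unbounded below.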
Iterative methods
▶ for most non-quadratic functions, we use iterative methods
▶ these produce a sequence of points $x^{(k)} \in \operatorname{dom} f$, $k = 0, 1, \ldots$
▶ $x^{(0)}$ is the initial point or starting point
▶ $x^{(k)}$ is the $k$th iterate
▶ we hope that the method converges, i.e.,
$f(x^{(k)}) \to p^\star$, $\nabla f(x^{(k)}) \to 0$
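▶ closedness of the initial sublevel set $S = \{x \mid f(x) \le f(x^{(0)})\}$ is hard to verify, except when all sublevel sets are closed
– equivalent to condition that $\operatorname{epi} f$ is closed
– true if $\operatorname{dom} f = \mathbf{R}^n$
– true if $f(x) \to \infty$ as $x \to \operatorname{bd} \operatorname{dom} f$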
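▶ if $f$ is strongly convex on $S$ (i.e., $\nabla^2 f(x) \succeq mI$ for all $x \in S$, for some $m > 0$), then $S$ is bounded
▶ we conclude $p^\star > -\infty$, and for $x \in S$,
$f(x) - p^\star \le \frac{1}{2m} \|\nabla f(x)\|_2^2$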
▶ useful as stopping criterion (if you know m, which usually you do not)
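The stopping criterion can be illustrated on a problem where m is known. The sketch below assumes the quadratic f(x) = (1/2)(x1² + γ x2²), for which m = min(1, γ) and p★ = 0. The problem and all names are chosen for illustration and are not taken from the slides.

```python
def f(x, gamma):
    return 0.5 * (x[0] ** 2 + gamma * x[1] ** 2)


def grad(x, gamma):
    return (x[0], gamma * x[1])


def suboptimality_bound(g, m):
    """Upper bound ||g||_2^2 / (2 m) on f(x) - p_star, valid when f is
    strongly convex with constant m."""
    return sum(gi * gi for gi in g) / (2.0 * m)


def should_stop(g, m, eps):
    """Stop once the bound guarantees f(x) - p_star <= eps."""
    return suboptimality_bound(g, m) <= eps


gamma = 10.0
m = min(1.0, gamma)   # smallest eigenvalue of the Hessian diag(1, gamma)
p_star = 0.0          # minimum value, attained at x = 0
x = (3.0, -0.5)

gap = f(x, gamma) - p_star
bound = suboptimality_bound(grad(x, gamma), m)
print(gap, bound, should_stop(grad(x, gamma), m, 1e-6))
```

The bound can be loose, but it only needs the gradient and m. In practice m is rarely known, so a common choice is to stop when the gradient norm itself falls below a small tolerance.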