
9. Unconstrained minimization

Outline

▶ Gradient descent method
▶ Steepest descent method
▶ Newton's method
▶ Self-concordant functions
▶ Implementation

Descent methods
▶ descent methods generate iterates as

x^(k+1) = x^(k) + t^(k)Δx^(k)

with f(x^(k+1)) < f(x^(k)) (hence the name)
▶ other notations: x⁺ = x + tΔx, x := x + tΔx
▶ Δx^(k) is the step, or search direction
▶ t^(k) > 0 is the step size, or step length
▶ from convexity, f(x⁺) < f(x) implies ∇f(x)ᵀΔx < 0
▶ this means Δx is a descent direction

Generic descent method

General descent method.


given a starting point x ∈ dom f .
repeat
1. Determine a descent direction Δx.
2. Line search. Choose a step size t > 0.
3. Update. x := x + tΔx.
until stopping criterion is satisfied.

Line search types

▶ exact line search: t = argmin_{t>0} f(x + tΔx)
▶ backtracking line search (with parameters α ∈ (0, 1/2), β ∈ (0, 1)):
– starting at t = 1, repeat t := βt until f(x + tΔx) < f(x) + αt∇f(x)ᵀΔx
(a minimal code sketch follows below)
▶ graphical interpretation: reduce t (i.e., backtrack) until t ≤ t₀, where t₀ is the point at which the line f(x) + αt∇f(x)ᵀΔx crosses f(x + tΔx)
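A minimal Python sketch of the backtracking rule above (my own construction, not code from the slides; the names f, grad_fx, and dx are assumptions):

```python
import numpy as np

def backtracking(f, grad_fx, x, dx, alpha=0.3, beta=0.8):
    """Backtracking line search: shrink t until the sufficient-decrease
    condition f(x + t*dx) <= f(x) + alpha*t*grad_fx@dx holds."""
    t = 1.0
    fx = f(x)
    slope = grad_fx @ dx   # directional derivative; < 0 for a descent direction
    while f(x + t * dx) > fx + alpha * t * slope:
        t *= beta
    return t
```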
Gradient descent method

▶ general descent method with Δx = −∇f(x)


given a starting point x ∈ dom f.
repeat
1. Δx := −∇f(x).
2. Line search. Choose step size t via exact or backtracking line search.
3. Update. x := x + tΔx.
until stopping criterion is satisfied.
▶ stopping criterion usually of the form ∥∇f(x)∥₂ ≤ ε
▶ convergence result: for strongly convex f,

f(x^(k)) − p★ ≤ cᵏ(f(x^(0)) − p★)

where c ∈ (0, 1) depends on m, x^(0), and the line search type
▶ very simple, but can be very slow (a code sketch follows below)
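A sketch of the method in Python, reusing the backtracking routine from the earlier sketch (again my own construction, not code from the slides):

```python
def gradient_descent(f, grad, x0, eps=1e-6, max_iter=10_000):
    """Gradient descent with backtracking line search; stops when the
    usual criterion ||grad f(x)||_2 <= eps is met."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:   # stopping criterion
            break
        dx = -g                        # descent direction: negative gradient
        t = backtracking(f, g, x, dx)  # line search
        x = x + t * dx                 # update
    return x
```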

Example: Quadratic function on R²

▶ take f(x) = (1/2)(x₁² + γx₂²), with γ > 0
▶ with exact line search, starting at x^(0) = (γ, 1):

– very slow if γ ≫ 1 or γ ≪ 1
– e.g., for γ = 10 the iterates bounce back and forth across the valley (see the demo below)
– called zig-zagging
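A quick numerical illustration of the zig-zagging (my construction; for a quadratic f(x) = (1/2)xᵀPx, the exact line search step has the closed form t = gᵀg/(gᵀPg) with g = ∇f(x)):

```python
import numpy as np

gamma = 10.0
P = np.diag([1.0, gamma])
x = np.array([gamma, 1.0])     # starting point x^(0) = (gamma, 1)
for k in range(1, 6):
    g = P @ x                  # gradient of f(x) = 0.5 * x^T P x
    t = (g @ g) / (g @ P @ g)  # exact line search step
    x = x - t * g
    print(k, x)                # x2 flips sign every step: the zig-zag
```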

Example: Nonquadratic function on R²


▶ f(x₁, x₂) = e^(x₁+3x₂−0.1) + e^(x₁−3x₂−0.1) + e^(−x₁−0.1)

[figure: iterates with backtracking line search vs. exact line search]


Example: A problem in R^100

▶ f(x) = cᵀx − Σᵢ₌₁⁵⁰⁰ log(bᵢ − aᵢᵀx)
▶ linear convergence, i.e., a straight line on a semilog plot of f(x^(k)) − p★

Steepest descent method


▶ normalized steepest descent direction (at x, for norm ∥·∥):
Δx_nsd = argmin{∇f(x)ᵀv | ∥v∥ = 1}
▶ interpretation: for small v, f(x + v) ≈ f(x) + ∇f(x)ᵀv;
Δx_nsd is the unit-norm step with the most negative directional derivative
▶ (unnormalized) steepest descent direction: Δx_sd = ∥∇f(x)∥_* Δx_nsd
▶ satisfies ∇f(x)ᵀΔx_sd = −∥∇f(x)∥_*²
▶ steepest descent method
– general descent method with Δx = Δx_sd
– convergence properties similar to gradient descent

Examples

▶ Euclidean norm: Δx_sd = −∇f(x)
▶ quadratic norm ∥x∥_P = (xᵀPx)^(1/2) (P ∈ S^n_++): Δx_sd = −P⁻¹∇f(x)
▶ ℓ₁-norm: Δx_sd = −(∂f(x)/∂xᵢ)eᵢ, where i is an index with |∂f(x)/∂xᵢ| = ∥∇f(x)∥∞ (both non-Euclidean directions are sketched in code below)
[figure: unit balls and normalized steepest descent directions for the quadratic norm and ℓ₁-norm]
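A short sketch of the two non-Euclidean directions above (my construction; grad_fx stands for ∇f(x), P for the matrix defining the quadratic norm):

```python
import numpy as np

def sd_quadratic_norm(grad_fx, P):
    """Steepest descent direction for the P-norm: -P^{-1} grad f(x)."""
    return -np.linalg.solve(P, grad_fx)   # solve rather than invert P

def sd_l1_norm(grad_fx):
    """l1-norm steepest descent: step along the coordinate with the
    largest gradient magnitude (a coordinate-descent-like update)."""
    i = np.argmax(np.abs(grad_fx))
    dx = np.zeros_like(grad_fx, dtype=float)
    dx[i] = -grad_fx[i]
    return dx
```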

Choice of norm for steepest descent


▶ steepest descent with backtracking line search for two quadratic norms
▶ ellipses show {x | ∥x − x^(k)∥_P = 1}
▶ interpretation of steepest descent with quadratic norm ∥·∥_P: gradient descent after the change of variables x̄ = P^(1/2)x
▶ shows that the choice of P has a strong effect on the speed of convergence

Newton’s method

▶ Newton step: Δx_nt = −∇²f(x)⁻¹∇f(x)


▶ interpretation: x + Δx_nt minimizes the second-order approximation

f̂(x + v) = f(x) + ∇f(x)ᵀv + (1/2)vᵀ∇²f(x)v

Another interpretation

▶ x + Δx_nt solves the linearized optimality condition

∇f(x + v) ≈ ∇f(x) + ∇²f(x)v = 0

And one more interpretation


▶ Δx_nt is the steepest descent direction at x in the local Hessian norm
∥u∥_{∇²f(x)} = (uᵀ∇²f(x)u)^(1/2)

▶ dashed lines are contour lines of f; ellipse is {x + v | vᵀ∇²f(x)v = 1}
▶ arrow shows −∇f(x)

Newton decrement

▶ Newton decrement: λ(x) = (∇f(x)ᵀ∇²f(x)⁻¹∇f(x))^(1/2)


▶ a measure of the proximity of x to x★
▶ gives an estimate of f(x) − p★, using the quadratic approximation f̂:

f(x) − inf_y f̂(y) = λ(x)²/2

▶ equal to the norm of the Newton step in the quadratic Hessian norm:
λ(x) = (Δx_ntᵀ∇²f(x)Δx_nt)^(1/2)
▶ directional derivative in the Newton direction: ∇f(x)ᵀΔx_nt = −λ(x)²
▶ affine invariant (unlike ∥∇f(x)∥₂)

Newton’s method

given a starting point x ∈ dom f, tolerance ε > 0.
repeat
1. Compute the Newton step and decrement:
Δx_nt := −∇²f(x)⁻¹∇f(x);  λ² := ∇f(x)ᵀ∇²f(x)⁻¹∇f(x).
2. Stopping criterion. quit if λ²/2 ≤ ε.
3. Line search. Choose step size t by backtracking line search.
4. Update. x := x + tΔx_nt.
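A Python sketch of this algorithm (my construction; it reuses the backtracking routine from the line search sketch and assumes the Hessian is positive definite, so a Cholesky-based solve applies):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def newton(f, grad, hess, x0, eps=1e-8, max_iter=50):
    """Damped Newton method following the pseudocode above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        dx = cho_solve(cho_factor(H), -g)  # Newton step: solve H dx = -g
        lam2 = -g @ dx                     # decrement squared: g^T H^{-1} g
        if lam2 / 2 <= eps:                # stopping criterion
            break
        t = backtracking(f, g, x, dx)      # backtracking line search
        x = x + t * dx                     # update
    return x
```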

▶ affine invariant, i.e., independent of linear changes of coordinates:
Newton iterates for f̃(y) = f(Ty) with starting point y^(0) = T⁻¹x^(0) are y^(k) = T⁻¹x^(k)

Classical convergence analysis


assumptions
▶ f strongly convex on S with constant m
▶ ∇²f is Lipschitz continuous on S, with constant L > 0:
∥∇²f(x) − ∇²f(y)∥₂ ≤ L∥x − y∥₂
(L measures how well f can be approximated by a quadratic function)

outline: there exist constants η ∈ (0, m²/L), γ > 0 such that
▶ if ∥∇f(x^(k))∥₂ ≥ η, then f(x^(k+1)) − f(x^(k)) ≤ −γ
▶ if ∥∇f(x^(k))∥₂ < η, then (L/(2m²))∥∇f(x^(k+1))∥₂ ≤ ((L/(2m²))∥∇f(x^(k))∥₂)²

Classical convergence analysis

damped Newton phase (∥∇f(x)∥₂ ≥ η)

▶ most iterations require backtracking steps
▶ function value decreases by at least γ
▶ if p★ > −∞, this phase ends after at most (f(x^(0)) − p★)/γ iterations

quadratically convergent phase (∥∇f(x)∥₂ < η)

▶ all iterations use step size t = 1
▶ ∥∇f(x)∥₂ converges to zero quadratically: if ∥∇f(x^(k))∥₂ < η, then

(L/(2m²))∥∇f(x^(l))∥₂ ≤ ((L/(2m²))∥∇f(x^(k))∥₂)^(2^(l−k)) ≤ (1/2)^(2^(l−k)),  l ≥ k

Classical convergence analysis

conclusion: the number of iterations until f(x) − p★ ≤ ε is bounded above by

(f(x^(0)) − p★)/γ + log₂ log₂(ε₀/ε)

▶ γ, ε₀ are constants that depend on m, L, x^(0)
▶ second term is small (of the order of 6) and almost constant for practical purposes
▶ in practice, constants m, L (hence γ, ε₀) are usually unknown
▶ provides qualitative insight into convergence properties (i.e., explains the two algorithm phases)

Example in R²
(same problem as slide 9.13)


▶ backtracking parameters α = 0.1, β = 0.7
▶ converges in only 5 steps
▶ quadratic local convergence

Example in R^100
(same problem as slide 9.14)

▶ backtracking parameters α = 0.01, β = 0.5
▶ backtracking line search almost as fast as exact l.s. (and much simpler)
▶ clearly shows the two phases of the algorithm

Example in R^10000
(with sparse aᵢ)

▶ backtracking parameters α = 0.01, β = 0.5
▶ performance similar to that for the small examples
Self-concordant functions

Self-concordance
shortcomings of classical convergence analysis
▶ depends on unknown constants (m, L, . . . )
▶ bound is not affinely invariant, although Newton’s method is
convergence analysis via self-concordance (Nesterov and Nemirovski)
▶ does not depend on any unknown constants
▶ gives affine-invariant bound
▶ applies to special class of convex self-concordant functions
▶ developed to analyze polynomial-time interior-point methods for convex
optimization

Convergence analysis for self-concordant functions


definition
▶ convex f : R → R is self-concordant if |f′′′(x)| ≤ 2f″(x)^(3/2) for all x ∈ dom f
▶ f : Rⁿ → R is self-concordant if g(t) = f(x + tv) is self-concordant for all x ∈ dom f, v ∈ Rⁿ

examples on R
▶ linear and quadratic functions
▶ negative logarithm f(x) = −log x (checked below)
▶ negative entropy plus negative logarithm: f(x) = x log x − log x
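As a quick check of the negative logarithm example (my arithmetic): for f(x) = −log x, f″(x) = 1/x² and f′′′(x) = −2/x³, so |f′′′(x)| = 2/x³ = 2(1/x²)^(3/2) = 2f″(x)^(3/2), i.e., the defining inequality holds with equality.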

affine invariance: if f : R → R is s.c., then f̃(y) = f(ay + b) is s.c.:

f̃′′′(y) = a³f′′′(ay + b),  f̃″(y) = a²f″(ay + b)

Self-concordant calculus

properties
▶ preserved under positive scaling α ≥ 1, and sum
▶ preserved under composition with affine function
▶ if g is convex with dom g = R₊₊ and |g′′′(x)| ≤ 3g″(x)/x, then

f(x) = −log(−g(x)) − log x

is self-concordant (on {x | x > 0, g(x) < 0})

examples: these properties can be used to show that functions such as
f(x) = −Σᵢ₌₁ᵐ log(bᵢ − aᵢᵀx) on {x | aᵢᵀx < bᵢ, i = 1, . . . , m} are s.c.

Convergence analysis for self-concordant functions

summary: there exist constants η ∈ (0, 1/4], γ > 0 such that

▶ if λ(x^(k)) > η, then f(x^(k+1)) − f(x^(k)) ≤ −γ
▶ if λ(x^(k)) ≤ η, then 2λ(x^(k+1)) ≤ (2λ(x^(k)))²
(η and γ only depend on the backtracking parameters α, β)

complexity bound: the number of Newton iterations is bounded above by

(f(x^(0)) − p★)/γ + log₂ log₂(1/ε),  with γ = αβ(1 − 2α)²/(20 − 8α)

for α = 0.1, β = 0.8, ε = 10⁻¹⁰, the bound evaluates to 375(f(x^(0)) − p★) + 6
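As a check of that evaluation (my arithmetic): 20 − 8(0.1) = 19.2 and αβ(1 − 2α)² = 0.1 · 0.8 · 0.64 = 0.0512, so the leading constant is 19.2/0.0512 = 375; and log₂ log₂ 10¹⁰ ≈ log₂ 33.2 ≈ 5.1, which is rounded up to 6.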

Numerical example
▶ 150 randomly generated instances of f(x) = −Σᵢ₌₁ᵐ log(bᵢ − aᵢᵀx), x ∈ Rⁿ
▶ ○: m = 100, n = 50; □: m = 1000, n = 500; ◇: m = 1000, n = 50

▶ number of iterations much smaller than 375(f(x^(0)) − p★) + 6
▶ a bound of the form c(f(x^(0)) − p★) + 6, with smaller c, is (empirically) valid

Implementation

main effort in each iteration: evaluate derivatives and solve the Newton system

HΔx = −g

where H = ∇²f(x), g = ∇f(x)

via Cholesky factorization:

H = LLᵀ,  Δx_nt = −L⁻ᵀL⁻¹g,  λ(x) = ∥L⁻¹g∥₂

▶ cost: (1/3)n³ flops for an unstructured system
▶ cost ≪ (1/3)n³ if H is sparse, banded, or has other structure
(a code sketch follows below)
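A sketch of these formulas in Python (the random stand-ins for H = ∇²f(x) and g = ∇f(x) are mine, just to make the snippet runnable):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
H = M @ M.T + 5 * np.eye(5)                     # positive definite stand-in Hessian
g = rng.standard_normal(5)                      # stand-in gradient

L = cholesky(H, lower=True)                     # H = L L^T
w = solve_triangular(L, g, lower=True)          # w = L^{-1} g
dx_nt = -solve_triangular(L.T, w, lower=False)  # Newton step -L^{-T} L^{-1} g
lam = np.linalg.norm(w)                         # Newton decrement ||L^{-1} g||_2
```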

Example

▶ f(x) = Σᵢ₌₁ⁿ ψᵢ(xᵢ) + ψ₀(Ax + b), with A ∈ R^(p×n) dense, p ≪ n
▶ Hessian has diagonal-plus-low-rank structure: H = D + AᵀH₀A
▶ D diagonal with diagonal elements ψᵢ″(xᵢ); H₀ = ∇²ψ₀(Ax + b)

method 1: form H, solve via dense Cholesky factorization (cost: (1/3)n³ flops)

method 2 (block elimination; sketched in code below): factor H₀ = L₀L₀ᵀ; write the Newton system as

DΔx + AᵀL₀w = −g,  L₀ᵀAΔx − w = 0

eliminate Δx from the first equation; compute w and Δx from

(I + L₀ᵀAD⁻¹AᵀL₀)w = −L₀ᵀAD⁻¹g,  DΔx = −g − AᵀL₀w

cost: 2p²n flops (dominated by the computation of L₀ᵀAD⁻¹AᵀL₀)
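A runnable sketch of method 2 (my construction with random stand-in data; the diagonal matrix D is stored as a 1-D array of its diagonal entries):

```python
import numpy as np
from scipy.linalg import cholesky

rng = np.random.default_rng(0)
p, n = 5, 200                                 # p << n, as on the slide
A = rng.standard_normal((p, n))
D = 1.0 + rng.random(n)                       # positive diagonal entries psi_i''(x_i)
M = rng.standard_normal((p, p))
H0 = M @ M.T + np.eye(p)                      # positive definite H0
g = rng.standard_normal(n)

L0 = cholesky(H0, lower=True)                 # factor H0 = L0 L0^T
S = np.eye(p) + L0.T @ ((A / D) @ A.T) @ L0   # I + L0^T A D^{-1} A^T L0
w = np.linalg.solve(S, -L0.T @ (A @ (g / D)))  # small p x p solve
dx = (-g - A.T @ (L0 @ w)) / D                # recover dx from D dx = -g - A^T L0 w

# sanity check against the direct solve of (D + A^T H0 A) dx = -g
assert np.allclose((np.diag(D) + A.T @ H0 @ A) @ dx, -g)
```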

Terminology and assumptions (supplement to the first page)


Unconstrained minimization
▶ unconstrained minimization problem:
minimize f(x)
▶ we assume
– f convex, twice continuously differentiable (hence dom f open)
– optimal value p★ = inf_x f(x) is attained at x★ (not necessarily unique)
▶ optimality condition: ∇f(x★) = 0
▶ minimizing f is the same as solving ∇f(x) = 0, a set of n equations in n unknowns

Quadratic functions
▶ convex quadratic: f(x) = (1/2)xᵀPx + qᵀx + r, with P ⪰ 0
▶ we can solve exactly via the linear equations (see the snippet below)

∇f(x) = Px + q = 0

▶ much more on this special case later
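For instance, in Python (the numerical values here are mine, purely illustrative):

```python
import numpy as np

P = np.array([[2.0, 0.5],
              [0.5, 1.0]])           # P positive definite
q = np.array([1.0, -1.0])
x_star = np.linalg.solve(P, -q)      # solve grad f(x) = P x + q = 0
```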

Iterative methods
▶ for most non-quadratic functions, we use iterative methods
▶ these produce a sequence of points x^(k) ∈ dom f, k = 0, 1, . . .
▶ x^(0) is the initial point or starting point
▶ x^(k) is the kth iterate
▶ we hope that the method converges, i.e.,

f(x^(k)) → p★,  ∇f(x^(k)) → 0

Initial point and sublevel set

▶ algorithms in this chapter require a starting point x^(0) such that

– x^(0) ∈ dom f
– the sublevel set S = {x | f(x) ≤ f(x^(0))} is closed

▶ the 2nd condition is hard to verify, except when all sublevel sets are closed
– equivalent to the condition that epi f is closed
– true if dom f = Rⁿ
– true if f(x) → ∞ as x → bd dom f

▶ examples of differentiable functions with closed sublevel sets:
f(x) = log(Σᵢ₌₁ᵐ e^(aᵢᵀx + bᵢ)),  f(x) = −Σᵢ₌₁ᵐ log(bᵢ − aᵢᵀx)

Strong convexity and implications

▶ f is strongly convex on S if there exists an m > 0 such that

∇²f(x) ⪰ mI for all x ∈ S

▶ same as saying that f(x) − (m/2)∥x∥₂² is convex
▶ if f is strongly convex, then for x, y ∈ S,

f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)∥y − x∥₂²

▶ hence, S is bounded
▶ we conclude p★ > −∞, and for x ∈ S,

f(x) − p★ ≤ (1/(2m))∥∇f(x)∥₂²

▶ useful as a stopping criterion (if you know m, which you usually do not)
