Convex Optimization
(EE227A: UC Berkeley)
Lecture 3
(Convex sets and functions)
29 Jan, 2013
Suvrit Sra
Course organization
[Link]
Relevant texts / references:
♥ Convex optimization – Boyd & Vandenberghe (BV)
♥ Introductory lectures on convex optimisation – Nesterov
♥ Nonlinear programming – Bertsekas
♥ Convex Analysis – Rockafellar
♥ Numerical optimization – Nocedal & Wright
♥ Lectures on modern convex optimization – Nemirovski
♥ Optimization for Machine Learning – Sra, Nowozin, Wright
Instructor: Suvrit Sra (suvrit@[Link])
(Max Planck Institute for Intelligent Systems, Tübingen, Germany)
HW + Quizzes (40%); Midterm (30%); Project (30%)
TA Office hours to be posted soon
I don’t have an office yet
If you email me, please put EE227A in Subject:
Linear algebra recap
Eigenvalues and Eigenvectors
Def. Let $A \in \mathbb{C}^{n\times n}$ and $x \in \mathbb{C}^n$. Consider the equation
$Ax = \lambda x, \quad x \neq 0, \quad \lambda \in \mathbb{C}.$
If a scalar λ and a vector x satisfy this equation, then λ is called an
eigenvalue and x an eigenvector of A.
The above equation may be rewritten equivalently as
$(\lambda I - A)x = 0, \quad x \neq 0.$
Thus, λ is an eigenvalue if and only if
$\det(\lambda I - A) = 0.$
Def. $p_A(t) := \det(tI - A)$ is called the characteristic polynomial of A.
The eigenvalues of A are the roots of its characteristic polynomial.
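A small numerical sanity check of the statement that eigenvalues are the roots of the characteristic polynomial. This is only a sketch: the 3×3 matrix is a made-up example, and NumPy's `np.poly` is used to obtain the coefficients of det(tI − A).

```python
import numpy as np

# Hypothetical symmetric example matrix (not from the lecture).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# For a square matrix, np.poly returns the coefficients of det(tI - A),
# highest power first.
char_poly = np.poly(A)

# Roots of the characteristic polynomial, sorted.
roots = np.sort(np.roots(char_poly).real)

# Eigenvalues computed directly, sorted (real because A is symmetric).
eigvals = np.sort(np.linalg.eigvals(A).real)

# Every eigenpair should satisfy A x = lambda x (column j of X pairs with lams[j]).
lams, X = np.linalg.eig(A)
residual = np.linalg.norm(A @ X - X * lams)

print(roots, eigvals, residual)
```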
Eigenvalues and Eigenvectors
Theorem Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $A \in \mathbb{C}^{n\times n}$. Then,
$\operatorname{Tr}(A) = \sum_i a_{ii} = \sum_i \lambda_i, \qquad \det(A) = \prod_i \lambda_i.$
Def. A matrix $U \in \mathbb{C}^{n\times n}$ is unitary if $U^* U = I$, where $[U^*]_{ij} = \bar{u}_{ji}$.
Theorem (Schur factorization). If $A \in \mathbb{C}^{n\times n}$ has eigenvalues
$\lambda_1, \ldots, \lambda_n$, then there is a unitary matrix $U \in \mathbb{C}^{n\times n}$ (i.e., $U^* U = I$),
such that
$U^* A U = T = [t_{ij}]$
is upper triangular with diagonal entries $t_{ii} = \lambda_i$.
Corollary. If $A^* A = AA^*$, then there exists a unitary U such that
$A = U \Lambda U^*$. We will call this the Eigenvector Decomposition.
Proof. By the Schur factorization, $A = UTU^*$ and $A^* = UT^*U^*$, so $AA^* = UTT^*U^*$ and
$A^*A = UT^*TU^*$. Since $AA^* = A^*A$, we get $TT^* = T^*T$. But T is upper triangular, and an easy
but tedious induction shows that $TT^* = T^*T$ forces T to be diagonal. Hence, $T = \Lambda$.
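The trace/determinant identities and the eigenvector decomposition can be checked numerically on a normal matrix. The sketch below assumes a random Hermitian matrix (one convenient special case of $A^*A = AA^*$) and uses `np.linalg.eigh`; the seed and size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = M + M.conj().T            # Hermitian, hence normal: A* A = A A*

# eigh returns real eigenvalues and a unitary matrix of eigenvectors.
lam, U = np.linalg.eigh(A)

unitary_err = np.linalg.norm(U.conj().T @ U - np.eye(4))      # U* U = I
recon_err = np.linalg.norm(U @ np.diag(lam) @ U.conj().T - A)  # A = U Lambda U*
trace_err = abs(np.trace(A) - lam.sum())                       # Tr(A) = sum of eigenvalues
det_err = abs(np.linalg.det(A) - lam.prod())                   # det(A) = product of eigenvalues

print(unitary_err, recon_err, trace_err, det_err)
```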
Singular value decomposition
Theorem (SVD) Let $A \in \mathbb{C}^{m\times n}$. There are unitaries U and V s.t.
$U^* A V = \operatorname{Diag}(\sigma_1, \ldots, \sigma_p), \quad p = \min(m, n),$
where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$. Usually written as
$A = U \Sigma V^*.$
left singular vectors (columns of U) are eigenvectors of $AA^*$
right singular vectors (columns of V) are eigenvectors of $A^*A$
nonzero singular values $\sigma_i = \sqrt{\lambda_i(AA^*)} = \sqrt{\lambda_i(A^*A)}$
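The relation between singular values and the eigenvalues of $A^*A$ can be illustrated with a short NumPy sketch. The random real 5×3 matrix is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

# Thin SVD: A = U diag(s) Vh, with s sorted in decreasing order.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
recon_err = np.linalg.norm(U @ np.diag(s) @ Vh - A)

# Eigenvalues of A^T A, sorted in decreasing order (clipped against round-off).
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
sv_err = np.linalg.norm(s - np.sqrt(np.clip(lam, 0.0, None)))

# Columns of V (rows of Vh) are eigenvectors of A^T A: (A^T A) v_i = sigma_i^2 v_i.
V = Vh.T
eigvec_err = np.linalg.norm(A.T @ A @ V - V * s**2)

print(recon_err, sv_err, eigvec_err)
```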
Positive definite matrices
Def. Let $A \in \mathbb{R}^{n\times n}$ be symmetric, i.e., $a_{ij} = a_{ji}$. Then, A is called
positive definite if
$x^T A x = \sum_{ij} x_i a_{ij} x_j > 0, \quad \forall\, x \neq 0.$
If > is replaced by ≥, we call A positive semidefinite.
Theorem A symmetric real matrix is positive semidefinite (positive
definite) iff all its eigenvalues are nonnegative (positive).
Theorem Every semidefinite matrix can be written as $B^T B$.
Exercise: Prove this claim. Also prove the converse.
Notation: $A \succ 0$ (posdef) or $A \succeq 0$ (semidef)
Amongst the most important objects in convex optimization!
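A numerical illustration of the eigenvalue test and of the factorization $A = B^T B$. This is a sketch, not a proof: the helper names `is_psd` and `psd_factor` are hypothetical, and the factor is built from the eigendecomposition, which is one of several possible constructions.

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Eigenvalue test for a symmetric real matrix (tolerance absorbs round-off)."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

def psd_factor(A):
    """Return B with B^T B = A, using A = Q diag(lam) Q^T and B = diag(sqrt(lam)) Q^T."""
    lam, Q = np.linalg.eigh(A)
    lam = np.clip(lam, 0.0, None)   # remove tiny negative round-off
    return np.diag(np.sqrt(lam)) @ Q.T

rng = np.random.default_rng(2)
G = rng.standard_normal((3, 4))
A = G.T @ G                          # 4x4, positive semidefinite, rank 3

B = psd_factor(A)
factor_err = np.linalg.norm(B.T @ B - A)

indefinite = np.array([[1.0, 2.0],
                       [2.0, 1.0]])  # eigenvalues 3 and -1

print(is_psd(A), factor_err, is_psd(indefinite))
```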
Matrix and vector calculus
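f(x)                                                        ∇f(x)
$x^T a = \sum_i x_i a_i$                                    $a$
$x^T A x = \sum_{ij} x_i a_{ij} x_j$                        $(A + A^T)x$
$\log\det(X)$                                               $X^{-T}$ (= $X^{-1}$ for symmetric X)
$\operatorname{Tr}(XA) = \sum_{ij} x_{ij} a_{ji}$           $A^T$
$\operatorname{Tr}(X^T A) = \sum_{ij} x_{ij} a_{ij}$        $A$
$\operatorname{Tr}(X^T A X)$                                $(A + A^T)X$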
Easily derived using “brute-force” rules
♣ Wikipedia
♣ My ancient notes
♣ Matrix cookbook
♣ I hope to put up notes on less brute-forced approach.
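Gradient formulas of this kind can be checked against central finite differences. The sketch below assumes random test matrices and a hypothetical helper `numerical_grad`. For log det it compares against $X^{-T}$, which coincides with $X^{-1}$ when X is symmetric.

```python
import numpy as np

def numerical_grad(f, X, h=1e-6):
    """Central finite differences, one entry of X at a time."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
X = rng.standard_normal((3, 3))
# A nonsymmetric, well-conditioned matrix for the log det check.
Q = X @ X.T + 3.0 * np.eye(3) + 0.5 * np.triu(X, 1)

err_quad = np.linalg.norm(numerical_grad(lambda v: v @ A @ v, x) - (A + A.T) @ x)
err_trXA = np.linalg.norm(numerical_grad(lambda Y: np.trace(Y @ A), X) - A.T)
err_trXtA = np.linalg.norm(numerical_grad(lambda Y: np.trace(Y.T @ A), X) - A)
err_trXtAX = np.linalg.norm(
    numerical_grad(lambda Y: np.trace(Y.T @ A @ Y), X) - (A + A.T) @ X)
# slogdet returns (sign, log|det|); the gradient of log|det(Y)| is inv(Y)^T.
err_logdet = np.linalg.norm(
    numerical_grad(lambda Y: np.linalg.slogdet(Y)[1], Q) - np.linalg.inv(Q).T)

print(err_quad, err_trXA, err_trXtA, err_trXtAX, err_logdet)
```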
Convex Sets
Convex sets
Def. A set $C \subset \mathbb{R}^n$ is called convex if, for any x, y ∈ C, the
line segment θx + (1 − θ)y (here 0 ≤ θ ≤ 1) also lies in C.
Combinations
• Convex: θ1 x + θ2 y ∈ C, where θ1, θ2 ≥ 0 and θ1 + θ2 = 1.
• Linear: if the restrictions on θ1, θ2 are dropped
• Conic: if the restriction θ1 + θ2 = 1 is dropped
Convex sets
Theorem (Intersection).
Let C1, C2 be convex sets. Then, C1 ∩ C2 is also convex.
Proof. If C1 ∩ C2 = ∅, then the claim holds vacuously.
Otherwise, let x, y ∈ C1 ∩ C2 and θ ∈ [0, 1]. Then, x, y ∈ C1 and x, y ∈ C2.
But C1, C2 are convex, hence θx + (1 − θ)y ∈ C1, and also in C2.
Thus, θx + (1 − θ)y ∈ C1 ∩ C2.
It follows by induction that $\cap_{i=1}^m C_i$ is also convex.
Convex sets – more examples
(psdcone image from [Link], Dattorro)
♥ Let $x_1, x_2, \ldots, x_m \in \mathbb{R}^n$. Their convex hull is
$\operatorname{co}(x_1, \ldots, x_m) := \bigl\{ \sum_i \theta_i x_i \mid \theta_i \ge 0,\ \sum_i \theta_i = 1 \bigr\}.$
♥ Let $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$. The set $\{x \mid Ax = b\}$ is convex (it
is an affine space: a translate of the subspace of solutions of $Ax = 0$).
♥ halfspace $\{x \mid a^T x \le b\}$.
♥ polyhedron $\{x \mid Ax \le b,\ Cx = d\}$.
♥ ellipsoid $\{x \mid (x - x_0)^T A (x - x_0) \le 1\}$, (A: semidefinite)
♥ probability simplex $\{x \mid x \ge 0,\ \sum_i x_i = 1\}$
Quiz: Prove that these sets are convex.
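Before attempting the proofs, the claim can be probed numerically: draw members of a set, form random convex combinations, and test membership. The sketch below does this for the probability simplex and for an ellipsoid. The helper names, sample sizes, and the particular ellipsoid are assumptions, and passing such a sampling test is evidence, not a proof.

```python
import numpy as np

rng = np.random.default_rng(4)

def in_simplex(x, tol=1e-9):
    """Membership test for {x | x >= 0, sum_i x_i = 1}."""
    return bool(np.all(x >= -tol) and abs(x.sum() - 1.0) <= tol)

def in_ellipsoid(x, A, x0, tol=1e-9):
    """Membership test for {x | (x - x0)^T A (x - x0) <= 1}."""
    d = x - x0
    return bool(d @ A @ d <= 1.0 + tol)

# Sample members of each set (hypothetical data, not from the lecture).
simplex_pts = rng.dirichlet(np.ones(5), size=200)
A, x0 = np.diag([1.0, 4.0, 9.0]), np.zeros(3)
cand = rng.uniform(-1.0, 1.0, size=(2000, 3))
ellipsoid_pts = np.array([c for c in cand if in_ellipsoid(c, A, x0)])

def combos_stay_inside(points, member, trials=1000):
    """Check that random convex combinations of members are again members."""
    for _ in range(trials):
        i, j = rng.integers(0, len(points), size=2)
        theta = rng.uniform()
        if not member(theta * points[i] + (1 - theta) * points[j]):
            return False
    return True

simplex_ok = combos_stay_inside(simplex_pts, in_simplex)
ellipsoid_ok = combos_stay_inside(ellipsoid_pts, lambda x: in_ellipsoid(x, A, x0))
print(simplex_ok, ellipsoid_ok)
```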
Convex functions
Convex functions
Def. A function f : I → R on an interval I is called midpoint convex if
$f\bigl(\tfrac{x+y}{2}\bigr) \le \tfrac{f(x) + f(y)}{2}$, whenever x, y ∈ I.
Read: f of AM is less than or equal to AM of f.
Def. A function $f : \mathbb{R}^n \to \mathbb{R}$ is called convex if its domain dom(f)
is a convex set and for any x, y ∈ dom(f) and θ ∈ [0, 1]
f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y).
Theorem (J.L.W.V. Jensen). Let f : I → R be continuous. Then,
f is convex if and only if it is midpoint convex.
• The theorem extends to functions $f : X \subseteq \mathbb{R}^n \to \mathbb{R}$. Very useful for
checking convexity of a given function.
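The defining inequality lends itself to a randomized test: sample x, y, θ and count violations. The sketch below uses a hypothetical helper `convexity_violations` with log-sum-exp as a convex test function and −‖x‖² as a non-convex one; both choices are assumptions for illustration. Zero violations suggests convexity but does not prove it, while a single violation disproves it.

```python
import numpy as np

def convexity_violations(f, dim, trials=2000, seed=0, tol=1e-9, midpoint=False):
    """Count sampled violations of f(tx + (1-t)y) <= t f(x) + (1-t) f(y).

    With midpoint=True only t = 1/2 is tested (midpoint convexity)."""
    rng = np.random.default_rng(seed)
    bad = 0
    for _ in range(trials):
        x, y = rng.standard_normal(dim), rng.standard_normal(dim)
        t = 0.5 if midpoint else rng.uniform()
        lhs = f(t * x + (1 - t) * y)
        rhs = t * f(x) + (1 - t) * f(y)
        if lhs > rhs + tol:
            bad += 1
    return bad

log_sum_exp = lambda x: np.log(np.sum(np.exp(x)))   # convex
neg_sq = lambda x: -np.sum(x ** 2)                   # concave, not convex

print(convexity_violations(log_sum_exp, 3),
      convexity_violations(log_sum_exp, 3, midpoint=True),
      convexity_violations(neg_sq, 3))
```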
Convex functions
[Figure: graph of a convex f; the chord from (x, f(x)) to (y, f(y)), of height λf(x) + (1 − λ)f(y), lies above the graph.]
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)
Convex functions
[Figure: graph of a convex f; the tangent line f(y) + ⟨∇f(y), x − y⟩ at y lies below the graph.]
f(x) ≥ f(y) + ⟨∇f(y), x − y⟩
Convex functions
[Figure: points P, Q, R on the graph of f above x, z = λx + (1 − λ)y, and y, respectively.]
slope PQ ≤ slope PR ≤ slope QR
Recognizing convex functions
♠ If f is continuous and midpoint convex, then it is convex.
♠ If f is differentiable, then f is convex if and only if dom f is
convex and f(x) ≥ f(y) + ⟨∇f(y), x − y⟩ for all x, y ∈ dom f.
♠ If f is twice differentiable, then f is convex if and only if dom f
is convex and $\nabla^2 f(x) \succeq 0$ at every x ∈ dom f.
Convex functions
Linear: f (θ1 x + θ2 y) = θ1 f (x) + θ2 f (y) ; θ1 , θ2 unrestricted
Concave: f (θx + (1 − θ)y) ≥ θf (x) + (1 − θ)f (y)
Strictly convex: if the inequality is strict whenever x ≠ y and θ ∈ (0, 1)
Convex functions
Example The pointwise maximum of a family of convex functions is
convex. That is, if f(x; y) is a convex function of x for every y in
some “index set” Y, then
$f(x) := \max_{y \in Y} f(x; y)$
is a convex function of x (the set Y is arbitrary).
Example Let $f : \mathbb{R}^m \to \mathbb{R}$ be convex. Let $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$.
Prove that g(x) = f(Ax + b) is convex.
Exercise: Verify truth of above examples.
Convex functions
Theorem Let Y be a nonempty convex set. Suppose L(x, y) is
convex in (x, y). Then,
$f(x) := \inf_{y \in Y} L(x, y)$
is a convex function of x, provided f(x) > −∞.
Proof. Let u, v ∈ dom f and λ ∈ [0, 1]. Since $f(u) = \inf_y L(u, y)$, for each ε > 0 there
is a y1 ∈ Y such that L(u, y1) ≤ f(u) + ε/2 (otherwise f(u) + ε/2 would be a lower bound larger than the infimum).
Similarly, there is y2 ∈ Y such that L(v, y2) ≤ f(v) + ε/2.
Now we prove that f(λu + (1 − λ)v) ≤ λf(u) + (1 − λ)f(v) directly.
$f(\lambda u + (1-\lambda)v) = \inf_{y \in Y} L(\lambda u + (1-\lambda)v,\ y)$
$\le L(\lambda u + (1-\lambda)v,\ \lambda y_1 + (1-\lambda)y_2)$   (Y is convex)
$\le \lambda L(u, y_1) + (1-\lambda) L(v, y_2)$   (L is jointly convex)
$\le \lambda f(u) + (1-\lambda) f(v) + \epsilon.$
Since ε > 0 is arbitrary, the claim follows.
Example: Schur complement
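Let A, B, C be matrices such that $C \succ 0$, and let
$Z := \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succeq 0,$
then the Schur complement $A - BC^{-1}B^T \succeq 0$.
Proof. $L(x, y) = [x, y]^T Z [x, y]$ is convex in (x, y) since $Z \succeq 0$.
By the partial minimization theorem, $f(x) = \inf_y L(x, y) = x^T (A - BC^{-1}B^T) x$ is convex,
and a quadratic form $x^T M x$ is convex only if $M \succeq 0$.
(We skipped ahead and solved $\nabla_y L(x, y) = 0$ to minimize.)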
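The skipped minimization step can be made concrete: with $L(x, y) = x^T A x + 2x^T B y + y^T C y$, setting $\nabla_y L = 2B^T x + 2Cy = 0$ gives $y^\star = -C^{-1}B^T x$. The sketch below checks numerically that $L(x, y^\star) = x^T(A - BC^{-1}B^T)x$ and that the Schur complement has nonnegative eigenvalues. The random positive definite Z and the block sizes are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 2
G = rng.standard_normal((n + m, n + m))
Z = G @ G.T + 0.1 * np.eye(n + m)      # Z > 0, so its lower-right block C > 0 too
A, B, C = Z[:n, :n], Z[:n, n:], Z[n:, n:]

# Schur complement S = A - B C^{-1} B^T (solve instead of forming the inverse).
S = A - B @ np.linalg.solve(C, B.T)
min_eig = np.linalg.eigvalsh(S).min()

# Minimizer over y for a fixed x: y* = -C^{-1} B^T x.
x = rng.standard_normal(n)
y_star = -np.linalg.solve(C, B.T @ x)
w = np.concatenate([x, y_star])
gap = abs(w @ Z @ w - x @ S @ x)        # L(x, y*) versus x^T S x

# Any other y should give a value of L at least as large.
y_other = y_star + rng.standard_normal(m)
w_other = np.concatenate([x, y_other])
print(min_eig, gap, w_other @ Z @ w_other >= w @ Z @ w)
```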
Recognizing convex functions
♠ If f is continuous and midpoint convex, then it is convex.
♠ If f is differentiable, then f is convex if and only if dom f is
convex and f(x) ≥ f(y) + ⟨∇f(y), x − y⟩ for all x, y ∈ dom f.
♠ If f is twice differentiable, then f is convex if and only if dom f
is convex and $\nabla^2 f(x) \succeq 0$ at every x ∈ dom f.
♠ By showing f to be a pointwise max of convex functions
♠ By restricting to lines: f : dom(f) → R is convex if and only if its
restriction to any line that intersects dom(f) is convex. That is,
for any x ∈ dom(f) and any v, the function g(t) = f(x + tv) is
convex (on its domain {t | x + tv ∈ dom(f)}).
♠ See exercises (Ch. 3) in Boyd & Vandenberghe for more ways
Operations preserving convexity
Operations preserving convexity
Pointwise maximum: $f(x) = \sup_{y \in Y} f(x; y)$, where f(x; y) is convex in x for every y ∈ Y.
Conic combination: Let $a_1, \ldots, a_n \ge 0$; let $f_1, \ldots, f_n$ be convex
functions. Then, $f(x) := \sum_i a_i f_i(x)$ is convex.
Remark: The set of all convex functions is a convex cone.
Affine composition: f(x) := g(Ax + b), where g is convex.
Operations preserving convexity
Theorem Let f : I1 → R and g : I2 → R, where range(f) ⊆ I2. If
f and g are convex, and g is increasing, then g ◦ f is convex on I1.
Proof. Let x, y ∈ I1, and let λ ∈ (0, 1). Then
$f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda) f(y)$   (f is convex)
$g\bigl(f(\lambda x + (1-\lambda)y)\bigr) \le g\bigl(\lambda f(x) + (1-\lambda) f(y)\bigr)$   (g is increasing)
$\le \lambda\, g(f(x)) + (1-\lambda)\, g(f(y)).$   (g is convex)
Read Section 3.2.4 of BV for more
Examples
Quadratic
Let $f(x) = x^T A x + b^T x + c$, where $A \succeq 0$, $b \in \mathbb{R}^n$, and $c \in \mathbb{R}$.
What is: $\nabla^2 f(x)$?
$\nabla f(x) = 2Ax + b$, $\nabla^2 f(x) = 2A \succeq 0$, hence f is convex.
Indicator
Let $I_X$ be the indicator function for X defined as:
$I_X(x) := \begin{cases} 0 & \text{if } x \in X, \\ \infty & \text{otherwise.} \end{cases}$
Note: $I_X(x)$ is convex if and only if X is convex.
Distance to a set
Example Let Y be a convex set. Let $x \in \mathbb{R}^n$ be some point. The
distance of x to the set Y is defined as
$\operatorname{dist}(x, Y) := \inf_{y \in Y} \|x - y\|.$
Because $\|x - y\|$ is jointly convex in (x, y), the function dist(x, Y)
is a convex function of x.
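As a concrete instance, take Y to be a box, for which the infimum is attained by clipping each coordinate, so the distance has a closed form. The sketch below (box bounds, Euclidean norm, and sample counts are assumptions) evaluates the distance and tests the convexity inequality on random pairs.

```python
import numpy as np

def dist_to_box(x, lo, hi):
    """Euclidean distance from x to the box {y | lo <= y <= hi}.

    The nearest point of a box is obtained by clipping each coordinate."""
    return float(np.linalg.norm(x - np.clip(x, lo, hi)))

lo, hi = -np.ones(2), np.ones(2)
rng = np.random.default_rng(7)

violations = 0
for _ in range(2000):
    x, y = 3 * rng.standard_normal(2), 3 * rng.standard_normal(2)
    t = rng.uniform()
    lhs = dist_to_box(t * x + (1 - t) * y, lo, hi)
    rhs = t * dist_to_box(x, lo, hi) + (1 - t) * dist_to_box(y, lo, hi)
    if lhs > rhs + 1e-12:
        violations += 1

print(dist_to_box(np.array([2.0, 0.0]), lo, hi), violations)
```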
Norms
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function that satisfies
1. f(x) ≥ 0, and f(x) = 0 if and only if x = 0 (definiteness)
2. f(λx) = |λ| f(x) for any λ ∈ R (positive homogeneity)
3. f(x + y) ≤ f(x) + f(y) (subadditivity)
Such a function is called a norm. We usually denote norms by $\|x\|$.
Theorem Norms are convex.
Proof. Immediate from subadditivity and positive homogeneity: for θ ∈ [0, 1],
$\|\theta x + (1-\theta)y\| \le \|\theta x\| + \|(1-\theta)y\| = \theta\|x\| + (1-\theta)\|y\|.$
Vector norms
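Example ($\ell_2$-norm): Let $x \in \mathbb{R}^n$. The Euclidean or $\ell_2$-norm is
$\|x\|_2 = \bigl(\sum_i x_i^2\bigr)^{1/2}$
Example ($\ell_p$-norm): Let p ≥ 1. $\|x\|_p = \bigl(\sum_i |x_i|^p\bigr)^{1/p}$
Exercise: Verify that $\|x\|_p$ is indeed a norm.
Example ($\ell_\infty$-norm): $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$
Example (Frobenius norm): Let $A \in \mathbb{R}^{m\times n}$. The Frobenius norm
of A is $\|A\|_F := \sqrt{\sum_{ij} |a_{ij}|^2}$; that is, $\|A\|_F = \sqrt{\operatorname{Tr}(A^* A)}$.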
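These definitions translate directly into NumPy. The sketch below computes each norm from its formula on small made-up inputs and compares against `np.linalg.norm`; the helper `lp` is a hypothetical name.

```python
import numpy as np

def lp(x, p):
    """l_p norm from the definition, for p >= 1."""
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

x = np.array([3.0, -4.0, 12.0])
l1 = lp(x, 1)                          # |3| + |-4| + |12| = 19
l2 = float(np.sqrt(np.sum(x ** 2)))    # sqrt(9 + 16 + 144) = 13
linf = float(np.max(np.abs(x)))        # 12

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
fro_sum = float(np.sqrt(np.sum(np.abs(A) ** 2)))   # sqrt of sum of squares
fro_tr = float(np.sqrt(np.trace(A.T @ A)))         # sqrt(Tr(A^T A))

print(l1, l2, linf, lp(x, 3), fro_sum, fro_tr)
```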
Mixed norms
Def. Let $x \in \mathbb{R}^{n_1 + n_2 + \cdots + n_G}$ be a vector partitioned into subvectors
$x_j \in \mathbb{R}^{n_j}$, 1 ≤ j ≤ G. Let $p := (p_0, p_1, p_2, \ldots, p_G)$, where $p_j \ge 1$.
Consider the vector $\xi := (\|x_1\|_{p_1}, \ldots, \|x_G\|_{p_G})$. Then, we define
the mixed-norm of x as
$\|x\|_p := \|\xi\|_{p_0}.$
Example $\ell_{1,q}$-norm: Let x be as above.
$\|x\|_{1,q} := \sum_{i=1}^{G} \|x_i\|_q.$
This norm is popular in machine learning and statistics.
Matrix Norms
Induced norm
Let $A \in \mathbb{R}^{m\times n}$, and let $\|\cdot\|$ be any vector norm. We define an
induced matrix norm as
$\|A\| := \sup_{\|x\| \neq 0} \frac{\|Ax\|}{\|x\|}.$
Verify that the above definition yields a norm.
• Clearly, $\|A\| = 0$ iff A = 0 (definiteness)
• $\|\alpha A\| = |\alpha|\,\|A\|$ (homogeneity)
• $\|A + B\| = \sup \frac{\|(A+B)x\|}{\|x\|} \le \sup \frac{\|Ax\| + \|Bx\|}{\|x\|} \le \|A\| + \|B\|.$
Operator norm
Example Let A be any matrix. Then, the operator norm of A is
$\|A\|_2 := \sup_{\|x\|_2 \neq 0} \frac{\|Ax\|_2}{\|x\|_2}.$
$\|A\|_2 = \sigma_{\max}(A)$, where $\sigma_{\max}$ is the largest singular value of A.
• Warning! Generally, the largest eigenvalue of a matrix is not a norm!
• $\|A\|_1$ and $\|A\|_\infty$: max-abs-column and max-abs-row sums.
• $\|A\|_p$ is generally NP-hard to compute for $p \notin \{1, 2, \infty\}$
• Schatten p-norm: $\ell_p$-norm of the vector of singular values.
• Exercise: Let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$ be the singular values of a
matrix $A \in \mathbb{R}^{m\times n}$. Prove that
$\|A\|_{(k)} := \sum_{i=1}^{k} \sigma_i(A)$
is a norm; 1 ≤ k ≤ n.
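A few of these facts can be checked numerically. The sketch below, on a random 4×3 matrix chosen for illustration, compares the operator norm with the largest singular value, the 1- and ∞-norms with column and row sums, and evaluates the sum of the k largest singular values via a hypothetical helper `ky_fan`. The nilpotent matrix N illustrates the warning: it is nonzero, yet all its eigenvalues are zero.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))

sigma = np.linalg.svd(A, compute_uv=False)   # singular values, decreasing
op2 = sigma[0]                                # sigma_max(A)

# The ratio ||Ax||_2 / ||x||_2 never exceeds sigma_max on sampled x.
X = rng.standard_normal((3, 1000))
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)

one_norm = np.abs(A).sum(axis=0).max()        # max absolute column sum
inf_norm = np.abs(A).sum(axis=1).max()        # max absolute row sum

def ky_fan(M, k):
    """Sum of the k largest singular values of M."""
    return float(np.linalg.svd(M, compute_uv=False)[:k].sum())

# Nonzero matrix whose eigenvalues are all zero.
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])
spectral_radius = np.abs(np.linalg.eigvals(N)).max()

print(op2, ratios.max(), one_norm, inf_norm, ky_fan(A, 2), spectral_radius)
```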
Dual norms
Def. Let $\|\cdot\|$ be a norm on $\mathbb{R}^n$. Its dual norm is
$\|u\|_* := \sup\{ u^T x \mid \|x\| \le 1 \}.$
Exercise: Verify that $\|u\|_*$ is a norm.
Exercise: Let 1/p + 1/q = 1, where p, q ≥ 1. Show that $\|\cdot\|_q$ is
dual to $\|\cdot\|_p$. In particular, the $\ell_2$-norm is self-dual.
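The duality of $\ell_p$ and $\ell_q$ can be probed numerically for a particular p. The sketch below assumes p = 3 and uses the equality case of Hölder's inequality, $x^\star_i = \operatorname{sign}(u_i)|u_i|^{q-1} / \|u\|_q^{q-1}$, as a candidate maximizer. It checks that $x^\star$ is feasible, that it attains $\|u\|_q$, and that random feasible points do not exceed that value. This is a numerical illustration, not a solution to the exercise.

```python
import numpy as np

def lp_norm(x, p):
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

p = 3.0
q = p / (p - 1.0)          # conjugate exponent: 1/p + 1/q = 1

rng = np.random.default_rng(8)
u = rng.standard_normal(5)

# Candidate maximizer of u^T x over the unit l_p ball.
x_star = np.sign(u) * np.abs(u) ** (q - 1.0) / lp_norm(u, q) ** (q - 1.0)

# Random feasible points, normalized onto the unit l_p sphere.
X = rng.standard_normal((5000, 5))
X /= np.array([lp_norm(row, p) for row in X])[:, None]
best_random = float(np.max(X @ u))

print(lp_norm(x_star, p), u @ x_star, lp_norm(u, q), best_random)
```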
Fenchel Conjugate
Fenchel conjugate
Def. The Fenchel conjugate of a function f is
$f^*(z) := \sup_{x \in \operatorname{dom} f} \; x^T z - f(x).$
Note: $f^*$ is a pointwise (over x) sup of affine functions of z. Hence,
it is always convex (regardless of f).
Example +∞ and −∞ are conjugate to each other.
Example Let $f(x) = \|x\|$. We have $f^*(z) = I_{\|\cdot\|_* \le 1}(z)$. That is, the
conjugate of a norm is the indicator function of the dual norm ball.
$f^*(z) = \sup_x z^T x - \|x\|$. If $\|z\|_* > 1$, then by definition of the dual
norm, there is u s.t. $\|u\| \le 1$ and $u^T z > 1$. Now select x = αu and let
α → ∞. Then, $z^T x - \|x\| = \alpha(z^T u - \|u\|) \to \infty$. If $\|z\|_* \le 1$, then
$z^T x \le \|x\|\,\|z\|_* \le \|x\|$, so $z^T x - \|x\| \le 0$ with equality at x = 0; hence the sup must be zero.
Fenchel conjugate
Example f(x) = ax + b; then,
$f^*(z) = \sup_x \; zx - (ax + b) = \infty$, if $(z - a) \neq 0$.
Thus, dom f* = {a}, and f*(a) = −b.
Example Let a ≥ 0, and set $f(x) = -\sqrt{a^2 - x^2}$ if |x| ≤ a, and +∞
otherwise. Then, $f^*(z) = a\sqrt{1 + z^2}$.
Example $f(x) = \tfrac12 x^T A x$, where $A \succ 0$. Then, $f^*(z) = \tfrac12 z^T A^{-1} z$.
Example f(x) = max(0, 1 − x). Now $f^*(z) = \sup_x \; zx - \max(0, 1 - x)$.
Note that dom f* is [−1, 0] (else sup is unbounded); within this
domain, f*(z) = z.
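For scalar functions the conjugate can be approximated by maximizing zx − f(x) over a fine grid, which gives a quick check of the closed forms above. The grid ranges, the value a = 2, and the helper `conjugate_on_grid` are assumptions for this sketch. Because the grid is bounded, the approximation is only meaningful for z in dom f*; outside it the true sup is +∞ while the grid maximum stays finite.

```python
import numpy as np

def conjugate_on_grid(f, xs, z):
    """Approximate f*(z) = sup_x (z*x - f(x)) by a maximum over the grid xs."""
    return float(np.max(z * xs - f(xs)))

xs = np.linspace(-10.0, 10.0, 20001)           # hypothetical grid, step 1e-3

# Hinge-type function f(x) = max(0, 1 - x): f*(z) = z on [-1, 0].
hinge = lambda x: np.maximum(0.0, 1.0 - x)
hinge_conj = [conjugate_on_grid(hinge, xs, z) for z in (-1.0, -0.5, 0.0)]

# Scalar case of (1/2) x^T A x with A = a > 0: f*(z) = z^2 / (2a).
a = 2.0
quad = lambda x: 0.5 * a * x ** 2
quad_conj = conjugate_on_grid(quad, xs, 3.0)

# f(x) = -sqrt(a^2 - x^2) on [-a, a]: f*(z) = a * sqrt(1 + z^2).
xa = np.linspace(-a, a, 40001)                 # grid restricted to dom f
arc = lambda x: -np.sqrt(np.maximum(a ** 2 - x ** 2, 0.0))
arc_conj = conjugate_on_grid(arc, xa, 1.0)

print(hinge_conj, quad_conj, arc_conj)
```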
Misc Convexity
Other forms of convexity
♣ Log-convex: log f is convex; log-cvx ⟹ cvx;
♣ Log-concavity: log f concave; not closed under addition!
♣ Exponentially convex: $[f(x_i + x_j)] \succeq 0$, for $x_1, \ldots, x_n$
♣ Operator convex: $f(\lambda X + (1-\lambda)Y) \preceq \lambda f(X) + (1-\lambda) f(Y)$
♣ Quasiconvex: $f(\lambda x + (1-\lambda)y) \le \max\{f(x), f(y)\}$
♣ Pseudoconvex: $\langle \nabla f(y), x - y \rangle \ge 0 \implies f(x) \ge f(y)$
♣ Discrete convexity: f : Zn → Z; “convexity + matroid theory.”