2 Convex Function
Yipeng Liu
School of Electronic Engineering/Center for Robotics/Center for Information in Medicine
University of Electronic Science and Technology of China (UESTC)
[email protected]
October 10, 2016
Overview
1. definition
2. basic properties
3. epigraph and sublevel set
4. Jensen’s inequality
5. operations that preserve convexity
6. conjugate function
7. log-concave and log-convex functions
8. convexity with respect to generalized inequalities
Definition
$f : \mathbb{R}^N \to \mathbb{R}$ is convex if dom $f$ is a convex set and
$$f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta) f(y)$$
for all $x, y \in \operatorname{dom} f$, $0 \le \theta \le 1$
• $f$ is concave if $-f$ is convex
• $f$ is strictly convex if dom $f$ is convex and
$$f(\theta x + (1-\theta)y) < \theta f(x) + (1-\theta) f(y)$$
for all $x, y \in \operatorname{dom} f$, $x \ne y$, $0 < \theta < 1$
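The defining inequality can be probed numerically. The sketch below is only an illustration: the helper name `find_violation`, the sampling interval, and the tolerance are assumptions made here, and a search that finds no violation is evidence of convexity, not a proof.

```python
import math
import random

def find_violation(f, lo, hi, trials=10_000, tol=1e-9, seed=0):
    """Return (x, y, theta) violating the convexity inequality, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, theta = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        z = theta * x + (1.0 - theta) * y
        # Convexity requires f(z) <= theta f(x) + (1 - theta) f(y).
        if f(z) > theta * f(x) + (1.0 - theta) * f(y) + tol:
            return x, y, theta
    return None

print(find_violation(lambda x: x * x, -3.0, 3.0))       # x^2 is convex
print(find_violation(math.sin, -3.0, 3.0) is not None)  # sin is not convex on [-3, 3]
```

Sampling works for scalar functions on an interval; for functions on $\mathbb{R}^N$ the same idea applies after restricting to random lines.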
Examples on R
convex:
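• affine: $ax + b$ on $\mathbb{R}$, for any $a, b \in \mathbb{R}$
• exponential: $e^{ax}$, for any $a \in \mathbb{R}$
• powers: $x^\alpha$ on $\mathbb{R}_{++}$, for $\alpha \ge 1$ or $\alpha \le 0$
• powers of absolute value: $|x|^p$ on $\mathbb{R}$, for $p \ge 1$
• negative entropy: $x \log x$ on $\mathbb{R}_{++}$
concave:
• affine: $ax + b$ on $\mathbb{R}$, for any $a, b \in \mathbb{R}$
• powers: $x^\alpha$ on $\mathbb{R}_{++}$, for $0 \le \alpha \le 1$
• logarithm: $\log x$ on $\mathbb{R}_{++}$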
Examples on $\mathbb{R}^N$ and $\mathbb{R}^{M \times N}$
affine functions are convex and concave; all norms are convex
examples on $\mathbb{R}^N$:
• affine function: $f(x) = a^T x + b$
• norms: $\|x\|_p = \left(\sum_{n=1}^N |x_n|^p\right)^{1/p}$, for $p \ge 1$; $\|x\|_\infty = \max_n |x_n|$
examples on $\mathbb{R}^{M \times N}$:
• affine function
$$f(X) = \operatorname{tr}(A^T X) + b = \sum_{m=1}^M \sum_{n=1}^N A_{mn} X_{mn} + b$$
• spectral (maximum singular value) norm
$$f(X) = \|X\|_2 = \sigma_{\max}(X) = \left(\lambda_{\max}(X^T X)\right)^{1/2}$$
Restriction of a convex function to a line
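$f : \mathbb{R}^N \to \mathbb{R}$ is convex if and only if the function $g : \mathbb{R} \to \mathbb{R}$,
$$g(t) = f(x + tv), \quad \operatorname{dom} g = \{t \mid x + tv \in \operatorname{dom} f\}$$
is convex (in $t$) for any $x \in \operatorname{dom} f$, $v \in \mathbb{R}^N$
check convexity of $f$ by checking convexity of functions of only one variable
example: $f : \mathbb{S}^N \to \mathbb{R}$ with $f(X) = \log\det X$, $\operatorname{dom} f = \mathbb{S}^N_{++}$
$$g(t) = \log\det(X + tV) = \log\det X + \log\det\left(I + tX^{-1/2} V X^{-1/2}\right) = \log\det X + \sum_{n=1}^N \log(1 + t\lambda_n)$$
where $\lambda_n$ are the eigenvalues of $X^{-1/2} V X^{-1/2}$
$g$ is concave in $t$ (for any choice of $X \succ 0$, $V$); hence $f$ is concave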
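The restriction-to-a-line argument for log det can be checked on a small case. The following sketch assumes 2×2 symmetric matrices so that the determinant has a closed form; the particular `X`, `V`, and grid of `t` values are arbitrary choices made here, picked so that `X + tV` stays positive definite.

```python
import math

def log_det_2x2(m):
    """log det of a symmetric positive definite 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    det = a * d - b * b
    if a <= 0.0 or det <= 0.0:
        raise ValueError("matrix is not positive definite")
    return math.log(det)

def g(t, X, V):
    """Restriction of log det to the line X + tV."""
    return log_det_2x2([[X[i][j] + t * V[i][j] for j in range(2)] for i in range(2)])

def worst_midpoint_gap(X, V, ts):
    """max over pairs of avg(g(s), g(t)) - g(midpoint); <= 0 when g is concave."""
    return max(0.5 * (g(s, X, V) + g(t, X, V)) - g(0.5 * (s + t), X, V)
               for s in ts for t in ts)

X = [[2.0, 0.5], [0.5, 1.0]]     # positive definite
V = [[1.0, -2.0], [-2.0, 0.3]]   # arbitrary symmetric direction
ts = [-0.2 + 0.01 * k for k in range(41)]
print(worst_midpoint_gap(X, V, ts) <= 1e-12)
```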
Extended-value extension
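the extended-value extension $\tilde f$ of $f$ is
$$\tilde f(x) = \begin{cases} f(x), & x \in \operatorname{dom} f \\ \infty, & x \notin \operatorname{dom} f \end{cases}$$
often simplifies notation; for example, the condition
$$0 \le \theta \le 1 \implies \tilde f(\theta x + (1-\theta)y) \le \theta \tilde f(x) + (1-\theta) \tilde f(y)$$
(as an inequality in $\mathbb{R} \cup \{\infty\}$) means the same as the two conditions
• dom $f$ is convex
• for $x, y \in \operatorname{dom} f$,
$$0 \le \theta \le 1 \implies f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta) f(y)$$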
First-order condition
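$f$ is differentiable if dom $f$ is open and the gradient
$$\nabla f(x) = \left(\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \cdots, \frac{\partial f(x)}{\partial x_N}\right)$$
exists at each $x \in \operatorname{dom} f$
1st-order condition: differentiable $f$ with convex domain is convex iff
$$f(y) \ge f(x) + \nabla f(x)^T (y - x), \quad \text{for all } x, y \in \operatorname{dom} f$$
the first-order approximation of $f$ is a global underestimator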
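As an illustration of the first-order condition, the sketch below evaluates the gap between $f(y)$ and the first-order approximation at $x$ for negative entropy $f(x) = \sum_n x_n \log x_n$ on $\mathbb{R}^N_{++}$. The choice of function, the sampling range, and the helper names are assumptions made for this example.

```python
import math
import random

def f(x):
    """Negative entropy on the positive orthant."""
    return sum(xi * math.log(xi) for xi in x)

def grad_f(x):
    return [math.log(xi) + 1.0 for xi in x]

def first_order_gap(x, y):
    """f(y) - (f(x) + grad f(x)^T (y - x)); nonnegative for convex f."""
    g = grad_f(x)
    linear = f(x) + sum(gi * (yi - xi) for gi, xi, yi in zip(g, x, y))
    return f(y) - linear

rng = random.Random(0)
points = [[rng.uniform(0.1, 5.0) for _ in range(3)] for _ in range(200)]
smallest = min(first_order_gap(x, y) for x in points for y in points)
print(smallest >= -1e-9)
```

For this particular $f$ the gap equals $\sum_n \left(y_n \log(y_n/x_n) - y_n + x_n\right)$, which is zero exactly when $y = x$.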
Second-order conditions
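$f$ is twice differentiable if dom $f$ is open and the Hessian $\nabla^2 f(x) \in \mathbb{S}^N$,
$$\nabla^2 f(x)_{i,j} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \quad i, j = 1, 2, \cdots, N$$
exists at each $x \in \operatorname{dom} f$
2nd-order conditions: for twice differentiable $f$ with convex domain
• $f$ is convex if and only if
$$\nabla^2 f(x) \succeq 0, \quad \text{for all } x \in \operatorname{dom} f$$
• if $\nabla^2 f(x) \succ 0$ for all $x \in \operatorname{dom} f$, then $f$ is strictly convex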
Examples
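quadratic function: $f(x) = (1/2) x^T P x + q^T x + r$ (with $P \in \mathbb{S}^N$)
$$\nabla f(x) = Px + q, \quad \nabla^2 f(x) = P$$
convex if $P \succeq 0$
least-squares objective: $f(x) = \|Ax - b\|_2^2$
$$\nabla f(x) = 2A^T (Ax - b), \quad \nabla^2 f(x) = 2A^T A$$
convex (for any $A$)
quadratic-over-linear: $f(x, y) = x^2 / y$
$$\nabla^2 f(x, y) = \frac{2}{y^3} \begin{bmatrix} y \\ -x \end{bmatrix} \begin{bmatrix} y \\ -x \end{bmatrix}^T \succeq 0$$
convex for $y > 0$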
Examples
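log-sum-exp (soft max): $f(x) = \log \sum_{n=1}^N \exp x_n$ is convex
$$\nabla^2 f(x) = \frac{1}{\mathbf{1}^T z} \operatorname{diag}(z) - \frac{1}{(\mathbf{1}^T z)^2} z z^T, \quad (z_n = \exp x_n)$$
to show $\nabla^2 f(x) \succeq 0$, we must verify that $v^T \nabla^2 f(x) v \ge 0$ for all $v$:
$$v^T \nabla^2 f(x) v = \frac{\left(\sum_n z_n v_n^2\right)\left(\sum_n z_n\right) - \left(\sum_n v_n z_n\right)^2}{\left(\sum_n z_n\right)^2} \ge 0$$
since $\left(\sum_n v_n z_n\right)^2 \le \left(\sum_n z_n v_n^2\right)\left(\sum_n z_n\right)$ (Cauchy-Schwarz inequality)
geometric mean: $f(x) = \left(\prod_{n=1}^N x_n\right)^{1/N}$ on $\mathbb{R}^N_{++}$ is concave
(similar proof as for log-sum-exp)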
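The quadratic form of the log-sum-exp Hessian can be evaluated directly from $z_n = \exp x_n$. The sketch below is a numerical illustration only; the function name is hypothetical, and subtracting $\max(x)$ before exponentiating is a stabilization step added here (it rescales every $z_n$ by the same factor, which cancels in the ratio).

```python
import math
import random

def lse_hessian_quadratic_form(x, v):
    """v^T (Hessian of log-sum-exp at x) v, computed from z_n = exp(x_n)."""
    shift = max(x)
    z = [math.exp(xi - shift) for xi in x]  # common scaling of z cancels below
    s = sum(z)
    a = sum(zi * vi * vi for zi, vi in zip(z, v))
    b = sum(zi * vi for zi, vi in zip(z, v))
    return (a * s - b * b) / (s * s)

rng = random.Random(0)
values = []
for _ in range(1000):
    x = [rng.uniform(-5.0, 5.0) for _ in range(4)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(4)]
    values.append(lse_hessian_quadratic_form(x, v))
print(min(values) >= -1e-12)
```

The form vanishes for $v = \mathbf{1}$, so the Hessian is positive semidefinite but not positive definite.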
Epigraph and sublevel set
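$\alpha$-sublevel set of $f : \mathbb{R}^N \to \mathbb{R}$
$$C_\alpha = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$$
sublevel sets of convex functions are convex (converse is false)
epigraph of $f : \mathbb{R}^N \to \mathbb{R}$
$$\operatorname{epi} f = \{(x, t) \in \mathbb{R}^{N+1} \mid x \in \operatorname{dom} f,\ f(x) \le t\}$$
a function $f$ (in black in the figure) is convex if and only if the region above its graph (in green, epi $f$) is a convex set
sublevel sets and epigraphs are two ways of relating convex functions to convex sets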
Some convex functions constructed from convex sets
Let K ⊆ RN be a convex set.
1. The characteristic function (equivalent to indicator) of $K$ is:
$$\chi_K(x) = \begin{cases} 0, & \text{if } x \in K \\ +\infty, & \text{otherwise} \end{cases}$$
2. Suppose that $0 \in K$. The Minkowski function of $K$ is:
$$\mu_K(x) = \inf\{t > 0 : x \in tK\}$$
note: epi $\mu_K$ is the light cone of $K$, which is a convex cone.
for $t > 0$, $x, y \in \mathbb{R}^N$
• $\mu_K$ is positively homogeneous: $\mu_K(tx) = t\mu_K(x)$
• $\mu_K$ is subadditive: $\mu_K(x + y) \le \mu_K(x) + \mu_K(y)$
$\mu_K(x) < 1$ for $x \in \operatorname{int} K$
Jensen’s inequality
basic inequality: if $f$ is convex, then for $0 \le \theta \le 1$
$$f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta) f(y)$$
extension: if $f$ is convex, then
$$f(\mathbf{E} z) \le \mathbf{E} f(z)$$
for any random variable $z$.
basic inequality is special case with discrete distribution
$$\operatorname{prob}(z = x) = \theta, \quad \operatorname{prob}(z = y) = 1 - \theta$$
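Jensen's inequality holds exactly for the empirical distribution of any finite sample, since the sample mean is a convex combination. The sketch below illustrates this; the helper `jensen_gap` and the Gaussian sample are assumptions made for the example.

```python
import math
import random

def jensen_gap(f, samples):
    """E f(z) - f(E z) under the empirical distribution of the samples."""
    n = len(samples)
    mean_z = sum(samples) / n
    mean_fz = sum(f(z) for z in samples) / n
    return mean_fz - f(mean_z)

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
print(jensen_gap(math.exp, samples) >= 0.0)          # exp is convex
print(jensen_gap(lambda z: z * z, samples) >= 0.0)   # gap is the sample variance
```

With two sample points of equal weight, `jensen_gap` reduces to the basic inequality with $\theta = 1/2$.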
Operations that preserve convexity
practical methods for establishing convexity of a function
1. verify definition (often simplified by restricting to a line)
2. for twice differentiable functions, show $\nabla^2 f(x) \succeq 0$
3. show that f is obtained from simple convex functions by operations that
preserve convexity
• nonnegative weighted sum
• composition with affine function
• pointwise maximum and supremum
• composition
• minimization
• perspective
Positive weighted sum & composition with affine function
nonnegative multiple: $\alpha f$ is convex if $f$ is convex, $\alpha \ge 0$
sum: $f_1 + f_2$ convex if $f_1, f_2$ convex (extends to infinite sums, integrals).
composition with affine function: $f(Ax + b)$ is convex if $f$ is convex
examples
• log barrier for linear inequalities
$$f(x) = -\sum_{m=1}^M \log\left(b_m - a_m^T x\right), \quad \operatorname{dom} f = \{x \mid a_m^T x < b_m,\ m = 1, \cdots, M\}$$
• (any) norm of affine function: $f(x) = \|Ax + b\|$
Pointwise maximum
if $f_1, \cdots, f_M$ are convex, then $f(x) = \max\{f_1(x), \cdots, f_M(x)\}$ is convex
proof: a function is convex iff its epigraph is convex, and the epigraph of a pointwise maximum is the intersection of the epigraphs of the $f_m$; an intersection of convex sets is convex, so the pointwise maximum of convex functions is convex
examples
• piecewise-linear function $f(x) = \max_{m=1,\cdots,M} (a_m^T x + b_m)$ is convex
• sum of $K$ largest components of $x \in \mathbb{R}^N$:
$$f(x) = x_{[1]} + x_{[2]} + \cdots + x_{[K]}$$
is convex ($x_{[k]}$ is the $k$th largest component of $x$)
proof:
$$f(x) = \max\{x_{n_1} + x_{n_2} + \cdots + x_{n_K} \mid 1 \le n_1 < n_2 < \cdots < n_K \le N\}$$
the maximum of all the linear functions that select $K$ entries of $x$ and sum them.
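The two descriptions of the sum of the $K$ largest components can be compared directly on small inputs. This is a sketch with hypothetical function names; enumerating all $K$-subsets is only practical for small $N$.

```python
from itertools import combinations

def sum_k_largest(x, k):
    """Sum of the k largest components, via sorting."""
    return sum(sorted(x, reverse=True)[:k])

def max_over_subsets(x, k):
    """Pointwise maximum over all linear functions that sum k entries of x."""
    return max(sum(subset) for subset in combinations(x, k))

x = [3, -1, 4, 1, 5]
print(sum_k_largest(x, 2), max_over_subsets(x, 2))
```

The second form is a maximum of finitely many linear functions of `x`, which is the structure the convexity proof uses.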
Pointwise supremum
if $f(x, y)$ is convex in $x$ for each $y \in C$, then
$$g(x) = \sup_{y \in C} f(x, y)$$
is convex
note that $f(x, y)$ does not need to be convex in $y$
examples
• support function of a set $C$:
$$S_C(x) = \sup_{y \in C} x^T y$$
• distance to farthest point in a set $C$:
$$f(x) = \sup_{y \in C} \|x - y\|$$
• maximum eigenvalue of symmetric matrix: for $X \in \mathbb{S}^N$
$$\lambda_{\max}(X) = \sup_{\|y\|_2 = 1} y^T X y$$
Composition with scalar functions
composition of $g : \mathbb{R}^N \to \mathbb{R}$ and $h : \mathbb{R} \to \mathbb{R}$:
$$f(x) = h(g(x))$$
$f$ is convex if
• $g$ convex, $h$ convex, $\tilde h$ nondecreasing, or
• $g$ concave, $h$ convex, $\tilde h$ nonincreasing
• proof (for $N = 1$, differentiable $g, h$)
$$f''(x) = h''(g(x))\, g'(x)^2 + h'(g(x))\, g''(x)$$
• note: monotonicity must hold for extended-value extension $\tilde h$
examples
• $\exp g(x)$ is convex if $g$ is convex
• $1/g(x)$ is convex if $g$ is concave and positive
Vector composition
composition of $g : \mathbb{R}^N \to \mathbb{R}^K$ and $h : \mathbb{R}^K \to \mathbb{R}$:
$$f(x) = h(g(x)) = h(g_1(x), g_2(x), \cdots, g_K(x))$$
$f$ is convex if
• $g_k$ convex, $h$ convex, $\tilde h$ nondecreasing in each argument, or
• $g_k$ concave, $h$ convex, $\tilde h$ nonincreasing in each argument
proof (for $N = 1$, differentiable $g, h$)
$$f''(x) = g'(x)^T \nabla^2 h(g(x))\, g'(x) + \nabla h(g(x))^T g''(x)$$
examples
• $\sum_{m=1}^M \log g_m(x)$ is concave if $g_m$ are concave and positive
• $\log \sum_{m=1}^M \exp g_m(x)$ is convex if $g_m$ are convex
Infimum
if $f(x, y)$ is convex in $(x, y)$ and $C$ is a convex set, then
$$g(x) = \inf_{y \in C} f(x, y)$$
is convex
example
• $f(x, y) = x^T A x + 2x^T B y + y^T C y$ with
$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succeq 0, \quad C \succ 0$$
minimizing over $y$ gives $g(x) = \inf_y f(x, y) = x^T (A - BC^{-1}B^T) x$
$g$ is convex, hence Schur complement $A - BC^{-1}B^T \succeq 0$
• distance to a set: $\operatorname{dist}(x, S) = \inf_{y \in S} \|x - y\|$ is convex if $S$ is convex
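In the scalar case the partial minimization can be checked by brute force. The sketch below assumes scalar $x, y$ with coefficients $a, b, c$ ($c > 0$) and compares a grid minimum over $y$ against the Schur complement formula $(a - b^2/c)x^2$; the grid range and resolution are arbitrary choices made here.

```python
def f(x, y, a, b, c):
    """Scalar quadratic form a x^2 + 2 b x y + c y^2."""
    return a * x * x + 2.0 * b * x * y + c * y * y

def g_closed_form(x, a, b, c):
    """Value of inf_y f(x, y) via the Schur complement a - b^2 / c."""
    return (a - b * b / c) * x * x

def g_grid(x, a, b, c, lo=-10.0, hi=10.0, n=20_001):
    """Approximate inf_y f(x, y) by a minimum over a uniform grid in y."""
    step = (hi - lo) / (n - 1)
    return min(f(x, lo + k * step, a, b, c) for k in range(n))

a, b, c = 2.0, 1.0, 1.0   # [[a, b], [b, c]] is positive definite
for x in (-2.0, 0.0, 1.5):
    print(x, g_grid(x, a, b, c), g_closed_form(x, a, b, c))
```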
Perspective
the perspective of a function $f : \mathbb{R}^N \to \mathbb{R}$ is the function $g : \mathbb{R}^N \times \mathbb{R} \to \mathbb{R}$,
$$g(x, t) = t f(x/t), \quad \operatorname{dom} g = \{(x, t) \mid x/t \in \operatorname{dom} f,\ t > 0\}$$
$g$ is convex if $f$ is convex
examples
• $f(x) = x^T x$ is convex; hence $g(x, t) = x^T x / t$ is convex for $t > 0$
• negative logarithm $f(x) = -\log x$ is convex; hence relative entropy $g(x, t) = t \log t - t \log x$ is convex on $\mathbb{R}^2_{++}$.
• if $f$ is convex, then
$$g(x) = (c^T x + d)\, f\!\left(\frac{Ax + b}{c^T x + d}\right)$$
is convex on
$$\left\{x \,\middle|\, c^T x + d > 0,\ \frac{Ax + b}{c^T x + d} \in \operatorname{dom} f\right\}$$
Conjugate function
the conjugate of a function f is
$$f^*(y) = \sup_{x \in \operatorname{dom} f} \left(y^T x - f(x)\right)$$
for fixed $y$, the map $x \mapsto y^T x$ is a linear function through the origin with slope $y$
• $f^*(y)$ is the maximum gap between the linear function $y^T x$ and $f(x)$
• $f^*$ is convex (even if $f$ is not), since it is the pointwise supremum of convex (affine) functions of $y$
• for differentiable $f$, conjugation is called the Legendre transform
Conjugate function
examples
• negative logarithm $f(x) = -\log x$
$$f^*(y) = \sup_{x > 0} (yx + \log x) = \begin{cases} -1 - \log(-y), & y < 0 \\ \infty, & \text{otherwise} \end{cases}$$
• strictly convex quadratic $f(x) = \frac{1}{2} x^T Q x$ with $Q \in \mathbb{S}^N_{++}$
$$f^*(y) = \sup_x \left(y^T x - \frac{1}{2} x^T Q x\right) = \frac{1}{2} y^T Q^{-1} y$$
• indicator function $f(x) = 1_C(x)$
$$f^*(y) = 1_C^*(y) = \sup_{x \in C} y^T x$$
called the support function of $C$
• norm $f(x) = \|x\|$
$$f^*(y) = 1_{\{y : \|y\|_* \le 1\}}(y)$$
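The conjugate of the negative logarithm can be approximated by taking the supremum over a grid. The sketch below is an illustration under stated assumptions: the logarithmically spaced grid, its span, and the function names are choices made here, and the grid value is only meaningful for $y < 0$, where the supremum is finite.

```python
import math

def conjugate_neg_log_grid(y, span=10.0, n=20_000):
    """Approximate sup over x > 0 of (y x + log x) on the grid x = exp(u)."""
    best = -math.inf
    for k in range(n + 1):
        u = -span + 2.0 * span * k / n
        best = max(best, y * math.exp(u) + u)  # log(exp(u)) = u
    return best

def conjugate_neg_log_closed_form(y):
    """Closed form: -1 - log(-y) for y < 0, +infinity otherwise."""
    return -1.0 - math.log(-y) if y < 0.0 else math.inf

for y in (-2.0, -0.5):
    print(y, conjugate_neg_log_grid(y), conjugate_neg_log_closed_form(y))
```

The maximizer is $x = -1/y$, obtained by setting the derivative $y + 1/x$ to zero.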
Conjugate function
Properties
• Fenchel's inequality: for any $x, y$,
$$f(x) + f^*(y) \ge x^T y$$
• Hence the conjugate of the conjugate $f^{**}$ satisfies $f^{**} \le f$
• If $f$ is closed and convex, then $f^{**} = f$
• If $f$ is closed and convex, then for any $x, y$,
$$x \in \partial f^*(y) \iff y \in \partial f(x) \iff f(x) + f^*(y) = x^T y$$
• If $f(u, v) = f_1(u) + f_2(v)$ (here $u \in \mathbb{R}^N$, $v \in \mathbb{R}^M$), then
$$f^*(w, z) = f_1^*(w) + f_2^*(z)$$
Quasiconvex functions
Definition 1: $f : \mathbb{R}^N \to \mathbb{R}$ is quasiconvex if dom $f$ is convex and the sublevel sets
$$S_\alpha = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$$
are convex for all $\alpha$
• f is quasiconcave if −f is quasiconvex
• f is quasilinear if it is quasiconvex and quasiconcave
examples
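• $\sqrt{|x|}$ is quasiconvex on $\mathbb{R}$
• $\operatorname{ceil}(x) = \inf\{z \in \mathbb{Z} \mid z \ge x\}$ is quasilinear
• $\log x$ is quasilinear on $\mathbb{R}_{++}$
• $f(x_1, x_2) = x_1 x_2$ is quasiconcave on $\mathbb{R}^2_{++}$
• linear-fractional function
$$f(x) = \frac{a^T x + b}{c^T x + d}, \quad \operatorname{dom} f = \{x \mid c^T x + d > 0\}$$
is quasilinear
• distance ratio
$$f(x) = \frac{\|x - a\|_2}{\|x - b\|_2}, \quad \operatorname{dom} f = \{x \mid \|x - a\|_2 \le \|x - b\|_2\}$$
is quasiconvex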
Quasiconvex functions
internal rate of return
• cash flow $x = [x_0, \cdots, x_N]^T$; $x_n$ is payment in period $n$ (to us if $x_n > 0$)
• we assume $x_0 < 0$ (investment) and $x_0 + x_1 + \cdots + x_N > 0$
• present value of cash flow $x$, for interest rate $r$:
$$\operatorname{PV}(x, r) = \sum_{n=0}^N (1 + r)^{-n} x_n$$
• internal rate of return is the smallest interest rate for which $\operatorname{PV}(x, r) = 0$:
$$\operatorname{IRR}(x) = \inf\{r \ge 0 \mid \operatorname{PV}(x, r) = 0\}$$
IRR is quasiconcave: superlevel set is intersection of halfspaces
$$\operatorname{IRR}(x) \ge R \iff \sum_{n=0}^N (1 + r)^{-n} x_n > 0 \ \text{ for } 0 \le r < R$$
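The present value and internal rate of return can be computed numerically. The sketch below is one possible approach, not a standard routine: it scans $r$ upward from 0 in small steps and bisects the first sign change of PV. The names, the step size, and the cap `r_max` are assumptions, and a coarse scan could miss two roots that fall inside a single step.

```python
import math

def present_value(x, r):
    """PV(x, r) = sum_n (1 + r)^(-n) x_n."""
    return sum(xn * (1.0 + r) ** (-n) for n, xn in enumerate(x))

def irr(x, r_max=10.0, step=1e-3, tol=1e-12):
    """Smallest r >= 0 with PV(x, r) = 0, or inf if none is found below r_max."""
    lo = 0.0
    while lo < r_max:
        hi = lo + step
        if present_value(x, lo) * present_value(x, hi) <= 0.0:
            # A sign change lies in [lo, hi]; refine it by bisection.
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if present_value(x, lo) * present_value(x, mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
        lo = hi
    return math.inf

x = [-100.0, 60.0, 60.0]    # invest 100, receive 60 in each of two periods
y = [-100.0, 0.0, 130.0]
mid = [0.5 * (a + b) for a, b in zip(x, y)]
print(irr(x), irr(y), irr(mid))
```

For these cash flows the IRR of the averaged flow is at least the smaller of the two individual rates, as quasiconcavity requires.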
Properties of quasiconvex functions
modified Jensen inequality (Definition 2): for quasiconvex $f$
$$0 \le \theta \le 1 \implies f(\theta x + (1-\theta)y) \le \max\{f(x), f(y)\}$$
first-order condition: differentiable $f$ with convex domain is quasiconvex iff
$$f(y) \le f(x) \implies \nabla f(x)^T (y - x) \le 0$$
sums of quasiconvex functions are not necessarily quasiconvex
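A small counterexample makes the last point concrete. The functions below are chosen here for illustration: $\sqrt{|x|}$ and $\sqrt{|x - 2|}$ are each quasiconvex on $\mathbb{R}$, yet their sum violates the modified Jensen inequality at the midpoint of 0 and 2.

```python
import math

def f1(x):
    return math.sqrt(abs(x))

def f2(x):
    return math.sqrt(abs(x - 2.0))

def f_sum(x):
    return f1(x) + f2(x)

def satisfies_modified_jensen(f, x, y, theta, tol=1e-12):
    """Check f(theta x + (1 - theta) y) <= max(f(x), f(y)) at one point."""
    return f(theta * x + (1.0 - theta) * y) <= max(f(x), f(y)) + tol

for f in (f1, f2, f_sum):
    print(f.__name__, satisfies_modified_jensen(f, 0.0, 2.0, 0.5))
```

At the midpoint the sum equals 2, while at both endpoints it equals $\sqrt{2}$.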
Strictly locally quasiconvex function
let $x, z \in \mathbb{R}^N$, $\kappa, \varepsilon > 0$; $f : \mathbb{R}^N \to \mathbb{R}$ is $(\varepsilon, \kappa, z)$-strictly locally quasiconvex (SLQC) in $x$ if at least one of the following applies:
• $f(x) - f(z) \le \varepsilon$
• $\|\nabla f(x)\| > 0$, and for every $y \in B(z, \varepsilon/\kappa)$ it holds that $\langle \nabla f(x), y - x \rangle \le 0$
$L$-Lipschitz + strictly quasiconvex = $(\varepsilon, L, z)$-SLQC
note: Lipschitz continuity: $|f(x) - f(y)| \le L \|x - y\|$, $\forall\, x, y \in C$
normalized gradient descent methods can solve SLQC optimization problems
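A minimal sketch of normalized gradient descent follows. It is a generic fixed-step variant written for illustration, not the specific algorithm or step-size rule of any particular paper; the test function $f(x) = \log(1 + \|x - c\|^2)$ (quasiconvex but not convex), the step size, and the iteration count are all assumptions.

```python
import math

def make_grad(c):
    """Gradient of f(x) = log(1 + ||x - c||^2)."""
    def grad(x):
        d = [xi - ci for xi, ci in zip(x, c)]
        scale = 2.0 / (1.0 + sum(di * di for di in d))
        return [scale * di for di in d]
    return grad

def normalized_gradient_descent(grad, x0, step=0.01, iters=2000):
    """Move a fixed distance `step` along the negative gradient direction."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:       # stationary point reached
            break
        x = [xi - step * gi / norm for xi, gi in zip(x, g)]
    return x

c = [1.0, 2.0]
x_final = normalized_gradient_descent(make_grad(c), [5.0, -3.0])
distance = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x_final, c)))
print(distance)
```

Only the gradient direction is used, so the flat tails of $f$, where the gradient magnitude is small, do not slow progress; with a fixed step the iterates end up within one step length of the minimizer.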
Log-concave and log-convex functions
a positive function $f$ is log-concave if $\log f$ is concave:
$$f(\theta x + (1-\theta)y) \ge f(x)^\theta f(y)^{1-\theta}, \quad \text{for } 0 \le \theta \le 1$$
$f$ is log-convex if $\log f$ is convex
• powers: $x^a$ on $\mathbb{R}_{++}$ is log-convex for $a \le 0$, log-concave for $a \ge 0$
• many common probability densities are log-concave, e.g., normal:
$$f(x) = \frac{1}{\sqrt{(2\pi)^N \det \Sigma}} \exp\left(-\frac{1}{2} (x - \bar{x})^T \Sigma^{-1} (x - \bar{x})\right)$$
• cumulative Gaussian distribution function $\Phi$ is log-concave
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{u^2}{2}\right) du$$
Properties of log-concave functions
• twice differentiable $f$ with convex domain is log-concave if and only if
$$f(x) \nabla^2 f(x) \preceq \nabla f(x) \nabla f(x)^T$$
for all $x \in \operatorname{dom} f$
• product of log-concave functions is log-concave
• sum of log-concave functions is not always log-concave
• integration: if $f : \mathbb{R}^N \times \mathbb{R}^M \to \mathbb{R}$ is log-concave, then
$$g(x) = \int f(x, y)\, dy$$
is log-concave (not easy to show)
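The product and sum properties can be seen on two Gaussian bumps. The example below is an illustration chosen here: each bump $\exp(-(x - c)^2/2)$ is log-concave, their product passes the defining inequality at the tested points, and their sum (a bimodal function) fails it.

```python
import math

def bump(center):
    """Unnormalized Gaussian bump exp(-(x - center)^2 / 2), a log-concave function."""
    return lambda x: math.exp(-0.5 * (x - center) ** 2)

f, g = bump(-3.0), bump(3.0)

def product(x):
    return f(x) * g(x)

def total(x):
    return f(x) + g(x)

def log_concave_at(h, x, y, theta=0.5, tol=1e-12):
    """Check h(theta x + (1 - theta) y) >= h(x)^theta h(y)^(1 - theta) at one point."""
    z = theta * x + (1.0 - theta) * y
    return h(z) >= h(x) ** theta * h(y) ** (1.0 - theta) - tol

print(log_concave_at(product, -3.0, 3.0))  # product passes the check here
print(log_concave_at(total, -3.0, 3.0))    # sum fails: it dips between the two peaks
```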
Properties of log-concave functions
consequences of integration property
• convolution $f * g$ of log-concave functions $f, g$ is log-concave
$$(f * g)(x) = \int f(x - y) g(y)\, dy$$
• if $C$ is convex and $y$ is a random variable with log-concave pdf, then
$$f(x) = \operatorname{prob}(x + y \in C)$$
is log-concave
proof: write $f(x)$ as integral of product of log-concave functions
$$f(x) = \int g(x + y)\, p(y)\, dy, \quad g(u) = \begin{cases} 1, & u \in C \\ 0, & u \notin C \end{cases}$$
$p$ is pdf of $y$
Properties of log-concave functions
example: yield function
$$h(x) = \operatorname{prob}(x + w \in S)$$
• $x \in \mathbb{R}^N$: nominal/target parameter values for product
• $w \in \mathbb{R}^N$: random variations of parameters in manufactured product
• $S$: set of acceptable values
if $S$ is convex and $w$ has a log-concave pdf, then
• $h$ is log-concave
• yield regions $\{x \mid h(x) \ge \alpha\}$ are convex
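In one dimension the yield function has a closed form, which allows a deterministic check. The sketch assumes $w \sim \mathcal{N}(0, 1)$ and $S = [a, b]$ with $a = -1$, $b = 1$; these choices, the grid, and the helper names are illustrative only.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def yield_fn(x, a=-1.0, b=1.0):
    """prob(x + w in [a, b]) for w ~ N(0, 1)."""
    return normal_cdf(b - x) - normal_cdf(a - x)

def worst_log_midpoint_gap(h, grid):
    """max over pairs of avg(log h) - log h(midpoint); <= 0 when log h is concave."""
    return max(0.5 * (math.log(h(s)) + math.log(h(t))) - math.log(h(0.5 * (s + t)))
               for s in grid for t in grid)

grid = [-3.0 + 0.1 * k for k in range(61)]
print(worst_log_midpoint_gap(yield_fn, grid) <= 1e-9)

# Grid points with yield at least 0.5; by convexity they should form one interval.
region = [k for k, x in enumerate(grid) if yield_fn(x) >= 0.5]
print(region == list(range(region[0], region[-1] + 1)))
```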
Convexity with respect to generalized inequalities
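$f : \mathbb{R}^N \to \mathbb{R}^M$ is $K$-convex if dom $f$ is convex and
$$f(\theta x + (1-\theta)y) \preceq_K \theta f(x) + (1-\theta) f(y)$$
for $x, y \in \operatorname{dom} f$, $0 \le \theta \le 1$
example: $f : \mathbb{S}^M \to \mathbb{S}^M$, $f(X) = X^2$ is $\mathbb{S}^M_+$-convex
proof: for fixed $z \in \mathbb{R}^M$, $z^T X^2 z = \|Xz\|_2^2$ is convex in $X$, i.e.,
$$z^T (\theta X + (1-\theta)Y)^2 z \le \theta z^T X^2 z + (1-\theta) z^T Y^2 z$$
for $X, Y \in \mathbb{S}^M$, $0 \le \theta \le 1$
therefore, $(\theta X + (1-\theta)Y)^2 \preceq \theta X^2 + (1-\theta) Y^2$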
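The matrix inequality can be checked numerically for $M = 2$. The sketch below uses random symmetric 2×2 matrices and tests positive semidefiniteness of the gap through its diagonal entries and determinant; the helper names, sampling range, and tolerance are assumptions made here.

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def combine(theta, X, Y):
    """theta X + (1 - theta) Y, entrywise."""
    return [[theta * X[i][j] + (1.0 - theta) * Y[i][j] for j in range(2)] for i in range(2)]

def convexity_gap(X, Y, theta):
    """theta X^2 + (1 - theta) Y^2 - (theta X + (1 - theta) Y)^2."""
    Z = combine(theta, X, Y)
    avg_sq, sq_avg = combine(theta, matmul(X, X), matmul(Y, Y)), matmul(Z, Z)
    return [[avg_sq[i][j] - sq_avg[i][j] for j in range(2)] for i in range(2)]

def is_psd_2x2(M, tol=1e-9):
    """A symmetric 2x2 matrix is PSD iff its diagonal and determinant are nonnegative."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][0] >= -tol and M[1][1] >= -tol and det >= -tol

def random_symmetric(rng):
    a, b, d = (rng.uniform(-2.0, 2.0) for _ in range(3))
    return [[a, b], [b, d]]

rng = random.Random(0)
checks = [is_psd_2x2(convexity_gap(random_symmetric(rng), random_symmetric(rng), rng.random()))
          for _ in range(1000)]
print(all(checks))
```

Expanding the squares shows the gap equals $\theta(1-\theta)(X - Y)^2$, which is positive semidefinite for symmetric $X, Y$.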