
Convex functions

We have seen what it means for a set to be convex. In this set of notes, we start working towards what it means to be a convex function.

To define this concept rigorously, we must be specific about the subset


of RN where a function can be applied. Specifically, the domain
dom f of a function f : RN → RM is the subset of RN where f is
well-defined. We then say that a function f is convex if dom f is a
convex set, and
f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y)
for all x, y ∈ dom f and 0 ≤ θ ≤ 1.
This inequality is easier to interpret with a picture. The left-hand side of the inequality above is simply the function f evaluated along the line segment between x and y. The right-hand side traces the straight line segment (the chord) between f(x) and f(y) as we move along this segment; for a convex function, this chord must lie above the graph of f.

[Figure: the chord θf(x) + (1 − θ)f(y) between the points (x, f(x)) and (y, f(y)) lies above the graph value f(θx + (1 − θ)y).]

We say that f is strictly convex if dom f is convex and

    f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y)

for all x, y ∈ dom f with x ≠ y and 0 < θ < 1.

Note also that we say that a function f is concave if −f is convex, and similarly for strictly concave functions. We are mostly interested in convex functions, but this is only because we are mostly restricting our attention to minimization problems. This restriction is justified because any maximization problem can be converted to a minimization problem by multiplying the objective function by −1. Everything that we say about minimizing convex functions also applies to maximizing concave ones.
We make a special note here that affine functions of the form

    f(x) = ⟨x, a⟩ + b

are both convex and concave (but neither strictly convex nor strictly concave). This is the only kind of function that has this property. (Why?)

Note that in the definition above, the domain matters. For example, f(x) = x^3 is convex if dom f = R+ = [0, ∞) but not if dom f = R.
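As a quick sanity check, one can sample the defining inequality numerically. The sketch below is in Python (using NumPy); the particular sample points and the tolerance are arbitrary illustrative choices. It finds no violation for f(x) = x^3 on R+, but does find one once negative points are allowed:

    import numpy as np

    f = lambda x: x**3
    theta = np.linspace(0.0, 1.0, 101)

    def violates(x, y):
        # True if f(theta*x + (1-theta)*y) > theta*f(x) + (1-theta)*f(y) anywhere on [0, 1]
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        return bool(np.any(lhs > rhs + 1e-12))

    print(violates(0.5, 3.0))    # False: the convexity inequality holds on R+
    print(violates(-2.0, 1.0))   # True: x^3 is not convex once negative x are allowed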

It will also sometimes be useful to consider the extension of f from dom f to all of R^N, defined as

    f̃(x) = f(x) for x ∈ dom f,    f̃(x) = +∞ for x ∉ dom f.

If f is convex on dom f, then its extension is also convex on R^N.

The epigraph
A useful notion that we will encounter later in the course is that of the epigraph of a function. The epigraph of a function f : R^N → R is the subset of R^{N+1} created by filling in the space above f:

    epi f = { (x, t) ∈ R^{N+1} : x ∈ dom f, f(x) ≤ t }.

[Figure: the epigraph epi f shaded above the graph of f.]

It is not hard to show that f is convex if and only if epi f is a convex set. This connection should help to illustrate how even though the definitions of a convex set and convex function might initially appear quite different, they actually follow quite naturally from each other.

Examples of convex functions

Here are some standard examples for functions on R:
• f(x) = x^2 is (strictly) convex.
• affine functions f(x) = ax + b are both convex and concave for a, b ∈ R.
• exponentials f(x) = e^{ax} are convex for all a ∈ R.

• powers x^α are:
  – convex on R+ for α ≥ 1,
  – concave on R+ for 0 ≤ α ≤ 1,
  – convex on R++ for α ≤ 0.
• |x|^α is convex on all of R for α ≥ 1.
• logarithms: log x is concave on R++ := {x ∈ R : x > 0}.
• the entropy function −x log x is concave on R++.

Here are some standard examples for functions on R^N:

• affine functions f(x) = ⟨x, a⟩ + b are both convex and concave on all of R^N.
• any valid norm f(x) = ‖x‖ is convex on all of R^N.
• if f1(x) and f2(x) are both convex, then the sum f1(x) + f2(x) is also convex.

A useful tool for showing that a function f : R^N → R is convex is the fact that f is convex if and only if the function g_v : R → R,

    g_v(t) = f(x + tv),    dom g_v = {t : x + tv ∈ dom f},

is convex for every x ∈ dom f and v ∈ R^N.

Example:
Let f(X) = − log det X with dom f = S^N_{++}, where S^N_{++} denotes the set of symmetric and (strictly) positive definite N × N matrices. For any X ∈ S^N_{++}, we know that

    X = U Λ U^T,

for some orthogonal U and diagonal, positive Λ, so we can define

    X^{1/2} = U Λ^{1/2} U^T,   and   X^{−1/2} = U Λ^{−1/2} U^T.

Now consider any symmetric matrix V and t such that X + tV ∈ S^N_{++}:

    g_V(t) = − log det(X + tV)
           = − log det( X^{1/2} (I + t X^{−1/2} V X^{−1/2}) X^{1/2} )
           = − log det X − log det(I + t X^{−1/2} V X^{−1/2})
           = − log det X − Σ_{i=1}^{N} log(1 + σ_i t),

where the σ_i are the eigenvalues of X^{−1/2} V X^{−1/2}. The function − log(1 + σ_i t) is convex in t, so the above is a sum of convex functions, which is convex.
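The last equality is easy to check numerically, and doing so is also a concrete instance of the restriction-to-a-line tool above. The sketch below is in Python (using NumPy); the dimension, the randomly built positive definite X, the symmetric V, and the step t are arbitrary illustrative choices. It compares −log det(X + tV) with −log det X − Σ_i log(1 + σ_i t):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 5
    A = rng.standard_normal((N, N))
    X = A @ A.T + N * np.eye(N)            # symmetric positive definite
    B = rng.standard_normal((N, N))
    V = (B + B.T) / 2                      # symmetric

    # eigenvalues sigma_i of X^{-1/2} V X^{-1/2}
    w, U = np.linalg.eigh(X)
    X_inv_half = U @ np.diag(w ** -0.5) @ U.T
    sigma = np.linalg.eigvalsh(X_inv_half @ V @ X_inv_half)

    t = 0.1                                # small enough that X + tV stays positive definite
    lhs = -np.linalg.slogdet(X + t * V)[1]
    rhs = -np.linalg.slogdet(X)[1] - np.sum(np.log(1 + sigma * t))
    print(np.isclose(lhs, rhs))            # True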

Operations that preserve convexity

There are a number of useful operations that we can perform on a convex function while preserving convexity. Some examples include:
• Positive weighted sum: A positive weighted sum of convex functions is also convex, i.e., if f1, . . . , fm are convex and w1, . . . , wm ≥ 0, then w1 f1 + . . . + wm fm is also convex.
• Composition with an affine function: If f : R^N → R is convex, then g : R^D → R defined by

    g(x) = f(Ax + b),

where A ∈ R^{N×D} and b ∈ R^N, is convex.

• Composition with scalar functions: Consider the function f(x) = h(g(x)), where g : R^N → R and h : R → R.
  – f is convex if g is convex and h is convex and non-decreasing.
    Example: e^{g(x)} is convex if g is convex.
  – f is convex if g is concave and h is convex and non-increasing.
    Example: 1/g(x) is convex if g is concave and positive.
• Max of convex functions: If f1 and f2 are convex, then f(x) = max(f1(x), f2(x)) is convex.

First-order conditions for convexity

We say that f is differentiable if dom f is an open set (all of R^N, for example), and the gradient

    ∇f(x) = [ ∂f(x)/∂x_1,  ∂f(x)/∂x_2,  . . . ,  ∂f(x)/∂x_N ]^T

exists for each x ∈ dom f. The gradient of a function is a core concept in optimization, and as such we review a little bit of what it means at the end of these notes.

The following characterization of convexity is an incredibly useful fact, and if we never had to worry about functions that were not differentiable, we might actually just take this as the definition of a convex function.

If f is differentiable, then it is convex if and only if

    f(y) ≥ f(x) + ∇f(x)^T (y − x)    (1)

for all x, y ∈ dom f .

[Figure: the affine function g(y) = f(x) + ∇f(x)^T (y − x) is tangent to f at y = x and lies below f(y) everywhere.]

This means that the linear approximation

    g(y) = f(x) + ∇f(x)^T (y − x)

is a global underestimator of f(y).

It is easy to show that f convex and differentiable ⇒ (1). Since f is convex,

    f(x + t(y − x)) ≤ (1 − t)f(x) + t f(y),    0 ≤ t ≤ 1,

and so

    f(y) ≥ f(x) + [f(x + t(y − x)) − f(x)] / t,    for all 0 < t ≤ 1.

Taking the limit as t → 0 on the right yields (1).

It is also true that (1) ⇒ f convex. To see this, choose arbitrary x, y and set z_θ = (1 − θ)x + θy; then (1) tells us

    f(w) ≥ f(z_θ) + ∇f(z_θ)^T (w − z_θ).

Applying this at w = y and multiplying by θ, then applying it at w = x and multiplying by (1 − θ) yields

    θf(y) ≥ θf(z_θ) + θ∇f(z_θ)^T (y − z_θ),

    (1 − θ)f(x) ≥ (1 − θ)f(z_θ) + (1 − θ)∇f(z_θ)^T (x − z_θ).

Adding these inequalities together establishes the result, since the gradient terms cancel: θ(y − z_θ) + (1 − θ)(x − z_θ) = 0.
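A quick numerical illustration of (1): the sketch below is in Python (using NumPy); the choice of f as the log-sum-exp function (a standard convex function) and the random test points are arbitrary illustrative choices. It checks that the linear approximation at x never exceeds f(y):

    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):                      # log-sum-exp, a standard convex function
        return np.log(np.sum(np.exp(x)))

    def grad_f(x):
        e = np.exp(x)
        return e / e.sum()

    for _ in range(1000):
        x, y = rng.standard_normal(4), rng.standard_normal(4)
        assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-12   # (1) holds at every sampled pair
    print("first-order underestimator verified on 1000 random pairs")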

Second-order conditions for convexity

We say that f : R^N → R is twice differentiable if dom f is an open set, and the N × N Hessian matrix

    ∇^2 f(x) = [ ∂^2 f(x) / ∂x_i ∂x_j ],    i, j = 1, . . . , N,

exists for every x ∈ dom f.

If f is twice differentiable, then it is convex if and only if

    ∇^2 f(x) ⪰ 0   (i.e., ∇^2 f(x) ∈ S^N_+)

for all x ∈ dom f.

Note that for a one-dimensional function f : R → R, the above condition just reduces to f''(x) ≥ 0. You can prove the one-dimensional version relatively easily (although we will not do so here) using the first-order characterization of convexity described above and the definition of the second derivative. You can then prove the general case by considering the function g(t) = f(x + tv). To see how, note that if f is convex and twice differentiable, then so is g. Using the chain rule, we have

    g''(t) = v^T ∇^2 f(x + tv) v.

Since g is convex, the one-dimensional result above tells us that g''(0) ≥ 0, and hence v^T ∇^2 f(x) v ≥ 0. Since this has to hold for any v, this means that ∇^2 f(x) ⪰ 0. The proof that ∇^2 f(x) ⪰ 0 implies convexity follows a similar strategy.

In addition, if

    ∇^2 f(x) ≻ 0   (i.e., ∇^2 f(x) ∈ S^N_{++})   for all x ∈ dom f,

then f is strictly convex. The converse is not quite true; it is possible that f is strictly convex even if ∇^2 f(x) has eigenvalues that are zero at isolated points. For example, f(x) = |x|^3 is strictly convex but f''(0) = 0.

Standard examples (from [BV04])

Quadratic functionals:

    f(x) = (1/2) x^T P x + q^T x + r,

where P is symmetric, has

    ∇f(x) = P x + q,    ∇^2 f(x) = P,

so f(x) is convex iff P ⪰ 0.
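These formulas are easy to probe numerically. The sketch below is in Python (using NumPy); the particular P, q, r, and test point are arbitrary illustrative choices, with P built as M M^T so that it is positive semidefinite. It checks the gradient formula by central differences and the convexity test via the eigenvalues of P:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 4
    M = rng.standard_normal((N, N))
    P = M @ M.T                                   # symmetric positive semidefinite
    q = rng.standard_normal(N)
    r = 1.3

    f = lambda x: 0.5 * x @ P @ x + q @ x + r
    grad = lambda x: P @ x + q

    x = rng.standard_normal(N)
    delta = 1e-6
    fd = np.array([(f(x + delta * e) - f(x - delta * e)) / (2 * delta) for e in np.eye(N)])
    print(np.allclose(fd, grad(x), atol=1e-4))      # True: gradient formula matches finite differences
    print(np.all(np.linalg.eigvalsh(P) >= -1e-10))  # True: P is PSD, so f is convex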

Least-squares:

    f(x) = ‖Ax − b‖_2^2,

where A is an arbitrary M × N matrix, has

    ∇f(x) = 2A^T(Ax − b),    ∇^2 f(x) = 2A^T A,

and is convex for any A.

Quadratic-over-linear:
In R^2, if

    f(x) = x1^2 / x2,

then

    ∇f(x) = [ 2x1/x2,  −x1^2/x2^2 ]^T,

    ∇^2 f(x) = (2/x2^3) [ x2^2  −x1x2 ; −x1x2  x1^2 ] = (2/x2^3) [x2, −x1]^T [x2, −x1],

and so f is convex on R × R++ (x1 ∈ R, x2 > 0).
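The rank-one form makes the positive semidefiniteness immediate, since v v^T ⪰ 0 for any vector v and 2/x2^3 > 0 when x2 > 0. A tiny numerical check in Python (using NumPy; the test point is an arbitrary choice with x2 > 0):

    import numpy as np

    x1, x2 = 1.7, 0.4                      # any point with x2 > 0
    v = np.array([x2, -x1])
    H = (2 / x2**3) * np.outer(v, v)       # Hessian of x1^2 / x2 at (x1, x2)
    print(np.linalg.eigvalsh(H))           # one zero eigenvalue, one positive eigenvalue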

Strong convexity and smoothness

We say that a function f is strongly convex if there is a µ > 0 such that

    f(x) − (µ/2)‖x‖_2^2  is convex.    (2)

We call µ the strong convexity parameter and will sometimes say that f is µ-strongly convex. In a sense, what we are saying is that f is so convex that we can subtract off a quadratic function and still preserve convexity.

If f is differentiable, there is another interpretation of strong convexity. We have seen that an equivalent definition of regular convexity is that the linear approximation formed using the gradient at a point x is a global underestimator of the function — see (1) and the earlier picture. If f obeys (2), then we can form a quadratic global underestimator as

    f(y) ≥ f(x) + ∇f(x)^T (y − x) + (µ/2)‖y − x‖_2^2.    (3)

Here is a picture:

[Figure: the quadratic g(y) = f(x) + ∇f(x)^T (y − x) + (µ/2)‖y − x‖_2^2 touches f at y = x and lies below f(y) everywhere.]
We will show that (2) implies (3) in a future homework.

If f is twice differentiable, there is yet another interpretation of strong convexity. If f obeys (2), then we know that the Hessian of f(x) − (µ/2)‖x‖_2^2 does not have any negative eigenvalues, i.e.

    ∇^2 ( f(x) − (µ/2)‖x‖_2^2 ) ⪰ 0.

Thus (since ∇^2(‖x‖_2^2) = 2I),

    ∇^2 f(x) − µI ⪰ 0,   i.e.,   ∇^2 f(x) ⪰ µI.

This is just a fancy way of saying that the smallest eigenvalue of the Hessian ∇^2 f(x) is uniformly bounded below by µ for all x.

In addition to convexity, there is one more type of structure that we consider for functions f : R^N → R. We say that a differentiable f has a Lipschitz gradient if there is an L such that

    ‖∇f(x) − ∇f(y)‖_2 ≤ L‖x − y‖_2,   for all x, y.    (4)

This means that the gradient ∇f(x) does not change radically as we change x. Functions f that obey (4) are also referred to as L-smooth. This definition applies whether or not the function f is convex.

Whether or not f is convex, if it is L-smooth, there is a natural quadratic overestimator. Around any point x, we have the upper bound

    f(y) ≤ f(x) + ∇f(x)^T (y − x) + (L/2)‖y − x‖_2^2.    (5)
Here is a picture:

[Figure: the quadratic g(y) = f(x) + ∇f(x)^T (y − x) + (L/2)‖y − x‖_2^2 touches f at y = x and lies above f(y) everywhere.]

We will show that (4) implies (5) in a future homework.

If f is twice differentiable, then there is another way to interpret L-smoothness. If f obeys (4), then we have a uniform upper bound on the largest eigenvalue of the Hessian at every point:

    ∇^2 f(x) ⪯ LI,   for all x.    (6)

This makes intuitive sense, as (4) tells us that the first derivative cannot change too quickly, so there must be some kind of bound on the second derivative. We will establish that (4) implies (6) (again, regardless of whether f is convex) in a future homework.
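For a concrete example, take the least-squares objective from earlier, f(x) = ‖Ax − b‖_2^2, whose Hessian is the constant matrix 2A^T A. Its extreme eigenvalues then give a strong convexity parameter µ and a smoothness constant L directly. A sketch in Python (using NumPy; the random A is an arbitrary choice, made tall so it has full column rank and hence µ > 0):

    import numpy as np

    rng = np.random.default_rng(0)
    M, N = 20, 5
    A = rng.standard_normal((M, N))

    H = 2 * A.T @ A                  # Hessian of ||Ax - b||_2^2 (the same at every x)
    eigs = np.linalg.eigvalsh(H)     # sorted in ascending order
    mu, L = eigs[0], eigs[-1]        # mu*I <= H <= L*I
    print(mu, L)                     # f is mu-strongly convex and L-smooth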

Review: The gradient

First, recall that a function f : R → R is differentiable if its derivative, defined as

    f'(x) = lim_{δ→0} [f(x + δ) − f(x)] / δ,

exists for all x ∈ dom f. To extend this notion to functions of multiple variables, we must first extend our notion of a derivative. For a function f : R^N → R that is defined on N-dimensional vectors, recall that the partial derivative with respect to x_n is

    ∂f(x)/∂x_n = lim_{δ→0} [f(x + δ e_n) − f(x)] / δ,

where e_n is the nth "standard basis element", i.e., the vector of all zeros with a single 1 in the nth entry.

The gradient of a function f : R^N → R is the vector of partial derivatives given by:

    ∇f(x) = [ ∂f(x)/∂x_1,  ∂f(x)/∂x_2,  . . . ,  ∂f(x)/∂x_N ]^T.

Similar to the scalar case, we say that f is differentiable if the gradient exists for each x ∈ dom f.
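The limit definitions also suggest a simple numerical approximation: replace the limit with a small but finite δ along each standard basis direction. A sketch in Python (using NumPy; the helper name, the step size δ, and the example function are arbitrary illustrative choices):

    import numpy as np

    def numerical_gradient(f, x, delta=1e-6):
        # approximate each partial derivative by a central difference quotient
        g = np.zeros_like(x, dtype=float)
        for n in range(x.size):
            e = np.zeros(x.size)
            e[n] = 1.0
            g[n] = (f(x + delta * e) - f(x - delta * e)) / (2 * delta)
        return g

    f = lambda x: np.sum(x**2) + np.sin(x[0])      # an arbitrary smooth function on R^N
    x = np.array([0.3, -1.2, 2.0])
    print(numerical_gradient(f, x))                # approx. [2*0.3 + cos(0.3), -2.4, 4.0]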

We will use the term gradient in two subtly different ways. Sometimes
we use ∇f (x) to describe a vector-valued function or a vector field,

i.e., a function that takes an arbitrary x ∈ R^N and produces another vector. When referring to this vector-valued function, we sometimes use the words gradient map, but sometimes we will overload the term "gradient"; we will use the notation ∇f(x) to refer to the vector given by the gradient map evaluated at a particular point x. So sometimes when we say "gradient" we mean a vector-valued function, and sometimes we mean a single vector, and in both cases we use the notation ∇f(x). Which one we mean will usually be obvious from the context.¹

Note that in some cases we will use the notation ∇_x f(x) to indicate that we are taking the gradient with respect to x. This can be helpful when f is a function of more variables than just x, but most of the time this is not necessary, so we will typically use the simpler ∇f(x).

Here we adopt the convention that the gradient is a column vector. This is the most common choice and is most convenient in this class, but some texts will instead treat the gradient as a row vector. The reason for this is to align with the standard convention for the Jacobian.² Thus, it is always worth double-checking what notation is being used when consulting outside resources.

¹ This is just like in the scalar case, where the notation f(x) can sometimes refer to the function f and sometimes to the function evaluated at x.
² The Jacobian of a vector-valued function f : R^N → R^M is the M × N matrix whose rows contain the partial derivatives of each output coordinate. In this course we will mostly be concerned with functions mapping to a single dimension, in which case the Jacobian would be the 1 × N matrix ∇f(x)^T, i.e., the gradient but treated as a row vector. Directly defining the gradient as a row vector instead of a column vector is thus more convenient in some contexts.

Interpretation of the gradient

The gradient is one of the most fundamental concepts of this course. We can interpret the gradient in many ways. One way to think of the gradient when evaluated at a particular point x is that it defines a linear mapping from R^N to R. Specifically, given a u ∈ R^N, we can use ∇f(x) to define a mapping of u to R by simply taking the inner product between the two vectors:

    ⟨u, ∇f(x)⟩.

What does this mapping tell us? It computes the directional derivative of f in the direction of u, i.e.,

    ⟨u, ∇f(x)⟩ = lim_{δ→0} [f(x + δu) − f(x)] / δ.    (7)

This tells us how fast f is changing at x when we move in the direction of u.
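Equation (7) is easy to test numerically: the inner product with the gradient should match a difference quotient with a small δ. A sketch in Python (using NumPy; the sample function, the point, and the direction are arbitrary illustrative choices):

    import numpy as np

    f      = lambda x: np.log(np.sum(np.exp(x)))       # a sample differentiable function
    grad_f = lambda x: np.exp(x) / np.sum(np.exp(x))

    rng = np.random.default_rng(0)
    x, u = rng.standard_normal(4), rng.standard_normal(4)

    delta = 1e-6
    quotient = (f(x + delta * u) - f(x)) / delta
    print(u @ grad_f(x), quotient)                     # the two values agree closely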

This fundamental fact is a direct consequence of Taylor's theorem (see the Technical Details section below). Specifically, let f : R^N → R be any differentiable function. Then for any u ∈ R^N, we can write

    f(x + u) = f(x) + ⟨u, ∇f(x)⟩ + h(u)‖u‖_2,

where h : R^N → R is some function satisfying h(u) → 0 as u → 0.

If we substitute δu in place of u above and rearrange, we obtain the identity

    ⟨u, ∇f(x)⟩ = [f(x + δu) − f(x) − h(δu)‖δu‖_2] / δ
               = [f(x + δu) − f(x)] / δ − h(δu)‖u‖_2.

Note that this holds for any δ > 0. Since h(δu) → 0 as δ → 0, we
can arrive at (7) by simply taking the limit as δ → 0.

A related way to think of ∇f(x) is as a vector that is pointing in the direction of steepest ascent, i.e., the direction in which f increases the fastest when starting at x. To justify this, note that we just observed that we can interpret ⟨u, ∇f(x)⟩ as measuring how quickly f increases when we move in the direction of u. How can we find the direction u that maximizes this quantity? You may recall that the Cauchy-Schwarz inequality tells us that

    |⟨u, ∇f(x)⟩| ≤ ‖∇f(x)‖_2 ‖u‖_2,

and that this holds with equality when u is co-linear with ∇f(x), i.e., when u points in the same direction as ∇f(x). Specifically, this implies that ∇f(x) is the direction of steepest ascent, and −∇f(x) is the direction of steepest descent.

More broadly, this characterizes the entire sets of ascent/descent directions. Suppose that f : R^N → R is differentiable at x. If u ∈ R^N is a vector obeying ⟨u, ∇f(x)⟩ < 0, then we say that u is a descent direction from x, meaning we can find a t > 0 small enough so that

    f(x + tu) < f(x).    (8)

Similarly, if ⟨u, ∇f(x)⟩ > 0, then we say that u is an ascent direction from x, as again for t > 0 small enough,

    f(x + tu) > f(x).

It should hopefully not be a huge stretch of the imagination to see that being able to compute the direction of steepest ascent (or steepest descent) will be useful in the context of finding a maximum/minimum of a function.

To show that ⟨u, ∇f(x)⟩ < 0 implies (8), we again use Taylor's theorem to get

    f(x + tu) = f(x) + t (⟨u, ∇f(x)⟩ + h(tu)‖u‖_2),

where now we have h(tu) → 0 as t → 0. For t > 0 small enough, we can make |h(tu)| · ‖u‖_2 < |⟨u, ∇f(x)⟩|, and so the term inside the parentheses above is negative if ⟨u, ∇f(x)⟩ is negative, and it is positive if ⟨u, ∇f(x)⟩ is positive.
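To make this concrete, the sketch below (in Python, using NumPy; the sample function, starting point, and step sizes are arbitrary illustrative choices) takes u = −∇f(x), which satisfies ⟨u, ∇f(x)⟩ = −‖∇f(x)‖_2^2 < 0 whenever the gradient is nonzero, and checks that small steps along u decrease f:

    import numpy as np

    f      = lambda x: np.sum((x - 1.0)**2) + np.exp(x[0])
    grad_f = lambda x: 2 * (x - 1.0) + np.exp(x[0]) * np.array([1.0, 0.0, 0.0])

    x = np.array([0.5, -2.0, 3.0])
    u = -grad_f(x)                       # a descent direction: <u, grad f(x)> < 0
    for t in [1e-1, 1e-2, 1e-3]:
        print(t, f(x + t * u) < f(x))    # True for each of these small step sizes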

Technical Details: Taylor’s Theorem

You might recall the mean-value theorem from your first calculus class. If f : R → R is a differentiable function on the interval [a, x], then there is a point inside this interval where the derivative of f matches the slope of the line drawn between f(a) and f(x). More precisely, there exists a z ∈ [a, x] such that

    f'(z) = [f(x) − f(a)] / (x − a).
Here is a picture:

[Figure: the secant line through (a, f(a)) and (x, f(x)) has the same slope as the tangent to f at the intermediate point z, i.e., f'(z) = (f(x) − f(a))/(x − a).]

We can re-arrange the expression above to say that there is some z between a and x such that

    f(x) = f(a) + f'(z)(x − a).

The mean-value theorem extends to derivatives of higher order; in this case it is known as Taylor's theorem. For example, suppose that f is twice differentiable on [a, x], and that the first derivative f' is continuous. Then there exists a z between a and x such that

    f(x) = f(a) + f'(a)(x − a) + (f''(z)/2)(x − a)^2.

In general, if f is k + 1 times differentiable, and the first k derivatives are continuous, then there is a point z between a and x such that

    f(x) = p_{k,a}(x) + (f^{(k+1)}(z) / (k+1)!) (x − a)^{k+1},

where p_{k,a}(x) is the degree-k Taylor polynomial of f around a:

    p_{k,a}(x) = f(a) + f'(a)(x − a) + (f''(a)/2)(x − a)^2 + · · · + (f^{(k)}(a)/k!)(x − a)^k.

These results give us a way to quantify the accuracy of the Taylor approximation around a point. For example, if f is twice differentiable with f' continuous, then

    f(x) = f(a) + f'(a)(x − a) + h_1(x)(x − a),

for a function h_1(x) that goes to zero as x goes to a:

    lim_{x→a} h_1(x) = 0.

In fact, you do not even need two derivatives for this to be true. If f has a single derivative, then we can find such an h_1. When f has two derivatives, then we have an explicit form for h_1:

    h_1(x) = (f''(z_x)/2)(x − a),

where z_x is the point returned by the (generalization of the) mean value theorem for a given x.

In general, if f has k derivatives, then there exists an h_k(x) with lim_{x→a} h_k(x) = 0 such that

    f(x) = p_{k,a}(x) + h_k(x)(x − a)^k.

All of the results above extend to functions of multiple variables. For example, if f : R^N → R is differentiable, then around any point a,

    f(x) = f(a) + ⟨x − a, ∇f(a)⟩ + h_1(x)‖x − a‖_2,

where h_1(x) → 0 as x approaches a from any direction. If f is twice differentiable and the first derivative is continuous, then there exists z on the line between a and x such that

    f(x) = f(a) + ⟨x − a, ∇f(a)⟩ + (1/2)(x − a)^T ∇^2 f(z)(x − a).

We will use these two particular multidimensional results in this course, referring to them generically as "Taylor's theorem".
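As a numerical sanity check of the h_1 characterization above in one dimension: the error of the linear approximation, divided by x − a, should vanish as x → a. A sketch in Python (using NumPy; the function and the base point a are arbitrary illustrative choices):

    import numpy as np

    f      = lambda x: np.exp(x) * np.sin(x)
    fprime = lambda x: np.exp(x) * (np.sin(x) + np.cos(x))

    a = 0.7
    for dx in [1e-1, 1e-2, 1e-3, 1e-4]:
        x = a + dx
        h1 = (f(x) - f(a) - fprime(a) * (x - a)) / (x - a)
        print(dx, h1)                    # h1 shrinks roughly in proportion to x - a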

References
[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

Georgia Tech ECE 6270 Notes by M. Davenport and J. Romberg. Last updated 16:20, October 12, 2022
