03 Convex Functions Notes Cvxopt f22
[Figure: a convex function; the chord from (x, f(x)) to (y, f(y)) lies above the graph, so f(θx + (1 − θ)y) sits below θf(x) + (1 − θ)f(y).]
We say that f is strictly convex if dom f is convex and
f (θx + (1 − θ)y) < θf (x) + (1 − θ)f (y)
for all x, y ∈ dom f with x ≠ y, and all 0 < θ < 1.
Note that in the definition above, the domain matters. For example,
f(x) = x³
is convex if dom f = R+ = [0, ∞) but not if dom f = R.
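Since convexity is a pointwise inequality, it is easy to probe numerically. Below is a minimal numpy sketch (the random sampling scheme and tolerance are our own choices) that searches for violations of the defining inequality for f(x) = x³ on each of the two domains:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x**3

def violates_convexity(lo, hi, n_trials=10_000):
    """Search for x, y, theta with f(theta x + (1-theta) y) > theta f(x) + (1-theta) f(y)."""
    x, y = rng.uniform(lo, hi, (2, n_trials))
    theta = rng.uniform(0, 1, n_trials)
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    return np.any(lhs > rhs + 1e-9)

print(violates_convexity(0, 10))    # False: no violations, consistent with convexity on R+
print(violates_convexity(-10, 10))  # True: x^3 is not convex on all of R
```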
The epigraph
A useful notion that we will encounter later in the course is that of
the epigraph of a function. The epigraph of a function f : R^N → R is the subset of R^(N+1) created by filling in the space above f:
epi f = { (x, t) ∈ R^(N+1) : x ∈ dom f, f(x) ≤ t }.
[Figure: the shaded region epi f, consisting of all points on or above the graph of f.]
• powers x^α are:
– convex on R+ for α ≥ 1,
– concave on R+ for 0 ≤ α ≤ 1,
– convex on R++ for α ≤ 0.
• |x|^α is convex on all of R for α ≥ 1.
• logarithms: log x is concave on R++ := {x ∈ R : x > 0}.
• the entropy function −x log x is concave on R++.
Example:
Let f(X) = − log det X with dom f = S^N_++, where S^N_++ denotes the set of symmetric and (strictly) positive definite N × N matrices. For any X ∈ S^N_++, we know that
X = UΛUᵀ,
for some orthogonal U and diagonal, positive Λ with diagonal entries λ₁, . . . , λ_N. Since det X = λ₁ · · · λ_N, we can define f through the eigenvalues of X:
f(X) = − log det X = − Σₙ log λₙ.
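A quick numpy sanity check of this identity, with a randomly generated positive definite matrix of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
B = rng.standard_normal((N, N))
X = B @ B.T + N * np.eye(N)      # symmetric, strictly positive definite

lam = np.linalg.eigvalsh(X)      # eigenvalues of X (all positive)
print(np.log(np.linalg.det(X)))  # log det X
print(np.sum(np.log(lam)))       # sum of log eigenvalues: same value
```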
• Composition with scalar functions: Consider the function f(x) = h(g(x)), where g : R^N → R and h : R → R.
– f is convex if g is convex and h is convex and non-decreasing.
Example: e^{g(x)} is convex if g is convex.
– f is convex if g is concave and h is convex and non-increasing.
Example: 1/g(x) is convex if g is concave and positive.
• Max of convex functions: If f₁ and f₂ are convex, then f(x) = max(f₁(x), f₂(x)) is convex.
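Both the composition rules and the max rule above lend themselves to the same kind of numerical spot check as before. Here is a minimal sketch (the test functions and sampling are our own choices) for e^{g(x)} with g convex, and for a pointwise max of two convex functions:

```python
import numpy as np

rng = np.random.default_rng(0)

def convex_on_samples(f, lo, hi, n=10_000):
    """Check f(t x + (1-t) y) <= t f(x) + (1-t) f(y) on random samples."""
    x, y = rng.uniform(lo, hi, (2, n))
    t = rng.uniform(0, 1, n)
    lhs = f(t * x + (1 - t) * y)
    rhs = t * f(x) + (1 - t) * f(y)
    return np.all(lhs <= rhs + 1e-8 * (1 + np.abs(rhs)))

g = lambda x: (x - 1)**2                                 # convex
print(convex_on_samples(lambda x: np.exp(g(x)), -3, 3))  # True: h = exp is convex, non-decreasing
print(convex_on_samples(lambda x: np.maximum(np.abs(x), g(x)), -3, 3))  # True: max of convex
```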
If f is differentiable, then it is convex if and only if dom f is convex and
f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for all x, y ∈ dom f. (1)
[Figure: f(y) and the tangent line at y = x, which underestimates f everywhere.]
It is also true that (1) ⇒ f convex. To see this, choose arbitrary x, y and set z_θ = (1 − θ)x + θy; then (1) tells us
f(w) ≥ f(z_θ) + ∇f(z_θ)ᵀ(w − z_θ).
Applying this with w = x and with w = y, multiplying the two inequalities by (1 − θ) and θ respectively, and adding them, the gradient terms cancel (since (1 − θ)(x − z_θ) + θ(y − z_θ) = 0), leaving
(1 − θ)f(x) + θf(y) ≥ f((1 − θ)x + θy),
which is exactly the definition of convexity.
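Here is a minimal numerical probe of the global-underestimator property (1); the test function, log-sum-exp with its well-known softmax gradient, is our own choice:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.log(np.sum(np.exp(x)))         # log-sum-exp, a smooth convex function
grad = lambda x: np.exp(x) / np.sum(np.exp(x))  # its gradient (the softmax)

for _ in range(1000):
    x, y = rng.standard_normal((2, 4))
    # the tangent plane at x never overestimates f
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-9
print("(1) held on every sampled pair")
```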
If f is twice differentiable, there is also a second-order characterization: f is convex if and only if dom f is convex and
∇²f(x) ⪰ 0 for all x ∈ dom f.
Note that for a one-dimensional function f : R → R, the above condition just reduces to f″(x) ≥ 0. You can prove the one-dimensional version relatively easily (although we will not do so here) using the first-order characterization of convexity described above and the definition of the second derivative. You can then prove the general case by considering the function g(t) = f(x + tv). To see how, note that if f is convex and twice differentiable, then so is g. Using the chain rule, we have
g″(t) = vᵀ∇²f(x + tv)v.
Since g is convex, the one-dimensional result above tells us that g″(0) ≥ 0, and hence vᵀ∇²f(x)v ≥ 0. Since this has to hold for any v, this means that ∇²f(x) ⪰ 0. The proof that ∇²f(x) ⪰ 0 for all x implies convexity follows a similar strategy.
In addition, if
∇²f(x) ≻ 0 (i.e., ∇²f(x) ∈ S^N_++) for all x ∈ dom f,
then f is strictly convex. The converse is not quite true; it is possible that f is strictly convex even if ∇²f(x) has eigenvalues that are zero at isolated points. For example, f(x) = |x|³ is strictly convex but f″(0) = 0.
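We can probe the second-order condition the same way. The sketch below (the functions are our own choices) forms the closed-form Hessian of log-sum-exp and confirms that its eigenvalues are nonnegative at random points; it also evaluates f″ for f(x) = |x|³ to exhibit the isolated zero at the origin:

```python
import numpy as np

rng = np.random.default_rng(0)

def lse_hessian(x):
    """Hessian of log-sum-exp: diag(p) - p p^T, where p = softmax(x)."""
    p = np.exp(x - x.max())
    p /= p.sum()
    return np.diag(p) - np.outer(p, p)

for _ in range(100):
    H = lse_hessian(rng.standard_normal(4))
    assert np.linalg.eigvalsh(H).min() >= -1e-12  # positive semidefinite at every test point

# f(x) = |x|^3 has f''(x) = 6|x|: zero at x = 0, positive everywhere else.
fpp = lambda x: 6 * np.abs(x)
print(fpp(0.0), fpp(0.5))   # 0.0 3.0
```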
Least-squares:
f(x) = ‖Ax − b‖₂²,
where A is an arbitrary M × N matrix, has
∇f(x) = 2Aᵀ(Ax − b) and ∇²f(x) = 2AᵀA ⪰ 0,
and so f is convex for any choice of A and b.
Quadratic-over-linear:
In R², if
f(x) = x₁²/x₂,
then
∇f(x) = [ 2x₁/x₂ , −x₁²/x₂² ]ᵀ,
∇²f(x) = (2/x₂³) [ x₂² −x₁x₂ ; −x₁x₂ x₁² ] = (2/x₂³) [ x₂ ; −x₁ ] [ x₂ −x₁ ],
and so f is convex on R × (0, ∞) (x₁ ∈ R, x₂ > 0), since the Hessian is a nonnegative multiple of the outer product vvᵀ with v = [ x₂ ; −x₁ ].
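A short sketch confirming the rank-one structure of this Hessian and its positive semidefiniteness for x₂ > 0 (the random test points are our own choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def hess(x1, x2):
    """Hessian of x1^2 / x2, written as the outer product (2/x2^3) v v^T with v = (x2, -x1)."""
    v = np.array([x2, -x1])
    return (2 / x2**3) * np.outer(v, v)

for _ in range(1000):
    x1, x2 = rng.standard_normal(), rng.uniform(0.1, 10)
    H = hess(x1, x2)
    assert np.linalg.eigvalsh(H).min() >= -1e-8 * (1 + np.abs(H).max())
print("Hessian is PSD at every sampled point with x2 > 0")
```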
If f is differentiable, there is another interpretation of strong convex-
ity. We have seen that an equivalent definition of regular convexity
is that the linear approximation formed using the gradient at a point
x is a global underestimator of the function — see (1) and the pic-
ture below. If f obeys the strong convexity condition (2), then we can form a quadratic global
underestimator as
f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (μ/2)‖y − x‖₂². (3)
Here is a picture:
[Figure: f(y) and the quadratic underestimator g(y) = f(x) + ∇f(x)ᵀ(y − x) + (μ/2)‖y − x‖₂², which touches f at y = x.]
If f is twice differentiable, (3) says precisely that the function f(x) − (μ/2)‖x‖₂² is convex, and hence that
∇²( f(x) − (μ/2)‖x‖₂² ) ⪰ 0.
Thus (since ∇²(‖x‖₂²) = 2I),
∇²f(x) − μI ⪰ 0,
i.e.,
∇²f(x) ⪰ μI.
This is just a fancy way of saying that the smallest eigenvalue of the Hessian ∇²f(x) is uniformly bounded below by μ for all x.
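For a concrete sketch (the quadratic below is our own choice), take f(x) = ½xᵀQx with Q ≻ 0. Then ∇²f(x) = Q everywhere, so the best (largest) μ is exactly the smallest eigenvalue of Q, and the underestimator (3) can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
Q = B @ B.T + np.eye(5)           # symmetric positive definite

f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x
mu = np.linalg.eigvalsh(Q).min()  # smallest eigenvalue of the (constant) Hessian

for _ in range(1000):
    x, y = rng.standard_normal((2, 5))
    assert f(y) >= f(x) + grad(x) @ (y - x) + mu / 2 * np.sum((y - x)**2) - 1e-9
print("quadratic underestimator (3) held with mu =", mu)
```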
Now suppose instead that the gradient of f is Lipschitz: there is a constant L such that
‖∇f(x) − ∇f(y)‖₂ ≤ L‖x − y‖₂ for all x, y ∈ dom f. (4)
This means that the gradient ∇f(x) does not change radically as we change x. Functions f that obey (4) are also referred to as L-smooth. This definition applies whether or not the function f is convex.
If f is L-smooth, then whether or not it is convex there is a natural quadratic overestimator: around any point x, we have the upper bound
f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (L/2)‖y − x‖₂². (5)
Here is a picture:
[Figure: f(y) and the quadratic overestimator f(x) + ∇f(x)ᵀ(y − x) + (L/2)‖y − x‖₂², which touches f at y = x.]
This makes intuitive sense, as (4) tells us that the first derivative cannot change too quickly, so when f is twice differentiable there must be some kind of bound on the second derivative:
−LI ⪯ ∇²f(x) ⪯ LI. (6)
We will establish that (4) implies (6) (again, regardless of whether f is convex) in a future homework.
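As a sketch (the problem data are generated at random), the least-squares objective f(x) = ‖Ax − b‖₂² has constant Hessian 2AᵀA, so it is L-smooth with L = 2‖A‖², twice the largest eigenvalue of AᵀA, and the upper bound (5) can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

f = lambda x: np.sum((A @ x - b)**2)
grad = lambda x: 2 * A.T @ (A @ x - b)
L = 2 * np.linalg.norm(A, 2)**2   # twice the squared spectral norm of A

for _ in range(1000):
    x, y = rng.standard_normal((2, 5))
    assert f(y) <= f(x) + grad(x) @ (y - x) + L / 2 * np.sum((y - x)**2) + 1e-6
print("quadratic overestimator (5) held with L =", L)
```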
Review: The gradient
First, recall that a function f : R → R is differentiable if its deriva-
tive, defined as
f′(x) = lim_{δ→0} ( f(x + δ) − f(x) ) / δ,
exists for all x ∈ dom f. To extend this notion to functions of multiple variables, we must first extend our notion of a derivative. For a function f : R^N → R that is defined on N-dimensional vectors, recall that the partial derivative with respect to xₙ is
∂f(x)/∂xₙ = lim_{δ→0} ( f(x + δeₙ) − f(x) ) / δ,
where eₙ is the nth “standard basis element”, i.e., the vector of all zeros with a single 1 in the nth entry.
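The limit definition suggests a simple finite-difference approximation to the gradient. Here is a minimal sketch (the step size and test function are our own choices):

```python
import numpy as np

def fd_gradient(f, x, delta=1e-6):
    """Approximate each partial derivative by a forward difference."""
    g = np.zeros_like(x)
    for n in range(x.size):
        e_n = np.zeros_like(x)
        e_n[n] = 1.0              # nth standard basis element
        g[n] = (f(x + delta * e_n) - f(x)) / delta
    return g

f = lambda x: np.sum(x**2)        # gradient is 2x
x = np.array([1.0, -2.0, 3.0])
print(fd_gradient(f, x))          # approximately [2, -4, 6]
```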
We will use the term gradient in two subtly different ways. Sometimes we use ∇f(x) to describe a vector-valued function or a vector field, i.e., a function that takes an arbitrary x ∈ R^N and produces another vector. When referring to this vector-valued function, we sometimes use the words gradient map, but sometimes we will overload the term “gradient”; we will use the notation ∇f(x) to refer to the vector given by the gradient map evaluated at a particular point x. So sometimes when we say “gradient” we mean a vector-valued function, and sometimes we mean a single vector, and in both cases we use the notation ∇f(x). Which one we mean will usually be obvious from the context.¹
Note that in some cases we will use the notation ∇ₓf(x) to indicate
that we are taking the gradient with respect to x. This can be helpful
when f is a function of more variables than just x, but most of the
time this is not necessary so we will typically use the simpler ∇f (x).
¹ This is just like in the scalar case, where the notation f(x) can sometimes refer to the function f and sometimes to the function evaluated at x.
² The Jacobian of a vector-valued function f : R^N → R^M is the M × N matrix of partial derivatives, with one row for each dimension of the range. In this course we will mostly be concerned with functions mapping to a single dimension, in which case the Jacobian would be the 1 × N matrix ∇ᵀf(x), i.e., the gradient but treated as a row vector. Directly defining the gradient as a row vector instead of a column vector is thus more convenient in some contexts.
Interpretation of the gradient
Suppose f is differentiable at x and let u be a unit vector. Taylor's theorem (see the technical details at the end of these notes) tells us that
f(x + δu) = f(x) + δ⟨u, ∇f(x)⟩ + δ · h(δu),
where h(δu) → 0 as δ → 0, and hence
( f(x + δu) − f(x) ) / δ = ⟨u, ∇f(x)⟩ + h(δu).
Note that this holds for any δ > 0. Since h(δu) → 0 as δ → 0, we can arrive at (7), the expression for the directional derivative of f at x in the direction u,
lim_{δ→0} ( f(x + δu) − f(x) ) / δ = ⟨u, ∇f(x)⟩, (7)
by simply taking the limit as δ → 0.
By the Cauchy–Schwarz inequality, for any unit vector u we have
⟨u, ∇f(x)⟩ ≤ ‖∇f(x)‖₂,
and this holds with equality when u is co-linear with ∇f(x), i.e., when u points in the same direction as ∇f(x). Specifically, this implies that ∇f(x) is the direction of steepest ascent, and −∇f(x) is the direction of steepest descent.
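A quick sketch of this fact (the test function is our own choice): among many random unit vectors u, the directional derivative ⟨u, ∇f(x)⟩ never exceeds ‖∇f(x)‖₂, which is attained by u = ∇f(x)/‖∇f(x)‖₂:

```python
import numpy as np

rng = np.random.default_rng(0)
grad = lambda x: 2 * x                         # gradient of f(x) = ||x||_2^2
x = np.array([1.0, -2.0, 0.5])
g = grad(x)

U = rng.standard_normal((10_000, 3))
U /= np.linalg.norm(U, axis=1, keepdims=True)  # 10,000 random unit directions
print((U @ g).max())                           # close to, but never above, ||g||_2
print(np.linalg.norm(g))                       # attained in the direction of g
```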
The fact that −∇f(x) is the direction of steepest descent will be useful in the context of finding a maximum/minimum of a function.
To show that ⟨u, ∇f(x)⟩ < 0 implies (8), we again use the Taylor theorem to get
f(x + δu) = f(x) + δ⟨u, ∇f(x)⟩ + δ · h(δu);
since h(δu) → 0 as δ → 0, for all sufficiently small δ > 0 we have f(x + δu) < f(x).
Technical Details: Taylor’s Theorem
You might recall the mean-value theorem from your first calculus
class. If f : R → R is a differentiable function on the interval
[a, x], then there is a point inside this interval where the derivative
of f matches the line drawn between f (a) and f (x). More precisely,
there exists a z ∈ [a, x] such that
f′(z) = ( f(x) − f(a) ) / ( x − a ).
Here is a picture:
[Figure: the graph of f on [a, x], the secant line from (a, f(a)) to (x, f(x)), and a point z between a and x where the tangent slope f′(z) = (f(x) − f(a))/(x − a) matches the secant.]
In general, if f is k + 1 times differentiable, and the first k derivatives are continuous, then there is a point z between a and x such that
f(x) = p_{k,a}(x) + ( f^(k+1)(z) / (k + 1)! ) (x − a)^(k+1),
where p_{k,a}(x) is the degree-k Taylor polynomial of f around a:
p_{k,a}(x) = f(a) + f′(a)(x − a) + ( f″(a) / 2 )(x − a)² + · · · + ( f^(k)(a) / k! )(x − a)^k.
These results give us a way to quantify the accuracy of the Taylor approximation around a point. For example, if f is twice differentiable, then
f(x) = f(a) + f′(a)(x − a) + h₁(x)(x − a),
for a function h₁(x) that goes to zero as x goes to a:
lim_{x→a} h₁(x) = 0.
In fact, you do not even need two derivatives for this to be true. If f has a single derivative, then we can find such an h₁. When f has two derivatives, then we have an explicit form for h₁:
h₁(x) = ( f″(z_x) / 2 )(x − a),
where z_x is the point returned by the (generalization of) the mean value theorem for a given x.
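Here is a sketch checking numerically that h₁(x) → 0, for the specific choice f(x) = eˣ around a = 0 (our own example):

```python
import numpy as np

f = fprime = np.exp   # f(x) = e^x has f'(x) = e^x
a = 0.0

for x in [1.0, 0.1, 0.01, 0.001]:
    h1 = (f(x) - f(a) - fprime(a) * (x - a)) / (x - a)
    print(x, h1)      # h1 shrinks roughly like (x - a)/2, consistent with f''(0)/2 = 1/2
```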
All of the results above extend to functions of multiple variables. For example, if f : R^N → R is differentiable, then around any point a,
f(x) = f(a) + ⟨x − a, ∇f(a)⟩ + h₁(x)‖x − a‖₂,
where h₁(x) → 0 as x approaches a from any direction. If f is twice differentiable and the first derivative is continuous, then there exists z on the line between a and x such that
f(x) = f(a) + ⟨x − a, ∇f(a)⟩ + ½ (x − a)ᵀ∇²f(z)(x − a).
We will use these two particular multidimensional results in this
course, referring to them generically as “Taylor’s theorem”.
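A sketch of the first of these multidimensional results, using a smooth function of our own choosing; the normalized remainder h₁(x) shrinks as x approaches a along a random direction:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.log(np.sum(np.exp(x)))         # log-sum-exp again
grad = lambda x: np.exp(x) / np.sum(np.exp(x))  # its gradient

a = rng.standard_normal(4)
u = rng.standard_normal(4)
u /= np.linalg.norm(u)                          # fixed random direction

for t in [1.0, 0.1, 0.01, 0.001]:
    x = a + t * u
    h1 = (f(x) - f(a) - grad(a) @ (x - a)) / np.linalg.norm(x - a)
    print(t, h1)                                # h1 -> 0 as x -> a
```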