Accelerated and Inexact Forward-Backward Algorithms
DOI: 10.1137/110844805
We consider the convex minimization problem

min_{x∈H} F(x),  F = f + g,

where
(H1) g : H → ]−∞, +∞] is proper, lower semicontinuous (l.s.c.), and convex,
(H2) f : H → R is convex and differentiable, and ∇f is L-Lipschitz continuous on H with L > 0, namely

‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖  for all x, y ∈ H.
induced by g [17, 18, 35]. These schemes are also known under the name of proximal
gradient methods [61], since the implicit step relies on the computation of the so-
called proximity operator, introduced by Moreau in [39]. Though appealing for their
simplicity, gradient-based methods often exhibit a slow speed of convergence. For
this reason, resorting to the ideas contained in the work of Nesterov [44], there has
recently been an active interest in accelerations and modifications of the classical
forward-backward splitting algorithm [61, 45, 7]. We will study a general accelerated scheme of this type, formalized as (AIFB) in section 4.

One can rely on different algorithms to compute the proximal point, as, for instance, those in [12, 19, 14]. This resolves the issue of convergence and applicability of the two-loops algorithm.
Note also that none of the abovementioned papers study the rate of convergence
of the nested algorithm, as we do in section 5.
Note that if z ≈_ε prox_{λg}(y), then necessarily z ∈ dom g, and hence the allowed approximations are always feasible. This notion was first proposed, in the context of the proximal point algorithm, in [34] and successfully used in, e.g., [1, 19, 52]. A relative version of criterion (2.3) has recently been proposed for nonaccelerated proximal methods in the preprint [37], which allows one to interpret the (exact) forward-backward splitting algorithm as an instance of an inexact proximal point algorithm.
Geometrically, the admissible approximations z ≈_ε prox_{λg}(y) lie on one side of a hyperplane which is normal to y − z and at distance ε²/(2‖y − z‖) from z. See Figure 2.1.
In the following we provide an analysis of the notion of inexactness given in Definition 2.1, which will clarify the nature of these approximations and their scope of applicability. To this purpose, we will make use of a duality technique, an approach that is quite common in signal recovery and image processing applications [18, 12, 16]. The starting point is the Moreau decomposition formula [41, 18], stating that

y = prox_{λg}(y) + λ prox_{g*/λ}(y/λ).
This arises immediately from Definition 2.1 and the following equivalence (see Theorem 2.4.4, item (iv), in [65]):

ξ ∈ ∂_ε g(z) ⟺ g(z) + g*(ξ) − ⟨ξ, z⟩ ≤ ε.

When g = ω ∘ B as in (2.7), the computation of prox_{λg}(y) amounts to solving

(2.8) min_{x∈H} Φ_λ(x),  Φ_λ(x) = ω(Bx) + (1/(2λ))‖x − y‖².
From now on, we assume ω is continuous at Bx₀ for some x₀ ∈ H. Then, the Fenchel–Moreau–Rockafellar duality formula (see Corollary 2.8.5 in [65]) states that

min_{x∈H} Φ_λ(x) = −min_v Ψ_λ(v),

where

(2.10) Ψ_λ(v) = (1/(2λ))‖λB*v − y‖² + ω*(v) − (1/(2λ))‖y‖²,
or, equivalently, the minimum of the duality gap

G(x, v) = Φ_λ(x) + Ψ_λ(v)

is zero. In particular, for x = y − λB*v we can estimate

(2.12)
G(y − λB*v, v) = Φ_λ(y − λB*v) + Ψ_λ(v)
  = (1/(2λ))(‖λB*v‖² − 2⟨λB*v, y⟩) + (1/(2λ))‖λB*v‖² + sup_{w∈H} (⟨w, y − λB*v⟩ − g*(w)) + ω*(v)
  = ⟨B*v, λB*v − y⟩ + sup_{w∈H} (⟨w, y − λB*v⟩ − g*(w)) + ω*(v)
  ≥ sup_{w∈H} (⟨w − B*v, y − λB*v⟩ − g*(w)) + g*(B*v)
  = sup_{w∈H} −[g*(w) − g*(B*v) − ⟨w − B*v, y − λB*v⟩],

where the inequality uses ω*(v) ≥ g*(B*v).
Therefore, if G(y − λB*v, v) ≤ ε²/(2λ), setting η = ε/λ it holds

(2.13) g*(w) − g*(B*v) ≥ ⟨w − B*v, y − λB*v⟩ − η²λ/2  for all w ∈ H,

which is equivalent to y − λB*v ∈ ∂_{η²λ/2} g*(B*v) and hence to B*v ≈_η prox_{g*/λ}(y/λ).
As regards the second part of the statement, assuming ω ∗ (v) = g ∗ (B ∗ v), the inequality
in (2.12) becomes an equality and condition (a) is then equivalent to (2.13). Thus,
the reverse implication (b) ⇒ (a) follows.
Remark 1. In Proposition 2.3 the assumption ω*(v) = g*(B*v), guaranteeing the equivalence of statements (a), (b), (c), holds in the following cases:
• ω is positively homogeneous. Indeed, in that case ω* = δ_S with S = ∂ω(0), and g* = δ_K with K = ∂g(0) = B*(S). Thus, if v ∈ S, we have ω*(v) = δ_S(v) = δ_K(B*v) = g*(B*v). This entails that

G(y − λB*v, v) ≤ ε²/(2λ) ⟺ λB*v ≈_ε P_{λK}(y) ⟺ y − λB*v ≈_ε prox_{λg}(y).
• B is surjective. Indeed, in that case g*(B*v) = sup_{x∈H} (⟨Bx, v⟩ − ω(Bx)) = ω*(v). For instance, for B = id, it holds

G(y − λv, v) ≤ ε²/(2λ) ⟺ v ≈_η prox_{g*/λ}(y/λ).
We underline that in the two cases above the proposed inexact notion of prox is fully
characterized by means of the duality gap.
Summarizing, the implication (a) ⇒ (c) stated in Proposition 2.3 ensures that
admissible approximations of proximal points, in the sense of Definition 2.1, can
always be computed by approximately minimizing the duality gap G(y − λB ∗ v, v). In
general, condition (a) of Proposition 2.3 is only a sufficient condition to get inexact
proximal points with precision ε. However, as discussed in Remark 1, it becomes a
full characterization of inexact proximal points for a relevant class of penalties. We
finally highlight that condition (a) can be easily checked in practice and will be the basis of the analysis of the convergence rate for the nested procedure in section 5.2.
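For illustration, condition (a) is straightforward to evaluate numerically. The following is a minimal NumPy sketch of the criterion (all identifiers are ours and purely illustrative; ω and ω* are assumed to be available as callables, and B as a dense array):

    import numpy as np

    def duality_gap(v, y, lam, B, omega, omega_conj):
        """G(y - lam*B^T v, v) = Phi_lam(y - lam*B^T v) + Psi_lam(v), cf. (2.8), (2.10)."""
        z = y - lam * (B.T @ v)              # primal point associated with the dual variable v
        primal = omega(B @ z) + np.sum((z - y) ** 2) / (2 * lam)
        dual = (np.sum((lam * (B.T @ v) - y) ** 2) / (2 * lam)
                + omega_conj(v) - np.sum(y ** 2) / (2 * lam))
        return primal + dual

    def admissible(v, y, lam, B, omega, omega_conj, eps):
        """Condition (a) of Proposition 2.3: if True, z = y - lam*B^T v is an
        eps-approximation of prox_{lam g}(y) in the sense of Definition 2.1."""
        return duality_gap(v, y, lam, B, omega, omega_conj) <= eps ** 2 / (2 * lam)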
2.2. Comparison with other kinds of approximation. Other notions of
inexactness for the proximity operator have been considered in the literature. One of
the first is

(2.14) d(0, ∂Φ_λ(z)) ≤ ε/λ,

which was proposed in [48] and treated also in [30].
Another notion, which we shall use in the appendix, replaces the exact minimum in (2.1) with ε²/(2λ)-minima, and is defined as follows:

(2.15) z ≃_ε prox_{λg}(y) :⟺ Φ_λ(z) ≤ inf Φ_λ + ε²/(2λ)

(we write ≃ to distinguish this notion from the one of Definition 2.1, denoted by ≈).
The condition on the right-hand side of (2.15) is equivalent to 0 ∈ ∂_{ε²/(2λ)} Φ_λ(z) and implies, by the strong convexity of Φ_λ, ‖z − prox_{λg}(y)‖ ≤ ε (see [52]). This type of error was first considered in [3] and then employed, for instance, in [19, 52, 66]. Lemma 1 in [52] shows that the criterion in (2.15) is more general than both those in (2.3) and (2.14). We also note that (again from Lemma 1 in [52]) the error criterion proposed in [38, 55] for the approximate hybrid extragradient-proximal point algorithm corresponds to a relative version of (2.15).
Here, to help position the proposed criterion, we give a proposition and a corollary that directly link approximations in the sense of (2.3) with those in the sense of (2.15), valid for a subclass of functions g.
Proposition 2.4. Let g : H → ]−∞, +∞] be proper, convex, and l.s.c. with dom g bounded, and y, z ∈ H. For every ε > 0, if 0 < δ ≤ diam(dom g) and diam(dom g)δ ≤ ε²/2, then

z ≃_δ prox_{λg}(y) ⟹ z ≈_ε prox_{λg}(y).
Proof. Let z ≃_δ prox_{λg}(y). Thanks to Lemma 1 in [52], there exist δ₁, δ₂ ≥ 0 with δ₁² + δ₂² ≤ δ² and e ∈ H, ‖e‖ ≤ δ₂, such that (y + e − z)/λ ∈ ∂_{δ₁²/(2λ)} g(z). Therefore, for every x ∈ dom g,

λg(x) − λg(z) ≥ ⟨x − z, y − z⟩ − diam(dom g)δ₂ − δ₁²/2.

Now it is easy to show that, if 0 < δ ≤ diam(dom g), then

sup_{δ₁²+δ₂²≤δ²} (diam(dom g)δ₂ + δ₁²/2) = diam(dom g)δ.

Thus, if diam(dom g)δ ≤ ε²/2, it holds λg(x) − λg(z) ≥ ⟨x − z, y − z⟩ − ε²/2 for every x ∈ dom g, which proves that (y − z)/λ ∈ ∂_{ε²/(2λ)} g(z).
Proposition 2.4 states that for each ε > 0 one can get approximations of proximal
points in the sense of Definition 2.1 from approximations in the sense of (2.15) as
soon as the precision δ is chosen small enough.
Corollary 2.5. Let g : H → R be proper, convex, and l.s.c. with dom g* bounded, and y, z ∈ H. For any ε > 0, if 0 < σ ≤ diam(dom g*) and σλ² diam(dom g*) ≤ ε²/2, then

(2.16) z ≃_σ prox_{g*/λ}(y/λ) ⟹ y − λz ≈_ε prox_{λg}(y).

In particular, suppose g is positively homogeneous (i.e., g(αx) = αg(x) for α ≥ 0). Then, setting K := ∂g(0), if 0 < σ ≤ λ diam K and σλ diam K ≤ ε²/2, it holds

(2.17) z ≃_σ P_{λK}(y) ⟹ y − z ≈_ε prox_{λg}(y).
Proof. Set η = ε/λ. Then the condition σλ² diam(dom g*) ≤ ε²/2 is equivalent to σ diam(dom g*) ≤ η²/2. Therefore, by applying Proposition 2.4 to the function g*, we obtain

z ≃_σ prox_{g*/λ}(y/λ) ⟹ z ≈_η prox_{g*/λ}(y/λ).

Then, the inexact Moreau decomposition (2.6) gives y − λz ≈_ε prox_{λg}(y).
(3.6) ϕ_k(x) = (ϕ_k)_* + (A_k/2)‖x − ν_k‖²,

and A_k, ν_k, and (ϕ_k)_* can be recursively derived from the parameters (z_k, η_k, ξ_k, α_k)_{k∈N},

(3.7)
A_{k+1} = (1 − α_k)A_k,
ν_{k+1} = ν_k − (α_k/((1 − α_k)A_k)) ξ_{k+1},
(ϕ_{k+1})_* = (1 − α_k)(ϕ_k)_* + α_k(F(z_{k+1}) + ⟨ν_k − z_{k+1}, ξ_{k+1}⟩ − η_k) − (α_k²/(2A_{k+1}))‖ξ_{k+1}‖².
Next, it remains to generate a sequence (xk )k∈N satisfying inequality (3.2) and to
study the asymptotic behavior of βk . To this aim we recall two lemmas, whose proofs
are provided in [52], that will be essential in the derivation of the algorithm.
Lemma 3.3. Suppose for some k ∈ N, ϕ_k is defined as in (3.6) and ϕ_{k+1} according to (3.4) with ξ_{k+1} ∈ ∂_{η_k} F(z_{k+1}). If x_k ∈ H satisfies F(x_k) ≤ (ϕ_k)_* + δ_k for some δ_k ≥ 0, then, setting y_k = (1 − α_k)x_k + α_k ν_k, for any λ > 0 it holds

(1 − α_k)δ_k + η_k + (ϕ_{k+1})_* ≥ F(z_{k+1}) + (λ/2)(2 − α_k²/(A_{k+1}λ))‖ξ_{k+1}‖² + ⟨y_k − (λξ_{k+1} + z_{k+1}), ξ_{k+1}⟩.
Lemma 3.4. Given a sequence (λ_k)_{k∈N} with λ_k ≥ λ > 0, and constants A > 0 and b ≥ a > 0, define (A_k)_{k∈N} and (α_k)_{k∈N} recursively, such that A₀ = A and, for k ∈ N,

α_k ∈ [0, 1), with a ≤ α_k²/((1 − α_k)A_k λ_k) ≤ b,
A_{k+1} = (1 − α_k)A_k.

Then β_k := ∏_{i=0}^{k−1}(1 − α_i) ∼ 1/k²; in particular, β_k → 0.
Lemma 4.1. Let y, z ∈ H, ε ≥ 0, and ζ ∈ ∂_ε g(z). Then, for every x ∈ H,

(4.1) F(x) ≥ F(z) + ⟨x − z, ∇f(y) + ζ⟩ − (L/2)‖z − y‖² − ε.

In other words, ∇f(y) + ζ ∈ ∂_η F(z), with η = (L/2)‖z − y‖² + ε.
Proof. Fix x, y, z ∈ H. Since ∇f is L-Lipschitz continuous, it holds

(4.2) f(y) ≥ f(z) − ⟨z − y, ∇f(y)⟩ − (L/2)‖z − y‖².

On the other hand, f being convex, we have f(x) ≥ f(y) + ⟨x − y, ∇f(y)⟩, which combined with (4.2) gives

(4.3) f(x) ≥ f(z) + ⟨x − z, ∇f(y)⟩ − (L/2)‖z − y‖².

Since g is convex and ζ ∈ ∂_ε g(z), we have g(x) ≥ g(z) + ⟨x − z, ζ⟩ − ε, which, summed with (4.3), gives the statement.
Combining Lemma 4.1 with Lemma 3.3, we derive the following result.
Lemma 4.2. Suppose for some k ∈ N, ϕ_k is defined as in (3.6) and x_k ∈ H satisfies F(x_k) ≤ (ϕ_k)_* + δ_k for some δ_k ≥ 0. Set y_k = (1 − α_k)x_k + α_k ν_k. For any ε_k ≥ 0, λ_k > 0, let

x_{k+1} ≈_{ε_k} prox_{λ_k g}(y_k − λ_k ∇f(y_k)),  ξ_{k+1} = (y_k − x_{k+1})/λ_k,  η_k = (L/2)‖y_k − x_{k+1}‖² + ε_k²/(2λ_k).

Then ξ_{k+1} ∈ ∂_{η_k} F(x_{k+1}), and if ϕ_{k+1} is defined according to (3.4) with z_{k+1} = x_{k+1},

(4.4) (1 − α_k)δ_k + ε_k²/(2λ_k) + (ϕ_{k+1})_* ≥ F(x_{k+1}) + (λ_k/2)(2 − λ_k L − α_k²/((1 − α_k)A_k λ_k))‖ξ_{k+1}‖².
Proof. Recalling Definition 2.1, since x_{k+1} ≈_{ε_k} prox_{λ_k g}(y_k − λ_k ∇f(y_k)), we have

(4.5) ζ_{k+1} := (y_k − x_{k+1})/λ_k − ∇f(y_k) ∈ ∂_{ε_k²/(2λ_k)} g(x_{k+1}).

Therefore Lemma 4.1 gives ξ_{k+1} = ∇f(y_k) + ζ_{k+1} ∈ ∂_{η_k} F(x_{k+1}) and Lemma 3.3 gives

(4.6) (1 − α_k)δ_k + ε_k²/(2λ_k) + (ϕ_{k+1})_* ≥ F(x_{k+1}) + (λ_k/2)(2 − α_k²/((1 − α_k)A_k λ_k))‖ξ_{k+1}‖² + ⟨y_k − (λ_k ξ_{k+1} + x_{k+1}), ξ_{k+1}⟩ − (L/2)‖y_k − x_{k+1}‖².

Now, since y_k = λ_k ξ_{k+1} + x_{k+1}, the scalar product on the right-hand side of (4.6) is zero, and (4.4) follows.
We are now ready to define a general accelerated and inexact forward-backward
splitting (AIFB) algorithm and to prove its convergence rate.
Theorem 4.3. For fixed numbers t₀ > 1, a ∈ ]0, 2[, sequences of parameters (λ_k)_{k∈N}, λ_k ∈ ]0, (2 − a)/L], and (a_k)_{k∈N} such that a ≤ a_k ≤ 2 − λ_k L, and a sequence of errors (ε_k)_{k∈N} with ε_k ≥ 0, we choose x₀ = y₀ ∈ dom g and, for every k ∈ N, we recursively define

t_{k+1} = (1 + √(1 + 4(a_k λ_k)t_k²/(a_{k+1} λ_{k+1})))/2,

(AIFB)
x_{k+1} ≈_{ε_k} prox_{λ_k g}(y_k − λ_k ∇f(y_k)),
y_{k+1} = x_{k+1} + ((t_k − 1)/t_{k+1})(x_{k+1} − x_k) + (1 − a_k)(t_k/t_{k+1})(y_k − x_{k+1}).

Then, setting z_{k+1} = x_{k+1}, ξ_{k+1} = (y_k − x_{k+1})/λ_k, η_k = (L/2)‖y_k − x_{k+1}‖² + ε_k²/(2λ_k), and α_k = t_k^{−1}, the sequence (ϕ_k)_{k∈N} defined according to (3.4) starting from ϕ₀ = F(x₀) + (A₀/2)‖· − x₀‖², with A₀ = 1/(t₀(t₀ − 1)a₀λ₀), is an estimate sequence for F and it holds

(4.7) δ_{k+1} + (ϕ_{k+1})_* ≥ F(x_{k+1}) + (c_k/(2λ_k))‖y_k − x_{k+1}‖²,

with δ_{k+1} = (1 − α_k)δ_k + ε_k²/(2λ_k), δ₀ = 0, and c_k = 2 − a_k − λ_k L ≥ 0.

Proof. By the definition of t_{k+1}, it holds

(4.8) t_{k+1}² − t_{k+1} − (λ_k a_k/(λ_{k+1} a_{k+1})) t_k² = 0.
Since α_k = t_k^{−1} ∈ (0, 1), from (4.8) we have (1 − α_{k+1})a_{k+1}λ_{k+1}/α_{k+1}² = a_k λ_k/α_k², and hence

(4.9) (1 − α_k) (α_k²/((1 − α_k)a_k λ_k)) = α_{k+1}²/((1 − α_{k+1})a_{k+1}λ_{k+1}).

If we set A_k = α_k²/[(1 − α_k)a_k λ_k], (4.9) turns into A_{k+1} = (1 − α_k)A_k as in (3.7), and the inequality a_k ≤ 2 − λ_k L gives

(4.10) α_k²/((1 − α_k)A_k λ_k) + λ_k L ≤ 2.
Now, set

(4.12) α_k ν_k = y_k − (1 − α_k)x_k,  α_{k+1} ν_{k+1} = y_{k+1} − (1 − α_{k+1})x_{k+1},

and hence, substituting into (4.11) and recalling the definition of A_k, we get

(4.13) ν_{k+1} = ν_k − (α_k/((1 − α_k)A_k λ_k))(y_k − x_{k+1}).
If we set ξ_{k+1} = (y_k − x_{k+1})/λ_k, (4.13) becomes the second equation in (3.7). Now define (ϕ_k)_{k∈N} according to (3.4) using the parameters (x_k, η_k, ξ_k, α_k)_{k∈N} and starting from ϕ₀ = F(x₀) + (A₀/2)‖· − x₀‖². Then ϕ_k = (ϕ_k)_* + (A_k/2)‖· − ν_k‖² for every k ∈ N and we have δ₀ + (ϕ₀)_* ≥ F(x₀). Reasoning by induction, and using Lemma 4.2, we obtain ξ_{k+1} ∈ ∂_{η_k} F(x_{k+1}) and (4.7). Finally note that, since by assumption and (4.10), a ≤ α_k²/((1 − α_k)A_k λ_k) ≤ 2, Lemma 3.4 ensures that β_k = ∏_{i=0}^{k−1}(1 − α_i) tends to 0.
Remark 3 (retrieving FISTA [7]). In the initialization step of (AIFB), we are allowed to choose t₀ = 1, as soon as a₀ = 1. Indeed, as one can easily check, with these choices we get t₁ > 1 and y₁ = x₁. Therefore the sequences continue as if they started from (t₁, x₁, y₁). This shows that algorithm (AIFB) includes FISTA, by choosing a_k = 1 and λ_k = λ ≤ 1/L and starting with t₀ = 1. Moreover, for f = 0 and a_k = 2, we also obtain the proximal point algorithm given in the appendix of [30].
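For illustration, a minimal NumPy sketch of the (AIFB) iteration with constant a_k = a and λ_k = λ follows (identifiers are illustrative; this is not the implementation of section 6). Here inexact_prox is assumed to return an ε-approximation in the sense of Definition 2.1, e.g., via the duality-gap criterion of Proposition 2.3; with a = 1, λ ≤ 1/L, t₀ = 1, and exact proximal steps it reduces to FISTA, as discussed in Remark 3:

    import numpy as np

    def aifb(x0, grad_f, inexact_prox, lam, a=1.0, t0=1.0, q=1.5, C=1.0, n_iter=500):
        """Sketch of (AIFB) with constant a_k = a and lam_k = lam <= (2 - a)/L.
        inexact_prox(w, lam, eps) ~ prox_{lam g}(w) up to precision eps (Definition 2.1);
        eps_k = C/k^q follows the error schedule of Theorem 4.4."""
        x, y, t = x0.copy(), x0.copy(), t0
        for k in range(1, n_iter + 1):
            eps_k = C / k ** q
            x_new = inexact_prox(y - lam * grad_f(y), lam, eps_k)
            t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2   # (4.8) with a_k*lam_k constant
            y = x_new + (t - 1) / t_new * (x_new - x) + (1 - a) * t / t_new * (y - x_new)
            x, t = x_new, t_new
        return x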
In terms of α_k = 1/t_k, the update (4.8) reads

(4.14) α_{k+1} = (1/2)[√(((a_{k+1}λ_{k+1})/(a_kλ_k))² α_k⁴ + 4((a_{k+1}λ_{k+1})/(a_kλ_k)) α_k²) − ((a_{k+1}λ_{k+1})/(a_kλ_k)) α_k²].
Proof. By Theorems 3.2 and 4.3, it is enough to study the asymptotic behavior of the sequences β_k and δ_k. Since λ_k ∈ [λ, (2 − a)/L], by Lemma 3.4, β_k ∼ 1/k². Concerning the structure of the error term δ_k, it is easy to prove (see Lemma 3.3 in [30]) that the solution of the difference equation δ_{k+1} = (1 − α_k)δ_k + ε_k²/(2λ_k), obtained in Theorem 4.3, is given by

(4.15) δ_k = (β_k/2) Σ_{i=0}^{k−1} ε_i²/(λ_i β_{i+1}).
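To probe how the accuracy rate q enters the bound, one can evaluate (4.15) numerically. A small sketch (illustrative names; alphas, lams, and eps are the sequences α_k, λ_k, ε_k):

    import numpy as np

    def delta_from_errors(alphas, lams, eps, K):
        """delta_k = (beta_k/2) * sum_{i<k} eps_i^2/(lam_i*beta_{i+1}), cf. (4.15),
        with beta_k = prod_{i<k}(1 - alpha_i) and beta_0 = 1."""
        beta = np.cumprod(np.concatenate(([1.0], 1.0 - np.asarray(alphas[:K]))))
        return [beta[k] / 2 * sum(eps[i] ** 2 / (lams[i] * beta[i + 1]) for i in range(k))
                for k in range(K + 1)]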
Now suppose that Ψ_λ(v_n) − Ψ_λ(v) = O(1/n^{2p}). Then, the first part of statement
(5.3) directly follows from (5.4). Regarding the rate on the duality gap, note that the
function Φλ is Lipschitz continuous on bounded sets, being convex and continuous.
Thus there exists L₁ > 0 such that

Φ_λ(z_n) − Φ_λ(z) ≤ L₁‖z_n − z‖ ≤ L₁√(2λ) (Ψ_λ(v_n) − Ψ_λ(v))^{1/2}.
This shows that the convergence rate stated for the duality gap in (5.3) holds.
In order to compute admissible approximations of the proximal point, we can
choose any minimizing algorithm for the dual problem. A simple choice is the forward-
backward splitting algorithm (also called ISTA [7]). Since for this choice Ψ_λ(v_n) − Ψ_λ(v) = O(1/n), this gives the rate G(z_n, v_n) = O(1/√n) for the duality gap. We remark that the pair of sequences (y − λB*v_n, v_n) corresponds exactly to the pair (x_n, y_n) generated by the primal-dual Algorithm 1 proposed in [14] when applied to the minimization of Φ_λ(x) = g(x) + (1/(2λ))‖x − y‖² (with τ = λ, θ = 1).
A more efficient choice is FISTA, resulting in the rate G(zn , vn ) = O(1/n). The
latter will be our choice in the numerical section. For the case of ω positively homoge-
neous (e.g., total variation), it holds ω ∗ = δS , with S = ∂ω(0) and the corresponding
dual minimization problem min Ψλ becomes a constrained smooth optimization prob-
lem. Then, FISTA reduces to an accelerated projected gradient descent algorithm
(5.6)
v_{n+1} = P_S(u_n − (γ_n/λ) B(λB*u_n − y)),  0 < γ_n ≤ 1/‖B‖²,
u_{n+1} = v_{n+1} + ((t_n − 1)/t_{n+1})(v_{n+1} − v_n),
with the usual choices for tn (see Remark 3). We note that in this case Propositions 2.2
and 2.3 ensure that problem (5.1) is equivalent to (5.2).
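A self-contained NumPy sketch of (5.6) with the duality-gap stopping rule follows, for a positively homogeneous ω given as a callable together with the (assumed available) projection project_S onto S = ∂ω(0); all names are illustrative, and ω*(v) = 0 for the feasible iterates v ∈ S:

    import numpy as np

    def dual_apgd_prox(y, lam, B, omega, project_S, eps, max_iter=1000):
        """Accelerated projected gradient (5.6) on the dual problem; returns z such
        that z ~eps prox_{lam g}(y) once the duality gap drops below eps^2/(2*lam),
        i.e., condition (a) of Proposition 2.3."""
        gamma = 1.0 / np.linalg.norm(B, 2) ** 2      # 0 < gamma_n <= 1/||B||^2
        v = np.zeros(B.shape[0]); u = v.copy(); t = 1.0
        z = y.copy()
        for _ in range(max_iter):
            v_new = project_S(u - (gamma / lam) * (B @ (lam * (B.T @ u) - y)))
            t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
            u = v_new + (t - 1) / t_new * (v_new - v)
            v, t = v_new, t_new
            z = y - lam * (B.T @ v)                  # candidate primal point
            gap = (omega(B @ z) + np.sum((z - y) ** 2) / (2 * lam)
                   + np.sum((lam * (B.T @ v) - y) ** 2) / (2 * lam)
                   - np.sum(y ** 2) / (2 * lam))
            if gap <= eps ** 2 / (2 * lam):
                break
        return z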
Remark 6. We highlight that the results in Theorem 5.1 hold in the more general setting of a minimization problem of the form

min_{x∈X} ϕ(x) + ω(Bx),

where dom ϕ = X and ϕ is c-strongly convex and differentiable with L-Lipschitz continuous gradient.² Indeed, in this case one has z = ∇ϕ*(−B*v), z_n = ∇ϕ*(−B*v_n), and the strong convexity of ϕ* allows one to get the analogue of the bound (5.4):

(c²/(2L))‖z_n − z‖² ≤ Ψ(v_n) − Ψ(v).
²This is equivalent to requiring ϕ* strongly convex and differentiable with Lipschitz continuous gradient.
Supposing that each proximal subproblem (5.1) can be solved, with precision ε, in at most

(5.8) Dλ/ε^{2/p},  p > 0,

iterations,³ we can bound the total iteration complexity of the AIFB algorithm. From
Theorem 4.4, if we let ε_k := 1/k^q and take k ≥ N_e, with

N_e := (C/ε)^{1/(2q−1)} if 1/2 < q < 3/2,
N_e := (C/ε)^{1/2} if q > 3/2,

we have F(x_k) − F_* ≤ ε, where C > 0 is the constant hidden in the rates given in
Theorem 4.4. Now for each k ≤ Ne , from the hypothesis (5.8) on the complexity of the
internal algorithm, one needs at most Dλ_k/ε_k^{2/p} = Dλ_k k^{2q/p} internal iterations to get an approximate proximal point x_{k+1} in (AIFB) with precision ε_k = 1/k^q. Summing all the internal iterations from 1 to N_e, and using λ_k ≤ λ, we have

N_i = Σ_{k=1}^{N_e} Dλ_k k^{2q/p} ≤ Dλ ∫_0^{N_e} t^{2q/p} dt = (Dλ/(2q/p + 1)) N_e^{2q/p+1},
and hence

N_i = O(1/ε^{(2q/p+1)/(2q−1)}) if 1/2 < q < 3/2,
N_i = O(1/ε^{(2q/p+1)/2}) if q > 3/2.
Adding the costs of internal and external iterations together, we derive the following
proposition.
Proposition 5.2. Suppose problem (5.1) is solved in at most Dλ/ε^{2/p} iterations, for some constants p > 0 and D > 0. Then, the global iteration complexity C_g of (AIFB) plus the inner algorithm is

(5.9) C_g = c_i N_i + c_e N_e = O(1/ε^{(2q/p+1)/(2q−1)}) + O(1/ε^{1/(2q−1)}) if 1/2 < q < 3/2,
      C_g = c_i N_i + c_e N_e = O(1/ε^{(2q/p+1)/2}) + O(1/ε^{1/2}) if q > 3/2,

where c_i and c_e denote the costs of one internal and one external iteration, respectively.

³The constant D depends in the end on y. If dom ω* is bounded, D can be chosen independently of y, since for most algorithms it is majorized by diam(dom ω*).
From the estimates above, one can easily see that, in each case, the lowest global complexity is reached for q → 3/2 and it is

C_g = O(1/ε^{(p+3)/(2p)+δ})
for arbitrarily small δ > 0. For p = 1, as is the case for algorithm (5.6), one obtains a complexity of O(1/ε^{2+δ}). For p = 1/2, which corresponds to the rate of the algorithm studied in [19], we have a global complexity of O(1/ε^{7/2+δ}). We finally note that for p → +∞ we have a complexity of O(1/ε^{1/2+δ}). In other words, the algorithm behaves as an accelerated method.
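These exponents are easy to tabulate. A small sketch (a hypothetical helper of ours) makes the trade-off in q explicit:

    def global_complexity_exponent(p, q):
        """Exponent r such that C_g = O(1/eps^r) in Proposition 5.2, for an inner
        method of rate p and error schedule eps_k = 1/k^q (q != 3/2)."""
        if 0.5 < q < 1.5:
            return max((2 * q / p + 1) / (2 * q - 1), 1 / (2 * q - 1))
        return max((2 * q / p + 1) / 2, 0.5)

    # e.g., for p = 1 the exponents at q = 1.1, 1.3, 1.49 are approximately
    # 2.67, 2.25, 2.01, approaching (p + 3)/(2p) = 2 as q -> 3/2.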
We remark that the analysis of the global complexity given above is valid only
asymptotically, since we did not estimate any of the constants hidden in the O sym-
bols. However, in real situations constants do matter and, in practice, the most
effective accuracy rate q is problem dependent and might be different from 3/2, as we
illustrate in the experiments of subsection 6.3.
6. Numerical experiments. In this section, we present two types of experi-
ments. The first one is designed to illustrate the influence of the errors on the behavior
of AIFB and on its nonaccelerated counterpart IFB (called ISTA in [7]). The second
one is meant to measure the performance of the two-loops algorithm AIFB+algorithm
(5.6), in comparison with IFB+algorithm (5.6), and with the primal-dual algorithm
proposed in [14].
6.1. Experimental set-up. In all the following cases, we consider the regular-
ized least-squares functional
(6.1) F(x) := (1/2)‖Ax − y‖²_Y + g(x),
where H, Y are Euclidean spaces, x ∈ H, y ∈ Y, A : H → Y is a linear operator, and
g : H → R is of type (2.7). In all cases ω will be a norm and the projection onto
S = ∂ω(0) will be explicitly computable.
We minimize F using AIFB, with λ_k = λ = 1/L, where L = ‖A*A‖. We use
ak = 1 (corresponding to FISTA), since we empirically observed that the choice of
ak , if independent of k, does not significantly influence the speed of convergence of
the algorithm (although preliminary tests revealed a slightly better performance for
ak = 0.8). At each iteration, we employ algorithm (5.6) to approximate the proximity
operator of g up to a precision εk . The stopping rule for the inner algorithm is
given by the duality gap, according to Proposition 2.3, item (a). Following Theorem
4.4, we consider sequences of errors of type ε_k = C/k^q, with q, hereafter referred to as the accuracy rate, chosen between 0.1 and 1.7. The coefficient C should be comparable
to the magnitude of the duality gap. In fact, it determines the practical constraint
on the duality gap at the first iterations: the constraint should be active, but not
too demanding to avoid unnecessary precision. We choose C by solving the equation
G(y₀ − λ∇f(y₀), 0) = C²/(2λ), where G is the duality gap corresponding to the first
proximal subproblem encountered in AIFB for k = 0, evaluated at v0 = 0. We finally
consider an “exact” version, obtained by solving the proximal subproblems at the
machine precision.
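In code, this calibration of C amounts to one evaluation of the duality gap of the first subproblem at v₀ = 0 (a sketch reusing the illustrative duality_gap helper from subsection 2.1; names are ours):

    import numpy as np

    def calibrate_C(y0, grad_f, lam, B, omega, omega_conj):
        """Solve G(y0 - lam*grad_f(y0), 0) = C^2/(2*lam) for C, so that the error
        budget eps_0 matches the initial duality gap of the proximal subproblem."""
        w = y0 - lam * grad_f(y0)               # center of the first prox subproblem
        v0 = np.zeros(B.shape[0])
        return np.sqrt(2 * lam * duality_gap(v0, w, lam, B, omega, omega_conj))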
We analyze two well-known problems: deblurring with total variation regulariza-
tion and learning a linear estimator via regularized empirical risk minimization with
the overlapping group lasso penalty. The numerical experiments are divided into two
parts. In the first one, we evaluate the impact of the errors on the convergence rate: in Figure 6.1, the relative objective value versus the number of external iterations for different accuracy rates on the error is shown. We underline that this
study is independent of the algorithm chosen to produce an admissible approximation
of the proximal points.
In the second part, we assess the overall behavior of the two-loops algorithm, as
described in section 5, using algorithm (5.6) to solve the proximal subproblems. We
compare it with the nonaccelerated version (IFB) and the primal-dual (PRIDU) algo-
rithm proposed by [14] for image deconvolution. For all algorithms we provide CPU
time, and the number of external and internal iterations for different precisions. Note
that the cost of each external iteration lies mainly in the evaluation of the gradient of the quadratic part of the objective function (6.1). The internal iteration has a similar form but, since the matrix B is sparse and structured in both experiments, it can be implemented in a fast way. All the numerical experiments have been performed in the MATLAB environment,⁴ on a desktop iMac with an Intel Core i5 CPU (2.5 GHz, 6 MB L3 cache) and 6 GB of RAM.
6.1.1. Deblurring with total variation. Regularization with total variation
[50, 12, 6] is a widely used technique for deblurring and denoising images that preserves sharp edges.
In this problem, H = Y = R^{N×N} is the space of (discrete two-dimensional) images on the grid [1, N]², A is a linear map representing some blurring operator [6], and y is the observed noisy and blurred datum. The (discrete) total variation regularizer is defined as

g = ω ∘ ∇,  g(x) = τ Σ_{i,j=1}^N ‖(∇x)_{i,j}‖₂,

where ∇ : H → H² is the (discrete) gradient operator (see [12] for the precise definition) and ω : H² → R, ω(p) = τ Σ_{i,j=1}^N ‖p_{i,j}‖₂, with τ > 0 a regularization parameter and ‖·‖₂ the Euclidean norm in R². Note that the matrix corresponding to ∇ is highly sparse (it is bidiagonal). This feature has been taken into account to get an efficient implementation.
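For completeness, a sketch of the discrete gradient (one common forward-difference convention; the exact boundary handling of [12] may differ) and of the pixelwise projection onto S = ∂ω(0) used in (5.6):

    import numpy as np

    def grad2d(x):
        """Forward-difference discrete gradient: N x N image -> N x N x 2 field."""
        g = np.zeros(x.shape + (2,))
        g[:-1, :, 0] = x[1:, :] - x[:-1, :]   # vertical differences
        g[:, :-1, 1] = x[:, 1:] - x[:, :-1]   # horizontal differences
        return g

    def project_S_tv(p, tau):
        """Projection onto S = {v : ||v_ij||_2 <= tau for all i,j}: since
        omega(p) = tau * sum_ij ||p_ij||_2, its subdifferential at 0 is S."""
        scale = np.maximum(np.sqrt(np.sum(p ** 2, axis=-1, keepdims=True)) / tau, 1.0)
        return p / scale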
We followed the same experimental setup as in [6]. We considered the 256 × 256
Lena test image, blurred by a 9 × 9 Gaussian blur with standard deviation 4, followed
by additive normal noise with zero mean and standard deviation 10⁻³. The regularization parameter τ was set to 10⁻³. Since the blurring operator A is a convolution
operator, in the implementation it is common to evaluate it by an FFT-based method
(see, e.g., [6, 14]).
6.1.2. Overlapping group lasso. The group lasso penalty is a regularization
term for ill-posed inverse problems arising in statistical learning [64, 33], image processing, and compressed sensing [46], which enforces structured sparsity in the solutions. Regularization with this penalty consists in solving a problem of the form (6.1), where H = R^p, Y = R^m, A is a data or design matrix, and y is a vector of outputs or measurements. Following [33], the overlapping group lasso (OGL) penalty is

(6.2) g(x) = τ Σ_{i=1}^r (Σ_{j∈J_i} (w_j^i)² x_j²)^{1/2},

where J = {J₁, ..., J_r} is a collection of (possibly overlapping) groups of indices and the weights are given by

w_j^i = (1/2)^{a_{ij}},  with a_{ij} = #{J ∈ J : j ∈ J, J ⊂ J_i, J ≠ J_i}.

The OGL penalty is of the form (2.7), with

B_i : R^p → R^{J_i},  B_i x = (w_j^i x_j)_{j∈J_i},

B = (B₁, ..., B_r), and ω : R^{J₁} × ··· × R^{J_r} → R, ω(v₁, ..., v_r) = τ Σ_{i=1}^r ‖v_i‖₂, where ‖·‖₂ is the Euclidean norm in R^{J_i}.
The matrix A and the datum y are generated from the breast cancer dataset
provided by [62]. The dataset consists of expression data for 8,141 genes in 295 breast cancer tumors (78 metastatic and 217 nonmetastatic). The groups are defined
according to the canonical pathways from MSigDB [60], that contains 639 groups of
genes, 637 of which involve genes from the breast cancer dataset. We restrict the
analysis to the 3510 genes that are contained in at least one group. Hence, our data
matrix A consists of 295 different expression levels of 3510 genes. The output vector y
contains the labels (±1, metastatic, or nonmetastatic) for each sample. The structure
of the overlapping groups gives rise to a matrix B of size 15126 × 3510. Despite the
high dimensionality, one can take advantage of its sparseness. We analyze two choices
of the regularization parameter: τ = 0.01 and τ = 0.1.
6.2. Results—Part I. We run AIFB and its nonaccelerated counterpart, IFB, up to 2,000 external iterations. With the aim of maximizing the effect of inexactness, we require algorithm (5.6) to produce solutions with errors close to the upper bounds ε_k²/(2λ) prescribed by the theory. We achieve this by reducing the internal step-size length γ_n and using cold restart, i.e., initializing algorithm (5.6) at each step with v₀ = 0.

As a reference optimal value, F_*, we use the value found after 10,000 iterations of AIFB with error rate q = 1.7.
As shown in Figure 6.1, the empirical convergence rate of (F (xk ) − F∗ )/F∗ is
indeed affected by the accuracy rate q: to smaller values of q correspond slower
convergence rates both for AIFB and the inexact (nonaccelerated) forward-backward
algorithm. When the errors in the computation of the proximity operator do not decay fast enough, the convergence rates deteriorate significantly and the algorithms may even fail to converge to the infimum. If the errors decay sufficiently fast, AIFB shows faster convergence w.r.t. IFB in both experiments. In contrast, this is not true for accuracy rates q < 1, where IFB has practically the same behavior as AIFB.
Moreover, it turns out that AIFB is more sensitive to errors than IFB. This is
more evident in the experiment on TV deblurring. Indeed, for AIFB most curves
corresponding to the different accuracy rates are well separated, while for IFB they
are closer to each other, and often completely overlapped. Yet, the overlapping phe-
nomenon in general starts earlier (lower q) for IFB than AIFB, indicating that no gain
is obtained in increasing the accuracy error rates over a certain level, in accordance
with the theoretical results.
6.3. Results—Part II. This section is the empirical counterpart of subsec-
tion 5.2. Here, we test the global iteration complexity of AIFB and inexact IFB
combined with algorithm (5.6) on the two problems described above. We provide the
Fig. 6.1. Impact of the errors on AIFB and IFB. Log-log plots of relative objective value versus external iterations, k, obtained for TV deblurring (upper row) and the OGL problem with regularization parameter τ = 10⁻¹ (bottom row). AIFB and inexact IFB for different accuracy rates q in the computation of the proximity operator are shown in the left and right columns, respectively. For larger values of the parameter q the curves overlap. It can be seen by visual inspection that the errors affect the acceleration.
number of external iterations and the total number of inner iterations. When taking
into account the cost of computing the proximity operator, there is a trade-off between
the number of external and internal iterations. Since internal and external iterations
in general have different computational costs—which depend on the specific problem
considered and the machine CPU—the total number of iterations is not a good mea-
sure of the algorithm’s performance. For instance, on our computer, the ratio between
the cost of the external and internal iteration is about 2.15 in the TV deblurring and
2.5 in the OGL problem. Therefore, we also report the CPU time needed to reach
a desired accuracy for the relative difference from the optimal value. In this part,
we use the warm-restart procedure, consisting in initializing algorithm (5.6) with the
solution obtained at the previous step. We empirically observed that this initializa-
tion strategy drastically reduces the total number of iterations and speeds up the
algorithm.
We compare AIFB and IFB with PRIDU taken as a benchmark, since it often
outperforms state-of-the-art methods, in particular for TV regularization (see the
numerical section in [14]).
Algorithm PRIDU depends on two parameters,⁵ σ, ρ > 0. In our experiments, we tested two choices, indicated by the authors (in the paper and in the code as well) for the image deblurring and denoising problem: σ = 10 and ρ = 1/(σ‖B‖²), and ρ = 0.01 (corresponding to σ = 1/(ρ‖B‖²) = 12.5 for the TV problem and σ ≈ 1.07 for the OGL problem). We also implemented the algorithm for the OGL problem.
On the other hand, AIFB and IFB depend on the accuracy rate q. We verified
that the best empirical results are obtained choosing q in the range [1, 1.5] for AIFB
and [0.1, 0.5] for IFB. This once more confirms the higher sensitivity to the errors of
the accelerated version w.r.t. the basic one. In Tables 6.1–6.3, we detail the results
only for the most significant choices of q. We remark that the “exact” version of AIFB
(and IFB), where the prox is computed at machine precision at each step, is not even
comparable to the results we reported here.
Table 6.1
Deblurring with TV regularization, τ = 10−3 . Performance evaluation of AIFB, IFB, and
PRIDU, corresponding to different choices of the parameters q and σ, respectively. Concerning
AIFB and IFB, the results are reported only for the q’s giving the best results. The entries in the
table refer to the CPU time (in seconds) needed to reach a relative difference w.r.t. the optimal value below the thresholds 10⁻⁴, 10⁻⁶, and 10⁻⁸, the number of external iterations (# Ext), and
the total number of internal iterations (# Int).
Table 6.2
Breast cancer dataset: OGL τ = 10−1 . Performance evaluation of AIFB, IFB, and PRIDU,
corresponding to different choices of the parameters q and σ, respectively. Concerning AIFB and
IFB, the results are reported only for the q’s giving the best results. The entries in the table refer to
the CPU time (in seconds) needed to reach a relative difference w.r.t. the optimal value below the thresholds 10⁻⁴, 10⁻⁶, and 10⁻⁸, the number of external iterations (# Ext), and the total number
of internal iterations (# Int).
Table 6.3
Breast cancer dataset: OGL, τ = 10−2 . See caption of Table 6.2.
The above rates can be obtained relying on our techniques, and are slower than the
ones given in Theorem 4.4. This is in line with what was obtained in [52, section 4.1].
In [54], using different techniques, the convergence rate O(1/k²) is proved.
ζ := (y − λ∇f(y) − x − ē)/λ ∈ ∂_{ε₁²/(2λ)} g(x).
Remark 7. In the proof of Theorem 4.3, the setup of the parameters defining the estimate sequence for AIFB does not depend on the notion of inexactness for the proximal point. More precisely, starting from the AIFB algorithm, but with the notion of inexact prox (2.15), the same auxiliary sequences (α_k)_{k∈N}, (A_k)_{k∈N}, (ν_k)_{k∈N} can be introduced and all of equations (4.9)–(4.13) remain true. In particular,
(A.2) y_k = (1 − α_k)x_k + α_k ν_k,
(A.3) ν_{k+1} = ν_k − (α_k/((1 − α_k)A_k λ_k))(y_k − x_{k+1}).
The critical point is that now we cannot argue ξ_{k+1} = (y_k − x_{k+1})/λ_k ∈ ∂_{η_k} F(x_{k+1}) anymore, since Lemma 4.2 requires x_{k+1} to be an inexact prox in the sense of (2.3). Hence the construction of the estimate sequence cannot be finalized.
The following lemma overcomes this situation by introducing an estimate sequence
centered on new points uk ’s, which are “close” to the νk ’s. It is the analogue of
Lemma 4.2 for errors of type (2.15).
Lemma A.3. Suppose for some k ∈ N, x_k, u_k, ν_k ∈ H, A_k > 0, and ϕ_k = (ϕ_k)_* + (A_k/2)‖· − u_k‖² are such that F(x_k) ≤ (ϕ_k)_* + δ_k and ‖ν_k − u_k‖ ≤ γ_k for some γ_k, δ_k ≥ 0. Let λ_k > 0, α_k ∈ [0, 1), and assume α_k²/((1 − α_k)A_k λ_k) ≤ 1 and λ_k L ≤ 1. Set y_k = (1 − α_k)x_k + α_k ν_k, w_k = (1 − α_k)x_k + α_k u_k, and x_{k+1} ≃_{ε_k} prox_{λ_k g}(y_k − λ_k ∇f(y_k)) for some ε_k ≥ 0. Then there exist e_k ∈ H and ε_{1k}, ε_{2k} > 0 with ε_{1k}² + ε_{2k}² ≤ ε_k², ‖e_k‖ ≤ ε_{2k} + α_k γ_k, such that, if ϕ_{k+1} is defined according to (3.4) with z_{k+1} = x_{k+1} and

ξ_{k+1} = (w_k − x_{k+1} − e_k)/λ_k,  η_k = (L/2)‖w_k − x_{k+1}‖² + ε_{1k}²/(2λ_k),

we have ξ_{k+1} ∈ ∂_{η_k} F(x_{k+1}) and

(A.4) (1 − α_k)δ_k + (ε_k + α_k γ_k)²/(2λ_k) + (ϕ_{k+1})_* ≥ F(x_{k+1}).
Moreover, if ν_k is updated according to (A.3) and u_{k+1} is the center of ϕ_{k+1} (which is defined according to the second equation in (3.7)), it holds

(A.5) ‖u_{k+1} − ν_{k+1}‖ ≤ γ_k + ε_k/α_k.
for some e_k ∈ H and ε_{1k}, ε_{2k} > 0 with ε_{1k}² + ε_{2k}² ≤ ε_k², ‖e_k‖ ≤ ε_{2k} + α_k γ_k. From Lemma 4.1 it follows that ξ_{k+1} := (w_k − x_{k+1} − e_k)/λ_k = ∇f(w_k) + ζ_{k+1} ∈ ∂_{η_k} F(x_{k+1}). Then, applying
Lemma 3.3, we have
(1 − α_k)δ_k + ε_{1k}²/(2λ_k) + (ϕ_{k+1})_*
  ≥ F(x_{k+1}) + (λ_k/2)(2 − α_k²/(A_{k+1}λ_k))‖ξ_{k+1}‖² + ⟨w_k − (λ_k ξ_{k+1} + x_{k+1}), ξ_{k+1}⟩ − (L/2)‖w_k − x_{k+1}‖²
  = F(x_{k+1}) − (1/(2λ_k))[(α_k²/(A_{k+1}λ_k))‖λ_k ξ_{k+1}‖² − 2⟨w_k − x_{k+1}, λ_k ξ_{k+1}⟩ + λ_k L‖w_k − x_{k+1}‖²]
  ≥ F(x_{k+1}) − (1/(2λ_k))‖w_k − x_{k+1} − λ_k ξ_{k+1}‖²
  = F(x_{k+1}) − (1/(2λ_k))‖e_k‖²,

where in the last inequality we use the assumptions α_k²/(A_{k+1}λ_k) ≤ 1 and λ_k L ≤ 1. Moreover, from the definitions of ε_{1k}, ε_{2k}, and e_k it holds

(ε_k + α_k γ_k)² ≥ ε_{1k}² + ε_{2k}² + 2ε_k α_k γ_k + (α_k γ_k)² ≥ ε_{1k}² + (ε_{2k} + α_k γ_k)² ≥ ε_{1k}² + ‖e_k‖²,
and (A.4) follows. To prove (A.5), first note that from the definition we derive
(A.7) u_k = ν_k + (1/α_k)(w_k − y_k).
Next, by (3.7), (A.7), (A.3), and the definition of e_k in (A.1), we get

u_{k+1} = u_k − (α_k/((1 − α_k)A_k λ_k))(w_k − x_{k+1} − e_k)
  = ν_k − (α_k/((1 − α_k)A_k λ_k))(y_k − x_{k+1}) + (1/α_k)(w_k − y_k) − (α_k/((1 − α_k)A_k λ_k))(w_k − y_k − e_k)
  = ν_{k+1} + (1/α_k)(w_k − y_k) − (α_k²/((1 − α_k)A_k λ_k)) λ_k(∇f(w_k) − ∇f(y_k)) + (α_k²/((1 − α_k)A_k λ_k)) ē_k.
Therefore, recalling that by assumption α_k²/((1 − α_k)A_k λ_k) ≤ 1 and ‖ē_k‖ ≤ ε_{2k}, the Baillon–Haddad theorem implies

‖u_{k+1} − ν_{k+1}‖ ≤ (1/α_k)‖w_k − y_k‖ + ε_{2k} ≤ γ_k + ε_k/α_k.
Proof of Theorem A.1. Taking into account Remark 7 and reasoning by induction,
Lemma A.3 ensures that there exist sequences (ξk )k∈N , (ηk )k∈N , such that ξk+1 ∈
∂ηk F (xk+1 ) and the sequence (ϕk )k∈N constructed according to (3.4) with zk+1 =
x_{k+1}, starting from ϕ₀ = F(x₀) + (A₀/2)‖· − u₀‖² and u₀ = x₀, satisfies
δ_k + (ϕ_k)_* ≥ F(x_k),

with δ₀ = 0 and

δ_{k+1} = (1 − α_k)δ_k + (α_k γ_{k+1})²/(2λ_k),  γ_{k+1} = γ_k + ε_k/α_k,  γ₀ = 0.
This shows that the sequence (δk )k∈N is actually the same studied in [52, IAPPA1].
The statement now follows from the subsequent Theorem 4.5 in [52].
REFERENCES
[1] Y. I. Alber, R. S. Burachik, and A. N. Iusem, A proximal point method for nonsmooth
convex optimization problems in Banach spaces, Abstr. Appl. Anal., 2 (1997), pp. 97–120.
[2] A. Argyriou, C. A. Micchelli, M. Pontil, L. Shen, and Y. Xu, Efficient First Order
Methods for Linear Composite Regularizers, preprint, arXiv:1104.1436v1, 2011.
[3] A. Auslender, Numerical methods for nondifferentiable convex optimization, Nonlinear Anal-
ysis and Optimization, Math. Programming Stud., (1987), pp. 102–126.
[4] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with sparsity-inducing
penalties, Found. Trends Mach. Learn., 4 (2012), pp. 1–106.
[5] H. H. Bauschke and P. L. Combettes, The Baillon-Haddad theorem revisited, J. Convex
Anal., 17 (2010), pp. 781–787.
[6] A. Beck and M. Teboulle, Fast gradient-based algorithms for constrained total variation
image denoising and deblurring, IEEE Trans. Image Process., 18 (2009), pp. 2419–2434.
[7] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse
problems, SIAM J. Imaging Sci., 2 (2009), pp. 183–202.
[8] S. Becker, J. Bobin, and E. Candès, NESTA: A fast and accurate first-order method for
sparse recovery, SIAM J. Imaging Sci., 4 (2011), pp. 1–39.
[9] S. Bonettini and V. Ruggiero, On the convergence of primal-dual hybrid gradient algorithms
for total variation image restoration, J. Math. Imaging Vision, 44 (2012), pp. 1–18.
[10] K. Bredies, A forward-backward splitting algorithm for the minimization of non-smooth convex
functionals in Banach space, Inverse Problems, 25 (2009), 015005.
[11] R. S. Burachik and B. F. Svaiter, A relative error tolerance for a family of generalized
proximal point methods, Math. Oper. Res., 26 (2001), pp. 816–831.
[12] A. Chambolle, An algorithm for total variation minimization and applications, J. Math.
Imaging Vision, 20 (2004), pp. 89–97.
[13] A. Chambolle and P.-L. Lions, Image recovery via total variation minimization and related
problems, Numer. Math., 76 (1997), pp. 167–188.
[14] A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with
applications to imaging, J. Math. Imaging Vision, 40 (2011), pp. 120–145.
[15] C. Chaux, J.-C. Pesquet, and N. Pustelnik, Nested iterative algorithms for convex con-
strained image recovery problems, SIAM J. Imaging Sci., 2 (2009), pp. 730–762.
[16] P. L. Combettes, D. Dũng, and B. C. Vũ, Dualization of signal recovery problems, Set-
Valued Var. Anal., 18 (2010), pp. 373–404.
[17] P. L. Combettes and J.-C. Pesquet, Proximal splitting methods in signal processing, in
Fixed-Point Algorithms for Inverse Problems in Science and Engineering, H. H. Bauschke,
R. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, eds., Springer-
Verlag, New York, 2011, pp. 185–212.
[45] Y. Nesterov, Gradient Methods for Minimizing Composite Objective Function, Technical re-
port, CORE Discussion Papers from Université Catholique de Louvain, Center for Opera-
tions Research and Econometrics No 2007/076, 2009.
[46] G. Peyré and J. Fadili, Group sparsity with overlapping partition functions, in Proc. EU-
SIPCO 2011, Barcelona, 2011, pp. 303–307.
[47] R. T. Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm
in convex programming, Math. Oper. Res., 1 (1976), pp. 97–116.
[48] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control
Optim., 14 (1976), pp. 877–898.
[49] L. Rosasco, S. Mosci, M. S. Santoro, A. Verri, and S. Villa, A regularization approach
to nonlinear variable selection, JMLR Workshop Conf. Proc., 9 (2010), pp. 653–660.
[50] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algo-
rithms, Phys. D, 60 (1992), pp. 259–268.
[51] A. Sabharwal and L. C. Potter, Convexly constrained linear inverse problems: Iterative
least-squares and regularization, IEEE Trans. Signal Process., 46 (1998), pp. 2345–2352.
[52] S. Salzo and S. Villa, Inexact and accelerated proximal point algorithm, J. Convex Anal., 19
(2012).
[53] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen, Variational
Methods in Imaging, Appl. Math. Sci. 167, Springer, New York, 2009.
[54] M. Schmidt, N. Le Roux, and F. Bach, Convergence rates of inexact proximal-gradient
methods for convex optimization, in Advances in Neural Information Processing Systems
24, 2011.
[55] M. V. Solodov and B. F. Svaiter, A hybrid approximate extragradient-proximal point algo-
rithm using the enlargement of a maximal monotone operator, Set-Valued Anal., 7 (1999),
pp. 323–345.
[56] M. V. Solodov and B. F. Svaiter, A comparison of rates of convergence of two inexact
proximal point algorithms, in Nonlinear Optimization and Related Topics (Erice, 1998),
Appl. Optim. 36, Kluwer Academic, Dordrecht, 2000, pp. 415–427.
[57] M. V. Solodov and B. F. Svaiter, Error bounds for proximal point subproblems and associ-
ated inexact proximal point algorithms, Math. Program., 88 (2000), pp. 371–389.
[58] M. V. Solodov and B. F. Svaiter, An inexact hybrid generalized proximal point algorithm
and some new results on the theory of Bregman functions, Math. Oper. Res., 25 (2000),
pp. 214–230.
[59] M. V. Solodov and B. F. Svaiter, A unified framework for some inexact proximal point
algorithms, Numer. Funct. Anal. Optim., 22 (2001), pp. 1013–1035.
[60] A. Subramanian et al., Gene set enrichment analysis: A knowledge-based approach for inter-
preting genome-wide expression profiles, Proc. Natl. Acad. Sci. USA, 102 (2005), p. 15545.
[61] P. Tseng, Approximation accuracy, gradient methods, and error bound for structured convex
optimization, Math. Program., 125 (2010), pp. 263–295.
[62] M. J. Van De Vijver et al., A gene-expression signature as a predictor of survival in breast
cancer, New England J. Med., 347 (2002), pp. 1999–2009.
[63] Y. Yao and N. Shahzad, Strong convergence of a proximal point algorithm with general errors,
Optim. Lett., 6 (2012), pp. 621–628.
[64] M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, J.
R. Stat. Soc. Ser. B Stat. Method, 68 (2006), pp. 49–67.
[65] C. Zălinescu, Convex Analysis in General Vector Spaces, World Scientific Publishing, River
Edge, NJ, 2002.
[66] A. J. Zaslavski, Convergence of a proximal point method in the presence of computational
errors in Hilbert spaces, SIAM J. Optim., 20 (2010), pp. 2413–2421.
[67] P. Zhao, G. Rocha, and B. Yu, The composite absolute penalties family for grouped and
hierarchical variable selection, Ann. Statist., 37 (2009), pp. 3468–3497.