Improving Ultimate Convergence of An Augmented Lagrangian Method
Abstract
Optimization methods that employ the classical Powell-Hestenes-Rockafellar Augmented Lagrangian are useful tools for solving Nonlinear Programming problems. Their reputation decreased in the last ten years due to the comparative success of Interior-Point Newtonian algorithms, which are asymptotically faster. In the present research a combination of both approaches is evaluated. The idea is to produce a competitive method that is more robust and efficient than its "pure" counterparts on critical problems. Moreover, an additional hybrid algorithm is defined, in which the Interior-Point method is replaced by the Newtonian resolution of a KKT system identified by the Augmented Lagrangian algorithm. The software used in this work is freely available through the Tango Project web page: http://www.ime.usp.br/~egbirgin/tango/.
1 Introduction
We are concerned with Nonlinear Programming problems defined in the following way:
Minimize f(x)
subject to h(x) = 0,
           g(x) ≤ 0,                      (1)
           x ∈ Ω,

where h : IR^n → IR^m, g : IR^n → IR^p, f : IR^n → IR are smooth and Ω ⊂ IR^n is an n-dimensional box. Namely, Ω = {x ∈ IR^n | ℓ ≤ x ≤ u}.
∗ Department of Computer Science IME-USP, University of São Paulo, Rua do Matão 1010, Cidade Universitária, 05508-090, São Paulo SP, Brazil. This author was supported by PRONEX-Optimization (PRONEX - CNPq / FAPERJ E-26 / 171.164/2003 - APQ1), FAPESP (Grant 06/53768-0) and CNPq (PROSUL 490333/2004-4). e-mail: [email protected]
† Department of Applied Mathematics, IMECC-UNICAMP, University of Campinas, CP 6065, 13081-970 Campinas SP, Brazil. This author was supported by PRONEX-Optimization (PRONEX - CNPq / FAPERJ E-26 / 171.164/2003 - APQ1), FAPESP (Grant 06/53768-0) and CNPq. e-mail: [email protected]
The Powell-Hestenes-Rockafellar (PHR) Augmented Lagrangian [41, 54, 56] is given by:
L_ρ(x, λ, µ) = f(x) + (ρ/2) [ Σ_{i=1}^{m} ( h_i(x) + λ_i/ρ )^2 + Σ_{i=1}^{p} max{0, g_i(x) + µ_i/ρ}^2 ]    (2)

for all x ∈ IR^n, λ ∈ IR^m, µ ∈ IR^p_+, ρ > 0.
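To fix ideas, here is a minimal sketch of how (2) can be evaluated, assuming f, h and g are supplied as callables returning a scalar, an m-vector and a p-vector; the function and its name are illustrative, not part of Algencan:

```python
import numpy as np

def aug_lagrangian(x, lam, mu, rho, f, h, g):
    """Evaluate the PHR Augmented Lagrangian (2) at (x, lam, mu) for penalty rho."""
    shifted_eq = h(x) + lam / rho                  # h_i(x) + lambda_i / rho
    shifted_in = np.maximum(0.0, g(x) + mu / rho)  # max{0, g_i(x) + mu_i / rho}
    return f(x) + 0.5 * rho * (shifted_eq @ shifted_eq + shifted_in @ shifted_in)
```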
PHR-based Augmented Lagrangian methods for solving (1) are based on the iterative (ap-
proximate) minimization of Lρ with respect to x ∈ Ω, followed by the updating of the penalty
parameter ρ and the Lagrange multipliers approximations λ and µ. The most popular prac-
tical Augmented Lagrangian method gave rise to the Lancelot package [22, 24]. Lancelot does not handle inequality constraints g(x) ≤ 0 directly: when an inequality constraint g_i(x) ≤ 0 appears in a particular problem, it is replaced by g_i(x) + s_i = 0, s_i ≥ 0. The convergence of the
Lancelot algorithm to KKT points was proved in [22] using regularity assumptions. Under
weaker assumptions that involve the Constant Positive Linear Dependence (CPLD) constraint
qualification [3, 55], KKT-convergence was proved in [1] for a variation of the Lancelot method.
In [2], a new PHR-like algorithm was introduced that does not use slack variables to handle inequality constraints and admits general constraints in the lower-level set Ω. In the box-
constraint case considered in this paper, subproblems are solved using a matrix-free technique
introduced in [11], which improves the Gencan algorithm [10]. CPLD-based convergence and
penalty-parameter boundedness were proved in [2] under suitable conditions on the problem.
In addition to its intrinsic adaptability to the case in which arbitrary constraints are included
in Ω, the following positive characteristics of the Augmented Lagrangian approach for solving (1)
must be mentioned:
1. Augmented Lagrangian methods proceed by sequential resolution of simple (generally un-
constrained or box-constrained) problems. Progress in the analysis and implementation of
simple-problem optimization procedures produces an almost immediate positive effect on
the effectiveness of associated Augmented Lagrangian algorithms. Box-constrained mini-
mization is a dynamic area of practical optimization [9, 12, 13, 16, 26, 40, 45, 51, 67, 70]
from which we can expect Augmented Lagrangian improvements. In large-scale problems,
the availability of efficient matrix-free box-constraint solvers is of maximal importance.
2. If the subproblems are minimized globally, the Augmented Lagrangian method converges to global minimizers [8]. There is a large field for research on global optimization methods for box-constraint optimization. When the global box-constraint optimization problem is satisfactorily solved in practice, the effect on the associated Augmented Lagrangian method for Nonlinear Programming problems is immediate.
3. Most box-constrained optimization methods are guaranteed to find stationary points. In
practice, good methods do more than that. The line-search procedures of [10], for example,
include extrapolation steps that are not necessary from the point of view of KKT conver-
gence. However, they enhance the probability of convergence to global minimizers. In the
context of box-constrained optimization, "magical steps" in the sense of [24] (pp. 387–391) tend to be effective in increasing the probability of convergence to global minimizers. As a consequence, the probability of convergence to global minimizers of the Nonlinear Programming problem by a practical Augmented Lagrangian method is enhanced too.
4. The theory of convergence to global minimizers of Augmented Lagrangian methods [8]
does not need differentiability of the functions that define the Nonlinear Programming
problem. In practice, this indicates that the Augmented Lagrangian approach may be
successful in situations where smoothness is dubious.
5. The Augmented Lagrangian approach can be adapted to the situation in which analytic
derivatives, even if they exist, are not computed. See [44] for a derivative-free version of
Lancelot.
6. In many practical problems the Hessian of the Lagrangian is structurally dense (in the
sense that any entry may be different from zero at different points) but generally sparse
(given a specific point in the domain, the particular Lagrangian Hessian is a sparse matrix).
As an example of this situation, consider the following formulation of the problem of fitting circles of radii r within a circle of radius R without overlapping [14]:

Min Σ_{i<j} max{0, 4r^2 − ‖p_i − p_j‖_2^2}^2   subject to   ‖p_i‖_2^2 ≤ (R − r)^2.
The Hessian of the objective function is structurally dense but sparse at any point such
that points pi are “well distributed” within the big circle. Newtonian methods usually
have difficulties with this situation, both in terms of memory and computer time, since
the sparsity pattern of the matrix changes from iteration to iteration. This difficulty
is almost irrelevant for the Augmented Lagrangian approach if one uses a low-memory
box-constraint solver.
7. Independently of the Lagrangian Hessian density, the structure of the KKT system may be
very poor for sparse factorizations. This is a serious difficulty for Newton-based methods,
but not for suitable implementations of the Augmented Lagrangian PHR algorithm.
8. If the Nonlinear Programming problem has many inequality constraints, the usual slack-
variable approach of Interior-Point methods (also used in [1, 22]) may be inconvenient.
There are several approaches to reduce the effect of the presence of many slacks, but they
may not be as effective as not using slacks at all. The price of not using slacks is the
absence of continuous second derivatives in Lρ . In many cases, this does not seem to be a
serious practical inconvenience [7].
9. Huge problems have obvious disadvantages in terms of storage requirements. The Augmented Lagrangian approach provides a radical remedy: problem data may be computed "on the fly", used when required in the subproblems, and not stored at all. This is not possible if one uses matrix-based approaches, independently of the sparsity strategy adopted.
10. If, at the solution of the problem, some strong constraint qualification fails to hold, the performance of Newton-like algorithms could be severely affected. The Augmented Lagrangian approach is not so sensitive to this type of difficulty.
11. Augmented Lagrangian methods are useful in different contexts, such as Generalized Semi-
Infinite Programming. If one knows how to solve Ordinary Semi-Infinite Programming
problems, the Augmented Lagrangian seems to be the reasonable tool to incorporate “x-
dependent” constraints in the lower-level problems [53].
Despite all these merits, the amount of research dedicated to Augmented Lagrangian methods
decreased in the present century. Modern methods, based on interior-point (IP) techniques,
sequential quadratic programming (SQP), trust regions, restoration, nonmonotone strategies
and advanced sparse linear algebra procedures attracted much more attention [4, 5, 17, 19, 21,
20, 31, 32, 33, 34, 46, 50, 59, 62, 63, 65, 66, 69].
A theoretical reason, and its practical consequence, may be behind this switch of interest.
Roughly speaking, under suitable assumptions, Interior-Point Newtonian techniques converge
quadratically (or, at least, superlinearly) whereas practical Augmented Lagrangian algorithms
generally converge only linearly. Therefore, if both methods converge to the same point, and the
required precision is strict enough, an Interior-Point Newtonian (or SQP) method will require
less computer time than an Augmented Lagrangian method, independently of the work per
iteration. (Of course, in practical problems there is not such a thing as an “arbitrarily high
precision”. The precision required in a practical problem is the one that is satisfactory for the
user purposes.)
The situation is analogous when one compares Newton’s method and an Inexact-Newton
method for solving nonlinear systems. Ultimately, if an extremely high precision is required,
Newton’s method will be the best. The Inexact-Newton method is a practical algorithm because
in some problems the cost of the Newton iteration cannot be afforded due to the problem
structure.
These facts inspired the following idea: Assume that we wish to solve a problem with a
structure that favors the use of the Augmented Lagrangian method, but the required precision
ε is rather strict. The Augmented Lagrangian performance could perhaps be improved if this method is run up to a more modest precision (say √ε) and the final point so obtained is used to initialize a fast local method. The present paper is dedicated to a numerical evaluation
of the practical perspectives of this idea. Basically, we will use two “fast local methods” to com-
plete Augmented Lagrangian executions. The first will be Ipopt, the interior-point algorithm
introduced in [65]. The second will be Newton’s method, applied to the KKT conditions, with
a reduced number of constraints and slack variables.
A comparison between the original methods and their hybrid and accelerated counterparts
will be presented.
Notation. The symbol ‖ · ‖ will denote the Euclidean norm. If v = (v_1, . . . , v_n)^T ∈ IR^n we denote v_+ = (max{0, v_1}, . . . , max{0, v_n})^T. The distance between the point z and the set S is denoted dist(z, S) and defined by dist(z, S) = inf{‖z − s‖ : s ∈ S}.
Convergence to KKT points under the CPLD constraint qualification and penalty parameter boundedness were proved in [2]. Algencan, which is publicly available in the Tango Project web page http://www.ime.usp.br/~egbirgin/tango/, is the application of the main algorithm in [2] to problem (1).
Algorithm 2.1 (Algencan)

Let λ_min < λ_max, µ_max > 0, γ > 1, 0 < τ < 1. Let {ε_k} be a sequence of nonnegative numbers such that lim_{k→∞} ε_k = 0. Let λ_i^1 ∈ [λ_min, λ_max], i = 1, . . . , m, µ_i^1 ∈ [0, µ_max], i = 1, . . . , p, and ρ_1 > 0. Let x^0 ∈ Ω be an arbitrary initial point. Initialize k ← 1.

Step 1. Find x^k ∈ Ω such that

‖P_Ω(x^k − ∇L_{ρ_k}(x^k, λ^k, µ^k)) − x^k‖_∞ ≤ ε_k.    (3)
Step 2. Define

V_i^k = max{ g_i(x^k), −µ_i^k / ρ_k }, i = 1, . . . , p.    (4)
If k = 1 or
max{‖h(x^k)‖_∞, ‖V^k‖_∞} ≤ τ max{‖h(x^{k−1})‖_∞, ‖V^{k−1}‖_∞},    (5)
define ρk+1 = ρk . Otherwise, define ρk+1 = γρk .
Remark. In practice, we use the first-order safeguarded estimates of the Lagrange multipliers: λ_i^{k+1} = min{max{λ_min, λ_i^k + ρ_k h_i(x^k)}, λ_max} for i = 1, . . . , m, and µ_i^{k+1} = min{max{0, µ_i^k + ρ_k g_i(x^k)}, µ_max} for i = 1, . . . , p.
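The following is a sketch of this outer-iteration bookkeeping, covering (4), the test (5) and the safeguarded multiplier updates; the function name and calling convention are ours, not Algencan's:

```python
import numpy as np

def outer_update(h_val, g_val, lam, mu, rho, prev_progress,
                 tau=0.5, gamma=10.0, lam_min=-1e20, lam_max=1e20, mu_max=1e20):
    """One outer-iteration update: V^k of (4), the penalty test (5) and the
    safeguarded first-order multiplier estimates."""
    V = np.maximum(g_val, -mu / rho)                        # (4)
    progress = max(np.linalg.norm(h_val, np.inf),
                   np.linalg.norm(V, np.inf))
    # (5): keep rho if feasibility-complementarity improved by a factor tau
    rho_new = rho if progress <= tau * prev_progress else gamma * rho
    lam_new = np.clip(lam + rho * h_val, lam_min, lam_max)  # safeguarded lambda
    mu_new = np.clip(mu + rho * g_val, 0.0, mu_max)         # safeguarded mu >= 0
    return rho_new, lam_new, mu_new, progress
```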
Assume that the feasible set of a nonlinear programming problem is given by h(x) = 0, g(x) ≤
0, where h : IRn → IRm and g : IRn → IRp . Let I(x) ⊂ {1, . . . , p} be the set of indices of the
active inequality constraints at the feasible point x. Let I1 ⊂ {1, . . . , m}, I2 ⊂ I(x). The subset
of gradients of active constraints that correspond to the indices I1 ∪ I2 is said to be positively
linearly dependent if there exist multipliers λ, µ such that
X X
λi ∇hi (x) + µi ∇g i (x) = 0, (6)
i∈I1 i∈I2
with µi ≥ 0 for all i ∈ I2 and i∈I1 |λi | + i∈I2 µi > 0. Otherwise, we say that these gradients
P P
5
active constraints is positively linearly dependent at the feasible point x (i.e. (6) holds), then
there exists δ > 0 such that the vectors
Theorem 2.1. Assume that {x^k} is a sequence generated by Algencan and x∗ is a limit point. Then,

1. If x∗ is infeasible, then x∗ is a stationary point of the problem of minimizing ‖h(x)‖^2 + ‖g(x)_+‖^2 subject to x ∈ Ω.

2. If x∗ is feasible and fulfills the Constant Positive Linear Dependence constraint qualification (with respect to all the constraints, including the bounds), then x∗ satisfies the KKT conditions of (1).
Under additional local conditions it was proved in [2] that the sequence of penalty parameters
{ρk } remains bounded.
The following theorem is an easy consequence of Theorems 2.1 and 2.2 of [8].
Theorem 2.2. Assume that (1) admits a feasible point and that, instead of (3), each subproblem is considered as approximately solved when x^k ∈ Ω is found such that

L_{ρ_k}(x^k, λ^k, µ^k) ≤ L_{ρ_k}(x, λ^k, µ^k) + ε_k for all x ∈ Ω,

with ε_k ≤ ε. Then, every limit point x∗ of {x^k} is feasible and satisfies

f(x∗) ≤ f(y) + ε

for every feasible point y.
Therefore, Theorem 2.2 states that the sequence generated by the algorithm converges to an
ε-global minimizer, provided that εk -global minimizers of the subproblems are computed at each
outer iteration. The practical consequences of Theorems 2.1 and 2.2 are different. Theorem 2.1
applies directly to the present implementation of Algencan, and, in spite of the effect of floating
point computations, reflects the behavior of the method in practical calculations. Theorem 2.2
describes what should be expected from the Augmented Lagrangian method if the subproblems
are solved with an active search of the global minimizer.
In the practical implementation of Algorithm 2.1 (Algencan), subproblems are solved using Gencan [10] with the modifications introduced in [11]. The default parameters recommended in [2] are τ = 0.5, γ = 10, λ_min = −10^{20}, µ_max = λ_max = 10^{20}, ε_k = ε for all k, λ^1 = 0, µ^1 = 0 and

ρ_1 = max{ 10^{−6}, min{ 10, 2|f(x^0)| / (‖h(x^0)‖^2 + ‖g(x^0)_+‖^2) } }.    (7)
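For instance, the default ρ_1 of (7) can be computed as below; this is a sketch, and the guard against a feasible x^0 (where the denominator of (7) vanishes) is our addition:

```python
import numpy as np

def initial_rho(f0, h0, g0):
    """Default initial penalty parameter rho_1, following (7)."""
    gplus = np.maximum(0.0, g0)
    infeas = h0 @ h0 + gplus @ gplus
    if infeas == 0.0:   # feasible initial point: (7) is undefined; this guard is ours
        return 10.0
    return max(1e-6, min(10.0, 2.0 * abs(f0) / infeas))
```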
At every iteration of Algencan, we define
Therefore, µ_i^{k+} = 0. Since, in this case, −g_i(x^k) > ε, we have that min{−g_i(x^k), µ_i^{k+}} = 0. Therefore, (8) is proved.

Taking ε = |V_i^k|, we obtain:

DFM(k) = ‖P_Ω(x^k − ∇L_{ρ_k}(x^k, λ^k, µ^k)) − x^k‖_∞ = ‖P_Ω(x^k − ∇L(x^k, λ^{k+}, µ^{k+})) − x^k‖_∞.
For simplicity, we denote, from now on, ICM = ICM(k), DFM = DFM(k).
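Since Ω is a box, the projection P_Ω is componentwise clipping, so DFM can be computed in a couple of lines; in this sketch, grad stands for ∇L_{ρ_k}(x^k, λ^k, µ^k) and the function name is ours:

```python
import numpy as np

def dfm(x, grad, lower, upper):
    """DFM(k) = ||P_Omega(x - grad) - x||_inf for the box Omega = [lower, upper]."""
    return np.linalg.norm(np.clip(x - grad, lower, upper) - x, np.inf)
```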
We stop the execution of the algorithm declaring Convergence to a KKT point if

max{ICM, DFM} ≤ ε.
Under reasonable assumptions, the quantity max{ICM, DFM} is of the same order as the dis-
tance between (xk , λk+ , µk+ ) and the set of primal-dual solutions of the nonlinear programming
problem. A precise statement of this fact is in Theorem 2.3.
2. The feasible point x∗ satisfies the KKT conditions of (1) and the Mangasarian-Fromovitz
constraint qualification [47]. Let S be the set of (primal-dual) KKT triplets (x∗ , λ∗ , µ∗ )
associated to x∗ ;
3. For each primal-dual KKT point (x∗ , λ∗ , µ∗ ), the second order sufficient optimality condi-
tion holds.
Then, there exist δ > 0 and C > 0 such that, if dist((x^k, λ^{k+}, µ^{k+}), S) ≤ δ, we have:

dist((x^k, λ^{k+}, µ^{k+}), S) ≤ C max{ICM(k), DFM(k)}.
Proof. This result is a straightforward corollary of Theorem 3.6 of [29]. See, also, [39, 52, 68].
Dual feasibility with tolerance ε (DFM ≤ ε) is guaranteed by (3) and the choice of εk .
In infinite precision, the criterion (3) is necessarily achieved for all the subproblems, since Gencan converges to stationary points. In practice, due to rounding errors and scaling, Gencan may fail to satisfy (3) at some iterations. In these cases, Gencan is stopped after a maximum number of iterations (1000 in this implementation) or by a Not-Enough-Progress criterion. When this happens, it may be possible that the Feasibility-Complementarity convergence criterion ICM ≤ ε is satisfied at some iteration but not the projected gradient condition (3). If this is the case, the execution of Algencan continues without increasing the penalty parameter.
Ipopt, in its default configuration, modifies the initial estimation of the solution (even in the warm-start case) as well as the bound constraints of the problem.
of the problem. The modification of the initial point is done to avoid an initial point near
the boundary of the feasible region, whereas the modification of the bounds is done to avoid
feasible sets with empty interior. These modifications may be avoided by a non-standard setting
of the Ipopt parameters DBNDFRAC, DBNDPUSH and DMOVEBOUNDS (see the Ipopt
documentation for further details). However, modifying those parameters might influence the
overall performance of Ipopt. The determination of the optimal Ipopt parameters in the
presence of warm-starts is out of the scope of the present study.
We will consider first that the probably active bounds identified at x^k are those such that x_i^k = ℓ_i or x_i^k = u_i, and that the probably active inequality constraints at x^k are the constraints defined by g_i(x^k) ≥ −10^{−4}. Let I_A be the set of indices of probably active inequality constraints. Let r be the number of elements of I_A. Assume, without loss of generality, that the last n − q variables are identified as having probably active bounds. Thus, x_i^k = x̄_i ∈ {ℓ_i, u_i} for all i = q + 1, . . . , n. We define f̄, h̄ and ḡ_i (i ∈ I_A) as f, h and g_i with the last n − q variables fixed at their bounds x̄_{q+1}, . . . , x̄_n, and z = (x_1, . . . , x_q).
Therefore, the KKT system we aim to solve is:
∇f̄(z) + Σ_{i=1}^{m} λ_i ∇h̄_i(z) + Σ_{i∈I_A} µ_i ∇ḡ_i(z) = 0,    (12)
h̄(z) = 0,    (13)
ḡ_i(z) = 0, ∀ i ∈ I_A.    (14)
This nonlinear system has q+m+r variables and equations. We tried to use Newton’s method
for its resolution. The particular implementation of Newton’s method was straightforward.
Namely, writing the system above as F(y) = 0, we solved, at each iteration, the linear system

F′(y) Δy = −F(y),

and we updated y ← y + Δy. If ‖F(y)‖_∞ ≤ 10^{−8} the process was stopped. In this case, we
checked whether the obtained point remained feasible, up to tolerance 10^{−8}, and whether the inequality Lagrange multipliers remained nonnegative, up to tolerance 10^{−8}. If these requirements were satisfied, we declared that the local Newton acceleration was successful and obtained a KKT point, up to the required tolerance. If Newton's method used more than 5 iterations or if the linear Newtonian system could not be solved, we declared local failure. The linear Newtonian systems were solved using the HSL (Harwell) subroutine MA27. When MA27 detects singularity, we perturb the diagonal of F′(y) and we try again.
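In outline, the iteration just described looks as follows; dense numpy algebra stands in for MA27, and the singularity handling is a simplified stand-in for the diagonal perturbation:

```python
import numpy as np

def newton_kkt(F, J, y, tol=1e-8, max_iter=5, pert=1e-8):
    """Newton's method for F(y) = 0 with a crude diagonal perturbation
    when the Jacobian is (numerically) singular."""
    for _ in range(max_iter):
        Fy = F(y)
        if np.linalg.norm(Fy, np.inf) <= tol:
            return y, True                  # converged
        Jy = J(y)
        try:
            dy = np.linalg.solve(Jy, -Fy)
        except np.linalg.LinAlgError:
            # perturb the diagonal and retry, mimicking the MA27 fallback
            dy = np.linalg.solve(Jy + pert * np.eye(len(y)), -Fy)
        y = y + dy
    return y, np.linalg.norm(F(y), np.inf) <= tol
```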
Tests with this procedure (Newton 1) were not satisfactory. We experienced many failures in situations in which one or more inequality constraints were wrongly identified as active after the Algencan phase of the algorithm. As a consequence, the reduced KKT system (12)–(14) turned out to be incompatible and the precision 10^{−8} could not be achieved.
Therefore, we tried a second heuristic Newton procedure, called here Newton 2. The idea is the following: after the Algencan phase, we define I_A, r, q, f̄, ḡ, h̄, z as above, but we replace each inequality constraint corresponding to i ∈ I_A by the equality constraint ḡ_i(z) + s_i^2/2 = 0, where s_i is an auxiliary slack variable, and we state the KKT system associated to the new problem. This KKT system includes (12)–(13) but, instead of (14), includes the equations

ḡ_i(z) + s_i^2/2 = 0 and µ_i s_i = 0, ∀ i ∈ I_A.    (15)
The system (12)–(13)–(15) has r more variables and equations than the system (12)–(14) but does not force the I_A-constraints to be active at the solution. Of course, if we solve the new system, the danger remains that, at the solution, some inequality constraint corresponding to i ∉ I_A may be violated. Moreover, some inequality Lagrange multiplier might become negative. Therefore, as in the case of Newton 1, we test both possibilities up to tolerance 10^{−8}. Fulfillment of all the tests reveals that a KKT point with precision 10^{−8} has been found.
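A sketch of assembling the residual of (12)–(13)–(15) follows, assuming callables for the reduced functions and their Jacobians; all names are ours:

```python
import numpy as np

def kkt_residual(z, s, lam, mu, grad_fbar, hbar, jac_hbar, gbar_act, jac_gbar_act):
    """Residual of the squared-slack KKT system (12)-(13)-(15); the *_act
    callables evaluate only the inequality constraints with indices in I_A."""
    dual = grad_fbar(z) + jac_hbar(z).T @ lam + jac_gbar_act(z).T @ mu  # (12)
    primal_eq = hbar(z)                                                 # (13)
    primal_in = gbar_act(z) + 0.5 * s**2                                # (15), constraints
    compl = mu * s                                                      # (15), complementarity
    return np.concatenate([dual, primal_eq, primal_in, compl])
```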
The whole Algencan-Newton procedure is described below:
Step 1. Call Algencan with precision ε̂. Let x̂ = xk , λ̂ = λk+ , µ̂ = µk+ be the final approxi-
mations obtained by Algencan at this step.
Step 2. Set I_A = {i | g_i(x̂) ≥ −ε̂}, add squared slack variables and call Newton, using x̂, λ̂, µ̂ to initialize this method and setting the initial estimates of the slack variables as s_i = √(2 max{0, −g_i(x̂)}), ∀ i ∈ I_A. Use a maximum of 5 Newtonian iterations. Declare Convergence of Newton when all the components of the system (12)–(13)–(15) are, in modulus, smaller than or equal to ε. Let x∗ be the solution given by Newton.
Step 3. If Newton converged and max{ICM, DFM} ≤ ε, stop declaring success and return x∗. In this case, both the convergence criteria of Algencan and Ipopt are satisfied.
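Schematically, the interplay of Steps 1–3 can be written as the loop below; the failure branch, in which control returns to Algencan (as reported in the Conclusions), and the tolerance-tightening rule are assumptions of this sketch, and both solver callables are placeholders:

```python
def algencan_newton(algencan, newton_accel, eps, eps_hat):
    """Schematic driver for Algorithm 3.1: low-precision Algencan runs
    followed by Newton acceleration attempts."""
    while True:
        x, lam, mu = algencan(eps_hat)              # Step 1: run to precision eps_hat
        ok, x_star = newton_accel(x, lam, mu, eps)  # Step 2: Newton on (12)-(13)-(15)
        if ok:
            return x_star                           # Step 3: KKT point to tolerance eps
        eps_hat = max(eps, 0.1 * eps_hat)           # assumed tightening before retrying
```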
4 Test problems
For testing the algorithms studied in this paper, we used three variable-dimension problems.
Hard-Spheres:

Minimize_{p^i, z}  z
subject to  ‖p^i‖^2 = 1, i = 1, . . . , np,
            ⟨p^i, p^j⟩ ≤ z, i = 1, . . . , np − 1, j = i + 1, . . . , np.
Enclosing-Ellipsoid [61]:

Minimize_{l_ij}  − Σ_{i=1}^{n_d} log(l_ii)
subject to  (p^i)^T L L^T p^i ≤ 1, i = 1, . . . , np,

where L ∈ IR^{nd×nd} is a lower-triangular matrix. The number of variables is nd × (nd + 1)/2 and the number of inequality constraints is np (plus the bound constraints). The np points p^i ∈ IR^{nd} are randomly generated using the Cauchy distribution as suggested in [61].
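A minimal sketch of evaluating this formulation, with the points stored as the rows of an array, follows; names are illustrative:

```python
import numpy as np

def ellipsoid_obj_and_constraints(L, points):
    """Enclosing-Ellipsoid: objective -sum_i log(l_ii) and the constraint
    values (p^i)^T L L^T p^i - 1, for lower-triangular L of shape (nd, nd)
    and points of shape (np, nd)."""
    obj = -np.sum(np.log(np.diag(L)))
    w = points @ L                      # row i holds (p^i)^T L
    cons = np.sum(w * w, axis=1) - 1.0  # feasible iff cons <= 0
    return obj, cons
```

The constraint values are returned in the "≤ 0" convention of problem (1).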
where u∗ is defined by

u∗(i, j, k) = 10 q(i) q(j) q(k) (1 − q(i)) (1 − q(j)) (1 − q(k)) e^{q(k)^{4.5}},

and

Δv(i, j, k) = [ v(i + 1, j, k) + v(i − 1, j, k) + v(i, j + 1, k) + v(i, j − 1, k) + v(i, j, k + 1) + v(i, j, k − 1) − 6 v(i, j, k) ] / h^2,

for i, j, k = 2, . . . , np − 1. The number of variables is np^3 and the number of equality constraints is (np − 2)^3. We set θ = −100, h = 1/(np − 1) and |S| = 7. The elements of S are randomly generated in [1, np]^3. This problem has no inequality constraints.
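The seven-point stencil Δv can be evaluated vectorially over all interior grid points; a sketch, assuming v is stored as an (np, np, np) array:

```python
import numpy as np

def discrete_laplacian(v, h):
    """Seven-point stencil Delta v(i,j,k) at the interior points of an
    (np, np, np) grid; returns an array of shape (np-2, np-2, np-2)."""
    return (v[2:, 1:-1, 1:-1] + v[:-2, 1:-1, 1:-1]
            + v[1:-1, 2:, 1:-1] + v[1:-1, :-2, 1:-1]
            + v[1:-1, 1:-1, 2:] + v[1:-1, 1:-1, :-2]
            - 6.0 * v[1:-1, 1:-1, 1:-1]) / h**2
```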
The Hard-Spheres and Enclosing-Ellipsoid problems possess many inequality constraints, whereas the Bratu-based problem has a poor KKT structure. According to the reasons stated in the Introduction, these are problems in which an Augmented Lagrangian method might present faster convergence to a loose-tolerance approximate solution than an Interior-Point Newtonian method.
The Fortran 77 implementations of these problems (including first and second derivatives)
are part of the “La Cumparsita” collection of test problems and are available through the Tango
Project web page, as well as the Fortran 77 implementation of Algencan. Moreover, Fortran 77
interface subroutines that add slack variables to the original formulation of the problems were
developed and are also available. This interface allows problems from “La Cumparsita” to be
tackled by methods that deal only with equality constraints and bounds.
We ran Algencan, Ipopt, Algencan-Ipopt and Algencan-Newton for many variations
of these problems. The convergence stopping criteria were equivalent for all the methods except
for Algencan. Algencan stops with the complementarity defined by ICM(k) ≤ ε (which
is related to the minimum between constraint and multiplier), whereas in the other methods
the measure of non-complementarity is the product between slack and multiplier. Of course,
sometimes one of these criteria is more strict, sometimes the other is. We used the tolerance
ε = 10−8 for declaring convergence in all the problems.
All the experiments were run on a 1.8GHz AMD Opteron 244 processor with 2GB of RAM, under the Linux operating system. The compiler option "-O4" was adopted.
5 Numerical Results
5.1 Nine Selected Problems
For each problem we will report: Number of Algencan iterations, number of Ipopt iterations,
number of Newton iterations, final infeasibility (sup norm), final objective function value,
computer time used by Algencan, computer time used by Ipopt, computer time used by
Newton, and total computer time. In the case of Algencan-Newton we also report the
number of times Algencan was called (Step 1 of Algorithm 3.1) and the number of times
Newton was called (Step 2).
2. For defining the initial approximation we compute np points in the unitary sphere. Each point p^k is generated taking all the combinations of angles so far defined. Therefore, np = 2 × n_grid^{nd − 1}. The initial approximation x^0 was formed by p^1, . . . , p^{np} followed by the variable z. The initial
z was taken as the maximum scalar product ⟨p^i, p^j⟩ for i ≠ j. The initial slack variables for Ipopt were taken in such a way that all the constraints are satisfied at the initial approximation.
The selected problems are defined by nd = 3 and ngrid = 7, 8, 9. Therefore, np = 98, 128, 162.
The results are reported in Table 1. The following conventions were used in this table, as well
as in Tables 2 and 3.
1. When reporting the iterations of Algencan-Ipopt the expression a + b means that the
method performed a iterations of Algencan and b iterations of Ipopt.
2. In the iterations report of Algencan-Newton the expression a(c) + b(d) means that a
iterations of Algencan and b iterations of Newton were performed. Moreover, Algen-
can was called c times, whereas Newton was called d times by Algorithm 3.1.
3. The expression (A: c%) indicates the percentage of the total time of the algorithm under
consideration that was used by Algencan. For example, in the Hard-Spheres problem
(3, 98) we read that Algencan-Newton converged using 6.54 seconds and that 97% of
the CPU time was employed by Algencan.
• Hard-Spheres (3, 98): nd = 3, np = 98, n without slacks: 295, n with slacks: 5048,
number of equality constraints: 98, number of inequality constraints: 4753, total number
of constraints: 4851.
• Hard-Spheres (3, 128): nd = 3, np = 128, n without slacks: 385, n with slacks: 8513, number of equality constraints: 128, number of inequality constraints: 8128, total number of constraints: 8256.

• Hard-Spheres (3, 162): nd = 3, np = 162, n without slacks: 487, n with slacks: 13528, number of equality constraints: 162, number of inequality constraints: 13041, total number of constraints: 13203.
Table 1: Results for the Hard-Spheres problems (3, 98), (3, 128) and (3, 162).
set to 0.
5.1.3 Bratu
We consider three particular Bratu-based problems, defined by np = 10, 16, 20. As initial ap-
proximation we took u ≡ 0.
• Bratu (10): np = 10, n: 1000, number of equality constraints: 512, total number of
constraints: 512.
• Bratu (16): np = 16, n: 4096, number of equality constraints: 2744, total number of
constraints: 2744.
• Bratu (20): np = 20, n: 8000, number of equality constraints: 5832, total number of
constraints: 5832.
Table 2: Results for the Enclosing-Ellipsoid problems (3, 1000), (3, 12000) and (3, 20000).
Table 3: Results for the Bratu-based problems (np = 10, 16, 20; θ = −100, #S = 7).
KKT Jacobian matrices. In these cases, it is better to persevere with the matrix-free techniques
of Algencan and Gencan.
However, we felt the need to confirm these conclusions using a broader comparison basis.
With this in mind, we generated the following problems:

• Twenty groups of Hard-Spheres problems fixing nd = 5 and choosing np ∈ {40, 41, . . . , 59}.

• Twenty groups of Enclosing-Ellipsoid problems fixing nd = 3 and choosing np ∈ {1000, 2000, . . . , 20000}.
These problems have 6 variables (without slacks) and no equality constraints. The number
of inequality constraints goes from 1000 to 20000.
• Sixteen groups of Bratu-based problems choosing np ∈ {5, 6, . . . , 20}. The size of these
problems goes from 125 variables with 27 equality constraints, to 8000 variables with 5832
equality constraints.
We generated ten instances of each problem within each group. The random generation of the i-th instance (i = 1, 2, . . . , 10) of a particular problem (including its initial point) was done using Schrage's random number generator [58] with seed s = 123456 × i. In the case of the Hard-Spheres problem, the difference between instances lies only in the initial point. This means that, in this case, we solve the same problem starting from ten different initial points. In the Enclosing-Ellipsoid and the Bratu-based problems, some data are also randomly generated. Therefore, in these two cases, the ten instances within the same group are in fact different problems with different initial points.
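For reference, here is a sketch of the portable Lehmer generator of [58] (multiplier 16807, modulus 2^31 − 1, computed with Schrage's factorization to avoid overflow); whether the implementation used in these experiments matches this exactly is an assumption:

```python
def schrage_generator(seed):
    """Lehmer generator x <- 16807 * x mod (2^31 - 1), implemented with
    Schrage's factorization so that intermediates fit in 32-bit integers."""
    a, m, q, r = 16807, 2**31 - 1, 127773, 2836   # m = a*q + r
    state = seed
    def next_uniform():
        nonlocal state
        hi, lo = divmod(state, q)
        t = a * lo - r * hi
        state = t if t > 0 else t + m
        return state / m                          # uniform sample in (0, 1)
    return next_uniform

# e.g., the i-th instance of a problem would use seed s = 123456 * i
rand = schrage_generator(123456 * 1)
u = rand()
```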
The initial approximations were generated in the following way:
• Hard-Spheres: The initial point is randomly generated with p^i ∈ [−1, 1]^{nd} for i = 1, . . . , np and z ∈ [0, 1].
• Enclosing-Ellipsoid: The initial point is randomly generated with lij ∈ [0, 1].
In total we have 560 problems, divided into 56 groups. Table 4 provides the most important characteristics of each group of problems. The numerical results are summarized in Table 5. Each row of this table shows the average final functional value and computer time of a method over the 10 problems of the group. ALprop denotes the average fraction of computer time used by Algencan within Algencan-Ipopt.
Below we report some performance observations that do not appear in the table.
• Ipopt failed to satisfy the optimality condition in one of the Enclosing-Ellipsoid problems
of Group 9. However, the feasibility and the complementarity conditions were satisfied for
this problem.
• Ipopt failed to satisfy the optimality condition in 24 Bratu problems. Algencan-Ipopt
failed to satisfy optimality in 29 Bratu problems. In all these problems, Ipopt stopped
very close to a solution, except in one case, which corresponds to group 14 of Bratu.
Observe that, in this case, the final average objective function value is very high.
• Algencan did not satisfy the optimality criterion in two individual problems, corresponding to groups 19 and 20 of Hard-Spheres. In any case, the final point was feasible in both cases (with the required precision) and the functional value was comparable to the one achieved in the other problems.
6 Conclusions
For a number of reasons displayed in the Introduction, we believe that Augmented Lagrangian
methods based on the PHR formula will continue to be used for solving practical optimization
problems for many years. In this paper we studied ways for alleviating their main inconve-
nience: the slow convergence near a solution. We showed two different ways of overcoming this
disadvantage. One is to combine the Augmented Lagrangian method Algencan [2] with a fast
Interior-Point Newtonian solver (Ipopt). The other relies on the combination of the Augmented Lagrangian algorithm with the straightforward Newton method that uses the Algencan-
identification of active constraints. For computing Newtonian steps, we used a standard sparse
matrix solver. Of course, this destroys the matrix-free advantages of the Augmented Lagrangian
approach. However, we are confident that the employment of iterative saddle-point solvers [6]
should overcome this drawback.
The numerical experiments showed that, in contrast with our initial expectations, the combi-
nation Algencan-Ipopt was not successful. This is probably due to the fact that, as mentioned
in the Introduction, the tested problems exhibit characteristics that do not favor the application
of SQP or Interior-Point ideas, even if we start from good initial approximations. Hard-Spheres
and Enclosing-Ellipsoid problems have many constraints, whereas the Jacobian KKT structure
of the Bratu-based problems is hard for sparse matrix solvers.
On the other hand, Algencan-Newton, which consists of the application of Newton’s
method with a reduced number of squared slacks, starting from the Algencan low-precision ap-
proximation, was relatively successful in the many-constraints problems. In Enclosing-Ellipsoid
problems, the number of slacks added to the ordinary variables of the problem is small and,
so, Newton deals well with the KKT system identified by Algencan. In several Selected
Problems, Newton failed at some iterations of Algorithm 3.1. In these cases, the control came back to Algencan and the computer time ended up being satisfactory, in spite of the initial frustrated Newtonian attempts. However, Algencan-Newton was not as efficient in the
Hard-Spheres massive comparison as it was in the Selected Problems. The reason is that, in
the selected problems, each row of the constraint Jacobian matrix contains 7 nonnull elements
(including the slack), whereas in the massive comparison the number of non-null row Jacobian
        Problem parameters   Original formulation   Adding slack variables
Group      nd      np            n         m             n         m
  1         5      40           201       820           981       820
  2         5      41           206       861         1,026       861
  3         5      42           211       903         1,072       903
  4         5      43           216       946         1,119       946
  5         5      44           221       990         1,167       990
  6         5      45           226     1,035         1,216     1,035
                          Hard-Spheres problem

Table 4: Description of the problems.
          Algencan            Ipopt               Algencan+Ipopt                Algencan+Newton
Group   Time     f         Time     f          Time  ALprop     f             Time     f
  1     1.02  5.1045E-01   6.65  5.1663E-01    7.86  (0.17)  5.1663E-01       1.87  5.1052E-01
  2     2.07  5.1792E-01   8.84  5.2259E-01   10.37  (0.14)  5.2259E-01       1.63  5.1830E-01
  3     1.94  5.2366E-01   9.46  5.2803E-01   10.72  (0.13)  5.2803E-01       1.63  5.2368E-01
  4     1.96  5.2861E-01  11.74  5.3451E-01   12.91  (0.11)  5.3451E-01       1.75  5.2786E-01
  5     1.69  5.3392E-01  13.04  5.4147E-01   14.30  (0.10)  5.4147E-01       1.75  5.3308E-01
  6     2.02  5.4174E-01  12.06  5.4596E-01   13.95  (0.11)  5.4596E-01       2.19  5.4240E-01
                          Hard-Spheres problem

Table 5: Massive comparison.
elements goes from 11 to 17. This difference is enough to reduce the comparative efficiency of the Newtonian sparse matrix solver. Recall that MA27 does not take advantage of the specific structure and saddle-point characteristics of KKT systems. So, it is reasonable to conjecture that its replacement by a specific saddle-point solver would be more efficient. This observation leads us to recommend, once more, the employment of (direct or iterative) specific linear saddle-point solvers as surveyed in [6].
No claims are made in this paper with respect to the behavior of Algencan, Ipopt or the combined methods on problems with characteristics different from the ones studied here. We
believe, for example, that SQP-Interior Point ideas are very effective for small to medium scale
problems, or even large-scale problems with a moderate number of inequalities and reasonable
KKT Jacobian structure. Probably, in most of these situations, SQP-IP methods are more
efficient than Augmented Lagrangian algorithms. However, more numerical experimentation is
necessary in order to obtain reliable practical conclusions.
Acknowledgement. We are indebted to an anonymous referee for careful reading and encour-
aging words about this paper.
References
[1] R. Andreani, E. G. Birgin, J. M. Martı́nez and M. L. Schuverdt, Augmented Lagrangian
methods under the Constant Positive Linear Dependence constraint qualification, Mathe-
matical Programming 111, pp. 5–32, 2008.
[3] R. Andreani, J. M. Martı́nez and M. L. Schuverdt, On the relation between the Constant
Positive Linear Dependence condition and quasinormality constraint qualification, Journal
of Optimization Theory and Applications 125, pp. 473–485, 2005.
[4] M. Argáez and R. A. Tapia, On the global convergence of a modified augmented Lagrangian
linesearch interior-point method for Nonlinear Programming, Journal of Optimization The-
ory and Applications 114, pp. 1–25, 2002.
[5] S. Bakhtiari and A. L. Tits, A simple primal-dual feasible interior-point method for non-
linear programming with monotone descent, Computational Optimization and Applications
25, pp. 17–38, 2003.
[6] M. Benzi, G. H. Golub and J. Nielsen, Numerical solution of saddle-point problems, Acta
Numerica 14, pp. 1–137, 2005.
[8] E. G. Birgin, C. A. Floudas and J. M. Martínez, Global minimization using an Augmented Lagrangian method with variable lower-level constraints, available in Optimization Online, E-Print ID: 2006-12-1544, http://www.optimization-online.org/DB_HTML/2006/12/1544.html.
[13] E. G. Birgin, J. M. Martı́nez and M. Raydan, Inexact Spectral Projected Gradient methods
on convex sets, IMA Journal on Numerical Analysis 23, pp. 539–559, 2003.
[14] E. G. Birgin, J. M. Martínez and D. P. Ronconi, Optimizing the Packing of Cylinders into a
Rectangular Container: A Nonlinear Approach, European Journal of Operational Research
160, pp. 19–33, 2005.
[15] I. Bongartz, A. R. Conn, N. I. M. Gould and Ph. L. Toint, CUTE: constrained and un-
constrained testing environment, ACM Transactions on Mathematical Software 21, pp.
123–160, 1995.
[16] O. Burdakov, J. M. Martı́nez and E. A. Pilotta, A limited memory multipoint secant method
for bound constrained optimization, Annals of Operations Research 117, pp. 51–70, 2002.
[17] R. H. Byrd, J. Ch. Gilbert and J. Nocedal, A trust region method based on interior point
techniques for nonlinear programming, Mathematical Programming 89, pp. 149–185, 2000.
[18] R. H. Byrd, N. I. M. Gould, J. Nocedal and R. A. Waltz, An algorithm for nonlinear op-
timization using linear programming and equality constrained subproblems, Mathematical
Programming 100, pp. 27–48, 2004.
[19] R. H. Byrd, J. Nocedal and A. Waltz, Feasible interior methods using slacks for nonlinear
optimization, Computational Optimization and Applications 26, pp. 35–61, 2003.
[20] L. Chen and D. Goldfarb, Interior-Point `2 penalty methods for nonlinear programming
with strong global convergence properties, CORC Technical Report TR 2004-08, IEOR
Department, Columbia University, 2005.
[21] A. R. Conn, N. I. M. Gould, D. Orban and Ph. L. Toint, A primal-dual trust-region algo-
rithm for nonconvex nonlinear programming, Mathematical Programming 87, pp. 215–249,
2000.
[22] A. R. Conn, N. I. M. Gould and Ph. L. Toint, A globally convergent Augmented Lagrangian
algorithm for optimization with general constraints and simple bounds, SIAM Journal on
Numerical Analysis 28, pp. 545–572, 1991.
[23] A. R. Conn, N. I. M. Gould and Ph. L. Toint, Lancelot: A Fortran package for large
scale nonlinear optimization, Springer-Verlag, Berlin, 1992.
[24] A. R. Conn, N. I. M. Gould and Ph. L. Toint, Trust Region Methods, MPS/SIAM Series
on Optimization, SIAM, Philadelphia, 2000.
[25] H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups, 3rd ed., New York,
Springer-Verlag, 1999.
[26] Y-H Dai and R. Fletcher, Projected Barzilai-Borwein methods for large-scale box-
constrained quadratic programming, Numerische Mathematik 100, pp. 21–47, 2005.
[28] E. D. Dolan and J. J. Moré, Benchmarking optimization software with performance profiles,
Mathematical Programming 91, pp. 201–213, 2002.
[29] F. Facchinei, A. Fischer and C. Kanzow, On the accurate identification of active constraints,
SIAM Journal on Optimization 9, pp. 14–32, 1998.
[31] R. Fletcher, N. I. M. Gould, S. Leyffer, Ph. L. Toint and A. Wächter, Global convergence
of a trust-region SQP-filter algorithm for general nonlinear programming, SIAM Journal
on Optimization 13, pp. 635–659, 2002.
[32] A. Forsgren, P. E. Gill and M. H. Wright, Interior point methods for nonlinear optimization,
SIAM Review 44, pp. 525–597, 2002.
[33] E. M. Gertz and P. E. Gill, A primal-dual trust region algorithm for nonlinear optimization,
Mathematical Programming 100, pp. 49–94, 2004.
[34] P. E. Gill, W. Murray and M. A. Saunders, SNOPT: An SQP algorithm for large-scale
constrained optimization, SIAM Review 47, pp. 99–131, 2005.
[35] C. C. Gonzaga, E. Karas and M. Vanti, A globally convergent filter method for Nonlinear
Programming, SIAM Journal on Optimization 14, pp. 646–669, 2003.
[36] N. I. M. Gould, D. Orban, A. Sartenaer and Ph. L. Toint, Superlinear Convergence of
Primal-Dual Interior Point Algorithms for Nonlinear Programming, SIAM Journal on Op-
timization 11, pp. 974–1002, 2000.
[37] N. I. M. Gould, D. Orban and Ph. L. Toint, GALAHAD: a library of thread-safe Fortran
90 packages for large-scale nonlinear optimization, ACM Transactions on Mathematical
Software 29, pp. 353–372, 2003.
[38] N. I. M. Gould, D. Orban and Ph. L. Toint, An interior point `1 penalty method for
nonlinear optimization, Computational Science and Engineering Department, Rutherford
Appleton Laboratory, Chilton, Oxfordshire, England, 2003.
[39] W. W. Hager and M. S. Gowda, Stability in the presence of degeneracy and error estimation,
Mathematical Programming 85, pp. 181–192, 1999.
[40] W. W. Hager and H. C. Zhang, A new active set algorithm for box constrained optimization,
SIAM Journal on Optimization 17, pp. 526–557, 2006.
[41] M. R. Hestenes, Multiplier and gradient methods, Journal of Optimization Theory and
Applications 4, pp. 303–320, 1969.
[42] C. T. Kelley, Iterative methods for linear and nonlinear equations, SIAM, 1995.
[44] R. M. Lewis and V. Torczon, A globally convergent augmented Lagrangian pattern search
algorithm for optimization with general constraints and simple bounds, SIAM Journal on
Optimization 12, pp. 1075–1089, 2002.
[45] C. Lin and J. J. Moré, Newton’s method for large bound-constrained optimization problems,
SIAM Journal on Optimization 9, pp. 1100–1127, 1999.
[46] X. Liu and J. Sun, A robust primal-dual interior point algorithm for nonlinear programs,
SIAM Journal on Optimization 14, pp. 1163–1186, 2004.
[48] J. M. Martı́nez, Inexact Restoration Method with Lagrangian tangent decrease and new
merit function for Nonlinear Programming, Journal of Optimization Theory and Applica-
tions 111, pp. 39–58, 2001.
[49] J. M. Martı́nez and E. A. Pilotta, Inexact restoration methods for nonlinear programming:
advances and perspectives, in Optimization and Control with applications, edited by L. Q.
Qi, K. L. Teo and X. Q. Yang. Springer, pp. 271–292, 2005.
[50] J. M. Moguerza and F. J. Prieto, An augmented Lagrangian interior-point method using
directions of negative curvature, Mathematical Programming 95, pp. 573–616, 2003.
[51] Q. Ni and Y-X Yuan, A subspace limited memory quasi-Newton algorithm for large-scale
nonlinear bound constrained optimization, Mathematics of Computation 66, pp. 1509–1520,
1997.
[52] C. Oberlin and S. J. Wright, Active set identification in Nonlinear Programming, SIAM
Journal on Optimization 17, pp. 577–605, 2006.
[53] E. Polak and J. Royset, On the use of augmented Lagrangians in the solution of generalized
semi-infinite min-max problems, Computational Optimization and Applications 2, pp. 173–
192, 2005.
[55] L. Qi and Z. Wei, On the constant positive linear dependence condition and its application
to SQP methods, SIAM Journal on Optimization 10, pp. 963–981, 2000.
[56] R. T. Rockafellar, Augmented Lagrange multiplier functions and duality in nonconvex pro-
gramming, SIAM Journal on Control 12, pp. 268–285, 1974.
[57] R. T. Rockafellar, Lagrange multipliers and optimality, SIAM Review 35, pp. 183–238,
1993.
[58] L. Schrage, A more portable Fortran random number generator, ACM Transactions on
Mathematical Software 5, pp. 132–138, 1979.
[59] D. F. Shanno and R. J. Vanderbei, Interior-point methods for nonconvex nonlinear pro-
gramming: orderings and high-order methods, Mathematical Programming 87, pp. 303–316,
2000.
[61] M. Todd and E. A. Yildirim, On Khachiyan’s algorithm for the computation of minimum
volume enclosing ellipsoids, TR 1435, School of Operations Research and Industrial Engi-
neering, Cornell University, 2005.
[62] P. Tseng, A convergent infeasible interior-point trust-region method for constrained mini-
mization, SIAM Journal on Optimization 13, pp. 432–469, 2002.
[64] A. Wächter and L. T. Biegler, Failure of global convergence for a class of interior point
methods for nonlinear programming, Mathematical Programming 88, pp. 565–574, 2000.
[66] R. A. Waltz, J. L. Morales, J. Nocedal and D. Orban, An interior algorithm for nonlinear
optimization that combines line search and trust region steps, Mathematical Programming
107, pp. 391–408, 2006.
[68] S. J. Wright, Modifying SQP for degenerate problems, SIAM Journal on Optimization 13,
pp. 470–497, 2002.
[69] H. Yamashita and H. Yabe, An interior point method with a primal-dual quadratic barrier
penalty function for nonlinear optimization, SIAM Journal on Optimization 14, pp. 479–
499, 2003.
[70] B. Zhou, L. Gao and Y-H Dai, Monotone projected gradient methods for large-scale box-
constrained quadratic programming, Science in China Series A - Mathematics 49, pp.
688–702, 2006.