Wisdom of Crowds Intro
Felix Fischer
[email protected]
1 Introduction and Preliminaries
minimize f(x)
subject to h(x) = b (1.1)
x ∈ X.
minimize    cᵀx
subject to  aiᵀx ≥ bi ,  i ∈ M1
            aiᵀx ≤ bi ,  i ∈ M2
            aiᵀx = bi ,  i ∈ M3
            xj ≥ 0 ,     j ∈ N1
            xj ≤ 0 ,     j ∈ N2
where c ∈ Rⁿ is a cost vector, x ∈ Rⁿ is a vector of decision variables, and constraints are given by ai ∈ Rⁿ and bi ∈ R for i ∈ {1, . . . , m}. Index sets M1, M2, M3 ⊆ {1, . . . , m} and N1, N2 ⊆ {1, . . . , n} are used to distinguish between different types of constraints.
An equality constraint aiᵀx = bi is equivalent to the pair of constraints aiᵀx ≤ bi and aiᵀx ≥ bi , and a constraint of the form aiᵀx ≤ bi can be rewritten as (−ai)ᵀx ≥ −bi .
Each occurrence of an unconstrained variable xj can be replaced by xj⁺ + xj⁻, where xj⁺ and xj⁻ are two new variables with xj⁺ ≥ 0 and xj⁻ ≤ 0. We can thus write every linear program in the general form.

Figure 1.1: The feasible set of the linear program of Example 1.1, bounded by the lines x1 = 0, x2 = 0, x1 + 2x2 = 6, and x1 − x2 = 3, with the cost vector c, level sets of the objective at x1 + x2 ∈ {0, 2, 5}, and basic solutions labeled A through F

A linear program of the form

minimize    cᵀx
subject to  Ax = b
            x ≥ 0

is said to be in standard form. The standard form is of course a special case of the
general form. On the other hand, we can also bring every general form problem into the
standard form by replacing each inequality constraint of the form aiᵀx ≤ bi or aiᵀx ≥ bi by a constraint aiᵀx + si = bi or aiᵀx − si = bi , where si is a new slack variable, and an additional constraint si ≥ 0.
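The conversion just described is mechanical enough to sketch in code. The following Python function is our own illustration, not part of the notes; it appends one slack variable per inequality constraint:

```python
from typing import List, Tuple

def to_standard_form(a: List[List[float]], b: List[float],
                     senses: List[str]) -> Tuple[List[List[float]], List[float]]:
    """Turn constraints a_i^T x <= b_i / >= b_i / == b_i into equality form
    by appending one slack variable per inequality, as described above.
    senses[i] is one of '<=', '>=', '=='."""
    n = len(a[0])
    num_slacks = sum(1 for s in senses if s != '==')
    rows, rhs = [], []
    slack_pos = 0
    for ai, bi, s in zip(a, b, senses):
        row = list(ai) + [0.0] * num_slacks
        if s == '<=':                       # a_i^T x + s_i = b_i, s_i >= 0
            row[n + slack_pos] = 1.0
            slack_pos += 1
        elif s == '>=':                     # a_i^T x - s_i = b_i, s_i >= 0
            row[n + slack_pos] = -1.0
            slack_pos += 1
        rows.append(row)
        rhs.append(bi)
    return rows, rhs

# Example 1.1: x1 + 2 x2 <= 6 and x1 - x2 <= 3 become equalities with x3, x4.
A_eq, b_eq = to_standard_form([[1, 2], [1, -1]], [6, 3], ['<=', '<='])
```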
The general form is typically used to discuss the theory of linear programming,
while the standard form is often more convenient when designing algorithms.
Example 1.1. Consider the following linear program, which is illustrated in Figure 1.1:

minimize    −(x1 + x2)
subject to  x1 + 2x2 ≤ 6
            x1 − x2 ≤ 3
            x1, x2 ≥ 0
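Because the feasible set of this example is a polygon, the optimum can also be found by brute force: intersect every pair of constraint boundaries and keep the best feasible intersection point. A small Python sketch (our own illustration, not part of the notes):

```python
from itertools import combinations

# Constraint boundaries of Example 1.1, as pairs (coefficients, rhs).
lines = [
    ((1, 2), 6),    # x1 + 2 x2 = 6
    ((1, -1), 3),   # x1 -   x2 = 3
    ((1, 0), 0),    # x1 = 0
    ((0, 1), 0),    # x2 = 0
]

def is_feasible(x1, x2, eps=1e-9):
    return (x1 + 2 * x2 <= 6 + eps and x1 - x2 <= 3 + eps
            and x1 >= -eps and x2 >= -eps)

best = None
for (a1, c1), (a2, c2) in combinations(lines, 2):
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < 1e-12:
        continue  # parallel boundaries, no intersection point
    x1 = (c1 * a2[1] - c2 * a1[1]) / det
    x2 = (a1[0] * c2 - a2[0] * c1) / det
    if is_feasible(x1, x2):
        value = -(x1 + x2)  # the objective of Example 1.1
        if best is None or value < best[0]:
            best = (value, x1, x2)

print(best)  # → (-5.0, 4.0, 1.0)
```

The optimum is attained at the vertex (4, 1), which the later lectures identify as the basic feasible solution labeled C.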
1.3 Review: Unconstrained Optimization and Convexity
Solid lines indicate sets of points for which one of the constraints is satisfied with
equality. The feasible set is shaded. Dashed lines, orthogonal to the cost vector c,
indicate sets of points for which the value of the objective function is constant.
It is easy to see that in the case of LPs, the feasible set is convex and the objective
function is both convex and concave. But even when these two conditions are satisfied,
the above theorem cannot generally be used to solve constrained optimization problems,
because the gradient might not be zero anywhere on the feasible set.
2 The Method of Lagrange Multipliers
A well-known method for solving constrained optimization problems is the method
of Lagrange multipliers. The idea behind this method is to reduce constrained opti-
mization to unconstrained optimization, and to take the (functional) constraints into
account by augmenting the objective function with a weighted sum of them. To this
end, define the Lagrangian associated with (1.1) as

  L(x, λ) = f(x) − λᵀ(h(x) − b),

where λ ∈ Rᵐ is a vector of Lagrange multipliers.

Theorem 2.1 (Lagrangian sufficiency). Let x ∈ X and λ ∈ Rᵐ such that h(x) = b and L(x, λ) = min_{x′∈X} L(x′, λ). Then x is an optimal solution of (1.1).

Proof. We have that

  min_{x′∈X(b)} f(x′) = min_{x′∈X(b)} [f(x′) − λᵀ(h(x′) − b)]
                      ≥ min_{x′∈X} [f(x′) − λᵀ(h(x′) − b)]
                      = f(x) − λᵀ(h(x) − b) = f(x).
Equality in the first line holds because h(x 0 ) − b = 0 when x 0 ∈ X(b). The inequality
on the second line holds because the minimum is taken over a larger set. In the third
line we finally use that x minimizes L and that h(x) = b.
Two remarks are in order. First, a vector λ of Lagrange multipliers satisfying the
conditions of the theorem is not guaranteed to exist in general, but it does exist for a
large class of problems. Second, the theorem appears to be useful mainly for showing
that a given solution x is optimal. In certain cases, however, it can also be used to find
an optimal solution. Our general strategy in these cases will be to minimize L(x, λ) for
all values of λ, in order to obtain a minimizer x∗ (λ) that depends on λ, and then find
λ∗ such that x∗ (λ∗ ) satisfies the constraints.
minimize    x1 − x2 − 2x3
subject to  x1 + x2 + x3 = 5
            x1² + x2² = 4.
For every λ ∈ Y, the unique optimum of L(x, λ) occurs at x∗ (λ) = (3/(2λ2 ), 1/(2λ2 ), x3 )T,
and we need to find λ ∈ Y such that x∗ (λ) is feasible to be able to apply Theorem 2.1.
Therefore,

  x1² + x2² = 9/(4λ2²) + 1/(4λ2²) = 4,

and thus λ2 = −√(5/8). We can now use Theorem 2.1 to conclude that the minimization problem has an optimal solution at x1 = −3√(2/5), x2 = −√(2/5), and x3 = 5 − x1 − x2 = 5 + 4√(2/5).
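This solution is easy to sanity-check numerically: every point satisfying both constraints can be written as x1 = 2 cos t, x2 = 2 sin t, x3 = 5 − x1 − x2, so a fine scan over t should not beat the value found above. A quick check (our own, not from the notes):

```python
import math

# Points with x1^2 + x2^2 = 4 are (2 cos t, 2 sin t); the first constraint
# then fixes x3 = 5 - x1 - x2, so the objective becomes a function of t.
def objective(t):
    x1, x2 = 2 * math.cos(t), 2 * math.sin(t)
    return x1 - x2 - 2 * (5 - x1 - x2)

grid_min = min(objective(2 * math.pi * k / 100000) for k in range(100000))

# The solution obtained from the multiplier lambda2 = -sqrt(5/8):
lam2 = -math.sqrt(5 / 8)
x1, x2 = 3 / (2 * lam2), 1 / (2 * lam2)
analytic = x1 - x2 - 2 * (5 - x1 - x2)

assert abs(grid_min - analytic) < 1e-6
```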
2.3 Complementary Slackness
Let us formalize the strategy we have used to find x and λ satisfying the conditions of Theorem 2.1 for a more general problem. To solve a problem of the form

minimize    f(x)
subject to  h(x) ≤ b     (2.2)
            x ∈ X,

we proceed as follows:
1. Introduce a vector z of slack variables to obtain the equivalent problem
4. For each λ ∈ Y, minimize L(x, z, λ) subject only to the regional constraints, i.e.,
find x∗ (λ), z∗ (λ) satisfying
5. Find λ∗ ∈ Y such that (x∗(λ∗), z∗(λ∗)) is feasible, i.e., such that x∗(λ∗) ∈ X, z∗(λ∗) ≥ 0, and h(x∗(λ∗)) + z∗(λ∗) = b. By Theorem 2.1, x∗(λ∗) is optimal for (2.2).
Indeed, if the conditions were violated for some i, then the value of the Lagrangian could be reduced by reducing (z∗(λ))i , while maintaining that (z∗(λ))i ≥ 0. This would contradict (2.3). Further note that λ ∈ Y requires for each i = 1, . . . , m either that λi ≤ 0 or that λi ≥ 0, depending on the sign of bi. In the case where λi ≤ 0, we for example get that

  λ∗i (z∗(λ∗))i = 0, i.e., λ∗i (bi − (h(x∗(λ∗)))i) = 0.

Slack in the corresponding inequalities (h(x∗(λ∗)))i ≤ bi and λ∗i ≤ 0 has to be complementary, in the sense that it cannot occur simultaneously in both of them.
minimize    x1 − 3x2
subject to  x1² + x2² ≤ 4
            x1 + x2 ≤ 2.

Introducing slack variables z1 and z2 yields the equivalent problem

minimize    x1 − 3x2
subject to  x1² + x2² + z1 = 4
            x1 + x2 + z2 = 2
            z1 ≥ 0, z2 ≥ 0.
Since z1 ≥ 0 and z2 ≥ 0, the terms −λ1z1 and −λ2z2 have a finite minimum only if λ1 ≤ 0 and λ2 ≤ 0. In addition, the complementary slackness conditions λ1z1 = 0 and λ2z2 = 0 must hold at the optimum.
Minimizing L(x, z, λ) in x1 and x2 yields

  ∂L/∂x1 = 1 − λ2 − 2λ1x1 = 0 and
  ∂L/∂x2 = −3 − λ2 − 2λ1x2 = 0,

and we indeed obtain a minimum, because the Hessian

  HL = ( ∂²L/∂x1²    ∂²L/∂x1∂x2 )  =  ( −2λ1    0   )
       ( ∂²L/∂x2∂x1  ∂²L/∂x2²   )     (  0    −2λ1 )

is positive semidefinite when λ1 ≤ 0.
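Solving these stationarity conditions together with complementary slackness, and taking λ2 = 0 so that the second constraint may be slack, gives λ1 = −√10/4 and x = (−2/√10, 6/√10), with objective value −2√10. This completion of the example is our own; a crude grid search over the feasible region agrees with it:

```python
import math

def min_on_region(steps=600):
    """Grid search for min x1 - 3 x2 over the disk x1^2 + x2^2 <= 4,
    keeping only points that also satisfy x1 + x2 <= 2."""
    best = float('inf')
    for i in range(steps + 1):              # radius from 0 to 2
        r = 2 * i / steps
        for j in range(steps):              # angle around the circle
            t = 2 * math.pi * j / steps
            x1, x2 = r * math.cos(t), r * math.sin(t)
            if x1 + x2 <= 2:
                best = min(best, x1 - 3 * x2)
    return best

best_value = min_on_region()
assert abs(best_value + 2 * math.sqrt(10)) < 1e-3   # agrees with -2*sqrt(10)
```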
3 Shadow Prices and Lagrangian Duality

Theorem 3.1. Suppose that f and h are continuously differentiable on Rⁿ, and that there exist unique functions x∗ : Rᵐ → Rⁿ and λ∗ : Rᵐ → Rᵐ such that for each b ∈ Rᵐ, h(x∗(b)) = b, λ∗(b) ≤ 0, and f(x∗(b)) = φ(b) = inf{ f(x) − λ∗(b)ᵀ(h(x) − b) : x ∈ Rⁿ }. If x∗ and λ∗ are continuously differentiable, then

  ∂φ/∂bi (b) = λ∗i (b).
Proof. We have that

  ∂/∂bi f(x∗(b)) = Σ_{j=1}^n ∂f/∂xj (x∗(b)) · ∂x∗j/∂bi (b)

and

  ∂φ(b)/∂bi = Σ_{j=1}^n ( ∂f/∂xj (x∗(b)) − λ∗(b)ᵀ ∂h/∂xj (x∗(b)) ) ∂x∗j/∂bi (b)
              − ∂λ∗(b)ᵀ/∂bi (h(x∗(b)) − b) + λ∗(b)ᵀ ∂b/∂bi .
The first term on the right-hand side is zero, because x∗(b) minimizes L(x, λ∗(b)) and thus

  ∂L(x∗(b), λ∗(b))/∂xj = ∂f/∂xj (x∗(b)) − λ∗(b)ᵀ ∂h/∂xj (x∗(b)) = 0

for j = 1, . . . , n. The second term is zero as well, because x∗(b) is feasible and thus (h(x∗(b)) − b)k = 0 for k = 1, . . . , m, and the claim follows.
It should be noted that the result also holds when the functional constraints are inequalities: if the ith constraint does not hold with equality, then λ∗i = 0 by complementary slackness, and therefore also ∂λ∗i /∂bi = 0.
The Lagrange multipliers are also known as shadow prices, due to an economic
interpretation of the problem to
maximize    f(x)
subject to  h(x) ≤ b
            x ∈ X.
Consider a firm that produces n different goods from m different raw materials. Vector
b ∈ Rm describes the amount of each raw material available to the firm, vector x ∈ Rn
the quantity produced of each good. Functions h : Rn → Rm and f : Rn → R finally
describe the amounts of raw material required to produce, and the profit derived from
producing, particular quantities of the goods. The goal of the above problem thus is to
maximize the profit of the firm for given amounts of raw materials available to it. The
shadow price of raw material i then is the price the firm would be willing to pay per additional unit of this raw material, which of course should be equal to the additional profit derived from it, i.e., to ∂φ/∂bi (b).
  g(λ) = inf_{x′∈X} L(x′, λ)
       ≤ L(x, λ)
       = f(x) − λᵀ(h(x) − b)
       = f(x).
Equality on the first and third line holds by definition of g and L, the inequality on
the second line because x ∈ X. The last equality holds because x ∈ X(b) and therefore
h(x) − b = 0.
In light of this result, it is interesting to choose λ in order to make this lower bound
as large as possible, i.e., to
maximize g(λ)
subject to λ ∈ Y.
This problem is known as the dual problem, and (1.1) is in this context referred to
as the primal problem. If (3.2) holds with equality, i.e., if there exists λ ∈ Y such
that g(λ) = inf x∈X(b) f(x), the problem is said to satisfy strong duality. The cases
where strong duality holds are those that can be solved using the method of Lagrange
multipliers.
Example 3.3. Again consider the minimization problem of Example 2.2, and recall
that Y = {λ ∈ R2 : λ1 = −2, λ2 < 0} and that for each λ ∈ Y the minimum occurred at
x∗ (λ) = (3/(2λ2 ), 1/(2λ2 ), x3 ). Thus,
  g(λ) = inf_{x∈X} L(x, λ) = L(x∗(λ), λ) = 10/(4λ2) + 4λ2 − 10,

so the dual problem is to

maximize 10/(4λ2) + 4λ2 − 10 subject to λ2 < 0.
It should not come as a surprise that the maximum is attained for λ2 = −√(5/8), and that the primal and dual have the same optimal value, namely −2(√10 + 5). Note that it is not actually necessary to solve the dual to see that λ2 = −√(5/8) is an optimizer: it suffices that the value of the dual function at this point equals the value of the objective function of the primal at some point in the feasible set of the primal.
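Both claims are easy to confirm numerically. The following check is our own, not part of the notes:

```python
import math

def g(lam2):
    # the dual function of Example 3.3, defined for lam2 < 0
    return 10 / (4 * lam2) + 4 * lam2 - 10

lam2_star = -math.sqrt(5 / 8)
primal_value = -2 * (math.sqrt(10) + 5)

# strong duality: the dual optimum equals the primal optimum
assert abs(g(lam2_star) - primal_value) < 1e-12

# weak duality: every dual value is a lower bound on the primal optimum
for k in range(1, 200):
    assert g(-k / 10) <= primal_value + 1e-12
```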
There are several reasons why the dual is interesting. Any feasible solution of the
dual provides a succinct certificate that the optimal solution of the primal is bounded
by a certain value. In particular, a pair of solutions of the primal and dual that yield
the same value must be optimal. If strong duality holds, the optimal value of the primal
can be determined by solving the dual, which in some cases may be easier than solving
the primal. In a later lecture we will express two quantities as the optimal solutions of
a pair of a primal and a dual that satisfy strong duality, thereby showing that the two
quantities are equal.
4 Conditions for Strong Duality
While we have already solved a few optimization problems using the method of La-
grange multipliers, it was not clear a priori whether each individual problem satisfied
strong duality and whether our attempt to solve it would ultimately be successful. Our
goal in this lecture will be to identify general conditions that guarantee strong duality,
and classes of problems that satisfy these conditions.
This hyperplane has intercept β at b and slope λ. We can now try to find φ(b) as
follows:
1. For each λ, find βλ = sup{ β : β + λᵀ(c − b) ≤ φ(c) for all c ∈ Rᵐ }.
2. Choose λ to maximize βλ .
This approach is illustrated in Figure 4.1. We always have that βλ ≤ φ(b). In the situation on the left of Figure 4.1, this condition holds with equality because there is a tangent to φ at b. In fact,

  g(λ) = inf_{x∈X} L(x, λ)
       = inf_{c∈Rᵐ} [ φ(c) − λᵀ(c − b) ]
       = sup{ β : β + λᵀ(c − b) ≤ φ(c) for all c ∈ Rᵐ }
       = βλ .
We again see the weak duality result as maxλ βλ 6 φ(b), but we also obtain a
condition for strong duality. Call α : Rᵐ → R a supporting hyperplane to φ at b if α(c) = φ(b) + λᵀ(c − b) and φ(c) ≥ φ(b) + λᵀ(c − b) for all c ∈ Rᵐ.
Theorem 4.1. Problem (1.1) satisfies strong duality if and only if there exists a
(non-vertical) supporting hyperplane to φ at b.
Figure 4.1: Geometric interpretation of the dual with optimal value g(λ) = βλ . In the
situation on the left strong duality holds, and βλ = φ(b). In the situation on the right,
strong duality does not hold, and βλ < φ(b).
Proof. First suppose that there exists a supporting hyperplane to φ at b, i.e., λ ∈ Rᵐ such that for all c ∈ Rᵐ,

  φ(b) + λᵀ(c − b) ≤ φ(c).

Then

  φ(b) ≤ inf_{c∈Rᵐ} [ φ(c) − λᵀ(c − b) ]
       = inf_{c∈Rᵐ} inf_{x∈X(c)} [ f(x) − λᵀ(h(x) − b) ]
       = inf_{x∈X} L(x, λ)
       = g(λ).
On the other hand, φ(b) > g(λ) by Theorem 3.2, so φ(b) = g(λ) and strong duality
holds.
Now suppose that the problem satisfies strong duality. Then there exists λ ∈ Rᵐ such that for all c ∈ Rᵐ,

  φ(b) = g(λ) = inf_{x∈X} L(x, λ)
       ≤ inf_{x∈X(c)} L(x, λ)
       = φ(c) − λᵀ(c − b),

and thus

  φ(b) + λᵀ(c − b) ≤ φ(c).

This describes a (non-vertical) supporting hyperplane to φ at b.
minimize    f(x)
subject to  h(x) ≤ b
            x ∈ X,
and let φ be given by φ(b) = inf x∈X(b) f(x). Then, φ is convex if X, f, and h are
convex.
Proof. Consider b1 , b2 ∈ Rm such that φ(b1 ) and φ(b2 ) are defined, and let δ ∈ [0, 1]
and b = δb1 + (1 − δ)b2 . Further consider x1 ∈ X(b1 ), x2 ∈ X(b2 ), and let x =
δx1 + (1 − δ)x2 . Then convexity of X implies that x ∈ X, and convexity of h that
This holds for all x1 ∈ X(b1 ) and x2 ∈ X(b2 ), so taking infima on the right hand
side yields
φ(b) 6 δφ(b1 ) + (1 − δ)φ(b2 ).
Consider, for example, the problem to

minimize    x²
subject to  x³ = b
            x ∈ R

for some b > 0. Then φ(b) = b^(2/3), which is not convex. The Lagrangian is L(x, λ) = x² − λ(x³ − b) = (x² − λx³) + λb, and has a finite minimum if and only if λ = 0. The dual thus has an optimal value of 0, which is strictly smaller than φ(b) if b > 0.
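The duality gap is easy to exhibit numerically. A small illustration of our own:

```python
# phi(b) = b**(2/3) is the optimal primal value, while the dual optimum is 0:
# the Lagrangian x**2 - lam*(x**3 - b) is unbounded below for every lam != 0.
lagrangian = lambda x, lam, b: x * x - lam * (x ** 3 - b)

assert lagrangian(1e6, 1e-3, 1.0) < -1e9     # lam > 0: let x -> +infinity
assert lagrangian(-1e6, -1e-3, 1.0) < -1e9   # lam < 0: let x -> -infinity

for b in (0.5, 1.0, 8.0):
    assert b ** (2 / 3) - 0 > 0              # strictly positive duality gap
```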
Linear programs satisfy the conditions, both for equality and inequality constraints.
We thus have the following.
Theorem 4.4. If a linear program is feasible and bounded, it satisfies strong du-
ality.
5 Solutions of Linear Programs
In the remaining lectures, we will concentrate on linear programs. We begin by studying
the special structure of the feasible set and the objective function in this case, and how
it affects the set of optimal solutions.
Figure 5.1: Illustration of linear programs with one optimal solution (left) and an
infinite number of optimal solutions (right)
Assumptions (i) and (ii) are without loss of generality: if a set of rows are linearly
dependent, one of the corresponding constraints can be removed without changing the
feasible set; similarly, if a set of columns are linearly dependent, one of the correspond-
ing variables can be removed. Extra care needs to be taken to handle degeneracies, but
this is beyond the scope of this course.
If the above assumptions are satisfied, setting any subset of n − m variables to zero
uniquely determines the value of the remaining, basic variables. Computing the set of
basic feasible solutions is thus straightforward.
Example 5.1. Again consider the LP of Example 1.1. By adding slack variables x3 > 0
and x4 > 0, the functional constraint can be written as
                         ( x1 )
  ( 1   2   1   0 )      ( x2 )     ( 6 )
  ( 1  −1   0   1 )      ( x3 )  =  ( 3 ) .
                         ( x4 )

The problem has the following six basic solutions, corresponding to the (4 choose 2) = 6 possible ways to choose a basis, which are labeled A through F in Figure 1.1:
x1 x2 x3 x4 f(x)
A 0 0 6 3 0
B 0 3 0 6 3
C 4 1 0 0 5
D 3 0 3 0 3
E 6 0 0 −3 6
F 0 −3 12 0 −3
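The enumeration just described is easily mechanized. The following sketch (our own) reproduces the six basic solutions by solving the 2 × 2 system for each choice of basis:

```python
from itertools import combinations

# Constraints of Example 5.1 in standard form: A x = b with slacks x3, x4.
A = [[1, 2, 1, 0],
     [1, -1, 0, 1]]
b = [6, 3]

def basic_solution(cols):
    """Set the two non-basic variables to zero and solve the remaining
    2x2 system by Cramer's rule; returns None if the basis is singular."""
    i, j = cols
    det = A[0][i] * A[1][j] - A[0][j] * A[1][i]
    if abs(det) < 1e-12:
        return None
    x = [0.0] * 4
    x[i] = (b[0] * A[1][j] - b[1] * A[0][j]) / det
    x[j] = (A[0][i] * b[1] - A[1][i] * b[0]) / det
    return x

basic = [x for cols in combinations(range(4), 2)
         for x in [basic_solution(cols)] if x is not None]
feasible = [x for x in basic if min(x) >= 0]
best = max(feasible, key=lambda x: x[0] + x[1])   # maximize x1 + x2
```

There are six basic solutions, four of them feasible (A, B, C, D), and the best feasible one is C = (4, 1, 0, 0) with x1 + x2 = 5, matching the table.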
5.2 Extreme Points and Optimal Solutions
Proof. Consider a BFS x and suppose that x = δy + (1 − δ)z for y, z ∈ X(b) and
δ ∈ (0, 1). Since y > 0 and z > 0, x = δy + (1 − δ)z implies that yi = zi whenever
xi = 0. By (iii), y and z are basic solutions with the same basis, i.e., both have exactly
m non-zero entries, which occur in the same rows. Moreover, Ay = b = Az and thus
A(y − z) = 0. This yields a linear combination of m columns of A that is equal to zero,
which by (ii) implies that y = z. Thus x is an extreme point of X(b).
Now consider a feasible solution x ∈ X(b) that is not a BFS. Let i1, . . . , ir be the rows of x that are non-zero, and observe that r > m. This means that the columns a^{i1}, . . . , a^{ir}, where a^i = (a1i, . . . , ami)ᵀ, have to be linearly dependent, i.e., there has to exist a collection of r non-zero numbers yi1, . . . , yir such that yi1 a^{i1} + · · · + yir a^{ir} = 0. Extending y to a vector in Rⁿ by setting yi = 0 if i ∉ {i1, . . . , ir}, we have Ay = yi1 a^{i1} + · · · + yir a^{ir} = 0 and thus A(x ± εy) = b for every ε ∈ R. By choosing ε > 0 small enough, x ± εy ≥ 0 and thus x ± εy ∈ X(b). Moreover x = ½(x − εy) + ½(x + εy), so x is not an extreme point of X(b).
We are now ready to show that an optimum occurs at an extreme point of the
feasible set.
Theorem 5.3. If the linear program (5.2) is feasible and bounded, then it has an
optimal solution that is a basic feasible solution.
Proof. Let x be an optimal solution of (5.2). If x has exactly m non-zero entries, then
it is a BFS and we are done. So suppose that x has r non-zero entries for r > m, and
that it is not an extreme point of X(b), i.e., that x = δy + (1 − δ)z for y, z ∈ X(b)
with y 6= z and δ ∈ (0, 1). We will show that there must exist an optimal solution with
strictly fewer than r non-zero entries; the claim then follows by induction.
Since cᵀx ≥ cᵀy and cᵀx ≥ cᵀz by optimality of x, and since cᵀx = δcᵀy + (1 − δ)cᵀz, we must have that cᵀx = cᵀy = cᵀz, so y and z are optimal as well. As in the proof of Theorem 5.2, xi = 0 implies that yi = zi = 0, so y and z have at most r non-zero entries, which must occur in the same rows as in x. If y or z has strictly fewer than r non-zero entries, we are done. Otherwise let x′ = δ′y + (1 − δ′)z = z + δ′(y − z), and observe that x′ is optimal for every δ′ ∈ R. Moreover, y − z ≠ 0, and all non-zero entries of y − z occur in rows where x is non-zero as well. We can thus choose δ′ ∈ R such that x′ ≥ 0 and such that x′ has strictly fewer than r non-zero entries.
The result can in fact be extended to show that the maximum of a convex function
f over a compact convex set X occurs at an extreme point of X. In this case any
point x ∈ X can be written as a convex combination x = Σ_{i=1}^k δi xi of extreme points x1, . . . , xk ∈ X, where δ ∈ Rᵏ with δ ≥ 0 and Σ_{i=1}^k δi = 1. Convexity of f then implies that

  f(x) ≤ Σ_{i=1}^k δi f(xi) ≤ max_{1≤i≤k} f(xi).
6 Linear Programming Duality

By introducing a vector z ≥ 0 of slack variables, the primal problem (1.2) can be written as

  min { cᵀx : Ax − z = b, x, z ≥ 0 }.

The minimum of the Lagrangian over the regional constraints is finite provided that

  λ ∈ Y = { µ : cᵀ − µᵀA ≥ 0, µ ≥ 0 }.

For λ ∈ Y, the minimum of L((x, z), λ) is attained when both (cᵀ − λᵀA)x = 0 and λᵀz = 0, and thus

  g(λ) = inf_{(x,z)∈X} L((x, z), λ) = λᵀb.

The dual problem is thus to

  max { bᵀλ : Aᵀλ ≤ c }.     (6.1)
The dual is itself a linear program, and its dual is in fact equivalent to the primal.
Theorem 6.1. In the case of linear programming, the dual of the dual is the primal.
Proof. The dual can be written equivalently as

  min { (−b)ᵀλ : (−Aᵀ)λ ≥ −c, λ ≥ 0 }.

This problem has the same form as the primal (1.2), with −b taking the role of c, −c taking the role of b, and −Aᵀ the role of A. Taking the dual again we thus return to the original problem.
Figure 6.1: Geometric interpretation of primal and dual linear programs in Example 6.2
To see that these LPs are indeed dual to each other, observe that the primal has the
form (1.2), and the dual the form (6.1), with
  c = − ( 3 ) ,   A = − ( 2  1 ) ,   b = − ( 4 ) .
        ( 2 )           ( 2  3 )           ( 6 )
As before, we can compute all basic solutions of the primal by setting any set of
n − m = 2 variables to zero in turn, and solving for the values of the remaining m = 2
variables. Given a particular basic solution of the primal, the corresponding dual
solution can be found using the complementary slackness conditions λ1 z1 = 0 = λ2 z2
and µ1 x1 = 0 = µ2 x2 . These conditions identify, for each non-zero variable of the
primal, a dual variable whose value has to be equal to zero. By solving for the remaining
variables, we obtain a solution for the dual, which is in fact a basic solution. Repeating
this procedure for every basic solution of the primal, we obtain the following pairs of
basic solutions of the primal and dual:
      x1    x2   z1   z2   f(x)   λ1    λ2   µ1     µ2     g(λ)
A     0     0    4    6    0      0     0    −3     −2     0
B     2     0    0    2    6      3/2   0    0      −1/2   6
C     3     0    −2   0    9      0     3/2  0      5/2    9
D     3/2   1    0    0    13/2   5/4   1/4  0      0      13/2
E     0     2    2    0    4      0     2/3  −5/3   0      4
F     0     4    0    −6   8      2     0    1      0      8
Labels A through F refer to Figure 6.1, which illustrates the feasible regions of the primal and the dual. Observe that there is only one pair such that both the primal and the dual solution are feasible, the one labeled D, and that these solutions are optimal.
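The distinguished pair D can be checked directly, reading the primal as max 3x1 + 2x2 subject to 2x1 + x2 ≤ 4 and 2x1 + 3x2 ≤ 6, and the dual as min 4λ1 + 6λ2 subject to 2λ1 + 2λ2 ≥ 3 and λ1 + 3λ2 ≥ 2 — the sign convention suggested by the matrices above. A check of our own:

```python
# Row D of the table: the only pair where both solutions are feasible.
x = (1.5, 1.0)        # primal solution
lam = (1.25, 0.25)    # dual solution

# primal feasibility: 2 x1 + x2 <= 4, 2 x1 + 3 x2 <= 6, x >= 0
assert 2 * x[0] + x[1] <= 4 and 2 * x[0] + 3 * x[1] <= 6 and min(x) >= 0

# dual feasibility: 2 l1 + 2 l2 >= 3, l1 + 3 l2 >= 2, lam >= 0
assert 2 * lam[0] + 2 * lam[1] >= 3 and lam[0] + 3 * lam[1] >= 2 and min(lam) >= 0

# equal objective values certify that both solutions are optimal
assert 3 * x[0] + 2 * x[1] == 4 * lam[0] + 6 * lam[1] == 6.5
```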
6.2 Necessary and Sufficient Conditions for Optimality
Theorem 6.3. Let x and λ be feasible solutions for the primal (1.2) and the dual (6.1), respectively. Then x and λ are optimal if and only if they satisfy complementary slackness, i.e., if

  (cᵀ − λᵀA)x = 0 and λᵀ(Ax − b) = 0.

Proof. If x and λ are optimal, then cᵀx = λᵀb by strong duality, and therefore

  cᵀx = λᵀb = inf_{x′∈X} [ cᵀx′ − λᵀ(Ax′ − b) ]
            ≤ cᵀx − λᵀ(Ax − b)
            ≤ cᵀx.

Since the first and last term are the same, the two inequalities must hold with equality. Therefore, λᵀb = cᵀx − λᵀ(Ax − b) = (cᵀ − λᵀA)x + λᵀb, and thus (cᵀ − λᵀA)x = 0. Furthermore, cᵀx − λᵀ(Ax − b) = cᵀx, and thus λᵀ(Ax − b) = 0.

If on the other hand (cᵀ − λᵀA)x = 0 and λᵀ(Ax − b) = 0, then

  cᵀx = cᵀx − λᵀ(Ax − b) = (cᵀ − λᵀA)x + λᵀb = λᵀb,

and x and λ must be optimal by weak duality.
While the result has been formulated here for the primal LP in general form and
the corresponding dual, it is true, with the appropriate complementary slackness con-
ditions, for any pair of a primal and dual LP.
7 The Simplex Method
Let A ∈ Rm×n , b ∈ Rm . Further let B be a basis, i.e., a set B ⊆ {1, . . . , n} with |B| = m,
corresponding to a choice of m non-zero variables. Let x ∈ Rn such that Ax = b. Then
we have
AB xB + AN xN = b,
where AB ∈ Rm×m and AN ∈ Rm×(n−m) respectively consist of the columns of A
indexed by B and those not indexed by B, and xB and xN respectively consist of the
rows of x indexed by B and those not indexed by B. Moreover, if x is a basic solution,
then there is a basis B such that xN = 0 and AB xB = b, and if x is a basic feasible
solution, there is a basis B such that xN = 0, AB xB = b, and xB > 0.
Suppose that

  cNᵀ − cBᵀAB⁻¹AN ≤ 0.     (7.1)

Then, for any feasible x ∈ Rⁿ, it holds that xN ≥ 0 and therefore f(x) ≤ cBᵀAB⁻¹b. The basic solution x∗ with x∗B = AB⁻¹b and x∗N = 0, on the other hand, is feasible and satisfies f(x∗) = cBᵀAB⁻¹b. It must therefore be optimal.
If alternatively (cNᵀ − cBᵀAB⁻¹AN)i > 0 for some i, then we can increase the value of
the objective by increasing (xN )i . Either this can be done indefinitely, which means
that the maximum is unbounded, or the constraints force some of the variables in the
basis to become smaller and we have to stop when the first one reaches zero. In that
case we have found a new BFS with a larger value and can repeat the process.
Assuming that the LP is feasible and has a bounded optimal solution, there exists
a basis B∗ for which (7.1) is satisfied. The basic idea behind the simplex method is to
start from an initial BFS and then move from basis to basis until B∗ is found. The in-
formation required for this procedure can conveniently be represented by the so-called
simplex tableau. For a given basis B, it takes the following form:¹

¹The columns of the tableau have been permuted such that those corresponding to the basis appear on the left. This has been done just for convenience: in practice we will always be able to identify the columns corresponding to the basis by the embedded identity matrix.
            B (m columns)            N (n − m columns)      (1 column)
  m rows    AB⁻¹AB = I               AB⁻¹AN                 AB⁻¹b
  1 row     cBᵀ − cBᵀAB⁻¹AB = 0      cNᵀ − cBᵀAB⁻¹AN        −cBᵀAB⁻¹b
The first m rows consist of the matrix A and the column vector b, multiplied by the inverse of AB. It is worth pointing out that for any basis B, the LP with constraints AB⁻¹Ax = AB⁻¹b is equivalent to the one with constraints Ax = b. The first n columns of the last row are equal to cᵀ − λᵀA for λᵀ = cBᵀAB⁻¹. The vector λ can be interpreted as a solution, not necessarily feasible, to the dual problem. In the last column of the last row we finally have the value −f(x), where x is the BFS given by xB = AB⁻¹b and xN = 0.
We will see later that the simplex method always maintains feasibility of this solution x. As a consequence it also maintains complementary slackness for x and λᵀ = cBᵀAB⁻¹: since we work with an LP in standard form, λᵀ(Ax − b) = 0 follows automatically from the feasibility condition, Ax = b; the condition (cᵀ − λᵀA)x = 0 holds because xN = 0 and cBᵀ − λᵀAB = cBᵀ − cBᵀAB⁻¹AB = 0. What it then means for (7.1) to become satisfied is that cᵀ − λᵀA ≤ 0, i.e., that λ is a feasible solution for the dual. Optimality of x is thus actually a consequence of Theorem 6.3.
In the following we denote the entries of the tableau for the current basis by aij, with ai0 the entries of the last column, a0j the entries of the last row, and a00 the entry in their intersection:

  (aij)   ai0
   a0j    a00
We will now describe the different steps of the simplex method in more detail and
illustrate them using the LP of Example 1.1.
variables x1 and x2 that are not in the basis, and b and 0 in the last column. For the
LP of Example 1.1 we obtain the following tableau, where rows and columns have been
labeled with the names of the corresponding variables:
x1 x2 z1 z2 ai0
z1 1 2 1 0 6
z2 1 −1 0 1 3
a0j 1 1 0 0 0
If the constraints do not have this convenient form, finding an initial BFS requires
more work. We will discuss this case in the next lecture.
Pivoting
The purpose of the pivoting step is to get the tableau into the appropriate form for the
new BFS. For this, we multiply row i by 1/aij and add a −(akj /aij ) multiple of row i
to each row k 6= i, including the last one. Our choice of the pivot row as a row that
minimizes ai0 /aij turns out to be crucial, as it guarantees that the solution remains
feasible after pivoting. In our example, we need to subtract the second row from both
the first and the last row, after which the tableau looks as follows:
x1 x2 z1 z2 ai0
z1 0 3 1 −1 3
x1 1 −1 0 1 3
a0j 0 2 0 −1 −3
Note that the second row now corresponds to variable x1 , which has replaced z2 in the
basis.
We are now ready to choose a new pivot column. In our example, one further
iteration yields the following tableau:
      x1   x2   z1     z2     ai0
x2    0    1    1/3    −1/3   1
x1    1    0    1/3    2/3    4
a0j   0    0    −2/3   −1/3   −5
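The whole procedure fits in a few lines of code. The sketch below is our own, using exact rational arithmetic and the pivoting rules described above; it reproduces the tableaus of this example:

```python
from fractions import Fraction

def simplex(T, basis):
    """Naive tableau simplex for a maximization problem in standard form.
    T holds m constraint rows [coefficients | a_i0] followed by the
    objective row [a_0j | -value]; basis maps rows to basic columns."""
    while True:
        last = T[-1]
        cols = [j for j in range(len(last) - 1) if last[j] > 0]
        if not cols:
            return T, basis          # condition (7.1) holds: optimal
        j = cols[0]                  # pivot column
        # pivot row: minimize a_i0 / a_ij over rows with a_ij > 0
        # (an empty sequence here would mean the LP is unbounded)
        _, i = min((T[i][-1] / T[i][j], i) for i in range(len(T) - 1)
                   if T[i][j] > 0)
        piv = T[i][j]
        T[i] = [v / piv for v in T[i]]
        for k in range(len(T)):
            if k != i and T[k][j] != 0:
                T[k] = [a - T[k][j] * b for a, b in zip(T[k], T[i])]
        basis[i] = j

# Example 1.1 in standard form, maximizing x1 + x2 (columns x1, x2, z1, z2):
F = Fraction
T = [[F(1), F(2), F(1), F(0), F(6)],
     [F(1), F(-1), F(0), F(1), F(3)],
     [F(1), F(1), F(0), F(0), F(0)]]
T, basis = simplex(T, [2, 3])
print(-T[-1][-1], basis)   # → 5 [1, 0]
```

The two pivots performed match the tableaus above, ending with x1 = 4, x2 = 1 in the basis and value 5.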
8 The Two-Phase Simplex Method
1. Bring the constraints into equality form. For each constraint in which the slack
variable and the right-hand side have opposite signs, or in which there is no slack
variable, add a new artificial variable that has the same sign as the right-hand
side.
2. Phase I: minimize the sum of the artificial variables, starting from the BFS where
the absolute value of the artificial variable for each constraint, or of the slack
variable in case there is no artificial variable, is equal to that of the right-hand
side.
3. If some artificial variable has a positive value in the optimal solution, the original
problem is infeasible; stop.
4. Phase II: solve the original problem, starting from the BFS found in phase I.
While the original objective is not needed for phase I, it is useful to carry it along
as an extra row in the tableau, because it will then be in the appropriate form at
the beginning of phase II. In the example, phase I therefore starts with the following
tableau:
x1 x2 z1 z2 z3 y1 y2
y1 1 1 −1 0 0 1 0 1
y2 2 −1 0 −1 0 0 1 1
z3 0 3 0 0 1 0 0 2
II −6 −3 0 0 0 0 0 0
I 3 0 −1 −1 0 0 0 2
Note that the objective for phase I is written in terms of the non-basic variables. This
can be achieved by first writing it in terms of y1 and y2 , such that we have −1 in the
columns for y1 and y2 and 0 in all other columns because we are maximizing −y1 −y2 ,
and then adding the first and second row to make the entries for all variables in the
basis equal to zero.
Phase I now proceeds by pivoting on a21 to get

      x1   x2     z1   z2     z3   y1   y2
y1    0    3/2    −1   1/2    0    1    −1/2   1/2
x1    1    −1/2   0    −1/2   0    0    1/2    1/2
z3    0    3      0    0      1    0    0      2
II    0    −6     0    −3     0    0    3      3
I     0    3/2    −1   1/2    0    0    −3/2   1/2
A second pivot, on the entry in row y1 and column z2, then removes y1 from the basis. Note that we could also have chosen a12 as the pivot element in the second step, and would have obtained the same result.

This ends phase I as y1 = y2 = 0, and we have found a BFS for the original problem with x1 = z2 = 1, z3 = 2, and x2 = z1 = 0. After dropping the columns for y1 and y2 and the row corresponding to the objective for phase I, the tableau is in the right form for phase II:
x1 x2 z1 z2 z3
0 3 −2 1 0 1
1 1 −1 0 0 1
0 3 0 0 1 2
0 3 −6 0 0 6
By pivoting on a12 we obtain the following tableau, corresponding to an optimal solu-
tion of the original problem with x1 = 2/3, x2 = 1/3, and value −5:
x1   x2   z1     z2     z3
0    1    −2/3   1/3    0    1/3
1    0    −1/3   −1/3   0    2/3
0    0    2      −1     1    1
0    0    −4     −1     0    5
It is worth noting that the problem we have just solved is the dual of the LP in
Example 1.1, which we solved in the previous lecture, augmented by the constraint
3x2 6 2. Ignoring the column and row corresponding to z3 , the slack variable for this
new constraint, the final tableau is essentially the negative of the transpose of the final
tableau we obtained in the previous lecture. This makes sense because the additional
constraint is not tight in the optimal solution, as we can see from the fact that z3 ≠ 0.
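The pivoting arithmetic of this lecture is easy to check mechanically. The sketch below (our own) applies the pivot rule to the phase-II starting tableau and reproduces the final tableau:

```python
from fractions import Fraction as F

def pivot(T, i, j):
    """One pivot on entry (i, j): scale row i by 1/a_ij, then subtract
    the appropriate multiple of it from every other row."""
    piv = T[i][j]
    T[i] = [v / piv for v in T[i]]
    for k in range(len(T)):
        if k != i and T[k][j] != 0:
            T[k] = [a - T[k][j] * b for a, b in zip(T[k], T[i])]

# Phase-II starting tableau from above (columns x1, x2, z1, z2, z3 | rhs).
T = [[F(0), F(3), F(-2), F(1), F(0), F(1)],
     [F(1), F(1), F(-1), F(0), F(0), F(1)],
     [F(0), F(3), F(0), F(0), F(1), F(2)],
     [F(0), F(3), F(-6), F(0), F(0), F(6)]]
pivot(T, 0, 1)   # the pivot on a12 performed in the text

final = [[0, 1, F(-2, 3), F(1, 3), 0, F(1, 3)],
         [1, 0, F(-1, 3), F(-1, 3), 0, F(2, 3)],
         [0, 0, 2, -1, 1, 1],
         [0, 0, -4, -1, 0, 5]]
assert T == final
```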
9 Non-Cooperative Games
The theory of non-cooperative games studies situations in which multiple self-interested
entities, or players, simultaneously and independently optimize different objectives and
outcomes must therefore be self-enforcing.
S T
S (2, 2) (0, 3)
T (3, 0) (1, 1)
C D
C (2, 2) (1, 3)
D (3, 1) (0, 0)
Figure 9.2: The game of chicken, where players can chicken out or dare
maximize    v
subject to  Σ_{i=1}^m pij xi ≥ v for j = 1, . . . , n     (9.1)
            Σ_{i=1}^m xi = 1
            x ≥ 0.
The unique maximin strategy in the game of chicken is to yield, for a security level of 1. This also illustrates that a maximin strategy need not be optimal: assuming that the row player yields, the optimal action for the column player is in fact to go straight.
Formally, strategy x ∈ X of the row player is a best response to strategy y ∈ Y of the column player if for all x′ ∈ X, p(x, y) ≥ p(x′, y). The concept of a best response for the column player is defined analogously. A pair of strategies (x, y) ∈ X × Y such that x is a best response to y and y is a best response to x is called an equilibrium. Equilibria are also known as Nash equilibria, because their universal existence was shown by John Nash.
It is easily verified that both (C, D) and (D, C) are equilibria of the game of chicken,
and there is one more equilibrium, in which both players randomize uniformly between
their two actions. The proof of Theorem 9.1 is beyond the scope of this course, but
we show the result for the special case when the players have diametrically opposed
interests.
9.2 The Minimax Theorem

Proof. Again consider the linear program (9.1), and recall that the optimum value
of this linear program is equal to maxx∈X miny∈Y p(x, y). By adding a slack variable
z ∈ Rn with z > 0 we obtain the Lagrangian
  L(v, x, z, w, y) = v + Σ_{j=1}^n yj ( Σ_{i=1}^m xi pij − zj − v ) − w ( Σ_{i=1}^m xi − 1 )
                   = ( 1 − Σ_{j=1}^n yj ) v + Σ_{i=1}^m ( Σ_{j=1}^n pij yj − w ) xi − Σ_{j=1}^n yj zj + w,
minimize    w
subject to  Σ_{j=1}^n pij yj ≤ w for i = 1, . . . , m
            Σ_{j=1}^n yj = 1
            y ≥ 0.
It is easy to see that the optimum value of the dual is miny∈Y maxx∈X p(x, y), and the
theorem follows from strong duality.
The number maxx∈X miny∈Y p(x, y) = miny∈Y maxx∈X p(x, y) is also called the
value of the matrix game with payoff matrix P.
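For small games the value can be approximated directly from its definition by a grid search over mixed strategies. The following sketch uses matching pennies, a game of our own choosing whose value is 0:

```python
# Mixed strategies in a 2x2 game reduce to one probability per player.
P = [[1, -1], [-1, 1]]   # matching pennies (our example; its value is 0)

def payoff(x, y):
    """Expected payoff p(x, y) when the row player plays row 1 with
    probability x and the column player plays column 1 with probability y."""
    return (P[0][0] * x * y + P[0][1] * x * (1 - y)
            + P[1][0] * (1 - x) * y + P[1][1] * (1 - x) * (1 - y))

grid = [k / 200 for k in range(201)]
maxmin = max(min(payoff(x, y) for y in grid) for x in grid)
minmax = min(max(payoff(x, y) for x in grid) for y in grid)

assert abs(maxmin) < 1e-9 and abs(minmax) < 1e-9   # both equal the value 0
assert maxmin <= minmax + 1e-9                      # weak duality always holds
```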
It is now easy to show that every matrix game has an equilibrium, and that the
above result in fact characterizes the set of equilibria of such games.
  max_{x′∈X} p(x′, y) = min_{y′∈Y} max_{x′∈X} p(x′, y′).
10 The Minimum-Cost Flow Problem
The remaining lectures will be concerned with optimization problems on networks, in
particular with flow problems.
10.1 Networks
A directed graph, or network, G = (V, E) consists of a set V of vertices and a set
E ⊆ V × V of edges. When the relation E is symmetric, G is called an undirected
graph, and we can write edges as unordered pairs {u, v} ∈ E for u, v ∈ V. The degree
of vertex u ∈ V in graph G is the number |{v ∈ V : (u, v) ∈ E or (v, u) ∈ E}| of other
vertices connected to it by an edge. A walk from u ∈ V to w ∈ V is a sequence of
vertices v1 , . . . , vk ∈ V such that v1 = u, vk = w, and (vi , vi+1 ) ∈ E for i = 1, . . . , k − 1.
In a directed graph, we can also consider an undirected walk where (vi , vi+1 ) ∈ E or
(vi+1 , vi ) ∈ E for i = 1, . . . , k − 1. A walk is a path if v1 , . . . , vk are pairwise distinct,
and a cycle if v1 , . . . , vk−1 are pairwise distinct and vk = v1 . A graph that does not
contain any cycles is called acyclic. A graph is called connected if for every pair of
vertices u, v ∈ V there is an undirected path from u to v. A tree is a graph that is
connected and acyclic. A graph G′ = (V′, E′) is a subgraph of graph G = (V, E) if
V′ ⊆ V and E′ ⊆ E. In the special case where G′ is a tree and V′ = V, it is called a
spanning tree of G.
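These definitions are straightforward to operationalize. The following sketch, with a small made-up graph, computes the degree of a vertex and checks connectedness by searching along edges in either direction:

```python
from collections import deque

# A directed graph as a vertex set and a set of ordered pairs; the graph
# itself is a made-up example.
V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (1, 3), (3, 4)}

def degree(u):
    # number of other vertices joined to u by an edge, in either direction
    return len({v for v in V if v != u and ((u, v) in E or (v, u) in E)})

def is_connected():
    # BFS treating edges as undirected, as in the definition of connectedness
    start = next(iter(V))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in V:
            if v not in seen and ((u, v) in E or (v, u) in E):
                seen.add(v)
                queue.append(v)
    return seen == V

print(degree(3), is_connected())  # 3 True
```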
The minimum-cost flow problem asks for a flow vector x of minimum total cost
\sum_{(i,j)∈E} c_{ij} x_{ij}, subject to the flow conservation constraints
b_i + \sum_{j:(j,i)∈E} x_{ji} − \sum_{j:(i,j)∈E} x_{ij} = 0 for all i ∈ V and the
capacity constraints m_{ij} ≤ x_{ij} ≤ \overline{m}_{ij} for all (i, j) ∈ E, where
b_i > 0 if vertex i is a source and b_i < 0 if it is a sink.
Note that \sum_{i∈V} b_i = 0 is required for feasibility, and that a problem satisfying this
condition can be transformed into an equivalent problem where bi = 0 for all i by
introducing an additional vertex, and new edges from each sink to the new vertex and
from the new vertex to each of the sources with upper and lower bounds equal to the
flow that should enter the sources and leave the sinks. The latter problem is known
as a circulation problem, because flow does not enter or leave the network but merely
circulates. We can further assume without loss of generality that the network G is
connected. Otherwise the problem can be decomposed into several smaller problems
that can be solved independently.
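The transformation into a circulation problem can be sketched as follows; the extra vertex is labeled 0 here, and the three-vertex instance is made up for illustration:

```python
# Sketch of the circulation transformation described above: add a new
# vertex 0, with an edge from vertex 0 to each source and from each sink
# to vertex 0, whose lower and upper bounds both equal the required
# net supply or demand.
def to_circulation(vertices, edges, b):
    """edges maps (i, j) to (lower, upper); b[i] > 0 for sources, < 0 for sinks."""
    new_edges = dict(edges)
    for i in vertices:
        if b[i] > 0:                        # source: exactly b[i] must enter it
            new_edges[(0, i)] = (b[i], b[i])
        elif b[i] < 0:                      # sink: exactly -b[i] must leave it
            new_edges[(i, 0)] = (-b[i], -b[i])
    return new_edges

edges = {(1, 2): (0, 3), (2, 3): (0, 3)}
b = {1: 2, 2: 0, 3: -2}                     # vertex 1 supplies 2, vertex 3 demands 2
circulation = to_circulation([1, 2, 3], edges, b)
print(circulation[(0, 1)], circulation[(3, 0)])
```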
An important special case is that of uncapacitated flow problems, where m_{ij} = 0
and \overline{m}_{ij} = ∞ for all (i, j) ∈ E. Clearly, an uncapacitated flow problem is either
unbounded, or has an equivalent problem with finite capacities.
If the Lagrangian is minimized subject to the regional constraints m_{ij} ≤ x_{ij} ≤ \overline{m}_{ij} for
(i, j) ∈ E, Theorem 2.1 yields a set of conditions that are sufficient for optimality. It
will be instructive to prove this result directly.

Theorem 10.1. Consider a feasible flow x ∈ R^{n×n} for a circulation problem, and
let λ ∈ R^n be such that for every (i, j) ∈ E,

    c_{ij} − λ_i + λ_j > 0 implies x_{ij} = m_{ij}, and
    c_{ij} − λ_i + λ_j < 0 implies x_{ij} = \overline{m}_{ij}.

Then x is optimal.
Proof. For (i, j) ∈ E, let c̄_{ij} = c_{ij} − λ_i + λ_j. Then, for every feasible flow x′,

    \sum_{(i,j)∈E} c_{ij} x′_{ij} = \sum_{(i,j)∈E} c_{ij} x′_{ij} − \sum_{i∈V} λ_i \Bigl( \sum_{j:(i,j)∈E} x′_{ij} − \sum_{j:(j,i)∈E} x′_{ji} \Bigr)
                                 = \sum_{(i,j)∈E} c̄_{ij} x′_{ij}
                                 ≥ \sum_{(i,j)∈E : c̄_{ij}<0} c̄_{ij} \overline{m}_{ij} + \sum_{(i,j)∈E : c̄_{ij}>0} c̄_{ij} m_{ij}
                                 = \sum_{(i,j)∈E} c̄_{ij} x_{ij} = \sum_{(i,j)∈E} c_{ij} x_{ij}.
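Theorem 10.1 turns a vector λ of potentials into an optimality certificate that is easy to check mechanically. A minimal sketch, on a made-up circulation consisting of a single cycle:

```python
# Check the sufficient optimality conditions of Theorem 10.1: positive
# reduced cost forces flow to the lower bound, negative reduced cost to
# the upper bound. The instance and potentials are made up.
edges = {  # (i, j): (cost, lower, upper)
    (1, 2): (1, 0, 4),
    (2, 3): (1, 0, 4),
    (3, 1): (-3, 0, 4),
}
x = {(1, 2): 4, (2, 3): 4, (3, 1): 4}   # saturate the negative-cost cycle
lam = {1: 0, 2: -1, 3: -2}              # candidate potentials

def certifies_optimality(edges, x, lam):
    for (i, j), (c, lo, hi) in edges.items():
        cbar = c - lam[i] + lam[j]      # reduced cost of edge (i, j)
        if cbar > 0 and x[(i, j)] != lo:
            return False
        if cbar < 0 and x[(i, j)] != hi:
            return False
    return True

print(certifies_optimality(edges, x, lam))  # True
```

Here the reduced costs are 0, 0 and −1, and the edge with negative reduced cost carries flow at its upper bound, so λ certifies that x is optimal.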
10.4 The Transportation Problem

An important special case of the minimum-cost flow problem is the transportation
problem, in which n suppliers with supplies s_1, ..., s_n and m consumers with demands
d_1, ..., d_m are connected by edges with costs c_{ij}, and the goal is to

    minimize    \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} x_{ij}
    subject to  \sum_{j=1}^{m} x_{ij} = s_i    for i = 1, ..., n
                \sum_{i=1}^{n} x_{ij} = d_j    for j = 1, ..., m
                x ≥ 0.

It turns out that the transportation problem already captures the full expressiveness
of the minimum-cost flow problem.
Figure 10.1: Construction of a transportation problem from a minimum-cost flow
problem: the supplier for edge (i, j) has supply \overline{m}_{ij} and is connected to consumer i
with cost 0 and to consumer j with cost c_{ij}; consumers i and j have demands
\sum_{k:(i,k)∈E} \overline{m}_{ik} − b_i and \sum_{k:(j,k)∈E} \overline{m}_{jk} − b_j, respectively.
Theorem 10.2. Every minimum-cost flow problem with finite capacities or non-
negative costs has an equivalent transportation problem.
Proof. Consider a minimum-cost flow problem for a network (V, E) and assume without
loss of generality that m_{ij} = 0 for all (i, j) ∈ E. If this is not the case, we can instead
consider the problem obtained by setting m_{ij} to zero, \overline{m}_{ij} to \overline{m}_{ij} − m_{ij}, and replacing
b_i by b_i − m_{ij} and b_j by b_j + m_{ij}. A solution with flow x_{ij} for the new problem
then corresponds to a solution with flow x_{ij} + m_{ij} for the original problem. We can
further assume that all capacities are finite: if some edge has infinite capacity but costs
are non-negative, then setting the capacity of this edge to a large enough number, for
example \sum_{i∈V} |b_i|, does not affect the optimal solution of the problem.
We now construct an instance of the transportation problem as follows. For every
vertex i ∈ V, we add a consumer with demand \sum_{k:(i,k)∈E} \overline{m}_{ik} − b_i. For every edge
(i, j) ∈ E, we add a supplier with supply \overline{m}_{ij}, an edge to consumer i with cost
c_{ij,i} = 0, and an edge to consumer j with cost c_{ij,j} = c_{ij}. The situation is shown in
Figure 10.1.
We now claim that there exists a direct correspondence between feasible flows of the
two problems, and that these flows have the same costs. To see this, let the flows on
edges (ij, i) and (ij, j) be \overline{m}_{ij} − x_{ij} and x_{ij}, respectively. The total flow into vertex i
then is \sum_{k:(i,k)∈E} (\overline{m}_{ik} − x_{ik}) + \sum_{k:(k,i)∈E} x_{ki}, which must be equal to
\sum_{k:(i,k)∈E} \overline{m}_{ik} − b_i. This is the case if and only if
b_i + \sum_{k:(k,i)∈E} x_{ki} − \sum_{k:(i,k)∈E} x_{ik} = 0, which is the flow
conservation constraint for vertex i in the original problem.
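The construction in the proof can be sketched in a few lines; the three-vertex instance below is made up, and lower bounds are assumed to have already been shifted to zero:

```python
# Sketch of the reduction in Theorem 10.2: every edge of the flow network
# becomes a supplier, every vertex a consumer.
def to_transportation(vertices, edges, b):
    """edges maps (i, j) to (cost, capacity); b[i] is the net supply at vertex i."""
    supply = {(i, j): cap for (i, j), (c, cap) in edges.items()}
    demand = {i: sum(cap for (u, v), (c, cap) in edges.items() if u == i) - b[i]
              for i in vertices}
    cost = {}
    for (i, j), (c_ij, cap) in edges.items():
        cost[((i, j), i)] = 0     # flow kept back at the tail vertex is free
        cost[((i, j), j)] = c_ij  # flow actually sent along (i, j) costs c_ij
    return supply, demand, cost

vertices = [1, 2, 3]
edges = {(1, 2): (2, 5), (2, 3): (1, 5), (1, 3): (4, 5)}
b = {1: 4, 2: 0, 3: -4}           # vertex 1 is a source, vertex 3 a sink
supply, demand, cost = to_transportation(vertices, edges, b)
print(sum(supply.values()), sum(demand.values()))  # 15 15
```

Total supply equals total demand by construction, which is exactly the feasibility condition \sum_i b_i = 0 in disguise.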
11 The Transportation Algorithm
The particular structure of basic feasible solutions in the case of the transportation
problem gives rise to a special interpretation of the simplex method. This special form
is sometimes called the transportation algorithm.
Consider the Lagrangian of the transportation problem,

    L(x, λ, µ) = \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} x_{ij} − \sum_{i=1}^{n} λ_i \Bigl( \sum_{j=1}^{m} x_{ij} − s_i \Bigr) + \sum_{j=1}^{m} µ_j \Bigl( \sum_{i=1}^{n} x_{ij} − d_j \Bigr)
               = \sum_{i=1}^{n} \sum_{j=1}^{m} (c_{ij} − λ_i + µ_j) x_{ij} + \sum_{i=1}^{n} λ_i s_i − \sum_{j=1}^{m} µ_j d_j,

where λ ∈ R^n and µ ∈ R^m are Lagrange multipliers for the suppliers and consumers,
respectively. Subject to x_{ij} ≥ 0, the Lagrangian has a finite minimum if and only if

    c_{ij} − λ_i + µ_j ≥ 0    for i = 1, ..., n and j = 1, ..., m.
Figure 11.1: Initial basic feasible solution of an instance of the transportation problem
(left) and a cycle along which the overall cost can be decreased (right)
Consider for example the Hitchcock transportation problem with three suppliers
and four consumers given by the following tableau, in which the entry in row i and
column j is the cost c_{ij}, supplies appear on the right, and demands at the bottom:

     5    3    4    6  |  8
     2    7    4    1  | 10
     5    6    2    4  |  9
     ------------------
     6    5    8    8

Choosing the initial basic feasible solution shown on the left of Figure 11.1 and
computing the corresponding values of the dual variables λ and µ, we obtain the
following tableau. The values λ_i and µ_j appear on the left and at the top; each cell
shows the flow x_{ij} if (i, j) is basic (in brackets) and the value λ_i − µ_j otherwise,
with the cost c_{ij} below it:

          −5    −3     0    −2
     0 | [6]   [2]    0     2  |  8
       |  5     3     4     6
     4 |  9    [3]   [7]    6  | 10
       |  2     7     4     1
     2 |  7     5    [1]   [8] |  9
       |  5     6     2     4
          6     5     8     8
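The dual values in such a tableau can be recovered from the basic cells alone: the basic cells form a spanning tree, so the equations λ_i − µ_j = c_{ij} for basic (i, j), together with the normalization λ_1 = 0, determine λ and µ. A sketch, with 0-based indices:

```python
# Recover the dual variables of the transportation problem from the
# basic cells (which form a spanning tree), propagating outward from
# lambda_0 = 0 via lambda_i - mu_j = c_ij for basic (i, j).
c = [[5, 3, 4, 6], [2, 7, 4, 1], [5, 6, 2, 4]]
basic = {(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)}

def solve_duals(c, basic, n=3, m=4):
    lam, mu = {0: 0}, {}
    while len(lam) < n or len(mu) < m:
        for (i, j) in basic:
            if i in lam and j not in mu:
                mu[j] = lam[i] - c[i][j]
            elif j in mu and i not in lam:
                lam[i] = c[i][j] + mu[j]
    return [lam[i] for i in range(n)], [mu[j] for j in range(m)]

lam, mu = solve_duals(c, basic)
print(lam, mu)  # [0, 4, 2] [-5, -3, 0, -2]
```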
Pivoting
If c_{ij} ≥ λ_i − µ_j for all (i, j) ∉ T, the current flow is optimal. Assume on the other hand
that dual feasibility is violated for some edge (i, j) ∉ T, and observe that this edge and
the edges in T together form a unique cycle. In the absence of degeneracies the regional
the edges in T together form a unique cycle. In the absence of degeneracies the regional
constraints for edges in T are not tight, so we can push flow around this cycle in order
to increase xij and decrease the value of the Lagrangian. Due to the special structure
of the network, this will alternately increase and decrease the flow for edges along the
cycle until xi 0 j 0 becomes zero for some (i 0 , j 0 ) ∈ T . We thus obtain a new BFS, and a
new spanning tree in which (i 0 , j 0 ) has been replaced by (i, j).
In our example dual feasibility is violated, for example, for i = 2 and j = 1. Edge
(2, 1) forms a unique cycle with the spanning tree T , and we would like to increase x21
by pushing flow along this cycle. In particular, increasing x21 by θ will increase x12
and decrease x11 and x22 by the same amount. The situation is shown on the right of
Figure 11.1. If we increase x21 by the maximum amount of θ = 3 and re-compute the
values of the dual variables λ and µ, we obtain the following tableau:
          −5    −3    −7    −9
     0 | [3]   [5]    7     9  |  8
       |  5     3     4     6
    −3 | [3]    0    [7]    6  | 10
       |  2     7     4     1
    −5 |  0    −2    [1]   [8] |  9
       |  5     6     2     4
          6     5     8     8
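The push that produced this tableau can be sketched as a small update step: the entering cell (2, 1) closes a unique cycle with the tree, and flow alternately increases and decreases by θ around it.

```python
# One pivoting step of the transportation algorithm on the example:
# entering cell (2, 1) forms the cycle (2,1), (1,1), (1,2), (2,2).
x = {(1, 1): 6, (1, 2): 2, (2, 2): 3, (2, 3): 7, (3, 3): 1, (3, 4): 8}
cycle_plus = [(2, 1), (1, 2)]    # flow increases on these cells
cycle_minus = [(1, 1), (2, 2)]   # and decreases on these

theta = min(x[c] for c in cycle_minus)   # largest feasible push
for c in cycle_plus:
    x[c] = x.get(c, 0) + theta
for c in cycle_minus:
    x[c] -= theta
print(theta, x[(2, 1)], x[(1, 1)], x[(1, 2)], x[(2, 2)])  # 3 3 3 5 0
```

Cell (2, 2) drops to zero and leaves the basis, as in the tableau above.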
Now, c_{24} < λ_2 − µ_4, and we can increase x_{24} by 7 to obtain the following tableau,
which satisfies c_{ij} ≥ λ_i − µ_j for all (i, j) ∉ T and therefore yields an optimal solution:
          −5    −3    −2    −4
     0 | [3]   [5]    2     4  |  8
       |  5     3     4     6
    −3 | [3]    0    −1    [7] | 10
       |  2     7     4     1
     0 |  5     3    [8]   [1] |  9
       |  5     6     2     4
          6     5     8     8
12 The Maximum Flow Problem
Consider a flow network (V, E) with a single source 1, a single sink n, and finite capac-
ities mij = Cij for all (i, j) ∈ E. We will also assume for convenience that mij = 0 for
all (i, j) ∈ E. The maximum flow problem then asks for the maximum amount of flow
that can be sent from vertex 1 to vertex n, i.e., the goal is to
    maximize    δ

    subject to  \sum_{j:(i,j)∈E} x_{ij} − \sum_{j:(j,i)∈E} x_{ji} =  δ if i = 1,  −δ if i = n,  0 otherwise        (12.1)

                0 ≤ x_{ij} ≤ C_{ij}    for all (i, j) ∈ E.
This problem is in fact a special case of the minimum-cost flow problem. To see
this, set cij = 0 for all (i, j) ∈ E, and add an edge (n, 1) with infinite capacity and
cost cn1 = −1. Since the new edge (n, 1) has infinite capacity, any feasible flow of
the original network is also feasible for the new network. Cost is clearly minimized by
maximizing the flow across the edge (n, 1), which by the flow conservation constraints
for vertices 1 and n maximizes flow through the original network.
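This reduction is mechanical; a minimal sketch, with a made-up two-edge network:

```python
# Sketch of the reduction above: maximizing flow from vertex 1 to vertex n
# becomes a minimum-cost flow problem once a return edge (n, 1) of cost -1
# and infinite capacity is added.
def to_min_cost_flow(capacities, n):
    """capacities maps (i, j) to C_ij; returns cost and capacity dictionaries."""
    cost = {e: 0 for e in capacities}    # original edges are free
    cap = dict(capacities)
    cost[(n, 1)] = -1                    # reward flow returning from n to 1
    cap[(n, 1)] = float("inf")           # the new edge is uncapacitated
    return cost, cap

capacities = {(1, 2): 4, (2, 3): 2}
cost, cap = to_min_cost_flow(capacities, 3)
print(cost[(3, 1)], cap[(3, 1)])
```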
Consider a flow network G = (V, E) with capacities Cij for (i, j) ∈ E. A cut of G is a
partition of V into two sets, and the capacity of a cut is defined as the sum of capacities
of all edges across the partition. Formally, for S ⊆ V, the capacity of the cut (S, V \ S)
is given by
    C(S) = \sum_{(i,j)∈E∩(S×(V\S))} C_{ij}.
Assume that x is a feasible flow vector that sends δ units of flow from vertex 1 to
vertex n. It is easy to see that δ is bounded from above by the capacity of any cut S
with 1 ∈ S and n ∈ V \ S. Indeed, for X, Y ⊆ V, let
    f_x(X, Y) = \sum_{(i,j)∈E∩(X×Y)} x_{ij}.
Then

    δ = f_x(S, V \ S) − f_x(V \ S, S) ≤ f_x(S, V \ S) ≤ C(S),        (12.2)

since flow is conserved at every vertex of S other than vertex 1, flows are non-negative,
and each x_{ij} is bounded by C_{ij}.
The following result states that this upper bound is in fact tight, i.e., that there exists
a flow of size equal to the minimum capacity of a cut that separates vertex 1 from
vertex n.
Theorem 12.1 (Max-flow min-cut theorem). Let δ be the optimal solution of (12.1)
for a network (V, E) with capacities Cij for all (i, j) ∈ E. Then,
δ = min { C(S) : S ⊆ V, 1 ∈ S, n ∈ V \ S } .
Proof. It remains to be shown that there exists a cut that separates vertex 1 from vertex
n and has capacity equal to δ. Consider a feasible flow vector x. A path v0 , v1 , . . . , vk is
called an augmenting path for x if xvi−1 vi < Cvi−1 vi or xvi vi−1 > 0 for every i = 1, . . . , k.
If there exists an augmenting path from vertex 1 to vertex n, then we can push flow
along the path, by increasing the flow on every forward edge and decreasing the flow
on every backward edge along the path by the same amount, such that all constraints
remain satisfied and the amount of flow from 1 to n increases.
Now assume that x is optimal, and let

    S = {1} ∪ { u ∈ V : there exists an augmenting path from 1 to u }.

By optimality of x, n ∈ V \ S. Moreover,

    δ = f_x(S, V \ S) − f_x(V \ S, S) = f_x(S, V \ S) = C(S).

The first equality holds by (12.2). The second equality holds because x_{ij} = 0 for
every (i, j) ∈ E ∩ ((V \ S) × S). The third equality holds because x_{ij} = C_{ij} for every
(i, j) ∈ E ∩ (S × (V \ S)).
Figure 12.1: A flow network with source s and sink t and edge capacities C_{sa} = 5,
C_{ab} = 1, C_{ad} = 4, C_{bt} = 1, C_{sc} = 5, C_{cd} = 2, C_{dt} = 5
This proof suggests the following procedure, known as the Ford-Fulkerson algorithm,
for finding a maximum flow:

1. Start from a feasible flow vector x, for example the flow vector with x_{ij} = 0 for
   all (i, j) ∈ E.

2. If there is no augmenting path from 1 to n, stop: the current flow is maximal.

3. Otherwise pick some augmenting path from 1 to n, and push a maximum amount
   of flow along this path without violating any constraints. Then go to Step 2.
Consider for example the flow network in Figure 12.1. Pushing one unit of flow along
the path s, a, b, t, four units along the path s, a, d, t, and one more unit along the path
s, c, d, t yields a maximum flow, and the fact that this flow is optimal is witnessed by
the cut ({s, a, b, c, d}, {t}), which has capacity 6.
If all capacities are integral and if we start from an integral flow vector, e.g., the
flow vector x such that xij = 0 for all (i, j) ∈ E, then the Ford-Fulkerson algorithm
maintains integrality and increases the overall amount of flow by at least one unit in
each iteration. The algorithm is therefore guaranteed to find a maximum flow after
a finite number of iterations. Clearly, the latter also holds when all capacities are
rational.
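A minimal sketch of the algorithm, using breadth-first search to find augmenting paths and run on the network of Figure 12.1 (it assumes the network has no antiparallel edge pairs):

```python
from collections import deque

# Ford-Fulkerson with BFS for augmenting paths: forward edges need
# residual capacity, backward edges need positive flow.
def max_flow(C, s, t):
    flow = {e: 0 for e in C}
    V = {u for e in C for u in e}
    while True:
        pred, queue = {s: None}, deque([s])   # BFS for an augmenting path
        while queue and t not in pred:
            u = queue.popleft()
            for v in V:
                if v not in pred and (C.get((u, v), 0) - flow.get((u, v), 0) > 0
                                      or flow.get((v, u), 0) > 0):
                    pred[v] = u
                    queue.append(v)
        if t not in pred:                     # no augmenting path: flow is maximal
            return sum(flow.get((s, v), 0) for v in V), flow
        path, v = [], t                       # reconstruct the path back to s
        while pred[v] is not None:
            path.append((pred[v], v))
            v = pred[v]
        theta = min(C[(u, v)] - flow[(u, v)] if (u, v) in C else flow[(v, u)]
                    for (u, v) in path)       # bottleneck along the path
        for (u, v) in path:
            if (u, v) in C:
                flow[(u, v)] += theta         # push on forward edges
            else:
                flow[(v, u)] -= theta         # pull back on backward edges

C = {("s", "a"): 5, ("a", "b"): 1, ("a", "d"): 4, ("b", "t"): 1,
     ("s", "c"): 5, ("c", "d"): 2, ("d", "t"): 5}
value, _ = max_flow(C, "s", "t")
print(value)  # 6
```

The returned value 6 matches the capacity of the cut ({s, a, b, c, d}, {t}) found above.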