MathReview 2¹
Yoshifumi Konishi
Ph.D. in Applied Economics
COB 337j
E-mail: [email protected]

¹ This lecture note is adapted from the following sources: Simon & Blume (1994), W. Rudin (1976), A. Takayama (1985), M. Wadachi (2000), and Toda & Asano (2000).
6. Implicit Functions

Up to now, we have studied functions of the form:

y = f(x_1, ..., x_n)

where y is a unique endogenous variable and all RHS variables are exogenous. In this form, we say that y is an explicit function of the variables x_1, ..., x_n. But we know that we can rewrite this function as follows:

y − f(x_1, ..., x_n) = 0

or,

g(x_1, ..., x_n, y) = 0

This function is qualitatively very different from the original function. The g function is a function from R^{n+1} to R^1, while the f function is a function from R^n to R^1. But the g function no longer gives y directly as a function of the other variables, whereas the original f function did. In other words, the g function determines the relationship between y and x_1, ..., x_n implicitly, by holding the RHS constant. In this form, we say that y is an implicit function of the variables x_1, ..., x_n. Implicit functions appear often in economic applications. It is often impossible to separate the variable of our interest from the other variables so as to obtain a nice explicit function like y = f(x_1, ..., x_n). In this lecture, we will learn how to analyze such functions. Before we proceed, let's look at some examples.
Example 1.
(i) Consider a profit function:

π = pF(x) − wx

where p and w are prices of output and input, respectively, and F is a production function. To maximize this, we take the first-order condition:

pF'(x) − w = 0

So, now the optimal value x is an implicit function of p and w. Suppose F(x) = x^{1/2}. (Question: What is this type of function called in economics? Answer: Decreasing returns to scale.) In this case, we can rewrite it as an explicit function:

(p/2) x^{−1/2} − w = 0  ⟹  x = ( p/(2w) )^2
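As a quick cross-check of this example (my own addition, not part of the original note), SymPy can solve the first-order condition symbolically; the symbol names here are arbitrary:

```python
import sympy as sp

p, w, x = sp.symbols('p w x', positive=True)
F = sp.sqrt(x)                           # production function F(x) = x^(1/2)
foc = sp.Eq(p * sp.diff(F, x) - w, 0)    # first-order condition pF'(x) - w = 0
print(sp.solve(foc, x))                  # [p**2/(4*w**2)], i.e. x = (p/(2w))^2
```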
(ii) Consider a production possibility frontier described by the following relationships:

y^2 + 4xy + 4x^2 = 0    (1)
y^3 − 5xy + 4x^2 = 0    (2)

In the case of (1), we can explicitly solve for y:

(y + 2x)^2 = 0  ⟹  y = −2x

But, in the case of (2), we cannot explicitly solve for y. Clearly, however, there is some relationship between x and y. For example, suppose x = 1. Then, the identity becomes y^3 − 5y + 4 = 0, so y = 1 is a solution. When x = 0, y^3 = 0, so that y = 0. Question: Suppose that we can define an implicit function of the form g(x, y) = c. Does this necessarily mean there is a function from x to y? (We do not need to be able to solve explicitly.) Answer: No. Remember the definition of a function: for each point x ∈ X, there must be at most one point y ∈ Y such that y = f(x). But, consider:

y^2 − 3xy − 10x^2 = 0

We can factorize this as:

(y − 5x)(y + 2x) = 0  ⟹  y = 5x, −2x

So, for each x, there is more than one value of y. But, this case is easily dealt with by restricting the range to y ≥ 0. On the range y ≥ 0, there exists a smooth (continuous and differentiable) function y = 5x. There is a more problematic case in which we cannot derive a smooth function in the neighborhood of some point x. This can be seen graphically (see the graphs). More generally, we have the following well-known theorem:
Theorem 6-1 (Implicit Function Theorem): Let g : X ⊂ R^2 → R be continuously differentiable in a neighborhood U of (x_0, y_0). Suppose g(x_0, y_0) = c and g_y(x_0, y_0) ≠ 0. Then, there exists a C^1 function y = f(x) defined on U such that:
(i) g(x, f(x)) = c for all x ∈ U;
(ii) y_0 = f(x_0);
(iii) dy/dx (x_0) = − g_x(x_0, y_0) / g_y(x_0, y_0)
Proof: The proof of this theorem is a bit involved. So, we will simply prove the equality in (iii). From (i), we know that in the neighborhood U we have:

g(x, f(x)) = c

Because this is an identity in this neighborhood, differentiate both sides with respect to x:

g_x(x, f(x)) + g_y(x, f(x)) f'(x) = 0

Solving for f'(x), and substituting (ii), we obtain:

f'(x_0) = dy/dx (x_0, y_0) = − g_x(x_0, y_0) / g_y(x_0, y_0)

QED
This theorem can be extended to the case of more than two variables.

Theorem 6-2 (Implicit Function Theorem): Let g : X ⊂ R^{n+1} → R be continuously differentiable in a neighborhood U of (x_1*, ..., x_n*, y*). Suppose g(x_1*, ..., x_n*, y*) = c and g_y(x_1*, ..., x_n*, y*) ≠ 0. Then, there exists a function y = f(x_1, ..., x_n) defined on U such that:
(i) g(x_1, ..., x_n, f(x_1, ..., x_n)) = c for all (x_1, ..., x_n) ∈ U;
(ii) y* = f(x_1*, ..., x_n*);
(iii) for each i,

∂y/∂x_i (x_1*, ..., x_n*) = − g_{x_i}(x_1*, ..., x_n*, y*) / g_y(x_1*, ..., x_n*, y*)
Example 2. Consider g(x, y) = x^2 − 3xy + y^3. Let's find dy/dx along the level set of g through the point (x, y) = (3, 3).

Check that g_y(3, 3) ≠ 0: g_y = −3x + 3y^2, so g_y(3, 3) = −3·3 + 3·3^2 = −9 + 27 = 18. Now, compute g_x(3, 3):

g_x(3, 3) = (2x − 3y)|_(3,3) = 2·3 − 3·3 = −3

Use the implicit function theorem to get:

dy/dx (3, 3) = − g_x(3, 3) / g_y(3, 3) = − (−3)/18 = 1/6
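A minimal SymPy verification of this computation (my own addition):

```python
import sympy as sp

x, y = sp.symbols('x y')
g = x**2 - 3*x*y + y**3

slope = -sp.diff(g, x) / sp.diff(g, y)   # implicit function theorem: dy/dx = -g_x/g_y
print(slope.subs({x: 3, y: 3}))          # 1/6
```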
Similarly, we can even consider a system of implicit functions:

g^1(x_1, ..., x_n) = 0
g^2(x_1, ..., x_n) = 0
...
g^m(x_1, ..., x_n) = 0

Not surprisingly, there is a corresponding implicit function theorem for such a system. But it is beyond the scope of this lecture. You won't see its use unless you attend a real analysis course or more advanced courses. We conclude this section by discussing the graphical properties of implicit functions.
Geometrically, implicit functions in R^n define level sets in R^n. For example, g(x, y) = ax + by = c defines a line in R^2; g(x, y) = x^2 + y^2 = c defines a circle in R^2; g(x, y, z) = x^2 + y^2 + z^2 = c defines a sphere in R^3. More formally, recall the definition of level sets:

Definition 5-1 (Level Sets): Suppose f : X → R^1 where X ⊂ R^n. Then, the level set for a point a ∈ f(X) is the set L ⊂ X such that:

L(a) = { x ∈ X : f(x) = a }

We will prove the following result, which seems very intuitive: the gradient vector ∇g at a point is perpendicular to the tangent line (or plane) of the level set at that point.
Theorem 6-3: Let g : X ⊂ R^2 → R be continuously differentiable in a neighborhood U of (x*, y*). Suppose ∇g(x*, y*) ≠ 0. Then, ∇g(x*, y*) is perpendicular to the tangent line of the level set of g at (x*, y*).

Proof: By definition,

∇g(x*, y*) = ( g_x(x*, y*) )  ≠ 0
             ( g_y(x*, y*) )

If g_y(x*, y*) = 0, then the derivative along the level set at that point, dy/dx, is +∞ or −∞. This implies that the tangent line at that point is vertical. On the other hand, the gradient vector is:

∇g(x*, y*) = ( a )    for some a ∈ R, a ≠ 0
             ( 0 )

This means that the gradient vector is a horizontal vector (see the graph). So, it is perpendicular to the tangent line. Now, consider the case where g_y(x*, y*) ≠ 0. By the implicit function theorem,

dy/dx (x*, y*) = − g_x(x*, y*) / g_y(x*, y*)

Thus, the direction vector of the tangent line can be written (see the graph again):

v = ( 1, − g_x(x*, y*) / g_y(x*, y*) )

We know that two vectors are perpendicular to each other if and only if v · ∇g(x*, y*) = 0. Clearly,

v · ∇g(x*, y*) = 1 · g_x(x*, y*) + ( − g_x(x*, y*) / g_y(x*, y*) ) · g_y(x*, y*) = g_x(x*, y*) − g_x(x*, y*) = 0

QED
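To make the orthogonality concrete, here is a small SymPy sketch (my own addition), using the circle g(x, y) = x^2 + y^2 as the level set:

```python
import sympy as sp

x, y = sp.symbols('x y')
g = x**2 + y**2                          # level sets g = c are circles

gx, gy = sp.diff(g, x), sp.diff(g, y)
grad = sp.Matrix([gx, gy])               # gradient vector of g
v = sp.Matrix([1, -gx / gy])             # tangent direction (1, -g_x/g_y)

print(sp.simplify(grad.dot(v)))          # 0 wherever g_y != 0
```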
7. Linear Algebra: Systems of Linear Equations

A system of linear equations arises naturally in economic applications. Even when we have a system of nonlinear equations such as:

y = f(x, y)
x = g(x, y)

it is very common for us to linearize the system (by using a first-order Taylor expansion, for example). Suppose that we have a system of linear equations of the form:

x_1 − 2x_2 = 8        (1)
3x_1 + x_2 = 3

We can rewrite this system in matrix form:

( 1  −2 ) ( x_1 )   ( 8 )
( 3   1 ) ( x_2 ) = ( 3 )

In general, we can represent a system of linear equations in matrix form:

A_{(m×n)} x_{(n×1)} = b_{(m×1)}

We might expect to be able to solve the system when m ≥ n, i.e., when there are at least as many equations as unknowns. But, is that a sufficient condition? We will learn in this section:
(i) When do we have solutions?
(ii) When can we say the solution is unique?
(iii) Is there an efficient algorithm that computes the solutions?
There are essentially three ways of solving such a system:
(i) Substitution;
(ii) Elimination of variables;
(iii) Matrix methods.
These are probably what you learned in your high school algebra. But, there are many fancy algorithms within each solution category. The efficiency of these methods depends on the properties of the system in question.
Example 1. Substitution

Consider the system (1) again. We can solve the first equation easily for x_1: x_1 = 8 + 2x_2. Substitute this into the other equation:

3(8 + 2x_2) + x_2 = 3
24 + 7x_2 = 3  or  7x_2 = −21
⟹ x_2 = −3

Substituting this back into the first equation, we have:

⟹ x_1 = 8 + 2(−3) = 2
Example 2. Elimination of variables

Consider (1) again. Let's first multiply both sides of the first equation by 3:

3x_1 − 6x_2 = 24

Subtract the second equation from both sides of this new equation:

   3x_1 − 6x_2 = 24
−) 3x_1 +  x_2 = 3
   ⟹ −7x_2 = 21
   ⟹ x_2 = −3
   ⟹ x_1 = 2

These solution algorithms are easy to implement, but can be very inefficient if we have a large number of unknowns and equations. In economic applications, we may have thousands of equations. In such a case, it is more convenient to work with a matrix representation:

A_{(m×n)} x_{(n×1)} = b_{(m×1)}
The matrix A is called the coefficient matrix of the system. We often create the augmented matrix Â by adding the column vector corresponding to the right-hand side:

Â = (A | b) = ( 1  −2 | 8 )
              ( 3   1 | 3 )

The following elementary row operations do not change the solution set of a linear system (i.e., they result in an equivalent system):

(i) Interchanging two rows of a matrix:

( 1  −2 | 8 )     ( 3   1 | 3 )
( 3   1 | 3 )  ⟹  ( 1  −2 | 8 )

(ii) Adding one row to another:

( 1  −2 | 8 )     ( 1+3  −2+1 | 8+3 )
( 3   1 | 3 )  ⟹  ( 3     1   | 3   )

(iii) Multiplying each entry of a row by a nonzero number a:

( 1  −2 | 8 )     ( a  −2a | 8a )
( 3   1 | 3 )  ⟹  ( 3   1  | 3  )

Notice that the operations we performed in the elimination of variables correspond to these elementary row operations.
The purpose of performing row operations is to create a matrix of the following form:

( a_11  a_12  a_13 | b_1 )
( 0     a_22  a_23 | b_2 )
( 0     0     a_33 | b_3 )

This matrix form is called row echelon form.

Definition 7-1 (Leading Zeros & Row Echelon Form): A row of a matrix is said to have k leading zeros if the first k elements of the row are all zeros and the (k+1)-th element is nonzero. A matrix is said to be in row echelon form if each row of the matrix has more leading zeros than the row preceding it.
Example 3. Back-substitution and Gaussian Elimination

To see why the row echelon form is useful, consider solving the system:

( a_11  a_12  a_13 ) ( x_1 )   ( b_1 )
( 0     a_22  a_23 ) ( x_2 ) = ( b_2 )
( 0     0     a_33 ) ( x_3 )   ( b_3 )

First, look at the last row. We can easily solve it as x_3 = b_3/a_33. Then, substitute this into the second equation, which is a_22 x_2 + a_23 x_3 = b_2. This gives us the solution for x_2. Then, we can substitute these into the first equation and get x_1. So, we can easily solve the system. This method is called back-substitution. We can apply this logic and create an efficient computer algorithm that solves thousands of equations in a second. A more general method is Gaussian elimination, an algorithm that factors any nonsingular matrix into the product of two triangular matrices, so that Ax = LUx = b, and then applies back-substitution twice: to Lz = b and to Ux = z.
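A minimal sketch of back-substitution in Python (my own illustration, assuming an upper-triangular coefficient matrix with nonzero diagonal entries):

```python
import numpy as np

def back_substitute(U, b):
    """Solve Ux = b for upper-triangular U with nonzero diagonal."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):       # start from the last equation
        s = U[i, i + 1:] @ x[i + 1:]     # contribution of already-solved unknowns
        x[i] = (b[i] - s) / U[i, i]
    return x

U = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0,  2.0],
              [0.0, 0.0,  4.0]])
b = np.array([3.0, 7.0, 8.0])
print(back_substitute(U, b))             # agrees with np.linalg.solve(U, b)
```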
7-1. Rank and Solutions to a Linear System

As we have studied, a linear equation of the form:

a_11 x_1 + a_12 x_2 = b_1

defines a line in two-dimensional space. Thus, the solution to the system of two equations:

a_11 x_1 + a_12 x_2 = b_1
a_21 x_1 + a_22 x_2 = b_2

is a point (x_1, x_2) in two-dimensional space at which the two lines cross each other. However, if one equation is simply a scalar multiple of the other, say,

x_1 + 2x_2 = 8
2x_1 + 4x_2 = 16

then the two equations represent the same line in this space. Thus, the system has infinitely many solutions, of the form x_1 = 8 − 2x_2. On the contrary, consider the system:

x_1 + 2x_2 = 3
x_1 + 2x_2 = 4

Clearly, these two equations represent two parallel lines that do not cross each other. Thus, the system has no solution. In general, whether a system has zero, one, or infinitely many solutions depends on the characteristics of the coefficient matrix A, notably the rank of A.

Definition 7-2 (Rank): The rank of a matrix A is the number of nonzero rows in its (reduced) row echelon form. We write rank(A).
You probably wonder if the echelon form is unique. In fact, the echelon form is not unique: it can have many different entries, depending on the row operations used. However, the number of nonzero rows in the echelon form is unique and does not depend on how we compute it.

Definition 7-3 (Reduced Echelon Form): A matrix is said to be in reduced echelon form if each nonzero row of its echelon form has a one in its pivot position and each column containing a pivot has no other nonzero entries.

Example 4. The following matrix is in reduced echelon form:

( 1  a  0  0  0 )
( 0  0  1  0  a )
( 0  0  0  1  a )

Example 5. Consider creating the reduced echelon form of the following:

( 0  2 )
( 5  3 )
  → add the 2nd row to the 1st:
( 5  5 )
( 5  3 )
  → subtract the 1st row from the 2nd:
( 5   5 )
( 0  −2 )
  → divide the 1st row by 5:
( 1   1 )
( 0  −2 )
  → divide the 2nd row by −2:
( 1  1 )
( 0  1 )
  → subtract the 2nd row from the 1st:
( 1  0 )
( 0  1 )

So, the rank of the matrix is 2. In general, I would usually compute the reduced echelon form in order to compute the rank, because we want to be sure that no further row operations can eliminate nonzero rows.
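As a sanity check (my own addition), NumPy's rank routine confirms the hand computation:

```python
import numpy as np

A = np.array([[0.0, 2.0],
              [5.0, 3.0]])
print(np.linalg.matrix_rank(A))   # 2, matching the reduced echelon form above
```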
Fact 1. Let A be an (m×n)-matrix. Then,
(i) rank(A) ≤ m, the number of rows of A;
(ii) rank(A) ≤ n, the number of columns of A.

Proof: The proof of (i) is obvious from the definition: the rank of A is at most the number of rows of A, because the rank of A is the number of nonzero rows in its echelon form. To see (ii), suppose first that m ≤ n. Then, from (i), we have rank(A) ≤ m ≤ n. Now suppose that n < m. Then, at least (m − n) rows of A can be reduced to zero rows by elementary row operations. Thus, rank(A) ≤ n. QED

A corollary to this fact is the following: Let A be an (n×m)-matrix and B be an (m×l)-matrix. Then,

rank(AB) ≤ min{ rank(A), rank(B) }

We will see why this is true in the next section.

Definition 7-4 (Homogeneous System): An equation with constant term equal to 0 is called a homogeneous equation. A system of linear equations Ax = b with b = 0 is said to be a homogeneous system.
Now, we are ready to state the following theorem. The proof is very time-consuming, so I will only discuss the intuition behind it.

Theorem 7-1: Consider a system of linear equations of the form:

A_{(m×n)} x_{(n×1)} = b_{(m×1)}

(a) When m < n (i.e., the number of equations is less than the number of variables):
(i) For every b, Ax = b has either 0 or infinitely many solutions.
(ii) If rank(A) = m, then Ax = b has infinitely many solutions for every b.
(b) When m > n (i.e., the number of equations is more than the number of variables):
(i) Ax = 0 has one or infinitely many solutions.
(ii) For every b, Ax = b has either 0, 1, or infinitely many solutions.
(iii) If rank(A) = n, then Ax = b has 0 or 1 solution for every b.
(c) When m = n (i.e., the number of equations is the same as that of variables):
(i) Ax = 0 has one or infinitely many solutions.
(ii) For every b, Ax = b has either 0, 1, or infinitely many solutions.
(iii) If rank(A) = n = m, then Ax = b has exactly 1 solution for every b.
Example 6. To see why (a)-(i) and (a)-(ii) hold, consider the cases:

( 0  0 ) (x_1, x_2)^T = b  ⟹  0 = b    (no solution if b ≠ 0, infinitely many if b = 0; rank(A) = 0)

( 1  1 ) (x_1, x_2)^T = b  ⟹  x_1 + x_2 = b    (infinitely many solutions; rank(A) = 1)
Example 7. To see why (b)-(i) holds, consider the case:

( a_1 )       ( 0 )
( a_2 ) x  =  ( 0 )

If a_1 = a_2 = 0, then x can be anything.
If a_1 ≠ a_2, or a_1 = a_2 ≠ 0, then x = 0.

To see why (b)-(ii) and (b)-(iii) hold, consider the cases:

( 1 )       ( b_1 )
( 1 ) x  =  ( b_2 )

If b_1 = b_2, then x = b_1 = b_2 is the unique solution.
If b_1 ≠ b_2, then there is no solution.    (rank(A) = 1)

( 0 )       ( b_1 )
( 0 ) x  =  ( b_2 )  ⟹  x can be anything (provided b_1 = b_2 = 0; otherwise there is no solution).
Example 8. To see why (c)-(i) holds, consider the case:

( 1  1 ) ( x_1 )   ( 0 )         x_1 + x_2 = 0
( a  1 ) ( x_2 ) = ( 0 )   ⟹   a x_1 + x_2 = 0

In this case, if a = 1, then there are infinitely many solutions. But, if a ≠ 1, then x_1 = −x_2 and a x_1 − x_1 = 0, so x_1 = x_2 = 0. Note also that if a = 1, then

( 1  1 )
( 1  1 )
  → subtract the 1st row from the 2nd:
( 1  1 )
( 0  0 )

Thus, rank(A) = 1 < 2. But, if a ≠ 1 (and a ≠ 0; if a = 0, the matrix is already in echelon form with two nonzero rows), then

( 1  1 )
( a  1 )
  → multiply the 1st row by a:
( a  a )
( a  1 )
  → subtract the 1st row from the 2nd:
( a  a   )
( 0  1−a )
  → divide the 1st row by a:
( 1  1   )
( 0  1−a )
  → divide the 2nd row by 1−a:
( 1  1 )
( 0  1 )
  → subtract the 2nd row from the 1st:
( 1  0 )
( 0  1 )

Thus, rank(A) = 2. Therefore, we have exactly one solution when a ≠ 1.
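A quick numerical illustration of the two cases in this example (my own addition):

```python
import numpy as np

for a in (1.0, 2.0):
    A = np.array([[1.0, 1.0],
                  [a,   1.0]])
    print(f"a = {a}: rank(A) = {np.linalg.matrix_rank(A)}")
    # a = 1 gives rank 1 (infinitely many solutions of Ax = 0);
    # a = 2 gives rank 2, so x = 0 is the unique solution.
```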
Note that in economic applications, we would like to have exactly one solution. If we have more than one solution or no solution, then we are in trouble. Thus, in view of Theorem 7-1, the ideal case is (c)-(iii). So, we frequently make some assumptions to guarantee (c)-(iii).
Suppose, for example, that we have a system of m linear equations with n unknowns: Ax = b. Now, suppose further that m < n. As we saw, in such a system there is no guarantee that we have a unique solution. However, suppose that some of the variables can be determined exogenously, say, by policy instruments (e.g., interest rates, money supply, etc., in the case of the IS-LM model). We may be able to create a reduced system with only the m endogenous variables of our interest, such that the reduced system has exactly one solution. This is what the Linear Implicit Function Theorem states:

Theorem 7-2 (Linear Implicit Function Theorem): Consider a system of linear equations, Ax = b with m < n. Consider a partition of the unknown variables:

x = ( x'  )      x'  = (x_1, ..., x_k)^T        (endogenous variables)
    ( x'' )      x'' = (x_{k+1}, ..., x_n)^T    (exogenous variables)

Then, the system has a unique solution x' for every choice of x'' ∈ R^{n−k} if and only if (i) k = m and (ii) the coefficient matrix corresponding to the endogenous variables has full rank:

rank ( a_11  ...  a_1k )
     ( ...   ...  ...  )  = k = m
     ( a_k1  ...  a_kk )
8. Matrix Algebra

In this course, it is assumed that you have sufficient knowledge of matrix algebra. However, in economics and econometrics, you are likely to encounter very cumbersome computations. Therefore, it is useful to review elementary as well as advanced rules of matrix algebra. Those who are not familiar with matrix algebra are strongly encouraged to take some time to work on the exercises in Chapter 8 of Simon & Blume.

8-1. Basic Rules

When we say an (m×n)-matrix, we mean an array of data arranged in m rows and n columns. It is common to index each entry of the data as follows:

( a_11  a_12  ...  a_1n )
( a_21  a_22  ...  a_2n )
( ...   ...   ...  ...  )
( a_m1  a_m2  ...  a_mn )

where we say the entry in the i-th row and j-th column is the (i, j)-element of this matrix.

We will review the basic matrix algebra operations with examples.
Addition & Subtraction

Addition and subtraction of matrices are defined only when the matrices are of the same size. We simply add and subtract element by element.

A = ( 2  3 )     B = ( 1  0 )
    ( 1  1 )         ( 2  1 )
    ( 0  2 )         ( 5  2 )

A + B = ( 2  3 )   ( 1  0 )   ( 3  3 )
        ( 1  1 ) + ( 2  1 ) = ( 3  2 )
        ( 0  2 )   ( 5  2 )   ( 5  4 )

A − B = ( 2  3 )   ( 1  0 )   (  1  3 )
        ( 1  1 ) − ( 2  1 ) = ( −1  0 )
        ( 0  2 )   ( 5  2 )   ( −5  0 )
Scalar Multiplication

For any r ∈ R and any A ∈ M_{m×n}(R), we can define scalar multiplication:

rA = r ( 2  3 )   ( 2r  3r )
       ( 1  1 ) = ( r   r  )
       ( 0  2 )   ( 0   2r )
Matrix Multiplication

We can define the matrix product AB if and only if the number of columns of A is equal to the number of rows of B. That is, the product is defined for any combination of k, m, n such that:

A_{(k×m)} B_{(m×n)}

The vector product is a special case of matrix multiplication. Question: How do we compute the following?

( 2  1  1 ) ( 2 )
            ( 3 )  = 2·2 + 1·3 + 1·1 = 4 + 3 + 1 = 8
            ( 1 )

Question: How about the following? Is it well-defined? Answer: Yes. We basically repeat the vector multiplication and arrange the resulting products according to the following rule:

i-th row · j-th column = (i, j)-element of the resulting matrix

( a  b ) ( A  B )   ( aA + bC   aB + bD )
( c  d ) ( C  D ) = ( cA + dC   cB + dD )
( e  f )            ( eA + fC   eB + fD )

Because we have a collection of (i, j)-elements in the resulting matrix, where the index i comes from the first matrix and j from the second, we get a (k×n)-matrix as the result of the matrix multiplication:

A_{(k×m)} B_{(m×n)} = C_{(k×n)}

In general, the (i, j)-element of the resulting matrix is written as:

c_ij = ( a_i1  ...  a_im ) ( b_1j )
                           ( ...  )  = a_i1 b_1j + a_i2 b_2j + ... + a_im b_mj = Σ_{h=1}^{m} a_ih b_hj
                           ( b_mj )
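A minimal sketch of this summation rule in code (my own addition), computing c_ij = Σ_h a_ih b_hj with explicit loops:

```python
import numpy as np

def matmul(A, B):
    """(k x m) times (m x n): c_ij = sum over h of a_ih * b_hj."""
    k, m = A.shape
    m2, n = B.shape
    assert m == m2, "columns of A must equal rows of B"
    C = np.zeros((k, n))
    for i in range(k):
        for j in range(n):
            C[i, j] = sum(A[i, h] * B[h, j] for h in range(m))
    return C

A = np.array([[2.0, 1.0, 1.0]])          # (1 x 3) row vector
B = np.array([[2.0], [3.0], [1.0]])      # (3 x 1) column vector
print(matmul(A, B))                      # [[8.]], matching the example above
```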
Laws of Matrix Algebra

We have the following laws whenever these operations are well-defined:
(i) Associative Laws:

(A + B) + C = A + (B + C)
(AB)C = A(BC)

(ii) Commutative Law of Addition:

A + B = B + A

(iii) Distributive Laws:

A(B + C) = AB + AC
(A + B)C = AC + BC

These are exactly the same laws as we have for real numbers. But, there is one important law which real numbers satisfy but matrices don't: the commutative law of multiplication.

∀a, b ∈ R, ab = ba

But, it is not necessarily true that AB = BA, even when both products are well-defined. Consider:

AB = ( 2  1 ) ( 1  −1 )   ( 2  0 )
     ( 1  1 ) ( 0   2 ) = ( 1  1 )

BA = ( 1  −1 ) ( 2  1 )   ( 1  0 )
     ( 0   2 ) ( 1  1 ) = ( 2  2 )
But, there can be some matrices that satisfy this equality. One example is the identity matrix.

Definition 8-1 (Identity Matrix): The identity matrix is an (n×n)-matrix with entries satisfying:

a_ij = 1 if i = j,  a_ij = 0 if i ≠ j

That is,

I = ( 1  0  ...  0 )
    ( 0  1  ...  0 )
    ( ...          )
    ( 0  0  ...  1 )

The identity matrix has the following property (check yourself): For any (k×n)-matrix A,

AI = A

and for any (n×k)-matrix B,

IB = B

Therefore, if A is an (n×n)-matrix, we have:

AI = IA = A
Definition 8-2 (Transpose): The transpose of an (m×n)-matrix A is the (n×m)-matrix obtained by interchanging the rows and columns of A. That is, the transpose, denoted A^T, is the matrix such that:

a'_ji = a_ij

where a_ij is the (i, j)-element of A and a'_ji is the (j, i)-element of A^T.

We have the following rules for transposes:

Theorem 8-1 (Transpose Rules): Let A, B be arbitrary matrices. We have the following rules whenever the operations are well-defined:
(i) (A + B)^T = A^T + B^T;
(ii) (A − B)^T = A^T − B^T;
(iii) (A^T)^T = A;
(iv) (rA)^T = r A^T;
(v) (AB)^T = B^T A^T.

You are asked to verify these in your HW.
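These rules are easy to spot-check numerically (my own addition); random matrices of compatible sizes suffice:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((2, 3))

print(np.allclose((A + C).T, A.T + C.T))   # rule (i)
print(np.allclose((A @ B).T, B.T @ A.T))   # rule (v): order reverses
```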
8-2. Special Matrices

We will encounter different kinds of problems in economic applications. It is useful to learn the terminology for special matrices:

Square matrix. Any (n×n)-matrix is called a square matrix.

Diagonal matrix. A diagonal matrix is a square matrix in which all non-diagonal entries are zero.

D = ( a_11  0     ...  0    )
    ( 0     a_22  ...  0    )
    ( ...   ...   ...  ...  )
    ( 0     0     ...  a_nn )

Upper-triangular matrix. An upper-triangular matrix is a matrix (usually square) in which all entries below the diagonal are zero.

U = ( a_11  a_12  ...  a_1n )
    ( 0     a_22  ...  a_2n )
    ( ...   ...   ...  ...  )
    ( 0     0     ...  a_nn )

Lower-triangular matrix. A lower-triangular matrix is a matrix (usually square) in which all entries above the diagonal are zero.

L = ( a_11  0     ...  0    )
    ( a_21  a_22  ...  0    )
    ( ...   ...   ...  ...  )
    ( a_n1  a_n2  ...  a_nn )

Symmetric matrix. A symmetric matrix is a square matrix A such that A = A^T, i.e., a_ij = a_ji for all i, j.

S = ( a  b  c )
    ( b  d  e )
    ( c  e  f )

Idempotent matrix. An idempotent matrix is a square matrix A such that AA = A. For example:

MM = ( 5  −5 ) ( 5  −5 )   ( 5  −5 )
     ( 4  −4 ) ( 4  −4 ) = ( 4  −4 )

Permutation matrix. A permutation matrix is a square matrix of zeros and ones in which each row and each column contains exactly one 1. It is called a permutation matrix because it permutes the entries of a matrix:

P = ( 0  1  0 )        PA = ( 0  1  0 ) ( a  b )   ( c  d )
    ( 1  0  0 )             ( 1  0  0 ) ( c  d ) = ( a  b )
    ( 0  0  1 )             ( 0  0  1 ) ( e  f )   ( e  f )

Nonsingular matrix. A nonsingular matrix is a square matrix of full rank (i.e., rank(A) = n when A is an (n×n)-matrix). We will learn more about this kind of matrix later.
8-3. Elementary Matrices

Recall the three elementary row operations:
(i) Interchanging rows;
(ii) Adding a multiple of one row to another;
(iii) Multiplying a row by a nonzero scalar.

These row operations can be performed on a matrix A by premultiplying A by special matrices called elementary matrices.

Theorem 8-2 (Elementary Matrices):
(i) Let E_ij denote the permutation matrix obtained by interchanging the i-th and j-th rows of an (n×n) identity matrix. Then, the left-multiplication E_ij A has the effect of interchanging the i-th and j-th rows of any (n×m)-matrix A.
(ii) Let E_i(r) denote the matrix obtained by multiplying the i-th row of an (n×n) identity matrix by r. Then, the left-multiplication E_i(r) A has the effect of multiplying the i-th row of any (n×m)-matrix A by r.
(iii) Let E_ij(r) denote the matrix obtained by inserting r in the (j, i)-position of an (n×n) identity matrix. Then, the left-multiplication E_ij(r) A has the effect of adding r times row i to row j of any (n×m)-matrix A.

The matrices E_ij, E_i(r), and E_ij(r) are called elementary matrices.
Example 1. Consider the (3×3) identity matrix:

E = ( 1  0  0 )
    ( 0  1  0 )
    ( 0  0  1 )

Then,

E_12 = ( 0  1  0 )
       ( 1  0  0 )
       ( 0  0  1 )

This is simply a permutation matrix. We have seen that this matrix has the effect of interchanging the 1st and 2nd rows of any (3×m)-matrix A. How about E_2(5)?

E_2(5) = ( 1  0  0 )
         ( 0  5  0 )
         ( 0  0  1 )

E_2(5) A = ( 1  0  0 ) ( a  b )   ( a   b  )
           ( 0  5  0 ) ( c  d ) = ( 5c  5d )
           ( 0  0  1 ) ( e  f )   ( e   f  )

Finally, E_23(5) is:

E_23(5) = ( 1  0  0 )
          ( 0  1  0 )
          ( 0  5  1 )

E_23(5) A = ( 1  0  0 ) ( a  b )   ( a       b      )
            ( 0  1  0 ) ( c  d ) = ( c       d      )
            ( 0  5  1 ) ( e  f )   ( 5c + e  5d + f )
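A small NumPy sketch of these elementary matrices (my own addition), confirming that each left-multiplication has the stated effect:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

E12 = np.eye(3)[[1, 0, 2]]          # interchange rows 1 and 2 of the identity
E2 = np.eye(3);  E2[1, 1] = 5.0     # multiply row 2 of the identity by 5
E23 = np.eye(3); E23[2, 1] = 5.0    # put 5 in the (3, 2)-position

print(E12 @ A)   # rows 1 and 2 of A interchanged
print(E2 @ A)    # row 2 of A multiplied by 5
print(E23 @ A)   # 5 times row 2 added to row 3
```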
8-4. Inverse Matrices

Thus far, we have seen addition, subtraction, and multiplication. How about division? Can we define division on matrices just as on numbers, such as 1/a? Formally, an inverse element of a real number a is defined as a number b such that:

ab = ba = 1

How about applying the same definition to matrices? Because the identity matrix plays the role of 1 in matrix operations, an appropriate definition would be:

AB = BA = I

Question: Can we define such matrices for all matrices A ∈ M_{m×n}(R)? Answer: No. If we force this definition on an arbitrary matrix A ∈ M_{m×n}(R), then B must be an (n×m)-matrix. But then,

A_{(m×n)} B_{(n×m)} = C_{(m×m)}
B_{(n×m)} A_{(m×n)} = D_{(n×n)}

If m ≠ n, we can never have C_{(m×m)} = D_{(n×n)}. Thus, if this definition is ever to work, it must be on (n×n)-matrices (i.e., square matrices).

Definition 8-3 (Inverse Matrix): Let A be an (n×n) square matrix. The matrix B ∈ M_{n×n}(R) is said to be an inverse of A if:

AB = BA = I_{n×n}

When the inverse exists, we say that A is invertible. We write B = A^{-1}.
Theorem 8-3 (Uniqueness of the Inverse): Any square matrix A can have at most one inverse.

Proof: Suppose that B and C are both inverses of A. Then,

C = CI = C(AB) = (CA)B = IB = B

So, B must necessarily be equal to C. QED

We have the following important theorem:

Theorem 8-4 (Equivalence): For any square matrix A, the following statements are equivalent:
(i) A is invertible;
(ii) Every system of linear equations Ax = b has a unique solution, for all b ∈ R^n;
(iii) A is nonsingular;
(iv) A has full rank.

(iii) ⟺ (iv) holds just by definition. To see why (i) ⟹ (ii), consider:

Ax = b

Since A is invertible, A^{-1} exists. Premultiply both sides of the equation:

A^{-1} A x = I x = x = A^{-1} b

A^{-1} is derived from A and does not depend on x, so this gives us a unique solution. The argument can be reversed. We have seen (ii) ⟺ (iii) in Theorem 7-1.
We have convenient computational rules for inverse matrices.

Theorem 8-5: Let A, B be invertible square matrices. Then,
(i) (A^{-1})^{-1} = A;
(ii) (A^T)^{-1} = (A^{-1})^T;
(iii) AB is invertible and (AB)^{-1} = B^{-1} A^{-1}.

Proof: (i) is obvious. To see (ii), postmultiply both sides by A^T:

LHS: (A^T)^{-1} A^T = I
RHS: (A^{-1})^T A^T = (A A^{-1})^T = I^T = I

To see (iii), note that, by definition, if we find a matrix C such that:

C(AB) = (AB)C = I

then C is an inverse of AB and we can write C = (AB)^{-1}. Let C = B^{-1} A^{-1}. Then, we have:

B^{-1} A^{-1} (AB) = (AB) B^{-1} A^{-1} = I

QED
Theorem 8-6: If a square matrix A is invertible, then
(i) A^m = A·A·...·A (m times) is invertible for any integer m, and:

(A^m)^{-1} = (A^{-1})^m = A^{-m} = A^{-1}·A^{-1}·...·A^{-1} (m times)

(ii) For any integers r and s, A^r A^s = A^{r+s};
(iii) For any scalar r ≠ 0, rA is invertible and (rA)^{-1} = (1/r) A^{-1}.
8-5. Partitioned Matrices (Optional)

Any matrix A can be partitioned into submatrices. For example, a (4×6)-matrix can be partitioned into:

A = ( a_11  a_12 | a_13  a_14 | a_15  a_16 )
    ( a_21  a_22 | a_23  a_24 | a_25  a_26 )
    ( ------------------------------------ )
    ( a_31  a_32 | a_33  a_34 | a_35  a_36 )
    ( a_41  a_42 | a_43  a_44 | a_45  a_46 )

which can be written as a (2×3)-matrix of submatrices:

A = ( A_11  A_12  A_13 )
    ( A_21  A_22  A_23 )

This is called a partitioned matrix of A. Addition and multiplication can be done blockwise, as long as the submatrices of the partitioned matrices are of sizes that allow these operations:

A + B = ( A_11  A_12 )   ( B_11  B_12 )   ( A_11 + B_11   A_12 + B_12 )
        ( A_21  A_22 ) + ( B_21  B_22 ) = ( A_21 + B_21   A_22 + B_22 )

AB = ( A_11  A_12 ) ( B_11  B_12 )   ( A_11 B_11 + A_12 B_21   A_11 B_12 + A_12 B_22 )
     ( A_21  A_22 ) ( B_21  B_22 ) = ( A_21 B_11 + A_22 B_21   A_21 B_12 + A_22 B_22 )
Finally, let's discuss how to actually compute the inverse of a matrix A. In the next section, we will discuss a more efficient way of computing an inverse. However, there is a primitive way of computing an inverse. Suppose that A is invertible. Create an augmented matrix Â such that:

Â = ( A_{n×n} | I_{n×n} )

If there exists an inverse, then we can premultiply this by the inverse to get:

A^{-1} Â = A^{-1} ( A_{n×n} | I_{n×n} ) = ( I_{n×n} | A^{-1}_{n×n} )

This means that, if we can find a matrix that conducts elementary row operations converting A to I, we can find the inverse of A by applying that same matrix to the identity matrix I. In actual computation, we apply elementary row operations consecutively. Consider, for example, a (2×2) matrix and form the augmented matrix:

( a  b | 1  0 )
( c  d | 0  1 )

Apply elementary row operations to this matrix (assuming a ≠ 0 and c ≠ 0):

( a  b | 1  0 )
( c  d | 0  1 )
  → divide the 1st row by a and the 2nd row by c:
( 1  b/a | 1/a  0   )
( 1  d/c | 0    1/c )
  → subtract the 1st row from the 2nd:
( 1  b/a       | 1/a   0   )
( 0  d/c − b/a | −1/a  1/c )
  → multiply the 2nd row by c:
( 1  b/a         | 1/a   0 )
( 0  (ad − cb)/a | −c/a  1 )
  → divide the 2nd row by (ad − cb)/a:
( 1  b/a | 1/a           0            )
( 0  1   | −c/(ad − cb)  a/(ad − cb)  )
  → subtract b/a times the 2nd row from the 1st:
( 1  0 | d/(ad − cb)   −b/(ad − cb) )
( 0  1 | −c/(ad − cb)   a/(ad − cb) )

So the right-hand block gives the familiar (2×2) inverse:

A^{-1} = (1/(ad − bc)) (  d  −b )
                       ( −c   a )
x_1 = (1/det(A)) Σ_j b_j (−1)^{1+j} det(Â_{j1})

    = (1/det(A)) det ( b_1  a_12  a_13 )
                     ( b_2  a_22  a_23 )
                     ( b_3  a_32  a_33 )

You can do this for all i = 1, 2, 3. QED
Example 1. Suppose that we have the following system:

( 2  4  5 ) ( x_1 )   ( 1 )
( 0  3  0 ) ( x_2 ) = ( 2 )
( 1  0  1 ) ( x_3 )   ( 3 )

Let's solve for x_1, x_2, x_3. Question: What do we need to do first? Answer: First, check if A is invertible. To do so, we need to check the determinant:

det ( 2  4  5 )
    ( 0  3  0 )
    ( 1  0  1 )

Which expansion looks easier to compute, row-based or column-based? Let's pick the first column.

det(A) = 2 det( 3  0 ) − 0 det( 4  5 ) + 1 det( 4  5 )
              ( 0  1 )        ( 0  1 )        ( 3  0 )
       = 2·3 + 1·(−15)
       = −9 ≠ 0

So, it is invertible. Now, use Cramer's rule:

x_1 = det(B_1)/det(A) = (1/det(A)) det ( 1  4  5 )
                                       ( 2  3  0 )
                                       ( 3  0  1 )

    = (1/det(A)) [ 1 det( 3  0 ) − 2 det( 4  5 ) + 3 det( 4  5 ) ]
                        ( 0  1 )        ( 0  1 )        ( 3  0 )

    = (1/(−9)) [3 − 2·4 + 3·(−15)] = (−50)/(−9) = 50/9

We can do the same for x_2, x_3.
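A quick NumPy verification of this example (my own addition), comparing Cramer's rule with a direct solve:

```python
import numpy as np

A = np.array([[2.0, 4.0, 5.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

detA = np.linalg.det(A)               # -9
for i in range(3):
    Bi = A.copy()
    Bi[:, i] = b                      # replace column i of A with b
    print(f"x_{i+1} =", np.linalg.det(Bi) / detA)

print(np.linalg.solve(A, b))          # same answers: 50/9, 2/3, -23/9
```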
10. Euclidean Spaces

This section reviews Euclidean spaces, denoted R^n. In fact, we have been using this notation quite often, without any formal definition. In this section, we will learn how to generalize the notions of points, lines, planes, distances, and angles in R^n. Most of the topics will be left as your reading assignment.

Definition 10-1 (Cartesian Product): Let A_1, A_2 be any sets. Then, the Cartesian product of the two sets, A_1 × A_2, is the set of all pairs (a_1, a_2) such that a_1 ∈ A_1, a_2 ∈ A_2. We represent the product space as A_1 × A_2. When we have more than two sets, we use the notation Π_{i=1}^{n} A_i. When we use a geometric representation of (x, y)-coordinates to describe these pairs, that representation is called the Cartesian plane.

Question: What is the difference between A_1 × A_2 and A_1 ∩ A_2 or A_1 ∪ A_2? Answer: Let A_1 = {a, b} and A_2 = {c}, with A_1 and A_2 disjoint. Then A_1 × A_2 = {(a, c), (b, c)}, A_1 ∩ A_2 = ∅, and A_1 ∪ A_2 = {a, b, c}.

Definition 10-2 (Euclidean Space): An n-dimensional Euclidean space is the Cartesian product of n copies of the real line, denoted R^n = R × R × ... × R (n times):

R^n = { (x_1, ..., x_n) : x_i ∈ R, ∀i = 1, 2, ..., n }

So, x = (x_1, ..., x_n) represents a point in the n-dimensional Cartesian plane. This n-tuple may be more generally interpreted as a displacement in R^n. For example, the displacement (2,3) means: move 2 in the first dimension (horizontally, or along the x-axis) and move 3 in the second (vertically, or along the y-axis). In this interpretation, the vector does not necessarily start from the origin (0,0). But, more frequently, we treat the displacement as representing the move from the origin, so that the displacement (2,3) and the location (2,3) coincide with each other. We often call (x_1, ..., x_n) a vector in R^n, which can ambiguously mean either a location or a displacement in R^n. The Euclidean space is often termed a normed vector space, because it is a vector space endowed with a metric, the norm.
Definition 10-3 (Vector Space): A vector space is any set V such that addition (+) and scalar multiplication (·) are well-defined on V and satisfy the following properties:
(i) (Associative law of addition): ∀x, y, z ∈ V, x + (y + z) = (x + y) + z;
(ii) (Neutral element for addition): ∃0 ∈ V such that ∀x ∈ V, x + 0 = 0 + x = x;
(iii) (Inverse element for addition): ∀x ∈ V, ∃(−x) ∈ V such that x + (−x) = 0;
(iv) (Associative law of scalar multiplication): ∀α, β ∈ R, ∀x ∈ V, α·(β·x) = (α·β)·x;
(v) (Neutral element for scalar multiplication): ∀x ∈ V, 1·x = x·1 = x, where 1 ∈ R;
(vi) (Distributive law of addition): ∀α ∈ R, ∀x, y ∈ V, α·(x + y) = α·x + α·y;
(vii) (Distributive law of scalar multiplication): ∀α, β ∈ R, ∀x ∈ V, (α + β)·x = α·x + β·x.

As we know, addition and scalar multiplication are well-defined and satisfy these properties in R^n. So, the Euclidean space is a vector space. In addition to addition and scalar multiplication, the Euclidean space is endowed with another operation, called the (Euclidean) inner product. We often denote it x·y or ⟨x, y⟩.
Definition 10-4 (Inner Product): Let x = (x_1, ..., x_n), y = (y_1, ..., y_n) be two vectors in R^n. The (Euclidean) inner product of x and y is the number:

x·y = x_1 y_1 + x_2 y_2 + ... + x_n y_n

which satisfies the following properties:
(i) (Symmetry): x·y = y·x;
(ii) (Linearity): x·(y + z) = x·y + x·z and x·(αy) = α(x·y) = (αx)·y;
(iii) (Positivity): x·x ≥ 0, with equality if and only if x = 0.
Definition 10-5 (Norm): The norm of a vector x in R^n is defined as:

||x|| = (x·x)^{1/2} = ( Σ_{i=1}^{n} x_i^2 )^{1/2}

Theorem 10-1 (Properties of the Norm): Suppose x, y, z ∈ R^n and α ∈ R. Then,
(i) ||x|| ≥ 0, with equality if and only if x = 0;
(ii) ||αx|| = |α| ||x||;
(iii) |x·y| ≤ ||x|| ||y||  (the Cauchy-Schwarz inequality);
(iv) ||x + y|| ≤ ||x|| + ||y||  (the triangle inequality);
(v) ||x − z|| ≤ ||x − y|| + ||y − z||;
(vi) | ||x|| − ||y|| | ≤ ||x − y||.
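A quick numerical spot-check of properties (iii) and (iv) (my own addition):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))             # (iii)
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))  # (iv)
```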
Geometrically, we can represent a line in R^2 and a plane in R^3 (see Figures 10.26 and 10.28):

x(a) = u + av                                        (10-1)
     = (u_1, u_2) + a(v_1, v_2)
     = (u_1 + av_1, u_2 + av_2)

x(a, b) = u + av + bw
        = (u_1, u_2, u_3) + a(v_1, v_2, v_3) + b(w_1, w_2, w_3)
        = (u_1 + av_1 + bw_1, u_2 + av_2 + bw_2, u_3 + av_3 + bw_3)

For example, in (10-1), let u = (1, 1) and v = (2, 3). Then, the line represented by this equation is an arbitrary extension (by a number a) of the displacement that starts at (1, 1) and moves 2 horizontally and 3 vertically from (1, 1).

Hyperplane. A line in R^2 and a plane in R^3 are examples of sets of points described by a single linear equation. These sets are called hyperplanes. In general, a hyperplane in R^n is a set of points that has (n−1) dimensions in R^n and is described by a linear equation:

a_1 x_1 + a_2 x_2 + ... + a_n x_n = c

Thus, hyperplanes in R^1, R^2, R^3 are, respectively, a point, a line, and a plane:

a_1 x_1 = c
a_1 x_1 + a_2 x_2 = c
a_1 x_1 + a_2 x_2 + a_3 x_3 = c
11. Linear Independence

Before formally defining linear dependence, let's consider a simple example. Suppose that we have a relationship between x_1 and x_2 such that:

a_1 x_1 + a_2 x_2 = 0        (11-1)

Now, suppose that a_1 ≠ 0. Then, we can manipulate this relationship:

x_1 = −(a_2/a_1) x_2

So, there is a natural dependency of x_1 on x_2. We say that x_1 is linearly dependent on x_2, because the relationship is linear. Question: What if a_2 = 0? Answer: We still have a uniquely determined value of x_1 = 0 for each fixed value of x_2. So, we still say x_1 is linearly dependent on x_2. On the other hand, suppose a_1 = 0 and a_2 = 0. Then, equation (11-1) does not give us any information concerning the relationship between x_1 and x_2. That is, for a given value of x_2, x_1 can take any value. In this case, we say that x_1 and x_2 are linearly independent. This is the natural definition of linear dependence. As we will see, we can define linear dependence for n-dimensional vectors.
As we saw in Section 10, the set of all scalar multiples of a non-zero vector v is a straight line through the origin.

Definition 11-1 (Span): For a vector v ∈ R^n, the set L(v) is said to be spanned or generated by v if:

L(v) = { rv : r ∈ R }

Definition 11-2 (Linear Combination): A linear combination of k non-zero vectors v_1, v_2, ..., v_k is:

c_1 v_1 + c_2 v_2 + ... + c_k v_k    for some scalars c_1, c_2, ..., c_k ∈ R

Definition 11-3 (Span): Let v_1, v_2, ..., v_k be non-zero vectors. We say that the set L(v_1, v_2, ..., v_k) is generated or spanned by (v_1, v_2, ..., v_k) if:

L(v_1, v_2, ..., v_k) = { c_1 v_1 + c_2 v_2 + ... + c_k v_k : c_1, c_2, ..., c_k ∈ R }

That is, L(v_1, v_2, ..., v_k) is the set of all possible linear combinations of v_1, v_2, ..., v_k. Moreover, if a set V is a subset of L(v_1, v_2, ..., v_k), then we say that v_1, v_2, ..., v_k span V.
Definition 11-4 (Linear Dependence): Vectors v_1, v_2, ..., v_k ∈ R^n are linearly dependent if and only if there exist scalars c_1, c_2, ..., c_k ∈ R, not all equal to zero, such that:

c_1 v_1 + c_2 v_2 + ... + c_k v_k = 0

Definition 11-5 (Linear Independence): Vectors v_1, v_2, ..., v_k ∈ R^n are linearly independent if and only if:

c_1 v_1 + c_2 v_2 + ... + c_k v_k = 0  ⟹  c_1 = c_2 = ... = c_k = 0

A flip side of this definition is that, as long as we can find scalars c_1, c_2, ..., c_k, not all zero, such that c_1 v_1 + c_2 v_2 + ... + c_k v_k = 0, we can be assured that v_1, v_2, ..., v_k are linearly dependent. This can be stated more generally:
Theorem 11-1: Vectors v_1, v_2, ..., v_k ∈ R^n are linearly dependent if and only if the linear system:

A c = ( v_1  v_2  ...  v_k ) ( c_1 )
                             ( c_2 )
                             ( ... )  = 0
                             ( c_k )

has a non-zero solution c = (c_1, c_2, ..., c_k), where A is the matrix whose columns are v_1, v_2, ..., v_k.
When we have v_1, v_2, ..., v_n ∈ R^n, A = (v_1, v_2, ..., v_n) becomes an (n×n)-matrix. We can then use the equivalence theorem: a square matrix is of full rank if and only if its determinant is not zero.

Theorem 11-2: A set of n vectors v_1, v_2, ..., v_n ∈ R^n is linearly independent if and only if:

det( v_1  v_2  ...  v_n ) ≠ 0
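A small NumPy check of this determinant test (my own addition); the columns of each matrix below are the vectors being tested:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])      # columns (1,2) and (2,4) = 2*(1,2): dependent
B = np.array([[1.0, 0.0],
              [2.0, 1.0]])      # columns (1,2) and (0,1): independent

print(np.linalg.det(A))   # 0.0  -> linearly dependent
print(np.linalg.det(B))   # 1.0  -> linearly independent
```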
Another important fact is that, if we have more vectors than the number of dimensions of each vector v_i, then they must be linearly dependent.

Theorem 11-3: Any set of k vectors v_1, v_2, ..., v_k ∈ R^n is linearly dependent if k > n.

This can be well understood with an example in R^2. Suppose there are 3 non-zero vectors in R^2. Pick one vector. Then, this vector can necessarily be described as a linear combination of the other two vectors. (See the graph.)

Let's consider the implication of this for a moment. To describe ANY vector in R^2, we only need two non-zero 2-dimensional vectors. In the terminology just learned, the set of all vectors in R^2 is simply the span of two non-proportional non-zero vectors. In general, we might ask: "What is the most efficient spanning set to represent the set of all vectors in an arbitrary vector space V?" In other words, if v_1, v_2, ..., v_k span V, what is the smallest possible subset of v_1, v_2, ..., v_k that spans V? This is precisely the role of the concept of linear independence that we have considered. If v_1, v_2, ..., v_k are linearly independent, then none of these vectors is a linear combination of the others, and therefore no proper subset of v_1, v_2, ..., v_k would span V. This leads to the concept of a basis of a vector space V.
Definition 11-6 (Basis): Let w_1, w_2, ..., w_m be a collection of vectors in V. Then, w_1, w_2, ..., w_m form a basis of V if and only if:
(i) w_1, w_2, ..., w_m span V;
(ii) w_1, w_2, ..., w_m are linearly independent.

Clearly, three non-zero vectors in R^2 cannot form a basis of R^2, because these vectors must be linearly dependent. Even if we have two non-zero vectors in R^2, they cannot be a basis of R^2 if w_1 = a w_2. Moreover, two vectors with one of them being a zero vector cannot be a basis of R^2, because they cannot span R^2. Note that a zero vector in R^2 is a linear combination of any non-zero vector: 0 = 0v. The natural basis of the Euclidean space R^n is the canonical basis:

e_1 = ( 1 )          e_n = ( 0 )
      ( 0 )                ( 0 )
      (...)  ,  ... ,      (...)
      ( 0 )                ( 1 )
Let's check that this spans R^n and is linearly independent. To see that it spans R^n, take an arbitrary vector v ∈ R^n:

v = (v_1, v_2, ..., v_n)^T

Then,

v = v_1 e_1 + ... + v_n e_n

So, it spans R^n. To see linear independence, consider:

c_1 e_1 + ... + c_n e_n = (c_1, c_2, ..., c_n)^T

So, if c_1 e_1 + ... + c_n e_n = 0, then it must be that c_1 = ... = c_n = 0. It is linearly independent.
The following theorems should come naturally as a summary of our discussion above.

Theorem 11-4: If both v_1, v_2, ..., v_n and w_1, w_2, ..., w_m are bases of V, then we must have n = m.

Theorem 11-5: Every basis of R^n contains n vectors.

Theorem 11-6: Let v_1, v_2, ..., v_n be a collection of n vectors in R^n. Then, the following statements are equivalent:
(i) v_1, v_2, ..., v_n are linearly independent;
(ii) v_1, v_2, ..., v_n span R^n;
(iii) v_1, v_2, ..., v_n form a basis of R^n;
(iv) det(v_1 v_2 ... v_n) ≠ 0;
(v) A = (v_1 v_2 ... v_n) has full rank.

In view of Theorem 11-4, we can talk about the dimension of any vector space unambiguously.

Definition 11-7 (Dimension): The dimension of a vector space V is the number of vectors in any basis of V.

Thus, the dimension of R^n is exactly n.
31