Advanced Numerical Methods
September 14 and 16, 2011
Lecture Notes 1: Matrix multiplication
Lecturer: Che-Rung Lee Scribe: Tien-Yu Kuo 9962526
1 Matrix multiplication

1.1 Basic definition
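Direct multiplication

Let
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mp}
\end{pmatrix},
\quad
B = \begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots &        & \vdots \\
b_{p1} & b_{p2} & \cdots & b_{pn}
\end{pmatrix},
\]
where $\dim(A) = (m, p)$ and $\dim(B) = (p, n)$. Then
\[
C = AB = \begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots &        & \vdots \\
c_{m1} & c_{m2} & \cdots & c_{mn}
\end{pmatrix},
\quad \text{where} \quad
c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj},
\quad i = 1, \dots, m, \; j = 1, \dots, n.
\tag{1}
\]
$\dim(C) = (m, n)$. Complexity: $O(mnp)$, with
multiplication#: $mnp$, addition#: $mn(p-1)$.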
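The operation counts for direct multiplication can be checked with a short sketch. The function name `matmul_direct` and the list-of-lists matrix representation are choices made here for illustration, not part of the lecture; the sketch assumes $p \ge 1$.

```python
def matmul_direct(A, B):
    """Direct form: c_ij = sum_k a_ik * b_kj, also counting operations."""
    m, p, n = len(A), len(B), len(B[0])
    if any(len(row) != p for row in A):
        raise ValueError("dim(A) = (m, p) must match dim(B) = (p, n)")
    C = [[0] * n for _ in range(m)]
    mults = adds = 0
    for i in range(m):
        for j in range(n):
            acc = A[i][0] * B[0][j]       # first product, no addition yet
            mults += 1
            for k in range(1, p):
                acc += A[i][k] * B[k][j]  # one multiplication, one addition
                mults += 1
                adds += 1
            C[i][j] = acc
    return C, mults, adds


A = [[1, 2, 3], [4, 5, 6]]        # dim(A) = (2, 3)
B = [[7, 8], [9, 10], [11, 12]]   # dim(B) = (3, 2)
C, mults, adds = matmul_direct(A, B)
print(C, mults, adds)
```

With $m = 2$, $p = 3$, $n = 2$ the counters come out as $mnp = 12$ multiplications and $mn(p-1) = 8$ additions.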
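Inner Product form

Generally, we treat all vectors as column vectors. Let $a$ and $b$ be vectors with $\dim(a) = \dim(b) = n$,
\[
a = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix},
\quad
b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix},
\]
and the inner product $(a \cdot b) = a_1 b_1 + \cdots + a_n b_n = a^T b$.

Hence, if we write $A$ by rows and $B$ by columns,
\[
A = \begin{pmatrix}
a_{11} & \cdots & a_{1p} \\
a_{21} & \cdots & a_{2p} \\
\vdots &        & \vdots \\
a_{m1} & \cdots & a_{mp}
\end{pmatrix}
= \begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{pmatrix},
\quad
B = \begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
\vdots & \vdots &        & \vdots \\
b_{p1} & b_{p2} & \cdots & b_{pn}
\end{pmatrix}
= \begin{pmatrix} b_1 & b_2 & \cdots & b_n \end{pmatrix},
\]
where $a_i^T$ is the $i$-th row of $A$ and $b_j$ is the $j$-th column of $B$, then
\[
C = AB = \begin{pmatrix}
a_1^T b_1 & a_1^T b_2 & \cdots & a_1^T b_n \\
\vdots    & \vdots    &        & \vdots \\
a_m^T b_1 & a_m^T b_2 & \cdots & a_m^T b_n
\end{pmatrix},
\quad \text{where} \quad
c_{ij} = a_i^T b_j,
\quad i = 1, \dots, m, \; j = 1, \dots, n.
\tag{2}
\]

Outer Product form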
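Let $A$ be written by columns and $B$ by rows,
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mp}
\end{pmatrix}
= \begin{pmatrix} a_1 & a_2 & \cdots & a_p \end{pmatrix},
\quad
B = \begin{pmatrix}
b_{11} & \cdots & b_{1n} \\
b_{21} & \cdots & b_{2n} \\
\vdots &        & \vdots \\
b_{p1} & \cdots & b_{pn}
\end{pmatrix}
= \begin{pmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_p^T \end{pmatrix},
\]
where here $a_k$ is the $k$-th column of $A$ and $b_k^T$ is the $k$-th row of $B$. Then
\[
C = AB = \sum_{k=1}^{p} a_k b_k^T ,
\]
and each element in $C$ is the same as in (1).

Proof: Let $C = C^{(1)} + \cdots + C^{(k)} + \cdots + C^{(p)}$ with $C^{(k)} = a_k b_k^T$. Then
\[
c_{ij}^{(k)} = a_{ik} b_{kj},
\qquad
c_{ij} = \sum_{k=1}^{p} c_{ij}^{(k)} = \sum_{k=1}^{p} a_{ik} b_{kj},
\]
which is the same as (1).

Review

Rank of a matrix. Column (row) rank: the maximum number of linearly independent column (row) vectors. The column rank and the row rank are always equal, and hence this number is simply called the rank of the matrix.

Block form

\[
A = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1P} \\
\vdots & \vdots &        & \vdots \\
A_{M1} & A_{M2} & \cdots & A_{MP}
\end{pmatrix},
\]
which contains $P$ ($M$) blocks per row (column).

e.g. If $\dim(A) = (1024, 1024)$, $\dim(A_{ij})$ could be $(64, 64)$, all the same.

e.g.
\[
\begin{pmatrix}
\dim(5, 6) & \dim(5, 8) & \dim(5, 7) \\
\dim(3, 6) & \dim(3, 8) & \dim(3, 7) \\
\dim(2, 6) & \dim(2, 8) & \dim(2, 7)
\end{pmatrix},
\]
in which all blocks in the same row (column) have the same height (width).

Let
\[
B = \begin{pmatrix}
B_{11} & \cdots & B_{1N} \\
\vdots &        & \vdots \\
B_{P1} & \cdots & B_{PN}
\end{pmatrix}.
\]
If we want to do block form multiplication, $\dim(A_{IK})_2 = \dim(B_{KJ})_1$ should be true for $1 \le I \le M$, $1 \le J \le N$, $1 \le K \le P$; that is, the number of columns of $A_{IK}$ must equal the number of rows of $B_{KJ}$. Then
\[
C = AB = \begin{pmatrix}
C_{11} & C_{12} & \cdots & C_{1N} \\
\vdots & \vdots &        & \vdots \\
C_{M1} & C_{M2} & \cdots & C_{MN}
\end{pmatrix},
\qquad
C_{IJ} = \sum_{K=1}^{P} A_{IK} B_{KJ}.
\]

Question: How to prove it the same as (2)?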
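As a numerical sanity check (not a proof, and not a substitute for answering the question above), the inner product, outer product, and block forms can be compared on a small example. This is a minimal sketch: the function names and the "cut list" way of describing a block partition are assumptions introduced here. Using one shared list `mid_cuts` for the columns of $A$ and the rows of $B$ is what enforces $\dim(A_{IK})_2 = \dim(B_{KJ})_1$.

```python
def matmul_inner(A, B):
    """Inner product form: c_ij = a_i^T b_j (row i of A with column j of B)."""
    cols = list(zip(*B))  # columns b_1, ..., b_n of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def matmul_outer(A, B):
    """Outer product form: C = sum_k a_k b_k^T (column k of A, row k of B)."""
    m, p, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for k in range(p):
        a_k = [A[i][k] for i in range(m)]  # k-th column of A
        b_k = B[k]                         # k-th row of B
        for i in range(m):
            for j in range(n):
                C[i][j] += a_k[i] * b_k[j]  # accumulate C^(k) = a_k b_k^T
    return C


def matmul_block(A, B, row_cuts, mid_cuts, col_cuts):
    """Block form: C_IJ = sum_K A_IK B_KJ.

    Each cut list holds block boundaries, e.g. [0, 2, 3] splits three
    indices into blocks of sizes 2 and 1 (a hypothetical convention).
    """
    m, n = len(A), len(B[0])
    C = [[0] * n for _ in range(m)]
    for I in range(len(row_cuts) - 1):
        for J in range(len(col_cuts) - 1):
            for K in range(len(mid_cuts) - 1):
                # Multiply block A_IK by block B_KJ and add into block C_IJ.
                for i in range(row_cuts[I], row_cuts[I + 1]):
                    for j in range(col_cuts[J], col_cuts[J + 1]):
                        for k in range(mid_cuts[K], mid_cuts[K + 1]):
                            C[i][j] += A[i][k] * B[k][j]
    return C


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # dim(A) = (3, 4)
B = [[1, 0], [0, 1], [2, 1], [1, 2]]               # dim(B) = (4, 2)
C_inner = matmul_inner(A, B)
C_outer = matmul_outer(A, B)
C_block = matmul_block(A, B, row_cuts=[0, 2, 3], mid_cuts=[0, 1, 4], col_cuts=[0, 1, 2])
print(C_inner == C_outer == C_block)
```

The block partition in the example is deliberately uneven (row blocks of heights 2 and 1, inner blocks of widths 1 and 3) to show that the blocks need not all have the same size.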
1.2 Performance Consideration
Performance is determined by
(1) computation, which gives $T_c$;
(2) bandwidth and (3) memory latency, which together give $T_d$.

Assume two $n \times n$ matrices $A$ and $B$; then $C = AB$ is also $n \times n$. In total, $3n^2$ data items have to be stored. $T_d$ is related to $3n^2$, and $T_c$ is related to $2n^3$.

What if the storage space is much smaller? Suppose we only have $3b^2$ spaces in our fast memory, with $b \ll n$, and assume $b^2 \ge n$. In order to compute $C$, $A$ will be loaded $n$ times, each time as a complete $A$ (i.e. $n^2$ elements), as shown in Fig. 1.

Figure 1: If the spaces are much less.
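The statement that $A$ is loaded $n$ times can be reproduced with a small counting sketch. The memory model below is an assumption made for illustration and is not spelled out in the notes: $C$ is computed one column at a time, column $j$ of $B$ stays in fast memory while that column is computed, and the rows of $A$ are streamed in from slow memory. The function name is hypothetical.

```python
def column_by_column_loads(n):
    """Count slow-memory loads when C = AB is computed one column at a time.

    Assumed model: column j of B is loaded once per column of C, and
    every row of A has to be loaded again for each column of C.
    """
    loads_A = loads_B = 0
    for j in range(n):       # one column of C at a time
        loads_B += n         # load column j of B (n elements)
        for i in range(n):   # c_ij needs row i of A
            loads_A += n     # load row i of A (n elements)
    return loads_A, loads_B


loads_A, loads_B = column_by_column_loads(8)
print(loads_A, loads_B)
```

Under this model the elements of $A$ are loaded $n \cdot n^2 = n^3$ times in total, while $B$ is loaded only once ($n^2$ elements).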
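How about using block form? Computing a $b \times b$ block of $b^2$ outputs needs to load $2b^2 \times \frac{n}{b} = 2bn$ data and to perform $2b^3 \times \frac{n}{b} = 2b^2 n$ computations, as shown in Fig. 2.

We have $n^2$ outputs, i.e. $\frac{n^2}{b^2}$ such blocks, and hence $2bn \times \frac{n^2}{b^2} = \frac{2n^3}{b}$ loadings and $2b^2 n \times \frac{n^2}{b^2} = 2n^3$ computations. Therefore
\[
T_c = 2n^3, \qquad T_d = \frac{2n^3}{b},
\]
so the data movement is about $b$ times faster than with the other method.

Figure 2: Using block form.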
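The counts for the block form can be verified by looping over the blocks. This is a sketch under stated assumptions: $b$ divides $n$, each block product $A_{IK} B_{KJ}$ loads two $b \times b$ blocks and costs $2b^3$ operations, and the function name is made up for this example.

```python
def blocked_counts(n, b):
    """Count loads and arithmetic operations for block-form C = AB."""
    if n % b != 0:
        raise ValueError("this sketch assumes b divides n")
    nb = n // b                       # number of blocks per row / column
    loads = flops = 0
    for I in range(nb):
        for J in range(nb):           # one b-by-b block C_IJ of outputs
            for K in range(nb):       # n/b block products per output block
                loads += 2 * b * b    # load A_IK and B_KJ
                flops += 2 * b ** 3   # b-by-b times b-by-b product
    return loads, flops


n, b = 64, 8
loads, flops = blocked_counts(n, b)
print(loads, 2 * n ** 3 // b)   # both equal 2n^3 / b
print(flops, 2 * n ** 3)        # both equal 2n^3
```

For $n = 64$ and $b = 8$ this gives $2n^3/b = 65536$ loads, compared with the $n^3 = 262144$ loads of $A$ in the unblocked scheme; the exact ratio is $b/2$, which is of the order of $b$.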