Abstract
Problem definition: Parallel computing is a form of computing in which many instructions are carried out simultaneously, on the principle that large problems can almost always be divided into smaller ones, which may then be solved concurrently ("in parallel").
Principles of parallel algorithm design:
Identifying concurrent tasks
Mapping tasks onto multiple processes
Distributing input, output, and intermediate data
Managing access to shared data
Synchronizing processors
In this section we will see how to multiply two matrices using parallel programming in an efficient way. There are many different ways to carry out matrix multiplication using parallel computing.
Previous approaches: creating a thread for each computation, Cannon's algorithm, and Strassen's multiplication. However, Strassen's multiplication is quite difficult to implement and is also more restrictive.
Proposed approach: There are two different ways to compute the matrix multiplication:
1) Divide the matrix among as many processes as required and solve all the parts concurrently. This algorithm works well for every value of the size n.
2) Divide the matrix into smaller sub-matrices using recursion and solve the terminating condition of the recursion using Strassen's multiplication. This algorithm also works well for every value of the size n.
General introduction
Basic matrix multiplication: suppose we want to multiply two matrices of size N x N, for example A x B = C. For N = 2:
C11 = a11*b11 + a12*b21
C12 = a11*b12 + a12*b22
C21 = a21*b11 + a22*b21
C22 = a21*b12 + a22*b22
A 2x2 matrix multiplication is thus accomplished with 8 = 2^3 multiplications.
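As a reference point, here is a minimal sketch of the sequential (naive) method in C; the function name and the small test in main are illustrative, not taken from the slides.

#include <stdio.h>

#define N 2  /* illustrative size; the naive method works for any n */

/* Naive sequential multiplication: C[i][j] = sum over k of A[i][k] * B[k][j].
   Performs n*n*n = n^3 scalar multiplications. */
void matmul_naive(int n, const double A[n][n], const double B[n][n], double C[n][n])
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            C[i][j] = 0.0;
            for (int k = 0; k < n; ++k)
                C[i][j] += A[i][k] * B[k][j];
        }
}

int main(void)
{
    double A[N][N] = {{1, 2}, {3, 4}};
    double B[N][N] = {{5, 6}, {7, 8}};
    double C[N][N];

    matmul_naive(N, A, B, C);   /* C = A * B */
    printf("%g %g\n%g %g\n", C[0][0], C[0][1], C[1][0], C[1][1]);
    return 0;
}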
Time analysis: the straightforward blocked scheme uses 8 multiplications of (N/2 x N/2) blocks, giving T(N) = 8*T(N/2) + O(N^2), i.e. O(N^3) multiplications overall.
Strassen's matrix multiplication computes the seven products
P1 = (A11 + A22) * (B11 + B22)
P2 = (A21 + A22) * B11
P3 = A11 * (B12 - B22)
P4 = A22 * (B21 - B11)
P5 = (A11 + A12) * B22
P6 = (A21 - A11) * (B11 + B12)
P7 = (A12 - A22) * (B21 + B22)
and then combines them as
C11 = P1 + P4 - P5 + P7
C12 = P3 + P5
C21 = P2 + P4
C22 = P1 + P3 - P2 + P6
Time analysis: with only 7 block multiplications per step, T(N) = 7*T(N/2) + O(N^2) = O(N^log2 7) ≈ O(N^2.807).
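A minimal C sketch of the 2x2 Strassen step, directly transcribing the seven products and four combinations above (the function name and the scalar-entry assumption are illustrative):

/* Strassen's 2x2 multiplication on scalar entries: 7 multiplications instead of 8. */
void strassen2x2(const double a[2][2], const double b[2][2], double c[2][2])
{
    double p1 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1]);
    double p2 = (a[1][0] + a[1][1]) * b[0][0];
    double p3 = a[0][0] * (b[0][1] - b[1][1]);
    double p4 = a[1][1] * (b[1][0] - b[0][0]);
    double p5 = (a[0][0] + a[0][1]) * b[1][1];
    double p6 = (a[1][0] - a[0][0]) * (b[0][0] + b[0][1]);
    double p7 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1]);

    c[0][0] = p1 + p4 - p5 + p7;   /* C11 */
    c[0][1] = p3 + p5;             /* C12 */
    c[1][0] = p2 + p4;             /* C21 */
    c[1][1] = p1 + p3 - p2 + p6;   /* C22 */
}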
Proposed approach 1: Matrix multiplication using recursion to divide the problem into sub-problems, solving the terminating condition of the recursion with Strassen's matrix multiplication.
The steps involved in this approach are:
Step 1. Divide the matrix into 4 parts using the divide-and-conquer technique.
Step 2. Multiply each part using recursion.
Step 3. Solve the terminating condition of the recursion using Strassen's matrix multiplication.
Divide-and-conquer is a general algorithm design paradigm:
Divide: divide the input data S into two or more disjoint subsets S1, S2, ...
Recur: solve the subproblems recursively
Conquer: combine the solutions for S1, S2, ... into a solution for S
The base cases of the recursion are subproblems of constant size; the analysis can be done using recurrence equations.
ALGORITHM Matmul(A[n][n], B[n][n], C[n][n])
//The algorithm implements matrix multiplication by dividing each matrix into sub-matrices A0, A1, A2, A3, ... and B0, B1, B2, B3, ... and multiplying them recursively.
//Input: two matrices A and B of dimension n
//Output: a matrix C of dimension n
if (n == 2)
    multiply the two input matrices using Strassen's method
else
    divide the matrices into sub-matrices of dimension n/2, use the blocked matrix-multiply equations, and multiply the sub-matrices recursively (a C sketch follows below).
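A minimal C sketch of this recursion, assuming n is a power of two (n >= 2) and row-major storage; the stride-based block addressing and the accumulating convention C += A*B are choices made for this sketch, not part of the original pseudocode.

#include <stdio.h>
#include <string.h>

/* A block's element (i, j) lives at block[i*ld + j]; ld is the stride
   (leading dimension) of the full matrix that the block belongs to. */

/* Base case: 2x2 block product via Strassen's 7 multiplications, accumulated into C. */
static void strassen2x2_acc(const double *A, const double *B, double *C, int ld)
{
    double p1 = (A[0] + A[ld+1]) * (B[0] + B[ld+1]);
    double p2 = (A[ld] + A[ld+1]) * B[0];
    double p3 = A[0] * (B[1] - B[ld+1]);
    double p4 = A[ld+1] * (B[ld] - B[0]);
    double p5 = (A[0] + A[1]) * B[ld+1];
    double p6 = (A[ld] - A[0]) * (B[0] + B[1]);
    double p7 = (A[1] - A[ld+1]) * (B[ld] + B[ld+1]);

    C[0]    += p1 + p4 - p5 + p7;
    C[1]    += p3 + p5;
    C[ld]   += p2 + p4;
    C[ld+1] += p1 + p3 - p2 + p6;
}

/* Recursive blocked multiplication, n a power of two, n >= 2: C += A*B. */
static void matmul_rec(int n, const double *A, const double *B, double *C, int ld)
{
    if (n == 2) {
        strassen2x2_acc(A, B, C, ld);
        return;
    }
    int h = n / 2;
    /* offsets of the four n/2 x n/2 sub-blocks inside a matrix of stride ld */
    int off[2][2] = { {0, h}, {h * ld, h * ld + h} };

    /* blocked equations: C[i][j] += A[i][k] * B[k][j] for k = 0, 1 */
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                matmul_rec(h, A + off[i][k], B + off[k][j], C + off[i][j], ld);
}

int main(void)
{
    enum { N = 4 };
    double A[N*N], B[N*N], C[N*N];
    for (int i = 0; i < N*N; ++i) { A[i] = i + 1; B[i] = (i % 3) + 1; }
    memset(C, 0, sizeof C);

    matmul_rec(N, A, B, C, N);

    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j)
            printf("%6g ", C[i*N + j]);
        printf("\n");
    }
    return 0;
}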
Consider the multiplication of 4x4 matrices:
A11 A12 A13 A14     B11 B12 B13 B14     C11 C12 C13 C14
A21 A22 A23 A24  x  B21 B22 B23 B24  =  C21 C22 C23 C24
A31 A32 A33 A34     B31 B32 B33 B34     C31 C32 C33 C34
A41 A42 A43 A44     B41 B42 B43 B44     C41 C42 C43 C44
Each of these can be viewed as a 2x2 matrix of 2x2 blocks:
A11 A12     B11 B12     C11 C12
A21 A22     B21 B22     C21 C22
The sub-matrices C11, C12, C21, C22 can then be computed as follows:
C11 = P1 + P4 - P5 + P7
C12 = P3 + P5
C21 = P2 + P4
C22 = P1 + P3 - P2 + P6
where
P1 = (A11 + A22) * (B11 + B22)
P2 = (A21 + A22) * B11
P3 = A11 * (B12 - B22)
P4 = A22 * (B21 - B11)
P5 = (A11 + A12) * B22
P6 = (A21 - A11) * (B11 + B12)
P7 = (A12 - A22) * (B21 + B22)
In a similar way we can calculate all the block multiplications, and the same steps can be performed for any matrix dimension.
Results (number of multiplications):
Input size (n) | Sequential algo. | Implemented algo. | Strassen's
2              | 8                | 7                 | 7
4              | 64               | 56                | 49
8              | 512              | 448               | 343
16             | 4096             | 3584              | 2401
...            | ...              | ...               | ...
p              | p^3              | (7/8)*p^3         | p^2.807
Applying the above algorithm when the size is not a power of 2. The steps required are as follows:
Step 1) Pad the size up to the nearest power of 2
Step 2) Set the values of all extra rows to zero
Step 3) Set the values of all extra columns to zero
Step 4) Then apply the above algorithm
Suppose the dimension of the given matrix is 3. The nearest power of 2 is 4, hence the matrix becomes
A11 A12 A13 0
A21 A22 A23 0
A31 A32 A33 0
0   0   0   0
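A minimal sketch of this padding step, copying an n x n matrix into a zero-initialized m x m buffer where m is the next power of two; the helper names are illustrative.

#include <stdlib.h>
#include <string.h>

/* Smallest power of two that is >= n (and at least 2). */
static int next_pow2(int n)
{
    int m = 2;
    while (m < n)
        m *= 2;
    return m;
}

/* Copy an n x n row-major matrix into a zero-padded m x m buffer.
   calloc() makes the extra rows and columns zero; the caller frees the result. */
static double *pad_to_pow2(const double *A, int n, int *m_out)
{
    int m = next_pow2(n);
    double *P = calloc((size_t)m * m, sizeof *P);
    if (P == NULL)
        return NULL;
    for (int i = 0; i < n; ++i)
        memcpy(&P[i * m], &A[i * n], n * sizeof *P);
    *m_out = m;
    return P;
}

int main(void)
{
    double A[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};  /* a 3x3 example */
    int m;
    double *P = pad_to_pow2(A, 3, &m);          /* m becomes 4, P is 4x4 */
    free(P);
    return 0;
}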
Efficiency calculation (time analysis): If the dimension is not a power of 2, first pad it to the nearest power of 2 and then proceed.
T(2) = 7 (from the base condition, Strassen's multiplication)
T(N) = (7/8) * N^3
Hence the total number of multiplications required is (7 * n^3)/8, which is less than n^3.
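This count follows from the recurrence for the blocked scheme with a Strassen base case; a short reconstruction of the arithmetic behind the table above:

% Multiplication count M(n): 8 recursive block products per level,
% with the 2x2 base case done by Strassen in 7 multiplications.
\begin{align*}
M(2) &= 7, \\
M(n) &= 8\,M\!\left(\tfrac{n}{2}\right), \qquad n = 2^k,\; k \ge 2, \\
M(n) &= 7\cdot 8^{\,k-1} = \frac{7}{8}\cdot 8^{k} = \frac{7}{8}\,(2^{k})^{3} = \frac{7}{8}\,n^{3} \;<\; n^{3}.
\end{align*}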
Proposed approach 2: Matrix multiplication by creating a process for each computation. The matrix can be divided into a number of parts, and each part can be multiplied by creating a separate process. A new process is created with the fork() system call.
The different steps involved in this approach are:
Step 1: Partitioning
a) Divide the matrices into rows
b) Each primitive task has the corresponding rows of the three matrices
Step 2: Communication
a) Each task must eventually see every row of B
b) Organize the tasks into a ring
Step 3: Agglomeration and mapping
a) Fixed number of tasks, each requiring the same amount of computation
b) Regular communication among tasks
c) Strategy: assign each process a contiguous group of rows
Examples: In the first case, let the total number of processes created equal the total number of rows in the matrix, i.e. number of processes = n, where n is the dimension of the matrix. For a 4x4 matrix each task owns one row of A:
Task1: A11 A12 A13 A14
Task2: A21 A22 A23 A24
Task3: A31 A32 A33 A34
Task4: A41 A42 A43 A44
In the second case, let the total number of processes created be n/2, so that each task (Task1, Task2, Task3, Task4, ...) is assigned a contiguous group of two rows.
Process creation
In a parallel program the main program becomes the first process. The main program consists of general statements such as assignments, loops, conditionals and I/O statements, plus a special statement: the process creation statement.
FORALL is a parallel process creation statement: a parallel form of a FOR loop in which all the loop iterations are executed in parallel rather than sequentially, each iteration (block) being executed in a separate process.
Process tree: the parent process forks child process 1, child process 2, child process 3, child process 4, ..., child process p (one fork per child).
Parent process (pseudocode):
BEGIN
    ...
    FORALL i := 1 TO p DO
        pid = fork()
        if (pid == 0)
            call the child process
END.

Each child process d (d = 0, 1, ..., p-1) computes its own share of rows:

for (i = d; i < n; i += p)
    for (j = 0; j < n; ++j)
        for (k = 0; k < n; ++k)
            c[i][j] += a[i][k] * b[k][j];
Prototype for process creation:
#include <sys/types.h>
#include <unistd.h>
pid_t fork(void);
Returns: 0 in the child, the process ID of the child in the parent, -1 on error.

Prototype for process control:
#include <sys/wait.h>
pid_t wait(int *status_p);
pid_t waitpid(pid_t child_pid, int *status_p, int options);
Returns: the process ID if OK, 0 (for some variations of waitpid), -1 on error.
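Putting the pieces together, a minimal sketch of approach 2 on a typical Unix system. The result matrix is placed in anonymous shared memory via mmap() so that the parent can see what the children compute (plain fork() gives each child a private copy of ordinary variables); each child d handles rows d, d+P, d+2P, ..., and the parent waits for all children. The shared-memory detail and the fixed sizes N and P are assumptions of this sketch, not taken from the slides.

#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 4   /* matrix dimension (illustrative) */
#define P 2   /* number of child processes (illustrative) */

int main(void)
{
    double a[N][N], b[N][N];

    /* Result matrix in anonymous shared memory so the children's writes
       remain visible to the parent after fork(). */
    double (*c)[N] = mmap(NULL, sizeof(double[N][N]), PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (c == MAP_FAILED) { perror("mmap"); return 1; }

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            a[i][j] = i + j;
            b[i][j] = (i == j);   /* identity matrix, so c should equal a */
            c[i][j] = 0.0;
        }

    for (int d = 0; d < P; ++d) {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }
        if (pid == 0) {
            /* Child d: rows d, d+P, d+2P, ... -- the row sets are disjoint,
               so no further synchronization is needed. */
            for (int i = d; i < N; i += P)
                for (int j = 0; j < N; ++j)
                    for (int k = 0; k < N; ++k)
                        c[i][j] += a[i][k] * b[k][j];
            _exit(0);
        }
    }

    for (int d = 0; d < P; ++d)   /* parent: wait for all P children */
        wait(NULL);

    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j)
            printf("%6g ", c[i][j]);
        printf("\n");
    }
    munmap(c, sizeof(double[N][N]));
    return 0;
}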
Efficiency: In the first case we create a new process to multiply one row of the 1st matrix with all columns of the 2nd matrix, and the same operation is performed for every row of the 1st matrix; the row computations therefore run concurrently, and efficiency improves accordingly.
In the second case we create a new process to multiply two rows of the 1st matrix with all columns of the 2nd matrix, one after the other, and the same operation is performed for every group of two rows; here too the computation is concurrent and efficiency improves.
CONCLUSION
We know that Strassen's matrix multiplication has a much higher efficiency than this approach, but implementing Strassen's is quite difficult. There are several advantages to implementing the divide-and-conquer method for matrix multiplication:
1) Its efficiency class lies between sequential matrix multiplication and Strassen's matrix multiplication.
2) We can use this method to multiply two matrices of any size, whereas with Strassen's we can multiply only matrices whose size is a power of 2.
3) It is quite simple and easier to understand.
There are several advantages to implementing matrix multiplication by creating a process for each calculation:
1) Creating a process for each calculation is a simple way to implement concurrent programming.
2) It may be much faster than the other algorithms.
3) Data access is partitioned in a controlled fashion (each process writes its own rows), so there is no need to apply a concurrency-control algorithm for the data.
4) Dividing the problem into sub-problems of smaller size and assigning them to different processes is also simple.
5) This algorithm performs with equal efficiency for all dimensions of the matrix.
LIMITATIONS
1) The implementation of matrix multiplication by divide and conquer is applicable only to square matrices, i.e. it can multiply (n*n) matrices; it is not possible to multiply two matrices of dimension (m*n).
2) The implementation of matrix multiplication by process creation requires an algorithm that controls the processes.

FURTHER ENHANCEMENTS
1) Make both implementations applicable to matrices of dimension (m*n).
2) Develop a process-control algorithm.
3) Make the multiplication faster than implemented above.
THANK YOU Presented By –PRAMIT KUMAR
