Conc Ass 1

This document outlines three questions for a concurrent programming assignment. Students will work in groups of four. Q1 asks students to implement a matrix multiplication algorithm in parallel using OpenMP, Intel TBB, and Cilk++. They must analyze performance with different matrix sizes and block sizes. Optional extension to GPU programming is suggested. Q2 requires studying and analyzing the bitonic sorting algorithm using divide-and-conquer and implementing it using Intel TBB's task-based programming model. Q3 involves designing programs to sort lines of numbers from a file sequentially, using a bitonic sorter implemented as in Q2, and using Intel TBB's pipeline pattern. Performance of each implementation should be analyzed on

Uploaded by

CSEBaba
Copyright
© Attribution Non-Commercial (BY-NC)

Concurrent Programming*

* This assignment will have about 4 questions, which will be added during the course. Students
may work on this assignment in groups of 4. The final submission date will be announced later.

Q1.) Write a well-optimized parallel algorithm with cache blocking for the following matrix
operation using OpenMP, Intel Threading Building Blocks and Cilk++.
A = B*C + B*D’

All matrices have the same dimension (n x n). You must provide a performance analysis for different
matrix dimensions and block sizes.
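As a starting point for Q1, here is a minimal C++ sketch of a cache-blocked kernel. It assumes that D’ denotes the transpose of D, and it uses the identity B*C + B*D’ = B*(C + D’) so that only one matrix product is computed. The names Matrix and blocked_multiply_add, and the row-major layout, are illustrative choices and not part of the assignment. The OpenMP pragma distributes row tiles across threads; without OpenMP enabled the code runs sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // hypothetical type: row-major n x n storage

// Computes A = B*C + B*D^T as A = B*(C + D^T), using square tiles of size bs.
// Assumption: D' in the assignment means the transpose of D.
Matrix blocked_multiply_add(const Matrix& B, const Matrix& C, const Matrix& D,
                            std::size_t n, std::size_t bs) {
    assert(bs > 0 && B.size() == n * n && C.size() == n * n && D.size() == n * n);

    // Step 1: E = C + D^T, so only one matrix product is needed.
    Matrix E(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            E[i * n + j] = C[i * n + j] + D[j * n + i];

    // Step 2: A = B * E, processed tile by tile to reuse data while it is cached.
    Matrix A(n * n, 0.0);
    const long tiles = static_cast<long>((n + bs - 1) / bs);

    // Each iteration owns a distinct band of rows of A, so threads never
    // write to the same element.
    #pragma omp parallel for
    for (long t = 0; t < tiles; ++t) {
        const std::size_t ii = static_cast<std::size_t>(t) * bs;
        const std::size_t i_end = std::min(ii + bs, n);
        for (std::size_t kk = 0; kk < n; kk += bs) {
            const std::size_t k_end = std::min(kk + bs, n);
            for (std::size_t jj = 0; jj < n; jj += bs) {
                const std::size_t j_end = std::min(jj + bs, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double b = B[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j)
                            A[i * n + j] += b * E[k * n + j];
                    }
            }
        }
    }
    return A;
}
```

For the performance analysis, the same function can be timed over a grid of n and bs values; the best block size will likely depend on the cache sizes of the machine under test.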

Optional (will not be evaluated, but highly encouraged)

Try the above implementation on a GPU with CUDA/OpenCL programming. Learn about the different memory
structures in the GPU (texture) and warp methods, and test them with your implementation.
(http://developer.nvidia.com/object/cuda_3_2_downloads.html)

Q2) Bitonic sort is a sorting-network algorithm that is efficient on multiprocessors. Study the bitonic sort
algorithm (http://en.wikipedia.org/wiki/Bitonic_sorter) and analyze it using the divide and conquer
pattern. Use the fork-join pattern to implement the algorithm. Use Intel Threading Building Blocks and its
Task-Based Programming model.
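The divide and conquer structure of Q2 can be sketched in plain C++ as follows. This is a sequential sketch only, not a TBB implementation: the comments mark the pairs of independent recursive calls where a fork-join runtime could spawn tasks (for example with tbb::parallel_invoke or tbb::task_group). The function names are illustrative, and the sketch assumes the input length is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Merges the bitonic sequence a[lo, lo + n) into ascending order (up == true)
// or descending order (up == false).
void bitonic_merge(std::vector<int>& a, std::size_t lo, std::size_t n, bool up) {
    if (n < 2) return;
    const std::size_t half = n / 2;
    // Compare-exchange step: every pair (i, i + half) is independent.
    for (std::size_t i = lo; i < lo + half; ++i)
        if ((a[i] > a[i + half]) == up) std::swap(a[i], a[i + half]);
    // Fork point: the two halves touch disjoint ranges and could run as tasks.
    bitonic_merge(a, lo, half, up);
    bitonic_merge(a, lo + half, half, up);
}

// Sorts a[lo, lo + n) by building a bitonic sequence and then merging it.
void bitonic_sort(std::vector<int>& a, std::size_t lo, std::size_t n, bool up) {
    if (n < 2) return;
    const std::size_t half = n / 2;
    // Fork point: sort the first half ascending and the second descending;
    // the two calls are independent.
    bitonic_sort(a, lo, half, true);
    bitonic_sort(a, lo + half, half, false);
    // Join point: both halves must be finished before the merge starts.
    bitonic_merge(a, lo, n, up);
}

// Convenience wrapper (hypothetical name): sorts the whole vector ascending.
void bitonic_sort_all(std::vector<int>& a) {
    const std::size_t n = a.size();
    assert((n & (n - 1)) == 0 && "length must be a power of two");
    bitonic_sort(a, 0, n, true);
}
```

In a task-based version it is usually worth adding a cutoff below which the recursion stays sequential, since very small subproblems tend to cost more to schedule than to sort.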

Q3) Assume you are given a file that contains lines of numbers separated by spaces. Each line consists
of 1000-2000 or more numbers, and there should be about 10000 or more such lines (you have to create
such a file). Design a program that reads each line from the file, sorts it, and writes the sorted
lines to another file. For the sorting you have to use a bitonic sorter. You are required to:

A.) Implement the program for sequential processing.
B.) Implement the program using the algorithm you developed in Q2) above.
C.) Implement the program using the pipeline pattern with Intel Threading Building Blocks. Design
appropriate pipeline stages for the best performance gains. Apply cache optimization if possible.
Experiment with pipeline stages for the bitonic sort itself.
D.) Show the performance of each implementation. Test your program on different multicore
machines.
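One practical detail for Q3 is that the classic bitonic network needs a power-of-two input length, while the lines hold 1000-2000 or more numbers. The sketch below shows one possible way to handle a single line in plain C++: parse it, pad it with a sentinel value, sort it with an iterative bitonic network, and drop the padding. The function names, the use of int values, and the choice of sentinel are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Iterative bitonic sorting network, ascending. Requires a power-of-two length.
void bitonic_sort_network(std::vector<int>& a) {
    const std::size_t n = a.size();
    assert((n & (n - 1)) == 0 && "length must be a power of two");
    for (std::size_t k = 2; k <= n; k <<= 1)          // size of the bitonic blocks
        for (std::size_t j = k >> 1; j > 0; j >>= 1)  // compare distance
            // All compare-exchanges for a fixed (k, j) are independent.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    const bool up = (i & k) == 0;
                    if ((a[i] > a[partner]) == up) std::swap(a[i], a[partner]);
                }
            }
}

// Parses one line of space-separated integers and returns it sorted ascending.
std::string sort_line(const std::string& line) {
    std::istringstream in(line);
    std::vector<int> values;
    for (int x; in >> x;) values.push_back(x);

    // Pad to the next power of two with the largest int, so that the padding
    // ends up at the back after an ascending sort and can be cut off again.
    const std::size_t count = values.size();
    std::size_t padded = 1;
    while (padded < count) padded <<= 1;
    values.resize(padded, std::numeric_limits<int>::max());

    bitonic_sort_network(values);
    values.resize(count);

    std::ostringstream out;
    for (std::size_t i = 0; i < count; ++i) {
        if (i > 0) out << ' ';
        out << values[i];
    }
    return out.str();
}
```

A sequential driver for part A could call sort_line on each line returned by std::getline and write the result to the output file. For part C, reading a line, sorting it, and writing it are natural candidates for separate stages, for example as filters passed to tbb::parallel_pipeline.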
