Programming Assignments: A1 - SystemC and OpenMP

The document outlines the programming assignments for CS701 High Performance Computing. It includes two assignments - one focused on SystemC and OpenMP (A1) and another focused on CUDA and OpenCL (A2). For A1, students are asked to implement designs including a full adder, single register, and basic interconnection network using SystemC. They also must complete OpenMP programs including printing thread information, summing arrays in parallel, and matrix multiplication. A2 focuses on parallel programming with CUDA and OpenCL. Students must implement matrix multiplication and SAXPY operations on both platforms, and the OpenCL program should print host and device environment details.

Uploaded by

Himanshu Patel

CS701 High Performance Computing

Programming Assignments
A1 - SystemC and OpenMP
SystemC Programming
Points to Note: (a) Soft Deadline: 00:00 AM, August 10. Hard Deadline: 00:00 AM, August 12. Submissions are to be done
through email only. Pack your report, code, screenshots, and other files in an archive and mail it to [email protected].
(b) Bonus marks for creative problem solving. (c) All are team assignments. No more than two students in a
team. One submission per team.
Submission guidelines: (a) Assignment report: the answer to each question will typically contain block diagrams/microarchitecture, a brief explanation, and other relevant info. (b) Auxiliary files to submit: per question,
include one or more of the following files along with the report wherever valid: SystemC code of the design
and the testbench, execution screenshots, VCD dumps, gtkwave screenshots.
1. Full Adder. Implement a combinational full adder (FA).
2. Single Register. Implement an 8-bit register inside a Register Block. The register block takes three inputs:
(a) a read bit, (b) a write bit, and (c) 8-bit write data. It has one output: 8-bit read data. The register block
works as follows. At the positive edge of the clock:
If the read input is ON, output the value from the register.
If the write input is ON, write the value from write data into the register.
If both read and write are ON, the read precedes the write.
3. Basic interconnection network. Implement a two-node point-to-point interconnection network as shown in
the following figure.

Implement a version exhibiting the following behaviour: after random intervals, A sends one message to B, and B
responds with 4 replies. A prints the sent and received messages at the output.

OpenMP Programming
1. Hello World program. Fork multiple threads from the main process. All threads are assigned
individual identifiers by the main process. Each thread should identify itself and print a hello world
message. The master thread should print the environment information. Environment information includes the
total number of CPUs/cores available to OpenMP (use omp_get_num_procs()), the current thread ID in the parallel
region, the total number of threads available in this parallel region, and the total number of threads requested.
2. Sum of Two Arrays. Compute the element-wise sum of two large arrays A and B and populate array C
(C[i] = A[i] + B[i] in a loop). Portions of the arrays are computed in parallel across the team of threads.
3. Matrix Multiply. Implement a parallel multiplication of large matrices (100 x 100 or larger).
Threads can share row iterations evenly.

A2 - CUDA and OpenCL


Points to Note: (a) Soft Deadline: 00:00 AM, August 22. Hard Deadline: 00:00 AM, August 24. Submissions
are to be done through email only. Pack your report, code, screenshots, and other files in an archive and mail it to
[email protected]. (b) Bonus marks for creative problem solving. (c) All are team assignments. No more
than two students in a team. One submission per team.
CUDA and OpenCL can be used to distribute computational tasks between the CPU (the host) and the graphics
accelerator/GPU (the device). Program the following two problems on CUDA and OpenCL platforms. OpenCL
SDK from AMD is here: http://developer.amd.com/tools-and-sdks/.
1. Matrix Multiply. Write a parallel implementation of the multiplication of large matrices (100 x 100 or larger). Threads
can share row iterations evenly.
2. SAXPY program. SAXPY: S stands for Single precision, A is a scalar value, X and Y are one-dimensional
vectors, and P stands for Plus. The operation is Y[i] = a*X[i] + Y[i]. Write an OpenCL program to perform SAXPY on two
large vectors X and Y. The main program should print the environment details of the host and the device
before beginning computation. You may use functions such as clGetDeviceIDs() and clGetDeviceInfo() to get
the following info: number of platforms, number of devices, device type, number of compute units in the device, clock frequency,
address bits, memory size, and other parameters of your interest.
