PDC Experiments

The document is a lab file from VIT Bhopal University detailing various parallel and distributed computing techniques using OpenMP and MPI. It includes code examples for vector addition, dot product, loop work-sharing, matrix multiplication, and various MPI operations such as communication, collective operations, and non-blocking communication. The lab file is submitted by Rishabh Suri and outlines the structure and content of the experiments conducted.


VIT BHOPAL UNIVERSITY

SUBJECT:- PARALLEL AND DISTRIBUTED COMPUTING

LAB FILE

SUBMITTED BY:-
RISHABH SURI
22BCE11028

SUBMITTED TO:-
DR. KANNAIYA RAJA N

INDEX

1. OpenMP – Basic programs such as vector addition, dot product
2. OpenMP – Loop work-sharing and sections work-sharing
3. OpenMP – Combined parallel loop reduction and orphaned parallel loop reduction
4. OpenMP – Matrix multiply (specify run on a GPU card, large-scale data … complexity of the problem needs to be specified)
5. MPI – Basics of MPI
6. MPI – Communication between MPI processes
7. MPI – Collective operation with "synchronization"
8. MPI – Collective operation with "data movement"
9. MPI – Collective operation with "collective computation"
10. MPI – Non-blocking operation
1) Basic programs such as Vector addition, Dot Product

This program simulates OpenMP-style vector addition and dot product using Python threading. Two
equal-length vectors are initialized with random values, and the work is split into chunks that are
processed in parallel using ThreadPoolExecutor. Each thread operates on the corresponding slices of
both vectors, adding them for the vector sum and multiplying-and-accumulating them for the dot
product. The partial results are combined into the final sum vector and the scalar dot product, which
are printed after all tasks are complete.

Code :-
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Vector size
N = 1000000
A = np.random.rand(N)
B = np.random.rand(N)

# 1. Vector Addition
def vector_addition_chunk(start, end):
    return A[start:end] + B[start:end]

# 2. Dot Product
def dot_product_chunk(start, end):
    return np.dot(A[start:end], B[start:end])

# Parallel Executor
def parallel_process(function, chunks):
    results = []
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(function, start, end) for start, end in chunks]
        for future in futures:
            results.append(future.result())
    return results

# Split work into 4 chunks (simulating threads)
num_chunks = 4
chunk_size = N // num_chunks
chunks = [(i * chunk_size, (i + 1) * chunk_size) for i in range(num_chunks)]

# Run vector addition
add_results = parallel_process(vector_addition_chunk, chunks)
vector_sum = np.concatenate(add_results)

# Run dot product
dot_results = parallel_process(dot_product_chunk, chunks)
dot_product_result = sum(dot_results)

print("Vector addition sample (first 5):", vector_sum[:5])
print("Dot product result:", dot_product_result)

2) Loop work-sharing and sections work-sharing

In this program, we demonstrate OpenMP's two work-sharing styles using threads. For loop
work-sharing, a large loop that sums squares is split into index ranges, and each range is processed
by a separate thread, mimicking a parallel for. For sections work-sharing, two independent tasks run
concurrently, mimicking OpenMP sections, so both finish in roughly the time of the slower task.
Threading simulates OpenMP's work-sharing by dividing the work among workers instead of running it
sequentially.

Code:-
import time
from concurrent.futures import ThreadPoolExecutor

# Simulate heavy loop task
def heavy_loop(start, end):
    print(f"Loop from {start} to {end}")
    result = 0
    for i in range(start, end):
        result += i * i
    return result

# Section 1: Task 1
def task1():
    print("Task 1 executing...")
    time.sleep(1)
    return "Task 1 done"

# Section 2: Task 2
def task2():
    print("Task 2 executing...")
    time.sleep(1.5)
    return "Task 2 done"

# Loop work-sharing
def loop_work_sharing():
    total = 10000
    chunks = 4
    chunk_size = total // chunks
    ranges = [(i * chunk_size, (i + 1) * chunk_size) for i in range(chunks)]

    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(heavy_loop, start, end) for start, end in ranges]
        results = [f.result() for f in futures]

    print("Loop work-sharing result:", sum(results))

# Section work-sharing
def section_work_sharing():
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(task1), executor.submit(task2)]
        results = [f.result() for f in futures]

    print("Section work-sharing results:", results)

# Run both
loop_work_sharing()
section_work_sharing()

Output
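As a small extension (a sketch, not part of the original lab code; the section functions here are illustrative), OpenMP sections do not have to be collected in submission order. concurrent.futures.as_completed reports each section as soon as it finishes, so the shorter task is printed first:

import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def short_section():
    time.sleep(1)
    return "Short section done"

def long_section():
    time.sleep(1.5)
    return "Long section done"

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(short_section), executor.submit(long_section)]
    # Results arrive in completion order, not submission order.
    for future in as_completed(futures):
        print(future.result())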

3) Combined parallel loop reduction and Orphaned parallel loop reduction

This experiment demonstrates parallel loop reduction, where multiple threads sum parts of an array.
The array is divided among threads, each computing a partial sum; in the orphaned variant, the
reduction loop lives in a separate function that is invoked from the parallel context, mirroring
OpenMP's orphaned directives. The final result is obtained by reducing all thread results. This mimics
OpenMP's parallel for with a reduction clause and enhances performance for large-scale additions.

Code:-
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Parallel loop reduction (summing values in parallel)
def partial_sum(start, end, array):
    return np.sum(array[start:end])

# Orphaned loop (function called in parallel context)
def orphaned_loop(start, end, array):
    return np.sum([x * x for x in array[start:end]])

def parallel_reduction(array, func):
    num_threads = 4
    N = len(array)
    chunk_size = N // num_threads
    ranges = [(i * chunk_size, (i + 1) * chunk_size) for i in range(num_threads)]

    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(func, start, end, array) for start, end in ranges]
        results = [f.result() for f in futures]
    return sum(results)

# Main
A = np.random.randint(1, 10, size=10000)

# Combined parallel reduction
sum1 = parallel_reduction(A, partial_sum)
print("Combined Parallel Reduction (sum):", sum1)

# Orphaned loop reduction
sum2 = parallel_reduction(A, orphaned_loop)
print("Orphaned Loop Reduction (sum of squares):", sum2)

Output
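Because CPython threads share the Global Interpreter Lock, the pure-Python loop in orphaned_loop gains little real speedup from threads. A process-based variant (a sketch using the standard multiprocessing module; partial_sum_range is an illustrative helper, not part of the lab code) is closer in spirit to OpenMP's parallel for reduction, since each chunk is reduced on a separate CPU core:

import numpy as np
from multiprocessing import Pool

def partial_sum_range(args):
    # Each worker process reduces its own slice of the array.
    start, end, array = args
    return int(np.sum(array[start:end]))

if __name__ == "__main__":
    A = np.random.randint(1, 10, size=10000)
    num_procs = 4
    chunk_size = len(A) // num_procs
    tasks = [(i * chunk_size, (i + 1) * chunk_size, A) for i in range(num_procs)]
    with Pool(processes=num_procs) as pool:
        partial_sums = pool.map(partial_sum_range, tasks)
    print("Process-based reduction (sum):", sum(partial_sums))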

4) Matrix multiply (specify run on a GPU card, large-scale data … complexity of the problem needs to be specified)

Matrix multiplication is implemented using Python threads to simulate parallelism. Each thread
computes one row of the resulting matrix as the dot product of the corresponding row of A with the
columns of B. The standard algorithm performs on the order of N³ multiply-add operations, which is
why parallelism (OpenMP threads or a GPU) pays off for large matrices. The execution time is measured
to observe the speedup. The result is the complete product matrix.

Code:-
import numpy as np
from concurrent.futures import ThreadPoolExecutor
import time

# Matrix dimensions
N = 500  # Use 1000 or more for actual "large scale", reduced here for speed
A = np.random.randint(0, 10, size=(N, N))
B = np.random.randint(0, 10, size=(N, N))

# Result matrix
C = np.zeros((N, N), dtype=int)

# Function to compute a row of the result matrix
def multiply_row(i):
    return np.dot(A[i], B)

# Start timing
start_time = time.time()

with ThreadPoolExecutor() as executor:
    results = list(executor.map(multiply_row, range(N)))

# Collect results
for i in range(N):
    C[i] = results[i]

# End timing
end_time = time.time()

print("Matrix multiplication completed.")
print("Time taken:", round(end_time - start_time, 4), "seconds")
print("Sample result (top-left 3x3):\n", C[:3, :3])

Output:-
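Since the heading calls for a GPU run, the same multiplication could be offloaded with the CuPy library, which mirrors the NumPy API. This is only a sketch under the assumption that an NVIDIA GPU with CUDA and the cupy package are available; it is not part of the original lab code. The work is still the standard O(N³) multiply-add count, simply executed on the GPU:

import time
import cupy as cp  # assumes an NVIDIA GPU with CUDA and the cupy package

N = 500
A_gpu = cp.random.randint(0, 10, size=(N, N))
B_gpu = cp.random.randint(0, 10, size=(N, N))

start_time = time.time()
C_gpu = cp.matmul(A_gpu, B_gpu)  # O(N^3) work executed on the GPU
C = cp.asnumpy(C_gpu)            # copy back to host memory (synchronizes)
end_time = time.time()

print("GPU time taken:", round(end_time - start_time, 4), "seconds")
print("Sample result (top-left 3x3):\n", C[:3, :3])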

5) Basics of MPI

This program introduces MPI concepts by simulating multiple processes. Each simulated process prints
its rank and the total number of processes. It mimics the basic MPI functions like MPI_Init,
MPI_Comm_rank, and MPI_Comm_size. No communication happens, but process identity is demonstrated.
It's useful for understanding how MPI programs start.

Code:-
def mpi_basics_simulation(num_processes):
    for rank in range(num_processes):
        print(f"Hello from process {rank} of {num_processes}")

# Simulate 4 processes
mpi_basics_simulation(4)

Output:-
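For comparison, the same program written with a real MPI binding (mpi4py, which is not used in this lab and must be installed together with an MPI runtime) is launched as several actual processes, for example with mpiexec -n 4 python mpi_basics.py:

from mpi4py import MPI  # importing mpi4py initializes MPI (MPI_Init)

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # corresponds to MPI_Comm_rank
size = comm.Get_size()  # corresponds to MPI_Comm_size
print(f"Hello from process {rank} of {size}")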

6) Communication between MPI processes

A simple message passing between two processes is simulated here. Process 0 sends a
message, and Process 1 receives and prints it. The communication logic mimics
MPI_Send and MPI_Recv. Though actual MPI libraries aren’t used, it conceptually
reflects inter-process data transfer. It helps understand how basic communication
occurs.

Code :-
def mpi_communication_simulation():
    process_0_message = "Hello from Process 0"

    # Simulated sending from Process 0
    print("Process 0 sent message.")

    # Simulated receiving in Process 1
    print("Process 1 received message:", process_0_message)

mpi_communication_simulation()

Output:-
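The equivalent point-to-point exchange with mpi4py (a sketch under the same assumption that mpi4py and an MPI runtime are installed, and at least two processes are launched) uses comm.send and comm.recv, the Python counterparts of MPI_Send and MPI_Recv:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send("Hello from Process 0", dest=1, tag=0)  # MPI_Send
    print("Process 0 sent message.")
elif rank == 1:
    message = comm.recv(source=0, tag=0)  # MPI_Recv (blocks until the message arrives)
    print("Process 1 received message:", message)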

7) Collective operation with "synchronization"

In this simulation, all processes compute a local value and wait at a barrier. Once all are
synchronized, they proceed to compute a total sum collectively. It simulates MPI_Barrier
and collective synchronization behavior. The result shows every process knows the total
after sync. This helps visualize how coordination works in MPI.

Code:-
import time

def mpi_collective_sync_simulation(num_processes):
    values = [rank + 1 for rank in range(num_processes)]
    total = sum(values)

    # Simulated barrier (waiting)
    print("Synchronizing processes...")
    time.sleep(1)

    print(f"Total sum from all processes is: {total}")
    for rank in range(num_processes):
        print(f"Process {rank} passed the barrier.")

# Simulate 4 processes
mpi_collective_sync_simulation(4)

Output:-
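With mpi4py (again a sketch, assuming the package and an MPI runtime are available), the barrier and the collective sum are expressed with comm.Barrier and comm.reduce:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local_value = rank + 1
comm.Barrier()  # MPI_Barrier: no process continues until all have reached this point
total = comm.reduce(local_value, op=MPI.SUM, root=0)  # MPI_Reduce to rank 0
print(f"Process {rank} passed the barrier.")
if rank == 0:
    print(f"Total sum from all processes is: {total}")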

8) Collective operation with "data movement"

This program simulates the scatter and gather operations of MPI. The root process
sends parts of a data list to each process (scatter). After local processing, each
modified value is sent back (gather). This shows how data is distributed and collected
collectively. It mimics MPI_Scatter and MPI_Gather.

Code:-
def mpi_data_movement_simulation():
    num_processes = 4
    data = [i * 10 for i in range(num_processes)]  # Simulate root process data

    print("Original data in root process:", data)

    # Simulate scatter
    scattered = [data[i] for i in range(num_processes)]
    for rank in range(num_processes):
        print(f"Process {rank} received data: {scattered[rank]}")

    # Each process modifies data (e.g., add 1)
    modified = [x + 1 for x in scattered]

    # Simulate gather
    gathered = modified
    print("Gathered data back at root process:", gathered)

mpi_data_movement_simulation()

Output :-
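The real data-movement collectives in mpi4py (a sketch under the same assumptions as the earlier mpi4py examples) are comm.scatter and comm.gather:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

data = [i * 10 for i in range(size)] if rank == 0 else None
local = comm.scatter(data, root=0)     # MPI_Scatter: one element per process
local = local + 1                      # local processing on each process
gathered = comm.gather(local, root=0)  # MPI_Gather: collect results at the root
if rank == 0:
    print("Gathered data back at root process:", gathered)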

9) Collective operation with "collective computation"

Here, each process starts with a local value and participates in an all-reduce operation.
All values are summed up and the result is shared with every process. It mimics
MPI_Allreduce which combines computation with communication. The final sum is
visible to all processes. It highlights the power of collective operations.

Code:-
def mpi_collective_computation_simulation():
    num_processes = 4
    local_values = [rank + 1 for rank in range(num_processes)]  # Each process has a value

    print("Local values of each process:", local_values)

    # Simulate AllReduce sum operation
    total = sum(local_values)
    for rank in range(num_processes):
        print(f"Process {rank} knows total sum is: {total}")

mpi_collective_computation_simulation()

Output:-
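The corresponding mpi4py call (a sketch, not part of the lab code) is comm.allreduce, which both computes the sum and distributes it to every process:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local_value = rank + 1
total = comm.allreduce(local_value, op=MPI.SUM)  # MPI_Allreduce
print(f"Process {rank} knows total sum is: {total}")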

10) Non-blocking operation

This simulation shows non-blocking communication using threads. A message is sent in a separate
thread, while the main process continues working. After finishing its own work, it waits for the
message to complete and then processes it. This reflects MPI_Isend and MPI_Irecv. It allows
overlapping communication with computation.

Code:-
import threading
import time

def simulate_non_blocking_send(received_data):
    print("Sending data (non-blocking)...")
    time.sleep(2)  # Simulate delay
    received_data.append("Hello from Process 0")

def mpi_non_blocking_simulation():
    received_data = []

    # Start non-blocking send using a thread
    sender_thread = threading.Thread(target=simulate_non_blocking_send,
                                     args=(received_data,))
    sender_thread.start()

    # Meanwhile, Process 1 continues doing something else
    for i in range(3):
        print(f"Process 1 is working... step {i+1}")
        time.sleep(1)

    sender_thread.join()  # Wait for send to complete
    print("Process 1 received data:", received_data[0])

mpi_non_blocking_simulation()

Output:-
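With mpi4py (a sketch assuming the package is installed and at least two processes are launched), the non-blocking calls comm.isend and comm.irecv return request objects, and request.wait() plays the role of sender_thread.join() above:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    request = comm.isend("Hello from Process 0", dest=1, tag=0)  # MPI_Isend
    request.wait()                                               # MPI_Wait
elif rank == 1:
    request = comm.irecv(source=0, tag=0)                        # MPI_Irecv
    for i in range(3):
        print(f"Process 1 is working... step {i + 1}")           # overlap computation
    message = request.wait()                                     # complete the receive
    print("Process 1 received data:", message)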
