Parallel and Distributed Computing
A) Define Moore's law of speedup and also find out the speedup if a
code contains 60% parallel code run on 4 cores.
The speedup law intended here is Amdahl's Law (Moore's Law itself describes the growth of transistor counts, not program speedup). Amdahl's Law states that the speedup of a program
using parallel processing is limited by the fraction of the program that cannot be parallelized.
The law is defined as:
S = 1 / ((1 - f) + f/n)
Where:
S = theoretical speedup
f = fraction of parallel code
n = number of processing cores
Given a code with 60% parallel code (f = 0.6) running on 4 cores (n = 4), we can calculate the
speedup as:
S = 1 / ((1 - 0.6) + 0.6/4)
S = 1 / (0.4 + 0.15)
S = 1 / 0.55 ≈ 1.82
So, the program would run approximately 1.82 times faster than the serial version.
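For a quick check of the arithmetic, here is a minimal C sketch (the helper name amdahl_speedup is chosen for this example, not part of any standard API):

```c
#include <stdio.h>

/* Amdahl's Law: S = 1 / ((1 - f) + f / n), where f is the parallel
 * fraction of the code and n is the number of cores. */
double amdahl_speedup(double f, int n) {
    return 1.0 / ((1.0 - f) + f / n);
}

int main(void) {
    /* The case from the question: f = 0.6, n = 4. */
    printf("Speedup: %.2f\n", amdahl_speedup(0.6, 4)); /* prints 1.82 */
    return 0;
}
```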
Flynn's taxonomy classifies computer architectures by the number of instruction streams and data streams:
- SISD (Single Instruction Stream, Single Data Stream): a single instruction stream operates on a single data stream, as in a traditional uniprocessor.
- SIMD (Single Instruction Stream, Multiple Data Streams): one instruction is applied to many data elements at the same time.
- MISD (Multiple Instruction Streams, Single Data Stream): multiple instructions operate on one data stream.
- MIMD (Multiple Instruction Streams, Multiple Data Streams): multiple autonomous processors simultaneously execute different instructions on different data.
Question #3
A) Define cache coherence, snarfing, and snooping. Explain any one
cache coherence protocol.
Cache Coherence:
In multiprocessor systems, each processor has its own cache. This means that different caches
could have different copies of the same memory block. Cache coherence ensures that all copies
of data stored in different caches are consistent and up to date. This is crucial for multiprocessor
systems to function correctly.
Snarfing:
Snarfing is a mechanism for maintaining cache coherence in which a cache controller watches both the address lines and the data lines. When another device writes to a memory location that the cache holds, the controller captures ("snarfs") the new value from the bus and updates its own copy immediately.
Snooping:
Snooping is another mechanism that ensures cache coherence. Each cache monitors (snoops on) the address lines of the shared bus, and when it observes a write to a memory location it has cached, it invalidates or updates its own copy.
Example:
Suppose we have two processors, P1 and P2, and two variables, A and B, that are stored in the
same cache line.
| Cache Line |   |
| --- | --- |
| A | B |
Although P1 and P2 are accessing different variables, they are accessing the same cache line.
Under an invalidation-based snooping protocol such as MESI, where every cached line is in one of the Modified, Exclusive, Shared, or Invalid states, each update P1 makes to variable A invalidates the cache line in P2's cache, and vice versa.
As a result, P2's updates to variable B are slowed down, because P2 must wait for the cache
line to be re-fetched. This is an example of false sharing: the coherence
traffic caused by P1's updates to A penalizes P2's updates to B, even though the two processors never touch the same variable.
To avoid false sharing, variables that are accessed by different processors should be padded so that they do not end up in the same cache line, as the sketch below shows.
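A minimal Pthreads sketch of the padding fix, assuming a 64-byte cache line (remove the pad array and the two counters will likely land in one cache line, reproducing the false-sharing slowdown):

```c
#include <pthread.h>
#include <stdio.h>

#define ITERS 100000000L

/* Padding keeps each counter on its own cache line, so the two
 * threads no longer invalidate each other's cached copies.
 * volatile forces every increment to go through the cache. */
struct padded_counter {
    volatile long value;
    char pad[64 - sizeof(long)]; /* assumes a 64-byte cache line */
};

static struct padded_counter counters[2];

static void *worker(void *arg) {
    struct padded_counter *c = arg;
    for (long i = 0; i < ITERS; i++)
        c->value++;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, &counters[0]);
    pthread_create(&t2, NULL, worker, &counters[1]);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%ld %ld\n", counters[0].value, counters[1].value);
    return 0;
}
```

Compile with cc -pthread; timing the program with and without the padding makes the coherence traffic visible.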
Question #4
A) Define the Work Law and the Span Law. Also express speedup in
terms of the Work Law and Span Law.
The Work Law and the Span Law are two fundamental laws in parallel computing that bound the
performance of parallel algorithms.
Work (W) is the total amount of computation, i.e. the execution time on a single processor. It can be expressed as the sum of the work done by all processors:
W = w_1 + w_2 + ... + w_n
Where:
W = total work (equal to T_serial, the execution time on one processor)
w_i = work performed by processor i
n = number of processors
Span (T_inf) is the length of the longest chain of dependent operations (the critical path), i.e. the execution time on an unlimited number of processors.
Work Law: on P processors, the parallel execution time can never beat an even division of the work:
T_parallel >= W / P
Span Law: the parallel execution time can never beat the critical path, no matter how many processors are used:
T_parallel >= T_inf
Speedup in terms of these laws:
S = T_serial / T_parallel
Where:
T_serial = total execution time on a single processor
T_parallel = total execution time on multiple processors
By the Work Law, S <= P (at best linear speedup); by the Span Law, S <= T_serial / T_inf, a quantity known as the parallelism of the computation.
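As a concrete illustration (the numbers are assumed for the example): suppose a computation has work T_serial = 120 time units and span T_inf = 20 time units. On P = 4 processors, the Work Law gives T_parallel >= 120 / 4 = 30 and the Span Law gives T_parallel >= 20, so the speedup is bounded by S <= min(P, T_serial / T_inf) = min(4, 6) = 4.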
MPI
MPI (Message Passing Interface) is a standardized communication protocol for parallel computing. It
allows processes to communicate with each other by sending and receiving messages. MPI is widely
used in high-performance computing (HPC) applications, such as scientific simulations, data analytics,
and machine learning.
MPI provides two main styles of communication (a broadcast example follows):
1. Point-to-point communication: sending and receiving messages between a pair of processes.
2. Collective communication: broadcasting, gathering, and scattering data among multiple processes.
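A minimal MPI sketch of collective communication, assuming an MPI implementation such as MPICH or Open MPI is installed (compile with mpicc, run with mpirun -np 4 ./a.out):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 42; /* arbitrary payload for the example */

    /* Collective communication: broadcast value from rank 0
     * to every process in the communicator. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d received %d\n", rank, value);
    MPI_Finalize();
    return 0;
}
```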
Pthreads
Pthreads (POSIX Threads) is a standard for creating threads on POSIX-compliant operating systems. It
provides a set of APIs for creating, managing, and synchronizing threads.
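A minimal Pthreads sketch of creating, running, and joining threads:

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread receives its id through the void* argument. */
static void *hello(void *arg) {
    long id = (long)arg;
    printf("Hello from thread %ld\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[4];

    for (long i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, hello, (void *)i);

    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL); /* wait for each thread to finish */

    return 0;
}
```

Compile with cc -pthread hello.c (the file name is arbitrary).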
OpenMP
OpenMP (Open Multi-Processing) is a parallel programming model for multi-platform shared-memory
multiprocessing. It consists of a set of compiler directives, library routines, and
environment variables that influence run-time behavior.
OpenMP is used for parallelizing loops, sections of code, and tasks. It provides a
simple and portable way to write parallel applications.
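A minimal OpenMP sketch of parallelizing a loop (compile with -fopenmp on gcc/clang):

```c
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    /* The directive splits the iterations across threads; the
     * reduction clause gives each thread a private partial sum
     * and combines them when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++)
        sum += 1.0 / i;

    printf("Sum = %f\n", sum);
    return 0;
}
```

Without the reduction clause, the threads would race on sum and the result would be wrong.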
CUDA
CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model
developed by NVIDIA that allows developers to harness the power of GPU acceleration for general-
purpose computing.
CUDA enables developers to write programs that execute on the GPU, using a programming model that
is similar to C++. The GPU is treated as a coprocessor, and the developer can offload computationally
intensive tasks to the GPU, while the CPU handles other tasks.
Key features of CUDA include:
1. Parallel programming model: CUDA allows developers to write parallel code using threads, blocks, and
grids.
2. GPU acceleration: CUDA programs execute on the GPU, leveraging its massive parallel processing
capabilities.
3. Memory hierarchy: CUDA provides a hierarchical memory architecture, including registers, shared
memory, and global memory.
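A minimal CUDA C sketch tying these three features together: a grid of blocks of threads adds two vectors held in global memory (managed memory is used here to keep the example short; compile with nvcc):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

/* Kernel: each thread computes one element of the result. */
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x; /* global thread index */
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    float *a, *b, *c;

    /* Managed (unified) memory is visible to both CPU and GPU. */
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;                        /* threads per block */
    int blocks = (n + threads - 1) / threads; /* blocks in the grid */
    vec_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize(); /* wait for the GPU to finish */

    printf("c[0] = %f\n", c[0]); /* expect 3.0 */
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```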