CENG479 PARALLEL COMPUTING
Lec. 3: Parallel Software
Dr. Hüseyin TEMUÇİN
Gazi University, Department of Computer Engineering
Summary - Memory Structures
These slides are adapted from Prof. Dr. Zahran's Parallel Computing lecture notes.
What about memory structure?
● Shared Memory System
● Distributed Memory System
Shared Memory System
Distributed Memory System
An operating system “process”
● An instance of a computer program that is being
executed.
● Components of a process:
○ The executable machine language program
○ A block of memory
○ Descriptors of resources the OS has allocated to
the process
○ Security information
○ Information about the state of the process
Copyright © 2010, Elsevier Inc. All rights Reserved
Multitasking
● Gives the illusion that a single-processor system is running multiple programs simultaneously.
● Each process takes turns running for a time slice.
● After its time slice is up, the process waits until it gets a turn again.
Threading
● Threads are contained within processes.
● They allow programmers to divide their
programs into (more or less) independent tasks.
● The hope is that when one thread blocks because it is waiting on a resource, another thread has work to do and can run (see the sketch below).
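A minimal POSIX threads (pthreads) sketch of the fork/join pattern; pthreads is one common shared-memory API, and the rank-printing task here is only illustrative, not from the slides:

#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS 4

/* Task carried out by each forked thread. */
void* hello(void* arg) {
    long my_rank = (long) arg;
    printf("Hello from thread %ld\n", my_rank);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];

    /* Fork: create the threads, handing each its rank. */
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_create(&threads[t], NULL, hello, (void*) t);

    /* Join: the main thread waits until every thread is done. */
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
    return 0;
}

Compile with something like gcc -pthread.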
Hardware to Software mapping…
● In shared memory programs:
○ Start a single process and fork threads.
○ Threads carry out tasks.
● In distributed memory programs:
○ Start multiple processes.
○ Processes carry out tasks.
SPMD – single program multiple data
● The same single executable program runs in different contexts
○ The data differ, but the program being executed is the same
● Different data situations are handled with conditional statements (see the sketch below)
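A hedged SPMD sketch: every process runs this same fragment; my_rank and comm_sz are assumed to be supplied by the runtime (as in MPI), and read_input / receive_my_share are hypothetical helpers, not real API calls:

/* Same program everywhere; conditionals on the rank
   give each process its own role and data. */
if (my_rank == 0) {
    read_input(&n, data);        /* hypothetical: rank 0 does the I/O */
} else {
    receive_my_share(data);      /* hypothetical: others get their slice */
}
/* From here on, all ranks execute the same computation
   on their own portion of the data. */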
Nondeterminism
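When threads execute asynchronously, the same program on the same input can behave differently from run to run. A minimal pthreads sketch (not from the slides): which greeting prints first depends on how the scheduler interleaves the two threads.

#include <stdio.h>
#include <pthread.h>

/* Each thread prints a greeting; which printf runs first is up
   to the OS scheduler, so the output order is nondeterministic. */
void* say_hello(void* arg) {
    long my_rank = (long) arg;
    printf("Thread %ld > Hello\n", my_rank);
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, say_hello, (void*) 0);
    pthread_create(&t1, NULL, say_hello, (void*) 1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}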
Writing Parallel Programs
● Divide the work among processes/threads
○ Each process/thread gets roughly the same amount of work
○ Communication is minimized
● Arrange for the processes/threads to synchronize.
● Arrange for communication among processes/threads.

/* Example: the n iterations of this loop can be divided among threads. */
double x[], y[];
for (int i = 0; i < n; i++) {
    x[i] = y[i];
}
Sequential vs Parallel
● Parallel software must produce the same result/output as the sequential version
○ The final result must be reduced onto a single process/thread (see the sketch below)
● Implementing parallel software is much more costly than the sequential version
● Parallel software has additional costs
○ Synchronization
○ Communication
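A sketch of the reduction idea for a global sum. Assumptions not in the slides: p threads, each knowing my_rank and its block bounds my_first/my_last over a shared array data, with partial[] also shared:

double partial[MAX_THREADS];     /* one slot per thread, shared */

/* Phase 1: each thread sums only its own block. */
partial[my_rank] = 0.0;
for (int i = my_first; i < my_last; i++)
    partial[my_rank] += data[i];

/* ... barrier here, so every partial sum is finished ... */

/* Phase 2: reduce the partial results onto thread 0. */
if (my_rank == 0) {
    double global_sum = 0.0;
    for (int t = 0; t < p; t++)
        global_sum += partial[t];
    /* global_sum now matches the sequential result */
}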
Shared Memory System
Shared Memory
● Dynamic threads
○ A master thread waits for work, forks new threads, and the threads terminate when they are done
○ Efficient use of resources
○ Thread creation and termination are time-consuming
● Static threads
○ A pool of threads is created and allocated work; the threads do not terminate until cleanup
○ Better performance
○ Potential waste of system resources
Shared memory issues
● Race condition
● Critical section
● Mutual exclusion
● Mutual exclusion lock (mutex, semaphore, ...), as in the sketch below
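A minimal pthreads sketch of a mutex protecting a critical section; without the lock, counter++ (a load, an add, and a store) can interleave between threads and lose updates, which is the race condition:

#include <stdio.h>
#include <pthread.h>

int counter = 0;                                   /* shared variable */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void* increment(void* arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* enter critical section */
        counter++;                    /* at most one thread here */
        pthread_mutex_unlock(&lock);  /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, increment, NULL);
    pthread_create(&t1, NULL, increment, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("counter = %d\n", counter);  /* 200000 with the mutex */
    return 0;
}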
Busy-waiting
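A busy-waiting sketch (the turn-taking pattern is a textbook form of the idea, not taken from the slides): a shared flag says whose turn it is, and each thread spins, re-testing the flag, until its rank comes up. The waiting thread stays on the CPU, which is the drawback of this technique:

#include <stdio.h>
#include <pthread.h>

#define P 4

volatile int flag = 0;    /* whose turn it is; volatile so the
                             compiler re-reads it on every test */
int counter = 0;          /* shared variable */

void* work(void* arg) {
    long my_rank = (long) arg;
    while (flag != my_rank)
        ;                 /* spin: burn CPU until it is our turn */
    counter++;            /* critical section */
    flag++;               /* pass the turn to the next rank */
    return NULL;
}

int main(void) {
    pthread_t t[P];
    for (long r = 0; r < P; r++)
        pthread_create(&t[r], NULL, work, (void*) r);
    for (int r = 0; r < P; r++)
        pthread_join(t[r], NULL);
    printf("counter = %d\n", counter);  /* always P */
    return 0;
}

Strictly, portable code would use C11 atomics rather than volatile; busy-waiting is also why mutexes are usually preferred when waits may be long.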
Distributed Memory System
Distributed Memory: message-passing
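A minimal message-passing sketch using MPI; the greeting text and the structure are illustrative assumptions, not taken from the slides. Each process has its own private memory, so data moves only through explicit send and receive calls:

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    int my_rank, comm_sz;
    char msg[100];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    if (my_rank != 0) {
        /* Every nonzero rank sends one message to rank 0. */
        sprintf(msg, "Greetings from process %d of %d", my_rank, comm_sz);
        MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else {
        /* Rank 0 receives and prints each message in rank order. */
        for (int src = 1; src < comm_sz; src++) {
            MPI_Recv(msg, 100, MPI_CHAR, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("%s\n", msg);
        }
    }
    MPI_Finalize();
    return 0;
}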
We want to write a parallel program ... Now what?
● We have a serial program.
● How to parallelize it?
● We know that we need to divide the work while
○ balancing the load,
○ managing synchronization,
○ reducing communication.
● Unfortunately, there is no mechanical process for this.
● Ian Foster proposed a useful framework.
Foster’s methodology
(The PCAM Methodology)
Foster Methodology - Partitioning
● Divide the computation to be performed
and the data operated on by the
computation into small tasks.
● The focus here should be on identifying
tasks that can be executed in parallel.
● This step brings out the parallelism in the algorithm.
Foster Methodology - Communication
● Determine what communication needs to be carried out
among the tasks identified in the previous step.
Foster Methodology - Aggregation
● Combine the tasks and communications identified in the first two steps into larger tasks.
○ For example, if task A must be executed before task B can
be executed, it may make sense to aggregate them into a
single composite task.
Foster Methodology - Mapping
● Assign the composite tasks identified in the previous
step to processes/threads.
● This should be done so that communication is minimized and each process/thread gets roughly the same amount of work (see the sketch below).
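Two common mappings of n tasks onto p processes/threads, sketched under the assumption that each rank knows my_rank, p, and n, and that do_task is a hypothetical per-task function:

/* Block mapping: each rank gets one contiguous chunk. */
int first = my_rank * n / p;
int last  = (my_rank + 1) * n / p;
for (int i = first; i < last; i++)
    do_task(i);                 /* hypothetical per-task work */

/* Cyclic mapping: rank r gets tasks r, r+p, r+2p, ... */
for (int i = my_rank; i < n; i += p)
    do_task(i);

Block mapping keeps each rank's work contiguous (often better for locality); cyclic mapping balances load better when task costs vary with i.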
Example - histogram
Serial program - input
● The number of measurements: data_count
● An array of data_count floats: data
● The minimum value for the bin containing the smallest values: min_meas
● The maximum value for the bin containing the largest values: max_meas
● The number of bins: bin_count
Serial program - output
● bin_maxes: an array of bin_count floats that stores the upper bound of each bin
● bin_counts: an array of bin_count ints that stores the number of elements in each bin
Serial Program
int bin = 0;
for (int i = 0; i < data_count; i++) {
    bin = find_bin(data[i], ...);
    bin_counts[bin]++;
}
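The slides do not show find_bin; one plausible implementation (an assumption, not the original) is a linear search over bin_maxes:

/* Hypothetical find_bin: return the index b of the bin containing
   value, i.e. the smallest b with value < bin_maxes[b].
   Assumes min_meas <= value <= max_meas. */
int find_bin(float value, float bin_maxes[], int bin_count) {
    for (int b = 0; b < bin_count; b++)
        if (value < bin_maxes[b])
            return b;
    return bin_count - 1;    /* value equal to max_meas */
}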
Adding the local arrays
Better performance?
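A hedged sketch of the local-array idea: each thread counts into its own private row, so the shared bin_counts is no longer updated (and locked) inside the main loop. loc_bin_cts, thread_count, and the block bounds my_first/my_last are assumed given, not from the slides:

/* Phase 1: each thread histograms its own block of data,
   touching only its own row of loc_bin_cts; no locking needed. */
for (int i = my_first; i < my_last; i++) {
    int bin = find_bin(data[i], ...);
    loc_bin_cts[my_rank][bin]++;
}

/* ... barrier: wait until every thread finishes phase 1 ... */

/* Phase 2: sum the local arrays into the global bin_counts. */
if (my_rank == 0)
    for (int b = 0; b < bin_count; b++)
        for (int t = 0; t < thread_count; t++)
            bin_counts[b] += loc_bin_cts[t][b];

Whether this is faster depends on the numbers of bins and threads: phase 2 is extra serial work, but it removes all synchronization from the main loop.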
Questions