Parallel Computing
30.09.2020
Holy Cross College, Puttady Kerala
International Webinar
Dr. A. Bharathi Lakshmi
Head of IT Department, VVVC, VNR
Content
•What
•Why
•Architecture
•Software and Processors
•Parallel Programming
•Research Work
Preliminary
Parallel Computing
Serial Computing
Parallel Computing
Why Parallel Computing
•Save time – many processors working at once
•Solve larger problems – more aggregate memory
•Exploit concurrency – do many things simultaneously
Architecture
•Flynn’s Taxonomy
•Feng’s Classification
•Händler’s Classification
Flynn’s Taxonomy
Flynn’s Taxonomy – SISD
Flynn’s Taxonomy – SIMD
Flynn’s Taxonomy – MISD
Flynn’s Taxonomy – MIMD
Memory Architecture
•Shared Memory
•Uniform Memory Access (UMA)
•Non Uniform Memory Access (NUMA)
•Distributed Memory
•Hybrid Memory
Memory Architecture – UMA
Memory Architecture – NUMA
Memory Architecture – Distributed Memory
Memory Architecture – Hybrid Memory
Types of Parallel Computing
•Data Parallel
•Task Parallel
•Pipeline Parallel
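A minimal OpenMP sketch in C contrasting the first two types (the loop body and the task names are invented for the example):

#include <omp.h>
#include <stdio.h>

#define N 1000

int main(void) {
    int a[N];

    /* Data parallel: every thread applies the SAME operation
       to a different slice of the array. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = i * i;

    /* Task parallel: different threads perform DIFFERENT
       operations concurrently. */
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("task A: summarize results\n");
        #pragma omp section
        printf("task B: write output, a[%d] = %d\n", N - 1, a[N - 1]);
    }
    return 0;
}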
OS & Processor
• Multiprocessing
• Multitasking
• Multithreading
• AMD
• 4 – 32 Cores
• 4 – 64 Threads
• Intel
• 2 – 7 Cores
• Duo – multithreading
• i7 – 8 Cores
Programming Frameworks & APIs
• Apache Hadoop
• Apache Spark
• Apache Flink
• Apache Beam
• CUDA
• OpenCL
• OpenHMPP
• OpenMP for C, C++ and Fortran (Shared Memory)
• Message Passing Interface (MPI) for C, C++ and Fortran (Distributed Memory)
OpenMP
•Thread-based model
•Converting a serial program to a parallel one is easy
•Typically implemented on top of Unix pthreads
•Compiler directives
•Runtime library
•Environment variables
•Fork-Join model (see the sketch below)
OpenMP – Fork-Join Model
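A minimal fork-join sketch: the master thread forks a team at the parallel region and joins it again at the implicit barrier (the team size of 4 is an assumption):

#include <omp.h>
#include <stdio.h>

int main(void) {
    printf("before: master thread only\n");    /* serial part */

    #pragma omp parallel num_threads(4)        /* fork */
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                                          /* join (implicit barrier) */

    printf("after: master thread only\n");     /* serial again */
    return 0;
}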
OpenMP - Directives
• For parallel work-sharing
• parallel – #pragma omp parallel
• for – #pragma omp [parallel] for [clauses]
• sections – #pragma omp [parallel] sections [clauses]
• single – #pragma omp single [clauses]
• For master and synchronization
• master
• critical
• barrier
• atomic
• flush
• ordered
• General form: #pragma omp directive-name [clauses]
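A short sketch using several of these directives together (the work in each region is invented for the example):

#include <omp.h>
#include <stdio.h>

int main(void) {
    int count = 0;

    #pragma omp parallel
    {
        #pragma omp single      /* executed once, by any one thread */
        printf("setup done once\n");

        #pragma omp critical    /* one thread at a time */
        count++;

        #pragma omp barrier     /* wait until every thread has incremented */

        #pragma omp master      /* master thread only */
        printf("count = %d (the team size)\n", count);
    }
    return 0;
}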
OpenMP
•omp.h – runtime library header
•Environment variables
•OMP_DYNAMIC
•OMP_NUM_THREADS
•OMP_SCHEDULE
•OMP_NESTED
OMP_DYNAMIC
•Syntax
OMP_DYNAMIC = boolean value
•Value – true | false
•true – the runtime may dynamically adjust the number of threads
•false – dynamic adjustment of the number of threads is disabled
•Default value – false on most implementations
OMP_NUM_THREADS
• Syntax
OMP_NUM_THREADS = num_list
• num_list – one or more positive integers
• Single value – the team size for parallel constructs that have no num_threads clause (an upper bound when OMP_DYNAMIC=true, the exact count when false)
• Multiple values – team sizes for successive levels of nested parallelism
OMP_SCHEDULE
• Syntax
OMP_SCHEDULE = type[,size]
• Type
• static
• dynamic
• guided
• auto
• Size – the chunk size, a positive integer number of iterations
• Not valid with type auto
• Takes effect only for loops declared schedule(runtime)
OMP_NESTED
• Syntax
OMP_NESTED = true | false
• true – nested parallelism enabled
• false – nested parallelism disabled
• Default – false
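A small program to check how the runtime picked these variables up; set them in the shell before launching, e.g. OMP_DYNAMIC=false OMP_NUM_THREADS=8 ./a.out:

#include <omp.h>
#include <stdio.h>

int main(void) {
    printf("max threads : %d\n", omp_get_max_threads()); /* OMP_NUM_THREADS */
    printf("dynamic     : %d\n", omp_get_dynamic());     /* OMP_DYNAMIC */
    printf("nested      : %d\n", omp_get_nested());      /* OMP_NESTED */
    return 0;
}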
Functions
• void omp_set_num_threads(int num_threads)
• int omp_get_num_threads(void)
• int omp_get_max_threads(void)
• int omp_get_thread_num(void)
• int omp_get_num_procs(void)
• void omp_set_dynamic(int dynamic_threads)
• int omp_get_dynamic(void)
• void omp_set_nested(int nested)
• int omp_get_nested(void)
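A brief sketch combining these setters and getters (using half the available processors is an arbitrary choice for the example):

#include <omp.h>
#include <stdio.h>

int main(void) {
    int procs = omp_get_num_procs();             /* available processors */
    omp_set_num_threads(procs > 1 ? procs / 2 : 1); /* arbitrary choice */

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)           /* one thread reports */
            printf("team of %d threads on %d processors\n",
                   omp_get_num_threads(), procs);
    }
    return 0;
}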
OpenMP - Clauses
• For general attributes
• if - if(expression)
• num_threads – num_threads(num)
• ordered - ordered
• schedule
• nowait - nowait
• For data-sharing attributes
• private – private(var)
• firstprivate – firstprivate(var)
• lastprivate – lastprivate(var)
• shared – shared(var)
• default – default(shared | none)
• reduction – reduction(operator:list)
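A sketch exercising a few of these clauses on one loop (the variables and chunk size are invented for the example):

#include <omp.h>
#include <stdio.h>

int main(void) {
    int n = 1000, sum = 0, scale = 3;

    /* firstprivate: each thread starts with its own scale = 3;
       reduction: per-thread partial sums are combined at the end;
       schedule: iterations are handed out in chunks of 100. */
    #pragma omp parallel for firstprivate(scale) \
            reduction(+:sum) schedule(dynamic, 100)
    for (int i = 0; i < n; i++)
        sum += i * scale;

    printf("sum = %d\n", sum);   /* 3 * n*(n-1)/2 = 1498500 */
    return 0;
}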
Paradigm for Using OpenMP
• Write a sequential program
• Identify the portion to be parallelized
• Add directives/pragmas
• If needed, call runtime library routines and modify environment variables
• The parallel program is ready
• Compile with an OpenMP-aware compiler
• Run the program (a complete worked example follows)
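A complete end-to-end instance of this paradigm, assuming gcc as the OpenMP-aware compiler:

/* Step 1: a sequential vector sum; steps 2-3: the hot loop gets one pragma. */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)            /* sequential setup */
        a[i] = 1.0;

    /* The identified hot loop, parallelized with a single directive. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);             /* expect 1000000.0 */
    return 0;
}
/* Compile: gcc -fopenmp vecsum.c -o vecsum    Run: ./vecsum */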
Matrix Multiplication
•Serial coding
for (int i = 0; i < n; i++)
    for (int k = 0; k < n; k++)
        for (int j = 0; j < m; j++)
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
Matrix Multiplication
• Parallel coding
#include <omp.h>

omp_set_num_threads(4);
/* The loop must immediately follow the directive – no surrounding braces.
   i, j and k are declared inside the loops, so each thread gets its own. */
#pragma omp parallel for
for (int i = 0; i < n; i++)
    for (int k = 0; k < n; k++)
        for (int j = 0; j < m; j++)
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
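Two notes on this design: parallelizing the outermost i loop means each thread writes only its own rows of c, so no synchronization is needed; and the i-k-j loop order walks b and c row by row, which is friendlier to the cache than the textbook i-j-k order.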
Sum of an Array
•Serial coding
sum = 0;
for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
        sum += a[i][j];
Sum of an Array
• Parallel coding
#include <omp.h>

omp_set_num_threads(4);
/* reduction(+:sum) gives each thread a private copy of sum and
   combines the partial sums when the loop finishes. */
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
        sum += a[i][j];
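Without the reduction clause, every thread would update the shared sum concurrently – a data race that produces wrong results. Wrapping the update in critical or atomic would also be correct but serializes it; reduction keeps the updates private and merges them once, which is why it is the idiomatic choice here.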
Image Reconstruction – Pseudocode
Image Reconstruction – Time Complexity
[Time complexity graph and speedup graph omitted]

Running time by algorithm:

         10         12         15         20         30
FBP      0.008455   0.007704   0.00781    0.015839   0.021083
SIRT     75.6588    76.6664    91.628     56.3881    176.8353
SART     50.5161    57.6855    56.3243    56.3881    202.067
ART      1609.1     1699.73    1889.8     918.3131   723.983
MLEM     522.894    462.973    750.215    709.861    2134.4
MAPEM    726.522    532.309    727.098    532.317    771.465
2 Core   502.087    332.341    502.65     332.347    463.192
4 Core   398.953    297.146    399.495    297.143    447.483
8 Core   198.488    145.926    199.045    145.934    259.513
Square Naïve Matrix Multiplication
[Time complexity graph and speedup graph omitted]

Running time vs. number of cores and matrix size:

Cores   1000 × 1000   2000 × 2000   3000 × 3000   4000 × 4000   5000 × 5000
1       4.1621        54.4683       233.965       639.153       1282.2257
2       2.1539        25.9129       118.3784      316.9857      641.8125
4       1.0993        17.0027       64.2888       172.9639      329.7763
8       0.5822        8.5168        34.0960       81.1930       163.0.773
12      0.5074        5.8074        22.6845       62.6338       135.0753
16      0.5061        4.7371        19.455        57.1035       126.0156
18      0.4708        4.6368        18.6277       52.2070       120.5529
20      0.4487        0.5695        0.6275        0.5502        0.5177
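The speedup graph is derived from this table as S(p) = T1 / Tp. For example, on the 1000 × 1000 case, 2 cores give 4.1621 / 2.1539 ≈ 1.93 and 8 cores give 4.1621 / 0.5822 ≈ 7.15.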
Hot Research
•NVIDIA GPUs
•Data mining – tremendous volumes of data
•Techniques and computational power are still lacking
•AI / machine learning
•Image processing
•Medical field
• Image reconstruction