Parallel Computing
30.09.2020
Holy Cross College, Puttady Kerala
International Webinar
Dr. A. Bharathi Lakshmi
Head of IT Department, VVVC, VNR
Content
•What
•Why
•Architecture
•Software and Processors
•Parallel Programming
•Research Work
Preliminary
Parallel Computing
Serial Computing
Parallel Computing
Why Parallel Computing
•Save time – many processors working at once
•Solve larger problems – more aggregate memory
•Exploit concurrency – do many things simultaneously
Architecture
•Flynn’s Taxonomy
•Feng’s Classification
•Händler’s Classification
Flynn’s Taxonomy
Flynn’s Taxonomy – SISD
Flynn’s Taxonomy – SIMD
Flynn’s Taxonomy – MISD
Flynn’s Taxonomy – MIMD
Memory Architecture
•Shared Memory
•Uniform Memory Access (UMA)
•Non Uniform Memory Access (NUMA)
•Distributed Memory
•Hybrid Memory
Memory Architecture – UMA
Memory Architecture – NUMA
Memory Architecture – Distributed Memory
Memory Architecture – Hybrid Memory
Types of Parallel Computing
•Data Parallel
•Task Parallel
•Pipeline Parallel
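A minimal OpenMP sketch in C contrasting the first two types (the loop body and the task names are invented for the example):

#include <omp.h>
#include <stdio.h>

#define N 1000

int main(void) {
    int a[N];

    /* Data parallel: every thread applies the SAME operation
       to a different slice of the array. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = i * i;

    /* Task parallel: different threads perform DIFFERENT
       operations concurrently. */
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("task A: summarize results\n");
        #pragma omp section
        printf("task B: write output, a[%d] = %d\n", N - 1, a[N - 1]);
    }
    return 0;
}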
OS & Processor
• Multiprocessing
• Multitasking
• Multithreading
• AMD
• 4 – 32 Cores
• 4 – 64 Threads
• Intel
• 2 – 7 Cores
• Duo – multithreading
• i7 – 8 Cores
Programming Frameworks & APIs
• Apache Hadoop
• Apache Spark
• Apache Flink
• Apache Beam
• CUDA
• OpenCL
• OpenHMPP
• OpenMP for C, C++ and Fortran (Shared Memory)
• Message Passing Interface (MPI) for C, C++ and Fortran (Distributed Memory)
OpenMP
•Thread-based model
•Converting a serial program to a parallel one is easy
•Typically implemented on top of Unix pthreads
•Compiler directives
•Runtime library
•Environment variables
•Fork-Join model (see the sketch below)
OpenMP – Fork-Join Model
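A minimal fork-join sketch: the master thread forks a team at the parallel region and joins it again at the implicit barrier (the team size of 4 is an assumption):

#include <omp.h>
#include <stdio.h>

int main(void) {
    printf("before: master thread only\n");    /* serial part */

    #pragma omp parallel num_threads(4)        /* fork */
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                                          /* join (implicit barrier) */

    printf("after: master thread only\n");     /* serial again */
    return 0;
}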
OpenMP - Directives
• For parallel work-sharing
• parallel – #pragma omp parallel
• for – #pragma omp [parallel] for [clauses]
• sections – #pragma omp [parallel] sections [clauses]
• single – #pragma omp single [clauses]
• For master and synchronization
• master
• critical
• barrier
• atomic
• flush
• ordered
• General form: #pragma omp directive-name [clauses]
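A short sketch using several of these directives together (the work in each region is invented for the example):

#include <omp.h>
#include <stdio.h>

int main(void) {
    int count = 0;

    #pragma omp parallel
    {
        #pragma omp single      /* executed once, by any one thread */
        printf("setup done once\n");

        #pragma omp critical    /* one thread at a time */
        count++;

        #pragma omp barrier     /* wait until every thread has incremented */

        #pragma omp master      /* master thread only */
        printf("count = %d (the team size)\n", count);
    }
    return 0;
}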
OpenMP
•omp.h – runtime library header
•Environment variables
•OMP_DYNAMIC
•OMP_NUM_THREADS
•OMP_SCHEDULE
•OMP_NESTED
OMP_DYNAMIC
•Syntax
OMP_DYNAMIC = boolean value
•Value – true | false
•true – the runtime may dynamically adjust the number of threads
•false – dynamic adjustment of the number of threads is disabled
•Default value – false on most implementations
OMP_NUM_THREADS
• Syntax
OMP_NUM_THREADS = num_list
• num_list – one or more positive integers
• Single value – the team size for parallel constructs that have no num_threads clause (an upper bound when OMP_DYNAMIC=true, the exact count when false)
• Multiple values – team sizes for successive levels of nested parallelism
OMP_SCHEDULE
• Syntax
OMP_SCHEDULE = type[,size]
• Type
• static
• dynamic
• guided
• auto
• Size – the chunk size, a positive integer number of iterations
• Not valid with type auto
• Takes effect only for loops declared schedule(runtime)
OMP_NESTED
• Syntax
OMP_NESTED = true | false
• true – nested parallelism enabled
• false – nested parallelism disabled
• Default – false
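A small program to check how the runtime picked these variables up; set them in the shell before launching, e.g. OMP_DYNAMIC=false OMP_NUM_THREADS=8 ./a.out:

#include <omp.h>
#include <stdio.h>

int main(void) {
    printf("max threads : %d\n", omp_get_max_threads()); /* OMP_NUM_THREADS */
    printf("dynamic     : %d\n", omp_get_dynamic());     /* OMP_DYNAMIC */
    printf("nested      : %d\n", omp_get_nested());      /* OMP_NESTED */
    return 0;
}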
Functions
• void omp_set_num_threads(int num_threads)
• int omp_get_num_threads(void)
• int omp_get_max_threads(void)
• int omp_get_thread_num(void)
• int omp_get_num_procs(void)
• void omp_set_dynamic(int dynamic_threads)
• int omp_get_dynamic(void)
• void omp_set_nested(int nested)
• int omp_get_nested(void)
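A brief sketch combining these setters and getters (using half the available processors is an arbitrary choice for the example):

#include <omp.h>
#include <stdio.h>

int main(void) {
    int procs = omp_get_num_procs();             /* available processors */
    omp_set_num_threads(procs > 1 ? procs / 2 : 1); /* arbitrary choice */

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)           /* one thread reports */
            printf("team of %d threads on %d processors\n",
                   omp_get_num_threads(), procs);
    }
    return 0;
}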
OpenMP - Clauses
• For general attributes
• if - if(expression)
• num_threads – num_threads(num)
• ordered - ordered
• schedule
• nowait - nowait
• For data-sharing attributes
• private – private(var)
• firstprivate – firstprivate(var)
• lastprivate – lastprivate(var)
• shared – shared(var)
• default – default(shared | none)
• reduction – reduction(operator:list)
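A sketch exercising a few of these clauses on one loop (the variables and chunk size are invented for the example):

#include <omp.h>
#include <stdio.h>

int main(void) {
    int n = 1000, sum = 0, scale = 3;

    /* firstprivate: each thread starts with its own scale = 3;
       reduction: per-thread partial sums are combined at the end;
       schedule: iterations are handed out in chunks of 100. */
    #pragma omp parallel for firstprivate(scale) \
            reduction(+:sum) schedule(dynamic, 100)
    for (int i = 0; i < n; i++)
        sum += i * scale;

    printf("sum = %d\n", sum);   /* 3 * n*(n-1)/2 = 1498500 */
    return 0;
}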
Paradigm for Using OpenMP
• Write a sequential program
• Identify the portion to be parallelized
• Add directives/pragmas
• If needed, call runtime library routines and modify environment variables
• The parallel program is ready
• Compile with an OpenMP-aware compiler
• Run the program (a complete worked example follows)
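A complete end-to-end instance of this paradigm, assuming gcc as the OpenMP-aware compiler:

/* Step 1: a sequential vector sum; steps 2-3: the hot loop gets one pragma. */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)            /* sequential setup */
        a[i] = 1.0;

    /* The identified hot loop, parallelized with a single directive. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);             /* expect 1000000.0 */
    return 0;
}
/* Compile: gcc -fopenmp vecsum.c -o vecsum    Run: ./vecsum */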
Matrix Multiplication
•Serial coding
for (int i = 0; i < n; i++)
    for (int k = 0; k < n; k++)
        for (int j = 0; j < m; j++)
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
Matrix Multiplication
• Parallel coding
#include <omp.h>

omp_set_num_threads(4);
/* The loop must immediately follow the directive – no surrounding braces.
   i, j and k are declared inside the loops, so each thread gets its own. */
#pragma omp parallel for
for (int i = 0; i < n; i++)
    for (int k = 0; k < n; k++)
        for (int j = 0; j < m; j++)
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
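Two notes on this design: parallelizing the outermost i loop means each thread writes only its own rows of c, so no synchronization is needed; and the i-k-j loop order walks b and c row by row, which is friendlier to the cache than the textbook i-j-k order.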
Sum of an Array
•Serial coding
sum = 0;
for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
        sum += a[i][j];
Sum of an Array
• Parallel coding
#include <omp.h>

omp_set_num_threads(4);
/* reduction(+:sum) gives each thread a private copy of sum and
   combines the partial sums when the loop finishes. */
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
        sum += a[i][j];
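Without the reduction clause, every thread would update the shared sum concurrently – a data race that produces wrong results. Wrapping the update in critical or atomic would also be correct but serializes it; reduction keeps the updates private and merges them once, which is why it is the idiomatic choice here.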
Image Reconstruction – Pseudocode
Image Reconstruction – Time Complexity
[Time complexity graph and speedup graph omitted]

Running time by algorithm:

         10         12         15         20         30
FBP      0.008455   0.007704   0.00781    0.015839   0.021083
SIRT     75.6588    76.6664    91.628     56.3881    176.8353
SART     50.5161    57.6855    56.3243    56.3881    202.067
ART      1609.1     1699.73    1889.8     918.3131   723.983
MLEM     522.894    462.973    750.215    709.861    2134.4
MAPEM    726.522    532.309    727.098    532.317    771.465
2 Core   502.087    332.341    502.65     332.347    463.192
4 Core   398.953    297.146    399.495    297.143    447.483
8 Core   198.488    145.926    199.045    145.934    259.513
Square Naïve Matrix Multiplication
[Time complexity graph and speedup graph omitted]

Running time vs. number of cores and matrix size:

Cores   1000 × 1000   2000 × 2000   3000 × 3000   4000 × 4000   5000 × 5000
1       4.1621        54.4683       233.965       639.153       1282.2257
2       2.1539        25.9129       118.3784      316.9857      641.8125
4       1.0993        17.0027       64.2888       172.9639      329.7763
8       0.5822        8.5168        34.0960       81.1930       163.0.773
12      0.5074        5.8074        22.6845       62.6338       135.0753
16      0.5061        4.7371        19.455        57.1035       126.0156
18      0.4708        4.6368        18.6277       52.2070       120.5529
20      0.4487        0.5695        0.6275        0.5502        0.5177
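The speedup graph is derived from this table as S(p) = T1 / Tp. For example, on the 1000 × 1000 case, 2 cores give 4.1621 / 2.1539 ≈ 1.93 and 8 cores give 4.1621 / 0.5822 ≈ 7.15.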
Hot Research
•NVIDIA GPUs
•Data mining – tremendous volumes of data
•Techniques and computational power are still lacking
•AI / machine learning
•Image processing
•Medical field
• Image reconstruction