8.4 GPU Architecture and Programming

Uploaded by

Amir
GPU Architecture & Programming
System with simple CPU and Video Cards
System with GPU
Multicore CPU vs GPU
CUDA Applications on GPU
Example 1: Hello World from GPU (1 block x 10 threads)

#include <stdio.h>

// Kernel: runs on the GPU, once per launched thread
__global__ void helloFromGPU(void)
{
    printf("Hello World from GPU!\n");
}

int main(void)
{
    // hello from cpu
    printf("Hello World from CPU!\n");
    helloFromGPU<<<1, 10>>>();   // 1 block of 10 threads
    cudaDeviceReset();           // destroys the context, which also flushes GPU printf output
    return 0;
}

Output:

$ ./hello
Hello World from CPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Memory Management Functions
Data Transfer between CPU and GPU
Example 2: Data Transfer between CPU and GPU
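The body of this slide did not survive extraction, so the following is only a minimal sketch of a CPU to GPU to CPU round trip, not the original example. It uses the standard runtime calls cudaMalloc, cudaMemcpy, and cudaFree. The helper name roundTrip and its return convention are assumptions made for illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not from the slides): copies n floats from host array
// src to a GPU buffer, then copies them back into host array dst.
// Returns 0 on success, -1 on any CUDA error.
int roundTrip(const float *src, float *dst, int n)
{
    float *d_buf = NULL;                       // device (GPU) pointer
    size_t bytes = (size_t)n * sizeof(float);

    // Allocate GPU global memory
    if (cudaMalloc((void **)&d_buf, bytes) != cudaSuccess) return -1;

    // Host -> device, then device -> host
    cudaError_t err = cudaMemcpy(d_buf, src, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(dst, d_buf, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_buf);                           // release GPU memory
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return 0;
}
```

Calling roundTrip(a, b, n) from main should leave b equal to a. The last argument of cudaMemcpy names the copy direction, and a device pointer such as d_buf must not be dereferenced on the host.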
CUDA Threads (SIMT)
Thread Hierarchy

• Threads launched for a parallel section are partitioned into thread blocks
• Grid = all blocks for a given launch
• Thread block is a group of threads that can:
  - Synchronize their execution
  - Communicate via shared memory
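The two block-level capabilities listed above can be sketched with a per-block sum. Threads in a block stage their elements in __shared__ memory and use __syncthreads() as a barrier between reduction steps. The names blockSum and sumOnGPU, and the block size of 256, are assumptions for illustration and are not taken from the slides. The sketch also assumes n > 0.

```cuda
#include <stdlib.h>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed block size; the loop below needs a power of two

// Each block sums its own BLOCK_SIZE-element slice of `in` and writes one
// partial sum to out[blockIdx.x].
__global__ void blockSum(const int *in, int *out, int n)
{
    __shared__ int tile[BLOCK_SIZE];                   // visible to this block only
    int gid = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    tile[threadIdx.x] = (gid < n) ? in[gid] : 0;
    __syncthreads();                                   // wait until the tile is filled

    // Tree reduction in shared memory
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();                               // barrier after every step
    }
    if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
}

// Hypothetical host wrapper: stores the sum of h_in[0..n-1] in *result.
// Returns 0 on success, -1 on error.
int sumOnGPU(const int *h_in, int n, int *result)
{
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;    // grid = all blocks of the launch
    int *d_in = NULL, *d_out = NULL;
    int *h_out = (int *)malloc(blocks * sizeof(int));
    if (!h_out) return -1;

    cudaError_t err = cudaMalloc((void **)&d_in, n * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_out, blocks * sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        blockSum<<<blocks, BLOCK_SIZE>>>(d_in, d_out, n);
        err = cudaGetLastError();                      // catches launch errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) {
        *result = 0;
        for (int b = 0; b < blocks; ++b) *result += h_out[b];  // combine partial sums on the CPU
    }
    cudaFree(d_in);                                    // cudaFree(NULL) is a no-op
    cudaFree(d_out);
    free(h_out);
    return err == cudaSuccess ? 0 : -1;
}
```

The partial sums are combined on the host because __syncthreads() only synchronizes threads within one block. Blocks of the same grid cannot use it to wait for each other.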
CUDA C Examples
• Vector Addition
https://github.com/olcf-tutorials/vector_addition_cuda
• Finding Maximum Value in an array
https://www.geeksforgeeks.org/how-to-run-cuda-c-c-on-jupyter-notebook-in-google-colaboratory/
• Matrix Addition
https://github.com/jcbacong/CUDA-matrix-addition
• Matrix Multiplication
https://github.com/lzhengchun/matrix-cuda
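As a rough illustration of the first item in the list, a vector addition might look like the sketch below. It is not the code from the linked repositories. The names vecAdd and vecAddOnGPU and the choice of 256 threads per block are assumptions, and n is assumed to be positive.

```cuda
#include <cuda_runtime.h>

// One thread per element: c[i] = a[i] + b[i]
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global element index
    if (i < n) c[i] = a[i] + b[i];                   // guard the last, partial block
}

// Hypothetical host wrapper: returns 0 on success, -1 on any CUDA error.
int vecAddOnGPU(const float *h_a, const float *h_b, float *h_c, int n)
{
    const int threads = 256;                            // assumed block size
    const int blocks = (n + threads - 1) / threads;     // round up to cover all n
    const size_t bytes = (size_t)n * sizeof(float);
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;

    cudaError_t err = cudaMalloc((void **)&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_c, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
        err = cudaGetLastError();                       // catches launch errors
    }
    // This copy waits for the kernel to finish before reading d_c
    if (err == cudaSuccess) err = cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);                                      // cudaFree(NULL) is a no-op
    cudaFree(d_b);
    cudaFree(d_c);
    return err == cudaSuccess ? 0 : -1;
}
```

Matrix addition follows the same pattern, typically with a two-dimensional grid and block so that each thread handles one (row, column) element.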
GPU Compute Capability and Limits
• Technical Specifications per Compute Capability from 5.0 to 9.0

Technical specification                                  Value
Maximum dimensionality of grid of thread blocks          3
Maximum x-dimension of a grid of thread blocks           2^31 - 1
Maximum y- or z-dimension of a grid of thread blocks     65535
Maximum dimensionality of thread block                   3
Maximum x- or y-dimensionality of a block                1024
Maximum z-dimension of a block                           64
Maximum number of threads per block                      1024

• https://developer.nvidia.com/cuda-gpus
• https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications
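The limits in the table can also be read from the installed GPU at run time. The sketch below uses cudaGetDeviceProperties and the cudaDeviceProp fields maxThreadsPerBlock, maxThreadsDim, and maxGridSize. The helper name printLaunchLimits is an assumption for illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: prints the launch limits of device `dev`.
// Returns the maximum number of threads per block, or -1 on error.
int printLaunchLimits(int dev)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return -1;

    printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("Max block dimensions:  %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("Max grid dimensions:   %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return prop.maxThreadsPerBlock;
}
```

Calling printLaunchLimits(0) reports the first GPU in the system. Note that the per-dimension block limits do not multiply: a 1024 x 1024 x 64 block is invalid because the total number of threads per block is capped at 1024.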
Reference
• Book: CUDA C/C++ Programming Guide
https://docs.nvidia.com/cuda/cuda-c-programming-guide
• Book: PROFESSIONAL CUDA® C Programming – 2014
https://www.cs.utexas.edu/~rossbach/cs380p/papers/cuda-programming.pdf
• Tutorials (CMU &
https://people.cs.pitt.edu/~melhem/courses/xx45p/cuda1.pdf
https://www.cs.cmu.edu/afs/cs/academic/class/15418-s18/www/lectures/06_gpuarch.pdf
• Workshop (Cornell)
https://cvw.cac.cornell.edu/GPUarch/default