Lecture 17-Introduction to GPU

The document discusses the evolution and functionality of multi-core CPUs and many-core GPUs, highlighting their architectures and performance characteristics. It covers the history of GPUs, their role in rendering graphics, and their transition into parallel computing engines for various applications. Additionally, it introduces concepts like latency, throughput, and the graphics rendering pipeline, emphasizing the capabilities of GPUs in data parallelism and general-purpose computing.

Applied High-Performance Computing and Parallel Programming

Presenter: Liangqiong Qu

Assistant Professor

The University of Hong Kong

Outline

▪ Multi-core CPUs and many-core GPUs
▪ GPU history
▪ Rendering a picture with a GPU
▪ How GPUs evolved into highly parallel compute engines for a broad class of applications
Review of Last Lecture: Multi-core CPU Processors

Significant advances in CPUs

• Starting point: Von Neumann architecture, single-core processor
• Multi-core architecture with dual, quad, six, or n processing cores
• Processing cores are all on one chip
• Multi-core CPU chip architecture
• Hierarchy of caches (on/off chip)
• Clock rates for single processors increased from 10 MHz (Intel 286) to nearly 4 GHz (Pentium 4) in roughly two decades
• Pushing clock rates higher, toward 5 GHz and beyond, unfortunately hit a limit set by power consumption and heat
Review of Last Lecture: Many-core GPUs
▪ Use of very many simple cores
• High-throughput, computing-oriented architecture
• Exploit massive parallelism by executing a large number of concurrent threads, each relatively slowly
• Handle an ever-increasing number of instruction threads
• CPUs instead typically execute a single long thread as fast as possible
▪ Many-core GPUs are used in large clusters and within massively parallel supercomputers today

A Graphics Processing Unit (GPU) is great for data parallelism and task parallelism.
Compared to multi-core CPUs, GPUs have a many-core architecture with hundreds to even thousands of very simple cores, each executing its threads rather slowly.
Review of Last Lecture: Multi-core CPU and Many-core GPU
• Both the CPU and the GPU are silicon-based microprocessors and have similar internal components, including cores, memory, and a control unit.
• The core function of the CPU is to fetch instructions from RAM and then decode and execute them, i.e., to run each instruction stream serially.
• The CPU is a powerful general-purpose processor that handles all types of computing tasks required for the operating system and applications to run.
• A GPU contains thousands of cores that are smaller, more specialized, and less powerful than CPU cores.
• A GPU breaks a task into smaller chunks, runs them in parallel, and rapidly applies the same instructions to high volumes of data.
Review of Previous Lecture: Latency vs Throughput
• Latency: the time between starting an operation and its completion. In a network or web application, for example, it is the delay between a user's action and that action reaching its destination, typically measured in milliseconds.
• Throughput: the amount of work completed per unit of time. In a network, it is the volume of data that passes through in a given period.
• See below: assume only one car is in a lane of the highway at once.
• When the car on the highway reaches Stanford, the next car leaves San Francisco.

What’s the latency and throughput here?

Picture from: https://gfxcourses.stanford.edu/cs149/fall23/lecture/multicore2-ispc/


Latency of driving from San Francisco to Stanford: 0.5 hr.

Throughput: 2 cars per hour.
Improving Throughput
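The highway numbers can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `highway_metrics` and the parameters `lanes` and `cars_in_flight_per_lane` are assumptions introduced here, and the two variations shown (adding lanes, letting several evenly spaced cars share a lane) are just two plausible ways to raise throughput while the latency of a single trip stays the same.

```python
def highway_metrics(trip_hours, lanes=1, cars_in_flight_per_lane=1):
    """Return (latency in hours, throughput in cars per hour).

    Assumes all cars travel at the same speed and are evenly spaced.
    """
    latency = trip_hours  # time for one car to complete the trip
    # Each lane completes `cars_in_flight_per_lane` trips every `trip_hours`.
    throughput = lanes * cars_in_flight_per_lane / trip_hours
    return latency, throughput


baseline = highway_metrics(0.5)                                 # one car at a time
more_lanes = highway_metrics(0.5, lanes=4)                      # hypothetical: 4 lanes
pipelined = highway_metrics(0.5, cars_in_flight_per_lane=10)    # hypothetical: 10 cars per lane

print(baseline, more_lanes, pipelined)
```

In both variations the latency stays at 0.5 hr; only the number of cars arriving per hour changes, which is the sense in which a GPU trades per-thread speed for throughput.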
Review of Last Lecture: CPU vs GPU

CPU: lower latency
• The CPU handles all the main functions of a computer.
• The CPU has a few cores, each with higher clock speeds and larger caches.

GPU: high throughput
• The GPU is a specialized component that is best at running many smaller tasks at once.
• The GPU has a massive number of smaller and more specialized cores (less powerful than CPU cores).
GPU History

What were GPUs originally designed to do?

3D rendering

Simple definition of the rendering task: computing how each triangle in a 3D mesh contributes to the appearance of each pixel in the image.

What GPUs are still designed to do

Unreal Engine 5.4 demo: real-time rendering on a high-end GPU
Tips: How to Explain a System
Step 1: Describe the things (key entities) that are manipulated
- The nouns

Step 2: Describe the operations the system performs on the entities
- The verbs
Real-time Graphics Primitives (Entities)
Represent a surface as a 3D triangle mesh

• Vertices have the following attributes: (1) position in 3D space, V(x, y, z); (2) color, expressed in RGB or RGBA components; (3) vertex normal; (4) texture coordinates.
• A primitive (such as a triangle, point, line, or quad) is formed by one or more vertices. Primitives are the inputs to the graphics rendering pipeline.
• All modern displays are raster-based. A raster is a 2D rectangular grid of pixels.
• A pixel, short for "picture element," is the smallest unit of a digital image or display. It is a tiny square or dot that represents a single point of color. A pixel is 2-dimensional, with an (x, y) position and an RGB color value (no alpha value for pixels).
• Vertex positions, which are usually represented as floating-point values, are not necessarily aligned with the pixel grid of the display; neither are the primitives they form.
• Each primitive is raster-scanned (by the rasterizer) to obtain a set of fragments enclosed within the primitive.
• Fragments are produced by interpolating the vertices. Hence, a fragment carries all of the vertex attributes, such as color, fragment normal, and texture coordinates.
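The entities described so far (vertex, primitive, fragment, pixel) can be written down as plain data records. This is only a sketch for orientation: the class and field names below are assumptions chosen for readability, not the types of any graphics API.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
RGBA = Tuple[float, float, float, float]


@dataclass
class Vertex:
    position: Vec3                  # (x, y, z) in 3D space, floating point
    color: RGBA                     # RGB plus alpha
    normal: Vec3                    # vertex normal
    texcoord: Tuple[float, float]   # texture coordinates


@dataclass
class Triangle:
    """One kind of primitive: formed by three vertices."""
    v0: Vertex
    v1: Vertex
    v2: Vertex


@dataclass
class Fragment:
    x: int                          # aligned to the pixel grid
    y: int                          # aligned to the pixel grid
    depth: float                    # z-value, not grid-aligned
    color: RGBA                     # interpolated from the vertices
    normal: Vec3                    # interpolated from the vertices
    texcoord: Tuple[float, float]   # interpolated from the vertices


@dataclass
class Pixel:
    x: int
    y: int
    color: Tuple[float, float, float]  # RGB only, no alpha
```

Note the difference the slides emphasize: vertex positions are floating-point and live in 3D, while fragment and pixel positions are integer grid coordinates.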
Rendering a Picture
▪ Input: a list of vertices in 3D space (and their connectivity into primitives)

Example: every three vertices define a triangle

Rendering a Picture
Step 1: Given the scene camera's position, compute where the vertices lie on the screen
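As a rough illustration of Step 1, the sketch below projects one vertex onto the screen with a simple pinhole camera. It makes several assumptions that are not in the slides: the vertex is already expressed in camera space, the camera sits at the origin looking down the negative z-axis, and `project_vertex`, `focal_length`, `width`, and `height` are hypothetical names. Real pipelines typically do this with 4x4 matrices.

```python
def project_vertex(vertex, focal_length, width, height):
    """Project a camera-space point (x, y, z) to screen coordinates.

    Assumes the camera is at the origin looking down -z, so visible
    points have z < 0. Returns (screen_x, screen_y, depth).
    """
    x, y, z = vertex
    if z >= 0:
        raise ValueError("vertex is not in front of the camera")
    depth = -z
    # Perspective divide: distant points move toward the image center.
    ndc_x = focal_length * x / depth
    ndc_y = focal_length * y / depth
    # Map the range [-1, 1] to pixel units; flip y so that +y points up.
    screen_x = (ndc_x + 1.0) * 0.5 * width
    screen_y = (1.0 - ndc_y) * 0.5 * height
    return screen_x, screen_y, depth


print(project_vertex((0.0, 0.0, -2.0), 1.0, 640, 480))
print(project_vertex((1.0, 1.0, -2.0), 1.0, 640, 480))
```

The returned screen coordinates are still floating-point values, which is why a later rasterization step is needed to align the result with the pixel grid.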
Rendering a Picture
Step 2: group vertices into primitives
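For the example where every three vertices define a triangle, Step 2 amounts to slicing the vertex list into groups of three. The helper name `assemble_triangles` is an assumption, and the sketch covers only a plain (non-indexed) triangle list; other connectivity schemes would need different grouping logic.

```python
def assemble_triangles(vertices):
    """Group a flat vertex list into triangles, three vertices at a time."""
    if len(vertices) % 3 != 0:
        raise ValueError("a triangle list needs a multiple of 3 vertices")
    return [tuple(vertices[i:i + 3]) for i in range(0, len(vertices), 3)]


# Six screen-space vertices (x, y, depth) form two triangles.
vertices = [
    (0.0, 0.0, 1.0), (4.0, 0.0, 1.0), (0.0, 4.0, 1.0),
    (4.0, 4.0, 2.0), (4.0, 0.0, 2.0), (0.0, 4.0, 2.0),
]
triangles = assemble_triangles(vertices)
print(len(triangles))
```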
Rendering a Picture
Step 3: Rasterize each primitive to obtain the set of fragments enclosed within it, aligned to the pixel grid of the display

A fragment is 3-dimensional, with an (x, y, z) position. The (x, y) coordinates are aligned with the 2D pixel grid. The z-value (not grid-aligned) denotes its depth. The 3D fragments are interpolated from the vertices.
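One common way to rasterize a triangle is to test each pixel center against the triangle's three edges and use the resulting barycentric weights to interpolate depth. The sketch below follows that approach under stated assumptions: the names `edge` and `rasterize` are hypothetical, pixel centers are taken at (px + 0.5, py + 0.5), and depth is interpolated linearly in screen space without perspective correction. It is not a description of how any particular GPU implements this stage.

```python
import math


def edge(a, b, p):
    """Twice the signed area of the triangle (a, b, p) in the xy-plane."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(triangle, width, height):
    """Return fragments (px, py, depth) covered by a screen-space triangle."""
    v0, v1, v2 = triangle
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # degenerate triangle covers no pixels
    xs = [v[0] for v in triangle]
    ys = [v[1] for v in triangle]
    # Only visit pixels inside the bounding box, clipped to the raster.
    min_x, max_x = max(math.floor(min(xs)), 0), min(math.ceil(max(xs)), width)
    min_y, max_y = max(math.floor(min(ys)), 0), min(math.ceil(max(ys)), height)
    fragments = []
    for py in range(min_y, max_y):
        for px in range(min_x, max_x):
            center = (px + 0.5, py + 0.5)
            # Barycentric weights; dividing by area handles either winding.
            w0 = edge(v1, v2, center) / area
            w1 = edge(v2, v0, center) / area
            w2 = edge(v0, v1, center) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                depth = w0 * v0[2] + w1 * v1[2] + w2 * v2[2]
                fragments.append((px, py, depth))
    return fragments


flat = rasterize([(0.0, 0.0, 1.0), (4.0, 0.0, 1.0), (0.0, 4.0, 1.0)], 4, 4)
print(len(flat))
```

Every pixel test inside the two loops is independent of the others, which is what makes this stage a natural fit for many simple cores.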
Rendering a Picture
Step 4: Compute the color of the primitive at each fragment
(based on scene lighting and the primitive's material properties)
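The slides do not name a lighting model, so the sketch below picks one simple option, Lambertian (diffuse) shading, purely as an example of how lighting and material properties can combine into a fragment color. The function name `shade_lambert` and its parameters are assumptions, and both direction vectors are assumed to be unit length.

```python
def shade_lambert(normal, light_dir, material_rgb, light_rgb=(1.0, 1.0, 1.0)):
    """Diffuse color of one fragment.

    normal: unit surface normal at the fragment.
    light_dir: unit vector pointing from the surface toward the light.
    """
    # Cosine of the angle between normal and light, clamped so that
    # surfaces facing away from the light receive no illumination.
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return tuple(m * c * n_dot_l for m, c in zip(material_rgb, light_rgb))


facing = shade_lambert((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.5, 0.25))
away = shade_lambert((0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.5, 0.25))
print(facing, away)
```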
Rendering a Picture
Step 5: For each pixel, put the color of the fragment closest to the camera in the output image
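Keeping only the closest fragment per pixel is commonly done with a depth buffer (z-buffer). The following is a minimal sketch of that idea, assuming fragments arrive as `(x, y, depth, color)` tuples and that a smaller depth means closer to the camera; the function name `resolve_depth` and the tuple layout are assumptions made for this example.

```python
def resolve_depth(fragments, width, height, background=(0.0, 0.0, 0.0)):
    """Build the output image, keeping the closest fragment at each pixel."""
    depth_buffer = [[float("inf")] * width for _ in range(height)]
    image = [[background] * width for _ in range(height)]
    for x, y, depth, color in fragments:
        # Depth test: overwrite only if this fragment is closer than
        # anything already stored at this pixel.
        if depth < depth_buffer[y][x]:
            depth_buffer[y][x] = depth
            image[y][x] = color
    return image


red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
image = resolve_depth(
    [(0, 0, 5.0, red), (0, 0, 2.0, blue), (1, 0, 3.0, red)],
    width=2,
    height=1,
)
print(image)
```

Because the test compares depths rather than arrival order, the result is the same no matter which order the fragments are processed in.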
Real-time Graphics Pipeline

Abstracts the process of rendering a picture as a sequence of operations on vertices, primitives, fragments, and pixels.

The purpose of the graphics rendering pipeline is to produce the color value of every pixel displayed on the screen, given the input vertices.
Early Graphics Programming (OpenGL API)

Graphics programming APIs provided programmers with mechanisms to set the parameters of scene lights and materials.
Graphics Shading Languages
▪ Allow applications to extend the functionality of the graphics pipeline by specifying materials and lights programmatically!
• Support diversity in materials
• Support diversity in lighting conditions
• The programmer provides mini-programs ("shaders") that define pipeline logic for certain stages
• The pipeline maps the shader function onto all elements of the input stream
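The idea of mapping a shader onto every element of a stream can be mimicked in a few lines. This is a conceptual sketch, not a real shading language: `run_stage` and `grayscale_shader` are hypothetical names, and a Python list stands in for the input stream.

```python
def run_stage(shader, stream):
    """Apply one shader function to every element of the input stream.

    Each call depends only on its own element, so the calls could run
    in any order, or all at the same time on separate cores.
    """
    return [shader(element) for element in stream]


def grayscale_shader(rgb):
    """Hypothetical fragment shader: convert an RGB color to a gray level."""
    r, g, b = rgb
    gray = (r + g + b) / 3.0
    return (gray, gray, gray)


fragment_colors = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.75, 0.25, 0.5)]
shaded = run_stage(grayscale_shader, fragment_colors)
print(shaded)
```

The programmer writes only `grayscale_shader`; the pipeline (here `run_stage`) decides how and where the per-element calls are executed.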
Observation Circa 2001-2003

These GPUs are very fast processors for performing the same computation (shader programs) on large collections of data (streams of vertices, fragments, and pixels).

Wait a minute! That sounds a lot like data parallelism!
Hack! Early GPU-based Scientific Computation
Use a GPU, which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU).
Two research groups independently discovered GPU-based approaches to solving general linear algebra problems that ran faster than on CPUs.

Example: Sparse Matrix Solvers on the GPU [Bolz 03]
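To see why linear algebra fits this model, consider the sparse matrix-vector product that sits at the core of many iterative solvers. The sketch below is not the method of [Bolz 03]; it is a generic example using the compressed sparse row (CSR) layout, with the hypothetical function name `spmv_csr`. What matters is that each output row is computed independently, much like an independent fragment.

```python
def spmv_csr(values, col_indices, row_offsets, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values: nonzero entries of A, row by row.
    col_indices: column index of each entry in `values`.
    row_offsets: row r owns entries row_offsets[r] .. row_offsets[r + 1] - 1.
    """
    y = []
    for row in range(len(row_offsets) - 1):
        start, end = row_offsets[row], row_offsets[row + 1]
        # Each row's dot product touches no other row's result,
        # so all rows could be computed in parallel.
        y.append(sum(values[k] * x[col_indices[k]] for k in range(start, end)))
    return y


# A = [[2, 0, 0],
#      [0, 0, 3],
#      [1, 0, 4]]
values = [2.0, 3.0, 1.0, 4.0]
col_indices = [0, 2, 0, 2]
row_offsets = [0, 1, 2, 4]
y = spmv_csr(values, col_indices, row_offsets, [1.0, 2.0, 3.0])
print(y)
```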

Brook Stream Programming Language
▪ Stanford graphics lab research project (2004)
▪ Abstracts GPU hardware as a data-parallel processor
▪ The Brook compiler converted a generic stream program into OpenGL commands such as drawTriangles() and a set of shader programs
GPU Computing Mode
General-Purpose Computing on Graphics Processing Units (GPGPUs)

CUDA Programming

Next Lectures
Thank you very much for choosing this course!

Give us your feedback!

https://forms.gle/zDdrPGCkN7ef3UG5A
