Lecture 17 - Introduction to GPU Programming
Presenter: Liangqiong Qu
Assistant Professor
• Clock rate for single processors increased from 10 MHz (Intel 286) to 4 GHz (Pentium 4) in 30
years
• Increasing the clock rate beyond about 5 GHz unfortunately hit a limit due to power consumption
and heat
Review of Last Lecture: Many-core GPUs
▪ Use of very many simple cores
• High throughput computing-oriented architecture
• Exploit massive parallelism by executing many
concurrent threads, each relatively slowly
• Handle an ever-increasing number of
instruction threads
• CPUs instead typically execute a single long thread as
fast as possible
▪ Many-core GPUs are used in large clusters and within
massively parallel supercomputers today
A Graphics Processing Unit (GPU) is great for data parallelism and task parallelism
Compared to multi-core CPUs, GPUs consist of a many-core architecture with hundreds to
even thousands of very simple cores executing threads rather slowly
Review of Last Lecture: Multi-core CPU and Multi-core GPU
• Both the CPU and GPU are silicon-based microprocessors, and have similar internal
components, including cores, memory, and control unit.
• The core functionality of the CPU is to fetch instructions from RAM and then decode and
execute the instructions, i.e., running processes serially.
• The CPU is a powerful general-purpose processor that handles all types of computing tasks
required for the operating system and applications to run.
• A GPU contains thousands of smaller, more specialized cores, each less powerful than a CPU core.
• A GPU breaks tasks into smaller chunks and runs them in parallel, rapidly processing high
volumes of the same instructions.
Review of Previous Lecture: Latency vs Throughput
• Latency: the delay between when a user takes an action on a network or web application
and when that action reaches its destination. It is typically measured in
milliseconds.
• Throughput: measures the volume of data that passes through a network in a given
period.
• See below: assume only one car can be in a lane of the highway at a time.
• When the car on the highway reaches Stanford, the next car leaves San Francisco.
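The highway analogy above can be turned into a small calculation. The sketch below is illustrative only: the distance, speed, and lane counts are made-up numbers, and the function names are hypothetical. It assumes, as in the analogy, that each lane holds one car at a time.

```python
def latency_hours(distance_km: float, speed_kmh: float) -> float:
    """Time for one car to travel the whole highway (the latency)."""
    return distance_km / speed_kmh


def throughput_cars_per_hour(distance_km: float, speed_kmh: float, lanes: int) -> float:
    """Cars arriving per hour when each lane holds only one car at a time.

    A lane delivers one car per latency period, so throughput grows with
    the number of lanes even though each car is no faster.
    """
    return lanes / latency_hours(distance_km, speed_kmh)


if __name__ == "__main__":
    # Assumed numbers: a 50 km highway driven at 100 km/h.
    print(latency_hours(50, 100))                # one fast lane: latency per car
    print(throughput_cars_per_hour(50, 100, 1))  # CPU-like: few lanes
    print(throughput_cars_per_hour(50, 100, 4))  # GPU-like: more lanes, same latency
```

Adding lanes raises throughput without changing latency, while raising the speed lowers latency. This mirrors the CPU (fast, few lanes) versus GPU (slower, many lanes) trade-off.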
Lower latency
• CPU handles all the main functions of a computer.
• CPU has a few cores, each with higher clock speeds and larger caches.
High throughput
• GPU is a specialized component that is best at running many smaller tasks at once.
• GPU has a massive number of smaller and more specialized cores (less powerful than CPU cores).
Outline
Vertices have the following attributes: (1) position in 3D space V(x, y, z), (2) color,
expressed in RGB or RGBA components, (3) vertex normal, (4) texture coordinates.
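As a minimal sketch, the four vertex attributes could be grouped into one record like this. The class name, field names, and the choice of 2D texture coordinates are assumptions made for illustration and are not taken from any particular graphics API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Vertex:
    """One mesh vertex carrying the four attributes listed above."""
    position: Tuple[float, float, float]      # (x, y, z) in 3D space
    color: Tuple[float, float, float, float]  # RGBA, each component in [0, 1]
    normal: Tuple[float, float, float]        # surface direction at the vertex
    texcoord: Tuple[float, float]             # assumed 2D (u, v) texture lookup


if __name__ == "__main__":
    v = Vertex(
        position=(0.0, 1.0, 0.0),
        color=(1.0, 0.0, 0.0, 1.0),  # opaque red
        normal=(0.0, 0.0, 1.0),
        texcoord=(0.5, 1.0),
    )
    print(v)
```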
Real-time Graphics Primitives (Entities)
Represent surface as a 3D triangle mesh
A primitive (such as a triangle, point, line, or quad) is formed by one or more
vertices. Primitives are the inputs to the graphics rendering pipeline.
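One common way to store a triangle mesh is a list of vertex positions plus a list of index triples, so that shared vertices are stored only once. The sketch below assumes that layout and uses a unit square split into two triangles as made-up example data. The names `POSITIONS`, `TRIANGLES`, and `triangle_vertices` are hypothetical.

```python
# Four corner positions of a unit square in the z = 0 plane (example data).
POSITIONS = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
]

# Each triangle primitive is three indices into POSITIONS.
# Vertices 0 and 2 are shared by both triangles.
TRIANGLES = [
    (0, 1, 2),
    (0, 2, 3),
]


def triangle_vertices(positions, triangles):
    """Expand index triples into the actual vertex positions of each primitive."""
    return [tuple(positions[i] for i in tri) for tri in triangles]


if __name__ == "__main__":
    for tri in triangle_vertices(POSITIONS, TRIANGLES):
        print(tri)
```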
▪ A pixel, short for "picture element," is the smallest unit of a digital image or display. It is
a tiny square or dot that represents a single point of color. A pixel is 2-dimensional, with
an (x, y) position and an RGB color value (no alpha value for pixels).
A fragment is 3-dimensional, with an (x, y, z) position. The (x, y) values are aligned with the 2D
pixel grid. The z-value (not grid-aligned) denotes the fragment's depth.
The 3D fragments are interpolated from the vertices of a primitive.
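To make the interpolation concrete, here is a minimal sketch that produces one fragment for a pixel covered by a triangle, using barycentric weights to interpolate depth. It assumes the vertices are already in screen space, samples at the integer pixel position, and ignores perspective correction. The function names are hypothetical.

```python
def edge(a, b, p):
    """2D cross product of (b - a) and (p - a); twice the signed area of (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def barycentric(v0, v1, v2, p):
    """Barycentric weights of 2D point p in triangle (v0, v1, v2), or None if degenerate."""
    area = edge(v0, v1, v2)
    if area == 0:
        return None
    return (edge(v1, v2, p) / area,
            edge(v2, v0, p) / area,
            edge(v0, v1, p) / area)


def make_fragment(triangle, px, py):
    """Return the fragment (px, py, z) if the pixel is covered, else None.

    `triangle` is three screen-space vertices (x, y, z). The (x, y) of the
    fragment stays on the pixel grid; only z is interpolated.
    """
    v0, v1, v2 = triangle
    weights = barycentric(v0, v1, v2, (px, py))
    if weights is None or min(weights) < 0:
        return None  # degenerate triangle, or pixel outside the triangle
    w0, w1, w2 = weights
    z = w0 * v0[2] + w1 * v1[2] + w2 * v2[2]
    return (px, py, z)


if __name__ == "__main__":
    tri = ((0, 0, 0.0), (4, 0, 1.0), (0, 4, 2.0))  # made-up example triangle
    print(make_fragment(tri, 1, 1))
    print(make_fragment(tri, 5, 5))
```

The same weights could interpolate any other vertex attribute, such as color or normals.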
Rendering a Picture
Step 4: compute the color of the primitive at each fragment
(based on scene lighting and primitive material properties)
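The slides do not specify a lighting model. As one simple illustration, the sketch below uses Lambertian (diffuse) shading with a single directional light, which is an assumption and not necessarily what the course uses. Here `albedo` stands in for the material property and `light_dir` points from the surface toward the light. All names are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)


def lambert_shade(normal, light_dir, albedo, light_color=(1.0, 1.0, 1.0)):
    """Diffuse color of a fragment: albedo * light_color * max(0, n . l)."""
    n = normalize(normal)
    l = normalize(light_dir)
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(a * c * n_dot_l for a, c in zip(albedo, light_color))


if __name__ == "__main__":
    # Surface facing the light directly: full albedo color.
    print(lambert_shade((0, 0, 1), (0, 0, 1), (1.0, 0.5, 0.25)))
    # Light behind the surface: no diffuse contribution.
    print(lambert_shade((0, 0, 1), (0, 0, -1), (1.0, 0.5, 0.25)))
```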
Step 5: write the color of the fragment closest to the camera into
the output image
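Keeping only the closest fragment per pixel is commonly done with a depth buffer (z-buffer). The sketch below assumes that smaller z means closer to the camera and that fragments arrive as `(x, y, z, color)` tuples. Both conventions, and the function name, are illustrative assumptions.

```python
import math


def resolve_depth(fragments, width, height, background=(0, 0, 0)):
    """Build an image that keeps, per pixel, the color of the nearest fragment."""
    depth = [[math.inf] * width for _ in range(height)]
    image = [[background] * width for _ in range(height)]
    for x, y, z, color in fragments:
        inside = 0 <= x < width and 0 <= y < height
        if inside and z < depth[y][x]:  # depth test: closer than what is stored?
            depth[y][x] = z
            image[y][x] = color
    return image


if __name__ == "__main__":
    RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
    frags = [
        (0, 0, 0.8, RED),    # far
        (0, 0, 0.3, BLUE),   # nearest at pixel (0, 0), so it wins
        (0, 0, 0.5, GREEN),  # arrives later but is farther than BLUE
    ]
    print(resolve_depth(frags, width=2, height=1))
```

Because each fragment is compared against the stored depth, the result does not depend on the order in which fragments are processed.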
Real-time Graphics Pipeline
CUDA Programming
Next Lectures
Thank you very much for choosing this course!
https://forms.gle/zDdrPGCkN7ef3UG5A