Shortest Path Algorithms Application to Traffic Assignment Problem:
Comparing Central Processing Unit (CPU) vs. Graphical Processing Unit (GPU)
Vishal Singh
Department of Computer Science & Engineering
University of Texas-Arlington
Arlington, TX
Advisor: Dr. Srinivas Peeta
Mentor: Dr. Xiaozheng He & Mr. Amit Kumar
NEXTRANS Center/Department of Civil Engineering
Purdue University
West Lafayette, Indiana
Traffic Assignment Problem
 A long-standing problem that, over the past five decades, has been addressed through a number of different iterative algorithms [3].
 It is the fourth phase of the classical urban
transportation planning system model following: Trip
Generation, Trip Distribution, and Mode Choice [4].
Figure 1: The Urban Transportation Model System.
Source: Pas (1995, p.65). Copyright 1995 by The
Guilford Press.
Traffic Assignment Problem (TAP)
 To estimate the volume of traffic on the links of the network.
 To provide estimates of travel costs between trip origins and destinations.
 To identify heavily traveled or congested arcs (links) as well as the routes used between each origin-destination (O-D) pair.
Traffic Assignment Problem (TAP)
 The ultimate goal of TAP is User Equilibrium, which is based on minimizing the travel time of individual users [3].
User Equilibrium
 User Equilibrium is achieved when no driver can improve his or her travel time by switching to an alternative path [2].
 Every used route connecting an origin and destination
has equal and minimal travel time.
Route 1 vs. Route 2
 Figure 1: The intersection marks the point where User Equilibrium is satisfied [2].
 Figure 2: No intersection means that Path 2 is a faster alternative than Path 1 [2].
Slope-based MultiPath Algorithm (SMPA)
 Several approaches have been established to solve TAP:
 The gradient projection (GP) algorithm of Jayakrishnan
 The Frank-Wolfe (F-W) algorithm
 The origin-based algorithm (OBA)
 SMPA seeks to move path costs toward the average cost for an O-D pair at each iteration.
Flow Update Mechanism
Figure 3: At each iteration, the flow update seeks to reduce the costs of the costlier paths, bringing them down to the average cost (Cav) for the O-D pair, and to increase the costs of the cheaper paths up to a value μ [3].
What is GPU Computing?
 GPU computing is the use of a GPU (graphics processing
unit) together with a CPU to accelerate general-purpose
scientific and engineering applications.
CUDA
 CUDA is NVIDIA's platform and programming model for GPU computing.
 It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).
 Well suited to computation-heavy workloads, engineering simulation, and massive data sets.
How is this beneficial to the TAP?
 In transportation engineering, simulations play a vital role in obtaining data and modeling networks.
 For networks such as Winnipeg and Austin, the data sets are so large that a CPU implementation would take hours.
 This is where GPU computing comes into play: it is efficient for massive data sets because parallel computing is exploited to a much greater extent.
GPU and CPU Architecture
 The hardware of a GPU has more ALUs (Arithmetic Logic Units) than a typical CPU [6].
 This gives it better capability for parallel arithmetic operations, meaning the same operation is performed on different data sets.
CPU + GPU
 CPUs consist of a few
cores optimized for
serial processing.
 GPUs consist of
thousands of smaller,
more efficient cores.
 Serial portions of the
code run on the CPU
while parallel portions
run on the GPU.
CPU vs. GPU
 CPUs are designed for a wide variety of applications and to provide fast response times to a single task.
 Their limited number of cores limits how many pieces of data can be processed simultaneously.
 GPUs, in contrast, are built specifically for rendering and other graphics applications that have a large degree of data parallelism [2].
 Their larger number of cores makes them ideal for throughput computing.
CPU Implementation
 The CPU code for the shortest path has been implemented in the C language, as it is among the most efficient in terms of computational speed.
 Dijkstra's algorithm is used to compute the shortest paths, as this step is the most time-consuming; it has been implemented successfully.
Constraints
 The implementation still has bugs, stemming from memory management problems as well as gaps in data structure knowledge.
 C is not the language I am most skilled in.
GPU Coding
 CUDA programming requires more time to digest, as the platform is relatively new and learning resources are limited.
 Programs written in CUDA are compiled by NVIDIA's nvcc compiler and can run only on NVIDIA GPUs, so this hardware restriction limits access for the programmer.
CPU vs. GPU comparison
Table 1: A simple implementation of the Floyd-Warshall all-pairs-shortest-path algorithm written in two versions: a standard serial CPU version and a CUDA GPU version [5].
On average, the GPU version is 45X faster!
Conclusion
 The GPU version of the shortest path algorithm has not yet been programmed in CUDA, so the CPU vs. GPU comparison is only partially complete.
 Sample output for the Floyd-Warshall shortest path algorithm suggests GPU speeds about 45 times faster [5].
 For smaller tasks, the GPU is not much faster than the CPU, as the overhead cost of data transfer outweighs the time saved by parallelization [6].
 Many factors play a role in the large performance gap, particularly which CPU and GPU are used and, especially, what optimizations are applied to the code on each platform [1].
What I learned essentially…
 The significance of the C language has been more evident than ever for me: it is clearly among the most time-efficient languages, but it is difficult to optimize due to its low-level nature; the compiler gets very few clues as to where data structures and algorithms can be optimized or parallelized.
 GPU computing is gaining momentum, as in today's age of massive data, parallel computation takes precedence.
 I will surely work on CUDA programming over the course of my undergraduate studies.
 Data structures are an area I want to gain a strong grasp of, since without structure we cannot convert data into information.
My Doctorate Analogy
 Grad school is like an isolated journey toward monkhood, with the student comparable to the likes of Luke Skywalker.
 The advisor assumes the role of Yoda, the wise one.
References
[1] Abhranil Das. Process Time Comparison between GPU and CPU. High Performance Computing on Graphics Processing Units. Hamburg University. (July 2011), pp. 1-11.
[2] Jesse Gawling. CUDA Floyd Warshall. GitHub.com. Collaborative Revision Control. (March 2013). Web. (July 2013).
[3] R. A. Johnston. "The Urban Transportation Planning Process." 2004. Book chapter in The Geography of Urban Transportation. Ed. Susan Hanson and Genevieve Giuliano.
[4] Srinivas Peeta, Amit Kumar. Slope-Based Multipath Flow Update Algorithm for Static User Equilibrium Traffic Assignment Problem. Transportation Research Record: Journal of the Transportation Research Board, Vol. 2196. (February 2010), pp. 1-10.
[5] Stephen D. Boyles. User Equilibrium and System Optimum.
https://webspace.utexas.edu/sdb382/www/teaching/ce392c/ueso.pdf
[6] Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony
D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per
Hammarlund, Ronak Singhal, Pradeep Dubey. Debunking the 100X GPU vs. CPU Myth:
An Evaluation of Throughput Computing on CPU and GPU. SIGARCH Comput. Archit.
News, Vol. 38, No. 3. (June 2010), pp. 451-460
Acknowledgements
 Srinivas Peeta, Ph.D.
NEXTRANS Center Director
Purdue University
Professor of Civil Engineering
 Xiaozheng "Sean" He, Ph.D.
Research Associate
Purdue University
Department of Civil Engineering
 Amit Kumar
Doctoral Student
Purdue University
Department of Civil Engineering
 Kumer Pial Das, Ph.D.
Lamar University
Department of Mathematics
 Mamta Singh, Ph.D. (My Mum)
Lamar University
Department of Teacher Education