CPUs GPUs Accelerators

Uploaded by

Kevin William Daniels

CPUs, GPUs and accelerators

x86
Intel server micro-architectures (1/2)

Microarchitecture | Technology | Launch year | Highlights
Skylake-SP | 14nm | 2017 | Improved frontend and execution units; more load/store bandwidth; improved hyperthreading; AVX-512
Cascade Lake | 14nm++ | 2019 | Vector Neural Network Instructions (VNNI) to improve inference performance; support for 3D XPoint-based memory modules and Optane DC; security mitigations
Cooper Lake | 14nm++ | 2020 | bfloat16 (brain floating point format)
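
The bfloat16 format listed for Cooper Lake keeps the sign bit and the 8 exponent bits of an IEEE float32 and shortens the mantissa to 7 bits, so a bfloat16 value is essentially the upper 16 bits of a float32. The following is a minimal Python sketch of that conversion, not a description of how the hardware does it: the helper names are illustrative, round-to-nearest-even is an assumed rounding mode, and NaN handling is left out.

```python
import struct

def float32_bits(x):
    """Return the IEEE 754 float32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def to_bfloat16_bits(x):
    """Convert x to a 16-bit bfloat16 pattern.

    Assumption: round-to-nearest-even; NaN inputs are not handled.
    """
    bits = float32_bits(x)
    # Adding 0x7FFF plus the lowest kept bit makes ties round to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF

def from_bfloat16_bits(b):
    """Expand a bfloat16 pattern to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

if __name__ == "__main__":
    for value in (1.0, 3.14159, 1.0e38):
        b = to_bfloat16_bits(value)
        print(f"{value!r} -> 0x{b:04X} -> {from_bfloat16_bits(b)!r}")
```

The round trip keeps the full float32 range (1.0e38 survives) but only two to three significant decimal digits, which is the trade-off that makes the format attractive for neural network workloads.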

Intel server micro-architectures (2/2)

Microarchitecture | Technology | Launch year | Highlights | CPU codename
Sunny Cove | 10nm+ | 2019 | Single threaded performance; new instructions; improved scalability; larger L1, L2, μop caches and 2nd level TLB; more execution ports; AVX-512 | Ice Lake Scalable, Tiger Lake?
Willow Cove | 10nm | 2020? | Cache redesign; new transistor optimization; security features | ?
Golden Cove | 7/10nm? | 2021? | Single threaded performance; AI performance; networking/5G performance; security features | ?

Source: Anandtech
Other Intel CPU architectures
● Intel Nervana AI Processor NNP-L-1000 (H2 2019-)
○ Accelerates AI inference for companies with high workload demands
○ Optimized across memory, bandwidth, utilization and power
○ Spring Crest offers 3-4x faster training than Lake Crest (introduced in 2017)
○ Supports bfloat16
● Hybrid CPUs
○ Will be enabled by Foveros, Intel's recently demonstrated 3D chip-stacking technology
● Itanium
○ Will finally be discontinued in 2021 (the only remaining customer is HP)
Other Intel-related news
● Record Q3 2018 results
○ Data-centric revenue rose 22%
○ PC revenue rose 16%
● Could not keep up with demand for the latest Xeon chips in 2018
● Serious issues with the 10nm process, which is years behind schedule
○ Pushing the 14nm process to its limits
○ Claims that volume deliveries are on track for late 2019 and later
○ 10nm will be superseded sooner than intended by 7nm, which will be based on EUV lithography
■ Hopes that 7nm will put Intel back on track with Moore’s Law
AMD News
● Next gen desktop Matisse CPU (7nm) using Zen2 core achieves IPC parity
with Intel, consumes less power and supports PCIe 4.0
○ Improved branch predictor unit and prefetcher, better micro-op cache management, larger
micro-op cache, increased dispatch bandwidth, increased retire bandwidth, native support for
256-bit floating point math, double size FMA units, double size load-store units
● CSC announced an upcoming supercomputer using 3125 64-core EPYC
“Rome” CPUs in 2020
● Market trend
○ Revenue increased by 23% in 2018 and profitability is at its highest since 2011
AMD EPYC Naples (since Q2 ‘17)
AMD EPYC processors target the datacenter, in particular (but not only) single-socket servers. EPYC Naples (Zen
architecture) is a single package made up of 4 separate dies (multi-chip module), interconnected with Infinity Fabric links.

Main specs:

● 4 dies per chip (14nm), each die embedding IO and memory controllers, no chipset, SP3 sockets
● range of frequencies: 2.0-2.4 GHz, turbo up to 3.2 GHz
● 8 DDR4 memory channels with on-the-fly hardware encryption, up to 2666 MHz
● up to 32 cores (64 threads)
● up to 128 PCIe gen3 lanes per processor (64 per processor in dual-socket systems)
● TDP range: 120W-180W

EPYC Naples processors deliver computing power similar to Intel Skylake processors (HS06 benchmarks on CPUs with
close frequencies and core counts) at prices up to 49% lower (AMD claim).

Mostly compatible with Intel-based x86, so user code rarely needs to be modified.
AMD EPYC Rome (starting Q2 ‘19)
The next AMD EPYC generation (Zen2 based) embeds 9 dies: eight 7nm CPU chiplets around one 14nm I/O die. All I/O and
memory access is concentrated in that single die.

Main specs:

● 9 dies per chip: a single 14nm IO/memory die and eight 7nm CPU chiplets
● 8 DDR4 memory channels, up to 3200 MHz
● up to 64 cores (128 threads) per processor
● up to 128 PCIe gen3/4 lanes per processor
● SP3 / LGA-4094 sockets
● TDP range: 120W-225W (max 190W for SP3 compatibility)

Claimed +20% performance per Zen2 core (over Zen) and +75% for the whole chip, at a TDP similar to Naples.

Available in the DELL C6525 chassis starting from October.
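
As a rough illustration of what the Rome figures above imply, the sketch below derives the theoretical peak memory bandwidth and the share available per core. It assumes that the "3200 MHz" on the slide means 3200 mega-transfers per second and that each DDR4 channel is 64 bits (8 bytes) wide; the function name is illustrative, and sustained bandwidth in practice is lower than this peak.

```python
def peak_memory_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s.

    Assumption: one 64-bit (8-byte) transfer per channel per memory clock edge.
    """
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

# Figures taken from the Rome slide: 8 channels, DDR4-3200, up to 64 cores.
rome_bandwidth = peak_memory_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
rome_per_core = rome_bandwidth / 64

if __name__ == "__main__":
    print(f"Rome peak bandwidth: {rome_bandwidth:.1f} GB/s")
    print(f"Per core at 64 cores: {rome_per_core:.1f} GB/s")
```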

Manufacturing technologies
● 10nm
○ Intel will ramp up in 2019, several years late
○ Relies on DUVL (deep ultraviolet lithography, 193nm wavelength laser), which requires heavy and
problematic use of multipatterning
● 7nm
○ Uses EUVL (extreme ultraviolet lithography, 13.5nm wavelength), reducing the use of multipatterning and
reducing costs
○ Intel claims to be on track and to start at the end of 2019
○ TSMC already making or will make chips for AMD, Apple, Nvidia and Qualcomm
○ Samsung Foundry started production and will make POWER CPUs for IBM from 2020
○ GlobalFoundries put it on hold indefinitely
● 5, 3, ? nm
○ Design costs increase exponentially
○ [to be expanded]
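
The link between wavelength and multipatterning can be made concrete with the Rayleigh resolution criterion, half-pitch = k1 × wavelength / NA. The sketch below is only an estimate under stated assumptions: the numerical apertures (1.35 for 193nm immersion, 0.33 for EUV) and the single-exposure limit k1 = 0.25 are typical textbook values, not figures from the slides.

```python
def min_half_pitch_nm(wavelength_nm, numerical_aperture, k1=0.25):
    """Rayleigh criterion: smallest half-pitch printable in a single exposure.

    Assumption: k1 = 0.25 is the theoretical single-exposure limit.
    """
    return k1 * wavelength_nm / numerical_aperture

# Assumed scanner parameters (not from the slides).
duv_half_pitch = min_half_pitch_nm(193.0, 1.35)  # 193nm immersion DUV
euv_half_pitch = min_half_pitch_nm(13.5, 0.33)   # EUV

if __name__ == "__main__":
    print(f"DUV single-exposure limit: {duv_half_pitch:.1f} nm half-pitch")
    print(f"EUV single-exposure limit: {euv_half_pitch:.1f} nm half-pitch")
```

Features smaller than the DUV limit have to be built from several exposures (multipatterning), which is what the shorter EUV wavelength largely avoids.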
GPUs - NVIDIA Architecture
● In terms of raw power, GPUs are following
exponential trends in the
number of transistors and cores
● New features appear unexpectedly,
driven by the market
AMD Vega 20
● The 7nm process shrinks the die, leaving more space for HBM2
memory, up to 32GB
● 2x bandwidth per ROP, texture unit, and ALU compared with Vega 10
● Added support for INT8 and INT4 data types
○ useful for low-precision inference
● PCI Express 4 on AMD Radeon Instinct MI60
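
To show why INT8 data types are useful for low-precision inference, the sketch below quantizes two float vectors to 8-bit integers, computes their dot product with integer arithmetic, and rescales the result. It is a minimal illustration assuming symmetric per-vector scaling; the function names are hypothetical and do not correspond to any vendor API.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] with one symmetric scale factor."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid a zero scale
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def int8_dot(qa, scale_a, qb, scale_b):
    """Integer dot product (hardware would accumulate in int32), then rescale."""
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * scale_a * scale_b

if __name__ == "__main__":
    a = [0.5, -1.0, 0.25, 0.75]
    b = [1.0, 0.5, -0.5, 0.25]
    exact = sum(x * y for x, y in zip(a, b))
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    print("exact:", exact, "int8 approximation:", int8_dot(qa, sa, qb, sb))
```

The approximation error stays small relative to the input range, while the multiplications operate on 8-bit operands, which is where the throughput and bandwidth gain comes from.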
Intel
Entering the discrete GPU market in 2020 with “Xe”

At the moment, just rumors

Tensor cores on NVIDIA Volta and AMD Vega 20
Tensor cores integrated on the GPU

Fast half-precision multiplication, with the reduction (accumulation) done in full precision

Useful for accelerating NN training/inference
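
The benefit of reducing in full precision can be seen by emulating the arithmetic in software. The sketch below multiplies half-precision inputs and accumulates either in half precision or in single precision; it only models the numerics with Python's `struct` rounding and says nothing about how tensor cores are implemented. The helper names are illustrative.

```python
import struct

def round_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot_fp16_accumulate(a, b):
    """Half-precision products summed in a half-precision accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_fp16(acc + round_fp16(x * y))
    return acc

def dot_fp32_accumulate(a, b):
    """Half-precision inputs, products summed in a single-precision accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_fp32(acc + x * y)
    return acc

if __name__ == "__main__":
    ones = [round_fp16(1.0)] * 4096
    print("fp16 accumulator:", dot_fp16_accumulate(ones, ones))
    print("fp32 accumulator:", dot_fp32_accumulate(ones, ones))
```

With 4096 ones, the half-precision accumulator stalls at 2048 because 2049 is not representable in fp16, while the single-precision accumulator reaches the exact sum.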

GPUs - Programmability
● NVIDIA CUDA:
○ C++ based (supports C++14)
○ Many external projects
○ New hardware features available with no delay in the API
● OpenCL:
○ Limited support from NVIDIA
○ Can execute on CPU/iGPU/NVIDIA/AMD and recently Intel FPGAs
○ Overpromised in the past and never became widely popular
● Compiler directives: OpenMP/OpenACC
○ Latest GCC and LLVM include support for offloading to NVIDIA GPUs
● AMD HIP:
○ Similar to CUDA, still supports only a subset of the features
● GPU-enabled frameworks to hide complexity (TensorFlow)
GPUs - Programmability
The issue is performance portability and code duplication

At the moment, the only possible solutions are based on trade-offs and DSLs for very simple codes

● might work very well for analysis/ML, less so for reconstruction

Examples: Hemi, Kokkos, RAJA...
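
The idea behind such portability layers is to write the loop body once and select the execution backend separately. The sketch below illustrates that pattern in plain Python with a serial and a thread-based backend; `parallel_for` and its `policy` argument are hypothetical names chosen for this example and are not the API of Hemi, Kokkos or RAJA.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_for(n, body, policy="serial", workers=4):
    """Call body(i) for every i in range(n) using the selected backend."""
    if policy == "serial":
        for i in range(n):
            body(i)
    elif policy == "threads":
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() waits for completion and re-raises exceptions from body.
            list(pool.map(body, range(n)))
    else:
        raise ValueError(f"unknown policy: {policy}")

def saxpy(a, x, y, policy="serial"):
    """Single-source kernel: the body is identical for every backend."""
    out = [0.0] * len(x)

    def body(i):
        out[i] = a * x[i] + y[i]

    parallel_for(len(x), body, policy)
    return out

if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0], policy="threads"))
```

The trade-off mentioned above shows up here as well: the abstraction works for a simple element-wise kernel, but it hides nothing about memory layout or data movement, which dominate more complex reconstruction code.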

GPUs in LHC experiments software frameworks
• Alice, O2 • LHCb (online) Allen: HLT-1 reduces 5TB/s
input to 130GB/s:
○ Tracking in TPC and ITS
○ Modern GPU replaces 40 CPU cores ○ Track reconstruction, muon-id,
two-tracks vertex/mass
• CMS, CMSSW reconstruction
○ Demonstrated advantage of ○ GPUs can be used to accelerate
heterogeneous reconstruction from the entire HLT-1 from RAW data
RAW to Pixel Vertices at the CMS HLT ○ Events too small, have to be
batched: makes the integration in
○ 1 order of magnitude both in speed-up Gaudi difficult
and energy efficiency wrt full Xeon • ATLAS
socket
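
The batching requirement noted for LHCb can be sketched as follows: small events are grouped into fixed-size batches so that each offload to the GPU carries enough work. This is a generic illustration with a hypothetical `batch_events` helper and an arbitrary batch size, not code from Allen or Gaudi.

```python
def batch_events(events, batch_size):
    """Yield lists of up to batch_size events, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final, possibly smaller, batch.
        yield batch

if __name__ == "__main__":
    for batch in batch_events(range(10), batch_size=4):
        # A real framework would copy the batch to the device and launch kernels here.
        print(batch)
```

The difficulty hinted at in the slide is that a framework built around processing one event at a time has no natural place for the accumulation step shown here.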
FPGA
● Players: Xilinx (US), Intel (US), Lattice Semiconductor
(US), Microsemi (US), and QuickLogic (US), TSMC
(Taiwan), Microchip Technology (US), United
Microelectronics (Taiwan), GLOBALFOUNDRIES
(US), Achronix (US), and S2C Inc. (US)
● Market was valued at USD 5.34 billion in 2016 and is
expected to reach USD 9.50 billion by 2023
● Growing demand for advanced driver-assistance
systems (ADAS), developments in IoT and reduced
time-to-market are the key driving factors
● Telecommunications held the largest share of the FPGA
market in 2016

Source:
https://www.marketsandmarkets.com/Market-Reports/fpga-market-194123367.html
https://www.gra…
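
The two FPGA market valuations quoted above imply a compound annual growth rate, which the short sketch below works out. The only inputs are the USD 5.34 billion (2016) and USD 9.50 billion (2023) figures from the slide; the function name is illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values separated by `years`."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Figures from the slide: USD 5.34 billion in 2016, USD 9.50 billion in 2023.
fpga_market_cagr = cagr(5.34, 9.50, 2023 - 2016)

if __name__ == "__main__":
    print(f"Implied FPGA market CAGR: {fpga_market_cagr:.1%}")
```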
FPGAs for Application Acceleration
● Programmability without sacrificing efficiency
● Highly suited for low latency applications
● Accelerates edge and streaming applications

Source: https://www.nextplatform.com/2018/10/15/where-the-fpga-hits-the-server-road-for-inference-acceleration/

Category | 20 nm | 16 nm | 14 nm
Best Performance (fastest, most powerful) | Xilinx® Virtex® UltraScale® | Xilinx® Virtex® UltraScale+®, Zynq® UltraScale+® | Intel® Stratix® 10
Best Price/performance/watt (balance of cost, power, performance) | Intel® Arria® 10, Xilinx® Kintex® UltraScale® | |
Cost-Optimized (low system cost plus performance) | Intel® Cyclone® 10 GX | |

Source: https://www.intel.com/content/www/us/en/programmable/documentation/mtr1422491996806.html#qom1512594527835__fn_soc_variab_avail_xlx
FPGA Programming
● Application acceleration device with APIs
○ Targeted at specific use cases
■ Neural inference engine
■ MATLAB
■ LabVIEW FPGA
● C / C++ / System C
○ High level synthesis
○ Control with compiler switches and configurations
● VHDL / Verilog
○ Low level programming
● OpenCL
○ Very high level abstraction
○ Optimized for data parallelism

Source: https://www.eet…
FPGAs in HEP
● High Level Triggers
○ https://cds.cern.ch/record/2647951
● Deep Neural Networks
○ https://arxiv.org/abs/1804.06913
○ https://indico.cern.ch/event/703881/
● High Throughput Data Processing
○ https://indico.cern.ch/event/669298/
