Hardware Accelerators for Machine Learning
(CS 217)
Stanford University, Winter 2020
This course provides in-depth coverage of the architectural techniques used to design accelerators for training and inference in machine learning systems. It covers classical ML algorithms such as linear regression and support vector machines, as well as DNN models such as convolutional and recurrent neural nets. We will consider both training and inference for these models and discuss the impact of parameters such as batch size, precision, sparsity, and compression on model accuracy. We will cover the design of accelerators for ML model inference and training. Students will become familiar with hardware implementation techniques that use parallelism, locality, and low precision to implement the core computational kernels of ML. To design energy-efficient accelerators, students will develop the intuition to make trade-offs between ML model parameters and hardware implementation techniques. Students will read recent research papers and complete a design project.
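To make these kernel-level trade-offs concrete, here is a minimal illustrative sketch (not part of the course materials) of the kind of GEMM kernel the course studies: blocking for locality, int8 inputs with int32 accumulation for low precision, and float32 only at the boundaries. The function names, block size, and matrix sizes are arbitrary choices for this example.

```python
# Illustrative sketch only: a blocked int8 matrix multiply showing the
# locality and low-precision techniques mentioned above. All names and
# the block size are arbitrary example choices.
import numpy as np

def quantize_int8(x):
    """Map a float32 tensor to int8 plus a per-tensor scale factor."""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

def blocked_int8_gemm(a, b, block=64):
    """C = A @ B computed tile-by-tile in int8, accumulated in int32.

    Blocking keeps one tile of each operand in fast memory (cache, or
    on-chip SRAM on an accelerator), improving locality; int8 inputs
    with int32 accumulation mirror common inference datapaths.
    """
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    m, k = qa.shape
    _, n = qb.shape
    c = np.zeros((m, n), dtype=np.int32)
    for i in range(0, m, block):
        for j in range(0, n, block):
            for p in range(0, k, block):
                # Cast to int32 before multiplying so int8 products
                # cannot overflow during accumulation.
                c[i:i+block, j:j+block] += (
                    qa[i:i+block, p:p+block].astype(np.int32)
                    @ qb[p:p+block, j:j+block].astype(np.int32)
                )
    return c.astype(np.float32) * (sa * sb)  # dequantize the result

a = np.random.randn(256, 256).astype(np.float32)
b = np.random.randn(256, 256).astype(np.float32)
err = np.abs(blocked_int8_gemm(a, b) - a @ b).max()
print(f"max abs error vs float32 GEMM: {err:.3f}")
```

The printed error gives a feel for the precision-versus-accuracy trade-off; on a real accelerator the same tiling structure maps to on-chip buffers and wide accumulators.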
Instructors and office hours:
- Ardavan Pedram (Office Hours TBA)
- Kunle Olukotun (Office Hours TBA)
This class meets Tuesday and Thursday from 10:30 - 11:50 AM in Huang Engineering Center 18.
Teaching Assistants
- Nathan Zhang (Office Hours TBA)
Class Information
Funding for this research/activity was partially provided by the National Science Foundation
Division of Computing and Communication Foundations under award number 1563113.
Schedule
| Lecture | Date | Topic | Reading Assignments | Spatial Assignment |
| --- | --- | --- | --- | --- |
| 1 | 1/07/2020 | Introduction, Software 2.0 | Role of hardware accelerators in the post-Dennard and Moore era | |
| 2 | 1/09/2020 | Kian Katanforoosh: Classical ML algorithms: regression, SVMs | Is Dark Silicon Useful?; Hennessy & Patterson, Chapter 7.1-7.2 | |
| 3 | 1/14/2020 | Linear algebra fundamentals and accelerating linear algebra; BLAS operations; 20th-century techniques: systolic arrays, MIMDs, CGRAs | Why Systolic Architectures?; Anatomy of High Performance GEMM; Dark Memory | Linear Algebra Accelerators |
| 4 | 1/16/2020 | Introduction to Spatial: Codesign Tradeoffs | Analyzing Performance and Energy with Aladdin; Spatial | Spatial |
| 5 | 1/21/2020 | MLPs and CNNs: Inference | Efficient Processing of DNNs; NVIDIA Tesla V100 | |
| 6 | 1/23/2020 | Evaluating performance, energy efficiency, parallelism, locality, memory hierarchy, roofline model (see the sketch after this table); Luigi Nardi: Design Space Optimization with Spatial | Roofline Model; Google TPU | |
| 7 | 1/28/2020 | Boris Ginsburg: Generalization and Regularization of Training | Caterpillar; Optimizing Gradient Descent | |
| 8 | 1/30/2020 | Azalia Mirhoseini: Reinforcement Learning for Hardware Design | A Beginner's Guide to RL; Resource Management with DRL | |
| 9 | 2/04/2020 | Fanny Nina Paravecino: Catapult and Brainwave | Catapult; Brainwave | |
| 10 | 2/06/2020 | Amir Gholami: Quantized Deep Learning | SqueezeNext; CNN Inference Accelerators | |
| 11 | 2/11/2020 | Tze Meng Low: Fast Implementation of Deep Learning Kernels | Systematic Approach to Blocking; High Performance Zero-Memory Overhead Direct Convolutions | |
| 12 | 2/13/2020 | Guest lecture: Paulius Micikevicius, GPU Design Tradeoffs for Deep Learning and MLPerf | Mixed Precision Training; NVIDIA Mixed Precision; Training With 8-bit Floating Point; NVIDIA Volta | |
| 13 | 2/18/2020 | Guest lecture: Cliff Young, Neural Networks Have Rebooted Computing: What Should We Reboot Next? | DawnBench; MLPerf | |
| 14 | 2/20/2020 | Mohammad Shoeybi: Accelerating Natural Language Processing | GNMT; BERT | |
| 15 | 2/25/2020 | Mikhail Smelyanskiy: AI at Facebook Datacenter Scale | ML @ Facebook | |
| 16 | 2/27/2020 | Assignment 1 feedback and midterm discussion | | |
| 17 | 3/03/2020 | Boris Ginsburg: Large Scale Training | Revisiting Small Batch Training for Neural Networks; Large Batch Training of Convolutional Networks; Deep Learning at Supercomputer Scale; Deep Gradient Compression | |
| 18 | 3/05/2020 | Sparsity in Deep Learning | EIE; Campfire | |
| 19 | 3/10/2020 | Machine Learning Systems and Software Stack | Taxonomy of Accelerator Architectures; ML Systems Stuck in a Rut | |
| 20 | 3/12/2020 | TBD | | |
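As a preview of the roofline analysis covered in Lecture 6 (the sketch referenced in the table above), here is a small illustrative calculation, not course material. The hardware numbers are approximate public figures for a V100-class GPU, used only as placeholders; the model says attainable throughput is min(peak compute, operational intensity x memory bandwidth).

```python
# Illustrative roofline sketch: attainable FLOP/s is capped either by peak
# compute or by memory bandwidth times operational intensity (FLOPs/byte).
PEAK_FLOPS = 15.7e12   # placeholder: roughly V100 fp32 peak, FLOP/s
PEAK_BW = 900e9        # placeholder: roughly V100 HBM2 bandwidth, bytes/s

def roofline(op_intensity):
    """Attainable FLOP/s for a kernel with the given FLOPs-per-byte."""
    return min(PEAK_FLOPS, op_intensity * PEAK_BW)

# A kernel is memory-bound below the "ridge point" and compute-bound above.
ridge = PEAK_FLOPS / PEAK_BW  # about 17.4 FLOPs/byte for these numbers
for oi in (1, 4, 16, 64):
    bound = "memory" if oi < ridge else "compute"
    print(f"OI={oi:>3} FLOPs/byte -> {roofline(oi)/1e12:5.1f} TFLOP/s ({bound}-bound)")
```

The ridge point explains why low-precision and blocked kernels matter: both raise operational intensity, pushing kernels from the memory-bound into the compute-bound region.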
Guest Lectures
Kian Katanforoosh, [Link] and Stanford University
From Machine Learning to Deep Learning: a computational transition
Thursday January 9, 2020
Luigi Nardi, Lund University and Stanford University
Design Space Optimization with Spatial
Thursday January 23, 2020
Boris Ginsburg, NVIDIA
Generalization and Regularization of Training
Tuesday January 28, 2020
Azalia Mirhoseini, Google Brain
Reinforcement Learning and Hardware Design
Thursday January 30, 2020
Fanny Nina Paravecino, Microsoft Research
Real-Time AI at Cloud Scale with Project Brainwave
Tuesday February 4, 2020
Amir Gholami, UC Berkeley
Precision and Quantized Training for Deep Learning
Thursday February 6, 2020
Tze Meng Low, Carnegie Mellon University
Fast Implementation of Deep Learning Kernels
Tuesday February 11, 2020
Paulius Micikevicius, NVIDIA
GPU Design Tradeoffs for Deep Learning and MLPerf
Thursday February 13, 2020
Cliff Young, Google
Neural Networks Have Rebooted Computing: What Should We Reboot Next?
Tuesday February 18, 2020
Mohammad Shoeybi, NVIDIA
Natural Language Processing
Thursday February 20, 2020
Mikhail Smelyanskiy, Facebook
AI at Facebook Datacenter Scale
Tuesday February 25, 2020
Boris Ginsburg, NVIDIA
Large Batch Training of Convolutional Networks
Tuesday March 3, 2020
Lecture Notes (Fall 2018)
Related Stanford Courses
CS230
CS231n
STATS 385
Reading list and other resources
Basic information about deep learning
Cheat sheet – things that everyone needs to know
Blogs
Grading