Lecture 1: Introduction
Xuming He
SIST, ShanghaiTech
Fall, 2020
9/7/2020 Xuming He – CS 280 Deep Learning 1
Outline
Course logistics
Overall objective
Grading policy
Pre-requisite / Syllabus
Introduction to deep learning
Machine learning review
Artificial neurons
Course objectives
Learning to use deep networks
How to write, debug and train neural networks from scratch
Toolboxes commonly used in practice
Understanding deep models
Key concepts and principles
State of the art
Some new topics from the research field
Focusing on vision-related problems
Syllabus & Schedule
Piazza: piazza.com/shanghaitech.edu.cn/fall2020/cs280
The schedule for the latter half of the semester may vary a bit
Part I: Basic neural networks (1~1.5 weeks, by Prof He)
Linear models
Multi-layer networks
Gradient descent and backpropagation (BP)
Part II: Convolutional neural networks (4 weeks, by Prof He)
CNN basics
Understanding CNNs
CNNs in vision
Part III: Recurrent neural networks (3 weeks, by Prof Xu)
LSTM, GRU
Attention modeling
RNNs in vision/NLP
Transformers and graph neural networks
Part IV: Generative neural networks (2 weeks, by Prof Xu)
Variational autoencoder (VAE)
Generative adversarial networks (GAN)
Part V: Advanced topics (2 weeks)
Note: no lectures Nov 9 ~ Nov 16 (CVPR)
Reference books and materials
Deep learning:
http://www.deeplearningbook.org/
https://d2l.ai/
Online deep learning courses:
Stanford: CS230, CS231n
CMU: 11-785
MIT: 6.S191
Additional reading materials on Piazza
Survey papers, tutorials, etc.
Instructor and TAs
Instructor: Prof Xuming He and Prof Lan Xu
[email protected] ;
[email protected] SIST 1A-304D ; 1C-203D
TAs:
Haozhe Wang, Qiuyue Wang, Guoxing Sun, Yannan He, Quan
Meng, Yinwenqi Jiang
Office hours: To be announced on Piazza
We will use Piazza as the main communication platform
Grading policy
4 Problem sets: 10% x 4 = 40%
Written problem sets + programming tasks
Final course project: 40% (+10%)
Proposal
Final report (Conference format)
Presentation
Bonus points for novel results: 10%
10 Quizzes (in class): 2% x 10 = 20%
Late policy
A total of 7 free late (calendar) days to use, but no more than 4 late days can be
used on any single assignment.
After that, 25% off per day late
Does not apply to Final course project/Quizzes
Collaboration policy
Project team: 3~5 students
Grading according to each member’s contribution
Administrative Stuff
Plagiarism
All assignments must be done individually
You may not look at solutions from any other source
You may not share solutions with any other students
Plagiarism detection software will be used on all the programming
assignments
You may discuss ideas with or help another student, but you may not
give the exact solution
Plagiarism punishment
When one student copies from another student, both students
are responsible
Zero point on the assignment or exam in question
Repeated violation will result in an F grade for this course as well
as further discipline at the school/university level
Pre-requisite
Proficiency in Python
All class assignments will be in Python (and use numpy)
A Python tutorial available on Piazza
Calculus, Linear Algebra, Probability and Statistics
Undergrad course level
Equivalent knowledge of Andrew Ng’s CS229 (Machine
Learning)
Formulating cost functions
Taking derivatives
Performing optimization with gradient descent
Will be evaluated in the next quiz (Wednesday)
Outline
Course logistics
Introduction to deep learning
What & Why deep learning?
Machine Learning review
Artificial neurons
Acknowledgement: Bhiksha Raj@CMU’s course notes
Introduction
Our goal: Build intelligent algorithms to make sense of data
Example: Recognizing objects in images
red panda (Ailurus fulgens)
Example: Predicting what would happen next
Vondrick et al. CVPR2016
Introduction
Our goal: Build intelligent algorithms to make sense of data
Example: Recognizing objects in images
Example: Predicting what would happen next
Introduction
A broad range of real-world applications
Speech recognition
Input: sound wave → Output: transcript
Language translation
Input: text in language A (Eng) → Output: text in language B (Chs)
Image classification
Input: images → Output: image category (cat, dog, car, house, etc.)
Autonomous driving
Input: sensory inputs → Output: actions (straight, left, right, stop, etc.)
Main challenge: it is difficult to design such algorithms manually
A data-driven approach
Each task as a mapping function (or a model)
Mapping function
Input data Expected output
input data: images
expected output: object or action names
Building such mapping functions from data
Mapping function
red panda (Ailurus fulgens)
A data-driven approach
Building a mapping function (model)
x: input data
y: expected output
: parameters to be estimated
Learning the model from data
Given a dataset
Find the ‘best’ parameter , such that
And it can be generalized to unseen input data
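As a minimal sketch of this setup (the linear form of the mapping, the dataset, and all names below are illustrative choices, not the course's notation), one can fit f(x; theta) to a dataset by least squares and then apply it to an unseen input:

```python
import numpy as np

# A parametric mapping f(x; theta) = theta[0] + theta[1] * x, fit to a
# dataset of (input, expected output) pairs by least squares.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=100)  # noisy "true" mapping

# Find the 'best' parameters theta minimizing squared error on the dataset.
X = np.stack([np.ones_like(x), x], axis=1)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The learned model should generalize to unseen input data.
x_new = 0.3
y_pred = theta[0] + theta[1] * x_new
```

Deep learning replaces this hand-picked linear form with a neural network, but the recipe (parametric model, dataset, fitting, prediction on unseen data) is the same.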
What is deep learning?
Using deep neural networks as the mapping function
Model: Deep neural networks
A family of parametric models
Consisting of many ‘simple’ computational units
Constructing a multi-layer representation of input
Image from Jeff Clune’s Deep Learning Overview
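A tiny illustration of many 'simple' computational units arranged in layers (the layer sizes and the ReLU nonlinearity below are my assumptions for the sketch, not the figure's exact network):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)  # a 'simple' computational unit's nonlinearity

def forward(x, weights, biases):
    """Multi-layer representation: each layer re-represents its input."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)                  # hidden layer: linear + nonlinearity
    return weights[-1] @ h + biases[-1]      # output layer: linear scores

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]   # input dim 4, two hidden layers of 8 units, 3 outputs
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

scores = forward(rng.normal(size=4), weights, biases)  # shape (3,)
```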
What is deep learning?
Using deep neural networks as the mapping function
Learning: Parameter estimation from data
Parameters: connection weights between units
Formulated as an optimization problem
Efficient algorithms for handling large-scale models & datasets
Why deep networks?
Inspiration from visual cortex
Why deep networks?
A deep architecture can represent certain functions
(exponentially) more compactly
Learning a rich representation of input data
Recent success with DL
Some recent successes with neural networks
The ImageNet Image Classification Challenge: 1,000 object classes, 1,431,167 images (Russakovsky et al., arXiv, 2014)
Example: top prediction "steel drum"
A bit of hyperbole, but still...
[Slide figure from Fei-Fei Li, Justin Johnson & Serena Yeung, CS231n Lecture 1, 4/4/2017]
Summary: Why deep learning?
One of the major thrust areas recently in various pattern
recognition, prediction and data analysis
Efficient representation of data and computation
Other key factors: large datasets and hardware
The state of the art in many problems
Often exceeding previous benchmarks by large margins
Achieves better performance than humans on certain "complex" tasks
But also somewhat controversial …
Lack of theoretical understanding
Sometimes difficult to make it work in practice
Is it alchemy?
Questions to ask
Understanding neural networks
What makes it different from traditional ML methods?
How does it work for specific problems?
Why does it achieve great performance?
Future development
What are its limitations and weaknesses?
After more than 10 years, what is ongoing and what comes next?
The road to general-purpose AI?
Outline
Course logistics
Introduction to deep learning
Machine learning review
Math review
Supervised learning
Artificial neurons
Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu
Liang@Princeton’s course notes
Math review – Calculus
Gradient: the vector of partial derivatives, ∇f(x) = (∂f/∂x_1, ..., ∂f/∂x_n); it points in the direction of steepest ascent
Math review – Calculus
Local and global minima
Necessary condition: the gradient vanishes, ∇f(x*) = 0
Sufficient condition: ∇f(x*) = 0 and the Hessian ∇²f(x*) is positive definite
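These conditions can be checked numerically on a toy function, here f(x, y) = x² + 2y² (my example, not from the slides), whose minimum is at the origin:

```python
import numpy as np

# f(x, y) = x**2 + 2*y**2: gradient is (2x, 4y), Hessian is diag(2, 4).
def grad(p):
    x, y = p
    return np.array([2 * x, 4 * y])

hessian = np.array([[2.0, 0.0],
                    [0.0, 4.0]])

p_star = np.array([0.0, 0.0])
assert np.allclose(grad(p_star), 0.0)       # necessary: gradient vanishes
assert np.all(np.linalg.eigvalsh(hessian) > 0)  # sufficient: Hessian PD
```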
Math review – Probability
Factorization: the chain rule of probability, p(x, y) = p(x) p(y | x)
Math review – Probability
Common distributions
Math review – Statistics
Monte Carlo estimation
Maximum likelihood
Independent and identically distributed
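A quick numpy illustration of both ideas (the distributions and sample sizes below are arbitrary choices of mine):

```python
import numpy as np

# Monte Carlo estimation: approximate E[f(X)] by an average over i.i.d.
# samples. Here E[X^2] = 1 exactly for X ~ N(0, 1).
rng = np.random.default_rng(0)
samples = rng.normal(size=100_000)
mc_estimate = np.mean(samples ** 2)       # close to 1

# Maximum likelihood for i.i.d. Gaussian data with known variance:
# the MLE of the mean is the sample mean.
data = rng.normal(loc=3.0, scale=1.0, size=10_000)
mu_mle = data.mean()                      # close to 3
```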
ML tasks
Classification: assign a category to each item (e.g.,
document classification)
Regression: predict a real value for each item (e.g.,
prediction of stock values, economic variables)
Ranking: order items according to some criterion (e.g.,
relevant web pages returned by a search engine)
Clustering: partition data into 'homogeneous' regions
(e.g., analysis of very large data sets)
Dimensionality reduction: find lower-dimensional
manifold preserving some properties of the data
Standard learning scenarios
Unsupervised learning: no labeled data
Supervised learning: uses labeled data for prediction on
unseen points
Semi-supervised learning: uses labeled and unlabeled
data for prediction on unseen points
Reinforcement learning: uses rewards to learn action policies
…
Supervised learning
Task formulation
Learning problem
Problem setup
Learning as iterative optimization
Gradient descent
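Gradient descent in its simplest form, on a quadratic whose gradient is known in closed form (the loss here is my toy choice, not the slide's):

```python
import numpy as np

# Gradient descent: w <- w - eta * grad L(w), on the quadratic loss
# L(w) = 0.5 * ||w - w_star||^2, whose gradient is simply (w - w_star).
w_star = np.array([1.0, -2.0])
w = np.zeros(2)
eta = 0.1                        # learning rate (step size)
for _ in range(200):
    w -= eta * (w - w_star)      # one gradient step

# w converges toward the minimizer w_star.
```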
Learning as iterative optimization
Stochastic gradient descent (SGD)
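SGD differs only in that each step uses the gradient over a small random mini-batch instead of the full dataset (a toy linear-regression sketch; the batch size and learning rate are arbitrary choices):

```python
import numpy as np

# Mini-batch SGD for least-squares linear regression.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
w_true = np.array([1.5, -0.5])
y = X @ w_true + rng.normal(scale=0.01, size=1000)

w = np.zeros(2)
eta, batch_size = 0.1, 32
for step in range(500):
    idx = rng.integers(0, len(X), size=batch_size)   # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w - yb) / batch_size         # mini-batch gradient
    w -= eta * grad                                  # noisy gradient step
```

The mini-batch gradient is an unbiased but noisy estimate of the full gradient, which is what makes SGD scale to large datasets.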
Supervised learning pipeline
Three steps
Datasets & hyper-parameters
Hyper-parameter: a parameter of a model that is not trained
(specified before training)
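A sketch of the pipeline with a single hyper-parameter (ridge regression and the candidate grid below are my illustrative choices): train one model per hyper-parameter value, then pick the value with the lowest validation error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

X_tr, y_tr = X[:150], y[:150]      # training set: fit the parameters
X_val, y_val = X[150:], y[150:]    # validation set: choose hyper-parameters

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; lam is the (untrained) hyper-parameter."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

best = min(
    [1e-3, 1e-1, 1.0, 10.0],       # candidate hyper-parameter values
    key=lambda lam: np.mean((X_val @ ridge_fit(X_tr, y_tr, lam) - y_val) ** 2),
)
```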
Generalization
Model selection for better generalization
Capacity: the flexibility of a model
Underfitting: the model could generalize better with more training or more capacity
Overfitting: the model could generalize better with less training or less capacity
Model selection: choosing the best hyper-parameters on a validation set
Generalization
Training/Validation curves
Questions
Generalization
Interaction between training set size/capacity/training time and
training error/generalization error
If capacity increases:
Training error will ?
Generalization error will ?
If training time increases:
Training error will ?
Generalization error will ?
If training set size increases:
Generalization error will ?
Gap between the training and generalization error will ?
Outline
Course logistics
Introduction to deep learning
Machine learning review
Artificial neurons
Math model
Perceptron algorithm
Acknowledgement: Hugo Larochelle’s, Mehryar Mohri@NYU’s & Yingyu
Liang@Princeton’s course notes
Artificial Neuron
Biological inspiration
https://www.youtube.com/watch?v=m0rHZ_RDdyQ
Artificial Neuron
Biological inspiration
Mathematical model of a neuron
Activation functions
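For reference, the most common choices implemented elementwise in numpy (which of these the original slide plots is not recoverable from the text):

```python
import numpy as np

# Common activation functions for artificial neurons.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes to (0, 1); sigmoid(0) = 0.5

def tanh(z):
    return np.tanh(z)                  # squashes to (-1, 1); tanh(0) = 0

def relu(z):
    return np.maximum(0.0, z)          # zero for negative inputs, identity otherwise

z = np.array([-2.0, 0.0, 2.0])
sigmoid(z), tanh(z), relu(z)           # each returns an array of shape (3,)
```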
Capacity of single neuron
Sigmoid activation function
What does a single neuron do?
A neuron (perceptron) fires if its input is within a specific
angle of its weight
If the input pattern matches the weight pattern closely enough
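Numerically (the toy weights and threshold below are my choices): the neuron's pre-activation w·x + b is large when x points in nearly the same direction as w, so thresholding it at zero means firing only within a cone around the weight vector.

```python
import numpy as np

# A neuron with weights w and bias b fires when w . x + b > 0.
# Since w . x = ||w|| ||x|| cos(angle), firing corresponds to the input
# lying within a specific angle of the weight vector.
def fires(w, x, b):
    return w @ x + b > 0

w = np.array([1.0, 0.0])            # weight pattern
b = -0.5                            # threshold
x_aligned = np.array([0.9, 0.1])    # nearly matches the weight pattern
x_orthogonal = np.array([0.0, 1.0]) # 90 degrees away from w

fires(w, x_aligned, b)              # True:  0.9 - 0.5 > 0
fires(w, x_orthogonal, b)           # False: 0.0 - 0.5 < 0
```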
Single neuron as a linear classifier
Binary classification
How do we determine the weights?
Learning problem
Linear classification
Learning problem: simple approach
Drawback: sensitive to "outliers"
1D Example
Compare two predictors
Perceptron algorithm
Learn a single neuron for binary classification
https://towardsdatascience.com/perceptron-explanation-implementation-and-a-visual-example-3c8e76b4e2d1
Perceptron algorithm
Learn a single neuron for binary classification
Task formulation
Perceptron algorithm
Algorithm outline
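The outline can be sketched as follows (the data generation and the margin filter are my additions, chosen so convergence on the toy set is guaranteed to be fast):

```python
import numpy as np

# Perceptron: cycle through the examples; on each mistake, apply the
# correction w <- w + y * x (labels y in {-1, +1}; the bias is folded in
# as a constant input feature).
def perceptron(X, y, max_epochs=100):
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:              # current example misclassified
                w += yi * xi                    # correct the current mistake
                mistakes += 1
        if mistakes == 0:                       # all examples separated: done
            break
    return w

# Toy linearly separable data with a margin.
rng = np.random.default_rng(0)
X_all = rng.uniform(-1, 1, size=(200, 2))
scores = X_all @ np.array([1.0, 0.5])           # 'true' separating direction
keep = np.abs(scores) > 0.2                     # enforce a margin
X, y = X_all[keep], np.sign(scores[keep])

w = perceptron(X, y)                            # separates X perfectly
```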
Perceptron algorithm
Intuition: correct the current mistake
Perceptron algorithm
The Perceptron theorem
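For reference, the classical form of the theorem (assuming linearly separable data with a margin and bounded inputs) is:

```latex
Assume the data are linearly separable with margin $\gamma > 0$: there exists
a unit vector $w^{*}$ such that $y_i \, (w^{*} \cdot x_i) \ge \gamma$ for all
$i$, and the inputs are bounded, $\|x_i\| \le R$. Then the perceptron
algorithm makes at most
\[
  \left( \frac{R}{\gamma} \right)^{2}
\]
mistakes, independent of the order in which examples are presented.
```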
Hyperplane distance: the distance from a point x to the hyperplane w·x + b = 0 is |w·x + b| / ‖w‖
Perceptron algorithm
The Perceptron theorem: proof
Perceptron algorithm
The Perceptron theorem: proof intuition
Perceptron algorithm
What loss function is minimized?
Summary
Introduction to deep learning
Course logistics
Review of basic math & ML
Artificial neurons
Next time
Basic neural networks
First Quiz on prerequisite