QSRI Lecture 1
An Overview
Seth Flaxman
Department of Mathematics and
Data Science Institute
2 July 2019
About me
http://www.sethrf.com
What is machine learning?
https://www.youtube.com/watch?v=f_uwKZIAeM0
What is statistical machine learning?
Teaching staff
Schedule
Assessment
Learning Objectives
By the end of this module, students should be able to:
- Understand what machine learning is and how it relates to statistics and computer science
- Work with relevant calculus and linear algebra (gradients, vectors, matrices, norms)
- Understand linear models, loss functions, and regularization
- Understand bias vs. variance and be able to assess how various models and hyperparameters will increase or decrease each
- Be able to characterize algorithms as appropriate for supervised vs. unsupervised learning
- Be familiar with the basic ideas of support vector machines, decision trees, and random forests
- Be able to critically reflect on ethical issues in the application of machine learning
- Demonstrate familiarity with neural networks and deep learning
Supervised vs. unsupervised learning: terminology
Supervised learning
Supervised learning, most basic setup
x → f → y

(x_i, y_i), i = 1, …, n   (1)
f : X → Y   (2)
f(x_i) ≈ y_i   (3)
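To make this concrete, a tiny sketch (my own, not from the slides): generate pairs (x_i, y_i), pick a candidate f, and check how well f(x_i) ≈ y_i holds on average.

import numpy as np

# Toy supervised data: n pairs (x_i, y_i) with y = 2x + noise.
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 1, size=n)
y = 2 * x + rng.normal(scale=0.1, size=n)

# A candidate f : X -> Y (here simply a guessed slope).
def f(x):
    return 2 * x

# How well does f(x_i) ≈ y_i hold? Mean squared error:
print(np.mean((f(x) - y) ** 2))   # small, since f matches the truth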
Unsupervised learning, clustering and dim. reduction
Supervised learning: k-nearest neighbors
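The slides walk through k-NN pictorially; as a minimal sketch (mine, using scikit-learn, not anything shown in the slides), k-NN classifies a new point by majority vote among its k nearest training points:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D training data with two well-separated classes.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Classify by majority vote among the k = 3 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print(knn.predict([[0.5, 0.5], [5.5, 5.5]]))   # [0 1]

The hyperparameter k controls smoothness: small k gives flexible, high-variance decision boundaries; large k gives smoother, higher-bias ones.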
Unsupervised learning: k-means clustering
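A minimal sketch (my own, via scikit-learn): k-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points.

import numpy as np
from sklearn.cluster import KMeans

# Unlabelled toy data: two obvious blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])

# Fit k-means with k = 2 clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # centroids near (0, 0) and (5, 5)

Note that k is a hyperparameter: unlike supervised learning, there are no labels to tell us the "right" number of clusters.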
Supervised learning: further considerations
Loss measures how badly predictions miss: you want to avoid loss, so smaller loss is better! (Some losses are always positive; others can be positive or negative.)
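A quick numerical illustration (mine, not from the slides): squared error is non-negative by construction, while a negative log-likelihood loss can dip below zero, because a density can exceed 1.

import numpy as np
from scipy.stats import norm

y_true, y_pred = 1.0, 1.1

# Squared error: always >= 0.
print((y_true - y_pred) ** 2)                       # 0.01

# Negative log-likelihood under a narrow Gaussian: can be < 0.
print(-norm.logpdf(y_true, loc=y_pred, scale=0.1))  # about -0.88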
An algorithmic vs. statistical perspective
The curse of dimensionality (Bellman 1961)
As p increases, all points are about equally distant from one another:
[Figure: fraction of points considered close, plotted against dimension p from 5 to 20; the fraction falls from about 0.4 toward 0 as p grows.]
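A small simulation reproduces the picture (my sketch, taking "close" to mean within Euclidean distance 1 of a query point, which is an assumption rather than the slides' definition):

import numpy as np

# For uniform random points in [0,1]^p, distances concentrate as p
# grows, so almost no points remain "close" to a query point.
rng = np.random.default_rng(0)
n = 1000
for p in [1, 2, 5, 10, 20]:
    X = rng.uniform(size=(n, p))
    d = np.linalg.norm(X[1:] - X[0], axis=1)   # distances to one point
    print(f"p={p:2d}  mean dist={d.mean():.2f}  "
          f"fraction close={(d < 1.0).mean():.2f}")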
Linear regression as a statistical machine learning method

f(x) = α + βx
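For simple linear regression the optimum has a closed form; a minimal sketch (mine, not the slides' derivation), using β̂ = cov(x, y)/var(x) and α̂ = ȳ − β̂x̄:

import numpy as np

# Fit f(x) = alpha + beta * x by ordinary least squares.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 1.5 + 0.7 * x + rng.normal(scale=0.5, size=100)

# Closed-form least squares: beta = cov(x,y)/var(x), alpha = ybar - beta*xbar.
beta = np.cov(x, y, bias=True)[0, 1] / x.var()
alpha = y.mean() - beta * x.mean()
print(f"alpha={alpha:.2f}, beta={beta:.2f}")   # near (1.5, 0.7)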
Linear regression as a statistical machine learning method
- Closed-form optima aren't always available: we often need an optimization method (e.g. gradient descent) to learn the parameters of a model (see the sketch after this list).
- Many machine learning papers back in the day contained pages of math deriving gradients by hand.
- These days it is more common to rely on automatic differentiation (see the deep learning revolution).
- Note the distinction between parameters (usually fit by optimization) and hyperparameters (usually learned by cross-validation).
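And a sketch of the gradient-descent alternative mentioned above (my own illustration, assuming a mean-squared-error loss):

import numpy as np

# Learn f(x) = alpha + beta * x by gradient descent on MSE.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100)
y = 1.5 + 0.7 * x + rng.normal(scale=0.1, size=100)

alpha, beta, lr = 0.0, 0.0, 0.1   # lr (step size) is a hyperparameter
for _ in range(2000):
    resid = (alpha + beta * x) - y
    # Gradients of MSE with respect to alpha and beta.
    alpha -= lr * 2 * resid.mean()
    beta -= lr * 2 * (resid * x).mean()

print(f"alpha={alpha:.2f}, beta={beta:.2f}")   # near (1.5, 0.7)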
A quick tour of classic supervised learning methods
A quick tour of classic unsupervised learning methods