Machine Learning: An Overview
Seth Flaxman
Department of Mathematics and
Data Science Institute
2 July 2019
About me
[Photo credit: Palsson on Flickr; source: [Link]]
What is machine learning?
[Link]
What is machine learning?
For our purposes, follow Tom Mitchell:
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
What is statistical machine learning?
Both computer science and statistics provide methods for learning from data.
Computer science takes an algorithmic perspective: propose an algorithm for the data, study the algorithm formally.
Statistics takes an inferential perspective: propose a model for the data, study the model formally.
Statistical machine learning (and computational statistics) is the intersection: an algorithmic perspective on statistical methods, and a statistical perspective on algorithms.
Teaching staff
- Dr Seth Flaxman, Lecturer in Machine Learning and Big Data Analytics, Department of Mathematics and Data Science Institute
- Mariana Clare, PhD student in Mathematics of Planet Earth CDT
- Adriaan Hilberts, PhD student in Mathematics of Planet Earth CDT
- Jonathan Ish-Horowicz, PhD student in Theoretical Systems Biology and Bioinformatics
- Lekha Patel, PhD student in Statistics in the Department of Mathematics
- Tim Wolock, PhD student in Statistics in the Department of Mathematics
Schedule
- Today: lecture 10am-1pm and 2pm-3pm; problem class 3pm-4pm or 4pm-5pm
- Wednesday: problem class 10am-11am; lecture 11am-1pm and 2pm-4pm
- Thursday: lecture 10am-1pm and 2pm-3pm; problem class 3pm-4pm or 4pm-5pm
Assessment
- Test on Friday at 2pm (20 questions, 1 hour)
Learning Objectives
By the end of this module, students should be able to:
- Understand what machine learning is and how it relates to statistics and computer science
- Work with relevant calculus and linear algebra (gradients, vectors, matrices, norms)
- Understand linear models, loss functions, and regularization
- Understand bias vs. variance and assess how various models and hyperparameters will increase or decrease each
- Characterize algorithms as appropriate for supervised vs. unsupervised learning
- Be familiar with the basic ideas of support vector machines, decision trees, and random forests
- Critically reflect on ethical issues in the application of machine learning
- Demonstrate familiarity with neural networks and deep learning
Supervised vs. unsupervised learning: terminology
- Supervised learning, also known as: regression, classification, pattern recognition, recovery, sensing, ...
- Unsupervised learning, also known as: clustering, data mining, dimensionality reduction, ...
- Inputs, also known as: independent variables, predictors, covariates, patterns, x, X, ...
- Outputs, also known as: dependent variables, responses, labels, y, Y, ...

x → f → y
Supervised learning
Supervised learning, most basic setup

x → f → y

Given training inputs x ∈ X and outputs y ∈ Y:
    (x_i, y_i), i = 1, ..., n    (1)
Learn a function (algorithm, black box, decision rule, classifier, probability distribution)
    f : X → Y    (2)
i.e. on the training inputs, we would like our function f to approximately recover the training outputs:
    f(x_i) ≈ y_i    (3)
Unsupervised learning, clustering and dim. reduction
Given training inputs x ∈ X, learn:
- Clustering: a function f giving cluster assignments 1, ..., K,
      f(x) ∈ {1, ..., K}    (4)
  such that C_k = {x_i | f(x_i) = k} is homogeneous for each k.
- Dimensionality reduction: if X ∈ R^p for large p, learn a latent representation Z ∈ R^d, d ≪ p, such that Z explains most of the variance in X (a PCA sketch follows below).
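To make the dimensionality-reduction idea concrete, here is a minimal sketch (not from the slides) that computes a PCA-style latent representation Z via the singular value decomposition; the data dimensions and noise level are invented for illustration:

```python
# Minimal sketch (illustrative only): dimensionality reduction with PCA via the SVD.
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 200, 10, 2                                     # n points in R^p, reduced to R^d
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, p))    # data lying near a 2-d subspace of R^p
X = X + 0.05 * rng.normal(size=(n, p))                   # plus a little noise

Xc = X - X.mean(axis=0)                                  # centre the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:d].T                                        # latent representation Z in R^d

explained = (S[:d] ** 2).sum() / (S ** 2).sum()
print(f"fraction of variance explained by {d} components: {explained:.3f}")
```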
Supervised learning: k-nearest neighbors
[Figure: k-nearest neighbors illustration]
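A minimal NumPy sketch (illustrative only, not the lecture's implementation) of k-nearest-neighbour classification by majority vote; the toy data and the helper knn_predict are invented for this example:

```python
# Minimal sketch (illustrative only): k-nearest-neighbour classification.
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Predict a label for each test point by majority vote among its k nearest training points."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)      # Euclidean distance to every training point
        nearest = np.argsort(dists)[:k]                   # indices of the k closest points
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])           # majority vote
    return np.array(preds)

# toy example with two well-separated classes
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)
print(knn_predict(X_train, y_train, np.array([[0.1, 0.2], [2.9, 3.1]]), k=5))  # expect [0 1]
```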
Unsupervised learning: k-means clustering
[Figure: k-means clustering illustration]
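A minimal sketch (illustrative only) of Lloyd's algorithm, the standard alternating procedure behind k-means; the toy data and the kmeans helper are invented for this example, and edge cases such as empty clusters are not handled:

```python
# Minimal sketch (illustrative only): Lloyd's algorithm for k-means clustering.
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # initialise centres at random data points
    for _ in range(n_iters):
        # assignment step: each point joins its nearest centre
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centre moves to the mean of its assigned points
        new_centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
labels, centers = kmeans(X, K=2)
print(centers)   # roughly [0, 0] and [5, 5]
```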
Supervised learning: further considerations
- Loss function: the standard choice in regression is squared error (L2) loss:
      L(x, y, f) := (y − f(x))^2    (5)
- The standard choice in classification is the misclassification rate (1 − accuracy):
      L(x, y, f) := 1 − I(y = f(x))    (6)
Loss is bad: you want to avoid loss, so smaller loss is better! (Some losses are always positive, others can be positive or negative.) Both losses are illustrated in the sketch below.
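As a quick illustration (not from the slides), the two loss functions above written out in NumPy on made-up values:

```python
# Minimal sketch (illustrative only): the two standard loss functions, evaluated pointwise.
import numpy as np

def squared_error_loss(y, y_hat):
    """L2 loss for regression: (y - f(x))^2."""
    return (y - y_hat) ** 2

def misclassification_loss(y, y_hat):
    """0-1 loss for classification: 1 - I(y = f(x))."""
    return 1.0 - (y == y_hat).astype(float)

print(squared_error_loss(np.array([1.0, 2.0]), np.array([1.5, 2.0])))    # [0.25 0.  ]
print(misclassification_loss(np.array([0, 1, 1]), np.array([0, 0, 1])))  # [0. 1. 0.]
```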
Supervised learning: further considerations
Quiz: what value of k for k-nearest neighbors gives training loss = 0? Does this make sense?
- Risk: expected loss
      R(f) := E_{X,Y}[L(x, y, f)]    (7)
- Empirical risk: average over the data, e.g. "ordinary least squares" (computed in the sketch below):
      R̂(f) := Σ_{i=1}^{n} (y_i − f(x_i))^2    (8)
  (The sum and the average differ only by a factor of 1/n, which does not change the minimizer.)
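A minimal sketch (illustrative only) of the empirical risk in (8), evaluated for a deliberately crude candidate function on toy data:

```python
# Minimal sketch (illustrative only): empirical risk as squared-error loss summed over the training data.
import numpy as np

def empirical_risk(f, X, y):
    """Sum of squared-error losses over the data, i.e. the ordinary least squares objective (8)."""
    return np.sum((y - f(X)) ** 2)

# toy data and a crude candidate f(x) = 2x
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 2.1, 3.9, 6.2])
print(empirical_risk(lambda x: 2 * x, X, y))
```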
An algorithmic vs. statistical perspective
- K-nearest neighbors and k-means clustering are algorithms for handling data.
- Algorithmic questions: what is their time complexity in terms of p and n? What is their storage complexity?
- Statistical perspective: can the performance of either algorithm be analyzed with reference to an underlying probabilistic model?
- Statistical questions: what kind of performance do we expect on unseen data (generalization)? How does performance vary with n and p? How robust is the model to outliers?
The curse of dimensionality (Bellman 1961)
As p increases, all points are about equally distant from one another:
[Figure: the fraction of points that are "close" shrinks as p grows; y-axis: fraction close (0.0 to 0.4), x-axis: p (5 to 20)]
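A small simulation sketch (illustrative only; the sample sizes are arbitrary) of the same phenomenon: as p grows, the nearest and farthest neighbours of a point become almost equally far away:

```python
# Minimal sketch (illustrative only): pairwise distances concentrate as the dimension p grows,
# so the ratio of the nearest to the farthest distance approaches 1.
import numpy as np

rng = np.random.default_rng(3)
n = 500
for p in [2, 10, 100, 1000]:
    X = rng.uniform(size=(n, p))                      # n points in the unit hypercube [0, 1]^p
    d = np.linalg.norm(X - X[0], axis=1)[1:]          # distances from the first point to all others
    print(f"p={p:5d}  nearest/farthest distance ratio: {d.min() / d.max():.3f}")
```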
Linear regression as a statistical machine learning method
Given (x_i, y_i), i = 1, ..., n, we consider fitting a linear model:
    f(x) = α + βx    (9)
Finding f in this case means finding values for α and β.
- Algorithmic perspective: assuming squared error loss, find α and β to minimize the empirical risk:
      R̂(f) := Σ_{i=1}^{n} (y_i − f(x_i))^2    (10)
  Closed-form solutions exist for α̂ and β̂ which minimize R̂(f). Exercise: find them! Hint: you will need to solve ∇_β R̂(f) = 0 and ∇_α R̂(f) = 0. (A numerical check follows below.)
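For reference, a minimal numerical check (a sketch with synthetic data, not from the slides) using the standard textbook simple-regression estimates β̂ = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)^2 and α̂ = ȳ − β̂x̄:

```python
# Minimal sketch (illustrative only): closed-form least-squares estimates for simple linear regression.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=100)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=100)   # true alpha = 1.5, beta = 2.0

xbar, ybar = x.mean(), y.mean()
beta_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
alpha_hat = ybar - beta_hat * xbar
print(alpha_hat, beta_hat)                            # should be close to 1.5 and 2.0

# cross-check against numpy's polynomial least-squares fit
print(np.polyfit(x, y, deg=1))                        # returns [beta_hat, alpha_hat]
```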
Linear regression as a statistical machine learning method
- Statistical perspective: assume that the errors are iid N(0, σ^2), or equivalently:
      p(y | x) = N(f(x), σ^2)    (11)
- Use maximum likelihood to estimate α̂ and β̂.
- The statistical and algorithmic perspectives coincide! (See the numerical check below.)
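A quick numerical check (a sketch using synthetic data and scipy.optimize, not from the slides) that maximising the Gaussian likelihood recovers the least-squares fit:

```python
# Minimal sketch (illustrative only): maximising the Gaussian log-likelihood recovers the least-squares fit.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=200)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=200)

def neg_log_likelihood(params):
    alpha, beta, log_sigma = params
    sigma = np.exp(log_sigma)                          # parameterise sigma > 0 via its log
    resid = y - (alpha + beta * x)
    return 0.5 * np.sum(resid ** 2) / sigma ** 2 + len(y) * np.log(sigma)

mle = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0])
print("MLE:", mle.x[:2])                               # alpha_hat, beta_hat
print("OLS:", np.polyfit(x, y, deg=1)[::-1])           # same values, up to optimisation tolerance
```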
Linear regression as a statistical machine learning method
- Closed-form optima aren't always available: then we need some sort of optimization method (e.g. gradient descent) to learn the parameters of a model (a gradient-descent sketch follows below).
- Many machine learning papers back in the day contained pages of math deriving gradients.
- These days it is more common to rely on automatic differentiation (see the deep learning revolution).
- Distinction between parameters (usually fit with optimization) and hyperparameters (usually learned by cross-validation).
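A minimal sketch (illustrative only; the step size and iteration count are arbitrary choices) of gradient descent for the simple linear model:

```python
# Minimal sketch (illustrative only): gradient descent for f(x) = alpha + beta*x.
# We minimise the mean squared error (the empirical risk divided by n), which has the same
# minimiser as the sum but makes a fixed step size easier to choose.
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, size=200)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=200)    # true alpha = 1.5, beta = 2.0

alpha, beta = 0.0, 0.0
lr = 0.01                                              # step size (a hyperparameter)
for _ in range(10000):
    resid = y - (alpha + beta * x)
    grad_alpha = -2.0 * resid.mean()                   # d/d alpha of the mean squared error
    grad_beta = -2.0 * (resid * x).mean()              # d/d beta of the mean squared error
    alpha -= lr * grad_alpha
    beta -= lr * grad_beta

print(alpha, beta)                                     # approaches the closed-form least-squares fit
```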
A quick tour of classic supervised learning methods
- k-nearest neighbors [Friedman, Tibshirani, Hastie 2009]
- Linear regression
- Naive Bayes [Mitchell 1997]
- Logistic regression
- Linear Discriminant Analysis
- Support Vector Machines (SVMs) [Scholkopf and Smola 2002]
- Gaussian process regression and classification [Rasmussen and Williams 2006]
- Neural networks [Goodfellow, Bengio, Courville 2016]
- Random forests [Breiman 2001]
- Probabilistic Graphical Models [Murphy 2012]
A quick tour of classic unsupervised learning methods
- k-means clustering [Friedman, Tibshirani, Hastie 2009]
- Spectral clustering [von Luxburg 2007]
- Principal Components Analysis
- Latent Dirichlet Allocation [Blei, Ng, Jordan 2003]
- Gaussian Mixture Models
- Neural networks, especially VAEs and GANs [Goodfellow, Bengio, Courville 2016]