Theoretical Foundations of Machine Learning
Vianney Perchet
29th January 2024
Lecture 1/12
Structure of the course
12 lectures of 1h30 + 5 practical sessions (TP) + 3 tutorials (TD) of 1h30
1. Introduction
2. Plug-in methods & over/under-fitting
3. Model selection & penalization
4. Empirical Risk Minimization
5. Decision Trees & Random Forest
6. Neural Nets & Deep Learning (2 sessions)
7. Transformers, implicit regularization, double descent
8. Reinforcement learning
9. Clustering & PCA
10. Ethics: Privacy and Fairness (2 sessions)
Machine Learning is everywhere
• Image Recognition
• Web search
• Recommendation
• Advertisement
• Scoring
• Market Segmentation
• Translation
• Speech Recognition
• Self-Driving Cars...
• Healthcare
• Generative AI (text - ChatGPT, image - DALL-E...)
4 typical ML tasks
1. Supervised Learning
• Data Xi ∈ X (images) have labels Yi ∈ Y (cat/dog)
• Predict the label of future/new/unseen data X
• Examples: digit classification, advertisement, speech recognition,...
2. Unsupervised Learning
• Data Xi ∈ X (users) are just vectors without labels
• Find some “structure”
• Ex.: small groups (clustering) or ambient space (dimension reduction)
3. Reinforcement Learning
• The learner affects and interacts with the environment
• Examples: robots, self-driving cars, drones
• What you see depends on what you do
4. Generative AI
• Dataset (Xi, Yi) ∈ X × Y (images, labels)
• Create new data (X′j, Y′j) that “looks like” the original data
The prediction task of supervised learning
• “Attributes/Features” space X ⊂ Rd & “label” space Y ⊂ R
• Training data-set: Dn = {(X1, Y1), . . . , (Xn, Yn)}
• Future data ≃ Past data
• (Xi, Yi) are i.i.d. of unknown joint law P on X × Y
From Dn, predict the “probable” label Yn+1 of Xn+1
• Predictor f : X → Y
• If Y = {0, 1}: classifier
• If Y = R: regressor & scoring rule
Performance of a predictor
• Risk based on some “local” loss ℓ : Y × Y → R+
• Cost of predicting Y′ instead of Y
• 0-1 loss: ℓ(Y, Y′) = 1{Y ̸= Y′} (classification)
• quad-loss: ℓ(Y, Y′) = ∥Y − Y′∥2 (linear regression)
• logistic-loss: ℓ(Y, Y′) = log(1 + exp(−YY′)) (logistic reg.)
Risk: R(f) = E(X,Y)∼P[ℓ(f(X), Y)]
• Optimal risk and Bayes predictor
f∗ = arg minf R(f) and R∗ = R(f∗)
• Remark: R(f) cannot be evaluated.
• Test set D′m = {(X′1, Y′1), . . . , (X′m, Y′m)} INDEPENDENT from the training set
R(f) ≃ (1/m) Σi=1,...,m ℓ(f(X′i), Y′i) thanks to the CLT
• Recommendation: 80% of the data in the training set, 20% in the test set
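As an illustration of the test-set estimate above, here is a minimal numpy sketch (my own example, not course code) of an 80/20 split and the empirical risk under the 0-1 loss:

```python
import numpy as np

def train_test_split(X, Y, test_ratio=0.2, seed=0):
    """Shuffle the data-set and hold out a fraction of it as an independent test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(len(X) * (1 - test_ratio))
    return X[idx[:cut]], Y[idx[:cut]], X[idx[cut:]], Y[idx[cut:]]

def empirical_risk_01(f, X_test, Y_test):
    """(1/m) sum_i 1{f(X'_i) != Y'_i}, an estimate of R(f) for the 0-1 loss."""
    return np.mean(f(X_test) != Y_test)
```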
Optimal/Bayes predictor
η(x) = E[Y|X = x]
• Linear Regression
• ℓ(y, y′) = (y − y′)2 with y ∈ R
• “closed” form Bayes regressor
f∗(x) = η(x)
• Binary Classification
• ℓ(y, y′) = 1{y ̸= y′}
• “closed” form Bayes classifier
f∗(x) = 1{η(x) ≥ 1/2}
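To make the Bayes classifier concrete, here is a toy example with a synthetic distribution where η is known exactly (η(x) = x for X uniform on [0, 1]); the empirical risk should be close to the Bayes risk R∗ = E[min(η(X), 1 − η(X))] = 1/4 for this distribution:

```python
import numpy as np

def eta(x):
    """Regression function eta(x) = P(Y = 1 | X = x); here simply eta(x) = x."""
    return x

def bayes_classifier(x):
    """f*(x) = 1{eta(x) >= 1/2}: predict the more probable label."""
    return (eta(x) >= 0.5).astype(int)

rng = np.random.default_rng(0)
X = rng.uniform(size=100_000)
Y = rng.binomial(1, eta(X))

print("empirical 0-1 risk:", np.mean(bayes_classifier(X) != Y))  # close to R* = 1/4
```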
Refined losses. Type I/II, Precision / Recall
• “Unbalanced” data (almost only 0’s) or effect (0 = credit fraud)
Predicting 0 instead of 1 is way worse than 1 instead of 0
• Precision: ♯{j : Yj = 1 and f(Xj) = 1} / ♯{i : f(Xi) = 1} ≃ P{Y = 1 | f(X) = 1}
• Recall: ♯{j : Yj = 1 and f(Xj) = 1} / ♯{i : Yi = 1} ≃ P{f(X) = 1 | Y = 1}
• Not “local” losses but global
• more difficult to control
• (in theory as in practice)
• Many variants
• False Discovery Rate P{Y = 0 | f(X) = 1}
• F1 score = 2 · Precision · Recall / (Precision + Recall)
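A short sketch (a hypothetical helper, assuming 0/1 numpy label arrays) computing precision, recall and the F1 score exactly as defined above:

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision ~ P(Y=1 | f(X)=1), Recall ~ P(f(X)=1 | Y=1), F1 = their harmonic mean."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    precision = tp / max(np.sum(y_pred == 1), 1)
    recall = tp / max(np.sum(y_true == 1), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1
```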
Scoring Rules - Area Under the Curve
• Training data-set: Dn = {(X1, Y1), . . . , (Xn, Yn)}
• Score f : X → R
• Threshold θ ∈ R. Above θ, a user is “accepted” (below, rejected)
• Tuning θ balances True Positive Rate vs False Positive Rate
• TPR: P{f(X) = 1|Y = 1}; FPR: P{f(X) = 1|Y = 0}
• High θ: few users, pretty confident
• Low θ: many users, low confidence
ROC: True positives as a function of False positives
• Parameterized by θ: the higher the ROC the better
AUC: Area Under the ROC Curve
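A rough sketch of how the ROC curve and the AUC can be computed by sweeping the threshold θ over the observed scores (assuming numpy arrays of real-valued scores and 0/1 labels):

```python
import numpy as np

def roc_auc(scores, y_true):
    """True/False Positive Rates as theta decreases, and the area under the ROC curve."""
    thresholds = np.sort(np.unique(scores))[::-1]  # from high theta (strict) to low theta
    tpr = [np.mean(scores[y_true == 1] >= t) for t in thresholds]
    fpr = [np.mean(scores[y_true == 0] >= t) for t in thresholds]
    fpr, tpr = np.array([0.0] + fpr + [1.0]), np.array([0.0] + tpr + [1.0])
    auc = np.trapz(tpr, fpr)  # trapezoidal area under the piecewise-linear ROC curve
    return fpr, tpr, auc
```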
Unsupervised learning
• “Attributes/Features” space X ⊂ Rd but no labels
• Data-set: Dn = {X1, . . . , Xn}; the Xi might not be i.i.d.!
Find a “good small dimension” representation of Dn
• Clustering: regroup Dn into k “groups” of points
• How to choose k?
• Possible metrics
• Low intracluster distance (average distance within a cluster)
• High intercluster distance (average distance between 2 different clusters)
• Dimension reduction: project the Xi on a d′-dimensional linear subspace (d′ < d)
• How to choose d′?
• Metric? Average distance between a point and its projection
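As a concrete instance of the clustering task, here is a minimal k-means (Lloyd's algorithm) sketch; this is an illustration under the metrics stated above, not the course's reference implementation:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (drives intracluster distances down).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update each center as the mean of its cluster (keep old center if empty).
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers
```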
Statistics vs. ML: Linear Regression
• Dn = {(Xi, Yi); i = 1, . . . , n}, with Xi ∈ Rd and Yi ∈ R
with quadratic loss: ℓ(y, y′) = |y − y′|2
• Local methods too slow. What about global methods?
• Linear predictor: fβ(x) = β⊤x with β ∈ Rd
• Best linear pred. β∗ = arg minβ E[ℓ(Y, β⊤X)]
R(fβ) − R(f∗) = [R(fβ) − R(fβ∗)] + [R(fβ∗) − R(f∗)]
              = Estimation Error + Approximation Error
• Empirical error: R̂(fβ) = (1/n) Σi=1,...,n (Yi − β⊤Xi)2 = (1/n) ∥Y − Xβ∥2
Closed form: β̂ = (X⊤X)−1 X⊤Y
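A minimal numpy sketch of the closed-form estimator β̂ = (X⊤X)−1X⊤Y and of the empirical quadratic risk (solving the normal equations rather than inverting X⊤X explicitly, a standard numerical choice):

```python
import numpy as np

def ols(X, Y):
    """Least-squares estimate: solve (X^T X) beta = X^T Y for beta."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def empirical_quadratic_risk(beta, X, Y):
    """(1/n) sum_i (Y_i - beta^T X_i)^2."""
    return np.mean((Y - X @ beta) ** 2)
```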
Statistics vs. ML. Pros/Cons
• Statistics. The model is correct (Approx. Error = 0)
• Can compute the law of the residuals Yi − β̂⊤Xi and of ∥β̂ − β∗∥2
• Machine Learning. The model is incorrect (Approx. Error > 0)
• Can add/create features (X²i, 2Xi + 3Xj, . . . ), as illustrated after this list
• ✓ Pros
• Simple: closed form solution & easily generalizable
• Good first approximation
• Rather intuitive
• ✗ Cons
• Potentially huge approximation error
• Non-robust to outliers (high generalization error)
• Makes sense only for Y = R, not for Y = {0, 1}
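A small illustration of the “add/create features” point: a hypothetical helper that augments X with squares and pairwise products of its columns, after which the same closed-form regression applies:

```python
import numpy as np

def add_quadratic_features(X):
    """Augment X (shape (n, d)) with squared columns and pairwise products."""
    n, d = X.shape
    crosses = [X[:, i] * X[:, j] for i in range(d) for j in range(i + 1, d)]
    return np.column_stack([X, X ** 2] + crosses)
```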
Logistic Regression
• Most datasets: Y = {0, 1} & η(x) = P(Y = 1|X = x)
Linear regression outputs a “probability” in R...
Logistic Reg.: ηβ(x) = exp(β⊤x) / (1 + exp(β⊤x))
• Maximize the log-likelihood, i.e. minimize the empirical “log-loss”
log-loss(β) = (1/n) Σi=1,...,n log(1 + exp(−Ỹi β⊤Xi)) with Ỹi = 2Yi − 1
Logistic Regression. The upsides
log-loss(β) = (1/n) Σi=1,...,n log(1 + exp(−Ỹi β⊤Xi)) with Ỹi = 2Yi − 1
• ✓ log2(1 + exp(u)): smooth & convex surrogate of the 0-1 loss
• ✓ log-loss(·) is convex and differentiable.
• Can be optimized
• ∇log-loss(β) = −(1/n) Σi Ỹi Xi / (1 + exp(Ỹi β⊤Xi))
• Unbiased stochastic grad.: ∇̂l.-l.(β) = −Ỹi∗ Xi∗ / (1 + exp(Ỹi∗ β⊤Xi∗)) with i∗ random
• ✓ ✓ Works very well in practice (most of “AI” is log. regression)
• ✗ Cons: no closed form for β̂; needs computational power
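A minimal sketch of stochastic gradient descent on the log-loss above, using the unbiased one-sample gradient (the constant step size and step count are arbitrary choices for illustration):

```python
import numpy as np

def sgd_logistic(X, Y, n_steps=10_000, step=0.1, seed=0):
    """Minimize log-loss(beta) with one-sample stochastic gradients.

    X has shape (n, d); Y contains 0/1 labels, relabelled to Y_tilde in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Y_tilde = 2 * Y - 1
    beta = np.zeros(d)
    for _ in range(n_steps):
        i = rng.integers(n)  # i* uniform, so the gradient estimate is unbiased
        grad = -Y_tilde[i] * X[i] / (1 + np.exp(Y_tilde[i] * X[i] @ beta))
        beta -= step * grad
    return beta

def predict_proba(beta, X):
    """eta_beta(x) = exp(beta^T x) / (1 + exp(beta^T x))."""
    return 1.0 / (1.0 + np.exp(-X @ beta))
```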
Take home message - most important settings
• “Attributes/Features” space X ⊂ Rd & “label” space Y ⊂ R
• Training data-set: Dn = {(X1, Y1), . . . , (Xn, Yn)}
• Risk w.r.t. loss ℓ : Y × Y → R+
Risk: R(f) = E(X,Y)∼P[ℓ(f(X), Y)]
• Optimal risk and Bayes predictor
f∗ = arg minf R(f) and R∗ = R(f∗)
• Restricted class of predictors/classifiers: {fβ ; β ∈ B}
R(fβ) − R(f∗) = [R(fβ) − R(fβ∗)] + [R(fβ∗) − R(f∗)]
              = Estimation Error + Approximation Error