Lecture 1: Machine Learning
Basics
Siddharth Garg
[email protected]
This Course…
Growing use of ML techniques in cyber-security applications:
• Social network deanonymization
• Spam filtering
• Biometrics
• Browser fingerprinting
• Malware detection
• Automated evasion
• Network intrusion detection
This Course…
Bias and fairness Spam filtering
Vulnerabilities in
ML/AI deployments Interpretability
Accountability and
transparency
Model privacy
Adversarial
Training data
perturbations
poisoning attacks
What is Machine Learning?
• Ability for machines to learn without being explicitly programmed
"A computer program is said to learn from experience E with
respect to some class of tasks T and performance
measure P if its performance at tasks in T, as measured
by P, improves with experience E.” --- Mitchell, T. (1997).
Machine Learning. McGraw Hill. p. 2.
• Why not just use human knowledge, experience, or expertise?
• Are humans always able to explain their expertise?
• Can machines outperform humans?
• What kinds of experiences (E), tasks (T) and performance measures (P)?
Example: MNIST Digit Recognition
Task (T):
• Given gray-scale images x and labels y, find a function f : x → y
Experience (E):
• A "training dataset": a set of correctly labeled images
Performance (P):
• Accuracy on a “test dataset”
https://www.npmjs.com/package/mnist
"Supervised Learning (Classification)"
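Below is a minimal sketch of this (T, E, P) setup in Python. It is illustrative only: it uses scikit-learn's small built-in digits dataset (8×8 images) as a stand-in for full MNIST, and an off-the-shelf classifier.

```python
# A minimal sketch of the (T, E, P) setup for digit recognition, assuming
# scikit-learn is available. The small built-in digits dataset stands in
# for full MNIST to keep the example self-contained.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)            # gray-scale images x, labels y

# Experience (E): a labeled training dataset; performance (P) is measured
# on a held-out test dataset.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Task (T): learn a function f : x -> y.
f = LogisticRegression(max_iter=5000).fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, f.predict(X_test)))
```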
Example: Spam Classification
Task (T):
• Given emails x and labels y, find f : x → y
Experience (E):
• A "training dataset" of emails marked as "spam" or "non-spam"
Performance (P):
• Spam detection accuracy
“Supervised Learning (Classification)”
Some Challenges
Representing Data (or Feature Extraction)
• How do we represent an email mathematically?
• One example is the "bag of words" representation: count how many times each word in the dictionary occurs (a minimal sketch follows this list)
• What do you lose?
• What do you gain?
• How can we compress this representation further?
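A minimal bag-of-words sketch, assuming scikit-learn; the example emails are made up for illustration.

```python
# A minimal bag-of-words sketch, assuming scikit-learn is available.
# The example emails are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer

emails = [
    "win a free prize now",
    "meeting agenda for next week",
    "free prize waiting claim now",
]

vectorizer = CountVectorizer()                 # dictionary built from the corpus
X = vectorizer.fit_transform(emails)           # rows = emails, cols = word counts

print(vectorizer.get_feature_names_out())      # the learned dictionary
print(X.toarray())                             # each email as a count vector
# Word order is lost (what you lose); a fixed-length numeric vector is gained;
# dropping rare words or hashing compresses the representation further.
```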
What kind of classifier?
• What does the function f look like?
• And how do we learn its parameters?
Example: Clustering
Task (T): “Cluster” a set of documents into k groups such that
“similar” documents appear in the same group
Experience (E):
• A ”training dataset” of documents
without “labels”
Performance (P):
• Average distance to cluster center
“Unsupervised Learning”
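A minimal sketch of clustering documents into k groups, assuming scikit-learn; the documents are made up for illustration.

```python
# A minimal document-clustering sketch, assuming scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "stock market rally continues",
    "team wins championship game",
    "investors react to market news",
    "player scores in final game",
]

X = TfidfVectorizer().fit_transform(docs)      # unlabeled documents as vectors

# Task (T): cluster into k = 2 groups of "similar" documents.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)                          # cluster assignment per document
print(kmeans.inertia_)                         # sum of squared distances to cluster centers
```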
Example: Anomaly Detection
Task (T):
• Which of these is like the others?
Experience (E):
• Unlabeled samples
Performance (P):
• Anomaly detection accuracy
“Unsupervised Learning”
Regression
Task (T):
• Given x and y, find a linear function f : x → y
Experience (E):
• Training data: points (x_i, y_i)
Performance (P):
• Least squares fit: minimize the mean square error between prediction and ground-truth
[S. Rangan, EL-GY-9123 Lec 2]
“Supervised Learning (Regression)”
Linear Least Squares Regression
$y = f(x) = \beta_1 x + \beta_0$
• How do we find the values $\beta_1, \beta_0$?
$$\min_{\beta_1, \beta_0} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \quad \text{where } \hat{y}_i = \beta_1 x_i + \beta_0 \;\; \forall i \in [1, N]$$
Linear Least Squares Regression
$y = f(x) = \beta_1 x + \beta_0$
• How do we find the values $\beta_1, \beta_0$?
$$g(\beta_1, \beta_0) = \sum_{i=1}^{N} (y_i - \beta_1 x_i - \beta_0)^2, \quad \hat{y}_i = \beta_1 x_i + \beta_0 \;\; \forall i \in [1, N]$$
Minimize by setting
$$\frac{\partial g}{\partial \beta_1} = 0, \qquad \frac{\partial g}{\partial \beta_0} = 0$$
Linear Least Squares Regression
$y = f(x) = \beta_1 x + \beta_0$
• How do we find the values $\beta_1, \beta_0$?
Residual Sum of Squares (RSS):
$$g(\beta_1, \beta_0) = \sum_{i=1}^{N} (y_i - \beta_1 x_i - \beta_0)^2$$
Setting $\frac{\partial g}{\partial \beta_0} = \sum_{i=1}^{N} -2\,(y_i - \beta_1 x_i - \beta_0) = 0$ gives
$$\beta_0 = \frac{\sum_{i=1}^{N} (y_i - \beta_1 x_i)}{N} = \bar{y} - \beta_1 \bar{x} \quad \text{(sample means)}$$
Are you surprised?
Linear Least Squares Regression
• How do we find the values $\beta_1, \beta_0$?
$$g(\beta_1, \beta_0) = \sum_{i=1}^{N} (y_i - \beta_1 x_i - \beta_0)^2$$
Setting $\frac{\partial g}{\partial \beta_1} = \sum_{i=1}^{N} -2\,x_i (y_i - \beta_1 x_i - \beta_0) = 0$ and substituting $\beta_0 = \bar{y} - \beta_1 \bar{x}$ gives
$$\beta_1 = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} \quad \text{(sample covariance over sample variance)}$$
Auto Example
• Python code (a minimal sketch follows)
• [Figure: regression line fit to the data]
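A minimal sketch of the closed-form fit derived above, assuming numpy. The (x, y) values here are made up; in the Auto example they would come from the dataset (e.g., horsepower vs. mpg).

```python
# A minimal sketch of the closed-form simple linear regression fit,
# assuming numpy; the (x, y) values are illustrative only.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# beta_1 = (mean(xy) - mean(x) mean(y)) / (mean(x^2) - mean(x)^2)
beta1 = (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.mean(x ** 2) - np.mean(x) ** 2)
beta0 = np.mean(y) - beta1 * np.mean(x)        # beta_0 = ybar - beta_1 * xbar

print("Regression line: y = %.3f x + %.3f" % (beta1, beta0))
```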
Linear Least Squares (Multivariate)
• Now consider input $x \in \mathbb{R}^M$ and output $y$; the goal is to learn
$$y = f(x) = \beta_M x_M + \dots + \beta_1 x_1 + \beta_0$$
• Given a training dataset $X$ (one row per training sample) and $Y$:
$$\hat{Y} = X\beta, \quad X = \begin{bmatrix} 1 & x_{01} & x_{02} & \dots & x_{0M} \\ 1 & x_{11} & x_{12} & \dots & x_{1M} \\ \vdots & & & & \\ 1 & x_{N1} & x_{N2} & \dots & x_{NM} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_M \end{bmatrix}, \quad Y = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_N \end{bmatrix}$$
Note: for simplicity we assume that $X$ includes a column of 1s.
Linear Least Squares (Multivariate)
$$\mathrm{RSS}(\beta) = \sum_i (y_i - \hat{y}_i)^2 = (Y - \hat{Y})^T (Y - \hat{Y}) = (Y - X\beta)^T (Y - X\beta)$$
Objective: $\min_{\beta} \; (Y - X\beta)^T (Y - X\beta)$
Solution: $\beta^* = (X^T X)^{-1} X^T Y$
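A minimal sketch of the multivariate least squares solution above, assuming numpy; the training data are made up.

```python
# Closed-form multivariate least squares via the normal equations.
import numpy as np

X_raw = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])   # N x M features
Y = np.array([3.0, 2.5, 5.0, 8.0])

X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])   # prepend the column of 1s

# beta* = (X^T X)^{-1} X^T Y  (np.linalg.lstsq is the numerically preferred route)
beta_star = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_star)                                        # [beta_0, beta_1, ..., beta_M]
```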
Following slides are from Prof. Sundeep Rangan’s Intro to ML Class.
Polynomial Fitting
• Last lecture: polynomial regression
• Given data $(x_i, y_i)$, $i = 1, \dots, N$
• Learn a polynomial relationship: $\hat{y} = \beta_0 + \beta_1 x + \dots + \beta_d x^d$
• $d$ = degree of the polynomial, called the model order
• $\beta = (\beta_0, \dots, \beta_d)$ = coefficient vector
• Given $d$, we can find $\beta$ via least squares
• How do we select $d$ from the data?
• This problem is called model order selection.
Example Question
• You are given some data $(x_i, y_i)$.
• Want to fit a model: $\hat{y} = f(x)$
• Decide to use a polynomial: $\hat{y} = \beta_0 + \beta_1 x + \dots + \beta_d x^d$
• What model order $d$ should we use?
• Thoughts?
Synthetic Data
• The previous example uses synthetic data
• $x$: 40 samples uniform in $[-1, 1]$
• $f_0(x)$ = "true relation", a degree-3 polynomial
• $y = f_0(x) + \varepsilon$, with additive noise $\varepsilon$
• Synthetic data is useful for analysis
• We know the "ground truth"
• We can measure the performance of various estimators
Fitting with True Model Order
• Suppose the true polynomial order, d = 3, is known
• Use linear regression
• numpy.polynomial package (a minimal sketch follows)
• Get a very good fit
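A minimal sketch of generating synthetic data and fitting a degree-3 polynomial with numpy.polynomial, assuming the setup described above; the true coefficients and noise level are made up.

```python
# Synthetic data + least squares polynomial fit at the true order d = 3.
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)                                 # 40 samples uniform in [-1, 1]
f0 = lambda t: 1.0 + 0.5 * t - 2.0 * t**2 + 3.0 * t**3     # "true relation" (assumed)
y = f0(x) + rng.normal(0, 0.2, x.shape)                    # add noise

beta_hat = P.polyfit(x, y, deg=3)                          # least squares fit with d = 3
print(beta_hat)                                            # estimated [beta_0, ..., beta_3]
```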
But, True Model Order not Known
• Suppose we guess the wrong model order?
[Figure: d=1 "Underfitting" vs. d=10 "Overfitting"]
How Can You Tell from Data?
• Is there a way to tell what is the correct model order to use?
• Must use the data; we do not have access to the true relation $f_0$.
• What happens if we guess $d$:
• too big?
• too small?
Using RSS on Training Data?
• Simple (but bad) idea:
• For each model order $d$, find the estimate $\hat{\beta}$
• Compute predicted values $\hat{y}_i$ on the training data
• Compute $\mathrm{RSS}(d)$
• Find the $d$ with the lowest $\mathrm{RSS}(d)$
• This doesn't work
• Training $\mathrm{RSS}(d)$ is always decreasing in $d$ (Question: Why?)
• Minimizing it will pick $d$ as large as possible
• Leads to overfitting
• What went wrong?
• How do we do better?
Model Class and True Function
• Analysis set-up:
• The learning algorithm assumes a model class: $\hat{y} = f(x, \beta)$
• But the data has a true relation: $y = f_0(x) + \varepsilon$
• We will quantify three key effects:
• Irreducible error
• Under-modeling
• Over-fitting
Output Mean Squared Error
• To evaluate prediction error suppose we are given:
• A parameter estimate $\hat{\beta}$ (computed from the learning algorithm)
• A test point $x$
• The test point is generally different from the training samples.
• Predicted value: $\hat{y} = f(x, \hat{\beta})$
• Actual value: $y = f_0(x) + \varepsilon$
• Output mean squared error: $E\left[(y - \hat{y})^2\right]$
• The expectation is over the noise on the test sample.
Irreducible Error
• Rewrite the output MSE: $E\left[(y - \hat{y})^2\right] = E\left[(f_0(x) + \varepsilon - \hat{y})^2\right]$
• Since the noise on the test sample is independent of $f_0(x)$ and $\hat{y}$: $E\left[(y - \hat{y})^2\right] = E\left[(f_0(x) - \hat{y})^2\right] + E[\varepsilon^2]$
• Define the irreducible error: $E[\varepsilon^2]$
• Lower bound on the output MSE
• Fundamental limit on our ability to predict $y$
• Occurs since $y$ is influenced by factors other than $x$
Analysis with Noise (Advanced)
• Now assume noise: $y_i = f_0(x_i) + \varepsilon_i$
• Get training data: $(x_i, y_i)$, $i = 1, \dots, N$
• Fit a parameter: $\hat{\beta}$
• $\hat{\beta}$ will be random.
• It depends on the particular noise realization.
• Take a new test point $x$ (not random)
• Compute the mean and variance of the estimated function $f(x, \hat{\beta})$
• Define:
• Bias: difference of the true function from the mean estimate
• Variance: variance of the estimate around its mean
Bias and Variance Illustrated
• Polynomial example
• Mean and std dev of the estimated functions
• 100 trials
[Figure: low variance / high bias (underfit) vs. high variance / zero bias (overfit)]
Bias-Variance Tradeoff
• Simpler models: fewer parameters, risk of under-fitting
• Richer models: more parameters, risk of over-fitting
Cross Validation
• Concept: Need to test fit on data independent of training data
• Divide the data into two sets: training samples and validation samples
• For each model order d, learn the parameters from the training samples
• Measure RSS on the validation samples.
• Select the model order d that minimizes the validation RSS
Finding the Model Order
• Estimated optimal model order = 3
• Test RSS is minimized at d = 3
• Training RSS always decreases with d
Problems with Simple Train/Test Split
• Test error could vary significantly depending on samples selected
• Only use limited number of samples for training
• Problems particularly bad for data with limited number of samples
[From http://blog.goldenhelix.com/goldenadmin/cross-validation-for-genomic-prediction-in-svs/]
K-Fold Cross Validation
• K-fold cross validation
• Divide the data into K parts
• Use K−1 parts for training; use the remaining part for test.
• Average over the K test choices
• More accurate, but requires K fits of the parameters
• Leave-one-out cross validation (LOOCV)
• Take K = N so that one sample is left out each time.
• Most accurate, but requires N model fittings
Polynomial Example
• Use the sklearn KFold object (see the sketch below)
• Loop:
• Outer loop: over the K folds
• Inner loop: over the model order d
• Measure the test error in each fold and for each order
• Can be time-consuming
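A minimal sketch of K-fold cross validation for polynomial model order selection, assuming scikit-learn and the synthetic (x, y) setup from the earlier sketch.

```python
# K-fold cross validation over the polynomial model order d.
import numpy as np
from numpy.polynomial import polynomial as P
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = 1.0 + 0.5 * x - 2.0 * x**2 + 3.0 * x**3 + rng.normal(0, 0.2, x.shape)

K, orders = 5, range(1, 11)
rss = np.zeros((K, len(orders)))

# Outer loop over folds, inner loop over model order d.
for k, (tr, te) in enumerate(KFold(n_splits=K, shuffle=True, random_state=0).split(x)):
    for j, d in enumerate(orders):
        beta = P.polyfit(x[tr], y[tr], deg=d)                        # fit on training fold
        rss[k, j] = np.sum((y[te] - P.polyval(x[te], beta)) ** 2)    # test RSS on held-out fold

print("Mean test RSS per order:", rss.mean(axis=0))
print("Selected model order:", orders[int(np.argmin(rss.mean(axis=0)))])
```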
Polynomial Example CV Results
• For each model order d:
• Compute the mean test RSS over the folds
• Compute the standard error (SE) of the test RSS: SE ≈ std dev / √K
• Mean and SE are computed over the K folds
• Simple model selection:
• Select the d with the lowest mean test RSS
• For this example:
• Estimated model order = 3
Binary Classification
y is a "categorical variable." Can you fit a linear model to this data?
Binary Classification Task (T):
• Simplest example: $x \in \mathbb{R}$ and $y \in \{0, 1\}$
• Dataset of ICLR'18 review scores vs. accept/reject decisions
Logistic Regression
[Figure: Pr{Decision = Accept | Score}]
Binary Classification Task (T):
• Instead, let's compute and plot $p = \Pr\{y = 1 \mid x\}$
• Idea: use linear regression to fit $p$ as a function of $x$: $p = \beta_1 x + \beta_0$
• Is this a good idea?
• The probability $p$ is always bounded between $[0, 1]$
Logistic Regression
"Logits" Function
Binary Classification Task (T):
• Consider the following function: $g = \log\left(\frac{p}{1-p}\right)$
• What is the range of $g$? $g \in (-\infty, \infty)$
• Logistic Regression: fit the logits function using a linear model!
$$g = \log\left(\frac{p}{1-p}\right) = \beta_1 x + \beta_0$$
[Figure: ground-truth probabilities and an illustrative linear fit* of g vs. x]
Note: the linear fit is illustrative only. How to determine the best linear fit will be discussed next!
Logistic Regression
[Figure: Pr{Decision = Accept | Score}]
$$g = \log\left(\frac{p}{1-p}\right) = \beta_1 x + \beta_0 \;\;\Longrightarrow\;\; p = \frac{1}{1 + e^{-(\beta_1 x + \beta_0)}}$$
• What is Pr{Decision = Reject | Score}?
$$1 - p = \frac{e^{-(\beta_1 x + \beta_0)}}{1 + e^{-(\beta_1 x + \beta_0)}}$$
How do we find the model parameters $\beta_1$ and $\beta_0$?
Model Estimation
• We will use an approach referred to as Maximum Likelihood Estimation (MLE)
• Let's assume that the model (i.e., $\beta_1$ and $\beta_0$) is magically known. Consider the training dataset below. What is the likelihood that the dataset came from our model?
#   X          Y
1   x_1 = 3    y_1 = 0
2   x_2 = 8    y_2 = 1
..
N   x_N = 6    y_N = 1
$$\text{Likelihood} = \frac{e^{-(\beta_1 x_1 + \beta_0)}}{1 + e^{-(\beta_1 x_1 + \beta_0)}} \times \frac{1}{1 + e^{-(\beta_1 x_2 + \beta_0)}} \times \dots \times \frac{1}{1 + e^{-(\beta_1 x_N + \beta_0)}}$$
(For $y_i = 0$ the factor is $\Pr\{y = 0 \mid x_i\}$; for $y_i = 1$ it is $\Pr\{y = 1 \mid x_i\}$.)
Model Estimation
• We will use an approach referred to as Maximum Likelihood Estimation (MLE)
• Let's assume that the model (i.e., $\beta_1$ and $\beta_0$) is magically known. Consider the training dataset below. What is the likelihood that the dataset came from our model?
#   X          Y
1   x_1 = 3    y_1 = 0
2   x_2 = 8    y_2 = 1
..
N   x_N = 6    y_N = 1
$$\text{Likelihood} = \frac{e^{-(3\beta_1 + \beta_0)}}{1 + e^{-(3\beta_1 + \beta_0)}} \times \frac{1}{1 + e^{-(8\beta_1 + \beta_0)}} \times \dots \times \frac{1}{1 + e^{-(6\beta_1 + \beta_0)}}$$
Model Estimation
• We will use an approach referred to as Maximum Likelihood Estimation (MLE)
• Let's assume that the model (i.e., $\beta_1$ and $\beta_0$) is magically known. Consider the training dataset below. What is the likelihood that the dataset came from our model?
#   X          Y
1   x_1 = 3    y_1 = 0
2   x_2 = 8    y_2 = 1
..
N   x_N = 6    y_N = 1
$$\text{Log Likelihood} = \log\!\left(\frac{e^{-(3\beta_1 + \beta_0)}}{1 + e^{-(3\beta_1 + \beta_0)}}\right) + \log\!\left(\frac{1}{1 + e^{-(8\beta_1 + \beta_0)}}\right) + \dots + \log\!\left(\frac{1}{1 + e^{-(6\beta_1 + \beta_0)}}\right) = g(\beta_1, \beta_0)$$
$g(\beta_1, \beta_0)$ is a function of the model parameters only.
Find $\beta_1$ and $\beta_0$ that maximize $g$ (or minimize the "loss" $-g$):
$$\text{Loss}(\beta_1, \beta_0) = -g(\beta_1, \beta_0)$$
We Won’t Worry About How (Phew!)
[Figure: ground-truth accept probabilities and the fitted logistic regression (LR) curve]
From regression to classification: if probability of Accept > 0.5, then output Accept.
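A minimal sketch of fitting and using the 1-D logistic regression model, assuming scikit-learn; the review scores and decisions below are made up for illustration.

```python
# Fit a 1-D logistic regression and threshold its probability at 0.5.
import numpy as np
from sklearn.linear_model import LogisticRegression

scores = np.array([[3.0], [4.0], [5.0], [6.0], [6.5], [7.0], [8.0]])   # x
accept = np.array([0, 0, 0, 1, 0, 1, 1])                               # y

lr = LogisticRegression().fit(scores, accept)   # maximizes a (regularized) log likelihood
print(lr.coef_, lr.intercept_)                  # beta_1, beta_0

# From regression to classification: output Accept if Pr{Accept | score} > 0.5.
p_accept = lr.predict_proba([[6.2]])[0, 1]
print("Pr{Accept}:", p_accept, "Decision:", "Accept" if p_accept > 0.5 else "Reject")
```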
Logistic Regression: Multi-Variate
Case
UCI Spam Dataset:
https://archive.ics.uci.edu/ml/datasets/Spambase
• 57 real- or integer-valued features
• Binary output class
$$p_{\text{spam}} = \frac{1}{1 + e^{-\left(\sum_{i=1}^{M} \beta_i x_i + \beta_0\right)}}$$
LR on Spam Database: Results
90% of samples used for training, remaining 10% used for test
[Figure: prediction probabilities for all "SPAM" emails in the test set]
Which emails are mis-predicted?
Accuracy on test set: ~92%
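A minimal sketch of multivariate logistic regression on the UCI Spambase data, assuming scikit-learn and that the CSV from the URL above has been downloaded locally as "spambase.data" (57 feature columns followed by the 0/1 spam label).

```python
# Multivariate logistic regression on Spambase with a 90/10 train/test split.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = np.loadtxt("spambase.data", delimiter=",")
X, y = data[:, :-1], data[:, -1]

# 90% of samples for training, remaining 10% for test, as in the slide.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

lr = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
print("Test accuracy:", accuracy_score(y_te, lr.predict(X_te)))
```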
Which Features Matter?
Our Model:
$$p_{\text{spam}} = \frac{1}{1 + e^{-\left(\sum_{i=1}^{M} \beta_i x_i + \beta_0\right)}}$$
What does $\beta_i = 0$ imply about feature $i$?
Reasonable hypothesis: features with larger absolute values of $\beta$ matter more.
[Figure: learned coefficients per feature; annotated features include "char_freq_$", "cs", and "George"]
Feature Selection
Retrain and predict using only the top-k features (a minimal sketch follows): ~80% accuracy using only 3 features.
Can we explicitly train the parameters so as to prioritize a "sparser" model?
Why? Low model complexity prevents overfitting!
Recall that during training we were seeking to minimize:
$$\hat{\beta} = \arg\min_{\beta} \; \text{Loss}(\beta)$$
How should this objective function change?
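A minimal sketch of the top-k feature selection described above, assuming the Spambase splits (X_tr, X_te, y_tr, y_te) and the fitted model lr from the earlier sketch are in scope.

```python
# Keep only the k features with the largest |beta| and retrain.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

k = 3
# Hypothesis: features with larger absolute coefficients matter more.
top_k = np.argsort(np.abs(lr.coef_[0]))[::-1][:k]

lr_k = LogisticRegression(max_iter=5000).fit(X_tr[:, top_k], y_tr)
print("Accuracy with top-%d features:" % k,
      accuracy_score(y_te, lr_k.predict(X_te[:, top_k])))
```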
Regularization
The $L_p$ norm of a vector $x$: $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$
• p = 0: $\|x\|_0$ = number of non-zero entries
• p = 1: $\|x\|_1 = \sum_i |x_i|$ = sum of absolute values
• p = 2: $\|x\|_2 = \left(\sum_i |x_i|^2\right)^{1/2}$ = square root of the sum of squares
• p = ∞: $\|x\|_\infty = \max_i |x_i|$ = maximum absolute value
"Regularized" loss: $\hat{\beta} = \arg\min_{\beta} \{\text{Loss}(\beta) + c\,\|\beta\|_0\}$, where $c$ controls the relative importance of the regularization penalty.
Regularization In Practice
L0 Regularization: $\hat{\beta} = \arg\min_{\beta} \{\text{Loss}(\beta) + c\,\|\beta\|_0\}$ is a hard "combinatorial" optimization problem!
Instead, the following regularization functions are commonly used:
L1 Regularization (LASSO): $\hat{\beta} = \arg\min_{\beta} \{\text{Loss}(\beta) + c\,\|\beta\|_1\}$
L2 Regularization (Ridge): $\hat{\beta} = \arg\min_{\beta} \{\text{Loss}(\beta) + c\,\|\beta\|_2\}$
We are penalizing "large" coefficients. But why?
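A minimal sketch of L1 (LASSO-style) vs. L2 (ridge-style) regularized logistic regression, assuming scikit-learn and the Spambase splits from the earlier sketch. Note that sklearn's C is the inverse of the penalty weight c used on the slides (larger C means weaker regularization).

```python
# Compare L1 and L2 regularized logistic regression on the spam data.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_tr, y_tr)
l2 = LogisticRegression(penalty="l2", C=0.1, max_iter=5000).fit(X_tr, y_tr)

print("L1 non-zero coefficients:", (l1.coef_ != 0).sum())   # LASSO prefers sparse solutions
print("L2 non-zero coefficients:", (l2.coef_ != 0).sum())
print("L1 test accuracy:", accuracy_score(y_te, l1.predict(X_te)))
print("L2 test accuracy:", accuracy_score(y_te, l2.predict(X_te)))
```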
LASSO and Ridge Regularization
[Figure: contours of the loss function $\text{Loss}(\beta)$ over $(\beta_1, \beta_2)$, together with the L2 ball (circle) and the L1 ball (diamond). The L1 ball's corners lie on the axes, so LASSO prefers sparse solutions!]
Regularization for Spam Classification
Which regularization function should we use?
How should we select c?
Impact of C
[Figure: test accuracy vs. regularization strength for Ridge (L2) and Lasso (L1); model complexity increases from left to right, and the best result is marked]
Errors in Binary Classification
• Two types of errors:
• Type I error (false positive / false alarm): decide $\hat{y} = 1$ when $y = 0$
• Type II error (false negative / missed detection): decide $\hat{y} = 0$ when $y = 1$
• The implications of these errors may be different
• Think of breast cancer diagnosis
• The accuracy of a classifier can be measured by its true positive rate (TPR) and false positive rate (FPR)
[Remaining slides from Prof. Rangan's Intro to ML Class]
ROC Curve
• Varying the threshold obtains a set of classifiers
• Trades off FPR and TPR
• Can visualize with an ROC curve
• Receiver operating characteristic curve, a term from digital communications
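A minimal sketch of plotting an ROC curve for the spam classifier, assuming scikit-learn, matplotlib, and the fitted model lr and test split from the earlier sketches.

```python
# Sweep the decision threshold and plot the resulting ROC curve.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

scores = lr.predict_proba(X_te)[:, 1]          # predicted spam probabilities
fpr, tpr, thresholds = roc_curve(y_te, scores) # one (FPR, TPR) point per threshold

plt.plot(fpr, tpr, label="LR (AUC = %.2f)" % auc(fpr, tpr))
plt.plot([0, 1], [0, 1], "--", label="chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```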