The document discusses linear methods for classification, particularly focusing on logistic regression as a solution for modeling categorical response variables. It explains the importance of using a logistic function to ensure outputs remain within the range of [0, 1] and describes how to estimate coefficients using maximum likelihood. Additionally, it highlights the impact of confounding variables on model interpretation, emphasizing the need to include relevant features in the analysis.


COMP3202 - Intro to Machine Learning

Linear Methods for Classification


Linear Models for Classification
For tasks where the response variable is categorical, we need a method that models the posterior probabilities:
Pr(Y = k | X = x)
where k is the class of instance x.

If Pr(Y = k | X = x) is linear in X, then:
– the decision boundaries will be linear, and
– we can use a linear model.

Figure from The Elements of Statistical Learning by Hastie, Tibshirani and Friedman, 2009.
Why not use linear regression?

Pr(Y = k | X = x) must be modelled with a function that gives outputs between 0 and 1 for all values of X.

Fig. from An Introduction to Statistical Learning: with Applications in R by James, Witten, Hastie, and Tibshirani, 2013.
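To see concretely why a straight line is a poor model here, the following sketch (illustrative toy data, not from the slides) fits an ordinary least-squares line to 0/1 labels and shows its "probabilities" escaping [0, 1] for extreme inputs:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (closed form)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

# Toy data: the label flips to 1 once x exceeds 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
b0, b1 = fit_line(xs, ys)

print(b0 + b1 * 20)   # prediction for a large x: greater than 1
print(b0 + b1 * -10)  # prediction for a small x: less than 0
```

Any prediction outside [0, 1] cannot be interpreted as a probability, which is exactly the problem the logistic function fixes.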
From linear to logistic regression
where e is Euler's number ≈ 2.71828

Fig. from An Introduction to Statistical Learning: with Applications in R by James, Witten, Hastie, and Tibshirani, 2013.
Logistic Regression
We need to model the relationship between p(X) = Pr(Y = 1 | X = x) and X.
Consider using a linear model to represent the probabilities:

p(X) = 𝛽₀ + 𝛽₁X

Using this equation, the output for very large or very small input values could fall outside the range [0, 1]. (Why is this not sensible?)
To avoid this problem, we use a logistic function that ensures the output lies within the range (0, 1):

p(X) = e^(𝛽₀ + 𝛽₁X) / (1 + e^(𝛽₀ + 𝛽₁X))

This function produces an S-shaped curve that, regardless of the value of X, produces a sensible output.
Logistic Regression
After a bit of manipulation of

p(X) = e^(𝛽₀ + 𝛽₁X) / (1 + e^(𝛽₀ + 𝛽₁X))

we find that

p(X) / (1 − p(X)) = e^(𝛽₀ + 𝛽₁X)

The left-hand side is called the odds; it takes values between 0 and ∞.

By taking the logarithm of both sides:

log( p(X) / (1 − p(X)) ) = 𝛽₀ + 𝛽₁X

The left-hand side is called the log-odds, or logit.


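A small sketch (with hypothetical coefficients 𝛽₀ = −2, 𝛽₁ = 0.5, chosen only for illustration) verifying numerically that the log-odds recover the linear predictor, and that a unit increase in X multiplies the odds by e^𝛽₁:

```python
import math

b0, b1 = -2.0, 0.5  # hypothetical example coefficients

def p(x):
    """Logistic model probability for input x."""
    return math.exp(b0 + b1 * x) / (1 + math.exp(b0 + b1 * x))

def odds(x):
    return p(x) / (1 - p(x))

def log_odds(x):
    return math.log(odds(x))

# The logit is linear in X: it equals b0 + b1*x.
print(log_odds(4), b0 + b1 * 4)

# A one-unit increase in X multiplies the odds by e^b1.
print(odds(5) / odds(4), math.exp(b1))
```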
Finding the Coefficients
Likelihood function:

ℓ(𝛽₀, 𝛽₁) = ∏_{i: yᵢ=1} p(xᵢ) × ∏_{i′: yᵢ′=0} (1 − p(xᵢ′))

Maximum likelihood is a very general approach that is used to estimate the 𝛽s that maximize this likelihood function.
Any statistical package can be used to estimate the 𝛽s (e.g. via optimizers such as SGD).

Generalized likelihood function:

ℓ(𝛽₀, 𝛽₁) = ∏ᵢ p(xᵢ)^yᵢ (1 − p(xᵢ))^(1−yᵢ)
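A minimal sketch of maximum likelihood in pure Python: gradient ascent on the log-likelihood of a hypothetical toy dataset (a simple stand-in for the optimizers a statistical package would use; the data and learning rate are invented for illustration):

```python
import math

# Hypothetical toy data: inputs x_i and binary labels y_i.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   1,   0,   1,   1,   1]

def p(x, b0, b1):
    """Logistic model probability Pr(Y = 1 | X = x)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1):
    """Log of the generalized likelihood: sum of y*log p + (1-y)*log(1-p)."""
    return sum(y * math.log(p(x, b0, b1)) + (1 - y) * math.log(1 - p(x, b0, b1))
               for x, y in zip(xs, ys))

# Gradient ascent: the gradient of the log-likelihood is
# d/db0 = sum(y - p),  d/db1 = sum((y - p) * x).
b0 = b1 = 0.0
lr = 0.05
for _ in range(5000):
    g0 = sum(y - p(x, b0, b1) for x, y in zip(xs, ys))
    g1 = sum((y - p(x, b0, b1)) * x for x, y in zip(xs, ys))
    b0 += lr * g0
    b1 += lr * g1

print(b0, b1, log_likelihood(b0, b1))
```

The fitted 𝛽₁ comes out positive (larger x is associated with y = 1), and the log-likelihood is strictly higher than at the starting point (0, 0).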
Logistic Regression
Maximum likelihood is used to estimate the 𝛽s.

[Figure: the fitted logistic (S-shaped) curve, showing the probability of Y = 1 as a function of X.]
Fig. adapted from An Introduction to Statistical Learning: with Applications in R by James, Witten, Hastie, and Tibshirani, 2013.
Logistic Regression
● The logit is linear in X.
● If βᵢ is positive, then increasing Xᵢ will increase p(X).
● If βᵢ is negative, then increasing Xᵢ will decrease p(X).
● Predict Y = 1 for any instance for which p(X) > threshold.
● The decision boundary is the set of points for which the log-odds are zero.
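These points can be checked numerically. With hypothetical coefficients 𝛽₀ = −3, 𝛽₁ = 1.5 (invented for illustration), the log-odds are zero at X = −𝛽₀/𝛽₁ = 2, which is exactly where p(X) crosses 0.5:

```python
import math

b0, b1 = -3.0, 1.5  # hypothetical coefficients with a positive slope

def p(x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def predict(x, threshold=0.5):
    """Predict Y = 1 whenever p(x) exceeds the threshold."""
    return 1 if p(x) > threshold else 0

boundary = -b0 / b1  # the point where the log-odds b0 + b1*x equal zero
print(boundary, p(boundary))  # p at the boundary is 0.5
print(predict(1), predict(3)) # below the boundary -> 0, above -> 1
```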
Example
Suppose we want to predict, using logistic regression, whether an individual will default on their credit card payment on the basis of annual income, monthly credit card balance, and student status.

Example and Figure from An Introduction to Statistical Learning: with Applications in R by James, Witten, Hastie, and Tibshirani, 2013.
Making Predictions

Based on these coefficients, the default probability for an individual with a balance of $1,000 is:

Fig. from An Introduction to Statistical Learning: with Applications in R by James, Witten, Hastie, and Tibshirani, 2013.
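The slide's coefficient table did not survive extraction; as a sketch, the balance-only fit reported in the cited ISLR Default example has an intercept of about −10.6513 and a balance coefficient of about 0.0055, which gives a default probability well under 1% at a balance of $1,000:

```python
import math

# Coefficients as reported in ISLR's Default example (balance-only model):
b0, b1 = -10.6513, 0.0055

balance = 1000
z = b0 + b1 * balance
p_default = math.exp(z) / (1 + math.exp(z))
print(round(p_default, 5))  # ≈ 0.00576, i.e. about a 0.6% chance of default
```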
Confounding
Suppose that we construct a model of the probability of default using only the feature student status, and that the coefficient associated with this feature is 0.4049.

However, when we add the features balance and income to the model, the coefficient associated with student status becomes negative. Why?
Confounding

● Interpretation:
○ A student is less risky than a non-student with the same credit card balance.

● Confounding occurs when features are correlated.
● The results of linear models can change significantly depending on which features are included.
● It is important to include all relevant features.
Example and Figure from An Introduction to Statistical Learning: with Applications in R by James, Witten, Hastie, and Tibshirani, 2013.
