Logistic Regression Report

Here are the answers to the quiz questions on logistic regression:
1. Logistic regression is a regression model used for classification.
2. The dependent variable in logistic regression is categorical.
3. The predicted probabilities in logistic regression range between 0 and 1.
4. Maximum likelihood estimation is used to estimate the parameters (coefficients) in logistic regression.
5. The link function used in logistic regression is the logit (log-odds) function.
6. The odds ratio in logistic regression indicates by what factor the odds of the outcome change with a one-unit increase in the predictor.
7. The intercept in logistic regression represents the log odds of the outcome when all predictors equal 0.
8. The coefficients in logistic regression indicate

Uploaded by

Cherry Mea Bahoy
Copyright
© All Rights Reserved

Logistic Regression

AN INTRODUCTION BY ELAH P. QUITORIANO


A supervised learning algorithm used to predict a categorical dependent (target) variable.
Example: classifying an input as Animal or Not Animal.
A logistic regression model predicts a dependent variable by analyzing its relationship with one or more independent variables.

Independent variable: features of an animal
Dependent variable: Animal or Not Animal
In medicine, suppose we want to predict whether a person is likely to have a lung disease.

Independent variables: Age, Gender, Smoking Status
Dependent variable: Diseased or Not Diseased
3 Types of Logistic Regression

1. Binary Logistic Regression


The dependent variable has just two possible outcomes. These are
typically coded as 0 and 1.

Examples include:
o Whether or not to lend to a bank customer
Dependent variable: Yes or No
o Assessing cancer risk
Dependent variable: High or Low
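
A minimal sketch of the 0/1 coding mentioned above. The loan decisions listed here are made-up values for illustration, not data from the report.

```python
# Hypothetical loan decisions, used only to illustrate 0/1 coding.
decisions = ["Yes", "No", "No", "Yes"]

# Map each outcome to the integer code a binary classifier works with.
label_codes = {"No": 0, "Yes": 1}
encoded = [label_codes[decision] for decision in decisions]

print(encoded)  # [1, 0, 0, 1]
```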

2. Multinomial Logistic Regression


The dependent variable has three or more possible classes, with no inherent order among them.

Examples include:

o Predicting whether a student will go to college, trade school or into the workforce.
o Does your cat prefer wet food, dry food or human food?

3. Ordinal Logistic Regression


This is also a model with multiple classes that an item can be classified as;
however, the classes must have a natural order.

Examples include:

o Ranking restaurants on a scale of 0 to 5 stars
o Assessing a choice of candidates, especially in places that institute ranked-choice voting.
Logistic Regression Use Cases

Business Example
For an online retailer, you need to predict which product a particular customer
is most likely to buy.
Political Example
Predicting whether a political candidate will win or lose an election.
Banking Example
Predicting the chance that a loan applicant will default on a loan.
Logistic Regression Theory and Concepts
So what are the theory and concepts of logistic
regression?
Logistic regression is a supervised machine learning algorithm
used for binary classification problems, where the goal is to predict
whether an input belongs to one of two possible classes. The
algorithm learns the relationship between input variables and their
corresponding binary output by finding the optimal set of
parameters or weights that minimize the error between predicted
and actual values.
In logistic regression, the input features are combined linearly
using weights, and the output is transformed using a sigmoid
function to give a probability score between 0 and 1. This
probability score represents the likelihood of the input belonging to
one of the two classes. If the probability score is above a certain
threshold (usually 0.5), the input is classified as belonging to the
positive class; otherwise, it is classified as belonging to the
negative class.
The mathematical formula for logistic
regression is:

p(y=1|x;w) = 1 / (1 + exp(-w^T x))
where p(y=1|x;w) is the probability of the input x belonging to the
positive class, w is the weight vector, x is the input feature vector,
and exp is the exponential function.
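
The formula and the threshold rule can be sketched in plain Python. The weights and inputs below are hypothetical, and treating the first feature as a constant 1.0 (so that its weight acts as the intercept) is an assumption of this sketch rather than something the report specifies.

```python
import math


def predict_probability(weights, features):
    """Return p(y=1 | x; w) = 1 / (1 + exp(-w^T x))."""
    # w^T x: the weighted (linear) combination of the input features.
    z = sum(w * x for w, x in zip(weights, features))
    # The sigmoid squashes z into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


def classify(weights, features, threshold=0.5):
    """Return 1 (positive class) if the probability is above the threshold, else 0."""
    return 1 if predict_probability(weights, features) > threshold else 0


# Hypothetical weights: the first entry pairs with a constant 1.0 (intercept).
weights = [-1.0, 2.0]

print(predict_probability(weights, [1.0, 0.5]))  # z = 0, so the probability is 0.5
print(classify(weights, [1.0, 2.0]))             # z = 3, probability above 0.5
print(classify(weights, [1.0, 0.0]))             # z = -1, probability below 0.5
```

For very large negative values of z, `math.exp(-z)` can overflow; a production implementation would guard against that, which this sketch does not.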

The weights in logistic regression are learned using an
optimization algorithm that minimizes the cost function, which is
the negative log-likelihood of the training data. The most common
optimization algorithms used are gradient descent and its variants,
such as stochastic gradient descent.
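
A rough sketch of how plain (batch) gradient descent could minimize the negative log-likelihood. The toy data, learning rate, and number of epochs are assumptions chosen for illustration; they do not come from the report, and a real project would normally rely on a tested library instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def negative_log_likelihood(weights, rows, labels):
    """Average negative log-likelihood of the labels under the model."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def fit(rows, labels, learning_rate=0.1, epochs=1000):
    """Learn weights with batch gradient descent, starting from zeros."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            # Gradient of the loss for one example: (p - y) * x.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                gradient[j] += error * xi
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


# Hypothetical toy data: each row is [1.0 (intercept term), feature value].
rows = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0],
        [1.0, 2.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

initial_loss = negative_log_likelihood([0.0, 0.0], rows, labels)
trained_weights = fit(rows, labels)
final_loss = negative_log_likelihood(trained_weights, rows, labels)

print(initial_loss, final_loss)  # the loss should go down after training
```

Stochastic gradient descent follows the same idea but updates the weights after each example (or small batch) instead of after a full pass over the data.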
Logistic regression can be extended to handle multi-class
classification problems by using techniques such as one-vs-all and
softmax regression. In one-vs-all, multiple binary logistic
regression models are trained, where each model represents one
class against all the others. In softmax regression, a generalization
of logistic regression, the output is transformed using a softmax
function, which normalizes the output probabilities for all classes.
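
A small sketch of the softmax function described above. The class scores are hypothetical, and subtracting the maximum score before exponentiating is a common numerical-stability trick added here; it does not change the result.

```python
import math


def softmax(scores):
    """Turn a list of per-class scores into probabilities that sum to 1."""
    shift = max(scores)  # stability trick: keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes.
probabilities = softmax([2.0, 1.0, 0.1])

print(probabilities)       # largest score gets the largest probability
print(sum(probabilities))  # approximately 1.0
```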
Logistic regression has several advantages, such as being
interpretable and computationally efficient. However, it may not
perform well when the relationship between the input variables and
the output is complex or nonlinear. In such cases, more advanced
machine learning algorithms such as decision trees, random
forests, or neural networks may be more appropriate.
Why do we use Logistic Regression rather than
Linear Regression?
If you have this doubt, then you’re in the right place, my friend. After
reading the definition of logistic regression, we now know that it is
used when our dependent variable is categorical (binary, in the simplest case), whereas in linear regression the
dependent variable is continuous.
The second problem is that if we add an outlier to our dataset, the best-fit
line in linear regression shifts to fit that point.
Suppose we use linear regression to find the best-fit line, which aims at
minimizing the distance between the predicted values and the actual values.
[Figure: best-fit line through the tumor data]
Here the threshold value is 0.5, which means that if the value of h(x) is greater
than 0.5 we predict a malignant tumor (1), and if it is less than 0.5
we predict a benign tumor (0). Everything seems okay here, but now let’s
change it a bit: if we add some outliers to our dataset, the best-fit line
shifts toward those points. The value of x at which h(x) crosses 0.5 shifts
with it, so tumors near the old threshold can end up misclassified.
[Figure: best-fit line shifted by the outliers]
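
The shift can be illustrated with a small sketch that fits a least-squares line before and after adding one outlier. The tumor sizes and labels are made-up numbers for illustration; only the 0.5 threshold comes from the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def decision_boundary(slope, intercept, threshold=0.5):
    """Value of x where the fitted line h(x) crosses the threshold."""
    return (threshold - intercept) / slope


# Hypothetical tumor sizes with labels: 0 = benign, 1 = malignant.
sizes = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 1, 1]

boundary_before = decision_boundary(*fit_line(sizes, labels))

# Add one outlier: a very large malignant tumor.
boundary_after = decision_boundary(*fit_line(sizes + [30], labels + [1]))

print(boundary_before)  # 3.5: sizes 4, 5 and 6 are predicted malignant
print(boundary_after)   # above 4: the size-4 malignant tumor is now predicted benign
```

Because logistic regression passes the linear combination through a sigmoid, a correctly classified point far from the boundary has much less pull on the fit than it does in least squares.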
ANALYSIS
Logistic Regression is a parametric model, which means that it
assumes a specific functional form for the relationship between the
input variables and the output. The input features are combined
linearly using weights, and the output is transformed using a
sigmoid function to give a probability score between 0 and 1. This
probability score represents the likelihood of the input belonging to
one of the two classes. If the probability score is above a certain
threshold (usually 0.5), the input is classified as belonging to the
positive class; otherwise, it is classified as belonging to the
negative class.
One of the key advantages of logistic regression is that it is a
simple and interpretable algorithm. It is also computationally
efficient and can handle large datasets with high-dimensional input
features. However, logistic regression may not perform well when
the relationship between the input variables and the output is
complex or nonlinear. In such cases, more advanced machine
learning algorithms such as decision trees, random forests, or
neural networks may be more appropriate.
To apply logistic regression to a problem, the first step is to gather
and preprocess the data. This involves identifying the input
features and the corresponding binary output and cleaning,
normalizing, and standardizing the data as needed.
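
As one illustration of the preprocessing step, here is a sketch of standardizing a single numeric feature to zero mean and unit standard deviation. The ages are hypothetical values, and using the population standard deviation is a choice made for this sketch.

```python
import statistics


def standardize(values):
    """Rescale values to have mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]


# Hypothetical ages of five patients.
ages = [20, 30, 40, 50, 60]
standardized_ages = standardize(ages)

print(standardized_ages)  # the middle value (40) maps to 0.0
```

In practice the mean and standard deviation should be computed on the training set only and then reused for the testing set, so that no information leaks from the test data.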
Next, the data is split into training and testing sets. The training set
is used to learn the optimal set of weights using an optimization
algorithm such as gradient descent or its variants, such as
stochastic gradient descent. The testing set is used to evaluate the
performance of the model and estimate the generalization error.
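
A minimal sketch of the split described above, using only the standard library. The function name `train_test_split`, the 20% test fraction, and the fixed seed are assumptions of this sketch, not part of the report.

```python
import random


def train_test_split(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle the examples and hold out a fraction of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_rows = [rows[i] for i in train_idx]
    test_rows = [rows[i] for i in test_idx]
    train_labels = [labels[i] for i in train_idx]
    test_labels = [labels[i] for i in test_idx]
    return train_rows, test_rows, train_labels, test_labels


# Hypothetical dataset of ten single-feature examples.
rows = [[float(i)] for i in range(10)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

train_rows, test_rows, train_labels, test_labels = train_test_split(rows, labels)

print(len(train_rows), len(test_rows))  # 8 2
```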
The performance of a logistic regression model can be evaluated
using various metrics such as accuracy, precision, recall, and F1
score. The choice of metric depends on the specific problem and
the relative importance of false positives and false negatives.
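
The four metrics can be computed from the counts of true and false positives and negatives. The sketch below uses made-up actual and predicted labels; returning 0.0 when a denominator is zero is a convention chosen here for simplicity.

```python
def evaluate(actual, predicted):
    """Return accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 true positives, 3 true negatives,
# 1 false positive and 1 false negative.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]

scores = evaluate(actual, predicted)
print(scores)  # every metric is 0.75 for this example
```

Precision drops when false positives are frequent and recall drops when false negatives are frequent, which is why the choice between them depends on which error is more costly.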
In conclusion, Logistic Regression is a useful and widely used
machine learning algorithm for binary classification problems. Its
simplicity, interpretability, and computational efficiency make it a
popular choice for various applications. However, its performance
may be limited when the relationship between the input variables
and the output is complex or nonlinear.
IMPLEMENTING LOGISTIC REGRESSION IN WEKA

Andy Von M. Simbajon


Step 1 Click on the Classify tab.
Step 2 Click on the Choose button.
Step 3 Open the functions folder and select Logistic.
Step 4 Select Percentage split and change it to 80%. Then click More options,
choose PlainText for the output predictions, and click Start.
Result :
ACTIVITY
Load the Breast Cancer dataset found in the WEKA
root folder and perform logistic regression on it.
Quiz
1. What type of regression is logistic regression?
2. What is the dependent variable in logistic regression?
3. What is the range of the predicted probabilities in logistic regression?
4. What is maximum likelihood estimation used for in logistic regression?
5. What is the link function used in logistic regression?
6. What is the purpose of the odds ratio in logistic regression?
7. What is the meaning of the intercept in logistic regression?
8. How do you interpret the coefficients in logistic regression?
9. What is the purpose of the confusion matrix in logistic regression?
10. What is the difference between logistic regression and linear regression?
