0% found this document useful (0 votes)
385 views18 pages

MACHINE LEARNING Presentation Logistic Regression

Logistic regression is a machine learning algorithm used for classification problems. It predicts the probability of an output belonging to a class by fitting data to a logistic curve. The algorithm works by assigning probabilities to outcomes using the sigmoid function and creating a decision boundary to separate classes based on these probabilities.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
385 views18 pages

MACHINE LEARNING Presentation Logistic Regression

Logistic regression is a machine learning algorithm used for classification problems. It predicts the probability of an output belonging to a class by fitting data to a logistic curve. The algorithm works by assigning probabilities to outcomes using the sigmoid function and creating a decision boundary to separate classes based on these probabilities.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 18

MACHINE LEARNING

LOGISTIC REGRESSION
Brendon chamunorwa m223138
Emmanuel chingosho m224134
Rudo nyamarambwe m222242
Tadiwa masocha m223530
WHAT IS LOGISTIC
REGRESSION
Logistic regression is a supervised machine learning algorithm mainly used for
classification tasks where the goal is to predict the probability that an instance of belonging
to a given class.
It is used for classification algorithms its name is logistic regression. it’s referred to as
regression because it takes the output of the linear regression function as input and uses a
sigmoid function to estimate the probability for the given class.
The difference between linear regression and logistic regression is that linear regression
output is the continuous value that can be anything while logistic regression predicts the
probability that an instance belongs to a given class or not.
TERMINOLOGIES INVOLVED
IN LOGISTIC REGRESSION:
Independent variables: The input characteristics or predictor factors applied to the
dependent variable’s predictions.
Dependent variable: The target variable in a logistic regression model, which we are
trying to predict.
Logistic function: The formula used to represent how the independent and dependent
variables relate to one another. The logistic function transforms the input variables
into a probability value between 0 and 1, which represents the likelihood of the
dependent variable being 1 or 0.
.

Odds: It is the ratio of something occurring to something not occurring. it is different from
probability as the probability is the ratio of something occurring to everything that could
possibly occur.
Log-odds: The log-odds, also known as the logit function, is the natural logarithm of the
odds. In logistic regression, the log odds of the dependent variable are modeled as a linear
combination of the independent variables and the intercept.
Coefficient: The logistic regression model’s estimated parameters, show how the
independent and dependent variables relate to one another.
Intercept: A constant term in the logistic regression model, which represents the log odds
when all independent variables are equal to zero.
Maximum likelihood estimation: The method used to estimate the coefficients of the
logistic regression model, which maximizes the likelihood of observing the data given the
model.
CHARACTERISTICS OF
LOGISTIC REGRESSION
It is used for predicting the categorical dependent variable using a given set of
independent variables.
Logistic regression predicts the output of a categorical dependent variable. Therefore
the outcome must be a categorical or discrete value.
It can be either Yes or No, 0 or 1, true or False, etc. but instead of giving the exact
value as 0 and 1, it gives the probabilistic values which lie between 0 and 1.
Logistic Regression is much similar to the Linear Regression except that how they
are used. Linear Regression is used for solving Regression problems, whereas
Logistic regression is used for solving the classification problems.
,

In Logistic regression, instead of fitting a regression line, we fit an “S” shaped logistic
function, which predicts two maximum values (0 or 1).
The curve from the logistic function indicates the likelihood of something such as whether
the cells are cancerous or not, a mouse is obese or not based on its weight, etc.
Logistic Regression is a significant machine learning algorithm because it has the ability to
provide probabilities and classify new data using continuous and discrete datasets.
Logistic Regression can be used to classify the observations using different types of data
and can easily determine the most effective variables used for the classification.
LOGISTIC FUNCTION OR
SIGMOID FUNCTION
The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
It maps any real value into another value within a range of 0 and 1. o The value of
the logistic regression must be between 0 and 1, which cannot go beyond this limit,
so it forms a curve like the “S” form.
The S-form curve is called the Sigmoid function or the logistic function.
In logistic regression, we use the concept of the threshold value, which defines the
probability of either 0 or 1. Such as values above the threshold value tends to 1, and
a value below the threshold values tends to 0.
TYPE OF LOGISTIC
REGRESSION:
1. Binomial: In binomial Logistic regression, there can be only two possible types
of the dependent variables, such as 0 or 1, Pass or Fail, etc.

2. Multinomial: In multinomial Logistic regression, there can be 3 or more possible


unordered types of the dependent variable, such as “cat”, “dogs”, or “sheep”

3. Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered


types of dependent variables, such as “low”, “Medium”, or “High”.
HOW DOES LOGISTIC
REGRESSION WORK?
Machine learning generally involves predicting a quantitative outcome or a
qualitative class. The former is commonly referred to as a regression problem. In the
scenario of linear regression, the input is a continuous variable, and the prediction is
a numerical value. When predicting a qualitative outcome (class), the task is
considered a classification problem. Examples of classification problems include
predicting what products a user will buy or if a target user will click on an online
advertisement.
Not all algorithms fit cleanly into this simple dichotomy, though, and logistic
regression is a notable example.
.

Logistic regression is part of the regression family as it involves predicting outcomes based
on quantitative relationships between variables. However, unlike linear regression, it
accepts both continuous and discrete variables as input and its output is qualitative. In
addition, it predicts a discrete class such as “Yes/No” or “Customer/Non-customer”.

In practice, the logistic regression algorithm analyzes relationships between variables. It


assigns probabilities to discrete outcomes using the Sigmoid function, which converts
numerical results into an expression of probability between 0 and 1.0. Probability is either 0
or 1, depending on whether the event happens or not. For binary predictions, you can divide
the population into two groups with a cut-off of 0.5. Everything above 0.5 is considered to
belong to group A, and everything below is considered to belong to group B.
A hyperplane is used as a decision line to separate two categories (as far as possible) after
data points have been assigned to a class using the Sigmoid function. The class of future
data points can then be predicted using the decision boundary.
LOGISTIC REGRESSION
EQUATION
The odd is the ratio of something occurring to something not occurring. it is
different from probability as the probability is the ratio of something occurring to
everything that could possibly occur. so odd will be

Applying natural log on odd. then log odd will be

then the final logistic regression equation will be:


ASSUMPTIONS FOR LOGISTIC
REGRESSION
Independent observations: Each observation is independent of the other. meaning
there is no correlation between any input variables.
Binary dependent variables: It takes the assumption that the dependent variable must
be binary or dichotomous, meaning it can take only two values. For more than two
categories softmax functions are used.
Linearity relationship between independent variables and log odds: The relationship
between the independent variables and the log odds of the dependent variable should
be linear.
No outliers: There should be no outliers in the dataset.
Large sample size: The sample size is sufficiently large
APPLYING STEPS IN LOGISTIC
REGRESSION MODELING:
1. Define the problem: Identify the dependent variable and independent variables and determine if the
problem is a binary classification problem.
2. Data preparation: Clean and preprocess the data, and make sure the data is suitable for logistic regression
modeling.
3. Exploratory Data Analysis (EDA): Visualize the relationships between the dependent and independent
variables, and identify any outliers or anomalies in the data.
4. Feature Selection: Choose the independent variables that have a significant relationship with the
dependent variable, and remove any redundant or irrelevant features.
5. Model Building: Train the logistic regression model on the selected independent variables and estimate
the coefficients of the model.
6. Model Evaluation: Evaluate the performance of the logistic regression model using appropriate metrics
such as accuracy, precision, recall, F1-score, or AUC-ROC.
7. Model improvement: Based on the results of the evaluation, fine-tune the model by adjusting the
independent variables, adding new features, or using regularization techniques to reduce overfitting.
8. Model Deployment: Deploy the logistic regression model in a real-world scenario and make predictions
on new data.
LOGISTIC REGRESSION
MODEL THRESHOLDING
Logistic regression becomes a classification technique only when a decision
threshold is brought into the picture. The setting of the threshold value is a very
important aspect of Logistic regression and is dependent on the classification
problem itself.

The decision for the value of the threshold value is majorly affected by the values of
precision and recall. Ideally, we want both precision and recall to be 1, but this
seldom is the case.

In the case of a Precision-Recall tradeoff, we use the following arguments to decide


upon the threshold:
,

1. Low Precision/High Recall: In applications where we want to reduce the number of


false negatives without necessarily reducing the number of false positives, we choose a
decision value that has a low value of Precision or a high value of Recall. For example,
in a cancer diagnosis application, we do not want any affected patient to be classified as
not affected without giving much heed to if the patient is being wrongfully diagnosed
with cancer. This is because the absence of cancer can be detected by further medical
diseases but the presence of the disease cannot be detected in an already rejected
candidate.

2. High Precision/Low Recall: In applications where we want to reduce the number of


false positives without necessarily reducing the number of false negatives, we choose a
decision value that has a high value of Precision or a low value of Recall. For example,
if we are classifying customers whether they will react positively or negatively to a
personalized advertisement, we want to be absolutely sure that the customer will react
positively to the advertisement because otherwise, a negative reaction can cause a loss
of potential sales from the customer.
CONCLUSION
Logistic regression is a supervised learning method that helps to predict events that have a
binary outcome, such as whether a person will successfully pass a driving test. In order to
make predictions in this scenario, you need data from past test results. The model takes this
data and predicts the likelihood that the same person will pass the test in the future. The
main idea behind logistic regression is to use a model based on the probability of an
outcome occurring.

You might also like