
Classification

Classification is a supervised learning task in which a model learns to assign input examples to discrete categories. For example, an algorithm can learn to predict whether a given email is spam or ham.

Machine Learning Classification Vs. Regression

There are four main categories of Machine Learning algorithms: supervised, unsupervised, semi-supervised, and reinforcement learning.

Even though classification and regression both belong to the category of supervised learning, they are not the same.

 The prediction task is a classification when the target variable is discrete. An application is the identification of the underlying sentiment of a piece of text.
 The prediction task is a regression when the target variable is continuous. An example is the prediction of a person's salary given their education degree, previous work experience, geographical location, and level of seniority.
Examples of Machine Learning Classification in Real Life

 During the COVID-19 pandemic, machine learning models were implemented to efficiently predict whether a person had COVID-19 or not.
 Predicting which geographical locations will see a rise in traffic volume.

Different Types of Classification Tasks in Machine Learning

There are four main classification tasks in Machine learning: binary, multi-class,
multi-label, and imbalanced classifications.

In a binary classification task, the goal is to classify the input data into two mutually exclusive categories. The training data in such a situation is labeled in a binary format: true and false, positive and negative, 0 and 1.

Multi-class classification, on the other hand, has more than two mutually exclusive class labels, where the goal is to predict to which class a given input example belongs.
In multi-label classification tasks, we try to predict zero or more classes for each input example. In this case, there is no mutual exclusion, because an input example can have more than one label.

In imbalanced classification, the number of examples is unevenly distributed across the classes, meaning that we can have more of one class than the others in the training data. Consider the following 3-class classification scenario where the training data contains 60% trucks, 25% planes, and 15% boats.
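As a rough illustration, here is a minimal sketch (assuming scikit-learn; the labels are made up to reproduce the 60/25/15 split above) of how per-class weights can compensate for such an imbalance during training:

```python
# Sketch: computing balanced class weights for an imbalanced dataset.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels reproducing the 60% / 25% / 15% distribution.
y = np.array(["truck"] * 60 + ["plane"] * 25 + ["boat"] * 15)

classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# Rarer classes receive larger weights, so misclassifying a boat
# costs more than misclassifying a truck during training.
for cls, w in zip(classes, weights):
    print(f"{cls}: {w:.2f}")
```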
Metrics Used in Machine Learning Classification Algorithms
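Commonly used metrics include accuracy, precision, recall, and the F1-score. A minimal sketch, assuming scikit-learn and made-up true/predicted labels:

```python
# Sketch of common classification metrics using scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # hypothetical ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # hypothetical model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))   # fraction of correct predictions
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1-score :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```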
Regression
Regression analysis is a statistical method for modeling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
More specifically, regression analysis helps us understand how the value of the dependent variable changes with respect to one independent variable when the other independent variables are held fixed.
It predicts continuous/real values such as temperature, age, salary,
price, etc.

Types of Regression
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression

Linear Regression:
o Linear regression is a statistical regression method which is used for
predictive analysis.
o It is one of the simplest and easiest regression algorithms, and it shows the relationship between continuous variables.
o Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence called linear
regression.
o If there is only one input variable (x), then such linear regression is
called simple linear regression. And if there is more than one input
variable, then such linear regression is called multiple linear
regression.
o The relationship between the variables in a linear regression model can be illustrated by predicting the salary of an employee on the basis of years of experience.
Below is the mathematical equation for linear regression: Y = aX + b, where Y is the dependent (target) variable, X is the independent (predictor) variable, a is the slope of the regression line, and b is the intercept.
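As an illustrative sketch (assuming scikit-learn; the experience/salary numbers are made up), simple linear regression recovers the slope a and intercept b from data:

```python
# Sketch: fitting salary as a linear function of years of experience.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])                  # years of experience
y = np.array([40_000, 45_000, 52_000, 58_000, 63_000])   # hypothetical salaries

model = LinearRegression().fit(X, y)
print("slope a    :", model.coef_[0])
print("intercept b:", model.intercept_)
print("prediction for 6 years:", model.predict([[6]])[0])
```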

Logistic Regression:
o Logistic regression is another supervised learning algorithm which is used to solve classification problems.
o The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
o It is a predictive analysis algorithm which works on the concept of
probability.
o Logistic regression is a type of regression, but it differs from the linear regression algorithm in how it is used.
o Logistic regression uses the sigmoid function (logistic function) to map predicted values to probabilities. This sigmoid function is used to model the data in logistic regression. The function can be represented as:

f(x) = 1 / (1 + e^(-x))
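A minimal sketch, assuming scikit-learn and made-up spam-like data, showing the sigmoid function and a logistic regression classifier:

```python
# Sketch: the sigmoid function and a simple logistic regression classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(x):
    """Logistic function: maps any real value into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))  # 0.5 -- the decision boundary

# Hypothetical 1-D feature (e.g. count of suspicious words) and 0/1 labels.
X = np.array([[0], [1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[2.5]]))  # class probabilities via the sigmoid
```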

Polynomial Regression:
o Polynomial Regression is a type of regression which models the non-
linear dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve
between the value of x and corresponding conditional values of y.
o Suppose there is a dataset whose datapoints are arranged in a non-linear fashion; in such a case, linear regression will not fit those datapoints well. To cover such datapoints, we need polynomial regression.
o In polynomial regression, the original features are transformed into polynomial features of a given degree and then modeled using a linear model. This means the datapoints are best fitted using a polynomial curve.
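A minimal sketch, assuming scikit-learn and made-up roughly quadratic data, of expanding the feature into polynomial terms before a linear fit:

```python
# Sketch: polynomial feature expansion followed by a linear model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical non-linear data: y roughly follows x squared.
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.1, 3.9, 9.2, 15.8, 25.1])

# degree=2 transforms x into [1, x, x^2] before the linear fit.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[6]]))  # close to 36 if the quadratic fit is good
```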
Support Vector Regression:
Support Vector Machine (SVM) is a supervised learning algorithm which can be used for regression as well as classification problems. When we use it for regression problems, it is termed Support Vector Regression (SVR).

Support Vector Regression is a regression algorithm which works for continuous variables. Below are some keywords which are used in Support Vector Regression:

o Kernel: A function used to map lower-dimensional data into a higher-dimensional space.
o Hyperplane: In a general SVM, it is the separation line between two classes, but in SVR, it is the line which helps to predict the continuous variable and covers most of the datapoints.
o Boundary lines: The two lines drawn on either side of the hyperplane, which create a margin for the datapoints.
o Support vectors: The datapoints which are nearest to the hyperplane and to the opposite class.

In SVR, we always try to determine a hyperplane with a maximum margin, so that the maximum number of datapoints is covered within that margin. The main goal of SVR is to consider the maximum number of datapoints within the boundary lines, and the hyperplane (best-fit line) must contain a maximum number of datapoints.
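A minimal sketch of SVR, assuming scikit-learn and made-up data; the epsilon parameter sets the width of the margin (the boundary lines) around the hyperplane:

```python
# Sketch: Support Vector Regression with an RBF kernel.
import numpy as np
from sklearn.svm import SVR

X = np.array([[1], [2], [3], [4], [5]])   # hypothetical feature
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2])   # hypothetical continuous target

# The RBF kernel maps the data into a higher-dimensional space;
# errors inside the epsilon margin are tolerated.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(model.predict([[2.5]]))
```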
Decision Tree Regression:
o Decision Tree is a supervised learning algorithm which can be used for
solving both classification and regression problems.
o It can solve problems for both categorical and numerical data.
o Decision Tree regression builds a tree-like structure in which each internal node represents a "test" on an attribute, each branch represents the result of the test, and each leaf node represents the final decision or result.
o A decision tree is constructed starting from the root node/parent node (the dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are further divided into their own children, and they themselves become the parent nodes of those nodes.
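A minimal sketch, assuming scikit-learn and made-up data; each internal node tests a feature threshold, and each leaf holds a predicted value:

```python
# Sketch: decision tree regression with a depth limit.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1], [2], [3], [4], [5], [6]])    # hypothetical feature
y = np.array([2.0, 2.1, 6.8, 7.2, 12.9, 13.1])  # hypothetical target

# max_depth limits how many times the tree may split.
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(tree.predict([[3.5]]))
```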
Ridge Regression:
o Ridge regression is one of the most robust versions of linear regression, in which a small amount of bias is introduced so that we can get better long-term predictions.
o The amount of bias added to the model is known as the Ridge Regression penalty. The penalty term is computed by multiplying lambda by the squared weight of each individual feature.
o The equation for ridge regression will be:

Cost = Σ(yᵢ − ŷᵢ)² + λΣwⱼ²
o A general linear or polynomial regression will fail if there is high collinearity between the independent variables, so to solve such problems, ridge regression can be used.
o Ridge regression is a regularization technique, which is used to reduce the complexity of the model. It is also called L2 regularization.
o It helps to solve problems where we have more parameters than samples.
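A minimal sketch of ridge regression, assuming scikit-learn; its alpha parameter plays the role of lambda in the penalty above, and the collinear features are made up:

```python
# Sketch: ridge (L2-regularized) regression on collinear features.
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical data with two highly collinear features.
X = np.array([[1, 2.0], [2, 4.1], [3, 6.0], [4, 8.2], [5, 9.9]])
y = np.array([3, 6, 9, 12, 15])

model = Ridge(alpha=1.0).fit(X, y)
print("weights:", model.coef_)  # shrunk toward (but not exactly) zero
```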
Lasso Regression:
o Lasso regression is another regularization technique to reduce the
complexity of the model.
o It is similar to Ridge Regression, except that the penalty term contains only the absolute weights instead of the square of weights.
o Since it takes absolute values, it can shrink the slope all the way to 0, whereas Ridge Regression can only shrink it close to 0.
o It is also called L1 regularization. The equation for Lasso regression will be:

Cost = Σ(yᵢ − ŷᵢ)² + λΣ|wⱼ|
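A minimal sketch of lasso regression, assuming scikit-learn and made-up data with a pure-noise feature; unlike ridge, lasso can drive a weight exactly to zero:

```python
# Sketch: lasso (L1-regularized) regression zeroing out a noisy feature.
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical data where the second feature is pure noise.
X = np.array([[1, 0.3], [2, -0.1], [3, 0.2], [4, -0.4], [5, 0.1]])
y = np.array([2, 4, 6, 8, 10])

model = Lasso(alpha=0.5).fit(X, y)
print("weights:", model.coef_)  # the noisy feature's weight may be exactly 0
```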
