
4. Supervised Learning: Classification and Regression


4.1 Introduction to Supervised Learning
4.2 Classification Model
4.3 Learning Steps of Classification
4.4 Classification Algorithms
4.4.1 k-Nearest Neighbour Algorithm
4.4.2 Support Vector Machines (SVM)
4.5 Regression
4.5.1 Simple Linear Regression
4.5.2 Multiple Linear Regression
4.5.3 Logistic Regression
Question Bank
4.1 Introduction to Supervised Learning

Supervised learning is a machine learning technique that involves training a model to make predictions based on labeled data. In supervised learning, the model learns from a dataset that includes both input features and corresponding output labels. The goal of supervised learning is to develop a model that can accurately predict the output for new input data that it has not seen before.

The labeled dataset used for supervised learning typically consists of a set of input features and corresponding output labels. The input features can be numerical or categorical, and the output labels can be binary (e.g. true or false) or multi-class (e.g. classification of different objects).

The process of supervised learning involves training a model using the labeled dataset, which involves finding the relationship between the input features and the output labels. Once the model is trained, it can be used to predict the output for new input data that it has not seen before. Supervised learning is commonly used in applications such as image classification, text classification, fraud detection, and recommendation systems. The key advantage of supervised learning is that it can produce highly accurate predictions when trained on a large and diverse dataset. However, it requires a labeled dataset, which can be costly and time-consuming to create.
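As a minimal sketch of this train-then-predict workflow in Python: the feature values, labels, and choice of a decision tree below are invented for illustration (scikit-learn is assumed to be available), not prescribed by the text.

```python
# Supervised learning in miniature: fit a model on labeled data,
# then predict labels for input data it has not seen before.
from sklearn.tree import DecisionTreeClassifier

# Labeled dataset: input feature vectors with corresponding output labels.
X_train = [[180, 44], [160, 36], [175, 42], [150, 33]]  # e.g. [height_cm, shoe_size]
y_train = ["adult", "child", "adult", "child"]          # output labels

model = DecisionTreeClassifier()
model.fit(X_train, y_train)        # learn the feature-to-label relationship

# Predict the label for a new, unseen input.
print(model.predict([[170, 41]]))  # -> ['adult'] for this toy data
```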

4.2 Classification Model

Classification is a type of supervised learning where the goal is to predict the categorical output variable for a given set of input features. In classification, the model learns to distinguish between different classes based on labeled data. For example, in a binary classification problem, the goal is to predict one of two possible classes, such as "spam" or "not spam". In multi-class classification, the goal is to predict one of several possible classes, such as identifying different types of animals from an image.

[Figure: Classification model — incoming messages are routed to Inbox or Spam by a classification model built from training and validation data.]
To build a classification model, the first step is to collect labeled data. This involves creating a dataset where each data point includes a set of input features and a corresponding output label. The input features can be anything that might be relevant to the classification task, such as pixel values for image data or demographic data for predicting customer behavior.

Once the dataset is prepared, the next step is to choose an appropriate classification algorithm, such as logistic regression, decision trees, or neural networks. The algorithm is then trained on the labeled dataset to learn the relationship between the input features and the output labels. During training, the algorithm adjusts its parameters to minimize the difference between the predicted output and the actual output in the labeled dataset. Once the model is trained, it can be used to predict the output label for new input data.

In summary, classification is a type of supervised learning where a target feature, which is of categorical type, is predicted for test data on the basis of the information imparted by the training data. The target categorical feature is known as the class.

Evaluation of a classification model involves measuring its accuracy on a held-out validation dataset or through cross-validation techniques. The accuracy of the model can be improved by using techniques such as feature engineering, regularization, and ensembling.
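To make the evaluation step concrete, here is a rough sketch using scikit-learn; the iris dataset and logistic regression classifier are stand-ins chosen purely for illustration.

```python
# Measuring a classifier's accuracy on a held-out validation set
# and via k-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 25% of the labeled data for validation.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_val, y_val))

# Alternatively, estimate accuracy with 5-fold cross-validation.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("cross-validation accuracy:", scores.mean())
```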

4.3 Learning Steps of Classification

Step 1: Problem Identification: Identifying the problem is the first step in building the supervised learning model. The problem must be a well-posed problem.

[Figure: Learning steps of classification — problem identification → identification of required data → data pre-processing → definition of training set → algorithm selection → training and parameter tuning → evaluation → classifier.]


Step 2: Identification of Required Data: Collecting and preparing a labeled dataset that includes input features and corresponding output labels.

Step 3: Data Preprocessing: Cleaning and preparing the dataset, which may involve removing missing data, handling outliers, and transforming the data into a suitable format.

Step 4: Definition of Training Data Set: Before starting the analysis, the user should decide what type of data set is to be used as a training set. The training set needs to be actively representative of the real-world use of the given problem.

Step 5: Algorithm Selection: Choosing an appropriate classification algorithm based on the nature of the problem and the size and complexity of the dataset.

Step 6: Training: Using the labeled dataset to train the selected algorithm by adjusting its parameters to minimize the difference between the predicted output and the actual output. The hyperparameters of the algorithm should also be tuned on a validation dataset to optimize the model's performance, if needed.

Step 7: Evaluation with the Test Dataset: The trained model is run on the test dataset, and its performance is measured here. If the desired output is not obtained, further training or parameter tuning may be required.

4.4 Classification Algorithms

4.4.1 k-Nearest Neighbour Algorithm

The k-Nearest Neighbour (k-NN) algorithm is a simple, non-parametric algorithm used in supervised learning for classification and regression. In k-NN, the output is classified by a majority vote of its k nearest neighbours in the training dataset.

The algorithm works by first calculating the distances between the new input data point and all the data points in the training set. The k nearest neighbours of the new data point are then determined based on the smallest distances. In classification, the output label of the new data point is determined by the majority class among its k nearest neighbours. For regression, the output value is the average of the output values of its k nearest neighbours.

The choice of k is an important parameter in the k-NN algorithm, and it can be determined by cross-validation techniques. If k is too small, the algorithm may be sensitive to noise and outliers, while if k is too large, the algorithm may over-generalize and not capture the underlying patterns in the data.
[Figure: k-NN — a new example is classified by computing distances to the initial data (Class A, Class B), finding the nearest neighbours, and voting on labels with K = 3.]




The k-NN algorithm is a simple and easy-to-understand algorithm, but it can be computationally expensive for large datasets. It also suffers from the curse of dimensionality, where the performance of the algorithm deteriorates as the number of input features increases. Despite its limitations, the k-NN algorithm is still widely used for its simplicity and effectiveness in many applications, including image recognition, recommender systems, and anomaly detection.

For k-NN, there is no learning happening in the real sense. Therefore, k-NN falls under the category of lazy learner.

Here is the general k-Nearest Neighbour (k-NN) algorithm for classification:

Input:
• A training dataset with labeled examples (X, y)
• A new input data point x
• The number of neighbours k

Output:
• A predicted label for x

Algorithm:
1. Compute the distance between the new input data point x and all data points in the training set X.
2. Select the k nearest neighbours of x based on the smallest distances.
3. Determine the majority class among the k nearest neighbours.
4. Assign the predicted label to x based on the majority class.

Note: In step 1, the Euclidean distance is generally used to calculate the distance; a code sketch of the full procedure follows the lists below. The formula is:

$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

Advantages of k-NN classification algorithm:
• Simple and easy to understand.
• Non-parametric method: does not require any assumptions about the underlying data distribution.
• Can be used for both binary and multi-class classification problems.
• Can handle both continuous and categorical data types.
• Can work well with small datasets.

Disadvantages of k-NN classification algorithm:
• Computationally expensive for large datasets.
• Suffers from the curse of dimensionality, where the performance deteriorates as the number of input features increases.
• Highly sensitive to the distance metric used, which can have a significant impact on the results.
• Requires careful selection of the value of k, which can affect the accuracy of the model.
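The four pseudocode steps above map almost line-for-line onto code. A minimal sketch in plain Python (no ML library), using the Euclidean distance from the note; the toy training points are invented for illustration.

```python
import math
from collections import Counter

def euclidean(x, y):
    # d(x, y) = sqrt(sum_i (x_i - y_i)^2)
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(X_train, y_train, x_new, k):
    # Step 1: distance from x_new to every training point.
    distances = [(euclidean(x, x_new), label) for x, label in zip(X_train, y_train)]
    # Step 2: the k nearest neighbours (smallest distances).
    neighbours = sorted(distances)[:k]
    # Steps 3-4: the majority class among the neighbours becomes the prediction.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy data: two classes in 2-D (illustrative values).
X_train = [(1, 1), (1, 2), (2, 1), (6, 5), (7, 6), (6, 6)]
y_train = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X_train, y_train, (2, 2), k=3))   # -> "A"
```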
Applications of k-NN classification algorithm:
• Image recognition and classification.
• Recommendation systems for online shopping or entertainment.
• Credit risk assessment in finance.
• Medical diagnosis based on patient characteristics.
• Fraud detection in credit card transactions.

4.4.2 Support Vector Machines (SVM)

Support Vector Machine (SVM) is a powerful classification algorithm used in supervised learning. The algorithm works by finding the optimal hyperplane that separates the different classes in the input data.

Support vectors are the data points (representing classes), the critical components in a data set, which lie nearest to the identified hyperplane. In the case of a binary classification problem, SVM aims to find the hyperplane that maximizes the margin between the two classes. The margin is defined as the distance between the hyperplane and the closest data points from each class. SVM finds the optimal hyperplane by solving a convex optimization problem that maximizes the margin while minimizing the classification error.

[Figure: SVM — support vectors, the optimal hyperplane, and the maximum margin between the two classes.]

If the input data is not linearly separable, SVM can still be used by applying a kernel function that maps the data into a higher-dimensional feature space, where a linear hyperplane can be used to separate the classes. Some popular kernel functions used in SVM include linear, polynomial, radial basis function (RBF), and sigmoid. (A brief code sketch follows at the end of this subsection.)

SVM has several advantages, including:
• Effective for high-dimensional data.
• Can handle non-linearly separable data by applying kernel functions.
• Works well with small to medium-sized datasets.
• Robust to outliers.

However, SVM also has some limitations, including:
• SVM is applicable only for binary classification.
• Computationally expensive for large datasets.
• Sensitive to the choice of kernel function and parameters.
• Difficult to interpret the resulting model and extract insights from it.
• Overfitting may occur.

Applications of SVM include:
• Image classification and object recognition.
• Text classification and sentiment analysis.
• Bioinformatics and gene expression analysis.
• Financial forecasting and risk management.
• Anomaly detection in network intrusion detection systems.
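A short sketch of the mechanics described above, assuming scikit-learn's SVC; the generated data and the choice of an RBF kernel are illustrative, not prescribed by the text.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy binary classification data (illustrative).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# An RBF kernel implicitly maps the data into a higher-dimensional feature
# space where a linear hyperplane can separate the classes; C trades margin
# width against classification error.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

print("number of support vectors:", clf.support_vectors_.shape[0])
print("prediction for first point:", clf.predict(X[:1]))
```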

4.5 Regression

Regression in supervised learning is a type of predictive modeling that involves estimating a continuous numerical output value based on input features. The goal of regression is to find a mathematical relationship between the input variables (also called predictors or independent variables) and the output variable (also called the response or dependent variable). The output value in regression can be any numerical value, including integers, real numbers, or even negative values. Regression models can be used for both linear and nonlinear relationships between the input and output variables.

The main goal of regression is to create a function that can predict the output variable for new inputs with a certain degree of accuracy. In order to achieve this goal, regression algorithms use statistical techniques to find the optimal parameters of the model that minimize the difference between the predicted values and the actual values in the training dataset.

Applications of Regression:
• Predicting sales or revenue based on marketing spend or other factors.
• Estimating the price of a house based on features such as location, size, and number of rooms.
• Forecasting demand for a product based on historical sales data.
• Predicting patient outcomes based on medical data.
• Analyzing trends in financial data such as stock prices or interest rates.

4.5.1 Simple Linear Regression

Linear regression is one of the most basic types of regression in machine learning. The linear regression model consists of a predictor variable and a dependent variable related linearly to each other. In case the data involves more than one independent variable, the model is called multiple linear regression. The following equation is used to denote the simple linear regression model:

$y = mx + c + e$

[Figure: Simple linear regression — a straight line fitted through a scatter of data points.]

Simple linear regression is a type of regression analysis that is used to model the relationship between two variables, where one variable is considered the predictor or independent variable, and the other is the response or dependent variable. The goal of simple linear regression is to find the best linear relationship between the two variables.

Here is an example of simple linear regression:

Suppose we have a dataset of 100 homes with two variables: square footage (independent variable) and sale price (dependent variable). We want to determine the relationship between these two variables and, specifically, how much the sale price of a home increases for each additional square foot of living space.

We can use simple linear regression to estimate the line of best fit for the data, which is a straight line that minimizes the sum of the squared differences between the predicted and actual values.

In simple linear regression, the model is expressed as:

$Y = \beta_0 + \beta_1 X + \varepsilon$

where Y is the dependent variable, X is the independent variable, $\beta_0$ is the intercept, $\beta_1$ is the slope associated with the predictor variable X, and $\varepsilon$ is the random error term. The slope $\beta_1$ indicates the change in the dependent variable associated with a one-unit change in the independent variable X.

[Figure: Simple linear regression — observed and predicted values of Y for a given X, with the random error, slope, and intercept labelled.]

To estimate the values of $\beta_0$ and $\beta_1$, we can use the least squares method, which involves minimizing the sum of the squared errors between the predicted and actual values of the dependent variable. The least squares estimates of $\beta_0$ and $\beta_1$ are:

$\beta_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) \, / \, \sum_i (x_i - \bar{x})^2$

$\beta_0 = \bar{y} - \beta_1 \bar{x}$

where $x_i$ and $y_i$ are the values of the independent and dependent variables for the i-th observation, $\bar{x}$ and $\bar{y}$ are the mean values of the independent and dependent variables, and the sums are taken over all observations in the dataset.

Once we have estimated the values of $\beta_0$ and $\beta_1$, we can use the equation of the line to predict the sale price of a home based on its square footage. For example, if the estimated equation of the line is:

$y = 100000 + 100x$

this means that for every additional square foot of living space, the sale price of the home is expected to increase by Rs. 100. So, a home with 1500 square feet of living space would be expected to sell for Rs. 250,000 (100000 + 100 × 1500).
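These estimates are straightforward to compute directly. A small sketch in plain Python, with invented data deliberately placed on the line y = 100000 + 100x so the formulas recover the worked example's coefficients:

```python
# Least-squares estimates for simple linear regression:
#   b1 = sum((x_i - x_mean)(y_i - y_mean)) / sum((x_i - x_mean)^2)
#   b0 = y_mean - b1 * x_mean
# Illustrative data chosen to lie exactly on y = 100000 + 100x.
sqft  = [1000, 1200, 1500, 2000]
price = [200000, 220000, 250000, 300000]

x_mean = sum(sqft) / len(sqft)
y_mean = sum(price) / len(price)

b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(sqft, price)) \
     / sum((x - x_mean) ** 2 for x in sqft)
b0 = y_mean - b1 * x_mean

print(b0, b1)          # prints 100000.0 100.0
print(b0 + b1 * 1500)  # predicted price for 1500 sq ft -> 250000.0
```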
The simple linear regression model can be fitted to the data using various techniques such as ordinary least squares (OLS) or maximum likelihood estimation (MLE). The fitted model can be used to make predictions for new values of the predictor variable or to test hypotheses about the relationship between the variables.

The validity of the simple linear regression model depends on certain assumptions such as linearity, normality, homoscedasticity, and independence of the error term. Violations of these assumptions can lead to biased and inefficient estimates and incorrect inference.

Pros:
• Simple linear regression is easy to implement and interpret.
• It provides a simple way to quantify the strength and direction of the relationship between two variables.
• It can be used to make predictions and estimate the value of the response variable for a given value of the predictor variable.
Cons:
• Simple linear regression assumes that there is a linear relationship between the predictor and response variables, which may not be the case in reality.
• It assumes that the errors or residuals are normally distributed and have constant variance, which may not hold in practice.
• It can be sensitive to outliers, influential observations, and violations of the underlying assumptions.

Applications:
• Simple linear regression is commonly used in finance to model the relationship between two financial variables, such as stock prices and interest rates.
• It is used in marketing to analyze the impact of advertising on sales or the relationship between price and demand for a product.
• It is used in engineering to model the relationship between two physical variables, such as temperature and pressure.
• It is used in social sciences to study the relationship between two psychological variables, such as personality and job satisfaction.

4.5.2 Multiple Linear Regression

Multiple linear regression is a statistical technique used to model the relationship between a response or dependent variable and two or more predictor or independent variables. It extends the simple linear regression model to multiple predictors, allowing for more complex relationships between the variables.

In multiple linear regression, the model is expressed as:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n + \varepsilon$

where Y is the dependent variable, $\beta_0$ is the intercept, $\beta_1$ to $\beta_n$ are the coefficients or slopes associated with the predictor variables $X_1$ to $X_n$, and $\varepsilon$ is the random error term. The coefficients indicate the change in the dependent variable associated with a one-unit change in the corresponding predictor variable, holding other variables constant.

[Figure: Multiple linear regression — a regression surface fitted through data points with two predictor variables.]

The multiple linear regression model can be fitted to the data using various techniques such as ordinary least squares (OLS) or maximum likelihood estimation (MLE). The fitted model can be used to make predictions for new values of the predictor variables or to test hypotheses about the relationships between the variables.

The validity of the multiple linear regression model depends on certain assumptions such as linearity, normality, homoscedasticity, and independence of the error terms. Violations of these assumptions can lead to biased and inefficient estimates and incorrect inference.
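A brief sketch of fitting such a model by ordinary least squares with NumPy; the two-predictor dataset and its true coefficients below are invented for illustration.

```python
import numpy as np

# Illustrative data: two predictors, generated from a known linear model
# Y = 5 + 2*X1 + 3*X2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 5 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 0.1, size=50)

# Prepend a column of ones so the first coefficient is the intercept B0.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimize ||X_design @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta)   # approximately [5, 2, 3] -> B0, B1, B2
```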
Pros:
• Multiple linear regression can capture the effects of multiple predictors on the response variable simultaneously, allowing for a more comprehensive analysis of the relationships between the variables.
• It can provide insight into the relative importance of different predictors in explaining the variability in the response variable.
• It can be used to make predictions and estimate the value of the response variable for a given set of predictor variable values.
• It can handle categorical predictor variables by encoding them as dummy variables.

Cons:
• Multiple linear regression assumes that there is a linear relationship between the predictors and the response variable, which may not be the case in reality.
• It assumes that the errors or residuals are normally distributed and have constant variance, which may not hold in practice.
• It can be sensitive to outliers, influential observations, and violations of the underlying assumptions.
• As the number of predictor variables increases, the model can become overfit and less interpretable.

Applications:
• Multiple linear regression is commonly used in economics to model the relationship between a dependent variable such as income, and multiple independent variables such as education, age, and experience.
• It is used in healthcare to study the relationship between health outcomes and multiple risk factors such as smoking, diet, and exercise.
• It is used in marketing to analyze the impact of multiple advertising channels on sales or the relationship between price, promotions, and demand for a product.

Comparison between multiple linear regression and simple linear regression:
• Simple linear regression is a special case of multiple linear regression where there is only one predictor variable.
• Multiple linear regression can provide more insight into the relationships between the variables and can handle more complex scenarios where there are multiple predictors.
• However, simple linear regression can be easier to interpret and implement, and may be sufficient for simpler problems where there is only one predictor variable.

4.5.3 Logistic Regression

Logistic regression is a statistical technique used to model the relationship between a binary dependent or response variable and one or more independent or predictor variables. The goal of logistic regression is to estimate the probability of the dependent variable taking a particular value (e.g., 1 or 0) given the values of the predictor variables.

The logistic regression model assumes that the relationship between the predictor variables and the log-odds of the dependent variable is linear. The model
can be fitted to the data using various techniques such as maximum likelihood estimation (MLE) or Bayesian methods. The fitted model can be used to make predictions for new values of the predictor variables or to test hypotheses about the relationship between the variables.

Linear model: $y = b_0 + b_1 x$

Logistic model: $p = \dfrac{1}{1 + e^{-(b_0 + b_1 x)}}$

In the above formula of logistic regression, the constant $b_0$ moves the curve left and right, and the slope $b_1$ defines the steepness of the curve.
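A tiny numeric sketch of that behaviour, with illustrative coefficient values:

```python
import math

def logistic(x, b0, b1):
    # p = 1 / (1 + e^{-(b0 + b1*x)})
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# b0 shifts the curve left/right; b1 controls its steepness.
for x in (-2, 0, 2):
    print(x,
          round(logistic(x, b0=0.0, b1=1.0), 3),   # gentle curve centred at 0
          round(logistic(x, b0=0.0, b1=4.0), 3))   # same centre, much steeper
```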
Types of Logistic Regression:
1. Binary Logistic Regression: The categorical response has only two possible outcomes. Example: spam or not spam.
2. Multinomial Logistic Regression: Three or more categories without ordering. Example: predicting which food is preferred more (Veg, Non-Veg, Vegan).
3. Ordinal Logistic Regression: Three or more categories with ordering. Example: movie rating from 1 to 5.

Pros of logistic regression:
• Logistic regression is a powerful and widely used statistical technique for binary and ordinal classification problems, and can handle multiple predictor variables.
• The output of logistic regression is easy to interpret as it provides the predicted probabilities of the dependent variable taking a particular value.
• Logistic regression can handle both categorical and continuous predictor variables and can detect interactions between them.
• Logistic regression can be used to test hypotheses about the relationships between the variables and to estimate the impact of the predictor variables on the dependent variable.

Cons of logistic regression:
• Logistic regression assumes that the relationship between the predictor variables and the log-odds of the dependent variable is linear, which may not hold in all cases.
• Logistic regression assumes that the observations are independent, which may not hold in cases where the data are clustered or correlated.
• Logistic regression can be sensitive to outliers and influential observations,

which may affect the estimated coefficients and the goodness-of-fit of the model.
• Logistic regression may suffer from overfitting or underfitting if the model is too complex or too simple, respectively.

Applications of logistic regression:
• Logistic regression is widely used in various fields such as healthcare, marketing, finance, and social sciences for binary and ordinal classification problems such as disease diagnosis, customer churn, credit scoring, and voter prediction.
• Logistic regression is used in biomedical research to model the probability of a disease occurring based on risk factors such as age, sex, and lifestyle factors.
• Logistic regression is used in social sciences to model the likelihood of a particular behavior or attitude given demographic and psychographic characteristics of the individuals.
• Logistic regression is used in marketing to model the likelihood of a customer buying a product given their past behavior and demographic characteristics.
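In practice, fitting is usually delegated to a library. A rough sketch with scikit-learn's LogisticRegression, which fits by maximum likelihood internally; the generated dataset is purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy binary-outcome data (illustrative).
X, y = make_classification(n_samples=200, n_features=4, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X, y)

print(model.intercept_, model.coef_)   # fitted b0 and b1..bn
# Predicted probability of each class (0 and 1) for the first two rows.
print(model.predict_proba(X[:2]))
```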
Classification vs. Regression

Aspect         | Classification                        | Regression
---------------|---------------------------------------|--------------------------------------------
Goal           | To predict the categorical outcome    | To predict the numerical outcome
Output         | Discrete values (classes/labels)      | Continuous values (numbers)
Examples       | Image classification, spam detection  | Stock price prediction, housing prices
Evaluation     | Accuracy, precision, recall           | Mean squared error, root mean squared error
Algorithms     | Logistic regression, decision trees   | Linear regression, random forest
Problem types  | Binary classification, multi-class    | Linear regression, non-linear regression
