
What is Machine Learning?

Machine learning is a branch of artificial intelligence that enables algorithms to uncover hidden patterns
within datasets, allowing them to make predictions on new, similar data without explicit programming for
each task. Traditional machine learning combines data with statistical tools to predict outputs, yielding
actionable insights. This technology finds applications in diverse fields such as image and speech recognition,
natural language processing, recommendation systems, fraud detection, portfolio optimization, and
automating tasks.

Netflix, for example, employs collaborative and content-based filtering to recommend movies and TV shows
based on user viewing history, ratings, and genre preferences. Machine learning’s impact extends to
autonomous vehicles, drones, and robots, enhancing their adaptability in dynamic environments.

How machine learning algorithms work

A machine learning algorithm works by learning patterns and relationships from data to make predictions or decisions without being explicitly programmed for each task. Here's a simplified overview of how a typical machine learning algorithm works:

1. Data Collection:
First, relevant data is collected or curated. This data could include examples, features, or attributes that are
important for the task at hand, such as images, text, numerical data, etc.

2. Data Preprocessing:
Before feeding the data into the algorithm, it often needs to be preprocessed. This step may involve cleaning
the data (handling missing values, outliers), transforming the data (normalization, scaling), and splitting it
into training and test sets.

3. Choosing a Model:
Depending on the task (e.g., classification, regression, clustering), a suitable machine learning model is
chosen. Examples include decision trees, neural networks, support vector machines, and more advanced
models like deep learning architectures.
4. Training the Model:
The selected model is trained using the training data. During training, the algorithm learns patterns and
relationships in the data. This involves adjusting model parameters iteratively to minimize the difference
between predicted outputs and actual outputs (labels or targets) in the training data.

5. Evaluating the Model:


Once trained, the model is evaluated using the test data to assess its performance. Metrics such as accuracy,
precision, recall, or mean squared error are used to evaluate how well the model generalizes to new, unseen
data.

6. Fine-tuning:
Models may be fine-tuned by adjusting hyperparameters (parameters that are not directly learned during
training, like learning rate or number of hidden layers in a neural network) to improve performance.

7. Prediction or Inference:
Finally, the trained model is used to make predictions or decisions on new data. This process involves
applying the learned patterns to new inputs to generate outputs, such as class labels in classification tasks or
numerical values in regression tasks.
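As a concrete illustration of these seven steps, here is a minimal end-to-end sketch using scikit-learn; the dataset (Iris), the choice of logistic regression, and all parameter values are illustrative assumptions, not part of the text above.

```python
# A minimal sketch of the seven steps above using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data collection: load a ready-made labelled dataset.
X, y = load_iris(return_X_y=True)

# 2. Data preprocessing: split into training and test sets, then scale features.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3. Choosing a model: a simple classifier for this classification task.
model = LogisticRegression(max_iter=1000)

# 4. Training: fit() adjusts the model parameters to the training data.
model.fit(X_train, y_train)

# 5. Evaluation: measure generalization on the held-out test set.
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 6. Fine-tuning would adjust hyperparameters (e.g., regularization strength C).
# 7. Prediction/inference on a new, unseen sample:
print("Predicted class:", model.predict(X_test[:1]))
```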

Types of Machine Learning


Based on the methods and way of learning, machine learning is mainly divided into four types, which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1. Supervised Machine Learning:

As its name suggests, supervised machine learning is based on supervision: we train the machine using a "labelled" dataset, and based on that training, the machine predicts the output. Labelled data means that some of the inputs are already mapped to their outputs. More precisely, we first train the machine with inputs and their corresponding outputs, and then we ask the machine to predict the output for a test dataset.

Let's understand supervised learning with an example. Suppose we have an input dataset of cat and dog images. First, we train the machine to recognize the images using features such as the shape and size of the tail, the shape of the eyes, colour, and height (dogs are taller, cats are smaller). After training, we input the picture of a cat and ask the machine to identify the object and predict the output. Because the machine is now well trained, it will check all the features of the object, such as height, shape, colour, eyes, ears, and tail, and conclude that it is a cat, placing it in the Cat category. This is how the machine identifies objects in supervised learning.

The main goal of the supervised learning technique is to map the input variable (x) to the output variable (y). Some real-world applications of supervised learning are risk assessment, fraud detection, spam filtering, etc.

Categories of Supervised Machine Learning

Supervised machine learning can be classified into two types of problems, which are given below:

o Classification
o Regression

a) Classification
Classification algorithms are used to solve classification problems in which the output variable is categorical, such as "Yes" or "No", Male or Female, Red or Blue, etc. The classification algorithms predict the categories present in the dataset. Some real-world examples of classification algorithms are spam detection, email filtering, etc.

b) Regression
Regression algorithms are used to solve regression problems in which there is a relationship between the input variables and a continuous output variable. They are used to predict continuous output values, such as market trends, weather, etc.
Advantages and Disadvantages of Supervised Learning
Advantages:

o Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.

o These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:

o These algorithms are not suitable for complex tasks.

o They may predict the wrong output if the test data differs from the training data.

o Training these algorithms requires a lot of computational time.

Applications of Supervised Learning

Some common applications of Supervised Learning are given below:

o Image Segmentation: Supervised learning algorithms are used in image segmentation. In this process, image classification is performed on different image data with pre-defined labels.
o Medical Diagnosis: Supervised algorithms are also used in the medical field for diagnosis. This is done using medical images and past data labelled with disease conditions; with such a process, the machine can identify a disease for new patients.
o Fraud Detection: Supervised learning classification algorithms are used for identifying fraudulent transactions, fraudulent customers, etc. This is done by using historical data to identify the patterns that can indicate possible fraud.
o Spam Detection: In spam detection and filtering, classification algorithms are used. These algorithms classify an email as spam or not spam, and the spam emails are sent to the spam folder.
o Speech Recognition: Supervised learning algorithms are also used in speech recognition. The algorithm is trained with voice data, and various identifications can be done using it, such as voice-activated passwords, voice commands, etc.

2. Unsupervised Machine Learning:

In unsupervised learning, there is no need for supervision. It means that in unsupervised machine learning, the machine is trained using an unlabeled dataset, and the machine predicts the output without any supervision.
The main aim of the unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences. Machines are instructed to find the hidden patterns in the input dataset.
Let's take an example to understand it more precisely: suppose there is a basket of fruit images, and we input it into the machine learning model. The images are totally unknown to the model, and the task of the machine is to find the patterns and categories of the objects.
The machine will then discover its own patterns and differences, such as colour difference and shape difference, and predict the output when it is tested with the test dataset.

Categories of Unsupervised Machine Learning


Unsupervised Learning can be further classified into two types, which are given below:

o Clustering

o Association
1) Clustering
The clustering technique is used when we want to find the inherent groups from the data. It is a way to group
the objects into a cluster such that the objects with the most similarities remain in one group and have fewer or
no similarities with the objects of other groups. An example of the clustering algorithm is grouping the
customers by their purchasing behaviour.

2) Association

Association rule learning is an unsupervised learning technique that finds interesting relations among variables within a large dataset. The main aim of this learning algorithm is to find the dependency of one data item on another and to map those variables accordingly, for example so that a retailer can maximize profit. This algorithm is mainly applied in market basket analysis, web usage mining, continuous production, etc.

Advantages and Disadvantages of Unsupervised Learning Algorithm

Advantages:

o These algorithms can be used for more complicated tasks than supervised ones, because they work on unlabeled datasets.
o Unsupervised algorithms are preferable for various tasks, as obtaining an unlabeled dataset is easier than obtaining a labelled one.

Disadvantages:
o The output of an unsupervised algorithm can be less accurate, as the dataset is not labelled and the algorithm is not trained with the exact output in advance.
o Working with unsupervised learning is more difficult, as it works with unlabelled data that does not map to a known output.

Applications of Unsupervised Learning:


o Network Analysis: Unsupervised learning is used in document network analysis of scholarly text data, for example to identify plagiarism and copyright violations.
o Recommendation Systems: Recommendation systems widely use unsupervised learning techniques for
building recommendation applications for different web applications and e-commerce websites.
o Anomaly Detection: Anomaly detection is a popular application of unsupervised learning, which can
identify unusual data points within the dataset. It is used to discover fraudulent transactions.
o Singular Value Decomposition: Singular Value Decomposition (SVD) is used to extract particular information from a database, for example extracting information about the users located at a particular location.

3. Semi-Supervised Learning

Semi-Supervised learning is a type of Machine Learning algorithm that lies between Supervised and
Unsupervised machine learning. It represents the intermediate ground between Supervised (With Labelled
training data) and Unsupervised learning (with no labelled training data) algorithms and uses the combination of
labelled and unlabeled datasets during the training period.

The concept of semi-supervised learning was introduced to overcome the drawbacks of supervised and unsupervised learning algorithms. The main aim of semi-supervised learning is to make effective use of all the available data, rather than only the labelled data as in supervised learning; this matters because labelled data is considerably more expensive to acquire than unlabeled data. Initially, similar data is clustered with an unsupervised learning algorithm, and the clustering then helps to label the remaining unlabeled data.
We can picture these algorithms with an example. Supervised learning is like a student under the supervision of an instructor at home and college. If that student analyses the same concept by themselves without any help from the instructor, it comes under unsupervised learning. Under semi-supervised learning, the student first studies the concept under the guidance of an instructor at college and then revises it on their own.
Advantages and disadvantages of Semi-supervised Learning
Advantages:

o The algorithm is simple and easy to understand.

o It is highly efficient.

o It helps overcome the drawbacks of supervised and unsupervised learning algorithms.


Disadvantages:

o Iteration results may not be stable.

o We cannot apply these algorithms to network-level data.

o Accuracy is low.
Applications of Semi-Supervised Learning:
Here are some common applications of semi-supervised learning:
o Image Classification and Object Recognition: Improve the accuracy of models by combining a
small set of labeled images with a larger set of unlabeled images.
o Natural Language Processing (NLP): Enhance the performance of language models and classifiers
by combining a small set of labeled text data with a vast amount of unlabeled text.
o Speech Recognition: Improve the accuracy of speech recognition by leveraging a limited amount of
transcribed speech data and a more extensive set of unlabeled audio.
o Recommendation Systems: Improve the accuracy of personalized recommendations by
supplementing a sparse set of user-item interactions (labeled data) with a wealth of unlabeled user
behavior data.
o Healthcare and Medical Imaging: Enhance medical image analysis by utilizing a small set of labeled
medical images alongside a larger set of unlabeled images.
4. Reinforcement Learning:
Reinforcement learning works on a feedback-based process, in which an AI agent (a software component) automatically explores its surroundings by trial and error: taking actions, learning from experience, and improving its performance. The agent is rewarded for each good action and punished for each bad action; hence the goal of a reinforcement learning agent is to maximize the rewards.
In reinforcement learning, there is no labelled data as in supervised learning; agents learn from their experience only. Due to the way it works, reinforcement learning is employed in fields such as game theory, operations research, information theory, and multi-agent systems.
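As one concrete, simplified illustration of this reward-driven loop, below is a minimal sketch of tabular Q-learning on a toy 5-state corridor. The environment, the +1 reward at the goal state, and all hyperparameters (alpha, gamma, epsilon) are invented for illustration; real RL setups are far more elaborate.

```python
# Tabular Q-learning sketch: an agent learns by trial and error to walk right.
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))   # the agent's learned action values
alpha, gamma, epsilon = 0.1, 0.9, 0.2 # learning rate, discount, exploration rate

def step(state, action):
    """Move left/right along the corridor; reaching state 4 gives reward +1."""
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

rng = np.random.default_rng(0)
for _ in range(200):                  # episodes of trial and error
    state, done = 0, False
    while not done:
        # Explore a random action with probability epsilon, otherwise exploit Q.
        action = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[state].argmax())
        next_state, reward, done = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted future value.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q.argmax(axis=1)[:4])  # learned policy for states 0-3: all 1 (move right)
```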
Categories of Reinforcement Learning:
Reinforcement learning is categorized mainly into two types of methods/algorithms:

o Positive Reinforcement Learning: Positive reinforcement learning specifies increasing the tendency
that the required behaviour would occur again by adding something. It enhances the strength of the
behaviour of the agent and positively impacts it.

o Negative Reinforcement Learning: Negative reinforcement learning works exactly opposite to the
positive RL. It increases the tendency that the specific behaviour would occur again by avoiding the
negative condition.

Real-world Use cases of Reinforcement Learning:

o Video Games: RL algorithms are very popular in gaming applications, where they are used to achieve super-human performance. Popular systems that use RL algorithms are AlphaGo and AlphaGo Zero.

o Resource Management: The paper "Resource Management with Deep Reinforcement Learning" showed how to use RL to automatically learn to schedule computer resources across waiting jobs in order to minimize average job slowdown.

o Robotics: RL is widely used in robotics applications. Robots are used in industrial and manufacturing areas, and these robots are made more powerful with reinforcement learning. Different industries have the vision of building intelligent robots using AI and machine learning technology.

o Text Mining: Text mining, one of the great applications of NLP, is now being implemented with the help of reinforcement learning, for example by Salesforce.
Advantages and Disadvantages of Reinforcement Learning:
Advantages

o It helps in solving complex real-world problems that are difficult to solve with general techniques.

o The learning model of RL is similar to human learning; hence it can produce very accurate results.

o It helps in achieving long-term results.


Disadvantage

o RL algorithms are not preferred for simple problems.

o RL algorithms require huge data and computations.

o Too much reinforcement learning can lead to an overload of states which can weaken the results.
What is Regression?
Regression is a statistical approach used to analyze the relationship between a dependent variable (target
variable) and one or more independent variables (predictor variables). The objective is to determine the most
suitable function that characterizes the connection between these variables.
It is a supervised machine learning technique, used to predict the value of the dependent variable for new,
unseen data. It models the relationship between the input features and the target variable, allowing for the
estimation or prediction of numerical values.
Regression analysis applies when the output variable is a real or continuous value, such as "salary" or "weight". It is mainly used for prediction, forecasting, time-series modeling, and determining cause-and-effect relationships between variables.
In regression, we plot a graph between the variables that best fits the given data points; using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that fits the data points on the target-predictor graph in such a way that the vertical distance between the data points and the regression line is minimized." The distance between the data points and the line tells whether the model has captured a strong relationship or not.
Some examples of regression can be as:

o Prediction of rain using temperature and other factors

o Determining Market trends

o Prediction of road accidents due to rash driving.

Terminologies Related to the Regression Analysis:

o Dependent Variable: The main factor in regression analysis that we want to predict or understand is called the dependent variable. It is also called the target variable.
o Independent Variable: The factors that affect the dependent variable, or that are used to predict its values, are called independent variables, also called predictors.
o Outliers: An outlier is an observation with a very low or very high value in comparison to the other observed values. An outlier may distort the result, so it should be handled carefully.
o Multicollinearity: If the independent variables are highly correlated with each other, the condition is called multicollinearity. It should not be present in the dataset, because it creates problems when ranking the most influential variables.
o Underfitting and Overfitting: If our algorithm works well with the training dataset but not with the test dataset, the problem is called overfitting. And if our algorithm does not perform well even with the training dataset, the problem is called underfitting.

Types of Regression:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression
Linear Regression:
o Linear regression is a statistical regression method used for predictive analysis.
o It is one of the simplest algorithms; it shows the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.
o If there is only one input variable (x), such linear regression is called simple linear regression. If there is more than one input variable, it is called multiple linear regression.
o The relationship between the variables in a linear regression model can be illustrated with a scatter plot, for example predicting the salary of an employee on the basis of years of experience.

o Below is the mathematical equation for linear regression:

Y = aX + b

Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.
Some popular applications of linear regression are:

o Analyzing trends and sales estimates

o Salary forecasting

o Real estate prediction

o Arriving at ETAs in traffic.
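A minimal sketch of simple linear regression in the form Y = aX + b, applied to the salary-vs-experience example above; scikit-learn is assumed, and the numbers are made up for illustration.

```python
# Fit Y = aX + b on toy salary-vs-experience data.
import numpy as np
from sklearn.linear_model import LinearRegression

years = np.array([[1], [2], [3], [4], [5]])           # X: years of experience
salary = np.array([35, 42, 50, 58, 65], dtype=float)  # Y: salary (in thousands)

model = LinearRegression().fit(years, salary)
print("a (slope):", model.coef_[0])        # change in salary per extra year
print("b (intercept):", model.intercept_)  # baseline salary at zero experience
print("Predicted salary at 6 years:", model.predict([[6]])[0])
```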


Logistic Regression:
o Logistic regression is another supervised learning algorithm which is used to solve the classification
problems. In classification problems, we have dependent variables in a binary or discrete format such
as 0 or 1.
o Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or No, True or
False, Spam or not spam, etc.
o It is a predictive analysis algorithm which works on the concept of probability.
o Logistic regression is a type of regression, but it differs from the linear regression algorithm in terms of how it is used.
o Logistic regression uses the sigmoid function (also called the logistic function) to map predictions to probabilities. The sigmoid function used to model the data can be represented as:

f(x) = 1 / (1 + e^(-x))

o f(x) = output, a value between 0 and 1
o x = input to the function
o e = base of the natural logarithm

When we provide the input values (data) to the function, it gives the S-curve as follows:

o It uses the concept of threshold levels: values above the threshold are rounded up to 1, and values below the threshold are rounded down to 0.

There are three types of logistic regression:

o Binomial (0/1, pass/fail)
o Multinomial (cats, dogs, lions)
o Ordinal (low, medium, high)
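A minimal sketch of the sigmoid and the thresholding described above, assuming scikit-learn; the hours-studied pass/fail data is invented for illustration.

```python
# Sigmoid output and a 0.5 threshold on toy binary data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x)): squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

hours = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
passed = np.array([0, 0, 0, 1, 1, 1])  # binary target: 0 = fail, 1 = pass

clf = LogisticRegression().fit(hours, passed)
prob = clf.predict_proba([[2.5]])[0, 1]      # sigmoid output: probability of class 1
print("P(pass | 2.5 hours):", prob)
print("Predicted class:", int(prob >= 0.5))  # threshold at 0.5
```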
Polynomial Regression:

Polynomial regression is used to model nonlinear relationships between the dependent variable and the
independent variables. It adds polynomial terms to the linear regression model to capture more complex
relationships.

o The equation for polynomial regression is derived from the linear regression equation: the linear equation Y = b0 + b1x is transformed into the polynomial regression equation Y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ.
o Here Y is the predicted/target output and b0, b1, ..., bn are the regression coefficients; x is our independent/input variable.
o The model is still linear, because it remains linear in the coefficients even though the features are quadratic or higher-order.
Support Vector Regression (SVR):

Support vector regression (SVR) is a type of regression algorithm based on the support vector machine (SVM) algorithm. SVM is primarily used for classification tasks, but it can also be used for regression. SVR works by finding a function (hyperplane) that fits the data within a margin of tolerance, penalizing predictions that fall outside that margin.

Decision Tree Regression:


Decision tree regression is a type of regression algorithm that builds a decision tree to predict the target
value. A decision tree is a tree-like structure that consists of nodes and branches. Each node represents a
decision, and each branch represents the outcome of that decision. The goal of decision tree regression is
to build a tree that can accurately predict the target value for new data points.

Random Forest Regression:

Random forest regression is an ensemble method that combines multiple decision trees to predict the
target value. Ensemble methods are a type of machine learning algorithm that combines multiple models
to improve the performance of the overall model. Random forest regression works by building a large
number of decision trees, each of which is trained on a different subset of the training data. The final
prediction is made by averaging the predictions of all of the trees.

Regularized Linear Regression Techniques:


Ridge Regression:
Ridge regression is a type of linear regression used to prevent overfitting. Overfitting occurs when the model learns the training data too well and is unable to generalize to new data. Ridge regression adds an L2 penalty (the sum of the squared weights) to the loss function, which shrinks the weights toward zero.
Lasso Regression:
Lasso regression is another type of linear regression used to prevent overfitting. It does this by adding an L1 penalty to the loss function, which shrinks some weights and can set others exactly to zero.
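A minimal sketch contrasting the two penalties, assuming scikit-learn; the alpha values and toy data are illustrative, with only the first feature actually relevant.

```python
# Ridge (L2) shrinks all weights; Lasso (L1) can zero out irrelevant ones.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # only feature 0 matters

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("Ridge weights:", ridge.coef_.round(3))  # all small but non-zero
print("Lasso weights:", lasso.coef_.round(3))  # irrelevant weights driven to zero
```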

What is the Classification Algorithm?


The classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data. In classification, a program learns from the given dataset or observations and then classifies new observations into one of a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called targets/labels or categories. Since the classification algorithm is a supervised learning technique, it takes labelled input data, which means it contains inputs with the corresponding outputs.
In a classification algorithm, a discrete output function (y) is mapped from the input variable (x):

y = f(x), where y = categorical output


The best example of an ML classification algorithm is Email Spam Detector.
The main goal of the Classification algorithm is to identify the category of a given dataset, and these algorithms
are mainly used to predict the output for the categorical data.
Classification algorithms can be better understood using the below diagram. In the below diagram, there are two
classes, class A and Class B. These classes have features that are similar to each other and dissimilar to other
classes.

The algorithm which implements the classification on a dataset is known as a classifier. There are two types of
Classifications:

o Binary Classifier: If the classification problem has only two possible outcomes, it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two outcomes, it is called a Multi-class Classifier.
Examples: Classification of types of crops, classification of types of music.
Example: Classifications of types of crops, Classification of types of music.
Types of ML Classification Algorithms:

Classification Algorithms can be further divided into the mainly two categories:

o Linear Models
  o Logistic Regression
  o Support Vector Machines

o Non-linear Models
  o K-Nearest Neighbours
  o Naïve Bayes
  o Decision Tree Classification
  o Random Forest Classification


Use cases of Classification Algorithms:

Classification algorithms can be used in different places. Below are some popular use cases of Classification
Algorithms:
o Email Spam Detection
o Speech Recognition
o Identifications of Cancer tumor cells.
o Drugs Classification
o Biometric Identification, etc.

Support Vector Machine Algorithm:

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used for
Classification as well as Regression problems. However, primarily, it is used for Classification problems in
Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional
space into classes so that we can easily put the new data point in the correct category in the future. This best
decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the below diagram, in which two different categories are classified using a decision boundary or hyperplane:

Example: SVM can be understood with the example that we used in the KNN classifier. Suppose we see a strange cat that also has some features of dogs; if we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train our model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. The support vector machine creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of cats and dogs. On the basis of the support vectors, it will classify the creature as a cat. Consider the below diagram:

SVM algorithm can be used for Face detection, image classification, text categorization, etc.

Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space,
but we need to find out the best decision boundary that helps to classify the data points. This best boundary is
known as the hyperplane of SVM.

The dimensions of the hyperplane depend on the number of features present in the dataset: if there are 2 features, the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane.

Support Vectors:

The data points or vectors that are closest to the hyperplane and that affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
How does SVM work?
Linear SVM:
The working of the SVM algorithm can be understood using an example. Suppose we have a dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify a pair (x1, x2) of coordinates as either green or blue. Consider the below image:

Since this is 2-D space, we can separate these two classes just by using a straight line. But there can be multiple lines that separate the classes. Consider the below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the points of each class closest to the line; these points are called support vectors. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.

Non-Linear SVM: If data is linearly separable, then we can separate it using a straight line, but for non-linear data we cannot draw a single straight line. Consider the below image:
To separate these data points, we need to add one more dimension. For linear data we have used the two dimensions x and y, so for non-linear data we will add a third dimension z. It can be calculated as:

z = x² + y²

By adding the third dimension, the sample space becomes as in the below image:

Now SVM will divide the dataset into classes in the following way. Consider the below image:

Since we are in 3-D space, the separator looks like a plane parallel to the x-axis. If we convert it back to 2-D space with z = 1, it becomes:

Hence we get a circle of radius 1 in the case of non-linear data.
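A minimal sketch of this lifting idea, assuming scikit-learn: points forming concentric rings are not linearly separable in (x, y), but adding z = x² + y² as a third feature makes a linear separator possible; an RBF kernel achieves a similar lifting implicitly. The data is generated for illustration.

```python
# Lifting non-linear data with z = x^2 + y^2 vs. an implicit RBF kernel.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Option 1: add the third dimension by hand, then use a linear SVM.
z = (X ** 2).sum(axis=1).reshape(-1, 1)        # z = x^2 + y^2
linear_svm = SVC(kernel="linear").fit(np.hstack([X, z]), y)

# Option 2: let an RBF kernel perform a similar lifting implicitly.
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("Linear SVM on lifted data:", linear_svm.score(np.hstack([X, z]), y))
print("RBF-kernel SVM on raw data:", rbf_svm.score(X, y))
```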


K-Nearest Neighbor (KNN) Algorithm for Machine Learning:
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on Supervised Learning
technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can be easily classified into a well-suited category using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as classification, but it is mostly used for classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and performs an action on it at classification time.
o The KNN algorithm simply stores the dataset during the training phase, and when it gets new data, it classifies that data into the category most similar to it.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new data most similar to those of the cat and dog images and, based on the most similar features, put it in either the cat or the dog category.

Why do we need a K-NN Algorithm?

Suppose there are two categories, Category A and Category B, and we have a new data point x1; in which of these categories will the data point lie? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the below diagram:
How does K-NN work?

The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of neighbors.

o Step-2: Calculate the Euclidean distance from the new data point to the training data points.
o Step-3: Take the K nearest neighbors according to the calculated Euclidean distance.
o Step-4: Among these K neighbors, count the number of data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
o Step-6: Our model is ready.

Suppose we have a new data point and we need to put it in the required category. Consider the below image:

o Firstly, we will choose the number of neighbors: k = 5.
o Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. It can be calculated as:

d = √((x₂ - x₁)² + (y₂ - y₁)²)

o By calculating the Euclidean distance, we get the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B. Consider the below image:

o As 3 of the 5 nearest neighbors are from category A, this new data point must belong to category A.
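A minimal sketch of these steps with k = 5, assuming scikit-learn; the two-feature points and their categories (0 = A, 1 = B) are invented for illustration.

```python
# K-NN with k = 5: store the data, then classify by majority vote of neighbours.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 7], [8, 6], [2, 1], [7, 5]])
y = np.array([0, 0, 0, 1, 1, 1, 0, 1])     # category A = 0, category B = 1

knn = KNeighborsClassifier(n_neighbors=5)  # Step-1: K = 5, Euclidean by default
knn.fit(X, y)                              # lazy learner: just stores the data

new_point = [[3, 4]]
# Steps 2-5: distances, nearest neighbours, and the majority vote happen here.
print("Predicted category:", knn.predict(new_point)[0])
print("Neighbour distances:", knn.kneighbors(new_point)[0].round(2))
```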

How to select the value of K in the K-NN Algorithm?

Below are some points to remember while selecting the value of K in the K-NN algorithm:

o There is no particular way to determine the best value for "K", so we need to try several values to find the best one. The most preferred value for K is 5.
o A very low value for K, such as K = 1 or K = 2, can be noisy and expose the model to the effects of outliers.
o Large values for K smooth out noise, but a K that is too large may include points from other categories.

Advantages of KNN Algorithm:

o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:

o The value of K always needs to be determined, which may sometimes be complex.
o The computation cost is high, because the distance to every training sample must be calculated for each new data point.

Naïve Bayes Classifier Algorithm:


o The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for solving classification problems.
o It is mainly used in text classification with high-dimensional training datasets.
o The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
o Some popular applications of the Naïve Bayes algorithm are spam filtration, sentiment analysis, and classifying articles.

Why is it called Naïve Bayes?

The Naïve Bayes algorithm is composed of two words, Naïve and Bayes, which can be described as:

o Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Each feature individually contributes to identifying it as an apple, without depending on the others.

o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.


Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine the probability
of a hypothesis with prior knowledge. It depends on the conditional probability.

o The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

Where,
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood: the probability of the evidence B given that hypothesis A is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.

The working of the Naïve Bayes classifier can be understood with the example below. Suppose we have a dataset of weather conditions with a corresponding target variable "Play", and we want to decide whether to play on a sunny day.

Outlook Play

0 Rainy Yes

1 Sunny Yes

2 Overcast Yes

3 Overcast Yes

4 Sunny No

5 Rainy Yes

6 Sunny Yes

7 Overcast Yes

8 Rainy No

9 Sunny No

10 Sunny Yes

11 Rainy No

12 Overcast Yes

13 Overcast Yes

Frequency table for the Weather Conditions:

Weather Yes No

Overcast 5 0

Rainy 2 2

Sunny 3 2

Total 10 4
Likelihood table for the weather conditions:

Weather No Yes P(Weather)

Overcast 0 5 5/14 = 0.35

Rainy 2 2 4/14 = 0.29

Sunny 2 3 5/14 = 0.35

All 4/14 = 0.29 10/14 = 0.71

Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41
As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).
Hence on a sunny day, the player can play the game.
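A minimal sketch that reproduces this hand calculation directly from the 14-row table, using nothing but counts and Bayes' theorem (standard library only).

```python
# Naïve Bayes by hand: count frequencies, then apply Bayes' theorem.
from collections import Counter

outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

n = len(play)
play_counts = Counter(play)          # {'Yes': 10, 'No': 4}
joint = Counter(zip(outlook, play))  # e.g. ('Sunny', 'Yes') -> 3

def posterior(label, weather="Sunny"):
    """Bayes: P(label | weather) = P(weather | label) * P(label) / P(weather)."""
    likelihood = joint[(weather, label)] / play_counts[label]
    prior = play_counts[label] / n
    evidence = sum(joint[(weather, l)] for l in play_counts) / n
    return likelihood * prior / evidence

print("P(Yes | Sunny):", round(posterior("Yes"), 2))  # 0.6
print("P(No  | Sunny):", round(posterior("No"), 2))   # 0.4 (the 0.41 above comes from rounding)
```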

Advantages of Naïve Bayes Classifier:


o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
o It can be used for binary as well as multi-class classification.
o It performs well in multi-class predictions compared with other algorithms.
o It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:


o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship
between features.

Applications of Naïve Bayes Classifier:


o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes Classifier is an eager learner.
o It is used in Text classification such as Spam filtering and Sentiment analysis.
Decision Tree Classification Algorithm:
o Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-structured
classifier, where internal nodes represent the features of a dataset, branches represent the decision
rules and each leaf node represents the outcome.
o In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision nodes
are used to make any decision and have multiple branches, whereas Leaf nodes are the output of those
decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a problem/decision based on
given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which expands on
further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification and Regression
Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
o Below diagram explains the general structure of a decision tree:

Decision Tree Terminologies:

o Root Node: Root node is from where the decision tree starts. It represents the entire dataset, which further
gets divided into two or more homogeneous sets.
o Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further after getting a
leaf node.
o Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the
given conditions.
o Branch/Sub Tree: A tree formed by splitting the tree.
o Pruning: Pruning is the process of removing the unwanted branches from the tree.
o Parent/Child node: The root node of the tree is called the parent node, and other nodes are called the
child nodes.
How does the Decision Tree algorithm Work?

o In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding record attribute and, based on the comparison, follows the branch and jumps to the next node.
o For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues the process until it reaches a leaf node of the tree. The complete process can be better understood using the below algorithm:

o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; the final nodes are leaf nodes.

Example: Suppose there is a candidate who has a job offer and wants to decide whether to accept it or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:

Metrics for Splitting:


Gini Impurity:
o Gini index is a measure of impurity or purity used while creating a decision tree in the
CART(Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a high Gini index.
o It only creates binary splits, and the CART algorithm uses the Gini index to create them.
Entropy:
Entropy is a metric to measure the impurity in a given attribute. It specifies randomness in data. Entropy can be
calculated as:
Entropy(s)= -P(yes)log2 P(yes)- P(no) log2 P(no)
Where,
 S = the set of samples
 P(yes)= probability of yes
 P(no)= probability of no

Information Gain:
o Information gain is the measurement of changes in entropy after the segmentation of a dataset based on
an attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the decision tree.
o A decision tree algorithm always tries to maximize the value of information gain, and a node/attribute
having the highest information gain is split first. It can be calculated using the below formula:

Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
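A minimal sketch computing entropy and information gain from these formulas, applied for concreteness to the 14-row weather dataset from the Naïve Bayes section above.

```python
# Entropy and information gain, computed directly from the formulas above.
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum over classes of p(c) * log2 p(c)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

def information_gain(feature, labels):
    """Gain = Entropy(S) - weighted average entropy of each feature value."""
    n = len(labels)
    weighted = sum(
        (sum(f == v for f in feature) / n)
        * entropy([l for f, l in zip(feature, labels) if f == v])
        for v in set(feature)
    )
    return entropy(labels) - weighted

print("Entropy(S):", round(entropy(play), 3))          # ~0.863 for 10 Yes / 4 No
print("Gain(S, Outlook):", round(information_gain(outlook, play), 3))
```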


Pruning:

To overcome overfitting, pruning techniques are used. Pruning reduces the size of the tree by removing
nodes that provide little power in classifying instances. There are two main types of pruning:

 Pre-pruning (Early Stopping): Stops the tree from growing once it meets certain criteria (e.g., maximum
depth, minimum number of samples per leaf).
 Post-pruning: Removes branches from a fully grown tree that do not provide significant power.
Advantages of the Decision Tree
o It is simple to understand, as it follows the same process that a human follows when making a decision in real life.

o It can be very useful for solving decision-related problems.

o It helps to think about all the possible outcomes for a problem.

o It requires less data cleaning than other algorithms.


Disadvantages of the Decision Tree
o The decision tree contains lots of layers, which makes it complex.

o It may have an overfitting issue, which can be resolved using the Random Forest algorithm.

o For more class labels, the computational complexity of the decision tree may increase.
Random Forest Algorithm:

Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can
be used for both Classification and Regression problems in ML. It is based on the concept of ensemble
learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the
performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of predictions, predicts the final output.

A greater number of trees in the forest generally leads to higher accuracy and prevents the problem of overfitting.

The below diagram explains the working of the Random Forest algorithm:

How does Random Forest algorithm work?

Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions with each tree created in the first phase.

The Working process can be explained in the below steps and diagram:

Step-1: Select K random data points from the training set.

Step-2: Build a decision tree associated with the selected data points (subset).

Step-3: Choose the number N of decision trees that you want to build.

Step-4: Repeat Steps 1 & 2 until N trees are built.

Step-5: For a new data point, find the prediction of each decision tree, and assign the new data point to the category that wins the majority vote.

The working of the algorithm can be better understood by the below example:
Example: Suppose there is a dataset that contains multiple fruit images. So, this dataset is given to the Random
forest classifier. The dataset is divided into subsets and given to each decision tree. During the training phase,
each decision tree produces a prediction result, and when a new data point occurs, then based on the majority of
results, the Random Forest classifier predicts the final decision. Consider the below image:
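A minimal sketch of this two-phase process, assuming scikit-learn; the Iris dataset and n_estimators = 100 are illustrative choices, not part of the example above.

```python
# Random forest: N trees, each on a bootstrap subset, combined by majority vote.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(n_estimators=100, random_state=42)  # N = 100 trees
forest.fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))
# Each individual tree votes; inspect the first tree's prediction as an example:
print("Tree 0 predicts:", forest.estimators_[0].predict(X_test[:1]))
```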

Applications of Random Forest:


There are four main sectors where random forest is mostly used:

1. Banking: Banking sector mostly uses this algorithm for the identification of loan risk.

2. Medicine: With the help of this algorithm, disease trends and risks of the disease can be identified.

3. Land Use: We can identify the areas of similar land use by this algorithm.

4. Marketing: Marketing trends can be identified using this algorithm.

Advantages of Random Forest:

o Random Forest is capable of performing both Classification and Regression tasks.


o It is capable of handling large datasets with high dimensionality.
o It enhances the accuracy of the model and prevents the overfitting issue.

Disadvantages of Random Forest:

o Although random forest can be used for both classification and regression tasks, it is less suitable for regression tasks.
Clustering in Machine Learning
Clustering or cluster analysis is a machine learning technique, which groups the unlabelled dataset. It can be
defined as "A way of grouping the data points into different clusters, consisting of similar data points. The
objects with the possible similarities remain in a group that has less or no similarities with another group."

It is an unsupervised learning method. Clustering is done by finding similar patterns in the unlabelled dataset, such as shape, size, color, and behavior, and dividing the data according to the presence and absence of those patterns.

After applying this clustering technique, each cluster or group is given a cluster ID. An ML system can use this ID to simplify the processing of large and complex datasets.

Example: Let's understand the clustering technique with the real-world example of a shopping mall. When we visit a mall, we can observe that things with similar usage are grouped together: t-shirts are grouped in one section and trousers in another, and in the vegetable section apples, bananas, mangoes, etc. are kept in separate sections, so that we can easily find things. The clustering technique works in the same way. Other examples of clustering include grouping documents by topic.
The clustering technique can be widely used in various tasks. Some most common uses of this technique are:
o Market Segmentation: Businesses use clustering to group their customers and use targeted advertisements to attract a larger audience.
o Statistical data analysis
o Medical Imaging: Doctors use clustering to find diseased areas in diagnostic images like X-rays.
o Anomaly Detection: To find outliers in a stream of real-time data or to forecast fraudulent transactions.
o Apart from these general usages, clustering is used by Amazon in its recommendation system to provide recommendations based on past product searches. Netflix also uses this technique to recommend movies and web series to its users based on their watch history.

The below diagram explains the working of the clustering algorithm. We can see the different fruits are divided
into several groups with similar properties.
Clustering broadly divides into two subgroups:

 Hard Clustering: Each input data point either fully belongs to a cluster or it does not. For instance, each customer is assigned to exactly one cluster.

 Soft Clustering: Rather than assigning each input data point to a distinct cluster, it assigns a probability or likelihood of the data point being in each cluster. For example, each customer receives a probability of belonging to each of the clusters.

Below are the main clustering methods used in Machine learning:


 K Means Clustering
 Density-Based Clustering
 Distribution Model-Based Clustering
 Hierarchical Clustering
 Fuzzy Clustering

1) K Means Clustering

K-means is an iterative clustering algorithm that improves the cluster assignments in every iteration until it converges to a locally optimal solution. The algorithm works in these 5 steps:
Step1:
Specify the desired number of clusters K: Let us choose k=2 for these 5 data points in 2-D space.

Step 2:
Randomly assign each data point to a cluster: Let’s assign three points in cluster 1, shown using red color,
and two points in cluster 2, shown using grey color.
Step 3:
Compute cluster centroids: The centroid of data points in the red cluster is shown using the red cross, and
those in the grey cluster using a grey cross.

Step 4:
Re-assign each point to the closest cluster centroid: Note that the data point at the bottom was initially assigned to the red cluster even though it is closer to the centroid of the grey cluster. Thus, we re-assign that data point to the grey cluster.

Step 5:

Re-compute cluster centroids: Now, re-computing the centroids for both clusters.

Repeat steps 4 and 5 until no improvements are possible: we repeat the 4th and 5th steps until the algorithm converges, i.e., until there is no further switching of data points between the two clusters for two successive repeats. This marks the termination of the algorithm if a stopping criterion is not explicitly specified.
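A minimal sketch of these five steps, assuming scikit-learn's KMeans; the five 2-D points mirror the walk-through above, but their coordinates are invented for illustration.

```python
# K-means with k = 2 on five 2-D points; the iteration happens inside fit.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8], [8.0, 8.0], [8.5, 7.5]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)  # Step 1: choose K
labels = kmeans.fit_predict(points)  # Steps 2-5 repeat internally until stable

print("Cluster of each point:", labels)
print("Final centroids:\n", kmeans.cluster_centers_)
```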
2) Density-Based Clustering

The density-based clustering method connects the highly-dense areas into clusters, and the arbitrarily shaped
distributions are formed as long as the dense region can be connected. This algorithm does it by identifying
different clusters in the dataset and connects the areas of high densities into clusters. The dense areas in data
space are divided from each other by sparser areas.

These algorithms can face difficulty in clustering the data points if the dataset has varying densities and high
dimensions.

3) Distribution Model-Based Clustering

In the distribution model-based clustering method, the data is divided based on the probability of how a dataset
belongs to a particular distribution. The grouping is done by assuming some distributions commonly Gaussian
Distribution.

The example of this type is the Expectation-Maximization Clustering algorithm that uses Gaussian Mixture
Models (GMM).

4) Hierarchical Clustering

Hierarchical clustering can be used as an alternative for the partitioned clustering as there is no
requirement of pre-specifying the number of clusters to be created. In this technique, the dataset is
divided into clusters to create a tree-like structure, which is also called a dendrogram. The
observations or any number of clusters can be selected by cutting the tree at the correct level. The
most common example of this method is the Agglomerative Hierarchical algorithm.
5) Fuzzy Clustering

Fuzzy clustering is a type of soft method in which a data object may belong to more than one group or cluster. Each data point has a set of membership coefficients that depend on its degree of membership in each cluster. The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes also known as the Fuzzy k-means algorithm.

Clustering Algorithms mostly used in machine learning:

1. K-Means algorithm: The k-means algorithm is one of the most popular clustering algorithms. It
classifies the dataset by dividing the samples into different clusters of equal variances. The number of
clusters must be specified in this algorithm. It is fast with fewer computations required, with the linear
complexity of O(n).
2. Mean-shift algorithm: Mean-shift algorithm tries to find the dense areas in the smooth density of data
points. It is an example of a centroid-based model, that works on updating the candidates for centroid to
be the center of the points within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications with Noise. It is an example of a density-based model, similar to mean-shift but with some remarkable advantages. In this algorithm, areas of high density are separated from areas of low density; because of this, the clusters can take any arbitrary shape (see the sketch after this list).
4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an alternative to the k-means algorithm, or for cases where k-means may fail. In GMM, the data points are assumed to be Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The Agglomerative hierarchical algorithm performs the
bottom-up hierarchical clustering. In this, each data point is treated as a single cluster at the outset and
then successively merged. The cluster hierarchy can be represented as a tree-structure.
6. Affinity Propagation: It differs from other clustering algorithms in that it does not require the number of clusters to be specified. In this algorithm, pairs of data points exchange messages until convergence. Its O(N²T) time complexity is the main drawback of this algorithm.
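As promised in the DBSCAN entry above, here is a minimal sketch, assuming scikit-learn; the eps and min_samples values are illustrative, and the crescent-shaped data is generated for demonstration.

```python
# DBSCAN: no cluster count needed; sparse points are labelled noise (-1).
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)  # two crescent shapes

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print("Clusters found:", len(set(db.labels_) - {-1}))  # no K was specified
print("Noise points:", int((db.labels_ == -1).sum()))
```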
Applications of Clustering:

Below are some commonly known applications of clustering technique in Machine Learning:

o In Identification of Cancer Cells: The clustering algorithms are widely used for the identification of
cancerous cells. It divides the cancerous and non-cancerous data sets into different groups.
o In Search Engines: Search engines also work on the clustering technique. The search result appears
based on the closest object to the search query. It does it by grouping similar data objects in one group
that is far from the other dissimilar objects. The accurate result of a query depends on the quality of the
clustering algorithm used.
o Customer Segmentation: It is used in market research to segment the customers based on their choice
and preferences.
o In Biology: It is used in the biology stream to classify different species of plants and animals using the
image recognition technique.
o In Land Use: The clustering technique is used to identify areas of similar land use in a GIS database. This can be very useful for finding the purpose for which a particular piece of land is most suitable.

Association Rule Learning:


Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item on another data item and maps them accordingly so that the relationship can be exploited, for example to increase profit. It tries to find interesting relations or associations among the variables of a dataset.

For example, if a customer buys bread, he is likely to also buy butter, eggs, or milk, so these products are stored on the same shelf or mostly nearby.

Association rule learning can be divided into three types of algorithms:

1. Apriori
2. Eclat
3. F-P Growth Algorithm

1) Apriori Algorithm

This algorithm uses frequent itemsets to generate association rules. It is designed to work on databases that contain transactions. The algorithm uses a breadth-first search and a hash tree to count itemsets efficiently.

It is mainly used for market basket analysis and helps to understand the products that can be bought together. It
can also be used in the healthcare field to find drug reactions for patients.
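A minimal market-basket sketch of Apriori, assuming the third-party mlxtend library (pip install mlxtend), which is not mentioned in the text; the transactions are invented for illustration.

```python
# Apriori on toy transactions: frequent itemsets, then association rules.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "eggs"],
    ["butter", "milk"],
    ["bread", "butter", "eggs"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent itemsets with support >= 40%, then rules with confidence >= 70%.
itemsets = apriori(df, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```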
2) Eclat Algorithm

The Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a depth-first search technique to find frequent itemsets in a transaction database. It executes faster than the Apriori algorithm.

3) F-P Growth Algorithm

The FP-Growth algorithm stands for Frequent Pattern Growth, and it is an improved version of the Apriori algorithm. It represents the database in the form of a tree structure known as a frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent patterns.

Applications of Association Rule Learning:

It has various applications in machine learning and data mining. Below are some popular applications of
association rule learning:

o Market Basket Analysis: It is one of the popular examples and applications of association rule mining.
This technique is commonly used by big retailers to determine the association between items.
o Medical Diagnosis: With the help of association rules, patients can be diagnosed more easily, as the rules help in identifying the probability of illness for a particular disease.
o Protein Sequence: The association rules help in determining the synthesis of artificial Proteins.
o It is also used for the Catalog Design and Loss-leader Analysis and many more other applications.
