0% found this document useful (0 votes)

33 views35 pages

ML Material-I

Bayes Theorem, developed by Thomas Bayes in the 17th century, is a fundamental principle in probability and machine learning used to calculate conditional probabilities. It is particularly useful in classification tasks, with applications in various fields including finance, healthcare, and text classification through methods like Naïve Bayes. The theorem helps in making predictions based on prior knowledge and is essential for understanding concepts like independent events, conditional probability, and clustering techniques.

Uploaded by

Shweta Rajput

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

33 views35 pages

ML Material-I

Uploaded by

Shweta Rajput

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Bayes Theorem

Bayes theorem is given by an English statistician, philosopher, and Presbyterian minister

named Mr. Thomas Bayes in 17th century. Bayes provides their thoughts in decision theory
which is extensively used in important mathematics concepts as Probability. Bayes theorem
is also widely used in Machine Learning where we need to predict classes precisely and
accurately. An important concept of Bayes theorem named Bayesian method is used to
calculate conditional probability in Machine Learning application that includes classification
tasks. Further, a simplified version of Bayes theorem (Naïve Bayes classification) is also used
to reduce computation time and average cost of the projects.

Bayes theorem is also known with some other name such as Bayes rule or Bayes
Law. Bayes theorem helps to determine the probability of an event with random
knowledge. It is used to calculate the probability of occurring one event while other one
already occurred. It is a best method to relate the condition probability and marginal
probability.

In simple words, we can say that Bayes theorem helps to contribute more accurate results.

Bayes Theorem is used to estimate the precision of values and provides a method for
calculating the conditional probability. However, it is hypocritically a simple calculation but it
is used to easily calculate the conditional probability of events where intuition often fails.
Some of the data scientist assumes that Bayes theorem is most widely used in financial
industries but it is not like that. Other than financial, Bayes theorem is also extensively applied
in health and medical, research and survey industry, aeronautical sector, etc.

What is Bayes Theorem?

Bayes theorem is one of the most popular machine learning concepts that helps to calculate
the probability of occurring one event with uncertain knowledge while other one has already
occurred.

Bayes' theorem can be derived using product rule and conditional probability of event X with
known event Y:

o According to the product rule we can express as the probability of event X with
known event Y as follows;
P(X ? Y)= P(X|Y) P(Y) {equation 1}

o Further, the probability of event Y with known event X:

P(X ? Y)= P(Y|X) P(X) {equation 2}
Mathematically, Bayes theorem can be expressed by combining both equations on right hand
side. We will get:
Here, both events X and Y are independent events which means probability of outcome of
both events does not depends one another.

The above equation is called as Bayes Rule or Bayes Theorem.

o P(X|Y) is called as posterior, which we need to calculate. It is defined as updated

probability after considering the evidence.
o P(Y|X) is called the likelihood. It is the probability of evidence when hypothesis is
true.
o P(X) is called the prior probability, probability of hypothesis before considering
the evidence
o P(Y) is called marginal probability. It is defined as the probability of evidence under
any consideration.
Hence, Bayes Theorem can be written as:

posterior = likelihood * prior / evidence

Prerequisites for Bayes Theorem

While studying the Bayes theorem, we need to understand few important concepts. These
are as follows:

1. Experiment

An experiment is defined as the planned operation carried out under controlled condition such
as tossing a coin, drawing a card and rolling a dice, etc.

2. Sample Space

During an experiment what we get as a result is called as possible outcomes and the set of
all possible outcome of an event is known as sample space. For example, if we are rolling a
dice, sample space will be:

S1 = {1, 2, 3, 4, 5, 6}

Similarly, if our experiment is related to toss a coin and recording its outcomes, then sample
space will be:
S2 = {Head, Tail}

3. Event

Event is defined as subset of sample space in an experiment. Further, it is also called as set
of outcomes.

Assume in our experiment of rolling a dice, there are two event A and B such that;

A = Event when an even number is obtained = {2, 4, 6}

B = Event when a number is greater than 4 = {5, 6}

o Probability of the event A ''P(A)''= Number of favourable outcomes / Total

number of possible outcomes
P(E) = 3/6 =1/2 =0.5
o Similarly, Probability of the event B ''P(B)''= Number of favourable outcomes /
Total number of possible outcomes
=2/6
=1/3
=0.333
o Union of event A and B:
A∪B = {2, 4, 5, 6}
Intersection of event A and B:
A∩B= {6}

Disjoint Event: If the intersection of the event A and B is an empty set or null then such events
are known as disjoint event or mutually exclusive events also.
4. Random Variable:

It is a real value function which helps mapping between sample space and a real line of an
experiment. A random variable is taken on some random values and each value having some
probability. However, it is neither random nor a variable but it behaves as a function which
can either be discrete, continuous or combination of both.

5. Exhaustive Event:

As per the name suggests, a set of events where at least one event occurs at a time, called
exhaustive event of an experiment.

Thus, two events A and B are said to be exhaustive if either A or B definitely occur at a time
and both are mutually exclusive for e.g., while tossing a coin, either it will be a Head or may
be a Tail.

6. Independent Event:

Two events are said to be independent when occurrence of one event does not affect the
occurrence of another event. In simple words we can say that the probability of outcome of
both events does not depends one another.

Mathematically, two events A and B are said to be independent if:

P(A ∩ B) = P(AB) = P(A)*P(B)

7. Conditional Probability:
Conditional probability is defined as the probability of an event A, given that another event B
has already occurred (i.e. A conditional B). This is represented by P(A|B) and we can define
it as:

P(A|B) = P(A ∩ B) / P(B)

8. Marginal Probability:

Marginal probability is defined as the probability of an event A occurring independent of any

other event B. Further, it is considered as the probability of evidence under any consideration.

P(A) = P(A|B)P(B) + P(A|~B)P(~B)

Here ~B represents the event that B does not occur.

Naïve Bayes Classifier Algorithm
o Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
theorem and used for solving classification problems.
o It is mainly used in text classification that includes a high-dimensional training dataset.
o Naïve Bayes Classifier is one of the simple and most effective Classification algorithms
which helps in building the fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability
of an object.
o Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental
analysis, and classifying articles.

Why is it called Naïve Bayes?

The Naïve Bayes algorithm is comprised of two words Naïve and Bayes, Which can be
described as:

o Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. Such as if the fruit is identified on the
bases of color, shape, and taste, then red, spherical, and sweet fruit is recognized as an
apple. Hence each feature individually contributes to identify that it is an apple without
depending on each other.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.

Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine
the probability of a hypothesis with prior knowledge. It depends on the conditional
probability.
o The formula for Bayes' theorem is given as:

Where,

P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.

P(B|A) is Likelihood probability: Probability of the evidence given that the probability of a
hypothesis is true.

P(A) is Prior Probability: Probability of hypothesis before observing the evidence.

P(B) is Marginal Probability: Probability of Evidence.

Working of Naïve Bayes' Classifier:

Working of Naïve Bayes' Classifier can be understood with the help of the below example:

Suppose we have a dataset of weather conditions and corresponding target variable "Play".
So using this dataset we need to decide that whether we should play or not on a particular
day according to the weather conditions. So to solve this problem, we need to follow the below
steps:

1. Convert the given dataset into frequency tables.

2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.

Problem: If the weather is sunny, then the Player should play or not?

Solution: To solve this, first consider the below dataset:

OutlookOutlook PlayPlay

0 Rainy Yes

1 Sunny Yes

2 Overcast Yes

3 Overcast Yes

4 Sunny No

5 Rainy Yes

6 Sunny Yes

7 Overcast Yes

8 Rainy No

9 Sunny No

10 Sunny Yes

11 Rainy No

12 Overcast Yes

13 Overcast Yes

Frequency table for the Weather Conditions:

Weather Yes No

Overcast 5 0

Rainy 2 2

Sunny 3 2

Total 10 4

Likelihood table weather condition:

Weather No Yes

Overcast 0 5 5/14= 0.35

Rainy 2 2 4/14=0.29

Sunny 2 3 5/14=0.35

All 4/14=0.29 10/14=0.71

Applying Bayes'theorem:

P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)

P(Sunny|Yes)= 3/10= 0.3

P(Sunny)= 0.35

P(Yes)=0.71

So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60

P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)

P(Sunny|NO)= 2/4=0.5
P(No)= 0.29

P(Sunny)= 0.35

So P(No|Sunny)= 0.5*0.29/0.35 = 0.41

So as we can see from the above calculation that P(Yes|Sunny)>P(No|Sunny)

Hence on a Sunny day, Player can play the game.

Advantages of Naïve Bayes Classifier:

o Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets.
o It can be used for Binary as well as Multi-class Classifications.
o It performs well in Multi-class predictions as compared to the other Algorithms.
o It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:

o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn
the relationship between features.

Applications of Naïve Bayes Classifier:

o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes Classifier is an eager
learner.
o It is used in Text classification such as Spam filtering and Sentiment analysis.
Clustering in Machine Learning
Clustering or cluster analysis is a machine learning technique, which groups the unlabelled
dataset. It can be defined as "A way of grouping the data points into different clusters,
consisting of similar data points. The objects with the possible similarities remain in a
group that has less or no similarities with another group."

It does it by finding some similar patterns in the unlabelled dataset such as shape, size, color,
behavior, etc., and divides them as per the presence and absence of those similar patterns.

It is an unsupervised learning method, hence no supervision is provided to the algorithm, and

it deals with the unlabeled dataset.

After applying this clustering technique, each cluster or group is provided with a cluster-ID.
ML system can use this id to simplify the processing of large and complex datasets.

The clustering technique is commonly used for statistical data analysis.

Note: Clustering is somewhere similar to the classification algorithm, but

the difference is the type of dataset that we are using. In classification, we
work with the labeled data set, whereas in clustering, we work with the
unlabelled dataset.
Example: Let's understand the clustering technique with the real-world example of Mall:
When we visit any shopping mall, we can observe that the things with similar usage are
grouped together. Such as the t-shirts are grouped in one section, and trousers are at other
sections, similarly, at vegetable sections, apples, bananas, Mangoes, etc., are grouped in
separate sections, so that we can easily find out the things. The clustering technique also
works in the same way. Other examples of clustering are grouping documents according to
the topic.

The clustering technique can be widely used in various tasks. Some most common uses of
this technique are:

o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
Apart from these general usages, it is used by the Amazon in its recommendation system to
provide the recommendations as per the past search of products. Netflix also uses this
technique to recommend the movies and web-series to its users as per the watch history.

The below diagram explains the working of the clustering algorithm. We can see the different
fruits are divided into several groups with similar properties.
Types of Clustering Methods
The clustering methods are broadly divided into Hard clustering (datapoint belongs to only
one group) and Soft Clustering (data points can belong to another group also). But there are
also other various approaches of Clustering exist. Below are the main clustering methods
used in Machine learning:

1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering

Partitioning Clustering
It is a type of clustering that divides the data into non-hierarchical groups. It is also known as
the centroid-based method. The most common example of partitioning clustering is the K-
Means Clustering algorithm.

In this type, the dataset is divided into a set of k groups, where K is used to define the number
of pre-defined groups. The cluster center is created in such a way that the distance between
the data points of one cluster is minimum as compared to another cluster centroid.
Density-Based Clustering
The density-based clustering method connects the highly-dense areas into clusters, and the
arbitrarily shaped distributions are formed as long as the dense region can be connected.
This algorithm does it by identifying different clusters in the dataset and connects the areas
of high densities into clusters. The dense areas in data space are divided from each other by
sparser areas.

These algorithms can face difficulty in clustering the data points if the dataset has varying
densities and high dimensions.
Distribution Model-Based Clustering
In the distribution model-based clustering method, the data is divided based on the probability
of how a dataset belongs to a particular distribution. The grouping is done by assuming some
distributions commonly Gaussian Distribution.

The example of this type is the Expectation-Maximization Clustering algorithm that uses
Gaussian Mixture Models (GMM).
Hierarchical Clustering
Hierarchical clustering can be used as an alternative for the partitioned clustering as there is
no requirement of pre-specifying the number of clusters to be created. In this technique, the
dataset is divided into clusters to create a tree-like structure, which is also called
a dendrogram. The observations or any number of clusters can be selected by cutting the
tree at the correct level. The most common example of this method is the Agglomerative
Hierarchical algorithm.

Fuzzy Clustering
Fuzzy clustering is a type of soft method in which a data object may belong to more than one
group or cluster. Each dataset has a set of membership coefficients, which depend on the
degree of membership to be in a cluster. Fuzzy C-means algorithm is the example of this
type of clustering; it is sometimes also known as the Fuzzy k-means algorithm.

Clustering Algorithms
The Clustering algorithms can be divided based on their models that are explained above.
There are different types of clustering algorithms published, but only a few are commonly
used. The clustering algorithm is based on the kind of data that we are using. Such as, some
algorithms need to guess the number of clusters in the given dataset, whereas some are
required to find the minimum distance between the observation of the dataset.

Here we are discussing mainly popular Clustering algorithms that are widely used in machine
learning:

1. K-Means algorithm: The k-means algorithm is one of the most popular clustering algorithms.
It classifies the dataset by dividing the samples into different clusters of equal variances. The
number of clusters must be specified in this algorithm. It is fast with fewer computations
required, with the linear complexity of O(n).
2. Mean-shift algorithm: Mean-shift algorithm tries to find the dense areas in the smooth density
of data points. It is an example of a centroid-based model, that works on updating the
candidates for centroid to be the center of the points within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications with
Noise. It is an example of a density-based model similar to the mean-shift, but with some
remarkable advantages. In this algorithm, the areas of high density are separated by the areas
of low density. Because of this, the clusters can be found in any arbitrary shape.
4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an
alternative for the k-means algorithm or for those cases where K-means can be failed. In GMM,
it is assumed that the data points are Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The Agglomerative hierarchical algorithm performs
the bottom-up hierarchical clustering. In this, each data point is treated as a single cluster at
the outset and then successively merged. The cluster hierarchy can be represented as a tree-
structure.
6. Affinity Propagation: It is different from other clustering algorithms as it does not require to
specify the number of clusters. In this, each data point sends a message between the pair of
data points until convergence. It has O(N2T) time complexity, which is the main drawback of
this algorithm.

Applications of Clustering
Below are some commonly known applications of clustering technique in Machine Learning:

o In Identification of Cancer Cells: The clustering algorithms are widely used for the
identification of cancerous cells. It divides the cancerous and non-cancerous data sets
into different groups.
o In Search Engines: Search engines also work on the clustering technique. The search
result appears based on the closest object to the search query. It does it by grouping
similar data objects in one group that is far from the other dissimilar objects. The accurate
result of a query depends on the quality of the clustering algorithm used.
o Customer Segmentation: It is used in market research to segment the customers based
on their choice and preferences.
o In Biology: It is used in the biology stream to classify different species of plants and
animals using the image recognition technique.
o In Land Use: The clustering technique is used in identifying the area of similar lands use
in the GIS database. This can be very useful to find that for what purpose the particular
land should be used, that means for which purpose it is more suitable.

Hierarchical Clustering in Machine Learning

Hierarchical clustering is another unsupervised machine learning algorithm, which is used to
group the unlabeled datasets into a cluster and also known as hierarchical cluster
analysis or HCA.

In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-
shaped structure is known as the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may look similar, but
they both differ depending on how they work. As there is no requirement to predetermine the
number of clusters as we did in the K-Means algorithm.

The hierarchical clustering technique has two approaches:

1. Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts with

taking all data points as single clusters and merging them until one cluster is left.
2. Divisive: Divisive algorithm is the reverse of the agglomerative algorithm as it is a top-down
approach.

Why hierarchical clustering?

As we already have other clustering algorithms such as K-Means Clustering, then why we
need hierarchical clustering? So, as we have seen in the K-means clustering that there are
some challenges with this algorithm, which are a predetermined number of clusters, and it
always tries to create the clusters of the same size. To solve these two challenges, we can
opt for the hierarchical clustering algorithm because, in this algorithm, we don't need to have
knowledge about the predefined number of clusters.

In this topic, we will discuss the Agglomerative Hierarchical clustering algorithm.

Agglomerative Hierarchical clustering

The agglomerative hierarchical clustering algorithm is a popular example of HCA. To group
the datasets into clusters, it follows the bottom-up approach. It means, this algorithm
considers each dataset as a single cluster at the beginning, and then start combining the
closest pair of clusters together. It does this until all the clusters are merged into a single
cluster that contains all the datasets.

This hierarchy of clusters is represented in the form of the dendrogram.

How the Agglomerative Hierarchical clustering Work?

The working of the AHC algorithm can be explained using the below steps:

o Step-1: Create each data point as a single cluster. Let's say there are N data points, so
the number of clusters will also be N.
Step-2: Take two closest data points or clusters and merge them to form one cluster. So, there
will now be N-1 clusters.

Step-3: Again, take the two closest clusters and merge them together to form one cluster. There
will be N-2 clusters.
Step-4: Repeat Step 3 until only one cluster left. So, we will get the following clusters. Consider
the below images:
o Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram
to divide the clusters as per the problem.

Note: To better understand hierarchical clustering, it is advised to have a

look on k-means clustering

Measure for the distance between two clusters

As we have seen, the closest distance between the two clusters is crucial for the hierarchical
clustering. There are various ways to calculate the distance between two clusters, and these
ways decide the rule for clustering. These measures are called Linkage methods. Some of
the popular linkage methods are given below:
1. Single Linkage: It is the Shortest Distance between the closest points of the clusters.
Consider the below image:

Complete Linkage: It is the farthest distance between the two points of two different clusters. It is
one of the popular linkage methods as it forms tighter clusters than single-linkage.

1. Average Linkage: It is the linkage method in which the distance between each pair of datasets
is added up and then divided by the total number of datasets to calculate the average distance
between two clusters. It is also one of the most popular linkage methods.
2. Centroid Linkage: It is the linkage method in which the distance between the centroid of the
clusters is calculated. Consider the below image:

From the above-given approaches, we can apply any of them according to the type of problem
or business requirement.

Woking of Dendrogram in Hierarchical clustering

The dendrogram is a tree-like structure that is mainly used to store each step as a memory
that the HC algorithm performs. In the dendrogram plot, the Y-axis shows the Euclidean
distances between the data points, and the x-axis shows all the data points of the given
dataset.

The working of the dendrogram can be explained using the below diagram:
In the above diagram, the left part is showing how clusters are created in agglomerative
clustering, and the right part is showing the corresponding dendrogram.

o As we have discussed above, firstly, the datapoints P2 and P3 combine together

and form a cluster, correspondingly a dendrogram is created, which connects P2
and P3 with a rectangular shape. The hight is decided according to the Euclidean
distance between the data points.
o In the next step, P5 and P6 form a cluster, and the corresponding dendrogram is
created. It is higher than of previous, as the Euclidean distance between P5 and
P6 is a little bit greater than the P2 and P3.
o Again, two new dendrograms are created that combine P1, P2, and P3 in one
dendrogram, and P4, P5, and P6, in another dendrogram.
o At last, the final dendrogram is created that combines all the data points together.
We can cut the dendrogram tree structure at any level as per our requirement.

K-Means Clustering Algorithm

K-Means Clustering is an unsupervised learning algorithm that is used to solve the clustering
problems in machine learning or data science. In this topic, we will learn what is K-means
clustering algorithm, how the algorithm works, along with the Python implementation of k-
means clustering.

What is K-Means Algorithm?

K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled
dataset into different clusters. Here K defines the number of pre-defined clusters that need to
be created in the process, as if K=2, there will be two clusters, and for K=3, there will be three
clusters, and so on.

It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a
way that each dataset belongs only one group that has similar properties.
It allows us to cluster the data into different groups and a convenient way to discover the
categories of groups in the unlabeled dataset on its own without the need for any training.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main
aim of this algorithm is to minimize the sum of distances between the data point and their
corresponding clusters.

The algorithm takes the unlabeled dataset as input, divides the dataset into k-number of
clusters, and repeats the process until it does not find the best clusters. The value of k should
be predetermined in this algorithm.

The k-means clustering algorithm mainly performs two tasks:

o Determines the best value for K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. Those data points which are near to the
particular k-center, create a cluster.
Hence each cluster has datapoints with some commonalities, and it is away from other
clusters.

The below diagram explains the working of the K-means Clustering Algorithm:

How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the below steps:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select random K points or centroids. (It can be other from the input dataset).

Step-3: Assign each data point to their closest centroid, which will form the predefined K
clusters.

Step-4: Calculate the variance and place a new centroid of each cluster.

Step-5: Repeat the third steps, which means reassign each datapoint to the new closest
centroid of each cluster.

Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.

Step-7: The model is ready.

Let's understand the above steps by considering the visual plots:

Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two variables
is given below:

o Let's take number k of clusters, i.e., K=2, to identify the dataset and to put them into
different clusters. It means here we will try to group these datasets into two different
clusters.
o We need to choose some random k points or centroid to form the cluster. These points
can be either the points from the dataset or any other point. So, here we are selecting the
below two points as k points, which are not the part of our dataset. Consider the below
image:

o Now we will assign each data point of the scatter plot to its closest K-point or centroid. We
will compute it by applying some mathematics that we have studied to calculate the
distance between two points. So, we will draw a median between both the centroids.
Consider the below image:

From the above image, it is clear that points left side of the line is near to the K1 or blue
centroid, and points to the right of the line are close to the yellow centroid. Let's color them
as blue and yellow for clear visualization.
o As we need to find the closest cluster, so we will repeat the process by choosing a new
centroid. To choose the new centroids, we will compute the center of gravity of these
centroids, and will find new centroids as below:

o Next, we will reassign each datapoint to the new centroid. For this, we will repeat the same
process of finding a median line. The median will be like below image:
From the above image, we can see, one yellow point is on the left side of the line, and two
blue points are right to the line. So, these three points will be assigned to new centroids.

As reassignment has taken place, so we will again go to the step-4, which is finding new
centroids or K-points.
o We will repeat the process by finding the center of gravity of centroids, so the new
centroids will be as shown in the below image:
o As we got the new centroids so again will draw the median line and reassign the data
points. So, the image will be:

o We can see in the above image; there are no dissimilar data points on either side of the
line, which means our model is formed. Consider the below image:

As our model is ready, so we can now remove the assumed centroids, and the two final
clusters will be as shown in the below image:
How to choose the value of "K number of clusters" in K-
means Clustering?
The performance of the K-means clustering algorithm depends upon highly efficient clusters
that it forms. But choosing the optimal number of clusters is a big task. There are some
different ways to find the optimal number of clusters, but here we are discussing the most
appropriate method to find the number of clusters or value of K. The method is given below:

Elbow Method
The Elbow method is one of the most popular ways to find the optimal number of clusters.
This method uses the concept of WCSS value. WCSS stands for Within Cluster Sum of
Squares, which defines the total variations within a cluster. The formula to calculate the value
of WCSS (for 3 clusters) is given below:

WCSS= ∑Pi in Cluster1 distance(Pi C1)2 +∑Pi in Cluster2distance(Pi C2)2+∑Pi in CLuster3 distance(Pi C3)2

In the above formula of WCSS,

∑P distance(Pi C1)2: It is the sum of the square of the distances between each data point
i in Cluster1

and its centroid within a cluster1 and the same for the other two terms.

To measure the distance between data points and centroid, we can use any method such as
Euclidean distance or Manhattan distance.

To find the optimal value of clusters, the elbow method follows the below steps:
o It executes the K-means clustering on a given dataset for different K values (ranges from
1-10).
o For each value of K, calculates the WCSS value.
o Plots a curve between calculated WCSS values and the number of clusters K.
o The sharp point of bend or a point of the plot looks like an arm, then that point is considered
as the best value of K.
Since the graph shows the sharp bend, which looks like an elbow, hence it is known as the
elbow method. The graph for the elbow method looks like the below image:

Note: We can choose the number of clusters equal to the given data points.
If we choose the number of clusters equal to the data points, then the value
of WCSS becomes zero, and that will be the endpoint of the plot.

ML Last Document Group 2 PDF
No ratings yet
ML Last Document Group 2 PDF
13 pages
Bayes Theorem in Machine Learning
No ratings yet
Bayes Theorem in Machine Learning
37 pages
Bayes Theorem
No ratings yet
Bayes Theorem
20 pages
Machine Learning & Bayesian Methods
No ratings yet
Machine Learning & Bayesian Methods
28 pages
Naive Bayes
No ratings yet
Naive Bayes
17 pages
Unit - 3 Itai & ML
No ratings yet
Unit - 3 Itai & ML
57 pages
Bayes Theorem
No ratings yet
Bayes Theorem
7 pages
Unit 4
No ratings yet
Unit 4
36 pages
Introduction to Naive Bayes Algorithm
No ratings yet
Introduction to Naive Bayes Algorithm
11 pages
Naïve Bayes Classifier Overview
No ratings yet
Naïve Bayes Classifier Overview
29 pages
Bayes' Theorem in AI: Concepts & Applications
No ratings yet
Bayes' Theorem in AI: Concepts & Applications
28 pages
Unit 2
No ratings yet
Unit 2
20 pages
Naive Bayes for Beginners
No ratings yet
Naive Bayes for Beginners
24 pages
UNIT 2 AAM Notes
No ratings yet
UNIT 2 AAM Notes
38 pages
Additional Material - Naive Bayes
No ratings yet
Additional Material - Naive Bayes
6 pages
Bayesian Inference & Naive Bayes Guide
No ratings yet
Bayesian Inference & Naive Bayes Guide
14 pages
Bayes
No ratings yet
Bayes
5 pages
Naive Bayes Classifier
No ratings yet
Naive Bayes Classifier
37 pages
Machine Learning for Data Science Students
No ratings yet
Machine Learning for Data Science Students
37 pages
Unit Iii Bayesian Learning
No ratings yet
Unit Iii Bayesian Learning
5 pages
ML Unit No.4 Naïve Bayes Classifiers PPT Notes
No ratings yet
ML Unit No.4 Naïve Bayes Classifiers PPT Notes
47 pages
Bayesian Classification
No ratings yet
Bayesian Classification
7 pages
DWM Exp 4
No ratings yet
DWM Exp 4
7 pages
Notes On ML
No ratings yet
Notes On ML
42 pages
Unit 6
No ratings yet
Unit 6
19 pages
Bayes Theorem in Machine Learning
No ratings yet
Bayes Theorem in Machine Learning
40 pages
Unit II Classification
No ratings yet
Unit II Classification
31 pages
Probabilistic Reasoning in AI
No ratings yet
Probabilistic Reasoning in AI
43 pages
Class 4 Naive Bayes Classification 2
No ratings yet
Class 4 Naive Bayes Classification 2
6 pages
Naive Bayes Classification Numerical Example - Coding Infinite
No ratings yet
Naive Bayes Classification Numerical Example - Coding Infinite
14 pages
Baysian Modelling
No ratings yet
Baysian Modelling
16 pages
I239-5 Naive Bayes
No ratings yet
I239-5 Naive Bayes
35 pages
Module 2 - Bayesian Learning
No ratings yet
Module 2 - Bayesian Learning
7 pages
Bayes Decision Theorylect3
No ratings yet
Bayes Decision Theorylect3
12 pages
Bayes Rule PR-2
No ratings yet
Bayes Rule PR-2
5 pages
Unit I Probabilistic Reasoning I 9
No ratings yet
Unit I Probabilistic Reasoning I 9
20 pages
Naïve Bayes Classification
No ratings yet
Naïve Bayes Classification
21 pages
Naive Bayes Model
No ratings yet
Naive Bayes Model
10 pages
Detailed Discuss
No ratings yet
Detailed Discuss
2 pages
Assignment 01 Math Kishan
No ratings yet
Assignment 01 Math Kishan
16 pages
Naïve Bayes Classifier Algorithm
No ratings yet
Naïve Bayes Classifier Algorithm
10 pages
AML - Unit - 3
No ratings yet
AML - Unit - 3
2 pages
(Math AA HL) - Exploring The Use of Bayes Theorem in AI Image Processing and Machine Learning
No ratings yet
(Math AA HL) - Exploring The Use of Bayes Theorem in AI Image Processing and Machine Learning
12 pages
What Is Naive Bayes?
No ratings yet
What Is Naive Bayes?
6 pages
Naive Bayes Theory
No ratings yet
Naive Bayes Theory
4 pages
Naïve Bayes Classifier Overview
No ratings yet
Naïve Bayes Classifier Overview
3 pages
Naive Bayes Algorithm
No ratings yet
Naive Bayes Algorithm
11 pages
Bayesian Concept Learning Guide
No ratings yet
Bayesian Concept Learning Guide
157 pages
Naïve Bayes & Bayesian Networks Guide
No ratings yet
Naïve Bayes & Bayesian Networks Guide
9 pages
Naïve Bayes Classifier Algorithm
No ratings yet
Naïve Bayes Classifier Algorithm
16 pages
Baye's Theorem - Example
No ratings yet
Baye's Theorem - Example
7 pages
What Is The Bayes' Theorem?
100% (2)
What Is The Bayes' Theorem?
12 pages
Bayesian
No ratings yet
Bayesian
14 pages
Naive Bayes Classification Guide
No ratings yet
Naive Bayes Classification Guide
21 pages
Topic 6: Conditional Probability - Bayes' Theorem Bayes Theorem
No ratings yet
Topic 6: Conditional Probability - Bayes' Theorem Bayes Theorem
8 pages
Bayesian Modeling Guide
No ratings yet
Bayesian Modeling Guide
13 pages
Ai2 Unit
No ratings yet
Ai2 Unit
22 pages
Solder Fume Extractor
No ratings yet
Solder Fume Extractor
3 pages
Computer Assignment
No ratings yet
Computer Assignment
4 pages
1 4 Voucher MBOKA WIFI ZONE 2heures Up 333 09.10.24 Tickets 2 Heures
No ratings yet
1 4 Voucher MBOKA WIFI ZONE 2heures Up 333 09.10.24 Tickets 2 Heures
24 pages
Numerical Linear Algebra
No ratings yet
Numerical Linear Algebra
6 pages
Let Review Professional Education (Educational Technology)
100% (2)
Let Review Professional Education (Educational Technology)
7 pages
3 JumpAndCall v21
No ratings yet
3 JumpAndCall v21
38 pages
Lab 1 Introduction To Object Oriented Software Engineering and Object Orientation in Uml
No ratings yet
Lab 1 Introduction To Object Oriented Software Engineering and Object Orientation in Uml
9 pages
Exploring Coffee Painting Techniques
No ratings yet
Exploring Coffee Painting Techniques
6 pages
Guidelines For Secure Application Design, Development, Implementation & Operations
No ratings yet
Guidelines For Secure Application Design, Development, Implementation & Operations
15 pages
Tech Titans
No ratings yet
Tech Titans
17 pages
MPMC Unit-1
No ratings yet
MPMC Unit-1
26 pages
Spouge's Approximation - Wikipedia
No ratings yet
Spouge's Approximation - Wikipedia
2 pages
SQL Dorks
0% (1)
SQL Dorks
58 pages
Dos Atacks
No ratings yet
Dos Atacks
16 pages
Daa SBMP
No ratings yet
Daa SBMP
11 pages
Lab Sheet 7
No ratings yet
Lab Sheet 7
4 pages
12th IDEAL Q Bank (Full Answers) CS EM
No ratings yet
12th IDEAL Q Bank (Full Answers) CS EM
30 pages
Screw Blowers
No ratings yet
Screw Blowers
20 pages
Data Analytics for IoT/M2M Insights
No ratings yet
Data Analytics for IoT/M2M Insights
25 pages
Convolutional Networks Guide
No ratings yet
Convolutional Networks Guide
15 pages
New Ousl Guid
No ratings yet
New Ousl Guid
35 pages
Aks
No ratings yet
Aks
13 pages
AutoCAD Command Aliases
No ratings yet
AutoCAD Command Aliases
3 pages
Iot Based Health Monitoring System
100% (1)
Iot Based Health Monitoring System
12 pages
If Mutable Strings
No ratings yet
If Mutable Strings
2 pages
OOP Lab 13 Tasks Unsolved
No ratings yet
OOP Lab 13 Tasks Unsolved
6 pages
Ais Reviewer Finals
No ratings yet
Ais Reviewer Finals
15 pages
Resume Anvesh Garg Recent
No ratings yet
Resume Anvesh Garg Recent
2 pages
HTML Code Examples for Web Development
No ratings yet
HTML Code Examples for Web Development
9 pages
Chapter08 JavaScript1LanguageFundamentals
No ratings yet
Chapter08 JavaScript1LanguageFundamentals
58 pages

ML Material-I

Uploaded by

ML Material-I

Uploaded by

Bayes Theorem

Bayes theorem is given by an English statistician, philosopher, and Presbyterian minister

What is Bayes Theorem?

o Further, the probability of event Y with known event X:

The above equation is called as Bayes Rule or Bayes Theorem.

o P(X|Y) is called as posterior, which we need to calculate. It is defined as updated

posterior = likelihood * prior / evidence

Prerequisites for Bayes Theorem

A = Event when an even number is obtained = {2, 4, 6}

B = Event when a number is greater than 4 = {5, 6}

o Probability of the event A ''P(A)''= Number of favourable outcomes / Total

Mathematically, two events A and B are said to be independent if:

P(A ∩ B) = P(AB) = P(A)*P(B)

P(A|B) = P(A ∩ B) / P(B)

Marginal probability is defined as the probability of an event A occurring independent of any

P(A) = P(A|B)*P(B) + P(A|~B)*P(~B)

Here ~B represents the event that B does not occur.

Why is it called Naïve Bayes?

P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.

P(A) is Prior Probability: Probability of hypothesis before observing the evidence.

P(B) is Marginal Probability: Probability of Evidence.

Working of Naïve Bayes' Classifier:

1. Convert the given dataset into frequency tables.

Solution: To solve this, first consider the below dataset:

Frequency table for the Weather Conditions:

Likelihood table weather condition:

Overcast 0 5 5/14= 0.35

All 4/14=0.29 10/14=0.71

P(Sunny|Yes)= 3/10= 0.3

So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60

So P(No|Sunny)= 0.5*0.29/0.35 = 0.41

So as we can see from the above calculation that P(Yes|Sunny)>P(No|Sunny)

Hence on a Sunny day, Player can play the game.

Advantages of Naïve Bayes Classifier:

Disadvantages of Naïve Bayes Classifier:

Applications of Naïve Bayes Classifier:

It is an unsupervised learning method, hence no supervision is provided to the algorithm, and

The clustering technique is commonly used for statistical data analysis.

Note: Clustering is somewhere similar to the classification algorithm, but

Hierarchical Clustering in Machine Learning

The hierarchical clustering technique has two approaches:

1. Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts with

Why hierarchical clustering?

In this topic, we will discuss the Agglomerative Hierarchical clustering algorithm.

Agglomerative Hierarchical clustering

This hierarchy of clusters is represented in the form of the dendrogram.

How the Agglomerative Hierarchical clustering Work?

Note: To better understand hierarchical clustering, it is advised to have a

Measure for the distance between two clusters

Woking of Dendrogram in Hierarchical clustering

o As we have discussed above, firstly, the datapoints P2 and P3 combine together

K-Means Clustering Algorithm

What is K-Means Algorithm?

The k-means clustering algorithm mainly performs two tasks:

How does the K-Means Algorithm Work?

Step-1: Select the number K to decide the number of clusters.

Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.

Step-7: The model is ready.

Let's understand the above steps by considering the visual plots:

In the above formula of WCSS,

You might also like

P(A) = P(A|B)P(B) + P(A|~B)P(~B)