
Lecture-8

Machine Learning with


Python
Naïve Bayes Classifier Algorithm
❖Naïve Bayes algorithm is a supervised learning algorithm, which is based
on Bayes theorem and used for solving classification problems.
❖It is mainly used in text classification that includes a high-dimensional training
dataset.
❖It is a probabilistic classifier, which means it predicts on the basis of the
probability that an object belongs to a class.
❖Some popular applications of the Naïve Bayes algorithm are spam filtering,
sentiment analysis, and classifying articles.
Why is it called Naïve Bayes?
The name Naïve Bayes combines two words, Naïve and Bayes, which can be
described as follows:
❖Naïve: It is called naïve because it assumes that the occurrence of a certain
feature is independent of the occurrence of the other features. For example, if a
fruit is identified on the basis of color, shape, and taste, then a red, spherical,
and sweet fruit is recognized as an apple. Each feature individually contributes
to the identification, without depending on the others.
❖Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
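The independence assumption above can be sketched numerically. The per-feature likelihoods below are made-up illustrative values, not figures from the slides:

```python
# Sketch of the naive independence assumption with made-up numbers:
# under the assumption, P(red, spherical, sweet | apple) is approximated
# by the product of the individual feature likelihoods.
p_red_apple = 0.8        # assumed P(red | apple)
p_spherical_apple = 0.9  # assumed P(spherical | apple)
p_sweet_apple = 0.7      # assumed P(sweet | apple)

p_features_apple = p_red_apple * p_spherical_apple * p_sweet_apple
print(round(p_features_apple, 3))  # 0.504
```

The real joint likelihood would require modeling how the features co-occur; the naive product is what makes the classifier cheap to train.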
Bayes' Theorem
❖Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on
the conditional probability.
❖The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

Where,
P(A|B) is the Posterior probability: the probability of hypothesis A given the
observed evidence B.
P(B|A) is the Likelihood: the probability of observing evidence B given that
hypothesis A is true.
P(A) is the Prior probability: the probability of hypothesis A before the
evidence is observed.
P(B) is the Marginal probability: the overall probability of the evidence B.
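As a quick numeric illustration of the formula, here is a sketch with made-up spam-filter probabilities (all variable names and numbers are assumptions for illustration):

```python
# Bayes' theorem with made-up spam-filter numbers (illustrative only).
p_spam = 0.2        # P(A): prior probability that a message is spam
p_word_spam = 0.5   # P(B|A): likelihood of the word "free" in spam
p_word_ham = 0.05   # P(B|not A): likelihood of "free" in non-spam

# P(B): marginal probability of seeing the word at all
p_word = p_word_spam * p_spam + p_word_ham * (1 - p_spam)

# P(A|B): posterior probability the message is spam, given the word
p_spam_word = p_word_spam * p_spam / p_word
print(round(p_spam_word, 4))  # 0.7143
```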
Working of Naïve Bayes' Classifier
❖Suppose we have a dataset of weather conditions with a corresponding target
variable "Play". Using this dataset, we need to decide whether or not we should
play on a particular day, according to the weather conditions.
So to solve this problem, we need to follow the below steps:
❖Convert the given dataset into frequency tables.
❖Generate Likelihood table by finding the probabilities of given features.
❖Now, use Bayes theorem to calculate the posterior probability.
Problem: If the weather is sunny, then the Player should play or not?
Solution: To solve this, first consider the below dataset:
Sample Dataset: [table of weather conditions vs. the "Play" target omitted here]
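The three steps above can be sketched in Python. Since the slide's table is not reproduced in this text, the counts below follow the classic 14-row weather/"Play" example and are an assumption:

```python
from collections import Counter

# Assumed counts from the classic 14-row weather/"Play" example
# (the slide's actual table is not reproduced here).
data = ([("Sunny", "No")] * 3 + [("Sunny", "Yes")] * 2 +
        [("Overcast", "Yes")] * 4 +
        [("Rainy", "Yes")] * 3 + [("Rainy", "No")] * 2)

# Step 1: frequency tables
freq = Counter(data)                # counts of (weather, play) pairs
play = Counter(p for _, p in data)  # counts of the "Play" classes

# Step 2: likelihood table entries
p_sunny_given_yes = freq[("Sunny", "Yes")] / play["Yes"]  # 2/9
p_yes = play["Yes"] / len(data)                           # 9/14
p_sunny = sum(v for (w, _), v in freq.items() if w == "Sunny") / len(data)  # 5/14

# Step 3: Bayes' theorem -> posterior P(Yes | Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.4
```

With these counts the posterior P(Yes | Sunny) comes out above 0.5's complement cutoff for "No" (0.4 vs 0.6), so the player would not play; with other counts the decision can flip.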
Advantages of Naïve Bayes Classifier:
❖Naïve Bayes is one of the fastest and simplest ML algorithms for predicting the
class of a dataset.
❖It can be used for Binary as well as Multi-class Classifications.
❖It performs well in Multi-class predictions as compared to the other Algorithms.
❖It is the most popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
❖Naive Bayes assumes that all features are independent or unrelated, so it
cannot learn the relationship between features.
Applications of Naïve Bayes Classifier:
❖It is used for Credit Scoring.
❖It is used in medical data classification.
❖It can be used for real-time prediction, because the Naïve Bayes classifier is an
eager learner: the model is built at training time, so prediction is fast.
❖It is used in Text classification such as Spam filtering and Sentiment analysis.
Types of Naïve Bayes Model:
There are three types of Naive Bayes Model, which are given below:
❖Gaussian: The Gaussian model assumes that features follow a normal
distribution. This means if predictors take continuous values instead of discrete,
then the model assumes that these values are sampled from the Gaussian
distribution.
❖Multinomial: The Multinomial Naïve Bayes classifier is used when the data is
multinomial distributed. It is primarily used for document classification
problems, it means a particular document belongs to which category such as
Sports, Politics, education, etc.
The classifier uses the frequency of words for the predictors.
❖Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier,
but the predictor variables are independent Boolean variables, such as whether a
particular word is present in a document or not. This model is also popular for
document classification tasks.
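A minimal sketch of the three variants using scikit-learn (assuming scikit-learn and NumPy are installed; the toy data below is random and purely illustrative, not the slides' dataset):

```python
# Sketch of the three Naive Bayes variants with random toy data.
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)

# Gaussian NB: continuous features, assumed normal per class
Xc = rng.normal(size=(100, 3))
yc = (Xc[:, 0] > 0).astype(int)
gnb_pred = GaussianNB().fit(Xc, yc).predict(Xc[:2])

# Multinomial NB: non-negative count features, e.g. word frequencies
Xm = rng.integers(0, 5, size=(100, 6))
ym = rng.integers(0, 2, size=100)
mnb_pred = MultinomialNB().fit(Xm, ym).predict(Xm[:2])

# Bernoulli NB: binary features, e.g. word present / absent
Xb = (Xm > 0).astype(int)
bnb_pred = BernoulliNB().fit(Xb, ym).predict(Xb[:2])

print(gnb_pred, mnb_pred, bnb_pred)
```

In practice the choice among the three follows the feature type: continuous measurements, word counts, or word presence/absence.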
Bayesian Network
In statistics, probabilistic models are used to define relationships between
variables and to calculate the probability of each variable.
Many problems involve a large number of variables. In such cases, a fully
conditional model requires a huge amount of data to cover every combination of
values, which can be intractable to compute in real time.
There have been several attempts to simplify the conditional probability
calculations, such as Naïve Bayes, but these are often not sufficient, because
they discard the dependencies between variables.
❖The solution is to develop a model that preserves the conditional dependencies
between random variables where they exist, and the conditional independences
everywhere else. This leads us to the concept of Bayesian Networks.
❖Bayesian Networks help us effectively visualize the probabilistic model for a
domain and study the relationships between random variables in the form of a
user-friendly graph.
❖Real-world applications are often probabilistic in nature, and to represent the
relationships between multiple events, we need a Bayesian network.
❖ It can also be used in various tasks including prediction, anomaly detection,
diagnostics, automated insight, reasoning, time series prediction, and decision
making under uncertainty.
❖A Bayesian network is a probabilistic graphical model which represents a set
of variables and their conditional dependencies using a directed acyclic
graph.
❖It is also called a Bayes network, belief network, decision network, or Bayesian
model.
❖Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and also use probability theory for prediction and
anomaly detection.

A Bayesian network can be used for building models from data and expert opinions,
and it consists of two parts:
❖A Directed Acyclic Graph (DAG)
❖Tables of conditional probabilities.
The generalized form of a Bayesian network that represents and solves decision
problems under uncertain knowledge is known as an Influence Diagram.
❖Each node corresponds to a random variable, which can be continuous or
discrete.
❖Arcs (directed arrows) represent causal relationships or conditional
probabilities between random variables; each directed link connects a pair of
nodes in the graph.
❖A link means that one node directly influences the other; if there is no
directed link between two nodes, they are independent of each other.
The Bayesian network has mainly two components:
❖The causal component (the graph structure)
❖The actual numbers (the conditional probabilities)
Each node in the Bayesian network has a conditional probability distribution
P(Xi | Parents(Xi)), which quantifies the effect of the parents on that node.
A Bayesian network is based on the joint probability distribution and
conditional probability.
Joint Probability Distribution
❖If we have variables x1, x2, x3, ..., xn, then the probability of a particular
combination of values of x1, x2, x3, ..., xn is given by the joint probability
distribution.
❖P[x1, x2, x3, ..., xn] can be expanded with the chain rule of probability:
= P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]
❖In general, for each variable Xi we can write:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi)),
so the joint distribution factorizes as P(x1, ..., xn) = Π P(Xi | Parents(Xi)).
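The factorization can be sketched for a tiny chain-shaped network A → B → C, where C depends only on its parent B (all numbers below are made up for illustration):

```python
# Chain-rule factorization for a chain A -> B -> C (made-up numbers).
p_a = 0.3                              # P(A = True)
p_b_given_a = {True: 0.8, False: 0.4}  # P(B = True | A = a)
p_c_given_b = {True: 0.9, False: 0.1}  # P(C = True | B = b)

# Full chain rule: P(A, B, C) = P(A) * P(B | A) * P(C | A, B);
# the network structure lets us replace P(C | A, B) with P(C | B).
joint = p_a * p_b_given_a[True] * p_c_given_b[True]  # P(A=T, B=T, C=T)
print(round(joint, 3))  # 0.216
```

With n binary variables, the full joint table has 2^n entries, while the factorized form only needs one small table per node; that is the practical payoff of the network structure.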
Local Markov Property
❖Bayesian Networks satisfy the Local Markov Property: a node is conditionally
independent of its non-descendants, given its parents. For example, if node D
has parent A and node B is a non-descendant of D, then P(D|A, B) equals
P(D|A), because D is independent of its non-descendant B given A.
❖This property helps us simplify the joint distribution. The Local Markov
Property also leads to the concept of a Markov Random Field, a random field
around a variable that is said to follow Markov properties.
Example
Harry installed a new burglar alarm at his home to detect burglary. The alarm
responds reliably to a burglary, but it also responds to minor earthquakes.
Harry has two neighbors, David and Sophia, who have taken responsibility for
informing Harry at work when they hear the alarm. David always calls Harry when
he hears the alarm, but sometimes he confuses the phone ringing with the alarm
and calls then too. Sophia, on the other hand, likes to listen to loud music, so
sometimes she fails to hear the alarm. Here we would like to compute the
probability of the Burglary Alarm event.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary
nor an earthquake has occurred, and both David and Sophia have called Harry.
Solution
❖The Bayesian network for the above problem is given below. The network structure
shows that Burglary and Earthquake are the parent nodes of Alarm and directly
affect the probability of the alarm going off, while David's and Sophia's calls
depend only on the alarm.
❖The network represents our assumptions that David and Sophia do not directly
perceive the burglary, do not notice a minor earthquake, and do not confer with
each other before calling. The conditional distribution for each node is given as
a conditional probability table, or CPT.
❖Each row in a CPT must sum to 1, because the entries in a row represent an
exhaustive set of cases for the variable.
❖In a CPT, a Boolean variable with k Boolean parents requires 2^k rows of
probabilities. Hence, if there are two parents, the CPT will contain 4 rows.
Let's take the observed probabilities for the Burglary and Earthquake components:
P(B = True) = 0.002, the probability of a burglary.
P(B = False) = 0.998, the probability of no burglary.
P(E = True) = 0.001, the probability of a minor earthquake.
P(E = False) = 0.999, the probability that no earthquake occurred.
From the formula of the joint distribution, we can write the problem statement as
a product of conditional probabilities:
P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ∧ ¬E) * P(¬B) * P(¬E)
= 0.75 * 0.91 * 0.001 * 0.998 * 0.999
= 0.00068045
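The computation above can be checked directly. P(S|A) = 0.75, P(D|A) = 0.91, and P(A|¬B, ¬E) = 0.001 are the CPT values used in this worked example:

```python
# Check of the alarm-network query P(S, D, A, ¬B, ¬E) using the
# CPT values quoted in the worked example.
p_not_b = 0.998  # P(¬B): no burglary
p_not_e = 0.999  # P(¬E): no earthquake
p_a = 0.001      # P(A | ¬B, ¬E): alarm sounds with neither cause
p_d = 0.91       # P(D | A): David calls given the alarm
p_s = 0.75       # P(S | A): Sophia calls given the alarm

joint = p_s * p_d * p_a * p_not_b * p_not_e
print(round(joint, 8))  # 0.00068045
```

Note how the factorization needs only five small numbers, instead of a full joint table over all five Boolean variables (2^5 = 32 entries).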
Thank You!!
