Bayes’ Theorem in Data Mining
Last Updated :
23 Jul, 2024
Bayes’ Theorem describes the probability of an event, based on precedent knowledge of conditions which might be related to the event. In other words, Bayes’ Theorem is the add-on of Conditional Probability.
With the help of Conditional Probability, one can find out the probability of X given H, and it is denoted by P(X | H). Now Bayes’ Theorem states that if we know Conditional Probability (P(X | H)) then we can find out P(H | X), given the condition that P(X) and P(H) are already known to us.
Bayes’ Theorem is named after Thomas Bayes. He first makes use of conditional probability to provide an algorithm which uses evidence to calculate limits on an unknown parameter. Bayes’ Theorem has two types of probabilities :
- Prior Probability [P(H)]
- Posterior Probability [P(H/X)]
Where,
- X – X is a data tuple.
- H – H is some Hypothesis.
1. Prior Probability
Prior Probability is the probability of occurring an event before the collection of new data. It is the best logical evaluation of the probability of an outcome which is based on the present knowledge of the event before the inspection is performed.
2. Posterior Probability
When new data or information is collected then the Prior Probability of an event will be revised to produce a more accurate measure of a possible outcome. This revised probability becomes the Posterior Probability and is calculated using Bayes’ theorem. So, the Posterior Probability is the probability of an event X occurring given that event H has occurred.
For example
Suppose, three bags have the labels A, B, and C. One bag has a red ball in it, while the other two do not. The prior probability of red ball found in bag B is one-third or 0.333. But when bag C is seen, and the result shows that there is no red ball in that bag, then the posterior probability of red ball found in bag A and B becomes 0.5, as each bag has one out of two chances.
Formula
Bayes’ Theorem, can be mathematically represented by the equation given below :
[Tex]P(H/X) =P(X/H)P(H)/P(X)[/Tex]
Where,
- H and X are the events and,
- P (X) ≠ 0
- P(H/X) – Conditional probability of H.
Given that X occurs.
- P(X/H) – Conditional probability of X.
Given that H occurs.
- P(H) and P(X) – Prior Probabilities of occurring H and X independent of each other.
This is called the marginal probability.
Formula Derivation of Bayes’ Theorem
According to conditional probability, we know that
P(X|H) = P(X and H)/P(H)
Therefore,
P(X and H) = P(X|H) * P(H) ---------- [1]
Similarly,
P(H|X) = P(H and X)/P(X)
= P(X and H)/P(X) [Order does not matter in Joint Probability]
Therefore,
P(X and H) = P(H|X) * P(X) --------- [2]
Now from equation [1] and [2],
P(X|H) * P(H) = P(H|X) * P(X)
⇒ P(X|H) = P(H|X) * P(X)/P(H)
It means that if we know P(X|H), then we can find out P(H | X),
given the condition that P(X) and P(H) are already known to us.
Now, let us consider X1, X2, X3…..Xk be a group of events having probability P(Xi), i = 1, 2, 3…..k and for any event H where P(H) > 0.
P(Xi|H) = P(Xi and H) / P(H)
= P(H|Xi)*P(Xi) / ∑[P(H|Xi)*P(Xi)

Tree representation of Bayes’ Theorem
To find Reverse Probabilities : Bayes' Theorem
P(X1|H) = P(H|X1)*P(X1) / P(H)
Where
- P(X1) and P(H) are called marginal probabilities.
- P(X1) and P(H|X1) is already given.
Therefore, P(H) can be calculated as given below :
P(H) = P(H|X1)*P(X1) + P(H|X2)*P(X2) + P(H|X3)*P(X3)
(This is also known as Total Probability)
To find Reverse Probabilities : Bayes' Theorem
P(X1|H') = P(H'|X1)*P(X1) / P(H')
Now, P(H) can be calculated as
P(H') = P(H'|X1)*P(X1) + P(H'|X2)*P(X2) + P(H'|X3)*P(X3)
Applications of Bayes’ Theorem
In the real world, there are plenty of applications of the Bayes’ Theorem. Some applications are given below :
- It can also be used as a building block and starting point for more complex methodologies, For example, The popular Bayesian networks.
- Used in classification problems and other probability-related questions.
- Bayesian inference, a particular approach to statistical inference.
- In genetics, Bayes’ theorem can be used to calculate the probability of an individual having a specific genotype.
Examples
1. SpamAssassin works as a mail filter to identify the spam in which users train the system. In emails, it considers patterns in the words which are marked as spam by the users. For Example, it may have learned that the word “release” is marked as spam in 30% of the emails. Concluding 0.8% of non-spam mails which includes the word “release” and 40% of all emails which are received by the user is spam. Find the probability that a mail is a spam if the word “release” seems in it.
Solution :
Given,
P(Release | Spam) = 0.30
P(Release | Non Spam) = 0.008
P(Spam) = 0.40
=> P(Non Spam) = 0.40
P(Spam | Release) = ?
Now, using Bayes’ Theorem:
P(Spam | Release) = P(Release | Spam) * P(Spam) / P(Release)
= 0.30 * 0.40 / (0.40 * 0.30 + 0.30 * 0.008)
= 0.980
Hence, the required probability is 0.980.
2. Bag1 contains 4 white and 8 black balls and Bag2 contains 5 white and 3 black balls. From one of the bag one ball is drawn at random and the ball which is drawn comes out as black. Find the probability that the ball is drawn from Bag1.
Solution:
Given,
Let E1, E2 and A be the three events where,
E1 = Event of selecting Bag1
E2 = Event of selecting Bag2
A = Event of drawing black ball
Now,
P(E1) = P(E2) = 1/2
P(drawing a black ball from Bag1) = P(A|E1) = 8/12 = 2/3
P(drawing a black ball from Bag2) = P(A|E2) = 3/8
By using Bayes' Theorem, the probability of drawing a black ball from Bag1,
P(E1|A) = P(A|E1) * P(E1) / P(A|E1) * P(E1) + P(A|E2) * P(E2)
[P(A|E1) * P(E1) + P(A|E2) * P(E2) = Total Probability]
= (2/3 * 1/2) / (2/3 * 1/2 + 3/8 * 1/2)
= 16/25
Hence, the probability that the ball is drawn from Bag1 is 16/25
Similar Reads
Bayes Theorem in Machine learning
Bayes' theorem is fundamental in machine learning, especially in the context of Bayesian inference. It provides a way to update our beliefs about a hypothesis based on new evidence. What is Bayes theorem?Bayes' theorem is a fundamental concept in probability theory that plays a crucial role in vario
5 min read
Proximity-Based Methods in Data Mining
Proximity-based methods are an important technique in data mining. They are employed to find patterns in large databases by scanning documents for certain keywords and phrases. They are highly prevalent because they do not require expensive hardware or much storage space, and they scale up efficient
3 min read
Bayes' Theorem
Bayes' Theorem is a mathematical formula that helps determine the conditional probability of an event based on prior knowledge and new evidence. It adjusts probabilities when new information comes in and helps make better decisions in uncertain situations. Bayes' Theorem helps us update probabilitie
12 min read
Text Mining in Data Mining
In this article, we will learn about the main process or we should say the basic building block of any NLP-related tasks starting from this stage of basically Text Mining. What is Text Mining?Text mining is a component of data mining that deals specifically with unstructured text data. It involves t
10 min read
Data Mining and Society
Data Mining is the process of collecting data and then processing them to find useful patterns with the help of statistics and machine learning processes. By finding the relationship between the database, the peculiarities can be easily identified. Aggregation of useful datasets from a heap of data
8 min read
Aggregation in Data Mining
Aggregation in data mining is the process of finding, collecting, and presenting the data in a summarized format to perform statistical analysis of business schemes or analysis of human patterns. When numerous data is collected from various datasets, it's important to gather accurate data to provide
7 min read
Bayes' theorem in Artificial intelligence
Bayes Theorem in AI is perhaps the most fundamental basis for probability and statistics, more popularly known as Bayes' rule or Bayes' law. It allows us to revise our assumptions or the probability that an event will occur, given new information or evidence. In this article, we will see how the Bay
10 min read
Conditional Probability vs Bayes Theorem
Conditional probability and Bayes' Theorem are two imprtant concepts in probability where Bayes theorem is generalized version of conditional probability. Conditional probability is the probability of an event occurring given that another event has already occurred. Bayes' Theorem, named after the 1
5 min read
Data Mining Tutorial
Data Mining Tutorial covers basic and advanced topics, this is designed for beginner and experienced working professionals too. This Data Mining Tutorial help you to gain the fundamental of Data Mining for exploring a wide range of techniques. What is Data Mining?Data mining is the process of extrac
7 min read
Introduction to Data Mining
Data mining is the process of extracting useful information from large sets of data. It involves using various techniques from statistics, machine learning, and database systems to identify patterns, relationships, and trends in the data. This information can then be used to make data-driven decisio
5 min read