No Name ID
01 Henok Tadesse NSR/0932/13
02 Melkamu Yitayih NSR/1179/13
03 Adise Adane NSR/0124/12
04 Birhan Ayenew NSR/0393/13
05 Tizazu Mekuant NSR/1641/13
Submission to Mr. Gebreyes G.
1. Introduction to Bayes Theorem in Machine Learning
Bayes theorem is also known with some other name such as Bayes rule or Bayes Law. Bayes
theorem helps to determine the probability of an event with random knowledge. It is used to
calculate the probability of occurring one event while other one already occurred. It is a best
method to relate the condition probability and marginal probability.
In simple words, we can say that Bayes theorem helps to produce more accurate results. Bayes theorem is used to estimate the precision of values and provides a method for calculating conditional probability. Although the calculation itself is simple, it lets us compute the conditional probability of events where intuition often fails. Some data scientists assume that Bayes theorem is used mostly in the financial industry, but that is not the case: besides finance, Bayes theorem is also applied extensively in health and medicine, research and surveys, the aeronautical sector, etc.
Bayes theorem is one of the most popular machine learning concepts; it helps to calculate the probability of one event occurring, under uncertain knowledge, given that another event has already occurred.
Bayes' theorem can be derived using the product rule and the conditional probability of event X given a known event Y.
According to the product rule, the probability of events X and Y occurring together can be expressed in two ways:
P(X ∩ Y) = P(X|Y) P(Y)
P(X ∩ Y) = P(Y|X) P(X)
Mathematically, Bayes theorem is obtained by equating the two right-hand sides and dividing by P(Y). We get:
P(X|Y) = P(Y|X) P(X) / P(Y)
Here, X plays the role of the hypothesis and Y plays the role of the evidence (the observed event).
P(X|Y) is called the posterior probability. It is the probability of the hypothesis X after the evidence Y has been observed.
P(Y|X) is called the likelihood. It is the probability of the evidence when the hypothesis is true.
P(X) is called the prior probability, the probability of the hypothesis before considering the evidence.
P(Y) is called the marginal probability. It is defined as the probability of the evidence under any consideration.
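To make the formula concrete, here is a minimal Python sketch of the computation; the probability values used are hypothetical and chosen only to show the arithmetic.

# Bayes' rule: posterior = likelihood * prior / evidence.
def bayes_posterior(likelihood, prior, evidence):
    """Return P(X|Y) given P(Y|X), P(X) and P(Y)."""
    return likelihood * prior / evidence

# Hypothetical numbers: P(Y|X) = 0.5, P(X) = 0.25, P(Y) = 0.5.
print(bayes_posterior(likelihood=0.5, prior=0.25, evidence=0.5))  # 0.25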
1. Experiment
An experiment is defined as a planned operation carried out under controlled conditions, such as tossing a coin, drawing a card, or rolling a die.
2. Sample Space
During an experiment, each result we can get is called a possible outcome, and the set of all possible outcomes of an experiment is known as the sample space. For example, if we are rolling a die, the sample space will be:
S1 = {1, 2, 3, 4, 5, 6}
Similarly, if our experiment consists of tossing a coin and recording its outcome, then the sample space will be:
S2 = {Head, Tail}
3. Event
An event is defined as a subset of the sample space of an experiment; in other words, it is a set of outcomes.
Assume that in our experiment of rolling a die there are two events A and B such that:
A = the event that an even number is obtained = {2, 4, 6}
B = the event that a number greater than 4 is obtained = {5, 6}
Probability of the event A, P(A) = Number of favourable outcomes / Total number of possible outcomes
P(A) = 3/6 = 1/2 = 0.5
Similarly, probability of the event B, P(B) = Number of favourable outcomes / Total number of possible outcomes
P(B) = 2/6 = 1/3 = 0.333
Union of events A and B:
A∪B = {2, 4, 5, 6}
Disjoint Event: If the intersection of the events A and B is an empty (null) set, then such events are known as disjoint events, also called mutually exclusive events.
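The following Python sketch (added for illustration, not part of the original text) represents the die-rolling sample space and the two events as sets and reproduces the probabilities, the union, and a disjointness check.

# Sample space and events for a single die roll, represented as Python sets.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: an even number is obtained
B = {5, 6}      # event B: a number greater than 4 is obtained

# Probability of an event = favourable outcomes / total possible outcomes.
def probability(event):
    return len(event) / len(sample_space)

print(probability(A))   # 0.5
print(probability(B))   # 0.333...
print(A | B)            # union of A and B: {2, 4, 5, 6}
print(A & B == set())   # disjoint? False, because A and B share the outcome 6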
4. Random Variable:
A random variable is a real-valued function that maps the sample space of an experiment onto the real line. A random variable takes on various values, each with some probability. Strictly speaking, it is neither random nor a variable: it behaves as a function, and it can be discrete, continuous, or a combination of both.
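As a small added illustration, the sketch below treats a random variable for a coin toss as an explicit mapping from outcomes to real numbers; the particular values 0 and 1 are an arbitrary choice.

# A random variable viewed as a function from the sample space to the real line.
sample_space = ["Head", "Tail"]
X = {"Head": 1, "Tail": 0}            # X maps each outcome to a real value
prob = {"Head": 0.5, "Tail": 0.5}     # each outcome has some probability

# Probability that the random variable takes the value 1.
p_x_is_1 = sum(prob[o] for o in sample_space if X[o] == 1)
print(p_x_is_1)  # 0.5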
5. Exhaustive Event:
As the name suggests, a set of events in which at least one of the events must occur when the experiment is performed is called an exhaustive set of events of the experiment.
Thus, two events A and B are said to be exhaustive if either A or B must definitely occur; if in addition they cannot occur together, they are also mutually exclusive. For example, while tossing a coin the outcome is either a Head or a Tail, so these two events are exhaustive (and mutually exclusive).
6. Independent Event:
Two events are said to be independent when the occurrence of one event does not affect the occurrence of the other event. In simple words, the probability of the outcome of one event does not depend on the other.
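The short sketch below (an added illustration) checks independence numerically for two die-roll events by comparing P(A and C) with P(A) * P(C); the event C is chosen here only for the example.

# Two events are independent iff P(A and C) == P(A) * P(C).
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}      # an even number is obtained
C = {1, 2, 3, 4}   # a number less than or equal to 4 is obtained

def probability(event):
    return len(event) / len(sample_space)

lhs = probability(A & C)               # P(A and C) = 2/6
rhs = probability(A) * probability(C)  # (3/6) * (4/6) = 2/6
print(abs(lhs - rhs) < 1e-12)          # True, so A and C are independent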
7. Conditional Probability:
Conditional probability is defined as the probability of an event A given that another event B has already occurred (i.e., A conditional on B). This is represented by P(A|B), and we can define it as:
P(A|B) = P(A ∩ B) / P(B), provided that P(B) > 0.
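Using the die-roll events from the earlier example, the conditional probability can be computed directly from outcome counts; the short sketch below is added for illustration.

# Conditional probability from counts: P(A|B) = P(A and B) / P(B).
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # an even number is obtained
B = {5, 6}      # a number greater than 4 is obtained

p_B = len(B) / len(sample_space)            # 2/6
p_A_and_B = len(A & B) / len(sample_space)  # only the outcome 6, so 1/6
print(p_A_and_B / p_B)                      # P(A|B) = 0.5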
8. Marginal Probability:
Marginal probability is defined as the probability of an event irrespective of the outcome of another event. For two events A and B it can be computed using the law of total probability:
P(A) = P(A|B) P(B) + P(A|~B) P(~B)
Here ~B represents the event that B does not occur.
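A quick numeric illustration of this law of total probability; the probability values are hypothetical and only demonstrate the formula.

# Marginal probability via the law of total probability.
p_A_given_B, p_B = 0.7, 0.4        # hypothetical values
p_A_given_notB, p_notB = 0.2, 0.6
p_A = p_A_given_B * p_B + p_A_given_notB * p_notB
print(round(p_A, 2))  # 0.4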
2. Naïve Bayes Classifier in Machine Learning
The Naïve Bayes classifier is a popular supervised machine learning algorithm used for classification tasks such as text classification. It belongs to the family of generative learning algorithms, which means that it models the distribution of the inputs for a given class or category. This approach is based on the assumption that the features of the input data are conditionally independent given the class, allowing the algorithm to make predictions quickly and accurately. In simple terms, a Naive Bayes classifier assumes that the presence of one feature in a class is unrelated to the presence of any other feature.
In statistics, naive Bayes classifiers are considered simple probabilistic classifiers that apply Bayes' theorem. This theorem is based on the probability of a hypothesis, given the data and some prior knowledge. The naive Bayes classifier assumes that all features in the input data are independent of each other, which is often not true in real-world scenarios. However, despite this simplifying assumption, the naive Bayes classifier is widely used because of its efficiency and good performance in practice.
Moreover, it is worth noting that naive Bayes classifiers are among the simplest Bayesian network models, yet they can achieve high accuracy levels when coupled with kernel density estimation. This technique uses a kernel function to estimate the probability density function of the input data, allowing the classifier to improve its performance in complex scenarios where the data distribution is not well defined. As a result, the naive Bayes classifier is a powerful tool in machine learning, particularly in text classification, spam filtering, and sentiment analysis.
For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of the other features, all of these properties independently contribute to the probability that the fruit is an apple, and that is why the classifier is called "naive".
A Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes is known to perform well even in comparison with highly sophisticated classification methods.
Bayes theorem provides a way of computing the posterior probability P(c|x) from P(c), P(x) and P(x|c):
P(c|x) = P(x|c) P(c) / P(x)
Here,
P(c|x) is the posterior probability of the class (c, the target) given the predictor (x, the attributes).
P(c) is the prior probability of the class.
P(x|c) is the likelihood, which is the probability of the predictor given the class.
P(x) is the prior probability of the predictor.
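When the predictor x consists of several attributes x1, ..., xn, the naive conditional-independence assumption lets the likelihood factor into a product over the attributes. The LaTeX snippet below writes out this standard form of the Naive Bayes formula; it is added here for clarity and is not taken verbatim from the original text.

% Naive Bayes formula under the conditional-independence assumption
P(c \mid x_1, \ldots, x_n)
  = \frac{P(c) \prod_{i=1}^{n} P(x_i \mid c)}{P(x_1, \ldots, x_n)}
  \propto P(c) \prod_{i=1}^{n} P(x_i \mid c)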
Let's understand it using an example. Consider a training data set of 14 weather observations and the corresponding target variable 'Play' (indicating whether the players played). Now, we need to classify whether the players will play or not based on the weather condition. Let's follow the steps below to perform it.
Step 1: Convert the data set into a frequency table of weather condition versus Play.
Step 2: Create a likelihood table by finding the probabilities, for example the probability of Overcast is 4/14 = 0.29 and the probability of playing (Yes) is 9/14 = 0.64.
Step 3: Use the Naive Bayes equation to calculate the posterior probability for each class; the class with the highest posterior probability is the outcome of the prediction.
Problem: Players will play if the weather is sunny. Is this statement correct?
Here we have P(Sunny|Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 = 0.64.
Now, P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny) = 0.33 * 0.64 / 0.36 = 0.60.
For the other class, P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny), where P(Sunny|No) = 2/5 = 0.40 and P(No) = 5/14 = 0.36, which gives P(No|Sunny) = 0.40.
Since P(Yes|Sunny) is the higher posterior probability, the statement is correct: the players are predicted to play when the weather is sunny.
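The same calculation as a short Python sketch. It uses only the counts quoted above (9 Yes and 5 No out of 14 records, and 5 Sunny days of which 3 were Yes); it is an illustration, not part of the original worked example.

# Naive Bayes posterior for the single attribute "Sunny" in the weather example.
p_yes, p_no = 9 / 14, 5 / 14          # class priors
p_sunny = 5 / 14                      # evidence (marginal probability of Sunny)
p_sunny_given_yes = 3 / 9             # likelihoods
p_sunny_given_no = 2 / 5

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny   # ~0.60
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny      # ~0.40
print("Play" if p_yes_given_sunny > p_no_given_sunny else "Do not play")  # Play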
The Naive Bayes classifier uses a similar method to predict the probability of each class based on various attributes. This algorithm is mostly used in text classification and in problems with multiple classes.
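To connect this with the text-classification use case mentioned above, here is a minimal sketch using scikit-learn's MultinomialNB; the library choice and the toy sentences and labels are assumptions made for illustration and are not part of the original text.

# Minimal Naive Bayes text classification sketch (illustrative toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training corpus with two classes.
texts = ["the players will play today", "a great match and a win",
         "it is sunny and hot outside", "rainy and overcast all day"]
labels = ["sport", "sport", "weather", "weather"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)      # bag-of-words count features
model = MultinomialNB().fit(X, labels)   # multinomial Naive Bayes classifier

print(model.predict(vectorizer.transform(["sunny day for a match"])))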