Theory of probability
Definition of Probability
Let A be an event of Ω. If, in n(Ω) repetitions of the experiment, the event A occurs n(A) times, then the frequency ratio of A is f(A) = n(A)/n(Ω).
If the random experiment Ω is repeated a large number of times under identical or uniform conditions, then the frequency ratio of A will be approximately equal to its probability, i.e., f(A) ≈ P(A), with P(A) = lim f(A) as n(Ω) → ∞.
Thus, f(A) can be taken to be an experimentally measured value of the idealised number P(A).
The longer the sequence of repetitions of Ω, the more accurate the measured value.
The definition of probability in terms of the frequency interpretation restricts the class of random experiments: the random experiment must be repeatable a large number of times under uniform conditions.
Example-1
What is the probability that a positive integer selected
at random from the set of positive integers not
exceeding 100 is divisible by (i) 5, (ii) 5 or 3, (iii) 5 and 3?
Solution: Ω = {1, 2, …, 100}, so n(Ω) = 100.
Let A be the event that the number is divisible by 5, so A = {5, 10, …, 100} and n(A) = 20. So, P(A) = n(A)/n(Ω) = 20/100 = 1/5.
Let B be the event that the number is divisible by 3, so B = {3, 6, …, 99} and n(B) = 33. So, P(B) = n(B)/n(Ω) = 33/100.
Let C be the event that the number is divisible by 5 or 3, so C = A ∪ B and n(A ∪ B) = 47. So, P(C) = 47/100.
Let D be the event that the number is divisible by 5 and 3, so D = A ∩ B and n(A ∩ B) = 6. So, P(D) = 6/100 = 3/50.
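These counts can be verified by direct enumeration; a minimal Python sketch (the variable names are illustrative):

```python
# Verify the counts used in Example-1 by direct enumeration.
omega = range(1, 101)  # positive integers not exceeding 100

n_A = sum(1 for x in omega if x % 5 == 0)                 # divisible by 5
n_B = sum(1 for x in omega if x % 3 == 0)                 # divisible by 3
n_C = sum(1 for x in omega if x % 5 == 0 or x % 3 == 0)   # divisible by 5 or 3
n_D = sum(1 for x in omega if x % 15 == 0)                # divisible by 5 and 3

print(n_A, n_B, n_C, n_D)                 # 20 33 47 6
print(n_A / 100, n_C / 100, n_D / 100)    # 0.2, 0.47, 0.06
```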
Deduction of some important
rules
1. For any event A, 0 ≤ n(A) ≤ n(Ω), so 0 ≤ f(A) ≤ 1.
In the limit as n(Ω) → ∞, 0 ≤ P(A) ≤ 1.
2. For the certain event S, n(S) = n(Ω), so f(S) = 1; hence in the limit P(S) = 1.
3. For impossible event O, n(O)=0. Hence,
P(O)=0
4. For an event A, A+ A′=S (certain event)
So, P(A+ A′)=P(S) => P(A) + P(A′)=1
=> P(A)=1-P(A′)
Addition rule for pairwise mutually exclusive
events
For two mutually exclusive events A and B, n(A + B) = n(A ∪ B) = n(A) + n(B).
So, f(A + B) = f(A) + f(B); therefore, in the limit, P(A + B) = P(A) + P(B).
If A, B, C are pairwise mutually exclusive events, then AB = O, BC = O, and CA = O.
Also A(B + C) = AB + AC = O, so A and B + C are mutually exclusive,
so we have P(A + B + C) = P(A) + P(B + C) = P(A) + P(B) + P(C).
In general, if A1, A2, …, An are n pairwise mutually exclusive events, then we have the following addition rule:
P(A1 + A2 + … + An) = P(A1) + P(A2) + … + P(An).
Conditional Probability
Let us consider two events A and B of Ω. Let us make the
hypothesis that the event A has occurred.
Suppose the event A has occurred n(A) times out of n(Ω) repetitions.
Let, among these n(A) occurrences of A, the event B also occur (along with A) n(AB) times.
The ratio n(AB)/n(A) is called the conditional frequency ratio of B on the hypothesis that A has occurred, and is denoted by f(B/A), i.e., f(B/A) = n(AB)/n(A) = [n(AB)/n(Ω)] / [n(A)/n(Ω)] = f(AB)/f(A).
By the empirical or statistical definition, f(AB)/f(A) → P(AB)/P(A) as n(Ω) → ∞.
We assume that this limit exists; it is called the conditional probability of B on the hypothesis that A has occurred.
So, as n(Ω) → ∞, f(B/A) → P(B/A) = P(AB)/P(A), provided P(A) ≠ 0.
Similarly, P(A/B) = P(AB)/P(B), provided P(B) ≠ 0.
Hence, if P(A), P(B) ≠ 0, we have P(AB) = P(A)P(B/A) = P(B)P(A/B).
This is the Multiplication Rule.
Addition rule: P(A + B) = P(A) + P(B) for mutually exclusive events A and B.
Multiplication Rule: P(AB) = P(A)P(B/A) = P(B)P(A/B).
General Addition Rule
Let us consider two events A and B of Ω. In general,
they are not mutually exclusive.
But the events, A-AB, AB, and B-AB are always
pairwise mutually exclusive.
So, A=(A-AB)+AB, B=(B-AB)+AB, and A+B=(A-AB)+AB+
(B-AB)
By Addition rule for mutually exclusive events,
P(A)= P(A-AB)+P(AB), P(B)=P(B-AB)+P(AB), and
P(A+B)=P(A-AB)+P(AB)+P(B-AB)
= P(A)-P(AB)+P(AB)+P(B)-P(AB) = P(A)+P(B)-P(AB)
i.e., P(A+B)= P(A)+P(B)-P(AB)
For three events A, B, and C
P(A+B+C) = P(A+(B+C)) = P(A) + P(B+C) - P(A(B+C))
= P(A) + P(B) + P(C) - P(BC) - P(AB+AC)
= P(A) + P(B) + P(C) - P(BC) - [P(AB) + P(AC) - P(AB·AC)]
= P(A) + P(B) + P(C) - P(BC) - P(AB) - P(AC) + P(AB·AC)
= P(A) + P(B) + P(C) - P(AB) - P(BC) - P(CA) + P(ABC)   [since AB·AC = ABC]
Generalising for n events A1, A2, …, An, we get
P(A1 + A2 + … + An) = Σ P(Ai) - Σ P(AiAj) + Σ P(AiAjAk) - … + (-1)^(n-1) P(A1A2…An),
where the sums run over i, over pairs i < j, over triples i < j < k, and so on.
Examples
1. A coin is tossed 3 times in succession. Find the
probability of (a) 2 heads (b) 2 consecutive
heads
Sol: (a) Here, n(Ω) = 2³ = 8.
Let A be the event that 2 heads occur; then A = {HHT, HTH, THH}, so n(A) = 3.
So, P(A) = n(A)/n(Ω) = 3/8.
(b) Let B be the event that 2 consecutive heads occur; then n(B) = 3 - 1 = 2 [as heads in the first and third positions are not consecutive, HTH is excluded].
So, P(B) = n(B)/n(Ω) = 2/8 = 1/4.
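A quick enumeration of the 2³ = 8 outcomes confirms both answers; a minimal Python sketch, where "two consecutive heads" is read, as in the solution, as exactly two heads standing side by side:

```python
from itertools import product

# Enumerate the 2^3 = 8 outcomes of three coin tosses.
outcomes = ["".join(t) for t in product("HT", repeat=3)]   # 'HHH', 'HHT', ...

# (a) exactly two heads
A = [o for o in outcomes if o.count("H") == 2]

# (b) exactly two heads and they are adjacent (HTH is excluded)
B = [o for o in outcomes if o.count("H") == 2 and "HH" in o]

print(len(A) / len(outcomes))   # 3/8 = 0.375
print(len(B) / len(outcomes))   # 2/8 = 0.25
```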
Examples
2. Two dice are thrown. Find the probability that the sum of the
faces equals or exceeds 10.
Sol: Here, n(Ω) = 6 × 6 = 36.
Let, A, B, and C denote the events ‘Sum 10’, ‘Sum 11’, and
‘Sum 12’ respectively. So, A+B+C is the required event,
where A, B, and C are pairwise mutually exclusive.
So, P(A+B+C) = P(A)+P(B)+P(C).
Now, P(A)= 3/36 as (4,6), (5,5), and (6,4) lie in A,
P(B)= 2/36 as (5,6) and (6,5) lie in B, and
P(C)= 1/36 as only (6,6) lies in C
So, P(A+B+C) = P(A)+P(B)+P(C)=3/36+2/36+1/36=6/36=1/6
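The six favourable outcomes can be checked by enumerating all 36 pairs; a minimal Python sketch:

```python
from itertools import product

# Enumerate the 36 equally likely outcomes of throwing two dice.
outcomes = list(product(range(1, 7), repeat=2))

favourable = [o for o in outcomes if sum(o) >= 10]   # sum is 10, 11 or 12
print(len(favourable), len(outcomes))                # 6 36
print(len(favourable) / len(outcomes))               # 1/6 ≈ 0.1667
```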
Generalisation of Conditional Probability
Frequency ratio: f(B/A) = n(AB)/n(A) = f(AB)/f(A).
For a long sequence of repetitions of the random experiment
under uniform conditions, the conditional frequency ratio
f(B/A) is taken to be an approximate value of the conditional
probability P(B/A).
So, the conditional probability of B on the hypothesis that A has occurred is P(B/A) = P(AB)/P(A),
which gives the multiplication rule: P(AB) = P(A)P(B/A).
For three events A, B, C, we have P(ABC) = P(A)P(B/A)P(C/AB).
Proof: R.H.S. = P(A) · [P(AB)/P(A)] · [P(ABC)/P(AB)] = P(ABC) = L.H.S.
In general, for n events A1, A2, …, An the multiplication rule is:
P(A1A2…An) = P(A1) P(A2/A1) P(A3/A1A2) … P(An/A1A2…A(n-1)).
Examples
1. A die is rolled. If the result is either an even face or a multiple
of 3, then you win. What is the probability that multiple of 3
occurs on the hypothesis that even face occurs?
Sol: Here, Ω = {1, 2, 3, 4, 5, 6}, so n(Ω) = 6.
Let A and B denote the events ‘even face’ and ‘multiple of 3’, respectively. So, B/A is the required event.
So, P(B/A) = P(AB)/P(A).
Now, P(A)= 3/6 as (2), (4), and (6) lie in A,
P(B)= 2/6 as (3) and (6) lie in B
P(AB)= 1/6 as only (6) lies in AB
So, P(B/A) = P(AB)/P(A)=(1/6)/ (3/6)=1/3
Similarly, P(A/B)=P(AB)/P(B)=1/2
Examples
2. Two cards are drawn successively from a pack without
replacing the first. If the first card is a spade, find the
probability that the second card is also a spade.
Sol1: Let A = first card is a spade, B=second card is a spade.
So, AB=both cards are spades.
n(Ω) = 52 × 51 (ordered pairs), n(A) = 13 × 51, n(AB) = 13 × 12.
So, P(B/A) = P(AB)/P(A) = n(AB)/n(A) = (13 × 12)/(13 × 51) = 12/51 = 4/17.
Sol2: When the first card is seen to be a spade, 51 cards remain in the pack, of which 12 are spades.
Hence, the probability that the second card is also a spade is 12/51 = 4/17.
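A counting check of Sol1 in Python (ordered draws without replacement):

```python
# Counting check for Sol1 (ordered draws without replacement).
n_omega = 52 * 51   # ordered pairs of distinct cards
n_A = 13 * 51       # first card is a spade
n_AB = 13 * 12      # both cards are spades

print(n_AB / n_A)   # 12/51 = 4/17 ≈ 0.2353
```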
Bayes’ Theorem
Theorem: If A1, A2, …, An is a given set of n pairwise mutually exclusive events, one of which certainly occurs, i.e.,
A1 + A2 + … + An = S and AiAj = O for i ≠ j,
then for any arbitrary event X,
(i) P(X) = P(A1)P(X/A1) + P(A2)P(X/A2) + … + P(An)P(X/An)
(ii) Bayes’ theorem: if P(X) ≠ 0, P(Ai/X) = P(Ai)P(X/Ai) / [P(A1)P(X/A1) + … + P(An)P(X/An)]
Proof: For any event X, we have X = SX = (A1 + A2 + … + An)X = A1X + A2X + … + AnX.
Since AiAj = O for i ≠ j, we also have (AiX)(AjX) = O for i ≠ j.
Therefore A1X, A2X, …, AnX are pairwise mutually exclusive events, and hence
P(X) = P(A1X) + P(A2X) + … + P(AnX) [Addition Rule]
Since P(AiX) = P(Ai)P(X/Ai) [Multiplication Rule],
therefore P(X) = P(A1)P(X/A1) + P(A2)P(X/A2) + … + P(An)P(X/An). [(i) is proved]
We have already proved that P(X) = P(A1)P(X/A1) + P(A2)P(X/A2) + … + P(An)P(X/An).
Now we have to prove Bayes’ theorem, i.e., if P(X) ≠ 0, P(Ai/X) = P(Ai)P(X/Ai)/P(X).
Proof: P(AiX) = P(X)P(Ai/X) [Multiplication Rule]
Also, P(AiX) = P(Ai)P(X/Ai) [Multiplication Rule]
Hence, if P(X) ≠ 0, P(Ai/X) = P(AiX)/P(X) = P(Ai)P(X/Ai)/P(X) = P(Ai)P(X/Ai) / [P(A1)P(X/A1) + … + P(An)P(X/An)].
Thus Bayes’ theorem is proved.
Example on Bayes’ Theorem
Example-1: There are three identical urns containing white and black balls. The first urn contains 2 white and 3 black balls, the second urn 3 white and 5 black balls, and the third urn 5 white and 2 black balls. An urn is chosen at random, and a ball is drawn from it. If the ball drawn is white, what is the probability that the second urn is chosen?
Solution:
Let A= the event that the ball is drawn from the first urn, B= the
event that the ball is drawn from the second urn, and C= the
event that the ball is drawn from the third urn.
The events A, B, and C are pairwise mutually exclusive, and one of
these necessarily occurs.
P(A) is the probability that 1st urn is chosen, and so on. So,
P(A)=P(B)=P(C)=1/3
Let X=the event that ball drawn is white. So, P(X/A)=2/5,
P(X/B)=3/8, P(X/C)=5/7.
We have to compute P(B/X)
Example-1 Cont…
Compute P(B/X)
P(X)=P(A)P(X/A) + P(B)P(X/B)+P(C)P(X/C)
=(1/3)(2/5)+ (1/3)(3/8)+ (1/3)(5/7)
=(1/3)[2/5+3/8+5/7]
=(1/3)(417/280) = 139/280
So, By Bayes’ theorem:
P(B/X) = [P(X/B) P(B)]/ P(X)
= [(3/8)(1/3)] / (139/280)
=35/139 (Ans.)
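The same calculation as a short Python sketch; the dictionary keys A, B, C mirror the event names used in the solution:

```python
# Bayes' theorem for the three-urn example.
priors = {"A": 1/3, "B": 1/3, "C": 1/3}            # urn chosen at random
white_given_urn = {"A": 2/5, "B": 3/8, "C": 5/7}   # P(white | urn)

# Total probability of drawing a white ball.
p_white = sum(priors[u] * white_given_urn[u] for u in priors)

# Posterior probability that the second urn was chosen, given a white ball.
p_B_given_white = priors["B"] * white_given_urn["B"] / p_white

print(p_white)           # 139/280 ≈ 0.4964
print(p_B_given_white)   # 35/139 ≈ 0.2518
```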
Bayesian Classifier
Bayesian Classification: Why?
A statistical classifier: performs probabilistic
prediction, i.e., predicts class membership
probabilities
Foundation: Based on Bayes’ Theorem.
Performance: A simple Bayesian classifier,
naïve Bayesian classifier, has comparable
performance with decision tree and selected
neural network classifiers
Bayes’ Theorem: Basics
Bayes’ Theorem: P(H|X) = P(X|H)·P(H) / P(X)
Let X be a data sample: class label is unknown
Let H be a hypothesis that X belongs to class C
Classification is to determine P(H|X), (i.e., posteriori
probability): the probability that the hypothesis holds given
the observed data sample X
P(H) (prior probability): the initial probability
E.g., X will buy computer, regardless of age, income,
student, credit_rating.
P(X): probability that sample data is observed
P(X|H) : the probability of observing the sample X, given
that the hypothesis holds
E.g., Given that X will buy computer, the prob. that X is
31..40, medium income
Prediction Based on Bayes’ Theorem
Given training data X, posteriori probability of a
hypothesis H, P(H|X), follows the Bayes’ theorem
P(H|X) = P(X|H)·P(H) / P(X)
Predicts X belongs to Ci iff the probability P(Ci|X) is
the highest among all the P(Ck|X) for all the k
classes
Practical difficulty: It requires initial knowledge of
many probabilities, involving significant
computational cost
Classification Is to Derive the Maximum Posteriori
Let D be a training set of tuples and their
associated class labels, and each tuple is
represented by an n-D attribute vector X = (x1, x2,
…, xn)
Suppose there are m classes C1, C2, …, Cm.
Classification is to derive the maximum posteriori,
i.e., the maximal P(Ci|X)
This can be derived from Bayes’ theorem:
P(Ci|X) = P(X|Ci)·P(Ci) / P(X)
Since P(X) is constant for all classes, only
P(X|Ci)·P(Ci)
needs to be maximized
Naïve Bayes Classifier
A simplified assumption: attributes are conditionally independent (i.e., no dependence relation between attributes):
P(X|Ci) = ∏(k=1..n) P(xk|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
This greatly reduces the computation cost: only the class distribution needs to be counted.
If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having value xk for Ak divided by |Ci,D| (# of tuples of Ci in D).
If Ak is continuous-valued, P(xk|Ci) is usually computed based on a Gaussian distribution with mean μ and standard deviation σ,
g(x, μ, σ) = (1/(√(2π)·σ)) · e^(-(x-μ)²/(2σ²)),
where P(xk|Ci) = g(xk, μCi, σCi).
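For a continuous-valued attribute, the Gaussian density above translates directly into code; a minimal Python sketch (the numbers in the example call are hypothetical):

```python
import math

def g(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) used to estimate P(xk | Ci)
    for a continuous-valued attribute Ak."""
    return (1.0 / (math.sqrt(2 * math.pi) * sigma)) * \
           math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical example: within class Ci the attribute has mean 38 and
# standard deviation 12; likelihood of observing the value 35.
print(g(35, 38, 12))
```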
Naïve Bayes Classifier: Training Dataset
Class: C1: buys_computer = ‘yes’, C2: buys_computer = ‘no’
Data to be classified: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no
An Example
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(Ci):
P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357
Compute P(X|Ci) for each class:
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
P(X|Ci):
P(X|buys_computer = “yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
P(X|Ci) × P(Ci):
P(X|buys_computer = “yes”) × P(buys_computer = “yes”) = 0.044 × 0.643 = 0.028
P(X|buys_computer = “no”) × P(buys_computer = “no”) = 0.019 × 0.357 = 0.007
Since 0.028 > 0.007, X is classified as buys_computer = “yes”.
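The whole computation can be reproduced by counting over the 14 training tuples; a minimal Python sketch in which the list `data` is an illustrative transcription of the table above:

```python
# Training tuples: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]
X = ("<=30", "medium", "yes", "fair")   # tuple to classify

for c in ("yes", "no"):
    rows = [r for r in data if r[-1] == c]
    prior = len(rows) / len(data)                     # P(Ci)
    likelihood = 1.0                                  # P(X|Ci) under independence
    for k, value in enumerate(X):
        likelihood *= sum(1 for r in rows if r[k] == value) / len(rows)
    print(c, round(prior, 3), round(likelihood, 3), round(prior * likelihood, 3))

# yes: 0.643 * 0.044 ≈ 0.028,  no: 0.357 * 0.019 ≈ 0.007  ->  predict "yes"
```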
Avoiding the Zero-Probability Problem
Naïve Bayesian prediction requires each
conditional prob. be non-zero. Otherwise, the
predicted prob. will be zero:
P(X|Ci) = ∏(k=1..n) P(xk|Ci)
Ex. Suppose a dataset with 1000 tuples,
income=low (0), income= medium (990), and
income = high (10)
Use Laplacian correction (or Laplacian
estimator)
Adding 1 to each case
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
The “corrected” prob. estimates are close to
their “uncorrected” counterparts.
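A minimal sketch of the Laplacian correction for the income example above; note that the denominator grows by the number of distinct attribute values (here 3):

```python
# Laplacian correction for the income attribute (1000 tuples in the class).
counts = {"low": 0, "medium": 990, "high": 10}
total = sum(counts.values())                # 1000

# Add 1 to each count; the denominator grows by the number of
# distinct values (3), giving 1003.
corrected = {v: (c + 1) / (total + len(counts)) for v, c in counts.items()}

print(corrected)   # low: 1/1003, medium: 991/1003, high: 11/1003
```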
Naïve Bayes Classifier: Comments
Advantages
Easy to implement
Good results obtained in most of the cases
Disadvantages
Assumption: class conditional independence,
therefore loss of accuracy
Practically, dependencies exist among variables
E.g., hospital patients => Profile: age, family history, etc.; Symptoms: fever, cough, etc.; Disease: lung cancer, diabetes, etc.
Dependencies among these cannot be
modeled by Naïve Bayes Classifier
How to deal with these dependencies? Bayesian Belief Networks
kNN Algorithm
kNN Classifier
The k-nearest neighbours (kNN) algorithm is a type of supervised algorithm that can be used for both classification and regression predictive problems.
There are three categories of learning algorithms:
i) Lazy learning algorithm: kNN is a lazy learning algorithm because it does not have a specialized training phase or model and uses all of the training data at classification time.
ii) Non-parametric learning algorithm: kNN is also a non-parametric learning algorithm because it does not make assumptions about the distribution of the underlying data (as opposed to other algorithms such as the Gaussian Mixture Model (GMM), which assumes a Gaussian distribution of the data).
iii) Eager learning algorithm: Eager learners, when
given a set of training tuples, will construct a
generalization model before receiving new (e.g., test)
tuples to classify.
kNN Classifier
The kNN algorithm begins with a training
dataset made up of examples that are
classified into several categories.
Assume that we have a test dataset
containing unlabeled examples that
otherwise have the same features as the
training data.
For each example (i.e., record) in the test
dataset, kNN identifies k examples in the
training data that are the "nearest" in
similarity, where k is an integer specified in
advance.
The unlabeled test instance is assigned the
class of the majority of the k nearest
neighbors
KNN: Classification Approach
Locating the unlabeled instance’s nearest
neighbors requires a distance function, or a
formula that measures the similarity
between two instances.
There are many different ways to calculate
distance. Traditionally, the kNN algorithm
uses Euclidean distance.
The K-NN algorithm works by finding the K
nearest neighbors to a given data point
based on a distance metric, such as
Euclidean distance.
The class or value of the data point is then
determined by the majority vote (for
classification) or average (for regression) of
the K neighbors.
KNN: Classification Approach
Classified by “MAJORITY VOTES” of its neighbor classes:
assigned to the most common class amongst its K-nearest neighbors.
KNN: Pseudocode
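A minimal Python sketch of the kNN procedure, assuming Euclidean distance and simple majority voting; the function names and the tiny dataset are illustrative:

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # 1. Distance from the query to every training example.
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # 2. Keep the k closest examples.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # 3. Majority vote over their class labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Tiny illustrative dataset: two numeric features, two classes.
train_X = [(1.0, 1.1), (1.2, 0.9), (3.0, 3.2), (3.1, 2.9)]
train_y = ["A", "A", "B", "B"]
print(knn_classify(train_X, train_y, (1.1, 1.0), k=3))   # -> "A"
```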
Advantages of kNN
kNN algorithm is a versatile and widely used
machine learning algorithm that is primarily
used for its simplicity and ease of
implementation.
It does not require any assumptions about
the underlying data distribution.
It can also handle both numerical and
categorical data, making it a flexible choice
for various types of datasets in classification
and regression tasks.
It is a non-parametric method that makes
predictions based on the similarity of data
points in a given dataset.
Advantages of kNN
K-NN is less sensitive to outliers compared
to other algorithms.
Few Hyperparameters – the only parameters required in the training of a kNN algorithm are the value of k and the choice of the distance metric, which we select based on our evaluation metric.
Disadvantages of the KNN Algorithm
Does not scale – It is a lazy Algorithm. The main
significance of this term is that this takes lots of computing
power as well as data storage. This makes this algorithm both
time-consuming and resource exhausting.
Curse of Dimensionality – The KNN algorithm is affected by
the curse of dimensionality which implies the algorithm faces
a hard time classifying the data points properly when the
dimensionality is too high.
The curse of dimensionality can be particularly problematic in
several ways:
(i) Distance Metrics Become Less Informative: In high-
dimensional spaces, the concept of "distance" becomes less
meaningful because the distances between all pairs of points
tend to become more similar. This reduces the effectiveness
of KNN, which relies on distance metrics to determine the
nearest neighbors.
(ii) Sparsity of Data: As the number of dimensions increases,
the volume of the space grows exponentially. This means that
data points become sparse, making it harder for KNN to find
enough neighbors that are truly representative of the
underlying distribution of the data.
Disadvantages of the KNN Algorithm
(iii) Increased Computational Complexity: The
computational cost of calculating distances between points
increases with dimensionality. This can make KNN inefficient
in practice when dealing with very high-dimensional data.
(iv) Overfitting: In high-dimensional spaces, KNN can become
overly sensitive to noise in the data, leading to overfitting.
This happens because with many dimensions, there’s a
higher likelihood that some of the features will not be
relevant, but they can still influence the nearest neighbor
calculations.
To mitigate these issues, various techniques can be
used:
(i) Dimensionality Reduction
(ii) Feature Selection
By addressing the curse of dimensionality through these methods, you can make KNN more effective even in high-dimensional settings.
Variation In kNN
How to Choose the value of k?
The value of k is very crucial in the KNN
algorithm to define the number of
neighbors in the algorithm.
The value of k in the k-nearest neighbors (k-
NN) algorithm should be chosen based on
the input data. If the input data has more
outliers or noise, a higher value of k would
be better.
It is recommended to choose an odd value
for k to avoid ties in classification.
Value of k
A small k value isn’t suitable for classification.
As a rule of thumb, setting k to the square root of the number of training samples can lead to a better result. If the square root comes out to 10, we can choose either k = 9 or k = 11 just to make sure that k is odd.
Use an error plot or accuracy plot to find the most favorable k value (see the sketch after this list).
kNN performs well with multi-label classes,
but you must be aware of the outliers.
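One way to produce the accuracy plot mentioned above is to evaluate a range of odd k values on a held-out split; a sketch assuming scikit-learn and matplotlib are available, with the Iris data standing in for an actual dataset:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

ks = list(range(1, 40, 2))      # odd values of k only, to avoid ties
accuracies = []
for k in ks:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    accuracies.append(model.score(X_test, y_test))   # test accuracy for this k

plt.plot(ks, accuracies, marker="o")
plt.xlabel("k")
plt.ylabel("test accuracy")
plt.show()
```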
How to Choose the value of k?
• We got an accuracy of 0.41 at k = 37. As we got the minimum error at k = 37, we will get better efficiency at that value of k.
Is Naïve Bayes a lazy learner?
The Naive Bayes algorithm is not a lazy learner. It
is an eager learner. It is different from the nearest
neighbor algorithm.
Real learning takes place in Naive Bayes. The
parameters that are learned in Naive Bayes are
the prior probabilities of different classes, as well
as the likelihood of different features for each
class.
In the test phase, these learned parameters are
used to estimate the probability of each class for
the given sample.
In other words, in Naive Bayes, for each sample in
the test set, the parameters determined during
training are used to estimate the probability of
that sample belonging to different classes.
For example, P(c|x) ∝ P(c) P(x1|c) P(x2|c) … P(xn|c), where c is a class and x is a test sample.
All quantities P(c) and P(xi|c) are
parameters which are determined during
training and are used during testing.
This is similar to the nearest neighbor (NN) approach, but the kind of learning and the way the learned model is applied are different.
THANK YOU