
Bayesian Decision Theory

Chapter 2 (Duda et al.) – Sections 2.1-2.10

CS479/679 Pattern Recognition


Dr. George Bebis
Bayesian Decision Theory
• Design classifiers to make decisions that minimize an expected "risk".
– The simplest risk is the classification error.
– When misclassification errors are not equally important, the risk can
include the cost associated with different misclassification errors.
Terminology
• State of nature ω (class label):
– e.g., ω1 for sea bass, ω2 for salmon

• Probabilities P(ω1) and P(ω2) (priors):


– e.g., prior knowledge of how likely it is to get a sea bass
or a salmon

• Probability density function p(x) (evidence):


– e.g., how frequently we will measure a pattern with
feature value x (e.g., x corresponds to lightness)
Terminology (cont’d)
• Conditional probability density p(x/ωj) (likelihood):
– e.g., how frequently we will measure a pattern with
feature value x given that the pattern belongs to class ωj

[Figure: lightness distributions for the salmon and sea-bass populations]
Terminology (cont’d)

• Conditional probability P(ωj /x) (posterior):


– e.g., the probability that the fish belongs to class
ωj given feature x.
Decision Rule Using Prior
Probabilities Only
Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

P(error) = P(ω1) if we decide ω2
P(error) = P(ω2) if we decide ω1

or P(error) = min[P(ω1), P(ω2)]

• Favours the most likely class.


• This rule makes the same decision every time.
– i.e., optimum if no other information is available
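A minimal sketch in Python (function and variable names are illustrative, not from the slides) of the prior-only rule and its error:

```python
# Prior-only Bayes decision: always choose the more probable class.
def decide_from_priors(p_w1, p_w2):
    """Return the chosen class and the resulting probability of error."""
    decision = "w1" if p_w1 > p_w2 else "w2"
    p_error = min(p_w1, p_w2)       # P(error) = min[P(w1), P(w2)]
    return decision, p_error

print(decide_from_priors(2/3, 1/3))   # always decides w1; P(error) = 1/3
```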
Decision Rule Using
Conditional Probabilities
• Using Bayes’ rule:
P(ωj /x) = p(x/ωj) P(ωj) / p(x)      (posterior = likelihood × prior / evidence)

where p(x) = Σ_{j=1}^{2} p(x/ωj) P(ωj)   (i.e., a scale factor so that the posteriors sum to 1)

Decide ω1 if P(ω1 /x) > P(ω2 /x); otherwise decide ω2


or
Decide ω1 if p(x/ω1)P(ω1)>p(x/ω2)P(ω2); otherwise decide ω2
or
Decide ω1 if p(x/ω1)/p(x/ω2) >P(ω2)/P(ω1) ; otherwise decide ω2
(the left-hand side is the likelihood ratio; the right-hand side acts as a threshold)
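The three equivalent forms above can be checked with a short sketch (illustrative numbers; the assert verifies that the posterior comparison and the likelihood-ratio test agree):

```python
# Bayes decision from likelihoods and priors (two-class case).
def decide(px_w1, px_w2, p_w1, p_w2):
    evidence = px_w1 * p_w1 + px_w2 * p_w2      # p(x) = sum_j p(x/wj) P(wj)
    post_w1 = px_w1 * p_w1 / evidence           # P(w1/x)
    post_w2 = px_w2 * p_w2 / evidence           # P(w2/x)
    # likelihood-ratio form: p(x/w1)/p(x/w2) > P(w2)/P(w1)
    assert (post_w1 > post_w2) == (px_w1 / px_w2 > p_w2 / p_w1)
    return "w1" if post_w1 > post_w2 else "w2"

print(decide(px_w1=0.8, px_w2=0.3, p_w1=2/3, p_w2=1/3))   # -> w1
```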
Decision Rule Using Conditional
Probabilities (cont’d)

[Figure: class-conditional densities p(x/ωj) and posteriors P(ωj /x) for priors P(ω1) = 2/3, P(ω2) = 1/3]
Probability of Error
• The probability of error is defined as:

P(error/x) = P(ω1/x) if we decide ω2
P(error/x) = P(ω2/x) if we decide ω1

or P(error/x) = min[P(ω1/x), P(ω2/x)]

• What is the average probability of error?

P(error) = ∫ P(error, x) dx = ∫ P(error/x) p(x) dx    (integrating over all x)

• The Bayes rule is optimum, that is, it minimizes the average probability of error!
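A numerical sanity check, under an assumed 1-D two-Gaussian example that is not from the slides: the Bayes error ∫ min[p(x/ω1)P(ω1), p(x/ω2)P(ω2)] dx is no larger than the error of an arbitrary fixed-threshold rule.

```python
# Compare the Bayes error with the error of a hand-picked threshold rule.
import numpy as np

p1, p2 = 0.5, 0.5
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
gauss = lambda t, m, s: np.exp(-0.5 * ((t - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
px_w1, px_w2 = gauss(x, -1.0, 1.0), gauss(x, 2.0, 1.5)

# Bayes error: integrate min[p(x/w1)P(w1), p(x/w2)P(w2)] over x
bayes_err = np.sum(np.minimum(px_w1 * p1, px_w2 * p2)) * dx
# Non-optimal rule: decide w1 whenever x < 0, w2 otherwise
thresh_err = np.sum(np.where(x < 0.0, px_w2 * p2, px_w1 * p1)) * dx
print(bayes_err, thresh_err)   # bayes_err <= thresh_err
```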
Where do Probabilities come from?
• There are two competing answers:

(1) Relative frequency (objective) approach.


– Probabilities can only come from experiments.

(2) Bayesian (subjective) approach.


– Probabilities may reflect degree of belief and can be
based on opinion.
Example (objective approach)
• Classify cars according to whether they cost more or less than $50K:
– Classes: C1 if price > $50K, C2 if price <= $50K
– Features: x, the height of a car

• Use the Bayes’ rule to compute the posterior probabilities:

P(Ci /x) = p(x/Ci) P(Ci) / p(x)
• We need to estimate p(x/C1), p(x/C2), P(C1), P(C2)
Example (cont’d)
• Collect data
– Ask drivers how much their car cost and measure its height.
• Determine prior probabilities P(C1), P(C2)
– e.g., 1209 samples: #C1=221 #C2=988

P(C1) = 221/1209 = 0.183
P(C2) = 988/1209 = 0.817
Example (cont’d)
• Determine class conditional probabilities (likelihood)
– Discretize car height into bins and use normalized histogram

[Figure: normalized histograms of car height giving the likelihoods p(x/Ci)]
Example (cont’d)
• Calculate the posterior probability for each bin, e.g.:
P(C1 /x=1.0) = p(x=1.0/C1) P(C1) / [ p(x=1.0/C1) P(C1) + p(x=1.0/C2) P(C2) ]
            = (0.2081 × 0.183) / (0.2081 × 0.183 + 0.0597 × 0.817) = 0.438

[Figure: posterior probabilities P(Ci /x) for each height bin]
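The numbers above can be reproduced with a few lines (the bin likelihoods 0.2081 and 0.0597 are the values quoted on the slide):

```python
# Priors from relative frequencies, posterior from Bayes' rule.
n_c1, n_c2 = 221, 988
p_c1 = n_c1 / (n_c1 + n_c2)          # 0.183
p_c2 = n_c2 / (n_c1 + n_c2)          # 0.817

lik_c1, lik_c2 = 0.2081, 0.0597      # p(x=1.0/C1), p(x=1.0/C2) from the histograms
post_c1 = lik_c1 * p_c1 / (lik_c1 * p_c1 + lik_c2 * p_c2)
print(round(p_c1, 3), round(p_c2, 3), round(post_c1, 3))   # 0.183 0.817 0.438
```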
Example (subjective approach)

• Use the Bayes’ rule to compute the posterior probabilities:


P(Ci /x) = p(x/Ci) P(Ci) / p(x)

[Figure: assumed Gaussian densities N(μ,Σ)]

• p(x/C1) ~ N(μ1,Σ1)
• p(x/C2) ~ N(μ2,Σ2)
• P(C1) = P(C2) = 0.5
A More General Theory
• Use more than one feature.
• Allow more than two categories.
• Allow actions other than classifying the input to
one of the possible categories (e.g., rejection).
• Employ a more general error function (i.e.,
expected “risk”) by associating a “cost” (based
on a “loss” function) with different errors.
Terminology
• Features form a vector x  R d
• A set of c categories ω1, ω2, …, ωc
• A finite set of l actions α1, α2, …, αl
• A loss function λ(αi / ωj)
– the cost associated with taking action αi when the correct
classification category is ωj

Bayes rule (using vector notation):


P(ωj /x) = p(x/ωj) P(ωj) / p(x)

where p(x) = Σ_{j=1}^{c} p(x/ωj) P(ωj)
Conditional Risk (or Expected Loss)
• Suppose we observe x and take action αi

• The conditional risk (or expected loss) of taking action αi is defined as:
R(αi /x) = Σ_{j=1}^{c} λ(αi /ωj) P(ωj /x)
Overall Risk
• The overall risk is defined as:

R = ∫ R(α(x)/x) p(x) dx

where α(x) is a general decision rule that determines


which action α1, α2, …, αl to take
for every x.

• The optimum decision rule is the Bayes rule


Overall Risk (cont’d)
• The Bayes rule minimizes R by:
(i) Computing R(αi /x) for every αi given an x
(ii) Choosing the action αi with the minimum R(αi /x)

• The resulting minimum R* is called Bayes risk and


is the best performance that can be achieved:

R* = min R
Example: Two-category
classification
• Define
– α1: decide ω1
– α2: decide ω2
– λij = λ(αi /ωj)

• The conditional risks are:


R(αi /x) = Σ_{j=1}^{2} λ(αi /ωj) P(ωj /x), i.e.,

R(α1/x) = λ11 P(ω1/x) + λ12 P(ω2/x)
R(α2/x) = λ21 P(ω1/x) + λ22 P(ω2/x)
Example: Two-category
classification (cont’d)
• Minimum risk decision rule:

Decide ω1 if R(α1/x) < R(α2/x)

or

Decide ω1 if (λ21 − λ11) p(x/ω1) P(ω1) > (λ12 − λ22) p(x/ω2) P(ω2)

or

Decide ω1 if p(x/ω1)/p(x/ω2) > [(λ12 − λ22) P(ω2)] / [(λ21 − λ11) P(ω1)]
(likelihood ratio on the left, threshold on the right)


Special Case:
Zero-One Loss Function
• Assign the same loss to all errors:

λ(αi /ωj) = 0 if i = j, and 1 if i ≠ j    (i, j = 1, …, c)

• The conditional risk corresponding to this loss function:

R(αi /x) = Σ_{j≠i} P(ωj /x) = 1 − P(ωi /x)

Special Case:
Zero-One Loss Function (cont’d)
• The decision rule becomes:

Decide ω1 if P(ω1/x) > P(ω2/x)
or
Decide ω1 if p(x/ω1) P(ω1) > p(x/ω2) P(ω2)
or
Decide ω1 if p(x/ω1)/p(x/ω2) > P(ω2)/P(ω1)
• In this case, the overall risk becomes the average probability of error!
Example
Assuming a general loss:
Decide ω1 if p(x/ω1)/p(x/ω2) > θb; otherwise decide ω2

Assuming zero-one loss:
Decide ω1 if p(x/ω1)/p(x/ω2) > θa; otherwise decide ω2

where θa = P(ω2)/P(ω1)
and  θb = [P(ω2)(λ12 − λ22)] / [P(ω1)(λ21 − λ11)]

assume: λ12 > λ21

[Figure: likelihood ratio p(x/ω1)/p(x/ω2) with thresholds θa and θb and the corresponding decision regions]
Discriminant Functions
• A classifier can be represented through discriminant functions
gi(x), i = 1, . . . , c
• A feature vector x is assigned to class ωi if:
gi(x) > gj(x) for all j ≠ i

[Figure: discriminant-function classifier – the category with the maximum gi(x) is selected]
Discriminants for Bayes Classifier
• Assuming a general loss function:

gi(x)=-R(αi / x)

• Assuming the zero-one loss function:

gi(x)=P(ωi / x)
Discriminants for Bayes Classifier
(cont’d)
• Is the choice of gi unique?
– Replacing gi(x) with f(gi(x)), where f() is monotonically
increasing, does not change the classification results.

p (x / i ) P(i )
g i ( x) 
p ( x)
gi(x)=P(ωi/x)
gi (x)  p(x / i ) P(i )
gi (x)  ln p (x / i )  ln P(i )

we’ll use this


discriminant extensively!
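A quick check that the three equivalent discriminants above produce the same decision (illustrative likelihoods and priors):

```python
# Monotonic transformations of the posterior do not change the decision.
import numpy as np

likelihoods = np.array([0.05, 0.20, 0.10])   # p(x/wi), i = 1..3
priors      = np.array([0.5, 0.2, 0.3])      # P(wi)

g_post = likelihoods * priors / np.sum(likelihoods * priors)   # gi(x) = P(wi/x)
g_prod = likelihoods * priors                                  # gi(x) = p(x/wi) P(wi)
g_log  = np.log(likelihoods) + np.log(priors)                  # gi(x) = ln p(x/wi) + ln P(wi)

print(np.argmax(g_post), np.argmax(g_prod), np.argmax(g_log))  # same class index
```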
Case of two categories
• More common to use a single discriminant function
(dichotomizer) instead of two:
Decide ω1 if g(x) > 0; otherwise decide ω2

• Examples:
g(x) = P(ω1/x) − P(ω2/x)
g(x) = ln [p(x/ω1)/p(x/ω2)] + ln [P(ω1)/P(ω2)]
Decision Regions and Boundaries
• Discriminants divide the feature space into decision regions
R1, R2, …, Rc, separated by decision boundaries.

Decision boundary
is defined by:
g1(x)=g2(x)
Discriminant Function for
Multivariate Gaussian Density

x ~ N(μ,Σ):  p(x) = 1 / [(2π)^(d/2) |Σ|^(1/2)] exp[ −½ (x − μ)^t Σ^(-1) (x − μ) ]

• Consider the following discriminant function:

gi(x) = ln p(x/ωi) + ln P(ωi)

• With p(x/ωi) ~ N(μi,Σi), this becomes:

gi(x) = −½ (x − μi)^t Σi^(-1) (x − μi) − (d/2) ln 2π − ½ ln|Σi| + ln P(ωi)
Multivariate Gaussian Density: Case I

• Σi = σ²I (diagonal covariance matrix)


– Features are statistically independent
– Each feature has the same variance
Multivariate Gaussian Density:
Case I (cont’d)

• The discriminant is linear: gi(x) = wi^t x + wi0, where
wi = μi / σ²
wi0 = −(1/(2σ²)) μi^t μi + ln P(ωi)

• The decision boundary between ωi and ωj is the hyperplane w^t(x − x0) = 0, where
w = μi − μj
x0 = ½(μi + μj) − [σ² / ||μi − μj||²] ln[P(ωi)/P(ωj)] (μi − μj)
Multivariate Gaussian Density:
Case I (cont’d)
• Properties of decision boundary:
– It passes through x0
– It is orthogonal to the line linking the means.
– What happens when P(ωi) = P(ωj)?  (x0 is halfway between the means)
– If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.
– If σ is very small, the position of the boundary is insensitive to P(ωi)
and P(ωj)
Multivariate Gaussian Density:
Case I (cont'd)

If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.

[Figures: decision boundaries for equal and unequal priors P(ωi), P(ωj)]
Multivariate Gaussian Density:
Case I (cont’d)
• Minimum distance classifier
– When P(ωi) are equal, then the discriminant becomes:

gi(x) = −||x − μi||²

– This is the Euclidean distance!

– Assumptions: statistically independent features, same variance!
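A minimal minimum-distance classifier sketch under these assumptions (equal priors, Σi = σ²I; the means are illustrative):

```python
# Minimum Euclidean distance classifier: assign x to the nearest class mean.
import numpy as np

means = np.array([[0.0, 0.0],     # mu_1
                  [3.0, 3.0]])    # mu_2

def classify(x, means):
    # gi(x) = -||x - mu_i||^2  <=>  pick the nearest mean
    dists = np.sum((means - x) ** 2, axis=1)
    return np.argmin(dists)

print(classify(np.array([1.0, 0.5]), means))   # -> 0 (class w1)
```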


Multivariate Gaussian Density: Case II

• Σi = Σ (all classes share the same covariance matrix)
Multivariate Gaussian Density:
Case II (cont'd)

• The discriminant is again linear: gi(x) = wi^t x + wi0, where
wi = Σ^(-1) μi
wi0 = −½ μi^t Σ^(-1) μi + ln P(ωi)

• The decision boundary between ωi and ωj is the hyperplane w^t(x − x0) = 0, where
w = Σ^(-1) (μi − μj)
x0 = ½(μi + μj) − [ ln(P(ωi)/P(ωj)) / ((μi − μj)^t Σ^(-1) (μi − μj)) ] (μi − μj)
Multivariate Gaussian Density:
Case II (cont’d)
• Properties of hyperplane (decision boundary):
– It passes through x0
– It is not orthogonal to the line linking the means.
– What happens when P(ωi) = P(ωj)?  (x0 is halfway between the means)
– If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.
Multivariate Gaussian Density:
Case II (cont'd)

If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.

[Figures: hyperplane decision boundaries for equal and unequal priors]
Multivariate Gaussian Density:
Case II (cont’d)
• Mahalanobis distance classifier
– When the P(ωi) are equal, the discriminant reduces to:
gi(x) = −(x − μi)^t Σ^(-1) (x − μi)
i.e., assign x to the category whose mean is closest in Mahalanobis distance.
Multivariate Gaussian Density: Case III

• Σi = arbitrary

• The discriminant is quadratic: gi(x) = x^t Wi x + wi^t x + wi0, where
Wi = −½ Σi^(-1)
wi = Σi^(-1) μi
wi0 = −½ μi^t Σi^(-1) μi − ½ ln|Σi| + ln P(ωi)

• The decision boundaries are hyperquadrics;
e.g., hyperplanes, pairs of hyperplanes, hyperspheres,
hyperellipsoids, hyperparaboloids, etc.
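A sketch of the general Gaussian discriminant gi(x) = ln p(x/ωi) + ln P(ωi) with arbitrary Σi (Cases I and II are special cases); all parameters are illustrative:

```python
# Gaussian (quadratic) discriminant for arbitrary covariance matrices.
import numpy as np

def gaussian_discriminant(x, mu, Sigma, prior):
    d = len(mu)
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

mus    = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
Sigmas = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
priors = [0.6, 0.4]

x = np.array([1.0, 1.5])
scores = [gaussian_discriminant(x, m, S, p) for m, S, p in zip(mus, Sigmas, priors)]
print(np.argmax(scores))   # index of the chosen category
```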
Multivariate Gaussian Density:
Case III (cont’d)

[Figures: examples of non-linear (hyperquadric) decision boundaries for arbitrary Σi]
Example - Case III

[Figure: decision boundary for a two-category example with P(ω1) = P(ω2); the boundary does not pass through the midpoint of μ1 and μ2]
Error Bounds
• Exact error calculations can be difficult – it is often easier to
estimate error bounds!

P(error) = ∫ min[P(ω1/x), P(ω2/x)] p(x) dx

Using min[a, b] ≤ a^β b^(1−β) (for a, b ≥ 0 and 0 ≤ β ≤ 1):

P(error) ≤ P^β(ω1) P^(1−β)(ω2) ∫ p^β(x/ω1) p^(1−β)(x/ω2) dx
Error Bounds (cont’d)
• If the class-conditional densities are Gaussian, then

∫ p^β(x/ω1) p^(1−β)(x/ω2) dx = e^(−k(β))

where:

k(β) = [β(1−β)/2] (μ2 − μ1)^t [βΣ1 + (1−β)Σ2]^(-1) (μ2 − μ1) + ½ ln[ |βΣ1 + (1−β)Σ2| / (|Σ1|^β |Σ2|^(1−β)) ]
Error Bounds (cont’d)
• The Chernoff bound is obtained by minimizing e^(−k(β)) with respect to β.
– This is a 1-D optimization problem, regardless of the dimensionality
of the class-conditional densities.
Error Bounds (cont’d)
• The Bhattacharyya bound is obtained by setting β = 0.5
– Easier to compute than the Chernoff bound, but looser.

• Note: the Chernoff and Bhattacharyya bounds will not be


good bounds if the densities are not Gaussian.
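A sketch of the Bhattacharyya bound for two Gaussian classes, assuming the k(β) expression above evaluated at β = 0.5 (parameters are illustrative):

```python
# Bhattacharyya bound: P(error) <= sqrt(P(w1)P(w2)) * exp(-k(1/2)).
import numpy as np

def bhattacharyya_bound(mu1, S1, mu2, S2, p1, p2):
    S = 0.5 * (S1 + S2)
    diff = mu2 - mu1
    k = (0.125 * diff @ np.linalg.inv(S) @ diff
         + 0.5 * np.log(np.linalg.det(S)
                        / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))
    return np.sqrt(p1 * p2) * np.exp(-k)

mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 3.0])
S1, S2 = np.eye(2), np.array([[2.0, 0.5], [0.5, 2.0]])
print(bhattacharyya_bound(mu1, S1, mu2, S2, 0.5, 0.5))
```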
Example (cont’d)

Bhattacharyya bound:
k(0.5) = 4.06
P(error) ≤ 0.0087
Receiver Operating
Characteristic (ROC) Curve
• Every classifier typically employs some kind of threshold, e.g.:

θa = P(ω2)/P(ω1)
θb = [P(ω2)(λ12 − λ22)] / [P(ω1)(λ21 − λ11)]
• Changing the threshold can affect the performance of the
classifier.
• ROC curves allow us to evaluate/compare the
performance of a classifier using different thresholds.
Example: Person Authentication
• Authenticate a person using biometrics (e.g., fingerprints).
• There are two possible distributions (i.e., classes):
– Authentic (A) and Impostor (I)

[Figure: score distributions for the Authentic (A) and Impostor (I) classes]
Example: Person Authentication
(cont’d)
• Possible decisions:
– (1) correct acceptance (true positive): X belongs to A, and we decide A
– (2) incorrect acceptance (false positive): X belongs to I, and we decide A
– (3) correct rejection (true negative): X belongs to I, and we decide I
– (4) incorrect rejection (false negative): X belongs to A, and we decide I

[Figure: the A and I distributions with the regions corresponding to correct acceptance, correct rejection, false positive, and false negative]
Error vs Threshold

[Figure: FAR and FRR as a function of the decision threshold x*]

FAR: False Accept Rate (False Positive)
FRR: False Reject Rate (False Negative)

False Negatives vs False Positives (ROC Curve)

[Figure: ROC curve – FRR plotted against FAR as the threshold x* varies]
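A sketch of how FAR and FRR trace out an ROC curve as the threshold x* varies, assuming Gaussian score distributions for A and I (all values are illustrative):

```python
# Sweep a threshold over 1-D scores and record (FAR, FRR) pairs.
import numpy as np

rng = np.random.default_rng(0)
authentic = rng.normal(2.0, 1.0, 5000)   # scores for class A
impostor  = rng.normal(0.0, 1.0, 5000)   # scores for class I

thresholds = np.linspace(-3, 5, 81)      # decide "A" when score > x*
far = [(impostor  > t).mean() for t in thresholds]   # false accept rate
frr = [(authentic <= t).mean() for t in thresholds]  # false reject rate

# the ROC curve is the set of (FAR, FRR) points traced out as x* varies
for t, a, r in zip(thresholds[::20], far[::20], frr[::20]):
    print(f"x*={t:+.1f}  FAR={a:.3f}  FRR={r:.3f}")
```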
Bayes Decision Theory:
Case of Discrete Features

• Replace ∫ p(x/ωj) dx with Σx P(x/ωj)

• See section 2.9


Missing Features
• Suppose x = (x1, x2) is a test vector where x1 is missing and x2 = x̂2; how would we classify it?
– If we set x1 equal to its average value, we will classify x as ω3
– But p(x̂2/ω2) is larger; should we classify x as ω2?

[Figure: example class-conditional densities illustrating the missing-feature problem]
Missing Features (cont’d)
• Suppose x = [xg, xb] (xg: good features, xb: bad/missing features)
• Derive the Bayes rule using the good features only:

P(ωi /xg) = ∫ p(ωi, xg, xb) dxb / p(xg) = ∫ P(ωi /xg, xb) p(xg, xb) dxb / ∫ p(xg, xb) dxb

i.e., marginalize the posterior probability over the bad features.
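A sketch of classifying with a missing feature by marginalizing it out, under the simplifying assumption of independent Gaussian features per class (parameters are illustrative):

```python
# Missing-feature classification: integrate the missing feature out of the likelihood.
import numpy as np

# per-class parameters for features (x1, x2): means, std devs, priors
means  = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 0.0]])
stds   = np.ones((3, 2))
priors = np.array([1/3, 1/3, 1/3])

def gauss(t, m, s):
    return np.exp(-0.5 * ((t - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

x2_hat = 2.1                     # observed good feature; x1 is missing
# With independent features, integrating p(x1, x2_hat/wi) over x1 leaves p(x2_hat/wi)
p_good = gauss(x2_hat, means[:, 1], stds[:, 1])
posterior = p_good * priors / np.sum(p_good * priors)    # P(wi/xg)
print(posterior, np.argmax(posterior))
```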
Compound Bayesian
Decision Theory
• Sequential decision
(1) Decide as each pattern (e.g., fish) emerges.

• Compound decision
(1) Wait for n patterns (e.g., fish) to emerge.
(2) Make all n decisions jointly.
– Could improve performance when consecutive states
of nature are not statistically independent.
Compound Bayesian
Decision Theory (cont’d)
• Suppose X=(x1, x2, …, xn) are n observed
vectors.
• Suppose Ω=(ω(1), ω(2), …, ω(n)) denotes the n
states of nature.
– ω(i) can take one of c values ω1, ω2, …, ωc (i.e., c
categories)
• Suppose P(Ω) is the prior probability of the n
states of nature.
Compound Bayesian
Decision Theory (cont’d)

• Compute the joint posterior of the n states of nature:

P(Ω/X) = p(X/Ω) P(Ω) / p(X)

• Assuming p(X/Ω) = Π_{i=1}^{n} p(xi /ω(i)) is usually acceptable, i.e., consecutive states of nature may
not be statistically independent, but that dependence can still be captured by the prior P(Ω)!
