AIFA 25 Bayesian Logic 120324

The document discusses probability distributions and probabilistic models. It introduces concepts like joint distributions, marginal distributions, conditional probabilities and independence. It also describes Bayesian networks as a way to efficiently represent probabilistic relationships between variables.

AIFA: Reasoning Under Uncertainty
12/03/2024

Koustav Rudra
Probability Distributions
• A probability distribution is a description of how likely a random variable is to take on each of its
possible states
• Notation: 𝑃(𝑋) is the probability distribution over the random variable 𝑋
• Associate a probability with each value

• Unobserved random variables have distributions


• A distribution is a TABLE of probabilities of values
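A distribution-as-table can be sketched directly in code. This is an illustrative example, not from the slides; the variable `P_W` and its values are made up:

```python
# A distribution is a table: one probability per value of the variable.
# Hypothetical weather variable W; values chosen for illustration only.
P_W = {"sun": 0.6, "rain": 0.1, "fog": 0.3}

# Sanity checks: every entry is non-negative and the entries sum to 1.
assert all(p >= 0 for p in P_W.values())
assert abs(sum(P_W.values()) - 1.0) < 1e-9
```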
Axioms of Probability
• The probability of an event 𝐴 in the given sample space 𝒮, denoted as 𝑃(𝐴), must satisfy the following properties:

• Non-negativity
• For any event A ∈ 𝒮, 𝑃(𝐴) ≥ 0

• All possible outcomes


• Probability of the entire sample space is 1, 𝑃(𝒮) = 1

• Additivity of disjoint events


• For all events 𝐴1, 𝐴2 ∈ 𝒮 that are mutually exclusive (𝐴1 ∩ 𝐴2 = ∅), the probability that either event happens is the sum of their individual probabilities: 𝑃(𝐴1 ∨ 𝐴2) = 𝑃(𝐴1) + 𝑃(𝐴2)
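The three axioms can be checked mechanically on a small example. A fair six-sided die is used here purely for illustration (it is not in the slides); `Fraction` avoids floating-point rounding:

```python
from fractions import Fraction

# Checking the axioms on a hypothetical fair six-sided die.
P = {face: Fraction(1, 6) for face in range(1, 7)}

# Non-negativity: P(A) >= 0 for any event A in S
assert all(p >= 0 for p in P.values())

# The entire sample space has probability 1: P(S) = 1
assert sum(P.values()) == 1

# Additivity of disjoint events: P(A1 v A2) = P(A1) + P(A2)
A1, A2 = {1, 2}, {5}  # disjoint events
p_union = sum(P[x] for x in A1 | A2)
assert p_union == sum(P[x] for x in A1) + sum(P[x] for x in A2)
```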
Joint Distributions
• A joint distribution over a set of random variables: 𝑋1, 𝑋2, … , 𝑋𝑛

• Specifies a real number for each assignment (or outcome):


• 𝑃(𝑋1 = 𝑥1, 𝑋2 = 𝑥2, … , 𝑋𝑛 = 𝑥𝑛)
• 𝑃(𝑥1, 𝑥2, … , 𝑥𝑛)

• Must satisfy
• 𝑃(𝑥1, 𝑥2, … , 𝑥𝑛) ≥ 0
• ∑𝑥1,𝑥2,…,𝑥𝑛 𝑃(𝑥1, 𝑥2, … , 𝑥𝑛) = 1

  T     W     P
  Hot   Sun   0.4
  Hot   Rain  0.1
  Cold  Sun   0.2
  Cold  Rain  0.3

• Size of distribution if n variables with domain sizes d? d^n entries
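A minimal sketch of the joint table above as a Python dict keyed by the (T, W) assignment:

```python
# The joint distribution P(T, W) from the table above, keyed by assignment.
P_TW = {
    ("hot",  "sun"):  0.4,
    ("hot",  "rain"): 0.1,
    ("cold", "sun"):  0.2,
    ("cold", "rain"): 0.3,
}

# Must satisfy: non-negative entries that sum to 1.
assert all(p >= 0 for p in P_TW.values())
assert abs(sum(P_TW.values()) - 1.0) < 1e-9

# Size: n variables with domain size d gives d**n rows (here 2**2 = 4).
assert len(P_TW) == 2 ** 2
```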
Probabilistic Models
• A probabilistic model is a joint distribution over a set of random variables

• Probabilistic models:
• Random variables with domains
• Assignments are called outcomes
• Joint distributions: say whether assignments (outcomes) are likely
• Normalized: sum to 1.0
• Ideally: only certain variables directly interact

  T     W     P
  Hot   Sun   0.4
  Hot   Rain  0.1
  Cold  Sun   0.2
  Cold  Rain  0.3
Events
• An event is a set E of outcomes
• 𝑃(𝐸) = ∑(𝑥1,…,𝑥𝑛)∈𝐸 𝑃(𝑥1, … , 𝑥𝑛)

• From a joint distribution, we can calculate the probability of any event
• Probability that it’s hot AND sunny?
• Probability that it’s hot?
• Probability that it’s hot OR sunny?

  T     W     P
  Hot   Sun   0.4
  Hot   Rain  0.1
  Cold  Sun   0.2
  Cold  Rain  0.3
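The three event queries above can be answered by summing the matching rows of the joint table; a small sketch:

```python
# Event probabilities from the joint table above: an event E is a set of
# outcomes, and P(E) sums the joint probabilities of the outcomes in E.
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def p_event(pred):
    """Sum P(t, w) over the outcomes (t, w) satisfying the predicate."""
    return sum(p for (t, w), p in P_TW.items() if pred(t, w))

p_hot_and_sunny = p_event(lambda t, w: t == "hot" and w == "sun")  # 0.4
p_hot = p_event(lambda t, w: t == "hot")                 # 0.4 + 0.1 = 0.5
p_hot_or_sunny = p_event(lambda t, w: t == "hot" or w == "sun")
# 0.4 + 0.1 + 0.2 = 0.7
```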
Marginal Probability Distribution

• Marginal probability distribution is the probability distribution of a single variable

• It is calculated based on the joint probability distribution 𝑃(𝑋,𝑌) using the sum rule:

• 𝑃(𝑋 = 𝑥) = ∑𝑦 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦)
Marginal Distributions
• Marginal distributions are sub-tables which eliminate variables
• Marginalization (summing out): Combine collapsed rows by adding

From the joint table:

  T     W     P
  Hot   Sun   0.4
  Hot   Rain  0.1
  Cold  Sun   0.2
  Cold  Rain  0.3

𝑃(𝑡) = ∑𝑠 𝑃(𝑡, 𝑠), giving 𝑃(𝑇):

  T     P
  Hot   0.5
  Cold  0.5

𝑃(𝑠) = ∑𝑡 𝑃(𝑡, 𝑠), giving 𝑃(𝑊):

  W     P
  Sun   0.6
  Rain  0.4

In general: 𝑃(𝑋1 = 𝑥1) = ∑𝑥2 𝑃(𝑋1 = 𝑥1, 𝑋2 = 𝑥2)
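Summing out can be written as one small helper over the joint table above; a minimal sketch:

```python
from collections import defaultdict

# Marginalization (summing out): collapse one variable of the joint table
# above by adding the rows that agree on the other variable.
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def marginal(joint, axis):
    """Sum the joint entries over every variable except the one at `axis`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[axis]] += p
    return dict(out)

P_T = marginal(P_TW, 0)  # Hot 0.5, Cold 0.5
P_W = marginal(P_TW, 1)  # Sun 0.6, Rain 0.4
```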
Conditional Probabilities
• 𝑃(𝑎|𝑏) = 𝑃(𝑎, 𝑏) / 𝑃(𝑏)

• 𝑃(𝑊 = 𝑠 | 𝑇 = 𝑐) = 𝑃(𝑊 = 𝑠, 𝑇 = 𝑐) / 𝑃(𝑇 = 𝑐)
• 𝑃(𝑊 = 𝑠 | 𝑇 = 𝑐) = 0.2 / 0.5 = 0.4

  T     W     P
  Hot   Sun   0.4
  Hot   Rain  0.1
  Cold  Sun   0.2
  Cold  Rain  0.3
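The worked example above follows directly from the definition; a short sketch:

```python
# P(W = sun | T = cold) from the joint table above, via the definition
# P(a | b) = P(a, b) / P(b).
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

p_cold_sun = P_TW[("cold", "sun")]                            # P(W=s, T=c) = 0.2
p_cold = sum(p for (t, _), p in P_TW.items() if t == "cold")  # P(T=c) = 0.5
p_sun_given_cold = p_cold_sun / p_cold                        # 0.2 / 0.5 = 0.4
```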
Conditional Distributions
• Conditional distributions are probability distributions over some variables given fixed values of others
Normalization Trick

From the joint table:

  T     W     P
  Hot   Sun   0.4
  Hot   Rain  0.1
  Cold  Sun   0.2
  Cold  Rain  0.3

• SELECT the joint probabilities 𝑃(𝑐, 𝑊) matching the evidence 𝑇 = 𝑐:

  T     W     P
  Cold  Sun   0.2
  Cold  Rain  0.3

• Normalize the selection to get 𝑃(𝑊 | 𝑇 = 𝑐):

  W     P
  Sun   0.4
  Rain  0.6
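The two steps above, select then normalize, can be sketched in a few lines:

```python
# Normalization trick on the joint table above: SELECT the rows matching
# the evidence T = cold, then renormalize the selection so it sums to 1.
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

selected = {w: p for (t, w), p in P_TW.items() if t == "cold"}
# selected sums to P(T = cold) = 0.5

z = sum(selected.values())
P_W_given_cold = {w: p / z for w, p in selected.items()}
# P(W | T = cold): sun -> 0.2/0.5 = 0.4, rain -> 0.3/0.5 = 0.6
```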
Product Rule

• Sometimes have conditional distributions but want the joint


• 𝑃(𝑦) 𝑃(𝑥|𝑦) = 𝑃(𝑥, 𝑦)
Bayesian Network
Conditional Independence
• Two events 𝐴 and 𝐵 are conditionally independent given another event 𝐶 with 𝑃(𝐶) > 0 if:
• 𝑃(𝐴 ∧ 𝐵 | 𝐶) = 𝑃(𝐴|𝐶) 𝑃(𝐵|𝐶)

• 𝑃(𝐴 | 𝐵 ∧ 𝐶) = 𝑃(𝐴 ∧ 𝐵 | 𝐶) / 𝑃(𝐵|𝐶)
• 𝑃(𝐴 | 𝐵 ∧ 𝐶) = 𝑃(𝐴|𝐶) 𝑃(𝐵|𝐶) / 𝑃(𝐵|𝐶)
• 𝑃(𝐴 | 𝐵 ∧ 𝐶) = 𝑃(𝐴|𝐶)
Conditional Independence and Chain Rule
Trivial decomposition:
P(Traffic, Rain, Umbrella)=
P(Rain)P(Traffic|Rain)P(Umbrella|Rain,Traffic)

With assumption of conditional independence:


P(Traffic, Rain, Umbrella)=
P(Rain)P(Traffic|Rain)P(Umbrella|Rain)

Bayes nets / graphical models help us express conditional independence assumptions


Conditional Independence
• 𝑃(𝑥1, … , 𝑥𝑛) = 𝑃(𝑥𝑛 | 𝑥𝑛−1, … , 𝑥1) 𝑃(𝑥𝑛−1, … , 𝑥1)
• 𝑃(𝑥1, … , 𝑥𝑛) = 𝑃(𝑥𝑛 | 𝑥𝑛−1, … , 𝑥1) 𝑃(𝑥𝑛−1 | 𝑥𝑛−2, … , 𝑥1) … 𝑃(𝑥2 | 𝑥1) 𝑃(𝑥1)
• 𝑃(𝑥1, … , 𝑥𝑛) = ∏𝑖=1..𝑛 𝑃(𝑥𝑖 | 𝑥𝑖−1, … , 𝑥1)

• The belief network represents conditional independence:
• 𝑃(𝑥𝑖 | 𝑥𝑖−1, … , 𝑥1) = 𝑃(𝑥𝑖 | 𝑃𝑎𝑟𝑒𝑛𝑡𝑠(𝑥𝑖))
Bayes Nets
• A Bayes’ net is an efficient encoding of a probabilistic model of a domain

• Questions we can ask:


• Inference: given a fixed BN, what is P(X |e)?
• Representation: given a BN graph, what kinds of distributions can it encode?
• Modeling: what BN is most appropriate for a given domain?
Bayes Net Semantics

• A set of nodes, one per variable X


• A directed, acyclic graph
• A conditional distribution for each node
• A collection of distributions over X, one for each combination of parents’ values
• 𝑃(𝑋 | 𝑎1, 𝑎2, … , 𝑎𝑛)
• CPT: conditional probability table
• Description of a noisy “causal” process

A Bayes net = Topology (graph) + Local Conditional Probabilities


Probabilities in BN
• Bayes’ nets implicitly encode joint distributions
• As a product of local conditional distributions
• 𝑃(𝑥𝑖 | 𝑥𝑖−1, 𝑥𝑖−2, … , 𝑥1) = 𝑃(𝑥𝑖 | 𝑃𝑎𝑟𝑒𝑛𝑡𝑠(𝑥𝑖))

• 𝑃𝑎𝑟𝑒𝑛𝑡𝑠(𝑥𝑖): the minimal set of predecessors of 𝑋𝑖 in the total ordering such that the other predecessors are conditionally independent of 𝑋𝑖 given 𝑃𝑎𝑟𝑒𝑛𝑡𝑠(𝑋𝑖)
Bayesian Network
• What is the issue with joint probability distribution?
• Become intractably large as the number of variables grows
• Specifying probabilities for atomic events is really difficult

• How does Bayesian Network help?


• Exploit independence and conditional independence relationships among variables
• To greatly reduce number of probabilities to be specified to define full joint distribution
Bayesian Network
• A set of random variables makes up the nodes of the network
• Variables may be discrete or continuous

• A set of directed links or arrows connects pairs of nodes


• Arrows represent probabilistic dependence among variables

• An arrow from X →Y indicates X is parent of Y

• Each node 𝑋𝑖 has a conditional probability distribution 𝑃(𝑋𝑖 | 𝑃𝑎𝑟𝑒𝑛𝑡𝑠(𝑋𝑖))


• Quantifies the effect of the parents on the node

• The graph has no directed cycles (DAG)


Example1: Traffic
Example2: Traffic
Causality?
• When Bayes’ nets reflect the true causal patterns:
• Often simpler (nodes have fewer parents)
• Often easier to think about and to elicit from experts

• BNs need not actually be causal


• Sometimes no causal net exists over the domain (especially if variables are missing)
• End up with arrows that reflect correlation, not causation

• What do the arrows really mean?


• Topology may happen to encode causal structure
• Topology really encodes conditional independence
Example3: Home Alarm Network
• Burglar alarm at home
• Fairly reliable at detecting a burglary
• Responds at times to minor earthquakes

• Two neighbors call the police on hearing the alarm


• John always calls when he hears the alarm
• But sometimes confuses the telephone ringing with the alarm and calls then too

• Mary likes loud music


• But sometimes misses the alarm altogether
Belief Network
Burglary:  P(B) = 0.001
Earthquake:  P(E) = 0.002

Alarm:
  B  E  P(A)
  t  t  0.95
  t  f  0.94
  f  t  0.29
  f  f  0.001

JohnCalls:      MaryCalls:
  A  P(J)         A  P(M)
  t  0.90         t  0.70
  f  0.05         f  0.01
Joint probability distribution
• 𝑃(𝑥1, … , 𝑥𝑛) = ∏𝑖=1..𝑛 𝑃(𝑥𝑖 | 𝑃𝑎𝑟𝑒𝑛𝑡𝑠(𝑥𝑖))

• 𝑃(𝐽 ∧ 𝑀 ∧ 𝐴 ∧ ¬𝐵 ∧ ¬𝐸)
• = 𝑃(𝐽|𝐴) × 𝑃(𝑀|𝐴) × 𝑃(𝐴 | ¬𝐵 ∧ ¬𝐸) × 𝑃(¬𝐵) × 𝑃(¬𝐸)
• = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.000628

• 𝑃(𝐽) = ?
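Both queries can be answered by brute-force enumeration of the full joint, using the CPTs from the alarm network above; a minimal sketch:

```python
from itertools import product

# CPTs of the home-alarm network from the slides.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(J | A)
P_M = {True: 0.70, False: 0.01}   # P(M | A)

def joint(j, m, a, b, e):
    """P(J,M,A,B,E) = P(J|A) P(M|A) P(A|B,E) P(B) P(E)."""
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    return pj * pm * pa * pb * pe

# The slide's query: P(J ^ M ^ A ^ ~B ^ ~E) ~ 0.000628
p_query = joint(True, True, True, False, False)

# P(J): marginalize the full joint over the other four variables.
p_j = sum(joint(True, m, a, b, e)
          for m, a, b, e in product([True, False], repeat=4))  # ~ 0.0521
```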
Conditional Independence
• 𝑃 𝐽, 𝑀, 𝐴, 𝐵, 𝐸 = 𝑃 𝐽 𝑀, 𝐴, 𝐵, 𝐸 𝑃(𝑀, 𝐴, 𝐵, 𝐸)

• 𝑃 𝐽, 𝑀, 𝐴, 𝐵, 𝐸 = 𝑃 𝐽 𝐴 𝑃 𝑀 𝐴, 𝐵, 𝐸 𝑃(𝐴, 𝐵, 𝐸)

• 𝑃 𝐽, 𝑀, 𝐴, 𝐵, 𝐸 = 𝑃 𝐽 𝐴 𝑃 𝑀 𝐴 𝑃 𝐴 𝐵, 𝐸 𝑃(𝐵, 𝐸)

• 𝑃 𝐽, 𝑀, 𝐴, 𝐵, 𝐸 = 𝑃 𝐽 𝐴 𝑃 𝑀 𝐴 𝑃 𝐴 𝐵, 𝐸 𝑃(𝐵)𝑃(𝐸)

How does ordering matter?


Conditional Independence
• Earthquake, Burglary, Alarm, JohnCalls, MaryCalls
• P(E|B,A,J,M)
MaryCalls JohnCalls

May have to define the conditional probability of a confusing set of events

Alarm

Burglary Earthquake
Conditional Independence
• Alarm, Burglary, Earthquake, JohnCalls, MaryCalls

MaryCalls JohnCalls

May have to construct large probability tables

Earthquake

Burglary Alarm
Incremental Network Construction
• Choose the set of relevant variables 𝑋; , that describe the domain

• Choose an ordering for the variables [important step]

• While there are variables left:


• Pick a variable X and add a node for it
• Set Parents(X) to some minimal set of existing nodes such that the conditional independence
property is satisfied
• Define conditional probability table for X

Why do we construct Bayes Network?

To answer queries related to joint probability distribution


Size of a Bayes Net
• How big is a joint distribution over N Boolean variables?
• 2^N

• How big is an N-node net if nodes have up to k parents?
• 𝑂(𝑁 × 2^k)

• Both give you the power to calculate 𝑃(𝑥1, … , 𝑥𝑛)
• BNs: Huge space savings!
• Also easier to elicit local CPTs
• Also faster to answer queries (coming)
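The space savings above are concrete; plugging illustrative numbers (N and k chosen here for the example, not from the slides) into the two formulas:

```python
# Table sizes from the formulas above: a full joint over N Boolean variables
# needs 2**N rows, while a Bayes net whose nodes have at most k parents
# needs about N * 2**k CPT rows.
N, k = 30, 3
joint_rows = 2 ** N       # 1,073,741,824 rows
bn_rows = N * 2 ** k      # 240 rows
savings = joint_rows / bn_rows  # several-million-fold reduction
```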
Thank You
