AI: A Modern Approach, Chapter 15: Probabilistic Reasoning Systems
- Representing Knowledge in an Uncertain Domain
- A belief network encodes the meaningful dependencies between
variables.
- Nodes represent random variables
- Arcs represent direct influence
- Each node has a conditional probability table (CPT) that gives that variable's
probability for each combination of states of its parents
- The graph is a Directed Acyclic Graph (or DAG)
- The Semantics of Belief Networks
- To construct a network, think of it as representing the joint probability
distribution.
- To infer from a network, think of it as encoding conditional independence
statements.
- Calculate an entry of the joint distribution by multiplying individual
conditional probabilities:
- P(X1=x1, ..., Xn=xn) = P(X1=x1|parents(X1)) * ... * P(Xn=xn|parents(Xn))
- Note: Xi only has to be conditioned on its immediate parents, not on all
preceding nodes:
- P(Xi|X(i-1),...,X1) = P(Xi|parents(Xi))
- To incrementally construct a network:
- Decide on the variables
- Decide on an ordering of them
- Do until no variables are left:
- Pick a variable and make a node for it
- Set its parents to the minimal set of pre-existing nodes that directly
influence it
- Define its conditional probability table
- Often, the resulting conditional probability tables are much smaller than
the exponential size of the full joint
- If the nodes are not ordered with "root causes" first, the conditional
probability tables get larger
- Different orderings give different networks (and tables) that may encode the
same probabilities.
- Some canonical distributions that appear in conditional probability
tables:
- deterministic logical relationship (e.g. AND, OR)
- deterministic numeric relationship (e.g. MIN)
- parametric relationship (e.g. weighted sum in neural net)
- noisy logical relationship (e.g. noisy-OR, noisy-MAX)
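As a rough illustration of a canonical distribution, the sketch below expands a noisy-OR node into its full conditional probability table from one inhibition probability per parent. The function name `noisy_or_cpt` and the three example probabilities are assumptions made for illustration, not values from these notes, and no leak term is modeled.

```python
from itertools import product

def noisy_or_cpt(inhibit):
    """Return {parent_assignment: P(effect=True | parents)} for a noisy-OR node.

    inhibit[i] is the (assumed) probability that parent i, when true,
    fails to cause the effect. No leak term is modeled.
    """
    cpt = {}
    for assignment in product([False, True], repeat=len(inhibit)):
        # The effect is absent only if every active parent is inhibited.
        p_all_inhibited = 1.0
        for is_on, q in zip(assignment, inhibit):
            if is_on:
                p_all_inhibited *= q
        cpt[assignment] = 1.0 - p_all_inhibited
    return cpt

# Three hypothetical causes: 3 parameters generate all 2^3 = 8 table rows.
cpt = noisy_or_cpt([0.6, 0.2, 0.1])
```

The point of the sketch is that the table grows exponentially in the number of parents while the number of parameters grows only linearly.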
- Direction-dependent separation or D-separation:
- If every undirected path between 2 nodes is blocked given evidence
node(s) E, then the 2 nodes are d-separated by E and are independent given E.
- Evidence node(s) E d-separate X and Y if every undirected path between
them contains a node Z for which one of the following holds:
- Z is in E and has an arrow in on the path leading from X and an arrow out on
the path leading to Y (or vice versa)
- Z is in E and has arrows out leading to both X and Y
- Z is NOT in E (nor is any of Z's descendants) and has arrows in from both
X and Y
- Inference in Belief Networks
- Want to compute posterior probabilities of query variables given
evidence variables.
- Types of inference for belief networks:
- Diagnostic inference: symptoms to causes
- Causal inference: causes to symptoms
- Intercausal inference: between causes of a common effect
- Mixed inference: mixes those above
- Inference in Multiply Connected Belief Networks
- A multiply connected graph has at least one pair of nodes connected by
more than one path
- Techniques for handling:
- Clustering: Group some of the intermediate nodes into one
meganode.
- Pro: Perhaps best way to get exact evaluation.
- Con: Conditional probability tables may exponentially increase
in size.
- Cutset conditioning: Obtain simpler polytrees by instantiating
variables as constants.
- Con: May obtain an exponential number of simpler polytrees.
- Pro: It may be safe to ignore trees with low probability
(bounded cutset conditioning).
- Stochastic simulation: run through the net with randomly chosen
values for each node (weighted by prior probabilities).
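A minimal sketch of stochastic simulation, assuming a hypothetical two-node network Rain -> Wet whose numbers are invented for illustration (they do not come from these notes). Each run samples the nodes in topological order; runs that disagree with the evidence are discarded (rejection sampling), and the posterior is estimated from the remaining frequencies.

```python
import random

# Hypothetical network Rain -> Wet; all probabilities are illustrative assumptions.
P_RAIN = 0.3
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.2}

def sample_once(rng):
    """Sample each node in topological order, given its already-sampled parent."""
    rain = rng.random() < P_RAIN
    wet = rng.random() < P_WET_GIVEN_RAIN[rain]
    return rain, wet

def estimate_rain_given_wet(n_samples, seed=0):
    """Estimate P(Rain | Wet) by keeping only the samples where Wet is true."""
    rng = random.Random(seed)
    kept = 0
    rain_count = 0
    for _ in range(n_samples):
        rain, wet = sample_once(rng)
        if wet:
            kept += 1
            rain_count += rain
    return rain_count / kept

estimate = estimate_rain_given_wet(50_000)
# Exact answer by Bayes' rule, for comparison.
exact = (0.3 * 0.9) / (0.3 * 0.9 + 0.7 * 0.2)
```

The estimate should approach the exact value as the number of samples grows; rejection sampling becomes wasteful when the evidence is unlikely, since most runs are thrown away.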
The probability of any atomic event (its joint probability) can be
obtained from the network.
By definition,
P(x_1, ... , x_n) = PRODUCT_{i=1}^n P(x_i | Parents(X_i))
For example,
P( J AND M AND A AND (NOT B) AND (NOT E) )
= P(J|A) P(M|A) P(A|(NOT B) AND (NOT E)) P(NOT B) P(NOT E)
= 0.90 * 0.70 * 0.001 * 0.999 * 0.998 = 0.000628
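The product above can be checked with a short sketch. The values 0.90, 0.70, 0.001, 0.999 and 0.998 are taken from the example; the remaining table entries (0.95, 0.94, 0.29, 0.05, 0.01) are assumed here from the usual textbook version of the burglary network and do not appear in these notes. The helper names `prob` and `joint` are illustrative.

```python
# Burglary network: B and E are roots, A depends on (B, E), J and M depend on A.
P_B = 0.001                                   # P(B=true)
P_E = 0.002                                   # P(E=true)
P_A = {(True, True): 0.95, (True, False): 0.94,   # assumed entries
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}               # P(J=true | A); 0.05 is assumed
P_M = {True: 0.70, False: 0.01}               # P(M=true | A); 0.01 is assumed

def prob(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(var=true)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """One entry of the joint: the product of each node's CPT entry given its parents."""
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a)
            * prob(P_J[a], j) * prob(P_M[a], m))

# P(J AND M AND A AND (NOT B) AND (NOT E))
p = joint(b=False, e=False, a=True, j=True, m=True)
```

Under these assumptions `p` evaluates to roughly 0.000628, and summing `joint` over all 32 assignments gives 1.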
We can show that this definition of the joint probability means that
the belief network correctly represents the domain only if
each node is conditionally independent of its predecessors in the
node ordering, given its parents.
First note that
P(x_1, ..., x_n) = P(x_n | x_{n-1}, ..., x_1) P(x_{n-1}, ..., x_1).
Applying this repeatedly gives the chain rule:
P(x_1, ..., x_n) = PRODUCT_{i=1}^n P(x_i | x_{i-1}, ..., x_1)
Comparing this with the definition of the joint, it means that
P(X_i | X_{i-1}, ..., X_1) = P(X_i | Parents(X_i))
provided that Parents(X_i) is a subset of {X_{i-1}, ..., X_1}.
We can label the nodes in the graph in any order consistent with the
partial order implicit in the graph structure.
The parents of node X_i should contain all of those nodes in
X_1, ..., X_{i-1} that directly influence X_i.
To construct a network, do the following:
- Choose the set of relevant variables X_i that describe the
domain.
- Choose an ordering for the variables.
- While there are variables left:
- Pick a variable and add a node to the network for it.
- Set Parents(X_i) to some minimal set of nodes already in the
net such that the conditional independence property above is satisfied.
- Define the conditional probability table for X_i.
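The bookkeeping side of this procedure can be sketched as a small container that only accepts parents already in the network. The class `BeliefNetwork` and its `add_node` method are hypothetical names for illustration; choosing the minimal parent set still requires domain knowledge and is not automated here. The example probabilities other than 0.90, 0.70 and 0.001 are assumed from the usual textbook burglary network.

```python
class BeliefNetwork:
    """Minimal container that only allows parents that are already in the net."""

    def __init__(self):
        self.parents = {}   # node name -> tuple of parent names
        self.cpt = {}       # node name -> {parent values: P(node=True | parents)}

    def add_node(self, name, parents, cpt):
        if name in self.parents:
            raise ValueError(f"{name} is already in the network")
        missing = [p for p in parents if p not in self.parents]
        if missing:
            raise ValueError(f"parents not yet in the network: {missing}")
        # Boolean nodes: one CPT row per combination of parent values.
        expected_rows = 2 ** len(parents)
        if len(cpt) != expected_rows:
            raise ValueError(f"CPT needs {expected_rows} rows, got {len(cpt)}")
        self.parents[name] = tuple(parents)
        self.cpt[name] = dict(cpt)

# Root causes first, then the variables they influence.
net = BeliefNetwork()
net.add_node("Burglary", [], {(): 0.001})
net.add_node("Earthquake", [], {(): 0.002})
net.add_node("Alarm", ["Burglary", "Earthquake"],
             {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.29, (False, False): 0.001})
net.add_node("JohnCalls", ["Alarm"], {(True,): 0.90, (False,): 0.05})
net.add_node("MaryCalls", ["Alarm"], {(True,): 0.70, (False,): 0.01})
```

Because a node can only point back to nodes added before it, a graph built this way cannot contain a directed cycle.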
The network is acyclic because each node is connected only to earlier nodes.
It is consistent with the rules of probability because there are no redundant
probability values.
It is compact because many domains are locally structured.
If each Boolean random variable is directly influenced by at most k others,
each node's CPT needs at most 2^k numbers, so the network needs at most
n * 2^k numbers in total, instead of 2^n for the full joint.
If n=20 and k=5, we need 640 numbers instead of over 1 million.
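The arithmetic can be reproduced in a couple of lines; the function names `cpt_numbers` and `joint_numbers` are made up for this sketch, and Boolean variables are assumed throughout.

```python
def cpt_numbers(n, k):
    """Upper bound on CPT entries for n Boolean nodes with at most k parents each."""
    return n * 2 ** k

def joint_numbers(n):
    """Number of entries in the full joint table over n Boolean variables."""
    return 2 ** n

print(cpt_numbers(20, 5), joint_numbers(20))  # prints: 640 1048576
```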
The correct order to add the nodes is "root causes" first, then
the variables they influence until we reach the "leaves", which have
no direct causal influence on the other variables.
If we don't, the network will have
- More links
- Less natural probability judgements to specify
People tend to prefer to give probability judgements for causal rules
rather than diagnostic ones.
We can read off which nodes are conditionally independent given
a set of evidence nodes.
If every undirected path from a node in X to a node in Y is d-separated
by E, then X and Y are conditionally independent given E.
A set of nodes E d-separates two sets of nodes X and Y if every undirected
path from a node in X to a node in Y is blocked given E.
A path is blocked if there is a node Z on the path such that one of the
following conditions holds:
- Z is in E and has one arrow in and one arrow out on the path.
- Z is in E and has two arrows out on the path.
- Neither Z nor any descendant of Z is in E, and Z has two arrows in on the path.
- For example, in the car network, whether there is GAS in the car and whether
the RADIO plays are independent given evidence about whether the SPARKPLUGS fire.
- GAS and RADIO are independent if it is known that the BATTERY works.
- GAS and RADIO are independent given no evidence at all.
- GAS and RADIO are dependent given evidence about whether the car
STARTS.
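The four statements above can be checked mechanically with a small path-based d-separation test. This is only a sketch: the edge list `CAR_EDGES` is an assumed structure for the car network (BATTERY -> RADIO, BATTERY -> SPARKPLUGS, SPARKPLUGS -> STARTS, GAS -> STARTS), since the graph itself is not given in these notes, and enumerating every simple path is only practical for small graphs.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found

def undirected_paths(edges, start, goal, path=None):
    """Yield every simple path from start to goal, ignoring arrow direction."""
    path = path or [start]
    if start == goal:
        yield path
        return
    for a, b in edges:
        for here, there in ((a, b), (b, a)):
            if here == start and there not in path:
                yield from undirected_paths(edges, there, goal, path + [there])

def is_blocked(path, edges, evidence):
    """Apply the three blocking conditions to each interior node Z of the path."""
    for prev, z, nxt in zip(path, path[1:], path[2:]):
        two_in = (prev, z) in edges and (nxt, z) in edges
        if two_in:
            # Condition 3: neither Z nor any descendant of Z is in E.
            if z not in evidence and not (descendants(edges, z) & evidence):
                return True
        elif z in evidence:
            # Conditions 1 and 2: Z is in E with in/out or out/out arrows.
            return True
    return False

def d_separated(edges, x, y, evidence):
    """True if every undirected path between x and y is blocked given evidence."""
    evidence = set(evidence)
    return all(is_blocked(p, edges, evidence)
               for p in undirected_paths(edges, x, y))

# Assumed structure of the car network (not given explicitly in the notes).
CAR_EDGES = {("BATTERY", "RADIO"), ("BATTERY", "SPARKPLUGS"),
             ("SPARKPLUGS", "STARTS"), ("GAS", "STARTS")}

results = {
    "SPARKPLUGS": d_separated(CAR_EDGES, "GAS", "RADIO", {"SPARKPLUGS"}),
    "BATTERY": d_separated(CAR_EDGES, "GAS", "RADIO", {"BATTERY"}),
    "none": d_separated(CAR_EDGES, "GAS", "RADIO", set()),
    "STARTS": d_separated(CAR_EDGES, "GAS", "RADIO", {"STARTS"}),
}
```

With this assumed graph, the first three entries of `results` are True (independent) and the last is False (dependent), matching the list above.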
- Diagnostic inferences--effects to causes.
Given that JohnCalls, infer that P(Burglary|JohnCalls) = 0.016.
- Causal--causes to effects
P(JohnCalls | Burglary) = 0.86
- Intercausal--between causes of a common effect.
P(Burglary | Alarm AND Earthquake) = 0.003
- Mixed
P(Alarm|JohnCalls AND (NOT Earthquake))
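All four kinds of query can be answered, for a network this small, by summing entries of the joint distribution. The sketch below does that by brute-force enumeration; it is not an efficient inference algorithm. The CPT entries 0.95, 0.94, 0.29, 0.05 and 0.01 are assumed from the usual textbook burglary network rather than taken from these notes, and the names `probability` and `query` are illustrative.

```python
from itertools import product

# Burglary network CPTs (several entries are assumptions, see the text above).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                     # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                     # P(M=true | A)
NAMES = ("B", "E", "A", "J", "M")

def pr(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(var=true)."""
    return p_true if value else 1.0 - p_true

def joint(w):
    """Joint probability of one complete assignment w (a dict over NAMES)."""
    return (pr(P_B, w["B"]) * pr(P_E, w["E"])
            * pr(P_A[(w["B"], w["E"])], w["A"])
            * pr(P_J[w["A"]], w["J"]) * pr(P_M[w["A"]], w["M"]))

def probability(event):
    """Sum the joint over all complete assignments consistent with `event`."""
    total = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if all(world[name] == value for name, value in event.items()):
            total += joint(world)
    return total

def query(target, evidence):
    """P(target | evidence) as a ratio of two sums over the joint."""
    return probability({**target, **evidence}) / probability(evidence)

diagnostic = query({"B": True}, {"J": True})             # P(Burglary | JohnCalls)
causal = query({"J": True}, {"B": True})                 # P(JohnCalls | Burglary)
intercausal = query({"B": True}, {"A": True, "E": True}) # P(Burglary | Alarm, Earthquake)
mixed = query({"A": True}, {"J": True, "E": False})      # P(Alarm | JohnCalls, NOT Earthquake)
```

Under these assumed tables the diagnostic and intercausal queries come out at about 0.016 and 0.003, in line with the figures above. The causal query evaluates to about 0.85, slightly below the 0.86 quoted above, which may reflect rounding or different table values in the source. The mixed query evaluates to about 0.034.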