AI: A Modern Approach, Chapter 15: Probabilistic Reasoning Systems

  1. Representing Knowledge in an Uncertain Domain

  2. The Semantics of Belief Networks

  3. Inference in Belief Networks

  4. Inference in Multiply Connected Belief Networks

The probability of any atomic event (its joint probability) can be computed from the network.

By definition,

	P(x_1, ... , x_n) = PRODUCT_{i=1}^n P(x_i | Parents(X_i))

For example,

	P( J AND M AND A AND (NOT B) AND (NOT E) )
		= P(J|A) P(M|A) P(A|(NOT B) AND (NOT E)) P(NOT B) P(NOT E)
		= 0.90 * 0.70 * 0.001 * 0.999 * 0.998 = 0.000628 (approx.)
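The product above can be checked with a few lines of code. This is a minimal sketch, not a general implementation: the single-letter names and the `P_TRUE` table are illustrative, the graph structure is the one implied by the factors in the example, and the table holds only the five entries the example uses (with P(B) and P(E) taken as 1 - 0.999 and 1 - 0.998).

```python
# Parents of each node, as implied by the factors in the example.
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}

# P(X = true | parent values). Only the rows the example needs are listed.
P_TRUE = {
    ("B", ()): 0.001,              # so P(NOT B) = 0.999
    ("E", ()): 0.002,              # so P(NOT E) = 0.998
    ("A", (False, False)): 0.001,  # P(A | NOT B, NOT E)
    ("J", (True,)): 0.90,          # P(J | A)
    ("M", (True,)): 0.70,          # P(M | A)
}

def joint(assignment):
    """Multiply P(x_i | Parents(X_i)) over all nodes for one atomic event."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = P_TRUE[(var, tuple(assignment[q] for q in parents))]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

event = {"J": True, "M": True, "A": True, "B": False, "E": False}
print(joint(event))
```

The printed value is roughly 0.000628. Asking for an event whose CPT row is not listed raises a `KeyError`, which is intentional in this sketch.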

We can show that this definition of the joint probability means that the belief network correctly represents the domain only if each node is conditionally independent of its predecessors in the node ordering, given its parents.

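First note that, by the product rule,

	P(x_1, ..., x_n) = P(x_n | x_{n-1}, ..., x_1) P(x_{n-1}, ..., x_1).

Applying the same step repeatedly to the last factor gives the chain rule:

	P(x_1, ..., x_n) = PRODUCT_{i=1}^n P(x_i | x_{i-1}, ..., x_1)

Comparing this with the definition of the joint above, the two agree exactly when, for every i,

	P(X_i | X_{i-1}, ..., X_1) = P(X_i | Parents(X_i))

provided that Parents(X_i) is a subset of {X_{i-1}, ..., X_1}.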

We can label the nodes in the graph in any order consistent with the partial order implicit in the graph structure.

The parents of node X_i should contain all of those nodes in X_1, ..., X_{i-1} that directly influence X_i.

To construct a network, do the following:

  1. Choose the set of relevant variables X_i that describe the domain.
  2. Choose an ordering for the variables.
  3. While there are variables left:
    1. Pick the next variable X_i in the ordering and add a node to the network for it.
    2. Set Parents(X_i) to some minimal set of nodes already in the net such that the conditional independence property above is satisfied.
    3. Define the conditional probability table for X_i.
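The parent-selection step of this procedure can be sketched as follows. The function `build_network` and the oracle `toy_oracle` are hypothetical names introduced for illustration. The sketch assumes someone can answer conditional independence questions about the domain; here that answer comes from a toy oracle that already knows the burglary network and is only valid for orderings consistent with it. Defining the CPTs (step 3) is left out.

```python
def build_network(ordering, is_cond_independent):
    """Return {variable: parents} by greedily shrinking each parent set.

    is_cond_independent(x, rest, given) is a hypothetical oracle: it should
    return True when x is independent of the nodes in `rest` given `given`.
    """
    parents = {}
    for i, x in enumerate(ordering):
        predecessors = ordering[:i]
        chosen = list(predecessors)  # start with every earlier node
        for candidate in predecessors:
            trial = [p for p in chosen if p != candidate]
            rest = [p for p in predecessors if p not in trial]
            # Drop the candidate if x is still independent of all
            # non-parent predecessors given the smaller parent set.
            if is_cond_independent(x, rest, trial):
                chosen = trial
        parents[x] = chosen
    return parents

# Toy oracle (assumption): it knows the "true" direct influences.
TRUE_PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

def toy_oracle(x, rest, given):
    return set(TRUE_PARENTS[x]) <= set(given)

print(build_network(["B", "E", "A", "J", "M"], toy_oracle))
```

The greedy loop yields a minimal parent set, in the sense that no single remaining parent can be dropped. It does not search for the smallest possible set.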

The resulting network is acyclic because each node is connected only to earlier nodes.

It is consistent with the rules of probability because it contains no redundant probability values.

It is compact because many domains are locally structured.

If each Boolean random variable is directly influenced by at most k others, each node's CPT needs at most 2^k numbers. The whole network then needs at most n * 2^k numbers, instead of 2^n for the full joint.

If n=20 and k=5, we need 640 numbers instead of over 1 million.
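The arithmetic is easy to reproduce. The helper names below are illustrative, and the count assumes Boolean variables with at most k parents each.

```python
def network_numbers(n, k):
    """Upper bound on CPT entries: n nodes, each with at most 2**k rows."""
    return n * 2 ** k

def joint_numbers(n):
    """Entries in the full joint distribution over n Boolean variables."""
    return 2 ** n

print(network_numbers(20, 5))  # 640
print(joint_numbers(20))       # 1048576
```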

The correct order to add the nodes is "root causes" first, then the variables they influence until we reach the "leaves", which have no direct causal influence on the other variables.

If we do not add the nodes in this order, the network will

  1. Have more links
  2. Require less natural probability judgements

People tend to prefer to give probability judgements for causal rules rather than diagnostic ones.

We can read off which nodes are conditionally independent given a set of evidence nodes.

If every undirected path from a node in X to a node in Y is d-separated by E, then X and Y are conditionally independent given E.

A set of nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E.

A path is blocked if there is a node Z on the path such that one of the following conditions holds:

  1. Z is in E and Z has one incoming and one outgoing arrow on the path.
  2. Z is in E and Z has two outgoing arrows on the path.
  3. Neither Z nor any descendant of Z is in E, and Z has two incoming arrows on the path.

For example, in a network describing a car's electrical system and engine:

  1. Whether there is GAS in the car and whether the RADIO plays are independent given evidence about whether the SPARKPLUGS fire.
  2. GAS and RADIO are independent if it is known that the BATTERY works.
  3. GAS and RADIO are independent given no evidence at all.
  4. GAS and RADIO are dependent given evidence about whether the car STARTS.
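The four statements above can be checked mechanically with the blocking rules. The sketch below enumerates undirected paths by brute force, which is fine for a toy graph but not efficient in general. The edge list `CAR_EDGES` is an assumption about the car network's structure, since the graph itself is not given in the text, and all function names are illustrative.

```python
# Assumed structure (not given in the text):
# Battery -> Radio, Battery -> SparkPlugs, SparkPlugs -> Starts,
# Gas -> Starts, Starts -> Moves. Each pair is (parent, child).
CAR_EDGES = {
    ("Battery", "Radio"), ("Battery", "SparkPlugs"),
    ("SparkPlugs", "Starts"), ("Gas", "Starts"), ("Starts", "Moves"),
}

def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found

def undirected_paths(edges, start, goal, path=None):
    """Yield every simple path from start to goal, ignoring arrow direction."""
    path = [start] if path is None else path
    if path[-1] == goal:
        yield path
        return
    for a, b in edges:
        for here, there in ((a, b), (b, a)):
            if here == path[-1] and there not in path:
                yield from undirected_paths(edges, start, goal, path + [there])

def path_blocked(edges, path, evidence):
    """Apply the three blocking conditions to each interior node Z."""
    for prev, z, nxt in zip(path, path[1:], path[2:]):
        two_in = (prev, z) in edges and (nxt, z) in edges
        if two_in:
            # Condition 3: neither Z nor any descendant of Z is in E.
            if z not in evidence and not (descendants(edges, z) & evidence):
                return True
        elif z in evidence:
            # Conditions 1 and 2: one-in/one-out or two-out node in E.
            return True
    return False

def d_separated(edges, xs, ys, evidence):
    evidence = set(evidence)
    return all(
        path_blocked(edges, p, evidence)
        for x in xs for y in ys
        for p in undirected_paths(edges, x, y)
    )

for e in [{"SparkPlugs"}, {"Battery"}, set(), {"Starts"}]:
    print(sorted(e), d_separated(CAR_EDGES, {"Gas"}, {"Radio"}, e))
```

Under the assumed structure, the first three evidence sets d-separate Gas from Radio and the last one does not, matching the list. Evidence about Moves, a descendant of Starts, also fails to separate them.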

Belief networks support four kinds of inference:

  1. Diagnostic inferences--from effects to causes.
    	Given JohnCalls, P(Burglary | JohnCalls) = 0.016

  2. Causal inferences--from causes to effects.
    	P(JohnCalls | Burglary) = 0.86

  3. Intercausal inferences--between causes of a common effect.
    	P(Burglary | Alarm AND Earthquake) = 0.003

  4. Mixed inferences--combinations of the above.
    	P(Alarm | JohnCalls AND (NOT Earthquake))
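The four queries above can be reproduced by brute-force enumeration of the joint distribution. This is a sketch under stated assumptions, not an efficient inference algorithm. Five of the CPT entries (P(B), P(E), P(A | NOT B, NOT E), P(J | A), P(M | A)) come from the worked example earlier. The remaining entries are assumptions, filled in with values commonly quoted for the burglary network. The names `P_TRUE` and `probability` are illustrative.

```python
from itertools import product

VARS = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]
PARENTS = {
    "Burglary": (), "Earthquake": (),
    "Alarm": ("Burglary", "Earthquake"),
    "JohnCalls": ("Alarm",), "MaryCalls": ("Alarm",),
}
# P(X = true | parent values). Entries marked "assumed" are not in the text.
P_TRUE = {
    "Burglary": {(): 0.001},
    "Earthquake": {(): 0.002},
    "Alarm": {
        (True, True): 0.95,    # assumed
        (True, False): 0.95,   # assumed
        (False, True): 0.29,   # assumed
        (False, False): 0.001,
    },
    "JohnCalls": {(True,): 0.90, (False,): 0.05},  # second entry assumed
    "MaryCalls": {(True,): 0.70, (False,): 0.01},  # second entry assumed
}

def joint(world):
    """Probability of one complete assignment, as a product over nodes."""
    p = 1.0
    for var in VARS:
        p_true = P_TRUE[var][tuple(world[q] for q in PARENTS[var])]
        p *= p_true if world[var] else 1.0 - p_true
    return p

def probability(query, evidence):
    """P(query | evidence), summing the joint over all 2**5 worlds."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if all(world[v] == val for v, val in evidence.items()):
            p = joint(world)
            denominator += p
            if all(world[v] == val for v, val in query.items()):
                numerator += p
    return numerator / denominator

print(probability({"Burglary": True}, {"JohnCalls": True}))
print(probability({"JohnCalls": True}, {"Burglary": True}))
print(probability({"Burglary": True}, {"Alarm": True, "Earthquake": True}))
print(probability({"Alarm": True}, {"JohnCalls": True, "Earthquake": False}))
```

With these assumed tables the first three results come out close to 0.016, 0.86, and 0.003, and the mixed query evaluates to roughly 0.03. Enumeration takes time exponential in the number of variables, so it is only practical for very small networks.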