P(A OR B) = P(A) + P(B) - P(A AND B)
Non-probabilistic rules can be:
FORALL x Symptom(x, Toothache) => Disease(x, Cavity)
FORALL x Disease(x, Cavity) => Symptom(x, Toothache)
FORALL x Symptom(x, Toothache) => Disease(x, Cavity) OR Disease(x, GumDisease) OR Disease(x, ImpactedWisdomTooth) OR ...
The probability that a particular patient has a cavity:
p(Cavity) = 0.1
p(A OR B) = p(A) + p(B) - p(A AND B)
The last axiom can be remembered easily using a Venn diagram.
Random variables (RV's) can assume two or more values. Here is an example where the random variable Weather can assume one of exactly four values:
p(Weather=Sunny) = 0.7
p(Weather=Rain) = 0.2
p(Weather=Cloudy) = 0.08
p(Weather=Snow) = 0.02
If the RV's are Boolean, then we can use the normal propositional connectives. For example,
p(Cavity AND (NOT Insured)) = 0.06
P(X_1, X_2, ... , X_n)
The joint probability can be written as a table:
| | Toothache | NOT Toothache |
|---|---|---|
| Cavity | 0.04 | 0.06 |
| NOT Cavity | 0.01 | 0.89 |
If there are n Boolean RV's, the joint distribution table has 2^n entries!
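As a minimal sketch of how a joint table answers queries, the Python below stores the four entries of the Cavity/Toothache table and sums the relevant cells. The helper name `prob` and the dictionary layout are illustrative choices, not part of the notes.

```python
# Joint distribution over two Boolean RVs, keyed by (cavity, toothache).
# The four entries are the cells of the table above; they sum to 1.
joint = {
    (True, True): 0.04,
    (True, False): 0.06,
    (False, True): 0.01,
    (False, False): 0.89,
}

def prob(event):
    """Sum the joint entries for which event(cavity, toothache) holds."""
    return sum(p for (cavity, toothache), p in joint.items()
               if event(cavity, toothache))

p_cavity = prob(lambda c, t: c)        # marginal: 0.04 + 0.06
p_toothache = prob(lambda c, t: t)     # marginal: 0.04 + 0.01
p_both = prob(lambda c, t: c and t)    # single cell: 0.04
p_either = prob(lambda c, t: c or t)   # 0.04 + 0.06 + 0.01

print(round(p_cavity, 2), round(p_toothache, 2), round(p_either, 2))
```

Summing the Cavity row recovers p(Cavity) = 0.1, and `p_either` agrees with p(A) + p(B) - p(A AND B) computed from the other three values.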
Usually your theory is in terms of effects:
P(Toothache | Cavity)
but you want to diagnose
P(Cavity | Toothache)
P(Y | X) = P(X | Y) P(Y) / P(X)
For example,
p(Cavity | Toothache) = p(Toothache | Cavity) p(Cavity) / p(Toothache)
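A small worked instance of Bayes' rule may help here. The sketch below assumes the numbers from the Cavity/Toothache joint table (p(Cavity) = 0.10, p(Toothache) = 0.05, p(Cavity AND Toothache) = 0.04); the function name `bayes` is a hypothetical helper.

```python
def bayes(p_x_given_y, p_y, p_x):
    """Return P(Y | X) = P(X | Y) P(Y) / P(X)."""
    if p_x <= 0:
        raise ValueError("P(X) must be positive")
    return p_x_given_y * p_y / p_x

# Values derived from the joint table:
# p(Cavity) = 0.04 + 0.06, p(Toothache) = 0.04 + 0.01,
# p(Toothache | Cavity) = p(Cavity AND Toothache) / p(Cavity).
p_cavity = 0.10
p_toothache = 0.05
p_toothache_given_cavity = 0.04 / p_cavity

p_cavity_given_toothache = bayes(p_toothache_given_cavity, p_cavity, p_toothache)
print(round(p_cavity_given_toothache, 3))
```

With these numbers the causal direction gives p(Toothache | Cavity) = 0.4, while the diagnostic direction gives p(Cavity | Toothache) = 0.8.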
The dentist is using the little tool that catches on cavities in teeth.
We say that the RV's Catch and Toothache are conditionally independent given Cavity if Cavity is the direct cause of each, so that once you know Cavity, knowing Toothache adds no information about Catch and knowing Catch adds no information about Toothache.
This is expressed as:
p(Catch | Cavity AND Toothache) = p(Catch | Cavity)
p(Toothache | Cavity AND Catch) = p(Toothache | Cavity)
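The two equalities can be checked numerically. The sketch below builds a three-variable joint under the assumption that Toothache and Catch each depend only on Cavity, then confirms that conditioning on the extra variable changes nothing. All numbers except p(Cavity) = 0.1 are hypothetical, chosen only for illustration, and the helper names are not from the notes.

```python
from itertools import product

# Hypothetical numbers for illustration only.
p_cavity = 0.1
p_toothache_given = {True: 0.4, False: 0.05}  # p(Toothache | Cavity = c)
p_catch_given = {True: 0.9, False: 0.2}       # p(Catch | Cavity = c)

def bern(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true

# Joint over (cavity, toothache, catch), built as
# p(c) * p(t | c) * p(k | c), i.e. assuming conditional independence.
joint = {
    (c, t, k): bern(p_cavity, c)
    * bern(p_toothache_given[c], t)
    * bern(p_catch_given[c], k)
    for c, t, k in product([True, False], repeat=3)
}

def prob(event):
    return sum(p for world, p in joint.items() if event(*world))

def cond(event, given):
    """p(event | given), computed from the joint."""
    return prob(lambda c, t, k: event(c, t, k) and given(c, t, k)) / prob(given)

catch_given_both = cond(lambda c, t, k: k, lambda c, t, k: c and t)
catch_given_cavity = cond(lambda c, t, k: k, lambda c, t, k: c)
tooth_given_both = cond(lambda c, t, k: t, lambda c, t, k: c and k)
tooth_given_cavity = cond(lambda c, t, k: t, lambda c, t, k: c)

print(round(catch_given_both, 3), round(catch_given_cavity, 3))
print(round(tooth_given_both, 3), round(tooth_given_cavity, 3))
```

The check is circular by design: the joint is constructed from the independence assumption, so the point is only to show what the two equations mean in terms of table entries.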
(Fig 15.1 here)
P(Alarm | Burglary, Earthquake):

| Burglary | Earthquake | Alarm = True | Alarm = False |
|---|---|---|---|
| True | True | 0.950 | 0.050 |
| True | False | 0.950 | 0.050 |
| False | True | 0.290 | 0.710 |
| False | False | 0.001 | 0.999 |
Because of conditional independence, an entry in the joint probability table for all the variables is given by:
p(x_1, ... ,x_n) = PRODUCT_{i=1}^n p(x_i | Parents(X_i))
For example, the probability that the alarm sounds, neither a burglary nor an earthquake has occurred, and John and Mary both call is:
p(J AND M AND A AND (NOT B) AND (NOT E))
= p(J | A) p(M | A) p(A | (NOT B) AND (NOT E)) p(NOT B) p(NOT E)
= 0.90 x 0.70 x 0.001 x 0.999 x 0.998
≈ 0.000628
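The product formula can be sketched as a short Python function. The `parents` and `cpt` dictionaries are an assumed encoding, and only the CPT entries needed for this one query are filled in, using the numbers quoted in the example (p(B) = 0.001 and p(E) = 0.002 follow from p(NOT B) = 0.999 and p(NOT E) = 0.998).

```python
# Network structure: each variable maps to a tuple of its parents.
parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}

# cpt[var][parent_values] = p(var = True | parents = parent_values).
# Only the rows needed for the query below are listed.
cpt = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(False, False): 0.001},
    "J": {(True,): 0.90},
    "M": {(True,): 0.70},
}

def joint_entry(assignment):
    """Multiply p(x_i | Parents(X_i)) over all variables in the network."""
    result = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assignment[p] for p in pars)]
        result *= p_true if assignment[var] else 1.0 - p_true
    return result

query = {"J": True, "M": True, "A": True, "B": False, "E": False}
p_query = joint_entry(query)
print(f"{p_query:.6f}")
```

Asking for an assignment whose CPT row is missing raises a `KeyError`, which is intentional here: the sketch does not invent the table rows that the example does not quote.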
If we assume n Boolean variables, each of which is influenced by at most k others, then the CPT's will require at most n * 2^k entries. This is far smaller than the 2^n numbers required for the full joint probability distribution.
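To get a feel for the savings, the sketch below compares the bound n * 2^k on CPT entries with the 2^n entries of the full joint table. The sizes n = 20 and k = 5 are hypothetical, picked only to make the gap visible.

```python
def cpt_entries(n, k):
    """Upper bound on CPT entries: n Boolean variables, at most k parents each."""
    return n * 2 ** k

def joint_entries(n):
    """Entries in the full joint table over n Boolean variables."""
    return 2 ** n

n, k = 20, 5  # hypothetical network size
network_size = cpt_entries(n, k)
joint_size = joint_entries(n)
print(network_size, joint_size)  # prints: 640 1048576
```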