The performance standard must be external or the agent could cheat and
change it.
- Today we will discuss the information gain test for choosing
attributes to split on.
- The information of an answer to a question depends on
how many possible answers there are and how probable each of them
is.
- If there are n possible answers, with probabilities P(v_i), then
the information of the answer is
I(P(v_1), ..., P(v_n)) = SUM_i=1^n -P(v_i) log_2(P(v_i))
- For example, with n=2: the outcome of a fair coin toss carries
1 bit of information; the outcome of a completely rigged coin
carries 0 bits.
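- The decision tree provides an answer to a yes/no question. If the
training set contains p positive and n negative examples (from here
on n is a count of examples, not the number of possible answers),
we can write the information of that answer as:
I(p/(p+n), n/(p+n))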
- If we split on attribute A, which has v possible values, then
branch i receives p_i positive and n_i negative examples, and the
answer within the i'th subtree has information:
I(branch i) = I(p_i/(p_i+n_i), n_i/(p_i+n_i))
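- After the split, the expected number of bits still needed to
classify an example is the information of each branch, weighted by
the fraction of examples that fall into that branch:
Remainder(A) =
SUM_i=1^v (p_i+n_i)/(p+n) I(p_i/(p_i+n_i), n_i/(p_i+n_i))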
- So we have gained the following number of bits of information from the
attribute test:
Gain(A) = I(p/(p+n), n/(p+n)) - Remainder(A)
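- The formulas above can be sketched in Python as follows. This is
only an illustration, not code from the lecture: the function names
and the (p_i, n_i) pair layout are assumptions, and terms with
probability 0 are taken to contribute 0 bits.

```python
from math import log2


def information(probs):
    """I(P(v_1), ..., P(v_n)) in bits.

    Assumption: a term with P(v_i) = 0 contributes 0 bits.
    """
    return sum(-p * log2(p) for p in probs if p > 0)


def binary_information(p, n):
    """I(p/(p+n), n/(p+n)) for p positive and n negative examples."""
    total = p + n
    if total == 0:
        return 0.0  # assumption: an empty branch needs no further bits
    return information([p / total, n / total])


def remainder(branches, p, n):
    """Remainder(A): branch information weighted by branch size.

    branches is a list of (p_i, n_i) pairs, one per value of A.
    """
    return sum(
        (p_i + n_i) / (p + n) * binary_information(p_i, n_i)
        for p_i, n_i in branches
    )


def gain(branches):
    """Gain(A) = I(p/(p+n), n/(p+n)) - Remainder(A)."""
    p = sum(p_i for p_i, _ in branches)
    n = sum(n_i for _, n_i in branches)
    return binary_information(p, n) - remainder(branches, p, n)
```

With made-up counts, gain([(3, 0), (0, 3)]) is 1.0 because the split
separates the classes perfectly, while gain([(2, 2), (2, 2)]) is 0.0
because each branch is as mixed as the original set.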