The formulas in this topic use these terms:
Term | Description |
---|---|
t | Node |
j | Learning class |
J | Number of classes |
s | Split |
N(t) | Number of cases within node t |
p(j|t) | Proportion of class j learning samples in node t |
ϕ | Impurity function: a symmetric function that takes its maximum value when all classes are equally likely, ϕ(1/J, 1/J, …, 1/J), and that is zero when the node is pure: ϕ(1, 0, …, 0) = ϕ(0, 1, …, 0) = … = ϕ(0, 0, …, 1) = 0
ti | Subnode i of node t |
i(t) | Impurity measure of node t |
tL | Left-split subnode of node t
tR | Right-split subnode of node t |
X | Predictor variable |
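The terms N(t), p(j|t), and the entropy impurity measure i(t) can be sketched in a few lines. This is an illustrative example, not part of the original topic; the function names `class_proportions` and `entropy_impurity` are chosen here for clarity.

```python
import math
from collections import Counter

def class_proportions(labels):
    """p(j|t): proportion of class j learning samples among the N(t) cases in node t."""
    n = len(labels)  # N(t)
    return {j: count / n for j, count in Counter(labels).items()}

def entropy_impurity(labels):
    """i(t) using the entropy impurity function:
    zero when the node is pure, maximal when all classes are equally mixed."""
    return -sum(p * math.log2(p) for p in class_proportions(labels).values())

node = ["a", "a", "b", "b"]      # N(t) = 4
print(class_proportions(node))   # p(a|t) = p(b|t) = 0.5
print(entropy_impurity(node))    # 1.0 bit: the maximum for J = 2 classes
```

Note that `entropy_impurity` satisfies the ϕ properties above: it is symmetric in the class proportions, zero for a pure node, and maximal at (1/J, …, 1/J).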
An information gain ratio tree splits categorical variables on each individual value, and continuous variables at one point in an ordered list of the actual values (that is, it introduces a binary split at a particular value).
The tree uses this procedure for splitting:
- Define info(t) at node t as the entropy: info(t) = −∑ p(j|t) log₂ p(j|t), where the sum runs over the J classes.
- Define the following, where predictor variable X splits node t into subnodes ti:
  - infoX(t) = ∑ (N(ti)/N(t)) info(ti)
  - gain(X) = info(t) − infoX(t)
  - split info(X) = −∑ (N(ti)/N(t)) log₂ (N(ti)/N(t))
  - gain ratio(X) = gain(X) / split info(X)
- Use the attribute with the highest gain ratio to split the data.
- Repeat this procedure on each subset until all observations in a node belong to one class or a stopping criterion is true (for example, "each node must contain at least two observations").
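The splitting procedure above can be sketched for a continuous predictor as follows. This is a minimal illustration, assuming the standard gain ratio definitions (gain divided by split info) and binary splits at midpoints between consecutive distinct values; the function names are invented for this sketch.

```python
import math
from collections import Counter

def info(labels):
    """info(t): entropy of the class labels at a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, subsets):
    """Gain ratio for splitting a node's labels into the given subsets."""
    n = len(labels)
    info_x = sum(len(s) / n * info(s) for s in subsets)          # info_X(t)
    gain = info(labels) - info_x                                 # gain(X)
    split_info = -sum(len(s) / n * math.log2(len(s) / n)
                      for s in subsets if s)                     # split info(X)
    return gain / split_info if split_info else 0.0

def best_binary_split(xs, labels):
    """Try a binary split at each midpoint between consecutive distinct values
    of a continuous predictor X; return (threshold, gain_ratio) of the best."""
    best = (None, -1.0)
    vals = sorted(set(xs))
    for lo, hi in zip(vals, vals[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(xs, labels) if x <= thr]
        right = [y for x, y in zip(xs, labels) if x > thr]
        gr = gain_ratio(labels, [left, right])
        if gr > best[1]:
            best = (thr, gr)
    return best

# A split at 5.0 separates the two classes perfectly:
print(best_binary_split([1, 2, 8, 9], ["a", "a", "b", "b"]))  # (5.0, 1.0)
```

For a categorical predictor, the same `gain_ratio` function would be applied to the subsets formed by grouping cases on each individual value of X.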
For a detailed description of an information gain ratio tree, see [Quinlan].