Node impurity is the idea behind split selection with the Gini diversity index. The impurity of a node is measured as a function of the class proportions among the cases in that node.
Impurity is at its maximum when the classes to be predicted are equally distributed in the node. In the heads and tails example, impurity is highest when half of the sample is heads and the other half is tails. Conversely, if a sample contained only tails, its impurity would be 0.
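The Gini index calculates the impurity of a node t with the following formula:

$$ i(t) = 1 - \sum_{j} p(j \mid t)^2 $$

where $p(j \mid t)$ is the proportion of cases in node $t$ that belong to class $j$.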
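As a rough illustration, the Gini impurity of a node can be computed as one minus the sum of the squared class proportions. The sketch below is not taken from any particular library; the function name `gini_impurity` and the choice to treat an empty node as pure are assumptions made for this example.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum_j p(j|t)^2 for the class labels found in one node."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


mixed = gini_impurity(["H", "T", "H", "T"])  # half heads, half tails
pure = gini_impurity(["T", "T", "T", "T"])   # only tails
print(mixed, pure)  # prints 0.5 0.0
```

With two classes, the evenly mixed node reaches the largest value the index can take (0.5), while the node containing only tails has impurity 0.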
The goodness of a split s at node t is measured by the decrease in impurity that the split produces:

$$ \Delta i(s, t) = i(t) - p_L \, i(t_L) - p_R \, i(t_R) $$

where $t_L$ and $t_R$ are the left and right sub nodes of $t$, and $p_L$ and $p_R$ are the probabilities of being in those sub nodes.
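The goodness of a split can be sketched in a few lines, assuming it is the impurity of the parent node minus the impurities of the two sub nodes weighted by pL and pR. In this example pL and pR are estimated as the fractions of the parent's cases sent to each sub node, which is an assumption of the sketch; the names `gini` and `split_goodness` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of one node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node is treated as pure
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_goodness(parent, left, right):
    """Decrease in impurity: i(t) - pL * i(tL) - pR * i(tR)."""
    n = len(parent)
    p_left = len(left) / n    # assumed estimate of pL
    p_right = len(right) / n  # assumed estimate of pR
    return gini(parent) - p_left * gini(left) - p_right * gini(right)


parent = ["H", "H", "T", "T"]
perfect = split_goodness(parent, ["H", "H"], ["T", "T"])  # separates the classes
useless = split_goodness(parent, ["H", "T"], ["H", "T"])  # sub nodes as mixed as parent
print(perfect, useless)  # prints 0.5 0.0
```

A split that separates the classes completely removes all of the parent's impurity, while a split whose sub nodes are as mixed as the parent yields no improvement. A tree builder would typically prefer the candidate split with the largest value.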
For a detailed description of this type of tree, see [Breiman, Friedman, Olshen and Stone].