- Define the “info” at node $t$ as the entropy of the class distribution at $t$:
  $$\mathrm{info}(t) = -\sum_{j} p(j \mid t)\,\log_2 p(j \mid t)$$
- Suppose $t$ is split into subnodes $t_1, \ldots, t_s$ by predictor $X$. Define:
  $$\mathrm{gain}(X) = \mathrm{info}(t) - \sum_{i=1}^{s} \frac{n_{t_i}}{n_t}\,\mathrm{info}(t_i)$$
  $$\mathrm{split\,info}(X) = -\sum_{i=1}^{s} \frac{n_{t_i}}{n_t}\,\log_2 \frac{n_{t_i}}{n_t}$$
  $$\mathrm{gain\,ratio}(X) = \frac{\mathrm{gain}(X)}{\mathrm{split\,info}(X)}$$
  where $n_t$ is the number of observations at node $t$ and $n_{t_i}$ the number falling into subnode $t_i$.
Once the gain ratios have been computed, the attribute with the highest gain ratio is used to split the data. Each resulting subset then goes through the same process, recursively, until all observations in a node belong to a single class or a stopping criterion is met (for example, requiring that each node contain at least 2 observations).
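A minimal sketch of these quantities in Python (function and variable names are my own; this is an illustration of the definitions above, not Quinlan's implementation):

```python
import math
from collections import Counter

def info(labels):
    """Entropy of the class labels at a node: -sum_j p(j|t) * log2 p(j|t)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, values):
    """Gain ratio for splitting a node on a categorical predictor X.

    labels: class label of each observation at the node
    values: the predictor's value for each observation; each distinct
            value defines one subnode t_i
    """
    n = len(labels)
    subsets = {}
    for lab, val in zip(labels, values):
        subsets.setdefault(val, []).append(lab)
    # gain(X) = info(t) - sum_i (n_{t_i}/n_t) * info(t_i)
    gain = info(labels) - sum(len(s) / n * info(s) for s in subsets.values())
    # split info(X) = -sum_i (n_{t_i}/n_t) * log2(n_{t_i}/n_t)
    split_info = -sum((len(s) / n) * math.log2(len(s) / n)
                      for s in subsets.values())
    return gain / split_info if split_info > 0 else 0.0
```

For example, a predictor that separates the classes perfectly into two equal-sized subnodes has gain 1 and split info 1, so its gain ratio is 1; a predictor whose subnodes mirror the parent's class mix has gain 0 and hence gain ratio 0.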
For a detailed description of this type of decision tree, see [Quinlan].