With algorithms such as those described above, a model often overfits the data. One way of correcting this is to prune the model from the leaves up: wherever combining a set of leaves does not increase the error rate, they are joined into a single new leaf.
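The leaves-up pruning idea can be sketched as follows. This is a minimal, illustrative implementation over a toy tree structure; the `Node` class and its stored error counts are assumptions for the sketch, not the representation any particular product uses.

```python
class Node:
    def __init__(self, prediction, errors, children=None):
        self.prediction = prediction    # majority class at this node
        self.errors = errors            # errors made if this node were a leaf
        self.children = children or []  # empty list means the node is a leaf

    def is_leaf(self):
        return not self.children

def subtree_errors(node):
    """Errors made by the unpruned subtree rooted at this node."""
    if node.is_leaf():
        return node.errors
    return sum(subtree_errors(c) for c in node.children)

def prune(node):
    """Bottom-up pruning: collapse a subtree into a single leaf whenever
    doing so does not increase the error count."""
    if node.is_leaf():
        return node
    node.children = [prune(c) for c in node.children]
    if node.errors <= subtree_errors(node):
        node.children = []              # join the leaves into one new leaf
    return node

# Toy example: the split's two leaves make 1 + 2 = 3 errors combined, and
# predicting the parent's majority class alone also makes 3, so we prune.
tree = Node("heads", errors=3,
            children=[Node("heads", errors=1), Node("tails", errors=2)])
prune(tree)
print(tree.is_leaf())  # True: the subtree was collapsed into a single leaf
```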
A simple example may be given as follows. If the attributes contain nothing but random data and the class is “heads” 75% of the time and “tails” 25% of the time, the result will be an overfit model that does not predict the outcome well. By inspection it can be seen that, instead of a built-up model with many leaves, the model could simply predict “heads” and be correct 75% of the time, whereas an overfit model usually does much worse in such a case.
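The gap between the majority-class baseline and a model that has memorized noise can be checked with a quick simulation. The data here are synthetic, generated to match the 75/25 split in the text; a model fit to pure noise behaves like a guesser with the class priors, whose expected accuracy is 0.75 × 0.75 + 0.25 × 0.25 = 0.625.

```python
import random

random.seed(0)
# Synthetic labels: "heads" 75% of the time, "tails" 25% of the time.
data = ["heads" if random.random() < 0.75 else "tails" for _ in range(10000)]

# Baseline: always predict the majority class ("heads") -> ~75% accurate.
majority_acc = sum(1 for y in data if y == "heads") / len(data)

# An overfit model built on random attributes effectively guesses with the
# class priors, so its expected accuracy is only 0.625.
noisy_acc = sum(
    1 for y in data
    if ("heads" if random.random() < 0.75 else "tails") == y
) / len(data)

print(f"majority: {majority_acc:.2f}, noisy model: {noisy_acc:.2f}")
```

On a sample of this size the two accuracies land close to 0.75 and 0.625 respectively, showing why the single-leaf prediction beats the elaborate tree.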
Teradata Warehouse Miner provides pruning based on either the gain ratio or the Gini diversity index. Different splitting and pruning techniques can be combined; however, when pruning a regression tree the Gini diversity index technique must be used.
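For reference, the Gini diversity index mentioned above measures the impurity of a node from its class proportions, 1 − Σ p_k². The sketch below computes it for a few class distributions; it illustrates the measure itself, not how Teradata Warehouse Miner applies it internally.

```python
def gini(counts):
    """Gini diversity index: 1 - sum of squared class proportions.
    0 for a pure node; larger values mean a more mixed node."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([75, 25]))   # 1 - (0.75^2 + 0.25^2) = 0.375
print(gini([100, 0]))   # pure node: 0.0
print(gini([50, 50]))   # maximally mixed two-class node: 0.5
```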