Splitting on Information Gain Ratio - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3 Analytic Functions

Product

Teradata Warehouse Miner

Release Number

5.4.4

Published

July 2017

Language

English (United States)

Last Update

2018-05-03

dita:id

B035-2302

Product Category

Software

This type of decision tree is grounded in information theory. A split on a categorical variable creates one branch for each individual value. A split on a continuous variable is made at a single point in the ordered list of its actual values; that is, a binary split is introduced right on a particular observed value.
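The two kinds of split can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function names, the list-of-dicts row format, and the convention that the threshold value itself goes to the left subnode (`<=`) are all assumptions made for the example.

```python
from collections import defaultdict

def categorical_split(rows, attr):
    """Partition rows into one subnode per distinct value of a categorical attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row)
    return dict(groups)

def continuous_splits(rows, attr):
    """Yield (threshold, left, right) for each candidate binary split.

    Thresholds are actual observed values. The largest value is skipped
    because it would leave the right subnode empty.
    Assumption: rows equal to the threshold go to the left subnode.
    """
    values = sorted({row[attr] for row in rows})
    for v in values[:-1]:
        left = [r for r in rows if r[attr] <= v]
        right = [r for r in rows if r[attr] > v]
        yield v, left, right

# Hypothetical sample data.
rows = [
    {"color": "red", "age": 25},
    {"color": "blue", "age": 40},
    {"color": "red", "age": 31},
    {"color": "green", "age": 40},
]
by_color = categorical_split(rows, "color")
age_thresholds = [v for v, _, _ in continuous_splits(rows, "age")]
```

Here `by_color` has three subnodes (one per color), while `age` offers two candidate binary splits, at 25 and at 31.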

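Define the “info” at node $t$ as the entropy:

$$\mathrm{info}(t) = -\sum_{j} p(j \mid t)\,\log_2 p(j \mid t)$$

where $p(j \mid t)$ is the proportion of observations at node $t$ that belong to class $j$.

Suppose $t$ is split into subnodes $t_1, \dots, t_k$ by predictor $X$. Define:

$$\mathrm{info}_X(t) = \sum_{i=1}^{k} \frac{N(t_i)}{N(t)}\,\mathrm{info}(t_i)$$

$$\mathrm{gain}(X) = \mathrm{info}(t) - \mathrm{info}_X(t)$$

$$\mathrm{split\ info}(X) = -\sum_{i=1}^{k} \frac{N(t_i)}{N(t)}\,\log_2 \frac{N(t_i)}{N(t)}$$

$$\mathrm{gain\ ratio}(X) = \frac{\mathrm{gain}(X)}{\mathrm{split\ info}(X)}$$

where $N(t)$ is the number of observations at node $t$.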
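As a rough illustration, the gain ratio of a single split can be computed as below. The sketch assumes the usual definitions from [Quinlan]: the gain is the drop in entropy from the node to the size-weighted average entropy of its subnodes, and the split info is the entropy of the subnode sizes themselves. The function names and the convention of returning 0 when the split info is zero are assumptions of this example.

```python
import math
from collections import Counter

def info(labels):
    """Entropy, in bits, of the class labels at a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(parent_labels, subnode_labels):
    """Gain ratio for splitting a node into the given subnodes.

    parent_labels: class labels at node t.
    subnode_labels: one list of class labels per subnode.
    """
    n = len(parent_labels)
    subnodes = [s for s in subnode_labels if s]      # ignore empty subnodes
    weights = [len(s) / n for s in subnodes]         # N(t_i) / N(t)
    info_x = sum(w * info(s) for w, s in zip(weights, subnodes))
    gain = info(parent_labels) - info_x
    split_info = -sum(w * math.log2(w) for w in weights)
    # Assumption: a split with zero split info (a single subnode) scores 0.
    return gain / split_info if split_info > 0 else 0.0

parent = ["yes", "yes", "no", "no"]
perfect = gain_ratio(parent, [["yes", "yes"], ["no", "no"]])
useless = gain_ratio(parent, [["yes", "no"], ["yes", "no"]])
```

A split that separates the classes completely (`perfect`) scores higher than one that leaves each subnode as mixed as the parent (`useless`), whose gain is zero.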

Once the gain ratios have been computed, the attribute with the highest gain ratio is used to split the data. Each resulting subset then goes through the same process until its observations are all of one class or a stopping criterion is met, such as a requirement that each node contain at least 2 observations.
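The recursive procedure might look like the following minimal sketch. It handles categorical predictors only, and several details are assumptions rather than documented behavior: the names `build_tree` and `min_obs`, the nested-dict tree representation, falling back to the majority class at a leaf, and rejecting a split when any subnode would have fewer than `min_obs` observations.

```python
import math
from collections import Counter, defaultdict

def info(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, target):
    """Gain ratio of splitting rows on each value of a categorical attribute."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[target])
    n = len(rows)
    info_x = sum(len(g) / n * info(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    gain = info([r[target] for r in rows]) - info_x
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, attrs, target, min_obs=2):
    """Return a class label (leaf) or {attr: {value: subtree}} (internal node)."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no predictors remain.
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Pick the attribute with the highest gain ratio.
    best = max(attrs, key=lambda a: gain_ratio(rows, a, target))
    if gain_ratio(rows, best, target) <= 0:
        return majority
    groups = defaultdict(list)
    for r in rows:
        groups[r[best]].append(r)
    # Assumed stopping rule: every subnode must keep at least min_obs rows.
    if any(len(g) < min_obs for g in groups.values()):
        return majority
    rest = [a for a in attrs if a != best]
    return {best: {v: build_tree(g, rest, target, min_obs) for v, g in groups.items()}}

# Hypothetical sample data.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
]
tree = build_tree(rows, ["outlook", "windy"], "play")
```

In this toy data `outlook` separates the classes completely while `windy` carries no information, so the sketch splits once on `outlook` and stops because both subnodes are pure.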

For a detailed description of this type of decision tree, see [Quinlan].