5.4.5 - Overview of Decision Trees - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3: Analytic Functions

Teradata Warehouse Miner
February 2018
English (United States)

Decision tree models are most commonly used for classification. A classification model, or classifier, is a model that predicts a categorical variable, that is, a variable that takes one of a predetermined set of values. These values can be nominal or ordinal, although these models typically treat ordinal variables the same as nominal ones. Marital status with the values single, married and divorced is an example of a nominal variable; temperature with the values low, medium and high is an example of an ordinal (ordered) variable.

Perhaps the principal advantage of decision trees is that they can not only predict the value of a categorical variable but also use categorical variables directly as input (predictor) variables. By their nature, decision trees are also well suited to handling large numbers of input variables, a mixture of data types, and data that is not homogeneous (that is, data in which the variables do not have the same interrelationships throughout the data space). They also provide insight into the structure of the data space and the meaning of a model, a result that is at times as important as the model's accuracy.

A variation of decision trees called regression trees can be used to build regression models rather than classification models, with the same benefits just described. Most of the following discussion is geared toward classification trees, with regression trees described separately.
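
To make the idea of predicting a categorical variable from categorical predictors concrete, the following is a minimal Python sketch of a one-level decision tree (often called a decision stump). It is an illustration only and is not the algorithm used by Teradata Warehouse Miner. The toy data, the target values "yes" and "no", and the function names fit_stump and accuracy are all hypothetical; only the marital status and temperature categories come from the text above.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (marital_status, temperature, target class).
# marital_status is nominal; temperature is ordinal but is treated
# here exactly like a nominal variable.
rows = [
    ("single", "low", "no"),
    ("single", "high", "no"),
    ("single", "medium", "no"),
    ("married", "low", "yes"),
    ("married", "medium", "yes"),
    ("married", "high", "yes"),
    ("divorced", "high", "no"),
    ("divorced", "medium", "yes"),
    ("divorced", "low", "no"),
]


def fit_stump(rows, column):
    """Build a one-level tree: one branch per category of the chosen
    predictor, each branch predicting the majority class of its rows."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[column]][row[-1]] += 1
    return {category: c.most_common(1)[0][0] for category, c in counts.items()}


def accuracy(rows, column, stump):
    """Fraction of rows whose class matches the stump's prediction."""
    hits = sum(stump[row[column]] == row[-1] for row in rows)
    return hits / len(rows)


# Fit one stump per categorical predictor and compare them on the training data.
marital_stump = fit_stump(rows, 0)
temperature_stump = fit_stump(rows, 1)

print("marital status:", marital_stump, accuracy(rows, 0, marital_stump))
print("temperature:", temperature_stump, accuracy(rows, 1, temperature_stump))
```

On this toy data the marital status stump classifies 8 of 9 rows correctly and the temperature stump 6 of 9, so a tree builder that compares candidate predictors would prefer to split on marital status first. The categorical inputs are used as they are, with no numeric encoding, and the resulting mapping from category to predicted class can be read directly as a set of rules.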