Overview of Decision Trees - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3: Analytic Functions

Product: Teradata Warehouse Miner
Release Number: 5.4.5
Published: February 2018
Language: English (United States)
Last Update: 2018-05-04

dita:id: B035-2302
Product Category: Software

Decision tree models are most commonly used for classification. What is a classification model, or classifier? It is simply a model for predicting a categorical variable, that is, a variable that takes one of a predetermined set of values. These values can be either nominal or ordinal, although ordinal variables are typically treated the same as nominal ones in these models. Marital status with the values single, married, and divorced is an example of a nominal variable, while temperature recorded as low, medium, or high is an example of an ordinal (ordered) variable.

Perhaps the principal advantage of decision trees is that they can not only predict the value of a categorical variable but also use categorical variables directly as input, or predictor, variables. Decision trees are also by their nature well suited to handling large numbers of input variables, a mixture of data types, and data that is not homogeneous (i.e., data in which the variables do not have the same interrelationships throughout the data space). In addition, they provide insight into the structure of the data space and the meaning of a model, a result that is at times as important as the model's accuracy. Note that a variation of decision trees called regression trees can be used to build regression models rather than classification models, with the same benefits just described. Most of the discussion that follows is geared toward classification trees; regression trees are described separately.
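To make these advantages concrete, the following minimal Python sketch shows how an already-built classification tree predicts a categorical value. It branches directly on a nominal input (one branch per category) and on a numeric input (a threshold), so no encoding of the categorical variable is needed. The tree is hand-built for illustration only: the variable names (`marital_status`, `age`, `income_band`), class labels, and threshold are hypothetical, and the sketch does not show how Teradata Warehouse Miner grows or prunes a tree.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # predicted class (a regression tree would store a number instead)


@dataclass
class CategoricalSplit:
    variable: str                 # categorical predictor used directly, no encoding
    branches: Dict[str, "Node"]   # one child per category value
    default: "Node"               # child used for a value not seen in the branches


@dataclass
class ThresholdSplit:
    variable: str     # numeric predictor
    threshold: float
    left: "Node"      # taken when value <= threshold
    right: "Node"     # taken when value > threshold


Node = Union[Leaf, CategoricalSplit, ThresholdSplit]


def predict(node: Node, record: Dict[str, object]) -> str:
    """Walk from the root to a leaf and return that leaf's class label."""
    while not isinstance(node, Leaf):
        if isinstance(node, CategoricalSplit):
            node = node.branches.get(record[node.variable], node.default)
        else:
            node = node.left if record[node.variable] <= node.threshold else node.right
    return node.label


# Hypothetical tree mixing a nominal variable, a numeric variable, and an
# ordinal variable (income_band) that is treated as nominal.
tree = CategoricalSplit(
    variable="marital_status",
    branches={
        "single": ThresholdSplit("age", 30.0, Leaf("high"), Leaf("low")),
        "married": Leaf("low"),
        "divorced": CategoricalSplit(
            "income_band",
            {"low": Leaf("high"), "medium": Leaf("low"), "high": Leaf("low")},
            default=Leaf("low"),
        ),
    },
    default=Leaf("low"),
)

print(predict(tree, {"marital_status": "single", "age": 25}))  # high
```

Each path from the root to a leaf reads as a rule (for example, "single and age at most 30 implies high"), which is the kind of insight into model structure described above.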