Decision Tree Basics - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14

Decision trees are very simple models. For example, suppose you want to predict the value of a variable, y, and you have two predictor variables, x1 and x2. You want to model y as a function of x1 and x2 (y = f(x1, x2)).

You can visualize x1 and x2 as forming a plane, with the value of y at each coordinate pair (x1, x2) rising out of the plane in the third dimension. A decision tree partitions the plane into rectangles and predicts a constant value of y in each partition, usually the average of the y values of all training observations in that region. This two-dimensional example extends to arbitrarily many dimensions, so you can fit models with large numbers of predictors.

[Figure: the x1-x2 plane partitioned into four rectangular regions, R1 through R4]
In this example, the x1-x2 plane has four regions: R1, R2, R3, and R4. The predicted value of y for any test observation in R1 is the average value of y over all training observations in R1.
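In code, this piecewise-constant scheme amounts to a region lookup followed by an average. The Python sketch below is illustrative only, not an Aster Analytics function; the x1 boundary at 5 matches the root split in the tree that follows, while the x2 thresholds (3 and 7) and the overall region layout are hypothetical.

# A minimal sketch of a piecewise-constant predictor over rectangular
# regions. Boundaries are hypothetical: x1 splits at 5 (as in the tree
# below); the x2 thresholds 3 and 7 are made up for illustration.
regions = {
    "R1": ((0, 5), (0, 3)),    # x1 in [0, 5), x2 in [0, 3)
    "R2": ((0, 5), (3, 10)),   # x1 in [0, 5), x2 in [3, 10)
    "R3": ((5, 10), (0, 7)),   # x1 in [5, 10), x2 in [0, 7)
    "R4": ((5, 10), (7, 10)),  # x1 in [5, 10), x2 in [7, 10)
}

def region_of(x1, x2):
    """Return the name of the rectangle containing the point (x1, x2)."""
    for name, ((lo1, hi1), (lo2, hi2)) in regions.items():
        if lo1 <= x1 < hi1 and lo2 <= x2 < hi2:
            return name
    raise ValueError("point lies outside the partitioned plane")

def fit_region_means(X, y):
    """For each region, the prediction is the mean of the training y
    values whose (x1, x2) coordinates fall in that region."""
    means = {}
    for name in regions:
        ys = [yi for (x1, x2), yi in zip(X, y) if region_of(x1, x2) == name]
        means[name] = sum(ys) / len(ys) if ys else None
    return means

def predict_region(x1, x2, means):
    """Predict y for a test point: the stored mean of its region."""
    return means[region_of(x1, x2)]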

This information can be represented by a decision tree:

[Figure: the decision tree corresponding to the partitioned plane, with internal nodes testing x1 and x2 and leaf nodes holding predicted values]
The algorithm starts at the root node. If the x1 value of a data point is greater than 5, the algorithm follows the right branch; otherwise, it follows the left branch. At each subsequent node, the algorithm tests another predictor value to choose a branch, continuing until it reaches a leaf node, whose value becomes the prediction for that data point.
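The traversal just described can be sketched in a few lines of Python. Only the root test (x1 > 5) comes from the example above; the remaining thresholds and the leaf values are hypothetical stand-ins for the region averages.

# A minimal sketch of the traversal described above. The tree encodes
# hypothetical splits (x1 > 5 at the root, then one x2 split per
# subtree); thresholds and leaf values are illustrative, not from
# this guide.
tree = {
    "feature": "x1", "threshold": 5,   # root: test x1 against 5
    "left": {                          # x1 <= 5
        "feature": "x2", "threshold": 3,
        "left":  {"leaf": 2.0},        # mean y of training points in R1
        "right": {"leaf": 4.5},        # mean y in R2
    },
    "right": {                         # x1 > 5
        "feature": "x2", "threshold": 7,
        "left":  {"leaf": 7.1},        # mean y in R3
        "right": {"leaf": 9.8},        # mean y in R4
    },
}

def predict(tree, point):
    """Follow branches from the root until a leaf; return its value."""
    node = tree
    while "leaf" not in node:
        branch = "right" if point[node["feature"]] > node["threshold"] else "left"
        node = node[branch]
    return node["leaf"]

print(predict(tree, {"x1": 6, "x2": 2}))  # falls in R3, prints 7.1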