Decision Trees and NULL Values - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3 Analytic Functions

Product

Teradata Warehouse Miner

Release Number

5.4.5

Published

February 2018

Language

English (United States)

Last Update

2018-05-04

dita:id

B035-2302

Product Category

Software

During tree building, NULL values are handled by listwise deletion. This means that if a row contains a NULL value in any of the variables, independent or dependent, that row is removed from the model building process.
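Listwise deletion can be illustrated with a minimal Python sketch. It is not Teradata Warehouse Miner code: the rows, the column names (age, income, churn), and the helper listwise_delete are all hypothetical, and None stands in for a SQL NULL.

```python
# Hypothetical training rows; None stands in for a SQL NULL.
rows = [
    {"age": 34, "income": 52000, "churn": "no"},
    {"age": None, "income": 61000, "churn": "yes"},   # NULL independent variable
    {"age": 58, "income": None, "churn": "no"},       # NULL independent variable
    {"age": 45, "income": 48000, "churn": None},      # NULL dependent variable
    {"age": 27, "income": 39000, "churn": "yes"},
]


def listwise_delete(rows, columns):
    """Keep only the rows that have no None in any of the given columns."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


# Independent and dependent variables are checked together.
model_columns = ["age", "income", "churn"]
training_rows = listwise_delete(rows, model_columns)

for r in training_rows:
    print(r)
```

Only the first and last rows survive in this sketch, because each of the other three has a NULL in at least one model variable.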

NULL values in scoring are handled differently. Whereas tree building uses listwise deletion, scoring can handle rows that have NULL values in some of the independent variables. A row fails to be scored only when it reaches a decision node that tests a variable for which the row has a NULL value. For instance, if the first split in a tree is “age < 50,” only rows that have a non-NULL value for age pass further down the tree. Such a row could still have a NULL value in the income variable. Because this decision is on age, that NULL has no impact at this split, and the row continues down the branches until it either reaches a leaf and is scored, or reaches another decision node that tests a variable for which the row has a NULL value.
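The scoring behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the nested-dictionary tree layout, the score function, the income threshold, and the leaf labels are hypothetical. Only the first split, age < 50, comes from the example above. None stands in for a SQL NULL.

```python
def score(row, node):
    """Walk the tree; return the leaf label, or None if the row cannot be scored."""
    while "leaf" not in node:
        value = row.get(node["variable"])
        if value is None:
            # The row has a NULL in the variable this decision node tests.
            return None
        node = node["yes"] if value < node["threshold"] else node["no"]
    return node["leaf"]


# Hypothetical tree: the root splits on age < 50; the right branch splits on income.
tree = {
    "variable": "age", "threshold": 50,
    "yes": {"leaf": "low risk"},
    "no": {
        "variable": "income", "threshold": 40000,
        "yes": {"leaf": "high risk"},
        "no": {"leaf": "low risk"},
    },
}

young_null_income = {"age": 30, "income": None}   # income is never tested
older_null_income = {"age": 62, "income": None}   # reaches the income node
null_age = {"age": None, "income": 70000}         # fails at the root

for row in (young_null_income, older_null_income, null_age):
    print(row, "->", score(row, tree))
```

In this sketch the first row is scored despite its NULL income, because the path it follows never tests income. The second row is not scored because it reaches the income decision node, and the third is not scored because the root tests age.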