Decision Tree SQL Generation - Vantage Analytics Library

Vantage Analytics Library User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, Lake, VMware
Product: Vantage Analytics Library
Release Number: 2.2.0
Published: March 2023
Language: English (United States)
Last Update: 2024-01-02
Product Category: Teradata Vantage

Before generating the decision tree, the decisiontree function generates SQL statements that return statistics about the attributes and predicted variable. From these statistics, the algorithm does the following:
  • Determines the cardinality of each attribute
  • Gets all possible values of the predicted variable and, for each value, its count across all observations
  • Initializes structures in memory for later use in the building process
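
The kind of statistics queries described above can be sketched as follows. This is an illustration only, not the SQL that the decisiontree function actually generates: the table customers, its columns, and the helper names cardinality_sql and target_counts_sql are all hypothetical, and SQLite stands in for the database so that the example runs on its own.

```python
import sqlite3

# Hypothetical input data: two attributes (gender, region) and a
# predicted variable (churn). Invented for illustration only.
ROWS = [
    ("F", "north", "yes"),
    ("F", "south", "no"),
    ("M", "north", "no"),
    ("M", "east", "no"),
    ("F", "east", "yes"),
]


def cardinality_sql(table, attributes):
    """Build one query that counts the distinct values of each attribute."""
    cols = ", ".join(f"COUNT(DISTINCT {a}) AS {a}_card" for a in attributes)
    return f"SELECT {cols} FROM {table}"


def target_counts_sql(table, target):
    """Build a query listing each predicted-variable value with its count."""
    return (
        f"SELECT {target}, COUNT(*) AS cnt FROM {table} "
        f"GROUP BY {target} ORDER BY {target}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (gender TEXT, region TEXT, churn TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", ROWS)

# One row: the cardinality of gender, then the cardinality of region.
cardinalities = conn.execute(
    cardinality_sql("customers", ["gender", "region"])
).fetchone()

# One row per distinct value of the predicted variable.
target_counts = conn.execute(target_counts_sql("customers", "churn")).fetchall()

print(cardinalities)
print(target_counts)
```

The identifiers are interpolated directly into the SQL text, which is acceptable here only because they are fixed, trusted names in a sketch.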

The SQL statement that drives the tree-building process builds a contingency table from the data. The contingency table is an m × n matrix. Its m rows correspond to the distinct values of an attribute. Its n columns correspond to the distinct values of the predicted variable.
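
As a rough illustration of such a contingency table, the sketch below counts observations per (attribute value, predicted value) pair with a GROUP BY query and then arranges the counts as an m × n matrix. The table customers, its columns, and the helpers contingency_sql and contingency_matrix are hypothetical, and the query is an assumed form rather than the statement the library emits.

```python
import sqlite3

# Hypothetical input data: attributes gender and region, predicted variable churn.
ROWS = [
    ("F", "north", "yes"),
    ("F", "south", "no"),
    ("M", "north", "no"),
    ("M", "east", "no"),
    ("F", "east", "yes"),
]


def contingency_sql(table, attribute, target):
    """Build a query that counts rows for each (attribute, target) value pair."""
    return (
        f"SELECT {attribute}, {target}, COUNT(*) AS cnt FROM {table} "
        f"GROUP BY {attribute}, {target}"
    )


def contingency_matrix(conn, table, attribute, target):
    """Run the query and pivot its rows into an m x n matrix of counts."""
    cells = conn.execute(contingency_sql(table, attribute, target)).fetchall()
    row_labels = sorted({a for a, _, _ in cells})  # m attribute values
    col_labels = sorted({t for _, t, _ in cells})  # n predicted values
    matrix = [[0] * len(col_labels) for _ in row_labels]
    for a, t, cnt in cells:
        matrix[row_labels.index(a)][col_labels.index(t)] = cnt
    return row_labels, col_labels, matrix


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (gender TEXT, region TEXT, churn TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", ROWS)

row_labels, col_labels, matrix = contingency_matrix(
    conn, "customers", "region", "churn"
)
print(row_labels)
print(col_labels)
print(matrix)
```

Combinations that never occur in the data produce no row in the query result, so the pivot step fills those cells with zero.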

The SQL statement queries the contingency table to get the statistics needed for calculations. The query result consists of the counts of the n distinct values of the dependent (predicted) variable. When the contingency table is built on a subset of the data in the input table, the SQL statement includes a WHERE clause that defines the subset. The subset corresponds to the path down the tree to the node that is a candidate to split.
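
A minimal sketch of how a path down the tree might translate into a WHERE clause is shown below. It assumes that every condition on the path is a simple equality test, which may not match how the library represents splits. The table customers, its columns, and the helper subset_contingency_sql are hypothetical.

```python
import sqlite3

# Hypothetical input data: attributes gender and region, predicted variable churn.
ROWS = [
    ("F", "north", "yes"),
    ("F", "south", "no"),
    ("M", "north", "no"),
    ("M", "east", "no"),
    ("F", "east", "yes"),
]


def subset_contingency_sql(table, attribute, target, path):
    """Build a contingency query restricted to the rows that reach one node.

    path is a list of (column, value) pairs, one per split on the way from
    the root to the candidate node. An empty path selects the whole table.
    """
    sql = f"SELECT {attribute}, {target}, COUNT(*) AS cnt FROM {table}"
    params = []
    if path:
        # Values are bound as parameters; column names are trusted here.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in path)
        params = [value for _, value in path]
    sql += f" GROUP BY {attribute}, {target} ORDER BY {attribute}, {target}"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (gender TEXT, region TEXT, churn TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", ROWS)

# Candidate node reached by following the branch gender = 'F'.
sql, params = subset_contingency_sql(
    "customers", "region", "churn", [("gender", "F")]
)
cells = conn.execute(sql, params).fetchall()
print(sql)
print(cells)
```

Each additional level of the tree adds one more condition to the path, so the WHERE clause grows with the depth of the candidate node.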