Decision Tree SQL Generation

Vantage Analytics Library User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, Lake, VMware
Product: Vantage Analytics Library
Release Number: 2.2.0
Published: March 2023
Language: English (United States)
Last Update: 2024-01-02

Product Category: Teradata Vantage

Before generating the decision tree, the decisiontree function generates SQL statements that return statistics about the attributes and predicted variable. From these statistics, the algorithm does the following:

Determines the cardinality of each attribute
Gets all possible values of the predicted variable and the count of observations associated with each value
Initializes structures in memory for later use in the building process
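
As a rough illustration of the kind of statistics described above, the following sketch runs two simple queries against a toy table. It is not the SQL that the decisiontree function generates. It uses Python's built-in sqlite3 module in place of Vantage, and the table name (weather), column names, and data are all hypothetical.

```python
import sqlite3

# Hypothetical toy data: two attributes and one predicted variable.
rows = [
    ("sunny", "high", "no"),
    ("sunny", "normal", "yes"),
    ("rainy", "high", "no"),
    ("rainy", "normal", "yes"),
    ("overcast", "high", "yes"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, humidity TEXT, play TEXT)")
conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", rows)

attributes = ["outlook", "humidity"]  # assumed attribute columns
predicted = "play"                    # assumed predicted variable

# Cardinality of each attribute: the number of distinct values it takes.
# Column names are interpolated directly because they are trusted constants here.
cardinality = {
    col: conn.execute(f"SELECT COUNT(DISTINCT {col}) FROM weather").fetchone()[0]
    for col in attributes
}

# Every value of the predicted variable with its count over all observations.
class_counts = dict(
    conn.execute(
        f"SELECT {predicted}, COUNT(*) FROM weather GROUP BY {predicted}"
    ).fetchall()
)

print(cardinality)
print(class_counts)
```

The two dictionaries stand in for the in-memory structures that would be initialized from these statistics before tree building starts.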

The SQL statement that drives the tree-building process builds a contingency table from the data. The contingency table is an m × n matrix. Its m rows correspond to the distinct values of an attribute. Its n columns correspond to the distinct values of the predicted variable.
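
A contingency table of this shape can be sketched with a GROUP BY query over one attribute and the predicted variable. The example below is an assumption-laden illustration, not the statement the library emits: it uses sqlite3 rather than Vantage, and the weather table, its columns, and its rows are hypothetical.

```python
import sqlite3

# Hypothetical toy data: one attribute (outlook) and the predicted variable (play).
rows = [
    ("sunny", "no"),
    ("sunny", "no"),
    ("sunny", "yes"),
    ("rainy", "yes"),
    ("rainy", "no"),
    ("overcast", "yes"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, play TEXT)")
conn.executemany("INSERT INTO weather VALUES (?, ?)", rows)

# One count per (attribute value, predicted value) combination present in the data.
query = "SELECT outlook, play, COUNT(*) FROM weather GROUP BY outlook, play"
counts = {(a, c): n for a, c, n in conn.execute(query)}

# Arrange the counts as an m x n matrix: rows are attribute values,
# columns are predicted values. Combinations that never occur count as 0.
attr_values = sorted({a for a, _ in counts})
class_values = sorted({c for _, c in counts})
table = [[counts.get((a, c), 0) for c in class_values] for a in attr_values]

for a, row in zip(attr_values, table):
    print(a, row)
```

Here m is 3 (overcast, rainy, sunny) and n is 2 (no, yes). The query itself returns only the non-empty cells, so the zero cells are filled in on the client side.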

The SQL statement queries the contingency table to get the statistics needed for calculations. The query returns the counts of the n distinct values of the predicted (dependent) variable. When building a contingency table on a subset of the data in the input table, the SQL statement includes a WHERE clause that defines the subset. The subset corresponds to the path down the tree to the node that is a candidate for splitting.
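
The role of the WHERE clause can be sketched as follows. The helper contingency_counts is hypothetical and is not part of the Vantage Analytics Library. It assumes the path to a node can be represented as a list of (column, value) equality conditions, and it uses sqlite3 with a made-up weather table in place of Vantage.

```python
import sqlite3

# Hypothetical toy data: two attributes and one predicted variable.
rows = [
    ("sunny", "high", "no"),
    ("sunny", "high", "no"),
    ("sunny", "normal", "yes"),
    ("rainy", "high", "no"),
    ("rainy", "normal", "yes"),
    ("overcast", "high", "yes"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, humidity TEXT, play TEXT)")
conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", rows)


def contingency_counts(conn, attribute, predicted, path):
    """Count (attribute value, predicted value) pairs for the rows on a tree path.

    path is a list of (column, value) pairs, one per split taken from the root.
    Column names are assumed to be trusted; values are bound as parameters.
    """
    sql = f"SELECT {attribute}, {predicted}, COUNT(*) FROM weather"
    if path:
        # The WHERE clause restricts the data to the subset that reaches the node.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in path)
    sql += f" GROUP BY {attribute}, {predicted}"
    params = [value for _, value in path]
    return {(a, c): n for a, c, n in conn.execute(sql, params)}


# Candidate node reached by following outlook = 'sunny' from the root.
node_counts = contingency_counts(conn, "humidity", "play", [("outlook", "sunny")])
print(node_counts)
```

With an empty path the helper counts over the whole table, which corresponds to the root node. Each additional split on the path adds one more condition to the WHERE clause.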