Overview - Teradata Warehouse Miner

Teradata® Warehouse Miner™ User Guide - Volume 2 ADS Generation

Product
Teradata Warehouse Miner
Release Number
5.4.6
Published
November 2018
Language
English (United States)
Last Update
2018-12-07
dita:id
B035-2301
Product Category
Software

The most time-intensive part of the data mining process is arguably the creation of a data set from which to build an analytic model. The data in a relational data warehouse is typically not in a form suitable for direct input into a data mining algorithm. New variables may need to be created using formulas, aggregations, or expansions on specific values of a dimensioning variable. Tables may need to be joined, and relational tables may need to be denormalized or flattened. In addition, statistical transformations are often required, depending on the type of algorithm to be used and on the statistical properties of the data itself. These capabilities are referred to simply as Analytic Data Sets.
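As a rough illustration of the aggregation and dimensioning described above, the following sketch flattens a transaction table into one row per customer. It is not Teradata Warehouse Miner output: the table `transactions`, its columns, the channel values, and the function names are all hypothetical, and Python's built-in SQLite module stands in for the data warehouse.

```python
import sqlite3

def build_customer_variables(conn):
    """Create one row per customer from a (hypothetical) transactions table."""
    conn.execute("""
        CREATE TABLE customer_vars AS
        SELECT
            cust_id,
            COUNT(*)    AS txn_count,      -- aggregation
            SUM(amount) AS total_amount,   -- aggregation
            -- expansion on specific values of the dimensioning variable "channel"
            SUM(CASE WHEN channel = 'web'   THEN amount ELSE 0.0 END) AS web_amount,
            SUM(CASE WHEN channel = 'store' THEN amount ELSE 0.0 END) AS store_amount
        FROM transactions
        GROUP BY cust_id
    """)
    return conn.execute(
        "SELECT * FROM customer_vars ORDER BY cust_id"
    ).fetchall()

def demo():
    # In-memory database with a few illustrative rows (assumed data).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (cust_id INTEGER, channel TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [(1, "web", 10.0), (1, "store", 5.0), (1, "web", 2.5), (2, "store", 20.0)],
    )
    rows = build_customer_variables(conn)
    conn.close()
    return rows
```

Each dimension value becomes its own column, so the result is a denormalized table with one observation per customer, which is the shape most mining algorithms expect.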

Several types of analysis may be involved in building an Analytic Data Set, or ADS:
  • A Variable Creation analysis accesses one or more tables or views and provides expression building and dimensioning to define new variable columns and place them in a table or view.
  • A Variable Transformation analysis applies the requested data mining transformation functions to the columns of a single table or view and creates a transformed table.
  • A Build ADS analysis joins the tables or views created by one or more Variable Creation and/or Variable Transformation analyses, allowing column selection and the application of expert WHERE clause constraints. A Variable Creation or Join analysis can perform the same functions, but Build ADS is simpler than Variable Creation and, unlike Join, can operate on a single table.
  • A Refresh analysis is provided to allow re-executing a chain of ADS and/or Reorganization analyses with temporary variations in various parameter values.
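The way these analyses chain together can be sketched as follows. This is a conceptual analogy only, assuming SQLite in place of Teradata; the function names, table names (`vc_spend`, `vt_spend`, `ads`), the z-score transformation, and the sample data are hypothetical and do not reflect the SQL that Teradata Warehouse Miner actually generates.

```python
import sqlite3

def variable_creation(conn):
    # Variable Creation step: derive a new variable column by aggregation.
    conn.execute("""
        CREATE TABLE vc_spend AS
        SELECT cust_id, SUM(amount) AS total_spend
        FROM transactions
        GROUP BY cust_id
    """)

def variable_transformation(conn):
    # Variable Transformation step: z-score the column of a single table
    # using the population standard deviation.
    mean, mean_sq = conn.execute(
        "SELECT AVG(total_spend), AVG(total_spend * total_spend) FROM vc_spend"
    ).fetchone()
    std = max(mean_sq - mean * mean, 0.0) ** 0.5
    if std == 0.0:
        std = 1.0  # constant column: avoid dividing by zero
    conn.execute("CREATE TABLE vt_spend (cust_id INTEGER, spend_z REAL)")
    conn.execute(
        "INSERT INTO vt_spend SELECT cust_id, (total_spend - ?) / ? FROM vc_spend",
        (mean, std),
    )

def build_ads(conn, where_clause="1 = 1"):
    # Build ADS step: join the two outputs, select columns, and apply a
    # WHERE constraint. The clause is inserted as-is, so it must be trusted.
    conn.execute("DROP TABLE IF EXISTS ads")
    conn.execute(f"""
        CREATE TABLE ads AS
        SELECT c.cust_id, c.total_spend, t.spend_z
        FROM vc_spend AS c
        JOIN vt_spend AS t ON t.cust_id = c.cust_id
        WHERE {where_clause}
    """)
    return conn.execute("SELECT * FROM ads ORDER BY cust_id").fetchall()

def demo(where_clause="c.total_spend >= 20"):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (cust_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [(1, 4.0), (1, 6.0), (2, 20.0), (3, 30.0)],
    )
    variable_creation(conn)
    variable_transformation(conn)
    rows = build_ads(conn, where_clause)
    conn.close()
    return rows
```

Calling `demo()` again with a different `where_clause` re-runs the whole chain with one parameter temporarily changed, which is loosely the role a Refresh analysis plays.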
Identity columns (i.e., columns defined with the attribute “GENERATED … AS IDENTITY”) cannot be analyzed by Analytic Data Set functions that create an output or temporary table.