5.4.5 - Overview of Analytic Algorithms - Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 3: Analytic Functions

Product: Teradata Warehouse Miner
Release Number: 5.4.5
Published: February 2018
Language: English (United States)
Last Update: 2018-05-04
This chapter applies only to an instance of Teradata Warehouse Miner operating on a Teradata database.
Teradata Warehouse Miner contains several analytic algorithms drawn from both traditional statistics and machine learning. These algorithms pertain to the exploratory data analysis (EDA) and model-building phases of the data mining process. Alongside them, Teradata Warehouse Miner provides corresponding model scoring and evaluation functions that support the model evaluation and deployment phases. The algorithms offered are summarized briefly as follows:
  • Linear Regression — Linear regression can be used to predict or estimate the value of a continuous numeric data element based upon a linear combination of other numeric data elements present for each observation.
  • Logistic Regression — Logistic regression can be used to predict or estimate a two-valued variable based upon other numeric data elements present for each observation.
  • Factor Analysis — Factor analysis is a collective term for a family of techniques. In general, factor analysis can be used to identify, quantify, and re-specify the common and unique sources of variability in a set of numeric variables. One of its many applications allows an analytical modeler to reduce the number of numeric variables needed to describe a collection of observations by creating new variables, called factors, as linear combinations of the original variables.
  • Decision Trees — Decision trees, or rule induction, can be used to predict or estimate the value of a multi-valued variable based upon other categorical and continuous numeric data elements, by building decision rules from splits on specific data values and presenting those rules graphically in the shape of a tree.
  • Clustering — Cluster analysis can be used to form multiple groups of observations, such that each group contains observations that are very similar to one another, based upon values of multiple numeric data elements.
  • Association Rules — Association rule mining generates rules together with various measures of their frequency, relationship, and statistical significance. These rules can be general, or can have a time dimension associated with them.
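To make the first of these concrete: simple linear regression fits the line that minimizes the sum of squared errors between predicted and observed values. The sketch below is a minimal, illustrative pure-Python version of that least-squares fit for one predictor; it is not Teradata Warehouse Miner's implementation, which runs in-database against Teradata tables, and the function and variable names are the author's own.

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor.

    Uses the closed-form solution: the slope is the covariance of x and y
    divided by the variance of x, and the intercept places the line
    through the point of means.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 1, so the fit recovers those coefficients.
slope, intercept = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
```

A multiple linear regression, as described in the bullet above, generalizes this to a linear combination of several predictors, typically solved via the normal equations or a matrix decomposition rather than the one-variable formula shown here.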