Overview of Teradata Warehouse Miner

Teradata Warehouse Miner User Guide - Volume 1: Introduction and Profiling

Data warehousing has become a required component of business information technology today. In recent years, data mining has become a key aspect of decision support and customer relationship management applications built on top of the data warehouse and a crucial component in exploiting their inherent value. The Teradata Relational Database Management System (RDBMS) software is the leading technology available today for building data warehouses. Although Teradata data warehouses span a wide range of system sizes, from entry level servers to the largest massively parallel data warehouses in the world, they have in common unparalleled decision support performance and scalability.

Teradata Warehouse Miner is software that allows users to perform data mining entirely within a Teradata warehouse. This represents a dramatic shift from past non-warehouse-resident data mining architectures: Teradata Warehouse Miner users perform data mining without the additional hardware, software, and associated data management processes those architectures require. Additionally, the product is separated into three distinct offerings, giving different types of Teradata users the functionality they need to perform data profiling, analytic data set creation, and model building, scoring, and evaluation.

The first of these three offerings is the Teradata Profiler. Its components were developed to provide a comprehensive data profiling tool that requires no movement of data outside the warehouse, uses as much of the data as desired, stores results directly in the database, and uses the parallel, scalable processing power of Teradata to perform data-intensive operations. A wide variety of descriptive statistics is available to generate reports and graphics with drill-down capabilities that point out potential data quality issues.
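
The idea of profiling a column without moving its data out of the database can be illustrated with a minimal sketch. It is not the SQL that Teradata Profiler generates. It assumes an in-memory SQLite database as a stand-in for a Teradata connection, and the table `customer`, the column `income`, and the helper `profile_column` are hypothetical names introduced only for this example. The statistics are computed by the database engine, and only a single summary row is returned to the client.

```python
import sqlite3


def profile_column(conn, table, column):
    """Compute basic descriptive statistics for one column inside the database.

    Hypothetical helper for illustration only. Identifiers cannot be bound
    as query parameters, so pass only trusted table and column names.
    """
    sql = f'''
        SELECT COUNT(*)                     AS row_count,
               COUNT(*) - COUNT("{column}") AS null_count,
               COUNT(DISTINCT "{column}")   AS distinct_count,
               MIN("{column}")              AS min_value,
               MAX("{column}")              AS max_value,
               AVG("{column}")              AS mean_value
        FROM "{table}"
    '''
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    # One summary row comes back; the detail rows never leave the database.
    return dict(zip(names, cur.fetchone()))


# Assumed sample data: SQLite stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER, income REAL)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, 40000.0), (2, 55000.0), (3, None), (4, 55000.0)],
)

stats = profile_column(conn, "customer", "income")
print(stats)
```

A nonzero `null_count`, or a `distinct_count` far below `row_count`, is the kind of signal a profiling report would surface as a potential data quality issue.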

The second offering is the Teradata Data Set Builder for SAS (also known as the Teradata Analytic Data Set (ADS) Generator). It includes all of the components of Teradata Profiler, plus analyses that aid in the generation of Analytic Data Sets, analyses that can build and export a Correlation Matrix (and related matrix types), an analysis to score models described in the Predictive Model Markup Language (PMML), and an analysis to publish analytic data sets and/or models for deployment through the Teradata Model Manager web-based application.

The need to build Analytic Data Sets derives from the fact that data stored in the highly normalized data models within the warehouse is not suited for mining directly. A precursor to the creation of analytic models is therefore the creation of an Analytic Data Set (ADS), which is a denormalized data structure often referred to as being in “observation format.” That is, the analytic algorithms need the data presented in a flat structure in which all of the variables are present for each entity (customer, household, account, etc.) being modeled. Teradata ADS Generator has components that aid in the creation of these variables, dimensioning or denormalizing them, as well as statistically transforming them. Additional components allow tables to be sampled, partitioned, denormalized, and joined together.
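
A small sketch can make "observation format" concrete. It is an illustration under stated assumptions, not the output of Teradata ADS Generator: SQLite stands in for the warehouse, and the tables `customer` and `txn`, their columns, and the result table `customer_ads` are all hypothetical. A normalized transaction table is aggregated to one row per customer, and the amount variable is dimensioned by channel so that each customer carries the same set of variables.

```python
import sqlite3

# Assumed stand-in for the warehouse; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE txn (cust_id INTEGER, channel TEXT, amount REAL);

    INSERT INTO customer VALUES (1, 34), (2, 51), (3, 27);
    INSERT INTO txn VALUES
        (1, 'web', 20.0), (1, 'store', 35.0), (1, 'web', 5.0),
        (2, 'store', 100.0);

    -- Build the flat table: one row per customer, every variable present.
    -- The CASE expressions "dimension" the amount by channel.
    CREATE TABLE customer_ads AS
    SELECT c.cust_id,
           c.age,
           COUNT(t.cust_id)           AS txn_count,
           COALESCE(SUM(t.amount), 0) AS total_amount,
           COALESCE(SUM(CASE WHEN t.channel = 'web'
                             THEN t.amount END), 0) AS web_amount,
           COALESCE(SUM(CASE WHEN t.channel = 'store'
                             THEN t.amount END), 0) AS store_amount
    FROM customer c
    LEFT JOIN txn t ON t.cust_id = c.cust_id
    GROUP BY c.cust_id, c.age;
""")

ads = conn.execute("SELECT * FROM customer_ads ORDER BY cust_id").fetchall()
for row in ads:
    print(row)
```

The `LEFT JOIN` keeps customers with no transactions in the result, and `COALESCE` fills their variables with zero, so the algorithm sees a complete row for every entity being modeled.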

The third offering is Teradata Warehouse Miner. In addition to the features of the Teradata Profiler and ADS Generator, it provides analytic algorithms that make binomial, multinomial and continuous predictions via Logistic Regression, Decision Tree and Linear Regression algorithms. Dimensionality reduction is offered with several flavors of Factor Analysis, while the Clustering algorithm provides an interactive mechanism for customer segmentation and for solving various business problems where similar grouping is desired. The Association Rules algorithm with optional Sequence Analysis provides a solution for problems such as market basket analysis and channel usage analysis. With the exception of Association Rules, all of the models produced by the algorithms can be evaluated and scored using features of the product. Finally, the third offering includes a collection of 17 Statistical Tests, including various Binomial, Kolmogorov-Smirnov, Parametric, Rank and Contingency Table tests.