PMML Scoring (Teradata Database) - Teradata Warehouse Miner

Teradata® Warehouse Miner™ User Guide - Volume 2 ADS Generation

Product

Teradata Warehouse Miner

Release Number

5.4.6

Published

November 2018

Language

English (United States)

Last Update

2018-12-07

dita:id

B035-2301

Product Category

Software

Predictive Model Markup Language (PMML) is an XML standard being developed by the Data Mining Group, a vendor-led consortium established in 1998 to develop data-mining standards. Teradata (at that time NCR) co-developed the initial PMML specification along with Angoss, Magnify, SPSS and The National Center for Data Mining at the University of Illinois at Chicago.

PMML enables predictive models to be defined and then shared between applications. It represents and describes data mining and statistical models, as well as some of the operations required for cleaning and transforming data prior to modeling. PMML aims to provide enough infrastructure for one application to produce a model (the PMML producer) and another application to consume it (the PMML consumer) simply by reading the PMML data file. This means that a model developed in a desktop data-mining tool can be deployed to, and scored against, an entire data warehouse.

The following table lists the major constructs of PMML-compliant XML documents.

PMML-Compliant XML Document Constructs
Feature	Function
Data Dictionary	Defines the data to the model and specifies each data attribute’s type and value range.
Mining Schema	Defines attribute information specific to a certain model. It specifies an attribute's usage type, whether it be active or independent (an input of the model), predicted or dependent (an output of the model), or supplementary (descriptive information that is ignored by the model).
Transformation Dictionary	Contains simple algorithm-specific data transformations such as normalization (map values to numbers), discretization (map continuous values to discrete values), value mapping (map discrete values to discrete values) and aggregation (simple averages and counts).
Models	Identifies model parameters for regression models, cluster models, decision tree models, neural networks, Bayesian models, association rules and sequence models.
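
The constructs in the table can be illustrated with a small consumer written in Python. This is a minimal sketch, not the Teradata implementation: the PMML fragment is hand-written and simplified, the field names (income, age, spend), the coefficients, and the helper functions read_model and score are hypothetical, and only a linear regression model is handled.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified PMML fragment (hypothetical fields and coefficients).
PMML_TEXT = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="hypothetical example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="income" usageType="active"/>
      <MiningField name="age" usageType="active"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="income" coefficient="0.5"/>
      <NumericPredictor name="age" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_4"}


def read_model(pmml_text):
    """Return (data_types, usage, intercept, coefficients) from the fragment."""
    root = ET.fromstring(pmml_text)
    # Data Dictionary: field name -> data type
    data_types = {
        f.get("name"): f.get("dataType")
        for f in root.findall("p:DataDictionary/p:DataField", NS)
    }
    model = root.find("p:RegressionModel", NS)
    # Mining Schema: field name -> usage type (active when not stated)
    usage = {
        f.get("name"): f.get("usageType", "active")
        for f in model.findall("p:MiningSchema/p:MiningField", NS)
    }
    # Model parameters: intercept plus one coefficient per numeric predictor
    table = model.find("p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("p:NumericPredictor", NS)
    }
    return data_types, usage, intercept, coefficients


def score(row, intercept, coefficients):
    """Apply the linear regression to one row given as a dict of inputs."""
    return intercept + sum(coef * row[name] for name, coef in coefficients.items())


data_types, usage, intercept, coefficients = read_model(PMML_TEXT)
active_fields = [name for name, kind in usage.items() if kind == "active"]
prediction = score({"income": 100.0, "age": 30.0}, intercept, coefficients)
print(active_fields, prediction)
```

In this sketch the consumer needs nothing but the PMML text to learn which inputs the model expects and how to combine them, which is the producer/consumer separation that PMML is designed to provide.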

Each PMML construct supports a mechanism for extending the content of a model. Liberal use of such “extensions” requires that vendors who produce PMML-based models collaborate closely with vendors who wish to consume that PMML. Refer to the Teradata Warehouse Miner Release Definition document, B035-2494, for details about the products and product versions supported for PMML consumption in Teradata ADS Generator and Teradata Warehouse Miner.

Although PMML is a great step forward, it has limitations beyond its reliance on extensions, chiefly its incomplete encapsulation of the process of cleaning, transforming and aggregating data. Teradata recognized this limitation early on: if the PMML document could not represent the analytic variables that were input to the analytic tools, it would be nearly impossible to consume PMML for scoring predictive models. This is because deploying (scoring) a predictive model requires the same variables on which the model was built. For this reason, the PMML Scoring analysis is included in both Teradata ADS Generator and Teradata Warehouse Miner.
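
The requirement that scoring data contain the same variables as the model can be expressed as a simple pre-scoring check. The following Python sketch is illustrative only: the function missing_model_inputs and the field and column names are hypothetical, and names are compared exactly as written.

```python
def missing_model_inputs(required_fields, available_columns):
    """Return the model inputs that are absent from the scoring data, in order."""
    available = set(available_columns)
    return [name for name in required_fields if name not in available]


# Hypothetical model inputs and scoring-table columns.
required = ["income", "age", "tenure_months"]
columns = ["customer_id", "income", "age"]

missing = missing_model_inputs(required, columns)
if missing:
    print("Cannot score; missing variables:", ", ".join(missing))
```

A model built on a derived variable such as tenure_months can be scored only if that variable is rebuilt in the deployment environment in the same way it was built for modeling.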