Preprocessing Input Data

Teradata Vantage™ - Bring Your Own Model User Guide

Product: Teradata Vantage
Release Number: 2.0
Release Date: October 2021
Content Type: User Guide
Publication ID: B700-1111-051K
Language: English (United States)

Before using your input data to create a model, you can transform the data with R or Python functions for PMML models, or with H2O transformations for MOJO models.

PMML Models

The functions transform the input data during model training as part of a pipeline. The generated model, stored in XML format, includes the preprocessing steps. During model prediction, the transformations are applied to the input data and the transformed data is scored by the PMML or MOJO model.
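The idea that preprocessing is learned at training time, stored with the model, and re-applied at prediction time can be sketched in plain Python. This is only an illustration under stated assumptions: the class names LabelCodes, MajorityByCode, and ToyPipeline are hypothetical and are not part of the pmml package, sklearn, or Vantage.

```python
from dataclasses import dataclass, field


@dataclass
class LabelCodes:
    """Hypothetical preprocessing step: category -> integer code, learned in fit()."""
    codes: dict = field(default_factory=dict)

    def fit(self, values):
        # Sort the distinct categories so the learned mapping is deterministic.
        self.codes = {v: i for i, v in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        # Reuse the codes learned during training; an unseen category raises KeyError.
        return [self.codes[v] for v in values]


@dataclass
class MajorityByCode:
    """Hypothetical toy model: predicts the most frequent training label per code."""
    table: dict = field(default_factory=dict)

    def fit(self, xs, ys):
        counts = {}
        for x, y in zip(xs, ys):
            per_code = counts.setdefault(x, {})
            per_code[y] = per_code.get(y, 0) + 1
        self.table = {x: max(c, key=c.get) for x, c in counts.items()}
        return self

    def predict(self, xs):
        return [self.table[x] for x in xs]


@dataclass
class ToyPipeline:
    """Bundles the preprocessing step with the model, as a stored pipeline would."""
    step: LabelCodes
    model: MajorityByCode

    def fit(self, raw, labels):
        # Training: fit the step on raw data, then fit the model on transformed data.
        self.model.fit(self.step.fit(raw).transform(raw), labels)
        return self

    def predict(self, raw):
        # Prediction: apply the stored transformation first, then score.
        return self.model.predict(self.step.transform(raw))


pipeline = ToyPipeline(LabelCodes(), MajorityByCode()).fit(
    ["red", "blue", "red", "green"],
    ["warm", "cool", "warm", "cool"],
)
predictions = pipeline.predict(["green", "red"])
print(predictions)
```

The caller passes raw values to predict() and never repeats the preprocessing by hand, which is the property the generated model file provides when it carries its own transformation steps.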

PMML supports the following input data transformations:

| Transformation | Description | R Function | Python Functions |
| --- | --- | --- | --- |
| Normalization | Scales continuous or discrete input values to a specified range. | xform_min_max | MinMaxScaler |
| Discretization | Maps continuous input values to discrete values. | xform_discretize | CutTransformer |
| Value Mapping | Maps discrete input values to other discrete values. | xform_map | StandardScaler, LabelEncoder |
| Function Mapping | Maps input values to values derived from applying a function. | xform_function | FunctionTransformer |
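To make the four transformation types concrete, the following plain-Python sketch shows what each one computes on a single value. The function names (min_max, discretize, map_value, apply_function), their parameters, and the sample values are hypothetical illustrations. They are not the signatures of the R or Python functions listed above.

```python
import bisect
import math


def min_max(value, data_min, data_max, new_min=0.0, new_max=1.0):
    """Normalization: rescale value from [data_min, data_max] to [new_min, new_max]."""
    fraction = (value - data_min) / (data_max - data_min)
    return new_min + fraction * (new_max - new_min)


def discretize(value, edges, labels):
    """Discretization: return the label of the interval containing value.

    edges holds the sorted cut points and labels has len(edges) + 1 entries.
    A value equal to a cut point falls into the interval on its right.
    """
    return labels[bisect.bisect_right(edges, value)]


def map_value(value, mapping, default=None):
    """Value mapping: replace one discrete value with another."""
    return mapping.get(value, default)


def apply_function(value, func):
    """Function mapping: derive the output by applying func to the input."""
    return func(value)


scaled = min_max(30, data_min=20, data_max=60)
bucket = discretize(17, edges=[18, 65], labels=["minor", "adult", "senior"])
flag = map_value("Y", {"Y": 1, "N": 0})
logged = apply_function(100, math.log10)
print(scaled, bucket, flag, logged)
```

In a real pipeline the parameters (the data minimum and maximum, the cut points, the mapping) are fixed during training and stored with the model rather than passed in at scoring time.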

The R functions are in the pmml package (https://cran.r-project.org/web/packages/pmml/index.html). Use the xform_wrap function to wrap your input data before passing it to an R transformation function.

R creates the PMML model using the function pmml::pmml() and inserts the transformations into the XML element LocalTransformations.

Python uses the libraries sklearn and sklearn_pandas to set up the pipeline for preprocessing transformations, and uses the DataFrameMapper class in the library sklearn_pandas to transform input data. For information about sklearn and sklearn_pandas, see https://scikit-learn.org.

For examples of PMML pipelines that preprocess input data, see PMML Models with Custom Transformations.

MOJO Models

H2O Driverless AI (DAI) provides a number of transformations.

The following transformers are available for regression and classification (multiclass and binary) experiments:
  • Numeric
  • Categorical
  • Time and Date
  • Time Series
  • NLP (text)
  • Image

For details on each type of transformation, refer to https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/transformations.html.