PMML Models
The transformation functions preprocess the input data during model training as part of a pipeline. The generated model, stored in XML format, includes the preprocessing steps. During prediction, the same transformations are applied to the input data, and the transformed data is then scored by the PMML or MOJO model.
PMML supports the following input data transformations:
Transformation | Description | R Function | Python Function |
---|---|---|---|
Normalization | Scales continuous or discrete input values to a specified range. | xform_min_max | MinMaxScaler |
Discretization | Maps continuous input values to discrete values. | xform_discretize | CutTransformer |
Value Mapping | Maps discrete input values to other discrete values. | xform_map | StandardScaler, LabelEncoder |
Function Mapping | Maps input values to values derived from applying a function. | xform_function | FunctionTransformer |
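To make the four transformation types concrete, the following is a minimal pure-Python sketch of what each one computes. It does not use the R or Python library functions listed in the table; the helper names (min_max_normalize, discretize, map_values, apply_function) and the sample data are hypothetical and exist only for illustration.

```python
from bisect import bisect_right


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All inputs are identical, so map everything to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def discretize(values, boundaries, labels):
    """Discretization: map each continuous value to the label of its bin.

    boundaries are sorted cut points; a value equal to a cut point falls
    into the bin above it. len(labels) must equal len(boundaries) + 1.
    """
    return [labels[bisect_right(boundaries, v)] for v in values]


def map_values(values, mapping, default=None):
    """Value mapping: replace each discrete value using a lookup table."""
    return [mapping.get(v, default) for v in values]


def apply_function(values, func):
    """Function mapping: derive a new value by applying func to each input."""
    return [func(v) for v in values]


# Hypothetical sample data.
ages = [20, 35, 50, 80]
normalized = min_max_normalize(ages)
age_groups = discretize(ages, [30, 60], ["young", "middle", "senior"])
codes = map_values(["M", "F", "M"], {"M": 1, "F": 0})
squared = apply_function([1, 2, 3], lambda v: v * v)
```

In a real pipeline, the library functions in the table perform these steps and also record them in the model so that they can be replayed at prediction time.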
The R functions are in the pmml package (https://cran.r-project.org/web/packages/pmml/index.html). Use the xform_wrap function to wrap your input data before passing it to an R transformation function.
R creates the PMML model using the function pmml::pmml() and inserts the transformations into the XML element LocalTransformations.
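As a rough way to inspect which transformations ended up in a model, the sketch below uses Python's standard library to list the DerivedField entries under LocalTransformations. The PMML fragment is hand-written for illustration and is not actual pmml::pmml() output; the PMML 4.4 namespace, the field names, and the model type are assumptions, and a real file may use a different PMML version.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML fragment (not generated by pmml::pmml()).
PMML_FRAGMENT = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <RegressionModel functionName="regression">
    <LocalTransformations>
      <DerivedField name="age_scaled" optype="continuous" dataType="double">
        <NormContinuous field="age">
          <LinearNorm orig="20" norm="0"/>
          <LinearNorm orig="80" norm="1"/>
        </NormContinuous>
      </DerivedField>
    </LocalTransformations>
  </RegressionModel>
</PMML>"""

# Assumed namespace; adjust it to match the xmlns of the actual model file.
NS = {"pmml": "http://www.dmg.org/PMML-4_4"}

root = ET.fromstring(PMML_FRAGMENT)

# Collect (derived field name, transformation element) pairs.
derived_fields = [
    (field.get("name"), field[0].tag.split("}")[1])
    for field in root.findall(
        ".//pmml:LocalTransformations/pmml:DerivedField", NS
    )
]
print(derived_fields)
```

Here the first child of each DerivedField identifies the kind of transformation, for example NormContinuous for a normalization.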
Python uses the libraries sklearn and sklearn_pandas to set up the pipeline for preprocessing transformations, and uses the DataFrameMapper class in the library sklearn_pandas to map transformations to the input data columns. For information about sklearn and sklearn_pandas, see https://scikit-learn.org.
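The following is a minimal sketch of the scikit-learn side of such a pipeline, assuming scikit-learn and NumPy are installed. To stay small it leaves out both DataFrameMapper and the PMML export step, and applies the transformers directly to hypothetical sample arrays; in a full pipeline, DataFrameMapper would assign each transformer to its DataFrame column.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, LabelEncoder, MinMaxScaler

# Normalization followed by function mapping on a numeric column.
numeric_steps = Pipeline([
    ("scale", MinMaxScaler()),               # rescale to [0, 1]
    ("log", FunctionTransformer(np.log1p)),  # then apply log(1 + x)
])

# Hypothetical sample data: one numeric column with three rows.
ages = np.array([[20.0], [50.0], [80.0]])
ages_transformed = numeric_steps.fit_transform(ages)

# Value mapping on a categorical column: labels become integer codes,
# assigned in sorted order of the distinct labels.
color_encoder = LabelEncoder()
color_codes = color_encoder.fit_transform(["red", "green", "blue", "green"])
```

Because the fitted transformers keep their learned parameters (for example, the minimum and maximum seen by MinMaxScaler), the same steps can be replayed on new data at prediction time.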
For examples of PMML pipelines that preprocess input data, see PMML Models with Custom Transformations.
MOJO Models
H2O Driverless AI (DAI) provides the following types of transformations:
- Numeric
- Categorical
- Time and Date
- Time Series
- NLP (text)
- Image
For details on each type of transformation, refer to https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/transformations.html.