The teradataml open-source machine learning module currently exposes the scikit-learn package dynamically through the interface object td_sklearn. The current implementation exposes about 94 percent of the classes and about 91 percent of the class methods that scikit-learn supports.
Use this module to run any scikit-learn function through the interface object, with the same syntax and arguments as in scikit-learn. The teradataml open-source machine learning functions also let you:
- Load and deploy scikit-learn models (models generated by teradataml OpenSourceML as well as external models)
- Generate classification and regression metrics
With td_sklearn, you can run any scikit-learn function inside Vantage, where the data resides, without any data transfer, while using Vantage's Massively Parallel Processing (MPP) capabilities. You do not have to worry about usage or function syntax. For ease of use, teradataml td_sklearn supports the following syntaxes:
- Syntax 1: The well-known scikit-learn function syntax, in which the arguments X and y are passed.
- Syntax 2: As an alternative to the legacy arguments X and y, Teradata introduces another set of arguments: data, feature_columns, label_columns, and group_columns.
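The relationship between the two syntaxes can be pictured with a small, library-free sketch. This is not teradataml code and makes no claim about how td_sklearn works internally. The helper to_X_y and the sample rows are hypothetical, introduced only to show how a table plus the column-name arguments of Syntax 2 can describe the same inputs as the X and y arguments of Syntax 1.

```python
# Library-free illustration. to_X_y and the sample rows are hypothetical;
# they are not part of teradataml or scikit-learn.

def to_X_y(data, feature_columns, label_columns):
    """Split row dictionaries into a feature matrix X and labels y."""
    # One inner list per row, holding only the feature columns.
    X = [[row[col] for col in feature_columns] for row in data]
    if len(label_columns) == 1:
        # A single label column becomes a flat list, like a 1-D y.
        y = [row[label_columns[0]] for row in data]
    else:
        # Several label columns become one inner list per row.
        y = [[row[col] for col in label_columns] for row in data]
    return X, y


# Hypothetical sample table: two feature columns and one label column.
data = [
    {"sqft": 1200, "rooms": 3, "price": 250000},
    {"sqft": 1500, "rooms": 4, "price": 310000},
    {"sqft": 900, "rooms": 2, "price": 180000},
]

# Syntax 2 style: pass the whole table and name the columns.
X, y = to_X_y(data, feature_columns=["sqft", "rooms"], label_columns=["price"])

# Syntax 1 style would receive X and y directly.
print(X)
print(y)
```

With Syntax 1, the caller prepares X and y before the call. With Syntax 2, the caller passes a single table and identifies the relevant columns by name, which can be more convenient when all columns live in one table.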
The following sections describe how to use teradataml's td_sklearn to run scikit-learn with the different syntaxes, generate classification and regression metrics, build a single model or distributed (multi-model) models through the partition_columns argument, and load and deploy scikit-learn models. They also cover supportability information, limitations, and considerations.
The examples use only specific scikit-learn functions, but the same logic applies to all other scikit-learn functions.