td_sklearn | teradataml open-source machine learning functions

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2024
Language: English (United States)
Last Update: 2024-04-09
Product Category: Teradata Vantage

Currently, the teradataml open-source machine learning module exposes the scikit-learn package dynamically through an interface object, td_sklearn. The current implementation exposes around 94 percent of the classes and around 91 percent of the class methods that scikit-learn supports.

This module lets you run a supported scikit-learn function with the same syntax and arguments that you already know. You can use teradataml open-source machine learning functions for the following:
  • Single-model training and scoring
  • Distributed-model (multi-model / micro-model) training and scoring with the partition_columns argument
  • Loading and deploying scikit-learn models (models generated by teradataml OpenSourceML as well as external models)
  • Classification and regression metrics
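To make the distributed-model (micro-model) idea concrete, the following is a minimal conceptual sketch in plain Python, not teradataml's implementation. It groups rows by a partition column and fits one independent model per group. The helper names `fit_line` and `fit_per_partition`, the sample rows, and the column names are all hypothetical, and a hand-written least-squares line stands in for a scikit-learn estimator.

```python
from collections import defaultdict


def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares.

    Assumes at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_partition(rows, partition_column, feature_column, label_column):
    """Train one model per distinct value of partition_column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_column]].append(
            (row[feature_column], row[label_column])
        )
    # Each partition gets its own, independently fitted model.
    return {key: fit_line(points) for key, points in groups.items()}


# Hypothetical sample data: two partitions with different trends.
rows = [
    {"region": "east", "x": 1.0, "y": 3.0},
    {"region": "east", "x": 2.0, "y": 5.0},
    {"region": "east", "x": 3.0, "y": 7.0},
    {"region": "west", "x": 1.0, "y": 0.5},
    {"region": "west", "x": 2.0, "y": 0.0},
    {"region": "west", "x": 3.0, "y": -0.5},
]

models = fit_per_partition(rows, "region", "x", "y")
for region, (slope, intercept) in sorted(models.items()):
    print(region, slope, intercept)
```

Because each partition is fitted independently, the groups can be processed in parallel, which is what makes this pattern a natural fit for an MPP system.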

With td_sklearn, you can run scikit-learn functions inside Vantage, where the data resides, without transferring any data, and take advantage of its Massively Parallel Processing (MPP) capabilities. You do not have to learn new usage patterns or function syntax. For ease of use, teradataml OpenSourceML supports the following syntaxes:

  • Syntax 1: The well-known scikit-learn function syntax, in which the arguments X and y are passed.
  • Syntax 2: As an alternative to the legacy arguments X and y, Teradata introduces another set of arguments: data, feature_columns, label_columns, and group_columns.
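The two syntaxes can be sketched side by side as follows. This is an illustrative sketch rather than a definitive recipe: the helper `fit_with_both_syntaxes` is hypothetical, LinearRegression is only one example estimator, and the interface object is passed in as a parameter so the function can be read without a Vantage connection. In real use, the `osml` argument is assumed to be `td_sklearn` imported from teradataml, and `df` a teradataml DataFrame.

```python
def fit_with_both_syntaxes(osml, df, feature_columns, label_columns):
    """Fit the same estimator twice, once per supported syntax.

    osml            -- the td_sklearn interface object (assumed)
    df              -- a teradataml DataFrame holding features and labels
    feature_columns -- list of feature column names
    label_columns   -- list of label column names
    """
    # Syntax 1: scikit-learn style, separate X and y DataFrames.
    model_xy = osml.LinearRegression()
    model_xy.fit(X=df.select(feature_columns), y=df.select(label_columns))

    # Syntax 2: one DataFrame plus the names of the columns to use.
    model_cols = osml.LinearRegression()
    model_cols.fit(
        data=df,
        feature_columns=feature_columns,
        label_columns=label_columns,
    )
    return model_xy, model_cols


# Typical call, assuming an active Vantage connection (not run here):
#   from teradataml import DataFrame, td_sklearn as osml
#   df = DataFrame("housing_train")  # hypothetical table name
#   fit_with_both_syntaxes(osml, df, ["sqft", "rooms"], ["price"])
```

Both calls are expected to train an equivalent model. Syntax 2 avoids building separate X and y objects when the features and labels already live in the same table.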

The following sections describe how to use teradataml open-source machine learning functions to run scikit-learn with the different syntaxes, generate classification and regression metrics, train single models and distributed (multi-model) models through the partition_columns argument, and load and deploy scikit-learn models. They also cover supportability information, limitations, and considerations.

The examples use only specific scikit-learn functions, but the same logic applies to all other scikit-learn functions.