teradataml Open-Source Machine Learning Functions - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Last edition
2024-12-11

The Teradata Package for Python introduces teradataml open-source machine learning functions, referred to as teradataml OpenSourceML, which expose most of the functionality of open-source packages such as scikit-learn. With teradataml OpenSourceML, you can run these open-source packages without pulling the data to your client. It offers a simple interface object for the open-source packages, so that their functions and classes can be used with the same syntax and arguments as in the packages themselves.

Functions and classes from open-source packages generate a single model that is trained on all the data. Unlike the traditional open-source packages, teradataml OpenSourceML lets you generate distributed models, also known as multiple models or micro models.

Combined with the massively parallel processing (MPP) architecture that Vantage provides, teradataml OpenSourceML can address a large set of use cases where distributed models are needed. To enable this support, teradataml OpenSourceML introduces the partition_columns argument, which can be used in all functions. partition_columns accepts the column by which to partition the data; the models are then generated for the partitioned data.
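To make the idea of distributed (micro) models concrete, here is a minimal local sketch in plain Python. It does not use teradataml or scikit-learn, and it is not how Vantage executes the work. The helper names fit_line and fit_per_partition and the sample rows are hypothetical, chosen only to illustrate fitting one model per value of a partition column rather than a single model on all the data.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # constant x: fall back to a flat line
    return slope, y_bar - slope * x_bar


def fit_per_partition(rows, partition_column, feature, target):
    """Hypothetical helper: group rows by partition_column, fit one model per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_column]].append(row)
    return {
        key: fit_line([r[feature] for r in group], [r[target] for r in group])
        for key, group in groups.items()
    }


# Made-up sample data: two regions with opposite trends.
rows = [
    {"region": "east", "x": 1.0, "y": 2.0},
    {"region": "east", "x": 2.0, "y": 4.0},
    {"region": "east", "x": 3.0, "y": 6.0},
    {"region": "west", "x": 1.0, "y": 5.0},
    {"region": "west", "x": 2.0, "y": 4.0},
    {"region": "west", "x": 3.0, "y": 3.0},
]

# One (slope, intercept) pair per region instead of a single global model.
models = fit_per_partition(rows, "region", "x", "y")
for region, (slope, intercept) in sorted(models.items()):
    print(f"{region}: slope={slope:.2f}, intercept={intercept:.2f}")
```

A single model trained on all six rows would average away the two opposite trends, whereas the per-partition models capture each region separately. This is the kind of use case the partition_columns argument is meant to serve, with the per-partition work handled inside Vantage rather than on the client.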

Supported open-source packages:
  • scikit-learn

The following topics provide setup, module, and usage information: