The Teradata Package for Python introduces teradataml open-source machine learning functions, referred to as teradataml OpenSourceML, which expose most of the functionality of open-source packages such as scikit-learn. With these functions, you can run the open-source packages without pulling the data to your client. teradataml OpenSourceML offers a simple interface object for the open-source packages, so their functions and classes can be used with the same syntax and arguments as in the packages themselves.
Functions and classes from open-source packages generate a single model that is trained on all the data. With teradataml OpenSourceML, by contrast, you can also generate distributed models, also known as multiple models or micro models.
Combined with the massively parallel processing (MPP) architecture that Vantage provides, teradataml OpenSourceML can address a large set of use cases where distributed models are needed. To enable this support, teradataml OpenSourceML introduces the partition_columns argument, which can be used in all functions. partition_columns accepts the column used to partition the data, and models are generated for the partitioned data.
- scikit-learn
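
The difference between a single model and distributed (micro) models can be illustrated locally. The following is a minimal sketch in plain Python, not the teradataml API: the `fit_line` helper, the sample rows, and the `region` partition column are all hypothetical and exist only for illustration. It fits one least-squares line on all rows, then one line per partition, which is conceptually what partitioning the data by a column and generating a model for each partition means.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept (hypothetical helper)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical rows: (region, x, y). "region" plays the role of the partition column.
rows = [
    ("east", 1.0, 2.0), ("east", 2.0, 4.0), ("east", 3.0, 6.0),
    ("west", 1.0, 5.0), ("west", 2.0, 4.0), ("west", 3.0, 3.0),
]

# Single model: trained once on all the data, ignoring the partition column.
global_model = fit_line([(x, y) for _, x, y in rows])

# Micro models: group the rows by the partition column, then train one model per group.
partitions = defaultdict(list)
for region, x, y in rows:
    partitions[region].append((x, y))

micro_models = {region: fit_line(points) for region, points in partitions.items()}

print("single model:", global_model)
for region, model in sorted(micro_models.items()):
    print(region, "model:", model)
```

In this toy data the two regions follow opposite trends, so the single model averages them into a line that describes neither region well, while each micro model recovers its own partition's trend. In this sketch every partition is processed on the client, one after another; it does not show how Vantage distributes the work.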