The teradataml package runs on the client system and is designed for data management, exploration, and execution of analytic functions. It provides:
- Utility and database management functions
- Data exploration and preparation functions
- Analytic functions across Teradata Vantage
These functions support high-speed analytics processing required to operationalize analytics and automate data partitioning and parallel processing in Vantage.
teradataml, SQLAlchemy, and pandas
For Python users familiar with the pandas package, the teradataml package builds on the concept and syntax of the pandas DataFrame object by providing the teradataml DataFrame object for data residing on Teradata Vantage.
The look and feel of a teradataml DataFrame is similar to that of a pandas DataFrame. A teradataml DataFrame is a client-side reference, held in the Python session, to a database object; it can represent a table, view, or query in the SQL Engine.
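As a minimal sketch of what such a reference looks like in practice, the function below creates one teradataml DataFrame from a table and one from a query. It assumes that `create_context`, `DataFrame`, and `DataFrame.from_query` behave as described in the teradataml documentation; the function name `open_vantage_frames`, the table name `sales`, and the SQL text are hypothetical and chosen only for illustration.

```python
def open_vantage_frames(host, username, password):
    """Connect to Vantage and return two teradataml DataFrame references."""
    # Imported inside the function so the sketch can be read and loaded
    # on a machine where teradataml is not installed.
    from teradataml import DataFrame, create_context

    # Connection details are supplied by the caller (hypothetical values).
    create_context(host=host, username=username, password=password)

    # Reference to an existing table or view; "sales" is a hypothetical name.
    table_df = DataFrame("sales")

    # Reference to a query; the SQL text is illustrative only.
    query_df = DataFrame.from_query(
        "SELECT region, amount FROM sales WHERE amount > 20"
    )

    return table_df, query_df
```

Both returned objects are references held on the client; the rows they describe stay in the database until they are explicitly requested.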
The teradataml library provides an API to access or manipulate a teradataml DataFrame. The API functions generate a SQL request that is executed in the SQL Engine or ML Engine through a Python DB-API connection. The teradataml package uses teradatasqlalchemy, an implementation of SQLAlchemy’s Dialect interface, to provide enhanced support for rendering SQL. The teradatasqlalchemy dialect in turn uses the teradatasql DB-API implementation to connect to Teradata Vantage.
The teradataml DataFrame supports lazy evaluation for a variety of operations on Teradata Vantage such as exploration, transformation, and machine learning. Only a subset of data is ever retrieved from Teradata Vantage, unless the user explicitly requests data to be transferred to the client.
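The general idea of lazy evaluation over a DB-API connection can be illustrated with a toy example. This is not how teradataml is implemented: the `LazyFrame` class and its methods are hypothetical, and the standard-library `sqlite3` module stands in for the teradatasql DB-API driver. Each operation only records what to do and returns a new reference; SQL is rendered and executed only when rows are requested.

```python
import sqlite3


class LazyFrame:
    """Toy reference to a table; it holds no rows itself (hypothetical class)."""

    def __init__(self, conn, source, columns="*", filters=()):
        self._conn = conn            # any DB-API 2.0 connection
        self._source = source        # table or view name
        self._columns = columns      # column list as SQL text
        self._filters = tuple(filters)

    def select(self, *columns):
        # Returns a new reference; nothing is sent to the database.
        return LazyFrame(self._conn, self._source, ", ".join(columns), self._filters)

    def where(self, condition):
        # Records the condition; still no database round trip.
        return LazyFrame(
            self._conn, self._source, self._columns, self._filters + (condition,)
        )

    def render_sql(self):
        # Builds the SQL text from the recorded operations.
        sql = f"SELECT {self._columns} FROM {self._source}"
        if self._filters:
            sql += " WHERE " + " AND ".join(self._filters)
        return sql

    def fetch(self):
        # The only method that executes SQL and transfers rows to the client.
        cursor = self._conn.cursor()
        cursor.execute(self.render_sql())
        return cursor.fetchall()


# In-memory database standing in for a remote system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 25), ("east", 40)],
)

frame = LazyFrame(conn, "sales").where("amount > 20").select("region", "amount")
query = frame.render_sql()   # SQL text only; no rows retrieved yet
rows = frame.fetch()         # explicit request for the data
```

Chaining `where` and `select` costs nothing on the database side in this sketch; only `fetch` runs the accumulated query, which mirrors the principle that data moves to the client only on explicit request.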
The above characteristics result in a pandas-like experience for users who want to perform analytics on Teradata Vantage from their preferred Python environment.