Setup and Usage - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Last Edition: 2025-01-23
Product Category: Teradata Vantage

Setup

  • Teradata recommends using the same Python version on both the database server and the environment where teradataml runs.
  • map_row() and map_partition() require the dill package, with the same version installed on both the database server and the local environment.
  • Teradata recommends using similar versions of Python libraries on the client machine and the Analytics Database machine.
  • The function applied to the row or set of rows using map_row() or map_partition() must be defined in the current Python session. Any modules or packages it uses must be available to the Script Table Operator on the database servers. If the function is imported from a package or module, that package or module must also be available on the database server.
  • Teradata recommends replacing empty values in character columns with a known placeholder value before calling map_row() or map_partition(). This improves readability and avoids confusing NULL values with empty strings.
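The placeholder recommendation can be illustrated locally with plain pandas. The following is only an analogy, not a description of teradataml internals: it shows how an empty string and a missing value become indistinguishable once data passes through delimited text, and how a placeholder keeps them apart. The placeholder "<EMPTY>", the column names, and the helper round_trip() are hypothetical choices made for this sketch.

```python
import io

import pandas as pd


def round_trip(frame):
    """Write a DataFrame as delimited text and read it back (pandas only)."""
    text = frame.to_csv(index=False)
    return pd.read_csv(io.StringIO(text))


# One real value, one empty string, one missing value (NULL).
raw = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "", None]})

# Replace only the empty strings with a known placeholder (hypothetical value).
filled = raw.copy()
filled["name"] = filled["name"].mask(filled["name"] == "", "<EMPTY>")

# Without the placeholder, both the empty string and the NULL read back as missing.
missing_without_placeholder = int(round_trip(raw)["name"].isna().sum())

# With the placeholder, only the genuine NULL reads back as missing.
missing_with_placeholder = int(round_trip(filled)["name"].isna().sum())

print(missing_without_placeholder, missing_with_placeholder)
```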

Execute Mode

Specifies the mode of execution for the user-defined function.

  • IN-DB: Executes the function on data in the teradataml DataFrame in Analytics Database and returns a teradataml DataFrame. (Default execution mode)
  • LOCAL: Executes the function locally on sample data from the teradataml DataFrame and returns a pandas DataFrame.

Input of Python Function

The user function can accept as many arguments as required, but must accept one of the following as its first (positional) argument:
  • pandas Series object corresponding to a row in the DataFrame when the method called is map_row()
  • iterator (TextFileReader object) to read data from the partition of rows in chunks (pandas DataFrames) when the method called is map_partition()

As a result, the user function has access to the data to process in a familiar format. Design the functions to read from the Series object or iterator and manipulate the data accordingly.
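The two input shapes described above can be tried out locally with pandas before involving the database. This is a minimal sketch under stated assumptions: the function names, the column names, and the sample data are hypothetical, and the functions are called directly here rather than through map_row() or map_partition().

```python
import io

import pandas as pd


def scale_score(row, factor):
    """Row-level function: `row` is a pandas Series holding one row."""
    return pd.Series({"id": row["id"], "scaled": row["score"] * factor})


def partition_total(chunks, column):
    """Partition-level function: `chunks` is an iterator of pandas DataFrames."""
    total = 0.0
    for chunk in chunks:
        total += chunk[column].sum()
    return pd.DataFrame({"total": [total]})


csv_text = "id,score\n1,2.0\n2,3.5\n3,4.5\n"

# Row case: each row reaches the function as a pandas Series.
frame = pd.read_csv(io.StringIO(csv_text))
row_results = [scale_score(row, 10) for _, row in frame.iterrows()]

# Partition case: read_csv with chunksize returns a TextFileReader,
# an iterator that yields the rows as DataFrame chunks.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)
partition_result = partition_total(reader, "score")

print(row_results[0]["scaled"], partition_result["total"].iloc[0])
```

Testing a function this way against a Series or a chunk iterator gives a quick check of its logic before it is applied to the full DataFrame.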

Output of Python Function

The Python function can either print its output to the standard output (as with the Script Table Operator, STO) or return an object of one of the following supported types, which is then printed to the standard output in the correct format:
  • pandas DataFrame having the same number of columns as expected in the output.
  • pandas Series representing a row in the output of the method and having the same number of columns as expected in the output.
  • numpy ndarray
    • One-dimensional: represents a row in the output, having the same number of columns as expected in the output.
    • Two-dimensional: represents a dataset (like a pandas DataFrame) having the same number of columns as expected in the output.

The object returned by the user function is printed to the standard output as delimited lines (rows), using the specified delimiter and quotechar.
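To make the conversion concrete, the sketch below emulates how each supported return type could be turned into delimited lines. It is an illustration only: the helper to_delimited_lines() is hypothetical, it uses Python's csv module with minimal quoting, and the exact quoting rules teradataml applies may differ.

```python
import csv
import io

import numpy as np
import pandas as pd


def to_delimited_lines(obj, delimiter=",", quotechar='"'):
    """Emulate rendering a returned object as delimited rows (hypothetical helper)."""
    if isinstance(obj, pd.DataFrame):
        rows = obj.values.tolist()      # one output row per DataFrame row
    elif isinstance(obj, pd.Series):
        rows = [obj.tolist()]           # a Series is a single output row
    elif isinstance(obj, np.ndarray) and obj.ndim == 1:
        rows = [obj.tolist()]           # 1-D array: a single output row
    elif isinstance(obj, np.ndarray) and obj.ndim == 2:
        rows = obj.tolist()             # 2-D array: a dataset of rows
    else:
        raise TypeError("unsupported return type for this sketch")

    buffer = io.StringIO()
    writer = csv.writer(
        buffer, delimiter=delimiter, quotechar=quotechar, lineterminator="\n"
    )
    writer.writerows(rows)
    return buffer.getvalue().splitlines()


series_lines = to_delimited_lines(pd.Series([1, "a,b"]))
array_lines = to_delimited_lines(np.array([[1, 2], [3, 4]]), delimiter="|")
frame_lines = to_delimited_lines(pd.DataFrame({"id": [1, 2], "name": ["x", "y"]}))

print(series_lines, array_lines, frame_lines)
```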

If the user function prints its output directly to the standard output (instead of returning an object of a supported type), it is responsible for formatting that output with the delimiter and quotechar, when they are specified.
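A function that prints its own output could handle the formatting with Python's csv module, as in the sketch below. The delimiter and quotechar values, the function name, and the column name are assumptions chosen for illustration; in practice they should match whatever is passed to map_row() or map_partition(). The standard output is captured here only so the printed text can be inspected.

```python
import contextlib
import csv
import io
import sys

import pandas as pd

# Assumed to match the delimiter and quotechar given to the calling method.
DELIMITER = "|"
QUOTECHAR = "'"


def print_name_length(row):
    """Print one formatted output row instead of returning an object."""
    writer = csv.writer(
        sys.stdout, delimiter=DELIMITER, quotechar=QUOTECHAR, lineterminator="\n"
    )
    # The value contains the delimiter, so the writer wraps it in the quotechar.
    writer.writerow([row["name"], len(row["name"])])


# Capture standard output to show exactly what the function printed.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print_name_length(pd.Series({"name": "a|b"}))
printed = buffer.getvalue()

print(printed, end="")
```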

This data printed to the standard output then gets converted to and saved in a table in Analytics Database.

The table is deleted as a part of garbage collection as soon as a remove_context() call is issued. To persist these results, the DataFrame.to_sql() method can be used.
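Putting the pieces together, the following sketch shows one way the result of map_row() might be persisted before the context is removed. It is not runnable without a Vantage connection and makes several assumptions: a context has already been created, the source table has a character column called name, and the table names passed in are placeholders. The exec_mode and returns arguments and the if_exists option are used as the author of this sketch understands them; check them against the API reference for the installed release.

```python
from collections import OrderedDict


def run_and_persist(source_table, result_table):
    """Apply a row function in-database and save the result (sketch only)."""
    # Imported here so the sketch can be defined without teradataml installed.
    from teradataml import DataFrame
    from teradatasqlalchemy.types import INTEGER, VARCHAR

    def name_length(row):
        # Defined in the current session, as the Setup section requires.
        import pandas as pd

        return pd.Series([row["name"], len(row["name"])])

    df = DataFrame(source_table)

    # Output column definitions (hypothetical names and types).
    returns = OrderedDict([("name", VARCHAR(100)), ("name_length", INTEGER())])

    # IN-DB is the default execution mode; it is spelled out here for clarity.
    result = df.map_row(name_length, exec_mode="IN-DB", returns=returns)

    # Persist the result so it survives a later remove_context() call.
    result.to_sql(table_name=result_table, if_exists="replace")
    return result
```

Switching exec_mode to "LOCAL" while developing the function would run it on sample data and return a pandas DataFrame, in which case to_sql() would not apply to the result.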