- Set up an environment with the required packages using the Open Analytics Framework, and use that environment to run apply(). Once the environment is created, pass either the environment name or an object of class UserEnv to apply().
- The function requires the dill package, with the same version installed in both the remote environment and the local environment.
- Teradata recommends using the same Python version in both the remote and local environments.
- Teradata recommends using the same versions of Python libraries on the client machine and the Analytics Database machine.
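To compare versions across environments, the local side can be checked with the standard library alone. The sketch below only reports the client's Python version and the versions of a package list chosen for illustration (dill, pandas, numpy are assumptions, not a list Teradata publishes); comparing them against the remote environment is left to whatever package listing the Open Analytics Framework provides, which is not shown here.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Packages whose versions should match between the client and the remote
# environment; this list is an assumption for illustration.
PACKAGES = ["dill", "pandas", "numpy"]


def local_versions(packages=PACKAGES):
    """Return a dict mapping 'python' and each package name to its local version."""
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Not installed locally: record None instead of failing.
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in local_versions().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```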
Input of Python function
The Python function can accept as many arguments as required, but its first (positional) argument must be a pandas Series object corresponding to one row of the DataFrame. This gives the user function access to the data in a familiar format. Design the function to read values from the Series object and manipulate them accordingly.
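A user function of this shape might look like the sketch below. The function name, the column names (item, price, quantity) and the extra tax_rate argument are all hypothetical; the local loop over df.iterrows() only imitates row-by-row execution with plain pandas and is not how apply() runs the function remotely.

```python
import pandas as pd


def add_total(row: pd.Series, tax_rate: float = 0.0) -> pd.Series:
    """Hypothetical user function: 'row' is one DataFrame row as a pandas Series."""
    # Read values by column label, as with any pandas Series.
    subtotal = row["price"] * row["quantity"]
    total = round(subtotal * (1 + tax_rate), 2)
    # Return one output row with three columns.
    return pd.Series([row["item"], subtotal, total], index=["item", "subtotal", "total"])


if __name__ == "__main__":
    df = pd.DataFrame({"item": ["pen", "pad"], "price": [1.5, 4.0], "quantity": [4, 2]})
    # Local stand-in for row-wise execution: call the function once per row,
    # passing the extra argument after the Series.
    for _, r in df.iterrows():
        print(add_total(r, tax_rate=0.1).tolist())
```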
Output of Python function
The Python function can return an object of any of the following types:
- pandas DataFrame: must have the same number of columns as the expected output.
- pandas Series: represents a single row of the method's output and must have the same number of columns as the expected output.
- numpy ndarray, either:
  - One-dimensional: represents a single row of the output, with the same number of columns as the expected output.
  - Two-dimensional: represents a dataset (like a pandas DataFrame), with the same number of columns as the expected output.
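The column-count rule above applies differently to each return type: a DataFrame or 2-D ndarray counts its second dimension, while a Series or 1-D ndarray is a single row whose length is the column count. The helper below is a hypothetical local check written for illustration, not part of teradataml, and assumes an expected output width of three columns.

```python
import numpy as np
import pandas as pd

EXPECTED_COLUMNS = 3  # assumed width of the expected output


def output_width(result):
    """Number of output columns implied by a supported return value."""
    if isinstance(result, pd.DataFrame):
        return result.shape[1]  # many rows
    if isinstance(result, pd.Series):
        return result.shape[0]  # one row: its length is the column count
    if isinstance(result, np.ndarray):
        if result.ndim == 1:
            return result.shape[0]  # one row
        if result.ndim == 2:
            return result.shape[1]  # many rows
    raise TypeError(f"unsupported return type: {type(result).__name__}")


# Four ways a user function could return output with three columns.
as_dataframe = pd.DataFrame([[1, "a", 0.5], [2, "b", 1.5]], columns=["id", "code", "score"])
as_series = pd.Series([1, "a", 0.5], index=["id", "code", "score"])
as_1d = np.array([1, 2, 3])
as_2d = np.array([[1, 2, 3], [4, 5, 6]])

for value in (as_dataframe, as_series, as_1d, as_2d):
    assert output_width(value) == EXPECTED_COLUMNS
```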
The object returned by the user function is printed to the standard output as delimited lines (rows), using the specified delimiter and quotechar.
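As a rough local approximation of this step, pandas' own CSV writer can turn a returned DataFrame or Series into delimited lines. This is an illustration using DataFrame.to_csv, not teradataml's internal serialization; the function name to_delimited_lines is hypothetical.

```python
import pandas as pd


def to_delimited_lines(result, delimiter=",", quotechar='"'):
    """Approximate how a returned DataFrame or Series becomes delimited output rows.

    Illustration only: uses pandas' CSV writer, not teradataml's own code.
    """
    if isinstance(result, pd.Series):
        # A Series stands for a single output row, so turn it into a 1-row frame.
        result = result.to_frame().T
    text = result.to_csv(sep=delimiter, quotechar=quotechar, header=False, index=False)
    # splitlines() keeps the result independent of the platform line terminator.
    return text.splitlines()


if __name__ == "__main__":
    df = pd.DataFrame({"name": ["Smith, J", "Lee"], "score": [90, 85]})
    # Fields containing the delimiter are wrapped in quotechar.
    print(to_delimited_lines(df))
```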
If the user function prints its output directly to the standard output (instead of returning an object of a supported type), it must itself format the printed output using the delimiter and quotechar, if they are specified.
The data written to the standard output is stored in a table, which is garbage collected at the end of the session.
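A user function that prints its own rows can delegate quoting to Python's csv module, which applies the delimiter and quotechar consistently. The sketch below assumes a hypothetical input with name and score columns, and assumes the same delimiter and quotechar values are passed to the function as extra arguments; how those values reach the function in practice is not shown here.

```python
import csv
import io

import pandas as pd


def format_row(values, delimiter=",", quotechar='"'):
    """Format one output row as a delimited line, quoting fields as needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, quotechar=quotechar,
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(values)
    return buf.getvalue().rstrip("\n")


def print_row(row: pd.Series, delimiter=",", quotechar='"'):
    """Hypothetical user function that prints its output row itself."""
    # The function, not the framework, applies delimiter and quotechar here.
    print(format_row([row["name"], row["score"] * 2], delimiter, quotechar))


if __name__ == "__main__":
    print_row(pd.Series({"name": "Smith, J", "score": 45}))
```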