To ensure the compatibility of UDFs developed on the ML Engine with Teradata Vantage, consider the following:
- For a function implementing a RowFunction interface (function with input ON clause without PARTITION BY or with PARTITION BY ANY), input data is transferred from theTeradata SQL Engine to Teradata ML Engine as a keyless table (a table with no partition key). Therefore, different executions of the function may result in different data distribution on ML Engine.
- For a function implementing a PartitionFunction interface (a function with an input ON clause with PARTITION BY KEY), input data is transferred from SQL Engine to ML Engine based on the partitioning key. Therefore, different executions of the function result in the same data distribution on ML Engine. However, the order of input data in each distribution may vary. Use an ORDER BY clause with the input ON clause to ensure that the function processes input in the same order every time.
- Input tables on which a function operates are temporarily created tables in ML Engine and are cleaned up after function execution. When the function calls the getInputNames of class baseRuntimeContract.CompletedContract in ML Engine, it gets the name of the temporary table instead of the input table provided in the SQL call on the SQL Engine.
- Tables output by a function are also temporarily created tables in ML Engine. The ML Engine transfers these tables to the SQL Engine after the execution, renamed on the SQL Engine based on the names supplied in the SQL call, and removed from ML Engine afterwards.
- For restrictions on column names in input and output tables, see Identifiers in Limitations.
- Select the data type of output produced by the functions based on the data type mapping table in Data Type Mapping between SQL Engine and ML Engine. An unbounded VARCHAR data type on ML Engine maps to a CLOB data type on SQL Engine. Use VARCHAR(n) if the data in the output is not expected to grow beyond n=32000 (the maximum for a bounded VARCHAR). Similarly, use bounded numeric data types instead of unbounded numeric data types.
- The first column of output produced by a function cannot be an unbounded VARCHAR, because this column is typically used as a partition key for the output table in the SQL Engine and a table cannot be partitioned on a CLOB column.
- Each external function (a function to be called from SQL Engine in a SQL call) must have a function description (JSON) file inside its archive (jar or zip). See JSON Function Descriptor for the specifications of a JSON file and how to include it in the function archive. Any auxiliary function (a function used by external function and does not appear in the SQL call on SQL Engine) does not need a JSON file. An external function can call other functions inside its code using SQL calls through a JDBC driver. A function using a JDBC driver to execute SQL calls inside the code is called a driver function.