Compatibility Considerations for UDFs | Teradata Vantage

Machine Learning Engine User Guide

To ensure the compatibility of UDFs developed on ML Engine with Advanced SQL Engine, consider the following:
  • Run at least one foreign function execution (FFE) query before you use the List (List Functions and Files), Status (Status of UDFs), Remove (Uninstall UDFs), or Download (Downloading UDFs) functionality.
  • For a function implementing the RowFunction interface (a function whose input ON clause has no PARTITION BY clause or uses PARTITION BY ANY), input data is transferred from Advanced SQL Engine to ML Engine as a keyless table (a table with no partition key). Therefore, different executions of the function may result in different data distribution on ML Engine. See the first sketch after this list.
  • The first column in the output of a function should not be an unbounded VARCHAR, because it is mapped to a CLOB on Advanced SQL Engine.
  • Minimize the use of the CLOB data type; reserve it for columns whose maximum length must exceed 32000.
  • For a function implementing the PartitionFunction interface (a function whose input ON clause has a PARTITION BY key clause), input data is transferred from Advanced SQL Engine to ML Engine based on the partitioning key. Therefore, different executions of the function result in the same data distribution on ML Engine. However, the order of input data within each distribution may vary. Use an ORDER BY clause with the input ON clause to ensure that the function processes input in the same order every time, as shown in the first sketch after this list.
  • Input tables on which a function operates are temporary tables created in ML Engine and are cleaned up after function execution. When the function calls getInputNames of class baseRuntimeContract.CompletedContract in ML Engine, it gets the name of the temporary table instead of the input table name provided in the SQL call on Advanced SQL Engine.
  • Output tables produced by a function are also temporary tables created in ML Engine. After execution, ML Engine transfers these tables to Advanced SQL Engine, where they are renamed based on the names supplied in the SQL call, and then removes them from ML Engine.
  • Output columns created by the function must begin with a lowercase letter or an underscore (_). Subsequent characters can be lowercase letters, digits (0-9), or underscores. For example, total_count and _score_1 are valid names, but TotalCount and 1score are not.
  • Select the data type of output produced by a function based on the data type mapping table in Data Type Mapping between Advanced SQL Engine and ML Engine. An unbounded VARCHAR data type on ML Engine maps to a CLOB data type on Advanced SQL Engine. Use VARCHAR(n) if the data in the output is not expected to grow beyond n=32000 (the maximum for a bounded VARCHAR); the second sketch after this list shows a consumer-side workaround. Similarly, use bounded numeric data types instead of unbounded numeric data types.
  • Most driver functions require a JSON function descriptor file inside their archive (JAR or ZIP). A driver function is a function that uses a JDBC driver inside its code to execute SQL statements. A driver function needs a JSON function descriptor to execute correctly if it has either of these characteristics:
    • Its implementation takes the name of an input table from the function argument clause.
    • It creates an output table (secondary output) in addition to the SELECT statement output (primary output).
  • To return syntax and usage information (using the HELP FOREIGN FUNCTION command), a function must have a JSON function descriptor file packaged inside its archive (JAR or ZIP). See the example at the end of this section.
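
The following sketch contrasts the row-function and partition-function invocation styles described in this list. It is illustrative only: the function names (my_row_fn, my_part_fn), the table and column names (sales, store_id, txn_date), and the use of the default coprocessor foreign server name are assumptions; substitute the objects in your own environment.

    /* Row function: ON clause without PARTITION BY (or with PARTITION BY ANY).
       Input reaches ML Engine as a keyless table, so row distribution can
       differ between executions. */
    SELECT * FROM my_row_fn@coprocessor (
      ON sales
    ) AS dt;

    /* Partition function: ON clause with PARTITION BY. Distribution is fixed
       by the partitioning key, and ORDER BY also makes the row order within
       each partition repeatable. */
    SELECT * FROM my_part_fn@coprocessor (
      ON sales PARTITION BY store_id
      ORDER BY txn_date
    ) AS dt;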
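
If a function already returns an unbounded VARCHAR column (which surfaces as a CLOB on Advanced SQL Engine), one consumer-side workaround is to bound the column when the result is materialized; declaring a bounded type in the function's output schema remains the preferred fix. The sketch below assumes hypothetical names (udf_results, result_text, id, my_row_fn, sales).

    /* Bound the unbounded VARCHAR output while materializing the result so the
       stored column is VARCHAR(32000) rather than CLOB. */
    CREATE TABLE udf_results AS (
      SELECT id,
             CAST(result_text AS VARCHAR(32000)) AS result_text
      FROM my_row_fn@coprocessor (ON sales) AS dt
    ) WITH DATA;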

See JSON Function Descriptor for the specification of the JSON file and the steps to include it in the function archive. A non-driver function or an auxiliary function (a function that is used internally by a driver function and does not appear in the SQL call on Advanced SQL Engine) does not need a JSON file in order to be executed.
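
As a usage sketch of the HELP FOREIGN FUNCTION command referenced above, the statement below assumes a hypothetical function name (my_part_fn) and the default coprocessor foreign server name; the exact qualification of the function name can differ in your environment.

    /* Returns syntax and usage information only if the function's archive
       contains a JSON function descriptor. */
    HELP FOREIGN FUNCTION my_part_fn@coprocessor;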