Converting Scripts to Use with Open Analytics Framework

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Locale: en-US
Last edition: 2024-12-11
When migrating your SCRIPT table operator scripts to Open Analytics Framework, consider the following points:
  • The APPLY table operator supports Python and R scripts.
  • The SYSUIF family of external stored procedures (XSP) is not supported in Open Analytics Framework. File uploads and file handling are performed exclusively through the Open Analytics REST API, and you can use the Teradata Package for Python (teradataml) to make all your calls to Open Analytics Framework.
  • An analytic compute group must be configured and selected for your session, and a user environment must be set up, before the APPLY table operator can be used.
  • Unlike the SCRIPT table operator, the APPLY table operator does not require you to specify a delimiter character explicitly for scripts.

    The default delimiter of APPLY table operator is comma (","), and scripts default to using it.

    When a script prints output variables, they must still be delimiter-separated (comma "," by default). Otherwise, the output is interpreted as a single string.
  • The APPLY table operator uses comma (",") as the default data delimiter, which is more convenient than the tab character ("\t") that the SCRIPT table operator uses by default.

    The database typically streams float variables to the user script in the scientific format "x.xx Eyy", in which a space character separates the mantissa ("x.xx") from the exponent ("Eyy"). With a space or tab character as the delimiter, "x.xx Eyy" could therefore be misinterpreted as two variables ("x.xx" and "Eyy").

    Using comma (",") as the default delimiter in the APPLY table operator guards against this misinterpretation.

  • In your code, avoid writing output into the target filesystem. The analytic compute cluster filesystem is read-only.
  • When working with LOB columns, consider the following points:
    • When a script generates a LOB column value, it should write the value to a file in the /lob or /tmp directory, and then write the full path of that file to stdout to refer to the newly generated LOB column value.
    • When a script reads a LOB column from the database, the LOB column is sent as a file path, and the script must read the file at that path.

      Users need not decode or encode the file content before writing to the LOB column.

    • Tables involving LOB columns must be Object File System tables.
  • For scikit-learn model training and scoring, teradataml has an opensourceML module. See "teradataml: OpenSourceML" in Teradata Package for Python Function Reference on VantageCloud Lake.
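
The delimiter points above can be illustrated with a minimal Python sketch. It assumes a hypothetical two-column input (a key and a float value) arriving as comma-delimited lines; the names `parse_float` and `transform` and the doubling step are illustrative only and are not part of any Teradata API. The sketch removes the space inside a float streamed as "x.xx Eyy" before parsing it, and emits each output row with the same comma delimiter.

```python
DELIMITER = ","  # default delimiter of the APPLY table operator


def parse_float(text):
    """Parse a float that may arrive as 'x.xx Eyy' (space before the exponent)."""
    # Assumption: removing the inner space yields a standard float literal.
    return float(text.replace(" ", ""))


def transform(lines, delimiter=DELIMITER):
    """Yield one delimiter-separated output row per non-empty input row."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Hypothetical layout: exactly two columns, a key and a float value.
        key, raw_value = line.split(delimiter)
        doubled = parse_float(raw_value) * 2  # placeholder computation
        # Join the output variables with the delimiter so that they are
        # not interpreted as one single string.
        yield delimiter.join([key, str(doubled)])
```

In an actual script, the rows would come from standard input, for example `for row in transform(sys.stdin): print(row)` after `import sys`.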
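
The LOB points above might be handled along the following lines. This is a sketch under stated assumptions, not a reference implementation: the helper names `read_lob`, `write_lob`, and `process_lob`, the `.lob` file suffix, and the byte-reversal step are all hypothetical. It reads the incoming LOB value from the file path it receives, writes the new value as raw bytes (with no encoding or decoding) to a uniquely named file under /tmp, and returns the full path that the script would then print to stdout.

```python
import os
import tempfile


def read_lob(path):
    """Read the raw bytes of a LOB value from the file path sent to the script."""
    with open(path, "rb") as handle:
        return handle.read()


def write_lob(data, directory="/tmp"):
    """Write raw bytes to a new file and return its full path."""
    # Assumption: a unique file name per value; the '.lob' suffix is illustrative.
    fd, path = tempfile.mkstemp(dir=directory, suffix=".lob")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


def process_lob(input_path, directory="/tmp"):
    """Read one LOB value, transform it, and return the path of the new value."""
    content = read_lob(input_path)
    transformed = content[::-1]  # placeholder transformation: reverse the bytes
    return write_lob(transformed, directory)
```

A script would print the returned path as the value of the generated LOB column, for example `print(process_lob(incoming_path))`, where `incoming_path` is the file path read from the input row.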