Script | Teradata Python Package - 17.00 - Script - Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.00
Release Date
November 2021
Content Type
User Guide
Publication ID
B700-4006-070K
Language
English (United States)

The teradataml Script function is an interface to the SCRIPT table operator (STO) object in the Advanced SQL Engine database. See the SCRIPT section of Table Operators in Teradata Vantage™ - SQL Operators and User-Defined Functions, B035-1210 for details about SCRIPT table operator.

You can use Script function to execute natively in Vantage a user-installed script against data from an Advanced SQL Engine table via the SCRIPT table operator. To do this, the Teradata In-nodes Interpreter and Add-ons language packages must be installed in advance in the target Advanced SQL Engine nodes.

Required arguments:
  • script_command: specifies the command or script to run,.
  • script_name: specifies the name of the user script.
  • files_local_path specifies the absolute local path where user script and all supporting files like model files, input data file reside.
  • returns: specifies output column definition, which is a dictionary specifying column-name to teradatasqlalchemy-type mapping.
    Users can pass a dictionary (dict or OrderedDict) to the returns argument, with the keys ordered to represent the order of the output columns. The preferred type is OrderedDict.
Optional arguments:
  • data: specifies a teradataml DataFrame containing the input data for the script.
  • data_hash_column: specifies the column to be used for hashing.

    The rows in the data will be redistributed to AMPs based on the hash value of the column specified. The user-installed script file then runs once on each AMP. If there is no data_partition_column, then the entire result set, delivered by the function, constitutes a single group or partition.

    data_hash_column can not be specified along with data_partition_column, is_local_order and data_order_column.
  • data_partition_column: specifies Partition By columns for data.
    • data_partition_column can not be specified along with data_hash_column.
    • data_partition_column can not be specified along with "is_local_order = True".
  • is_local_order: specifies a boolean value to determine whether the input data is to be ordered locally or not.

    Default value is False. When set to True, data is ordered locally.

    • This argument is ignored, if data_order_column is None.
    • is_local_order can not be specified along with data_hash_column.
    • When is_local_order is set to 'True', data_order_column should be specified, and the columns specified in data_order_column are used for local ordering.
  • data_order_column: specifies the Order By column for data, and can be used no matter is_local_order is set to 'True' or 'False'.
    data_order_column can not be specified along with data_hash_column.
  • sort_ascending: specifies a boolean value to determine if the result set is to be sorted on the data_order_column column in ascending or descending order.

    When set to the default value 'True', the sorting is ascending. When set to 'False', the sortin is descending.

    This argument is ignored, if data_order_column is None.
  • nulls_first: specifies a boolean value to determine whether NULLS are listed first or last during ordering.

    NULLS are listed first when this argument is set to 'True', and last when set to 'False'.

    This argument is ignored, if data_order_column is None.
  • delimiter: specifies a delimiter to use when reading columns from a row and writing result columns.

    The default value is '\t' (tab).

    • This argument cannot be the same as quotechar argument.
    • This argument cannot be a newline character, which is '\\n'.
  • auth: specifies an authorization to use when running the script.
  • charset: specifies the character encoding for data, could be either 'utf16' or 'latin'.
  • quotechar: specifies a character that forces all input and output of the script to be quoted using this specified character.

    Using this argument enables the Advanced SQL Engine to distinguish between NULL fields and empty strings. A string with length zero is quoted, while NULL fields are not.

    If this character is found in the data, it will be escaped by a second quote character.

    • This argument cannot be the same as delimiter argument.
    • This argument cannot be a newline character, which is '\\n'.

For details of all arguments, see Teradata Package for Python Function Reference.

Required permissions

Prior to running Script functions, specify search path and permissions as listed here:
Assume the user is 'TDAPUSER', and the database is 'TDAPUSERDB'.
  • Teradata user should be given permission to execute Script Table Operator and the default authentication.
    GRANT EXECUTE FUNCTION ON td_sysfnlib.script TO TDAPUSER;
    GRANT EXECUTE ON SYSUIF.DEFAULT_AUTH to TDAPUSER;
  • Additional permissions required to run Script on the database.
    GRANT SELECT ON TDAPUSERDB TO TDAPUSER;
  • Additional permissions required to install, remove and replace files.
    GRANT EXECUTE PROCEDURE ON SYSUIF.INSTALL_FILE TO TDAPUSER;
    GRANT EXECUTE PROCEDURE ON SYSUIF.REPLACE_FILE TO TFDAPUSER;
    GRANT EXECUTE PROCEDURE ON SYSUIF.REMOVE_FILE TO TDAPUSER;
  • You must set SEARCHUIFDBPATH prior to accessing any installed scripts in the target database, so that the Linux system user 'tdatuser' can find the scripts and run them through the STO. If the scripts reside in 'TDAPUSERDB', then set search path in corresponding Python script or Python console as follows:
    get_connection().execute("SET SESSION SEARCHUIFDBPATH = TDAPUSERDB;")
  • If the user ('me' in the example here) is using another database ('TDAPUSERDB' in the example here) as default database, the user should have the execute permissions on the Script function to the default database.

Here is the quick example.

  1. Create the required users and databases as database administrator.
    -- Create a user 'sysdba' with default database as 'sysdba'.
    CREATE USER sysdba FROM DBC
    AS PERM = 14e9
    PASSWORD = sysdba
    DEFAULT DATABASE = sysdba
    NO BEFORE JOURNAL
    NO AFTER JOURNAL;
     
    -- Create a new database from the newly created user 'sysdba'.
    CREATE DATABASE TDAPUSERDB FROM sysdba
    AS PERM=1e9
    NO BEFORE JOURNAL
    NO AFTER JOURNAL;
     
    -- Create a user with default database as newly created database 'TDAPUSERDB'.
    CREATE USER me FROM sysdba
    AS PERM = 100e6
    PASSWORD = me
    DEFAULT DATABASE = TDAPUSERDB
    NO BEFORE JOURNAL
    NO AFTER JOURNAL;
  2. Make sure the file 'mapper.py' is installed in the database 'TDAPUSERDB'. Refer to the Database Utility section for how to use install_file() to install file in Vantage. Ignore if the file is already present in Vantage.
  3. The database administrator should provide the following permissions to the user 'me', if not already provided.
    -- Permissions to the user 'me'.
    GRANT EXECUTE FUNCTION ON td_sysfnlib.script TO me;
    GRANT EXECUTE ON SYSUIF.DEFAULT_AUTH to me;
     
     
    -- Additional permissions on the database 'TDAPUSERDB'.
    GRANT CREATE TABLE ON TDAPUSERDB TO me; -- Needed if the table and required permissions are not present.
    GRANT CREATE VIEW ON TDAPUSERDB TO me; -- Needed if the permissions for view creation is not present.
    GRANT SELECT ON TDAPUSERDB TO me;
  4. Set the search path and run the Script function as the user 'me'.
    -- Set the search path.
    SET SESSION SEARCHUIFDBPATH = TDAPUSERDB;
     
     
    -- Create view on top of Script function.
    -- Note that view creation is successful.
    create view vw3 as SELECT * FROM Script(
     ON "TDAPUSERDB"."usecase5" AS "input"
     PARTITION BY ANY
     SCRIPT_COMMAND('Rscript ./TDAPUSERDB/mapper.R')
     delimiter(',')
     returns('oc1 VARCHAR(10), oc2 VARCHAR(10)')
    ) as sqlmr;
     *** View has been created.
     *** Total elapsed time was 1 second.
      
    -- Seeing the contents of the view is not possible as the user 'me' don't have execute permissions on TD_SYSFNLIB.SCRIPT and default auth i.e., SYSUIF.DEFAULT_AUTH.
    select * from vw3;
     *** Failure 3523 An owner referenced by user does not have EXECUTE FUNCTION
      WITH GRANT OPTION access to TD_SYSFNLIB.SCRIPT.
    
  5. The user 'me' has to be given the following permissions, for the above select on view 'vw3' to work.
    GRANT EXECUTE FUNCTION ON td_sysfnlib.script TO TDAPUSERDB WITH GRANT OPTION;
    
    GRANT EXECUTE ON SYSUIF.DEFAULT_AUTH TO TDAPUSERDB WITH GRANT OPTION;

Sandbox Environment

teradataml provides user a sandbox environment that can be used to run user scripts outside Vantage. Thus the user can test the scripts in a Vantage-like environment before uploading the scripts for execution in the target Advanced SQL Engine.

The sandbox environment is based on a Docker image that has:
  • Python interpreter (Version 3.6)
  • Add-on packages for Python
  • A script that enables user to run Python script within the container environment. Input to this Python script can be:
    • A file with input data
    • Data read from all AMPs

The Docker image can be downloaded from the Teradata Package for Python - teradataml page on https://downloads.teradata.com/.

The size of the Docker image is about 3.6GB. Due to the large size of this Docker image, time to download depends on network bandwidth. Teradata recommends downloading the Docker image beforehand, and saving it into a local folder.

Users can set up the Docker environment and test Python scripts by running the scripts either inside the Docker container, or directly on Vantage.

For details on installing Docker on various platforms, see links below:
When configuring Docker on Windows 10, if users encounter the error Cannot enable Hyper-V service, run the following command from PowerShell window:
bcdedit /set hypervisorlaunchtype auto

Script() Workflow

The following is a typical workflow for Script().


teradataml Script() workflow

The following example shows a workflow where user creates a Script object, sets up sandbox environment, tests Python script, installs it on Vantage, runs Python script on Vantage and then removes the script from Vantage.

Script() Workflow Example

The script "mapper.py" reads in a line of text input ("Old Macdonald Had A Farm") from csv and splits the line into individual words, emitting a new row for each word.

To run this example, "mapper.py" and "barrier.csv" are required and must be present under the same location specified by the argument files_local_path.
  • "barrier.csv" is present under <teradataml_install_location>/teradataml/data directory.
  • "mapper.py" can be created as follows:
    #!/usr/bin/python
    import sys
    for line in sys.stdin:
        line = line.strip()
        words = line.split()
        for word in words:
            print ('%s\t%s' % (word, 1))
  1. Load example data.
    >>> load_example_data("Script", ["barrier"])
  2. Import required packages.
    >>> from collections import OrderedDict
    >>> from teradatasqlalchemy import (VARCHAR)
  3. Create teradataml DataFrame.
    >>> barrierdf = DataFrame.from_table("barrier")
  4. Create a Script object.
    >>> sto = Script(data=barrierdf,
                script_name='mapper.py',
                files_local_path= 'data/scripts',
                script_command='python3 ./<database name>/mapper.py',
                data_order_column="Id",
                is_local_order=False
                delimiter=',',
                nulls_first=False,
                sort_ascending=False,
                charset='latin', returns=OrderedDict([("word", VARCHAR(15)), ("count_input", VARCHAR(2))])
                )
  5. Setup the sandbox environment by providing local path to the Docker image file.
    >>> sto.setup_sto_env(docker_image_location='/tmp/sto_sandbox_docker_image.tar'))
    Loading image from /tmp/sto_sandbox_docker_image.tar. It may take few minutes.
    Image loaded successfully.
    Container c1dd4d4b722cc54b643ab2bdc57540a3a3e6db98c299defc672227de97d2c345 started successfully.
  6. Run user script locally within Docker container using data from csv.

    This helps user to fix script level issues outside Vantage.

    >>> sto.test_script(input_data_file='../barrier.csv', data_file_delimiter=',')
    ############ STDOUT Output ############
     
            word count_input
    0          1            1
    1        Old            1
    2  Macdonald            1
    3        Had            1
    4          A            1
    5       Farm            1
           
    Script results look good.
  7. Install the user script file on Vantage.
    >>> sto.install_file(file_identifier='mapper', file_name='mapper.py', is_binary=False)
  8. Set the search path to the database where the file is installed.
    >>> get_context().execute("SET SESSION SEARCHUIFDBPATH = <database name>;")
  9. Run the user script on Vantage.
    >>> sto.execute_script()
    ############ STDOUT Output ############
     
            word count_input
    0  Macdonald           1
    1          A           1
    2       Farm           1
    3        Had           1
    4        Old           1
    5          1           1
    The same script can run in SQL using SCRIPT table operator with the following SQL code:
    SELECT * FROM Script(
        ON "barrier" AS "input"
        PARTITION BY ANY
        SCRIPT_COMMAND('python3 ./<database name>/mapper.py')
        delimiter(' ')
        returns('word VARCHAR(15), count_input VARCHAR(2)')
    ) as sqlmr
  10. Remove the installed file from Vantage.
    >>> sto.remove_file(file_identifier='mapper', force_remove=True)