15.00 - SCRIPT - Teradata Database

Teradata Database SQL Functions, Operators, Expressions, and Predicates

Product
Teradata Database
Release Number
15.00
Content Type
Programming Reference
Publication ID
B035-1145-015K
Language
English (United States)

SCRIPT

Purpose  

Executes a user-installed script or any LINUX command inside Teradata Database.

For more information on installing scripts, see Chapter 30: “Script Installation Procedures.”

Authorization

You must have the EXECUTE FUNCTION privilege on TD_SYSFNLIB.SCRIPT to invoke this table operator.

Syntax  

where:

 

Syntax element …

Specifies …

ON

that the SCRIPT function can have only one ON clause (single input). The ON clause can be specified with no options or with:

  • HASH BY
  • PARTITION BY
  • PARTITION BY ANY
  • an optional ORDER BY or LOCAL ORDER BY clause
  • The DIMENSION option is not supported for the ON clause of the SCRIPT table operator.

    SCRIPT_COMMAND

    the script to be executed. The SCRIPT_COMMAND is a required keyword.

    runtime_literal_command

    the parameters to SCRIPT_COMMAND can be an executable name followed by the script name and other inputs, or any valid LINUX command.

    RETURNS

    the names and types of the output columns returned by the script. The RETURNS clause is optional.

    The '*' in the RETURNS clause specifies that all the columns of the input table should be returned by the SCRIPT function.

    output column literal

    the information inside the quotation marks (column_name data_type_specification). The data_type_specification is any Teradata data type with any appropriate modifiers, such as size or character type.

    DELIMITER

    delimiter_character

    an optional clause. The delimiter_character is a tab by default. The delimiter is used to separate the column values in the input and output strings.

    CHARSET

    an optional clause. The default is LATIN. By specifying UTF-16, SCRIPT uses the UTF-16 character encoding for all of the data passed to and from the user-installed script, which is recommended when using CHAR or VARCHAR with CHARACTER SET UNICODE.

    QUOTECHAR

    quote_character

    an optional parameter that forces all input and output to the script to be quoted using the specified character.

    Using QUOTECHAR also enables the database to distinguish between NULL fields and empty VARCHARs. A VARCHAR with length zero is quoted, while NULL fields are not.

    If the character is found in the data, it will be escaped by a second quote character. For example:

    He said, "Hey there!"

    QUOTECHAR("") with double quotation marks becomes

    "He said, ""Hey there!"""

    AS

    an optional introduction to alias_name or column_name.

    alias_name

    an alias for the table referenced by table_name.column_name_alias is also used to name expressions.

    table_name

    the name of the table being referenced.

    view_name

    the name of the view being referenced.

    query_expression

    any valid SQL expression.

    AS correlation_name

    an optional alias for the ON clause input table.

    HASH BY

    the rows in the ON clause will be redistributed to AMPs based on the hash value of column_specification The user-installed script file then runs once on each AMP.

    PARTITION BY or PARTITION BY ANY

    in its column_specification, or comma-separated list of column specifications, the group, or groups, over which the function operates.

    PARTITION BY is optional. If there is no PARTITION BY clause, then the entire result set, delivered by the FROM clause, constitutes a single group or partition.

    PARTITION BY clause is also called the window partition clause.

    ORDER BY

    in its value_expression the order in which the values in a group, or partition, are sorted.

    COLUMN SPECIFICATION

    a SQL expression comprised of the following two expressions:

  • column_name
  • column_position
  • column_name

    column_position

    the name of the column or columns or their ordinal position within the join index definition.

    sort_expression

    the order in which the SQL expressions are sorted.

    ASC

    that the results are to be ordered in ascending sort order.

    If the sort field is a character string, the system orders it in ascending order according to the definition of the collation sequence for the current session.

    The default order is ASC.

    DESC

    that the results are to be ordered in descending sort order.

    If the sort field is a character string, the system orders it in descending order according to the definition of the collation sequence for the current session.

    NULLS FIRST

    that NULL results are to be listed first.

    NULLS LAST

    that NULL results are to be listed last.

    ANSI Compliance

    This is a Teradata extension to the ANSI SQL:2011 standard.

    Input and Output to the Script File

    A script file reads rows from its standard input (stdin) and writes rows to its standard output (stdout). Each input row to the script file is sent as a string with the column values separated by the delimiter specified in the DELIMITER clause. If no DELIMITER clause is specified, the default is a tab. Each input row is separated from the next by a newline character.

    The result of the script file must also be a delimited string and the values are converted back to data types specified in the RETURNS clause. Each output row must separated from the next by a newline character (\n).

    Nulls are represented as empty strings. For example, the following example represents the input string for a row with three integer values where the last two are NULL (delimited by colons):

    1::\n

    The first value is 1, there is no value between the first and second delimiter, and no value between the second delimiter and the newline character. The input row is equivalent to (1, NULL, NULL).

    If the return values from the script file are not in a delimited string format, the script execution fails.

    Data Types

    The following Teradata Database data types are supported by the SCRIPT table operator.

     

    Teradata Database Data Type

    BYTEINT

    SMALLINT

    INTEGER

    BIGINT

    DECIMAL/NUMERIC

    FLOAT/REAL/DOUBLE PRECISION

    NUMBER

    DATE

    TIME

    TIME WITH TIME ZONE

    TIMESTAMP

    TIMESTAMP WITH TIME ZONE

    INTERVAL YEAR

    INTERVAL YEAR TO MONTH

    INTERVAL MONTH

    INTERVAL DAY

    INTERVAL DAY TO HOUR

    INTERVAL DAY TO MINUTE

    INTERVAL DAY TO SECOND

    INTERVAL HOUR

    INTERVAL HOUR TO MINUTE

    INTERVAL HOUR TO SECOND

    INTERVAL MINUTE

    INTERVAL MINUTE TO SECOND

    INTERVAL SECOND

    CHARACTER

    VARCHAR

    CLOB

    Setting the Database Search Path

    The SEARCHUIFDBPATH option to the SET SESSION sets the database search path for script execution. For details, see SQL Data Definition Language Detailed Topics.

    Executing LINUX Commands

    The text included in the SCRIPT_COMMAND clause runs in its own shell inside a temporary working directory.

    For each database named in the SEARCHUIFDBPATH, a link is added to the working directory that has the same name as that database. You must set SEARCHUIFDBPATH in order to access the user-installed files.

    To access files with the SCRIPT table operator, add a relative path using './', the database name, and '/' the file name.

    The following is an example of SCRIPT accessing the 'mapper.py' file in the database called 'mydatabase':

    SELECT * FROM SCRIPT
    ( ON source
    SCRIPT_COMMAND('python ./mydatabase/mapper.py')
    );

    Executing Scripts

    The SCRIPT table operator runs once per AMP by default. If a PARTITION BY clause is used on the input data, SCRIPT is executed once per partition.

    Script Files and Security

    The installed script files run in protected mode (outside of the Teradata Database). The scripts are run as Tdatuser.

    Troubleshooting

    Teradata recommends testing scripts outside of the database on sample data before installing them to prevent errrors.

    If there is a a script error, the script aborts and error information is written to the script log on each node at /var/opt/teradata/tdtemp/uiflib/scriptlog.

    Example 1

    Create a file called helloworld.py in the /root directory with the following code:

    #!/usr/bin/python
    print 'hello world!'

    Install the file in the database:

    DATABASE mydatabase;
    call SYSUIF.install_file('helloworld',                            
                             'helloworld.py', 'cz!/root/helloworld.py');

    Run the script:

    SET SESSION SEARCHUIFDBPATH = mydatabase;
     
    SELECT DISTINCT *
    FROM SCRIPT (
    	SCRIPT_COMMAND('./mydatabase/helloworld.py')
    	RETURNS ('text VARCHAR(30)')
    );

    Example 2

    In the following example, install a script and other files, and then execute the SCRIPT table operator.

    DATABASE mydb;
    SET SESSION SEARCHUIFDBPATH = mydb;
    call SYSUIF.install_file('my_analytics',
                             'cz!my_analytics.sh!/tmp/my_analytics.sh');
    call SYSUIF.install_file('my_model',                            
                'cz!my_model.model!/tmp/my_model.sh');
    call SYSUIF.install_file('my_data', 
                             'cz!my_data.dat!/tmp/my_data.dat');
    Select * from SCRIPT( on data_table
    		SCRIPT_COMMAND('./mydb/my_analytics.sh –mymodel ./mydb/my_model.model –myadditionaldata ./mydb/my_data.dat')
    		RETURNS('*','score varchar(10)');

    Example 3

    In this example, invoke a Python script file using the SCRIPT table operator. The script mapper.py reads in a line of text input (“Old Macdonald Had A Farm”) and splits the line into individual words, emitting a new row for each word.

    An example of the Python script:

    #!/usr/bin/python
     
    import sys
     
    # input comes from STDIN (standard input)
    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        # split the line into words
        words = line.split()
        # increase counters
        for word in words:
            # write the results to STDOUT (standard output);
            # what we output here will be the input for the
            # Reduce step, i.e. the input for reducer.py
            #
            # tab-delimited; the trivial word count is 1
            print '%s\t%s' % (word, 1)

    To install the script, run the following command:

    CALL SYSUIF.INSTALL_FILE('mapper', 'mapper.py'
                             'cz!/tmp/mapper.py');

    The table barrier contains the sentence as one line of text input:

     

    Id int

    Name varchar(100)

    1

    Old Macdonald Had A Farm

    To split the sentence into individual words, run the following script:

    SELECT * FROM SCRIPT
    ( ON ( SELECT name FROM barrier )	
    SCRIPT_COMMAND('./mydb/mapper.py')
    	RETURNS ( 'word varchar(10)', 'count_input int' ) ) AS tab;
    );

    The result:

     

    Word

    Count_input

    Old

    1

    Macdonald

    1

    Had

    1

    A

    1

    Farm

    1

    Example 4

    The following example uses some of Python’s built-in modules to create a JSON object from a website URL query string. This is useful in converting web URL data to JSON so it can be queried using the JSON data type.

    The script requires that Python version 2.6 or higher is installed on the system.

    The ‘urltojson.py’ Python script:

    #!/usr/bin/python
     
    import sys
    import json
    import urlparse
     
     
    for line in sys.stdin:
        print json.dumps(urlparse.parse_qs(urlparse.urlparse(line.rstrip('
    	\n')).query))

    SQL to install and run the script on sample data:

    DATABASE mytestdb;
    create table sourcetab(url varchar(10000));
     
    ins sourcetab('https://www.google.com/finance?q=NYSE:TDC');
    ins sourcetab('http://www.ebay.com/sch/i.html?_trksid=p2050601.m570.l1313.TR0.TRC0.H0.Xteradata+merchandise&_nkw=teradata+merchandise&_sacat=0&_from=R40');
    ins sourcetab('https://www.youtube.com/results?search_query=teradata%20commercial&sm=3');
    ins sourcetab('https://www.contrivedexample.com/example?mylist=1&mylist=2&mylist=...testing');
     
    -- This assumes that urltojson.py is in the current directory.
    call sysuif.replace_file('urltojson', 'urltojson.py', 'cz!urltojson.py', 0);
     
    set session searchuifdbpath=mytestdb;
     
    select cast(JSONresult as JSON) 
    from SCRIPT(
        ON(select url from sourcetab) 
        SCRIPT_COMMAND('./mytestdb/urltojson.py') 
        RETURNS(' JSONresult VARCHAR(10000)')
    );

    The result:

    JSONresult
    ----------------------------------------------------------------------------------------
    {"q": ["NYSE:TDC"]}
    {"mylist": ["1", "2", "...testing"]}
    {"_from": ["R40"], "_trksid": ["p2050601.m570.l1313.TR0.TRC0.H0.Xteradata merchandise"], "_nkw": ["teradata merchandise"], "_sacat": ["0"]}
    {"search_query": ["teradata commercial"], "sm": ["3"]}

    For More Information

    For more information on installing scripts in the database, see Chapter 30: “Script Installation Procedures.”

    For more information about the JSON data type, see Teradata JSON.