SCRIPT
Purpose
Executes a user-installed script or any LINUX command inside Teradata Database.
For more information on installing scripts, see Chapter 30: “Script Installation Procedures.”
Authorization
You must have the EXECUTE FUNCTION privilege on TD_SYSFNLIB.SCRIPT to invoke this table operator.
Syntax
where:
Syntax element … |
Specifies … |
ON |
that the SCRIPT function can have only one ON clause (single input). The ON clause can be specified with no options or with: The DIMENSION option is not supported for the ON clause of the SCRIPT table operator. |
SCRIPT_COMMAND |
the script to be executed. The SCRIPT_COMMAND is a required keyword. |
runtime_literal_command |
the parameters to SCRIPT_COMMAND can be an executable name followed by the script name and other inputs, or any valid LINUX command. |
RETURNS |
the names and types of the output columns returned by the script. The RETURNS clause is optional. The '*' in the RETURNS clause specifies that all the columns of the input table should be returned by the SCRIPT function. |
output column literal |
the information inside the quotation marks (column_name data_type_specification). The data_type_specification is any Teradata data type with any appropriate modifiers, such as size or character type. |
DELIMITER delimiter_character |
an optional clause. The delimiter_character is a tab by default. The delimiter is used to separate the column values in the input and output strings. |
CHARSET |
an optional clause. The default is LATIN. By specifying UTF-16, SCRIPT uses the UTF-16 character encoding for all of the data passed to and from the user-installed script, which is recommended when using CHAR or VARCHAR with CHARACTER SET UNICODE. |
QUOTECHAR quote_character |
an optional parameter that forces all input and output to the script to be quoted using the specified character. Using QUOTECHAR also enables the database to distinguish between NULL fields and empty VARCHARs. A VARCHAR with length zero is quoted, while NULL fields are not. If the character is found in the data, it will be escaped by a second quote character. For example: He said, "Hey there!" QUOTECHAR("") with double quotation marks becomes "He said, ""Hey there!""" |
AS |
an optional introduction to alias_name or column_name. |
alias_name |
an alias for the table referenced by table_name.column_name_alias is also used to name expressions. |
table_name |
the name of the table being referenced. |
view_name |
the name of the view being referenced. |
query_expression |
any valid SQL expression. |
AS correlation_name |
an optional alias for the ON clause input table. |
HASH BY |
the rows in the ON clause will be redistributed to AMPs based on the hash value of column_specification The user-installed script file then runs once on each AMP. |
PARTITION BY or PARTITION BY ANY |
in its column_specification, or comma-separated list of column specifications, the group, or groups, over which the function operates. PARTITION BY is optional. If there is no PARTITION BY clause, then the entire result set, delivered by the FROM clause, constitutes a single group or partition. PARTITION BY clause is also called the window partition clause. |
ORDER BY |
in its value_expression the order in which the values in a group, or partition, are sorted. |
COLUMN SPECIFICATION |
a SQL expression comprised of the following two expressions: |
column_name column_position |
the name of the column or columns or their ordinal position within the join index definition. |
sort_expression |
the order in which the SQL expressions are sorted. |
ASC |
that the results are to be ordered in ascending sort order. If the sort field is a character string, the system orders it in ascending order according to the definition of the collation sequence for the current session. The default order is ASC. |
DESC |
that the results are to be ordered in descending sort order. If the sort field is a character string, the system orders it in descending order according to the definition of the collation sequence for the current session. |
NULLS FIRST |
that NULL results are to be listed first. |
NULLS LAST |
that NULL results are to be listed last. |
ANSI Compliance
This is a Teradata extension to the ANSI SQL:2011 standard.
Input and Output to the Script File
A script file reads rows from its standard input (stdin) and writes rows to its standard output (stdout). Each input row to the script file is sent as a string with the column values separated by the delimiter specified in the DELIMITER clause. If no DELIMITER clause is specified, the default is a tab. Each input row is separated from the next by a newline character.
The result of the script file must also be a delimited string and the values are converted back to data types specified in the RETURNS clause. Each output row must separated from the next by a newline character (\n).
Nulls are represented as empty strings. For example, the following example represents the input string for a row with three integer values where the last two are NULL (delimited by colons):
1::\n
The first value is 1, there is no value between the first and second delimiter, and no value between the second delimiter and the newline character. The input row is equivalent to (1, NULL, NULL).
If the return values from the script file are not in a delimited string format, the script execution fails.
Data Types
The following Teradata Database data types are supported by the SCRIPT table operator.
Teradata Database Data Type |
BYTEINT |
SMALLINT |
INTEGER |
BIGINT |
DECIMAL/NUMERIC |
FLOAT/REAL/DOUBLE PRECISION |
NUMBER |
DATE |
TIME |
TIME WITH TIME ZONE |
TIMESTAMP |
TIMESTAMP WITH TIME ZONE |
INTERVAL YEAR |
INTERVAL YEAR TO MONTH |
INTERVAL MONTH |
INTERVAL DAY |
INTERVAL DAY TO HOUR |
INTERVAL DAY TO MINUTE |
INTERVAL DAY TO SECOND |
INTERVAL HOUR |
INTERVAL HOUR TO MINUTE |
INTERVAL HOUR TO SECOND |
INTERVAL MINUTE |
INTERVAL MINUTE TO SECOND |
INTERVAL SECOND |
CHARACTER |
VARCHAR |
CLOB |
Setting the Database Search Path
The SEARCHUIFDBPATH option to the SET SESSION sets the database search path for script execution. For details, see SQL Data Definition Language Detailed Topics.
Executing LINUX Commands
The text included in the SCRIPT_COMMAND clause runs in its own shell inside a temporary working directory.
For each database named in the SEARCHUIFDBPATH, a link is added to the working directory that has the same name as that database. You must set SEARCHUIFDBPATH in order to access the user-installed files.
To access files with the SCRIPT table operator, add a relative path using './', the database name, and '/' the file name.
The following is an example of SCRIPT accessing the 'mapper.py' file in the database called 'mydatabase':
SELECT * FROM SCRIPT
( ON source
SCRIPT_COMMAND('python ./mydatabase/mapper.py')
);
Executing Scripts
The SCRIPT table operator runs once per AMP by default. If a PARTITION BY clause is used on the input data, SCRIPT is executed once per partition.
Script Files and Security
The installed script files run in protected mode (outside of the Teradata Database). The scripts are run as Tdatuser.
Troubleshooting
Teradata recommends testing scripts outside of the database on sample data before installing them to prevent errrors.
If there is a a script error, the script aborts and error information is written to the script log on each node at /var/opt/teradata/tdtemp/uiflib/scriptlog.
Example 1
Create a file called helloworld.py in the /root directory with the following code:
#!/usr/bin/python
print 'hello world!'
Install the file in the database:
DATABASE mydatabase;
call SYSUIF.install_file('helloworld',
'helloworld.py', 'cz!/root/helloworld.py');
Run the script:
SET SESSION SEARCHUIFDBPATH = mydatabase;
SELECT DISTINCT *
FROM SCRIPT (
SCRIPT_COMMAND('./mydatabase/helloworld.py')
RETURNS ('text VARCHAR(30)')
);
Example 2
In the following example, install a script and other files, and then execute the SCRIPT table operator.
DATABASE mydb;
SET SESSION SEARCHUIFDBPATH = mydb;
call SYSUIF.install_file('my_analytics',
'cz!my_analytics.sh!/tmp/my_analytics.sh');
call SYSUIF.install_file('my_model',
'cz!my_model.model!/tmp/my_model.sh');
call SYSUIF.install_file('my_data',
'cz!my_data.dat!/tmp/my_data.dat');
Select * from SCRIPT( on data_table
SCRIPT_COMMAND('./mydb/my_analytics.sh –mymodel ./mydb/my_model.model –myadditionaldata ./mydb/my_data.dat')
RETURNS('*','score varchar(10)');
Example 3
In this example, invoke a Python script file using the SCRIPT table operator. The script mapper.py reads in a line of text input (“Old Macdonald Had A Farm”) and splits the line into individual words, emitting a new row for each word.
An example of the Python script:
#!/usr/bin/python
import sys
# input comes from STDIN (standard input)
for line in sys.stdin:
# remove leading and trailing whitespace
line = line.strip()
# split the line into words
words = line.split()
# increase counters
for word in words:
# write the results to STDOUT (standard output);
# what we output here will be the input for the
# Reduce step, i.e. the input for reducer.py
#
# tab-delimited; the trivial word count is 1
print '%s\t%s' % (word, 1)
To install the script, run the following command:
CALL SYSUIF.INSTALL_FILE('mapper', 'mapper.py'
'cz!/tmp/mapper.py');
The table barrier contains the sentence as one line of text input:
Id int |
Name varchar(100) |
1 |
Old Macdonald Had A Farm |
To split the sentence into individual words, run the following script:
SELECT * FROM SCRIPT
( ON ( SELECT name FROM barrier )
SCRIPT_COMMAND('./mydb/mapper.py')
RETURNS ( 'word varchar(10)', 'count_input int' ) ) AS tab;
);
The result:
Word |
Count_input |
Old |
1 |
Macdonald |
1 |
Had |
1 |
A |
1 |
Farm |
1 |
Example 4
The following example uses some of Python’s built-in modules to create a JSON object from a website URL query string. This is useful in converting web URL data to JSON so it can be queried using the JSON data type.
The script requires that Python version 2.6 or higher is installed on the system.
The ‘urltojson.py’ Python script:
#!/usr/bin/python
import sys
import json
import urlparse
for line in sys.stdin:
print json.dumps(urlparse.parse_qs(urlparse.urlparse(line.rstrip('
\n')).query))
SQL to install and run the script on sample data:
DATABASE mytestdb;
create table sourcetab(url varchar(10000));
ins sourcetab('https://www.google.com/finance?q=NYSE:TDC');
ins sourcetab('http://www.ebay.com/sch/i.html?_trksid=p2050601.m570.l1313.TR0.TRC0.H0.Xteradata+merchandise&_nkw=teradata+merchandise&_sacat=0&_from=R40');
ins sourcetab('https://www.youtube.com/results?search_query=teradata%20commercial&sm=3');
ins sourcetab('https://www.contrivedexample.com/example?mylist=1&mylist=2&mylist=...testing');
-- This assumes that urltojson.py is in the current directory.
call sysuif.replace_file('urltojson', 'urltojson.py', 'cz!urltojson.py', 0);
set session searchuifdbpath=mytestdb;
select cast(JSONresult as JSON)
from SCRIPT(
ON(select url from sourcetab)
SCRIPT_COMMAND('./mytestdb/urltojson.py')
RETURNS(' JSONresult VARCHAR(10000)')
);
The result:
JSONresult
----------------------------------------------------------------------------------------
{"q": ["NYSE:TDC"]}
{"mylist": ["1", "2", "...testing"]}
{"_from": ["R40"], "_trksid": ["p2050601.m570.l1313.TR0.TRC0.H0.Xteradata merchandise"], "_nkw": ["teradata merchandise"], "_sacat": ["0"]}
{"search_query": ["teradata commercial"], "sm": ["3"]}
For More Information
For more information on installing scripts in the database, see Chapter 30: “Script Installation Procedures.”
For more information about the JSON data type, see Teradata JSON.