16.20 - Example: SCRIPT Table Operator Function - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - SQL Operators and User-Defined Functions

Product
Advanced SQL Engine
Teradata Database
Release Number
16.20
Release Date
April 2020
Content Type
Administration
Programming Reference
Publication ID
B035-1210-162K
Language
English (United States)

Create a file called helloworld.py in the /root directory with the following code:

#!/usr/bin/python
print 'hello world!'

Install the file in the database:

DATABASE mydatabase;
call SYSUIF.install_file('helloworld',
                         'helloworld.py', 'cz!/root/helloworld.py');

Run the script:

SET SESSION SEARCHUIFDBPATH = mydatabase;
SELECT DISTINCT *
FROM SCRIPT (
	SCRIPT_COMMAND('./mydatabase/helloworld.py')
	RETURNS ('text VARCHAR(30)')
);

In the following example, install a script and other files, and then execute the SCRIPT table operator.

DATABASE mydb;
SET SESSION SEARCHUIFDBPATH = mydb;
call SYSUIF.install_file('my_analytics',
                         'cz!my_analytics.sh!/tmp/my_analytics.sh');
call SYSUIF.install_file('my_model',
            'cz!my_model.model!/tmp/my_model.sh');
call SYSUIF.install_file('my_data',
                         'cz!my_data.dat!/tmp/my_data.dat');
Select * from SCRIPT( on data_table
		SCRIPT_COMMAND('./mydb/my_analytics.sh –mymodel ./mydb/my_model.model –myadditionaldata ./mydb/my_data.dat')
		RETURNS('*','score varchar(10)');

In this example, invoke a Python script file using the SCRIPT table operator. The script mapper.py reads in a line of text input (“Old Macdonald Had A Farm”) and splits the line into individual words, emitting a new row for each word.

An example of the Python script:

#!/usr/bin/python
import sys
# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # increase counters
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
print '%s\t%s' % (word, 1)

To install the script, run the following command:

CALL SYSUIF.INSTALL_FILE('mapper', 'mapper.py'
                         'cz!/tmp/mapper.py');

The table barrier contains the sentence as one line of text input:

Id int Name varchar(100)
1 Old Macdonald Had A Farm

To split the sentence into individual words, run the following script:

SELECT * FROM SCRIPT
( ON ( SELECT name FROM barrier )	
SCRIPT_COMMAND('./mydb/mapper.py')
	RETURNS ( 'word varchar(10)', 'count_input int' ) ) AS tab;
);

The result:

Word Count_input
Old 1
Macdonald 1
Had 1
A 1
Farm 1

The following example uses some of Python’s built-in modules to create a JSON object from a website URL query string. This is useful in converting web URL data to JSON so it can be queried using the JSON data type.

The script requires that Python version 2.6 or higher is installed on the system.

The ‘urltojson.py’ Python script:

#!/usr/bin/python
import sys
import json
import urlparse
for line in sys.stdin:
    print json.dumps(urlparse.parse_qs(urlparse.urlparse(line.rstrip('
	\n')).query))

SQL to install and run the script on sample data:

DATABASE mytestdb;
create table sourcetab(url varchar(10000));
ins sourcetab('https://www.google.com/finance?q=NYSE:TDC');
ins sourcetab('http://www.ebay.com/sch/i.html?_trksid=p2050601.m570.l1313.TR0.TRC0.H0.Xteradata+merchandise&_nkw=teradata+merchandise&_sacat=0&_from=R40');
ins sourcetab('https://www.youtube.com/results?search_query=teradata%20commercial&sm=3');
ins sourcetab('https://www.contrivedexample.com/example?mylist=1&mylist=2&mylist=...testing');
-- This assumes that urltojson.py is in the current directory.
call sysuif.replace_file('urltojson', 'urltojson.py', 'cz!urltojson.py', 0);
set session searchuifdbpath=mytestdb;
select cast(JSONresult as JSON)
from SCRIPT(
    ON(select url from sourcetab)
    SCRIPT_COMMAND('./mytestdb/urltojson.py')
    RETURNS(' JSONresult VARCHAR(10000)')
);

The result:

JSONresult
----------------------------------------------------------------------------------------
{"q": ["NYSE:TDC"]}
{"mylist": ["1", "2", "...testing"]}
{"_from": ["R40"], "_trksid": ["p2050601.m570.l1313.TR0.TRC0.H0.Xteradata merchandise"], "_nkw": ["teradata merchandise"], "_sacat": ["0"]}
{"search_query": ["teradata commercial"], "sm": ["3"]}