Arguments - Aster Execution Engine

Teradata Aster® Developer Guide

Product
Aster Execution Engine
Release Number
7.00.02
Published
July 2017
Language
English (United States)
Last Update
2018-04-13
dita:mapPath
xnl1494366523182.ditamap
dita:ditavalPath
Generic_no_ie_no_tempfilter.ditaval
dita:id
ffu1489104705746
lifecycle
previous
Product Category
Software

This section focuses on the use of Stream. For complete informaton about SELECT, see the Teradata Aster® Execution Engine Aster Instance User Guide.

ON
The table, view, or query whose contents or results the Stream function operates on. To get data from a table, you must use the actual table name; you cannot use an alias.
PARTITION BY
[Optional] Introduces an expression that partitions the table's contents, view's contents, or query's results before the function operates on them.
ORDER BY
[Optional] Introduces an expression that sorts the table's contents, view's contents, or query's results before the function operates on them.
SCRIPT
[Optional] Introduces the name of the script or executable to be run. If the runtime needed to run the script is non-obvious, then you must pass the runtime executable name before the script name. The names are surrounded in a common set of single quotation marks, in the form:
SCRIPT ('script_name')
For example:
SCRIPT ('python mapper.py')
An Aster instance chooses the runtime executable in the same manner as a typical command shell. That is, an Aster instance follows the usual Unix convention of looking at the first line to see if it starts with an interpreter directive. For example, mapper.py has the following first line:
#!/usr/bin/env python
so the Python runtime will be used automatically. This means that you can write your SCRIPT clause like this, omitting the runtime name:
SCRIPT ('mapper.py')

If there is no such directive, then the program will run only if it is an executable binary (for example, a compiled C/C++ program).

MEM_LIMIT_MB
[Optional] Sets the virtual memory limit in MB of the script process to the specified value per Stream function invocation. For example:
  • To set the memory limit to 1000 MB:
    SELECT * FROM stream (ON numbers_small
       SCRIPT ('myscript.py') OUTPUTS('outputline varchar')
       MEM_LIMIT_MB('1000')
  • To remove the memory limitation:
    SELECT * FROM stream (ON numbers_small
       SCRIPT ('myscript.py') OUTPUTS('outputline varchar')
       MEM_LIMIT_MB('unlimited')

If the Stream process exceeds the memory limit, the operating system stops the Stream process and this error message appears in ACT:

ERROR:  SQL-MR function STREAM failed: Stream process exited with non-zero exit value (1)

You can set the default memory limit per process using the STREAM_MEM_LIMIT environment variable in asterenv.sh. The default value is set to 4000 MB.

OUTPUTS
[Optional] Specifies the names and types of output columns from this call to the Stream function. By default, the output from a Stream script consists of two columns: (key varchar, value varchar). To specify different column names and/or types, use the OUTPUTS clause. In the OUTPUTS clause, each 'column_name column_type' pair must be surrounded with single quotation marks. For example, you might add the following example clause if your function produces pairs of given names and counts:
OUTPUTS ('givenname varchar', 'count int')
You can specify the asterisk (*) in the OUTPUT clause to represent the column names and types of the input of the Stream function. For example, consider this:
  • The table testtable has two columns: test_id int and test_number float.
  • repeat.py outputs rows with column types int, float, and int and float, in that order.
You could specify the schema of the output table as follows:
SELECT * FROM stream (
  ON testtable SCRIPT('repeat.py')
  OUTPUTS('id int', 'number float', '*')
);

The output columns have this schema:

'id int', 'number float', 'test_id int', 'test_number float'

If the db setting enable_quoted_identifiers='off', the asterisk can be enclosed in single or double quotation marks.

If the db setting enable_quoted_identifiers='on' (default), the asterisk must be enclosed in single quotation marks.

NULLSTRING
[Optional] Changes NULL values in the input table to null_string, before sending the input data as an STDIN stream to the Stream script.

Changes null_string values in the columns of the output table to NULL.

If you do not use this argument, the default behavior is:
  • INPUT

    If there is a NULL value in the input row, the Stream function does not add any value between the two corresponding column delimiters sent as STDIN to the Stream script process.

  • OUTPUT

    Given a value (possibly empty) between two consecutive delimiters in the STDOUT of the Stream script process, the Stream function adds the value to the corresponding column in that row. If the script writes less columns than expected, the remaining columns are filled with NULL values.

  • To minimize overhead, it is best for the NULLSTRING to be as short as possible (for example, an empty string '').
  • The STDOUT of the script should be in UTF-8 to allow for NULLSTRING detection.

    For example:

    SELECT * FROM stream (ON mytable SCRIPT('myscript')
       OUTPUTS('line varchar') NULLSTRING('MYNULL')) ORDER BY 1;
    SELECT * FROM stream (ON (select (1)) SCRIPT('myscript')
       OUTPUTS('a int','b int','c int') NULLSTRING('')) ORDER BY 1;
DELIMITER
[Optional] Specifies the column-delimiter character. You only need to use this clause if your Stream script does not follow the default behavior of using tabs to delimit output columns. In the DELIMITER clause, you must surround the character in single quotation marks. In your Stream function implementation code, you must specify this character as the column delimiter. For example, this clause declares the pipe character to be the delimiter:
DELIMITER ('|')
TYPEFORMAT
[Optional] Converts boolean values with the correct R format. You can specify the type in position 0, then substitute values in the following positions.

Only the argument values TYPEFORMAT('boolean','TRUE','FALSE') are supported. This is interpreted as: for the boolean type, substitute 'TRUE' for 't' and 'FALSE' for 'f' when these literals are found in the input table.

AS
[Optional] Provides an alias for the Stream function call in this query. Using an alias is optional, and, when declaring an alias, the AS keyword is optional.