Arguments - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product
Aster Analytics
Release Number
6.21
Published
November 2016
Language
English (United States)
Last Update
2018-04-14
dita:mapPath
kiu1466024880662.ditamap
dita:ditavalPath
AA-notempfilter_pdf_output.ditaval
dita:id
B700-1021
lifecycle
previous
Product Category
Software
Argument Category Description
TextColumn Required Specifies the name of the input table column that contains the XML documents. The function skips malformed XML documents.
Nodes Required Specifies the node-pair strings from which the function extracts data. The simplest syntax for node_pair_string is:
[grandparent/]parent/child[,...]

where grandparent, parent, and child are node names.

For each grandparent, parent, and child, you can specify one or more attributes to extract:

{grandparent|parent|child}[:attribute[,...]]

For each node_pair_string, the function generates a row in the output table and adds a column for each specified attribute.

Node and attribute names are case-sensitive.

A grandparent or parent without attributes can contain wildcards. The wildcards can follow the rules of either the SQL LIKE statement or the Java regular expression.

The SQL LIKE statement syntax is 'like( expression )', where expression can include these wildcards:
  • Percent (%), which matches any sequence of zero or more characters
  • Underscore (_), which matches any single character
  • Backslash(\), which “escapes” the wildcard character that follows it, causing that wildcard character to be treated as an ordinary character. For example, 'like(%a_c\_)/d' matches the XML fragment <123abc_><d>text</d></123abc_>.

The Java Regular Expression syntax is 'regex( expression )', where expression follows the rules for a Java regular expression.

If no node_pair_string contains a parent node, or no node_pair_string contains a grandparent node, the function outputs nothing. If no node_pair_string contains a child node, the function outputs NULL child node values. If the argument specifies no attributes, the function outputs NULL attribute values.

Sibling Optional Specifies the sibling nodes of one parent node specified in the Nodes argument. The syntax for sibling_node_string is:
sibling_node_name[:attribute[,...]]

The function includes the values from the sibling nodes in every output row and adds a column to the output table for every sibling node and every specified attribute.

If no sibling_node_string contains a sibling node, the function outputs NULL sibling node values. If the argument specifies no attributes, the function outputs NULL attribute values.

Delimiter Optional Specifies the delimiter that separates multiple child node values in the output. The default value is comma (,).
SiblingDelimiter Optional Specifies the delimiter that separates multiple sibling node values in the output. The default value is comma (,).
MaxItemNum Optional Specifies the maximum number of sibling nodes with the same name to be returned. This value must be a positive integer. The default value is 10.
Ancestor Optional Specifies the ancestor paths for all parent nodes specified in the Nodes argument. The simplest syntax for nodes_path is:
node[/node]...

For each node, you can specify one or more attributes:

node[:attribute[,...]]

The default ancestor path is the root of the XML document.

A node without attributes can contain wildcards. The wildcards can follow the rules of either the SQL LIKE statement or the Java regular expression. For details, see the description of the Node argument.

If you specify multiple ancestor paths, then the function parses each XML document to get results for each ancestor path. If different ancestor paths contain duplicate node names, as in the following example, then the result might be ambiguous:

SELECT * FROM xmlparser (
  ON xml_inputstext_column ('xml')
  Nodes ('parent1/child1')
  Ancestor ('A/B:attr/C:attr','A/C:attr/B:attr')
);

If different ancestor paths contain duplicate node names, the function does not check for duplicate node names in the ancestor paths when constructing the output. Instead, the function maintains a list of column names for all ancestor paths in the output schema. For each result, the function fills the values of its ancestor path in the list and generates the output for the ancestor part.

If no nodes_path is an ancestor path, the function outputs nothing. If the argument specifies no attributes, the function outputs NULL attribute values.

OutputColumnNodeID Optional Specifies the name of the output table column where the function stores the IDs of the extracted nodes. The default name is out_nodeid.
OutputColumnParentNodeName Optional Specifies the name of the output table column where the function stores the names of the extracted parent nodes. The default name is out_parent_node.
OutputColumnGrandparentNodeName Optional Specifies the name of the output table column where the function stores the tag names of the extracted grandparent nodes. The default name is out_grandparent_node.
ErrorHandler Optional Specifies whether the function handles errors that occur when parsing an XML document. The default value is 'false' (the function aborts and throws an exception).
If you specify 'true':
  • If an error occurs while parsing a row, the function skips that row. When the function completes the parsing, it outputs only the nodes that were error-free.
  • You can tell the function to output an additional column named output_column and populate it with the values of the specified columns. In the output column, the values of the specified columns are separated with semicolons.

    For example, the following argument adds the column error_info to the output table and populates it with the values of input columns col1 and col2 (with a semicolon after each value):

    ErrorHandler('true;error_info:col1,col2')

The default output_column is ErrorHandler.

Accumulate Optional Specifies the names of input column names to copy to the output table. No accumulate_column can be specified by the argument OutputColumnNodeID, OutputColumnParentNodeName, or OutputColumnGrandparentNodeName. By default, the function copies all input columns to the output table.