7.00.02 - XMLParser Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User GuideUpdate 2

Product
Aster Analytics
Release Number
7.00.02
Published
September 2017
Content Type
Programming Reference
User Guide
Publication ID
B700-1022-700K
Language
English (United States)
Last Update
2018-04-17
TextColumn
Specifies the name of the input table column that contains the XML documents. The function skips malformed XML documents.
Nodes
Specifies the node-pair strings from which the function extracts data. The simplest syntax for node_pair_string is:
[grandparent/]parent/child[,...]

where grandparent, parent, and child are node names.

For each grandparent, parent, and child, you can specify one or more attributes to extract:

{grandparent|parent|child}[:attribute[,...]]

For each node_pair_string, the function generates a row in the output table and adds a column for each specified attribute.

Node and attribute names are case-sensitive.

A grandparent or parent without attributes can contain wildcards. The wildcards can follow the rules of either the SQL LIKE statement or the Java regular expression.

The SQL LIKE statement syntax is 'like(expression)', where expression can include these wildcards:
Wildcard Character Meaning
Percent (%) Matches any sequence of zero or more characters.
Underscore (_) Matches any single character.
Backslash (\) Makes the wildcard character that follows an ordinary character.

For example, 'like(%a_c\_)/d' matches the XML fragment <123abc_><d>text</d></123abc_>.

The Java Regular Expression syntax is 'regex( expression )', where expression follows the rules for a Java regular expression.

If no node_pair_string contains a parent node, or no node_pair_string contains a grandparent node, the function outputs nothing. If no node_pair_string contains a child node, the function outputs NULL child node values. If the argument specifies no attributes, the function outputs NULL attribute values.

Sibling
[Optional] Specifies the sibling nodes of one parent node specified in the Nodes argument. The syntax for sibling_node_string is:
sibling_node_name[:attribute[,...]]

The function includes the values from the sibling nodes in every output row and adds a column to the output table for every sibling node and every specified attribute.

If no sibling_node_string contains a sibling node, the function outputs NULL sibling node values. If the argument specifies no attributes, the function outputs NULL attribute values.

Delimiter
[Optional] Specifies the delimiter that separates multiple child node values in the output. Default: ',' (comma).
SiblingDelimiter
[Optional] Specifies the delimiter that separates multiple sibling node values in the output. Default: ',' (comma).
MaxItemNum
[Optional] Specifies the maximum number of sibling nodes with the same name to be returned. This value must be a positive integer. Default: 10.
Ancestor
[Optional] Specifies the ancestor paths for all parent nodes specified in the Nodes argument. The simplest syntax for nodes_path is:
node[/node]...

For each node, you can specify one or more attributes:

node[:attribute[,...]]

Default: Root of the XML document.

A node without attributes can contain wildcards. The wildcards can follow the rules of either the SQL LIKE statement or the Java regular expression. For details, see the description of the Node argument.

If you specify multiple ancestor paths, the function parses each XML document to get results for each ancestor path. If different ancestor paths contain duplicate node names, as in the following example, the result can be ambiguous:

SELECT * FROM xmlparser (
  ON xml_inputstext_column ('xml')
  Nodes ('parent1/child1')
  Ancestor ('A/B:attr/C:attr','A/C:attr/B:attr')
);

If different ancestor paths contain duplicate node names, the function does not check for duplicate node names in the ancestor paths when constructing the output. Instead, the function maintains a list of column names for all ancestor paths in the output schema. For each result, the function fills the values of its ancestor path in the list and generates the output for the ancestor part.

If no nodes_path is an ancestor path, the function outputs nothing. If the argument specifies no attributes, the function outputs NULL attribute values.

OutputColumnNodeID
[Optional] Specifies the name of the output table column where the function stores the IDs of the extracted nodes. Default: 'out_nodeid'.
OutputColumnParentNodeName
[Optional] Specifies the name of the output table column where the function stores the names of the extracted parent nodes. Default: 'out_parent_node'.
OutputColumn GrandparentNodeName
[Optional] Specifies the name of the output table column where the function stores the tag names of the extracted grandparent nodes. Default: 'out_grandparent_node'.
ErrorHandler
[Optional] Specifies whether the function handles errors that occur when parsing an XML document. Default: 'false' (the function aborts and throws an exception).
If you specify 'true':
  • If an error occurs while parsing a row, the function skips that row. When the function completes the parsing, it outputs only the nodes that were error-free.
  • You can tell the function to output an additional column named output_column and populate it with the values of the specified columns. In the output column, the values of the specified columns are separated with semicolons.

    For example, the following argument adds the column error_info to the output table and populates it with the values of input columns col1 and col2 (with a semicolon after each value):

    ErrorHandler('true;error_info:col1,col2')

Default: output_column is ErrorHandler.

Accumulate
[Optional] Specifies the names of input column names to copy to the output table. No accumulate_column can be specified by the argument OutputColumnNodeID, OutputColumnParentNodeName, or OutputColumnGrandparentNodeName. Default: All input columns.