XMLParser Syntax Elements - Teradata Vantage

Machine Learning Engine Analytic Function Reference

Product

Teradata Vantage

Release Number

8.10

1.1

Published

October 2019

Language

English (United States)

Last Update

2019-12-31

dita:mapPath

ima1540829771750.ditamap

dita:ditavalPath

jsj1481748799576.ditaval

dita:id

B700-4003

lifecycle

Product Category

Teradata Vantage™

TextColumn

Specify the name of the input table column that contains the XML documents. The function skips malformed XML documents.

Nodes

Specify the node-pair strings from which the function extracts data. This is the simplest syntax for node_pair_string:

[grandparent/]parent/child[,...]

where grandparent, parent, and child are node names.

For each grandparent, parent, and child, you can specify one or more attributes to extract:

{grandparent|parent|child}[:attribute[,...]]

For each node_pair_string, the function creates a row in the output table and adds a column for each specified attribute.

Node and attribute names are case-sensitive.

A grandparent or parent without attributes can contain wildcards. The wildcards can follow the rules of either the SQL LIKE statement or the Java regular expression.

The SQL LIKE statement syntax is 'like(expression)', where expression can include these wildcards:

Wildcard Character	Meaning
Percent (%)	Matches any sequence of zero or more characters.
Underscore (_)	Matches any single character.
Backslash (\)	Makes the wildcard character that follows an ordinary character.

For example, 'like(%a_c\_)/d' matches the XML fragment <123abc_><d>text</d></123abc_>.

The Java Regular Expression syntax is 'regex( expression )', where expression follows the rules for a Java regular expression.

If no node_pair_string contains a parent node, or no node_pair_string contains a grandparent node, the function outputs nothing. If no node_pair_string contains a child node, the function outputs NULL child node values. If the syntax element specifies no attributes, the function outputs NULL attribute values.

Sibling

[Optional] Specify the sibling nodes of one parent node specified in the Nodes syntax element. This is the syntax for sibling_node_string:

sibling_node_name[:attribute[,...]]

The function includes the values from the sibling nodes in every output row and adds a column to the output table for every sibling node and every specified attribute.

If no sibling_node_string contains a sibling node, the function outputs NULL sibling node values. If the syntax element specifies no attributes, the function outputs NULL attribute values.

Delimiter

[Optional] Specify the delimiter that separates multiple child node values in the output.

Default: ',' (comma)

SiblingDelimiter

[Optional] Specify the delimiter that separates multiple sibling node values in the output.

Default: ',' (comma)

MaxItemNum

[Optional] Specify the maximum number of sibling nodes with the same name to return. This value must be a positive integer.

Default: 10

Ancestor

[Optional] Specify the ancestor paths for all parent nodes specified in the Nodes syntax element. This is the simplest syntax for nodes_path:

node[/node]...

For each node, you can specify one or more attributes:

node[:attribute[,...]]

A node without attributes can contain wildcards. The wildcards can follow the rules of either the SQL LIKE statement or the Java regular expression. For details, see the description of the Node syntax element.

If you specify multiple ancestor paths, the function parses each XML document to get results for each ancestor path. If different ancestor paths contain duplicate node names, as in the following example, the result can be ambiguous:

SELECT * FROM xmlparser (
  ON xml_inputstext_column ('xml')
  Nodes ('parent1/child1')
  Ancestor ('A/B:attr/C:attr','A/C:attr/B:attr')
);

If different ancestor paths contain duplicate node names, the function does not check for duplicate node names in the ancestor paths when constructing the output. Instead, the function maintains a list of column names for all ancestor paths in the output schema. For each result, the function fills the values of its ancestor path in the list and creates the output for the ancestor part.

If no nodes_path is an ancestor path, the function outputs nothing. If the syntax element specifies no attributes, the function outputs NULL attribute values.

Default: Root of the XML document

OutputColumnNodeID

[Optional] Specify the name of the output table column where the function stores the IDs of the extracted nodes.

Default: 'out_nodeid'

OutputColumnParentNodeName

[Optional] Specify the name of the output table column where the function stores the names of the extracted parent nodes.

Default: 'out_parent_node'

OutputColumn GrandparentNodeName

[Optional] Specify the name of the output table column where the function stores the tag names of the extracted grandparent nodes.

Default: 'out_grandparent_node'

ErrorHandler

[Optional] Specify whether the function handles errors that occur when parsing an XML document.

If you specify 'true':

If an error occurs while parsing a row, the function skips that row. When the function completes the parsing, it outputs only the nodes that were error-free.
You can tell the function to output an additional column named output_column and populate it with the values of the specified columns. In the output column, the values of the specified columns are separated with semicolons.
For example, the following syntax element adds the column error_info to the output table and populates it with the values of input columns col1 and col2 (with a semicolon after each value):
```
ErrorHandler('true;error_info:col1,col2')
```

Default: 'false' (The function aborts and throws an exception. The output_column is ErrorHandler.)

Accumulate

[Optional] Specify the names of input column names to copy to the output table. No accumulate_column can be specified by the syntax element OutputColumnNodeID, OutputColumnParentNodeName, or OutputColumnGrandparentNodeName.

Default: All input columns