- TextColumn
- Specify the name of the input table column that contains the XML documents. The function skips malformed XML documents.
- Nodes
- Specify the node-pair strings from which the function extracts data. This is the simplest syntax for node_pair_string:
[grandparent/]parent/child[,...]
where grandparent, parent, and child are node names.
For each grandparent, parent, and child, you can specify one or more attributes to extract:
{grandparent|parent|child}[:attribute[,...]]
For each node_pair_string, the function creates a row in the output table and adds a column for each specified attribute.
Node and attribute names are case-sensitive.
A grandparent or parent without attributes can contain wildcards. The wildcards can follow the rules of either the SQL LIKE statement or the Java regular expression.
The SQL LIKE statement syntax is 'like(expression)', where expression can include these wildcards:
Wildcard Character    Meaning
Percent (%)           Matches any sequence of zero or more characters.
Underscore (_)        Matches any single character.
Backslash (\)         Makes the wildcard character that follows an ordinary character.
For example, 'like(%a_c\_)/d' matches the XML fragment <123abc_><d>text</d></123abc_>.
The Java regular expression syntax is 'regex(expression)', where expression follows the rules for a Java regular expression.
If no node_pair_string contains a parent node, or no node_pair_string contains a grandparent node, the function outputs nothing. If no node_pair_string contains a child node, the function outputs NULL child node values. If the argument specifies no attributes, the function outputs NULL attribute values.
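The LIKE-style wildcard rules described for the Nodes argument can be sketched in Python as a translation to a regular expression. This is only an illustration of the matching rules, not the function's implementation; the helper name like_to_regex is hypothetical, and Python's regular expressions stand in for Java's here.

```python
import re

def like_to_regex(expression):
    """Translate a SQL LIKE-style pattern into a compiled Python regex.

    Hypothetical helper for illustration only:
      %  matches any sequence of zero or more characters
      _  matches any single character
      \\  makes the character that follows an ordinary character
    """
    parts = []
    i = 0
    while i < len(expression):
        ch = expression[i]
        if ch == "\\" and i + 1 < len(expression):
            # Escaped character: treat the next character literally.
            parts.append(re.escape(expression[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))

# The pattern from the example 'like(%a_c\_)/d', applied to a node name.
pattern = like_to_regex(r"%a_c\_")
print(bool(pattern.fullmatch("123abc_")))  # True
print(bool(pattern.fullmatch("123abcd")))  # False: the last character must be a literal underscore
```

The sketch matches against the whole node name (fullmatch), which is an assumption about how the function applies the pattern.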
- Sibling
- [Optional] Specify the sibling nodes of one parent node specified in the Nodes argument. This is the syntax for sibling_node_string:
sibling_node_name[:attribute[,...]]
The function includes the values from the sibling nodes in every output row and adds a column to the output table for every sibling node and every specified attribute.
If no sibling_node_string contains a sibling node, the function outputs NULL sibling node values. If the argument specifies no attributes, the function outputs NULL attribute values.
- Delimiter
- [Optional] Specify the delimiter that separates multiple child node values in the output.
- SiblingDelimiter
- [Optional] Specify the delimiter that separates multiple sibling node values in the output.
- MaxItemNum
- [Optional] Specify the maximum number of sibling nodes with the same name to return. This value must be a positive integer.
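The combined effect of Delimiter, SiblingDelimiter, and MaxItemNum can be illustrated with a small Python sketch. The document, tag names, and helper functions below are hypothetical, and keeping the first MaxItemNum siblings in document order is an assumption rather than documented behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one parent with two children, plus three
# same-named siblings of that parent.
DOC = (
    "<root>"
    "<parent><child>a</child><child>b</child></parent>"
    "<note>s1</note><note>s2</note><note>s3</note>"
    "</root>"
)

def join_child_values(parent, child_tag, delimiter):
    """Join the text of every matching child node with the delimiter."""
    return delimiter.join(c.text or "" for c in parent.findall(child_tag))

def join_sibling_values(container, sibling_tag, sibling_delimiter, max_item_num):
    """Join at most max_item_num same-named sibling values (assumed: the first ones)."""
    if max_item_num < 1:
        raise ValueError("max_item_num must be a positive integer")
    siblings = container.findall(sibling_tag)[:max_item_num]
    return sibling_delimiter.join(s.text or "" for s in siblings)

root = ET.fromstring(DOC)
parent = root.find("parent")
print(join_child_values(parent, "child", "|"))    # a|b
print(join_sibling_values(root, "note", ";", 2))  # s1;s2
```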
- Ancestor
- [Optional] Specify the ancestor paths for all parent nodes specified in the Nodes argument. This is the simplest syntax for nodes_path:
node[/node]...
For each node, you can specify one or more attributes:
node[:attribute[,...]]
A node without attributes can contain wildcards. The wildcards can follow the rules of either the SQL LIKE statement or the Java regular expression. For details, see the description of the Nodes argument.
If you specify multiple ancestor paths, the function parses each XML document to get results for each ancestor path. If different ancestor paths contain duplicate node names, as in the following example, the result can be ambiguous:
SELECT * FROM xmlparser (
  ON xml_inputs
  TextColumn ('xml')
  Nodes ('parent1/child1')
  Ancestor ('A/B:attr/C:attr', 'A/C:attr/B:attr')
);
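One way to see why duplicate node names across ancestor paths can be ambiguous is the following Python sketch. It is not the function's implementation: the sample document, the rows_for helper, and the "node:attribute" result keys are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which B and C appear in both nesting orders.
DOC = """
<A>
  <B attr="b-outer"><C attr="c-inner"><parent1><child1>x</child1></parent1></C></B>
  <C attr="c-outer"><B attr="b-inner"><parent1><child1>y</child1></parent1></B></C>
</A>
""".strip()

def parse_path(path):
    """Split 'A/B:attr/C:attr' into [(name, attribute-or-None), ...]."""
    steps = []
    for step in path.split("/"):
        name, _, attr = step.partition(":")
        steps.append((name, attr or None))
    return steps

def rows_for(root, ancestor_path, parent="parent1", child="child1"):
    """Collect one row per child found under the given ancestor path."""
    steps = parse_path(ancestor_path)
    rows = []

    def walk(elem, i, seen):
        name, attr = steps[i]
        if elem.tag != name:
            return
        if attr is not None:
            seen = {**seen, f"{name}:{attr}": elem.get(attr)}
        if i + 1 < len(steps):
            for sub in elem:
                walk(sub, i + 1, seen)
        else:
            for p in elem.findall(parent):
                for c in p.findall(child):
                    rows.append({**seen, child: c.text})

    walk(root, 0, {})
    return rows

root = ET.fromstring(DOC)
for path in ("A/B:attr/C:attr", "A/C:attr/B:attr"):
    print(path, rows_for(root, path))
```

Under these assumptions, both ancestor paths yield rows with the same keys (B:attr, C:attr, child1), so a result keyed only by node and attribute name does not show which path produced a given value.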
- OutputColumnNodeID
- [Optional] Specify the name of the output table column where the function stores the IDs of the extracted nodes.
- OutputColumnParentNodeName
- [Optional] Specify the name of the output table column where the function stores the names of the extracted parent nodes.
- OutputColumnGrandparentNodeName
- [Optional] Specify the name of the output table column where the function stores the tag names of the extracted grandparent nodes.
- ErrorHandler
- [Optional] Specify whether the function handles errors that occur when parsing an XML document.
- Accumulate
- [Optional] Specify the names of the input columns to copy to the output table. An accumulate_column cannot have the same name as a column specified by the OutputColumnNodeID, OutputColumnParentNodeName, or OutputColumnGrandparentNodeName argument.