16.20 - XMLSPLIT Functional Description - Teradata Vantage NewSQL Engine

Teradata Vantage™ XML Data Type

Teradata Database
Teradata Vantage NewSQL Engine
March 2019
Programming Reference

The XMLSPLIT function splits a source XML document into multiple documents, preserving specified parts of the document hierarchy. The function relies on the observation that XML documents are typically large because of some repeating structure within them. For instance, a customer archive XML document might be large because it contains many "Customer" elements.

The output documents contain the same basic document structure as the source document up to and including the element specified by splitPath. Each element identified by splitPath is copied in its entirety: even if an output document exceeds splitSize, the function does not split off a document at a descendant of that element.

Any elements left over after the end of the last element matching the splitPath are returned as the last document. Any other paths that occur earlier in document order will be replicated in all the result documents only if they are specified by the replicationList parameter.

All ancestor elements (and their attributes) of the element identified by the splitPath are always replicated. In addition, you can choose which of the preceding elements (and their descendants) of the split element will be replicated, by specifying them in the replicationList parameter. This parameter takes a comma-separated list of paths, the wildcard character "*", or NULL. The wildcard character indicates that all content that occurs earlier in the document, other than the parent element of the split element, will be included in all of the output documents.

The following example rules can be applied to split a source document with the structure shown above.
  • If splitPath is /A/B3 (assuming B3 occurs multiple times), then the element A and its attributes will appear in all the resulting documents. In addition, you can choose to replicate the previous sibling element B1, by specifying the replicationList parameter as /A/B1.
  • You can have all elements that occur in document order before B3 (elements A, B1 and all its descendants, and B2 and all its descendants) replicated in each of the output documents by specifying "*" in the replicationList parameter.
  • You can choose that a particular descendant of a previous sibling be replicated by specifying the path to it. In the document structure described above, suppose element B1 has two children, C1 and C2. You can choose that element C1 be replicated but not C2 by specifying the replication list as /A/B1/C1. Any element identified like this, and all of its descendants, will be replicated.
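The rules above can be modeled with a small Python sketch. This is a toy illustration of the described behavior, not Teradata's implementation and not the XMLSPLIT SQL interface. The function name split_on_child, its parameters, and the sample document are hypothetical. It assumes the split element is a direct child of the root (as in /A/B3), accepts only preceding sibling tags or "*" as the replication list, uses a per_doc count in place of splitSize, and ignores any elements left over after the last split element.

```python
import copy
import xml.etree.ElementTree as ET


def split_on_child(xml_text, split_tag, replicate=None, per_doc=1):
    """Toy model of the replication rules (hypothetical helper).

    split_tag : tag of the repeating child of the root, e.g. "B3" for /A/B3
    replicate : None, "*", or a set of preceding sibling tags to copy
    per_doc   : split elements per output document (stand-in for splitSize)
    """
    root = ET.fromstring(xml_text)
    children = list(root)
    # Index of the first element that matches the split path.
    first = next(i for i, c in enumerate(children) if c.tag == split_tag)
    preceding = children[:first]

    if replicate is None:
        kept = []                      # NULL: replicate nothing extra
    elif replicate == "*":
        kept = preceding               # wildcard: everything earlier
    else:
        kept = [c for c in preceding if c.tag in replicate]

    targets = [c for c in children if c.tag == split_tag]
    docs = []
    for start in range(0, len(targets), per_doc):
        # The ancestor (here, the root) and its attributes always appear.
        out = ET.Element(root.tag, root.attrib)
        out.extend([copy.deepcopy(c) for c in kept])
        out.extend([copy.deepcopy(c) for c in targets[start:start + per_doc]])
        docs.append(ET.tostring(out, encoding="unicode"))
    return docs


# Assumed sample document: A has children B1 (with C1, C2), B2, and two B3.
source = '<A id="1"><B1><C1/><C2/></B1><B2/><B3>x</B3><B3>y</B3></A>'
for doc in split_on_child(source, "B3", replicate={"B1"}):
    print(doc)
```

With replicate={"B1"}, each output document in this model holds the A element with its attributes, a copy of B1 with its descendants, and one B3. Passing "*" would also copy B2, and passing None would keep only A and the B3 element.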

For the figure above, if the splitPath is /A/B2/C, the path /A/B2 occurs in each of the output documents, and each output document contains as many C elements as needed for its size to conform to the splitSize specification. The output documents contain only complete C elements: the process does not stop at a descendant of a C element in one document and continue with it in the next output document.
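The grouping of complete C elements can be sketched in Python as follows. This is an illustrative model only, not the actual XMLSPLIT algorithm. The helper names batch_whole_elements and split_b2_c and the sample document are hypothetical, and the sketch assumes that size is measured as the serialized character length of each C element and that every output document holds at least one C element even when that element alone exceeds the limit.

```python
import xml.etree.ElementTree as ET


def batch_whole_elements(elements, split_size):
    """Group elements into batches without ever cutting an element apart."""
    batches, current, used = [], [], 0
    for elem in elements:
        # Assumption: size is the serialized length of the element.
        size = len(ET.tostring(elem, encoding="unicode"))
        # Start a new batch only between elements, never inside one.
        if current and used + size > split_size:
            batches.append(current)
            current, used = [], 0
        current.append(elem)
        used += size
    if current:
        batches.append(current)
    return batches


def split_b2_c(xml_text, split_size):
    """Model a split on /A/B2/C with no replication list."""
    root = ET.fromstring(xml_text)
    b2 = root.find("B2")
    docs = []
    for batch in batch_whole_elements(b2.findall("C"), split_size):
        # Ancestors A and B2 (with their attributes) appear in every output.
        a = ET.Element(root.tag, root.attrib)
        new_b2 = ET.SubElement(a, b2.tag, b2.attrib)
        new_b2.extend(batch)
        docs.append(ET.tostring(a, encoding="unicode"))
    return docs


# Assumed sample document with three C elements under /A/B2.
sample = ('<A><B1/><B2 k="v"><C>one</C><C>two</C>'
          '<C><D>three</D></C></B2><B3/></A>')
for doc in split_b2_c(sample, 20):
    print(doc)
```

In this model a limit smaller than any single C element still yields one whole C element per output document, which mirrors the rule that a document is never split at a descendant of the split element.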

The diagram below shows the resulting documents if the splitPath is /A/B2/C.

If the splitPath is /A/B2/C and the replicationList is /A/B3, the output documents contain the following.

If the splitPath is /A/B2/C and the replicationList is "/A/B1, /A/B3", the output documents contain the following.