XMLSPLIT Function | XML Data Type | Teradata Vantage

XML Data Type

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database, Teradata Vantage
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2023-10-30
Product Category: Teradata Vantage™

The XMLSPLIT table function takes a single XML document as input and returns multiple rows, each containing a smaller document split from the source document. XSLT-based shredding or schema-based shredding can then be applied to these smaller XML documents.

This function is useful when an XML document must be processed in memory (for example, for XSLT-based shredding, XML Query, or XSLT processing) but is too large to load into memory under the limit imposed by the XMLMemoryLimit dbscontrol setting. XMLSPLIT requires the XML document to be passed in as a parameter of CLOB data type, because it is expected to be used as a preprocessing step that splits the document into smaller documents before XML type instances are created for further processing. XMLSPLIT also returns the split documents as CLOB data type.

The XMLSPLIT function splits a source XML document into multiple documents, preserving specified parts of the document hierarchy. Its semantics rely on the observation that an XML document is typically large because it contains some repeating structure. For instance, a customer archive XML document might be large because it contains a large number of "Customer" elements.

The output documents contain the same basic document structure as the source document, up to and including the element specified by splitPath. Each element identified by splitPath is replicated in its entirety: even if doing so exceeds splitSize, the function does not split off a document at some descendant of that element.

Any elements left over after the end of the last element matching the splitPath are returned as the last document. Any other paths that occur earlier in document order will be replicated in all the result documents only if they are specified by the replicationList parameter.

All ancestor elements (and their attributes) of the element identified by splitPath are always replicated. In addition, you can choose which of the elements preceding the split element (and their descendants) are replicated by specifying them in the replicationList parameter. This parameter takes a comma-separated list of paths, the wildcard character "*", or NULL. The wildcard character indicates that all content that occurs earlier in the document, other than the parent element of the split element, is included in all of the output documents.
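As a rough illustration of the three forms this parameter can take, the following Python sketch normalizes a replicationList-style value. The helper name parse_replication_list and its return conventions are hypothetical and are not part of the Teradata API. Treating NULL as "replicate nothing beyond the ancestors" is an assumption made for this sketch.

```python
def parse_replication_list(value):
    """Normalize a replicationList-style argument (hypothetical helper).

    Returns "*" for the wildcard, an empty list for NULL (None), and a
    list of trimmed paths for a comma-separated string.
    """
    if value is None:
        # Assumption: NULL means no extra content is replicated.
        return []
    if value.strip() == "*":
        # Wildcard: all earlier content is replicated.
        return "*"
    # Comma-separated paths; surrounding whitespace is ignored.
    return [path.strip() for path in value.split(",") if path.strip()]
```

For example, under these assumptions the value "/A/B1, /A/B3" would be read as the two paths /A/B1 and /A/B3.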



The following example rules can be applied to split a source document with the structure shown above.
  • If splitPath is /A/B3 (assuming B3 occurs multiple times), then the element A and its attributes will appear in all the resulting documents. In addition, you can choose to replicate the previous sibling element B1, by specifying the replicationList parameter as /A/B1.
  • You can have all elements that occur in document order before B3 (elements A, B1 and all its descendants, and B2 and all its descendants) replicated in each of the output documents by specifying "*" in the replicationList parameter.
  • You can choose that a particular descendant of a previous sibling be replicated by specifying the path to it. In the document structure described above, suppose element B1 has two children, C1 and C2. You can choose that element C1 be replicated but not C2 by specifying the replication list as /A/B1/C1. Any element identified like this, and all of its descendants, will be replicated.
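The rule in the last bullet (a listed element is replicated together with all of its descendants, while unlisted siblings are not) can be modeled as a simple path-prefix check. This is a minimal sketch for illustration only: the function name is_replicated is hypothetical, paths are treated as plain strings, and the "*" wildcard is deliberately not modeled.

```python
def is_replicated(element_path, replication_paths):
    """Return True if element_path is a listed path or lies beneath one.

    Hypothetical helper: paths are compared as plain strings such as
    "/A/B1/C1"; this is not how Teradata evaluates paths internally.
    """
    for listed in replication_paths:
        # Match the listed element itself, or any descendant of it.
        if element_path == listed or element_path.startswith(listed + "/"):
            return True
    return False
```

With a replication list of /A/B1/C1, this check accepts /A/B1/C1 and anything below it, and rejects the sibling /A/B1/C2.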

For the figure above, if the splitPath is /A/B2/C, the path /A/B2 occurs in each of the output documents, and each output document contains as many C elements as needed for its size to conform to the splitSize specification. The output documents contain complete C elements: the process does not stop at some descendant of a C element in one document and continue with it in the next output document.
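The behavior described for /A/B2/C can be approximated with a short Python model. This is a conceptual sketch, not Teradata's implementation: the function split_document is hypothetical, split_size is assumed to be a character budget counted over the serialized split elements only, and the sample document is made up. The sketch leaves out the replicationList parameter and the handling of elements left over after the last match.

```python
import copy
import xml.etree.ElementTree as ET

def split_document(xml_text, split_path, split_size):
    """Conceptual model of size-bounded splitting (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    steps = split_path.strip("/").split("/")
    if steps[0] != root.tag:
        raise ValueError("split_path must start at the root element")

    # Walk down to the parent of the split element, keeping the ancestors.
    ancestors = [root]
    for tag in steps[1:-1]:
        child = ancestors[-1].find(tag)
        if child is None:
            raise ValueError("path not found: " + split_path)
        ancestors.append(child)
    targets = ancestors[-1].findall(steps[-1])

    # Group whole split elements; a group closes once it reaches the budget,
    # so an element is never cut at one of its descendants.
    groups, current, used = [], [], 0
    for elem in targets:
        current.append(elem)
        used += len(ET.tostring(elem, encoding="unicode"))
        if used >= split_size:
            groups.append(current)
            current, used = [], 0
    if current:
        groups.append(current)

    # Rebuild each piece: ancestor tags and attributes plus one group.
    pieces = []
    for group in groups:
        top = ET.Element(ancestors[0].tag, ancestors[0].attrib)
        node = top
        for anc in ancestors[1:]:
            node = ET.SubElement(node, anc.tag, anc.attrib)
        for elem in group:
            node.append(copy.deepcopy(elem))
        pieces.append(ET.tostring(top, encoding="unicode"))
    return pieces

# Made-up sample with the A / B1, B2, B3 shape used in this topic.
doc = '<A id="1"><B1>x</B1><B2 k="v"><C>1</C><C>2</C><C>3</C></B2><B3>y</B3></A>'
pieces = split_document(doc, "/A/B2/C", 16)
for piece in pieces:
    print(piece)
```

In this model every piece repeats A and B2 with their attributes and holds only complete C elements, while B1 and B3 are omitted because no replication list is applied.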

The following diagram shows the resulting documents if the splitPath is /A/B2/C.



If the splitPath is /A/B2/C and the replicationList is /A/B3, the output documents contain the following.



If the splitPath is /A/B2/C and the replicationList is "/A/B1, /A/B3", the output documents contain the following.