Example: Parsing an XML Document Using a Scalar UDF - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

One example where scalar UDFs are useful is scanning an XML document and returning specified content, after that document has been stored inside the database. The following example is of an external UDF that uses XPath, which is a set of syntax rules that allow you to navigate an XML document. XPath, which has a function similar to substring, uses path expressions to identify nodes in an XML document.

Depending on your requirements, the XML document can be stored as a CLOB (Character Large Object) or as a VARCHAR column. The following table shows the former, while the following prototype uses the latter.

In this example, the XML document is stored inside Vantage as one VARCHAR column, XMLOrder. The base table, OrderLog, contains only two columns, PONum and the VARCHAR column. Here is the XML document:

    <?xml version="1.0"?>
    <ROOT>
    <ORDER>
       <DATE>8/22/2004</DATE>
       <PO_NUMBER>101</PO_NUMBER>
       <BILLTO>Mike</BILLTO>
       <ITEMS>
        <ITEM>
         <PARTNUM>101</PARTNUM>
         <DESC>Partners Conference Ticket</DESC>
         <USPRICE>1200.00</USPRICE>
        </ITEM>
        <ITEM>
          <PARTNUM>147</PARTNUM>
          <DESC>V2R5.1 UDF Programming</DESC>
          <USPRICE>28.95</USPRICE>
        </ITEM>
        </ITEMS>
    </ORDER>
    </ROOT> 

The DDL for the orderlog table looks like this (orderlog is on the Block File System):

    CREATE SET TABLE orderlog, NO FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT (
       PONum    INTEGER NOT NULL,
       XMLOrder VARCHAR(63,000) )
    UNIQUE PRIMARY INDEX ( PONum );

The following SELECT request references the XpathValue UDF that uses XPath to extract element and attribute content from the XML document. The arguments passed within the SELECT request, such as the BILLTO name, are then used by XPath to search each document in the table. When a document having that specific billing name is identified, then the associated PO number and date are returned as output arguments.

    SELECT XPathValue(O.xmlOrder, '//ORDER/PO_NUMBER/*') AS PO_Number,
           XPathValue(O.xmlOrder, '//ORDER/DATE/*') AS theDate
    FROM OrderLog O
    WHERE XPathValue(O.xmlOrder,'//ORDER/BILLTO/*') = 'Mike'; 

And the output of the query that uses XpathValue UDF looks like this:

    PO_Number   TheDate
    ----------- ----------------
    101         8/22/2004