17.10 - Example: Parsing an XML Document Using a Scalar UDF - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - SQL Data Definition Language Syntax and Examples

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B035-1144-171K
Language
English (United States)

One example where scalar UDFs are useful is scanning an XML document and returning specified content, after that document has been stored inside the database. The following example is of an external UDF that uses XPath, which is a set of syntax rules that allow you to navigate an XML document. XPath, which has a function similar to substring, uses path expressions to identify nodes in an XML document.

Depending on your requirements, the XML document could be stored as a CLOB (Character Large Object) or as a VARCHAR column. The former is illustrated in the graphic below, while the following prototype uses the latter.

In this example, the XML document is stored inside Vantage as one VARCHAR column, XMLOrder. The base table, OrderLog, contains only two columns, PONum and the VARCHAR column. Here is the XML document:

    <?xml version="1.0"?>
    <ROOT>
    <ORDER>
       <DATE>8/22/2004</DATE>
       <PO_NUMBER>101</PO_NUMBER>
       <BILLTO>Mike</BILLTO>
       <ITEMS>
        <ITEM>
         <PARTNUM>101</PARTNUM>
         <DESC>Partners Conference Ticket</DESC>
         <USPRICE>1200.00</USPRICE>
        </ITEM>
        <ITEM>
          <PARTNUM>147</PARTNUM>
          <DESC>V2R5.1 UDF Programming</DESC>
          <USPRICE>28.95</USPRICE>
        </ITEM>
        </ITEMS>
    </ORDER>
    </ROOT> 

The DDL for the orderlog table looks like this:

    CREATE SET TABLE orderlog, NO FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT (
       PONum    INTEGER NOT NULL,
       XMLOrder VARCHAR(63,000) )
    UNIQUE PRIMARY INDEX ( PONum );

The following SELECT request references the XpathValue UDF that uses XPath to extract element and attribute content from the XML document. The arguments passed within the SELECT request, such as the BILLTO name, are then used by XPath to search each document in the table. When a document having that specific billing name is identified, then the associated PO number and date are returned as output arguments.

    SELECT XPathValue(O.xmlOrder, '//ORDER/PO_NUMBER/*') AS PO_Number,
           XPathValue(O.xmlOrder, '//ORDER/DATE/*') AS theDate
    FROM OrderLog O
    WHERE XPathValue(O.xmlOrder,'//ORDER/BILLTO/*') = 'Mike'; 

And the output of the query that uses XPathValue UDF looks like this:

    PO_Number   TheDate
    ----------- ----------------
    101         8/22/2004