Path Filter on a View for CSV | Native Object Store ( NOS ) | Teradata Vantage - 17.10 - Using a Foreign Table View with Path Filtering - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - Native Object Store Getting Started Guide

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B035-1214-171K
Language
English (United States)

The following example is of a view on a river flow foreign table that renames path (key) IDs attributes, so that data can be referred to in SQL queries.

The examples use a sample river flow data set. To use your own data, replace the table and column names, and authorization object. See Variable Substitutions for Examples for the credentials and location values for the sample data set.

  1. To run NOS-related commands, log on to the database as a user with the required privileges.
  2. If it does not exist, create the foreign table or ask your database administrator to create the foreign table called riverflow_csv_pathfilter: See Filtering Using a Column Within the Data Set.
  3. Create the view of the foreign table:
    REPLACE VIEW riverflowview_csv AS (
    SELECT
    CAST($path.$siteno AS CHAR(10)) TheSite,
    CAST($path.$year AS CHAR(4)) TheYear,
    CAST($path.$month AS CHAR(2)) TheMonth,
    CAST(SUBSTR($path.$day, 1, 2) AS CHAR(2)) TheDay,
    Flow,
    GageHeight,
    Precipitation,
    Temp,
    Conductance
    FROM riverflow_csv_pathfilter
    WHERE TheSite = site_no);
  4. Query the view:
    SELECT TOP 2 * FROM riverflowview_csv;

    Your result will be similar to the following:

    TheSite     TheYear  TheMonth  TheDay    Flow  GageHeight  Precipitation   Temp  Conductance
    ----------  -------  --------  ------  ------  ----------  -------------  -----  -----------
    09380000    2018     07        18       17500       10.21            .00   11.0          670
    09380000    2018     07        19       17200       10.16            .00   10.5          682

    Columns with missing attributes return NULLs when the record does not contain the attribute.

  5. Filter on the view to find the number of sites with the site number of 09396100:
    SELECT thesite,COUNT(*)                                            
    FROM riverflowview_csv WHERE thesite=09380000'
    GROUP BY 1;

    Your result will be similar to the following:

    TheSite        Count(*)
    ----------  -----------
    09380000           2945
  6. Run EXPLAIN on the query to see how the filtering is done:
    EXPLAIN
    SELECT thesite,COUNT(*)                                            
    FROM riverflowview_csv WHERE thesite=09380000'
    GROUP BY 1;

    Your result will be similar to the following:

    Explanation
    --------------------------------------------------------------------------
    […]
      2) Next, we do a single-AMP RETRIEVE step from
         NOS_USR.riverflow_csv_pathfilter in view riverflowview_csv
         metadata by way of an all-rows scan with a condition of (
         "(TD_SYSFNLIB.NosExtractVarFromPath (
         NOS_USR.riverflow_csv_pathfilter in view
         riverflowview_csv.Location, '/s3/td-usgs-public.s3.amazonaws.com',
         2 )(CHAR(10), CHARACTER SET UNICODE, NOT CASESPECIFIC))= '09380000
         '") into Spool 2 (one-amp), which is built locally on that AMP.
         Then we do a SORT to order Spool 2 by the sort key.  The size of
         Spool 2 is estimated with no confidence to be 4 rows (2,820 bytes).
         The estimated time for this step is 0.55 seconds.
      3) We do a single-AMP RETRIEVE step from Spool 2 (Last Use) by way of
         an all-rows scan into Spool 5 (all_amps), which is binpacked and
         redistributed by size to all AMPs in TD_Map1.  The size of Spool 5
         is estimated with no confidence to be 4 rows (2,852 bytes).  The
         estimated time for this step is 0.06 seconds.
      4) We do an all-AMPs SUM step in TD_MAP1 to aggregate from
         NOS_USR.riverflow_csv_pathfilter in view riverflowview_csv
         by way of an object-store scan using Spool 5 (Last Use) with a
         condition of ("(TD_SYSFNLIB.NosExtractVarFromPath (
         NOS_USR.riverflow_csv_pathfilter in view
         riverflowview_csv.Location, '/s3/td-usgs-public.s3.amazonaws.com',
         2 )(CHAR(10), CHARACTER SET UNICODE, NOT CASESPECIFIC)(FLOAT,
         FORMAT '-9.99999999999999E-999'))=
         (NOS_USR.riverflow_csv_pathfilter in view
         riverflowview_csv.site_no)"), and the grouping identifier in field
         1.  Aggregate intermediate results are computed globally, then
         placed in Spool 4 in TD_Map1.  The size of Spool 4 is estimated
         with no confidence to be 222 rows (311,244 bytes).  The estimated
         time for this step is 2.18 seconds.
    […]

    (2) is doing the path filtering. It is using the constant 09380000 as a path filtering expression to build the metadata spool. The metadata spool is the spool table that identifies the list of objects the query will actually process.

    (4) is doing traditional row filtering. It is compares the site number extracted from the location string to the value in the actual data in the object store.