Path Filter on a View for Parquet | Native Object Store ( NOS ) | Teradata Vantage - 17.05 - Using a View with Path Filtering - Advanced SQL Engine - Teradata Database

Teradata Vantageā„¢ - Native Object Store Getting Started Guide

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
Release Date
January 2021
Content Type
Programming Reference
Publication ID
B035-1214-175K
Language
English (United States)
The following example is of a view on a river flow foreign table that renames path (key) IDs and column attributes, so that both kinds of data can be referred to in SQL queries.

The examples use a sample river flow data set. To use your own data, replace the table and column names, and authorization object. See Variable Substitutions for Examples for the credentials and location values for the sample data set.

  1. To run NOS-related commands, log on to the database as a user with the required privileges.
  2. If it does not exist, create the foreign table or ask your database administrator to create the foreign table called riverflow_parquet_path. See: Filtering External Parquet Data From a Foreign Table.
  3. Create the view of the foreign table:
    REPLACE VIEW riverflowview_parquet AS (
    SELECT CAST($path.$siteno AS CHAR(10)) TheSite,
    CAST($path.$year AS CHAR(4)) TheYear,
    CAST($path.$month AS CHAR(2)) TheMonth,
    CAST(SUBSTR($path.$day, 1, 2) AS CHAR(2)) TheDay,
    Flow,
    GageHeight GageHeight1,
    Precipitation,
    GageHeight2
    FROM riverflow_parquet_path
    WHERE TheSite = site_no);
  4. Query the view:
    SELECT TOP 2 * FROM riverflowview_parquet;

    Result:

    TheSite  TheYear TheMonth TheDay Site_no    Flow  GageHeight1 Precipitation GageHeight2
    -------- ------- -------- ------ --------  ------ ----------- ------------- -----------
    09396100    2018     07    15    9396100   153.00         1.96         0.00        1.96
    09396100    2018     07    15    9396100   150.00         1.95         0.00        1.95
  5. Filter on the view to find the number of sites with the site number of 09396100:
    SELECT thesite,COUNT(*)
    FROM riverflowview_parquet WHERE thesite='09396100'
    GROUP BY 1;

    Result:

    TheSite        Count(*)
    ----------  -----------
    09396100           2906
  6. Run EXPLAIN on the query to see how the filtering is done:
    EXPLAIN
    SELECT thesite,COUNT(*)
    FROM riverflowview_parquet WHERE thesite='09396100'
    GROUP BY 1;

    Result:

    Explanation
    ---------------------------------------------------------------------------
    [...]
      3) We do a single-AMP RETRIEVE step from
         NOS_USR.riverflow_parquet_path in view riverflowview_parquet
         metadata by way of an all-rows scan with a condition of (
         "(TD_SYSFNLIB.NosExtractVarFromPath (
         NOS_USR.riverflow_parquet_path in view
         riverflowview_parquet.Location, '/S3/td-usgs.s3.amazonaws.com/', 2
         )(CHAR(10), CHARACTER SET UNICODE, NOT CASESPECIFIC))= '09396100 '")
    
         into Spool 3 (all_amps), which is built locally on that AMP.  Then
         we do a SORT to order Spool 3 by the sort key as the field_id
         list( 3, 2).  The size of Spool 3 is estimated with no confidence
         to be 17 rows (11,985 bytes).  The estimated time for this step is
         0.60 seconds.
      4) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 3 (Last Use)
         by way of an all-rows scan into Spool 2 (all_amps), which is
         binpacked and redistributed by size to all AMPs in TD_Map1.  The
         size of Spool 2 is estimated with no confidence to be 17 rows (
         12,121 bytes).  The estimated time for this step is 0.16 seconds.
      5) We do an all-AMPs SUM step in TD_MAP1 to aggregate from 3 column
         partitions of NOS_USR.riverflow_parquet_path in view
         riverflowview_parquet by way of external metadata in Spool 2 (Last
         Use) with a condition of ("(TD_SYSFNLIB.NosExtractVarFromPath (
         NOS_USR.riverflow_parquet_path in view
         riverflowview_parquet.Location, '/S3/td-usgs.s3.amazonaws.com/', 2
         )(CHAR(10), CHARACTER SET UNICODE, NOT CASESPECIFIC)(FLOAT, FORMAT
         '-9.99999999999999E-999'))= (NOS_USR.riverflow_parquet_path in
         view riverflowview_parquet.site_no)"), and the grouping identifier
         in field 1.  Aggregate Intermediate Results are computed globally,
         then placed in Spool 5 in TD_Map1.  The size of Spool 5 is
         estimated with no confidence to be 1,215 rows (1,703,430 bytes).
         The estimated time for this step is 0.77 seconds.
    [...]

    (3) is doing the path filtering. It is using the constant 09396100 as a path filtering expression to build the metadata spool. The metadata spool is the spool table that identifies the list of objects the query will actually process.

    (5) is doing traditional row filtering. It is compares the site number extracted from the location string to the value in the Parquet data (the actual data in the object store).