Path Filter on a View for CSV | Native Object Store ( NOS ) | Teradata Vantage - Using a View with Path Filtering - Advanced SQL Engine

Path Filter on a View for CSV | Native Object Store ( NOS ) | Teradata Vantage - Using a View with Path Filtering - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - Native Object Store Getting Started Guide

Product

Advanced SQL Engine

Teradata Database

Release Number

17.05

Published

January 2021

Language

English (United States)

Last Update

2021-01-22

dita:mapPath

zws1595641486108.ditamap

dita:ditavalPath

hoy1596145193032.ditaval

dita:id

B035-1214

lifecycle

Product Category

Software

Teradata Vantage

The following example is of a view on a river flow foreign table that renames path (key) IDs and payload attributes, so that both kinds of data can be referred to in SQL queries.

The examples use a sample river flow data set. To use your own data, replace the table and column names, and authorization object. See Variable Substitutions for Examples for the credentials and location values for the sample data set.

Before proceeding, verify with your database administrator that you have the correct privileges, an authorization object, and a function mapping (for READ_NOS statements).

To run NOS-related commands, log on to the database as a user with the required privileges.
If it does not exist, create the foreign table or ask your database administrator to create the foreign table called riverflow_csv_path: See Filtering on the Payload Column of a Foreign Table.

Create the view of the foreign table:

REPLACE VIEW riverflowview_csv AS (
SELECT CAST($path.$siteno AS CHAR(10)) TheSite,
CAST($path.$year AS CHAR(4)) TheYear,
CAST($path.$month AS CHAR(2)) TheMonth,
CAST(SUBSTR($path.$day, 1, 2) AS CHAR(2)) TheDay,
CAST(payload..Flow AS FLOAT) ( FORMAT '-ZZZZ9.99') Flow,
CAST(payload..GageHeight AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight1,
CAST(payload..Precipitation AS FLOAT) ( FORMAT '-ZZZ9.99') Precipitation,
CAST(payload..Temp AS FLOAT) ( FORMAT '-ZZZ9.99') Temperature,
CAST(payload..Velocity AS FLOAT) ( FORMAT '-ZZZ9.99') Velocity,
CAST(payload..BatteryVoltage AS FLOAT) ( FORMAT '-ZZZ9.99') BatteryVoltage,
CAST(payload..GageHeight2 AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight2
FROM riverflow_csv_path
WHERE TheSite = payload..site_no);

Query the view:

SELECT TOP 2 * FROM riverflowview_csv;

Your result will be similar to the following::

TheSite  TheYear TheMonth TheDay Site_no   Flow GageHeight1 Precipitation Temperature Velocity BatteryVoltage
-------- ------- -------- ------ -------- ----- ----------- ------------- ----------- -------- --------------
09396100   2018    07       16   09396100  24.40       1.33          0.00       ?        ?              ?
09396100   2018    07       16   09396100 113.00       1.83          0.00       ?        ?              ?

Columns with missing attributes return NULLs when the record does not contain the attribute.

Filter on the view to find the number of sites with the site number of 09396100:

SELECT thesite,COUNT(*)                                            
FROM riverflowview_csv WHERE thesite='09396100'
GROUP BY 1;

Your result will be similar to the following::

TheSite        Count(*)
----------  -----------
09396100           3086

Run EXPLAIN on the query to see how the filtering is done:

EXPLAIN
SELECT thesite,COUNT(*)                                            
FROM riverflowview_csv WHERE thesite='09396100'
GROUP BY 1;

Your result will be similar to the following:

Explanation
--------------------------------------------------------------------------
[…]
  3) We do a single-AMP RETRIEVE step from OB.riverflow_csv_path in
     view riverflowview_csv metadata by way of an all-rows scan with a
     condition of ("(TD_SYSFNLIB.NosExtractVarFromPath (
     OB.riverflow_csv_path in view riverflowview_csv.Location,
     '/s3/td-usgs.s3.amazonaws.com/', 2 )(CHAR(10), CHARACTER SET
     UNICODE, NOT CASESPECIFIC))= '09396100 '") into Spool 3 (all_amps),
     which is built locally on that AMP.  Then we do a SORT to order
     Spool 3 by the sort key as the field_id list( 3, 2).  The size of
     Spool 3 is estimated with no confidence to be 38 rows (26,790
     bytes).  The estimated time for this step is 0.55 seconds.
  4) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 3 (Last Use)
     by way of an all-rows scan into Spool 2 (all_amps), which is
     binpacked and redistributed by size to all AMPs in TD_Map1.  The
     size of Spool 2 is estimated with no confidence to be 38 rows (
     27,094 bytes).  The estimated time for this step is 0.08 seconds.
  5) We do an all-AMPs SUM step in TD_MAP1 to aggregate from
     OB.riverflow_csv_path in view riverflowview_csv
     by way of external metadata in Spool 2 (Last Use) with a condition
     of ("(TD_SYSFNLIB.NosExtractVarFromPath (OB.riverflow_csv_path in
     view riverflowview_csv.Location, '/s3/td-usgs.s3.amazonaws.com/', 2
     )(CHAR(10), CHARACTER SET UNICODE, NOT CASESPECIFIC))=
     (TRANSLATE((OB.riverflow_csv_path in view
     riverflowview_csv.Payload .EXTRACTVALUE (
     '242E2E736974655F6E6F'XB(VARBYTE(63000)), 'list'(VARCHAR(20),
     CHARACTER SET LATIN, NOT CASESPECIFIC)))USING LATIN_TO_UNICODE))"),
     and the grouping identifier in field 1.  Aggregate Intermediate
     Results are computed globally, then placed in Spool 5 in TD_Map1.
     The size of Spool 5 is estimated with no confidence to be 504 rows
     (706,608 bytes).  The estimated time for this step is 0.32 seconds.
[…]

(3) is doing the path filtering. It is using the constant 09396100 as a path filtering expression to build the metadata spool. The metadata spool is the spool table that identifies the list of objects the query will actually process.

(5) is doing traditional row filtering. It is compares the site number extracted from the location string to the value in the CSV payload (the actual data in the object store).