Path Filtering for Parquet | Native Object Store ( NOS ) | Teradata Vantage - Querying a Foreign Table with Path Filtering - Advanced SQL Engine - Teradata Database

Teradata Vantageā„¢ - Native Object Store Getting Started Guide

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
Published
January 2021
Language
English (United States)
Last Update
2021-01-22
dita:mapPath
zws1595641486108.ditamap
dita:ditavalPath
hoy1596145193032.ditaval
dita:id
B035-1214
lifecycle
previous
Product Category
Software
Teradata Vantage

All foreign tables have a generic PATHPATTERN; for example, PATHPATTERN('$Var1/$Var2/$Var3/$Var4/$Var5/$Var6/$Var7/$Var8/$Var9/$Var10/$Var11').

To make path filtering in a query easier, explicitly specify a PATHPATTERN with meaningful names when you create the foreign table. For example, to match the path names selected in the river flow example, use ('$data/$siteno/$year/$month/$day'). PATHPATTERNs are useful for limiting results, instead of returning all the objects in an external object store.

The examples use a sample river flow data set. To use your own data, replace the table and column names, and authorization object. See Variable Substitutions for Examples for the credentials and location values for the sample data set.

  1. To run NOS-related commands, log on to the database as a user with the required privileges.
  2. If it does not exist, create the foreign table or ask your database administrator to create the foreign table called riverflow_parquet_path. See: Filtering External Parquet Data From a Foreign Table.
  3. Use path filtering to narrow down search results. To see the data for a site number on a particular date, query the path attributes:
    SELECT TOP 2 *
    FROM riverflow_parquet_path
    WHERE $PATH.$year = '2018'
      AND $PATH.$month = '07'
      AND $PATH.$day = '01.parquet'
      AND $PATH.$siteno = '09394500';

    The $PATH.$day variable represents the actual object in the path segment name ( data / siteno / year / month / day). As such, it often has a file extension as part of the name, as shown in the example.

    Your result will be similar to this, reflecting your external object store location:

         Location /S3/td-usgs.s3.amazonaws.com/PARQUETDATA/09394500/2018/07/01.parquet
      GageHeight2     4.81
             Flow      0.00
          site_no              9394500
         datetime 2018-07-01 00:15
    Precipitation     0.00
       GageHeight     4.75
         Location /S3/td-usgs.s3.amazonaws.com/PARQUETDATA/09394500/2018/07/01.parquet
      GageHeight2     4.82
             Flow      0.01
          site_no              9394500
         datetime 2018-07-01 00:00
    Precipitation     0.00
       GageHeight     4.75
    For readability, the output is displayed vertically.

    Note that the more specific you are with path filtering, the more selective are the results, potentially saving processing time and external storage access costs.

  4. Filter on the siteno using path filtering:
    SELECT TOP 2 GageHeight TheGageHeight
    , Flow TheFlow
    FROM riverflow_parquet_path
    WHERE $path.$siteno = '09394500';

    Result:

    TheGageHeight    TheFlow
    -------------  ---------
             4.75       0.01
             5.04       3.53