CSV Examples | Native Object Store (NOS) | Teradata Vantage - 17.05 - Examples: Reading CSV Data - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - Native Object Store Getting Started Guide

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
Release Date
January 2021
Content Type
Programming Reference
Publication ID
B035-1214-175K
Language
English (United States)

The following examples show how to access CSV-formatted data in Teradata's public Amazon S3 bucket.

The following table provides the LOCATION values for all the Teradata-supplied public external object stores that contain CSV data. You can run the examples using any of these locations after first setting up the function mapping; see Setting Up to Run the Examples.

Platform LOCATION
Amazon S3 /s3/td-usgs-public.s3.amazonaws.com/CSVDATA/
Azure Blob storage /az/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/CSVDATA/
Google Cloud Storage /gs/storage.googleapis.com/td-usgs/CSVDATA/
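The examples call READ_NOS through the READ_NOS_FM function mapping. If the mapping does not exist yet, its definition follows this general shape; this is a sketch only, with an abbreviated USING list, and the exact definition is given in Setting Up to Run the Examples.

```sql
-- Sketch only: map READ_NOS_FM to the system READ_NOS function,
-- running under the DefAuth authorization. The USING list is
-- abbreviated; see Setting Up to Run the Examples for the full list.
CREATE FUNCTION MAPPING READ_NOS_FM
FOR TD_SYSFNLIB.READ_NOS EXTERNAL SECURITY DEFINER TRUSTED DefAuth
USING LOCATION, RETURNTYPE, STOREDAS
ANY IN TABLE;
```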

If you want to use your own external object store, see Variable Substitutions for Examples for example values of LOCATION, ACCESS_ID/USER, and ACCESS_KEY/PASSWORD.
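When reading from a private object store, credentials can also be passed directly to READ_NOS. A minimal sketch, where YOUR-BUCKET, YOUR-ACCESS-KEY-ID, and YOUR-SECRET-ACCESS-KEY are placeholders you must substitute with your own store name and credentials:

```sql
-- Sketch only: sample two records from a private S3 bucket.
-- YOUR-BUCKET, YOUR-ACCESS-KEY-ID, and YOUR-SECRET-ACCESS-KEY are
-- placeholders for your own object store and credentials.
SELECT TOP 2 payload..* FROM READ_NOS_FM (
ON ( SELECT CAST( NULL AS DATASET STORAGE FORMAT CSV ) )
USING
LOCATION('/s3/YOUR-BUCKET.s3.amazonaws.com/CSVDATA/')
ACCESS_ID('YOUR-ACCESS-KEY-ID')
ACCESS_KEY('YOUR-SECRET-ACCESS-KEY')
RETURNTYPE('NOSREAD_RECORD')
) AS D;
```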

Using SHOW AUTHORIZATION

SHOW AUTHORIZATION NOS_USR.DefAuth;

If the authorization object does not exist, run the steps in Setting Up to Run the Examples.
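For reference, the authorization object can be created along these lines. This is a sketch only: the USER and PASSWORD strings are placeholders for your credentials, and the Teradata-supplied public buckets are typically accessed with empty strings.

```sql
-- Sketch only: create the authorization object used by the examples.
-- Substitute real credentials for a private store; for the public
-- Teradata buckets, empty strings are typically used.
CREATE AUTHORIZATION NOS_USR.DefAuth AS DEFINER TRUSTED
USER ''
PASSWORD '';
```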

Setting Up to Run CSV Examples

CREATE FOREIGN TABLE riverflow_csv
, EXTERNAL SECURITY DEFINER TRUSTED DefAuth
( Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
Payload DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV
)
USING (LOCATION('/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/'));

See Setting Up to Run CSV Examples.

Sampling External Data In Place Using READ_NOS

SELECT TOP 2 payload..* FROM READ_NOS_FM (
ON ( SELECT CAST( NULL AS DATASET STORAGE FORMAT CSV ) )
USING
LOCATION('/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/')
RETURNTYPE('NOSREAD_RECORD')
) AS D;

Sample result:

Payload..*
------------------------------------------------------------------
["11.6","10000","09380000","2018-07-01 00:00","662","0.00","8.65"]
["10.6","17600","09380000","2018-07-03 00:00","674","0.00","10.23"]

See Sampling External Data In Place Using READ_NOS.

Sampling External Data In Place Using a Foreign Table

SELECT TOP 2 payload
FROM riverflow_csv;

Sample result:

Payload
-----------------------------------------------------------------------------------
BatteryVoltage,Flow,site_no,datetime,Precipitation,GageHeight
,0.00,09400815,2018-07-12 00:15,0.00,-0.01
BatteryVoltage,Flow,site_no,datetime,Precipitation,GageHeight
,0.00,09400815,2018-07-12 00:45,0.00,-0.01

See Sampling External Data In Place Using a Foreign Table.

Using DATASET_KEYS to See Header Names

SELECT DISTINCT * FROM DATASET_KEYS (ON (SELECT payload FROM riverflow_csv)) AS csvKeys;

Sample result:

DatasetKeys
-----------------
Precipitation
BatteryVoltage
Temp
Velocity
Conductance
WaterVelocity
site_no
GageHeight
Flow
GageHeight2
datetime

See Using DATASET_KEYS to See Header Names.

Finding Maximum and Minimum Values

SELECT MAX(CAST(payload..Flow AS FLOAT) ( FORMAT '-ZZZZ9.99'))
      , MIN(CAST(payload..Flow AS FLOAT) ( FORMAT '-(5)ZZZ9.99'))
 FROM riverflow_csv;

Sample result:

Maximum(Payload..Flow)  Minimum(Payload..Flow)
----------------------  ----------------------
              19000.00                   -9.72

See Finding Maximum and Minimum Values.

Querying for a Specific Set of Rows

SELECT MAX(CAST(payload..Flow AS FLOAT) ( FORMAT '-ZZZZ9.99'))
FROM riverflow_csv
WHERE payload..site_no = '09394500';

Sample result:

Maximum(Payload..Flow)
----------------------
               1230.00

See Querying for a Specific Set of Rows.

Previewing Data Using READ_NOS

SELECT TOP 2 payload FROM READ_NOS_FM (
ON ( SELECT CAST( NULL AS DATASET STORAGE FORMAT CSV ) )
USING
LOCATION('/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/')
RETURNTYPE('NOSREAD_RECORD')
) AS D;

Sample result:

Payload
---------------------------------------------------------------------------------
Temp,Flow,site_no,datetime,Conductance,Precipitation,GageHeight
11.4,11400,09380000,2018-07-02 00:00,654,0.00,8.97
Temp,Flow,site_no,datetime,Conductance,Precipitation,GageHeight
10.6,17600,09380000,2018-07-03 00:00,674,0.00,10.23

See Previewing Data Using READ_NOS.

Viewing the Key Structure

SELECT TOP 2 LOCATION FROM READ_NOS_FM(
USING
RETURNTYPE('NOSREAD_KEYS')
LOCATION('/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/')
) AS D;

Sample result:

Location
------------------------------------------------------------
/S3/td-usgs.s3.amazonaws.com/CSVDATA/09380000/2018/07/02.csv
/S3/td-usgs.s3.amazonaws.com/CSVDATA/09380000/2018/07/03.csv

See Viewing the Key Structure.

Loading External Data into the Database Using CREATE TABLE AS...WITH DATA

CREATE MULTISET TABLE riverflowprecip_csv AS (
SELECT CAST(payload..site_no AS CHAR(8)) SiteNo,
CAST(payload..Flow AS FLOAT) Flow,
CAST(payload..GageHeight AS FLOAT) GageHeight
FROM riverflow_csv
WHERE payload..Precipitation > 0 )
WITH DATA
NO PRIMARY INDEX;

Display the number of rows in the table:

SELECT COUNT(*) FROM riverflowprecip_csv;

Sample result:

Count(*)
-----------
        396

See Loading External Data into the Database Using CREATE TABLE AS...WITH DATA.

Loading External Data into the Database Using READ_NOS and CREATE TABLE AS

CREATE MULTISET TABLE riverflowperm_csv AS
(
SELECT
CAST(payload..site_no AS CHAR(8)) Site_no,
CAST(payload..Flow AS FLOAT) Flow,
CAST(payload..GageHeight AS FLOAT) GageHeight1,
CAST(payload..Precipitation AS FLOAT) Precipitation,
CAST(payload..Temp AS FLOAT) Temperature,
CAST(payload..Velocity AS FLOAT) Velocity
FROM READ_NOS_FM (
ON (SELECT CAST(NULL AS DATASET STORAGE FORMAT CSV))
USING
LOCATION('/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/')
) AS D
) WITH DATA;

Display the number of rows in the table:

SELECT COUNT(*) FROM riverflowperm_csv;

Sample result:

Count(*)
-----------
        36301

See Loading External Data into the Database Using READ_NOS and CREATE TABLE AS.

Loading External Data into the Database Using INSERT SELECT

CREATE TABLE RiverFlowPermInsert_csv
(DateTS CHAR(20)
,SiteNo CHAR(8)
,Flow FLOAT FORMAT '-ZZZZ9.99'
,Velocity FLOAT FORMAT '-ZZZ9.99'
,GageHeight FLOAT FORMAT '-ZZZ9.99'
,Temp FLOAT FORMAT '-ZZZ9.99')
PRIMARY INDEX (SiteNo);

Insert the external data into the database table:

INSERT INTO RiverFlowPermInsert_csv
SELECT
payload..datetime
,payload..site_no
,CAST(payload..Flow AS FLOAT)
,CAST(payload..Velocity AS FLOAT)
,CAST(payload..GageHeight AS FLOAT)
,CAST(payload..Temp AS FLOAT)
FROM riverflow_csv
WHERE payload..site_no = '09429070';

Query the data from the database table:

SELECT TOP 2 * FROM RiverFlowPermInsert_csv;

Sample result:

DateTS                SiteNo        Flow  Velocity  GageHeight     Temp
--------------------  --------  --------  --------  ----------  -------
2018-07-02 01:00      09429070    149.00      0.78        5.75    78.80
2018-07-02 00:30      09429070    145.00      0.77        5.68    78.90

See Loading External Data into the Database Using INSERT SELECT.

Joining External Data and Database Tables

Create a database dimension table to join to:

CREATE SET TABLE rivernames(
      site_no CHAR(8) CHARACTER SET LATIN NOT CASESPECIFIC,
      name CHAR(60) CHARACTER SET LATIN NOT CASESPECIFIC)
UNIQUE PRIMARY INDEX ( site_no ) ;

Create the foreign table or ask your database administrator to create the foreign table that is used to populate the dimension table:

CREATE FOREIGN TABLE nos_rivernames
, EXTERNAL SECURITY DEFINER TRUSTED DefAuth
(   Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
    PAYLOAD DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV
)
USING (
    LOCATION('/s3/td-usgs-public.s3.amazonaws.com/RIVERS/rivers.csv')
);

Populate the dimension table:

INSERT INTO rivernames
  SELECT payload..site_no, payload..name
 FROM nos_rivernames;

Join the dimension table and external data (foreign table):

SELECT DISTINCT name(CHAR(60))
  FROM riverflow_csv rf, rivernames rn
 WHERE rf.payload..site_no = rn.site_no
   AND rf.payload..Precipitation > 0.1
 ORDER BY 1;

Sample result:

name
----------------------------------
CIBECUE CREEK NEAR OVERGAARD
COLORADO RIVER AT LEES FERRY
GILA RIVER AT KELVIN
LITTLE COLORADO RIVER AT WOODRUFF
NEW RIVER NEAR ROCK SPRINGS
NEWMAN CANYON ABOVE UPPER LAKE MARY
POLACCA WASH NEAR SECOND MESA
PUERCO RIVER NEAR CHAMBERS
SALT RIVER NEAR CHRYSOTILE
SANTA MARIA RIVER NEAR BAGDAD

See Joining External Data and Database Tables.

Filtering Using a Path Filter in the CREATE FOREIGN TABLE Definition

CREATE FOREIGN TABLE riverflow_csv_pathfilter
, EXTERNAL SECURITY DEFINER TRUSTED DefAuth (
Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
Payload DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV
)
USING (
    LOCATION('/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/site_no')
    PATHPATTERN('$data/$siteno/$year/$month/$day')
);

Here, site_no is a placeholder for a site number value; for example, 0938.

Show the table definition:

SHOW TABLE riverflow_csv_pathfilter;

See Filtering Using a Path Filter in the CREATE FOREIGN TABLE Definition.

Filtering on the Payload Column of a Foreign Table

Create a foreign table or ask your database administrator to create the foreign table using a custom PATHPATTERN:

CREATE FOREIGN TABLE riverflow_csv_path
, EXTERNAL SECURITY DEFINER TRUSTED DefAuth (
Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
Payload DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV
)
USING
(
LOCATION ('/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/')
PATHPATTERN('$data/$siteno/$year/$month/$day') );

Run a query using a payload filter only:

SELECT TOP 2
CAST(payload..GageHeight AS FLOAT)( FORMAT '-ZZZ9.99')
TheGageHeight,
CAST (payload..Flow AS FLOAT)(FORMAT '-ZZZ9999.99') TheFlow
FROM riverflow_csv_path
WHERE payload..site_no = '09380000';

Sample result:

TheGageHeight      TheFlow
-------------  -----------
        10.21     17500.00
        10.20     17400.00

Run a similar query using equivalent path values in the filter:

SELECT TOP 2
CAST(payload..GageHeight AS FLOAT)( FORMAT '-ZZZ9.99')
TheGageHeight,
CAST (payload..Flow AS FLOAT)(FORMAT '-ZZZ9999.99') TheFlow
FROM riverflow_csv_path
WHERE $path.$siteno = '09380000';

Sample result:

TheGageHeight      TheFlow
-------------  -----------
        10.16     17200.00
        10.22     17500.00

See Filtering on the Payload Column of a Foreign Table.

Querying a Foreign Table with Path Filtering

Use path filtering to narrow down search results:

SELECT TOP 2 payload..*
FROM riverflow_csv_path
WHERE $PATH.$year = '2018'
AND $PATH.$month = '07'
AND $PATH.$day = '01.csv'
AND $PATH.$siteno = '09400815';

Sample result:

Payload..*
----------------------------------------------------------
[null,"0.00","09400815","2018-07-01 00:15","0.00","-0.01"]
[null,"0.00","09400815","2018-07-01 00:45","0.00","-0.01"]

See Querying a Foreign Table with Path Filtering.

Combining Path and Payload Filtering

Combine path and payload filtering:

SELECT COUNT(*)
FROM riverflow_csv_path
WHERE payload..GageHeight > 9.5
AND $path.$siteno = '09380000'
AND $path.$year = '2018'
AND $path.$month = '06';

Sample result:

   Count(*)
-----------
        185

See Combining Path and Payload Filtering.

Creating a Basic View

Create the view of the foreign table:

CREATE VIEW riverflowview_csv AS (
SELECT
CAST(payload..Flow AS FLOAT) ( FORMAT '-ZZZZ9.99') Flow,
CAST(payload..GageHeight AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight1,
CAST(payload..Precipitation AS FLOAT) ( FORMAT '-ZZZ9.99') Precipitation,
CAST(payload..Temp AS FLOAT) ( FORMAT '-ZZZ9.99') Temperature,
CAST(payload..Velocity AS FLOAT) ( FORMAT '-ZZZ9.99') Velocity,
CAST(payload..BatteryVoltage AS FLOAT) ( FORMAT '-ZZZ9.99') BatteryVoltage,
CAST(payload..GageHeight2 AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight2
FROM riverflow_csv_path );

Query the view:

SELECT TOP 2 * FROM riverflowview_csv;

Sample result:

Flow   GageHeight1  Precipitation  Temperature  Velocity  BatteryVoltage  GageHeight2
------ -----------  -------------  -----------  --------  --------------  -----------
186.00        2.05              ?            ?         ?               ?         2.05
232.00        2.16           0.00            ?         ?               ?         2.16

See Creating a Basic View.

Using Path Variables as Columns in a View

REPLACE VIEW riverflowview_csv AS (
SELECT CAST($path.$siteno AS CHAR(10)) TheSite,
CAST($path.$year AS CHAR(4)) TheYear,
CAST($path.$month AS CHAR(2)) TheMonth,
CAST(SUBSTR($path.$day, 1, 2) AS CHAR(2)) TheDay,
CAST(payload..Flow AS FLOAT) ( FORMAT '-ZZZZ9.99') Flow,
CAST(payload..GageHeight AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight1,
CAST(payload..Precipitation AS FLOAT) ( FORMAT '-ZZZ9.99') Precipitation,
CAST(payload..Temp AS FLOAT) ( FORMAT '-ZZZ9.99') Temperature,
CAST(payload..Velocity AS FLOAT) ( FORMAT '-ZZZ9.99') Velocity,
CAST(payload..BatteryVoltage AS FLOAT) ( FORMAT '-ZZZ9.99') BatteryVoltage,
CAST(payload..GageHeight2 AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight2
FROM riverflow_csv_path);

See Using Path Variables as Columns in a View.

Using a Foreign Table View with Path Filtering

Replace the view of the foreign table:

REPLACE VIEW riverflowview_csv AS (
SELECT CAST($path.$siteno AS CHAR(10)) TheSite,
CAST($path.$year AS CHAR(4)) TheYear,
CAST($path.$month AS CHAR(2)) TheMonth,
CAST(SUBSTR($path.$day, 1, 2) AS CHAR(2)) TheDay,
CAST(payload..Flow AS FLOAT) ( FORMAT '-ZZZZ9.99') Flow,
CAST(payload..GageHeight AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight1,
CAST(payload..Precipitation AS FLOAT) ( FORMAT '-ZZZ9.99') Precipitation,
CAST(payload..Temp AS FLOAT) ( FORMAT '-ZZZ9.99') Temperature,
CAST(payload..Velocity AS FLOAT) ( FORMAT '-ZZZ9.99') Velocity,
CAST(payload..BatteryVoltage AS FLOAT) ( FORMAT '-ZZZ9.99') BatteryVoltage,
CAST(payload..GageHeight2 AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight2
FROM riverflow_csv_path
WHERE TheSite = payload..site_no);

Query the view:

SELECT TOP 2 * FROM riverflowview_csv;

Sample result:

TheSite  TheYear TheMonth TheDay Site_no     Flow   GageHeight1 Precipitation Temperature Velocity BatteryVoltage
-------- ------- -------- ------ ----------- ------ ----------- ------------- ----------- -------- --------------
09396100   2018    07       16     09396100   24.40        1.33          0.00           ?        ?              ?
09396100   2018    07       16     09396100  113.00        1.83          0.00           ?        ?              ?

Filter the view:

SELECT thesite,COUNT(*)                                            
FROM riverflowview_csv WHERE thesite='09396100'
GROUP BY 1;

Sample result:

TheSite        Count(*)
----------  -----------
09396100           3086

Run EXPLAIN on the query to see how the filtering is done:

EXPLAIN
SELECT thesite,COUNT(*)                                            
FROM riverflowview_csv WHERE thesite='09396100'
GROUP BY 1;

Sample result:

Explanation
--------------------------------------------------------------------------
[…]
  3) We do a single-AMP RETRIEVE step from OB.riverflow_csv_path in
     view riverflowview_csv metadata by way of an all-rows scan with a
     condition of ("(TD_SYSFNLIB.NosExtractVarFromPath (
     OB.riverflow_csv_path in view riverflowview_csv.Location,
     '/s3/s3.amazonaws.com/td-usgs', 2 )(CHAR(10), CHARACTER SET
     UNICODE, NOT CASESPECIFIC))= '09396100 '") into Spool 3 (all_amps),
     which is built locally on that AMP.  Then we do a SORT to order
     Spool 3 by the sort key as the field_id list( 3, 2).  The size of
     Spool 3 is estimated with no confidence to be 38 rows (26,790
     bytes).  The estimated time for this step is 0.55 seconds.
  4) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 3 (Last Use)
     by way of an all-rows scan into Spool 2 (all_amps), which is
     binpacked and redistributed by size to all AMPs in TD_Map1.  The
     size of Spool 2 is estimated with no confidence to be 38 rows (
     27,094 bytes).  The estimated time for this step is 0.08 seconds.
  5) We do an all-AMPs SUM step in TD_MAP1 to aggregate from
     OB.riverflow_csv_path in view riverflowview_csv
     by way of external metadata in Spool 2 (Last Use) with a condition
     of ("(TD_SYSFNLIB.NosExtractVarFromPath (OB.riverflow_csv_path in
     view riverflowview_csv.Location, '/s3/td-usgs.s3.amazonaws.com/', 2
     )(CHAR(10), CHARACTER SET UNICODE, NOT CASESPECIFIC))=
     (TRANSLATE((OB.riverflow_csv_path in view
     riverflowview_csv.Payload .EXTRACTVALUE (
     '242E2E736974655F6E6F'XB(VARBYTE(63000)), 'list'(VARCHAR(20),
     CHARACTER SET LATIN, NOT CASESPECIFIC)))USING LATIN_TO_UNICODE))"),
     and the grouping identifier in field 1.  Aggregate Intermediate
     Results are computed globally, then placed in Spool 5 in TD_Map1.
     The size of Spool 5 is estimated with no confidence to be 504 rows
     (706,608 bytes).  The estimated time for this step is 0.32 seconds.
[…]

See Using a Foreign Table View with Path Filtering.