CSV Examples | Native Object Store ( NOS ) | Teradata Vantage - 17.10 - Examples: Reading CSV Data - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - Native Object Store Getting Started Guide

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B035-1214-171K
Language
English (United States)

The following examples show accessing CSV-formatted data from Teradata's public Amazon S3 bucket.

The following table provides the LOCATION values for all the Teradata-supplied public external object stores that contain CSV data. You can run the examples using any of these locations, after first setting up the function mapping here Setting Up to Run the Examples.

Platform LOCATION
Amazon S3 /s3/td-usgs-public.s3.amazonaws.com/CSVDATA/
Azure Blob storage /az/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/CSVDATA/
Google Cloud Storage /gs/storage.googleapis.com/td-usgs/CSVDATA/

If you want to use your own external object store, see Variable Substitutions for Examples for examples of LOCATION, ACCESS_ID/USER, and ACCESS_KEY/PASSWORD.

Using NOSREAD_KEYS to List the Files in the Specified LOCATION

SELECT TOP 2 location(CHAR(200)), ObjectLength FROM (
LOCATION='/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/'
AUTHORIZATION='{"ACCESS_ID":"","ACCESS_KEY":""}'
RETURNTYPE='NOSREAD_KEYS'
) AS d;

See Using NOSREAD_KEYS to List the Files in the Specified LOCATION.

Setting Up to Run CSV Examples

CREATE FOREIGN TABLE riverflow_csv
, EXTERNAL SECURITY  MyAuthObj
USING ( LOCATION('/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/') );

See Setting Up to Run CSV Examples.

Sampling External Data In Place Using READ_NOS

This example uses READ_NOS with a function mapping called READ_NOS_FM.

SELECT TOP 2 * FROM READ_NOS_FM (
USING
LOCATION('/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/')
RETURNTYPE('NOSREAD_RECORD')
) AS D;

See Sampling External Data In Place Using READ_NOS.

Sampling External Data In Place Using a Foreign Table

SELECT TOP 2 *
FROM riverflow_csv;

See Sampling External Data In Place Using a Foreign Table.

Finding Maximum and Minimum Values

SELECT MAX(Flow), MIN(Flow)
FROM riverflow_csv;

See Finding Maximum and Minimum Values.

Querying for a Specific Set of Rows

SELECT MAX(Flow)
FROM riverflow_csv
WHERE site_no = '09394500';

See Querying for a Specific Set of Rows.

Loading External Data into the Database Using CREATE TABLE AS … WITH DATA

CREATE MULTISET TABLE riverflowprecip_csv AS (
SELECT site_no, Flow, GageHeight
FROM riverflow_csv
WHERE Precipitation > 0 )
WITH DATA
NO PRIMARY INDEX;

Display the number of rows in the table:

SELECT COUNT(*) FROM riverflowprecip_csv;

See Loading External Data into the Database Using CREATE TABLE AS … WITH DATA.

Loading External Data into the Database Using READ_NOS and CREATE TABLE AS

CREATE MULTISET TABLE riverflowperm_csv AS (
SELECT site_no, Flow, GageHeight, Precipitation, datetime, GageHeight2
FROM (
LOCATION='/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/'
AUTHORIZATION=MyAuthObj
) AS d
) WITH DATA;

Display the number of rows in the table:

SELECT COUNT (*) FROM riverflowperm_csv;

See Loading External Data into the Database Using READ_NOS and CREATE TABLE AS.

Loading External Data into the Database Using INSERT ... SELECT

CREATE TABLE RiverFlowPermInsert_csv (
SiteNo INTEGER
,Flow DECIMAL(3,2)
,GageHeight DECIMAL(3,2)
,Precipitation DECIMAL(3,2)
,datetime TIMESTAMP(0) FORMAT'Y4-MM-DDBHH:MI'
,GageHeight2 DECIMAL(3,2) )
PRIMARY INDEX (SiteNo);

Insert the external data into the database table:

INSERT INTO RiverFlowPermInsert_csv
SELECT site_no, Flow, GageHeight, Precipitation, datetime, GageHeight2
WHERE site_no = 9474000
FROM riverflow_csv;

Query the data from the database table:

SELECT TOP 2 * FROM RiverFlowPermInsert_csv;

See Loading External Data into the Database Using INSERT ... SELECT.

Joining External Data and Database Tables

Create a database dimension table to join to:

CREATE SET TABLE rivernames (
    site_no INT,
    name CHAR(90) CHARACTER SET LATIN NOT CASESPECIFIC )
UNIQUE PRIMARY INDEX ( site_no ) ;

Create the foreign table or ask your database administrator to create the foreign table that is used to populate the dimension table:

CREATE FOREIGN TABLE nos_rivernames
, EXTERNAL SECURITY  MyAuthObj
USING ( LOCATION('/s3/td-usgs-public.s3.amazonaws.com/RIVERS/rivers.csv') );

Populate the dimension table:

INSERT INTO rivernames
SELECT site_no, name
FROM nos_rivernames;

Join the dimension table and external data (foreign table):

SELECT DISTINCT name(CHAR(100))
FROM riverflow_csv rf, rivernames rn
WHERE rf.site_no = rn.site_no
AND rf.Precipitation > 0.1
ORDER BY 1;

See Joining External Data and Database Tables.

Filtering Using A Path Filter in the CREATE FOREIGN TABLE Definition

CREATE FOREIGN TABLE riverflow_csv_pathfilter
, EXTERNAL SECURITY  MyAuthObj
USING (
LOCATION ('/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/0938')
PATHPATTERN('$data/$siteno/$year/$month/$day') );

Where site_no is a site number value; for example, 0938.

Show the table definition:

SHOW TABLE riverflow_csv_pathfilter;

See Filtering Using a Path Filter in the CREATE FOREIGN TABLE Definition.

Filtering Using a Column Within the Data Set

SELECT TOP 2 GageHeight, Flow
FROM riverflow_csv_pathfilter
WHERE site_no = 09380000;

See Filtering Using a Column Within the Data Set.

Path Filtering Using a Key in the Path of the Data Set

SELECT TOP 2 GageHeight, Flow
FROM riverflow_csv_pathfilter
WHERE $path.$siteno = 09380000;

See Path Filtering Using a Key in the Path of the Data Set.

Querying a Foreign Table with Path Filtering

Use path filtering to narrow down search results:

SELECT TOP 2 *
FROM riverflow_csv_pathfilter
WHERE $PATH.$year = 2018
AND $PATH.$month = 07
AND $PATH.$day = '01.csv'
AND $PATH.$siteno = 09380000;

See Querying a Foreign Table with Path Filtering.

Combining Path and Payload Filtering

Combine path and payload filtering:

SELECT COUNT(*)
FROM riverflow_csv_pathfilter
WHERE GageHeight > 9.5
AND $path.$siteno = 09380000
AND $path.$year = 2018
AND $path.$month = 06;

See Combining Path and Payload Filtering.

Creating a Basic View

Create the view of the foreign table:

CREATE VIEW riverflowview_csv AS (
SELECT
Flow,
GageHeight,
Precipitation,
Temp,
Conductance,
Site_no
FROM riverflow_csv_pathfilter );

Query the view:

SELECT TOP 2 * FROM riverflowview_csv;

See Creating a Basic View.

Using Path Variables as Columns in a View

REPLACE VIEW riverflowview_csv AS (
SELECT
CAST($path.$siteno AS CHAR(10)) TheSite,
CAST($path.$year AS CHAR(4)) TheYear,
CAST($path.$month AS CHAR(2)) TheMonth,
CAST(SUBSTR($path.$day, 1, 2) AS CHAR(2)) TheDay,
Flow,
GageHeight,
Precipitation,
Temp,
Conductance
FROM riverflow_csv_pathfilter);

Query the view:

SELECT TOP 2 * FROM riverflowview_csv;

See Using Path Variables as Columns in a View.

Using a Foreign Table View with Path Filtering

Replace the view of the foreign table:

REPLACE VIEW riverflowview_csv AS (
SELECT
CAST($path.$siteno AS CHAR(10)) TheSite,
CAST($path.$year AS CHAR(4)) TheYear,
CAST($path.$month AS CHAR(2)) TheMonth,
CAST(SUBSTR($path.$day, 1, 2) AS CHAR(2)) TheDay,
Flow,
GageHeight,
Precipitation,
Temp,
Conductance
FROM riverflow_csv_pathfilter
WHERE TheSite = site_no);

Query the view:

SELECT TOP 2 * FROM riverflowview_csv;

Filter the view:

SELECT thesite,COUNT(*)
FROM riverflowview_csv WHERE thesite='09380000'
GROUP BY 1;

Run EXPLAIN on the query to see how the filtering is done:

EXPLAIN
SELECT thesite,COUNT(*)
FROM riverflowview_csv WHERE thesite='09380000'
GROUP BY 1;

See Using a Foreign Table View with Path Filtering.