Example of Foreign Table with Schema for Parquet Data | Teradata Vantage - Example: Creating a Foreign Table to Access Parquet Data - Advanced SQL Engine - Teradata Database

SQL Data Definition Language Syntax and Examples

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
Published
January 2021
Language
English (United States)
Last Update
2021-01-22
dita:mapPath
ncd1596241368722.ditamap
dita:ditavalPath
hoy1596145193032.ditaval
dita:id
B035-1144
lifecycle
previous
Product Category
Teradata Vantage™

This example shows how to create a foreign table for accessing Parquet data. The table definition includes an authorization object for accessing the remote repository containing the river flow Parquet files.

This statement creates an authorization object called DefAuth_S3 with definer trusted credentials.

CREATE AUTHORIZATION DefAuth_S3
AS DEFINER TRUSTED 
USER 'YOUR-ACCESS-KEY-ID' 
PASSWORD 'YOUR-SECRET-ACCESS-KEY';

The schema for the Parquet data looks like this:

message schema {
optional double GageHeight2;
optional double Flow;
optional int64 site_no;
optional binary datetime (UTF8);
optional double Precipitation;
optional double GageHeight;
}

This statement creates the foreign table and specifies the authorization DefAuth_S3. This table definition defines a column with the corresponding Teradata data type for each Parquet logical data type in the river flow data. The column definitions must match the data formats and positions in the Parquet file. Parquet foreign tables are partitioned by column.

CREATE FOREIGN TABLE riverflow_parquet
, EXTERNAL SECURITY DEFINER TRUSTED DefAuth_S3
(
Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC
, GageHeight2 DOUBLE PRECISION
, Flow DOUBLE PRECISION
, site_no BIGINT
, datetime VARCHAR(16) CHARACTER SET UNICODE CASESPECIFIC
, Precipitation DOUBLE PRECISION
, GageHeight DOUBLE PRECISION
)
USING (
LOCATION ('/S3/td-usgs.s3.amazonaws.com/PARQUETDATA')
STOREDAS ('PARQUET')
) NO PRIMARY INDEX
PARTITION BY COLUMN ;

This statement retrieves a row count, maximum, and minimum values from riverflow_parquet .

SEL COUNT(Flow) AS count_col
, MAX(Flow) ( FORMAT '-ZZZZ9.99') AS max_col
, MIN(Flow) ( FORMAT '-ZZZZ9.99') AS min_col
FROM riverflow_parquet;

Result:

   count_col    max_col    min_col
-----------  ---------  ---------
      15406    1570.00       0.00