Create Foreign Table with Schema for Parquet Data | VantageCloud Lake - Example: Creating Foreign Table to Access Parquet Data - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

This example shows how to create a foreign table for accessing Parquet data. The table definition includes an authorization object for accessing the remote repository containing the river flow Parquet files.

CREATE AUTHORIZATION Statement

CREATE AUTHORIZATION MyAuthObj
USER 'YOUR-ACCESS-KEY-ID' 
PASSWORD 'YOUR-SECRET-ACCESS-KEY';

Parquet Data Schema

message schema {
optional double GageHeight2;
optional double Flow;
optional int64 site_no;
optional binary datetime (UTF8);
optional double Precipitation;
optional double GageHeight;
}

CREATE FOREIGN TABLE Statement

This statement creates the foreign table and specifies the authorization MyAuthObj. This table definition defines a column with the corresponding Teradata data type for each Parquet logical data type in the river flow data. The column definitions must match the data formats and positions in the Parquet file. Parquet foreign tables are partitioned by column.

CREATE FOREIGN TABLE riverflow_parquet
, EXTERNAL SECURITY MyAuthObj
(
Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC
, GageHeight2 DOUBLE PRECISION
, Flow DOUBLE PRECISION
, site_no BIGINT
, datetime VARCHAR(16) CHARACTER SET UNICODE CASESPECIFIC
, Precipitation DOUBLE PRECISION
, GageHeight DOUBLE PRECISION
)
USING (
LOCATION ('/S3/YOUR-BUCKET.s3.amazonaws.com/PARQUETDATA')
STOREDAS ('PARQUET')
) NO PRIMARY INDEX
PARTITION BY COLUMN ;

Select Data from Foreign Table

This statement retrieves a row count, maximum, and minimum values from riverflow_parquet .

SELECT COUNT(Flow) AS count_col
, MAX(Flow) ( FORMAT '-ZZZZ9.99') AS max_col
, MIN(Flow) ( FORMAT '-ZZZZ9.99') AS min_col
FROM riverflow_parquet;
   count_col    max_col    min_col
-----------  ---------  ---------
      15406    1570.00       0.00