Parquet External File | CREATE FOREIGN TABLE SQL Statement | Teradata Vantage - Parquet External Files - Advanced SQL Engine - Teradata Database

Teradata Vantageā„¢ - Native Object Store Getting Started Guide

Product
Advanced SQL Engine
Teradata Database
Release Number
17.00
Published
September 2020
Language
English (United States)
Last Update
2021-01-22
dita:mapPath
jjn1567647976698.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1214
lifecycle
previous
Product Category
Software
Teradata Vantage

To access external files containing Parquet-formatted data, you must specify PARQUET for the USING...STOREDAS option in the CREATE FOREIGN TABLE statement.

Column names for Parquet data can contain a maximum of 128 characters and are case sensitive.

Foreign tables for Parquet data are created without a primary index (NOPI) and are column-partitioned.

You cannot specify the following table-level options for foreign tables that contain Parquet formatted data:
  • Row partitioning
  • Subrow partitioning (partial row partitioning for a column-partitioned table)
  • Multicolumn partitions
  • Autocompression
  • ROWFORMAT option of the USING clause
  • A primary index

External Parquet files can be compressed in Snappy format.

You must explicitly define data columns for foreign tables that contain Parquet data. Define the columns in the same order as the columns in the Parquet files, and use Vantage data types that correspond to the Parquet data types:

Parquet Logical Data Type Corresponding Vantage Data Type Description
UINT_8 SMALLINT Unsigned integer with range 0 to 255.
UINT_16 INTEGER Unsigned integer with range 0 to 65535.
UINT_32 BIGINT Unsigned integer with range 0 to 4,294,967,295.
UINT_64 DECIMAL(20,0) Unsigned integer with range 0 to 18,446,744,073,709,551,615.
INT_8 BYTEINT Signed integer with range -128 to 127.
INT_16 SMALLINT Signed integer with range -32768 to 32767.
INT_32 INTEGER Signed integer with range -2,147,483,648 to 2,147,483,647.
INT_64 BIGINT Signed integer with range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
INT_96 (in UTC format) TIMESTAMP(6) Used to represent timestamp. First 8 bytes has number of nanoseconds and next 4 bytes has number of days since Julian day. Deprecated in future Parquet releases.
DECIMAL(p,s) DECIMAL(p,s) Decimal where precision is between 1 and 38 (inclusive) and scale is between 0 and 38 (inclusive).
VARBYTE(p+1), where p is the precision of the DECIMAL. Precision and scale is greater than 38.
FLOAT REAL IEEE 32-bit floating point.
DOUBLE REAL IEEE 64-bit floating point.
DATE DATE Number of days from the Unix epoch, 1 January 1970
TIME_MILLIS (in UTC format) TIME(3) or TIME(6) Stores time in milliseconds.
TIME_MICROS (in UTC format) TIME(6) Stores time in microseconds.
TIMESTAMP_MILLIS (in UTC format) TIMESTAMP(3) or TIMESTAMP(6) Stores timestamp in milliseconds.
TIMESTAMP_MICROS (in UTC format) TIMESTAMP(6) Stores timestamp in microseconds.
INTERVAL VARCHAR CHARACTER SET UNICODE Stores interval in months, days and milliseconds.
BOOL BYTEINT Boolean (True or False).
STRING VARCHAR or CLOB CHARACTER SET UNICODE Encoded as a UTF8 byte array.
BSON JSON STORAGE FORMAT BSON BSON data.
JSON JSON CHARACTER SET UNICODE JSON data.
ENUM VARCHAR or CLOB CHARACTER SET UNICODE Store enumerated values encoded as a UTF8 string.
ARRAY VARCHAR or CLOB CHARACTER SET UNICODE Stored as repeated fields. Can be an array of a single value or multiple values.

The Parquet files do not need the same column layout and do not need the same data type (if the data types are compatible) as the foreign table definition. For example:

File Column 1 Column 2 Column 3 Column 4 Allowed?
Foreign table is defined with 3 columns with the data types specified to the right INT VARCHAR DATE    
File 1 - has 3 columns of the same data types as the foreign table INT VARCHAR DATE   YES
File 2 - added columns at the end INT VARCHAR DATE TIMESTAMP YES
File 3 - removed columns from the end INT VARCHAR     YES, if column 3 is nullable
File 4 - different data types VARCHAR INTERVAL BIGINT   NO, the file is skipped
File 5 - different, but compatible data types BIGINT CLOB DATE   YES