Parquet External File | CREATE FOREIGN TABLE SQL Statement | Teradata Vantage - Parquet External Files - Analytics Database - Teradata Vantage

Teradata Vantage™ - Native Object Store Getting Started Guide - 17.20

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Teradata Vantage
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-04-05
dita:mapPath
tsq1628112323282.ditamap
dita:ditavalPath
qkf1628213546010.ditaval
dita:id
jjn1567647976698
Product Category
Teradata Vantage

To access external files containing Parquet-formatted data, you must specify PARQUET for the USING STOREDAS option in the CREATE FOREIGN TABLE statement.

Column names for Parquet data can contain a maximum of 128 characters and are case-sensitive.

Foreign tables for Parquet data are created without a primary index (NOPI) and are column-partitioned.

You cannot specify the following table-level options for foreign tables that contain Parquet formatted data:
  • Row partitioning
  • Subrow partitioning (partial row partitioning for a column-partitioned table)
  • Multicolumn partitions
  • Autocompression
  • ROWFORMAT option of the USING clause
  • A primary index

External Parquet files can be compressed in Snappy format.

You must explicitly define data columns for foreign tables that contain Parquet data. Define the columns in the same order as the columns in the Parquet files, and use Vantage data types that correspond to the Parquet data types:

Parquet Data Type Corresponding Vantage Data Type Description
UINT_8 SMALLINT Unsigned integer with range 0 to 255.
UINT_16 INTEGER Unsigned integer with range 0 to 65535.
UINT_32 BIGINT Unsigned integer with range 0 to 4,294,967,295.
UINT_64 DECIMAL(20,0) Unsigned integer with range 0 to 18,446,744,073,709,551,615.
INT_8 BYTEINT Signed integer with range -128 to 127.
INT_16 SMALLINT Signed integer with range -32768 to 32767.
INT_32 INTEGER Signed integer with range -2,147,483,648 to 2,147,483,647.
INT_64 BIGINT Signed integer with range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
INT_96 TIMESTAMP(6) Used to represent timestamp. First 8 bytes has number of nanoseconds and next 4 bytes has number of days since Julian day.
DECIMAL(p,s) DECIMAL(p,s) Decimal where precision is between 1 and 38 (inclusive) and scale is between 0 and 38 (inclusive).
VARBYTE(p+1), where p is the precision of the DECIMAL. Precision and scale is greater than 38.
FLOAT REAL IEEE 32-bit floating point.
DOUBLE REAL IEEE 64-bit floating point.
DATE DATE Number of days from the Unix epoch, 1 January 1970
TIME_MILLIS TIME(3) or TIME(6) Stores time in milliseconds.
TIME_MICROS TIME(6) Stores time in microseconds.
TIMESTAMP_MILLIS TIMESTAMP(3) or TIMESTAMP(6) Stores timestamp in milliseconds.
TIMESTAMP_MICROS TIMESTAMP(6) Stores timestamp in microseconds.
INTERVAL VARCHAR CHARACTER SET UNICODE Stores interval in months, days and milliseconds.
BOOL BYTEINT Boolean (True or False).
STRING VARCHAR or CLOB CHARACTER SET UNICODE Encoded as a UTF8 byte array.
BSON JSON STORAGE FORMAT BSON BSON data.
JSON JSON CHARACTER SET UNICODE JSON data.
STRUCT VARCHAR or CLOB CHARACTER SET UNICODE Group of fixed members.
MAP VARCHAR or CLOB CHARACTER SET UNICODE Maps keys to values.
LIST VARCHAR or CLOB CHARACTER SET UNICODE Contains data that is stored in array.
ENUM VARCHAR or CLOB CHARACTER SET UNICODE Store enumerated values encoded as a UTF8 string.
ARRAY VARCHAR or CLOB CHARACTER SET UNICODE Stored as repeated fields. Can be an array of a single value or multiple values.

The Parquet files do not need the same column layout and do not need the same data type (if the data types are compatible) as the foreign table definition. For example:

File Column 1 Column 2 Column 3 Column 4 Allowed?
Foreign table is defined with 3 columns with the data types specified to the right INT VARCHAR DATE   N/A
File 1 - has 3 columns of the same data types as the foreign table INT VARCHAR DATE   YES
File 2 - added columns at the end INT VARCHAR DATE TIMESTAMP YES
File 3 - removed columns from the end INT VARCHAR     YES, if column 3 is nullable
File 4 - different data types VARCHAR INTERVAL BIGINT   NO, the file is skipped
File 5 - different, but compatible data types BIGINT CLOB DATE   YES

To set the binpacking algorithm to use when reading Parquet files, use the SET QUERY_BAND statement with the query band name BINPACKALGO4PARQUET. For details, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.