To access external files containing Parquet-formatted data, you must specify PARQUET for the USING STOREDAS option in the CREATE FOREIGN TABLE statement.
Column names for Parquet data can contain a maximum of 128 characters and are case-sensitive.
Foreign tables for Parquet data are created without a primary index (NOPI) and are column-partitioned.
- Row partitioning
- Subrow partitioning (partial row partitioning for a column-partitioned table)
- Multicolumn partitions
- Autocompression
- ROWFORMAT option of the USING clause
- A primary index
External Parquet files can be compressed in Snappy format.
You must explicitly define data columns for foreign tables that contain Parquet data. Define the columns in the same order as the columns in the Parquet files, and use Vantage data types that correspond to the Parquet data types:
Parquet Data Type | Corresponding Vantage Data Type | Description |
---|---|---|
UINT_8 | SMALLINT | Unsigned integer with range 0 to 255. |
UINT_16 | INTEGER | Unsigned integer with range 0 to 65535. |
UINT_32 | BIGINT | Unsigned integer with range 0 to 4,294,967,295. |
UINT_64 | DECIMAL(20,0) | Unsigned integer with range 0 to 18,446,744,073,709,551,615. |
INT_8 | BYTEINT | Signed integer with range -128 to 127. |
INT_16 | SMALLINT | Signed integer with range -32768 to 32767. |
INT_32 | INTEGER | Signed integer with range -2,147,483,648 to 2,147,483,647. |
INT_64 | BIGINT | Signed integer with range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. |
INT_96 | TIMESTAMP(6) | Used to represent timestamp. First 8 bytes has number of nanoseconds and next 4 bytes has number of days since Julian day. |
DECIMAL(p,s) | DECIMAL(p,s) | Decimal where precision is between 1 and 38 (inclusive) and scale is between 0 and 38 (inclusive). |
VARBYTE(p+1), where p is the precision of the DECIMAL. | Precision and scale is greater than 38. | |
FLOAT | REAL | IEEE 32-bit floating point. |
DOUBLE | REAL | IEEE 64-bit floating point. |
DATE | DATE | Number of days from the Unix epoch, 1 January 1970 |
TIME_MILLIS | TIME(3) or TIME(6) | Stores time in milliseconds. |
TIME_MICROS | TIME(6) | Stores time in microseconds. |
TIMESTAMP_MILLIS | TIMESTAMP(3) or TIMESTAMP(6) | Stores timestamp in milliseconds. |
TIMESTAMP_MICROS | TIMESTAMP(6) | Stores timestamp in microseconds. |
INTERVAL | VARCHAR CHARACTER SET UNICODE | Stores interval in months, days and milliseconds. |
BOOL | BYTEINT | Boolean (True or False). |
STRING | VARCHAR or CLOB CHARACTER SET UNICODE | Encoded as a UTF8 byte array. |
BSON | JSON STORAGE FORMAT BSON | BSON data. |
JSON | JSON CHARACTER SET UNICODE | JSON data. |
STRUCT | VARCHAR or CLOB CHARACTER SET UNICODE | Group of fixed members. |
MAP | VARCHAR or CLOB CHARACTER SET UNICODE | Maps keys to values. |
LIST | VARCHAR or CLOB CHARACTER SET UNICODE | Contains data that is stored in array. |
ENUM | VARCHAR or CLOB CHARACTER SET UNICODE | Store enumerated values encoded as a UTF8 string. |
ARRAY | VARCHAR or CLOB CHARACTER SET UNICODE | Stored as repeated fields. Can be an array of a single value or multiple values. |
The Parquet files do not need the same column layout and do not need the same data type (if the data types are compatible) as the foreign table definition. For example:
File | Column 1 | Column 2 | Column 3 | Column 4 | Allowed? |
---|---|---|---|---|---|
Foreign table is defined with 3 columns with the data types specified to the right | INT | VARCHAR | DATE | N/A | |
File 1 - has 3 columns of the same data types as the foreign table | INT | VARCHAR | DATE | YES | |
File 2 - added columns at the end | INT | VARCHAR | DATE | TIMESTAMP | YES |
File 3 - removed columns from the end | INT | VARCHAR | YES, if column 3 is nullable | ||
File 4 - different data types | VARCHAR | INTERVAL | BIGINT | NO, the file is skipped | |
File 5 - different, but compatible data types | BIGINT | CLOB | DATE | YES |