Parquet Format with Native Object Store

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Last edition: 2024-12-11
When reading external Parquet data, the maximum Parquet page size supported is 16 MB.
  • For READ_NOS, you can set the page size to a larger value (up to 128 MB) using the dbscontrol flag MaxParquetMemPoolUMMAllocSize.
    The DBS Control utility is not available for VantageCloud Lake customers. To change a DBS Control (dbscontrol) setting, contact Teradata Support.
  • For WRITE_NOS, the page size is 16 MB and file size is up to 512 MB.
Using NOS, the maximum record size is 16,776,192 bytes.
  • If your record consists of all character data, these are the limitations for each character set:
    • For UNICODE, 16,776,192 bytes is equivalent to 8,388,096 characters.
    • For LATIN, 16,776,192 bytes is equivalent to 16,776,192 characters.

If the record includes binary data, the maximum number of characters is proportionately reduced.
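The character limits above follow from dividing the maximum record size by the bytes needed per character. The sketch below reproduces that arithmetic. It assumes 2 bytes per UNICODE character and 1 byte per LATIN character (which is what the documented figures imply), and it treats "proportionately reduced" as a simple subtraction of the binary bytes. The function name and the `binary_bytes` parameter are illustrative, and any per-record overhead is ignored.

```python
MAX_RECORD_BYTES = 16_776_192  # NOS maximum record size stated above

# Assumed storage width per character, inferred from the documented limits.
BYTES_PER_CHAR = {"UNICODE": 2, "LATIN": 1}


def max_characters(charset: str, binary_bytes: int = 0) -> int:
    """Estimate how many characters fit in one record next to binary_bytes of binary data."""
    if not 0 <= binary_bytes <= MAX_RECORD_BYTES:
        raise ValueError("binary_bytes must be between 0 and the maximum record size")
    remaining = MAX_RECORD_BYTES - binary_bytes
    return remaining // BYTES_PER_CHAR[charset]


print(max_characters("UNICODE"))                          # 8388096
print(max_characters("LATIN"))                            # 16776192
print(max_characters("UNICODE", binary_bytes=1_000_000))  # 7888096
```

The first two results match the limits listed above. The third is only an estimate of how 1,000,000 bytes of binary data would reduce the UNICODE character budget under the stated assumptions.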

A Parquet table does not have a payload column. The user creates a foreign table and maps the Parquet logical data type to the corresponding Teradata data type.

Redshift is the supported format for the manifest files.

Parquet format limitations:
  • READ_NOS can be used to view the Parquet schema, using RETURNTYPE('NOSREAD_SCHEMA'). This is helpful in creating the foreign table when you do not know the schema of your Parquet data beforehand.
  • Certain complex data types are not supported, including STRUCT, MAP, LIST, and ENUM.
  • Because support for the STRUCT data type is not available, nested Parquet object stores cannot be processed by NOS.
  • When schema evolution is enabled, Parquet files cannot contain duplicate column names. Column names are case-sensitive, so EMPID, empid, and EmpID are treated as distinct columns and can coexist, but two columns both named EMPID cannot.
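To illustrate the case-sensitive duplicate rule, the following minimal sketch checks a list of column names for exact duplicates before the data is exposed through NOS. It is a client-side pre-check written for illustration, not part of NOS, and the function name is hypothetical.

```python
from collections import Counter


def exact_duplicates(column_names):
    """Return the column names that occur more than once, compared case-sensitively."""
    counts = Counter(column_names)
    return sorted(name for name, count in counts.items() if count > 1)


# Names that differ only in case are distinct, so nothing is reported.
print(exact_duplicates(["EMPID", "empid", "EmpID"]))  # []

# The same spelling twice is a duplicate.
print(exact_duplicates(["EMPID", "EMPID", "empid"]))  # ['EMPID']
```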

The following examples use external Parquet data in this format:

message schema {
  optional double GageHeight2;
  optional double Flow;
  optional int64 site_no;
  optional binary datetime (UTF8);
  optional double Precipitation;
  optional double GageHeight;
}
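Because the user maps each Parquet type to a Teradata type when creating the foreign table, it can help to draft the column definitions from the schema text. The sketch below parses the schema shown above and proposes one column definition per field. The type mapping (double to FLOAT, int64 to BIGINT, UTF8 binary to VARCHAR with CHARACTER SET UNICODE) and the VARCHAR length of 100 are assumptions for illustration, and the helper names are hypothetical. Confirm the mappings against Parquet External Files before using them.

```python
import re

SCHEMA = """
message schema {
  optional double GageHeight2;
  optional double Flow;
  optional int64 site_no;
  optional binary datetime (UTF8);
  optional double Precipitation;
  optional double GageHeight;
}
"""

# Assumed mapping from (Parquet physical type, logical annotation) to a Teradata type.
TYPE_MAP = {
    ("double", None): "FLOAT",
    ("int64", None): "BIGINT",
    ("binary", "UTF8"): "VARCHAR({length}) CHARACTER SET UNICODE",
}

# Matches lines such as "optional binary datetime (UTF8);"
FIELD = re.compile(r"^\s*(?:optional|required)\s+(\w+)\s+(\w+)\s*(?:\((\w+)\))?\s*;\s*$")


def column_definitions(schema_text, varchar_length=100):
    """Return one 'name TYPE' string per field in a flat Parquet schema."""
    columns = []
    for line in schema_text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue  # skips the "message schema {" and "}" lines
        physical, name, logical = match.groups()
        try:
            template = TYPE_MAP[(physical, logical)]
        except KeyError:
            raise ValueError(f"no mapping assumed for {physical} ({logical})")
        columns.append(f"{name} {template.format(length=varchar_length)}")
    return columns


for column in column_definitions(SCHEMA):
    print(column)
```

The output is a list of six column definitions in schema order, starting with `GageHeight2 FLOAT`. Types outside the assumed mapping raise an error rather than being guessed.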

For supported Parquet formats, see Parquet External Files.