Query Parquet Data | Data Manipulation Language | Teradata Vantage - Query Foreign Tables with Parquet Data - Advanced SQL Engine - Teradata Database

SQL Data Manipulation Language

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Published
July 2021
Language
English (United States)
Last Update
2021-07-27
dita:mapPath
vjt1596846980081.ditamap
dita:ditavalPath
vjt1596846980081.ditaval
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata Vantageā„¢
Foreign tables with parquet data have the following columns:
  • Location column
  • Columns that correspond to each column in the parquet data.

You can query foreign tables with parquet data, but be aware of the following data handling information.

Parquet Data Handling

  • Vantage does not return a warning when performing a SELECT operation on a table which has TIME/TIMESTAMP columns such that the time zone is not UTC when stored in the Parquet file. You must set the time zone correctly at the system, user, profile, or session level. If you do not, Vantage returns the data in UTC format only. It is not possible to validate whether the time zone in the Parquet file matches the time zone if the user has explicitly set it.
  • For the following data types, you can use a CREATE FOREIGN TABLE statement, but if the corresponding column is referenced by a query, the query will skip the file or return an error.
    • DECIMAL(p,s) where precision and/or scale is greater than 38
    • Complex data types (STRUCT, MAP, LIST, and ENUM)
  • There is no equivalent Vantage data type for the following Parquet numeric types:
    • UINT_64 is mapped to the Vantage DECIMAL(20,0) type. This implies that any operations on the unsigned 64 bit integer is handled using DECIMAL data type rules.
    • FLOAT (IEEE 32-bit floating point) is mapped to the REAL (IEEE 64-bit floating point) data type. This can imply a precision loss when converting some of the values from FLOAT to REAL because it is not possible to convert without a loss of precision.
    • DECIMAL(p,s) (where precision 'p' or scale 's' is greater than 38) is mapped to the Vantage VARBYTE type. The length of the VARBYTE must be at least one greater than the precision of the DECIMAL type. If the scale is greater than 255 for a Parquet file, that Parquet file is skipped. Similarly, if the precision is more than 65535, that Parquet file is skipped.
    • INTERVAL is mapped to the Vantage VARCHAR type. In a Parquet file, the interval is stored using three integers representing months, days, and milliseconds, respectively. It is returned in character format by concatenating the values for all the three intervals separated with a hyphen '-'. For example, if the months value is 11, days is 24, and milliseconds is 60000, then Vantage returns '11-4-60000'.

      In a Parquet file, it is not mandatory to include all three values. In these cases, the missing parameter receives value of 0. For example, if months is 8, days is not stored, and milliseconds is 36000, then Vantage returns '8-0-36000'. You can use string functions (like SUBSTRING) to extract the individual values.

    • INT96 in a Parquet file is used to hold legacy timestamp values in nanoseconds. It is mapped to the Vantage TIMESTAMP(6) type. Because Vantage can hold precision down to only microseconds, there can be a precision loss if the data is more granular than that.
  • If the schema is different across Parquet files, for example, if the order of columns has a mismatch, or the data type of columns is different), the Parquet file that does not match is skipped during query processing, with the following exception: if columns are added or removed from the end after the table has been created, the query continues to run. Vantage returns NULL for the columns that are removed. Newly added columns cannot be queried, because they are not defined in the CREATE FOREIGN TABLE statement for the Parquet foreign table. If you want to include them, you must recreate the Parquet foreign table with the new columns.