Delta Lake Manifest Files Limitations - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - Native Object Store Getting Started Guide

Product: Advanced SQL Engine, Teradata Database
Release Number: 17.10
Published: July 2021
Language: English (United States)
Last Update: 2022-06-22
dita:id: B035-1214
lifecycle: previous
Product Category: Software, Teradata Vantage

The Vantage integration with Delta Lake manifest files has the following known limitations.

Data Consistency

Whenever Delta Lake generates updated manifests, it atomically overwrites the existing manifest files. Vantage therefore always sees a consistent view of the data files. However, the granularity of this consistency guarantee depends on whether the table is partitioned.

  • Unpartitioned tables: All the file names are written in one manifest file, which is updated atomically. Vantage sees full table snapshot consistency.
  • Partitioned tables: The manifest is partitioned in the same Hive-partitioning-style directory structure as the original Delta table. Each partition is therefore updated atomically, and Vantage sees a consistent view of each partition, but not a consistent view across partitions. Furthermore, because the manifests of all partitions cannot be updated together, concurrent attempts to generate manifests can leave different partitions with manifests of different versions. Although this consistency guarantee under data change is weaker than that of reading Delta tables with Spark, it is still stronger than that of formats such as Parquet, which do not provide partition-level consistency.
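
The difference between the two cases can be illustrated with a toy model. The sketch below is not Delta Lake or Vantage code: the manifests dictionary, the partition names, and the file names are all hypothetical stand-ins for per-partition manifest files. It only shows how a reader that runs while a writer replaces partition manifests one at a time can observe manifests from two different table versions.

```python
# Toy model only: each partition has its own manifest, replaced atomically
# and independently of the others. All names here are hypothetical.

def read_snapshot(manifests):
    """Return {partition: version} as a reader would observe it right now."""
    return {part: m["version"] for part, m in manifests.items()}

def regenerate(manifests, new_version, files_by_partition):
    """Overwrite one partition manifest at a time, yielding after each one
    so that a reader can be interleaved between overwrites."""
    for part, files in files_by_partition.items():
        # Replacing one dictionary entry stands in for one atomic overwrite.
        manifests[part] = {"version": new_version, "files": files}
        yield part

manifests = {
    "date=2021-07-01": {"version": 1, "files": ["a.parquet"]},
    "date=2021-07-02": {"version": 1, "files": ["b.parquet"]},
}
new_files = {
    "date=2021-07-01": ["a2.parquet"],
    "date=2021-07-02": ["b2.parquet"],
}

writer = regenerate(manifests, 2, new_files)
next(writer)                             # first partition is now at version 2
mid_update = read_snapshot(manifests)    # reader runs between two overwrites
for _ in writer:                         # writer finishes the other partitions
    pass
after_update = read_snapshot(manifests)

mixed = len(set(mid_update.values())) > 1
print("during update:", mid_update, "mixed versions:", mixed)
print("after update: ", after_update)
```

In this model each partition is always internally consistent, but the reader in the middle sees one partition at the new version and one at the old version, which is the cross-partition inconsistency described above.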

Depending on the storage system you use for Delta tables, it is possible to get incorrect results when Vantage queries the manifest while the manifest files are being rewritten. In file system implementations that lack atomic file overwrites, a manifest file may be momentarily unavailable. Therefore, use manifests with caution if their updates are likely to coincide with queries from Vantage.
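
As an illustration of what an atomic file overwrite means, the following sketch replaces a manifest-like file on a local file system by writing a temporary file and renaming it over the target. This is an assumption-laden example, not how Delta Lake or any object store writes manifests: the function name write_manifest_atomically and the file paths are hypothetical, and the atomicity of the rename relies on POSIX file system semantics.

```python
import os
import tempfile

def write_manifest_atomically(manifest_path, data_file_paths):
    """Replace manifest_path in one step so a reader sees either the old
    file or the new file, never a missing or half-written one."""
    directory = os.path.dirname(os.path.abspath(manifest_path))
    # Write the new content to a temporary file in the same directory so
    # that the final rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".manifest-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write("\n".join(data_file_paths) + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace renames over the target; atomic on POSIX file systems.
        os.replace(tmp_path, manifest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

# Hypothetical usage: overwrite the same manifest twice and read it back.
with tempfile.TemporaryDirectory() as workdir:
    manifest = os.path.join(workdir, "manifest")
    write_manifest_atomically(manifest, ["/data/table/part-0001.parquet"])
    write_manifest_atomically(manifest, ["/data/table/part-0002.parquet",
                                         "/data/table/part-0003.parquet"])
    with open(manifest) as f:
        final_lines = f.read().splitlines()
    leftovers = [name for name in os.listdir(workdir) if name != "manifest"]

print(final_lines)
```

A storage system that instead deletes the old file and then writes the new one leaves a window in which the manifest does not exist, which is the situation the paragraph above warns about.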

Performance

A very large number of files can degrade Vantage performance. Databricks recommends that you compact the files of the table before generating the manifests. The number of files should not exceed 1000 for the entire table if it is unpartitioned, or for each partition if it is partitioned.
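
One way to check a table against this limit before generating manifests is to count data files per partition directory. The sketch below is a minimal example under stated assumptions: the list of data file paths is obtained elsewhere, the table root and file names are hypothetical, and the helper names files_per_partition and partitions_over_limit are invented for illustration.

```python
from collections import Counter

MAX_FILES = 1000  # limit stated above: per unpartitioned table or per partition

def files_per_partition(data_file_paths, table_root):
    """Count data files by partition directory, relative to table_root.
    An unpartitioned table yields a single entry with the key ""."""
    root = table_root.rstrip("/") + "/"
    counts = Counter()
    for path in data_file_paths:
        if not path.startswith(root):
            raise ValueError(f"{path!r} is not under {table_root!r}")
        relative = path[len(root):]
        partition = relative.rpartition("/")[0]  # directory part, may be ""
        counts[partition] += 1
    return counts

def partitions_over_limit(counts, limit=MAX_FILES):
    """Return the partitions whose file count exceeds the limit."""
    return sorted(part for part, n in counts.items() if n > limit)

# Hypothetical table layout: one oversized partition and one small one.
table_root = "s3://example-bucket/sales"
paths = [f"{table_root}/date=2021-07-01/part-{i:05d}.parquet" for i in range(1001)]
paths += [f"{table_root}/date=2021-07-02/part-{i:05d}.parquet" for i in range(3)]

counts = files_per_partition(paths, table_root)
needs_compaction = partitions_over_limit(counts)
print(needs_compaction)
```

Partitions reported here are candidates for compaction before the manifests are generated.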

Schema Evolution

Delta Lake supports schema evolution, and queries on a Delta table automatically use the latest schema regardless of the schema defined for the table in the Hive metastore. Vantage, however, uses the schema defined in its own table definition and does not query with the updated schema until that table definition is updated to the new schema.
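
A simple way to notice that a Vantage table definition has fallen behind is to compare its columns with the current Delta table schema. The sketch below assumes that both schemas have already been retrieved and expressed as column-name-to-type mappings that use a common type naming; the function schema_drift, the column names, and the type names are hypothetical.

```python
def schema_drift(delta_schema, vantage_schema):
    """Compare two {column_name: type_name} mappings and report differences.
    Assumes type names were already normalized to a common vocabulary."""
    delta_cols = set(delta_schema)
    vantage_cols = set(vantage_schema)
    return {
        # Columns the Delta table has but the Vantage definition lacks.
        "added": sorted(delta_cols - vantage_cols),
        # Columns the Vantage definition has but the Delta table dropped.
        "removed": sorted(vantage_cols - delta_cols),
        # Columns present in both whose types differ.
        "changed": sorted(
            col for col in delta_cols & vantage_cols
            if delta_schema[col] != vantage_schema[col]
        ),
    }

# Hypothetical schemas: the Delta table gained a column and widened another.
delta_schema = {"id": "bigint", "amount": "double", "region": "string"}
vantage_schema = {"id": "int", "amount": "double"}

drift = schema_drift(delta_schema, vantage_schema)
print(drift)
```

Any non-empty entry indicates that the Vantage table definition needs to be updated before queries reflect the new schema.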