Delta Lake Manifest Files Limitations - Teradata VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata VantageCloud Lake
Published: February 2025
Last edition: 2025-05-16

The Analytics Database integration with Delta Lake manifest files has the following known limitations.

Data Consistency

When generating updated manifests, Delta Lake atomically overwrites existing manifest files. Therefore, Vantage always sees a consistent view of the data files. However, the granularity of the consistency guarantees depends on whether or not the table is partitioned.

Table | Consistency
Unpartitioned | All the file names are written in one manifest file, which is updated atomically. Vantage sees full table snapshot consistency.
Partitioned | A manifest file is partitioned in the same Hive-partitioning-style directory structure as the original Delta table. This means that each partition is updated atomically, and Vantage sees a consistent view of each partition, but not a consistent view across partitions. Furthermore, since all manifests of all partitions cannot be updated together, concurrent attempts to generate manifests can lead to different partitions having manifests of different versions. This consistency guarantee under data change is weaker than that of reading Delta tables with Spark, but is still stronger than formats like Parquet, which do not provide partition-level consistency.
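
The per-partition layout described above can be inspected with a short script. The following Python sketch is illustrative only: it assumes the table is reachable as a local path, that manifests live under a `_symlink_format_manifest` directory, and that each manifest file is named `manifest`. The function name `list_manifests` is hypothetical and is not part of any Teradata or Delta Lake API.

```python
from pathlib import Path

# Assumption: directory name used for generated symlink-format manifests.
MANIFEST_DIR = "_symlink_format_manifest"


def list_manifests(table_root):
    """Map each partition directory to its manifest file.

    The key is the partition path relative to the manifest directory,
    for example "date=2024-01-01". An unpartitioned table yields a single
    entry whose key is the empty string.
    """
    base = Path(table_root) / MANIFEST_DIR
    manifests = {}
    for manifest in sorted(base.rglob("manifest")):
        if not manifest.is_file():
            continue
        partition = manifest.parent.relative_to(base).as_posix()
        manifests["" if partition == "." else partition] = manifest
    return manifests
```

One entry in the result means one atomically updated manifest for the whole table. Several entries mean that each partition has its own manifest, so two partitions can briefly reflect different table versions.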

Depending on the storage system you use for Delta tables, you can get incorrect results when Presto or Athena queries the manifest while the manifest files are being rewritten. In file system implementations that lack atomic file overwrites, a manifest file may be momentarily unavailable. Therefore, use manifests with caution if their updates are likely to coincide with queries from Presto or Athena.
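
To make the idea of an atomic overwrite concrete, here is a minimal local-file analogy in Python. It is not how Delta Lake or any object store implements manifest updates; the function name `atomic_overwrite` and the temporary-file prefix are assumptions chosen for illustration. The new content is written to a temporary file in the same directory and then moved over the target in one step, so a reader sees either the old file or the new one, never a partial file.

```python
import os
import tempfile


def atomic_overwrite(path, lines):
    """Replace the file at `path` with `lines` without exposing partial content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".manifest-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write("\n".join(lines) + "\n")
        # os.replace is atomic on POSIX file systems when source and
        # destination are on the same file system.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if the write or the rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A storage system without an equivalent single-step replace has to delete and then rewrite the file, which is the window in which a manifest can be momentarily unavailable.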

Performance

A large number of files can hurt Vantage performance. Databricks recommends that you compact the files of the table before generating the manifests. The number of files cannot exceed 1000, counted across the entire table if it is unpartitioned or within each partition if it is partitioned.
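
A quick way to check the file limit is to count the entries in each manifest, since a symlink-format manifest lists one data file path per line. The Python sketch below assumes the manifest contents have already been read into strings keyed by partition (an empty key for an unpartitioned table). The names `count_manifest_entries` and `partitions_over_limit` are hypothetical helpers, not part of any product API.

```python
MAX_FILES_PER_MANIFEST = 1000  # limit stated in this section


def count_manifest_entries(manifest_text):
    """Count the non-blank lines, that is, the data file paths, in one manifest."""
    return sum(1 for line in manifest_text.splitlines() if line.strip())


def partitions_over_limit(manifests, limit=MAX_FILES_PER_MANIFEST):
    """Return {partition: file_count} for every manifest that exceeds the limit.

    `manifests` maps a partition name to the text of its manifest file.
    """
    counts = {name: count_manifest_entries(text) for name, text in manifests.items()}
    return {name: count for name, count in counts.items() if count > limit}
```

If the result is not empty, compacting the table and regenerating the manifests is the remedy suggested in the paragraph above.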

Schema Evolution

Delta Lake supports schema evolution, and queries on a Delta table automatically use the latest schema, regardless of the schema defined for the table in the Hive metastore. However, Vantage uses the schema defined in its own table definition and does not query with the updated schema until that table definition is updated to the new schema.
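
Because Vantage does not pick up schema changes on its own, it can help to compare the two schemas before querying. The Python sketch below assumes both schemas are available as plain dictionaries of column name to type name; how you obtain them is outside the scope of this example. The function name `schema_drift` and the sample column names in the usage note are hypothetical.

```python
def schema_drift(vantage_columns, delta_columns):
    """Compare the Vantage table definition with the current Delta schema.

    Both arguments map column names to type names. The result lists columns
    added in Delta, columns removed from Delta, and columns whose type
    differs, as (vantage_type, delta_type) pairs.
    """
    added = {
        name: dtype
        for name, dtype in delta_columns.items()
        if name not in vantage_columns
    }
    removed = {
        name: dtype
        for name, dtype in vantage_columns.items()
        if name not in delta_columns
    }
    changed = {
        name: (vantage_columns[name], delta_columns[name])
        for name in vantage_columns
        if name in delta_columns and vantage_columns[name] != delta_columns[name]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

For example, calling `schema_drift({"id": "BIGINT"}, {"id": "BIGINT", "region": "VARCHAR"})` reports `region` under `added`, which signals that the Vantage table definition needs to be updated before the new column is visible to queries.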