Set up a Databricks Spark cluster

Apache Iceberg and Delta Lake Open Table Format on VantageCloud Lake Getting Started

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: December 2024

A Databricks Spark cluster must be set up if Teradata OTF performs Write operations on Iceberg tables in Unity Catalog.

Spark cluster for Iceberg Write operations

Once the Delta Lake Write has completed its tasks and the Iceberg Write has committed the metadata (for some operations, not all), the Iceberg table changes should be complete. However, Unity Catalog is not aware of the Iceberg metadata update because the Iceberg REST catalog is not used. For managed tables, Databricks appears to maintain its own metadata layer and does not read the table version stored inside the table metadata. Therefore, the Iceberg metadata generation must be triggered programmatically from the latest Delta Lake table definition. This action not only generates the Iceberg metadata but also informs Unity Catalog of the preceding changes. Teradata OTF accomplishes this by programmatically running the following Spark SQL statement on the Spark cluster in Databricks:

MSCK REPAIR TABLE <full table name> SYNC METADATA;
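
For reference, the following Python sketch shows one way the same statement could be submitted to a Databricks cluster from outside a notebook, using the databricks-sql-connector package. It is an illustration only, not the internal Teradata OTF implementation; the hostname, HTTP path, access token, and table name are placeholders.

from databricks import sql

# Placeholder connection details for a Databricks all-purpose cluster.
SERVER_HOSTNAME = "adb-1234567890123456.7.azuredatabricks.net"
HTTP_PATH = "sql/protocolv1/o/1234567890123456/0101-123456-abcdefgh"
ACCESS_TOKEN = "<personal access token>"
TABLE_NAME = "my_catalog.my_schema.my_iceberg_table"  # placeholder three-level name

with sql.connect(
    server_hostname=SERVER_HOSTNAME,
    http_path=HTTP_PATH,
    access_token=ACCESS_TOKEN,
) as connection:
    with connection.cursor() as cursor:
        # Regenerate the Iceberg metadata from the latest Delta Lake table
        # definition and make Unity Catalog aware of the change.
        cursor.execute(f"MSCK REPAIR TABLE {TABLE_NAME} SYNC METADATA")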

The Delta Lake Universal Format (UniForm) is documented at https://docs.databricks.com/en/delta/uniform.html.

Because of this, Iceberg Write has a dependency on a Databricks Spark cluster. Currently, this is the only mechanism to notify Unity Catalog of Delta Lake/Iceberg UniForm changes, because neither the REST API nor the Databricks SDK provides a way to refresh or update a table, or to notify Unity Catalog of underlying changes to the Delta Lake/Iceberg UniForm tables.