Creating a DATALAKE Object - Teradata Vantage

Apache Iceberg and Delta Lake Open Table Format on VantageCloud Lake Getting Started

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
December 2024
ft:locale
en-US
ft:lastEdition
2025-01-03
dita:mapPath
bsr1702324250454.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
bsr1702324250454

The DATALAKE object encapsulates all the information needed to connect to an OTF data lake including the Authorization information needed to connect to the Catalog & Object storage and the connection details.

A DATALAKE object is created using the CREATE DATALAKE statement. All DATALAKEs are created in the TD_SERVER_DB database.

The Authorization information to connect to the Catalog and Object Storage is specified in the <auth_list> clause of the CREATE DATALAKE statement. See the topic Creating Authorization Objects for a DATALAKE for a DATALAKE on how to create AUTHORIZATION objects for a DATALAKE.

CREATE DATALAKE <datalake_name>
          <auth_list>
          USING catalog_type <left_paren> <catalog_type_literal> <right_paren>
                [catalog_location <left_paren> <catalog_location_literal> <right_paren>] 
                [storage_location <left_paren> <storage_location_literal> <right_paren>]
                [storage_region <left_paren> <storage_region_literal> <right_paren>]     
                [unity_catalog_name <left_paren> <unity_catalog_name_literal> <right_paren>]
                [storage_account_name <left_paren> <storage_account_name_literal> <right_paren>]
                [tenant_id <left_paren> <tenant_id_literal> <right_paren>]
                [default_cluster_id <left_paren> <default_cluster_id_literal> <right_paren>]
                [<custom_clause_list>]
           TABLE FORMAT <format_type>;
<datalake_name>::= !! Teradata identifier
<auth_list>::= EXTERNAL SECURITY <auth_type> [{<comma> <auth_list>}...]
<auth_type>::= DEFINER TRUSTED <connection_type> <auth_name> | [INVOKER] TRUSTED <connection_type> <auth_name>
<connection_type>::= CATALOG | STORAGE
<auth_name>::= !! Teradata identifier
<catalog_type_literal>::= <quote> <catalog_type> <quote>
<catalog_location_literal>::= <quote> <catalog_uri> <quote>
<storage_location_literal>::= <quote> <storage_uri> <quote>
<storage_region_literal>::=  <quote> <storage_region> <quote>
<unity_catalog_name_literal>::=  <quote> <unity_catalog_name> <quote>
<storage_account_name_literal>::=  <quote> <storage_account_name> <quote>
<tenant_id_literal>::=  <quote> <tenant_id> <quote>
<catalog_type>::= hive | glue | unity
<catalog_uri>::= !! Catalog URI, e.g. thrift://example.com, required when catalog_type is hive or unity
<storage_uri>::= !! Storage URI, e.g. s3://example-iceberg-v1/, required when catalog_type is hive
<storage_region>::= !! Cloud region, e.g. us-west-2, required when catalog_type is glue or hive
<unity_catalog_name>::= !! Unity/Azure Databricks catalog name, e.g. reg_iceberg_db, required when catalog_type is unity
<storage_account_name>::= !! Azure DataLake Storage Gen2 storage account name, e.g. regicebergstorageacct, required when catalog_type is unity
<tenant_id>::= !! Azure Active Directory service principal tenant ID, e.g. 391c8c4c-6a2a-40fd-ab98-226b6baa5155, required when catalog_type is unity
<default_cluster_id>::= !! Spark compute cluster ID; the expected format is xxxx-xxxxxx-xxxxxxxx, e.g. 0210-232334-ab0q59t3, required when catalog_type is unity
<custom_clause_list>::= <custom_clause> [<custom_clause_list>...]
<custom_clause>::= <name> <left_paren> <quote> <value>[{<comma><value>}...] <quote> <right_paren>
<name>::= !! Teradata identifier
<value>::= !! Teradata literal value
<format_type>::= iceberg | deltalake

Usage Considerations

Custom Clause Values (Case Insensitive) Optional/Mandatory Catalogs Supporting Error
TABLE FORMAT is part of the SQL (not Using clause of Datalake DDL) IcebergDeltaLake Mandatory ICEBERG:
  • AWS
    • HIVE
    • UNITY
    • GLUE
  • AZURE
    • HIVE
    • UNITY

DELTLAKE:

  • AWS
    • UNITY
    • GLUE
  • AZURE
    • UNITY

Table Format is missing or invalid. Supported Table Formats (Iceberg/DeltaLake)

catalog_type HiveGlueUnityFabric Mandatory ICEBERG:
  • AWS
    • HIVE
    • UNITY
    • GLUE
  • AZURE
    • HIVE
    • UNITY

DELTLAKE:

  • AWS
    • UNITY
    • GLUE
  • AZURE
    • UNITY
Catalog type is missing or invalid. Supported catalog (hive/glue/unity/fabric)
catalog_location ex: thrift://172.177.44.4:9083 Mandatory for Hive, Unity ICEBERG:
  • AWS
    • HIVE
    • UNITY
  • AZURE
    • HIVE
    • UNITY

DELTLAKE:

  • AWS
    • HIVE
    • UNITY
  • AZURE
    • HIVE
    • UNITY
Catalog Location is missing or
                invalid.
catalog_endpoint ex:https://glue-fips.us-west-2.amazonaws.com Optional ICEBERG:
  • AWS
    • GLUE
  • AZURE

DELTLAKE:

  • AWS
    • GLUE
  • AZURE
 
unity_catalog_name ex: iceberg_db Mandatory for Unity ICEBERG and DELTALAKE:
  • Unity
unity_catalog_name is missing or invalid.
storage_location ex:
bfss://otf-330spark-51hdi-publ-2024-06-13t06-52-39-186z@iceberg-storageeastus2-.dfs.core.-windows.net/s3://vim-iceberg-v1/

For S3, follow the naming conventions mentioned here: Bucket naming rules - Amazon Simple Storage Service

Mandatory for Hive, Glue

Mandatoryfor Unity on AWS

Optionalfor Unity on Azure

ICEBERG:
  • AWS
    • HIVE
    • UNITY
  • AZURE

DELTLAKE:

  • AWS
    • HIVE
  • AZURE

Storage Location is missing or invalid.

storage_region ex: us-west-2 Mandatory for Hive, Glue

Mandatory for Unity on AWS

Optional for Unity on Azure

ICEBERG:
  • AWS
    • HIVE
    • UNITY
    • GLUE
  • AZURE
    • HIVE

DELTLAKE:

  • AWS
    • UNITY
    • GLUE
  • AZURE

Cloud Region is missing or invalid.

storage_endpoint ex:

https://s3-fips.us-west-2.amazonaws.com

Optional ICEBERG:
  • AWS
    • HIVE
    • GLUE
  • AZURE
    • HIVE

DELTLAKE:

  • AWS
    • GLUE
  • AZURE
 
storage_account_name ex:

icebergstorageeastus2

Mandatory for Hive on Azure and Unity ICEBERG:
  • AZURE
    • HIVE
    • UNITY

DELTLAKE:

  • AZURE
    • UNITY

Storage Account Name is missing or invalid.

tenant_id ex:

391c8c4c-6a2a-40fd-ab98-226b6baa5155

Mandatory for Hive on Azure, Unity ICEBERG:
  • AZURE
    • HIVE
    • UNITY
    • FABRIC

DELTLAKE:

  • AZURE
    • UNITY

TenantId is missing or invalid.

default_cluster_id ex:

0210-232334-ab0q59t3

Mandatory ICEBERG:
  • AZURE
    • UNITY

DELTLAKE:

  • AZURE
    • UNITY
 
container_name ex:

otf-330spark-51hdi-publ-2024-06-13t06-52-39-186z

Optional ICEBERG:
  • AZURE
    • UNITY
    • HIVE

DELTLAKE:

  • AZURE
    • UNITY
container_name is missing or invalid.

The procedure to create DATALAKE objects for different catalogs & object storages is discussed in the following topics.