Creating a DATALAKE Object

Teradata® Open Table Format for Apache Iceberg and Delta Lake User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex, Lake
Product: Teradata Vantage
Release Number: 20.00
Published: October 2025

The DATALAKE object encapsulates all the information needed to connect to an OTF data lake, including the authorization information needed to connect to the catalog and object storage, and the connection details.

A DATALAKE object is created using the CREATE DATALAKE statement. All DATALAKEs are created in the TD_SERVER_DB database.

The authorization information used to connect to the catalog and object storage is specified in the <auth_list> clause of the CREATE DATALAKE statement. See the topic Creating Authorization Objects for a DATALAKE for how to create AUTHORIZATION objects for a DATALAKE.

CREATE DATALAKE <datalake_name>
          <auth_list>
          USING catalog_type <left_paren> <catalog_type_literal> <right_paren>
                [catalog_location <left_paren> <catalog_location_literal> <right_paren>] 
                [storage_location <left_paren> <storage_location_literal> <right_paren>]
                [storage_region <left_paren> <storage_region_literal> <right_paren>]     
                [unity_catalog_name <left_paren> <unity_catalog_name_literal> <right_paren>]
                [storage_account_name <left_paren> <storage_account_name_literal> <right_paren>]
                [tenant_id <left_paren> <tenant_id_literal> <right_paren>]
                [default_cluster_id <left_paren> <default_cluster_id_literal> <right_paren>]
                [<custom_clause_list>]
           TABLE FORMAT <format_type>;
<datalake_name>::= !! Teradata identifier
<auth_list>::= EXTERNAL SECURITY <auth_type> [{<comma> <auth_list>}...]
<auth_type>::= DEFINER TRUSTED <connection_type> <auth_name> | [INVOKER] TRUSTED <connection_type> <auth_name>
<connection_type>::= CATALOG | STORAGE
<auth_name>::= !! Teradata identifier
<catalog_type_literal>::= <quote> <catalog_type> <quote>
<catalog_location_literal>::= <quote> <catalog_uri> <quote>
<storage_location_literal>::= <quote> <storage_uri> <quote>
<storage_region_literal>::=  <quote> <storage_region> <quote>
<unity_catalog_name_literal>::=  <quote> <unity_catalog_name> <quote>
<storage_account_name_literal>::=  <quote> <storage_account_name> <quote>
<tenant_id_literal>::= <quote> <tenant_id> <quote>
<default_cluster_id_literal>::= <quote> <default_cluster_id> <quote>
<catalog_type>::= hive | glue | unity | rest | fabric
<catalog_uri>::= !! Catalog URI, e.g. thrift://example.com, required when catalog_type is hive or unity or rest
<storage_uri>::= !! Storage URI, e.g. s3://example-iceberg-v1/, required when catalog_type is hive
<storage_region>::= !! Cloud region, e.g. us-west-2, required when catalog_type is glue or hive
<unity_catalog_name>::= !! Unity/Azure Databricks catalog name, e.g. reg_iceberg_db, required when catalog_type is unity
<storage_account_name>::= !! Azure DataLake Storage Gen2 storage account name, e.g. regicebergstorageacct, required when catalog_type is unity
<tenant_id>::= !! Azure Active Directory service principal tenant ID, e.g. 391c8c4c-6a2a-40fd-ab98-226b6baa5155, required when catalog_type is unity
<default_cluster_id>::= !! Spark compute cluster ID; the expected format is xxxx-xxxxxx-xxxxxxxx, e.g. 0210-232334-ab0q59t3, required when catalog_type is unity
<project_id>::= !! GCP Service Account Project ID
<client_id>::= !! GCP Service Account Client ID
<client_email>::= !! GCP Service Account Client Email
<catalog_service_principal_type>::= !! Databricks-managed or IDP-managed (Azure or GCP) service principal
<idp_type>::= !! Identity provider type. Allowed values are custom and none
<idp_location>::= !! OAuth token endpoint URL
<idp_token_scope>::= !! OAuth token scope
<storage_endpoint>::= !! Required for on-premises S3-compatible storage such as Dell ECS, MinIO, or StorageGRID
<catalog_name>::= !! Name of the catalog. Mainly used for the Iceberg REST catalog
<custom_clause_list>::= <custom_clause> [<custom_clause_list>...]
<custom_clause>::= <name> <left_paren> <quote> <value>[{<comma><value>}...] <quote> <right_paren>
<name>::= !! Teradata identifier
<value>::= !! Teradata literal value
<format_type>::= iceberg | deltalake
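
The following hedged examples sketch how the syntax above might be combined for an Iceberg data lake cataloged in AWS Glue and for one cataloged in Unity on Azure Databricks. The DATALAKE and AUTHORIZATION object names (glue_lake, glue_cat_auth, and so on) and the Unity catalog_location URI are hypothetical; the literal values are taken from the examples in the grammar comments above. The referenced AUTHORIZATION objects must already exist (see Creating Authorization Objects for a DATALAKE).

```sql
-- Sketch only; all object names are hypothetical and the AUTHORIZATION
-- objects (glue_cat_auth, s3_stor_auth, ...) are assumed to exist already.

-- Iceberg data lake cataloged in AWS Glue
CREATE DATALAKE glue_lake
  EXTERNAL SECURITY INVOKER TRUSTED CATALOG glue_cat_auth,
  EXTERNAL SECURITY INVOKER TRUSTED STORAGE s3_stor_auth
  USING catalog_type ('glue')
        storage_location ('s3://example-iceberg-v1/')
        storage_region ('us-west-2')
  TABLE FORMAT iceberg;

-- Iceberg data lake cataloged in Unity on Azure Databricks
-- (the catalog_location URI is a placeholder for your workspace URL)
CREATE DATALAKE unity_lake
  EXTERNAL SECURITY DEFINER TRUSTED CATALOG unity_cat_auth,
  EXTERNAL SECURITY DEFINER TRUSTED STORAGE adls_stor_auth
  USING catalog_type ('unity')
        catalog_location ('https://adb-0000000000000000.0.azuredatabricks.net')
        unity_catalog_name ('reg_iceberg_db')
        storage_account_name ('regicebergstorageacct')
        tenant_id ('391c8c4c-6a2a-40fd-ab98-226b6baa5155')
        default_cluster_id ('0210-232334-ab0q59t3')
  TABLE FORMAT iceberg;
```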

Usage Considerations

Each of the following entries describes one clause: its values or an example (values are case insensitive), whether the clause is optional or mandatory, the catalogs that support it, and the error reported when the clause is missing or invalid.
TABLE FORMAT
  Values: Iceberg, DeltaLake. TABLE FORMAT is part of the SQL statement, not of the USING clause of the DATALAKE DDL.
  Optional/Mandatory: Mandatory
  Catalogs supporting:
    ICEBERG:
      • AWS: HIVE, UNITY, GLUE, REST (POLARIS and GRAVITINO)
      • AZURE: HIVE, UNITY
      • On-premises S3-compatible (such as Dell ECS or StorageGRID): REST (GRAVITINO)
    DELTALAKE:
      • AWS: UNITY, GLUE
      • AZURE: UNITY
  Error: Table Format is missing or invalid. Supported Table Formats (Iceberg/DeltaLake)

catalog_type
  Values: Hive, Glue, Unity, Fabric
  Optional/Mandatory: Mandatory
  Catalogs supporting:
    ICEBERG:
      • AWS: HIVE, UNITY, GLUE, REST (POLARIS and GRAVITINO)
      • AZURE: HIVE, UNITY
      • On-premises S3-compatible (such as Dell ECS or StorageGRID): REST (GRAVITINO)
    DELTALAKE:
      • AWS: UNITY, GLUE
      • AZURE: UNITY
  Error: Catalog type is missing or invalid. Supported catalog (hive/glue/unity/fabric)
catalog_location
  Example: thrift://172.177.44.4:9083
  Optional/Mandatory: Mandatory for Hive and Unity
  Catalogs supporting:
    ICEBERG:
      • AWS: HIVE, UNITY, REST (POLARIS and GRAVITINO)
      • AZURE: HIVE, UNITY
      • On-premises S3-compatible (such as Dell ECS or StorageGRID): REST (GRAVITINO)
    DELTALAKE:
      • AWS: HIVE, UNITY
      • AZURE: HIVE, UNITY
  Error: Catalog Location is missing or invalid.
catalog_endpoint
  Example: https://glue-fips.us-west-2.amazonaws.com
  Optional/Mandatory: Optional
  Catalogs supporting:
    ICEBERG:
      • AWS: GLUE
      • AZURE
    DELTALAKE:
      • AWS: GLUE
      • AZURE
  Error: None
unity_catalog_name
  Example: iceberg_db
  Optional/Mandatory: Mandatory for Unity
  Catalogs supporting:
    ICEBERG and DELTALAKE:
      • UNITY
  Error: unity_catalog_name is missing or invalid.
storage_location
  Examples:
    abfss://otf-330spark-51hdi-publ-2024-06-13t06-52-39-186z@icebergstorageeastus2.dfs.core.windows.net/
    s3://vim-iceberg-v1/
  For S3, follow the naming conventions in Bucket naming rules - Amazon Simple Storage Service.
  Optional/Mandatory: Mandatory for Hive and Glue; mandatory for Unity on AWS; optional for Unity on Azure
  Catalogs supporting:
    ICEBERG:
      • AWS: HIVE, UNITY
      • REST (POLARIS and GRAVITINO)
      • AZURE
    DELTALAKE:
      • AWS: HIVE
      • AZURE
  Error: Storage Location is missing or invalid.

storage_region
  Example: us-west-2
  Optional/Mandatory: Mandatory for Hive and Glue; mandatory for Unity on AWS; optional for Unity on Azure
  Catalogs supporting:
    ICEBERG:
      • AWS: HIVE, UNITY, GLUE
      • REST (POLARIS and GRAVITINO)
      • AZURE: HIVE
    DELTALAKE:
      • AWS: UNITY, GLUE
      • AZURE
  Error: Cloud Region is missing or invalid.

storage_endpoint
  Example: https://s3-fips.us-west-2.amazonaws.com
  Optional/Mandatory: Optional
  Catalogs supporting:
    ICEBERG:
      • AWS: HIVE, GLUE
      • REST (POLARIS and GRAVITINO)
      • AZURE: HIVE
    DELTALAKE:
      • AWS: GLUE
      • AZURE
  Error: None
storage_account_name
  Example: icebergstorageeastus2
  Optional/Mandatory: Mandatory for Hive on Azure and for Unity
  Catalogs supporting:
    ICEBERG:
      • AZURE: HIVE, UNITY
    DELTALAKE:
      • AZURE: UNITY
  Error: Storage Account Name is missing or invalid.

tenant_id
  Example: 391c8c4c-6a2a-40fd-ab98-226b6baa5155
  Optional/Mandatory: Mandatory for Hive on Azure and for Unity
  Catalogs supporting:
    ICEBERG:
      • AZURE: HIVE, UNITY, FABRIC
    DELTALAKE:
      • AZURE: UNITY
  Error: TenantId is missing or invalid.

default_cluster_id
  Example: 0210-232334-ab0q59t3
  Optional/Mandatory: Mandatory
  Catalogs supporting:
    ICEBERG:
      • AZURE: UNITY
    DELTALAKE:
      • AZURE: UNITY
  Error: None
container_name
  Example: otf-330spark-51hdi-publ-2024-06-13t06-52-39-186z
  Optional/Mandatory: Optional
  Catalogs supporting:
    ICEBERG:
      • AZURE: UNITY, HIVE
    DELTALAKE:
      • AZURE: UNITY
  Error: container_name is missing or invalid.
catalog_service_principal_type
  Values: databricks_managed_principal (default), idp_managed_principal. idp_managed_principal can be used only for Azure and GCP Databricks.
  Optional/Mandatory: Optional if using databricks_managed_principal
  Catalogs supporting:
    ICEBERG:
      • AZURE: UNITY
      • GCP: UNITY
      • AWS: UNITY
    DELTALAKE:
      • AZURE: UNITY
      • GCP: UNITY
      • AWS: UNITY
  Error: Client authentication failed
project_id
  Description: GCP Service Account Project ID
  Example: tc-otf
  Optional/Mandatory: Mandatory
  Catalogs supporting:
    ICEBERG:
      • GCP: HIVE, UNITY
    DELTALAKE:
      • GCP: UNITY
  Error: project_id is missing
client_id
  Description: GCP Service Account Client ID
  Optional/Mandatory: Mandatory
  Catalogs supporting:
    ICEBERG:
      • GCP: HIVE, UNITY
    DELTALAKE:
      • GCP: UNITY
  Error: client_id is missing
client_email
  Description: GCP Service Account Client Email
  Optional/Mandatory: Mandatory
  Catalogs supporting:
    ICEBERG:
      • GCP: HIVE, UNITY
    DELTALAKE:
      • GCP: UNITY
  Error: client_email is missing
idp_type
  Description: Identity provider type
  Allowed values: custom, none. Only these values are supported at present.
  The default is custom, which indicates that the OAuth service uses grant_type=client_credentials. none is used to provide the token value directly in the catalog authorization.
  Example: idp_type('custom')
  Optional/Mandatory: Optional
  Catalogs supporting:
    ICEBERG:
      • REST: GRAVITINO, POLARIS
  Error: None
idp_location
  Description: OAuth endpoint
  Example: idp_location('http://cnc-tdvm-smp-8878:8181/api/catalog/v1/oauth/tokens')
  Optional/Mandatory: Mandatory
  Catalogs supporting:
    ICEBERG:
      • REST: GRAVITINO, POLARIS
  Error: idp location is missing
idp_token_scope
  Description: OAuth token scope
  Example: idp_token_scope('PRINCIPAL_ROLE:ALL')
  Optional/Mandatory: Mandatory for Polaris and Gravitino
  Catalogs supporting:
    ICEBERG:
      • REST: GRAVITINO, POLARIS
  Error: idp token scope is missing
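
As a sketch of how the idp_* custom clauses combine for a REST catalog, the following hedged example creates an Iceberg data lake against a Polaris catalog. The DATALAKE, AUTHORIZATION, and catalog_name values are hypothetical; the idp_location and idp_token_scope literals reuse the examples shown for those clauses, and the catalog_location URI is a placeholder derived from the same host.

```sql
-- Sketch only; object names and the catalog URIs are hypothetical, and the
-- AUTHORIZATION objects (rest_cat_auth, s3_stor_auth) are assumed to exist.
CREATE DATALAKE polaris_lake
  EXTERNAL SECURITY INVOKER TRUSTED CATALOG rest_cat_auth,
  EXTERNAL SECURITY INVOKER TRUSTED STORAGE s3_stor_auth
  USING catalog_type ('rest')
        catalog_location ('http://cnc-tdvm-smp-8878:8181/api/catalog')
        catalog_name ('polaris_catalog')
        idp_type ('custom')
        idp_location ('http://cnc-tdvm-smp-8878:8181/api/catalog/v1/oauth/tokens')
        idp_token_scope ('PRINCIPAL_ROLE:ALL')
        storage_location ('s3://example-iceberg-v1/')
        storage_region ('us-west-2')
  TABLE FORMAT iceberg;
```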

The procedure to create DATALAKE objects for different catalogs and object storages is discussed in the following topics.