The DATALAKE object encapsulates all the information needed to connect to an OTF data lake: the authorization information for connecting to the catalog and object storage, and the connection details.
A DATALAKE object is created using the CREATE DATALAKE statement. All DATALAKEs are created in the TD_SERVER_DB database.
The authorization information for connecting to the catalog and object storage is specified in the <auth_list> clause of the CREATE DATALAKE statement. See the topic Creating Authorization Objects for a DATALAKE for how to create AUTHORIZATION objects for a DATALAKE.
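For illustration, a minimal pair of AUTHORIZATION objects for the <auth_list> clause might look like the following sketch, which uses the standard Teradata CREATE AUTHORIZATION syntax. The object names (hive_catalog_auth, s3_storage_auth) and all credential values are placeholders; the exact form required for each catalog and object storage is covered in the topic referenced above.

-- Authorization for the catalog connection (placeholder credentials)
CREATE AUTHORIZATION hive_catalog_auth AS DEFINER TRUSTED
USER 'catalog_user'
PASSWORD 'catalog_password';

-- Authorization for the object storage connection; for S3, USER is
-- typically the access key ID and PASSWORD the secret access key
CREATE AUTHORIZATION s3_storage_auth AS DEFINER TRUSTED
USER 'access_key_id'
PASSWORD 'secret_access_key';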
CREATE DATALAKE <datalake_name>
  <auth_list>
  USING
    catalog_type <left_paren> <catalog_type_literal> <right_paren>
    [catalog_location <left_paren> <catalog_location_literal> <right_paren>]
    [storage_location <left_paren> <storage_location_literal> <right_paren>]
    [storage_region <left_paren> <storage_region_literal> <right_paren>]
    [unity_catalog_name <left_paren> <unity_catalog_name_literal> <right_paren>]
    [storage_account_name <left_paren> <storage_account_name_literal> <right_paren>]
    [tenant_id <left_paren> <tenant_id_literal> <right_paren>]
    [default_cluster_id <left_paren> <default_cluster_id_literal> <right_paren>]
    [<custom_clause_list>]
  TABLE FORMAT <format_type>;

<datalake_name>::= !! Teradata identifier

<auth_list>::= EXTERNAL SECURITY <auth_type> [{<comma> <auth_list>}...]

<auth_type>::= DEFINER TRUSTED <connection_type> <auth_name>
             | [INVOKER] TRUSTED <connection_type> <auth_name>

<connection_type>::= CATALOG | STORAGE

<auth_name>::= !! Teradata identifier

<catalog_type_literal>::= <quote> <catalog_type> <quote>
<catalog_location_literal>::= <quote> <catalog_uri> <quote>
<storage_location_literal>::= <quote> <storage_uri> <quote>
<storage_region_literal>::= <quote> <storage_region> <quote>
<unity_catalog_name_literal>::= <quote> <unity_catalog_name> <quote>
<storage_account_name_literal>::= <quote> <storage_account_name> <quote>
<tenant_id_literal>::= <quote> <tenant_id> <quote>
<default_cluster_id_literal>::= <quote> <default_cluster_id> <quote>

<catalog_type>::= hive | glue | unity
<catalog_uri>::= !! Catalog URI, e.g. thrift://example.com; required when catalog_type is hive or unity
<storage_uri>::= !! Storage URI, e.g. s3://example-iceberg-v1/; required when catalog_type is hive
<storage_region>::= !! Cloud region, e.g. us-west-2; required when catalog_type is glue or hive
<unity_catalog_name>::= !! Unity/Azure Databricks catalog name, e.g. reg_iceberg_db; required when catalog_type is unity
<storage_account_name>::= !! Azure Data Lake Storage Gen2 storage account name, e.g. regicebergstorageacct; required when catalog_type is unity
<tenant_id>::= !! Azure Active Directory service principal tenant ID, e.g. 391c8c4c-6a2a-40fd-ab98-226b6baa5155; required when catalog_type is unity
<default_cluster_id>::= !! Spark compute cluster ID in the format xxxx-xxxxxx-xxxxxxxx, e.g. 0210-232334-ab0q59t3; required when catalog_type is unity

<custom_clause_list>::= <custom_clause> [<custom_clause_list>...]
<custom_clause>::= <name> <left_paren> <quote> <value> [{<comma> <value>}...] <quote> <right_paren>
<name>::= !! Teradata identifier
<value>::= !! Teradata literal value

<format_type>::= iceberg | deltalake
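Putting the syntax together, a CREATE DATALAKE statement for an Iceberg data lake with a Hive catalog on S3 might look like the following sketch. The authorization names match the placeholder objects shown earlier, and the catalog URI, bucket, and region are example values only:

CREATE DATALAKE datalake_iceberg_hive
EXTERNAL SECURITY DEFINER TRUSTED CATALOG hive_catalog_auth,
EXTERNAL SECURITY DEFINER TRUSTED STORAGE s3_storage_auth
USING
  catalog_type ('hive')
  catalog_location ('thrift://example.com:9083')
  storage_location ('s3://example-iceberg-v1/')
  storage_region ('us-west-2')
TABLE FORMAT iceberg;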
Usage Considerations
| Custom Clause | Values (Case Insensitive) | Optional/Mandatory | Catalogs Supporting the Clause | Error |
|---|---|---|---|---|
| TABLE FORMAT (part of the SQL statement itself, not the USING clause of the DATALAKE DDL) | Iceberg, DeltaLake | Mandatory | ICEBERG: / DELTALAKE: | Table Format is missing or invalid. Supported Table Formats (Iceberg/DeltaLake) |
| catalog_type | Hive, Glue, Unity, Fabric | Mandatory | ICEBERG: / DELTALAKE: | Catalog type is missing or invalid. Supported catalog (hive/glue/unity/fabric) |
| catalog_location | e.g. thrift://172.177.44.4:9083 | Mandatory for Hive, Unity | ICEBERG: / DELTALAKE: | Catalog Location is missing or invalid. |
| catalog_endpoint | e.g. https://glue-fips.us-west-2.amazonaws.com | Optional | ICEBERG: / DELTALAKE: | |
| unity_catalog_name | e.g. iceberg_db | Mandatory for Unity | ICEBERG and DELTALAKE: | unity_catalog_name is missing or invalid. |
| storage_location | e.g. abfss://otf-330spark-51hdi-publ-2024-06-13t06-52-39-186z@icebergstorageeastus2.dfs.core.windows.net/ or s3://vim-iceberg-v1/. For S3, follow the naming conventions in Bucket naming rules - Amazon Simple Storage Service. | Mandatory for Hive and Glue; mandatory for Unity on AWS; optional for Unity on Azure | ICEBERG: / DELTALAKE: | Storage Location is missing or invalid. |
| storage_region | e.g. us-west-2 | Mandatory for Hive and Glue; mandatory for Unity on AWS; optional for Unity on Azure | ICEBERG: / DELTALAKE: | Cloud Region is missing or invalid. |
| storage_endpoint | e.g. https://s3-fips.us-west-2.amazonaws.com | Optional | ICEBERG: / DELTALAKE: | |
| storage_account_name | e.g. icebergstorageeastus2 | Mandatory for Hive on Azure and Unity | ICEBERG: / DELTALAKE: | Storage Account Name is missing or invalid. |
| tenant_id | e.g. 391c8c4c-6a2a-40fd-ab98-226b6baa5155 | Mandatory for Hive on Azure and Unity | ICEBERG: / DELTALAKE: | TenantId is missing or invalid. |
| default_cluster_id | e.g. 0210-232334-ab0q59t3 | Mandatory for Unity | ICEBERG: / DELTALAKE: | |
| container_name | e.g. otf-330spark-51hdi-publ-2024-06-13t06-52-39-186z | Optional | ICEBERG: / DELTALAKE: | container_name is missing or invalid. |
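To tie the table's clauses together, here is an illustrative CREATE DATALAKE for the DeltaLake table format with a Unity catalog on Azure. The authorization names and the workspace URL used as catalog_location are assumed placeholders; the remaining values reuse the examples from the table above and are not a working configuration:

CREATE DATALAKE datalake_delta_unity
EXTERNAL SECURITY DEFINER TRUSTED CATALOG unity_catalog_auth,
EXTERNAL SECURITY DEFINER TRUSTED STORAGE azure_storage_auth
USING
  catalog_type ('unity')
  catalog_location ('https://example.azuredatabricks.net')
  unity_catalog_name ('iceberg_db')
  storage_account_name ('icebergstorageeastus2')
  tenant_id ('391c8c4c-6a2a-40fd-ab98-226b6baa5155')
  default_cluster_id ('0210-232334-ab0q59t3')
TABLE FORMAT deltalake;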
The procedures for creating DATALAKE objects for different catalogs and object storage systems are discussed in the following topics.