The DATALAKE object encapsulates all the information needed to connect to an OTF data lake, including the connection details and the Authorization information needed to connect to the Catalog and Object Storage.
A DATALAKE object is created using the CREATE DATALAKE statement. All DATALAKEs are created in the TD_SERVER_DB database.
The Authorization information used to connect to the Catalog and Object Storage is specified in the <auth_list> clause of the CREATE DATALAKE statement. See the topic Creating Authorization Objects for a DATALAKE for details on how to create AUTHORIZATION objects for a DATALAKE.
CREATE DATALAKE <datalake_name>
<auth_list>
USING catalog_type <left_paren> <catalog_type_literal> <right_paren>
[catalog_location <left_paren> <catalog_location_literal> <right_paren>]
[storage_location <left_paren> <storage_location_literal> <right_paren>]
[storage_region <left_paren> <storage_region_literal> <right_paren>]
[unity_catalog_name <left_paren> <unity_catalog_name_literal> <right_paren>]
[storage_account_name <left_paren> <storage_account_name_literal> <right_paren>]
[tenant_id <left_paren> <tenant_id_literal> <right_paren>]
[default_cluster_id <left_paren> <default_cluster_id_literal> <right_paren>]
[<custom_clause_list>]
TABLE FORMAT <format_type>;
<datalake_name>::= !! Teradata identifier
<auth_list>::= EXTERNAL SECURITY <auth_type> [{<comma> <auth_list>}...]
<auth_type>::= DEFINER TRUSTED <connection_type> <auth_name> | [INVOKER] TRUSTED <connection_type> <auth_name>
<connection_type>::= CATALOG | STORAGE
<auth_name>::= !! Teradata identifier
<catalog_type_literal>::= <quote> <catalog_type> <quote>
<catalog_location_literal>::= <quote> <catalog_uri> <quote>
<storage_location_literal>::= <quote> <storage_uri> <quote>
<storage_region_literal>::= <quote> <storage_region> <quote>
<unity_catalog_name_literal>::= <quote> <unity_catalog_name> <quote>
<storage_account_name_literal>::= <quote> <storage_account_name> <quote>
<tenant_id_literal>::= <quote> <tenant_id> <quote>
<default_cluster_id_literal>::= <quote> <default_cluster_id> <quote>
<catalog_type>::= hive | glue | unity | rest | fabric
<catalog_uri>::= !! Catalog URI, e.g. thrift://example.com, required when catalog_type is hive or unity or rest
<storage_uri>::= !! Storage URI, e.g. s3://example-iceberg-v1/, required when catalog_type is hive
<storage_region>::= !! Cloud region, e.g. us-west-2, required when catalog_type is glue or hive
<unity_catalog_name>::= !! Unity/Azure Databricks catalog name, e.g. reg_iceberg_db, required when catalog_type is unity
<storage_account_name>::= !! Azure DataLake Storage Gen2 storage account name, e.g. regicebergstorageacct, required when catalog_type is unity
<tenant_id>::= !! Azure Active Directory service principal tenant ID, e.g. 391c8c4c-6a2a-40fd-ab98-226b6baa5155, required when catalog_type is unity
<default_cluster_id>::= !! Spark compute cluster ID; the expected format is xxxx-xxxxxx-xxxxxxxx, e.g. 0210-232334-ab0q59t3, required when catalog_type is unity
<project_id>::= !! GCP Service Account Project ID
<client_id>::= !! GCP Service Account Client ID
<client_email>::= !! GCP Service Account Client Email
<catalog_service_principal_type>::= !! Databricks-managed or IdP-managed (Azure or GCP) service principals
<idp_type>::= !! Identity provider type. Allowed values are custom and none
<idp_location>::= !! OAuth location endpoint URL
<idp_token_scope>::= !! OAuth token scope
<storage_endpoint>::= !! Storage endpoint; required for on-premises S3-compatible storage such as Dell ECS, MinIO, or StorageGRID
<catalog_name>::= !! Name of the catalog. Mainly used for Iceberg REST catalog
<custom_clause_list>::= <custom_clause> [<custom_clause_list>...]
<custom_clause>::= <name> <left_paren> <quote> <value>[{<comma><value>}...] <quote> <right_paren>
<name>::= !! Teradata identifier
<value>::= !! Teradata literal value
<format_type>::= iceberg | deltalake
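For example, a minimal sketch of a CREATE DATALAKE statement for a Hive catalog with Iceberg tables on Amazon S3 follows the syntax above. The DATALAKE name and the authorization object names (CatalogAuth, StorageAuth) are hypothetical placeholders, and the thrift URI, bucket, and region are the example values used elsewhere in this topic; substitute values from your own environment.

CREATE DATALAKE datalake_iceberg_hive
EXTERNAL SECURITY DEFINER TRUSTED CATALOG CatalogAuth,
EXTERNAL SECURITY DEFINER TRUSTED STORAGE StorageAuth
USING catalog_type ('hive')
catalog_location ('thrift://172.177.44.4:9083')
storage_location ('s3://vim-iceberg-v1/')
storage_region ('us-west-2')
TABLE FORMAT iceberg;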
Usage Considerations
| Custom Clause | Values (Case Insensitive) | Optional/Mandatory | Catalogs Supporting | Error |
|---|---|---|---|---|
| TABLE FORMAT (part of the SQL statement, not the USING clause of the DATALAKE DDL) | | Mandatory | ICEBERG: DELTALAKE: | Table Format is missing or invalid. Supported Table Formats (Iceberg/DeltaLake) |
| catalog_type | | Mandatory | ICEBERG: DELTALAKE: | Catalog type is missing or invalid. Supported catalog (hive/glue/unity/fabric) |
| catalog_location | ex: thrift://172.177.44.4:9083 | Mandatory for Hive, Unity | ICEBERG: DELTALAKE: | Catalog Location is missing or invalid. |
| catalog_endpoint | ex: https://glue-fips.us-west-2.amazonaws.com | Optional | ICEBERG: DELTALAKE: | |
| unity_catalog_name | ex: iceberg_db | Mandatory for Unity | ICEBERG and DELTALAKE: | unity_catalog_name is missing or invalid. |
| storage_location | ex: abfss://otf-330spark-51hdi-publ-2024-06-13t06-52-39-186z@icebergstorageeastus2.dfs.core.windows.net/ or s3://vim-iceberg-v1/. For S3, follow the naming conventions in Bucket naming rules - Amazon Simple Storage Service. | Mandatory for Hive, Glue; Mandatory for Unity on AWS; Optional for Unity on Azure | ICEBERG: DELTALAKE: | Storage Location is missing or invalid. |
| storage_region | ex: us-west-2 | Mandatory for Hive, Glue; Mandatory for Unity on AWS; Optional for Unity on Azure | ICEBERG: DELTALAKE: | Cloud Region is missing or invalid. |
| storage_endpoint | ex: https://s3-fips.us-west-2.amazonaws.com | Optional | ICEBERG: DELTALAKE: | |
| storage_account_name | ex: icebergstorageeastus2 | Mandatory for Hive on Azure and Unity | ICEBERG: DELTALAKE: | Storage Account Name is missing or invalid. |
| tenant_id | ex: 391c8c4c-6a2a-40fd-ab98-226b6baa5155 | Mandatory for Hive on Azure, Unity | ICEBERG: DELTALAKE: | TenantId is missing or invalid. |
| default_cluster_id | ex: 0210-232334-ab0q59t3 | Mandatory | ICEBERG: DELTALAKE: | |
| container_name | ex: otf-330spark-51hdi-publ-2024-06-13t06-52-39-186z | Optional | ICEBERG: DELTALAKE: | container_name is missing or invalid. |
| catalog_service_principal_type | idp_managed_principal can be used only for Azure and GCP Databricks. | Optional if using databricks_managed_principal | ICEBERG: DELTALAKE: | Client authentication failed |
| project_id | GCP Service Account Project ID. Example: tc-otf | Mandatory | ICEBERG: | project_id is missing |
| client_id | GCP Service Account Client ID | Mandatory | ICEBERG: | client_id is missing |
| client_email | GCP Service Account Client Email | Mandatory | ICEBERG: | client_email is missing |
| idp_type | Identity Provider type. Allowed values: custom (the default; indicates the OAuth service uses grant_type=client_credentials) and none (the token value is provided directly in the catalog authorization). Example: idp_type('custom'). Only these values are supported at present. | Optional | ICEBERG: REST | |
| idp_location | OAuth endpoint. Example: idp_location('http://cnc-tdvm-smp-8878:8181/api/catalog/v1/oauth/tokens') | Mandatory | ICEBERG: REST | idp location is missing |
| idp_token_scope | OAuth token scope. Example: idp_token_scope('PRINCIPAL_ROLE:ALL') | Mandatory for Polaris and Gravitino | ICEBERG: REST | idp token scope is missing |
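As a further illustration, the following sketch combines the Unity-related clauses from the table above for an Azure Databricks (Unity) catalog with Delta Lake tables. The DATALAKE name and authorization object names are hypothetical, the catalog_location value is a placeholder for your Databricks workspace URL, and the remaining clause values are the example values shown in the table; replace all of them with the details of your own catalog, storage account, tenant, and cluster.

CREATE DATALAKE datalake_deltalake_unity
EXTERNAL SECURITY DEFINER TRUSTED CATALOG UnityCatalogAuth,
EXTERNAL SECURITY DEFINER TRUSTED STORAGE AzureStorageAuth
USING catalog_type ('unity')
catalog_location ('<databricks-workspace-url>')
unity_catalog_name ('iceberg_db')
storage_account_name ('icebergstorageeastus2')
tenant_id ('391c8c4c-6a2a-40fd-ab98-226b6baa5155')
default_cluster_id ('0210-232334-ab0q59t3')
TABLE FORMAT deltalake;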
The procedure to create DATALAKE objects for different catalogs and object storages is discussed in the following topics.