Creating a DATALAKE using a Hive Catalog - Teradata Vantage

Apache Iceberg and Delta Lake Open Table Format on VantageCloud Lake Getting Started

Deployment: VantageCloud; Edition: Lake; Product: Teradata Vantage; Published: December 2024; Last updated: 2025-01-03

The following example shows how to create an Iceberg DATALAKE that connects to an Apache Hive catalog and stores its table data in Amazon S3 object storage.

Define the authorization for catalog access:

CREATE AUTHORIZATION hive_catalog_auth
     AS INVOKER TRUSTED
     USER 'xxx'
     PASSWORD 'yyy';

Define the authorization for storage access:

CREATE AUTHORIZATION s3_storage_auth
     AS INVOKER TRUSTED
     USER 'abc'
     PASSWORD 'def';
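If you generate these statements from a script, two details matter. Each value must be closed with a plain ASCII apostrophe, because a typographic closing quote (’) leaves the literal unterminated. Any single quote inside a value must also be doubled, as in standard SQL string literals. The following minimal Python sketch illustrates that quoting rule. `sql_literal` and `create_authorization_sql` are hypothetical helpers, not part of any Teradata client library. They only build the statement text in the INVOKER TRUSTED form shown above and do not connect to Vantage.

```python
def sql_literal(value: str) -> str:
    """Return value as a single-quoted SQL string literal.

    Embedded single quotes are doubled (standard SQL escaping), so a
    password such as it's becomes 'it''s'.
    """
    return "'" + value.replace("'", "''") + "'"


def create_authorization_sql(name: str, user: str, password: str) -> str:
    """Build a CREATE AUTHORIZATION ... AS INVOKER TRUSTED statement.

    `name` is emitted as-is, so it must already be a valid identifier.
    """
    return (
        f"CREATE AUTHORIZATION {name}\n"
        "     AS INVOKER TRUSTED\n"
        f"     USER {sql_literal(user)}\n"
        f"     PASSWORD {sql_literal(password)};"
    )


if __name__ == "__main__":
    # Prints the s3_storage_auth statement with plain ASCII quotes.
    print(create_authorization_sql("s3_storage_auth", "abc", "def"))
```

Keep real credentials out of source code. Read them from a secrets store or environment variables instead.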

Create an Iceberg DATALAKE object that references the two authorization objects:

CREATE DATALAKE datalake_iceberg_hive
EXTERNAL SECURITY INVOKER TRUSTED CATALOG hive_catalog_auth,
EXTERNAL SECURITY INVOKER TRUSTED STORAGE s3_storage_auth
USING
    catalog_type ('hive')
    catalog_location ('thrift://<hostname>:<port>')
    storage_location ('s3://<folder>/')
    storage_region ('us-west-2')
    s3_max_task ('1000')
    s3_max_threads ('1000')
    s3_max_connections ('5000')
    vectorized_read_scans_batch_size ('1')
TABLE FORMAT iceberg;
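The USING clause is a sequence of name ('value') option pairs. When the same DATALAKE definition has to be produced for several environments, such as different Hive Metastore hosts or S3 buckets, it can help to keep the options in a dictionary and render the statement from it. The following minimal Python sketch works under that assumption. `create_datalake_sql` is a hypothetical helper that only formats text. It does not check option names or values against what Vantage accepts. The host `metastore.example.com`, port 9083 (the usual Hive Metastore Thrift port), and bucket path in the usage example are placeholders.

```python
def sql_literal(value: str) -> str:
    # Standard SQL quoting: wrap in single quotes and double any embedded quote.
    return "'" + value.replace("'", "''") + "'"


def create_datalake_sql(
    name: str,
    catalog_auth: str,
    storage_auth: str,
    options: dict[str, str],
    table_format: str = "iceberg",
) -> str:
    """Render a CREATE DATALAKE statement in the shape of the example above.

    Options are emitted in insertion order as name ('value') pairs,
    one per line. No validation of option names or values is performed.
    """
    using = "\n".join(f"    {key} ({sql_literal(val)})" for key, val in options.items())
    return (
        f"CREATE DATALAKE {name}\n"
        f"EXTERNAL SECURITY INVOKER TRUSTED CATALOG {catalog_auth},\n"
        f"EXTERNAL SECURITY INVOKER TRUSTED STORAGE {storage_auth}\n"
        "USING\n"
        f"{using}\n"
        f"TABLE FORMAT {table_format};"
    )


if __name__ == "__main__":
    print(
        create_datalake_sql(
            "datalake_iceberg_hive",
            "hive_catalog_auth",
            "s3_storage_auth",
            {
                "catalog_type": "hive",
                "catalog_location": "thrift://metastore.example.com:9083",  # placeholder host
                "storage_location": "s3://my-bucket/warehouse/",  # placeholder bucket
                "storage_region": "us-west-2",
            },
        )
    )
```

Because option order follows dictionary insertion order (guaranteed since Python 3.7), the generated statement stays stable across runs. This makes it easy to diff against a statement that is already deployed.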

The following example shows how to create an Iceberg DATALAKE that connects to an Apache Hive catalog and stores its table data in Azure Data Lake Storage (ADLS) Gen2.

Define the authorization for catalog and storage access:

CREATE AUTHORIZATION hive_catalog_auth
   AS INVOKER TRUSTED
   USER '<azure_principal_clientid>'  -- Azure AD service principal client ID
   PASSWORD '<client_secret_key>';    -- Azure AD service principal client secret

Create the DATALAKE object. This example uses the same authorization for both catalog and storage access:

CREATE DATALAKE database_iceberg_hive 
EXTERNAL SECURITY INVOKER TRUSTED CATALOG hive_catalog_auth,
EXTERNAL SECURITY INVOKER TRUSTED STORAGE hive_catalog_auth
USING
    catalog_type ('hive')
    catalog_location ('thrift://<hostname>:<port>')
    storage_location ('abfss://<folder>/')
    container_name ('<container-name>')
    storage_region ('East US 2')
    storage_account_name ('<account-name>')
    tenant_id ('<tenant-id>')
TABLE FORMAT iceberg;
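The ADLS example contains several template placeholders, such as `<hostname>`, `<container-name>`, and `<tenant-id>`. A stray space inside the quotes of a copied client ID or secret is also easy to miss. Before you submit the statement, a quick pre-flight check can catch both problems. The following Python sketch is a hypothetical helper, not a Teradata tool. It flags option values that still contain an angle-bracket placeholder or have leading or trailing whitespace. It knows nothing about which options Vantage actually requires.

```python
import re

# Matches template placeholders such as <hostname> or <tenant-id>.
PLACEHOLDER = re.compile(r"<[^<>]+>")


def find_option_problems(options: dict[str, str]) -> list[str]:
    """Return human-readable problems found in DATALAKE option values.

    Flags values that still contain an unreplaced <placeholder> and values
    with leading or trailing whitespace inside what would be the quoted literal.
    """
    problems = []
    for key, value in options.items():
        if PLACEHOLDER.search(value):
            problems.append(f"{key}: unreplaced placeholder in {value!r}")
        if value != value.strip():
            problems.append(f"{key}: leading or trailing whitespace in {value!r}")
    return problems


if __name__ == "__main__":
    # Values copied from the ADLS example; only the account name was filled in.
    adls_options = {
        "catalog_type": "hive",
        "catalog_location": "thrift://<hostname>:<port>",
        "container_name": "<container-name>",
        "storage_region": "East US 2",
        "storage_account_name": "myaccount",  # hypothetical value
        "tenant_id": "<tenant-id>",
    }
    for problem in find_option_problems(adls_options):
        print(problem)
```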