Creating OTF Table | Teradata VantageCloud Lake

Apache Iceberg and Delta Lake Open Table Format on VantageCloud Lake Getting Started

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: December 2024

The CREATE TABLE SQL statement can be used to create an OTF table in the catalog, under the database specified in the command. Use three-level dot notation to specify the data lake table: <datalake_name>.<otf_database_name>.<otf_table_name>.

Syntax

CREATE TABLE <datalake_name>.<OTFdatabase_name>.<OTFtable_name>
    ({<column_name> <column_type> [NOT NULL]}[, ...])
[{PARTITIONED | PARTITION} BY {({<transform> | <column_name>}[, ...]) |
     <transform> | <column_name>}]
[{SORTED | ORDER} BY {<column_name> [<sort_type>]}[, ...]]
[{TBLPROPERTIES | PROPERTIES} (<custom_clause_list>)];
<datalake_name> ::= !! name of the data lake object
<OTFdatabase_name> ::= !! name of the database in the OTF catalog
<OTFtable_name> ::= !! name of the table in the OTF database
<transform> ::= IDENTITY(<column_name>) | BUCKET(<hash_value>,<column_name>) | 
                TRUNCATE(<truncate_value>, <column_name>)|
                YEAR(<column_name>) | MONTH(<column_name>) | DAY(<column_name>) | 
                HOUR(<column_name>) | NULL(<column_name>)
<hash_value> ::= !! bucket number
<truncate_value> ::= !! width or length
<sort_type> ::= ASC | DESC
<custom_clause_list> ::= <custom_clause>[, ...]
<custom_clause> ::= <quote><name><quote>=<quote><value><quote> 
<name> ::= !! Teradata identifier
<value> ::= !! Teradata literal value

Example

Create an OTF table named Tab1 in the OTF database MyDatabase through the data lake object datalake_glue.

CREATE TABLE datalake_glue.MyDatabase.Tab1 
    ( c1 VARCHAR(50) NOT NULL, 
      c2 INT, 
      dt TIMESTAMP(6), 
      country VARCHAR(50) )
PARTITIONED BY (YEAR(dt), country, BUCKET(16,c2)) 
SORTED BY c1 ASC, c2 DESC 
TBLPROPERTIES(
      'write.format.default'='avro','write.avro.compression-codec'='snappy',
      'write.data.path'='<s3://db/tbl1>');

Usage Considerations

  • When defining VARCHAR columns, set the string length based on the characteristics of the incoming string data. If a size is not provided, the default is 32000. Using the default size of 32000 can consume unnecessary storage and slow down join and GROUP BY operations.
  • The following CREATE TABLE options or clauses are not allowed:
    • Table kind, such as SET, MULTISET, GLOBAL TEMPORARY, or VOLATILE
    • MAP
    • FALLBACK
    • WITH JOURNAL TABLE
    • LOG
    • CHECKSUM
    • FREESPACE
    • INDEX
    • ON COMMIT
    • MERGEBLOCKRATIO
    • DATABLOCKSIZE
    • BLOCKCOMPRESSION
    • NOT CASESPECIFIC (because Parquet files are case sensitive, OTF table data is also case sensitive)
  • A given name in a name/value pair must not occur more than once in TBLPROPERTIES/PROPERTIES.
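
The VARCHAR sizing guidance above can be sketched as follows; the table and column names are hypothetical and only illustrate sizing each string column to its expected data rather than relying on the 32000-character default:

```sql
CREATE TABLE datalake_glue.MyDatabase.Customers
    ( customer_id  INT NOT NULL,
      country_code VARCHAR(2),     -- ISO country codes need only 2 characters
      email        VARCHAR(254) ); -- sized to the data, not the 32000 default
```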
Iceberg and Delta Lake (Write) on Unity

Single-quoted database names are not allowed in CREATE TABLE operations on Unity tables, due to a Spark restriction. This limitation applies only to Delta Lake (Write) Unity and Iceberg (Write) Unity, because those are the only configurations that use Spark to create a table and register it in the Unity catalog.

For example, the following operation on the pre-existing database 'dbname' –

create table delta_unity_write."'dbname'".tbl(i int);

– will result in an Unable to create table failure, with the following underlying Spark error:

org.apache.spark.sql.catalyst.parser.ParseException: 
[PARSE_SYNTAX_ERROR] Syntax error at or near ''dbname''. SQLSTATE: 42601
Double-quoted database names, such as "dbname", are allowed. However, neither 'dbname' nor "'dbname'" is supported. The same applies to all special characters in the table name: []#%^<>{}.
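
A minimal sketch of these quoting rules, assuming a pre-existing Unity database dbname and a Delta Lake (Write) Unity data lake object named delta_unity_write (both hypothetical):

```sql
-- Allowed: double-quoted database name
CREATE TABLE delta_unity_write."dbname".tbl (i INT);

-- Rejected by Spark (PARSE_SYNTAX_ERROR): single quotes inside the name
-- CREATE TABLE delta_unity_write."'dbname'".tbl (i INT);
```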
Iceberg

For Iceberg partition rules, see Spec - Apache Iceberg.

  • Partitions are a primary layer of indexing used to avoid reading unnecessary metadata and data files. Choose columns that are frequently used in query predicates as the partitioning columns.
  • Specifying a default value for a column is not supported for Iceberg Write at this time.
  • TBLPROPERTIES/PROPERTIES

    The properties that can be specified for an Iceberg table are listed in the Iceberg specification: Configuration - Apache Iceberg. You can enter any name/value pair; only the following are validated:

    • write.format.default
    • write.data.path

      Table location must match the same bucket location specified for the data lake object.

    • table.location

      This optional property can be specified during a create table operation to override the default table location. The default table location is constructed as storage_location + databasename + tablename.

    • write.parquet.compression-codec

      Valid values are snappy, zstd, and gzip.

    • write.avro.compression-codec

      Valid values are snappy, zstd, and gzip.

    • write.orc.compression-codec

      Valid values are snappy, zstd, zlib, and lz4.
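
As a sketch of the validated Iceberg properties above (the data lake object, database, and table names are hypothetical placeholders):

```sql
CREATE TABLE datalake_glue.MyDatabase.Events
    ( event_id INT NOT NULL,
      event_ts TIMESTAMP(6) )
PARTITIONED BY (DAY(event_ts))
TBLPROPERTIES(
      'write.format.default'='parquet',          -- validated format
      'write.parquet.compression-codec'='zstd'); -- valid codec for parquet
```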

Delta Lake
  • Partition transforms and the SORTED/ORDER BY sort types ASC and DESC are not supported for Delta Lake.
  • TBLPROPERTIES/PROPERTIES:

    The properties that can be specified for a Delta Lake table are listed in the Delta Lake documentation. Of the defined properties, the following are validated:

    • delta.minReaderVersion
    • delta.minWriterVersion
    • delta.compatibility.symlinkFormatManifest.enabled

      This property is not allowed.

    • table.location

      This optional property can be specified during a create table operation to override the default table location. The default table location is constructed as storage_location + databasename + tablename.

    • databricks.cluster.id = <value>

      This property must be supplied for Delta Lake tables in the Unity Catalog.

      default_cluster_id is supplied in CREATE DATALAKE, but if TBLPROPERTIES/PROPERTIES specifies databricks.cluster.id, that value takes precedence.

      For example, databricks.cluster.id = '0131-200319-tlmvupk8'.

  • Incompatible TBLPROPERTIES/PROPERTIES values, if specified, are ignored and the default values are used. For example, 'write.format.default'='parquet' and 'write.avro.compression-codec'='snappy'.

    In this case, the format is Parquet, and the default compression for Parquet applies. The table property 'write.avro.compression-codec'='snappy' is ignored.
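
The Delta Lake property rules above can be sketched as follows; the data lake object, database, table names, and cluster ID are hypothetical placeholders, and the cluster ID shown reuses the format from the example earlier in this section:

```sql
CREATE TABLE delta_unity_write.MyDatabase.Orders
    ( order_id INT NOT NULL,
      amount   DECIMAL(10,2) )
PROPERTIES(
      'delta.minReaderVersion'='1',                    -- validated
      'delta.minWriterVersion'='2',                    -- validated
      'databricks.cluster.id'='0131-200319-tlmvupk8'); -- required for Unity
```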