The CREATE TABLE SQL statement creates an OTF table in the catalog under the database specified in the command. Use three-level dot notation to specify the data lake table: <datalake_name>.<otf_database_name>.<otf_table_name>.
Syntax
CREATE TABLE <datalake_name>.<OTFdatabase_name>.<OTFtable_name>
    ( {<column_name> <column_type> [NOT NULL]} [, ...] )
    [ {PARTITIONED | PARTITION} BY
        { ( {<transform> | <column_name>} [, ...] ) | <transform> | <column_name> } ]
    [ {SORTED | ORDER} BY {<column_name> [<sort_type>]} [, ...] ]
    [ {TBLPROPERTIES | PROPERTIES} (<custom_clause_list>) ];

<datalake_name> ::= !! name of the data lake object
<OTFdatabase_name> ::= !! name of the database in the OTF catalog
<OTFtable_name> ::= !! name of the table in the OTF database
<transform> ::=
    IDENTITY(<column_name>)
  | BUCKET(<hash_value>, <column_name>)
  | TRUNCATE(<truncate_value>, <column_name>)
  | YEAR(<column_name>)
  | MONTH(<column_name>)
  | DAY(<column_name>)
  | HOUR(<column_name>)
  | NULL(<column_name>)
<hash_value> ::= !! bucket number
<truncate_value> ::= !! width or length
<sort_type> ::= ASC | DESC
<custom_clause_list> ::= <custom_clause> [, ...]
<custom_clause> ::= <quote><name><quote>=<quote><value><quote>
<name> ::= !! Teradata identifier
<value> ::= !! Teradata literal value
Example
Create an OTF table named Tab1 in the database MyDatabase of the data lake datalake_glue.
CREATE TABLE datalake_glue.MyDatabase.Tab1 (
    c1 VARCHAR(50) NOT NULL,
    c2 INT,
    dt TIMESTAMP(6),
    country VARCHAR(50)
)
PARTITIONED BY (YEAR(dt), country, BUCKET(16, c2))
SORTED BY c1 ASC, c2 DESC
TBLPROPERTIES (
    'write.format.default'='avro',
    'write.avro.compression-codec'='snappy',
    'write.data.path'='<s3://db/tbl1>'
);
Usage Considerations
- When defining VARCHAR columns, choose the string length based on the characteristics of the incoming string data. If a size is not provided, the default is 32000. Using the default size of 32000 can consume unnecessary storage and slow down join and GROUP BY operations.
- The following CREATE TABLE options or clauses are not allowed:
- Table kind, such as SET, MULTISET, GLOBAL TEMPORARY, or VOLATILE
- MAP
- FALLBACK
- WITH JOURNAL TABLE
- LOG
- CHECKSUM
- FREESPACE
- INDEX
- ON COMMIT
- MERGEBLOCKRATIO
- DATABLOCKSIZE
- BLOCKCOMPRESSION
- NOT CASESPECIFIC (since Parquet files are case-sensitive, OTF table data is also case-sensitive)
- A given name in name/value pair must not occur multiple times in TBLPROPERTIES/PROPERTIES.
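As an illustration of the VARCHAR sizing guidance above, the following sketch sizes each string column to the expected data rather than relying on the 32000-character default. The table and column names are hypothetical.

```sql
-- Hypothetical table; column names and sizes are illustrative only.
CREATE TABLE datalake_glue.MyDatabase.customers (
    country_code VARCHAR(2) NOT NULL,  -- ISO country codes: 2 characters suffice
    email        VARCHAR(254),         -- practical upper bound for addresses
    notes        VARCHAR(2000)         -- free text, sized to expected input
);
```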
Single-quoted database names in CREATE TABLE operations are not allowed for Unity tables due to a Spark restriction. This limitation applies only to Delta Lake (Write) Unity and Iceberg (Write) Unity, as those are the only table types that use Spark to create a table and register it to the Unity catalog.
For example, the following operation on the pre-existing database 'dbname':

create table delta_unity_write."'dbname'".tbl(i int);

fails with an "Unable to create table" error and the following underlying Spark error:
org.apache.spark.sql.catalyst.parser.ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near ''dbname''. SQLSTATE: 42601
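Assuming the same pre-existing database, omitting the single quotes around the database name avoids the Spark parse error. This is a sketch using the names from the example above:

```sql
-- Reference the database without single quotes.
create table delta_unity_write.dbname.tbl(i int);
```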
For Iceberg partition rules, see Spec - Apache Iceberg.
- Partitions are a primary layer of indexing that avoids reading unnecessary metadata and data files. Choose columns that are frequently used in query predicates as the partitioning columns.
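For example, if most queries filter on a date column and a region column, those columns are good partitioning candidates. The following is a hypothetical sketch; the table and column names are illustrative.

```sql
-- Hypothetical: partition on the columns most queries filter by.
CREATE TABLE datalake_glue.MyDatabase.sales (
    order_id BIGINT NOT NULL,
    order_dt TIMESTAMP(6),
    region   VARCHAR(20),
    amount   DECIMAL(18,2)
) PARTITIONED BY (MONTH(order_dt), region);

-- A query restricted to one month and one region can skip the
-- metadata and data files of all other partitions:
SELECT SUM(amount)
FROM datalake_glue.MyDatabase.sales
WHERE order_dt >= TIMESTAMP '2024-01-01 00:00:00'
  AND order_dt <  TIMESTAMP '2024-02-01 00:00:00'
  AND region = 'EMEA';
```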
- Specifying a default value for a column is not supported for Iceberg Write at this time.
- TBLPROPERTIES/PROPERTIES
The properties that can be set for an Iceberg table are listed in the Iceberg specification: Configuration - Apache Iceberg. The user can enter any name/value pair. Only the following are validated:
- write.format.default
- write.data.path
The table location must match the bucket location specified for the data lake object.
- table.location
This is an optional property that can be specified during a create table operation to overwrite the default table location. The default table location is constructed based on storage_location + databasename + tablename.
- write.parquet.compression-codec
Valid values are snappy, zstd, and gzip.
- write.avro.compression-codec
Valid values are snappy, zstd, and gzip.
- write.orc.compression-codec
Valid values are snappy, zstd, zlib, and lz4.
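Combining the validated Iceberg properties above, a CREATE TABLE sketch might look like the following. The table name, columns, and path are placeholders.

```sql
CREATE TABLE datalake_glue.MyDatabase.events (
    id  BIGINT NOT NULL,
    msg VARCHAR(200)
)
TBLPROPERTIES (
    'write.format.default'='orc',
    'write.orc.compression-codec'='zstd',          -- must be snappy, zstd, zlib, or lz4
    'table.location'='<s3://bucket/custom/events>' -- overrides the default table location
);
```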
- Partition transforms and the SORTED/ORDER BY sort options ASC and DESC are not supported for Delta Lake.
- TBLPROPERTIES/PROPERTIES:
- Delta Lake:
The properties that can be specified for a Delta Lake table are specified in the Delta Lake documentation. Of the defined properties, the following are validated:
- delta.minReaderVersion
- delta.minWriterVersion
- delta.compatibility.symlinkFormatManifest.enabled
This property is not allowed.
- table.location
This is an optional property that can be specified during a create table operation to overwrite the default table location. The default table location is constructed based on storage_location + databasename + tablename.
- databricks.cluster.id = <value>
This is a property that must be supplied for Delta Lake tables for UNITY Catalog.
default_cluster_id is supplied in the CREATE DATALAKE statement, but if TBLPROPERTIES/PROPERTIES specifies databricks.cluster.id, that value takes precedence.
For example, databricks.cluster.id = '0131-200319-tlmvupk8'.
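Putting the validated Delta Lake properties together, a Unity-catalog CREATE TABLE sketch might look like the following. The cluster ID is the placeholder value from the example above, and the table and database names are hypothetical.

```sql
CREATE TABLE delta_unity_write.MyDatabase.tbl (
    i INT
)
TBLPROPERTIES (
    'delta.minReaderVersion'='1',
    'delta.minWriterVersion'='2',
    -- Overrides default_cluster_id from CREATE DATALAKE:
    'databricks.cluster.id'='0131-200319-tlmvupk8'
);
```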
- Incompatible TBLPROPERTIES/PROPERTIES values, if specified, are ignored and default values are used. For example, with 'write.format.default'='parquet' and 'write.avro.compression-codec'='snappy', the format is parquet and the default compression for parquet applies; the table property 'write.avro.compression-codec'='snappy' is ignored.