Creating and Altering Tables to Store DATASET Data - Teradata Vantage NewSQL Engine - 16.20

Teradata Vantageā„¢ DATASET Data Type

prodname
Teradata Database
Teradata Vantage NewSQL Engine
vrm_release
16.20
category
Programming Reference
featnum
B035-1198-162K

The CREATE TABLE statement supports the column-level attributes of the DATASET data type.

You can define DATASET table columns with column-level schemas or instance-level schemas.

Column-level Schemas

Usually, all DATASET values within a table column adhere to the same schema. In these cases, it makes sense to associate a schema with the table column. You can specify the schema using the CREATE storage_format SCHEMA statement, and you can associate the DATASET column with this schema using the WITH SCHEMA clause in the CREATE or ALTER TABLE statement.

There are storage and performance advantages when you use a column-level schema:
  • The schema is stored once within the data dictionary. It is not stored with each DATASET value in the row. This can result in significant storage space savings and some minor performance savings due to less I/O.
  • In many cases, the schema needs to be parsed only once for the DATASET column during a query execution. This can result in significant performance gains since schema parsing can be costly.
The following example shows a DATASET column in the Avro storage format with a column-level schema:
CREATE AVRO SCHEMA avro_schema_1 as '{"type":"record",
    "name":"rec_0","fields":[
        {"name":"ProductID","type":"int"},
        {"name":"Price","type":"int"}
]}';
CREATE TABLE mytable(c1 INTEGER, c2 DATASET STORAGE FORMAT AVRO WITH SCHEMA avro_schema_1);
The following example shows a DATASET column in the CSV storage format with a column-level schema:
CREATE CSV SCHEMA myCSVSchema AS
'{
"field_delimiter" : "\t",
"record_delimiter" : ";"
}';
CREATE TABLE myDatasetTable02
(
id INTEGER,
csvFile DATASET(100000) INLINE LENGTH 5000 STORAGE FORMAT CSV CHARACTER SET LATIN WITH SCHEMA myCSVSchema
);

Instance-level Schemas

If you omit the WITH SCHEMA clause in the CREATE or ALTER TABLE statement, the DATASET value is defined with an instance-level schema. In this case, both the schema and data are stored within the database row. This allows a DATASET table column to contain data values with different schemas. This flexibility comes at the cost of increased storage usage and reduced performance due to schema parsing.

The following shows an instance-level schema for a DATASET value with the CSV storage format:
CreateDATASET
(
'{"field_delimiter":"&", "record_delimiter":"#" }', 
'Item ID&Item Name&Item Color #55&bicycle&red#88&toy boat&pink#105&soap&#', 
CSV, 
UNICODE
)

Disabling DATASET Validation

When DATASET values are inserted into DATASET table columns, the schema and the data value are validated, which can result in reduced performance. Even if the DATASET column is defined with a column-level schema, the data must be validated against the schema during insertion. You can disable this validation by setting the DBS Control field DisableDATASETValidation to TRUE.