The CREATE TABLE statement supports the column-level attributes of the DATASET data type.
You can define DATASET table columns with column-level schemas or instance-level schemas.
Column-level Schemas
Usually, all DATASET values within a table column adhere to the same schema. In these cases, it makes sense to associate a schema with the table column. You can specify the schema using the CREATE storage_format SCHEMA statement, and you can associate the DATASET column with this schema using the WITH SCHEMA clause in the CREATE or ALTER TABLE statement.
- The schema is stored once within the data dictionary. It is not stored with each DATASET value in the row. This can result in significant storage space savings and some minor performance savings due to less I/O.
- In many cases, the schema needs to be parsed only once for the DATASET column during a query execution. This can result in significant performance gains since schema parsing can be costly.
CREATE AVRO SCHEMA avro_schema_1 as '{"type":"record", "name":"rec_0","fields":[ {"name":"ProductID","type":"int"}, {"name":"Price","type":"int"} ]}';
CREATE TABLE mytable(c1 INTEGER, c2 DATASET STORAGE FORMAT AVRO WITH SCHEMA avro_schema_1);
CREATE CSV SCHEMA myCSVSchema AS '{ "field_delimiter" : "\t", "record_delimiter" : ";" }';
CREATE TABLE myDatasetTable02 ( id INTEGER, csvFile DATASET(100000) INLINE LENGTH 5000 STORAGE FORMAT CSV CHARACTER SET LATIN WITH SCHEMA myCSVSchema );
Instance-level Schemas
If you omit the WITH SCHEMA clause in the CREATE or ALTER TABLE statement, the DATASET value is defined with an instance-level schema. The schema and data are stored in the database row. This allows a DATASET table column to contain data values with different schemas. This flexibility comes at the cost of increased storage usage and reduced performance due to schema parsing.
CreateDATASET ( '{"field_delimiter":"&", "record_delimiter":"#" }', 'Item ID&Item Name&Item Color #55&bicycle&red#88&toy boat&pink#105&soap&#', CSV, UNICODE )
Disabling DATASET Validation
When DATASET values are inserted into DATASET table columns, the schema and the data value are validated, which can result in reduced performance. Even if the DATASET column is defined with a column-level schema, the data must be validated against the schema during insertion. You can disable this validation by setting the DBS Control field DisableDATASETValidation to TRUE.