The Teradata DATASET data type is a new Complex Data Type (CDT) that represents self-describing files, which are interpreted according to a schema. This feature provides the following functionality to support the storage and processing of DATASET data in Teradata Database:
- A DATASET data type, stored in the Avro file format. The data type is designed to allow multiple storage formats, but in Release 16.00 only Avro is supported.
- Methods, functions, and procedures for processing, shredding, and publishing DATASET data.
- DATASET documents up to 16 MB in size.
The feature also provides enhanced dot notation. Dot notation was introduced for the JSON data type in Release 15.0, and is now extended to the DATASET data type. Dot notation includes the following syntax:
- Recursive descent operator (..)
- Wildcards (*), for both named and indexed items
- Name/index lists ([a,b,c] or [0,3,5])
- Name/index slices ([a:c] or [0:5])
- Simple name/index references
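The operators listed above can be illustrated outside the database. The following is a minimal Python sketch of what each operator selects from a nested document; it is not Teradata syntax or Teradata's implementation. The helper names (children, descend, pick, index_slice) and the sample document are hypothetical, and the slice helper assumes Python's half-open range, which may not match how the database treats the end of a slice. Name slices such as [a:c] are not modeled.

```python
from typing import Any, List, Sequence


def children(node: Any) -> List[Any]:
    """Wildcard (*): every direct child of an object (dict) or array (list)."""
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, list):
        return list(node)
    return []


def descend(node: Any, name: str) -> List[Any]:
    """Recursive descent (..name): every value stored under `name` at any depth."""
    found: List[Any] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name:
                found.append(value)
            found.extend(descend(value, name))
    elif isinstance(node, list):
        for item in node:
            found.extend(descend(item, name))
    return found


def pick(node: Any, keys: Sequence[Any]) -> List[Any]:
    """Name/index list ([a,b,c] or [0,3,5]): the listed children, in list order."""
    return [node[k] for k in keys]


def index_slice(node: list, start: int, stop: int) -> list:
    """Index slice ([0:5]). ASSUMPTION: half-open range, as in Python."""
    return node[start:stop]


# Hypothetical sample document, used only for illustration.
doc = {
    "order": {
        "id": 7,
        "items": [
            {"sku": "A1", "qty": 2},
            {"sku": "B2", "qty": 1},
            {"sku": "C3", "qty": 5},
        ],
    },
    "id": 99,
}

items = doc["order"]["items"]        # simple name references
all_ids = descend(doc, "id")         # recursive descent
first_item_fields = children(items[0])  # wildcard over named items
first_and_last = pick(items, [0, 2])    # index list
first_two = index_slice(items, 0, 2)    # index slice
```

Recursive descent is the only operator here that looks below the direct children of a node; the others select among the children of the node they are applied to.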
Benefits
The DATASET data type provides the following functionality for DATASET data:
- Variable length: Both the maximum length and the in-row length are variable.
- Variable format of data stored: DATASET supports a built-in storage format for Avro. The data type is designed to allow multiple storage formats, but for now, only Avro is supported.
- Variable schema for data stored in any format: Users may define schemas at the column-level or instance-level for any of the built-in storage formats of the DATASET type. Column-level schemas are binding for all instances of the data type loaded into that particular column, whereas instance-level schemas may vary from instance to instance.
- Methods: Methods to operate on the DATASET type in any storage format and with any schema.
- Functions: Functions to operate on the DATASET type in any storage format and with any schema.
- Publishing to DATASET data type: Use data stored in relational tables to compose a DATASET type with any storage format and any schema.
- DATASET shredding: Allows you to extract values from DATASET documents and store the extracted data in relational format.
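Publishing and shredding, as described in the list above, are inverse operations: one composes documents from relational rows, the other extracts document values back into rows. The following Python sketch shows the idea only. The function names, column names, and sample rows are hypothetical, and the sketch does not use or represent the Teradata methods, functions, or procedures that perform these operations.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[Any, ...]
Document = Dict[str, Any]


def publish(rows: Sequence[Row], columns: Sequence[str]) -> List[Document]:
    """Compose one document per relational row, keyed by column name."""
    return [dict(zip(columns, row)) for row in rows]


def shred(documents: Sequence[Document], columns: Sequence[str]) -> List[Row]:
    """Extract the named fields from each document into a flat row.

    A field missing from a document becomes None, standing in for SQL NULL.
    """
    return [tuple(doc.get(col) for col in columns) for doc in documents]


# Hypothetical relational data, used only for illustration.
columns = ["id", "name", "city"]
rows = [(1, "Ada", "London"), (2, "Grace", "New York")]

documents = publish(rows, columns)   # rows -> documents
round_trip = shred(documents, columns)  # documents -> rows
names_only = shred(documents, ["name"])  # extract a subset of fields
```

Shredding only the columns of interest, as in the last line, is the usual reason to extract values rather than store whole documents in relational form.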
Avro data stored as a DATASET data type transforms to and from SQL as a VARBYTE or BLOB.
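Avro data maps naturally to byte types because Avro's binary encoding carries no field names or type tags: the bytes can only be interpreted with the schema, which is itself a JSON document. The following Python sketch encodes one record according to the Avro binary encoding rules for long and string (zigzag variable-length integers, and length-prefixed UTF-8). The schema, record, and function names are hypothetical, only these two types are handled, and the sketch makes no claim about how Teradata lays out DATASET values internally.

```python
import json
from typing import Any, Dict

# Hypothetical Avro schema, used only for illustration. Avro schemas are JSON.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
""")


def encode_long(n: int) -> bytes:
    """Avro long: zigzag-encode, then emit 7 bits per byte, low bits first."""
    z = (n << 1) ^ (n >> 63)  # zigzag for values in the 64-bit signed range
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


ENCODERS = {"long": encode_long, "string": encode_string}


def encode_record(schema: Dict[str, Any], record: Dict[str, Any]) -> bytes:
    """Avro record: field values concatenated in schema order, no names."""
    return b"".join(
        ENCODERS[field["type"]](record[field["name"]])
        for field in schema["fields"]
    )


payload = encode_record(SCHEMA, {"id": 1, "name": "ab"})
```

The resulting payload is four bytes long. Without the schema there is no way to tell that the first byte is a long and the remaining three are a string, which is why a schema must accompany the data at either the column level or the instance level.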
SQL Changes
The following statements are new for the DATASET data type:
- CREATE <storage-format-name> SCHEMA
- DROP <storage-format-name> SCHEMA
- SHOW <storage-format-name> SCHEMA
- HELP <storage-format-name> SCHEMA
- SET SESSION DOT NOTATION
The following statements were modified for the DATASET data type:
- CREATE TABLE
- ALTER TABLE
- CREATE/REPLACE FUNCTION
- CREATE INDEX
- COLLECT STATISTICS
- HELP, SHOW, and TYPE commands
Additional Information
For more information, see Teradata Vantage™ DATASET Data Type, B035-1198.