The Teradata DATASET data type is a complex data type (CDT) representing self-describing files that are interpreted based on a schema. The feature provides the following functionality to support the storage and processing of DATASET data in the database.
Function | Description |
---|---|
Storage and processing | |
Methods, functions, and stored procedures | Operate on the DATASET type, in any storage format and with any schema. |
Shredding | Extract values from DATASET documents and store the extracted data in a relational format. |
Publishing | Publish data stored in relational tables and compose a DATASET type with any storage format and any schema. |
Analytics | |
SQL | Use standard SQL to query DATASET data. |
- Recursive descent operator (..)
- Wildcards (*), both in reference to named and indexed items
- Name/index lists ([a,b,c] or [0,3,5])
- Name/index slices ([c] or [5])
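The operators listed above can be illustrated outside the database. The following is a minimal Python sketch over a nested dictionary, not Teradata's implementation: the helper names (`wildcard`, `select`, `descend`) and the sample document are hypothetical and exist only to show what each operator conceptually selects.

```python
from typing import Any, Iterator, List, Union


def children(node: Any) -> Iterator[Any]:
    """Yield the direct children of a dict (its values) or a list (its elements)."""
    if isinstance(node, dict):
        yield from node.values()
    elif isinstance(node, list):
        yield from node


def wildcard(node: Any) -> List[Any]:
    """Wildcard (*): every named or indexed child of the node."""
    return list(children(node))


def select(node: Any, keys: List[Union[str, int]]) -> List[Any]:
    """Name/index list ([a,b,c] or [0,3,5]): the listed children that exist."""
    if isinstance(node, dict):
        return [node[k] for k in keys if k in node]
    if isinstance(node, list):
        return [node[i] for i in keys if isinstance(i, int) and 0 <= i < len(node)]
    return []


def descend(node: Any, name: str) -> List[Any]:
    """Recursive descent (..name): values stored under `name` at any depth."""
    found = []
    if isinstance(node, dict) and name in node:
        found.append(node[name])
    for child in children(node):
        found.extend(descend(child, name))
    return found


# Hypothetical sample document used only for illustration.
doc = {
    "order": {
        "id": 7,
        "items": [
            {"id": 1, "sku": "A"},
            {"id": 2, "sku": "B"},
            {"id": 3, "sku": "C"},
        ],
    }
}

all_ids = descend(doc, "id")                         # every "id" at any depth
first_item = wildcard(doc["order"]["items"][0])      # all fields of one item
picked = select(doc["order"]["items"], [0, 2])       # items at indexes 0 and 2
```

In this sketch, recursive descent visits a node before its children, so matches come back in document order.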
Client Support for the DATASET Data Type
Client Product | DATASET Support Provided |
---|---|
CLI | Full native DBS support. |
ODBC | |
JDBC | |
.NET Data Provider | |
Teradata Parallel Transporter (TPT) | DATASET columns are similar to CLOB columns and subject to the same limitations. DATASET columns cannot exceed 16 MB (16,776,192 LATIN characters or 8,388,096 UNICODE characters). When loading or exporting DATASET columns, TPT users should specify CLOB or VARCHAR in the TPT schema definition. |
BTEQ | The DATASET keyword cannot be used in the USING data statement; therefore, DATASET values must be referred to as either BLOB or VARBYTE. |
Standalone Utilities | No support. |
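The two character counts quoted in the TPT row are consistent with a single byte limit. The short Python sketch below checks that arithmetic, assuming LATIN data takes 1 byte per character and UNICODE data takes 2 bytes per character; the function names are hypothetical and not part of any Teradata API.

```python
# Byte limit implied by the TPT row (16,776,192 LATIN characters).
MAX_DATASET_BYTES = 16_776_192

# Assumed storage widths: 1 byte per LATIN character, 2 per UNICODE character.
BYTES_PER_CHAR = {"LATIN": 1, "UNICODE": 2}


def max_chars(charset: str) -> int:
    """Largest character count that fits under the byte limit for a charset."""
    return MAX_DATASET_BYTES // BYTES_PER_CHAR[charset]


def fits(char_count: int, charset: str) -> bool:
    """True if a value with `char_count` characters stays within the limit."""
    return char_count <= max_chars(charset)
```

Under these assumptions, `max_chars("UNICODE")` evaluates to 8,388,096, which matches the figure in the table, and the byte limit itself is 1,024 bytes short of a full 16 MiB (16,777,216 bytes).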
Terminology
Data content and formats constantly evolve, creating different file types. Some file types are proprietary or specific to particular industries or applications, while others have a more general use.
Some applications use particular self-describing file formats. No single format is best for every case; supporting different data types allows for more flexibility. Avro and CSV are examples of self-describing formats: given the schema, a set of bytes is interpreted as the set of items described in that schema. Because the schema is provided with the data, the data is self-describing and various applications can understand it.
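As an illustration of data that carries its own schema, the Python sketch below treats the header line of a CSV payload as the schema and uses it to interpret the remaining lines as named items. The payload and the `interpret` function are hypothetical; this uses Python's standard `csv` module and says nothing about how the database stores or parses DATASET values.

```python
import csv
import io
from typing import Dict, List

# Hypothetical CSV payload: the first line names the fields (the schema),
# and every following line is a record described by that schema.
payload = "id,name,city\n1,Ada,London\n2,Linus,Helsinki\n"


def interpret(text: str) -> List[Dict[str, str]]:
    """Read the header as the schema and map each record to field names."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


rows = interpret(payload)
```

Any consumer that reads the header can recover the field names without outside knowledge, which is the property the paragraph above describes. Note that CSV values arrive as strings; typed formats such as Avro carry type information in the schema as well.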
Regardless of format, purpose, content, or frequency of use, a large amount of self-describing data is analyzed. Teradata Database stores this data in its native format and operates on it using dot notation.