Teradata Support for the DATASET Data Type

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

The Teradata DATASET data type is a complex data type (CDT) representing self-describing files that are interpreted based on a schema. The feature provides the following functionality to support the storage and processing of DATASET data in the database.

Function: Description
Storage and processing:
  • Store variable data formats. Avro and comma-separated values (CSV) formats are supported.
    The CSV and AVRO DATASET data types are supported only on the Block File System on the primary cluster. They are not available for the Object File System.
  • Specify the variable maximum length or the in-row length of the CDT.
  • Define schemas at the column level or instance level for any of the built-in storage formats of the DATASET type. Column-level schemas are binding for all instances of the data type loaded into that column, while instance-level schemas may vary from instance to instance.
Methods, functions, and stored procedures:
  • Operate on the DATASET type, in any storage format and with any schema.
Shredding:
  • Extract values from DATASET documents and store the extracted data in a relational format.
Publishing:
  • Publish data stored in relational tables and compose a DATASET type with any storage format and any schema.
Analytics:
  • Apply advanced analytics to DATASET data.
  • Collect statistics on extracted portions of the DATASET type.
SQL:
  • Use standard SQL to query DATASET data.

The feature also provides enhanced dot notation to allow easy access to data. Dot notation includes the following syntax for both DATASET and JSON:
  • Recursive descent operator (..)
  • Wildcards (*), both in reference to named and indexed items
  • Name/index lists ([a,b,c] or [0,3,5])
  • Name/index slices ([a:c] or [0:5])
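
To make the dot notation operators above more concrete, the following Python sketch emulates three of them (recursive descent, wildcard, and name/index lists) on an ordinary nested dictionary. It is only an illustration of the selection semantics, not the database's implementation or syntax; the function names recursive_descent, wildcard, and pick and the sample document are hypothetical. Slices are left out because their exact bounds semantics are not stated here.

```python
from typing import Any, Iterator, List


def recursive_descent(node: Any, name: str) -> Iterator[Any]:
    """Yield every value stored under `name` at any depth (emulates `..name`)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name:
                yield value
            yield from recursive_descent(value, name)
    elif isinstance(node, list):
        for item in node:
            yield from recursive_descent(item, name)


def wildcard(node: Any) -> List[Any]:
    """Return all children of a node (emulates `*` on named or indexed items)."""
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, list):
        return list(node)
    return []


def pick(node: Any, selectors: list) -> List[Any]:
    """Return children chosen by a name list or an index list (emulates [a,b,c] / [0,3,5])."""
    if isinstance(node, dict):
        return [node[s] for s in selectors if s in node]
    if isinstance(node, list):
        return [node[i] for i in selectors if isinstance(i, int) and 0 <= i < len(node)]
    return []


# Hypothetical sample document, used only for illustration.
doc = {
    "customer": {"name": "Ann", "city": "Oslo"},
    "orders": [
        {"id": 1, "name": "widget"},
        {"id": 2, "name": "gadget"},
        {"id": 3, "name": "gizmo"},
    ],
}

names = list(recursive_descent(doc, "name"))               # every "name" at any depth
all_orders = wildcard(doc["orders"])                       # every indexed item
some_orders = pick(doc["orders"], [0, 2])                  # index list
customer_fields = pick(doc["customer"], ["name", "city"])  # name list
```

In this emulation, recursive descent collects the customer name and all three order names in document order, while the list selectors return only the requested entries.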

Client Support for the DATASET Data Type

Client Product: DATASET Support Provided
CLI:
  • Full native DBS support.
ODBC:
  • The ODBC specification does not have a unique data type code for DATASET. Therefore, the ODBC driver maps the DATASET data type to SQL_LONGVARCHAR or SQL_WLONGVARCHAR, which are the ODBC CLOB data types. The metadata differentiates between a Teradata CLOB data type mapped to SQL_LONGVARCHAR and a Teradata DATASET data type mapped to SQL_LONGVARCHAR.
  • The ODBC driver supports LOB Input, Output, and InputOutput parameters and can load DATASET data. Catalog (Data Dictionary) functions also support DATASET.
JDBC:
  • Teradata JDBC Driver 15.10.00.23 and later support the DATASET data type.
  • An application can use the PreparedStatement or CallableStatement setObject method to bind a Struct value to a question-mark parameter marker as a DATASET data type. An application can also insert VARBYTE or BLOB values into DATASET destination columns.
  • When an application uses the Teradata-specific functionality of specifying a DATASET value as a Struct value, the Struct value must contain one of the following attributes: Byte Array, InputStream, BLOB, or null. If the Struct contains an InputStream attribute, the Struct must also contain a second attribute of type Integer that specifies the number of bytes in the stream.
  • DATASET values are retrieved from the database as BLOB values. An application can use result set metadata or parameter metadata to distinguish a BLOB value from a DATASET value.
.NET Data Provider:
  • The DATASET data type is externalized as a BLOB or VARBYTE. Applications can use TdBlob or TdDataReader.GetBytes to retrieve a DATASET value.
  • Applications can send a DATASET value as BYTE[] to the database.
  • Schema Collections (Data Dictionary) also support the DATASET data type.
Teradata Parallel Transporter (TPT):
  • DATASET columns are similar to CLOB columns and subject to the same limitations. DATASET columns cannot exceed 16 MB (16,776,192 LATIN characters or 8,388,096 UNICODE characters). When loading or exporting DATASET columns, TPT users must specify CLOB or VARCHAR in the TPT schema definition.
BTEQ:
  • The DATASET keyword cannot be used in the USING data statement; therefore, DATASET values must be specified as either BLOB or VARBYTE.
Standalone Utilities:
  • No support.
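
The TPT size limits quoted above can be checked before a load. The following Python sketch is a hypothetical pre-load check, not part of TPT or any Teradata client; the constant names and the fits_dataset_limit function are assumptions made for illustration. The UNICODE limit is exactly half the LATIN limit, which is consistent with one byte per LATIN character and two bytes per UNICODE character.

```python
# Limits taken from the TPT entry above; the names are hypothetical.
MAX_LATIN_CHARS = 16_776_192
MAX_UNICODE_CHARS = 8_388_096


def fits_dataset_limit(text: str, charset: str = "LATIN") -> bool:
    """Return True if `text` is within the character limit for `charset`.

    Simplification: len() counts Python code points, which may differ from
    the database's character count for text outside the Basic Multilingual Plane.
    """
    limits = {"LATIN": MAX_LATIN_CHARS, "UNICODE": MAX_UNICODE_CHARS}
    key = charset.upper()
    if key not in limits:
        raise ValueError(f"unsupported character set: {charset}")
    return len(text) <= limits[key]


# A value one character over the UNICODE limit still fits the LATIN limit.
oversized_for_unicode = "a" * (MAX_UNICODE_CHARS + 1)
fits_as_latin = fits_dataset_limit(oversized_for_unicode, "LATIN")
fits_as_unicode = fits_dataset_limit(oversized_for_unicode, "UNICODE")
```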

Terminology

Data content and formats constantly evolve, creating different file types. File types may be proprietary or specific to industries or applications.

Applications may use self-describing file formats. No single format is best for every case; supporting different data types allows for more flexibility. Avro and CSV are examples of self-describing formats: given the schema, a set of bytes is interpreted as the set of items described in that schema. The schema is provided with the data, which makes the data self-describing so that applications can understand it.
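
As a rough illustration of interpreting bytes against a schema, the Python sketch below decodes a small CSV payload into typed records using only the standard library. The schema representation (a mapping from field name to a Python type) and the decode_csv function are hypothetical and are not the schema format the database uses; the sample payload is invented for the example.

```python
import csv
import io


def decode_csv(payload: bytes, schema: dict) -> list:
    """Interpret raw CSV bytes as typed records described by `schema`.

    `schema` maps each field name to a callable that converts the text
    value to its intended type (a hypothetical schema representation).
    """
    reader = csv.DictReader(io.StringIO(payload.decode("utf-8")))
    records = []
    for row in reader:
        records.append({field: cast(row[field]) for field, cast in schema.items()})
    return records


# Hypothetical schema and payload; the header row names the fields.
schema = {"id": int, "item": str, "price": float}
payload = b"id,item,price\n1,widget,9.5\n2,gadget,12.0\n"

records = decode_csv(payload, schema)
```

Without the schema, the payload is only a sequence of bytes; with it, each row becomes a record whose fields have known names and types.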

Regardless of format, purpose, content, or frequency of use, a large amount of self-describing data is analyzed. The database stores and operates on data in its native format using dot notation.