DATASET Data Type | Capabilities | VantageCloud Lake - Teradata Support for the DATASET Data Type

DATASET Data Type | Capabilities | VantageCloud Lake - Teradata Support for the DATASET Data Type - Teradata Vantage

Teradata® VantageCloud Lake

Deployment

VantageCloud

Edition

Lake

Product

Teradata Vantage

Published

January 2023

Language

English (United States)

Last Update

2024-04-03

dita:mapPath

phg1621910019905.ditamap

dita:ditavalPath

pny1626732985837.ditaval

dita:id

phg1621910019905

The Teradata DATASET data type is a complex data type (CDT) representing self-describing files that are interpreted based on a schema. The feature provides the following functionality to support the storage and processing of DATASET data in the database.

Function	Description
Storage and processing	Store variable data formats. Avro and Comma Separated Value (CSV) formats are supported. The CSV and AVRO DATASET data types are only supported on the Block File System on the primary cluster. They are not available for the Object File System. Specify the CDT variable maximum length or in-row length. Define schemas at the column-level or instance-level for any of the built-in storage formats of the DATASET type. Column-level schemas are binding for all instances of the data type loaded into that column, while instance-level schemas may vary from instance to instance.
Methods, functions, and stored procedures	Operate on the DATASET type, in any storage format and with any schema.
Shredding	Extract values from DATASET documents and store the extracted data in a relational format.
Publishing	Publish data stored in relational tables and compose a DATASET type with any storage format and any schema.
Analytics	Apply advanced analytics to DATASET data. Collect statistics on extracted portions of the DATASET type.
SQL	Use standard SQL to query DATASET data.

The feature also provides enhanced dot notation to allow easy access to data. Dot notation includes the following syntax for both DATASET and JSON:

Recursive descent operator (..)
Wildcards (*), both in reference to named and indexed items
Name/index lists ([a,b,c] or [0,3,5])
Name/index slices ([c] or [5])

Client Support for the DATASET Data Type

Client Product	DATASET Support Provided
CLI	Full native DBS support.
ODBC	The ODBC specification does not have a unique data type code for DATASET. Therefore, the ODBC driver maps the DATASET data type to SQL_LONGVARCHAR or SQL_WLONGVARCHAR, which are the ODBC CLOB data types. The metadata differentiates between a Teradata CLOB data type mapped to SQL_LONGVARCHAR and a Teradata DATASET data type mapped to SQL_LONGVARCHAR. The ODBC driver supports LOB Input, Output and InputOutput parameters and can load DATASET data. Catalog (Data Dictionary) functions also support DATASET.
JDBC	Teradata JDBC Driver 15.10.00.23 and later support the DATASET data type. The Teradata JDBC Driver offers functionality for an application to use the PreparedStatement or CallableStatement setObject method to bind a Struct value to a question-mark parameter marker as a DATASET data type. An application can also insert VARBYTE or BLOB values into DATASET destination columns. When an application uses the Teradata-specific functionality of specifying a DATASET value as a Struct value, the Struct value must contain one of the following attributes: Byte Array, InputStream, BLOB, or null. If the Struct contains an InputStream attribute, the Struct must also contain a second attribute that is an Integer type specifying the number of bytes in the stream. DATASET values are retrieved from the database as BLOB values. An application can use result set metadata or parameter metadata to distinguish a BLOB value from a DATASET value.
.NET Data Provider	The DATASET data type is externalized as a BLOB or VARBYTE. Applications can use TdBlob or TdDataReader.GetBytes to retrieve a DATASET value. Applications can send a DATASET value as BYTE[] to the database. Schema Collections (Data Dictionary) also support the DATASET data type.
Teradata Parallel Transporter (TPT)	DATASET columns are similar to CLOB columns and subject to the same limitations. DATASET columns cannot exceed 16 MB (16,776,192 LATIN characters or 8,388,096 UNICODE characters). When loading or exporting DATASET columns, TPT users must specify CLOB or VARCHAR in the TPT schema definition.
BTEQ	The DATASET keyword cannot be used in the USING data statement; therefore, DATASET values must be called either BLOB or VARBYTE.
Standalone Utilities	No support.

Terminology

Data content and formats constantly evolve, creating different file types. File types may be proprietary or specific to industries or applications.

Applications may use self-describing file formats. There is no one best solution; using different data types allows for more flexibility. Avro and CSV formats are examples of self-describing data; given the schema, a set of bytes are interpreted as a set of items described in that schema. The schema is provided with the data, which makes the data self-describing so applications can understand it.

Regardless of format, purpose, content, or frequency of use, a large amount of self-describing data is analyzed. The database stores and operates on data in its native format using dot notation.