Teradata Support for the DATASET Data Type

Teradata Vantageā„¢ DATASET Data Type

brand
Software
Teradata Vantage
prodname
Teradata Database
Teradata Vantage NewSQL Engine
vrm_release
16.20
category
Programming Reference
featnum
B035-1198-162K

The Teradata DATASET data type is a complex data type (CDT) representing self-describing files that are interpreted based on a schema. The feature provides the following functionality to support the storage and processing of DATASET data in the Teradata Database.

Function Description
Storage and processing
  • Store variable data formats. Avro and Comma Separated Value (CSV) formats are supported.
  • Specify the CDT variable maximum length or in-row length.
  • Define schemas at the column-level or instance-level for any of the built-in storage formats of the DATASET type. Column-level schemas are binding for all instances of the data type loaded into that particular column, while instance-level schemas may vary from instance to instance.
Methods, functions, and stored procedures Operate on the DATASET type, in any storage format and with any schema.
Shredding Extract values from DATASET documents and store the extracted data in a relational format.
Publishing Publish data stored in relational tables and compose a DATASET type with any storage format and any schema.
Analytics
  • Apply advanced analytics to DATASET data.
  • Collect statistics on extracted portions of the DATASET type.
SQL Use standard SQL to query DATASET data.
The feature also provides enhanced dot notation to allow easy access to data. Dot notation includes the following syntax for both DATASET and JSON:
  • Recursive descent operator (..)
  • Wildcards (*), both in reference to named and indexed items
  • Name/index lists ([a,b,c] or [0,3,5])
  • Name/index slices ([c] or [5])

Client Support for the DATASET Data Type

Client Product DATASET Support Provided
CLI Full native DBS support.
ODBC
  • The ODBC specification does not have a unique data type code for DATASET. Therefore, the ODBC driver maps the DATASET data type to SQL_LONGVARCHAR or SQL_WLONGVARCHAR, which are the ODBC CLOB data types. The metadata differentiates between a Teradata CLOB data type mapped to SQL_LONGVARCHAR and a Teradata DATASET data type mapped to SQL_LONGVARCHAR.
  • The ODBC driver supports LOB Input, Output and InputOutput parameters and can load DATASET data. Catalog (Data Dictionary) functions also support DATASET.
JDBC
  • Teradata JDBC Driver 15.10.00.23 and later support the DATASET data type.
  • The Teradata JDBC Driver offers functionality for an application to use the PreparedStatement or CallableStatement setObject method to bind a Struct value to a question-mark parameter marker as a DATASET data type. An application can also insert VARBYTE or BLOB values into DATASET destination columns.
  • When an application uses the Teradata-specific functionality of specifying a DATASET value as a Struct value, the Struct value must contain one of the following attributes: Byte Array, InputStream, BLOB, or null. If the Struct contains an InputStream attribute, the Struct must also contain a second attribute that is an Integer type specifying the number of bytes in the stream.
  • DATASET values are retrieved from the Teradata Database as BLOB values. An application can use result set metadata or parameter metadata to distinguish a BLOB value from a DATASET value.
.NET Data Provider
  • The DATASET data type is externalized as a BLOB or VARBYTE. Applications can use TdBlob or TdDataReader.GetBytes to retrieve a DATASET value.
  • Applications can send a DATASET value as BYTE[] to the Teradata Database.
  • Schema Collections (Data Dictionary) also support the DATASET data type.
Teradata Parallel Transporter (TPT) DATASET columns are similar to CLOB columns and subject to the same limitations. DATASET columns cannot exceed 16 MB (16,776,192 LATIN characters or 8,388,096 UNICODE characters). When loading or exporting DATASET columns, TPT users should specify CLOB or VARCHAR in the TPT schema definition.
BTEQ The DATASET keyword cannot be used in the USING data statement; therefore, DATASET values must be referred to as either BLOB or VARBYTE.
Standalone Utilities No support.

Terminology

Data content and formats constantly evolve, creating different file types. Some file types are proprietary or specific to particular industries or applications, while others have a more general use.

Some applications use particular self-describing file formats. There is no one best solution; using different data types allows for more flexibility. Avro and CSV formats are examples of self-describing data; given the schema, a set of bytes are interpreted as a set of items described in that schema. The schema is provided with the data, which makes the data self-describing so various applications can understand it.

Regardless of format, purpose, content, or frequency of use, a large amount of self-describing data is analyzed. The Teradata Database stores and operates on data in its native format using dot notation.