17.05 - Usage Notes - Teradata Database

Teradata Vantage™ - DATASET Data Type

Advanced SQL Engine
Teradata Database
June 2020
Programming Reference

DATASET_PUBLISH composes a DATASET data type instance from different data sources (anything that can be referenced in an SQL statement). It can publish DATASET values in any storage format, which lets you export data stored in Teradata in an externally recognizable file format. DATASET_PUBLISH returns a single output column, "data", which contains the instance composed by the operation.

The result depends on the storage format and on the input data type:
  • Data composed as Avro is returned as an instance of the DATASET type with storage format AVRO. When a DATASET value stored as AVRO is used to create an Avro instance with DATASET_PUBLISH, its schema is combined with the output schema. AVRO is the default storage format for published DATASET values.
  • Data composed as CSV is returned as an instance of the DATASET type with storage format CSV. The CSV value contains a header line with the CSV column names, followed by the data. If the field_names key in the specified schema is null, no header is included in the CSV value. All data is separated using the row and column delimiters. All Teradata complex data types (including DATASET) used to create a CSV instance with DATASET_PUBLISH are converted to their text representation and treated as one field in the CSV instance. To include a schema in the DATASET type, use the SCHEMA clause.

    If the DATASET value has a schema and is inserted into a table column with a column-based schema defined, the schema is removed from the value, and the CSV data is validated against the column-based schema.

    JSON data and DATASET data are not supported as input.

  • JSON data published to a DATASET type is converted to the desired storage format and then combined with the output.
  • Any other Teradata complex data types used to create an AVRO or CSV instance by using DATASET_PUBLISH are converted to text representation and treated as one field in the resulting instance. Note that distinct and structured UDTs are not supported.
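The CSV rules in the list above (optional header line, row and column delimiters, complex values flattened to a single text field) can be modeled with a short Python sketch. This is an illustration only, not Teradata code: the function name compose_csv, its parameters, and the use of JSON text as the "text representation" of a complex value are all assumptions made for the example. Only the field_names key comes from the description above.

```python
import csv
import io
import json


def compose_csv(rows, schema, column_delimiter=",", row_delimiter="\n"):
    """Hypothetical model of the CSV composition rules; not Teradata's implementation."""
    buf = io.StringIO()
    writer = csv.writer(
        buf, delimiter=column_delimiter, lineterminator=row_delimiter
    )

    # A null (None) field_names key means no header line is written.
    field_names = schema.get("field_names")
    if field_names is not None:
        writer.writerow(field_names)

    for row in rows:
        # Complex values (modeled here as dicts and lists) become one text
        # field; JSON text is an assumed stand-in for the text representation.
        writer.writerow(
            [json.dumps(v) if isinstance(v, (dict, list)) else str(v) for v in row]
        )
    return buf.getvalue()


with_header = compose_csv([[1, "a"]], {"field_names": ["id", "name"]})
without_header = compose_csv([[1, "a"]], {"field_names": None})
print(repr(with_header))     # 'id,name\n1,a\n'
print(repr(without_header))  # '1,a\n'
```

In this model, a complex value that contains the column delimiter is quoted by the CSV writer, so it still reads back as a single field.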

When DATASET_PUBLISH composes an Avro instance and a published type is nullable by definition, the generated schema marks the type as nullable by using a union of null and the corresponding data type. For example, the schema type for such a field is "type":["null","int"] instead of "type":"int".

In the auto-generated Avro schema, null is always the first element of the union.
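The nullable-union convention can be illustrated with a small Python sketch that builds an Avro record schema by hand. It is a model of the rule described above, not the schema generator used by DATASET_PUBLISH: the helper names, the record name rec_0, and the column list are hypothetical.

```python
import json


def avro_field(name, avro_type, nullable):
    """Return an Avro field definition; nullable fields get a null-first union."""
    if nullable:
        # "null" is placed first, matching the ordering described above.
        return {"name": name, "type": ["null", avro_type]}
    return {"name": name, "type": avro_type}


def build_record_schema(record_name, columns):
    """Build a record schema from (name, type, nullable) tuples (hypothetical helper)."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [avro_field(n, t, nullable) for n, t, nullable in columns],
    }


schema = build_record_schema(
    "rec_0",  # assumed record name, for illustration only
    [("id", "int", False), ("qty", "int", True)],
)
print(json.dumps(schema["fields"][0]))  # {"name": "id", "type": "int"}
print(json.dumps(schema["fields"][1]))  # {"name": "qty", "type": ["null", "int"]}
```

Consumers that read such a schema can treat any field whose type is a union beginning with "null" as optional.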