DATASET_PUBLISH Usage Notes - Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

DATASET_PUBLISH composes a DATASET data type instance from different data sources (anything referenced in an SQL statement). DATASET_PUBLISH publishes DATASET data types of any storage format, exporting stored data to an externally recognizable file format. DATASET_PUBLISH returns one output column, "data," which contains the data composed as a result of this operation.

The formats vary:

Data composed as Avro is returned as an instance of the DATASET type with storage format AVRO. When a DATASET value stored as AVRO is used to create an Avro instance with DATASET_PUBLISH, its schema is combined with the output schema. By default, DATASET_PUBLISH publishes DATASET values in the AVRO storage format.
Data composed as CSV is returned as an instance of the DATASET type with storage format CSV. The CSV value contains a header line with the CSV column names, followed by the data. If the specified schema has a null field_names key, no header is included with the CSV value. Records and fields are separated using the row and column delimiters. All Teradata complex data types (including DATASET) used to create a CSV instance with DATASET_PUBLISH are converted to their text representation and treated as one field in the CSV instance. To include a schema in the DATASET type, use the SCHEMA clause.
If the DATASET value has a schema and is inserted into a table column with a column-based schema defined, the schema is removed from the value, and the CSV data is validated against the column-based schema.
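
The CSV rules above can be illustrated outside the database. The following Python sketch is not Teradata's implementation; it only mimics the described behavior. The function name compose_csv, the record_delimiter and field_delimiter keys, and the use of JSON text for complex values are assumptions made for illustration. Only the field_names key and its null behavior come from the description above.

```python
import csv
import io
import json


def compose_csv(rows, schema):
    """Render rows as CSV text, following the rules described above.

    `schema` is a hypothetical dict; only "field_names" is taken from the
    documentation. The delimiter keys are assumed names.
    """
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=schema.get("field_delimiter", ","),
        lineterminator=schema.get("record_delimiter", "\n"),
    )

    # A null (None) field_names key means no header line is written.
    names = schema.get("field_names")
    if names is not None:
        writer.writerow(names)

    for row in rows:
        # Complex values are converted to text and kept as a single field;
        # the csv module quotes the text so embedded delimiters do not split it.
        writer.writerow(
            [json.dumps(v) if isinstance(v, (dict, list)) else v for v in row]
        )
    return buf.getvalue()


rows = [(1, "x", {"k": [1, 2]})]
with_header = compose_csv(rows, {"field_names": ["id", "name", "payload"]})
no_header = compose_csv(rows, {"field_names": None})
```

Reading with_header back through csv.reader yields a header row plus one three-field data row, with the complex value preserved as one text field. no_header contains only the data row.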

JSON data and DATASET data are supported as input.
JSON data published to a DATASET type is converted to the desired storage format and then combined with the output.
Any other Teradata complex data types used to create an AVRO or CSV instance by using DATASET_PUBLISH are converted to text representation and treated as one field in the resulting instance. Distinct and structured UDTs are not supported.

When DATASET_PUBLISH composes an Avro instance from types that are nullable by definition, the generated schema marks each such type as nullable by using a union of null and the corresponding data type. The schema type for such a field is therefore "type":["null","int"] instead of "type":"int".

Null is guaranteed to be the first element in the union for the auto-generated Avro schema.
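
As a minimal sketch of the schema shape described above, the following Python builds an Avro record schema in which nullable fields use a null-first union. It does not reproduce how DATASET_PUBLISH generates schemas internally. The helper names, the record name rec_0, and the column names are hypothetical.

```python
import json


def avro_field(name, avro_type, nullable):
    """Return an Avro field definition.

    Nullable fields get a union with "null" as the first element,
    matching the shape described above.
    """
    field_type = ["null", avro_type] if nullable else avro_type
    return {"name": name, "type": field_type}


def avro_record_schema(record_name, columns):
    """Build a record schema from (name, avro_type, nullable) tuples."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [avro_field(n, t, nullable) for n, t, nullable in columns],
    }


# Hypothetical columns: "id" is NOT NULL, "qty" is nullable.
schema = avro_record_schema("rec_0", [("id", "int", False), ("qty", "int", True)])
print(json.dumps(schema["fields"][1]))
# prints {"name": "qty", "type": ["null", "int"]}
```

A consumer that reads the published Avro value should expect the union form for any nullable source column and handle the null branch explicitly.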