DATASET_PUBLISH Usage Notes - Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

DATASET_PUBLISH composes a DATASET data type instance from different data sources (anything referenced in an SQL statement). DATASET_PUBLISH publishes DATASET data types of any storage format, exporting stored data to an externally recognizable file format. DATASET_PUBLISH returns one output column, "data," which contains the data composed as a result of this operation.

The formats vary:
  • Data composed as Avro is returned as an instance of the DATASET type with storage format AVRO. When a DATASET value stored as AVRO is used to create an Avro instance with DATASET_PUBLISH, its schema is combined with the output schema. The default is to publish DATASET values in the AVRO storage format.
  • Data composed as CSV is returned as an instance of the DATASET type with storage format CSV. The CSV value contains a header line with the CSV column names, followed by the data. If the specified schema has a null field_names key, no header is included in the CSV value. All data is separated using the row and column delimiters. Any Teradata complex data type (including DATASET) used to create a CSV instance with DATASET_PUBLISH is converted to its text representation and treated as one field in the CSV instance. To include a schema in the DATASET type, use the SCHEMA clause.

    If the DATASET value has a schema and is inserted into a table column with a column-based schema defined, the schema is removed from the value, and the CSV data is validated against the column-based schema.

    JSON data and DATASET data are not supported as input.

  • JSON data published to a DATASET type is converted to the desired storage format and then combined with the output.
  • Any other Teradata complex data types used to create an AVRO or CSV instance by using DATASET_PUBLISH are converted to text representation and treated as one field in the resulting instance. Distinct and structured UDTs are not supported.
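
The CSV layout described above (optional header line, row and column delimiters, complex values flattened to a single text field) can be pictured with a small sketch. This is plain Python using the standard csv module, not Teradata code, and it makes no claim about how DATASET_PUBLISH is implemented. The function name compose_csv, the "|" column delimiter, the newline row delimiter, and the sample "(10,20)" text value are all assumptions chosen for illustration.

```python
import csv
import io


def compose_csv(field_names, rows, column_delimiter="|", row_delimiter="\n"):
    """Build a CSV text value from rows (hypothetical helper, illustration only).

    field_names: list of column names, or None to omit the header line
                 (mirrors the "null field_names key" case described above).
    rows:        iterable of tuples; every value is written as text, so a
                 complex value occupies exactly one field.
    """
    buf = io.StringIO()
    writer = csv.writer(
        buf, delimiter=column_delimiter, lineterminator=row_delimiter
    )
    if field_names is not None:
        writer.writerow(field_names)  # header line with the column names
    for row in rows:
        writer.writerow([str(value) for value in row])
    return buf.getvalue()


# A complex value is assumed to arrive already converted to text.
sample_rows = [(1, "(10,20)"), (2, "(30,40)")]

with_header = compose_csv(["id", "pt"], sample_rows)
without_header = compose_csv(None, sample_rows)
```

In this sketch, with_header starts with the line id|pt, while without_header starts directly with the first data row. The delimiters are parameters here only to echo the row and column delimiters mentioned above.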

When DATASET_PUBLISH is used to compose an Avro instance and the types being published are nullable by definition, the schema marks each such type as nullable by using a union of null and the corresponding data type. Therefore, the schema type for a field looks like "type":["null","int"] instead of "type":"int".

Null is guaranteed to be the first element in the union for the auto-generated Avro schema.
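
The nullable-union convention can be sketched in a few lines of plain Python. This only illustrates the shape of such a schema. It is not the schema generator Teradata uses, and the record name rec_0, the field names id and qty, and the helper functions are hypothetical.

```python
import json


def nullable(avro_type):
    """Wrap an Avro type in a union, with "null" as the first element."""
    return ["null", avro_type]


def build_record_schema(name, columns):
    """Build an Avro record schema as a Python dict.

    columns: list of (field_name, avro_type, is_nullable) tuples.
    """
    fields = []
    for field_name, avro_type, is_nullable in columns:
        field_type = nullable(avro_type) if is_nullable else avro_type
        fields.append({"name": field_name, "type": field_type})
    return {"type": "record", "name": name, "fields": fields}


# "id" is assumed NOT NULL; "qty" is assumed nullable.
schema = build_record_schema(
    "rec_0",
    [("id", "int", False), ("qty", "int", True)],
)
schema_json = json.dumps(schema)
```

Here the qty field carries the union ["null", "int"], with null first, while id keeps the plain "int" type. A consumer reading such a schema can detect nullable fields by checking whether the type is a list whose first element is "null".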