DATASET_PUBLISH composes a DATASET data type instance from different data sources (anything that can be referenced in an SQL statement). DATASET_PUBLISH publishes DATASET data types of any storage format, exporting stored data to an externally recognizable file format. DATASET_PUBLISH returns one output column, "data," which contains the data composed by the operation.
- Data composed as Avro is returned as an instance of the DATASET type with storage format AVRO. When a DATASET value stored as AVRO is used to create an Avro instance with DATASET_PUBLISH, its schema is combined with the output schema. The default is to publish DATASET values in the AVRO storage format.
- Data composed as CSV is returned as an instance of the DATASET type with storage format CSV. The CSV value contains a header line with the CSV column names, followed by the data. If the specified schema has a null field_names key, no header is included with the CSV value. All data is separated using the row and column delimiters. Any Teradata complex data type (including DATASET) used to create a CSV instance with DATASET_PUBLISH is converted to its text representation and treated as one field in the CSV instance. To include a schema in the DATASET type, use the SCHEMA clause.
If the DATASET value has a schema and is inserted into a table column with a column-based schema defined, the schema is removed from the value, and the CSV data is validated against the column-based schema.
JSON data and DATASET data are not supported as input.
- JSON data published to a DATASET type is converted to the desired storage format and then combined with the output.
- Any other Teradata complex data types used to create an AVRO or CSV instance by using DATASET_PUBLISH are converted to text representation and treated as one field in the resulting instance. Distinct and structured UDTs are not supported.
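The CSV rules above (an optional header line, delimiter-separated data, and complex values collapsed into a single text field) can be illustrated with a small Python sketch. This is not Teradata's implementation: the function name compose_csv and its parameters are hypothetical, the default delimiters are assumptions, and quoting or escaping of delimiters inside values is deliberately left out.

```python
def compose_csv(rows, field_names, field_delimiter=",", record_delimiter="\n"):
    """Illustrative only: build CSV text the way the rules above describe.

    rows            -- iterable of tuples, one tuple per output record
    field_names     -- list of column names, or None to omit the header line
    field_delimiter -- assumed column delimiter
    record_delimiter -- assumed row delimiter
    """
    records = []
    # A null (None) field_names means no header line is written.
    if field_names is not None:
        records.append(field_delimiter.join(field_names))
    for row in rows:
        # Every value, including a complex one, is reduced to its text
        # representation and occupies exactly one field.
        records.append(field_delimiter.join(str(value) for value in row))
    return record_delimiter.join(records)


with_header = compose_csv([(1, "pen"), (2, "ink")], ["id", "item"])
without_header = compose_csv([(1, "pen"), (2, "ink")], None)
```

Here with_header starts with the line id,item, while without_header starts directly with the first data record.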
When DATASET_PUBLISH is used to compose an Avro instance and the published types are nullable by definition, the schema marks each such type as nullable by using a union of null and the corresponding data type. Therefore, the schema type for such a field looks like "type":["null","int"] instead of "type":"int".
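As a rough illustration of the nullable-union convention, the following Python sketch builds Avro field definitions by hand. It only mirrors the schema shape described above and does not call Teradata; the helper names avro_field and record_schema, the record name rec_0, and the column names are hypothetical.

```python
import json


def avro_field(name, avro_type, nullable):
    """Return an Avro field definition as a dict.

    A nullable field is typed as the union ["null", avro_type];
    a non-nullable field keeps the plain type.
    """
    field_type = ["null", avro_type] if nullable else avro_type
    return {"name": name, "type": field_type}


def record_schema(record_name, columns):
    """Build an Avro record schema from (name, type, nullable) tuples."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [avro_field(n, t, nullable) for n, t, nullable in columns],
    }


# Hypothetical columns: "id" is NOT NULL, "qty" is nullable.
schema = record_schema("rec_0", [("id", "int", False), ("qty", "int", True)])
schema_json = json.dumps(schema, separators=(",", ":"))
```

In schema_json, the qty field carries "type":["null","int"], whereas the id field carries "type":"int".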