17.05 - Using Custom Clauses with DATASET_PUBLISH - Teradata Database

Teradata Vantageā„¢ - DATASET Data Type

Advanced SQL Engine
Teradata Database
June 2020
Programming Reference
The DATASET_PUBLISH table operator uses custom clauses to compose DATASET data type instances in your chosen format:
  • Specify SCHEMA to explicitly define the published data structure. Use SCHEMA with a character string representing an ad-hoc schema specification; any other value results in an error. When specifying an ad-hoc schema, the structure must conform to the output storage format rules. If this clause is not specified, the schema is automatically generated by the database based on the data used to compose the DATASET data type instances.
  • Specify DO_AGGREGATE for output instances composed of one row of output data. The DO_AGGREGATE clause accepts either Y or N, where Y signifies that the result is aggregated (default) and N is not aggregated. The values are not case-sensitive. If DO_AGGREGATE is excluded, the table operator aggregates all data corresponding to a particular group (as defined in the optional GROUP BY clause in the SELECT statement of the ON clause) into one DATASET data type instance.
  • Specify UNIQUE_NAMES to generate UNIQUE record or fixed type names when constructing the output schema. This is helpful when nesting DATASET or JSON data within the final result. The UNIQUE_NAMES clause accepts either Y or N (not case-sensitive) where Y signifies each auto-generated name is unique, and N is not unique. The default is N. "UNIQUE_NAMES" is ignored if the storage format is CSV.
  • Specify INCLUDE_ HEADER only for DATASET values in CSV storage format. INCLUDE_ HEADER accepts either Y or N, (not case-sensitive) where Y signifies each returned DATASET value includes a header record as the first record in the file. N signifies omitting the header record. The default is Y, which includes the header record.

If SCHEMA supplies the CSV schema, with field_names specified and a non-null value, the DATASET value includes a header that uses the SCHEMA-supplied field names. The field names are then removed from the schema. If INCLUDE_HEADER is specified as N, then no header record is written; instead the field names are referenced from the schema.

If a CSV schema is not supplied by SCHEMA, and INCLUDE_HEADER is Y, then the table column names are written as the header record to the CSV file. If INCLUDE_HEADER is N, field_names is null in the schema.

If SCHEMA provides a CSV schema and field_names is null, with INCLUDE_HEADER as Y, an error occurs because null is an invalid header record. If INCLUDE_HEADER is N, no action is taken.

The following table summarizes all combinations of the SCHEMA and INCLUDE_HEADER clauses.

Schema Include Header Action
No Schema Y Use table column names as header record
N Create schema with "field_names":null
Schema with Field Names Y
  • Use field names as header
  • Remove field names from schema
  • No action
  • Refers to the schema for field names

Schema with "field_names":null OR


Y Error
  • No action
  • No header record or field names are assumed

Additionally, use the RETURNS clause to specify the output data type or length. The default is to publish to AVRO of maximum length, so omit RETURNS if for that desired output. For CSV storage format output, specify RETURNS. The return type must be a DATASET STORAGE FORMAT CSV data type.

If the result set is inserted into table, the schema is stored in every DATASET result value unless the DATASET table column has a schema associated with it from the WITH SCHEMA clause in the CREATE TABLE statement. In that case, the schema is not stored with every value.