Storage Formats - Advanced SQL Engine - Teradata Database

DATASET Data Type

Product: Advanced SQL Engine, Teradata Database
Release Number: 17.05, 17.00
Published: June 2020
Language: English (United States)
Last Update: 2021-01-23
Document ID: B035-1198
Lifecycle: previous
Product Category: Teradata Vantage™

Variable Storage Formats

Every use of the DATASET data type must specify a storage format. The STORAGE FORMAT syntax has been extended to support the DATASET data type, and Teradata Database provides built-in storage formats for it.

A storage format specification does not necessarily affect how the data is stored on disk; instead, it associates the data with a specific, well-known format.

Built-In Storage Formats

Teradata Database provides two built-in storage formats for the DATASET data type, Avro and CSV, which are based on the Apache Avro and CSV specifications. Each instance is associated with a schema that conforms to the relevant specification, except that for the CSV storage format the schema is always optional. The schema is interpreted either on a per-instance basis or at the column level.
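To make the optional CSV schema concrete, the following Python sketch parses a CSV value with or without a JSON options document. The option names (`field_delimiter`, `record_delimiter`, `columns`), the comma and newline defaults, and the rule that the first record is treated as a header when no column names are supplied are all hypothetical choices for illustration; they are not the keys or behavior defined by Teradata Database.

```python
import csv
import json
from typing import Dict, List, Optional

# Hypothetical option names and defaults, chosen for illustration only;
# they are not taken from the Teradata CSV schema definition.
DEFAULT_OPTIONS = {"field_delimiter": ",", "record_delimiter": "\n", "columns": None}


def parse_csv_dataset(value: str, schema_json: Optional[str] = None) -> List[Dict[str, str]]:
    """Parse a CSV value, applying options from an optional JSON schema document."""
    options = dict(DEFAULT_OPTIONS)
    if schema_json is not None:
        # The schema only overrides the options it mentions.
        options.update(json.loads(schema_json))

    # Split into records, dropping empty ones (e.g. after a trailing delimiter).
    # This simple split does not handle record delimiters inside quoted fields.
    records = [r for r in value.split(options["record_delimiter"]) if r]
    rows = [next(csv.reader([r], delimiter=options["field_delimiter"])) for r in records]
    if not rows:
        return []

    # Column names come from the schema if present; otherwise the first record
    # is treated as a header (an assumption of this sketch).
    if options["columns"] is not None:
        header, data = options["columns"], rows
    else:
        header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


if __name__ == "__main__":
    # No schema: defaults apply and the first record is the header.
    print(parse_csv_dataset("id,name\n1,Ann\n2,Bo\n"))
    # [{'id': '1', 'name': 'Ann'}, {'id': '2', 'name': 'Bo'}]

    # Schema overriding both delimiters and supplying the column names.
    schema = '{"field_delimiter": ";", "record_delimiter": "|", "columns": ["id", "name"]}'
    print(parse_csv_dataset("1;Ann|2;Bo", schema))
    # [{'id': '1', 'name': 'Ann'}, {'id': '2', 'name': 'Bo'}]
```

Because the schema is optional, the same function handles both the bare CSV value and the value with extended options; only the options the schema names are overridden.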

Storage Format Terminology

Schema
For the AVRO storage format, the schema is a JSON document that describes the format of the binary-encoded Avro value. It is specified as JSON text in UTF-8 encoded characters, using a VARBYTE or BLOB data type.

For CSV, the schema is a JSON document that describes extended CSV options, such as the field or record delimiter and the column names or header information. It can be specified in any supported JSON format. For instance-level DATASET values, it is stored in the same character set as the CSV data; for column-level DATASET values, it is stored in the Data Dictionary as UNICODE text encoded in UTF-8.

Binary-encoded Avro Value
The actual Avro data, encoded as described by the schema.

CSV Value
The CSV value, in the Latin or Unicode character set.

JSON-encoded Value
A JSON text representation of the data, as described by the schema.

Transform format or cast format
For the AVRO storage format, this is a null-terminated, UTF-8 encoded schema immediately followed by the binary-encoded value.

For CSV, the transform and cast format is the original CSV value. If a schema is specified for a CSV value, the schema is not included in the cast or transform output.
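The AVRO transform and cast layout (a null-terminated UTF-8 schema followed by the binary-encoded value) can be sketched in Python. This is a minimal illustration, assuming a record schema whose field types are plain names of a few primitive Avro types; it follows the Avro binary encoding rules (zig-zag varints for int and long, length-prefixed UTF-8 for string, record fields in declaration order) but is not Teradata code, and all function names are hypothetical. Serializing the schema as compact JSON is also an assumption; the description above only states that the schema is null-terminated and UTF-8 encoded.

```python
import json
from typing import Any, Dict, Tuple


def zigzag_varint(n: int) -> bytes:
    """Encode an Avro int/long: zig-zag mapping, then a base-128 varint, low bits first."""
    z = (n << 1) ^ (n >> 63)  # zig-zag for a 64-bit signed value
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_value(avro_type: str, value: Any) -> bytes:
    """Binary-encode a value of a primitive Avro type (only a few types are handled)."""
    if avro_type in ("int", "long"):
        return zigzag_varint(value)
    if avro_type == "string":
        data = value.encode("utf-8")
        return zigzag_varint(len(data)) + data  # length prefix, then UTF-8 bytes
    if avro_type == "boolean":
        return b"\x01" if value else b"\x00"
    if avro_type == "null":
        return b""
    raise NotImplementedError(f"Avro type {avro_type!r} is not handled in this sketch")


def encode_record(schema: Dict[str, Any], datum: Dict[str, Any]) -> bytes:
    # Avro encodes record fields in the order they are declared in the schema,
    # with no field names or separators in the binary output.
    return b"".join(encode_value(f["type"], datum[f["name"]]) for f in schema["fields"])


def to_cast_format(schema: Dict[str, Any], datum: Dict[str, Any]) -> bytes:
    """Null-terminated UTF-8 schema text immediately followed by the binary value."""
    schema_text = json.dumps(schema, separators=(",", ":"))  # compact form is an assumption
    return schema_text.encode("utf-8") + b"\x00" + encode_record(schema, datum)


def split_cast_format(blob: bytes) -> Tuple[Dict[str, Any], bytes]:
    # Valid JSON in UTF-8 never contains a raw 0x00 byte, so the first NUL ends the
    # schema; the binary value that follows may itself contain 0x00 bytes.
    end = blob.index(b"\x00")
    return json.loads(blob[:end].decode("utf-8")), blob[end + 1:]


if __name__ == "__main__":
    schema = {
        "type": "record",
        "name": "Item",
        "fields": [{"name": "id", "type": "long"}, {"name": "label", "type": "string"}],
    }
    blob = to_cast_format(schema, {"id": 0, "label": "ab"})
    parsed_schema, value = split_cast_format(blob)
    print(value)  # b'\x00\x04ab'
```

Splitting at the first NUL byte, rather than the last, matters because the binary-encoded value can legitimately contain zero bytes, as the `id` of 0 in the example does. A real application would more likely rely on a complete Avro library than on hand-written encoders like these.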