Storage Formats - Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

Variable Storage Formats

Every use of the DATASET data type must specify a storage format. The STORAGE FORMAT syntax has been extended to support the DATASET data type, and Vantage provides built-in storage formats for it.

The storage format specification does not necessarily change how the data is laid out on disk. Instead, it associates the data with a specific, well-known format.

Built-In Storage Formats

Vantage provides the Avro and CSV storage formats for the DATASET data type. These are based on the Apache Avro and CSV specifications. Each instance contains a schema that conforms to the corresponding specification, although the schema is optional for the CSV storage format. The schema is interpreted either on a per-instance basis or at the column level.
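
As an illustration of how an optional CSV schema can guide interpretation, the following sketch parses a CSV value with and without a JSON options document. It is written in Python rather than SQL and models the idea only, not how Vantage processes DATASET values. The option names `field_delimiter` and `field_names` and the helper `parse_csv_value` are hypothetical and are not taken from the Vantage schema definition.

```python
import csv
import io
import json

# Hypothetical option names, chosen for illustration only; they are
# assumptions and not the documented Vantage CSV schema keys.
schema_text = '{"field_delimiter": "|", "field_names": ["item", "quantity"]}'


def parse_csv_value(csv_value, schema_json=None):
    """Parse a CSV value, optionally guided by a JSON options document."""
    options = json.loads(schema_json) if schema_json else {}
    delimiter = options.get("field_delimiter", ",")
    rows = list(csv.reader(io.StringIO(csv_value), delimiter=delimiter))
    names = options.get("field_names")
    if names is None:
        # No schema (or no column names): return each record as a plain list.
        return rows
    # Column names supplied by the schema: return each record as a dict.
    return [dict(zip(names, row)) for row in rows]


rows_with_schema = parse_csv_value("apple|3\npear|5\n", schema_text)
rows_without_schema = parse_csv_value("apple,3\npear,5\n")
```

With the schema, each record becomes a mapping keyed by the supplied column names. Without it, the same parser falls back to a comma delimiter and positional fields, which mirrors the schema being optional for CSV.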

The CSV and AVRO DATASET data types are supported only on the Block File System on the primary cluster. They are not available on the Object File System.

Storage Format Terminology

Term: Description

Schema: For storage format AVRO, the schema is a JSON document that describes the binary-encoded Avro value format. It is specified as JSON text in UTF-8 encoded characters, using a VARBYTE or BLOB data type.

    For CSV, the schema is a JSON document that describes the extended CSV options, such as the field or record delimiter and the column names or header information. The schema can be specified in any supported JSON format. For instance-level DATASET values, the schema is stored in the same character set as the CSV data type. For column-level DATASET values, the schema is stored in the Data Dictionary as UNICODE text encoded in UTF-8.

Binary-encoded Avro Value: The actual Avro data, encoded according to the scheme described by the schema.

CSV Value: The CSV value in the Latin or Unicode character set.

JSON-encoded Value: The JSON-text representation of the data, as described by the schema.

Transform format or cast format: For storage format AVRO, this is a null-terminated, UTF-8 encoded schema followed immediately by a binary-encoded value.

    For CSV, the transform or cast format uses the original CSV value. If a schema is specified for a CSV value, the schema is not included in the cast or transform.
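
The AVRO transform format described above (a null-terminated, UTF-8 encoded schema followed immediately by the binary-encoded value) can be sketched in a few lines of Python. This is an illustration of the byte layout only, not Vantage's implementation. The helper names and the sample record schema are hypothetical. The tiny encoder covers just the Avro `long` and `string` types, following the binary encoding rules of the Apache Avro specification.

```python
def zigzag_varint(n):
    """Encode an integer as an Avro long: zigzag, then variable-length bytes."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s):
    """Encode an Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def to_transform_format(schema_json, avro_value):
    """Null-terminated UTF-8 schema followed immediately by the binary value."""
    return schema_json.encode("utf-8") + b"\x00" + avro_value


def from_transform_format(buf):
    """Split at the first null byte; the value itself may contain null bytes."""
    schema_bytes, sep, avro_value = buf.partition(b"\x00")
    if not sep:
        raise ValueError("missing null terminator after schema")
    return schema_bytes.decode("utf-8"), avro_value


# Hypothetical sample schema: a record with a long field and a string field.
schema = ('{"type":"record","name":"rec","fields":['
          '{"name":"id","type":"long"},{"name":"name","type":"string"}]}')

# An Avro record is its field values encoded in schema order, with no framing.
value = zigzag_varint(1) + encode_string("ab")
packed = to_transform_format(schema, value)
unpacked_schema, unpacked_value = from_transform_format(packed)
```

Splitting on the first null byte is safe in this sketch because valid JSON text cannot contain a raw null character, whereas the binary payload can (for example, the long value 0 encodes as a single zero byte).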