Storage Formats - Advanced SQL Engine

Product: Advanced SQL Engine, Teradata Database
Release Number: 17.10
Published: July 2021
Language: English (United States)
Last Update: 2021-07-27

dita:id: B035-1198

Product Category: Teradata Vantage™

Variable Storage Formats

Every use of the DATASET data type must specify a storage format. The STORAGE FORMAT syntax has been extended to support the DATASET data type, and Vantage provides built-in storage formats for it.

A storage format specification does not necessarily affect how the data is formatted on disk; instead, it associates the data with a specific, well-known format.

Built-In Storage Formats

Vantage provides two built-in storage formats for the DATASET data type, Avro and CSV, which are based on the Apache Avro and CSV specifications, respectively. Each instance is described by a schema that conforms to the corresponding specification, although for the CSV storage format the schema is always optional. The schema can be interpreted on a per-instance basis or at the column level.
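
As a rough illustration of what an optional CSV schema means in practice, the following Python sketch parses a CSV value with default options when no schema is supplied and applies a JSON options document when one is. The option names field_delimiter and header, and their default values, are hypothetical placeholders and are not taken from Vantage; record delimiters are not handled.

```python
import csv
import io
import json
from typing import Optional

# Hypothetical option names and defaults; Vantage's actual CSV schema keys may differ.
DEFAULT_OPTIONS = {"field_delimiter": ",", "header": False}


def parse_csv_value(text: str, schema_json: Optional[str] = None) -> list:
    """Parse one CSV value, applying a JSON options document only if one is given."""
    options = dict(DEFAULT_OPTIONS)
    if schema_json is not None:
        # The schema does not change the data; it only tells the reader how to split it.
        options.update(json.loads(schema_json))
    reader = csv.reader(io.StringIO(text), delimiter=options["field_delimiter"])
    rows = list(reader)
    if options["header"] and rows:
        # The first record supplies column names for the remaining records.
        names, data = rows[0], rows[1:]
        return [dict(zip(names, row)) for row in data]
    return rows


# No schema: default comma delimiter, no header record.
print(parse_csv_value("1,ok\n2,fine"))  # → [['1', 'ok'], ['2', 'fine']]
# With a hypothetical schema: pipe delimiter and a header record.
print(parse_csv_value("id|label\n1|ok", '{"field_delimiter": "|", "header": true}'))  # → [{'id': '1', 'label': 'ok'}]
```

The same text parses either way; only the interpretation changes, which mirrors the point that a storage format associates data with a format rather than rewriting it.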

Storage Format Terminology

Term	Description
Schema	For storage format AVRO, the schema is a JSON document describing the format of the binary-encoded Avro value. It is specified as JSON text in UTF-8 encoded characters, using a VARBYTE or BLOB data type. For storage format CSV, the schema is a JSON document describing extended CSV options, such as the field or record delimiter and the column names or header information; it can be specified in any supported JSON format. For instance-level DATASET values, the CSV schema is stored in the same character set as the CSV data; for column-level DATASET values, it is stored in the Data Dictionary as UNICODE text encoded in UTF-8.
Binary-encoded Avro Value	The actual Avro data, encoded as described by the schema.
CSV Value	The CSV data itself, in the LATIN or UNICODE character set.
JSON-encoded Value	JSON-text representation of the data, as described by the schema.
Transform format or Cast format	For storage format AVRO, a null-terminated, UTF-8 encoded schema followed immediately by the binary-encoded value. For storage format CSV, the transform and cast format is the original CSV value; if a schema is specified for a CSV value, it is not included in the cast or transform output.
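
The AVRO transform/cast layout described in the table can be illustrated with a short Python sketch. It assumes the value is a single datum in Avro's plain binary encoding, supports only long and string fields, and uses a hypothetical record schema named Reading; all function names are illustrative and are not Vantage APIs. Because valid JSON text never contains a raw NUL byte, the schema and the value can be separated at the first 0x00 byte.

```python
import json

# Hypothetical record schema used only for illustration.
READING_SCHEMA = {
    "type": "record",
    "name": "Reading",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "label", "type": "string"},
    ],
}


def encode_long(n: int) -> bytes:
    """Avro int/long encoding: zig-zag mapping, then a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string encoding: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(schema: dict, record: dict) -> bytes:
    """Encode field values in schema order (this sketch handles only long and string)."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "long":
            out += encode_long(value)
        elif field["type"] == "string":
            out += encode_string(value)
        else:
            raise NotImplementedError(f"type {field['type']!r} not covered by this sketch")
    return bytes(out)


def to_transform_format(schema: dict, record: dict) -> bytes:
    """Null-terminated UTF-8 schema text followed immediately by the Avro value."""
    schema_text = json.dumps(schema, separators=(",", ":")).encode("utf-8")
    return schema_text + b"\x00" + encode_record(schema, record)


def split_transform_format(buf: bytes) -> tuple[dict, bytes]:
    """Split at the first NUL byte into (parsed schema, binary-encoded value)."""
    end = buf.index(b"\x00")
    return json.loads(buf[:end].decode("utf-8")), buf[end + 1:]


def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one zig-zag varint starting at pos; return (value, next position)."""
    z = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1), pos


def decode_record(schema: dict, value: bytes) -> dict:
    """Decode a record encoded by encode_record, reading fields in schema order."""
    record, pos = {}, 0
    for field in schema["fields"]:
        n, pos = read_long(value, pos)  # a long value, or a string's byte length
        if field["type"] == "long":
            record[field["name"]] = n
        elif field["type"] == "string":
            record[field["name"]] = value[pos:pos + n].decode("utf-8")
            pos += n
        else:
            raise NotImplementedError(f"type {field['type']!r} not covered by this sketch")
    return record


buf = to_transform_format(READING_SCHEMA, {"id": 1, "label": "ok"})
schema, value = split_transform_format(buf)
print(value)  # → b'\x02\x04ok'
print(decode_record(schema, value))  # → {'id': 1, 'label': 'ok'}
```

This reproduces only the byte layout; a full Avro implementation would also cover the remaining primitive and complex types.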