Starting with Teradata Database 16.00, the DATASET complex data type (CDT) represents self-describing data stored in a format that conforms to a schema. Each DATASET value has an associated schema, which can either be included with the column data or referenced. Schemas are created with a CREATE SCHEMA statement and stored in the SYSUDTLIB database. Schema information is stored in the DBC.DatasetSchemaInfo table.
Schemas are defined in a storage format. Currently, Apache Avro is the only supported storage format. Avro is a data serialization framework that uses JSON to define data types and protocols, and that serializes data in a compact binary format. It is used primarily in Apache Hadoop, where it provides both a serialization format for persistent data and a wire format for communication between Hadoop nodes and between client programs and the Hadoop services. Schemas can also include CSV-defined parameters.
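To make the relationship between a JSON schema and the compact binary form concrete, the following is a minimal Python sketch of how an Avro record could be encoded by hand. It is not Teradata's implementation or the official Avro library. The record name `Reading`, its fields, and the helper functions are hypothetical, and only the `string` and `long` primitive types are handled.

```python
import json

# Hypothetical Avro record schema; the record and field names are illustrative only.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Reading",
  "fields": [
    {"name": "sensor", "type": "string"},
    {"name": "value",  "type": "long"}
  ]
}
"""


def encode_long(n: int) -> bytes:
    """Encode a 64-bit integer as an Avro long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negative values to small unsigned ones
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its UTF-8 byte length (an Avro long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Only the two primitive types used by the sketch schema are supported.
ENCODERS = {"string": encode_string, "long": encode_long}


def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate the encoded fields in schema order; field names are not written."""
    return b"".join(
        ENCODERS[field["type"]](record[field["name"]]) for field in schema["fields"]
    )


schema = json.loads(SCHEMA_JSON)
payload = encode_record(schema, {"sensor": "a1", "value": 3})
```

Because the field names and types live in the schema rather than in the payload, the encoded record here occupies only four bytes. This is also why a reader needs the schema, whether it travels with the data or is referenced separately, to interpret the bytes.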