Teradata Database 16.00 supports a new data type called DATASET. A DATASET is a Custom Data Type (CDT) that represents self-describing data stored in a format that conforms to a schema. Each DATASET therefore has an associated schema, which can either be included with the column data or referenced. Schemas are created with a CREATE SCHEMA statement and stored in the SYSUDTLIB database. Schema information is stored in the DBC.DatasetSchemaInfo table.
Each schema is defined in a storage format. Currently, Apache Avro is the only storage format supported. Avro is a data serialization framework that uses JSON to define data types and protocols, and that serializes data in a compact binary format. It is used primarily in Apache Hadoop, where it provides a serialization format for persistent data and a wire format for communication between Hadoop nodes and between client programs and the Hadoop services. Schemas can also include CSV-defined parameters.
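
To make the "JSON for defining data types" point concrete, the following is a minimal sketch of what an Avro record schema looks like and how its field definitions can be read. The record name, the field names, and the field_types helper are hypothetical and chosen only for illustration; the sketch uses Python's standard json module, checks only the basic JSON shape of the schema, and does not show Teradata syntax or full Avro validation.

```python
import json

# Hypothetical Avro record schema; the record and field names are
# illustrative only and do not come from the Teradata documentation.
AVRO_SCHEMA_JSON = """
{
  "type": "record",
  "name": "Reading",
  "fields": [
    {"name": "sensor_id", "type": "int"},
    {"name": "label", "type": "string"},
    {"name": "value", "type": ["null", "double"], "default": null}
  ]
}
"""


def field_types(schema_text):
    """Parse an Avro record schema and return a {field name: type} mapping."""
    schema = json.loads(schema_text)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return {field["name"]: field["type"] for field in schema["fields"]}


if __name__ == "__main__":
    for name, avro_type in field_types(AVRO_SCHEMA_JSON).items():
        print(name, avro_type)
```

Because the schema is plain JSON text, it can travel with the data or be kept separately and looked up when needed, which matches the two options described above for associating a schema with a DATASET.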