The Avro specification provides an object container file format to transmit and store multiple binary-encoded Avro values with a common schema.
Because each file contains one Avro schema and one or more binary-encoded Avro values described by that schema, the data in an object container file maps to a DATASET STORAGE FORMAT AVRO column with a column-level schema.
The database supports these files directly through the AvroContainerSplit table operator. The following steps outline a general framework for importing Avro data from these files.
1. Retrieve the schema from the object container file.
2. Create a schema with the new CREATE <storage-format-name> SCHEMA DDL statement, based on the schema retrieved in Step 1. The schema may be specified in LATIN or UNICODE characters, or as UTF-8 in its byte representation.
3. Create a table that conforms to the desired structure and includes a DATASET STORAGE FORMAT AVRO column whose column-level schema is the schema created in Step 2.
4. Run the AvroContainerSplit table operator to load the Avro DATASET values into the table created in Step 3.
These steps allow any application to import data from an object container file to a database table.
If the DATASET table column is defined without a column-level schema, the schema is stored with each Avro instance in the table.
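
The first step above, retrieving the schema, can be done without any database involvement, because the Avro specification places the schema in the file header. The following is a minimal sketch in Python using only the standard library. It assumes the header layout defined by the Avro specification (the 4-byte magic `Obj` plus the byte 1, followed by a metadata map that holds the `avro.schema` entry). The function names are hypothetical and are not part of any database or Avro library API.

```python
import io

MAGIC = b"Obj\x01"  # 4-byte marker that opens every object container file


def read_long(stream):
    """Decode one Avro long (zigzag-encoded, variable-length integer)."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of data while reading a long")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear: this was the last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag encoding


def read_bytes(stream):
    """Decode one Avro bytes/string value: a long length, then the payload."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of data while reading bytes")
    return data


def read_container_schema(stream):
    """Return the schema JSON text stored in an object container file header.

    `stream` is a binary file-like object positioned at the start of the file.
    """
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the metadata map
            break
        if count < 0:  # negative count: a block byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            metadata[key] = read_bytes(stream)
    return metadata["avro.schema"].decode("utf-8")


def schema_from_bytes(data):
    """Convenience wrapper for container data that is already in memory."""
    return read_container_schema(io.BytesIO(data))
```

To read from disk, open the file in binary mode and pass the handle to `read_container_schema`, for example `with open(path, "rb") as f: schema = read_container_schema(f)`, where `path` is whatever location holds the container file. The returned JSON text is the schema that Step 2 supplies to the CREATE <storage-format-name> SCHEMA statement.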