The DATASET data type includes a schema and data, both of which can vary in length. You can use the INLINE LENGTH option to specify an inline storage size. When the data is smaller than or equal to the inline storage size, it is stored inside the base row. Otherwise, the data is stored as a LOB (large object).
If the data is stored inline, it is treated as a non-LOB type. Because this avoids the LOB overhead, performance may improve, especially when the data type is used with user-defined functions (UDFs).
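The inline-versus-LOB rule described above can be sketched in a few lines of Python. This is only an illustration of the comparison, not the database's implementation; the function name `storage_location` and its parameters are hypothetical.

```python
def storage_location(data_length: int, inline_length: int) -> str:
    """Return where a DATASET value would be stored under the stated rule.

    Hypothetical helper: both arguments are sizes in bytes.
    """
    if data_length < 0 or inline_length < 0:
        raise ValueError("lengths must be non-negative")
    # Data smaller than or equal to the inline storage size stays in the
    # base row; anything larger is stored as a LOB.
    return "inline" if data_length <= inline_length else "LOB"
```

Note that the boundary case counts as inline: a value whose size exactly equals the inline storage size is still stored in the base row.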
Each specification of the DATASET data type includes the following information:
- Maximum length
- Inline length
- Storage format
- Character set (comma-separated value (CSV) storage format only)
- Schema
Specify the STORAGE FORMAT option in the data type specification syntax. Available storage formats include Avro and CSV. The following length limits apply to both the schema and the data of the DATASET data type:
Storage Location | Maximum Length | Minimum Length | Default Length |
LOB | 16 MB | 100 bytes | 16 MB |
Inline | 64 KB | 100 bytes | 10 KB |
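As a rough illustration of how the limits in the table interact, the following Python sketch resolves a requested length against them. It is not part of the product: the names `LIMITS` and `resolve_length` are hypothetical, and the byte conversions assume 1 KB = 1024 bytes and 1 MB = 1024 KB, which the table does not state.

```python
KB = 1024        # assumption: the table does not say whether KB means 1000 or 1024 bytes
MB = 1024 * KB   # assumption, as above

# Limits transcribed from the table, in bytes.
LIMITS = {
    "LOB":    {"max": 16 * MB, "min": 100, "default": 16 * MB},
    "Inline": {"max": 64 * KB, "min": 100, "default": 10 * KB},
}


def resolve_length(location, requested=None):
    """Return the effective length for a storage location (hypothetical helper).

    Falls back to the default when no length is requested, and rejects
    requests outside the minimum/maximum range from the table.
    """
    limits = LIMITS[location]
    if requested is None:
        return limits["default"]
    if not limits["min"] <= requested <= limits["max"]:
        raise ValueError(
            f"{location} length must be between {limits['min']} "
            f"and {limits['max']} bytes, got {requested}"
        )
    return requested
```

For example, `resolve_length("Inline")` returns the default inline length, while `resolve_length("Inline", 50)` raises `ValueError` because 50 bytes is below the 100-byte minimum.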
[Optional] Specify a character set for the CSV format. It can be either LATIN or UNICODE. The default is the session character set.
A schema is optional for the CSV format. You can specify a schema in any supported JSON format. For instance-level DATASET values, the schema is stored in the same character set as the CSV data type. For column-level DATASET values, it is encoded in UTF-8. The schema is null-terminated.
The CSV storage format supports extensions to the CSV standard, such as user-specified column and record delimiters and header field names. If you use any of these extensions, specify a schema. You can define schemas at the column level or the instance level for any of the built-in storage formats of the DATASET type. A column-level schema is binding for all instances of the data type loaded into that column, whereas instance-level schemas may vary from instance to instance.