Storage Format Conversions - Teradata Vantage

Teradata® VantageCloud Lake

Deployment

VantageCloud

Edition

Lake

Product

Teradata Vantage

Published

January 2023

Language

English (United States)

Last Update

2024-04-03

dita:mapPath

phg1621910019905.ditamap

dita:ditavalPath

pny1626732985837.ditaval

dita:id

phg1621910019905

A table column that is defined to store JSON data in a storage format (text, BSON, or UBJSON) can accept JSON data in any of the other storage formats. Vantage automatically converts the data to the storage format of the column before storing it.

The data is converted based on the following rules.

Conversion of Storage Format	Description
From BSON to Text	The BSON data is converted to the character set of the column. In BSON, strings are stored in UTF8, so if the data contains a character that is not in the repertoire of the target character set, a translation error is reported. BSON contains extensions to the base JSON data types (such as binary data and dates). These extended data types are treated as strings in the resulting JSON text document.
From UBJSON to Text	The UBJSON data is converted to the character set of the column. In UBJSON, strings are stored in UTF8, so if the data contains a character that is not in the repertoire of the target character set, a translation error is reported.
From Text to BSON	JSON text data is converted to the BSON format. All strings are encoded in UTF8, even if the source JSON text was in the LATIN character set. No extended data types are used in the resulting BSON document.
From UBJSON to BSON	UBJSON data is converted to the BSON format. No extended data types are used in the resulting BSON document.
From Text to UBJSON	JSON text data is converted to the UBJSON format. All strings are encoded in UTF8, even if the source JSON text was in the LATIN character set.
From BSON to UBJSON	BSON data is converted to the UBJSON format. BSON contains extensions to the base JSON data types (such as binary data and dates). These extended data types are treated as strings in the resulting UBJSON document.

Conversions Involving BSON

When converting from BSON, if the BSON document contains data in an extended data type format and this data is not supported in the target format, the data is enclosed in double quotation marks.

When converting to BSON, the source formats do not contain data in extended data type formats. Vantage does not interpret data enclosed in double quotation marks to determine if the data qualifies as one of the extended data types. Therefore, in all cases of converting to BSON, the result is a BSON document that has no data in any extended data type.

The root of a JSON document when stored as BSON is assumed to be an object. Furthermore, arrays are stored as key/value pairs where the key is the array index of the element.

For example, consider the following JSON document:

[ "a" , "b" ]

In BSON, that document is stored similar to the following:

{ "0" : "a" , "1" : "b" }

When this BSON document is converted to text, you get that object-like representation. This is due to the BSON assumption that the root of the document is an object. This is a limitation of the BSON specification. Nested arrays do not suffer from this limitation because nested arrays are represented identically, but preceded by an indicator byte to signify that the data represents an array. Therefore, when converting back to JSON text, the keys are not printed out.

Consider the following document:

{ "array" : [ "a" , "b" ] }

The nested array is treated properly as an array, so that if you convert this text to BSON and then back to text, you get the exact same document as follows:

{ "array" : [ "a" , "b" ] }

When searching for a value, the JSONPath query does not change whether the data is stored as an array or as an object. For example, consider the following documents stored as an array and as an object:

[ "a" , "b" ]

{ "0" : "a" , "1" : "b" }

When searching these documents for the value "a", the following query succeeds on both documents:

SELECT jsonCol[0] FROM jsonTable;

The reason is that JSONPath normalizes the request query. A normalization removes the square brackets, so when evaluating the query, Vantage does not know if the user wanted to search for a key called "0" or an element at index 0 of an array. JSONPath does the following:

Search for the key.
If the search fails, and the value being searched for is a positive integer in an array, look for an element at the specified index.

Therefore, both cases return "a" when searched.

In summary, the limitation only applies when reading the BSON data and converting the data back to text. The resulting text document may look a little different, but can be searched in the same way.