Storage Format Conversions - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

A table column that is defined to store JSON data in a storage format (text, BSON, or UBJSON) can accept JSON data in any of the other storage formats. Vantage automatically converts the data to the storage format of the column before storing it.

The data is converted based on the following rules.
Conversion of Storage Format Description
From BSON to Text

The BSON data is converted to the character set of the column. In BSON, strings are stored in UTF8, so if the data contains a character that is not in the repertoire of the target character set, a translation error is reported.

BSON contains extensions to the base JSON data types (such as binary data and dates). These extended data types are treated as strings in the resulting JSON text document.

From UBJSON to Text The UBJSON data is converted to the character set of the column. In UBJSON, strings are stored in UTF8, so if the data contains a character that is not in the repertoire of the target character set, a translation error is reported.
From Text to BSON JSON text data is converted to the BSON format. All strings are encoded in UTF8, even if the source JSON text was in the LATIN character set. No extended data types are used in the resulting BSON document.
From UBJSON to BSON UBJSON data is converted to the BSON format. No extended data types are used in the resulting BSON document.
From Text to UBJSON JSON text data is converted to the UBJSON format. All strings are encoded in UTF8, even if the source JSON text was in the LATIN character set.
From BSON to UBJSON BSON data is converted to the UBJSON format. BSON contains extensions to the base JSON data types (such as binary data and dates). These extended data types are treated as strings in the resulting UBJSON document.

Conversions Involving BSON

When converting from BSON, if the BSON document contains data in an extended data type format and this data is not supported in the target format, the data is enclosed in double quotation marks.

When converting to BSON, the source formats do not contain data in extended data type formats. Vantage does not interpret data enclosed in double quotation marks to determine if the data qualifies as one of the extended data types. Therefore, in all cases of converting to BSON, the result is a BSON document that has no data in any extended data type.

The root of a JSON document when stored as BSON is assumed to be an object. Furthermore, arrays are stored as key/value pairs where the key is the array index of the element.

For example, consider the following JSON document:

[ "a" , "b" ]

In BSON, that document is stored similar to the following:

{ "0" : "a" , "1" : "b" }

When this BSON document is converted to text, you get that object-like representation. This is due to the BSON assumption that the root of the document is an object. This is a limitation of the BSON specification. Nested arrays do not suffer from this limitation because nested arrays are represented identically, but preceded by an indicator byte to signify that the data represents an array. Therefore, when converting back to JSON text, the keys are not printed out.

Consider the following document:

{ "array" : [ "a" , "b" ] }

The nested array is treated properly as an array, so that if you convert this text to BSON and then back to text, you get the exact same document as follows:

{ "array" : [ "a" , "b" ] }

When searching for a value, the JSONPath query does not change whether the data is stored as an array or as an object. For example, consider the following documents stored as an array and as an object:

[ "a" , "b" ]
{ "0" : "a" , "1" : "b" }

When searching these documents for the value "a", the following query succeeds on both documents:

SELECT jsonCol[0] FROM jsonTable;

The reason is that JSONPath normalizes the request query. A normalization removes the square brackets, so when evaluating the query, Vantage does not know if the user wanted to search for a key called "0" or an element at index 0 of an array. JSONPath does the following:

  1. Search for the key.
  2. If the search fails, and the value being searched for is a positive integer in an array, look for an element at the specified index.

Therefore, both cases return "a" when searched.

In summary, the limitation only applies when reading the BSON data and converting the data back to text. The resulting text document may look a little different, but can be searched in the same way.