Storage Format Conversions - Advanced SQL Engine

Storage Format Conversions - Advanced SQL Engine - Teradata Database

JSON Data Type

Product

Advanced SQL Engine

Teradata Database

Release Number

17.10

Published

July 2021

Language

English (United States)

Last Update

2021-07-27

dita:mapPath

uwa1591040057999.ditamap

dita:ditavalPath

uwa1591040057999.ditaval

dita:id

B035-1150

lifecycle

Product Category

Teradata Vantage™

A table column that is defined to store JSON data in a particular storage format (text, BSON, or UBJSON) can accept JSON data in any of the other storage formats. Vantage automatically converts the data to the storage format of the column before storing it.

The data is converted based on the following rules.

Conversion of Storage Format	Description
From BSON to Text	The BSON data is converted to the character set of the column. In BSON, strings are stored in UTF8, so if the data contains a character that is not in the repertoire of the target character set, a translation error is reported. BSON contains extensions to the base JSON data types (such as binary data and dates). These extended data types are treated as strings in the resulting JSON text document.
From UBJSON to Text	The UBJSON data is converted to the character set of the column. In UBJSON, strings are stored in UTF8, so if the data contains a character that is not in the repertoire of the target character set, a translation error is reported.
From Text to BSON	JSON text data is converted to the BSON format. All strings are encoded in UTF8, even if the source JSON text was in the LATIN character set. No extended data types are used in the resulting BSON document.
From UBJSON to BSON	UBJSON data is converted to the BSON format. No extended data types are used in the resulting BSON document.
From Text to UBJSON	JSON text data is converted to the UBJSON format. All strings are encoded in UTF8, even if the source JSON text was in the LATIN character set.
From BSON to UBJSON	BSON data is converted to the UBJSON format. BSON contains extensions to the base JSON data types (such as binary data and dates). These extended data types are treated as strings in the resulting UBJSON document.

Conversions Involving BSON

When converting from BSON, if the BSON document contains some data in one of the extended data type formats and this data is not supported in the target format, the data will be enclosed in double quotation marks.

When converting to BSON, the source formats will not contain any data in the extended data type formats. Vantage does not interpret data enclosed in double quotation marks to determine if it qualifies as one of the extended data types. Therefore, in all cases of converting to BSON, the result is a BSON document that has no data in any extended data type.

The root of a JSON document when stored as BSON is always assumed to be an object. Furthermore, arrays are stored as key/value pairs where the key is the array index of the element.

For example, consider the following JSON document:

[ "a" , "b" ]

In BSON, that document is stored similar to the following:

{ "0" : "a" , "1" : "b" }

When this BSON document is converted to text, you will get that object-like representation. This is due to the BSON assumption that the root of the document is an object. This is a limitation of the BSON specification. Nested arrays do not suffer from this limitation because they are represented identically, but preceded by an indicator byte to signify that the data represents an array. Therefore, when converting back to JSON text, the keys are not printed out.

Consider the following document:

{ "array" : [ "a" , "b" ] }

The nested array is treated properly as an array, so that if you convert this text to BSON and then back to text, you will get the exact same document as follows:

{ "array" : [ "a" , "b" ] }

Note that when searching for a value, the JSONPath query does not change whether the data is stored as an array or as an object. For example, consider the following documents stored as an array and as an object:

[ "a" , "b" ]

{ "0" : "a" , "1" : "b" }

When searching these documents for the value "a", the following query succeeds on both documents:

SELECT jsonCol[0] FROM jsonTable;

The reason is that JSONPath normalizes the request query. One of the normalizations involves removing the square brackets, so when evaluating the query, it is not known if the user wanted to search for a key called "0" or an element at index 0 of an array. JSONPath will do the following:

Search for the key.
If the search fails, and the value being searched for is a positive integer and we are in an array, then look for an element at the specified index.

Therefore, both cases will return "a" when searched.

In summary, the limitation only applies when reading the BSON data and converting it back to text. The resulting text document may look a little different, but it can be searched in the same way.