JSON_SHRED_BATCH and JSON_SHRED_BATCH_U

Teradata Vantage™ JSON Data Type

brand
Software
Teradata Vantage
prodname
Teradata Database
Teradata Vantage NewSQL Engine
vrm_release
16.20
category
Programming Reference
featnum
B035-1150-162K

Purpose

JSON_SHRED_BATCH and JSON_SHRED_BATCH_U are SQL stored procedures that use any number of JSON instances to populate existing tables, providing a flexible form of loading data from the JSON format into a relational model. Two shred procedures are provided; however, the only difference between them is the character set of the data. To explain the functionality, we only describe JSON_SHRED_BATCH (the version that operates on LATIN character set data), but the explanation applies equally to JSON_SHRED_BATCH_U (the UNICODE version).

Functional Description

The batch shredding procedures map into a number of successive calls to JSON_TABLE to create a conglomerate temporary table, the values of which can be assigned to existing tables.

Syntax

















CALL JSON_SHRED_BATCH
The following describes the parameters used by the JSON_SHRED_BATCH and JSON_SHRED_BATCH_U procedure calls.
input_query

A string input parameter which specifies a query that results in a group of JSON instances from which the user can perform shredding. Extra columns can result and be referred to in the shred statement.

If this parameter is NULL, an error is reported.

The input query parameter can operate on one or more JSON objects in a source table. The user invoking JSON_SHRED_BATCH must have SELECT privilege on the source table. The input query is mapped to a JSON_TABLE function call. Since JSON_TABLE requires that the first two columns specified be an ID value and a JSON object, respectively, the input query parameter also requires the first two columns to be an ID value and a JSON object.

The following are examples of an input query string.

'SELECT id, empPersonalInfo, site FROM test.json_table'
'SELECT JSONDOCID, JSONDT1, a, b FROM jsonshred.JSON_TABLE3 WHERE JSONID=100'

JSONID (uppercase or lowercase) is a keyword. It is a temporary column name used for the JSON document ID value. JSONID is allowed in the input query and table expression clauses. You cannot use JSONID as a temp_column_name in "colexpr" or "queryexpr".

The execution of JSON_TABLE on multiple JSON objects requires a join between the result of one invocation and the source table. In order to avoid a full table join, we require an ID column to be specified in the input query parameter, so that a join condition can be built off that column.

The data types in the queryexpr (discussed later) must match the actual data type of the columns specified in the input query. No explicit cast will be added, so the data must be implicitly castable to the data type defined in the query expr, if not the exact data type. Any errors encountered will result in a failed shred, and will be reported to the user.

If there is a problem encountered during the execution of JSON_TABLE, the ID column is used in the error message to isolate which row caused the problem.

shred statement

The shred statement element defines the mapping of the JSON instance, that resulted from the input query, to where the data will be loaded in the user tables.

If the shred statement is NULL an error is reported.

All keywords in the shred statement must be specified in lowercase.

The following sections discuss the structure and syntax of the shred statement. Multiple shred statements can be run, but there are performance impacts.

row expression
The following describes the variables used by the row expression.
  • "rowexpr" :

    Required, literal entry.

    Must be lowercase.

  • JSONPath expr – An expression in JSONPath syntax to extract information about a particular portion of a JSON instance. For example, $.schools[*] identifies all schools.
column expression
The following describes the column expression variables.
  • "colexpr" :

    Required, literal entry.

    Must be lowercase.

  • temp_column_name
    Any user-defined name for the temporary column. The temporary column name must be unique or an error is reported.
    The names are not case sensitive. For example, col1 and COL1 will fail because they are used in internal queries and are not unique.

    You cannot use JSONID and ROWINDEX (uppercase or lowercase) here.

  • JSONPath expr – A JSONPath expression in JSONPath syntax that requests information about a particular portion of a JSON object. For example, $.name.
  • "type"

    Required, literal entry.

    Must be lowercase.

  • data_type – Non-LOB Teradata predefined type, such as INTEGER or VARCHAR. For a list of supported types, see Supported Data Types.
  • "fromRoot" : true

    Optional. You must use the fromRoot attribute for non-relative paths. Each column expression is assumed to be relative to the row expression ("rowexpr": "JSONPATH expression"), unless specified as not relative. A relative expression starts with one of the names of a child of the row expression, whereas a non-relative expression starts with the root in JSONPath, $.

    true is a literal and must be lowercase.
    Attempting to set fromRoot to false raises an error.

The user is responsible for mapping elements of the column expression to acceptable data types. No explicit casting is needed; the data is implicitly cast to the desired data type. However, if the data does not correctly cast to the desired data type an error is reported, therefore, care should be taken when determining the elements and data types of the column expression. If the result of the expression is an array or object (instead of a single value), the only acceptable data types are CHAR or VARCHAR of suitable length.

The data type of the temporary columns in the output table of JSON_TABLE must be specified. This is enforced with JSON_SHRED_BATCH and JSON_SHRED_BATCH _U in the column expression. It is necessary for the user to provide this information so that the data may be correctly interpreted and used with the target table(s).

query expression
The following describes the query expression variables.
  • "queryexpr" :

    Required, literal entry.

    Must be lowercase.

  • temp_column_name
    Any user-defined name for the temporary column. The temporary column name must be unique or an error is reported.
    The names are not case sensitive. For example, col1 and COL1 will fail because they are used in internal queries and are not unique.

    You cannot use JSONID and ROWINDEX (uppercase or lowercase) here.

  • "column_type" – Data type of the temp_column_name column.

The queryexpr is used to define the data types of the data selected in the input query, for the columns that are not the ID value or the JSON object. It is mandatory that the extra columns be referenced in the queryexpr to specify the data type desired.

All the columns in the input query, from the third column onward, must be replicated in the same order in the queryexpr.
The order of the columns is important while the names of the columns are not.

The data types in the queryexpr should match the actual data type of the columns specified in the input query. No explicit cast is added, so the data must be implicitly castable to the data type defined in the queryexpr, if not the exact data type. Any errors encountered will result in a failed shred, and are reported to the user.

The following example demonstrates the importance of the ordering. Notice that both columns qrycol1 and qrycol2 are included in the queryexpr. Note, qrycol2 in the queryexpr refers to qrycol1 in the input query, and qrycol1 in the queryexpr refers to qrycol2 in the input query. As stated, order is significant, not the names.

CALL SYSLIB.JSON_SHRED_BATCH(
'SELECT JSONDOCID, JSONDT1, qrycol1, qrycol2 FROM jsonshred.JSON_TABLE3 WHERE JSONID=100',
'[ { "rowexpr" : "$.population.profile",
     "colexpr" : [{"col1" : "$.name.first", "type" : "VARCHAR(30)"},
                   {"col2" : "$.address.zip", "type" : "NUMBER(5,0)"}],
     "queryexpr" : [{ "qrycol2" : "VARCHAR(20)"}, { "qrycol1" : "VARCHAR(20)"}],
     "tables" : [
                  {"jsonshred.JSON_SHRED_TABLE1" : {
                     "metadata" : { "operation" : "insert" },
                     "columns" : {"EmpID":"JSONID*10", "NAME":"col1","STATE":"qrycol1", "ZIP":"col2"}
                     }
                  }
                ]         
   }
]',res );

JSONID and ROWINDEX (uppercase or lowercase) are not allowed in colexpr and queryexpr because they are fixed temporary column names. A syntax error is reported if they are used in those clauses.

table expression
The following describes the table expression variables.
  • "tables" :

    Required, literal entry.

    Must be lowercase.

  • "table_name" – The fully qualified name of an existing database table. The user invoking JSON_SHRED_BATCH must have the required privileges (INSERT, UPDATE, and so forth) on this table.

JSONID and ROWINDEX (uppercase or lowercase) are keywords. They are used to track the input JSON document ID value (the first column of the input query) and the index number for an input row, respectively. JSONID and ROWINDEX may be referenced in the table expression clause as a source value for the shredding operation.

In the process of shredding, a volatile table is created for each shred statement. A table can have a maximum of 2048 columns, so all the columns together from all the table mappings should not exceed 2044 columns (there are four internal columns). You can have 1 to N target tables, which can each have 1 to N columns, but the total number of all columns must not exceed 2044.
metadata
The following describes the metadata variables.
  • "metadata" :

    Required, literal entry.

    Must be lowercase.

  • "operation" – Required, literal entry.
  • "insert" | "update" | "merge" | "delete"

    Operation to perform.

    In a MERGE operation, the target table must have a primary index, and the primary index has to be a member of the keys in the metadata.

  • "keys":

    Using keys is optional. If used, "keys": is a required, literal entry.

    All names given in the keys clause must be present in the column assignment clause.

    The keys are used to perform the join between the temporary table created by the row and column expressions and the target table. This should be used carefully as it can drastically affect performance. In the case of a MERGE operation, the target table must have a primary index, and the primary index has to be a member of the specified keys.

  • "table_column_name" – The name of any column in the table referenced by table_name. The user invoking JSON_SHRED_BATCH must have the required privileges (INSERT, UPDATE, and so forth) on this existing table. table_name is specified in the table expression of JSON_SHRED_BATCH.
  • "filter": – Filtering is optional. If used, "filter": is a required, literal entry.
  • filter_expression

    SQL statement referencing elements of the column or query expressions.

    Example filter statement: "filter" : "empId<5000"

column assignment
The following describes the column assignment variables.
  • "columns" : – Required, literal entry.
  • "table_column_name'" – The name of any column in the table referenced by table_name. The user invoking JSON_SHRED_BATCH must have the required privileges (INSERT, UPDATE, and so forth) on this existing table. table_name must be the fully qualified table name. It is specified in the table expression of JSON_SHRED_BATCH.
  • temp_column_name – A temp_column_name defined in "colexpr" or "queryexpr". The temporary column name must be unique or an error is reported. Note: Temporary column names are not case sensitive, so col1 and COL1 are not unique and will cause an error.
  • "temp_expr" – Teradata SQL expression.
  • numeric_constant – Any JSON-supported Numeric value.
  • ["string_constant"]

    Any JSON-supported string value.

    Example string constant: "company" : ["Teradata"]

  • boolean_constant

    true or false

    true and false are JSON keywords and must be lowercase.

  • null

    JSON null.

    null is a JSON keyword and must be lowercase.

result code
An output parameter representing the result of the shred operation. A value of 0 indicates success. All non-zero values indicate specific error conditions and an appropriate error message is returned.