Usage Notes - Atomic Upsert - Parallel Data Pump

Teradata® Parallel Data Pump Reference

Product
Parallel Data Pump
Release Number
16.20
Published
September 2019
Language
English (United States)
Last Update
2019-10-11
dita:mapPath
dmq1512702641516.ditamap
dita:ditavalPath
Audience_PDF_include.ditaval
dita:id
B035-3021
lifecycle
previous
Product Category
Teradata Tools and Utilities

The syntax for Atomic upsert consists of an UPDATE statement and an INSERT statement separated by an ELSE keyword as follows:

UPDATE <update-operands> ELSE INSERT <insert-operands>;

Teradata TPump inserts the ELSE keyword between the UPDATE and INSERT statements by itself, so the user should not enter it in the script. If the ELSE keyword is used in this context, Teradata TPump will terminate with a syntax error.

The <update-operands> and <insert-operands> are operands for regular UPDATE and INSERT SQL statements, respectively. Only certain types of UPDATE and INSERT operands are valid in an Atomic upsert statement, and the operand parameters within a given upsert statement are subject to further constraints linking the update and insert parameters.

When using the standard upsert feature, the primary index should always be fully specified for the UPDATE statement, just as for other DML in a Teradata TPump script, so that the update can be processed as a one-AMP rather than an all-AMP operation. In addition, both the UPDATE and the INSERT of an upsert statement pair should specify the same target table, and the primary index value specified in the UPDATE's WHERE clause should match the primary index value implied by the column values in the INSERT. When processing an Atomic upsert statement, Teradata Database will usually reject statements that fail to meet these basic upsert constraints and return an error, enabling Teradata TPump to detect and handle constraint violations.

Constraints considered to be basic to the upsert operation are:

  • SAME TABLE: The UPDATE and INSERT statements must specify the same table.
  • SAME ROW: The UPDATE and INSERT statements must specify the same row; that is, the primary index value in the inserted row must be the same as the primary value in the targeted UPDATE row.
  • HASHED ROW ACCESS: The UPDATE must fully specify the primary index, allowing the target row to be accessed with a one-AMP hashed operation.
  • Some of these restrictions concern syntax that is supported in UPDATE and INSERT statements separately but not when combined in an Atomic upsert statement. Restrictions not supported by the Atomic upsert feature that return an error if submitted to Teradata Database are:
  • INSERT... SELECT: Syntax not supported. The INSERT may not use a subquery to specify any of the inserted values. Note that support of this syntax is likely to be linked to support of subqueries in the UPDATE's WHERE clause constraints as described above, and may involve new syntax features to allow the UPDATE and INSERT to effectively reference the same subquery.
  • UPDATE-WHERE-CURRENT: Syntax not supported. The WHERE clause cannot use an updatable cursor to do what is called a positioned UPDATE. (It is unlikely that this syntax will ever be supported.) Note that this restriction does not prevent cursors from being used in other ways with Atomic upsert statements. For example, a DECLARE CURSOR statement may include upsert statements among those to be executed when the cursor is opened, as long as the upserts are otherwise valid.
  • UPDATE-FROM: Not supported. The SET clause cannot use a FROM clause table reference in the expression for the updated value for a column.
  • UPDATE-WHERE SUBQUERIES: Not supported. The WHERE clause cannot use a subquery either to specify the primary index or to constrain a nonindex column. Note that supporting this UPDATE syntax would also require support for either INSERT … SELECT or some other INSERT syntax feature that lets it specify the same primary index value as the UPDATE.
  • UPDATE-PRIMARY INDEX: Not supported. The UPDATE cannot change the primary index. This is sometime called unreasonable update.
  • TRIGGERS: Feature not supported if either the UPDATE or INSERT could cause a trigger to be fired. The restriction applies as if the UPDATE and INSERT were both executed, because the parser trigger logic will not attempt to account for their conditional execution. UPDATE triggers on columns not referenced by the UPDATE clause, however, will never be fired by the upsert and are therefore permitted. DELETE triggers cannot be fired at all by an upsert and are likewise permitted. Note that an upsert could be used as a trigger action but it would be subject to the same constraints as any other upsert. Because upsert is not allowed to fire any triggers itself, an upsert trigger action must not generate any further cascaded trigger actions.
  • JOIN/HASH INDEXES:Feature not supported if either the UPDATE or INSERT could cause the join/hash index to be updated. As with triggers, the restriction applies to each upsert as if the UPDATE and INSERT were both executed. While the UPDATE could escape this restriction if the join/hash index does not reference any of the updated columns, it is much less likely (and maybe impossible) for the INSERT to escape this restriction. If the benefit of lifting the restriction for a few unlikely join/hash index cases turns out to be not worth the implementation cost, the restriction may have to be applied more broadly to any table with an associated join/hash index.

Teradata TPump treats a failed constraint as a nonfatal error, reports the error in the job log for diagnostic purposes, and continues with the job by reverting to nonbasic upsert protocol.

To resolve order-dependency issues, Teradata TPump always processes the UPDATE before the INSERT because:

  • It matches the ordering implied by the upsert name: UP[date] + [in]SERT.
  • It matches the ordering implied by the UPDATE-ELSE-INSERT syntax.
  • It matches the common definition of upsert semantics.
  • It allows for an upsert operation on MULTISET tables, where an insert-first policy would always succeed on INSERT and never on UPDATE.

Existing Teradata TPump scripts for upsert do not need to be changed. The syntax as described below for upsert will continue to be supported:

DO INSERT FOR MISSING UPDATE ROWS;
UPDATE <update-operands>;
INSERT <insert-operands>;

Teradata TPump changes this syntax into Atomic upsert syntax by replacing the semicolon between the UPDATE and INSERT statements with an ELSE keyword to convert the statement pair into a single Atomic upsert statement.

If user-created macros are used in place of the UPDATE and INSERT statements, Teradata TPump generates:

EXEC <update-macro> ELSE EXEC <insert-macro>;

because this statement does not conform to Atomic upsert syntax used by Teradata TPump.