17.00 - The Basic Upsert Feature - Parallel Data Pump

Teradata® Parallel Data Pump Reference

Parallel Data Pump
June 2020
Programming Reference
When using the basic upsert feature:
  • There must be exactly two DML statements in this DML group.
  • The first DML statement must be an UPDATE statement that follows all of the Teradata TPump task rules.
  • The second DML statement must be an INSERT statement.
  • Both DML statements must refer to the same table.
  • The INSERT statement, when built, must reflect the same primary index specified in the WHERE clause of the UPDATE statement. This is true for both a single column primary index and a compound primary index.

By following these rules, a number of uses for the DO INSERT ROWS option can be found. In the past, data could be presorted into INSERTs and UPDATEs, or UPDATEs attempted with all the data, and then do an INSERT on any UPDATEs that failed. With upsert, Teradata TPump needs only one pass of the data to UPDATE rows that need to be updated and INSERT rows that need to be inserted.

To ensure data integrity, the SERIALIZE parameter defaults to ON in the absence of an explicit value if there are upserts in the Teradata TPump job.

When MARK MISSING UPDATE ROWS specified, while using DO INSERT ROWS, Teradata TPump records any UPDATE that fails. This record appears in the Application Error Table, together with an error code that shows that the INSERT of the DO INSERT ROWS was then executed. If the INSERT fails, the INSERT row is also recorded in the Application Error table. The default for an upsert function, however, is not to mark missing update rows. This is because when the upsert function is performed, the INSERT is expected to occur when the UPDATE fails. The failure of the UPDATE portion of an upsert does not, in itself, constitute an error and should not be treated as one.

The MARK MISSING DELETE ROW option has no meaning when used with the DO INSERT ROWS option.

The option of MARK (IGNORE) EXTRA DELETE (UPDATE) ROWS provides Teradata TPump with a way to protect against an update or delete affecting multiple rows, which can happen in Teradata TPump because the primary index can be non-unique.

MARK is the default for all DML options, except for an upsert.