The Basic Upsert Feature - Parallel Data Pump

Teradata Parallel Data Pump Reference

Product
Parallel Data Pump
Release Number
15.00
Language
English (United States)
Last Update
2018-09-27
dita:id
B035-3021
lifecycle
previous
Product Category
Teradata Tools and Utilities

The Basic Upsert Feature

When using the basic upsert feature:

  • There must be exactly two DML statements in this DML group.
  • The first DML statement must be an UPDATE statement that follows all of the Teradata TPump task rules.
  • The second DML statement must be an INSERT statement.
  • Both DML statements must refer to the same table.
  • The INSERT statement, when built, must reflect the same primary index specified in the WHERE clause of the UPDATE statement. This is true for both a single column primary index and a compound primary index.
  • By following these rules, a number of uses for the DO INSERT ROWS option can be found. In the past, data could be presorted into INSERTs and UPDATEs, or UPDATEs attempted with all the data, and then do an INSERT on any UPDATEs that failed. With upsert, Teradata TPump needs only one pass of the data to UPDATE rows that need to be updated and INSERT rows that need to be inserted.

    Note: To ensure data integrity, the SERIALIZE parameter defaults to ON in the absence of an explicit value if there are upserts in the Teradata TPump job.

    When MARK MISSING UPDATE ROWS specified, while using DO INSERT ROWS, Teradata TPump records any UPDATE that fails. This record appears in the Application Error Table, together with an error code that shows that the INSERT of the DO INSERT ROWS was then executed. If the INSERT fails, the INSERT row is also recorded in the Application Error table. The default for an upsert function, however, is not to mark missing update rows. This is because when the upsert function is performed, the INSERT is expected to occur when the UPDATE fails. The failure of the UPDATE portion of an upsert does not, in itself, constitute an error and should not be treated as one.

    The MARK MISSING DELETE ROW option has no meaning when used with the DO INSERT ROWS option.

    The option of MARK (IGNORE) EXTRA DELETE (UPDATE) ROWS provides Teradata TPump with a way to protect against an update or delete affecting multiple rows, which can happen in Teradata TPump because the primary index can be non‑unique.

    MARK is the default for all DML options, except for an upsert.

    Example Upsert

    Each record in the following example contains the value of the primary index column (EmpNo) of a row of the Employee table whose PhoneNo column is to be assigned a new phone number from field Fone.

    When the UPDATE fails, the INSERT statement is activated and Teradata TPump enters the upsert mode. In this case, each record contains the primary index value (EmpNum) of a row that is to be inserted successively into the Employee table whose columns are EmpNo and PhoneNo.

    .BEGINLOAD SESSION number;
    .LAYOUT Layoutname;
    .FIELD EmpNum 1 INTEGER;
    .FIELD Fone * (CHAR (10));
    .DML LABEL DMLlabelname
    DO INSERT FOR MISSING UPDATE ROWS;
    UPDATE Employee SET PhoneNo = :Fone WHERE EmpNo = :EmpNum;
    INSERT Employee (EmpNo, PhoneNo) VALUES (:EmpNum, :Fone);
    .IMPORT INFILE Infilename LAYOUT Layoutname  APPLY DMLlabelname;
    .END LOAD; 

    The scope of a DML command (and its label) terminates at the first following command of any kind or at the end of the file containing the DML statements, whichever occurs first.

    The SQL EXECUTE command must be between the BEGIN LOAD command and END LOAD command.

    For IMPORT tasks, up to seven distinct error treatment options for one DML command may be specified. For example:

    .DML LABEL COMPLEX 
    IGNORE DUPLICATE INSERT ROWS 
    MARK   DUPLICATE UPDATE ROWS 
    IGNORE MISSING   UPDATE ROWS 
    MARK   MISSING   DELETE ROWS 
    DO INSERT FOR MISSING   UPDATE ROWS; 

    It is valid to specify that missing update rows be both marked and treated as INSERTs or, as in the example, both ignored and treated as INSERTs.

    If Teradata TPump encounters any of the following:

  • no DML command in an IMPORT task,
  • DML statements outside the scope of a DML command in an IMPORT task, or
  • a DML command with no DML statements in an IMPORT task,
  • it writes a diagnostic message to the primary output destination for the system, terminates the Teradata TPump task, and returns to the main Teradata TPump control module with a conventional nonzero return code. The error can be corrected and resubmitted to the Teradata TPump task.

    The DML commands (with their following DML statements) must appear between the appropriate BEGIN LOAD command and the IMPORT commands that refer to them. When the END LOAD command is encountered, the sequence of DML commands and DML statements is forgotten. The DML command cannot be shared by multiple BEGIN LOAD statements. The DML statements are described in the following sections.

    The maximum number of DML commands that can be used in a single Teradata TPump task is 128. If an excessive number of DML commands and statements are sent to Teradata Database, an error message is logged, stating that there are too many DML steps for one Teradata TPump job.