DML - Parallel Data Pump

Teradata Parallel Data Pump Reference

Product
Parallel Data Pump
Release Number
15.10
Language
English (United States)
Last Update
2018-10-07
dita:id
B035-3021
lifecycle
previous
Product Category
Teradata Tools and Utilities

DML

Purpose  

The DML command defines a label and error treatment options for one or more immediately following DML statements. DML statements relevant to a Teradata TPump job are INSERT, UPDATE, DELETE, and EXECUTE, with UPDATE and INSERT statements sometimes paired to form either a basic upsert or an Atomic upsert operation.

Syntax  

where

 

Syntax Element

Description

LABEL

Keyword indicating that the following parameter is a label for the DML statements that follow

label

The unique label to be used for the immediately following set of one or more DML statements

A label must obey the same rules for its construction as Teradata SQL column names.

The label name may be referenced in the APPLY clause of an IMPORT command.
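For example, a label defined on a DML command can later be referenced from the APPLY clause of an IMPORT command. A minimal sketch, with illustrative table, layout, and field names:

   .DML LABEL updemp;
   UPDATE Employee SET Salary = :Salary WHERE EmpNo = :EmpNo;

   .IMPORT INFILE empdata
     LAYOUT emplayout
     APPLY updemp;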

MARK

Keyword indicating that the system should make a duplicate, missing, or extra INSERT, UPDATE, or DELETE row entry in the error table and continue processing.

A row is a duplicate row if all column values in the row are the exact duplicate of another row. Duplicate row checking is bypassed if the table is a multiset table (which allows duplicate rows), or if the table has one or more unique indexes (the uniqueness test(s) make any duplicate row check unnecessary).

If MARK is set and a uniqueness violation occurs on either a unique primary index or a unique secondary index, the offending rows go to the error table, whether or not the row is a duplicate row. In the case of an upsert, both the INSERT and UPDATE portions must fail for an error to be recorded.

 

If neither MARK nor IGNORE is specified for duplicate rows, MARK applies to both INSERTs and UPDATEs. Similarly, if neither MARK nor IGNORE is specified for missing or extra rows, MARK applies to both UPDATEs and DELETEs.

MARK is the default for:

  • Both UPDATEs and DELETEs that refer to missing or extra rows.
  • Duplicate rows arising from both INSERTs and UPDATEs, except when those statements are combined to form an upsert, in which case the default is IGNORE.

IGNORE

Keyword indicating that the system should not make an error table entry for a duplicate, missing, or extra INSERT, UPDATE, or DELETE row

The system should continue processing instead.

A row is a duplicate row if all column values in the row are the exact duplicate of another row. Duplicate row checking is bypassed if the table is a multiset table (which allows duplicate rows), or if the table has one or more unique indexes (the uniqueness tests make any duplicate row check unnecessary); in these cases, IGNORE DUPLICATE ROWS has no effect. Any uniqueness violations will result in the offending rows going to the error table.

If neither INSERT nor UPDATE is specified for duplicate rows, IGNORE applies to both INSERTs and UPDATEs.

Similarly, if neither UPDATE nor DELETE is specified for missing or extra rows, IGNORE applies to both UPDATEs and DELETEs. IGNORE is the default condition for an upsert operation.

INSERT

Keyword enabling the upsert feature (used as DO INSERT FOR MISSING UPDATE ROWS or DO INSERT ROWS)

An upsert saves time while loading a database: it completes, in one pass, an operation that requires two passes for other utilities. The DML statements that follow this option must be in the order of a single UPDATE statement followed by a single INSERT statement.

This option first executes the UPDATE statement. If the UPDATE fails because the target row does not exist, Teradata TPump automatically executes the INSERT statement. This capability allows updates to the database without first presorting the data into:

  • rows that need to be updated
  • rows that need to be inserted

Further information on the usage and restrictions of the upsert feature appears in the following usage notes.
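A minimal upsert sketch, with illustrative table and field names, showing the required order of a single UPDATE followed by a single INSERT:

   .DML LABEL upsemp
     DO INSERT FOR MISSING UPDATE ROWS;
   UPDATE Employee SET Salary = :Salary WHERE EmpNo = :EmpNo;
   INSERT INTO Employee (EmpNo, Salary)
     VALUES (:EmpNo, :Salary);

If the UPDATE finds no row with the given EmpNo, the INSERT is executed in its place.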

PARTITION

Optional keyword used to name a session partition to be used for all SQL requests associated with this DML command

If this keyword is not present, a session created from the SESSIONS specification is used.

Note: If serialization of two or more DML statements is required, the statements cannot be put in different partitions. Serialization requires that all DML statements with identical hash values of the rows be submitted from the same session.

When multiple DMLs are used with a pack factor greater than 1, more throughput can be achieved by using partitions. If DBQL analysis shows high PE CPU relative to AMP CPU, partitions can reduce parsing through better use of the statement cache: assigning each DML to a unique TPump partition makes all requests within a partition textually identical, so they share a single statement cache entry.

Additional performance benefits can be achieved with concurrent step execution, a higher pack factor, and an increased number of sessions.
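As a sketch of this technique, each DML command can be assigned its own named partition so that all requests submitted on a given partition are textually identical (partition, label, table, and field names below are illustrative):

   .DML LABEL insacct PARTITION ptn_ins;
   INSERT INTO Accounts (AcctNo, Balance) VALUES (:AcctNo, :Balance);

   .DML LABEL updacct PARTITION ptn_upd;
   UPDATE Accounts SET Balance = :Balance WHERE AcctNo = :AcctNo;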

partition_name

Parameter identifying the partition name

The partition name must obey the same rules for its construction as Teradata SQL column names.

SERIALIZEON

Keyword used to turn serialization on for the fields specified

The SERIALIZEON keyword may be used before, after, or between any IGNORE or MARK statements.

serialize_on_field

Parameter identifying the field names on which serialization is turned on

This is the same field name used in the LAYOUT command which was used by the INSERT statement and referenced by the APPLY clause.

Separate the field names with commas and enclose them in parentheses.
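For example, to serialize all operations that share the same account number (table and field names are illustrative):

   .DML LABEL updacct SERIALIZEON (AcctNo);
   UPDATE Accounts SET Balance = Balance + :Amt WHERE AcctNo = :AcctNo;

All rows with the same AcctNo hash value are then submitted from the same session, preserving their input order.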

USE

Keyword used to specify the fields that are to be used with a DML's SQL statements

Use of this keyword allows specification of the FIELDs from the LAYOUT command which are actually needed for each DML, so that data from unneeded fields is not sent.

The USE keyword may be placed before, after, or between any IGNORE/MARK statements.

use_field

Parameter identifying the field names to use

Every LAYOUT FIELD used by any of the DML's SQL statements must be enumerated in the USE list; otherwise, an error will occur.

Separate the field names with commas and enclose them in parentheses.
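As an illustration, if the LAYOUT defines three fields but a DELETE needs only the key, USE restricts the data sent to the database (names are illustrative):

   .LAYOUT acctlay;
   .FIELD AcctNo * INTEGER;
   .FIELD Amt    * DECIMAL(10,2);
   .FIELD Memo   * CHAR(50);

   .DML LABEL delacct USE (AcctNo);
   DELETE FROM Accounts WHERE AcctNo = :AcctNo;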

ArraySupport
ON/OFF

Option specifying whether the DML statement uses the ArraySupport feature; “ArraySupport ON|OFF” can appear on both the .BEGIN LOAD command and the .DML command

When “ArraySupport ON” is specified in the .BEGIN LOAD command, the .DML commands enclosed in the .BEGIN LOAD and .END LOAD command pair use the ArraySupport feature for their DML statement, unless “ArraySupport OFF” is specified on the .DML command. The default value of ArraySupport for the .BEGIN LOAD command is OFF.

When “ArraySupport ON|OFF” is not specified with the .DML command, the default value for ArraySupport for that .DML command is the effective setting of ArraySupport in the .BEGIN LOAD command where the .DML command resides. When “ArraySupport ON|OFF” is specified at the .DML command, the specified value overrides the default setting determined by the .BEGIN LOAD command.

When a .DML command is using the ArraySupport feature, it must contain one and only one DML statement, and the session partition that the .DML command references must be used exclusively by this .DML command.

If the DML statement is an UPSERT-type statement, it can be specified as a pair of INSERT/UPDATE statements with the DO INSERT FOR MISSING UPDATE clause. Teradata TPump creates its equivalent form of UPDATE … ELSE INSERT … (an Atomic upsert) and uses it as the actual DML statement. Alternatively, an UPDATE … ELSE INSERT … statement can be directly specified with the DO INSERT FOR MISSING UPDATE clause.

The nonatomic form of UPSERT is not supported by Teradata TPump Array Support.
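As a sketch of the override behavior, array support might be enabled for the load as a whole and then disabled for one DML command (all names illustrative; keyword placement assumed per the description above):

   .BEGIN LOAD SESSIONS 8
     ARRAYSUPPORT ON
     ERRORTABLE errtbl;

   .DML LABEL ins1 PARTITION ptn_ins;      /* inherits ArraySupport ON   */
   INSERT INTO Sales (ItemNo, Qty) VALUES (:ItemNo, :Qty);

   .DML LABEL upd1 ARRAYSUPPORT OFF;       /* overrides the load setting */
   UPDATE Sales SET Qty = :Qty WHERE ItemNo = :ItemNo;

Note that ins1, which uses ArraySupport, contains a single DML statement and references a partition used by no other .DML command.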

Usage Notes

The SQL EXECUTE command must be used between the BEGIN LOAD command and the END LOAD command.

All INSERT, UPDATE, DELETE, and EXECUTE statements specified in the Teradata TPump script should fully specify the primary index of the referenced table to prevent the generation of table‑level locks.

A maximum of 1500 DML statements may be packed into a request; the default is 20 statements.

Teradata TPump assumes that row hash locking is used by INSERT, UPDATE, DELETE, and EXECUTE statements. If row hash locking is not used, Teradata TPump will still run, but may encounter trouble because table‑level locking will cause each statement to block.

In addition, Teradata TPump does not support UPDATE or EXECUTE statements that modify the primary index of the target table. Teradata TPump performs no checking to prevent the script author from creating DML that requests unreasonable updates, except that Teradata TPump will not use Atomic UPSERT if the UPDATE portion of the UPSERT specifies an unreasonable update. This restriction is imposed by Teradata Database.

IGNORE DUPLICATE ROWS does not apply if the target table has any unique indexes.

Teradata TPump converts INSERT, UPDATE, and DELETE statements into macro equivalents and, depending on the packing specified, submits multiple statements in one request. Teradata TPump also supports the EXECUTE statement, which can be used to bypass the macro creation step for frequently executed macros. For more information on the EXECUTE statement, refer to EXECUTE in this chapter.

The maximum number of INSERT, UPDATE, and DELETE statements that can be referenced in an IMPORT is 128. The 128th DML, which would cause the insertion of the DML sequence number 128 into the DMLSEQ field of the error table, could lead to a Teradata Database 3520 error.

At the end of an IMPORT, an environmental variable is established for each DML command executed. Teradata TPump variables are not limited to 30 characters. These variables contain the activity counts associated with each statement. The variables created are of the form:

&IMP<n>_<Apply label>_<x>

where

n = the number of the IMPORT, from one through four.

Apply label = the label of the APPLY clause containing the DML command in question.

x = the number of the statement within the containing APPLY clause.
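For instance, after the first IMPORT completes, the activity count of the first statement applied under a label named updacct (label name illustrative) would be available as &IMP1_updacct_1, and could be echoed from the script along these lines:

   .DISPLAY '&IMP1_updacct_1 rows updated' TO FILE logfile;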

Serialization

The SERIALIZEON keyword allows serialization to be turned on for the specified fields. The SERIALIZEON keyword can be used before, after, or between any IGNORE or MARK statements.

The SERIALIZEON keyword can also be used with the SERIALIZE keyword in the BEGIN LOAD command and with the KEY keyword in the FIELD command. When it is used in this way, the DML‑level serialization ignores and overrides the BEGIN LOAD‑level serialization.

In addition, the DML serialized APPLYs can be mixed with nonserialized DML APPLYs in the same IMPORT command.

See “BEGIN LOAD” and “FIELD” for details about these commands.