Syntax - Parallel Data Pump

Teradata® Parallel Data Pump Reference
Product: Parallel Data Pump
Release Number: 17.00
Published: June 2020
Language: English (United States)
Last Update: 2020-06-19
Product Category: Teradata Tools and Utilities


where the following is true:

LABEL
Keyword indicating that the following parameter is a label for the DML statements that follow
label
Unique label to be used for the immediately following set of one or more DML statements
A label must obey the same rules for its construction as Teradata SQL column names.
The label name may be referenced in the APPLY clause of an IMPORT command.
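For illustration only, a minimal sketch of a label in context (the database, table, field, file, and layout names are assumed for the example, not taken from this reference):

  .DML LABEL insert_emp;
  INSERT INTO sandbox.employee (emp_no, emp_name)
      VALUES (:emp_no, :emp_name);

  .IMPORT INFILE empdata
      LAYOUT emp_layout
      APPLY insert_emp;

Here insert_emp is defined by the DML command and then referenced in the APPLY clause of the IMPORT command.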
MARK
Keyword indicating that the system should make a duplicate, missing, or extra INSERT, UPDATE, or DELETE row entry in the error table and continue processing.
A row is a duplicate row if all column values in the row are the exact duplicate of another row. Duplicate row checking is bypassed if the table is a multiset table (which allows duplicate rows), or if the table has one or more unique indexes (the uniqueness test(s) make any duplicate row check unnecessary).
If MARK is set and a uniqueness violation occurs on either a unique primary index or a unique secondary index, the offending rows go to the error table, whether or not the row is a duplicate row. In the case of an upsert, both the INSERT and UPDATE portions must fail for an error to be recorded.
If neither MARK nor IGNORE is specified for duplicate rows, MARK applies to both INSERTs and UPDATEs. Similarly, if neither MARK nor IGNORE is specified for missing or extra rows, MARK applies to both UPDATEs and DELETEs.
MARK is the default for:
  • Both UPDATEs and DELETEs that refer to missing or extra rows.
  • Duplicate rows arising from both INSERTs and UPDATEs, except when those statements are combined to form an upsert, in which case the default is IGNORE.
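As a hedged sketch (the table, field, and label names are assumed), a DML command that explicitly marks duplicate and missing rows so offending rows are recorded in the error table while processing continues:

  .DML LABEL update_emp
      MARK DUPLICATE UPDATE ROWS
      MARK MISSING UPDATE ROWS;
  UPDATE sandbox.employee
      SET emp_name = :emp_name
      WHERE emp_no = :emp_no;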
IGNORE
Keyword indicating that the system should not make an error table entry for a duplicate, missing, or extra INSERT, UPDATE, or DELETE row, and should instead continue processing.
A row is a duplicate row if all column values in the row are the exact duplicate of another row. Duplicate row checking is bypassed if the table is a multiset table (which allows duplicate rows), or if the table has one or more unique indexes (the uniqueness test(s) make any duplicate row check unnecessary); in these cases, IGNORE DUPLICATE ROWS has no effect. Any uniqueness violations will result in the offending rows going to the error table.
If neither INSERT nor UPDATE is specified for duplicate rows, IGNORE applies to both INSERTs and UPDATEs.
Similarly, if neither UPDATE nor DELETE is specified for missing or extra rows, IGNORE applies to both UPDATEs and DELETEs. IGNORE is the default condition for an upsert operation.
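A comparable sketch (object and label names again assumed) that suppresses error table entries for missing rows on a DELETE:

  .DML LABEL delete_emp
      IGNORE MISSING DELETE ROWS;
  DELETE FROM sandbox.employee
      WHERE emp_no = :emp_no;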
INSERT
Keyword indicating that the upsert feature is to be used (specified as DO INSERT FOR MISSING UPDATE ROWS or DO INSERT ROWS).
An upsert saves time while loading a database. An upsert completes, in one pass, an operation which requires two passes for other utilities. The DML statements that follow this option must be in the order of a single UPDATE statement followed by a single INSERT statement.
This option first executes the UPDATE statement. If the UPDATE fails because the target row does not exist, Teradata TPump automatically executes the INSERT statement. This capability allows updates to the database without first presorting the data. Otherwise, the data would have to be sorted into:
  • rows that need to be updated
  • rows that need to be inserted
Further information on the usage and restrictions of the upsert feature appears in the following usage notes.
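A minimal upsert sketch, with assumed table and field names; note the required order of a single UPDATE statement followed by a single INSERT statement:

  .DML LABEL upsert_emp
      DO INSERT FOR MISSING UPDATE ROWS;
  UPDATE sandbox.employee
      SET emp_name = :emp_name
      WHERE emp_no = :emp_no;
  INSERT INTO sandbox.employee (emp_no, emp_name)
      VALUES (:emp_no, :emp_name);

If the UPDATE finds no target row, the INSERT is executed for that row.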
PARTITION
Optional keyword used to name a session partition to be used for all SQL requests associated with this DML command
If this keyword is not present, a session created from the SESSIONS specification will be used.
If serialization of two or more DML statements is required, the statements cannot be put in different partitions. Serialization requires that all DML statements with identical hash values of the rows be submitted from the same session.
When using multiple DML commands with a pack factor greater than 1, more throughput can be achieved by using partitions. If DBQL analysis shows high PE CPU usage relative to AMP CPU usage, partitions can reduce parsing by making better use of the statement cache: assigning each DML to its own TPump partition keeps all requests within a partition textually identical, so they share a single statement cache entry.
Additional performance benefits can be achieved through concurrent step execution with a higher pack factor and by increasing the number of sessions.
partition_name
Parameter identifying the partition name
The partition name must obey the same rules for its construction as Teradata SQL column names.
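As a sketch of the partitioning pattern described above (partition, table, and label names are assumed), each DML is assigned its own partition so that all requests within a partition are textually identical:

  .DML LABEL insert_emp
      PARTITION ins_part;
  INSERT INTO sandbox.employee (emp_no, emp_name)
      VALUES (:emp_no, :emp_name);

  .DML LABEL update_emp
      PARTITION upd_part;
  UPDATE sandbox.employee
      SET emp_name = :emp_name
      WHERE emp_no = :emp_no;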
SERIALIZEON
Keyword used to turn serialization on for the fields specified
The SERIALIZEON keyword may be placed before, after, or between any IGNORE or MARK statements.
serialize_on_field
Parameter identifying the field names where serialization is turned on
These are the same field names defined in the LAYOUT command that is used by the INSERT statement and referenced by the APPLY clause.
Separate the field names with a comma and enclose them in parentheses.
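A brief sketch, with an assumed field name, turning serialization on for the field that determines the row hash:

  .DML LABEL update_emp
      SERIALIZEON (emp_no);
  UPDATE sandbox.employee
      SET emp_name = :emp_name
      WHERE emp_no = :emp_no;

Here emp_no is assumed to be a FIELD defined in the LAYOUT referenced by the APPLY clause.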
USE
Keyword used to specify the fields that are to be used with a DML’s SQL statements
Use of this keyword allows specification of the FIELDs from the LAYOUT command that are actually needed for each DML, so that data for unused fields is not sent.
The USE keyword may be placed before, after, or between any IGNORE/MARK statements.
use_field
Parameter identifying the field names to use
Every LAYOUT FIELD used by any of the DML’s SQL statements must be enumerated in the USE list; otherwise, an error will occur.
Separate the field names with a comma and enclose them in parentheses.
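A short sketch (layout, field, and table names assumed) in which only the field needed by the DELETE is sent:

  .LAYOUT emp_layout;
  .FIELD emp_no    * INTEGER;
  .FIELD emp_name  * CHAR(20);

  .DML LABEL delete_emp
      USE (emp_no);
  DELETE FROM sandbox.employee
      WHERE emp_no = :emp_no;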
ArraySupport ON/OFF
Option of the .BEGIN LOAD command and the .DML command that specifies whether the ArraySupport feature is used for DML statements
When “ArraySupport ON” is specified in the .BEGIN LOAD command, the .DML commands enclosed in the .BEGIN LOAD and .END LOAD command pair use the ArraySupport feature for their DML statements, unless “ArraySupport OFF” is specified for a particular .DML command. The default value of ArraySupport for the .BEGIN LOAD command is OFF.
When “ArraySupport ON|OFF” is not specified with the .DML command, the default value for ArraySupport for that .DML command is the effective setting of ArraySupport in the .BEGIN LOAD command where the .DML command resides. When “ArraySupport ON|OFF” is specified at the .DML command, the specified value overrides the default setting determined by the .BEGIN LOAD command.
When a .DML command is using the ArraySupport feature, it must contain one and only one DML statement, and the session partition that the .DML command references must be used exclusively by this .DML command.
If the DML statement is an upsert-type statement, it can be specified as a pair of INSERT/UPDATE statements with the DO INSERT FOR MISSING UPDATE clause. Teradata TPump creates the equivalent UPDATE … ELSE INSERT … form (an atomic upsert) and uses it as the actual DML statement. Alternatively, an UPDATE … ELSE INSERT … statement can be specified directly with the DO INSERT FOR MISSING UPDATE clause.
The non-atomic form of UPSERT is not supported by Teradata TPump Array Support.
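A hedged sketch (session count, partition, and object names are assumed) in which ArraySupport is enabled at the .BEGIN LOAD level and used by a single upsert-type .DML command in its own partition:

  .BEGIN LOAD SESSIONS 8
      ARRAYSUPPORT ON
      ERRORTABLE sandbox.emp_err;

  .DML LABEL upsert_emp
      PARTITION upsert_part
      DO INSERT FOR MISSING UPDATE ROWS;
  UPDATE sandbox.employee
      SET emp_name = :emp_name
      WHERE emp_no = :emp_no;
  INSERT INTO sandbox.employee (emp_no, emp_name)
      VALUES (:emp_no, :emp_name);

The LAYOUT, IMPORT, and END LOAD commands that would complete the script are omitted from this sketch.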