Atomic Upsert Examples - Parallel Data Pump

Teradata Parallel Data Pump Reference

Product
Parallel Data Pump
Release Number
15.00
Language
English (United States)
Last Update
2018-09-27
dita:id
B035-3021
lifecycle
previous
Product Category
Teradata Tools and Utilities

Atomic Upsert Examples

This section describes several examples that demonstrate how the Atomic upsert feature works, including cases where errors are detected and returned to the user. All of the examples use the same table, called Sales, as shown below:

CREATE TABLE Sales, FALLBACK,
(ItemNbr    INTEGER NOT NULL,
SaleDate   DATE FORMAT 'MM/DD/YYYY' NOT NULL,
ItemCount  INTEGER)
PRIMARY INDEX   (ItemNbr);

It is assumed that the table has been populated with the following data:

INSERT INTO Sales (10, '05/30/2005', 1);

Assume that there exists a table called NewSales that has the same column definitions as those of table Sales.

Example (Error: Different Target Tables)

This example demonstrates an upsert statement that does not specify the same table name for the UPDATE part and the INSERT part of the statement.

.Dml label upsertdml do insert for missing update rows;
UPDATE Sales SET ItemCount = ItemCount + 1 WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005');
INSERT INTO NewSales (10,'05/30/2005', 1);

A rule of an upsert statement is that only one single table is processed for the statement. Because the tables, Sales and NewSales, are not the same for the upsert statement, an error is returned, indicating that the name of the table must be the same for both the UPDATE and the INSERT.

Example (Error: Different Target Rows)

This example demonstrates an upsert statement that does not specify the same primary index value for the UPDATE part and the INSERT part of the statement.

.Dml label upsertdml do insert for duplicate update rows;
UPDATE Sales SET ItemCount = ItemCount + 1 WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005');
INSERT INTO Sales (20,'05/30/2005', 1);

The primary index values for the UPDATE and the INSERT must be the same. Otherwise, we would be looking at two different rows: one for UPDATE and the other for INSERT, which is not the purpose of an upsert. An error is returned for the upsert statement because the specified primary index values of 10 and 20 are not the same (the primary index value must be the same for both the UPDATE and the INSERT).

Example (Valid Upsert UPDATE)

This example demonstrates a successful upsert statement where a row gets updated.

.Dml label upsertdml do insert for missing update rows;
UPDATE Sales SET ItemCount = ItemCount + 1 WHERE (ItemNbr = 10 AND SaleDate = '05/30/2005'); 
INSERT INTO Sales (10, '05/30/2005', 1);

After all of the rules have been validated, the row with ItemNbr = 10 and SaleDate = '05/30/2005' gets updated. A successful update of one row is returned.

Example (Valid Upsert INSERT)

This example demonstrates a successful upsert statement where a row gets inserted.

.Dml label upsertdml do insert for missing update rows;
UPDATE Sales SET ItemCount = ItemCount + 1 WHERE (ItemNbr = 20 AND SaleDate = '05/30/2005') 
INSERT INTO Sales (20, '05/30/2005', 1);

After all of the rules have been validated and no row was found with Item = 20 and SaleDate = '05/30/2005' for the UPDATE, a new row is inserted with ItemNbr = 20. A successful insert of one row is returned.

END LOAD

Purpose  

The END LOAD command must be present as the last command of a Teradata TPump task to initiate the execution of the task.

Syntax  

EOC

Purpose  

The EOC (End Of Console) command, which is primarily for developer use, indicates “end of console” when TPump script is input from the console. Using the EOC command in TPump batch mode will cause TPump job to terminate.

Syntax  

 

EXECUTE

Purpose  

Teradata TPump supports the Teradata SQL EXECUTE statement, which specifies a user‑created (predefined) macro for execution. The EXECUTE statement specifies the type of DML statement (INSERT, UPDATE, DELETE, or UPSERT) to be handled by the macro.

The macro named in this EXECUTE statement must reside in Teradata Database before the import task starts. Only one DML statement (INSERT, UPDATE, DELETE, or UPSERT) can be specified in a Teradata TPump predefined macro.

Notice:

The SQL EXECUTE command must be used between the BEGIN LOAD command and the END LOAD command.

Syntax  

where

 

Syntax Element

Description

database

Name of the database in Teradata Database where the macro to be executed resides

name

Name of the macro resident in Teradata Database to be executed

DELETE/DEL

Keyword indicating a DELETE statement is being executed by the macro

INSERT/INS

Keyword indicating an INSERT statement is being executed by the macro

UPDATE/UPD

Keyword indicating an UPDATE statement is being executed by the macro

UPSERT/UPS

Keyword indicating an Atomic upsert is being executed by the macro

Usage Notes

Using predefined macros saves time because Teradata TPump does not need to create and drop new macros each time a Teradata TPump job script is run.

The rules for user‑created macros are:

  • Teradata TPump expects the parameter list for any macro to match the FIELD list specified by the LAYOUT in the script. FILLER fields are ignored. If the USE clause is used in the DML statement, Teradata TPump expects the parameter list for every macro in the DML statement to match the field list specified by the USE clause. The order should be the same as the fields in the LAYOUT.
  • The macro should specify a single prime index operation: INSERT, UPDATE, DELETE, or UPSERT. Teradata TPump reports an error if the macro contains more than one supported statement.
  • The restrictions on INSERT, UPDATE, DELETE, and UPSERT statements supported by Teradata TPump are described in the corresponding sections of this manual.
  • If the EXECUTE statement is replacing an INSERT, UPDATE, DELETE, or UPSERT statement in a job script, the EXECUTE statement must be placed at the same location as the INSERT, UPDATE, DELETE, or UPSERT statement that it replaces. The following example shows an INSERT statement is replaced by an equivalent predefined macro:

       
       .DML LABEL LABELA;
       DELETE <deleteoperands> ;
       INSERT <insertoperands> ;
       UPDATE <updateoperands> ;
       
       .DML LABEL LABELA ;
       DELETE <deleteoperands> ;
       EXECUTE <insertmacroname> INSERT ;
       UPDATE <updateoperands> ;
       

    The correct syntax for a Teradata TPump predefined macro is one of the following:

  • CREATE MACRO <name> (<parameter list>) as (UPDATE....; ) ;
  • CREATE MACRO <name> (<parameter list>) as (INSERT.....; ) ;
  • CREATE MACRO <name> (<parameter list>) as (DELETE.....; ) ;
  • CREATE MACRO <name> (<parameter list>) as (UPSERT.....; ) ;
  • If a Teradata Database server supports Atomic upsert, then automatic use of Atomic upsert is allowed, when possible, without changing existing Teradata TPump scripts. This is accomplished in the following manner:

  • Teradata TPump attempts to use the Atomic upsert syntax in defining a single UPSERT macro (instead of an UPDATE/INSERT macro pair).
  • If the UPSERT macro is successfully defined, Teradata TPump uses the Atomic upsert function for the UPSERT.
  • If an error occurs during UPSERT macro definition, presumably due to a violation of Teradata Database Atomic upsert restrictions, Teradata TPump issues a warning and reverts to the current Teradata TPump upsert method of paired UPDATE/INSERT statements.
  • Teradata TPump will continue to operate as it does now when the existing Teradata TPump syntax for upsert is encountered, and references to predefined macros are used for either the UPDATE or the INSERT or both.

    For example:

                    .DML LABEL <dmllabelname> DO INSERT FOR MISSING UPDATE ROWS ... ;
                    EXECUTE <updatemacroname> UPDATE ;
                    INSERT <insertoperands> ;
     
                    .DML LABEL <dmllabelname> DO INSERT FOR MISSING UPDATE ROWS ... ;
       UPDATE <updateoperands> ;
                    EXECUTE <insertmacroname> INSERT ;
     
                    .DML LABEL <dmllabelname> DO INSERT FOR MISSING UPDATE ROWS ... ;
                    EXECUTE <updatemacroname> UPDATE ;
                    EXECUTE <insertmacroname> INSERT ;

    To allow for the use of predefined macros that take advantage of Atomic upsert, Teradata TPump command syntax supports an UPSERT macro:

                    .DML LABEL <dmllabel>;
                    EXECUTE <upsertmacroname> UPSERT ;

    When using predefined macros for atomic upserts, the DML statement will have “Ignore Missing Update Rows” as a default option.

    Atomic upsert syntax is not backward compatible; thus it cannot be used until the Teradata Database server is updated to a compatible release. If Teradata Database supports Atomic upsert, a Teradata TPump run can handle a mix of both standard and Atomic upserts.

    Upserts are reported as UPDATEs and INSERTs in the statistics displayed by Teradata TPump (and passed to the NOTIFY EXIT routine), because an Atomic upsert that results in an UPDATE will be reported by Teradata Database as an UPDATE activity type, and an Atomic upsert that results in an INSERT will be reported by Teradata Database as an INSERT activity type.

    FIELD

    Purpose  

    The FIELD command specifies a field of the input record; it can also contain a NULLIF expression. All fields specified by FIELD commands are sent to Teradata Database. Only fields relevant to the tasks using this layout need be specified.

    Syntax  

    where

     

    Syntax Element

    Description

    fieldname

    Name of an input record field to which:

    1 a DML statement refers,

    2 a nullexpr of a FIELD command or condition expression of a LAYOUT command refers, or

    3 a condition expression of the IMPORT command APPLY clause refers.

    A fieldname must obey the same rules for its construction as Teradata SQL column names.

    fieldname can be referenced in other FIELD commands via NULLIF and field concatenation expressions, and in APPLY WHERE conditions in IMPORT commands.

    startpos

    Starting position of a field of the data records in an external data source

    It may be specified as an unsigned integer which is a character position starting with 1, or as an asterisk which means the next available character position beyond the preceding field. Nothing prevents redefinition of positions of input records by specifying the same positions in multiple FIELD commands.

    Note that where input records may be continued by use of the CONTINUEIF condition, a startpos specified as an unsigned integer refers to a character position in the final concatenated result from which the continuation indicator fields have been removed. Refer to the description of the condition parameter of the LAYOUT command.

    datadesc

    Type and length of data in the field.

    Teradata TPump generates the USING phrase accordingly with the user‑assigned field name to which the body of the DML statement refers.

    MACROCHARSET

    An optional clause for character fields only (including character fields constructed via concatenation). If a fieldlevel MACROCHARSSET is present for a field of character type, TPump generates a CHARACTER SET clause in the macro creation DDL for the corresponding field using the specified character set.

    servercharsetname

    Server storage character set name. Refer to the following two documents for the valid server storage character set names: International Character Set Support (B035‑1125) and SQL Data Definition Language (B0351144).

    nullexpr

    Condition used for selectively inserting a null value into the affected column

    The condition is specified as a conditional expression involving any number of fields, each represented by its fieldname, and constants. All fieldnames appearing in the conditional expression must be defined by any of the following specifications:

    1 The startpos and datadesc parameters of the FIELD command.

    2 A FILLER command.

    3 A TABLE command.

    If the specified condition is true, Teradata TPump sends the data to Teradata Database with indicators, whether or not the INDICATORS option is specified on the LAYOUT command.

    When the character set of the job script is different from the client character set used for the job (for example, on z/OS the job script must be in Teradata EBCDIC when using the UTF‑8 client character set, or UTF‑16 client character set can be used with the job script in UTF‑8), Teradata TPump will translate the string constants specified in the expression and the import data referenced in the expression to the same character set before evaluating the expression.

    For example, when the client character set is UTF‑16 and the script character set is UTF‑8, if the following commands are given:

    .field C1  *  varchar(20);
    .field C2  *  varchar(40) nullif c1 = 'DELETED';

    Teradata TPump will translate the data in the C1 field to the UTF‑8 form and compare it with the UTF‑8 form of 'DELETED' to obtain the evaluation result.

    Similarly, on the mainframe, when the client character set is UTF8 and the script character set is Teradata EBCDIC, if the following commands are given:

    .field C1  *  char(20);
    .field C2  *  rchar(40) nullif c1 = 'removed';

    Teradata TPump will translate the data in the C1 field from the UTF8 form to the Teradata EBCDIC form and compare it to the Teradata EBCDIC form of 'removed' to obtain the valuation result.

    nullexpr
    Continued

    Notice:

    When using UTF‑8 client set on the mainframe, be sure to examine the definition in International Character Set Support (B035‑1125) to determine the code points of the special characters which might be required. Different versions of EBCDIC do not always agree as to the placement of these characters.

    The mappings between Teradata EBCDIC and Unicode can be referred to in International Character Set Support (B035‑1125).

    The fieldname1 parameter in other FIELD commands can be referenced in nullexpr.

    fieldexpr

    Concatenation of two or more items, either:

  • fields
  • character constants
  • string constants
  • or a combination of these, as in:

    fieldname1||'C'||fieldname2||'STRING'||fieldname3...

    The field names within a layout must be unique and the data type of the field must be either CHAR or VARCHAR. Nested concatenations are not supported.

    If all items of the fieldexpr are fixed character (for example, no VARCHARs), the data type of the resulting field is CHAR(m), where “m” is the sum of the length for each component item.

    If at least one component of the fieldexpr is a VARCHAR, the data type of the resulting field is VARCHAR(m), where “m” is the sum of the maximum length for each component item.

    When the character set of the job script is different from the client character set used for the job (for example, on z/OS the job script must be in Teradata EBCDIC when using the UTF8 client character set, or UTF‑16 client character set can be used with the job script in UTF8), Teradata TPump will translate the character constants and the string constants specified in the expression from the script character set form to the client character set form before concatenating the constants with the specified fields.

    Notice:

    When using the UTF‑8 client set on the mainframe, be sure to examine the definition in International Character Set Support (B035‑1125) to determine the code points of the special characters which might be require. Different versions of EBCDIC do not always agree as to the placement of these characters.

    The mappings between Teradata EBCDIC and Unicode can be referred to in International Character Set Support (B035‑1125).

    DROP

    Characters present in the specified position(s) to be dropped from the specified fieldname, which must be of a character data type

    Teradata TPump drops the specified characters and presents the field to Teradata Database as a VARCHAR data type.

    If two dropping actions are specified, they must not be identical.

    If a FIELD command defines a fieldname in terms of two or more concatenated fieldname fields, any specified DROP clause applies to the concatenated result, not to the individual fieldname fields. But, because each fieldname must be defined by its own previous FIELD command, a DROP clause can be specified on these commands to apply to the individual fields.

    KEY

    Keyword, which, when added to the end of the FIELD command, specifies that the field is part of the hash key for purposes of serialization, if the SERIALIZE parameter on the BEGIN LOAD command is active.

    The serialization feature is meaningful only when a primary key for the loaded data is specified via this KEY option.

    Usage Notes  

    One or more FIELD commands may be intermixed with the TABLE command and the FILLER command. These commands must follow a LAYOUT command.

    If an input record field in fieldname is redefined, the data type from “character” to “decimal” with the datadesc parameter cannot be changed. This is illegal in Teradata TPump and will abort the job and return an error message.

    If both NULLIF and DROP LEADING/TRAILING BLANKS/NULLSis specified on the same FIELD command, the DROP clause is evaluated after the NULLIF clause. As an example, in the FIELD command:

    .FIELD FIELDNAME * CHAR (5) NULLIF FIELDNAME = 'x'
     DROP LEADING BLANKS;

    if the input for fieldname is 'x', the NULLIF expression would evaluate to false because the leading blanks are not dropped before the NULLIF evaluation.

    If a field‑level MACROCHARSSET is present for a field of character type, TPump generates a CHARACTER SET clause in the macro creation DDL for the corresponding field using the specified character set.

    If no field‑level MACROCHARSET is present for a field of character type, but a layout/schema‑level MACROCHARSET is present, TPump generates a CHARACTER SET clause in the macro creation DDL for the corresponding field using the specified layout character set.

    Absent user specification, the following clause will be generated for each character macro parameter CHARACTER SET LATIN, when the client session character set is:

  • Is ASCII
  • Is EBCDIC
  • Ends in _0A or _0E
  • Ends in _zx, where “z” is other than 0 (digit zero)
  • Does not end in _zx or _zxx, except for KATAKANAEBCDIC, UTF‑8, or UTF‑16
  • TPump generates CHARACTER SET UNICODE in all other cases.