17.00 - Script Writing Guidelines - Parallel Data Pump

Teradata® Parallel Data Pump Reference

Parallel Data Pump
June 2020
Programming Reference
The following script writing guidelines will help write a Teradata TPump job script:
  • A script may contain up to 100 IMPORTs (tasks), delimited by a leading BEGIN LOAD command and a trailing END LOAD command.
  • The BEGIN LOAD command specifies the number of sessions and establishes a number of controlling parameters.

    The BEGIN LOAD command also specifies the error table, and is the only table specified. An optional qualifying database name may also be specified. This database name may be different from the database being modified, thus allowing tables to be created and dropped with no impact on the production database.

    In addition, the BEGIN LOAD command establishes acceptable threshold levels for important task controls, such as number and percentage of errors, session limits, duration of logon attempts in hours (tenacity), and checkpointing frequency. This command also provides optional controls to:
    • determine where any macros are placed
    • guarantee serial operations on given rows
    • select the number of statements to pack into a multiple-statement request
    • select a restart logic mode
  • The next item appearing in a script is usually a description of the records in the external file containing the change data for the target tables. The description of these input records appears in a sequence of commands headed by the LAYOUT command.

    The LAYOUT command tags the record layout being depicted with a unique name, which is then referenced by subsequent script commands in tasks throughout the rest of the job. The LAYOUT is followed by the supporting information contained in the sequence of one or more FIELD, FILLER, and TABLE commands.

  • Each FIELD command describes a single data item occupying a column in the input row. These items are described by data type, starting position, length, and several other characteristics. The FIELD command is used only for those items (columns) relevant to the current task, which are to be sent to the database as changes to the target table.

    The FIELD command may include the KEY modifier if the column is to be considered part of the primary index for purposes of serialization.

  • Each FILLER command describes a column in the input row in the same way as the FIELD command. These FILLER fields are never sent to the database. The FILLER command, however, identifies those columns which should not be sent to the database. Thus, if a sequence of 10 alternating FIELD and FILLER commands is used to describe 10 contiguous columns in the row, every other column, a total of five columns, would be sent to the database.
  • The TABLE command identifies any existing table with the same layout as the input. The TABLE command is used when the changes are being enacted on entire rows, rather than selected columns.
  • The next entry in the script is the DML command, which is followed by the DML statements INSERT, UPDATE, and DELETE. The DML command creates an identifying label for the DML statement input, which immediately follows the command. The DML command also defines an error handling process for handling missing and duplicate rows, with respect to the error table.

    The three DML statements (INSERT, UPDATE, and DELETE) follow the DML command, and may occur in any order and in any quantity. The INSERT statement is used to place a complete and entirely new row into the target table.

    The UPDATE statement takes the data contents from columns in the input record, as defined with the LAYOUT, FIELD, FILLER command sequence, and substitutes the data into the target table. The UPDATE rows are selected based on criteria specified in a conditional clause in the statement.

    The DML command also allows UPDATE and INSERT statements to be paired to provide Teradata TPump with an upsert capability. This allows Teradata TPump, in a single pass, to attempt an UPDATE and, if it fails, perform an INSERT on the same row.

    The DELETE statement removes entire rows from the target table whenever the evaluation of the deleting condition is true, as specified in a conditional clause in the statement.

  • The only information not yet provided in the task is the identity of the input data file, the starting and ending records in the file that are being used in this task, and other related information. This is done with the IMPORT command. This command basically tells the Teradata TPump utility to bring in file X, from record A through record N, to associate the layout name (and specifications) with the input records, and to apply the desired DML (INSERT, UPDATE, and DELETE) statement to each record.
  • The last command in the script is the END LOAD command. This command flags the end of the commands and statements for the task, and triggers the program to begin execution of the task.

    For compatibility with the MultiLoad utility, multiple IMPORTs (up to 100) are allowed within a single BEGIN/END LOAD pair. However, because Teradata TPump does have an apply phase, there is no significant difference between a script containing 100 BEGIN/END LOAD pairs, each with one IMPORT, and a script with one BEGIN/END LOAD pair and 100 IMPORTs.