BEGIN LOAD - Parallel Data Pump

Teradata Parallel Data Pump Reference

Product
Parallel Data Pump
Release Number
15.10
Language
English (United States)
Last Update
2018-10-07
dita:id
B035-3021
lifecycle
previous
Product Category
Teradata Tools and Utilities

BEGIN LOAD

Purpose  

The BEGIN LOAD command initiates or restarts a Teradata TPump task, specifying the number of sessions to use and any other parameters needed to execute the task.

Syntax

where

 

Syntax Element

Description

SESSIONS

Keyword for the number of Teradata TPump sessions.

number

Number of sessions to be logged on for update purposes for Teradata TPump.

A Teradata TPump task logs on and uses the number of sessions specified. One additional session is used for performing various utility functions.

There is no default value for number; it must be specified. Neither is there a maximum value, except for systemwide session limitations, which vary among machines.

Limiting the number of sessions conserves resources on both the external system and Teradata Database. This conservation is at the expense of a potential decrease in throughput and increase in elapsed time.

threshold

Minimum number of sessions to be logged on for update purposes for the utility.

When logging on sessions, if the limits are reached above the threshold value, Teradata TPump stops trying to log on, and uses whatever sessions are already logged on.

If the sessions run out before the threshold is reached, Teradata TPump logs off all sessions, waits for the time determined by the SLEEP value, and tries to log on again.

ERRORTABLE

Optional keyword for identifying a database and error table.

Use a database name as a qualifying prefix to the error tables. Specifying a database that is not the production database avoids cluttering the production system with error tables. This means that because the database should have been allocated a lot of PERM space, that space will not have to be increased for all databases with tables involved in the Teradata TPump task.

Notice:

It is the user’s responsibility to ensure that the error table sharing was meaningful. Two jobs targeting different tables with different input record layouts cannot share the same error table.

APPEND

Specifies the existing error table.

If the specified error table does not exist, Teradata TPump will create it. If the structure of the existing error table is not compatible with the error table Teradata TPump creates, the job will run into an error when Teradata TPump tries to insert or update the error table.

NODROP

Specifies to retain (not DROP) the error table, even it is empty at the end of a job.

NODROP can be used with APPEND to persist the error table or alone.

QUEUETABLE

Selects the error table as a Queue Table.

dbname.

The qualified database for the error table.

If the database is not specified, the database which contains the log table is used. The period following the dbname separates the database name from the tname parameter. If a different database is specified, it may help to avoid cluttering the production database with error tables.

tname

Error table that receives information about errors detected during the load.

tname may be preceded by a database name qualifier. This table is referred to as the error table or ET table.

TPump explicitly specifies FALLBACK as the level of protection.

The default error table name is composed of the job name, followed by an underscore and sequence number of the load, then an underscore and an ET, as in jobname_nnn_ET.

tname identifies a nonexisting table for a nonrestart task, or an existing table for a restart task.

For all errors inserted in this error table, the identifiers for the offending combination of statement and data record are included in the appropriate row of tname. The columns in the error table allow the identification of a specific data record and statement combination which produced an error. The column names and definitions of the error table are:

ImportSeqA byteint containing the IMPORT command sequence number.

DMLSeqA byteint containing the sequence number of the DML command within the command file.

SMTSeqA byteint containing the sequence number of the DML statement within the DML command.

ApplySeqA byteint containing the sequence number of the APPLY clause within its IMPORT command.

SourceSeqAn integer containing the position of a data record within a data source.

DataSeqA byteint identifying the data source. This value is always one.

ErrorCode An integer containing an error return code.

ErrorMsgContains the corresponding error message for the error code.

ErrorField A smallint, which, if valid, indicates the bad field.

The names of record fields sent to Teradata Database are specified via the LAYOUT command, in conjunction with FIELD and TABLE commands.

HostData A variable length byte string containing the data sent by the external system.

RowInsetTimeIndicates when the row was inserted into the Teradata TPump error table.

ERRLIMIT

Optional keyword for setting a limit on records rejected for errors.

When the ERRLIMIT is exceeded, Teradata TPump performs a checkpoint, then terminates the job. The data read before ERRLIMIT was exceeded will be submitted and completed before the job is terminated. This means when a job is terminated due to ERRLIMIT was exceeded, there may be more error records in the error table than the number specified in ERRLIMIT. To facilitate diagnosis of data errors, the ERRLIMIT should be greater than the number of statements packed into one request.

errcount

Error threshold for controlling the number of rejected records. Usage depends on whether used with the errpercent parameter.

  • When used without the errpercent parameter, specifies, as an unsigned integer, the number of records that can be rejected and recorded in tname during a load (all records sent between the BEGIN LOAD and END LOAD commands). The default is no limit.
  • When used with the errpercent parameter (which is approximate), specifies the maximum number of records that must be sent to Teradata Database before the errpercent parameter is applied.
  • For example, if errcount = 100 and errpercent = 5, then 100 records must be sent to Teradata Database before the approximate 5 percent rejection limit is applied. If only the first five records are rejected when the 100th record is sent, the limit is not exceeded. If there are six rejections, however, then the limit is exceeded. After the 100th record is sent, Teradata TPump stops processing if the 5 percent limit has been exceeded.

    When the limit has been exceeded, Teradata TPump writes an error message to the external system’s customary message destination and terminates the task.

    All tables in use are left in their state at the time of the termination. This allows errors to be corrected in data records and restarting of the task from the last checkpoint. If a restart is not possible or not desired, any unwanted tables should be dropped.

    CHECKPOINT

    Keyword indicating the number of minutes between the occurrences of checkpoints.

    This is followed by a frequency value.

    frequency

    The interval in minutes between check pointing operations. Specify an unsigned integer from 0 through 60, inclusive.

    To specify a CHECKPOINT frequency of less than or equal to 60, a checkpoint is recorded at the specified frequency, in minutes.

    To specify a CHECKPOINT frequency of more than 60, Teradata TPump terminates the job.

    Specifying a CHECKPOINT frequency of zero bypasses all checkpoint functions. Teradata TPump does not do checkpoint operations at all, which is different from Teradata Tools and Utilities 07.00 and earlier. The Teradata TPump job will not be restart-able once data loading has begun.

    If a CHECKPOINT frequency is not specified, check pointing occurs every 15 minutes by default.

    Whether specified or not, checkpoints are written at the end of each data input source.

    Note: Checkpoints should not be set for an FDLcompatible INMOD routine with the FOR, FROM, or THRU options. When an FDL‑compatible INMOD routine is used with the FOR, FROM, or THRU options, Teradata TPump terminates and an error message appears if the checkpoint frequency is other than zero.

    DATAENCRYPTION ON/OFF

    Keyword to encrypt import data and the request text during the communication between Teradata TPump and Teradata Database.

    If ON, the encryption will be performed. If DATAENCRYPTION is not specified, the default is OFF.

    The "y" runtime parameter applies the encryption to all connected sessions, which include the control session and the load sessions. This option only applies the encryption to the load sessions, which are the sessions specified by the SESSIONS keyword in the BEGIN LOAD command, and overrides the "y" runtime parameter when OFF is explicitly specified. For example, assuming the PARTITION command is not used in the job, when "y" runtime parameter is specified with the job and DATAENCRYPTION OFF is specified in the script, the encryption will only apply to the control session. Similarly, assuming the PARTITION command is not used in the job when "y" runtime parameter is not specified with the job, and DATAENCRYPTION ON is specified in the script, the encryption will apply to all load sessions but not the control session.

    When the PARTITION command is used, the encryption setting explicitly specified in the PARTITION command will override the setting of this option over the sessions defined by the PARTITION command.

    ArraySupport
    ON/OFF

    “ArraySupport ON|OFF” option to the .BEGIN LOAD command and the .DML command.

    When “ArraySupport ON” is specified in the .BEGIN LOAD command, the .DML commands enclosed in .BEGIN LOAD and .END LOAD command pair will use the ArraySupport feature for its DML statement, unless “ArraySupport OFF” is specified for the .DML command. The default value of ArraySupport for the .BEGIN LOAD command is OFF.

    When “ArraySupport ON|OFF” is not specified with the .DML command, the default value for ArraySupport for that .DML command is the effective setting of ArraySupport in the .BEGIN LOAD command where the .DML command resides. When “ArraySupport ON|OFF” is specified at the .DML command, the specified value overrides the default setting determined by the .BEGIN LOAD command.

    When a .DML command is using the ArraySupport feature (defined in the .BEGIN LOAD command or in the .DML command), it must contain one and only one DML statement, and the session partition that the .DML command references needs to be used by this .DML command exclusively.

    If the DML statement is an UPSERTtype statement, it can be specified as a pair of INSERT/UPDATE statements with DO INSERT FOR MISSING UPDATE clause. Teradata TPump will create its equivalent form of UPDATE … ELSE INSERT …, example Atomic Upsert, and use it as the actual DML statement. Or an UPDATE … ELSE INSERT … statement can be directly specified with DO INSERT FOR MISSING UPDATE clause.

    The nonatomic form of UPSERT is not supported by Teradata TPump Array Support.

    TENACITY

    Keyword (with hours parameter) defining how long the utility tries to log on the sessions needed to perform the Teradata TPump job.

    If a logon is denied, Teradata TPump delays for the time specified by the SLEEP parameter (the default is six minutes) and retries the logon. It retries until either the logon succeeds or the number of hours specified by TENACITY is exceeded.

    If the TENACITY parameter is not specified, the utility retries the logons for four hours.

    hours

    Teradata TPump tenacity factor, as an integral number of hours. Specifies how long Teradata TPump keeps trying to logon to the required sessions.

    The default value for hours is 4 if the parameter is not specified. If hours is specified as 0, Teradata TPump does not retry logons after a logon fails because of a capacity limit.

    When a “no more sessions” error appears (either a 301 return code from a workstation CLI or a 513 return code from a mainframe CLI), Teradata TPump drops the sessions already acquired, and terminates the job without trying another logon.

    LATENCY

    Keyword for flushing stale buffers.

    Note: When using the Teradata TPump latency option with Named Pipe Access Module, need_full_block = no option should be added in the Named Pipe Access Module initialization string.

    seconds

    Flushing threshold based on number of seconds oldest record has resided in buffer.

    LATENCY cannot be less than one second.

    If the SERIALIZE parameter is set to OFF, only the current buffer can possibly be stale. If SERIALIZE is ON, the number of stale buffers can range from zero to the number of sessions.

    NOTIMERPROCESS

    Keyword to tell Teradata TPump not to fork a child process as a timer process.

    When a child process is forked, the SIGUSR2 signal notifies the parent process when the latency period expires. When a child process is not forked, the SIGALRM signal notifies the Teradata TPump process when the latency period expires. A child process is necessary for the latency function to work properly on the UNIX platforms when the MQSeries Access Module is used.

    minutes

    Number of minutes to wait between unsuccessful logon attempts due to session limits errors on Teradata Database or CLIv2.

    If SLEEP is not specified, the default between unsuccessful logon attempts is 6 minutes.

    SERIALIZE ON/OFF

    Keyword to set the state (ON/OFF) of the serialization feature which, if ON, guarantees that operations on a given key combination (row) occur serially.

    If SERIALIZE is not specified, the default is OFF.

    This feature is meaningful only when a primary key for the loaded data is specified by using the KEY option of the FIELD command.

    Serialization must be set to OFF if the .TABLE command is used and the target table is a NoPI table.

    To ensure data integrity, the SERIALIZE parameter defaults to ON in the absence of an explicit value if there are upserts in a Teradata TPump job.

    PACKMAXIMUM

    Keyword requesting Teradata TPump to dynamically determine the maximum possible PACK factor for the current load.

    Maximum value is 1500.

    Displayed in message UTY6652, the value thus determined should be specifically used on subsequent runs, as the use of PACKMAXIMUM requires iterative interactions with the database during initialization to heuristically determine the maximum possible PACK factor.

    Note: Since the maximum packing factor has changed from 600 to 1500, it will take a longer time for Teradata TPump to test the packing factor if PACKMAXIMUM is specified.

    PACK

    Keyword for the number of statements to pack into a multiplestatement request.

    Maximum value is 1500.

    Packing improves network/channel efficiency by reducing the number of sends and receives between the application and Teradata Database.

    statements

    Number of statements, as a positive integer of up to 1500, to pack into a multiplestatement request.

    Default value is 20 statements per request.

    Under certain conditions, Teradata TPump may determine that the pack factor has been set too high. Teradata TPump then automatically lowers the pack setting to an appropriate value and issues warning message UTY6625, for instance:

    “UTY6625 WARNING: Packing has been changed to 12 statements per request” and continues.

    Packing improves network/channel efficiency by reducing the number of sends/receives between the application and the database.

    The packing factor is validated by sending a fully packed request to Teradata Database using a prepare. This test checks for syntax problems and requests that are excessively large and may overwhelm the parser.

    To simplify the script development process, Teradata TPump ignores certain errors returned by an overloaded parser, shrinks the request, retries the prepare until it executes successfully and finally, issues a warning noting the revised packing factor size.

    When this happens, the Teradata TPump script should be modified to eliminate the warning, thereby avoiding the timeconsuming process of shrinking the request.

    A packing failure may occur if the source parcel length does not match the data defined. If this happens, Teradata TPump issues the message:

    “UTY2819 WARNING: Packing may fail because input data does not match with the data defined.”

    To resolve this problem, increase the packing factor and resubmit the job.

    RATE

    Keyword for entering the rate at which statements are sent to Teradata Database.

    RETRYTIMES nn

    Keyword for retry times number of retry times.

    Default is 16.

    If nn equals 0, the retry times will be set to 16. If retrytimes is set, this only takes effect for the requests/data between "BEGIN LOAD" and "END LOAD" pair.

    statement_rate

    Initial maximum rate at which statements are sent to Teradata Database per minute.

    The statement rate must be a positive integer. If the statement rate is unspecified, the rate is unlimited.

    If the statement_ rate is less than the statement packing factor, Teradata TPump sends requests smaller than the packing factor.

    If the Teradata TPump Monitor is in use, the statement_rate can be changed later on.

    SLEEP

    Keyword for the number of minutes to sleep.

    NOATOMICUPSERT

    Keyword to perform nonatomic upsert operations for UPSERT DMLs in the job script if these UPSERT DMLs are not provided in the Atomic UPSERT form.

    NOMONITOR

    Keyword to prevent Teradata TPump from checking for statement rate changes from, or update status information for, the Teradata TPump Monitor.

    ROBUST ON/OFF

    The ON parameter ensures data integrity when target tables have identity columns.

    The OFF parameter signals Teradata TPump to use simple restart logic. In this case, restarts cause Teradata TPump to begin where the last checkpoint occurred in a job. Any processing that occurred after the checkpoint is redone. This method does not have the extra overhead of the additional database writes in the Robust logic.

    Notice:

    Certain errors may cause reprocessing, resulting in extra error table rows due to re-executing statements (attempting to re‑insert rows, for example). Or, if the target table allows duplicate rows, re-executing statements may cause extra duplicate rows to be inserted into the target table instead of causing extra error table rows.

    Simple logic is adequate in certain DML statements that can be repeated without changing the results of the operation. Examples of statements that are not simple logic include the following:

  • INSERTs into tables that allow duplicate rows (MULTISET tables).
  • Selfreferencing DML statements such as:
  • “UPDATE FOO SET A=A+1...”
  • “UPDATE FOO SET A = 3 WHERE A=4”
  • MACRODB

    Keyword for database to contain any macros used by Teradata TPump.

    dbname

    Name of database which is to contain any macros built/used by Teradata TPump.

    This database overrides the default placement of macros into the database which contains the log restart table.

    NOTIFY

    Teradata TPump implementation of the notify user exit option:

  • NOTIFY OFF suppresses the notify user exit option.
  • NOTIFY LOW enables the notify user exit option for those events signified by “Yes” in the Low Notification Level column of Table 26.
  • NOTIFY MEDIUM enables the notify user exit option for the most significant events, as specified by “Yes” in the Medium Notification Level column of Table 26.
  • NOTIFY HIGH enables the notify user exit option for every Teradata TPump event that involves an operational decision point, as specified by “Yes” in the High Notification Level column of Table 26.
  • NOTIFY ULTRA enables the notify user exit option for every Teradata TPump event that involves an operational decision point, as specified by “Yes” in the ULTRA Notification Level column of Table 26.
  • Beginning with Teradata Tools and Utilities 14.00, TPump terminates with a fatal error if any of NOTIFY error conditions occur.
  • EXIT name

    Keyword phrase that calls a userdefined exit where name is the name of a usersupplied library with a member name of _dynamn.

    The exit must be written in C, or in a language with a runtime environment that is compatible with C.

    On some versions of UNIX systems, ./prefix characters may have to be added to the EXIT name specification if the module is in the current directory.

    TEXT 'string'

    Usersupplied string of up to 80 characters that Teradata TPump passes to the named exit routine.

    The string specification must be enclosed in single quote characters (').

    MSG 'string'

    Usersupplied string of up to 16 characters that Teradata TPump logs to:

  • The operator’s console for mainframeattached z/OS client systems
  • The system log for networkattached client systems for UNIX and Windows systems
  • The string specification must be enclosed in single quote characters (').

    EXIT64 name

    Keyword phrase that calls a user-defined exit where name is the name of a user-supplied library with a member name of _dynamn.

    When EXIT64 keyword is presented, the related row counters use 8-byte unsigned integer instead of 4-byte unsigned integer.

    EXITON name

    Keyword phrase that calls a user-defined exit where name is the name of a user-supplied library with a member name of _dynamn.

    The exit must be written in C, or in a language with a runtime environment that is compatible with C. On some versions of UNIX systems, ./prefix characters may have to be added to the EXITEON name specification if the module is in the current directory.

    The keyword EXITEON automatically supports the keyword EXIT64.

    Table 26 lists events that create notifications.

     

    Table 26: Events that Create Notifications 

     

    Notification Level

     

    Event

    Low

    Medium

    High

    Ultra

    Signifies

    Checkpoint Begin

    No

    No

    Yes

    Yes

    Teradata TPump started a checkpoint.

    Checkpoint End

    No

    No

    Yes

    Yes

    Teradata TPump successfully completed a checkpoint.

    CLIv2 Error

    Yes

    Yes

    Yes

    Yes

    Teradata TPump received a CLIv2 error.

    Database Error

    Yes

    Yes

    Yes

    Yes

    A Teradata Database error that terminates Teradata TPump.

    DML Error

    No

    No

    Yes

    Yes

    Teradata TPump is about to log a DML error to the error table.

    Error Table

    No

    No

    Yes

    Yes

    Successful processing of the SEL COUNT(*) request for the error table.

    Exit

    Yes

    Yes

    Yes

    Yes

    Teradata TPump completed a load task.

    File or INMOD Open

    No

    No

    Yes

    Yes

    Successful processing of the IMPORT command.

    Import Begin

    No

    No

    Yes

    Yes

    Teradata TPump is about to start reading records.

    Import End

    No

    No

    Yes

    Yes

    Last record has been read.

    Initialize

    Yes

    Yes

    Yes

    Yes

    Successful processing of the Notify option (BEGIN LOAD command)

    InitializeEON

    Yes

    Yes

    Yes

    Yes

    Successful processing of the Notify option (BEGIN LOAD command)

    Interim Run Statistics

    No

    No

    No

    Yes

    Teradata TPump is about to update the Monitor Interface table, or Teradata TPump successfully completed a checkpoint, or an Import has just completed successfully.

    Table Statistics

    No

    Yes

    Yes

    Yes

    Teradata TPump has successfully written the table statistics.

    Teradata Database Restart

    No

    Yes

    Yes

    Yes

    Teradata TPump received a crash error from Teradata or CLI.

    Usage Notes

    Multiple tables can be targeted by a single Teradata TPump job.

    If the script author is uncertain whether or not to use ROBUST restart logic, it is always safe to use the ROBUST ON parameter.

    To ensure data integrity, the SERIALIZE parameter defaults to ON in the absence of an explicit value if there are upserts in the Teradata TPump job.

    The statement rate per minute you set using the RATE keyword is also affected by the periodicity value. By default, Teradata TPump uses a periodicity value of four when enforcing the statement rate limit. You can adjust the periodicity rate from 1 to 1500 using a runtime parameter.

    For example, if you set the statement rate at 1600 and the periodicity at 10, then the maximum number of statements processed is 160 (1600/10) statements every 6 (60/10) seconds.

    Notice:

    A LOGOFF command entered after a BEGIN and before the matching END LOAD logs you off the Teradata TPump utility.