Teradata TPump Teradata SQL Statements - Parallel Data Pump

Teradata Parallel Data Pump Reference

Product
Parallel Data Pump
Release Number
15.00
Language
English (United States)
Last Update
2018-09-27
dita:id
B035-3021
lifecycle
previous
Product Category
Teradata Tools and Utilities

Teradata TPump Teradata SQL Statements

The following Teradata SQL statements supported by Teradata TPump are included in this chapter because they require special considerations for use with Teradata TPump. They are used for loading purposes and for creating Teradata TPump macros. The syntax and use of these Teradata SQL statements is described in detail in this chapter.

Table 25 lists the Teradata SQL Statements supported by Teradata TPump.

 

Table 25: Teradata TPump Teradata SQL Statements 

Statement

Definition

DATABASE

Changes the default database qualification for all DML statements.

DELETE

Removes specified rows from a table.

EXECUTE

Specifies a user-created (predefined) macro for execution. The macro named in this statement resides in Teradata Database and specifies the type of DML statement (INSERT, UPDATE, or DELETE) being handled by the macro.

INSERT

Adds new rows to a table by directly specifying the row data to be inserted.

UPDATE Statement and Atomic Upsert

Changes field values in existing rows of a table.

ACCEPT

Purpose  

The ACCEPT command accepts data types and values from an external source and uses them to set one or more utility variables. The ACCEPT command is a valid command preceding LOGON and LOGTABLE commands.

Syntax  

where

 

Syntax Element

Description

var

Name of the utility variable that is to be set with the value accepted from the designated source

Character string values appear as quoted strings in the data file.

env_var

Environment variable that provides the value for the specified utility variables (var)

fileid

Data source of the external system.

The external system DD (or similar) statement specifies a file.

UNIX and Windows Systems

infilename (the path name for a file).

If the path name has embedded white space characters, enclose the entire path name in single or double quotation marks.

If the path name is enclosed in single quotation marks and contains an embedded single quotation mark, precede the embedded single quotation mark with an escape character (another single quotation mark). Likewise, if the path name is enclosed in double quotation marks and contains an embedded double quotation mark, precede the embedded double quotation mark with an escape character (another double quotation mark).

A path name that contains single or double quotation marks must itself be enclosed in either single or double quotation marks.

z/OS

a true DDNAME.

If DDNAME is specified, Teradata TPump reads data records from the specified source.

A DDNAME must obey the same rules for its construction as Teradata SQL column names, except that:

  • the “at” sign (@) is allowed as an alphabetic character
  • the underscore (_) is not allowed

The DDNAME must also obey the applicable rules of the external system and may reference a sequential or VSAM data set.

If DDNAME represents a data source on magnetic tape, the tape may be either labelled or nonlabelled (if the operating system supports it).

charpos1 and charpos2

Start and end character positions of a field in each input record that contains extraneous information.

Teradata TPump ignores the specified field(s) as follows:

  • If charpos1 alone is specified, Teradata TPump ignores only the single specified character position.
  • If charpos1 THRU is specified, Teradata TPump ignores character positions from charpos1 through the end of the record.
  • If THRU charpos2 is specified, Teradata TPump ignores character positions from the beginning of the record through charpos2.
  • If charpos1 THRU charpos2 is specified, Teradata TPump ignores character positions from charpos1 through charpos2.

Usage Notes

    A single record, row, or input line is accepted from the designated source. Ensure that there is only one record in the file from which the ACCEPT command is getting the variables.

    If multiple variables are coded, each is sequentially assigned input text up to the first white space character encountered that is not within a quoted string.

    Input text for numeric values must be delimited only by white space or record boundaries. Input text for character strings must be enclosed in apostrophes. For example:

    .Accept age, name from file info;

    The data record provided to satisfy the preceding ACCEPT should include two fields. The following example shows two sample data records, where the first is correct but the next is not:

    32  'Tom'    /* This line contains valid data. */
    32   Tom     /* Tom is invalid data.           */

    An additional method of placing comments in input text is as follows:

    32  'Tom'; This line contains valid data.

    When the number of variables listed is greater than the number of responses available, unused variables remain undefined (NULL). If there are not enough variables to hold all responses, a warning message is issued. If the input source is a file, the next record (starting with the first) of the file is always retrieved.

    BEGIN LOAD

    Purpose  

    The BEGIN LOAD command initiates or restarts a Teradata TPump task, specifying the number of sessions to use and any other parameters needed to execute the task.

    Syntax

    where

     

    Syntax Element

    Description

    SESSIONS

    Keyword for the number of Teradata TPump sessions.

    number

    Number of sessions to be logged on for update purposes for Teradata TPump.

    A Teradata TPump task logs on and uses the number of sessions specified. One additional session is used for performing various utility functions.

    There is no default value for number; it must be specified. Neither is there a maximum value, except for systemwide session limitations, which vary among machines.

    Limiting the number of sessions conserves resources on both the external system and Teradata Database. This conservation is at the expense of a potential decrease in throughput and increase in elapsed time.

    threshold

    Minimum number of sessions to be logged on for update purposes for the utility.

When logging on sessions, if the session limit is reached after at least the threshold number of sessions have been logged on, Teradata TPump stops trying to log on and uses the sessions already logged on.

    If the sessions run out before the threshold is reached, Teradata TPump logs off all sessions, waits for the time determined by the SLEEP value, and tries to log on again.
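For illustration, a BEGIN LOAD command requesting eight sessions but tolerating as few as four might look like this (the session counts are hypothetical values, not defaults):

    .BEGIN LOAD SESSIONS 8 4;

If at least four sessions log on, the job proceeds with whatever sessions were acquired; with fewer than four, Teradata TPump logs off all sessions, sleeps, and retries.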

    ERRORTABLE

    Optional keyword for identifying a database and error table.

Use a database name as a qualifying prefix for the error table. Specifying a database other than the production database avoids cluttering the production system with error tables. Because the database holding error tables should be allocated ample PERM space, keeping error tables in one dedicated database means that space does not have to be increased in every database whose tables are involved in the Teradata TPump task.

    Notice:

It is the user’s responsibility to ensure that error table sharing is meaningful. Two jobs targeting different tables with different input record layouts cannot share the same error table.

    APPEND

    Specifies the existing error table.

If the specified error table does not exist, Teradata TPump creates it. If the structure of the existing error table is not compatible with the error table Teradata TPump creates, the job terminates with an error when Teradata TPump tries to insert into or update the error table.

    NODROP

Specifies to retain (not drop) the error table, even if it is empty at the end of a job.

NODROP can be used alone, or together with APPEND, to persist the error table.

    QUEUETABLE

    Selects the error table as a Queue Table.

    dbname.

    The qualified database for the error table.

    If the database is not specified, the database which contains the log table is used. The period following the dbname separates the database name from the tname parameter. If a different database is specified, it may help to avoid cluttering the production database with error tables.

    tname

    Error table that receives information about errors detected during the load.

    tname may be preceded by a database name qualifier. This table is referred to as the error table or ET table.

    TPump explicitly specifies FALLBACK as the level of protection.

    The default error table name is composed of the job name, followed by an underscore and sequence number of the load, then an underscore and an ET, as in jobname_nnn_ET.

    tname identifies a nonexisting table for a nonrestart task, or an existing table for a restart task.

    For all errors inserted in this error table, the identifiers for the offending combination of statement and data record are included in the appropriate row of tname. The columns in the error table allow the identification of a specific data record and statement combination which produced an error. The column names and definitions of the error table are:

ImportSeq: A BYTEINT containing the IMPORT command sequence number.

DMLSeq: A BYTEINT containing the sequence number of the DML command within the command file.

SMTSeq: A BYTEINT containing the sequence number of the DML statement within the DML command.

ApplySeq: A BYTEINT containing the sequence number of the APPLY clause within its IMPORT command.

SourceSeq: An INTEGER containing the position of a data record within a data source.

DataSeq: A BYTEINT identifying the data source. This value is always one.

ErrorCode: An INTEGER containing an error return code.

ErrorMsg: Contains the corresponding error message for the error code.

ErrorField: A SMALLINT which, if valid, indicates the bad field.

The names of record fields sent to Teradata Database are specified via the LAYOUT command, in conjunction with FIELD and TABLE commands.

HostData: A variable-length byte string containing the data sent by the external system.

RowInsertTime: Indicates when the row was inserted into the Teradata TPump error table.

    ERRLIMIT

    Optional keyword for setting a limit on records rejected for errors.

When the ERRLIMIT is exceeded, Teradata TPump performs a checkpoint and then terminates the job. Data read before ERRLIMIT was exceeded is still submitted and completed before the job terminates, so when a job is terminated because ERRLIMIT was exceeded, there may be more error records in the error table than the number specified by ERRLIMIT. To facilitate diagnosis of data errors, the ERRLIMIT should be greater than the number of statements packed into one request.

    errcount

    Error threshold for controlling the number of rejected records. Usage depends on whether used with the errpercent parameter.

  • When used without the errpercent parameter, specifies, as an unsigned integer, the number of records that can be rejected and recorded in tname during a load (all records sent between the BEGIN LOAD and END LOAD commands). The default is no limit.
  • When used with the (approximate) errpercent parameter, specifies the maximum number of records that must be sent to Teradata Database before the errpercent parameter is applied.

For example, if errcount = 100 and errpercent = 5, then 100 records must be sent to Teradata Database before the approximate 5 percent rejection limit is applied. If only the first five records are rejected when the 100th record is sent, the limit is not exceeded. If there are six rejections, however, then the limit is exceeded. After the 100th record is sent, Teradata TPump stops processing if the 5 percent limit has been exceeded.

    When the limit has been exceeded, Teradata TPump writes an error message to the external system’s customary message destination and terminates the task.

    All tables in use are left in their state at the time of the termination. This allows errors to be corrected in data records and restarting of the task from the last checkpoint. If a restart is not possible or not desired, any unwanted tables should be dropped.
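The errcount/errpercent combination described above could be coded as follows (a sketch; the values are illustrative):

    .BEGIN LOAD SESSIONS 4
        ERRLIMIT 100 5;

Here the approximate 5 percent rejection limit is applied only after at least 100 records have been sent to Teradata Database.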

    CHECKPOINT

    Keyword indicating the number of minutes between the occurrences of checkpoints.

    This is followed by a frequency value.

    frequency

The interval in minutes between checkpointing operations. Specify an unsigned integer from 0 through 60, inclusive.

If the specified CHECKPOINT frequency is less than or equal to 60, a checkpoint is recorded at that frequency, in minutes.

If the specified CHECKPOINT frequency is more than 60, Teradata TPump terminates the job.

Specifying a CHECKPOINT frequency of zero bypasses all checkpoint functions; Teradata TPump does not perform checkpoint operations at all, which differs from Teradata Tools and Utilities 07.00 and earlier. The Teradata TPump job will not be restartable once data loading has begun.

If a CHECKPOINT frequency is not specified, checkpointing occurs every 15 minutes by default.

    Whether specified or not, checkpoints are written at the end of each data input source.

Note: Checkpoints should not be set for an FDL-compatible INMOD routine with the FOR, FROM, or THRU options. When an FDL-compatible INMOD routine is used with the FOR, FROM, or THRU options, Teradata TPump terminates and an error message appears if the checkpoint frequency is other than zero.
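A sketch of an explicit checkpoint interval (the 10-minute value is arbitrary, not a recommendation):

    .BEGIN LOAD SESSIONS 4
        CHECKPOINT 10;

With CHECKPOINT 0 the job would not be restartable once loading begins, as noted above.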

    DATAENCRYPTION ON/OFF

    Keyword to encrypt import data and the request text during the communication between Teradata TPump and Teradata Database.

    If ON, the encryption will be performed. If DATAENCRYPTION is not specified, the default is OFF.

The "y" runtime parameter applies encryption to all connected sessions, including the control session and the load sessions. This option applies encryption only to the load sessions (the sessions specified by the SESSIONS keyword in the BEGIN LOAD command) and overrides the "y" runtime parameter when OFF is explicitly specified. For example, assuming the PARTITION command is not used in the job, if the "y" runtime parameter is specified and DATAENCRYPTION OFF is specified in the script, encryption applies only to the control session. Similarly, assuming the PARTITION command is not used in the job, if the "y" runtime parameter is not specified and DATAENCRYPTION ON is specified in the script, encryption applies to all load sessions but not to the control session.

    When the PARTITION command is used, the encryption setting explicitly specified in the PARTITION command will override the setting of this option over the sessions defined by the PARTITION command.

    ArraySupport
    ON/OFF

Option of the .BEGIN LOAD command and the .DML command that enables or disables the ArraySupport feature.

    When “ArraySupport ON” is specified in the .BEGIN LOAD command, the .DML commands enclosed in .BEGIN LOAD and .END LOAD command pair will use the ArraySupport feature for its DML statement, unless “ArraySupport OFF” is specified for the .DML command. The default value of ArraySupport for the .BEGIN LOAD command is OFF.

    When “ArraySupport ON|OFF” is not specified with the .DML command, the default value for ArraySupport for that .DML command is the effective setting of ArraySupport in the .BEGIN LOAD command where the .DML command resides. When “ArraySupport ON|OFF” is specified at the .DML command, the specified value overrides the default setting determined by the .BEGIN LOAD command.

    When a .DML command is using the ArraySupport feature (defined in the .BEGIN LOAD command or in the .DML command), it must contain one and only one DML statement, and the session partition that the .DML command references needs to be used by this .DML command exclusively.

If the DML statement is an UPSERT-type statement, it can be specified as a pair of INSERT/UPDATE statements with the DO INSERT FOR MISSING UPDATE clause. Teradata TPump creates the equivalent UPDATE … ELSE INSERT … form (an Atomic Upsert) and uses it as the actual DML statement. Alternatively, an UPDATE … ELSE INSERT … statement can be specified directly with the DO INSERT FOR MISSING UPDATE clause.

    The nonatomic form of UPSERT is not supported by Teradata TPump Array Support.
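The override behavior described above can be sketched as follows (label, table, and field names are hypothetical):

    .BEGIN LOAD SESSIONS 4 ARRAYSUPPORT ON;
    .DML LABEL ins1;                        /* inherits ArraySupport ON from BEGIN LOAD */
    INSERT INTO mytable VALUES (:f1, :f2);
    .DML LABEL upd1 ARRAYSUPPORT OFF;       /* overrides the BEGIN LOAD setting */
    UPDATE mytable SET f2 = :f2 WHERE f1 = :f1;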

    TENACITY

    Keyword (with hours parameter) defining how long the utility tries to log on the sessions needed to perform the Teradata TPump job.

    If a logon is denied, Teradata TPump delays for the time specified by the SLEEP parameter (the default is six minutes) and retries the logon. It retries until either the logon succeeds or the number of hours specified by TENACITY is exceeded.

    If the TENACITY parameter is not specified, the utility retries the logons for four hours.

    hours

Teradata TPump tenacity factor, as an integral number of hours, specifying how long Teradata TPump keeps trying to log on the required sessions.

    The default value for hours is 4 if the parameter is not specified. If hours is specified as 0, Teradata TPump does not retry logons after a logon fails because of a capacity limit.

    When a “no more sessions” error appears (either a 301 return code from a workstation CLI or a 513 return code from a mainframe CLI), Teradata TPump drops the sessions already acquired, and terminates the job without trying another logon.
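For example (the values are illustrative, not defaults), to retry logons for up to six hours with ten minutes between attempts:

    .BEGIN LOAD SESSIONS 4
        TENACITY 6
        SLEEP 10;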

    LATENCY

    Keyword for flushing stale buffers.

    Note: When using the Teradata TPump latency option with Named Pipe Access Module, need_full_block = no option should be added in the Named Pipe Access Module initialization string.

    seconds

    Flushing threshold based on number of seconds oldest record has resided in buffer.

    LATENCY cannot be less than one second.

    If the SERIALIZE parameter is set to OFF, only the current buffer can possibly be stale. If SERIALIZE is ON, the number of stale buffers can range from zero to the number of sessions.

    NOTIMERPROCESS

    Keyword to tell Teradata TPump not to fork a child process as a timer process.

    When a child process is forked, the SIGUSR2 signal notifies the parent process when the latency period expires. When a child process is not forked, the SIGALRM signal notifies the Teradata TPump process when the latency period expires. A child process is necessary for the latency function to work properly on the UNIX platforms when the MQSeries Access Module is used.

    minutes

    Number of minutes to wait between unsuccessful logon attempts due to session limits errors on Teradata Database or CLIv2.

    If SLEEP is not specified, the default between unsuccessful logon attempts is 6 minutes.

    SERIALIZE ON/OFF

    Keyword to set the state (ON/OFF) of the serialization feature which, if ON, guarantees that operations on a given key combination (row) occur serially.

    If SERIALIZE is not specified, the default is OFF.

    This feature is meaningful only when a primary key for the loaded data is specified by using the KEY option of the FIELD command.

    Serialization must be set to OFF if the .TABLE command is used and the target table is a NoPI table.

    To ensure data integrity, the SERIALIZE parameter defaults to ON in the absence of an explicit value if there are upserts in a Teradata TPump job.
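A minimal sketch of serialization on a key column (the layout, field names, and session count are hypothetical):

    .BEGIN LOAD SESSIONS 4 SERIALIZE ON;
    .LAYOUT lay1;
    .FIELD EmpNo  * INTEGER KEY;
    .FIELD Salary * DECIMAL(8,2);

Operations on rows sharing the same EmpNo value are then guaranteed to occur serially.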

    PACKMAXIMUM

    Keyword requesting Teradata TPump to dynamically determine the maximum possible PACK factor for the current load.

    Maximum value is 1500.

The value thus determined is displayed in message UTY6652 and should be specified explicitly on subsequent runs, because the use of PACKMAXIMUM requires iterative interactions with the database during initialization to heuristically determine the maximum possible PACK factor.

    Note: Since the maximum packing factor has changed from 600 to 1500, it will take a longer time for Teradata TPump to test the packing factor if PACKMAXIMUM is specified.

    PACK

Keyword for the number of statements to pack into a multiple-statement request.

    Maximum value is 1500.

    Packing improves network/channel efficiency by reducing the number of sends and receives between the application and Teradata Database.

    statements

Number of statements, as a positive integer of up to 1500, to pack into a multiple-statement request.

    Default value is 20 statements per request.

    Under certain conditions, Teradata TPump may determine that the pack factor has been set too high. Teradata TPump then automatically lowers the pack setting to an appropriate value and issues warning message UTY6625, for instance:

    “UTY6625 WARNING: Packing has been changed to 12 statements per request” and continues.

    The packing factor is validated by sending a fully packed request to Teradata Database using a prepare. This test checks for syntax problems and requests that are excessively large and may overwhelm the parser.

    To simplify the script development process, Teradata TPump ignores certain errors returned by an overloaded parser, shrinks the request, retries the prepare until it executes successfully and finally, issues a warning noting the revised packing factor size.

When this happens, the Teradata TPump script should be modified to eliminate the warning, thereby avoiding the time-consuming process of shrinking the request.

    A packing failure may occur if the source parcel length does not match the data defined. If this happens, Teradata TPump issues the message:

    “UTY2819 WARNING: Packing may fail because input data does not match with the data defined.”

    To resolve this problem, increase the packing factor and resubmit the job.
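For example (the value is illustrative), to pack 40 statements into each request:

    .BEGIN LOAD SESSIONS 4
        PACK 40;

If 40 proves too large for a given request, Teradata TPump lowers the setting and issues warning UTY6625 as described above.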

    RATE

    Keyword for entering the rate at which statements are sent to Teradata Database.

    RETRYTIMES nn

Keyword specifying the number of times a request is retried.

    Default is 16.

If nn equals 0, the retry times is set to 16. If RETRYTIMES is set, it takes effect only for the requests/data between the "BEGIN LOAD" and "END LOAD" pair.

    statement_rate

    Initial maximum rate at which statements are sent to Teradata Database per minute.

    The statement rate must be a positive integer. If the statement rate is unspecified, the rate is unlimited.

If the statement_rate is less than the statement packing factor, Teradata TPump sends requests smaller than the packing factor.

    If the Teradata TPump Monitor is in use, the statement_rate can be changed later on.

    SLEEP

    Keyword for the number of minutes to sleep.

    NOATOMICUPSERT

    Keyword to perform nonatomic upsert operations for UPSERT DMLs in the job script if these UPSERT DMLs are not provided in the Atomic UPSERT form.

    NOMONITOR

    Keyword to prevent Teradata TPump from checking for statement rate changes from, or update status information for, the Teradata TPump Monitor.

    ROBUST ON/OFF

    The ON parameter ensures data integrity when target tables have identity columns.

    The OFF parameter signals Teradata TPump to use simple restart logic. In this case, restarts cause Teradata TPump to begin where the last checkpoint occurred in a job. Any processing that occurred after the checkpoint is redone. This method does not have the extra overhead of the additional database writes in the Robust logic.

    Notice:

    Certain errors may cause reprocessing, resulting in extra error table rows due to re-executing statements (attempting to re‑insert rows, for example). Or, if the target table allows duplicate rows, re-executing statements may cause extra duplicate rows to be inserted into the target table instead of causing extra error table rows.

Simple logic is adequate for DML statements that can be repeated without changing the results of the operation. Examples of statements that do not qualify for simple logic include the following:

  • INSERTs into tables that allow duplicate rows (MULTISET tables)
  • Self-referencing DML statements such as:
    “UPDATE FOO SET A=A+1...”
    “UPDATE FOO SET A = 3 WHERE A=4”

MACRODB

    Keyword for database to contain any macros used by Teradata TPump.

    dbname

    Name of database which is to contain any macros built/used by Teradata TPump.

    This database overrides the default placement of macros into the database which contains the log restart table.

    NOTIFY

    Teradata TPump implementation of the notify user exit option:

  • NOTIFY OFF suppresses the notify user exit option.
  • NOTIFY LOW enables the notify user exit option for those events signified by “Yes” in the Low Notification Level column of Table 26.
  • NOTIFY MEDIUM enables the notify user exit option for the most significant events, as specified by “Yes” in the Medium Notification Level column of Table 26.
  • NOTIFY HIGH enables the notify user exit option for every Teradata TPump event that involves an operational decision point, as specified by “Yes” in the High Notification Level column of Table 26.
  • NOTIFY ULTRA enables the notify user exit option for every Teradata TPump event that involves an operational decision point, as specified by “Yes” in the ULTRA Notification Level column of Table 26.
Beginning with Teradata Tools and Utilities 14.00, TPump terminates with a fatal error if any of the NOTIFY error conditions occur.

EXIT name

Keyword phrase that calls a user-defined exit, where name is the name of a user-supplied library with a member name of _dynamn.

    The exit must be written in C, or in a language with a runtime environment that is compatible with C.

    On some versions of UNIX systems, ./prefix characters may have to be added to the EXIT name specification if the module is in the current directory.

    TEXT 'string'

User-supplied string of up to 80 characters that Teradata TPump passes to the named exit routine.

    The string specification must be enclosed in single quote characters (').

    MSG 'string'

User-supplied string of up to 16 characters that Teradata TPump logs to:

  • The operator’s console for mainframe-attached z/OS client systems
  • The system log for network-attached client systems for UNIX and Windows systems

The string specification must be enclosed in single quote characters (').

    EXIT64 name

    Keyword phrase that calls a user-defined exit where name is the name of a user-supplied library with a member name of _dynamn.

When the EXIT64 keyword is present, the related row counters use 8-byte unsigned integers instead of 4-byte unsigned integers.

EXITEON name

    Keyword phrase that calls a user-defined exit where name is the name of a user-supplied library with a member name of _dynamn.

    The exit must be written in C, or in a language with a runtime environment that is compatible with C. On some versions of UNIX systems, ./prefix characters may have to be added to the EXITEON name specification if the module is in the current directory.

    The keyword EXITEON automatically supports the keyword EXIT64.

    Table 26 lists events that create notifications.

     

Table 26: Events that Create Notifications

Event                       Low   Medium   High   Ultra   Signifies
Checkpoint Begin            No    No       Yes    Yes     Teradata TPump started a checkpoint.
Checkpoint End              No    No       Yes    Yes     Teradata TPump successfully completed a checkpoint.
CLIv2 Error                 Yes   Yes      Yes    Yes     Teradata TPump received a CLIv2 error.
Database Error              Yes   Yes      Yes    Yes     A Teradata Database error that terminates Teradata TPump.
DML Error                   No    No       Yes    Yes     Teradata TPump is about to log a DML error to the error table.
Error Table                 No    No       Yes    Yes     Successful processing of the SEL COUNT(*) request for the error table.
Exit                        Yes   Yes      Yes    Yes     Teradata TPump completed a load task.
File or INMOD Open          No    No       Yes    Yes     Successful processing of the IMPORT command.
Import Begin                No    No       Yes    Yes     Teradata TPump is about to start reading records.
Import End                  No    No       Yes    Yes     Last record has been read.
Initialize                  Yes   Yes      Yes    Yes     Successful processing of the Notify option (BEGIN LOAD command).
InitializeEON               Yes   Yes      Yes    Yes     Successful processing of the Notify option (BEGIN LOAD command).
Interim Run Statistics      No    No       No     Yes     Teradata TPump is about to update the Monitor Interface table, or Teradata TPump successfully completed a checkpoint, or an Import has just completed successfully.
Table Statistics            No    Yes      Yes    Yes     Teradata TPump has successfully written the table statistics.
Teradata Database Restart   No    Yes      Yes    Yes     Teradata TPump received a crash error from Teradata or CLI.

    Usage Notes

    Multiple tables can be targeted by a single Teradata TPump job.

    If the script author is uncertain whether or not to use ROBUST restart logic, it is always safe to use the ROBUST ON parameter.

    To ensure data integrity, the SERIALIZE parameter defaults to ON in the absence of an explicit value if there are upserts in the Teradata TPump job.

    The statement rate per minute you set using the RATE keyword is also affected by the periodicity value. By default, Teradata TPump uses a periodicity value of four when enforcing the statement rate limit. You can adjust the periodicity rate from 1 to 1500 using a runtime parameter.

    For example, if you set the statement rate at 1600 and the periodicity at 10, then the maximum number of statements processed is 160 (1600/10) statements every 6 (60/10) seconds.

    Notice:

    A LOGOFF command entered after a BEGIN and before the matching END LOAD logs you off the Teradata TPump utility.
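Pulling several of these options together, a sketch of a complete BEGIN LOAD command (all names and values are illustrative, not recommendations):

    .BEGIN LOAD SESSIONS 8 4
        ERRORTABLE errdb.myjob_001_ET
        ERRLIMIT 100 5
        CHECKPOINT 15
        TENACITY 4
        SLEEP 6
        SERIALIZE ON
        PACK 40
        ROBUST ON;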

    DATABASE

    Purpose  

    Teradata TPump supports the Teradata SQL DATABASE statement, which changes the default database qualification for all unqualified DML and DDL statements.

    Syntax  

    where

     

    Syntax Element

    Description

    database

New default database qualification for unqualified DML and DDL statements

    Changes the database from the one originally specified by the BEGIN LOAD command.

    Usage Notes  

    The DATABASE command only affects native SQL commands. In particular, it has no effect on the BEGIN LOAD command.

    The DATABASE command does affect INSERT, UPDATE, DELETE, and EXEC statements issued as part of a load. (When Teradata TPump logs on sessions, it immediately issues a DATABASE statement on each session.)

    The DATABASE command does not affect the placement of Teradata TPump macros.
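For example (database, table, and field names are hypothetical), a DATABASE statement issued before the DML makes subsequent unqualified names resolve against the new default:

    DATABASE payroll;
    .DML LABEL ins1;
    INSERT INTO emp VALUES (:EmpNo, :Name);   /* resolves to payroll.emp */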

    DATEFORM

    Purpose  

    The DATEFORM command lets you define the form of the DATE data type specifications for the Teradata TPump job.

    Syntax  

    where

     

    Syntax Element

    Description

    INTEGERDATE

    Keyword that specifies integer DATE data types for the Teradata TPump job

    This is the default Teradata DATE data type specification for Teradata TPump jobs if you do not enter a DATEFORM command.

    ANSIDATE

Keyword that specifies ANSI fixed-length CHAR(10) DATE data types for the Teradata TPump job

    Usage Notes

    Table 27 describes the things to consider when using the DATEFORM command.

     

    Table 27: DATEFORM Command Usage Notes 

    Topic

    Usage Notes

    Command Frequency and Placement

  • Only one DATEFORM command can be used.
  • The DATEFORM command must be entered before the LOGON command.

Data Type Conversions

When the ANSIDATE specification is used, the ANSI/SQL DateTime data types must be converted to fixed-length CHAR data types when specifying the column/field names in the Teradata TPump FIELD command.

     

    See the FIELD command for a description of the fixed-length CHAR representations for each DATE, TIME, TIMESTAMP, and INTERVAL data type specification.

    Release Applicability

    The ANSIDATE specification is valid for Teradata TPump jobs on Teradata Database for a UNIX OS.
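    As a sketch (log table and logon strings are hypothetical), the DATEFORM command precedes the LOGON command:

    .DATEFORM ANSIDATE;
    .LOGTABLE Logdb.Mylog;
    .LOGON tdpid/user,password;

    With ANSIDATE in effect, a DATE value in the input data would then be described as a fixed-length CHAR(10) field (for example, '2014-01-15') in the FIELD command.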

    DELETE

    Purpose  

    Teradata TPump supports the DELETE Teradata SQL statement, which removes rows from a table.

    Syntax  

    where

     

    Syntax Element

    Description

    tname

    Table from which rows are to be deleted

    tname is qualified either explicitly by database name, or by the current default database.

    WHERE condition

    Conditional clause identifying the row(s) to delete

    The conditional clause uses values from input data record fields as defined by a FIELD command or TABLE command of the layout referenced by an IMPORT using this statement.

    Usage Notes

    The following notes describe how to use DELETE statements following a DML command.

    A DELETE statement may also be used in the support environment; normal rules for DELETE are followed in that case.

    Teradata TPump operates only on single table statements so DELETE statements must not contain any joins.

    To delete records from a table, the username specified on the LOGON command must have DELETE privilege on the specified table.

    When the condition(s) of the DELETE statement WHERE clause are evaluated, the result can be definitely true, definitely false, or indeterminate. If the result is true for a specific row, Teradata TPump deletes the row. An indeterminate result, due to an abnormal arithmetic condition such as underflow, overflow, or division by zero, is treated as an error, and Teradata TPump records both row and error code in the error table.

    The DELETE statement must identify only one object.

    Remember the following when constructing scripts:

  • A DELETE statement can be applied to either a table or view, provided that the view does not specify a join.
  • Equality values for all the primary index columns should normally be specified in the WHERE clause.
  • The OR construct can be used in the WHERE clause of a DELETE statement; alternatively, two or more separate DML statements (one per OR term) can be used, with the DML statements applied conditionally with the APPLY clause of the IMPORT command. The nature of the alternatives will usually make one of the methods more appropriate.

  • The column(s) specified in this clause need not be a part of any index, but can be one or more nonindexed columns. This clause may specify nonequality values for any combination of columns of unique indices, or any values for other columns.
  • The maximum number of INSERT, UPDATE, and DELETE statements that can be referenced in an IMPORT is 128. A 128th DML statement, which would cause insertion of a DML sequence number of 128 for the DMLSEQ field in the error table, could lead to a Teradata Database 3520 error.
  • The maximum number of DML statements that can be packed into a request is 1500. The default number of statements packed is 20.
  • DML validtime qualifier and NONTEMPORAL semantics are supported. For more information, see SQL Data Manipulation Language (B035‑1146).

    Example  

    The following example uses an input data source containing a series of one‑field, four‑byte records. Each record contains the value (EmpNum) of the primary index column (EmpNo) of a row to be deleted from the Employee table.

    .BEGIN LOAD SESSION number;
    .LAYOUT Layoutname; 
    .FIELD EmpNum 1 INTEGER; 
    .DML LABEL DMLlabelname; 
    DELETE Employee WHERE EmpNo = :EmpNum; 
    .IMPORT INFILE Infilename LAYOUT Layoutname  APPLY DMLlabelname;
    .END LOAD; 

    DISPLAY

    Purpose  

    The DISPLAY command can be used to write messages to the specified destination.

    Syntax  

    where

     

    Syntax Element

    Description

    'text'

    Text to be written to the specified output destination

    fileid

    Data source of the external system.

    The external system DD (or similar) statement specifies a file.

    UNIX and Windows Systems

    infilename (the path name for a file)

    If the path name has embedded white space characters, enclose the entire path name in single or double quotes.

    If the path name is enclosed by single quotation marks and there is an embedded single quotation mark, then an escape character (single quotation mark) needs to precede the embedded single quotation mark. Likewise, if the path name is enclosed by double quotation marks and there is an embedded double quotation mark, then an escape character (double quotation mark) needs to precede the embedded double quotation mark.

    If the path name contains single or double quotation marks, it should be enclosed in either single or double quotation marks.

    z/OS

    a true DDNAME.

    If DDNAME is specified, Teradata TPump reads data records from the specified source.

    A DDNAME must obey the same rules for its construction as Teradata SQL column names, except that:

  • the “at” sign (@) is allowed as an alphabetic character
  • the underscore (_) is not allowed
    A DDNAME must also obey the applicable rules of the external system and may reference a sequential or VSAM data set.

    If DDNAME represents a data source on magnetic tape, the tape may be either labelled or nonlabelled (if the operating system supports it).

    Usage Notes  

    Utility variables, referenced by preceding the variable name with an ampersand (&), are replaced by their values before the text is displayed. To display the name of a utility variable rather than its value, code two ampersands (&&) instead of one.

    To display an apostrophe within the text string, code two consecutive apostrophes (single quotes) to distinguish it from the single quotes enclosing the string.

    On UNIX systems, if the same file is used both to redirect stdout and as the fileid in a DISPLAY command, the results written to that file may be incomplete due to conflicting writes to the same file.

    On UNIX systems, use an asterisk (*) as the fileid specification to direct the display messages to the system console/standard output (stdout) device. The system console is the:

  • Display screen in interactive mode
  • Standard output device in batch mode
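    As an illustrative sketch (variable and file names are hypothetical), the following writes one message to stdout and one to a file, showing both variable substitution and the doubled-ampersand form:

    .SET LoadDate TO '2014-01-15';
    .DISPLAY 'Load run of &LoadDate' TO FILE *;
    .DISPLAY 'Variable used: &&LoadDate' TO FILE 'display.out';

    The first DISPLAY writes the substituted value; the second writes the literal text &LoadDate to the file.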
    DML

    Purpose  

    The DML command defines a label and error treatment options for one or more immediately following DML statements. DML statements relevant to a Teradata TPump job are INSERT, UPDATE, DELETE, and EXECUTE, with UPDATE and INSERT statements sometimes paired to form either a basic upsert or an Atomic upsert operation.

    Syntax  

    where

     

    Syntax Element

    Description

    LABEL

    Keyword indicating that the following parameter is a label for the DML statements that follow

    label

    Unique label is to be used for the immediately following set of one or more DML statements

    A label must obey the same rules for its construction as Teradata SQL column names.

    The label name may be referenced in the APPLY clause of an IMPORT command.

    MARK

    Keyword indicating that the system should make an error table entry for a duplicate, missing, or extra INSERT, UPDATE, or DELETE row and continue processing.

    A row is a duplicate row if all column values in the row are the exact duplicate of another row. Duplicate row checking is bypassed if the table is a multiset table (which allows duplicate rows), or if the table has one or more unique indexes (the uniqueness test(s) make any duplicate row check unnecessary).

    If MARK is set and a uniqueness violation occurs on either a unique primary index or a unique secondary index, the offending rows go to the error table, whether or not the row is a duplicate row. In the case of an upsert, both the INSERT and UPDATE portions must fail for an error to be recorded.

     

    If neither MARK nor IGNORE is specified for duplicate rows, MARK applies to both INSERTs and UPDATEs. Similarly, if neither MARK nor IGNORE is specified for missing or extra rows, MARK applies to both UPDATEs and DELETEs.

     

    MARK is the default for:

  • Both UPDATEs and DELETEs that refer to missing or extra rows.
  • Duplicate rows arising from both INSERTs and UPDATEs, except when those statements are combined to form an upsert, in which case the default is IGNORE.

    IGNORE

    Keyword indicating that the system should not make an error table entry for the duplicate, missing, or extra INSERT, UPDATE, or DELETE row

    The system should continue processing instead.

    A row is a duplicate row if all column values in the row are the exact duplicate of another row. Duplicate row checking is bypassed if the table is a multiset table (which allows duplicate rows), or if the table has one or more unique indexes (the uniqueness test(s) make any duplicate row check unnecessary); in these cases, IGNORE DUPLICATE ROWS has no effect. Any uniqueness violations will result in the offending rows going to the error table.

     

    If neither INSERT nor UPDATE is specified for duplicate rows, IGNORE applies to both INSERTs and UPDATEs.

     

    Similarly, if neither UPDATE nor DELETE is specified for missing or extra rows, IGNORE applies to both UPDATEs and DELETEs. IGNORE is the default condition for an upsert operation.

    INSERT

    The upsert feature may be used (when specified as DO INSERT FOR MISSING UPDATE ROWS or DO INSERT ROWS).

     

    An upsert saves time while loading a database. An upsert completes, in one pass, an operation which requires two passes for other utilities. The DML statements that follow this option must be in the order of a single UPDATE statement followed by a single INSERT statement.

    This option first executes the UPDATE statement. If the UPDATE fails because the target row does not exist, Teradata TPump automatically executes the INSERT statement. This capability allows updates to the database without first presorting the data. Otherwise, the data would have to be sorted into:

  • rows that need to be updated
  • rows that need to be inserted

    Further information on the usage and restrictions of the upsert feature appears in the following usage notes.
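    As an illustrative sketch (table and field names are hypothetical), a DML command using the upsert option places a single UPDATE ahead of a single INSERT:

    .DML LABEL UpsertEmp DO INSERT FOR MISSING UPDATE ROWS;
    UPDATE Employee SET Salary = :Sal WHERE EmpNo = :EmpNo;
    INSERT INTO Employee (EmpNo, Salary) VALUES (:EmpNo, :Sal);

    For each input record, the UPDATE is attempted first; if the target row does not exist, the INSERT is executed.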

    PARTITION

    Optional keyword used to name a session partition to be used for all SQL requests associated with this DML command

    If this keyword is not present, a session created from the SESSIONS specification will be used.

    Note: If serialization of two or more DML statements is required, the statements cannot be put in different partitions. Serialization requires that all DML statements with identical hash values of the rows be submitted from the same session.

    When multiple DMLs are used with a pack factor greater than 1, more throughput can be achieved by using partitions. If DBQL analysis shows high PE CPU usage compared to AMP CPU usage, partitions can reduce parsing by making better use of the statement cache: assigning each DML to a unique Teradata TPump partition makes all requests within a partition textually identical, so they share a single statement cache entry.

    Additional performance benefits can be achieved with concurrent step execution with higher pack and by increasing the number of sessions.
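    As a sketch (partition and object names are hypothetical), each DML is assigned its own partition so that all requests within a partition are textually identical:

    .DML LABEL InsEmp PARTITION InsPart;
    INSERT INTO Employee (EmpNo) VALUES (:EmpNo);
    .DML LABEL DelEmp PARTITION DelPart;
    DELETE Employee WHERE EmpNo = :EmpNo;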

    partition_name

    Parameter identifying the partition name

    The partition name must obey the same rules for its construction as Teradata SQL column names.

    SERIALIZEON

    Keyword used to turn serialization on for the fields specified

    The SERIALIZEON keyword may be used before, after, or between any IGNORE or MARK statements.

    serialize_on_field

    Parameter identifying the field names where serialization is turned on

    This is the same field name used in the LAYOUT command which was used by the INSERT statement and referenced by the APPLY clause.

    Separate the field names with a comma and enclose them in parentheses.

    USE

    Keyword used to specify the fields that are to be used with a DML’s SQL statements

    Use of this keyword allows specification of the FIELDs from the LAYOUT command that are actually needed for each DML, so that data from unused fields is not sent.

    The USE keyword may be placed before, after, or between any IGNORE/MARK statements.

    use_field

    Parameter identifying the field names to use

    Every LAYOUT FIELD used by any of the DML’s SQL statements must be enumerated in the USE list; otherwise, an error will occur.

    Separate the field names with a comma and enclose them in parentheses.
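    As a sketch (names are hypothetical), a layout defines three fields but the UPDATE needs only two of them, so the USE list restricts the data sent:

    .LAYOUT Lay1;
    .FIELD EmpNo 1 INTEGER;
    .FIELD Sal * DECIMAL(8,2);
    .FIELD Dept * INTEGER;
    .DML LABEL UpdSal USE (EmpNo, Sal);
    UPDATE Employee SET Salary = :Sal WHERE EmpNo = :EmpNo;

    Both EmpNo and Sal appear in the USE list because both are referenced by the UPDATE; omitting either would cause an error.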

    ArraySupport
    ON/OFF

    Option of the .BEGIN LOAD command and the .DML command that enables or disables use of the ArraySupport feature

    When “ArraySupport ON” is specified in the .BEGIN LOAD command, the .DML commands enclosed in the .BEGIN LOAD and .END LOAD command pair will use the ArraySupport feature for their DML statements, unless “ArraySupport OFF” is specified for the .DML command. The default value of ArraySupport for the .BEGIN LOAD command is OFF.

    When “ArraySupport ON|OFF” is not specified with the .DML command, the default value for ArraySupport for that .DML command is the effective setting of ArraySupport in the .BEGIN LOAD command where the .DML command resides. When “ArraySupport ON|OFF” is specified at the .DML command, the specified value overrides the default setting determined by the .BEGIN LOAD command.

    When a .DML command is using the ArraySupport feature, it must contain one and only one DML statement, and the session partition that the .DML command references must be used exclusively by this .DML command.

    If the DML statement is an UPSERT-type statement, it can be specified as a pair of INSERT/UPDATE statements with the DO INSERT FOR MISSING UPDATE clause. Teradata TPump will create the equivalent UPDATE … ELSE INSERT … form (that is, an Atomic Upsert) and use it as the actual DML statement. Alternatively, an UPDATE … ELSE INSERT … statement can be specified directly with the DO INSERT FOR MISSING UPDATE clause.

    The nonatomic form of UPSERT is not supported by Teradata TPump Array Support.
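    As a sketch (names are hypothetical), ArraySupport is enabled at the load level and overridden for one DML; each DML using ArraySupport references its own exclusive partition and contains exactly one statement:

    .BEGIN LOAD SESSIONS 4 ARRAYSUPPORT ON;
    .DML LABEL Ins1 PARTITION InsPart;
    INSERT INTO Employee (EmpNo) VALUES (:EmpNo);
    .DML LABEL Upd1 ARRAYSUPPORT OFF;
    UPDATE Employee SET Salary = :Sal WHERE EmpNo = :EmpNo;

    Here Ins1 inherits the ON setting from the .BEGIN LOAD command, while Upd1 explicitly turns the feature off.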

    Usage Notes

    The DML command must be used between the BEGIN LOAD command and the END LOAD command.

    All INSERT, UPDATE, DELETE, and EXECUTE statements specified in the Teradata TPump script should fully specify the primary index of the referenced table to prevent the generation of table‑level locks.

    A maximum of 1500 DML statements may be packed into a request; the default is 20 statements.

    Teradata TPump assumes that row hash locking is used by INSERT, UPDATE, DELETE, and EXECUTE statements. If row hash locking is not used, Teradata TPump will run anyway, but may encounter trouble because table‑level locking will cause each statement to block.

    In addition, Teradata TPump does not support UPDATE or EXECUTE statements that modify the primary index of the target table. Teradata TPump performs no checking to prevent the script author from creating DML that requests unreasonable updates, except that Teradata TPump will not use Atomic UPSERT if the UPDATE portion of the UPSERT specifies an unreasonable update. This restriction is imposed by Teradata Database.

    IGNORE DUPLICATE ROWS does not apply if the table has any unique indexes.

    Teradata TPump converts INSERT, UPDATE, and DELETE statements into macro equivalents, and, depending on the packing specified, submits multiple statements in one request. Teradata TPump also supports the EXECUTE statement, which can be used to bypass the macro creation step for frequently executed macros. For more information on the EXECUTE statement, refer to EXECUTE in this chapter.

    The maximum number of INSERT, UPDATE, and DELETE statements that can be referenced in an IMPORT is 128. A 128th DML statement, which would cause insertion of a DML sequence number of 128 for the DMLSEQ field in the error table, could lead to a Teradata Database 3520 error.

    At the end of an IMPORT, an environmental variable is established for each DML command executed. Teradata TPump variables are not limited to 30 characters. These variables contain the activity counts associated with each statement. The variables created are of the form:

    &IMP <n>_<Apply label>_<x> 

    where

    n = the number of the IMPORT, from one through four.

    Apply label = the label of the clause containing the DML command in question.

    x = the number of the statement within the containing APPLY clause.
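    For example (label and file names are hypothetical), after the first IMPORT completes, the activity count of the first statement under the APPLY label Ins1 could be displayed as:

    .IMPORT INFILE Infile1 LAYOUT Lay1 APPLY Ins1;
    .DISPLAY 'Statement 1 activity count: &IMP1_Ins1_1' TO FILE *;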

    Serialization

    The SERIALIZEON keyword allows serialization to be turned on for the specified fields. The SERIALIZEON keyword can be used before, after, or between any IGNORE or MARK statements.

    The SERIALIZEON keyword can also be used with the SERIALIZE keyword in the BEGIN LOAD command and with the KEY keyword in the FIELD command. When it is used in this way, the DML-level serialization overrides the BEGIN LOAD-level serialization.

    In addition, the DML serialized APPLYs can be mixed with nonserialized DML APPLYs in the same IMPORT command.
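    As a sketch (names are hypothetical), DML-level serialization on the primary index field ensures that operations on the same EmpNo value are submitted in order on the same session, while a non-serialized APPLY shares the same IMPORT:

    .DML LABEL UpdEmp SERIALIZEON (EmpNo);
    UPDATE Employee SET Salary = :Sal WHERE EmpNo = :EmpNo;
    .DML LABEL DelEmp;
    DELETE Employee WHERE EmpNo = :EmpNo;
    .IMPORT INFILE Infile1 LAYOUT Lay1
        APPLY UpdEmp WHERE Sal > 0
        APPLY DelEmp WHERE Sal = 0;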

    See “BEGIN LOAD” and “FIELD” for details about these commands.