Log Table Space Requirements - Parallel Data Pump

Teradata Parallel Data Pump Reference

Product
Parallel Data Pump
Release Number
15.00
Language
English (United States)
Last Update
2018-09-27
dita:id
B035-3021
lifecycle
previous
Product Category
Teradata Tools and Utilities

Log Table Space Requirements

The calculation of space requirements for a Teradata TPump log table is highly dependent on the specifics of the job. Although there are mandatory inserts for every Teradata TPump job, others occur on a job‑dependent basis. See “Estimating Space Requirements” for details on how to calculate log table space.

NAME

Purpose  

The NAME command assigns a unique job name identifier to the environmental variable &SYSJOBNAME.

Syntax  

where

 

Syntax Element

Description

jobname

Character string that identifies the name of a job in a maximum of 16 characters

If this command is not specified, the default job name of ltdbase_logtable is used, where:

  • ltdbase is a character string of up to the first seven characters of the name of the database containing the log table.
  • logtable is a character string with the first eight characters of the log table name.
  • Usage Notes  

    The NAME environmental command must be used only once, in order to set the job name and the variable &SYSJOBNAME. Further attempts to execute the command will fail.

    The NAME command sets the variable &SYSJOBNAME to the specified string. The string is truncated to 16 characters. It is an error to use this command more than once in a Teradata TPump script or after the first BEGIN LOAD command in the script.

    If &SYSJOBNAME is not set using the NAME command, it defaults to MYYYYMMDD_HHMMSS_LLLLL, where

    M = macro
    MM = month
    DD = day
    YYYY = year
    hh = hour
    mm = minute
    ss = second
    lllll = is the low order 5 digits of the logon sequence number returned by the database from the .LOGON command.

    This variable is not set until created with the NAME command, or with the first BEGIN LOAD by default. Any attempt to use it before a NAME command is issued (or before the first BEGIN LOAD if there is no NAME command), results in a syntax error. This variable is significant because it is used by Teradata TPump when composing default names for various database artifacts, namely the error table and Teradata TPump‑created macros.

    Note: If serialization for two or more DML statements is required, they cannot be put in different partitions. Serialization requires that all DML statements with identical hash values of the rows be submitted from the same session.

    PARTITION

    Purpose  

    The PARTITION command defines a collection of sessions used to transfer SQL requests to Teradata Database. A DML command may name the partition to be used for its requests to the database.

    A default session partition may still be created using the SESSIONS and PACK parameters of the BEGIN LOAD command.

    This command works in conjunction with a DML parameter, PARTITION, which names the session partition that a DML’s SQL will use. If the DML command does not have a PARTITION parameter, then the default partition created using the SESSIONS and PACK parameters of the BEGIN LOAD command will be used.

    Syntax  

    where

     

    Syntax Element

    Description

    number

    Number of sessions to be logged on for the partition

    Teradata TPump logs on and uses the number of sessions specified to communicate requests to Teradata Database.

    There is no default value for number; it must be specified. Neither is there a maximum value, except for systemwide session limitations, which vary among machines.

    Limiting the number of sessions conserves resources on both the external system and Teradata Database. This conservation is at the expense of a potential decrease in throughput and increase in elapsed time.

    DATAENCRYPTION ON/OFF

    Keyword to encrypt import data and the request text during the communication between Teradata TPump and Teradata Database for the sessions defined in the PARTITION command

    If ON, the encryption will be performed. If OFF, the encryption will not be performed. If DATAENCRYPTION is not specified, the default is OFF when "y" runtime parameter is not specified and DATAENCRYPTION is OFF in the BEGIN LOAD command. If "y" runtime parameter is specified or DATAENCRYPTION is ON in the BEGIN LOAD command, the default is ON.

    This option applies to the sessions defined by the PARTITION command. When the session is specified explicitly, the setting overrides the encryption setting by the "y" runtime parameter and by the DATAENCRYPTION option in the BEGIN LOAD command for the sessions defined in the PARTITION command.

    PACK

    Keyword for the number of statements to pack into a multiplestatement request

    Maximum value is 1500.

    Packing improves network/channel efficiency by reducing the number of sends and receives between the application and Teradata Database.

    PACKMAXIMUM

    Keyword requesting Teradata TPump to dynamically determine the maximum possible PACK factor for the current partition

    Maximum value is 1500.

    Displayed in message UTY6652, the value thus determined should be specifically used on subsequent runs, as the use of PACKMAXIMUM requires iterative interactions with the database during initialization to heuristically determine the maximum possible PACK factor.

    partition_name

    Name assigned to the partition for reference by one or more subsequent DML commands

    A partition name must obey the same rules for its construction as Teradata SQL column names. The name specified may be used in the PARTITION clause of a DML command.

    SESSIONS

    Keyword for designating the number of sessions for the partition

    statements

    Number of statements, as a positive integer of up to 1500, to pack into a multiplestatement request

    Default value is 20 statements per request.

    Note: Under certain conditions, Teradata TPump may determine that the pack factor has been set too high. Teradata TPump then automatically lowers the pack setting to an appropriate value and issues warning message UTY6625, for example:

    “UTY6625 WARNING: Packing has been changed to 12 statements per request”, and continues.

    Packing improves network/channel efficiency by reducing the number of sends/receives between the application and the database.

    The packing factor is validated by sending a fully packed request to Teradata Database using a prepare. This test checks for syntax problems and requests that are excessively large and overwhelm the parser.

    To simplify the script development process, Teradata TPump ignores certain errors returned by an overloaded parser, shrinks the request, retries the prepare until it executes successfully and finally, issues a warning noting the revised packing factor size.

    When this happens, the Teradata TPump script should be modified to eliminate the warning, which avoids the timeconsuming process of shrinking the request.

    Note: A packing failure may occur if the source parcel length does not match the data defined. If this happens, Teradata TPump issues the message:

    “UTY2819 WARNING: Packing may fail because input data does not match with the data defined.”

    To resolve this problem, increase the packing factor and resubmit the job.

    threshold

    Minimum number of sessions to be logged on for the partition

    When logging on sessions, if system limits are reached above the threshold value, Teradata TPump stops trying to log on, and uses whatever sessions are already logged on.

    If the sessions run out before the threshold is reached, Teradata TPump logs off all sessions, waits for the time determined by the SLEEP value (specified in the BEGIN LOAD command), and tries to log on again.

    Example  

    A sample script that uses partitioning follows:

    .LOGTABLE TPLOG01;
    .LOGON <TDPID>/<USERID>,<PASSWORD>;
     DROP TABLE TPTBL01;
     DROP TABLE TPTBL02;
     DROP TABLE TPERR01;
     
     CREATE TABLE TPTBL01, FALLBACK(
      C1  CHAR(12) not null,
      C2  CHAR(8) not null)
      PRIMARY INDEX (C1);
     
     CREATE TABLE TPTBL02, FALLBACK(
      C1  CHAR(12),
      C2  CHAR(8),
      C3  CHAR(6))
      UNIQUE PRIMARY INDEX (C1);
     
    .BEGIN LOAD
       ERRLIMIT 100 50
       CHECKPOINT 15
       TENACITY 2
       ERRORTABLE TPERR01
       ROBUST off
       serialize on
       ;
     
     .LAYOUT LAY02;
     .FIELD cc1 * CHAR(12) key;
     .FIELD cc2 * CHAR(8);
     .FIELD cc3 * CHAR(6);
     .filler space1 * char(1);
     
    .partition part1  pack 10 sessions 10;
    .partition part2 sessions 5 1 packmaximum;
     
    .DML LABEL LABEL01 partition part1
     DO INSERT FOR MISSING ROWS
     ignore extra update rows
       use(cc1, cc2);
     
     UPDATE TPTBL01
     SET C2 = :CC2
     WHERE C1 = :CC1;
     INSERT TPTBL01 (C1, C2)
     VALUES (:CC1,:CC2);
     
    .DML LABEL LABEL02 partition part2
     serializeon( cc1 )
     ignore extra update rows
     DO INSERT FOR MISSING UPDATE ROWS;
     
     UPDATE TPTBL02 SET C2 = :CC2 WHERE C1 = :CC1;
     INSERT TPTBL02 (C1, C2, C3)
              VALUES (:CC1,:CC2,:CC3);
     
    .IMPORT INFILE TpumpData001.txt FORMAT TEXT
                   LAYOUT LAY02
                   APPLY LABEL01
                   APPLY LABEL02 where CC2 = '00000001';
     
    .END LOAD;
    .LOGOFF;

    ROUTE

    Purpose  

    The ROUTE command identifies the destination of various outputs produced by Teradata TPump.

    Syntax  

    where

     

    Syntax Element

    Description

    MESSAGES

    Preferred location where the messages be redirected (normally written to DDNAME SYSPRINT in z/OS or stdout in UNIX systems); that is, sent to an additional destination, or both

    fileid1 and fileid2

    Alternate message destination in the external system

    UNIX and Windows Systems

    Fileid is the path name for a file. If the path name has embedded white space characters, enclose the entire path name in single or double quotes.

    If the path name is enclosed by single quotation marks and there is an embedded single quotation mark, then an escape character (single quotation mark) needs to precede the embedded single quotation mark. Likewise, if the path name is enclosed by double quotation marks and there is an embedded double quotation mark, then an escape character (double quotation mark) needs to precede the embedded double quotation mark.

    If the path name contains single or double quotation marks, it should be enclosed in either single or double quotation marks.

    ECHO

    Additional destination, with a fileid specification

    Use the ECHO keyword to specify, for example, that messages be captured in a file (fileid2) while still being written to the terminal.

    Note: The ECHO OFF specification cancels the additional file specification of a previously established ECHO destination.

    Usage Notes  

    In z/OS, fileid is a true DDNAME; and in the UNIX OS, it is a file pathname. If DDNAME is specified, Teradata TPump writes data records to the specified destination. A DDNAME must obey the same rules for its construction as Teradata SQL column names except that the “at” sign (@) is allowed as an alphabetic character and the underscore ( _ ) is not allowed. The DDNAME must also obey the applicable rules of the external system and may reference a sequential or VSAM data set. If DDNAME represents a data source on magnetic tape, the tape may be either labelled or nonlabelled (if the operating system supports it).

    On UNIX systems, an asterisk (*) can be used as the fileid1 or fileid2 specifications to route messages to the system console/standard output (stdout) device. The system console is the:

  • Display screen in interactive mode
  • or

  • Standard output device in batch mode
  • Example  

    .ROUTE MESSAGES TO FILE OUTPUT WITH ECHO TO FILE SYSOUT;

    ECHO, when specified with OFF, stops routing output to the previously established echo destination.

    Example  

    .ROUTE MESSAGES FILE OUTPUT;

    The messages are written to the file designated by OUTPUT from this point unless redirected by another ROUTE command.

    In UNIX systems, if “outfilename is used to redirectstdout,” and also as the file in a ROUTE WITH ECHO command, the results written tooutfilename may be incomplete due to conflicting writes to the same file.

    RUN FILE

    Purpose  

    The RUN FILE command invokes the specified external source as the current source of commands and statements.

    Syntax  

    where

     

    Syntax Element

    Description

    fileid

    Data source of the external system

    The client system DD or equivalent statement specifies a file.

    UNIX and Windows Systems

    infilename (the path name for a file). If the path name has embedded white space characters, enclose the entire path name in single or double quotes.

    If the path name is enclosed by single quotation marks and there is an embedded single quotation mark, then an escape character (single quotation mark) needs to precede the embedded single quotation mark. Likewise, if the path name is enclosed by double quotation marks and there is an embedded double quotation mark, then an escape character (double quotation mark) needs to precede the embedded double quotation mark.

    If the path name contains single or double quotation marks, it should be enclosed in either single or double quotation marks.

    z/OS

    a true DDNAME.

    If DDNAME is specified, Teradata TPump reads data records from the specified source.

    A DDNAME must obey the same rules for its construction as Teradata SQL column names, except that:

  • the “at” sign (@) is allowed as an alphabetic character
  • the underscore (_) is not allowed
  • The DDNAME must also obey the applicable rules of the external system and may reference a sequential or VSAM data set.

    If DDNAME represents a data source on magnetic tape, the tape may be either labelled or nonlabelled (if the operating system supports it). The “at” sign (@) is allowed as an alphabetic character and the underscore (_) is not allowed.

    charpos1 and charpos2

    Start and end character positions of a field in each input record which contains extraneous information

    Teradata TPump ignores the specified field(s) as follows:

  • charpos1: only the single specified character position is ignored.
  • charpos1 THRU: character positions from charpos1 through the end of the record are ignored.
  • THRU charpos2: character positions from the beginning of the record through charpos2 are ignored.
  • charpos1 THRU charpos2: character positions from charpos1 through charpos2 are ignored.
  • Usage Notes  

    Once Teradata TPump executes the RUN FILE command, further commands and DML statements are read from the specified source until a LOGOFF command or end‑of‑file condition is encountered, whichever occurs first. An end‑of‑file condition automatically causes Teradata TPump to resume reading its commands and DML statements from the previously active source (or the previous RUN source when RUNs are nested), either SYSIN for z/OS, or stdin (normal or redirected) in the UNIX OS. After SYSIN/stdin processes any user‑provided invocation parameter, it remains the active input source.

    If an end‑of‑file condition occurs on fileid, SYSIN/stdin is read because there are no more commands or statements in the PARM string.

    The command source specified by a RUN FILE command may contain nested RUN FILE commands to 16 levels.

    On UNIX systems, an asterisk (*) can be used as the fileid specification for the system console/standard input (stdin) device. The system console is the:

  • Keyboard in interactive mode
  • or

  • Standard input device in batch mode
  • Example  

    .RUN FILE LOGON;

    SET

    Purpose  

    The SET command assigns a data type and a value to a utility variable. Variables need not be declared in advance to be the object of the SET command. If a variable does not already exist, the SET command creates it.

    The SET command also dynamically changes the data type to that of the assigned value if it has already been defined. Variables used to the right of TO in the expression must have already been defined.

    Syntax  

    where

     

    Syntax Element

    Description

    var

    Name of the utility variable which is set to the evaluated expression following it

    expression

    Value and data type to which the utility variable is to be set

    Usage Notes  

    The utility variable can be substituted wherever substitution is allowed.

    If the expression evaluates to a numeric value, the symbol is assigned an integer value, as in:

    .SET FOONUM TO ‑151 ;

    If the expression is a quoted string, the symbol is assigned a string value, as in:

    .SET FOOCHAR TO '‑151' ;

    The minimum and maximum limits for Floating Point data types are as follows:

    4.0E75 <=abs(float variable)<7.0E75

    Example  

    Teradata TPump supports concatenation of variables, using the SET command, such as:

    .SET C TO 1;
    .SET D TO 2;
    .SET X TO &C.&D;

    Example  

    In this example, X evaluates to 12. If a decimal point is added to the concatenated variables, as in:

    .SET C TO 1;
    .SET D TO 2;
    .SET X TO &C..&D;

    X then evaluates to 1.2.

    SYSTEM

    Purpose  

    The SYSTEM command allows access to the local operating system during Teradata TPump operations.

    Syntax  

    where

     

    Syntax Element

    Description

    ‘oscommand'

    Command string (enclosed within single quotes) that is appropriate to the local operating system

    The SYSTEM command suspends the current Teradata TPump application in order to execute the command. When the command completes, the return code from the invoked command is displayed, and the &SYSRC variable is updated with the return code.

    Usage Notes

    On z/OS clients, the command is passed to the PGM executor. The first token of the command string is interpreted as a load module and the remainder as a PARM string. As an example, the following statement calls the load module IEBUPDTE, passing the PARM string “NEW”.

       .SYSTEM “IEBUPDTE NEW”;

    This command calls IEBUPDTE in the same way it is called via the JCL statement:

       //EXEC PGM=IEBUPDTE,PARM='NEW'

    On z/OS, the program must be present in the STEPLIB or JOBLIB concatenation, be resident in one of the LPAs, or be located in the linklist concatenation.

    Otherwise, the invocation will fail, with code SYS_ABTM (‑14) returned, resulting from an underlying abend S806‑04. Other types of failures also are possible.

    On UNIX OS clients, the SYSTEM command invokes the standard UNIX OS interface to issue the command to the shell (sh), and waits for its completion.

    TABLE

    Purpose  

    The TABLE command identifies a table whose column names and data descriptions are used as the names and data descriptions of the input record fields. These are assigned in the order defined. The TABLE command must be used with the LAYOUT command.

    Syntax  

    where

     

    Syntax Element

    Description

    tableref

    Existing table whose column names and data descriptions are assigned, in the order defined, to fields of the input data records

    The column names of the table specified by the TABLE command must be Teradata SQL column names that do not require being enclosed in quotation marks. Tables cannot be created with invalid column names. Any nonstandard column name causes one of three kinds of errors, depending on the nature of the divergence from the standard. These errors are:

  • Embedded blanks cause a syntax error, depending on the nonblank contents of the name.
  • Invalid characters cause an invalid name error.
  • Reserved words cause a syntax error that mentions invalid use of the reserved word.
  • Usage Notes

    One or more TABLE commands may be intermixed with the FIELD command or FILLER command following a LAYOUT command.

    This method of specifying record layout fields assumes each field, as defined by the data description of the corresponding column of tableref, is contiguous with the previous one, beginning at the next available character position beyond any previous field specifications for the input records. The fields must appear in the order defined for the columns of the table.

    The object identified by the tableref parameter must be a table. It need not appear as a parameter of the BEGIN LOAD command, but the user must either be an owner of the object or have at least one privilege on it. If specified as an unqualified table name, the current default database qualifies it.

    When serialization has been set to ON by the BEGIN LOAD command, the primary index columns for the specified table are considered key fields for serialization purposes.

    When the TABLE command is used and the table contains a structured UDT type, Teradata TPump returns an external representation of the UDT and that requires the user to transform. The term “external type” means the data type of the external opaque container for a structured UDT and is the type returned by the from‑sql transform method.

    TIMEOUTVALUE

    Purpose  

    The TIMEOUTVALUE command specifies a maximum I/O completion wait time to Teradata WebSphere MQ Access Module.

    Syntax  

    where

     

    Syntax Element

    Description

    seconds

    A maximum I/O completion wait time to Teradata MQ Access Module.

    Valid range: 1-99999 (seconds)

    Usage Notes  

    When TIMEOUTVALUE command is presented, the value overwrites that of RWAIT option in Teradata WebSphere MQ Access Modules initial string.

    UPDATE Statement and Atomic Upsert

    Purpose  

    Teradata TPump supports the UPDATE Teradata SQL statement, which changes field values in existing rows of a table.

    Syntax  

    where

     

    Syntax Element

    Description

    tname

    Table or view to be updated

    This table was previously identified as tname in the BEGIN LOAD command. If tname is not explicitly qualified by database name, the current default database qualifies it.

    cname

    Column whose value is to be replaced by the value of expr

    The column named must not be a column of the primary index.

    expr

    Expression whose resulting value is to replace the current value of the identified column

    The expression can contain any combination of constants, current values of columns of the referenced row, or values from fields of input data records.

    References to fields of input data records are as follows:

    :fieldname

    where :fieldname is defined by a FIELD command or TABLE command of the layout referenced by an IMPORT using this UPDATE.

    WHERE condition

    Conditional clause specifying the row or rows to be updated

    The conditional clause can use values from fields of input data records by referring to their field names as follows:

    :fieldname

    where fieldname is defined by a FIELD command or TABLE command. Equality values for all the columns of the primary index must be explicitly specified in this clause.

    Usage Notes Update

    The following notes describe how to use an UPDATE statement following a DML command. An UPDATE statement may also be used in the support environment; normal rules for UPDATE are followed in that case.

  • To update records in a table, the userid that is logged on must have UPDATE privilege for the table.
  • In an IMPORT task, if multiple Unique Primary Index (UPI) columns are specified, the columns should all be specified in the WHERE clause of the UPDATE statement. In this case, the WHERE clause is fully qualified, thereby allowing Teradata TPump to avoid table locks and optimize the processing.
  • For Teradata TPump use, if the object of the UPDATE statement is a view, it must not specify a join. Teradata TPump operates only on single table statements so UPDATE statements must not contain any joins.
  • Only one object may be identified.
  • The OR construct can be used in the WHERE clause of a DELETE statement; alternatively, two or more separate DML statements (one per OR term) can be used, with the DML statements applied conditionally with the APPLY clause of the IMPORT command. The nature of the alternatives will usually make one of the methods more appropriate.
  • The maximum number of INSERT, UPDATE, and DELETE statements that can be referenced in an IMPORT is 128. The 128th DML which would cause the insertion of the DML sequence number of 128 for the DMLSEQ field in the error table could lead to Teradata Database 3520 error.
  • The maximum number of DML statements that can be packed into a request is 1500. The default number of statements packed is 20.
  • To ensure data integrity, the SERIALIZE parameter defaults to ON in the absence of an explicit value if there are upserts in the Teradata TPump job.