15.10 - Restarting a Job From the Last Checkpoint Taken - Parallel Transporter

Teradata Parallel Transporter User Guide

Restarting a Job From the Last Checkpoint Taken

To restart a job from the last checkpoint taken, do the following:

1 Determine whether the error that caused the failure is associated with an operator that offers full or limited support of checkpoint restarts.

2 Determine the identity and location of the checkpoint file tbuild will use for the restart and whether or not you need to specify a checkpoint interval.

3 Run the tbuild restart command.

4 Once the job restarts and runs correctly, Teradata PT will delete the checkpoint files automatically.

Support for Checkpoint Restarts

Support for checkpoint restartability varies by operator:

  • The following operators fully support checkpoint restartability:
      • Load
      • Update
      • Stream
      • FastLoad INMOD Adapter
      • FastExport OUTMOD Adapter
      • MultiLoad INMOD Adapter
  • These operators support limited checkpoint restartability:
      • DataConnector operator is fully restartable when processing files in the local filesystem. Checkpoint restartability is partially supported when processing Hadoop files via the HDFS API interface, and not supported when processing Hadoop files and tables via the TDCH-TPT interface. For more information, see “Processing Hadoop Files and Tables” in the Teradata Parallel Transporter Reference.
      • DDL is restartable from the SQL statement that was being executed, but had not completed, at the time the original run of the job terminated.
      • Export and SQL Selector operators are restartable, but not during the exporting of data, as these operators take a checkpoint only when all of the data has been sent to the Teradata PT data stream. Restarting from this checkpoint prevents the operators from having to resend the data.
  • The following operators do not support checkpoint restartability:
      • SQL Inserter
      • ODBC. The ODBC operator does not support checkpoint and restart operations because it is unknown how the databases to which it can connect will handle restarts.

Locating Checkpoint Files

    Checkpoint files must be specified in the tbuild command that restarts the job. Checkpoint files can be found in the following locations, depending on how your site and the job are set up.

  • In the Global Configuration File -- twbcfg.ini

    The Teradata PT installation automatically creates a directory named checkpoint as the default checkpoint directory under the directory in which the Teradata PT software is installed. This checkpoint directory name is automatically recorded in the Global Configuration File (twbcfg.ini) during the installation of the Teradata PT software.

  • In the Local Configuration File -- $HOME/.twbcfg.ini (UNIX systems only)

    On a UNIX system, the checkpoint directory can be set up through the Local Configuration File, the file .twbcfg.ini in your home directory. The Local Configuration File takes precedence if the CheckpointDirectory entry is defined in both the Global Configuration File and the Local Configuration File. Any changes made to the Local Configuration File affect only the individual user. On Windows there is no Local Configuration File.

  • As defined by the tbuild -r option

    tbuild -f <filename> -r <checkpoint directory name>

    The -r option of the tbuild command sets up the checkpoint directory with the specified name. This option overrides -- only for the job being submitted -- any default checkpoint directory that is specified in the Teradata PT configuration files.

    For more information about setting up checkpoint directories, see the Teradata Tools and Utilities Installation Guide for UNIX and Linux.

    If the entry CheckpointDirectory is defined in both configuration files, the one defined in the local configuration file takes precedence. Note that whatever is specified in the local configuration file affects only its owner, not other users.

    Note: On the z/OS platform, checkpoint datasets are defined in the Job Control Language for a Teradata PT job.

    For information on setting up the configuration files for the checkpoint directories, see “Setting Up Configuration Files” on page 71.
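
    The precedence among the three locations described above can be sketched as a small shell helper. This is only an illustration, not Teradata PT's own lookup logic. It assumes each configuration file contains a line of the form CheckpointDirectory='<path>' (check the syntax in your own twbcfg.ini), and the function names and the way the file paths are passed in are hypothetical.

```shell
#!/bin/sh
# Sketch: report which checkpoint directory would apply to a job.
# ASSUMPTION: each configuration file holds a line such as
#   CheckpointDirectory='/some/path'
# Verify the real syntax in your own twbcfg.ini before relying on this.

# Print the CheckpointDirectory value from one configuration file, if any.
read_checkpoint_entry() {
    [ -r "$1" ] || return 0
    sed -n 's/^[[:space:]]*CheckpointDirectory[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$1" |
        head -n 1 | tr -d "'\""
}

# Arguments: <value given to tbuild -r, or empty> <local config> <global config>
resolve_checkpoint_dir() {
    override=$1
    local_cfg=$2
    global_cfg=$3
    if [ -n "$override" ]; then
        # The -r option overrides both configuration files for this job only.
        printf '%s\n' "$override"
        return 0
    fi
    # The Local Configuration File takes precedence over the global one.
    dir=$(read_checkpoint_entry "$local_cfg")
    [ -n "$dir" ] || dir=$(read_checkpoint_entry "$global_cfg")
    printf '%s\n' "$dir"
}
```

    For example, resolve_checkpoint_dir "" "$HOME/.twbcfg.ini" /path/to/twbcfg.ini prints the directory that a job submitted without -r would use, where /path/to/twbcfg.ini stands for wherever the Global Configuration File lives at your site.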

    Default Checkpoint File Names

    Each Teradata PT job automatically creates three associated checkpoint files during job execution and places them in the specified checkpoint directories. These files extend across multiple job steps, if the job has more than one step. They are automatically deleted after a job runs all the way to completion without errors, but if any step fails to finish successfully, the checkpoint files are not deleted and remain in the checkpoint directory.

    Default name formulas for the standard job checkpoint files vary by operating system as follows:

    On UNIX and Windows platforms:

  • <job identifier>CPD1
  • <job identifier>CPD2
  • <job identifier>LVCP
    where <job identifier> is the job name from the tbuild command line, if a job name was specified, or the user ID in the job logon, if a job name was not specified.
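
    As a rough sketch, the name formulas above can be expanded for a given job as follows. The function name is hypothetical, and the identifier and suffix are joined exactly as the formulas are written. If the files at your site carry a separator between the two parts, adjust the format string accordingly.

```shell
#!/bin/sh
# Sketch: print the three default checkpoint file names for a job on
# UNIX and Windows platforms, following the formulas as written above.
# Arguments: <user ID from the job logon> [job name from the tbuild command line]
checkpoint_file_names() {
    # The job name is used when one was given; otherwise the user ID.
    job_identifier=${2:-$1}
    for suffix in CPD1 CPD2 LVCP; do
        printf '%s%s\n' "$job_identifier" "$suffix"
    done
}
```

    Calling checkpoint_file_names dbadmin weekly_load lists names based on weekly_load, while checkpoint_file_names dbadmin falls back to the user ID. Both values are made up for illustration.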

    On z/OS platforms, the checkpoint datasets have the following DDNAMEs:

  • //CPD1
  • //CPD2
  • //LVCP

Use tbuild to Restart the Job

    Use one of the following variations to restart a failed job. To restart any job that terminated abnormally, use the same tbuild command that you used to submit the job the first time. The job will then be automatically restarted at the point where the last checkpoint was taken.
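
    Because the restart uses the same command line as the original submission, one way to avoid retyping it differently is to wrap it in a function. The sketch below is an assumption-laden illustration: submit_job and the TBUILD_BIN override are hypothetical names, the script and job names are invented, and the job name is assumed to be the final argument of tbuild (confirm this against the tbuild syntax in the Teradata Parallel Transporter Reference).

```shell
#!/bin/sh
# Sketch: submit a job and, after a failure, restart it with the
# identical command line.
# ASSUMPTION: the job name is passed as the last tbuild argument.
# TBUILD_BIN is a hypothetical override that allows a dry run.
# Arguments: <job script> <job name> [other tbuild options, e.g. -r <dir>]
submit_job() {
    script=$1
    jobname=$2
    shift 2
    "${TBUILD_BIN:-tbuild}" -f "$script" "$@" "$jobname"
}

# First submission:
#   submit_job weekly_load.tpt weekly_load -r /var/tpt/checkpoints
# Restart after an abnormal termination (same call, same arguments):
#   submit_job weekly_load.tpt weekly_load -r /var/tpt/checkpoints
```

    Keeping the options in one place matters most for -r: the restart can only find the checkpoint files if it looks in the same checkpoint directory as the original run.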

    Restarting with a Default Job Name

    When no job name is specified in the tbuild statement at job launch, Teradata PT assigns a default name to the job that is based on the login name, and creates a checkpoint file called <username>.LVCP.

    Jobs executed under the same login name, therefore, use the same <username>.LVCP file, which can be a problem if a job fails because the checkpoint file associated with a failed job remains in the checkpoint directory.

    Starting a new job before restarting the failed job results in unpredictable errors because the new job will use the checkpoint file of the failed job. To avoid such errors, do the following:

  • Restart failed jobs and run them to completion before starting any new jobs.
  • Always delete the checkpoint file of failed jobs before starting a new job. Restarting a failed job after deleting its checkpoint file will cause it to restart from its beginning.
  • Always specify the jobname parameter for all jobs so every job has a unique checkpoint file.
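
    A pre-flight check along the following lines can help enforce these precautions before a new job is started. It is a minimal sketch: the function name is hypothetical, the checkpoint directory must be supplied by the caller, and because this guide shows the file names both with and without a period before the suffix, the check looks for either form.

```shell
#!/bin/sh
# Sketch: detect checkpoint files left behind by a failed job.
# Arguments: <checkpoint directory> <job identifier (job name or user ID)>
# Returns 0 if any checkpoint file for that identifier is present, 1 otherwise.
has_leftover_checkpoint() {
    for suffix in LVCP CPD1 CPD2; do
        # ASSUMPTION: accept both "<id>LVCP" and "<id>.LVCP" style names.
        for sep in "" "."; do
            [ -e "$1/$2$sep$suffix" ] && return 0
        done
    done
    return 1
}

# Example guard (directory and identifier are placeholders):
#   if has_leftover_checkpoint /var/tpt/checkpoints dbadmin; then
#       echo "Restart or clean up the failed job before starting a new one." >&2
#       exit 1
#   fi
```

    The guard only reports the leftover files. Whether to restart the failed job or delete its checkpoint files remains a decision for the operator, since deleting them makes the failed job start again from the beginning.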