15.10 - Writing Load Scripts for Restartability and Availability - Parallel Transporter

Teradata Parallel Transporter User Guide

Writing Load Scripts for Restartability and Availability

One of the main challenges in data warehouse design is recovering from a failure as quickly as possible. Recovery usually involves fixing the client or server systems, changing configuration parameters or system resources, restarting the interrupted jobs from their last checkpoints, and bringing the system back to normal without resorting to laborious manual effort or piecemeal recovery procedures.

In many cases, jobs must also perform "catch up" so that transactions accumulated during the "failure window" can be applied to the target systems as quickly as possible.

To this end, Teradata PT provides some unique features that allow you to speed up the recovery process without resorting to changing job scripts after a job failure. These features include:

  • Making all jobs checkpoint restartable by default.
  • Archiving transactional data in a readily-loaded format concurrently with the loading of such transactions into target tables using the Duplicate APPLY feature, which allows the same data to go into different targets.
  • Defining a single script language for all operators, which not only results in common approaches for defining operators, but also allows substantial reusability of metadata and operators.
  • Supporting unlimited variable substitution using a job variables file so that changeable and common job parameters, called “attributes,” can be isolated in a single place for value assignments.
  • Keeping the producer operator (for data extraction) and the consumer operator (for data loading) in a job completely independent, which substantially simplifies switching export or load protocols. In other words, changing either the producer operator or the consumer operator in a job does not affect the other.

To take advantage of the above features for restartability, some best practices for designing and implementing job scripts are necessary. The best practices presented below address the reusability and manageability of job scripts, the flexibility to build and enhance them to deal with ever-increasing data volumes and changes in execution environments, and restartability after job failures. These practices can also be regarded as standard guidelines for building data warehousing processes.

  • Always use a job name to execute a job.
  • Use job variable files to capture changeable and common parameters such as user ID, password, file names, source or archive directory names, the number of producer and consumer instances, and so on.
  • Run with backup or archive using the Duplicate APPLY feature so that each APPLY statement can send the same data to a different target.
  • Define checkpoint frequency to control load granularity in case of failure. The shorter the checkpoint interval, the less time it takes to recover a job, but the more time is spent taking checkpoints.
  • Switch the load protocol (for example, Stream to Update) for purposes of catch up after a system failure.
  • Always execute a job with the job variables file so that parameters are defined in one place instead of being distributed across job scripts.
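
The checkpoint-frequency trade-off in the list above can be made concrete with a rough model. The following Python sketch is illustrative only: the function name, the job duration, the per-checkpoint cost, and the assumption that a failure lands on average halfway through a checkpoint interval are all hypothetical and are not Teradata PT measurements.

```python
def checkpoint_tradeoff(job_seconds, interval_seconds, checkpoint_cost_seconds):
    """Return (total checkpoint overhead, expected rework after one failure), in seconds.

    Simplified model (assumption): checkpoints are taken at fixed intervals,
    each one costs a fixed amount of time, and a failure is equally likely at
    any point inside an interval, so on average half an interval is redone.
    """
    checkpoints = job_seconds // interval_seconds
    overhead = checkpoints * checkpoint_cost_seconds
    expected_rework = interval_seconds / 2
    return overhead, expected_rework


# Hypothetical 2-hour job in which each checkpoint costs 5 seconds.
for interval in (60, 300, 1800):
    overhead, rework = checkpoint_tradeoff(7200, interval, 5)
    print(f"interval={interval}s overhead={overhead}s expected_rework={rework}s")
```

Under these assumed numbers, shortening the interval reduces the expected rework after a failure and increases the total time spent on checkpoints, which is the trade-off the checkpoint-frequency guideline describes.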


    Restarting a Job from a Job Failure

    Automatic Restart

    An automatic restart means that a job can restart on its own without manual resubmission. With the default start-of-data and end-of-data checkpoints, a job can automatically restart itself when a retryable error, such as a database restart or deadlock, occurs before, during, or after data loading. Consider the following when dealing with automatic restarts:

  • Jobs can automatically restart as many times as specified by the value of the RETRY option of the Teradata PT job-launching command (the -R option). By default, a job can restart up to five times.
  • If no checkpoint interval is specified for a job, and the job fails during processing, the job restarts either at the start-of-data checkpoint or the end-of-data checkpoint, depending on which one is the last recorded checkpoint in the checkpoint file.
  • To avoid reloading data from the beginning, especially for a long running job, specify a checkpoint interval when launching a job so the restart can be done based on the most recent checkpoint taken.

    Manual Restart

    If a job fails and terminates, you can manually restart it by resubmitting the same job with the original job-launching command. By default, all Teradata PT jobs are checkpoint-restartable using one of the two checkpoints taken before data loading and after data loading. When jobs have multiple steps, a checkpoint is created for each successful step, allowing a job to restart from the failed step.
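
    The restart behavior described above (resuming at the last recorded checkpoint, with automatic restarts bounded by a restart limit) can be sketched with a small simulation. This is a simplified model, not Teradata PT code: the function names, the injected fault, and the assumption that every row sent after the last checkpoint is sent again on restart are hypothetical. The default limit of five restarts is taken from the text above.

```python
class RetryableError(Exception):
    """Stands in for a retryable condition such as a database restart."""


def load(total_rows, checkpoint_every, fail_at, state):
    """Send rows to the target, resuming at the last recorded checkpoint."""
    position = state["checkpoint"]            # 0 is the start-of-data checkpoint
    while position < total_rows:
        if position == fail_at and state["failures_left"] > 0:
            state["failures_left"] -= 1       # injected fault (hypothetical)
            raise RetryableError(f"retryable error at row {position}")
        state["rows_sent"] += 1
        position += 1
        if checkpoint_every and position % checkpoint_every == 0:
            state["checkpoint"] = position    # interval checkpoint
    state["checkpoint"] = total_rows          # end-of-data checkpoint


def run_with_restarts(total_rows, checkpoint_every, fail_at,
                      failures=1, restart_limit=5):
    """Return the total number of rows sent, including rows sent again."""
    state = {"checkpoint": 0, "rows_sent": 0, "failures_left": failures}
    for _attempt in range(1 + restart_limit):  # first run plus automatic restarts
        try:
            load(total_rows, checkpoint_every, fail_at, state)
            return state["rows_sent"]
        except RetryableError:
            continue                           # restart from state["checkpoint"]
    raise RuntimeError("restart limit reached; manual restart required")


# No checkpoint interval: the restart begins again at start-of-data.
print(run_with_restarts(1000, 0, 900))
# Checkpoint every 100 rows: the restart resumes at row 900.
print(run_with_restarts(1000, 100, 950))
```

    In this model, the run without a checkpoint interval sends far more rows in total than the run with one, because the restart has to begin again at the start-of-data checkpoint.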

    Restarting a Job for “Catch Up”

    Here are the steps for switching the load protocol to perform “catch up”:

    1 Terminate the current job with the TERMINATE command. This forces the job to take a checkpoint before it terminates.

    2 Switch the load protocol by either changing the operator in the job variables file or by using another job variables file that has the new operator. The latter method is highly recommended because it prevents users from modifying existing job variables files.

    3 Resubmit the same job with the same command options.

    Note: Do not clean up the Teradata PT checkpoint files left from the previous run.

    The steps above can be easily automated because performing "catch up" is very similar to restarting a job. In most of the "catch-up" cases, you do not need to modify the original scripts. This is all due to the advantages of having a single script language, external job variables to isolate changes to one place, and a common protocol for checkpoint restart across operators.
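
    As one way to automate the steps above, the following Python sketch assembles the two commands involved and prints them without executing anything. It assumes the commonly documented command forms (twbcmd with JOB TERMINATE, and tbuild with -f for the script, -v for the job variables file, -z for the checkpoint interval in seconds, and the job name as the last argument); verify them against the command reference for your release. The job name, script name, and job variables file name are hypothetical.

```python
import shlex


def terminate_command(job_name):
    """Command that asks a running job to take a checkpoint and terminate."""
    return ["twbcmd", job_name, "JOB", "TERMINATE"]


def launch_command(job_name, script, job_vars_file, checkpoint_interval=None):
    """Job-launching command; the job name is passed as the last argument."""
    command = ["tbuild", "-f", script, "-v", job_vars_file]
    if checkpoint_interval is not None:
        command += ["-z", str(checkpoint_interval)]  # interval in seconds
    command.append(job_name)
    return command


def catch_up_plan(job_name, script, catch_up_vars_file, checkpoint_interval=None):
    """Steps 1 to 3: terminate, then resubmit with another job variables file.

    The checkpoint files from the previous run are deliberately left in
    place so that the resubmitted job resumes from the last checkpoint.
    """
    return [
        terminate_command(job_name),
        launch_command(job_name, script, catch_up_vars_file, checkpoint_interval),
    ]


# Hypothetical names: the same job and script, with a job variables file
# that names the new load operator.
for command in catch_up_plan("daily_load", "load_script.txt", "update_vars.txt", 60):
    print(shlex.join(command))
```

    Only the job variables file differs from the original launch, which reflects the recommendation in step 2 to keep a separate file for the new operator rather than modify the existing one.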