One of the main challenges for data warehousing design is how to recover from a failure as quickly as possible. Recovery usually involves fixing the client or server systems, changing configuration parameters or system resources, restarting the interrupted jobs based on their last checkpoints, and bringing the system back to normal without resorting to rigorous manual efforts or writing piece-meal recovery procedures.
Most of the time, jobs may also be required to perform "catch up" so that transactions that were accumulated during the "failure window" can be applied to the target systems as quickly as possible.
To this end, Teradata PT provides some unique features that allow you to speed up the recovery process without resorting to changing job scripts after a job failure. These features include:
- Making all jobs checkpoint restartable by default.
- Archiving transactional data in a readily-loaded format concurrently with the loading of such transactions into target tables using the Duplicate APPLY feature, which allows the same data to go into different targets.
- Defining a single script language for all operators, which not only results in common approaches for defining operators, but also allows substantial reusability of metadata and operators.
- Supporting unlimited variable substitution using a job variables file so that changeable and common job parameters, called “attributes,” can be isolated in a single place for value assignments.
- Having complete independence between the producer operator (for data extraction) and the consumer operator (for data loading) in a job substantially simplifies the process of "switching export/load protocols". In other words, changing either the producer operator or the consumer operator in a job would not impact the other.
To take advantage of the above features for restartabilty, some best practices for designing and implementing job scripts are necessary. The best practices presented below speak to reusability and manageability of job scripts, the flexibility of building and enhancing them to deal with ever increasing data volumes and changes in execution environments, and restartability after job failures. These practices can also be regarded as standard guidelines in building data warehousing processes.
- Always use a job name to execute a job.
- Use job variable files to capture changeable and common parameters such as user ID, password, file names, source or archive directory names, the number of producer and consumer instances, and so on.
- Run with backup or archive using the Duplicate APPLY feature so that each APPLY statement can send the same data to a different target.
- Define checkpoint frequency to control load granularity in case of failure. The smaller the frequency, the less time to recover a job, but more time to take checkpoints.
- Switch the load protocol (for example, Stream to Update) for purposes of catch up after a system failure.
- Always execute a job with the job variables file so that parameters are defined in one place instead of being distributed across job scripts.