Checkpoints increase Teradata PT job overhead. In terms of resources, each executing operator must do the additional work of writing its internal operating state to the checkpoint file so that it can be restarted from that information. In terms of running time, each executing operator must first finish all in-progress work, take its checkpoint, and then wait (when necessary) until all the other operators have finished taking their checkpoints.
Frequent checkpoints limit the amount of work that must be repeated if the job is interrupted and later restarted, because they shorten the time between the most recent checkpoint and the error event. However, specifying a very short checkpoint interval can significantly increase job running time. Choosing a checkpoint interval is a trade-off between the cost of increased job running time and the potential reduction in repeated work if the job must be restarted.
For example, for a Teradata PT job that loads 20,000,000 rows with 4 instances each of the producer and consumer operators:
- Specifying a checkpoint interval of 10 seconds increased the job's running time by 7.3% and its host CPU time by 3.3%.
- Specifying a checkpoint interval of 5 seconds increased the job's running time by 20% and its host CPU time by 6.6%.
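The trade-off behind these figures can be made concrete with a little arithmetic. The following Python sketch is illustrative only and is not part of Teradata PT: the function name `checkpoint_tradeoff` and the 1,000-second base running time are hypothetical, and it assumes, as a simplification, that a restart repeats at most about one checkpoint interval of work. Only the two running-time percentages come from the example above.

```python
# Running-time increases quoted in the example above:
# checkpoint interval in seconds -> fractional increase in job running time.
MEASURED_RUNTIME_INCREASE = {10: 0.073, 5: 0.20}


def checkpoint_tradeoff(base_seconds, interval_seconds, runtime_increase):
    """Return (added_seconds, max_rework_seconds) for one checkpoint interval.

    added_seconds is the extra running time paid on every run, whether or
    not the job is ever restarted.
    max_rework_seconds rests on a simplifying assumption: a restart repeats
    at most about one checkpoint interval of work.
    """
    added_seconds = base_seconds * runtime_increase
    max_rework_seconds = float(interval_seconds)
    return added_seconds, max_rework_seconds


if __name__ == "__main__":
    base = 1000.0  # hypothetical running time without checkpoints, in seconds
    for interval in sorted(MEASURED_RUNTIME_INCREASE, reverse=True):
        increase = MEASURED_RUNTIME_INCREASE[interval]
        added, rework = checkpoint_tradeoff(base, interval, increase)
        print(f"interval {interval:>2}s: about {added:.1f}s added per run, "
              f"at most about {rework:.0f}s repeated per restart")
```

Under these assumed figures, shortening the interval from 10 seconds to 5 seconds adds roughly 127 seconds to every run in exchange for repeating at most about 5 fewer seconds of work on each restart. The balance shifts toward shorter intervals only when restarts are frequent or repeated work is unusually expensive.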
Even though interval checkpointing may have a substantial performance cost, its usefulness during a possible restart makes interval checkpointing a Teradata “best practice” recommendation.