Effects of Interval Checkpointing on Job Performance

Teradata® Parallel Transporter User Guide

Product: Parallel Transporter
Release Number: 17.00
Published: August 31, 2020
Language: English (United States)
Last Update: 2020-08-27
Document ID: B035-2445
Product Category: Teradata Tools and Utilities

Checkpoints increase Teradata PT job overhead. In terms of resources, each executing operator must do the additional work of writing its internal operating state to the checkpoint file so that it can be restarted from that information. In terms of running time, each executing operator must first finish all in-progress work, take its checkpoint, and then wait (when necessary) until all the other operators have finished taking their checkpoints.
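The waiting cost described above can be illustrated with a small calculation. This is a minimal sketch, not Teradata PT's actual implementation: the function name and the timing values are hypothetical, and the sketch only assumes that no operator resumes until the slowest operator has finished its checkpoint.

```python
def checkpoint_wait_times(finish_work, write_state):
    """Return the idle time of each operator during one coordinated checkpoint.

    finish_work[i]: hypothetical seconds operator i needs to finish in-progress work
    write_state[i]: hypothetical seconds operator i needs to write its state
    """
    # Time at which each operator has finished taking its own checkpoint.
    done = [f + w for f, w in zip(finish_work, write_state)]
    # Assumption: the job resumes only when the slowest operator is done.
    resume_at = max(done)
    # Every other operator sits idle for the difference.
    return [resume_at - d for d in done]


# Four operator instances; the third has the most in-progress work.
waits = checkpoint_wait_times([0.25, 0.5, 1.0, 0.25], [0.25, 0.25, 0.25, 0.25])
print(waits)
```

In this model the slowest operator never waits, while every faster operator loses time at each checkpoint, which is one reason the running-time cost can grow faster than the CPU cost.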

Frequent checkpoints limit the amount of work that must be repeated if the job is interrupted and later restarted, because a shorter interval shortens the time between the last checkpoint and the error event. However, specifying a very short checkpoint interval can significantly increase job running time. Choosing a checkpoint interval is a trade-off between the cost of increased job running time and the potential reduction in repeated work if the job must be restarted.
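One way to reason about this trade-off is a simple expected-time model. The sketch below is an illustration under stated assumptions, not a Teradata formula: it assumes at most one interruption, that the interruption is equally likely at any point between checkpoints (so half of the window is repeated on average), and it ignores the cost of the restart itself. All input values are hypothetical.

```python
def expected_run_time(base_seconds, overhead_fraction,
                      rework_window_seconds, restart_probability):
    """Expected total running time under a simplified model.

    base_seconds:          running time without interval checkpoints
    overhead_fraction:     relative running-time increase caused by checkpointing
    rework_window_seconds: most work that can be lost (the checkpoint interval,
                           or the whole job if no interval checkpoints are taken)
    restart_probability:   chance that the job is interrupted once
    """
    checkpointed = base_seconds * (1.0 + overhead_fraction)
    # Assumption: on average half of the window must be repeated.
    expected_rework = restart_probability * rework_window_seconds / 2.0
    return checkpointed + expected_rework


# Hypothetical job: 1000 s baseline, 50% chance of one interruption.
base = 1000.0
p_restart = 0.5
candidates = {
    "no interval checkpoints": (0.0, base),
    "60 s interval": (0.01, 60.0),
    "10 s interval": (0.07, 10.0),
}
for label, (overhead, window) in candidates.items():
    total = expected_run_time(base, overhead, window, p_restart)
    print(f"{label}: {total:.1f} s expected")
```

With these made-up inputs a moderate interval gives the lowest expected time; with a low restart probability or a high checkpoint overhead the balance shifts, which is why the interval should be chosen per job.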

For example, for a Teradata PT job that loads 20,000,000 rows with 4 instances each of the producer and consumer operators:

Specifying a checkpoint interval of 10 seconds increased the job's running time by 7.3% and its host CPU time by 3.3%.
Specifying a checkpoint interval of 5 seconds increased the job's running time by 20% and its host CPU time by 6.6%.
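The running-time figures above imply a rough wall-clock cost per checkpoint. The sketch below is a back-of-the-envelope estimate, not a published Teradata metric. It assumes (the guide does not say so explicitly) that the increases are measured against the same job without interval checkpoints, and that one checkpoint is taken per interval of elapsed time. Under those assumptions the baseline running time cancels out.

```python
def seconds_per_checkpoint(interval_seconds, running_time_increase):
    """Implied wall-clock cost of one checkpoint.

    For a baseline of T seconds and a relative increase r, the job runs for
    T * (1 + r) seconds, takes T * (1 + r) / interval checkpoints, and spends
    T * r extra seconds in total, so the cost per checkpoint is
    r * interval / (1 + r), independent of T.
    """
    return running_time_increase * interval_seconds / (1.0 + running_time_increase)


# (interval in seconds, running-time increase) taken from the example above.
for interval, increase in [(10, 0.073), (5, 0.20)]:
    cost = seconds_per_checkpoint(interval, increase)
    print(f"{interval:>2} s interval: about {cost:.2f} s per checkpoint")
```

Under these assumptions each checkpoint costs somewhat under one second of running time in both cases, so halving the interval roughly doubles the number of checkpoints and more than doubles the total overhead.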

Even though interval checkpointing may have a substantial performance cost, its usefulness during a possible restart makes interval checkpointing a Teradata “best practice” recommendation.