When a Teradata PT job logs a checkpoint, the producer operator in the currently executing job step stops writing rows to its output data stream, and the consumer operator processes all rows remaining in its input data stream. All executing operators then write records to the job checkpoint files containing the information that allows them to resume processing, with no loss or duplication of data, from the point at which the checkpoint was completed.
Teradata PT automatically creates a start-of-data and an end-of-data checkpoint. In addition, you can use the tbuild command to specify a user-defined checkpoint interval (in seconds).
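For example, the tbuild `-z` option sets the checkpoint interval in seconds. In this sketch the job script name `weekly_load.txt` and job name `wkload` are hypothetical:

```shell
# Run the job described in weekly_load.txt, taking a checkpoint
# every 300 seconds in addition to the automatic start-of-data
# and end-of-data checkpoints.
tbuild -f weekly_load.txt -z 300 wkload
```

If the job fails, rerunning the same tbuild command with the same job name restarts it from the last completed checkpoint.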
Handling Data Processed After the Checkpoint
If rows are already in the data streams, or have already been loaded, when a job fails, restarting the job could cause the same rows to be sent again. The operators handle duplicate rows on restart as follows:
- Load Operator: Duplicate rows are always discarded in the Application Phase.
- Update Operator: Although duplicate rows are valid for multiset tables, rows that are sent again during a restart are identified as duplicates by the Teradata Database and, depending on the user-specified DML options, either ignored or sent to the error table.
- Stream Operator: If the Stream operator has not yet sent the rows to the Teradata Database, there will be no duplicates in the target table. If the Stream operator has already sent rows to the Teradata Database:
  - If ROBUST recovery is on (the default), the Stream operator does not re-send the rows when the job is restarted.
  - If ROBUST recovery is off, the Stream operator re-sends the rows to the Teradata Database.
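The ROBUST recovery behavior described above can be sketched as follows. This is an illustrative model, not Teradata PT code: the checkpoint log is represented as a set of row identifiers that were recorded as sent before the failure, and all names are hypothetical.

```python
def restart_send(rows, checkpoint_log, robust=True):
    """Return the rows a Stream-style operator would send after a restart.

    rows           -- rows pending at the time of the failure
    checkpoint_log -- identifiers of rows already recorded as sent
    robust         -- mirrors the ROBUST recovery setting
    """
    if robust:
        # ROBUST on (the default): rows recorded in the checkpoint
        # log are not re-sent, so the target sees no duplicates.
        return [r for r in rows if r not in checkpoint_log]
    # ROBUST off: all pending rows are re-sent, so rows that were
    # already applied may arrive at the target a second time.
    return list(rows)

pending = ["r1", "r2", "r3"]
already_sent = {"r1", "r2"}   # sent to the database before the failure

print(restart_send(pending, already_sent, robust=True))   # ['r3']
print(restart_send(pending, already_sent, robust=False))  # ['r1', 'r2', 'r3']
```

With ROBUST recovery off, rows `r1` and `r2` would reach the database twice, and the duplicate handling described for the target table then applies.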