16.10 - Robust and Non-Robust Mode - Parallel Transporter

Teradata Parallel Transporter Reference

Product: Parallel Transporter
Release Number: 16.10
Published: July 2017
Content Type: Programming Reference
Publication ID: B035-2436-077K
Language: English (United States)

For more robust restartability, use robust mode, which causes every DML operation to be checkpointed and ensures on restart that no operation is applied more than once.

Robust mode requires more writes to the restart log, which might reduce performance. In exchange, a restart in robust mode avoids reprocessing the rows that a restart from a normal interval checkpoint would reprocess.

Robust is the default mode for all Stream operator jobs. The Robust attribute turns the mode on or off. If you are uncertain whether to use robust restart logic, it is always safe to set the Robust attribute to 'Yes'.

  • Robust Mode – Setting the attribute to “yes” tells the Stream operator to use robust restart logic.
    VARCHAR Robust = 'Yes' (or 'Y')

    Robust mode causes a row to be written to the log table each time a buffer successfully completes its updates. Mini-checkpoints are written for each successfully processed row. These mini-checkpoints are deleted from the log when a checkpoint is taken. At restart, they identify the rows that have already been successfully processed so that those rows can be bypassed. In robust mode, each row is processed only once. The larger the Pack factor, the less overhead is involved in this activity.

    Choosing robust mode is particularly useful to avoid problems with data integrity and unacceptable performance. Robust mode is recommended in the following situations, where reapplying rows after a restart would have an adverse effect:

    • INSERTs into multi-set tables – Multi-set tables accept duplicate rows, so a reapplied INSERT would insert the same row a second time. Robust mode prevents this.
    • UPDATEs based on calculations – Robust mode prevents the duplicate application of calculations.
    • Large Pack factors – Robust mode avoids the application and rejection of duplicate rows after restarts, a time-consuming process because each rejected row is logged to the error table.
    • Time-stamped data – Robust mode prevents identical rows from being stamped with different time stamps, which would result in duplicate rows.

      If rows are reapplied in non-robust mode, each reapplied row is marked with a time stamp that is different from the original row even though all of the other data is identical. To Teradata, these reapplied rows are different rows with the same primary index value, so they are inserted even though they are duplicates.
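
The restart behavior described above can be illustrated with a small simulation. This is a minimal sketch, not the Stream operator's implementation: the `RestartLog` class, the `run` and `demo` functions, and the choice to log one entry per completed buffer are all hypothetical simplifications, and the "table" is a plain list that accepts duplicates in the way a multi-set table would.

```python
from dataclasses import dataclass, field


@dataclass
class RestartLog:
    """Hypothetical stand-in for the restart log table."""
    checkpoint: int = 0                      # rows covered by the last full checkpoint
    mini: set = field(default_factory=set)   # start offsets of buffers completed since then
    writes: int = 0                          # mini-checkpoint entries written so far


def run(rows, table, log, pack, robust, fail_after=None):
    """Apply rows in buffers of `pack` rows, starting at the last checkpoint.

    Returns False if a simulated failure stops the job after `fail_after`
    buffers, True if the job reaches its next checkpoint.
    """
    applied = 0
    for start in range(log.checkpoint, len(rows), pack):
        if robust and start in log.mini:
            continue                          # applied before the failure: bypass
        if fail_after is not None and applied == fail_after:
            return False                      # simulated failure before the checkpoint
        table.extend(rows[start:start + pack])  # duplicates are accepted, as in a multi-set table
        if robust:
            log.mini.add(start)               # record the completed buffer
            log.writes += 1
        applied += 1
    log.checkpoint = len(rows)                # full checkpoint taken
    log.mini.clear()                          # mini-checkpoints are deleted
    return True


def demo(robust, pack=2):
    rows = list(range(6))
    table, log = [], RestartLog()
    run(rows, table, log, pack, robust, fail_after=2)  # first attempt fails
    run(rows, table, log, pack, robust)                # restart
    return table, log.writes


print(demo(robust=True))    # ([0, 1, 2, 3, 4, 5], 3)
print(demo(robust=False))   # ([0, 1, 2, 3, 0, 1, 2, 3, 4, 5], 0)
```

In this sketch the robust run ends with each row applied once at the cost of three log writes, while the non-robust run writes nothing extra but reapplies the first four rows. Calling `demo(robust=True, pack=3)` needs only two log writes, which mirrors the point that a larger Pack factor reduces the logging overhead.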

  • Non-Robust Mode – Setting the attribute to “no” tells the Stream operator to use simple restart logic rather than robust logic.
    VARCHAR Robust = 'No' (or 'N')

    In non-robust mode, a restart begins at the last checkpoint taken in the job. Because some additional processing will most likely have taken place after the checkpoint was written, the requests that occurred after the checkpoint are resubmitted by the Stream operator as part of the restart process. For Deletes, Inserts and Upserts, this does not usually cause a problem or harm the database; however, re-running statements generates more rows in the error table because the operator attempts to insert rows that already exist and to delete rows that do not exist.

    Re-attempting updates can also be a problem if the update calculation is based on existing data in the row, for example, adding 10% to an amount. Performing the update calculation a second time adds an additional 10% to the amount, thus compromising data integrity. For this type of update, it is best to use robust mode to ensure that no DML operation is applied more than once.

    The non-robust (or simple restart) method does not involve the extra overhead that comes with the additional inserts to the restart log table that are needed for robust logic, so overall processing is notably faster.
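
The 10% example above can be worked through in a few lines. This is an illustrative sketch only: the function names, the three sample amounts, and the failure point are assumptions made for the example, and amounts are held in integer cents to keep the arithmetic exact.

```python
def add_ten_percent(cents):
    """Model an update that adds 10% to an amount held in integer cents."""
    return cents + cents // 10


def run_with_restart(robust):
    amounts = {"a": 10000, "b": 20000, "c": 30000}
    updates = ["a", "b", "c"]   # one update per key, in order
    checkpoint = 0              # last interval checkpoint: before any update
    applied = 0                 # what robust logging would have recorded

    # First attempt: two updates succeed, then the job fails.
    for key in updates[:2]:
        amounts[key] = add_ten_percent(amounts[key])
        applied += 1

    # Restart: robust resumes after the last applied update;
    # non-robust resumes at the last checkpoint and resubmits.
    resume = applied if robust else checkpoint
    for key in updates[resume:]:
        amounts[key] = add_ten_percent(amounts[key])
    return amounts


print(run_with_restart(robust=True))   # {'a': 11000, 'b': 22000, 'c': 33000}
print(run_with_restart(robust=False))  # {'a': 12100, 'b': 24200, 'c': 33000}
```

In the non-robust run, amounts "a" and "b" end up 21% higher instead of 10% higher, because the two updates that completed before the failure are applied again on restart.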