17.00 - Data-Related Variables - Parallel Data Pump

Teradata® Parallel Data Pump Reference

prodname
Parallel Data Pump
vrm_release
17.00
created_date
June 2020
category
Programming Reference
featnum
B035-3021-220K

Variables related to the database design and the state of the data include:

  • How clean the data is
  • Any sequence to the input stream
  • Degree and type of indexes defined on the table being updated
  • Use of Referential Integrity, fallback, or permanent journaling
  • Number and complexity of triggers
  • Number of columns being passed into the database per update
  • Ratio of UPDATEs to INSERTs when an UPSERT is used

The data-related variables are important to performance because they may dictate how large the pack factor can be, and because these variables influence how much effort it will take to process each row. The more secondary indexes, triggers, or other options, such as fallback or referential integrity (RI) that are defined on the table being updated, the more complex the work in the database will be to perform that update. With this additional complexity comes an increase in the database steps that are generated internally to perform the work. Because there is an internal limit on the total size of the steps generated for a single request, TPump in some cases may have to automatically reduce the pack factor when the table has been defined using some number of these table options. If TPump has to reduce the PACK factor, it issues a UTY6625 message.

The number of columns being passed into the database can also impact the pack factor. TPump uses the Teradata multi-statement request convention, in which multiple SQL statements are bundled together into a single optimizing and recovery unit. This is the most efficient approach when combined with a USING clause, so that the data goes into the database in one parcel and the SQL in a second parcel. When the USING clause is utilized as it is with TPump, the query plans are cached and throughput can be increased by bypassing the parsing activity. When UPSERT processing is expected, the ratio of UPDATEs to INSERTs impacts TPump performance. The higher the percentage of INSERTs, the lower the throughput. This is because the UPSERT uses a single UPDATE statement, followed by a single INSERT statement. The UPDATE has to enter the database and attempt to find the row, and fail, before TPump knows to perform the INSERT instead.