Data Related Variables - Parallel Data Pump

Teradata® Parallel Data Pump Reference

Product
Parallel Data Pump
Release Number
16.20
Published
September 2019
Language
English (United States)
Last Update
2019-10-11
dita:mapPath
dmq1512702641516.ditamap
dita:ditavalPath
Audience_PDF_include.ditaval
dita:id
B035-3021
lifecycle
previous
Product Category
Teradata Tools and Utilities

Variables related to the database design and the state of the data include:

  • How clean the data is
  • Any sequence to the input stream
  • Degree and type of indexes defined on the table being updated
  • Use of Referential Integrity, fallback, or permanent journaling
  • Number and complexity of triggers
  • Number of columns being passed into the database per update
  • Ratio of UPDATEs to INSERTs when an UPSERT is used

The data related variables are important to performance because they may dictate how large the pack factor can be, and because these variables will influence how much effort it will take to process each row. The more secondary indexes, triggers, or other options, such as fallback or referential integrity (RI) that are defined on the table being updated, the more complex the work in the database will be to perform that update. With this additional complexity comes an increase in the DBS steps that are generated internally to perform the work. Because there is an internal limit on the total size of the steps generated for a single request, TPump in some cases may have to automatically reduce the pack factor when the table has been defined using some number of these table options. If TPump has to reduce the PACK factor, it issues a UTY6625 message.

The number of columns being passed into the database can also impact the pack factor. TPump uses the Teradata multi-statement request convention, in which multiple SQL statements are bundled together into a single optimizing and recovery unit. This is the most efficient approach when combined with a USING clause, so that the data goes into the database in one parcel and the SQL in a second parcel. When the USING clause is utilized as it is with TPump, the query plans are cached and throughput can be increased by bypassing the parsing activity. When UPSERT processing is expected, the ratio of UPDATEs to INSERTs impacts TPump performance. The higher the percentage of INSERTs, the lower the throughput. This is because the UPSERT uses a single UPDATE statement, followed by a single INSERT statement. The UPDATE has to enter the database and attempt to find the row, and fail, before TPump knows to perform the INSERT instead.