Data Related Variables - Parallel Data Pump

Teradata® Parallel Data Pump Reference

Product

Parallel Data Pump

Release Number

16.20

Published

September 2019

Language

English (United States)

Last Update

2019-10-11

dita:mapPath

dmq1512702641516.ditamap

dita:ditavalPath

Audience_PDF_include.ditaval

dita:id

B035-3021

lifecycle

Product Category

Teradata Tools and Utilities

Variables related to the database design and the state of the data include:

How clean the data is
Any sequence to the input stream
Degree and type of indexes defined on the table being updated
Use of Referential Integrity, fallback, or permanent journaling
Number and complexity of triggers
Number of columns being passed into the database per update
Ratio of UPDATEs to INSERTs when an UPSERT is used

The data related variables are important to performance because they may dictate how large the pack factor can be, and because these variables will influence how much effort it will take to process each row. The more secondary indexes, triggers, or other options, such as fallback or referential integrity (RI) that are defined on the table being updated, the more complex the work in the database will be to perform that update. With this additional complexity comes an increase in the DBS steps that are generated internally to perform the work. Because there is an internal limit on the total size of the steps generated for a single request, TPump in some cases may have to automatically reduce the pack factor when the table has been defined using some number of these table options. If TPump has to reduce the PACK factor, it issues a UTY6625 message.

The number of columns being passed into the database can also impact the pack factor. TPump uses the Teradata multi-statement request convention, in which multiple SQL statements are bundled together into a single optimizing and recovery unit. This is the most efficient approach when combined with a USING clause, so that the data goes into the database in one parcel and the SQL in a second parcel. When the USING clause is utilized as it is with TPump, the query plans are cached and throughput can be increased by bypassing the parsing activity. When UPSERT processing is expected, the ratio of UPDATEs to INSERTs impacts TPump performance. The higher the percentage of INSERTs, the lower the throughput. This is because the UPSERT uses a single UPDATE statement, followed by a single INSERT statement. The UPDATE has to enter the database and attempt to find the row, and fail, before TPump knows to perform the INSERT instead.