Complementing MultiLoad - Parallel Data Pump

Teradata® Parallel Data Pump Reference

Product
Parallel Data Pump
Release Number
16.20
Published
September 2019
Language
English (United States)
Last Update
2019-10-11
dita:id
B035-3021
Product Category
Teradata Tools and Utilities

Teradata TPump shares much of its command syntax with MultiLoad. This lets users apply their existing MultiLoad knowledge, eases the transition from MultiLoad to Teradata TPump, simplifies converting scripts between the two utilities, and supports the useful upsert feature. However, there are substantial differences in how the two utilities operate.

Teradata TPump complements MultiLoad in the following ways:

  • Economies of Scale
  • Concurrency
  • Resource Consumption

Economies of Scale

MultiLoad benefits from economies of scale, so it is not necessarily efficient on very large tables when only a few rows need to be inserted or updated. To be efficient, MultiLoad must touch more than one row per data block in Teradata Database.

For example, consider a table of two billion 65-byte rows stored in 16KB blocks. Such a table occupies roughly 8,125,000 data blocks, so efficient MultiLoad performance requires that more than 0.4% of the table (8,125,000 rows) be affected. Although 0.4% of a table is a small update, eight million records is probably more data than should be run through a BTEQ script.
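The figure in the example above can be reproduced with a short calculation. The following Python sketch is illustrative only: the function name `breakeven_rows` is hypothetical, a 16KB block is treated as 16,000 bytes (the value that reproduces the 8,125,000 figure), block overhead is ignored, and the affected rows are assumed to be spread evenly across blocks.

```python
def breakeven_rows(total_rows: int, row_bytes: int, block_bytes: int) -> tuple[int, float]:
    """Estimate how many rows must be affected to average one row per data block.

    Returns (number of data blocks, that number as a fraction of the table).
    Assumption: no per-block overhead and an even spread of affected rows.
    """
    total_bytes = total_rows * row_bytes
    # Integer ceiling division avoids floating-point rounding on large values.
    blocks = -(-total_bytes // block_bytes)
    return blocks, blocks / total_rows


# Two billion 65-byte rows in blocks of 16,000 bytes (assumed reading of "16KB").
blocks, fraction = breakeven_rows(2_000_000_000, 65, 16_000)
print(blocks)              # 8125000
print(f"{fraction:.3%}")   # 0.406%
```

If a 16KB block is instead taken as 16,384 bytes, the same calculation gives about 7.9 million blocks, so the conclusion does not change.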

Concurrency

Teradata Database enforces a variable limit on the number of MultiLoad instances that can run concurrently. Teradata TPump does not impose this limit. In addition, MultiLoad uses table-level locks, whereas Teradata TPump uses row-hash locks, which makes concurrent updates to the same table possible.

Finally, because of the phased nature of MultiLoad, there are potentially inconvenient windows of time when MultiLoad cannot be stopped without losing access to the target tables. In contrast, Teradata TPump can always be stopped and all of its locks dropped with no ill effect.

Resource Consumption

MultiLoad is designed for the highest possible throughput, and uses any database and host resources that help to achieve it. There is no way to reduce MultiLoad's resource consumption, even if a longer run time for the job is acceptable. Teradata TPump, however, has a built-in resource governing facility.

This facility allows the operator to specify how many updates occur minute by minute (the statement rate), and to change the statement rate while the job continues to run. For example, the operator can increase the statement rate during windows when Teradata TPump is running by itself, and then decrease it later if users log on for ad hoc query access.
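The idea of a statement rate that can be changed while a job runs can be sketched generically. The following Python example is not Teradata TPump's implementation or interface; the class `StatementGovernor`, its methods, and the `FakeClock` helper are hypothetical names used only to illustrate a per-minute cap that an operator can adjust mid-run. The clock and sleep functions are injectable so the demonstration runs instantly.

```python
import time
from typing import Callable


class StatementGovernor:
    """Toy pacing loop: caps statements per one-minute window (hypothetical)."""

    def __init__(self, rate_per_minute: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate_per_minute = rate_per_minute
        self._clock = clock
        self._sleep = sleep
        self._window_start = clock()
        self._sent_in_window = 0

    def set_rate(self, rate_per_minute: int) -> None:
        """Change the cap while the job keeps running."""
        self.rate_per_minute = rate_per_minute

    def acquire(self) -> None:
        """Block until one more statement may be sent."""
        now = self._clock()
        if now - self._window_start >= 60:
            # A new one-minute window has begun: reset the counter.
            self._window_start = now
            self._sent_in_window = 0
        if self._sent_in_window >= self.rate_per_minute:
            # Quota exhausted: wait for the next window, then start counting again.
            self._sleep(self._window_start + 60 - now)
            self._window_start = self._clock()
            self._sent_in_window = 0
        self._sent_in_window += 1


class FakeClock:
    """Simulated time source so the example does not really sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
governor = StatementGovernor(3, clock=clock.time, sleep=clock.sleep)
sent_at = []
for i in range(8):
    if i == 5:
        governor.set_rate(1)  # e.g. ad hoc query users have logged on
    governor.acquire()
    sent_at.append(clock.now)  # stand-in for sending one statement

print(sent_at)  # [0.0, 0.0, 0.0, 60.0, 60.0, 120.0, 180.0, 240.0]
```

In this simulation the first five statements go out at three per minute; after the rate is lowered to one, the remaining statements are spaced a minute apart, and the job never stops.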