Parallel Data Pump | VantageCloud Lake - Teradata Parallel Data Pump

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

The Teradata Parallel Data Pump (TPump) utility applies real-time updates from transactional systems to the data warehouse. TPump runs INSERT, UPDATE, and DELETE requests, or a combination of them, against more than 60 tables at a time from the same source feed.

TPump is an alternative to MultiLoad. The benefits of TPump include:

Real-time INSERTs and UPDATEs to more than 60 tables simultaneously
Low-volume batch maintenance
Continuous feed to the warehouse

The data handling functionality of TPump is enhanced by the TPump Support Environment, which coordinates activities involved in TPump tasks and provides facilities for managing file acquisition, conditional processing, and certain Data Manipulation Language (DML) and Data Definition Language (DDL) activities of Vantage.

TPump has the following restrictions:

TPump does not support aggregate operators or concatenation of data files.
TPump performance is severely degraded by access logging, and by Database Query Log (DBQL) logging when DBQL is set to Detail Logging.

The TPump Support Environment enables use of variable substitution, conditional execution based on the value of return codes and variables, expression evaluation, character set selection options, and more. For more information, see Teradata® Parallel Data Pump Reference, B035-3021.

Restarts on TPump Jobs with Identity Column

TPump works on multiple-statement SQL requests. Each request has a specific number of statements depending on the PACK specification in the BEGIN LOAD command.
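
The relationship between the PACK factor and the number of requests can be illustrated with a small model. This is a minimal sketch in plain Python, not TPump syntax: the function `pack_statements`, the table name `target_table`, and the sample data are hypothetical and exist only for illustration. It assumes one statement per source record and groups statements into requests of at most `pack` statements each.

```python
from typing import Iterator, List


def pack_statements(statements: List[str], pack: int) -> Iterator[List[str]]:
    """Yield multi-statement requests holding at most `pack` statements each."""
    if pack < 1:
        raise ValueError("pack must be at least 1")
    for start in range(0, len(statements), pack):
        yield statements[start:start + pack]


# Hypothetical input: one INSERT per source record.
statements = [f"INSERT INTO target_table VALUES ({i});" for i in range(1, 8)]
requests = list(pack_statements(statements, pack=3))

for number, request in enumerate(requests, start=1):
    print(f"request {number}: {len(request)} statement(s)")
```

In this model, seven statements with a pack factor of 3 produce three requests, the last of which carries the single leftover statement.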

In ROBUST mode, each request is written into a restart log table. Because Analytics Database rolls back only the statements in a packed request that fail, rather than the entire request, the restart log accurately reflects the completion status of a TPump import.

If a restart occurs, TPump queries the restart log table and reruns the requests that are not logged. This means a restart can generate duplicates if an insert request is repeated. Duplicates are not detected if the target table is not defined with a unique primary index (UPI).
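
The restart behavior described here can be sketched as a toy simulation. This is an illustrative Python model, not TPump internals: `TargetTable`, `run_requests`, and `simulate` are hypothetical names, and the model assumes a failure can occur after a request's rows reach the table but before the request is recorded in the restart log. On restart, the model reruns every request missing from the log and compares a table without a UPI to one with a UPI.

```python
from typing import List, Optional, Set, Tuple


class DuplicateKeyError(Exception):
    """Raised by the model when a unique primary index rejects a row."""


class TargetTable:
    def __init__(self, unique_primary_index: bool) -> None:
        self.unique_primary_index = unique_primary_index
        self.rows: List[int] = []

    def insert(self, key: int) -> None:
        # Only a table with a UPI can detect a repeated key.
        if self.unique_primary_index and key in self.rows:
            raise DuplicateKeyError(key)
        self.rows.append(key)


def run_requests(table: TargetTable, requests: List[List[int]],
                 restart_log: Set[int],
                 crash_on: Optional[int] = None) -> List[int]:
    """Apply every request missing from restart_log; return rejected keys.

    If crash_on is given, the run stops right after that request's rows
    reach the table but before the request is recorded in the log.
    """
    rejected: List[int] = []
    for request_id, keys in enumerate(requests):
        if request_id in restart_log:
            continue  # logged as complete, so skipped on restart
        for key in keys:
            try:
                table.insert(key)
            except DuplicateKeyError:
                rejected.append(key)
        if request_id == crash_on:
            return rejected  # simulated failure: rows applied, log not updated
        restart_log.add(request_id)
    return rejected


def simulate(unique_primary_index: bool) -> Tuple[List[int], List[int]]:
    table = TargetTable(unique_primary_index)
    requests = [[1, 2], [3, 4], [5, 6]]
    restart_log: Set[int] = set()
    run_requests(table, requests, restart_log, crash_on=1)  # first run fails
    rejected = run_requests(table, requests, restart_log)   # restart
    return table.rows, rejected


rows_no_upi, rejected_no_upi = simulate(unique_primary_index=False)
rows_upi, rejected_upi = simulate(unique_primary_index=True)
print("no UPI:", rows_no_upi, "rejected:", rejected_no_upi)
print("UPI:   ", rows_upi, "rejected:", rejected_upi)
```

Under these assumptions, the table without a UPI ends up holding keys 3 and 4 twice, while the table with a UPI rejects the repeated inserts and keeps each key once.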

TPump flags an error if it is run in simple mode and the target table has an identity column as its primary index (PI). In simple mode, no restart log is used for restart recovery, so duplicate rows may result if requests are reprocessed.

For more information on this utility, see Teradata® Parallel Data Pump Reference, B035-3021.