15.10 - Using Teradata PT Periodic Loading for Active Data Warehousing - Parallel Transporter

Teradata Parallel Transporter User Guide


Using Teradata PT Periodic Loading for Active Data Warehousing

Unlike batch file processing, where the number of files is predefined and each file usually contains a very large number of rows, files that represent transactions in the Active Data Warehousing (ADW) environment are “dynamic” and relatively small (a few hundred rows per file on average). Dynamic means that files can be created, processed, and removed from the directory while the loading job is running. Because dynamic files represent a real-time transactional data flow, they are usually created at short intervals, made available for updates as soon as they are created, processed in or close to time-sequence order, and committed to the data warehouse as quickly as possible.

Periodic File Collection and Loading

As shown in Figure 50, transactional files can be collected periodically and processed by the DataConnector operator before they are loaded into Teradata tables using the Update operator.

Figure 50: Periodic Loading with Directory Scan

Both active and batch directory scans can be used for periodic loading.

For an active directory scan, the directory is scanned for new files multiple times (at intervals based on the VigilWaitTime value); the job does not terminate until the VigilElapsedTime expires or the VigilStopTime is reached.

For a batch directory scan (which does not require that the VigilWaitTime and VigilElapsedTime attributes be set), the directory is scanned for files only once; the job terminates once all the files collected by that scan are processed.
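The difference between the two scan modes can be sketched in Python. This is a minimal simulation of the behavior described above, not how the DataConnector operator is implemented: the function names and the wait_seconds and elapsed_seconds parameters are hypothetical and only loosely modeled on VigilWaitTime and VigilElapsedTime, with units chosen for the sketch rather than taken from the product.

```python
import os
import tempfile
import time


def scan_directory(directory, seen):
    """Return names of regular files not yet in `seen`, oldest first."""
    entries = [e for e in os.scandir(directory)
               if e.is_file() and e.name not in seen]
    # Approximate time-sequence order by modification time, then by name.
    entries.sort(key=lambda e: (e.stat().st_mtime, e.name))
    return [e.name for e in entries]


def periodic_scan(directory, process, wait_seconds=None, elapsed_seconds=None,
                  clock=time.monotonic, sleep=time.sleep):
    """Hypothetical model of a directory scan.

    wait_seconds is None -> batch scan: one pass, then stop.
    wait_seconds is set  -> active scan: rescan every wait_seconds until
                            elapsed_seconds have passed since the start.
    Returns the number of files processed.
    """
    if wait_seconds is not None and elapsed_seconds is None:
        raise ValueError("an active scan needs a stop condition")
    seen = set()
    start = clock()
    while True:
        for name in scan_directory(directory, seen):
            process(os.path.join(directory, name))
            seen.add(name)
        if wait_seconds is None:
            break  # batch scan: only the files found by the single pass
        if clock() - start >= elapsed_seconds:
            break  # active scan: the time window has expired
        sleep(wait_seconds)
    return len(seen)


if __name__ == "__main__":
    # Batch scan over a temporary directory holding one sample file.
    with tempfile.TemporaryDirectory() as work_dir:
        with open(os.path.join(work_dir, "txn_0001.txt"), "w") as handle:
            handle.write("sample row\n")
        count = periodic_scan(work_dir, lambda path: print("processing", path))
        print("files processed:", count)
```

Passing wait_seconds and elapsed_seconds switches the same function to the active mode, in which files that arrive between passes are picked up on the next scan.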

For more information on active and batch directory scan attributes, see Chapter 5: “Moving External Data into Teradata Database.”

Note: The Active Directory Scan functionality is supported when using the HDFS interface to process Hadoop files. The Active Directory Scan functionality is not supported when using the TDCH-TPT interface to process Hadoop files and tables. For more information, see “Processing Hadoop Files and Tables” in Chapter 3 of the Teradata Parallel Transporter Reference.

Switching the Load Protocol for Periodic Loading

One of the most notable advantages of using Teradata PT for active or periodic loading is that you can switch the load protocol and job parameters without modifying the job script itself. However, not all job scripts allow you to switch the load (or export) protocol, because the features supported by one operator may not be applicable to the others. In other words, you need to make sure that the Teradata PT SELECT and APPLY operations used in the job are applicable to each of the operators being switched.

Switching the load protocol is desirable for the following reasons:

  • The current load protocol cannot process the current volume of transactional files fast enough.
  • The number of concurrent load jobs that require “load slots” has reached a limit imposed by the Teradata Database.
  • The system cannot sustain the current usage of system resources with the current load strategy.
  • “Catch up” is required after a system failure or a sudden increase in the volume of transactions.

As shown in Figure 51, if you want to change the load protocol from Stream to Update, or vice versa, you can define the operator type with a job variable name that starts with the @ sign (for example, @LOAD). For more details about how to use job variables, see “Setting Up the Job Variables Files” on page 72.

Figure 51: Switching Operator using a Job Variable
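The idea behind a job variable such as @LOAD can be illustrated with a small Python sketch that substitutes values into a script template. This is only an analogy for how one script can serve several load protocols; it does not reproduce how Teradata PT resolves job variables. The resolve_job_variables function and the LOAD_OPERATOR name are hypothetical, and the template is a simplified fragment rather than a complete or validated job script.

```python
import re

# Matches tokens such as @LOAD: an @ sign followed by an identifier.
_JOB_VARIABLE = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")


def resolve_job_variables(script_text, variables):
    """Replace each @NAME token with variables[NAME] (hypothetical helper)."""
    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for job variable @{name}")
        return str(variables[name])
    return _JOB_VARIABLE.sub(replace, script_text)


# Simplified, illustrative fragment; not a complete job script.
TEMPLATE = "DEFINE OPERATOR LOAD_OPERATOR\nTYPE @LOAD\n"

if __name__ == "__main__":
    # The same template yields a Stream or an Update operator definition.
    print(resolve_job_variables(TEMPLATE, {"LOAD": "STREAM"}))
    print(resolve_job_variables(TEMPLATE, {"LOAD": "UPDATE"}))
```

Because the protocol is supplied from outside the script, switching between Stream and Update is a change to the variable value only, which matches the point made above that the job script itself stays unmodified.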