Unlike batch file processing, where the number of files is predefined and each file usually contains a very large number of rows, files that represent transactions in the Active Data Warehousing (ADW) environment are “dynamic” and relatively small (a few hundred rows per file on average). Dynamic means that files can be created, processed, and removed from the directory while the loading job is running. Because dynamic files represent a real-time transactional data flow, they are usually created at short intervals, made available for updates once they are created, processed in or close to time-sequence order, and committed to the data warehouse as quickly as possible.
Periodic File Collection and Loading
As shown in the following figure, transactional files can be collected periodically and processed by the DataConnector operator before they are loaded into Teradata tables using the Update operator.
Both active and batch directory scan can be used for periodic loading.
For active directory scan, the directory is scanned repeatedly for new files, at intervals based on the VigilWaitTime value; the job does not terminate until the VigilElapsedTime or VigilStopTime expires.
For batch directory scan (which does not require that the VigilWaitTime and VigilElapsedTime attributes be set), the directory is scanned for files only once; the job terminates once all the files collected by that scan are processed.
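The difference between the two scan modes can be sketched in a few lines of Python. This is only an illustrative model, not the DataConnector operator's implementation: the function names, the use of seconds for both intervals, and the injectable clock and sleep hooks are assumptions made for the example. Only the behavior (one scan versus repeated scans until an elapsed-time limit) comes from the description above.

```python
import os
import tempfile
import time
from typing import Callable, Iterator, List, Set, Tuple


def scan_once(directory: str, seen: Set[str]) -> List[str]:
    """Return paths of files not seen before, oldest first, and remember them."""
    with os.scandir(directory) as entries:
        found = [
            (entry.stat().st_mtime, entry.path)
            for entry in entries
            if entry.is_file() and entry.path not in seen
        ]
    new_files = [path for _, path in sorted(found)]
    seen.update(new_files)
    return new_files


def batch_scan(directory: str) -> Iterator[str]:
    """Batch mode: a single scan; the job ends when these files are processed."""
    yield from scan_once(directory, set())


def active_scan(
    directory: str,
    wait_seconds: float,      # hypothetical stand-in for VigilWaitTime
    elapsed_seconds: float,   # hypothetical stand-in for VigilElapsedTime
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Active mode: rescan for new files until the elapsed-time limit is reached."""
    seen: Set[str] = set()
    deadline = clock() + elapsed_seconds
    while True:
        yield from scan_once(directory, seen)
        if clock() >= deadline:
            break
        sleep(wait_seconds)


def demo() -> Tuple[List[str], List[str]]:
    """Run both modes against a temporary directory, using a simulated clock."""
    with tempfile.TemporaryDirectory() as directory:

        def touch(name: str) -> None:
            with open(os.path.join(directory, name), "w") as handle:
                handle.write("row\n")

        touch("txn_000.dat")
        batch = [os.path.basename(p) for p in batch_scan(directory)]

        now = [0.0]

        def fake_sleep(seconds: float) -> None:
            # Advance simulated time and let one new transaction file "arrive".
            now[0] += seconds
            touch(f"txn_{int(now[0]):03d}.dat")

        active = [
            os.path.basename(p)
            for p in active_scan(
                directory,
                wait_seconds=5,
                elapsed_seconds=10,
                clock=lambda: now[0],
                sleep=fake_sleep,
            )
        ]
        return batch, active
```

In this sketch, `demo()` returns only `txn_000.dat` for the batch scan, while the active scan also picks up the two files that arrive during its waits. The simulated clock keeps the example fast; a real run would use the default `time.monotonic` and `time.sleep`.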
For more information on active and batch directory scan attributes, see Moving External Data into Teradata Database.
Switching the Load Protocol for Periodic Loading
One of the most notable advantages of using Teradata PT for active or periodic loading is that you can switch the load protocol and job parameters without modifying the job script itself. However, not all job scripts allow you to switch the load (or export) protocol, because the features supported by one operator may not be applicable to the others. In other words, you need to make sure that the Teradata PT SELECT and APPLY operations used in the job are supported by each of the operators that you intend to switch between.
Switching the load protocol is desirable for the following reasons:
- The current load protocol cannot process the current volume of transactional files fast enough.
- The number of concurrent load jobs that require "load slots" has reached a limit imposed by the Teradata Database.
- The system cannot sustain the current usage of system resources with the current load strategy.
- “Catch up” is required after a system failure or a sudden increase in the volume of transactions.
As shown in the following figure, if you want to change the load protocol from Stream to Update, or vice versa, you can define the operator type with a job variable name that starts with the @ sign (for example, @LOAD). For more details about how to use job variables, see Setting Up the Job Variables Files.
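As a rough illustration of the idea, the following Python sketch treats the job script as a template and fills in the @LOAD placeholder from a set of name/value pairs. It is a toy model of placeholder substitution, not a description of how Teradata PT resolves job variables. The script fragment is deliberately incomplete, and the operator name LOAD_OPERATOR and the function `resolve_job_variables` are hypothetical names chosen for the example.

```python
import re
from typing import Mapping

# Illustrative, incomplete fragment: only the operator type is parameterized.
SCRIPT_FRAGMENT = "DEFINE OPERATOR LOAD_OPERATOR\nTYPE @LOAD\n"


def resolve_job_variables(script: str, variables: Mapping[str, str]) -> str:
    """Replace each @NAME placeholder with its value; fail if a value is missing."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for job variable @{name}")
        return variables[name]

    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)", replace, script)


# The same script text yields a different operator type for each run.
stream_script = resolve_job_variables(SCRIPT_FRAGMENT, {"LOAD": "STREAM"})
update_script = resolve_job_variables(SCRIPT_FRAGMENT, {"LOAD": "UPDATE"})
```

The point of the design is that the script text never changes: only the value supplied for the variable differs between a Stream run and an Update run.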