Active Directory Scan: Continuous Loading of Transactional Data
Transactional data is collected and stored in client directories. With the Data Connector operator, you can use the “active directory scan” feature to collect data from these directories continuously. You define a time interval for scanning the directory, and a start and stop time for the whole scan job.
All files present in the source directories that meet the user-specified file name criteria (which include “wildcard” specifications) are processed by the Data Connector operator. Whenever the defined scan interval expires, the Data Connector operator scans the directory and looks for new files that have entered the directory since the last scan. It then reads the rows from each of the files collected and sends them to the consumer operator, which is usually the Stream operator, for purposes of continuous loading. If no new files are found during the directory scan, the Data Connector operator waits for the defined interval to expire before scanning the directory again.
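The "new files since the last scan" step described above can be modeled with a short Python sketch. This is only an illustration of the behavior, not the Data Connector operator's implementation: the function name `scan_for_new_files`, the `seen` set, and the choice to track files by name are all assumptions made for this example.

```python
import fnmatch
import os


def scan_for_new_files(directory, pattern, seen):
    """Return files matching `pattern` that earlier scans have not returned.

    `seen` is a set of file names already handed to the consumer; it is
    updated in place. Tracking files by name is an assumption of this
    sketch, not documented operator behavior.
    """
    matches = [
        name
        for name in os.listdir(directory)
        if fnmatch.fnmatch(name, pattern)
        and os.path.isfile(os.path.join(directory, name))
    ]
    # Keep only files that no previous scan has returned.
    new_files = sorted(name for name in matches if name not in seen)
    seen.update(new_files)
    return new_files
```

Calling the function once per scan interval with the same `seen` set returns each matching file exactly once. A scan that finds nothing new returns an empty list, which corresponds to the operator waiting for the next interval.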
Strategy
Consider the following when setting up a job for Active Directory Scan:
For information on the use of these standard attributes, see the chapter on the DataConnector operator in Teradata Parallel Transporter Reference.
DirectoryPath=<PathName>
Attribute | Setup Requirements |
VigilStartTime | Required to specify the start time for the initial directory scan. |
VigilStopTime | Specifies the time after which no more scans will begin. Any scan that begins before the stop time will run to completion. This attribute is interchangeable with the VigilElapsedTime attribute. Using one of these two attributes is required. |
VigilWaitTime | Specifies the time in seconds between the beginning of one scan and the beginning of the next scan. |
VigilElapsedTime | Specifies the total time in minutes the job will scan the directory for new files in intervals defined by VigilWaitTime. Any scan that starts before the end of the specified elapsed time will run to completion. |
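To see how the timing attributes interact, the following Python sketch lists the times at which scans would begin for a given start time, wait time in seconds, and elapsed time in minutes. It is a simplified model, not operator code: the function name `scan_start_times` and its parameters are hypothetical, and the sketch assumes that a scan falling exactly on the end of the elapsed time does not start.

```python
from datetime import datetime, timedelta


def scan_start_times(vigil_start, wait_seconds, elapsed_minutes):
    """List the times at which scans begin under this simplified model.

    vigil_start     -- datetime of the initial scan (models VigilStartTime)
    wait_seconds    -- seconds between scan starts (models VigilWaitTime)
    elapsed_minutes -- total scanning window (models VigilElapsedTime)
    """
    if wait_seconds <= 0:
        raise ValueError("wait_seconds must be positive")
    deadline = vigil_start + timedelta(minutes=elapsed_minutes)
    times = []
    current = vigil_start
    # Assumption: only scans that start strictly before the deadline begin.
    while current < deadline:
        times.append(current)
        current += timedelta(seconds=wait_seconds)
    return times


if __name__ == "__main__":
    start = datetime(2024, 1, 15, 8, 0, 0)
    for t in scan_start_times(start, wait_seconds=600, elapsed_minutes=30):
        print(t.strftime("%H:%M:%S"))
```

With a start of 08:00:00, a wait time of 600 seconds, and an elapsed time of 30 minutes, the model yields three scans, beginning at 08:00:00, 08:10:00, and 08:20:00. In the model, a scan that began at 08:20:00 would still run to completion even if it finished after 08:30:00.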
For required syntax and detailed descriptions for all DataConnector attributes, see Teradata Parallel Transporter Reference.
Active Directory Scan Options
The following options are available to further customize an Active Directory Scan.
When the data from the sources are not all described by UNION-compatible schemas, use column selection and/or derived columns in the Select clauses in the APPLY statement to put UNION-compatible data on the output data streams.
For a typical application of Active Directory Scan, see “Job Example 9: Active Directory Scan” on page 112.
For the sample script that corresponds to this job, see the following script in the sample/userguide directory:
PTS00015: Active Directory Scan.
Note: The Active Directory Scan functionality is supported when using the HDFS API interface to process Hadoop files, but is not supported when using the TDCH-TPT interface to process Hadoop files and tables. For more information, see “Processing Hadoop Files and Tables” in Chapter 3 of the Teradata Parallel Transporter Reference.