15.10 - Active Directory Scan: Continuous Loading of Transactional Data - Parallel Transporter

Teradata Parallel Transporter User Guide

prodname: Parallel Transporter
vrm_release: 15.10
category: User Guide
featnum: B035-2445-035K

Active Directory Scan: Continuous Loading of Transactional Data

Transactional data is collected and stored in client directories. The “active directory scan” feature of the Data Connector operator continuously collects data from these directories. You define the time interval between directory scans, and a start and stop time for the scan job as a whole.

All files present in the source directories that meet the user-specified file name criteria (which include “wildcard” specifications) are processed by the Data Connector operator. Whenever the defined scan interval expires, the Data Connector operator scans the directory and looks for new files that have entered the directory since the last scan. It then reads the rows from each of the files collected and sends them to the consumer operator, which is usually the Stream operator, for purposes of continuous loading. If no new files are found during the directory scan, the Data Connector operator waits for the defined interval to expire before scanning the directory again.
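The scan behavior described above can be illustrated outside of Teradata PT with a short Python sketch. It is not the Data Connector operator's implementation: the function name `scan_for_new_files` and the use of an in-memory set to remember files from earlier scans are assumptions made for illustration, and Python's `fnmatch` wildcard rules stand in for the operator's own file name matching.

```python
import fnmatch
import os
import tempfile


def scan_for_new_files(directory, pattern, already_seen):
    """Return matching file names that earlier scans have not returned.

    `already_seen` is a set that the caller keeps between scans. This is a
    hypothetical stand-in for however the operator tracks processed files.
    """
    new_files = sorted(
        name
        for name in os.listdir(directory)
        if fnmatch.fnmatch(name, pattern)
        and name not in already_seen
        and os.path.isfile(os.path.join(directory, name))
    )
    already_seen.update(new_files)
    return new_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as source_dir:
        seen = set()
        open(os.path.join(source_dir, "abc.001"), "w").close()
        print(scan_for_new_files(source_dir, "abc.*", seen))  # first scan
        open(os.path.join(source_dir, "abc.002"), "w").close()
        print(scan_for_new_files(source_dir, "abc.*", seen))  # next scan
```

In this sketch a scan that finds nothing simply returns an empty list, which corresponds to the case where the operator waits for the next interval before scanning again.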

Strategy

Consider the following when setting up a job for Active Directory Scan:

  • Specify the attribute names and values for the standard attributes required for the DataConnector operator: FileName, Format, IndicatorMode (where required), and TextDelimiter (required if Format is “delimited”).
    For information on the use of these standard attributes, see the chapter on the DataConnector operator in Teradata Parallel Transporter Reference.

  • Use the wildcard character ( * ) for the FileName attribute according to one of the following strategies:
      • Specify “*” to instruct the DataConnector operator to scan and extract data from all files in the directory.
      • Specify “abc.*” to instruct the DataConnector operator to scan for all files in the directory having file names that begin with the specified character string.
  • Specify the directory to be scanned using the DirectoryPath attribute, in the form:
      DirectoryPath=<PathName>
  • Use the ArchiveDirectoryPath attribute to specify the path for the archive directory. Once the files in the scanned directory have been read and their data has been extracted, the files are moved from the directory identified in the DirectoryPath attribute to the directory specified in the ArchiveDirectoryPath attribute, which keeps them from being scanned again.
  • Use the DataConnector Vigil attributes to set up the time constraints for the directory scan, as follows:

    Attribute          Setup Requirements
    VigilStartTime     Required to specify the start time for the initial directory scan.
    VigilStopTime      Specifies the time after which no more scans will begin. Any scan that begins before the stop time will run to completion. This attribute is interchangeable with the VigilElapsedTime attribute. Using one of these two attributes is required.
    VigilWaitTime      Specifies the time in seconds between the beginning of one scan and the beginning of the next scan.
    VigilElapsedTime   Specifies the total time in minutes the job will scan the directory for new files in intervals defined by VigilWaitTime. Any scan that starts before the end of the specified elapsed time will run to completion.

    For required syntax and detailed descriptions of all DataConnector attributes, see Teradata Parallel Transporter Reference.
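How the Vigil time attributes combine can be illustrated with a small Python calculation. This is a sketch under stated assumptions rather than the operator's logic: the function `scan_start_times` and its parameter names are hypothetical, it requires exactly one of a stop time or an elapsed time, and it treats a scan falling exactly on the end of the window as not started, a boundary case the description above does not settle.

```python
from datetime import datetime, timedelta


def scan_start_times(start, wait_seconds, stop=None, elapsed_minutes=None):
    """List the times at which scans would begin inside the vigil window.

    start           -- plays the role of VigilStartTime (a datetime)
    wait_seconds    -- plays the role of VigilWaitTime (seconds between starts)
    stop            -- plays the role of VigilStopTime (a datetime)
    elapsed_minutes -- plays the role of VigilElapsedTime (minutes)
    """
    # Assumption: exactly one of the two window-ending values is supplied.
    if (stop is None) == (elapsed_minutes is None):
        raise ValueError("supply exactly one of stop or elapsed_minutes")
    if wait_seconds <= 0:
        raise ValueError("wait_seconds must be positive")

    end = stop if stop is not None else start + timedelta(minutes=elapsed_minutes)

    times = []
    current = start
    # A scan that begins before the end of the window is allowed to start
    # (and, per the table above, would then run to completion).
    while current < end:
        times.append(current)
        current += timedelta(seconds=wait_seconds)
    return times


if __name__ == "__main__":
    first_scan = datetime(2024, 1, 1, 8, 0, 0)
    for t in scan_start_times(first_scan, wait_seconds=30, elapsed_minutes=2):
        print(t.isoformat())
```

The sketch counts the wait interval from the beginning of one scan to the beginning of the next, matching the VigilWaitTime description. It does not model a scan that takes longer than the interval.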

Active Directory Scan Options

The following options are available to further customize an Active Directory Scan.

  • Use several DataConnector operators in parallel to monitor multiple data sources.
  • Use multiple instances of the Stream operator to INSERT data into a Teradata Database table at an optimal rate.
  • Important optional attributes:
      • Specify the VigilSortField attribute and set it to TIME to sort files according to the time they were last modified.
      • Specify the VigilNoticeFileName attribute with a file name, so that when the scan file is updated with new data, a notification will be placed in that file.
      • Specify VigilMaxFiles to define the maximum number of files that can be scanned in one pass.
  • Multiple schemas:
      • When the data from the sources are not all described by UNION-compatible schemas, use column selection and/or derived columns in the SELECT clauses in the APPLY statement to put UNION-compatible data on the output data streams.
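The effect of sorting by modification time and capping the number of files per pass can be pictured with the following Python sketch. It only mimics the described behavior. The function `files_for_one_pass` is hypothetical, ties on modification time are broken by file name, and leaving the files beyond the cap for a later pass is an assumption, since the text above does not say how the operator treats them.

```python
import os
import tempfile


def files_for_one_pass(directory, max_files):
    """Return up to `max_files` file names, oldest modification time first."""
    with os.scandir(directory) as entries:
        candidates = [
            (entry.stat().st_mtime, entry.name)
            for entry in entries
            if entry.is_file()
        ]
    candidates.sort()  # by modification time, then by name
    # Assumption: files beyond the cap are left for a later pass.
    return [name for _, name in candidates[:max_files]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as source_dir:
        # Create three files and give them explicit modification times.
        for name, mtime in (("a.dat", 300), ("b.dat", 100), ("c.dat", 200)):
            path = os.path.join(source_dir, name)
            open(path, "w").close()
            os.utime(path, (mtime, mtime))
        print(files_for_one_pass(source_dir, max_files=2))
```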

For a typical application of Active Directory Scan, see “Job Example 9: Active Directory Scan” on page 112.

For the sample script that corresponds to this job, see the following script in the sample/userguide directory:

PTS00015: Active Directory Scan.

Note: The Active Directory Scan functionality is supported when using the HDFS API interface to process Hadoop files, but is not supported when using the TDCH-TPT interface to process Hadoop files and tables. For more information, see “Processing Hadoop Files and Tables” in Chapter 3 of the Teradata Parallel Transporter Reference.