15.10 - Batch Directory Scan - Parallel Transporter

Teradata Parallel Transporter User Guide

Product: Parallel Transporter
Release: 15.10
Category: User Guide
Feature Number: B035-2445-035K

Batch Directory Scan

Batch Directory Scan uses multiple DataConnector operator instances to scan an external directory of flat files, searching for files that match the wildcard specification in the FileName attribute.

When the scan is complete, DataConnector places the data in the data stream for use by the consumer operator in the next job step. No further scanning is done, and any data added to the flat files after the scan will not be picked up until the next time the job is run.

Strategy

Use the following strategy when setting up a Batch Directory Scan job:

  • Specify the name of the directory to be scanned using the DataConnector operator DirectoryPath attribute.
  • Use the wildcard character (*) in the FileName attribute, as follows:
      • Specify “*” to instruct the DataConnector operator to scan and load all files in the directory.
      • Specify “abc.*” to instruct the DataConnector operator to scan and load all files in the directory whose names begin with the character string “abc.”.
  • Use the ArchiveDirectoryPath attribute to specify an archive directory. When the scan is complete for a particular batch job, the scanned files will be moved to the archive directory. This prevents the build-up of old data in the “scanning” directory and prevents the job from seeing the old data the next time it runs.
  • There is no limit to the number of files that can be used as input; the entire set appears to Teradata PT as a single source. Specify multiple instances of the operator to speed the data acquisition process.
  • For the sample script that corresponds to this job, see the following script in the sample/userguide directory:

    PTS00014: Batch Directory Scan.

    Note: The Batch Directory Scan functionality is supported when using the HDFS API interface to process Hadoop files, but is not supported when using the TDCH-TPT interface to process Hadoop files and tables. For more information, see “Processing Hadoop Files and Tables” in Chapter 3 of the Teradata Parallel Transporter Reference.
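The attributes described above can be combined into a job script along the following lines. This is a minimal sketch, not the shipped sample: the directory paths, schema, column names, target table, and the Load operator logon values (TdpId, UserName, UserPassword) are hypothetical placeholders to be adapted to your environment. The PTS00014 sample script remains the authoritative version.

    DEFINE JOB BATCH_DIRECTORY_SCAN
    DESCRIPTION 'Scan a directory of flat files and load them in one batch'
    (
      DEFINE SCHEMA FLAT_FILE_SCHEMA          /* hypothetical record layout */
      (
        Col1 VARCHAR(50),
        Col2 VARCHAR(50)
      );

      DEFINE OPERATOR FILE_READER
      TYPE DATACONNECTOR PRODUCER
      SCHEMA FLAT_FILE_SCHEMA
      ATTRIBUTES
      (
        VARCHAR DirectoryPath        = '/data/incoming',  /* directory to scan (hypothetical path) */
        VARCHAR FileName             = 'abc.*',           /* wildcard: files whose names begin with "abc." */
        VARCHAR ArchiveDirectoryPath = '/data/archive',   /* scanned files are moved here after the batch */
        VARCHAR Format               = 'Delimited',
        VARCHAR TextDelimiter        = '|'
      );

      DEFINE OPERATOR LOAD_DATA
      TYPE LOAD
      SCHEMA *
      ATTRIBUTES
      (
        VARCHAR TdpId        = 'mydbs',          /* hypothetical Teradata system name */
        VARCHAR UserName     = 'myuser',
        VARCHAR UserPassword = 'mypassword',
        VARCHAR TargetTable  = 'Target_Tab',
        VARCHAR LogTable     = 'Target_Tab_Log'
      );

      APPLY ('INSERT INTO Target_Tab (:Col1, :Col2);')
      TO OPERATOR (LOAD_DATA)
      SELECT * FROM OPERATOR (FILE_READER[2]);   /* [2] requests two DataConnector instances */
    );

The instance count in FILE_READER[2] is where the multiple-instance speedup described above is requested; the scanned files are divided among the instances, and each file moves to /data/archive once its batch has been processed.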