The DataConnector operator can do either of the following:
- Read data from flat files, access modules, or Hadoop files and tables. As a reader, it is considered a producer operator, that is, one that produces a data stream.
- Write data to flat files, access modules, or Hadoop files and tables. As a writer, it is considered a consumer operator, that is, one that consumes a data stream.
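The two roles can be sketched as operator definitions in a TPT job script. This is a minimal illustration, not a complete job: the schema, operator names, paths, and file names are hypothetical, and the fragment is assumed to sit inside a DEFINE JOB block. The attribute values should be checked against the DataConnector attribute reference for your release.

```
/* Hypothetical schema; all columns are VARCHAR for a delimited file. */
DEFINE SCHEMA SALES_SCHEMA
(
    STORE_ID   VARCHAR(10),
    SALE_DATE  VARCHAR(10),
    AMOUNT     VARCHAR(20)
);

/* Reader: a producer operator that turns a flat file into a data stream. */
DEFINE OPERATOR FILE_READER
TYPE DATACONNECTOR PRODUCER
SCHEMA SALES_SCHEMA
ATTRIBUTES
(
    VARCHAR DirectoryPath = '/data/in',      /* hypothetical path */
    VARCHAR FileName      = 'sales.txt',     /* hypothetical file */
    VARCHAR Format        = 'Delimited',
    VARCHAR TextDelimiter = '|',
    VARCHAR OpenMode      = 'Read'
);

/* Writer: a consumer operator that writes a data stream to a flat file. */
DEFINE OPERATOR FILE_WRITER
TYPE DATACONNECTOR CONSUMER
SCHEMA *
ATTRIBUTES
(
    VARCHAR DirectoryPath = '/data/out',     /* hypothetical path */
    VARCHAR FileName      = 'sales_copy.txt',
    VARCHAR Format        = 'Delimited',
    VARCHAR TextDelimiter = '|',
    VARCHAR OpenMode      = 'Write'
);
```

The only structural difference between the two definitions is the TYPE clause (PRODUCER or CONSUMER) and the matching OpenMode value.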
The DataConnector operator can also scan directories for files to process; the TPT documentation calls this “Batch Directory Scan.” Scanning can also run continuously at set time intervals; the TPT documentation calls this “Active Directory Scan.” For more information, see the Teradata Parallel Transporter User Guide (B035-2445).
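As a rough sketch of directory scanning on flat files, the reader below uses a wildcard in FileName to pick up every matching file in a directory, and adds Vigil attributes to keep rescanning over a time window. The operator name, schema name, paths, and all values are hypothetical placeholders. Which Vigil attributes are required, their units, and their defaults are not covered here and should be confirmed in the User Guide and the operator reference.

```
/* Assumes a schema named SALES_SCHEMA is defined elsewhere in the job. */
DEFINE OPERATOR SCAN_READER
TYPE DATACONNECTOR PRODUCER
SCHEMA SALES_SCHEMA
ATTRIBUTES
(
    VARCHAR DirectoryPath        = '/data/in',       /* directory to scan (hypothetical) */
    VARCHAR FileName             = 'sales_*.txt',    /* wildcard selects multiple files */
    VARCHAR Format               = 'Delimited',
    VARCHAR TextDelimiter        = '|',
    VARCHAR OpenMode             = 'Read',

    /* Attributes below apply only to time-based (active) scanning. */
    VARCHAR ArchiveDirectoryPath = '/data/archive',  /* where processed files are moved */
    INTEGER VigilWaitTime        = 60,               /* wait between scans (placeholder) */
    INTEGER VigilElapsedTime     = 30                /* length of scan window (placeholder) */
);
```

Removing the Vigil and archive attributes leaves a one-pass scan of the directory, which corresponds to the batch case.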
The Active Directory Scan functionality is supported when using the HDFS interface to process Hadoop files.
The Active Directory Scan functionality is not supported when using the TDCH-TPT interface to process Hadoop files and tables.
For more information about the DataConnector operator's Hadoop interfaces, see Processing Hadoop Files and Tables.
To process data in parallel, specify multiple instances of the operator.
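In a job script, the instance count is given in square brackets after the operator name in the APPLY statement. The sketch below is illustrative only: the job name and the operators FILE_READER and FILE_WRITER are hypothetical and are assumed to be defined in the same job, and the counts shown are arbitrary. How files or rows are divided among instances depends on the operator's attributes.

```
DEFINE JOB COPY_SALES_FILES
DESCRIPTION 'Sketch: DataConnector reader run with multiple instances'
(
    /* DEFINE SCHEMA and DEFINE OPERATOR statements for
       FILE_READER and FILE_WRITER go here. */

    APPLY TO OPERATOR (FILE_WRITER[1])           /* one writer instance */
    SELECT * FROM OPERATOR (FILE_READER[3]);     /* three reader instances */
);
```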