Usage Notes - Parallel Transporter

Teradata Parallel Transporter Reference

Product: Parallel Transporter
Release Number: 15.00
Language: English (United States)
Last Update: 2018-09-27
dita:id: B035-2436
lifecycle: previous
Product Category: Teradata Tools and Utilities

Usage Notes

FileName

The use of the FileName attribute varies depending on operator type, operating system, and whether the file resides in the local filesystem or in Hadoop's distributed file system. The traditional DataConnector attributes, including FileName, are used when interfacing with Hadoop via the HDFS API interface, but are not used when interfacing with Hadoop through TDCH. For more information about the DataConnector's Hadoop interfaces, see “Processing Hadoop Files and Tables” on page 144.

  • DataConnector Producer Operator
    When the DataConnector operator is used as a producer to read data from files in the local file system, the FileName attribute can contain the wildcard character (*) to process all matching files or members within a named UNIX OS directory or z/OS partitioned dataset (PDS or PDSE). UNIX-style “egrep” wildcard patterns are also supported when the DataConnector operator is used as a producer to read Hadoop files via the HDFS API interface.

    The following conditions also apply depending on your operating system:

  • On UNIX systems, the FileName attribute is limited to 255 bytes. FileName can be either the complete pathname of a file or the name of a file within a directory. If no directory is defined in the optional DirectoryPath attribute, the file is expected to be found in the default directory. See Table 4 for examples.
  • On z/OS systems, FileName can specify the fully-qualified dataset name of the script file, including the member name if the dataset is a PDS or PDSE, the member name of a PDS or PDSE script library, or 'DD:<ddname>'. If only a member name is specified for the FileName, then the (PDS or PDSE) dataset name containing the member must be specified by the DirectoryPath attribute. See Table 4 for examples.
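As a rough illustration of the producer rules above, the following Python sketch models how a wildcard FileName could select input files on a UNIX system. It is not Teradata code. The function name `match_input_files` is hypothetical, the default directory is assumed to be the current working directory, and Python's `fnmatch` accepts more pattern characters (such as ? and [ ]) than the single * wildcard described here.

```python
import fnmatch
import os

MAX_FILENAME_BYTES = 255  # UNIX FileName limit stated in the text


def match_input_files(file_name, directory_path=None):
    """Return the sorted paths a wildcard FileName would select (illustrative model)."""
    if len(file_name.encode("utf-8")) > MAX_FILENAME_BYTES:
        raise ValueError("FileName exceeds the 255-byte limit")

    # A FileName that carries a directory part is treated as a complete pathname.
    # Otherwise DirectoryPath supplies the directory, falling back to the
    # default directory (assumed here to be the current working directory).
    head, pattern = os.path.split(file_name)
    directory = head or directory_path or os.getcwd()

    matches = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and fnmatch.fnmatchcase(name, pattern):
            matches.append(path)
    return sorted(matches)
```

Calling `match_input_files('*.dat', directory_path='/data/in')` (a hypothetical directory) would list every regular file in that directory whose name ends in .dat, which is a convenient way to preview what a wildcard would pick up before running a job.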
  • DataConnector Consumer Operator
    When the DataConnector operator is used as a consumer, the FileName attribute is the complete file specification and cannot contain the wildcard character (*).

    On UNIX systems, unless you specify a pathname, the FileName is expected to be found in the default directory. See Table 4 for examples.

    When writing files whose FileName value is not fully qualified into the Hadoop distributed file system via the HDFS API interface, the file will be created in the directory of the user specified by the HadoopUser attribute.

  • Combining FileName and FileList attributes
    The FileList attribute extends the capabilities of the FileName attribute. Adding FileList = 'Y' indicates that the file identified by FileName contains a list of files to be processed as input or used as containers for output. The file names found within the FileName file are expected to be full path specifications. If no directory name is included, the files are expected to be located within the current directory. Supplying full paths for output files enables you to write files to multiple directories or disks. You cannot use the DirectoryPath attribute in conjunction with this feature.

    When the combination of FileName and FileList attributes is used to control output, the supplied file list must contain the same number of files as there are defined consumer instances; a mismatch results in a terminal error. At execution, rows are distributed to the listed files in a round-robin fashion if the tbuild -C option is used. Without the option, rows may not be evenly distributed across the listed files.
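The output rule in the preceding paragraph can be pictured with a small Python model. This is a sketch under stated assumptions, not the DataConnector implementation: the function names `load_file_list` and `distribute_round_robin` are hypothetical, the file list is assumed to hold one path per line with blank lines skipped, and the example paths are made up.

```python
def load_file_list(lines, consumer_instances):
    """Return the output paths from a file list, enforcing the instance-count rule."""
    # Assumption: one path per line; blank lines are skipped.
    paths = [line.strip() for line in lines if line.strip()]
    if len(paths) != consumer_instances:
        # The text describes a mismatch as a terminal error.
        raise ValueError(
            f"file list has {len(paths)} entries but "
            f"{consumer_instances} consumer instances are defined"
        )
    return paths


def distribute_round_robin(rows, paths):
    """Assign rows to the listed files in strict rotation (the -C behavior described above)."""
    buckets = {path: [] for path in paths}
    for index, row in enumerate(rows):
        buckets[paths[index % len(paths)]].append(row)
    return buckets


# Hypothetical file list with full paths on two different disks.
paths = load_file_list(["/disk1/out/part1.dat\n", "/disk2/out/part2.dat\n"], 2)
buckets = distribute_round_robin(["r1", "r2", "r3", "r4", "r5"], paths)
```

With five rows and two listed files, the first file receives rows 1, 3, and 5 and the second receives rows 2 and 4. Without strict rotation, as the text notes, the split between files is not guaranteed to be even.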

    Note: The DataConnector operator supports a FileList file encoded in ASCII on network-attached platforms and in EBCDIC on mainframe-attached platforms.

    You cannot combine this feature with the archiving feature. Any attempt to use the archive feature (for example, by defining the ArchiveDirectoryPath attribute) results in a terminal error.

    If the pathname that you specify with the FileName attribute (as filename) contains any embedded pathname syntax (“/” on a UNIX OS or “\” on Windows), the pathname is accepted as the entire pathname. However, if the DirectoryPath attribute is present, the FileName attribute is ignored, and a warning message is issued.

    If the FileList file does not exist in HDFS, the DataConnector operator assumes that it is a local file and processes it accordingly. If the file does exist in HDFS, it is read from the HDFS file system.

    Table 4 contains valid syntax examples for the FileName attribute.

     

    Table 4: Valid FileName Syntax

    Operating System: z/OS

  • Valid Syntax: FileName = '//''name.name(member)'''
    Explanation: z/OS PDS DSN: Name.Name(Member), where:
      • Name.Name = dataset Name.Name
      • Member = PDS Member
  • Valid Syntax: FileName = '//''name.name'''
    Explanation: z/OS DSN (sequential): Name.Name, where:
      • Name.Name = dataset Name.Name
  • Valid Syntax: FileName = 'DD:ddname'
    Explanation: z/OS DSN is described in the JCL DD statement name “ddname.” If no DD statement is specified, the following occurs:
      • For input, the fopen library function tries to open an HFS file named DD:ddname in the home directory. If the file is not found, the fopen library function returns an error that is displayed in the SYSOUT.
      • For output, the fopen library function tries to create an HFS file named DD:ddname in the home directory. If the file already exists, the previous contents are overwritten.
  • Valid Syntax: FileName = 'member'
    Explanation: z/OS PDS member is expected to reside in the DSN that is defined in the DirectoryPath attribute.

    Operating System: UNIX

  • Valid Syntax: FileName = '/tmp/user/filename'
    Explanation: UNIX pathname.
  • Valid Syntax: FileName = 'filename'
    Explanation: If the DirectoryPath attribute is undefined, filename is located in the default directory.

    Operating System: Windows

  • Valid Syntax: FileName = '\tmp\user-filename'
    Explanation: Windows path name.
  • Valid Syntax: FileName = 'filename'
    Explanation: Windows file name expected to be found in the directory defined in the DirectoryPath attribute. If the DirectoryPath is not defined, filename is located in the default directory.

    Note: On Windows platforms, using the wildcard character (*) in filename can inadvertently include undesired files. For example, specifying *.dat is the same as specifying *.dat*, which can include files with extensions such as .data, .date, and .dat071503. Therefore, it is recommended that you remove extraneous files from your folder.
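One way to spot such extraneous files ahead of time is to compare a strict match with the widened match that the note describes. The Python sketch below only models the behavior stated in the note (*.dat acting like *.dat*); it does not call any Windows or Teradata API, the function names are hypothetical, and the lower-casing is an assumption that mimics case-insensitive Windows file names.

```python
import fnmatch


def windows_wildcard_matches(pattern, names):
    """Model the note above: on Windows, *.dat behaves like *.dat*."""
    widened = pattern if pattern.endswith("*") else pattern + "*"
    # Lower-case both sides to mimic case-insensitive Windows file names.
    return [n for n in names if fnmatch.fnmatchcase(n.lower(), widened.lower())]


def unintended_matches(pattern, names):
    """Return the names picked up only because the pattern was widened."""
    return [
        n for n in windows_wildcard_matches(pattern, names)
        if not fnmatch.fnmatchcase(n.lower(), pattern.lower())
    ]


# Hypothetical folder contents using the extensions from the note.
names = ["jan.dat", "feb.data", "mar.date", "apr.dat071503", "may.txt"]
extras = unintended_matches("*.dat", names)
```

Here `extras` holds feb.data, mar.date, and apr.dat071503, the files that would need to be moved out of the folder before the job runs.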