Usage Notes
FileName
The use of the FileName attribute varies depending on operator type, operating system, and whether the file resides in the local filesystem or in Hadoop's distributed file system. The traditional DataConnector attributes, including FileName, are used when interfacing with Hadoop via the HDFS API interface, but are not used when interfacing with Hadoop through TDCH. For more information about the DataConnector's Hadoop interfaces, see “Processing Hadoop Files and Tables” on page 144.
When using the DataConnector operator as a producer to read data from files in the local file system, the wildcard character (*) is allowed in a FileName attribute if you want to process all matching files or members within a named UNIX OS directory or the z/OS partitioned dataset (PDS or PDSE). Wildcard UNIX-style “egrep” patterns are also supported when using the DataConnector operator as a producer to read Hadoop files via the HDFS API interface.
The following conditions also apply depending on your operating system:
When using the DataConnector operator as a consumer, the FileName attribute becomes the complete file specification, and the FileName cannot contain the wildcard character (*).
On UNIX systems, unless you specify a pathname, the FileName is expected to be found in the default directory. See Table 4 for examples.
When writing files whose FileName value is not fully qualified into the Hadoop distributed file system via the HDFS API interface, the file will be created in the directory of the user specified by the HadoopUser attribute.
The FileList attribute extends the capabilities of the FileName attribute. Adding FileList = 'Y' indicates that the file identified by FileName contains a list of files to be processed as input or used as containers for output. The file names found within the FileName file are expected to be full path specifications. If no directory name is included, the files are expected to be located within the current directory. Supplying full paths for output files enables you to write files to multiple directories or disks. You cannot use the DirectoryPath attribute in conjunction with this feature.
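As an illustration of how the entries in such a list file could be interpreted, the following Python sketch reads a list file and resolves any entry that has no directory component against the current directory. It is not part of Teradata PT. The one-entry-per-line layout, the skipping of blank lines, and the name read_file_list are assumptions made for this example.

```python
import os


def read_file_list(list_path, current_dir=None):
    """Return the paths named in a list file.

    Assumed layout: one file name per line. Entries without a directory
    component are resolved against current_dir (default: the process
    working directory), mirroring the rule described above.
    """
    base = current_dir if current_dir is not None else os.getcwd()
    resolved = []
    with open(list_path, encoding="ascii") as handle:
        for line in handle:
            entry = line.strip()
            if not entry:
                continue  # assumption: blank lines are ignored
            if os.path.dirname(entry):
                resolved.append(entry)  # entry already names a directory
            else:
                resolved.append(os.path.join(base, entry))
    return resolved
```

A script could call read_file_list before submitting a job to confirm that every listed file resolves to the directory that was intended.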
When the combination of the FileName and FileList attributes is used to control output, the supplied file list must contain the same number of files as there are defined consumer instances; a mismatch results in a terminal error. At execution, rows are distributed to the listed files in a round-robin fashion if the tbuild -C option is used. Without the option, rows may not be evenly distributed across the listed files.
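The two rules in the preceding paragraph (the file count must equal the instance count, and round-robin distribution spreads rows evenly) can be sketched in Python as follows. This only models the described behavior and does not reflect how Teradata PT implements it. The function names check_instance_count and distribute_round_robin and the sample file names are hypothetical.

```python
from itertools import cycle


def check_instance_count(file_names, instance_count):
    """Raise ValueError if the list length differs from the instance count."""
    if len(file_names) != instance_count:
        raise ValueError(
            f"file list has {len(file_names)} entries "
            f"but {instance_count} consumer instances are defined"
        )


def distribute_round_robin(rows, file_names):
    """Assign rows to files in strict rotation and return {file: [rows]}."""
    assignment = {name: [] for name in file_names}
    for row, name in zip(rows, cycle(file_names)):
        assignment[name].append(row)
    return assignment


if __name__ == "__main__":
    files = ["out1.dat", "out2.dat"]  # hypothetical output files
    check_instance_count(files, instance_count=2)
    print(distribute_round_robin(["r1", "r2", "r3", "r4", "r5"], files))
    # {'out1.dat': ['r1', 'r3', 'r5'], 'out2.dat': ['r2', 'r4']}
```

With strict rotation, the row counts of any two files differ by at most one, which is the even distribution that the -C option is described as providing.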
Note: The DataConnector operator supports a FileList file encoded in ASCII on network-attached platforms and in EBCDIC on mainframe-attached platforms.
You cannot combine this feature with the archiving feature. Any attempt to use the archive feature (for example, by defining the ArchiveDirectoryPath attribute) results in a terminal error.
If the pathname that you specify with the FileName attribute (as filename) contains any embedded pathname syntax (“/” on a UNIX OS or “\” on Windows), the pathname is accepted as the entire pathname. However, if the DirectoryPath attribute is present, the FileName attribute is ignored, and a warning message is issued.
If the FileList file named by the FileName attribute does not exist in HDFS, the DataConnector operator assumes that it is a local file and processes it accordingly. If the file does exist in HDFS, it is read from the HDFS file system.
Table 4 contains valid syntax examples for the FileName attribute.
| Operating System | Valid Syntax | Explanation |
|---|---|---|
| z/OS | FileName = '//''name.name(member)''' | z/OS PDS DSN: Name.Name(Member) where: |
| | FileName = '//''name.name''' | z/OS DSN (sequential): Name.Name where: |
| | FileName = 'DD:ddname' | z/OS DSN is described in the JCL DD statement name “ddname.” If no DD statement is specified, the following occurs: |
| | FileName = 'member' | z/OS PDS member is expected to reside in the DSN that is defined in the DirectoryPath attribute. |
| UNIX | FileName = '/tmp/user/filename' | UNIX pathname. |
| | FileName = 'filename' | If the DirectoryPath attribute is undefined, filename is located in the default directory. |
| Windows | FileName = '\tmp\user-filename' | Windows path name. |
| | FileName = 'filename' | Windows file name expected to be found in the directory defined in the DirectoryPath attribute. If the DirectoryPath is not defined, filename is located in the default directory. |
Note: On Windows platforms, using the wildcard character (*) in filename can inadvertently include undesired files. For example, specifying *.dat is the same as specifying *.dat*, which can include files with extensions such as .data, .date, and .dat071503. Therefore, it is recommended that you remove extraneous files from the folder.
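To see which files in a folder would be picked up unintentionally, the following Python sketch models the behavior described in the note by matching each name against the pattern with a trailing * appended. It uses the standard fnmatch module rather than any Teradata PT facility. The function names and sample file names are hypothetical.

```python
from fnmatch import fnmatchcase


def windows_style_matches(pattern, names):
    """Return names matched when the pattern is treated as pattern + '*'."""
    widened = pattern if pattern.endswith("*") else pattern + "*"
    return [name for name in names if fnmatchcase(name, widened)]


def unintended_matches(pattern, names):
    """Return names matched only because of the implied trailing '*'."""
    return [
        name
        for name in windows_style_matches(pattern, names)
        if not fnmatchcase(name, pattern)
    ]


if __name__ == "__main__":
    folder = ["jan.dat", "feb.data", "mar.date", "apr.dat071503", "notes.txt"]
    print(windows_style_matches("*.dat", folder))
    # ['jan.dat', 'feb.data', 'mar.date', 'apr.dat071503']
    print(unintended_matches("*.dat", folder))
    # ['feb.data', 'mar.date', 'apr.dat071503']
```

Running a check like unintended_matches against a directory listing before a job starts gives a list of the files to move or delete.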