Required and Optional Attributes - Parallel Transporter

Teradata® Parallel Transporter Reference

Product: Parallel Transporter
Release Number: 17.10
Published: February 2022
Language: English (United States)
Last Update: 2023-11-29
Product Category: Teradata Tools and Utilities

Use the attribute definition list syntax in the Teradata PT DEFINE OPERATOR statement to declare the required and optional attribute values for the DataConnector operator.

Parallel processing of multiple files is permitted. To read a series of files with multiple instances of the producer DataConnector operator, specify a base directory in the DirectoryPath attribute and a wildcard in the FileName attribute as the selection basis for the files to be read.
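
For example, a minimal producer definition along the following lines (the operator, schema, directory, and file names are illustrative, not prescribed) lets multiple instances share the files matched by the wildcard:

DEFINE OPERATOR FILE_READER
DESCRIPTION 'DataConnector producer reading a series of files in parallel'
TYPE DATACONNECTOR PRODUCER
SCHEMA INPUT_SCHEMA
ATTRIBUTES
(
  VARCHAR DirectoryPath = '/data/incoming',  /* base directory to scan */
  VARCHAR FileName      = 'sales*.csv',      /* wildcard selects the series of files */
  VARCHAR Format        = 'Delimited',
  VARCHAR TextDelimiter = '|',
  VARCHAR OpenMode      = 'Read'
);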

The specification of any attributes that begin with 'Hadoop' will cause the DataConnector operator to process Hadoop files, directories, and tables, rather than files and directories in the local filesystem. For more information, see Processing Hadoop Files and Tables.



[Syntax diagram: DataConnector operator attribute definition list]

where:

DataConnector Attribute Descriptions
AcceptExcessColumns = ‘option’

Optional attribute that specifies whether or not rows with extra columns are acceptable. This only applies to delimited data for the producer operator.

Valid values are:
  • 'Y[es]' = A row with extra columns is truncated to the number of columns defined in the schema, and sent downstream without an error being raised. The edited record is sent to the database and the original record is saved in the record error file, if it is defined via the RecordErrorFileName attribute.
  • 'N[o]' = A row with extra columns is not sent to the database, but the original record is saved to the record error file, if it is defined using the RecordErrorFileName attribute. If RecordErrorFileName is not defined, a fatal error will occur. 'No' is the default value.
  • 'YesW[ithoutLog]' = The edited row is sent to the database, but the original record is not saved in the record error file.

This attribute is ignored by the consumer operator.

AcceptMissingColumns = ‘option’

Optional attribute that determines how rows whose column count is less than that defined in the schema are treated. This only applies to delimited data for the producer operator.
Valid values are:
  • 'Y[es]' = The row is to be extended to the correct number of columns. Each appended column will be a zero length column and be processed according to the value of the NullColumns attribute. The edited record is sent to the database and the original record is saved in the record error file, if it is defined via the RecordErrorFileName attribute.
  • 'N[o]' = A row with too few columns is not sent to the database, but the original record is saved in the record error file, if it is defined via the RecordErrorFileName attribute. If RecordErrorFileName is not defined, a fatal error will occur. 'No' is the default value.
  • 'YesW[ithoutLog]' = The edited row is sent to the database, but the original record is not saved in the record error file.

This attribute is ignored by the consumer operator.
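
For example, a producer that tolerates ragged delimited rows while logging the originals might set the following attributes (a sketch; the error-file path is illustrative):

  VARCHAR AcceptExcessColumns  = 'Yes',
  VARCHAR AcceptMissingColumns = 'Yes',
  VARCHAR RecordErrorFileName  = '/tmp/dc_errors.dat'  /* originals of edited rows are saved here */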

AccessModuleInitStr = 'initString'

Optional attribute that specifies the initialization string for the specified access module. This attribute is used by both the producer and consumer operators.

If the AccessModuleInitStr attribute is defined and the Filename attribute is not, then the Open and Read requests will be sent to the access module of each instance. However, the filename passed to the access module will be an empty string, so it is up to the access module itself to determine (possibly from the initialization string) which file is to be opened. If any Open or Read requests fail within the access module, the job will be terminated.

For the initString values, see the Initialization String section for each module in Teradata® Tools and Utilities Access Module Reference, B035-2425.
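
For example, a producer reading through the Named Pipes Access Module on Linux might specify the following (the initialization string shown is illustrative only; the valid initString options are module-specific and documented in the Access Module Reference):

  VARCHAR AccessModuleName    = 'np_axsmod.so',
  VARCHAR AccessModuleInitStr = 'ld=/tmp/nplog'  /* example only; see the module's documented options */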

AccessModuleName = 'name'

Optional attribute that specifies the name of the access module file, where the value for name is dependent on the following:

Teradata Access Module for S3

  • libs3axsmod.so on the Linux platform

Teradata Access Module for Named Pipes

  • libnp_axsmod.dylib on the Apple macOS platform
  • np_axsmod.so on all other UNIX platforms
  • np_axsmod.dll on Windows platforms

Teradata Access Module for WebSphere MQ (client version)

  • libmqsc.dylib on the Apple macOS platform
  • libmqsc.so on all other UNIX platforms
  • libmqsc.dll on Windows platforms

Teradata Access Module for WebSphere MQ (server version)

  • libmqs.dylib on the Apple macOS platform
  • libmqs.so on all other UNIX platforms
  • libmqs.dll on Windows platforms

Teradata Access Module for OLEDB

  • oledb_axsmod.dll on Windows platforms
Teradata Access Module for Kafka

  • libkafkaaxsmod.so on the Linux platform

Teradata Access Module for Azure Blob

  • libazureaxsmod.so on the Linux platform

Teradata Access Module for Google Cloud Storage (GCS)

  • libgcsaxsmod.so on the Linux platform

Use your shared library file name if you use a custom access module.

Access module names do not need a suffix since the operator appends the correct suffix for the platform used.

If a path is not included in the access module name, Teradata PT will search for it in the following order:
  • Search the sub-directory within the installation directory that stores the TPT libraries.
  • Search the current working directory.
  • Allow the operating system to search default system directories or paths defined through linker environment variables (such as LD_LIBRARY_PATH).

This attribute is used by both the producer and consumer operators.

Large File Access Module is no longer available because the DataConnector operator now supports file sizes greater than 2 gigabytes on Windows, AIX, and Solaris running on SPARC systems when system parameters are appropriately set.

AppendDelimiter = ‘option’

Optional attribute that adds a delimiter at the end of every record written. Use AppendDelimiter when creating delimited output files with the consumer operator.

When the last column in the record is NULL, a trailing delimiter denotes that the column is NULL.

Valid values are:
  • 'Y[es]' = Adds a delimiter at the end of every record written.
  • 'N[o]' = Does not add a delimiter at the end of every record written (default).

This attribute is not valid for the producer operator.
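
For example, a consumer writing comma-delimited output with trailing delimiters might set the following (a sketch; the delimiter choice is illustrative):

  VARCHAR Format          = 'Delimited',
  VARCHAR TextDelimiter   = ',',
  VARCHAR AppendDelimiter = 'Yes'  /* a trailing delimiter marks a NULL final column */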

ArchiveDirectoryPath = ‘pathName’

Defines the complete pathname of a directory to which all processed files are moved from the current directory (specified with the DirectoryPath attribute) for the producer operator.

This attribute is required when specifying a value for the VigilMaxFiles attribute.

This attribute is ignored by the consumer operator.

ArchiveFatal = ‘option’

Defines what action to take if an archive (file move) fails for the producer operator.
Valid values are:
  • 'Y[es]' = The job terminates (default).
  • 'N[o]' = Processing continues with a warning.

This attribute is ignored by the consumer operator.

CloseQuoteMark = 'character(s)'

Defines the character sequence for the closing quote mark within delimited data.

May be any single-byte or multibyte value from the session character set. For example, '"' or '||'.

The default value is the sequence provided for the attribute OpenQuoteMark.

This attribute is used by both the producer and consumer operators.

DirectoryPath = 'pathName'

This optional attribute defines the location of source files for reading (by the producer operator) or target files for writing (by the consumer operator).

Use this attribute to specify an existing base directory path (or z/OS PDS dataset name) for the location of the file (or PDS members) indicated by the FileName attribute. This attribute cannot be used if a z/OS data set (DD:DATA) is specified in the FileName attribute.

To specify a z/OS PDS data set with a JCL DD statement, prefix the DirectoryPath attribute value with ‘DD:’ as shown in the following example:

DirectoryPath='DD:<ddname>'

To specify the z/OS PDS data set directly, use the following syntax:

DirectoryPath='//''dataset-name'''

This attribute defaults to the directory in which the job is executing (the job working directory specified in the DEFINE JOB statement).

If the DataConnector is a producer instance, the DirectoryPath specification is prepended to the file name only if no directory names appear within the FileName attribute. If a directory is included in the FileName attribute, then the DirectoryPath attribute is expected to be empty.

EnableScan = ‘mode’

Optional attribute that bypasses the directory scan logic when using access modules with the producer operator.

Valid values are:
  • ‘Y[es]’ = Operator retains its original behavior, which is to automatically scan directories (default).
  • ‘N[o]’ = Operator bypasses the directory scan feature and passes directly to the access module only the file specified in the FileName attribute.

If this attribute is set to ‘No’ while a wildcard character is specified in the FileName attribute, a warning message is generated in the DataConnector log.

This attribute is not valid for the consumer operator.

ErrorLimit = errorLimit

errorLimit = (0 - 2147483647)

0 = Default (Unlimited)

Optional attribute that specifies the approximate number of records that can be stored in the error row file before the DataConnector producer operator job is terminated. If ErrorLimit is not specified, it is the same as an ErrorLimit value of 0. The ErrorLimit specification applies to each instance of the DataConnector producer operator.

When the "RecordErrorFileName" attribute is defined (previously known as "RowErrFileName"), error records are saved in the specified file and the job continues to process additional records without exiting with a fatal error.

This attribute is ignored by the consumer operator.

For information about the effects of the ErrorLimit attribute, see Teradata® Parallel Transporter User Guide, B035-2445.

For a list of obsolete syntax, which is supported but no longer documented, see Deprecated Syntax.
EscapeQuoteDelimiter = 'character(s)'

Optional attribute that allows you to define the escape quote character sequence within delimited data. The default value is the sequence provided for the CloseQuoteMark attribute. See Rules for Quoted Delimited Data Handling.

When processing data in delimited format, if the EscapeQuoteDelimiter precedes either the OpenQuoteMark or the CloseQuoteMark, that instance of the quote mark (either open or close) is included in the data rather than marking the beginning or end of a quoted string.

This attribute is used by both the producer and consumer operators.

EscapeRecordDelimiter = 'character(s)'

Optional attribute that allows you to define the record delimiter escape sequence within unquoted delimited data.

When processing data in delimited format, if the escape sequence defined by EscapeRecordDelimiter precedes the end-of-record (EOR) marker, the escape sequence is removed and that instance of the EOR is included in the data rather than marking the end of the record. If the escape sequence defined by EscapeRecordDelimiter is not immediately followed by an EOR, the escape sequence is treated as regular data.

Other details:
  • An EOR is either a LF (0x0A), CRLF (0x0D0A), or an EBCDIC newline (0x15).
  • EscapeRecordDelimiter is ignored for all format types other than 'Delimited'.
  • EscapeRecordDelimiter can be one or more characters in length and can contain single-byte or multi-byte characters depending upon the session character set.
  • For quoted data, an EscapeRecordDelimiter is treated as regular data, just like an embedded EOR.
  • EscapeRecordDelimiter is only used by the producer operator. It is ignored by the consumer operator.
  • There is no default value.
EscapeTextDelimiter = 'character(s)'

Optional attribute that allows you to define the delimiter escape character sequence within delimited data.

When processing data in delimited format, if the escape sequence defined by EscapeTextDelimiter precedes the delimiter, that instance of the delimiter is included in the data rather than marking the end of the column. If the escape sequence defined by EscapeTextDelimiter is not immediately followed by the delimiter character, the data is considered to be ordinary and no further processing is performed.

For example, if the default delimiter is the pipe ( | ) and the EscapeTextDelimiter is the backslash, then column data input of abc\|def| would be loaded as abc|def.

Other details:
  • EscapeTextDelimiter is ignored for all format types other than 'Delimited'.
  • EscapeTextDelimiter can be one or more characters in length and can contain single-byte or multi-byte characters depending upon the session character set.
  • For quoted data, an EscapeTextDelimiter is treated as regular data.
  • EscapeTextDelimiter is used by both the producer and consumer operators.
  • There is no default value.
FileList = 'option'

Optional attribute used in conjunction with the FileName attribute.

Valid values are:
  • 'Y[es]'= The file specified by FileName contains a list of files to be processed. For the producer operator, any number of files can be listed, each being read for data. For the consumer operator, the number of files listed must equal the number of consumer instances, so that each instance has a file to write to.
  • 'N[o]' = The file specified by FileName does not contain a list of files to be processed.
The DataConnector operators only support a FileList file encoded in ASCII on network-attached platforms.
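
For example, a producer might be pointed at a hypothetical list file as follows:

  VARCHAR FileName = '/jobs/filelist.txt',
  VARCHAR FileList = 'Yes'

where /jobs/filelist.txt contains one full path per line:

/data/extract1.dat
/data/extract2.dat
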
FileName = 'fileName'

Required attribute that specifies the name of the file to be processed. For the producer operator, this file defines where to read source data. For the consumer operator, it defines where to write target data.

In some cases, the access module specified using the AccessModuleName attribute may not use or recognize file names and, therefore, may not require specification of a value for the FileName attribute. For example, the Teradata Access Module for IBM Websphere MQ does not require a file name specification.

When used with the FileList attribute, fileName is expected to contain a list of names of the files to be processed, each with a full path specification. In this case, wildcard characters are not supported for either the FileName attribute or the filenames it contains. Multiple instances of the operator can be used to process the list of files in parallel.

When FileList='No', wildcard characters in the fileName will result in a directory scan by the producer operator. However, on Windows platforms, using the wildcard character (*) in the FileName operator attribute may inadvertently include more files than you intend. For example, if you specify *.dat, a directory scan of the folder will find files as if you had specified *.dat*; files with the extensions .data, .date, and .dat071503 will also be found. Therefore, you may need to first remove extraneous files from your folder.

Reading and writing of a GZIP compressed file is supported on all non-z/OS platforms. The support for this is enabled automatically based on the file extension. The standard file name extension for gzip files is "*.gz".

Reading and writing of a ZIP compressed file is supported on Windows and UNIX, but not on IBM z/OS. The support for this is enabled automatically based on the file extension. The standard file name extension for zip files is "*.zip".

Only single files are supported with the ZIP format for both reading and writing.

Reading and writing of GZIP and ZIP files is not supported when using Hadoop/HDFS.

For additional z/OS dataset syntax, see the table Valid Filename Syntax.

FileSizeMax = 'fileSizeMax'

Optional attribute that is only valid for a consumer instance. It defines the maximum file size (in bytes) for an output file. When a file reaches this size, DataConnector closes the file and continues writing records to a new output file, using the original user-defined file name with an appended instance number and file number.

Valid values for this attribute are:
  • Any positive integer within the file size limit enforced by the current file system.
  • A shorthand designation consisting of leading digits followed by a 'K', 'M', or 'G', which represent 1024 bytes, 1048576 bytes, and 1073741824 bytes, respectively. For example, '350M' represents 367001600 bytes. The overall numeric representation must be within the file size limit enforced by the current file system.
The naming of output files is based on the FileName setting, the instance number, and the number of files that have already been opened for that instance. The specific naming syntax is the base file name taken from the FileName setting, followed by a hyphen ('-'), followed by the operator instance number, followed by another hyphen, followed by a numerical count (file number), and then followed by the file extension (if it exists) from the FileName setting. For example, if 3 instances of DataConnector are running and FileName='abcd.txt', the following 9 files may be created:
  • abcd-1-1.txt – first file for first instance
  • abcd-2-1.txt – first file for second instance
  • abcd-3-1.txt – first file for third instance
  • abcd-1-2.txt – second file for first instance
  • abcd-2-2.txt – second file for second instance
  • abcd-3-2.txt – second file for third instance
  • abcd-1-3.txt – third file for first instance
  • abcd-2-3.txt – third file for second instance
  • abcd-3-3.txt – third file for third instance

This naming convention is always used when the FileSizeMax attribute is defined, even if only one operator instance is used or the size limit is never reached.

Individual records will not span files. So, the final size of a maxed-out file may be slightly less than the FileSizeMax value.

The DataConnector consumer operator must be able to write at least one record to each file, so a file can exceed the FileSizeMax value only if it contains a single record that is itself larger than the limit.

When using multiple instances, some instances may produce more output files than other instances, simply because they are processing more records from the data stream. To ensure a more even distribution of records across files for each instance, use tbuild's -C command line option.

The FileSizeMax attribute is not supported for the following:
  • The DataConnector producer operator
  • The z/OS platform
  • ZIP and GZIP files
  • Hadoop files
  • FastExport OUTMOD, FastLoad INMOD, and MultiLoad INMOD operators

The default value (0) indicates that no file size limit will be enforced.
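
For example, a consumer using the shorthand designation might specify the following (a sketch; values are illustrative):

  VARCHAR FileName    = 'abcd.txt',
  VARCHAR FileSizeMax = '350M'  /* close abcd-<instance>-<n>.txt at about 367001600 bytes and open the next */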

Format = 'format'

Required attribute that specifies the logical record format of the data. This attribute is used by both the producer operator and the consumer operator. No system default exists.

Format can have any of the following values:
  • 'Binary' = 2-byte integer, n, followed by n bytes of data. This data format requires rows to be 64KB (64260 data bytes) or smaller. In this format:
    • The data is prefixed by a record-length marker.
    • The record-length marker does not include the length of the marker itself.
    • The record-length is not part of the transmitted data.
  • 'Binary4' = 4-byte integer, n, followed by n bytes of data. This data format supports rows up to 1MB (1000000 data bytes) in size. In this format:
    • The data is prefixed by a record-length marker.
    • The record-length marker does not include the length of the marker itself.
    • The record-length is not part of the transmitted data.
  • 'Delimited' = in text format with each field separated by a delimiter character. When you specify Delimited format, you can use the optional TextDelimiter attribute to specify the delimiter character. The default is the pipe character ( | ).
    When the Format attribute of the DataConnector producer is set to 'Delimited', the associated Teradata PT schema object must consist of only VARCHAR and/or VARDATE columns. Specifying non-VARCHAR or non-VARDATE columns results in an error. (A schema sketch follows this list.)
  • 'Formatted' = both prefixed by a 2-byte record-length marker and followed by an end-of-record marker. This data format requires rows to be 64KB (64260 data bytes) or smaller. In this format:
    • The record-length marker does not include the length of the marker itself.
    • Neither the record-length nor the end-of-record marker is part of the transmitted data.
  • 'Formatted4' = both prefixed by a 4-byte record-length marker and followed by an end-of-record marker. This data format supports rows up to 1MB (1000000 data bytes) in size. In this format:
    • The record-length marker does not include the length of the marker itself.
    • Neither the record-length nor the end-of-record marker is part of the transmitted data.

  • 'Text' = character data separated by an end-of-record (EOR) marker. The EOR marker can be either a single-byte linefeed (X'0A') or a double-byte carriage-return/line-feed pair (X'0D0A'), as defined by the first EOR marker encountered for the first record. This format restricts column data types to CHAR or ANSIDATE only.
  • 'Unformatted' = not formatted. Unformatted data has no record or field delimiters, and is entirely described by the specified Teradata PT schema.
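
Because the 'Delimited' format restricts the schema to VARCHAR and VARDATE columns, a matching schema might look like the following sketch (names and lengths are illustrative):

DEFINE SCHEMA INPUT_SCHEMA
(
  ORDER_ID VARCHAR(10),
  ORDER_DT VARCHAR(10),
  AMOUNT   VARCHAR(12)
);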

HadoopBlockSize = (x * 1K bytes)

Optional attribute that specifies the size of the block/buffer, in 1K increments, used when writing Hadoop/HDFS files. The typical default Hadoop/HDFS cluster block size is 64MB, which is also what TPT uses (65536 * 1024 bytes = 64MB).

Before using this attribute to change the default, consult your system administrator. This value affects memory consumption (internal buffer allocated at runtime is twice this size), and should not be changed indiscriminately.

Valid values are:
  • 1 - 2147483647
  • 0 = Use the default value (65536)

HadoopFileFormat = 'hadoopFileFormat'

Optional attribute that specifies the format of the file that the TDCH job should process. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopHost = 'hadoopHostName'

Optional attribute that specifies the host name or IP address of the NameNode in a Hadoop cluster.

When launching a TDCH job, this value should be the host name or IP address of the node in the Hadoop cluster on which the TPT job is being run. This host name or IP address should be reachable by all DataNodes in the Hadoop cluster. For more information about the DataConnector's Hadoop interfaces, see Processing Hadoop Files and Tables.

When launching a HDFS API job this value indicates the cluster where the HDFS operation will be performed and can be set as follows:
  • “default” = The default name-node declared in the Hadoop HDFS configuration file.
  • <host-name>:<port> = The host-name/ip-address and port of the name-node on the cluster where the HDFS operation is to be performed. The “:<port>” value is optional.

HadoopJobType = 'hadoopJobType'

Optional attribute that specifies the type of TDCH job to launch. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopNumMappers = 'hadoopNumMappers'

Optional attribute that specifies the number of mappers that the TDCH will launch. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopProperties = 'hadoopProperties'

Optional attribute that specifies one or more Hadoop properties and their values to be used by TPT when submitting the Hadoop command. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopSeparator = 'hadoopSeparator'

Optional attribute that specifies the characters that separate fields in the file processed by the TDCH job. This attribute is only valid when 'HadoopFileFormat' is set to 'textfile', which is the attribute's default value. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopSourceDatabase = 'hadoopSourceDatabase'

Optional attribute that specifies the name of the source database in Hive or HCatalog from which data is exported. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopSourceFieldNames = 'hadoopSourceFieldNames'

Optional attribute that specifies the names of the fields to export from the source HDFS files, or from the source Hive and HCatalog tables, in comma separated format. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopSourcePartitionSchema = 'hadoopSourcePartitionSchema'

Optional attribute that specifies the full partition schema of the source table in Hive, in comma separated format. This attribute is only valid when 'HadoopJobType' is set to 'hive'. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopSourcePaths = 'hadoopSourcePaths'

Optional attribute that specifies the directory of the to-be-exported source files in HDFS. This attribute is required when 'HadoopJobType' is set to 'hdfs', optional when 'HadoopJobType' is set to 'hive', and invalid when 'HadoopJobType' is set to 'hcat'. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopSourceTable = 'hadoopSourceTable'

Optional attribute that specifies the name of the source table in Hive or HCatalog from which data is exported. This attribute is required when 'HadoopJobType' is set to 'hcat', optional when 'HadoopJobType' is set to 'hive', and invalid when 'HadoopJobType' is set to 'hdfs'. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopSourceTableSchema = 'hadoopSourceTableSchema'

Optional attribute that specifies the full-column schema of the source table in Hive or HCatalog, in comma separated format. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopTargetDatabase = 'hadoopTargetDatabase'

Optional attribute that specifies the name of the target database in Hive or HCatalog to which data is imported. It is optional with a 'hive' or 'hcat' job and not valid with an 'hdfs' job. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopTargetFieldNames = 'hadoopTargetFieldNames'

Optional attribute that specifies the names of the fields to write to the target file in HDFS, or to the target Hive and HCatalog table, in comma separated format. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopTargetPartitionSchema = 'hadoopTargetPartitionSchema'

Optional attribute that specifies the full partition schema of the target table in Hive, in comma separated format. This attribute is only valid when 'HadoopJobType' is set to 'hive'. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopTargetPaths = 'hadoopTargetPaths'

Optional attribute that specifies the directory of the to-be-imported target files in HDFS. This attribute is required when 'HadoopJobType' is set to 'hdfs', optional when 'HadoopJobType' is set to 'hive', and invalid when 'HadoopJobType' is set to 'hcat'. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopTargetTable = 'hadoopTargetTable'

Optional attribute that specifies the name of the target table in Hive or HCatalog where data will be imported. This attribute is required when 'HadoopJobType' is set to 'hcat', optional when 'HadoopJobType' is set to 'hive', and invalid when 'HadoopJobType' is set to 'hdfs'. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopTargetTableSchema = 'hadoopTargetTableSchema'

Optional attribute that specifies the full column schema of the target table in Hive or HCatalog, in comma separated format. For more information about the DataConnector's Hadoop interfaces and the Teradata Connector for Hadoop tutorial for supported and default values, see Processing Hadoop Files and Tables.

HadoopUser = 'hadoopUser'

Optional attribute that specifies the name of the Hadoop user to utilize when reading and writing files via the HDFS API interface. The currently logged-in user name where the TPT HDFS job is running is used when this attribute is not specified. For more information about the DataConnector's Hadoop interfaces, see Processing Hadoop Files and Tables.

IndicatorMode = 'mode'

Optional attribute that specifies whether indicator bytes are inserted at the beginning of each record. This attribute is used by both the producer and consumer operators.

Valid values are:
  • 'Y[es]' = Indicator mode data. This value is not valid for the ‘text’ or ‘delimited’ record formats.
  • 'N[o]' = Nonindicator mode data (default).

LobDirectoryPath = 'pathName'

This optional attribute defines an existing location where the producer operator will store temporary LOB files. This directory will only be used if DataConnector has to transfer inline LOBs in deferred mode because all of the inline LOBs cannot fit within a single request message. Teradata PT will automatically handle the creation and removal of the temporary LOB files. The user simply has to provide the location.

If pathName is not provided, the current working directory will be used.

This attribute is not valid for the consumer operator and is not valid on the z/OS platform.

MultipleReaders = 'option'

Optional attribute that, when set to 'Yes', instructs the DataConnector producer operator that more than one instance can be used to read a single file in parallel.

Valid values are:
  • 'Y[es]' = Allow multiple producer instances to read a single file. FileList must not be set to ‘Yes’.
  • 'N[o]' = Only the first instance of the producer operator is allowed to read the source data.

This attribute is not valid for the consumer operator.

MultipleReaders is disabled when the job has 1 or more inline LOB columns in the schema and the total schema size is greater than 1,024,000 bytes. A warning message will be displayed and the job will continue.
NamedPipeTimeOut = seconds

Optional attribute that enables checking of named pipes (FIFOs) by the producer operator. If seconds is set to a positive number, the DataConnector producer operator will check the pipe for data every second until either data becomes available, or the amount of time specified is reached and the job terminates. If the attribute is not specified, no checking of pipes will be performed. This will yield faster performance, but may also result in a hung job if data is not available in the pipe when it is read.

This attribute is only for jobs that use the DataConnector producer operator to read pipes directly. It is not used when the Named Pipe Access Module (NPAM) performs the pipe I/O.

This attribute is not valid for the consumer operator.

NotifyExit = 'inmodName'
Optional attribute that specifies the name of the user-defined notify exit routine with an entry point named _dynamn. If no value is supplied, the following default name is used:
  • libnotfyext.dll for Windows platforms
  • libnotfyext.dylib for Apple macOS platform
  • libnotfyext.so for all other UNIX platforms
  • NOTFYEXT for z/OS platforms

NotifyMethod must be set to ‘Exit’.

This attribute is valid for both producer and consumer operators.

See Deprecated Syntax for information about providing your own notify exit routine.

NotifyLevel = 'notifyLevel'

Optional attribute that specifies the level at which certain events are reported.

See DataConnector Operator Events for a complete description of possible events and the notification level for each event. Valid values are:
  • 'Off' = No notification of events is provided (default).
  • 'Low' = Low Notification Level.
  • 'Med' = Medium Notification Level.
  • 'High' = High Notification Level.

This attribute is valid for both producer and consumer operators.

NotifyMethod = 'notifyMethod'
Optional attribute that specifies the method for reporting events. The methods are:
  • 'None' = No event logging is done (default).
  • 'Msg' = Writes the events to a log.
  • 'Exit' = Sends the events to a user-defined notify exit routine.

This attribute is valid for both producer and consumer operators.

NotifyString = 'notifyString'
Optional attribute that specifies a user-defined string to precede all messages sent to the system log. This string is also sent to the user-defined notify exit routine. The maximum length of the string is:
  • 80 bytes, if NotifyMethod is 'Exit'
  • 16 bytes, if NotifyMethod is 'Msg'

This attribute is valid for both producer and consumer operators.

NullColumns = ‘option’

Optional attribute that determines how missing columns are treated. This only applies to delimited data for the producer operator.

To utilize this attribute, AcceptMissingColumns must be 'Y[es]' or 'YesW[ithoutLog]' and QuotedData must be 'Y[es]' or 'O[ptional]'.

Valid values are:
  • 'Y[es]' = New job-created columns will be NULL (default).
  • 'N[o]' = New job-created columns will contain the empty string "".

For the following examples, the delimiter character is the default | character, QuotedData is enabled and AcceptMissingColumns is 'Y'. The example schema is:

...
(VARCHAR(5), VARCHAR(5), VARCHAR(5), VARCHAR(5), VARCHAR(5), VARCHAR(5))
...

The first example data record is:

"abc"|""||"def"

The schema requires 6 fields but the record only provides 4.

Fields 1, 2, and 4 contain the strings "abc", "", and "def".

Note that "" is not NULL. Rather, it is a character string of zero length. It is handled in the same manner as any other string.

Field 3 is an explicitly provided NULL column. Because it is part of the original record, it is not affected by the NullColumns attribute.

Fields 5 and 6 are not provided and must be created by the DataConnector producer operator.

If NullColumns is set to 'Y[es]', or the default behavior is used, the result will be as if the data file contained the record

"abc"|""||"def"|||

where both newly created columns are NULL.

But if NullColumns = 'N[o]' is used, the behavior will be as if the record was defined as

"abc"|""||"def"|""|""

where the newly created columns contain empty strings.

Note that fields 2 and 3, which were both part of the original data record, are unchanged regardless of the NullColumns attribute setting.

This attribute is ignored by the consumer operator.

OpenMode = 'mode'

Optional attribute that specifies the read/write access mode.

Valid values are:
  • 'Read' = Read-only access. Only valid for the producer operator.
  • 'Write' = Write-only access. Only valid for the consumer operator.
  • 'WriteAppend' = Write-only access appending to existing file. Only valid for the consumer operator.

If mode is not specified for OpenMode, it defaults to 'Read' for the producer operator and 'Write' for the consumer operator.

OpenQuoteMark = 'character(s)'

Optional attribute that allows you to define the character sequence for the opening quote mark within delimited data. The default value is the double quote character, '"'.

May be any single-byte or multibyte value from the session character set. For example, '"' or '||'.

For the producer operator, QuotedData must be set to ‘Yes’ or ‘Optional’. For the consumer operator, QuotedData must be set to ‘Yes’.

PrivateLogName = 'logName'

Optional attribute that specifies the name of a log that is maintained by the Teradata PT Logger inside the public log. The private log contains all of the diagnostic trace messages produced by the operator using the TraceLevel attribute.

A hyphen and the instance number are appended to the logName value. For example, if PrivateLogName = 'DClog', then the actual log name for instance 1 is DClog-1. Similarly, for instance 2, it is DClog-2, and so on.

The private log can be viewed using the tlogview command as follows, where jobId is the Teradata PT job name and privatelogname is the value for the operator’s PrivateLogName attribute, including hyphen and instance number:

tlogview -j jobId -f privatelogname

If the private log is not specified, all output is stored in the public log.

For more information about the tlogview command, see Teradata PT Utility Commands.

This attribute is valid for both producer and consumer operators.
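
For example, if PrivateLogName = 'DClog' and the job ID is daily_load-27 (both names hypothetical), the trace output of instance 1 could be viewed with:

tlogview -j daily_load-27 -f DClog-1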

QuotedData = ‘option’

Optional attribute that determines if delimited data is expected to be enclosed within quotation marks.

Valid values are:
  • 'Y[es]' = All columns are expected to be enclosed in quotation marks. For a producer operator, quotation marks will be discarded before data is written to the data stream. For a consumer operator, quotation marks are added before column data is written to the output file.
  • ‘N[o]’ = Column enclosure within quotation marks is not expected (default). For a producer operator, quotation marks are treated as data. For a consumer operator, quotation marks will not be added to column data.
  • ‘Optional’ = Columns can optionally be enclosed within quotation marks. For a producer operator, if quotation marks are found, they will be discarded before data is written to the data stream. For a consumer operator, quotation marks are only added if column data contains an embedded quotation mark or an embedded text delimiter.
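
For example, a producer accepting both quoted and unquoted columns, with the default double-quote marks made explicit, might set the following (a sketch):

  VARCHAR QuotedData     = 'Optional',
  VARCHAR OpenQuoteMark  = '"',
  VARCHAR CloseQuoteMark = '"'
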
RecordErrorFileName = ‘filePath’

Optional attribute that specifies where error records are stored when reading delimited data files. Error records include those with either incorrect column counts or columns with invalid lengths. This only applies to the producer operator.

The ErrorLimit attribute specifies how many error records can be written to this file.

If this attribute is undefined, the first occurrence of an error record will result in a fatal operator error and job termination.

This attribute is ignored by the consumer operator.

RecordErrorVerbosity = ‘option’

Optional attribute that allows for annotations in the record error file when the RecordErrorFileName attribute is set. This only applies to the producer operator.

Valid values are:
  • ‘Off’ = No annotations are to be inserted into the record error file (default).
  • ‘Low’ = The error message describing the nature of the error is included.
  • ‘Med’ = The file name and record number are included, along with error messages describing the nature of the error.
  • ‘High’ = The same as ‘Med’.

This attribute is ignored by the consumer operator.

RecordsPerBuffer = count

Optional attribute that defines the number of records to be processed by each instance of the producer operator during each processing phase when MultipleReaders='Yes'. Valid values are between 1 and 999999.

The default is calculated by dividing the buffer size (1048575 bytes) by the number of worker reader instances, and then dividing that result by the maximum record size as defined by the schema. The number of worker instances is equal to the total number of operator instances minus 1.

For example, if 10 reader instances are defined, and the length of the schema is 400 bytes, then this value would default to 1048575 bytes / 9 instances / 400 bytes = 291 records.

This attribute is ignored by the consumer operator.

RowsPerInstance = rows

Optional attribute that specifies the maximum number of records to be processed by each instance of the producer operator.

This number spans files, meaning that processing continues over multiple files until the row limit is reached for each instance. Once the row limit is reached for a particular instance, that instance is done. If the limit is not reached for a particular instance, that instance ends normally.

The limit is not effective across restarts, meaning the row count is reset to zero upon restart.

This attribute is ignored by the consumer operator.

SingleRecordWriting = ‘option’

Optional attribute that allows an Access Module to receive one record per write operation instead of receiving a buffer that can contain multiple records. This option only applies to consumer instances that have a valid AccessModuleName entry.

Valid values are:
  • 'Y[es]' = Each write operation to the Access Module will only contain one record.
  • 'N[o]' = Each write operation to the Access Module will contain a buffer that may include multiple records. 'No' is the default value.

This attribute is ignored by the producer operator.

SkipRows = rows

Optional attribute that specifies the number of rows to skip by each instance of the producer operator.

The SkipRowsEveryFile attribute will determine if SkipRows spans files and restarts.

This attribute is not valid for the consumer operator.

SkipRowsEveryFile = ‘option’

Optional attribute that governs the behavior of SkipRows (above). This only applies to the producer operator.

Valid values are:
  • 'Y[es]' = SkipRows restarts at the beginning of each file. For example, if SkipRows = 5, SkipRowsEveryFile = 'Yes', and 5 files to be processed each contain 300 rows, the first 5 rows of each file are skipped and rows 6 through 300 are processed. You might use this option to skip repetitive header rows in each file to be processed.
  • 'N[o]' = SkipRows value is cumulative (default). That is, processing continues over multiple files until the specified number of rows to skip is reached. For example, if SkipRows = 1000, SkipRowsEveryFile = 'N', and 5 files to be processed each contain 300 rows, Files 1, 2, and 3 are skipped in their entirety, file 4 begins processing at row 101, and all of file 5 is processed. You might use this option to skip rows that were already processed in a failed job.

This attribute is ignored by the consumer operator.
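
For example, to drop a single header row from every input file (a sketch):

  INTEGER SkipRows          = 1,
  VARCHAR SkipRowsEveryFile = 'Yes'  /* restart the skip count at the start of each file */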

TextDelimiter = 'character(s)'

Optional attribute that specifies the bytes that separate fields in delimited records. Any number of single or multibyte characters from the session character set can be defined. This attribute is only valid when Format = ’Delimited’.

The default delimiter value is the pipe character ( | ).

The TextDelimiter byte sequence can be treated as data if it is preceded with an escape sequence defined by the EscapeTextDelimiter attribute.

To use the tab character as the delimiter character, specify TextDelimiter = 'TAB'. Use uppercase “TAB” not lowercase “tab”.

This attribute is used by both the producer operator and the consumer operator.

Timeout = seconds

Optional attribute that specifies the number of seconds the producer operator waits for input to finish. The supplied value is passed to all access modules attached to the producer operator.

Valid values are from 1 to 99999 seconds.

The default is 0 (no timeout).

This attribute is not valid for the consumer operator.

TraceLevel = 'level'

Optional attribute that specifies the types of diagnostic information that are written by each instance of the operator to the public log (or private log, if one is specified using the PrivateLogName attribute).

The diagnostic trace function provides detailed information in the log file to aid in problem tracking and diagnosis. The trace levels are:
  • 'None' = Disables the trace function (default). Status, error, and other messages default to the public log.

    The PrivateLogName attribute default is used only if a TraceLevel attribute other than 'None' is specified. If a TraceLevel attribute other than 'None' is specified without a PrivateLogName specification, the DataConnector operator generates a private log name, and a message containing the private log name is issued in the public log.

    If no TraceLevel attribute is specified, or if the specified value is 'None', and the PrivateLogName attribute is specified, the TraceLevel is set to 'Milestones'. The recommended TraceLevel value is 'None', which produces no log file. Specifying any value greater than 'IO_Counts' produces a very large amount of diagnostic information.

  • 'Milestones' = Enables the trace function only for major events such as initialization, access module attach/detach operations, file openings and closings, error conditions, and so on
  • 'IO_Counts' = Enables the trace function for major events and I/O counts
  • 'IO_Buffers' = Enables the trace function for major events, I/O counts, and I/O buffers
  • 'All' = Enables the trace function for major events and I/O counts and I/O buffers plus function entries.
If the PrivateLogName attribute specifies a log file without the TraceLevel attribute being specified, "minimal" statistics are displayed in the log file, such as:
  • Name of files as they are processed
  • Notice when sending rows begin
  • On completion, the number of rows processed and the CPU time consumed
  • Total files processed and CPU time consumed by each instance of the DataConnector operator
The TraceLevel attribute is provided as a diagnostic aid only. The amount and type of additional information provided by this attribute will change to meet evolving needs from release to release. It should be used with caution, due to the impact it can have on performance and disk usage.

This attribute is valid for both producer and consumer operators.

TrimChar = ‘character’

Optional attribute that specifies the characters to be trimmed from delimited column data by the producer operator. Use in conjunction with the TrimColumns attribute.

Rules for a trim character are:
  • The trim character must be a single character, but may be either a single-byte or multi-byte character from the session character set.
  • The default value is the blank (space) character.
  • Trimming can be performed on either quoted or unquoted fields.
  • If a field consists solely of one or more trim characters, it will be a zero-length VARCHAR after trimming.

For example, if TextDelimiter = ’|’, TrimChar = ‘a’, and TrimColumns = ‘Both’, then the following delimited record:

a1|2aa|aaa|aaaa4aaaa

will get treated as:

1|2||4

This attribute is ignored by the consumer operator.

TrimColumns = ‘option’

Optional attribute that specifies how characters are trimmed from delimited column data by the producer operator. Use in conjunction with the TrimChar attribute.

Valid values are:
  • 'None' = No trimming (default)
  • 'Leading' = Leading characters are trimmed
  • 'Trailing' = Trailing characters are trimmed
  • 'Both' = Both leading and trailing characters are trimmed
If TrimColumns and TruncateColumnData are enabled, trimming occurs before truncating.

This attribute is not valid for the consumer operator.

TruncateColumnData = ‘option’

Optional attribute that determines how a column whose length is greater than that defined in the schema is treated. This only applies to delimited data for the producer operator.

Valid values are:
  • 'Y[es]' = The column is truncated to the maximum length and processed without an error being raised. The edited record is sent to the database and the original record is saved in the record error file, if it is defined via the RecordErrorFileName attribute.
  • 'N[o]' = A column that is too long is not sent to the database, but the original record is saved in the record error file, if it is defined via the RecordErrorFileName attribute. If RecordErrorFileName is not defined, a fatal error will occur. 'No' is the default value.
  • 'YesW[ithoutLog]' = The column is truncated to the maximum length and processed without an error being raised. The edited row is sent to the database, but the original record is not saved in the record error file.

This attribute is ignored by the consumer operator.

VigilElapsedTime = minutes

Optional attribute that specifies the elapsed time of the vigil window for an Active Directory Scan by the producer operator.

This is the amount of time to wait from the VigilStartTime in which the directory specified in the DirectoryPath attribute is watched for the arrival of new files. If the VigilStartTime attribute is not set, the system’s current time is used as the start time. VigilElapsedTime is expressed in minutes. For example, a 2-hour and 15-minute window is indicated as:

VigilElapsedTime = 135

VigilElapsedTime and VigilStopTime are interchangeable, but mutually exclusive. There is no default value for VigilElapsedTime. It requires the ArchiveDirectoryPath to also be set.

This attribute is not valid for the consumer operator.

VigilMaxFiles = numberOfFiles

Optional attribute that defines the maximum number of files that can be scanned in one pass by the producer operator for an Active Directory Scan.

Larger values require more Teradata PT global memory and could degrade performance.

The valid value range of numberOfFiles is from 10 to 50000. The default value is 2000.

Use of the VigilMaxFiles attribute requires that a value for the ArchiveDirectoryPath attribute be specified, along with VigilStartTime and VigilStopTime (or VigilElapsedTime).

The attribute’s value can be modified during job execution using the External Command Interface. To change the value of VigilMaxFiles during execution, enter:

twbcmd  <Teradata PT job ID> <operator ID>  VigilMaxFiles=<number of files>

This attribute is ignored by the consumer operator.

VigilNoticeFileName = 'noticeFileName'

Optional attribute that specifies the name of the file in which the vigil notice flag is to be written by the producer operator for an Active Directory Scan.

For example, to request that a record be written to the file /home/user/Alert.txt, specify the attribute as:

VigilNoticeFileName = '/home/user/Alert.txt'

If a directory path is not specified, the file is saved in the current working directory.

Naming a file activates this notification feature.

It requires that a value for the ArchiveDirectoryPath attribute be specified, along with VigilStartTime and VigilStopTime (or VigilElapsedTime).

This attribute is ignored by the consumer operator.

VigilSortField = ‘sortTime’

Optional attribute that provides the capability for files to be sorted in a specific order by the producer operator for an Active Directory Scan.

Valid values are:
  • 'TIME' = All files will be sorted according to the time they were last modified.
  • 'NAME' = All files are sorted by filename and processed in ascending alphabetical order.
  • 'NONE' = The sort feature is off (default).

Since times associated with the files are tracked to the nearest second, more than one file may have the same timestamp. When modification times for files are less than one second apart, the sort order of the files may not represent the actual order modified.

When using multiple instances, files cannot be processed in a specific sorted order. When this attribute is used, Teradata PT allows only a single instance of the DataConnector producer operator to be used in a job step. If more than one instance is specified, the job will fail.

It requires that a value for the ArchiveDirectoryPath attribute be specified, along with VigilStartTime and VigilStopTime (or VigilElapsedTime).

This attribute is not valid for the consumer operator.

This attribute is not available for z/OS systems.
VigilStartTime = 'yyyymmdd hh:mm:ss'

Optional attribute that specifies the time to start the vigil time window for an Active Directory Scan by the producer operator. It helps define the period in which the directory specified in the DirectoryPath attribute is watched for the arrival of new files.

The start time is expressed as follows:
  • yyyy is the 4-digit year (2000-3000)
  • mm is the month (1-12)
  • dd is the day of the month (1-31)
  • hh is the hour of the day (0-23)
  • mm is the minute (0-59)
  • ss is the second (0-59)

For example, a start time of August 23, 2019, at 9:22:56 a.m. becomes:

VigilStartTime = '20190823 09:22:56'

This attribute is required for the VigilWaitTime attribute to work.

There is no default value for VigilStartTime. It requires the ArchiveDirectoryPath to also be set, along with either VigilStopTime or VigilElapsedTime.

This attribute is not valid for the consumer operator.

VigilStopTime = ‘yyyymmdd hh:mm:ss’

Optional attribute that specifies the time to stop the vigil time window for an Active Directory Scan by the producer operator. It helps define the period in which the directory specified in the DirectoryPath is watched for the arrival of new files.

The stop time is expressed as follows:
  • yyyy is the 4-digit year (2000-3000)
  • mm is the month (1-12)
  • dd is the day of the month (1-31)
  • hh is the hour of the day (0-23)
  • mm is the minute (0-59)
  • ss is the second (0-59)

For example, a stop time of August 23, 2019, at 2 p.m. becomes:

VigilStopTime  = '20190823 14:00:00'

VigilStopTime and VigilElapsedTime are interchangeable, but mutually exclusive. There is no default value for VigilStopTime. It requires the ArchiveDirectoryPath to also be set, along with VigilStartTime.

This attribute is not valid for the consumer operator.

VigilWaitTime = waitSeconds

Optional attribute that specifies the amount of time to wait before starting to check the directory again if no new files were found by the producer operator for an Active Directory Scan.

A wait time of 2 minutes becomes:

VigilWaitTime = 120

The default wait time is 60 seconds.

Use of the VigilWaitTime attribute requires that a value for the ArchiveDirectoryPath attribute be specified, along with VigilStartTime and VigilStopTime (or VigilElapsedTime).

The attribute’s value can be modified during job execution using the External Command Interface. To change the value of VigilWaitTime during execution, enter:

twbcmd  <Teradata PT job ID> <operator ID>  VigilWaitTime=<Seconds>

This attribute is ignored by the consumer operator.
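
Putting the vigil attributes together, a producer performing an Active Directory Scan might specify the following (a sketch; paths and times are illustrative):

  VARCHAR DirectoryPath        = '/data/incoming',
  VARCHAR FileName             = '*.dat',
  VARCHAR ArchiveDirectoryPath = '/data/processed',   /* required by the vigil attributes */
  VARCHAR VigilStartTime       = '20190823 09:00:00',
  INTEGER VigilElapsedTime     = 135,                 /* watch for 2 hours 15 minutes */
  INTEGER VigilWaitTime        = 120                  /* recheck the directory every 2 minutes */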

WriteBOM = 'option'

Optional attribute that allows a BOM (Byte Order Mark) to be inserted at the beginning of a Unicode output file.

This option only applies to consumer instances that are writing to text files (the Format attribute is 'Text' or 'Delimited') which are using a UTF-8 or UTF-16 character set.

Valid values are:
  • 'Y[es]' = Inserts a BOM to the beginning of the output file based on the character set encoding.
  • 'N[o]' = Does not insert a BOM to the beginning of a Unicode output file. 'No' is the default value.

This attribute is not valid for the producer operator.