The DataConnector operator can process several file formats. Specify the file format with the Format attribute.
- Format = 'Binary' – Each record contains a two-byte integer data length (n) followed by n bytes of data.
- Format = 'Binary4' – Each record contains a four-byte integer data length (n) followed by n bytes of data.
- Format = 'Text' – Each record is entirely character data, an arbitrary number of bytes followed by one of the following end-of-record markers:
- A single-byte line feed (X'0A') on UNIX platforms
- A double-byte carriage-return/line-feed pair (X'0D0A') on Windows platforms
- Format = 'Delimited' – Each record is a variable-length text record whose fields (columns) are separated by one or more delimiter characters, as defined with the TextDelimiter attribute. The delimiter has the following limitations:
- It can only be a sequence of characters.
- It cannot be any character that appears in the data.
- It cannot be a control character other than a tab.
With this file format, when the DataConnector operator is used as a producer, the operator's associated TPT schema object must comprise only VARCHAR, JSON, JSON BY NAME, CLOB BY NAME, BLOB BY NAME, XML BY NAME, XML, CLOB or VARDATE columns. If the TextDelimiter attribute is not provided, it defaults to the pipe character ( | ).
There is no default escape character when using delimited data. Use the DataConnector operator EscapeTextDelimiter attribute to define the escape character.
- Format = 'Formatted' – Each record is in a format known as FastLoad or Teradata format, which is a two-byte integer (n) followed by n bytes of data, followed by an end-of-record marker (X'0A' or X'0D0A').
- Format = 'Formatted4' – A version of the FastLoad format that supports rows greater than 64KB in length. Each record is a four-byte integer (n) followed by n bytes of data, followed by an end-of-record marker (X'0A' or X'0D0A').
- Format = 'Unformatted' – The data does not conform to any predefined format. Instead, the data is entirely described by the columns in the schema definition of the DataConnector operator.
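The length-prefixed layouts in the list above (Binary, Binary4, Formatted, Formatted4) can be illustrated with a short Python sketch. This is not part of TPT; the names LAYOUTS, read_records and write_record are hypothetical helpers introduced here for illustration. The sketch assumes the length prefix is an unsigned little-endian integer, which the descriptions above do not state, so the byte order is left as a parameter.

```python
import io

# Format name -> (size of the length prefix in bytes,
#                 whether an end-of-record marker follows the data)
LAYOUTS = {
    "Binary": (2, False),
    "Binary4": (4, False),
    "Formatted": (2, True),
    "Formatted4": (4, True),
}

def read_records(stream, fmt, byteorder="little"):
    """Yield the data bytes of each record in a length-prefixed stream.

    Assumption: the prefix is unsigned and little-endian by default.
    """
    prefix_size, has_marker = LAYOUTS[fmt]
    while True:
        prefix = stream.read(prefix_size)
        if not prefix:
            return  # clean end of input
        if len(prefix) < prefix_size:
            raise ValueError("truncated length prefix")
        n = int.from_bytes(prefix, byteorder)
        data = stream.read(n)
        if len(data) < n:
            raise ValueError("truncated record data")
        if has_marker:
            # Accept either X'0A' or X'0D0A' as the end-of-record marker.
            marker = stream.read(1)
            if marker == b"\r":
                marker += stream.read(1)
            if marker not in (b"\n", b"\r\n"):
                raise ValueError("missing end-of-record marker")
        yield data

def write_record(data, fmt, byteorder="little", marker=b"\n"):
    """Build one record: length prefix, data, optional end-of-record marker."""
    prefix_size, has_marker = LAYOUTS[fmt]
    prefix = len(data).to_bytes(prefix_size, byteorder)
    return prefix + data + (marker if has_marker else b"")

# Two 'Formatted' records, one with each kind of end-of-record marker.
sample = write_record(b"abc", "Formatted") + write_record(
    b"hello", "Formatted", marker=b"\r\n"
)
print(list(read_records(io.BytesIO(sample), "Formatted")))  # [b'abc', b'hello']
```

The only differences between the four layouts are the width of the length prefix and the presence of the trailing marker, which is why a single table of (prefix size, marker) pairs is enough to drive the reader.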
When processing Hadoop files and tables via the TDCH interface, data is transferred between TPT and TDCH in formatted mode with indicator bytes; the value of the HadoopFileFormat attribute determines the format of the Hadoop file or table processed by TDCH.
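For the Delimited format, the interaction between the delimiter and the escape character can be sketched in Python. This is an illustration only, not the operator's implementation; split_delimited is a hypothetical helper. It assumes that an escape sequence placed immediately before a delimiter makes that delimiter part of the field data, and that an escape sequence appearing anywhere else is kept as ordinary data. As described above, the delimiter defaults to the pipe character and there is no default escape.

```python
def split_delimited(record, delimiter="|", escape=None):
    """Split one text record into fields.

    Assumption: `escape` directly before `delimiter` keeps that delimiter
    as data; an escape elsewhere is left untouched.
    """
    if not delimiter:
        raise ValueError("delimiter must contain at least one character")
    fields, current, i = [], [], 0
    while i < len(record):
        if escape and record.startswith(escape + delimiter, i):
            current.append(delimiter)       # escaped delimiter is data
            i += len(escape) + len(delimiter)
        elif record.startswith(delimiter, i):
            fields.append("".join(current))  # end of the current field
            current = []
            i += len(delimiter)
        else:
            current.append(record[i])
            i += 1
    fields.append("".join(current))
    return fields

print(split_delimited("a|b|c"))                  # ['a', 'b', 'c']
print(split_delimited("a\\|b|c", escape="\\"))   # ['a|b', 'c']
print(split_delimited("a||b", delimiter="||"))   # ['a', 'b']
```

The last call shows a multi-character delimiter, which the rule that the delimiter is a sequence of characters permits.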