Sets format attributes for the input and output streams. This lets the contract function tell the parser how the data is formatted.
Syntax
public void setFormat( int stream, InputInfo.StreamDir dir, java.util.Map<StreamFormat.FormatAttribute,java.lang.Object> formatattributes)
Syntax Elements
- stream
- IN parameter. Indicates the stream on which the format will be applied. Currently the only valid value is 0.
- dir
- IN parameter. The direction of the stream (input or output).
- formatattributes
- IN parameter. Map of attribute values to apply. Valid attributes are as follows:
- "RECFMT"
- "TZTYPE"
- "CHARSETFMT"
- "REPUNSPTCHR"
Usage Notes
- This routine is valid only when called within the contract function of a table operator.
- For "RECFMT" the default value is INDICFMT1, where the format is IndicData with row separator sentinels. All field-level formats impact the entire record.
- If data being imported from a foreign server contains unsupported characters, you must call setFormat() and explicitly set the "CHARSETFMT" and "REPUNSPTCHR" attributes.
- Format Attribute Values:
  - "RECFMT": Defines the record format:
    - INDICFMT1 = 1: IndicData with row separator sentinels.
    - INDICBUFFMT1 = 2: IndicData with NO row or partition separator sentinels.
  - "TZTYPE": Used as an indicator to Vantage to receive from or send TIME/TIMESTAMP data to the table operator in a different format.
    - RAW = 0: as stored on the file system
    - UTC = 1: as UTC
  - "CHARSETFMT":
    - EVLDBC: Signals that neither data conversion nor detection is needed.
    - EVLUTF16CHARSET: Signals that the external data to be imported into Vantage are in UTF16 encoding.
    - EVLUTF8CHARSET: Signals that the external data to be imported into Vantage are in UTF8 encoding.
  - "REPUNSPTCHR": A boolean value that specifies what to do when an unsupported unicode character is detected in the external data to be imported into Vantage.
    - true: Replaces the unsupported character with U+FFFD.
    - false: Return an error when an unsupported character is detected. This is the default behavior.
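The following sketch shows one way the attribute map for setFormat() might be assembled from the values listed above. It is illustrative only. The nested types FormatAttribute, StreamDir, CharsetFmt, and RecordingContext are local stand-ins so that the example compiles without the Teradata classes. The real StreamFormat.FormatAttribute and InputInfo.StreamDir constants may be spelled differently, and this section does not give numeric codes for the "CHARSETFMT" values, so a placeholder enum is used for them.

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch only: every nested type here is a local stand-in (an assumption),
// not the Teradata API itself.
class SetFormatSketch {
    // Stand-in for StreamFormat.FormatAttribute; constant names are assumed.
    enum FormatAttribute { RECFMT, TZTYPE, CHARSETFMT, REPUNSPTCHR }

    // Stand-in for InputInfo.StreamDir; constant names are assumed.
    enum StreamDir { INPUT, OUTPUT }

    // Stand-in for the "CHARSETFMT" values named in the attribute list.
    enum CharsetFmt { EVLDBC, EVLUTF16CHARSET, EVLUTF8CHARSET }

    // Numeric values taken from the attribute list above.
    static final int INDICFMT1 = 1;
    static final int INDICBUFFMT1 = 2;
    static final int TZ_RAW = 0;
    static final int TZ_UTC = 1;

    // Attributes for UTF-8 external data in which unsupported characters
    // should be replaced with U+FFFD instead of raising an error.
    static Map<FormatAttribute, Object> utf8ImportAttributes() {
        Map<FormatAttribute, Object> attrs = new EnumMap<>(FormatAttribute.class);
        attrs.put(FormatAttribute.RECFMT, INDICFMT1);
        attrs.put(FormatAttribute.TZTYPE, TZ_UTC);
        attrs.put(FormatAttribute.CHARSETFMT, CharsetFmt.EVLUTF8CHARSET);
        attrs.put(FormatAttribute.REPUNSPTCHR, Boolean.TRUE);
        return attrs;
    }

    // Hypothetical stand-in for the object that exposes setFormat().
    // It only records its arguments so the call shape can be shown.
    static class RecordingContext {
        int stream = -1;
        StreamDir dir;
        Map<FormatAttribute, Object> attributes;

        void setFormat(int stream, StreamDir dir,
                       Map<FormatAttribute, Object> formatattributes) {
            if (stream != 0) {
                // The syntax elements state that 0 is currently the only valid value.
                throw new IllegalArgumentException("only stream 0 is currently valid");
            }
            this.stream = stream;
            this.dir = dir;
            this.attributes = formatattributes;
        }
    }

    public static void main(String[] args) {
        RecordingContext ctx = new RecordingContext();
        // The direction is chosen here purely for illustration.
        ctx.setFormat(0, StreamDir.OUTPUT, utf8ImportAttributes());
        System.out.println(ctx.attributes);
    }
}
```

An EnumMap is used here because the keys are enum constants. Any java.util.Map implementation would satisfy the documented parameter type.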
- Importing and Exporting TIME/TIMESTAMP Data
You can map the Teradata TIME and TIMESTAMP data types to the Hadoop STRING or the Oracle TIMESTAMP data type when importing or exporting data to these foreign servers.
The table operator can use setFormat() to set the tztype attribute as an indicator to Vantage to receive from or send TIMESTAMP data to the table operator in a native but adjusted format.
The tztype attribute is set as follows for the import and export operators:
- For Hadoop, the attribute is set to UTC.
- For Oracle, the attribute is set to UTC.
If the transform is off, the data is transferred in Raw form, which is the default for table operators and is consistent with standard UDFs.
tztype is a member of the structure FNC_FmtConfig_t defined in fnctypes.h as follows:
typedef struct
{
    int Stream_Fmt_en recordfmt;  //enum - indicdata, fastload binary, delimited
    bool inlinelob;               //inline or deferred
    bool UDTTransformsOff;        //true or false
    bool PDTTransformsOff;        //true or false
    bool ArrayTransformsOff;      //true or false
    char auxinfo[128];            //For delimited text can contain the record separator, delimiter
                                  //specification and the field enclosure characters
    double inperc;                //recommended percentage of buffer devoted to input rows
    bool inputnames;              //send input column names to step
    bool outputnames;             //send output column names to step
    TZType_en tztype;             //enum - Raw or UTC
    int charsetfmt;               // charset format of data to be imported into TD through QG
    bool replUnsprtedUniChar;     /* true - replace unsupported unicode character encountered
                                     with U+FFFD when data is imported into TD through QG
                                     false - error out when unsupported unicode char encountered */
} FNC_FmtConfig_t;
TZType_en is defined as follows:
typedef enum
{
    Raw = 0,  /* as stored on file system */
    UTC = 1,  /* as UTC */
} TZType_en;
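For readers working on the Java side, the enum above could be mirrored as in the following sketch. This is an illustration, not part of the Teradata API: the class TzTypeSketch, the enum TzType, and the helper forForeignServer() with its server-kind strings are all hypothetical names. The only facts taken from this section are the numeric codes (Raw = 0, UTC = 1) and the statement that the Hadoop and Oracle operators both use UTC.

```java
import java.util.Locale;

// Sketch only: a hypothetical Java mirror of the C enum TZType_en.
class TzTypeSketch {
    enum TzType {
        RAW(0),  // as stored on the file system
        UTC(1);  // as UTC

        final int code;

        TzType(int code) {
            this.code = code;
        }

        // Maps a numeric code back to its constant, rejecting unknown codes.
        static TzType fromCode(int code) {
            for (TzType t : values()) {
                if (t.code == code) {
                    return t;
                }
            }
            throw new IllegalArgumentException("unknown tztype code: " + code);
        }
    }

    // Hadoop and Oracle use UTC per the text above; anything else falls back
    // to Raw, the stated default. The server-kind strings are assumptions.
    static TzType forForeignServer(String serverKind) {
        switch (serverKind.toLowerCase(Locale.ROOT)) {
            case "hadoop":
            case "oracle":
                return TzType.UTC;
            default:
                return TzType.RAW;
        }
    }

    public static void main(String[] args) {
        System.out.println(forForeignServer("Hadoop"));       // prints UTC
        System.out.println(forForeignServer("Hadoop").code);  // prints 1
    }
}
```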
For export, setInputInfo() is called during the contract phase in the resolver and will use the tztype attribute to add the desired cast to the input TIME or TIMESTAMP column types.
Vantage converts TIME and TIMESTAMP data to the session local time before casting to a character type. Therefore, when a TIME or TIMESTAMP column is mapped to charfix/charvar, as it is when mapping to the Hadoop STRING type, the data is transmitted in the session local time zone and no explicit casts are needed.
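The effect of converting to session local time before the character cast can be pictured with plain java.time code. The sketch below is only an analogy for what Vantage does internally. It assumes the stored value is in UTC and that the session time zone is a fixed offset. The class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch only: illustrates shifting a UTC timestamp to a session offset
// before rendering it as a string. Not Vantage code.
class SessionLocalTimeSketch {
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Interprets utcValue as UTC, shifts it to sessionOffset, and formats it.
    static String toSessionLocalString(LocalDateTime utcValue, ZoneOffset sessionOffset) {
        return utcValue.atOffset(ZoneOffset.UTC)
                .withOffsetSameInstant(sessionOffset)
                .toLocalDateTime()
                .format(FMT);
    }

    public static void main(String[] args) {
        LocalDateTime stored = LocalDateTime.of(2024, 1, 15, 12, 0, 0);
        // Session offset of UTC-08:00, chosen for illustration.
        System.out.println(toSessionLocalString(stored, ZoneOffset.ofHours(-8)));
        // prints 2024-01-15 04:00:00
    }
}
```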
For import, when getting the input buffer from the table operator, TIME or TIMESTAMP data must be converted to Raw form. No conversion is needed when importing Hadoop STRING values into Vantage TIME or TIMESTAMP data types, because that import follows the normal conversion path from character to TIME/TIMESTAMP in Vantage.
Teradata does not recommend importing or exporting TIME/TIMESTAMP data for a Teradata system with timedatewzcontrol flag 57 = 0. For such systems, the TIME/TIMESTAMP data is stored in OS local time. The System/Session time zone is not set and Vantage does not apply any conversions on TIME/TIMESTAMP data when reading or writing from disk. Therefore, exporting such data reliably in the format desired by the foreign server is a problem and Teradata recommends that the Teradata-to-Hadoop connector feature not be used for such systems.