17.10 - setFormat - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - SQL External Routine Programming

Product: Advanced SQL Engine, Teradata Database
Release Number: 17.10
Release Date: July 2021
Content Type: Programming Reference
Publication ID: B035-1147-171K
Language: English (United States)

Sets attributes of the format of the input and output streams. This allows the contract function to specify the format of the data types to the parser.

Syntax

public void setFormat(
   int stream,
   InputInfo.StreamDir dir,
   java.util.Map<StreamFormat.FormatAttribute, java.lang.Object> formatattributes)

Syntax Elements

stream
IN parameter. Indicates the stream on which the format will be applied. Currently the only valid value is 0.
dir
IN parameter. The direction of the stream (input or output).
formatattributes
IN parameter. Map of attribute values to apply. Valid attributes are as follows:
  • "RECFMT"
  • "TZTYPE"
  • "CHARSETFMT"
  • "REPUNSPTCHR"
"CHARSETFMT" and "REPUNSPTCHR" apply only to import table operators.

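As a rough illustration of how the parameters fit together, the following Java sketch builds an attribute map and passes it to a setFormat() call on stream 0. It does not use the Teradata runtime library: FormatAttribute, StreamDir, and ContractStub are hypothetical stand-ins declared inside the example, and the enum constant names, the INPUT/OUTPUT direction names, and the use of Integer map values are assumptions rather than the documented API. The numeric values for INDICFMT1 and UTC are the ones listed under Format Attribute Values in the Usage Notes.

```java
import java.util.EnumMap;
import java.util.Map;

class SetFormatSketch {
    // Hypothetical stand-ins for StreamFormat.FormatAttribute and
    // InputInfo.StreamDir; the real constant names may differ.
    enum FormatAttribute { RECFMT, TZTYPE, CHARSETFMT, REPUNSPTCHR }
    enum StreamDir { INPUT, OUTPUT }

    // Values taken from the Format Attribute Values table.
    static final int INDICFMT1 = 1; // IndicData with row separator sentinels
    static final int UTC = 1;       // TZTYPE: TIME/TIMESTAMP data as UTC

    // Stand-in for the object that exposes setFormat(); it only records the call.
    static class ContractStub {
        int lastStream = -1;
        StreamDir lastDir;
        final Map<FormatAttribute, Object> lastFormat = new EnumMap<>(FormatAttribute.class);

        public void setFormat(int stream, StreamDir dir,
                              Map<FormatAttribute, Object> formatattributes) {
            if (stream != 0) {
                throw new IllegalArgumentException("the only valid stream value is 0");
            }
            lastStream = stream;
            lastDir = dir;
            lastFormat.clear();
            lastFormat.putAll(formatattributes);
        }
    }

    // Builds the attribute map for an output stream that sends timestamps as UTC.
    static Map<FormatAttribute, Object> outputFormat() {
        Map<FormatAttribute, Object> attrs = new EnumMap<>(FormatAttribute.class);
        attrs.put(FormatAttribute.RECFMT, INDICFMT1);
        attrs.put(FormatAttribute.TZTYPE, UTC);
        return attrs;
    }

    public static void main(String[] args) {
        ContractStub contract = new ContractStub();
        contract.setFormat(0, StreamDir.OUTPUT, outputFormat());
        System.out.println(contract.lastDir + " " + contract.lastFormat);
    }
}
```

In a real table operator the call would be made from the contract function, and only the attributes that differ from their defaults would normally need to appear in the map.
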
Usage Notes

  • This routine is valid only when called within the contract function of a table operator.
  • For "RECFMT" the default value is INDICFMT1, where the format is IndicData with row separator sentinels. All field-level formats impact the entire record.
  • If data being imported from a foreign server contains unsupported characters, you must use setFormat() and explicitly set "CHARSETFMT" and "REPUNSPTCHR" attributes.
  • Format Attribute Values:
    Parameter Name Definition
    "RECFMT" Defines the record format:
    • INDICFMT1 = 1

      IndicData with row separator sentinels.

    • INDICBUFFMT1 = 2

      IndicData with NO row or partition separator sentinels.

    "TZTYPE" Tells Vantage which format to use when it receives TIME/TIMESTAMP data from, or sends it to, the table operator:
    • RAW = 0 as stored on the Teradata file system
    • UTC = 1 as UTC
    "CHARSETFMT" Defines the character set format of the external data to be imported into Vantage:
    • EVLDBC

      Signals that neither data conversion nor detection is needed.

    • EVLUTF16CHARSET

      Signals that the external data to be imported into Vantage are in UTF16 encoding.

    • EVLUTF8CHARSET

      Signals that the external data to be imported into Vantage are in UTF8 encoding.

    "REPUNSPTCHR" A boolean value that specifies what to do when an unsupported Unicode character is detected in the external data to be imported into Vantage.
    • true

      Replaces the unsupported character with U+FFFD.

    • false

      Returns an error when an unsupported character is detected. This is the default behavior.
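The two "REPUNSPTCHR" settings resemble the REPLACE and REPORT error actions of a standard Java CharsetDecoder. The sketch below uses only java.nio.charset to show the difference on a byte sequence that is not valid UTF-8. It is an analogy for the documented behavior, not the code that Vantage runs: what Vantage treats as an unsupported character may differ from what the Java decoder treats as malformed input, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ReplaceUnsupportedSketch {
    // Decodes UTF-8 bytes. replace == true substitutes U+FFFD for bad input
    // (like REPUNSPTCHR = true); replace == false reports an error
    // (like REPUNSPTCHR = false, the documented default).
    static String decode(byte[] utf8, boolean replace) {
        CodingErrorAction action =
                replace ? CodingErrorAction.REPLACE : CodingErrorAction.REPORT;
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(utf8)).toString();
        } catch (CharacterCodingException e) {
            // Surface the failure as an unchecked exception for brevity.
            throw new IllegalArgumentException("unsupported character in input", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = { 'a', (byte) 0xFF, 'b' }; // 0xFF never occurs in valid UTF-8

        String replaced = decode(data, true);
        System.out.println("replaced with U+FFFD: " + (replaced.charAt(1) == '\uFFFD'));

        try {
            decode(data, false);
        } catch (IllegalArgumentException e) {
            System.out.println("error reported: " + e.getMessage());
        }
    }
}
```
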

  • Importing and Exporting TIME/TIMESTAMP Data

    You can map the Teradata TIME and TIMESTAMP data types to the Hadoop STRING or the Oracle TIMESTAMP data type when importing data from or exporting data to these foreign servers.

    The table operator can use setFormat() to set the tztype attribute, which tells Vantage to receive TIMESTAMP data from, or send it to, the table operator in a native but adjusted format.

    The tztype attribute is set as follows for the import and export operators:
    • For Hadoop, the attribute is set to UTC.
    • For Oracle, the attribute is set to UTC.

    If the transform is off, the data is transferred in Raw form, which is the default for table operators and is consistent with standard UDFs.

    tztype is a member of the structure FNC_FmtConfig_t defined in fnctypes.h as follows:
    typedef struct
    {
       Stream_Fmt_en recordfmt; //enum - indicdata, fastload binary, delimited
       bool inlinelob; //inline or deferred
       bool UDTTransformsOff; //true or false
       bool PDTTransformsOff; //true or false
       bool ArrayTransformsOff; //true or false
       char auxinfo[128]; //For delimited text can contain the record separator, delimiter
                             //specification and the field enclosure characters
       double inperc; //recommended percentage of buffer devoted to input rows
       bool inputnames; //send input column names to step
       bool outputnames; //send output column names to step
       TZType_en tztype; //enum - Raw or UTC
       int charsetfmt; // charset format of data to be imported into TD through QG
       bool replUnsprtedUniChar; /* true - replace unsupported unicode character
                                    encountered with U+FFFD when data is imported
                                    into TD through QG
                                    false - error out when unsupported unicode
                                    char encountered */
    } FNC_FmtConfig_t;
    TZType_en is defined as follows:
    typedef enum
    {
       Raw = 0, /* as stored on TD File system */
       UTC = 1, /* as UTC */
    } TZType_en;

    For export, setInputInfo() is called during the contract phase in the resolver and will use the tztype attribute to add the desired cast to the input TIME or TIMESTAMP column types.

    Vantage converts TIME and TIMESTAMP data to the session local time before casting to a character type. Therefore, when a TIME or TIMESTAMP column is mapped to charfix/charvar, as when mapping to the Hadoop STRING type, the data is transmitted in the session local time zone and no explicit casts are needed.
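To make the difference between a UTC rendering and a session-local rendering concrete, the sketch below formats one instant both ways with java.time. The session time zone of UTC-8, the sample timestamp, the output pattern, and the class and method names are all assumptions chosen for illustration; the actual conversion takes place inside Vantage.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class SessionLocalTimeSketch {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Renders an instant as character data at the given zone offset.
    static String render(Instant instant, ZoneOffset zone) {
        return FMT.format(instant.atOffset(zone));
    }

    public static void main(String[] args) {
        Instant value = Instant.parse("2021-07-01T12:00:00Z"); // sample timestamp
        ZoneOffset session = ZoneOffset.ofHours(-8);           // hypothetical session time zone

        System.out.println("as UTC:           " + render(value, ZoneOffset.UTC));
        System.out.println("as session local: " + render(value, session));
    }
}
```

The same instant produces different character strings depending on the zone used for rendering, which is why the receiving side needs to know whether the text reflects UTC or the session local time.
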

    For import, when getting the input buffer from the table operator, TIME or TIMESTAMP data must be converted to Raw form. No conversion is needed when importing Hadoop STRING values into Vantage TIME or TIMESTAMP data types, because the import follows the normal conversion path from character to TIME/TIMESTAMP in Vantage.

    Teradata does not recommend importing or exporting TIME/TIMESTAMP data on a Teradata system with timedatewzcontrol flag 57 = 0. On such systems, TIME/TIMESTAMP data is stored in OS local time, the System/Session time zone is not set, and Vantage does not apply any conversions to TIME/TIMESTAMP data when reading from or writing to disk. As a result, such data cannot be exported reliably in the format the foreign server expects, and Teradata recommends that the Teradata-to-Hadoop connector feature not be used on such systems.