IMPORT - Basic Teradata Query

Basic Teradata Query Reference

Product
Basic Teradata Query
Release Number
15.00
Language
English (United States)
Last Update
2018-09-25
dita:id
B035-2414
lifecycle
previous
Product Category
Teradata Tools and Utilities

IMPORT

Purpose  

Opens mainframe- or workstation-attached system file, of the specified format, to process data for USING modifiers of subsequent SQL statements.

Syntax  

For Mainframe-Attached Systems:

For Workstation-Attached Systems:

where:

 

Syntax Element

Specification

|

The default field delimiter character is a broken vertical bar whose ASCII equivalent is 0x7C, TD_EBCDIC equivalent is 0x6A, UTF-8 equivalent is 0x7C and UTF-16BE equivalent is 0x007C.

AXSMOD

This element specifies that data is to be imported using an access module. This option is valid only for DATA and INDICDATA types on workstation-attached systems.

When an import file is opened, BTEQ sets one or more of the following attributes for the Access Module, depending on the session character set used.

  • CHARSET_NAME
  • CHARSET_NUMBER
  • BYTE_ORDER
  • c

    This element is an optional field delimiter character, which must be enclosed in single quotation marks.

    DATA, INDICDATA, or REPORT

    The format of the data to be read by BTEQ.

    If DATA or INDICDATA is specified, BTEQ requires the incoming data to be in FastLoad format, such as that produced by a BTEQ EXPORT DATA or BTEQ EXPORT INDICDATA command, or by FastExport.

    Note: The INDICDATA statement for the IMPORT command cannot be used as the SET INDICATORMODE or the SET INDICDATA command. 

    If the REPORT format is specified, BTEQ requires the incoming data to be in the format produced by a BTEQ EXPORT REPORT command.

    For z/OS, the lines in the import file are truncated to 254 characters by default. The width limit can be increased to 65531 characters using the WIDTH command.

    The REPORTWIDE keyword is also valid. This keyword specifies that the REPORT format for the data should be used, and changes the WIDTH setting to 32765. Since this keyword is deprecated, a SET WIDTH command should be used instead.

    See the WIDTH command “WIDTH” on page 342.

    ddname

    The name of the z/OS JCL DD statement that defines the mainframe-attached system file from which BTEQ reads data.

    Note: Use caution when using the keyword FILE instead of the keyword DDNAME for mainframe-attached systems, as the implementation is for the concept DDNAME, not the concept FILE.

    filename

    The name of the workstation-attached system file from which BTEQ reads data.

    If the name of the file includes a comma, semicolon, or space character, enclose the entire file name in either single or double quotation marks.

    Note: File names are case-sensitive on systems running on a UNIX system, and they are not case-sensitive on systems running under Windows.

    Whenever other command options are specified following the file name in the Windows environment, BTEQ requires the file name to be enclosed by either single or double quotation marks.

    The keyword DDNAME is a valid substitute for the keyword FILE for workstation-attached systems. Note when doing so that the implementation is for the FILE concept, not the DDNAME concept.

    'init-string'

    The name of the initiation string used to start the import process.

    DEFERCOLS or LOBCOLS

    The number of LOB or XML values which need to be elicited separately from other row data.

    modname

    The name of the file that is to receive the imported data.

    SKIP =n

    The number of lines skipped from the beginning of the file before reading data. The value is zero (0) by default. The range of valid numbers is 0 to 2147483647.

    TOTCOLS

    The INDICDATA form of imported data records, which is required for the AS DEFERRED transfer of LOBs.

    Note: Workstation and mainframe BTEQ do not support deferred-mode transfer of LOBs using Unicode session charsets.

    VARTEXT

    The record format of variable length character fields, with each field separated by a delimiter character.

    Usage Notes  

    For mainframe-attached systems, BTEQ supports all sequential files with fixed, variable, or undefined length record formats.

    When the record length of data used during an import operation is greater than 252 bytes, BTEQ displays the following message, indicating that the buffer size has been increased:

    Growing buffer to 32000

    The actual buffer size is 32006, where the last six bytes are used for the parcel type flavor (two bytes) and parcel length (four bytes).

    The DATA and INDICDATA keywords specify the format of the data to be sent to a Teradata Database.

    DATA format is used in Record Mode, which does not send explicit null indicator bits. INDICDATA format is used in Indicator Mode, which does send explicit null indicator bits.

    For the character sets KANJISJIS_0S, KANJIEUC_0U, TCHBIG5_1R0, and SDTCHBIG5_3R0, the default is Record (DATA) Mode, not Field Mode.

    Do not use the IMPORT command in a Teradata SQL macro.

    It is not recommended to specify a file that is already opened from a previous EXPORT command. Unexpected results might occur due to records being written and read at the same time.

    Note: It is the user’s responsibility to correctly describe the data records to be imported by way of the SQL request’s USING clauses.

    Note: If a USING clause is used in a multi-statement request, it must start the first statement. Otherwise, BTEQ does not recognize the USING clause and does not send the data parcel with the request, causing a database syntax error.

    Note: By default, a request that includes a USING clause can process up to n records, where n depends on the current value of the REPEAT and PACK factors, and that the import file is open and contains unread records. Refer to the PACK and REPEAT commands in this chapter for more information.

    Note: Use the SHOW CONTROLS command to identify the current record format of the IMPORT command.

    Note: For Unicode sessions, the encoding of import files must match the encoding of the session character set.

    DATA and INDICDATA Keywords

    Workstation-attached Systems

    When using the keywords DATA or INDICDATA, each record must be in the following format:

  • The record begins with a two-byte length field, that contains an unsigned binary integer whose value is the number of bytes in the record (not including the bytes in the length field and not including the byte(s) in the end-of-record field).
  • The record follows with binary-formatted fields that contain the column values of the selected row in record or indicator format.
  • The record ends with an end-of-record (line) indicator, appropriate for the workstation and operating system. For example, for a UNIX system, the indicator is a new line. For Windows, the indicator is a carriage return/line feed pair. In all cases, the end-of-record character, or characters, are in the bit-format appropriate for the workstation. For example, in ASCII for an ASCII workstation.
  • Notice:

    Numeric data fields are corrupted if the data is reimported using a REPORT format export file.

    When using a file originally exported using BTEQ’s EXPORT command as the source for IMPORT command across a different platform type, ensure that the endianness type of both platforms is the same. This can be verified using the SHOW CONTROLS BYTE ORDER command.

    Mainframe-Attached Systems

    When using the keywords DATA or INDICDATA, each record must consist of binary-formatted fields that contain the column values of the selected row in record or indicator format.

    Note: For both workstation-attached and mainframe-attached systems a BOM is not valid at the beginning of a DATA (or INDICDATA) import file when a Unicode session character set is being used.

    DEFERCOLS/LOBCOLS and TOTCOLS Keywords

    The keyword DEFERCOLS is a synonym for LOBCOLS. Both are identical in functionality. Since the ElicitData protocol can be used for eliciting both LOB as well as XML data, using the DEFERCOLS keyword will sound more generic in cases where XML or a mixture of LOB and XML data is being elicited.

    The DEFERCOLS keyword specifies the number of LOB values which need to be elicited separately from other row data through the ElicitData protocol as by the AS DEFERRED phrase within the request’s USING modifier clause.

    The TOTCOLS keyword is valid only for INDICDATA formatted import data records, and required for AS DEFERRED transfer of LOBs. The TOTCOLS keyword specifies the total number of columns referred to in the request’s USING modifier clause.

    All LOBS to be transferred AS DEFERRED must be represented by VARCHAR filename values that exist together as the first portion of an import data record. The maximum size for the filename containing a LOB value is 1024 bytes.

    BTEQ replaces the VARCHAR filename values with 4-byte integer tokens before submitting the request to the database. When a single-byte filename is supplied, three bytes (two for the length of the VARCHAR value and one for the filename value) get replaced by four bytes. This replacement means that the overall size of the import data record might become too large to successfully contain all generated tokens.

    The maximum value that can be supplied for the DEFERCOLS option or the TOTCOLS option is based on the maximum number of data values that Teradata Database supports to be described by any requests’ USING modifier.

    For deferred-mode LOB loading using z/OS BTEQ, import data records must contain fully-qualified dataset names, which can be sequential data sets or PDS members, such as USERID.CLOB.DATA1 or USERID.CLOB (DATA). This rule applies for deferred-mode LOB loading using z/OS BTEQ through both ElicitData and ElicitDataByName protocols.

    Note: The use of Unicode sessions to transfer LOBs in deferred mode is not supported.

    Multiple sessions can be used to take advantage of the database’s ability to handle requests in parallel.

    BTEQ can assign up to 200 sessions for use. However, each session requires a dedicated file handle which the system in use might not have the resources to make available. If BTEQ is not able to open a file, the associated request is aborted. Therefore, the appropriate number of sessions to use should be determined in advance of running LOB import jobs for production environment purposes.

    REPORT Keyword

    If the keyword REPORT is used, each record must be in the format generated by the REPORT keyword of the BTEQ EXPORT command, except that certain records must be removed. For instance, title text and titledashes can be deleted manually using an editor, or they can be programmatically ignored by using the SKIP option.

    For workstation-attached systems, each record must end with an end-of-record (line) indicator that is appropriate for the session character set being used and the platform on which BTEQ is running.

    Notice:

    For REPORT format, BTEQ uses Field Mode to return all data values, including numeric data, in character format. Undesirable results might occur with numeric data fields when data is reimported that was exported for REPORT format.

    Note: The REPORT parameter is not supported under the Kanji character sets KANJISJIS_0S or KANJIEUC_0U, or the Chinese character sets TCHBIG5_1R0 or SDTCHBIG5_3R0.

    Note: On workstation-attached systems, a BOM is optional at the beginning of a UTF-8 or
    UTF-16 REPORT import file when a Unicode session character set is being used.

    VARTEXT Keyword

    BTEQ supports VARTEXT records by converting each delimited data item in the input record into a VARCHAR data item. An empty data item causes the data item's corresponding field to be nulled.

    BTEQ interprets the following as empty data items:

  • Two adjacent delimiters
  • A delimiter as the very first character of the record
  • A delimiter as the very last character of the record
  • A character other than the default “|” can be specified as the data item delimiter. The following rules for specifying a valid optional character apply:

  • No control character other than TAB can be a delimiter.
  • Any character that appears in the data cannot be a delimiter.
  • Delimiters can only be a single-character sequence.
  • For Workstation BTEQ, the delimiter can be a multi-byte character for Unicode sessions only.
  • For Mainframe BTEQ, the delimiter must always be a single-byte character.
  • Rules for all Platforms

  • The only acceptable data types for VARTEXT records are VARCHAR, VARBYTE, and LONG VARCHAR. Undesirable results occur if other data types are used.
  • Because VARTEXT files are opened in text mode, unexpected results might occur if VARTEXT data contains embedded control characters such as null (0x00), linefeed (0x0A) or sub (0x1A). Control characters are accepted for DATA and INDICDATA imports.
  • The number of data items in the input record must be equal to the number of fields defined in the USING clause.
  • Each trailing delimiter at the end of an input record represents a null value to be generated by BTEQ. The null must have a corresponding field defined in the USING clause. If there are missing USING fields, the import might fail because the number of indicator bit bytes sent is greater than the number the DBS calculates to be needed according to the USING clause. BTEQ relies on the DBS to detect this problem. The failure, typically, is indicated by an 2673 error with a message stating the source parcel length does not match the data defined. There are two alternatives ways to solve this problem so the expected number of indicator bytes matches what BTEQ sends. The first alternative is to not use a trailing delimiter. The second is to add a dummy last column to the USING clause.
  • Two consecutive delimiter characters specify that the corresponding field should be nulled. If the record starts with a delimiter, the first field is nulled.
  • There should always be one less delimiter than the total number of columns specified in the USING clause.
  • Note: A BOM is optional at the beginning of a UTF-8 or UTF-16 VARTEXT import file when a Unicode session character set is being used.

    Rules for Workstation Platform

    Each record must end with an end-of-record (line) indicator, appropriate for the workstation and operating system. For example, for a UNIX system, the indicator must be a line feed character. For Windows, the indicator must be a carriage return/line feed pair. In all cases, the end-of-record character, or characters, must be in the bit-format appropriate for the workstation.

    Rules for Mainframe Platform

  • If a space character is used as delimiter, no empty trailing fields are valid in the record. If one or more fields are null at the end of a record, and the delimiter character is a space, the format of the record is not preserved. The trailing nulls and blanks are truncated, resulting in a DBS error when the record is inserted into a table.
  • To make trailing spaces part of the data, end the record with a delimiter.
  • When using a Unicode session charset, the delimiter character employed in VARTEXT import Unicode data files must be a Unicode code point equivalent to the delimiter character employed for the associated EBCDIC IMPORT command within the z/OS BTEQ script. There are certain EBCDIC characters that need special consideration to ensure the code points match.
    For example, if EBCDIC has two vertical bar characters: a solid bar (0x4F) and a broken bar (0x6A).
  • The EBCDIC solid bar character maps to 0x008D in UTF-16BE and is called a vertical line.
  • The EBCDIC broken bar character maps to 0x007C in UTF-16BE and is called the broken bar.
  • Use SET REPEATSTOP ON to stop importing data if an error is encountered during processing a VARTEXT record. By default it rejects the record and continues.

    Opening and Closing Import Files

    BTEQ opens the import file in response to an IMPORT command. BTEQ closes the import file whenever one of the following conditions occurs:

  • BTEQ encounters an end-of-file condition.
  • Another IMPORT command is issued.
  • The AXSMOD option was used and a LOGOFF command is issued.
  • An EXIT or QUIT command is issued.
  • BTEQ encounters an error while reading data from the import file.
  • Note: For information on I/O errors and abends, refer to “I/O Errors and Abends” on page 95.

    Example  

    Use the following command to read from a data set defined by NEWDAT, with an expected data format of INDICDATA.

       .IMPORT INDICDATA DDNAME=NEWDAT 

    Example

    Import data records from the same import file can be remapped as needed to accomplish updates to tables that each have differing row layouts. For example:

       .import data file = test.data;    
       using (c1 integer, c2 integer)
       insert into table1 (c1, c2)
       values (:c1, :c2);
       
       using (c1 integer, c2 integer)
       insert into table2 (c1, c3)
       values (:c1, :c2);
       
       using (c1 integer)
       update table3 set c1 = :c1 where c2 = :c1;

    Note that the import file test.data should contain at least three records so that one record is processed by each of the three requests having a USING clause.

    where:

     

    Table

    Definition

    Table1

    'ct table1 (c1 integer, c2 integer)'

    Table2

    'ct table2 (c1 integer, c2 char(1), c3 integer)'

    Table3

    'ct table3 (c1 integer, c2 integer, c3 integer)'

    Example 3

       .PACK 3
       .IMPORT data file = sample.data;
       
       using (x integer, y integer)
       insert into table1 (a,b)
       values (:x, :y);
       
       using (x integer, y integer)
       insert into table2 (a,b)
       values (:x, :y);
       

    Assume that the import file sample.data contains 10 records. With the PACK factor set to 3 and the value of the REPEAT command being 1, the number of records processed by each of the two requests is three. The result is a total of six records are processed, and the import file has four records remaining.

    How using the REPEAT command with the PACK clause affects record processing:

       .IMPORT data file = sample.data;
       .REPEAT 2 PACK 3
       
       using (x integer, y integer)
       insert into table1 (a,b)
       values (:x, :y);
       
       using (x integer, y integer)
       insert into table2 (a,b)
       values (:x, :y);

    Because the value of the REPEAT command is 2 and the value of the PACK clause is 3, six records are inserted into table1 and one record is inserted into table2. Note that the REPEAT command applies only to the request that immediately follows it. The import file has three records remaining.

    How using the PACK command and the REPEAT command affects record processing:

       .SET PACK 3
       .IMPORT data file = sample.data;
       .REPEAT 2
       
       using (x integer, y integer)
       insert into table1 (a,b)
       values (:x, :y);
       
       using (x integer, y integer)
       insert into table2 (a,b)
       values (:x, :y);
       

    The PACK command affects both requests containing a USING clause and the REPEAT command affects only the first request. The results are six records are inserted into table1, three records are inserted into table2, and the import file has one remaining record.

       Table Definition:
       table1 ‘ct table1 (c1 integer, c2 integer)’
       table2 ‘ct table2 (c1 integer, c2 integer)’

    Example - AXSMOD

    This example reads data from a file named EXPORT.DAT with an expected data format of DATA, through an Access Module (library name is MYMOD). Note that an init-string is not used.

    .IMPORT DATA FILE = 'EXPORT.DAT' AXSMOD MYMOD